beachmat 2.0.0

Ordinarily, direct support for a matrix representation would require the appropriate methods to be defined in *beachmat* at compile time.
This is the case for the most widely used matrix classes but is somewhat restrictive for other community-contributed matrix representations.
Fortunately, R provides a mechanism to link across shared libraries from different packages.
This means that package developers who define an R matrix representation can also define C++ methods for native read/write support in *beachmat*-dependent code.
By doing so, we can improve efficiency of access to these new classes by avoiding the need for block processing via R.

A functioning demonstration of this approach is available in the `extensions` test package.
This vignette will provide an explanation of the code in `extensions`, and we suggest examining the source code at the same time:

`system.file("extensions", package="beachmat")`

`## [1] "/tmp/RtmpddqHqW/Rinst3543339c4c6f/beachmat/extensions"`

Assume that we have already defined a new matrix-like S4 class (here, `AaronMatrix`).
To notify the *beachmat* API that direct input support is available, we need to:

- define a method for the `supportCppAccess()` generic (from *beachmat*) for this class. This should return `TRUE` if direct support is available (obviously).
- define a method for the `type()` generic from the *DelayedArray* package. This should return the type of the matrix, i.e., integer, logical, numeric or character.

It is possible to only have direct support for particular data types of the given matrix representation.
The example in `extensions` only directly supports integer and character `AaronMatrix` objects (the other types were simply left unimplemented) and will only return `TRUE` for such types.

We will use integer matrices for demonstration, though it is simple to generalize this to all types by replacing `_integer` with, e.g., `_character`. (Some understanding of C++ templates will greatly simplify the definition of the same methods for different types.)
First, we define a `create()` function that takes a `SEXP` object and returns a `void` pointer.
This pointer should refer to an instance of some C++ class that holds any intermediate data structures needed for efficient access.

`void * ptr = AaronMatrix_integer_input_create(in /* SEXP */);`

We define a `clone()` function that performs a deep copy of the aforementioned pointer.

`void * ptr_copy = AaronMatrix_integer_input_clone(ptr /* void* */);`

We define a `destroy()` function that frees the memory pointed to by `ptr`.

`AaronMatrix_integer_input_destroy(ptr /* void* */);`

We define a `dim()` function that records the number of rows and columns in the object pointed to by `ptr`.
Note that `nrow` and `ncol` are pointers, through which the dimensions are returned.

```
AaronMatrix_integer_input_dim(
ptr, /* void* */
nrow, /* size_t* */
ncol /* size_t* */
);
```
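As a rough illustration of the functions so far, the sketch below implements `clone`, `destroy` and `dim` for integer input. The `AaronIntStore` struct and its column-major layout are assumptions made for this example and are not taken from the `extensions` package; `create()` is omitted because it depends on R's `SEXP` type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical backing store (an assumption for this sketch):
// integer values held in column-major order.
struct AaronIntStore {
    std::size_t nrow, ncol;
    std::vector<int> values;
};

extern "C" {

// Deep copy: the implicit copy constructor duplicates the vector.
void* AaronMatrix_integer_input_clone(void* ptr) {
    assert(ptr != nullptr);
    return new AaronIntStore(*static_cast<AaronIntStore*>(ptr));
}

// Free the memory allocated for the store.
void AaronMatrix_integer_input_destroy(void* ptr) {
    delete static_cast<AaronIntStore*>(ptr);
}

// Report the dimensions through the supplied pointers.
void AaronMatrix_integer_input_dim(void* ptr, std::size_t* nrow, std::size_t* ncol) {
    assert(ptr != nullptr && nrow != nullptr && ncol != nullptr);
    const AaronIntStore* store = static_cast<const AaronIntStore*>(ptr);
    *nrow = store->nrow;
    *ncol = store->ncol;
}

}
```

Keeping all state behind a single `void*` means that the caller never needs to know the layout of the underlying class.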

A systematic naming scheme is used for all functions, consisting of:

- The name of the matrix representation, i.e., `AaronMatrix`.
- The data type, i.e., `integer`.
- Whether it is an `input` or `output` class.
- The purpose of the function, e.g., `destroy`.
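The naming scheme can be sketched as a simple string concatenation. The `external_name()` helper below is hypothetical and only illustrates how the four components combine; it is not a function exported by *beachmat*.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: joins the four components with underscores.
std::string external_name(const std::string& cls, const std::string& type,
                          const std::string& direction, const std::string& purpose) {
    // Only two directions exist in the scheme described above.
    assert(direction == "input" || direction == "output");
    return cls + "_" + type + "_" + direction + "_" + purpose;
}
```

For example, `external_name("AaronMatrix", "integer", "input", "destroy")` yields `AaronMatrix_integer_input_destroy`.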

In general, the getter functions follow the same structure as that described for the input API.
We expect a `get` function to obtain a specified entry of the matrix:

```
AaronMatrix_integer_input_get(
ptr, /* void* */
r, /* size_t */
c, /* size_t */
val /* int* */
);
```

Note that `val` is a **pointer** to the matrix type.
For example, `val` should be a `Rcpp::String*` for character matrices, a `double*` for numeric matrices, and an `int*` for logical matrices.

Developers can assume that `r` and `c` are valid, i.e., within `[0, nrow)` and `[0, ncol)` respectively.
These checks are performed by *beachmat* and do not have to be repeated within developer-defined functions. (Obviously, the dimensions of the matrix pointed to by `ptr` should not change!)
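A single-element getter might look like the following sketch. The `AaronIntStore` struct and its column-major layout are assumptions for illustration only. The lookup itself performs no bounds checks on `r` and `c`, on the understanding that *beachmat* has already validated them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical backing store (an assumption for this sketch):
// integer values held in column-major order.
struct AaronIntStore {
    std::size_t nrow, ncol;
    std::vector<int> values;
};

extern "C" void AaronMatrix_integer_input_get(void* ptr, std::size_t r, std::size_t c, int* val) {
    assert(ptr != nullptr && val != nullptr);
    const AaronIntStore* store = static_cast<const AaronIntStore*>(ptr);
    // r and c are not re-checked here; the result is written through 'val'.
    *val = store->values[r + c * store->nrow];
}
```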

Here, we will use character matrices as an example. (Character matrices tend to require some special attention, as character arrays need to be coerced to `Rcpp::String` objects to be returned in `in`.)
We expect a `getCol` function to obtain a column of the matrix:

```
AaronMatrix_character_input_getCol(
ptr, /* void* */
c, /* size_t */
in, /* Rcpp::StringVector::iterator* */
first, /* size_t */
last /* size_t */
);
```

… and another `getRow` function to obtain a row of the matrix:

```
AaronMatrix_character_input_getRow(
ptr, /* void* */
r, /* size_t */
in, /* Rcpp::StringVector::iterator* */
first, /* size_t */
last /* size_t */
);
```

These are in camel case to simplify parsing of the function names.
We further expect a `getCols` function to obtain multiple columns:

```
AaronMatrix_character_input_getCols(
ptr, /* void* */
c, /* size_t */
indices, /* Rcpp::IntegerVector::iterator* */
n, /* size_t */
in, /* Rcpp::StringVector::iterator* */
first, /* size_t */
last /* size_t */
);
```

… and a `getRows` function to obtain multiple rows:

```
AaronMatrix_character_input_getRows(
ptr, /* void* */
r, /* size_t */
indices, /* Rcpp::IntegerVector::iterator* */
n, /* size_t */
in, /* Rcpp::StringVector::iterator* */
first, /* size_t */
last /* size_t */
);
```

In all cases, `first` and `last` can be assumed to be valid, i.e., `first <= last` and both in `[0, nrow]` or `[0, ncol]` (for column and row access, respectively), with `last` marking the exclusive end of the requested range.
Indices in `indices` can also be assumed to be valid, i.e., within matrix dimensions and strictly increasing.

We stress that the various iterator arguments are *pointers to iterators* rather than the iterators themselves.
This is to avoid potential issues with C++ classes when using C-style linkage via R’s `R_GetCCallable()` framework.
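The pointer-to-iterator convention can be sketched as follows. To keep the example compilable without Rcpp, `std::vector<std::string>::iterator` stands in for `Rcpp::StringVector::iterator`. The `AaronStringStore` struct, its column-major layout, and the treatment of `last` as an exclusive bound are all assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for Rcpp::StringVector::iterator (assumption: used only so that
// this sketch compiles without Rcpp).
typedef std::vector<std::string>::iterator string_iterator;

// Hypothetical backing store with strings held in column-major order.
struct AaronStringStore {
    std::size_t nrow, ncol;
    std::vector<std::string> values;
};

extern "C" void AaronMatrix_character_input_getCol(void* ptr, std::size_t c,
        string_iterator* in, std::size_t first, std::size_t last) {
    assert(ptr != nullptr && in != nullptr);
    const AaronStringStore* store = static_cast<const AaronStringStore*>(ptr);
    std::vector<std::string>::const_iterator col = store->values.begin() + c * store->nrow;
    // Dereference the pointer once to recover the iterator, then fill
    // the destination with rows [first, last) of column c.
    std::copy(col + first, col + last, *in);
}
```

Only a pointer crosses the C-style boundary, so the iterator class itself never appears by value in the function signature.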

For integer, logical or numeric matrices, we need to account for type conversions. This is done by defining the following functions (using integer matrices as an example):

- `AaronMatrix_integer_input_getCol_integer`, for getting a single column’s values as integers.
- `AaronMatrix_integer_input_getCol_numeric`, for getting a single column’s values as double-precision values.
- `AaronMatrix_integer_input_getRow_integer`, for getting a single row’s values as integers.
- `AaronMatrix_integer_input_getRow_numeric`, for getting a single row’s values as double-precision values.
- `AaronMatrix_integer_input_getCols_integer`, for getting multiple columns’ values as integers.
- `AaronMatrix_integer_input_getCols_numeric`, for getting multiple columns’ values as double-precision values.
- `AaronMatrix_integer_input_getRows_integer`, for getting multiple rows’ values as integers.
- `AaronMatrix_integer_input_getRows_numeric`, for getting multiple rows’ values as double-precision values.

Taking the single-column getter as an example:

```
AaronMatrix_integer_input_getCol_integer(
ptr, /* void* */
c, /* size_t */
in, /* Rcpp::IntegerVector::iterator* */
first, /* size_t */
last /* size_t */
);
AaronMatrix_integer_input_getCol_numeric(
ptr, /* void* */
c, /* size_t */
in, /* Rcpp::NumericVector::iterator* */
first, /* size_t */
last /* size_t */
);
```

The function name now has an additional suffix to denote the destination type.
We explicitly define conversions here as the cross-library linking framework does not support templating or overloading of `in`.
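One way to avoid duplicating logic across the suffixed functions is to write a single template internally and expose thin wrappers with C-style linkage. The sketch below assumes a hypothetical `AaronIntStore` class with column-major storage. It also uses `int*` and `double*` as stand-ins for `Rcpp::IntegerVector::iterator` and `Rcpp::NumericVector::iterator`, so that it compiles without Rcpp, and treats `last` as an exclusive bound.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for the Rcpp iterator types (assumptions for this sketch).
typedef int* integer_iterator;
typedef double* numeric_iterator;

// Hypothetical backing store with integers held in column-major order.
struct AaronIntStore {
    std::size_t nrow, ncol;
    std::vector<int> values;
};

// Shared implementation: std::copy converts each int to the value type
// of the destination iterator.
template <typename OutIt>
void get_col_impl(void* ptr, std::size_t c, OutIt* in, std::size_t first, std::size_t last) {
    assert(ptr != nullptr && in != nullptr);
    const AaronIntStore* store = static_cast<const AaronIntStore*>(ptr);
    const int* col = store->values.data() + c * store->nrow;
    std::copy(col + first, col + last, *in);
}

extern "C" {

void AaronMatrix_integer_input_getCol_integer(void* ptr, std::size_t c,
        integer_iterator* in, std::size_t first, std::size_t last) {
    get_col_impl(ptr, c, in, first, last);
}

void AaronMatrix_integer_input_getCol_numeric(void* ptr, std::size_t c,
        numeric_iterator* in, std::size_t first, std::size_t last) {
    get_col_impl(ptr, c, in, first, last);
}

}
```

The template never crosses the library boundary; only the two explicitly named wrappers are visible to callers.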

To notify the *beachmat* API that direct output support is available,
we need to define flags in our package’s namespace.

```
beachmat_AaronMatrix_integer_output <- TRUE
beachmat_AaronMatrix_character_output <- TRUE
```

This indicates that support is available for `AaronMatrix` integer and character outputs.
Missing or `FALSE` flags indicate that no support is available, in which case *beachmat* will write to an ordinary matrix by default.

Again, we will use integer matrices for demonstration.
The required functions are mostly similar to the input case.
For creation, we expect to have the number of rows `nr` and columns `nc`:

```
void * ptr = AaronMatrix_integer_output_create(
nr /* size_t */,
nc /* size_t */
);
```
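A minimal sketch of such a creation function is shown below. The `AaronIntOutput` struct, its column-major layout and the zero initialization are assumptions for illustration, not details taken from the `extensions` package.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical output store (an assumption for this sketch):
// integer values held in column-major order.
struct AaronIntOutput {
    std::size_t nrow, ncol;
    std::vector<int> values;
};

extern "C" void* AaronMatrix_integer_output_create(std::size_t nr, std::size_t nc) {
    // Guard against overflow when computing the number of entries.
    assert(nc == 0 || nr <= static_cast<std::size_t>(-1) / nc);
    // Allocate all entries up front, initialized to zero.
    return new AaronIntOutput{nr, nc, std::vector<int>(nr * nc, 0)};
}
```

The returned pointer is owned by the caller and should eventually be released by the matching `destroy` function.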

We define a `clone()` function to perform a deep copy:

`void * ptr_copy = AaronMatrix_integer_output_clone(ptr /* void* */);`

We also define a `destroy()` function to free memory:

`AaronMatrix_integer_output_destroy(ptr /* void* */);`

In all cases, we use `_output_` to indicate that we are dealing with an output matrix class.

In general, the setter functions follow the same structure as that described for the output API.
We expect a `set` function to set a specified entry of the matrix:

```
AaronMatrix_integer_output_set(
ptr, /* void* */
r, /* size_t */
c, /* size_t */
val /* int* */
);
```

Again, note that `val` is a **pointer** to the matrix type.

Here, we will use character matrices as an example. (Character matrices tend to require some special attention, as character arrays need to be coerced to or from `Rcpp::String` objects.)
We expect a `setCol` function to set a column of the matrix:

```
AaronMatrix_character_output_setCol(
ptr, /* void* */
c, /* size_t */
in, /* Rcpp::StringVector::iterator* */
first, /* size_t */
last /* size_t */
);
```

… and another `setRow` function to set a row of the matrix:

```
AaronMatrix_character_output_setRow(
ptr, /* void* */
r, /* size_t */
in, /* Rcpp::StringVector::iterator* */
first, /* size_t */
last /* size_t */
);
```

These are in camel case to simplify parsing of the function names.
We further expect a `setColIndexed` function to set specific elements of a column:

```
AaronMatrix_character_output_setColIndexed(
ptr, /* void* */
c, /* size_t */
n, /* size_t */
idx, /* Rcpp::IntegerVector::iterator* */
in /* Rcpp::StringVector::iterator* */
);
```

… where `idx` points to an array of `n` zero-indexed row indices and `in` points to an array of values.
The function should assign each value to the corresponding row at column `c` of the output matrix.

Similarly, we expect a `setRowIndexed` function to set specific elements of a row:

```
AaronMatrix_character_output_setRowIndexed(
ptr, /* void* */
r, /* size_t */
n, /* size_t */
idx, /* Rcpp::IntegerVector::iterator* */
in /* Rcpp::StringVector::iterator* */
);
```

… where `idx` now contains column indices.
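The indexed column setter could be sketched as below. As elsewhere in this vignette, the iterator arguments are assumed to be passed as pointers. `int*` and `std::vector<std::string>::iterator` stand in for `Rcpp::IntegerVector::iterator` and `Rcpp::StringVector::iterator` so that the example compiles without Rcpp, and the `AaronStringOutput` struct with column-major storage is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-ins for the Rcpp iterator types (assumptions for this sketch).
typedef int* index_iterator;
typedef std::vector<std::string>::iterator string_iterator;

// Hypothetical output store with strings held in column-major order.
struct AaronStringOutput {
    std::size_t nrow, ncol;
    std::vector<std::string> values;
};

extern "C" void AaronMatrix_character_output_setColIndexed(void* ptr, std::size_t c,
        std::size_t n, index_iterator* idx, string_iterator* in) {
    assert(ptr != nullptr && idx != nullptr && in != nullptr);
    AaronStringOutput* store = static_cast<AaronStringOutput*>(ptr);
    index_iterator row = *idx;  // zero-indexed row indices
    string_iterator val = *in;  // values to assign, one per index
    for (std::size_t i = 0; i < n; ++i, ++row, ++val) {
        store->values[static_cast<std::size_t>(*row) + c * store->nrow] = *val;
    }
}
```

A row-wise version would follow the same pattern, with `idx` supplying column indices and the offset computed as `r + column * nrow`.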

For integer, logical or numeric matrices, we need to account for type conversions. This is done by defining the following functions (using integer matrices as an example):

- `AaronMatrix_integer_output_setCol_integer`, for setting a single column’s values from integers.
- `AaronMatrix_integer_output_setCol_numeric`, for setting a single column’s values from double-precision values.
- `AaronMatrix_integer_output_setRow_integer`, for setting a single row’s values from integers.
- `AaronMatrix_integer_output_setRow_numeric`, for setting a single row’s values from double-precision values.
- `AaronMatrix_integer_output_setColIndexed_integer`, for indexed setting of a single column’s values from integers.
- `AaronMatrix_integer_output_setColIndexed_numeric`, for indexed setting of a single column’s values from double-precision values.
- `AaronMatrix_integer_output_setRowIndexed_integer`, for indexed setting of a single row’s values from integers.
- `AaronMatrix_integer_output_setRowIndexed_numeric`, for indexed setting of a single row’s values from double-precision values.

Taking the single-column setter as an example:

```
AaronMatrix_integer_output_setCol_integer(
ptr, /* void* */
c, /* size_t */
in, /* Rcpp::IntegerVector::iterator* */
first, /* size_t */
last /* size_t */
);
AaronMatrix_integer_output_setCol_numeric(
ptr, /* void* */
c, /* size_t */
in, /* Rcpp::NumericVector::iterator* */
first, /* size_t */
last /* size_t */
);
```

Output classes should also support all single-element and single-row/column getters:

- `AaronMatrix_character_output_get`
- `AaronMatrix_character_output_getRow`
- `AaronMatrix_character_output_getCol`

For numeric types, convertible getters should also be supported:

- `AaronMatrix_integer_output_get`
- `AaronMatrix_integer_output_getRow_integer`
- `AaronMatrix_integer_output_getRow_numeric`
- `AaronMatrix_integer_output_getCol_integer`
- `AaronMatrix_integer_output_getCol_numeric`

We use the `R_RegisterCCallable()` function from the R API to register the above functions (see the "Linking to native routines in other packages" section of *Writing R Extensions* for an explanation).
This ensures that they can be found by *beachmat* when an `AaronMatrix` instance is encountered.
Note that the functions must be defined with C-style linkage in order for this procedure to work properly, hence the use of `extern "C"` in the `extensions` test package.

Needless to say, the `NAMESPACE` should contain an appropriate `useDynLib` command.
This means that the shared library will be loaded along with the package, allowing *beachmat* to access the registered routines within.
However, the `supportCppAccess` method and output flags do not need to be exported, as these will be directly recovered from the package’s namespace.

We suggest using the *beachtest* package to test correct input and output via external linkage to a custom matrix representation.
When using the *testthat* framework, this can be added to `setup.R`:

```
testpkg <- system.file("testpkg", package="beachmat")
devtools::install(testpkg, quick=TRUE)
library(beachtest)
```

It is simple to write test scripts using functions like `check_read_all` and `check_write_all` to quickly verify that linkage works correctly.
Developers are again referred to the `extensions` test package for a working example.