TileDBArray 1.13.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.98985735 -0.69119668 0.28247547 . -1.09909399 0.46601535
## [2,] 0.06202741 0.05205531 1.47730037 . -0.03763323 -0.01756452
## [3,] 0.93670122 -1.01417802 -1.27312556 . 1.11824675 0.96758933
## [4,] 1.80882446 -0.14674217 0.20886443 . 0.09007757 1.04095819
## [5,] 0.63481089 -0.56375828 0.54640471 . 0.42859951 0.19934312
## ... . . . . . .
## [96,] 1.794140265 -0.227085616 -0.870625881 . -0.95088438 -0.37940837
## [97,] 0.643239670 -0.067493899 0.342585835 . 0.48150556 -1.18224533
## [98,] -1.008328456 -0.848652712 0.004912309 . 0.07438504 -0.87660895
## [99,] 1.604220392 -0.673710179 -0.299282296 . -0.79961407 0.57366047
## [100,] -0.298915449 -0.446646195 1.149081636 . 0.18454351 -0.42969195
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.98985735 -0.69119668 0.28247547 . -1.09909399 0.46601535
## [2,] 0.06202741 0.05205531 1.47730037 . -0.03763323 -0.01756452
## [3,] 0.93670122 -1.01417802 -1.27312556 . 1.11824675 0.96758933
## [4,] 1.80882446 -0.14674217 0.20886443 . 0.09007757 1.04095819
## [5,] 0.63481089 -0.56375828 0.54640471 . 0.42859951 0.19934312
## ... . . . . . .
## [96,] 1.794140265 -0.227085616 -0.870625881 . -0.95088438 -0.37940837
## [97,] 0.643239670 -0.067493899 0.342585835 . 0.48150556 -1.18224533
## [98,] -1.008328456 -0.848652712 0.004912309 . 0.07438504 -0.87660895
## [99,] 1.604220392 -0.673710179 -0.299282296 . -0.79961407 0.57366047
## [100,] -0.298915449 -0.446646195 1.149081636 . 0.18454351 -0.42969195
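As a quick sanity check, the sketch below coerces a matrix to a TileDBArray and pulls it back into memory with as.matrix(), which is expected to reproduce the original values. The names tdb and roundtrip are illustrative only, and the seed is fixed purely for reproducibility.

```r
library(TileDBArray)

# Simulate a small dense matrix (seed chosen arbitrarily).
set.seed(42)
X <- matrix(rnorm(1000), ncol=10)

# Coerce to a TileDB-backed array; the data now lives on disk.
tdb <- as(X, "TileDBArray")

# Realize the array back into an ordinary in-memory matrix.
roundtrip <- as.matrix(tdb)

# The dimensions and values should match the input.
identical(dim(roundtrip), dim(X))
all.equal(roundtrip, X)
```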
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
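To confirm that sparsity is carried through to the backend, one option is to query the result with is_sparse() from DelayedArray and compare the realized values with the input. This is a sketch rather than a required step, and tdb_sparse is a name chosen for illustration.

```r
library(TileDBArray)

# Simulate a sparse matrix with roughly 1% non-zero entries.
set.seed(100)
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)

# Write it to a TileDB backend.
tdb_sparse <- writeTileDBArray(Y)

# Ask whether the resulting object reports itself as sparse.
is_sparse(tdb_sparse)

# Compare values after realizing both into dense matrices;
# attributes such as dimnames are ignored in the comparison.
all.equal(as.matrix(tdb_sparse), as.matrix(Y), check.attributes=FALSE)
```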
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
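The logical case is shown above; an integer matrix can be handled in the same way. The sketch below uses simulated counts (Z and tdb_int are illustrative names) and inspects the reported type with type() from DelayedArray.

```r
library(TileDBArray)

# rpois() returns integer values, so Z is an integer matrix.
set.seed(200)
Z <- matrix(rpois(200, lambda=5), ncol=10)
is.integer(Z)

# Write the integer matrix to a TileDB backend.
tdb_int <- writeTileDBArray(Z)

# Inspect the type reported by the TileDB-backed object.
type(tdb_int)

# Realizing the array should recover the original counts.
all.equal(as.matrix(tdb_int), Z)
```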
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.98985735 -0.69119668 0.28247547 . -1.09909399 0.46601535
## GENE_2 0.06202741 0.05205531 1.47730037 . -0.03763323 -0.01756452
## GENE_3 0.93670122 -1.01417802 -1.27312556 . 1.11824675 0.96758933
## GENE_4 1.80882446 -0.14674217 0.20886443 . 0.09007757 1.04095819
## GENE_5 0.63481089 -0.56375828 0.54640471 . 0.42859951 0.19934312
## ... . . . . . .
## GENE_96 1.794140265 -0.227085616 -0.870625881 . -0.95088438 -0.37940837
## GENE_97 0.643239670 -0.067493899 0.342585835 . 0.48150556 -1.18224533
## GENE_98 -1.008328456 -0.848652712 0.004912309 . 0.07438504 -0.87660895
## GENE_99 1.604220392 -0.673710179 -0.299282296 . -0.79961407 0.57366047
## GENE_100 -0.298915449 -0.446646195 1.149081636 . 0.18454351 -0.42969195
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -0.98985735 0.06202741 0.93670122 1.80882446 0.63481089 -0.02256995
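The relationship to DelayedArray can be inspected directly. The following sketch checks class inheritance with is() and retrieves the underlying seed with seed(); it assumes the seed class is named TileDBArraySeed, as exported by the package.

```r
library(TileDBArray)

# Build a small named matrix and coerce it to a TileDBArray.
set.seed(300)
X <- matrix(rnorm(1000), ncol=10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
out <- as(X, "TileDBArray")

# The object inherits from DelayedArray, so generic DelayedArray
# methods apply to it.
is(out, "DelayedArray")

# The seed is the object that actually points at the TileDB backend.
s <- seed(out)
is(s, "TileDBArraySeed")

# Dimension names are carried along with the array.
identical(dimnames(out), dimnames(X))
```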
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -0.98985735 -0.69119668 0.28247547 0.81386342 0.58311739
## GENE_2 0.06202741 0.05205531 1.47730037 -1.17438117 -1.67167848
## GENE_3 0.93670122 -1.01417802 -1.27312556 -0.30141539 0.72201380
## GENE_4 1.80882446 -0.14674217 0.20886443 -1.32948123 -0.22475132
## GENE_5 0.63481089 -0.56375828 0.54640471 -0.64015347 0.97778288
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -1.9797147 -1.3823934 0.5649509 . -2.19818799 0.93203071
## GENE_2 0.1240548 0.1041106 2.9546007 . -0.07526646 -0.03512904
## GENE_3 1.8734024 -2.0283560 -2.5462511 . 2.23649351 1.93517867
## GENE_4 3.6176489 -0.2934843 0.4177289 . 0.18015515 2.08191638
## GENE_5 1.2696218 -1.1275166 1.0928094 . 0.85719903 0.39868623
## ... . . . . . .
## GENE_96 3.588280530 -0.454171233 -1.741251762 . -1.9017688 -0.7588167
## GENE_97 1.286479339 -0.134987799 0.685171670 . 0.9630111 -2.3644907
## GENE_98 -2.016656911 -1.697305423 0.009824619 . 0.1487701 -1.7532179
## GENE_99 3.208440783 -1.347420358 -0.598564592 . -1.5992281 1.1473209
## GENE_100 -0.597830898 -0.893292390 2.298163273 . 0.3690870 -0.8593839
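To make the delayed behaviour more concrete, the sketch below stacks a subsetting and an arithmetic operation, prints the pending operations with showtree() from DelayedArray, and only then realizes the result with as.matrix(). The variable names are illustrative.

```r
library(TileDBArray)

# Build a small named matrix and coerce it to a TileDBArray.
set.seed(400)
X <- matrix(rnorm(1000), ncol=10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
out <- as(X, "TileDBArray")

# Stack two delayed operations; nothing is computed or written yet.
delayed <- out[1:5, 1:5] * 2

# Show the tree of pending operations sitting on top of the seed.
showtree(delayed)

# Realize into memory; only now are the values read and transformed.
realized <- as.matrix(delayed)

# The result should agree with the same operations done in memory.
all.equal(realized, X[1:5, 1:5] * 2)
```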
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7
## 3.369356 5.934924 6.972511 -18.845546 4.946596 -25.831554 18.472536
## SAMP_8 SAMP_9 SAMP_10
## -8.734845 -16.784105 5.236001
out %*% runif(ncol(out))
## <100 x 1> DelayedMatrix object of type "double":
## y
## GENE_1 -0.5421485
## GENE_2 -1.7752297
## GENE_3 1.8752347
## GENE_4 0.6525985
## GENE_5 0.5070950
## ... .
## GENE_96 -1.0879090
## GENE_97 -1.8571687
## GENE_98 -3.0208955
## GENE_99 0.4196230
## GENE_100 -0.1670388
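These results can be checked against the equivalent in-memory computations. The sketch below compares column sums and a matrix-vector product; the vector v is drawn once so that both sides use the same values, and attributes are ignored for the product because the dimnames of the two results may differ.

```r
library(TileDBArray)

# Build a small named matrix and coerce it to a TileDBArray.
set.seed(500)
X <- matrix(rnorm(1000), ncol=10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
out <- as(X, "TileDBArray")

# Column sums computed from the TileDB backend versus in memory.
all.equal(colSums(out), colSums(X))

# Matrix-vector product, reusing the same random vector on both sides.
v <- runif(ncol(out))
prod_tdb <- as.matrix(out %*% v)
prod_mem <- X %*% v
all.equal(prod_tdb, prod_mem, check.attributes=FALSE)
```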
We can adjust some parameters for creating the backend by passing the appropriate arguments to writeTileDBArray().
For instance, the example below controls the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.88162226 -0.90262640 -0.65896991 . 1.80207817 0.18554622
## [2,] -0.86176363 -0.89663951 0.36047295 . -1.95566620 1.30855053
## [3,] -0.95955704 -2.33774226 -0.86354288 . 1.57147698 -0.33219284
## [4,] -2.54937699 -0.05328587 -2.03881729 . -0.25498511 -0.83340761
## [5,] -0.85211352 -0.44139845 0.83319001 . 0.64424103 0.05753529
## ... . . . . . .
## [96,] -0.9203207 -0.7657489 -0.4570184 . 1.362479081 1.477289694
## [97,] 1.8232059 -2.7948398 -1.1517764 . -0.486453047 -0.446331195
## [98,] -0.5262661 -0.3787076 0.5055736 . -0.083344724 0.301502187
## [99,] -0.7922214 0.6587029 0.2637650 . 0.447508686 0.663516120
## [100,] -0.9441080 1.6178980 0.4877481 . 0.255369676 0.003286462
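Once the path and attribute name are known, the same backend can be reopened later without rewriting the data. The sketch below assumes that the TileDBArray() constructor accepts the path together with an attr argument; reopened is an illustrative name.

```r
library(TileDBArray)

# Write a matrix to a known location under a custom attribute name.
set.seed(600)
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")

# A TileDB array is stored as a directory on disk.
dir.exists(path)

# Reopen the existing backend by path, naming the attribute to read.
reopened <- TileDBArray(path, attr="WHEE")

# The reopened array should hold the values that were written.
all.equal(as.matrix(reopened), X)
```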
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.88162226 -0.90262640 -0.65896991 . 1.80207817 0.18554622
## [2,] -0.86176363 -0.89663951 0.36047295 . -1.95566620 1.30855053
## [3,] -0.95955704 -2.33774226 -0.86354288 . 1.57147698 -0.33219284
## [4,] -2.54937699 -0.05328587 -2.03881729 . -0.25498511 -0.83340761
## [5,] -0.85211352 -0.44139845 0.83319001 . 0.64424103 0.05753529
## ... . . . . . .
## [96,] -0.9203207 -0.7657489 -0.4570184 . 1.362479081 1.477289694
## [97,] 1.8232059 -2.7948398 -1.1517764 . -0.486453047 -0.446331195
## [98,] -0.5262661 -0.3787076 0.5055736 . -0.083344724 0.301502187
## [99,] -0.7922214 0.6587029 0.2637650 . 0.447508686 0.663516120
## [100,] -0.9441080 1.6178980 0.4877481 . 0.255369676 0.003286462
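The global setting can be inspected and cleared again once it is no longer needed. The sketch below assumes that getTileDBPath() returns the currently configured path and that calling setTileDBPath(NULL) unsets it, so that later coercions fall back to the default location.

```r
library(TileDBArray)

set.seed(700)
X <- matrix(rnorm(1000), ncol=10)

# Set the global path and confirm that it has been registered.
path2 <- tempfile()
setTileDBPath(path2)
current <- getTileDBPath()
identical(current, path2)

# Coercion now writes the backend to path2.
res <- as(X, "TileDBArray")
dir.exists(path2)

# Unset the global path so that later calls are unaffected.
setTileDBPath(NULL)
```

Unsetting the path after use matters because a second coercion to the same location would otherwise try to create an array where one already exists.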
sessionInfo()
## R Under development (unstable) (2023-10-22 r85388)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.14 TileDBArray_1.13.0 DelayedArray_0.29.0
## [4] SparseArray_1.3.0 S4Arrays_1.3.0 abind_1.4-5
## [7] IRanges_2.37.0 S4Vectors_0.41.0 MatrixGenerics_1.15.0
## [10] matrixStats_1.0.0 BiocGenerics_0.49.0 Matrix_1.6-1.1
## [13] BiocStyle_2.31.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.0.5 jsonlite_1.8.7 compiler_4.4.0
## [4] BiocManager_1.30.22 crayon_1.5.2 Rcpp_1.0.11
## [7] jquerylib_0.1.4 yaml_2.3.7 fastmap_1.1.1
## [10] lattice_0.22-5 R6_2.5.1 RcppCCTZ_0.2.12
## [13] XVector_0.43.0 tiledb_0.21.1 knitr_1.44
## [16] bookdown_0.36 bslib_0.5.1 rlang_1.1.1
## [19] cachem_1.0.8 xfun_0.40 sass_0.4.7
## [22] bit64_4.0.5 cli_3.6.1 zlibbioc_1.49.0
## [25] spdl_0.0.5 digest_0.6.33 grid_4.4.0
## [28] data.table_1.14.8 evaluate_0.22 nanotime_0.3.7
## [31] zoo_1.8-12 rmarkdown_2.25 tools_4.4.0
## [34] htmltools_0.5.6.1