TileDBArray
TileDBArray 1.17.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.3830612 0.1037259 -0.8703496 . 0.20718888 1.85488931
## [2,] 0.6448052 0.2693365 0.9415870 . 0.50540109 1.28134658
## [3,] -1.1530343 -2.4598970 -0.4138338 . 0.06139755 0.66204275
## [4,] 0.4093842 0.6478312 0.5630322 . 0.38691385 -1.14392717
## [5,] 0.3423499 -0.8149258 2.9321774 . -0.16957241 1.33461913
## ... . . . . . .
## [96,] 1.12933468 -1.64648439 -0.79737325 . -0.7680480 0.3903030
## [97,] 0.48694656 0.04830191 1.18663667 . 0.8742334 1.0547970
## [98,] -1.91272755 -0.16301426 1.31123511 . 2.4300094 0.6086523
## [99,] 0.09322939 0.43280564 0.18963329 . -1.7242823 0.8875694
## [100,] 2.59257710 1.10090346 1.32064314 . 1.2142186 0.3192351
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.3830612 0.1037259 -0.8703496 . 0.20718888 1.85488931
## [2,] 0.6448052 0.2693365 0.9415870 . 0.50540109 1.28134658
## [3,] -1.1530343 -2.4598970 -0.4138338 . 0.06139755 0.66204275
## [4,] 0.4093842 0.6478312 0.5630322 . 0.38691385 -1.14392717
## [5,] 0.3423499 -0.8149258 2.9321774 . -0.16957241 1.33461913
## ... . . . . . .
## [96,] 1.12933468 -1.64648439 -0.79737325 . -0.7680480 0.3903030
## [97,] 0.48694656 0.04830191 1.18663667 . 0.8742334 1.0547970
## [98,] -1.91272755 -0.16301426 1.31123511 . 2.4300094 0.6086523
## [99,] 0.09322939 0.43280564 0.18963329 . -1.7242823 0.8875694
## [100,] 2.59257710 1.10090346 1.32064314 . 1.2142186 0.3192351
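As a quick sanity check, the sketch below writes one matrix through both routes and confirms that realizing either result back into memory recovers the original values. The variable names (`mat`, `via_write`, `via_coerce`) are illustrative only, and `as.matrix()` is used here as one way of pulling the data out of the backend.

```r
library(TileDBArray)

set.seed(100)  # fixed seed so that the example is reproducible
mat <- matrix(rnorm(1000), ncol=10)

# Both calls write 'mat' to a new temporary TileDB array on disk.
via_write <- writeTileDBArray(mat)
via_coerce <- as(mat, "TileDBArray")

# Realize each object back into an ordinary in-memory matrix
# and compare it to the input.
same_write <- isTRUE(all.equal(as.matrix(via_write), mat))
same_coerce <- isTRUE(all.equal(as.matrix(via_coerce), mat))
```

Both `same_write` and `same_coerce` are expected to be `TRUE` if the round trip preserves the data.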
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0.00 0.00
## [997,] 0 0 0 . 0.00 0.51
## [998,] 0 0 0 . -0.84 0.00
## [999,] 0 0 0 . 0.00 0.00
## [1000,] 0 0 0 . 0.00 0.00
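To check that sparsity is carried through to the backend, we can query the result with `is_sparse()`. This is a minimal sketch that assumes `is_sparse()` is available once DelayedArray is attached (as it is when TileDBArray is loaded); the object names are hypothetical and a smaller matrix is used to keep the example quick.

```r
library(TileDBArray)

set.seed(101)
sp <- Matrix::rsparsematrix(200, 50, density=0.05)

# Write the sparse matrix, and a dense copy of it for comparison.
sp_tdb <- writeTileDBArray(sp)
dense_tdb <- writeTileDBArray(as.matrix(sp))

# Ask each object whether it is backed by a sparse representation.
sparse_flag <- is_sparse(sp_tdb)
dense_flag <- is_sparse(dense_tdb)

# The stored values should match regardless of the representation.
same_values <- isTRUE(all.equal(as.matrix(sp_tdb), as.matrix(sp),
    check.attributes=FALSE))
```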
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE TRUE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
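The example above only shows the logical case. An integer matrix can be written in the same way, sketched below with simulated counts; the names `counts` and `counts_tdb` are illustrative, and `type()` is used on the assumption that it reports the storage type shown in the object header.

```r
library(TileDBArray)

set.seed(102)
# rpois() returns integers, so 'counts' is an integer matrix.
counts <- matrix(rpois(1000, lambda=5), ncol=10)

counts_tdb <- writeTileDBArray(counts)

# Query the type reported by the TileDB-backed object.
stored_type <- type(counts_tdb)

# Realize the data and compare it to the original counts.
same_counts <- isTRUE(all.equal(as.matrix(counts_tdb), counts))
```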
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.3830612 0.1037259 -0.8703496 . 0.20718888 1.85488931
## GENE_2 0.6448052 0.2693365 0.9415870 . 0.50540109 1.28134658
## GENE_3 -1.1530343 -2.4598970 -0.4138338 . 0.06139755 0.66204275
## GENE_4 0.4093842 0.6478312 0.5630322 . 0.38691385 -1.14392717
## GENE_5 0.3423499 -0.8149258 2.9321774 . -0.16957241 1.33461913
## ... . . . . . .
## GENE_96 1.12933468 -1.64648439 -0.79737325 . -0.7680480 0.3903030
## GENE_97 0.48694656 0.04830191 1.18663667 . 0.8742334 1.0547970
## GENE_98 -1.91272755 -0.16301426 1.31123511 . 2.4300094 0.6086523
## GENE_99 0.09322939 0.43280564 0.18963329 . -1.7242823 0.8875694
## GENE_100 2.59257710 1.10090346 1.32064314 . 1.2142186 0.3192351
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -0.3830612 0.6448052 -1.1530343 0.4093842 0.3423499 -0.6663089
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -0.3830612 0.1037259 -0.8703496 -0.1040557 -0.4630479
## GENE_2 0.6448052 0.2693365 0.9415870 0.9353103 -0.3847852
## GENE_3 -1.1530343 -2.4598970 -0.4138338 0.1395017 -0.1422875
## GENE_4 0.4093842 0.6478312 0.5630322 1.9845851 0.9391121
## GENE_5 0.3423499 -0.8149258 2.9321774 0.3350165 0.7586612
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.7661223 0.2074517 -1.7406992 . 0.4143778 3.7097786
## GENE_2 1.2896104 0.5386730 1.8831741 . 1.0108022 2.5626932
## GENE_3 -2.3060687 -4.9197941 -0.8276676 . 0.1227951 1.3240855
## GENE_4 0.8187684 1.2956623 1.1260643 . 0.7738277 -2.2878543
## GENE_5 0.6846999 -1.6298516 5.8643548 . -0.3391448 2.6692383
## ... . . . . . .
## GENE_96 2.25866937 -3.29296878 -1.59474650 . -1.5360961 0.7806059
## GENE_97 0.97389312 0.09660382 2.37327334 . 1.7484667 2.1095940
## GENE_98 -3.82545509 -0.32602851 2.62247021 . 4.8600188 1.2173046
## GENE_99 0.18645878 0.86561128 0.37926659 . -3.4485647 1.7751387
## GENE_100 5.18515420 2.20180693 2.64128627 . 2.4284371 0.6384703
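To make the delayed behaviour concrete, the sketch below chains arithmetic and subsetting, then realizes the result with `as.matrix()` and compares it to the same operations applied to the in-memory matrix. It also checks that the values stored in the backend are unchanged. All object names are hypothetical.

```r
library(TileDBArray)

set.seed(103)
mat <- matrix(rnorm(1000), ncol=10)
tdb <- as(mat, "TileDBArray")

# Neither operation touches the on-disk data; both are recorded
# and only applied when values are requested.
delayed <- (tdb * 2)[1:5, 1:5]

# Realization triggers the actual read and computation.
realized <- as.matrix(delayed)
matches <- isTRUE(all.equal(realized, mat[1:5, 1:5] * 2))

# The original TileDB-backed object still holds the unscaled values.
unchanged <- isTRUE(all.equal(as.matrix(tdb), mat))
```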
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6
## -2.9702210 -5.7175802 11.4724860 -10.2794058 -13.3048561 5.3574747
## SAMP_7 SAMP_8 SAMP_9 SAMP_10
## -8.4031568 -1.9966348 -0.8672903 11.4093050
out %*% runif(ncol(out))
## [,1]
## GENE_1 0.19768143
## GENE_2 3.34076307
## GENE_3 -2.97828594
## GENE_4 2.12976192
## GENE_5 0.81122634
## GENE_6 1.88948703
## GENE_7 -4.60702143
## GENE_8 -1.82891601
## GENE_9 -0.09215881
## GENE_10 -0.16963827
## GENE_11 -2.35754670
## GENE_12 0.06848792
## GENE_13 1.09511130
## GENE_14 -2.45551494
## GENE_15 -1.91674207
## GENE_16 -5.25008091
## GENE_17 -2.11089028
## GENE_18 -0.92646022
## GENE_19 2.30160581
## GENE_20 -0.99876578
## GENE_21 0.75169328
## GENE_22 -1.85600022
## GENE_23 2.11002234
## GENE_24 0.61799082
## GENE_25 0.39838751
## GENE_26 -1.53161747
## GENE_27 -0.91654165
## GENE_28 0.45820462
## GENE_29 -1.44233901
## GENE_30 1.86108460
## GENE_31 -1.57393763
## GENE_32 1.75559422
## GENE_33 1.07054196
## GENE_34 -0.85317301
## GENE_35 -1.48176534
## GENE_36 -4.28542497
## GENE_37 -2.62748591
## GENE_38 -2.07987713
## GENE_39 -0.93432776
## GENE_40 -1.41580255
## GENE_41 -5.12123150
## GENE_42 0.63296012
## GENE_43 2.19695273
## GENE_44 1.80709428
## GENE_45 2.78889351
## GENE_46 3.89647053
## GENE_47 0.28389213
## GENE_48 -0.07063477
## GENE_49 -1.84993225
## GENE_50 0.89817212
## GENE_51 0.22219001
## GENE_52 1.84264525
## GENE_53 3.27821050
## GENE_54 1.75031187
## GENE_55 0.08548790
## GENE_56 0.83569108
## GENE_57 -2.84253010
## GENE_58 4.70991318
## GENE_59 -0.36152261
## GENE_60 -1.43660071
## GENE_61 0.77535176
## GENE_62 1.80184953
## GENE_63 1.98537947
## GENE_64 -0.17529008
## GENE_65 -2.17761349
## GENE_66 0.67573345
## GENE_67 1.93523474
## GENE_68 2.25228632
## GENE_69 -2.35312063
## GENE_70 -2.94300391
## GENE_71 -0.45775410
## GENE_72 -3.23655437
## GENE_73 -2.18186301
## GENE_74 -1.16722517
## GENE_75 -0.80947856
## GENE_76 0.26503120
## GENE_77 -1.39578147
## GENE_78 0.48776524
## GENE_79 1.37167190
## GENE_80 0.63908077
## GENE_81 0.95163717
## GENE_82 -1.76042320
## GENE_83 2.26118416
## GENE_84 1.71093080
## GENE_85 -0.23815439
## GENE_86 2.39223907
## GENE_87 -1.23584734
## GENE_88 1.76202124
## GENE_89 0.51452674
## GENE_90 0.49701008
## GENE_91 -1.48402590
## GENE_92 1.35434292
## GENE_93 -1.59488959
## GENE_94 -1.15696397
## GENE_95 1.24030831
## GENE_96 -0.59954803
## GENE_97 1.78177062
## GENE_98 -2.10591071
## GENE_99 2.11810230
## GENE_100 2.48014184
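A simple way to gain confidence in these operations is to compare them against the same calculations on the ordinary matrix. The sketch below does this for `colSums()` and `%*%`, using illustrative names and `all.equal()` to allow for small floating-point differences.

```r
library(TileDBArray)

set.seed(104)
mat <- matrix(rnorm(1000), ncol=10)
tdb <- as(mat, "TileDBArray")
weights <- runif(ncol(mat))

# Column sums computed from the backend versus in memory.
sums_ok <- isTRUE(all.equal(colSums(tdb), colSums(mat),
    check.attributes=FALSE))

# Matrix-vector product computed from the backend versus in memory.
prod_tdb <- as.matrix(tdb %*% weights)
prod_ok <- isTRUE(all.equal(as.numeric(prod_tdb),
    as.numeric(mat %*% weights)))
```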
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray().
For example, the code below allows us to control the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.1669316 -0.2794469 0.1206962 . 1.5444354 0.9652852
## [2,] 0.1260654 -0.1108322 -1.0046356 . 0.6427136 1.8628159
## [3,] -0.6511707 0.7764721 -0.6162857 . 1.5493933 0.3559367
## [4,] 0.4499642 -2.0527499 -0.8283141 . 0.6960138 -0.3011470
## [5,] 0.1216491 0.1577662 -1.5306938 . -0.5145484 -0.6455439
## ... . . . . . .
## [96,] 1.33241079 -0.24294833 0.03742077 . -0.2732691 -0.2042076
## [97,] -0.84281094 0.86089497 0.34069366 . -0.3793389 -0.6014070
## [98,] -0.46095668 -2.01364243 -0.87938008 . -1.0957320 -0.9772675
## [99,] -0.48474229 0.16430845 -1.37426529 . 1.6063097 -1.2423821
## [100,] -2.15818810 -0.63154828 0.72003625 . 1.3932073 0.1396113
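Knowing the path and attribute name is useful because the array persists on disk and can be reopened later. The sketch below assumes that the `TileDBArray()` constructor accepts a path and an `attr=` argument naming the attribute to read; the names `store` and `reopened` are illustrative.

```r
library(TileDBArray)

set.seed(105)
mat <- matrix(rnorm(1000), ncol=10)

# Write to a location and attribute name of our choosing.
store <- tempfile()
written <- writeTileDBArray(mat, path=store, attr="WHEE")

# A TileDB array is stored as a directory at the requested path.
on_disk <- dir.exists(store)

# Reopen the same array from its path, naming the attribute to read.
reopened <- TileDBArray(store, attr="WHEE")
same_data <- isTRUE(all.equal(as.matrix(reopened), mat))
```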
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.1669316 -0.2794469 0.1206962 . 1.5444354 0.9652852
## [2,] 0.1260654 -0.1108322 -1.0046356 . 0.6427136 1.8628159
## [3,] -0.6511707 0.7764721 -0.6162857 . 1.5493933 0.3559367
## [4,] 0.4499642 -2.0527499 -0.8283141 . 0.6960138 -0.3011470
## [5,] 0.1216491 0.1577662 -1.5306938 . -0.5145484 -0.6455439
## ... . . . . . .
## [96,] 1.33241079 -0.24294833 0.03742077 . -0.2732691 -0.2042076
## [97,] -0.84281094 0.86089497 0.34069366 . -0.3793389 -0.6014070
## [98,] -0.46095668 -2.01364243 -0.87938008 . -1.0957320 -0.9772675
## [99,] -0.48474229 0.16430845 -1.37426529 . 1.6063097 -1.2423821
## [100,] -2.15818810 -0.63154828 0.72003625 . 1.3932073 0.1396113
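Once set, a global path stays in effect for later coercions, so it is usually worth unsetting it after use. The sketch below assumes that calling `setTileDBPath(NULL)` restores the default behaviour of writing to a fresh temporary location; `target` and the other names are hypothetical.

```r
library(TileDBArray)

set.seed(106)
mat <- matrix(rnorm(1000), ncol=10)

# Direct the next coercion to a known location.
target <- tempfile()
setTileDBPath(target)
coerced <- as(mat, "TileDBArray")

# Unset the global path so that later writes do not reuse 'target'.
setTileDBPath(NULL)

# The backend should now exist at 'target' and be readable from there.
wrote_to_target <- dir.exists(target)
reopened <- TileDBArray(target)
same_data <- isTRUE(all.equal(as.matrix(reopened), mat))

# With the path unset, a second coercion should succeed
# because it does not try to write to 'target' again.
again <- as(mat, "TileDBArray")
```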
sessionInfo()
## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.18 TileDBArray_1.17.0 DelayedArray_0.33.0
## [4] SparseArray_1.7.0 S4Arrays_1.7.0 IRanges_2.41.0
## [7] abind_1.4-8 S4Vectors_0.45.0 MatrixGenerics_1.19.0
## [10] matrixStats_1.4.1 BiocGenerics_0.53.0 Matrix_1.7-1
## [13] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.5.0 jsonlite_1.8.9 compiler_4.5.0
## [4] BiocManager_1.30.25 crayon_1.5.3 Rcpp_1.0.13
## [7] nanoarrow_0.6.0 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-6 R6_2.5.1
## [13] RcppCCTZ_0.2.12 XVector_0.47.0 tiledb_0.30.2
## [16] knitr_1.48 bookdown_0.41 bslib_0.8.0
## [19] rlang_1.1.4 cachem_1.1.0 xfun_0.48
## [22] sass_0.4.9 bit64_4.5.2 cli_3.6.3
## [25] zlibbioc_1.53.0 spdl_0.0.5 digest_0.6.37
## [28] grid_4.5.0 lifecycle_1.0.4 data.table_1.16.2
## [31] evaluate_1.0.1 nanotime_0.3.10 zoo_1.8-12
## [34] rmarkdown_2.28 tools_4.5.0 htmltools_0.5.8.1