1 Load Tabula Muris data

The TabulaMurisData data package provides access to the 10x and SmartSeq2 single-cell RNA-seq data sets from the Tabula Muris Consortium. To list the contents of the package, query ExperimentHub for the package name.

suppressPackageStartupMessages({
    library(ExperimentHub)
    library(SingleCellExperiment)
    library(TabulaMurisData)
})

eh <- ExperimentHub()
query(eh, "TabulaMurisData")
#> ExperimentHub with 2 records
#> # snapshotDate(): 2024-04-29
#> # $dataprovider: Tabula Muris Consortium
#> # $species: Mus musculus
#> # $rdataclass: SingleCellExperiment
#> # additional mcols(): taxonomyid, genome, description,
#> #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#> #   rdatapath, sourceurl, sourcetype 
#> # retrieve records with, e.g., 'object[["EH1617"]]' 
#> 
#>            title               
#>   EH1617 | TabulaMurisDroplet  
#>   EH1618 | TabulaMurisSmartSeq2

The individual data sets can be accessed either by their ExperimentHub accession number or through the convenience functions provided in this package. For example, for the 10x data:

droplet <- eh[["EH1617"]]
#> see ?TabulaMurisData and browseVignettes('TabulaMurisData') for documentation
#> loading from cache
droplet
#> class: SingleCellExperiment 
#> dim: 23341 70118 
#> metadata(0):
#> assays(1): counts
#> rownames(23341): 0610005C13Rik 0610007C21Rik ... Zzef1 Zzz3
#> rowData names(2): ID Symbol
#> colnames(70118): 10X_P4_0_AAACCTGAGATTACCC 10X_P4_0_AAACCTGAGTGCCAGA
#>   ... 10X_P8_15_TTTGTCATCTTACCGC 10X_P8_15_TTTGTCATCTTGTTTG
#> colData names(10): cell channel ... cell_ontology_id free_annotation
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
droplet <- TabulaMurisDroplet()
#> see ?TabulaMurisData and browseVignettes('TabulaMurisData') for documentation
#> loading from cache
droplet
#> class: SingleCellExperiment 
#> dim: 23341 70118 
#> metadata(0):
#> assays(1): counts
#> rownames(23341): 0610005C13Rik 0610007C21Rik ... Zzef1 Zzz3
#> rowData names(2): ID Symbol
#> colnames(70118): 10X_P4_0_AAACCTGAGATTACCC 10X_P4_0_AAACCTGAGTGCCAGA
#>   ... 10X_P8_15_TTTGTCATCTTACCGC 10X_P8_15_TTTGTCATCTTGTTTG
#> colData names(10): cell channel ... cell_ontology_id free_annotation
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):

2 Explore data with iSEE

Each data set is provided in the form of a SingleCellExperiment object. To gain further insight into the contents of the data sets, they can be explored using, e.g., the iSEE package. For the purposes of this vignette, we first draw a random sample of 250 cells from the 10x data set to reduce the run time.

set.seed(1234)
se <- droplet[, sample(seq_len(ncol(droplet)), 250, replace = FALSE)]
se
#> class: SingleCellExperiment 
#> dim: 23341 250 
#> metadata(0):
#> assays(1): counts
#> rownames(23341): 0610005C13Rik 0610007C21Rik ... Zzef1 Zzz3
#> rowData names(2): ID Symbol
#> colnames(250): 10X_P8_12_ACGGGCTGTCAGAGGT 10X_P7_10_CGTCCATGTTATGCGT
#>   ... 10X_P7_9_TGACAACGTGTAAGTA 10X_P8_14_GATCTAGCACGGCCAT
#> colData names(10): cell channel ... cell_ontology_id free_annotation
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
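The subsampling step above relies on two things: sample() with replace = FALSE never draws the same column twice, and set.seed() makes the draw reproducible. The following is a minimal base R sketch of that behaviour. It uses a small simulated count matrix (the names counts, idx and sub are hypothetical stand-ins, not part of the package) so that it runs without downloading the data.

```r
# Toy stand-in for the count matrix: 5 genes x 1000 cells (simulated data).
set.seed(1234)
counts <- matrix(
    rpois(5 * 1000, lambda = 2), nrow = 5,
    dimnames = list(paste0("gene", 1:5), paste0("cell", 1:1000))
)

# Draw 250 distinct column indices, mirroring the call used for `droplet`.
set.seed(1234)
idx <- sample(seq_len(ncol(counts)), 250, replace = FALSE)
sub <- counts[, idx]

dim(sub)            # 5 rows (genes), 250 columns (cells)
anyDuplicated(idx)  # 0, i.e. no cell was drawn twice

# Resetting the seed reproduces exactly the same subset.
set.seed(1234)
idx2 <- sample(seq_len(ncol(counts)), 250, replace = FALSE)
identical(idx, idx2)
```

With the real data, the same indexing applies to the SingleCellExperiment object directly, because column subsetting keeps the assays and the per-cell annotation in sync.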

Next, we calculate size factors with scran, normalize the data with scater, and perform dimensionality reduction using PCA and t-SNE.

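# Estimate per-cell size factors by pooling counts across cells
se <- scran::computeSumFactors(se)
# Compute log-transformed normalized counts (stored in the "logcounts" assay)
se <- scater::logNormCounts(se)
# Reduce dimensionality; the results are stored in reducedDims(se)
se <- scater::runPCA(se)
se <- scater::runTSNE(se)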
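To make the normalization step more concrete, here is a minimal base R sketch of the arithmetic, under two stated assumptions: it uses simple library-size factors rather than the pooling-based estimates from scran::computeSumFactors(), and it applies a log2 transformation with a pseudocount of 1, which to our understanding matches the defaults of logNormCounts(). The toy matrix and all variable names are hypothetical.

```r
# Toy count matrix: 4 genes x 3 cells (made-up values).
counts <- matrix(
    c(10, 0, 5, 5,
      20, 2, 8, 10,
       1, 0, 0, 3),
    nrow = 4,
    dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)

# Library-size factors: total counts per cell, scaled to have mean 1.
# This is a simpler stand-in for scran's deconvolution-based size factors.
lib_sizes <- colSums(counts)
size_factors <- lib_sizes / mean(lib_sizes)

# Divide each cell (column) by its size factor, then log2-transform
# with a pseudocount of 1 so that zero counts map to zero.
normalized <- sweep(counts, 2, size_factors, "/")
logcounts <- log2(normalized + 1)

round(size_factors, 3)
round(logcounts, 2)
```

After scaling, every cell has the same total normalized count, so differences in the log-transformed values reflect composition rather than sequencing depth.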

Finally, we call iSEE with the subsampled SingleCellExperiment object. This opens an iSEE instance containing the provided data set.

if (require(iSEE)) {
    iSEE(se)
}

3 Session info

sessionInfo()
#> R version 4.4.0 RC (2024-04-16 r86468)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] TabulaMurisData_1.23.0      SingleCellExperiment_1.27.0
#>  [3] SummarizedExperiment_1.35.0 Biobase_2.65.0             
#>  [5] GenomicRanges_1.57.0        GenomeInfoDb_1.41.0        
#>  [7] IRanges_2.39.0              S4Vectors_0.43.0           
#>  [9] MatrixGenerics_1.17.0       matrixStats_1.3.0          
#> [11] ExperimentHub_2.13.0        AnnotationHub_3.13.0       
#> [13] BiocFileCache_2.13.0        dbplyr_2.5.0               
#> [15] BiocGenerics_0.51.0         BiocStyle_2.33.0           
#> 
#> loaded via a namespace (and not attached):
#>  [1] DBI_1.2.2                 gridExtra_2.3            
#>  [3] rlang_1.1.3               magrittr_2.0.3           
#>  [5] scater_1.33.0             compiler_4.4.0           
#>  [7] RSQLite_2.3.6             DelayedMatrixStats_1.27.0
#>  [9] png_0.1-8                 vctrs_0.6.5              
#> [11] pkgconfig_2.0.3           crayon_1.5.2             
#> [13] fastmap_1.1.1             XVector_0.45.0           
#> [15] scuttle_1.15.0            utf8_1.2.4               
#> [17] rmarkdown_2.26            UCSC.utils_1.1.0         
#> [19] ggbeeswarm_0.7.2          purrr_1.0.2              
#> [21] bit_4.0.5                 xfun_0.43                
#> [23] bluster_1.15.0            zlibbioc_1.51.0          
#> [25] cachem_1.0.8              beachmat_2.21.0          
#> [27] jsonlite_1.8.8            blob_1.2.4               
#> [29] DelayedArray_0.31.0       BiocParallel_1.39.0      
#> [31] irlba_2.3.5.1             parallel_4.4.0           
#> [33] cluster_2.1.6             R6_2.5.1                 
#> [35] bslib_0.7.0               limma_3.61.0             
#> [37] jquerylib_0.1.4           Rcpp_1.0.12              
#> [39] bookdown_0.39             knitr_1.46               
#> [41] Matrix_1.7-0              igraph_2.0.3             
#> [43] tidyselect_1.2.1          viridis_0.6.5            
#> [45] abind_1.4-5               yaml_2.3.8               
#> [47] codetools_0.2-20          curl_5.2.1               
#> [49] lattice_0.22-6            tibble_3.2.1             
#> [51] withr_3.0.0               KEGGREST_1.45.0          
#> [53] Rtsne_0.17                evaluate_0.23            
#> [55] Biostrings_2.73.0         pillar_1.9.0             
#> [57] BiocManager_1.30.22       filelock_1.0.3           
#> [59] generics_0.1.3            ggplot2_3.5.1            
#> [61] BiocVersion_3.20.0        munsell_0.5.1            
#> [63] scales_1.3.0              sparseMatrixStats_1.17.0 
#> [65] glue_1.7.0                metapod_1.13.0           
#> [67] tools_4.4.0               BiocNeighbors_1.23.0     
#> [69] ScaledMatrix_1.13.0       locfit_1.5-9.9           
#> [71] scran_1.33.0              grid_4.4.0               
#> [73] colorspace_2.1-0          AnnotationDbi_1.67.0     
#> [75] edgeR_4.3.0               GenomeInfoDbData_1.2.12  
#> [77] beeswarm_0.4.0            BiocSingular_1.21.0      
#> [79] vipor_0.4.7               cli_3.6.2                
#> [81] rsvd_1.0.5                rappdirs_0.3.3           
#> [83] fansi_1.0.6               viridisLite_0.4.2        
#> [85] S4Arrays_1.5.0            dplyr_1.1.4              
#> [87] gtable_0.3.5              sass_0.4.9               
#> [89] digest_0.6.35             ggrepel_0.9.5            
#> [91] SparseArray_1.5.0         dqrng_0.3.2              
#> [93] memoise_2.0.1             htmltools_0.5.8.1        
#> [95] lifecycle_1.0.4           httr_1.4.7               
#> [97] statmod_1.5.0             mime_0.12                
#> [99] bit64_4.0.5