The MouseThymusAgeing package provides convenient access to the single-cell RNA sequencing (scRNA-seq) datasets from Baran-Gale et al. (2020). The study used single-cell transcriptomic profiling to resolve how the epithelial composition of the mouse thymus changes with ageing. The datasets from the paper are provided as count matrices with relevant sample-level and feature-level meta-data. All data are provided post-processing and QC. The raw sequencing data can be directly acquired from ArrayExpress using accessions E-MTAB-8560 and E-MTAB-8737.
The package can be installed from Bioconductor. Bioconductor packages can be accessed using the BiocManager package.
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("MouseThymusAgeing")
To use the package, load it in the typical way.
library(MouseThymusAgeing)
Detailed experimental protocols are available in the manuscript and analytical details are provided in the accompanying GitHub repo.
This data package contains 2 single-cell data sets from the paper. The first details the initial transcriptomic profiling of defined TEC populations using the plate-based SMART-seq2 chemistry. These cells were sorted from mice at 1, 4, 16, 32 and 52 weeks of age using the following flow cytometry phenotypes:
In each case cells were sorted from 5 separate mice at each age into a 384 well plate containing lysis buffer, with cells from different ages and days block sorted into different areas of each plate to minimise the confounding between batch effects, mouse age and sorted subpopulation. The single-cell libraries were prepared according to the SMART-seq2 protocol and sequenced on an Illumina NovaSeq 6000.
The computational processing invovled the following steps:
computeSumFactors()
function from scran (L. Lun, Bach, and Marioni 2016).The second dataset contains cells that were profiling from TEC at 8, 20 and 36 weeks old, derived from a transgenic model system that is also able to lineage trace cells that derive from those that express the thymoproteasomal gene, \(\beta\)-5t. When this gene is expressed it drives the expression of a fluorescent reporter gene, ZsGreen (ZsG). The mouse is denoted \(\mbox{3xtg}^{\beta5t}\). Each mouse (3 replicates per age) first had their transgene induced using doxycycline, and 4 weeks later the TEC were collected by flow cytometry in separate ZsG+ and ZsG- groups. Within each of these groups cells were FAC-sorted into mTEC (Cd45+EpCam+MHCII+Ly51-UEA1+) and cTEC (Cd45+EpCam+Ly51+UEA1+) populations. For this experiment we made us of recent developments in multiplexing with hashtag oligos (HTO; cell-hashing)(Stoeckius et al. 2018). Consequently, the cells were super-loaded onto the 10X Genomics Chromium chips before library prep and sequencing on an Illumina NovaSeq 6000.
The computational processing for these data is different to above. Specifically:
emptyDrops()
from the DropletUtils (participants in the 1st Human Cell Atlas Jamboree et al. 2019).computeSumFactors()
function from scran (L. Lun, Bach, and Marioni 2016), and used for
normalization with a log(X + 1)
transformation.The SMART-seq2 data is stored in subsets according to the sorting day (numbered 1-5). For the droplet data, the data can be accessed according
to the specific multiplexed samples (6 in total). For the SMART-seq2 the exported object SMARTseqMetadata
provides the relevant metadata
information for each sorting day, the equivalent object DropletMetadata
contains the relevant information for each separate sample. Specific
descriptions of each column can be accessed using ?SMARTseqMetadata
and ?DropletMetadata
.
head(SMARTseqMetadata, n = 5)
## sample Age Gender ncells
## 1 day1 1wk female 10
## 2 day1 4wk female 19
## 3 day1 16wk female 10
## 4 day1 32wk female 43
## 5 day1 52wk female 52
All of the data access functions allow you to select the particular samples or sorting days that you would like to access for the relevant data set. By loading only the samples or sorting days that you are interested in for your particular analysis, you will save time when downloading and loading the data, and also reduce memory consumption on your machine.
Droplet single-cell experiments tend to be much larger owing to the ability to encapsulate and process many more cells than in either 96- or 384-well plates. The droplet scRNA-seq made use of hashtag oligonucleotides to multiplex samples, allowing for replicated experimental design without breaking the bank.
head(DropletMetadata, n = 5)
## sample Gender HTO ncells
## 1 ZsG_1stRun1 female Wk1_ZsGp 5160
## 2 ZsG_1stRun1 female Wk4_ZsGp 3902
## 3 ZsG_1stRun1 female Wk16_ZsGp 3798
## 4 ZsG_1stRun2 female Wk1_ZsGp 3360
## 5 ZsG_1stRun2 female Wk4_ZsGp 2355
Package data are provided as SingleCellExperiment
objects, an extension of the Bioconductor SummarizedExperiment
object for high-throughput
omics experiment data. SingleCellExperiment
object uses memory-efficient storage and sparse matrices to store the single-cell experiment data,
whilst allowing the layering of additional feature- and cell-wise meta-data to facilitate single-cell analyses. This section will detail how
to access and interact with these objects from the MouseThymusAgeing
package.
smart.sce <- MouseSMARTseqData(samples="day2")
smart.sce
## class: SingleCellExperiment
## dim: 48801 661
## metadata(0):
## assays(1): counts
## rownames(48801): ERCC-00002 ERCC-00003 ... ENSMUSG00000064371
## ENSMUSG00000064372
## rowData names(6): Geneid Chr ... Strand Length
## colnames(661): B1.B002294.2_32.4.52.1_S109 B10.B002294.2_32.4.52.1_S118
## ... P9.B002284.2_52.1.32.1_S9 P9.B002294.2_32.4.52.1_S153
## colData names(11): CellID ClusterID ... SubType sizeFactor
## reducedDimNames(1): PCA
## mainExpName: NULL
## altExpNames(0):
The gene counts are stored in the assays(sce, "counts")
slot, which can be accessed using the convenience function counts
. The gene counts are
stored in a memory efficient sparse matrix class from the Matrix package.
head(counts(smart.sce)[, 1:10])
## B1.B002294.2_32.4.52.1_S109 B10.B002294.2_32.4.52.1_S118
## ERCC-00002 2071 2739
## ERCC-00003 1 70
## ERCC-00004 179 533
## ERCC-00009 507 0
## ERCC-00012 0 0
## ERCC-00013 0 0
## B11.B002284.2_52.1.32.1_S275 B11.B002294.2_32.4.52.1_S119
## ERCC-00002 2612 1722
## ERCC-00003 47 140
## ERCC-00004 193 210
## ERCC-00009 898 0
## ERCC-00012 0 0
## ERCC-00013 0 0
## B13.B002294.2_32.4.52.1_S121 B14.B002284.2_52.1.32.1_S278
## ERCC-00002 1853 1987
## ERCC-00003 130 111
## ERCC-00004 417 60
## ERCC-00009 487 0
## ERCC-00012 0 0
## ERCC-00013 0 0
## B14.B002294.2_32.4.52.1_S122 B14.B002295.2_1.16.4.1_S206
## ERCC-00002 1788 14714
## ERCC-00003 112 13
## ERCC-00004 183 1251
## ERCC-00009 479 0
## ERCC-00012 0 0
## ERCC-00013 0 0
## B16.B002284.2_52.1.32.1_S280 B17.B002283.2_4.52.16.1_S197
## ERCC-00002 2458 3697
## ERCC-00003 35 150
## ERCC-00004 62 387
## ERCC-00009 236 0
## ERCC-00012 0 0
## ERCC-00013 0 0
The normalisation factors per cell can be accessed using the sizeFactors()
function.
head(sizeFactors((smart.sce)))
## [1] 1.3178659 1.7034129 1.5202621 0.4705083 2.2748982 1.8011869
These are used to normalise the data. To generate single-cell expression values on a log-normal scale, we can apply the logNormCounts
from the
scuttle package. This will add the logcounts
entry to the assays
slot in our object.
library(scuttle)
smart.sce <- logNormCounts(smart.sce)
With these normalised counts we can perform our standard down-stream analytical tasks, such as identifying highly variable genes, projecting
cells into a reduced dimensional space and clustering using a nearest-neighbour graph. You can further inspect the cell-wise meta-data attached
to each dataset, stored in the colData
for each SingleCellExperiment object.
head(colData(smart.sce))
## DataFrame with 6 rows and 11 columns
## CellID ClusterID Position
## <character> <integer> <character>
## B1.B002294.2_32.4.52.1_S109 B1.B002294.2_32.4.52.. 8 B1
## B10.B002294.2_32.4.52.1_S118 B10.B002294.2_32.4.5.. 2 B10
## B11.B002284.2_52.1.32.1_S275 B11.B002284.2_52.1.3.. 2 B11
## B11.B002294.2_32.4.52.1_S119 B11.B002294.2_32.4.5.. 2 B11
## B13.B002294.2_32.4.52.1_S121 B13.B002294.2_32.4.5.. 2 B13
## B14.B002284.2_52.1.32.1_S278 B14.B002284.2_52.1.3.. 2 B14
## PlateID Column Row SortType
## <character> <integer> <character> <character>
## B1.B002294.2_32.4.52.1_S109 B002294 1 B mTEClo
## B10.B002294.2_32.4.52.1_S118 B002294 10 B mTEClo
## B11.B002284.2_52.1.32.1_S275 B002284 11 B mTEClo
## B11.B002294.2_32.4.52.1_S119 B002294 11 B mTEClo
## B13.B002294.2_32.4.52.1_S121 B002294 13 B mTEClo
## B14.B002284.2_52.1.32.1_S278 B002284 14 B mTEClo
## SortDay Age SubType sizeFactor
## <integer> <character> <character> <numeric>
## B1.B002294.2_32.4.52.1_S109 2 32wk nTEC 0.898811
## B10.B002294.2_32.4.52.1_S118 2 4wk Intertypical TEC 1.161762
## B11.B002284.2_52.1.32.1_S275 2 1wk Intertypical TEC 1.036849
## B11.B002294.2_32.4.52.1_S119 2 4wk Intertypical TEC 0.320896
## B13.B002294.2_32.4.52.1_S121 2 4wk Intertypical TEC 1.551526
## B14.B002284.2_52.1.32.1_S278 2 1wk Intertypical TEC 1.228445
Details of what information is stored can be found in the documentation using ?DropletMetadata
and ?SMARTseqMetada
. In each object we also
have the pre-computed reduced dimensions that can be accessed through the reducedDim(<sce>, "PCA")
slot.
sessionInfo()
## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] scuttle_1.17.0 MouseThymusAgeing_1.15.0
## [3] SingleCellExperiment_1.29.0 SummarizedExperiment_1.37.0
## [5] Biobase_2.67.0 GenomicRanges_1.59.0
## [7] GenomeInfoDb_1.43.0 IRanges_2.41.0
## [9] S4Vectors_0.45.0 BiocGenerics_0.53.1
## [11] generics_0.1.3 MatrixGenerics_1.19.0
## [13] matrixStats_1.4.1 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] KEGGREST_1.47.0 xfun_0.49 bslib_0.8.0
## [4] lattice_0.22-6 vctrs_0.6.5 tools_4.5.0
## [7] parallel_4.5.0 curl_5.2.3 AnnotationDbi_1.69.0
## [10] tibble_3.2.1 fansi_1.0.6 RSQLite_2.3.7
## [13] blob_1.2.4 pkgconfig_2.0.3 Matrix_1.7-1
## [16] dbplyr_2.5.0 lifecycle_1.0.4 GenomeInfoDbData_1.2.13
## [19] compiler_4.5.0 Biostrings_2.75.0 codetools_0.2-20
## [22] htmltools_0.5.8.1 sass_0.4.9 yaml_2.3.10
## [25] pillar_1.9.0 crayon_1.5.3 jquerylib_0.1.4
## [28] BiocParallel_1.41.0 DelayedArray_0.33.1 cachem_1.1.0
## [31] abind_1.4-8 mime_0.12 ExperimentHub_2.15.0
## [34] AnnotationHub_3.15.0 tidyselect_1.2.1 digest_0.6.37
## [37] purrr_1.0.2 dplyr_1.1.4 bookdown_0.41
## [40] BiocVersion_3.21.1 fastmap_1.2.0 grid_4.5.0
## [43] cli_3.6.3 SparseArray_1.7.0 magrittr_2.0.3
## [46] S4Arrays_1.7.1 utf8_1.2.4 withr_3.0.2
## [49] rappdirs_0.3.3 filelock_1.0.3 UCSC.utils_1.3.0
## [52] bit64_4.5.2 rmarkdown_2.29 XVector_0.47.0
## [55] httr_1.4.7 bit_4.5.0 png_0.1-8
## [58] beachmat_2.23.0 memoise_2.0.1 evaluate_1.0.1
## [61] knitr_1.48 BiocFileCache_2.15.0 rlang_1.1.4
## [64] Rcpp_1.0.13-1 glue_1.8.0 DBI_1.2.3
## [67] BiocManager_1.30.25 jsonlite_1.8.9 R6_2.5.1
## [70] zlibbioc_1.53.0