1 Installation

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("SingleCellMultiModal")

1.1 Load

library(SingleCellMultiModal)
library(MultiAssayExperiment)

2 G&T-seq: parallel sequencing data of single-cell genomes and transcriptomes

G&T-seq combines Picoplex-amplified gDNA sequencing (genome) and SMARTSeq2-amplified cDNA sequencing (transcriptome) of the same single cell. For more information, see Macaulay et al. (2015).

2.1 Downloading datasets

To list the available datasets without downloading them, call GTseq with dry.run = TRUE; mode = "*" selects all available modes:

GTseq("mouse_embryo_8_cell", mode = "*", dry.run = TRUE)
##    ah_id           mode file_size           rdataclass rdatadateadded
## 1 EH5431        genomic      0 Mb     RaggedExperiment     2021-03-24
## 2 EH5433 transcriptomic    2.3 Mb SingleCellExperiment     2021-03-24
##   rdatadateremoved
## 1             <NA>
## 2             <NA>

Because the arguments used in that call are the defaults, the same listing is returned by simply running:

GTseq()
##    ah_id           mode file_size           rdataclass rdatadateadded
## 1 EH5431        genomic      0 Mb     RaggedExperiment     2021-03-24
## 2 EH5433 transcriptomic    2.3 Mb SingleCellExperiment     2021-03-24
##   rdatadateremoved
## 1             <NA>
## 2             <NA>
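
The mode argument controls which modalities are listed or downloaded. The sketch below assumes, without confirmation from this vignette, that mode values are matched as shell-style wildcard patterns, as the "*" value suggests. It uses base R's utils::glob2rx() to show which of the two modes listed above a given pattern would select. The helper match_modes is hypothetical and is not part of SingleCellMultiModal.

```r
# Hypothetical helper (not part of SingleCellMultiModal): return the mode
# names matched by a wildcard pattern, ASSUMING glob-style matching of `mode`.
match_modes <- function(pattern, modes = c("genomic", "transcriptomic")) {
  # glob2rx() turns a wildcard such as "trans*" into the regex "^trans"
  modes[grepl(utils::glob2rx(pattern), modes)]
}

match_modes("*")        # both modes
match_modes("trans*")   # "transcriptomic" only
match_modes("genomic")  # exact name: "genomic" only
```

If that assumption holds, a call such as GTseq(mode = "transcriptomic", dry.run = TRUE) should list only the transcriptomic resource.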

2.2 Obtaining the data

To download the datasets and assemble them into a MultiAssayExperiment object, set dry.run = FALSE:

gts <- GTseq(dry.run = FALSE)
## Warning: sampleMap[['assay']] coerced with as.factor()
gts
## A MultiAssayExperiment object of 2 listed
##  experiments with user-defined names and respective classes.
##  Containing an ExperimentList class object of length 2:
##  [1] genomic: RaggedExperiment with 2366 rows and 112 columns
##  [2] transcriptomic: SingleCellExperiment with 24029 rows and 112 columns
## Functionality:
##  experiments() - obtain the ExperimentList instance
##  colData() - the primary/phenotype DataFrame
##  sampleMap() - the sample coordination DataFrame
##  `$`, `[`, `[[` - extract colData columns, subset, or experiment
##  *Format() - convert into a long or wide DataFrame
##  assays() - convert ExperimentList to a SimpleList of matrices
##  exportClass() - save data to flat files

2.3 Exploring the data structure

Check the available metadata for each of the 112 mouse embryo cells assayed by G&T-seq:

colData(gts)
## DataFrame with 112 rows and 3 columns
##         Characteristics.organism. Characteristics.sex.
##                       <character>          <character>
## cell1                Mus musculus               female
## cell2                Mus musculus               female
## cell3                Mus musculus                 male
## cell4                Mus musculus                 male
## cell5                Mus musculus               female
## ...                           ...                  ...
## cell108              Mus musculus               female
## cell109              Mus musculus                 male
## cell110              Mus musculus                 male
## cell111              Mus musculus               female
## cell112              Mus musculus               female
##         Characteristics.cell.type.
##                        <character>
## cell1       8_cell_stage_single_..
## cell2       8_cell_stage_single_..
## cell3       8_cell_stage_single_..
## cell4       8_cell_stage_single_..
## cell5       8_cell_stage_single_..
## ...                            ...
## cell108     8_cell_stage_single_..
## cell109     8_cell_stage_single_..
## cell110     8_cell_stage_single_..
## cell111     8_cell_stage_single_..
## cell112     8_cell_stage_single_..
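
colData columns can be tabulated or used to select cells. So that it runs without downloading anything, the sketch below copies the sex annotations of the ten cells printed above into a plain named vector. With the full object, gts$Characteristics.sex. would take the place of sex, and gts[, gts$Characteristics.sex. == "male"] would keep only the male cells across both assays.

```r
# Sex annotations of the ten cells printed above (cell1-cell5, cell108-cell112)
sex <- c(cell1 = "female", cell2 = "female", cell3 = "male", cell4 = "male",
         cell5 = "female", cell108 = "female", cell109 = "male",
         cell110 = "male", cell111 = "female", cell112 = "female")

# Number of cells per sex among these ten
sex_counts <- table(sex)
print(sex_counts)

# Identifiers of the male cells
male_cells <- names(sex)[sex == "male"]
print(male_cells)
```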

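Take a peek at the sampleMap, which links each primary cell identifier to its column (sequencing run accession) in each assay. It has one row per cell and assay, so 112 × 2 = 224 rows: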

sampleMap(gts)
## DataFrame with 224 rows and 3 columns
##              assay     primary     colname
##           <factor> <character> <character>
## 1   transcriptomic       cell1   ERR861694
## 2   transcriptomic       cell2   ERR861750
## 3   transcriptomic       cell3   ERR861695
## 4   transcriptomic       cell4   ERR861751
## 5   transcriptomic       cell5   ERR861696
## ...            ...         ...         ...
## 220        genomic     cell108   ERR863164
## 221        genomic     cell109   ERR863109
## 222        genomic     cell110   ERR863165
## 223        genomic     cell111   ERR863110
## 224        genomic     cell112   ERR863166
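
Because the sampleMap records which primary cell each assay column belongs to, it can be used to relabel assay columns (run accessions) with cell identifiers. The sketch below rebuilds the first four transcriptomic rows printed above as a base R data.frame and translates the four run accessions that label the count columns in the RNA-seq section. The helper runs_to_cells is hypothetical; on the full object, MultiAssayExperiment's mapToList(sampleMap(gts)) splits the map by assay in a similar spirit.

```r
# First four transcriptomic rows of sampleMap(gts), copied from the output above
smap <- data.frame(
  assay   = "transcriptomic",
  primary = c("cell1", "cell2", "cell3", "cell4"),
  colname = c("ERR861694", "ERR861750", "ERR861695", "ERR861751"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map run accessions to primary cell IDs for one assay
runs_to_cells <- function(runs, smap, assay_name) {
  sub <- smap[smap$assay == assay_name, ]
  # match() gives NA for any run that is not in the map
  sub$primary[match(runs, sub$colname)]
}

runs <- c("ERR861694", "ERR861750", "ERR861695", "ERR861751")
runs_to_cells(runs, smap, "transcriptomic")
```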

2.4 Copy numbers

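To access the integer copy numbers detected by single-cell genome sequencing (the genomic assay); NA entries indicate that no copy-number segment was reported for that cell in that region: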

head(assay(gts, "genomic"))[, 1:4]
##                          ERR863111 ERR863834 ERR863112 ERR863835
## chr1:23000001-25500000          NA        NA        NA        NA
## chr4:112000001-114500000        NA        NA        NA        NA
## chr4:145000001-148500000        NA        NA        NA        NA
## chr5:14000001-16500000          NA        NA        NA        NA
## chr15:66500001-69000000         NA        NA        NA        NA
## chrX:21500001-36000000          NA        NA        NA        NA
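
Since many entries are NA, it can help to count how many regions carry a copy-number call in each cell, and how many cells have a call in each region. The sketch below uses a small hypothetical matrix shaped like the output above; its values are invented for illustration and do not come from the dataset. With the real data, cn would be assay(gts, "genomic"), and the sketch assumes NA means "no call".

```r
# Hypothetical copy-number matrix (invented values): rows are regions,
# columns are runs; NA is assumed to mean "no call in this cell"
cn <- matrix(
  c(NA, 1L, NA, 2L,   # regionA
    3L, NA, NA, 3L,   # regionB
    NA, NA, NA, 1L),  # regionC
  nrow = 3, byrow = TRUE,
  dimnames = list(c("regionA", "regionB", "regionC"),
                  c("run1", "run2", "run3", "run4"))
)

# Number of regions with a call in each cell
calls_per_cell <- colSums(!is.na(cn))
# Number of cells with a call in each region
calls_per_region <- rowSums(!is.na(cn))

calls_per_cell
calls_per_region
```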

2.5 RNA-seq

To access the raw read counts quantified by single-cell RNA sequencing (the transcriptomic assay):

head(assay(gts, "transcriptomic"))[, 1:4]
##                    ERR861694 ERR861750 ERR861695 ERR861751
## ENSMUSG00000000001         4         7        30        32
## ENSMUSG00000000003         0         0         0         0
## ENSMUSG00000000028        11        17        79        94
## ENSMUSG00000000031         0         0         0         0
## ENSMUSG00000000037         0         0         1         0
## ENSMUSG00000000049         0         0         0         0
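
Simple per-cell summaries, such as total counts and the number of detected genes, can be computed with colSums(). So that it runs offline, the sketch below copies the six genes and four cells printed above into a base R matrix. On the full dataset the matrix would be assay(gts, "transcriptomic"); totals over only six genes are, of course, not meaningful library sizes.

```r
# The 6 x 4 block of raw counts printed above
counts <- matrix(
  c( 4,  7, 30, 32,
     0,  0,  0,  0,
    11, 17, 79, 94,
     0,  0,  0,  0,
     0,  0,  1,  0,
     0,  0,  0,  0),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("ENSMUSG00000000001", "ENSMUSG00000000003", "ENSMUSG00000000028",
      "ENSMUSG00000000031", "ENSMUSG00000000037", "ENSMUSG00000000049"),
    c("ERR861694", "ERR861750", "ERR861695", "ERR861751")
  )
)

# Total reads per cell over these six genes
lib_size <- colSums(counts)
# Number of genes with at least one read in each cell
detected <- colSums(counts > 0)

lib_size
detected
```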

For protocol information, see Macaulay et al. (2016).

3 sessionInfo

sessionInfo()
## R version 4.4.0 RC (2024-04-16 r86468)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] RaggedExperiment_1.29.0     SingleCellExperiment_1.27.2
##  [3] SingleCellMultiModal_1.17.3 MultiAssayExperiment_1.31.3
##  [5] SummarizedExperiment_1.35.0 Biobase_2.65.0             
##  [7] GenomicRanges_1.57.1        GenomeInfoDb_1.41.1        
##  [9] IRanges_2.39.0              S4Vectors_0.43.0           
## [11] BiocGenerics_0.51.0         MatrixGenerics_1.17.0      
## [13] matrixStats_1.3.0           BiocStyle_2.33.1           
## 
## loaded via a namespace (and not attached):
##  [1] KEGGREST_1.45.1          rjson_0.2.21             xfun_0.45               
##  [4] bslib_0.7.0              lattice_0.22-6           vctrs_0.6.5             
##  [7] tools_4.4.0              generics_0.1.3           curl_5.2.1              
## [10] AnnotationDbi_1.67.0     tibble_3.2.1             fansi_1.0.6             
## [13] RSQLite_2.3.7            blob_1.2.4               BiocBaseUtils_1.7.0     
## [16] pkgconfig_2.0.3          Matrix_1.7-0             dbplyr_2.5.0            
## [19] lifecycle_1.0.4          GenomeInfoDbData_1.2.12  compiler_4.4.0          
## [22] Biostrings_2.73.1        htmltools_0.5.8.1        sass_0.4.9              
## [25] yaml_2.3.8               pillar_1.9.0             crayon_1.5.2            
## [28] jquerylib_0.1.4          DelayedArray_0.31.2      cachem_1.1.0            
## [31] magick_2.8.3             abind_1.4-5              mime_0.12               
## [34] ExperimentHub_2.13.0     AnnotationHub_3.13.0     tidyselect_1.2.1        
## [37] digest_0.6.35            purrr_1.0.2              dplyr_1.1.4             
## [40] bookdown_0.39            BiocVersion_3.20.0       fastmap_1.2.0           
## [43] grid_4.4.0               cli_3.6.2                SparseArray_1.5.8       
## [46] magrittr_2.0.3           S4Arrays_1.5.1           utf8_1.2.4              
## [49] withr_3.0.0              rappdirs_0.3.3           filelock_1.0.3          
## [52] UCSC.utils_1.1.0         bit64_4.0.5              rmarkdown_2.27          
## [55] XVector_0.45.0           httr_1.4.7               bit_4.0.5               
## [58] png_0.1-8                SpatialExperiment_1.15.0 memoise_2.0.1           
## [61] evaluate_0.24.0          knitr_1.47               BiocFileCache_2.13.0    
## [64] rlang_1.1.4              Rcpp_1.0.12              glue_1.7.0              
## [67] DBI_1.2.3                formatR_1.14             BiocManager_1.30.23     
## [70] jsonlite_1.8.8           R6_2.5.1                 zlibbioc_1.51.1

References

Macaulay, Iain C, Wilfried Haerty, Parveen Kumar, Yang I Li, Tim Xiaoming Hu, Mabel J Teng, Mubeen Goolam, et al. 2015. “G&T-seq: Parallel Sequencing of Single-Cell Genomes and Transcriptomes.” Nat. Methods 12 (6): 519–22.

Macaulay, Iain C, Mabel J Teng, Wilfried Haerty, Parveen Kumar, Chris P Ponting, and Thierry Voet. 2016. “Separation and Parallel Sequencing of the Genomes and Transcriptomes of Single Cells Using G&T-seq.” Nat. Protoc. 11 (11): 2081–2103.