1 Installation

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("SingleCellMultiModal")

1.1 Load packages

library(SingleCellMultiModal)
library(MultiAssayExperiment)

2 scNMT: single-cell nucleosome, methylation and transcription sequencing

The dataset was graciously provided by Argelaguet et al. (2019).

Scripts used to process the raw data were written and maintained by Argelaguet and colleagues and reside on GitHub: https://github.com/rargelaguet/scnmt_gastrulation

For more information on the protocol, see Clark et al. (2018).

2.1 Dataset lookup

To list the available datasets without downloading anything, set the dry.run argument to TRUE:

scNMT("mouse_gastrulation", mode = "*", version = "1.0.0", dry.run = TRUE)
##     ah_id         mode file_size rdataclass rdatadateadded rdatadateremoved
## 1  EH3738      acc_cgi      7 Mb     matrix     2020-09-03             <NA>
## 2  EH3739     acc_CTCF    1.2 Mb     matrix     2020-09-03             <NA>
## 3  EH3740      acc_DHS    0.3 Mb     matrix     2020-09-03             <NA>
## 4  EH3741 acc_genebody   49.6 Mb     matrix     2020-09-03             <NA>
## 5  EH3742     acc_p300    0.2 Mb     matrix     2020-09-03             <NA>
## 6  EH3743 acc_promoter   27.2 Mb     matrix     2020-09-03             <NA>
## 7  EH3745      met_cgi    4.6 Mb     matrix     2020-09-03             <NA>
## 8  EH3746     met_CTCF    0.1 Mb     matrix     2020-09-03             <NA>
## 9  EH3747      met_DHS    0.1 Mb     matrix     2020-09-03             <NA>
## 10 EH3748 met_genebody   26.8 Mb     matrix     2020-09-03             <NA>
## 11 EH3749     met_p300    0.1 Mb     matrix     2020-09-03             <NA>
## 12 EH3750 met_promoter   11.5 Mb     matrix     2020-09-03             <NA>
## 13 EH3751          rna   18.6 Mb     matrix     2020-09-03             <NA>

Alternatively, call the scNMT function with its defaults, which returns the same listing:

scNMT("mouse_gastrulation", version = "1.0.0")
##     ah_id         mode file_size rdataclass rdatadateadded rdatadateremoved
## 1  EH3738      acc_cgi      7 Mb     matrix     2020-09-03             <NA>
## 2  EH3739     acc_CTCF    1.2 Mb     matrix     2020-09-03             <NA>
## 3  EH3740      acc_DHS    0.3 Mb     matrix     2020-09-03             <NA>
## 4  EH3741 acc_genebody   49.6 Mb     matrix     2020-09-03             <NA>
## 5  EH3742     acc_p300    0.2 Mb     matrix     2020-09-03             <NA>
## 6  EH3743 acc_promoter   27.2 Mb     matrix     2020-09-03             <NA>
## 7  EH3745      met_cgi    4.6 Mb     matrix     2020-09-03             <NA>
## 8  EH3746     met_CTCF    0.1 Mb     matrix     2020-09-03             <NA>
## 9  EH3747      met_DHS    0.1 Mb     matrix     2020-09-03             <NA>
## 10 EH3748 met_genebody   26.8 Mb     matrix     2020-09-03             <NA>
## 11 EH3749     met_p300    0.1 Mb     matrix     2020-09-03             <NA>
## 12 EH3750 met_promoter   11.5 Mb     matrix     2020-09-03             <NA>
## 13 EH3751          rna   18.6 Mb     matrix     2020-09-03             <NA>

2.2 Data versions

Argelaguet and colleagues later provided a more recent release of the ‘mouse_gastrulation’ dataset. It includes additional cells that did not pass the quality-control criteria applied to the version 1.0.0 dataset.

Use the version argument to indicate the newer dataset version (i.e., 2.0.0):

scNMT("mouse_gastrulation", version = '2.0.0', dry.run = TRUE)
##     ah_id         mode file_size rdataclass rdatadateadded rdatadateremoved
## 1  EH3753      acc_cgi   21.1 Mb     matrix     2020-09-03             <NA>
## 2  EH3754     acc_CTCF    1.2 Mb     matrix     2020-09-03             <NA>
## 3  EH3755      acc_DHS   16.2 Mb     matrix     2020-09-03             <NA>
## 4  EH3756 acc_genebody   60.1 Mb     matrix     2020-09-03             <NA>
## 5  EH3757     acc_p300    0.2 Mb     matrix     2020-09-03             <NA>
## 6  EH3758 acc_promoter   33.8 Mb     matrix     2020-09-03             <NA>
## 7  EH3760      met_cgi   12.1 Mb     matrix     2020-09-03             <NA>
## 8  EH3761     met_CTCF    0.1 Mb     matrix     2020-09-03             <NA>
## 9  EH3762      met_DHS    3.9 Mb     matrix     2020-09-03             <NA>
## 10 EH3763 met_genebody   33.9 Mb     matrix     2020-09-03             <NA>
## 11 EH3764     met_p300    0.1 Mb     matrix     2020-09-03             <NA>
## 12 EH3765 met_promoter   18.7 Mb     matrix     2020-09-03             <NA>
## 13 EH3766          rna   43.5 Mb     matrix     2020-09-03             <NA>
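
Because the dry-run table reports a file size for each resource, it can also be used to estimate how much data each version would download. The sketch below is illustrative only: it re-enters the file_size values printed above by hand instead of calling scNMT, it assumes every size is reported in Mb, and size_in_mb is a hypothetical helper, not part of SingleCellMultiModal.

```r
# Mode names and file sizes copied by hand from the two dry-run tables above.
modes <- c("acc_cgi", "acc_CTCF", "acc_DHS", "acc_genebody", "acc_p300",
    "acc_promoter", "met_cgi", "met_CTCF", "met_DHS", "met_genebody",
    "met_p300", "met_promoter", "rna")
sizes <- data.frame(
    mode = modes,
    v1 = c("7 Mb", "1.2 Mb", "0.3 Mb", "49.6 Mb", "0.2 Mb", "27.2 Mb",
        "4.6 Mb", "0.1 Mb", "0.1 Mb", "26.8 Mb", "0.1 Mb", "11.5 Mb",
        "18.6 Mb"),
    v2 = c("21.1 Mb", "1.2 Mb", "16.2 Mb", "60.1 Mb", "0.2 Mb", "33.8 Mb",
        "12.1 Mb", "0.1 Mb", "3.9 Mb", "33.9 Mb", "0.1 Mb", "18.7 Mb",
        "43.5 Mb"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: strip the " Mb" suffix and convert to numeric.
size_in_mb <- function(x) as.numeric(sub(" Mb$", "", x))

# Approximate total download size (in Mb) for each version.
totals <- c(
    "1.0.0" = sum(size_in_mb(sizes$v1)),
    "2.0.0" = sum(size_in_mb(sizes$v2))
)
totals

# Modes whose reported size differs between the two versions.
changed <- sizes$mode[size_in_mb(sizes$v1) != size_in_mb(sizes$v2)]
changed
```

The reported sizes are rounded, so the totals are only a rough guide to the download volume.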

2.3 Downloading the data

To obtain the data, use the mode argument to select specific datasets with ‘glob’ patterns that match the mode names listed above. For example, to get the ‘genebody’ datasets for all available assays, pass *_genebody to mode. Several patterns can be supplied at once as a character vector, as in the call below.

nmt <- scNMT("mouse_gastrulation", mode = c("*_DHS", "*_cgi", "*_genebody"),
    version = "1.0.0", dry.run = FALSE)
## Warning: sampleMap[['assay']] coerced with as.factor()
nmt
## A MultiAssayExperiment object of 6 listed
##  experiments with user-defined names and respective classes.
##  Containing an ExperimentList class object of length 6:
##  [1] acc_DHS: matrix with 290 rows and 826 columns
##  [2] met_DHS: matrix with 66 rows and 826 columns
##  [3] acc_cgi: matrix with 4459 rows and 826 columns
##  [4] met_cgi: matrix with 5536 rows and 826 columns
##  [5] acc_genebody: matrix with 17139 rows and 826 columns
##  [6] met_genebody: matrix with 15837 rows and 826 columns
## Functionality:
##  experiments() - obtain the ExperimentList instance
##  colData() - the primary/phenotype DataFrame
##  sampleMap() - the sample coordination DataFrame
##  `$`, `[`, `[[` - extract colData columns, subset, or experiment
##  *Format() - convert into a long or wide DataFrame
##  assays() - convert ExperimentList to a SimpleList of matrices
##  exportClass() - save data to flat files
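
To preview which modes a set of glob patterns would select before downloading anything, the matching can be sketched in base R with utils::glob2rx. This is an illustration under assumptions: the mode names are copied by hand from the dry-run listing, and the source does not say how scNMT resolves patterns internally, so this is not necessarily its implementation.

```r
# Mode names copied by hand from the dry-run listing above.
modes <- c("acc_cgi", "acc_CTCF", "acc_DHS", "acc_genebody", "acc_p300",
    "acc_promoter", "met_cgi", "met_CTCF", "met_DHS", "met_genebody",
    "met_p300", "met_promoter", "rna")
patterns <- c("*_DHS", "*_cgi", "*_genebody")

# Translate each glob into an anchored regular expression, e.g. "^.*_DHS$".
regexes <- vapply(patterns, utils::glob2rx, character(1))

# For every pattern, keep the modes it matches (matching is case-sensitive).
matched <- lapply(regexes, grep, x = modes, value = TRUE)
matched

# Flatten to a single vector of selected modes.
selected <- unlist(matched, use.names = FALSE)
selected
```

With these three patterns the sketch selects six modes, the same six experiments that appear in the MultiAssayExperiment printed above.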

2.4 Checking the cell metadata

The colData DataFrame of the MultiAssayExperiment object holds the variables cellID, stage, lineage10x_2, and stage_lineage. To extract this DataFrame, call colData on the MultiAssayExperiment object:

colData(nmt)
## DataFrame with 826 rows and 4 columns
##                            cellID       stage     lineage10x_2
##                       <character> <character>      <character>
## E7.5_Plate1_A3     E7.5_Plate1_A3        E7.5         Endoderm
## E7.5_Plate1_H3     E7.5_Plate1_H3        E7.5         Endoderm
## E7.5_Plate1_D2     E7.5_Plate1_D2        E7.5         Endoderm
## E7.5_Plate1_D7     E7.5_Plate1_D7        E7.5         Endoderm
## E7.5_Plate1_F5     E7.5_Plate1_F5        E7.5         Endoderm
## ...                           ...         ...              ...
## PS_VE_Plate9_C11 PS_VE_Plate9_C11        E6.5         Epiblast
## PS_VE_Plate9_E11 PS_VE_Plate9_E11        E6.5         Epiblast
## PS_VE_Plate9_D11 PS_VE_Plate9_D11        E6.5 Primitive_Streak
## PS_VE_Plate9_A11 PS_VE_Plate9_A11        E6.5 Primitive_Streak
## PS_VE_Plate9_B11 PS_VE_Plate9_B11        E6.5         Mesoderm
##                          stage_lineage
##                            <character>
## E7.5_Plate1_A3           E7.5_Endoderm
## E7.5_Plate1_H3           E7.5_Endoderm
## E7.5_Plate1_D2           E7.5_Endoderm
## E7.5_Plate1_D7           E7.5_Endoderm
## E7.5_Plate1_F5           E7.5_Endoderm
## ...                                ...
## PS_VE_Plate9_C11         E6.5_Epiblast
## PS_VE_Plate9_E11         E6.5_Epiblast
## PS_VE_Plate9_D11 E6.5_Primitive_Streak
## PS_VE_Plate9_A11 E6.5_Primitive_Streak
## PS_VE_Plate9_B11         E6.5_Mesoderm
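
A common next step is to count cells per stage and lineage. The sketch below re-enters only the ten rows displayed above as a base data.frame, so it runs without downloading the data. It is a toy subset, not the full 826-cell table, and the observation that stage_lineage equals stage and lineage10x_2 joined by an underscore is drawn from these displayed rows only.

```r
# The ten rows displayed above, re-entered by hand as a base data.frame.
cells <- data.frame(
    cellID = c("E7.5_Plate1_A3", "E7.5_Plate1_H3", "E7.5_Plate1_D2",
        "E7.5_Plate1_D7", "E7.5_Plate1_F5", "PS_VE_Plate9_C11",
        "PS_VE_Plate9_E11", "PS_VE_Plate9_D11", "PS_VE_Plate9_A11",
        "PS_VE_Plate9_B11"),
    stage = rep(c("E7.5", "E6.5"), each = 5),
    lineage10x_2 = c(rep("Endoderm", 5), "Epiblast", "Epiblast",
        "Primitive_Streak", "Primitive_Streak", "Mesoderm"),
    stringsAsFactors = FALSE
)

# In the displayed rows, stage_lineage is stage and lineage10x_2 joined by "_".
cells$stage_lineage <- paste(cells$stage, cells$lineage10x_2, sep = "_")

# Cross-tabulate cells by stage and lineage.
counts <- table(cells$stage, cells$lineage10x_2)
counts
```

On the full object, the same tabulation could be written as table(nmt$stage, nmt$lineage10x_2), since `$` on a MultiAssayExperiment extracts colData columns.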

2.5 Exploring the data structure

Check row annotations:

rownames(nmt)
## CharacterList of length 6
## [["acc_DHS"]] ESC_DHS_118970 ESC_DHS_118919 ... ESC_DHS_68996 ESC_DHS_109494
## [["met_DHS"]] ESC_DHS_20778 ESC_DHS_14504 ... ESC_DHS_72133 ESC_DHS_72129
## [["acc_cgi"]] CGI_5278 CGI_6058 CGI_10627 ... CGI_7832 CGI_11329 CGI_10964
## [["met_cgi"]] CGI_3481 CGI_8941 CGI_956 CGI_9461 ... CGI_2867 CGI_3499 CGI_365
## [["acc_genebody"]] ENSMUSG00000036181 ENSMUSG00000071862 ... ENSMUSG00000025576
## [["met_genebody"]] ENSMUSG00000059334 ENSMUSG00000024026 ... ENSMUSG00000078302

The sampleMap is a graph representation of the relationships between cells and ‘assay’ datasets:

sampleMap(nmt)
## DataFrame with 4956 rows and 3 columns
##             assay                primary                colname
##          <factor>            <character>            <character>
## 1    met_genebody E4.5-5.5_new_Plate1_.. E4.5-5.5_new_Plate1_..
## 2    met_genebody E4.5-5.5_new_Plate1_.. E4.5-5.5_new_Plate1_..
## 3    met_genebody E4.5-5.5_new_Plate1_.. E4.5-5.5_new_Plate1_..
## 4    met_genebody E4.5-5.5_new_Plate1_.. E4.5-5.5_new_Plate1_..
## 5    met_genebody E4.5-5.5_new_Plate1_.. E4.5-5.5_new_Plate1_..
## ...           ...                    ...                    ...
## 4952      acc_DHS       PS_VE_Plate9_G05       PS_VE_Plate9_G05
## 4953      acc_DHS       PS_VE_Plate9_G08       PS_VE_Plate9_G08
## 4954      acc_DHS       PS_VE_Plate9_G09       PS_VE_Plate9_G09
## 4955      acc_DHS       PS_VE_Plate9_G12       PS_VE_Plate9_G12
## 4956      acc_DHS       PS_VE_Plate9_H08       PS_VE_Plate9_H08
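
The sampleMap has one row per assay and cell pair, which is why 6 assays of 826 cells each give 4956 rows. The toy reconstruction below uses base R and five cell names taken from the output above; toy_map is a hypothetical stand-in for illustration, not the object that sampleMap(nmt) returns.

```r
# Assay names from the MultiAssayExperiment and five cell names from the
# sampleMap output above.
assays <- c("acc_DHS", "met_DHS", "acc_cgi", "met_cgi", "acc_genebody",
    "met_genebody")
cells <- c("PS_VE_Plate9_G05", "PS_VE_Plate9_G08", "PS_VE_Plate9_G09",
    "PS_VE_Plate9_G12", "PS_VE_Plate9_H08")

# One row per assay/cell pair: 'primary' is the colData row name and
# 'colname' is the column name within that assay (identical in this dataset).
toy_map <- data.frame(
    assay = factor(rep(assays, each = length(cells)), levels = assays),
    primary = rep(cells, times = length(assays)),
    colname = rep(cells, times = length(assays)),
    stringsAsFactors = FALSE
)
nrow(toy_map)

# The same arithmetic for the full object: 6 assays x 826 cells.
full_rows <- length(assays) * 826L
full_rows
```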

Take a look at the cell identifiers or barcodes across assays:

colnames(nmt)
## CharacterList of length 6
## [["acc_DHS"]] E4.5-5.5_new_Plate1_A02 ... PS_VE_Plate9_H08
## [["met_DHS"]] E4.5-5.5_new_Plate1_A02 ... PS_VE_Plate9_H08
## [["acc_cgi"]] E4.5-5.5_new_Plate1_A02 ... PS_VE_Plate9_H08
## [["met_cgi"]] E4.5-5.5_new_Plate1_A02 ... PS_VE_Plate9_H08
## [["acc_genebody"]] E4.5-5.5_new_Plate1_A02 ... PS_VE_Plate9_H08
## [["met_genebody"]] E4.5-5.5_new_Plate1_A02 ... PS_VE_Plate9_H08

2.6 Chromatin Accessibility (acc_*)

See the accessibility levels (as proportions) for DNase hypersensitive sites:

head(assay(nmt, "acc_DHS"))[, 1:4]
##                E4.5-5.5_new_Plate1_A02 E4.5-5.5_new_Plate1_A04
## ESC_DHS_118970              0.66666667                      NA
## ESC_DHS_118919              0.76190476                      NA
## ESC_DHS_66330               0.81818182               0.7142857
## ESC_DHS_43318                       NA               0.8000000
## ESC_DHS_6229                0.85714286               0.8000000
## ESC_DHS_9413                0.06666667               0.6800000
##                E4.5-5.5_new_Plate1_A07 E4.5-5.5_new_Plate1_A08
## ESC_DHS_118970                      NA               0.2631579
## ESC_DHS_118919               0.3636364               0.8421053
## ESC_DHS_66330                0.7391304               0.6086957
## ESC_DHS_43318                0.5000000               0.8888889
## ESC_DHS_6229                 0.3333333               0.7142857
## ESC_DHS_9413                 0.2142857               0.5217391
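
The matrix contains many NA entries, so per-site summaries need to handle missing values explicitly. The sketch below re-enters the 6 x 4 excerpt printed above and computes, for each site, the fraction of cells with an observed value and the mean accessibility over those cells. Reading NA as "no usable coverage for that site in that cell" is an assumption; the source does not state it.

```r
# Values copied by hand from the head(assay(nmt, "acc_DHS"))[, 1:4] output.
acc <- matrix(
    c(0.66666667, NA,        NA,        0.2631579,
      0.76190476, NA,        0.3636364, 0.8421053,
      0.81818182, 0.7142857, 0.7391304, 0.6086957,
      NA,         0.8000000, 0.5000000, 0.8888889,
      0.85714286, 0.8000000, 0.3333333, 0.7142857,
      0.06666667, 0.6800000, 0.2142857, 0.5217391),
    nrow = 6, byrow = TRUE,
    dimnames = list(
        c("ESC_DHS_118970", "ESC_DHS_118919", "ESC_DHS_66330",
            "ESC_DHS_43318", "ESC_DHS_6229", "ESC_DHS_9413"),
        c("E4.5-5.5_new_Plate1_A02", "E4.5-5.5_new_Plate1_A04",
            "E4.5-5.5_new_Plate1_A07", "E4.5-5.5_new_Plate1_A08")
    )
)

# Fraction of cells with an observed (non-NA) value for each site.
coverage <- rowMeans(!is.na(acc))

# Mean accessibility per site, ignoring cells with a missing value.
mean_acc <- rowMeans(acc, na.rm = TRUE)

cbind(coverage, mean_acc)
```

The same two calls should apply to the full matrix returned by assay(nmt, "acc_DHS"), and to the met_* assays, which use the same proportion scale.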

2.7 DNA Methylation (met_*)

See the methylation levels (as proportions) for DNase hypersensitive sites:

head(assay(nmt, "met_DHS"))[, 1:4]
##                E4.5-5.5_new_Plate1_A02 E4.5-5.5_new_Plate1_A04
## ESC_DHS_20778                0.8000000                      NA
## ESC_DHS_14504                0.8000000                     0.8
## ESC_DHS_112143                      NA                     0.4
## ESC_DHS_34593                0.6666667                     0.6
## ESC_DHS_20747                0.4000000                     0.6
## ESC_DHS_33671                       NA                     0.6
##                E4.5-5.5_new_Plate1_A07 E4.5-5.5_new_Plate1_A08
## ESC_DHS_20778                0.8571429               0.8000000
## ESC_DHS_14504                0.8000000               0.6000000
## ESC_DHS_112143               0.5714286               0.5000000
## ESC_DHS_34593                0.7142857               0.8000000
## ESC_DHS_20747                       NA               0.6000000
## ESC_DHS_33671                0.8333333               0.6666667

For protocol information, see the references below.

3 sessionInfo

sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] scater_1.30.0               ggplot2_3.4.4              
##  [3] scran_1.30.0                scuttle_1.12.0             
##  [5] rhdf5_2.46.0                HDF5Array_1.30.0           
##  [7] RaggedExperiment_1.26.0     Matrix_1.6-1.1             
##  [9] SingleCellExperiment_1.24.0 SingleCellMultiModal_1.14.0
## [11] MultiAssayExperiment_1.28.0 SummarizedExperiment_1.32.0
## [13] Biobase_2.62.0              GenomicRanges_1.54.0       
## [15] GenomeInfoDb_1.38.0         IRanges_2.36.0             
## [17] S4Vectors_0.40.0            BiocGenerics_0.48.0        
## [19] MatrixGenerics_1.14.0       matrixStats_1.0.0          
## [21] BiocStyle_2.30.0           
## 
## loaded via a namespace (and not attached):
##   [1] jsonlite_1.8.7                magrittr_2.0.3               
##   [3] ggbeeswarm_0.7.2              magick_2.8.1                 
##   [5] farver_2.1.1                  rmarkdown_2.25               
##   [7] zlibbioc_1.48.0               vctrs_0.6.4                  
##   [9] memoise_2.0.1                 DelayedMatrixStats_1.24.0    
##  [11] RCurl_1.98-1.12               htmltools_0.5.6.1            
##  [13] S4Arrays_1.2.0                BiocBaseUtils_1.4.0          
##  [15] AnnotationHub_3.10.0          curl_5.1.0                   
##  [17] BiocNeighbors_1.20.0          Rhdf5lib_1.24.0              
##  [19] SparseArray_1.2.0             sass_0.4.7                   
##  [21] bslib_0.5.1                   plyr_1.8.9                   
##  [23] cachem_1.0.8                  igraph_1.5.1                 
##  [25] mime_0.12                     lifecycle_1.0.3              
##  [27] pkgconfig_2.0.3               rsvd_1.0.5                   
##  [29] R6_2.5.1                      fastmap_1.1.1                
##  [31] GenomeInfoDbData_1.2.11       shiny_1.7.5.1                
##  [33] digest_0.6.33                 colorspace_2.1-0             
##  [35] AnnotationDbi_1.64.0          dqrng_0.3.1                  
##  [37] irlba_2.3.5.1                 ExperimentHub_2.10.0         
##  [39] RSQLite_2.3.1                 beachmat_2.18.0              
##  [41] labeling_0.4.3                filelock_1.0.2               
##  [43] fansi_1.0.5                   httr_1.4.7                   
##  [45] abind_1.4-5                   compiler_4.3.1               
##  [47] bit64_4.0.5                   withr_2.5.1                  
##  [49] BiocParallel_1.36.0           viridis_0.6.4                
##  [51] DBI_1.1.3                     UpSetR_1.4.0                 
##  [53] rappdirs_0.3.3                DelayedArray_0.28.0          
##  [55] rjson_0.2.21                  bluster_1.12.0               
##  [57] tools_4.3.1                   vipor_0.4.5                  
##  [59] beeswarm_0.4.0                interactiveDisplayBase_1.40.0
##  [61] httpuv_1.6.12                 glue_1.6.2                   
##  [63] rhdf5filters_1.14.0           promises_1.2.1               
##  [65] grid_4.3.1                    cluster_2.1.4                
##  [67] generics_0.1.3                gtable_0.3.4                 
##  [69] BiocSingular_1.18.0           ScaledMatrix_1.10.0          
##  [71] metapod_1.10.0                utf8_1.2.4                   
##  [73] XVector_0.42.0                RcppAnnoy_0.0.21             
##  [75] ggrepel_0.9.4                 BiocVersion_3.18.0           
##  [77] pillar_1.9.0                  limma_3.58.0                 
##  [79] later_1.3.1                   dplyr_1.1.3                  
##  [81] BiocFileCache_2.10.0          lattice_0.22-5               
##  [83] bit_4.0.5                     tidyselect_1.2.0             
##  [85] locfit_1.5-9.8                Biostrings_2.70.1            
##  [87] knitr_1.44                    gridExtra_2.3                
##  [89] bookdown_0.36                 edgeR_4.0.0                  
##  [91] xfun_0.40                     statmod_1.5.0                
##  [93] yaml_2.3.7                    evaluate_0.22                
##  [95] codetools_0.2-19              tibble_3.2.1                 
##  [97] BiocManager_1.30.22           cli_3.6.1                    
##  [99] uwot_0.1.16                   xtable_1.8-4                 
## [101] munsell_0.5.0                 jquerylib_0.1.4              
## [103] Rcpp_1.0.11                   dbplyr_2.3.4                 
## [105] png_0.1-8                     parallel_4.3.1               
## [107] ellipsis_0.3.2                blob_1.2.4                   
## [109] sparseMatrixStats_1.14.0      bitops_1.0-7                 
## [111] SpatialExperiment_1.12.0      viridisLite_0.4.2            
## [113] scales_1.2.1                  purrr_1.0.2                  
## [115] crayon_1.5.2                  rlang_1.1.1                  
## [117] cowplot_1.1.1                 KEGGREST_1.42.0              
## [119] formatR_1.14

References

Argelaguet, Ricard, Stephen J Clark, Hisham Mohammed, L Carine Stapel, Christel Krueger, Chantriolnt-Andreas Kapourani, Ivan Imaz-Rosshandler, et al. 2019. “Multi-Omics Profiling of Mouse Gastrulation at Single-Cell Resolution.” Nature 576 (7787): 487–91.

Clark, Stephen J, Ricard Argelaguet, Chantriolnt-Andreas Kapourani, Thomas M Stubbs, Heather J Lee, Celia Alda-Catalinas, Felix Krueger, et al. 2018. “scNMT-seq Enables Joint Profiling of Chromatin Accessibility DNA Methylation and Transcription in Single Cells.” Nat. Commun. 9 (1): 781.