iSEE
DuoClustering2018 1.23.0
In this vignette we describe how to generate a SingleCellExperiment object combining observed values and clustering results for a data set from the DuoClustering2018 package, and how the resulting object can be explored and visualized with the iSEE package (Rue-Albrecht et al. 2018).
suppressPackageStartupMessages({
  library(SingleCellExperiment)
  library(DuoClustering2018)
  library(dplyr)
  library(tidyr)
})
The different ways of retrieving a data set from the package are described in the plot_performance vignette. Here, we will load a data set using the shortcut function provided in the package.
dat <- sce_filteredExpr10_Koh()
## see ?DuoClustering2018 and browseVignettes('DuoClustering2018') for documentation
## loading from cache
For this data set, we also load a set of clustering results obtained using different clustering methods.
res <- clustering_summary_filteredExpr10_Koh_v2()
## see ?DuoClustering2018 and browseVignettes('DuoClustering2018') for documentation
## loading from cache
We add the cluster labels from one run, for several imposed numbers of clusters (k = 3, 5, and 9), to the data set.
res <- res %>% dplyr::filter(run == 1 & k %in% c(3, 5, 9)) %>%
  dplyr::group_by(method, k) %>%
  dplyr::filter(is.na(resolution) | resolution == resolution[1]) %>%
  dplyr::ungroup() %>%
  tidyr::unite(col = method_k, method, k, sep = "_", remove = TRUE) %>%
  dplyr::select(cell, method_k, cluster) %>%
  tidyr::spread(key = method_k, value = cluster)
colData(dat) <- DataFrame(
  as.data.frame(colData(dat)) %>%
    dplyr::left_join(res, by = c("Run" = "cell"))
)
head(colData(dat))
## DataFrame with 6 rows and 55 columns
## Run LibraryName phenoid libsize.drop feature.drop total_features
## <character> <character> <character> <logical> <logical> <integer>
## 1 SRR3952323 H7hESC H7hESC FALSE FALSE 4895
## 2 SRR3952325 H7hESC H7hESC FALSE FALSE 4887
## 3 SRR3952326 H7hESC H7hESC FALSE FALSE 4888
## 4 SRR3952327 H7hESC H7hESC FALSE FALSE 4879
## 5 SRR3952328 H7hESC H7hESC FALSE FALSE 4873
## 6 SRR3952329 H7hESC H7hESC FALSE FALSE 4893
## log10_total_features total_counts log10_total_counts
## <numeric> <numeric> <numeric>
## 1 3.68984 2248411 6.35188
## 2 3.68913 2271617 6.35634
## 3 3.68922 584682 5.76692
## 4 3.68842 3191810 6.50404
## 5 3.68789 2190385 6.34052
## 6 3.68966 2187289 6.33991
## pct_counts_top_50_features pct_counts_top_100_features
## <numeric> <numeric>
## 1 18.2790 25.9754
## 2 24.6725 32.2228
## 3 22.7328 30.2060
## 4 20.8674 29.0039
## 5 21.2879 29.4237
## 6 20.5931 27.7401
## pct_counts_top_200_features pct_counts_top_500_features is_cell_control
## <numeric> <numeric> <logical>
## 1 35.5376 52.4109 FALSE
## 2 41.5474 57.9692 FALSE
## 3 39.4313 55.2858 FALSE
## 4 38.7856 56.0209 FALSE
## 5 39.3077 56.6410 FALSE
## 6 36.7819 52.7547 FALSE
## sizeFactor CIDR_3 CIDR_5 CIDR_9 FlowSOM_3 FlowSOM_5
## <numeric> <character> <character> <character> <character> <character>
## 1 1.889865 1 1 1 2 2
## 2 1.810539 1 1 1 2 2
## 3 0.486899 1 1 1 2 2
## 4 2.562950 1 1 1 2 2
## 5 1.848037 1 1 1 2 2
## 6 1.897451 1 1 1 2 2
## FlowSOM_9 PCAHC_3 PCAHC_5 PCAHC_9 PCAKmeans_3 PCAKmeans_5
## <character> <character> <character> <character> <character> <character>
## 1 4 1 1 1 3 1
## 2 4 1 1 1 3 1
## 3 4 1 1 1 3 1
## 4 4 1 1 1 3 1
## 5 4 1 1 1 3 1
## 6 4 1 1 1 3 1
## PCAKmeans_9 RaceID2_3 RaceID2_5 RaceID2_9 RtsneKmeans_3 RtsneKmeans_5
## <character> <character> <character> <character> <character> <character>
## 1 4 1 1 1 1 1
## 2 4 2 2 2 1 1
## 3 4 2 2 2 1 1
## 4 4 1 1 1 1 1
## 5 4 1 1 1 1 1
## 6 4 1 2 2 1 1
## RtsneKmeans_9 SAFE_3 SAFE_5 SAFE_9 SC3_3 SC3_5
## <character> <character> <character> <character> <character> <character>
## 1 9 2 1 3 1 3
## 2 9 2 1 5 1 3
## 3 9 2 1 3 1 3
## 4 9 2 1 5 1 3
## 5 9 2 1 5 1 3
## 6 9 2 1 5 1 3
## SC3_9 SC3svm_3 SC3svm_5 SC3svm_9 Seurat_9 TSCAN_3
## <character> <character> <character> <character> <character> <character>
## 1 4 3 3 3 5 1
## 2 4 3 3 3 5 1
## 3 4 3 3 3 5 3
## 4 4 3 3 3 5 1
## 5 4 3 3 3 5 2
## 6 4 3 3 3 5 1
## TSCAN_5 TSCAN_9 ascend_3 ascend_5 ascend_9 monocle_3
## <character> <character> <character> <character> <character> <character>
## 1 1 1 1 NA NA 3
## 2 1 2 1 NA NA 3
## 3 3 2 1 NA NA 3
## 4 1 1 1 NA NA 3
## 5 2 2 1 NA NA 3
## 6 1 1 1 NA NA 3
## monocle_5 monocle_9 pcaReduce_3 pcaReduce_5 pcaReduce_9
## <character> <character> <character> <character> <character>
## 1 3 3 1 5 5
## 2 3 3 1 5 5
## 3 3 3 1 5 5
## 4 3 3 1 5 5
## 5 3 3 1 5 5
## 6 3 3 1 5 5
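The reshaping step above can be easier to follow on a small example. The sketch below applies the same filter, unite, and spread sequence to a toy long-format table; the method names (`methodA`, `methodB`), cell names, and cluster values are made up for illustration, and the column layout is only assumed to resemble that of the clustering summary.

```r
library(dplyr)
library(tidyr)

# Toy long-format clustering summary (hypothetical values): one row per
# combination of method, k, and cell. 'resolution' is NA throughout, as for
# methods that do not use such a parameter.
toy <- data.frame(
  method = rep(c("methodA", "methodB"), each = 6),
  cell = rep(c("c1", "c2", "c3"), times = 4),
  run = 1,
  k = rep(rep(c(3, 5), each = 3), times = 2),
  resolution = NA_real_,
  cluster = as.character(c(1, 1, 2, 1, 2, 3, 2, 2, 1, 3, 1, 2)),
  stringsAsFactors = FALSE
)

wide <- toy %>%
  # keep one run and the values of k of interest
  dplyr::filter(run == 1 & k %in% c(3, 5)) %>%
  # within each method/k combination, keep a single resolution (if any)
  dplyr::group_by(method, k) %>%
  dplyr::filter(is.na(resolution) | resolution == resolution[1]) %>%
  dplyr::ungroup() %>%
  # build one identifier per method/k combination, e.g. "methodA_3"
  tidyr::unite(col = method_k, method, k, sep = "_", remove = TRUE) %>%
  dplyr::select(cell, method_k, cluster) %>%
  # one row per cell, one column per method/k combination
  tidyr::spread(key = method_k, value = cluster)

wide
```

The result has one row per cell and one column of cluster labels per method/k combination, which is the shape needed for joining onto the cell annotations by cell identifier.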
iSEE
The resulting SingleCellExperiment can be interactively explored using, e.g., the iSEE package. This can be useful to gain additional understanding of the partitions inferred by the different clustering methods, to visualize these in low-dimensional representations (PCA or t-SNE), and to investigate how well they agree with known or inferred groupings of the cells.
if (require(iSEE)) {
  iSEE(dat)
}
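Agreement between a clustering and a known grouping can also be checked non-interactively with a cross-tabulation. The following is a minimal base R sketch on made-up labels; the column name `method_3` and all values are hypothetical. On the object built above, the analogous call would presumably be `table(dat$phenoid, dat$SC3_5)`, assuming that `phenoid` holds the known cell population labels.

```r
# Toy cell annotations (hypothetical values): a known grouping and the
# labels assigned by one clustering method.
cells <- data.frame(
  phenoid = c("A", "A", "A", "B", "B", "C", "C", "C"),
  method_3 = c("1", "1", "2", "2", "2", "3", "3", "3"),
  stringsAsFactors = FALSE
)

# Rows are known groups, columns are inferred clusters.
tab <- table(truth = cells$phenoid, cluster = cells$method_3)
tab

# Fraction of each known group assigned to each cluster (rows sum to 1).
frac <- prop.table(tab, margin = 1)
round(frac, 2)
```

A known group whose row is concentrated in a single column is recovered cleanly by the clustering, whereas a row spread over several columns indicates that the group has been split.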
sessionInfo()
## R version 4.4.0 RC (2024-04-16 r86468)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] tidyr_1.3.1 dplyr_1.1.4
## [3] DuoClustering2018_1.23.0 SingleCellExperiment_1.27.0
## [5] SummarizedExperiment_1.35.0 Biobase_2.65.0
## [7] GenomicRanges_1.57.0 GenomeInfoDb_1.41.0
## [9] IRanges_2.39.0 S4Vectors_0.43.0
## [11] BiocGenerics_0.51.0 MatrixGenerics_1.17.0
## [13] matrixStats_1.3.0 BiocStyle_2.33.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 viridisLite_0.4.2 blob_1.2.4
## [4] Biostrings_2.73.0 filelock_1.0.3 viridis_0.6.5
## [7] fastmap_1.1.1 BiocFileCache_2.13.0 digest_0.6.35
## [10] mime_0.12 lifecycle_1.0.4 KEGGREST_1.45.0
## [13] RSQLite_2.3.6 magrittr_2.0.3 compiler_4.4.0
## [16] rlang_1.1.3 sass_0.4.9 tools_4.4.0
## [19] utf8_1.2.4 yaml_2.3.8 knitr_1.46
## [22] S4Arrays_1.5.0 bit_4.0.5 mclust_6.1.1
## [25] curl_5.2.1 DelayedArray_0.31.0 plyr_1.8.9
## [28] abind_1.4-5 withr_3.0.0 purrr_1.0.2
## [31] grid_4.4.0 fansi_1.0.6 ExperimentHub_2.13.0
## [34] colorspace_2.1-0 ggplot2_3.5.1 scales_1.3.0
## [37] cli_3.6.2 rmarkdown_2.26 crayon_1.5.2
## [40] generics_0.1.3 httr_1.4.7 reshape2_1.4.4
## [43] DBI_1.2.2 cachem_1.0.8 stringr_1.5.1
## [46] zlibbioc_1.51.0 ggthemes_5.1.0 AnnotationDbi_1.67.0
## [49] BiocManager_1.30.22 XVector_0.45.0 vctrs_0.6.5
## [52] Matrix_1.7-0 jsonlite_1.8.8 bookdown_0.39
## [55] bit64_4.0.5 jquerylib_0.1.4 glue_1.7.0
## [58] stringi_1.8.3 gtable_0.3.5 BiocVersion_3.20.0
## [61] UCSC.utils_1.1.0 munsell_0.5.1 tibble_3.2.1
## [64] pillar_1.9.0 rappdirs_0.3.3 htmltools_0.5.8.1
## [67] GenomeInfoDbData_1.2.12 R6_2.5.1 dbplyr_2.5.0
## [70] evaluate_0.23 lattice_0.22-6 AnnotationHub_3.13.0
## [73] png_0.1-8 memoise_2.0.1 bslib_0.7.0
## [76] Rcpp_1.0.12 gridExtra_2.3 SparseArray_1.5.0
## [79] xfun_0.43 pkgconfig_2.0.3
Rue-Albrecht, K, F Marini, C Soneson, and ATL Lun. 2018. “iSEE: Interactive SummarizedExperiment Explorer.” F1000Research 7: 741.