chevreulShiny allows exploratory data analysis of a single-cell dataset processed using SingleCellExperiment. The app is launched with the command minimalSceApp().

Here we provide a brief tour of the app, organized by section. On startup, app functions are arranged as tabs along the sidebar.

0.0.1 Reformat Metadata

Arbitrary variables can be appended to the cell metadata based on the results of exploratory data analysis.

Metadata can be added by uploading a csv file whose row names match cell ids and whose columns hold the new variables. Metadata can also be edited using a built-in spreadsheet tool.
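As a minimal sketch of the csv layout described above, the following base R code builds a small metadata table, writes it to a temporary csv, and aligns it to existing cell metadata by cell id. The variable names (cell_type, age_days) and the plain data frame standing in for the cell metadata are assumptions made for illustration; the app's own upload logic may differ.

```r
# Existing cell metadata (a plain data frame standing in for the object's
# cell-level metadata; row names are cell ids)
cell_meta <- data.frame(
    batch = c("b1", "b1", "b2"),
    row.names = c("cell_1", "cell_2", "cell_3")
)

# Hypothetical new variables, deliberately listed in a different cell order
new_vars <- data.frame(
    cell_type = c("rod", "cone", "rod"),
    age_days = c(10, 12, 10),
    row.names = c("cell_3", "cell_1", "cell_2")
)
csv_path <- tempfile(fileext = ".csv")
write.csv(new_vars, csv_path)

# Read the csv back with the first column as row names (cell ids)
uploaded <- read.csv(csv_path, row.names = 1)

# Check that every cell is covered, then append the columns aligned by cell id
stopifnot(all(rownames(cell_meta) %in% rownames(uploaded)))
cell_meta <- cbind(cell_meta, uploaded[rownames(cell_meta), , drop = FALSE])
```

Indexing the uploaded table by rownames(cell_meta) is what keeps each new value attached to the right cell, regardless of the row order in the csv.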

0.0.2 Plot Data

0.0.2.1 Dimensional reduction plots

chevreulShiny provides visualization of embeddings in PCA, tSNE, and UMAP for scRNAseq data summarized at the gene and transcript level. Plots can be customized to display cell metadata and the results of community detection clustering by Louvain or Leiden algorithms as implemented in SingleCellExperiment. In addition, feature expression can be overlaid on embedding plots at both the gene and transcript level.

0.0.2.2 Read/UMI count histograms

To facilitate quality control, cell-level summarized UMI and read count values can also be overlaid with cell metadata and clustering results.
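The cell-level summaries behind such histograms can be sketched in base R. The toy count matrix and the batch variable below are assumptions made for illustration and are not taken from the package.

```r
# Toy count matrix: 3 genes (rows) x 4 cells (columns)
counts <- matrix(
    c(5, 0, 3,
      2, 8, 1,
      0, 0, 4,
      6, 2, 0),
    nrow = 3,
    dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# Hypothetical cell metadata variable used for grouping
batch <- c("b1", "b1", "b2", "b2")

# Per-cell quality control summaries
qc <- data.frame(
    total_counts = colSums(counts),        # total reads/UMIs per cell
    genes_detected = colSums(counts > 0),  # features with non-zero counts
    batch = batch
)

# Median total count within each metadata group
median_by_batch <- tapply(qc$total_counts, qc$batch, median)
```

Passing qc$total_counts to hist(), split by a metadata or cluster column, gives a basic version of the histogram view.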

0.0.2.3 Clustering trees

A cluster tree of cell identities is displayed using scclustviz. This plot can be used to help establish an optimal clustering resolution.

In order to use this function library(clustree) must be loaded.

0.0.3 Heat Maps/Violin Plots

0.0.3.1 Heat Maps

When plotting the heat map, gene/transcript expression values for each cell are normalized by that cell’s total expression then multiplied by 10,000 and natural-log transformed before plotting.
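The normalization described above can be sketched in base R as follows. The toy counts are illustrative, and the use of log1p (a pseudocount of 1 so that zero counts stay finite) is an assumption; the text only specifies a natural-log transform.

```r
# Toy count matrix: 3 features (rows) x 3 cells (columns)
counts <- matrix(
    c(1, 3, 0,
      6, 2, 2,
      0, 0, 10),
    nrow = 3,
    dimnames = list(c("g1", "g2", "g3"), c("c1", "c2", "c3"))
)

# Total expression per cell
cell_totals <- colSums(counts)

# Divide each cell (column) by its total, then scale to 10,000
scaled <- sweep(counts, 2, cell_totals, "/") * 1e4

# Natural-log transform; log1p(x) = log(1 + x) is assumed here
lognorm <- log1p(scaled)
```

After scaling, every cell sums to 10,000, so differences in sequencing depth between cells no longer dominate the heat map colors.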

A clustering model or metadata variable is taken as a parameter to arrange the columns of a complex heat map.

By default, the 50 most highly variable genes are displayed. However, an arbitrary list of genes can be plotted for comparison. The genes/transcripts are displayed in the order they are listed.

0.0.3.2 Violin Plots

Feature (gene or transcript) expression can be viewed by violin plot, grouped by sample metadata.

Violin plots are a hybrid of density and box plots. They show the probability density of the data at different values, providing a visual representation of the distribution of a feature's expression level in different groups of cells. The horizontal line marks the median of the data and the box shows the interquartile range.

The parameters that can be chosen here are:

  1. ‘Grouping variable’
  2. ‘Data Type (Transformed/Raw)’
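The quantities a violin plot encodes (a density estimate plus the median and interquartile range per group) can be computed directly in base R. This is a sketch on made-up expression values; the group labels are hypothetical and stand in for the chosen grouping variable.

```r
# Made-up expression values for one feature in two groups of cells
expr <- c(1, 2, 3, 4, 5, 2, 4, 6, 8, 10)
group <- rep(c("A", "B"), each = 5)

# Split expression by the grouping variable
expr_by_group <- split(expr, group)

# Quartiles per group: the box spans 25% to 75%, the line marks 50%
box_stats <- t(sapply(expr_by_group, quantile, probs = c(0.25, 0.5, 0.75)))

# Kernel density estimate per group: the outline of each violin
violin_outline <- lapply(expr_by_group, density)
```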

0.0.4 Differential expression

chevreulShiny provides several methods of differential expression (DE) analysis that users can choose from to determine differential expression of genes. The DE methods included in chevreulShiny are the t-test, Wilcoxon rank-sum test, and pairwise binomial test.

Running DE testing results in a data frame containing the following information:

  * p_val: Unadjusted p-value
  * avg_log2FC: Log2 fold-change of the average expression between the two groups
  * pct.1: The percentage of cells where the feature is detected in the first group
  * pct.2: The percentage of cells where the feature is detected in the second group
  * p_val_adj: Adjusted p-value, based on Bonferroni correction using all features in the dataset
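To make the columns above concrete, here is a base R sketch that builds a table of the same shape for simulated data using a Wilcoxon rank-sum test. The simulated counts, the group labels, and the pseudocount of 1 inside the fold-change are assumptions made for illustration; this is not the package's own implementation.

```r
set.seed(42)

# Simulated expression: 5 features x 40 cells, 20 cells per group
group <- rep(c("g1", "g2"), each = 20)
in1 <- group == "g1"
expr <- matrix(
    rpois(5 * 40, lambda = 2),
    nrow = 5,
    dimnames = list(paste0("gene", 1:5), NULL)
)
# Shift gene1 upwards in the first group so that it is clearly DE
expr[1, in1] <- expr[1, in1] + 5

de <- data.frame(
    # Unadjusted p-value from a Wilcoxon rank-sum test per feature
    p_val = apply(expr, 1, function(x) {
        wilcox.test(x[in1], x[!in1], exact = FALSE)$p.value
    }),
    # log2 ratio of group means (pseudocount of 1 is an assumption)
    avg_log2FC = apply(expr, 1, function(x) {
        log2(mean(x[in1]) + 1) - log2(mean(x[!in1]) + 1)
    }),
    # Percentage of cells with non-zero expression in each group
    pct.1 = 100 * rowMeans(expr[, in1] > 0),
    pct.2 = 100 * rowMeans(expr[, !in1] > 0),
    row.names = rownames(expr)
)

# Bonferroni correction across all tested features
de$p_val_adj <- p.adjust(de$p_val, method = "bonferroni")
```

Because the Bonferroni adjustment multiplies each p-value by the number of features tested (capped at 1), p_val_adj is never smaller than p_val.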

d1 <- read.csv("Dominic.csv", header = TRUE)

0.0.4.1 Volcano Plots

The results of differential expression analyses can be visualized using a volcano plot. This plot helps in visual identification of genes with large fold changes that are also statistically significant.
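The coordinates of a volcano plot follow directly from a DE table. The sketch below uses a small hand-written table; the gene names and the cutoffs (adjusted p-value below 0.05, absolute log2 fold-change above 1) are illustrative assumptions, not defaults of the app.

```r
# Hand-written DE results for five hypothetical features
de <- data.frame(
    gene = c("g1", "g2", "g3", "g4", "g5"),
    avg_log2FC = c(2.5, -1.8, 0.2, 1.2, -0.1),
    p_val_adj = c(1e-6, 1e-4, 0.5, 0.2, 1e-3)
)

# y-axis of the volcano plot: larger values are more significant
de$neg_log10_p <- -log10(de$p_val_adj)

# Illustrative cutoffs (assumed values)
p_cutoff <- 0.05
lfc_cutoff <- 1

# Label each feature as up, down, or not significant (ns)
significant <- de$p_val_adj < p_cutoff & abs(de$avg_log2FC) > lfc_cutoff
de$status <- ifelse(
    significant,
    ifelse(de$avg_log2FC > 0, "up", "down"),
    "ns"
)
```

Plotting avg_log2FC on the x-axis against neg_log10_p on the y-axis, colored by status, places the features that are both large in effect and significant in the upper left and upper right corners.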

0.0.4.2 Find Markers

Marker features based on Louvain/Leiden cluster identities or cell metadata can be defined based on the results of a Wilcoxon rank-sum test. A variable number of marker genes per cell group can be specified based on adjusted p-value and thresholded log fold change.
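Selecting a fixed number of markers per group from a table of test results can be sketched as below. The table, the cutoffs, and the variable n_per_group are all hypothetical and only illustrate the filtering logic.

```r
# Hypothetical per-cluster test results
markers <- data.frame(
    cluster = c("1", "1", "1", "2", "2", "2"),
    gene = c("A", "B", "C", "D", "E", "F"),
    avg_log2FC = c(2.0, 0.3, 1.5, 1.1, 3.0, 0.8),
    p_val_adj = c(0.001, 0.0001, 0.2, 0.01, 0.03, 0.04)
)

# Assumed cutoffs and number of markers to keep per cell group
p_cutoff <- 0.05
lfc_cutoff <- 1
n_per_group <- 1

# Keep features passing both the significance and fold-change thresholds
keep <- markers[
    markers$p_val_adj < p_cutoff & markers$avg_log2FC > lfc_cutoff,
]

# Rank by fold-change within each cluster and take the top n
keep <- keep[order(keep$cluster, -keep$avg_log2FC), ]
top_markers <- do.call(
    rbind,
    lapply(split(keep, keep$cluster), head, n = n_per_group)
)
```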

0.0.5 Subset SingleCellExperiment Input

It is often useful to subset a single-cell dataset based on cell metadata, whether experimentally determined (age, collection method, etc.) or derived from analysis (quality control metrics, annotated cell type).

chevreulShiny makes it simple to subset a dataset consisting of a single batch or a batch-integrated dataset. Subsetting can be accomplished either graphically, by lasso-selection from a dimensionally reduced plot, or by specification of a formatted .csv file.

Subsetting of single or batch-integrated data will trigger renewal of all relevant preprocessing steps, including dimensional reduction, clustering, marker gene identification, and pathway enrichment, as well as integration based on a ‘batch’ variable.
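Subsetting from a csv amounts to matching cell ids against the columns of the expression data. The sketch below assumes a csv with a single cell_id column; that layout is an assumption, since the exact file format expected by the app is not specified here.

```r
# Toy expression matrix and matching cell metadata
expr <- matrix(
    1:12,
    nrow = 3,
    dimnames = list(paste0("gene", 1:3), paste0("cell_", 1:4))
)
cell_meta <- data.frame(
    batch = c("b1", "b1", "b2", "b2"),
    row.names = colnames(expr)
)

# Hypothetical csv listing the cells to retain (assumed column: cell_id)
csv_path <- tempfile(fileext = ".csv")
write.csv(
    data.frame(cell_id = c("cell_2", "cell_4")),
    csv_path,
    row.names = FALSE
)

# Read the ids and keep only those present in the dataset
keep_ids <- intersect(colnames(expr), read.csv(csv_path)$cell_id)

# Subset expression and metadata together so they stay aligned
expr_sub <- expr[, keep_ids, drop = FALSE]
meta_sub <- cell_meta[keep_ids, , drop = FALSE]
```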

0.0.6 All Transcripts

If expression is summarized with both gene and transcript experiments, it is possible to plot all constituent transcripts (labeled by Ensembl transcript ids) making up a given gene.

The two parameters that must be chosen are:

  1. dimensional reduction method: PCA, UMAP or tSNE
  2. The name of the gene of interest

Direct comparison of the contribution of individual transcripts can be achieved using stacked bar plots, which answer the question: what does each transcript contribute to the expression of a given gene?
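The quantity shown in such a stacked bar plot is the share of a gene's expression carried by each transcript. A minimal base R sketch, using placeholder transcript labels and made-up values rather than real Ensembl ids or data:

```r
# Made-up summed expression of three transcripts of one gene,
# in two groups of cells (transcript labels are placeholders)
tx_expr <- matrix(
    c(10, 30, 60,
      5, 5, 10),
    nrow = 3,
    dimnames = list(c("tx_a", "tx_b", "tx_c"), c("groupA", "groupB"))
)

# Proportion of the gene's expression contributed by each transcript,
# computed within each group (column)
tx_prop <- prop.table(tx_expr, margin = 2)
```

Each column of tx_prop sums to 1, so passing it to barplot() gives one stacked bar per group with one segment per transcript.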

0.0.7 Regress Features

To correct for expression variation due to phenomena unrelated to the focus of the study, such as cell-cycle effects, we can regress out the unwanted signal: rather than excluding count values attributable to such a process, we adjust the expression of all remaining genes or transcripts in each cell based on the sum score of the relevant genes.
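One common way to perform such an adjustment is to fit a linear model of each feature's expression on the per-cell score and keep the residuals. The sketch below uses simulated data and a hypothetical score vector; it illustrates the general idea under these assumptions and is not necessarily the procedure used by the app.

```r
set.seed(7)
n_cells <- 50

# Hypothetical per-cell score, e.g. summed expression of cell-cycle genes
cc_score <- rnorm(n_cells)

# Simulated expression: gene1 tracks the score, gene2 does not
expr <- rbind(
    gene1 = 2 * cc_score + rnorm(n_cells, sd = 0.5),
    gene2 = rnorm(n_cells)
)

# For each feature, fit expression ~ score and keep the residuals
regressed <- t(apply(expr, 1, function(y) residuals(lm(y ~ cc_score))))
```

By construction, the residuals are uncorrelated with cc_score, so downstream steps such as clustering are no longer driven by the regressed-out signal.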

0.0.8 Technical information

Technical information regarding upstream processing, software version numbers, and dataset features can be viewed.

0.1 Server Mode

If we want to work with objects interactively and save the results of our exploratory data analysis, we need to operate chevreulShiny in server mode. When operating in this manner it is possible to carry out basic project management tasks including:

  * Loading saved objects
  * Saving objects after making changes in the app such as:
    * Reformatting metadata
    * Subsetting to remove cells
    * Regressing feature expression
  * Integrating multiple saved objects

Finally, server mode is necessary for some expanded functionality including:

  * Coverage plotting

Server mode requires access to a few sqlite databases with file path details stored in ~/.cache/chevreul: single_cell_projects.db and bigwig_files.db. These allow dynamic loading of objects and of bigwig (.bw) sample coverage files for each cell. They are initialized upon package installation using the creat_project_db function.

The following sections will give details on this expanded functionality.

0.1.1 Integrate Projects

While operating chevreulShiny in server mode, separate sequencing batches can be integrated and compared to validate sample processing steps and exclude technical variation in favor of relevant biological variation.

This section includes a list of projects that can be selected, integrated, and saved for future analysis.

0.1.2 Coverage plots

Fine-grained analysis of isoform contributions can be achieved by plotting absolute read coverage across a given gene.

Coverage plots indicate the read depths across all transcripts for different groups of cells within a genomic region.

The three user input parameters are:

  1. ‘Select a gene’: Select a gene of interest
  2. ‘Color by variable’: Select a variable by which to group the cells
  3. ‘Groups to display’: Select the groups to be displayed

0.2 Session information

#> R Under development (unstable) (2025-02-19 r87757)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] chevreulShiny_0.99.29       chevreulPlot_0.99.34        chevreulProcess_0.99.27     scater_1.35.3              
#>  [5] ggplot2_3.5.1               scuttle_1.17.0              shinydashboard_0.7.2        shiny_1.10.0               
#>  [9] SingleCellExperiment_1.29.1 SummarizedExperiment_1.37.0 Biobase_2.67.0              GenomicRanges_1.59.1       
#> [13] GenomeInfoDb_1.43.4         IRanges_2.41.3              S4Vectors_0.45.4            BiocGenerics_0.53.6        
#> [17] generics_0.1.3              MatrixGenerics_1.19.1       matrixStats_1.5.0           BiocStyle_2.35.0           
#> 
#> loaded via a namespace (and not attached):
#>   [1] later_1.4.1               batchelor_1.23.0          BiocIO_1.17.1             ggplotify_0.1.2          
#>   [5] bitops_1.0-9              tibble_3.2.1              polyclip_1.10-7           XML_3.99-0.18            
#>   [9] lifecycle_1.0.4           edgeR_4.5.6               doParallel_1.0.17         globals_0.16.3           
#>  [13] MASS_7.3-65               lattice_0.22-6            ensembldb_2.31.0          alabaster.base_1.7.7     
#>  [17] magrittr_2.0.3            limma_3.63.8              plotly_4.10.4             sass_0.4.9               
#>  [21] rmarkdown_2.29            jquerylib_0.1.4           yaml_2.3.10               shinyBS_0.61.1           
#>  [25] metapod_1.15.0            httpuv_1.6.15             EnhancedVolcano_1.25.0    DBI_1.2.3                
#>  [29] RColorBrewer_1.1-3        ResidualMatrix_1.17.0     abind_1.4-8               purrr_1.0.4              
#>  [33] ggraph_2.2.1              AnnotationFilter_1.31.0   RCurl_1.98-1.16           yulab.utils_0.2.0        
#>  [37] rappdirs_0.3.3            tweenr_2.0.3              circlize_0.4.16           GenomeInfoDbData_1.2.13  
#>  [41] ggrepel_0.9.6             irlba_2.3.5.1             listenv_0.9.1             megadepth_1.17.0         
#>  [45] cmdfun_1.0.2              parallelly_1.42.0         dqrng_0.4.1               DelayedMatrixStats_1.29.1
#>  [49] codetools_0.2-20          DelayedArray_0.33.6       ggforce_0.4.2             DT_0.33                  
#>  [53] tidyselect_1.2.1          shape_1.4.6.1             UCSC.utils_1.3.1          farver_2.1.2             
#>  [57] rhandsontable_0.3.8       wiggleplotr_1.31.0        ScaledMatrix_1.15.0       viridis_0.6.5            
#>  [61] shinyWidgets_0.9.0        GenomicAlignments_1.43.0  jsonlite_1.9.1            GetoptLong_1.0.5         
#>  [65] BiocNeighbors_2.1.2       waiter_0.2.5              tidygraph_1.3.1           iterators_1.0.14         
#>  [69] foreach_1.5.2             tools_4.5.0               Rcpp_1.0.14               glue_1.8.0               
#>  [73] gridExtra_2.3             SparseArray_1.7.6         xfun_0.51                 dplyr_1.1.4              
#>  [77] withr_3.0.2               BiocManager_1.30.25       fastmap_1.2.0             clustree_0.5.1           
#>  [81] rhdf5filters_1.19.1       bluster_1.17.0            shinyjs_2.1.0             digest_0.6.37            
#>  [85] rsvd_1.0.5                gridGraphics_0.5-1        R6_2.6.1                  mime_0.12                
#>  [89] colorspace_2.1-1          RSQLite_2.3.9             tidyr_1.3.1               data.table_1.17.0        
#>  [93] rtracklayer_1.67.1        graphlayouts_1.2.2        httr_1.4.7                htmlwidgets_1.6.4        
#>  [97] S4Arrays_1.7.3            pkgconfig_2.0.3           gtable_0.3.6              blob_1.2.4               
#> [101] ComplexHeatmap_2.23.0     XVector_0.47.2            htmltools_0.5.8.1         shinyhelper_0.3.2        
#> [105] bookdown_0.42             ProtGenerics_1.39.2       clue_0.3-66               scales_1.3.0             
#> [109] png_0.1-8                 scran_1.35.0              rstudioapi_0.17.1         knitr_1.49               
#> [113] tzdb_0.4.0                rjson_0.2.23              curl_6.2.1                rhdf5_2.51.2             
#> [117] cachem_1.1.0              GlobalOptions_0.1.2       stringr_1.5.1             miniUI_0.1.1.1           
#> [121] parallel_4.5.0            vipor_0.4.7               AnnotationDbi_1.69.0      restfulr_0.0.15          
#> [125] alabaster.schemas_1.7.0   pillar_1.10.1             grid_4.5.0                vctrs_0.6.5              
#> [129] promises_1.3.2            shinyFiles_0.9.3          BiocSingular_1.23.0       EnsDb.Hsapiens.v86_2.99.0
#> [133] beachmat_2.23.6           xtable_1.8-4              cluster_2.1.8             beeswarm_0.4.0           
#> [137] evaluate_1.0.3            readr_2.1.5               GenomicFeatures_1.59.1    cli_3.6.4                
#> [141] locfit_1.5-9.11           compiler_4.5.0            Rsamtools_2.23.1          rlang_1.1.5              
#> [145] crayon_1.5.3              DataEditR_0.1.5           forcats_1.0.0             fs_1.6.5                 
#> [149] ggbeeswarm_0.7.2          stringi_1.8.4             viridisLite_0.4.2         BiocParallel_1.41.2      
#> [153] munsell_0.5.1             Biostrings_2.75.4         lazyeval_0.2.2            Matrix_1.7-2             
#> [157] hms_1.1.3                 patchwork_1.3.0           future_1.34.0             sparseMatrixStats_1.19.0 
#> [161] bit64_4.6.0-1             Rhdf5lib_1.29.1           KEGGREST_1.47.0           statmod_1.5.0            
#> [165] igraph_2.1.4              memoise_2.0.1             bslib_0.9.0               bit_4.5.0.1