spatialHeatmap 2.6.0
The primary utility of the spatialHeatmap package is the generation of spatial heatmaps (SHM) for visualizing cell-, tissue- and organ-specific abundance patterns of biological molecules (e.g. RNAs) in spatial anatomical images (Zhang et al. 2022). This is useful for identifying biomolecules with spatially enriched/depleted abundance patterns as well as clusters and/or network modules composed of biomolecules sharing similar abundance patterns such as similar gene expression patterns. These functionalities are introduced in the main vignette of this package. The following describes extended functionalities for integrating tissue with single cell data by co-visualizing them in composite plots that combine spatial heatmaps with embedding plots of high-dimensional data. The resulting spatial context information is important for gaining insights into the tissue-level organization of single cell data or vice versa.
The required quantitative bulk and single-cell assay data, such as gene expression values, can be provided in the widely used tabular data structures SummarizedExperiment (SE) and SingleCellExperiment (SCE), respectively (Figure 1A, C), while the corresponding anatomical images need to be supplied as annotated SVG (aSVG) images and can be stored in a specific S4 class SVG (Figure 1B). More details on aSVGs are described in the main vignette of this package. In addition, multiple methods are supported for associating single cells with source tissues and coloring the associated cells and tissues (Figure 2).
For the implementation of the co-visualization functionality, spatialHeatmap takes advantage of efficient and reusable S4 classes for both assay data and aSVGs. The former include the Bioconductor core data structures SummarizedExperiment (SE, Morgan et al. 2018) and SingleCellExperiment (SCE, Amezquita et al. 2020) for bulk and single-cell data, respectively (Figure 1A, C). The slots assays, colData, and rowData contain expression values, tissue/cell metadata, and biomolecule metadata, respectively. For the embedding plots of single-cell data, several dimension reduction algorithms (e.g. PCA, UMAP or tSNE) are supported, and the resulting reduced-dimension embeddings are stored in the reducedDims slot of SCE.
The S4 class SVG (Figure 1B) is developed specifically in spatialHeatmap for storing aSVG instances. The two most important slots, coordinate and attribute, store the aSVG feature coordinates and their respective attributes (colors, line widths, etc.), while the other slots dimension, svg, and raster store image dimensions, aSVG file paths, and raster image paths, respectively. Moreover, the meta class SPHM (Figure 1D) is developed to harmonize these data objects.
When creating co-visualization plots (Figure 1a-b), SHMs are created by mapping expression values from SE to corresponding spatial features in SVG through identifiers shared between the two (here TissuesA and TissueB), and single cells in SCE are associated with spatial features through their group labels (here TissuesA and TissueB) stored in the colData slot.
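The identifier-based matching described above can be illustrated with a minimal sketch. It is written in plain Python purely to show the logic and does not use the spatialHeatmap API; every name in it (bulk_expr, svg_features, cell_groups, the tissue labels and the values) is a hypothetical stand-in.

```python
# Hypothetical toy data; none of these names come from spatialHeatmap.
# Bulk expression of one gene per tissue (stand-in for one row of an SE assay).
bulk_expr = {"TissueA": 8.2, "TissueB": 3.1, "TissueC": 5.0}

# Feature identifiers present in the annotated image (stand-in for an SVG object).
svg_features = ["TissueA", "TissueB", "Outline"]

# Group label of each cell (stand-in for a colData column of an SCE).
cell_groups = {
    "cell1": "TissueA",
    "cell2": "TissueB",
    "cell3": "TissueA",
    "cell4": "Unknown",
}


def match_features(expr, features):
    """Keep expression values only for identifiers shared with the image."""
    return {f: expr[f] for f in features if f in expr}


def cells_per_feature(groups, features):
    """Associate each cell with the spatial feature named by its group label."""
    mapping = {f: [] for f in features}
    for cell, label in groups.items():
        if label in mapping:
            mapping[label].append(cell)
    return mapping


shm_values = match_features(bulk_expr, svg_features)
cell_map = cells_per_feature(cell_groups, svg_features)
print(shm_values)
print(cell_map)
```

In this toy case only identifiers present in both the expression data and the image receive a value, and cells whose group label has no matching feature remain unassigned.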
Figure 1: Schematic view of data structures and creation of co-visualization plots
File imports, classes, and plotting functionalities are illustrated in boxes with color-coded title bars in grey, blue and green, respectively. Quantitative and experimental design data (I) are imported into matching slots of an SE container (A). aSVG image files are stored in SVG containers (B). Expression profiles of a chosen gene (GeneX) in (A) are mapped to the corresponding spatial features in (B) via common identifiers (here TissuesA and TissueB). The quantitative data is represented in the matching features by colors according to a number-to-color key and the output is an SHM (a). For co-visualization plots, single-cell data are stored in the SCE object class (C). Reduced dimension data for embedding plots can be generated in R or imported from files. The single-cell embedding results are co-visualized with SHMs where the cell-to-tissue mappings are indicated by common colors in the co-visualization plot (b). The SPHM meta class organizes the individual objects (A)-(C) along with internally generated data.
To co-visualize bulk and single-cell data (Figure 1b), the individual cells of the single-cell data are mapped via their group labels to the corresponding tissue features in an aSVG image. If the feature labels in an aSVG differ from the corresponding cell group labels, e.g. due to variable terminologies, a translation map can be used to avoid manual relabelling. Throughout this vignette the term feature is a generalization referring in most cases to tissues or organs. For handling cell grouping information, five major methods are supported: (a) annotation labels, (b) manual assignments, (c) marker genes, (d) clustering labels, and (e) automated co-clustering (Figure 2a). The first three are similar in that they use known cell group labels; the main difference is how the cell labels are provided. In the annotation-based method, existing group labels are available and can be uploaded and/or stored in the SCE object, as is the case in some of the SCE instances provided by the scRNAseq package (Risso and Cole 2022). The manual method allows users to create the cell-to-tissue associations one-by-one or import them from a tabular file. The marker-gene method utilizes known marker genes to group cells. In the clustering method, cells are clustered and grouped by clustering labels. In contrast, the automated co-clustering aims to assign source tissues to corresponding single cells computationally by a co-clustering method (Figure 8). This method is experimental and requires bulk expression data that are obtained from the tissues represented in the single-cell data.
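As a rough illustration of the translation map idea mentioned above, the following sketch relabels cell groups so that they match aSVG feature labels. It is plain Python rather than the package's R interface, and the labels as well as the helper name translate_labels are assumptions made only for this example.

```python
# Hypothetical cell group labels (e.g. from annotation or clustering).
cell_labels = [
    "hippocampal neuron",
    "cortex astrocyte",
    "hippocampal neuron",
    "microglia",
]

# Hypothetical translation map: cell group label -> aSVG feature label.
translation_map = {
    "hippocampal neuron": "hippocampus",
    "cortex astrocyte": "cerebral.cortex",
}


def translate_labels(labels, mapping, unmatched=None):
    """Return the aSVG feature for each cell label; cells without an entry get `unmatched`."""
    return [mapping.get(label, unmatched) for label in labels]


features = translate_labels(cell_labels, translation_map)
print(features)
```

Keeping the map as a separate lookup table leaves both the original cell labels and the aSVG unchanged, which is the point of avoiding manual relabelling.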
The matching between cell groups in the embedding plots and tissue features in SHMs is indicated with four coloring schemes (Figure 2b). The first three, ‘fixed-group’, ‘cell-by-group’, and ‘feature-by-group’, assign the same color to a cell group and its matching tissue. The main difference is that ‘fixed-group’ uses constant colors, while the latter two use heat colors that are proportional to the numeric expression information obtained from the single-cell or bulk expression data of a chosen gene. When expression values among groups are very similar, toggling between the constant and heat colors is important to track the tissue origin in the single-cell data. In ‘cell-by-group’ coloring, one often wants to first summarize the expression of a given gene across the cells within each group via a meaningful summary statistic, such as the mean or median. Heat colors are then created from the summary values and assigned to the corresponding cells and tissues (Figure 21-2), so the mapping direction is cell-to-tissue. The ‘feature-by-group’ coloring is very similar, except that the heat colors are based on summary values of each tissue, so the mapping direction is tissue-to-cell. The most meaningful coloring is ‘cell-by-value’ (Figure 22-3). In this option, each cell and tissue is colored according to its respective expression value of a chosen gene, so the cellular heterogeneity is reflected.
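The difference between ‘cell-by-group’ and ‘cell-by-value’ coloring can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the package's implementation: the toy records, the helper names group_summary and heat_color, and the yellow-to-red palette are all hypothetical.

```python
from statistics import mean

# Hypothetical expression of one gene in individual cells: (cell, group, value).
cells = [
    ("cell1", "TissueA", 2.0),
    ("cell2", "TissueA", 4.0),
    ("cell3", "TissueB", 9.0),
    ("cell4", "TissueB", 11.0),
]


def group_summary(records, stat=mean):
    """Summarize expression per group (the 'cell-by-group' idea)."""
    by_group = {}
    for _, group, value in records:
        by_group.setdefault(group, []).append(value)
    return {g: stat(v) for g, v in by_group.items()}


def heat_color(value, lo, hi):
    """Map a value linearly onto a yellow-to-red hex color (an arbitrary toy palette)."""
    frac = 0.0 if hi == lo else (value - lo) / (hi - lo)
    green = round(255 * (1 - frac))
    return f"#ff{green:02x}00"


# cell-by-group: all cells of a group and the matching tissue share one color.
summary = group_summary(cells)
lo, hi = min(summary.values()), max(summary.values())
group_colors = {g: heat_color(v, lo, hi) for g, v in summary.items()}

# cell-by-value: each cell is colored by its own expression value.
values = [v for _, _, v in cells]
cell_colors = {c: heat_color(v, min(values), max(values)) for c, _, v in cells}

print(group_colors)
print(cell_colors)
```

With the group summary, cells of the same group become indistinguishable by color, whereas per-cell coloring keeps the within-group variation visible. Passing `stat=median` from the `statistics` module would switch the summary statistic.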
Similar to other functionalities in spatialHeatmap, these functionalities are available within R as well as the corresponding Shiny App (Chang et al. 2021).