1 Overview

SpaceMarkers leverages latent feature analysis of the spatial components of transcriptomic data to identify biologically relevant molecular interactions between cell groups.This tutorial will use the latent features from CoGAPS to look at pattern interactions in a Visium 10x breast ductal carcinoma spatial transcriptomics dataset.

2 Installation

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("SpaceMarkers")

library(SpaceMarkers)

3 Setup

3.1 Links to Data

The data that will be used to demonstrate SpaceMarkers’ capabilities is a human breast cancer spatial transcriptomics dataset that comes from Visium. The CoGAPS patterns as seen in the manuscript Atul Deshpande et al. will also be used

data_dir <- "visiumBrCA"
unlink(file.path(data_dir), recursive = TRUE)
dir.create(data_dir,showWarnings = FALSE)
main_10xlink <- "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0"
counts_folder <- "Visium_Human_Breast_Cancer"
counts_file <- "Visium_Human_Breast_Cancer_filtered_feature_bc_matrix.h5"
counts_url<-paste(c(main_10xlink,counts_folder,counts_file), collapse = "/")
sp_folder <- "Visium_Human_Breast_Cancer"
sp_file <- "Visium_Human_Breast_Cancer_spatial.tar.gz"
sp_url<-paste(c(main_10xlink,sp_folder,sp_file),collapse = "/")

3.2 Extracting Counts Matrix

3.2.1 load10xExpr

Here the counts matrix will be obtained from the h5 object on the Visium site and genes with less than 3 counts are removed from the dataset.This can be achieved with the load10XExpr function.

download.file(counts_url,file.path(data_dir,basename(counts_url)), mode = "wb")
counts_matrix <- load10XExpr(visiumDir = data_dir, h5filename = counts_file)
good_gene_threshold <- 3
goodGenes <- rownames(counts_matrix)[
    apply(counts_matrix,1,function(x) sum(x>0)>=good_gene_threshold)]

## Warning in asMethod(object): sparse->dense coercion: allocating vector of size
## 1.3 GiB

counts_matrix <- counts_matrix[goodGenes,]

3.3 Obtaining CoGAPS Patterns

In this example the latent features from CoGAPS will be used to identify interacting genes with SpaceMarkers. Here the featureLoadings (genes) and samplePatterns (barcodes) for both the expression matrix and CoGAPS matrix need to match.

cogaps_result <- readRDS(system.file("extdata","CoGAPS_result.rds",
    package="SpaceMarkers",mustWork = TRUE))
features <- intersect(rownames(counts_matrix),rownames(
    slot(cogaps_result,"featureLoadings")))
barcodes <- intersect(colnames(counts_matrix),rownames(
    slot(cogaps_result,"sampleFactors")))
counts_matrix <- counts_matrix[features,barcodes]
cogaps_matrix<-slot(cogaps_result,"featureLoadings")[features,]%*%
    t(slot(cogaps_result,"sampleFactors")[barcodes,])

3.4 Obtaining Spatial Coordinates

3.4.1 load10XCoords

The spatial coordinates will also be pulled from Visium for this dataset. These are combined with the latent features to demonstrate how cells for each pattern interact in 2D space. The data can be extracted with the load10XCoords() function

download.file(sp_url, file.path(data_dir,basename(sp_url)), mode = "wb")
untar(file.path(data_dir,basename(sp_url)), exdir = file.path(data_dir))
spCoords <- load10XCoords(visiumDir = data_dir, version = "1.0")
rownames(spCoords) <- spCoords$barcode
spCoords <- spCoords[barcodes,]
spPatterns <- cbind(spCoords,slot(cogaps_result,"sampleFactors")[barcodes,])
head(spPatterns)

##                               barcode         y        x    Pattern_1
## AAACAACGAATAGTTC-1 AAACAACGAATAGTTC-1  67.28568 207.4858 0.4676255882
## AAACAAGTATCTCCCA-1 AAACAAGTATCTCCCA-1 238.79054 375.0650 0.2690758109
## AAACAATCTACTAGCA-1 AAACAATCTACTAGCA-1  77.82161 260.3531 0.1105933860
## AAACACCAATAACTGC-1 AAACACCAATAACTGC-1 268.53653 212.2053 0.0002508377
## AAACAGAGCGACTCCT-1 AAACAGAGCGACTCCT-1 115.92419 360.0982 0.2849308848
## AAACAGCTTTCAGAAG-1 AAACAGCTTTCAGAAG-1 213.86511 192.9231 0.1583736390
##                       Pattern_2    Pattern_3    Pattern_4    Pattern_5
## AAACAACGAATAGTTC-1 1.049391e-01 2.576064e-01 0.6848062277 4.747092e-02
## AAACAAGTATCTCCCA-1 4.394425e-01 2.056469e-01 0.2921337187 1.167576e-02
## AAACAATCTACTAGCA-1 1.148523e-02 2.309153e-01 0.4111314714 9.508318e-02
## AAACACCAATAACTGC-1 1.685795e-01 1.223603e-01 0.0001562788 8.041928e-01
## AAACAGAGCGACTCCT-1 1.102506e-01 9.053156e-08 0.2429406196 3.430807e-08
## AAACAGCTTTCAGAAG-1 9.741083e-06 1.723470e-01 0.3059957027 7.167605e-01

For demonstration purposes we will look at two patterns; Pattern_1 (immune cell) and Pattern_5 (invasive carcinoma lesion). Furthermore we will only look at the relationship between a pre-curated list of genes for efficiency.

data("curated_genes")
spPatterns <- spPatterns[c("barcode","y","x","Pattern_1","Pattern_5")]
counts_matrix <- counts_matrix[curated_genes,]
cogaps_matrix <- cogaps_matrix[curated_genes, ]

4 Executing SpaceMarkers

4.1 SpaceMarker Modes

SpaceMarkers can operate in ‘residual’ or ‘DE’ (DifferentialExpression) mode. In an ideal world the overlapping patterns identified by SpaceMarkers would be a homogeneous population of cells and the relationship between them would be linear. However, due to confounding effects of variations in cell density and common cell types in any given region, this is not always true.

To account for these confounding effects, the ‘residual’ mode compares the feature interactions between the expression matrix and the reconstructed latent space matrix. The features with the highest residual error are reported. The genes are then classified according to regions of overlapping vs exclusive influence. The default mode is ‘residual’ mode.

However this is not to say there is no utility for DE mode. Suppose the feature (gene) information is not readily available and only the sample (cells) latent feature patterns with P-values are available? This is the advantage of ‘DE’ mode. Where residual mode assesses the non-linear effects that may arise from confounding variables, ‘DE’ mode assesses simple linear interactions between patterns directly from expression. DE mode like residual mode also compares genes from regions of overlapping vs exclusive influence butdoes not consider residuals from the expression matrix as there is no matrix reconstruction with the latent feature matrix.

4.1.1 Residual Mode

4.1.1.1 SpaceMarkers Step1: Hotpsots

SpaceMarkers identifies regions of influence using a gaussian kernel outlier based model. Spots that have spatial influence beyond the defined outlier threshold are termed hotspots. SpaceMarkers then identifies where the hotspots are overlapping/interacting and where they are mutually exclusive.

getSpatialParameters: This function sets the width of the spatial kernel (sigmaOpt) as well as the outlier threshold around the set of spots (threshOpt) for each pattern. By default, the sigmaOpt is set to the spot diameter at the appropriate resolution. Note that the legacy function has been deprecated and has been renamed to getSpatialParamsMoransI. Please read the documentation for more information.

optParams <- getSpatialParameters(spPatterns,visiumDir = data_dir,
                                          resolution = "lowres")

## Reading spot diameter from specified .json file

4.1.1.2 SpaceMarkers Step2: Interacting Genes

getPairwiseInteractingGenes: This function identifies the regions of influence and interaction as well as the genes associated with these regions. A non-parametric Kruskal-Wallis test is used to identify statistically significant genes in any one region of influence without discerning which region is more significant. A post hoc Dunn’s Test is used for analysis of genes between regions and can distinguish which of two regions is more significant. If ‘residual’ mode is selected the user must provide a reconstructed matrix from the latent feature matrix. The matrix is passed to the ‘reconstruction’ argument and can be left as NULL for ‘DE’ mode. The ‘data’ parameter is the original expression matrix. The ‘spPatterns’ argument takes a dataframe with the spatial coordinates of each cell as well as the patterns. The spatial coordinate columns must contain the labels ‘x’ and ‘y’ to be recognized by the function. The output of this are all possible pairs fo interactions from the spatial patterns.

SpaceMarkers <- getPairwiseInteractingGenes(data = counts_matrix,
                                    reconstruction = cogaps_matrix,
                                    optParams = optParams,
                                    spPatterns = spPatterns,
                                    mode ="residual",analysis="overlap")

## patternPairs not provided. Calculating all 
##             possible pairs.

## Using user provided optParams.

## Calculating genes of interest for Pattern_1 and Pattern_5

## Warning in matrixTests::row_kruskalwallis(x = testMat, g = region): 2320
## columns dropped due to missing group information

NB: When running getPairwiseInteractingGenes some warnings may be generated. The warnings are due to the nature of the ‘sparse’ data being used. Comparing two cells from the two patterns with identical information is redundant as SpaceMarkers is identifying statistically different expression for interactions exclusive to either of the two patterns and a region that is due to interaction between the given two patterns. Also, if there are too many zeros in the genes (rows) of those regions, the columns are dropped as there is nothing to compare in the Kruskal Wallis test.

print(head(SpaceMarkers[[1]]$interacting_genes[[1]]))

##          Gene Pattern_1 x Pattern_5 KW.obs.tot KW.obs.groups KW.df KW.statistic
## CST1     CST1                vsBoth       2578             3     2    125.46835
## APOE     APOE                vsBoth       2578             3     2     37.80453
## APOC1   APOC1                vsBoth       2578             3     2    271.02023
## IFI30   IFI30                vsBoth       2578             3     2    156.99958
## COL4A1 COL4A1                vsBoth       2578             3     2     79.33185
## CLU       CLU                vsBoth       2578             3     2    125.23575
##           KW.pvalue     KW.p.adj Dunn.zP1_Int Dunn.zP2_Int Dunn.zP2_P1
## CST1   5.687127e-28 1.662391e-27   -10.276169   -11.091384 -1.20777279
## APOE   6.178046e-09 1.257674e-08    -5.888909    -5.945326  0.05899994
## APOC1  1.408334e-59 5.946301e-59   -12.755379   -16.313048 -6.15453456
## IFI30  8.090489e-35 2.561988e-34   -10.088355   -12.484250 -4.10348175
## COL4A1 5.933441e-18 1.470461e-17    -7.568964    -8.906696 -2.23783263
## CLU    6.388564e-28 1.820741e-27   -11.178092    -9.253495  3.82726806
##        Dunn.pval_1_Int Dunn.pval_2_Int Dunn.pval_2_1 Dunn.pval_1_Int.adj
## CST1      9.024361e-25    1.381335e-28  2.271347e-01        5.672455e-24
## APOE      3.887544e-09    2.759075e-09  9.529522e-01        1.236520e-08
## APOC1     2.909553e-37    7.971396e-60  7.529830e-10        2.648696e-36
## IFI30     6.220289e-24    9.099627e-36  4.069786e-05        3.732173e-23
## COL4A1    3.762125e-14    5.257605e-19  2.523197e-02        1.460590e-13
## CLU       5.220049e-29    2.172704e-20  1.295734e-04        4.053214e-28
##        Dunn.pval_2_Int.adj Dunn.pval_2_1.adj SpaceMarkersMetric
## CST1          9.856012e-28      3.656314e-01           6.575456
## APOE          8.992540e-09      1.000000e+00           5.170385
## APOC1         8.768535e-59      2.548558e-09           5.121392
## IFI30         8.007671e-35      1.033099e-04           4.986063
## COL4A1        2.393117e-18      4.724284e-02           4.959997
## CLU           1.062211e-19      3.167349e-04           4.808703

print(head(SpaceMarkers[[1]]$hotspots))

##      Pattern_1   Pattern_5  
## [1,] "Pattern_1" NA         
## [2,] "Pattern_1" NA         
## [3,] NA          NA         
## [4,] NA          "Pattern_5"
## [5,] "Pattern_1" NA         
## [6,] NA          "Pattern_5"

The output is a list of data frames with information about the interacting genes between patterns from the CoGAPS matrix (interacting_genes object). There is also a data frame with all of the regions of influence for any two of patterns (the hotspotRegions object).

For the ‘interacting_genes’ data frames, the first column is the list of genes and the second column says whether the statistical test were done vsPattern_1, vsPattern_2 or vsBoth. The remaining columns are statistics for the Kruskal-Wallis test and the post hoc Dunn’s test.The SpaceMarkersMetric column is a product of sums of the Dunn’s statistics and is used to rank the genes.

4.1.2 DE Mode

As described previously ‘DE’ mode only requires the counts matrix and spatial patterns and not the reconstructed CoGAPS matrix. It identifies simpler molecular interactions between regions and still executes the ‘hotspots’ and ‘interacting genes’ steps of SpaceMarkers

SpaceMarkers_DE <- getPairwiseInteractingGenes(
    data=counts_matrix,reconstruction=NULL,
    optParams = optParams,
    spPatterns = spPatterns,
    mode="DE",analysis="overlap")

## patternPairs not provided. Calculating all 
##             possible pairs.

## Using user provided optParams.

## Calculating genes of interest for Pattern_1 and Pattern_5

## Warning in matrixTests::row_kruskalwallis(x = testMat, g = region): 2320
## columns dropped due to missing group information

4.1.3 Residual Mode vs DE Mode: Differences

One of the first things to notice is the difference in the number of genes identified between the two modes.

residual_p1_p5<-SpaceMarkers[[1]]$interacting_genes[[1]]
DE_p1_p5<-SpaceMarkers_DE[[1]]$interacting_genes[[1]]

paste(
    "Residual mode identified",dim(residual_p1_p5)[1],
        "interacting genes,while DE mode identified",dim(DE_p1_p5)[1],
        "interacting genes",collapse = NULL)

## [1] "Residual mode identified 114 interacting genes,while DE mode identified 114 interacting genes"

DE mode produces more genes than residual mode because the matrix of residuals highlights less significant differences for confounding genes across the spots.The next analysis will show where the top genes rank in each mode’s list if they are identified at all. A function was created that will take the top 20 genes of a reference list of genes and compare it to the entire list of a second list of genes. The return object is a data frame of the gene, the name of each list and the ranking of each gene as compared to the reference list. If there is no gene identified in the second list compared to the reference it is classified as NA.

compare_genes <- function(ref_list, list2,ref_name = "mode1",
                            list2_name = "mode2", sub_slice = NULL){
    ref_rank <- seq(1,length(ref_list),1)
    list2_ref_rank <- which(list2 %in% ref_list)
    list2_ref_genes <- list2[which(list2 %in% ref_list)]
    ref_genes_only <- ref_list[ !ref_list  %in% list2_ref_genes ]
    mode1 <- data.frame("Gene" = ref_list,"Rank" = ref_rank,"mode"= ref_name)
    mode2 <- data.frame("Gene" = c(list2_ref_genes, ref_genes_only),"Rank" = c(
        list2_ref_rank,rep(NA,length(ref_genes_only))),"mode"= list2_name)
    mode1_mode2 <- merge(mode1, mode2, by = "Gene", all = TRUE) 
    mode1_mode2 <- mode1_mode2[order(mode1_mode2$Rank.x),]
    mode1_mode2 <- subset(mode1_mode2,select = c("Gene","Rank.x","Rank.y"))
    colnames(mode1_mode2) <- c("Gene",paste0(ref_name,"_Rank"),
                                paste0(list2_name,"_Rank"))
    return(mode1_mode2)
}

res_to_DE <- compare_genes(head(residual_p1_p5$Gene, n = 20),DE_p1_p5$Gene,
                            ref_name="residual",list2_name="DE")
DE_to_res <- compare_genes(head(DE_p1_p5$Gene, n = 20),residual_p1_p5$Gene,
                            ref_name = "DE",list2_name = "residual")

4.1.3.1 Comparing residual mode to DE mode

res_to_DE

##        Gene residual_Rank DE_Rank
## 9      CST1             1       3
## 3      APOE             2      18
## 2     APOC1             3      17
## 14    IFI30             4      11
## 8    COL4A1             5       8
## 6       CLU             6      43
## 1    AKR1A1             7       4
## 11      FAH             8      40
## 13 HLA-DRB1             9     113
## 5      CAPG            10      29
## 12    GCHFR            11      10
## 7   COL18A1            12      15
## 10     CTSB            13       2
## 17   NDUFB2            14      22
## 16   LAPTM5            15      42
## 18    PHGR1            16      54
## 15     IGHE            17      39
## 19     PSAP            18      19
## 20     TGM2            19      46
## 4      C1QC            20      53

Here we identify the top 20 genes in ‘residual’ mode and their corresponding ranking in DE mode. HLA-DRB1 is the only gene identified in residual mode and not in DE mode. The other genes are ranked relatively high in both residual and DE mode.

4.1.3.2 Comparing DE mode to residual mode

DE_to_res

##          Gene DE_Rank residual_Rank
## 18        SDS       1            24
## 10       CTSB       2            13
## 9        CST1       3             1
## 3      AKR1A1       4             7
## 12        FTL       5            46
## 11       CTSD       6            64
## 14       HCP5       7            35
## 8      COL4A1       8             5
## 2        ACTB       9           112
## 13      GCHFR      10            11
## 15      IFI30      11             4
## 19      TREM2      12            22
## 6    ARHGEF40      13            77
## 20     TSPAN4      14            32
## 7     COL18A1      15            12
## 1  AC069200.1      16           105
## 4       APOC1      17             3
## 5        APOE      18             2
## 17       PSAP      19            18
## 16      PRKD3      20            99

Recall that DE mode looks at the information encoded in the latent feature space and does not filter out genes based on any confounders between the counts matrix and latent feature matrix as is done in ‘residual’ mode. Therefore there are more genes in DE mode not identified at all in residual mode.

There is some agreement with interacting genes between the two methods but there are also quite a few differences. Therefore, the selected mode can significantly impact the downstream results and should be taken into consideration based on the specific biological question being answered and the data available.

4.2 Types of Analyses

Another feature of the SpaceMarkers package is the type of analysis that can be carried out, whether ‘overlap’ or ‘enrichment’ mode. The major difference between the two is that enrichment mode includes genes even if they did not pass the post-hoc Dunn’s test. These additional genes were included to enable a more statistically powerful pathway enrichment analysis and understand to a better extent the impact of genes involved each pathway. Changing analysis = ‘enrichment’ in the getPairwiseInteractingGenes function will enable this.

SpaceMarkers_enrich <- getPairwiseInteractingGenes(data = counts_matrix,
                                    reconstruction = cogaps_matrix,
                                    optParams = optParams,
                                    spPatterns = spPatterns,
                                    mode ="residual",analysis="enrichment")

## patternPairs not provided. Calculating all 
##             possible pairs.

## Using user provided optParams.

## Calculating genes of interest for Pattern_1 and Pattern_5

## Warning in matrixTests::row_kruskalwallis(x = testMat, g = region): 2320
## columns dropped due to missing group information

SpaceMarkers_DE_enrich <- getPairwiseInteractingGenes(
    data=counts_matrix,reconstruction=NULL,
    optParams = optParams,
    spPatterns = spPatterns,
    mode="DE",analysis="enrichment")

## patternPairs not provided. Calculating all 
##             possible pairs.

## Using user provided optParams.

## Calculating genes of interest for Pattern_1 and Pattern_5

## Warning in matrixTests::row_kruskalwallis(x = testMat, g = region): 2320
## columns dropped due to missing group information

residual_p1_p5_enrichment<-SpaceMarkers_enrich[[1]]$interacting_genes[[1]]$Gene
DE_p1_p5_enrichment<-SpaceMarkers_DE_enrich[[1]]$interacting_genes[[1]]$Gene

4.2.1 Residual Mode vs DE Mode: Enrichment

The data frames for the Pattern_1 x Pattern_5 will be used to compare the results of the enrichment analyses

enrich_res_to_de<-compare_genes(
    head(DE_p1_p5_enrichment, 20),
    residual_p1_p5_enrichment,
    ref_name="DE_Enrich",list2_name = "res_Enrich")
enrich_res_to_de

##          Gene DE_Enrich_Rank res_Enrich_Rank
## 18        SDS              1              24
## 10       CTSB              2              13
## 9        CST1              3               1
## 3      AKR1A1              4               7
## 12        FTL              5              46
## 11       CTSD              6              64
## 14       HCP5              7              35
## 8      COL4A1              8               5
## 2        ACTB              9             112
## 13      GCHFR             10              11
## 15      IFI30             11               4
## 19      TREM2             12              22
## 6    ARHGEF40             13              77
## 20     TSPAN4             14              32
## 7     COL18A1             15              12
## 1  AC069200.1             16             105
## 4       APOC1             17               3
## 5        APOE             18               2
## 17       PSAP             19              18
## 16      PRKD3             20              99

The ranks differ alot more here because now genes that were not previously ranked are assigned a score.

overlap_enrich_de<-compare_genes(
    head(DE_p1_p5_enrichment,20),
    DE_p1_p5$Gene,
    ref_name="DE_Enrich",
    list2_name="DE_Overlap")
overlap_enrich_de

##          Gene DE_Enrich_Rank DE_Overlap_Rank
## 18        SDS              1               1
## 10       CTSB              2               2
## 9        CST1              3               3
## 3      AKR1A1              4               4
## 12        FTL              5               5
## 11       CTSD              6               6
## 14       HCP5              7               7
## 8      COL4A1              8               8
## 2        ACTB              9               9
## 13      GCHFR             10              10
## 15      IFI30             11              11
## 19      TREM2             12              12
## 6    ARHGEF40             13              13
## 20     TSPAN4             14              14
## 7     COL18A1             15              15
## 1  AC069200.1             16              16
## 4       APOC1             17              17
## 5        APOE             18              18
## 17       PSAP             19              19
## 16      PRKD3             20              20

The enrichment and overlap analysis are in great agreement for DE mode. Typically, you may see more changes among genes lower in the ranking. This is especially important where genes that do not pass the Dunn’s test for interactions between any of the other two patterns in the overlap analysis are now ranked in enrichment analysis. The Pattern_1 x Pattern_5 entry for these genes is labelled as FALSE.

Here is an example of the statistics for such genes.

tail(SpaceMarkers_DE_enrich[[1]]$interacting_genes[[1]])

##                  Gene Pattern_1 x Pattern_5 KW.obs.tot KW.obs.groups KW.df
## MAGED2         MAGED2                 FALSE       2578             3     2
## AL512625.3 AL512625.3                 FALSE       2578             3     2
## AC006064.2 AC006064.2                 FALSE       2578             3     2
## S100P           S100P                 FALSE       2578             3     2
## HLA-DRB1     HLA-DRB1                 FALSE       2578             3     2
## DUSP18         DUSP18                 FALSE       2578             3     2
##            KW.statistic     KW.pvalue      KW.p.adj Dunn.zP1_Int Dunn.zP2_Int
## MAGED2      1067.072779 1.941289e-232 1.106535e-230   -16.382593    0.7872872
## AL512625.3     2.643635  2.666503e-01  2.747958e-01     1.244831    0.5153811
## AC006064.2     1.772506  4.121973e-01  4.121973e-01     1.100072    0.5465379
## S100P        779.358998 5.812244e-170 1.325192e-168   -13.463590    1.2895356
## HLA-DRB1    1298.247654 1.227733e-282 1.399616e-280     2.413430  -16.7158433
## DUSP18         3.052452  2.173544e-01  2.273248e-01     1.658007    1.1262847
##            Dunn.zP2_P1 Dunn.pval_1_Int Dunn.pval_2_Int Dunn.pval_2_1
## MAGED2       31.849135    2.546512e-60    1.000000e+00 1.353106e-222
## AL512625.3   -1.368202    1.000000e+00    1.000000e+00  1.712489e-01
## AC006064.2   -1.042523    1.000000e+00    1.000000e+00  2.971693e-01
## S100P        27.349313    2.561609e-41    1.000000e+00 1.100412e-164
## HLA-DRB1    -35.047531    1.000000e+00    1.004883e-62 4.251886e-269
## DUSP18       -1.018002    1.000000e+00    1.000000e+00  3.086772e-01
##            Dunn.pval_1_Int.adj Dunn.pval_2_Int.adj Dunn.pval_2_1.adj
## MAGED2            3.278634e-59        1.000000e+00     2.090548e-220
## AL512625.3        1.000000e+00        1.000000e+00      2.019556e-01
## AC006064.2        1.000000e+00        1.000000e+00      3.332193e-01
## S100P             1.930579e-40        1.000000e+00     8.500685e-163
## HLA-DRB1          1.000000e+00        1.411404e-61     1.313833e-266
## DUSP18            1.000000e+00        1.000000e+00      3.405406e-01
##            SpaceMarkersMetric
## MAGED2             -0.4905349
## AL512625.3         -0.5547454
## AC006064.2         -0.6569152
## S100P              -0.7091282
## HLA-DRB1           -1.1050614
## DUSP18             -1.5030264

The rankings of genes between the overlap and enrichment analysis in residual mode are comparable as well.

overlap_enrich_res<-compare_genes(
    head(residual_p1_p5$Gene, 20),
    residual_p1_p5_enrichment,
    ref_name ="res_overlap",list2_name="res_enrich")
overlap_enrich_res

##        Gene res_overlap_Rank res_enrich_Rank
## 9      CST1                1               1
## 3      APOE                2               2
## 2     APOC1                3               3
## 14    IFI30                4               4
## 8    COL4A1                5               5
## 6       CLU                6               6
## 1    AKR1A1                7               7
## 11      FAH                8               8
## 13 HLA-DRB1                9               9
## 5      CAPG               10              10
## 12    GCHFR               11              11
## 7   COL18A1               12              12
## 10     CTSB               13              13
## 17   NDUFB2               14              14
## 16   LAPTM5               15              15
## 18    PHGR1               16              16
## 15     IGHE               17              17
## 19     PSAP               18              18
## 20     TGM2               19              19
## 4      C1QC               20              20

5 Visualizing SpaceMarkers

5.1 Loading Packages

The following libraries are required to make the plots and summarize dataframes

library(Matrix)
library(rjson)
library(cowplot)
library(RColorBrewer)
library(grid)
library(readbitmap)
library(dplyr)
library(data.table)
library(viridis)
library(hrbrthemes)
library(ggplot2)

The two main statistics used to help interpret the expression of genes across the patterns are the KW statistics/pvalue and the Dunn’s test. In this context the null hypothesis of the KW test is that the expression of a given gene across all of the spots is equal. The post hoc Dunn’s test identifies how statistically significant the difference in expression of the given gene is between two patterns. The Dunn’s test considers the differences between specific patterns and the KW test considers differences across all of the spots without considering the specific patterns. Ultimately, we summarize and rank these effects with the SpaceMarkersMetric.

We will look at the top few genes based on our SpaceMarkersMetric

res_enrich <- SpaceMarkers_enrich[[1]]$interacting_genes[[1]]
hotspots <- SpaceMarkers_enrich[[1]]$hotspots
top <- res_enrich %>% arrange(-SpaceMarkersMetric)
print(head(top))

##          Gene Pattern_1 x Pattern_5 KW.obs.tot KW.obs.groups KW.df KW.statistic
## CST1     CST1                vsBoth       2578             3     2    125.46835
## APOE     APOE                vsBoth       2578             3     2     37.80453
## APOC1   APOC1                vsBoth       2578             3     2    271.02023
## IFI30   IFI30                vsBoth       2578             3     2    156.99958
## COL4A1 COL4A1                vsBoth       2578             3     2     79.33185
## CLU       CLU                vsBoth       2578             3     2    125.23575
##           KW.pvalue     KW.p.adj Dunn.zP1_Int Dunn.zP2_Int Dunn.zP2_P1
## CST1   5.687127e-28 1.662391e-27   -10.276169   -11.091384 -1.20777279
## APOE   6.178046e-09 1.257674e-08    -5.888909    -5.945326  0.05899994
## APOC1  1.408334e-59 5.946301e-59   -12.755379   -16.313048 -6.15453456
## IFI30  8.090489e-35 2.561988e-34   -10.088355   -12.484250 -4.10348175
## COL4A1 5.933441e-18 1.470461e-17    -7.568964    -8.906696 -2.23783263
## CLU    6.388564e-28 1.820741e-27   -11.178092    -9.253495  3.82726806
##        Dunn.pval_1_Int Dunn.pval_2_Int Dunn.pval_2_1 Dunn.pval_1_Int.adj
## CST1      9.024361e-25    1.381335e-28  2.271347e-01        5.672455e-24
## APOE      3.887544e-09    2.759075e-09  9.529522e-01        1.236520e-08
## APOC1     2.909553e-37    7.971396e-60  7.529830e-10        2.648696e-36
## IFI30     6.220289e-24    9.099627e-36  4.069786e-05        3.732173e-23
## COL4A1    3.762125e-14    5.257605e-19  2.523197e-02        1.460590e-13
## CLU       5.220049e-29    2.172704e-20  1.295734e-04        4.053214e-28
##        Dunn.pval_2_Int.adj Dunn.pval_2_1.adj SpaceMarkersMetric
## CST1          9.856012e-28      3.656314e-01           6.575456
## APOE          8.992540e-09      1.000000e+00           5.170385
## APOC1         8.768535e-59      2.548558e-09           5.121392
## IFI30         8.007671e-35      1.033099e-04           4.986063
## COL4A1        2.393117e-18      4.724284e-02           4.959997
## CLU           1.062211e-19      3.167349e-04           4.808703

5.2 Code Setup

This first function below can visualize the locations of these patterns on a spatial grid. The code has been adopted from 10xgenomics: (support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/rkit)

geom_spatial <-  function(mapping = NULL,
                          data = NULL,
                          stat = "identity",
                          position = "identity",
                          na.rm = FALSE,
                          show.legend = NA,
                          inherit.aes = FALSE,...) {
  GeomCustom <- ggproto(
    "GeomCustom",
    Geom,
    setup_data = function(self, data, params) {
      data <- ggproto_parent(Geom, self)$setup_data(data, params)
      data
    },
    draw_group = function(data, panel_scales, coord) {
      vp <- grid::viewport(x=data$x, y=data$y)
      g <- grid::editGrob(data$grob[[1]], vp=vp)
      ggplot2:::ggname("geom_spatial", g)
    },
    required_aes = c("grob","x","y")
  )
  layer(geom = GeomCustom,mapping = mapping,data = data,stat = stat,position=
          position,show.legend = show.legend,inherit.aes =
          inherit.aes,params = list(na.rm = na.rm, ...)
  )
}

The plotSpatialData function allows you to look at the deconvoluted patterns on the tissue image.

plotSpatialData <- function(spatial_data,positions_path,image_path,jsonPath,
                              feature_col,colors = NULL,
                              title = "Spatial Heatmap",
                              alpha = 0.6,scaled = FALSE,res = "lowres",
                              x_col = "x",y_col = "y",sample = "Sample",
                              plot = TRUE) {
  og_positions <- read.csv(positions_path, header = FALSE)
  og_positions <- og_positions[,c(1,5,6)]
  colnames(og_positions) <- c("barcode",y_col,x_col)
  spatial_data <- dplyr::inner_join(
    og_positions,spatial_data[,c("barcode",feature_col)],by = "barcode")
  images_cl <- readbitmap::read.bitmap(image_path)
  height <-  data.frame(height = nrow(images_cl))
  width <- data.frame(width = ncol(images_cl))
  grobs<-grid::rasterGrob(images_cl,width=unit(1,"npc"),
                          height=unit(1,"npc"))
  images_tibble <- dplyr::tibble(sample=factor(sample), grob=list(grobs))
  images_tibble$height <- height$height
  images_tibble$width <- width$width
  scales <- jsonlite::read_json(jsonPath)
  if (scaled == FALSE){
    res <- names(scales)[grepl(pattern = res,x = names(scales))]
    spatial_data[[x_col]]<-  scales[[res]] * spatial_data[[x_col]]
    spatial_data[[y_col]] <- scales[[res]] * spatial_data[[y_col]]
  }
  spatial_data$height <- height$height
  spatial_data$width <- width$width
  if (is.numeric(spatial_data[[feature_col]])){
    p <- spatial_data %>% ggplot(aes(
      x=.data[[x_col]], y=.data[[y_col]], fill=.data[[feature_col]],
      alpha = .data[[feature_col]])) + geom_spatial(data=images_tibble[1,],
                                                    mapping = aes(grob=grob),
                                                    x=0.5, y=0.5) +
      geom_point(shape = 21, colour = "black", size = 1.1, stroke = 0.2)
  } else {
    p <- spatial_data %>% ggplot(aes(
      x=.data[[x_col]], y=.data[[y_col]], fill=.data[[feature_col]]))+
      geom_spatial(data=images_tibble[1,],mapping = aes(grob=grob),
                   x=0.5, y=0.5) +
      geom_point(shape = 21, colour = "black", size = 1.1, stroke = 0.2,
                 alpha = alpha)
  }
  p <- p + coord_cartesian(expand=FALSE) + 
    xlim(0,max(spatial_data$width)) + ylim(max(spatial_data$height),0) +
    xlab("") + ylab("")+
    ggtitle(title) + theme_set(theme_bw(base_size = 10)) +
    theme(
      panel.grid.major=element_blank(),
      panel.grid.minor = element_blank(),
      panel.background = element_blank(),
      axis.line = element_line(colour="black"),
      axis.text=element_blank(),axis.ticks = element_blank())
  # Check if the feature_col is continuous or discrete and apply 
  #the appropriate color scale
  if (is.numeric(spatial_data[[feature_col]]) & !is.null(colors)) {
    # Use gradient scale for continuous variables when colors are provided
    p <- p + scale_fill_gradientn(colours = colors)
  } else if (is.numeric(spatial_data[[feature_col]]) & is.null(colors)) {
    # Default to a color palette when no colors are provided for 
    #continuous variables
    p <- p + scale_fill_gradientn(colours = rev(
      RColorBrewer::brewer.pal(name = "RdYlBu", n = 11)))
  } else if (!is.numeric(spatial_data[[feature_col]])) {
    # Use manual scale for discrete variables
    if (!is.null(colors)) {
      p <- p + scale_fill_manual(values = colors)
    } else {
      features <- unique(spatial_data[[feature_col]])
      features <- as.character(features[!is.na(features)])
      if (length(features) > 1){
        message("You can specify your colors. Ensure the length is equal to the
                number of unique features in your feature_col")
      } else {
        p <- p + scale_fill_manual(values = "red")
      }
    }
  }
  if (plot == TRUE) {
    print(p)
  } 
  return(p)
}

We can compare these spatial maps to the expression of genes identified interacting genes on violin plots.

createInteractCol <- function(spHotspots, 
                              interaction_cols = c("T.cells","B-cells")){
  col1 <- spHotspots[,interaction_cols[1]]
  col2 <- spHotspots[,interaction_cols[2]]
  one <- col1
  two <- col2
  one[!is.na(col1)] <- "match"
  two[!is.na(col2)] <- "match"
  both_idx <- which(one == two)
  both <- col1
  both[both_idx] <- "interacting"
  one_only <- setdiff(which(!is.na(col1)),unique(c(which(is.na(col1)),
                                                   both_idx)))
  two_only <- setdiff(which(!is.na(col2)),unique(c(which(is.na(col2)),
                                                   both_idx)))
  both[one_only] <- interaction_cols[1]
  both[two_only] <- interaction_cols[2]
  both <- factor(both,levels = c(interaction_cols[1],"interacting",
                                 interaction_cols[2]))
  return(both)
}

#NB: Since we are likely to plot multipe genes, this function assumes an
#already transposed counts matrix. This saves time and memory in the long run
#for larger counts matrices
plotSpatialExpr <- function(data,gene,hotspots,patterns,
                               remove.na = TRUE,
                               title = "Expression (Log)"){
  counts <- data
  interact <- createInteractCol(spHotspots = hotspots,
                                interaction_cols = patterns)
  df <- cbind(counts,hotspots,data.frame("region" = interact))
  if (remove.na){
    df <- df[!is.na(df$region),]
  }
  p <- df %>% ggplot( aes_string(x='region',y=gene,
                                            fill='region')) + geom_violin() +
    scale_fill_viridis(discrete = TRUE,alpha=0.6) +
    geom_jitter(color="black",size=0.4,alpha=0.9) + theme_ipsum()+
    theme(legend.position="none",plot.title = element_text(size=11),
            axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
    ggtitle(paste0(gene,": ",title)) + xlab("")
  return(p)
}

5.3 Get the Spatial Data

Let’s transpose the counts matrix and combine the expression information with the spatial information.

genes <- top$Gene
counts_df <- as.data.frame(as.matrix(
  t(counts_matrix[rownames(counts_matrix) %in% genes,])))

5.4 Generate Plots

image_paths <- file.path(data_dir,"spatial","tissue_lowres_image.png")
scalefactor_paths <- file.path(data_dir,"spatial","scalefactors_json.json")
tissue_paths <- file.path(data_dir,"spatial","tissue_positions_list.csv")

spatialMaps <- list()
exprPlots <- list()

for (g in genes){
  spatialMaps[[length(spatialMaps)+1]] <- plotSpatialData(
    spatial_data = cbind(spPatterns, counts_df), 
                positions_path = tissue_paths,
                image_path = image_paths,
                jsonPath = scalefactor_paths,
                feature_col = g,
                colors = NULL,
                title = g,
                scaled = FALSE, plot = FALSE)
  exprPlots[[length(exprPlots)+1]] <- plotSpatialExpr(
    data = counts_df,gene = g,hotspots = hotspots, 
                   patterns = c("Pattern_1","Pattern_5"))
}

## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation idioms with `aes()`.
## ℹ See also `vignette("ggplot2-in-packages")` for more information.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Below are violin plots and spatial heatmaps to help visualize the expression of individual genes across different patterns.

5.4.1 Pattern_1

plotSpatialData(
    spatial_data = cbind(spPatterns, counts_df), 
                positions_path = tissue_paths,
                image_path = image_paths,
                jsonPath = scalefactor_paths,
                feature_col = "Pattern_1",
                colors = NULL,
                title = "Pattern_1",
                scaled = FALSE, plot = FALSE)

5.4.2 Pattern_5

plotSpatialData(
    spatial_data = cbind(spPatterns, counts_df), 
                positions_path = tissue_paths,
                image_path = image_paths,
                jsonPath = scalefactor_paths,
                feature_col = "Pattern_5",
                colors = NULL,
                title = "Pattern_5",
                scaled = FALSE, plot = FALSE)

On the spatial heatmap, Pattern_5, the invasive carcinoma pattern, is more prevalent on the bottom left of the tissue image. Where as Pattern_1, the immune pattern is prevalent along the diagonal of the tissue image.

5.4.3 Top SpaceMarkers

plot_grid(plotlist = list(exprPlots[[1]],spatialMaps[[1]]))

plot_grid(plotlist = list(exprPlots[[2]],spatialMaps[[2]]))

APOE is expressed highly across all patterns but is especially strong in the interacting pattern. This is a good example of a SpaceMarker that is highly expressed in the interacting region relative to both Pattern_1 and Pattern_5. The second gene also has a similar expression profile.

plot_grid(plotlist = list(exprPlots[[3]],spatialMaps[[3]]))

This gene is not as highly expressed in the tissue as the previous genes but still shows higher and specific expression in the interacting region relative to either Pattern_1 and Pattern_5 hence why it is still highly ranked by the SpaceMarkersMetric.

5.4.4 Negative SpaceMarkersMetric

More negative SpaceMarkersMetric highlights that the expression of a gene in the interacting region is lower relative to either pattern.

bottom <- res_enrich %>% arrange(SpaceMarkersMetric)
print(head(bottom))

##          Gene Pattern_1 x Pattern_5 KW.obs.tot KW.obs.groups KW.df KW.statistic
## CREB1   CREB1                 FALSE       2578             3     2     25.95375
## ADPGK   ADPGK                 FALSE       2578             3     2     20.59582
## ACTB     ACTB                 FALSE       2578             3     2     57.71513
## USP24   USP24                 FALSE       2578             3     2     80.94266
## PCDH7   PCDH7                 FALSE       2578             3     2    348.81206
## HMBOX1 HMBOX1                 FALSE       2578             3     2    180.07336
##           KW.pvalue     KW.p.adj Dunn.zP1_Int Dunn.zP2_Int Dunn.zP2_P1
## CREB1  2.313213e-06 4.253327e-06     4.730777     5.021644   0.4016220
## ADPGK  3.370347e-05 5.734621e-05     4.141449     4.501039   0.5435229
## ACTB   2.933041e-13 6.687333e-13     5.382914     7.390187   3.5224001
## USP24  2.651696e-18 6.717629e-18     8.661858     6.116842  -4.8925166
## PCDH7  1.804776e-76 8.229778e-76     8.268278    15.796698  13.5392807
## HMBOX1 7.898913e-40 2.728715e-39     7.022771    11.960579   8.8362128
##        Dunn.pval_1_Int Dunn.pval_2_Int Dunn.pval_2_1 Dunn.pval_1_Int.adj
## CREB1                1               1  6.879623e-01                   1
## ADPGK                1               1  5.867698e-01                   1
## ACTB                 1               1  4.276582e-04                   1
## USP24                1               1  9.955471e-07                   1
## PCDH7                1               1  9.168227e-42                   1
## HMBOX1               1               1  9.901805e-19                   1
##        Dunn.pval_2_Int.adj Dunn.pval_2_1.adj SpaceMarkersMetric
## CREB1                    1      1.000000e+00          -4.629723
## ADPGK                    1      8.954176e-01          -4.295783
## ACTB                     1      1.008052e-03          -3.619841
## USP24                    1      2.826069e-06          -3.564310
## PCDH7                    1      8.644328e-41          -3.412356
## HMBOX1                   1      4.430638e-18          -3.393132

g <- bottom$Gene[1]
p1 <- plotSpatialExpr(
    data = counts_df,gene = g,hotspots = hotspots, 
                   patterns = c("Pattern_1","Pattern_5"))
p2 <- plotSpatialData(
    spatial_data = cbind(spPatterns, counts_df), 
                positions_path = tissue_paths,
                image_path = image_paths,
                jsonPath = scalefactor_paths,
                feature_col = g,
                colors = NULL,
                title = g,
                scaled = FALSE, plot = FALSE)

plot_grid(plotlist = list(p1,p2))

6 Removing Directories

unlink(file.path(data_dir), recursive = TRUE)

7 References

Appendix

Deshpande, Atul, et al. “Uncovering the spatial landscape of molecular interactions within the tumor microenvironment through latent spaces.” Cell Systems 14.4 (2023): 285-301.

“Space Ranger.” Secondary Analysis in R -Software -Spatial Gene Expression - Official 10x Genomics Support, support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/rkit. Accessed 22 Dec. 2023.

1 load10XExpr() Arguments

Argument	Description
visiumDir	A string path to the h5 file with expression information
h5filename	A string of the name of the h5 file in the directory

2 load10XCoords() Arguments

Argument	Description
visiumDir	A path to the location of the the spatial coordinates folder.
resolution	String values to look for in the .json object;lowres or highres.

3 getSpatialParameters() Arguments

Argument	Description
spPatterns	A data frame of spatial coordinates and patterns.
visiumDir	A directory with the spatial and expression data for
the tissue sample
spatialDir	A directory with spatial data for the tissue sample
pattern	A string of the .json filename with the image parameters
sigma	A numeric value specifying the kernel distribution width
threshold	A numeric value specifying the outlier threshold for the
kernel
resolution	A string specifying the image resolution to scale

4 getPairwiseInteractingGenes() Arguments

Argument	Description
data	An expression matrix of genes and columns being the samples.
reconstruction	Latent feature matrix. NULL if ‘DE’ mode is specified
optParams	A matrix of sigmaOpts (width) and the thresOpt (outlierthreshold)
spPatterns	A data frame that contains of spatial coordinates and patterns.
mode	A string of the reference pattern for comparison to other patterns
minOverlap	A string specifying either ‘residual’ or ‘DE’ mode.
hotspotRegions	A value that specifies the minimum pattern overlap. 50 is the default
analysis	A string specifying the type of analysis

sessionInfo()

## R version 4.5.0 beta (2025-04-02 r88102)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] ggplot2_3.5.2      hrbrthemes_0.8.7   viridis_0.6.5      viridisLite_0.4.2 
##  [5] data.table_1.17.0  dplyr_1.1.4        readbitmap_0.1.5   RColorBrewer_1.1-3
##  [9] cowplot_1.1.3      rjson_0.2.23       Matrix_1.7-3       SpaceMarkers_1.5.0
## [13] BiocStyle_2.37.0  
## 
## loaded via a namespace (and not attached):
##  [1] deldir_2.0-4            gridExtra_2.3           rlang_1.1.6            
##  [4] magrittr_2.0.3          matrixStats_1.5.0       compiler_4.5.0         
##  [7] spatstat.geom_3.3-6     png_0.1-8               systemfonts_1.2.2      
## [10] vctrs_0.6.5             reshape2_1.4.4          hdf5r_1.3.12           
## [13] stringr_1.5.1           pkgconfig_2.0.3         crayon_1.5.3           
## [16] fastmap_1.2.0           magick_2.8.6            backports_1.5.0        
## [19] labeling_0.4.3          nanoparquet_0.4.2       rmarkdown_2.29         
## [22] tinytex_0.57            purrr_1.0.4             bit_4.6.0              
## [25] xfun_0.52               cachem_1.1.0            jsonlite_2.0.0         
## [28] goftest_1.2-3           matrixTests_0.2.3       spatstat.utils_3.1-3   
## [31] BiocParallel_1.43.0     jpeg_0.1-11             tiff_0.1-12            
## [34] broom_1.0.8             parallel_4.5.0          R6_2.6.1               
## [37] bslib_0.9.0             stringi_1.8.7           spatstat.data_3.1-6    
## [40] spatstat.univar_3.1-2   car_3.1-3               extrafontdb_1.0        
## [43] jquerylib_0.1.4         Rcpp_1.0.14             bookdown_0.43          
## [46] knitr_1.50              tensor_1.5              extrafont_0.19         
## [49] splines_4.5.0           tidyselect_1.2.1        qvalue_2.41.0          
## [52] abind_1.4-8             yaml_2.3.10             codetools_0.2-20       
## [55] spatstat.random_3.3-3   spatstat.explore_3.4-2  lattice_0.22-7         
## [58] tibble_3.2.1            plyr_1.8.9              withr_3.0.2            
## [61] evaluate_1.0.3          polyclip_1.10-7         pillar_1.10.2          
## [64] BiocManager_1.30.25     carData_3.0-5           generics_0.1.3         
## [67] munsell_0.5.1           scales_1.3.0            glue_1.8.0             
## [70] gdtools_0.4.2           tools_4.5.0             bmp_0.3                
## [73] tidyr_1.3.1             ape_5.8-1               Rttf2pt1_1.3.12        
## [76] colorspace_2.1-1        nlme_3.1-168            Formula_1.2-5          
## [79] cli_3.6.4               spatstat.sparse_3.1-0   fontBitstreamVera_0.1.1
## [82] gtable_0.3.6            rstatix_0.7.2           sass_0.4.10            
## [85] digest_0.6.37           fontquiver_0.2.1        farver_2.1.2           
## [88] htmltools_0.5.8.1       lifecycle_1.0.4         fontLiberation_0.1.0   
## [91] bit64_4.6.0-1

Inferring Immune Interactions in Breast Cancer

2024-01-08