R version: R Under development (unstable) (2024-01-16 r85808)
Bioconductor version: 3.19
Understanding the interplay between different types of cells and their immediate environment is critical for understanding the mechanisms of cells themselves and their function in the context of human diseases. Recent advances in high dimensional in situ cytometry technologies have fundamentally revolutionised our ability to observe these complex cellular relationships providing an unprecedented characterisation of cellular heterogeneity in a tissue environment.
We have developed an analytical framework for analysing data from high dimensional in situ cytometry assays including CODEX, CycIF, IMC and High Definition Spatial Transcriptomics. Implemented in R, this framework makes use of functionality from our Bioconductor packages spicyR, lisaClust, treekoR, FuseSOM, simpleSeg and ClassifyR. Below we will provide an overview of key steps which are needed to interrogate the comprehensive spatial information generated by these exciting new technologies including cell segmentation, feature normalisation, cell type identification, micro-environment characterisation, spatial hypothesis testing and patient classification. Ultimately, our modular analysis framework provides a cohesive and accessible entry point into spatially resolved single cell data analysis for any R-based bioinformaticians.
To install the current release of spicyWorkflow, run the following code.
if (!require("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("spicyWorkflow")
library(cytomapper)
library(dplyr)
library(ggplot2)
library(simpleSeg)
library(FuseSOM)
library(ggpubr)
library(scater)
library(spicyR)
library(ClassifyR)
library(lisaClust)
library(tidySingleCellExperiment)
It is convenient to set the number of cores for running code in parallel. Please choose a number that is appropriate for your resources. Set the use_mc
flag to TRUE
if you would like to use parallel processing for the rest of the vignette. A minimum of 2 cores is suggested since running this workflow is rather computationally intensive.
use_mc <- FALSE
if (use_mc) {
nCores <- max(parallel::detectCores() - 1, 1)
} else {
nCores <- 2
}
BPPARAM <- simpleSeg:::generateBPParam(nCores)
theme_set(theme_classic())
In the following we will re-analyse some MIBI-TOF data (Risom et al, 2022) profiling the spatial landscape of ductal carcinoma in situ (DCIS), which is a pre-invasive lesion that is thought to be a precursor to invasive breast cancer (IBC). The key conclusion of this manuscript (amongst others) is that spatial information about cells can be used to predict disease progression in patients. We will use our spicy workflow to make a similar conclusion.
The R code for this analysis is available on github https://github.com/SydneyBioX/spicyWorkflow. A mildly processed version of the data used in the manuscript is available in this repository.
The images are stored in the images
folder within the data
folder. Here we use loadImages()
from the cytomapper
package to load all the tiff images into a CytoImageList
object and store the images as h5 file on-disk.
pathToImages <- system.file("extdata/images", package = "spicyWorkflow")
# Store images in a CytoImageList on_disk as h5 files to save memory.
images <- cytomapper::loadImages(
pathToImages,
single_channel = TRUE,
on_disk = TRUE,
h5FilesPath = HDF5Array::getHDF5DumpDir(),
BPPARAM = BPPARAM
)
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 11873826 634.2 20135064 1075.4 13752639 734.5
## Vcells 19748219 150.7 33900562 258.7 23419643 178.7
To associate features in our image with disease progression, it is important to read in information which links image identifiers to their progression status. We will do this here, making sure that our imageID
match.
## Read the clinical data
# Read in clinical data, manipulate imageID and select columns
clinical <- read.csv(
system.file(
"extdata/1-s2.0-S0092867421014860-mmc1.csv",
package = "spicyWorkflow"
)
)
clinical <- clinical |>
mutate(imageID = paste0(
"Point", PointNumber, "_pt", Patient_ID, "_", TMAD_Patient
))
image_idx <- grep("normal", clinical$Tissue_Type)
clinical$imageID[image_idx] <- paste0(clinical$imageID[image_idx], "_Normal")
clinicalVariables <- c(
"imageID", "Patient_ID", "Status", "Age", "SUBTYPE", "PAM50", "Treatment",
"DCIS_grade", "Necrosis"
)
rownames(clinical) <- clinical$imageID
We can then store the clinical information in the mcols
of the CytoImageList
.
# Add the clinical data to mcols of images.
mcols(images) <- clinical[names(images), clinicalVariables]
Our simpleSeg R package on https://github.com/SydneyBioX/simpleSeg provides a series of functions to generate simple segmentation masks of images. These functions leverage the functionality of the EBImage package on Bioconductor. For more flexibility when performing your segmentation in R we recommend learning to use the EBimage package. A key strength of the simpleSeg package is that we have coded multiple ways to perform some simple segmentation operations as well as incorporating multiple automatic procedures to optimise some key parameters when these aren’t specified.
If your images are stored in a list
or CytoImageList
they can be segmented with a simple call to simpleSeg()
. Here we have ask simpleSeg
to do multiple things. First, we would like to use a combination of principal component analysis of all channels guided by the H33 channel to summarise the nuclei signal in the images. Secondly, to estimate the cell body of the cells we will simply dilate out from the nuclei by 2 pixels. We have also requested that the channels be square root transformed and that a minimum cell size of 40 pixels be used as a size selection step.
# Generate segmentation masks
masks <- simpleSeg(
images,
nucleus = c("HH3"),
cellBody = "dilate",
transform = "sqrt",
sizeSelection = 40,
discSize = 2,
pca = TRUE,
cores = nCores
)
The display
and colorLabels
functions in EBImage
make it very easy to examine the performance of the cell segmentation. The great thing about display
is that if used in an interactive session it is very easy to zoom in and out of the image.
# Visualise segmentation performance one way.
EBImage::display(colorLabels(masks[[1]]))
The plotPixels
function in cytomapper
make it easy to overlay the masks on top of the intensities of 6 markers. Here we can see that the segmentation appears to be performing reasonably.
# Visualise segmentation performance another way.
cytomapper::plotPixels(
image = images[1],
mask = masks[1],
img_id = "imageID",
colour_by = c("PanKRT", "GLUT1", "HH3", "CD3", "CD20"),
display = "single",
colour = list(
HH3 = c("black", "blue"),
CD3 = c("black", "purple"),
CD20 = c("black", "green"),
GLUT1 = c("black", "red"),
PanKRT = c("black", "yellow")
),
bcg = list(
HH3 = c(0, 1, 1.5),
CD3 = c(0, 1, 1.5),
CD20 = c(0, 1, 1.5),
GLUT1 = c(0, 1, 1.5),
PanKRT = c(0, 1, 1.5)
),
legend = NULL
)