Cardinal 3 provides statistical methods for both supervised and unsupervised analysis of mass spectrometry (MS) imaging experiments. Class comparison can also be performed, given an appropriate experimental design and a sufficient sample size.
Before statistical analysis, it is important to identify the statistical goal of the experiment:
Unsupervised analysis. The data has no class labels or conditions, and we are interested in exploratory analysis to discover regions of interest in the data.
Supervised analysis. The data has class labels and we want to train a statistical or machine learning model to predict the class labels of new data.
Class comparison. The data has class labels or conditions, and we want to test whether the abundance of the mass features is different between conditions.
CardinalWorkflows provides real experimental data and more detailed discussion of the statistical methods than will be covered in this brief overview.
Suppose we are exploring an unlabeled dataset, and wish to understand the structure of the data.
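# Fix the random seed so the simulated image is reproducible
set.seed(2020, kind="L'Ecuyer-CMRG")
# Simulate a 32 x 32 centroided image
mse <- simulateImage(preset=2, dim=c(32,32), sdnoise=0.5,
    peakheight=c(2,4), centroided=TRUE)
# Label each pixel as circle, square, or background
mse$design <- makeFactor(circle=mse$circle,
    square=mse$square, bg=!(mse$circle | mse$square))
# Plot the ground-truth regions
image(mse, "design")
# Plot ion images for features 5, 13, and 21 side by side
image(mse, i=c(5, 13, 21), layout=c(1,3))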
Principal components analysis (PCA) is an unsupervised dimension reduction technique. It reduces the data to a chosen number of “principal components”, each of which is a linear combination of the original mass features. Each component is orthogonal to all of the preceding components and explains as much of the remaining variance in the data as possible.
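The properties described above can be checked on a small example. The following is a minimal sketch using base R's prcomp() on a made-up matrix, not Cardinal's PCA(); the matrix X, the two latent sources, and the mixing weights are all hypothetical and exist only for illustration.

```r
# Base-R illustration only; Cardinal's PCA() is not used here.
set.seed(1)
n <- 200                                      # hypothetical number of pixels
latent <- matrix(rnorm(n * 2), n, 2)          # two hidden sources of variation
mixing <- matrix(runif(2 * 6, -1, 1), 2, 6)   # hypothetical weights onto 6 features
X <- latent %*% mixing + matrix(rnorm(n * 6, sd = 0.1), n, 6)

fit <- prcomp(X, center = TRUE, scale. = FALSE)

# Loadings are orthonormal: crossprod(R) = t(R) %*% R is the identity matrix
ortho <- crossprod(fit$rotation)

# Scores are the centered data projected onto the loadings
scores <- scale(X, center = TRUE, scale = FALSE) %*% fit$rotation

# Component variances are sorted from largest to smallest
fit$sdev^2
```

Because the simulated data are driven by two latent sources, the first two variances should dominate, which mirrors the situation in the simulated image used in this section.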
Use PCA() to perform PCA on an MSImagingExperiment.
pca <- PCA(mse, ncomp=3)
pca
## SpatialPCA on 30 variables and 1024 observations
## names(5): sdev, rotation, center, scale, x
## coord(2): x = 1...32, y = 1...32
## runNames(1): run0
## modelData(): Principal components (k=3)
##
## Standard deviations (1, .., k=3):
## PC1 PC2 PC3
## 7.031542 3.516199 1.092932
##
## Rotation (n x k) = (30 x 3):
## PC1 PC2 PC3
## [1,] -0.03141217 0.21197865 0.03941824
## [2,] -0.02743754 0.19152844 0.16421233
## [3,] -0.02974002 0.19314984 0.11896429
## [4,] -0.05048566 0.32818833 -0.04828145
## [5,] -0.05499438 0.34063726 -0.22523541
## [6,] -0.06129265 0.39304819 -0.18998119
## ... ... ... ...
The standard deviations show that the first 2 principal components explain most of the variation captured by the 3 components.
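To put a number on this, the standard deviations can be squared and normalized. The sketch below copies the three values from the printed summary above rather than extracting them from the pca object, so the proportions are relative to the three retained components only, not to the total variance of all 30 features.

```r
# Standard deviations copied by hand from the printed PCA summary
sdev <- c(PC1 = 7.031542, PC2 = 3.516199, PC3 = 1.092932)

# Variance of each component is the square of its standard deviation
variance <- sdev^2

# Share of variance among the three retained components only
prop <- variance / sum(variance)
round(prop, 3)

# Cumulative share as components are added
cumsum(prop)
```

Under this normalization, the first two components together account for about 98% of the variance captured by the three components.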
image(pca, type="x", superpose=FALSE, layout=c(1,3), scale=TRUE)
The loadings of the components show how each feature contributes to each component.
plot(pca, type="rotation", superpose=FALSE, layout=c(1,3), linewidth=2)
Plotting the principal component scores against each other is a useful way of visualizing the separation between data classes.
plot(pca, type="x", groups=mse$design, linewidth=2)
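The same idea can be sketched outside of Cardinal. The example below uses base R's prcomp() and plot() on a made-up two-class dataset; the labels in group, the matrix X, and the size of the shift are hypothetical, chosen only so that the classes separate along the first component.

```r
# Base-R illustration only; the data and class labels are made up.
set.seed(2)
group <- factor(rep(c("a", "b"), each = 50))   # hypothetical class labels
X <- matrix(rnorm(100 * 5), 100, 5)

# Shift class "b" in the first two features so the classes differ
X[group == "b", 1:2] <- X[group == "b", 1:2] + 4

# Keep the scores for the first two principal components
scores <- prcomp(X)$x[, 1:2]

# Scatter plot of PC1 against PC2, colored by class
plot(scores, col = as.integer(group), pch = 19, xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(group),
    col = seq_along(levels(group)), pch = 19)

# Mean score of each class on each component
centroids <- aggregate(scores, by = list(group = group), FUN = mean)
centroids
```

A large gap between the class centroids on a component, relative to the spread within each class, is what appears as visible separation in the scatter plot.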