1 Introduction

1.1 Latest: Cardinal 3.6

Cardinal 3.6 is a major update with breaking changes. It brings support for many of the new low-level signal processing functions implemented in matter 2.4 and matter 2.6. Almost the entire Cardinal codebase has been refactored to support these improvements.

The most notable of the new features include:

  • Redesigned class hierarchy with a greater emphasis on spectra: SpectralImagingData, SpectralImagingArrays, and SpectralImagingExperiment lay the groundwork for the new data structures.

  • Updated MSImagingExperiment class with a new counterpart MSImagingArrays class for better representing raw spectra.

  • New spectral processing methods in smooth():

    • Improved Gaussian filtering
    • Bilateral and adaptive bilateral filtering
    • Nonlinear diffusion filtering
    • Guided filtering
    • Peak-aware guided filtering
    • Savitzky-Golay smoothing
  • New spectral baseline reduction methods in reduceBaseline():

    • Interpolation from local minima
    • Convex hull estimation
    • Sensitive nonlinear iterative peak (SNIP) clipping
    • Running medians
  • New spectral alignment methods in recalibrate():

    • Local maxima-based alignment using local regression
    • Dynamic time warping
    • Correlation optimized warping
  • New peak picking methods in peakPick():

    • Derivative-based noise estimation
    • Quantile-based noise estimation
    • SD/MAD-based noise estimation
    • Dynamic peak filtering
    • Continuous wavelet transform (CWT)
  • Improved image() contrast enhancement via enhance:

    • Improved histogram equalization
    • Contrast-limited adaptive histogram equalization (CLAHE)
  • Improved image() spatial smoothing via smooth:

    • Improved Gaussian filtering
    • Bilateral and adaptive bilateral filtering
    • Nonlinear diffusion filtering
    • Guided filtering
  • All statistical methods improved and updated

    • New and improved crossValidate() method
    • New dimension reduction method NMF()
    • Updated PCA() and spatialFastmap()
    • Updated PLS() and OPLS() with new algorithms
    • Updated spatialKMeans() with better initializations
    • Updated spatialShrunkenCentroids() with better initializations
    • Updated spatialDGMM() with improved stability
    • Updated meansTest() with improved data preparation
    • New SpatialResults output with simplified interface

And many other updates! Many redundant functions and methods have been merged to simplify and streamline workflows. Many unnecessary functions and methods have been deprecated.

Major improvements from earlier versions are further described below.

1.2 Previous updates from Cardinal 3

Cardinal 3 lays the groundwork for future improvements to the existing toolbox of pre-processing, visualization, and statistical methods for mass spectrometry (MS) imaging experiments. Cardinal has been updated to support matter 2, and legacy support has been dropped.

Despite minimal user-visible changes in Cardinal (at first), the entire matter package that provides the backend for Cardinal’s computing on larger-than-memory MS imaging datasets has been rewritten. This should provide more robust support for larger-than-memory computations, as well as greater flexibility in handling many data files in the future.

Future point releases of Cardinal 3 will bring further changes aimed at greatly improving the user experience and simplifying the code that users need to write to process and analyze MS imaging data.

Major improvements from earlier versions are further described below.

1.3 Previous updates from Cardinal 2

Cardinal 2 provides new classes and methods for the manipulation, transformation, visualization, and analysis of imaging experiments, specifically MS imaging experiments.

MS imaging is a rapidly advancing field with consistent improvements in instrumentation for both MALDI and DESI imaging experiments. Both mass resolution and spatial resolution are steadily increasing, and MS imaging experiments grow increasingly complex.

The first version of Cardinal was written with certain assumptions about MS imaging data that are no longer true. For example, the basic assumption that the raw spectra can be fully loaded into memory is unreasonable for many MS imaging experiments today.

Cardinal 2 was re-written from the ground up to handle the evolving needs of high-resolution MS imaging experiments. Some advancements include:

  • New imaging experiment classes such as ImagingExperiment, SparseImagingExperiment, and MSImagingExperiment provide better support for out-of-memory datasets and a more flexible representation of complex experiments

  • New imaging metadata classes such as PositionDataFrame and MassDataFrame make it easier to manipulate experimental runs, pixel coordinates, and m/z-values by storing them as separate slots rather than ordinary columns

  • New plot() and image() visualization methods that can handle non-gridded pixel coordinates and allow assigning the resulting plot (and data) to a variable for later re-plotting

  • Support for writing imzML in addition to reading it; more options and support for importing out-of-memory imzML for both “continuous” and “processed” formats

  • Data manipulation and summarization verbs such as subset(), aggregate(), and summarizeFeatures() for easier subsetting and summarization of imaging datasets

  • Delayed pre-processing via a new process() method that allows queueing of delayed pre-processing methods such as normalize() and peakPick() for later execution

  • Parallel processing support via the BiocParallel package for all pre-processing methods and any statistical analysis methods with a BPPARAM option

Classes from older versions of Cardinal should be coerced to their Cardinal 2 equivalents. For example, to return an updated MSImageSet object called x, use as(x, "MSImagingExperiment").

2 Installation

Cardinal can be installed via the BiocManager package.

install.packages("BiocManager")
BiocManager::install("Cardinal")

The same function can be used to update Cardinal and other Bioconductor packages.

Once installed, Cardinal can be loaded with library():

library(Cardinal)

3 Data import

Cardinal natively supports reading and writing imzML (both “continuous” and “processed” types) and Analyze 7.5 formats via the readMSIData() and writeMSIData() functions.

The imzML format is an open standard designed specifically for interchange of mass spectrometry imaging datasets. Vendor-specific raw formats can be converted to imzML with the help of free applications available online.

3.1 Reading “continuous” imzML

We can read an example of a “continuous” imzML file from the CardinalIO package:

path_continuous <- CardinalIO::exampleImzMLFile("continuous")
path_continuous
## [1] "/home/biocbuild/bbs-3.20-bioc/R/site-library/CardinalIO/extdata/Example_Continuous_imzML1.1.1/Example_Continuous.imzML"
mse_tiny <- readMSIData(path_continuous)
mse_tiny
## MSImagingExperiment with 8399 features and 9 spectra 
## spectraData(1): intensity
## featureData(1): mz
## pixelData(3): x, y, run
## coord(2): x = 1...3, y = 1...3
## runNames(1): Example_Continuous
## experimentData(14): spectrumType, spectrumRepresentation, contactName, ..., scanType, lineScanDirection, pixelSize
## mass range: 100.0833 to 799.9167 
## centroided: FALSE

A “continuous” imzML file contains mass spectra where all of the spectra have the same m/z values. It is returned as an MSImagingExperiment object, which contains both the spectra and the experimental metadata.

3.2 Reading “processed” imzML

We can also read an example of a “processed” imzML file from the CardinalIO package:

path_processed <- CardinalIO::exampleImzMLFile("processed")
path_processed
## [1] "/home/biocbuild/bbs-3.20-bioc/R/site-library/CardinalIO/extdata/Example_Processed_imzML1.1.1/Example_Processed.imzML"
msa_tiny <- readMSIData(path_processed)
msa_tiny
## MSImagingArrays with 9 spectra 
## spectraData(2): intensity, mz
## pixelData(3): x, y, run
## coord(2): x = 1...3, y = 1...3
## runNames(1): Example_Processed
## experimentData(14): spectrumType, spectrumRepresentation, contactName, ..., scanType, lineScanDirection, pixelSize
## centroided: FALSE 
## continuous: FALSE

A “processed” imzML file contains mass spectra where each spectrum has its own m/z values. Despite the name, it can still contain profile spectra. For “processed” imzML, the data is returned as an MSImagingArrays object.

4 Data structures for MS imaging

Cardinal 3.6 introduces a simple set of new data structures for organizing data from MS imaging experiments.

[Figure: Cardinal classes]

These are further explored in the next sections.

4.1 MSImagingArrays: Mass spectra with differing m/z values

In Cardinal, mass spectral data with differing m/z values are stored in MSImagingArrays objects.

msa_tiny
## MSImagingArrays with 9 spectra 
## spectraData(2): intensity, mz
## pixelData(3): x, y, run
## coord(2): x = 1...3, y = 1...3
## runNames(1): Example_Processed
## experimentData(14): spectrumType, spectrumRepresentation, contactName, ..., scanType, lineScanDirection, pixelSize
## centroided: FALSE 
## continuous: FALSE

An MSImagingArrays object is conceptually a list of mass spectra with a companion data frame of spectrum-level pixel metadata.

This dataset contains 9 mass spectra. It can be subset like a list:

msa_tiny[1:3]
## MSImagingArrays with 3 spectra 
## spectraData(2): intensity, mz
## pixelData(3): x, y, run
## coord(2): x = 1...3, y = 1...1
## runNames(1): Example_Processed
## experimentData(14): spectrumType, spectrumRepresentation, contactName, ..., scanType, lineScanDirection, pixelSize
## centroided: FALSE 
## continuous: FALSE
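
Because the object behaves like a list, the index can also be computed from the pixel metadata. The following is a minimal sketch that uses only accessors shown in this guide plus base R; the variable names `keep` and `middle_row` are chosen for this example.

```r
library(Cardinal)

# Read the "processed" example file that ships with CardinalIO
path_processed <- CardinalIO::exampleImzMLFile("processed")
msa_tiny <- readMSIData(path_processed)

# Find the spectra in the middle row of the 3 x 3 grid (y == 2)
keep <- which(pixelData(msa_tiny)$y == 2)

# Subset like a list; the pixel metadata is subset along with the spectra
middle_row <- msa_tiny[keep]
pixelData(middle_row)
```

Using which() to build integer indices keeps the subsetting in the same form as the msa_tiny[1:3] call above.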

4.1.1 Accessing spectra arrays with spectraData()

The spectral data can be accessed with spectraData().

spectraData(msa_tiny)
## SpectraArrays of length 2 
##        names(2):   intensity          mz
##        class(2): matter_list matter_list
##       length(2):         <9>         <9>
##     real mem(2):     6.75 KB     6.75 KB
##   shared mem(2):        0 KB        0 KB
##  virtual mem(2):   302.37 KB   302.37 KB

The list of spectral data arrays is stored in a SpectraArrays object. An MSImagingArrays object must have at least two arrays named “mz” and “intensity”, which are lists of the m/z arrays and intensity arrays.

The spectra() accessor can be used to access specific spectra arrays.

spectra(msa_tiny, "mz")
## <9 length> matter_list :: out-of-core list
##              [1]      [2]      [3]      [4]      [5]      [6] ...
## $Scan=1 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
##              [1]      [2]      [3]      [4]      [5]      [6] ...
## $Scan=2 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
##              [1]      [2]      [3]      [4]      [5]      [6] ...
## $Scan=3 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
##              [1]      [2]      [3]      [4]      [5]      [6] ...
## $Scan=4 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
##              [1]      [2]      [3]      [4]      [5]      [6] ...
## $Scan=5 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
##              [1]      [2]      [3]      [4]      [5]      [6] ...
## $Scan=6 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
## ...
## (6.75 KB real | 0 bytes shared | 302.37 KB virtual)
spectra(msa_tiny, "intensity")
## <9 length> matter_list :: out-of-core list
##         [1] [2] [3] [4] [5] [6] ...
## $Scan=1   0   0   0   0   0   0 ...
##         [1] [2] [3] [4] [5] [6] ...
## $Scan=2   0   0   0   0   0   0 ...
##         [1] [2] [3] [4] [5] [6] ...
## $Scan=3   0   0   0   0   0   0 ...
##         [1] [2] [3] [4] [5] [6] ...
## $Scan=4   0   0   0   0   0   0 ...
##         [1] [2] [3] [4] [5] [6] ...
## $Scan=5   0   0   0   0   0   0 ...
##         [1] [2] [3] [4] [5] [6] ...
## $Scan=6   0   0   0   0   0   0 ...
## ...
## (6.75 KB real | 0 bytes shared | 302.37 KB virtual)

Alternatively, we can use the mz() and intensity() accessors to get these specific arrays.

mz(msa_tiny)
## <9 length> matter_list :: out-of-core list
##              [1]      [2]      [3]      [4]      [5]      [6] ...
## $Scan=1 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
##              [1]      [2]      [3]      [4]      [5]      [6] ...
## $Scan=2 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
##              [1]      [2]      [3]      [4]      [5]      [6] ...
## $Scan=3 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
##              [1]      [2]      [3]      [4]      [5]      [6] ...
## $Scan=4 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
##              [1]      [2]      [3]      [4]      [5]      [6] ...
## $Scan=5 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
##              [1]      [2]      [3]      [4]      [5]      [6] ...
## $Scan=6 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
## ...
## (6.75 KB real | 0 bytes shared | 302.37 KB virtual)
intensity(msa_tiny)
## <9 length> matter_list :: out-of-core list
##         [1] [2] [3] [4] [5] [6] ...
## $Scan=1   0   0   0   0   0   0 ...
##         [1] [2] [3] [4] [5] [6] ...
## $Scan=2   0   0   0   0   0   0 ...
##         [1] [2] [3] [4] [5] [6] ...
## $Scan=3   0   0   0   0   0   0 ...
##         [1] [2] [3] [4] [5] [6] ...
## $Scan=4   0   0   0   0   0   0 ...
##         [1] [2] [3] [4] [5] [6] ...
## $Scan=5   0   0   0   0   0   0 ...
##         [1] [2] [3] [4] [5] [6] ...
## $Scan=6   0   0   0   0   0   0 ...
## ...
## (6.75 KB real | 0 bytes shared | 302.37 KB virtual)

Note that the spectra are not fully loaded into memory. Instead, they are represented as out-of-memory matter lists. For the most part, these lists can be treated as ordinary R lists, but the spectra are only loaded from storage on the fly as they are accessed.
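
As a small illustration of this on-demand access, the sketch below extracts a single spectrum with `[[`, which should return ordinary in-memory numeric vectors for that spectrum only. The variable names `mz1` and `int1` are chosen for this example.

```r
library(Cardinal)

path_processed <- CardinalIO::exampleImzMLFile("processed")
msa_tiny <- readMSIData(path_processed)

# Extracting one element reads just that spectrum from storage
mz1 <- mz(msa_tiny)[[1]]
int1 <- intensity(msa_tiny)[[1]]

# The m/z and intensity vectors of a spectrum have the same length
length(mz1) == length(int1)

# Base R functions can then be applied as usual
range(mz1)
sum(int1)
```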

4.1.2 Accessing pixel metadata with pixelData()

The spectrum-level pixel metadata can be accessed with pixelData(). Alternatively, pData() is a shorter alias that does the same thing.

pixelData(msa_tiny)
## PositionDataFrame with 9 rows and 3 columns
##                x         y               run
##        <numeric> <numeric>          <factor>
## Scan=1         1         1 Example_Processed
## Scan=2         2         1 Example_Processed
## Scan=3         3         1 Example_Processed
## Scan=4         1         2 Example_Processed
## Scan=5         2         2 Example_Processed
## Scan=6         3         2 Example_Processed
## Scan=7         1         3 Example_Processed
## Scan=8         2         3 Example_Processed
## Scan=9         3         3 Example_Processed
## coord(2): x, y
## run(1): run
pData(msa_tiny)
## PositionDataFrame with 9 rows and 3 columns
##                x         y               run
##        <numeric> <numeric>          <factor>
## Scan=1         1         1 Example_Processed
## Scan=2         2         1 Example_Processed
## Scan=3         3         1 Example_Processed
## Scan=4         1         2 Example_Processed
## Scan=5         2         2 Example_Processed
## Scan=6         3         2 Example_Processed
## Scan=7         1         3 Example_Processed
## Scan=8         2         3 Example_Processed
## Scan=9         3         3 Example_Processed
## coord(2): x, y
## run(1): run

The pixel metadata is stored in a PositionDataFrame, with a row for each mass spectrum in the dataset. This data frame stores position information, run information, and all other spectrum-level metadata.

The coord() accessor retrieves the columns giving the positions of the spectra.

coord(msa_tiny)
## DataFrame with 9 rows and 2 columns
##                x         y
##        <numeric> <numeric>
## Scan=1         1         1
## Scan=2         2         1
## Scan=3         3         1
## Scan=4         1         2
## Scan=5         2         2
## Scan=6         3         2
## Scan=7         1         3
## Scan=8         2         3
## Scan=9         3         3

Use runNames() to access the names of the experimental runs (by default set to the file name) and run() to access the run for each spectrum.

runNames(msa_tiny)
## [1] "Example_Processed"
head(run(msa_tiny))
## [1] Example_Processed Example_Processed Example_Processed Example_Processed
## [5] Example_Processed Example_Processed
## Levels: Example_Processed

The PositionDataFrame is also used to store any other spectrum-level metadata or statistical summaries.
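
For example, a per-spectrum summary can be computed and stored as a new column. This is a hedged sketch: it assumes that columns can be assigned through pixelData() with `$<-` in the same way as for other Bioconductor data frames, and the column name `tic` is chosen for this example.

```r
library(Cardinal)

path_processed <- CardinalIO::exampleImzMLFile("processed")
msa_tiny <- readMSIData(path_processed)

# Summed intensity of each spectrum, loading one spectrum at a time
n <- nrow(pixelData(msa_tiny))
tic <- vapply(seq_len(n),
    function(i) as.numeric(sum(intensity(msa_tiny)[[i]])),
    numeric(1))

# Store the summary alongside the position and run columns
# (assumes `$<-` assignment is supported on the pixel metadata)
pixelData(msa_tiny)$tic <- tic
pixelData(msa_tiny)
```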

4.2 MSImagingExperiment: Mass spectra with shared m/z values

In Cardinal, mass spectral data with the same m/z values are stored in MSImagingExperiment objects.

mse_tiny
## MSImagingExperiment with 8399 features and 9 spectra 
## spectraData(1): intensity
## featureData(1): mz
## pixelData(3): x, y, run
## coord(2): x = 1...3, y = 1...3
## runNames(1): Example_Continuous
## experimentData(14): spectrumType, spectrumRepresentation, contactName, ..., scanType, lineScanDirection, pixelSize
## mass range: 100.0833 to 799.9167 
## centroided: FALSE

An MSImagingExperiment object is conceptually a matrix where the mass spectra are columns. The rows represent the flattened images for each mass feature.

This dataset contains 9 mass spectra each with the same 8,399 m/z values. It can be subset like a matrix:

mse_tiny[1:500, 1:3]
## MSImagingExperiment with 500 features and 3 spectra 
## spectraData(1): intensity
## featureData(1): mz
## pixelData(3): x, y, run
## coord(2): x = 1...3, y = 1...1
## runNames(1): Example_Continuous
## experimentData(14): spectrumType, spectrumRepresentation, contactName, ..., scanType, lineScanDirection, pixelSize
## mass range: 100.0833 to 141.6667 
## centroided: FALSE

For an MSImagingExperiment, the spectral data are stored as a single matrix of intensities that can be accessed with spectra().

spectraData(mse_tiny)
## SpectraArrays of length 1 
##        names(1):  intensity
##        class(1): matter_mat
##          dim(1): <8399 x 9>
##     real mem(1):    7.16 KB
##   shared mem(1):       0 KB
##  virtual mem(1):  302.37 KB
spectra(mse_tiny)
## <8399 row x 9 col> matter_mat :: out-of-core double matrix
##      Scan=1 Scan=2 Scan=3 Scan=4 Scan=5 Scan=6 ...
## [1,]      0      0      0      0      0      0 ...
## [2,]      0      0      0      0      0      0 ...
## [3,]      0      0      0      0      0      0 ...
## [4,]      0      0      0      0      0      0 ...
## [5,]      0      0      0      0      0      0 ...
## [6,]      0      0      0      0      0      0 ...
## ...     ...    ...    ...    ...    ...    ... ...
## (7.16 KB real | 0 bytes shared | 302.37 KB virtual)

The spectrum-level pixel metadata is accessible via pixelData() just like MSImagingArrays.

pixelData(mse_tiny)
## PositionDataFrame with 9 rows and 3 columns
##                x         y                run
##        <numeric> <numeric>           <factor>
## Scan=1         1         1 Example_Continuous
## Scan=2         2         1 Example_Continuous
## Scan=3         3         1 Example_Continuous
## Scan=4         1         2 Example_Continuous
## Scan=5         2         2 Example_Continuous
## Scan=6         3         2 Example_Continuous
## Scan=7         1         3 Example_Continuous
## Scan=8         2         3 Example_Continuous
## Scan=9         3         3 Example_Continuous
## coord(2): x, y
## run(1): run

The primary difference between MSImagingExperiment and MSImagingArrays is that all of the spectra in an MSImagingExperiment share the same m/z values, so it can also store feature metadata.

4.2.1 Accessing feature metadata with featureData()

The feature metadata can be accessed with featureData(). Alternatively, fData() is a shorter alias that does the same thing.

featureData(mse_tiny)
## MassDataFrame with 8399 rows and 1 column
##             mz
##      <numeric>
## 1      100.083
## 2      100.167
## 3      100.250
## 4      100.333
## 5      100.417
## ...        ...
## 8395   799.583
## 8396   799.667
## 8397   799.750
## 8398   799.833
## 8399   799.917
## mz(1): mz
fData(mse_tiny)
## MassDataFrame with 8399 rows and 1 column
##             mz
##      <numeric>
## 1      100.083
## 2      100.167
## 3      100.250
## 4      100.333
## 5      100.417
## ...        ...
## 8395   799.583
## 8396   799.667
## 8397   799.750
## 8398   799.833
## 8399   799.917
## mz(1): mz

Because all of the mass spectra share the same m/z values, a single vector of m/z values can be accessed using mz().

head(mz(mse_tiny))
## [1] 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000

The MassDataFrame is also used to store any other feature-level metadata or statistical summaries.
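
Because mz() returns a plain numeric vector here, base R is enough to locate features by mass. The sketch below finds the feature nearest a target m/z and subsets a mass window; the target value, the window limits, and the variable names are arbitrary choices for this example.

```r
library(Cardinal)

path_continuous <- CardinalIO::exampleImzMLFile("continuous")
mse_tiny <- readMSIData(path_continuous)

mzs <- mz(mse_tiny)

# Index of the feature closest to an (arbitrary) target m/z
target <- 500
i <- which.min(abs(mzs - target))
mzs[i]

# Row i of the intensity matrix is the flattened ion image for that feature
ion <- spectra(mse_tiny)[i, ]

# Keep only the features inside an (arbitrary) m/z window
in_window <- which(mzs >= 450 & mzs <= 550)
mse_window <- mse_tiny[in_window, ]
mse_window
```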

4.2.2 Building from scratch

Typically data is read into R using readMSIData(), but sometimes it is necessary to build an MSImagingExperiment object from scratch, for example when importing data formats other than imzML or Analyze 7.5.

set.seed(2020, kind="L'Ecuyer-CMRG")
s <- simulateSpectra(n=9, npeaks=10, from=500, to=600)

coord <- expand.grid(x=1:3, y=1:3)
run <- factor(rep("run0", nrow(coord)))

fdata <- MassDataFrame(mz=s$mz)
pdata <- PositionDataFrame(run=run, coord=coord)

out <- MSImagingExperiment(spectraData=s$intensity,
    featureData=fdata,
    pixelData=pdata)
out
## MSImagingExperiment with 456 features and 9 spectra 
## spectraData(1): intensity
## featureData(1): mz
## pixelData(3): x, y, run
## coord(2): x = 1...3, y = 1...3
## runNames(1): run0
## mass range: 500.0000 to 599.8071 
## centroided: NA

For loading other data formats into R, read.csv() and read.table() can be used to read CSV and tab-delimited text files, respectively.

Likewise, write.csv() and write.table() can be used to write pixel metadata and feature metadata after coercing them to an ordinary R data.frame with as.data.frame().
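
The sketch below puts these pieces together under an assumed file layout: a hypothetical CSV with an `mz` column followed by one intensity column per spectrum. The layout, the column names `s1` to `s4`, and the 2 x 2 pixel grid are all assumptions made for this example; real exports will differ.

```r
library(Cardinal)

# Create a small CSV in the assumed layout so the example is self-contained
mzs <- seq(500, 510, by=0.5)
tab <- data.frame(mz=mzs,
    s1=dnorm(mzs, mean=503), s2=dnorm(mzs, mean=505),
    s3=dnorm(mzs, mean=507), s4=dnorm(mzs, mean=505))
csv_in <- tempfile(fileext=".csv")
write.csv(tab, csv_in, row.names=FALSE)

# Read the table and split it into m/z values and an intensity matrix
tab <- read.csv(csv_in)
intensities <- unname(as.matrix(tab[, -1]))

# Pixel coordinates and run labels must be supplied separately
coord <- expand.grid(x=1:2, y=1:2)
run <- factor(rep("run0", nrow(coord)))

fdata <- MassDataFrame(mz=tab$mz)
pdata <- PositionDataFrame(run=run, coord=coord)

mse_csv <- MSImagingExperiment(spectraData=intensities,
    featureData=fdata,
    pixelData=pdata)
mse_csv

# Write the pixel metadata back out as an ordinary data.frame
csv_out <- tempfile(fileext=".csv")
write.csv(as.data.frame(pixelData(mse_csv)), csv_out, row.names=FALSE)
```

The intensity matrix has one row per m/z value and one column per spectrum, matching the rows of the MassDataFrame and the rows of the PositionDataFrame respectively.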

Use saveRDS() and readRDS() to save and read an entire R object such as an MSImagingExperiment. Note that if the intensity data is to be saved as well, it should first be pulled into memory and coerced to an R matrix with as.matrix(). However, it is typically better to write an imzML file using writeMSIData().
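
A minimal sketch of the saveRDS() route follows. It assumes the intensity matrix can be replaced in place with the `spectra<-` setter; the temporary file and the variable names are chosen for this example.

```r
library(Cardinal)

path_continuous <- CardinalIO::exampleImzMLFile("continuous")
mse_tiny <- readMSIData(path_continuous)

# Replace the out-of-memory matrix with an ordinary in-memory matrix
# (assumes the `spectra<-` setter accepts a plain matrix)
spectra(mse_tiny) <- as.matrix(spectra(mse_tiny))

# The object is now self-contained and can be serialized
rds <- tempfile(fileext=".rds")
saveRDS(mse_tiny, rds)

mse_restored <- readRDS(rds)
mse_restored
```

Without the as.matrix() step, the saved object would only hold a reference to the original file on disk rather than the intensities themselves.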

5 Visualization

Visualization of mass spectra and molecular ion images is vital for exploratory analysis of MS imaging experiments. Cardinal provides plot() methods for plotting mass spectra and image() methods for plotting images.

We will use simulated data for visualization. We will create versions of the dataset represented as both MSImagingArrays and MSImagingExperiment.

# Simulate an MSImagingExperiment
set.seed(2020, kind="L'Ecuyer-CMRG")
mse <- simulateImage(preset=6, dim=c(32,32), baseline=0.5)
mse
## MSImagingExperiment with 3879 features and 2048 spectra 
## spectraData(1): intensity
## featureData(1): mz
## pixelData(8): x, y, run, ..., circleA, circleB, condition
## coord(2): x = 1...32, y = 1...32
## runNames(2): runA1, runB1
## metadata(1): design
## mass range:  462.3758 to 2181.0856 
## centroided: FALSE
# Create a version as MSImagingArrays
msa <- convertMSImagingExperiment2Arrays(mse)
msa
## MSImagingArrays with 2048 spectra 
## spectraData(2): intensity, mz
## pixelData(8): x, y, run, ..., circleA, circleB, condition
## coord(2): x = 1...32, y = 1...32
## runNames(2): runA1, runB1
## metadata(1): design
## centroided: FALSE 
## continuous: TRUE

5.1 Visualizing spectra with plot()

Use plot() to plot mass spectra from an MSImagingArrays or MSImagingExperiment object. Below we plot the 496th and 1520th mass spectra in the dataset.

plot(msa, i=c(496, 1520))

Alternatively, we can specify the coordinates.

plot(msa, coord=list(x=16, y=16))
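
To see which spectra a given coordinate refers to, the indices can be looked up from the pixel metadata. This is a sketch using coord() and run() as shown earlier; because the simulated dataset contains two runs, more than one spectrum may match the same coordinates. The variable names `xy` and `i` are chosen for this example.

```r
library(Cardinal)

# Recreate the simulated dataset used in this section
set.seed(2020, kind="L'Ecuyer-CMRG")
mse <- simulateImage(preset=6, dim=c(32,32), baseline=0.5)
msa <- convertMSImagingExperiment2Arrays(mse)

# Look up the indices of the spectra located at x = 16, y = 16
xy <- coord(msa)
i <- which(xy$x == 16 & xy$y == 16)
i

# Check which run each matching spectrum belongs to
run(msa)[i]

# Plot the matching spectra by index
plot(msa, i=i)
```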

We can use superpose to overlay the mass spectra and xlim to control the mass range.

plot(msa, i=c(496, 1520), xlim=c(1000, 1250),
    superpose=TRUE)