Package: xcms
Authors: Johannes Rainer, Michael Witting
Modified: 2023-10-20 14:12:46.866469
Compiled: Mon Mar 18 19:55:44 2024

1 Introduction

Metabolite identification is an important step in non-targeted metabolomics and requires different steps. One involves the use of tandem mass spectrometry to generate fragmentation spectra of detected metabolites (LC-MS/MS), which are then compared to fragmentation spectra of known metabolites. Different approaches exist for the generation of these fragmentation spectra, whereas the most used is data dependent acquisition (DDA) also known as the top-n method. In this method the top N most intense ions (m/z values) from a MS1 scan are selected for fragmentation in the next N scans before the cycle starts again. This method allows to generate clean MS2 fragmentation spectra on the fly during acquisition without the need for further experiments, but suffers from poor coverage of the detected metabolites (since only a limited number of ions are fragmented) and less accurate quantification of the compounds (since fewer MS1 scans are generated).

Data independent approaches (DIA) like Bruker bbCID, Agilent AllIons or Waters MSe don’t use such a preselection, but rather fragment all detected molecules at once. They are using alternating schemes with scan of low and high collision energy to collect MS1 and MS2 data. Using this approach, there is no problem in coverage, but the relation between the precursor and fragment masses is lost leading to chimeric spectra. Sequential Window Acquisition of all Theoretical Mass Spectra (or SWATH [1]) combines both approaches through a middle-way approach. There is no precursor selection and acquisition is independent of acquired data, but rather than isolating all precusors at once, defined windows (i.e. ranges of m/z values) are used and scanned. This reduces the overlap of fragment spectra while still keeping a high coverage.

This document showcases the analysis of two small LC-MS/MS data sets using xcms. The data files used are reversed-phase LC-MS/MS runs from the Agilent Pesticide mix obtained from a Sciex 6600 Triple ToF operated in SWATH acquisition mode. For comparison a DDA file from the same sample is included.

2 Analysis of DDA data

Below we load the example DDA data set and create a total ion chromatogram of its MS1 data.

library(xcms)
library(MsExperiment)

dda_file <- system.file("TripleTOF-SWATH", "PestMix1_DDA.mzML",
                        package = "msdata")
dda_data <- readMsExperiment(dda_file)
chr <- chromatogram(dda_data, aggregationFun = "sum", msLevel = 1L)
## Warning in as.data.frame.integer(col, optional = optional): Direct call of
## 'as.data.frame.integer()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead

According to the TIC most of the signal is measured between ~ 200 and 600 seconds (see plot below). We thus filter the DDA data to this retention time range.

plot(chr)
abline(v = c(230, 610))

## filter the data
dda_data <- filterRt(dda_data, rt = c(230, 610))
## Filter spectra

The variable dda_data contains now all MS1 and MS2 spectra from the specified mzML file. The number of spectra for each MS level is listed below. Note that we subset the experiment to the first data file (using [1]) and then access directly the spectra within this sample with the spectra function (which returns a Spectra object from the Spectra package). Note that we use the pipe operator |> for better readability.

dda_data[1] |>
spectra() |>
msLevel() |>
table()
## 
##    1    2 
## 1389 2214

For the MS2 spectra we can get the m/z of the precursor ion with the precursorMz function. Below we first subset the data set again to a single sample and filter to spectra from MS level 2 extracting then their precursor m/z values.

dda_data[1] |>
spectra() |>
filterMsLevel(2) |>
precursorMz() |>
head()
## [1] 182.18777 182.18893  55.00579 182.19032 237.12296 237.11987

With the precursorIntensity function it is also possible to extract the intensity of the precursor ion.

dda_data[1] |>
spectra() |>
filterMsLevel(2) |>
precursorIntensity() |>
head()
## [1] 0 0 0 0 0 0

Some manufacturers (like Sciex) don’t define/export the precursor intensity and thus either NA or 0 is reported. We can however use the estimatePrecursorIntensity function from the Spectra package to determine the precursor intensity for a MS 2 spectrum based on the intensity of the respective ion in the previous MS1 scan (note that with method = "interpolation" the precursor intensity would be defined based on interpolation between the intensity in the previous and subsequent MS1 scan). Below we estimate the precursor intensities, on the full data (for MS1 spectra a NA value is reported).

prec_int <- estimatePrecursorIntensity(spectra(dda_data))

We next set the precursor intensity in the spectrum metadata of dda_data. So that it can be extracted later with the precursorIntensity function.

spectra(dda_data)$precursorIntensity <- prec_int

dda_data[1] |>
spectra() |>
filterMsLevel(2) |>
precursorIntensity() |>
head()
## [1]        NA  9.198155  2.773988 27.590797  3.443145  7.621923

Next we perform the chromatographic peak detection on the MS level 1 data with the findChromPeaks method. Below we define the settings for a centWave-based peak detection and perform the analysis.

cwp <- CentWaveParam(snthresh = 5, noise = 100, ppm = 10,
                     peakwidth = c(3, 30))
dda_data <- findChromPeaks(dda_data, param = cwp, msLevel = 1L)

In total 114 peaks were identified in the present data set.

The advantage of LC-MS/MS data is that (MS1) ions are fragmented and the corresponding MS2 spectra measured. Thus, for some of the ions (identified as MS1 chromatographic peaks) MS2 spectra are available. These can facilitate the annotation of the respective MS1 chromatographic peaks (or MS1 features after a correspondence analysis). Spectra for identified chromatographic peaks can be extracted with the chromPeakSpectra method. MS2 spectra with their precursor m/z and retention time within the rt and m/z range of the chromatographic peak are returned.

library(Spectra)
dda_spectra <- chromPeakSpectra(dda_data, msLevel = 2L)
dda_spectra
## MSn data (Spectra) with 142 spectra in a MsBackendMzR backend:
##       msLevel     rtime scanIndex
##     <integer> <numeric> <integer>
## 1           2   237.869      1812
## 2           2   241.299      1846
## 3           2   325.763      2439
## 4           2   326.583      2446
## 5           2   327.113      2450
## ...       ...       ...       ...
## 138         2   574.725      5110
## 139         2   575.255      5115
## 140         2   596.584      5272
## 141         2   592.424      5236
## 142         2   596.054      5266
##  ... 34 more variables/columns.
## 
## file(s):
## PestMix1_DDA.mzML
## Processing:
##  Filter: select retention time [230..610] on MS level(s) 1 2 [Mon Mar 18 19:55:46 2024]
##  Filter: select MS level(s) 2 [Mon Mar 18 19:55:56 2024]
##  Merge 1 Spectra into one [Mon Mar 18 19:55:56 2024]

By default chromPeakSpectra returns all spectra associated with a MS1 chromatographic peak, but parameter method allows to choose and return only one spectrum per peak (have a look at the ?chromPeakSpectra help page for more details). Also, it would be possible to extract MS1 spectra for each peak by specifying msLevel = 1L in the call above (e.g. to evaluate the full MS1 signal at the peak’s apex position).

The returned Spectra contains also the reference to the respective chromatographic peak as additional spectra variable "peak_id" that contains the identifier for the chromatographic peak (i.e. its row name in the chromPeaks matrix).

dda_spectra$peak_id
##   [1] "CP004" "CP004" "CP005" "CP005" "CP006" "CP006" "CP008" "CP008" "CP011"
##  [10] "CP011" "CP012" "CP012" "CP013" "CP013" "CP013" "CP013" "CP014" "CP014"
##  [19] "CP014" "CP014" "CP018" "CP022" "CP022" "CP022" "CP025" "CP025" "CP025"
##  [28] "CP025" "CP026" "CP026" "CP026" "CP026" "CP033" "CP033" "CP034" "CP034"
##  [37] "CP034" "CP034" "CP034" "CP035" "CP035" "CP035" "CP041" "CP041" "CP041"
##  [46] "CP042" "CP042" "CP042" "CP043" "CP047" "CP047" "CP049" "CP049" "CP049"
##  [55] "CP049" "CP050" "CP050" "CP050" "CP051" "CP051" "CP051" "CP054" "CP055"
##  [64] "CP055" "CP055" "CP056" "CP056" "CP056" "CP056" "CP056" "CP060" "CP060"
##  [73] "CP060" "CP060" "CP064" "CP064" "CP065" "CP065" "CP066" "CP066" "CP069"
##  [82] "CP069" "CP069" "CP070" "CP070" "CP070" "CP072" "CP072" "CP072" "CP073"
##  [91] "CP074" "CP074" "CP074" "CP074" "CP075" "CP075" "CP075" "CP077" "CP077"
## [100] "CP077" "CP079" "CP079" "CP079" "CP079" "CP080" "CP080" "CP080" "CP081"
## [109] "CP086" "CP086" "CP086" "CP086" "CP086" "CP088" "CP088" "CP088" "CP089"
## [118] "CP089" "CP091" "CP091" "CP093" "CP093" "CP094" "CP094" "CP094" "CP095"
## [127] "CP095" "CP095" "CP096" "CP096" "CP096" "CP098" "CP098" "CP098" "CP098"
## [136] "CP098" "CP099" "CP099" "CP099" "CP100" "CP101" "CP101"

Note also that with return.type = "List" a list parallel to the chromPeaks matrix would be returned, i.e. each element in that list would contain the spectra for the chromatographic peak with the same index. Such data representation might eventually simplify further processing.

We next use the MS2 information to aid in the annotation of a chromatographic peak. As an example we use a chromatographic peak of an ion with an m/z of 304.1131 which we extract in the code block below.

ex_mz <- 304.1131
chromPeaks(dda_data, mz = ex_mz, ppm = 20)
##             mz    mzmin    mzmax      rt   rtmin   rtmax    into     intb
## CP056 304.1133 304.1126 304.1143 425.024 417.985 441.773 13040.8 13007.79
##           maxo  sn sample
## CP056 3978.987 232      1

A search of potential ions with a similar m/z in a reference database (e.g. Metlin) returned a large list of potential hits, most with a very small ppm. For two of the hits, Flumazenil (Metlin ID 2724) and Fenamiphos (Metlin ID 72445) experimental MS2 spectra are available. Thus, we could match the MS2 spectrum for the identified chromatographic peak against these to annotate our ion. Below we extract all MS2 spectra that were associated with the candidate chromatographic peak using the ID of the peak in the present data set.

ex_id <- rownames(chromPeaks(dda_data, mz = ex_mz, ppm = 20))
ex_spectra <- dda_spectra[dda_spectra$peak_id == ex_id]
ex_spectra
## MSn data (Spectra) with 5 spectra in a MsBackendMzR backend:
##     msLevel     rtime scanIndex
##   <integer> <numeric> <integer>
## 1         2   418.926      3505
## 2         2   419.306      3510
## 3         2   423.036      3582
## 4         2   423.966      3603
## 5         2   424.296      3609
##  ... 34 more variables/columns.
## 
## file(s):
## PestMix1_DDA.mzML
## Processing:
##  Filter: select retention time [230..610] on MS level(s) 1 2 [Mon Mar 18 19:55:46 2024]
##  Filter: select MS level(s) 2 [Mon Mar 18 19:55:56 2024]
##  Merge 1 Spectra into one [Mon Mar 18 19:55:56 2024]

There are 5 MS2 spectra representing fragmentation of the ion(s) measured in our candidate chromatographic peak. We next reduce this to a single MS2 spectrum using the combineSpectra method employing the combinePeaks function to determine which peaks to keep in the resulting spectrum (have a look at the ?combinePeaks help page for details). Parameter f allows to specify which spectra in the input object should be combined into one. Note that this combination of multiple fragment spectra into a single spectrum might not be generally the best approach or suggested for all types of data.

ex_spectrum <- combineSpectra(ex_spectra, FUN = combinePeaks, ppm = 20,
                              peaks = "intersect", minProp = 0.8,
                              intensityFun = median, mzFun = median,
                              f = ex_spectra$peak_id)
## Warning in as.data.frame.integer(col, optional = optional): Direct call of
## 'as.data.frame.integer()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead
## Warning in as.data.frame.numeric(col, optional = optional): Direct call of
## 'as.data.frame.numeric()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead
## Warning in as.data.frame.integer(col, optional = optional): Direct call of
## 'as.data.frame.integer()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead

## Warning in as.data.frame.integer(col, optional = optional): Direct call of
## 'as.data.frame.integer()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead
## Warning in as.data.frame.logical(col, optional = optional): Direct call of
## 'as.data.frame.logical()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead

## Warning in as.data.frame.logical(col, optional = optional): Direct call of
## 'as.data.frame.logical()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead
## Warning in as.data.frame.integer(col, optional = optional): Direct call of
## 'as.data.frame.integer()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead

## Warning in as.data.frame.integer(col, optional = optional): Direct call of
## 'as.data.frame.integer()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead
## Warning in as.data.frame.numeric(col, optional = optional): Direct call of
## 'as.data.frame.numeric()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead

## Warning in as.data.frame.numeric(col, optional = optional): Direct call of
## 'as.data.frame.numeric()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead
## Warning in as.data.frame.integer(col, optional = optional): Direct call of
## 'as.data.frame.integer()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead
## Warning in as.data.frame.numeric(col, optional = optional): Direct call of
## 'as.data.frame.numeric()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead

## Warning in as.data.frame.numeric(col, optional = optional): Direct call of
## 'as.data.frame.numeric()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead

## Warning in as.data.frame.numeric(col, optional = optional): Direct call of
## 'as.data.frame.numeric()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead

## Warning in as.data.frame.numeric(col, optional = optional): Direct call of
## 'as.data.frame.numeric()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead
## Warning in as.data.frame.integer(col, optional = optional): Direct call of
## 'as.data.frame.integer()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead
## Warning in as.data.frame.numeric(col, optional = optional): Direct call of
## 'as.data.frame.numeric()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead

## Warning in as.data.frame.numeric(col, optional = optional): Direct call of
## 'as.data.frame.numeric()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead

## Warning in as.data.frame.numeric(col, optional = optional): Direct call of
## 'as.data.frame.numeric()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead

## Warning in as.data.frame.numeric(col, optional = optional): Direct call of
## 'as.data.frame.numeric()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead

## Warning in as.data.frame.numeric(col, optional = optional): Direct call of
## 'as.data.frame.numeric()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead

## Warning in as.data.frame.numeric(col, optional = optional): Direct call of
## 'as.data.frame.numeric()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead
## Warning in as.data.frame.integer(col, optional = optional): Direct call of
## 'as.data.frame.integer()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead

## Warning in as.data.frame.integer(col, optional = optional): Direct call of
## 'as.data.frame.integer()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead

## Warning in as.data.frame.integer(col, optional = optional): Direct call of
## 'as.data.frame.integer()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead

## Warning in as.data.frame.integer(col, optional = optional): Direct call of
## 'as.data.frame.integer()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead
## Warning in as.data.frame.numeric(col, optional = optional): Direct call of
## 'as.data.frame.numeric()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead

## Warning in as.data.frame.numeric(col, optional = optional): Direct call of
## 'as.data.frame.numeric()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead

## Warning in as.data.frame.numeric(col, optional = optional): Direct call of
## 'as.data.frame.numeric()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead

## Warning in as.data.frame.numeric(col, optional = optional): Direct call of
## 'as.data.frame.numeric()' is deprecated.  Use 'as.data.frame.vector()' or
## 'as.data.frame()' instead
## Warning in FUN(X[[i]], ...): 'combinePeaks' for lists of peak matrices is
## deprecated; please use 'combinePeaksData' instead.
ex_spectrum
## MSn data (Spectra) with 1 spectra in a MsBackendMemory backend:
##     msLevel     rtime scanIndex
##   <integer> <numeric> <integer>
## 1         2   418.926      3505
##  ... 34 more variables/columns.
## Processing:
##  Filter: select retention time [230..610] on MS level(s) 1 2 [Mon Mar 18 19:55:46 2024]
##  Filter: select MS level(s) 2 [Mon Mar 18 19:55:56 2024]
##  Merge 1 Spectra into one [Mon Mar 18 19:55:56 2024]
##  ...1 more processings. Use 'processingLog' to list all.

Mass peaks from all input spectra with a difference in m/z smaller 20 ppm (parameter ppm) were combined into one peak and the median m/z and intensity is reported for these. Due to parameter minProp = 0.8, the resulting MS2 spectrum contains only peaks that were present in 80% of the input spectra.

A plot of this consensus spectrum is shown below.

plotSpectra(ex_spectrum)
Consensus MS2 spectrum created from all measured MS2 spectra for ions of chromatographic peak CP53.

Figure 1: Consensus MS2 spectrum created from all measured MS2 spectra for ions of chromatographic peak CP53

We could now match the consensus spectrum against a database of MS2 spectra. In our example we simply load MS2 spectra for the two compounds with matching m/z exported from Metlin. For each of the compounds MS2 spectra created with collision energies of 0V, 10V, 20V and 40V are available. Below we import the respective data and plot our candidate spectrum against the MS2 spectra of Flumanezil and Fenamiphos (from a collision energy of 20V). To import files in MGF format we have to load the MsBackendMgf Bioconductor package which adds MGF file support to the Spectra package.

Prior plotting we scale our experimental spectra to replace all peak intensities with values relative to the maximum peak intensity (which is set to a value of 100).

scale_fun <- function(z, ...) {
    z[, "intensity"] <- z[, "intensity"] /
        max(z[, "intensity"], na.rm = TRUE) * 100
    z
}
ex_spectrum <- addProcessing(ex_spectrum, FUN = scale_fun)
library(MsBackendMgf)
flumanezil <- Spectra(
    system.file("mgf", "metlin-2724.mgf", package = "xcms"),
    source = MsBackendMgf())
## Start data import from 1 files ... done
fenamiphos <- Spectra(
    system.file("mgf", "metlin-72445.mgf", package = "xcms"),
    source = MsBackendMgf())
## Start data import from 1 files ... done
par(mfrow = c(1, 2))
plotSpectraMirror(ex_spectrum, flumanezil[3], main = "against Flumanezil",
                  ppm = 40)
plotSpectraMirror(ex_spectrum, fenamiphos[3], main = "against Fenamiphos",
                  ppm = 40)