xcms 4.5.0
Package: xcms
Authors: Johannes Rainer, Michael Witting
Modified: 2024-10-23 19:24:55.947541
Compiled: Tue Oct 29 20:07:48 2024
Metabolite identification is an important step in non-targeted metabolomics and requires different steps. One involves the use of tandem mass spectrometry to generate fragmentation spectra of detected metabolites (LC-MS/MS), which are then compared to fragmentation spectra of known metabolites. Different approaches exist for the generation of these fragmentation spectra, whereas the most used is data dependent acquisition (DDA) also known as the top-n method. In this method the top N most intense ions (m/z values) from a MS1 scan are selected for fragmentation in the next N scans before the cycle starts again. This method allows to generate clean MS2 fragmentation spectra on the fly during acquisition without the need for further experiments, but suffers from poor coverage of the detected metabolites (since only a limited number of ions are fragmented) and less accurate quantification of the compounds (since fewer MS1 scans are generated).
Data independent approaches (DIA) like Bruker bbCID, Agilent AllIons or Waters MSe don’t use such a preselection, but rather fragment all detected molecules at once. They are using alternating schemes with scan of low and high collision energy to collect MS1 and MS2 data. Using this approach, there is no problem in coverage, but the relation between the precursor and fragment masses is lost leading to chimeric spectra. Sequential Window Acquisition of all Theoretical Mass Spectra (or SWATH [1]) combines both approaches through a middle-way approach. There is no precursor selection and acquisition is independent of acquired data, but rather than isolating all precusors at once, defined windows (i.e. ranges of m/z values) are used and scanned. This reduces the overlap of fragment spectra while still keeping a high coverage.
This document showcases the analysis of two small LC-MS/MS data sets using xcms. The data files used are reversed-phase LC-MS/MS runs from the Agilent Pesticide mix obtained from a Sciex 6600 Triple ToF operated in SWATH acquisition mode. For comparison a DDA file from the same sample is included.
Below we load the example DDA data set and create a total ion chromatogram of its MS1 data.
library(xcms)
library(MsExperiment)
dda_file <- system.file("TripleTOF-SWATH", "PestMix1_DDA.mzML",
package = "msdata")
dda_data <- readMsExperiment(dda_file)
chr <- chromatogram(dda_data, aggregationFun = "sum", msLevel = 1L)
According to the TIC most of the signal is measured between ~ 200 and 600 seconds (see plot below). We thus filter the DDA data to this retention time range.
plot(chr)
abline(v = c(230, 610))
## filter the data
dda_data <- filterRt(dda_data, rt = c(230, 610))
## Filter spectra
The variable dda_data
contains now all MS1 and MS2 spectra from the specified
mzML file. The number of spectra for each MS level is listed below. Note that we
subset the experiment to the first data file (using [1]
) and then access
directly the spectra within this sample with the spectra()
function (which
returns a Spectra
object from the Spectra package). Note that
we use the pipe operator |>
for better readability.
dda_data[1] |>
spectra() |>
msLevel() |>
table()
##
## 1 2
## 1389 2214
For the MS2 spectra we can get the m/z of the precursor ion with the
precursorMz()
function. Below we first subset the data set again to a single
sample and filter to spectra from MS level 2 extracting then their precursor m/z
values.
dda_data[1] |>
spectra() |>
filterMsLevel(2) |>
precursorMz() |>
head()
## [1] 182.18777 182.18893 55.00579 182.19032 237.12296 237.11987
With the precursorIntensity()
function it is also possible to extract the
intensity of the precursor ion.
dda_data[1] |>
spectra() |>
filterMsLevel(2) |>
precursorIntensity() |>
head()
## [1] 0 0 0 0 0 0
Some manufacturers (like Sciex) don’t define/export the precursor intensity and
thus either NA
or 0
is reported. We can however use the
estimatePrecursorIntensity()
function from the Spectra package
to determine the precursor intensity for a MS 2 spectrum based on the intensity
of the respective ion in the previous MS1 scan (note that with method = "interpolation"
the precursor intensity would be defined based on interpolation
between the intensity in the previous and subsequent MS1 scan). Below we
estimate the precursor intensities, on the full data (for MS1 spectra a NA
value is reported).
prec_int <- estimatePrecursorIntensity(spectra(dda_data))
We next set the precursor intensity in the spectrum metadata of dda_data
. So
that it can be extracted later with the precursorIntensity()
function.
spectra(dda_data)$precursorIntensity <- prec_int
dda_data[1] |>
spectra() |>
filterMsLevel(2) |>
precursorIntensity() |>
head()
## [1] NA 9.198155 2.773988 27.590797 3.443145 7.621923
Next we perform the chromatographic peak detection on the MS level 1 data with
the findChromPeaks()
method. Below we define the settings for a
centWave-based peak detection and perform the analysis.
cwp <- CentWaveParam(snthresh = 5, noise = 100, ppm = 10,
peakwidth = c(3, 30))
dda_data <- findChromPeaks(dda_data, param = cwp, msLevel = 1L)
In total 114 peaks were identified in the present data set.
The advantage of LC-MS/MS data is that (MS1) ions are fragmented and the
corresponding MS2 spectra measured. Thus, for some of the ions (identified as
MS1 chromatographic peaks) MS2 spectra are available. These can facilitate the
annotation of the respective MS1 chromatographic peaks (or MS1 features after a
correspondence analysis). Spectra for identified chromatographic peaks can be
extracted with the chromPeakSpectra()
method. MS2 spectra with their precursor
m/z and retention time within the rt and m/z range of the chromatographic peak
are returned.
library(Spectra)
dda_spectra <- chromPeakSpectra(dda_data, msLevel = 2L)
dda_spectra
## MSn data (Spectra) with 142 spectra in a MsBackendMzR backend:
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 2 237.869 1812
## 2 2 241.299 1846
## 3 2 325.763 2439
## 4 2 326.583 2446
## 5 2 327.113 2450
## ... ... ... ...
## 138 2 574.725 5110
## 139 2 575.255 5115
## 140 2 596.584 5272
## 141 2 592.424 5236
## 142 2 596.054 5266
## ... 34 more variables/columns.
##
## file(s):
## PestMix1_DDA.mzML
## Processing:
## Filter: select retention time [230..610] on MS level(s) 1 2 [Tue Oct 29 20:07:50 2024]
## Filter: select MS level(s) 2 [Tue Oct 29 20:08:00 2024]
## Merge 1 Spectra into one [Tue Oct 29 20:08:00 2024]
By default chromPeakSpectra()
returns all spectra associated with a MS1
chromatographic peak, but parameter method
allows to choose and return only
one spectrum per peak (have a look at the ?chromPeakSpectra
help page for more
details). Also, it would be possible to extract MS1 spectra for each peak by
specifying msLevel = 1L
in the call above (e.g. to evaluate the full MS1
signal at the peak’s apex position).
The returned Spectra
contains also the reference to the respective
chromatographic peak as additional spectra variable "peak_id"
that contains
the identifier for the chromatographic peak (i.e. its row name in the
chromPeaks
matrix).
dda_spectra$peak_id
## [1] "CP004" "CP004" "CP005" "CP005" "CP006" "CP006" "CP008" "CP008" "CP011"
## [10] "CP011" "CP012" "CP012" "CP013" "CP013" "CP013" "CP013" "CP014" "CP014"
## [19] "CP014" "CP014" "CP018" "CP022" "CP022" "CP022" "CP025" "CP025" "CP025"
## [28] "CP025" "CP026" "CP026" "CP026" "CP026" "CP033" "CP033" "CP034" "CP034"
## [37] "CP034" "CP034" "CP034" "CP035" "CP035" "CP035" "CP041" "CP041" "CP041"
## [46] "CP042" "CP042" "CP042" "CP043" "CP047" "CP047" "CP049" "CP049" "CP049"
## [55] "CP049" "CP050" "CP050" "CP050" "CP051" "CP051" "CP051" "CP054" "CP055"
## [64] "CP055" "CP055" "CP056" "CP056" "CP056" "CP056" "CP056" "CP060" "CP060"
## [73] "CP060" "CP060" "CP064" "CP064" "CP065" "CP065" "CP066" "CP066" "CP069"
## [82] "CP069" "CP069" "CP070" "CP070" "CP070" "CP072" "CP072" "CP072" "CP073"
## [91] "CP074" "CP074" "CP074" "CP074" "CP075" "CP075" "CP075" "CP077" "CP077"
## [100] "CP077" "CP079" "CP079" "CP079" "CP079" "CP080" "CP080" "CP080" "CP081"
## [109] "CP086" "CP086" "CP086" "CP086" "CP086" "CP088" "CP088" "CP088" "CP089"
## [118] "CP089" "CP091" "CP091" "CP093" "CP093" "CP094" "CP094" "CP094" "CP095"
## [127] "CP095" "CP095" "CP096" "CP096" "CP096" "CP098" "CP098" "CP098" "CP098"
## [136] "CP098" "CP099" "CP099" "CP099" "CP100" "CP101" "CP101"
Note also that with return.type = "List"
a list parallel to the chromPeaks
matrix would be returned, i.e. each element in that list would contain the
spectra for the chromatographic peak with the same index. Such data
representation might eventually simplify further processing.
We next use the MS2 information to aid in the annotation of a chromatographic peak. As an example we use a chromatographic peak of an ion with an m/z of 304.1131 which we extract in the code block below.
ex_mz <- 304.1131
chromPeaks(dda_data, mz = ex_mz, ppm = 20)
## mz mzmin mzmax rt rtmin rtmax into intb
## CP056 304.1133 304.1126 304.1143 425.024 417.985 441.773 13040.8 13007.79
## maxo sn sample
## CP056 3978.987 232 1
A search of potential ions with a similar m/z in a reference database (e.g. Metlin) returned a large list of potential hits, most with a very small ppm. For two of the hits, Flumazenil (Metlin ID 2724) and Fenamiphos (Metlin ID 72445) experimental MS2 spectra are available. Thus, we could match the MS2 spectrum for the identified chromatographic peak against these to annotate our ion. Below we extract all MS2 spectra that were associated with the candidate chromatographic peak using the ID of the peak in the present data set.
ex_id <- rownames(chromPeaks(dda_data, mz = ex_mz, ppm = 20))
ex_spectra <- dda_spectra[dda_spectra$peak_id == ex_id]
ex_spectra
## MSn data (Spectra) with 5 spectra in a MsBackendMzR backend:
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 2 418.926 3505
## 2 2 419.306 3510
## 3 2 423.036 3582
## 4 2 423.966 3603
## 5 2 424.296 3609
## ... 34 more variables/columns.
##
## file(s):
## PestMix1_DDA.mzML
## Processing:
## Filter: select retention time [230..610] on MS level(s) 1 2 [Tue Oct 29 20:07:50 2024]
## Filter: select MS level(s) 2 [Tue Oct 29 20:08:00 2024]
## Merge 1 Spectra into one [Tue Oct 29 20:08:00 2024]
There are 5 MS2 spectra representing fragmentation of the ion(s) measured
in our candidate chromatographic peak. We next reduce this to a single MS2
spectrum using the combineSpectra()
method employing the combinePeaks()
function to determine which peaks to keep in the resulting spectrum (have a look
at the ?combinePeaks
help page for details). Parameter f
allows to specify
which spectra in the input object should be combined into one. Note that this
combination of multiple fragment spectra into a single spectrum might not be
generally the best approach or suggested for all types of data.
ex_spectrum <- combineSpectra(ex_spectra, FUN = combinePeaks, ppm = 20,
peaks = "intersect", minProp = 0.8,
intensityFun = median, mzFun = median,
f = ex_spectra$peak_id)
## Warning in FUN(X[[i]], ...): 'combinePeaks' for lists of peak matrices is
## deprecated; please use 'combinePeaksData' instead.
ex_spectrum
## MSn data (Spectra) with 1 spectra in a MsBackendMemory backend:
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 2 418.926 3505
## ... 34 more variables/columns.
## Processing:
## Filter: select retention time [230..610] on MS level(s) 1 2 [Tue Oct 29 20:07:50 2024]
## Filter: select MS level(s) 2 [Tue Oct 29 20:08:00 2024]
## Merge 1 Spectra into one [Tue Oct 29 20:08:00 2024]
## ...1 more processings. Use 'processingLog' to list all.
Mass peaks from all input spectra with a difference in m/z smaller 20 ppm
(parameter ppm
) were combined into one peak and the median m/z and intensity
is reported for these. Due to parameter minProp = 0.8
, the resulting MS2
spectrum contains only peaks that were present in 80% of the input spectra.
A plot of this consensus spectrum is shown below.
plotSpectra(ex_spectrum)