xcms 4.5.1
Package: xcms
Authors: Johannes Rainer
Modified: 2024-10-23 19:24:55.946541
Compiled: Fri Nov 22 19:31:18 2024
In a typical LC-MS-based metabolomics experiment, compounds eluting from the chromatography are first ionized before being measured by mass spectrometry (MS). During ionization, multiple different ions can be generated from the same compound, all of which will be measured by MS. In general, the resulting data is then pre-processed to identify chromatographic peaks and to group these across samples in the correspondence analysis. The results are distinct LC-MS features, characterized by their specific m/z and retention time range. Different ions generated during ionization will be detected as different features. Compounding then aims to group such features, which presumably represent signal from the same originating compound, to reduce data set complexity (and to aid in subsequent annotation steps). General MS feature grouping functionality is defined by the MsFeatures package, with additional functionality implemented in the xcms package to enable the compounding of LC-MS data.
This document provides a simple compounding workflow using xcms. Note that the present functionality does not (yet) aggregate or combine the actual feature values, but only defines the feature groups (one per compound).
We demonstrate the compounding (feature grouping) functionality on the simple toy data set used also in the xcms package and provided through the faahKO package. This data set consists of samples from 4 mice with knock-out of the fatty acid amide hydrolase (FAAH) and 4 wild type mice. Pre-processing of this data set is described in detail in the main vignette of the xcms package. Below we load all required packages and the result from this pre-processing updating also the location of the respective raw data files on the current machine.
library(MSnbase)
library(xcms)
library(faahKO)
library(MsFeatures)
xmse <- loadXcmsData("xmse")
Before performing the feature grouping we inspect the result object. With featureDefinitions we can extract the results from the correspondence analysis.
featureDefinitions(xmse) |> head()
##       mzmed mzmin mzmax    rtmed    rtmin    rtmax npeaks KO WT      peakidx
## FT001 200.1 200.1 200.1 2902.634 2882.603 2922.664      2  2  0 458, 116....
## FT002 205.0 205.0 205.0 2789.901 2782.955 2796.531      8  4  4 44, 443,....
## FT003 206.0 206.0 206.0 2789.405 2781.389 2794.219      7  3  4 29, 430,....
## FT004 207.1 207.1 207.1 2718.560 2714.047 2727.347      7  4  3 16, 420,....
## FT005 233.0 233.0 233.1 3023.579 3015.145 3043.959      7  3  4 69, 959,....
## FT006 241.1 241.1 241.2 3683.299 3661.586 3695.886      8  3  4 276, 284....
##       ms_level
## FT001        1
## FT002        1
## FT003        1
## FT004        1
## FT005        1
## FT006        1
Each row in this data frame represents the definition of one feature, with its average and range of m/z and retention time. Column "peakidx" provides the index of each chromatographic peak which is assigned to the feature in the chromPeaks matrix of the result object. The featureValues function can be used to extract feature values, i.e. a matrix with feature abundances, one row per feature and columns representing the samples of the present data set.
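As a minimal sketch of how the "peakidx" column links a feature to the chromPeaks matrix, the snippet below looks up the chromatographic peaks assigned to the first feature. It assumes the xmse object can be loaded as shown above; the variable names idx and pks are illustrative only.

```r
library(xcms)

## Load the pre-processed example result object (as done above).
xmse <- loadXcmsData("xmse")

## Row indices (into the chromPeaks matrix) of the peaks assigned to the
## first feature.
idx <- featureDefinitions(xmse)$peakidx[[1]]

## Subset the chromPeaks matrix to these peaks and keep a few columns.
pks <- chromPeaks(xmse)[idx, c("mz", "rt", "into", "sample"), drop = FALSE]
pks
```

Each row of pks is one chromatographic peak, and the "sample" column indicates in which sample it was detected.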
Below we extract the feature values with and without filled-in peak data. Without the gap-filled data only abundances from detected chromatographic peaks are reported. In the gap-filled data, for samples in which no chromatographic peak for a feature was detected, all signal from the m/z - retention time range defined based on the detected chromatographic peaks was integrated.
head(featureValues(xmse, filled = FALSE))
##        ko15.CDF  ko16.CDF  ko21.CDF  ko22.CDF  wt15.CDF  wt16.CDF  wt21.CDF
## FT001        NA  506848.9        NA  169955.6        NA        NA        NA
## FT002 1924712.0 1757151.0 1383416.7 1180288.2 2129885.1 1634342.0 1623589.2
## FT003  213659.3  289500.7        NA  178285.7  253825.6  241844.4  240606.0
## FT004  349011.5  451863.7  343897.8  208002.8  364609.8  360908.9        NA
## FT005  286221.4        NA  164009.0  149097.6  255697.7  311296.8  366441.5
## FT006 1160580.5        NA  380970.3  588986.4 1286883.0 1739516.6  639755.3
##        wt22.CDF
## FT001        NA
## FT002 1354004.9
## FT003  185399.5
## FT004  221937.5
## FT005  271128.0
## FT006  508546.4
head(featureValues(xmse, filled = TRUE))
##        ko15.CDF  ko16.CDF  ko21.CDF  ko22.CDF  wt15.CDF  wt16.CDF  wt21.CDF
## FT001  135162.4  506848.9  111657.3  169955.6  209929.4  141607.9  226853.7
## FT002 1924712.0 1757151.0 1383416.7 1180288.2 2129885.1 1634342.0 1623589.2
## FT003  213659.3  289500.7  164380.7  178285.7  253825.6  241844.4  240606.0
## FT004  349011.5  451863.7  343897.8  208002.8  364609.8  360908.9  226234.4
## FT005  286221.4  285857.6  164009.0  149097.6  255697.7  311296.8  366441.5
## FT006 1160580.5 1102832.6  380970.3  588986.4 1286883.0 1739516.6  639755.3
##        wt22.CDF
## FT001  138341.2
## FT002 1354004.9
## FT003  185399.5
## FT004  221937.5
## FT005  271128.0
## FT006  508546.4
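To quantify the effect of gap filling beyond the first six features, one option is to count the missing values per sample in both matrices. The sketch below assumes the xmse object from above; the names detected, filled and na_counts are illustrative.

```r
library(xcms)

## Load the pre-processed example result object (as done above).
xmse <- loadXcmsData("xmse")

## Feature abundances from detected peaks only, and including filled-in signal.
detected <- featureValues(xmse, filled = FALSE)
filled <- featureValues(xmse, filled = TRUE)

## Number of missing values per sample before and after gap filling.
na_counts <- rbind(detected = colSums(is.na(detected)),
                   filled = colSums(is.na(filled)))
na_counts
```

Some values may remain missing in the gap-filled matrix, for example if no signal was recorded in the respective m/z - retention time region of a sample.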
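In total 351 features have been defined in the present data set, many of which most likely represent signal from different ions (adducts or isotopes) of the same compound. The aim of the grouping functions is now to define which features most likely come from the same original compound. The feature grouping functions are based on the following assumptions/properties of LC-MS data:
- Features (ions) of the same compound should have a similar retention time.
- The abundances of features (ions) of the same compound should follow a similar pattern across samples.
- The extracted ion chromatograms (EICs) of features of the same compound should be similar.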
The main method to perform the feature grouping is called groupFeatures, which takes an xcms result object (i.e., an XcmsExperiment or XCMSnExp) as input as well as a parameter object to choose the grouping algorithm and specify its settings. xcms provides and supports the following grouping approaches:
- SimilarRtimeParam: perform an initial grouping based on similar retention time.
- AbundanceSimilarityParam: perform a feature grouping based on correlation of feature abundances (values) across samples.
- EicSimilarityParam: perform a feature grouping based on correlation of EICs.
Calling groupFeatures on an xcms result object will perform a feature grouping, assigning each feature in the data set to a feature group. These feature groups are stored as an additional column called "feature_group" in the featureDefinitions data frame of the result object and can be accessed with the featureGroups function. Any subsequent groupFeatures call will sub-group (refine) the identified feature groups further. It is thus possible to use a single grouping approach, or to combine several of them to generate the desired feature grouping in an incremental fashion. While the individual feature grouping algorithms can be called in any order, it is advisable to use EicSimilarityParam as the last refinement step, because it is computationally very expensive, especially if applied to a result object without pre-defined feature groups or if the feature groups are very large.
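The incremental use of groupFeatures described above can be sketched as follows. This is only an illustration: the settings diffRt = 10, threshold = 0.7 and transform = log2 are assumed example values, not recommendations, and grouped is an illustrative variable name.

```r
library(xcms)
library(MsFeatures)

## Load the pre-processed example result object (as done above).
xmse <- loadXcmsData("xmse")

## Step 1: initial grouping of features with similar retention time.
## diffRt = 10 is an assumed example value.
grouped <- groupFeatures(xmse, param = SimilarRtimeParam(diffRt = 10))

## Step 2: refine the groups by correlation of (log2 transformed) feature
## abundances across samples. threshold = 0.7 is an assumed example value.
grouped <- groupFeatures(
    grouped,
    param = AbundanceSimilarityParam(threshold = 0.7, transform = log2),
    filled = TRUE)

## Feature group assigned to each feature and the distribution of group sizes.
head(featureGroups(grouped))
table(table(featureGroups(grouped)))
```

The second call only splits the groups from the first step further, so features that were separated by retention time are not merged again. An EicSimilarityParam step would be added last, for the computational reasons given above.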
The most intuitive and simple way to group features is based on their retention time. Before we perform this initial grouping we evaluate retention times and m/z of all features in the present data set.
plot(featureDefinitions(xmse)$rtmed, featureDefinitions(xmse)$mzmed,
     xlab = "retention time", ylab = "m/z", main = "features",
     col = "#00000080", pch = 21, bg = "#00000040")
grid()
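To make the idea of retention-time-based grouping concrete, the base R sketch below groups a few made-up retention times. It is a deliberately simplified illustration and not the algorithm used by xcms or MsFeatures: features are sorted by retention time, and a new group is started whenever the difference to the previous feature exceeds max_diff. All names and values (rt, max_diff, FT_a to FT_f) are hypothetical.

```r
## Made-up retention times (in seconds) of six hypothetical features.
rt <- c(FT_a = 25.3, FT_b = 10.1, FT_c = 40.2,
        FT_d = 10.4, FT_e = 25.9, FT_f = 25.0)

## Maximal accepted retention time difference between consecutive features.
max_diff <- 1

## Sort by retention time and start a new group whenever the gap to the
## previous (sorted) feature is larger than max_diff.
ord <- order(rt)
new_group <- c(TRUE, diff(rt[ord]) > max_diff)

## Map the group index back to the original order of the features.
grp <- integer(length(rt))
grp[ord] <- cumsum(new_group)
names(grp) <- names(rt)

## Feature names per group.
split(names(rt), grp)
```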