MsFeatures 1.14.0
Package: MsFeatures
Authors: Johannes Rainer [aut, cre] (https://orcid.org/0000-0002-6977-7147)
Last modified: 2024-10-29 16:03:55.688032
Compiled: Tue Oct 29 21:45:20 2024
Electrospray ionization (ESI) is commonly used in mass spectrometry (MS)-based metabolomics to generate ions from compounds so that the MS instrument can detect them. Ionization can generate different ions (adducts) of the same original compound, which are then reported as separate MS features with different mass-to-charge ratios (m/z). To reduce data set complexity (and to aid subsequent annotation steps) it is advisable to group features that presumably represent signal from the same original compound into a single entity.
The MsFeatures package provides key concepts and functions for this feature
grouping. Methods are implemented for base R objects as well as for
Bioconductor’s SummarizedExperiment class. See also the description of the
general grouping concept on the package webpage for more information. Additional
grouping methodology is expected to be implemented in other R packages for data
objects with additional LC-MS related information, such as the XCMSnExp object
in the xcms package. The implementation for SummarizedExperiment provided in
this package can be used as a reference for such additional methodology.
After definition of the feature groups, the QFeatures package could be used to aggregate their abundances into a single signal.
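To make the idea of aggregating grouped features concrete, the following base R sketch sums the abundances of all features in a group per sample. It does not use the QFeatures API; the abundance values, the feature group labels, and the choice of the sum as aggregation function are assumptions made purely for illustration.

```r
# Toy abundance matrix: 4 features (rows) x 3 samples (columns).
# All values are hypothetical.
abund <- matrix(c(100, 200, 150,
                   10,  20,  15,
                   50,  40,  60,
                    5,   4,   6),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("FT", 1:4), paste0("S", 1:3)))

# Hypothetical feature group assignment, one label per feature
fgroup <- c("FG.1", "FG.1", "FG.2", "FG.2")

# Sum the abundances of all features belonging to the same group;
# the result has one row per feature group
agg <- rowsum(abund, group = fgroup)
agg
```

Other summaries (e.g. the maximum or a median per group) would be equally plausible choices depending on the analysis.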
The package can be installed with the BiocManager package. To install
BiocManager use install.packages("BiocManager") and, after that,
BiocManager::install("MsFeatures") to install this package.
Features from the same originating compound inherit its characteristics including its retention time (for LC or GC-MS experiments) and abundance/intensity. For the latter it is expected that features from the same compound have the same pattern of feature values/abundances across samples.
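The expectation about abundance patterns can be illustrated with a small base R sketch. The concentrations and response factors below are made up; the point is only that two ions of the same compound differ by a scaling factor and therefore correlate perfectly across samples, whereas a feature from an unrelated compound generally does not.

```r
# Hypothetical concentrations of one compound in 8 samples
conc <- c(1, 4, 2, 8, 3, 6, 5, 7)

# Two ions (adducts) of that compound: same pattern across samples,
# different (assumed) response factors
ion_a <- 1000 * conc
ion_b <- 150 * conc

# A feature from an unrelated compound with a different pattern
other <- 500 * c(5, 1, 7, 2, 8, 3, 4, 6)

# Pearson correlation of log2 abundances across samples
cor_same <- cor(log2(ion_a), log2(ion_b))
cor_diff <- cor(log2(ion_a), log2(other))

cor_same   # ions of the same compound: correlation of 1
cor_diff   # unrelated feature: clearly lower
```

Real data is noisy, so a threshold below 1 would be needed in practice to decide whether two features share a pattern.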
The MsFeatures package defines the groupFeatures method to perform MS feature
grouping based on the provided input data and a parameter object which selects
and defines the feature grouping algorithm. This algorithm is supposed to assign
each individual feature to a (single) feature group. Currently two feature
grouping approaches are implemented:
SimilarRtimeParam: group features based on similar retention times.
AbundanceSimilarityParam: group features based on similar feature values/abundances across samples.
Additional algorithms, e.g. by considering also differences in features’ m/z values matching expected ions/adducts or isotopes, may be implemented in the future in this or other packages.
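Since methods are also implemented for base R objects, the interplay of input data and parameter object can be sketched on a plain numeric vector. The retention times and the `diffRt` value of 1 below are hypothetical and chosen so that the expected grouping is obvious; this is a minimal sketch, not a recommendation for real settings.

```r
library(MsFeatures)

# Hypothetical retention times (in seconds) of five features
rts <- c(10.1, 10.3, 25.0, 25.4, 40.2)

# The parameter object selects the algorithm (grouping by similar
# retention time) and defines its setting (maximal difference of 1)
prm <- SimilarRtimeParam(diffRt = 1)

# groupFeatures returns one feature group label per element of rts
grp <- groupFeatures(rts, param = prm)
grp
```

Elements 1 and 2 and elements 3 and 4 end up in the same group, while the last element forms a group of its own.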
In this document we demonstrate the feature grouping functionality on a simple
toy data set that is also used in the xcms package, with the raw data being
provided in the faahKO data package. This data set consists of samples from 4
mice with knock-out of the fatty acid amide hydrolase (FAAH) and 4 wild type
mice. Pre-processing of this data set is described in detail in the xcms
vignette of the xcms package. Below we load all required packages and the result
from this pre-processing, which is provided as a SummarizedExperiment within
this package and can be loaded with data(se).
library(MsFeatures)
library(SummarizedExperiment)
data("se")
Before performing the feature grouping we inspect the result object. Feature
properties and definitions can be accessed with rowData, the feature abundances
with assay.
rowData(se)
## DataFrame with 225 rows and 11 columns
##           mzmed     mzmin     mzmax     rtmed     rtmin     rtmax    npeaks
##       <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
## FT001     200.1     200.1     200.1   2901.63   2880.73   2922.53         2
## FT002     205.0     205.0     205.0   2789.39   2782.30   2795.36         8
## FT003     206.0     206.0     206.0   2788.73   2780.73   2792.86         7
## FT004     207.1     207.1     207.1   2718.12   2713.21   2726.70         7
## FT005     219.1     219.1     219.1   2518.82   2517.40   2520.81         3
## ...         ...       ...       ...       ...       ...       ...       ...
## FT221    591.30     591.3     591.3   3005.03   2992.87   3006.05         5
## FT222    592.15     592.1     592.3   3022.11   2981.91   3107.59         6
## FT223    594.20     594.2     594.2   3418.16   3359.10   3427.90         3
## FT224    595.25     595.2     595.3   3010.15   2992.87   3013.77         6
## FT225    596.20     596.2     596.2   2997.91   2992.87   3002.95         2
##              KO        WT           peakidx  ms_level
##       <numeric> <numeric>            <list> <integer>
## FT001         2         0 287, 679,1577,...         1
## FT002         4         4    47,272,542,...         1
## FT003         3         4    32,259,663,...         1
## FT004         4         3    19,249,525,...         1
## FT005         1         2 639, 788,1376,...         1
## ...         ...       ...               ...       ...
## FT221         2         3   349,684,880,...         1
## FT222         1         3    86,861,862,...         1
## FT223         1         2 604, 985,1543,...         1
## FT224         2         3    67,353,876,...         1
## FT225         0         2 866,1447,1643,...         1
head(assay(se))
##        ko15.CDF   ko16.CDF   ko21.CDF  ko22.CDF  wt15.CDF  wt16.CDF  wt21.CDF
## FT001  159738.1  506848.88  113441.08  169955.6  216096.6  145509.7  230477.9
## FT002 1924712.0 1757150.96 1383416.72 1180288.2 2129885.1 1634342.0 1623589.2
## FT003  213659.3  289500.67  162897.19  178285.7  253825.6  241844.4  240606.0
## FT004  349011.5  451863.66  343897.76  208002.8  364609.8  360908.9  223322.5
## FT005  135978.5   25524.79   71530.84  107348.5  223951.8  134398.9  190203.8
## FT006  286221.4  289908.23  164008.97  149097.6  255697.7  311296.8  366441.5
##         wt22.CDF
## FT001  140551.30
## FT002 1354004.93
## FT003  185399.47
## FT004  221937.53
## FT005   84772.92
## FT006  271128.02
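Columns "mzmed" and "rtmed" in the object’s rowData provide the m/z and
retention time that characterize each feature. In total 225 features are
available in the present data set, with many of them most likely representing
signal from different ions of the same compound. We aim to identify these based
on the following assumptions about the LC-MS data:
Features (ions) of the same compound should have a similar retention time.
Features (ions) of the same compound should show the same pattern of feature values/abundances across samples.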
As detailed in the general grouping concept, the feature grouping implemented in
MsFeatures is by default intended to be used as a stepwise approach in which
each groupFeatures call further sub-groups (and thus refines) previously defined
feature groups. This makes it possible either to use a single algorithm for the
feature grouping or to build a feature grouping pipeline by combining different
algorithms. In our example we first perform an initial grouping of features
based on similar retention time and subsequently refine these feature groups by
also requiring similarity of feature values across samples.
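A minimal sketch of such a two-step pipeline on the example data is shown here. The settings (a retention time difference of 10 seconds, a similarity threshold of 0.7 and a log2 transformation of the abundances) are illustrative assumptions and would need tuning for a real data set.

```r
library(MsFeatures)
library(SummarizedExperiment)
data("se")

# Step 1: initial grouping by similar retention time; rtime names the
# rowData column holding the features' retention times
se <- groupFeatures(se, param = SimilarRtimeParam(diffRt = 10),
                    rtime = "rtmed")

# Step 2: refine the existing groups by similarity of feature values
# across samples, computed on the first assay (i = 1)
se <- groupFeatures(se,
                    param = AbundanceSimilarityParam(threshold = 0.7,
                                                     transform = log2),
                    i = 1)

# One feature group label per feature; tabulate the group sizes
fg <- featureGroups(se)
head(sort(table(fg), decreasing = TRUE))
```

Because the second call only sub-groups features within the groups defined by the first call, swapping the order of the two steps would not necessarily yield the same result.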
Note that it would also be possible to perform the grouping only on a subset of features instead of the full data set. An example is provided in the last section of this vignette.
The most intuitive and simple way to group LC-MS features is based on their retention times: ionization of the compounds happens after the LC and thus all ions from the same compound should have the same retention time. The plot below shows the retention times (and m/z) of all features from the present data set.
plot(rowData(se)$rtmed, rowData(se)$mzmed,
     xlab = "retention time", ylab = "m/z", main = "features",
     col = "#00000060")
grid()
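To build intuition for how retention time proximity can translate into groups, the sketch below applies a naive rule in base R: features are sorted by retention time and a new group is started whenever the gap to the previous feature exceeds a maximal difference. This is not the algorithm used by SimilarRtimeParam; the rule and the value of 10 seconds are assumptions for illustration. The retention times are the rtmed values of features FT001 to FT005 from the rowData shown above.

```r
# rtmed of FT001-FT005, taken from the rowData output above
rt <- c(FT001 = 2901.63, FT002 = 2789.39, FT003 = 2788.73,
        FT004 = 2718.12, FT005 = 2518.82)
max_diff <- 10  # assumed maximal retention time gap within a group

# Sort by retention time and compute gaps between neighbours
ord <- order(rt)
gaps <- diff(rt[ord])

# Start a new group whenever the gap exceeds max_diff
grp_sorted <- cumsum(c(TRUE, gaps > max_diff))

# Map the group ids back to the original feature order
grp <- integer(length(rt))
grp[ord] <- grp_sorted
names(grp) <- names(rt)
grp
```

With these values only FT002 and FT003, which elute less than one second apart, share a group; all other features remain on their own.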