
1 Introduction

Mass spectrometry measures data in so-called profile mode, where the signal corresponding to a specific ion is distributed around the ion’s actual m/z value (Smith et al. 2014). The accuracy of that signal depends on the resolution and settings of the instrument. Profile-mode data can be processed into centroid data by retaining only a single, representative value per ion, typically the local maximum of the distribution of data points. This centroiding substantially reduces the amount of data without much loss of information. Certain algorithms require the data to be in centroid mode. Examples include the centWave method for chromatographic peak detection in LC-MS experiments, provided by the xcms package, and proteomics search engines that match MS2 spectra to peptides. In this vignette, we focus on metabolomics data.
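Local-maximum centroiding can be made concrete by applying it to a single simulated profile peak. The following is a minimal base R sketch under simplifying assumptions: one noise-free Gaussian peak and a hypothetical helper, `centroid_local_max()`. It is an illustration of the idea only, not the centroiding algorithm implemented in MSnbase or by any vendor.

```r
## Hypothetical profile data: one Gaussian-shaped ion signal with its apex
## at m/z 106.0499, sampled every 0.001 m/z units (21 data points).
mz <- seq(106.040, 106.060, by = 0.001)
intensity <- 1000 * exp(-((mz - 106.0499)^2) / (2 * 0.002^2))

## Hypothetical helper: keep only the data points whose intensity is
## strictly larger than that of both neighbours (local maxima) and
## larger than a minimal intensity threshold.
centroid_local_max <- function(mz, intensity, min_int = 0) {
    n <- length(intensity)
    if (n < 3)
        return(data.frame(mz = numeric(0), intensity = numeric(0)))
    i <- 2:(n - 1)
    is_max <- intensity[i] > intensity[i - 1] &
        intensity[i] > intensity[i + 1] &
        intensity[i] > min_int
    idx <- i[is_max]
    data.frame(mz = mz[idx], intensity = intensity[idx])
}

cent <- centroid_local_max(mz, intensity)
length(mz)   # 21 profile data points ...
nrow(cent)   # ... are reduced to a single centroid
cent$mz      # the sampled m/z closest to the apex, 106.050
```

Because the centroid is one of the sampled data points, its m/z (106.050) differs slightly from the simulated apex (106.0499). Practical implementations often add steps such as smoothing or refinement of the m/z value, which this sketch omits.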

Many manufacturers centroid the profile data either directly during acquisition or immediately thereafter, so that the user receives already processed data. Alternatively, third-party software such as msconvert from the ProteoWizard suite (Chambers et al. 2012) can apply various centroiding algorithms, including the vendors' own methods. In some cases, however, the centroided data generated by vendor software is of poor quality. MSnbase also provides functionality to centroid profile MS data. The centroided data can then be quantified or analysed further within R, or serialised to mzML files and used as input for other software.

2 Centroiding of profile-mode MS data

In this vignette we use a subset of a profile-mode LC-MS metabolomics data set of pooled human serum samples measured on an AB Sciex TripleTOF 5600+ mass spectrometer (the chromatography employed was hydrophilic interaction high-performance liquid chromatography, HILIC HPLC). The mzML file contains profile-mode data for an m/z range from 105 to 134 and a retention time range from 0 to 240 seconds. For more details on the sample see ?msdata::sciexdata. Below we load the required packages and read the MS data.

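## Load the required packages
library("MSnbase")
library("msdata")
library("magrittr")

## Locate the profile-mode mzML file in the "sciex" folder of msdata
fl <- dir(system.file("sciex", package = "msdata"), full.names = TRUE)[2]
basename(fl)
## [1] "20171016_POOL_POS_3_105-134.mzML"
## Read the data with on-disk access, flagging the spectra as profile mode
data_prof <- readMSData(fl, mode = "onDisk", centroided = FALSE)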

We next extract the profile MS data for the [M+H]+ adduct of serine, which has an expected m/z of 106.049871. To this end, we filter the data_prof object to an m/z range of ±0.01 around this value, which contains the signal for the metabolite, and to a retention time window from 175 to 187 seconds, during which the analyte elutes from the LC column.
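As a quick check of the expected m/z, the [M+H]+ value can be computed from the elemental formula of serine, C3H7NO3. The monoisotopic masses of its atoms are summed and the mass of a proton is added. The constants below are standard monoisotopic masses rounded to eight decimals. This is an illustrative calculation only, not code used elsewhere in this vignette.

```r
## Monoisotopic masses (in u) of the most abundant isotopes and of the proton
m_C <- 12
m_H <- 1.00782503
m_N <- 14.00307401
m_O <- 15.99491462
m_proton <- 1.00727646

## Neutral monoisotopic mass of serine, C3H7NO3
m_serine <- 3 * m_C + 7 * m_H + m_N + 3 * m_O

## [M+H]+ adduct: add the mass of one proton
mz_serine_mh <- m_serine + m_proton
round(mz_serine_mh, 5)  # 106.04987
```

The result agrees with the value of 106.049871 used in this vignette to within about 2e-6 m/z units (below 0.02 ppm). Differences of this size depend on the precision of the constants used.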

## Define the mz and retention time ranges
serine_mz <- 106.049871
mzr <- c(serine_mz - 0.01, serine_mz + 0.01)
rtr <- c(175, 187)

## Filtering the object
serine <- data_prof %>%
    filterRt(rtr) %>%
    filterMz(mzr)
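The ±0.01 window is an absolute tolerance in m/z units, whereas mass accuracy is often expressed in parts per million (ppm). Converting between the two shows that this window corresponds to roughly 94 ppm around the serine ion, i.e. a fairly wide window. This suits profile-mode data, whose signal is spread around the ion's m/z. The sketch below repeats the definitions from above so that it runs on its own. `ppm_to_da()` is a hypothetical helper introduced only for illustration.

```r
## Same m/z window as defined above
serine_mz <- 106.049871
mzr <- c(serine_mz - 0.01, serine_mz + 0.01)

## Half-width of the window in m/z units and in ppm of the target m/z
half_width_da <- diff(mzr) / 2
half_width_ppm <- half_width_da / serine_mz * 1e6
round(half_width_ppm, 1)  # 94.3

## Hypothetical helper: absolute half-width for a given ppm tolerance
ppm_to_da <- function(mz, ppm) mz * ppm / 1e6
ppm_to_da(serine_mz, 10)  # about 0.00106 m/z units
```

Because a ppm tolerance scales with m/z, the same ppm value gives a wider absolute window for heavier ions. A fixed absolute window, as used here, is simpler when only a single analyte is of interest.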

We can now plot the profile MS data for serine. The dashed red horizontal line marks the expected m/z of the ion.

plot(serine, type = "XIC")
abline(h = serine_mz, col = "red", lty = 2)