DIAlignR 2.12.0
This document presents a workflow for retention-time alignment across multiple targeted-MS (e.g. DIA, SWATH-MS, PRM, SRM) runs using DIAlignR. The tool requires MS2 chromatograms and uses a hybrid of global and local alignment to establish correspondence between peaks.
if(!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("DIAlignR")
library(DIAlignR)
Mass-spectrometry files mostly contain spectra. Targeted proteomics workflows identify analytes from their chromatographic elution profiles. DIAlignR extends the same concept to retention-time (RT) alignment and therefore relies on MS2 chromatograms. DIAlignR expects raw chromatogram files (.chrom.sqMass) and FDR-scored feature files (.osw).
Example files are available with this package and can be located with this command:
dataPath <- system.file("extdata", package = "DIAlignR")
The chromatogram and feature files can be generated with OpenSWATH (OpenMS), after which the chromatograms are converted to .chrom.sqMass:
OpenSwathWorkflow -in Filename.mzML.gz -tr library.pqp -tr_irt iRTassays.TraML -out_osw Filename.osw -out_chrom Filename.chrom.mzML
OpenSwathMzMLFileCacher -in Filename.chrom.mzML -out Filename.chrom.sqMass -lossy_compression false
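When many runs have to be processed, the two commands above can be generated per run from R. The helper below, `buildOpenSwathCommands`, is a hypothetical convenience and not part of DIAlignR; it only formats command strings with the same flags shown above, and the run names and library file names are placeholders.

```r
# Hypothetical helper: format the OpenSWATH commands shown above for several runs.
buildOpenSwathCommands <- function(runNames, library = "library.pqp",
                                   irt = "iRTassays.TraML") {
  # Extract features (.osw) and chromatograms (.chrom.mzML) for each run
  swath <- sprintf(paste("OpenSwathWorkflow -in %s.mzML.gz -tr %s -tr_irt %s",
                         "-out_osw %s.osw -out_chrom %s.chrom.mzML"),
                   runNames, library, irt, runNames, runNames)
  # Convert each chromatogram file to .chrom.sqMass without lossy compression
  cache <- sprintf(paste("OpenSwathMzMLFileCacher -in %s.chrom.mzML",
                         "-out %s.chrom.sqMass -lossy_compression false"),
                   runNames, runNames)
  # Interleave so that each run's conversion follows its own extraction
  as.vector(rbind(swath, cache))
}

cmds <- buildOpenSwathCommands(c("run_A", "run_B"))  # placeholder run names
writeLines(cmds)
# To execute them, e.g.: for (cmd in cmds) system(cmd)
```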
Note: If you prefer to use chrom.mzML instead of chrom.sqMass, be aware that some chromatograms are stored in compressed form and are currently inaccessible to mzR. In such cases mzR throws an error indicating Invalid cvParam accession "1002746". To avoid this issue, decompress the chromatograms with the OpenMS FileConverter:
FileConverter -in Filename.chrom.mzML -in_type 'mzML' -out Filename.chrom.mzML
The feature files are then merged and scored with PyProphet:
pyprophet merge --template=library.pqp --out=merged.osw *.osw
pyprophet score --in=merged.osw --classifier=XGBoost --level=ms1ms2
pyprophet peptide --in=merged.osw --context=experiment-wide
Place all .chrom.sqMass files in the xics directory and the merged.osw file in the osw directory. The parent folder of these two directories is passed as dataPath to DIAlignR functions. There are three modes for multirun alignment: star, MST and progressive.
These functions align proteomics or metabolomics DIA runs. They expect two directories, “osw” and “xics”, at dataPath, and output an intensity table in which rows correspond to analytes and columns to runs.
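A minimal sketch of preparing this layout in base R follows. `arrangeInputs` is a hypothetical helper, and the file names (`run_A.chrom.sqMass`, `merged.osw`) are empty placeholders created in a temporary directory purely for illustration.

```r
# Hypothetical helper: copy OpenSWATH/PyProphet outputs into the osw/xics layout.
arrangeInputs <- function(srcDir, dataPath) {
  oswDir  <- file.path(dataPath, "osw")
  xicsDir <- file.path(dataPath, "xics")
  dir.create(oswDir,  recursive = TRUE, showWarnings = FALSE)
  dir.create(xicsDir, recursive = TRUE, showWarnings = FALSE)
  # All chromatogram files go to xics/, the merged feature file to osw/
  chromFiles <- list.files(srcDir, pattern = "\\.chrom\\.sqMass$", full.names = TRUE)
  okXics <- file.copy(chromFiles, xicsDir, overwrite = TRUE)
  okOsw  <- file.copy(file.path(srcDir, "merged.osw"), oswDir, overwrite = TRUE)
  invisible(all(okXics) && okOsw)
}

# Demonstration with empty placeholder files in a temporary directory
srcDir <- file.path(tempdir(), "openswath_out")
dir.create(srcDir, showWarnings = FALSE)
invisible(file.create(file.path(srcDir,
  c("run_A.chrom.sqMass", "run_B.chrom.sqMass", "merged.osw"))))
demoPath <- file.path(tempdir(), "DIAlignR_input")
arrangeInputs(srcDir, demoPath)
list.files(demoPath, recursive = TRUE)
```

The resulting `demoPath` plays the role of `dataPath`; with real files it would be passed to the alignment functions.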
runs <- c("hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt",
params <- paramsDIAlignR()
params[["context"]] <- "experiment-wide"
# For specific runs provide their names.
alignTargetedRuns(dataPath = dataPath, outFile = "test", runs = runs, oswMerged = TRUE, params = params)
# To align all analytes across all runs, set runs = NULL.
alignTargetedRuns(dataPath = dataPath, outFile = "test", runs = NULL, oswMerged = TRUE, params = params)
For MST alignment, a precomputed guide-tree can be supplied.
tree <- "run2 run2\nrun1 run0"
mstAlignRuns(dataPath = dataPath, outFile = "test", mstNet = tree, oswMerged = TRUE, params = params)
# Compute tree on-the-fly
mstAlignRuns(dataPath = dataPath, outFile = "test", oswMerged = TRUE, params = params)
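With the on-the-fly option, mstAlignRuns computes the tree itself. To illustrate what a minimum spanning tree over runs is, the sketch below applies Prim's algorithm to a hypothetical run-to-run distance matrix. How DIAlignR measures distances between runs is not covered here, `mstEdges` is a hypothetical name, and writing one "runA runB" edge per line is an assumption based on the tree string above, not a documented format.

```r
# Hypothetical run-to-run distances (illustrative values only)
runNames <- c("run0", "run1", "run2")
dMst <- matrix(c(0,   0.3, 0.1,
                 0.3, 0,   0.2,
                 0.1, 0.2, 0),
               nrow = 3, dimnames = list(runNames, runNames))

# Prim's algorithm: grow the tree from the first run, always adding the
# cheapest edge between a run already in the tree and one outside it.
mstEdges <- function(d) {
  runs <- rownames(d)
  inTree <- c(TRUE, rep(FALSE, length(runs) - 1))
  edges <- character(0)
  while (!all(inTree)) {
    sub <- d[inTree, !inTree, drop = FALSE]
    k <- which(sub == min(sub), arr.ind = TRUE)[1, ]
    edges <- c(edges, paste(rownames(sub)[k[1]], colnames(sub)[k[2]]))
    inTree[runs == colnames(sub)[k[2]]] <- TRUE
  }
  # One "runA runB" edge per line (format assumed from the tree string above)
  paste(edges, collapse = "\n")
}
cat(mstEdges(dMst), "\n")
```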
As with MST alignment, a precomputed guide tree can be supplied for progressive alignment, here in Newick format.
text1 <- "(run1:0.08857142857,(run0:0.06857142857,run2:0.06857142857)masterB:0.02)master1;"
progAlignRuns(dataPath = dataPath, outFile = "test", newickTree = text1, oswMerged = TRUE, params = params)
# Compute tree on-the-fly
progAlignRuns(dataPath = dataPath, outFile = "test", oswMerged = TRUE, params = params)
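The branch lengths in text1 describe an ultrametric tree in which run0 and run2 join first. As an illustration only, a tree of that shape can be built by average-linkage (UPGMA) clustering of a run-to-run distance matrix with base R's `hclust`, then written as Newick with each branch length equal to half the height difference. The distance values and the `hclustToNewick` helper are hypothetical, and we do not claim this is how progAlignRuns computes its tree on the fly. Internal nodes are left unnamed here, whereas text1 names them masterB and master1.

```r
# Hypothetical run-to-run distances (illustrative values only)
runNames <- c("run0", "run1", "run2")
dProg <- matrix(c(0,    0.18, 0.14,
                  0.18, 0,    0.18,
                  0.14, 0.18, 0),
                nrow = 3, dimnames = list(runNames, runNames))
hc <- hclust(as.dist(dProg), method = "average")  # UPGMA

# Write an hclust tree as Newick; each branch length is half the height difference
hclustToNewick <- function(hc) {
  build <- function(i) {
    h <- hc$height[i]
    parts <- vapply(hc$merge[i, ], function(ch) {
      if (ch < 0) {
        sprintf("%s:%.10g", hc$labels[-ch], h / 2)                 # leaf (run)
      } else {
        sprintf("%s:%.10g", build(ch), (h - hc$height[ch]) / 2)  # subtree
      }
    }, character(1))
    paste0("(", paste(parts, collapse = ","), ")")
  }
  paste0(build(nrow(hc$merge)), ";")
}
newick <- hclustToNewick(hc)
newick
```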
In a large-scale study, pyprophet merge would create a file too large to fit in memory. Hence, scaling up PyProphet with subsampling is recommended. In that workflow, do not run the last two commands, pyprophet backpropagate and pyprophet export, because they copy scores from model_global.osw into each run and increase file sizes unnecessarily. Instead, call the DIAlignR functions with oswMerged = FALSE and scoreFile = "PATH/TO/model_global.osw".
To obtain alignment objects containing the aligned indices of the XICs, use the getAlignObjs function. Like the previous functions, it expects two directories, “osw” and “xics”, at dataPath. It aligns exactly two runs. If refRun is not provided, the m-score from the osw files is used to select the reference run.
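The exact rule for picking the reference from m-scores is not spelled out here. One plausible reading, sketched below as an assumption rather than DIAlignR's documented behaviour, is to take the run in which the analyte has the lowest m-score (the most confident identification). The data frame and `pickReferenceRun` are hypothetical.

```r
# Hypothetical m-scores of one analyte in two runs (illustrative values only)
analyteScores <- data.frame(run = c("run1", "run2"),
                            m_score = c(3.2e-4, 1.5e-5),
                            stringsAsFactors = FALSE)

# Assumed rule: the run with the lowest m-score becomes the reference
pickReferenceRun <- function(scores) {
  scores$run[which.min(scores$m_score)]
}
pickReferenceRun(analyteScores)
```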
runs <- c("hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt",
AlignObjLight <- getAlignObjs(analytes = 4618L, runs = runs, dataPath = dataPath, objType = "light", params = params)
#> [1] "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"
#> [2] "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt"
#> [1] "Finding reference run using SCORE_PEPTIDE table"
# First element contains names of runs, spectra files, chromatogram files and feature files.
AlignObjLight[[1]][, c("runName", "spectraFile")]
#> runName
#> run1 hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt
#> run2 hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt
#> spectraFile
#> run1 data/raw/hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt.mzML.gz
#> run2 data/raw/hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt.mzML.gz
obj <- AlignObjLight[[2]][["4618"]][[1]][["AlignObj"]]
#> [1] "indexA_aligned" "indexB_aligned" "score"
#> [1] "indexA_aligned" "indexB_aligned" "score"
AlignObjMedium <- getAlignObjs(analytes = 4618L, runs = runs, dataPath = dataPath, objType = "medium", params = params)
#> [1] "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"
#> [2] "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt"
#> [1] "Finding reference run using SCORE_PEPTIDE table"
obj <- AlignObjMedium[[2]][["4618"]][[1]][["AlignObj"]]
#> [1] "s" "path" "indexA_aligned" "indexB_aligned"
#> [5] "score"
The alignment object has the following slots (the light object contains only the first three):
* indexA_aligned: aligned indices of the reference chromatogram.
* indexB_aligned: aligned indices of the experiment chromatogram.
* score: cumulative score of the alignment up to each index.
* s: similarity score matrix.
* path: path of the alignment through the similarity score matrix.
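These slots are the typical products of a dynamic-programming alignment. As a rough illustration only, the sketch below runs a plain global (Needleman-Wunsch) alignment between two single-trace toy chromatograms with a linear gap penalty and returns a list with the same slot names. It is not DIAlignR's implementation, which uses a hybrid global and local approach; the similarity measure (outer product of unit-normalised intensities), the gap value and the function name `alignChromatograms` are assumptions chosen for brevity.

```r
# Hypothetical sketch: global alignment of two toy XICs with a linear gap penalty.
alignChromatograms <- function(xicA, xicB, gap = 0.1) {
  # Similarity matrix s: outer product of unit-normalised intensities
  # (assumes neither trace is all zeros)
  a <- xicA / sqrt(sum(xicA^2))
  b <- xicB / sqrt(sum(xicB^2))
  s <- outer(a, b)
  n <- length(a); m <- length(b)

  # Cumulative score matrix; row/column 1 represent leading gaps
  M <- matrix(0, n + 1, m + 1)
  M[, 1] <- -gap * (0:n)
  M[1, ] <- -gap * (0:m)
  for (i in seq_len(n)) for (j in seq_len(m)) {
    M[i + 1, j + 1] <- max(M[i, j] + s[i, j],   # pair A[i] with B[j]
                           M[i, j + 1] - gap,   # A[i] against a gap
                           M[i + 1, j] - gap)   # B[j] against a gap
  }

  # Traceback from the bottom-right cell; NA marks a gap
  path <- matrix(FALSE, n + 1, m + 1)
  idxA <- integer(0); idxB <- integer(0); score <- numeric(0)
  i <- n; j <- m
  while (i > 0 || j > 0) {
    path[i + 1, j + 1] <- TRUE
    score <- c(M[i + 1, j + 1], score)
    if (i > 0 && j > 0 && isTRUE(all.equal(M[i + 1, j + 1], M[i, j] + s[i, j]))) {
      idxA <- c(i, idxA); idxB <- c(j, idxB); i <- i - 1; j <- j - 1
    } else if (i > 0 && isTRUE(all.equal(M[i + 1, j + 1], M[i, j + 1] - gap))) {
      idxA <- c(i, idxA); idxB <- c(NA, idxB); i <- i - 1
    } else {
      idxA <- c(NA, idxA); idxB <- c(j, idxB); j <- j - 1
    }
  }
  path[1, 1] <- TRUE
  list(s = s, path = path, indexA_aligned = idxA,
       indexB_aligned = idxB, score = score)
}

xicRef <- c(0, 1, 5, 1, 0, 0)  # toy reference trace
xicExp <- c(0, 0, 1, 5, 1, 0)  # toy experiment trace, apex one point later
res <- alignChromatograms(xicRef, xicExp)
res$indexA_aligned
res$indexB_aligned
```

Because the experiment peak elutes one point later, the reference apex (index 3) is paired with experiment index 4, at the cost of two gaps.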
Aligned chromatograms can be visualized with plotAlignedAnalytes. The top panel shows the unaligned experiment XICs, the middle panel the reference XICs, and the bottom panel the experiment run aligned to the reference.
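To make the relationship between the panels concrete, the sketch below places experiment intensities onto the reference index grid using a pair of aligned index vectors. `alignToReference` and the toy vectors are hypothetical; reference positions without an experiment partner become NA. Missing values of this kind are one plausible source of ggplot2 "Removed ... rows containing missing values" warnings, although we have not traced the exact origin of the warning printed by plotAlignedAnalytes.

```r
# Hypothetical helper: place experiment intensities on the reference index grid.
alignToReference <- function(xicExp, indexA_aligned, indexB_aligned, nRef) {
  aligned <- rep(NA_real_, nRef)
  # Only positions where both runs have an index carry a value
  keep <- !is.na(indexA_aligned) & !is.na(indexB_aligned)
  aligned[indexA_aligned[keep]] <- xicExp[indexB_aligned[keep]]
  aligned
}

xicExp <- c(0, 0, 1, 5, 1, 0)    # toy experiment trace
idxA <- c(NA, 1, 2, 3, 4, 5, 6)  # toy aligned reference indices
idxB <- c(1, 2, 3, 4, 5, 6, NA)  # toy aligned experiment indices
alignToReference(xicExp, idxA, idxB, nRef = 6)
```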
runs <- c("hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt",
AlignObj <- getAlignObjs(analytes = 4618L, runs = runs, dataPath = dataPath, params = params)
#> [1] "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"
#> [2] "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt"
#> [1] "Finding reference run using SCORE_PEPTIDE table"
plotAlignedAnalytes(AlignObj, annotatePeak = TRUE)
#> Warning: Removed 30 rows containing missing values or values outside the scale range
#> (`geom_line()`).