Data preprocessing

The extraction of spectra works the same way as with high resolution LC-MS/MS data:

ms2list <- extractMS2spectra(lowresfile)
length(ms2list)
#> [1] 1989

As in the GC-EI-MS example, we have to set mz_tolerance to a much higher value than for high resolution data, while the retention time tolerance can remain unchanged.

featlist <- mergeMS2spectra(ms2list, mz_tolerance = 0.02)

length(featlist)
#> [1] 525
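To illustrate why the tolerance matters for merging, here is a minimal base R sketch. It is not CluMSID's merging algorithm: `group_spectra` is a hypothetical helper, the precursor values are made up, and the tolerance is treated as an absolute m/z difference purely for illustration (how mergeMS2spectra interprets mz_tolerance should be checked in its help page).

```r
# Toy precursor table: m/z and retention time (s) of five MS2 spectra.
# All values are made up for illustration.
spectra <- data.frame(
  mz = c(270.2, 270.4, 270.3, 286.1, 286.2),
  rt = c(300, 305, 310, 450, 455)
)

# Hypothetical helper (not part of CluMSID): after sorting by m/z, a spectrum
# starts a new feature when it differs from its predecessor by more than
# mz_tol in m/z or by more than rt_tol in retention time.
group_spectra <- function(df, mz_tol, rt_tol) {
  df <- df[order(df$mz, df$rt), ]
  new_group <- c(TRUE, abs(diff(df$mz)) > mz_tol | abs(diff(df$rt)) > rt_tol)
  df$feature <- cumsum(new_group)
  df
}

# A tolerance suited to accurate-mass data splits the noisy precursors apart.
tight <- group_spectra(spectra, mz_tol = 0.005, rt_tol = 30)
# A wider tolerance merges spectra that belong to the same precursor.
loose <- group_spectra(spectra, mz_tol = 0.5, rt_tol = 30)

length(unique(tight$feature))
#> [1] 5
length(unique(loose$feature))
#> [1] 2
```

With the tight tolerance every spectrum ends up as its own feature, whereas the wider tolerance recovers the two underlying precursors.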

We obtain numbers of spectra similar to those in the General Tutorial, because we tried to keep all parameters comparable except for the type of mass spectrometer.

Generation of distance matrix

As we do not have low resolution spectral libraries at hand, we skip the integration of previous knowledge on feature identities in this example and generate a distance matrix right away:

distmat <- distanceMatrix(featlist)
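For readers who want to see what a single entry of such a distance matrix could look like, the following base R sketch computes a cosine-based distance between two toy spectra. It is an illustration under stated assumptions, not CluMSID's implementation: we assume a cosine similarity on peaks aligned by nominal m/z and a distance of one minus that similarity, and `cosine_similarity` is a hypothetical helper.

```r
# Two toy low-resolution spectra: m/z and intensity (made-up values).
spec_a <- cbind(mz = c(100, 150, 200), intensity = c(10, 50, 100))
spec_b <- cbind(mz = c(100, 150, 250), intensity = c(20, 40, 80))

# Hypothetical helper (not part of CluMSID): align peaks on nominal m/z and
# compute the cosine of the angle between the two intensity vectors.
# Assumes at most one peak per nominal m/z within each spectrum.
cosine_similarity <- function(x, y) {
  bins <- sort(union(round(x[, "mz"]), round(y[, "mz"])))
  vx <- numeric(length(bins))
  vy <- numeric(length(bins))
  vx[match(round(x[, "mz"]), bins)] <- x[, "intensity"]
  vy[match(round(y[, "mz"]), bins)] <- y[, "intensity"]
  sum(vx * vy) / sqrt(sum(vx^2) * sum(vy^2))
}

sim <- cosine_similarity(spec_a, spec_b)
dist_ab <- 1 - sim  # assumed conversion from similarity to distance
dist_ab
```

Identical spectra give a similarity of 1 and hence a distance of 0, while spectra without shared peaks give a distance of 1.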

Data exploration

Starting from this distance matrix, we can use all the data exploration functions that CluMSID offers.

When we now make an MDS plot, we learn that the similarity data is very different from the comparable high resolution data:

MDSplot(distmat)

Figure 1: Multidimensional scaling plot as a visualisation of MS² spectra similarities of the low resolution LC-MS/MS example data set. Black dots signify spectra from unknown metabolites.
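The idea behind such a plot can be sketched with base R's cmdscale(), which performs classical multidimensional scaling. This is a toy illustration, not the code behind MDSplot: the four points and their labels are made up, and the distances are derived from known 2D coordinates so that the recovered configuration can be checked.

```r
# Four made-up points in two dimensions, forming two loose pairs.
pts <- rbind(F1 = c(0.0, 0.0), F2 = c(0.2, 0.0),
             F3 = c(0.8, 0.5), F4 = c(0.9, 0.5))

# Pairwise Euclidean distances stand in for spectral distances here.
toy_dist <- dist(pts)

# Classical MDS: find 2D coordinates whose distances approximate toy_dist.
coords <- cmdscale(toy_dist, k = 2)

# The coordinates are only determined up to rotation and reflection,
# but the pairwise distances are reproduced.
all.equal(as.vector(dist(coords)), as.vector(toy_dist))
#> [1] TRUE

plot(coords, xlab = "Coordinate 1", ylab = "Coordinate 2", pch = 16)
text(coords, labels = rownames(coords), pos = 3)
```

For real spectral distances an exact two-dimensional representation usually does not exist, so the plot is an approximation of the similarity structure.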

To get a better overview of the data and the general similarity behaviour, we create a heat map of the distance matrix:

HCplot(distmat, type = "heatmap",
       cexRow = 0.1, cexCol = 0.1,
       margins = c(6, 6))

Figure 2: Symmetric heat map of the distance matrix displaying MS² spectra similarities of the low resolution LC-MS/MS example data set, along with dendrograms resulting from hierarchical clustering based on the distance matrix. The colour encoding is shown in the top-left insert.

We clearly see that the heat map is generally a lot “warmer” than in the General Tutorial (an impression that is supported by the histogram in the top-left corner), i.e. the general degree of similarity between spectra is higher. That is not surprising: the m/z information has far fewer levels than in high resolution data, so fragments with different sum formulas are more likely to have indistinguishable mass-to-charge ratios.
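A small worked example makes the loss of information concrete. The three species below are a textbook case chosen for illustration (they are not taken from the example data set); the values are neutral monoisotopic masses rounded to four decimals.

```r
# Monoisotopic masses (Da) of three species with different sum formulas.
exact_mass <- c(CO = 27.9949, N2 = 28.0061, C2H4 = 28.0313)

# With accurate mass measurement the three species are distinguishable ...
length(unique(round(exact_mass, 3)))
#> [1] 3

# ... but at unit resolution they collapse onto the same nominal mass.
nominal_mass <- round(exact_mass)
length(unique(nominal_mass))
#> [1] 1
```

The same effect applies to fragment ions: peaks that would count as different in high resolution spectra are treated as matching in low resolution spectra, which raises the computed similarities.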

We also see that some more or less compact clusters can be identified. This is easier to inspect in the dendrogram visualisation:

HCplot(distmat, h = 0.8, cex = 0.3)

Figure 3: Circularised dendrogram as a result of agglomerative hierarchical clustering with average linkage as agglomeration criterion based on MS² spectra similarities of the low resolution LC-MS/MS example data set. Each leaf represents one feature and colours encode cluster affiliation of the features. Leaf labels display feature IDs. Distance from the central point is indicative of the height of the dendrogram.
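The clustering described in the caption can be sketched with base R on a toy distance matrix. This is an illustration, not the code behind HCplot: the five features and their distances are made up, and we assume that cutting the tree at a height of 0.8 corresponds to the role of the h argument used above.

```r
# Toy symmetric distance matrix for five features (made-up values).
toy <- matrix(
  c(0.00, 0.10, 0.20, 0.90, 0.95,
    0.10, 0.00, 0.15, 0.92, 0.90,
    0.20, 0.15, 0.00, 0.88, 0.93,
    0.90, 0.92, 0.88, 0.00, 0.30,
    0.95, 0.90, 0.93, 0.30, 0.00),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("F", 1:5), paste0("F", 1:5))
)

# Agglomerative hierarchical clustering with average linkage.
hc <- hclust(as.dist(toy), method = "average")

# Cut the tree at height 0.8: features joined below that height share a cluster.
clusters <- cutree(hc, h = 0.8)
clusters
#> F1 F2 F3 F4 F5 
#>  1  1  1  2  2

# Standard (non-circular) dendrogram of the same tree.
plot(hc)
```

The two groups merge at an average distance of about 0.91, which lies above the cut height, so they remain separate clusters.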

In conclusion, CluMSID is capable of processing low resolution LC-MS/MS data. If high resolution data is not available, it can provide a useful overview of spectral similarities in low resolution data and thereby support metabolite annotation, provided that some individual metabolites can be identified by comparison to authentic standards. However, for feature annotation, high resolution methods should always be favoured for the many benefits they provide.

Session Info

sessionInfo()
#> R version 4.5.0 beta (2025-04-02 r88102)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] magrittr_2.0.3      metaMSdata_1.43.0   metaMS_1.45.0      
#>  [4] CAMERA_1.65.0       xcms_4.7.0          BiocParallel_1.43.0
#>  [7] Biobase_2.69.0      BiocGenerics_0.55.0 generics_0.1.3     
#> [10] CluMSIDdata_1.23.0  CluMSID_1.25.0     
#> 
#> loaded via a namespace (and not attached):
#>   [1] RColorBrewer_1.1-3          rstudioapi_0.17.1          
#>   [3] jsonlite_2.0.0              MultiAssayExperiment_1.35.0
#>   [5] farver_2.1.2                MALDIquant_1.22.3          
#>   [7] rmarkdown_2.29              fs_1.6.6                   
#>   [9] vctrs_0.6.5                 base64enc_0.1-3            
#>  [11] htmltools_0.5.8.1           S4Arrays_1.9.0             
#>  [13] BiocBaseUtils_1.11.0        progress_1.2.3             
#>  [15] SparseArray_1.9.0           Formula_1.2-5              
#>  [17] mzID_1.47.0                 sass_0.4.10                
#>  [19] KernSmooth_2.23-26          bslib_0.9.0                
#>  [21] htmlwidgets_1.6.4           plyr_1.8.9                 
#>  [23] impute_1.83.0               plotly_4.10.4              
#>  [25] cachem_1.1.0                igraph_2.1.4               
#>  [27] lifecycle_1.0.4             iterators_1.0.14           
#>  [29] pkgconfig_2.0.3             Matrix_1.7-3               
#>  [31] R6_2.6.1                    fastmap_1.2.0              
#>  [33] GenomeInfoDbData_1.2.14     MatrixGenerics_1.21.0      
#>  [35] clue_0.3-66                 digest_0.6.37              
#>  [37] pcaMethods_2.1.0            colorspace_2.1-1           
#>  [39] GGally_2.2.1                S4Vectors_0.47.0           
#>  [41] Hmisc_5.2-3                 GenomicRanges_1.61.0       
#>  [43] labeling_0.4.3              Spectra_1.19.0             
#>  [45] httr_1.4.7                  abind_1.4-8                
#>  [47] compiler_4.5.0              bit64_4.6.0-1              
#>  [49] withr_3.0.2                 doParallel_1.0.17          
#>  [51] backports_1.5.0             htmlTable_2.4.3            
#>  [53] DBI_1.2.3                   ggstats_0.9.0              
#>  [55] gplots_3.2.0                MASS_7.3-65                
#>  [57] MsExperiment_1.11.0         DelayedArray_0.35.0        
#>  [59] gtools_3.9.5                caTools_1.18.3             
#>  [61] mzR_2.43.0                  tools_4.5.0                
#>  [63] foreign_0.8-90              PSMatch_1.13.0             
#>  [65] ape_5.8-1                   nnet_7.3-20                
#>  [67] glue_1.8.0                  dbscan_1.2.2               
#>  [69] nlme_3.1-168                QFeatures_1.19.0           
#>  [71] grid_4.5.0                  checkmate_2.3.2            
#>  [73] cluster_2.1.8.1             reshape2_1.4.4             
#>  [75] gtable_0.3.6                tzdb_0.5.0                 
#>  [77] preprocessCore_1.71.0       tidyr_1.3.1                
#>  [79] sna_2.8                     data.table_1.17.0          
#>  [81] hms_1.1.3                   MetaboCoreUtils_1.17.0     
#>  [83] XVector_0.49.0              foreach_1.5.2              
#>  [85] pillar_1.10.2               stringr_1.5.1              
#>  [87] vroom_1.6.5                 limma_3.65.0               
#>  [89] robustbase_0.99-4-1         dplyr_1.1.4                
#>  [91] lattice_0.22-7              bit_4.6.0                  
#>  [93] tidyselect_1.2.1            RBGL_1.85.0                
#>  [95] knitr_1.50                  gridExtra_2.3              
#>  [97] IRanges_2.43.0              ProtGenerics_1.41.0        
#>  [99] SummarizedExperiment_1.39.0 stats4_4.5.0               
#> [101] xfun_0.52                   statmod_1.5.0              
#> [103] MSnbase_2.35.0              matrixStats_1.5.0          
#> [105] DEoptimR_1.1-3-1            stringi_1.8.7              
#> [107] UCSC.utils_1.5.0            statnet.common_4.11.0      
#> [109] lazyeval_0.2.2              yaml_2.3.10                
#> [111] evaluate_1.0.3              codetools_0.2-20           
#> [113] archive_1.1.12              MsCoreUtils_1.21.0         
#> [115] tibble_3.2.1                BiocManager_1.30.25        
#> [117] graph_1.87.0                cli_3.6.4                  
#> [119] affyio_1.79.0               rpart_4.1.24               
#> [121] munsell_0.5.1               jquerylib_0.1.4            
#> [123] network_1.19.0              Rcpp_1.0.14                
#> [125] GenomeInfoDb_1.45.0         MassSpecWavelet_1.75.0     
#> [127] coda_0.19-4.1               XML_3.99-0.18              
#> [129] parallel_4.5.0              readr_2.1.5                
#> [131] ggplot2_3.5.2               prettyunits_1.2.0          
#> [133] AnnotationFilter_1.33.0     bitops_1.0-9               
#> [135] viridisLite_0.4.2           MsFeatures_1.17.0          
#> [137] scales_1.3.0                affy_1.87.0                
#> [139] ncdf4_1.24                  purrr_1.0.4                
#> [141] crayon_1.5.3                rlang_1.1.6                
#> [143] vsn_3.77.0

Clustering Mass Spectra from Low Resolution LC-MS/MS Data Using CluMSID

Tobias Depke

April 15, 2025
