MSstatsResponse provides statistical tools for detecting drug-protein interactions and estimating IC50 values from dose-response mass spectrometry-based proteomics experiments. The package implements isotonic regression models to detect monotonic dose-response relationships and provides robust statistical inference through F-tests and bootstrap confidence intervals.
MSstatsResponse is designed to work seamlessly with summarized data from MSstats and MSstatsTMT workflows, but can also process other dose-response proteomics datasets that include protein-level quantifications across drug treatments.
The package employs isotonic regression, a non-parametric method that fits monotonic functions without assuming a specific functional form. This approach is particularly suitable for dose-response data where monotonicity (increasing or decreasing trend) is expected.
MSstatsResponse implements a three-step statistical analysis pipeline:
MSstatsPrepareDoseResponseFit() - Convert input data to standardized formatdoseResponseFit() - Detect drug-protein interactions using isotonic regression and F-testspredictIC50() - Estimate IC50 values with bootstrap confidence intervalsfutureExperimentSimulation - Simulate dose-response experiments to determine optimal study designs and predict true-positive and false-positive rates# Install from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("MSstatsResponse")
library(MSstatsResponse)
library(dplyr)
library(tidyverse)
library(ggplot2)
library(data.table)
# Optional: for upstream data processing
# library(MSstats)
# library(MSstatsTMT)
If analyzing a mass spectrometry-based proteomics dataset, data should be preprocessed using the standard MSstats workflow for cleaning, normalization, and protein summarization prior to analysis with MSstatsResponse.
When using MSstats, data can be converted with one of the available converter functions (e.g., SpectronauttoMSstatsFormat()) and subsequently processed with dataProcess().
For MSstatsTMT, converter functions (e.g., MaxQtoMSstatsTMTFormat()) may be used, followed by proteinSummarization() for data processing.
Please refer to the MSstats and MSstatsTMT Bioconductor vignettes for a comprehensive list of supported converters and detailed preprocessing recommendations.
# Read raw data (example with Spectronaut output)
raw_data <- read_tsv("path/to/spectronaut_report.tsv")
# Convert to MSstats format
msstats_data <- SpectronauttoMSstatsFormat(raw_data)
# Process data: normalization and protein summarization
processed_data <- dataProcess(
msstats_data,
normalization = "equalizeMedians", # or FALSE for no normalization
summaryMethod = "TMP", # Tukey's median polish
MBimpute = TRUE, # Impute missing values
maxQuantileforCensored = 0.999
)
# Extract protein-level data for dose-response analysis
protein_level_data <- processed_data$ProteinLevelData
For this vignette, we’ll use pre-processed data prepared following the steps described above. For more information on preprocessing mass spectrometry–based proteomics experiments, see the vignettes for MSstats and/or MSstatsTMT.
# Load pre-processed DIA-MS data example from data/ (DIA_MSstats_Normalized.rda)
data("DIA_MSstats_Normalized", package = "MSstatsResponse")
dia_normalized <- DIA_MSstats_Normalized
# Examine data structure
str(dia_normalized$ProteinLevelData[1:5, ])
#> 'data.frame': 5 obs. of 11 variables:
#> $ RUN : Factor w/ 27 levels "1","2","3","4",..: 1 2 3 4 5
#> $ Protein : Factor w/ 3 levels "PROTEIN_A","PROTEIN_B",..: 1 1 1 1 1
#> $ LogIntensities : num 5.05 5.65 5.15 5.02 5.32
#> $ originalRUN : Factor w/ 27 levels "20240221_3uL_neu-01_dmso-1_dia_2x10_0p65-1p45_85min_Slot1-1_1_2571",..: 1 2 3 4 5
#> $ GROUP : Factor w/ 9 levels "Drug1_001nM",..: 9 9 9 1 1
#> $ SUBJECT : num 1 2 3 1 2
#> $ TotalGroupMeasurements: int 396 396 396 396 396
#> $ NumMeasuredFeature : int 132 132 132 132 132
#> $ MissingPercentage : num 0 0 0 0 0
#> $ more50missing : logi FALSE FALSE FALSE FALSE FALSE
#> $ NumImputedFeature : int 0 0 0 0 0
The convertGroupToNumericDose() function parses MSstats-style GROUP labels to extract drug names and numeric dose values.
Format expectations:
- Control samples: "DMSO" (assigned dose = 0)
- Treatment samples: "DrugName_ValueUnit" (e.g., "Drug1_003uM", "Drug2_300nM")
# Convert GROUP labels to numeric doses
dose_info <- convertGroupToNumericDose(dia_normalized$ProteinLevelData$GROUP)
# Add dose and drug information to the dataset
dia_normalized$ProteinLevelData <- dia_normalized$ProteinLevelData %>%
mutate(
dose_nM = dose_info$dose_nM, # Dose in nanomolar
drug = dose_info$drug # Drug name
)
# View converted data
dia_normalized$ProteinLevelData %>%
select(Protein, GROUP, drug, dose_nM) %>%
head(10)
#> Protein GROUP drug dose_nM
#> 1 PROTEIN_A DMSO DMSO 0
#> 2 PROTEIN_A DMSO DMSO 0
#> 3 PROTEIN_A DMSO DMSO 0
#> 4 PROTEIN_A Drug1_001nM Drug1 1
#> 5 PROTEIN_A Drug1_001nM Drug1 1
#> 6 PROTEIN_A Drug1_001nM Drug1 1
#> 7 PROTEIN_A Drug1_001uM Drug1 1000
#> 8 PROTEIN_A Drug1_001uM Drug1 1000
#> 9 PROTEIN_A Drug1_001uM Drug1 1000
#> 10 PROTEIN_A Drug1_003nM Drug1 3
This function standardizes the data structure for downstream analysis. Users can either use pre-processed data as shown above (e.g., dia_normalized$ProteinLevelData) or begin at this step with their own datasets. It is important to ensure that column names are correctly matched to the expected ones when using MSstatsPrepareDoseResponseFit().
The dose_column corresponds to the values on the x-axis (e.g., dose, concentration, time, or temperature). The protein_column should contain unique identifiers (e.g., protein, gene, or peptide sequence), and the log_abundance_column represents the response values on the y-axis (e.g., log intensity, expression, or growth).
# Prepare data for dose-response fitting
dia_prepared <- MSstatsPrepareDoseResponseFit(
data = dia_normalized$ProteinLevelData,
dose_column = "dose_nM",
drug_column = "drug",
protein_column = "Protein",
log_abundance_column = "LogIntensities",
transform_nM_to_M = TRUE
)
# Examine prepared data structure
str(dia_prepared)
#> 'data.frame': 81 obs. of 5 variables:
#> $ protein : Factor w/ 3 levels "PROTEIN_A","PROTEIN_B",..: 1 1 1 1 1 1 1 1 1 1 ...
#> $ drug : chr "DMSO" "DMSO" "DMSO" "Drug1" ...
#> $ dose : num 0e+00 0e+00 0e+00 1e-09 1e-09 1e-09 1e-06 1e-06 1e-06 3e-09 ...
#> $ response: num 5.05 5.65 5.15 5.02 5.32 ...
#> $ dose_nM : num 0 0 0 1 1 1 1000 1000 1000 3 ...
# View sample of prepared data
dia_prepared %>%
filter(protein %in% c("PROTEIN_A", "PROTEIN_B")) %>%
arrange(protein, drug, dose) %>%
head(20)
#> protein drug dose response dose_nM
#> 1 PROTEIN_A DMSO 0e+00 5.050662 0
#> 2 PROTEIN_A DMSO 0e+00 5.647209 0
#> 3 PROTEIN_A DMSO 0e+00 5.148809 0
#> 4 PROTEIN_A Drug1 1e-09 5.019425 1
#> 5 PROTEIN_A Drug1 1e-09 5.318420 1
#> 6 PROTEIN_A Drug1 1e-09 5.141087 1
#> 7 PROTEIN_A Drug1 3e-09 5.218344 3
#> 8 PROTEIN_A Drug1 3e-09 5.199884 3
#> 9 PROTEIN_A Drug1 3e-09 5.320023 3
#> 10 PROTEIN_A Drug1 1e-08 4.848356 10
#> 11 PROTEIN_A Drug1 1e-08 5.205914 10
#> 12 PROTEIN_A Drug1 1e-08 4.743209 10
#> 13 PROTEIN_A Drug1 3e-08 4.663491 30
#> 14 PROTEIN_A Drug1 3e-08 4.486120 30
#> 15 PROTEIN_A Drug1 3e-08 4.505798 30
#> 16 PROTEIN_A Drug1 1e-07 3.936098 100
#> 17 PROTEIN_A Drug1 1e-07 4.231085 100
#> 18 PROTEIN_A Drug1 1e-07 4.327733 100
#> 19 PROTEIN_A Drug1 3e-07 3.590255 300
#> 20 PROTEIN_A Drug1 3e-07 3.212578 300
Isotonic regression fits a monotonic function to the data without assuming a parametric form. For this example, we have drug inhibitor dose-response data. When treated with inhibitors we typically expect:
The doseResponseFit() function:
1. Fits isotonic regression for each protein-drug pair
2. Performs F-test comparing the fitted model to a flat (null) model
3. Adjusts p-values for multiple testing using Benjamini-Hochberg procedure
# Detect drug-protein interactions
response_results <- doseResponseFit(
data = dia_prepared,
increasing = FALSE, # Expect decreasing response
transform_dose = TRUE, # Apply log10(dose + 1) transformation
ratio_response = FALSE # Stay on log2 scale for testing
)
# Examine results
response_results %>%
select(protein, drug, F_statistic, P_value, adjust_pval) %>%
arrange(adjust_pval) %>%
head(10)
#> protein drug F_statistic P_value adjust_pval
#> Drug1.2 PROTEIN_B Drug1 255.431210 0.000000e+00 0.000000e+00
#> Drug1.1 PROTEIN_A Drug1 76.470195 2.590261e-12 3.885392e-12
#> Drug1.3 PROTEIN_C Drug1 1.180529 3.627180e-01 3.627180e-01
F_statistic: Ratio of between-dose to within-dose variance
P_value: Raw p-value from F-test
adjust_pval: FDR-adjusted p-value (Benjamini-Hochberg)
Proteins with adjust_pval < 0.05 show significant dose-dependent responses.
# Count significant interactions
n_total <- nrow(response_results)
n_significant <- sum(response_results$adjust_pval < 0.05)
cat("Total protein-drug pairs tested:", n_total, "\n")
#> Total protein-drug pairs tested: 3
cat("Significant interactions (FDR < 0.05):", n_significant, "\n")
#> Significant interactions (FDR < 0.05): 2
cat("Percentage significant:", round(100 * n_significant/n_total, 1), "%\n")
#> Percentage significant: 66.7 %
# Top hits
top_hits <- response_results %>%
filter(adjust_pval < 0.05) %>%
arrange(adjust_pval) %>%
head(5)
print(top_hits)
#> protein drug SSE_Full SSE_Null F_statistic P_value
#> Drug1.2 PROTEIN_B Drug1 1.0502856 120.28394 255.4312 0.000000e+00
#> Drug1.1 PROTEIN_A Drug1 0.7482337 26.17827 76.4702 2.590261e-12
#> adjust_pval
#> Drug1.2 0.000000e+00
#> Drug1.1 3.885392e-12
IC50 represents the dose at which the response is reduced to 50% of the control (DMSO) level. MSstatsResponse estimates IC50 by:
Fitting isotonic regression on the ratio scale (response relative to DMSO)
Finding the dose where fitted response = 0.5
Computing confidence intervals via bootstrap resampling
# Estimate IC50 with bootstrap confidence intervals
ic50_predictions <- predictIC50(
data = dia_prepared,
ratio_response = TRUE, # Use ratio scale for IC50
transform_dose = TRUE, # Log-transform doses
increasing = FALSE, # Decreasing response expected
bootstrap = TRUE, # Compute confidence intervals
n_samples = 1000, # Number of bootstrap samples
alpha = 0.10 # 90% confidence intervals
)
# View IC50 estimates
ic50_predictions %>%
arrange(IC50) %>%
head(10)
#> protein drug IC50 IC50_lower_bound IC50_upper_bound
#> <fctr> <char> <num> <num> <num>
#> 1: PROTEIN_A Drug1 7.47197 7.887871 7.211512
#> 2: PROTEIN_B Drug1 8.19419 8.520550 7.969620
#> 3: PROTEIN_C Drug1 NA NA NA
For datasets with many proteins, use parallel processing:
# Register a BiocParallel backend
library(BiocParallel)
if (.Platform$OS.type == "windows") {
register(SnowParam(workers = 4, type = "SOCK"))
} else {
register(MulticoreParam(workers = 4))
}
# Parallel IC50 estimation (faster for large datasets)
ic50_parallel <- predictIC50(
data = dia_prepared,
ratio_response = TRUE,
transform_dose = TRUE,
bootstrap = TRUE,
n_samples = 1000,
BPPARAM = bpparam() # use 4 CPU cores defined above
)
The visualizeResponseProtein() function creates publication-quality dose-response plots:
# Visualize strong responder
visualizeResponseProtein(
data = dia_prepared,
protein_name = "PROTEIN_A",
drug_name = "Drug1",
ratio_response = TRUE,
show_ic50 = TRUE,
add_ci = TRUE,
transform_dose = TRUE,
n_samples = 1000
)
#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
#> ℹ Please use `linewidth` instead.
#> ℹ The deprecated feature was likely used in the MSstatsResponse package.
#> Please report the issue at
#> <https://github.com/Vitek-Lab/MSstatsResponse/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
# Visualize another target
visualizeResponseProtein(
data = dia_prepared,
protein_name = "PROTEIN_B",
drug_name = "Drug1",
ratio_response = TRUE,
show_ic50 = TRUE,
add_ci = TRUE
)
# Log2 scale (left panel)
p1 <- visualizeResponseProtein(
data = dia_prepared,
protein_name = "PROTEIN_A",
drug_name = "Drug1",
ratio_response = FALSE, # Log2 scale
show_ic50 = TRUE,
add_ci = FALSE
)
# Ratio scale (right panel)
p2 <- visualizeResponseProtein(
data = dia_prepared,
protein_name = "PROTEIN_A",
drug_name = "Drug1",
ratio_response = TRUE, # Ratio scale
show_ic50 = TRUE,
add_ci = FALSE
)
# Combine plots (requires gridExtra)
gridExtra::grid.arrange(p1, p2, ncol = 2)
The futureExperimentSimulation() function helps optimize experimental design by simulating performance under different conditions:
You can use proteins from your own experimental data as templates for more realistic simulations. This approach leverages your validated hits to model expected performance:
# First, identify proteins from your results to use as templates
# Run simulation using your data as templates
custom_simulation <- futureExperimentSimulation(
N_proteins = 3000,
N_rep = 2,
N_Control_Rep = 3,
Concentrations = c(0, 1, 3, 10, 30, 100, 300, 1000, 3000),
data = dia_prepared, # Your data
strong_proteins = 'PROTEIN_B', # User defined strong responder
weak_proteins = 'PROTEIN_A', # User defined weak responder
no_interaction_proteins = 'PROTEIN_C', # User defined negative control
drug_name = "Drug1", # Specify which drug to model
IC50_Prediction = FALSE
)
#> Templates successfully extracted from data
#> Strong interaction proteins: PROTEIN_B
#> Weak interaction proteins: PROTEIN_A
#> No interaction proteins: PROTEIN_C
# Compare performance metrics
print(custom_simulation$Hit_Rates_Plot)
# Examine the templates that were extracted from your data
print(custom_simulation$Template_Used)
#> $strong_interaction
#> dose Intensity LogIntensities ratio
#> 1 0 42.256113 5.4010882 1.00000000
#> 2 1 38.863432 5.2803414 0.91971147
#> 3 3 26.873610 4.7481182 0.63596977
#> 4 10 22.303763 4.4792152 0.52782334
#> 5 30 17.327563 4.1149968 0.41006050
#> 6 100 10.777063 3.4298922 0.25504152
#> 7 300 1.186329 0.2465041 0.02807473
#> 8 1000 1.266790 0.3411772 0.02997885
#> 9 3000 1.177035 0.2351572 0.02785479
#>
#> $weak_interaction
#> dose Intensity LogIntensities ratio
#> 1 0 38.914246 5.282227 1.0000000
#> 2 1 35.744368 5.159644 0.9185420
#> 3 3 37.951462 5.246084 0.9752588
#> 4 10 30.537135 4.932493 0.7847289
#> 5 30 23.454666 4.551803 0.6027270
#> 6 100 17.938304 4.164972 0.4609701
#> 7 300 9.453305 3.240819 0.2429266
#> 8 1000 7.395559 2.886659 0.1900476
#> 9 3000 6.878926 2.782183 0.1767714
#>
#> $no_interaction
#> dose Intensity LogIntensities ratio
#> 1 0 59.16019 5.886555 1.0000000
#> 2 1 60.11579 5.909672 1.0161526
#> 3 3 56.23066 5.813285 0.9504813
#> 4 10 57.99096 5.857756 0.9802362
#> 5 30 57.62351 5.848586 0.9740250
#> 6 100 57.80438 5.853107 0.9770823
#> 7 300 54.28170 5.762394 0.9175376
#> 8 1000 56.94126 5.831402 0.9624927
#> 9 3000 55.61656 5.797443 0.9401010
sessionInfo()
#> R version 4.5.1 Patched (2025-08-23 r88802)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] data.table_1.17.8 lubridate_1.9.4 forcats_1.0.1
#> [4] stringr_1.5.2 purrr_1.1.0 readr_2.1.5
#> [7] tidyr_1.3.1 tibble_3.3.0 ggplot2_4.0.0
#> [10] tidyverse_2.0.0 dplyr_1.1.4 MSstatsResponse_0.99.2
#> [13] BiocStyle_2.37.1
#>
#> loaded via a namespace (and not attached):
#> [1] sass_0.4.10 generics_0.1.4 stringi_1.8.7
#> [4] hms_1.1.4 digest_0.6.37 magrittr_2.0.4
#> [7] evaluate_1.0.5 grid_4.5.1 timechange_0.3.0
#> [10] RColorBrewer_1.1-3 bookdown_0.45 fastmap_1.2.0
#> [13] jsonlite_2.0.0 tinytex_0.57 gridExtra_2.3
#> [16] BiocManager_1.30.26 scales_1.4.0 codetools_0.2-20
#> [19] jquerylib_0.1.4 cli_3.6.5 crayon_1.5.3
#> [22] rlang_1.1.6 withr_3.0.2 cachem_1.1.0
#> [25] yaml_2.3.10 tools_4.5.1 parallel_4.5.1
#> [28] tzdb_0.5.0 BiocParallel_1.43.4 vctrs_0.6.5
#> [31] R6_2.6.1 magick_2.9.0 lifecycle_1.0.4
#> [34] pkgconfig_2.0.3 pillar_1.11.1 bslib_0.9.0
#> [37] gtable_0.3.6 Rcpp_1.1.0 glue_1.8.0
#> [40] xfun_0.53 tidyselect_1.2.1 knitr_1.50
#> [43] dichromat_2.0-0.1 farver_2.1.2 htmltools_0.5.8.1
#> [46] labeling_0.4.3 rmarkdown_2.30 compiler_4.5.1
#> [49] S7_0.2.0
Szvetecz, S., et al. (2025).MSstatsResponse: Semi-parametric statistical model enhances drug-protein interaction detection and IC50 estimation in chemoproteomics experiments. Manuscript in preparation.
Choi, M., et al. (2014). MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments. Bioinformatics, 30(17), 2524-2526.