Contents

0.1 Introduction

MSstatsResponse provides statistical tools for detecting drug-protein interactions and estimating IC50 values from dose-response mass spectrometry-based proteomics experiments. The package implements isotonic regression models to detect monotonic dose-response relationships and provides robust statistical inference through F-tests and bootstrap confidence intervals.

MSstatsResponse is designed to work seamlessly with summarized data from MSstats and MSstatsTMT workflows, but can also process other dose-response proteomics datasets that include protein-level quantifications across drug treatments.

0.1.1 Statistical framework

The package employs isotonic regression, a non-parametric method that fits monotonic functions without assuming a specific functional form. This approach is particularly suitable for dose-response data where monotonicity (increasing or decreasing trend) is expected.

0.1.2 Analysis workflow

MSstatsResponse implements a three-step statistical analysis pipeline:

  1. Data preparation: MSstatsPrepareDoseResponseFit() - Convert input data to standardized format
  2. Interaction detection: doseResponseFit() - Detect drug-protein interactions using isotonic regression and F-tests
  3. IC50 estimation: predictIC50() - Estimate IC50 values with bootstrap confidence intervals
  4. Power analysis for experimental planning: futureExperimentSimulation - Simulate dose-response experiments to determine optimal study designs and predict true-positive and false-positive rates

0.2 Installation and setup

0.2.1 Install MSstatsResponse

# Install from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install("MSstatsResponse")

0.2.2 Load required libraries

library(MSstatsResponse)
library(dplyr)
library(tidyverse)
library(ggplot2)
library(data.table)

# Optional: for upstream data processing
# library(MSstats)
# library(MSstatsTMT)

0.3 Data preprocessing with MSstats

If analyzing a mass spectrometry-based proteomics dataset, data should be preprocessed using the standard MSstats workflow for cleaning, normalization, and protein summarization prior to analysis with MSstatsResponse.

When using MSstats, data can be converted with one of the available converter functions (e.g., SpectronauttoMSstatsFormat()) and subsequently processed with dataProcess().

For MSstatsTMT, converter functions (e.g., MaxQtoMSstatsTMTFormat()) may be used, followed by proteinSummarization() for data processing.

Please refer to the MSstats and MSstatsTMT Bioconductor vignettes for a comprehensive list of supported converters and detailed preprocessing recommendations.

0.3.1 Example MSstats workflow

# Read raw data (example with Spectronaut output)
raw_data <- read_tsv("path/to/spectronaut_report.tsv")

# Convert to MSstats format
msstats_data <- SpectronauttoMSstatsFormat(raw_data)

# Process data: normalization and protein summarization
processed_data <- dataProcess(
  msstats_data,
  normalization = "equalizeMedians",  # or FALSE for no normalization
  summaryMethod = "TMP",              # Tukey's median polish
  MBimpute = TRUE,                     # Impute missing values
  maxQuantileforCensored = 0.999
)

# Extract protein-level data for dose-response analysis
protein_level_data <- processed_data$ProteinLevelData

0.3.2 Loading example data

For this vignette, we’ll use pre-processed data prepared following the steps described above. For more information on preprocessing mass spectrometry–based proteomics experiments, see the vignettes for MSstats and/or MSstatsTMT.

# Load pre-processed DIA-MS data example from data/ (DIA_MSstats_Normalized.rda)
data("DIA_MSstats_Normalized", package = "MSstatsResponse")
dia_normalized <- DIA_MSstats_Normalized

# Examine data structure
str(dia_normalized$ProteinLevelData[1:5, ])
#> 'data.frame':    5 obs. of  11 variables:
#>  $ RUN                   : Factor w/ 27 levels "1","2","3","4",..: 1 2 3 4 5
#>  $ Protein               : Factor w/ 3 levels "PROTEIN_A","PROTEIN_B",..: 1 1 1 1 1
#>  $ LogIntensities        : num  5.05 5.65 5.15 5.02 5.32
#>  $ originalRUN           : Factor w/ 27 levels "20240221_3uL_neu-01_dmso-1_dia_2x10_0p65-1p45_85min_Slot1-1_1_2571",..: 1 2 3 4 5
#>  $ GROUP                 : Factor w/ 9 levels "Drug1_001nM",..: 9 9 9 1 1
#>  $ SUBJECT               : num  1 2 3 1 2
#>  $ TotalGroupMeasurements: int  396 396 396 396 396
#>  $ NumMeasuredFeature    : int  132 132 132 132 132
#>  $ MissingPercentage     : num  0 0 0 0 0
#>  $ more50missing         : logi  FALSE FALSE FALSE FALSE FALSE
#>  $ NumImputedFeature     : int  0 0 0 0 0

0.4 Data preparation for dose-response analysis

0.4.1 Converting GROUP labels to numeric doses

The convertGroupToNumericDose() function parses MSstats-style GROUP labels to extract drug names and numeric dose values.

Format expectations:
- Control samples: "DMSO" (assigned dose = 0)
- Treatment samples: "DrugName_ValueUnit" (e.g., "Drug1_003uM", "Drug2_300nM")

# Convert GROUP labels to numeric doses
dose_info <- convertGroupToNumericDose(dia_normalized$ProteinLevelData$GROUP)

# Add dose and drug information to the dataset
dia_normalized$ProteinLevelData <- dia_normalized$ProteinLevelData %>%
  mutate(
    dose_nM = dose_info$dose_nM,  # Dose in nanomolar
    drug = dose_info$drug          # Drug name
  )


# View converted data
dia_normalized$ProteinLevelData %>%
  select(Protein, GROUP, drug, dose_nM) %>%
  head(10)
#>      Protein       GROUP  drug dose_nM
#> 1  PROTEIN_A        DMSO  DMSO       0
#> 2  PROTEIN_A        DMSO  DMSO       0
#> 3  PROTEIN_A        DMSO  DMSO       0
#> 4  PROTEIN_A Drug1_001nM Drug1       1
#> 5  PROTEIN_A Drug1_001nM Drug1       1
#> 6  PROTEIN_A Drug1_001nM Drug1       1
#> 7  PROTEIN_A Drug1_001uM Drug1    1000
#> 8  PROTEIN_A Drug1_001uM Drug1    1000
#> 9  PROTEIN_A Drug1_001uM Drug1    1000
#> 10 PROTEIN_A Drug1_003nM Drug1       3

0.4.2 Formatting data with MSstatsPrepareDoseResponseFit()

This function standardizes the data structure for downstream analysis. Users can either use pre-processed data as shown above (e.g., dia_normalized$ProteinLevelData) or begin at this step with their own datasets. It is important to ensure that column names are correctly matched to the expected ones when using MSstatsPrepareDoseResponseFit().

The dose_column corresponds to the values on the x-axis (e.g., dose, concentration, time, or temperature). The protein_column should contain unique identifiers (e.g., protein, gene, or peptide sequence), and the log_abundance_column represents the response values on the y-axis (e.g., log intensity, expression, or growth).

# Prepare data for dose-response fitting
dia_prepared <- MSstatsPrepareDoseResponseFit(
  data = dia_normalized$ProteinLevelData,
  dose_column = "dose_nM",
  drug_column = "drug",
  protein_column = "Protein",
  log_abundance_column = "LogIntensities",
  transform_nM_to_M = TRUE  
)

# Examine prepared data structure
str(dia_prepared)
#> 'data.frame':    81 obs. of  5 variables:
#>  $ protein : Factor w/ 3 levels "PROTEIN_A","PROTEIN_B",..: 1 1 1 1 1 1 1 1 1 1 ...
#>  $ drug    : chr  "DMSO" "DMSO" "DMSO" "Drug1" ...
#>  $ dose    : num  0e+00 0e+00 0e+00 1e-09 1e-09 1e-09 1e-06 1e-06 1e-06 3e-09 ...
#>  $ response: num  5.05 5.65 5.15 5.02 5.32 ...
#>  $ dose_nM : num  0 0 0 1 1 1 1000 1000 1000 3 ...

# View sample of prepared data
dia_prepared %>%
  filter(protein %in% c("PROTEIN_A", "PROTEIN_B")) %>%
  arrange(protein, drug, dose) %>%
  head(20)
#>      protein  drug  dose response dose_nM
#> 1  PROTEIN_A  DMSO 0e+00 5.050662       0
#> 2  PROTEIN_A  DMSO 0e+00 5.647209       0
#> 3  PROTEIN_A  DMSO 0e+00 5.148809       0
#> 4  PROTEIN_A Drug1 1e-09 5.019425       1
#> 5  PROTEIN_A Drug1 1e-09 5.318420       1
#> 6  PROTEIN_A Drug1 1e-09 5.141087       1
#> 7  PROTEIN_A Drug1 3e-09 5.218344       3
#> 8  PROTEIN_A Drug1 3e-09 5.199884       3
#> 9  PROTEIN_A Drug1 3e-09 5.320023       3
#> 10 PROTEIN_A Drug1 1e-08 4.848356      10
#> 11 PROTEIN_A Drug1 1e-08 5.205914      10
#> 12 PROTEIN_A Drug1 1e-08 4.743209      10
#> 13 PROTEIN_A Drug1 3e-08 4.663491      30
#> 14 PROTEIN_A Drug1 3e-08 4.486120      30
#> 15 PROTEIN_A Drug1 3e-08 4.505798      30
#> 16 PROTEIN_A Drug1 1e-07 3.936098     100
#> 17 PROTEIN_A Drug1 1e-07 4.231085     100
#> 18 PROTEIN_A Drug1 1e-07 4.327733     100
#> 19 PROTEIN_A Drug1 3e-07 3.590255     300
#> 20 PROTEIN_A Drug1 3e-07 3.212578     300

0.5 Drug-protein interaction detection

0.5.1 Understanding Isotonic Regression

Isotonic regression fits a monotonic function to the data without assuming a parametric form. For this example, we have drug inhibitor dose-response data. When treated with inhibitors we typically expect:

  • Non-increasing response: Drug binding reduces protein abundance
  • Monotonic constraint: Response doesn’t increase with dose

0.5.2 Running doseResponseFit()

The doseResponseFit() function:
1. Fits isotonic regression for each protein-drug pair
2. Performs F-test comparing the fitted model to a flat (null) model
3. Adjusts p-values for multiple testing using Benjamini-Hochberg procedure

# Detect drug-protein interactions
response_results <- doseResponseFit(
  data = dia_prepared,
  increasing = FALSE,        # Expect decreasing response
  transform_dose = TRUE,     # Apply log10(dose + 1) transformation
  ratio_response = FALSE     # Stay on log2 scale for testing
)

# Examine results
response_results %>%
  select(protein, drug, F_statistic, P_value, adjust_pval) %>%
  arrange(adjust_pval) %>%
  head(10)
#>           protein  drug F_statistic      P_value  adjust_pval
#> Drug1.2 PROTEIN_B Drug1  255.431210 0.000000e+00 0.000000e+00
#> Drug1.1 PROTEIN_A Drug1   76.470195 2.590261e-12 3.885392e-12
#> Drug1.3 PROTEIN_C Drug1    1.180529 3.627180e-01 3.627180e-01

0.5.3 Interpreting results

  • F_statistic: Ratio of between-dose to within-dose variance

  • P_value: Raw p-value from F-test

  • adjust_pval: FDR-adjusted p-value (Benjamini-Hochberg)

Proteins with adjust_pval < 0.05 show significant dose-dependent responses.

# Count significant interactions
n_total <- nrow(response_results)
n_significant <- sum(response_results$adjust_pval < 0.05)

cat("Total protein-drug pairs tested:", n_total, "\n")
#> Total protein-drug pairs tested: 3
cat("Significant interactions (FDR < 0.05):", n_significant, "\n")
#> Significant interactions (FDR < 0.05): 2
cat("Percentage significant:", round(100 * n_significant/n_total, 1), "%\n")
#> Percentage significant: 66.7 %

# Top hits
top_hits <- response_results %>%
  filter(adjust_pval < 0.05) %>%
  arrange(adjust_pval) %>%
  head(5)

print(top_hits)
#>           protein  drug  SSE_Full  SSE_Null F_statistic      P_value
#> Drug1.2 PROTEIN_B Drug1 1.0502856 120.28394    255.4312 0.000000e+00
#> Drug1.1 PROTEIN_A Drug1 0.7482337  26.17827     76.4702 2.590261e-12
#>          adjust_pval
#> Drug1.2 0.000000e+00
#> Drug1.1 3.885392e-12

0.6 IC50 estimation

0.6.1 Understanding IC50 calculation

IC50 represents the dose at which the response is reduced to 50% of the control (DMSO) level. MSstatsResponse estimates IC50 by:

  1. Fitting isotonic regression on the ratio scale (response relative to DMSO)

  2. Finding the dose where fitted response = 0.5

  3. Computing confidence intervals via bootstrap resampling

0.6.2 Running predictIC50()

# Estimate IC50 with bootstrap confidence intervals
ic50_predictions <- predictIC50(
  data = dia_prepared,
  ratio_response = TRUE,     # Use ratio scale for IC50
  transform_dose = TRUE,     # Log-transform doses
  increasing = FALSE,        # Decreasing response expected
  bootstrap = TRUE,          # Compute confidence intervals
  n_samples = 1000,         # Number of bootstrap samples
  alpha = 0.10              # 90% confidence intervals
)

# View IC50 estimates
ic50_predictions %>%
  arrange(IC50) %>%
  head(10)
#>      protein   drug    IC50 IC50_lower_bound IC50_upper_bound
#>       <fctr> <char>   <num>            <num>            <num>
#> 1: PROTEIN_A  Drug1 7.47197         7.887871         7.211512
#> 2: PROTEIN_B  Drug1 8.19419         8.520550         7.969620
#> 3: PROTEIN_C  Drug1      NA               NA               NA

0.6.3 Parallel processing for large datasets

For datasets with many proteins, use parallel processing:

# Register a BiocParallel backend
library(BiocParallel)
if (.Platform$OS.type == "windows") {
  register(SnowParam(workers = 4, type = "SOCK"))
} else {
  register(MulticoreParam(workers = 4))
}

# Parallel IC50 estimation (faster for large datasets)
ic50_parallel <- predictIC50(
  data = dia_prepared,
  ratio_response = TRUE,
  transform_dose = TRUE,
  bootstrap = TRUE,
  n_samples = 1000,
  BPPARAM = bpparam()  # use 4 CPU cores defined above
)

0.7 Visualization

0.7.1 Individual protein dose-response curves

The visualizeResponseProtein() function creates publication-quality dose-response plots:

# Visualize strong responder
visualizeResponseProtein(
  data = dia_prepared,
  protein_name = "PROTEIN_A",
  drug_name = "Drug1",
  ratio_response = TRUE,
  show_ic50 = TRUE,
  add_ci = TRUE,
  transform_dose = TRUE,
  n_samples = 1000
)
#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
#> ℹ Please use `linewidth` instead.
#> ℹ The deprecated feature was likely used in the MSstatsResponse package.
#>   Please report the issue at
#>   <https://github.com/Vitek-Lab/MSstatsResponse/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.

# Visualize another target
visualizeResponseProtein(
  data = dia_prepared,
  protein_name = "PROTEIN_B",
  drug_name = "Drug1",
  ratio_response = TRUE,
  show_ic50 = TRUE,
  add_ci = TRUE
)

0.7.2 Comparing log2 vs ratio scale visualization

# Log2 scale (left panel)
p1 <- visualizeResponseProtein(
  data = dia_prepared,
  protein_name = "PROTEIN_A",
  drug_name = "Drug1",
  ratio_response = FALSE,  # Log2 scale
  show_ic50 = TRUE,
  add_ci = FALSE
)

# Ratio scale (right panel)  
p2 <- visualizeResponseProtein(
  data = dia_prepared,
  protein_name = "PROTEIN_A",
  drug_name = "Drug1",
  ratio_response = TRUE,   # Ratio scale
  show_ic50 = TRUE,
  add_ci = FALSE
)

# Combine plots (requires gridExtra)
 gridExtra::grid.arrange(p1, p2, ncol = 2)

0.8 Experimental design planning

0.8.1 Simulating future experiments

The futureExperimentSimulation() function helps optimize experimental design by simulating performance under different conditions:

0.8.2 Using your own data as templates

You can use proteins from your own experimental data as templates for more realistic simulations. This approach leverages your validated hits to model expected performance:

# First, identify proteins from your results to use as templates
# Run simulation using your data as templates

custom_simulation <- futureExperimentSimulation(
  N_proteins = 3000,
  N_rep = 2,
  N_Control_Rep = 3,
  Concentrations = c(0, 1, 3, 10, 30, 100, 300, 1000, 3000),
  data = dia_prepared,                          # Your data
  strong_proteins = 'PROTEIN_B',                # User defined strong responder
  weak_proteins = 'PROTEIN_A',                    # User defined weak responder 
  no_interaction_proteins = 'PROTEIN_C',          # User defined negative control 
  drug_name = "Drug1",                     # Specify which drug to model
  IC50_Prediction = FALSE
)
#> Templates successfully extracted from data
#>   Strong interaction proteins: PROTEIN_B
#>   Weak interaction proteins: PROTEIN_A
#>   No interaction proteins: PROTEIN_C

# Compare performance metrics
print(custom_simulation$Hit_Rates_Plot)


# Examine the templates that were extracted from your data
print(custom_simulation$Template_Used)
#> $strong_interaction
#>   dose Intensity LogIntensities      ratio
#> 1    0 42.256113      5.4010882 1.00000000
#> 2    1 38.863432      5.2803414 0.91971147
#> 3    3 26.873610      4.7481182 0.63596977
#> 4   10 22.303763      4.4792152 0.52782334
#> 5   30 17.327563      4.1149968 0.41006050
#> 6  100 10.777063      3.4298922 0.25504152
#> 7  300  1.186329      0.2465041 0.02807473
#> 8 1000  1.266790      0.3411772 0.02997885
#> 9 3000  1.177035      0.2351572 0.02785479
#> 
#> $weak_interaction
#>   dose Intensity LogIntensities     ratio
#> 1    0 38.914246       5.282227 1.0000000
#> 2    1 35.744368       5.159644 0.9185420
#> 3    3 37.951462       5.246084 0.9752588
#> 4   10 30.537135       4.932493 0.7847289
#> 5   30 23.454666       4.551803 0.6027270
#> 6  100 17.938304       4.164972 0.4609701
#> 7  300  9.453305       3.240819 0.2429266
#> 8 1000  7.395559       2.886659 0.1900476
#> 9 3000  6.878926       2.782183 0.1767714
#> 
#> $no_interaction
#>   dose Intensity LogIntensities     ratio
#> 1    0  59.16019       5.886555 1.0000000
#> 2    1  60.11579       5.909672 1.0161526
#> 3    3  56.23066       5.813285 0.9504813
#> 4   10  57.99096       5.857756 0.9802362
#> 5   30  57.62351       5.848586 0.9740250
#> 6  100  57.80438       5.853107 0.9770823
#> 7  300  54.28170       5.762394 0.9175376
#> 8 1000  56.94126       5.831402 0.9624927
#> 9 3000  55.61656       5.797443 0.9401010

0.9 Session information

sessionInfo()
#> R version 4.5.1 Patched (2025-08-23 r88802)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] data.table_1.17.8      lubridate_1.9.4        forcats_1.0.1         
#>  [4] stringr_1.5.2          purrr_1.1.0            readr_2.1.5           
#>  [7] tidyr_1.3.1            tibble_3.3.0           ggplot2_4.0.0         
#> [10] tidyverse_2.0.0        dplyr_1.1.4            MSstatsResponse_0.99.2
#> [13] BiocStyle_2.37.1      
#> 
#> loaded via a namespace (and not attached):
#>  [1] sass_0.4.10         generics_0.1.4      stringi_1.8.7      
#>  [4] hms_1.1.4           digest_0.6.37       magrittr_2.0.4     
#>  [7] evaluate_1.0.5      grid_4.5.1          timechange_0.3.0   
#> [10] RColorBrewer_1.1-3  bookdown_0.45       fastmap_1.2.0      
#> [13] jsonlite_2.0.0      tinytex_0.57        gridExtra_2.3      
#> [16] BiocManager_1.30.26 scales_1.4.0        codetools_0.2-20   
#> [19] jquerylib_0.1.4     cli_3.6.5           crayon_1.5.3       
#> [22] rlang_1.1.6         withr_3.0.2         cachem_1.1.0       
#> [25] yaml_2.3.10         tools_4.5.1         parallel_4.5.1     
#> [28] tzdb_0.5.0          BiocParallel_1.43.4 vctrs_0.6.5        
#> [31] R6_2.6.1            magick_2.9.0        lifecycle_1.0.4    
#> [34] pkgconfig_2.0.3     pillar_1.11.1       bslib_0.9.0        
#> [37] gtable_0.3.6        Rcpp_1.1.0          glue_1.8.0         
#> [40] xfun_0.53           tidyselect_1.2.1    knitr_1.50         
#> [43] dichromat_2.0-0.1   farver_2.1.2        htmltools_0.5.8.1  
#> [46] labeling_0.4.3      rmarkdown_2.30      compiler_4.5.1     
#> [49] S7_0.2.0

0.10 References

  1. Szvetecz, S., et al. (2025).MSstatsResponse: Semi-parametric statistical model enhances drug-protein interaction detection and IC50 estimation in chemoproteomics experiments. Manuscript in preparation.

  2. Choi, M., et al. (2014). MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments. Bioinformatics, 30(17), 2524-2526.