The MSstatsConvert package is a member of the MSstats family of packages, which includes MSstats and MSstatsTMT. It
creates an abstraction for the steps in mass spectrometry (MS) data
analysis that are required before a dataset can be used for statistical
modeling. In short, the package is responsible for converting output
from signal processing tools such as OpenMS or MaxQuant into a format suitable for statistical analysis.
This includes importing and cleaning data, preprocessing, and creating a balanced design.
MSstatsConvert allows for transforming any MS quantification result into a format required by the MSstats and MSstatsTMT packages. Additionally, it provides built-in cleaning functions for outputs of DIAUmpire, MaxQuant, OpenMS, OpenSWATH, Progenesis, ProteomeDiscoverer, Skyline, SpectroMine, and Spectronaut. These functions serve as a base for converter functions (called *toMSstatsFormat or *toMSstatsTMTFormat) provided by the MSstats and MSstatsTMT packages.
MSstats family packages work with label-free, SRM and TMT datasets. The following columns are required:
- ProteinName: column that indicates a protein ID. If the analysis is to be made at the peptide level, the column should store peptide IDs. Summarization performed by the MSstats and MSstatsTMT packages is done separately for each ID in this column,
- PeptideSequence, PrecursorCharge, FragmentIon and ProductCharge: these four columns define a spectral feature (a transition in the SRM case). If information for any of the columns is not available, it should be set to a constant value (for example NA),
- IsotopeLabelType: column that indicates whether the measurement is based on an endogenous peptide (indicated by value “L” or “light”) or a reference peptide (indicated by value “H” or “heavy”),
- Run: column that stores IDs of mass spectrometry runs. If annotation describing biological conditions and replicates is provided via a separate table, the run IDs should match Run IDs in the annotation,
- Condition: column that stores labels for biological conditions (groups of interest). For time-course experiments, this column will represent time points. If the experimental design includes both time points and distinct biological subjects, these labels should be a combination of subject and time point,
- BioReplicate: this column should contain a unique identifier for each biological replicate in the experiment. For example, in a clinical proteomic investigation this should be a unique patient ID. Patients from distinct groups (indicated by the Condition column) should have distinct IDs,
- Intensity: column that stores untransformed (in particular, not log-transformed) measurements of feature abundance in a given Run (and Channel in the TMT case). They can be peak heights, peak areas under the curve, or other quantitative representations of feature abundance.
For TMT data, the Channel column is also required. Similarly to the Run column, values in this column must correspond to values in the annotation file, if provided separately. Additionally, if the experiment involves fractionation, a Fraction column can be added to store fraction labels.
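The column requirements above can be illustrated with a small base-R sketch for a label-free experiment. All values are made up, and the helper missing_required_columns is hypothetical (it is not part of MSstatsConvert); it only reports which of the listed columns are absent.

```r
# Columns required for label-free / SRM data, as listed above.
required_columns <- c("ProteinName", "PeptideSequence", "PrecursorCharge",
                      "FragmentIon", "ProductCharge", "IsotopeLabelType",
                      "Run", "Condition", "BioReplicate", "Intensity")

# Toy dataset: one protein, one peptide, two runs (all values are made up).
toy_input <- data.frame(
  ProteinName = "P1",
  PeptideSequence = "AEAPAAAPAAK",
  PrecursorCharge = 2L,
  FragmentIon = NA,          # not available, so set to a constant value
  ProductCharge = NA,        # not available, so set to a constant value
  IsotopeLabelType = "L",    # endogenous (light) peptide
  Run = c("run_1", "run_2"),
  Condition = c("control", "treatment"),
  BioReplicate = c(1L, 2L),
  Intensity = c(4023100, 5132500),  # untransformed intensities
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of required columns absent from a dataset.
missing_required_columns <- function(dataset, required = required_columns) {
  setdiff(required, colnames(dataset))
}

missing_required_columns(toy_input)
missing_required_columns(toy_input[, c("ProteinName", "Run")])
```

The first call returns an empty vector because every listed column is present; the second lists the columns that would still have to be added.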
MSstatsConvert allows for flexible logging based on the log4r package. Information about preprocessing steps can be written to a file, to a console, to both or to neither. The MSstatsLogsSettings function helps manage log settings. The user can pass a path to an existing file to the log_file_path parameter. Combined with setting append = TRUE, this allows writing all information related to a specific data analysis to a single file. If a user does not specify a file, a new file will be created automatically with a name starting with “MSstats_log”, followed by a timestamp.
library(MSstatsConvert)
# default - creates a new file
MSstatsLogsSettings(use_log_file = TRUE, append = FALSE)
# appends to an existing file
MSstatsLogsSettings(use_log_file = TRUE, append = TRUE,
log_file_path = "log_file.log")
# switches logging off
MSstatsLogsSettings(use_log_file = FALSE, append = FALSE)
# switches off logs and messages
MSstatsLogsSettings(use_log_file = FALSE, verbose = FALSE)
Additionally, session info generated by the utils::sessionInfo() function can be saved to a file with the MSstatsSaveSessionInfo function. By default, the output file name will start with “MSstats_session_info” and end with the current timestamp.
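As a rough illustration of what saving session information involves, the base-R sketch below writes the output of utils::sessionInfo() to a timestamped file. The helper save_session_info_sketch and its file name pattern are assumptions made for this example; this is not the implementation of MSstatsSaveSessionInfo.

```r
# Base-R sketch: write session information to a timestamped text file.
# The helper name and the file name pattern are hypothetical.
save_session_info_sketch <- function(directory = tempdir()) {
  timestamp <- format(Sys.time(), "%Y%m%d_%H%M%S")
  path <- file.path(directory,
                    paste0("session_info_sketch_", timestamp, ".txt"))
  # capture.output() turns the printed session info into character lines.
  writeLines(utils::capture.output(utils::sessionInfo()), path)
  path
}

session_file <- save_session_info_sketch()
head(readLines(session_file), 3)
```

Keeping such a file next to the log file makes it easier to reproduce an analysis with the same package versions.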
MS data processing by MSstatsConvert starts with importing and cleaning data. The MSstatsImport function produces a wrapper for possibly multiple files that may describe a single dataset. For example, MaxQuant output consists of two files, while OpenMS outputs just a single file.
maxquant_evidence = read.csv(system.file("tinytest/raw_data/MaxQuant/mq_ev.csv",
package = "MSstatsConvert"))
maxquant_proteins = read.csv(system.file("tinytest/raw_data/MaxQuant/mq_pg.csv",
package = "MSstatsConvert"))
maxquant_imported = MSstatsImport(list(evidence = maxquant_evidence,
protein_groups = maxquant_proteins),
type = "MSstats", tool = "MaxQuant")
is(maxquant_imported)
#> [1] "MSstatsMaxQuantFiles" "MSstatsInputFiles"
openms_input = read.csv(system.file(
"tinytest/raw_data/OpenMSTMT/openmstmt_input.csv",
package = "MSstatsConvert"
))
openms_imported = MSstatsImport(list(input = openms_input),
"MSstatsTMT", "OpenMS")
is(openms_imported)
#> [1] "MSstatsOpenMSFiles" "MSstatsInputFiles"
The getInputFile method allows the user to retrieve the files:
getInputFile(maxquant_imported, "evidence")[1:5, 1:5]
#> Sequence Length Modifications Modifiedsequence OxidationMProbabilities
#> <char> <int> <char> <char> <char>
#> 1: AEAPAAAPAAK 11 Unmodified _AEAPAAAPAAK_
#> 2: AEAPAAAPAAK 11 Unmodified _AEAPAAAPAAK_
#> 3: AEAPAAAPAAK 11 Unmodified _AEAPAAAPAAK_
#> 4: AEAPAAAPAAK 11 Unmodified _AEAPAAAPAAK_
#> 5: AEAPAAAPAAK 11 Unmodified _AEAPAAAPAAK_
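The wrapper idea (a parent class holding a named list of tables, plus a tool-specific subclass and an accessor) can be sketched with S4 classes from the methods package. The class and function names below are hypothetical and chosen for illustration; the actual internals of MSstatsImport and getInputFile may differ.

```r
library(methods)

# Parent class: a named list of input tables.
setClass("SketchInputFiles", representation(files = "list"))
# Tool-specific subclass, so that methods can dispatch on the tool.
setClass("SketchMaxQuantFiles", contains = "SketchInputFiles")

# Generic accessor that returns a single table by name.
setGeneric("sketchGetFile",
           function(input, name) standardGeneric("sketchGetFile"))
setMethod("sketchGetFile", "SketchInputFiles",
          function(input, name) input@files[[name]])

# Made-up tables standing in for the two MaxQuant files.
sketch_imported <- new(
  "SketchMaxQuantFiles",
  files = list(evidence = data.frame(Sequence = "AEAPAAAPAAK"),
               protein_groups = data.frame(id = 1L))
)

is(sketch_imported, "SketchInputFiles")
sketchGetFile(sketch_imported, "evidence")
```

Dispatching on a tool-specific subclass is what lets a single generic cleaning function behave differently for each signal processing tool.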
As a next step of the analysis, input files are combined into a single data.table with standardized column names by the MSstatsClean function. It is a generic function with built-in support for outputs of tools listed in the “Purpose of the MSstatsConvert package” section. The type parameter is equal to either MSstats or MSstatsTMT and indicates if the data comes from a labelled TMT experiment. For some datasets, MSstatsClean may require additional parameters described in the respective help files. For our example datasets, the following calls merge input files into a single table.
maxquant_cleaned = MSstatsClean(maxquant_imported, protein_id_col = "Proteins")
head(maxquant_cleaned)
#> ProteinName PeptideSequence Modifications PrecursorCharge
#> <char> <char> <char> <int>
#> 1: P06959 AEAPAAAPAAK Unmodified 2
#> 2: P06959 AEAPAAAPAAK Unmodified 2
#> 3: P06959 AEAPAAAPAAK Unmodified 2
#> 4: P06959 AEAPAAAPAAK Unmodified 2
#> 5: P06959 AEAPAAAPAAK Unmodified 2
#> 6: P06959 AEAPAAAPAAK Unmodified 2
#> Run Intensity Score
#> <char> <num> <num>
#> 1: 121219_S_CCES_01_01_LysC_Try_1to10_Mixt_1_1 4023100 76.332
#> 2: 121219_S_CCES_01_02_LysC_Try_1to10_Mixt_1_2 5132500 83.081
#> 3: 121219_S_CCES_01_03_LysC_Try_1to10_Mixt_1_3 2761600 104.430
#> 4: 121219_S_CCES_01_05_LysC_Try_1to10_Mixt_2_2 4091800 94.465
#> 5: 121219_S_CCES_01_06_LysC_Try_1to10_Mixt_2_3 4727000 88.596
#> 6: 121219_S_CCES_01_08_LysC_Try_1to10_Mixt_3_2 2258400 90.050
openms_cleaned = MSstatsClean(openms_imported)
head(openms_cleaned)
#> ProteinName PeptideSequence
#> <char> <char>
#> 1: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 2: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 3: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 4: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 5: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 6: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> PrecursorCharge
#> <int>
#> 1: 3
#> 2: 3
#> 3: 3
#> 4: 3
#> 5: 3
#> 6: 3
#> PSM Condition
#> <char> <char>
#> 1: .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3_4359.56536443198 Long_HF
#> 2: .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3_6190.04195694402 Long_HF
#> 3: .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3_4359.56536443198 Long_HF
#> 4: .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3_6190.04195694402 Long_HF
#> 5: .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3_6190.04195694402 Long_HF
#> 6: .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3_4359.56536443198 Long_HF
#> BioReplicate Run Channel Intensity Mixture TechRepMixture Fraction
#> <int> <char> <int> <num> <int> <char> <int>
#> 1: 21 3_3_3 1 NA 3 3_3 3
#> 2: 21 3_3_3 1 NA 3 3_3 3
#> 3: 24 3_3_3 4 NA 3 3_3 3
#> 4: 24 3_3_3 4 NA 3 3_3 3
#> 5: 26 3_3_3 6 NA 3 3_3 3
#> 6: 26 3_3_3 6 1820.072 3 3_3 3
If a user wants to use the MSstatsConvert package with data in a format that is not currently supported, it is enough to first re-format the data into a data.table with columns ProteinName, PeptideSequence, PrecursorCharge, FragmentIon, ProductCharge (with the latter two possibly equal to NA), Run and IsotopeLabelType (in case of non-TMT data) or Channel (in case of TMT data). Moreover, the dataset may include any column that will be used for filtering the dataset (for example a column that stores q-values). In our example, such additional columns are “Modifications” and “Score” from MaxQuant files.
Annotation columns should be called Condition and BioReplicate. For TMT data, Mixture and TechRepMixture columns may be added. Fractionation should be indicated by a Fraction column.
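A minimal sketch of such a re-formatting step for label-free data is shown below. The input table, its column names (protein, peptide, charge, raw_file, area, q_value) and its values are invented for illustration; only the target column names come from the description above. The conversion to data.table is performed only if that package is installed.

```r
# Hypothetical export from an unsupported tool (made-up names and values).
custom_output <- data.frame(
  protein = c("P1", "P1", "P2"),
  peptide = c("AEAPAAAPAAK", "AEAPAAAPAAK", "LGEHNIDVLEGNEQFINAAK"),
  charge = c(2L, 2L, 3L),
  raw_file = c("run_1", "run_2", "run_1"),
  area = c(4023100, 5132500, 2761600),
  q_value = c(0.001, 0.020, 0.004),
  stringsAsFactors = FALSE
)

# Map the tool-specific columns to the standardized column names.
custom_cleaned <- data.frame(
  ProteinName = custom_output$protein,
  PeptideSequence = custom_output$peptide,
  PrecursorCharge = custom_output$charge,
  FragmentIon = NA,                 # not reported by the hypothetical tool
  ProductCharge = NA,               # not reported by the hypothetical tool
  IsotopeLabelType = "L",           # label-free data: all peptides are light
  Run = custom_output$raw_file,
  Intensity = custom_output$area,
  QValue = custom_output$q_value,   # extra column kept for later filtering
  stringsAsFactors = FALSE
)

# The text above asks for a data.table; convert if the package is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  custom_cleaned <- data.table::as.data.table(custom_cleaned)
}
head(custom_cleaned)
```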
The goal of the MSstatsPreprocess function is to transform cleaned MS data into a format ready for statistical analysis with the MSstats or MSstatsTMT packages. This function accepts several parameters, and each corresponds to a preprocessing step:
- input is the dataset for preprocessing,
- annotation is a description of biological conditions and replicates associated with MS runs (and channels in the TMT case). If annotation is already included in the input, it should be equal to NULL. The annotation should be created by the MSstatsMakeAnnotation function,
- feature_columns is a vector of column names that will denote features,
- remove_shared_peptides is a logical parameter - if TRUE, shared peptides will be removed from the analysis. Currently, MSstats assumes that only unique peptides are used, and the presence of shared peptides may cause issues,
- remove_single_feature_proteins is a logical parameter that indicates if proteins that only have a single feature should be removed from the analysis (TRUE),
- feature_cleaning is a list that currently consists of two named elements. The remove_features_with_few_measurements element should be equal to TRUE or FALSE. In the first case, features that have fewer than three measurements across runs (or channels in a run for TMT data) will be removed. FALSE means that only features with no non-missing measurements will be removed. The summarize_multiple_psms element should be a function that will be used to aggregate multiple feature measurements within a single MS run,
- aggregate_isotopic is a logical parameter - TRUE means that isotopic peaks will be aggregated (currently only used for Skyline input),
- columns_to_fill is an optional named list with names corresponding to columns and values corresponding to values that will be used for these columns. For example, if the dataset is missing information about ProductCharge, such a column can be added by passing list(ProductCharge = NA) to this parameter,
- score_filtering, exact_filtering and pattern_filtering are optional parameters that can be used to perform data filtering.
An example is given below.
maxquant_annotation = read.csv(system.file(
"tinytest/raw_data/MaxQuant/annotation.csv",
package = "MSstatsConvert"
))
maxquant_annotation = MSstatsMakeAnnotation(maxquant_cleaned,
maxquant_annotation,
Run = "Rawfile")
m_filter = list(col_name = "PeptideSequence",
pattern = "M",
filter = TRUE,
drop_column = FALSE)
oxidation_filter = list(col_name = "Modifications",
pattern = "Oxidation",
filter = TRUE,
drop_column = TRUE)
feature_columns = c("PeptideSequence", "PrecursorCharge")
maxquant_processed = MSstatsPreprocess(
maxquant_cleaned,
maxquant_annotation,
feature_columns,
remove_shared_peptides = TRUE,
remove_single_feature_proteins = FALSE,
pattern_filtering = list(oxidation = oxidation_filter,
m = m_filter),
feature_cleaning = list(remove_features_with_few_measurements = TRUE,
summarize_multiple_psms = max),
columns_to_fill = list("FragmentIon" = NA,
"ProductCharge" = NA,
"IsotopeLabelType" = "L"))
head(maxquant_processed)
#> Run PeptideSequence PrecursorCharge
#> <char> <char> <int>
#> 1: 121219_S_CCES_01_01_LysC_Try_1to10_Mixt_1_1 AEAPAAAPAAK 2
#> 2: 121219_S_CCES_01_02_LysC_Try_1to10_Mixt_1_2 AEAPAAAPAAK 2
#> 3: 121219_S_CCES_01_03_LysC_Try_1to10_Mixt_1_3 AEAPAAAPAAK 2
#> 4: 121219_S_CCES_01_05_LysC_Try_1to10_Mixt_2_2 AEAPAAAPAAK 2
#> 5: 121219_S_CCES_01_06_LysC_Try_1to10_Mixt_2_3 AEAPAAAPAAK 2
#> 6: 121219_S_CCES_01_08_LysC_Try_1to10_Mixt_3_2 AEAPAAAPAAK 2
#> Intensity ProteinName Condition BioReplicate Experiment IsotopeLabelType
#> <num> <char> <int> <int> <char> <char>
#> 1: 4023100 P06959 1 1 1_1 L
#> 2: 5132500 P06959 1 1 1_2 L
#> 3: 2761600 P06959 1 1 1_3 L
#> 4: 4091800 P06959 2 2 2_2 L
#> 5: 4727000 P06959 2 2 2_3 L
#> 6: 2258400 P06959 3 3 3_2 L
#> FragmentIon ProductCharge
#> <lgcl> <lgcl>
#> 1: NA NA
#> 2: NA NA
#> 3: NA NA
#> 4: NA NA
#> 5: NA NA
#> 6: NA NA
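The two pattern filters passed in the call above (oxidation_filter and m_filter) remove rows whose column value matches a regular expression. The base-R sketch below mimics that behavior on a made-up table; apply_pattern_filter_sketch is a hypothetical helper written for illustration and is not the function used internally by MSstatsPreprocess.

```r
# Hypothetical helper: drop rows whose col_name matches any of the patterns,
# then optionally drop the column that was used for filtering.
apply_pattern_filter_sketch <- function(dataset, col_name, pattern,
                                        filter = TRUE, drop_column = FALSE) {
  if (filter) {
    combined <- paste(pattern, collapse = "|")
    matches <- grepl(combined, dataset[[col_name]])
    dataset <- dataset[!matches, , drop = FALSE]
  }
  if (drop_column) {
    dataset[[col_name]] <- NULL
  }
  dataset
}

# Made-up cleaned data with one oxidized, methionine-containing peptide.
pattern_toy <- data.frame(
  PeptideSequence = c("AEAPAAAPAAK", "AGMMTVR", "LGEHNK"),
  Modifications = c("Unmodified", "Oxidation (M)", "Unmodified"),
  Intensity = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Same settings as oxidation_filter and m_filter in the call above.
pattern_filtered <- apply_pattern_filter_sketch(pattern_toy, "Modifications",
                                                "Oxidation", drop_column = TRUE)
pattern_filtered <- apply_pattern_filter_sketch(pattern_filtered,
                                                "PeptideSequence", "M")
pattern_filtered
```

Only the two unmodified peptides without methionine remain, and the Modifications column is dropped because drop_column was TRUE for the first filter.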
# OpenMS - TMT data
feature_columns_tmt = c("PeptideSequence", "PrecursorCharge")
openms_processed = MSstatsPreprocess(
openms_cleaned,
NULL,
feature_columns_tmt,
remove_shared_peptides = TRUE,
remove_single_feature_proteins = TRUE,
feature_cleaning = list(remove_features_with_few_measurements = TRUE,
summarize_multiple_psms = max)
)
head(openms_processed)
#> ProteinName PeptideSequence
#> <char> <char>
#> 1: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 2: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 3: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 4: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 5: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 6: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> PrecursorCharge PSM Condition
#> <int> <char> <char>
#> 1: 3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3 Long_HF
#> 2: 3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3 Long_HF
#> 3: 3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3 Long_HF
#> 4: 3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3 Long_HF
#> 5: 3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3 Long_LF
#> 6: 3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3 Long_M
#> BioReplicate Run Channel Intensity Mixture TechRepMixture Fraction
#> <int> <char> <int> <num> <int> <char> <int>
#> 1: 21 3_3_3 1 NA 3 3_3 3
#> 2: 24 3_3_3 4 NA 3 3_3 3
#> 3: 26 3_3_3 6 1820.0721 3 3_3 3
#> 4: 28 3_3_3 8 445.7412 3 3_3 3
#> 5: 25 3_3_3 5 1580.9510 3 3_3 3
#> 6: 23 3_3_3 3 1508.3302 3 3_3 3
#> FragmentIon
#> <char>
#> 1: <NA>
#> 2: <NA>
#> 3: <NA>
#> 4: <NA>
#> 5: <NA>
#> 6: <NA>
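The feature_cleaning settings used in both calls above (summarize_multiple_psms = max and remove_features_with_few_measurements = TRUE) can be illustrated with a base-R sketch on made-up data. The column name Feature and all values are assumptions; this shows the idea of the two steps rather than the package's implementation.

```r
# Made-up measurements: pepA_2 is measured twice in run_1.
toy_psms <- data.frame(
  Feature = c("pepA_2", "pepA_2", "pepA_2", "pepA_2", "pepB_3"),
  Run = c("run_1", "run_1", "run_2", "run_3", "run_1"),
  Intensity = c(100, 250, 300, 150, 500),
  stringsAsFactors = FALSE
)

# Step 1: aggregate multiple measurements of a feature within a run with max().
summarized <- aggregate(Intensity ~ Feature + Run, data = toy_psms, FUN = max)

# Step 2: count non-missing measurements per feature and keep only features
# measured at least three times across runs.
counts <- table(summarized$Feature[!is.na(summarized$Intensity)])
kept_features <- names(counts)[counts >= 3]
cleaned <- summarized[summarized$Feature %in% kept_features, ]
cleaned
```

Here pepA_2 keeps the larger of its two run_1 intensities and survives with three measurements, while pepB_3 is removed because it was measured in a single run.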
Annotation is created via the MSstatsMakeAnnotation function. It takes the cleaned dataset and annotation file as input. Additionally, key-value pairs can be passed to this function to change column names (not including dots and other symbols) in the annotation from names given by values to names given by keys.
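The key-value convention (as in Run = "Rawfile" in the earlier call) can be sketched in base R as follows. The helper rename_annotation_sketch and the toy annotation are hypothetical; they only illustrate renaming the column named by the value to the name given by the key.

```r
# Hypothetical helper mimicking the key = "value" convention:
# the column currently named by the value is renamed to the key.
rename_annotation_sketch <- function(annotation, ...) {
  new_names <- list(...)
  for (key in names(new_names)) {
    old_name <- new_names[[key]]
    colnames(annotation)[colnames(annotation) == old_name] <- key
  }
  annotation
}

# Made-up annotation table that stores run IDs in a column called Rawfile.
toy_annotation <- data.frame(
  Rawfile = c("run_1", "run_2"),
  Condition = c("control", "treatment"),
  BioReplicate = c(1L, 2L),
  stringsAsFactors = FALSE
)

renamed <- rename_annotation_sketch(toy_annotation, Run = "Rawfile")
colnames(renamed)
```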
For programmatic applications and consistency of the interface, filtering is done with the help of lists.
For filtering based on numerical scores (for example q-value filtering), the list should consist of elements named
- score_column: name of a column that stores the score,
- score_threshold: value above or below which measurements should be kept,
- direction: if “greater”, values greater than score_threshold will be kept; if “smaller”, values smaller than score_threshold will be kept,
- behavior: if “remove”, rows not below/above the threshold will be removed; if “replace”, intensity in rows not below/above the threshold will be replaced by a given value,
- handle_na: if “keep”, NA in the score column will not be removed,
- fill_value: value that will be used if behavior = "replace",
- filter: if TRUE, filtering will be performed (can be used for conditional filtering),
- drop_column: if TRUE, the column that stored the score will be removed.
For example, to remove intensities smaller than 1, we could pass the following list to the score_filtering parameter:
list(
list(score_column = "Intensity", score_threshold = 1,
direction = "greater", behavior = "remove",
handle_na = "remove", fill_value = NA, filter = TRUE, drop_column = FALSE
)
)
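To make the roles of these elements concrete, the base-R sketch below applies a score filter to a made-up table. The function apply_score_filter_sketch is hypothetical and follows the element descriptions above; the exact handling of edge cases (for example missing scores) in MSstatsConvert may differ.

```r
# Hypothetical helper that follows the element descriptions given above.
apply_score_filter_sketch <- function(dataset, score_column, score_threshold,
                                      direction = "greater",
                                      behavior = "remove",
                                      handle_na = "keep", fill_value = NA,
                                      filter = TRUE, drop_column = FALSE) {
  if (filter) {
    scores <- dataset[[score_column]]
    keep <- if (direction == "greater") {
      scores > score_threshold
    } else {
      scores < score_threshold
    }
    # Rows with a missing score are kept only if handle_na is "keep".
    keep[is.na(scores)] <- (handle_na == "keep")
    if (behavior == "remove") {
      dataset <- dataset[keep, , drop = FALSE]
    } else {
      dataset$Intensity[!keep] <- fill_value
    }
  }
  if (drop_column) {
    dataset[[score_column]] <- NULL
  }
  dataset
}

# Made-up data: four features with intensities and q-values.
score_toy <- data.frame(Feature = c("a", "b", "c", "d"),
                        Intensity = c(0.5, 10, NA, 200),
                        QValue = c(0.001, 0.2, 0.01, NA),
                        stringsAsFactors = FALSE)

# Keep intensities greater than 1 and drop rows with missing intensity.
removed <- apply_score_filter_sketch(score_toy, "Intensity", 1,
                                     handle_na = "remove")
# Replace intensities with NA where the q-value is not smaller than 0.05.
replaced <- apply_score_filter_sketch(score_toy, "QValue", 0.05,
                                      direction = "smaller",
                                      behavior = "replace")
removed
replaced
```

With behavior = "remove" the table shrinks to features b and d, while behavior = "replace" keeps all four rows and sets the intensity of feature b to NA.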
For filtering based on exact values (for example, removing iRT proteins), the list should consist of elements named
- col_name: name of a column that stores values that may be removed,
- filter_symbols: vector of values - rows with these values in col_name will be removed or corresponding intensities will be replaced,
- behavior: if “remove”, rows that contain filter_symbols in col_name will be removed; if “replace”, intensity in rows that contain filter_symbols in col_name will be replaced by a given value,
- fill_value: value that will be used if behavior = "replace",
- filter: if TRUE, filtering will be performed (can be used for conditional filtering),
- drop_column: if TRUE, the column that stored the values for filtering will be removed.
For filtering based on patterns (for example, removing oxidation peptides), the list should consist of elements named
- col_name: name of a column that stores strings that will be searched for given patterns,
- pattern: vector of regular expressions - rows with matching values in col_name will be removed,
- filter: if TRUE, filtering will be performed (can be used for conditional filtering),
- drop_column: if TRUE, the column that stored the values for filtering will be removed.
Finally, after preprocessing, the MSstatsBalancedDesign
function can be applied to handle fractions and create a balanced design. For label-free and SRM data, this means that fractionation or technical replicates will be detected if this information is not provided. Features measured in multiple fractions (overlapped) will be assigned to a unique fraction. Then, the data will be adjusted so that within each fraction, every feature has a row for each run. If the intensity value is missing, it will be denoted by NA.
For TMT data, a unique fraction will be selected for each overlapped feature and the data will be adjusted so that within each run, every feature has a row for each channel. If the intensity is missing for a channel, it will be denoted by NA.
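The balancing step for label-free data with a single fraction can be sketched in base R: build every feature-by-run combination and join the measured intensities onto it, so that unmeasured pairs appear as rows with NA. The toy table and the column name Feature are assumptions; fraction detection and handling of overlapped features are not covered by this sketch.

```r
# Made-up processed data: pepB_3 was not measured in run_2.
toy_processed <- data.frame(
  Feature = c("pepA_2", "pepA_2", "pepB_3"),
  Run = c("run_1", "run_2", "run_1"),
  Intensity = c(100, 300, 500),
  stringsAsFactors = FALSE
)

# All feature-by-run combinations expected in a balanced design.
full_grid <- expand.grid(Feature = unique(toy_processed$Feature),
                         Run = unique(toy_processed$Run),
                         stringsAsFactors = FALSE)

# A left join onto the grid adds rows with NA intensity for missing pairs.
balanced <- merge(full_grid, toy_processed,
                  by = c("Feature", "Run"), all.x = TRUE)
balanced
```

This is why a balanced table can have more rows than the processed table it was built from: rows with missing intensities are made explicit.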
maxquant_balanced = MSstatsBalancedDesign(maxquant_processed, feature_columns)
head(maxquant_balanced)
#> ProteinName PeptideSequence PrecursorCharge FragmentIon ProductCharge
#> 1 P06959 AEAPAAAPAAK 2 NA NA
#> 2 P06959 AEAPAAAPAAK 2 NA NA
#> 3 P06959 AEAPAAAPAAK 2 NA NA
#> 4 P06959 AEAPAAAPAAK 2 NA NA
#> 5 P06959 AEAPAAAPAAK 2 NA NA
#> 6 P06959 AEAPAAAPAAK 2 NA NA
#> IsotopeLabelType Condition BioReplicate
#> 1 L 1 1
#> 2 L 1 1
#> 3 L 1 1
#> 4 L 2 2
#> 5 L 2 2
#> 6 L 2 2
#> Run Fraction Intensity
#> 1 121219_S_CCES_01_01_LysC_Try_1to10_Mixt_1_1 1 4023100
#> 2 121219_S_CCES_01_02_LysC_Try_1to10_Mixt_1_2 1 5132500
#> 3 121219_S_CCES_01_03_LysC_Try_1to10_Mixt_1_3 1 2761600
#> 4 121219_S_CCES_01_04_LysC_Try_1to10_Mixt_2_1 1 2932900
#> 5 121219_S_CCES_01_05_LysC_Try_1to10_Mixt_2_2 1 4091800
#> 6 121219_S_CCES_01_06_LysC_Try_1to10_Mixt_2_3 1 4727000
dim(maxquant_balanced)
#> [1] 690 11
dim(maxquant_processed)
#> [1] 625 14
openms_balanced = MSstatsBalancedDesign(openms_processed, feature_columns_tmt)
head(openms_balanced)
#> ProteinName PeptideSequence
#> 1 sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 2 sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 3 sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 4 sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 5 sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 6 sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> PrecursorCharge PSM Mixture
#> 1 3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3 3
#> 2 3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3 3
#> 3 3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3 3
#> 4 3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3 3
#> 5 3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3 3
#> 6 3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3 3
#> TechRepMixture Run Channel BioReplicate Condition Intensity
#> 1 3_3 3_3_3 1 21 Long_HF NA
#> 2 3_3 3_3_3 2 22 Norm 1068.580
#> 3 3_3 3_3_3 3 23 Long_M 1508.330
#> 4 3_3 3_3_3 4 24 Long_HF NA
#> 5 3_3 3_3_3 5 25 Long_LF 1580.951
#> 6 3_3 3_3_3 6 26 Long_HF 1820.072
dim(openms_balanced)
#> [1] 330 11
dim(openms_processed)
#> [1] 370 16
The MSstatsBalancedDesign output is a data.frame of class MSstatsValidated. Such a data.frame will be recognized by statistical processing functions from the MSstats and MSstatsTMT packages as a valid input, which will allow them to skip checks and transformations necessary to fit data into this format.
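How a downstream function might use this class information can be sketched with inherits(). The function prepare_input_sketch is hypothetical, and the class attribute is set by hand here purely for illustration; in practice the class is assigned by MSstatsBalancedDesign.

```r
# Hypothetical downstream function that skips checks for validated input.
prepare_input_sketch <- function(input) {
  if (inherits(input, "MSstatsValidated")) {
    return("validated input: checks skipped")
  }
  "unvalidated input: checks and conversion required"
}

plain_table <- data.frame(Run = "run_1", Intensity = 100)
# Setting the class manually is for illustration only.
validated_table <- structure(plain_table,
                             class = c("MSstatsValidated", "data.frame"))

prepare_input_sketch(plain_table)
prepare_input_sketch(validated_table)
```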