scp data framework
Our data structure relies on two curated data classes:
QFeatures (Gatto (2020)) and SingleCellExperiment.
QFeatures is dedicated to the manipulation and processing of
MS-based quantitative data. It explicitly records the successive steps
to allow users to navigate up and down the different MS levels.
SingleCellExperiment is another class designed as an efficient data
container that serves as an interface to state-of-the-art methods and
algorithms for single-cell data. Our framework combines the two
classes to benefit from the advantages of both.
Because mass spectrometry (MS)-based single-cell proteomics (SCP) only
captures the proteome of between one and a few tens of single cells in
a single run, the data are usually acquired across many MS batches.
Therefore, the data for each run should conceptually be stored in its
own container, which we here call an assay. The expected input for
working with the scp package is quantification data of
peptide-to-spectrum matches (PSMs). These data can then be processed to
reconstruct peptide and protein data. The links between related features
across different assays are stored to facilitate manipulation and
visualization of PSM, peptide and protein data. This is
conceptually shown below.
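The idea of one assay per run, with stored links between feature levels, can also be sketched in a few lines of base R. This is only a conceptual illustration and not the QFeatures API: the object names (psms, psm_assays, peptide_assays, links), the toy intensities and the choice of the median as summary function are all assumptions made for the example.

```r
# Hypothetical PSM-level quantifications from two MS runs (all values invented).
psms <- data.frame(
  run       = c("run1", "run1", "run1", "run2", "run2"),
  psm       = c("PSM1", "PSM2", "PSM3", "PSM4", "PSM5"),
  peptide   = c("PEPA", "PEPA", "PEPB", "PEPA", "PEPB"),
  protein   = c("ProtX", "ProtX", "ProtX", "ProtX", "ProtX"),
  intensity = c(10, 14, 7, 9, 5)
)

# One container ("assay") per MS run.
psm_assays <- split(psms, psms$run)

# Reconstruct peptide-level data within each run.
# Assumption: PSMs of the same peptide are summarised by their median intensity.
peptide_assays <- lapply(psm_assays, function(assay) {
  aggregate(intensity ~ peptide + protein, data = assay, FUN = median)
})

# The PSM-to-peptide-to-protein links are kept in a separate table, so that
# related features can be traced across assays and levels.
links <- psms[, c("run", "psm", "peptide", "protein")]

# Peptide-level data reconstructed for the first run.
peptide_assays$run1
```

In the actual framework, the QFeatures class stores these assays and links in a single object, so the bookkeeping above does not need to be done by hand.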
There are two input tables required for starting an analysis with scp.
The input table is generated after the identification and
quantification of the MS spectra by pre-processing software such as
MaxQuant, ProteomeDiscoverer or MSFragger (the list
of available software is actually much longer). We will here use as an
example a data table that has been generated by MaxQuant. The table is
available from the scp package (MaxQuant generated SCP data).
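As a minimal sketch, the example table can be loaded and its dimensions checked as follows. This assumes that the scp package is installed and that the table is exported under the name mqScpData; the dataset name is an assumption and should be checked against the installed version of the package.

```r
# Load the package that ships the example data.
library(scp)

# Assumption: the MaxQuant example table is exported as "mqScpData".
data("mqScpData")

# Number of rows (quantified PSMs) and columns (MaxQuant data fields).
dim(mqScpData)
```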
#>  1361 149
In this toy example, there are 1361 rows corresponding to features (quantified PSMs) and 149 columns corresponding to different data fields recorded by MaxQuant during the processing of the MS spectra. There are three types of columns: