This vignette briefly recaps the main QFeatures concepts on which scp relies. More in-depth information can be found in the QFeatures vignettes.

1 The QFeatures class

The QFeatures class is based on the MultiAssayExperiment class, which holds a collection of SummarizedExperiment objects (or objects of classes that inherit from it), termed assays. The assays in a QFeatures object have a hierarchical relation: proteins are composed of peptides, which are themselves produced by spectra, as depicted in the figure below.

Those links are stored as part of the QFeatures object and connect the assays together. We load an example dataset from the scp package that is formatted as a QFeatures object and plot those connections.

library(scp)
data("scp1")
plot(scp1)
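
The links drawn by the plot can also be inspected programmatically. The following is a minimal sketch, assuming the assayLink() accessor from QFeatures and that the "proteins" assay listed later in this vignette carries a link to its parent assay; the printed content depends on the installed package version, so no output is shown.

```r
library(scp)   # also attaches QFeatures
data("scp1")

# Retrieve the link object describing where the "proteins" assay
# comes from (assumption: assayLink() takes the object and an assay name).
protein_link <- assayLink(scp1, "proteins")

# Printing the link shows the parent assay and the variable used to
# connect features between the two assays.
protein_link
```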

2 Accessing the data

The QFeatures class contains all the available quantitative data and metadata. Here we show how to retrieve these different pieces of information.

2.1 Quantitative data

The quantitative data, stored as matrix-like objects, can be accessed using the assay function. For example, we here extract the quantitative data for the first MS batch (and show a subset of it):

assay(scp1, "190321S_LCA10_X_FP97AG")[1:5, ]
#>          190321S_LCA10_X_FP97AG_RI1 190321S_LCA10_X_FP97AG_RI2
#> PSM3773                       57895                     603.73
#> PSM9078                       64889                    1481.30
#> PSM9858                       58993                     489.85
#> PSM11744                      75711                     539.02
#> PSM21752                          0                       0.00
#>          190321S_LCA10_X_FP97AG_RI3 190321S_LCA10_X_FP97AG_RI4
#> PSM3773                      2787.9                     757.17
#> PSM9078                      4891.6                     597.53
#> PSM9858                      2899.4                     882.37
#> PSM11744                     7292.7                     357.90
#> PSM21752                        0.0                       0.00
#>          190321S_LCA10_X_FP97AG_RI5 190321S_LCA10_X_FP97AG_RI6
#> PSM3773                      862.08                    1118.80
#> PSM9078                     1140.30                    1300.10
#> PSM9858                      296.60                     977.15
#> PSM11744                    1091.30                     736.87
#> PSM21752                       0.00                       0.00
#>          190321S_LCA10_X_FP97AG_RI7 190321S_LCA10_X_FP97AG_RI8
#> PSM3773                      640.10                    1446.10
#> PSM9078                     1092.50                    1309.40
#> PSM9858                      498.60                    1437.90
#> PSM11744                     712.74                     590.75
#> PSM21752                       0.00                       0.00
#>          190321S_LCA10_X_FP97AG_RI9 190321S_LCA10_X_FP97AG_RI10
#> PSM3773                      968.49                      648.56
#> PSM9078                     1538.40                     1014.50
#> PSM9858                      857.40                      888.01
#> PSM11744                   15623.00                      298.60
#> PSM21752                       0.00                        0.00
#>          190321S_LCA10_X_FP97AG_RI11
#> PSM3773                       742.53
#> PSM9078                      1062.80
#> PSM9858                       768.61
#> PSM11744                      481.38
#> PSM21752                        0.00

Note that you can retrieve the list of available assays in a QFeatures object using the names() function.

names(scp1)
#> [1] "190321S_LCA10_X_FP97AG"       "190222S_LCA9_X_FP94BM"       
#> [3] "190914S_LCB3_X_16plex_Set_21" "peptides"                    
#> [5] "proteins"
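
Combining names() and assay() gives a quick overview of the size of every assay. The sketch below uses only the two accessors shown above; the helper variable assay_dims is introduced here for illustration and is not part of either package.

```r
library(scp)   # also attaches QFeatures
data("scp1")

# For each assay name, extract the quantitative matrix and record its
# dimensions (number of features, number of samples).
assay_dims <- vapply(
    names(scp1),
    function(nm) dim(assay(scp1, nm)),
    integer(2)
)
rownames(assay_dims) <- c("features", "samples")

# One column per assay, one row per dimension.
assay_dims
```

Looping over names() rather than over hard-coded assay names keeps the code valid when assays are added to or removed from the object.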

2.2 Feature metadata

For each individual assay, there is feature metadata available. We extract the list of metadata tables by using rowData() on the QFeatures object.

rowData(scp1)
#> DataFrameList of length 5
#> names(5): 190321S_LCA10_X_FP97AG 190222S_LCA9_X_FP94BM 190914S_LCB3_X_16plex_Set_21 peptides proteins
rowData(scp1)[["proteins"]]
#> DataFrame with 292 rows and 9 columns
#>                                            protein Match.time.difference
#>                                        <character>             <logical>
#> A1A519                                      A1A519                    NA
#> A5D8V6                                      A5D8V6                    NA
#> A5PLK6                                      A5PLK6                    NA
#> A5PLL1                                      A5PLL1                    NA
#> A6NC97                                      A6NC97                    NA
#> ...                                            ...                   ...
#> REV__CON__ENSEMBL:ENSBTAP00000038253 REV__CON__...                    NA
#> REV__CON__P06868                     REV__CON__...                    NA
#> REV__CON__Q05443                     REV__CON__...                    NA
#> REV__CON__Q32PI4                     REV__CON__...                    NA
#> REV__CON__Q3MHN5                     REV__CON__...                    NA
#>                                      Match.m.z.difference Match.q.value
#>                                                 <logical>     <logical>
#> A1A519                                                 NA            NA
#> A5D8V6                                                 NA            NA
#> A5PLK6                                                 NA            NA
#> A5PLL1                                                 NA            NA
#> A6NC97                                                 NA            NA
#> ...                                                   ...           ...
#> REV__CON__ENSEMBL:ENSBTAP00000038253                   NA            NA
#> REV__CON__P06868                                       NA            NA
#> REV__CON__Q05443                                       NA            NA
#> REV__CON__Q32PI4                                       NA            NA
#> REV__CON__Q3MHN5                                       NA            NA
#>                                      Match.score Reporter.PIF Reporter.fraction
#>                                        <logical>    <logical>         <logical>
#> A1A519                                        NA           NA                NA
#> A5D8V6                                        NA           NA                NA
#> A5PLK6                                        NA           NA                NA
#> A5PLL1                                        NA           NA                NA
#> A6NC97                                        NA           NA                NA
#> ...                                          ...          ...               ...
#> REV__CON__ENSEMBL:ENSBTAP00000038253          NA           NA                NA
#> REV__CON__P06868                              NA           NA                NA
#> REV__CON__Q05443                              NA           NA                NA
#> REV__CON__Q32PI4                              NA           NA                NA
#> REV__CON__Q3MHN5                              NA           NA                NA
#>                                      Potential.contaminant        .n
#>                                                <character> <integer>
#> A1A519                                                             1
#> A5D8V6                                                             1
#> A5PLK6                                                             1
#> A5PLL1                                                             1
#> A6NC97                                                             1
#> ...                                                    ...       ...
#> REV__CON__ENSEMBL:ENSBTAP00000038253                     +         1
#> REV__CON__P06868                                         +         1
#> REV__CON__Q05443                                         +         1
#> REV__CON__Q32PI4                                         +         1
#> REV__CON__Q3MHN5                                         +         1

You can also retrieve the names of each rowData column for all assays with rowDataNames.

rowDataNames(scp1)
#> CharacterList of length 5
#> [["190321S_LCA10_X_FP97AG"]] uid Sequence ... peptide Leading.razor.protein
#> [["190222S_LCA9_X_FP94BM"]] uid Sequence ... peptide Leading.razor.protein
#> [["190914S_LCB3_X_16plex_Set_21"]] uid Sequence ... Leading.razor.protein
#> [["peptides"]] Sequence Length Modifications ... .n Leading.razor.protein
#> [["proteins"]] protein Match.time.difference ... Potential.contaminant .n
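
Once the column names are known, individual feature variables can be used to subset a metadata table. As a hedged illustration, the sketch below keeps the rows of the "proteins" table that are flagged with "+" in the Potential.contaminant column shown above; the variable names protein_rd and contaminants are hypothetical.

```r
library(scp)   # also attaches QFeatures
data("scp1")

# Feature metadata of the "proteins" assay, as a DataFrame.
protein_rd <- rowData(scp1)[["proteins"]]

# Rows flagged as potential contaminants; %in% treats any missing
# value as FALSE, so the logical index never contains NA.
is_contaminant <- protein_rd$Potential.contaminant %in% "+"
contaminants <- protein_rd[is_contaminant, ]

# Number of flagged proteins.
nrow(contaminants)
```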

You can also combine the rowData from several assays into a single table using the rbindRowData function. It keeps only the rowData variables that are common to all selected assays (provided through i).

rbindRowData(scp1, i = 1:5)
#> DataFrame with 1388 rows and 10 columns
#>              assay       rowname       protein Match.time.difference
#>        <character>   <character>   <character>             <logical>
#> 1    190321S_LC...       PSM3773        P61981                    NA
#> 2    190321S_LC...       PSM9078        Q8WVN8                    NA
#> 3    190321S_LC...       PSM9858        P55084                    NA
#> 4    190321S_LC...      PSM11744        P19099                    NA
#> 5    190321S_LC...      PSM21752        P52952                    NA
#> ...            ...           ...           ...                   ...
#> 1384      proteins REV__CON__... REV__CON__...                    NA
#> 1385      proteins REV__CON__... REV__CON__...                    NA
#> 1386      proteins REV__CON__... REV__CON__...                    NA
#> 1387      proteins REV__CON__... REV__CON__...                    NA
#> 1388      proteins REV__CON__... REV__CON__...                    NA
#>      Match.m.z.difference Match.q.value Match.score Reporter.PIF
#>                 <logical>     <logical>   <logical>    <logical>
#> 1                      NA            NA          NA           NA
#> 2                      NA            NA          NA           NA
#> 3                      NA            NA          NA           NA
#> 4                      NA            NA          NA           NA
#> 5                      NA            NA          NA           NA
#> ...                   ...           ...         ...          ...
#> 1384                   NA            NA          NA           NA
#> 1385                   NA            NA          NA           NA
#> 1386                   NA            NA          NA           NA
#> 1387                   NA            NA          NA           NA
#> 1388                   NA            NA          NA           NA
#>      Reporter.fraction Potential.contaminant
#>              <logical>           <character>
#> 1                   NA                      
#> 2                   NA                      
#> 3                   NA                      
#> 4                   NA                      
#> 5                   NA                      
#> ...                ...                   ...
#> 1384                NA                     +
#> 1385                NA                     +
#> 1386                NA                     +
#> 1387                NA                     +
#> 1388                NA                     +
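
The selection passed through i does not have to cover every assay. The sketch below restricts the combined table to two assays, assuming that i accepts assay names as well as positions; the variable pep_prot_rd is introduced for illustration only.

```r
library(scp)   # also attaches QFeatures
data("scp1")

# Combine the feature metadata of the peptide and protein assays only.
# Only variables present in both assays are retained.
pep_prot_rd <- rbindRowData(scp1, i = c("peptides", "proteins"))

# The "assay" column records which assay each row comes from.
table(pep_prot_rd$assay)
```

Because only shared variables are kept, selecting fewer assays can retain more columns than combining all five.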

2.3 Sample metadata

The sample metadata is retrieved using colData on the QFeatures object.

colData(scp1)
#> DataFrame with 38 rows and 7 columns
#>                                             Set     Channel SampleAnnotation
#>                                     <character> <character>      <character>
#> 190222S_LCA9_X_FP94BM_RI1         190222S_LC...         RI1    carrier_mi...
#> 190222S_LCA9_X_FP94BM_RI2         190222S_LC...         RI2             norm
#> 190222S_LCA9_X_FP94BM_RI3         190222S_LC...         RI3           unused
#> 190222S_LCA9_X_FP94BM_RI4         190222S_LC...         RI4             sc_u
#> 190222S_LCA9_X_FP94BM_RI5         190222S_LC...         RI5             sc_0
#> ...                                         ...         ...              ...
#> 190914S_LCB3_X_16plex_Set_21_RI12 190914S_LC...        RI12            sc_m0
#> 190914S_LCB3_X_16plex_Set_21_RI13 190914S_LC...        RI13            sc_m0
#> 190914S_LCB3_X_16