In this vignette we demonstrate the functionality of the miRcomp package. This package uses data from a dilution / mixture experiment to assess methods that estimate microRNA expression from qPCR amplification curves. Specifically, this package provides assessments of accuracy, precision, data quality, titration response, limit of detection, and complete features. Each of these is described in the following sections. To avoid any confusion due to naming conventions (expression estimates from amplification curves have been called Ct values, Crt values, and Cq values to name a few), we refer to the reported values as expression estimates or simply expression.
Life Technologies has designed a qPCR miRNA array panel to work within their TaqMan® OpenArray® system. The current version of the array has coverage for 754 human miRNAs and additional non-human miRNAs for normalization controls/spike-ins across two primer pools. For raw data analysis, LifeTechnologies uses a closed ExpressionSuite software package to analyze qPCR data. There are a number of previously developed open-source packages to analyze raw qPCR fluorescence data; however, unlike microarray and RNAseq data, raw qPCR data are typically analyzed using the manufacturer’s software. In fact, several open-source packages take expression values (e.g. Ct, Crt, or Cq values) as input, ignoring how these values were estimated. To assess the potential benefits of alternative expression measures, we developed miRcomp, a benchmark data set and R/Bioconductor package to facilitate the development of new algorithms to preprocess LifeTechnologies® OpenArray® miRNA data and to provide tools to integrate the data into other software packages from the Bioconductor tool set.
Two separate RNA pools were prepared by blending two tissues each: (1) kidney and placenta and (2) skeletal muscle and brain (frontal cortex). These sources of RNA were chosen based on prior analyses suggesting that the majority of microRNAs are expressed in at least one of these tissues and that several microRNAs are unique to each pool (e.g. miR-133a for skeletal muscle and the chromosome 19 miRNA cluster for placenta). We extracted RNA from formaldehyde-fixed, paraffin-embedded (FFPE) sections using the AllPrep DNA/RNA FFPE protocol (Qiagen). Mixtures of RNA were made by combining equal masses of kidney and placenta or skeletal muscle and frontal cortex RNA, respectively, and diluting to equal concentration of 3.3 ng/ul. 10 ng of RNA was used as the input for reverse transcription using the ‘A’ and ‘B’ primer pools and following the Life Technologies Open Array protocol modification for low-concentration, FFPE RNA. Separate reverse transcription and pre-amplification reactions were performed for the Life Technologies MegaPlex Pools ‘A’ and ‘B’ primer pools. Following pre-amplification, 30 ul from the ‘A’ and ‘B’ reactions for both pools were mixed with 570 ul of 0.1x TE. Further dilutions and combinations of the pools were then prepared according to the following design:
Each of the 10 unique mixture / dilution sample types was performed four times.
In the following, we will use two data sets included in the miRcomp package. The first was generated using the LifeTech ExpressionSuite software package. The second was generated using the default algorithm from the qpcR R package.
We first load the package and data.
## Load libraries
library('miRcomp')
data(lifetech)
data(qpcRdefault)
Each of the example data sets includes both an expression estimate (ct) and an assessment of the quality of each estimate (qc) for each of the 754 microRNAs and 40 samples. For the lifetech data set, the measure of quality is the AmpScore (a proprietary method of assessing the quality of an amplification curve). For the qpcRdefault data set, the measure of quality is the R2 value from the sigmoidal model fit to the amplification curve data.
str(lifetech)
## List of 2
## $ ct: num [1:754, 1:40] 28.2 19.8 11.9 15.7 23.7 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:754] "hsa-miR-409-5p_002331:A" "hsa-miR-424_000604:A" "hsa-miR-30b_000602:A" "hsa-miR-29a_002112:A" ...
## .. ..$ : chr [1:40] "KW1:1" "KW1:2" "KW1:3" "KW1:4" ...
## $ qc: num [1:754, 1:40] 1.29 1.33 1.42 1.45 1.17 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:754] "hsa-miR-409-5p_002331:A" "hsa-miR-424_000604:A" "hsa-miR-30b_000602:A" "hsa-miR-29a_002112:A" ...
## .. ..$ : chr [1:40] "KW1:1" "KW1:2" "KW1:3" "KW1:4" ...
str(qpcRdefault)
## List of 2
## $ ct: num [1:754, 1:40] 29.8 21.1 12.6 16.6 25.2 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:754] "hsa-miR-409-5p_002331:A" "hsa-miR-424_000604:A" "hsa-miR-30b_000602:A" "hsa-miR-29a_002112:A" ...
## .. ..$ : chr [1:40] "KW1:1" "KW1:2" "KW1:3" "KW1:4" ...
## $ qc: num [1:754, 1:40] 0.998 0.999 0.999 0.999 0.998 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:754] "hsa-miR-409-5p_002331:A" "hsa-miR-424_000604:A" "hsa-miR-30b_000602:A" "hsa-miR-29a_002112:A" ...
## .. ..$ : chr [1:40] "KW1:1" "KW1:2" "KW1:3" "KW1:4" ...
The colnames of the example data matrices correspond to the sample types shown in the design figure above. Each name starts with KW, followed by the sample type (1-10), then a colon and the replicate number (1-4).
colnames(lifetech$ct)
## [1] "KW1:1" "KW1:2" "KW1:3" "KW1:4" "KW2:1" "KW2:2" "KW2:3" "KW2:4"
## [9] "KW3:1" "KW3:2" "KW3:3" "KW3:4" "KW4:1" "KW4:2" "KW4:3" "KW4:4"
## [17] "KW5:1" "KW5:2" "KW5:3" "KW5:4" "KW6:1" "KW6:2" "KW6:3" "KW6:4"
## [25] "KW7:1" "KW7:2" "KW7:3" "KW7:4" "KW8:1" "KW8:2" "KW8:3" "KW8:4"
## [33] "KW9:1" "KW9:2" "KW9:3" "KW9:4" "KW10:1" "KW10:2" "KW10:3" "KW10:4"
We begin by examing the quality scores provided by each of the example methods. The qualityAssessment function allows one to examine the relationship between quality scores and expression estimates (plotType="scatter"
) or the distribution of quality scores across samples (plotType="boxplot"
).
qualityAssessment(lifetech, plotType="scatter", label1="LifeTech AmpScore")
qualityAssessment(lifetech, plotType="boxplot", label1="LifeTech AmpScore")
In addition to assessing a single method, one has the option to compare two methods by passing an optional second data object to the function. For the scatter plot, this results in plotting the quality metrics against each other. For the boxplots, the results are simply presented in a single figure. One also has the option to apply the complementary log-log transformation to the quality metrics prior to plotting (e.g. cloglog2=TRUE
). This is often useful for R2 quality metrics.
qualityAssessment(lifetech, object2=qpcRdefault, cloglog2=TRUE, plotType="scatter", label1="LifeTech AmpScore", label2="qpcR R-squared")
Finally, one can filter quality scores corresponding to NA expression estimates from the boxplots. This can be useful to focus on cases in which the expression estimates appear valid but may be of poor quality.
qualityAssessment(lifetech, plotType="boxplot", na.rm=TRUE, label1="LifeTech AmpScore")