1 Installation

To install the package, please use the following.

if (!requireNamespace("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("miQC")

2 Introduction

This vignette provides a basic example of how to run miQC, which allows users to perform cell-wise filtering of single-cell RNA-seq data for quality control. Single-cell RNA-seq data is very sensitive to tissue quality and choice of experimental workflow; it’s critical to ensure compromised cells and failed cell libraries are removed. A high proportion of reads mapping to mitochondrial DNA is one sign of a damaged cell, so most analyses will remove cells with mtRNA over a certain threshold, but those thresholds can be arbitrary and/or detrimentally stringent, especially for archived tumor tissues. miQC jointly models both the proportion of reads mapping to mtDNA genes and the number of detected genes with mixture models in a probabilistic framework to identify the low-quality cells in a given dataset.

For more information about the statistical background of miQC and for biological use cases, consult the miQC paper (Hippen et al. 2021).

You’ll need the following packages installed to run this tutorial:

suppressPackageStartupMessages({
    library(SingleCellExperiment)
    library(scRNAseq)
    library(scater)
    library(flexmix)
    library(splines)
    library(miQC)
})

2.1 Example data

To demonstrate how to run miQC on a single-cell RNA-seq dataset, we’ll use data from mouse brain cells, originating from an experiment by Zeisel et al (Zeisel et al. 2015), and available through the Bioconductor package scRNAseq.

sce <- ZeiselBrainData()
sce
## class: SingleCellExperiment 
## dim: 20006 3005 
## metadata(0):
## assays(1): counts
## rownames(20006): Tspan12 Tshz1 ... mt-Rnr1 mt-Nd4l
## rowData names(1): featureType
## colnames(3005): 1772071015_C02 1772071017_G12 ... 1772066098_A12
##   1772058148_F03
## colData names(9): tissue group # ... level1class level2class
## reducedDimNames(0):
## mainExpName: gene
## altExpNames(2): repeat ERCC

2.2 Scater preprocessing

In order to calculate the percent of reads in a cell that map to mitochondrial genes, we first need to establish which genes are mitochondrial. For genes listed as HGNC symbols, this is as simple as searching for genes starting with mt-. For other IDs, we recommend using a biomaRt query to map to chromosomal location and identify all mitochondrial genes.

mt_genes <- grepl("^mt-",  rownames(sce))
feature_ctrls <- list(mito = rownames(sce)[mt_genes])

feature_ctrls
## $mito
##  [1] "mt-Tl1"  "mt-Tn"   "mt-Tr"   "mt-Tq"   "mt-Ty"   "mt-Tk"   "mt-Ta"  
##  [8] "mt-Tf"   "mt-Tp"   "mt-Tc"   "mt-Td"   "mt-Tl2"  "mt-Te"   "mt-Tv"  
## [15] "mt-Tg"   "mt-Tt"   "mt-Tw"   "mt-Tm"   "mt-Ti"   "mt-Nd3"  "mt-Nd6" 
## [22] "mt-Nd4"  "mt-Atp6" "mt-Nd2"  "mt-Nd5"  "mt-Nd1"  "mt-Co3"  "mt-Cytb"
## [29] "mt-Atp8" "mt-Co2"  "mt-Co1"  "mt-Rnr2" "mt-Rnr1" "mt-Nd4l"

miQC is designed to be run with the Bioconductor package scater, which has a built-in function addPerCellQC to calculate basic QC metrics like number of unique genes detected per cell and total number of reads. When we pass in our list of mitochondrial genes, it will also calculate percent mitochondrial reads.

sce <- addPerCellQC(sce, subsets = feature_ctrls)
head(colData(sce))
## DataFrame with 6 rows and 21 columns
##                     tissue   group # total mRNA mol      well       sex
##                <character> <numeric>      <numeric> <numeric> <numeric>
## 1772071015_C02    sscortex         1          21580        11         1
## 1772071017_G12    sscortex         1          21748        95        -1
## 1772071017_A05    sscortex         1          31642        33        -1
## 1772071014_B06    sscortex         1          32916        42         1
## 1772067065_H06    sscortex         1          21531        48         1
## 1772071017_E02    sscortex         1          24799        13        -1
##                      age  diameter  level1class level2class       sum  detected
##                <numeric> <numeric>  <character> <character> <numeric> <integer>
## 1772071015_C02        21      0.00 interneurons       Int10     22354      4871
## 1772071017_G12        20      9.56 interneurons       Int10     22869      4712
## 1772071017_A05        20     11.10 interneurons        Int6     32594      6055
## 1772071014_B06        21     11.70 interneurons       Int10     33525      5852
## 1772067065_H06        25     11.00 interneurons        Int9     21694      4724
## 1772071017_E02        20     11.90 interneurons        Int9     25919      5427
##                subsets_mito_sum subsets_mito_detected subsets_mito_percent
##                       <numeric>             <integer>            <numeric>
## 1772071015_C02              774                    23             3.462468
## 1772071017_G12             1121                    27             4.901832
## 1772071017_A05              952                    27             2.920783
## 1772071014_B06              611                    28             1.822521
## 1772067065_H06              164                    23             0.755969
## 1772071017_E02             1122                    19             4.328871
##                altexps_repeat_sum altexps_repeat_detected
##                         <numeric>               <numeric>
## 1772071015_C02               8181                     419
## 1772071017_G12              11854                     480
## 1772071017_A05              18021                     582
## 1772071014_B06              13955                     512
## 1772067065_H06               6876                     363
## 1772071017_E02              17364                     618
##                altexps_repeat_percent altexps_ERCC_sum altexps_ERCC_detected
##                             <numeric>        <numeric>             <numeric>
## 1772071015_C02                21.9677             6706                    43
## 1772071017_G12                28.8012             6435                    46
## 1772071017_A05                31.6435             6335                    47
## 1772071014_B06                25.5999             7032                    43
## 1772067065_H06                19.9299             5931                    39
## 1772071017_E02                34.7600             6671                    43
##                altexps_ERCC_percent     total
##                           <numeric> <numeric>
## 1772071015_C02              18.0070     37241
## 1772071017_G12              15.6349     41158
## 1772071017_A05              11.1238     56950
## 1772071014_B06              12.8999     54512
## 1772067065_H06              17.1908     34501
## 1772071017_E02              13.3543     49954

3 miQC

3.1 Basic usage

Using the QC metrics generated via scater, we can use the plotMetrics function to visually inspect the quality of our dataset.

plotMetrics(sce)