- 1 Installation
- 2 Introduction
- 3 Gene filtering
- 4 ZINB-WaVE
- 5 t-SNE representation
- 6 Normalized values and deviance residuals
- 7 The `zinbFit` function
- 8 Differential Expression
- 9 Using `zinbwave` with Seurat
- 10 Working with large datasets
- 11 A note on performance and parallel computing
- 12 Session Info
- References

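The recommended way to install the `zinbwave` package is

```
# Install BiocManager only if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("zinbwave")
```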

Note that `zinbwave` requires R (>=3.4) and Bioconductor (>=3.6).

This vignette provides an introductory example on how to work with the `zinbwave` package, which implements the ZINB-WaVE method proposed in (Risso et al. 2018).

First, let’s load the packages and register serial computation with `BiocParallel`.

```
library(zinbwave)
library(scRNAseq)
library(matrixStats)
library(magrittr)
library(ggplot2)
library(biomaRt)
# Register BiocParallel Serial Execution
BiocParallel::register(BiocParallel::SerialParam())
```

ZINB-WaVE is a general and flexible model for the analysis of high-dimensional zero-inflated count data, such as those recorded in single-cell RNA-seq assays. Given \(n\) samples (typically, \(n\) single cells) and \(J\) features (typically, \(J\) genes) that can be counted for each sample, we denote with \(Y_{ij}\) the count of feature \(j\) (\(j=1,\ldots,J\)) for sample \(i\) (\(i=1,\ldots,n\)). To account for various technical and biological effects, typical of single-cell sequencing technologies, we model \(Y_{ij}\) as a random variable following a zero-inflated negative binomial (ZINB) distribution with parameters \(\mu_{ij}\), \(\theta_{ij}\), and \(\pi_{ij}\), and consider the following regression models for the parameters:

\[\begin{align} \label{eq:model1} \ln(\mu_{ij}) &= \left( X\beta_\mu + (V\gamma_\mu)^\top + W\alpha_\mu + O_\mu\right)_{ij}\,,\\ \label{eq:model2} \text{logit}(\pi_{ij}) &= \left(X\beta_\pi + (V\gamma_\pi)^\top + W\alpha_\pi + O_\pi\right)_{ij} \,, \\ \label{eq:model3} \ln(\theta_{ij}) &= \zeta_j \,, \end{align}\]

where the elements of the regression models are as follows.

- \(X\) is a known \(n \times M\) matrix corresponding to \(M\) cell-level covariates and \({\bf \beta}=(\beta_\mu,\beta_\pi)\) its associated \(M \times J\) matrices of regression parameters. \(X\) can typically include covariates that induce variation of interest, such as cell types, or covariates that induce unwanted variation, such as batch or quality control (QC) measures. By default, it includes only a constant column of ones, \({\bf 1}_n\), to account for gene-specific intercepts.
- \(V\) is a known \(J \times L\) matrix corresponding to \(J\) gene-level covariates, such as gene length or GC-content, and \({\bf \gamma} = (\gamma_\mu , \gamma_\pi)\) its associated \(L\times n\) matrices of regression parameters. By default, \(V\) only includes a constant column of ones, \({\bf 1}_J\), to account for cell-specific intercepts, such as size factors representing differences in library sizes.
- \(W\) is an unobserved \(n \times K\) matrix corresponding to \(K\) unknown cell-level covariates, which could be of “unwanted variation” or of interest (such as cell type), and \({\bf \alpha} = (\alpha_\mu,\alpha_{\pi})\) its associated \(K \times J\) matrices of regression parameters.
- \(O_\mu\) and \(O_\pi\) are known \(n \times J\) matrices of offsets.
- \(\zeta\in\mathbb{R}^J\) is a vector of gene-specific dispersion parameters on the log scale.
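To make the three regression equations concrete, the following base R sketch simulates counts from the default model (\(X = {\bf 1}_n\), \(V = {\bf 1}_J\), no \(W\), no offsets). It assumes the usual ZINB mixture (a point mass at zero with probability \(\pi\), otherwise a negative binomial with mean \(\mu\)) and that \(\theta\) plays the role of the `size` argument of `dnbinom()` and `rnbinom()`. The helper names `dzinb` and `rzinb` and all parameter values are invented for illustration; they are not part of the `zinbwave` API.

```r
# ZINB probability mass function: point mass at 0 mixed with NB(mu, theta)
dzinb <- function(y, mu, theta, pi) {
  pi * (y == 0) + (1 - pi) * dnbinom(y, size = theta, mu = mu)
}

# Draw ZINB counts: a Bernoulli(pi) indicator sets some NB draws to zero
rzinb <- function(n, mu, theta, pi) {
  dropout <- rbinom(n, size = 1, prob = pi)
  (1 - dropout) * rnbinom(n, size = theta, mu = mu)
}

set.seed(1)
n <- 50   # cells
J <- 20   # genes

beta_mu  <- rnorm(J, mean = 2)     # gene-specific intercepts (from X = 1_n)
gamma_mu <- rnorm(n, sd = 0.5)     # cell-specific intercepts (from V = 1_J)
beta_pi  <- rnorm(J, mean = -1)    # gene-specific intercepts for logit(pi)
zeta     <- rep(log(2), J)         # log dispersion, one value per gene

# n x J parameter matrices; entry (i, j) of log_mu is gamma_i + beta_j
log_mu   <- outer(gamma_mu, beta_mu, "+")
logit_pi <- matrix(beta_pi, nrow = n, ncol = J, byrow = TRUE)
theta    <- matrix(exp(zeta), nrow = n, ncol = J, byrow = TRUE)

# Simulated n x J count matrix (cells in rows, genes in columns)
Y <- matrix(rzinb(n * J, mu = exp(log_mu), theta = theta,
                  pi = plogis(logit_pi)),
            nrow = n, ncol = J)
mean(Y == 0)   # overall fraction of zeros
```

Note that this toy matrix has cells in rows, following the notation of the model, whereas `SummarizedExperiment` objects store genes in rows.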

To illustrate the methodology, we will make use of the Fluidigm C1 dataset of (Pollen et al. 2014). The data consist of 65 cells, each sequenced at high and low depth. The data are publicly available as part of the `scRNAseq` package, in the form of a `SummarizedExperiment` object.

```
data("fluidigm")
fluidigm
```

```
## class: SummarizedExperiment
## dim: 26255 130
## metadata(3): sample_info clusters which_qc
## assays(4): tophat_counts cufflinks_fpkm rsem_counts rsem_tpm
## rownames(26255): A1BG A1BG-AS1 ... ZZEF1 ZZZ3
## rowData names(0):
## colnames(130): SRR1275356 SRR1274090 ... SRR1275366 SRR1275261
## colData names(28): NREADS NALIGNED ... Cluster1 Cluster2
```

`table(colData(fluidigm)$Coverage_Type)`

```
##
## High Low
## 65 65
```

First, we filter out the lowly expressed genes, keeping only those genes that have more than 5 reads in more than 5 samples.

```
filter <- rowSums(assay(fluidigm)>5)>5
table(filter)
```

```
## filter
## FALSE TRUE
## 16127 10128
```

`fluidigm <- fluidigm[filter,]`

This leaves us with 10128 genes.
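Both comparisons in `rowSums(assay(fluidigm)>5)>5` are strict, which is easy to overlook. The toy matrix below (invented values, not the Fluidigm data) shows which genes such a rule keeps; the names `min_reads` and `min_samples` are introduced here only for readability.

```r
# Toy count matrix: 3 genes (rows) x 8 samples (columns)
counts <- rbind(
  geneA = c(6, 7, 9, 10, 12, 8, 0, 0),  # > 5 reads in 6 samples -> kept
  geneB = c(6, 7, 9, 10, 12, 0, 0, 0),  # > 5 reads in 5 samples -> dropped
  geneC = c(5, 5, 5, 5, 5, 5, 5, 5)     # never exceeds 5 reads  -> dropped
)

min_reads   <- 5
min_samples <- 5

# Strictly more than min_reads reads in strictly more than min_samples samples
keep <- rowSums(counts > min_reads) > min_samples
keep
```

Replacing both `>` with `>=` would implement an "at least 5 reads in at least 5 samples" rule instead, which would also keep `geneB` and `geneC` here.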

We next identify the 100 most variable genes, which will be the input of our ZINB-WaVE procedure. Although we apply ZINB-WaVE to only these genes primarily for computational reasons, it is generally a good idea to focus on a subset of highly-variable genes, in order to remove transcriptional noise and focus on the more biologically meaningful signals. However, at least 1,000 genes are probably needed for real analyses.

```
assay(fluidigm) %>% log1p %>% rowVars -> vars
names(vars) <- rownames(fluidigm)
vars <- sort(vars, decreasing = TRUE)
head(vars)
```

```
## IGFBPL1 STMN2 EGR1 ANP32E CENPF LDHA
## 13.06109 12.24748 11.90608 11.67819 10.83797 10.72307
```

`fluidigm <- fluidigm[names(vars)[1:100],]`
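The pipeline above relies on `magrittr` pipes, right-assignment (`-> vars`), and `matrixStats::rowVars`. For readers who prefer to see the same selection spelled out, here is a base R sketch on simulated counts; the matrix and the choice of 10 genes are assumptions made for illustration.

```r
set.seed(42)
# Toy counts: 200 genes (rows) x 30 cells (columns), simulated
counts <- matrix(rnbinom(200 * 30, size = 1, mu = 20), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), NULL))

# Per-gene variance of log(1 + count)
vars <- apply(log1p(counts), 1, var)

# Keep the 10 most variable genes
top_genes  <- names(sort(vars, decreasing = TRUE))[1:10]
counts_top <- counts[top_genes, , drop = FALSE]
dim(counts_top)
```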

Before proceeding, we rename the first assay of `fluidigm` to “counts”, so that we do not need to specify which assay to use in the `zinbwave` workflow. This is an optional step.

`assayNames(fluidigm)[1] <- "counts"`

The easiest way to obtain the low-dimensional representation of the data with ZINB-WaVE is to use the `zinbwave` function. This function takes as input a `SummarizedExperiment` object and returns a `SingleCellExperiment` object.

`fluidigm_zinb <- zinbwave(fluidigm, K = 2, epsilon=1000)`

By default, the `zinbwave` function fits a ZINB model with \(X = {\bf 1}_n\) and \(V = {\bf 1}_J\). In this case, the model is a factor model akin to principal component analysis (PCA), where \(W\) is a factor matrix and \(\alpha_\mu\) and \(\alpha_\pi\) are loading matrices. By default, the `epsilon` parameter is set to the number of genes. We empirically found that a high `epsilon` is often required to obtain a good low-dimensional representation. See `?zinbModel` for details. Here we set `epsilon=1000`.

The parameter \(K\) controls how many latent variables we want to infer from the data. \(W\) is stored in the `reducedDim` slot of the object. (See the `SingleCellExperiment` vignette for details).

In this case, as we specified \(K=2\), we can visualize the resulting \(W\) matrix in a simple plot, color-coded by cell-type.

```
W <- reducedDim(fluidigm_zinb)
data.frame(W, bio=colData(fluidigm)$Biological_Condition,
           coverage=colData(fluidigm)$Coverage_Type) %>%
    ggplot(aes(W1, W2, colour=bio, shape=coverage)) + geom_point() +
    scale_color_brewer(type = "qual", palette = "Set1") + theme_classic()
```

The ZINB-WaVE model is more general than PCA, allowing the inclusion of additional sample and gene-level covariates that might help to infer the unknown factors.

Typically, one could include batch information as sample-level covariate, to account for batch effects. Here, we illustrate this capability by including the coverage (high or low) as a sample-level covariate.

The column `Coverage_Type` in the `colData` of `fluidigm` contains the coverage information. We can specify a design matrix that includes an intercept and an indicator variable for the coverage, by using the formula interface of `zinbFit`.

`fluidigm_cov <- zinbwave(fluidigm, K=2, X="~Coverage_Type", epsilon=1000)`
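To see what "an intercept and an indicator variable" means in practice, the sketch below expands the same formula with base R's `model.matrix()` on a toy annotation table. We assume here that the formula interface builds \(X\) in the same spirit; the data frame `cell_info` is invented for illustration.

```r
# Toy cell-level annotation with the same column name as in the vignette
cell_info <- data.frame(
  Coverage_Type = factor(c("High", "Low", "High", "Low", "High", "Low"))
)

# The formula expands to an intercept column plus one 0/1 indicator column
X <- model.matrix(~ Coverage_Type, data = cell_info)
colnames(X)
X
```

The first factor level (`High`) is absorbed into the intercept, so the single indicator column marks the `Low` coverage cells.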

```
W <- reducedDim(fluidigm_cov)
data.frame(W, bio=colData(fluidigm)$Biological_Condition,
           coverage=colData(fluidigm)$Coverage_Type) %>%
    ggplot(aes(W1, W2, colour=bio, shape=coverage)) + geom_point() +
    scale_color_brewer(type = "qual", palette = "Set1") + theme_classic()
```

In this case, the inferred \(W\) matrix is essentially the same with or without covariates, indicating that the scaling factors included in the model (the \(\gamma\) parameters associated with the intercept of \(V\)) are enough to achieve a good low-dimensional representation of the data.

Analogously, we can include gene-level covariates, as columns of \(V\). Here, we illustrate this capability by including gene length and GC-content.

We use the `biomaRt` package to compute gene length and GC-content.

```
# Query Ensembl for gene coordinates and GC-content, by HGNC symbol
mart <- useMart("ensembl")
mart <- useDataset("hsapiens_gene_ensembl", mart = mart)
bm <- getBM(attributes=c('hgnc_symbol', 'start_position',
                         'end_position', 'percentage_gene_gc_content'),
            filters = 'hgnc_symbol',
            values = rownames(fluidigm),
            mart = mart)

# Gene length as end - start; average over symbols returned more than once
bm$length <- bm$end_position - bm$start_position
len <- tapply(bm$length, bm$hgnc_symbol, mean)
len <- len[rownames(fluidigm)]
gcc <- tapply(bm$percentage_gene_gc_content, bm$hgnc_symbol, mean)
gcc <- gcc[rownames(fluidigm)]
```

We then include the gene-level information as `rowData` in the `fluidigm` object.

`rowData(fluidigm) <- data.frame(gccontent = gcc, length = len)`

`fluidigm_gcc <- zinbwave(fluidigm, K=2, V="~gccontent + log(length)", epsilon=1000)`
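One practical caveat: indexing `len` and `gcc` by `rownames(fluidigm)` yields `NA` for any gene symbol that the Ensembl query did not return. The base R sketch below, on an invented annotation table, shows how missing values interact with a formula such as `~gccontent + log(length)` and one way to spot affected genes before fitting. Whether and how `zinbwave` itself handles such rows is not covered here.

```r
# Toy gene-level annotation; geneC mimics a symbol with no annotation
gene_info <- data.frame(
  gccontent = c(41.2, 55.0, NA, 47.8),
  length    = c(12000, 3500, NA, 80000),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

V <- model.matrix(~ gccontent + log(length), data = gene_info)

# With the default na.action, rows with missing values are dropped
nrow(gene_info)   # 4 genes
nrow(V)           # 3 rows: geneC is gone

# Identify genes with incomplete covariates
incomplete <- rownames(gene_info)[!complete.cases(gene_info)]
incomplete
```

A design matrix with fewer rows than genes no longer lines up with the count matrix, so it is worth checking `complete.cases()` on the gene-level covariates first.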

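A t-SNE representation of the data can be obtained by running the t-SNE algorithm on the low-dimensional matrix \(W\). Since \(W\) already has only \(K=2\) columns, we set `pca = FALSE` to skip the initial PCA step of `Rtsne`, so that cell-to-cell distances are computed directly in the reduced space.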

```
set.seed(93024)
library(Rtsne)
W <- reducedDim(fluidigm_cov)
tsne_data <- Rtsne(W, pca = FALSE, perplexity=10, max_iter=5000)
data.frame(Dim1=tsne_data$Y[,1], Dim2=tsne_data$Y[,2],
           bio=colData(fluidigm)$Biological_Condition,
           coverage=colData(fluidigm)$Coverage_Type) %>%
    ggplot(aes(Dim1, Dim2, colour=bio, shape=coverage)) + geom_point() +
    scale_color_brewer(type = "qual", palette = "Set1") + theme_classic()
```

Sometimes it is useful to have normalized values for visualization and residuals for model evaluation. Both quantities can be computed with the `zinbwave()` function.

```
fluidigm_norm <- zinbwave(fluidigm, K=2, epsilon=1000, normalizedValues=TRUE,
                          residuals = TRUE)
```

The `fluidigm_norm` object includes normalized values and residuals as additional `assays`.

`fluidigm_norm`

```
## class: SingleCellExperiment
## dim: 100 130
## metadata(3): sample_info clusters which_qc
## assays(7): counts cufflinks_fpkm ... residuals weights
## rownames(100): IGFBPL1 STMN2 ... SRSF7 FAM117B
## rowData names(0):
## colnames(130): SRR1275356 SRR1274090 ... SRR1275366 SRR1275261
## colData names(28): NREADS NALIGNED ... Cluster1 Cluster2
## reducedDimNames(1): zinbwave
## spikeNames(0):
```

The `zinbFit` function

The `zinbwave` function is a user-friendly function to obtain the low-dimensional representation of the data, and optionally the normalized values and residuals from the model.

However, it is sometimes useful to store all the parameter estimates and the value of the likelihood. The `zinbFit` function allows the user to create an object of class `zinbModel` that can be used to store all the parameter estimates and have greater control on the results.

`zinb <- zinbFit(fluidigm, K=2, epsilon=1000)`

As with `zinbwave`, by default, the `zinbFit` function fits a ZINB model with \(X = {\bf 1}_n\) and \(V = {\bf 1}_J\).

If a user has run `zinbFit` and wants to obtain normalized values or the low-dimensional representation of the data in a `SingleCellExperiment` format, they can pass the `zinbModel` object to `zinbwave` to avoid repeating all the computations.

`fluidigm_zinb <- zinbwave(fluidigm, fitted_model = zinb, K = 2, epsilon=1000)`

The `zinbwave` package can be used to compute observational weights to “unlock” bulk RNA-seq tools for single-cell applications, as illustrated in (Van den Berge et al. 2018).

Since version `1.1.5`, `zinbwave` computes the observational weights by default. See the man page of `zinbwave`. The weights are stored in an `assay` named `weights` and can be accessed with the following call.

`weights <- assay(fluidigm_zinb, "weights")`
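For intuition about what these weights represent, the sketch below computes, for given ZINB parameters, the posterior probability that a count was generated by the negative binomial component rather than by the zero-inflation component. This is our reading of the weight definition in Van den Berge et al. (2018); the function name `zinb_weight` and the parameter values are invented, and the code is not taken from the package.

```r
# Posterior probability that a count comes from the NB component:
# w = (1 - pi) * f_NB(y) / f_ZINB(y)
zinb_weight <- function(y, mu, theta, pi) {
  nb   <- (1 - pi) * dnbinom(y, size = theta, mu = mu)
  zinb <- pi * (y == 0) + nb
  nb / zinb
}

y <- c(0, 0, 3, 12)
w <- zinb_weight(y, mu = c(0.5, 20, 5, 10), theta = 2, pi = 0.4)
w
```

Positive counts always receive weight 1, since they cannot come from the point mass at zero. A zero receives a weight between 0 and 1, and that weight is smaller when the expected count is large (second entry) than when a zero is plausible under the negative binomial alone (first entry).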

Note that in this example, the value of the penalty parameter `epsilon` was set at `1000`, although we do not recommend this for differential expression analysis in real applications. Our evaluations have shown that a value of `epsilon=1e12` gives good performance across a range of datasets, although this number is still arbitrary. In general, values between `1e6` and `1e13` give the best performance.

Once we have the observational weights, we can use them in `edgeR` to perform differential expression. Specifically, we use a moderated F-test in which the denominator residual degrees of freedom are adjusted by the extent of zero inflation (see (Van den Berge et al. 2018) for details).

Here, we test the third coefficient of the design matrix (`coef = 3`), which, as the output header shows, is `Biological_ConditionGW21+3`, i.e., the GW21+3 cells compared to the reference level of `Biological_Condition`. Note that we start from only 100 genes for computational reasons, but in real analyses we would use all the expressed genes.

```
library(edgeR)
dge <- DGEList(assay(fluidigm_zinb))
dge <- calcNormFactors(dge)
design <- model.matrix(~Biological_Condition, data = colData(fluidigm))
dge$weights <- weights
dge <- estimateDisp(dge, design)
fit <- glmFit(dge, design)
lrt <- glmWeightedF(fit, coef = 3)
topTags(lrt)
```

```
## Coefficient: Biological_ConditionGW21+3
## logFC logCPM LR PValue padjFilter FDR
## VIM -4.768610 13.21770 47.43867 2.379117e-10 2.379117e-08 2.379117e-08
## FOS -5.314312 14.50176 37.39242 8.939724e-09 4.469862e-07 4.469862e-07
## USP47 -3.900572 13.37158 29.91778 2.217582e-07 7.391939e-06 7.391939e-06
## PTN -3.190159 13.22778 22.68374 4.692233e-06 1.173058e-04 1.173058e-04
## MIR100HG 2.388546 14.26683 18.25801 3.514981e-05 6.632650e-04 6.632650e-04
## NNAT -2.062983 13.60868 17.90571 3.979590e-05 6.632650e-04 6.632650e-04
## SPARC -3.202979 13.23879 16.08206 1.048017e-04 1.497168e-03 1.497168e-03
## SFRP1 -3.405943 13.01425 14.45820 2.439216e-04 3.049020e-03 3.049020e-03
## EGR1 -2.658648 14.93922 13.58181 3.193079e-04 3.547866e-03 3.547866e-03
## ST8SIA1 -3.338405 13.35883 12.64058 5.337976e-04 5.337976e-03 5.337976e-03
```

Analogously, we can use the weights in a `DESeq2` analysis by using observation-level weights in the parameter estimation steps. In this case, there is no need to pass the weights to `DESeq2` since they are already in the `weights` assay of the object.

```
library(DESeq2)
dds <- DESeqDataSet(fluidigm_zinb, design = ~ Biological_Condition)
dds <- DESeq(dds, sfType="poscounts", useT=TRUE, minmu=1e-6)
res <- lfcShrink(dds, contrast=c("Biological_Condition", "NPC", "GW16"),
                 type = "normal")
head(res)
```

```
## log2 fold change (MAP): Biological_Condition NPC vs GW16
## Wald test p-value: Biological Condition NPC vs GW16
## DataFrame with 6 rows and 6 columns
## baseMean log2FoldChange lfcSE
## <numeric> <numeric> <numeric>
## IGFBPL1 2054.40279669602 -8.29483319811805 0.705617529194458
## STMN2 2220.07799968962 -10.0742890437113 0.769583129573025
## EGR1 1342.46496311301 -6.85394192248203 0.662686351762081
## ANP32E 806.98309206076 1.99092259299406 0.502375293036893
## CENPF 255.632254758972 1.37109273580839 0.553765790981213
## LDHA 311.760456182291 2.36715758548987 0.57805888376252
## stat pvalue padj
## <numeric> <numeric> <numeric>
## IGFBPL1 -8.14347819117087 3.84081717918298e-16 2.02148272588578e-15
## STMN2 -9.37587523964712 6.86041862113494e-21 5.27724509318072e-20
## EGR1 -10.3437981974519 2.31543036873467e-18 1.44714398045917e-17
## ANP32E 3.96658859011395 0.000130777183441069 0.000297220871456976
## CENPF 3.04963487457752 0.00319085161183045 0.00580154838514627
## LDHA 4.91242199934622 4.25404936560141e-06 1.11175754712073e-05
```

Note that `DESeq2`’s default normalization procedure is based on geometric means of counts, which are zero for genes with at least one zero count. This greatly limits the number of genes that can be used for normalization in scRNA-seq applications. We therefore use the normalization method suggested in the `phyloseq` package, which calculates the geometric mean for a gene by only using its positive counts, so that genes with zero counts can still be used for normalization purposes. The `phyloseq` normalization procedure can be applied by setting the argument `sfType` equal to `"poscounts"` in `DESeq`.
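The difference between the two geometric means is easy to see on a single gene. The following base R sketch contrasts the standard definition with a positive-counts variant. It illustrates the idea only and is not `DESeq2`'s internal code; in particular, keeping the total number of samples in the denominator is an assumption made here, and the function names are invented.

```r
# Standard geometric mean: a single zero count forces the result to 0
geo_mean <- function(x) exp(mean(log(x)))

# Positive-counts variant: zeros are skipped in the sum of logs
# (assumption: the denominator remains the total number of samples)
geo_mean_pos <- function(x) {
  if (all(x == 0)) return(0)
  exp(sum(log(x[x > 0])) / length(x))
}

gene <- c(0, 4, 16, 0)
geo_mean(gene)       # 0, so this gene cannot contribute to size factors
geo_mean_pos(gene)   # exp((log(4) + log(16)) / 4) = 64^(1/4)
```

With the standard definition, a gene is usable for normalization only if it is detected in every cell, which is rare in scRNA-seq data.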

For UMI data, for which the expected counts may be very low, the likelihood ratio test implemented in `nbinomLRT` should be used. For other protocols (i.e., non-UMI), the Wald test in `nbinomWaldTest` can be used, with null distribution a t-distribution with degrees of freedom corrected by the observational weights. In both cases, we recommend the minimum expected count to be set to a small value (e.g., `minmu=1e-6`).

Using `zinbwave` with Seurat

The factors inferred in the `zinbwave` model can be added as one of the low dimensional data representations in the `Seurat` object, for instance to find subpopulations using Seurat’s cluster analysis method.

We first need to convert the `SingleCellExperiment` object into a `Seurat` object, using Seurat’s `as.Seurat` function.

Note that the following workflow has been tested with Seurat version 3.0.0.

Here we create a simple Seurat object with the raw data. Please refer to the Seurat vignettes for a typical analysis, which includes filtering, normalization, etc.

```
library(Seurat)
seu <- as.Seurat(x = fluidigm_zinb, counts = "counts", data = "counts")
```

Note that our `zinbwave` factors are automatically included in the Seurat object.

`seu`

```
## An object of class Seurat
## 100 features across 130 samples within 1 assay
## Active assay: RNA (100 features)
## 1 dimensional reduction calculated: zinbwave
```

Finally, we can use the `zinbwave` factors for cluster analysis.

```
seu <- FindNeighbors(seu, reduction = "zinbwave",
                     dims = 1:2 # this should match K
                     )
seu <- FindClusters(object = seu)
```

```
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 130
## Number of edges: 2461
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7213
## Number of communities: 4
## Elapsed time: 0 seconds
```

When working with large datasets, `zinbwave` can be computationally demanding. We provide an approximate strategy, implemented in the `zinbsurf` function, that uses only a random subset of the cells to infer the low dimensional space and subsequently projects all the cells into the inferred space.

```
fluidigm_surf <- zinbsurf(fluidigm, K = 2, epsilon = 1000,
                          prop_fit = 0.5)
W2 <- reducedDim(fluidigm_surf)
data.frame(W2, bio=colData(fluidigm)$Biological_Condition,
           coverage=colData(fluidigm)$Coverage_Type) %>%
    ggplot(aes(W1, W2, colour=bio, shape=coverage)) + geom_point() +
    scale_color_brewer(type = "qual", palette = "Set1") + theme_classic()
```
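The "fit on a subset, then project everything" idea can be illustrated with ordinary PCA in base R. The sketch below is only an analogy: it uses `prcomp()` rather than a ZINB model and does not reflect how `zinbsurf` is implemented. The simulated matrix is an assumption, and the variable `prop_fit` merely borrows the name of the `zinbsurf` argument.

```r
set.seed(7)
# Toy log-expression matrix: 300 cells (rows) x 40 genes (columns)
expr <- matrix(rnorm(300 * 40), nrow = 300)

prop_fit <- 0.1   # fraction of cells used to learn the space
fit_idx  <- sample(nrow(expr), size = ceiling(prop_fit * nrow(expr)))

# Step 1: learn the low-dimensional space from the subset only
pca_sub <- prcomp(expr[fit_idx, ], center = TRUE, scale. = FALSE)

# Step 2: project all cells into that space and keep K = 2 dimensions
W_all <- predict(pca_sub, newdata = expr)[, 1:2]
dim(W_all)
```

The expensive step (here `prcomp()`) only sees 30 of the 300 cells, while the projection of the remaining cells is a cheap matrix product.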

Note that here we use 50% of the data to get a reasonable approximation, since we start with only 130 cells. We found that for datasets with tens of thousands of cells, 10% (the default value) is usually a reasonable choice.

Note that this is an experimental feature and has not been thoroughly tested. Use at your own risk!

The `zinbwave` package uses the `BiocParallel` package to allow for parallel computing. Here, we used the `register` command to ensure that the vignette runs with serial computations.

However, on real datasets with many genes and/or many cells, parallel computation can speed up the analysis dramatically.

There are two ways of allowing parallel computations in `zinbwave`. The first is to `register()` a parallel back-end (see `?BiocParallel::register` for details). Alternatively, one can pass a `BPPARAM` object to `zinbwave` and `zinbFit`, e.g.

```
library(BiocParallel)
zinb_res <- zinbwave(fluidigm, K=2, BPPARAM=MulticoreParam(2))
```

We found that `MulticoreParam()` may have some performance issues on Mac; hence, we recommend `DoparParam()` when working on Mac.

`sessionInfo()`

```
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.9-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.9-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] Seurat_3.0.0 DESeq2_1.24.0
## [3] edgeR_3.26.0 limma_3.40.0
## [5] Rtsne_0.15 biomaRt_2.40.0
## [7] ggplot2_3.1.1 magrittr_1.5
## [9] scRNAseq_1.9.0 zinbwave_1.6.0
## [11] SingleCellExperiment_1.6.0 SummarizedExperiment_1.14.0
## [13] DelayedArray_0.10.0 BiocParallel_1.18.0
## [15] matrixStats_0.54.0 Biobase_2.44.0
## [17] GenomicRanges_1.36.0 GenomeInfoDb_1.20.0
## [19] IRanges_2.18.0 S4Vectors_0.22.0
## [21] BiocGenerics_0.30.0 BiocStyle_2.12.0
##
## loaded via a namespace (and not attached):
## [1] backports_1.1.4 copula_0.999-19.1 Hmisc_4.2-0
## [4] plyr_1.8.4 igraph_1.2.4.1 lazyeval_0.2.2
## [7] splines_3.6.0 pspline_1.0-18 listenv_0.7.0
## [10] digest_0.6.18 foreach_1.4.4 htmltools_0.3.6
## [13] gdata_2.18.0 checkmate_1.9.1 memoise_1.1.0
## [16] cluster_2.0.9 ROCR_1.0-7 globals_0.12.4
## [19] annotate_1.62.0 R.utils_2.8.0 stabledist_0.7-1
## [22] prettyunits_1.0.2 colorspace_1.4-1 blob_1.1.1
## [25] ggrepel_0.8.0 xfun_0.6 dplyr_0.8.0.1
## [28] jsonlite_1.6 crayon_1.3.4 RCurl_1.95-4.12
## [31] genefilter_1.66.0 survival_2.44-1.1 zoo_1.8-5
## [34] iterators_1.0.10 ape_5.3 glue_1.3.1
## [37] gtable_0.3.0 zlibbioc_1.30.0 XVector_0.24.0
## [40] future.apply_1.2.0 scales_1.0.0 mvtnorm_1.0-10
## [43] DBI_1.0.0 bibtex_0.4.2 Rcpp_1.0.1
## [46] metap_1.1 viridisLite_0.3.0 xtable_1.8-4
## [49] progress_1.2.0 htmlTable_1.13.1 reticulate_1.12
## [52] rsvd_1.0.0 foreign_0.8-71 bit_1.1-14
## [55] SDMTools_1.1-221.1 Formula_1.2-3 tsne_0.1-3
## [58] glmnet_2.0-16 htmlwidgets_1.3 httr_1.4.0
## [61] gplots_3.0.1.1 RColorBrewer_1.1-2 acepack_1.4.1
## [64] ica_1.0-2 pkgconfig_2.0.2 XML_3.98-1.19
## [67] R.methodsS3_1.7.1 nnet_7.3-12 locfit_1.5-9.1
## [70] reshape2_1.4.3 tidyselect_0.2.5 labeling_0.3
## [73] rlang_0.3.4 softImpute_1.4 AnnotationDbi_1.46.0
## [76] munsell_0.5.0 tools_3.6.0 RSQLite_2.1.1
## [79] ggridges_0.5.1 evaluate_0.13 stringr_1.4.0
## [82] yaml_2.2.0 npsurv_0.4-0 knitr_1.22
## [85] bit64_0.9-7 fitdistrplus_1.0-14 caTools_1.17.1.2
## [88] purrr_0.3.2 RANN_2.6.1 pbapply_1.4-0
## [91] future_1.12.0 nlme_3.1-139 R.oo_1.22.0
## [94] compiler_3.6.0 rstudioapi_0.10 png_0.1-7
## [97] plotly_4.9.0 lsei_1.2-0 tibble_2.1.1
## [100] geneplotter_1.62.0 pcaPP_1.9-73 stringi_1.4.3
## [103] gsl_2.1-6 lattice_0.20-38 Matrix_1.2-17
## [106] pillar_1.3.1 ADGofTest_0.3 BiocManager_1.30.4
## [109] Rdpack_0.11-0 lmtest_0.9-37 data.table_1.12.2
## [112] cowplot_0.9.4 bitops_1.0-6 irlba_2.3.3
## [115] gbRd_0.4-11 R6_2.4.0 latticeExtra_0.6-28
## [118] bookdown_0.9 KernSmooth_2.23-15 gridExtra_2.3
## [121] codetools_0.2-16 MASS_7.3-51.4 gtools_3.8.1
## [124] assertthat_0.2.1 withr_2.1.2 sctransform_0.2.0
## [127] GenomeInfoDbData_1.2.1 hms_0.4.2 grid_3.6.0
## [130] rpart_4.1-15 tidyr_0.8.3 rmarkdown_1.12
## [133] numDeriv_2016.8-1 base64enc_0.1-3
```

Pollen, Alex A, Tomasz J Nowakowski, Joe Shuga, Xiaohui Wang, Anne A Leyrat, Jan H Lui, Nianzhen Li, et al. 2014. “Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex.” *Nature Biotechnology* 32 (10):1053–8.

Risso, D, F Perraudeau, S Gribkova, S Dudoit, and JP Vert. 2018. “A General and Flexible Method for Signal Extraction from Single-Cell RNA-Seq Data.” *Nature Communications* 9:284.

Van den Berge, Koen, Fanny Perraudeau, Charlotte Soneson, Michael I Love, Davide Risso, Jean-Philippe Vert, Mark D Robinson, Sandrine Dudoit, and Lieven Clement. 2018. “Observation Weights to Unlock Bulk RNA-Seq Tools for Zero Inflation and Single-Cell Applications.” *bioRxiv*. Cold Spring Harbor Laboratory, 250126.