1 Introduction

In the single-cell world, which includes flow cytometry, mass cytometry, single-cell RNA-seq (scRNA-seq), and others, there is a need to improve data visualisation and to bring analysis capabilities to researchers, including those from non-technical backgrounds. scDataviz (Blighe 2020) attempts to fit into this space, while also catering for advanced users. Additionally, because scDataviz is built on SingleCellExperiment (Lun and Risso 2020), it has a ‘plug and play’ feel, and is flexible and compatible with studies that go beyond scDataviz. Finally, the graphics in scDataviz are generated via the ggplot2 (Wickham 2016) engine, which means that users can ‘add on’ features to these with ease.

This package provides additional functions for data visualisation and clustering, along with another way of identifying cell types in clusters. It is not strictly intended as a standalone analysis package. For a comprehensive high-dimensional cytometry workflow, see the work by Nowicka et al., CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets. For a more comprehensive scRNA-seq workflow, see OSCA and Analysis of single cell RNA-seq data.

2 Installation

2.1 1. Download the package from Bioconductor

  if (!requireNamespace('BiocManager', quietly = TRUE))
    install.packages('BiocManager')

  BiocManager::install('scDataviz')

Note: to install the development version:

  devtools::install_github('kevinblighe/scDataviz')

2.2 2. Load the package into your R session

  library(scDataviz)

3 Tutorial 1: CyTOF FCS data

Here, we will utilise some of the flow cytometry data from Deep phenotyping detects a pathological CD4+ T-cell complosome signature in systemic sclerosis.

This can normally be downloaded via git clone from your command prompt:

  git clone https://github.com/kevinblighe/scDataviz_data/

In a practical situation, we would normally read in this data from the raw FCS files and then QC filter, normalise, and transform them. This can be achieved via the processFCS function, which, by default, also removes variables based on low variance and randomly downsamples your data to 100000 cells. The user can change these behaviours via the downsampleVar and downsample parameters, respectively. An example (not run) is given below:

  # 'pattern' is a regular expression, not a shell glob
  filelist <- list.files(
    path = "scDataviz_data/FCS/",
    pattern = "\\.fcs$",
    ignore.case = TRUE,
    full.names = TRUE)
  filelist

  metadata <- data.frame(
    sample = gsub('\\ [A-Za-z0-9]*\\.fcs$', '',
      gsub('scDataviz_data\\/FCS\\/\\/', '', filelist)),
    group = c(rep('Healthy', 7), rep('Disease', 11)),
    treatment = gsub('\\.fcs$', '',
      gsub('scDataviz_data\\/FCS\\/\\/[A-Z0-9]*\\ ', '', filelist)),
    row.names = filelist,
    stringsAsFactors = FALSE)
  metadata

  inclusions <- c('Yb171Di','Nd144Di','Nd145Di',
    'Er168Di','Tm169Di','Sm154Di','Yb173Di','Yb174Di',
    'Lu175Di','Nd143Di')

  markernames <- c('Foxp3','C3aR','CD4',
    'CD46','CD25','CD3','Granzyme B','CD55',
    'CD279','CD45RA')

  names(markernames) <- inclusions
  markernames

  exclusions <- c('Time','Event_length','BCKG190Di',
    'Center','Offset','Width','Residual')

  sce <- processFCS(
    files = filelist,
    metadata = metadata,
    transformation = TRUE,
    transFun = function (x) asinh(x),
    asinhFactor = 5,
    downsample = 10000,
    downsampleVar = 0.7,
    colsRetain = inclusions,
    colsDiscard = exclusions,
    newColnames = markernames)
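
The two numeric steps named in the call above, the arcsinh transformation and the random downsampling of cells, can be illustrated in base R without any FCS files. This is only a sketch on a toy matrix, not the internals of processFCS: it assumes the cofactor is applied by division before asinh (the usual convention in cytometry), and the object names raw, cofactor, and n_keep are hypothetical.

```r
set.seed(1)  # make the random downsampling reproducible

# Toy intensity matrix: 1000 hypothetical cells (rows) x 3 markers (columns)
raw <- matrix(rexp(3000, rate = 0.01), ncol = 3,
  dimnames = list(NULL, c('CD3', 'CD4', 'CD25')))

# Arcsinh transformation with a cofactor of 5 (assumed: divide, then asinh)
cofactor <- 5
transformed <- asinh(raw / cofactor)

# Randomly retain n_keep cells, sampling row indices without replacement
n_keep <- 100
idx <- sample(nrow(transformed), n_keep)
downsampled <- transformed[idx, , drop = FALSE]

dim(downsampled)
```

Setting a seed before downsampling is worth doing in practice, because otherwise a different subset of cells is drawn on every run.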

In flow and mass cytometry, getting the correct marker names in the FCS files can be surprisingly difficult. In many cases, from experience, a facility may label the markers by their metals, such as iridium (Ir), ruthenium (Ru), terbium (Tb), et cetera; this is the case for the data used in this tutorial. The true marker names may be held as pData encoded within each FCS file, accessible via:

  library(flowCore)
  pData(parameters(
    read.FCS(filelist[[4]], transformation = FALSE, emptyValue = FALSE)))
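
Once such a parameter table is available, it can be turned into the kind of named vector that is passed as newColnames. The sketch below uses a hand-made stand-in for the table, so that it runs without flowCore or any FCS file. It assumes the table has a name column (the metal channel) and a desc column (the marker), and that desc already holds clean marker names; in real files, desc often needs further tidying.

```r
# Toy stand-in for the table returned by pData(parameters(...));
# real tables carry additional columns, omitted here
params <- data.frame(
  name = c('Nd145Di', 'Sm154Di', 'Time'),
  desc = c('CD4', 'CD3', NA),
  stringsAsFactors = FALSE)

# Keep only channels that carry a description
annotated <- params[!is.na(params$desc), ]

# Named vector: names are metal channels, values are marker names
markernames <- setNames(annotated$desc, annotated$name)
markernames
```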

Whatever the case, it is important to sort out marker naming issues prior to the experiment being conducted in order to avoid any confusion.

For this vignette, because the raw FCS data exceeds 500 megabytes, we will work with a smaller pre-prepared dataset that has been downsampled to 10000 cells using the above code. This data comes included with the package.

Load the pre-prepared complosome data.

  load(system.file('extdata/', 'complosome.rdata', package = 'scDataviz'))

One can also create a new SingleCellExperiment object manually using any type of data, including any data from scRNA-seq produced elsewhere. Import functions for data deriving from other sources are covered in Tutorials 2 and 3 in this vignette. All functions in scDataviz additionally accept data-frames or matrices on their own, removing the need to rely on the SingleCellExperiment class.
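
As a minimal sketch of the manual route, the code below builds a SingleCellExperiment object from a toy matrix. It assumes the layout used elsewhere in this tutorial: markers in rows, cells in columns, an assay named 'scaled', and a per-cell data-frame stored in the metadata slot. The object names mat, meta, and sce_toy are hypothetical, and the SingleCellExperiment package must be installed.

```r
library(SingleCellExperiment)

set.seed(1)
# Toy matrix: 3 markers (rows) x 20 hypothetical cells (columns)
mat <- matrix(rnorm(60), nrow = 3,
  dimnames = list(c('CD3', 'CD4', 'CD25'), paste0('cell', 1:20)))

# Per-cell metadata; row names match the column names of 'mat'
meta <- data.frame(
  group = rep(c('Healthy', 'Disease'), each = 10),
  row.names = colnames(mat),
  stringsAsFactors = FALSE)

# Store the matrix under the assay name 'scaled'
sce_toy <- SingleCellExperiment(assays = list(scaled = mat))

# Attach the per-cell table to the metadata slot
metadata(sce_toy) <- meta

dim(assay(sce_toy, 'scaled'))
```

Keeping the metadata row names identical to the assay column names matters for downstream steps that pair each cell with its annotation.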

3.1 Perform principal component analysis (PCA)

We can use the PCAtools (Blighe and Lun 2020) package for the purpose of performing PCA.

  library(PCAtools)
  p <- pca(assay(sce, 'scaled'), metadata = metadata(sce))

  biplot(p,
    x = 'PC1', y = 'PC2',
    lab = NULL,
    xlim = c(min(p$rotated[,'PC1'])-1, max(p$rotated[,'PC1'])+1),
    ylim = c(min(p$rotated[,'PC2'])-1, max(p$rotated[,'PC2'])+1),
    pointSize = 1.0,
    colby = 'treatment',
    legendPosition = 'right',
    title = 'PCA applied to CyTOF data',
    caption = paste0('10000 cells randomly selected after ',
      'having filtered for low variance'))
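
For readers who want to see what the decomposition itself involves, a rough base-R analogue is sketched below. It uses stats::prcomp on a toy matrix rather than PCAtools, so it is an approximation of the idea and not the PCAtools implementation; the per-cell scores in fit$x play the role of p$rotated above. The names mat, fit, and explained are hypothetical.

```r
set.seed(1)
# Toy matrix: 5 markers (rows) x 50 hypothetical cells (columns)
mat <- matrix(rnorm(250), nrow = 5,
  dimnames = list(paste0('marker', 1:5), paste0('cell', 1:50)))

# prcomp expects observations in rows, so transpose to cells x markers
fit <- prcomp(t(mat), center = TRUE, scale. = FALSE)

# Percentage of variance explained by each principal component
explained <- 100 * fit$sdev^2 / sum(fit$sdev^2)
round(explained, 1)

# Per-cell coordinates on the first two components
head(fit$x[, c('PC1', 'PC2')])
```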