# User’s Guide

### Simon Cabello-Aguilar1, Jacques Colinge1

1 Institut de Recherche en Cancérologie de Montpellier, Inserm, Montpellier, France ; Institut régional du Cancer Montpellier, Montpellier, France ; Université de Montpellier, Montpellier, France

## Introduction

This guide provides an overview of the SingleCellSignalR package, a comprehensive framework to obtain cellular network maps from scRNA-seq data. SingleCellSignalR comes with a complete pipeline integrating existing methods to cluster individual cell transcriptomes and identify cell subpopulations as well as novel cellular network-specific algorithms. More advanced users can substitute their own logic or alternative tools at various stages of data processing. SingleCellSignalR also maps cell subpopulation internal network linked to genes of interest through the integration of regulated KEGG and Reactome pathways together with ligands and receptors involved in inferred cell-cell interactions. The cellular networks can be exported in text files and graphML objects to be further explored with Cytoscape (www.cytoscape.org), yEd (www.yworks.com), or similar software tools.

## Quick Start

Independent of the chosen scRNA-seq platform, deep or shallower, data comes as a table of read or unique molecule identifier (UMI) counts, one column per individual cell and one row per gene. Initial processing is required to prepare such data for subsequent analysis and we decided to propose a generic solution for the sake of convenience, though users can easily substitute their own computations. Gene names (HUGO symbols) are provided in the first column of the table.

Each analysis is organized around a working directory (or project folder):

The file containing the read counts should be placed in the working directory.

> file <- "example_data.txt"

Data processing can then start:

> data <- data_prepare(file = file)
log-Normalization
14223 genes
400 cells
Zero rate = 89.6%

The data_prepare() function eliminates non expressed genes before performing read count normalization.

Normalized data are submitted to a clustering algorithm to identify cell subpopulations:

clust <- clustering(data = data, n.cluster = 4, n = 10, method = "simlr",write = FALSE,pdf=FALSE)

#> 4 clusters detected
#> cluster 1 -> 10 cells
#> cluster 2 -> 225 cells
#> cluster 3 -> 99 cells
#> cluster 4 -> 66 cells

We set the method argument to simlr, which caused the SIMLR() function of the SIMLR package [1] to be used. The SIMLR_Estimate_Number_of_Clusters() function determined the number of clusters, between 2 and n (n=10 above).

Next, differentially expressed genes in one cluster compared to the others are identified using the cluster_analysis() function, which relies on edgeR. A result table is automatically created in the cluster-analysis folder:

clust.ana <- cluster_analysis(data = data, genes = rownames(data), cluster = clust$cluster, write = FALSE) Once the preliminary steps illustrated above are completed, SingleCellSignalR can be used to generate cellular interaction lists using the cell_signaling() function: signal <- cell_signaling(data = data, genes = rownames(data), cluster = clust$cluster, write = FALSE)
#> No such file as table_dge_cluster 1.txt in the cluster-analysis folder
#> No such file as table_dge_cluster 2.txt in the cluster-analysis folder
#> No such file as table_dge_cluster 3.txt in the cluster-analysis folder
#> No such file as table_dge_cluster 4.txt in the cluster-analysis folder
#> Paracrine signaling:
#> Checking for signaling between cell types
#> 25 interactions from cluster 1 to cluster 2
#> 36 interactions from cluster 1 to cluster 3
#> 49 interactions from cluster 1 to cluster 4
#> 6 interactions from cluster 2 to cluster 1
#> 2 interactions from cluster 2 to cluster 3
#> 6 interactions from cluster 2 to cluster 4
#> 17 interactions from cluster 3 to cluster 1
#> 2 interactions from cluster 3 to cluster 2
#> 21 interactions from cluster 3 to cluster 4
#> 24 interactions from cluster 4 to cluster 1
#> 2 interactions from cluster 4 to cluster 2
#> 18 interactions from cluster 4 to cluster 3

An intercellular network can also be generated to map the overall ligand/receptor interactions invoking the inter_network() function:

inter.net <- inter_network(data = data, signal = signal, genes = genes, cluster = clust$cluster, write = FALSE) #> Doing cluster 1 and cluster 2 ... OK #> Doing cluster 1 and cluster 3 ... OK #> Doing cluster 1 and cluster 4 ... OK #> Doing cluster 2 and cluster 1 ... OK #> Doing cluster 2 and cluster 3 ... OK #> Doing cluster 2 and cluster 4 ... OK #> Doing cluster 3 and cluster 1 ... OK #> Doing cluster 3 and cluster 2 ... OK #> Doing cluster 3 and cluster 4 ... OK #> Doing cluster 4 and cluster 1 ... OK #> Doing cluster 4 and cluster 2 ... OK #> Doing cluster 4 and cluster 3 ... OK At this point the intercellular network have been generated and exported in text and graphML formats in the networks folder. A summary of the interactions between cell clusters can be output in the form of a chord diagram by the visualize_interactions() function: visualize_interactions(signal = signal) #> major.tick.percentage is not used any more, please directly use argument major.tick.length. #> major.tick.percentage is not used any more, please directly use argument major.tick.length. #> major.tick.percentage is not used any more, please directly use argument major.tick.length. #> major.tick.percentage is not used any more, please directly use argument major.tick.length. This function will create a plot in the R plot window. The details of the interactions between two clusters, for example cluster 1 and 2, can also be shown in the plot window with the visualize_interactions() function. Note that in the example below we ask for the display of two pairs of cell clusters, pair 1 that contains interactions from cluster 1 to 2, and pair 4 from cluster 2 to 1. (names(signal) returns the cell cluster names in each pair, see function visualize_interactions() details.) visualize_interactions(signal = signal,show.in=c(1,4)) #> major.tick.percentage is not used any more, please directly use argument major.tick.length. #> major.tick.percentage is not used any more, please directly use argument major.tick.length. #> major.tick.percentage is not used any more, please directly use argument major.tick.length. #> major.tick.percentage is not used any more, please directly use argument major.tick.length. And these plots can be saved into pdf files in the images folder using the write.in argument of the visualize_interactions() function. > visualize_interactions(signal = signal,write.in=c(1,4)) red red red ## Examples of use SingleCellSignalR package functions have many arguments parameters that can be changed by the user to fit her needs (see Reference Manual for more details). Furthermore, several handy functions that were not illustrated above are provided to generate additional plots or reports. ### Exploiting the cell_classifier clustering After running the example in the Quick Start section, the user can define cell clusters after the output of the cell_classifier(). The demo data set is comprised of a subset of the 10x PBMC dataset [3], i.e. immune cells. The t-SNE map calculated with the clustering() function will also be used. For this example we will set the plot.details argument to TRUE to monitor the choice of the threshold of gene signature scores. class = cell_classifier(data=data, genes=rownames(data), markers = markers(c("immune")), tsne=clust$t-SNE,plot.details=TRUE,write = FALSE)

Let us use the cell clustering obtained with the cell_classifier() function. Although “undefined” cells may be interesting in some cases, here they form a heterogeneous cluster because they represent cells that seem to be in a transition between two states (“T-cells” and “Cytotoxic cells”, or “Neutrophils” and “Macrophages”, see heatmap above). We discard these cells.

# Define the cluster vector and the cluster names
cluster <- class$cluster c.names <- class$c.names

# Remove undefined cells
data <- data[,cluster!=(max(cluster))]
tsne <- clust$t-SNE[cluster!=(max(cluster)),] c.names <- c.names[-max(cluster)] cluster <- cluster[cluster!=(max(cluster))] Then the analysis can be carried on. clust.ana <- cluster_analysis(data = data, genes = rownames(data), cluster = cluster, c.names = c.names, write = FALSE) #> edgeR differential gene expression (dge) processing: #> Looking for differentially expressed genes in T-cells #> Looking for differentially expressed genes in B-cells #> Looking for differentially expressed genes in Macrophages #> Looking for differentially expressed genes in Cytotoxic cells #> Looking for differentially expressed genes in Neutrophils Once the cluster analysis is done, the cell_signaling(), inter_network() functions can be used. signal <- cell_signaling(data = data, genes = genes, cluster = cluster, c.names = c.names, write = FALSE) #> No such file as table_dge_T-cells.txt in the cluster-analysis folder #> No such file as table_dge_B-cells.txt in the cluster-analysis folder #> No such file as table_dge_Macrophages.txt in the cluster-analysis folder #> No such file as table_dge_Cytotoxic cells.txt in the cluster-analysis folder #> No such file as table_dge_Neutrophils.txt in the cluster-analysis folder #> Paracrine signaling: #> Checking for signaling between cell types #> 10 interactions from T-cells to B-cells #> 20 interactions from T-cells to Macrophages #> 20 interactions from T-cells to Cytotoxic cells #> 24 interactions from T-cells to Neutrophils #> 5 interactions from B-cells to T-cells #> 26 interactions from B-cells to Macrophages #> 19 interactions from B-cells to Cytotoxic cells #> 29 interactions from B-cells to Neutrophils #> 2 interactions from Macrophages to T-cells #> 11 interactions from Macrophages to B-cells #> 20 interactions from Macrophages to Cytotoxic cells #> 1 interactions from Macrophages to Neutrophils #> 2 interactions from Cytotoxic cells to T-cells #> 4 interactions from Cytotoxic cells to B-cells #> 21 interactions from Cytotoxic cells to Macrophages #> 21 interactions from Cytotoxic cells to Neutrophils #> 7 interactions from Neutrophils to T-cells #> 12 interactions from Neutrophils to B-cells #> 7 interactions from Neutrophils to Macrophages #> 29 interactions from Neutrophils to Cytotoxic cells inter.net <- inter_network(data = data, signal = signal, genes = genes, cluster = cluster, write = FALSE) #> Doing T-cells and B-cells ... OK #> Doing T-cells and Macrophages ... OK #> Doing T-cells and Cytotoxic cells ... OK #> Doing T-cells and Neutrophils ... OK #> Doing B-cells and T-cells ... OK #> Doing B-cells and Macrophages ... OK #> Doing B-cells and Cytotoxic cells ... OK #> Doing B-cells and Neutrophils ... OK #> Doing Macrophages and T-cells ... OK #> Doing Macrophages and B-cells ... OK #> Doing Macrophages and Cytotoxic cells ... OK #> Doing Macrophages and Neutrophils ... OK #> Doing Cytotoxic cells and T-cells ... OK #> Doing Cytotoxic cells and B-cells ... OK #> Doing Cytotoxic cells and Macrophages ... OK #> Doing Cytotoxic cells and Neutrophils ... OK #> Doing Neutrophils and T-cells ... OK #> Doing Neutrophils and B-cells ... OK #> Doing Neutrophils and Macrophages ... OK #> Doing Neutrophils and Cytotoxic cells ... OK If we take a look at signal[[6]] (or signal[["B-cells-Macrophages"]]) signal[[6]] #> B-cells Macrophages interaction type LRscore #> 2589 RPS19 C5AR1 paracrine 0.9154102 #> 1587 HSP90B1 ASGR1 paracrine 0.8673637 #> 1589 HSP90B1 LRP1 paracrine 0.8647858 #> 2599 RPS27A TLR4 paracrine 0.8373524 #> 1592 HSP90B1 TLR4 paracrine 0.8035628 #> 452 CALR LRP1 paracrine 0.7773343 #> 1428 GNAI2 C5AR1 paracrine 0.7567524 #> 2969 UBA52 TLR4 paracrine 0.7487672 #> 2989 UBC TLR4 paracrine 0.7366246 #> 2510 PTMA VIPR1 paracrine 0.7313738 #> 3230 CD99 PILRA paracrine 0.7095773 #> 2049 LTB LTBR paracrine 0.7049480 #> 228 B2M HFE paracrine 0.7007288 #> 1441 GNAI2 FPR1 paracrine 0.6997901 #> 2034 LRPAP1 LRP1 paracrine 0.6561746 #> 1568 HMGB1 TLR4 paracrine 0.6542829 #> 2979 UBB TLR4 paracrine 0.6505954 #> 224 B2M CD1B paracrine 0.6234452 #> 235 B2M KLRD1 paracrine 0.6234452 #> 1473 GNAS VIPR1 paracrine 0.6184186 #> 2457 PSAP LRP1 paracrine 0.6135906 #> 1471 GNAS PTGIR paracrine 0.6071115 #> 2954 UBA52 ACVR1 paracrine 0.6027855 #> 2964 UBA52 NOTCH1 paracrine 0.6027855 #> 1567 HMGB1 THBD paracrine 0.5851681 #> 626 CD55 ADGRE2 paracrine 0.5301838 We can be interested in genes participating in pathways with a receptor of interest inside a cluster of interest. Let us say ASGR1 in “Macrophages”. intra = intra_network(goi = "ASGR1",data = data,genes = rownames(data),cluster = cluster, coi = "Macrophages", c.names = c.names, signal = signal,write=FALSE) Now, let us take an overview of the signaling between the cell types. visualize_interactions(signal) #> major.tick.percentage is not used any more, please directly use argument major.tick.length. #> major.tick.percentage is not used any more, please directly use argument major.tick.length. #> major.tick.percentage is not used any more, please directly use argument major.tick.length. #> major.tick.percentage is not used any more, please directly use argument major.tick.length. #> major.tick.percentage is not used any more, please directly use argument major.tick.length. Let us get deeper and look at the signaling between “T-cells” and “B-cells” for example. visualize_interactions(signal, show.in=c(1,6)) #> major.tick.percentage is not used any more, please directly use argument major.tick.length. #> major.tick.percentage is not used any more, please directly use argument major.tick.length. #> major.tick.percentage is not used any more, please directly use argument major.tick.length. #> major.tick.percentage is not used any more, please directly use argument major.tick.length. #> major.tick.percentage is not used any more, please directly use argument major.tick.length. The following command will save these plots in the images folder. > visualize_interactions(signal, write.in=c(1,6)) ### Marker analysis on a cancer dataset For this example we use the scRNAseq dataset from Tirosh et al. [4]. We use only the data from patient 80. > file <- "patient_80.txt" > data <- data_prepare(file = file) log-Normalization 19452 genes 480 cells Zero rate = 75.5% Remark: One can notice that the zero rate is lower than in the previous example which reflects the fact that the sequencing is deeper. We know that this dataset is composed of melanoma cells and their microenvironment, we hence define our markers table using the markers() function. > my.markers <- markers(category = c("immune", "tme", "melanoma")) > head(my.markers) T-cells B-cells Macrophages Cytotoxic cells DC Mast cells Neutrophils NK cells Treg Endothelial cells CAFs melanoma 1 CD2 CD19 CD163 PRF1 CCL13 TPSB2 FPR1 XCL1 FOXP3 PECAM1 FAP MIA 2 CD3D CD79A CD14 GZMA CD209 TPSAB1 SIGLEC5 XCL2 VWF THY1 TYR 3 CD3E CD79B CSF1R GZMB HSD11B1 CPA3 CSF3R NCR1 CDH5 DCN SLC45A2 4 CD3G BLK C1QC NKG7 MS4A2 FCAR KIR2DL3 CLDN5 COL1A1 CDH19 5 CD8A MS4A1 VSIG4 GZMH HDC FCGR3B KIR3DL1 PLVAP COL1A2 PMEL 6 SIRPG BANK1 C1QA KLRK1 CEACAM3 KIR3DL2 ECSCR COL6A1 SLC24A5 Let us perform the clustering. For this example, we set the method argument to “kmeans” and the n argument to 12. > clust <- clustering(data = data,n = 12, method = "kmeans") Estimating the number of clusters Estimated number of clusters = 6 6 clusters detected cluster 1 -> 157 cells cluster 2 -> 15 cells cluster 3 -> 137 cells cluster 4 -> 6 cells cluster 5 -> 114 cells cluster 6 -> 51 cells Now we take advantage of the markers argument of the cluster_analysis() function using my.markers obtained above with the markers() function. > clust.ana <- cluster_analysis(data = data, genes = rownames(data), cluster = clust$cluster, markers = my.markers)
edgeR differential gene expression (dge) processing:
Looking for differentially expressed genes in cluster 1
Looking for differentially expressed genes in cluster 2
Looking for differentially expressed genes in cluster 3
Looking for differentially expressed genes in cluster 4
Looking for differentially expressed genes in cluster 5
Looking for differentially expressed genes in cluster 6

We can see that the clusters 2 and 5 are well defined, they are respectively cancer associated fibroblasts (CAFs) and melanoma cells. The cluster 6 is also clearly composed of endothelial cells. Clusters 1 and 2 are immune cells but the clustering did not succeed in sorting them correctly and cluster 4 counts only 6 cells. Those do not seem to be homogeneous and we decide to remove them.

> data <- data[,clust$cluster!=4] > cluster <- clust$cluster[clust$cluster!=4] > cluster[cluster>4] <- cluster[cluster>4] - 1 Then we can name our clusters manually before pursuing the analysis. > c.names <- c("Immune 1", "CAFs", "Immune 2", "melanoma", "endothelial") > signal <- cell_signaling(data = data, genes = rownames(data), cluster = cluster, c.names = c.names) Paracrine signaling: Checking for signaling between cell types 78 interactions from Immune 1 to CAFs 30 interactions from Immune 1 to melanoma 11 interactions from Immune 1 to endothelial 67 interactions from CAFs to Immune 1 85 interactions from CAFs to Immune 2 86 interactions from CAFs to melanoma 78 interactions from CAFs to endothelial 84 interactions from Immune 2 to CAFs 55 interactions from Immune 2 to melanoma 44 interactions from Immune 2 to endothelial 12 interactions from melanoma to Immune 1 33 interactions from melanoma to CAFs 19 interactions from melanoma to Immune 2 12 interactions from melanoma to endothelial 53 interactions from endothelial to Immune 1 33 interactions from endothelial to CAFs 69 interactions from endothelial to Immune 2 28 interactions from endothelial to melanoma Remark: the names of the dge tables in the cluster_analysis folder must be changed according to the cluster names (c.names). And now visualize! > visualize_interactions(signal, show.in = c(12,18)) Remark: We observe that in the chord diagrams above, the “specific” interactions were highlighted with a thick black line. Let us look at one of these specific interactions using the expression_plot_2() function. > expression_plot_2(data,"NID1","COL13A1",clust$t-SNE)

red
red
red

Thank you for reading this guide and for using SingleCellSignalR.

## References

1. Wang B, Zhu J, Pierson E, Ramazzotti D, Batzoglou S. Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning. Nat Methods. 2017;14:414-6.

2. McCarthy DJ, Chen Y, Smyth GK. Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res. 2012;40:4288-97.

3. 8k PBMCs from a Healthy Donor [Internet]. 2017. Available from: https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/pbmc8k

4. Tirosh I, Izar B, Prakadan SM, Wadsworth MH, Treacy D, Trombetta JJ, et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science. 2016;352:189-96.