1 Introduction

Gene regulatory networks model the underlying gene regulation hierarchies that drive gene expression and cell states. The main functions of this package are to construct gene regulatory networks and infer transcription factor (TF) activity at the single cell level by integrating scATAC-seq and scRNA-seq data and incorporating of public bulk TF ChIP-seq data.

There are three related packages: epiregulon, epiregulon.extra and epiregulon.archr, the two of which are available through Bioconductor and the last of which is only available through github. The basic epiregulon package takes in gene expression and chromatin accessibility matrices in the form of SingleCellExperiment objects, constructs gene regulatory networks (“regulons”) and outputs the activity of transcription factors at the single cell level. The epiregulon.extra package provides a suite of tools for enrichment analysis of target genes, visualization of target genes and transcription factor activity, and network analysis which can be run on the epiregulon output. If the users would like to start from ArchR projects instead of SingleCellExperiment objects, they may choose to use epiregulon.archr package, which allows for seamless integration with the ArchR package, and continue with the rest of the workflow offered in epiregulon.extra.

For full documentation of the epiregulon package, please refer to the epiregulon book.

2 Installation

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
 
BiocManager::install("epiregulon")

Alternatively, you could install from github

devtools::install_github(repo ='xiaosaiyao/epiregulon')

Load package

library(epiregulon)

3 Data preparation

Prior to using epiregulon, single cell preprocessing needs to performed by user’s favorite methods. The following components are required:
1. Peak matrix from scATAC-seq containing the chromatin accessibility information
2. Gene expression matrix from either paired or unpaired scRNA-seq. RNA-seq integration needs to be performed for unpaired dataset.
3. Dimensionality reduction matrix from either single modality dataset or joint scRNA-seq and scATAC-seq

This tutorial demonstrates the basic functions of epiregulon, using the reprogram-seq dataset which can be downloaded from the scMultiome package. In this example, prostate cancer cells (LNCaP) were infected in separate wells with viruses encoding 4 transcription factors (NKX2-1, GATA6, FOXA1 and FOXA2) and a positive control (mNeonGreen) before pooling. The identity of the infected transcription factors was tracked through cell hashing (available in the field hash_assignment of the colData) and serves as the ground truth.

# load the MAE object
library(scMultiome)
mae <- scMultiome::reprogramSeq()

# extract peak matrix
PeakMatrix <- mae[["PeakMatrix"]]

# extract expression matrix
GeneExpressionMatrix <- mae[["GeneExpressionMatrix"]]
rownames(GeneExpressionMatrix) <- rowData(GeneExpressionMatrix)$name

# define the order of hash_assigment
GeneExpressionMatrix$hash_assignment <- 
  factor(as.character(GeneExpressionMatrix$hash_assignment),
         levels = c("HTO10_GATA6_UTR", "HTO2_GATA6_v2", 
                    "HTO8_NKX2.1_UTR", "HTO3_NKX2.1_v2",
                    "HTO1_FOXA2_v2", "HTO4_mFOXA1_v2", "HTO6_hFOXA1_UTR", 
                    "HTO5_NeonG_v2"))
# extract dimensional reduction matrix
reducedDimMatrix <- reducedDim(mae[['TileMatrix500']], "LSI_ATAC")

# transfer UMAP_combined from TileMatrix to GeneExpressionMatrix
reducedDim(GeneExpressionMatrix, "UMAP_Combined") <- 
  reducedDim(mae[['TileMatrix500']], "UMAP_Combined")

Visualize singleCellExperiment by UMAP


scater::plotReducedDim(GeneExpressionMatrix, 
                       dimred = "UMAP_Combined", 
                       text_by = "Clusters", 
                       colour_by = "Clusters")

4 Quick start

4.1 Retrieve bulk TF ChIP-seq binding sites

First, we retrieve a GRangesList object containing the binding sites of all the transcription factors and co-regulators. These binding sites are derived from bulk ChIP-seq data in the ChIP-Atlas and ENCODE databases. For the same transcription factor, multiple ChIP-seq files from different cell lines or tissues are merged. For further information on how these peaks are derived, please refer to ?epiregulon::getTFMotifInfo. Currently, human genomes hg19 and hg38 and mouse mm10 are supported.

grl <- getTFMotifInfo(genome = "hg38")
#> Retrieving chip-seq data, version 2
#> see ?scMultiome and browseVignettes('scMultiome') for documentation
#> loading from cache
grl
#> GRangesList object of length 1377:
#> $AEBP2
#> GRanges object with 2700 ranges and 0 metadata columns:
#>          seqnames            ranges strand
#>             <Rle>         <IRanges>  <Rle>
#>      [1]     chr1        9792-10446      *
#>      [2]     chr1     942105-942400      *
#>      [3]     chr1     984486-984781      *
#>      [4]     chr1   3068932-3069282      *
#>      [5]     chr1   3069411-3069950      *
#>      ...      ...               ...    ...
#>   [2696]     chrY   8465261-8465730      *
#>   [2697]     chrY 11721744-11722260      *
#>   [2698]     chrY 11747448-11747964      *
#>   [2699]     chrY 19302661-19303134      *
#>   [2700]     chrY 19985662-19985982      *
#>   -------
#>   seqinfo: 25 sequences from an unspecified genome; no seqlengths
#> 
#> ...
#> <1376 more elements>

We recommend the use of ChIP-seq data over motif for estimating TF occupancy. However, if the user would like to start from motifs, it is possible to switch to the motif mode. The user can provide the peak regions as a GRanges object and the location of the motifs will be annotated based on Cisbp from chromVARmotifs (human_pwms_v2 and mouse_pwms_v2, version 0.2)

grl.motif <- getTFMotifInfo(genome = "hg38",  
                            mode = "motif", 
                            peaks = rowRanges(PeakMatrix))

4.2 Link ATAC-seq peaks to target genes

Next, we try to link ATAC-seq peaks to their putative target genes. We assign a peak to a gene within a size window (default ±250kb) if the chromatin accessibility of the peak and expression of the target genes are highly correlated (default p-vale threshold 0.05). To compute correlations, we first create cell aggregates by performing k-means clustering on the reduced dimensionality matrix. Then we aggregate the counts of the gene expression and peak matrix and average across the number of cells. Correlations are computed on the averaged gene expression and chromatin accessibility.

4.2.1 Optimize the number of metacells (Optional)

The number of metacells used to aggregate cells into meta-cells is a sensitive parameter, as it should strike the correct balance between averaging out biological variation and reducing signal noise caused by sparsity in single cells. Therefore, we provide a separate function, optimizeMetacellNumber, to automatically select the optimal value for this parameter. The algorithm samples different possible values and calculates the mean empirical p-value of RE–TG connections for each value. Then, a quadratic regression is fitted to the relationship, and the minimum of the function is determined.

cellNum <- optimizeMetacellNumber(peakMatrix = PeakMatrix,
                                  expMatrix = GeneExpressionMatrix,
                                  reducedDim = reducedDimMatrix,
                                  exp_assay = "normalizedCounts",
                                  peak_assay = "counts",
                                  BPPARAM = BiocParallel::SerialParam(progressbar = FALSE))
#> Warning in optimizeMetacellNumber(peakMatrix = PeakMatrix, expMatrix =
#> GeneExpressionMatrix, : Solution at the boundary of examined range.
#> Warning in optimizeMetacellNumber(peakMatrix = PeakMatrix, expMatrix =
#> GeneExpressionMatrix, : Coefficient of determination of the regression model is
#> lower thant 0.7
#> An issue detected during estimation optimal number of metacells.Consider at least one of the following actions:1. Change of the `cellNumMin` and `cellNumMax` paramaters2. Increasing the number of evaluation points (`n_evaluation_points` argument)3. Increasing the number of iterations (`n_iter` argument)4. Increasing the number of false connections used to compute p-valuenull distribution (`nRandConns` argument)
#> Solution not found using quadratic regression. Using cluster size with the lowest mean p-value.

We can plot the results to make sure that the relationship is correctly generalized.

plot(cellNum)

4.2.2 Calculate Peak to Gene Links

The output of optimizeMetacellNumber function is then passed to calculateP2G as cellNum argument. Alternatively, users may skip optimizeMetacellNumber and specify their own cell number in cellNum.

set.seed(1010)
p2g <- calculateP2G(peakMatrix = PeakMatrix, 
                    expMatrix = GeneExpressionMatrix, 
                    reducedDim = reducedDimMatrix,
                    exp_assay = "normalizedCounts",
                    peak_assay = "counts",
                    cellNum = cellNum,
                    BPPARAM = BiocParallel::SerialParam(progressbar = FALSE))
#> Using epiregulon to compute peak to gene links...
#> Value of the paramater 'cellNum' has not been optimized.
#>                 Consider running function 'optimizeMetacellNumber' and use output to set 'cellNum'
#> Creating metacells...
#> Looking for regulatory elements near target genes...
#> Computing correlations...

p2g
#> DataFrame with 27788 rows and 10 columns
#>         idxATAC         chr     start       end    idxRNA     target
#>       <integer> <character> <integer> <integer> <integer>    <array>
#> 1             4        chr1    827287    827787        15  LINC01409
#> 2             8        chr1    917255    917755        33       AGRN
#> 3             9        chr1    920452    920952        30       HES4
#> 4            12        chr1    924540    925040        21 AL645608.2
#> 5            12        chr1    924540    925040        33       AGRN
#> ...         ...         ...       ...       ...       ...        ...
#> 27784    126570        chrX 154541878 154542378     36398       GDI1
#> 27785    126572        chrX 154546500 154547000     36398       GDI1
#> 27786    126574        chrX 154607299 154607799     36402      UBL4A
#> 27787    126589        chrX 155216677 155217177     36426      CLIC2
#> 27788    126590        chrX 155228844 155229344     36426      CLIC2
#>       Correlation p_val_peak_gene FDR_peak_gene  distance
#>          <matrix>        <matrix>      <matrix> <integer>
#> 1       -0.287036       0.0274266      0.897453     48529
#> 2       -0.339948       0.0111970      0.888955    102363
#> 3        0.443191       0.0366656      0.900893     79027
#> 4       -0.251828       0.0488281      0.907338     13105
#> 5       -0.261609       0.0420019      0.903914     95078
#> ...           ...             ...           ...       ...
#> 27784   -0.254763       0.0466513      0.903914    104965
#> 27785   -0.262641       0.0413402      0.903914    109587
#> 27786    0.503043       0.0221027      0.888955    120715
#> 27787    0.465381       0.0305586      0.898925    117435
#> 27788    0.423254       0.0428900      0.903914    105268

4.3 Add TF motif binding to peaks

The next step is to add the TF binding information by overlapping regions of the peak matrix with the bulk chip-seq database. The output is a data frame object with three columns:

idxATAC - index of the peak in the peak matrix
idxTF - index in the gene expression matrix corresponding to the transcription factor
tf - name of the transcription factor


overlap <- addTFMotifInfo(grl = grl, 
                          p2g = p2g, 
                          peakMatrix = PeakMatrix)
#> Computing overlap...
#> Success!
head(overlap)
#>     idxATAC idxTF    tf
#> 160       4     2  AFF1
#> 161       4     3  AFF4
#> 162       4     4  AGO1
#> 163       4     5  AGO2
#> 164       4     6   AHR
#> 165       4     8 ARID2

4.4 Generate regulons

A DataFrame, representing the inferred regulons, is then generated. The DataFrame consists of ten columns:

idxATAC - index of the peak in the peak matrix
chr - chromosome number
start - start position of the peak
end - end position of the peak
idxRNA - index in the gene expression matrix corresponding to the target gene
target - name of the target gene
distance - distance between the transcription start site of the target gene and the middle of the peak
idxTF - index in the gene expression matrix corresponding to the transcription factor
tf - name of the transcription factor
corr - correlation between target gene expression and the chromatin accessibility at the peak. if cluster labels are provided, this field is a matrix with columns names corresponding to correlation across all cells and for each of the clusters.


regulon <- getRegulon(p2g = p2g, 
                      overlap = overlap)
regulon
#> DataFrame with 5423341 rows and 12 columns
#>           idxATAC         chr     start       end    idxRNA    target      corr
#>         <integer> <character> <integer> <integer> <integer>   <array>  <matrix>
#> 1               4        chr1    827287    827787        15 LINC01409 -0.287036
#> 2               4        chr1    827287    827787        15 LINC01409 -0.287036
#> 3               4        chr1    827287    827787        15 LINC01409 -0.287036
#> 4               4        chr1    827287    827787        15 LINC01409 -0.287036
#> 5               4        chr1    827287    827787        15 LINC01409 -0.287036
#> ...           ...         ...       ...       ...       ...       ...       ...
#> 5423337    126589        chrX 155216677 155217177     36426     CLIC2  0.465381
#> 5423338    126589        chrX 155216677 155217177     36426     CLIC2  0.465381
#> 5423339    126590        chrX 155228844 155229344     36426     CLIC2  0.423254
#> 5423340    126590        chrX 155228844 155229344     36426     CLIC2  0.423254
#> 5423341    126590        chrX 155228844 155229344     36426     CLIC2  0.423254
#>         p_val_peak_gene FDR_peak_gene  distance     idxTF          tf
#>                <matrix>      <matrix> <integer> <integer> <character>
#> 1             0.0274266      0.897453     48529         2        AFF1
#> 2             0.0274266      0.897453     48529         3        AFF4
#> 3             0.0274266      0.897453     48529         4        AGO1
#> 4             0.0274266      0.897453     48529         5        AGO2
#> 5             0.0274266      0.897453     48529         6         AHR
#> ...                 ...           ...       ...       ...         ...
#> 5423337       0.0305586      0.898925    117435      1343      ZNF781
#> 5423338       0.0305586      0.898925    117435      1370     ZSCAN30
#> 5423339       0.0428900      0.903914    105268       114       FOXA1
#> 5423340       0.0428900      0.903914    105268       128       GATA2
#> 5423341       0.0428900      0.903914    105268       823       SUMO2

4.5 Network pruning (highly recommended)

Epiregulon prunes the network by performing tests of independence on the observed number of cells jointly expressing transcription factor (TF), regulatory element (RE) and target gene (TG) vs the expected number of cells if TF/RE and TG are independently expressed. The users can choose between two tests, the binomial test and the chi-square test. In the binomial test, the expected probability is P(TF, RE) * P(TG), and the number of trials is the total number of cells, and the observed successes is the number of cells jointly expressing all three elements. In the chi-square test, the expected probability for having all 3 elements active is also P(TF, RE) * P(TG) and the probability otherwise is 1- P(TF, RE) * P(TG). The observed cell count for the category of “active TF” is the number of cells jointly expressing all three elements, and the cell count for the inactive category is n - n_triple.

We calculate cluster-specific p-values if users supply cluster labels. This is useful if we are interested in cluster-specific networks. The pruned regulons can then be used to visualize differential networks for transcription factors of interest.


pruned.regulon <- pruneRegulon(expMatrix = GeneExpressionMatrix,
                               exp_assay = "normalizedCounts",
                               peakMatrix = PeakMatrix,
                               peak_assay = "counts",
                               test = "chi.sq",
                               regulon = regulon[regulon$tf %in% 
                                         c("NKX2-1","GATA6","FOXA1","FOXA2", "AR"),],
                               clusters = GeneExpressionMatrix$Clusters,
                               prune_value = "pval",
                               regulon_cutoff = 0.05
                               )

pruned.regulon

4.6 Add Weights

While the pruneRegulon function provides statistics on the joint occurrence of TF-RE-TG, we would like to further estimate the strength of regulation. Biologically, this can be interpreted as the magnitude of gene expression changes induced by transcription factor activity. Epiregulon estimates the regulatory potential using one of the three measures: 1) correlation between TF and target gene expression, 2) mutual information between the TF and target gene expression and 3) Wilcoxon test statistics of target gene expression in cells jointly expressing all 3 elements vs cells that do not.

Two of the measures (correlation and Wilcoxon statistics) give both the magnitude and directionality of changes whereas mutual information is always positive. The correlation and mutual information statistics are computed on pseudobulks aggregated by user-supplied cluster labels, whereas the Wilcoxon method groups cells into two categories: 1) the active category of cells jointly expressing TF, RE and TG and 2) the inactive category consisting of the remaining cells.

We calculate cluster-specific weights if users supply cluster labels.


regulon.w <- addWeights(regulon = pruned.regulon,
                        expMatrix  = GeneExpressionMatrix,
                        exp_assay  = "normalizedCounts",
                        peakMatrix = PeakMatrix,
                        peak_assay = "counts",
                        clusters = GeneExpressionMatrix$Clusters,
                        method = "wilcox")

regulon.w

4.7 Annotate with TF motifs (Optional)

So far the gene regulatory network was constructed from TF ChIP-seq exclusively. Some users would prefer to further annotate regulatory elements with the presence of motifs. We provide an option to annotate peaks with motifs from the Cisbp database. If no motifs are present for this particular factor (as in the case of co-factors or chromatin modifiers), we return NA. If motifs are available for a factor and the RE contains a motif, we return 1. If motifs are available and the RE does not contain a motif, we return 0. The users can also provide their own motif annotation through the pwms argument.


regulon.w.motif <- addMotifScore(regulon = regulon.w,
                                 peaks = rowRanges(PeakMatrix),
                                 species = "human",
                                 genome = "hg38")
#> annotating peaks with motifs
#> see ?scMultiome and browseVignettes('scMultiome') for documentation
#> loading from cache
#> 
#> Attaching package: 'Biostrings'
#> The following object is masked from 'package:base':
#> 
#>     strsplit
#> 
#> Attaching package: 'rtracklayer'
#> The following object is masked from 'package:AnnotationHub':
#> 
#>     hubUrl

# if desired, set weight to 0 if no motif is found
regulon.w.motif$weight[regulon.w.motif$motif == 0] <- 0

regulon.w.motif
#> DataFrame with 12615 rows and 17 columns
#>         idxATAC         chr     start       end    idxRNA  target      corr
#>       <integer> <character> <integer> <integer> <integer> <array>  <matrix>
#> 1            16        chr1    941458    941958        27 PLEKHN1 -0.316896
#> 2            17        chr1    942448    942948        27 PLEKHN1 -0.311117
#> 3            32        chr1   1005187   1005687        31   ISG15 -0.266184
#> 4            44        chr1   1063727   1064227        30    HES4 -0.281860
#> 5            50        chr1   1079276   1079776        27 PLEKHN1 -0.268102
#> ...         ...         ...       ...       ...       ...     ...       ...
#> 12611    123380        chr9 128635245 128635745     35034  SPTAN1  0.665593
#> 12612    123444        chr9 129139814 129140314     35053 SH3GLB2 -0.397590
#> 12613    123961        chr9 136545877 136546377     35218   EGFL7 -0.263232
#> 12614    124120        chr9 137240872 137241372     35274   NELFB -0.304012
#> 12615    124431        chrX  16719400  16719900     35406   CTPS2 -0.370374
#>       p_val_peak_gene FDR_peak_gene  distance     idxTF          tf
#>              <matrix>      <matrix> <integer> <integer> <character>
#> 1           0.0170654      0.888955     24522       485          AR
#> 2           0.0188591      0.888955     23532       485          AR
#> 3           0.0387629      0.902775      4049       485          AR
#> 4           0.0295685      0.898341     63746       485          AR
#> 5           0.0373176      0.900893    112794       485          AR
#> ...               ...           ...       ...       ...         ...
#> 12611      0.00542585      0.862926     82687       723      NKX2-1
#> 12612      0.00362205      0.852006    111629       723      NKX2-1
#> 12613      0.04083516      0.903914    112477       723      NKX2-1
#> 12614      0.02067008      0.888955     13953       723      NKX2-1
#> 12615      0.00609480      0.866922      7183       723      NKX2-1
#>                                     pval                              stats
#>                                 <matrix>                           <matrix>
#> 1     0.9228115:0.00216014:1.0000000:... 0.00938821:9.408227413:0.00000:...
#> 2     0.7550261:0.62588366:1.0000000:... 0.09735586:0.237684306:0.00000:...
#> 3     0.1536743:0.48024862:1.0000000:... 2.03539507:0.498298692:0.00000:...
#> 4     0.0293318:0.55482832:0.0294814:... 4.74803613:0.348738126:4.73928:...
#> 5     0.9352849:0.98724889:1.0000000:... 0.00659303:0.000255419:0.00000:...
#> ...                                  ...                                ...
#> 12611                4.51202e-02:1:1:...                    4.01414:0:0:...
#> 12612                1.79623e-01:1:1:...                    1.80074:0:0:...
#> 12613                3.56162e-05:1:1:...                   17.09178:0:0:...
#> 12614                1.07516e-01:1:1:...                    2.59036:0:0:...
#> 12615                3.18522e-02:1:1:...                    4.60643:0:0:...
#>            qval    weight     motif
#>        <matrix>  <matrix> <numeric>
#> 1     1:1:1:... 0:0:0:...         0
#> 2     1:1:1:... 0:0:0:...         0
#> 3     1:1:1:... 0:0:0:...         0
#> 4     1:1:1:... 0:0:0:...         0
#> 5     1:1:1:... 0:0:0:...         0
#> ...         ...       ...       ...
#> 12611 1:1:1:... 0:0:0:...         0
#> 12612 1:1:1:... 0:0:0:...         0
#> 12613 1:1:1:... 0:0:0:...         0
#> 12614 1:1:1:... 0:0:0:...         0
#> 12615 1:1:1:... 0:0:0:...         0

4.8 Annotate with log fold changes (Optional)

It is sometimes helpful to filter the regulons based on gene expression changes between two conditions. The addLogFC function is a wrapper around scran::findMarkers and adds extra columns of log changes to the regulon DataFrame. The users can specify the reference condition in logFC_ref and conditions for comparison in logFC_condition. If these are not provided, log fold changes are calculated for every condition in the cluster labels against the rest of the conditions.

# create logcounts
GeneExpressionMatrix <- scuttle::logNormCounts(GeneExpressionMatrix)

# add log fold changes which are calculated by taking the difference of the log counts
regulon.w <- addLogFC(regulon = regulon.w,
                      clusters = GeneExpressionMatrix$hash_assignment,
                      expMatrix  = GeneExpressionMatrix,
                      assay.type  = "logcounts",
                      pval.type = "any",
                      logFC_condition = c("HTO10_GATA6_UTR", 
                                          "HTO2_GATA6_v2", 
                                          "HTO8_NKX2.1_UTR"),
                      logFC_ref = "HTO5_NeonG_v2")

regulon.w
#> DataFrame with 12615 rows and 25 columns
#>         idxATAC         chr     start       end    idxRNA  target      corr
#>       <integer> <character> <integer> <integer> <integer> <array>  <matrix>
#> 1            16        chr1    941458    941958        27 PLEKHN1 -0.316896
#> 2            17        chr1    942448    942948        27 PLEKHN1 -0.311117
#> 3            32        chr1   1005187   1005687        31   ISG15 -0.266184
#> 4            44        chr1   1063727   1064227        30    HES4 -0.281860
#> 5            50        chr1   1079276   1079776        27 PLEKHN1 -0.268102
#> ...         ...         ...       ...       ...       ...     ...       ...
#> 12611    123380        chr9 128635245 128635745     35034  SPTAN1  0.665593
#> 12612    123444        chr9 129139814 129140314     35053 SH3GLB2 -0.397590
#> 12613    123961        chr9 136545877 136546377     35218   EGFL7 -0.263232
#> 12614    124120        chr9 137240872 137241372     35274   NELFB -0.304012
#> 12615    124431        chrX  16719400  16719900     35406   CTPS2 -0.370374
#>       p_val_peak_gene FDR_peak_gene  distance     idxTF          tf
#>              <matrix>      <matrix> <integer> <integer> <character>
#> 1           0.0170654      0.888955     24522       485          AR
#> 2           0.0188591      0.888955     23532       485          AR
#> 3           0.0387629      0.902775      4049       485          AR
#> 4           0.0295685      0.898341     63746       485          AR
#> 5           0.0373176      0.900893    112794       485          AR
#> ...               ...           ...       ...       ...         ...
#> 12611      0.00542585      0.862926     82687       723      NKX2-1
#> 12612      0.00362205      0.852006    111629       723      NKX2-1
#> 12613      0.04083516      0.903914    112477       723      NKX2-1
#> 12614      0.02067008      0.888955     13953       723      NKX2-1
#> 12615      0.00609480      0.866922      7183       723      NKX2-1
#>                                     pval                              stats
#>                                 <matrix>                           <matrix>
#> 1     0.9228115:0.00216014:1.0000000:... 0.00938821:9.408227413:0.00000:...
#> 2     0.7550261:0.62588366:1.0000000:... 0.09735586:0.237684306:0.00000:...
#> 3     0.1536743:0.48024862:1.0000000:... 2.03539507:0.498298692:0.00000:...
#> 4     0.0293318:0.55482832:0.0294814:... 4.74803613:0.348738126:4.73928:...
#> 5     0.9352849:0.98724889:1.0000000:... 0.00659303:0.000255419:0.00000:...
#> ...                                  ...                                ...
#> 12611                4.51202e-02:1:1:...                    4.01414:0:0:...
#> 12612                1.79623e-01:1:1:...                    1.80074:0:0:...
#> 12613                3.56162e-05:1:1:...                   17.09178:0:0:...
#> 12614                1.07516e-01:1:1:...                    2.59036:0:0:...
#> 12615                3.18522e-02:1:1:...                    4.60643:0:0:...
#>            qval                               weight
#>        <matrix>                             <matrix>
#> 1     1:1:1:... 0.000558636: 0.15051940:0.000000:...
#> 2     1:1:1:... 0.003594128:-0.02374424:0.000000:...
#> 3     1:1:1:... 0.025580412:-0.03472721:0.000000:...
#> 4     1:1:1:... 0.039595855: 0.02639944:0.310942:...
#> 5     1:1:1:... 0.006330629: 0.00193837:0.000000:...
#> ...         ...                                  ...
#> 12611 1:1:1:...                    0.0277381:0:0:...
#> 12612 1:1:1:...                    0.0211504:0:0:...
#> 12613 1:1:1:...                    0.0669747:0:0:...
#> 12614 1:1:1:...                    0.0246483:0:0:...
#> 12615 1:1:1:...                    0.0312261:0:0:...
#>       HTO10_GATA6_UTR.vs.HTO5_NeonG_v2.FDR
#>                                  <numeric>
#> 1                                 0.768744
#> 2                                 0.768744
#> 3                                 1.000000
#> 4                                 0.429119
#> 5                                 0.768744
#> ...                                    ...
#> 12611                             0.891051
#> 12612                             1.000000
#> 12613                             0.938624
#> 12614                             1.000000
#> 12615                             0.938624
#>       HTO10_GATA6_UTR.vs.HTO5_NeonG_v2.p.value
#>                                      <numeric>
#> 1                                    -2.406967
#> 2                                    -2.406967
#> 3                                    -0.960831
#> 4                                    -3.480750
#> 5                                    -2.406967
#> ...                                        ...
#> 12611                                -2.088809
#> 12612                                -0.875472
#> 12613                                -1.251026
#> 12614                                -0.346989
#> 12615                                -1.225717
#>       HTO10_GATA6_UTR.vs.HTO5_NeonG_v2.logFC HTO2_GATA6_v2.vs.HTO5_NeonG_v2.FDR
#>                                    <numeric>                          <numeric>
#> 1                                 -0.0224284                           1.000000
#> 2                                 -0.0224284                           1.000000
#> 3                                 -0.0222188                           0.950512
#> 4                                 -0.0698706                           0.629688
#> 5                                 -0.0224284                           1.000000
#> ...                                      ...                                ...
#> 12611                             0.05722383                          0.0780338
#> 12612                            -0.01947302                          1.0000000
#> 12613                            -0.01972348                          1.0000000
#> 12614                            -0.00696269                          1.0000000
#> 12615                            -0.03199543                          1.0000000
#>       HTO2_GATA6_v2.vs.HTO5_NeonG_v2.p.value
#>                                    <numeric>
#> 1                                  -0.498011
#> 2                                  -0.498011
#> 3                                  -1.471853
#> 4                                  -2.930635
#> 5                                  -0.498011
#> ...                                      ...
#> 12611                              -5.984106
#> 12612                              -0.685001
#> 12613                              -0.582334
#> 12614                              -0.430542
#> 12615                              -0.106763
#>       HTO2_GATA6_v2.vs.HTO5_NeonG_v2.logFC HTO8_NKX2.1_UTR.vs.HTO5_NeonG_v2.FDR
#>                                  <numeric>                            <numeric>
#> 1                               0.00796239                                    1
#> 2                               0.00796239                                    1
#> 3                              -0.02663603                                    1
#> 4                              -0.05776614                                    1
#> 5                               0.00796239                                    1
#> ...                                    ...                                  ...
#> 12611                           0.11104599                             0.992412
#> 12612                          -0.01469769                             1.000000
#> 12613                          -0.01081622                             0.859701
#> 12614                          -0.00758876                             1.000000
#> 12615                          -0.00379030                             1.000000
#>       HTO8_NKX2.1_UTR.vs.HTO5_NeonG_v2.p.value
#>                                      <numeric>
#> 1                                    -0.128231
#> 2                                    -0.128231
#> 3                                    -0.163465
#> 4                                    -0.169655
#> 5                                    -0.128231
#> ...                                        ...
#> 12611                               -1.1223310
#> 12612                               -0.1542453
#> 12613                               -2.7304660
#> 12614                               -0.7840765
#> 12615                               -0.0677012
#>       HTO8_NKX2.1_UTR.vs.HTO5_NeonG_v2.logFC
#>                                    <numeric>
#> 1                                -0.00218254
#> 2                                -0.00218254
#> 3                                 0.00444303
#> 4                                 0.00620451
#> 5                                -0.00218254
#> ...                                      ...
#> 12611                             0.03146638
#> 12612                             0.00395594
#> 12613                            -0.03017369
#> 12614                            -0.01222574
#> 12615                             0.00236451

4.9 Calculate TF activity

Finally, the activities for a specific TF in each cell are computed by averaging expressions of target genes linked to the TF weighted by the test statistics of choice, chosen from either correlation, mutual information or the Wilcoxon test statistics. \[y=\frac{1}{n}\sum_{i=1}^{n} x_i * weights_i\] where \(y\) is the activity of a TF for a cell, \(n\) is the total number of targets for a TF, \(x_i\) is the log count expression of target \(i\) where \(i\) in {1,2,…,n} and \(weights_i\) is the weight of TF - target \(i\)

score.combine <- calculateActivity(expMatrix = GeneExpressionMatrix,
                                   regulon = regulon.w, 
                                   mode = "weight", 
                                   method = "weightedMean", 
                                   exp_assay = "normalizedCounts",
                                   normalize = FALSE)
#> Warning in calculateActivity(expMatrix = GeneExpressionMatrix, regulon =
#> regulon.w, : Argument 'method' to calculateActivity was deprecated as of
#> epiregulon version 2.0.0
#> calculating TF activity from regulon using weightedMean
#> Warning in calculateActivity(expMatrix = GeneExpressionMatrix, regulon = regulon.w, : The weight column contains multiple subcolumns but no cluster information was provided.
#>           Using first column to compute activity...
#> aggregating regulons...
#> creating weight matrix...
#> calculating activity scores...
#> normalize by the number of targets...

score.combine[1:5,1:5]
#> 5 x 5 sparse Matrix of class "dgCMatrix"
#>        reprogram#TTAGGAACAAGGTACG-1 reprogram#GAGCGGTCAACCTGGT-1
#> AR                       0.01810276                   0.02010613
#> FOXA1                    0.01729800                   0.01785810
#> FOXA2                    0.01560426                   0.01909460
#> GATA6                    0.01584968                   0.05277976
#> NKX2-1                   0.03142773                   0.02065373
#>        reprogram#TTATAGCCACCCTCAC-1 reprogram#TGGTGATTCCTGTTCA-1
#> AR                      0.011616976                  0.011481322
#> FOXA1                   0.011377877                  0.011818823
#> FOXA2                   0.012336596                  0.015336953
#> GATA6                   0.009286548                  0.007445737
#> NKX2-1                  0.011787289                  0.009625869
#>        reprogram#TCGGTTCTCACTAGGT-1
#> AR                       0.01807436
#> FOXA1                    0.01685097
#> FOXA2                    0.01345712
#> GATA6                    0.01230806
#> NKX2-1                   0.02837461

5 Session Info

sessionInfo()
#> R version 4.5.1 Patched (2025-08-23 r88802)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] BSgenome.Hsapiens.UCSC.hg38_1.4.5 BSgenome_1.77.2                  
#>  [3] rtracklayer_1.69.1                BiocIO_1.19.0                    
#>  [5] Biostrings_2.77.2                 XVector_0.49.1                   
#>  [7] GenomeInfoDb_1.45.12              scMultiome_1.9.0                 
#>  [9] MultiAssayExperiment_1.35.9       ExperimentHub_2.99.6             
#> [11] AnnotationHub_3.99.6              BiocFileCache_2.99.6             
#> [13] dbplyr_2.5.1                      epiregulon_1.99.7                
#> [15] SingleCellExperiment_1.31.1       SummarizedExperiment_1.39.2      
#> [17] Biobase_2.69.1                    GenomicRanges_1.61.6             
#> [19] Seqinfo_0.99.3                    IRanges_2.43.6                   
#> [21] S4Vectors_0.47.5                  BiocGenerics_0.55.4              
#> [23] generics_0.1.4                    MatrixGenerics_1.21.0            
#> [25] matrixStats_1.5.0                 BiocStyle_2.37.1                 
#> 
#> loaded via a namespace (and not attached):
#>   [1] RColorBrewer_1.1-3          jsonlite_2.0.0             
#>   [3] magrittr_2.0.4              ggbeeswarm_0.7.2           
#>   [5] magick_2.9.0                farver_2.1.2               
#>   [7] rmarkdown_2.30              vctrs_0.6.5                
#>   [9] memoise_2.0.1               Rsamtools_2.25.3           
#>  [11] RCurl_1.98-1.17             tinytex_0.57               
#>  [13] htmltools_0.5.8.1           S4Arrays_1.9.1             
#>  [15] BiocBaseUtils_1.11.2        curl_7.0.0                 
#>  [17] BiocNeighbors_2.3.1         Rhdf5lib_1.31.1            
#>  [19] SparseArray_1.9.1           rhdf5_2.53.6               
#>  [21] sass_0.4.10                 bslib_0.9.0                
#>  [23] httr2_1.2.1                 cachem_1.1.0               
#>  [25] GenomicAlignments_1.45.4    igraph_2.2.0               
#>  [27] lifecycle_1.0.4             pkgconfig_2.0.3            
#>  [29] rsvd_1.0.5                  Matrix_1.7-4               
#>  [31] R6_2.6.1                    fastmap_1.2.0              
#>  [33] digest_0.6.37               TFMPvalue_0.0.9            
#>  [35] AnnotationDbi_1.71.2        scater_1.37.0              
#>  [37] dqrng_0.4.1                 irlba_2.3.5.1              
#>  [39] RSQLite_2.4.3               beachmat_2.25.5            
#>  [41] seqLogo_1.75.0              filelock_1.0.3             
#>  [43] labeling_0.4.3              httr_1.4.7                 
#>  [45] abind_1.4-8                 compiler_4.5.1             
#>  [47] bit64_4.6.0-1               withr_3.0.2                
#>  [49] S7_0.2.0                    backports_1.5.0            
#>  [51] BiocParallel_1.43.4         viridis_0.6.5              
#>  [53] DBI_1.2.3                   HDF5Array_1.37.0           
#>  [55] rappdirs_0.3.3              DelayedArray_0.35.3        
#>  [57] bluster_1.19.1              rjson_0.2.23               
#>  [59] gtools_3.9.5                caTools_1.18.3             
#>  [61] tools_4.5.1                 vipor_0.4.7                
#>  [63] beeswarm_0.4.0              glue_1.8.0                 
#>  [65] h5mread_1.1.1               restfulr_0.0.16            
#>  [67] rhdf5filters_1.21.4         grid_4.5.1                 
#>  [69] checkmate_2.3.3             cluster_2.1.8.1            
#>  [71] TFBSTools_1.47.1            gtable_0.3.6               
#>  [73] metapod_1.17.0              BiocSingular_1.25.1        
#>  [75] ScaledMatrix_1.17.0         ggrepel_0.9.6              
#>  [77] BiocVersion_3.22.0          pillar_1.11.1              
#>  [79] motifmatchr_1.31.1          limma_3.65.7               
#>  [81] dplyr_1.1.4                 lattice_0.22-7             
#>  [83] bit_4.6.0                   DirichletMultinomial_1.51.0
#>  [85] tidyselect_1.2.1            locfit_1.5-9.12            
#>  [87] scuttle_1.19.0              knitr_1.50                 
#>  [89] gridExtra_2.3               scrapper_1.3.17            
#>  [91] bookdown_0.45               edgeR_4.7.6                
#>  [93] xfun_0.53                   statmod_1.5.1              
#>  [95] UCSC.utils_1.5.0            yaml_2.3.10                
#>  [97] evaluate_1.0.5              codetools_0.2-20           
#>  [99] tibble_3.3.0                BiocManager_1.30.26        
#> [101] cli_3.6.5                   jquerylib_0.1.4            
#> [103] dichromat_2.0-0.1           Rcpp_1.1.0                 
#> [105] png_0.1-8                   XML_3.99-0.19              
#> [107] parallel_4.5.1              ggplot2_4.0.0              
#> [109] blob_1.2.4                  scran_1.37.0               
#> [111] bitops_1.0-9                pwalign_1.5.0              
#> [113] viridisLite_0.4.2           scales_1.4.0               
#> [115] purrr_1.1.0                 crayon_1.5.3               
#> [117] rlang_1.1.6                 cowplot_1.2.0              
#> [119] KEGGREST_1.49.2

Epiregulon tutorial with MultiAssayExperiment

24 October 2025

Package

Contents