A transcriptional regulatory network (TRN) consists of a collection of regulated target genes and transcription factors (TFs). The TFs recognize specific DNA sequences and influence gene expression across the genome, either activating or repressing the expression of target genes. The set of genes controlled by a given TF forms a regulon. The RTN package provides classes and methods for the reconstruction of TRNs and analysis of regulons. New computational frameworks are also available from RTNduals and RTNsurvival packages.
The RTN package is designed for the reconstruction of TRNs and analysis of regulons using mutual information (MI) (Fletcher et al. 2013). It is implemented by S4 classes in R (R Core Team 2012) and extends several methods previously validated for assessing regulons, e.g. MRA (Carro et al. 2010), GSEA (Subramanian et al. 2005), and EVSE (Castro et al. 2016). The package tests the association between a given TF and all potential targets using either RNA-Seq or microarray transcriptome data. It is tuned to deal with large gene expression datasets in order to build transcriptional regulatory units that are centered on TFs. RTN allows a user to set the stringency of the analysis in a stepwise process, including a bootstrap routine designed to remove unstable associations. Parallel data processing is available for steps that benefit from high-performance computing.
The TNI pipeline starts with the generic function
tni.constructor() and creates a
TNI-class object, which provides methods for TRN inference from high-throughput gene expression data. The
tni.constructor takes in a matrix of normalized gene expression and the corresponding gene and sample annotations, as well as a list of regulators to be evaluated. Here, the gene expression matrix and annotations are available in the
tniData dataset, which was extracted, pre-processed and size-reduced from Fletcher et al. (2013) and Curtis et al. (2012). This dataset consists of a list with 3 objects: a named gene expression matrix (
tniData$expData), a data frame with gene annotations (
tniData$rowAnnotation), and a data frame with sample annotations (
tniData$colAnnotation). We will use this dataset to demonstrate the construction of a small TRN that has only 5 regulons. In general, though, we recommend building regulons for all TFs annotated for a given species; please see the case study sections for additional recommendations.
tni.constructor method will check the consistency of all the given arguments. The TNI pipeline is then executed in three steps: (i) compute MI between a regulator and all potential targets, removing non-significant associations by permutation analysis; (ii) remove unstable interactions by bootstrapping; and (iii) apply the ARACNe algorithm. Additional comments are provided throughout the examples.
# Input 1: 'expData', a named gene expression matrix (genes on rows, samples on cols); # Input 2: 'regulatoryElements', a vector listing genes regarded as TFs # Input 3: 'rowAnnotation', an optional data frame with gene annotation # Input 4: 'colAnnotation', an optional data frame with sample annotation tfs <- c("FOXM1","E2F2","E2F3","RUNX2","PTTG1") rtni <- tni.constructor(expData = tniData$expData, regulatoryElements = tfs, rowAnnotation = tniData$rowAnnotation, colAnnotation = tniData$colAnnotation) # p.s. alternatively, 'expData' can be a 'SummarizedExperiment' object
tni.permutation() function takes the pre-processed
TNI-class object and returns a TRN inferred by permutation analysis (with corrections for multiple hypothesis testing).
# Please set nPermutations >= 1000 rtni <- tni.permutation(rtni, nPermutations = 100)
Unstable interactions are subsequently removed by bootstrap analysis using the
tni.bootstrap() function, which creates a consensus bootstrap network, referred here as
refnet (reference network).
rtni <- tni.bootstrap(rtni)
At this stage each target in the TRN may be linked to multiple TFs. Because regulation can occur by both direct (TF-target) and indirect interactions (TF-TF-target), the pipeline next applies the ARACNe algorithm (Margolin, Nemenman, et al. 2006) to remove the weakest interaction in any triplet formed by two TFs and a common target gene, preserving the dominant TF-target pair (Meyer, Lafitte, and Bontempi 2008). The ARACNe algorithm uses the data processing inequality (DPI) theorem to enrich the regulons with direct TF-target interactions, creating a DPI-filtered TRN, referred here as
tnet (transcriptional network). For additional details, please refer to Margolin, Wang, et al. (2006) and Fletcher et al. (2013). Briefly, consider three random variables,
Z that form a network triplet, with
X interacting with
Z only through
Y (i.e., the interaction network is
X->Y->Z), and no alternative path exists between
Z). The DPI theorem states that the information transferred between
Z is always larger than the information transferred between
Z. Based on this assumption, the ARACNe algorithm scans all triplets formed by two regulators and one target and removes the edge with the smallest MI value of each triplet, which is regarded as a redundant association.
rtni <- tni.dpi.filter(rtni)
For a summary of the resulting regulatory network we recommend using the
tni.regulon.summary() function. From the summary below, we can see the number of regulons, the number of inferred targets and the regulon size distribution.
tni.regulon.summary(rtni) ## Regulatory network comprised of 5 regulons. ## -- DPI-filtered network: ## regulatoryElements Targets Edges ## 5 1195 1365 ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 81 237 324 273 358 365 ## -- Reference network: ## regulatoryElements Targets Edges ## 5 1195 2575 ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 81 510 625 515 667 692 ## ---
tni.regulon.summary() function also lets us get detailed information about a specific regulon, including the number of positive and negative targets. Please use this information as an initial guide in assessing the set of regulons. Usually small regulons (<15 targets) are not useful for most downstream methods, and highly unbalanced regulons (e.g. those with only positive targets) provide unstable activity readouts.
tni.regulon.summary(rtni, regulatoryElements = "FOXM1") ## The FOXM1 regulon has 237 targets, it's a large and balanced regulon. ## -- DPI filtered network targets: ## Total Positive Negative ## 237 134 103 ## -- Reference network targets: ## Total Positive Negative ## 625 365 260 ## -- Regulators with mutual information: ## E2F2, E2F3, PTTG1 ## ## ---
All results available in the
TNI-class object can be retrieved using the
tni.get() accessory function. For example, setting
what = 'regulons.and.mode' will return a list with regulons, including the weight assigned for each interaction.
regulons <- tni.get(rtni, what = "regulons.and.mode", idkey = "SYMBOL") head(regulons$FOXM1) ## C1orf106 HBB MAPT HBA2 ZNF385B FOSB ## 0.1321503 -0.1378999 -0.2063462 -0.1029147 -0.1092636 -0.1271561
The absolute value of a weight represents the MI value, while the sign (+/-) indicates the predicted mode of action based on the Pearson’s correlation between the regulator and its targets. We can also retrieve the inferred regulons into an
igraph-class object (Csardi 2019) using the
tni.graph() function, which sets some basic graph attributes for visualization in the RedeR R package (Castro et al. 2012).
g <- tni.graph(rtni, regulatoryElements = c("FOXM1","E2F2","PTTG1"))
The next chunk shows how to plot the
igraph-class object using RedeR (Figure 1).
library(RedeR) rdp <- RedPort() calld(rdp) addGraph(rdp, g, layout=NULL) addLegend.color(rdp, g, type="edge") addLegend.shape(rdp, g) relax(rdp, ps = TRUE)
Figure 1. Graph representation of three regulons inferred by the RTN package.
In the previous section we outlined the TNI pipeline, which calculates a TRN and makes information available on its regulons. Here, we describe the TNA pipeline, which provides methods for doing enrichment analysis over a list of regulons. The TNA pipeline starts with the function
tni2tna.preprocess(), converting the
TNI-class into a
TNA-class object. Then regulons will be tested for association with a given gene expression signature, provided here in the
tnaData dataset. Note that the dataset in this vignette should be used for demonstration purposes only. It consists of a list with 3 objects: a named numeric vector with log2 fold changes from a differential gene expression analysis, called here a ‘phenotype’ (
tnaData$phenotype), a character vector listing the differentially expressed genes (
tnaData$hits), and a data frame with gene annotations mapped to the phenotype (
# Input 1: 'object', a TNI object with regulons # Input 2: 'phenotype', a named numeric vector, usually log2 differential expression levels # Input 3: 'hits', a character vector, usually a set of differentially expressed genes # Input 4: 'phenoIDs', an optional data frame with gene anottation mapped to the phenotype data(tnaData) rtna <- tni2tna.preprocess(object = rtni, phenotype = tnaData$phenotype, hits = tnaData$hits, phenoIDs = tnaData$phenoIDs)
tna.mra() function takes the
TNA-class object and runs the Master Regulator Analysis (MRA) (Carro et al. 2010) over a list of regulons (with corrections for multiple hypothesis testing). The MRA assesses the overlap between each regulon and the genes listed in the
# Run the MRA method rtna <- tna.mra(rtna)
All results available in the
TNA-class object can be retrieved using the
tna.get() accessory function; setting
what = 'mra' will return a data frame listing the total number of genes in the TRN (
Universe.Size), the number of targets in each regulon (
Regulon.Size), the number of genes listed in the
hits argument (
Total.Hits), the expected overlap between a given regulon and the ‘hits’ (
Expected.Hits), the observed overlap between a given regulon and the ‘hits’ (
Observed.Hits), the statistical significance of the observed overlap assessed by the
phyper() function (
Pvalue), and the adjusted P-value (
# Get MRA results; #..setting 'ntop = -1' will return all results, regardless of a threshold mra <- tna.get(rtna, what="mra", ntop = -1) head(mra) ## Regulon Universe.Size Regulon.Size Total.Hits Expected.Hits Observed.Hits ## 1870 E2F2 5304 324 660 40.32 87 ## 2305 FOXM1 5304 237 660 29.49 60 ## 9232 PTTG1 5304 358 660 44.55 74 ## 1871 E2F3 5304 365 660 45.42 62 ## 860 RUNX2 5304 81 660 10.08 8 ## Pvalue Adjusted.Pvalue ## 1870 2.9e-13 1.5e-12 ## 2305 2.2e-08 5.6e-08 ## 9232 3.5e-06 5.9e-06 ## 1871 5.4e-03 6.7e-03 ## 860 8.1e-01 8.1e-01
As a complementary approach, the
tna.gsea1() function runs the one-tailed gene set enrichment analysis (GSEA-1T) to find regulons associated with a particular ‘response’, which is represented by a ranked list of genes generated from a differential gene expression signature (i.e. the ‘phenotype’ included in the TNI-to-TNA preprocessing step). Here the regulon’s target genes are considered a gene set, which is evaluated against the phenotype. The GSEA-1T uses a rank-based scoring metric in order to test the association between the gene set and the phenotype (Subramanian et al. 2005).
# Run the GSEA method # Please set nPermutations >= 1000 rtna <- tna.gsea1(rtna, nPermutations=100)
what = 'gsea1' in the
tna.get() accessory function will retrive a data frame listing the GSEA statistics, and the corresponding GSEA plots can be generated using the
tna.plot.gsea1() function (Figure 2).
# Get GSEA results gsea1 <- tna.get(rtna, what="gsea1", ntop = -1) head(gsea1) ## Regulon Regulon.Size Observed.Score Pvalue Adjusted.Pvalue ## 1870 E2F2 324 0.69 3.1018e-32 1.5509e-31 ## 9232 PTTG1 358 0.65 9.0035e-28 2.2509e-27 ## 2305 FOXM1 237 0.68 1.8727e-18 3.1212e-18 ## 1871 E2F3 365 0.59 2.4925e-16 3.1156e-16 ## 860 RUNX2 81 0.47 6.9307e-02 6.9307e-02
# Plot GSEA results tna.plot.gsea1(rtna, labPheno="abs(log2 fold changes)", ntop = -1)