A transcriptional regulatory network (TRN) consists of a collection of regulated target genes and transcription factors (TFs). The TFs recognize specific DNA sequences and influence gene expression across the genome, either activating or repressing the expression of target genes. The set of genes controlled by a given TF forms a regulon. The RTN package provides classes and methods for the reconstruction of TRNs and analysis of regulons. New computational frameworks are also available from RTNduals and RTNsurvival packages.
The RTN package is designed for the reconstruction of TRNs and analysis of regulons using mutual information (MI) (Fletcher et al. 2013). It is implemented by S4 classes in R (R Core Team 2012) and extends several methods previously validated for assessing regulons, e.g. MRA (Carro et al. 2010), GSEA (Subramanian et al. 2005), and EVSE (Castro et al. 2016). The package tests the association between a given TF and all potential targets using either RNA-Seq or microarray transcriptome data. It is tuned to deal with large gene expression datasets in order to build transcriptional regulatory units that are centered on TFs. RTN allows a user to set the stringency of the analysis in a stepwise process, including a bootstrap routine designed to remove unstable associations. Parallel data processing is available for steps that benefit from high-performance computing.
The TNI pipeline starts with the generic function
tni.constructor() and creates a
TNI-class object, which provides methods for TRN inference from high-throughput gene expression data. The
tni.constructor takes in a matrix of normalized gene expression and the corresponding gene and sample annotations, as well as a list of regulators to be evaluated. Here, the gene expression matrix and annotations are available in the
tniData dataset, which was extracted, pre-processed and size-reduced from Fletcher et al. (2013) and Curtis et al. (2012). This dataset consists of a list with 3 objects: a named gene expression matrix (
tniData$expData), a data frame with gene annotations (
tniData$rowAnnotation), and a data frame with sample annotations (
tniData$colAnnotation). We will use this dataset to demonstrate the construction of a small TRN that has only 5 regulons. In general, though, we recommend building regulons for all TFs annotated for a given species; please see the case study sections for additional recommendations.
tni.constructor method will check the consistency of all the given arguments. The TNI pipeline is then executed in three steps: (i) compute MI between a regulator and all potential targets, removing non-significant associations by permutation analysis; (ii) remove unstable interactions by bootstrapping; and (iii) apply the ARACNe algorithm. Additional comments are provided throughout the examples.
# Input 1: 'expData', a named gene expression matrix (genes on rows, samples on cols); # Input 2: 'regulatoryElements', a vector listing genes regarded as TFs # Input 3: 'rowAnnotation', an optional data frame with gene annotation # Input 4: 'colAnnotation', an optional data frame with sample annotation tfs <- c("FOXM1","E2F2","E2F3","RUNX2","PTTG1") rtni <- tni.constructor(expData = tniData$expData, regulatoryElements = tfs, rowAnnotation = tniData$rowAnnotation, colAnnotation = tniData$colAnnotation) # p.s. alternatively, 'expData' can be a 'SummarizedExperiment' object
tni.permutation() function takes the pre-processed
TNI-class object and returns a TRN inferred by permutation analysis (with corrections for multiple hypothesis testing).
# Please set nPermutations >= 1000 rtni <- tni.permutation(rtni, nPermutations = 100)
Unstable interactions are subsequently removed by bootstrap analysis using the
tni.bootstrap() function, which creates a consensus bootstrap network, referred here as
refnet (reference network).
rtni <- tni.bootstrap(rtni)
At this stage each target in the TRN may be linked to multiple TFs. Because regulation can occur by both direct (TF-target) and indirect interactions (TF-TF-target), the pipeline next applies the ARACNe algorithm (Margolin, Nemenman, et al. 2006) to remove the weakest interaction in any triplet formed by two TFs and a common target gene, preserving the dominant TF-target pair (Meyer, Lafitte, and Bontempi 2008). The ARACNe algorithm uses the data processing inequality (DPI) theorem to enrich the regulons with direct TF-target interactions, creating a DPI-filtered TRN, referred here as
tnet (transcriptional network). For additional details, please refer to Margolin, Wang, et al. (2006) and Fletcher et al. (2013). Briefly, consider three random variables,
Z that form a network triplet, with
X interacting with
Z only through
Y (i.e., the interaction network is
X->Y->Z), and no alternative path exists between
Z). The DPI theorem states that the information transferred between
Z is always larger than the information transferred between
Z. Based on this assumption, the ARACNe algorithm scans all triplets formed by two regulators and one target and removes the edge with the smallest MI value of each triplet, which is regarded as a redundant association.
rtni <- tni.dpi.filter(rtni)
For a summary of the resulting regulatory network we recommend using the
tni.regulon.summary() function. From the summary below, we can see the number of regulons, the number of inferred targets and the regulon size distribution.
tni.regulon.summary(rtni) ## Regulatory network comprised of 5 regulons. ## -- DPI-filtered network: ## regulatoryElements Targets Edges ## 5 1218 1385 ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 78 238 349 277 357 363 ## -- Reference network: ## regulatoryElements Targets Edges ## 5 1218 2629 ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 78 504 621 526 677 749 ## ---
tni.regulon.summary() function also lets us get detailed information about a specific regulon, including the number of positive and negative targets. Please use this information as an initial guide in assessing the set of regulons. Usually small regulons (<15 targets) are not useful for most downstream methods, and highly unbalanced regulons (e.g. those with only positive targets) provide unstable activity readouts.
tni.regulon.summary(rtni, regulatoryElements = "FOXM1") ## The FOXM1 regulon has 238 targets, it's a large and balanced regulon. ## -- DPI filtered network targets: ## Total Positive Negative ## 238 134 104 ## -- Reference network targets: ## Total Positive Negative ## 621 370 251 ## -- Regulators with mutual information: ## E2F2, E2F3, PTTG1 ## ## ---
All results available in the
TNI-class object can be retrieved using the
tni.get() accessory function. For example, setting
what = 'regulons.and.mode' will return a list with regulons, including the weight assigned for each interaction.
regulons <- tni.get(rtni, what = "regulons.and.mode", idkey = "SYMBOL") head(regulons$FOXM1) ## C1orf106 HBB MAPT HBA2 ZNF385B FOSB ## 0.1321503 -0.1378999 -0.2063462 -0.1029147 -0.1092636 -0.1271561
The absolute value of a weight represents the MI value, while the sign (+/-) indicates the predicted mode of action based on the Pearson’s correlation between the regulator and its targets. We can also retrieve the inferred regulons into an
igraph-class object (Csardi 2019) using the
tni.graph() function, which sets some basic graph attributes for visualization in the RedeR R package (Castro et al. 2012).
g <- tni.graph(rtni, regulatoryElements = c("FOXM1","E2F2","PTTG1"))
The next chunk shows how to plot the
igraph-class object using RedeR (Figure 1).
library(RedeR) rdp <- RedPort() calld(rdp) addGraph(rdp, g, layout=NULL) addLegend.color(rdp, g, type="edge") addLegend.shape(rdp, g) relax(rdp, ps = TRUE)
Figure 1. Graph representation of three regulons inferred by the RTN package.
In the previous section we outlined the TNI pipeline, which calculates a TRN and makes information available on its regulons. Here, we describe the TNA pipeline, which provides methods for doing enrichment analysis over a list of regulons. The TNA pipeline starts with the function
tni2tna.preprocess(), converting the
TNI-class into a
TNA-class object. Then regulons will be tested for association with a given gene expression signature, provided here in the
tnaData dataset. Note that the dataset in this vignette should be used for demonstration purposes only. It consists of a list with 3 objects: a named numeric vector with log2 fold changes from a differential gene expression analysis, called here a ‘phenotype’ (
tnaData$phenotype), a character vector listing the differentially expressed genes (
tnaData$hits), and a data frame with gene annotations mapped to the phenotype (
# Input 1: 'object', a TNI object with regulons # Input 2: 'phenotype', a named numeric vector, usually log2 differential expression levels # Input 3: 'hits', a character vector, usually a set of differentially expressed genes # Input 4: 'phenoIDs', an optional data frame with gene anottation mapped to the phenotype data(tnaData) rtna <- tni2tna.preprocess(object = rtni, phenotype = tnaData$phenotype, hits = tnaData$hits, phenoIDs = tnaData$phenoIDs)
tna.mra() function takes the
TNA-class object and runs the Master Regulator Analysis (MRA) (Carro et al. 2010) over a list of regulons (with corrections for multiple hypothesis testing). The MRA assesses the overlap between each regulon and the genes listed in the
# Run the MRA method rtna <- tna.mra(rtna)
All results available in the
TNA-class object can be retrieved using the
tna.get() accessory function; setting
what = 'mra' will return a data frame listing the total number of genes in the TRN (
Universe.Size), the number of targets in each regulon (
Regulon.Size), the number of genes listed in the
hits argument (
Total.Hits), the expected overlap between a given regulon and the ‘hits’ (
Expected.Hits), the observed overlap between a given regulon and the ‘hits’ (
Observed.Hits), the statistical significance of the observed overlap assessed by the
phyper() function (
Pvalue), and the adjusted P-value (
# Get MRA results; #..setting 'ntop = -1' will return all results, regardless of a threshold mra <- tna.get(rtna, what="mra", ntop = -1) head(mra) ## Regulon Universe.Size Regulon.Size Total.Hits Expected.Hits Observed.Hits ## 1870 E2F2 5304 349 660 43.43 87 ## 2305 FOXM1 5304 238 660 29.62 61 ## 9232 PTTG1 5304 363 660 45.17 74 ## 1871 E2F3 5304 357 660 44.42 61 ## 860 RUNX2 5304 78 660 9.71 6 ## Pvalue Adjusted.Pvalue ## 1870 2.5e-11 1.3e-10 ## 2305 1.0e-08 2.6e-08 ## 9232 6.0e-06 1.0e-05 ## 1871 5.0e-03 6.3e-03 ## 860 9.4e-01 9.4e-01
As a complementary approach, the
tna.gsea1() function runs the one-tailed gene set enrichment analysis (GSEA-1T) to find regulons associated with a particular ‘response’, which is represented by a ranked list of genes generated from a differential gene expression signature (i.e. the ‘phenotype’ included in the TNI-to-TNA preprocessing step). Here the regulon’s target genes are considered a gene set, which is evaluated against the phenotype. The GSEA-1T uses a rank-based scoring metric in order to test the association between the gene set and the phenotype (Subramanian et al. 2005).
# Run the GSEA method # Please set nPermutations >= 1000 rtna <- tna.gsea1(rtna, nPermutations=100)
what = 'gsea1' in the
tna.get() accessory function will retrive a data frame listing the GSEA statistics, and the corresponding GSEA plots can be generated using the
tna.plot.gsea1() function (Figure 2).
# Get GSEA results gsea1 <- tna.get(rtna, what="gsea1", ntop = -1) head(gsea1) ## Regulon Regulon.Size Observed.Score Pvalue Adjusted.Pvalue ## 1870 E2F2 349 0.68 4.2409e-26 1.5769e-25 ## 9232 PTTG1 362 0.65 6.3075e-26 1.5769e-25 ## 2305 FOXM1 238 0.68 1.9083e-22 3.1806e-22 ## 1871 E2F3 357 0.59 2.6079e-18 3.2598e-18 ## 860 RUNX2 78 0.45 1.6832e-01 1.6832e-01
# Plot GSEA results tna.plot.gsea1(rtna, labPheno="abs(log2 fold changes)", ntop = -1)