Introduction

Another Bioconductor package we will also be introduce is called ELMER (L. Yao et al. 2015,Chedraoui Silva et al. (2017)) which allows one to identify DNA methylation changes in distal regulatory regions and correlate these signatures with expression of nearby genes to identify transcriptional targets associated with cancer. For these distal probes correlated with a gene, a transcription factor motif analysis is performed followed by expression analysis of transcription factors to infer upstream regulators.

We expect that participants of this workshop will understand the integrative analysis performed by using TCGAbiolinks + ELMER, as well as be able to execute it from the data acquisition process to the final interpretation of the results. The workshop assumes users with an intermediate level of familiarity with R, and basic understanding of tumor biology.

The figure below highlights the workflow part which will be covered in this section.

Loading required libraries

library(ELMER)
library(DT)
library(dplyr)

ELMER data input

A Multi Assay Experiment object from the MultiAssayExperiment package is the input for multiple main functions of ELMER. To create it you can use the createMAE function.

But instead of using the data previously downloaded, we will use one with more samples provided in this package.

library(MultiAssayExperiment)
lusc.exp

## class: RangedSummarizedExperiment 
## dim: 3742 234 
## metadata(0):
## assays(1): ''
## rownames(3742): ENSG00000188984 ENSG00000204518 ...
##   ENSG00000162378 ENSG00000036549
## rowData names(2): external_gene_name ensembl_gene_id
## colnames(234): TCGA-22-5472-01A-01R-1635-07
##   TCGA-22-5489-01A-01R-1635-07 ... TCGA-56-7730-01A-11R-2125-07
##   TCGA-56-7731-01A-11R-2125-07
## colData names(0):

lusc.met

## class: RangedSummarizedExperiment 
## dim: 1702 268 
## metadata(0):
## assays(1): ''
## rownames(1702): cg00045114 cg00050294 ... cg13985485 cg08542090
## rowData names(36): addressA addressB ... MASK.extBase MASK.general
## colnames(268): TCGA-43-3394-11A-01D-1551-05
##   TCGA-43-3920-11B-01D-1551-05 ... TCGA-98-8022-01A-11D-2245-05
##   TCGA-98-8023-01A-11D-2245-05
## colData names(1): samples

Also, we will need to filter the probes in the MAE to select only distal probes.

distal.probes <- get.feature.probe(genome = "hg38", met.platform = "450K")

The createMAE function will receive both DNA methylation and gene expression objects. Also filter.probes argument will select only probes in the regions of the distal.probes object.

library(MultiAssayExperiment)
mae <- createMAE(exp = lusc.exp, 
                 met = lusc.met,
                 save = TRUE,
                 linearize.exp = TRUE,
                 filter.probes = distal.probes,
                 save.filename = "mae_bioc2017.rda",
                 met.platform = "450K",
                 genome = "hg38",
                 TCGA = TRUE)
mae

## A MultiAssayExperiment object of 2 listed
##  experiments with user-defined names and respective classes. 
##  Containing an ExperimentList class object of length 2: 
##  [1] DNA methylation: RangedSummarizedExperiment with 1584 rows and 234 columns 
##  [2] Gene expression: RangedSummarizedExperiment with 3742 rows and 234 columns 
## Features: 
##  experiments() - obtain the ExperimentList instance 
##  colData() - the primary/phenotype DataFrame 
##  sampleMap() - the sample availability DataFrame 
##  `$`, `[`, `[[` - extract colData columns, subset, or experiment 
##  *Format() - convert into a long or wide DataFrame 
##  assays() - convert ExperimentList to a SimpleList of matrices

colData(mae)[1:5,] %>%  as.data.frame %>% datatable(options = list(scrollX = TRUE))

sampleMap(mae)[1:5,] %>%  as.data.frame %>% datatable(options = list(scrollX = TRUE))

Analysis

Identifying differentially methylated probes

This step is to identify DNA methylation changes at distal enhancer probes which is carried out by function get.diff.meth.

For each distal probe, the function first rank samples from group 1 and group 2 samples by their DNA methylation beta values. To identify hypomethylated probes, the function compared the lower control quintile (20% of control samples with the lowest methylation) to the lower experiment quintile (20% of experiment samples with the lowest methylation), using an unpaired one-tailed t-test.

Source: Yao, Lijing, et al. “Inferring regulatory element landscapes and transcription factor networks from cancer methylomes.” Genome biology 16.1 (2015): 105.

Main get.diff.meth arguments

Argument	Description
data	A `multiAssayExperiment` with DNA methylation and Gene Expression data. See `createMAE` function.
diff.dir	A character can be “hypo” or “hyper”, showing differential methylation dirction. It can be “hypo” which is only selecting hypomethylated probes; “hyper” which is only selecting hypermethylated probes
minSubgroupFrac	A number ranging from 0 to 1,specifying the fraction of extreme samples from group 1 and group 2 that are used to identify the differential DNA methylation. The default is 0.2 because we typically want to be able to detect a specific (possibly unknown) molecular subtype among tumor; these subtypes often make up only a minority of samples, and 20% was chosen as a lower bound for the purposes of statistical power. If you are using pre-defined group labels, such as treated replicates vs. untreated replicated, use a value of 1.0 (*Supervised* mode)
pvalue	A number specifies the significant P value (adjusted P value by BH) cutoff for selecting significant hypo/hyper-methylated probes. Default is 0.01
group.col	A column defining the groups of the sample. You can view the available columns using: `colnames(MultiAssayExperiment::colData(data))`.
group1	A group from group.col. ELMER will run group1 vs group2. That means, if direction is hyper, get probes hypermethylated in group 1 compared to group 2.
group2	A group from group.col. ELMER will run group1 vs group2. That means, if direction is hyper, get probes hypermethylated in group 1 compared to group 2.
sig.dif	A number specifies the smallest DNA methylation difference as a cutoff for selecting significant hypo/hyper-methylated probes. Default is 0.3.

sig.diff <- get.diff.meth(data = mae, 
                          group.col = "definition",
                          group1 =  "Primary solid Tumor",
                          group2 = "Solid Tissue Normal",
                          minSubgroupFrac = 0.2,
                          sig.dif = 0.3,
                          diff.dir = "hypo", # Search for hypomethylated probes in group 1
                          cores = 1, 
                          dir.out ="result", 
                          pvalue = 0.01)

as.data.frame(sig.diff)  %>% datatable(options = list(scrollX = TRUE))

# get.diff.meth automatically save output files. 
# getMethdiff.hypo.probes.csv contains statistics for all the probes.
# getMethdiff.hypo.probes.significant.csv contains only the significant probes which
# is the same with sig.diff
dir(path = "result", pattern = "getMethdiff")

## [1] "getMethdiff.hypo.probes.csv"            
## [2] "getMethdiff.hypo.probes.significant.csv"

Identifying putative probe-gene pairs

This step is to link enhancer probes with methylation changes to target genes with expression changes and report the putative target gene for selected probes. This is carried out by function get.pair.

For each distal probe with differential methylation the closest 10 upstream genes and the closest 10 downstream genes were tested for correlation between methylation of the probe and expression of the gene. Thus, exactly 20 statistical tests were performed for each probe, as follows. For each probe-gene pair, the samples (all experiment samples and control samples) were divided into two groups: the M group, which consisted of the upper methylation quintile (the 20% of samples with the highest methylation at the enhancer probe), and the U group, which consisted of the lowest methylation quintile (the 20% of samples with the lowest methylation.) For each probe-gene pair tested, the raw p-value Pr was corrected for multiple hypothesis using a permutation approach.

Source: Yao, Lijing, et al. “Inferring regulatory element landscapes and transcription factor networks from cancer methylomes.” Genome biology 16.1 (2015): 105.

Main get.pair arguments

Argument	Description
data	A multiAssayExperiment with DNA methylation and Gene Expression data. See createMAE function.
nearGenes	Can be either a list containing output of GetNearGenes function or path of rda file containing output of `GetNearGenes` function.
minSubgroupFrac	A number ranging from 0 to 1.0 specifying the percentage of samples used to create groups U (unmethylated) and M (methylated) used to link probes to genes. Default is 0.4 (lowest quintile samples will be in the U group and the highest quintile samples in the M group).
permu.size	A number specify the times of permuation. Default is 10000.
pvalue	A number specify the raw p-value cutoff for defining signficant pairs. Default is 0.05. It will select the significant P value cutoff before calculating the empirical p-values.
Pe	A number specify the empirical p-value cutoff for defining signficant pairs. Default is 0.001.
group.col	A column defining the groups of the sample. You can view the available columns using: `colnames(MultiAssayExperiment::colData(data))`.
group1	A group from group.col.
group2	A group from group.col.
filter.probes	Should filter probes by selecting only probes that have at least a certain number of samples below and above a certain cut-off. See `preAssociationProbeFiltering` function.
filter.portion	A number specify the cut point to define binary methlation level for probe loci. Default is 0.3. When beta value is above 0.3, the probe is methylated and vice versa. For one probe, the percentage of methylated and unmethylated samples should be above filter.percentage value. Only used if filter.probes is TRUE. See preAssociationProbeFiltering function.
filter.percentage	Minimum percentage of samples to be considered in methylated and unmethylated for the filter.portion option. Default 5%. Only used if filter.probes is TRUE. See preAssociationProbeFiltering function.

nearGenes <- GetNearGenes(data = mae, 
                         probes = sig.diff$probe, 
                         numFlankingGenes = 20, # 10 upstream and 10 dowstream genes
                         cores = 1)

Hypo.pair <- get.pair(data = mae,
                      group.col = "definition",
                      group1 =  "Primary solid Tumor",
                      group2 = "Solid Tissue Normal",
                      nearGenes = nearGenes,
                      minSubgroupFrac = 0.4, # % of samples to use in to create groups U/M
                      permu.dir = "result/permu",
                      permu.size = 100, # Please set to 100000 to get significant results
                      pvalue = 0.05,   
                      Pe = 0.01, # Please set to 0.001 to get significant results
                      filter.probes = TRUE, # See preAssociationProbeFiltering function
                      filter.percentage = 0.05,
                      filter.portion = 0.3,
                      dir.out = "result",
                      cores = 1,
                      label = "hypo")

Hypo.pair %>% datatable(options = list(scrollX = TRUE))

# get.pair automatically save output files. 
#getPair.hypo.all.pairs.statistic.csv contains statistics for all the probe-gene pairs.
#getPair.hypo.pairs.significant.csv contains only the significant probes which is 
# same with Hypo.pair.
dir(path = "result", pattern = "getPair")

## [1] "getPair.hypo.all.pairs.statistic.csv"                  
## [2] "getPair.hypo.pairs.significant.csv"                    
## [3] "getPair.hypo.pairs.statistic.with.empirical.pvalue.csv"

Motif enrichment analysis on the selected probes

This step is to identify enriched motif in a set of probes which is carried out by function get.enriched.motif.

This function uses a pre-calculated data Probes.motif which was generated using HOMER with a \(p-value \le 10^{–4}\) to scan a \(\pm250bp\) region around each probe using HOmo sapiens COmprehensive MOdel COllection http://hocomoco.autosome.ru/ v10 (Kulakovskiy et al. 2016) position weight matrices (PWMs). For each probe set tested (i.e. the list of gene-linked hypomethylated probes in a given group), a motif enrichment Odds Ratio and a 95% confidence interval were calculated using following formulas: \[ p= \frac{a}{a+b} \] \[ P= \frac{c}{c+d} \] \[ Odds\quad Ratio = \frac{\frac{p}{1-p}}{\frac{P}{1-P}} \] \[ SD = \sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}} \] \[ lower\quad boundary\quad of\quad 95\%\quad confidence\quad interval = \exp{(\ln{OR}-SD)} \]

where a is the number of probes within the selected probe set that contain one or more motif occurrences; b is the number of probes within the selected probe set that do not contain a motif occurrence; c and d are the same counts within the entire enhancer probe set. A probe set was considered significantly enriched for a particular motif if the 95% confidence interval of the Odds Ratio was greater than 1.1 (specified by option lower.OR, 1.1 is default), and the motif occurred at least 10 times (specified by option min.incidence. 10 is default) in the probe set. As described in the text, Odds Ratios were also used for ranking candidate motifs.

Main get.pair arguments

Argument	Description
data	A multi Assay Experiment from createMAE function. If set and probes.motif/background probes are missing this will be used to get this other two arguments correctly. This argument is not require, you can set probes.motif and the backaground.probes manually.
probes	A vector lists the name of probes to define the set of probes in which motif enrichment OR and confidence interval will be calculated.
lower.OR	A number specifies the smallest lower boundary of 95% confidence interval for Odds Ratio. The motif with higher lower boudnary of 95% confidence interval for Odds Ratio than the number are the significantly enriched motifs (detail see reference).
min.incidence	A non-negative integer specifies the minimum incidence of motif in the given probes set. 10 is default.

# Identify enriched motif for significantly hypomethylated probes which 
# have putative target genes.
enriched.motif <- get.enriched.motif(data = mae,
                                     probes = Hypo.pair$Probe, 
                                     dir.out = "result", 
                                     label = "hypo",
                                     min.incidence = 10,
                                     lower.OR = 1.1)
names(enriched.motif) # enriched motifs

##  [1] "ARI3A_HUMAN.H10MO.D" "ARI5B_HUMAN.H10MO.C" "BATF_HUMAN.H10MO.S" 
##  [4] "BCL6_HUMAN.H10MO.C"  "CDX1_HUMAN.H10MO.C"  "CDX2_HUMAN.H10MO.C" 
##  [7] "CEBPA_HUMAN.H10MO.A" "CEBPB_HUMAN.H10MO.A" "CEBPG_HUMAN.H10MO.C"
## [10] "CPEB1_HUMAN.H10MO.D" "DMBX1_HUMAN.H10MO.D" "EVX1_HUMAN.H10MO.D" 
## [13] "EVX2_HUMAN.H10MO.A"  "FOSB_HUMAN.H10MO.C"  "FOSL1_HUMAN.H10MO.A"
## [16] "FOSL2_HUMAN.H10MO.A" "FOS_HUMAN.H10MO.A"   "FOXJ3_HUMAN.H10MO.S"
## [19] "FOXO1_HUMAN.H10MO.C" "FOXQ1_HUMAN.H10MO.C" "GATA5_HUMAN.H10MO.D"
## [22] "GSX2_HUMAN.H10MO.D"  "HBP1_HUMAN.H10MO.D"  "HMGA1_HUMAN.H10MO.D"
## [25] "HMX1_HUMAN.H10MO.D"  "HXA10_HUMAN.H10MO.C" "HXA1_HUMAN.H10MO.C" 
## [28] "HXB13_HUMAN.H10MO.D" "HXB3_HUMAN.H10MO.D"  "HXB7_HUMAN.H10MO.C" 
## [31] "HXC6_HUMAN.H10MO.D"  "HXD11_HUMAN.H10MO.D" "HXD12_HUMAN.H10MO.D"
## [34] "HXD4_HUMAN.H10MO.D"  "HXD8_HUMAN.H10MO.D"  "HXD9_HUMAN.H10MO.D" 
## [37] "IRF1_HUMAN.H10MO.A"  "IRF3_HUMAN.H10MO.C"  "IRF5_HUMAN.H10MO.D" 
## [40] "IRF9_HUMAN.H10MO.C"  "IRX3_HUMAN.H10MO.D"  "JUND_HUMAN.H10MO.A" 
## [43] "JUN_HUMAN.H10MO.A"   "LMX1A_HUMAN.H10MO.D" "MEF2B_HUMAN.H10MO.D"
## [46] "MNX1_HUMAN.H10MO.D"  "NANOG_HUMAN.H10MO.A" "NANOG_HUMAN.H10MO.S"
## [49] "NFAC2_HUMAN.H10MO.B" "NFAT5_HUMAN.H10MO.D" "NKX31_HUMAN.H10MO.C"
## [52] "PBX1_HUMAN.H10MO.B"  "PO2F1_HUMAN.H10MO.B" "PO3F3_HUMAN.H10MO.D"
## [55] "PO3F4_HUMAN.H10MO.D" "PO4F1_HUMAN.H10MO.D" "PO4F3_HUMAN.H10MO.D"
## [58] "PO5F1_HUMAN.H10MO.A" "PO6F1_HUMAN.H10MO.D" "PROP1_HUMAN.H10MO.D"
## [61] "SOX11_HUMAN.H10MO.D" "SOX1_HUMAN.H10MO.D"  "SOX21_HUMAN.H10MO.D"
## [64] "SOX2_HUMAN.H10MO.B"  "SOX3_HUMAN.H10MO.D"  "SOX8_HUMAN.H10MO.D" 
## [67] "SRY_HUMAN.H10MO.B"   "STAT2_HUMAN.H10MO.B" "ZFHX3_HUMAN.H10MO.D"
## [70] "ZSC16_HUMAN.H10MO.D"

head(enriched.motif["P73_HUMAN.H10MO.A"]) ## probes in the given set that have TP53 motif.

## $<NA>
## NULL

# get.enriched.motif automatically save output files. 
# getMotif.hypo.enriched.motifs.rda contains enriched motifs and the probes with the motif. 
# getMotif.hypo.motif.enrichment.csv contains summary of enriched motifs.
dir(path = "result", pattern = "getMotif")

## [1] "getMotif.hypo.enriched.motifs.rda" 
## [2] "getMotif.hypo.motif.enrichment.csv"

motif.enrichment <- read.csv("result/getMotif.hypo.motif.enrichment.csv")
motif.enrichment  %>% datatable(options = list(scrollX = TRUE))

# motif enrichment figure will be automatically generated.
dir(path = "result", pattern = "motif.enrichment.pdf")

## [1] "hypo.all.quality.motif.enrichment.pdf" 
## [2] "hypo.quality.A-DS.motif.enrichment.pdf"

Figure: hypo.all.quality.motif.enrichment.pdf

Motif enrichment plot

Identifying regulatory TFs

This step is to identify regulatory TF whose expression associates with TF binding motif DNA methylation which is carried out by function get.TFs.

For each motif considered to be enriched within a particular probe set, it will compare the average DNA methylation at all distal enhancer probes within \(\pm250bp\)
of a motif occurrence, to the expression of human TFs. A statistical test was performed for each motif-TF pair, as follows. The samples (all groups samples) were divided into two groups: the M group, which consisted of the 20% of samples with the highest average methylation at all motif-adjacent probes, and the U group, which consisted of the 20% of samples with the lowest methylation. For each candidate motif-TF pair, the Mann-Whitney U test was used to test the null hypothesis that overall gene expression in group M was greater or equal than that in group U. All TFs were ranked by the \(-log_{10}(P_{r})\), and those falling within the top 5% of this ranking were considered candidate upstream regulators.

Source: Yao, Lijing, et al. “Inferring regulatory element landscapes and transcription factor networks from cancer methylomes.” Genome biology 16.1 (2015): 105.

Main get.pair arguments

Argument	Description
data	A multiAssayExperiment with DNA methylation and Gene Expression data. See `createMAE` function.
enriched.motif	A list containing output of get.enriched.motif function or a path of XX.rda file containing output of get.enriched.motif function.
group.col	A column defining the groups of the sample. You can view the available columns using: `colnames(MultiAssayExperiment::colData(data))`.
group1	A group from group.col.
group2	A group from group.col.
minSubgroupFrac	A number ranging from 0 to 1 specifying the percentage of samples used to create the groups U (unmethylated) and M (methylated) used to link probes to TF expression. Default is 0.4 (lowest quintile of all samples will be in the U group and the highest quintile of all samples in the M group).

## identify regulatory TF for the enriched motifs
TF <- get.TFs(data = mae, 
              group.col = "definition",
              group1 =  "Primary solid Tumor",
              group2 = "Solid Tissue Normal",
              minSubgroupFrac = 0.4,
              enriched.motif = enriched.motif,
              dir.out = "result", 
              cores = 1, 
              label = "hypo")

TF  %>% datatable(options = list(scrollX = TRUE))

# get.TFs automatically save output files. 
# getTF.hypo.TFs.with.motif.pvalue.rda contains statistics for all TF with average 
# DNA methylation at sites with the enriched motif.
# getTF.hypo.significant.TFs.with.motif.summary.csv contains only the significant probes.
dir(path = "result", pattern = "getTF")

## [1] "getTF.hypo.TFs.with.motif.pvalue.rda"             
## [2] "getTF.hypo.significant.TFs.with.motif.summary.csv"

# TF ranking plot based on statistics will be automatically generated.
dir(path = "result/TFrankPlot_family/", pattern = "pdf")

##  [1] "ARI3A_HUMAN.H10MO.D.TFrankPlot.pdf"
##  [2] "ARI5B_HUMAN.H10MO.C.TFrankPlot.pdf"
##  [3] "BATF_HUMAN.H10MO.S.TFrankPlot.pdf" 
##  [4] "BCL6_HUMAN.H10MO.C.TFrankPlot.pdf" 
##  [5] "CDX1_HUMAN.H10MO.C.TFrankPlot.pdf" 
##  [6] "CDX2_HUMAN.H10MO.C.TFrankPlot.pdf" 
##  [7] "CEBPA_HUMAN.H10MO.A.TFrankPlot.pdf"
##  [8] "CEBPB_HUMAN.H10MO.A.TFrankPlot.pdf"
##  [9] "CEBPG_HUMAN.H10MO.C.TFrankPlot.pdf"
## [10] "CPEB1_HUMAN.H10MO.D.TFrankPlot.pdf"
## [11] "DMBX1_HUMAN.H10MO.D.TFrankPlot.pdf"
## [12] "EVX1_HUMAN.H10MO.D.TFrankPlot.pdf" 
## [13] "EVX2_HUMAN.H10MO.A.TFrankPlot.pdf" 
## [14] "FOSB_HUMAN.H10MO.C.TFrankPlot.pdf" 
## [15] "FOSL1_HUMAN.H10MO.A.TFrankPlot.pdf"
## [16] "FOSL2_HUMAN.H10MO.A.TFrankPlot.pdf"
## [17] "FOS_HUMAN.H10MO.A.TFrankPlot.pdf"  
## [18] "FOXJ3_HUMAN.H10MO.S.TFrankPlot.pdf"
## [19] "FOXO1_HUMAN.H10MO.C.TFrankPlot.pdf"
## [20] "FOXQ1_HUMAN.H10MO.C.TFrankPlot.pdf"
## [21] "GATA5_HUMAN.H10MO.D.TFrankPlot.pdf"
## [22] "GSX2_HUMAN.H10MO.D.TFrankPlot.pdf" 
## [23] "HBP1_HUMAN.H10MO.D.TFrankPlot.pdf" 
## [24] "HMGA1_HUMAN.H10MO.D.TFrankPlot.pdf"
## [25] "HMX1_HUMAN.H10MO.D.TFrankPlot.pdf" 
## [26] "HXA10_HUMAN.H10MO.C.TFrankPlot.pdf"
## [27] "HXA1_HUMAN.H10MO.C.TFrankPlot.pdf" 
## [28] "HXB13_HUMAN.H10MO.D.TFrankPlot.pdf"
## [29] "HXB3_HUMAN.H10MO.D.TFrankPlot.pdf" 
## [30] "HXB7_HUMAN.H10MO.C.TFrankPlot.pdf" 
## [31] "HXC6_HUMAN.H10MO.D.TFrankPlot.pdf" 
## [32] "HXD11_HUMAN.H10MO.D.TFrankPlot.pdf"
## [33] "HXD12_HUMAN.H10MO.D.TFrankPlot.pdf"
## [34] "HXD4_HUMAN.H10MO.D.TFrankPlot.pdf" 
## [35] "HXD8_HUMAN.H10MO.D.TFrankPlot.pdf" 
## [36] "HXD9_HUMAN.H10MO.D.TFrankPlot.pdf" 
## [37] "IRF1_HUMAN.H10MO.A.TFrankPlot.pdf" 
## [38] "IRF3_HUMAN.H10MO.C.TFrankPlot.pdf" 
## [39] "IRF5_HUMAN.H10MO.D.TFrankPlot.pdf" 
## [40] "IRF9_HUMAN.H10MO.C.TFrankPlot.pdf" 
## [41] "IRX3_HUMAN.H10MO.D.TFrankPlot.pdf" 
## [42] "JUND_HUMAN.H10MO.A.TFrankPlot.pdf" 
## [43] "JUN_HUMAN.H10MO.A.TFrankPlot.pdf"  
## [44] "LMX1A_HUMAN.H10MO.D.TFrankPlot.pdf"
## [45] "MEF2B_HUMAN.H10MO.D.TFrankPlot.pdf"
## [46] "MNX1_HUMAN.H10MO.D.TFrankPlot.pdf" 
## [47] "NANOG_HUMAN.H10MO.A.TFrankPlot.pdf"
## [48] "NANOG_HUMAN.H10MO.S.TFrankPlot.pdf"
## [49] "NFAC2_HUMAN.H10MO.B.TFrankPlot.pdf"
## [50] "NFAT5_HUMAN.H10MO.D.TFrankPlot.pdf"
## [51] "NKX31_HUMAN.H10MO.C.TFrankPlot.pdf"
## [52] "PBX1_HUMAN.H10MO.B.TFrankPlot.pdf" 
## [53] "PO2F1_HUMAN.H10MO.B.TFrankPlot.pdf"
## [54] "PO3F3_HUMAN.H10MO.D.TFrankPlot.pdf"
## [55] "PO3F4_HUMAN.H10MO.D.TFrankPlot.pdf"
## [56] "PO4F1_HUMAN.H10MO.D.TFrankPlot.pdf"
## [57] "PO4F3_HUMAN.H10MO.D.TFrankPlot.pdf"
## [58] "PO5F1_HUMAN.H10MO.A.TFrankPlot.pdf"
## [59] "PO6F1_HUMAN.H10MO.D.TFrankPlot.pdf"
## [60] "PROP1_HUMAN.H10MO.D.TFrankPlot.pdf"
## [61] "SOX11_HUMAN.H10MO.D.TFrankPlot.pdf"
## [62] "SOX1_HUMAN.H10MO.D.TFrankPlot.pdf" 
## [63] "SOX21_HUMAN.H10MO.D.TFrankPlot.pdf"
## [64] "SOX2_HUMAN.H10MO.B.TFrankPlot.pdf" 
## [65] "SOX3_HUMAN.H10MO.D.TFrankPlot.pdf" 
## [66] "SOX8_HUMAN.H10MO.D.TFrankPlot.pdf" 
## [67] "SRY_HUMAN.H10MO.B.TFrankPlot.pdf"  
## [68] "STAT2_HUMAN.H10MO.B.TFrankPlot.pdf"
## [69] "ZFHX3_HUMAN.H10MO.D.TFrankPlot.pdf"
## [70] "ZSC16_HUMAN.H10MO.D.TFrankPlot.pdf"

Figure: FOXJ3_HUMAN.H10MO.S.TFrankPlot.pdf

TF ranking plot

Session Info

sessionInfo()

## R version 3.4.1 (2017-06-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
## 
## Matrix products: default
## BLAS: /usr/local/lib/R/lib/libRblas.so
## LAPACK: /usr/local/lib/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] bindrcpp_0.2                          
## [2] MultiAssayExperiment_1.3.20           
## [3] dplyr_0.7.2                           
## [4] DT_0.2                                
## [5] ELMER_2.0.1                           
## [6] ELMER.data_2.0.1                      
## [7] Bioc2017.TCGAbiolinks.ELMER_0.0.0.9000
## 
## loaded via a namespace (and not attached):
##   [1] shinydashboard_0.6.1          R.utils_2.5.0                
##   [3] RSQLite_2.0                   AnnotationDbi_1.39.2         
##   [5] htmlwidgets_0.9               grid_3.4.1                   
##   [7] trimcluster_0.1-2             BiocParallel_1.11.4          
##   [9] devtools_1.13.2               DESeq_1.29.0                 
##  [11] munsell_0.4.3                 codetools_0.2-15             
##  [13] withr_2.0.0                   colorspace_1.3-2             
##  [15] BiocInstaller_1.27.2          Biobase_2.37.2               
##  [17] knitr_1.16                    stats4_3.4.1                 
##  [19] robustbase_0.92-7             labeling_0.3                 
##  [21] GenomeInfoDbData_0.99.1       KMsurv_0.1-5                 
##  [23] mnormt_1.5-5                  hwriter_1.3.2                
##  [25] bit64_0.9-7                   rprojroot_1.2                
##  [27] downloader_0.4                biovizBase_1.25.1            
##  [29] ggthemes_3.4.0                EDASeq_2.11.0                
##  [31] diptest_0.75-7                R6_2.2.2                     
##  [33] doParallel_1.0.10             GenomeInfoDb_1.13.4          
##  [35] locfit_1.5-9.1                AnnotationFilter_1.1.3       
##  [37] flexmix_2.3-14                reshape_0.8.6                
##  [39] bitops_1.0-6                  DelayedArray_0.3.19          
##  [41] assertthat_0.2.0              scales_0.4.1                 
##  [43] nnet_7.3-12                   gtable_0.2.0                 
##  [45] ensembldb_2.1.10              rlang_0.1.1                  
##  [47] genefilter_1.59.0             cmprsk_2.2-7                 
##  [49] GlobalOptions_0.0.12          splines_3.4.1                
##  [51] rtracklayer_1.37.3            lazyeval_0.2.0               
##  [53] acepack_1.4.1                 dichromat_2.0-0              
##  [55] selectr_0.3-1                 broom_0.4.2                  
##  [57] checkmate_1.8.3               yaml_2.1.14                  
##  [59] reshape2_1.4.2                GenomicFeatures_1.29.8       
##  [61] backports_1.1.0               httpuv_1.3.5                 
##  [63] Hmisc_4.0-3                   tools_3.4.1                  
##  [65] psych_1.7.5                   ggplot2_2.2.1                
##  [67] RColorBrewer_1.1-2            BiocGenerics_0.23.0          
##  [69] Rcpp_0.12.12                  plyr_1.8.4                   
##  [71] base64enc_0.1-3               progress_1.1.2               
##  [73] zlibbioc_1.23.0               purrr_0.2.2.2                
##  [75] RCurl_1.95-4.8                prettyunits_1.0.2            
##  [77] ggpubr_0.1.4                  rpart_4.1-11                 
##  [79] GetoptLong_0.1.6              viridis_0.4.0                
##  [81] zoo_1.8-0                     S4Vectors_0.15.5             
##  [83] SummarizedExperiment_1.7.5    ggrepel_0.6.5                
##  [85] cluster_2.0.6                 magrittr_1.5                 
##  [87] data.table_1.10.4             circlize_0.4.1               
##  [89] survminer_0.4.0               mvtnorm_1.0-6                
##  [91] whisker_0.3-2                 ProtGenerics_1.9.0           
##  [93] matrixStats_0.52.2            aroma.light_3.7.0            
##  [95] hms_0.3                       mime_0.5                     
##  [97] evaluate_0.10.1               xtable_1.8-2                 
##  [99] XML_3.98-1.9                  mclust_5.3                   
## [101] IRanges_2.11.12               gridExtra_2.2.1              
## [103] shape_1.4.2                   compiler_3.4.1               
## [105] biomaRt_2.33.3                tibble_1.3.3                 
## [107] R.oo_1.21.0                   htmltools_0.3.6              
## [109] Formula_1.2-2                 tidyr_0.6.3                  
## [111] geneplotter_1.55.0            DBI_0.7                      
## [113] matlab_1.0.2                  ComplexHeatmap_1.15.0        
## [115] MASS_7.3-47                   fpc_2.1-10                   
## [117] BiocStyle_2.5.8               ShortRead_1.35.1             
## [119] Matrix_1.2-10                 readr_1.1.1                  
## [121] R.methodsS3_1.7.1             parallel_3.4.1               
## [123] Gviz_1.21.1                   bindr_0.1                    
## [125] km.ci_0.5-2                   GenomicRanges_1.29.12        
## [127] pkgconfig_2.0.1               GenomicAlignments_1.13.4     
## [129] foreign_0.8-69                plotly_4.7.1                 
## [131] xml2_1.1.1                    roxygen2_6.0.1               
## [133] foreach_1.4.3                 annotate_1.55.0              
## [135] XVector_0.17.0                rvest_0.3.2                  
## [137] stringr_1.2.0                 VariantAnnotation_1.23.6     
## [139] digest_0.6.12                 ConsensusClusterPlus_1.41.0  
## [141] Biostrings_2.45.3             rmarkdown_1.6                
## [143] survMisc_0.5.4                TCGAbiolinks_2.5.6           
## [145] htmlTable_1.9                 dendextend_1.5.2             
## [147] edgeR_3.19.3                  curl_2.8.1                   
## [149] kernlab_0.9-25                shiny_1.0.3                  
## [151] Rsamtools_1.29.0              commonmark_1.2               
## [153] modeltools_0.2-21             rjson_0.2.15                 
## [155] nlme_3.1-131                  jsonlite_1.5                 
## [157] viridisLite_0.2.0             limma_3.33.7                 
## [159] BSgenome_1.45.1               lattice_0.20-35              
## [161] httr_1.2.1                    DEoptimR_1.0-8               
## [163] survival_2.41-3               interactiveDisplayBase_1.15.0
## [165] glue_1.1.1                    prabclus_2.2-6               
## [167] iterators_1.0.8               bit_1.1-12                   
## [169] class_7.3-14                  stringi_1.1.5                
## [171] blob_1.1.0                    AnnotationHub_2.9.5          
## [173] latticeExtra_0.6-28           memoise_1.1.0

Bibliography

Chedraoui Silva, Tiago, Simon G. Coetzee, Lijing Yao, Dennis J. Hazelett, Houtan Noushmehr, and Benjamin P. Berman. 2017. “Enhancer Linking by Methylation/Expression Relationships with the R Package Elmer Version 2.” BioRxiv. Cold Spring Harbor Labs Journals. doi:10.1101/148726.

Kulakovskiy, Ivan V, Ilya E Vorontsov, Ivan S Yevshin, Anastasiia V Soboleva, Artem S Kasianov, Haitham Ashoor, Wail Ba-Alawi, et al. 2016. “HOCOMOCO: Expansion and Enhancement of the Collection of Transcription Factor Binding Sites Models.” Nucleic Acids Research 44 (D1). Oxford Univ Press: D116–D125.

Yao, L, H Shen, PW Laird, PJ Farnham, and BP Berman. 2015. “Inferring Regulatory Element Landscapes and Transcription Factor Networks from Cancer Methylomes.” Genome Biology 16 (1): 105–5.

Integrative analysis workshop with TCGAbiolinks and ELMER - Analysis

Tiago Chedraoui Silva, Simon Coetzee, Dennis Hazelett, Ben Berman, Houtan Noushmehr

2017-08-02

Introduction

Loading required libraries

ELMER data input

Analysis

Identifying differentially methylated probes

Identifying putative probe-gene pairs

Motif enrichment analysis on the selected probes

Figure: hypo.all.quality.motif.enrichment.pdf

Identifying regulatory TFs

Figure: FOXJ3_HUMAN.H10MO.S.TFrankPlot.pdf

Session Info

Bibliography