1 Introduction

The TENET.ExperimentHub package contains six datasets for use in the TENET package’s examples and vignettes: an example MultiAssayExperiment object with matched gene expression and DNA methylation data from a subset of tumor (case) and adjacent normal (control) samples in The Cancer Genome Atlas (TCGA) breast adenocarcinoma (BRCA) cohort, carrying the essential information used by all TENET functions; an example GRanges object produced by the TENET step1MakeExternalDatasets function; a SummarizedExperiment object with example purity data to pass to the TENET step2GetDifferentiallyMethylatedSites function; a data frame with example patient clinical data (matching the data in the example MultiAssayExperiment object); and two additional GRanges objects containing example peak and topologically associating domain (TAD) data, respectively. Where applicable, all datasets are aligned to the hg38 human genome.

2 Acquiring and installing TENET.ExperimentHub

R 4.5 or a newer version is required.

On Ubuntu 22.04, successful installation required several additional packages. They can be installed by running the following command in a terminal:

sudo apt-get install r-base-dev libcurl4-openssl-dev libfreetype6-dev libfribidi-dev libfontconfig1-dev libharfbuzz-dev libtiff5-dev libxml2-dev

No dependencies other than R are required on macOS or Windows.

Two versions of this package are available.

To install the stable version from Bioconductor, start R and run:

## Install BiocManager, which is required to install packages from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install(version = "devel")
BiocManager::install("TENET.ExperimentHub")

The development version containing the most recent updates is available from our GitHub repository (https://github.com/rhielab/TENET.ExperimentHub).

To install the development version from GitHub, start R and run:

## Install the packages needed to install the development version from GitHub
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
}

BiocManager::install(version = "devel")
BiocManager::install("rhielab/TENET.ExperimentHub")

3 Loading TENET.ExperimentHub

To load the TENET.ExperimentHub package, start R and run:

library(TENET.ExperimentHub)

4 Using the included datasets

Wrapper functions are provided to allow easy access to all included datasets. Usage of each wrapper function is demonstrated below.
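
The same records can also be reached through the ExperimentHub interface directly, which is what the `object[["EH9587"]]` hint printed in the metadata output refers to. The following is a minimal sketch, assuming the ExperimentHub package is installed and that a network connection (or a populated local cache) is available; the variable names are illustrative only.

```r
## Attaching ExperimentHub also attaches AnnotationHub, which provides query()
library(ExperimentHub)

## Connect to the hub (downloads or reuses the cached hub index)
hub <- ExperimentHub()

## List the records contributed by this package
tenetRecords <- query(hub, "TENET.ExperimentHub")
tenetRecords

## Retrieve one record by its ExperimentHub ID; this ID is the one reported
## for exampleTENETMultiAssayExperiment in this vignette
exampleMAE <- tenetRecords[["EH9587"]]
```

The wrapper functions remain the more convenient route, since they do not require remembering the ExperimentHub IDs.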

5 Included datasets

5.1 exampleTENETMultiAssayExperiment

A MultiAssayExperiment dataset created using a modified version of the TCGADownloader function from the TENET package utilizing TCGAbiolinks package functionality. This object contains two SummarizedExperiment objects, expression and methylation, with expression data for 11,637 genes annotated to the GENCODE v36 dataset, including all 1,637 identified human TF genes, and DNA methylation data for 20,000 probes from the Illumina HM450 methylation array. The data are aligned to the human hg38 genome. Expression and methylation values were matched from 200 tumor and 42 adjacent normal tissue samples subset from the TCGA BRCA dataset. Additionally, results from running the TENET step 1-6 functions on these samples are included in the metadata of this MultiAssayExperiment object. Clinical data for these samples are included in the colData of the MultiAssayExperiment object. (A separate data frame object containing a subset of the clinical data for these samples is available as exampleTENETClinicalDataFrame.) This dataset is included to demonstrate TENET functions. Note: Because this dataset is a small subset of the overall BRCA dataset, results generated by TENET from this dataset differ from those presented for the BRCA dataset at large in TENET publications.

## Retrieve the ExperimentHub metadata for the object
exampleTENETMultiAssayExperiment(metadata = TRUE)
#> ExperimentHub with 1 record
#> # snapshotDate(): 2025-03-17
#> # names(): EH9587
#> # package(): TENET.ExperimentHub
#> # $dataprovider: TCGA
#> # $species: Homo sapiens
#> # $rdataclass: MultiAssayExperiment
#> # $rdatadateadded: 2024-09-13
#> # $title: exampleTENETMultiAssayExperiment
#> # $description: A MultiAssayExperiment dataset created using a modified vers...
#> # $taxonomyid: 9606
#> # $genome: hg38
#> # $sourcetype: Multiple
#> # $sourceurl: https://bioconductor.org/packages/release/bioc/html/TCGAbiolin...
#> # $sourcesize: NA
#> # $tags: c("CancerData", "clinical", "CopyNumberVariationData",
#> #   "DNAMethylationData", "ExpressionData", "Homo_sapiens_Data",
#> #   "Survival", "TCGA", "TENET") 
#> # retrieve record with 'object[["EH9587"]]'

## Retrieve the object itself
exampleTENETMultiAssayExperiment()
#> see ?TENET.ExperimentHub and browseVignettes('TENET.ExperimentHub') for documentation
#> loading from cache
#> require("MultiAssayExperiment")
#> A MultiAssayExperiment object of 2 listed
#>  experiments with user-defined names and respective classes.
#>  Containing an ExperimentList class object of length 2:
#>  [1] expression: RangedSummarizedExperiment with 11637 rows and 242 columns
#>  [2] methylation: RangedSummarizedExperiment with 20000 rows and 242 columns
#> Functionality:
#>  experiments() - obtain the ExperimentList instance
#>  colData() - the primary/phenotype DataFrame
#>  sampleMap() - the sample coordination DataFrame
#>  `$`, `[`, `[[` - extract colData columns, subset, or experiment
#>  *Format() - convert into a long or wide DataFrame
#>  assays() - convert ExperimentList to a SimpleList of matrices
#>  exportClass() - save data to flat files
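
Once retrieved, the components described above can be pulled out with the standard MultiAssayExperiment accessors. This is a minimal sketch rather than part of the TENET workflow; the variable names are illustrative, and it assumes the experiment names `expression` and `methylation` shown in the printed output.

```r
library(TENET.ExperimentHub)
library(MultiAssayExperiment)

exampleMAE <- exampleTENETMultiAssayExperiment()

## Names of the contained experiments
names(experiments(exampleMAE))

## Extract the expression experiment and look at its first assay
expressionSE <- exampleMAE[["expression"]]
dim(expressionSE)
assay(expressionSE)[seq_len(3), seq_len(3)]

## Clinical data are stored in the colData of the MultiAssayExperiment
head(colnames(colData(exampleMAE)))

## Results from the TENET step functions are stored in the metadata
names(metadata(exampleMAE))
```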

5.2 exampleTENETClinicalDataFrame

A data frame containing example and simulated clinical information corresponding to the samples in the exampleTENETMultiAssayExperiment object, used to demonstrate how TENET functions can import clinical data from a specified data frame. Clinical data are utilized by the step2GetDifferentiallyMethylatedSites, step7TopGenesSurvival, and step7ExpressionVsDNAMethylationScatterplots functions. The data frame consists of vital status and time variables for use by the step7TopGenesSurvival function, simulated purity data for each sample, and simulated copy number variation (CNV) and somatic mutation (SM) data for the top 10 genes by number of linked hypermethylated and hypomethylated probes derived from analyses done using the exampleTENETMultiAssayExperiment object. These data are a subset of the clinical data contained in the colData of the exampleTENETMultiAssayExperiment object.

## Retrieve the ExperimentHub metadata for the object
exampleTENETClinicalDataFrame(metadata = TRUE)
#> ExperimentHub with 1 record
#> # snapshotDate(): 2025-03-17
#> # names(): EH9588
#> # package(): TENET.ExperimentHub
#> # $dataprovider: Multiple
#> # $species: Homo sapiens
#> # $rdataclass: data.frame
#> # $rdatadateadded: 2024-09-13
#> # $title: exampleTENETClinicalDataFrame
#> # $description: A data frame containing example and simulated clinical infor...
#> # $taxonomyid: 9606
#> # $genome: NA
#> # $sourcetype: Multiple
#> # $sourceurl: https://bioconductor.org/packages/release/bioc/html/TCGAbiolin...
#> # $sourcesize: NA
#> # $tags: c("CancerData", "clinical", "CopyNumberVariationData",
#> #   "ExpressionData", "Homo_sapiens_Data", "Survival", "TCGA", "TENET") 
#> # retrieve record with 'object[["EH9588"]]'

## Retrieve the object itself
exampleTENETClinicalDataFrame()
#> see ?TENET.ExperimentHub and browseVignettes('TENET.ExperimentHub') for documentation
#> loading from cache
#>      vital_status time purity ENSG00000165821_CNV
#>  [ reached 'max' / getOption("max.print") -- omitted 39 columns ]
#>  [ reached 'max' / getOption("max.print") -- omitted 231 rows ]
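
Because the data frame is wide, printing it in full runs into the `max.print` limit, as seen above. The sketch below shows one way to inspect such a data frame more compactly. `previewDataFrame` is a hypothetical helper written for this illustration (it is not part of TENET or TENET.ExperimentHub), and the toy data frame uses invented values that only borrow three of the real column names.

```r
## Hypothetical helper: summarize a wide data frame without printing all of it
previewDataFrame <- function(df, nRows = 5, nCols = 4) {
    stopifnot(is.data.frame(df))
    rowsToShow <- seq_len(min(nRows, nrow(df)))
    colsToShow <- seq_len(min(nCols, ncol(df)))
    list(
        dimensions = dim(df),
        columnClasses = vapply(
            df, function(column) class(column)[1], character(1)
        ),
        preview = df[rowsToShow, colsToShow, drop = FALSE]
    )
}

## Toy stand-in with invented values, used only to show the helper's output
toyClinical <- data.frame(
    vital_status = c("alive", "dead", "alive"),
    time = c(100, 250, 75),
    purity = c(0.8, 0.6, 0.9)
)
toyPreview <- previewDataFrame(toyClinical, nRows = 2, nCols = 2)
toyPreview

## With the package installed, the same helper can be applied to the real data:
## previewDataFrame(exampleTENETClinicalDataFrame())
```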

5.3 exampleTENETStep1MakeExternalDatasetsGRanges

A GenomicRanges dataset representing putative enhancer regions relevant to BRCA, created using the step1MakeExternalDatasets function in the TENET package with the consensusEnhancer, consensusNDR, publicEnhancer, publicNDR, and ENCODEdELS arguments all set to TRUE, and the cancerType argument set to “BRCA”. The data are aligned to the human hg38 genome. This dataset is included to demonstrate TENET’s step2GetDifferentiallyMethylatedSites function.

## Retrieve the ExperimentHub metadata for the object
exampleTENETStep1MakeExternalDatasetsGRanges(metadata = TRUE)
#> ExperimentHub with 1 record
#> # snapshotDate(): 2025-03-17
#> # names(): EH9589
#> # package(): TENET.ExperimentHub
#> # $dataprovider: Multiple
#> # $species: Homo sapiens
#> # $rdataclass: GRanges
#> # $rdatadateadded: 2024-09-13
#> # $title: exampleTENETStep1MakeExternalDatasetsGRanges
#> # $description: A GenomicRanges dataset representing putative enhancer regio...
#> # $taxonomyid: 9606
#> # $genome: hg38
#> # $sourcetype: Multiple
#> # $sourceurl: https://github.com/rhielab/TENET.AnnotationHub_files
#> # $sourcesize: NA
#> # $tags: c("CancerData", "ChipSeq", "CopyNumberVariationData",
#> #   "DnaseSeq", "ENCODE", "EpigenomeRoadMap", "ExpressionData",
#> #   "FANTOM5", "GEO", "H3K27ac", "Homo_sapiens_Data", "peaks", "TCGA",
#> #   "TENET") 
#> # retrieve record with 'object[["EH9589"]]'

## Retrieve the object itself
exampleTENETStep1MakeExternalDatasetsGRanges()
#> see ?TENET.ExperimentHub and browseVignettes('TENET.ExperimentHub') for documentation
#> loading from cache
#> GRanges object with 1971031 ranges and 0 metadata columns:
#>             seqnames        ranges strand
#>                <Rle>     <IRanges>  <Rle>
#>         [1]     chr1   10121-10270      *
#>         [2]     chr1   10389-10400      *
#>         [3]     chr1   16141-16290      *
#>         [4]     chr1   20061-20210      *
#>         [5]     chr1 135126-135275      *
#>         ...      ...           ...    ...
#>   [1971027]     chrM     8917-9607      *
#>   [1971028]     chrM     9665-9974      *
#>   [1971029]     chrM   10079-10766      *
#>   [1971030]     chrM   11143-12241      *
#>   [1971031]     chrM   12302-16539      *
#>   -------
#>   seqinfo: 25 sequences from an unspecified genome; no seqlengths
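
With nearly two million ranges, a few summary statistics are often more informative than the printed head and tail. The following sketch uses standard GenomicRanges functions; it is an illustration only and not a step of the TENET workflow, and the variable names are arbitrary.

```r
library(TENET.ExperimentHub)
library(GenomicRanges)

enhancerRegions <- exampleTENETStep1MakeExternalDatasetsGRanges()

## Distribution of region widths in base pairs
summary(width(enhancerRegions))

## Number of regions per chromosome, largest first
regionsPerChromosome <- sort(
    table(as.character(seqnames(enhancerRegions))),
    decreasing = TRUE
)
head(regionsPerChromosome)

## Merge overlapping or adjacent regions to gauge how much they overlap
mergedRegions <- reduce(enhancerRegions)
length(mergedRegions)
```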

5.4 exampleTENETStep2GetDifferentiallyMethylatedSitesPuritySummarizedExperiment

A SummarizedExperiment object with two DNA methylation datasets, each composed of 10 adjacent normal colorectal adenocarcinoma (COAD) samples from The Cancer Genome Atlas (TCGA), retrieved using the TCGAbiolinks package. Each dataset has data for 20,000 probes from the Illumina HM450 methylation array, to match the number of probes in the exampleTENETMultiAssayExperiment object. The data are aligned to the human hg38 genome. This object is representative of a purity dataset, which would contain DNA methylation data from potentially confounding sources, used with TENET’s step2GetDifferentiallyMethylatedSites function.

## Retrieve the ExperimentHub metadata for the object
exampleTENETStep2GetDifferentiallyMethylatedSitesPuritySummarizedExperiment(
    metadata = TRUE
)
#> ExperimentHub with 1 record
#> # snapshotDate(): 2025-03-17
#> # names(): EH9590
#> # package(): TENET.ExperimentHub
#> # $dataprovider: TCGA
#> # $species: Homo sapiens
#> # $rdataclass: SummarizedExperiment
#> # $rdatadateadded: 2024-09-13
#> # $title: exampleTENETStep2GetDifferentiallyMethylatedSitesPuritySummarizedE...
#> # $description: A SummarizedExperiment object with two DNA methylation dat...
#> # $taxonomyid: 9606
#> # $genome: hg38
#> # $sourcetype: Multiple
#> # $sourceurl: https://bioconductor.org/packages/release/bioc/html/TCGAbiolin...
#> # $sourcesize: NA
#> # $tags: c("CancerData", "CopyNumberVariationData",
#> #   "DNAMethylationData", "ExpressionData", "Homo_sapiens_Data", "TCGA",
#> #   "TENET") 
#> # retrieve record with 'object[["EH9590"]]'

## Retrieve the object itself
exampleTENETStep2GetDifferentiallyMethylatedSitesPuritySummarizedExperiment()
#> see ?TENET.ExperimentHub and browseVignettes('TENET.ExperimentHub') for documentation
#> loading from cache
#> class: RangedSummarizedExperiment 
#> dim: 20000 10 
#> metadata(0):
#> assays(2): purityMethylationExampleA purityMethylationExampleB
#> rownames(20000): cg00002190 cg00002809 ... rs4331560 rs6982811
#> rowData names(52): address_A address_B ... MASK_extBase MASK_general
#> colnames: NULL
#> colData names(0):
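
Each purity dataset is stored as a separate assay of the object. The sketch below extracts one of them by the name shown in the printed output and summarizes it with standard SummarizedExperiment accessors; the variable names are illustrative, and the per-probe mean is just an example computation, not something TENET requires.

```r
library(TENET.ExperimentHub)
library(SummarizedExperiment)

puritySE <-
    exampleTENETStep2GetDifferentiallyMethylatedSitesPuritySummarizedExperiment()

## Names of the purity datasets stored in the object
assayNames(puritySE)

## Extract one dataset as a probes-by-samples matrix
purityA <- assay(puritySE, "purityMethylationExampleA")
dim(purityA)

## Per-probe mean methylation across the samples, ignoring missing values
probeMeans <- rowMeans(purityA, na.rm = TRUE)
summary(probeMeans)

## Probe annotation columns are available through rowData
head(names(rowData(puritySE)))
```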

5.5 exampleTENETPeakRegions

A GenomicRanges dataset with example genomic regions (peaks) of interest, used to demonstrate TENET’s step7TopGenesUserPeakOverlap function. The peaks are derived from a ChIP-seq experiment on FOXA1 in MCF-7 cells and aligned to the human hg38 genome. They were downloaded from the ENCODE portal (file ENCFF112JVK in experiment ENCSR126YEB). Citation: ENCODE Project Consortium; Moore JE, Purcaro MJ, Pratt HE, et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature. 2020 Jul;583(7818):699-710. doi: 10.1038/s41586-020-2493-4. Epub 2020 Jul 29. Erratum in: Nature. 2022 May;605(7909):E3. PMID: 32728249; PMCID: PMC7410828.

## Retrieve the ExperimentHub metadata for the object
exampleTENETPeakRegions(metadata = TRUE)
#> ExperimentHub with 1 record
#> # snapshotDate(): 2025-03-17
#> # names(): EH9591
#> # package(): TENET.ExperimentHub
#> # $dataprovider: ENCODE
#> # $species: Homo sapiens
#> # $rdataclass: GRanges
#> # $rdatadateadded: 2024-09-13
#> # $title: exampleTENETPeakRegions
#> # $description: A GenomicRanges dataset with example genomic regions (peaks)...
#> # $taxonomyid: 9606
#> # $genome: hg38
#> # $sourcetype: BED
#> # $sourceurl: https://www.encodeproject.org/files/ENCFF112JVK/@@download/ENC...
#> # $sourcesize: NA
#> # $tags: c("CancerData", "ChIPSeqData", "CopyNumberVariationData",
#> #   "ENCODE", "ExpressionData", "FOXA1", "Homo_sapiens_Data", "peaks",
#> #   "TENET") 
#> # retrieve record with 'object[["EH9591"]]'

## Retrieve the object itself
exampleTENETPeakRegions()
#> see ?TENET.ExperimentHub and browseVignettes('TENET.ExperimentHub') for documentation
#> loading from cache
#> GRanges object with 37386 ranges and 0 metadata columns:
#>           seqnames              ranges strand
#>              <Rle>           <IRanges>  <Rle>
#>       [1]    chr20   41650340-41650989      *
#>       [2]     chr1 147612278-147612917      *
#>       [3]    chr20   48812030-48812609      *
#>       [4]    chr15   69594337-69595180      *
#>       [5]     chr8 101607580-101608404      *
#>       ...      ...                 ...    ...
#>   [37382]    chr14 101762956-101763345      *
#>   [37383]    chr16   15875210-15875599      *
#>   [37384]     chr5 179821648-179822037      *
#>   [37385]    chr12   84236865-84237254      *
#>   [37386]    chr17   64985135-64985570      *
#>   -------
#>   seqinfo: 23 sequences from an unspecified genome; no seqlengths

5.6 exampleTENETTADRegions

A GenomicRanges dataset with example topologically associating domains (TADs), used to demonstrate TENET’s step7TopGenesTADTables function. The TADs are derived from T47D cells (mistakenly labeled as ‘T470’) and are aligned to the human hg38 genome. They were downloaded from the 3D Genome Browser at http://3dgenome.fsm.northwestern.edu. Citation: Wang Y, Song F, Zhang B, et al. The 3D Genome Browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. Genome Biol. 2018 Oct 4;19(1):151. doi: 10.1186/s13059-018-1519-9. PMID: 30286773; PMCID: PMC6172833.

## Retrieve the ExperimentHub metadata for the object
exampleTENETTADRegions(metadata = TRUE)
#> ExperimentHub with 1 record
#> # snapshotDate(): 2025-03-17
#> # names(): EH9592
#> # package(): TENET.ExperimentHub
#> # $dataprovider: 3D Genome Browser
#> # $species: Homo sapiens
#> # $rdataclass: GRanges
#> # $rdatadateadded: 2024-09-13
#> # $title: exampleTENETTADRegions
#> # $description: A GenomicRanges dataset with example topologically associati...
#> # $taxonomyid: 9606
#> # $genome: hg38
#> # $sourcetype: BED
#> # $sourceurl: http://3dgenome.fsm.northwestern.edu/downloads/hg38.TADs.zip
#> # $sourcesize: NA
#> # $tags: c("CancerData", "CopyNumberVariationData", "ExpressionData",
#> #   "Homo_sapiens_Data", "TAD", "TENET") 
#> # retrieve record with 'object[["EH9592"]]'

## Retrieve the object itself
exampleTENETTADRegions()
#> see ?TENET.ExperimentHub and browseVignettes('TENET.ExperimentHub') for documentation
#> loading from cache
#> GRanges object with 1889 ranges and 0 metadata columns:
#>        seqnames              ranges strand
#>           <Rle>           <IRanges>  <Rle>
#>      1     chr1      800001-3680000      *
#>      2     chr1     3800001-6000000      *
#>      3     chr1     6520001-7640000      *
#>      4     chr1     7960001-8920000      *
#>      5     chr1     9240001-9600000      *
#>    ...      ...                 ...    ...
#>   1886     chrX 148000001-149520000      *
#>   1887     chrX 149840001-150720000      *
#>   1888     chrX 151080001-152920000      *
#>   1889     chrX 152960001-153520000      *
#>   1890     chrX 154560001-156040895      *
#>   -------
#>   seqinfo: 23 sequences from an unspecified genome; no seqlengths
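
Peak and TAD regions are both plain GRanges objects, so they can be combined with the usual overlap functions from GenomicRanges. The sketch below counts how many peaks fall in each TAD. `countPeaksPerRegion` is a hypothetical helper written for this illustration and is not how the TENET step 7 functions are implemented; the toy coordinates are invented so that the result can be checked by eye.

```r
library(GenomicRanges)

## Hypothetical helper: annotate each region with the number of peaks that
## overlap it
countPeaksPerRegion <- function(peaks, regions) {
    stopifnot(is(peaks, "GRanges"), is(regions, "GRanges"))
    regions$peakCount <- countOverlaps(regions, peaks)
    regions
}

## Toy TADs and peaks with invented coordinates
toyTADs <- GRanges(
    seqnames = c("chr1", "chr1", "chr2"),
    ranges = IRanges(start = c(1, 1001, 1), end = c(1000, 2000, 1000))
)
toyPeaks <- GRanges(
    seqnames = c("chr1", "chr1", "chr2"),
    ranges = IRanges(start = c(100, 950, 500), end = c(200, 1050, 600))
)

annotatedTADs <- countPeaksPerRegion(toyPeaks, toyTADs)
annotatedTADs$peakCount

## With the package installed, the same helper can be applied to the real data:
## countPeaksPerRegion(exampleTENETPeakRegions(), exampleTENETTADRegions())
```

In the toy data the second peak spans the boundary between the two chr1 TADs, so it is counted once for each of them.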

6 Session info

sessionInfo()
#> R Under development (unstable) (2025-03-13 r87965)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] MultiAssayExperiment_1.33.9 SummarizedExperiment_1.37.0
#>  [3] Biobase_2.67.0              GenomicRanges_1.59.1       
#>  [5] GenomeInfoDb_1.43.4         IRanges_2.41.3             
#>  [7] S4Vectors_0.45.4            BiocGenerics_0.53.6        
#>  [9] generics_0.1.3              MatrixGenerics_1.19.1      
#> [11] matrixStats_1.5.0           TENET.ExperimentHub_0.99.3 
#> [13] BiocStyle_2.35.0           
#> 
#> loaded via a namespace (and not attached):
#>  [1] KEGGREST_1.47.0         xfun_0.51               bslib_0.9.0            
#>  [4] lattice_0.22-6          vctrs_0.6.5             tools_4.6.0            
#>  [7] curl_6.2.1              tibble_3.2.1            AnnotationDbi_1.69.0   
#> [10] RSQLite_2.3.9           blob_1.2.4              pkgconfig_2.0.3        
#> [13] BiocBaseUtils_1.9.0     Matrix_1.7-3            dbplyr_2.5.0           
#> [16] lifecycle_1.0.4         GenomeInfoDbData_1.2.14 compiler_4.6.0         
#> [19] Biostrings_2.75.4       htmltools_0.5.8.1       sass_0.4.9             
#> [22] yaml_2.3.10             pillar_1.10.1           crayon_1.5.3           
#> [25] jquerylib_0.1.4         DelayedArray_0.33.6     cachem_1.1.0           
#> [28] abind_1.4-8             mime_0.12               ExperimentHub_2.15.0   
#> [31] AnnotationHub_3.15.0    tidyselect_1.2.1        digest_0.6.37          
#> [34] purrr_1.0.4             dplyr_1.1.4             bookdown_0.42          
#> [37] BiocVersion_3.21.1      fastmap_1.2.0           grid_4.6.0             
#> [40] cli_3.6.4               SparseArray_1.7.6       magrittr_2.0.3         
#> [43] S4Arrays_1.7.3          withr_3.0.2             filelock_1.0.3         
#> [46] UCSC.utils_1.3.1        rappdirs_0.3.3          bit64_4.6.0-1          
#> [49] rmarkdown_2.29          XVector_0.47.2          httr_1.4.7             
#> [52] bit_4.6.0               png_0.1-8               memoise_2.0.1          
#> [55] evaluate_1.0.3          knitr_1.50              BiocFileCache_2.15.1   
#> [58] rlang_1.1.5             glue_1.8.0              DBI_1.2.3              
#> [61] BiocManager_1.30.25     jsonlite_1.9.1          R6_2.6.1