1 Introduction

Genome-wide assays provide powerful methods to profile the composition, the conformation and the activity of the chromatin. Linear “coverage” tracks (generally stored as .bigwig files) are one of the outputs obtained when processing raw high-throughput sequencing data. These coverage tracks can be inspected in genome interactive browsers (e.g. IGV) to visually appreciate local or global variations in the coverage of specific genomic assays.

The coverage signal aggregated over multiple genomic features can also be computed. This approach is very efficient to summarize and compare the coverage of chromatin modalities (e.g. protein binding profiles from ChIP-seq, transcription profiles from RNA-seq, chromatin accessibility from ATAC-seq, …) over hundreds and up to thousands of genomic features of interest. This unlocks a more quantitative description of the coverage over groups of genomic features.

tidyCoverage implements the CoverageExperiment and the AggregatedCoverage classes built on top of the SummarizedExperiment class. These classes formalize the extraction and aggregation of coverage tracks over sets of genomic features of interests.

2 Installation

tidyCoverage package can be installed from Bioconductor using the following command:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("tidyCoverage")

3 CoverageExperiment and AggregatedCoverage classes

3.1 CoverageExperiment

tidyCoverage package defines the CoverageExperiment, directly extending the SummarizedExperiment class. This means that all standard methods available for SummarizedExperiments are available for CoverageExperiment objects.

library(tidyCoverage)
showClass("CoverageExperiment")
#> Class "CoverageExperiment" [package "tidyCoverage"]
#> 
#> Slots:
#>                                                                 
#> Name:                     rowRanges                      colData
#> Class: GenomicRanges_OR_GRangesList                    DataFrame
#>                                                                 
#> Name:                        assays                        NAMES
#> Class:               Assays_OR_NULL            character_OR_NULL
#>                                                                 
#> Name:               elementMetadata                     metadata
#> Class:                    DataFrame                         list
#> 
#> Extends: 
#> Class "RangedSummarizedExperiment", directly
#> Class "SummarizedExperiment", by class "RangedSummarizedExperiment", distance 2
#> Class "RectangularData", by class "RangedSummarizedExperiment", distance 3
#> Class "Vector", by class "RangedSummarizedExperiment", distance 3
#> Class "Annotated", by class "RangedSummarizedExperiment", distance 4
#> Class "vector_OR_Vector", by class "RangedSummarizedExperiment", distance 4

data(ce)
ce
#> class: CoverageExperiment 
#> dim: 1 2 
#> metadata(0):
#> assays(1): coverage
#> rownames(1): Scc1
#> rowData names(2): features n
#> colnames(2): RNA_fwd RNA_rev
#> colData names(1): track
#> width: 3000

rowData(ce)
#> DataFrame with 1 row and 2 columns
#>         features         n
#>      <character> <integer>
#> Scc1        Scc1       614

rowRanges(ce)
#> GRangesList object of length 1:
#> $Scc1
#> GRanges object with 614 ranges and 2 metadata columns:
#>         seqnames          ranges strand |        name     score
#>            <Rle>       <IRanges>  <Rle> | <character> <numeric>
#>     [1]       II       4290-7289      + |     YBL109W         0
#>     [2]       II       6677-9676      + |     YBL108W         0
#>     [3]       II      7768-10767      + |   YBL107W-A         0
#>     [4]       II     22598-25597      + |     YBL102W         0
#>     [5]       II     26927-29926      + |   YBL100W-C         0
#>     ...      ...             ...    ... .         ...       ...
#>   [610]       IV 1506505-1509504      + |     YDR536W         0
#>   [611]       IV 1509402-1512401      + |     YDR538W         0
#>   [612]       IV 1510594-1513593      + |     YDR539W         0
#>   [613]       IV 1521749-1524748      + |     YDR542W         0
#>   [614]       IV 1524821-1527820      + |     YDR545W         0
#>   -------
#>   seqinfo: 3 sequences from an unspecified genome

colData(ce)
#> DataFrame with 2 rows and 1 column
#>               track
#>         <character>
#> RNA_fwd     RNA_fwd
#> RNA_rev     RNA_rev

assays(ce)
#> List of length 1
#> names(1): coverage

assay(ce, 'coverage')
#>      RNA_fwd        RNA_rev       
#> Scc1 numeric,184200 numeric,184200

Note that whereas traditional SummarizedExperiment objects store atomic values stored in individual cells of an assay, each cell of the CoverageExperiment coverage assay contains a list of length 1, itself containing an array. This array stores the per-base coverage score of a genomic track (from colData) over a set of genomic ranges of interest (from rowData).

assay(ce, 'coverage')
#>      RNA_fwd        RNA_rev       
#> Scc1 numeric,184200 numeric,184200

assay(ce, 'coverage')[1, 1] |> class()
#> [1] "list"

assay(ce, 'coverage')[1, 1] |> length()
#> [1] 1

assay(ce, 'coverage')[1, 1][[1]] |> class()
#> [1] "matrix" "array"

assay(ce, 'coverage')[1, 1][[1]] |> dim()
#> [1] 614 300

# Compare this to `rowData(ce)$n` and `width(ce)`
rowData(ce)$n
#> [1] 614

width(ce)
#> IntegerList of length 1
#> [["Scc1"]] 3000 3000 3000 3000 3000 3000 3000 ... 3000 3000 3000 3000 3000 3000

assay(ce[1, 1], 'coverage')[[1]][1:10, 1:10]
#>             [,1]       [,2]       [,3]       [,4]       [,5]       [,6]
#>  [1,] -0.2573943 -0.2573943 -0.2573943 -0.2573943 -0.2573943 -0.2573943
#>  [2,] -0.2712007 -0.2712007 -0.2712007 -0.2712007 -0.2712007 -0.2712007
#>  [3,] -0.4199078 -0.4199078 -0.4199078 -0.4199078 -0.4199078 -0.4199078
#>  [4,] -0.6413458 -0.6413458 -0.6413458 -0.6413458 -0.6413458 -0.6460800
#>  [5,] -0.5558896 -0.5558896 -0.5558896 -0.5558896 -0.5558896 -0.5558896
#>  [6,]  0.5000157  0.5000157  0.5000157  0.5000157  0.5000157  0.5000157
#>  [7,] -1.1980792 -1.1980792 -1.1980792 -1.1980792 -1.1980792 -1.1980792
#>  [8,]  1.2336717  1.0952403  1.0952403  1.0952403  1.0952403  1.0952403
#>  [9,]  2.0565955  2.0565955  2.0565955  2.0565955  1.9672122  1.9289051
#> [10,] -0.6665461 -0.6665461 -0.6665461 -0.6665461 -0.6665461 -0.6665461
#>             [,7]       [,8]       [,9]      [,10]
#>  [1,] -0.2573943 -0.2573943 -0.2573943 -0.2573943
#>  [2,] -0.2712007 -0.2712007 -0.2712007 -0.2712007
#>  [3,] -0.4199078 -0.4199078 -0.4199078 -0.4199078
#>  [4,] -0.6466060 -0.6466060 -0.6466060 -0.6466060
#>  [5,] -0.5558896 -0.5558896 -0.5558896 -0.5558896
#>  [6,]  0.8920222  1.1533598  1.1533598  1.1533598
#>  [7,] -1.1980792 -1.1980792 -1.1980792 -1.1980792
#>  [8,]  1.0952403  1.1311052  1.4538886  1.4538886
#>  [9,]  1.9289051  1.9289051  1.9289051  1.9289051
#> [10,] -0.6665461 -0.6140548 -0.6082225 -0.6082225

3.2 AggregatedCoverage

AggregatedCoverage also directly extends the SummarizedExperiment class.

showClass("AggregatedCoverage")
#> Class "AggregatedCoverage" [package "tidyCoverage"]
#> 
#> Slots:
#>                                                                 
#> Name:                     rowRanges                      colData
#> Class: GenomicRanges_OR_GRangesList                    DataFrame
#>                                                                 
#> Name:                        assays                        NAMES
#> Class:               Assays_OR_NULL            character_OR_NULL
#>                                                                 
#> Name:               elementMetadata                     metadata
#> Class:                    DataFrame                         list
#> 
#> Extends: 
#> Class "RangedSummarizedExperiment", directly
#> Class "SummarizedExperiment", by class "RangedSummarizedExperiment", distance 2
#> Class "RectangularData", by class "RangedSummarizedExperiment", distance 3
#> Class "Vector", by class "RangedSummarizedExperiment", distance 3
#> Class "Annotated", by class "RangedSummarizedExperiment", distance 4
#> Class "vector_OR_Vector", by class "RangedSummarizedExperiment", distance 4

data(ac)
ac
#> class: AggregatedCoverage 
#> dim: 1 2 
#> metadata(0):
#> assays(8): mean median ... ci_low ci_high
#> rownames(1): Scc1
#> rowData names(1): features
#> colnames(2): RNA_fwd RNA_rev
#> colData names(1): track
#> width: 3000
#> binning: 1

rowData(ac)
#> DataFrame with 1 row and 1 column
#>         features
#>      <character>
#> Scc1        Scc1

rowRanges(ac)
#> GRangesList object of length 1:
#> $Scc1
#> GRanges object with 614 ranges and 2 metadata columns:
#>         seqnames          ranges strand |        name     score
#>            <Rle>       <IRanges>  <Rle> | <character> <numeric>
#>     [1]       II   234992-237991      + |     YBL002W         0
#>     [2]       II   226136-229135      + |     YBL004W         0
#>     [3]       II   215970-218969      + |     YBL005W         0
#>     [4]       II   219830-222829      + |   YBL005W-B         0
#>     [5]       II   215211-218210      + |   YBL006W-A         0
#>     ...      ...             ...    ... .         ...       ...
#>   [610]       IV 1506505-1509504      + |     YDR536W         0
#>   [611]       IV 1509402-1512401      + |     YDR538W         0
#>   [612]       IV 1510594-1513593      + |     YDR539W         0
#>   [613]       IV 1521749-1524748      + |     YDR542W         0
#>   [614]       IV 1524821-1527820      + |     YDR545W         0
#>   -------
#>   seqinfo: 3 sequences from an unspecified genome

colData(ac)
#> DataFrame with 2 rows and 1 column
#>               track
#>         <character>
#> RNA_fwd     RNA_fwd
#> RNA_rev     RNA_rev

assays(ac)
#> List of length 8
#> names(8): mean median min max sd se ci_low ci_high

assay(ac, 'mean')
#>      RNA_fwd      RNA_rev     
#> Scc1 numeric,3000 numeric,3000

It stores per-base coverage statistical metrics in assays (e.g. mean, median, …). Each assay thus contains an matrix of vectors.

assay(ac[1, 1], 'mean')[[1]] |> dim()
#> NULL

assay(ac[1, 1], 'mean')[[1]] |> length()
#> [1] 3000

assay(ac[1, 1], 'mean')[[1]][1:10]
#>  [1] -0.09209840 -0.09153138 -0.08975691 -0.09136185 -0.09145762 -0.09119866
#>  [7] -0.09120939 -0.09151991 -0.09073821 -0.09032538

4 Manipulate CoverageExperiment objects

4.1 Create a CoverageExperiment object

One can use CoverageExperiment() constructor along with:

  • A single bigwig file imported as = "Rle" and a GRanges or a named GRangesList;
  • A named list of bigwig files imported as = "Rle" and a GRanges or a named GRangesList;
  • A BigWigFile object and a GRanges or a named GRangesList;
  • A named BigWigFileList object and a GRanges or a named GRangesList;

A numeric width argument also needs to be specified. It is used to center features to their midpoint and resize them to the chosen width.

For example:

library(rtracklayer)
bw_file <- system.file("extdata", "MNase.bw", package = "tidyCoverage")
bw_file
#> [1] "/tmp/RtmppFf4Dr/Rinst3ef19b4dd3f2ca/tidyCoverage/extdata/MNase.bw"

bed_file <- system.file("extdata", "TSSs.bed", package = "tidyCoverage")
bed_file
#> [1] "/tmp/RtmppFf4Dr/Rinst3ef19b4dd3f2ca/tidyCoverage/extdata/TSSs.bed"

CE <- CoverageExperiment(
    tracks = import(bw_file, as = "Rle"), 
    features = import(bed_file),
    width = 3000
)
CE
#> class: CoverageExperiment 
#> dim: 1 1 
#> metadata(0):
#> assays(1): coverage
#> rownames(1): features
#> rowData names(2): features n
#> colnames(1): track
#> colData names(1): track
#> width: 3000

And this works as well (note that in this case the names of the GRangesList are being used as rownames):

library(rtracklayer)
bw_file <- system.file("extdata", "MNase.bw", package = "tidyCoverage")
bw_file
#> [1] "/tmp/RtmppFf4Dr/Rinst3ef19b4dd3f2ca/tidyCoverage/extdata/MNase.bw"

bed_file <- system.file("extdata", "TSSs.bed", package = "tidyCoverage")
bed_file
#> [1] "/tmp/RtmppFf4Dr/Rinst3ef19b4dd3f2ca/tidyCoverage/extdata/TSSs.bed"

CoverageExperiment(
    tracks = BigWigFile(bw_file), 
    features = GRangesList('TSSs' = import(bed_file)),
    width = 3000
)
#> class: CoverageExperiment 
#> dim: 1 1 
#> metadata(0):
#> assays(1): coverage
#> rownames(1): TSSs
#> rowData names(2): features n
#> colnames(1): track
#> colData names(1): track
#> width: 3000

4.2 Bin a CoverageExperiment object

By default, CoverageExperiment objects store per-base track coverage. This implies that any cell from the coverage assay has as many columns as the width provided in the constructor function.

assay(CE, 'coverage')[1, 1][[1]] |> ncol()
#> [1] 3000

If per-base resolution is not needed, one can use the window argument in the constructor function to average the coverage score over non-overlapping bins.

CE2 <- CoverageExperiment(
    tracks = import(bw_file, as = "Rle"), 
    features = import(bed_file),
    width = 3000, 
    window = 20
)

CE2
#> class: CoverageExperiment 
#> dim: 1 1 
#> metadata(0):
#> assays(1): coverage
#> rownames(1): features
#> rowData names(2): features n
#> colnames(1): track
#> colData names(1): track
#> width: 3000

assay(CE2, 'coverage')[1, 1][[1]] |> ncol()
#> [1] 150

If a CoverageExperiment object has already been computed, the coarsen() function can be used afterwards to reduce the resolution of the object.

CE3 <- coarsen(CE, window = 20)

CE3 
#> class: CoverageExperiment 
#> dim: 1 1 
#> metadata(0):
#> assays(1): coverage
#> rownames(1): features
#> rowData names(2): features n
#> colnames(1): track
#> colData names(1): track
#> width: 3000

identical(CE2, CE3)
#> [1] TRUE

4.3 Expand a CoverageExperiment object

The expand method from the tidyr package is adapted to CoverageExperiment objects to return a tidy tibble. This reformated object contains several columns:

  1. track: storing colnames, i.e. names of tracks used in the original CoverageExperiment;
  2. features: storing rownames, i.e. names of features used in the original CoverageExperiment;
  3. chr: features seqnames from the CoverageExperiment;
  4. ranges: features from the CoverageExperiment coerced as character;
  5. strand: features strand from the CoverageExperiment;
  6. coord: exact genomic position from the CoverageExperiment;
  7. coverage: coverage score extracted from corresponding track at chr:coord;
  8. coord.scaled: 0-centered genomic position;
expand(CE)
#> # A tibble: 5,355,000 × 8
#> # Groups:   track, features, ranges [1,785]
#>    track features chr   ranges         strand coord coverage coord.scaled
#>    <chr> <chr>    <chr> <chr>          <chr>  <dbl>    <dbl>        <dbl>
#>  1 track features II    II:4290-7289:+ +       4290        0        -1500
#>  2 track features II    II:4290-7289:+ +       4291        0        -1499
#>  3 track features II    II:4290-7289:+ +       4292        0        -1498
#>  4 track features II    II:4290-7289:+ +       4293        0        -1497
#>  5 track features II    II:4290-7289:+ +       4294        0        -1496
#>  6 track features II    II:4290-7289:+ +       4295        0        -1495
#>  7 track features II    II:4290-7289:+ +       4296        0        -1494
#>  8 track features II    II:4290-7289:+ +       4297        0        -1493
#>  9 track features II    II:4290-7289:+ +       4298        0        -1492
#> 10 track features II    II:4290-7289:+ +       4299        0        -1491
#> # ℹ 5,354,990 more rows

Note that if the CoverageExperiment object has been coarsened using window = ..., the coord and coord.scaled are handled correspondingly.

expand(CE3)
#> # A tibble: 267,750 × 8
#> # Groups:   track, features, ranges [1,785]
#>    track features chr   ranges         strand coord coverage coord.scaled
#>    <chr> <chr>    <chr> <chr>          <chr>  <dbl>    <dbl>        <dbl>
#>  1 track features II    II:4290-7289:+ +       4290        0        -1500
#>  2 track features II    II:4290-7289:+ +       4310        0        -1480
#>  3 track features II    II:4290-7289:+ +       4330        0        -1460
#>  4 track features II    II:4290-7289:+ +       4350        0        -1440
#>  5 track features II    II:4290-7289:+ +       4370        0        -1420
#>  6 track features II    II:4290-7289:+ +       4390        0        -1400
#>  7 track features II    II:4290-7289:+ +       4410        0        -1380
#>  8 track features II    II:4290-7289:+ +       4430        0        -1360
#>  9 track features II    II:4290-7289:+ +       4450        0        -1340
#> 10 track features II    II:4290-7289:+ +       4470        0        -1320
#> # ℹ 267,740 more rows

4.4 Plot coverage of a set of tracks over a single genomic locus

To illustrate how to visualize coverage tracks from a CoverageExperiment object over a single genomic locus of interest, let’s use sample data provided in the tidyCoverage package.

# ~~~~~~~~~~~~~~~ Import coverage tracks into a named list ~~~~~~~~~~~~~~~ #
tracks <- list(
    Scc1 = system.file("extdata", "Scc1.bw", package = "tidyCoverage"), 
    RNA_fwd = system.file("extdata", "RNA.fwd.bw", package = "tidyCoverage"),
    RNA_rev = system.file("extdata", "RNA.rev.bw", package = "tidyCoverage"),
    PolII = system.file("extdata", "PolII.bw", package = "tidyCoverage"), 
    MNase = system.file("extdata", "MNase.bw", package = "tidyCoverage")
) |> BigWigFileList()

locus <- GRanges("II:450001-475000")

# ~~~~~~~~~~~~~~~ Instantiate a CoverageExperiment object ~~~~~~~~~~~~~~~ #
CE_chrII <- CoverageExperiment(
    tracks = tracks, 
    features = locus,
    width = width(locus)
)

CE_chrII
#> class: CoverageExperiment 
#> dim: 1 5 
#> metadata(0):
#> assays(1): coverage
#> rownames(1): features
#> rowData names(2): features n
#> colnames(5): Scc1 RNA_fwd RNA_rev PolII MNase
#> colData names(1): track
#> width: 25000

From there, it is easy to (optionally) coarsen then expand the CoverageExperiment into a tibble and use ggplot2 for visualization.

library(ggplot2)
CE_chrII |> 
    coarsen(window = 10) |> 
    expand() |> 
    ggplot(aes(x = coord, y = coverage)) + 
        geom_col(aes(fill = track, col = track)) + 
        facet_grid(track~., scales = 'free') + 
        scale_x_continuous(expand = c(0, 0)) + 
        theme_bw() + 
        theme(legend.position = "none", aspect.ratio = 0.1)

In this plot, each facet represents the coverage of a different genomic track over a single region of interest (chrII:450001-475000). Each facet has independent scaling thanks to facet_grid(..., scales = free).

5 Manipulate AggregatedCoverage objects

5.1 Aggregate a CoverageExperiment into an AggregatedCoverage object

It is often useful to aggregate() genomic tracks coverage over a set of genomic features.

AC <- aggregate(CE)

AC
#> class: AggregatedCoverage 
#> dim: 1 1 
#> metadata(0):
#> assays(8): mean median ... ci_low ci_high
#> rownames(1): features
#> rowData names(2): features n
#> colnames(1): track
#> colData names(1): track
#> width: 3000
#> binning: 1

assay(AC, 'mean')[1, 1][[1]] |> length()
#> [1] 3000
AC20 <- aggregate(CE, bin = 20)
AC20
#> class: AggregatedCoverage 
#> dim: 1 1 
#> metadata(0):
#> assays(8): mean median ... ci_low ci_high
#> rownames(1): features
#> rowData names(2): features n
#> colnames(1): track
#> colData names(1): track
#> width: 3000
#> binning: 20

assay(AC20, 'mean')[1, 1][[1]] |> length()
#> [1] 150

The resulting AggregatedCoverage objects can be readily coerced into a tibble.

as_tibble(AC20)
#> # A tibble: 150 × 13
#>    .sample .feature track features coord  mean median   min   max    sd     se
#>    <chr>   <chr>    <chr> <chr>    <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
#>  1 track   features track features -1500  2.99   2.89     0  9.51  1.75 0.0415
#>  2 track   features track features -1480  3.01   2.92     0  9.56  1.76 0.0416
#>  3 track   features track features -1460  3.07   2.96     0 10.4   1.79 0.0425
#>  4 track   features track features -1440  3.13   3.00     0 10.4   1.81 0.0428
#>  5 track   features track features -1420  3.13   2.99     0 10.4   1.81 0.0428
#>  6 track   features track features -1400  3.12   2.98     0 10.1   1.81 0.0428
#>  7 track   features track features -1380  3.06   2.95     0  9.54  1.79 0.0424
#>  8 track   features track features -1360  3.01   2.93     0 10.2   1.79 0.0423
#>  9 track   features track features -1340  3.06   2.98     0 10.6   1.80 0.0426
#> 10 track   features track features -1320  3.03   2.95     0 10.6   1.80 0.0426
#> # ℹ 140 more rows
#> # ℹ 2 more variables: ci_low <dbl>, ci_high <dbl>

Note that the coarsen-then-aggregate or aggregate-by-bin are NOT equivalent. This is due to the certain operations being not commutative with mean (e.g. sd, min/max, …).

# Coarsen `CoverageExperiment` with `window = ...` then per-bin `aggregate`:
CoverageExperiment(
    tracks = import(bw_file, as = "Rle"), features = import(bed_file),
    width = 3000
) |> 
    coarsen(window = 20) |> ## FIRST COARSEN...
    aggregate() |>          ## ... THEN AGGREGATE
    as_tibble()
#> # A tibble: 150 × 13
#>    .sample .feature track features coord  mean median   min   max    sd     se
#>    <chr>   <chr>    <chr> <chr>    <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
#>  1 track   features track features -1500  2.99   2.95     0  9.01  1.65 0.0391
#>  2 track   features track features -1480  3.01   2.97     0  9.52  1.67 0.0396
#>  3 track   features track features -1460  3.07   3.00     0 10.4   1.70 0.0402
#>  4 track   features track features -1440  3.13   3.04     0 10.4   1.72 0.0407
#>  5 track   features track features -1420  3.13   3.01     0 10.4   1.72 0.0407
#>  6 track   features track features -1400  3.12   3.02     0  9.28  1.71 0.0405
#>  7 track   features track features -1380  3.06   3.01     0  9.23  1.70 0.0402
#>  8 track   features track features -1360  3.01   2.94     0  9.68  1.70 0.0401
#>  9 track   features track features -1340  3.06   3.01     0 10.6   1.71 0.0405
#> 10 track   features track features -1320  3.03   2.99     0 10.6   1.71 0.0404
#> # ℹ 140 more rows
#> # ℹ 2 more variables: ci_low <dbl>, ci_high <dbl>

# Per-base `CoverageExperiment` then `aggregate` with `bin = ...`: 
CoverageExperiment(
    tracks = import(bw_file, as = "Rle"), features = import(bed_file),
    width = 3000
) |> 
    aggregate(bin = 20) |>  ## DIRECTLY AGGREGATE BY BIN
    as_tibble()
#> # A tibble: 150 × 13
#>    .sample .feature track features coord  mean median   min   max    sd     se
#>    <chr>   <chr>    <chr> <chr>    <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
#>  1 track   features track features -1500  2.99   2.89     0  9.51  1.75 0.0415
#>  2 track   features track features -1480  3.01   2.92     0  9.56  1.76 0.0416
#>  3 track   features track features -1460  3.07   2.96     0 10.4   1.79 0.0425
#>  4 track   features track features -1440  3.13   3.00     0 10.4   1.81 0.0428
#>  5 track   features track features -1420  3.13   2.99     0 10.4   1.81 0.0428
#>  6 track   features track features -1400  3.12   2.98     0 10.1   1.81 0.0428
#>  7 track   features track features -1380  3.06   2.95     0  9.54  1.79 0.0424
#>  8 track   features track features -1360  3.01   2.93     0 10.2   1.79 0.0423
#>  9 track   features track features -1340  3.06   2.98     0 10.6   1.80 0.0426
#> 10 track   features track features -1320  3.03   2.95     0 10.6   1.80 0.0426
#> # ℹ 140 more rows
#> # ℹ 2 more variables: ci_low <dbl>, ci_high <dbl>

5.2 AggregatedCoverage over multiple tracks / feature sets

As en example for the rest of this vignette, we compute an AggregatedCoverage object using multiple genomic track files and multiple sets of genomic ranges.

library(purrr)
#> 
#> Attaching package: 'purrr'
#> The following object is masked from 'package:GenomicRanges':
#> 
#>     reduce
#> The following object is masked from 'package:IRanges':
#> 
#>     reduce
library(plyranges)
#> 
#> Attaching package: 'plyranges'
#> The following object is masked from 'package:IRanges':
#> 
#>     slice
#> The following object is masked from 'package:stats':
#> 
#>     filter

# ~~~~~~~~~~~~~~~ Import genomic features into a named list ~~~~~~~~~~~~~~~ #
features <- list(
    TSSs = system.file("extdata", "TSSs.bed", package = "tidyCoverage"),
    `Convergent transcription` = system.file("extdata", "conv_transcription_loci.bed", package = "tidyCoverage")
) |> map(import) |> map(filter, strand == '+') 

# ~~~~~~~~~~~~~~~ Import coverage tracks into a named list ~~~~~~~~~~~~~~~ #
tracks <- list(
    Scc1 = system.file("extdata", "Scc1.bw", package = "tidyCoverage"), 
    RNA_fwd = system.file("extdata", "RNA.fwd.bw", package = "tidyCoverage"),
    RNA_rev = system.file("extdata", "RNA.rev.bw", package = "tidyCoverage"),
    PolII = system.file("extdata", "PolII.bw", package = "tidyCoverage"), 
    MNase = system.file("extdata", "MNase.bw", package = "tidyCoverage")
) |> map(import, as = 'Rle')

# ~~~~~~~~~~~~~~~ Compute aggregated coverage ~~~~~~~~~~~~~~~ #
CE <- CoverageExperiment(tracks, features, width = 5000, scale = TRUE, center = TRUE)
CE
#> class: CoverageExperiment 
#> dim: 2 5 
#> metadata(0):
#> assays(1): coverage
#> rownames(2): TSSs Convergent transcription
#> rowData names(2): features n
#> colnames(5): Scc1 RNA_fwd RNA_rev PolII MNase
#> colData names(1): track
#> width: 5000

AC <- aggregate(CE)
AC
#> class: AggregatedCoverage 
#> dim: 2 5 
#> metadata(0):
#> assays(8): mean median ... ci_low ci_high
#> rownames(2): TSSs Convergent transcription
#> rowData names(2): features n
#> colnames(5): Scc1 RNA_fwd RNA_rev PolII MNase
#> colData names(1): track
#> width: 5000
#> binning: 1

5.3 Plot aggregated coverages with ggplot2

Because AggregatedCoverage objects can be easily coerced into tibbles, the full range of ggplot2 functionalities can be exploited to plot aggregated coverage signal of multiple tracks over multiple sets of genomic ranges.

AC |> 
    as_tibble() |> 
    ggplot(aes(x = coord, y = mean, group = interaction(features, track), col = track)) + 
    geom_line()