tidyCoverage 1.0.0
Genome-wide assays provide powerful methods to profile the composition, the
conformation and the activity of the chromatin. Linear “coverage” tracks
(generally stored as .bigwig
files) are one of the outputs obtained
when processing raw high-throughput sequencing data. These coverage tracks
can be inspected in genome interactive browsers (e.g. IGV
) to visually
appreciate local or global variations in the coverage of specific genomic assays.
The coverage signal aggregated over multiple genomic features can also be computed. This approach is very efficient to summarize and compare the coverage of chromatin modalities (e.g. protein binding profiles from ChIP-seq, transcription profiles from RNA-seq, chromatin accessibility from ATAC-seq, …) over hundreds and up to thousands of genomic features of interest. This unlocks a more quantitative description of the coverage over groups of genomic features.
tidyCoverage
implements the CoverageExperiment
and the AggregatedCoverage
classes built on top of the SummarizedExperiment
class. These classes
formalize the extraction and aggregation of coverage tracks over
sets of genomic features of interests.
tidyCoverage
package can be installed from Bioconductor using the following
command:
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("tidyCoverage")
CoverageExperiment
and AggregatedCoverage
classesCoverageExperiment
tidyCoverage
package defines the CoverageExperiment
, directly extending
the SummarizedExperiment
class. This means that all standard methods
available for SummarizedExperiment
s are available for CoverageExperiment
objects.
library(tidyCoverage)
showClass("CoverageExperiment")
#> Class "CoverageExperiment" [package "tidyCoverage"]
#>
#> Slots:
#>
#> Name: rowRanges colData
#> Class: GenomicRanges_OR_GRangesList DataFrame
#>
#> Name: assays NAMES
#> Class: Assays_OR_NULL character_OR_NULL
#>
#> Name: elementMetadata metadata
#> Class: DataFrame list
#>
#> Extends:
#> Class "RangedSummarizedExperiment", directly
#> Class "SummarizedExperiment", by class "RangedSummarizedExperiment", distance 2
#> Class "RectangularData", by class "RangedSummarizedExperiment", distance 3
#> Class "Vector", by class "RangedSummarizedExperiment", distance 3
#> Class "Annotated", by class "RangedSummarizedExperiment", distance 4
#> Class "vector_OR_Vector", by class "RangedSummarizedExperiment", distance 4
data(ce)
ce
#> class: CoverageExperiment
#> dim: 1 2
#> metadata(0):
#> assays(1): coverage
#> rownames(1): Scc1
#> rowData names(2): features n
#> colnames(2): RNA_fwd RNA_rev
#> colData names(1): track
#> width: 3000
rowData(ce)
#> DataFrame with 1 row and 2 columns
#> features n
#> <character> <integer>
#> Scc1 Scc1 614
rowRanges(ce)
#> GRangesList object of length 1:
#> $Scc1
#> GRanges object with 614 ranges and 2 metadata columns:
#> seqnames ranges strand | name score
#> <Rle> <IRanges> <Rle> | <character> <numeric>
#> [1] II 4290-7289 + | YBL109W 0
#> [2] II 6677-9676 + | YBL108W 0
#> [3] II 7768-10767 + | YBL107W-A 0
#> [4] II 22598-25597 + | YBL102W 0
#> [5] II 26927-29926 + | YBL100W-C 0
#> ... ... ... ... . ... ...
#> [610] IV 1506505-1509504 + | YDR536W 0
#> [611] IV 1509402-1512401 + | YDR538W 0
#> [612] IV 1510594-1513593 + | YDR539W 0
#> [613] IV 1521749-1524748 + | YDR542W 0
#> [614] IV 1524821-1527820 + | YDR545W 0
#> -------
#> seqinfo: 3 sequences from an unspecified genome
colData(ce)
#> DataFrame with 2 rows and 1 column
#> track
#> <character>
#> RNA_fwd RNA_fwd
#> RNA_rev RNA_rev
assays(ce)
#> List of length 1
#> names(1): coverage
assay(ce, 'coverage')
#> RNA_fwd RNA_rev
#> Scc1 numeric,184200 numeric,184200
Note that whereas traditional SummarizedExperiment
objects
store atomic values stored in individual cells of an assay, each cell of
the CoverageExperiment
coverage
assay contains a list of length 1,
itself containing an array. This array stores the per-base
coverage score of a genomic track (from colData
) over a set of genomic
ranges of interest (from rowData
).
assay(ce, 'coverage')
#> RNA_fwd RNA_rev
#> Scc1 numeric,184200 numeric,184200
assay(ce, 'coverage')[1, 1] |> class()
#> [1] "list"
assay(ce, 'coverage')[1, 1] |> length()
#> [1] 1
assay(ce, 'coverage')[1, 1][[1]] |> class()
#> [1] "matrix" "array"
assay(ce, 'coverage')[1, 1][[1]] |> dim()
#> [1] 614 300
# Compare this to `rowData(ce)$n` and `width(ce)`
rowData(ce)$n
#> [1] 614
width(ce)
#> IntegerList of length 1
#> [["Scc1"]] 3000 3000 3000 3000 3000 3000 3000 ... 3000 3000 3000 3000 3000 3000
assay(ce[1, 1], 'coverage')[[1]][1:10, 1:10]
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] -0.2573943 -0.2573943 -0.2573943 -0.2573943 -0.2573943 -0.2573943
#> [2,] -0.2712007 -0.2712007 -0.2712007 -0.2712007 -0.2712007 -0.2712007
#> [3,] -0.4199078 -0.4199078 -0.4199078 -0.4199078 -0.4199078 -0.4199078
#> [4,] -0.6413458 -0.6413458 -0.6413458 -0.6413458 -0.6413458 -0.6460800
#> [5,] -0.5558896 -0.5558896 -0.5558896 -0.5558896 -0.5558896 -0.5558896
#> [6,] 0.5000157 0.5000157 0.5000157 0.5000157 0.5000157 0.5000157
#> [7,] -1.1980792 -1.1980792 -1.1980792 -1.1980792 -1.1980792 -1.1980792
#> [8,] 1.2336717 1.0952403 1.0952403 1.0952403 1.0952403 1.0952403
#> [9,] 2.0565955 2.0565955 2.0565955 2.0565955 1.9672122 1.9289051
#> [10,] -0.6665461 -0.6665461 -0.6665461 -0.6665461 -0.6665461 -0.6665461
#> [,7] [,8] [,9] [,10]
#> [1,] -0.2573943 -0.2573943 -0.2573943 -0.2573943
#> [2,] -0.2712007 -0.2712007 -0.2712007 -0.2712007
#> [3,] -0.4199078 -0.4199078 -0.4199078 -0.4199078
#> [4,] -0.6466060 -0.6466060 -0.6466060 -0.6466060
#> [5,] -0.5558896 -0.5558896 -0.5558896 -0.5558896
#> [6,] 0.8920222 1.1533598 1.1533598 1.1533598
#> [7,] -1.1980792 -1.1980792 -1.1980792 -1.1980792
#> [8,] 1.0952403 1.1311052 1.4538886 1.4538886
#> [9,] 1.9289051 1.9289051 1.9289051 1.9289051
#> [10,] -0.6665461 -0.6140548 -0.6082225 -0.6082225
AggregatedCoverage
AggregatedCoverage
also directly extends the SummarizedExperiment
class.
showClass("AggregatedCoverage")
#> Class "AggregatedCoverage" [package "tidyCoverage"]
#>
#> Slots:
#>
#> Name: rowRanges colData
#> Class: GenomicRanges_OR_GRangesList DataFrame
#>
#> Name: assays NAMES
#> Class: Assays_OR_NULL character_OR_NULL
#>
#> Name: elementMetadata metadata
#> Class: DataFrame list
#>
#> Extends:
#> Class "RangedSummarizedExperiment", directly
#> Class "SummarizedExperiment", by class "RangedSummarizedExperiment", distance 2
#> Class "RectangularData", by class "RangedSummarizedExperiment", distance 3
#> Class "Vector", by class "RangedSummarizedExperiment", distance 3
#> Class "Annotated", by class "RangedSummarizedExperiment", distance 4
#> Class "vector_OR_Vector", by class "RangedSummarizedExperiment", distance 4
data(ac)
ac
#> class: AggregatedCoverage
#> dim: 1 2
#> metadata(0):
#> assays(8): mean median ... ci_low ci_high
#> rownames(1): Scc1
#> rowData names(1): features
#> colnames(2): RNA_fwd RNA_rev
#> colData names(1): track
#> width: 3000
#> binning: 1
rowData(ac)
#> DataFrame with 1 row and 1 column
#> features
#> <character>
#> Scc1 Scc1
rowRanges(ac)
#> GRangesList object of length 1:
#> $Scc1
#> GRanges object with 614 ranges and 2 metadata columns:
#> seqnames ranges strand | name score
#> <Rle> <IRanges> <Rle> | <character> <numeric>
#> [1] II 234992-237991 + | YBL002W 0
#> [2] II 226136-229135 + | YBL004W 0
#> [3] II 215970-218969 + | YBL005W 0
#> [4] II 219830-222829 + | YBL005W-B 0
#> [5] II 215211-218210 + | YBL006W-A 0
#> ... ... ... ... . ... ...
#> [610] IV 1506505-1509504 + | YDR536W 0
#> [611] IV 1509402-1512401 + | YDR538W 0
#> [612] IV 1510594-1513593 + | YDR539W 0
#> [613] IV 1521749-1524748 + | YDR542W 0
#> [614] IV 1524821-1527820 + | YDR545W 0
#> -------
#> seqinfo: 3 sequences from an unspecified genome
colData(ac)
#> DataFrame with 2 rows and 1 column
#> track
#> <character>
#> RNA_fwd RNA_fwd
#> RNA_rev RNA_rev
assays(ac)
#> List of length 8
#> names(8): mean median min max sd se ci_low ci_high
assay(ac, 'mean')
#> RNA_fwd RNA_rev
#> Scc1 numeric,3000 numeric,3000
It stores per-base coverage statistical metrics in assays (e.g. mean
, median
, …).
Each assay thus contains an matrix of vectors.
assay(ac[1, 1], 'mean')[[1]] |> dim()
#> NULL
assay(ac[1, 1], 'mean')[[1]] |> length()
#> [1] 3000
assay(ac[1, 1], 'mean')[[1]][1:10]
#> [1] -0.09209840 -0.09153138 -0.08975691 -0.09136185 -0.09145762 -0.09119866
#> [7] -0.09120939 -0.09151991 -0.09073821 -0.09032538
CoverageExperiment
objectsCoverageExperiment
objectOne can use CoverageExperiment()
constructor along with:
bigwig
file imported as = "Rle"
and a GRanges
or a named GRangesList
;bigwig
files imported as = "Rle"
and a GRanges
or a named GRangesList
;BigWigFile
object and a GRanges
or a named GRangesList
;BigWigFileList
object and a GRanges
or a named GRangesList
;A numeric width
argument also needs to be specified. It is used to center
features
to their midpoint and resize them to the chosen width
.
For example:
library(rtracklayer)
bw_file <- system.file("extdata", "MNase.bw", package = "tidyCoverage")
bw_file
#> [1] "/tmp/RtmppFf4Dr/Rinst3ef19b4dd3f2ca/tidyCoverage/extdata/MNase.bw"
bed_file <- system.file("extdata", "TSSs.bed", package = "tidyCoverage")
bed_file
#> [1] "/tmp/RtmppFf4Dr/Rinst3ef19b4dd3f2ca/tidyCoverage/extdata/TSSs.bed"
CE <- CoverageExperiment(
tracks = import(bw_file, as = "Rle"),
features = import(bed_file),
width = 3000
)
CE
#> class: CoverageExperiment
#> dim: 1 1
#> metadata(0):
#> assays(1): coverage
#> rownames(1): features
#> rowData names(2): features n
#> colnames(1): track
#> colData names(1): track
#> width: 3000
And this works as well (note that in this case the names of the GRangesList
are being used as rownames
):
library(rtracklayer)
bw_file <- system.file("extdata", "MNase.bw", package = "tidyCoverage")
bw_file
#> [1] "/tmp/RtmppFf4Dr/Rinst3ef19b4dd3f2ca/tidyCoverage/extdata/MNase.bw"
bed_file <- system.file("extdata", "TSSs.bed", package = "tidyCoverage")
bed_file
#> [1] "/tmp/RtmppFf4Dr/Rinst3ef19b4dd3f2ca/tidyCoverage/extdata/TSSs.bed"
CoverageExperiment(
tracks = BigWigFile(bw_file),
features = GRangesList('TSSs' = import(bed_file)),
width = 3000
)
#> class: CoverageExperiment
#> dim: 1 1
#> metadata(0):
#> assays(1): coverage
#> rownames(1): TSSs
#> rowData names(2): features n
#> colnames(1): track
#> colData names(1): track
#> width: 3000
CoverageExperiment
objectBy default, CoverageExperiment
objects store per-base track coverage.
This implies that any cell from the coverage
assay has as many columns
as the width
provided in the constructor function.
assay(CE, 'coverage')[1, 1][[1]] |> ncol()
#> [1] 3000
If per-base resolution is not needed, one can use the window
argument in
the constructor function to average the coverage score over non-overlapping bins.
CE2 <- CoverageExperiment(
tracks = import(bw_file, as = "Rle"),
features = import(bed_file),
width = 3000,
window = 20
)
CE2
#> class: CoverageExperiment
#> dim: 1 1
#> metadata(0):
#> assays(1): coverage
#> rownames(1): features
#> rowData names(2): features n
#> colnames(1): track
#> colData names(1): track
#> width: 3000
assay(CE2, 'coverage')[1, 1][[1]] |> ncol()
#> [1] 150
If a CoverageExperiment
object has already been computed, the coarsen()
function can be used afterwards to reduce the resolution of the object.
CE3 <- coarsen(CE, window = 20)
CE3
#> class: CoverageExperiment
#> dim: 1 1
#> metadata(0):
#> assays(1): coverage
#> rownames(1): features
#> rowData names(2): features n
#> colnames(1): track
#> colData names(1): track
#> width: 3000
identical(CE2, CE3)
#> [1] TRUE
CoverageExperiment
objectThe expand
method from the tidyr
package is adapted to CoverageExperiment
objects to return a tidy tibble
. This reformated object contains several
columns:
track
: storing colnames
, i.e. names of tracks used in the original CoverageExperiment
;features
: storing rownames
, i.e. names of features used in the original CoverageExperiment
;chr
: features seqnames
from the CoverageExperiment
;ranges
: features from the CoverageExperiment
coerced as character
;strand
: features strand
from the CoverageExperiment
;coord
: exact genomic position from the CoverageExperiment
;coverage
: coverage score extracted from corresponding track
at chr:coord
;coord.scaled
: 0-centered genomic position;expand(CE)
#> # A tibble: 5,355,000 × 8
#> # Groups: track, features, ranges [1,785]
#> track features chr ranges strand coord coverage coord.scaled
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 track features II II:4290-7289:+ + 4290 0 -1500
#> 2 track features II II:4290-7289:+ + 4291 0 -1499
#> 3 track features II II:4290-7289:+ + 4292 0 -1498
#> 4 track features II II:4290-7289:+ + 4293 0 -1497
#> 5 track features II II:4290-7289:+ + 4294 0 -1496
#> 6 track features II II:4290-7289:+ + 4295 0 -1495
#> 7 track features II II:4290-7289:+ + 4296 0 -1494
#> 8 track features II II:4290-7289:+ + 4297 0 -1493
#> 9 track features II II:4290-7289:+ + 4298 0 -1492
#> 10 track features II II:4290-7289:+ + 4299 0 -1491
#> # ℹ 5,354,990 more rows
Note that if the CoverageExperiment
object has been coarsened using window = ...
,
the coord
and coord.scaled
are handled correspondingly.
expand(CE3)
#> # A tibble: 267,750 × 8
#> # Groups: track, features, ranges [1,785]
#> track features chr ranges strand coord coverage coord.scaled
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 track features II II:4290-7289:+ + 4290 0 -1500
#> 2 track features II II:4290-7289:+ + 4310 0 -1480
#> 3 track features II II:4290-7289:+ + 4330 0 -1460
#> 4 track features II II:4290-7289:+ + 4350 0 -1440
#> 5 track features II II:4290-7289:+ + 4370 0 -1420
#> 6 track features II II:4290-7289:+ + 4390 0 -1400
#> 7 track features II II:4290-7289:+ + 4410 0 -1380
#> 8 track features II II:4290-7289:+ + 4430 0 -1360
#> 9 track features II II:4290-7289:+ + 4450 0 -1340
#> 10 track features II II:4290-7289:+ + 4470 0 -1320
#> # ℹ 267,740 more rows
To illustrate how to visualize coverage tracks from a CoverageExperiment
object over a single genomic locus of interest,
let’s use sample data provided in the tidyCoverage
package.
# ~~~~~~~~~~~~~~~ Import coverage tracks into a named list ~~~~~~~~~~~~~~~ #
tracks <- list(
Scc1 = system.file("extdata", "Scc1.bw", package = "tidyCoverage"),
RNA_fwd = system.file("extdata", "RNA.fwd.bw", package = "tidyCoverage"),
RNA_rev = system.file("extdata", "RNA.rev.bw", package = "tidyCoverage"),
PolII = system.file("extdata", "PolII.bw", package = "tidyCoverage"),
MNase = system.file("extdata", "MNase.bw", package = "tidyCoverage")
) |> BigWigFileList()
locus <- GRanges("II:450001-475000")
# ~~~~~~~~~~~~~~~ Instantiate a CoverageExperiment object ~~~~~~~~~~~~~~~ #
CE_chrII <- CoverageExperiment(
tracks = tracks,
features = locus,
width = width(locus)
)
CE_chrII
#> class: CoverageExperiment
#> dim: 1 5
#> metadata(0):
#> assays(1): coverage
#> rownames(1): features
#> rowData names(2): features n
#> colnames(5): Scc1 RNA_fwd RNA_rev PolII MNase
#> colData names(1): track
#> width: 25000
From there, it is easy to (optionally) coarsen
then expand
the
CoverageExperiment
into a tibble
and use ggplot2
for visualization.
library(ggplot2)
CE_chrII |>
coarsen(window = 10) |>
expand() |>
ggplot(aes(x = coord, y = coverage)) +
geom_col(aes(fill = track, col = track)) +
facet_grid(track~., scales = 'free') +
scale_x_continuous(expand = c(0, 0)) +
theme_bw() +
theme(legend.position = "none", aspect.ratio = 0.1)
In this plot, each facet represents the coverage of a different genomic track
over a single region of interest (chrII:450001-475000
). Each facet has
independent scaling thanks to facet_grid(..., scales = free)
.
AggregatedCoverage
objectsCoverageExperiment
into an AggregatedCoverage
objectIt is often useful to aggregate()
genomic tracks
coverage over a set of
genomic features
.
AC <- aggregate(CE)
AC
#> class: AggregatedCoverage
#> dim: 1 1
#> metadata(0):
#> assays(8): mean median ... ci_low ci_high
#> rownames(1): features
#> rowData names(2): features n
#> colnames(1): track
#> colData names(1): track
#> width: 3000
#> binning: 1
assay(AC, 'mean')[1, 1][[1]] |> length()
#> [1] 3000
AC20 <- aggregate(CE, bin = 20)
AC20
#> class: AggregatedCoverage
#> dim: 1 1
#> metadata(0):
#> assays(8): mean median ... ci_low ci_high
#> rownames(1): features
#> rowData names(2): features n
#> colnames(1): track
#> colData names(1): track
#> width: 3000
#> binning: 20
assay(AC20, 'mean')[1, 1][[1]] |> length()
#> [1] 150
The resulting AggregatedCoverage
objects can be readily coerced into a tibble
.
as_tibble(AC20)
#> # A tibble: 150 × 13
#> .sample .feature track features coord mean median min max sd se
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 track features track features -1500 2.99 2.89 0 9.51 1.75 0.0415
#> 2 track features track features -1480 3.01 2.92 0 9.56 1.76 0.0416
#> 3 track features track features -1460 3.07 2.96 0 10.4 1.79 0.0425
#> 4 track features track features -1440 3.13 3.00 0 10.4 1.81 0.0428
#> 5 track features track features -1420 3.13 2.99 0 10.4 1.81 0.0428
#> 6 track features track features -1400 3.12 2.98 0 10.1 1.81 0.0428
#> 7 track features track features -1380 3.06 2.95 0 9.54 1.79 0.0424
#> 8 track features track features -1360 3.01 2.93 0 10.2 1.79 0.0423
#> 9 track features track features -1340 3.06 2.98 0 10.6 1.80 0.0426
#> 10 track features track features -1320 3.03 2.95 0 10.6 1.80 0.0426
#> # ℹ 140 more rows
#> # ℹ 2 more variables: ci_low <dbl>, ci_high <dbl>
Note that the coarsen-then-aggregate
or aggregate-by-bin
are NOT
equivalent. This is due to the certain operations being not commutative with mean
(e.g. sd
, min
/max
, …).
# Coarsen `CoverageExperiment` with `window = ...` then per-bin `aggregate`:
CoverageExperiment(
tracks = import(bw_file, as = "Rle"), features = import(bed_file),
width = 3000
) |>
coarsen(window = 20) |> ## FIRST COARSEN...
aggregate() |> ## ... THEN AGGREGATE
as_tibble()
#> # A tibble: 150 × 13
#> .sample .feature track features coord mean median min max sd se
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 track features track features -1500 2.99 2.95 0 9.01 1.65 0.0391
#> 2 track features track features -1480 3.01 2.97 0 9.52 1.67 0.0396
#> 3 track features track features -1460 3.07 3.00 0 10.4 1.70 0.0402
#> 4 track features track features -1440 3.13 3.04 0 10.4 1.72 0.0407
#> 5 track features track features -1420 3.13 3.01 0 10.4 1.72 0.0407
#> 6 track features track features -1400 3.12 3.02 0 9.28 1.71 0.0405
#> 7 track features track features -1380 3.06 3.01 0 9.23 1.70 0.0402
#> 8 track features track features -1360 3.01 2.94 0 9.68 1.70 0.0401
#> 9 track features track features -1340 3.06 3.01 0 10.6 1.71 0.0405
#> 10 track features track features -1320 3.03 2.99 0 10.6 1.71 0.0404
#> # ℹ 140 more rows
#> # ℹ 2 more variables: ci_low <dbl>, ci_high <dbl>
# Per-base `CoverageExperiment` then `aggregate` with `bin = ...`:
CoverageExperiment(
tracks = import(bw_file, as = "Rle"), features = import(bed_file),
width = 3000
) |>
aggregate(bin = 20) |> ## DIRECTLY AGGREGATE BY BIN
as_tibble()
#> # A tibble: 150 × 13
#> .sample .feature track features coord mean median min max sd se
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 track features track features -1500 2.99 2.89 0 9.51 1.75 0.0415
#> 2 track features track features -1480 3.01 2.92 0 9.56 1.76 0.0416
#> 3 track features track features -1460 3.07 2.96 0 10.4 1.79 0.0425
#> 4 track features track features -1440 3.13 3.00 0 10.4 1.81 0.0428
#> 5 track features track features -1420 3.13 2.99 0 10.4 1.81 0.0428
#> 6 track features track features -1400 3.12 2.98 0 10.1 1.81 0.0428
#> 7 track features track features -1380 3.06 2.95 0 9.54 1.79 0.0424
#> 8 track features track features -1360 3.01 2.93 0 10.2 1.79 0.0423
#> 9 track features track features -1340 3.06 2.98 0 10.6 1.80 0.0426
#> 10 track features track features -1320 3.03 2.95 0 10.6 1.80 0.0426
#> # ℹ 140 more rows
#> # ℹ 2 more variables: ci_low <dbl>, ci_high <dbl>
AggregatedCoverage
over multiple tracks / feature setsAs en example for the rest of this vignette, we compute an AggregatedCoverage
object using multiple genomic track files and multiple sets of genomic ranges.
library(purrr)
#>
#> Attaching package: 'purrr'
#> The following object is masked from 'package:GenomicRanges':
#>
#> reduce
#> The following object is masked from 'package:IRanges':
#>
#> reduce
library(plyranges)
#>
#> Attaching package: 'plyranges'
#> The following object is masked from 'package:IRanges':
#>
#> slice
#> The following object is masked from 'package:stats':
#>
#> filter
# ~~~~~~~~~~~~~~~ Import genomic features into a named list ~~~~~~~~~~~~~~~ #
features <- list(
TSSs = system.file("extdata", "TSSs.bed", package = "tidyCoverage"),
`Convergent transcription` = system.file("extdata", "conv_transcription_loci.bed", package = "tidyCoverage")
) |> map(import) |> map(filter, strand == '+')
# ~~~~~~~~~~~~~~~ Import coverage tracks into a named list ~~~~~~~~~~~~~~~ #
tracks <- list(
Scc1 = system.file("extdata", "Scc1.bw", package = "tidyCoverage"),
RNA_fwd = system.file("extdata", "RNA.fwd.bw", package = "tidyCoverage"),
RNA_rev = system.file("extdata", "RNA.rev.bw", package = "tidyCoverage"),
PolII = system.file("extdata", "PolII.bw", package = "tidyCoverage"),
MNase = system.file("extdata", "MNase.bw", package = "tidyCoverage")
) |> map(import, as = 'Rle')
# ~~~~~~~~~~~~~~~ Compute aggregated coverage ~~~~~~~~~~~~~~~ #
CE <- CoverageExperiment(tracks, features, width = 5000, scale = TRUE, center = TRUE)
CE
#> class: CoverageExperiment
#> dim: 2 5
#> metadata(0):
#> assays(1): coverage
#> rownames(2): TSSs Convergent transcription
#> rowData names(2): features n
#> colnames(5): Scc1 RNA_fwd RNA_rev PolII MNase
#> colData names(1): track
#> width: 5000
AC <- aggregate(CE)
AC
#> class: AggregatedCoverage
#> dim: 2 5
#> metadata(0):
#> assays(8): mean median ... ci_low ci_high
#> rownames(2): TSSs Convergent transcription
#> rowData names(2): features n
#> colnames(5): Scc1 RNA_fwd RNA_rev PolII MNase
#> colData names(1): track
#> width: 5000
#> binning: 1
ggplot2
Because AggregatedCoverage
objects can be easily coerced into tibble
s,
the full range of ggplot2
functionalities can be exploited to plot
aggregated coverage signal of multiple tracks over multiple sets of genomic ranges.
AC |>
as_tibble() |>
ggplot(aes(x = coord, y = mean, group = interaction(features, track), col = track)) +
geom_line()