The package includes a dataset from a study by Noguera-Julian, M., et al. 2016, which investigated the differential abundance of microbial species between men who have sex with men (MSM) and non-MSM (hts). This data is stored as an object of the phyloseq
class, a standard input format for creating recipes with dar in conjunction with TreeSummarizedExperiment
. To begin the analysis, we first load and inspect the data:
First, we will create a recipe object from the original data and then specify the processing and differential analysis steps.
Recipes can be created manually by sequentially adding roles to variables in a data set.
The easiest way to create the initial recipe is:
rec_obj <- recipe(metaHIV_phy, var_info = "RiskGroup2", tax_info = "Species")
rec_obj
#> ── DAR Recipe ──────────────────────────────────────────────────────────────────
#> Inputs:
#>
#> ℹ phyloseq object with 451 taxa and 156 samples
#> ℹ variable of interes RiskGroup2 (class: character, levels: hts, msm, pwid)
#> ℹ taxonomic level Species
The var_info
argument corresponds to the variable to be considered in the modeling process and tax_info
indicates the taxonomic level that will be used for the analyses.
From here, preprocessing steps for some step X can be added sequentially in one of two ways:
Note that all step_ancom
and the other functions will always return updated recipes.
We have two types of steps, those in charge of processing (prepro) and those destined to define the methods of differential analysis (da).
The prepro steps are used to modify the data loaded into the recipe which will then be used for the da steps. The dar
package include 3 main preprocessing functionalities.
step_subset_taxa
: Is used for subsetting the columns and values within the tax_table.
step_filter_taxa
: Is used for filtering OTUs from recipe objects.
step_rarefaction
: Is used to resample an OTU table such that all samples have the same library size.
Additionally, the dar
package provides convenient wrappers for the step_filter_taxa
function, designed to filter Operational Taxonomic Units (OTUs) based on specific criteria: prevalence, variance, abundance, and rarity.
step_filter_by_prevalence
: Filters OTUs according to the number of samples in which the OTU appears.step_filter_by_variance
: Filters OTUs based on the variance of the OTU’s presence across samples.step_filter_by_abundance
: Filters OTUs according to the OTU’s abundance across samples.step_filter_by_rarity
: Filters OTUs based on the rarity of the OTU across samples.For our data, we can add an operation to preprocessing the data stored in the initial recpie. First, we will use step_subset_taxa
to retain only Bacteria and Archaea OTUs from the Kingdom taxonomic level. We will then filter out OTUs where at least 3% of the samples have counts greater than 0.
rec_obj <- rec_obj |>
step_subset_taxa(tax_level = "Kingdom", taxa = c("Bacteria", "Archaea")) |>
step_filter_by_prevalence(0.03)
rec_obj
#> ── DAR Recipe ──────────────────────────────────────────────────────────────────
#> Inputs:
#>
#> ℹ phyloseq object with 451 taxa and 156 samples
#> ℹ variable of interes RiskGroup2 (class: character, levels: hts, msm, pwid)
#> ℹ taxonomic level Species
#>
#> Preporcessing steps:
#>
#> ◉ step_subset_taxa() id = subset_taxa__Punsch_roll
#> ◉ step_filter_by_prevalence() id = filter_by_prevalence__Sad_cake
#>
#> DA steps:
Now that we have defined the preprocessing of the input data for all the da methods that will be used, we need to define them. For this introduction we will use metagenomeSeq and maaslin2 methods with default parameters (those defined by the authors of each method).
rec_obj <- rec_obj |>
step_deseq() |>
step_metagenomeseq(rm_zeros = 0.01) |>
step_maaslin()
rec_obj
#> ── DAR Recipe ──────────────────────────────────────────────────────────────────
#> Inputs:
#>
#> ℹ phyloseq object with 451 taxa and 156 samples
#> ℹ variable of interes RiskGroup2 (class: character, levels: hts, msm, pwid)
#> ℹ taxonomic level Species
#>
#> Preporcessing steps:
#>
#> ◉ step_subset_taxa() id = subset_taxa__Punsch_roll
#> ◉ step_filter_by_prevalence() id = filter_by_prevalence__Sad_cake
#>
#> DA steps:
#>
#> ◉ step_deseq() id = deseq__Qottab
#> ◉ step_metagenomeseq() id = metagenomeseq__Profiterole
#> ◉ step_maaslin() id = maaslin__Dabby_Doughs
The dar
package includes more da steps than those defined above. Below is the full list:
To ensure the reproducibility and consistency of the generated results, all the steps defined in the recipe are executed at the same time using the prep
function.
da_results <- prep(rec_obj, parallel = TRUE)
da_results
#> ── DAR Results ─────────────────────────────────────────────────────────────────
#> Inputs:
#>
#> ℹ phyloseq object with 278 taxa and 156 samples
#> ℹ variable of interes RiskGroup2 (class: character, levels: hts, msm, pwid)
#> ℹ taxonomic level Species
#>
#> Results:
#>
#> ✔ deseq__Qottab diff_taxa = 166
#> ✔ metagenomeseq__Profiterole diff_taxa = 236
#> ✔ maaslin__Dabby_Doughs diff_taxa = 146
#>
#> ℹ 65 taxa are present in all tested methods
Note that the resulting object print shows information about the amount of differentially abundant OTUs in each of the methods, as well as the number of OTUs that have been detected by all methods (consensus).
Now that we have the results we need to extract them, however for this we first need to define a consensus strategy with the bake
. For this example we are only interested in those OTUs detected as differentially abundant in the three methods used.
## Number of used methods
count <- steps_ids(da_results, type = "da") |> length()
## Define the bake
da_results <- bake(da_results, count_cutoff = count)
Finally we can extract the table with the results using the cool
function.
cool(da_results)
#> # A tibble: 65 × 2
#> taxa_id taxa
#> <chr> <chr>
#> 1 Otu_15 Bifidobacterium_catenulatum
#> 2 Otu_34 Olsenella_scatoligenes
#> 3 Otu_35 Collinsella_aerofaciens
#> 4 Otu_37 Collinsella_stercoris
#> 5 Otu_38 Enorma_massiliensis
#> 6 Otu_45 Slackia_isoflavoniconvertens
#> 7 Otu_47 Bacteroides_cellulosilyticus
#> 8 Otu_48 Bacteroides_clarus
#> 9 Otu_63 Bacteroides_plebeius
#> 10 Otu_69 Bacteroides_sp_CAG_530
#> # ℹ 55 more rows
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.0 RC (2024-04-16 r86468)
#> os Ubuntu 22.04.4 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate C
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2024-05-01
#> pandoc 2.7.3 @ /usr/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> ade4 1.7-22 2023-02-06 [2] CRAN (R 4.4.0)
#> ape 5.8 2024-04-11 [2] CRAN (R 4.4.0)
#> assertthat 0.2.1 2019-03-21 [2] CRAN (R 4.4.0)
#> Biobase 2.65.0 2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#> BiocGenerics 0.51.0 2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#> biomformat 1.33.0 2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#> Biostrings 2.73.0 2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#> brio 1.1.5 2024-04-24 [2] CRAN (R 4.4.0)
#> bslib 0.7.0 2024-03-29 [2] CRAN (R 4.4.0)
#> ca 0.71.1 2020-01-24 [2] CRAN (R 4.4.0)
#> cachem 1.0.8 2023-05-01 [2] CRAN (R 4.4.0)
#> cli 3.6.2 2023-12-11 [2] CRAN (R 4.4.0)
#> cluster 2.1.6 2023-12-01 [3] CRAN (R 4.4.0)
#> codetools 0.2-20 2024-03-31 [3] CRAN (R 4.4.0)
#> colorspace 2.1-0 2023-01-23 [2] CRAN (R 4.4.0)
#> crayon 1.5.2 2022-09-29 [2] CRAN (R 4.4.0)
#> crosstalk 1.2.1 2023-11-23 [2] CRAN (R 4.4.0)
#> dar * 1.1.0 2024-05-01 [1] Bioconductor 3.20 (R 4.4.0)
#> data.table 1.15.4 2024-03-30 [2] CRAN (R 4.4.0)
#> dendextend 1.17.1 2023-03-25 [2] CRAN (R 4.4.0)
#> devtools 2.4.5 2022-10-11 [2] CRAN (R 4.4.0)
#> digest 0.6.35 2024-03-11 [2] CRAN (R 4.4.0)
#> dplyr 1.1.4 2023-11-17 [2] CRAN (R 4.4.0)
#> ellipsis 0.3.2 2021-04-29 [2] CRAN (R 4.4.0)
#> evaluate 0.23 2023-11-01 [2] CRAN (R 4.4.0)
#> fansi 1.0.6 2023-12-08 [2] CRAN (R 4.4.0)
#> farver 2.1.1 2022-07-06 [2] CRAN (R 4.4.0)
#> fastmap 1.1.1 2023-02-24 [2] CRAN (R 4.4.0)
#> foreach 1.5.2 2022-02-02 [2] CRAN (R 4.4.0)
#> fs 1.6.4 2024-04-25 [2] CRAN (R 4.4.0)
#> furrr 0.3.1 2022-08-15 [2] CRAN (R 4.4.0)
#> future 1.33.2 2024-03-26 [2] CRAN (R 4.4.0)
#> generics 0.1.3 2022-07-05 [2] CRAN (R 4.4.0)
#> GenomeInfoDb 1.41.0 2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#> GenomeInfoDbData 1.2.12 2024-04-23 [2] Bioconductor
#> ggplot2 3.5.1 2024-04-23 [2] CRAN (R 4.4.0)
#> globals 0.16.3 2024-03-08 [2] CRAN (R 4.4.0)
#> glue 1.7.0 2024-01-09 [2] CRAN (R 4.4.0)
#> gridExtra 2.3 2017-09-09 [2] CRAN (R 4.4.0)
#> gtable 0.3.5 2024-04-22 [2] CRAN (R 4.4.0)
#> heatmaply 1.5.0 2023-10-06 [2] CRAN (R 4.4.0)
#> highr 0.10 2022-12-22 [2] CRAN (R 4.4.0)
#> htmltools 0.5.8.1 2024-04-04 [2] CRAN (R 4.4.0)
#> htmlwidgets 1.6.4 2023-12-06 [2] CRAN (R 4.4.0)
#> httpuv 1.6.15 2024-03-26 [2] CRAN (R 4.4.0)
#> httr 1.4.7 2023-08-15 [2] CRAN (R 4.4.0)
#> igraph 2.0.3 2024-03-13 [2] CRAN (R 4.4.0)
#> IRanges 2.39.0 2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#> iterators 1.0.14 2022-02-05 [2] CRAN (R 4.4.0)
#> jquerylib 0.1.4 2021-04-26 [2] CRAN (R 4.4.0)
#> jsonlite 1.8.8 2023-12-04 [2] CRAN (R 4.4.0)
#> knitr 1.46 2024-04-06 [2] CRAN (R 4.4.0)
#> labeling 0.4.3 2023-08-29 [2] CRAN (R 4.4.0)
#> later 1.3.2 2023-12-06 [2] CRAN (R 4.4.0)
#> lattice 0.22-6 2024-03-20 [3] CRAN (R 4.4.0)
#> lazyeval 0.2.2 2019-03-15 [2] CRAN (R 4.4.0)
#> lifecycle 1.0.4 2023-11-07 [2] CRAN (R 4.4.0)
#> listenv 0.9.1 2024-01-29 [2] CRAN (R 4.4.0)
#> magrittr 2.0.3 2022-03-30 [2] CRAN (R 4.4.0)
#> MASS 7.3-60.2 2024-04-23 [3] local
#> Matrix 1.7-0 2024-03-22 [3] CRAN (R 4.4.0)
#> memoise 2.0.1 2021-11-26 [2] CRAN (R 4.4.0)
#> mgcv 1.9-1 2023-12-21 [3] CRAN (R 4.4.0)
#> microbiome 1.27.0 2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#> mime 0.12 2021-09-28 [2] CRAN (R 4.4.0)
#> miniUI 0.1.1.1 2018-05-18 [2] CRAN (R 4.4.0)
#> multtest 2.61.0 2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#> munsell 0.5.1 2024-04-01 [2] CRAN (R 4.4.0)
#> nlme 3.1-164 2023-11-27 [3] CRAN (R 4.4.0)
#> parallelly 1.37.1 2024-02-29 [2] CRAN (R 4.4.0)
#> permute 0.9-7 2022-01-27 [2] CRAN (R 4.4.0)
#> phyloseq 1.49.0 2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#> pillar 1.9.0 2023-03-22 [2] CRAN (R 4.4.0)
#> pkgbuild 1.4.4 2024-03-17 [2] CRAN (R 4.4.0)
#> pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.4.0)
#> pkgload 1.3.4 2024-01-16 [2] CRAN (R 4.4.0)
#> plotly 4.10.4 2024-01-13 [2] CRAN (R 4.4.0)
#> plyr 1.8.9 2023-10-02 [2] CRAN (R 4.4.0)
#> profvis 0.3.8 2023-05-02 [2] CRAN (R 4.4.0)
#> promises 1.3.0 2024-04-05 [2] CRAN (R 4.4.0)
#> purrr 1.0.2 2023-08-10 [2] CRAN (R 4.4.0)
#> R6 2.5.1 2021-08-19 [2] CRAN (R 4.4.0)
#> RColorBrewer 1.1-3 2022-04-03 [2] CRAN (R 4.4.0)
#> Rcpp 1.0.12 2024-01-09 [2] CRAN (R 4.4.0)
#> registry 0.5-1 2019-03-05 [2] CRAN (R 4.4.0)
#> remotes 2.5.0 2024-03-17 [2] CRAN (R 4.4.0)
#> reshape2 1.4.4 2020-04-09 [2] CRAN (R 4.4.0)
#> rhdf5 2.49.0 2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#> rhdf5filters 1.17.0 2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#> Rhdf5lib 1.27.0 2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#> rlang 1.1.3 2024-01-10 [2] CRAN (R 4.4.0)
#> rmarkdown 2.26 2024-03-05 [2] CRAN (R 4.4.0)
#> Rtsne 0.17 2023-12-07 [2] CRAN (R 4.4.0)
#> S4Vectors 0.43.0 2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#> sass 0.4.9 2024-03-15 [2] CRAN (R 4.4.0)
#> scales 1.3.0 2023-11-28 [2] CRAN (R 4.4.0)
#> seriation 1.5.5 2024-04-17 [2] CRAN (R 4.4.0)
#> sessioninfo 1.2.2 2021-12-06 [2] CRAN (R 4.4.0)
#> shiny 1.8.1.1 2024-04-02 [2] CRAN (R 4.4.0)
#> stringi 1.8.3 2023-12-11 [2] CRAN (R 4.4.0)
#> stringr 1.5.1 2023-11-14 [2] CRAN (R 4.4.0)
#> survival 3.6-4 2024-04-24 [3] CRAN (R 4.4.0)
#> testthat 3.2.1.1 2024-04-14 [2] CRAN (R 4.4.0)
#> tibble 3.2.1 2023-03-20 [2] CRAN (R 4.4.0)
#> tidyr 1.3.1 2024-01-24 [2] CRAN (R 4.4.0)
#> tidyselect 1.2.1 2024-03-11 [2] CRAN (R 4.4.0)
#> TSP 1.2-4 2023-04-04 [2] CRAN (R 4.4.0)
#> UCSC.utils 1.1.0 2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#> UpSetR 1.4.0 2019-05-22 [2] CRAN (R 4.4.0)
#> urlchecker 1.0.1 2021-11-30 [2] CRAN (R 4.4.0)
#> usethis 2.2.3 2024-02-19 [2] CRAN (R 4.4.0)
#> utf8 1.2.4 2023-10-22 [2] CRAN (R 4.4.0)
#> vctrs 0.6.5 2023-12-01 [2] CRAN (R 4.4.0)
#> vegan 2.6-4 2022-10-11 [2] CRAN (R 4.4.0)
#> viridis 0.6.5 2024-01-29 [2] CRAN (R 4.4.0)
#> viridisLite 0.4.2 2023-05-02 [2] CRAN (R 4.4.0)
#> webshot 0.5.5 2023-06-26 [2] CRAN (R 4.4.0)
#> withr 3.0.0 2024-01-16 [2] CRAN (R 4.4.0)
#> xfun 0.43 2024-03-25 [2] CRAN (R 4.4.0)
#> xtable 1.8-4 2019-04-21 [2] CRAN (R 4.4.0)
#> XVector 0.45.0 2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#> yaml 2.3.8 2023-12-11 [2] CRAN (R 4.4.0)
#> zlibbioc 1.51.0 2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>
#> [1] /tmp/RtmppiEMBP/Rinst3ffcc45cffd9ea
#> [2] /home/biocbuild/bbs-3.20-bioc/R/site-library
#> [3] /home/biocbuild/bbs-3.20-bioc/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────