Retrieve Feature Data

Nate D. Olson

2019-01-04

Overview

Greengenes 13.8 85% OTU MgDb

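The Greengenes 13.8 database clustered into OTUs at 85% similarity (gg 13.8 85% OTU) is provided as part of the metagenomeFeatures package and is loaded with get_gg13.8_85MgDb().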

library(metagenomeFeatures)
gg85 <- get_gg13.8_85MgDb()

gg85 is an MgDb class object containing the taxonomic hierarchy, sequence data, and phylogeny for the Greengenes database clustered at the 0.85 similarity threshold.

gg85
#> MgDb object:[1] "Metadata"
#> |ACCESSION_DATE: Mon Apr  2 13:30:09 2018
#> |URL: ftp://greengenes.microbio.me/greengenes_release/gg_13_8_otus
#> |DB_TYPE_NAME: GreenGenes
#> |DB_VERSION: 13.8 85% OTUS
#> |DB_TYPE_VALUE: MgDb
#> |DB_SCHEMA_VERSION: 2.0
#> [1] "Sequence Data:"
#> [1] "DECIPHER formatted seqDB"
#> [1] "Taxonomy Data:"
#> # Source:   table<Seqs> [?? x 11]
#> # Database: sqlite 3.22.0
#> #   [/tmp/RtmpESY947/Rinst716c2f67880b/metagenomeFeatures/extdata/gg13.8_85.sqlite]
#>    row_names identifier description Keys  Kingdom Phylum Class Ord   Family
#>        <int> <chr>      <chr>       <chr> <chr>   <chr>  <chr> <chr> <chr> 
#>  1         1 MgDb       1111561     1111… k__Bac… p__Pr… c__G… o__L… f__   
#>  2         2 MgDb       1111421     1111… k__Bac… p__Pr… c__A… o__R… f__   
#>  3         3 MgDb       1111090     1111… k__Bac… p__Ac… c__N… o__N… f__Ni…
#>  4         4 MgDb       1110893     1110… k__Bac… p__Ba… c__[… o__[… f__Sa…
#>  5         5 MgDb       1110814     1110… k__Bac… p__BR… c__   o__   f__   
#>  6         6 MgDb       1110088     1110… k__Bac… p__Pr… c__G… o__   f__   
#>  7         7 MgDb       1109993     1109… k__Bac… p__Ch… c__D… o__   f__   
#>  8         8 MgDb       1109948     1109… k__Bac… p__Pl… c__[… o__B… f__W4 
#>  9         9 MgDb       1109493     1109… k__Bac… p__Pl… c__v… o__   f__   
#> 10        10 MgDb       1109328     1109… k__Bac… p__Ch… c__A… o__S… f__   
#> # … with more rows, and 2 more variables: Genus <chr>, Species <chr>
#> [1] "Tree Data:"
#> 
#> Phylogenetic tree with 5088 tips and 5087 internal nodes.
#> 
#> Tip labels:
#>  4479984, 540377, 811993, 823988, 4397176, 4446470, ...
#> 
#> Rooted; includes branch lengths.

QIITA Dataset

For this vignette we use 16S rRNA data from Rousk et al. 2010, a soil microbiome study (https://qiita.ucsd.edu/study/description/94). A BIOM file and a QIIME mapping file for the study can be obtained from QIITA. A vector of the Greengenes IDs for the study's OTU cluster centers is included in this package for use in this vignette.

data_dir <- system.file("extdata", package = "metagenomeFeatures")
soil_gg_ids <-  readRDS(file.path(data_dir, "qiita_study_94_gg_ids.RDS"))
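To build a similar ID vector for your own study, one possible route is to read the OTU IDs directly from a BIOM file. The sketch below assumes the Bioconductor package biomformat is installed and that the OTU table was built by closed-reference picking against Greengenes, so its row IDs are Greengenes IDs. The example BIOM file bundled with biomformat (which uses placeholder OTU IDs) stands in for the QIITA download, and the names `biom_path`, `study_biom`, `gg_ids`, and `ids_path` are illustrative only.

```r
library(biomformat)

# Example BIOM file shipped with biomformat; stands in for the BIOM file
# downloaded from QIITA (assumption: a closed-reference Greengenes OTU table)
biom_path <- system.file("extdata", "rich_dense_otu_table.biom",
                         package = "biomformat")
study_biom <- read_biom(biom_path)

# Observation (row) IDs of the OTU table; for a closed-reference Greengenes
# table these would be the Greengenes IDs of the OTU cluster centers
gg_ids <- rownames(biom_data(study_biom))

# Store the IDs as an RDS file, mirroring how qiita_study_94_gg_ids.RDS is distributed
ids_path <- tempfile(fileext = ".RDS")
saveRDS(gg_ids, ids_path)
stopifnot(identical(readRDS(ids_path), gg_ids))
```

The resulting vector could then be passed to annotateFeatures(gg85, gg_ids) in the same way as soil_gg_ids.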

Obtaining Sequences and Phylogenetic Tree

soil_mgF <- annotateFeatures(gg85, soil_gg_ids)
#> Found more than one class "phylo" in cache; using the first, from namespace 'metagenomeFeatures'
#> Also defined by 'tidytree'
#> Found more than one class "phylo" in cache; using the first, from namespace 'metagenomeFeatures'
#> Also defined by 'tidytree'
#> Found more than one class "phylo" in cache; using the first, from namespace 'metagenomeFeatures'
#> Also defined by 'tidytree'
#> Found more than one class "phylo" in cache; using the first, from namespace 'metagenomeFeatures'
#> Also defined by 'tidytree'

The resulting mgFeatures class object contains the taxonomic hierarchy, phylogeny, and sequence data for the study OTUs.

soil_mgF
#> mgFeatures with 61 rows and 8 columns
#>            Keys     Kingdom            Phylum                   Class
#>     <character> <character>       <character>             <character>
#> 1       1107824 k__Bacteria p__Proteobacteria  c__Gammaproteobacteria
#> 2        824826 k__Bacteria p__Proteobacteria  c__Alphaproteobacteria
#> 3        694268 k__Bacteria          p__WPS-2                     c__
#> 4        579266 k__Bacteria p__Proteobacteria   c__Betaproteobacteria
#> 5        558862 k__Bacteria       p__Chlorobi               c__SJA-28
#> ...         ...         ...               ...                     ...
#> 57      4389227 k__Bacteria  p__Acidobacteria               c__iii1-8
#> 58      4391683 k__Bacteria p__Proteobacteria  c__Alphaproteobacteria
#> 59      4421369 k__Bacteria  p__Acidobacteria c__[Chloracidobacteria]
#> 60      4477112 k__Bacteria            p__WS2              c__SHA-109
#> 61      4479102 k__Bacteria           p__OP11               c__OP11-4
#>                     Ord               Family       Genus     Species
#>             <character>          <character> <character> <character>
#> 1      o__Legionellales      f__Coxiellaceae         g__         s__
#> 2                   o__                  f__         g__         s__
#> 3                   o__                  f__         g__         s__
#> 4    o__Burkholderiales                  f__         g__         s__
#> 5                   o__                  f__         g__         s__
#> ...                 ...                  ...         ...         ...
#> 57             o__32-20                  f__         g__         s__
#> 58  o__Sphingomonadales f__Sphingomonadaceae         g__         s__
#> 59              o__RB41         f__Ellin6075         g__         s__
#> 60                  o__                  f__         g__         s__
#> 61                  o__                  f__         g__         s__
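In the Greengenes taxonomy each rank name carries a one-letter prefix (k__ for kingdom through s__ for species), and a bare prefix such as o__ or g__ marks a rank with no assigned name. This is why so many Ord, Family, Genus, and Species entries above are empty. For tabulating or plotting, it can help to strip the prefixes and turn empty ranks into NA. The helper strip_gg_prefix() below is a hypothetical base R sketch, not a metagenomeFeatures function.

```r
# Hypothetical helper (not part of metagenomeFeatures):
# remove Greengenes rank prefixes and mark unassigned ranks as NA
strip_gg_prefix <- function(x) {
  # drop the leading rank code, e.g. "p__" or "g__"
  out <- sub("^[kpcofgs]__", "", x)
  # a bare prefix (e.g. "o__") leaves an empty string: the rank is unassigned
  out[!is.na(out) & out == ""] <- NA_character_
  out
}

strip_gg_prefix(c("p__Proteobacteria", "o__", "c__[Chloracidobacteria]"))
```

Assuming the mgFeatures object supports DataFrame-style column access (it prints like an S4Vectors DataFrame), a phylum summary could then be obtained with table(strip_gg_prefix(soil_mgF$Phylum), useNA = "ifany").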

Sequence data

mgF_seq(soil_mgF)
#>   A DNAStringSet instance of length 61
#>      width seq                                         names               
#>  [1]  1502 TAGAGTTTGATCCTGGCTCA...AGTCGTAACAAGGTAGCCGT 1107824
#>  [2]  1396 AGAGTTTGATCATGGCTCAG...GCCTTGTACACACCGCCCGT 824826
#>  [3]  1424 AACGCTGGCGGCGTGCCTAA...GGTAAGGGGGACGAAGTCGT 694268
#>  [4]  1498 AGAGTTTGATCCTGGCTCAG...GTCGTAACAAGGTAGCCGTA 579266
#>  [5]  1432 AGAGTTTGATCATGGCTCAG...CAGAAGTAGTTAGCCTAACC 558862
#>  ...   ... ...
#> [57]  1378 TGCTTAACACATGCAAGTCG...TTGCACACACCGCCCGTCAC 4389227
#> [58]  1606 AGAGTTTGATCCTGGCTCAG...GAAGTCGTAACAAGGTAACC 4391683
#> [59]  1496 AGAGTTTGATCCTGGCTCAG...GAAGTCGTAACAAGGTAACC 4421369
#> [60]  1403 GAACGCTGGCGGTACGTCTG...CGGCCGAAGGTGGAGTCAGT 4477112
#> [61]  1336 GATGAACGCTGGCGGCGTGC...CAAAGTTGGGGGCGCCCGAA 4479102
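mgF_seq() returns a Biostrings DNAStringSet whose element names are the Greengenes IDs, so standard Biostrings functions can be applied to it. The sketch below summarises sequence lengths and exports the set to FASTA with writeXStringSet(), then reads the file back as a check. To stay self-contained it uses a toy two-sequence set built from the first 20 bases printed above rather than the full soil_mgF sequences; `seqs`, `fasta_path`, and `reread` are illustrative names.

```r
library(Biostrings)

# Toy stand-in for mgF_seq(soil_mgF): first 20 bases of two sequences shown above,
# named by their Greengenes IDs
seqs <- DNAStringSet(c("1107824" = "TAGAGTTTGATCCTGGCTCA",
                       "824826"  = "AGAGTTTGATCATGGCTCAG"))

# Summary of sequence lengths (in bases)
summary(width(seqs))

# Export to FASTA; each name becomes a FASTA header line
fasta_path <- tempfile(fileext = ".fasta")
writeXStringSet(seqs, filepath = fasta_path)

# Read the file back and confirm names and sequences survived the round trip
reread <- readDNAStringSet(fasta_path)
stopifnot(identical(as.character(reread), as.character(seqs)))
```

With the real data, replace the toy set with seqs <- mgF_seq(soil_mgF).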

Tree data

mgF_tree(soil_mgF)
#> 
#> Phylogenetic tree with 61 tips and 60 internal nodes.
#> 
#> Tip labels:
#>  200762, 206404, 4479102, 113767, 551866, 552576, ...
#> 
#> Rooted; includes branch lengths.
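mgF_tree() appears to return an ape phylo object (it prints with ape's phylo method). The counts reported above follow a general rule: a rooted, fully bifurcating tree with n tips has n - 1 internal nodes (5088 and 5087 for the full gg85 tree, 61 and 60 for the study tree). The sketch below uses ape to prune a larger reference tree to a set of study IDs with drop.tip(), checks that rule with Ntip() and Nnode(), and exports the result as Newick with write.tree(). A random 20-tip tree from rtree() and hypothetical study IDs stand in for the real data. This illustrates the idea and is not necessarily how annotateFeatures() builds the study tree.

```r
library(ape)

# Toy stand-in for a reference tree such as the gg85 phylogeny:
# rtree() draws a random rooted, fully bifurcating tree with tips t1..t20
set.seed(1)
ref_tree <- rtree(20)

# Hypothetical study feature IDs to keep
study_ids <- c("t1", "t5", "t9", "t13")

# Prune every tip that is not a study feature
study_tree <- drop.tip(ref_tree, setdiff(ref_tree$tip.label, study_ids))

# Rooted, fully bifurcating tree: n tips imply n - 1 internal nodes
stopifnot(Ntip(study_tree) == length(study_ids),
          Nnode(study_tree) == Ntip(study_tree) - 1)

# Export as Newick and read it back as a check
newick_path <- tempfile(fileext = ".tre")
write.tree(study_tree, file = newick_path)
reread <- read.tree(newick_path)
stopifnot(setequal(reread$tip.label, study_ids))
```

With the real data, study_tree would be replaced by mgF_tree(soil_mgF).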