
1 Project Overview

1.1 About

Bioconductor: Analysis and comprehension of high-throughput genomic data

Packages, vignettes, work flows

Package installation and use

1.2 Key concepts

Goals

What a few lines of R have to say

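# Simulate 1000 x values and a y that depends linearly on x plus noise
# (no seed is set, so your numbers will differ slightly from those shown)
x <- rnorm(1000)
y <- x + rnorm(1000)
df <- data.frame(X=x, Y=y)
# Scatter plot via the formula interface: Y as a function of X
plot(Y ~ X, df)
# Fit a linear model and summarize it as an analysis of variance table
fit <- lm(Y ~ X, df)
anova(fit)
## Analysis of Variance Table
## 
## Response: Y
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## X           1 1001.14 1001.14    1013 < 2.2e-16 ***
## Residuals 998  986.27    0.99                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Add the fitted regression line to the existing plot
abline(fit)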

Classes and methods – “S3”
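S3 is R's informal object system: an object's class is just a character attribute, and a generic function such as print() or anova() calls UseMethod() to dispatch to a method named generic.class. For example, lm() returns an object of class "lm", so anova(fit) dispatches to the anova.lm method. The minimal sketch below uses hypothetical circle and square classes and an area() generic invented purely for illustration.

```r
## Hypothetical S3 classes: plain lists carrying a "class" attribute
circle <- function(r) structure(list(r = r), class = "circle")
square <- function(s) structure(list(s = s), class = "square")

## A generic dispatches on class(x) via UseMethod()
area <- function(x, ...) UseMethod("area")
area.circle <- function(x, ...) pi * x$r^2
area.square <- function(x, ...) x$s^2
area.default <- function(x, ...)
    stop("no 'area' method for class ", class(x)[1])

## print() is an existing S3 generic; add a method for the circle class
print.circle <- function(x, ...) {
    cat("circle of radius", x$r, "\n")
    invisible(x)
}

shapes <- list(circle(1), square(2))
sapply(shapes, area)   # dispatches to area.circle, then area.square
circle(3)              # auto-printing calls print.circle
```

Because dispatch depends only on the class attribute, nothing checks that a "circle" actually has an r element; that informality is what S4 tightens up.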

Bioconductor classes and methods – “S4”
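S4, provided by the methods package, adds formal class definitions with typed slots, validity checking, and generics created with setGeneric(); most Bioconductor data structures are S4 classes. The sketch below defines a hypothetical Interval class loosely modeled on genomic ranges. It is not a Bioconductor class (the IRanges package provides the real IRanges class and its width() accessor), so run it in a fresh session.

```r
library(methods)  # S4 infrastructure; attached by default in most R sessions

## Hypothetical S4 class: slots are declared with types, and the validity
## function returns TRUE or a character string describing the problem
setClass("Interval",
    slots = c(start = "integer", end = "integer"),
    validity = function(object) {
        if (length(object@start) != length(object@end))
            "'start' and 'end' must have the same length"
        else if (any(object@end < object@start))
            "'end' must be >= 'start'"
        else
            TRUE
    })

## Convenience constructor; new() runs the validity check
Interval <- function(start, end)
    new("Interval", start = as.integer(start), end = as.integer(end))

## A new generic function, and a method for the Interval class
setGeneric("width", function(x) standardGeneric("width"))
setMethod("width", "Interval", function(x) x@end - x@start + 1L)

## show() is the S4 counterpart of print(), used for auto-printing
setMethod("show", "Interval", function(object) {
    cat("Interval object with", length(object@start), "ranges\n")
    print(data.frame(start = object@start, end = object@end))
})

ivl <- Interval(start = c(1, 10), end = c(5, 20))
ivl                  # calls the show() method
width(ivl)           # 5 11
try(Interval(5, 1))  # rejected by the validity function: end < start
```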

1.3 High-throughput sequence analysis work flows

  1. Experimental design

  2. Wet-lab sequence preparation (figure from http://rnaseq.uoregon.edu/)

  3. (Illumina) Sequencing (Bentley et al., 2008, doi:10.1038/nature07517)

  4. Alignment
    • Choose an aligner to match the task, e.g., Rsubread and Bowtie2 work well for ChIP-seq and some forms of RNA-seq, while BWA and GMAP are better suited to variant calling
    • Primary output: BAM files of aligned reads
    • More recently: kallisto and similar programs pseudoalign reads to a transcriptome and produce tables of estimated read counts per transcript
  5. Reduction
    • e.g., RNA-seq ‘count tables’ (simple spreadsheets of reads per gene and sample), DNA-seq called variants (VCF files), ChIP-seq peaks (BED files) and coverage (WIG files)
  6. Analysis
    • Differential expression, peak identification, …
  7. Comprehension
    • Biological context
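To make the reduction and analysis steps concrete, the sketch below simulates a small RNA-seq count table (genes in rows, samples in columns), scales it to counts per million to adjust for sequencing depth, and applies a naive per-gene Welch t-test to log-scale values. The data, gene names, and 4-fold effect are simulated assumptions, and the test is deliberately simple; real analyses use Bioconductor packages such as DESeq2 or edgeR, which model counts directly.

```r
set.seed(1)  # make the simulation reproducible

## Simulated count table: 100 genes x 6 samples (3 control, 3 treated).
## Every gene has mean count 50, except genes 1-10, which are 4-fold
## higher (mean 200) in the treated samples.
n_genes <- 100
group <- factor(rep(c("control", "treated"), each = 3))
mu <- matrix(50, nrow = n_genes, ncol = 6)
mu[1:10, group == "treated"] <- 200
counts <- matrix(rpois(length(mu), mu), nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("sample", 1:6)))

## Adjust for sequencing depth: counts per million (CPM), then log2
cpm <- t(t(counts) / colSums(counts)) * 1e6
logcpm <- log2(cpm + 1)

## Naive analysis: log2 fold change and Welch t-test for each gene,
## with Benjamini-Hochberg adjustment for multiple testing
treated <- group == "treated"
lfc <- rowMeans(logcpm[, treated]) - rowMeans(logcpm[, !treated])
pval <- apply(logcpm, 1, function(v) t.test(v[treated], v[!treated])$p.value)
res <- data.frame(logFC = lfc, p.adj = p.adjust(pval, method = "BH"))
head(res[order(res$p.adj), ])
```

Because the ten up-regulated genes inflate the treated libraries' total counts, CPM scaling shifts the unchanged genes' fold changes slightly below zero; this composition effect is one reason DESeq2 and edgeR use more robust normalization.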

1.4 Bioconductor sequencing ecosystem

(Figure: the Bioconductor sequencing ecosystem)