



% ONLY EDIT THE .Rnw FILE!!! The .tex file is
% likely to be overwritten.
%
% \VignetteDepends{Biobase, hgu95av2, hgu133a, genefilter, ALL}
% \VignetteIndexEntry{Bioconductor Probe Tutorial}
% \VignetteKeywords{tutorial, technical duplicates, probe data}
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\documentclass[12pt]{article}

\usepackage{amsmath,pstricks,fullpage}
\usepackage[authoryear,round]{natbib}
\usepackage{hyperref}
\usepackage{theorem}
\usepackage{float}

\parindent 0in % Left justify

\newcommand{\scscst}{\scriptscriptstyle}
\newcommand{\scst}{\scriptstyle}

\newcommand{\Rpackage}[1]{\textit{#1}}
\newcommand{\Rfunction}[1]{\textit{#1}}
\newcommand{\Robject}[1]{\texttt{#1}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\textit{#1}}}
\newcommand{\Rfunarg}[1]{{\textit{#1}}}

\title{Lab: Preprocessing and quality control}

\author{Wolfgang Huber, Robert Gentleman}



For this lab, you need the packages \Rpackage{estrogen}, \Rpackage{arrayMagic}, and \Rpackage{vsn}. There are two alternative, partially overlapping paths through this lab: one for people who prefer the Affymetrix platform, and one for people using other array platforms (e.g.\ spotted and/or two-color arrays).

\paragraph{Exercise 1.} For Affymetrix people
\begin{Schunk}
\begin{Sinput}
> library(estrogen)
> openVignette("estrogen")
\end{Sinput}
\end{Schunk}
This vignette contains exercises on Affymetrix preprocessing and rudimentary quality control, up to the production of a gene list. Working through these exercises should take you 20--60 min, depending on your experience with R and the \Rpackage{affy} package. If you have brought your own data, process it in a similar way.

\paragraph{Exercise 2.} For spotted-array people
\begin{Schunk}
\begin{Sinput}
> library(arrayMagic)
> openVignette("arrayMagic")
\end{Sinput}
\end{Schunk}
This vignette contains exercises on spotted-chip preprocessing and quality control. Working through these exercises should take you around 30 min, depending on your experience with R. If you have brought your own data, process it in a similar way.

\paragraph{Exercise 3.} Go through the exercises in the vignette for the package \Rpackage{vsn}. Section 5 covers the comparison of different normalization methods. Using either the lymphoma data that come with the package or your own data, produce plots like Fig.~9 that also take the following into account:

\begin{itemize}
\item loess normalization
\item print-tip-wise loess normalization
\item print-tip-wise vsn normalization
\item with and without subtraction of the local background
\item subtraction of a smoothed version (2D-loess) of the local background
\end{itemize}

\paragraph{Exercise 4.} Instead of the plots of the previous exercise, use \textit{receiver operating characteristic curves} to compare the performance of different preprocessing methods for finding differentially expressed genes. A receiver operating characteristic curve plots the true positive rate (on the $y$-axis) against the false positive rate (on the $x$-axis).

In many cases, the ``truth'' is not actually known for every gene on the chip. We can, however, use the following estimates:
\begin{eqnarray}
\mbox{TP} &=& \mbox{P}(1-\mbox{FDR})\\
\mbox{FP} &=& \mbox{P}\cdot\mbox{FDR}
\end{eqnarray}

where TP and FP are the numbers of true and false positives, P is the length of the gene list selected with a certain method at a certain threshold, and FDR is the false discovery rate estimated through a permutation procedure, e.g.\ from the function \Rfunction{fdc} in the package \Rpackage{arrayMagic}.
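
As a worked illustration of these estimates (the numbers are hypothetical, chosen only for easy arithmetic, and do not come from any data set), suppose a method selects $\mbox{P}=200$ genes at some threshold and the permutation procedure estimates $\mbox{FDR}=0.1$. Then
\begin{eqnarray*}
\mbox{TP} &=& 200\cdot(1-0.1) \;=\; 180\\
\mbox{FP} &=& 200\cdot 0.1 \;=\; 20
\end{eqnarray*}
Repeating this over a range of thresholds gives one (FP, TP) pair per threshold. The true and false positive rates are these counts divided by the total numbers of differentially and non-differentially expressed genes, which are unknown but do not depend on the threshold. Plotting the estimated TP against the estimated FP should therefore give a curve of the same shape as the receiver operating characteristic curve, up to a rescaling of the axes.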


