A brief introduction to the ENmix R package for DNA methylation data analysis.
The ENmix package provides a set of quality control, preprocessing/correction and data analysis tools for Illumina Methylation BeadChips. It includes functions for reading raw IDAT data, background correction, dye-bias correction and probe-type bias adjustment, along with a number of additional tools. These functions can be used to remove unwanted experimental noise and thus improve the accuracy and reproducibility of methylation measures. ENmix functions are flexible and transparent. Users can either run a single pipeline command that completes all data preprocessing steps (quality control, background correction, dye-bias adjustment, between-array normalization and probe-type bias correction) or call individual functions sequentially to preprocess data in a more customized manner.
In addition, the ENmix package offers complementary functions for: efficient data visualization (such as QC plots, data distribution plots, Manhattan plots and Q-Q plots); quality control (identifying and filtering low-quality data points, samples, probes and outliers, along with imputation of missing values); identification of probes with multimodal distributions due to SNPs or other factors; exploration of data variance structure using principal component regression analysis plots; preparation of surrogate control variables related to experimental factors, to be adjusted for in downstream statistical analysis; an efficient algorithm, oxBS-MLE, to estimate 5-methylcytosine and 5-hydroxymethylcytosine levels; estimation of cell type proportions; methylation age calculation; and differentially methylated region (DMR) analysis.
Most ENmix functions also support the data structures used by several related R packages, such as minfi, wateRmelon and ChAMP, providing straightforward integration of ENmix-corrected datasets into subsequent data analysis.
The ENmix readidat() function does not depend on array annotation R packages. It can read the Illumina manifest file directly, which makes it easier to work with newer arrays, such as the MethylationEPICv2.0 and mouse BeadChips.
The software is designed to support large-scale data analysis and provides multi-processor parallel computing options for most functions.
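The two preprocessing routes described above (a single pipeline command, or individual functions called in sequence) can be sketched as follows. This is a minimal sketch, not a definitive workflow: it assumes ENmix is installed, idat_dir is a hypothetical folder of IDAT files, run_enmix_sketch() is a hypothetical wrapper written for this illustration, and the argument values shown (such as nCores = 2 or bgParaEst = "oob") are illustrative choices that should be checked against the package help pages.

```r
# Sketch only: the ENmix steps run if the package and IDAT files are available.
# idat_dir is a hypothetical location; replace it with your own data folder.
idat_dir <- "path/to/idat_files"

# Hypothetical wrapper showing both preprocessing routes.
run_enmix_sketch <- function(path, stepwise = FALSE, n_cores = 2) {
  # Read raw IDAT files into an rgDataSet
  rgSet <- readidat(path = path, recursive = TRUE)

  if (!stepwise) {
    # Route 1: single pipeline command covering all preprocessing steps
    return(mpreprocess(rgSet, nCores = n_cores))
  }

  # Route 2: individual functions called sequentially
  qc   <- QCinfo(rgSet)                               # QC information
  mdat <- preprocessENmix(rgSet, bgParaEst = "oob",   # background and
                          dyeCorr = "RELIC",          # dye-bias correction
                          QCinfo = qc, nCores = n_cores)
  mdat <- norm.quantile(mdat, method = "quantile1")   # between-array normalization
  beta <- rcp(mdat, qcscore = qc)                     # probe-type bias correction
  qcfilter(beta, qcscore = qc, rmcr = TRUE, impute = TRUE)  # filter and impute
}

# Guarded call: nothing runs unless ENmix and the data folder both exist.
if (requireNamespace("ENmix", quietly = TRUE) && dir.exists(idat_dir)) {
  library(ENmix)
  beta <- run_enmix_sketch(idat_dir, stepwise = TRUE)
}
```

The stepwise route makes it possible to inspect or swap out individual steps, while the single command is convenient when the default choices are acceptable.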
readidat(): Read idat files into R
readmanifest(): Read array manifest file into R
QCinfo(): Extract and visualize QC information
plotCtrl(): Generate internal control plots
getCGinfo(): Extract CpG probe annotation information
calcdetP(): Compute detection P values
qcfilter(): Remove low quality values, samples or CpGs; remove outlier samples and perform imputation
nmode(): Identify “gap” probes, i.e. those with multimodal distributions caused by underlying SNPs
dupicc(): Calculate intraclass correlation coefficient (ICC) using data from duplicates
freqpoly(): Frequency polygon plot for single variable
multifreqpoly(): Frequency polygon plot for multiple variables
mpreprocess(): Preprocessing pipeline
preprocessENmix(): ENmix background correction and dye bias correction
relic(): RELIC dye bias correction
norm.quantile(): Quantile normalization
rcp(): RCP probe design type bias correction
Differentially methylated region (DMR) analysis
ipdmr(): ipDMR differentially methylated region analysis
combp(): Combp differentially methylated region analysis
oxBS.MLE(): MLE estimates of 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC)
estimateCellProp(): Estimate white blood cell type proportions
methyAge(): Calculate methylation age
predSex(): Estimate sample sex
ctrlsva(): Derive surrogate variables to control for experimental confounding using non-negative internal control probes
pcrplot(): Principal component regression plot
mhtplot(): P value Manhattan plot
p.qqplot(): P value Q-Q plot
B2M(): Convert Beta value to M value
M2B(): Convert M value to Beta value
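The Beta and M value conversions at the end of the list above follow the standard logit relationship, M = log2(Beta / (1 - Beta)) and Beta = 2^M / (1 + 2^M). The sketch below illustrates this in base R using hypothetical helper names (beta_to_m, m_to_beta) so that it runs without ENmix installed; it illustrates the formula and is not the package source.

```r
# Hypothetical helpers illustrating the Beta <-> M value relationship.
beta_to_m <- function(beta) log2(beta / (1 - beta))
m_to_beta <- function(m) 2^m / (1 + 2^m)

beta <- c(0.1, 0.5, 0.8)   # made-up Beta values
m <- beta_to_m(beta)       # 0.5 maps to 0; 0.8 maps to log2(4) = 2
beta_back <- m_to_beta(m)  # round trip recovers the original Beta values
```

Beta values of exactly 0 or 1 give infinite M values under this formula, so such values are usually bounded away from the extremes before conversion.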
ENmix organizes data in two classes.
rgDataSet contains raw data (including internal control probes) from IDAT files, CpG annotation from the Illumina manifest file and/or sample information (plate, array, and phenotypes) provided by users. Array intensity data are organized by probe (not CpG locus) for the red and green channels.
methDataSet contains methylated and unmethylated intensity values (organized by CpG), CpG annotation from the Illumina manifest file and/or sample information (plate, array, and phenotypes) provided by users.
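To make the CpG-level layout concrete, here is a toy sketch in base R. It does not construct an actual methDataSet: the matrices meth and unmeth, the CpG and sample names, and the offset of 100 are all assumptions chosen for illustration. CpGs are in rows and samples in columns, and Beta is computed as methylated / (methylated + unmethylated + offset).

```r
# Toy intensities: rows are CpGs, columns are samples (made-up values).
meth <- matrix(c(900, 100, 500, 2700, 300, 1450), nrow = 3,
               dimnames = list(c("cg01", "cg02", "cg03"), c("s1", "s2")))
unmeth <- matrix(c(0, 800, 400, 200, 2600, 1450), nrow = 3,
                 dimnames = dimnames(meth))

offset <- 100  # assumed stabilizing constant, keeps the denominator positive
beta <- meth / (meth + unmeth + offset)  # one Beta value per CpG and sample
```

Because both matrices share the same row and column names, each Beta value can be traced back to one CpG in one sample, which is the organization the methDataSet description refers to.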