Contents

1 Generation of Quality Controls

SeSAMe provides a set of quality control steps.

The SeSAMe QC function returns an sesameQC object which can be directly printed onto the screen.

sesameQC(ssets[[1]])
## 
## =======================
## =      Intensities    =
## =======================
## No. probes (num_probes_all)        485577 
## mean (M/U) (mean_intensity):       5529.506 
## mean (M+U) (mean_intensity_total): 11059.01 
## 
## -- Infinium II --
## No. probes: (num_probes_II)        350076 (72.095%)
## Mean Intensity (mean_ii):          5160.813 
## 
## -- Infinium I (Red) -- 
## No. probes: (num_probes_IR)        89203 (18.371%)
## No. Probes Consistent Channel:     88799 
## No. Porbes Swapped Channel:        162 
## No. Probes Low Intensity:          242 
## Mean Intensity (in-band):          6527.3 
## Mean Intensity (out-of-band):      928.2117 
## 
## -- Infinium I (Grn) -- 
## No. probes:                     46298 (9.535%)
## No. Probes Consistent Channel:     46000 
## No. Probes Swapped Channel:        254 
## No. Probes Low Intensity:          44 
## Mean Intensity (in-band):          6394.865 
## Mean Intensity (out-of-band):      640.0676 
## 
## =======================
## =    Non-detection    =
## =======================
## No. probes:                        64144 
## No. probes w/ NA (num_na):         64144 (13.210%)
## No. nondetection (num_nondt):      0 (0.000%)
## 
## =======================
## =      Beta Values    =
## =======================
## Mean Betas:                        0.509425 
## Median Betas:                      0.6091139 
## 
## -- cg probes --
## No. Probes:                        482421 
## No. Probes with NA:                63563 (13.176%)
## Mean Betas:                        0.5118303 
## Median Betas:                      0.6180797 
## % Unmethylated (Beta < 0.3):       41.018%
## % Methylated (Beta > 0.7):         46.791%
## 
## -- ch probes --
## No. Probes:                        3091 
## No. Probes with NA:                581 (18.797%)
## Mean Betas:                        0.1081252 
## Median Betas:                      0.07147149 
## % Unmethylated (Beta < 0.3):       94.861%
## % Methylated (Beta > 0.7):         0.000%
## 
## -- rs probes --
## No. Probes:                        65 
## No. Probes with NA:                0 (0.000%)
## Mean Betas:                        0.5058478 
## Median Betas:                      0.5112609 
## % Unmethylated (Beta < 0.3):       30.769%
## % Methylated (Beta > 0.7):         32.308%
## 
## =======================
## =      Inferences     =
## =======================
## Sex:                            MALE 
## Ethnicity:                      WHITE 
## Age:                            61.43636 
## Bisulfite Conversion (GCT):     1.10858

The sesameQC object can be coerced into data.frame and linked using the following code

qc10 <- do.call(rbind, lapply(ssets, function(x)
    as.data.frame(sesameQC(x))))
qc10$sample_name <- names(ssets)

qc10[,c('mean_beta_cg','frac_meth_cg','frac_unmeth_cg','sex','age')]
##          mean_beta_cg frac_meth_cg frac_unmeth_cg  sex      age
## WB_105      0.5118303     46.79104       41.01820 MALE 61.43636
## WB_218      0.5101290     47.49724       41.97270 MALE 41.39507
## WB_261      0.5132241     47.64622       41.94357 MALE 25.75848
## PBMC_105    0.5167078     46.44486       40.16349 MALE 60.57983
## PBMC_218    0.5190739     48.01221       41.17625 MALE 42.90046
## PBMC_261    0.5179995     48.39253       41.28344 MALE 22.63487
## Gran_105    0.5108416     47.70949       43.27648 MALE 63.18373
## Gran_218    0.5074659     47.50894       43.63102 MALE 43.25656
## Gran_261    0.5100516     47.88544       43.51331 MALE 26.57018
## CD4+_105    0.5235019     47.97616       40.25087 MALE 54.23047

2 Background

The background level is given by mean_oob_grn and mean_oob_red

library(ggplot2)
ggplot(qc10,
    aes(x = mean_oob_grn, y= mean_oob_red, label = sample_name)) +
    geom_point() + geom_text(hjust = -0.1, vjust = 0.1) +
    geom_abline(intercept = 0, slope = 1, linetype = 'dotted') +
    xlab('Green Background') + ylab('Red Background') +
    xlim(c(500,1200)) + ylim(c(500,1200))

3 Mean Intensity

The mean {M,U} intensity can be reached by mean_intensity. Similarly, the mean M+U intensity can be reached by mean_intensity_total. Low intensities are symptomatic of low input or poor hybridization.

library(wheatmap)
p1 <- ggplot(qc10) +
    geom_bar(aes(sample_name, mean_intensity), stat='identity') +
    xlab('Sample Name') + ylab('Mean Intensity') +
    ylim(0,18000) +
    theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
p2 <- ggplot(qc10) +
    geom_bar(aes(sample_name, mean_intensity_total), stat='identity') +
    xlab('Sample Name') + ylab('Mean M+U Intensity') +
    ylim(0,18000) +
    theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
WGG(p1) + WGG(p2, RightOf())

4 Fraction of color channel switch

The fraction of color channel switch can be found in InfI_switch_G2R and InfI_switch_R2G. These numbers are symptomatic of how Infinium I probes are affected by SNP-induced color channel switching.

ggplot(qc10) +
    geom_point(aes(InfI_switch_G2R, InfI_switch_R2G))

5 Fraction of NA

The fraction of NAs are signs of masking due to variety of reasons including failed detection, high background, putative low quality probes etc. This number can be reached in frac_na_cg and num_na_cg (the cg stands for CpG probes, so we also have num_na_ch and num_na_rs)

p1 <- ggplot(qc10) +
    geom_bar(aes(sample_name, num_na_cg), stat='identity') +
    xlab('Sample Name') + ylab('Number of NAs') +
    theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
p2 <- ggplot(qc10) +
    geom_bar(aes(sample_name, frac_na_cg), stat='identity') +
    xlab('Sample Name') + ylab('Fraction of NAs (%)') +
    theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
WGG(p1) + WGG(p2, RightOf())

6 Quality Ranking

Sesame provide convenient function to compare your sample with public data sets processed with the same pipeline. All you need is a raw SigSet.

sset <- sesameDataGet('EPIC.1.LNCaP')$sset
qualityRank(sset)
##         n_compared  rank_nondetection rank_meanintensity 
##        636.0000000          0.0000000          0.1572327

7 Output explicit and Infinium-I-derived SNP to VCF

sset <- sesameDataGet('EPIC.1.LNCaP')$sset

annoS <- sesameDataPullVariantAnno_SNP('EPIC','hg19')
## Retrieving SNP annotation from  https://zwdzwd.s3.amazonaws.com/InfiniumAnnotation/20200704/EPIC/EPIC.hg19.snp_overlap_b151.rds ... Done.
annoI <- sesameDataPullVariantAnno_InfiniumI('EPIC','hg19')
## Retrieving SNP annotation from  https://zwdzwd.s3.amazonaws.com/InfiniumAnnotation/20200704/EPIC/EPIC.hg19.typeI_overlap_b151.rds ... Done.
## output to console
head(formatVCF(sset, annoS=annoS, annoI=annoI))
##            #CHROM     POS         ID REF ALT QUAL FILTER                   INFO
## cg19411932   chr1  763129 cg19411932   C   T   83   PASS PVF=0.057;GT=0/0;GS=83
## cg16578267   chr1  937047 cg16578267   C   G   83   PASS PVF=0.059;GT=0/0;GS=83
## cg17801765   chr1  958276 cg17801765   G   C   73   PASS PVF=0.080;GT=0/0;GS=73
## cg18148498   chr1 1380620 cg18148498   A   G   92   PASS PVF=0.035;GT=0/0;GS=92
## cg12133962   chr1 3090136 cg12133962   T   C   73   PASS PVF=0.063;GT=0/0;GS=73
## cg24306648   chr1 3554768 cg24306648   G   A   54   PASS PVF=0.120;GT=0/0;GS=54

One can output to actual VCF file with a header by formatVCF(sset, vcf=path_to_vcf).