sesame 1.9.7
SeSAMe provides a set of quality control steps.
The SeSAMe QC function returns an sesameQC
object which can be
directly printed onto the screen.
sesameQC(ssets[[1]])
##
## =======================
## = Intensities =
## =======================
## No. probes (num_probes_all) 485577
## mean (M/U) (mean_intensity): 5529.506
## mean (M+U) (mean_intensity_total): 11059.01
##
## -- Infinium II --
## No. probes: (num_probes_II) 350076 (72.095%)
## Mean Intensity (mean_ii): 5160.813
##
## -- Infinium I (Red) --
## No. probes: (num_probes_IR) 89203 (18.371%)
## No. Probes Consistent Channel: 88799
## No. Porbes Swapped Channel: 162
## No. Probes Low Intensity: 242
## Mean Intensity (in-band): 6527.3
## Mean Intensity (out-of-band): 928.2117
##
## -- Infinium I (Grn) --
## No. probes: 46298 (9.535%)
## No. Probes Consistent Channel: 46000
## No. Probes Swapped Channel: 254
## No. Probes Low Intensity: 44
## Mean Intensity (in-band): 6394.865
## Mean Intensity (out-of-band): 640.0676
##
## =======================
## = Non-detection =
## =======================
## No. probes: 64144
## No. probes w/ NA (num_na): 64144 (13.210%)
## No. nondetection (num_nondt): 0 (0.000%)
##
## =======================
## = Beta Values =
## =======================
## Mean Betas: 0.509425
## Median Betas: 0.6091139
##
## -- cg probes --
## No. Probes: 482421
## No. Probes with NA: 63563 (13.176%)
## Mean Betas: 0.5118303
## Median Betas: 0.6180797
## % Unmethylated (Beta < 0.3): 41.018%
## % Methylated (Beta > 0.7): 46.791%
##
## -- ch probes --
## No. Probes: 3091
## No. Probes with NA: 581 (18.797%)
## Mean Betas: 0.1081252
## Median Betas: 0.07147149
## % Unmethylated (Beta < 0.3): 94.861%
## % Methylated (Beta > 0.7): 0.000%
##
## -- rs probes --
## No. Probes: 65
## No. Probes with NA: 0 (0.000%)
## Mean Betas: 0.5058478
## Median Betas: 0.5112609
## % Unmethylated (Beta < 0.3): 30.769%
## % Methylated (Beta > 0.7): 32.308%
##
## =======================
## = Inferences =
## =======================
## Sex: MALE
## Ethnicity: WHITE
## Age: 61.43636
## Bisulfite Conversion (GCT): 1.10858
The sesameQC
object can be coerced into data.frame and linked
using the following code
qc10 <- do.call(rbind, lapply(ssets, function(x)
as.data.frame(sesameQC(x))))
qc10$sample_name <- names(ssets)
qc10[,c('mean_beta_cg','frac_meth_cg','frac_unmeth_cg','sex','age')]
## mean_beta_cg frac_meth_cg frac_unmeth_cg sex age
## WB_105 0.5118303 46.79104 41.01820 MALE 61.43636
## WB_218 0.5101290 47.49724 41.97270 MALE 41.39507
## WB_261 0.5132241 47.64622 41.94357 MALE 25.75848
## PBMC_105 0.5167078 46.44486 40.16349 MALE 60.57983
## PBMC_218 0.5190739 48.01221 41.17625 MALE 42.90046
## PBMC_261 0.5179995 48.39253 41.28344 MALE 22.63487
## Gran_105 0.5108416 47.70949 43.27648 MALE 63.18373
## Gran_218 0.5074659 47.50894 43.63102 MALE 43.25656
## Gran_261 0.5100516 47.88544 43.51331 MALE 26.57018
## CD4+_105 0.5235019 47.97616 40.25087 MALE 54.23047
The background level is given by mean_oob_grn
and mean_oob_red
library(ggplot2)
ggplot(qc10,
aes(x = mean_oob_grn, y= mean_oob_red, label = sample_name)) +
geom_point() + geom_text(hjust = -0.1, vjust = 0.1) +
geom_abline(intercept = 0, slope = 1, linetype = 'dotted') +
xlab('Green Background') + ylab('Red Background') +
xlim(c(500,1200)) + ylim(c(500,1200))
The mean {M,U} intensity can be reached by mean_intensity
.
Similarly, the mean M+U intensity can be reached by
mean_intensity_total
. Low intensities are symptomatic of low
input or poor hybridization.
library(wheatmap)
p1 <- ggplot(qc10) +
geom_bar(aes(sample_name, mean_intensity), stat='identity') +
xlab('Sample Name') + ylab('Mean Intensity') +
ylim(0,18000) +
theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
p2 <- ggplot(qc10) +
geom_bar(aes(sample_name, mean_intensity_total), stat='identity') +
xlab('Sample Name') + ylab('Mean M+U Intensity') +
ylim(0,18000) +
theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
WGG(p1) + WGG(p2, RightOf())
The fraction of color channel switch can be found in
InfI_switch_G2R
and InfI_switch_R2G
. These numbers are
symptomatic of how Infinium I probes are affected by SNP-induced
color channel switching.
ggplot(qc10) +
geom_point(aes(InfI_switch_G2R, InfI_switch_R2G))
The fraction of NAs are signs of masking due to variety of reasons
including failed detection, high background, putative low quality
probes etc. This number can be reached in frac_na_cg
and
num_na_cg
(the cg stands for CpG probes, so we also have
num_na_ch
and num_na_rs
)
p1 <- ggplot(qc10) +
geom_bar(aes(sample_name, num_na_cg), stat='identity') +
xlab('Sample Name') + ylab('Number of NAs') +
theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
p2 <- ggplot(qc10) +
geom_bar(aes(sample_name, frac_na_cg), stat='identity') +
xlab('Sample Name') + ylab('Fraction of NAs (%)') +
theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
WGG(p1) + WGG(p2, RightOf())
Sesame provide convenient function to compare your sample with public data sets processed with the same pipeline. All you need is a raw SigSet.
sset <- sesameDataGet('EPIC.1.LNCaP')$sset
qualityRank(sset)
## n_compared rank_nondetection rank_meanintensity
## 636.0000000 0.0000000 0.1572327
sset <- sesameDataGet('EPIC.1.LNCaP')$sset
annoS <- sesameDataPullVariantAnno_SNP('EPIC','hg19')
## Retrieving SNP annotation from https://zwdzwd.s3.amazonaws.com/InfiniumAnnotation/20200704/EPIC/EPIC.hg19.snp_overlap_b151.rds ... Done.
annoI <- sesameDataPullVariantAnno_InfiniumI('EPIC','hg19')
## Retrieving SNP annotation from https://zwdzwd.s3.amazonaws.com/InfiniumAnnotation/20200704/EPIC/EPIC.hg19.typeI_overlap_b151.rds ... Done.
## output to console
head(formatVCF(sset, annoS=annoS, annoI=annoI))
## #CHROM POS ID REF ALT QUAL FILTER INFO
## cg19411932 chr1 763129 cg19411932 C T 83 PASS PVF=0.057;GT=0/0;GS=83
## cg16578267 chr1 937047 cg16578267 C G 83 PASS PVF=0.059;GT=0/0;GS=83
## cg17801765 chr1 958276 cg17801765 G C 73 PASS PVF=0.080;GT=0/0;GS=73
## cg18148498 chr1 1380620 cg18148498 A G 92 PASS PVF=0.035;GT=0/0;GS=92
## cg12133962 chr1 3090136 cg12133962 T C 73 PASS PVF=0.063;GT=0/0;GS=73
## cg24306648 chr1 3554768 cg24306648 G A 54 PASS PVF=0.120;GT=0/0;GS=54
One can output to actual VCF file with a header by formatVCF(sset, vcf=path_to_vcf)
.