cancerhotspots
allows rapid genotyping of known somatic hotspots from the tumor BAM files. This facilitates to get a quick overlook of 3,181 known somatic hot-spots in a matter of minutes, without spending hours on variant calling and annotation. In simple words, it fetches nucleotide frequencies of known somatic hotspots and prioritizes them based on allele frequency.
Output includes a browsable HTML file with variants passing the VAF/read depth filters and, a TSV file including nucleotide counts of all variants analyzed.
can_hs_tbl = maftools::cancerhotspots(
bam = "Tumor.bam",
refbuild = "GRCh37",
mapq = 10,
sam_flag = 1024
)
Input BAM file : Tumor.bam
Variants : cancerhotspots_v2_GRCh37.tsv
VAF filter : 0.050
min reads for t_allele : 8
MAPQ filter : 10
FLAG filter : 1024
Coverage filter : 30
HTSlib version : 1.7
Processed 1000 entries..
Processed 2000 entries..
Processed 3000 entries..
Done!
Summary:
Total variants processed : 3181
Variants > 0.05 threshold: 3
Avg. depth of coverage : 83.02
Output html report : Tumor.html
Output TSV file : Tumor.tsv
Above command generates an HTML report and a TSV file with the readcounts.
head(can_hs_tbl)
# loci fa_ref NT_change Hugo_Symbol Variant_Classification AA_change Meta VAF A T G C Ins Del
# 1: 1:2491289 NA G>A TNFRSF14 Missense_Mutation C111Y deleterious(0) 0 0 0 21 0 0 0
# 2: 1:2491290 NA C>G TNFRSF14 Missense_Mutation C111W deleterious(0) 0 0 0 0 21 0 0
# 3: 1:8073432 NA T>G ERRFI1 Missense_Mutation K409N deleterious(0) 0 1 64 0 1 0 0
# 4: 1:8073434 NA T>G ERRFI1 Missense_Mutation K409Q deleterious(0.04) 0 0 63 0 0 0 0
# 5: 1:8074313 NA T>A ERRFI1 Nonsense_Mutation K116* 0 0 106 0 0 0 0
# 6: 1:9779982 NA T>C PIK3CD Missense_Mutation C416R tolerated(0.26) 0 1 18 0 0 0 0
The tsv files generated by cancerhotspots()
can be aggregated and converted into MAF with the function cancerhotspotsAggr()
.
CLI version of cancerhotspots
can be found here
bamreadcounts
function extracts ATGC nucleotide distribution for targeted loci from the BAM files. The function name is an homage to bam-readcount tool and additionally supports INDELS.
#Generate a sample loci - first two columns must contain chromosome name and position
loci = data.table::data.table(chr = c("seq1", "seq2"), pos = c(1340, 1483))
loci
## chr pos
## <char> <num>
## 1: seq1 1340
## 2: seq2 1483
Get nucleotide frequency from BAM files
#Example BAM file from Rsamtools package
#By default position are assumed to be in 1-based coordinate system
bamfile = system.file("extdata", "ex1.bam", package = "Rsamtools")
loci_rc = maftools::bamreadcounts(bam = bamfile, loci = loci)
loci_rc
# $ex1
# loci fa_ref A T G C Ins Del
# 1: seq1:1340 NA 1 0 0 62 0 0
# 2: seq2:1483 NA 0 13 0 0 0 0
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /media/volume/teran2_disk/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] maftools_2.21.1
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.37 RColorBrewer_1.1-3 R6_2.5.1 fastmap_1.2.0
## [5] Matrix_1.7-0 xfun_0.47 lattice_0.22-6 splines_4.4.1
## [9] cachem_1.1.0 knitr_1.48 htmltools_0.5.8.1 rmarkdown_2.28
## [13] lifecycle_1.0.4 cli_3.6.3 grid_4.4.1 sass_0.4.9
## [17] data.table_1.16.0 jquerylib_0.1.4 compiler_4.4.1 tools_4.4.1
## [21] evaluate_1.0.0 bslib_0.8.0 survival_3.7-0 yaml_2.3.10
## [25] DNAcopy_1.79.0 rlang_1.1.4 jsonlite_1.8.9