CytoMethIC-Oncology is a collection of machine learning models for oncology. It includes CNS tumor classification, pan-cancer classification, cell-of-origin classification, and subtype classification models.
Models available are listed below:
EHID | ModelID | PredictionLabel |
---|---|---|
EH8423 | CancerCellOfOrigin21_rfc | Cell of origin defined in TCGA (N=21) |
NA | CancerType33_InfHum3_20230807 | TCGA cancer types (N=33) |
EH8398 | CancerType33_mlp | TCGA cancer types (N=33) |
EH8395 | CancerType33_rfc | TCGA cancer types (N=33) |
NA | CancerType33_rfcTCGA_InfHum3 | TCGA cancer types (N=33) |
EH8396 | CancerType33_svm | TCGA cancer types (N=33) |
EH8397 | CancerType33_xgb | TCGA cancer types (N=33) |
NA | CancerType33_xgbTCGA_InfHum3 | TCGA cancer types (N=33) |
EH8402 | CNSTumor66_mlp | CNS Tumor Class (N=66) |
EH8399 | CNSTumor66_rfc | CNS Tumor Class (N=66) |
NA | CNSTumor66_rfcCapper_InfHum3 | CNS Tumor Class (N=66) |
EH8400 | CNSTumor66_svm | CNS Tumor Class (N=66) |
EH8401 | CNSTumor66_xgb | CNS Tumor Class (N=66) |
NA | CNSTumor66_xgbCapper_InfHum3 | CNS Tumor Class (N=66) |
EH8422 | Subtype91_rfc | Cancer subtypes defined in TCGA (N=91) |
NA | TumorPurity_HM450 | Tumor purity (%) |
NA | TumorPurity_HM450_20240318 | Tumor purity (%) |
A model can be loaded from ExperimentHub with its EHID from the table above: ExperimentHub()[["EHID"]].
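As a small convenience, the EHIDs in the table can be kept in a named vector so that a model is requested by its ModelID. This is only a sketch: the vector cytomethic_ehid and the helper get_ehid() are hypothetical names introduced here, not part of CytoMethIC, and the vector copies only a few rows of the table.

```r
# Hypothetical lookup copied from the table above (a subset of rows).
# NA marks a model that is not hosted on ExperimentHub.
cytomethic_ehid <- c(
  CancerCellOfOrigin21_rfc      = "EH8423",
  CancerType33_rfc              = "EH8395",
  CancerType33_svm              = "EH8396",
  CNSTumor66_rfc                = "EH8399",
  Subtype91_rfc                 = "EH8422",
  CancerType33_InfHum3_20230807 = NA
)

# Hypothetical helper: return the EHID for a ModelID, or stop with a
# message when the model is unknown or has no EHID.
get_ehid <- function(model_id) {
  if (!model_id %in% names(cytomethic_ehid)) {
    stop("Unknown ModelID: ", model_id)
  }
  ehid <- cytomethic_ehid[[model_id]]
  if (is.na(ehid)) {
    stop(model_id, " has no EHID; download it and load it with readRDS()")
  }
  ehid
}

get_ehid("CancerType33_rfc")  # returns "EH8395"
```

The returned ID can then be passed on, for example ExperimentHub()[[get_ehid("CancerType33_rfc")]].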
Models whose EHID is NA are available in the following GitHub repository. You can download them directly and load them with readRDS(). Some examples using either approach are shown below.
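For a model that is not on ExperimentHub, loading might look like the sketch below. The file name is an assumption (the repository layout is not described here), and load_local_model() is a hypothetical wrapper around readRDS() that fails early with a readable message when the file has not been downloaded yet.

```r
# Hypothetical helper: read a model file that was downloaded by hand.
load_local_model <- function(path) {
  if (!file.exists(path)) {
    stop("Model file not found: ", path)
  }
  readRDS(path)
}

# Assumed local file name; adjust to wherever the download was saved.
model_path <- "CancerType33_InfHum3_20230807.rds"
if (file.exists(model_path)) {
  model <- load_local_model(model_path)
  # The loaded object can then be passed to cmi_predict(betas, model).
}
```

Checking for the file first keeps the failure message specific, which helps when the path is simply mistyped.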
The snippet below demonstrates the model abstraction: the same cmi_predict call works on the random forest and support vector models that CytoMethIC hosts on ExperimentHub.
library(sesame)
library(CytoMethIC)
library(ExperimentHub)
## impute missing beta values before prediction
betas = imputeBetas(sesameDataGet("HM450.1.TCGA.PAAD")$betas)
model = ExperimentHub()[["EH8395"]] # CancerType33_rfc, random forest model
cmi_predict(betas, model)
## $response
## [1] "PAAD"
##
## $prob
## PAAD
## 0.852
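The next result appears to come from the support vector model mentioned above; the call that produced it is not shown. A sketch of that call, assuming the CancerType33_svm entry (EH8396) from the table and the same imputed betas:

```r
library(sesame)
library(CytoMethIC)
library(ExperimentHub)

# Same imputed PAAD example data as in the random forest call
betas = imputeBetas(sesameDataGet("HM450.1.TCGA.PAAD")$betas)

# Assumption: EH8396 (CancerType33_svm in the table) is the SVM model meant here
model = ExperimentHub()[["EH8396"]]
cmi_predict(betas, model)
```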
## $response
## [1] "PAAD"
##
## $prob
## betas[, attr(model$terms, "term.labels")]
## 0.9864795
model = ExperimentHub()[["EH8422"]] # Cancer subtype
cmi_predict(sesameDataGet("HM450.1.TCGA.PAAD")$betas, model)
## $response
## [1] "GI.CIN"
##
## $prob
## GI.CIN
## 0.462
The below snippet shows a demonstration of the cmi_predict function working to predict the cell of origin of the cancer.
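The call behind the output below is not shown. A sketch of it, assuming the CancerCellOfOrigin21_rfc model (EH8423 in the table) and the same PAAD example data used for the subtype model:

```r
library(sesame)
library(CytoMethIC)
library(ExperimentHub)

# Assumption: EH8423 (CancerCellOfOrigin21_rfc in the table) is the model used here
model = ExperimentHub()[["EH8423"]]
cmi_predict(sesameDataGet("HM450.1.TCGA.PAAD")$betas, model)
```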
## $response
## [1] "C20:Mixed (Stromal/Immune)"
##
## $prob
## C20:Mixed (Stromal/Immune)
## 0.768
sessionInfo()
## R version 4.5.0 beta (2025-04-02 r88102)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] sesame_1.27.0 sesameData_1.27.0 CytoMethIC_1.5.0
## [4] ExperimentHub_2.17.0 AnnotationHub_3.17.0 BiocFileCache_2.17.0
## [7] dbplyr_2.5.0 BiocGenerics_0.55.0 generics_0.1.3
## [10] knitr_1.50
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 dplyr_1.1.4
## [3] blob_1.2.4 filelock_1.0.3
## [5] Biostrings_2.77.0 fastmap_1.2.0
## [7] digest_0.6.37 mime_0.13
## [9] lifecycle_1.0.4 KEGGREST_1.49.0
## [11] RSQLite_2.3.9 magrittr_2.0.3
## [13] compiler_4.5.0 rlang_1.1.6
## [15] sass_0.4.10 tools_4.5.0
## [17] yaml_2.3.10 S4Arrays_1.9.0
## [19] bit_4.6.0 curl_6.2.2
## [21] DelayedArray_0.35.0 plyr_1.8.9
## [23] RColorBrewer_1.1-3 abind_1.4-8
## [25] BiocParallel_1.43.0 withr_3.0.2
## [27] purrr_1.0.4 grid_4.5.0
## [29] stats4_4.5.0 preprocessCore_1.71.0
## [31] wheatmap_0.2.0 e1071_1.7-16
## [33] colorspace_2.1-1 ggplot2_3.5.2
## [35] scales_1.3.0 SummarizedExperiment_1.39.0
## [37] cli_3.6.4 rmarkdown_2.29
## [39] crayon_1.5.3 reshape2_1.4.4
## [41] httr_1.4.7 tzdb_0.5.0
## [43] proxy_0.4-27 DBI_1.2.3
## [45] cachem_1.1.0 stringr_1.5.1
## [47] parallel_4.5.0 AnnotationDbi_1.71.0
## [49] BiocManager_1.30.25 XVector_0.49.0
## [51] matrixStats_1.5.0 vctrs_0.6.5
## [53] Matrix_1.7-3 jsonlite_2.0.0
## [55] IRanges_2.43.0 hms_1.1.3
## [57] S4Vectors_0.47.0 bit64_4.6.0-1
## [59] fontawesome_0.5.3 jquerylib_0.1.4
## [61] glue_1.8.0 codetools_0.2-20
## [63] stringi_1.8.7 gtable_0.3.6
## [65] BiocVersion_3.22.0 GenomeInfoDb_1.45.0
## [67] GenomicRanges_1.61.0 UCSC.utils_1.5.0
## [69] munsell_0.5.1 tibble_3.2.1
## [71] pillar_1.10.2 rappdirs_0.3.3
## [73] htmltools_0.5.8.1 randomForest_4.7-1.2
## [75] GenomeInfoDbData_1.2.14 R6_2.6.1
## [77] evaluate_1.0.3 Biobase_2.69.0
## [79] lattice_0.22-7 readr_2.1.5
## [81] png_0.1-8 memoise_2.0.1
## [83] BiocStyle_2.37.0 bslib_0.9.0
## [85] class_7.3-23 Rcpp_1.0.14
## [87] SparseArray_1.9.0 xfun_0.52
## [89] MatrixGenerics_1.21.0 pkgconfig_2.0.3