We use most of the functions described above to show how to perform inference with the various algorithms; readers should first read those sections of the vignette for an explanation of how the functions work. The aCML dataset is used as a test case for all algorithms, even though it is best suited to algorithms that infer ensemble-level progression models.

To replicate the plots of the original paper, where the aCML dataset was first analyzed with CAPRI, we can change the color assigned to each type of event with the function change.color.

dataset = change.color(aCML, 'Ins/Del', 'dodgerblue4')
dataset = change.color(dataset, 'Missense point', '#7FC97F')
as.colors(dataset)
##          Ins/Del   Missense point Nonsense Ins/Del   Nonsense point 
##    "dodgerblue4"        "#7FC97F"        "#FDC086"        "#fab3d8"

0.0.0.1 Data consolidation.

All TRONCO algorithms require an input dataset where every event has a probability strictly between 0 and 1, and where all events are distinguishable from one another. The tool provides a function that returns the lists of events that do not satisfy these constraints.

consolidate.data(dataset)
## $indistinguishable
## list()
## 
## $zeroes
## list()
## 
## $ones
## list()

The aCML data has none of the above issues (the call returns empty lists); if this were not the case, the data manipulation functions could be used to edit the TRONCO object.
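The three constraints can be illustrated on a plain binary matrix. The following is a minimal sketch in base R, assuming a hypothetical toy matrix `m` with samples as rows and events as columns; it is not TRONCO's implementation of consolidate.data, only an illustration of what each of the three lists is meant to capture.

```r
# Toy binary matrix (hypothetical data): rows are samples, columns are events.
m <- matrix(c(1, 0, 1, 0, 1,
              1, 0, 1, 1, 1,
              1, 0, 0, 1, 0,
              1, 0, 0, 0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("s", 1:4), c("A", "B", "C", "D", "E")))

# Marginal frequency of each event across samples.
freq <- colMeans(m)

# Events observed in every sample (probability one).
ones <- names(freq)[freq == 1]

# Events never observed (probability zero).
zeroes <- names(freq)[freq == 0]

# Events whose column is identical to that of an earlier event.
# duplicated() compares rows, so the matrix is transposed first.
indistinguishable <- colnames(m)[duplicated(t(m))]

print(list(indistinguishable = indistinguishable,
           zeroes = zeroes,
           ones = ones))
```

In this toy matrix, event A is present in every sample, event B in none, and event E has the same signature as event C, so each of the three lists contains exactly one event.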

0.1 CAPRI

In what follows, we show how CAPRI works by replicating the aCML case study presented in CAPRI’s original paper. Regardless of which types of mutations we include, we select only the genes mutated in at least 5% of the patients. We therefore first use as.alterations to obtain gene-level frequencies, and then apply a frequency filter to the result.

alterations = events.selection(as.alterations(aCML), filter.freq = .05)
## *** Aggregating events of type(s) { Ins/Del, Missense point, Nonsense Ins/Del, Nonsense point }
## in a unique event with label " Alteration ".
## Dropping event types Ins/Del, Missense point, Nonsense Ins/Del, Nonsense point for 23 genes.
## .......................
## *** Binding events for 2 datasets.
## *** Events selection: #events =  23 , #types =  1 Filters freq|in|out = { TRUE ,  FALSE ,  FALSE }
## Minimum event frequency:  0.05  ( 3  alterations out of  64  samples).
## .......................
## Selected  7  events.
## 
## Selected  7  events, returning.
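The two steps above (collapsing event types into a single gene-level alteration, then filtering by frequency) can be sketched in base R. This is an illustration under stated assumptions, not TRONCO's code: the matrices `missense` and `indel` are hypothetical toy data, and the use of `>=` at the threshold is an assumption about how the boundary case is treated.

```r
# Hypothetical per-type binary matrices (samples x genes); not the aCML data.
n <- 40
missense <- matrix(0, n, 3, dimnames = list(NULL, c("X", "Y", "Z")))
indel <- missense

missense[1, "X"] <- 1       # X: altered in 1 of 40 samples
missense[1:3, "Y"] <- 1     # Y: missense in samples 1 to 3
indel[3:4, "Y"] <- 1        # Y: ins/del in samples 3 and 4
missense[1:10, "Z"] <- 1    # Z: altered in 10 of 40 samples

# Gene-level alteration: a gene counts as altered in a sample
# if at least one event type hits it in that sample.
altered <- (missense + indel) > 0

# Fraction of samples in which each gene is altered.
freq <- colMeans(altered)

# Keep genes altered in at least 5% of the samples (assumed inclusive bound).
selected <- names(freq)[freq >= 0.05]
print(selected)
```

Gene Y shows why aggregation matters: sample 3 carries both a missense and an ins/del event, yet it contributes only once to the gene-level frequency.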

To proceed with the example, we create the dataset to be used for the inference of the model. From the original dataset we select all the genes that are mutated in at least 5% of the samples, which we obtain from the alteration profiles. We also force the inclusion of all the events for the genes involved in a hypothesis; these genes are listed in the variable gene.hypotheses, which is based on support found in the literature for potential aCML patterns.

gene.hypotheses = c('KRAS', 'NRAS', 'IDH1', 'IDH2', 'TET2', 'SF3B1', 'ASXL1')
aCML.clean = events.selection(aCML,
    filter.in.names=c(as.genes(alterations), gene.hypotheses))
## *** Events selection: #events =  31 , #types =  4 Filters freq|in|out = { FALSE ,  TRUE ,  FALSE }
## [filter.in] Genes hold:  TET2, EZH2, CBL, ASXL1, SETBP1  ...  [ 10 / 14  found].
## Selected  17  events, returning.
aCML.clean = annotate.description(aCML.clean, 
    'CAPRI - Bionformatics aCML data (selected events)')

We show a new oncoprint of this latest dataset where we annotate the genes in gene.hypotheses in order to identify them. The sample names are also shown.

oncoprint(aCML.clean, gene.annot = list(priors = gene.hypotheses), sample.id = TRUE)
## *** Oncoprint for "CAPRI - Bionformatics aCML data (selected events)"
## with attributes: stage = FALSE, hits = TRUE
## Sorting samples ordering to enhance exclusivity patterns.
## Annotating genes with RColorBrewer color palette Set1 .
## Setting automatic row font (exponential scaling): 10.7
## Setting automatic samples font half of row font: 5.3