Small variants within the genome (single nucleotide variants, insertions, and deletions) are a critical component of the basis for genetic diseases. Identifying and summarizing these variants is often a first step toward developing hypotheses about their role in disease genesis and progression. The waterfall function is designed to efficiently summarize “small variant” (SNV/indel) information at the cohort level. It is useful for obtaining a broad sense of the types of variants observed in a cohort. Further, waterfall gives a sense of the mutation burden, recurrently mutated genes, mutual exclusivity or co-occurrence between genes, and the relation of variants to clinical data.
The purpose of this vignette is to display the many features of the waterfall function in order to give an in-depth view of its parameters and functionality. For these examples the data frame brcaMAF, which originates from a truncated .maf file from TCGA and is available within GenVisR, will be used unless otherwise stated. Further, for reproducibility the seed for all examples has been set to 426.
For basic use, a user only needs to read a file of the proper type into R as a data frame and then supply this data frame to the waterfall function as the argument x. By default the data frame supplied is expected to correspond to a file in .maf (version 2.4) format (see below for additional supported formats). This data frame should have, at a minimum, the column names “Tumor_Sample_Barcode”, “Hugo_Symbol”, and “Variant_Classification”, and contain rows corresponding to mutation events. While any value is permissible in the “Tumor_Sample_Barcode” and “Hugo_Symbol” columns, which correspond to a sample name and gene name respectively, specific values are expected in the “Variant_Classification” column (see table below). This is because waterfall can display only a single variant type in the main plot for each cell (i.e. each gene/sample combination). To achieve this, waterfall plots the most deleterious variant based on a hierarchy predefined for a .maf file. This hierarchy follows the order, from top to bottom, of the legend output with the plot.
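The idea of keeping one variant per gene/sample cell can be sketched in base R. This is only an illustration of the concept, not the package's internal code: the four-level ordering in `hierarchy`, the toy data, and the helper names `toy_maf` and `collapsed` are all assumptions made for this example, and the real hierarchy is the one shown in the plot legend.

```r
# Illustrative ordering only (most to least deleterious). This is an
# assumption for the sketch, not GenVisR's actual predefined hierarchy.
hierarchy <- c("Nonsense_Mutation", "Frame_Shift_Del",
               "Missense_Mutation", "Silent")

# Toy mutation table (made-up values) with the three required column names.
toy_maf <- data.frame(
  Tumor_Sample_Barcode   = c("S1", "S1", "S2", "S2"),
  Hugo_Symbol            = c("TP53", "TP53", "TP53", "PIK3CA"),
  Variant_Classification = c("Silent", "Nonsense_Mutation",
                             "Missense_Mutation", "Silent"),
  stringsAsFactors = FALSE
)

# Rank each variant by its position in the hierarchy (1 = most deleterious).
toy_maf$rank <- match(toy_maf$Variant_Classification, hierarchy)

# Sort by rank, then keep the first row seen for each sample/gene cell,
# which is the most deleterious variant in that cell.
ordered_maf <- toy_maf[order(toy_maf$rank), ]
cell_id <- paste(ordered_maf$Tumor_Sample_Barcode, ordered_maf$Hugo_Symbol)
collapsed <- ordered_maf[!duplicated(cell_id), ]

print(collapsed[, c("Tumor_Sample_Barcode", "Hugo_Symbol",
                    "Variant_Classification")])
```

In this toy table sample S1 carries both a silent and a nonsense variant in TP53, and only the nonsense variant survives the collapse because it sits higher in the assumed ordering.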
# Load the GenVisR package
library("GenVisR")
set.seed(426)

# Plot with the MAF file type specified (default). The mainRecurCutoff parameter
# is described in the next section.
waterfall(brcaMAF, fileType = "MAF", mainRecurCutoff = 0.05)
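For users starting from their own file rather than the bundled data, the "read a file into R as a data frame" step might look like the following minimal sketch. The file contents, the temporary path, and the names `maf_path`, `maf_df`, and `required_cols` are hypothetical and exist only for illustration; a real .maf file contains many more columns.

```r
# Write a tiny tab-delimited stand-in for a .maf file (made-up values).
maf_path <- tempfile(fileext = ".maf")
writeLines(c(
  "Hugo_Symbol\tTumor_Sample_Barcode\tVariant_Classification",
  "TP53\tSample_A\tMissense_Mutation",
  "PIK3CA\tSample_B\tNonsense_Mutation"
), maf_path)

# Read the file into a data frame, skipping any '#' comment lines.
maf_df <- read.delim(maf_path, comment.char = "#", stringsAsFactors = FALSE)

# Confirm the minimum set of column names is present before plotting.
required_cols <- c("Tumor_Sample_Barcode", "Hugo_Symbol",
                   "Variant_Classification")
missing_cols <- setdiff(required_cols, colnames(maf_df))
if (length(missing_cols) > 0) {
  stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
}

# With GenVisR loaded, the data frame would then be supplied as x:
# waterfall(maf_df, fileType = "MAF")
```

Checking the column names up front is a design choice for this sketch: it surfaces a misnamed column with a clear message before any plotting is attempted.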