Assuming your matrix is stored in an object called mat, performing consensus partitioning with cola only requires running the following code:
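A minimal sketch of the workflow, assuming the standard cola functions adjust_matrix(), run_all_consensus_partition_methods(), and cola_report(); the output directory name is a placeholder:

```r
library(cola)

# adjust the matrix: remove rows with too many NAs, impute remaining NAs,
# drop very-low-variance rows, and adjust outliers in each row
mat = adjust_matrix(mat)

# run consensus partitioning with all combinations of top-value
# and partitioning methods
rl = run_all_consensus_partition_methods(mat)

# generate an HTML report for the complete analysis
cola_report(rl, output_dir = "cola_report")
```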
In the above code, there are three steps: adjusting the matrix, running consensus partitioning with multiple methods, and generating the report. When adjusting the matrix:

- Rows with too many NA values are removed; the remaining NA values are imputed if less than 50% of the values in a row are NA.
- Rows with very low variance are removed.
- Outliers are adjusted in each row.
By default, the partitioning methods are:

- hclust() (hierarchical clustering, with cutree()),
- skmeans::skmeans() (spherical k-means clustering),
- cluster::pam() (partitioning around medoids), and
- mclust::Mclust() (model-based clustering).

The default methods to extract the top n rows are:

- SD (standard deviation),
- CV (coefficient of variation),
- MAD (median absolute deviation), and
- ATC (ability to correlate to other rows).
run_all_consensus_partition_methods() runs multiple methods in sequence, which might take a long time for big datasets. Users can also run consensus partitioning with a specific top-value method (e.g. SD) and partitioning method (e.g. skmeans) via the consensus_partition() function.
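A sketch of such a single-combination run, assuming the top_value_method and partition_method arguments of consensus_partition():

```r
library(cola)

# run one combination: SD as the top-value method and
# skmeans as the partitioning method
res = consensus_partition(mat,
    top_value_method = "SD",
    partition_method = "skmeans")
```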
For extremely large datasets, users can run consensus_partition_by_down_sampling(), which performs classification on a randomly sampled subset of samples; the classes of the remaining samples are then predicted from the signatures of the cola classification. More details can be found in the vignette “Work with Big Datasets”.
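A sketch of a down-sampled run; the subset argument (the number of randomly sampled samples) and the value 500 are illustrative assumptions:

```r
library(cola)

# classify on a random subset of 500 samples; the classes of the
# remaining samples are predicted from the subset's signatures
res = consensus_partition_by_down_sampling(mat,
    subset = 500,
    top_value_method = "SD",
    partition_method = "skmeans")
```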
Examples of cola analysis on real datasets can be found at https://jokergoo.github.io/cola_collection/.