Abstract
This Quick-Start is a runnable example showing the functionalities of the SpliceWiz workflow.
Version 1.6.2
SpliceWiz is a graphical interface for differential alternative splicing and visualization in R. It differs from other alternative splicing tools as it is designed for users with basic bioinformatic skills to analyze datasets containing up to hundreds of samples! SpliceWiz contains a number of innovations including:
This vignette is a runnable working example of the SpliceWiz workflow. The purpose is to quickly demonstrate the basic functionalities of SpliceWiz.
We provide here a brief outline of the workflow for users to get started as quickly as possible. For those wishing to know more, we also provide additional details: many sections contain optional extra information, such as the following:
What are the system memory requirements for running SpliceWiz?
We recommend the following memory requirements (RAM) for running various steps of SpliceWiz:
buildRef()
processBAM()
collateData()
lowMemoryMode=TRUE: 32 gigabytes
lowMemoryMode=FALSE: 8 gigabytes per thread
Differential analysis
How does SpliceWiz measure alternative splicing?
SpliceWiz defines alternative splicing events (ASEs) as binary events between two possibilities, the included and excluded isoforms. It detects and measures: skipped (cassette) exons (SE), mutually exclusive exons (MXE), alternative 5’/3’ splice site usage (A5SS / A3SS), alternate first / last exon usage (AFE / ALE), and retained introns (IR or RI).
SpliceWiz uses splice-specific read counts to measure ASEs, namely junction reads (reads that align across splice sites). The exception is intron retention (IR), for which the (trimmed) mean read depth across the intron is measured (identical to the method used in IRFinder).
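To illustrate why a trimmed mean is preferred for intron depth, the base-R sketch below compares a plain mean with a trimmed mean on a made-up per-nucleotide coverage vector. The coverage values and the 10% trim fraction are illustrative assumptions, not the settings actually used by SpliceWiz or IRFinder.

```r
# Made-up per-nucleotide read depth across a short intron.
# The final position carries an outlier spike (e.g. from an unannotated
# feature inside the intron) that would inflate a plain mean.
intron_depth <- c(10, 12, 11, 9, 10, 13, 11, 10, 12, 95)

# Plain mean: pulled upwards by the single outlier position
plain_mean <- mean(intron_depth)

# Trimmed mean: discard the lowest and highest 10% of positions, then average.
# The 10% trim fraction is an assumption chosen for illustration only.
trimmed_depth <- mean(intron_depth, trim = 0.1)

print(c(plain = plain_mean, trimmed = trimmed_depth))
```

Here the single spike raises the plain mean to 19.3, while the trimmed mean (11.125) stays close to the typical coverage across the intron.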
SpliceWiz provides two metrics:
SpliceOver or SpliceMax method (the latter is identical to that used in IRFinder)
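For orientation, the sketch below computes two standard ratios: the percent spliced-in (PSI) of a binary event, and an IR-ratio with the IRFinder-style "SpliceMax" denominator (intron depth divided by intron depth plus the larger of the junction read counts at the intron's left and right borders). The helper names `psi()` and `ir_ratio_splicemax()` are hypothetical; this is not SpliceWiz's implementation, and the SpliceOver variant is not shown.

```r
# Percent spliced-in (PSI) of a binary ASE: the share of informative
# reads that support the included isoform.
psi <- function(included, excluded) {
    included / (included + excluded)
}

# IR-ratio with an IRFinder-style ("SpliceMax") denominator:
# intron depth / (intron depth + the larger of the junction read counts
# at the intron's left and right borders). Vectorised via pmax().
ir_ratio_splicemax <- function(intron_depth, splice_left, splice_right) {
    intron_depth / (intron_depth + pmax(splice_left, splice_right))
}

psi(included = 30, excluded = 10)                                          # 0.75
ir_ratio_splicemax(intron_depth = 10, splice_left = 40, splice_right = 30) # 0.2
```

Both helpers accept vectors, so they can be applied to many events at once.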
Does SpliceWiz detect novel splicing events?
Novel splicing events are those in which at least one isoform is not an annotated transcript in the given gene annotation. SpliceWiz DOES detect novel splicing events.
It detects novel events from novel junctions, by pairing junctions that originate from or terminate at a common coordinate (novel alternate splice site usage).
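A rough base-R sketch of the pairing idea: given a hypothetical table of junctions (some annotated, some novel), junctions sharing a start coordinate are paired, and pairs containing at least one unannotated junction are kept as candidate novel alternative splice site events. The column names and coordinates are made up, this is not SpliceWiz's internal code, and pairs sharing an end coordinate (handled analogously) are omitted.

```r
# Hypothetical junction table (1-based intron coordinates, one chromosome).
# 'annotated' marks whether the junction exists in the gene annotation.
junctions <- data.frame(
    id        = c("J1", "J2", "J3", "J4"),
    start     = c(1000, 1000, 5000, 7000),
    end       = c(2000, 2150, 6000, 8000),
    annotated = c(TRUE, FALSE, TRUE, TRUE),
    stringsAsFactors = FALSE
)

# Self-join on the start coordinate: junctions leaving the same splice site
pair_by_start <- merge(junctions, junctions, by = "start",
                       suffixes = c(".a", ".b"))

# Keep each unordered pair once and drop self-pairs
pair_by_start <- pair_by_start[pair_by_start$id.a < pair_by_start$id.b, ]

# A pair is a candidate novel event if at least one junction is unannotated
novel_pairs <- pair_by_start[!pair_by_start$annotated.a |
                             !pair_by_start$annotated.b, ]

novel_pairs[, c("id.a", "id.b", "start", "end.a", "end.b")]
```

Only J1 and J2 share a start, and because J2 is unannotated the pair is reported as a candidate novel alternative splice site event. Which member is the "alternative 5′" or "alternative 3′" site depends on strand, which this sketch ignores.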
Additionally, SpliceWiz detects “tandem junction reads”. These are reads that span two or more splice junctions. The region between two such junctions can then be annotated as a novel exon (if it is not identical to an annotated exon). These novel exons can then be used to measure novel cassette exon usage.
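The sketch below shows, in base R, how the exon implied by a tandem junction read could be derived and compared against annotated exons. It assumes junctions are given as 1-based, inclusive intron coordinates (first and last intronic base); the coordinates and the annotated exon table are hypothetical, and this is not SpliceWiz's actual implementation.

```r
# A hypothetical tandem junction read spanning two junctions
# (1-based, inclusive intron coordinates).
junction1 <- c(start = 1000, end = 2000)
junction2 <- c(start = 2101, end = 3000)

# The bases between the two introns form the candidate exon
candidate_exon <- c(start = junction1[["end"]] + 1,
                    end   = junction2[["start"]] - 1)

# Hypothetical annotated exons on the same chromosome
annotated_exons <- data.frame(start = c(900, 3001), end = c(999, 3200))

# Novel if no annotated exon has exactly the same boundaries
is_novel <- !any(annotated_exons$start == candidate_exon[["start"]] &
                 annotated_exons$end   == candidate_exon[["end"]])

print(candidate_exon)
print(is_novel)
```

Here the candidate exon spans positions 2001 to 2100 and matches no annotated exon, so it would be treated as a novel exon.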
The basic steps of SpliceWiz are as follows:
To install SpliceWiz, start R (devel version) and enter:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("SpliceWiz")
Setting up a Conda environment to use SpliceWiz
For those wishing to set up a self-contained environment with SpliceWiz installed (e.g. on a high-performance cluster), we recommend using miniconda. For installation instructions, see the documentation on how to install miniconda.
After installing miniconda, create a conda environment as follows:
After following the prompts, activate the environment:
Next, install R 4.2.1 as follows:
NB: We have not been able to successfully use r-base=4.3, so we recommend using r-base=4.2.1 (until further notice).
Many of SpliceWiz’s dependencies are up-to-date from the conda-forge channel, so they are best installed via conda:
conda install -c conda-forge r-devtools r-essentials r-xml r-biocmanager \
r-fst r-plotly r-rsqlite r-rcurl
After this is done, the remainder of the packages need to be installed from the R terminal. This is because most Bioconductor packages are from the bioconda channel and appear not to be routinely updated.
So, let's enter the R terminal from the command line:
Set up Bioconductor 3.16 (which is the latest version compatible with R 4.2):
Again, follow the prompts to update any necessary packages.
Once this is done, install SpliceWiz (devel) from GitHub:
The last step will install any remaining dependencies, taking approximately 20-30 minutes depending on your system.
Enabling OpenMP (multi-threading) for MacOS users (Optional)
For MacOS users, make sure OpenMP libraries are installed correctly. We recommend users follow this guide, but the quickest way to get started is to install libomp via brew:
Installing statistical package dependencies (Optional)
SpliceWiz uses established statistical tools to perform alternative splicing differential analysis:
To install all of these packages:
The NxtIRFdata data package contains the example “chrZ” genome / annotations and 6 example BAM files that are used in this working example. NxtIRFdata also provides pre-generated mappability exclusion annotations for building human and mouse SpliceWiz references.
SpliceWiz offers a graphical user interface (GUI) for interactive users, e.g. in the RStudio environment. To start using SpliceWiz GUI:
Why do we need the SpliceWiz reference?
SpliceWiz first needs to generate a set of reference files. The SpliceWiz reference is used to quantitate alternative splicing in BAM files, as well as in downstream collation, differential analysis and visualisation.
Using the example FASTA and GTF files, use the buildRef() function to build the SpliceWiz reference:
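library(SpliceWiz)

# Build the SpliceWiz reference in a temporary directory
ref_path <- file.path(tempdir(), "Reference")
buildRef(
    reference_path = ref_path,
    fasta = chrZ_genome(),       # example genome (FASTA) from NxtIRFdata
    gtf = chrZ_gtf(),            # example transcriptome (GTF) from NxtIRFdata
    ontologySpecies = "Homo sapiens"
)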
The SpliceWiz reference can be viewed as data frames using various getter functions. For example, to view the annotated alternative splicing events (ASE):
See ?View-Reference-methods for a comprehensive list of getter functions.
Using the GUI
After starting the SpliceWiz GUI in demo mode, click the Reference tab in the menu sidebar. The following interface will be shown:
Click Load Demo FASTA/GTF (5), and then click Build Reference (6).
Where did the FASTA and GTF files come from?
The helper functions chrZ_genome() and chrZ_gtf() return the paths to the example genome (FASTA) and transcriptome (GTF) files included in the NxtIRFdata package, which contains the working example used by SpliceWiz:
What is the gene ontology species?
SpliceWiz supports gene ontology analysis. To enable this capability, we first need to generate the gene ontology annotations for the appropriate species.
To see a list of supported species:
getAvailableGO()
#> [1] "Triticum aestivum"
#> [2] "Triticum aestivum_subsp._aestivum"
#> [3] "Triticum vulgare"
#> [4] "Brassica napus"
#> [5] "Arachis hypogaea"
#> [6] "Hibiscus syriacus"
#> [7] "Acridium cancellatum"
#> [8] "Schistocerca cancellata"
#> [9] "Triticum dicoccoides"
#> [10] "Triticum turgidum_subsp._dicoccoides"
#> [11] "Triticum turgidum_var._dicoccoides"
#> [12] "Dendrohyas sarda"
#> [13] "Hyla arborea_sarda"
#> [14] "Hyla sarda"
#> [15] "Locusta gregaria"
#> [16] "Schistocerca gregaria"
#> [17] "Gossypium hirsutum"
#> [18] "Gossypium hirsutum_subsp._mexicanum"
#> [19] "Gossypium lanceolatum"
#> [20] "Gossypium purpurascens"
#> [21] "Camelina sativa"
#> [22] "Carassius auratus_gibelio"
#> [23] "Carassius gibelio_gibelio"
#> [24] "Carassius gibelio"
#> [25] "Carassius gibelio_subsp._gibelio"
#> [26] "Cyprinus gibelio"
#> [27] "Schistocerca piceifrons"
#> [28] "Papaver somniferum"
#> [29] "Zingiber officinale"
#> [30] "Trichomonas vaginalis_G3"
#> [31] "Trichomonas vaginalis_strain_G3"
#> [32] "Carassius auratus"
#> [33] "Carassius carassius_auratus"
#> [34] "Cyprinus auratus"
#> [35] "Helianthus annuus"
#> [36] "Schistocerca americana"
#> [37] "Acipenser ruthenus"
#> [38] "Schistocerca serialis_cubense"
#> [39] "Panicum virgatum"
#> [40] "Nicotiana tabacum"
#> [41] "Oncorhynchus mykiss"
#> [42] "Oncorhynchus nerka_mykiss"
#> [43] "Parasalmo mykiss"
#> [44] "Salmo mykiss"
#> [45] "Schistocerca nitens"
#> [46] "Schistocerca vaga"
#> [47] "Salvia splendens"
#> [48] "Carassius carassius"
#> [49] "Cyprinus carassius"
#> [50] "Vicia villosa"
#> [51] "Camellia sinensis"
#> [52] "Thea sinensis"
#> [53] "Oncorhynchus keta"
#> [54] "Salmo keta"
#> [55] "Pisum sativum"
#> [56] "Salmo salar"
#> [57] "Raphanus sativus"
#> [58] "Oncorhynchus kisutch"
#> [59] "Oncorhyncus kisutch"
#> [60] "Salmo kisatch"
#> [61] "Lolium rigidum"
#> [62] "Aegilops squarrosa_subsp._squarrosa"
#> [63] "Aegilops squarrosa"
#> [64] "Aegilops tauschii"
#> [65] "Patropyrum tauschii_subsp._tauschii"
#> [66] "Patropyrum tauschii"
#> [67] "Triticum aegilops"
#> [68] "Triticum tauschii"
#> [69] "Salmo trutta"
#> [70] "Cryptomeria japonica"
#> [71] "Coregonus clupeaformis"
#> [72] "Salmo clupeaformis"
#> [73] "Oncorhynchus gorbuscha"
#> [74] "Salmo gorbuscha"
#> [75] "Cyprinus carpio"
#> [76] "Glycine max_subsp._soja"
#> [77] "Glycine soja"
#> [78] "Salmo fontinalis"
#> [79] "Salvelinus fontinalis"
#> [80] "Glycine max"
#> [81] "Phaseolus max"
#> [82] "Chenopodium quinoa"
#> [83] "Hordeum sativum"
#> [84] "Hordeum vulgare_subsp._vulgare"
#> [85] "Hordeum vulgare_var._nudum"
#> [86] "Hordeum vulgare_var._vulgare"
#> [87] "Festuca perennis_(L.)_Columbus_&_J.P.Sm.,_2010"
#> [88] "Festuca perennis"
#> [89] "Lolium perenne"
#> [90] "Lolium vulgare"
#> [91] "Coffea arabica"
#> [92] "Barbus grahami&qu