Introduction

SpliceWiz is a graphical interface for differential alternative splicing analysis and visualization in R. Unlike other alternative splicing tools, it is designed so that users with basic bioinformatics skills can analyze datasets containing up to hundreds of samples. SpliceWiz contains a number of innovations, including:

  • Super-fast handling of alignment BAM files using ompBAM, our developer resource for multi-threaded BAM processing
  • Alternative splicing event (ASE) filters to remove problematic ASEs from analysis
  • Group-averaged coverage plots: publication-ready figures to clearly visualize differential alternative splicing between biological / experimental conditions
  • Interactive figures, including scatter and volcano plots, gene ontology (GO) analysis, heatmaps, and scrollable coverage plots, powered by the shinydashboard interface

This vignette is a runnable working example of the SpliceWiz workflow. The purpose is to quickly demonstrate the basic functionalities of SpliceWiz.

We provide here a brief outline of the workflow for users to get started as quickly as possible. However, we also provide more details for those wishing to know more. Many sections will contain extra information that can be displayed when clicked on, such as these:

Click on me for more details
In most sections, we offer more details about each step of the workflow, which can be revealed in text segments like this one. Be sure to click on buttons like these, where available.


FAQ

What are the system memory requirements for running SpliceWiz?
We recommend the following memory requirements (RAM) for running various steps of SpliceWiz:

buildRef()

  • Building human / mouse SpliceWiz reference: 8 gigabytes

processBAM()

  • Processing small alignment (BAM) files (~20 million paired end reads): 8 gigabytes
  • Processing large BAM files (~100 million paired end reads): 16 gigabytes

collateData()

  • Collating routine experiments (e.g. 3 replicates, 2 conditions): 8-16 gigabytes
  • Collating large experiments (20+ samples, using lowMemoryMode=TRUE): 32 gigabytes
  • Collating large experiments (20+ samples, using lowMemoryMode=FALSE): 8 gigabytes per thread
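
As a rough planning aid, the collateData() figures above can be encoded in a small helper. This is only a sketch: `suggest_collate_ram_gb()` is a hypothetical function (not part of SpliceWiz), and it simply restates the guidance in this FAQ, taking the upper end of the 8-16 gigabyte range for routine experiments.

```r
# Hypothetical helper (not part of the SpliceWiz API): returns the RAM
# guidance quoted in this FAQ for collateData(), in gigabytes.
suggest_collate_ram_gb <- function(n_samples, low_memory_mode = TRUE,
                                   n_threads = 1) {
    stopifnot(n_samples >= 1, n_threads >= 1)
    if (n_samples < 20) {
        return(16)      # routine experiment: upper end of the 8-16 GB range
    }
    if (low_memory_mode) {
        return(32)      # large experiment, lowMemoryMode = TRUE
    }
    8 * n_threads       # large experiment, lowMemoryMode = FALSE
}

ram_routine  <- suggest_collate_ram_gb(6)
ram_low_mem  <- suggest_collate_ram_gb(24, low_memory_mode = TRUE)
ram_threaded <- suggest_collate_ram_gb(24, low_memory_mode = FALSE,
                                       n_threads = 4)
```

Under these assumptions, a 24-sample experiment collated with 4 threads and lowMemoryMode=FALSE needs about the same memory as a lowMemoryMode=TRUE run.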

Differential analysis

  • Differential analysis (routine experiments): 8 gigabytes
  • Differential analysis (large experiments - 20+ samples): 16 gigabytes
  • DESeq2-based differential analysis of large experiments: 32 gigabytes


How does SpliceWiz measure alternative splicing?

SpliceWiz defines alternative splicing events (ASEs) as binary events between two possibilities, the included and excluded isoforms. It detects and measures: skipped (cassette) exons (SE), mutually exclusive exons (MXE), alternative 5’/3’ splice site usage (A5SS / A3SS), alternative first / last exon usage (AFE / ALE), and retained introns (IR or RI).

SpliceWiz uses splice-specific read counts to measure ASEs. Namely, these are junction reads (reads that align across splice sites). The exception is intron retention (IR) whereby the (trimmed) mean read depth across the intron is measured (identical to the method used in IRFinder).
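
To illustrate why a trimmed mean is useful for intron depth, the sketch below compares a plain and a trimmed mean on toy per-base depths. The depth values and the trim fraction of 0.25 are assumptions chosen for illustration; they are not the values SpliceWiz or IRFinder use internally.

```r
# Toy per-base read depth across one intron. The single spike mimics a
# feature (e.g. an expressed element inside the intron) that would
# inflate a plain mean.
intron_depth <- c(4, 5, 5, 6, 5, 4, 250, 6)

plain_mean <- mean(intron_depth)

# trim = 0.25 discards the lowest and highest 25% of positions before
# averaging. The fraction is an assumption for this example only.
trimmed_mean <- mean(intron_depth, trim = 0.25)

print(c(plain = plain_mean, trimmed = trimmed_mean))
```

The trimmed estimate stays close to the background depth of the intron, whereas the plain mean is dominated by the spike.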

SpliceWiz provides two metrics:

  • Percent spliced in (PSI): the expression of the included isoform as a proportion of the combined expression of the included and excluded isoforms. PSIs are measured for all types of alternative splicing, including annotated retained introns (RI)
  • IR-ratio: for introns, we also measure the IR-ratio, which is the expression of the IR-transcript as a proportion of the IR- and spliced-transcripts. Spliced transcript expression is measured using either the SpliceOver or the SpliceMax method (the latter is identical to that used in IRFinder)
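
Both metrics are simple proportions, as the following worked sketch shows. The helper names `psi()` and `ir_ratio()` and all counts are hypothetical. For the spliced abundance we assume the SpliceMax convention from IRFinder, i.e. the larger of the junction read counts sharing the intron's 5' or 3' splice site; SpliceOver is not shown.

```r
# Hypothetical helpers for illustration; not SpliceWiz functions.
psi <- function(included, excluded) {
    included / (included + excluded)
}
ir_ratio <- function(intron_depth, spliced) {
    intron_depth / (intron_depth + spliced)
}

# Skipped exon: 30 reads support inclusion, 10 support exclusion.
psi_se <- psi(included = 30, excluded = 10)

# Retained intron: (trimmed) mean intron depth of 5; junction reads
# sharing the 5' splice site = 40, sharing the 3' splice site = 45.
splice_max <- max(40, 45)   # SpliceMax (assumed definition, see lead-in)
ir <- ir_ratio(intron_depth = 5, spliced = splice_max)

print(c(PSI = psi_se, IR_ratio = ir))
```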


Does SpliceWiz detect novel splicing events?
Novel splicing events are those in which at least one isoform is not an annotated transcript in the given gene annotation. SpliceWiz DOES detect novel splicing events.

It detects novel events from novel junctions, by pairing junctions that originate from or terminate at a common coordinate (novel alternative splice site usage).

Additionally, SpliceWiz detects “tandem junction reads”. These are reads that span two or more splice junctions. The regions between consecutive splice junctions can then be annotated as novel exons (if they are not identical to annotated exons). These novel exons can then be used to measure novel cassette exon usage.
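
The idea behind tandem junction reads can be sketched with toy coordinates. This is a simplified illustration and not SpliceWiz's implementation: all coordinates are invented, junctions are given as 1-based inclusive intron intervals on a single chromosome and strand, and an exon counts as annotated only if both of its boundaries match.

```r
# Toy annotation: exon start/end coordinates.
annotated_exons <- data.frame(
    start = c(100, 500, 900),
    end   = c(200, 600, 1000)
)

# One tandem junction read spanning two junctions (intron intervals).
read_junctions <- data.frame(
    start = c(201, 351),
    end   = c(299, 499)
)

# The region between consecutive junctions is a candidate exon.
n_junc <- nrow(read_junctions)
candidate <- data.frame(
    start = read_junctions$end[-n_junc] + 1,
    end   = read_junctions$start[-1] - 1
)

# Keep candidates that are not identical to any annotated exon.
is_annotated <- paste(candidate$start, candidate$end) %in%
    paste(annotated_exons$start, annotated_exons$end)
novel_exons <- candidate[!is_annotated, ]
print(novel_exons)
```

Here the read implies an exon at 300-350 that is absent from the toy annotation, so it is reported as a candidate novel cassette exon.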


Workflow at a glance

The basic steps of SpliceWiz are as follows:

  • Build the SpliceWiz reference
  • Process BAM files using SpliceWiz
  • Collate the results of individual samples into an experiment
  • Import the collated experiment as an NxtSE object
  • Filter alternative splicing events
  • Perform differential ASE analysis
  • Visualize the results

Quick-Start

Installation

To install SpliceWiz, start R (devel version) and enter:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("SpliceWiz")

Setting up a Conda environment to use SpliceWiz
For those wishing to set up a self-contained environment with SpliceWiz installed (e.g. on a high performance cluster), we recommend using miniconda. For installation instructions, see the documentation on how to install miniconda.

After installing miniconda, create a conda environment as follows:

conda create -n envSpliceWiz python=3.9

After following the prompts, activate the environment:

conda activate envSpliceWiz

Next, install R 4.2.1 as follows:

conda install -c conda-forge r-base=4.2.1

NB: We have not been able to successfully use r-base=4.3, so we recommend using r-base=4.2.1 (until further notice).

Many of SpliceWiz’s dependencies are kept up to date on the conda-forge channel, so they are best installed via conda:

conda install -c conda-forge r-devtools r-essentials r-xml r-biocmanager \
  r-fst r-plotly r-rsqlite r-rcurl

After this is done, the remaining packages need to be installed from the R terminal. This is because most Bioconductor packages are distributed via the bioconda channel, where they appear not to be routinely updated.

So, let's enter the R terminal from the command line:

R

Set up Bioconductor 3.16 (which is the latest version compatible with R 4.2):

BiocManager::install(version = "3.16")

Again, follow the prompts to update any necessary packages.

Once this is done, install SpliceWiz (devel) from GitHub:

# BiocManager::install("SpliceWiz")
devtools::install_github("alexchwong/SpliceWiz")

The last step will install any remaining dependencies, taking approximately 20-30 minutes depending on your system.


Enabling OpenMP (multi-threading) for macOS users (Optional)
For macOS users, make sure OpenMP libraries are installed correctly. We recommend users follow this guide, but the quickest way to get started is to install libomp via brew:

brew install libomp


Installing statistical package dependencies (Optional)
SpliceWiz uses established statistical tools to perform alternative splicing differential analysis:

  • limma: models included and excluded counts as log-normal distributions
  • DESeq2: models included and excluded counts as negative binomial distributions
  • edgeR: models included and excluded counts as negative binomial distributions. SpliceWiz uses the quasi-likelihood method which deals better with zero-counts.
  • DoubleExpSeq: models included and excluded counts using beta binomial distributions

To install all of these packages:

install.packages("DoubleExpSeq")

BiocManager::install(c("DESeq2", "limma", "edgeR"))
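
To confirm which of these optional packages are available in the current R library, a quick check along the following lines can help. It uses only base R; the variable names are illustrative.

```r
# Optional statistical back-ends listed above.
stats_pkgs <- c("limma", "DESeq2", "edgeR", "DoubleExpSeq")

# requireNamespace() returns TRUE if the package can be loaded,
# and FALSE (without raising an error) if it is not installed.
installed <- vapply(stats_pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

missing_pkgs <- names(installed)[!installed]
if (length(missing_pkgs) > 0) {
    message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}
```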


Loading SpliceWiz

library(SpliceWiz)
Details
The SpliceWiz package loads the NxtIRFdata data package. This data package contains the example “chrZ” genome / annotations and 6 example BAM files that are used in this working example. NxtIRFdata also provides pre-generated mappability exclusion annotations for building human and mouse SpliceWiz references.


The SpliceWiz Graphical User Interface (GUI)

SpliceWiz offers a graphical user interface (GUI) for interactive users, e.g. in the RStudio environment. To start using SpliceWiz GUI:

if(interactive()) {
    spliceWiz(demo = TRUE)
}


Building the SpliceWiz reference

Why do we need the SpliceWiz reference?
SpliceWiz first needs to generate a set of reference files. The SpliceWiz reference is used to quantitate alternative splicing in BAM files, as well as in downstream collation, differential analysis and visualisation.

SpliceWiz generates a reference from a user-provided genome FASTA and genome annotation GTF file, and is optimised for Ensembl references but can accept other reference GTF files. Alternatively, SpliceWiz accepts AnnotationHub resources, using the record names of AnnotationHub records as input.


Using the example FASTA and GTF files, use the buildRef() function to build the SpliceWiz reference:

ref_path <- file.path(tempdir(), "Reference")
buildRef(
    reference_path = ref_path,
    fasta = chrZ_genome(),
    gtf = chrZ_gtf(),
    ontologySpecies = "Homo sapiens"
)

The SpliceWiz reference can be viewed as data frames using various getter functions. For example, to view the annotated alternative splicing events (ASE):

df <- viewASE(ref_path)

See ?View-Reference-methods for a comprehensive list of getter functions.

Using the GUI
After starting the SpliceWiz GUI in demo mode, click the Reference tab from the menu side bar. The following interface will be shown:

Building the SpliceWiz reference using the GUI


  1. The first step to building a SpliceWiz reference is to select a directory in which to create the reference.
  2. SpliceWiz provides an interface to retrieve the genome sequence (FASTA) and transcriptome annotation (GTF) files from the Ensembl FTP server, by first selecting the “Release” and then “Species” from the drop-down boxes.
  3. Alternatively, users can provide their own FASTA and GTF files.
  4. Human (hg38, hg19) and mouse genomes (mm10, mm9) have the option of further refining IR analysis using built-in mappability exclusion annotations, allowing SpliceWiz to ignore intronic regions of low mappability.
For now, to continue with the demo and create the reference using the GUI, click Load Demo FASTA/GTF (5), and then click Build Reference (6).

Where did the FASTA and GTF files come from?
The helper functions chrZ_genome() and chrZ_gtf() return the paths to the example genome (FASTA) and transcriptome (GTF) files included with the NxtIRFdata package, which contains the working example used by SpliceWiz:

# Provides the path to the example genome:
chrZ_genome()
#> [1] "/home/biocbuild/bbs-3.19-bioc/R/site-library/NxtIRFdata/extdata/genome.fa"

# Provides the path to the example gene annotation:
chrZ_gtf()
#> [1] "/home/biocbuild/bbs-3.19-bioc/R/site-library/NxtIRFdata/extdata/transcripts.gtf"

What is the chrZ genome?
For the purpose of generating a running example to demonstrate SpliceWiz, we created an artificial genome / gene annotation. This was created using 7 human genes (SRSF1, SRSF2, SRSF3, TRA2A, TRA2B, TP53 and NSUN5). The SRSF and TRA family of genes all contain poison exons flanked by retained introns. Additionally, NSUN5 contains an annotated IR event in its terminal intron. Sequences from these 7 genes were aligned into one sequence to create an artificial chromosome Z (chrZ). The gene annotations were modified to only contain the 7 genes with the modified genomic coordinates.
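
The coordinate shift described above can be sketched with toy data. This is not how chrZ itself was generated: the sequences, gene names, and exon coordinates below are invented, and we assume the genes are joined end to end with no spacer sequence.

```r
# Toy gene sequences (not the real gene sequences).
gene_seqs <- c(geneA = "ACGTACGTAC", geneB = "GGGCCC", geneC = "TTTTAAAACCCC")

# Join the sequences end to end into one artificial chromosome.
chrZ_seq <- paste(gene_seqs, collapse = "")

# Offset of each gene = total length of all genes placed before it.
gene_len <- nchar(gene_seqs)
offset <- cumsum(gene_len) - gene_len

# Toy exon annotation in gene-local, 1-based coordinates.
exons <- data.frame(
    gene  = c("geneA", "geneB", "geneC"),
    start = c(2, 1, 5),
    end   = c(6, 4, 9)
)

# Shift the annotation onto the artificial chromosome.
exons$chrZ_start <- unname(exons$start + offset[exons$gene])
exons$chrZ_end   <- unname(exons$end + offset[exons$gene])
print(exons)

# Sanity check: the shifted interval recovers the same bases.
same_bases <- substr(chrZ_seq, exons$chrZ_start[2], exons$chrZ_end[2]) ==
    substr(gene_seqs[["geneB"]], exons$start[2], exons$end[2])
```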

What is the gene ontology species?
SpliceWiz supports gene ontology analysis. To enable this capability, we first need to generate the gene ontology annotations for the appropriate species.

To see a list of supported species:

getAvailableGO()
#>    [1] "Triticum aestivum"                                         
#>    [2] "Triticum aestivum_subsp._aestivum"                         
#>    [3] "Triticum vulgare"                                          
#>    [4] "Brassica napus"                                            
#>    [5] "Arachis hypogaea"                                          
#>    [6] "Hibiscus syriacus"                                         
#>    [7] "Acridium cancellatum"                                      
#>    [8] "Schistocerca cancellata"                                   
#>    [9] "Triticum dicoccoides"                                      
#>   [10] "Triticum turgidum_subsp._dicoccoides"                      
#>   [11] "Triticum turgidum_var._dicoccoides"                        
#>   [12] "Dendrohyas sarda"                                          
#>   [13] "Hyla arborea_sarda"                                        
#>   [14] "Hyla sarda"                                                
#>   [15] "Locusta gregaria"                                          
#>   [16] "Schistocerca gregaria"                                     
#>   [17] "Gossypium hirsutum"                                        
#>   [18] "Gossypium hirsutum_subsp._mexicanum"                       
#>   [19] "Gossypium lanceolatum"                                     
#>   [20] "Gossypium purpurascens"                                    
#>   [21] "Camelina sativa"                                           
#>   [22] "Carassius auratus_gibelio"                                 
#>   [23] "Carassius gibelio_gibelio"                                 
#>   [24] "Carassius gibelio"                                         
#>   [25] "Carassius gibelio_subsp._gibelio"                          
#>   [26] "Cyprinus gibelio"                                          
#>   [27] "Schistocerca piceifrons"                                   
#>   [28] "Papaver somniferum"                                        
#>   [29] "Zingiber officinale"                                       
#>   [30] "Trichomonas vaginalis_G3"                                  
#>   [31] "Trichomonas vaginalis_strain_G3"                           
#>   [32] "Carassius auratus"                                         
#>   [33] "Carassius carassius_auratus"                               
#>   [34] "Cyprinus auratus"                                          
#>   [35] "Helianthus annuus"                                         
#>   [36] "Schistocerca americana"                                    
#>   [37] "Acipenser ruthenus"                                        
#>   [38] "Schistocerca serialis_cubense"                             
#>   [39] "Panicum virgatum"                                          
#>   [40] "Nicotiana tabacum"                                         
#>   [41] "Oncorhynchus mykiss"                                       
#>   [42] "Oncorhynchus nerka_mykiss"                                 
#>   [43] "Parasalmo mykiss"                                          
#>   [44] "Salmo mykiss"                                              
#>   [45] "Schistocerca nitens"                                       
#>   [46] "Schistocerca vaga"                                         
#>   [47] "Salvia splendens"                                          
#>   [48] "Carassius carassius"                                       
#>   [49] "Cyprinus carassius"                                        
#>   [50] "Vicia villosa"                                             
#>   [51] "Camellia sinensis"                                         
#>   [52] "Thea sinensis"                                             
#>   [53] "Oncorhynchus keta"                                         
#>   [54] "Salmo keta"                                                
#>   [55] "Pisum sativum"                                             
#>   [56] "Salmo salar"                                               
#>   [57] "Raphanus sativus"                                          
#>   [58] "Oncorhynchus kisutch"                                      
#>   [59] "Oncorhyncus kisutch"                                       
#>   [60] "Salmo kisatch"                                             
#>   [61] "Lolium rigidum"                                            
#>   [62] "Aegilops squarrosa_subsp._squarrosa"                       
#>   [63] "Aegilops squarrosa"                                        
#>   [64] "Aegilops tauschii"                                         
#>   [65] "Patropyrum tauschii_subsp._tauschii"                       
#>   [66] "Patropyrum tauschii"                                       
#>   [67] "Triticum aegilops"                                         
#>   [68] "Triticum tauschii"                                         
#>   [69] "Salmo trutta"                                              
#>   [70] "Cryptomeria japonica"                                      
#>   [71] "Coregonus clupeaformis"                                    
#>   [72] "Salmo clupeaformis"                                        
#>   [73] "Oncorhynchus gorbuscha"                                    
#>   [74] "Salmo gorbuscha"                                           
#>   [75] "Cyprinus carpio"                                           
#>   [76] "Glycine max_subsp._soja"                                   
#>   [77] "Glycine soja"                                              
#>   [78] "Salmo fontinalis"                                          
#>   [79] "Salvelinus fontinalis"                                     
#>   [80] "Glycine max"                                               
#>   [81] "Phaseolus max"                                             
#>   [82] "Chenopodium quinoa"                                        
#>   [83] "Hordeum sativum"                                           
#>   [84] "Hordeum vulgare_subsp._vulgare"                            
#>   [85] "Hordeum vulgare_var._nudum"                                
#>   [86] "Hordeum vulgare_var._vulgare"                              
#>   [87] "Festuca perennis_(L.)_Columbus_&_J.P.Sm.,_2010"            
#>   [88] "Festuca perennis"                                          
#>   [89] "Lolium perenne"                                            
#>   [90] "Lolium vulgare"                                            
#>   [91] "Coffea arabica"                                            
#>   [92] "Barbus grahami&qu