Introduction

circRNAprofiler is an R-based framework that only requires an R installation and offers 15 modules for a comprehensive in silico analysis of circRNAs. This computational framework makes it possible to combine and analyze circRNAs previously detected by multiple publicly available annotation-based circRNA detection tools. It covers different aspects of circRNA analysis, from differential expression analysis and evolutionary conservation to biogenesis and functional analysis. The pipeline used by circRNAprofiler is highly automated and customizable. Furthermore, circRNAprofiler includes additional functions for data visualization, which facilitate the interpretation of the results.

Schematic representation of the circRNA analysis workflow implemented by circRNAprofiler. The grey boxes represent the 15 modules with the main R functions reported in italics. The different types of sequences that can be selected are depicted in the dashed box. BSJ, Back-Spliced Junction.

This vignette provides a guide on how to use the R package circRNAprofiler.

As a practical example, RNA-sequencing data from human left ventricle tissues, previously analyzed by our group for the presence of circRNAs (Khan et al. 2016), were re-analyzed here. This time multiple detection tools (CircMarker (cm), MapSplice2 (ms) and NCLscan (ns)) were used for the detection of circRNAs, and an additional sample for each condition was included, reaching a total of 9 samples: 3 control hearts, 3 hearts of patients with dilated cardiomyopathy (DCM) and 3 hearts of patients with hypertrophic cardiomyopathy (HCM). After the detection we ran Modules 1-3 to generate the objects backSplicedJunctions and mergedBSJunctions that are used in this vignette.

Raw RNA sequencing data are available at NCBI BioProject accession number PRJNA533243.

Install the package

You can install circRNAprofiler using:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("circRNAprofiler")

To install the development version from GitHub, use:

BiocManager::install("Aufiero/circRNAprofiler")

Load the package

library(circRNAprofiler)

# Packages needed for the vignette
library(ggpubr)
library(ggplot2)
library(VennDiagram)
library(gridExtra)
library(dplyr) # provides %>% and filter(), used in the plotting code

Running circRNAprofiler

Module 1 - Set up project folder

initCircRNAprofiler()

An important step of the analysis workflow is to initialize the project folder, which can be done with the helper function initCircRNAprofiler() available in the package. If seven detection tools are used to detect circRNAs in 6 samples, the project folder should be structured as in the example shown in Figure 2.

Example of a project folder structure

Helper function

The project folder contains the project files and the circRNA prediction results. In detail, the project files include the genome annotation file (.gtf), the file containing information about the experimental design (experiment.txt) and optional files (motifs.txt, miRs.txt, transcripts.txt, traits.txt) that contain user specifications used to customize the analysis.

To initialize the project folder, run the helper function initCircRNAprofiler() and specify the tools used to predict circRNAs. The following options are allowed: “mapsplice”, “nclscan”, “knife”, “circexplorer2”, “uroborus”, “circmarker”, and “other” (see below when to use the option “other”). The function automatically creates the project folder, the subfolders named after the circRNA detection tools used, and the 5 *.txt template files with the corresponding headers. These files should be filled in with the appropriate information (see the file descriptions below).

If only MapSplice is used for the circRNA detection run the following command:

initCircRNAprofiler(projectFolderName = "circRNAprofiler", detectionTools =
                      "mapsplice")

If circRNA detection is performed by using multiple detection tools then run the command with the name of the detection tools used, e.g.:

initCircRNAprofiler(
    projectFolderName = "circRNAprofiler",
    detectionTools = c("mapsplice", "nclscan", "circmarker")
)

Next, set project folder circRNAprofiler as your working directory.

Project files description

The file experiment.txt contains the experiment design information. It must have at least 3 columns with headers:

  • label (1st column): unique names of the samples (short but informative).

  • fileName (2nd column): name of the input files - e.g. circRNAs_X.txt, where X can be 001, 002 etc.

  • condition (3rd column): biological conditions - e.g. A or B; healthy or diseased if you have only 2 conditions.

The functions for the filtering and the differential expression analysis depend on the information reported in this file. The differential expression analysis compares the condition that comes later in the alphabet against the condition that comes earlier (column condition of experiment.txt). With conditions A and B, circRNAs with a positive log2FC are therefore up-regulated in condition B compared to condition A, and circRNAs with a negative log2FC are down-regulated in condition B.

label fileName condition
C1 circRNAs_006.txt A
C2 circRNAs_009.txt A
C3 circRNAs_007.txt A
D1 circRNAs_005.txt B
D2 circRNAs_002.txt B
D3 circRNAs_001.txt B
H1 circRNAs_003.txt C
H2 circRNAs_004.txt C
H3 circRNAs_008.txt C
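As a minimal sketch, the design table above can be written to a file and checked with base R. The tab separator and the temporary file location are assumptions made for this illustration; the template created by initCircRNAprofiler() defines the actual layout.

```r
# Illustrative only: build the experiment design shown above and check it.
# The tab separator and the temporary file location are assumptions.
experiment <- data.frame(
    label = c("C1", "C2", "C3", "D1", "D2", "D3", "H1", "H2", "H3"),
    fileName = sprintf("circRNAs_%03d.txt", c(6, 9, 7, 5, 2, 1, 3, 4, 8)),
    condition = rep(c("A", "B", "C"), each = 3),
    stringsAsFactors = FALSE
)

path <- file.path(tempdir(), "experiment.txt")
write.table(experiment, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back and run basic sanity checks
design <- read.table(path, header = TRUE, sep = "\t",
                     stringsAsFactors = FALSE)
requiredCols <- c("label", "fileName", "condition")
hasRequiredCols <- all(requiredCols %in% colnames(design))
labelsUnique <- !any(duplicated(design$label))
samplesPerCondition <- table(design$condition)
```

Checking that the labels are unique and that every condition has the expected number of samples before running the pipeline can catch typos early, since the filtering and differential expression steps rely on this file.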

The file *.gtf contains the genome annotation. circRNAprofiler was tested with Ensembl/GENCODE-, UCSC- and NCBI-based genome annotations. It is suggested to use the same annotation file that was used during the read mapping procedure.

The file motifs.txt (optional) contains motifs/regular expressions specified by the user. It must have 3 columns with headers:

  • id (1st column): name of the motif, e.g. RBM20 or motif1.

  • motif (2nd column): motif/pattern to search.

  • length (3rd column): length of the motif.

If this file is absent or empty only the motifs of RNA Binding Proteins in the ATtRACT database (Giudice et al. 2016) are considered in the motifs analysis.

id motif length
RBM20 [ACU]UUCU[ACU] 6
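To illustrate how such a pattern behaves as a regular expression, the following sketch searches a made-up RNA sequence with base R. The sequence is an assumption for this example, and the code is not the search routine used by the package.

```r
# Illustrative only: search an RNA sequence for the RBM20 pattern above.
# The sequence is made up for this example.
motif <- "[ACU]UUCU[ACU]"
rnaSeq <- "GGAUUCUAGGCUUCUCGG"

# gregexpr() reports the start positions of non-overlapping matches
m <- gregexpr(motif, rnaSeq)
matchStart <- as.integer(m[[1]])
matchedSeq <- regmatches(rnaSeq, m)[[1]]

# Every match has the length declared in the third column of motifs.txt
matchLength <- nchar(matchedSeq)
```

Here the pattern matches AUUCUA at position 3 and CUUCUC at position 11, both 6 nucleotides long, which is the value reported in the length column.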

The file traits.txt (optional) contains diseases/traits specified by the user. It must have one column with header id. Type data("gwasTraits") to load a snapshot (dated 31 October 2018) of the traits reported in the GWAS catalog (MacArthur et al. 2017). The GWAS catalog is a curated collection of all published genome-wide association studies and contains approximately 90000 unique SNP-trait associations. If the file traits.txt is absent or empty, all SNPs associated with all diseases/traits in the GWAS catalog are considered in the SNPs analysis.

id
AR-C124910XX levels in individuals with acute coronary syndromes treated with ticagrelor
Anthracycline-induced cardiotoxicity in childhood cancer
Atrial Septal Defect
Atrial fibrillation
Atrial fibrillation (SNPxSNP interaction)
Atrial fibrillation (interaction)

The file miRs.txt (optional) contains the microRNA ids from miRBase (Griffiths-Jones et al. 2006) specified by the user. It must have one column with header id. The miR names must start with “>”, e.g. >hsa-miR-1-3p. The sequences of the miRs are automatically retrieved from the latest miRBase release or from a given mature.fa file, which must be present in the project folder. If the file miRs.txt is absent or empty, all miRs of the specified species are considered in the miRNA analysis.

id
>hsa-miR-1-3p
>hsa-miR-126-3p
>hsa-miR-145-5p
>hsa-miR-133a-3p
>hsa-miR-451a
>hsa-miR-23b-3p

The file transcripts.txt (optional) contains the transcript ids of the circRNA host genes. It must have one column with header id. If this file is empty, the longest transcript of the circRNA host gene whose exon coordinates overlap with those of the detected back-spliced junctions is considered in the annotation analysis.

id
ENST00000514743.1

circRNA prediction results

The circRNAs_X.txt files contain the detected circRNAs. Once the project folder has been initialized, the circRNAs_X.txt files must be placed in the corresponding subfolders. There must be one .txt file per sample named circRNAs_X.txt, where X can be 001, 002 etc. If there are 6 samples, 6 .txt files named circRNAs_001.txt, circRNAs_002.txt, circRNAs_003.txt, circRNAs_004.txt, circRNAs_005.txt, circRNAs_006.txt must be present in each subfolder named after the tool that was used for the circRNA detection. If the detection tool used is mapsplice, nclscan, knife, circexplorer2, uroborus or circmarker, the only thing to do after the detection is to rename the files to circRNAs_X.txt and put them in the corresponding subfolder. A specific import function will be called internally to adapt and format the content as reported below (Figure 3).

If the tool is not mapsplice, nclscan, knife, circexplorer2, uroborus or circmarker, first check that the tool used is an annotation-based circRNA detection tool, then rename the files to circRNAs_X.txt and put them in the subfolder “other”. In this last case, you must ensure that each circRNAs_X.txt file has at least the following 6 columns with the header (Figure 3):

  • gene: represents the gene from which the circRNA arises.

  • strand: is the strand where the gene is transcribed.

  • chrom: represents the chromosome from which the circRNA is derived.

  • startUpBSE: is the 5’ coordinate of the upstream back-spliced exon in the transcript. This corresponds with the back-spliced junction / acceptor site.

  • endDownBSE: is the 3’ coordinate of the downstream back-spliced exon in the transcript. This corresponds with the back-spliced junction / donor site.

  • coverage: corresponds to the number of reads mapping to the back-spliced junction in the sample.

NOTE: If more columns are present they will be discarded.

The coordinates for startUpBSE and endDownBSE are relative to the reference strand, i.e. if strand is positive startUpBSE < endDownBSE, if strand is negative startUpBSE > endDownBSE.

The circRNAprofiler package can be extended in the future with further import functions specifically designed to import the output files of the different circRNA detection tools. At the moment only import functions for circRNA detection from annotation-based circRNA detection tools (e.g. MapSplice2, NCLscan, CircMarker, CircExplorer2, KNIFE, UROBORUS) are supported.

chrom gene strand startUpBSE endDownBSE coverage
chr15 ABHD2 + 89656956 89659752 35
chr9 AGTPBP1 - 88248289 88190230 40
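The column and coordinate requirements described above can be checked before import with a few lines of base R. This is a minimal sketch on two example rows, not the import code used by the package; the variable names are hypothetical. The identifier is built in the gene:strand:chrom:startUpBSE:endDownBSE form that appears in the id column of the package output.

```r
# Illustrative only: check a table in the "other" format described above.
bsj <- data.frame(
    chrom = c("chr15", "chr9"),
    gene = c("ABHD2", "AGTPBP1"),
    strand = c("+", "-"),
    startUpBSE = c(89656956L, 88248289L),
    endDownBSE = c(89659752L, 88190230L),
    coverage = c(35L, 40L),
    stringsAsFactors = FALSE
)

# All six mandatory columns must be present
requiredCols <- c("gene", "strand", "chrom",
                  "startUpBSE", "endDownBSE", "coverage")
missingCols <- setdiff(requiredCols, colnames(bsj))

# Strand rule: start < end on the positive strand, start > end on the
# negative strand
coordsOk <- ifelse(bsj$strand == "+",
                   bsj$startUpBSE < bsj$endDownBSE,
                   bsj$startUpBSE > bsj$endDownBSE)

# Identifier in the form gene:strand:chrom:startUpBSE:endDownBSE
bsjId <- paste(bsj$gene, bsj$strand, bsj$chrom,
               bsj$startUpBSE, bsj$endDownBSE, sep = ":")
```

Rows for which coordsOk is FALSE most likely have their start and end coordinates swapped relative to the strand convention.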

checkProjectFolder()

The function checkProjectFolder() helps to verify that the project folder is set up correctly.

check <- checkProjectFolder()
check

If the project folder is set up correctly, check should be equal to 0.

Module 2 - Import predicted circRNAs

formatGTF()

The function formatGTF() formats the given annotation file from Ensembl/GENCODE, UCSC or NCBI. The resulting gtf object is then used by other functions.

# For example purpose load a short version of the already formatted annotation
# file gencode.V19.annotation.gtf (downloaded from https://www.gencodegenes.org/)
data("gtf")
head(gtf)
##   chrom     start       end width strand type gene_name     transcript_id
## 1  chr1 237205505 237205869   365      + exon      RYR2 ENST00000366574.2
## 2  chr1 237433797 237433916   120      + exon      RYR2 ENST00000366574.2
## 3  chr1 237494178 237494282   105      + exon      RYR2 ENST00000366574.2
## 4  chr1 237519265 237519285    21      + exon      RYR2 ENST00000366574.2
## 5  chr1 237527658 237527672    15      + exon      RYR2 ENST00000366574.2
## 6  chr1 237532834 237532908    75      + exon      RYR2 ENST00000366574.2
##   exon_number
## 1           1
## 2           2
## 3           3
## 4           4
## 5           5
## 6           6

# Alternatively put the gtf file in the project folder, then run:
# gtf <- formatGTF(pathToGTF = "gencode.V19.annotation.gtf")

getBackSplicedJunctions()

The function getBackSplicedJunctions() reads the circRNAs_X.txt files, adapts their content and generates a single data frame with the circRNAs detected by each detection tool and the occurrences found in each sample.

# Load the object containing the detected circRNAs
data("backSplicedJunctions")
head(backSplicedJunctions)
##                                  id    gene strand chrom startUpBSE
## 1   ABHD2:+:chr15:89656956:89659752   ABHD2      + chr15   89656956
## 2 ACVR2A:+:chr2:148653870:148657467  ACVR2A      +  chr2  148653870
## 3     AFF1:+:chr4:87967318:87968746    AFF1      +  chr4   87967318
## 4  AGTPBP1:-:chr9:88248289:88190230 AGTPBP1      -  chr9   88248289
## 5  AGTPBP1:-:chr9:88248289:88200378 AGTPBP1      -  chr9   88248289
## 6  AGTPBP1:-:chr9:88248289:88211277 AGTPBP1      -  chr9   88248289
##   endDownBSE tool C1 C2 C3 D1  D2 D3 H1 H2  H3
## 1   89659752   ms 35 11 38 48  38 12 34 44  41
## 2  148657467   ms 56 11 76 56 121 10 37 85 112
## 3   87968746   ms 47  5 34 48  47  5 54 10  30
## 4   88190230   ms 40 23 63 56  65 18 95 67  53
## 5   88200378   ms 13  3 19 26  43  7 20 32  23
## 6   88211277   ms  5 11 24 27  38  5 54 28  41

# Alternatively run:
# backSplicedJunctions <- getBackSplicedJunctions(gtf)

Plot the number of circRNAs identified by each detection tool.

# Plot
p <- ggplot(backSplicedJunctions, aes(x = tool)) +
    geom_bar() +
    labs(title = "", x = "Detection tool", y = "No. of circRNAs") +
    theme_classic()

# Run getDetectionTools() to get the code corresponding to the circRNA
# detection tools.
dt <- getDetectionTools() %>%
    dplyr::filter( name %in% c("mapsplice","nclscan", "circmarker"))%>%
    gridExtra::tableGrob(rows=NULL)

# Merge plots
gridExtra::grid.arrange(p, dt, nrow=1)

Module 3 - Merge commonly identified circRNAs

mergeBSJunctions()

The function mergeBSJunctions() is called to shrink the data frame by grouping back-spliced junction coordinates commonly identified by multiple detection tools. For each group of back-spliced junction coordinates, the counts are taken from the tool that reported the highest mean count across all analyzed samples. All the tools that detected the back-spliced junction are then listed in the column “tool” of the final table. Run getDetectionTools() to get the code corresponding to each circRNA detection tool.

NOTE: In this module circRNAs that derive from the antisense strand of the reported gene are identified. In detail, in our pipeline a circRNA is defined as antisense if the strand from which the circRNA arises (i.e. reported in the prediction results) is different from the strand where the gene is transcribed (i.e. reported in the genome annotation file). This might be explained by technical artifacts or by the presence of a gene transcribed from the opposite strand that is not annotated. Due to the ambiguous nature of these predictions, the antisense circRNAs (if any) are removed from the dataset and, if specified by the user, they can be exported to a file (i.e. antisenseCircRNAs.txt) for user consultation. Modules in circRNAprofiler are specific for the analysis of circRNAs that derive from the sense strand of the corresponding gene.

# Load object containing the merged back-spliced junctions
data("mergedBSJunctions")
head(mergedBSJunctions)
##                                       id       gene strand chrom
## 1        SYCP2:-:chr20:58497514:58497445      SYCP2      - chr20
## 2 AL132709.8:+:chr14:101432017:101432103 AL132709.8      + chr14
## 3        SLC8A1:-:chr2:40657441:40655613     SLC8A1      -  chr2
## 4        SLC8A1:-:chr2:40657444:40655613     SLC8A1      -  chr2
## 5         LGMN:-:chr14:93207524:93200027       LGMN      - chr14
## 6        EXOC6B:-:chr2:72960247:72945232     EXOC6B      -  chr2
##   startUpBSE endDownBSE     tool   C1  C2   C3   D1   D2  D3    H1   H2
## 1   58497514   58497445       cm 4858 577 4256 6561 6347 253 11050 8792
## 2  101432017  101432103       cm 4310 502 3824 5870 5609 203 10094 7953
## 3   40657441   40655613    cm,ms 2687 539 2718 2862 5623 428  2527 4446
## 4   40657444   40655613 cm,ms,ns 2064 389 2357 2263 4677 324  2196 3894
## 5   93207524   93200027       cm 1367 194 1114 1652 2164 195  1428 1551
## 6   72960247   72945232 cm,ms,ns  641  89  764  741 1186 277  1974 1154
##     H3
## 1 5804
## 2 5256
## 3 6113
## 4 5291
## 5 1579
## 6 1649

# Alternatively run:
# mergedBSJunctions <-
# mergeBSJunctions(backSplicedJunctions, gtf)
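The grouping rule described for mergeBSJunctions() can be sketched on a toy table with base R. This is an illustration of the rule as stated in the text, not the package implementation; the toy data, the sample names S1 and S2, and the helper mergeOne() are hypothetical.

```r
# Illustrative only: toy input in which the same junction is reported by
# two tools, plus one junction reported by a single tool.
bsj <- data.frame(
    id = c("geneA:+:chr1:100:200", "geneA:+:chr1:100:200",
           "geneB:-:chr2:900:500"),
    tool = c("ms", "cm", "ns"),
    S1 = c(10, 30, 5),
    S2 = c(20, 40, 7),
    stringsAsFactors = FALSE
)
sampleCols <- c("S1", "S2")

# Hypothetical helper: collapse all rows that share one junction id
mergeOne <- function(rows) {
    # Keep the counts from the tool with the highest mean across samples
    best <- which.max(rowMeans(rows[, sampleCols, drop = FALSE]))
    out <- rows[best, , drop = FALSE]
    # List every tool that reported this junction
    out$tool <- paste(sort(unique(rows$tool)), collapse = ",")
    out
}

merged <- do.call(rbind, lapply(split(bsj, bsj$id), mergeOne))
rownames(merged) <- NULL
```

In this toy example the junction of geneA keeps the counts reported by cm (30 and 40) and its tool column becomes "cm,ms", while the junction of geneB is left unchanged.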

Plot commonly identified circRNAs.

# Plot
p <- ggplot(mergedBSJunctions, aes(x = tool)) +
    geom_bar() +
    labs(title = "", x = "Detection tool", y = "No. of circRNAs") +
    theme_classic()

gridExtra::grid.arrange(p, dt, nrow=1)

Module 4 - Filter circRNAs

filterCirc()

The use of multiple detection tools leads to the identification of a higher number of circRNAs. To rule out false positive candidates, a filtering step can be applied to the detected circRNAs. The function filterCirc() filters circRNAs on two criteria chosen by the user: condition and read counts. In the example below, setting allSamples = FALSE applies the filter to the samples of each condition separately, meaning that a circRNA is kept if at least 5 read counts are present in all samples of at least one of the conditions (A or B or C). If allSamples = TRUE, the filter is applied to all samples together. We suggest setting allSamples = FALSE, since a disease or treatment can decrease the expression of a subset of circRNAs, and those circRNAs would be discarded if the filter were applied to all samples (allSamples = TRUE).

filteredCirc <-
filterCirc(mergedBSJunctions, allSamples = FALSE, min = 5)
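The difference between the two settings can be sketched on a toy count table with base R. This illustrates the rule described above and is not the code of filterCirc(); the toy counts and variable names are hypothetical.

```r
# Illustrative only: toy counts for two conditions with two samples each
counts <- data.frame(
    id = c("circ1", "circ2", "circ3"),
    A1 = c(6, 1, 9), A2 = c(7, 0, 2),
    B1 = c(0, 2, 8), B2 = c(1, 3, 1),
    stringsAsFactors = FALSE
)
condition <- c(A1 = "A", A2 = "A", B1 = "B", B2 = "B")
minCount <- 5

# Analogue of allSamples = FALSE: keep a circRNA if every sample of at
# least one condition reaches the minimum count
passPerCondition <- sapply(unique(condition), function(cond) {
    cols <- names(condition)[condition == cond]
    rowSums(counts[, cols, drop = FALSE] >= minCount) == length(cols)
})
keepByCondition <- rowSums(passPerCondition) > 0

# Analogue of allSamples = TRUE: every sample must reach the minimum count
keepAllSamples <-
    rowSums(counts[, names(condition)] >= minCount) == length(condition)
```

In this toy table circ1 is expressed only in condition A. It passes the per-condition filter but would be discarded if all samples had to reach the threshold, which is the situation the recommendation above is meant to avoid.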

Plot circRNAs after the filtering step.

# Plot
p <- ggplot(filteredCirc, aes(x = tool)) +
    geom_bar() +
    labs(title = "", x = "Detection tool", y = "No. of circRNAs") +
    theme_classic()

gridExtra::grid.arrange(p, dt, nrow=1)

Alternatively:

# Plot using Venn diagram
cm <- filteredCirc[base::grep("cm", filteredCirc$tool), ]
ms <- filteredCirc[base::grep("ms", filteredCirc$tool), ]
ns <- filteredCirc[base::grep("ns", filteredCirc$tool), ]

p <- VennDiagram::draw.triple.venn(
    area1 = length(cm$id),
    area2 = length(ms$id),
    area3 = length(ns$id),
    n12 = length(intersect(cm$id, ms$id)),
    n23 = length(intersect(ms$id, ns$id)),
    n13 = length(intersect(cm$id, ns$id)),
    n123 = length(Reduce(
        intersect, list(cm$id, ms$id, ns$id)
    )),
    category = c("cm", "ms", "ns"),
    lty = "blank",
    fill = c("skyblue", "pink1", "mediumorchid")
)
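The grep() calls above select rows by substring matching on the tool column, which works for the two-letter codes used here. As an alternative sketch, the comma-separated codes can be split and matched exactly with base R; the toy vector and the helper hasTool() are hypothetical.

```r
# Illustrative only: exact matching of tool codes in a "tool" column
tool <- c("cm", "cm,ms", "cm,ms,ns", "ns")

# Split each entry into its individual tool codes
toolList <- strsplit(tool, ",", fixed = TRUE)

# Hypothetical helper: TRUE for rows whose tool list contains the code
hasTool <- function(code) {
    vapply(toolList, function(x) code %in% x, logical(1))
}

nCm <- sum(hasTool("cm"))
nMs <- sum(hasTool("ms"))
nAll <- sum(hasTool("cm") & hasTool("ms") & hasTool("ns"))
```

Exact matching avoids accidental hits if one tool code were ever a substring of another.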

Module 5 - Find differentially expressed circRNAs

getDeseqRes()

The helper function getDeseqRes() identifies differentially expressed circRNAs. It uses the R Bioconductor package DESeq2, which implements a negative binomial model to model changes in circRNA expression. The differential expression analysis compares the condition that comes later in the alphabet against the condition that comes earlier (values in the column condition of experiment.txt). E.g. if there are 2 conditions A and B, a negative log2FC means that the corresponding circRNA is downregulated in condition B, and a positive log2FC means that it is upregulated in condition B.

# Compare condition B Vs A
deseqResBvsA <-
    getDeseqRes(
        filteredCirc,
        condition = "A-B",
        fitType = "local",
        pAdjustMethod = "BH"
    )
head(deseqResBvsA)
##                                       id       gene      log2FC    pvalue
## 1        SYCP2:-:chr20:58497514:58497445      SYCP2 -0.21872628 0.6882730
## 2 AL132709.8:+:chr14:101432017:101432103 AL132709.8 -0.23032811 0.6743554
## 3        SLC8A1:-:chr2:40657441:40655613     SLC8A1 -0.06724610 0.8141522
## 4        SLC8A1:-:chr2:40657444:40655613     SLC8A1 -0.04007647 0.8877923
## 5         LGMN:-:chr14:93207524:93200027       LGMN  0.04748208 0.8536426
## 6        EXOC6B:-:chr2:72960247:72945232     EXOC6B  0.51441043 0.1338385
##        padj        C1        C2        C3        D1        D2        D3
## 1 0.9984061 3359.6544 1895.7608 2591.0545 3555.2959 2357.5131  824.6553
## 2 0.9984061 2980.6732 1649.3448 2328.0528 3180.8546 2083.3923  661.6800
## 3 0.9984061 1858.2526 1770.9100 1654.7195 1550.8698 2088.5924 1395.0691
## 4 0.9984061 1427.4036 1278.0779 1434.9426 1226.2818 1737.2127 1056.0803
## 5 0.9984061  945.3782  637.3962  678.2037  895.1911  803.7905  635.6039
## 6 0.9984061  443.2973  292.4137  465.1235  401.5355  440.5247  902.8835
# Compare condition C Vs A
deseqResCvsA <-
    getDeseqRes(
        filteredCirc,
        condition = "A-C",
        fitType = "local",
        pAdjustMethod = "BH"
    )
head(deseqResCvsA)
##                                       id       gene      log2FC
## 1        SYCP2:-:chr20:58497514:58497445      SYCP2  0.66419598
## 2 AL132709.8:+:chr14:101432017:101432103 AL132709.8  0.69952766
## 3        SLC8A1:-:chr2:40657441:40655613     SLC8A1  0.16723172
## 4        SLC8A1:-:chr2:40657444:40655613     SLC8A1  0.31640317
## 5         LGMN:-:chr14:93207524:93200027       LGMN -0.08792522
## 6        EXOC6B:-:chr2:72960247:72945232     EXOC6B  0.90322591
##        pvalue       padj        C1        C2        C3       H1        H2
## 1 0.136048385 0.52337347 4634.2050 2498.0706 3487.7840 7990.915 5750.6484
## 2 0.110824859 0.46971699 4111.4499 2173.3647 3133.7608 7299.574 5201.8775
## 3 0.556461506 0.88083258 2563.2172 2333.5529 2227.3959 1827.425 2908.0281
## 4 0.251497581 0.66669722 1968.9171 1684.1412 1931.5571 1588.059 2546.9774
## 5 0.700433109 0.93530126 1304.0260  839.9059  912.9209 1032.672 1014.4740
## 6 0.002513725 0.03898947  611.4709  385.3176  626.0966 1427.517  754.8053
##          H3
## 1 3099.3849
## 2 2806.7483
## 3 3264.3935
## 4 2825.4386
## 5  843.1993
## 6  880.5799
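The sign convention can be checked by hand with a naive log2 ratio of mean normalized counts. The values below are copied from the EXOC6B row of the C vs A output above. The naive ratio is only an approximation for illustration and is not how DESeq2 estimates the log2FC, so the two numbers are close but not identical.

```r
# Illustrative only: naive log2 ratio for EXOC6B, condition C versus A,
# using the normalized counts printed in the output above
countsA <- c(C1 = 611.4709, C2 = 385.3176, C3 = 626.0966)
countsC <- c(H1 = 1427.517, H2 = 754.8053, H3 = 880.5799)

# Later condition (C) in the numerator, earlier condition (A) in the
# denominator, so a positive value means higher expression in C
naiveLog2FC <- log2(mean(countsC) / mean(countsA))
```

The result is positive, in agreement with the positive log2FC reported for this circRNA in the table.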

Use volcanoPlot() function

# We set the xlim and ylim to the same values for both plots to make them
# comparable. Before setting the axis limits, you should visualize the 
# plots with the default values to be able to define the correct limits 
p1 <-
    volcanoPlot(
        deseqResBvsA,
        log2FC = 1,
        padj = 0.05,
        title = "DCMs Vs. Con",
        setxLim = TRUE,
        xlim = c(-8 , 7.5),
        setyLim = TRUE,
        ylim = c(0 , 4),
        gene = FALSE
    )
p2 <-
    volcanoPlot(
        deseqResCvsA,
        log2FC = 1,
        padj = 0.05,
        title = "HCMs Vs. Con",
        setxLim = TRUE,
        xlim = c(-8 , 7.5),
        setyLim = TRUE,
        ylim = c(0 , 4),
        gene = FALSE
    )
ggarrange(p1, 
          p2, 
          ncol = 1, 
          nrow = 2)

getEdgerRes()

Alternatively, the helper function getEdgerRes() can be used to identify differentially expressed circRNAs. It uses the R Bioconductor package edgeR, which implements a negative binomial model to model changes in circRNA expression. The differential expression analysis compares the condition that comes later in the alphabet against the condition that comes earlier (values in the column condition of experiment.txt). E.g. if there are 2 conditions A and B, a negative log2FC means that the corresponding circRNA is downregulated in condition B, and a positive log2FC means that it is upregulated in condition B.

# Compare condition B Vs A
edgerResBvsA <-
    getEdgerRes(
        filteredCirc,
        condition = "A-B",
        normMethod = "TMM",
        pAdjustMethod = "BH"
    )
head(edgerResBvsA)
# Compare condition C Vs A
edgerResCvsA <-
    getEdgerRes(
        filteredCirc,
        condition = "A-C",
        normMethod = "TMM",
        pAdjustMethod = "BH"
    )
head(edgerResCvsA)

Module 6 - Map BSJ coordinates between species and genome assemblies

liftBSJcoords()

The function liftBSJcoords() maps back-spliced junction coordinates between different species and genome assemblies by using the liftOver utility from UCSC. Type data(ahChainFiles) to see all possible options for annotationHubID. E.g. if “AH14155” is specified, the file hg19ToMm9.over.chain.gz will be used to convert hg19 (Human GRCh37) coordinates to mm9 (Mouse NCBI37).

NOTE: Only back-spliced junction coordinates where the mapping was successful are reported. Back-spliced junction coordinates that could not be mapped might not be conserved between the analyzed species.

liftedBSJcoords <- liftBSJcoords(filteredCirc, map = "hg19ToMm9",
                                 annotationHubID = "AH14155")

Module 7 - Annotate circRNAs internal structure and flanking introns

annotateBSJs()

The function annotateBSJs() annotates the circRNA internal structure and the flanking introns. The genomic features are extracted from the user-provided gene annotation. We first define the circRNA parental transcript as a linear transcript whose exon coordinates overlap with those of the detected back-spliced junctions, and then the features are extracted from the selected transcript. Since the coordinates of the detected back-spliced junctions might not exactly correspond to annotated exonic coordinates, a gap of 10 nucleotides is allowed. By default, when a gene has multiple transcripts whose exons align to the back-spliced junction coordinates, the transcript that produces the longest sequence (exons only) is selected. Alternatively, the transcript to be used can be specified in transcripts.txt. The output data frame will have the following columns:

  • id: unique identifier.

  • gene: represents the gene from which the circRNA arises.

  • allTranscripts: all transcripts of a circRNA’s host gene whose exon coordinates overlap with the detected back-spliced junction coordinates.

  • transcript: as default, this is the transcript producing the longest sequence and whose exon coordinates overlap with the detected back-spliced junction coordinates. The transcript reported in this column is used in the downstream analysis.

  • totExons: total number of exons in the selected transcript (reported in the column transcript)

  • strand: is the strand from which the gene is transcribed.

  • chrom: is the chromosome from which the circRNA is derived.

  • startUpIntron: is the 5’ coordinate of the intron immediately upstream of the acceptor site in the selected transcript.

  • endUpIntron: is the 3’ coordinate of the intron immediately upstream of the acceptor site in the selected transcript.

  • startUpBSE: is the 5’ coordinate of the upstream back-spliced exon in the selected transcript. This corresponds with the back-spliced junction / acceptor site.

  • endUpBSE: is the 3’ coordinate of the upstream back-spliced exon in the selected transcript.

  • startDownBSE: is the 5’ coordinate of the downstream back-spliced exon in the selected transcript.

  • endDownBSE: is the 3’ coordinate of the downstream back-spliced exon in the selected transcript. This corresponds with the back-spliced junction / donor site.

  • startDownIntron: is the 5’ coordinate of the intron immediately downstream of the donor site in the selected transcript.

  • endDownIntron: is the 3’ coordinate of the intron immediately downstream of the donor site in the selected transcript.

  • exNumUpBSE: is the position of the upstream back-spliced exon in the selected transcript (e.g. if it is the 1st, th