In this vignette, we are going to see how to use MIRit for investigating the compromised miRNA-gene regulatory networks in thyroid cancer. In particular, an RNA-Seq experiment will be used as an example to demonstrate how to perform an integrative analysis with MIRit, including differential expression analysis, functional enrichment and characterization, correlation analysis and, lastly, the construction and visualization of the impaired miRNAs regulatory axes within biological pathways.
MIRit 1.0.0
MIRit
(miRNA integration tool) is an open source R package that aims to
facilitate the comprehension of microRNA (miRNA) biology through the integrative
analysis of gene and miRNA expression data deriving from different platforms,
including microarrays, RNA-Seq, miRNA-Seq, proteomics and single-cell
transcriptomics. Given their regulatory importance, a complete characterization
of miRNA dysregulations results crucial to explore the molecular networks that
may lead to the insurgence of complex diseases. Unfortunately, there are no
currently available options for thoroughly interpreting the biological
consequences of miRNA dysregulations, thus limiting the ability to identify
the affected pathways and reconstruct the compromised molecular networks. To
tackle this limitation, we developed MIRit, an all-in-one framework that
provides flexible and powerful methods for performing integrative miRNA-mRNA
multi-omic analyses from start to finish. In particular, MIRit includes multiple
modules that allow to perform:
If you use MIRit in published research, please cite the following paper:
Ronchi J and Foti M. ‘MIRit: an integrative R framework for the identification of impaired miRNA-mRNA regulatory networks in complex diseases’. bioRxiv (2023). doi:10.1101/2023.11.24.568528
This package internally uses different R/Bioconductor packages, remember to cite the appropriate publications.
Before starting, MIRit must be installed on your system. You can do this through Bioconductor.
## install MIRit from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("MIRit")
If needed, you could also install the development version of MIRit directly from GitHub:
## install the development version from GitHub
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("jacopo-ronchi/MIRit")
When MIRit is installed, we can load it through:
## load MIRit
library(MIRit)
To demonstrate the capabilities of MIRit we will use RNA-Seq data from Riesco-Eizaguirre et al. (2015). This experiment collected samples from 8 papillary thyroid carcinoma tumors and contralateral normal thyroid tissue from the same patients. These samples were profiled for gene expression through RNA-Seq, and for miRNA expression through miRNA-Seq. To provide easy access to the user, raw count matrices have been retrieved from GEO and included in this package.
To load example data, we can simply use the data()
function:
## load count matrix for genes
data(geneCounts, package = "MIRit")
## load count matrix for miRNAs
data(mirnaCounts, package = "MIRit")
When using MIRit, we must specify whether miRNA and gene expression values derive from the same individuals or not. As already mentioned, paired data are those where individuals used to measure gene expression are the same subjects used to measure miRNA expression. On the other hand, unpaired data are those where gene expression and miRNA expression derive from different cohorts of donors. Importantly, MIRit considers as paired samples those data sets where paired measurements are available for at least some samples.
In our case, miRNA and gene expression data originate from the same subjects, and therefore we will conduct a paired samples analysis.
As input data, MIRit requires miRNA and gene expression measurements as matrices with samples as columns, and genes/miRNAs as rows. Further, the row names of miRNA expression matrix should contain miRNA names according to miRBase nomenclature (e.g. hsa-miR-151a-5p, hsa-miR-21-5p), whereas for gene expression matrix, row names must contain gene symbols according to HGNC (e.g. TYK2, BDNF, NTRK2).
These matrices may handle different types of values deriving from multiple technologies, including microarrays, RNA-Seq and proteomics. The only requirement is that, for microarray studies, expression matrices must be normalized and log2 transformed. This can be achieved by applying the RMA algorithm implemented in the oligo (Carvalho and Irizarry 2010) package, or by applying other quantile normalization strategies. On the contrary, for RNA-Seq and miRNA-Seq experiments, the simple count matrix must be supplied.
Eventually, expression matrices required by MIRit should appear as those in
mirnaCounts
and geneCounts
, which are displayed in Tables
1 and 2.
PTC 1 | PTC 2 | PTC 3 | PTC 4 | PTC 5 | |
---|---|---|---|---|---|
A1BG | 7 | 6 | 9 | 12 | 7 |
A1BG-AS1 | 20 | 8 | 22 | 6 | 16 |
A1CF | 0 | 0 | 0 | 1 | 1 |
A2M | 9 | 11 | 9 | 37 | 18 |
A2M-AS1 | 1486 | 722 | 801 | 968 | 1787 |
PTC 1 | PTC 2 | PTC 3 | PTC 4 | PTC 5 | |
---|---|---|---|---|---|
hsa-let-7a-2-3p | 3 | 0 | 9 | 1 | 4 |
hsa-let-7a-3p | 472 | 82 | 228 | 122 | 313 |
hsa-let-7a-5p | 141101 | 45543 | 105598 | 45503 | 159598 |
hsa-let-7b-3p | 412 | 81 | 120 | 147 | 164 |
hsa-let-7b-5p | 16337 | 6586 | 8121 | 7993 | 16516 |
Once we have expression matrices in the proper format, we need to inform MIRit
about the samples in study and the biological conditions of interest. To do so,
we need to create a data.frame
that must contain:
primary
, specifying a unique identifier for each different
subject;mirnaCol
, matching the column name used for each sample in
the miRNA expression matrix;geneCol
, matching the column name used for each sample in
the gene expression matrix;Firstly, let’s take a look at the column names used for miRNA and gene expression matrices.
## print sample names in geneCounts
colnames(geneCounts)
#> [1] "PTC 1" "PTC 2" "PTC 3" "PTC 4" "PTC 5" "PTC 6" "PTC 7" "PTC 8" "NTH 1"
#> [10] "NTH 2" "NTH 3" "NTH 4" "NTH 5" "NTH 6" "NTH 7" "NTH 8"
## print sample names in mirnaCounts
colnames(mirnaCounts)
#> [1] "PTC 1" "PTC 2" "PTC 3" "PTC 4" "PTC 5" "PTC 6" "PTC 7" "PTC 8" "NTH 1"
#> [10] "NTH 2" "NTH 3" "NTH 4" "NTH 5" "NTH 6" "NTH 7" "NTH 8"
## check if samples in geneCounts are equal to those in mirnaCounts
identical(colnames(geneCounts), colnames(mirnaCounts))
#> [1] TRUE
In our case, we see that both expression matrices have the same column names,
and therefore mirnaCol
and geneCol
will be identical. However, note that is
not always the case, especially for unpaired data, where miRNA and gene
expression values derive from different subjects. In these cases, mirnaCol
and
geneCol
must map each column of miRNA and gene expression matrices to the
relative subjects indicated in the primary
column. Notably, for unpaired data,
NAs
can be used for missing entries in mirnaCol
/geneCol
.
That said, we can proceed to create the data.frame
with sample metadata as
follows.
## create a data.frame containing sample metadata
meta <- data.frame(primary = colnames(mirnaCounts),
mirnaCol = colnames(mirnaCounts),
geneCol = colnames(geneCounts),
disease = c(rep("PTC", 8), rep("NTH", 8)),
patient = c(rep(paste("Sample_", seq(8), sep = ""), 2)))
MirnaExperiment
objectAt this point, we need to create an object of class MirnaExperiment
, which is
the main class used in MIRit for integrative miRNA-mRNA analyses. In particular,
this class extends the MultiAssayExperiment
class from the homonym package
(Ramos et al. 2017) to store expression levels of both miRNAs and genes,
differential expression results, miRNA-target pairs and integrative miRNA-gene
analysis.
The easiest way to create a valid MirnaExperiment
object is to use the
MirnaExperiment()
function, which automatically handles the formatting of
input data and the creation of the object.
## create the MirnaExperiment object
experiment <- MirnaExperiment(mirnaExpr = mirnaCounts,
geneExpr = geneCounts,
samplesMetadata = meta,
pairedSamples = TRUE)
Now that the MirnaExperiment
object has been created, we can move to
differential expression analysis, which aims to define differentially expressed
features across biological conditions.
Firstly, before doing anything else, it is useful to explore expression
variability through dimensionality reduction techniques. This is useful because
it allows us to visualize the main drivers of expression differences. In this
regard, MIRit offers the plotDimensions()
function, which enables to visualize
both miRNA and gene expression in the multidimensional space (MDS plots).
Moreover, it is possible to color samples based on specific variables, hence
allowing to explore specific patterns between distinct biological groups.
In our example, let’s examine expression variability for both miRNAs and genes, and let’s color the samples based on “disease”, a variable included in the previously defined metadata.
geneMDS <- plotDimensions(experiment,
assay = "genes",
condition = "disease",
title = "MDS plot for genes")
mirnaMDS <- plotDimensions(experiment,
assay = "microRNA",
condition = "disease",
title = "MDS plot for miRNAs")
ggpubr::ggarrange(geneMDS, mirnaMDS,
nrow = 1, labels = "AUTO", common.legend = TRUE)