Sparse Partial correlation ON Gene Expression with SPONGE

Markus List, Azim Dehghani Amirabad, Dennis Kostka, Marcel H. Schulz

2021-10-26

Purpose

SPONGE logo

SPONGE is the first method to solve the computationally demanding task of reporting significant ceRNA interactions efficiently on a genome-wide scale. Beyond ceRNAs, this method is well suited to infer other types of regulatory interactions such as transcription factor regulation.

Introduction

MicroRNAs (miRNAs) are small 19-22 nucleotide long molecules that facilitate the degradation of messenger RNA (mRNA) transcripts targeted via matching seed sequences. The competing endogenous RNA (ceRNA) hypothesis suggests that mRNAs that possess binding sites for the same miRNAs are in competition. This motivates the existence of so-called sponges, i.e., genes that exert strong regulatory control via miRNA binding in a ceRNA interaction network. It is currently an unsolved problem how to estimate miRNA-mediated ceRNA interactions genome-wide. The most widely used approach considers miRNA and mRNA expression jointly measured for the same cell state. Several partial association methods have been proposed for determining ceRNA interaction strength using conditional mutual information or partial correlation, for instance.

However, we identified three key limitations of existing approaches that prevent the construction of an accurate genome-wide ceRNA interaction network: (i) none of the existing methods considers the combinatorial effect of several miRNAs; (ii) due to the computational demand, the inference of a ceRNA interaction for all putative gene-miRNA-gene interactions in the human genome is prohibitive; (iii) an efficient strategy for determining the significance of inferred ceRNA interactions is missing, and thus important parameters of the system are neglected.

To overcome these challenges, we developed a novel method called Sparse partial correlation on gene expression (SPONGE). We reduce the computational complexity of constructing a genome-wide ceRNA interaction network in several steps. First, we consider only miRNA-gene interactions that are either predicted or experimentally validated. Second, we retain only miRNA-gene interactions that have a negative coefficient in a regularized regression model. Third, instead of each gene-miRNA-gene triplet, we compute a single sensitivity correlation (correlation - partial correlation) for each gene-gene pair given all shared miRNAs that pass the above filter as putative regulators. Finally, we derived the first mathematical formulation to simulate the null distribution of the process for different parameters of the system: number of miRNAs, correlation between genes and sample number. Our formulation enables the computation of empirical p-values for the statistic in a very efficient manner, an order of magnitude faster than previous methods. Analyses revealed that previous studies have underestimated the effect of these parameters in their network inference. Network centrality measures can be applied to SPONGE inferred ceRNA networks to reveal known and novel sponges, many of which are potential biomarkers.

Further details demonstrating how SPONGE improves over the state of the art and how SPONGE inferred ceRNA networks can be used for biomarker discovery will be available in our paper (manuscript submitted, link will be included here at a later point).

General Workflow