Dependencies

The wppi package depends on the OmnipathR package. It relies on features added after OmnipathR 2.0.0, the version shipped with Bioconductor 3.12. Until the release of Bioconductor 3.13, we recommend installing OmnipathR from git.
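A minimal sketch of such an installation, assuming the remotes package is available and that the OmnipathR development sources are hosted in the saezlab/OmnipathR GitHub repository. The helper name ensure_recent_omnipathr is hypothetical and not part of wppi or OmnipathR; it installs only when OmnipathR is missing or not newer than the given version.

```r
# Hypothetical helper: install OmnipathR from GitHub only when needed.
# Assumes the 'remotes' package is installed.
ensure_recent_omnipathr <- function(min_version = '2.0.0') {
    installed <- requireNamespace('OmnipathR', quietly = TRUE)
    if (installed && utils::packageVersion('OmnipathR') > min_version) {
        # a sufficiently recent version is already available
        return(invisible(FALSE))
    }
    remotes::install_github('saezlab/OmnipathR')
    invisible(TRUE)
}

# Usage (downloads and installs, so it is left commented out):
# ensure_recent_omnipathr()
```

On a current Bioconductor release the installed OmnipathR is normally recent enough, in which case the helper does nothing.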

Complete workflow in a single call

The score_candidate_genes_from_PPI function executes the full wppi workflow. The only mandatory input is a set of genes of interest. It returns a table of the new genes found in the neighbourhood of the genes of interest, ordered by similarity score. A higher score indicates a higher functional similarity between the new gene and the genes of interest.

library(wppi)
# example gene set
genes_interest <- c(
    'ERCC8', 'AKT3', 'NOL3', 'TTK',
    'GFI1B', 'CDC25A', 'TPX2', 'SHE'
)
scores <- score_candidate_genes_from_PPI(genes_interest)
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## 'as(<dgCMatrix>, "dgTMatrix")' is deprecated.
## Use 'as(., "TsparseMatrix")' instead.
## See help("Deprecated") and help("Matrix-deprecated").
scores
# # A tibble: 295 x 3
#    score gene_symbol uniprot
#    <dbl> <chr>       <chr>
#  1 0.247 KNL1        Q8NG31
#  2 0.247 HTRA2       O43464
#  3 0.247 KAT6A       Q92794
#  4 0.247 BABAM1      Q9NWV8
#  5 0.247 SKI         P12755
#  6 0.247 FOXA2       Q9Y261
#  7 0.247 CLK2        P49760
#  8 0.247 HNRNPA1     P09651
#  9 0.247 HK1         P19367
# 10 0.180 SH3RF1      Q7Z6J0
# # … with 285 more rows
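Several genes in the output above share the same top score, so taking only the first row would pick arbitrarily among ties. The sketch below shows one way to keep every gene tied for the best score. It uses a small data frame, scores_head (a hypothetical name), holding four rows copied from the printed output rather than the full result.

```r
# Four rows copied from the printed scores table above
scores_head <- data.frame(
    score = c(0.247, 0.247, 0.247, 0.180),
    gene_symbol = c('KNL1', 'HTRA2', 'HK1', 'SH3RF1'),
    uniprot = c('Q8NG31', 'O43464', 'P19367', 'Q7Z6J0'),
    stringsAsFactors = FALSE
)

# Keep every gene tied for the best score, not only the first row
top_tied <- scores_head[scores_head$score == max(scores_head$score), ]
top_tied$gene_symbol
```

The same filter should apply to the full scores table, since it has the same three columns.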

Workflow step by step

Database knowledge

The database knowledge is provided by wppi_data. By default, all directed protein-protein interactions from OmniPath are used. The network can be customized by passing various options; see the documentation of the OmnipathR package, especially the import_post_translational_interactions function. For example, to use only the literature-curated interactions, pass the datasets = 'omnipath' parameter:

omnipath_data <- wppi_omnipath_data(datasets = 'omnipath')

The wppi_data function retrieves all database data at once. Parameters to customize the network can be passed directly to this function.

db <- wppi_data(datasets = c('omnipath', 'kinaseextra'))
names(db)
# [1] "hpo"      "go"       "omnipath" "uniprot"

Optionally, the Human Phenotype Ontology (HPO) annotations relevant to the context of interest can be selected. For example, to select the annotations related to diabetes:

# example HPO annotations set
HPO_data <- wppi_hpo_data()
HPO_interest <- unique(dplyr::filter(HPO_data, grepl('Diabetes', Name))$Name)
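Note that grepl is case sensitive by default, so the pattern 'Diabetes' only matches term names with a capital D. The sketch below illustrates the difference on a toy table, hpo_toy, which is a hypothetical stand-in for HPO_data: only the Name column is taken from the example above, and the gene identifiers are placeholders.

```r
# Toy stand-in for the HPO table; gene identifiers are placeholders
hpo_toy <- data.frame(
    Gene_Symbol = c('GENE_A', 'GENE_B', 'GENE_C', 'GENE_D'),
    Name = c(
        'Diabetes mellitus', 'Type II diabetes mellitus',
        'Diabetes insipidus', 'Microcephaly'
    ),
    stringsAsFactors = FALSE
)

# Case-sensitive match, as in the example above
case_sensitive <- unique(hpo_toy$Name[grepl('Diabetes', hpo_toy$Name)])

# Case-insensitive match also catches a lowercase 'diabetes'
case_insensitive <- unique(
    hpo_toy$Name[grepl('diabetes', hpo_toy$Name, ignore.case = TRUE)]
)

# Terms found only by the case-insensitive search
setdiff(case_insensitive, case_sensitive)
```

Whether the broader match is desirable depends on the terms of interest; inspecting the matched names before using them is a cheap check.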

Converting the interactions to an igraph graph object

To work further with the interactions, we first convert them to an igraph graph object:

graph_op <- graph_from_op(db$omnipath)

Subgraph from the neighborhood of genes of interest

Then we select a subgraph around the genes of interest. The size of the subgraph is determined by the range of this neighborhood (sub_level argument for the subgraph_op function).

graph_op_1 <- subgraph_op(graph_op, genes_interest)
igraph::vcount(graph_op_1)
# [1] 256
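To illustrate how the range of the neighborhood controls the size of the subgraph, here is a base R sketch on a toy edge list. The function neighborhood_nodes and its order argument are hypothetical and not part of wppi, and the sketch assumes that edge direction is ignored when collecting neighbors, which may differ from what subgraph_op does.

```r
# Toy chain of interactions: A - B - C - D - E - F
toy_edges <- data.frame(
    from = c('A', 'B', 'C', 'D', 'E'),
    to = c('B', 'C', 'D', 'E', 'F'),
    stringsAsFactors = FALSE
)

# Collect all nodes reachable from the seeds within 'order' steps,
# ignoring edge direction (an assumption of this sketch)
neighborhood_nodes <- function(edges, seeds, order = 1L) {
    reached <- unique(seeds)
    for (step in seq_len(order)) {
        touching <- edges$from %in% reached | edges$to %in% reached
        reached <- union(reached, c(edges$from[touching], edges$to[touching]))
    }
    reached
}

first_order <- neighborhood_nodes(toy_edges, 'C', order = 1L)
second_order <- neighborhood_nodes(toy_edges, 'C', order = 2L)
length(first_order)
length(second_order)
```

Each additional step can only add nodes, so a larger range gives a larger subgraph and a longer list of candidate genes.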

Weighted adjacency matrix

The next step is to assign weights to each interaction. The weights are calculated based on the number of common neighbors and the similarities of the annotations of the interacting partners.

w_adj <- weighted_adj(graph_op_1, db$go, db$hpo)
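The common-neighbor part of the weighting can be illustrated with plain matrix algebra: for a symmetric 0/1 adjacency matrix A, entry (i, j) of A %*% A counts the neighbors shared by i and j. The sketch below uses a toy graph with hypothetical node names. It covers only the neighbor counts, not the annotation similarities, and is not necessarily how weighted_adj computes its weights.

```r
# Toy undirected graph: a-b, a-c, b-c, c-d
nodes <- c('a', 'b', 'c', 'd')
adj <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
toy_edges <- rbind(c('a', 'b'), c('a', 'c'), c('b', 'c'), c('c', 'd'))
adj[toy_edges] <- 1
adj[toy_edges[, 2:1]] <- 1  # mirror the edges to make the matrix symmetric

# Entry (i, j) counts the neighbors shared by nodes i and j
common <- adj %*% adj

# Keep the counts only for pairs that actually interact
shared_on_edges <- common * adj
shared_on_edges
```

In this toy graph, the edges of the a-b-c triangle each have one shared neighbor, while the c-d edge has none.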

Random walk

The random walk with restarts algorithm uses the edge weights to score the overall connections between pairs of genes. The result also takes the indirect connections into account, integrating the information in the graph topology.

w_rw <- random_walk(w_adj)
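For readers unfamiliar with the method, the following base R sketch shows a generic random walk with restart on a small weighted adjacency matrix. The function rwr, its arguments, and the restart probability of 0.4 are assumptions made for illustration; the internals of random_walk may differ.

```r
# Generic random walk with restart, started from every node at once.
# Column j of the result holds the visiting probabilities of a walker
# that restarts at node j.
rwr <- function(adj, restart = 0.4, tol = 1e-8, max_iter = 1000L) {
    # column-normalise so each column is a distribution over next steps
    col_sums <- colSums(adj)
    col_sums[col_sums == 0] <- 1
    trans <- sweep(adj, 2, col_sums, '/')
    start <- diag(nrow(adj))
    prob <- start
    for (i in seq_len(max_iter)) {
        updated <- (1 - restart) * trans %*% prob + restart * start
        converged <- max(abs(updated - prob)) < tol
        prob <- updated
        if (converged) break
    }
    dimnames(prob) <- dimnames(adj)
    prob
}

# Toy symmetric weight matrix with hypothetical node names
nodes <- c('a', 'b', 'c', 'd')
w_toy <- matrix(
    c(0, 2, 1, 0,
      2, 0, 1, 0,
      1, 1, 0, 1,
      0, 0, 1, 0),
    nrow = 4, byrow = TRUE, dimnames = list(nodes, nodes)
)
rw_toy <- rwr(w_toy)
round(rw_toy, 3)
```

Nodes a and d are not directly connected in the toy matrix, yet rw_toy['a', 'd'] is positive: the walk reaches a from d through c, which is how indirect connections enter the score.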

Scoring proteins

At the end, we can summarize the scores for each protein by taking the sum of all adjacent connections. The resulting table provides a list of proteins prioritized by their predicted importance in the context of interest (disease or condition).

scores <- prioritization_genes(graph_op_1, w_rw, genes_interest)
scores
# # A tibble: 249 x 3
#    score gene_symbol uniprot
#    <dbl> <chr>       <chr>
#  1 0.251 HTRA2       O43464 
#  2 0.251 KAT6A       Q92794 
#  3 0.251 BABAM1      Q9NWV8 
#  4 0.251 SKI         P12755 
#  5 0.251 CLK2        P49760 
#  6 0.248 TUBB        P07437 
#  7 0.248 KNL1        Q8NG31 
#  8 0.189 SH3RF1      Q7Z6J0 
#  9 0.189 SRPK2       P78362 
# 10 0.150 CSNK1D      P48730 
# # … with 239 more rows
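As a rough illustration of the summarizing step, the sketch below sums, for each candidate, its random walk scores towards the genes of interest and sorts the result. This is one plausible reading of "the sum of all adjacent connections"; the exact aggregation in prioritization_genes may differ. The matrix values and node names are made up for the example.

```r
# Toy random walk score matrix (hypothetical values)
nodes <- c('a', 'b', 'c', 'd')
rw_scores <- matrix(
    c(0.50, 0.20, 0.20, 0.05,
      0.20, 0.50, 0.20, 0.05,
      0.25, 0.25, 0.50, 0.30,
      0.05, 0.05, 0.10, 0.60),
    nrow = 4, byrow = TRUE, dimnames = list(nodes, nodes)
)

# Hypothetical genes of interest; every other node is a candidate
seeds <- c('a', 'b')
candidates <- setdiff(rownames(rw_scores), seeds)

# Sum each candidate's scores towards the genes of interest,
# then order the candidates from highest to lowest
candidate_scores <- rowSums(rw_scores[candidates, seeds, drop = FALSE])
ranked <- sort(candidate_scores, decreasing = TRUE)
ranked
```

The genes of interest themselves are excluded from the ranking, which would be consistent with the printed table having fewer rows (249) than the subgraph has vertices (256).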

Network visualization

The top genes in the first order neighborhood of the genes of interest can be visualized in the PPI network:

library(igraph)
# genes of interest are blue, their neighbors are green
idx_neighbors <- which(!V(graph_op_1)$Gene_Symbol %in% genes_interest)
cols <- rep("lightsteelblue2", vcount(graph_op_1))
cols[idx_neighbors] <- "#57da83"
# scale the size of the neighbor vertices by their similarity score
scores.vertex <- rep(1, vcount(graph_op_1))
scores.vertex[idx_neighbors] <-
    8 * scores[na.omit(match(V(graph_op_1)$Gene_Symbol, scores$gene_symbol)), ]$score
par(mar = c(0.1, 0.1, 0.1, 0.1))
plot(
    graph_op_1,
    vertex.label = ifelse(scores.vertex >= 1, V(graph_op_1)$Gene_Symbol, NA),
    layout = layout.fruchterman.reingold,
    vertex.color = cols,
    vertex.size = 7 * scores.vertex,
    edge.width = 0.5,
    edge.arrow.mode = 0,
    vertex.label.font = 1,
    vertex.label.cex = 0.45
)

library(knitr)
knitr::include_graphics("../figures/fig1.png")

Figure: PPI network visualization of genes of interest (blue nodes) and their first neighbors with similarity scores (green nodes).

Session info

## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] wppi_1.10.0
## 
## loaded via a namespace (and not attached):
##  [1] tidyr_1.3.0       rappdirs_0.3.3    sass_0.4.7        utf8_1.2.4       
##  [5] generics_0.1.3    bitops_1.0-7      xml2_1.3.5        stringi_1.7.12   
##  [9] lattice_0.22-5    hms_1.1.3         digest_0.6.33     magrittr_2.0.3   
## [13] evaluate_0.22     grid_4.3.1        timechange_0.2.0  fastmap_1.1.1    
## [17] cellranger_1.1.0  jsonlite_1.8.7    Matrix_1.6-1.1    progress_1.2.2   
## [21] backports_1.4.1   httr_1.4.7        rvest_1.0.3       selectr_0.4-2    
## [25] purrr_1.0.2       fansi_1.0.5       jquerylib_0.1.4   cli_3.6.1        
## [29] rlang_1.1.1       crayon_1.5.2      bit64_4.0.5       withr_2.5.1      
## [33] cachem_1.0.8      yaml_2.3.7        parallel_4.3.1    tools_4.3.1      
## [37] tzdb_0.4.0        checkmate_2.2.0   dplyr_1.1.3       curl_5.1.0       
## [41] vctrs_0.6.4       logger_0.2.2      R6_2.5.1          lifecycle_1.0.3  
## [45] lubridate_1.9.3   stringr_1.5.0     bit_4.0.5         vroom_1.6.4      
## [49] pkgconfig_2.0.3   pillar_1.9.0      bslib_0.5.1       later_1.3.1      
## [53] glue_1.6.2        Rcpp_1.0.11       xfun_0.40         tibble_3.2.1     
## [57] tidyselect_1.2.0  knitr_1.44        htmltools_0.5.6.1 OmnipathR_3.10.0 
## [61] igraph_1.5.1      rmarkdown_2.25    readr_2.1.4       compiler_4.3.1   
## [65] prettyunits_1.2.0 readxl_1.4.3      RCurl_1.98-1.12