This user manual describes how to use the R/Bioconductor package “Pi”. Pi is a genetics-led target prioritisation system that leverages human genetic data to prioritise potential drug targets at the gene and pathway level. The long-term goal is to use such information to enhance early-stage target selection and validation.

Starting from evidence of disease association from genome-wide association studies (GWAS), the system generates evidence to help identify the specific modulated genes (genomic seed genes) responsible for the genetic association signal: (i) nearby genes (nGene), based on genomic proximity; (ii) expression-associated genes (eGene), based on summary data from eQTL mapping; and (iii) chromatin conformation genes (cGene), based on summary data from promoter capture Hi-C studies. Restricted to these genomic seed genes (nGene, eGene and cGene), gene-level ontology annotations are then used to define three types of annotation predictors according to relatedness to immune function and dysfunction: (i) function genes (fGene), using Gene Ontology; (ii) disease genes (dGene), which cause rare genetic diseases, using OMIM and Disease Ontology; and (iii) phenotype genes (pGene), using Human Phenotype Ontology and Mammalian Phenotype Ontology.

For each type of seed gene (and its scores), non-seed genes under network influence are identified using the random walk with restart (RWR) algorithm; that is, non-seed genes are scored by their network connectivity/affinity to the seed genes in a gene interaction network. In summary, given GWAS summary data for a trait, a gene-predictor matrix of affinity scores is constructed, with columns for genomic and annotation predictors and rows for seed and non-seed genes.
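To make the RWR step concrete, the following is a minimal sketch in base R on a toy five-gene chain network. It is not Pi's own implementation: the function name rwr, the restart probability of 0.7, the column normalisation and the toy network are all assumptions chosen for illustration.

```r
# Random walk with restart (RWR) on a toy network.
# Hypothetical helper, not part of the Pi package.
# adj:   symmetric adjacency matrix with gene names as dimnames
# seeds: named numeric vector of seed scores
rwr <- function(adj, seeds, restart = 0.7, tol = 1e-10, max_iter = 1000) {
  # Column-normalise so that each column holds transition probabilities
  col_sums <- colSums(adj)
  col_sums[col_sums == 0] <- 1  # guard against isolated nodes
  W <- sweep(adj, 2, col_sums, "/")

  # Starting distribution: seed scores scaled to sum to 1, zero elsewhere
  p0 <- setNames(numeric(nrow(adj)), rownames(adj))
  p0[names(seeds)] <- seeds / sum(seeds)

  p <- p0
  for (i in seq_len(max_iter)) {
    # Walk one step with probability (1 - restart), else jump back to seeds
    p_next <- (1 - restart) * as.vector(W %*% p) + restart * p0
    converged <- sum(abs(p_next - p)) < tol
    p <- p_next
    if (converged) break
  }
  setNames(as.vector(p), rownames(adj))
}

# Toy chain network A - B - C - D - E (hypothetical data)
genes <- c("A", "B", "C", "D", "E")
adj <- matrix(0, 5, 5, dimnames = list(genes, genes))
for (i in 1:4) {
  adj[i, i + 1] <- 1
  adj[i + 1, i] <- 1
}

# Gene A is the only seed; the other genes receive affinity via the network
affinity <- rwr(adj, seeds = c(A = 1))
print(round(affinity, 4))
```

In this toy example the affinity scores sum to 1 and decrease with network distance from the seed, which is the sense in which non-seed genes are "under network influence". Each predictor would contribute one such vector as a column of the gene-predictor matrix.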
Using the prepared gene-predictor matrix, target prioritisation is carried out at the gene and pathway level under two modes (“discovery” and “supervised”), each consisting of three sequential steps: Step 1 (shared by both modes), the preparation of the predictor matrix from GWAS summary data; Step 2 (specific to each mode), the prioritisation of target genes, either through meta-analysis (discovery mode), which favours target genes with substantial genetic support and/or rich network connectivity, or through machine learning (supervised mode), which is guided by knowledge of efficacious drugs; and Step 3 (shared by both modes), the prioritisation of target pathways, both individually and at the crosstalk level, based on the prioritised target genes.
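As an illustration of what a discovery-mode meta-analysis over the predictor matrix could look like, the sketch below combines predictor columns with Fisher's method in base R. This section does not state which meta-analysis procedure Pi uses, so the choice of Fisher's method, the rank-based conversion to p-values and the toy scores are all assumptions.

```r
# Toy gene-predictor matrix of affinity scores (hypothetical values)
scores <- matrix(
  c(0.90, 0.80, 0.70,
    0.50, 0.10, 0.20,
    0.05, 0.60, 0.10,
    0.01, 0.02, 0.03),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("G1", "G2", "G3", "G4"), c("nGene", "eGene", "fGene"))
)

# Convert each predictor column to rank-based empirical p-values,
# so that the highest-scoring gene gets the smallest p-value
to_pvalue <- function(x) rank(-x, ties.method = "max") / length(x)
pvals <- apply(scores, 2, to_pvalue)
dimnames(pvals) <- dimnames(scores)

# Fisher's method: -2 * sum(log p) is compared against a chi-squared
# distribution with 2k degrees of freedom, where k is the number of predictors
k <- ncol(pvals)
stat <- -2 * rowSums(log(pvals))
combined <- pchisq(stat, df = 2 * k, lower.tail = FALSE)

# Genes ordered from strongest to weakest combined support
ranking <- names(sort(combined))
print(ranking)
```

A gene that scores well across several predictors (G1 here) rises to the top, while a gene that ranks last in every column (G4) gets a combined p-value of 1.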
The latest version of R for different platforms can be installed by following the instructions at http://bioconductor.org/install/#install-R.
Install Pi (the latest stable release version from Bioconductor).
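The release installation can be sketched with the standard Bioconductor installer. The use of BiocManager is an assumption here, since the original command is not preserved in this section.

```r
# Install BiocManager from CRAN if it is not already available
# (assumed to be the intended installer)
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install the stable release of Pi from Bioconductor
BiocManager::install("Pi")

# Check that the package loads and report the installed version
library(Pi)
packageVersion("Pi")
```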
Also install the latest development version from GitHub (highly recommended).
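A possible way to install the development version is shown below. The repository path "hfang-bristol/Pi" and the use of the remotes helper are assumptions not given in this section, so check the package's home page before running it.

```r
# BiocManager can install from GitHub when 'remotes' is available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

# ASSUMPTION: the development repository is "hfang-bristol/Pi";
# this path is not stated in the manual text above, so verify it first
BiocManager::install("hfang-bristol/Pi")
```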