
1 Introduction

InterMine is a powerful open-source data warehouse system that integrates diverse biological data sets and provides a growing number of analysis tools, including enrichment analysis widgets (Smith et al. 2012; Kalderimis et al. 2014).

Specifically, the gene set enrichment tool looks for annotations to a set of genes that occur more often than would be expected by chance, given a background population of annotations. The hypergeometric distribution is used to calculate a P-value for each annotation, and a choice of multiple testing correction methods (Bonferroni, Holm-Bonferroni and Benjamini-Hochberg) is available (Smith et al. 2012; Kalderimis et al. 2014).
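To make the hypergeometric test concrete, the following minimal sketch computes the one-sided probability of seeing at least k annotated genes in a list of n genes, when K of the N genes in the background population carry the annotation. It assumes the usual over-representation form P(X >= k); InterMine's server-side code is not reproduced here, and the function name hypergeometric_pvalue is hypothetical. The result is the uncorrected p-value, before any multiple testing correction.

```python
from math import comb


def hypergeometric_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    k: genes in the list that carry the annotation
    n: genes in the list
    K: genes in the background population that carry the annotation
    N: genes in the background population
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts must satisfy 0 <= k <= n <= N and 0 <= K <= N")
    # Sum the probabilities of observing k, k + 1, ... annotated genes in the list.
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    # Exact integer arithmetic is used until this final division.
    return tail / comb(N, n)


# Toy example: 10 background genes, 4 annotated; a list of 3 genes, 2 of them annotated.
print(hypergeometric_pvalue(2, 3, 4, 10))  # prints 0.3333333333333333
```

Keeping the binomial coefficients as Python integers avoids overflow for genome-sized backgrounds; only the final ratio is converted to a float.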

InterMine provides Gene Ontology enrichment statistics as well as enrichment statistics for other annotation types including protein domains, pathways, human diseases, mammalian phenotypes and publications. The default background population is all genes in the genome with the specified annotation type. However, the background population can be changed by specifying another list. More information can be found in the online documentation.

2 Perform enrichment analysis with InterMineR

Performing enrichment analysis with InterMineR is preceded by two steps:

  1. Retrieve the list of bioentities of interest (Genes, Proteins, SNPs, etc.) for which the analysis will be performed.

  2. Get the name of the enrichment widget, which indicates the annotation type that you want to investigate for enrichment (e.g. Gene Ontology terms, protein domains, KEGG and Reactome pathways, human diseases, mammalian phenotypes and publications).

2.1 Retrieve a list containing the genes of interest from InterMineR

In this example, enrichment analysis is performed using a list of genes which are associated with all forms of Diabetes according to OMIM. All query results and calculations are correct according to HumanMine release 5.1 (November 2018). Genomic coordinates and p-values are likely to change between database releases but the methods will remain the same.

The PL_DiabetesGenes dataset is included in the InterMineR package and can also be found online in HumanMine.

Retrieve the PL_DiabetesGenes data and extract the gene identifiers as HumanMine Gene.symbol and Gene.primaryIdentifier values (Entrez Gene IDs).

##   Gene.symbol                                 Gene.name
## 1       ABCC8 ATP binding cassette subfamily C member 8
## 2         ACE           angiotensin I converting enzyme
## 3        AKT2             AKT serine/threonine kinase 2
##   Gene.primaryIdentifier Gene.secondaryIdentifier Gene.length
## 1                   6833          ENSG00000006071       83961
## 2                   1636          ENSG00000159640       21320
## 3                    208          ENSG00000105221       55215
##   Gene.organism.name
## 1       Homo sapiens
## 2       Homo sapiens
## 3       Homo sapiens
## [1] "ABCC8" "ACE"   "AKT2"
## [1] "6833" "1636" "208"

2.2 Get enrichment widgets

After obtaining the Gene.id values of interest, we must determine the type of annotation for the enrichment analysis.

To retrieve all available widgets for a Mine we can use the getWidgets function.

##    startClassDisplay
## 1  primaryIdentifier
## 2               <NA>
## 3  primaryIdentifier
## 4  primaryIdentifier
## 5               <NA>
## 6  primaryIdentifier
## 7  primaryIdentifier
## 8  primaryIdentifier
## 9  primaryIdentifier
## 10              <NA>
##                                                 enrichIdentifier
## 1                         GWASResults.study.publication.pubMedId
## 2                                                           <NA>
## 3                   goAnnotation.ontologyTerm.parents.identifier
## 4                                          publications.pubMedId
## 5                                                           <NA>
## 6                                            pathways.identifier
## 7                         GWASResults.study.publication.pubMedId
## 8  proteins.proteinDomainRegions.proteinDomain.primaryIdentifier
## 9           proteinDomainRegions.proteinDomain.primaryIdentifier
## 10                                                          <NA>
##                                name
## 1         snp_gwas_study_enrichment
## 2   chromosome_distribution_for_snp
## 3            go_enrichment_for_gene
## 4            publication_enrichment
## 5  chromosome_distribution_for_gene
## 6                pathway_enrichment
## 7        snp_publication_enrichment
## 8      prot_dom_enrichment_for_gene
## 9   prot_dom_enrichment_for_protein
## 10                     interactions
## description
## 1  GWAS studies enriched for SNPs in this list.
## 2  Actual: number of items in this list found on each chromosome.  Expected: given the total number of items on the chromosome and the number of items in this list, the number of items expected to be found on each chromosome.
## 3  GO terms enriched for items in this list.
## 4  Publications enriched for genes in this list.
## 5  Actual: number of items in this list found on each chromosome.  Expected: given the total number of items on the chromosome and the number of items in this list, the number of items expected to be found on each chromosome.
## 6  Pathways enriched for genes in this list - data from KEGG and Reactome
## 7  Publications enriched for SNPs in this list.
## 8  Protein Domains enriched for items in this list.
## 9  Protein Domains enriched for items in this list.
## 10 Genes (from the list or not) that interact with genes in this list.  Counts may include the same interaction more than once if observed in multiple experiments.
##                                              enrich startClass
## 1                            GWASResults.study.name        SNP
## 2                                              <NA>        SNP
## 3            goAnnotation.ontologyTerm.parents.name       Gene
## 4                                publications.title       Gene
## 5                                              <NA>       Gene
## 6                                     pathways.name       Gene
## 7               GWASResults.study.publication.title        SNP
## 8  proteins.proteinDomainRegions.proteinDomain.name       Gene
## 9           proteinDomainRegions.proteinDomain.name    Protein
## 10                                             <NA>       <NA>
##                              title targets widgetType   chartType
## 1   GWAS study Enrichment for SNPs     SNP enrichment        <NA>
## 2          Chromosome Distribution     SNP      chart ColumnChart
## 3         Gene Ontology Enrichment    Gene enrichment        <NA>
## 4           Publication Enrichment    Gene enrichment        <NA>
## 5          Chromosome Distribution    Gene      chart ColumnChart
## 6               Pathway Enrichment    Gene enrichment        <NA>
## 7  Publication Enrichment for SNPs     SNP enrichment        <NA>
## 8        Protein Domain Enrichment    Gene enrichment        <NA>
## 9        Protein Domain Enrichment Protein enrichment        <NA>
## 10                    Interactions    Gene      table        <NA>
##                                                     filters
## 1                                                      <NA>
## 2                                      organism.name=[list]
## 3  biological_process,cellular_component,molecular_function
## 4                                                      <NA>
## 5                                      organism.name=[list]
## 6              All,KEGG pathways data set,Reactome data set
## 7                                                      <NA>
## 8                                                      <NA>
## 9                                                      <NA>
## 10                                                     <NA>
##                labels
## 1                <NA>
## 2  Chromosome & Count
## 3                <NA>
## 4                <NA>
## 5  Chromosome & Count
## 6                <NA>
## 7                <NA>
## 8                <NA>
## 9                <NA>
## 10               <NA>

HumanMine provides enrichment for genes, proteins and SNPs, but here we are interested only in the gene-related enrichment widgets.

##   startClassDisplay
## 3 primaryIdentifier
## 4 primaryIdentifier
## 6 primaryIdentifier
## 8 primaryIdentifier
##                                                enrichIdentifier
## 3                  goAnnotation.ontologyTerm.parents.identifier
## 4                                         publications.pubMedId
## 6                                           pathways.identifier
## 8 proteins.proteinDomainRegions.proteinDomain.primaryIdentifier
##                           name
## 3       go_enrichment_for_gene
## 4       publication_enrichment
## 6           pathway_enrichment
## 8 prot_dom_enrichment_for_gene
##                                                              description
## 3                              GO terms enriched for items in this list.
## 4                          Publications enriched for genes in this list.
## 6 Pathways enriched for genes in this list - data from KEGG and Reactome
## 8                       Protein Domains enriched for items in this list.
##                                             enrich startClass
## 3           goAnnotation.ontologyTerm.parents.name       Gene
## 4                               publications.title       Gene
## 6                                    pathways.name       Gene
## 8 proteins.proteinDomainRegions.proteinDomain.name       Gene
##                       title targets widgetType chartType
## 3  Gene Ontology Enrichment    Gene enrichment      <NA>
## 4    Publication Enrichment    Gene enrichment      <NA>
## 6        Pathway Enrichment    Gene enrichment      <NA>
## 8 Protein Domain Enrichment    Gene enrichment      <NA>
##                                                    filters labels
## 3 biological_process,cellular_component,molecular_function   <NA>
## 4                                                     <NA>   <NA>
## 6             All,KEGG pathways data set,Reactome data set   <NA>
## 8                                                     <NA>   <NA>

The values in the name column provide the character strings that can be passed to the widget argument of the doEnrichment function and thus define the type of enrichment analysis.
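The gene-related subset printed above consists of the rows whose widgetType is "enrichment" and whose targets is "Gene". The vignette's R subsetting code is not shown in this text, so the following Python sketch reproduces the same selection on a hand-copied subset of the getWidgets() columns for HumanMine. The helper name enrichment_widget_names is hypothetical, and the selection criterion is inferred from the printed output rather than taken from InterMineR itself.

```python
# A subset of the columns printed by getWidgets() for HumanMine above.
widgets = [
    {"name": "snp_gwas_study_enrichment", "targets": "SNP", "widgetType": "enrichment"},
    {"name": "chromosome_distribution_for_snp", "targets": "SNP", "widgetType": "chart"},
    {"name": "go_enrichment_for_gene", "targets": "Gene", "widgetType": "enrichment"},
    {"name": "publication_enrichment", "targets": "Gene", "widgetType": "enrichment"},
    {"name": "chromosome_distribution_for_gene", "targets": "Gene", "widgetType": "chart"},
    {"name": "pathway_enrichment", "targets": "Gene", "widgetType": "enrichment"},
    {"name": "snp_publication_enrichment", "targets": "SNP", "widgetType": "enrichment"},
    {"name": "prot_dom_enrichment_for_gene", "targets": "Gene", "widgetType": "enrichment"},
    {"name": "prot_dom_enrichment_for_protein", "targets": "Protein", "widgetType": "enrichment"},
    {"name": "interactions", "targets": "Gene", "widgetType": "table"},
]


def enrichment_widget_names(rows, target):
    """Return the `name` values of enrichment widgets that accept `target` lists."""
    return [
        row["name"]
        for row in rows
        if row["widgetType"] == "enrichment" and row["targets"] == target
    ]


print(enrichment_widget_names(widgets, "Gene"))
```

Both conditions are needed: filtering on targets alone would also keep the chart and table widgets (chromosome distribution, interactions), which do not perform enrichment.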

2.3 Perform enrichment analysis with InterMineR

We will perform Gene Ontology Enrichment analysis for the genes in our list using the Gene.id values and the ‘go_enrichment_for_gene’ widget.

As mentioned above, ‘PL_DiabetesGenes’ is a gene list that already exists in HumanMine. Therefore, the enrichment analysis can also be performed by passing the name of the list to the genelist argument, with the same results.

The doEnrichment function returns a list that contains:

  • a data.frame with the results of the enrichment analysis performed on the specified InterMine instance. The p-values are reported after applying the correction algorithm. The count and populationAnnotationCount columns contain the number of genes annotated with each GO term in the list and in the background population, respectively.
##   identifier               description                 pValue count
## 1 GO:0030072 peptide hormone secretion  4.336648956166819E-19    21
## 2 GO:0046879         hormone secretion 7.6663597110215735E-19    22
## 3 GO:0009914         hormone transport  9.540238337303802E-19    22
## 4 GO:0030073         insulin secretion 1.1561118844984925E-16    18
## 5 GO:0033500  carbohydrate homeostasis 2.8899958514686346E-16    17
## 6 GO:0042593       glucose homeostasis 3.1522069422433323E-16    17
##   populationAnnotationCount
## 1                       251
## 2                       313
## 3                       322
## 4                       207
## 5                       182
## 6                       181
## [1] 636   5
  • the size of the reference background population (populationCount)
## [1] 16597
  • the number of input features that were not included in the enrichment analysis (notAnalyzed)
## [1] 6
  • the name and URL of the Mine (im)
## $mine
##                             HumanMine 
## "https://www.humanmine.org/humanmine" 
## 
## $token
## [1] ""
  • the rest of the parameters used to perform the analysis (parameters)
## ids
## "1018504,1069788,1088766,1074405,1167250,1169335,1094079,1035007,1150284,1046584,1074688,1085117,1076201,1251964,1248044,1254789,1265202,1270014,1120348,1043788,1229453,1146398,1204931,1227904,1208398,1128029,1113856,1233700,1229612,1276310,1109526,1001868,1179209,1174712,1186790,1126059,1042350,1214384,1270234,1095332,1070530,1153906,1247509,1072698,1161160,1014965,1172352,1223354,1082710,1285639,1111701,1072131,1283829,1007739,1257384,1195770,1252086,1180051,1198453,1061004,1211404,1086036,1229758,1196726,1115541,1260234,1053657,1059762"
## widget
## "go_enrichment_for_gene"
## maxp
## "0.05"
## correction
## "Benjamini Hochberg"
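The count and populationAnnotationCount columns, together with populationCount and the size of the input list, are enough to compute simple effect sizes for each term. The sketch below assumes the standard definitions of fold enrichment and odds ratio. Using the 68 identifiers in the ids parameter as the list size, it reproduces the "fold of overrepresents" (20.42049) and "odds ratio" (29.09774) values that the GeneAnswers enrichmentInfo slot reports later in this document for GO:0030072. The function names are hypothetical.

```python
def fold_enrichment(count, list_size, population_annotation_count, population_count):
    """Share of list genes with the term divided by the share in the background."""
    return (count / list_size) / (population_annotation_count / population_count)


def odds_ratio(count, list_size, population_annotation_count, population_count):
    """Odds of the term in the list divided by its odds in the background."""
    list_odds = count / (list_size - count)
    background_odds = population_annotation_count / (
        population_count - population_annotation_count
    )
    return list_odds / background_odds


# GO:0030072 (peptide hormone secretion): 21 of the 68 list genes,
# 251 of the 16597 genes in the background population.
print(round(fold_enrichment(21, 68, 251, 16597), 5))  # 20.42049
print(round(odds_ratio(21, 68, 251, 16597), 5))  # 29.09774
```

Unlike the p-value, these ratios do not depend on the multiple testing correction, so they are a useful complement when ranking terms with similar adjusted p-values.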

2.4 Apply filters on enrichment analysis

Some of the enrichment widgets provide filters which, when applied, restrict the enrichment analysis to specific annotation types.

In our example, if we want to retrieve only the enriched GO terms in our list of genes that are related to molecular function, we will use the appropriate filter:

## [1] "biological_process,cellular_component,molecular_function"
##   identifier                                 description
## 1 GO:0005102                  signaling receptor binding
## 2 GO:0005158                    insulin receptor binding
## 3 GO:0005159 insulin-like growth factor receptor binding
## 4 GO:0140297    DNA-binding transcription factor binding
## 5 GO:0001067      regulatory region nucleic acid binding
## 6 GO:0044212 transcription regulatory region DNA binding
##                  pValue count populationAnnotationCount
## 1 0.0037534905026989592    19                      1545
## 2  0.004425288799249525     4                        21
## 3  0.023859181561416545     3                        16
## 4  0.023910794938533403     8                       336
## 5  0.025815502329076468    13                       911
## 6  0.033633006777797264    13                       909
## [1] 6 5
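As the printed value shows, the filters column of a widget holds a single comma-separated string, and each piece of it is one allowed filter value. The following Python sketch splits that string and checks a requested filter before it is used. We assume the chosen value is then passed to doEnrichment as its filter argument; the helper names split_filters and choose_filter are hypothetical.

```python
def split_filters(filters_field):
    """Split a getWidgets() `filters` value into individual filter names."""
    if filters_field is None:  # the widget has no filters (<NA> in R)
        return []
    return [value.strip() for value in filters_field.split(",")]


def choose_filter(filters_field, wanted):
    """Return `wanted` if it is one of the widget's filters, else raise ValueError."""
    options = split_filters(filters_field)
    if wanted not in options:
        raise ValueError(f"{wanted!r} is not one of {options}")
    return wanted


go_filters = "biological_process,cellular_component,molecular_function"
print(split_filters(go_filters))
print(choose_filter(go_filters, "molecular_function"))
```

Validating locally gives a clear error message for a misspelled filter instead of an unexpected or empty result from the server.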

2.5 Apply multiple test correction

To reduce the probability of false positive errors, three different multiple testing correction algorithms can be applied to the results of the enrichment analysis:

  • Benjamini Hochberg (default)
  • Holm-Bonferroni
  • Bonferroni

Applying one of these algorithms changes the p-values and therefore determines how many results the analysis returns:

## NULL
##   identifier                                                description
## 1  IPR006897      Hepatocyte nuclear factor 1, beta isoform, C-terminal
## 2  IPR023219 Hepatocyte nuclear factor 1, N-terminal domain superfamily
## 3  IPR039066                                Hepatocyte nuclear factor 1
## 4  IPR006899                    Hepatocyte nuclear factor 1, N-terminal
## 5  IPR039011                                 Insulin receptor substrate
## 6  IPR006020                                              PTB/PI domain
##                  pValue count populationAnnotationCount
## 1 1.0948063262382918E-5     2                         2
## 2 1.0948063262382918E-5     2                         2
## 3 1.0948063262382918E-5     2                         2
## 4  3.277347110668014E-5     2                         3
## 5  3.277347110668014E-5     2                         3
## 6  3.198964753864943E-4     3                        40
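The three corrections listed above can be sketched with their textbook definitions. This is not InterMine's server-side implementation, which may differ in details such as how ties are handled or how the number of tests m is counted, and the function names are hypothetical. The example applies each method to the same four raw p-values and keeps the terms whose adjusted value falls below 0.05, assuming a strict cut-off equal to the default maxp. It shows how the choice of method changes the number of results returned.

```python
def bonferroni(pvalues):
    """Multiply every p-value by the number of tests m (capped at 1)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def holm(pvalues):
    """Holm step-down: the i-th smallest p-value is multiplied by (m - i + 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank is 0-based
        value = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, value)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(pvalues):
    """BH step-up: the i-th smallest p-value is multiplied by m / i."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)  # descending
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
for method in (bonferroni, holm, benjamini_hochberg):
    adjusted = method(raw)
    kept = [i for i, q in enumerate(adjusted) if q < 0.05]  # assumed maxp = 0.05
    print(method.__name__, [round(q, 4) for q in adjusted], "kept:", kept)
```

Bonferroni and Holm both keep two of the four terms here, while Benjamini-Hochberg, which controls the false discovery rate rather than the family-wise error rate, keeps all four.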

3 Visualization of InterMineR Gene Ontology Enrichment analysis results

3.1 Convert InterMineR enrichment analysis results to a GeneAnswers-class object

To visualize the InterMineR enrichment analysis results, we use the convertToGeneAnswers function. This function was created to facilitate the visualization of doEnrichment results by converting them into a GeneAnswers-class object.

This way, we can use the functions of the GeneAnswers package to visualize the results of the enrichment analysis and the relations between the features (e.g. genes) and/or the annotation categories (e.g. GO terms) (Feng et al. 2010, 2012; Huang et al. 2014).

## [1] "GeneAnswers"
## attr(,"package")
## [1] "GeneAnswers"
## This GeneAnswers instance was build from GO based on hyperG test.
## Statistical information of 636 categories with p value less than 0.05 are reported. Other categories are considered as nonsignificant.
## There are 636 categories related to the given 68 genes
## 
## Summary of GeneAnswers instance information:
## 
## Slot: geneInput
##   GeneID
## 1   6833
## 2   1636
## 3    208
## 4  26060
## 5    359
## 6    551
## ......
## 
## Slot: testType
## [1] "hyperG"
## 
## Slot: pvalueT
## [1] 0.05
## 
## Slot: genesInCategory
## $`GO:0030072`
##  [1] "6833"   "640"    "11132"  "2645"   "3077"   "6927"   "6928"   "3172"  
##  [9] "3557"   "3569"   "3630"   "3667"   "8660"   "3710"   "3767"   "4544"  
## [17] "4760"   "3651"   "6514"   "169026" "6934"  
## 
## $`GO:0046879`
##  [1] "6833"   "640"    "11132"  "2645"   "3077"   "6927"   "6928"   "3172"  
##  [9] "3557"   "3569"   "3630"   "3667"   "8660"   "3710"   "3767"   "4544"  
## [17] "4760"   "3651"   "56729"  "6514"   "169026" "6934"  
## 
## $`GO:0009914`
##  [1] "6833"   "640"    "11132"  "2645"   "3077"   "6927"   "6928"   "3172"  
##  [9] "3557"   "3569"   "3630"   "3667"   "8660"   "3710"   "3767"   "4544"  
## [17] "4760"   "3651"   "56729"  "6514"   "169026" "6934"  
## 
## $`GO:0030073`
##  [1] "6833"   "640"    "11132"  "2645"   "6927"   "6928"   "3172"   "3557"  
##  [9] "3667"   "8660"   "3710"   "3767"   "4544"   "4760"   "3651"   "6514"  
## [17] "169026" "6934"  
## 
## $`GO:0033500`
##  [1] "6833"   "2642"   "2645"   "6927"   "3172"   "3630"   "3643"   "3667"  
##  [9] "8660"   "3767"   "4544"   "4760"   "4938"   "3651"   "5468"   "169026"
## [17] "6934"   "7466"  
## 
## $`GO:0042593`
##  [1] "6833"   "2642"   "2645"   "6927"   "3172"   "3630"   "3643"   "3667"  
##  [9] "8660"   "3767"   "4544"   "4760"   "4938"   "3651"   "5468"   "169026"
## [17] "6934"   "7466"  
## 
## ......
## 
## Slot: geneExprProfile
## NULL
## 
## Slot: annLib
## [1] "org.Hs.eg.db"
## 
## Slot: categoryType
## [1] "GO"
## 
## Slot: enrichmentInfo
##            genes in Category percent in the observed List
## GO:0030072                21                    0.3088235
## GO:0046879                22                    0.3235294
## GO:0009914                22                    0.3235294
## GO:0030073                18                    0.2647059
## GO:0033500                17                    0.2500000
## GO:0042593                17                    0.2500000
##            percent in the genome fold of overrepresents odds ratio
## GO:0030072            0.01512322               20.42049   29.09774
## GO:0046879            0.01885883               17.15533   24.88179
## GO:0009914            0.01940110               16.67583   24.17297
## GO:0030073            0.01247213               21.22379   28.50435
## GO:0033500            0.01096584               22.79808   30.06410
## GO:0042593            0.01090559               22.92403   30.23204
##                 p value  fdr p value
## GO:0030072 4.336649e-19 4.336649e-19
## GO:0046879 7.666360e-19 7.666360e-19
## GO:0009914 9.540238e-19 9.540238e-19
## GO:0030073 1.156112e-16 1.156112e-16
## GO:0033500 2.889996e-16 2.889996e-16
## GO:0042593 3.152207e-16 3.152207e-16
## ......

GeneAnswers is a package designed for the enrichment analysis and visualization of gene lists. It is worth mentioning that the InterMineR filters for the Gene Ontology and Pathway Enrichment widgets:

## [1] "biological_process,cellular_component,molecular_function"
## [2] "All,KEGG pathways data set,Reactome data set"

match the following available values:

InterMineR widget name | InterMineR filter      | convertToGeneAnswers categoryType
go_enrichment_for_gene | biological_process     | GO.BP
go_enrichment_for_gene | cellular_component     | GO.CC
go_enrichment_for_gene | molecular_function     | GO.MF
pathway_enrichment     | KEGG pathways data set | KEGG
pathway_enrichment     | Reactome data set      | REACTOME.PATH

for the categoryType argument of the geneAnswersBuilder function, and can be assigned accordingly to the categoryType argument of convertToGeneAnswers, thereby facilitating the conversion of InterMineR Gene Ontology and Pathway Enrichment analysis results to GeneAnswers-class objects.
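The correspondence in the table above is a simple lookup from a (widget, filter) pair to a categoryType string. The following Python sketch encodes that table so that a categoryType can be derived from the settings used in the enrichment analysis. The dictionary and the helper category_type_for are hypothetical; combinations that the table does not list, such as the "All" pathway filter, raise an error instead of guessing a value.

```python
# (InterMineR widget name, InterMineR filter) -> convertToGeneAnswers categoryType,
# transcribed from the table above.
CATEGORY_TYPES = {
    ("go_enrichment_for_gene", "biological_process"): "GO.BP",
    ("go_enrichment_for_gene", "cellular_component"): "GO.CC",
    ("go_enrichment_for_gene", "molecular_function"): "GO.MF",
    ("pathway_enrichment", "KEGG pathways data set"): "KEGG",
    ("pathway_enrichment", "Reactome data set"): "REACTOME.PATH",
}


def category_type_for(widget, filter_value):
    """Look up the categoryType matching an InterMineR widget/filter pair."""
    try:
        return CATEGORY_TYPES[(widget, filter_value)]
    except KeyError:
        raise ValueError(
            f"no categoryType listed for widget={widget!r}, filter={filter_value!r}"
        ) from None


print(category_type_for("go_enrichment_for_gene", "molecular_function"))  # GO.MF
```

For the molecular_function filter this yields "GO.MF", which matches the categoryType slot of the converted object shown below.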

## [1] "GeneAnswers"
## attr(,"package")
## [1] "GeneAnswers"
## This GeneAnswers instance was build from GO.MF based on hyperG test.
## Statistical information of 6 categories with p value less than 0.05 are reported. Other categories are considered as nonsignificant.
## There are 6 categories related to the given 68 genes
## 
## Summary of GeneAnswers instance information:
## 
## Slot: geneInput
##   GeneID
## 1   6833
## 2   1636
## 3    208
## 4  26060
## 5    359
## 6    551
## ......
## 
## Slot: testType
## [1] "hyperG"
## 
## Slot: pvalueT
## [1] 0.05
## 
## Slot: genesInCategory
## $`GO:0005102`
##  [1] "1636"  "551"   "640"   "2056"  "3077"  "3082"  "3159"  "3172"  "3557" 
## [10] "3569"  "3630"  "3643"  "3667"  "8660"  "5468"  "5770"  "56729" "6934" 
## [19] "7422" 
## 
## $`GO:0005158`
## [1] "3630" "3667" "8660" "5770"
## 
## $`GO:0005159`
## [1] "3630" "3643" "3667"
## 
## $`GO:0140297`
## [1] "50943" "3159"  "3172"  "4760"  "5468"  "6925"  "6934"  "7466" 
## 
## $`GO:0001067`
##  [1] "50943" "3159"  "6927"  "6928"  "3172"  "8462"  "4760"  "5078"  "3651" 
## [10] "5325"  "5468"  "6925"  "6934" 
## 
## $`GO:0044212`
##  [1] "50943" "3159"  "6927"  "6928"  "3172"  "8462"  "4760"  "5078"  "3651" 
## [10] "5325"  "5468"  "6925"  "6934" 
## 
## 
## Slot: geneExprProfile
## NULL
## 
## Slot: annLib
## [1] "org.Hs.eg.db"
## 
## Slot: categoryType
## [1] "GO.MF"
## 
## Slot: enrichmentInfo
##            genes in Category percent in the observed List
## GO:0005102                19                   0.27941176
## GO:0005158                 4                   0.05882353
## GO:0005159                 3                   0.04411765
## GO:0140297                 8                   0.11764706
## GO:0001067                13                   0.19117647
## GO:0044212                13                   0.19117647
##            percent in the genome fold of overrepresents odds ratio
## GO:0005102          0.0936250151               2.984371   3.753821
## GO:0005158          0.0012725730              46.224090  49.050595
## GO:0005159          0.0009695794              45.501838  47.555769
## GO:0140297          0.0203611683               5.778011   6.415079
## GO:0001067          0.0552054296               3.463001   4.045165
## GO:0044212          0.0550842322               3.470621   4.054585
##                p value fdr p value
## GO:0005102 0.003753491 0.003753491
## GO:0005158 0.004425289 0.004425289
## GO:0005159 0.023859182 0.023859182
## GO:0140297 0.023910795 0.023910795
## GO:0001067 0.025815502 0.025815502
## GO:0044212 0.033633007 0.033633007

In the following sections, we briefly show how to use several functions of the GeneAnswers package to visualize the results of the Gene Ontology Enrichment analysis on the PL_DiabetesGenes gene list.

For more information about the usage of the GeneAnswers package, see its documentation.