SBGNview is a tool set for pathway-based data visualization, integration and analysis. SBGNview is similar and complementary to the widely used Pathview (Luo and Brouwer 2013), with the following key features:
Pathway definition by the widely adopted Systems Biology Graphical Notation (SBGN);
Supports multiple major pathway databases beyond KEGG (Reactome, MetaCyc, SMPDB, PANTHER, MetaCrop etc.) and user-defined pathways;
Covers 5,200 reference pathways and over 3,000 species by default;
Extensive graphics controls, including glyph and edge attributes, graph layout and sub-pathway highlighting;
SBGN pathway data manipulation, processing, extraction and analysis.
Please cite the following papers when using this open-source package. This will help the project and our team:
Luo W, Brouwer C. Pathview: an R/Bioconductor package for pathway-based data integration and visualization. Bioinformatics, 2013, 29(14):1830-1831, doi: 10.1093/bioinformatics/btt285
Please see the Quick Start tutorial for installation instructions and quick start examples.
SBGNview is the main function of the package. It takes two major types of inputs: user data (gene and/or compound data) and SBGN pathway data (pathway IDs or SBGN-ML files). SBGNview parses the SBGN pathway data, extracts glyph (node) and arc (edge) data from the SBGN-ML file, maps and integrates the user data to the glyphs, and renders the SBGN graph in SVG format with the mapped data shown as pseudo-colors. Currently it maps gene/protein omics data to “macromolecule” glyphs and compound omics data to “simple chemical” glyphs. The SBGNview function returns an SBGNview object, which contains the data needed to recreate the rendered SBGN graph or to modify it further. Please see the documentation of function SBGNview for more details.
library(SBGNview)
Use ls("package:SBGNview") to list all available functions in SBGNview. To learn more about how to use the functions and access their manuals, use help() or ?. For example, ?"SBGNview" or help("+.SBGNview").
ls("package:SBGNview")
## [1] "changeDataId" "changeIds" "downloadSbgnFile" "findPathways"
## [5] "highlightArcs" "highlightNodes" "highlightPath" "loadMappingTable"
## [9] "outputFile" "outputFile<-" "renderSbgn" "sbgn.gsets"
## [13] "sbgnNodes" "SBGNview"
SBGN pathways are defined in a special XML format (SBGN-ML file). SBGN-ML files describe the pathway elements (molecules and their reactions) and the graph layout (positions and appearances of the pathway elements). There are two major types of data elements in SBGN-ML files: 1. node data (in tag “glyph”), such as node location, width, height and node class (macromolecule, simple chemical etc.). 2. edge data (in tag “arc”), such as arc class, start node and end node. For more details, see: https://github.com/sbgn/sbgn/wiki/SBGN_ML
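To make the two element types concrete, the sketch below parses a tiny hand-written SBGN-ML fragment and tabulates its glyphs and arcs. It assumes the xml2 package is installed. The fragment, its IDs and its coordinates are invented for illustration and do not come from any SBGNview pathway, and this is not necessarily how SBGNview itself reads the files.

```r
library(xml2)

# A hand-written SBGN-ML fragment (hypothetical IDs, labels and coordinates).
sbgn.text <- '<?xml version="1.0" encoding="UTF-8"?>
<sbgn xmlns="http://sbgn.org/libsbgn/0.2">
  <map language="process description">
    <glyph class="macromolecule" id="glyph1">
      <label text="enzyme"/>
      <bbox x="40" y="20" w="108" h="60"/>
    </glyph>
    <glyph class="simple chemical" id="glyph2">
      <label text="substrate"/>
      <bbox x="40" y="140" w="60" h="60"/>
    </glyph>
    <glyph class="process" id="glyph3">
      <bbox x="200" y="158" w="24" h="24"/>
    </glyph>
    <arc class="consumption" id="arc1" source="glyph2" target="glyph3">
      <start x="100" y="170"/>
      <end x="200" y="170"/>
    </arc>
    <arc class="catalysis" id="arc2" source="glyph1" target="glyph3">
      <start x="148" y="50"/>
      <end x="212" y="158"/>
    </arc>
  </map>
</sbgn>'

doc <- read_xml(sbgn.text)
xml_ns_strip(doc)  # drop the default namespace so plain XPath names work

# Node data: one row per <glyph>, with its class and bounding box.
glyphs <- xml_find_all(doc, ".//glyph")
bbox <- xml_find_first(glyphs, "./bbox")
nodes <- data.frame(
  id = xml_attr(glyphs, "id"),
  class = xml_attr(glyphs, "class"),
  x = as.numeric(xml_attr(bbox, "x")),
  y = as.numeric(xml_attr(bbox, "y")),
  w = as.numeric(xml_attr(bbox, "w")),
  h = as.numeric(xml_attr(bbox, "h")),
  stringsAsFactors = FALSE
)

# Edge data: one row per <arc>, with its class and the nodes it connects.
arcs <- xml_find_all(doc, ".//arc")
edges <- data.frame(
  id = xml_attr(arcs, "id"),
  class = xml_attr(arcs, "class"),
  source = xml_attr(arcs, "source"),
  target = xml_attr(arcs, "target"),
  stringsAsFactors = FALSE
)

nodes
edges
```

The two resulting data frames mirror the node and edge data described above: glyph class and geometry on one side, arc class and endpoints on the other.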
SBGNview provides two tiers of support for SBGN-based pathway data, which correspond to the two common SBGNview scenarios:
Scenario 1 or Tier 1, direct and deep support for a core collection of pathway data from 5 major pathway databases including Reactome, SMPDB, PANTHER, MetaCyc and MetaCrop, or using SBGNview’s pathway collection. Each of these databases covers up to thousands of reference pathways and species (Table 1). SBGNview provides diagram optimization and ID mapping on these pathway databases. In other words, we’ve prepared a full collection of high-quality SBGN pathways from these databases, which can be used directly in data analysis. The pathway data in SBGN-ML format are stored in data sbgn.xmls (included in package SBGNview.data), which are indexed and accessed by SBGNview pathway IDs. We’ve also generated ID mappings between SBGN-ML glyph IDs and common molecule IDs, so that user data can be easily mapped to these pathways.
Scenario 2 or Tier 2, indirect support for raw SBGN pathway data from pathway databases, collections and users’ custom pathway data. Many major pathway databases provide SBGN-ML files, including Pathway Commons, Reactome and MetaCrop. Their data can be downloaded and used directly by SBGNview. Other databases (e.g. PANTHER and BioCyc) use SBML or BioPAX formats, which can be converted to SBGN-ML format using third-party tools. Note that Pathway Commons collects pathways from Reactome, PANTHER, SMPDB and other primary databases. These data can be used in SBGNview, except that the diagrams may not be optimized for data visualization, and users need to provide ID mapping or make sure the glyph IDs are commonly used molecule IDs.
Note that the core collection of pathway databases with Tier 1 support includes thousands of pathways and species, which should cover most users’ needs in pathway analysis and data visualization. Here we take the first, default route, i.e. we use SBGNview’s core collection.
Because a large amount of pathway data is loaded into R, this step can take up to several seconds.
data("sbgn.xmls")
We can check the information on the collected pathways using pathways.info and the number of pathways collected using pathways.stats.
data("pathways.info", "pathways.stats")
head(pathways.info)
## file.name database
## 1 http___identifiers.org_panther.pathway_P00001.sbgn pathwayCommons
## 2 http___identifiers.org_panther.pathway_P00002.sbgn pathwayCommons
## 3 http___identifiers.org_panther.pathway_P00003.sbgn pathwayCommons
## 4 http___identifiers.org_panther.pathway_P00004.sbgn pathwayCommons
## 5 http___identifiers.org_panther.pathway_P00005.sbgn pathwayCommons
## 6 http___identifiers.org_panther.pathway_P00006.sbgn pathwayCommons
## uri sub.database pathway.id
## 1 http://identifiers.org/panther.pathway/P00001 panther.pathway P00001
## 2 http://identifiers.org/panther.pathway/P00002 panther.pathway P00002
## 3 http://identifiers.org/panther.pathway/P00003 panther.pathway P00003
## 4 http://identifiers.org/panther.pathway/P00004 panther.pathway P00004
## 5 http://identifiers.org/panther.pathway/P00005 panther.pathway P00005
## 6 http://identifiers.org/panther.pathway/P00006 panther.pathway P00006
## pathway.name macromolecule.ID.type
## 1 Adrenaline and noradrenaline biosynthesis pathwayCommons
## 2 Alpha adrenergic receptor signaling pathway pathwayCommons
## 3 Alzheimer disease-amyloid secretase pathway pathwayCommons
## 4 Alzheimer disease-presenilin pathway pathwayCommons
## 5 Angiogenesis pathwayCommons
## 6 Apoptosis signaling pathway pathwayCommons
## simple.chemical.ID.type
## 1 pathwayCommons
## 2 pathwayCommons
## 3 pathwayCommons
## 4 pathwayCommons
## 5 pathwayCommons
## 6 pathwayCommons
pathways.stats
## database sub.database Freq
## 1 MetaCrop MetaCrop 62
## 5 MetaCyc MetaCyc 2518
## 9 pathwayCommons panther.pathway 149
## 12 pathwayCommons reactome 1746
## 15 pathwayCommons smpdb 725
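As a quick consistency check, the counts printed above can be added up. The sketch below rebuilds the table by hand from that printout, so it runs without the package. With the package data loaded, the same sums could be taken directly on pathways.stats.

```r
# Counts copied from the pathways.stats printout above.
stats <- data.frame(
  database = c("MetaCrop", "MetaCyc", "pathwayCommons",
               "pathwayCommons", "pathwayCommons"),
  sub.database = c("MetaCrop", "MetaCyc", "panther.pathway",
                   "reactome", "smpdb"),
  Freq = c(62, 2518, 149, 1746, 725),
  stringsAsFactors = FALSE
)

# Total number of pathways in the core collection.
total <- sum(stats$Freq)
total  # 5200

# Number of pathways per top-level database.
by.database <- tapply(stats$Freq, stats$database, sum)
by.database
```

The total of 5200 agrees with the figure of 5,200 reference pathways quoted in the feature list.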
SBGNview can search for pathways by keyword and automatically download SBGN-ML files.
pathways <- findPathways(c("bile acid","bile salt"))
head(pathways)
pathways.local.file <- downloadSbgnFile(pathways$pathway.id[1:3])
pathways.local.file
By default findPathways searches for keywords in pathway names. It can also search by different ID types:
pathways <- findPathways(c("tp53","Trp53"),keyword.type = "SYMBOL")
head(pathways)
pathways <- findPathways(c("K04451","K10136"),keyword.type = "KO")
head(pathways)
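Conceptually, a keyword search over pathway names is a filter on the pathway.name column of a table like pathways.info. The base R sketch below illustrates the idea on a toy table built from the six rows printed earlier. The helper searchNames is hypothetical and is not part of SBGNview, and the case-insensitive substring matching is an assumption made here, not a statement about how findPathways matches keywords.

```r
# Toy table built from the first six rows of pathways.info shown earlier.
toy.info <- data.frame(
  pathway.id = c("P00001", "P00002", "P00003",
                 "P00004", "P00005", "P00006"),
  pathway.name = c("Adrenaline and noradrenaline biosynthesis",
                   "Alpha adrenergic receptor signaling pathway",
                   "Alzheimer disease-amyloid secretase pathway",
                   "Alzheimer disease-presenilin pathway",
                   "Angiogenesis",
                   "Apoptosis signaling pathway"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose name contains any of the keywords.
# Matching is a literal, case-insensitive substring test (an assumption).
searchNames <- function(info, keywords) {
  per.keyword <- lapply(keywords, function(k) {
    grepl(tolower(k), tolower(info$pathway.name), fixed = TRUE)
  })
  hit <- Reduce(`|`, per.keyword)
  info[hit, , drop = FALSE]
}

hits <- searchNames(toy.info, c("alzheimer", "adren"))
hits$pathway.id
```

Here the two keywords together select P00001 to P00004; a pathway is kept if any one keyword matches its name.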
Researchers may have different tastes for a “good looking” layout. We have created different layouts for each pathway. Users can download the pre-generated SBGN-ML files and try them.
Users can also define their own pathways and create custom SBGN-ML files from scratch. Several tools, such as the Newt editor, let the user draw a pathway diagram and save it as an SBGN-ML file. These tools may also be able to generate a primitive pathway layout. However, such layouts are usually not desirable for data visualization because they contain too many node-node overlaps and edge-node crossings. Therefore, we recommend that users stick to the SBGNview core collection.
SBGNview can visualize a range of omics data, including both gene (or transcript, protein, enzyme) data and compound (or metabolite, chemical, small molecules) data.
Gene/protein related data will be mapped to “macromolecule” nodes or glyphs on SBGN maps. Most online databases or resources have their own glyph ID types, which are likely to differ from the ID type in the gene data. We often need to map the omics IDs to the node IDs in the SBGN-ML file. In the quick start example (“Adrenaline and noradrenaline biosynthesis” pathway), the SBGN-ML file uses pathwayCommons IDs for gene/protein nodes, whereas the omics dataset uses Entrez gene IDs. The function SBGNview can automatically map common ID types such as ENTREZ, UniProt etc. to nodes in our pre-generated SBGN-ML files, as shown in the quick start example. We can also do it manually using the function changeDataId, which is called by SBGNview internally to do ID mapping. Supported ID type pairs can be found in data(mapped.ids). changeDataId uses pre-generated mapping tables or pathview to do the mapping. If the input-output ID type pair is not in data(mapped.ids) or can’t be mapped by SBGNview automatically, the user needs to provide the mapping table explicitly using the “id.mapping.table” argument.
First, load the preprocessed demo gene data.
data("gse16873.d")
head(gse16873.d)
Let’s change the IDs in the gene data manually as a demo.
# select a subset of data for later use
gse16873 <- gse16873.d[,1:3]
# gene data to demonstrate ID conversion
gene.data <- gse16873
head(gene.data[,1:2])
## DCIS_1 DCIS_2
## 10000 -0.30764480 -0.14722769
## 10001 0.41586805 -0.33477259
## 10002 0.19854925 0.03789588
## 10003 -0.23155297 -0.09659311
## 100048912 -0.04490724 -0.05203146
## 10004 -0.08756237 -0.05027725
gene.data <- changeDataId(data.input.id = gene.data,
                          input.type = "entrez",
                          output.type = "pathwayCommons",
                          mol.type = "gene",
                          sum.method = "sum")
When multiple genes are mapped to the same glyph ID, we use the sum of their values (sum.method = “sum”).
head(gene.data[,1:2])
## DCIS_1
## Protein_000250be0b7ce9123ef439995f26fa1e 0.3209480
## Protein_0002d93f5ea4dd15ef9852f0d2ae15cc 0.2479337
## Protein_000b5881617c17b2c87c72bc3585c252 0.3344750
## Protein_000be815679a2fe779da5eb31081ba20 -0.2698879
## Protein_000be815679a2fe779da5eb31081ba20_Complex_855d843284bb4f558426811e0ce43095 -0.2698879
## Protein_000be815679a2fe779da5eb31081ba20_Complex_c5f21a9fa7df73cf25672b0f21e7f544 -0.2698879
## DCIS_2
## Protein_000250be0b7ce9123ef439995f26fa1e -0.32585772
## Protein_0002d93f5ea4dd15ef9852f0d2ae15cc 1.03972870
## Protein_000b5881617c17b2c87c72bc3585c252 0.35504432
## Protein_000be815679a2fe779da5eb31081ba20 0.04336225
## Protein_000be815679a2fe779da5eb31081ba20_Complex_855d843284bb4f558426811e0ce43095 0.04336225
## Protein_000be815679a2fe779da5eb31081ba20_Complex_c5f21a9fa7df73cf25672b0f21e7f544 0.04336225
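The effect of summing values when several genes map to one glyph can be sketched in base R. The values, the mapping table and the glyph IDs below are all made up for illustration, and rowsum stands in for whatever changeDataId does internally. Note that one gene may also map to several glyphs, in which case its value is copied to each of them, as with the repeated values in the output above.

```r
# Toy gene-level data: rows are Entrez IDs, columns are samples (made-up values).
gene.values <- matrix(c(1.0, -0.5,  2.0,
                        0.5,  1.5, -1.0),
                      ncol = 2,
                      dimnames = list(c("10000", "10001", "10002"),
                                      c("sample_1", "sample_2")))

# Hypothetical gene-to-glyph mapping: two genes share glyph_A,
# and gene 10002 appears in two different glyphs.
mapping <- data.frame(
  entrez = c("10000", "10001", "10002", "10002"),
  glyph = c("glyph_A", "glyph_A", "glyph_B", "glyph_C"),
  stringsAsFactors = FALSE
)

# Repeat each gene's row once per glyph it maps to ...
expanded <- gene.values[mapping$entrez, , drop = FALSE]

# ... then sum the rows that land on the same glyph.
glyph.values <- rowsum(expanded, group = mapping$glyph)
glyph.values
```

In this toy case glyph_A receives the sum of genes 10000 and 10001 (0.5 and 2.0 for the two samples), while glyph_B and glyph_C each carry the values of gene 10002 unchanged.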
Now we run SBGNview, the main function, to map and render the gene data on the SBGN map.
SBGNview.obj <- SBGNview(gene.data = gene.data,
                         input.sbgn = "P00001",
                         output.file = "test_output",
                         gene.id.type = "pathwayCommons",
                         output.formats = c("png", "pdf", "ps"))
SBGNview.obj
SBGNview always generates a .svg file. Other formats can be requested in addition. In this example, three additional files (pdf, ps, png) will be created in the same folder.