Computational Exploration of
Metabolic Networks with Pathway Tools
Part 2: APIs & Examples
Randy Gobbel, Ph.D.
Bioinformatics Research Group
SRI International
[email protected]
http://BioCyc.org/

Computing with Pathway Tools: APIs
Generic functions with a consistent naming scheme
  Basic frame access functions
  Built-in functions for analysis and global statistics
Simultaneous access to multiple KBs
  Cross-species comparisons
  Specialized KBs
    MetaCyc
    SchemaBase

Computing with Pathway Tools: APIs
PerlCyc interface
  Library of Perl functions for querying PGDBs via socket connection
  Database access functions
    Select_Organism, All_Pathways
  Functions for performing inference / hardwired queries
    Genes_Of_Reaction, Genes_Of_Pathway
    Transcription_Unit_Transcription_Factors
    Enzyme_P
  JavaCyc interface also in progress
  http://aracyc.stanford.edu/~mueller/perlcyc/
Lisp API
  http://bioinformatics.ai.sri.com/ptools/ptools-resources.html

PerlCyc and JavaCyc
Interface to a running Pathway Tools image over TCP
Names are translated to Perl and Java conventions
Object references are supported by means of unique frame names
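
The name translation mentioned above can be illustrated with a small sketch. It assumes that hyphenated Lisp names map to underscore-separated names for Perl and to camelCase names for Java; the exact rules and casing used by PerlCyc and JavaCyc are not given on the slide, so both helper functions here are hypothetical.

```python
def lisp_to_perl(name: str) -> str:
    """Assumed Perl convention: hyphens become underscores."""
    return name.replace("-", "_")


def lisp_to_java(name: str) -> str:
    """Assumed Java convention: hyphen-separated words become camelCase."""
    first, *rest = name.split("-")
    return first + "".join(word.capitalize() for word in rest)


perl_name = lisp_to_perl("genes-of-pathway")
java_name = lisp_to_java("genes-of-pathway")
```

Frame names themselves are passed through unchanged as strings, which is how object references can cross the TCP boundary.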

Pathway Tools API Functions
get_class_all_instances(Class)
  Returns the instances of Class
Key Pathway Tools classes:
  Genetic-Elements
  Genes
  Proteins
    Polypeptides
    Protein-Complexes
  Pathways
  Reactions
  Compounds-And-Elements
  Enzymatic-Reactions
  Transcription-Units
  Promoters
  DNA-Binding-Sites

Pathway Tools API Functions
Notation: Frame.Slot means a specified slot of a specified frame
get_slot_value(Frame Slot)
  Returns first value of Frame.Slot
get_slot_values(Frame Slot)
  Returns all values of Frame.Slot
slot_has_value_p(Frame Slot)
  Returns true if Frame.Slot has at least one value
member_slot_value_p(Frame Slot Value)
  Returns true if Value is one of the values of Frame.Slot
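
To make the semantics of these functions concrete, here is a toy in-memory stand-in written in Python. It is not the Pathway Tools implementation and does not talk to a running server; the dictionaries KB and CLASSES are hypothetical, the class hierarchy is flattened, and the slot values are borrowed from the flat-file examples in this talk.

```python
from typing import Any, Optional

# Hypothetical toy KB: frame name -> slot name -> list of values.
KB: dict[str, dict[str, list[Any]]] = {
    "R84-RXN": {
        "in-pathway": ["P122-PWY", "P107-PWY"],
        "left": ["GLC-6-P", "NAD"],
    },
    "P107-PWY": {"reaction-list": ["R84-RXN", "R12-RXN", "R10-RXN"]},
    "GLC-6-P": {"common-name": ["glucose-6-phosphate"]},
}

# Hypothetical class membership table (no subclass inference in this toy).
CLASSES: dict[str, list[str]] = {
    "Reactions": ["R84-RXN"],
    "Pathways": ["P107-PWY"],
    "Compounds-And-Elements": ["GLC-6-P"],
}


def get_class_all_instances(cls: str) -> list[str]:
    """Return the instances of cls."""
    return list(CLASSES.get(cls, []))


def get_slot_values(frame: str, slot: str) -> list[Any]:
    """Return all values of Frame.Slot (empty list if none)."""
    return list(KB.get(frame, {}).get(slot, []))


def get_slot_value(frame: str, slot: str) -> Optional[Any]:
    """Return the first value of Frame.Slot, or None if the slot is empty."""
    values = get_slot_values(frame, slot)
    return values[0] if values else None


def slot_has_value_p(frame: str, slot: str) -> bool:
    """True if Frame.Slot has at least one value."""
    return bool(get_slot_values(frame, slot))


def member_slot_value_p(frame: str, slot: str, value: Any) -> bool:
    """True if value is one of the values of Frame.Slot."""
    return value in get_slot_values(frame, slot)
```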

Additional Pathway Tools Functions – Semantic Inference Layer
Built-in functions encode commonly used queries that compute indirect DB relationships
  genes_of_pathway, substrates_of_pathway
  all_transcription_factors, regulon_of_protein
See http://bioinformatics.ai.sri.com/ptools/ptools-fns.html for more information
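
An "indirect relationship" is one that requires following several slots in sequence. The sketch below shows what a query like genes_of_pathway might do on a toy KB: walk from a pathway to its reactions, to their enzymatic reactions, to the enzymes, and finally to the genes. It is an illustration, not the built-in function. The frame names are invented, and the 'enzyme' slot on enzymatic-reaction frames is an assumption; 'reaction-list', 'enzymatic-reaction', 'components' and 'gene' are the slot names used elsewhere in this talk.

```python
# Hypothetical toy KB: frame name -> slot name -> list of values.
KB = {
    "PWY-X": {"reaction-list": ["RXN-1", "RXN-2"]},
    "RXN-1": {"enzymatic-reaction": ["ENZRXN-1"]},
    "RXN-2": {"enzymatic-reaction": ["ENZRXN-2", "ENZRXN-3"]},
    "ENZRXN-1": {"enzyme": ["PROT-A"]},
    "ENZRXN-2": {"enzyme": ["CPLX-BC"]},
    "ENZRXN-3": {"enzyme": ["PROT-A"]},
    "CPLX-BC": {"components": ["PROT-B", "PROT-C"]},
    "PROT-A": {"gene": ["geneA"]},
    "PROT-B": {"gene": ["geneB"]},
    "PROT-C": {"gene": ["geneC"]},
}


def get_slot_values(frame, slot):
    return KB.get(frame, {}).get(slot, [])


def genes_of_protein(protein):
    """Genes of a polypeptide, or of all subunits of a complex (recursive)."""
    genes = list(get_slot_values(protein, "gene"))
    for component in get_slot_values(protein, "components"):
        for gene in genes_of_protein(component):
            if gene not in genes:
                genes.append(gene)
    return genes


def toy_genes_of_pathway(pathway):
    """Follow pathway -> reaction -> enzymatic reaction -> enzyme -> gene."""
    genes = []
    for rxn in get_slot_values(pathway, "reaction-list"):
        for enzrxn in get_slot_values(rxn, "enzymatic-reaction"):
            for enzyme in get_slot_values(enzrxn, "enzyme"):
                for gene in genes_of_protein(enzyme):
                    if gene not in genes:
                        genes.append(gene)
    return genes


pathway_genes = toy_genes_of_pathway("PWY-X")
```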

Computing with Pathway Tools: Flat Files
Two file formats: tab-delimited, attribute-value
One file for each format, each datatype
Specification: http://bioinformatics.ai.sri.com/ptools/flatfile-format.html
Examples:
  Pathways.col – Pathways and genes encoding enzymes
  Enzymes.col – Enzymes and reactions they catalyze
  Pathways.dat – Full data on each pathway
  Reactions.dat – Full data on each reaction

Example Flat File
UNIQUE-ID - P107-PWY
TYPES - Energy-Metabolism
COMMON-NAME - RuMP cycle and formaldehyde assimilation
REACTION-LIST - FORMATEDEHYDROG-RXN
REACTION-LIST - FORMALDEHYDE-DEHYDROGENASE-RXN
REACTION-LIST - 6PGLUCONDEHYDROG-RXN
REACTION-LIST - R84-RXN
REACTION-LIST - PGLUCISOM-RXN
REACTION-LIST - R12-RXN
REACTION-LIST - R10-RXN
SYNONYMS - ribulose-monophosphate cycle
SYNONYMS - formaldehyde oxidation
//

Example Flat File – Reactions.dat
UNIQUE-ID - R84-RXN
TYPES - EC-1.1.1
EC-NUMBER - 1.1.1.-
IN-PATHWAY - P122-PWY
IN-PATHWAY - P107-PWY
LEFT - GLC-6-P
LEFT - NAD
OFFICIAL-EC? - NO
RIGHT - 6-P-GLUCONATE
RIGHT - NADH
RIGHT - PROTON
//

Example Flat File – Compounds.dat
UNIQUE-ID - GLC-6-P
TYPES - Carbohydrate-Derivatives
COMMON-NAME - glucose-6-phosphate
CAS-REGISTRY-NUMBERS - 56-73-5
CHEMICAL-FORMULA - (C 6)
CHEMICAL-FORMULA - (H 13)
CHEMICAL-FORMULA - (O 9)
CHEMICAL-FORMULA - (P 1)
MOLECULAR-WEIGHT - 260.137
SYNONYMS - D-glucose-6-P
SYNONYMS - glucose-6-P
SYNONYMS - α-D-glucose-6-phosphate
SYNONYMS - α-D-glucose-6-P
SYNONYMS - D-glucose-6-phosphate
//
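
The attribute-value files shown above are simple enough to read with a few lines of code. The following is a minimal sketch, based only on what the examples show: one "ATTRIBUTE - value" pair per line, repeated attributes for multiple values, and "//" closing each record. Skipping lines that start with "#" is an assumption, and features of the full format specification (such as multi-line values) are not handled. The function name parse_attribute_value is hypothetical.

```python
from collections import defaultdict


def parse_attribute_value(text: str) -> list[dict[str, list[str]]]:
    """Parse attribute-value records into dicts of attribute -> list of values."""
    records = []
    current = defaultdict(list)
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line or line.startswith("#"):  # assumed comment syntax
            continue
        if line == "//":  # end of record
            if current:
                records.append(dict(current))
            current = defaultdict(list)
            continue
        # Split on the first " - " only, so values such as "1.1.1.-" survive.
        attr, sep, value = line.partition(" - ")
        if sep:
            current[attr].append(value)
    if current:  # tolerate a final record without a closing "//"
        records.append(dict(current))
    return records


SAMPLE = """\
UNIQUE-ID - R84-RXN
TYPES - EC-1.1.1
EC-NUMBER - 1.1.1.-
IN-PATHWAY - P122-PWY
IN-PATHWAY - P107-PWY
LEFT - GLC-6-P
LEFT - NAD
OFFICIAL-EC? - NO
RIGHT - 6-P-GLUCONATE
RIGHT - NADH
RIGHT - PROTON
//
"""

records = parse_attribute_value(SAMPLE)
reaction = records[0]
```

Keeping every attribute as a list mirrors the frame model, where any slot may hold several values.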

Bioinformatics Results: Algorithms
Query and visualization environment for genome and pathway information
PathoLogic algorithm predicts the metabolic network of an organism from its genome
Algorithm for global characterization of a metabolic network
Algorithms under development for qualitative modeling of the cell

The Pathway Tools KB as a "virtual cell"
Detailed representation of proteins, including subunits
  Protein complexes and modifications
Links from genome, through proteins, to pathways and superpathways

Computing with the Metabolic Network
Comparative analysis of metabolic networks
Visualization of expression data
Correlation of metabolism and transport
Connectivity analysis of metabolic network
Forward propagation of metabolites
Verification of known growth media with metabolic network

Computational Exploration of PGDBs
Infer metabolic network from genome
  Bioinformatics 18:705 2002
Global properties of the metabolic network
  Genome Research 10:568 2000
Global properties of the genetic network
Comparison of whole metabolic networks
Consistency of a PGDB with respect to known growth-media requirements
Search for gaps in metabolic network
  Pacific Symp Biocomputing 2001:471

Example Studies
Relationship of protein subunits to gene positions
Global properties of the E. coli metabolic network
  Reactions catalyzed by more than one enzyme
  Enzymes that catalyze more than one reaction
  Reactions participating in more than one pathway
  Automatic detection of intersection points in the metabolic network
Nutrient analyses
  Forward propagation: Given a set of nutrients, what compounds will be produced by the metabolic network?
  Backtracking: Given a forward propagation result, and a set of essential compounds that are not included in that result, what precursors must be supplied to produce those compounds?
Operon prediction

Protein subunits and linked genes
Question: are protein subunits coded by neighboring genes?
Proteins are linked to genes, gene positions are recorded in the KB
Procedure
  Fetch all protein complexes
  Subunits are stored in the ‘components’ slot
  Each component has a ‘gene’ slot
  Genes have ‘left-end-position’ and ‘right-end-position’ slots
Results
  Protein subunits of >90% of heteromeric enzymes are encoded by neighboring genes
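
The procedure above can be sketched on a toy KB as follows. The slide does not define "neighboring", so this sketch assumes it means that the subunit genes occupy consecutive positions when all genes are sorted by ‘left-end-position’. All frame names and coordinates are invented for illustration, and the helper names are hypothetical.

```python
# Hypothetical toy KB: frame name -> slot name -> list of values.
KB = {
    "CPLX-1": {"components": ["PROT-A", "PROT-B"]},
    "CPLX-2": {"components": ["PROT-C", "PROT-D"]},
    "PROT-A": {"gene": ["g1"]},
    "PROT-B": {"gene": ["g2"]},
    "PROT-C": {"gene": ["g3"]},
    "PROT-D": {"gene": ["g5"]},
    "g1": {"left-end-position": [100]},
    "g2": {"left-end-position": [950]},
    "g3": {"left-end-position": [2000]},
    "g4": {"left-end-position": [3100]},
    "g5": {"left-end-position": [4200]},
}
COMPLEXES = ["CPLX-1", "CPLX-2"]
GENES = ["g1", "g2", "g3", "g4", "g5"]


def get_slot_values(frame, slot):
    return KB.get(frame, {}).get(slot, [])


def subunit_genes(cplx):
    """Set of genes encoding the components of a complex."""
    genes = set()
    for component in get_slot_values(cplx, "components"):
        genes.update(get_slot_values(component, "gene"))
    return genes


def encoded_by_neighbors(cplx, gene_order):
    """True if the subunit genes form one consecutive run in gene_order."""
    genes = subunit_genes(cplx)
    if len(genes) < 2:  # not heteromeric in this toy model
        return False
    idx = sorted(gene_order.index(g) for g in genes)
    return idx[-1] - idx[0] == len(idx) - 1


# Chromosome order, assuming a single replicon.
gene_order = sorted(GENES, key=lambda g: get_slot_values(g, "left-end-position")[0])

heteromers = [c for c in COMPLEXES if len(subunit_genes(c)) >= 2]
neighbors = [c for c in heteromers if encoded_by_neighbors(c, gene_order)]
fraction = len(neighbors) / len(heteromers)
```

In the toy data, CPLX-1 is encoded by adjacent genes g1 and g2, while the genes of CPLX-2 are separated by g4.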

Global properties: How many reactions are catalyzed by more than one enzyme?
Procedure
  get_class_all_instances(‘Reactions’)
  We are interested only in reactions with at least one value in their ‘enzymatic-reaction’ slot
  result = reactions with more than one value for their ‘enzymatic-reaction’ slot
Results
  About 10% of reactions are catalyzed by more than one enzyme
  Two classes of multi-enzyme reactions
    Homologous enzymes
    “Easy” reactions
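
A minimal sketch of this counting procedure, on invented data. The dictionary stands in for the ‘enzymatic-reaction’ slot of each reaction frame; the sketch assumes that each enzymatic-reaction value corresponds to one distinct enzyme. The function name is hypothetical.

```python
# Hypothetical data: reaction -> values of its 'enzymatic-reaction' slot.
ENZYMATIC_REACTIONS = {
    "RXN-1": ["ENZRXN-1"],
    "RXN-2": ["ENZRXN-2", "ENZRXN-3"],
    "RXN-3": [],  # no enzyme recorded, so excluded from the denominator
    "RXN-4": ["ENZRXN-4"],
}


def multi_enzyme_reactions(slot_values):
    """Return (reactions with >1 enzyme, fraction among enzyme-catalyzed reactions)."""
    enzymatic = {rxn: vals for rxn, vals in slot_values.items() if vals}
    multi = sorted(rxn for rxn, vals in enzymatic.items() if len(vals) > 1)
    fraction = len(multi) / len(enzymatic) if enzymatic else 0.0
    return multi, fraction


multi, fraction = multi_enzyme_reactions(ENZYMATIC_REACTIONS)
```

Filtering out reactions with an empty slot first matters: otherwise reactions with no recorded enzyme would dilute the percentage.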

Global properties: Multifunctional enzymes (how many enzymes catalyze more than one reaction?)
Procedure
  get_class_all_instances(‘Proteins’)
  result = proteins with more than one value in the ‘catalyzes’ slot
Results
  100 out of 607 enzymes catalyze multiple reactions
  This is significantly more than predicted by genome sequencing projects

Global properties: Reactions in multiple pathways
Procedure
  get_class_all_instances(‘Reactions’)
  result = reactions with more than one value in the ‘in-pathway’ slot
Significance
  Reactions that appear in multiple pathways correspond to intersection points in the metabolic network
  Could be used to identify candidate reactions for drug targets
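
One way to turn the ‘in-pathway’ query into a list of intersection points is to group the shared reactions by the pair of pathways they connect. The sketch below does this on toy data: the R84-RXN entry follows the Reactions.dat example, while the single-pathway entries for R12-RXN and R10-RXN are assumed for illustration. The function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical data: reaction -> values of its 'in-pathway' slot.
IN_PATHWAY = {
    "R84-RXN": ["P122-PWY", "P107-PWY"],
    "R12-RXN": ["P107-PWY"],
    "R10-RXN": ["P107-PWY"],
}


def intersection_points(in_pathway):
    """Map each pair of pathways to the reactions they have in common."""
    shared = defaultdict(list)
    for rxn, pathways in in_pathway.items():
        for pair in combinations(sorted(set(pathways)), 2):
            shared[pair].append(rxn)
    return dict(shared)


points = intersection_points(IN_PATHWAY)
```

Reactions with a single ‘in-pathway’ value produce no pairs and therefore drop out automatically.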

Metabolic Overview Queries
Species comparison
  Highlight reactions that are
    Shared/not-shared with
    Any-one/All-of
    A specified set of species
Overlay expression data
  Absolute or relative expression levels
  Reaction color reflects expression level
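
The species comparison query amounts to set operations over each organism's reaction set. The sketch below is an illustration only, not how the Overview is implemented; the organism names and reaction sets are invented, and the function name and its mode parameter are hypothetical.

```python
def shared_reactions(reference, others, mode="any"):
    """Reactions of `reference` also found in any one / all of the other species."""
    sets = list(others.values())
    if not sets:
        return set()
    if mode == "any":
        pool = set.union(*sets)
    elif mode == "all":
        pool = set.intersection(*sets)
    else:
        raise ValueError("mode must be 'any' or 'all'")
    return reference & pool


# Invented reaction sets for three organisms.
reference = {"RXN-1", "RXN-2", "RXN-3", "RXN-4"}
others = {
    "species-B": {"RXN-1", "RXN-2"},
    "species-C": {"RXN-2", "RXN-3"},
}

shared_any = shared_reactions(reference, others, mode="any")
shared_all = shared_reactions(reference, others, mode="all")
not_shared = reference - shared_any
```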

C. crescentus Cell Cycle Gene Expression

Global Consistency Checking of Biochemical Network
Given:
  A PGDB for an organism
  A set of initial metabolites
Infer:
  What set of products can be synthesized by the small-molecule metabolism of the organism
Can known growth medium yield known essential compounds?
Pacific Symposium on Biocomputing p471 2001

Algorithm: Forward Propagation
[Diagram labels: Nutrient set, Transport, Metabolite set, “Fire” reactions, Reactants, Products, PGDB reaction pool]
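
The forward propagation idea can be sketched as a fixed-point iteration: start from the nutrient set, "fire" every reaction whose reactants are all available, add its products to the metabolite set, and repeat until nothing changes. This is a simplified illustration under stated assumptions, not the published algorithm: reactions are treated as one-directional, transport is not modelled, and the reaction pool is a small invented dictionary.

```python
def forward_propagate(nutrients, reactions):
    """Return (reachable metabolites, fired reactions).

    `reactions` maps a reaction name to a (reactants, products) pair.
    """
    metabolites = set(nutrients)
    fired = set()
    changed = True
    while changed:
        changed = False
        for name, (reactants, products) in reactions.items():
            if name in fired:
                continue
            if set(reactants) <= metabolites:  # all reactants available
                fired.add(name)
                metabolites |= set(products)
                changed = True
    return metabolites, fired


# Invented reaction pool.
REACTIONS = {
    "R1": (["A", "B"], ["C"]),
    "R2": (["C"], ["D", "E"]),
    "R3": (["E", "X"], ["F"]),  # X is never available, so R3 cannot fire
}

metabolites, fired = forward_propagate({"A", "B"}, REACTIONS)

# Consistency check: which essential compounds were not produced?
essential = {"D", "F"}
missing = essential - metabolites
```

Comparing the result against a list of essential compounds, as in the last two lines, is the consistency check: anything left in `missing` points to a gap in the medium or in the KB.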

Results
Phase I: Forward propagation
  21 initial compounds yielded only half of 38 essential compounds for E. coli
Phase II: Manually identify
  Bugs in EcoCyc (e.g., two objects for tryptophan)
  Missing initial protein substrates (e.g., ACP)
  Missing pathways in EcoCyc
Phase III: Forward propagation with 11 more initial metabolites
  Yielded all 38 essential compounds

Initial Metabolites (Total: 21 compounds)
Nutrients (8) (M61 Minimal growth medium)
  H+, Fe2+, Mg2+, K+, NH3, SO4 2-, PO4 2-, Glucose
Nutrients (10) (Growth conditions)
  Water, Oxygen, Trace elements (Mn2+, Co2+, Mo2+, Ca2+, Zn2+, Cd2+, Ni2+, Cu2+)
Bootstrap Compounds (3)
  ATP, NADP, CoA

Nutrient-Related Analysis: Validation of the EcoCyc Database
Results on EcoCyc:
Phase I:
• Essential compounds
  • produced: 19
  • not produced: 19
• Total compounds
  • produced: (28%)
• Reactions
  • Fired: (31%)

Missing Essential Compounds Due To
Bugs in EcoCyc
Narrow conceptualization of the problem
  Protein substrates
Incomplete biochemical knowledge

Nutrient-Related Analysis: Validation of the EcoCyc Database
Results on EcoCyc:
Phase II (After adding 11 extra metabolites):
• Essential compounds
  • produced: 38
  • not produced: 0
• Total compounds
  • produced: (49%)
  • not produced: (51%)
• Reactions
  • Fired: (58%)
  • Not fired: (42%)

Operon Prediction
Based on the method of Moreno-Hagelsieb et al. Bioinformatics 18 Suppl. 1 (2002)
  Distance between genes
  Functional classification
  Correctly predicts 75% of transcription units, 65% of operons
Additional information available in PGDB
  Pathways
  Protein complexes
  Transporters
  Improved prediction performance: 80% of transcription units, 69% of operons
Detailed paper in preparation

Visualization of Genetic Network
Operon display window
Transcription factor display window
Highlight regulon on Overview diagram
Paint expression data onto Overview diagram
  Database adapter mechanism: MAGE-ML intermediate form
  Adapter defined for SMD
  Animation
  User specified mapping of color ranges
  Import of SAM files (next release)
  List of significantly +/- genes
Display full genetic network (later release)

Acknowledgements
SRI: Peter Karp, Suzanne Paley, Pedro Romero, John Pick, Randy Gobbel, Cindy Krieger, Martha Arnaud
EcoCyc Project: Julio Collado-Vides, Ian Paulsen, Monica Riley, Milton Saier
MetaCyc Project: Sue Rhee, Lukas Mueller, Peifen Zhang, Chris Somerville
Stanford: Gary Schoolnik, Harley McAdams, Lucy Shapiro, Russ Altman, Iwei Yeh
Funding sources:
  NIH National Center for Research Resources
  NIH National Institute of General Medical Sciences
  NIH National Human Genome Research Institute
  Department of Energy Microbial Cell Project
  DARPA BioSpice, UPC
BioCyc.org