Date post: | 28-Mar-2015 |
Overview of the Pathway Tools Software and Pathway/Genome Databases
SRI International Bioinformatics
Introductions
BRG staff: Peter Karp, Tomer Altman, Joe Dale, Fred Gilham, John Myers, Suzanne Paley, Markus Krummenacker, Ingrid Keseler, Ron Caspi, Alex Shearer, Carol Fulcher
Attendees: Where are you from, and what genome are you working on? What do you hope to get out of the tutorial?
SRI International
Private nonprofit research institute
No permanent funding sources
1300 staff in Menlo Park
– Founded in 1946 as Stanford Research Institute
– Separated from Stanford University in 1970
– Name changed to SRI International in 1977
SRI Organization
Information and Computing Sciences
Engineering Systems and Sciences
Physical Sciences
Biopharmaceuticals and Pharmaceutical Discovery
Education and Policy
Bioinformatics Research Group
Research in the SRI Bioinformatics Research Group
BioCyc Database Collection: EcoCyc, MetaCyc
Pathway Tools
BioWarehouse
Outline for Tutorial
Monday
– Introduction
– Pathway/Genome Navigator
– Introduction to Pathway/Genome Editors
Tuesday
– PathoLogic tutorial
– PathoLogic lab session: build initial version of PGDB
– Pathway hole filler lecture and lab
Wednesday
– PathoLogic: creating protein complexes, operon predictor, transport inference parser
– Pathway Tools schema
– Model organism database projects
Thursday
– Advanced Pathway/Genome Editors
Friday
– Overviews and Omics Viewers
– Comparative analysis
– Structured Advanced Query Form
– Metabolite tracing
– Regulation
Tutorial Goals
General familiarity with Pathway Tools goals and functionality
Ability to create, edit, and navigate a new PGDB
Create new PGDB for genome(s) you brought with you
Familiarity with information resources available about Pathway Tools to continue your work
SRI’s Support for Pathway Tools
NIH grant finances software development and user support
Additional grants finance other software development
Email us bug reports, suggestions, questions
Comprehensive bug reports are required for us to fix the problem you reported
Keep us posted regarding your progress
Administrative Details
Please wear your badge at all times
Escort required outside this room/hallway
Let us know when you are leaving
Use the E-Bldg entrance
Phone numbers to call from entrance
Meals
Restrooms
Tutorial Format
Questions welcome during presentations
Lab sessions will take different amounts of time for different people
– Refine your PGDB
– Read Pathway Tools manuals
Computer logins
Internet connectivity
Pathway/Genome Database
[Diagram of PGDB contents within a cell: chromosomes, plasmids; genes; proteins, RNAs; reactions; pathways; compounds; operons, promoters; DNA binding sites, regulatory interactions; sequence features]
BioCyc Collection of Pathway/Genome Databases
Pathway/Genome Database (PGDB): combines information about
– Pathways, reactions, substrates
– Enzymes, transporters
– Genes, replicons
– Transcription factors/sites, promoters, operons
Tier 1: Literature-derived PGDBs
– MetaCyc
– EcoCyc (Escherichia coli K-12)
Tier 2: Computationally derived DBs, some curation (20 PGDBs)
– HumanCyc
– Mycobacterium tuberculosis
Tier 3: Computationally derived DBs, no curation (349 DBs)
Terminology – Pathway Tools Software
PathoLogic
– Predicts operons, metabolic network, pathway hole fillers, from genome
– Computational creation of new Pathway/Genome Databases
Pathway/Genome Editors
– Distributed curation of PGDBs
– Distributed object database system, interactive editing tools
Pathway/Genome Navigator
– WWW publishing of PGDBs
– Querying, visualization of pathways, chromosomes, operons
– Analysis operations: pathway visualization of gene-expression data; global comparisons of metabolic networks
Bioinformatics 18:S225 2002
Pathway Tools Software: PGDBs Created Outside SRI
1000+ licensees: 75+ groups applying software to 150+ organisms
Saccharomyces cerevisiae, SGD project, Stanford University (pathway.yeastgenome.org/biocyc/)
Mouse, MGD, Jackson Laboratory
dictyBase, Northwestern University
Under development:
– CGD (Candida albicans), Stanford University
– Drosophila, P. Ebert in collaboration with FlyBase
– C. elegans, P. Ebert in collaboration with WormBase
Planned: RGD (Rat), Medical College of Wisconsin
Arabidopsis thaliana, TAIR, Carnegie Institution of Washington
Tomato and Potato, Cornell University
GrameneDB, Cold Spring Harbor Laboratory
Medicago truncatula, Samuel Roberts Noble Foundation
Pathway Tools Software: PGDBs Created Outside SRI (continued)
NIAID BRCs: BioHealthBase (M. tuberculosis, F. tularensis), PATRIC, ApiDB (Cryptosporidium)
F. Brinkman, Simon Fraser Univ, Pseudomonas aeruginosa
V. Schachter, Genoscope, Acinetobacter
M. Bibb, John Innes Centre, Streptomyces coelicolor
G. Church, Harvard, Prochlorococcus marinus, multiple strains
E. Uberbacher, ORNL and G. Serres, MBL, Shewanella oneidensis
R.J.S. Baerends, University of Groningen, Lactococcus lactis IL1403, Lactococcus lactis MG1363, Streptococcus pneumoniae TIGR4, Bacillus subtilis 168, Bacillus cereus ATCC14579
Matthew Berriman, Sanger Centre, Trypanosoma brucei, Leishmania major
Herbert Chiang, Washington University, Bacteroides thetaiotaomicron
Sergio Encarnacion, UNAM, Sinorhizobium meliloti
Gregory Fournier, MIT, Mesoplasma florum
Mark van der Giezen, University of London, Entamoeba histolytica, Giardia intestinalis
Michael Gottfert, Technische Universität Dresden, Bradyrhizobium japonicum
Artiva Maria Goudel, Universidade Federal de Santa Catarina, Brazil, Chromobacterium violaceum ATCC 12472
Kenneth J. Kauffman, University of California, Riverside, Desulfovibrio vulgaris
Pathway Tools Software: PGDBs Created Outside SRI (continued)
Mike McLeod, University of British Columbia, Rhodococcus sp. RHA1
Robert S. Munson, Children's Research Institute, Ohio, Haemophilus ducreyi, Haemophilus influenzae 86-026NP
John Nash, Canadian NRC, Campylobacter jejuni
Christopher S. Reigstad, Washington University, Escherichia coli UTI89
Haluk Resat, Pacific Northwest Lab, Rhodobacter sphaeroides
Gary Xie, Los Alamos Lab, Bacillus cereus
Large-scale users:
– C. Medigue, Genoscope, 107 PGDBs
– G. Burger, U Montreal, 48 PGDBs
– Bart Weimer, Utah State University, Lactococcus lactis, Brevibacterium linens, Lactobacillus acidophilus, Lactobacillus plantarum, Lactobacillus johnsonii, Listeria monocytogenes
Partial listing of outside PGDBs at BioCyc.org
Terminology
“Database” = “DB” = “Knowledge Base” = “KB” = “Pathway/Genome Database” = “PGDB”
Why Create PGDBs?
Extract more information from your genome
Create an up-to-date computable information repository about an organism
Perform analyses on the genome and pathway complement of the organism
– Analyses of omics data
– Analyses of cellular systems (dead-end metabolites)
– Reports generated by Pathway Tools
Perform comparative analyses with other organisms
Generate a genome poster and metabolic wall chart
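The dead-end metabolite analysis listed above can be illustrated with a minimal sketch. It assumes a toy reaction list in which every reaction is irreversible and transport is ignored; the names `REACTIONS` and `find_dead_ends` are hypothetical and are not part of Pathway Tools.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy network: reaction id -> (substrates, products).
# All reactions are treated as irreversible for simplicity.
Reaction = Tuple[List[str], List[str]]

REACTIONS: Dict[str, Reaction] = {
    "R1": (["A"], ["B"]),
    "R2": (["B"], ["C"]),
    "R3": (["B"], ["D"]),
    "R4": (["D"], ["A"]),
}


def find_dead_ends(reactions: Dict[str, Reaction]) -> Dict[str, Set[str]]:
    """Return metabolites that are only produced or only consumed."""
    consumed: Set[str] = set()
    produced: Set[str] = set()
    for substrates, products in reactions.values():
        consumed.update(substrates)
        produced.update(products)
    return {
        # Produced by some reaction but never used as a substrate.
        "only_produced": produced - consumed,
        # Used as a substrate but never produced by any reaction.
        "only_consumed": consumed - produced,
    }


if __name__ == "__main__":
    print(find_dead_ends(REACTIONS))
```

In this toy network, C is produced by R2 but consumed by nothing, so it is reported as a dead end. A real analysis would also need to account for reversible reactions and transporters.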
Sequence Project Workflow
[Workflow diagram] Stages:
Raw Sequence
Phred
Phrap
BLAST, BLOCKS
GeneMark/Glimmer
PathoLogic
P/G Navigator
P/G Editors
WWW Publishing, Analyses
Pathway Tools
EcoCyc Project – EcoCyc.org
E. coli Encyclopedia
– Review-level model-organism database for E. coli
– Tracks evolving annotation of the E. coli genome and cellular networks
– The two paradigms of EcoCyc
“Multi-dimensional annotation of the E. coli K-12 genome”
– Positions of genes; functions of gene products – 76% / 66% exp
– Gene Ontology terms; MultiFun terms
– Gene product summaries and literature citations
– Evidence codes
– Multimeric complexes
– Metabolic pathways
– Regulation of transcription initiation
Nuc. Acids Res. 35:7577 2007; ASM News 70:25 2004; Science 293:2040
Karp, Gunsalus, Collado-Vides, Paulsen
Paradigm 1: EcoCyc as Textual Review Article
All gene products for which experimental literature exists are curated with a minireview summary
– Found on protein and RNA pages, not gene pages!
– 3257 gene products contain summaries
Summaries cover function, interactions, mutant phenotypes, crystal structures, regulation, and more
Additional summaries found in pages for operons, pathways
EcoCyc cites 15,880 publications
Paradigm 2: EcoCyc as Computational Symbolic Theory
Highly structured, high-fidelity knowledge representation provides computable information
Each molecular species defined as a DB object
– Genes, proteins, small molecules
Each molecular interaction defined as a DB object
– Metabolic reactions
– Transport reactions
– Transcriptional regulation of gene expression
220 database fields capture extensive properties and relationships
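To make the "species and interactions as DB objects" idea concrete, here is a small sketch using Python dataclasses. It is only an illustration of linked, typed objects: the class names, fields, and the placeholder entities (`geneA`, `enzA`, and so on) are hypothetical and do not reflect the actual Pathway Tools schema or the Ocelot object database.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Gene:
    name: str


@dataclass
class Protein:
    name: str
    gene: Gene  # link from a gene product back to its gene


@dataclass
class Compound:
    name: str


@dataclass
class Reaction:
    name: str
    substrates: List[Compound]
    products: List[Compound]
    enzymes: List[Protein] = field(default_factory=list)


def genes_for_reaction(reaction: Reaction) -> List[str]:
    """Follow reaction -> enzyme -> gene links and return sorted gene names."""
    return sorted({enzyme.gene.name for enzyme in reaction.enzymes})


# Placeholder objects, for illustration only.
gene_a, gene_b = Gene("geneA"), Gene("geneB")
enz_a, enz_b = Protein("enzA", gene_a), Protein("enzB", gene_b)
rxn = Reaction(
    name="rxn1",
    substrates=[Compound("S")],
    products=[Compound("P")],
    enzymes=[enz_b, enz_a],
)

if __name__ == "__main__":
    print(genes_for_reaction(rxn))
```

Because relationships are stored as object references rather than free text, a question such as "which genes encode enzymes for this reaction?" becomes a simple traversal.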
EcoCyc Procedures
DB updates performed by 5 staff curators
– Information gathered from biomedical literature
– Enter data into structured database fields
– Author extensive summaries
– Update evidence codes
Corrections submitted by E. coli researchers
Four releases per year
Quality assurance of data and software
– Evaluate database consistency constraints
– Perform element balancing of reactions
– Run other checking programs
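Element balancing of a reaction, one of the quality-assurance checks above, can be sketched as follows. This is a simplified illustration, not the check that EcoCyc actually runs: it assumes plain formulas such as `C6H12O6` (no parentheses, charges, or isotopes), and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple

# One side of a reaction: list of (stoichiometric coefficient, formula).
Side = List[Tuple[int, str]]


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula like 'C6H12O6' (no parentheses)."""
    counts: Counter = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_totals(side: Side) -> Counter:
    """Sum atom counts over one side, weighted by coefficients."""
    total: Counter = Counter()
    for coefficient, formula in side:
        for element, n in parse_formula(formula).items():
            total[element] += coefficient * n
    return total


def is_balanced(left: Side, right: Side) -> bool:
    """True if every element occurs equally often on both sides."""
    return side_totals(left) == side_totals(right)


if __name__ == "__main__":
    # Glucose oxidation: C6H12O6 + 6 O2 -> 6 CO2 + 6 H2O
    left = [(1, "C6H12O6"), (6, "O2")]
    right = [(6, "CO2"), (6, "H2O")]
    print(is_balanced(left, right))
```

A reaction that fails this check usually points to a missing substrate or product (often water or a proton) or to a wrong formula on one of the compounds.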
MetaCyc: Metabolic Encyclopedia
Describe a representative sample of every experimentally determined metabolic pathway
Describe properties of metabolic enzymes
Literature-based DB with extensive references and commentary
Pathways, reactions, enzymes, substrates
Jointly developed by
– P. Karp, R. Caspi, C. Fulcher, SRI International
– L. Mueller, A. Pujar, Cornell Univ
– S. Rhee, P. Zhang, Carnegie Institution
Nucleic Acids Research 2008
MetaCyc Data -- Version 11.6
Pathways 1,010
Reactions 6,576
Enzymes 4,582
Small Molecules 6,561
Organisms 1,077
Citations 15,875
Taxonomic Distribution of MetaCyc Pathways
Bacteria 517
Green Plants 372
Mammals 90
Fungi 89
Archaea 65
Family of Pathway/Genome Databases
MetaCyc
EcoCyc
CauloCyc
AraCyc
MtbRvCyc
HumanCyc
Comparison of BioCyc to KEGG: The Data
KEGG approach: static collection of pathway diagrams that are color-coded to produce organism-specific views
KEGG vs MetaCyc: resource on literature-derived pathways
– KEGG pathway maps are composites of pathways in many organisms; they do not identify which specific pathways were elucidated in which organisms
– KEGG pathway maps encompass multiple biological pathways and are 2-4 times the size of MetaCyc pathways
– KEGG has no literature citations, no summaries, less enzyme detail
KEGG vs BioCyc organism-specific PGDBs
– KEGG re-annotates entire genome for each organism
– KEGG does not curate or customize pathway networks for each organism
Comparison of Pathway Tools to KEGG: The Software
KEGG has no pathway hole filler or transport inference parser or operon predictor
KEGG has no interactive editing tools – you cannot refine a KEGG pathway DB
KEGG has no algorithmic visualization tools – pathway diagrams are pre-drawn
– May become out of date
– Cannot show pathways at multiple detail levels
KEGG genome browser has very limited functionality
KEGG has one overview diagram with limited functionality
KEGG has no metabolite tracing tool
KEGG has no Structured Advanced Query Tool
Overviews and Omics Viewers
Genome-scale visualizations
– Metabolic map
– Transcriptional regulatory network
– Genome map
Overlay gene expression, proteomics, metabolomics data
Obtain pathway-based visualizations of omics data
– Numerical spectrum of expression values mapped to a color spectrum
– Steps of overview painted with color corresponding to expression level(s) of genes that encode enzyme(s) for that step
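The mapping from expression values to colors can be sketched roughly as below. This is an assumption-laden illustration rather than the Omics Viewer's actual algorithm: the five-color `PALETTE`, the linear binning, and the choice to average the values of all genes for a step are simplifications introduced here, and all names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical five-step color spectrum, low expression to high.
PALETTE: List[str] = ["blue", "cyan", "grey", "orange", "red"]


def value_to_bin(value: float, lo: float, hi: float, n_bins: int) -> int:
    """Map a value in [lo, hi] to a bin index in 0..n_bins-1 (clamped)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clamped = min(max(value, lo), hi)
    fraction = (clamped - lo) / (hi - lo)
    return min(int(fraction * n_bins), n_bins - 1)


def color_steps(
    step_genes: Dict[str, List[str]],
    expression: Dict[str, float],
    lo: float,
    hi: float,
) -> Dict[str, Optional[str]]:
    """Assign each reaction step a color from the mean expression of its genes.

    Steps whose genes have no measured value are left uncolored (None).
    """
    colors: Dict[str, Optional[str]] = {}
    for step, genes in step_genes.items():
        values = [expression[g] for g in genes if g in expression]
        if not values:
            colors[step] = None
            continue
        mean = sum(values) / len(values)
        colors[step] = PALETTE[value_to_bin(mean, lo, hi, len(PALETTE))]
    return colors


if __name__ == "__main__":
    steps = {"step1": ["g1", "g2"], "step2": ["g3"]}
    log_ratios = {"g1": 2.0, "g2": 1.0}
    print(color_steps(steps, log_ratios, lo=-2.0, hi=2.0))
```

Averaging is only one option; a viewer could equally show one color per gene when several enzymes catalyze the same step.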
Environment for Computational Exploration of Genomes
Powerful ontology opens many facets of the biology to computational exploration
– Global characterization of metabolic network
– Analysis of interface between transport and metabolism
– Nutrient analysis of metabolic network
Pathway Tools Implementation Details
Allegro Common Lisp
Sun, Linux, Windows, Macintosh platforms
Ocelot object database
370,000+ lines of code
Lisp-based WWW server at BioCyc.org
– Manages 370+ PGDBs
The Common Lisp Programming Environment
Gat studied Lisp and Java implementations of 16 programs by 14 programmers (Intelligence 11:21 2000)
Survey
Please complete survey at end of each day
PGDB(s) That You Build
Before you leave
– Tar up your PGDB directory and FTP it home, email it home, or copy it to flash disk
– We will create a backup copy of your PGDB directory if the directory is still there at the end of the tutorial
– Delete the PGDB directory if you don’t want us to back it up
– We will not give the backed up data to anyone else
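Archiving a PGDB directory before leaving can be done with the `tar` command or, as in the sketch below, with Python's standard `tarfile` module. The function name `archive_pgdb` and the example directory name are hypothetical; the actual location of your PGDB directory depends on your installation.

```python
import tarfile
from pathlib import Path
from typing import Optional, Union


def archive_pgdb(
    pgdb_dir: Union[str, Path],
    out_path: Optional[Union[str, Path]] = None,
) -> Path:
    """Create a gzip-compressed tar archive of a PGDB directory.

    By default the archive is written next to the directory as
    '<dirname>.tar.gz'. Returns the path of the archive.
    """
    src = Path(pgdb_dir).resolve()
    if not src.is_dir():
        raise NotADirectoryError(str(src))
    dest = Path(out_path) if out_path else src.parent / (src.name + ".tar.gz")
    with tarfile.open(dest, "w:gz") as tar:
        # arcname keeps paths in the archive relative to the PGDB directory.
        tar.add(src, arcname=src.name)
    return dest


if __name__ == "__main__":
    # Hypothetical directory name; substitute your own PGDB directory.
    example = Path("mycyc")
    if example.is_dir():
        print(archive_pgdb(example))
```

The resulting `.tar.gz` file can then be transferred by FTP, email, or flash disk.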
Information Sources
Pathway Tools User’s Guide
– /root/aic-export/pathway-tools/ptools/11.5/doc/manuals/userguide.pdf
– NOTE: Location of the aic-export directory can vary across different computers
Pathway Tools Web Site: publications, FAQ, programming examples, etc.
– http://bioinformatics.ai.sri.com/ptools/
BioCyc Publications Page
– http://biocyc.org/publications.shtml
MetaCyc Guide
– http://metacyc.org/MetaCycUserGuide.shtml
Slides from this tutorial
– http://bioinformatics.ai.sri.com/ptools/tutorial/
BioCyc Webinars
– http://biocyc.org/webinar.shtml
Reporting Pathway Tools Problems
Tell us:
– What platform you are running on
– What version of Pathway Tools you are running
– The error message
– Result of [1] EC(2) :zoom :count :all
– What operation you were performing when the error occurred
New patches are automatically downloaded and loaded when PTools starts up
– Auto-Patch
– Tools -> Instant Patch -> Download and Activate All Patches
Summary
Pathway Tools and Pathway/Genome Databases
– Not just for pathways!
– Computational inferences: operons, metabolic pathways, pathway hole fillers
– Editing tools
– Analysis tools: omics data on pathways
– Web publishing of PGDBs
Main classes of users:
– Develop PGDB to extract more information from genome for genome paper
– Develop a model-organism DB for the organism that is updated regularly and published on the web