Page 1: Bridging Bioinformatics and Chem(o)informatics Gary Wiggins School of Informatics Indiana University wiggins@indiana.edu Yan He (SLIS MLS Student) Meredith.

Bridging Bioinformatics and Chem(o)informatics

Gary Wiggins
School of Informatics
Indiana University
wiggins@indiana.edu

Yan He (SLIS MLS Student)
Meredith Saba (SLIS MLS Student)

Provocative Thought

“While much bioscience is published with the knowledge that machines will be expected to understand at least part of it, almost all chemistry is published purely for humans to read.”
Murray-Rust et al. Org. Biomol. Chem. 2004, 2, 3201.

Overview of the Talk

• Review of ACS CINF 2004 Papers
• Review of Relevant Articles
• Public Chemistry Databases and Data Repositories with Bioinformatics Info/Links
• Overview of Web Services
• NIH-funded Projects Underway or Planned at Indiana University

“The Bigger Picture — Linking Bioinformatics to Cheminformatics”

• American Chemical Society Division of Chemical Information (CINF) Symposium, Anaheim, Spring 2004
• All-day session with 16 papers
• http://www.acscinf.org/new/docs/meetings/227nm/227cinfabstracts.htm

Problems from ACS CINF 2004

• Both technical and people factors hinder knowledge exchange between biology and chemistry. (Lipinski)
• People Problems per Chris Lipinski
  • Meta data capture is complicated by people issues, particularly those between chemists and biologists.
  • Discipline-based disconnects occur distressingly often and are frequently overlooked as a cause of lost productivity.

Interdisciplinary Collaborations: Biology and Chemistry

[What’s] “... important for these collaborations is, not only do you have to accept the other guy’s paradigm or at least live with it; you have to be willing to accept the other guy’s foibles or your perception of the other guy’s foibles (and recognize the opposite of this). We each have our own approaches to how we do science, and it’s just different cultures.”

--Thom Kauffman interview in ACS LiveWire, March 2005, 7.3. http://pubs.acs.org/4librarians/livewire/2006/7.3/profile.html

Some Questions from the ACS CINF 2004 Symposium

• “Find all proteins related to protein A (i.e. within a given path length of A) in a protein interaction graph, and retrieve related assay results and compound structures.”
• “Find all pathways where compound X inhibits or slows a reaction, and retrieve Gene Ontology classifications for all proteins involved in the reaction.”
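
The first question can be read as a bounded breadth-first search over the interaction graph followed by a join against assay data. A minimal Python sketch, assuming a toy graph and assay table: the protein labels, compound labels, and the helpers `proteins_within` and `related_assays` are all hypothetical and are not taken from the symposium papers.

```python
from collections import deque

# Hypothetical toy data: an undirected protein interaction graph and
# assay results keyed by protein. None of these names come from the talk.
INTERACTIONS = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
}
ASSAYS = {
    "B": [("compound-1", "active")],
    "D": [("compound-2", "inactive")],
    "E": [("compound-3", "active")],
}


def proteins_within(graph, start, max_path_length):
    """Return {protein: distance} for proteins within max_path_length of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distances[current] == max_path_length:
            continue  # do not expand past the path-length limit
        for neighbor in graph.get(current, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    del distances[start]  # report only the related proteins, not the start itself
    return distances


def related_assays(graph, assays, start, max_path_length):
    """Join the neighborhood of start against the assay table."""
    related = proteins_within(graph, start, max_path_length)
    return {p: assays[p] for p in sorted(related) if p in assays}


result = related_assays(INTERACTIONS, ASSAYS, "A", 2)
print(result)
```

With a path length of 2, protein E (three steps from A) is left out, so its assay row is not retrieved. In practice the graph and the assay table would live in separate biological and chemical databases, which is where the integration problem the symposium discussed comes in.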

Problems from ACS CINF 2004

• Commercial vs. public data
• Batch mode data processing possible in biology, but primitive in chemistry
• Primary HTS data has a very high noise factor
• Data format standardization problem
  • Chemoinformatics and bioinformatics use completely different data formats and analysis tools
• Chemical and protein sequence information has been largely analyzed separately

Solutions from ACS CINF 2004

• Linking biological and chemical information in computational approaches to predict biological activity, ADME profiles, and adverse drug reactions (ADR)
• Energetics of binding for more accurate and sensitive chemical representation of DNA-protein interactions
• A discovery informatics platform that facilitates archival, sharing, integration, and exploration of synthetic methods and biological activity data

Solutions from ACS CINF 2004

• Data pipelining approach makes it possible to apply bioinformatics and chemoinformatics data and analyses together.
• Visualizations are the best way for people to understand data.

Solutions from ACS CINF 2004

• Cabinet (Chemical And Biological Information NETwork, formerly Fedora) servers include
  • Metabolic pathway network chart (Empath)
  • Protein-Ligand Association Network (Planet)
  • Enzyme Commission Codebook (EC Book)
  • Traditional Chinese Medicines (TCM)
  • World Drug Index (WDI), and others.
• Built on the Daylight HTTP toolkit
• http://www.metaphorics.com/products/cabinet.html

Overview of the Talk

• Review of ACS CINF 2004 Papers
• Review of Relevant Articles
• Public Chemistry Databases and Data Repositories with Bioinformatics Info/Links
• Overview of Web Services
• NIH-funded Projects Underway or Planned at Indiana University

What is Chemoinformatics? (Brown)

• “…the essence of chemoinformatics is integration and focus rather than its components, which are independent disciplines.”
• Supporting disciplines:
  • Chemical information
  • Computational chemistry
  • Chemometrics

Chemoinformatics and Disease

Toolkits as Integrators (Brown)

• Companies such as Daylight, Advanced Visual Systems, OpenEye, and SciTegic provide integration systems for:
  • Statistical methods
  • Text mining
  • Computational chemistry
  • Visualization

Genego’s MetaDrug Product

• Toxicogenomics platform for the prediction of human drug metabolism and toxicity of novel compounds
• Enables the visualization of pre-clinical and clinical high-throughput data in the context of the complete biological system
• Integrates chemical, biological, and protein function data
• http://www.genego.com/

BioWisdom

• Examination of vast amounts of available information using its Sofia KnowledgeScan methodology
• SRS data integration platform
• http://www.biowisdom.com/

Lessons from Hip Hop (Salamone)

• Mashup technique
  • Bring together disparate informatics, biological, chemical, and imaging information when conducting research
• Example of an integration tool: iSpecies.org
  • A search for a species returns a page with NCBI genomics information, Yahoo images of the species, and articles culled from Google Scholar

iSpecies.org Search

For mus musculus

Chemogenomics and Chemoproteomics (Gagna)

• Chemogenomics (def.): The description of all potential drugs that can be used against all possible target sites, OR the actions of target-specific chemical ligands and how they are used to globally examine genes
• Chemoproteomics (def.): Uses chemistry to characterize protein structure and functions
• They are “. . . a form of chemical biology brought up to date in the area of genome and proteome analysis.”

New Interdisciplinary Journals

• ACS Chemical Biology (ACS)
• ChemBioChem; A European Journal of Chemical Biology (Wiley/VCH)
• Chemical Biology and Drug Design (Blackwell)
• JBIC; Journal of Biological and Inorganic Chemistry (Springer)
• Journal of Biochemical and Molecular Toxicology (Wiley)
• Molecular Biosystems (RSC)
• Nature Chemical Biology (Nature Publishing)
• Organic & Biomolecular Chemistry (RSC)

Open Source Software (Geldenhuys)

• Log P calculator from Interactive Analysis
  • http://www.logp.com
• University of Utah’s Computational Science and Engineering Online
  • Can submit jobs for molecular mechanics, quantum chemical calculations, and biomolecular interfaces for viewing PDB files
  • http://www.cse-online.net
• Virtual Computational Chemistry Laboratory
  • http://www.vcclab.org

The Blue Obelisk (Guha)

• Several open chemistry and chemoinformatics projects that have pooled forces to enhance interoperability
• Maintain:
  • Chemoinformatics Algorithms Dictionary
  • Data Repository for standardized data for chemical properties and other facts (e.g., mass)
• http://www.blueobelisk.org/

BlueObelisk.org

• Working collaboratively on projects such as:
  • Chemistry Development Kit (CDK)
  • JChemPaint
  • Jmol
  • JUMBO
  • NMRShiftDB
  • Octet
  • Open Babel
  • QSAR
  • World Wide Molecular Matrix (WWMM)

Barriers to the Use of Open Source Software

• Unix command line
• Problem: Lack of known standards and datasets of compounds for validation, e.g., in docking programs

Lessons from the Human Genome Project (Austin)

• Keys to success in the HGP were:
  • Comprehensiveness
  • Commitment to open access to the sequence as a research tool without encumbrance
• Proposed tools for a “genome functionation toolbox”:
  • Whole-genome transcriptome and proteome characterization
  • Development of small inhibitory RNAs (siRNAs) and knockout mice for every gene
  • Small molecules and the druggable genome

ChemDB http://cdb.ics.uci.edu/CHEM/Web/

ChEBI, Chemical Entities of Biological Interest

• Dictionary of molecular entities focused on small chemical compounds
• Features an ontological classification, showing the relationships between molecular entities or classes of entities and their parents and/or children
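
The parent/child relationships in such a classification can be pictured as a directed graph that is walked upward from an entity to its ancestors, or downward to its children. A minimal sketch, assuming a small simplified `IS_A` table: the table and the helpers `ancestors` and `children` are illustrative only and are not copied from ChEBI or its software.

```python
# Hypothetical, simplified "is a" relationships: child -> set of direct parents.
IS_A = {
    "ethanol": {"primary alcohol"},
    "methanol": {"primary alcohol"},
    "primary alcohol": {"alcohol"},
    "alcohol": {"organic compound"},
}


def ancestors(entity, is_a=IS_A):
    """Collect every class reachable by following parent links upward."""
    found = set()
    stack = [entity]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def children(parent, is_a=IS_A):
    """Return the entities whose direct parents include `parent`."""
    return {child for child, parents in is_a.items() if parent in parents}


print(sorted(ancestors("ethanol")))
print(sorted(children("primary alcohol")))
```

Storing only direct parents and deriving ancestors on demand keeps the table small; a real ontology also carries relationship types other than "is a", which this sketch leaves out.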

Vioxx Entry in ChEBI

The IUPAC International Chemical Identifier (InChI)

• Open source, non-proprietary, public-domain identifier for chemicals
• String of characters that uniquely represents a molecular substance
• Independent of the way the chemical structure is drawn
• Enables reliable structure recognition and easy linking of diverse data compilations
• Accepts as input MOLfiles (or SDfiles) and CML files
• Download the program to your computer at: http://www.iupac.org/inchi/license.html
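
Because the identifier is a plain string, it can serve directly as a join key between otherwise unrelated data compilations. A minimal sketch, assuming two small made-up tables keyed by InChI: the records and the helpers `split_layers` and `link_by_inchi` are hypothetical, and the strings use the `InChI=1S/` prefix of standard InChI, which may differ from what the software produced at the time of the talk.

```python
# InChI strings for water and ethanol; the attached records are invented
# for illustration and do not come from any real database.
PROPERTIES = {
    "InChI=1S/H2O/h1H2": {"name": "water"},
    "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3": {"name": "ethanol"},
}
ASSAY_RESULTS = {
    "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3": {"assay": "example-assay", "outcome": "inactive"},
}


def split_layers(inchi):
    """Split an InChI into its version prefix, formula, and remaining layers."""
    prefix, _, body = inchi.partition("/")
    formula, *layers = body.split("/")
    return {"version": prefix, "formula": formula, "layers": layers}


def link_by_inchi(left, right):
    """Merge records from two compilations that share the same InChI."""
    return {key: {**left[key], **right[key]} for key in left.keys() & right.keys()}


linked = link_by_inchi(PROPERTIES, ASSAY_RESULTS)
print(linked)
print(split_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))
```

Only ethanol appears in both tables, so only its records are merged. The join relies on both compilations having generated their identifiers from the same InChI version and options.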

Generation of InChI for Vioxx with wInChI

Vioxx Entry in PubChem Compounds Found with InChI

Vioxx Bioassay Data in PubChem

Vioxx PubChem Link to External Sources of Information

The Elsevier MDL/NIH Link via PubChem and DiscoveryGate

• Cross-indexes PubChem to the Compound Index hosted on Elsevier MDL’s DiscoveryGate platform
• MDL added 5 million structures from PubChem to their index, resulting in over 14 million unique chemical structures
• Links go both ways
  • Can move from biological data in PubChem to bioactivity, chemical sourcing, synthetic methodology, and EHS data in DiscoveryGate sources

Elsevier MDL’s xPharm

• Comprehensive set of records linking:
  • Agents (compounds) (2300)
  • Targets (600)
  • Disorders (450)
  • Principles that govern their interactions (180)
• Answers questions such as:
  • What targets are associated with control of blood pressure?
  • What adverse effects are associated with monoamine oxidase inhibitors?

Text Datamining (Banville)

• “In the pharmaceutical field, it is ideally the marriage of biological and chemical information that needs to be the ultimate focus of text data mining applications.”
• Problems:
  • Lack of universal publication standards for identifying each unique chemical entity
  • Selective indexing policies of A&I services
  • Need to understand how chemical structures link to biological processes

Chemical Datamining Software

• SureChem
  • http://surechem.reeltwo.com/
• CLiDE
  • Recognizes structures, reactions, and text
  • http://www.simbiosys.ca/clide/
• OSCAR
  • “OSCAR1” to check experimental data
    • http://www.ch.cam.ac.uk/magnus/checker.html
    • http://www.rsc.org/Publishing/ReSourCe/AuthorGuidelines/AuthoringTools/ExperimentalDataChecker/
• CSR (Chemical Structure Reconstruction)
  • http://www.scai.fraunhofer.de/uploads/media/MZ-ERCIM05_04.pdf
• MDL DocSearch: combines MDL’s Isentris platform and EMC’s Documentum

Overview of the Talk

• Review of ACS CINF 2004 Papers
• Review of Relevant Articles
• Public Chemistry Databases and Data Repositories with Bioinformatics Info/Links
• Overview of Web Services
• NIH-funded Projects Underway or Planned at Indiana University

Themes from SwissProt’s 20th Anniversary Conference, “In silico Analysis of Proteins”

• Knowledgebases, databases and other information resources for proteins
• Sequence searches and alignments
• Protein sequence analysis
• Protein structure prediction, analysis and visualization
• Proteomics data analysis

Chemoinformatics Databases (Jónsdóttir)

• Lists databases relevant to drug discovery and development, including:
  • General databases
  • DBs for screening compounds
  • DBs for medicinal agents
  • DBs with ADMET properties
  • DBs with physico-chemical properties
• Curiously does not mention Chemical Abstracts

Databases with Protein and Ligand Information (Jónsdóttir)

• Protein Data Bank
• Target Registration Database
• Relibase: uses structural info to analyze protein-ligand interactions; Relibase+ for protein-protein interaction searching
• Cambridge Structural Database
• KEGG LIGAND DB for enzyme reactions
  • http://www.genome.ad.jp/ligand

Other Databases with Protein and Ligand Information

• SitesBase--a database of known ligand binding sites within the PDB
  • http://www.bioinformatics.leeds.ac.uk/sb/main.html
• Binding MOAD
  • http://www.bindingmoad.org/
• sc-PDB (Kellenberger)
  • http://bioinfo-pharma.u-strasbg.fr:8080/scPDB/index.jsp

sc-PDB http://bioinfo-pharma.u-strasbg.fr:8080/scPDB/index.jsp

Isatin Search on sc-PDB

Other Databases with Protein-Protein Interaction Data (Jónsdóttir)

• YPD, Yeast Proteome Database (for proteins from S. cerevisiae)
  • http://www.biobase.de/pages/index.php?id=139
• Human Protein Reference Database
  • http://www.hprd.org/
• BIND, Biomolecular Interaction Network Database (ceased as of 11/16/2005?)
  • http://www.bind.ca/Action

Page 47

International Molecular Exchange (IMEx) Consortium

http://imex.sourceforge.net/
BIND (http://www.blueprint.org), The Blueprint Initiative Asia Pte. Ltd, Singapore, and The Blueprint Initiative North America, Toronto, Canada
DIP (http://dip.doe-mbi.ucla.edu), UCLA-DOE Institute for Genomics & Proteomics
IntAct (http://www.ebi.ac.uk/intact), EMBL-European Bioinformatics Institute, Hinxton, UK
MINT (http://mint.bio.uniroma2.it/mint/), University of Rome “Tor Vergata”, Rome, Italy
MPact (http://mips.gsf.de/genre/proj/mpact), MIPS / Institute for Bioinformatics, Munich, Germany

Page 48

Protein Sites from IU I533 Students and others

LigandDepot--integrated source for small molecules
  http://ligand-depot.rutgers.edu/index.html
PSIPRED Protein Structure Prediction Server
  http://bioinf.cs.ucl.ac.uk/psipred/
DSSP--a database of secondary structure assignments (and much more) for all protein entries in the PDB
  http://swift.cmbi.ru.nl/gv/dssp/
Dr. Predrag Radivojac’s I690 class on Structural Bioinformatics
  http://www.informatics.indiana.edu/predrag/2006springi690/2006springi690.htm

Page 49

Protein Secondary Structure Prediction

Methods
  • Neural Network
  • Rule Based
  • Other Machine Learning
  • Homology Based

Page 50

Protein Secondary Structure Prediction Software

PredictProtein
  http://www.predictprotein.org/
Chou-Fasman
  http://fasta.bioch.virginia.edu/fasta_www/chofas.htm
NN Predict
  http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html

Page 51

Structure-Based Docking Methods

Method
  • Scans many small molecules and “docks” them to a site of interest on a protein structure
  • Predicts free energy of binding
  • Filters thousands of compounds relatively quickly
  • Top hits can be used for more rigorous computational/experimental characterization and optimization
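The screening loop on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the interface of any real docking package: `score_fn` stands in for a docking program's predicted binding free energy, and the compound names and energies are made up.

```python
from typing import Callable, Iterable


def top_hits(compounds: Iterable[str],
             score_fn: Callable[[str], float],
             n: int = 3) -> list[tuple[str, float]]:
    """Score every compound and keep the n with the lowest predicted energy."""
    scored = [(name, score_fn(name)) for name in compounds]
    # More negative predicted binding free energy means tighter predicted binding.
    scored.sort(key=lambda pair: pair[1])
    return scored[:n]


# Hypothetical precomputed scores (kcal/mol) standing in for a docking run.
FAKE_SCORES = {"cmpd_A": -9.1, "cmpd_B": -5.4, "cmpd_C": -7.8, "cmpd_D": -3.2}

hits = top_hits(FAKE_SCORES, FAKE_SCORES.__getitem__, n=2)
```

In practice the short list in `hits` is what would be passed on to the more rigorous characterization the slide mentions.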

Page 52

Structure-Based Docking Methods

DOCK
  http://dock.compbio.ucsf.edu/
Accelrys’s Insight (built on DOCK)
  • http://www.accelrys.com/products/insight/
FlexX
  http://www.biosolveit.de/FlexX/
Glide
  http://www.schrodinger.com/ProductDescription.php?mID=6&sID=6
GOLD
  http://www.ccdc.cam.ac.uk/products/life_sciences/gold/

Page 53

Useful Structure Databases

ModBase
  http://modbase.compbio.ucsf.edu/modbase-cgi-new/search_form.cgi
Dali Database (Fold classification; based on PDB)
  http://ekhidna.biocenter.helsinki.fi/dali/start
Protein Structure Analysis, Comparison, &/or Classification [Guide]
  http://www.bio.vu.nl/nvtb/Structures.html

Page 54

SCOP, Structural Classification of Proteins

Curated database of structural and evolutionary relationships
All known protein folds (v. 1.69, July 2005)
  • 70,859 domains organized into 2,845 families, 1,539 superfamilies, and 945 folds
Detailed information about close relatives
Links to coordinates, images of structures, interactive viewers, and literature references
  http://scop.mrc-lmb.cam.ac.uk/scop/

Page 55

SCOP Search Options

Homology search yields a list of structures with significant levels of sequence similarity
Keyword search matches words in SCOP and PDB

Page 56

CATH Protein Structure Classification

Like SCOP, structured hierarchically by:
  • Class (determined by secondary structure)
  • Architecture (overall shape, e.g., barrel, sandwich, roll, etc.); no equivalent in SCOP
  • Topology (grouped into fold families based on overall shape and connectivity of secondary structures)
  • Homologous Superfamily (domains thought to share a common ancestor)
As of January 2005, had 43,229 domains classified into 1,467 superfamilies and 5,107 sequence families; a protein family database (CATH-PFDB) contained a total of 616,470 domain sequences classified into 23,876 sequence families
  • http://cathwww.biochem.ucl.ac.uk/latest/index.html

Page 57

CATH Search Options

Can browse or search the classification by CATH code
CATH codes can be used to search other databases, e.g., DHS, Gene3D, and Impala
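Because a CATH code encodes the four levels of the hierarchy in order, it can be split and compared level by level. The sketch below assumes the usual dotted four-number form (Class.Architecture.Topology.Homologous superfamily); the function names are hypothetical and the example codes are illustrative inputs only.

```python
from typing import NamedTuple


class CathCode(NamedTuple):
    class_: int                  # C: secondary-structure class
    architecture: int            # A: overall shape
    topology: int                # T: fold family
    homologous_superfamily: int  # H: common-ancestor group


def parse_cath_code(code: str) -> CathCode:
    """Split a dotted C.A.T.H code into its four integer levels."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated levels, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def share_levels(a: CathCode, b: CathCode, depth: int) -> bool:
    """True if two codes agree on the first `depth` levels (1=C ... 4=H)."""
    return a[:depth] == b[:depth]


first = parse_cath_code("3.40.50.300")
second = parse_cath_code("3.40.50.720")
```

Here `share_levels(first, second, 3)` is true (same topology) while `share_levels(first, second, 4)` is false (different homologous superfamily).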

Page 58

Gasteiger’s Biochemical Pathways Database

Database of biochemical pathways that represents chemical structures and reactions on the atomic level
Gives access to each atom and bond of the substrates of enzyme reactions
Allows the study of transition state hypotheses of enzyme reactions
Analysis of the physicochemical effects operating at the reaction site allows a classification of enzyme reactions that goes beyond the traditional EC code for enzymes.
1533 biochemical molecules and 2175 reactions
  http://www2.chemie.uni-erlangen.de/services/biopath/index.html

Page 59

A Gene Expression Database for NCI60 (Scherf)

Published in Nature Genetics, 2000
First study to integrate gene expression with molecular pharmacology databases
Gene expression profiles for NCI60 assessed using microarray technology
Gene-drug relationships investigated by how the gene transcription levels vary with respect to drug activities
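One way to make the gene-drug relationship concrete is a correlation computed across cell lines. The sketch below assumes Pearson correlation (the slide does not name the statistic) and uses made-up values for four cell lines instead of the full NCI60 panel; all names are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length profiles across cell lines."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(gene_expr: dict, drug_activity: dict) -> dict:
    """Gene-by-drug matrix of correlations across the shared cell lines."""
    return {gene: {drug: pearson(expr, act)
                   for drug, act in drug_activity.items()}
            for gene, expr in gene_expr.items()}


# Toy data: one value per cell line, same cell-line order in every profile.
genes = {"gene_1": [1.0, 2.0, 3.0, 4.0]}
drugs = {"drug_X": [2.0, 4.0, 6.0, 8.0], "drug_Y": [8.0, 6.0, 4.0, 2.0]}
matrix = correlation_matrix(genes, drugs)
```

A strongly positive entry suggests expression rises with drug activity across the panel, and a strongly negative entry the opposite.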

Page 60

Correlation Matrix Between Drug Activity and Gene Expression

Page 61

Other Relevant Databases/Servers

Each year Nucleic Acids Research publishes a Database Issue in January and a Web Server Issue in July (See refs in Bibliography section). Examples from the most recent issues:

Databases: KEGG, PDB, PINT, MutDB, GLIDA, DrugBank
Servers: BASys, BRIDGEP, SCRATCH, Glyprot, I2I-SiteEng, PatchDock, SPACE, SymmDock, DeNovoID

Page 62

Overview of the Talk

Review of ACS CINF 2004 Papers
Review of Relevant Articles
Public Chemistry Databases and Data Repositories with Bioinformatics Info/Links
Overview of Web Services
NIH-funded Projects Underway or Planned at Indiana University

Page 63

Web Services Overview

What are “Web Services”?
A distributed invocation system built on Grid computing
  • Independent of platform and programming language
  • Built on existing Web standards
A service oriented architecture with
  • Interfaces based on Internet protocols
  • Messages in XML (except for binary data attachments)
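To illustrate what "messages in XML" means in practice, the sketch below builds a SOAP 1.1 style request with Python's standard library. The envelope namespace is the standard SOAP 1.1 one; the service namespace, the operation name `findSimilar`, and its parameters are hypothetical and not taken from any service described in this talk.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SERVICE_NS = "urn:example:chem-service"                # hypothetical namespace


def build_request(operation: str, params: dict) -> str:
    """Wrap an operation call and its parameters in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        # Each parameter becomes a child element with text content.
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


message = build_request("findSimilar", {"smiles": "c1ccccc1", "cutoff": 0.8})
```

Because the message is plain XML sent over an Internet protocol, the client and the service need not share a platform or programming language, which is the point the slide makes.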

Page 64

Service-Oriented Architecture

From Curcin et al. DDT, 2005, 10(12), 867

Page 65

Web Services for Chemistry: Problems

Performance and scalability
Proprietary data
Competition from high-performance desktop applications
  -- Geoff Hutchison, it’s a puzzle blog, 2005-01-05

ALSO:
Lack of a substantial body of trustworthy Open Access databases
Non-standard chemical data formats (over 40 in regular use and requiring normalization to one another)

Page 66

Overview of the Talk

Review of ACS CINF 2004 Papers
Review of Relevant Articles
Public Chemistry Databases and Data Repositories with Bioinformatics Info/Links
Overview of Web Services
NIH-funded Projects Underway or Planned at Indiana University

Page 67

Indiana University Planned Projects:

http://www.chembiogrid.org
Design of a Grid-based distributed data architecture
Development of tools for HTS data analysis and virtual screening
Database for quantum mechanical simulation data
Chemical prototype projects
  • Novel routes to enzymatic reaction mechanisms
  • Mechanism-based drug design
  • Data-inquiry-based development of new methods in natural product synthesis

Page 68

Web Services for Chemistry at IU

Interaction Layer
  Purpose: Interactive software for creative access and exploitation of information by humans
  Technologies: Microsoft .NET Smart Clients, portlets, Java applets, email and browser clients, visualization technologies
Aggregation Layer
  Purpose: Workflows and data schemas customized for particular domains, applications and users
  Technologies: BPEL, Taverna and other workflow modeling tools, aggregate web services
Web Service Layer
  Purpose: Comprehensive data and computation provision including storage, calculation, semantics and meta-data exposed as web services
  Technologies: Apache web services, SOAP wrappers, WSDL, UDDI, XML, Microsoft .NET

Page 69

NCI Developmental Therapeutics Program (DTP)

Downloadable data:
  • In vitro 60 cell line results
  • In vitro anti-HIV results
  • Yeast assay
  • 200,000+ chemical structures
  • Molecular targets
  • Microarray data
Or search the database at:
  • http://dtp.nci.nih.gov/docs/dtp_search.html

Page 70

IU Database of NIH DTP Data

Contains over 200,000 chemical structures tested in 60 cellular assays from different human tumor cell lines
Also includes microarray assay profiles for the untreated cell lines (~14,000 datapoints)
A local PostgreSQL database containing the data that is exposed as a web service
Using workflows and complex SQL queries, we can do advanced data mining that exploits the chemical, biological and genomic information for particular audiences (chemists, biologists, etc.)
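The kind of SQL query meant here can be sketched against a toy schema. Everything below is an assumption for illustration: the table and column names are hypothetical (the actual IU schema is not shown in these slides), the rows are made up, and SQLite's in-memory engine stands in for PostgreSQL so the example runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the local PostgreSQL database
conn.executescript("""
CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE assay (compound_id INTEGER, cell_line TEXT, activity REAL);
""")
conn.executemany("INSERT INTO compound VALUES (?, ?)",
                 [(1, "cmpd_A"), (2, "cmpd_B")])
conn.executemany("INSERT INTO assay VALUES (?, ?, ?)",
                 [(1, "line_1", 6.5), (1, "line_2", 7.1),
                  (2, "line_1", 4.0), (2, "line_2", 4.4)])


def active_compounds(db: sqlite3.Connection, min_mean: float) -> list:
    """Compounds whose mean activity across cell lines reaches a threshold."""
    return db.execute("""
        SELECT c.name, AVG(a.activity) AS mean_activity
        FROM compound AS c
        JOIN assay AS a ON a.compound_id = c.id
        GROUP BY c.id, c.name
        HAVING AVG(a.activity) >= ?
        ORDER BY mean_activity DESC
    """, (min_mean,)).fetchall()


rows = active_compounds(conn, 5.0)
```

Exposing a function like `active_compounds` behind a web service is one plausible reading of "a local database exposed as a web service"; the slides do not give the actual service interface.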

Page 71

Mining the NIH DTP database

[Figure labels: ~200,000 compounds; 60 cell lines; ~14,000 gene expression values]

Cell lines can be clustered based on gene expression similarity
Compounds can be clustered based on similarity of profile across cell lines, or by chemical structure fingerprint similarity
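Clustering by fingerprint similarity can be sketched as follows. The slide does not name a similarity measure or a clustering method; this example assumes the Tanimoto coefficient on fingerprints stored as sets of "on" bit positions, with a simple leader-style grouping. The fingerprints and compound names are made up.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def leader_cluster(fingerprints: dict, threshold: float) -> list[list[str]]:
    """Assign each compound to the first cluster whose leader is similar enough."""
    clusters = []  # each entry: (leader fingerprint, member names)
    for name, fp in fingerprints.items():
        for leader_fp, members in clusters:
            if tanimoto(fp, leader_fp) >= threshold:
                members.append(name)
                break
        else:
            # No existing leader is close enough, so start a new cluster.
            clusters.append((fp, [name]))
    return [members for _, members in clusters]


fps = {
    "cmpd_A": frozenset({1, 2, 3, 4}),
    "cmpd_B": frozenset({1, 2, 3, 5}),
    "cmpd_C": frozenset({10, 11}),
}
groups = leader_cluster(fps, threshold=0.5)
```

The same grouping function would work on activity profiles across cell lines if `tanimoto` were swapped for a profile similarity such as a correlation.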

Page 72

Use of Taverna at IU

A protein implicated in tumor growth is supplied to the docking program (in this case HSP90 taken from the PDB 1Y4 complex)
The workflow employs our local NIH DTP database service to search 200,000 compounds tested in human tumor cellular assays for similar structures to the ligand.
Client portlets are used to browse these structures
Once docking is complete, the user visualizes the high-scoring docked structures in a portlet using the JMOL applet.
Similar structures are filtered for druggability, and are automatically passed to the OpenEye FRED docking program for docking into the target protein.
A 2D structure is supplied for input into the similarity search (in this case, the extracted bound ligand from the PDB 1Y4 complex)
Correlation of docking results and “biological fingerprints” across the human tumor cell lines can help identify potential mechanisms of action of DTP compounds
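The chaining of steps in this workflow (similarity search, druggability filter, docking, ranking) can be sketched in plain Python. This is not Taverna and not the IU services: every name below is hypothetical, the similarity and druggability values are given as precomputed inputs, and `scorer` stands in for a call to a docking program.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    name: str
    similarity: float                   # similarity to the query ligand, 0..1
    druglike: bool                      # outcome of some druggability filter
    dock_score: Optional[float] = None  # filled in by the docking step


def run_workflow(library: list, cutoff: float,
                 scorer: Callable[[str], float]) -> list:
    """Similarity search, then druggability filter, then docking and ranking."""
    similar = [c for c in library if c.similarity >= cutoff]
    druglike = [c for c in similar if c.druglike]
    for cand in druglike:
        cand.dock_score = scorer(cand.name)  # stand-in for a docking run
    # Lower (more negative) scores are treated as better here.
    return sorted(druglike, key=lambda c: c.dock_score)


library = [
    Candidate("cmpd_A", 0.90, True),
    Candidate("cmpd_B", 0.85, False),
    Candidate("cmpd_C", 0.30, True),
    Candidate("cmpd_D", 0.70, True),
]
fake_scores = {"cmpd_A": -6.0, "cmpd_D": -8.5}  # made-up docking scores
ranked = run_workflow(library, cutoff=0.6, scorer=fake_scores.__getitem__)
```

A workflow tool adds what this sketch leaves out: each step runs as a separate web service, and the wiring between steps is declared rather than hard-coded.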

Page 73

Taverna Workflow

Visual depiction of workflow
Workflow definition
Available web services (WSDL)

Page 74

Taverna in Action

Page 75

Overall Workflow

Page 76

Pre-Closing Quote

“There is not going to be a ‘voila’ moment at the computer terminal. Instead, there is systematic use of wide-ranging computational tools to facilitate and enhance the drug discovery process.”
  Jorgensen. Science, March 19, 2004, 303, 1814.

Page 77

Closing quote

“The future of chemistry depends on the automated analysis of chemical knowledge, combining disparate data sources in a single resource, such as the World-Wide Molecular Matrix, which can be analysed using computational techniques to assess and build on these data.”
  Townsend et al. Org. Biomol. Chem. 2004, 2, 3299.

Page 78

Post-closing quote: zzzzzCAS

“In an industry first, Chemical Abstracts Service (CAS) has unveiled a revolutionary new literature searching tool which will permit scientists to search and retrieve the world’s chemical literature--including patents and obscure technical reports--in their sleep.”
  --Author unknown

Page 79

Acknowledgements

Randy Arnold
Xiao Dong
Sean Mooney
Peter Murray-Rust
David J. Wild
I533 Chemical Informatics Seminar Students
Elsevier Science

Page 80

Bibliography: Articles, Books, and Conference Papers

“The Bigger Picture: Linking Bioinformatics to Cheminformatics” [CINF Symposium] Abstracts [1-16], 227th ACS National Meeting, Anaheim, CA, March 28-April 1, 2004. http://www.acscinf.org/new/docs/meetings/227nm/227cinfabstracts.htm

Austin, C.P. “The completed human genome: implications for chemical biology.” Current Opinion in Chemical Biology 2003, 7, 511-515.

Bajorath, Jürgen, ed. Chemoinformatics: concepts, methods, and tools for drug discovery. Totowa, N.J.: Humana Press, c2004. (Methods in molecular biology; v. 275)

Banville, Debra L. “Mining chemical structural information from the drug literature.” Drug Discovery Today January 2006, 11(1/2), 35-42.

Brown F. “Editorial opinion: chemoinformatics - a ten year update.” Current Opinion in Drug Discovery and Development 2005 May; 8(3): 298-302.

Page 81

Bibliography: Articles (cont’d)

Coles, Simon J.; Day, Nick E.; Murray-Rust, Peter; Rzepa, Henry S.; Zhang, Yong. “Enhancement of the chemical semantic web through InChIfication.” Organic & Biomolecular Chemistry 2005, 3, 1832-1834.

Curcin, Vera; Ghanem, Moustafa; Guo, Yike. “Web services in the life sciences.” Drug Discovery Today 2005, 10(12), 865-871.

Gagna CE, Winokur D, Clark Lambert W. “Cell biology, chemogenomics and chemoproteomics.” Cell Biol Int. 2004; 28(11): 755-64.

Geldenhuys, W.J.; Gaasch, K.E.; Watson, M.; Allen, D.D.; Van Der Schyf, C.J. “Optimizing the use of open-source software applications in drug discovery.” Drug Discovery Today February 2006, 11(3/4), 127-132.

Guha, R.; Howard, M.T.; Hutchison, G.R.; Murray-Rust, P.; Rzepa, H.; Steinbeck, C.; Wegner, J.; Willighagen, E.L. “The Blue Obelisk--Interoperability in chemical informatics.” Journal of Chemical Information and Modeling 2006. Web Release Date: 22-Feb-2006; DOI: 10.1021/ci050400b

Page 82

Bibliography: Articles (cont’d)

Jónsdóttir, S.O.; Jorgensen, F.S.; Brunak, S. “Prediction methods and databases within chemoinformatics: emphasis on drugs and drug candidates.” Bioinformatics 2005 May 15; 21(10): 2145-60.

Jorgensen, William L. “The many roles of computation in drug discovery.” Science March 19, 2004, 303, 1813-1818.

Kauffman, Thom. “Profile.” [interview] LiveWire, March 2005, 7.3; http://pubs.acs.org/4librarians/livewire/2006/7.3/profile.html

Murray-Rust, Peter S.; Mitchell, John B.O.; Rzepa, Henry S. “Communication and re-use of chemical information in bioscience.” BMC Bioinformatics 2005, 6, 180.

Murray-Rust, Peter; Mitchell, John B.O.; Rzepa, Henry S. “Chemistry in bioinformatics.” BMC Bioinformatics 2005, 6, 141-144.

Povolna, Vera; Dixon, Scott; Weininger, David. “Cabinet--Chemical and Biological Informatics NETwork.” in: Oprea, Tudor I., ed. Chemoinformatics in Drug Discovery. Weinheim: Wiley-VCH, 2004, 241-269.

Page 83

Bibliography: Articles (cont’d)

Salamone, Salvatore. “Hip Hop offers lessons on life sciences data integration.” Bio-IT World February 2006, 36.

Scherf, Uwe; Ross, Douglas T.; Waltham, Mark; Smith, Lawrence H.; Lee, Jae K.; Tanabe, Lorraine; Kohn, Kurt W.; Reinhold, William C.; Myers, Timothy G.; Andrews, Darren T.; Scudiero, Dominic A.; Eisen, Michael B.; Sausville, Edward A.; Pommier, Yves; Botstein, David; Brown, Patrick O.; Weinstein, John N. “A gene expression database for the molecular pharmacology of cancer.” Nature Genetics 2000, 24, 236-244.

Souchelnytskyi, S. “Bridging proteomics and systems biology: What are the roads to be traveled?” Proteomics 2005 (November), 5(16), 4123-4137.

Tetko, Igor V. “Computing chemistry on the web.” Drug Discovery Today November 2005, 10(22), 1497-1500.

Page 84

Bibliography: Articles (cont’d)

Zimmermann, Marc; Thi, Le Thuy Bui; Hofmann, Martin. “Combating illiteracy in chemistry: Towards computer-based chemical structure reconstruction.” ERCIM News January 2005, 60, 40-41. http://www.scai.fraunhofer.de/uploads/media/MZ-ERCIM05_04.pdf

Zimmermann, Marc; Fluck, Juliane; Thi, Le Thuy Bui; Kolarik, Corinna; Kumpf, Kai; Hofmann, Martin. “Information extraction in the life sciences: Perspectives for medicinal chemistry, pharmacology and toxicology.” Current Topics in Medicinal Chemistry 2005, 5(8), 785-796.

Page 85

Bibliography: Databases

Andreeva, A.; Howorth, D.; Brenner, S.E.; Hubbard, T.J.P.; Chothia, C.; Murzin, A.G. “SCOP database in 2004: refinements integrate structure and sequence family data.” Nucleic Acids Research 2004, 32, Database issue D226-D229. doi: 10.1093/nar/gkh039

Chen, J.; Swamidass, S.J.; Dou, Y.; Bruand, J.; Baldi, P. “ChemDB: a public database of small molecules and related chemoinformatics resources.” Bioinformatics 2005 Nov 15; 21(22): 4133-9.

Dunkel, M.; Fullbeck, M.; Neumann, S.; Preissner, R. “SuperNatural: a searchable database of available natural compounds.” Nucleic Acids Research 2006, 34, Database issue D678-D683. doi: 10.1093/nar/gkj132

Gold, Nicola D.; Jackson, Richard M. “A searchable database for comparing protein-ligand binding sites for the analysis of structure-function relationships.” Journal of Chemical Information and Modeling 2006, 46(2), 736-742.

Page 86

Bibliography: Databases (cont’d)

Kanehisa, M.; Goto, S.; Hattori, M.; Aoki-Kinoshita, F.; Itoh, M.; Kawashima, S.; Katayama, T.; Araki, M.; Hirakawa, M. “From genomics to chemical genomics: new developments in KEGG.” Nucleic Acids Research 2006, 34, Database issue D354-D357. doi: 10.1093/nar/gkj102

Kellenberger, Esther; Muller, Pascal; Schalon, Claire; Bret, Guillaume; Foata, Nicolas; Rognan, Didier. “sc-PDB: An annotated database of druggable binding sites from the Protein Data Bank.” Journal of Chemical Information and Modeling 2006, 46(2), 717-727.

Irwin, J.J.; Shoichet, B.K. “ZINC - A free database of commercially available compounds for virtual screening.” Journal of Chemical Information and Modeling 2005, 45, 177-182.

Kouranov, A.; Xie, L.; de la Cruz, J.; Chen, L.; Westbrook, J.; Bourne, P.E.; Berman, H.M. “The RCSB PDB information portal for structural genomics.” Nucleic Acids Research 2006, 34, Database issue D302-D305. doi: 10.1093/nar/gkj120

Kumar, M.D.S.; Gromiha, M.M. “PINT: Protein-protein interactions thermodynamic database.” Nucleic Acids Research 2006, 34, Database issue D195-D198. doi: 10.1093/nar/gkj017

Page 87

Bibliography: Databases (cont’d)

Lo Conte, L.; Brenner, S.E.; Hubbard, T.J.P.; Chothia, C.; Murzin, A.G. “SCOP database in 2002: refinements accommodate structural genomics.” Nucleic Acids Research 2002, 30(1): 264-267.

Murzin, A.G.; Brenner, S.E.; Hubbard, T.; Chothia, C. “SCOP: A structural classification of proteins database for the investigation of sequences and structures.” Journal of Molecular Biology 1995, 247, 536-540.

Okuno, Y.; Yang, J.; Taneishi, K.; Yabuuchi, H.; Tsujimoto, G. “GLIDA: GPCR-ligand database for chemical genomic drug discovery.” Nucleic Acids Research 2006, 34, Database issue D673-D677. doi: 10.1093/nar/gkj028

Pearl, F.; Todd, A.; Sillitoe, I.; Dibley, M.; Redfern, O.; Lewis, T.; Bennett, C.; Marsden, R.; Grant, A.; Lee, D.; Akpor, A.; Maibaum, M.; Harrison, A.; Dallman, T.; Reeves, G.; Diboun, I.; Addou, S.; Lise, S.; Johnston, C.; Sillero, A.; Thornton, J.; Orengo, C. “The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis.” Nucleic Acids Research 2005, 33, Database issue D247-D251.

Page 88

Bibliography: Databases (cont’d)

Wheeler, D.L. et al. “Database resources of the National Center for Biotechnology Information.” Nucleic Acids Research 2006, 34, Database issue D173-D180. doi: 10.1093/nar/gkj158

Wishart, D.S.; Knox, C.; Guo, A.C.; Shrivastava, S.; Hassanali, M.; Stothard, P.; Chang, Z.; Woolsey, J. “DrugBank: a comprehensive resource for in silico drug discovery and exploration.” Nucleic Acids Research 2006, 34, Database issue D668-D672.

Page 89

Biotech Validation Suite for Protein Structures

- Send the server a PDB file
- Server provides a comprehensive check of the protein, including:
  - Atomic volume analysis
  - Full geometric analysis
  - NMR restraint data

http://biotech.ebi.ac.uk:8400/

Page 90

Knowledge-Driven Bioinformatics Enhanced with Chemistry

Page 91

ToxTree

- An in silico toxicology prediction suite
- Based on the CDK toolkit
- Built on CML
- Released as OpenSource under the GPL
- Standalone PC software
- User Manual: http://ecb.jrc.it/DOCUMENTS/QSAR/TOXTREE/toxTree_user_manual.pdf

Page 92

Tools for Genomic and Proteomic Scientists vis-à-vis Cell Biology

(Gagna et al.)

- Tools to fully exploit the techniques in cellular biology
  - Light microscopy for high resolution images
  - Fractionation of cells into basic components via ultracentrifugation
  - Analysis of individual cells through flow cytometry
  - LCM, normal and diseased TMAs (tissue microarrays), quantitative computer image analysis, cell micromanipulation, and high-throughput microscopy

Page 93

InChI Generation on the Web

The following websites provide the facility to generate InChIs:

- www.acdlabs.com/download/chemsk.html
  ACD/Labs' freely available structure-drawing program ChemSketch includes the facility to generate InChIs from drawn structures.
- pubchem.ncbi.nlm.nih.gov/edit/
  PubChem Server Side Structure Editor v1.8 includes a facility for generating InChIs as you draw the structure.

Page 94

Advances in Macromolecular Crystallography by CCG

- More protein structures available now
- Use of 3D info in bioinformatics makes functional inferences more dependable
- CCG Structural Family Database distributed with MOE
  - Includes fold detection methodology to ID structurally similar proteins
  - Simultaneous sequence and structural alignment of large collections of proteins
  - 3D structural family analysis for insight into conserved geometry, water molecules, salt bridges, hydrogen bonds, hydrophobic contacts, and disulfide bonds

Page 95

CCG’s Cheminformatics Offerings

- MOE Molecular Database
- Molecular Descriptors calculated and used for classification, clustering, filtering, and predictive model construction
- QSAR/QSPR Predictive Modeling
- Diversity and Similarity Searching
- High Throughput Conformational Search
- 3D Pharmacophore Search
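To make the "Diversity and Similarity Searching" item concrete, here is a minimal sketch of a similarity search that ranks molecules by Tanimoto coefficient over fingerprint bit sets. It is a generic illustration, not CCG's or MOE's implementation; the function names, molecule names, and toy fingerprints are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# A fingerprint is modelled as the set of bit positions that are "on".
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either one."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def similarity_search(
    query: Fingerprint,
    database: Dict[str, Fingerprint],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy data: real fingerprints would come from a descriptor tool.
database = {
    "mol_a": frozenset({1, 2, 3, 4}),
    "mol_b": frozenset({1, 2, 7}),
    "mol_c": frozenset({8, 9}),
}
query = frozenset({1, 2, 3})
hits = similarity_search(query, database, threshold=0.3)
print(hits)
```

A diversity selection can reuse the same coefficient by preferring molecules whose best score against the already-selected set is low.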

Page 96

Components of the Semantic Web for Chemistry

- XML – eXtensible Markup Language
- RDF – Resource Description Framework
- RSS – Rich Site Summary
- Dublin Core – allows metadata-based newsfeeds
- OWL – for ontologies
- BPEL4WS – for workflow and web services

Murray-Rust et al. Org. Biomol. Chem. 2004, 2, 3192-3203.
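As a rough illustration of how XML, RDF, RSS, and Dublin Core can combine into a metadata-based chemistry newsfeed, the sketch below builds a single RSS 1.0 style item that carries an InChI in a Dublin Core identifier field. It is an assumption about how such a feed might look, not the format described by Murray-Rust et al.; the URL, creator, and function name are hypothetical, and a complete RSS 1.0 document would also need a channel element.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF, RSS 1.0, and Dublin Core elements.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
DC = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("rdf", RDF)
ET.register_namespace("rss", RSS)
ET.register_namespace("dc", DC)


def build_feed(entries):
    """Serialize compound entries as RSS 1.0 style items with DC metadata."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for entry in entries:
        item = ET.SubElement(
            root, f"{{{RSS}}}item", {f"{{{RDF}}}about": entry["url"]}
        )
        ET.SubElement(item, f"{{{RSS}}}title").text = entry["title"]
        ET.SubElement(item, f"{{{RSS}}}link").text = entry["url"]
        ET.SubElement(item, f"{{{DC}}}creator").text = entry["creator"]
        # The InChI acts as a machine-readable identifier for the compound.
        ET.SubElement(item, f"{{{DC}}}identifier").text = entry["inchi"]
    return ET.tostring(root, encoding="unicode")


# Hypothetical entry; only the InChI string for methane is real chemistry.
feed_xml = build_feed([
    {
        "title": "Methane",
        "url": "http://example.org/compounds/methane",
        "creator": "Example Lab",
        "inchi": "InChI=1S/CH4/h1H4",
    }
])
print(feed_xml)
```

Because the identifier sits in a namespaced element, a feed reader that understands Dublin Core can pick out the structure without parsing the human-readable title.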

Page 97

Web Services Integration Projects: Biosciences

- myGrid
  http://www.mygrid.org.uk/
- BIOPIPE
  http://biopipe.org/
- BioMOBY
  http://biomoby.org/

Page 98

BIOT 2006

Major themes, areas and suggested topics include

- Bio-molecular and Phylogenetic Databases
- Molecular Evolution and Phylogenetic analysis
- Drug Delivery Systems
- Bio-Ontology and Data Mining
- Sequence Search and Alignment
- Microarray Analysis
- System Biology
- Pathway analysis
- Identification and Classification of Genes
- Protein Structure Prediction and Molecular Simulation
- Functional Genomics
- Proteomics
- Tertiary structure prediction
- Drug Docking
- Gene Expression Analysis
- Biomedical Imaging

Page 99

Proteomics: What is it?

Proteomics is the study of protein expression, regulation, modification, and function in living systems for understanding how living systems use proteins. Using a variety of techniques, proteomics can be used to study how proteins interact within a system, or how proteins change due to applied stresses.

Requires advanced measurement techniques, especially separations and mass spectrometry

Page 100

Proteomics Needs Informatics for:

- Locating peaks in 2 or more dimensions
- MS/MS spectra interpretation
- Protein/Peptide quantification
- Peptide detectability
- Experimental data
- Biological information
  - enzyme or pathway regulation
  - disease susceptibility
  - drug efficacy
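The first item, locating peaks in two dimensions, can be sketched as a search for local maxima in an intensity grid. This is a deliberately simple illustration, assuming rows are retention-time bins and columns are m/z bins; the function name, threshold, and toy data are hypothetical, and real proteomics software uses far more robust methods with smoothing and noise modelling.

```python
from typing import List, Sequence, Tuple


def find_peaks_2d(
    grid: Sequence[Sequence[float]], min_intensity: float = 0.0
) -> List[Tuple[int, int, float]]:
    """Return (row, col, value) for cells strictly above all their neighbours."""
    peaks = []
    n_rows = len(grid)
    for r in range(n_rows):
        for c in range(len(grid[r])):
            value = grid[r][c]
            if value < min_intensity:
                continue  # ignore low-intensity cells
            is_peak = True
            # Compare against the up-to-8 surrounding cells.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < len(grid[rr]):
                        if grid[rr][cc] >= value:
                            is_peak = False
            if is_peak:
                peaks.append((r, c, value))
    return peaks


# Hypothetical toy map: rows are retention-time bins, columns are m/z bins.
intensities = [
    [0, 1, 0, 0, 0],
    [1, 9, 2, 0, 0],
    [0, 2, 1, 3, 1],
    [0, 0, 3, 7, 2],
    [0, 0, 1, 2, 0],
]
peaks = find_peaks_2d(intensities, min_intensity=5)
print(peaks)
```

The same neighbour test extends to three or more dimensions by adding further offset loops, at the cost of more comparisons per cell.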

