Overview of Chemical Informatics and Cyberinfrastructure Collaboratory
Aug 16 2006
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University Bloomington IN
[email protected]
http://www.infomall.org
http://www.chembiogrid.org
Capabilities
Local Teams, successful Prototypes and International Collaboration set up in 3 initial major focus areas
• Chemical Informatics Cyberinfrastructure/Grids with services, workflows and demonstration uses building on success in other applications (LEAD) and showing distributed integration of academic and commercial tools
• Computational Chemistry Cyberinfrastructure/Grids with simulation, databases and TeraGrid use
• Education with courses and degrees
Review of activities suggests we also formalize work in two further areas
• Chemical Informatics Research – model applicability
• Interfacing with the User – bench chemist-friendly portal
Current Status
Web site http://www.chembiogrid.org
Wiki chosen to support project as a shared editable web space
Building Collaboratory involving PubChem – Global Information System accessible anywhere and at any time – enhance PubChem with distributed tools (clustering, simulation, annotation etc.) and data
Adopted Taverna as workflow system because it is popular in Bioinformatics, but we will evaluate other systems such as GPEL from LEAD
Preparing large set of runs on local Big Red 23 Teraflop supercomputer (OSCAR3 CDK Mopac)
Initial results discussed at conferences/workshops/papers
• Gordon Conferences, ACS, SDSC tutorial
First new Cheminformatics courses offered
Advisory board set up and met
Videoconferencing-based meetings with Peter Murray-Rust and group at Cambridge roughly every 2-3 weeks
Good or potentially good interactions with NIH DTP, Scripps, Lilly and Michigan ECCR
CICC Senior Personnel
Geoffrey C. Fox
Mu-Hyun (Mookie) Baik
Dennis B. Gannon
Marlon Pierce
Beth A. Plale
Gary D. Wiggins
David J. Wild
Yuqing (Melanie) Wu
Peter T. Cherbas
Mehmet M. Dalkilic
Charles H. Davis
A. Keith Dunker
Kelsey M. Forsythe
Kevin E. Gilbert
John C. Huffman
Malika Mahoui
Daniel J. Mindiola
Santiago D. Schnell
William Scott
Craig A. Stewart
David R. Williams
From Biology, Chemistry, Computer Science, Informatics at IU Bloomington and IUPUI (Indianapolis)
CICC Advisory Board
Alan D. Palkowitz (Eli Lilly)
Chris Peterson (Kalypsys)
David Spellmeyer (IBM)
Dimitris K. Agrafiotis (Johnson & Johnson)
Horst Hemmerle (Eli Lilly)
James M. Caruthers (Purdue University)
Jeremy G. Frey (University of Southampton)
Joel Saltz (Ohio State University/University of Maryland/Johns Hopkins University)
John M. Barnard (Digital Chemistry)
John Reynders (Eli Lilly)
Peter Murray-Rust (University of Cambridge)
Peter Willett (University of Sheffield)
Thompson Doman (Eli Lilly)
Val Gillet (University of Sheffield)
Industry and Academia
Met October 2005; will meet this fall
CICC Combines Grid Computing with Chemical Informatics
CICC: Chemical Informatics and Cyberinfrastructure Collaboratory
Funded by the National Institutes of Health
www.chembiogrid.org
Indiana University Department of Chemistry, School of Informatics, and Pervasive Technology Laboratories
Science and Cyberinfrastructure
Large Scale Computing Challenges
Chemical Informatics is a non-traditional area of high performance computing, but it offers many new, challenging problems to investigate.
CICC is an NIH-funded project to support the chemical informatics needs of High Throughput Cancer Screening Centers. The NIH is creating a data deluge of publicly available data on potential new drugs.
CICC supports the NIH mission by combining state-of-the-art chemical informatics techniques with
• World-class high performance computing
• National-scale computing resources (TeraGrid)
• Internet-standard web services
• International activities for service orchestration
• Open distributed computing infrastructure for scientists worldwide
Figure: CICC workflow diagram. Boxes: NIH PubMed Database; OSCAR Text Analysis; POVRay Parallel Rendering; Initial 3D Structure Calculation; Toxicity Filtering; Cluster Grouping; Docking; Molecular Mechanics Calculations; Quantum Mechanics Calculations; IU’s Varuna Database; NIH PubChem Database.
Chemical informatics text analysis programs can process hundreds of thousands of abstracts of online journal articles to extract chemical signatures of potential drugs.
OSCAR-mined molecular signatures can be clustered, filtered for toxicity, and docked onto larger proteins. These are classic “pleasingly parallel” tasks. Top-ranking docked molecules can be further examined for drug potential.
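The filter, dock and rank steps described above can be sketched as follows. This is only an illustration of why the work is "pleasingly parallel": each molecule is scored independently of the others, so the scoring can be farmed out to any number of workers. The functions `is_toxic` and `dock_score` are hypothetical toy stand-ins, not the ToxTree or OpenEye tools used by the project, and the clustering step is omitted for brevity.

```python
from concurrent.futures import ThreadPoolExecutor


def is_toxic(molecule: str) -> bool:
    """Hypothetical toy rule: flag identifiers that contain arsenic ('As')."""
    return "As" in molecule


def dock_score(molecule: str) -> float:
    """Hypothetical toy score (identifier length); a real docking run goes here."""
    return float(len(molecule))


def screen(molecules, top_n=2, workers=4):
    """Drop toxic molecules, score the rest independently, return the top_n."""
    candidates = [m for m in molecules if not is_toxic(m)]
    # Each call to dock_score depends only on its own molecule, so the
    # calls can run concurrently without any communication between them.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(dock_score, candidates))
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    print(screen(["CCO", "c1ccccc1", "C[As](C)C", "CC(=O)O"]))
```

On a cluster or grid the thread pool would be replaced by a batch system, but the structure of the computation stays the same.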
Big Red (and the TeraGrid) will also enable us to perform time-consuming, multi-step Quantum Chemistry calculations on all of PubMed. Results go back to public databases that are freely accessible to the scientific community.
CICC Prototype Web Services
Basic cheminformatics
• Molecular weights
• Molecular formulae
• Tanimoto similarity
• 2D Structure diagrams
• Molecular descriptors
• 3D structures
• InChI generation/search
• CMLRSS
Application based services
• Compare (NIH)
• Toxicity predictions (ToxTree)
• Literature extraction (OSCAR3)
• Clustering (BCI Toolkit)
• Docking, filtering, ... (OpenEye)
• Varuna simulation
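As an illustration of one of the basic services listed above, Tanimoto similarity between two molecular fingerprints can be sketched in a few lines. The sketch assumes each fingerprint is represented as the set of its "on" bit positions; that representation, the function name `tanimoto`, and the convention of returning 0.0 for two empty fingerprints are choices made here, not details of the CICC service.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    if not fp_a and not fp_b:
        # Convention chosen for this sketch: two empty fingerprints score 0.0.
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


if __name__ == "__main__":
    # Hypothetical fingerprints given as sets of "on" bit positions.
    print(tanimoto({1, 2, 3, 4}, {3, 4, 5}))
```

For the two fingerprints in the usage example, 2 bits are shared and 5 distinct bits are set overall, so the coefficient is 2/5.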
Next steps?
Define WSDL interfaces to enable global production of compatible Web services; refine CML
Look at Pipeline Pilot
Extend Computational Chemistry (Varuna) Services
Routine TeraGrid Big Red use
Ready to try “Prototype Production” on OSCAR3 CDK Mopac
Develop more training material
Link to screening center via Scripps
Key Ideas
Add value to PubChem with additional distributed services and databases
Wrapping existing code in web services is not difficult
Provide “core” (CDK) services and exemplars of typical tools
Provide access to key databases via a web service interface
Provide access to major Compute Grids
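To make the point about wrapping existing code concrete, here is a minimal sketch that exposes an existing function over HTTP. It is not the project's implementation: the slides refer to WSDL-described services, whereas this sketch uses plain HTTP with JSON from the Python standard library only to stay short. The function `molecular_weight`, its four-element mass table, the `/weight?formula=...` path and the port are all assumptions made for illustration.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Standard atomic weights for a few elements; a real service would cover them all.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molecular_weight(formula: str) -> float:
    """The 'existing code': weight of a simple formula such as 'C2H6O'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError("not a simple molecular formula")
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


def handle_query(url: str):
    """Turn a request URL into (HTTP status, JSON-ready payload)."""
    formula = parse_qs(urlparse(url).query).get("formula", [""])[0]
    try:
        weight = round(molecular_weight(formula), 3)
    except (ValueError, KeyError):
        return 400, {"error": "missing, malformed or unsupported formula"}
    return 200, {"formula": formula, "weight": weight}


class WeightHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper; all chemistry stays in molecular_weight."""

    def do_GET(self):
        status, payload = handle_query(self.path)
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080):
    """Start the service (hypothetical port); blocks until interrupted."""
    HTTPServer(("localhost", port), WeightHandler).serve_forever()
```

Calling `serve()` and then requesting `/weight?formula=C2H6O` would return the weight of ethanol as JSON. The wrapper adds only request parsing and serialization, which is why the original code does not need to change.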
Varuna environment for molecular modeling (Baik, IU)
Figure: Varuna architecture diagram. Labels: Researcher; Chemical Concepts; Experiments; Papers etc.; Simulation Service (FORTRAN Code, Scripts); QM Database; QM/MM Database; Reaction DB; DB Service (Queries, Clustering, Curation, etc.); PubChem, PDB, NCI, etc.; ChemBioGrid; Condor “Flocks”; TeraGrid Supercomputers.