www.healthgrid.org
World-wide in silico drug discovery against neglected and emerging diseases on grid infrastructures
Dr Nicolas Jacq, HealthGrid association
Credit: the WISDOM collaboration, http://wisdom.healthgrid.org
International Symposium on Grids for Science and Business, 12 June 2007
Jacq - 12 June 2007 2
The HealthGrid association
• The vision of HealthGrid is the deployment of e-infrastructures able to interoperate geographically distributed repositories of health-related data and the integration of high-end processing services on top of them.
• Some key aspects are:
– The integration of health-related actors in grid projects
– The integration of grid standards and medical informatics standards for interoperability
– The deployment of pilots for new ways of research and new methods
– The integration of bioinformatics community and medical informatics
• The mission of HealthGrid is to foster the communication among the different key actors and to catalyse joint research actions at international level
Main achievements
• Edition of the HealthGrid Whitepaper in 2005 outlining the concept, benefits and opportunities offered by applying grids in different applications in biomedicine and healthcare
– http://whitepaper.healthgrid.org
• Involvement as full partner in several projects
– SHARE (SSA): http://www.eu-share.org
– EGEE II (I3): http://www.eu-egee.org
– ACGT (IP): http://www.eu-acgt.org
• Organisation of the HealthGrid conference since 2003
– HealthGrid.US Alliance will host the 6th International HealthGrid Conference in Chicago, Spring 2008
• Development of the health grids knowledge base
– http://kb.healthgrid.org
Content
• WISDOM, an initiative for grid-enabled drug discovery against neglected and emerging diseases
• Deployment and results of grid-enabled large scale virtual screening against malaria and avian influenza
• Deployment method
• Conclusion and perspectives
Goal of the WISDOM initiative
• WISDOM stands for World-wide In Silico Docking On Malaria
• Goal: contribute to the development of new drugs for neglected and emerging diseases, with a particular focus on malaria and avian flu
• Specificity: rely extensively on emerging information technologies to provide new tools and environments for drug discovery
• Initial focus: virtual screening
• Web site: http://wisdom.healthgrid.org
WISDOM collaboration
7 partners, 4 associated laboratories providing targets and/or in vitro facilities
• Univ. Los Andes: Bioinformatics, Malaria biology
• LPC Clermont-Ferrand: Biomedical grid, Web service
• SCAI Fraunhofer: Knowledge extraction, Chemoinformatics
• Univ. Modena: Malaria biology, Molecular Dynamics
• ITB CNR: Bioinformatics, Molecular modelling
• Univ. Pretoria: Bioinformatics, Malaria biology
• Academia Sinica: Grid user interface, Avian flu biology, In vitro testing
• HealthGrid: Biomedical grid, Dissemination
• CEA, Acamba project: Malaria biology, Chemogenomics
• Chonnam Nat. Univ.: In vitro testing
• Mahidol Univ. Bangkok: In vitro testing
Figure legend: Partners, Associated labs, New
Benefits from using the grid (1/2)
• World-wide distribution of malaria resistance
• 1975-2004: only 21 of the 1,556 new drugs marketed were for tropical diseases (Chirac P., Torreele E., Lancet, May 2006)
• Neglected diseases still suffer from a lack of R&D
• Grids allow reduced costs
Benefits from using the grid (2/2)
• The H5N1 virus has the potential to cause a large-scale pandemic
• H5N1 may mutate and acquire drug resistance
• Time is a critical factor for handling emerging diseases
• Grids provide an accelerating factor
Figure: deaths from all causes each week, expressed as an annual rate per 1000 (x-axis: months). Source: Ross E.G. Upshur, BA(HONS), MA, MD, MSc, CCFP, FRCPC
In silico drug discovery
• Problem: development of a drug takes 12 to 15 years and costs approximately 800 million dollars
Target discovery: Target Identification → Target Validation
Lead discovery: Lead Identification → Lead Optimization
Clinical Phases (I-III)
Grid impact on drug discovery workflow down to drug delivery (1/2)
• Grids provide the necessary tools and data to identify new biological targets
– Bioinformatics services (database replication, workflow…)
– Resources for CPU intensive tasks (genomics comparative analysis, inverse docking…)
• Grids provide the resources to speed up lead discovery
– Large scale in silico docking to identify potentially promising compounds
– Molecular dynamics computations to refine virtual screening and further assess selected compounds
• Grids offer promising prospects for collaboration between public and private partners
– Platform for information and knowledge sharing
Grid impact on drug discovery workflow down to drug delivery (2/2)
• Grids provide environments for epidemiology
– Federation of databases to collect data in endemic areas to study a disease and to evaluate the impact of vaccines and vector control measures
– Resources for data analysis and mathematical modelling
• Grids provide the services needed for clinical trials
– Federation of databases to collect data in the centres participating in the clinical trials
• Grids provide the tools to monitor drug delivery
– Federation of databases to monitor drug delivery
Virtual screening by docking
Docking: predict how small molecules bind to a receptor of known 3D structure
Compound database + Target structure model → DOCKING → Predicted binding models → Post-analysis → Compounds for assay
Grid-enabled high throughput virtual screening by docking
Millions of potential drugs to test against interesting proteins!
• Targets: PDB (3D structures)
• Compounds: ZINC (4.3M), Chembridge (500,000)
• High Throughput Screening: 1-10$/compound, several hours. Too costly for neglected disease!
• Molecular docking (FlexX, Autodock): ~1 to 15 minutes. Cheap and fast!
• Data challenge on EGEE: ~2 to 30 days on ~5,000 computers
Selection of the best hits → Hits screening using assays performed on living cells → Leads → Clinical testing → Drug
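The orders of magnitude on this slide can be checked with a short calculation. The sketch below assumes one docking per compound against a single target, perfect parallel efficiency, and a 365-day year; the function names are illustrative and not part of the WISDOM environment.

```python
MINUTES_PER_DAY = 60 * 24
MINUTES_PER_YEAR = MINUTES_PER_DAY * 365  # assumption: 365-day CPU year


def cpu_years(n_compounds, minutes_per_docking):
    """Total CPU time, in CPU years, to dock every compound once."""
    return n_compounds * minutes_per_docking / MINUTES_PER_YEAR


def wall_clock_days(n_compounds, minutes_per_docking, n_cpus):
    """Elapsed days if the dockings are spread evenly over n_cpus (ideal case)."""
    return n_compounds * minutes_per_docking / n_cpus / MINUTES_PER_DAY


if __name__ == "__main__":
    zinc = 4_300_000  # ZINC library size quoted on the slide
    for minutes in (1, 15):  # docking time range quoted on the slide
        print(
            f"{minutes:>2} min/docking: "
            f"{cpu_years(zinc, minutes):6.1f} CPU years, "
            f"{wall_clock_days(zinc, minutes, 5000):5.2f} days on 5,000 CPUs"
        )
```

Under these idealised assumptions, one target against the ZINC library costs between roughly 8 and 123 CPU years, which is why several targets and parameter sets quickly require weeks on thousands of computers.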
Statistics of deployment
• First Data Challenge: July 1st - August 15th 2005
– Target: malaria
– 80 CPU years, 1 TB of data produced, 1,700 CPUs used in parallel
– 1st large scale docking deployment world-wide on an e-infrastructure
• Second Data Challenge: April 15th - June 30th 2006
– Target: avian flu
– 100 CPU years, 800 GB of data produced, 1,700 CPUs used in parallel
– Collaboration initiated on March 1st: deployment preparation achieved in 45 days
• Third Data Challenge: October 1st - December 15th 2006
– Target: malaria
– 400 CPU years, 1.6 TB of data produced, up to 5,000 CPUs used in parallel
– Very high docking throughput: > 100,000 compounds per hour
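To relate the CPU years to the peak CPU counts, the average number of CPUs kept busy can be derived from the dates above. This is a minimal sketch, assuming a 365-day CPU year and that the quoted dates bound the whole computation; the second function additionally assumes the peak throughput was reached at the peak of 5,000 CPUs.

```python
from datetime import date


def average_busy_cpus(cpu_years, start, end):
    """Average number of CPUs continuously busy between start and end."""
    elapsed_days = (end - start).days
    return cpu_years * 365 / elapsed_days


def cpu_minutes_per_docking(n_cpus, compounds_per_hour):
    """Mean CPU minutes per docked compound implied by a throughput figure."""
    return n_cpus * 60 / compounds_per_hour


if __name__ == "__main__":
    challenges = {
        "1st (malaria)": (80, date(2005, 7, 1), date(2005, 8, 15)),
        "2nd (avian flu)": (100, date(2006, 4, 15), date(2006, 6, 30)),
        "3rd (malaria)": (400, date(2006, 10, 1), date(2006, 12, 15)),
    }
    for name, (years, start, end) in challenges.items():
        print(name, round(average_busy_cpus(years, start, end)), "CPUs on average")
    print(cpu_minutes_per_docking(5000, 100_000), "CPU minutes per docking at peak")
```

The averages come out well below the peak figures, which is expected on a shared infrastructure where resources are not available continuously.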
A huge international effort for the third data challenge
Figure (pie chart): share of the computation per grid infrastructure. Slice values: 1%, 2%, 2%, 3%, 3%, 3%, 3%, 5%, 6%, 7%, 12%, 15%, 38%. Legend: EGEE Germany Switzerland, EGEE Asia Pacific, EGEE Russia, Auvergrid, EuChinaGrid, EELA, EGEE South Western Europe, EGEE Central Europe, EGEE Northern Europe, EGEE Italy, EGEE South Eastern Europe, EGEE France, EGEE UKI
Over 420 CPU years in 10 weeks
A record throughput of 100,000 docked compounds per hour
WISDOM calculations used FlexX from BioSolveIT (6k free, floating licenses)
Biological objectives
• Malaria
– Plasmepsin
– DHFR Plasmodium falciparum
– DHFR Plasmodium vivax
– GST
– Tubulin
• Avian influenza
– Neuraminidase N1
Figure labels: N1, H5. Credit: Y-T Wu (ASGC)
Results from avian flu data challenge (1/2)
• 5 out of 6 known effective inhibitors can be identified in the first 15% of the ranking and in the first 5% reranked (2,250 compounds)
– Enrichment: (5/6)/(15% x 5%) = 111 (<1 in most cases)
• Most known effective inhibitors lose their affinity in binding with a mutated target
Figure: ranking position of GNA (zanamivir) against the original type and the E119A mutated type, with a 15% cut-off. Values shown: GNA 2.4%, GNA 11.5%
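The enrichment figure quoted on this slide can be reproduced directly. The sketch below uses the usual definition of an enrichment factor (fraction of known actives recovered divided by fraction of the library selected); the function name is illustrative, and the library size is not stated on the slide but inferred from the 2,250 compounds quoted.

```python
def enrichment_factor(actives_found, actives_total, fraction_selected):
    """Fraction of known actives recovered, relative to the fraction of the library kept."""
    return (actives_found / actives_total) / fraction_selected


if __name__ == "__main__":
    # First 15% of the ranking, then the first 5% after reranking.
    fraction_selected = 0.15 * 0.05
    ef = enrichment_factor(5, 6, fraction_selected)
    print(f"enrichment factor: {ef:.0f}")

    # Inferred, not stated on the slide: library size implied by 2,250 compounds.
    print(f"implied library size: {2250 / fraction_selected:.0f} compounds")
```

A random selection of the same size would give an enrichment factor of about 1, so a value above 100 means the known inhibitors are concentrated very strongly at the top of the ranking.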
Results from avian flu data challenge (2/2)
• Experimental assay confirms 7 actives out of 123 purchased “potential hits” (interacting complexes with higher affinities and proper docked poses) = 6%
• Average success rate of in vitro testing = 0.1%
• To be confirmed on more hits; tests are running at Univ. of Chonnam (South Korea)
Figure label: NA
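The two success rates on this slide can be compared with a one-line ratio. This is only an arithmetic sketch on the numbers quoted above; the function name is illustrative.

```python
def hit_rate(actives, tested):
    """Fraction of tested compounds confirmed active."""
    return actives / tested


if __name__ == "__main__":
    docking_selected = hit_rate(7, 123)  # compounds selected by virtual screening
    baseline = 0.001                     # average in vitro success rate quoted (0.1%)
    print(f"hit rate after virtual screening: {docking_selected:.1%}")
    print(f"improvement over baseline: about {docking_selected / baseline:.0f}x")
```

With only 123 compounds tested, the ratio is indicative rather than precise, which matches the slide's note that the result has to be confirmed on more hits.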
Results from first malaria data challenge
1,000,000 chemical compounds
→ Sorting based on scoring in different parameter sets; consensus scoring
10,000 compounds selected
→ Based on key interactions, binding modes, etc.
1,000 compounds
→ MD
100 compounds will be tested in July by Univ. of Chonnam (South Korea)
Credit: V. Kasam, Fraunhofer Institute
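The first filtering step combines scores obtained with different parameter sets. The slide does not say which consensus scheme was used; the sketch below illustrates one common option, rank averaging, under the assumptions that lower docking scores are better and that every compound was scored in every parameter set. All names and scores are hypothetical.

```python
def consensus_top(score_sets, top_n):
    """Rank-sum consensus over several {compound_id: score} dicts.

    Assumption: lower score means better predicted binding.
    Ties are broken by compound id so the result is deterministic.
    """
    # Keep only compounds scored in every parameter set.
    compounds = set(score_sets[0])
    for scores in score_sets[1:]:
        compounds &= set(scores)

    rank_sum = {c: 0 for c in compounds}
    for scores in score_sets:
        ordered = sorted(compounds, key=lambda c: (scores[c], c))
        for rank, compound in enumerate(ordered, start=1):
            rank_sum[compound] += rank

    return sorted(compounds, key=lambda c: (rank_sum[c], c))[:top_n]


if __name__ == "__main__":
    # Hypothetical scores for four compounds under two parameter sets.
    set_a = {"c1": -30.0, "c2": -25.0, "c3": -10.0, "c4": -5.0}
    set_b = {"c1": -20.0, "c2": -28.0, "c3": -12.0, "c4": -15.0}
    print(consensus_top([set_a, set_b], top_n=2))
```

Working on ranks rather than raw scores avoids having to normalise scores that come from different parameter sets.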
Requirements for a deployment on grid
• Adaptation of the application to the grid
• Access to a large infrastructure providing maintained resources
• Use of a production system providing automated and fault-tolerant job and file management
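The third requirement, automated and fault-tolerant job management, can be illustrated with a resubmission loop. This is a minimal sketch and not the WISDOM production environment: `submit` stands for any callable that runs one task and raises `RuntimeError` on failure, and `make_flaky_submit` is a hypothetical stand-in that simulates grid failures.

```python
def run_with_resubmission(tasks, submit, max_attempts=3):
    """Run each task, resubmitting failed ones up to max_attempts times.

    Returns (results, failed): outputs by task, and tasks that never succeeded.
    """
    results, failed = {}, []
    for task in tasks:
        for attempt in range(1, max_attempts + 1):
            try:
                results[task] = submit(task)
                break
            except RuntimeError:
                if attempt == max_attempts:
                    failed.append(task)
    return results, failed


def make_flaky_submit(failures_per_task):
    """Hypothetical submitter: each task fails a fixed number of times, then succeeds."""
    attempts = {}

    def submit(task):
        attempts[task] = attempts.get(task, 0) + 1
        if attempts[task] <= failures_per_task:
            raise RuntimeError("simulated grid failure")
        return f"output-{task}"

    return submit


if __name__ == "__main__":
    results, failed = run_with_resubmission(["subset-0", "subset-1"], make_flaky_submit(1))
    print(results, failed)
```

Keeping the list of tasks that exhausted their attempts lets an operator inspect or reschedule them instead of losing part of the screening silently.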
Adaptation of the application to the grid
• The application codes cannot be modified and are not designed for grid computing
• A common strategy is to split the application into shorter tasks
• License management for commercial software is not adapted to large infrastructures
Figure: the compound database (DB) is split into DB subsets; each instance of the docking software takes input data, parameters and one DB subset, and produces its own output
Embarrassingly parallel application
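The splitting strategy can be sketched as follows: the docking code is left untouched and only its input database is cut into subsets, so that each (target, subset) pair becomes an independent job. Function and field names are illustrative assumptions, not the WISDOM implementation.

```python
def split_into_subsets(compound_ids, subset_size):
    """Cut the compound list into consecutive subsets of at most subset_size."""
    return [
        compound_ids[i:i + subset_size]
        for i in range(0, len(compound_ids), subset_size)
    ]


def build_tasks(targets, compound_ids, subset_size):
    """One independent docking task per (target, database subset) pair."""
    subsets = split_into_subsets(compound_ids, subset_size)
    return [
        {"target": target, "subset_index": index, "compounds": subset}
        for target in targets
        for index, subset in enumerate(subsets)
    ]


if __name__ == "__main__":
    compounds = [f"cmpd-{n}" for n in range(10)]  # hypothetical identifiers
    tasks = build_tasks(["target-A", "target-B"], compounds, subset_size=4)
    for task in tasks:
        print(task["target"], task["subset_index"], len(task["compounds"]))
```

Because no task depends on another, the subset size can be tuned so that each job lasts long enough to amortise submission overhead while staying short enough to be resubmitted cheaply after a failure.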
Grid Added Value
• Large number of CPUs available
• Reliable and secured Data Management Services
– Sharing of results
– Replication of the data
– ACLs
• Availability of the resources
Figure: Real Time Monitor (Imperial College London), http://gridportal.hep.ph.ic.ac.uk/rtm/
Grid infrastructures and projects contributing to the data challenges
Figure: EELA, EUMedGrid, EUChinaGrid, Auvergrid, EGEE, TWGrid, EMBRACE, BioinfoGrid, SHARE
Legend: European grid infrastructure; European grid project; Regional/national grid infrastructure
WISDOM production environment
Credit: CNRS-IN2P3
GUI designed by biologists
Target selection
Compound selection
Docking parameter setter
Energy table
Complex visualization
Credit: H-C Lee (ASGC)
Conclusion
• WISDOM proposes a new approach to drug discovery thanks to the grid
– Rapid deployment of large scale virtual screening
– Collaborative environment for the sharing of data in the research community
• First biochemical results demonstrate grid relevance to the drug discovery community
Perspectives
• Summer 2007
– 2nd data challenge against avian flu
– In vitro tests of the best molecules from the data challenges
• Winter 2007
– Discussion with WHO and Novartis: targets provided by the Drug Target Portfolio Network from the Tropical Disease Research initiative
– Discussion with Africa@home initiative: WISDOM deployment on a desktop grid
Thank you
• To all members of the WISDOM collaboration for their contribution to the project (CNRS-IN2P3, ASGC, ITB-CNR, SCAI Fraunhofer, Univ of Modena…)
• To all grid nodes which committed resources and allowed the success of the initiative
• To all projects which supported the initiative by providing either computing resources or manpower to develop the WISDOM environment (EGEE, BioinfoGRID, Embrace, SHARE…)
• To BioSolveIT for offering up to 6,000 free licenses of FlexX