WISDOM project – Grid and neglected diseases
CoreGRID Summer School 2006
Dr. Marc Zimmermann
Page 2Marc Zimmermann, CSS-2006 WISDOM project – Grid and neglected diseases
Burden of Diseases in Developing World
Disease                  Endemic    People at risk  Clinical incidence/yr  Deaths/yr  Disease burden
                         countries  (million)       (million)              (million)  (DALYs, million)
HIV/AIDS                 180        5,900           40                     2.8        86
Malaria                  101        2,400           300-500                1.2        44.7
TB                       211        1,987           8                      1.6        35.4
African trypanosomiasis  36         60              0.3-0.5                0.05       1.5
Chagas Disease           21         100             16-18                  0.01       0.7
Leishmaniasis            88         350             12                     0.05       2
Filariasis               80         1,000           120                    ---        5.8
Schistosomiasis          76         500-600         140                    0.01       1.7
Onchocerciasis           36         120             18                     ---        0.5
Leprosy                  24         ---             0.8                    ---        0.2
Drug Discovery Process (Nwaka & Ridley 2003)
Target selection: Validated? Robust assay system? Structural information? Lead inhibitors?
Stages: Target, Inhibitors, Validated hits, Leads, Drug candidate, Product development
Inputs: High throughput screening; Natural products/traditional medicines
Activities between stages:
- Screen in culture
- Screen in animal models, pharmacokinetics analysis
- Iterative medicinal chemistry: optimise efficacy and pharmaceutical qualities
- More detailed potency/efficacy studies; pharmacokinetics; early toxicology
Gaps in Drug R&D Tropical / Neglected Diseases
Pipeline stages: Research and discovery; Preclinical development; Phase I; Phase II; Phase III; Registration, launch, utilisation
Gaps along the pipeline: GAP I, GAP II, GAP III, GAP IV
What is missing:
- Valid drug targets
- New compounds
- Good animal models to assess safety and efficacy
- Good evaluation tools
- Clinical trials capacity in developing countries
- Capacity for uptake of new medicines
- Capacity for post-approval processes
Potential Impact of a Distributed IT Infrastructure
Speed up the development of new drugs and vaccines
- In silico research on grids
- Collection of epidemiological data for research (modeling, molecular biology)
- Ease the deployment of clinical trials in endemic areas
Improve disease monitoring
- Collect data on drug distribution and treatment follow-up
- Evaluate impact of policies and programs
- Improve alert and monitoring system for epidemics
Ease integration of African research laboratories in world research
- Offer access to IT resources
- Offer access to data and services (telemedicine, life sciences)
From a single PC to a Grid
Farm of PCs
Enterprise grid: pooling (mutualization) of resources within a company. Example: Novartis
Volunteer computing: CPU cycles made available by PC owners. Examples: Seti@home, Africa@home
Grid infrastructure: Internet + disk and storage resources + services for information management (data collection, transfer and analysis). Example: EGEE
Grid Impact on Drug Discovery Workflow down to Drug Delivery (1/2)
Grids provide the necessary tools and data to identify new biological targets
- Bioinformatics services (database replication, workflow, ...)
- Resources for CPU intensive tasks such as comparative genomic analysis, inverse docking, ...
Grids provide the resources to speed up lead discovery
- Large scale in silico docking to identify potentially promising compounds
- Molecular dynamics computations to refine virtual screening and further assess selected compounds
Grid Impact on Drug Discovery Workflow down to Drug Delivery (2/2)
Grids provide environments for epidemiology
- Federation of databases to collect data in endemic areas, to study a disease and to evaluate the impact of vaccines and vector control measures
- Resources for data analysis and mathematical modeling
Grids provide the services needed for clinical trials
- Federation of databases to collect data in the centers participating in the clinical trials
Grids provide the tools to monitor drug delivery
- Federation of databases to monitor drug delivery
WISDOM : Wide In Silico Docking On Malaria
Biological goal
- Proposal of new inhibitors for a family of proteins produced by Plasmodium falciparum
Biomedical informatics goal
- Deployment of in silico virtual docking on the grid
Grid goal
- Deployment of a CPU-intensive application generating large data flows to test the grid operation and services => “data challenge”
Introduction to the Disease : Malaria
~300 million people worldwide are affected
1-1.5 million people die every year
Widely spread
Caused by protozoan parasites of the genus Plasmodium
Complex life cycle with multiple stages
There is a Real Need for New Drugs to Fight Malaria (WHO)
Drug resistance has emerged for all classes of antimalarials except artemisinins.
- Resistance to chloroquine, the cheapest and most used drug, is spreading in almost all the endemic countries.
- Resistance to the sulfadoxine-pyrimethamine combination, which was already present in South America and South-East Asia, is now emerging in East Africa (65% in Western Tanzania).
All countries experiencing resistance to conventional monotherapies should use ACTs (artemisinin-based combination therapies).
But there is even a threat of resistance to artemisinin, as has already been observed in the murine parasite Plasmodium yoelii.
Identification of New Antimalarial Targets
The available drugs focus on a limited number of biological targets => cross-resistance to antimalarials
With the advent of the Plasmodium genome, many targets came to light.
The potential antimalarial drug targets are broadly classified into three categories:
- Targets involved in hemoglobin degradation (proteases like plasmepsins, falcipains)
- Targets involved in metabolism
- Targets engaged in membrane transport and signaling (choline transporter etc.)
The present project, WISDOM, focuses on hemoglobin metabolism and especially on Plasmepsin II and Plasmepsin IV.
Docking Dataflow
[Figure: docking dataflow. Inputs: crystal structure, ligand data base. Steps: placing the ligand, structure optimization, scoring, ranking. Outputs: hit, junk. Inset labels: protein surface, ligand, water molecule.]
High Throughput Virtual Docking
Chemical compounds (ZINC):
- Chembridge: 500,000
- Drug like: 500,000
Targets (PDB):
- Plasmepsin II (1lee, 1lf2, 1lf3)
- Plasmepsin IV (1ls5)
Millions of chemical compounds available in laboratories
High Throughput Screening: 1-10$/compound, nearly impossible
Molecular docking (FlexX, AutoDock): ~80 CPU years, 1 TB data
Data challenge on EGEE: ~6 weeks on ~1700 computers
Hits screening using assays performed on living cells
Leads
Clinical testing
Drug
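The case for a grid can be made with one division. The sketch below takes the two figures quoted on the slide and assumes perfect parallelism and 365-day years, which a real run will not achieve; the variable names are illustrative.

```python
# Figures quoted on the slide.
cpu_years = 80      # total docking workload
computers = 1700    # machines used in the EGEE data challenge

# Assumption: perfect parallelism and 365-day years.
ideal_days = cpu_years * 365 / computers
print(round(ideal_days, 1))  # 17.2
```

The data challenge is reported to have taken about 6 weeks, so the ideal figure is best read as a lower bound rather than an estimate.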
Experimental setup for HT-experimentation
- 2 different assays, i.e. the docking tools FlexX and AutoDock
- 5+3 (5+5) different targets from two families (Plm II and IV): 1lee, 1lf2, 1lf3, 1ls5 A and B chain, including 3 different crystal water setups
- 4 (2) different assay conditions, i.e. parameter variations (place particles, overlap volume, 2 genetic algorithms)
- Multiple point measuring, i.e. pose clustering and selection of 10 cluster centres
52 large scale experiments (4 x 8 + 2 x 10)
26 million measurements
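The experiment count can be checked with a few lines of arithmetic. The sketch below assumes, as the expression 4 x 8 + 2 x 10 suggests, that the unparenthesised values (4 conditions, 5+3 targets) belong to one docking tool and the parenthesised values (2 conditions, 5+5 targets) to the other, and that each experiment screens one library of 500,000 compounds. The variable names are illustrative.

```python
# Sanity check of the experiment count on the slide.
# Assumption: each large scale experiment screens one library of
# 500,000 compounds (the size quoted for the ZINC subsets).
COMPOUNDS_PER_EXPERIMENT = 500_000

# (assay conditions, target structures) for the two terms of "4 x 8 + 2 x 10"
setups = [
    (4, 5 + 3),  # unparenthesised values on the slide
    (2, 5 + 5),  # parenthesised values on the slide
]

experiments = sum(conditions * targets for conditions, targets in setups)
measurements = experiments * COMPOUNDS_PER_EXPERIMENT

print(experiments)   # 52
print(measurements)  # 26000000
```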
Testing under Controlled Conditions
Redocking studies of 5 co-crystallized ligands and 9 known inhibitors mixed with 500,000 ZINC compounds
[Figure labels: ZINC, 14, 2000, control, test, quality filter]
EGEE, International Project of Grid Infrastructure
- Started in 2004, >70 partners worldwide
- Project leader: CERN
- 6 scientific domains with >20 applications deployed
- 170 grid nodes, 17000 CPUs, several petabytes of data, 10000 jobs per day
Countries with nodes contributing to the data challenge WISDOM
Simplified Grid Workflow
3000 floating licenses given by BioSolveIT to SCAI
Maximum number of used licenses was 1008
[Diagram components: User interface, Resource Broker, Computing Element and Storage Element at Site1 and at Site2, FlexX license server, all within EGEE]
[Diagram data flows: compounds database, compounds list, compounds sub lists, parameter settings, target structures, software, results, statistics]
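The workflow splits the compounds list into sub lists that travel to the computing elements as separate jobs. A minimal sketch of that splitting step follows. The function name `split_into_sublists`, the chunk size and the compound identifiers are hypothetical and are not taken from the WISDOM scripts.

```python
from typing import Iterator, List, Sequence


def split_into_sublists(compounds: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive sub lists of at most `size` compounds."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(compounds), size):
        yield list(compounds[start:start + size])


# Hypothetical compound identifiers, for illustration only.
compounds = [f"compound-{i}" for i in range(10)]
sublists = list(split_into_sublists(compounds, size=4))

# Each sub list would become the input of one docking job.
for job_id, sublist in enumerate(sublists):
    print(job_id, sublist)
```

Smaller sub lists give more, shorter jobs, which limits the work lost when a job fails but increases scheduling and transfer overhead.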
WISDOM Architecture
GRID: Grid services (RB, RLS…); Grid resources (CE, SE); Application components (software, database)
Installer: wisdom_install
Tester: wisdom_test
wisdom_execution: workload definition, job submission, job monitoring, job bookkeeping, fault tracking, fault fixing, job resubmission
wisdom_collect
Other components: wisdom_site, wisdom_db, license server
Other diagram labels: User, Supervisor, Set of jobs, Accounting data
Objective of the WISDOM Development
Objective
- Produce a large amount of data in a limited time with minimal human cost during the data challenge (DC)
Need an optimized environment
- Limited time
- Performance goal
Need a fault tolerant environment
- The grid is heterogeneous and dynamic
- Stress usage of the grid during the DC
Need an automatic production environment
- Execution with the Biomedical Task Force
- Grid APIs are not fully adapted to bulk use at large scale
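One way to picture a fault tolerant, automatic production environment is a loop that resubmits failed jobs until they succeed or a retry budget runs out. The sketch below is generic Python, not the WISDOM implementation: `run_with_resubmission`, `max_attempts` and the simulated `flaky_job` are assumptions made for illustration.

```python
import random
from typing import Callable, Dict, Iterable, List, Tuple


def run_with_resubmission(
    jobs: Iterable[str],
    submit: Callable[[str], bool],
    max_attempts: int = 3,
) -> Tuple[Dict[str, int], List[str]]:
    """Submit each job, resubmitting failures up to max_attempts times.

    Returns (attempt number at which each job succeeded,
             jobs that never succeeded).
    """
    done: Dict[str, int] = {}
    pending = list(jobs)
    for attempt in range(1, max_attempts + 1):
        failed = []
        for job in pending:
            if submit(job):
                done[job] = attempt
            else:
                failed.append(job)  # bookkeeping: keep for the next round
        pending = failed
        if not pending:
            break
    return done, pending


# Simulated grid: every submission fails with probability 0.3 (made-up rate).
rng = random.Random(0)


def flaky_job(job: str) -> bool:
    return rng.random() > 0.3


done, lost = run_with_resubmission([f"job-{i}" for i in range(20)], flaky_job)
print(len(done), "succeeded;", len(lost), "given up")
```

A fixed retry budget keeps a persistently failing site from absorbing resubmissions indefinitely.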
Number of Docked Ligands (millions) vs. Time (days)
[Plot annotated with six numbered phases:]
1: Intensive submission of FlexX jobs with Chembridge ligands base
2: Resubmission
3: Intensive submission of FlexX jobs with drug like ligands base
4: Resubmission
5: Intensive submission of AutoDock jobs with Chembridge ligands base
6: Resubmission
Number of Running and Waiting Jobs vs. Time
[Plot annotated with phase markers 1 to 6]
Total Amount of CPU Provided by EGEE Federation
UKI, 29%
France, 18%
Italy, 16%
South Western Europe, 12%
South Eastern Europe, 10%
Northern Europe, 7%
Central Europe, 4%
Asia Pacific, 2%
Germany Switzerland, 1%
Russia, 1%
Exploitation Metrics
Metrics                                          FlexX + AutoDock phases
Total CPU time                                   80 years
Number of jobs                                   72751
Number of grid nodes                             58
Number of jobs running in parallel on the grid   1643
Volume of output data                            946 GB
Volume of transferred data (input + output)      6302 GB
Performance Metrics
Metrics                                          FlexX + AutoDock phases
Cumulative number of docked ligands (millions)   41.27
Number of docked ligands / h                     46475
Effective CPU time                               67.15 years
Effective duration                               37 days
Crunching factor                                 662
Average transfer rate                            0.8 MB/s
Peak rate                                        62.1 MB/s
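The slide does not define the crunching factor. One reading that reproduces the reported values is effective CPU time divided by effective duration, with throughput taken as docked ligands per elapsed hour. The check below is a sketch under that assumption, using 365-day years.

```python
# Figures from the Performance Metrics slide.
docked_ligands = 41.27e6      # cumulative number of docked ligands
effective_cpu_years = 67.15   # effective CPU time
effective_days = 37           # effective duration

# Assumption: crunching factor = CPU time / elapsed time (365-day years).
crunching_factor = effective_cpu_years * 365 / effective_days

# Throughput in docked ligands per hour of elapsed time.
ligands_per_hour = docked_ligands / (effective_days * 24)

print(int(crunching_factor))  # 662
print(int(ligands_per_hour))  # 46475
```

Read this way, the crunching factor is the average number of CPUs kept busy over the 37 days.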
Efficiency Metrics (1/2)
Metrics                                                       FlexX + AutoDock phases
Success rate                                                  77.0 %
Success rate after results checking                           46.2 %
Success rate after results checking without WISDOM failures   63.0 %
Efficiency depends on:
Heterogeneous and dynamic nature of the grid
Stress usage
Automatic jobs (re)submission (“sink-hole” effect)
Efficiency Metrics (2/2)
Successful jobs: 46 %
Workload Management failure: 10 %
- Overload, disk failure
- Misconfiguration, disk space problem
- Air-conditioning, electrical cut
Data Management failure: 4 %
- Network / connection
- Electrical cut
- Unknown
Sites failure: 9 %
- Misconfiguration, tar command, disk space
- Information system update
- Limit on the number of jobs in the waiting queue
- Air-conditioning, electrical cut
Unclassified: 4 %
- Lost jobs
- Unknown
Server license failure: 23 %
- Server failure
- Electrical cut
- Server stop
WISDOM failure: 4 %
- Jobs distribution
- Human failure
- Script
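The breakdown can be cross-checked against the success rates on the previous slide. The categories sum to 100 %, and dividing the successful share by what remains after removing license server and WISDOM failures gives about 63 %. Treating those two categories as the "WISDOM failures" excluded from the reported figure is an assumption; the slides do not spell it out.

```python
# Job outcome shares (percent) from the Efficiency Metrics (2/2) slide.
outcomes = {
    "Successful jobs": 46,
    "Workload Management failure": 10,
    "Data Management failure": 4,
    "Sites failure": 9,
    "Unclassified": 4,
    "Server license failure": 23,
    "WISDOM failure": 4,
}

total = sum(outcomes.values())
print(total)  # 100

# Assumption: the success rate "without WISDOM failures" excludes the
# license server and WISDOM failure categories from the denominator.
excluded = outcomes["Server license failure"] + outcomes["WISDOM failure"]
adjusted = 100 * outcomes["Successful jobs"] / (total - excluded)
print(round(adjusted, 1))  # 63.0
```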
Influence of the Assay Conditions
[Score correlation plots comparing: with / without crystal water; different pockets; parameter settings. Annotation: score correlation increasing]
Different Docking Tools
AutoDock: removal of internal stress term, scaling of units
[Plot axes: AutoDock, FlexX]
Best Scoring Compounds for 1lee in Parameter Set 1, FlexX
WISDOM-4905131: -48.6
WISDOM-4905002: -48.1
WISDOM-4919013: -47.9
WISDOM-4905154: -47.7
WISDOM-4629716: -45.6
WISDOM-2783458: -45.0
WISDOM-23511620: -43.6
WR100400: 0.11 μM
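Ranking docking results amounts to sorting compounds by score. The sketch below uses the seven scores listed on this slide and assumes that a more negative FlexX score is better, which matches the order in which the slide lists its best scoring compounds; the variable names are illustrative.

```python
# FlexX scores for 1lee, parameter set 1, as listed on the slide
# (deliberately entered out of order).
scores = {
    "WISDOM-4629716": -45.6,
    "WISDOM-4905002": -48.1,
    "WISDOM-23511620": -43.6,
    "WISDOM-4905131": -48.6,
    "WISDOM-2783458": -45.0,
    "WISDOM-4919013": -47.9,
    "WISDOM-4905154": -47.7,
}

# Assumption: more negative score = better predicted binding.
ranked = sorted(scores.items(), key=lambda item: item[1])

for rank, (compound, score) in enumerate(ranked, start=1):
    print(rank, compound, score)
```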
Lessons Learned and Conclusions
Biological goal
- Top scoring compounds possess basic chemical groups like thiourea, guanidino, and amino acrolein as core structure
- Identified compounds are non-peptidic, low molecular weight compounds
- More insights are needed and further analysis has to be done
- Check additional information resources
Biomedical informatics goal
- WISDOM (Wide In-Silico Docking On Malaria) is the first large scale drug discovery initiative on an open grid infrastructure
- Docking tools are not black boxes
- There are many different variations of the same problem leading to different results
Grid goal
- About 80 CPU years to produce 1 TB of data
- Don't trust databases and forget flat files
- Some tools are still missing
Future Work
Extension of the in silico workflow
- Virtual docking service with further docking tools (qualitative comparisons)
- Molecular dynamics for reranking
- Setting up a relational results database
- Ligand similarity based clustering of results
- Combinatorial library design (using combinatorial docking)
A second data challenge is possible in 2006 (autumn)
- With the new EGEE middleware, gLite
- With a better quality process (efficiency, security…)
- Need to define target, docking software, compounds database
Finally, in vitro testing and structure activity relationships
The Challenges Related to Deployment in Africa
Challenge in terms of infrastructure in Africa
- Bandwidth is a prerequisite
Challenges in terms of technology
- Grid technology must provide the services for data and knowledge management
Challenges in terms of human involvement
- Research laboratories and hospitals must have the expertise and the commitment
It is time for an international initiative to address neglected diseases using grid infrastructures and volunteer computing
Acknowledgements:
People involved @SCAI: Vinod Kumar Kasam, Antje Wolf, Astrid Maaß, Mahendrakar Sridhar, Horst Schwichtenberg
People involved @IN2P3: Matthieu Reichstadt, Jean Salzemann, Yannick Legré, Florence Jacq
cooperation partners
http://wisdom.eu-egee.fr