
WISDOM project – Grid and neglected diseases

CoreGRID Summer School 2006

Dr. Marc Zimmermann


Burden of Diseases in Developing World

Disease                   Endemic     People at risk   Clinical incidence/yr   Deaths/yr   Disease burden
                          countries   (million)        (million)               (million)   (million DALYs)
HIV/AIDS                  180         5900             40                      2.8         86
Malaria                   101         2400             300-500                 1.2         44.7
TB                        211         1987             8                       1.6         35.4
African trypanosomiasis   36          60               0.3-0.5                 0.05        1.5
Chagas Disease            21          100              16-18                   0.01        0.7
Leishmaniasis             88          350              12                      0.05        2
Filariasis                80          1000             120                     ---         5.8
Schistosomiasis           76          500-600          140                     0.01        1.7
Onchocerciasis            36          120              18                      ---         0.5
Leprosy                   24          ---              0.8                     ---         0.2


Drug Discovery Process (Nwaka & Ridley 2003)

Target selection: Validated? Robust assay system? Structural information? Lead inhibitors?

Stages: Target, Inhibitors, Validated hits, Leads, Drug candidate, Product development

- High-throughput screening
- Natural products / traditional medicines
- Screen in culture
- Screen in animal models; pharmacokinetics analysis
- Iterative medicinal chemistry: optimise efficacy and pharmaceutical qualities
- More detailed potency/efficacy studies; pharmacokinetics; early toxicology


Gaps in Drug R&D: Tropical / Neglected Diseases

Stages: Research and discovery; Preclinical development; Phase I; Phase II; Phase III; Registration, launch, utilisation

Gaps (GAP I to GAP IV):
- Validated drug targets
- New compounds
- Good animal models to assess safety and efficacy
- Good evaluation tools
- Clinical trials capacity in developing countries
- Capacity for uptake of new medicines
- Capacity for post-approval processes


Potential Impact of a Distributed IT Infrastructure

Speed up the development of new drugs and vaccines
- In silico research on grids
- Collection of epidemiological data for research (modeling, molecular biology)
- Ease the deployment of clinical trials in endemic areas

Improve disease monitoring
- Collect data on drug distribution and treatment follow-up
- Evaluate impact of policies and programs
- Improve alert and monitoring system for epidemics

Ease integration of African research laboratories in world research
- Offer access to IT resources
- Offer access to data and services (telemedicine, life sciences)


From a single PC to a Grid

Farm of PCs

Enterprise grid: mutualization of resources in a company
- Example: Novartis

Volunteer computing: CPU cycles made available by PC owners
- Examples: Seti@home, Africa@home

Grid infrastructure: Internet + disk and storage resources + services for information management (data collection, transfer and analysis)
- Example: EGEE


Grid Impact on Drug Discovery Workflow down to Drug Delivery (1/2)

Grids provide the necessary tools and data to identify new biological targets
- Bioinformatics services (database replication, workflow, ...)
- Resources for CPU-intensive tasks such as genomics comparative analysis, inverse docking, ...

Grids provide the resources to speed up lead discovery
- Large-scale in silico docking to identify potentially promising compounds
- Molecular dynamics computations to refine virtual screening and further assess selected compounds


Grid Impact on Drug Discovery Workflow down to Drug Delivery (2/2)

Grids provide environments for epidemiology
- Federation of databases to collect data in endemic areas, to study a disease and to evaluate the impact of vaccines and vector control measures
- Resources for data analysis and mathematical modeling

Grids provide the services needed for clinical trials
- Federation of databases to collect data in the centers participating in the clinical trials

Grids provide the tools to monitor drug delivery
- Federation of databases to monitor drug delivery


WISDOM: Wide In Silico Docking On Malaria

Biological goal
- Proposition of new inhibitors for a family of proteins produced by Plasmodium falciparum

Biomedical informatics goal
- Deployment of in silico virtual docking on the grid

Grid goal
- Deployment of a CPU-consuming application generating large data flows to test the grid operation and services => "data challenge"


Introduction to the Disease : Malaria

~300 million people worldwide are affected

1-1.5 million people die every year

Widely spread

Caused by protozoan parasites of the genus Plasmodium

Complex life cycle with multiple stages


There is a Real Need for New Drugs to Fight Malaria (WHO)

Drug resistance has emerged for all classes of antimalarials except artemisinins.
- Resistance to chloroquine, the cheapest and most widely used drug, is spreading in almost all endemic countries.
- Resistance to the sulfadoxine-pyrimethamine combination, which was already present in South America and South-East Asia, is now emerging in East Africa (65% in Western Tanzania).

All countries experiencing resistance to conventional monotherapies should use ACTs (artemisinin-based combination therapies).

But there is also the threat of resistance to artemisinin, as already observed in murine Plasmodium yoelii.


Identification of New Antimalarial Targets

The available drugs focus on a limited number of biological targets => cross-resistance to antimalarials

With the availability of the Plasmodium genome, many targets have come to light.

The potential antimalarial drug targets are broadly classified into three categories:
- Targets involved in hemoglobin degradation (proteases like plasmepsins, falcipains)
- Targets involved in metabolism
- Targets engaged in membrane transport and signaling (choline transporter etc.)

The WISDOM project focuses on hemoglobin metabolism, and especially on Plasmepsin II and Plasmepsin IV.


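Docking Dataflow

[Figure: docking dataflow. Inputs: crystal structure, ligand database. Steps shown: placing the ligand, structure optimization, scoring, ranking. Outcomes: hit or junk. Labels on the structure image: protein surface, ligand, water molecule.]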


High Throughput Virtual Docking

Millions of chemical compounds available in laboratories

Chemical compounds (ZINC):
- Chembridge: 500,000
- Drug like: 500,000

Targets (PDB):
- Plasmepsin II (1lee, 1lf2, 1lf3)
- Plasmepsin IV (1ls5)

High Throughput Screening: 1-10 $/compound, nearly impossible

Molecular docking (FlexX, AutoDock): ~80 CPU years, 1 TB data

Data challenge on EGEE: ~6 weeks on ~1700 computers

Hits screening using assays performed on living cells

Leads

Clinical testing

Drug


Experimental setup for HT-experimentation

- 2 different assays, i.e. docking tools FlexX and AutoDock
- 5+3 (5+5) different targets from two families (Plm II and IV): 1lee, 1lf2, 1lf3, 1ls5 A and B chain, including 3 different crystal water setups
- 4 (2) different assay conditions, i.e. parameter variations (place particles, overlap volume, 2 genetic algorithms)
- Multiple-point measuring, i.e. pose clustering and selection of 10 cluster centres

4 x 8 + 2 x 10 = 52 large scale experiments

26 million measurements
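The tally on this slide can be checked with a few lines of arithmetic. The sketch below assumes that "4 x 8 + 2 x 10" means 4 FlexX parameter settings over 8 target setups plus 2 AutoDock parameter settings over 10 target setups, and that each experiment screens one library of 500,000 compounds. Both readings are inferred from the slide, not stated on it, and the dictionary keys are illustrative names.

```python
# Assumed reading of "4 x 8 + 2 x 10":
# (parameter settings) x (target setups) for each docking tool.
SETUPS = {
    "FlexX": {"parameter_settings": 4, "target_setups": 8},      # 5 + 3 targets
    "AutoDock": {"parameter_settings": 2, "target_setups": 10},  # 5 + 5 targets
}
COMPOUNDS_PER_EXPERIMENT = 500_000  # assumed: one compound library per experiment


def count_experiments(setups):
    """Return the number of large-scale experiments per docking tool."""
    return {
        tool: s["parameter_settings"] * s["target_setups"]
        for tool, s in setups.items()
    }


experiments = count_experiments(SETUPS)
total_experiments = sum(experiments.values())
total_measurements = total_experiments * COMPOUNDS_PER_EXPERIMENT

print(experiments)
print(total_experiments, total_measurements)
```

Under these assumptions the totals agree with the 52 experiments and 26 million measurements quoted on the slide.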


Testing under Controlled Conditions

Redocking studies of 5 co-crystallized ligands and 9 known inhibitors, mixed with 500,000 ZINC compounds

[Figure labels: ZINC, 14, 2000, control, test, quality filter]


EGEE, International Project of Grid Infrastructure

- Started in 2004, >70 partners worldwide
- Project leader: CERN
- 6 scientific domains with >20 applications deployed
- 170 grid nodes, 17000 CPUs, several petabytes of data, 10000 jobs per day

[Map: countries with nodes contributing to the WISDOM data challenge]


Simplified Grid Workflow

3000 floating licenses given by BioSolveIT to SCAI

Maximum number of used licenses was 1008

[Figure: simplified grid workflow on EGEE. Components: User Interface, Resource Broker, Site1 and Site2 (Computing Element and Storage Element), FlexX license server. Data items: compounds database, compounds list, compounds sub lists, parameter settings, target structures, software, results, statistics.]
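The workflow figure mentions a compounds list that is cut into compounds sub lists before being sent to the computing elements. A minimal sketch of that splitting step follows; the function name, the compound identifiers and the sublist size are hypothetical and chosen only for illustration, since the slides do not say how the WISDOM scripts did it.

```python
from typing import Iterator, List, Sequence


def split_into_sublists(compounds: Sequence[str], sublist_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of the compounds list, one slice per grid job."""
    if sublist_size <= 0:
        raise ValueError("sublist_size must be positive")
    for start in range(0, len(compounds), sublist_size):
        yield list(compounds[start:start + sublist_size])


# Hypothetical compound identifiers and sublist size, for illustration only.
compounds = [f"compound-{i:04d}" for i in range(10)]
jobs = list(split_into_sublists(compounds, sublist_size=4))

print(len(jobs), [len(job) for job in jobs])
```

The last sublist may be shorter than the others, so every compound lands in exactly one job.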


WISDOM Architecture

[Figure: WISDOM architecture. Labels recovered from the diagram:]
- GRID: grid services (RB, RLS…), grid resources (CE, SE), application components (software, database)
- wisdom_install (Installer)
- wisdom_test (Tester)
- wisdom_execution: workload definition, job submission, job monitoring, job bookkeeping, fault tracking, fault fixing, job resubmission
- wisdom_collect, accounting data, Supervisor
- wisdom_site, wisdom_db
- User, set of jobs, license server
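The execution component is described as covering job submission, monitoring, bookkeeping, fault tracking and resubmission. The sketch below shows one simple way such a loop could be organised. It is not the WISDOM implementation: `run_with_resubmission`, `flaky_runner` and the retry limit are hypothetical, and the runner only simulates a grid submission.

```python
from typing import Callable, Dict, Iterable


def run_with_resubmission(
    job_ids: Iterable[str],
    run_job: Callable[[str, int], bool],
    max_attempts: int = 3,
) -> Dict[str, dict]:
    """Run every job, resubmit failures, and return a bookkeeping table."""
    book = {job_id: {"attempts": 0, "status": "pending"} for job_id in job_ids}
    for attempt in range(1, max_attempts + 1):
        # Fault tracking: everything not yet finished is a resubmission candidate.
        pending = [job_id for job_id, rec in book.items() if rec["status"] != "done"]
        if not pending:
            break
        for job_id in pending:
            book[job_id]["attempts"] = attempt
            succeeded = run_job(job_id, attempt)
            book[job_id]["status"] = "done" if succeeded else "failed"
    return book


def flaky_runner(job_id: str, attempt: int) -> bool:
    """Stand-in for a grid submission: job-b fails once, job-c always fails."""
    if job_id == "job-b":
        return attempt >= 2
    return job_id != "job-c"


book = run_with_resubmission(["job-a", "job-b", "job-c"], flaky_runner)
for job_id, record in book.items():
    print(job_id, record)
```

Capping the number of attempts keeps a permanently failing job from being resubmitted forever.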


Objective of the WISDOM Development

Objective
- Producing a large amount of data in a limited time with a minimal human cost during the data challenge

Need an optimized environment
- Limited time
- Performance goal

Need a fault-tolerant environment
- Grid is heterogeneous and dynamic
- Stress usage of the grid during the DC

Need an automatic production environment
- Execution with the Biomedical Task Force
- Grid APIs are not fully adapted for bulk use at a large scale


Number of Docked Ligands (millions) vs. Time (days)

[Figure: number of docked ligands (millions) against time (days), annotated with phases 1 to 6]

1: Intensive submission of FlexX jobs with Chembridge ligands base
2: Resubmission
3: Intensive submission of FlexX jobs with drug like ligands base
4: Resubmission
5: Intensive submission of AutoDock jobs with Chembridge ligands base
6: Resubmission


Number of Running and Waiting Jobs vs. Time

[Figure: number of running and waiting jobs against time, annotated with phase markers 1 to 6]


Total Amount of CPU Provided by EGEE Federation

- UKI: 29%
- France: 18%
- Italy: 16%
- SouthWesternEurope: 12%
- SouthEasternEurope: 10%
- NorthernEurope: 7%
- CentralEurope: 4%
- AsiaPacific: 2%
- GermanySwitzerland: 1%
- Russia: 1%


Exploitation Metrics

Metric                                           FlexX + AutoDock phases
Total CPU time                                   80 years
Number of jobs                                   72751
Number of grid nodes                             58
Number of jobs running in parallel on the grid   1643
Volume of output data                            946 GB
Volume of transferred data (input + output)      6302 GB


Performance Metrics

Metric                                          FlexX + AutoDock phases
Cumulated number of docked ligands (millions)   41.27
Number of docked ligands per hour               46475
Effective CPU time                              67.15 years
Effective duration                              37 days
Crunching factor                                662
Average transfer rate                           0.8 MB/s
Peak rate                                       62.1 MB/s
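The slide does not define the crunching factor. One reading that reproduces the reported value is the ratio of effective CPU time to effective wall-clock duration, that is, the average number of CPUs kept busy. The sketch below checks that reading and the hourly throughput against the figures above; the 365-day year is an assumption.

```python
# Figures taken from the Performance Metrics slide.
EFFECTIVE_CPU_YEARS = 67.15
EFFECTIVE_DURATION_DAYS = 37
DOCKED_LIGANDS = 41.27e6

DAYS_PER_YEAR = 365  # assumption: the slide does not say which year length was used

# Assumed definition: CPU time delivered per unit of wall-clock time.
crunching_factor = EFFECTIVE_CPU_YEARS * DAYS_PER_YEAR / EFFECTIVE_DURATION_DAYS

# Throughput over the effective duration, in docked ligands per hour.
ligands_per_hour = DOCKED_LIGANDS / (EFFECTIVE_DURATION_DAYS * 24)

print(f"crunching factor ~ {crunching_factor:.0f}")
print(f"docked ligands per hour ~ {ligands_per_hour:.0f}")
```

Both values round to the numbers quoted on the slide (662 and 46475), which supports this interpretation without confirming it.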


Efficiency Metrics (1/2)

Metric                                                         FlexX + AutoDock phases
Success rate                                                   77.0 %
Success rate after results checking                            46.2 %
Success rate after results checking, without WISDOM failures   63.0 %

Efficiency depends on:
- Heterogeneous and dynamic nature of the grid
- Stress usage
- Automatic job (re)submission ("sink-hole" effect)


Efficiency Metrics (2/2)

Successful jobs: 46 %

Workload Management failure: 10 %
- Overload, disk failure
- Misconfiguration, disk space problem
- Air-conditioning, power cut

Data Management failure: 4 %
- Network / connection
- Power cut
- Unknown

Sites failure: 9 %
- Misconfiguration, tar command, disk space
- Information system update
- Limit on the number of jobs in the waiting queue
- Air-conditioning, power cut

Unclassified: 4 %
- Lost jobs
- Unknown

Server license failure: 23 %
- Server failure
- Power cut
- Server stop

WISDOM failure: 4 %
- Job distribution
- Human failure
- Script
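The breakdown can be tied back to the 63.0 % figure from the previous slide. The sketch below assumes that "without WISDOM failures" means leaving out the two application-side categories, the license server failures and the WISDOM failures, before recomputing the success rate. That grouping is an inference from the numbers, not something the slides state.

```python
# Percentages of all jobs, taken from the Efficiency Metrics (2/2) slide.
JOB_OUTCOMES_PERCENT = {
    "Successful jobs": 46,
    "Workload Management failure": 10,
    "Data Management failure": 4,
    "Sites failure": 9,
    "Unclassified": 4,
    "Server license failure": 23,
    "WISDOM failure": 4,
}

# Assumption: these two categories are the ones excluded on the previous slide.
APPLICATION_SIDE = ("Server license failure", "WISDOM failure")

total = sum(JOB_OUTCOMES_PERCENT.values())
excluded = sum(JOB_OUTCOMES_PERCENT[name] for name in APPLICATION_SIDE)
adjusted_success_rate = 100 * JOB_OUTCOMES_PERCENT["Successful jobs"] / (total - excluded)

print(f"categories sum to {total} %")
print(f"success rate without application-side failures ~ {adjusted_success_rate:.1f} %")
```

With this grouping the adjusted rate rounds to 63.0 %, matching the slide; excluding only the 4 % WISDOM failures would not.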


Influence of the Assay Conditions

- With / without crystal water
- Different pockets
- Parameter settings

Score correlation increasing


Different Docking Tools

AutoDock: removal of internal stress term, scaling of units

[Figure axes: AutoDock, FlexX]


Best Scoring Compounds for 1lee in Parameter Set 1, FlexX

WISDOM-4905131: -48.6
WISDOM-4905002: -48.1
WISDOM-4919013: -47.9
WISDOM-4905154: -47.7
WISDOM-4629716: -45.6
WISDOM-2783458: -45.0
WISDOM-23511620: -43.6

WR100400, 0.11 μm


Lessons Learned and Conclusions

Biological goal
- Top-scoring compounds possess basic chemical groups like thiourea, guanidino, and amino acrolein as core structure
- Identified compounds are non-peptidic, low-molecular-weight compounds
- More insights are needed and further analysis has to be done
- Check additional information resources

Biomedical informatics goal
- WISDOM (Wide In Silico Docking On Malaria) is the first large-scale drug discovery initiative on an open grid infrastructure
- Docking tools are not black boxes
- There are many different variations of the same problem leading to different results

Grid goal
- About 80 CPU years to produce TB of data
- Don't trust databases and forget flat files
- Some tools are still missing


Future Works

Extension of in silico workflow
- Virtual docking service with further docking tools (qualitative comparisons)
- Molecular dynamics for reranking
- Setting up a relational results database
- Ligand similarity based clustering of results
- Combinatorial library design (using combinatorial docking)

A second data challenge is possible in 2006 (autumn)
- With the new EGEE middleware, gLite
- With a better quality process (efficiency, security…)
- Need to define target, docking software, compounds database
- Finally, in vitro testing and structure-activity relationships


The Challenges Related to Deployment in Africa

Challenge in terms of infrastructure in Africa
- Bandwidth is a prerequisite

Challenges in terms of technology
- Grid technology must provide the services for data and knowledge management

Challenges in terms of human involvement
- Research laboratories and hospitals must have the expertise and the commitment

It is time for an international initiative to address neglected diseases using grid infrastructures and volunteer computing


Acknowledgements:

People involved @SCAI: Vinod Kumar Kasam, Antje Wolf, Astrid Maaß, Mahendrakar Sridhar, Horst Schwichtenberg

People involved @IN2P3: Matthieu Reichstadt, Jean Salzemann, Yannick Legré, Florence Jacq

cooperation partners

http://wisdom.eu-egee.fr

