http://www.itb.cnr.it/bioinfogrid

Grid Enabled High Throughput Virtual Screening Against Four Different Targets Implicated in Malaria

Presented by

Vinod Kasam

CLADE workshop, HPDC conference,

June 25, 2007, Monterey Bay


Outline

• WISDOM introduction
• Biological targets
• Resources used in WISDOM
• Production environment
• Results
• Issues
• Conclusions


Introduction to the disease: malaria

• ~300 million people worldwide are affected

• 1-1.5 million people die every year

• Widely spread

• Caused by protozoan parasites of the genus Plasmodium

Complex life cycle with multiple stages


WISDOM-II, second large-scale docking deployment against malaria

Malaria target | Involved in | Biology partners
GST from Plasmodium falciparum | Parasite detoxification | U. of Pretoria, South Africa
DHFR from Plasmodium vivax | Parasite DNA synthesis | U. of Los Andes, Venezuela; U. of Modena, Italy
DHFR from Plasmodium falciparum | Parasite DNA synthesis | U. of Modena, Italy
Tubulin from Plasmodium/plant/mammal | Parasite cell replication | CEA, Acamba project, France


WISDOM: Wide In Silico Docking On Malaria

• Biological goal
– Proposition of new inhibitors for a family of proteins produced by Plasmodium
• Biomedical informatics goal
– Deployment of in silico virtual docking on the grid
• Grid goal
– Deployment of a CPU-consuming application generating large data flows to test the grid operation and services ("data challenge")


High Throughput Virtual Docking

• Compounds: ZINC (4.3 M), Chembridge (500,000)
• Targets: 3D structures in PDB, one homology model

Flow shown on the slide:
• Millions of chemical compounds available
• High throughput screening: 1-10 $/compound, several hours
• Molecular docking (FlexX, Autodock): 20 cents/compound, 1 minute
• Data challenge on EGEE: ~3 months on ~2000 computers
• Hits
• Screening using assays performed on living cells
• Leads
• Clinical testing
• Drug
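
As a rough illustration of the cost figures on this slide, the sketch below compares wet-lab screening with docking for one pass over both compound libraries. The per-compound prices and library sizes are the ones quoted on the slide; treating them as flat unit costs for a single target is an assumption made only for illustration.

```python
# Back-of-the-envelope cost comparison using the slide's per-compound figures.
ZINC_COMPOUNDS = 4_300_000
CHEMBRIDGE_COMPOUNDS = 500_000

HTS_COST_RANGE_USD = (1.0, 10.0)   # high throughput screening, per compound
DOCKING_COST_USD = 0.20            # molecular docking, per compound


def screening_cost(n_compounds, unit_cost):
    """Total cost in USD, assuming a flat per-compound price (an assumption)."""
    return n_compounds * unit_cost


total = ZINC_COMPOUNDS + CHEMBRIDGE_COMPOUNDS
hts_low = screening_cost(total, HTS_COST_RANGE_USD[0])
hts_high = screening_cost(total, HTS_COST_RANGE_USD[1])
docking = screening_cost(total, DOCKING_COST_USD)

print(f"Compounds: {total:,}")
print(f"HTS: ${hts_low:,.0f} to ${hts_high:,.0f}")
print(f"Docking: ${docking:,.0f}")
```

Under these assumptions, docking is 5 to 50 times cheaper per compound, which is the motivation for filtering the libraries in silico before any assay.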


Objective of the WISDOM development

• Objective
– Dock a whole compound database in a limited time with minimal human involvement during the data challenge
• Need an optimized environment
– Production in limited time
– Performance is important
• Need a fault-tolerant environment
– Stress usage of the grid during the DC
– Grid is heterogeneous and dynamic
– Data produced are important and can't be easily reproduced
• Need an automatic production environment
– Grid APIs are not fully adapted for bulk use at a large scale
– Ease the execution
– User-friendly high-level services


Use of a production system

• Managing thousands of jobs and files manually is a labor-intensive task
– Job preparation, submission and monitoring, output retrieval, failure identification and resolution, job resubmission…
– In order to efficiently use the resources
• The amount of transferred data impacts grid performance
– The data must be installed on the grid
– The database is stored in subsets
• The grid process introduces significant delays
– The submitted jobs must be sufficiently long in order to reduce the impact of this middleware overhead
• The production system will provide automated and fault-tolerant job and file management
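
The trade-off between subset size and middleware overhead can be pictured with a small sketch: choose how many compounds go into each job so that the fixed per-job overhead stays below a chosen share of the job's run time, then split the compound list accordingly. The one-minute docking time comes from the earlier slide; the overhead value, the 5% target and the function names are hypothetical and not taken from the WISDOM production system.

```python
import math


def compounds_per_job(overhead_s, docking_s_per_compound, max_overhead_fraction):
    """Smallest subset size whose overhead share stays under the given fraction.

    overhead / (overhead + n * t) <= f  implies  n >= overhead * (1 - f) / (f * t)
    """
    if not 0 < max_overhead_fraction < 1:
        raise ValueError("max_overhead_fraction must be between 0 and 1")
    n = overhead_s * (1 - max_overhead_fraction) / (
        max_overhead_fraction * docking_s_per_compound
    )
    return max(1, math.ceil(n))


def split_into_subsets(compound_ids, subset_size):
    """Split the compound list into consecutive subsets, one per grid job."""
    return [
        compound_ids[i:i + subset_size]
        for i in range(0, len(compound_ids), subset_size)
    ]


# Hypothetical numbers: 30 min of overhead per job, 60 s per docking,
# and at most 5% of each job spent on overhead.
size = compounds_per_job(overhead_s=1800, docking_s_per_compound=60,
                         max_overhead_fraction=0.05)
subsets = split_into_subsets(list(range(2000)), size)
print(size, len(subsets))
```

Larger subsets lower the overhead share but make each failed job more expensive to rerun, so the fraction is a tuning knob rather than a fixed rule.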


Grid added value for international collaboration on neglected and emerging diseases

• Grids offer unprecedented opportunities for sharing information and resources worldwide

Grids are unique tools for:
– Collecting and sharing information (epidemiology, genomics)
– Networking experts
– Mobilizing resources routinely or in emergency (vaccine and drug discovery)


Grid added value of EGEE for a large scale in silico experimentation

• Large computing and storage resources

• 24 hours a day availability of resources, user support

• Workload Management Service

• Information and Monitoring Services

• Data Management Services

• Security

• Reliability of services


Simplified grid workflow

• FlexX license server:
– 6000 floating licenses offered by BioSolveIT to SCAI
– Maximum number of concurrently used licenses was 5000

[Diagram. Components: User interface, Resource Broker, and a Computing Element and Storage Element at each of Site 1 and Site 2. Data labels: compounds database, compounds list, compounds sub-lists, parameter settings, target structures, software, results, statistics.]


Schema of the current WISDOM production environment

[Diagram. Components: User Interface with the WISDOM production system, WMS, CEs and WNs running FlexX jobs, SEs, FlexLM (FLEXlm) license server, HealthGrid server, local server, web site and WISDOM DB. Labelled actions on the WMS link: submits the jobs, checks job status, resubmits. Labelled data: structure file and compounds file (inputs), output file (outputs), statistics, license, transfers via DMS/GFTP.]
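
The submit, check status, resubmit cycle named in the schema can be sketched as a simple loop. This is only an illustration of the control flow: `submit` and `poll` are hypothetical callables supplied by the caller, standing in for whatever the grid middleware provides, and the 30% abort rate in the simulation is made up.

```python
import random


def run_with_resubmission(job_ids, submit, poll, max_attempts=3):
    """Submit every job, poll its status, and resubmit failures.

    `submit(job_id)` and `poll(job_id)` are caller-supplied callables
    (hypothetical stand-ins for the grid middleware). `poll` must return
    "done" or "aborted". Returns (completed, abandoned) lists of job ids.
    """
    attempts = {job_id: 0 for job_id in job_ids}
    pending = list(job_ids)
    completed, abandoned = [], []
    while pending:
        next_round = []
        for job_id in pending:
            attempts[job_id] += 1
            submit(job_id)
            if poll(job_id) == "done":
                completed.append(job_id)
            elif attempts[job_id] < max_attempts:
                next_round.append(job_id)   # resubmit in the next round
            else:
                abandoned.append(job_id)    # give up, report for manual review
        pending = next_round
    return completed, abandoned


# Simulated grid: each attempt aborts with 30% probability (made-up rate).
rng = random.Random(0)
completed, abandoned = run_with_resubmission(
    job_ids=range(100),
    submit=lambda job_id: None,
    poll=lambda job_id: "aborted" if rng.random() < 0.3 else "done",
)
print(len(completed), len(abandoned))
```

Capping the number of attempts keeps a persistently failing job from blocking the campaign; such jobs are set aside rather than retried forever.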


Grid infrastructures and projects contributing to WISDOM-II

[Diagram. Legend categories: European grid infrastructure; European grid project; regional/national grid infrastructure. Names shown: EGEE, EELA, EUMedGrid, EUChinaGrid, Auvergrid, TWGrid, EMBRACE, BioinfoGrid, SHARE.]


Instances on different infrastructures

Instances deployed on the different infrastructures during the WISDOM-II data challenge


Deployment on different infrastructures

• Up to 5000 computers in more than 17 countries were mobilized from October 2006 to January 2007 to provide CPU
• 1.738 TB of data produced

[Pie chart: Distribution of jobs. Legend: EGEE Germany Switzerland, EGEE Asia Pacific, EGEE Russia, Auvergrid, EuChinaGrid, EELA, EGEE South Western Europe, EGEE Central Europe, EGEE Northern Europe, EGEE Italy, EGEE South Eastern Europe, EGEE France, EGEE UKI. Slice values: 1%, 2%, 2%, 3%, 3%, 3%, 3%, 5%, 6%, 7%, 12%, 15%, 38%.]


Statistics of deployment

• First DC
– 80 CPU years
– 1 TB
– 1700 CPUs used in parallel
– July 1st - August 15th 2005
• 2nd DC
– 100 CPU years
– 800 GB
– 1700 CPUs used in parallel
– May 1st -April 15th 2006
• 3rd DC
– 413 CPU years
– 1.7 TB
– Up to 5000 CPUs in parallel
– 1st October 2006 - 31 January 2007

Number of jobs: 77,504
Total number of completed dockings: 156,407,400
Estimated duration on 1 CPU: 413 years
Duration of the experiment: 76 days
Average throughput: 78,400 dockings/hour
Maximum number of loaded licences (concurrent running jobs): 5,000
Number of used computing elements: 98
Average duration of a job: 41 hours
Average crunching factor: 1,986
Volume of output results: 1.738 TB

The crunching factor is the ratio of the total CPU time over the duration of the experiment. It represents the average number of CPUs used simultaneously all along the data challenge and is a metric of the parallelization gain.
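
The definition above can be checked against the reported figures with a short calculation. The CPU time and experiment duration are taken from the statistics on this slide; converting years at 365 days each is an assumption.

```python
def crunching_factor(total_cpu_days, experiment_days):
    """Ratio of total CPU time to wall-clock duration (same unit for both)."""
    return total_cpu_days / experiment_days


# Figures from the statistics above; 365 days per year is an assumption.
cpu_years = 413
experiment_days = 76
factor = crunching_factor(cpu_years * 365, experiment_days)
print(f"Crunching factor: about {factor:.0f}")
```

The result lands within a few units of the reported 1,986; the small gap presumably comes from rounding in the published CPU time and duration.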


Biological results

Distribution of docking energies of the ZINC database against the GST A structure. (The red column represents a score of -24 kJ/mol, the docking score of a co-crystallized ligand (GTX) of the GST A chain.)

[Histogram. x-axis: Docking Energy, from -50 to 18; y-axis: Number of compounds, from 0 to 350,000.]


Issues

• Scheduling efficiency of the grid is still a major issue

• The resource broker is still the main bottleneck

• This deployment also shows that naive blacklisting of failing resources is not possible, because virtually all the grid resources produced aborted jobs; blacklisting must also take scheduled site downtimes into account.

• Store and treat the data in a relational database
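
To illustrate the blacklisting issue above, here is a minimal sketch of a filter that is less naive than "exclude any site that aborted a job": a site is skipped only while it is in scheduled downtime or when its abort rate is high over a meaningful number of jobs. The data model, thresholds and site names are all hypothetical and do not describe what the WISDOM production system did.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SiteRecord:
    """Per-site job outcomes plus scheduled downtime windows (hypothetical model)."""
    succeeded: int = 0
    aborted: int = 0
    downtimes: list = field(default_factory=list)  # list of (start, end) datetimes

    def failure_rate(self):
        total = self.succeeded + self.aborted
        return self.aborted / total if total else 0.0

    def in_downtime(self, now):
        return any(start <= now < end for start, end in self.downtimes)


def usable_sites(sites, now, max_failure_rate=0.5, min_jobs=20):
    """Names of sites to keep submitting to.

    A site is skipped while in scheduled downtime, or when it has run at
    least `min_jobs` jobs and its abort rate exceeds `max_failure_rate`.
    A single aborted job never excludes a site.
    """
    keep = []
    for name, rec in sites.items():
        enough_data = rec.succeeded + rec.aborted >= min_jobs
        if rec.in_downtime(now):
            continue
        if enough_data and rec.failure_rate() > max_failure_rate:
            continue
        keep.append(name)
    return keep


now = datetime(2006, 11, 1, 12, 0)
sites = {
    "site-a": SiteRecord(succeeded=95, aborted=5),
    "site-b": SiteRecord(succeeded=4, aborted=36),
    "site-c": SiteRecord(succeeded=50, aborted=2,
                         downtimes=[(datetime(2006, 11, 1, 8, 0),
                                     datetime(2006, 11, 1, 18, 0))]),
    "site-d": SiteRecord(succeeded=0, aborted=1),
}
print(usable_sites(sites, now))
```

Here site-a stays usable despite some aborted jobs, site-b is dropped for a persistently high abort rate, site-c is skipped only for the duration of its downtime, and site-d is kept because one failure is too little evidence.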


Interactive Web Portal

• User Friendly Interface for biologists

• Real Time output of the results
– 3D views of the docking poses and structures

• Resubmission of docking jobs


Conclusion

• Take advantage of the EGEE services, APIs and resources.

• Demonstrated the relevance of computational grids in life science applications

• Manual intervention is reduced (automatic resubmission of jobs)

• Use of AMGA to store results and statistics immediately.

• Interoperable Web Service Interface
– WSDL following the WS-I profile

• Improved flexibility to deploy other bioinformatics applications.


The next steps

• To address the issue of resource brokers, we are trying to submit the jobs by bypassing resource brokers

• Docking step still requires a lot of manual intervention
– Task: improve output data collection and post-docking analysis

• The next step after docking is Molecular Dynamics
– Task: deploy Molecular Dynamics computations on grid infrastructures (successfully deployed already on one target, plasmepsin)
– Contribution from CNRS-IN2P3, within the framework of BioinfoGRID, Fraunhofer SCAI and University of Modena

• Beyond virtual screening, the long term vision: building a grid for malaria
– To provide services to research labs working on malaria
– To collect and analyze epidemiological data


Long term vision: a grid for malaria

Use the grid technology to foster research and development on malaria and other neglected diseases

• Univ. Los Andes: Biological targets, Malaria biology
• LPC Clermont-Ferrand: Biomedical grid
• SCAI Fraunhofer: Knowledge extraction, Chemoinformatics
• Univ. Modena: Biological targets, Molecular Dynamics
• ITB CNR: Bioinformatics, Molecular modelling
• Univ. Pretoria: Bioinformatics, Malaria biology
• Academia Sinica: Grid user interface
• HealthGrid: Biomedical grid, Dissemination
• CEA, Acamba project: Biological targets, Chemogenomics

Contacts also established with WHO, Microsoft, TATRC, Argonne, SDSC, SERONO, NOVARTIS, Sanofi-Aventis, and hospitals in sub-Saharan Africa.


Acknowledgments

Academia Sinica

BioSolveIT

CNR-ITB

CNRS

CEA

Healthgrid

IN2P3

LPC

SCAI Fraunhofer

Università di Modena e Reggio Emilia

Université Blaise Pascal

University of Pretoria

University of Los Andes

Auvergrid

Accamba

BioInfoGRID

EGEE

EMBRACE

EUChinaGRID

EUMedGRID

SHARE

TWGrid

Conseil Regional d’Auvergne

European Union

wisdom.healthgrid.org

