Virtual Laboratory: Data Intensive Science during Holiday @ Robinson Village in Italy! Rajkumar...

Post on 26-Mar-2015


Virtual Laboratory: Data Intensive Science during Holiday @ Robinson Village in Italy!

Rajkumar Buyya

Melbourne, Australia

www.buyya.com/ecogrid

WW Grid

2

Grid Warning!

This is a science fiction story on the future of grid computing.

All actors mentioned in this talk are fictitious.

Application under consideration: Prof. Watson-II is researching drug design.

The complete story is fictitious except the Grid technology!

3

Prof. Watson-II Spends all his time in Lab @ University of Lecce

4

Watson-II’s wife was unhappy

He was not spending any time with her & the kids.

Every day he goes to the lab @ 8am and comes back home at 11pm.

After a few days he and his wife had a big fight @ home. She gives him a warning: if he does not come home tomorrow by 6pm, he will have to face lifetime consequences.

5

Prof. Watson-II works up to 5pm in Lab @ University of Lecce

Goes to Work @ 9am. Returns home by 5.30PM!

6

Watson-II having moonlight dinner with his Wife

7

Prof. Watson-II works up to 5pm in Lab @ University of Lecce

Goes to Work @ 9am. Returns home by 5.30PM!

8

Watson-II promises his wife that he will soon take her for a holiday @ Robinson Village

9

Prof. Watson-II hires an assistant and works smarter!

Goes to Work @ 9am. Returns home by 5.30PM!

10

Watson-II & Family start their holiday

11

Watson-II & Family on 5 Day Holiday @ Robinson Village

Day 1 @ Robinson Village

13

14

Watson-II happens to meet a Grid researcher on the beach!

15

Watson-II quickly reads a news clipping that he got from the Grid researcher

16

?

17

Watson-II having moonlight dinner with his Wife

Day 2 @ Robinson Village

19

Goes to Internet Room & does some surfing of the Grid researcher's page

20

Drug Design: Data Intensive Computing on Grid

A Virtual Laboratory for “Molecular Modelling for Drug Design” on Peer-to-Peer Grid.

It provides tools for examining millions of chemical compounds (molecules) in the Protein Data Bank (PDB) to identify those having potential use in drug design.

In collaboration with: Kim Branson, Structural Biology, Walter and Eliza Hall Institute (WEHI)

http://www.csse.monash.edu.au/~rajkumar/dd@home/

21

DesignDrug@Home Architecture: A Virtual Lab for “Molecular Modeling for Drug Design” on P2P Grid

[Architecture diagram. Components: Resource Broker (RB), Grid Market Directory, Grid Info. Service, Data Replica Catalogue, Grid Trade Servers (GTS), and Protein Data Bank replicas PDB1 and PDB2.]

Request: “Screen 2K molecules in 30min. for $10”

Messages shown in the diagram:

    “Give me list PDBs sources of type aldrich_300?”
    “service providers?”
    “service cost?”
    “get mol.10 from pdb1 & screen it.”
    “mol.10 please?”
    “mol.5 please?”

(GTS - Grid Trade Server)

(RB maps suitable Grid nodes and Protein Data Bank)
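The request “Screen 2K molecules in 30min. for $10” combines a deadline and a budget. The sketch below shows one simple way a broker could check such a request: a greedy, cheapest-first allocation. It is an illustration only, not Nimrod/G's actual scheduling algorithm. The node names, per-molecule times, and prices are made up, and the nodes are assumed to work in parallel.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str                 # hypothetical Grid node name
    secs_per_molecule: float  # assumed average docking time on this node
    cost_per_molecule: float  # assumed price quoted by the node's Grid Trade Server


def plan(resources, n_molecules, deadline_secs, budget):
    """Greedy cost-first allocation (illustrative, not Nimrod/G's scheduler)."""
    allocation, total_cost, remaining = {}, 0.0, n_molecules
    for r in sorted(resources, key=lambda r: r.cost_per_molecule):
        if remaining == 0:
            break
        # How many molecules this node can screen before the deadline.
        capacity = int(deadline_secs // r.secs_per_molecule)
        share = min(capacity, remaining)
        if share > 0:
            allocation[r.name] = share
            total_cost += share * r.cost_per_molecule
            remaining -= share
    # Feasible only if every molecule is placed and the budget holds.
    feasible = remaining == 0 and total_cost <= budget
    return allocation, total_cost, feasible


nodes = [Resource("cheap", 1.5, 0.003), Resource("fast", 1.0, 0.006)]
allocation, total_cost, feasible = plan(nodes, 2000, 30 * 60, 10.0)
print(allocation, round(total_cost, 2), feasible)
```

With these made-up prices the 2,000 molecules fit within both the 30-minute deadline and the $10 budget; lowering the budget or the deadline makes the same request infeasible.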

22

Software Tools

Molecular Modelling Tools (DOCK)

Parameter Modelling Tools (Nimrod/enFuzion)

Grid Resource Broker (Nimrod-G)

Data Grid Broker

Protein Data Bank (PDB) Management and Intelligent Access Tools:

    PDB database Lookup/Index Table Generation.
    PDB and associated index-table Replication.
    PDB Replica Catalogue (that helps in Resource Discovery).
    PDB Servers (that serve PDB clients' requests).
    PDB Brokering (Replica Selection).
    PDB Clients for fetching Molecule Record (Data Movement).

Grid Middleware (Globus and GrACE)

Grid Fabric Management (Fork/LSF/Condor/Codine/…)

23

DOCK code* (Enhanced by WEHI, U of Melbourne)

A program to evaluate the chemical and geometric complementarities between a small molecule and a macromolecular binding site.

It explores ways in which two molecules, such as a drug and an enzyme or protein receptor, might fit together.

Compounds which dock to each other well, like pieces of a three-dimensional jigsaw puzzle, have the potential to bind.

So, why is it important to be able to identify small molecules which may bind to a target macromolecule?

A compound which binds to a biological macromolecule may inhibit its function, and thus act as a drug.

This disables the ability of the (HIV) virus to attach itself to a molecule/protein!

With system-specific code changes, we have been able to compile it for Sun-Solaris, PC Linux, SGI IRIX, Compaq Alpha/OSF1

* Original Code: University of California, San Francisco: http://www.cmpharm.ucsf.edu/kuntz/

24

Dock input file

    score_ligand yes
    minimize_ligand yes
    multiple_ligands no
    random_seed 7
    anchor_search no
    torsion_drive yes
    clash_overlap 0.5
    conformation_cutoff_factor 3
    torsion_minimize yes
    match_receptor_sites no
    random_search yes
    . . . . . . . . . . . .
    maximum_cycles 1
    ligand_atom_file S_1.mol2
    receptor_site_file ece.sph
    score_grid_prefix ece
    vdw_definition_file parameter/vdw.defn
    chemical_definition_file parameter/chem.defn
    chemical_score_file parameter/chem_score.tbl
    flex_definition_file parameter/flex.defn
    flex_drive_file parameter/flex_drive.tbl
    ligand_contact_file dock_cnt.mol2
    ligand_chemical_file dock_chm.mol2
    ligand_energy_file dock_nrg.mol2

Molecule to be screened

25

Parameterized Dock input file

    score_ligand $score_ligand
    minimize_ligand $minimize_ligand
    multiple_ligands $multiple_ligands
    random_seed $random_seed
    anchor_search $anchor_search
    torsion_drive $torsion_drive
    clash_overlap $clash_overlap
    conformation_cutoff_factor $conformation_cutoff_factor
    torsion_minimize $torsion_minimize
    match_receptor_sites $match_receptor_sites
    random_search $random_search
    . . . . . . . . . . . .
    maximum_cycles $maximum_cycles
    ligand_atom_file ${ligand_number}.mol2
    receptor_site_file $HOME/dock_inputs/${receptor_site_file}
    score_grid_prefix $HOME/dock_inputs/${score_grid_prefix}
    vdw_definition_file vdw.defn
    chemical_definition_file chem.defn
    chemical_score_file chem_score.tbl
    flex_definition_file flex.defn
    flex_drive_file flex_drive.tbl
    ligand_contact_file dock_cnt.mol2
    ligand_chemical_file dock_chm.mol2
    ligand_energy_file dock_nrg.mol2

Molecule to be screened
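The `$name` and `${name}` placeholders in the parameterized input file happen to match the syntax of Python's `string.Template`, so the substitution step can be imitated in a few lines. This is only an analogy for what the plan file's `substitute` command does, not Nimrod's implementation. The excerpt of the template and the parameter values are taken from the slides.

```python
from string import Template

# A few lines of the parameterized DOCK input file from the slide.
DOCK_BASE = """\
score_ligand $score_ligand
random_seed $random_seed
ligand_atom_file ${ligand_number}.mol2
receptor_site_file $HOME/dock_inputs/${receptor_site_file}
"""


def substitute(template_text, params):
    """Fill in job parameters; unknown placeholders such as $HOME are left as-is."""
    return Template(template_text).safe_substitute(params)


dock_run = substitute(DOCK_BASE, {
    "score_ligand": "yes",
    "random_seed": 7,
    "ligand_number": 10,
    "receptor_site_file": "ece.sph",
})
print(dock_run)
```

`safe_substitute` is used rather than `substitute` so that `$HOME`, which is not a job parameter, survives untouched for later expansion on the Grid node.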

26

Dock PlanFile (contd.)

    parameter database_name label "database_name" text select oneof
        "aldrich" "maybridge" "maybridge_300" "asinex_egc" "asinex_epc"
        "asinex_pre" "available_chemicals_directory" "inter_bioscreen_s"
        "inter_bioscreen_n" "inter_bioscreen_n_300" "inter_bioscreen_n_500"
        "biomolecular_research_institute" "molecular_science"
        "molecular_diversity_preservation" "national_cancer_institute"
        "IGF_HITS" "aldrich_300" "molecular_science_500" "APP" "ECE"
        default "aldrich_300";
    parameter score_ligand text default "yes";
    parameter minimize_ligand text default "yes";
    parameter multiple_ligands text default "no";
    parameter random_seed integer default 7;
    parameter anchor_search text default "no";
    parameter torsion_drive text default "yes";
    parameter clash_overlap float default 0.5;
    parameter conformation_cutoff_factor integer default 5;
    parameter torsion_minimize text default "yes";
    parameter match_receptor_sites text default "no";
    parameter random_search text default "yes";
    . . . . . . . . . . . .
    parameter maximum_cycles integer default 1;
    parameter receptor_site_file text default "ece.sph";
    parameter score_grid_prefix text default "ece";
    parameter ligand_number integer range from 1 to 200 step 1;

Molecules to be screened
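In a parametric experiment, each combination of parameter values becomes one job. With every parameter fixed at its default except `ligand_number`, the plan above describes 200 jobs, one per molecule. The Python sketch below imitates that enumeration for a subset of the parameters; it is an illustration of the idea, not Nimrod's job generator.

```python
from itertools import product

# Parameter domains: single-valued defaults plus the one ranged parameter.
domains = {
    "database_name": ["aldrich_300"],     # default from the plan file
    "receptor_site_file": ["ece.sph"],    # default from the plan file
    "score_grid_prefix": ["ece"],         # default from the plan file
    "ligand_number": range(1, 201),       # range from 1 to 200 step 1
}


def enumerate_jobs(domains):
    """Yield one parameter dictionary per point in the cross product."""
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        yield dict(zip(names, values))


jobs = list(enumerate_jobs(domains))
print(len(jobs))  # 200
```

Giving a second parameter more than one value (for example, two databases) would double the job count, which is why such sweeps grow quickly and benefit from a Grid.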

27

Dock PlanFile

    task nodestart
        copy ./parameter/vdw.defn node:.
        copy ./parameter/chem.defn node:.
        copy ./parameter/chem_score.tbl node:.
        copy ./parameter/flex.defn node:.
        copy ./parameter/flex_drive.tbl node:.
        copy ./dock_inputs/get_molecule node:.
        copy ./dock_inputs/dock_base node:.
    endtask
    task main
        node:substitute dock_base dock_run
        node:substitute get_molecule get_molecule_fetch
        node:execute sh ./get_molecule_fetch
        node:execute $HOME/bin/dock.$OS -i dock_run -o dock_out
        copy node:dock_out ./results/dock_out.$jobname
        copy node:dock_cnt.mol2 ./results/dock_cnt.mol2.$jobname
        copy node:dock_chm.mol2 ./results/dock_chm.mol2.$jobname
        copy node:dock_nrg.mol2 ./results/dock_nrg.mol2.$jobname
    endtask

28

Nimrod/TurboLinux enFuzion GUI tools for Parameter Modeling

29

Docking Experiment Preparation

Setup PDB DataGrid:

    Index PDB databases.
    Pre-stage (all) Protein Data Bank (PDB) on replica sites.
    Start PDB Server.

Create Docking GridScore (receptor surface details) for a given receptor on home node.

Pre-staging large files required for docking:

    Pre-stage Dock executables and PDB access client on Grid nodes, if required (e.g., dock.Linux, dock.SunOS, dock.IRIX64, and dock.OSF1 on Linux, Sun, SGI, and Compaq machines respectively). Use globus-rcp.
    Pre-stage/cache all data files (~3-13MB each) representing receptor details on Grid nodes.

This can be done on demand by Nimrod/G for each job, but a few input files are too large and they are required for all jobs. So, pre-staging/caching at http-cache or broker level is necessary to avoid the overhead of copying the same input files again and again!
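The saving from pre-staging can be estimated with a small simulation. The sketch below is a hypothetical helper, not part of Nimrod/G: it records which files already sit on which node and only counts a transfer the first time. The node names, the file name `receptor_grid`, and the choice of 13MB (the upper end of the range quoted above) are assumptions made for illustration.

```python
class StagingCache:
    """Remembers which (node, file) pairs were already copied (hypothetical helper)."""

    def __init__(self):
        self._staged = set()
        self.bytes_copied = 0

    def stage(self, node, filename, size_bytes):
        """Copy a file to a node only the first time it is requested there."""
        key = (node, filename)
        if key in self._staged:
            return False                 # cache hit: nothing to transfer
        self._staged.add(key)
        self.bytes_copied += size_bytes  # a real broker would transfer the file here
        return True


RECEPTOR_FILE_BYTES = 13 * 10**6              # upper end of the ~3-13MB range
nodes = ["node0", "node1", "node2", "node3"]  # made-up node names

cache = StagingCache()
for job in range(200):  # 200 jobs, one per ligand_number in the plan file
    cache.stage(nodes[job % len(nodes)], "receptor_grid", RECEPTOR_FILE_BYTES)

uncached_bytes = 200 * RECEPTOR_FILE_BYTES    # cost of copying the file for every job
print(cache.bytes_copied, uncached_bytes)
```

Under these assumptions the file is transferred four times (once per node) instead of 200 times, which is the overhead the slide warns about.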

30

Protein Data Bank

Databases consist of small molecules from commercially available organic synthesis libraries, and natural product databases.

There is also the ability to screen virtual combinatorial databases, in their entirety.

This methodology allows only the required compounds to be subjected to physical screening and/or synthesis, reducing both time and expense.

31

Target Testcase

The target for the test case: endothelin converting enzyme (ECE). This is involved in “heart stroke” and other transient ischemia.

Is·che·mi·a : A decrease in the blood supply to a bodily organ, tissue, or part caused by constriction or obstruction of the blood vessels.

32

Resource Brokering Architecture for Molecular Screening on World Wide Grid

[Diagram. Components: Nimrod/G Computational Grid Broker; Data Replica Catalogue; PDB Broker (Algorithm1 . . . AlgorithmN); Grid Info. Service; Grid Service Providers GSP1, GSP2, GSP3, GSP4, GSPm, GSPn; PDB Service instances, one serving PDB2.]

Request: “Screen 2K molecules in 30min. for $10”

Interactions labelled in the diagram (numbered 1 to 7 on the slide):

    “Screen mol.5 please?”
    “advise PDB source?”
    “selection & advise: use GSP4!”
    “Is GSP4 healthy?”
    “mol.5 please?”
    “PDB replicas please?”
    “process & send results”
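The replica-selection exchange in the brokering diagram (“PDB replicas please?”, “Is GSP4 healthy?”, “use GSP4!”) can be sketched as a lookup followed by a health filter and a ranking. The slide only names the selection step as Algorithm1 . . . AlgorithmN, so the cheapest-healthy-provider rule below is an assumption, as are the catalogue contents, health states, and prices.

```python
def select_replica(database, replica_catalogue, is_healthy, cost):
    """Pick the cheapest healthy provider that holds a replica of `database`."""
    candidates = [p for p in replica_catalogue.get(database, []) if is_healthy(p)]
    if not candidates:
        raise LookupError(f"no healthy replica of {database!r}")
    return min(candidates, key=cost)


# Made-up state standing in for the Data Replica Catalogue and Grid Info. Service.
catalogue = {"aldrich_300": ["GSP2", "GSP4"]}
health = {"GSP2": False, "GSP4": True}
price = {"GSP2": 1.0, "GSP4": 2.0}

chosen = select_replica("aldrich_300", catalogue, health.get, price.get)
print(chosen)  # GSP4
```

GSP2 is cheaper here but reported unhealthy, so the sketch advises GSP4, mirroring the “use GSP4!” reply on the slide. Passing the health check and cost as functions keeps the selection rule swappable.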

33

Nimrod/G in Action: Screening on World-Wide Grid

34

?

35

Watson-II again sees the Grid researcher on the beach and asks him a favor!

“Can I borrow your Grid identity for 2 days?”

The nice Grid researcher trusts Watson-II & gives him “his Grid identity”, including access to his World Wide Grid testbed!

Grid Trust on the Beach!

36

Day 3 @ Robinson Village

38

?

39

Watson Gets an Idea while surfing

40

Goes to Internet Room & connects to the Grid researcher's machine

41

Connects to his U.Lecce lab machine and copies all the protein samples he prepared before taking his holiday

42

Copies the Grid researcher's test experiment & modifies it to use his lab experiment data.

43

Starts Parameter Exploration

44

Starts Molecular Experimentation

[Diagram. Components: Nimrod/G Computational Grid Broker; Data Replica Catalogue; PDB Broker (Algorithm1 . . . AlgorithmN); Grid Info. Service; Grid Service Providers GSP1, GSP2, GSP3, GSP4, GSPm, GSPn; PDB Service instances, one serving PDB2.]

Request: “Screen 50K molecules in 120min. for $200”

Interactions labelled in the diagram (numbered 1 to 6 on the slide):

    “Screen mol.5 please?”
    “advise PDB source?”
    “use GSP4!”
    “Is GSP4 healthy?”
    “mol.5 please?”
    “PDB replicas please?”

45

Nimrod/G in Action: Screening on World-Wide Grid

46

?

47

Comes back to Internet room after 2 hours and asks his assistant to test results

48

Watson-II's assistant conducts tests in the afternoon.

Sends email to Watson-II in the evening: “looks like our client is improving…”

Day 4 @ Robinson Village

50

Watson-II does some more exploration: this time with one million molecules. Asks Nimrod to email results to his assistant for testing...

51

Starts Parameter Exploration

52

?

53

After Lunch

Watson-II reads the email he received from his assistant and is pleased with the results of his experiment.

Sends email to the VC of his university asking for a press release on his breakthrough discovery.

54

Did Watson-II invent a cure for AIDS?

Yes

Of course.

?

Vice Chancellor calls for Press Meeting

The news spreads like the “I love you” virus around the world, including Sweden and Norway!

Day 5 @ Robinson Village

57

?

58

?

59

Sweden Announces Nobel Award for a Scientist on Holiday @ Robinson Village!

60

“Watson-II the Great” at evening @ Robinson Village

61

Watson-II shares the success with Grid researcher!!!

Day 6 and Beyond!

63

Watson-II & Family return to their home happily.

64

Watson-II having moonlight dinner with his Wife @ home

65

Prof. Watson-II works up to 5pm in Lab @ University of Lecce

Goes to Work @ 9am. Returns home by 5.30PM!

66

Watson-II had moonlight dinner with his Wife @ home all his life!

67

Do you want to repeat Watson-II’s success in High Energy Physics ?

?

68

If so, download Software & Explore it in 2006 when LHC expt. starts

Nimrod & Parametric Computing: http://www.csse.monash.edu.au/~davida/nimrod/

Economy Grid & Nimrod/G: http://www.buyya.com/ecogrid/

Virtual Laboratory/DesignDrug@Home: http://www.buyya.com/dd@home/

Grid Simulation (Java based): http://www.buyya.com/gridsim/

World Wide Grid testbed: http://www.buyya.com/ecogrid/wwg/

Looking for new volunteers to grow

Please contact me to barter your & our machines!

Want to build on our work/collaborate: Talk to me now or email: rajkumar@csse.monash.edu.au

69

Thank You… Any ??


70

Further Information

Books:

    High Performance Cluster Computing, V1, V2, R. Buyya (Ed), Prentice Hall, 1999.
    The GRID, I. Foster and C. Kesselman (Eds), Morgan-Kaufmann, 1999.

IEEE Task Force on Cluster Computing: http://www.ieeetfcc.org

Global Grid Forum: www.gridforum.org

IEEE/ACM CCGrid’xy: www.ccgrid.org

CCGrid 2002, Berlin: ccgrid2002.zib.de

Grid workshop - www.gridcomputing.org

71

Further Information

Cluster Computing Info Centre: http://www.buyya.com/cluster/

Grid Computing Info Centre: http://www.gridcomputing.com

IEEE DS Online - Grid Computing area: http://computer.org/dsonline/gc

Compute Power Market Project http://www.ComputePower.com