Date post: 26-Mar-2015
Nimrod-G and Virtual Lab Tools for Data Intensive Computing on Grid: Drug Design Case Study Rajkumar Buyya Melbourne, Australia http://www.buyya.com/ecogrid

Nimrod-G and Virtual Lab Tools for Data Intensive Computing on Grid: Drug Design Case Study

Rajkumar Buyya
Melbourne, Australia
http://www.buyya.com/ecogrid


Contents

Introduction
Resource Management challenges
Nimrod-G Toolkit
SPMD/Parameter-Study Creation Tools
Grid enabling Drug Design Application
Nimrod-G Grid Resource Broker
Scheduling Experiments on World Wide Grid
Conclusions

[Figure labels: Scheduling, Economics, Grid, Economy Grid]


A typical Grid environment and Players

[Figure: Grid environment and its players; labels include Resource Broker and Application]


Grid Characteristics

Heterogeneous
    Resource Types: PC, WS, Clusters
    Resource Architecture: CPU Arch, OS
    Applications: CPU/IO/message intensive
    Users and Owners Requirements
    Access Price: different for different users, resources and time.
    Availability: varies from time to time.

Distributed
    Resources
    Ownership
    Users

Each have their own (private) policies and objectives.

This closely resembles the heterogeneity and decentralization present in “human economies” (democratic and capitalist world).

Hence, we propose the use of “economics” as a metaphor for resource management and scheduling. It regulates supply and demand for resources and offers resource owners an incentive to contribute resources to the Grid.


Grid Tools for Handling

Security

Resource Allocation & Scheduling

Data locality

System Management

Resource Discovery

Uniform Access

Computational Economy

Application Development

Network Management


Nimrod-G: Grid Resource Broker

A resource broker for managing, steering, and executing task farming (parametric sweep/SPMD model) applications on the Grid based on deadline and computational economy.

Based on users’ QoS requirements, our Broker dynamically leases services at runtime depending on their quality, cost, and availability.

Key Features
    A single window to manage & control experiment
    Persistent and Programmable Task Farming Engine
    Resource Discovery
    Resource Trading
    Scheduling & Predictions
    Generic Dispatcher & Grid Agents
    Transportation of data & results
    Steering & data management
    Accounting


Parametric Processing

Multiple Runs
Same Program
Multiple Data
Killer Application for the Grid!

Parameters

Age         Hair
23          Clean
23          Beard
28          Goatee
28          Clean
19          Moustache
10          Clean
-4000000    Too much

[Figure: Magic Engine for Manufacturing Humans!]

Courtesy: Anand Natrajan, University of Virginia


Sample P-Sweep Applications

Bioinformatics: Drug Design / Protein Modelling
Sensitivity experiments on smog formation
Combinatorial Optimization: Meta-heuristic parameter estimation
Ecological Modelling: Control Strategies for Cattle Tick
Electronic CAD: Field Programmable Gate Arrays
Computer Graphics: Ray Tracing
High Energy Physics: Searching for Rare Events
Finance: Investment Risk Analysis
VLSI Design: SPICE Simulations
Aerospace: Wing Design
Network Simulation
Automobile: Crash Simulation
Data Mining
Civil Engineering: Building Design
Astrophysics


Virtual Drug Design: Data Intensive Computing on Grid

A Virtual Laboratory for “Molecular Modelling for Drug Design” on Peer-to-Peer Grid.

It provides tools for examining millions of chemical compounds (molecules) in the Protein Data Bank (PDB) to identify those having potential use in drug design.

In collaboration with: Kim Branson, Structural Biology, Walter and Eliza Hall Institute (WEHI)

http://www.csse.monash.edu.au/~rajkumar/vlab


Dock input file

score_ligand yes
minimize_ligand yes
multiple_ligands no
random_seed 7
anchor_search no
torsion_drive yes
clash_overlap 0.5
conformation_cutoff_factor 3
torsion_minimize yes
match_receptor_sites no
random_search yes
. . . . . . . . . . . .
maximum_cycles 1
ligand_atom_file S_1.mol2
receptor_site_file ece.sph
score_grid_prefix ece
vdw_definition_file parameter/vdw.defn
chemical_definition_file parameter/chem.defn
chemical_score_file parameter/chem_score.tbl
flex_definition_file parameter/flex.defn
flex_drive_file parameter/flex_drive.tbl
ligand_contact_file dock_cnt.mol2
ligand_chemical_file dock_chm.mol2
ligand_energy_file dock_nrg.mol2

(Callout: Molecule to be screened)


Parameterize Dock input file (use Nimrod Tools: GUI/language)

score_ligand $score_ligand
minimize_ligand $minimize_ligand
multiple_ligands $multiple_ligands
random_seed $random_seed
anchor_search $anchor_search
torsion_drive $torsion_drive
clash_overlap $clash_overlap
conformation_cutoff_factor $conformation_cutoff_factor
torsion_minimize $torsion_minimize
match_receptor_sites $match_receptor_sites
random_search $random_search
. . . . . . . . . . . .
maximum_cycles $maximum_cycles
ligand_atom_file ${ligand_number}.mol2
receptor_site_file $HOME/dock_inputs/${receptor_site_file}
score_grid_prefix $HOME/dock_inputs/${score_grid_prefix}
vdw_definition_file vdw.defn
chemical_definition_file chem.defn
chemical_score_file chem_score.tbl
flex_definition_file flex.defn
flex_drive_file flex_drive.tbl
ligand_contact_file dock_cnt.mol2
ligand_chemical_file dock_chm.mol2
ligand_energy_file dock_nrg.mol2

(Callout: Molecule to be screened)
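The substitution step can be pictured with a short Python sketch. It is only an illustration, assuming the placeholders behave like those of Python's string.Template ($name and ${name}); Nimrod's own substitution tool is not shown in the slides. The helper name make_dock_run is hypothetical, and the values are taken from the sample Dock input file.

```python
from string import Template

# A few lines of the parameterized Dock input file from the slide.
DOCK_BASE = """\
score_ligand $score_ligand
random_seed $random_seed
ligand_atom_file ${ligand_number}.mol2
receptor_site_file $HOME/dock_inputs/${receptor_site_file}
"""


def make_dock_run(values):
    """Fill the placeholders for one job.

    safe_substitute leaves names that are not in `values` (here $HOME)
    untouched, so they can still be resolved later on the remote node.
    """
    return Template(DOCK_BASE).safe_substitute(values)


# Hypothetical values for a single job of the sweep.
job_values = {
    "score_ligand": "yes",
    "random_seed": 7,
    "ligand_number": 1,
    "receptor_site_file": "ece.sph",
}

dock_run = make_dock_run(job_values)
print(dock_run)
```

Each job of the sweep would call make_dock_run with its own ligand_number, producing one concrete input file per molecule.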


Create Dock Plan File: 1. Define variables and their values

parameter database_name label "database_name" text select oneof
    "aldrich" "maybridge" "maybridge_300" "asinex_egc" "asinex_epc" "asinex_pre"
    "available_chemicals_directory" "inter_bioscreen_s" "inter_bioscreen_n"
    "inter_bioscreen_n_300" "inter_bioscreen_n_500" "biomolecular_research_institute"
    "molecular_science" "molecular_diversity_preservation" "national_cancer_institute"
    "IGF_HITS" "aldrich_300" "molecular_science_500" "APP" "ECE"
    default "aldrich_300";
parameter score_ligand text default "yes";
parameter minimize_ligand text default "yes";
parameter multiple_ligands text default "no";
parameter random_seed integer default 7;
parameter anchor_search text default "no";
parameter torsion_drive text default "yes";
parameter clash_overlap float default 0.5;
parameter conformation_cutoff_factor integer default 5;
parameter torsion_minimize text default "yes";
parameter match_receptor_sites text default "no";
parameter random_search text default "yes";
. . . . . . . . . . . .
parameter maximum_cycles integer default 1;
parameter receptor_site_file text default "ece.sph";
parameter score_grid_prefix text default "ece";
parameter ligand_number integer range from 1 to 2000 step 1;

(Callout: Molecules to be screened)
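How the plan's parameters turn into jobs can be sketched in Python. This is an illustration only, not Nimrod's implementation: the expand_jobs helper and the dictionary layout are hypothetical, only a few of the plan's parameters are reproduced, and the range is assumed to be inclusive at both ends. Parameters with a single default stay fixed, while ligand_number is swept, so under that assumption the plan yields one job per ligand.

```python
from itertools import product

# Fixed parameters (a subset of the defaults in the plan file).
FIXED = {
    "database_name": "aldrich_300",
    "random_seed": 7,
    "clash_overlap": 0.5,
    "receptor_site_file": "ece.sph",
}

# Swept parameters: name -> every value it takes.
# "range from 1 to 2000 step 1" is assumed to include both 1 and 2000.
SWEPT = {"ligand_number": range(1, 2001, 1)}


def expand_jobs(fixed, swept):
    """Yield one parameter dictionary per point in the cross product of swept values."""
    names = list(swept)
    for combo in product(*(swept[name] for name in names)):
        job = dict(fixed)              # start from the fixed defaults
        job.update(zip(names, combo))  # add this job's swept values
        yield job


jobs = list(expand_jobs(FIXED, SWEPT))
print(len(jobs), jobs[0]["ligand_number"], jobs[-1]["ligand_number"])
```

Adding a second swept parameter to SWEPT would multiply the job count, which is why parameter sweeps grow quickly and suit a Grid.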


Create Dock Plan File: 2. Define the task that jobs need to do

task nodestart
    copy ./parameter/vdw.defn node:.
    copy ./parameter/chem.defn node:.
    copy ./parameter/chem_score.tbl node:.
    copy ./parameter/flex.defn node:.
    copy ./parameter/flex_drive.tbl node:.
    copy ./dock_inputs/get_molecule node:.
    copy ./dock_inputs/dock_base node:.
endtask
task main
    node:substitute dock_base dock_run
    node:substitute get_molecule get_molecule_fetch
    node:execute sh ./get_molecule_fetch
    node:execute $HOME/bin/dock.$OS -i dock_run -o dock_out
    copy node:dock_out ./results/dock_out.$jobname
    copy node:dock_cnt.mol2 ./results/dock_cnt.mol2.$jobname
    copy node:dock_chm.mol2 ./results/dock_chm.mol2.$jobname
    copy node:dock_nrg.mol2 ./results/dock_nrg.mol2.$jobname
endtask


Use Nimrod-G

Submit & Play!


A Nimrod/G Monitor

[Figure: Nimrod/G monitor screenshot over a map of Virginia, marking Legion hosts and Globus hosts, with Cost and Deadline controls]

Bezek is in both Globus and Legion Domains


Adaptive Scheduling Algorithms

[Figure: scheduling loop with the steps Discover Resources, Establish Rates, Compose & Schedule, Distribute Jobs, Evaluate & Reschedule, Meet requirements? (Remaining Jobs, Deadline, & Budget?), Discover More Resources]

Adaptive Scheduling Algorithms    Execution Time (not beyond deadline)    Execution Cost (not beyond budget)
Time Minimisation                 Minimise                                Limited by budget
Cost Minimisation                 Limited by deadline                     Minimise
None Minimisation                 Limited by deadline                     Limited by budget
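The difference between the time-minimisation and cost-minimisation rows can be illustrated with a small Python sketch. It is a deliberately simplified, static model and not Nimrod-G's adaptive algorithm, which re-evaluates and reschedules during execution. The Resource class, the schedule function, and all prices and node counts below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    price: int  # G$ per CPU-second (hypothetical value)
    nodes: int  # jobs it can run in parallel (hypothetical value)


def cheapest_allocation(resources, n_jobs, job_min, horizon_min):
    """Fill the cheapest resources first, using only jobs that finish within horizon_min."""
    plan, cost, left = {}, 0, n_jobs
    for res in sorted(resources, key=lambda r: r.price):
        capacity = res.nodes * (horizon_min // job_min)
        take = min(left, capacity)
        if take:
            plan[res.name] = take
            cost += take * job_min * 60 * res.price
            left -= take
    return (plan, cost) if left == 0 else None


def schedule(resources, n_jobs, job_min, deadline_min, budget, strategy):
    """Return (plan, cost, finish_min), or None if deadline and budget cannot both be met."""
    # Cost minimisation may use the whole deadline; time minimisation tries the
    # shortest horizon first and stops at the first one the budget allows.
    horizons = [deadline_min] if strategy == "cost" else range(1, deadline_min + 1)
    for horizon in horizons:
        result = cheapest_allocation(resources, n_jobs, job_min, horizon)
        if result is not None and result[1] <= budget:
            return result[0], result[1], horizon
    return None


RESOURCES = [Resource("cheap", 2, 4), Resource("mid", 4, 4), Resource("fast", 8, 8)]

cost_run = schedule(RESOURCES, 48, 5, 60, 100_000, "cost")
time_run = schedule(RESOURCES, 48, 5, 60, 100_000, "time")
print("cost-optimised:", cost_run)
print("time-optimised:", time_run)
```

In this toy setup the cost-optimised run places every job on the cheapest resource and uses the full deadline, while the time-optimised run spreads jobs over all resources, finishing sooner at a higher cost that still stays within the budget.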


Scheduling Experiment on World Wide Grid Testbed

[Figure: WW Grid map of testbed sites]

EUROPE: ZIB/Germany, PC2/Germany, AEI/Germany, Lecce/Italy, CNR/Italy, Calabria/Italy, Poznan/Poland, Lund/Sweden, CERN/Swiss
ANL/Chicago, USC-ISI/LA, UTK/Tennessee, UVa/Virginia, Dartmouth/NH, BU/Boston
Monash/Melbourne, VPAC/Melbourne
Santiago/Chile
TI-Tech/Tokyo, ETL/Tsukuba, AIST/Tsukuba
Cardiff/UK, Portsmouth/UK
Kasetsart/Bangkok


Deadline and Budget Constrained Scheduling Experiment

Workload: 165 jobs, each needing 5 minutes of CPU time
Deadline: 2 hrs. and budget: 396000 units
Strategy: minimise time / cost
Execution cost:
    Optimise Cost: 115200 (G$) (finished in 2 hrs.)
    Optimise Time: 237000 (G$) (finished in 1.25 hr.)
In this experiment, the time-optimised scheduling run costs about double that of the cost-optimised run.
Users can now trade off time against cost.


World Wide Grid (WWG)

[Figure: WW Grid testbed, with regional resources connected over the Internet]

Australia (Globus + Legion, GRACE_TS)
    Monash Uni.: Linux cluster, Solaris WS
    Nimrod/G

Europe (Globus + GRACE_TS)
    ZIB/FUB: T3E/Mosix
    Cardiff: Sun E6500
    Paderborn: HPCLine
    Lecce: Compaq SC
    CNR: Cluster
    Calabria: Cluster
    CERN: Cluster
    Poznan: SGI/SP2

Asia/Japan (Globus + GRACE_TS)
    Tokyo I-Tech.
    ETL, Tsukuba: Linux cluster

North America (Globus/Legion, GRACE_TS)
    ANL: SGI/Sun/SP2
    USC-ISI: SGI
    UVa: Linux Cluster
    UD: Linux cluster
    UTK: Linux cluster

South America (Globus + GRACE_TS)
    Chile: Cluster


Resources Selected & Price/CPU-sec.

Resource & Location                           Grid services & Fabric    Cost/CPU sec. or unit    No. of Jobs Executed
                                                                                                 Time_Opt    Cost_Opt
Linux Cluster-Monash, Melbourne, Australia    Globus, GTS, Condor       2                        64          153
Linux-Prosecco-CNR, Pisa, Italy               Globus, GTS, Fork         3                        7           1
Linux-Barbera-CNR, Pisa, Italy                Globus, GTS, Fork         4                        6           1
Solaris/Ultas2 TITech, Tokyo, Japan           Globus, GTS, Fork         3                        9           1
SGI-ISI, LA, US                               Globus, GTS, Fork         8                        37          5
Sun-ANL, Chicago, US                          Globus, GTS, Fork         7                        42          4

Total Experiment Cost (G$)                                                                       237000      115200
Time to Complete Exp. (Min.)                                                                     70          119
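The totals in the table can be cross-checked with a few lines of Python. The sketch assumes every job consumes exactly 5 minutes (300 seconds) of CPU time on every resource, as the workload description states; the variable and function names are hypothetical.

```python
JOB_CPU_SEC = 5 * 60  # each job needs 5 minutes of CPU time

# (resource, price in G$ per CPU-sec, jobs under Time_Opt, jobs under Cost_Opt)
TABLE = [
    ("Linux Cluster-Monash", 2, 64, 153),
    ("Linux-Prosecco-CNR", 3, 7, 1),
    ("Linux-Barbera-CNR", 4, 6, 1),
    ("Solaris/Ultas2-TITech", 3, 9, 1),
    ("SGI-ISI", 8, 37, 5),
    ("Sun-ANL", 7, 42, 4),
]


def total_jobs(column):
    """Number of jobs executed under the strategy stored in `column`."""
    return sum(row[column] for row in TABLE)


def total_cost(column):
    """Experiment cost in G$: price x jobs x CPU seconds per job, summed over resources."""
    return sum(row[1] * row[column] * JOB_CPU_SEC for row in TABLE)


time_opt_cost = total_cost(2)
cost_opt_cost = total_cost(3)
print(total_jobs(2), total_jobs(3))
print(time_opt_cost, cost_opt_cost)
```

Both strategies account for all 165 jobs, and the computed totals match the 237000 and 115200 G$ reported in the table. Under the same assumption, the 396000-unit budget equals the cost of running all 165 jobs at the highest listed price of 8 G$ per CPU-second, although the slides do not say that this is how the budget was chosen.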


DBC Scheduling for Time Optimization

[Figure: number of tasks in execution (axis 0 to 12) versus time (in minutes), one series per resource: Condor-Monash, Linux-Prosecco-CNR, Linux-Barbera-CNR, Solaris/Ultas2-TITech, SGI-ISI, Sun-ANL]


DBC Scheduling for Cost Optimization

[Figure: number of tasks in execution (axis 0 to 14) versus time (in minutes), one series per resource: Condor-Monash, Linux-Prosecco-CNR, Linux-Barbera-CNR, Solaris/Ultas2-TITech, SGI-ISI, Sun-ANL]


Conclusions

P2P and Grid computing are emerging as a next-generation computing platform for solving large-scale problems through the sharing of geographically distributed resources.

Resource management is a complex undertaking as systems need to be adaptive, scalable, competitive,…, and driven by QoS.

We proposed a framework based on “computational economies” and discussed several economic models for resource allocation and for regulating supply-and-demand for resources.

Scheduling experiments on the World Wide Grid demonstrate the ability of our Nimrod-G broker to dynamically lease or rent services at runtime based on their quality, cost, and availability, depending on consumers' QoS requirements.

Easy-to-use tools for composing applications to run on the Grid are essential to attracting the application community and getting it on board.

An economics paradigm for QoS-driven resource management is essential to push P2P/Grids into mainstream computing!


Download Software & Information

Nimrod & Parametric Computing: http://www.csse.monash.edu.au/~davida/nimrod/

Economy Grid & Nimrod/G: http://www.buyya.com/ecogrid/

Virtual Laboratory/Virtual Drug Design: http://www.buyya.com/vlab/

Grid Simulation (GridSim) Toolkit (Java based): http://www.buyya.com/gridsim/

World Wide Grid (WWG) testbed: http://www.buyya.com/ecogrid/wwg/

Looking for new volunteers to grow the testbed. Please contact me to barter your & our machines!

Want to build on our work/collaborate: Talk to me now or email: [email protected]

