A Survey of Programming Frameworks for Dynamic Grid Workflow
Applications
November 2nd, 2007
Taura Lab
Ken Hironaka
Background
• Attempts to analyze databases of enormous size
  – Genetic sequence databases
    • BLAST (Basic Local Alignment Search Tool) library
  – MEDLINE journal abstract database
    • Enju (a syntactic parser for English)
• Improvements in algorithms alone are not enough to handle the overwhelming amount of data
  – Need to be able to parallelize computation
Basic Demands
• Express the workload with ease
  – No need to wrestle with complex configuration files
• Parallel computation
  – No distributed computing experts required!
Well known frameworks
• Batch Schedulers
  – Solution for cluster computers
  – Submit each task as a “Job” in input file(s)
  – The job is scheduled to an idle node
  – Good for embarrassingly parallel tasks
    • Tasks with no inter-task dependencies
  – Data sharing by NFS
    • Easy data collection

[Figure: jobs are submitted to a central manager, which assigns them to idle nodes in the cluster while other nodes are busy]
Arising Problems
• Handling Workflows
• Coping with Grid (multi-cluster) environments
• Creation of tasks/aggregation of results
Handling Workflows
• Most tasks are not embarrassingly parallel
  – Blindly scheduling jobs is not good enough
• Workflows: dependencies between tasks
  – Passing output files as input files between tasks
• E.g.: Natural Language Processing
[Figure: NLP pipeline of dependent tasks, each passing a file to the next: Phonological Analysis → Morphological Analysis → Syntactic Analysis → Semantic Analysis]
Coping with Grid environments
• Multiple clusters
  – A single huge cluster is rare
• Connectivity in WANs
  – Firewalls, NATs
• File sharing problems
  – Independent file systems
• Dynamics
  – Nodes joining and leaving (failure)

[Figure: nodes join and leave across clusters separated by a firewall]
Task creation/data collection
• “Task” in the conventional sense
  – Simply compute on given data
• Manual task creation
  – Splitting the problem into sub-problems
  – Tedious manual work for large inputs/databases
• Manual data collection
  – Collecting results afterwards
  – What if they are dispersed all over the Grid?
• Not so trivial in modern settings
  – Such tasks need to be built into the framework
Detailed summary of the demands
• Modern parallelization frameworks
  ⇒ Frameworks that facilitate workflow applications on dynamic Grid environments
  – Handle workflows with grace
  – Cope with WAN connectivity problems
  – Handle data transfers
  – Cope with dynamic changes in resources
  – Automatically create tasks / collect results
  – EASY TO USE!
The Organization of this Presentation
• Background/Introduction
• Existing Programming Frameworks
• Conclusion/Future Work
Condor
• One of many available batch schedulers
  – Maintains a pool of idle nodes in a cluster
• Goes beyond a regular scheduler
  – Allows workflow expression for submitted tasks
  – File transfer extension to handle data-driven jobs
• Grid-enabled (Condor-G)
  – Uses the Globus toolkit to run on multiple clusters
    • Allows jobs to be scheduled on different pools on different clusters
DAGMan: Expressing Workflows
• Extension to Condor
  – Executes a set of tasks with DAG dependencies
• DAG (Directed Acyclic Graph)
  – An expression of workflows
  – A→B : A must finish before B starts
  – E.g.: A→C, B→C, C→D, C→E
  – Can express general workflows
• Fault tolerance
  – In case of failure, restarts from the job that failed
    • E.g., if Task C fails, Tasks A and B are not redone
[Figure: DAG with nodes A, B, C, D, E and edges A→C, B→C, C→D, C→E]
DAGMan: Howto
• Create a script defining jobs and dependencies
• Submit the DAG
  – It will be automatically translated into Condor jobs and scheduled accordingly by Condor
### sample.dag ###
# define jobs
Job A A.sh
Job B B.py
Job C C.pl
Job D D.sh
Job E E.sh
# define dependencies
PARENT A B CHILD C
PARENT C CHILD D E

$ condor_submit_dag sample.dag
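The dependency declarations in sample.dag form a DAG that any scheduler must respect. As a rough illustration (this is not DAGMan itself, just a sketch of the ordering constraint using Python's standard library), the same dependencies can be topologically sorted:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Dependencies from sample.dag: PARENT A B CHILD C; PARENT C CHILD D E
# Mapping: job -> set of jobs that must finish before it starts
deps = {"C": {"A", "B"}, "D": {"C"}, "E": {"C"}}

order = list(TopologicalSorter(deps).static_order())
# A and B appear (in either order) before C; C appears before D and E
print(order)
```

DAGMan goes further than a static order: independent jobs (like A and B, or D and E) may run concurrently.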
Stork
• Data placement scheduler for Condor
• File transfer across file systems
  – ftp, scp, http, etc.
• A transfer is treated as a DAGMan job
  – Allows jobs to pass files without a shared FS
• Inter-task data passing is not possible
  – Must use a third-party server to pass data
### trans.stork ###
[
  dest_url = "file:/tmp/hoge.tar.gz";
  src_url  = "ftp://www.foo.com/hog.tar.gz";
  dap_type = transfer;
]

### sample2.dag ###
DATA INPUT0 trans.stork
JOB A A.sh
PARENT INPUT0 CHILD A
Review of Condor
• Strong in classical batch-queuing-system topics
• Pros
  – Handles workflows and fault tolerance
  – Possible to deploy on multiple clusters
• Cons
  – Condor and its extensions must be installed by the system administrator on all nodes of each cluster
    • Big initial overhead
    • Cannot add more nodes dynamically
  – Limited file transfer options
    • Inter-task data passing is not possible
  – Task creation and result collection done manually
Ibis (Satin)
• Java-based parallel computation library
• Distributed object oriented
  – Transparent location of distributed objects
  – Offers RMI (Remote Method Invocation)
    • Transparent delegation of computation
• Divide-and-conquer type applications
  – Satin

[Figure: foo.doJob(args) invoked via RMI; the computation runs on the node holding object foo]
Divide-and-Conquer
• One large problem may be recursively split into numerous smaller sub-problems
  – E.g., Quick Sort, Fibonacci
• SPAWN
  – Creates sub-problems (“children”)
• SYNC
  – Called by the parent
  – Waits for sub-problem results
• Can express DAG workflows

[Figure: Fib(20) = Fib(19) + Fib(18), a parent-child relationship]
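The SPAWN/SYNC pattern can be sketched in plain Python. The `spawn`/`sync` helpers below are hypothetical local stand-ins, not the Satin API; in Satin, spawned sub-problems may be executed on other nodes, while here they are simply evaluated in-process:

```python
# Hypothetical local stand-ins for the SPAWN and SYNC primitives.
class Future:
    def __init__(self, fn, *args):
        self.fn, self.args = fn, args
    def run(self):
        self.value = self.fn(*self.args)

_pending = []  # sub-problems waiting to be executed (or stolen)

def spawn(fn, *args):
    f = Future(fn, *args)
    _pending.append(f)
    return f

def sync():
    # In Satin this blocks until spawned sub-results arrive;
    # here we simply evaluate every pending sub-problem locally.
    while _pending:
        _pending.pop().run()

def fib(n):
    if n < 2:
        return n
    x = spawn(fib, n - 1)  # SPAWN: create child sub-problems
    y = spawn(fib, n - 2)
    sync()                 # SYNC: wait for their results
    return x.value + y.value

print(fib(10))  # -> 55
```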
Divide-and-Conquer: HowTo
• Import the library
• Define a user class extending the SatinObject library class
• Define computation methods
  – Use recursion
    • Allows creation of sub-problems (implicit spawn)
  – sync()
    • Implicit definition of dependencies
### fib.java ###
import ibis.satin.SatinObject;

class Fib extends SatinObject implements … {
    public int fib(int n) {
        if (n < 2) return n;
        int x = fib(n - 1);  // implicit spawn
        int y = fib(n - 2);  // implicit spawn
        sync();              // wait for results
        return x + y;
    }
}
Random Work Stealing
• Strategy to load-balance among participating nodes
  – An idle node steals an unfinished sub-problem from a random node
  – The result is returned to the victim node
• Adapts to joining nodes
  – New nodes automatically acquire tasks

[Figure: Node 1 and Node 2 steal sub-problems from Node 0]
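Random work stealing can be illustrated with a toy single-process simulation (node IDs and task counts are made up; real Satin steals over the network and returns results to the victim):

```python
import random
from collections import deque

random.seed(1)  # deterministic toy run

# Node 0 starts with all the work; nodes 1 and 2 are idle.
queues = {0: deque(range(10)), 1: deque(), 2: deque()}
completed = {0: 0, 1: 0, 2: 0}

while any(queues.values()):
    for node, q in queues.items():
        if q:
            q.popleft()            # run one of our own tasks
            completed[node] += 1
        else:
            # Idle: pick a random victim that still has work and
            # steal one unfinished task from the back of its queue.
            victims = [v for v in queues if v != node and queues[v]]
            if victims:
                q.append(queues[random.choice(victims)].pop())

print(completed)  # all 10 tasks eventually finish
```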
Dealing with Failures
• When a node fails, its sub-problems need to be restarted
• Orphan tasks
  – Sub-problems that lose the parent by which their results are used
  – Orphan task results are cached and circulated among nodes

[Figure: Node 2 fails; the results of its orphaned sub-problems are cached and circulated among Node 0 and Node 1]
Review of Ibis (Satin)
• Pros
  – Benefits from targeting divide-and-conquer applications
    • Able to handle workflows using spawn and sync
    • Automatically creates tasks / collects results
  – Handles dynamic joining/leaving of nodes
    • Random work stealing
    • Recycling of orphan sub-problem results
• Cons
  – Currently only supports direct communication among nodes (does not work across firewalls or NATs)
  – Targeted at CPU-intensive applications
    • No primitives for file transfer over the network
Map-Reduce
• Framework for processing large homogeneous data on clusters
  – Handling large databases at Google
• The user defines 2 functions
  – Map, Reduce

[Figure: input data flows through Map() tasks into Reduce() tasks, producing one output per reducer]
Map-Reduce: Howto
• Abstraction
  – Data file → set of key/value pairs
• Map
  – (k1, v1) → (k2, v2)
  – Values with the same key are combined
• Reduce
  – (k2, list of v2) → list of v2
### word count ###
# key: document name
# value: contents
def Map(key, value):
    # emit 1 for each word
    for w in value.split():
        emit(w, 1)

# key: word
# values: list of 1s
def Reduce(key, values):
    # add up the 1s
    result = 0
    for i in values:
        result += 1
    emit(key, result)
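The word-count idea above can be run with a minimal single-process driver (a sketch only; the real framework distributes the map and reduce phases across workers and shuffles data between them):

```python
from collections import defaultdict

def map_reduce(inputs, mapper, reducer):
    # Map phase: apply mapper to every (key, value) input pair.
    groups = defaultdict(list)
    for key, value in inputs:
        for k2, v2 in mapper(key, value):
            groups[k2].append(v2)        # shuffle: group values by key
    # Reduce phase: apply reducer to each key's list of values.
    return {k: reducer(k, vs) for k, vs in groups.items()}

def wc_map(doc_name, contents):
    for word in contents.split():
        yield word, 1                    # emit 1 for each word

def wc_reduce(word, ones):
    return sum(ones)                     # add up the 1s

result = map_reduce([("doc1", "to be or not to be")], wc_map, wc_reduce)
print(result)  # -> {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```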
Implementation
• Master-Worker model
  – Workers:
    • Map workers and Reduce workers
    • Data is transferred directly from Map → Reduce
  – Master: coordinates the flow of data between workers
  – Work on failed workers is restarted
• Distributed file system
  – Collection of results is made simple
Review of Map-Reduce
• Abstracts data files to key/value sets
  – Computes on them using user-defined functions
• Pros
  – Automatic task creation / result collection
  – Automatic file transfers between Map/Reduce
  – Fault tolerant
• Cons
  – The Map-Reduce model is still restrictive for many real-life applications
  – Not for WANs
  – Cannot add nodes dynamically
Comparison of Frameworks

                                 Condor   Ibis (Satin)   Map-Reduce
  Grid connectivity                ○          ×              ×
  Workflow                         ○          ○              △
  File transfer                    △          ×              ○
  Task creation/data collection    ×          ○              ○
  Join/leave of nodes              △          ○              △
  Deployment ease                  △          ○              ○

- Each has its own strengths and weaknesses
- Not so trivial to make Grid workflow applications easy for scientific computing users
Conclusion
• We have presented a series of viable choices for performing parallel workflow applications in a Grid environment
• File transfer, task creation/data collection
  – Tasks need to be able to interact with external entities
    • Ibis: parent-child relationship
    • Map-Reduce: master-worker, worker-worker
Future Work
• Workflow tasks cannot be isolated entities
  – Need means of interaction among them
    • Are raw sockets enough?
    • Must be WAN- and dynamic-resource compatible
• A Grid-enabled workflow framework with the following properties
  – RMI and file transfer primitives between tasks
Adding Nodes
• Possible to add more nodes at runtime
• Uses a global server that is accessible from everywhere
  – A new node uses this server to bootstrap itself and join the already participating nodes
• Random work stealing
  – Automatically load-balances in the face of new nodes

[Figure: a new node contacts the bootstrap server, then joins the Satin system and steals work]
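The bootstrap step can be sketched as a tiny registry (entirely illustrative; the class, method, and address format below are made up, and the real Ibis join protocol also handles work stealing and failures):

```python
# Toy bootstrap registry: a joining node asks for the current
# member list, then is added so that later joiners can find it too.
class BootstrapServer:
    def __init__(self):
        self.members = []

    def join(self, address):
        known = list(self.members)  # peers the newcomer may steal from
        self.members.append(address)
        return known

server = BootstrapServer()
print(server.join("node0:5000"))  # -> [] (first node, nobody to steal from)
print(server.join("node1:5000"))  # -> ['node0:5000']
```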