A Survey of Programming Frameworks for Dynamic Grid Workflow
Applications
November 2nd, 2007
Taura Lab
Ken Hironaka
Background
• Attempts to analyze databases of enormous size
  – Genetic sequence databases
    • BLAST (Basic Local Alignment Search Tool) library
  – MEDLINE journal abstract database
    • Enju (a syntactic parser for English)
• Improvements in algorithms alone are not enough to handle the overwhelming amount of data
  – Need to be able to parallelize computation
Basic Demands
• Express the workload with ease
  – No need to wrestle with complex configuration files
• Parallel computation
  – No distributed computing experts required!
Well known frameworks
• Batch Schedulers
  – Solution for cluster computers
  – Submit each task as a “Job” in input file(s)
  – The job is scheduled to an idle node
  – Good for embarrassingly parallel tasks
    • Tasks with no inter-task dependencies
  – Data sharing by NFS
    • Easy data collection

[Figure: jobs are submitted to a central manager, which assigns them to idle nodes in the cluster while other nodes are busy]
Arising Problems
• Handling Workflows
• Coping with Grid (multi-cluster) environments
• Creation of tasks/aggregation of results
Handling Workflows
• Most tasks are not embarrassingly parallel
  – Blindly scheduling jobs is not good enough
• Workflows: dependencies between tasks
  – Passing output files as input files between tasks
• E.g.: Natural Language Processing
[Figure: NLP pipeline of dependent tasks, each passing a file to the next: Phonological Analysis → Morphological Analysis → Syntactic Analysis → Semantic Analysis]
Coping with Grid environments
• Multiple clusters
  – A single huge cluster is rare
• Connectivity in WANs
  – Firewalls, NATs
• File sharing problems
  – Independent file systems
• Dynamics
  – Nodes joining and leaving (failure)

[Figure: nodes join and leave across clusters separated by a firewall]
Task creation/data collection
• “Task” in the conventional sense
  – Simply compute on given data
• Manual task creation
  – Splitting the problem into sub-problems
  – Tedious manual work for large inputs/databases
• Manual data collection
  – Collecting results afterwards
  – What if they are dispersed all over the Grid?
• Not so trivial in modern settings
  – Such tasks need to be built into the framework
Detailed summary of the demands
• Modern parallelization frameworks
  ⇒ Frameworks that facilitate workflow applications on dynamic Grid environments
  – Handle workflows with grace
  – Cope with WAN connectivity problems
  – Handle data transfers
  – Cope with dynamic changes in resources
  – Automatically create tasks / collect results
  – EASY TO USE!
The Organization of this Presentation
• Background/Introduction
• Existing Programming Frameworks
• Conclusion/Future Work
Condor
• One of many available batch schedulers
  – Maintains a pool of idle nodes in a cluster
• Goes beyond a regular scheduler
  – Allows workflow expression for submitted tasks
  – File transfer extension to handle data-driven jobs
• Grid-enabled (Condor-G)
  – Uses the Globus toolkit to run on multiple clusters
    • Allows jobs to be scheduled on different pools on different clusters
DAGMan: Expressing Workflows
• Extension to Condor
  – Executes a set of tasks with DAG dependencies
• DAG (Directed Acyclic Graph)
  – An expression of workflows
  – A→B : A must finish before B starts
  – E.g.: A→C, B→C, C→D, C→E
  – Can express general workflows
• Fault tolerance
  – In case of failure, restarts from the job that failed
    • E.g., if Task C fails, Tasks A and B are not redone
[Figure: DAG with nodes A, B, C, D, E and edges A→C, B→C, C→D, C→E]
DAGMan: Howto
• Create a script defining jobs and dependencies
• Submit the DAG
  – It will be automatically translated into Condor jobs and scheduled accordingly by Condor
### sample.dag ###
# define jobs
Job A A.sh
Job B B.py
Job C C.pl
Job D D.sh
Job E E.sh
# define dependencies
PARENT A B CHILD C
PARENT C CHILD D E

$ condor_submit_dag sample.dag
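The dependency declarations in sample.dag form a DAG that any scheduler must respect. As a rough illustration (this is not DAGMan itself, just a sketch of the ordering constraint using Python's standard library), the same dependencies can be topologically sorted:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Dependencies from sample.dag: PARENT A B CHILD C; PARENT C CHILD D E
# Mapping: job -> set of jobs that must finish before it starts
deps = {"C": {"A", "B"}, "D": {"C"}, "E": {"C"}}

order = list(TopologicalSorter(deps).static_order())
# A and B appear (in either order) before C; C appears before D and E
print(order)
```

DAGMan goes further than a static order: independent jobs (like A and B, or D and E) may run concurrently.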
Stork
• Data placement scheduler for Condor
• File transfer across file systems
  – ftp, scp, http, etc.
• A transfer is treated as a DAGMan job
  – Allows jobs to pass files without a shared FS
• Inter-task data passing is not possible
  – Must use a third-party server to pass data
### trans.stork ###
[
  dest_url = "file:/tmp/hoge.tar.gz";
  src_url  = "ftp://www.foo.com/hog.tar.gz";
  dap_type = transfer;
]

### sample2.dag ###
DATA INPUT0 trans.stork
JOB A A.sh
PARENT INPUT0 CHILD A
Review of Condor
• Strong in classical batch-queuing-system topics
• Pros
  – Handles workflows and fault tolerance
  – Possible to deploy on multiple clusters
• Cons
  – Condor and its extensions must be installed by the system administrator on all nodes of each cluster
    • Big initial overhead
    • Cannot add more nodes dynamically
  – Limited file transfer options
    • Inter-task data passing is not possible
  – Task creation and result collection done manually
Ibis (Satin)
• Java-based parallel computation library
• Distributed object oriented
  – Transparent location of distributed objects
  – Offers RMI (Remote Method Invocation)
    • Transparent delegation of computation
• Divide-and-conquer type applications
  – Satin

[Figure: foo.doJob(args) invoked via RMI; the computation runs on the node holding object foo]
Divide-and-Conquer
• One large problem may be recursively split into numerous smaller sub-problems
  – E.g., Quick Sort, Fibonacci
• SPAWN
  – Creates sub-problems (“children”)
• SYNC
  – Called by the parent
  – Waits for sub-problem results
• Can express DAG workflows

[Figure: Fib(20) = Fib(19) + Fib(18), a parent-child relationship]
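The SPAWN/SYNC pattern can be sketched in plain Python. The `spawn`/`sync` helpers below are hypothetical local stand-ins, not the Satin API; in Satin, spawned sub-problems may be executed on other nodes, while here they are simply evaluated in-process:

```python
# Hypothetical local stand-ins for the SPAWN and SYNC primitives.
class Future:
    def __init__(self, fn, *args):
        self.fn, self.args = fn, args
    def run(self):
        self.value = self.fn(*self.args)

_pending = []  # sub-problems waiting to be executed (or stolen)

def spawn(fn, *args):
    f = Future(fn, *args)
    _pending.append(f)
    return f

def sync():
    # In Satin this blocks until spawned sub-results arrive;
    # here we simply evaluate every pending sub-problem locally.
    while _pending:
        _pending.pop().run()

def fib(n):
    if n < 2:
        return n
    x = spawn(fib, n - 1)  # SPAWN: create child sub-problems
    y = spawn(fib, n - 2)
    sync()                 # SYNC: wait for their results
    return x.value + y.value

print(fib(10))  # -> 55
```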
Divide-and-Conquer: HowTo
• Import the library
• Define a user class extending the SatinObject library class
• Define computation methods
  – Use recursion
    • Allows creation of sub-problems (implicit spawn)
  – sync()
    • Implicit definition of dependencies
### fib.java ###
import ibis.satin.SatinObject;

class Fib extends SatinObject implements … {
    public int fib(int n) {
        if (n < 2) return n;
        int x = fib(n - 1);  // implicit spawn
        int y = fib(n - 2);  // implicit spawn
        sync();              // wait for results
        return x + y;
    }
}
Random Work Stealing
• Strategy to load-balance among participating nodes
  – An idle node steals an unfinished sub-problem from a random node
  – The result is returned to the victim node
• Adapts to joining nodes
  – New nodes automatically acquire tasks

[Figure: Node 1 and Node 2 steal sub-problems from Node 0]
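Random work stealing can be illustrated with a toy single-process simulation (node IDs and task counts are made up; real Satin steals over the network and returns results to the victim):

```python
import random
from collections import deque

random.seed(1)  # deterministic toy run

# Node 0 starts with all the work; nodes 1 and 2 are idle.
queues = {0: deque(range(10)), 1: deque(), 2: deque()}
completed = {0: 0, 1: 0, 2: 0}

while any(queues.values()):
    for node, q in queues.items():
        if q:
            q.popleft()            # run one of our own tasks
            completed[node] += 1
        else:
            # Idle: pick a random victim that still has work and
            # steal one unfinished task from the back of its queue.
            victims = [v for v in queues if v != node and queues[v]]
            if victims:
                q.append(queues[random.choice(victims)].pop())

print(completed)  # all 10 tasks eventually finish
```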
Dealing with Failures
• When a node fails, its sub-problems need to be restarted
• Orphan tasks
  – Sub-problems that lose the parent by which their results are used
  – Orphan task results are cached and circulated among nodes

[Figure: Node 2 fails; the results of its orphaned sub-problems are cached and circulated among Node 0 and Node 1]
Review of Ibis (Satin)
• Pros
  – Benefits from targeting divide-and-conquer applications
    • Able to handle workflows using spawn and sync
    • Automatically creates tasks / collects results
  – Handles dynamic joining/leaving of nodes
    • Random work stealing
    • Recycling of orphan sub-problem results
• Cons
  – Currently only supports direct communication among nodes (does not work across firewalls or NATs)
  – Targeted at CPU-intensive applications
    • No primitives for file transfer over the network
Map-Reduce
• Framework for processing large homogeneous data on clusters
  – Handling large databases at Google
• The user defines 2 functions
  – Map, Reduce

[Figure: input data flows through Map() tasks into Reduce() tasks, producing one output per reducer]
Map-Reduce: Howto
• Abstraction
  – Data file → set of key/value pairs
• Map
  – (k1, v1) → (k2, v2)
  – Values with the same key are combined
• Reduce
  – (k2, list of v2) → list of v2
### word count ###
# key: document name
# value: contents
def Map(key, value):
    # emit 1 for each word
    for w in value.split():
        emit(w, 1)

# key: word
# values: list of 1s
def Reduce(key, values):
    # add up the 1s
    result = 0
    for i in values:
        result += 1
    emit(key, result)
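The word-count idea above can be run with a minimal single-process driver (a sketch only; the real framework distributes the map and reduce phases across workers and shuffles data between them):

```python
from collections import defaultdict

def map_reduce(inputs, mapper, reducer):
    # Map phase: apply mapper to every (key, value) input pair.
    groups = defaultdict(list)
    for key, value in inputs:
        for k2, v2 in mapper(key, value):
            groups[k2].append(v2)        # shuffle: group values by key
    # Reduce phase: apply reducer to each key's list of values.
    return {k: reducer(k, vs) for k, vs in groups.items()}

def wc_map(doc_name, contents):
    for word in contents.split():
        yield word, 1                    # emit 1 for each word

def wc_reduce(word, ones):
    return sum(ones)                     # add up the 1s

result = map_reduce([("doc1", "to be or not to be")], wc_map, wc_reduce)
print(result)  # -> {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```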
Implementation
• Master-Worker model
  – Workers:
    • Map workers and Reduce workers
    • Data is transferred directly from Map → Reduce
  – Master: coordinates the flow of data between workers
  – Work on failed workers is restarted
• Distributed file system
  – Collection of results is made simple
Review of Map-Reduce
• Abstracts data files to key/value sets
  – Computes on them using user-defined functions
• Pros
  – Automatic task creation / result collection
  – Automatic file transfers between Map/Reduce
  – Fault tolerant
• Cons
  – The Map-Reduce model is still restrictive for many real-life applications
  – Not for WANs
  – Cannot add nodes dynamically
Comparison of Frameworks

                                 Condor   Ibis (Satin)   Map-Reduce
  Grid connectivity                ○          ×              ×
  Workflow                         ○          ○              △
  File transfer                    △          ×              ○
  Task creation/data collection    ×          ○              ○
  Join/leave of nodes              △          ○              △
  Deployment ease                  △          ○              ○

- Each has its own strengths and weaknesses
- Not so trivial to make Grid workflow applications easy for scientific computing users
Conclusion
• We have presented a series of viable choices for performing parallel workflow applications in a Grid environment
• File transfer, task creation/data collection
  – Tasks need to be able to interact with external entities
    • Ibis: parent-child relationship
    • Map-Reduce: master-worker, worker-worker
Future Work
• Workflow tasks cannot be isolated entities
  – Need means of interaction among them
    • Are raw sockets enough?
    • Must be WAN- and dynamic-resource compatible
• A Grid-enabled workflow framework with the following properties
  – RMI and file transfer primitives between tasks
Adding Nodes
• Possible to add more nodes at runtime
• Uses a global server that is accessible from everywhere
  – A new node uses this server to bootstrap itself and join the already participating nodes
• Random work stealing
  – Automatically load-balances in the face of new nodes

[Figure: a new node contacts the bootstrap server, then joins the Satin system and steals work]
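The bootstrap step can be sketched as a tiny registry (entirely illustrative; the class, method, and address format below are made up, and the real Ibis join protocol also handles work stealing and failures):

```python
# Toy bootstrap registry: a joining node asks for the current
# member list, then is added so that later joiners can find it too.
class BootstrapServer:
    def __init__(self):
        self.members = []

    def join(self, address):
        known = list(self.members)  # peers the newcomer may steal from
        self.members.append(address)
        return known

server = BootstrapServer()
print(server.join("node0:5000"))  # -> [] (first node, nobody to steal from)
print(server.join("node1:5000"))  # -> ['node0:5000']
```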