
Pwrake: A Parallel and Distributed Flexible Workflow Management Tool for Wide-area Data Intensive Computing

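# mProjectPP task definition for Pwrake (Montage case study).
# INPUT_DIR and HDR (the template header for the output projection)
# are assumed to be defined elsewhere in the Rakefile.
SRCFITS = FileList["#{INPUT_DIR}/*.fits"]

file("pimages.tbl") do
  # One projection task per input image: <dir>/x.fits -> p/x.p.fits
  OUTFITS = SRCFITS.map do |i|
    o = i.sub(/^(.*?)([^\/]+)\.fits/, 'p/\2.p.fits')
    file(o => [i, HDR]) do |t|
      t.rsh "mProjectPP #{i} #{o} #{HDR}"   # run mProjectPP on a remote host
    end
    o
  end
  # Run all projection tasks in parallel, then build the image table.
  pw_multitask("Proj" => OUTFITS).invoke
  sh "mImgtbl p pimages.tbl"
end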
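For readers without Pwrake or Montage at hand, the pattern above can be tried locally with stock Rake. This is a minimal single-host sketch, not Pwrake's implementation: Rake's multitask stands in for pw_multitask, a file copy stands in for the remote mProjectPP call made through t.rsh, and the input layout and the header name region.hdr are hypothetical.

```ruby
require "rake"
require "tmpdir"
include Rake::DSL

table = nil

Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    # Hypothetical inputs: three tiny "FITS" files and a header file (HDR).
    Dir.mkdir("in")
    Dir.mkdir("p")
    %w[a b c].each { |n| File.write("in/#{n}.fits", n) }
    hdr = "region.hdr"
    File.write(hdr, "header")

    outfits = FileList["in/*.fits"].map do |i|
      o = i.sub(/^(.*?)([^\/]+)\.fits$/, 'p/\2.p.fits')   # in/a.fits -> p/a.p.fits
      file(o => [i, hdr]) do
        File.write(o, File.read(i))   # placeholder for: mProjectPP i o hdr
      end
      o
    end

    # Stock Rake multitask runs the projections in parallel threads on this
    # host; pw_multitask in Pwrake runs them on remote hosts instead.
    multitask("Proj" => outfits)
    file("pimages.tbl" => "Proj") do
      File.write("pimages.tbl", outfits.sort.join("\n"))   # placeholder for mImgtbl
    end

    Rake::Task["pimages.tbl"].invoke
    table = File.read("pimages.tbl")
  end
end

puts table
```

Defining the projection tasks up front (rather than inside the pimages.tbl action, as the original does) keeps the sketch short; both forms produce the same dependency graph.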

ABSTRACT

This poster proposes Pwrake, a parallel and distributed flexible workflow management tool based on Rake, a domain-specific language for building applications in the Ruby programming language. Rake is similar to make and Ant. It uses a Rakefile, which is equivalent to a Makefile in make but written in Ruby. Because of its flexible and extensible language features, Rake can serve as a powerful workflow management language. Pwrake extends Rake to manage distributed and parallel workflow executions, including remote job submission and management of parallel executions. This paper discusses the design and implementation of Pwrake and demonstrates the power of its language and the extensibility of the system, using a practical e-Science data-intensive workflow in astronomical data analysis on the Gfarm file system as a case study. By extending the scheduling algorithm to be aware of file locations, a 20% speedup is observed using 8 nodes (32 cores) in a PC cluster. Using two PC clusters located at different institutions, the file-location-aware scheduling shows scalable speedup. The extensible Pwrake is a promising workflow management tool even for wide-area data analysis.

Masahiro Tanaka and Osamu Tatebe (University of Tsukuba)

[Figure: jobs for file1 and file3 scheduled onto file system nodes whose local storage holds file1 to file4; a job placed on the node that stores its file is fast, and a job placed on another node is slow.]

Rake syntax = Ruby syntax

file "prog" => ["a.o", "b.o"] do
  sh "cc -o prog a.o b.o"
end

• file: a Ruby method defined in Rake.
• "prog" => ["a.o", "b.o"]: a key-value argument to the file method, task_name => prerequisites.
• do … end (or {…}): a Ruby code block executed as the task action.
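A minimal runnable sketch of the same file task, assuming stock Rake: the object files are replaced by plain tasks that only log their names, and the link step logs instead of calling cc, so no compiler is needed.

```ruby
require "rake"
require "tmpdir"
include Rake::DSL

log = []

Dir.mktmpdir do |dir|
  Dir.chdir(dir) do   # empty directory, so "prog" is always out of date
    # Stand-ins for the object files: plain tasks that only record their names.
    task("a.o") { log << "a.o" }
    task("b.o") { log << "b.o" }

    # file is the Rake method; "prog" => ["a.o", "b.o"] is its key-value
    # argument (task_name => prerequisites); the do ... end block is the action.
    file "prog" => ["a.o", "b.o"] do |t|
      log << "link #{t.name} from #{t.prerequisites.join(' ')}"   # instead of sh "cc ..."
    end

    Rake::Task["prog"].invoke   # prerequisites run first, then the action
  end
end

puts log
```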

• Workflow:
– Montage: a tool to combine astronomical images
– http://montage.ipac.caltech.edu/

• Input data:
– 2MASS All Sky Survey
– 1,580 files (3.3 GB)

• Platform:

Site               Cores/node      Nodes   Total cores   Memory
Univ. of Tsukuba   4 (quad-core)   8       32            4 GB
AIST               2 (dual-core)   8       16            2 GB

Data Intensive Computing in e-Science

[Figure: astronomy data archives from observatories around the world (NAOJ: /subaru, /subaru/spcam; JAXA: /akari, /akari/fis; /archives, /archives/2mass) and Laboratory A's data (/labA, /labA/personB) are accessed seamlessly in a single global tree through standard data-access protocols. The tree holds both public data and analyzed data that is access-protected to the observer.]

[Figure: network file system vs. distributed file system. With NFS, all CPUs access file1 to file3 on a single storage server, and storage I/O becomes the bottleneck due to access congestion. With a distributed file system, the files are spread over the storage of the nodes, providing efficient I/O for parallel execution.]

Data intensive computing requires a distributed file system.

Gfarm: a Wide-area Distributed File System

What is Rake?
• A build tool similar to make
• Written in the Ruby language
• Part of Ruby 1.9.x

Why Rake?
• Widely used as a build tool
• Easy to write complicated workflows using Ruby language features, such as parameter sweeps
• Easy to extend behavior by inheriting the Task class
• Easy to define tasks dynamically
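The parameter-sweep and subclassing points can be sketched with stock Rake as follows. RecordingTask and the parameter values are hypothetical; the sketch only shows that a sweep is ordinary Ruby iteration and that a Task subclass can change how its tasks execute.

```ruby
require "rake"
include Rake::DSL

LOG = []

# Extending behavior by inheriting the Task class: this hypothetical
# subclass records each task name before running the task's actions.
class RecordingTask < Rake::Task
  def execute(args = nil)
    LOG << "start #{name}"
    super
  end
end

# Parameter sweep with plain Ruby iteration: one task per (alpha, size)
# pair. The values are made up for illustration.
results = []
sweep = [0.1, 0.5].product([64, 128]).map do |alpha, size|
  RecordingTask.define_task("run_a#{alpha}_n#{size}") { results << [alpha, size] }.name
end

task "sweep" => sweep
Rake::Task["sweep"].invoke
```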

Requirement for Workflow Tool

[Figure: a PwMultitask class instance enqueues its prerequisite tasks (Task1, Task2, Task3) into the task queue, a thread queue for remote executions. Worker threads (worker thread1 to thread3), each holding an SSH connection to one remote host (remote host1 to host3), dequeue the tasks and execute them remotely.]
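A minimal sketch of the task-queue and worker-thread design, assuming one worker thread per remote host. The host names are hypothetical, and remote execution over SSH is simulated by a lambda, so this shows the queueing structure only, not Pwrake's actual code.

```ruby
# Hypothetical host names; one worker thread is created per host.
HOSTS = %w[host1 host2 host3]

# Runs each task on some worker. Each worker stands for a thread that holds
# an SSH connection to its host; here remote execution is only simulated.
def run_with_workers(tasks, hosts)
  queue = Queue.new
  tasks.each { |t| queue << t }          # enqueue the prerequisite tasks
  hosts.size.times { queue << :stop }    # one stop marker per worker
  results = Queue.new

  workers = hosts.map do |host|
    Thread.new do
      remote_exec = ->(task) { "#{task} on #{host}" }   # stand-in for an SSH call
      while (task = queue.pop) != :stop                  # dequeue until stopped
        results << remote_exec.call(task)
      end
    end
  end
  workers.each(&:join)
  Array.new(results.size) { results.pop }
end

out = run_with_workers(%w[Task1 Task2 Task3 Task4], HOSTS)
puts out.sort
```

Which host runs which task depends on thread timing; only the set of executed tasks is deterministic.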

[Figure: AffinityQueue. Tasks (Task1, Task2, Task3) are pushed with a hostname into a per-host queue (queue for host1, host2, host3, …); each worker thread (worker thread1 to thread3) pops with its own hostname.]
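The per-host queue idea can be sketched as below. This AffinityQueue is a hypothetical illustration: push files a task under a hostname, and pop serves the caller's own host first. Falling back to another host's queue when the local one is empty is an assumption of this sketch; the poster does not say what Pwrake does in that case.

```ruby
# Hypothetical sketch: one FIFO per host, guarded by a single mutex.
class AffinityQueue
  def initialize(hosts)
    @queues = hosts.to_h { |h| [h, []] }
    @mutex = Mutex.new
  end

  # File a task under the host it should preferably run on.
  def push(task, host)
    @mutex.synchronize { @queues.fetch(host) << task }
  end

  # Returns [task, local?] or nil when every queue is empty.
  def pop(host)
    @mutex.synchronize do
      local = @queues.fetch(host)
      return [local.shift, true] unless local.empty?
      other = @queues.values.find { |q| !q.empty? }   # assumed fallback
      other && [other.shift, false]
    end
  end
end

aq = AffinityQueue.new(%w[host1 host2])
aq.push("proj a.fits", "host1")
aq.push("proj b.fits", "host2")
aq.push("proj c.fits", "host1")
p aq.pop("host1")   # => ["proj a.fits", true]
p aq.pop("host2")   # => ["proj b.fits", true]
p aq.pop("host2")   # => ["proj c.fits", false]
p aq.pop("host1")   # => nil
```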

Pwrake Implementation

mProjectPP task definition for Pwrake

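[Figure: Montage workflow. Input images → mProjectPP → mDiff → mFitplane → mBgModel → mBackground → mAdd → final image.]

Equations shown in the figure (for images 1 and 2):
$m_1 = a'_1 x + b'_1 y + c'_1$, $m_2 = a'_2 x + b'_2 y + c'_2$
$a_1 x + b_1 y + c_1 = 0$, $a_2 x + b_2 y + c_2 = 0$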

Performance Evaluation

One site:
• Site: Univ. of Tsukuba
• NFS (plot #1): elapsed time increases even as the number of cores increases.
• Gfarm (plots #2 to #6):
– #2: without affinity scheduling
– #3: with affinity scheduling
– #4: same as #3, except that input data are distributed across compute nodes
• All the Gfarm cases show scalable speedup.
• Performance (32 cores): #2→#3: 14% speedup; #2→#4: 20% speedup

Two sites: Gfarm (#5 and #6) with 48 cores
• Sites: Univ. of Tsukuba and AIST
• Scheduling: affinity scheduling (same as #3 and #4)
• Arrangement of input data:
– #5: each cluster has one replica of each input file.
– #6: see the figure below.
• Performance: #5→#6: 41% speedup
• Scalable speedup is observed in comparison to the one-site case.

[Figure: position of each image file, placed at either U. Tsukuba (32 cores) or AIST (16 cores).]

Result of Performance Evaluation

[Plot: elapsed time (sec) for cases #1 to #6, with the 20% and 41% speedups annotated.]

Pwrake = Rake + Parallel Workflow extension

CASE STUDY : Astronomy Workflow

BACKGROUND AND MOTIVATION

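Dynamic Task Definition

[Figure: Task A produces the Task B list; the Task B target depends on every Task B in that list.]

Task B list is defined in Task A:

TASK_B_LIST = Array.new

task "A" do
  TASK_B_LIST << ...   # Task A appends the names of the Task B instances it needs
end

task "B" => "A" do
  # Runs after Task A, so TASK_B_LIST is complete here.
  a = TASK_B_LIST.map do |b|
    task b do
      ...
    end
  end
  # Define the target on the fly and invoke it; the new tasks run as its prerequisites.
  task("B-target" => a).invoke
end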
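The example above elides the contents of the list and the Task B actions. A runnable stand-in using stock Rake, with a hypothetical three-element list produced by Task A, shows that the Task B tasks are defined only after Task A has run and are then invoked as prerequisites of B-target.

```ruby
require "rake"
include Rake::DSL

TASK_B_LIST = []
EXECUTED = []

task "A" do
  # Hypothetical result of Task A; in a real workflow this would come
  # from the output of a former task.
  TASK_B_LIST.concat(%w[b1 b2 b3])
end

task "B" => "A" do
  # Define one task per list entry, only now that Task A has finished.
  b_tasks = TASK_B_LIST.map do |b|
    task(b) { EXECUTED << b }.name
  end
  # Create the target on the fly and invoke it; its prerequisites run first.
  task("B-target" => b_tasks) { EXECUTED << "B-target" }.invoke
end

Rake::Task["B"].invoke
p EXECUTED   # => ["b1", "b2", "b3", "B-target"]
```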

RELATED WORKS

GXP make
• A workflow management tool that exploits GNU make and uses GXP, a parallel shell tool written in Python, as the underlying distributed execution engine.
• Workflows are defined in a Makefile, with implicit and explicit rules, variable values, and shell scripts.
• It can dramatically reduce the length of a workflow description compared to a DAG input file, and can generate general workflows for applications. This research is inspired by GXP make.

Swift
• A scientific workflow system designed for loosely coupled computations.
• Workflows are defined in a statically typed language called SwiftScript.
• Swift dispatches a workflow to another scheduler, such as Karajan, but the scheduler is not intended to be extended by users. Such batch job submission requires sufficiently coarse-grained jobs for efficient execution.

CONCLUSION

• Pwrake, a parallel and distributed flexible workflow management tool, is proposed.
• Pwrake is extensible and has a flexible and powerful workflow language for describing scientific workflows.
• We demonstrate a practical e-Science data-intensive workflow in astronomical data analysis on the Gfarm file system in a wide-area environment.
• By extending the scheduling algorithm to be aware of file locations, a 20% speedup was observed using 8 nodes (32 cores) in a PC cluster.
• Scalable speedup is observed in the measurement using two PC clusters located at different sites, when input files are grouped by coordinate and each group is placed at an appropriate site.

• Exploit local I/O for scalable I/O performance.
• Move and execute programs instead of moving large-scale data.
• So far there is no workflow tool with file affinity scheduling.

[Figure: the Gfarm file system federates the local storage of compute nodes connected over the Internet into one directory tree (/, /dir1 with file1 and file2, /dir2 with file3 and file4).]

• http://datafarm.apgrid.org/
• Global namespace to federate the storage of compute nodes
• Designed for data-intensive computing in wide area

Key issue for Scalable I/O performance: File Affinity Task Scheduling
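One way to turn file affinity into a scheduling decision is to run each job on the host that already stores most of its input bytes. The helper below is a hypothetical sketch; the replica map and file sizes are made-up inputs, and Gfarm's own interfaces for looking up replica locations are not shown.

```ruby
# Hypothetical helper: pick the host that stores the largest share of a
# task's input bytes, so the job runs where its data already is.
# replicas maps file => hosts holding a replica; sizes maps file => bytes.
def affinity_host(inputs, replicas, sizes)
  local_bytes = Hash.new(0)
  inputs.each do |f|
    replicas.fetch(f, []).each { |h| local_bytes[h] += sizes.fetch(f, 0) }
  end
  return nil if local_bytes.empty?   # no known replica: any host will do
  local_bytes.max_by { |_host, bytes| bytes }.first
end

replicas = { "file1" => ["host1"], "file3" => ["host3", "host2"], "hdr" => ["host2"] }
sizes    = { "file1" => 2_000_000, "file3" => 2_000_000, "hdr" => 1_000 }
p affinity_host(%w[file3 hdr], replicas, sizes)   # => "host2"
```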

Extensibility: able to choose the scheduling scheme, especially affinity-aware scheduling.

Programmable: easy to define complicated workflows and parameter sweeps.

Rule-based: the same definition applies to different sets of data (a DAG-based workflow is not reusable).

Dynamic task definition: tasks can be defined based on the results of former tasks.

Performance: scalability in parallel execution.
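The rule-based requirement maps directly onto Rake's rule method. In this minimal sketch (stock Rake, with a log entry standing in for the compiler call), a single rule builds any .o file from the .c file of the same name, so the same definition serves any set of inputs.

```ruby
require "rake"
require "tmpdir"
include Rake::DSL

built = []

Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    %w[a.c b.c].each { |f| File.write(f, "") }   # hypothetical empty sources

    # One rule covers every *.o built from a same-named *.c.
    rule ".o" => ".c" do |t|
      built << "#{t.source} -> #{t.name}"   # stand-in for "cc -c -o ..."
    end

    # Rake synthesizes a.o and b.o from the rule on lookup.
    Rake::Task["a.o"].invoke
    Rake::Task["b.o"].invoke
  end
end

p built   # => ["a.c -> a.o", "b.c -> b.o"]
```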


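Pwrake feature: Concurrent Workflow Execution

INFILES = FileList["?.c"]

# One compile task per source file; each action runs on a remote host.
OUTFILES = INFILES.map do |i|
  o = i.sub(/\.c$/, ".o")
  file(o => i) do |t|
    t.rsh "cc -c -o #{o} #{i}"   # -c: compile to an object file only
  end.name
end

# Compile all objects in parallel, then link them into x.
pw_multitask("target" => OUTFILES) do
  sh "cc -o x #{OUTFILES.join(' ')}"
end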
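A local sketch of the same workflow with stock Rake, assuming multitask as a stand-in for pw_multitask and recorded command strings in place of the remote t.rsh and sh calls. Plain tasks are used instead of file tasks so that nothing is written to disk.

```ruby
require "rake"
include Rake::DSL

compiled = Queue.new            # thread-safe: multitask runs prerequisites in threads
infiles  = %w[a.c b.c c.c d.c]  # instead of FileList["?.c"]

outfiles = infiles.map do |i|
  o = i.sub(/\.c$/, ".o")
  task(o) { compiled << "cc -c -o #{o} #{i}" }.name   # stand-in for t.rsh
end

linked = nil
multitask("target" => outfiles) do
  # Runs only after every compile task has finished.
  linked = "cc -o x #{outfiles.join(' ')}"
end

Rake::Task["target"].invoke
puts linked
```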

[Figure: concurrent workflow execution. Pwrake parses the workflow defined in the Rakefile, generates Task-class instances, and executes them as remote process calls via SSH, mounting the Gfarm file system during each SSH connection. The sources a.c, b.c, c.c, and d.c in /dir on Gfarm are compiled in parallel (cc -c -o a.o a.c, …, cc -c -o d.o d.c), and the objects are then linked into x. This workflow is defined by the Rakefile code above.]

For INFILES = ["a.c", "b.c", "c.c", "d.c"] and, for example, i = "a.c" and o = "a.o", Pwrake generates:
• Task class instance: @name: "a.o", @prerequisites: "a.c", @action: proc { |t| t.rsh "cc -c -o a.o a.c" }
• PwMultiTask class instance: @name: "x", @prerequisites: ["a.o", "b.o", ...], @action: proc { |t| sh "cc -o x a.o b.o ..." }

Arrangement for #6: input files are assigned to sites by celestial coordinate, which reduces file accesses between sites.
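The coordinate-based arrangement for #6 could be sketched as follows. Everything here is an assumption made for illustration: right ascension as the grouping coordinate, contiguous groups, and group sizes proportional to each site's cores (32 and 16). Keeping images that are close on the sky at the same site presumably helps because mDiff works on pairs of overlapping neighbor images.

```ruby
# Hypothetical sketch: sort images by right ascension and cut the sorted
# list into contiguous groups, one per site, sized by the site's cores.
Image = Struct.new(:name, :ra)

def assign_by_coordinate(images, site_cores)
  sorted = images.sort_by(&:ra)
  total = site_cores.values.sum
  assignment = {}
  start = 0
  site_cores.each_with_index do |(site, cores), idx|
    count = if idx == site_cores.size - 1
              sorted.size - start                          # last site takes the rest
            else
              (sorted.size * cores.to_f / total).round
            end
    count = [count, sorted.size - start].min               # never run past the end
    sorted[start, count].each { |img| assignment[img.name] = site }
    start += count
  end
  assignment
end

images = (0...6).map { |k| Image.new("img#{k}.fits", 60.0 * k) }
sites = { "U. Tsukuba" => 32, "AIST" => 16 }
p assign_by_coordinate(images, sites)
```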

Implementation of Affinity Scheduling
