
Pwrake: Distributed Workflow Engine for e-Science - RubyConfX

Date post: 13-Jul-2015
Upload: masahiro-tanaka
Page 1: Pwrake: Distributed Workflow Engine for e-Science - RubyConfX

Masahiro Tanaka

University of Tsukuba

2010-11-14, RubyConf X

Page 2

Masahiro Tanaka

Author of NArray

Majored in astronomy

Research fellow in computer science
◦ at the Center for Computational Sciences, University of Tsukuba
◦ since 2009

Page 3

Research Fields

◦ Computer Science: High Performance Computing, Computational Informatics

◦ Computational Science: Particle Physics, Astrophysics, Materials Science, Life Science, Biology, Environmental Science

Page 4

FIRST

◦ 512 cores + BladeGRAPE

◦ 36 TFLOPS

PACS-CS

◦ 2,560 cores

◦ 14.4 TFLOPS

T2K Tsukuba

◦ 10,368 cores

◦ 95 TFLOPS

Page 5

Conference on supercomputing

More than 10,000 participants

Page 6

We are here

SC10 venue: Ernest N. Morial Convention Center

Exhibition: Nov 15-18

Page 7

CCS booth at SC09

Page 8

Pwrake: a Distributed Workflow Engine for e-Science

Page 10

LHC: particle accelerator

ALMA: radio observatory

Page 11

Sharing QCD simulation data: http://www.sinet.ad.jp/case-examples/tsukuba

Page 12

e-Science: computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing (Wikipedia).

Page 13

The term was created by John Taylor,

◦ Director General of the United Kingdom's Office of Science and Technology

◦ in 1999

Page 14

is a key issue for e-Science.

Page 15

Single-core performance is no longer increasing.

Page 18

Parallelize your program

Scalability is a key issue

Page 19

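P: parallelizable fraction

1 − P: sequential fraction

N: number of processors

Speed-up formula (Amdahl's law):

$$S(N) = \frac{1}{(1 - P) + P/N}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - P} \quad (P < 1)$$

(Plot: speed-up versus number of processors, levelling off at 1/(1 − P).)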
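A few lines of Ruby make the bound concrete. This is a sketch; the method names and the example value P = 0.95 are illustrative assumptions, not taken from the talk. With 95% of the work parallelizable, the speed-up can never exceed 1/(1 − 0.95) = 20, however many processors are added.

```ruby
# Amdahl's law: speed-up on n processors when a fraction p of the work
# can run in parallel and the remaining 1 - p must run sequentially.
def amdahl_speedup(p, n)
  raise ArgumentError, "p must be in [0, 1]" unless (0.0..1.0).cover?(p)
  raise ArgumentError, "n must be a positive Integer" unless n.is_a?(Integer) && n.positive?

  1.0 / ((1.0 - p) + p / n)
end

# Limit of the speed-up as n grows without bound (finite only for p < 1).
def amdahl_limit(p)
  raise ArgumentError, "speed-up is unbounded when p = 1" if p >= 1.0

  1.0 / (1.0 - p)
end

[1, 4, 16, 64, 1024].each do |n|
  printf("N = %4d  speed-up = %5.2f\n", n, amdahl_speedup(0.95, n))
end
printf("limit for P = 0.95: %.1f\n", amdahl_limit(0.95))
```

Going from 64 to 1024 processors barely moves the result, which is why the sequential fraction, not the processor count, dominates at scale.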

Page 20

MapReduce

MPI

OpenMP

thread

Parallel programming languages

process

Page 21

Independent processes can be parallelized without parallel programming.

A workflow system is required.

(Diagram: input1 → program → output1, input2 → program → output2, input3 → program → output3, input4 → program → output4, and so on.)
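The same idea can be sketched in Ruby: each input is handed to a separate process, and a fixed number of worker threads keeps several processes running at once. `PROGRAM` is a hypothetical stand-in (a Ruby one-liner that upper-cases its argument) for the real analysis program, and `run_independent` is an illustrative name, not part of any workflow system.

```ruby
require "open3"
require "rbconfig"

# Hypothetical stand-in for "program": a Ruby one-liner run as a separate
# OS process that turns one input string into one output string.
PROGRAM = [RbConfig.ruby, "-e", "print ARGV[0].upcase"].freeze

# Run PROGRAM once per input with at most `workers` processes at a time.
# The inputs are independent, so PROGRAM itself needs no parallel code;
# the parallelism comes from running several processes concurrently.
def run_independent(inputs, workers: 2)
  jobs = Queue.new
  inputs.each_with_index { |input, i| jobs << [i, input] }
  jobs.close # pop returns nil once the queue is drained

  outputs = Array.new(inputs.size)
  threads = Array.new(workers) do
    Thread.new do
      while (job = jobs.pop)
        i, input = job
        out, status = Open3.capture2(*PROGRAM, input)
        raise "program failed on #{input}" unless status.success?

        outputs[i] = out # each index is written by exactly one thread
      end
    end
  end
  threads.each(&:join)
  outputs
end

p run_independent(%w[input1 input2 input3 input4])
```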

Page 23

Description of procedures

It is like building a program:

(Diagram: a.c → cc -c -o a.o a.c → a.o; b.c → cc -c -o b.o b.c → b.o; a.o and b.o → cc -o prog … → prog)

Page 24

Task: ellipse node

File: rectangle node

Dependency: edge

DAG
◦ Directed Acyclic Graph

(Diagram legend: input file → task → output file; each arrow is a dependency.)
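Because the workflow is a DAG, any topological order of its nodes is a valid sequential execution order. The sketch below encodes the build example from the previous slide with Ruby's standard `tsort` library; node labels such as `compile a.c` and `link prog` are illustrative names, and a real workflow system would additionally run independent nodes (here the two compile steps) in parallel.

```ruby
require "tsort"

# The build example as a DAG. Each key is a node (a file or a task) and its
# value lists the nodes it depends on.
class BuildGraph
  include TSort

  def initialize(deps)
    @deps = deps
  end

  # TSort calls these two methods to walk the graph.
  def tsort_each_node(&block)
    @deps.each_key(&block)
  end

  def tsort_each_child(node, &block)
    @deps.fetch(node, []).each(&block)
  end
end

deps = {
  "prog"        => ["link prog"],
  "link prog"   => ["a.o", "b.o"],
  "a.o"         => ["compile a.c"],
  "b.o"         => ["compile b.c"],
  "compile a.c" => ["a.c"],
  "compile b.c" => ["b.c"],
  "a.c"         => [],
  "b.c"         => []
}

# tsort lists every node after all of its dependencies, i.e. a valid
# sequential execution order; it raises TSort::Cyclic if there is a cycle.
puts BuildGraph.new(deps).tsort
```

The `TSort::Cyclic` exception doubles as a cheap check that a workflow definition really is acyclic.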

Page 25

Montage

◦ software for producing a custom mosaic from multiple input images.

◦ http://montage.ipac.caltech.edu/

Page 26

Tasks:
◦ Projection
◦ Brightness correction
◦ Coadding

1 image : 1 process

(Diagram: input images → mProjectPP → mDiff and mFitplane → mBgModel → mBackground → mAdd → final image. Equations shown: $a_1x + b_1y + c_1 = 0$, $a_2x + b_2y + c_2 = 0$, $m_1 = a'_1x + b'_1y + c'_1$, and $m_2 = a'_2x + b'_2y + c'_2$.)

Page 27

(Diagram: the Montage workflow, from the input files to the output file.)

Page 28

◦ invoke tasks based on their dependencies

◦ assign each task to an available computer

◦ execute independent tasks in parallel

(Diagram: a workflow definition is given to the workflow system, which launches the processes.)

Page 29

Workflow systems for Grid Computing

• DAGMan
• Pegasus
• Triana
• ICENI
• Taverna
• GrADS
• GridFlow
• UNICORE
• Globus workflow
• ASKALON
• Karajan
• Kepler

from “A Taxonomy of Scientific Workflow Systems for Grid Computing”, Jia Yu and Rajkumar Buyya (2005)

Page 30

Define the DAG in XML

◦ Humans cannot write complex XML by hand.

◦ A program is needed to generate the XML.

DAG XML (excerpt):

<adag xmlns="http://www.griphyn.org/chimera/DAX"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.griphyn.org/chimera/DAX http://www.griphyn.org/chimera/dax-1.8.xsd"
      count="1" index="0" name="test">
  <filename file="2mass-atlas-981204n-j0160056.fits" link="input"/>
  <job id="ID000001" name="mProject" version="3.0" level="11" dv-name="mProject1" dv-version="1.0">
    <argument>
      <filename file="2mass-atlas-981204n-j0160056.fits"/>
      <filename file="p2mass-atlas-981204n-j0160056.fits"/>
      <filename file="templateTMP_AAAaaa01.hdr"/>
    </argument>
    <uses file="2mass-atlas-981204n-j0160056.fits" link="input" dontRegister="false" dontTransfer="false"/>
    <uses file="p2mass-atlas-981204n-j0160056.fits" link="output" dontRegister="true" dontTransfer="true" temporaryHint="tmp"/>
    <uses file="p2mass-atlas-981204n-j0160056_area.fits" link="output" dontRegister="true" dontTransfer="true" temporaryHint="tmp"/>
    <uses file="templateTMP_AAAaaa01.hdr" link="input" dontRegister="false" dontTransfer="false"/>
  </job>
  <child ref="ID003006">
    <parent ref="ID000001"/>
    <parent ref="ID000006"/>
  </child>
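As the slide says, such XML is normally produced by a program. The Ruby sketch below shows the idea with a deliberately simplified structure: it reuses the element names from the excerpt (`adag`, `job`, `uses`, `child`, `parent`) but omits the namespaces and most attributes, so it is not a complete or validated DAX document. The method names, file names, and job IDs are hypothetical.

```ruby
# Minimal XML escaping for attribute values (&, <, >, and double quotes).
def xml_escape(s)
  s.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;").gsub('"', "&quot;")
end

# Emit one <job> per task with its <uses> files, then one <child> element
# per task that has parents. `edges` is a list of [parent_id, child_id].
def dag_xml(name, jobs, edges)
  out = [%(<adag name="#{xml_escape(name)}">)]
  jobs.each do |id, job|
    out << %(  <job id="#{xml_escape(id)}" name="#{xml_escape(job[:name])}">)
    job[:inputs].each  { |f| out << %(    <uses file="#{xml_escape(f)}" link="input"/>) }
    job[:outputs].each { |f| out << %(    <uses file="#{xml_escape(f)}" link="output"/>) }
    out << "  </job>"
  end
  edges.group_by(&:last).each do |child, pairs|
    out << %(  <child ref="#{xml_escape(child)}">)
    pairs.each { |parent, _| out << %(    <parent ref="#{xml_escape(parent)}"/>) }
    out << "  </child>"
  end
  out << "</adag>"
  out.join("\n")
end

# Hypothetical workflow: project each image, then coadd all projections.
images = %w[img1.fits img2.fits img3.fits]
jobs = {}
images.each_with_index do |img, i|
  jobs[format("ID%06d", i + 1)] = { name: "mProject", inputs: [img], outputs: ["p#{img}"] }
end
add_id = format("ID%06d", images.size + 1)
jobs[add_id] = { name: "mAdd", inputs: images.map { |img| "p#{img}" }, outputs: ["mosaic.fits"] }
edges = jobs.keys.first(images.size).map { |id| [id, add_id] }

puts dag_xml("test", jobs, edges)
```

A loop of a few lines replaces thousands of hand-written elements, but the dependency structure still has to be spelled out explicitly as parent/child pairs.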

Page 31

DSL to define task dependency

Rule

◦ define multiple tasks at once

◦ avoid redundancy

Skip finished tasks

◦ based on file timestamps
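The timestamp rule can be sketched in a few lines of Ruby. `needs_rebuild?` is a hypothetical helper written for illustration; it follows the usual Make convention, and Rake's file tasks apply the same idea.

```ruby
require "fileutils"
require "tmpdir"

# Make-style check: a target has to be (re)built if it does not exist yet or
# if any prerequisite was modified after the target was last written.
def needs_rebuild?(target, prerequisites)
  return true unless File.exist?(target)

  target_mtime = File.mtime(target)
  prerequisites.any? { |pre| File.mtime(pre) > target_mtime }
end

Dir.mktmpdir do |dir|
  src = File.join(dir, "file1")
  dst = File.join(dir, "file2")
  File.write(src, "data\n")

  puts needs_rebuild?(dst, [src])                    # file2 missing: run the task
  File.write(dst, File.read(src).upcase)             # "run" the task
  FileUtils.touch(dst, mtime: File.mtime(src) + 1)   # target newer than source
  puts needs_rebuild?(dst, [src])                    # up to date: skip the task
  FileUtils.touch(src, mtime: File.mtime(dst) + 1)   # source edited afterwards
  puts needs_rebuild?(dst, [src])                    # stale again: rerun the task
end
```

Explicit modification times are set with `FileUtils.touch` so the sketch does not depend on the file system's timestamp resolution.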

Page 32

Grid Explorer (GXP): grid and cluster shell

◦ http://www.logos.ic.i.u-tokyo.ac.jp/gxp/

◦ written in Python.

GXP Make

◦ GNU Make-based workflow system

◦ Distributed & Parallel execution

Page 33

Makefile

◦ same input files

◦ same tasks

◦ executed repeatedly

Scientific workflows have different characteristics.

Page 34

Same workflow for different files

◦ A "rule" may help, but it is not enough.

(Diagram: file set 1 (file, file, file) and file set 2 (file, file, file) each go through the same workflow, producing result 1 and result 2.)

Page 35

Task dependencies rely on:

◦ not only file names

◦ but also parameters, e.g. geometry

(Diagram: files A, B, C, and D, each with its own task.)

Page 36

The entire workflow is unknown at first.

The result of a task affects:

◦ its output files

◦ the tasks that follow

(Diagram: a task produces file1, file2, and file3, which are processed by further tasks.)

Page 37

Create a Makefile during Make execution

◦ a tricky way

Makefile:

Makefile.sub: prerequisite
	awk -f hoge.awk $< > $@

target: Makefile.sub
	make -f $<

Makefile.sub (created by the first rule, then invoked by the second):

target1: source1
target2: source2

Page 38

Scientific workflows require a powerful and flexible definition language.

You probably know the solution.

◦ What is it?

Page 40

Build tool

Internal DSL

Programming power of Ruby

Page 41

file "file2" => "file1" do
  sh "program file1 > file2"
end

(Diagram: file1 → program → file2)

Page 42

for x in LIST
  file x[1] => x[0] do |t|
    sh "your_program …"
  end
end
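The loop above can be run as an ordinary Ruby script by loading Rake as a library. In this sketch the file names are hypothetical, and a Ruby block that upper-cases the input stands in for `sh "your_program …"` so that no external program is needed.

```ruby
require "rake"
require "tmpdir"

extend Rake::DSL # top-level `file` and `task`, as in a Rakefile

results = nil
Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    # Hypothetical [input, output] pairs standing in for LIST.
    list = [%w[a.in a.out], %w[b.in b.out], %w[c.in c.out]]
    list.each { |input, _| File.write(input, "#{input}\n") }

    for x in list
      file x[1] => x[0] do |t|
        # t.name is the output file and t.prerequisites[0] the input file.
        File.write(t.name, File.read(t.prerequisites[0]).upcase)
      end
    end

    task default: list.map { |_, output| output }
    Rake::Task[:default].invoke
    results = list.map { |_, output| File.read(output) }
  end
end
p results
```

The action reads its file names from `t` rather than from `x` on purpose: a `for` loop does not create a new scope, so by the time the actions run, `x` refers to the last element of the list. Writing `list.each do |x|` avoids that pitfall.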

Page 43

How do you write this with Rake?

(Diagram: a task produces file1, file2, and file3, which are processed by further tasks.)

Page 44

task :A do
  task :B do
    puts "B"
  end
end

task :default => :A

No task depends on task B, so B is defined but never executed.

Page 45

task :A do
  b = task :B do
    puts "B"
  end
  b.invoke
end

task :default => :A

Rake::Task#invoke

Invoke task B immediately after its definition.
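`Rake::Task#invoke` is what makes a workflow whose shape is only known at run time expressible. In the sketch below, runnable as a plain Ruby script with Rake loaded as a library, task A defines one task per item it "discovers" and invokes each immediately; the item names are hypothetical, standing in for files produced by an earlier step.

```ruby
require "rake"

extend Rake::DSL # top-level `task`, as in a Rakefile

log = []

task :A do
  # Pretend this list is only known once A runs, e.g. the names of files
  # written by a previous step (hypothetical names).
  discovered = %w[part1 part2 part3]

  discovered.each do |name|
    t = task(name) { log << name } # define a new task at run time...
    t.invoke                       # ...and run it right away
  end
end

task default: :A
Rake::Task[:default].invoke
p log
```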

Page 46

multitask

◦ Rake built-in feature

◦ Parallelizes the prerequisite tasks of a multitask

◦ using Ruby threads

Problem

◦ No control over the number of threads.

◦ All the prerequisite tasks are invoked at the same time.

Page 47

dRake: http://drake.rubyforge.org/

Specify the number of threads.

All independent tasks are automatically parallelized.

◦ multitask is not necessary

Page 48

Find task

(Diagram: available tasks are found in the task graph (A to F) and sent through a queue to worker threads 1 and 2; finished tasks come back through a queue from the workers.)
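The diagram can be sketched as a small scheduler in Ruby. This is not dRake's code; `run_workflow`, the graph, and the policy of enqueuing a task as soon as its last prerequisite finishes are assumptions made for illustration. The main thread owns all bookkeeping, and the two queues are the only points of contact with the workers.

```ruby
# Dependency-driven parallel execution with a queue to the workers and a
# queue from the workers. Errors raised by the work block are not handled
# here, so a failing task would leave the main thread waiting.
def run_workflow(prereqs, workers: 2)
  to_workers = Queue.new
  from_workers = Queue.new
  remaining = prereqs.transform_values(&:uniq) # task => unfinished prerequisites
  dependents = Hash.new { |h, k| h[k] = [] }   # prerequisite => tasks waiting on it
  prereqs.each { |task, reqs| reqs.uniq.each { |r| dependents[r] << task } }

  threads = Array.new(workers) do |i|
    Thread.new do
      while (task = to_workers.pop) # nil once the queue is closed and empty
        yield task, i               # the actual work for one task
        from_workers << task        # report completion to the main thread
      end
    end
  end

  in_flight = 0
  enqueue = lambda do |task|
    to_workers << task
    in_flight += 1
  end
  remaining.each { |task, reqs| enqueue.call(task) if reqs.empty? }

  finished = []
  while in_flight > 0
    done = from_workers.pop
    in_flight -= 1
    finished << done
    dependents[done].each do |task|
      remaining[task].delete(done)
      enqueue.call(task) if remaining[task].empty?
    end
  end

  to_workers.close
  threads.each(&:join)
  raise "unreachable tasks (cycle or missing prerequisite)" unless finished.size == prereqs.size

  finished
end

# Hypothetical graph using the letters from the diagram: A needs B and C,
# and C needs D, E, and F.
graph = {
  "A" => %w[B C], "B" => [], "C" => %w[D E F],
  "D" => [], "E" => [], "F" => []
}
lock = Mutex.new
order = run_workflow(graph, workers: 2) do |task, worker|
  lock.synchronize { puts "worker thread #{worker + 1} runs #{task}" }
  sleep 0.01 # stand-in for launching a real process
end
p order
```

Because the number of worker threads is fixed, this avoids the multitask problem of starting every prerequisite at once.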

Page 49

Remote process execution

Dynamic task definition

◦ dRake does not allow the "invoke" method.

Performance issue

Page 50

Need a powerful scientific workflow tool

Existing

◦ Rake : Powerful for writing workflow

◦ dRake : Parallel execution

Missing

◦ Remote Process Invocation

◦ Scalability

Page 51

(Diagram: Rake with the Pwrake extension reads the Rakefile and launches processes, which access files on the Gfarm file system.)

Page 53

Parallel + distributed workflow extension for Rake

Repository:

◦ http://github.com/masa16/pwrake

Page 54

Same syntax as Rake

Parallelizes task and file tasks

◦ no multitask needed

Replaces the "sh" method

◦ invokes processes through SSH

Scalability
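Replacing `sh` can be sketched as a method that runs a command line on a remote host through the standard OpenSSH client. This is not Pwrake's SSH class, which the talk describes as an original implementation; `ssh_argv` and `remote_sh` are hypothetical names, and the sketch assumes password-less (key-based) SSH login to the worker hosts.

```ruby
require "open3"
require "shellwords"

# Hypothetical helper: argv for running `command` on `host` with the OpenSSH
# client. An Array command is shell-quoted; a String is passed unchanged, so
# redirections such as "> file2" are interpreted by the remote shell.
# BatchMode=yes makes ssh fail instead of prompting for a password.
def ssh_argv(host, command)
  remote = command.is_a?(Array) ? command.shelljoin : command
  ["ssh", "-o", "BatchMode=yes", host, remote]
end

# Hypothetical stand-in for Rake's `sh` that runs the command remotely,
# raises on failure, and returns the command's standard output.
def remote_sh(host, command)
  out, err, status = Open3.capture3(*ssh_argv(host, command))
  raise "#{host}: command failed (exit #{status.exitstatus}): #{err}" unless status.success?

  out
end
```

A call such as `remote_sh("node01", "program file1 > file2")` (with a hypothetical host name) would write `file2` on `node01`. This naive version opens a new SSH connection for every command, so each task pays the connection setup cost.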

Page 55

Why SSH

◦ Secure

◦ The SSH port is probably open

SSH class for Pwrake

◦ Original implementation

◦ Performance issues

Page 56

Worker threads in Ruby

Ruby threads use a single core

◦ because of the GVL (Global VM Lock)

Processes launched by sh can use multiple cores.

for x in LIST
  file x[1] => x[0] do |t|
    sh "your_program …"   # the processes started here can use multiple cores
  end
end
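The point about the GVL can be checked with a small experiment: several Ruby threads each start a child process and wait for it. Waiting for a child releases the GVL, so the children run at the same time. The sketch uses `sleep` in the children to keep the timing predictable; CPU-bound children behave the same way with respect to the GVL, because each is a separate OS process that the kernel can schedule on its own core. The method name `run_children` is illustrative.

```ruby
require "rbconfig"

# Start `count` child Ruby processes from `count` threads; each child sleeps
# for `seconds`. A thread blocked in Process.wait2 releases the GVL, so the
# children run concurrently and the total wall-clock time is close to that
# of a single child, not `count` times as long.
def run_children(count, seconds)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  threads = Array.new(count) do
    Thread.new do
      pid = Process.spawn(RbConfig.ruby, "-e", "sleep #{seconds}")
      _, status = Process.wait2(pid)
      raise "child process failed" unless status.success?
    end
  end
  threads.each(&:join)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

elapsed = run_children(4, 0.5)
printf("4 children sleeping 0.5 s each took %.2f s in total\n", elapsed)
```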

Page 57

(Chart: Pwrake compared with dRake; Pwrake is 10× faster.)

Page 58

Our approach: use a distributed file system

◦ file sharing

◦ consistent file timestamps

◦ I/O performance

Page 59

(Diagram: parallel processing with a Network File System versus a Distributed File System. With NFS, all CPUs access file1, file2, and file3 on a single storage server, so storage I/O becomes the bottleneck. With a distributed file system, each CPU has its own storage, holding file1, file2, or file3.)

Page 61

Gfarm: wide-area distributed file system

Global namespace to federate storage

Main developer: Prof. Osamu Tatebe

Open source development
◦ http://datafarm.apgrid.org/

(Diagram: the Gfarm file system federates the local storage of computer nodes across the Internet into one namespace: / contains /dir1 (file1, file2) and /dir2 (file3, file4).)

Page 62

Use local I/O for performance

Assign tasks based on file locality

Implement this as a function of Pwrake

(Diagram: file system nodes with local storage holding file1, file2, file3, and file4; jobs for file 1 and file 3 are fast when they run on the node that stores their file and slow when they run on another node.)

Page 63

Locality-aware task assignment for Gfarm

(Diagram: Task1, Task2, and Task3 are pushed with a hostname into the AffinityQueue, which keeps one queue per host (host1, host2, host3, …); worker threads 1 to 3 pop with their own hostname.)
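A locality-aware queue of this kind can be sketched in Ruby. Only the class name and the push/pop-with-hostname idea come from the slide; the method signatures and the fallback of taking a task from another host's queue when the local one is empty are assumptions, not Pwrake's actual policy.

```ruby
# Tasks are pushed together with the host that stores their input file, and
# a worker pops with its own hostname. A local task is preferred; otherwise
# the worker takes a task queued for another host so it does not sit idle.
class AffinityQueue
  def initialize
    @queues = Hash.new { |h, host| h[host] = [] }
    @lock = Mutex.new
  end

  def push(task, host)
    @lock.synchronize { @queues[host] << task }
    self
  end

  # Returns [task, local?] or nil when every queue is empty (non-blocking).
  def pop(host)
    @lock.synchronize do
      local = @queues.fetch(host, [])
      return [local.shift, true] unless local.empty?

      _, other = @queues.find { |_, q| !q.empty? }
      other ? [other.shift, false] : nil
    end
  end
end

q = AffinityQueue.new
q.push("Task1", "host1").push("Task2", "host2").push("Task3", "host1")
p q.pop("host1") # a task whose file is on host1
p q.pop("host3") # nothing queued for host3, so a remote task is taken
```

Returning the `local?` flag makes it easy to count how often a worker had to read its input over the network.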

Page 64

(Chart: measurements on 1 node/4 cores, 2 nodes/8 cores, 4 nodes/16 cores, and 8 nodes/32 cores.)

NFS does not scale.

Page 65

(Chart: measurements on 1 node/4 cores, 2 nodes/8 cores, 4 nodes/16 cores, and 8 nodes/32 cores.)

Gfarm scales.

20% speedup using locality

Page 66

Montage workflow

Page 67

Geographically distributed workflow

Fault tolerance

Page 68

Rake

◦ is powerful enough to serve as a scientific workflow definition language.

Pwrake

◦ Parallel and Distributed Workflow extension for Rake

Gfarm

◦ for scalable I/O performance

Page 69

Pwrake site

◦ https://github.com/masa16/pwrake

Questions?
