
1

Partitioning and Scheduling Workflows across Multiple Sites with Storage Constraints

Authors: Weiwei Chen, Ewa Deelman

9th International Conference on Parallel Processing and Applied Mathematics

2

Introduction
Related work
System design
Experiments and Evaluations
Conclusions

Outline

3

Introduction
Related work
System design
Experiments and Evaluations
Conclusions

Outline

4

In recent years, scientific workflows have been widely applied in astronomy, seismology, genomics, etc.

This paper aims to address the problem of scheduling large workflows onto multiple execution sites with storage constraints.

We model workflows as Directed Acyclic Graphs (DAGs), where nodes represent computation and directed edges represent data flow dependencies between nodes.

Introduction
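To make the DAG model concrete, here is a minimal sketch in Python (illustrative only, not the authors' implementation); the Job/Workflow names and the example numbers are hypothetical.

# A minimal sketch of the workflow model described above: jobs are DAG nodes,
# and directed edges carry the size of the data passed between jobs.
from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    runtime: float                                   # estimated runtime in seconds
    children: dict = field(default_factory=dict)     # child name -> data size (GB)
    parents: dict = field(default_factory=dict)      # parent name -> data size (GB)

class Workflow:
    def __init__(self):
        self.jobs = {}

    def add_job(self, name, runtime):
        self.jobs[name] = Job(name, runtime)

    def add_edge(self, parent, child, data_gb):
        # a directed edge parent -> child labelled with the transferred data size
        self.jobs[parent].children[child] = data_gb
        self.jobs[child].parents[parent] = data_gb

# Example: a tiny fan-out / fan-in structure
wf = Workflow()
for name, rt in [("split", 10), ("work1", 60), ("work2", 60), ("merge", 20)]:
    wf.add_job(name, rt)
wf.add_edge("split", "work1", 2.0)
wf.add_edge("split", "work2", 2.0)
wf.add_edge("work1", "merge", 1.5)
wf.add_edge("work2", "merge", 1.5)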

5

control or data dependencies between jobs

the mapping of jobs in the workflow onto resources that are often distributed in the wide area

data-intensive workflows that require a significant amount of storage
◦ the entire CyberShake earthquake science workflow has 16,000 sub-workflows; each sub-workflow has more than 24,000 individual jobs and requires 58 GB of data.

Problems

6

Introduction
Related work
System design
Experiments and Evaluations
Conclusions

Outline

7

Heuristic scheduling
◦ HEFT, Min-Min, Max-Min, MCT

These algorithms do not take storage constraints into consideration, and they need to examine and schedule every job individually.

Workflow partitioning can be classified as a network cut problem, where a sub-workflow is viewed as a sub-graph.

Related work

8

Introduction
Related work
System design
Experiments and Evaluations
Conclusions

Outline

9

The site catalog provides information about the available resources.

System design

10

Partitioning reduces the complexity of the workflow mapping.
◦ For example, the entire CyberShake workflow has more than 3.8 × 10^8 tasks, which is a large number for workflow management tools. In contrast, each sub-workflow has 24,000 tasks, which is acceptable for workflow management tools.

Why partition?

11

The major challenge in partitioning workflows is to avoid cross dependency: a chain of dependencies that forms a cycle in the graph (in this case, a cycle between sub-workflows).

Cross dependency

(figure: a cross dependency between two sub-workflows forms a deadlock loop)
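As an illustration of the cross-dependency check (a sketch of the idea, not the paper's algorithm), a partition is rejected if the graph induced on the sub-workflows contains a cycle; such a cycle is exactly the deadlock loop shown above. The function and variable names are hypothetical.

# Illustrative check: a partition is valid only if the graph induced on the
# sub-workflows is acyclic.
def has_cross_dependency(edges, assignment):
    """edges: list of (parent_job, child_job); assignment: job -> sub-workflow id."""
    # Build the induced graph between sub-workflows.
    quotient = {}
    for parent, child in edges:
        p, c = assignment[parent], assignment[child]
        if p != c:
            quotient.setdefault(p, set()).add(c)

    # Kahn's algorithm: if the sub-workflows cannot be topologically ordered,
    # the induced graph has a cycle, i.e. a cross dependency.
    nodes = set(assignment.values())
    indeg = {n: 0 for n in nodes}
    for src, dsts in quotient.items():
        for d in dsts:
            indeg[d] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while ready:
        n = ready.pop()
        seen += 1
        for d in quotient.get(n, ()):
            indeg[d] -= 1
            if indeg[d] == 0:
                ready.append(d)
    return seen != len(nodes)

# Jobs a->b and c->d split so the two sub-workflows wait on each other -> True.
print(has_cross_dependency([("a", "b"), ("c", "d")],
                           {"a": 0, "b": 1, "c": 1, "d": 0}))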

12

Usually jobs that have parent-child relationships share a lot of data since they have data dependencies.

Three heuristics are proposed to first partition the workflow into sub-workflows.

Partitioner

13

Our heuristic only checks three particular types of nodes:
◦ fan-out: the output of a job is input to many children
◦ fan-in: the output of several jobs is aggregated by a child
◦ pipeline: one parent, one child

Our algorithm reduces the time complexity of the check operations by a factor of n, where n is the average depth of the fan-in/fan-out structure.

Partitioner
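A small sketch of how the three node types can be identified from a job's in-degree and out-degree (an illustration with hypothetical names, not the authors' code).

# Classify the node types checked by the partitioner, using plain adjacency dicts:
# parents[j] and children[j] map a job to its neighbours.
def classify(job, parents, children):
    n_in = len(parents.get(job, ()))
    n_out = len(children.get(job, ()))
    if n_out > 1:
        return "fan-out"      # output of one job is input to many children
    if n_in > 1:
        return "fan-in"       # outputs of several jobs are aggregated by one child
    if n_in == 1 and n_out == 1:
        return "pipeline"     # one parent, one child
    return "other"            # e.g. entry or exit jobs

children = {"split": ["w1", "w2"], "w1": ["merge"], "w2": ["merge"], "merge": []}
parents  = {"split": [], "w1": ["split"], "w2": ["split"], "merge": ["w1", "w2"]}
for j in children:
    print(j, classify(j, parents, children))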

14

Aggressive search
◦ checks whether it is possible to add the whole fan structure to the sub-workflow

Less-aggressive search
◦ performed on the job's parents; it includes all of its predecessors until the search reaches a fan-out job.

Partitioner

15

Conservative search
◦ includes all of its predecessors until the search reaches a fan-in job or a fan-out job.

Partitioner
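The three search strategies can be read as backward traversals with different stopping rules; the sketch below is one interpretation of the slides (with hypothetical helper names), not the authors' implementation.

# Interpretation of the three search scopes as backward traversals from a candidate job.
# parents maps job -> list of parents; classify(j) returns "fan-in", "fan-out",
# "pipeline", or "other".
def backward_scope(job, parents, classify, stop_types):
    """Collect predecessors of `job`, stopping the walk at jobs whose type is in
    `stop_types` (the stopping job itself is still included)."""
    scope, stack = set(), [job]
    while stack:
        j = stack.pop()
        if j in scope:
            continue
        scope.add(j)
        if j != job and classify(j) in stop_types:
            continue                      # do not expand past this job
        stack.extend(parents.get(j, ()))
    return scope

def aggressive_scope(fan_structure_jobs):
    # aggressive search: try to place the whole fan structure at once
    return set(fan_structure_jobs)

def less_aggressive_scope(job, parents, classify):
    # walk back until a fan-out job is reached
    return backward_scope(job, parents, classify, stop_types={"fan-out"})

def conservative_scope(job, parents, classify):
    # walk back until a fan-in or a fan-out job is reached
    return backward_scope(job, parents, classify, stop_types={"fan-out", "fan-in"})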

16

We assume that the size of each input file and output file is known.

Heuristic Ⅰ

17

Adds a job to a sub-workflow if all of its unscheduled children can be added to that sub-workflow without causing cross dependencies or exceeding the storage constraint.

Heuristic Ⅱ
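A sketch of the kind of feasibility check this heuristic performs when deciding whether a job can join a sub-workflow (illustrative only; the cross-dependency predicate and the data-size accounting are simplified and the names are hypothetical).

# A job is added only if the sub-workflow's accumulated data size stays within the
# storage constraint and no cross dependency is introduced.
def can_add_job(job, subwf, data_size, storage_limit, creates_cross_dependency):
    """job: job id; subwf: dict with keys 'jobs' (set) and 'data_gb' (float);
    data_size[j]: data footprint of job j in GB;
    creates_cross_dependency(job, subwf): externally supplied predicate."""
    if subwf["data_gb"] + data_size[job] > storage_limit:
        return False                     # would exceed the storage constraint
    if creates_cross_dependency(job, subwf):
        return False                     # would form a cycle between sub-workflows
    return True

def add_job(job, subwf, data_size):
    # commit the job and account for its data footprint
    subwf["jobs"].add(job)
    subwf["data_gb"] += data_size[job]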

18

1. For a job with multiple children, each child has already been scheduled.

2. After adding this job to the sub-workflow, the data size does not exceed the storage constraint.

Heuristic Ⅲ

19

Critical Path: the longest path through the sub-workflow, weighted by the runtime of each job.

Average CPU Time: the cumulative CPU time of all jobs divided by the number of available resources.

HEFT estimator: uses the calculated earliest finish time of the last sink job as the makespan of the sub-workflow.

Estimator
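The Critical Path and Average CPU Time estimators can be written down directly; below is a minimal sketch assuming job runtimes and the DAG edges are known (the HEFT-based estimate is omitted, and the example numbers are hypothetical).

# Sketch of the two simpler makespan estimators.
def critical_path(runtime, children):
    """Longest runtime-weighted path through the sub-workflow DAG."""
    memo = {}
    def longest_from(job):
        if job not in memo:
            memo[job] = runtime[job] + max(
                (longest_from(c) for c in children.get(job, ())), default=0.0)
        return memo[job]
    return max(longest_from(j) for j in runtime)

def average_cpu_time(runtime, num_resources):
    """Cumulative CPU time of all jobs divided by the number of available resources."""
    return sum(runtime.values()) / num_resources

runtime  = {"split": 10, "w1": 60, "w2": 60, "merge": 20}
children = {"split": ["w1", "w2"], "w1": ["merge"], "w2": ["merge"], "merge": []}
print(critical_path(runtime, children))        # 10 + 60 + 20 = 90
print(average_cpu_time(runtime, 2))            # 150 / 2 = 75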

20

Re-ordering: the partitioning step has already guaranteed that there is a valid mapping.

Scheduling algorithms: HEFT, Min-min. There are two differences compared to their original versions:
◦ First, the data transfer cost within a sub-workflow is ignored, since we use a shared file system in our experiments.
◦ Second, the data constraints must be satisfied for each sub-workflow.

Scheduler
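A simplified Min-min-style sketch of mapping sub-workflows to execution sites under storage constraints (an interpretation of the description above with hypothetical names, not the authors' scheduler); the completion-time estimate would come from one of the estimators on the previous slide.

# Min-min-style mapping of sub-workflows to sites under storage constraints.
def schedule_subworkflows(subwfs, sites, estimate):
    """subwfs: dict name -> data size in GB; sites: dict name -> free storage in GB;
    estimate(subwf, site): predicted completion time of subwf on site.
    Returns a mapping subwf -> site."""
    mapping, unscheduled = {}, set(subwfs)
    while unscheduled:
        best = None
        # pick the (sub-workflow, site) pair with the minimum estimated completion time
        for swf in unscheduled:
            for site, free_gb in sites.items():
                if subwfs[swf] > free_gb:
                    continue                      # storage constraint must hold
                t = estimate(swf, site)
                if best is None or t < best[0]:
                    best = (t, swf, site)
        if best is None:
            raise RuntimeError("no feasible site: storage constraints violated")
        _, swf, site = best
        mapping[swf] = site
        sites[site] -= subwfs[swf]                # reserve the storage on that site
        unscheduled.remove(swf)
    return mapping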

21

Introduction
Related work
System design
Experiments and Evaluations
Conclusions

Outline

22

Eucalyptus [14]: infrastructure software that provides on-demand access to Virtual Machine (VM) resources.

The submit host, which performs workflow planning and sends jobs to the execution sites, is a Linux 2.6 machine with 8 GB of RAM and an Intel 2.66 GHz quad-core CPU.

14. Eucalyptus Systems. http://www.eucalyptus.com/

Experiments and Evaluations

23

We use Condor [6] pools as execution sites. HTCondor is a specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, HTCondor provides a job queueing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management.

6. M. Litzkow, M. Livny, et al., Condor—A Hunter of Idle Workstations. In Proceedings of the 8th International Conference on Distributed Computing Systems, New York, June 1988.

Experiments and Evaluations

24

Performance Metrics
◦ Satisfying the Storage Constraints
◦ Improving the Runtime Performance

Experiments and Evaluations

25

Workflows Used
◦ Montage: an astronomy application used to construct large image mosaics of the sky.
◦ CyberShake: a seismology application that calculates Probabilistic Seismic Hazard curves for several geographic sites in the Southern California area.
◦ Epigenomics: a bioinformatics application that maps short DNA segments collected with high-throughput gene sequencing machines to a reference genome.

Experiments and Evaluations

26

They were chosen because they represent a wide range of application domains and a variety of resource requirements.
◦ Montage: I/O intensive
◦ CyberShake: memory intensive
◦ Epigenomics: CPU intensive

Experiments and Evaluations

27

Storage constraint: 30 GB

The default workflow has no storage constraint.

Experiments and Evaluations

28

Experiments and Evaluations

29

Performance with Different Storage Constraints

Experiments and Evaluations

30

CyberShake

Experiments and Evaluations

31

Montage

Experiments and Evaluations

32

Epigenomics

Experiments and Evaluations

33

Experiments and Evaluations

34

The performance with the three workflows shows that this approach is able to satisfy the storage constraints and reduce the makespan significantly, especially for Epigenomics, which has fewer fan-in (synchronization) jobs.

For the workflows we used, scheduling them onto two or three execution sites is best due to a tradeoff between increased data transfer and increased parallelism.

Experiments and Evaluations

35

◦ The Average CPU Time doesn’t take the dependencies into consideration.

◦ The Critical Path doesn’t consider the resource availability.

Experiments and Evaluations

36

Introduction
Related work
System design
Experiments and Evaluations
Conclusions

Outline

37

Three heuristics are proposed and compared to show the close relationship between cross dependency and runtime improvement.

The performance with three real-world workflows shows that this approach is able to satisfy storage constraints and improve the overall runtime by up to 48% over a default whole-workflow scheduling.

Conclusions

38

Thank you for listening.