
1

Partitioning and Scheduling Workflows across Multiple Sites with Storage Constraints

Authors: Weiwei Chen, Ewa Deelman

9th International Conference on Parallel Processing and Applied Mathematics

2

Introduction
Related work
System design
Experiments and Evaluations
Conclusions

Outline

3

Introduction
Related work
System design
Experiments and Evaluations
Conclusions

Outline

4

In recent years, scientific workflows have been widely applied in astronomy, seismology, genomics, etc.

This paper aims to address the problem of scheduling large workflows onto multiple execution sites with storage constraints.

We model workflows as Directed Acyclic Graphs (DAGs), where nodes represent computation and directed edges represent data flow dependencies between nodes.

Introduction
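To make the DAG model concrete, here is a minimal sketch in Python (illustrative only, not the authors' implementation); the Job/Workflow names and the example numbers are hypothetical.

# A minimal sketch of the workflow model described above: jobs are DAG nodes,
# and directed edges carry the size of the data passed between jobs.
from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    runtime: float                                   # estimated runtime in seconds
    children: dict = field(default_factory=dict)     # child name -> data size (GB)
    parents: dict = field(default_factory=dict)      # parent name -> data size (GB)

class Workflow:
    def __init__(self):
        self.jobs = {}

    def add_job(self, name, runtime):
        self.jobs[name] = Job(name, runtime)

    def add_edge(self, parent, child, data_gb):
        # a directed edge parent -> child labelled with the transferred data size
        self.jobs[parent].children[child] = data_gb
        self.jobs[child].parents[parent] = data_gb

# Example: a tiny fan-out / fan-in structure
wf = Workflow()
for name, rt in [("split", 10), ("work1", 60), ("work2", 60), ("merge", 20)]:
    wf.add_job(name, rt)
wf.add_edge("split", "work1", 2.0)
wf.add_edge("split", "work2", 2.0)
wf.add_edge("work1", "merge", 1.5)
wf.add_edge("work2", "merge", 1.5)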

5

control or data dependencies between jobs

the mapping of jobs in the workflow onto resources that are often distributed in the wide area

data-intensive workflows that require a significant amount of storage
◦ the entire CyberShake earthquake science workflow has 16,000 sub-workflows; each sub-workflow has more than 24,000 individual jobs and requires 58 GB of data.

Problems

6

Introduction
Related work
System design
Experiments and Evaluations
Conclusions

Outline

7

Heuristic scheduling
◦ HEFT, Min-Min, Max-Min, MCT

These algorithms do not take storage constraints into consideration, and they need to examine and schedule every job individually.

Workflow partitioning can be classified as a network cut problem, where a sub-workflow is viewed as a sub-graph.

Related work

8

Introduction
Related work
System design
Experiments and Evaluations
Conclusions

Outline

9

The site catalog provides information about the available resources.

System design

10

Partitioning reduces the complexity of the workflow mapping.
◦ For example, the entire CyberShake workflow has more than 3.8 × 10^8 tasks, which is a large number for workflow management tools. In contrast, each sub-workflow has 24,000 tasks, which is acceptable for workflow management tools.

Why partition?

11

The major challenge in partitioning workflows is to avoid cross dependency: a chain of dependencies that forms a cycle in the graph (in this case, a cycle between sub-workflows).

Cross dependency

(figure: a cross dependency between two sub-workflows forms a deadlock loop)
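As an illustration of the cross-dependency check (a sketch of the idea, not the paper's algorithm), a partition is rejected if the graph induced on the sub-workflows contains a cycle; such a cycle is exactly the deadlock loop shown above. The function and variable names are hypothetical.

# Illustrative check: a partition is valid only if the graph induced on the
# sub-workflows is acyclic.
def has_cross_dependency(edges, assignment):
    """edges: list of (parent_job, child_job); assignment: job -> sub-workflow id."""
    # Build the induced graph between sub-workflows.
    quotient = {}
    for parent, child in edges:
        p, c = assignment[parent], assignment[child]
        if p != c:
            quotient.setdefault(p, set()).add(c)

    # Kahn's algorithm: if the sub-workflows cannot be topologically ordered,
    # the induced graph has a cycle, i.e. a cross dependency.
    nodes = set(assignment.values())
    indeg = {n: 0 for n in nodes}
    for src, dsts in quotient.items():
        for d in dsts:
            indeg[d] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while ready:
        n = ready.pop()
        seen += 1
        for d in quotient.get(n, ()):
            indeg[d] -= 1
            if indeg[d] == 0:
                ready.append(d)
    return seen != len(nodes)

# Jobs a->b and c->d split so the two sub-workflows wait on each other -> True.
print(has_cross_dependency([("a", "b"), ("c", "d")],
                           {"a": 0, "b": 1, "c": 1, "d": 0}))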

12

Usually jobs that have parent-child relationships share a lot of data since they have data dependencies.

Three heuristics are proposed to first partition the workflow into sub-workflows.

Partitioner

13

Our heuristic only checks three particular types of nodes:
◦ fan-out: the output of a job is input to many children
◦ fan-in: the output of several jobs is aggregated by a child
◦ pipeline: one parent, one child

Our algorithm reduces the time complexity of the check operations by a factor of n, where n is the average depth of the fan-in/fan-out structure.

Partitioner
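A small sketch of how the three node types can be identified from a job's in-degree and out-degree (an illustration with hypothetical names, not the authors' code).

# Classify the node types checked by the partitioner, using plain adjacency dicts:
# parents[j] and children[j] map a job to its neighbours.
def classify(job, parents, children):
    n_in = len(parents.get(job, ()))
    n_out = len(children.get(job, ()))
    if n_out > 1:
        return "fan-out"      # output of one job is input to many children
    if n_in > 1:
        return "fan-in"       # outputs of several jobs are aggregated by one child
    if n_in == 1 and n_out == 1:
        return "pipeline"     # one parent, one child
    return "other"            # e.g. entry or exit jobs

children = {"split": ["w1", "w2"], "w1": ["merge"], "w2": ["merge"], "merge": []}
parents  = {"split": [], "w1": ["split"], "w2": ["split"], "merge": ["w1", "w2"]}
for j in children:
    print(j, classify(j, parents, children))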

14

Aggressive search
◦ checks whether it is possible to add the whole fan structure to the sub-workflow

Less-aggressive search
◦ performed on the job's parents; it includes all of its predecessors until the search reaches a fan-out job.

Partitioner

15

Conservative search
◦ includes all of its predecessors until the search reaches a fan-in job or a fan-out job.

Partitioner
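The three search strategies can be read as backward traversals with different stopping rules; the sketch below is one interpretation of the slides (with hypothetical helper names), not the authors' implementation.

# Interpretation of the three search scopes as backward traversals from a candidate job.
# parents maps job -> list of parents; classify(j) returns "fan-in", "fan-out",
# "pipeline", or "other".
def backward_scope(job, parents, classify, stop_types):
    """Collect predecessors of `job`, stopping the walk at jobs whose type is in
    `stop_types` (the stopping job itself is still included)."""
    scope, stack = set(), [job]
    while stack:
        j = stack.pop()
        if j in scope:
            continue
        scope.add(j)
        if j != job and classify(j) in stop_types:
            continue                      # do not expand past this job
        stack.extend(parents.get(j, ()))
    return scope

def aggressive_scope(fan_structure_jobs):
    # aggressive search: try to place the whole fan structure at once
    return set(fan_structure_jobs)

def less_aggressive_scope(job, parents, classify):
    # walk back until a fan-out job is reached
    return backward_scope(job, parents, classify, stop_types={"fan-out"})

def conservative_scope(job, parents, classify):
    # walk back until a fan-in or a fan-out job is reached
    return backward_scope(job, parents, classify, stop_types={"fan-out", "fan-in"})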

16

We assume that the size of each input file and output file is known.

Heuristic Ⅰ

17

Adds a job to a sub-workflow if all of its unscheduled children can be added to that sub-workflow without causing cross dependencies or exceeding the storage constraint.

Heuristic Ⅱ
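A sketch of the kind of feasibility check this heuristic performs when deciding whether a job can join a sub-workflow (illustrative only; the cross-dependency predicate and the data-size accounting are simplified and the names are hypothetical).

# A job is added only if the sub-workflow's accumulated data size stays within the
# storage constraint and no cross dependency is introduced.
def can_add_job(job, subwf, data_size, storage_limit, creates_cross_dependency):
    """job: job id; subwf: dict with keys 'jobs' (set) and 'data_gb' (float);
    data_size[j]: data footprint of job j in GB;
    creates_cross_dependency(job, subwf): externally supplied predicate."""
    if subwf["data_gb"] + data_size[job] > storage_limit:
        return False                     # would exceed the storage constraint
    if creates_cross_dependency(job, subwf):
        return False                     # would form a cycle between sub-workflows
    return True

def add_job(job, subwf, data_size):
    # commit the job and account for its data footprint
    subwf["jobs"].add(job)
    subwf["data_gb"] += data_size[job]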

18

1. For a job with multiple children, each child has already been scheduled.

2. After adding this job to the sub-workflow, the data size does not exceed the storage constraint.

Heuristic Ⅲ

19

Critical Path: the longest path through the sub-workflow, weighted by the runtime of each job.

Average CPU Time: the cumulative CPU time of all jobs divided by the number of available resources.

HEFT estimator: uses the calculated earliest finish time of the last sink job as the makespan of the sub-workflow.

Estimator
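The Critical Path and Average CPU Time estimators can be written down directly; below is a minimal sketch assuming job runtimes and the DAG edges are known (the HEFT-based estimate is omitted, and the example numbers are hypothetical).

# Sketch of the two simpler makespan estimators.
def critical_path(runtime, children):
    """Longest runtime-weighted path through the sub-workflow DAG."""
    memo = {}
    def longest_from(job):
        if job not in memo:
            memo[job] = runtime[job] + max(
                (longest_from(c) for c in children.get(job, ())), default=0.0)
        return memo[job]
    return max(longest_from(j) for j in runtime)

def average_cpu_time(runtime, num_resources):
    """Cumulative CPU time of all jobs divided by the number of available resources."""
    return sum(runtime.values()) / num_resources

runtime  = {"split": 10, "w1": 60, "w2": 60, "merge": 20}
children = {"split": ["w1", "w2"], "w1": ["merge"], "w2": ["merge"], "merge": []}
print(critical_path(runtime, children))        # 10 + 60 + 20 = 90
print(average_cpu_time(runtime, 2))            # 150 / 2 = 75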

20

Re-ordering: the partitioning step has already guaranteed that there is a valid mapping.

Scheduling algorithms: HEFT, Min-min. There are two differences compared to their original versions:
◦ First, the data transfer cost within a sub-workflow is ignored, since we use a shared file system in our experiments.
◦ Second, the data constraints must be satisfied for each sub-workflow.

Scheduler
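A simplified Min-min-style sketch of mapping sub-workflows to execution sites under storage constraints (an interpretation of the description above with hypothetical names, not the authors' scheduler); the completion-time estimate would come from one of the estimators on the previous slide.

# Min-min-style mapping of sub-workflows to sites under storage constraints.
def schedule_subworkflows(subwfs, sites, estimate):
    """subwfs: dict name -> data size in GB; sites: dict name -> free storage in GB;
    estimate(subwf, site): predicted completion time of subwf on site.
    Returns a mapping subwf -> site."""
    mapping, unscheduled = {}, set(subwfs)
    while unscheduled:
        best = None
        # pick the (sub-workflow, site) pair with the minimum estimated completion time
        for swf in unscheduled:
            for site, free_gb in sites.items():
                if subwfs[swf] > free_gb:
                    continue                      # storage constraint must hold
                t = estimate(swf, site)
                if best is None or t < best[0]:
                    best = (t, swf, site)
        if best is None:
            raise RuntimeError("no feasible site: storage constraints violated")
        _, swf, site = best
        mapping[swf] = site
        sites[site] -= subwfs[swf]                # reserve the storage on that site
        unscheduled.remove(swf)
    return mapping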

21

Introduction
Related work
System design
Experiments and Evaluations
Conclusions

Outline

22

Eucalyptus [14]: infrastructure software that provides on-demand access to Virtual Machine (VM) resources.

The submit host, which performs workflow planning and sends jobs to the execution sites, is a Linux 2.6 machine with 8 GB of RAM and an Intel 2.66 GHz quad-core CPU.

14. Eucalyptus Systems. http://www.eucalyptus.com/

Experiments and Evaluations

23

We use Condor [6] pools as execution sites. HTCondor is a specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, HTCondor provides a job queueing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management.

6. M. Litzkow, M. Livny, et al., Condor—A Hunter of Idle Workstations. In Proceedings of the 8th International Conference on Distributed Computing Systems, New York, June 1988.

Experiments and Evaluations

24

Performance Metrics
◦ Satisfying the Storage Constraints
◦ Improving the Runtime Performance

Experiments and Evaluations

25

Workflows Used
◦ Montage: an astronomy application used to construct large image mosaics of the sky.
◦ CyberShake: a seismology application that calculates Probabilistic Seismic Hazard curves for several geographic sites in the Southern California area.
◦ Epigenomics: a bioinformatics application that maps short DNA segments collected with high-throughput gene sequencing machines to a reference genome.

Experiments and Evaluations

26

They were chosen because they represent a wide range of application domains and a variety of resource requirements.
◦ Montage: I/O intensive
◦ CyberShake: memory intensive
◦ Epigenomics: CPU intensive

Experiments and Evaluations

27

Storage constraint: 30 GB

The default workflow has no storage constraint.

Experiments and Evaluations

28

Experiments and Evaluations

29

Performance with Different Storage Constraints

Experiments and Evaluations

30

CyberShake

Experiments and Evaluations

31

Montage

Experiments and Evaluations

32

Epigenomics

Experiments and Evaluations

33

Experiments and Evaluations

34

The performance with the three workflows shows that this approach is able to satisfy the storage constraints and reduce the makespan significantly, especially for Epigenomics, which has fewer fan-in (synchronization) jobs.

For the workflows we used, scheduling them onto two or three execution sites is best due to a tradeoff between increased data transfer and increased parallelism.

Experiments and Evaluations

35

◦ The Average CPU Time doesn’t take the dependencies into consideration.

◦ The Critical Path doesn’t consider the resource availability.

Experiments and Evaluations

36

Introduction
Related work
System design
Experiments and Evaluations
Conclusions

Outline

37

Three heuristics are proposed and compared to show the close relationship between cross dependency and runtime improvement.

The performance with three real-world workflows shows that this approach is able to satisfy storage constraints and improve the overall runtime by up to 48% over a default whole-workflow scheduling.

Conclusions

38

Thank you for listening.