Optimizing a Grid workflow for the EGEE infrastructure: The ...3rd EGEE User Forum, 11-14 Feb 2008...

EGEE-II INFSO-RI-031688

Enabling Grids for E-sciencE

www.eu-egee.org

EGEE and gLite are registered trademarks

Maximilian Berger
Distributed and Parallel Systems Group (DPS)
University of Innsbruck
3rd EGEE User Forum, 11-14 Feb 2008
Clermont-Ferrand, France

Optimizing a Grid workflow for the EGEE infrastructure: The case of Wien2k

EGEE-II INFSO-RI-031688 Max Berger, UIBK: Optimizing a Grid workflow for EGEE infrastructure

Outline

• Introduction
  – Wien2k

• Porting a workflow
  – Wien2k workflow
  – Mapping to activities

• Tasker Model
  – Motivation
  – Idea

• Results

• Conclusions

Introduction

Wien2k application

• Performs electronic structure calculations of solids (crystals)
• Based on the full-potential (linearized) augmented plane-wave ((L)APW) method
• One of the most accurate schemes for band structure calculations
• Developed by the Computational Quantum Chemistry Group at the Technical University of Vienna (K. Schwarz, P. Blaha)
• Over 1000 licenses world-wide
• Sequential and MPI versions
• http://www.wien2k.at

Grid workflow

Workflow representations

• Original workflow
  – Graphical description
  – Textual description of data dependencies
• “Translation” to several workflow representations
  – Detailed control and data flow
• Implementation of the Grid workflow
  – Quite different from the original one

Grid workflow

• We took the simplest sub-workflow
• Identify atomic and compound activities
  – Atomic: single activities
  – Compound: can be split and parallelized
• Different control and data flow
• Application activity ≠ Grid activity!
• Grid activity
  – wraps an application activity (or activities)
  – can run independently of the others
  – performs data flow management
  – sets up the environment
  – cleans up the environment

[Figure: Wien2k sub-workflow diagram. Activities shown: lapw0, lapw1 (several parallel instances), lapwfermi, lapw2 (several parallel instances), sumpara, lcore, mixer, testcnv]
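The properties of a Grid activity listed above (wrap one or more application activities, run independently, manage data flow, set up and clean up the environment) can be illustrated with a minimal Python sketch. This is not the wrapper used in the talk: the function name `run_grid_activity`, the in-memory stand-ins for stage-in and stage-out, and the toy command are all assumptions made for illustration.

```python
import os
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_grid_activity(commands, input_files, output_names, env_extra=None):
    """Run one or more application activities as a single Grid activity.

    commands     -- list of argv lists, executed in order in one sandbox
    input_files  -- dict of file name -> bytes (hypothetical stand-in for stage-in)
    output_names -- file names to collect (hypothetical stand-in for stage-out)
    """
    # A private sandbox lets the activity run independently of the others.
    workdir = Path(tempfile.mkdtemp(prefix="grid_activity_"))
    try:
        # Data flow management: place the inputs in the sandbox.
        for name, data in input_files.items():
            (workdir / name).write_bytes(data)

        # Set up the environment expected by the application activities.
        env = dict(os.environ)
        env.update(env_extra or {})

        # Run the wrapped application activities back to back.
        for argv in commands:
            subprocess.run(argv, cwd=workdir, env=env, check=True)

        # Data flow management: collect the requested results.
        return {name: (workdir / name).read_bytes() for name in output_names}
    finally:
        # Clean up the environment, whether or not the activity succeeded.
        shutil.rmtree(workdir, ignore_errors=True)


# Toy stand-in for an application activity: upper-case in.dat into out.dat.
toy_step = [sys.executable, "-c",
            "open('out.dat', 'w').write(open('in.dat').read().upper())"]
print(run_grid_activity([toy_step], {"in.dat": b"case.struct"}, ["out.dat"]))
```

Passing several argv lists in `commands` corresponds to one Grid activity wrapping several application activities, which is the point of the "Application activity ≠ Grid activity" bullet.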

Grid activities

• Common error when porting applications
  – Mapping application activities one-to-one to Grid activities
• Important aspects related to Grid middleware
  – Execution models
    • Centralized workflow enactor
    • Delegation using execution agents (e.g. Ganga)
    • Using resource broker or manual submission
    • Shared file system on worker nodes
    • Application deployment
    • Workflow support of the middleware (DAG)
  – Data management models
    • Direct access to file transfer mechanisms
    • File staging
    • Transfer to intermediate data repositories

How to implement Grid activities

• Activity Attraction Pattern
  – Using approximated execution times and file sizes
    • Known by the application developer or scientist
  – “Bigger” activities attract “smaller” ones
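The slide names the pattern but does not spell out a merge rule, so the following Python sketch is only one possible reading. It assumes a purely sequential chain, uses estimated execution times only (file sizes are ignored), and lets an activity below a chosen threshold be absorbed by its bigger neighbour. The class `GridActivity`, the function `attract`, the threshold and all timings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GridActivity:
    names: list          # application activities wrapped by this Grid activity
    est_seconds: float   # approximate execution time, estimated by the scientist


def attract(chain, threshold_seconds):
    """Merge each 'small' activity of a sequential chain into a neighbour.

    An activity whose estimated time is below the threshold is absorbed by
    the bigger of its neighbours, so it needs no submission of its own.
    """
    groups = [GridActivity(list(a.names), a.est_seconds) for a in chain]
    while len(groups) > 1:
        # Take the smallest remaining activity; stop once nothing is "small".
        i = min(range(len(groups)), key=lambda k: groups[k].est_seconds)
        if groups[i].est_seconds >= threshold_seconds:
            break
        # The bigger of the left/right neighbours attracts the small one.
        neighbours = [k for k in (i - 1, i + 1) if 0 <= k < len(groups)]
        j = max(neighbours, key=lambda k: groups[k].est_seconds)
        lo, hi = min(i, j), max(i, j)
        merged = GridActivity(groups[lo].names + groups[hi].names,
                              groups[lo].est_seconds + groups[hi].est_seconds)
        groups[lo:hi + 1] = [merged]
    return groups


# Activity names are taken from the workflow; the timings are made up.
chain = [GridActivity(["sumpara"], 5), GridActivity(["lcore"], 3),
         GridActivity(["mixer"], 30), GridActivity(["testcnv"], 1)]
for group in attract(chain, threshold_seconds=4):
    print(group.names, group.est_seconds)
```

With these invented numbers, testcnv and lcore are both absorbed by mixer, while sumpara stays a Grid activity of its own because it is above the threshold.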

Workflow evolution

[Figure: workflow evolution diagram, label: Grid]

Workflow execution

• gLite workflow support (JDL DAG) is not sufficient
  – Not expressive enough for more complex control over data dependencies
  – No support for loops
• We built our own workflow engine
  – Very flexible, generating code on the fly
    • Changing the workflow
    • Adding/changing activities
    • Making local tests with a local backend
    • Supporting loops
• The complex part (how to make Grid activities) is understood
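Two of the features listed above, loop support and local testing through a local backend, can be sketched in a few lines of Python. This is not the engine described in the talk. `LocalBackend`, `run_until_converged` and the toy activities are hypothetical names, and the convergence test is a stand-in for a step such as testcnv.

```python
class LocalBackend:
    """Runs activities in-process, for testing a workflow without the Grid."""

    def run(self, activity, state):
        return activity(state)


def run_until_converged(body, converged, backend, state, max_iterations=50):
    """Repeat the activities in `body` until `converged(state)` holds.

    A static DAG cannot express this: the number of iterations is only
    known at run time.
    """
    for iteration in range(1, max_iterations + 1):
        for activity in body:
            state = backend.run(activity, state)
        if converged(state):
            return state, iteration
    raise RuntimeError("no convergence within max_iterations")


# Toy activities: each cycle halves a residual (hypothetical stand-in).
def toy_cycle(state):
    return {"residual": state["residual"] / 2}


def toy_converged(state):
    return state["residual"] < 0.01


final_state, iterations = run_until_converged(
    [toy_cycle], toy_converged, LocalBackend(), {"residual": 1.0})
print(final_state, iterations)
```

A Grid backend would offer the same `run` method but submit the activity remotely, which is what makes it possible to test a workflow locally before using the Grid.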

Worker model

Scheduling Problem

• In development grids, scheduling time is short
  – A few seconds for each activity
• In production grids (EGEE), scheduling time is long!
  – 5 minutes is the “best” we experienced
  – 90 minutes during the day
• Workflow consists of small activities
  – Good for parallelization, but
  – Each activity requires scheduling
  – Example in Wien2k: activities < 1 minute!
• Grid execution is much slower than local execution!
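A back-of-the-envelope calculation shows why per-activity scheduling dominates. The sketch below uses the scheduling delays quoted above (5 and 90 minutes) and takes 1 minute as an upper bound for an activity. The chain length of five sequential stages and the function name `chain_wall_time` are assumptions for illustration only.

```python
def chain_wall_time(n_stages, run_minutes, scheduling_minutes):
    """Wall time of a sequential chain where every stage is scheduled separately."""
    return n_stages * (scheduling_minutes + run_minutes)


N_STAGES = 5       # hypothetical number of dependent stages in one iteration
RUN_MINUTES = 1    # upper bound for an activity, per the slide

local = chain_wall_time(N_STAGES, RUN_MINUTES, scheduling_minutes=0)
best = chain_wall_time(N_STAGES, RUN_MINUTES, scheduling_minutes=5)
daytime = chain_wall_time(N_STAGES, RUN_MINUTES, scheduling_minutes=90)

print(f"local: {local} min, grid (best case): {best} min, grid (daytime): {daytime} min")
# local: 5 min, grid (best case): 30 min, grid (daytime): 455 min
```

Under these assumptions, even the best-case scheduling delay makes the chain six times slower than running it locally.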

Scheduling Problem

[Figure: UI machine, queue, worker node]

Worker model

• Idea: go through scheduling only once
• Submit generic “worker tasks”
• Controller provides work on request
• Scheduling is now done only once
• Workers can be submitted before the actual work starts
• And reused for the next run
• Example implementation: DIANE
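The pull-based idea can be sketched with Python threads standing in for worker tasks on Grid nodes. This does not use the DIANE API. The functions `worker` and `run_with_workers`, the sentinel convention and the toy work items are assumptions for illustration.

```python
import queue
import threading


def worker(name, tasks, results):
    """Generic worker task: started once, then pulls work until told to stop."""
    while True:
        item = tasks.get()       # ask the controller for work
        if item is None:         # sentinel: no more work for this worker
            break
        label, func, arg = item
        results.put((label, name, func(arg)))


def run_with_workers(work_items, n_workers=3):
    """Controller: start the workers once, then hand out work on request."""
    tasks, results = queue.Queue(), queue.Queue()
    # Workers are "submitted" up front, before any work exists.
    threads = [threading.Thread(target=worker, args=(f"worker-{i}", tasks, results))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    # Work items are pulled by whichever worker becomes free first.
    for item in work_items:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)          # one stop signal per worker
    for t in threads:
        t.join()
    collected = {}
    while not results.empty():
        label, _worker_name, value = results.get()
        collected[label] = value
    return collected


# Toy work items: one small computation per (hypothetical) k-point.
work = [(f"k-point-{k}", lambda x: x * x, k) for k in range(6)]
print(run_with_workers(work))
```

The scheduling cost is paid once, when the workers start. Every later work item is served from the controller's queue, so short activities no longer wait in the Grid queue individually.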

Worker Model

[Figure: UI machine, queue, worker node]

Experiments

Setup

• Wien2k workflow
  – Medium-sized crystal calculation
  – Varying the number of k-points submitted to the grid
  – More k-points: better scalability
• Measured:
  – Time for one iteration of the SCF cycle
  – Three results: sequential execution, grid execution, ideal time
  – Grid scheduling with worker tasks

Measurements

Results

• Small experiments: better to execute sequentially
• Medium experiments: performance about equal
• Large experiments: Grid execution is faster!
• Still overhead!
• Possible reasons:
  – StageIn / StageOut for every task
  – Some nodes do not respond -> timeout before rescheduling
  – Could be optimized (do only once)

Conclusions

There is still work to be done:
• Worker model provides sufficient results, however
• Workflow execution has to be further improved to optimize performance
• Reuse previously staged software
• Data flow between workers rather than back to the controller
• Use and test new EGEE III middleware support for workflows

We will continue the work in EGEE III!

THANK YOU!

Questions?
