Optimizing a Grid workflow for the EGEE infrastructure: The case of Wien2k

Maximilian Berger
Distributed and Parallel Systems Group (DPS), University of Innsbruck
3rd EGEE User Forum, 11-14 Feb 2008, Clermont-Ferrand, France

EGEE-II INFSO-RI-031688
Enabling Grids for E-sciencE
www.eu-egee.org
EGEE and gLite are registered trademarks

EGEE-II INFSO-RI-031688 Max Berger, UIBK: Optimizing a Grid workflow for EGEE infrastructure

Outline

• Introduction
  – Wien2k
• Porting a workflow
  – Wien2k workflow
  – Mapping to activities
• Tasker Model
  – Motivation
  – Idea
• Results
• Conclusions


Introduction


Wien2k application

• Performs electronic structure calculations of solids (crystals)
• Based on the full-potential (linearized) augmented plane-wave ((L)APW) method
• One of the most accurate schemes for band structure calculations
• Developed by the Computational Quantum Chemistry Group at the Technical University of Vienna (K. Schwarz, P. Blaha)
• Over 1000 licenses world-wide
• Sequential and MPI versions
• http://www.wien2k.at


Grid workflow


Workflow representations

• Original workflow
  – Graphical description
  – Textual description of data dependencies
• “Translation” to several workflow representations
  – Detailed control and dataflow
• Implementation of Grid workflow
  – Quite different to the original one


Grid workflow

• We took the simplest sub-workflow
• Identify atomic and compound activities
  – Atomic: single activities
  – Compound: can be split and parallelized
• Different control flow and data flow
• Application activity ≠ Grid activity!
• Grid activity
  – wraps application activity (or activities)
  – can run independently of the others
  – performs data flow management
  – sets up the environment
  – cleans up the environment

[Figure: Wien2k sub-workflow graph with the activities lapw0, lapw1 (several parallel instances), lapwfermi, lapw2 (several parallel instances), sumpara, lcore, mixer, and testcnv.]
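The Grid activity responsibilities listed on this slide (wrap one or more application activities, manage data flow, set up and clean up the environment) could be sketched as below. This is a minimal illustration, not the authors' implementation: the function name `run_grid_activity`, its parameters, and the use of a temporary scratch directory as the "environment" are all assumptions, and staging is simulated with local files rather than Grid file transfer.

```python
import os
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_grid_activity(commands, inputs, output_names, env_extra=None):
    """Run one or more application activities as a single Grid activity.

    commands     -- list of argv lists, executed in order in one scratch dir
    inputs       -- dict: file name -> text content to stage in (hypothetical)
    output_names -- file names to stage out after the last command
    env_extra    -- extra environment variables for the application
    """
    scratch = Path(tempfile.mkdtemp(prefix="grid_activity_"))
    try:
        # Stage in: place input data in the private working directory.
        for name, content in inputs.items():
            (scratch / name).write_text(content)
        # Set up the environment for the wrapped application activities.
        env = dict(os.environ, **(env_extra or {}))
        # Run the wrapped application activities back to back.
        for argv in commands:
            subprocess.run(argv, cwd=scratch, env=env, check=True)
        # Stage out: collect results before the directory disappears.
        return {name: (scratch / name).read_text() for name in output_names}
    finally:
        # Clean up the environment regardless of success or failure.
        shutil.rmtree(scratch, ignore_errors=True)


if __name__ == "__main__":
    # Two stand-in "application activities" wrapped into one Grid activity.
    upper = [sys.executable, "-c",
             "open('mid.txt', 'w').write(open('in.txt').read().upper())"]
    mark = [sys.executable, "-c",
            "open('out.txt', 'w').write(open('mid.txt').read() + '!')"]
    print(run_grid_activity([upper, mark], {"in.txt": "abc"}, ["out.txt"]))
```

Because the intermediate file never leaves the scratch directory, wrapping both steps in one Grid activity avoids one stage-out and one stage-in, which is the point of not mapping application activities one-to-one.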


Grid activities

• Common error when porting applications
  – Mapping application activities one-to-one to Grid activities
• Important aspects related to Grid middleware
  – Execution models
    – Centralized workflow enactor
    – Delegation using execution agents (e.g. Ganga)
    – Using resource broker or manual submission
    – Shared file system on worker nodes
    – Application deployment
    – Workflow support of the middleware (DAG)
  – Data management models
    – Direct access to file transfer mechanisms
    – File staging
    – Transfer to intermediate data repositories


How to implement Grid activities

• Activity Attraction Pattern
  – Using approximated execution times and file sizes
    – Known by the application developer or scientist
  – “Bigger” activities attract “smaller” ones
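The slides do not spell out how the attraction is computed, so the following is only one possible reading: in a purely sequential chain, any activity whose estimated run time is below a threshold is merged into whichever neighbour has the larger estimate. The function name `attract`, the threshold `min_seconds`, the activity names, and all times are hypothetical; file sizes and parallel branches are ignored.

```python
def attract(activities, min_seconds):
    """Group a sequential chain of (name, estimated_seconds) activities.

    Repeatedly merges the smallest group into its bigger neighbour until
    every group reaches min_seconds or only one group is left.
    Returns a list of (names, total_seconds) tuples in chain order.
    """
    groups = [([name], secs) for name, secs in activities]
    while len(groups) > 1:
        # Find the smallest group; stop once every group is big enough.
        i = min(range(len(groups)), key=lambda k: groups[k][1])
        if groups[i][1] >= min_seconds:
            break
        # Candidate neighbours in the chain; the bigger one attracts.
        neighbours = [k for k in (i - 1, i + 1) if 0 <= k < len(groups)]
        j = max(neighbours, key=lambda k: groups[k][1])
        lo, hi = sorted((i, j))
        merged = (groups[lo][0] + groups[hi][0], groups[lo][1] + groups[hi][1])
        groups[lo:hi + 1] = [merged]
    return groups


if __name__ == "__main__":
    # Hypothetical estimates in seconds, as a scientist might supply them.
    chain = [("prep", 5), ("solve_1", 600), ("collect", 10),
             ("solve_2", 900), ("check", 2)]
    for names, secs in attract(chain, min_seconds=60):
        print(names, secs)
```

With these made-up numbers the five application activities collapse into two Grid activities, each dominated by one long-running step.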


Workflow evolution

[Figure: workflow evolution towards the Grid workflow.]


Workflow execution

• gLite workflow support is not sufficient (JDL DAG)
  – Not expressive enough for more complex control on data dependence
  – No support for loops
• We built our own workflow engine
  – Very flexible to generate code on-the-fly
    – Changing workflow
    – Add/change activities
    – Make local tests with local backend
    – Support loops
• Complex part (how to make Grid activities) is understood
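To illustrate why loop support matters, here is a minimal sketch of an enactor that repeats a body of activities until a convergence test passes, run against a local backend for testing. It is not the engine described in the talk: `LocalBackend`, `run_loop`, and the toy "refine" activity are hypothetical names, and a Grid backend would replace `execute` with job submission.

```python
class LocalBackend:
    """Runs activities in-process; stands in for a Grid backend in tests."""

    def __init__(self):
        self.log = []

    def execute(self, name, activity, data):
        self.log.append(name)
        activity(data)


def run_loop(backend, body, converged, data, max_iterations=100):
    """Repeat the body activities until converged(data) is true.

    body      -- ordered list of (name, callable taking the shared data dict)
    converged -- predicate on the data dict, checked after each iteration
    Returns the number of iterations that were needed.
    """
    for iteration in range(1, max_iterations + 1):
        for name, activity in body:
            backend.execute(name, activity, data)
        if converged(data):
            return iteration
    raise RuntimeError("no convergence within max_iterations")


if __name__ == "__main__":
    backend = LocalBackend()
    data = {"error": 1.0}
    # Toy activity: halve the error each pass (stands in for one cycle).
    body = [("refine", lambda d: d.update(error=d["error"] / 2))]
    n = run_loop(backend, body, lambda d: d["error"] < 0.1, data)
    print(n, backend.log)
```

The number of iterations is only known at run time, which is what a static DAG description cannot express.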


Worker model


Scheduling Problem

• In development grids scheduling time is short
  – A few seconds for each activity
• In production grids (EGEE) scheduling time is long!
  – 5 minutes is the “best” we experienced
  – 90 minutes during the day
• Workflow consists of small activities
  – Good for parallelization, but
  – Each activity requires scheduling
  – Example in Wien2k: Activities < 1 minute!
• Grid execution is much slower than local execution!
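A back-of-the-envelope calculation makes the imbalance concrete. The scheduling delays (5 and 90 minutes) and the 1-minute upper bound on activity run time come from the slide; the chain length of 10 dependent activities and the function name `chain_minutes` are assumptions for illustration.

```python
def chain_minutes(n_activities, run_min, sched_min):
    """Wall time of n sequentially dependent activities when each one
    passes through scheduling separately."""
    return n_activities * (sched_min + run_min)


N = 10      # hypothetical number of dependent activities
RUN = 1.0   # slide: Wien2k activities take under 1 minute (upper bound)

local = chain_minutes(N, RUN, sched_min=0.0)      # no scheduling at all
grid_best = chain_minutes(N, RUN, sched_min=5.0)  # "best" case on EGEE
grid_day = chain_minutes(N, RUN, sched_min=90.0)  # daytime case on EGEE

print(local, grid_best, grid_day)  # 10.0 60.0 910.0
```

Even in the best case, scheduling accounts for five sixths of the wall time under these assumptions.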


Scheduling Problem

[Figure: scheduling path between the UI machine, the queue, and the worker node.]


Worker model

• Idea: go through scheduling only once
• Submit generic “worker tasks”
• Controller provides work on request
• Scheduling is now done only once
• Workers can be submitted before the actual work starts
• And reused for next run
• Example implementation: DIANE
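The pull-based idea can be sketched with threads standing in for worker tasks that have already passed through Grid scheduling. This does not use or imitate the DIANE API; the `Controller` class and its methods are hypothetical, and error handling and timeouts for unresponsive workers are left out.

```python
import queue
import threading


class Controller:
    """Hands out work to long-lived generic workers on request."""

    def __init__(self, n_workers):
        self._work = queue.Queue()
        self._done = queue.Queue()
        # Workers are started ("scheduled") once, before any work exists.
        self._threads = [threading.Thread(target=self._worker, daemon=True)
                         for _ in range(n_workers)]
        for thread in self._threads:
            thread.start()

    def _worker(self):
        while True:
            item = self._work.get()      # worker asks the controller for work
            if item is None:             # sentinel: no more runs, shut down
                return
            task_id, func, arg = item
            self._done.put((task_id, func(arg)))

    def run(self, tasks):
        """Execute a list of (func, arg) tasks; results keep task order."""
        for task_id, (func, arg) in enumerate(tasks):
            self._work.put((task_id, func, arg))
        results = dict(self._done.get() for _ in tasks)
        return [results[i] for i in range(len(tasks))]

    def shutdown(self):
        for _ in self._threads:
            self._work.put(None)
        for thread in self._threads:
            thread.join()


if __name__ == "__main__":
    controller = Controller(n_workers=3)
    print(controller.run([(abs, -1), (abs, -2), (abs, -3)]))
    # The same workers are reused for the next run without rescheduling.
    print(controller.run([(len, "abcd"), (len, "xy")]))
    controller.shutdown()
```

The second call to `run` reuses the workers started in the constructor, mirroring the "reused for next run" point above.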


Worker Model

[Figure: worker model with the UI machine, the queue, and the worker node.]


Experiments


Setup

• Wien2k workflow
  – Medium sized crystal calculation
  – Varying the number of k-points submitted to the grid
  – More k-points: scales better
• Measured:
  – Time for one iteration of SCF cycle
  – Three results: sequential execution, grid execution, ideal time
  – Grid scheduling with worker tasks


Measurements


Results

• Small experiments: better to execute sequentially
• Medium experiments: Performance about equal
• Large experiments: Grid execution is faster!
• Still overhead!
• Possible Reasons:
  – StageIn / StageOut for every task
  – Some nodes do not respond -> timeout before rescheduling
  – Could be optimized (do only once)
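One way to read "do only once" is a worker-side cache so that an input shared by many tasks is transferred a single time per worker. The sketch below assumes exactly that; `StagingCache` and the `fetch` callable are hypothetical, and the real transfer mechanism is replaced by a plain function.

```python
class StagingCache:
    """Worker-side cache: each input is transferred at most once per worker."""

    def __init__(self, fetch):
        self._fetch = fetch   # hypothetical transfer function: name -> data
        self._local = {}      # inputs already staged on this worker
        self.transfers = 0    # number of real transfers performed

    def get(self, name):
        if name not in self._local:
            self._local[name] = self._fetch(name)
            self.transfers += 1
        return self._local[name]


if __name__ == "__main__":
    cache = StagingCache(fetch=lambda name: f"<contents of {name}>")
    for _ in range(5):        # five tasks on the same worker need one input
        cache.get("shared_input")
    print(cache.transfers)    # 1
```

Stage-out of per-task results would still be needed; only repeated stage-in of unchanged inputs is avoided here.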


Conclusions

There is still work to be done:
• Worker model provides sufficient results, however
• Workflow execution has to be further improved to optimize performance
• Reuse previously staged software
• Data flow between workers rather than back to the controller
• Use and test new EGEE III middleware support for workflows

We will continue the work in EGEE III!


THANK YOU!

Questions?
