EGEE-II INFSO-RI-031688
Enabling Grids for E-sciencE
www.eu-egee.org
EGEE and gLite are registered trademarks
Maximilian Berger
Distributed and Parallel Systems Group (DPS)
University of Innsbruck
3rd EGEE User Forum, 11-14 Feb 2008
Clermont-Ferrand, France
Optimizing a Grid workflow for the EGEE infrastructure: The case of Wien2k
EGEE-II INFSO-RI-031688 Max Berger, UIBK: Optimizing a Grid workflow for EGEE infrastructure
Outline
• Introduction
  – Wien2k
• Porting a workflow
  – Wien2k workflow
  – Mapping to activities
• Worker model
  – Motivation
  – Idea
• Results
• Conclusions
Introduction
Wien2k application
• Performs electronic structure calculations of solids (crystals)
• Based on the full-potential (linearized) augmented plane-wave ((L)APW) method
• One of the most accurate schemes for band structure calculations
• Developed by the Computational Quantum Chemistry Group at the Vienna University of Technology (K. Schwarz, P. Blaha)
• Over 1000 licenses world-wide
• Sequential and MPI versions
• http://www.wien2k.at
Grid workflow
Workflow representations
• Original workflow
  – Graphical description
  – Textual description of data dependencies
• "Translation" into several workflow representations
  – Detailed control flow and data flow
• Implementation of the Grid workflow
  – Quite different from the original one
Grid workflow
• We took the simplest sub-workflow
• Identify atomic and compound activities
  – Atomic: single activities
  – Compound: can be split and parallelized
• Different control flow and data flow
• Application activity ≠ Grid activity!
• Grid activity
  – wraps an application activity (or several)
  – can run independently of the others
  – performs data flow management
  – sets up the environment
  – cleans up the environment
[Figure: Wien2k sub-workflow with the activities lapw0, several parallel lapw1 instances, lapwfermi, several parallel lapw2 instances, sumpara, lcore, mixer and the convergence test testcnv]
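The Grid activity described above has five duties: stage in, set up the environment, run one or more application activities, stage out, and clean up. The following is a minimal Python sketch of such a wrapper. It rests on two assumptions: local file copies stand in for Grid data transfers, and each application activity is a plain command line. The function `run_grid_activity` and its parameters are hypothetical. They do not come from the authors' engine or from any gLite API.

```python
import os
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_grid_activity(commands, inputs, outputs, env_extra, out_dir):
    """Wrap one or more application activities into a single Grid activity.

    commands:  argv lists executed in order inside a private work directory
    inputs:    local paths staged into the work directory (stand-in for Grid stage-in)
    outputs:   file names copied back to out_dir (stand-in for Grid stage-out)
    env_extra: environment variables added for the wrapped activities
    """
    work = Path(tempfile.mkdtemp(prefix="grid-activity-"))
    try:
        for src in inputs:                        # stage in
            shutil.copy(src, work)
        env = {**os.environ, **env_extra}         # set up the environment
        for argv in commands:                     # run the wrapped activities in sequence
            subprocess.run(argv, cwd=work, env=env, check=True)
        out_dir = Path(out_dir)
        out_dir.mkdir(parents=True, exist_ok=True)
        staged = []
        for name in outputs:                      # stage out
            staged.append(Path(shutil.copy(work / name, out_dir)))
        return staged
    finally:
        shutil.rmtree(work, ignore_errors=True)   # clean up, even after a failure
```

Several small application activities passed as one `commands` list become a single Grid activity. They are therefore scheduled once and share one stage-in and one stage-out.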
Grid activities
• Common error when porting applications
  – Mapping application activities one-to-one onto Grid activities
• Important aspects related to Grid middleware
  – Execution models
    ◦ Centralized workflow enactor
    ◦ Delegation using execution agents (e.g. Ganga)
    ◦ Using a resource broker or manual submission
    ◦ Shared file system on worker nodes
    ◦ Application deployment
    ◦ Workflow support of the middleware (DAG)
  – Data management models
    ◦ Direct access to file transfer mechanisms
    ◦ File staging
    ◦ Transfer to intermediate data repositories
How to implement Grid activities
• Activity Attraction Pattern
  – Using approximated execution times and file sizes
    ◦ Known by the application developer or scientist
  – "Bigger" activities attract "smaller" ones
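The attraction idea can be sketched as a greedy merge over a linear chain of activities. The sketch makes several assumptions that are not the authors' exact rules:

* the workflow is a simple chain;
* any activity below a runtime threshold is absorbed by its larger neighbour;
* the estimated file size on the connecting edge only breaks ties.

The names `attract` and `GridActivity` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GridActivity:
    members: list      # names of the wrapped application activities
    runtime_s: float   # approximated execution time, supplied by the scientist


def attract(names, runtimes_s, transfer_mb, min_runtime_s):
    """Merge activities of a linear chain until every Grid activity runs for
    at least min_runtime_s (or only one Grid activity is left).

    transfer_mb[i] is the estimated data passed from activity i to i + 1.
    """
    groups = [GridActivity([n], t) for n, t in zip(names, runtimes_s)]
    edges = list(transfer_mb)
    while len(groups) > 1:
        small = [i for i, g in enumerate(groups) if g.runtime_s < min_runtime_s]
        if not small:
            break
        i = min(small, key=lambda k: groups[k].runtime_s)   # smallest first
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(groups)]
        # The bigger neighbour attracts; a larger saved transfer breaks ties.
        j = max(neighbours, key=lambda k: (groups[k].runtime_s, edges[min(i, k)]))
        lo, hi = min(i, j), max(i, j)
        groups[lo:hi + 1] = [GridActivity(groups[lo].members + groups[hi].members,
                                          groups[lo].runtime_s + groups[hi].runtime_s)]
        del edges[lo]   # the transfer between the merged pair is now local
    return groups
```

In a chain with estimated runtimes of 5, 120, 3, 2 and 90 s and a 30 s threshold, the three short activities are absorbed by their long-running neighbours. Two Grid activities remain. The numbers are illustrative, not Wien2k measurements.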
Workflow evolution
[Figure: evolution of the original workflow into the Grid workflow]
Workflow execution
• gLite workflow support (JDL DAG) is not sufficient
  – Not expressive enough for more complex control flow and data dependencies
  – No support for loops
• We built our own workflow engine
  – Very flexible, since it generates code on the fly:
    ◦ Changing the workflow
    ◦ Adding/changing activities
    ◦ Making local tests with a local backend
    ◦ Supporting loops
• The complex part (how to build Grid activities) is understood
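JDL DAGs cannot express loops, so the engine itself has to drive the iterative SCF cycle. The following is a minimal sketch of such a loop with a pluggable backend. It assumes each activity is a Python callable that reads a dict of named values and returns the values it produces. A Grid backend would submit each call as a job instead of running it in-process. The names `LocalBackend`, `run_stage` and `run_until_converged` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class LocalBackend:
    """Runs an activity in-process; useful for local tests before Grid runs."""

    def run(self, name, fn, state):
        return fn(state)


def run_stage(backend, stage, state):
    """A stage is one (name, fn) pair, or a list of pairs run in parallel."""
    if isinstance(stage, list):
        with ThreadPoolExecutor(max_workers=len(stage)) as pool:
            # Each parallel activity gets its own copy of the current state.
            futures = [pool.submit(backend.run, name, fn, dict(state))
                       for name, fn in stage]
            for fut in futures:
                state.update(fut.result())
    else:
        name, fn = stage
        state.update(backend.run(name, fn, state))
    return state


def run_until_converged(backend, stages, state, converged, max_iter=50):
    """Repeat all stages (one SCF-like iteration) until converged(state) holds."""
    for iteration in range(1, max_iter + 1):
        for stage in stages:
            state = run_stage(backend, stage, state)
        if converged(state):
            return iteration, state
    raise RuntimeError(f"not converged after {max_iter} iterations")
```

The workflow definition (`stages`) does not depend on the backend. Swapping `LocalBackend` for a Grid backend leaves the definition untouched, which keeps local testing cheap.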
Worker model
Scheduling Problem
• In development Grids, scheduling time is short
  – A few seconds per activity
• In production Grids (EGEE), scheduling time is long!
  – 5 minutes is the "best" we experienced
  – 90 minutes during the day
• The workflow consists of small activities
  – Good for parallelization, but
  – Each activity requires scheduling
  – Example in Wien2k: activities take < 1 minute!
• Grid execution is much slower than local execution!
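A back-of-the-envelope model shows why these numbers hurt. It rests on four assumptions:

* a workflow of `stages` dependent steps;
* all tasks of a step are scheduled concurrently, with enough free slots;
* every submission waits `sched_s` in the queue;
* every activity runs for `run_s`.

`dispatch_s` is a hypothetical hand-out delay for the case where pre-started workers receive tasks directly. The whole model is a simplification, not a measured cost function.

```python
def per_task_makespan(stages, sched_s, run_s):
    """Every stage's tasks wait in the broker queue before they can run."""
    return stages * (sched_s + run_s)


def worker_makespan(stages, sched_s, run_s, dispatch_s):
    """Workers wait in the queue once; afterwards each stage only pays a
    short dispatch delay before its tasks run."""
    return sched_s + stages * (dispatch_s + run_s)
```

Take 5 stages, the 5-minute best case and 1-minute activities. Per-task submission then gives 5 × (300 + 60) = 1800 s. Paying the queue only once, with a 2 s dispatch, gives 300 + 5 × 62 = 610 s. With the 90-minute daytime wait, the figures become 27300 s and 5710 s. The stage count and dispatch delay are illustrative, not measurements.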
Scheduling Problem
[Figure: UI machine, queue and worker node]
Worker model
• Idea: go through scheduling only once
  – Submit generic "worker tasks"
• The controller provides work on request
• Scheduling is now done only once
• Workers can be submitted before the actual work starts
  – and reused for the next run
• Example implementation: DIANE
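The pull-based worker model can be sketched with standard Python threads and queues. In this sketch, starting a thread stands in for the one-off Grid submission of a generic worker task. Each `queue.get()` stands in for a worker requesting work from the controller. This is only an illustration of the pattern. It is not DIANE's API, and the `Controller` class is hypothetical.

```python
import queue
import threading


class Controller:
    """Pull-based task farm: workers are started once, then ask for work."""

    def __init__(self, n_workers):
        self.tasks = queue.Queue()
        self.results = queue.Queue()
        # Starting the threads stands in for submitting generic worker tasks
        # to the Grid: the only step that pays the scheduling delay.
        self.workers = [threading.Thread(target=self._worker, daemon=True)
                        for _ in range(n_workers)]
        for w in self.workers:
            w.start()

    def _worker(self):
        while True:
            item = self.tasks.get()          # "request work" from the controller
            if item is None:                 # shutdown signal
                self.tasks.task_done()
                return
            task_id, fn, arg = item
            try:
                self.results.put((task_id, fn(arg)))
            except Exception as exc:         # report failures instead of dying
                self.results.put((task_id, exc))
            finally:
                self.tasks.task_done()

    def run(self, fn, args):
        """Hand out one task per argument and wait until all are finished."""
        for task_id, arg in enumerate(args):
            self.tasks.put((task_id, fn, arg))
        self.tasks.join()
        out = [None] * len(args)
        while not self.results.empty():
            task_id, value = self.results.get()
            out[task_id] = value
        return out

    def shutdown(self):
        for _ in self.workers:
            self.tasks.put(None)
        for w in self.workers:
            w.join()
```

Calling `run` twice on the same `Controller` reuses the already-started workers. This mirrors reusing workers for the next run, for example the next SCF iteration, without another trip through the queue.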
Worker Model
[Figure: UI machine, queue and worker node in the worker model]
Experiments
Setup
• Wien2k workflow
  – Medium-sized crystal calculation
  – Varying the number of k-points submitted to the Grid
  – More k-points: better scalability
• Measured:
  – Time for one iteration of the SCF cycle
  – Three results: sequential execution, Grid execution, ideal time
  – Grid scheduling with worker tasks
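The parallel lapw1/lapw2 activities are the compound activities that get split, and the experiments vary the number of k-points. One simple way to split is into contiguous, near-equal chunks, one chunk per Grid task. The sketch below assumes that scheme. The helper `split_kpoints` is hypothetical and is not Wien2k's own distribution mechanism.

```python
def split_kpoints(kpoints, n_tasks):
    """Split k-points into at most n_tasks contiguous, near-equal chunks.

    The first len(kpoints) % n_tasks chunks get one extra k-point;
    empty chunks (more tasks than k-points) are dropped.
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be positive")
    base, extra = divmod(len(kpoints), n_tasks)
    chunks, start = [], 0
    for i in range(n_tasks):
        size = base + (1 if i < extra else 0)
        chunks.append(kpoints[start:start + size])
        start += size
    return [c for c in chunks if c]
```

With more k-points, each chunk carries more computation for the same fixed per-task overhead. This is consistent with the observation that more k-points scale better.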
Measurements
Results
• Small experiments: better to execute sequentially
• Medium experiments: performance about equal
• Large experiments: Grid execution is faster!
• Still overhead! Possible reasons:
  – Stage-in/stage-out for every task; this could be optimized (done only once)
  – Some nodes do not respond, so a timeout must expire before rescheduling
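One way to handle unresponsive nodes is lease-based rescheduling. The controller records a deadline when it hands out a task. Once the deadline passes, the task goes back into the pending queue for the next worker that asks. The sketch below assumes this scheme. `LeaseTable` and its methods are hypothetical, and `now` is passed explicitly so the logic can be tested without real clocks.

```python
from collections import deque


class LeaseTable:
    """Tracks handed-out tasks; expired leases are rescheduled."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.pending = deque()
        self.leased = {}          # task -> (worker, deadline)
        self.done = set()

    def submit(self, task):
        self.pending.append(task)

    def reclaim(self, now):
        """Move tasks whose worker stayed silent past the deadline back to pending."""
        expired = [t for t, (_, deadline) in self.leased.items() if now >= deadline]
        for t in expired:
            del self.leased[t]
            self.pending.append(t)
        return expired

    def request(self, worker, now):
        """A worker asks for work; returns a task or None."""
        self.reclaim(now)
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.leased[task] = (worker, now + self.timeout_s)
        return task

    def complete(self, task):
        """Accept the first result for a task; late duplicates return False."""
        if task in self.done:
            return False
        self.done.add(task)
        self.leased.pop(task, None)
        if task in self.pending:
            self.pending.remove(task)
        return True
```

The timeout is a trade-off. A short timeout reschedules lost tasks sooner but may duplicate work on slow nodes. `complete` discards such duplicates.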
Conclusions
There is still work to be done:
• The worker model provides sufficient results; however, workflow execution has to be further improved to optimize performance
• Reuse previously staged software
• Data flow between workers rather than back to the controller
• Use and test the new EGEE III middleware support for workflows
We will continue the work in EGEE III!
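Reusing previously staged software could take the form of a worker-side cache: an input is fetched only the first time a task on that worker needs it. The sketch below makes two assumptions. First, a URL always names the same, immutable content. Second, the actual transfer is an injected `fetch` function. The names `StageCache` and `fetch` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable


class StageCache:
    """Worker-side cache: download each input once, reuse it for later tasks.

    Assumes a URL always names the same, immutable content.
    """

    def __init__(self, root: Path, fetch: Callable[[str], bytes]):
        self.root = root
        self.fetch = fetch                      # hypothetical transfer function
        self.root.mkdir(parents=True, exist_ok=True)

    def stage_in(self, url: str) -> Path:
        local = self.root / hashlib.sha256(url.encode()).hexdigest()
        if not local.exists():
            tmp = local.with_suffix(".part")
            tmp.write_bytes(self.fetch(url))
            tmp.replace(local)                  # rename only after the download completed
        return local
```

Since workers in the worker model live across many tasks and runs, such a cache would turn repeated stage-in of the same binaries into a one-time cost per worker.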
THANK YOU!
Questions?