Meta-Balancer
Automated Load Balancing Invocation based on Application Characteristics
Harshitha Menon, Nikhil Jain, Gengbin Zheng, Laxmikant Kale
25th September
Cluster 2012, Beijing, China
1 / 30
Outline
1 Introduction: Motivation, Load Balancing Challenges
2 Background
3 Meta-Balancer: Statistics Collection, Decision Making
4 Evaluation
5 Conclusion and Future Work
2 / 30
Introduction
Motivation
Modern parallel applications on large systems
Difficult to program and to extract the best performance
Performance is limited by the most overloaded processor
The chance that one processor is severely overloaded grows with the number of processors
Load imbalance in parallel applications
Leads to a drop in system utilization
Hampers the scalability of the application
4 / 30
Load Balancing Challenges
Load balancing has to be profitable!
Determining factors
Incurred overheads: collecting statistics, executing the strategy to find a new mapping of tasks/work units, and migrating the tasks
When to perform load balancing?
Load balancing strategy selection
Adaptive load balancing is needed in dynamic applications
5 / 30
Meta-Balancer
Automating load balancing related decision making
Monitors the application continuously and predicts load behavior
Identifies when to invoke load balancing for optimal performance based on
Predicted load behavior and guiding principles
Performance in the recent past
6 / 30
Background
Charm++
Message-driven parallel programming paradigm based on overdecomposition and migratable objects
The programmer decomposes the problem into tasks
The Charm++ runtime system (RTS) manages the scheduling of tasks on the processors
[Figure: user view vs. system implementation]
8 / 30
Dynamic Load Balancing Framework in Charm++
Based on the principle of persistence (computation and communication patterns of objects tend to persist over time, so recent measurements predict the near future)
Instruments the application tasks at a fine-grained level
Relies on the application user to invoke the load balancer and select the load balancing strategy
When load balancing is invoked, the framework
Gathers statistics according to the strategy (centralized or hierarchical)
Executes the load balancing strategy
Migrates objects based on the new mapping
9 / 30
Meta-Balancer
Design Overview
Module that controls load balancing related decision making
Implemented on top of the Charm++ load balancing framework
Key responsibilities
Monitor the application: collect minimal statistics
Identify the iteration at which to invoke load balancing to optimize performance
Form a consensus among participating processors on when to invoke load balancing
11 / 30
Statistics Collection
[Figure: asynchronous statistics collection; objects a to e execute iterations on PE0, PE1 and PE2 while the reductions Stats Red 1 and Stats Red 2 flow to ROOT]
Asynchronous collection
Overlaps with application execution
Supported using Charm++'s tree-based reduction
No barrier for statistics collection
Minimal statistics
Max load
Average load
Utilization of processors
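The slides list the three statistics but do not say how they are combined. Here is a minimal sketch, assuming each PE contributes its own load, busy time and wall-clock time for the current window. The merge step is associative (up to floating-point rounding) and commutative, which is what allows a tree-based reduction to combine partial results in any order without a barrier. `LoadStats`, `combine`, `tree_reduce` and `summarize` are hypothetical names. This is plain Python, not the Charm++ reduction API.

```python
from dataclasses import dataclass


@dataclass
class LoadStats:
    """Partial statistics for a subtree of PEs (hypothetical structure)."""
    max_load: float    # largest per-PE load in the subtree
    total_load: float  # sum of per-PE loads, used for the average
    busy_time: float   # summed busy time, used for utilization
    wall_time: float   # summed wall-clock time over the same PEs
    num_pes: int       # number of PEs contributing


def leaf_stats(load: float, busy: float, wall: float) -> LoadStats:
    """Contribution of a single PE for the current window."""
    return LoadStats(load, load, busy, wall, 1)


def combine(x: LoadStats, y: LoadStats) -> LoadStats:
    """Merge two partial results; order-independent, so any reduction tree works."""
    return LoadStats(
        max(x.max_load, y.max_load),
        x.total_load + y.total_load,
        x.busy_time + y.busy_time,
        x.wall_time + y.wall_time,
        x.num_pes + y.num_pes,
    )


def tree_reduce(items: list[LoadStats]) -> LoadStats:
    """Pairwise reduction, level by level, standing in for the RTS reduction tree."""
    if not items:
        raise ValueError("need at least one PE contribution")
    level = list(items)
    while len(level) > 1:
        merged = [combine(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            merged.append(level[-1])  # odd element is carried up unchanged
        level = merged
    return level[0]


def summarize(s: LoadStats) -> tuple[float, float, float]:
    """Root-side view: (max load, average load, processor utilization)."""
    return s.max_load, s.total_load / s.num_pes, s.busy_time / s.wall_time


# Four hypothetical PEs, each busy for its whole load within a 6-unit window.
leaves = [leaf_stats(load, load, 6.0) for load in (1.0, 2.0, 3.0, 6.0)]
max_load, avg_load, utilization = summarize(tree_reduce(leaves))  # (6.0, 3.0, 0.5)
```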
12 / 30
Decision Making
Consider the load imbalance given by
$$\zeta = \frac{L_{max} - L_{avg}}{L_{avg}}$$
ζ > 0 indicates load imbalance, which leads to performance loss
Should load balancing be invoked whenever ζ > 0?
Goal: minimize the total execution time (application time + load balancing overheads)
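The following sketch transcribes the metric directly, to make the definition concrete. The function name `imbalance` is hypothetical.

```python
def imbalance(loads: list[float]) -> float:
    """zeta = (L_max - L_avg) / L_avg over the per-PE loads of one window."""
    if not loads:
        raise ValueError("need at least one PE load")
    l_max = max(loads)
    l_avg = sum(loads) / len(loads)
    if l_avg == 0:
        return 0.0  # no work anywhere, so nothing to balance
    return (l_max - l_avg) / l_avg


zeta = imbalance([2.0, 2.0, 2.0, 6.0])      # L_avg = 3, L_max = 6 -> 1.0
balanced = imbalance([4.0, 4.0, 4.0, 4.0])  # -> 0.0
```

With ζ = 1, the most loaded PE takes twice the average time. Each iteration then loses L_max − L_avg = 3 time units compared with a perfect balance. Whether that loss justifies rebalancing depends on the cost of load balancing, which ζ alone does not capture.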
13 / 30
Model to Predict Ideal LB Period
Consider a linear model for load prediction based on the collected statistics, where t is the iteration (time step)
Average load is represented by
$$L_{avg} = a \cdot t + l_a$$
Max load is represented by
$$L_{max} = m \cdot t + l_m$$
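The slides do not specify how a, l_a, m and l_m are estimated from the collected statistics. Here is a minimal sketch, assuming an ordinary least-squares fit over (iteration, load) samples gathered since the last load balancing step. `fit_linear` is a hypothetical helper, and the same function fits both the average-load series and the max-load series.

```python
def fit_linear(ts: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * t + intercept.

    Assumed here to estimate (a, l_a) from average-load samples and
    (m, l_m) from max-load samples.
    """
    n = len(ts)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (t, y) samples of equal length")
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    s_tt = sum((t - t_mean) ** 2 for t in ts)
    if s_tt == 0:
        raise ValueError("samples must span at least two distinct iterations")
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    slope = s_ty / s_tt
    intercept = y_mean - slope * t_mean
    return slope, intercept


# Hypothetical max-load samples for iterations 0..3 since the last LB step.
iters = [0.0, 1.0, 2.0, 3.0]
max_loads = [1.0, 3.0, 5.0, 7.0]
m, l_m = fit_linear(iters, max_loads)  # slope 2.0, intercept 1.0
```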
14 / 30
Model to Predict Ideal LB Period
Total application execution time is the sum of
Time spent running the application
Load balancing overhead
$$\Gamma = \frac{\eta}{\tau} \times \left( \int_0^{\tau} (m t + l_m)\,dt + \Delta \right) + \int_0^{\eta} (a t + l_a)\,dt$$
where τ is the ideal LB period, η is the total number of iterations the application executes, Γ is the total application execution time, and Δ is the cost associated with load balancing
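The missing step between this model and its derivative is to evaluate the two integrals in closed form. The second term does not depend on τ, so it drops out when differentiating. The positive second derivative confirms that the stationary point is a minimum:

```latex
\begin{aligned}
\int_0^{\tau} (m t + l_m)\,dt &= \frac{m\tau^2}{2} + l_m \tau,
\qquad
\int_0^{\eta} (a t + l_a)\,dt = \frac{a\eta^2}{2} + l_a \eta \\
\Gamma &= \eta \left( \frac{m\tau}{2} + l_m + \frac{\Delta}{\tau} \right) + \frac{a\eta^2}{2} + l_a \eta \\
\frac{d\Gamma}{d\tau} &= \eta \left( \frac{m}{2} - \frac{\Delta}{\tau^2} \right),
\qquad
\frac{d^2\Gamma}{d\tau^2} = \frac{2\eta\Delta}{\tau^3} > 0 \quad \text{for } \Delta > 0
\end{aligned}
```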
15 / 30
Model to Predict Ideal LB Period
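Setting the derivative of the total time with respect to τ to zero to minimize it, we obtain
$$\frac{d\Gamma}{d\tau} = \eta \times \left( \frac{m}{2} - \frac{\Delta}{\tau^2} \right) = 0$$
$$\tau = \sqrt{\frac{2\Delta}{m}}$$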
[Figure: load vs. time steps/iterations, showing the maximum load prediction curve and the average load prediction curve. The area between the two curves is the time saved due to load balancing; load balancing is profitable if this area is greater than the LB cost.]
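Here is a minimal sketch of the resulting rule, assuming the fitted slope m and an estimate of the LB cost Δ are available in the same time unit. `ideal_lb_period`, `total_time` and `lb_profitable` are hypothetical names. `total_time` evaluates Γ with its integrals in closed form. `lb_profitable` applies one reading of the figure's criterion: the area between the predicted max-load and average-load lines over one period, compared with Δ.

```python
import math


def ideal_lb_period(delta: float, m: float) -> float:
    """tau = sqrt(2 * Delta / m), the LB period that minimizes Gamma."""
    if delta < 0:
        raise ValueError("LB cost must be non-negative")
    if m <= 0:
        # d(Gamma)/d(tau) = eta * (m/2 - Delta/tau^2) is never positive here,
        # so Gamma does not increase with tau: no reason to rebalance again.
        return math.inf
    return math.sqrt(2.0 * delta / m)


def total_time(tau, eta, m, l_m, a, l_a, delta):
    """Gamma with both integrals evaluated in closed form."""
    per_period = m * tau ** 2 / 2 + l_m * tau + delta
    return (eta / tau) * per_period + a * eta ** 2 / 2 + l_a * eta


def lb_profitable(tau, m, l_m, a, l_a, delta):
    """Area between predicted max and average load over [0, tau] versus LB cost."""
    time_saved = (m - a) * tau ** 2 / 2 + (l_m - l_a) * tau
    return time_saved > delta


# Hypothetical fitted values, all in the same time unit.
ETA, M, L_M, A, L_A, DELTA = 100.0, 1.0, 10.0, 0.5, 8.0, 8.0
tau = ideal_lb_period(DELTA, M)  # sqrt(2 * 8 / 1) = 4.0
costs = {t: total_time(t, ETA, M, L_M, A, L_A, DELTA) for t in (3.0, tau, 5.0)}
# Gamma is smallest at tau = 4.0 among these three candidates.
worth_it = lb_profitable(tau, M, L_M, A, L_A, DELTA)  # saved 12.0 > cost 8.0
```

Returning `math.inf` when m ≤ 0 is a design choice for this sketch. A real controller would still need some fallback, such as re-fitting after new statistics arrive.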
16 / 30
Consensus Mechanism
[Figure: consensus mechanism among PE0, PE1, PE2 and ROOT, steps 1 to 4; labels: LB Period BCast 10, Max Iteration, Final LB Period BCast 13, PAUSE, LOAD BALANCE]
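The following is one reading of the consensus diagram, sketched as a sequential simulation rather than real messaging. The root proposes an LB iteration, and each PE reports the furthest iteration reached by any of its objects. The root then settles on the later of the two, because objects that have already passed the proposed iteration cannot go back to it. `decide_lb_iteration` and the example iteration numbers are hypothetical; they only mirror the figure's labels (proposal 10, final 13).

```python
def decide_lb_iteration(proposed: int, per_pe_max_iteration: list[int]) -> tuple[int, list[int]]:
    """Return the agreed LB iteration and how many iterations each PE still runs before pausing."""
    if not per_pe_max_iteration:
        raise ValueError("need at least one PE")
    # Step 2 (assumed): reduction with max over the iterations reported by the PEs.
    furthest = max(per_pe_max_iteration)
    # Step 3 (assumed): the final LB iteration cannot precede any object's current iteration.
    final = max(proposed, furthest)
    # Step 4 (assumed): each PE keeps executing until it reaches `final`, then pauses for LB.
    remaining = [final - it for it in per_pe_max_iteration]
    return final, remaining


# Step 1 (assumed): root proposes iteration 10; PE0, PE1, PE2 report 10, 9 and 13.
final, remaining = decide_lb_iteration(10, [10, 9, 13])
# final == 13, remaining == [3, 4, 0]
```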
17 / 30
Evaluation
Applications
LeanMD: molecular dynamics simulation program
Fractography: used to study fracture surfaces of materials
Machines used
Ranger: Sun Constellation cluster at TACC
Jaguar: Cray system at ORNL
Three sets of experiments
No load balancing
Periodic load balancing
Using Meta-Balancer
19 / 30
LeanMD with No Load Balancing
Overall processor utilization is 65%
No significant variation in processor loads during the run
20 / 30
LeanMD with Periodic Load Balancing
[Figure: Elapsed time vs LB Period (Jaguar); x-axis: LB period (8 to 1024); y-axis: elapsed time (s, log scale, 10 to 10000); one curve each for 128, 256, 512, 1024, 2048 and 4096 cores]
Frequent load balancing increases execution time
Periodic load balancing may not give a performance benefit
21 / 30
LeanMD with Meta-Balancer
Invokes the load balancer at the beginning
Thereafter, the frequency of load balancing is low
Improves performance by 31% and raises overall utilization to 95%
22 / 30
LeanMD - Comparison of Execution Time
Cores  No LB (s)  Periodic LB (s) [LB period]  Meta-Balancer (s)
128    1945.16    1451.30 [200]                1388.29
256    1005.22     750.11 [200]                 695.55
512     516.47     393.30 [400]                 355.85
1024    264.15     209.64 [400]                 190.52
2048    135.92     116.69 [400]                  94.33
4096     70.68      69.60 [700]                  57.83
Meta-Balancer outperforms periodic load balancing
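The percentage reductions implied by the table can be recomputed directly. In this small sketch, the helper name `reduction_pct` is hypothetical and the values are copied from the table above.

```python
# (cores, no LB, periodic LB, Meta-Balancer), LeanMD execution times in seconds
LEANMD_TIMES = [
    (128, 1945.16, 1451.30, 1388.29),
    (256, 1005.22, 750.11, 695.55),
    (512, 516.47, 393.30, 355.85),
    (1024, 264.15, 209.64, 190.52),
    (2048, 135.92, 116.69, 94.33),
    (4096, 70.68, 69.60, 57.83),
]


def reduction_pct(baseline: float, new: float) -> float:
    """Percentage of the baseline time saved by the new configuration."""
    return 100.0 * (baseline - new) / baseline


for cores, no_lb, periodic, meta in LEANMD_TIMES:
    print(f"{cores:>5} cores: {reduction_pct(no_lb, meta):5.1f}% vs. no LB, "
          f"{reduction_pct(periodic, meta):5.1f}% vs. periodic LB")
```

For example, relative to periodic load balancing with the listed period, Meta-Balancer saves about 4.3% on 128 cores and about 19.2% on 2048 cores.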
23 / 30
Fractography with No Load Balancing
Large variation in processor utilization
Low utilization, leading to wasted resources
24 / 30
Fractography with Periodic Load Balancing
Frequent load balancing leads to high overhead and no benefit
Infrequent load balancing leads to load imbalance and results in no gains
25 / 30
Fractography with Meta-Balancer
Identifies the need for frequent load balancing in the beginning
The frequency of load balancing decreases as the load becomes balanced
Increases overall processor utilization and yields a gain of 31%
26 / 30
Conclusion and Future Work
Conclusion
Difficult to find the optimum load balancing period
Depends on the application characteristics
Depends on the machine the application runs on
Meta-Balancer automates the decision of when to invoke load balancing based on application characteristics
Meta-Balancer adaptively identifies the load balancing period
Meta-Balancer obtains substantial gains and avoids repetitive experimentation
28 / 30
Future Work
Extend Meta-Balancer to select the load balancing strategy
Computation vs. communication strategy
Refinement vs. comprehensive strategy
Centralized vs. distributed strategy
Better models for predicting load
29 / 30
Thank you!
30 / 30