SEEC: A Framework for SElf-awarE Computing
Henry Hoffmann, Martina Maggio
Outline
• Introduction/Motivating Example
• The SEEC Framework
• Experimental Validation
• Conclusions
Multicore Computing Systems Increase Burden on Application Developers
Today, application programmers have to address many, sometimes competing, concerns
[Figure: an application surrounded by competing concerns: correctness, speed (architecture), speed (algorithms), power/energy, and quality, with a heart-rate (beat/s) gauge and a power meter]
Example: Developing a Multicore Video Encoder
Allocate resources for best case:
• Encoder must drop frames to keep up
• The power is low
Allocate resources for worst case:
• Encoder exceeds goals
• The power is high, resources wasted
[Figure: a video encoder and a power meter (Lo to Hi) for each case]
Application programmers need to balance competing constraints in fluctuating environments
Self-aware Computing Can Address Challenges of Multi-objective Optimization
Self-aware (self-*, adaptive, etc.) computing has become a discipline unto itself:
• Laddaga [DARPA BAA 1997, IEEE Intelligent Systems 1999]
• Kephart and Chess [IEEE Computer 2003]
• Babaoglu et al. [LNCS 2005]
Self-aware (or self-*, adaptive, autonomic, etc.) systems have the flexibility to change behavior online to balance multiple needs in dynamic
environments
The Self-Aware Computing Idea
Traditional Systems (Decide, Act):
• Run in open loop
• Assumptions made at design time
• Based on guesses about future
• Programmer optimizes for system
• No flexibility to adapt to changes
Self-Aware Systems (Observe, Decide, Act):
• Run in closed loop
• Understand user goals
• Monitor the environment
• System optimizes for application
• Flexibly adapt behavior
Prior Work in Self-Aware Systems
• Self-aware/Adaptive/Autonomic systems have been used to solve problems in:
– Hardware [Bitirgen et al., MICRO 2008; Albonesi et al., IEEE Computer 2003]
– Software [Salehie & Tahvildari, ACM TAAS 2009]
– Real-time Systems [Block et al., ECRTS 2008]
– Mobile Computing [Masters, MobileHCI 2008]
– Dynamic Compilation [Sorber et al., SenSys 2007; Baek & Chilimbi, PLDI 2010]
– Numerical Libraries [ATLAS, SPIRAL, FFTW]
– Many others…
We build on previous work to create a programming model for self-aware systems
SElf-awarE Computing (SEEC) Framework
• Goal: Reduce programmer burden with self-aware programming model
• Key Features:
1. Applications explicitly state goals, system meets goals optimally
2. One unified decision engine adapts algorithms, software and hardware
[Figure: an application with speed (beat/s), quality, and power/energy goals, and a power meter (Lo to Hi)]
Example Self-Aware System Built from SEEC
• At key intervals, applications issue a heartbeat (e.g. once per frame)
• Apps also register desired performance (e.g. 30 beats (frames) per second)
• The performance (heart rate) and goals can be read by system software
• If performance is low the system adapts to increase performance
• If performance exceeds goals, the system frees resources
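[Figure: a video encoder with a goal of 30 beat/s reports its heart rate (20, 30, 33, 30 b/s) through the API to the self-aware system, which adjusts cores (1 to 16), clock speed (1.6 to 2.4), and bandwidth (1 to 10)]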
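The heartbeat mechanism in the bullets above can be sketched in a few lines. This is a minimal illustration only, not the Application Heartbeats API: the class `HeartbeatMonitor`, its method names, the window size, and the 5% tolerance are all hypothetical choices made for the example.

```python
from collections import deque


class HeartbeatMonitor:
    """Hypothetical stand-in for a heartbeat interface (not the real API)."""

    def __init__(self, target_rate, window=20, tolerance=0.05):
        self.target_rate = target_rate      # registered goal, in beats/s
        self.tolerance = tolerance          # assumed dead band around the goal
        self.stamps = deque(maxlen=window)  # timestamps of the most recent beats

    def heartbeat(self, now):
        """Called by the application at a key interval, e.g. once per frame."""
        self.stamps.append(now)

    def heart_rate(self):
        """Average beats/s over the window, or None until two beats are seen."""
        if len(self.stamps) < 2:
            return None
        elapsed = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / elapsed if elapsed > 0 else None

    def decision(self):
        """What system software might conclude from the rate and the goal."""
        rate = self.heart_rate()
        if rate is None:
            return "wait"
        if rate < self.target_rate * (1 - self.tolerance):
            return "add resources"
        if rate > self.target_rate * (1 + self.tolerance):
            return "free resources"
        return "hold"


# An encoder emitting one beat per frame at 20 frames/s against a 30 beat/s goal.
monitor = HeartbeatMonitor(target_rate=30.0)
for frame in range(10):
    monitor.heartbeat(now=frame * 0.05)
print(monitor.heart_rate(), monitor.decision())
```

Timestamps are passed in explicitly so the example is deterministic; a real application would presumably read a clock at each beat.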
Roles in the SEEC Framework
Phase | Application Developer | Systems Developer | SEEC System Infrastructure
Observe | Express application goals and progress (e.g. frames/second) | | Read goals and performance
Decide | | | Determine how to adapt (e.g. how much to speed up the application)
Act | | Provide a set of actions and a callback function (e.g. allocation of cores to process) | Initiate actions based on results of decision phase
Registering Application Goals
• Performance
– Goals: target heart rate and/or latency between tagged heartbeats
– Progress: issue heartbeats at important intervals
• Quality
– Goals: distortion (distance from “best” value)
– Progress: distortion over last heartbeat
• Power
– Goals: target heart rate / Watt and/or target energy between tagged heartbeats
– Progress: power/energy over last heartbeat interval
[Figure: Observe phase. The application reports performance, power, and quality to the SEEC decision engine]
Research to date focuses on meeting performance while minimizing power/maximizing quality
Registering System Actions
Each action has the following attributes:
• Estimated Speedup
– Predicted benefit of taking an action
• Cost
– Predicted downside of taking an action (increased power, lowered quality)
• Callback
– A function that takes an id and implements the associated action
[Figure: Act phase. System services register cores (1 to 16), clock speed (1.6 to 2.4), and bandwidth (1 to 10) with the SEEC decision engine, each with an estimated speedup, cost, and callback]
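The three attributes listed on this slide map naturally onto a small record type. The sketch below assumes hypothetical names (`Action`, `ActionRegistry`, `set_cores`) and made-up speedup and cost values; it illustrates the shape of the registration step, not SEEC's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    ident: int                        # id handed back to the callback
    speedup: float                    # estimated benefit of taking the action
    cost: float                       # estimated downside (e.g. relative power)
    callback: Callable[[int], None]   # implements the action for a given id


class ActionRegistry:
    """Hypothetical container that a systems developer fills with actions."""

    def __init__(self):
        self.actions: List[Action] = []

    def register(self, speedup, cost, callback):
        action = Action(len(self.actions), speedup, cost, callback)
        self.actions.append(action)
        return action.ident

    def apply(self, ident):
        action = self.actions[ident]
        action.callback(ident)        # the runtime only knows the id
        return action


# Illustrative core-allocation actions; the numbers are assumptions.
CORE_COUNTS = [1, 2, 4]
applied = []


def set_cores(ident):
    # Stand-in for a real allocation call; records what would be done.
    applied.append(CORE_COUNTS[ident])


registry = ActionRegistry()
for n in CORE_COUNTS:
    registry.register(speedup=float(n), cost=float(n), callback=set_cores)

registry.apply(2)
print(applied)
```

Keeping the callback keyed by id means the decision engine needs no knowledge of what an action does, only its estimated speedup and cost.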
The SEEC Decision Engine
Decisions are made to select actions given observations:
• Read application goals and heartbeats
• Determine speedup with adaptive 2nd order control system
• Translate speedup into one or more actions
[Figure: Decide phase. The SEEC decision engine sits between the application and system services]
The control system provides predictable and analyzable behavior
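The decide step can be illustrated with a deliberately simplified stand-in. The sketch below uses a first-order integral controller, not the adaptive 2nd order control system named on this slide, and it assumes the heart rate is roughly `base_rate * speedup`. The functions `next_speedup` and `schedule`, and the idea of splitting one time quantum between two bracketing actions, are assumptions made for illustration.

```python
def next_speedup(prev_speedup, goal_rate, measured_rate, base_rate):
    """One integral-control step: correct the speedup by the normalized error.

    Assumes measured_rate is approximately base_rate * prev_speedup, where
    base_rate is the heart rate with no extra resources (speedup 1).
    """
    error = goal_rate - measured_rate
    return prev_speedup + error / base_rate


def schedule(speedup, available):
    """Split one time quantum between the two actions that bracket `speedup`.

    `available` maps action id -> estimated speedup. Returns a list of
    (action_id, fraction_of_quantum) pairs whose time-weighted speedup equals
    the target, after clamping the target to the achievable range.
    """
    ordered = sorted(available.items(), key=lambda kv: kv[1])
    lo_id, lo = ordered[0]
    hi = ordered[-1][1]
    target = min(max(speedup, lo), hi)
    for (a_id, a), (b_id, b) in zip(ordered, ordered[1:]):
        if a <= target <= b:
            if b == a:
                return [(a_id, 1.0)]
            frac_b = (target - a) / (b - a)
            return [(a_id, 1.0 - frac_b), (b_id, frac_b)]
    return [(lo_id, 1.0)]  # only one action registered


# Goal 30 beat/s, measured 20 beat/s at speedup 2, base rate 10 beat/s.
target = next_speedup(prev_speedup=2.0, goal_rate=30.0,
                      measured_rate=20.0, base_rate=10.0)
plan = schedule(target, {0: 1.0, 1: 2.0, 2: 4.0})
print(target, plan)
```

In this example no single action delivers the required speedup of 3, so the plan spends half of the quantum on the 2x action and half on the 4x action.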
Optimizing Resource Allocation with SEEC
• SEEC can observe, decide and act
• How does this enable optimal resource allocation?
• Let’s implement the video encoder example from the introduction
Performance/Watt Adaptation in Video Encoding
[Figure: two plots. Performance (frame/s) vs. time (heartbeat), with the performance goal marked; power (W) vs. time (s)]
System Models in SEEC
[Figure: speedup vs. action for the initial model]
SEEC’s control system takes actions based on models (of speedup and cost per action) associated with actions
What if the models are inaccurate?
Updating Models in SEEC
• After every action, SEEC updates system models
• Kalman filters used to estimate true speedups
[Figure: Decide phase. The SEEC decision and adaptation engine sits between the application and system services. Plots: speedup vs. action for the initial model; normalized performance/Watt per x264 input (blue_sky, crowd_run_1080p, dinner, ducks_take_off_1080p, factory, in_to_tree_1080p, life, native, old_town_cross_1080p, park_joy_1080p, pedestrian_area, riverbed, rush_hour, station2, sunflower, tractor, and the average) for static worst, static oracle, SEEC with known model, and SEEC with learned model]
SEEC combines predictability of control systems with adaptability of learning systems
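To make the Kalman filter step concrete, here is a minimal scalar filter that refines one action's speedup estimate from noisy observations. It is a textbook one-dimensional filter under assumed noise variances, not SEEC's actual estimator; the class name `ScalarKalman`, the parameter values, and the sample observations are hypothetical.

```python
class ScalarKalman:
    """One-dimensional Kalman filter tracking a slowly drifting speedup."""

    def __init__(self, initial_estimate, initial_variance=1.0,
                 process_var=1e-4, measurement_var=0.05):
        self.x = initial_estimate   # current speedup estimate
        self.p = initial_variance   # uncertainty of that estimate
        self.q = process_var        # assumed drift of the true speedup
        self.r = measurement_var    # assumed noise in observed speedups

    def update(self, observed_speedup):
        self.p += self.q                            # predict (random-walk model)
        gain = self.p / (self.p + self.r)           # Kalman gain
        self.x += gain * (observed_speedup - self.x)
        self.p *= (1.0 - gain)                      # shrink uncertainty
        return self.x


# The initial model claims a speedup of 8 for some action; observations taken
# after the action is applied (e.g. heart rate / base rate) hover around 5.
estimate = ScalarKalman(initial_estimate=8.0)
for observed in [5.2, 4.9, 5.1, 5.0, 4.8]:
    estimate.update(observed)
print(round(estimate.x, 2))
```

Because the initial variance is large relative to the measurement noise, the first observation pulls the estimate most of the way from the inaccurate model toward the measured value, and later observations refine it.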
SEEC Online Learning of Speedup Model for Application with Local Minima
[Figure: speedup vs. action, comparing the initial model, actual speedups, and learned speedups]
Handling Multiple Applications
[Figure: Decide phase. One SEEC decision engine serves four applications, each with its own power meter (Lo to Hi), and drives system services]
• Control actions computed separately for each application
• For finite resources, several alternatives:
• Priorities determine which apps meet resource needs
• Weights determine proportional assignment
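The two alternatives for finite resources can be sketched as follows. Both functions are illustrative assumptions about how priorities and weights might be applied to per-application core demands; the function names and the example numbers are hypothetical.

```python
def allocate_by_priority(demands, priorities, total):
    """Higher-priority applications receive their full demand first."""
    grants = {}
    remaining = total
    for app in sorted(demands, key=lambda a: priorities[a], reverse=True):
        grants[app] = min(demands[app], remaining)
        remaining -= grants[app]
    return grants


def allocate_by_weight(demands, weights, total):
    """When demand exceeds supply, share resources in proportion to weights.

    Each share is capped at the application's demand; leftover capacity is
    not redistributed in this simplified sketch.
    """
    if sum(demands.values()) <= total:
        return dict(demands)
    weight_sum = sum(weights[a] for a in demands)
    return {a: min(demands[a], total * weights[a] / weight_sum)
            for a in demands}


# Two applications each want 6 cores, but only 8 are available.
demands = {"bodytrack": 6, "x264": 6}
by_priority = allocate_by_priority(demands, {"bodytrack": 2, "x264": 1}, total=8)
by_weight = allocate_by_weight(demands, {"bodytrack": 3, "x264": 1}, total=8)
print(by_priority, by_weight)
```

With priorities, the lower-priority application simply receives whatever is left; with weights, every application is throttled in proportion once resources run short.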
Systems Built with SEEC
System | Actions | Tradeoff | Benchmarks
Dynamic Loop Perforation | Skip some loop iterations | Performance vs. Quality | 7/13 PARSECs
Dynamic Knobs | Make static parameters dynamic | Performance vs. Quality | bodytrack, swaptions, x264, SWISH++
Core Scheduler | Assign N cores to application | Compute vs. Power | 11/13 PARSECs
Clock Scaler | Change processor speed | Compute vs. Power | 11/13 PARSECs
Bandwidth Allocator | Assign memory controllers to application | Memory vs. Power | STREAM (doesn’t make a difference for PARSEC)
Power Manager | Combination of the three above | Performance vs. Power | PARSEC, STREAM, simple test apps (mergesort, binary search)
Learned Models | Power Manager with speedup and cost learned online | Performance vs. Power | PARSECs
Multi-App Control | Power Manager with multiple applications | Performance vs. Power for multiple applications | Combinations of PARSECs
Dynamic Knobs: Creating Adaptive Applications
Turn static command line parameters into dynamic structure
Application Goals: Maintain performance and minimize quality loss
System Actions: Adjust memory locations to change application settings
Experiment: Maintain performance when clock speed changes. Benchmarks: bodytrack, swaptions, SWISH++, x264
Detail in Hoffmann et al., “Dynamic Knobs for Power-Aware Computing”, ASPLOS 2011
Results: Enabling Dynamic Applications
[Figure: bodytrack. Normalized performance vs. time for baseline, power cap, and dynamic knobs]
• Clock drops 2.4 to 1.6 GHz
• w/o SEEC perf. drops
• w/ SEEC perf. recovers
• Clock rises 1.6 to 2.4 GHz
• SEEC returns quality to baseline
Results: Enabling Dynamic Applications
[Figure: three panels (swaptions, SWISH++, x264), each plotting normalized performance vs. time for baseline, power cap, and dynamic knobs. Panel annotations: “Maintains performance despite noise”, “Perfect behavior”, “Maintains baseline performance”]
Dynamic knobs automatically enable dynamic response for a range of applications using a single mechanism
Optimizing Performance per Watt for Video Encoding
Adapt system behavior to needs of individual inputs
Application Goals: Maintain 30 frame/s while minimizing power
System Actions: Change cores, clock speed, and memory bandwidth
Experiment: Benchmark: x264 w/ 16 different 1080p inputs. Compare performance/Watt w/ SEEC to best static allocation of resources
Results: Optimizing Performance/Watt for Video Encoder
[Figure: normalized performance/Watt for static worst, static best, and SEEC]
Learning Models Online
Adapt system behavior to needs of individual inputs
Application Goals: Maintain 30 frame/s while minimizing power
System Actions: Change cores, clock speed, and memory bandwidth. Tailor models to individual applications while running
Experiment: Benchmark: x264 w/ 16 different 1080p inputs. Compare performance/Watt w/ learned model to previous measurements
Results: Performance/Watt with Online Learning
[Figure: normalized performance/Watt for static worst, static oracle, SEEC with known model, and SEEC with learned model]
Managing Application and System Resources Concurrently
Manage multiple applications when clock frequency changes
Application Goals:
• bodytrack: maintain performance, minimize power
• x264: maintain performance, minimize quality loss
System Actions:
• Change core allocation to both applications
• Change x264’s algorithms
Experiment: Maintain performance of both applications when clock frequency changes
Results: SEEC Management of Multiple Applications
[Figure: two panels (bodytrack, x264), each plotting normalized performance and allocated cores vs. time (heartbeat), with and without adaptation]
Plot annotations:
• Clock drops 2.4 to 1.6 GHz
• w/o SEEC app misses goals
• SEEC allocates cores to bodytrack
• w/o SEEC app exceeds goals
• SEEC removes cores from x264
• SEEC adjusts algorithm to meet goals
Summary of Experiments
Experiment | Demonstrated Benefit of SEEC
Dynamic Knobs | Maintains application performance in the face of loss of compute resources
Performance/Watt | Outperforms oracle for static allocation of resources by adapting to fluctuations in input data
Performance/Watt with learning | Learns models online and still achieves 95% of static oracle
Multi-App control | Maintains performance of multiple apps by managing algorithm and system resources to adapt to loss of compute resources
SEEC References
• Application Heartbeats framework:
– Hoffmann, Eastep, Santambrogio, Miller, Agarwal. Application Heartbeats: A Generic Interface for Specifying Program Performance and Goals in Autonomous Computing Environments. ICAC 2010.
• Control Systems:
– Maggio, Hoffmann, Santambrogio, Agarwal, Leva. Controlling software applications within the Heartbeats framework. CDC 2010.
– Maggio, Hoffmann, Santambrogio, Agarwal, Leva. Power-Aware Design for Embedded Systems. Under Review.
• Adaptive Applications:
– Hoffmann, Misailovic, Sidiroglou, Agarwal, Rinard. Using Code Perforation to Improve Performance, Reduce Energy Consumption, and Respond to Failures. MIT-CSAIL-TR-2009-042. 2009.
– Hoffmann, Sidiroglou, Carbin, Misailovic, Agarwal, Rinard. Dynamic Knobs for Power-Aware Computing. ASPLOS 2011.
• The SEEC Framework:
– Hoffmann, Maggio, Santambrogio, Leva, Agarwal. SEEC: A Framework for Self-aware Computing. MIT-CSAIL-TR-2010-049. 2010.
Conclusions
• SEEC is designed to help ease programmer burden
– Solves resource allocation problems
– Adapts to fluctuations in environment
• SEEC has two distinguishing features
– Incorporates goals and feedback directly from the application
– Abstracts sensors, controller, and actuator to create a generic feedback control system capable of managing algorithm, software, and hardware adaptation
• Demonstrated the benefits of SEEC in several experiments
– SEEC can optimize performance per Watt for video encoding
– SEEC can adapt algorithms and resource allocation to meet goals in the face of power caps or other changes in environment
Backup Slides
[Figure: normalized performance vs. time (heartbeat) for static min, static max, SEEC (pure delay), SEEC (slow convergence), and SEEC (oscillating)]
Controlling Memory Bandwidth for STREAM
[Figure: normalized performance vs. time (heartbeat) for static min, static max, SEEC (pure delay), SEEC (slow convergence), and SEEC (oscillating)]
Using Code Perforation to Save Power in Server Farms
• Currently peak load met by provisioning extra hardware
• Instead, we reduce hardware
• At low loads, no perforation necessary
• At high loads, perforation increases capacity
– Runtime detects performance degradation from load
– Runtime adjusts perforation in running apps to respond to load
– Same peak load met with fewer machines
• Tested by consolidating mini-server farm
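A toy example helps show what perforation trades away. The sketch below perforates a simple averaging loop and adjusts the skip factor from a measured rate; the function names, the `max_skip` bound, and the averaging workload are assumptions chosen for illustration and do not come from the perforation runtime itself.

```python
def perforated_mean(values, skip_factor):
    """Average `values`, visiting only every `skip_factor`-th element.

    skip_factor=1 runs the full loop; larger factors do less work and
    return an approximate result.
    """
    sampled = values[::skip_factor]
    return sum(sampled) / len(sampled)


def choose_skip_factor(measured_rate, goal_rate, current, max_skip=4):
    """Raise perforation when performance degrades, lower it when load eases."""
    if measured_rate < goal_rate and current < max_skip:
        return current + 1
    if measured_rate > goal_rate and current > 1:
        return current - 1
    return current


values = list(range(100))
exact = perforated_mean(values, skip_factor=1)

# Under high load the runtime sees 20 beat/s against a 30 beat/s goal.
skip = choose_skip_factor(measured_rate=20.0, goal_rate=30.0, current=1)
approx = perforated_mean(values, skip_factor=skip)
quality_loss = abs(exact - approx) / exact
print(skip, exact, approx, quality_loss)
```

Here skipping every other iteration halves the work while moving the result from 49.5 to 49.0, which is the kind of small quality loss exchanged for capacity at peak load.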
Power Saving With Code Perforation
[Figure: two panels (H.264 Video Encode, SWISH++ Search Engine) plotting power (W) and quality loss (%) vs. utilization for the original system and the consolidated system]
• Power:
– Up to 3/4 reduction in machines and power for video
– Up to 1/3 reduction in machines and power for search
• Quality:
– Max 8% loss for video
– Max 30% loss for search (fewer total results; precision of top 10 unchanged)