SizeCap: Efficiently Handling Power Surges for Fuel Cell Powered Data Centers
Yang Li, Di Wang, Saugata Ghose, Jie Liu, Sriram Govindan, Sean James, Eric Peterson,
John Siegler, Rachata Ausavarungnirun, Onur Mutlu
Executive Summary
- Fuel cells: efficient power source for data centers
- Problem: limited load following capability
  - Fuel cells only gradually increase output power when load increases
  - Power surges may lead to a power shortfall → server shutdown or damage
- Existing Approaches
  - Power capping: hurts performance
  - Energy storage device (ESD): increases cost
- Our Approach: SizeCap
  - Our goal: low cost, still guarantee workload performance
  - Key Idea 1: Size the ESD to cover only typical-case power surges
  - Key Idea 2: Use smart power capping, which is aware of fuel cell and workload behavior, to handle remaining power surges
- SizeCap safely reduces ESD size by 50–85%
Outline
- Background
- Problem
- Existing Approaches
- Key Ideas
- Detailed Design
- Evaluation
- Conclusion
Fuel Cell Powered Data Centers
- Data center power consumption continues to grow
  - In USA alone: 91 billion kWh (2013) → 140 billion kWh (2020)
  - We need more energy-efficient power sources
- Fuel cells
  - Convert fuel (e.g., hydrogen, natural gas) into electricity
  - Advantages: high energy efficiency, low CO2 emission, highly reliable delivery infrastructure
[Figure: a fuel cell system powering a rack of servers (Server 1 ... Server N)]
Problem: Limited Load Following Capability
- Fuel cell power output only gradually increases when power demand increases
- Can lead to server damage or shutdown
[Figure: Rack Power Demand and Fuel Cell Power, Power (kW, 6 to 10) vs. Time (min, 0 to 8); the gap between the two curves is labeled Power Shortfall]
How can we efficiently handle power shortfalls?
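To make the shortfall concrete, the following is a minimal sketch of a ramp-limited power source following a step in demand. The linear ramp model, the function name `simulate_shortfall`, and all numbers are assumptions chosen for illustration; they are not the fuel cell model used in this work.

```python
def simulate_shortfall(demand_kw, ramp_kw_per_step, dt_s=1.0):
    """Return (fuel_cell_kw, shortfall_kj) for a ramp-limited source.

    Assumed model: output starts at the first demand sample, may rise by at
    most ramp_kw_per_step per step, and can follow a lower demand immediately.
    """
    fuel_cell = [demand_kw[0]]
    shortfall_kj = 0.0
    for demand in demand_kw[1:]:
        output = min(demand, fuel_cell[-1] + ramp_kw_per_step)
        fuel_cell.append(output)
        shortfall_kj += (demand - output) * dt_s  # kW * s = kJ
    return fuel_cell, shortfall_kj


# Illustrative step from 6 kW to 10 kW with a 1 kW-per-step ramp limit.
demand = [6.0] + [10.0] * 10
output, shortfall = simulate_shortfall(demand, ramp_kw_per_step=1.0)
```

The energy in `shortfall` is what an ESD would have to supply, or what power capping would have to remove from the demand.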
Existing Approaches to Handling Power Shortfalls
- Power capping
  - Cuts down the power demand
  - Performs DVFS or shuts down nodes
  - Low cost
  - Hurts performance
- Energy storage device (ESD)
  - Buffers energy
  - Supplies extra energy when needed
  - High performance
  - High cost: ESD is sized to handle worst-case power surges, even though they rarely occur
- Our goal: high performance, low cost
[Figure: cost vs. performance chart positioning ESD, Power Capping, and Our Goal]
SizeCap: Key Ideas
Key Idea 1: Size ESD based on typical-case power surges, not worst-case surges
Key Idea 2: Use smart power capping to handle remaining power surges
Key Idea 1: Size ESD Based on Typical Case
- We study production data center traces from Microsoft
- Unavailable period: time during which an underprovisioned ESD cannot handle power surges
- Trace 1: reduce ESD size by 85% → only 0.4% unavailable period
[Figure, Trace 1: Rack Power (kW) vs. Time (hour, 0 to 24); Percentage of Unavailable Period (%) vs. ESD Capacity (kJ), with Typical Case and Worst Case marked]
Key Idea 1: Size ESD Based on Typical Case
- We study production data center traces from Microsoft
- Trace 2: reduce ESD size by 50% → 6.2% unavailable period
[Figure, Trace 2: Rack Power (kW) vs. Time (hour, 0 to 2); Percentage of Unavailable Period (%) vs. ESD Capacity (kJ), with Typical Case and Worst Case marked]
Sizing ESD based on typical-case power surges does not hurt performance significantly
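As an illustration of how an unavailable-period curve could be produced from a power trace, here is a rough sketch. It assumes a linear ramp-limited fuel cell, an ESD that starts full, and an ESD that refills instantly whenever the fuel cell alone meets demand. These simplifications and the name `unavailable_fraction` are hypothetical and do not describe the analysis behind the plots.

```python
def unavailable_fraction(demand_kw, esd_capacity_kj, ramp_kw_per_step, dt_s=1.0):
    """Fraction of steps in which fuel cell plus ESD cannot cover demand."""
    fuel_cell = demand_kw[0]
    stored_kj = esd_capacity_kj
    unavailable = 0
    for demand in demand_kw[1:]:
        fuel_cell = min(demand, fuel_cell + ramp_kw_per_step)
        gap_kj = (demand - fuel_cell) * dt_s
        if gap_kj == 0:
            stored_kj = esd_capacity_kj   # assumed instant recharge
        elif gap_kj <= stored_kj:
            stored_kj -= gap_kj           # ESD covers the whole gap
        else:
            stored_kj = 0.0               # ESD drained, surge not covered
            unavailable += 1
    return unavailable / (len(demand_kw) - 1)


# Illustrative surge: the gap is 3, 2, then 1 kJ over the first three steps.
surge = [6.0] + [10.0] * 10
full = unavailable_fraction(surge, esd_capacity_kj=6.0, ramp_kw_per_step=1.0)
half = unavailable_fraction(surge, esd_capacity_kj=3.0, ramp_kw_per_step=1.0)
```

Sweeping `esd_capacity_kj` over a range of sizes would trace out a curve of the same kind as the one in the figures.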
Key Idea 2: Smart Power Capping
- Make power capping aware of fuel cell load following behavior
  - Fuel cells respond differently to different power surges
  - With fuel cell load following model, we can know how fuel cell power responds to rack power demand
  - Control the rack power such that it never exceeds sum of fuel cell power and ESD output
- Make power capping aware of workload behavior
  - Workload performance is dependent on how power is allocated over time
  - Allocate power over time to maximize workload performance
Smart power capping uses fuel cell, workload behavior to deliver higher benefits
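The control constraint stated above can be written compactly. The symbols are introduced here for illustration and are assumptions: P_rack is the capped rack power, P_FC the fuel cell output, and P_ESD the ESD discharge power.

```latex
P_{\mathrm{rack}}(t) \;\le\; P_{\mathrm{FC}}(t) + P_{\mathrm{ESD}}(t) \qquad \text{for all } t
```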
SizeCap
- A framework to reduce ESD capacity by employing smart power capping policies
- At design time
  - Select best power capping policy implementable in system
  - Find minimum ESD size that still meets service level agreement (SLA) under the selected policy
- At runtime
  - Period-based power control
  - Every period: use power capping policy to determine power used by each server in next period
Design Time: Policy Selection & ESD Sizing
[Diagram: SizeCap takes a Power Capping Policy Pool, a Representative Workload/Trace, a Service Level Agreement (SLA), and Power Capping Constraints as inputs; the Best Capping Policy feeds the ESD Sizing Engine; the output is ESD Capacity + Power Capping Policy]
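One way the sizing step could be organized is a binary search over candidate ESD sizes. This is only a sketch: it assumes that meeting the SLA is monotonic in ESD size (if a size works, every larger size works), and `meets_sla` is a hypothetical stand-in for a trace-driven evaluation under the selected policy.

```python
def minimum_esd_size(candidate_sizes_kj, meets_sla):
    """Smallest size in the ascending list for which meets_sla(size) is True.

    Returns None when no candidate meets the SLA.
    """
    lo, hi = 0, len(candidate_sizes_kj)
    while lo < hi:
        mid = (lo + hi) // 2
        if meets_sla(candidate_sizes_kj[mid]):
            hi = mid          # this size works; try smaller ones
        else:
            lo = mid + 1      # too small; search larger sizes
    return candidate_sizes_kj[lo] if lo < len(candidate_sizes_kj) else None


# Hypothetical predicate: pretend the SLA is met from 17 kJ upward.
sizes = list(range(0, 115, 5))
smallest = minimum_esd_size(sizes, lambda size_kj: size_kj >= 17)
```

If the monotonicity assumption does not hold for a given policy, a linear scan over the candidates would be the safer choice.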
Runtime: Execute Power Capping Policy
- Power Budget Planner: Plan total rack power budget for next period
- Power Budget Assigner: Distribute rack power among the servers for next period
- Controller can be centralized or decentralized
[Diagram: Power Capping Controller. The Power Budget Planner uses Fuel Cell System Info and ESD Energy to produce the Rack Power Budget; the Power Budget Assigner uses the Rack Power Budget and Server Info(s) to produce Server Power Budget(s) for the Server(s)]
Power Capping Policy Taxonomy
- Dimensions
  - Fuel Cell Model Aware (FCA) vs. Unaware (FCU)
  - Workload Aware (WA) vs. Unaware (WU)
  - Centralized (C) vs. Decentralized (D)
- Power Capping Policy
  - FCA WA: C FCA WA, D FCA WA
  - FCA WU: C FCA WU, D FCA WU
  - FCU WA: C FCU WA, D FCU WA
  - FCU WU: C FCU WU, D FCU WU
Power Capping Policy Taxonomy
- Policies shown
  - FCA WA: C FCA WA
  - FCA WU: C FCA WU, D FCA WU
  - FCU WU: C FCU WU, D FCU WU
- Highlighted: Fuel Cell Unaware, Workload Unaware
Fuel Cell and Workload Unaware Policies
- Goal: No power shortfalls, optimize performance in next period
- Power Budget Planner
  - Use ESD first
  - When ESD is used up
    - Ramp up rack power with conservative but safe rate
    - Static rate that guarantees no shortfalls in entire fuel cell operating range
- Power Budget Assigner
  - Assign power to each server proportional to each server’s workload intensity or current power consumption
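A single control period of such a policy might look like the sketch below. The planner lets the fuel cell rise by a fixed safe ramp, draws on the ESD first to cover the remaining gap, and caps whatever is left; the assigner splits the rack budget in proportion to current server power. Function names, parameters, and numbers are assumptions for illustration, not the in-house driver's interface.

```python
def plan_rack_budget(demand_kw, fuel_cell_kw, esd_kj, safe_ramp_kw, period_s):
    """Return (rack_budget_kw, next_fuel_cell_kw, remaining_esd_kj)."""
    # Fuel cell may rise by at most the static safe ramp in one period.
    next_fuel_cell = min(demand_kw, fuel_cell_kw + safe_ramp_kw)
    gap_kw = demand_kw - next_fuel_cell
    # Use the ESD first; it can sustain at most esd_kj / period_s kW.
    esd_kw = min(gap_kw, esd_kj / period_s)
    # Any demand beyond fuel cell plus ESD is capped away.
    budget = next_fuel_cell + esd_kw
    return budget, next_fuel_cell, esd_kj - esd_kw * period_s


def assign_server_budgets(rack_budget_kw, server_power_kw):
    """Split the rack budget in proportion to each server's current power."""
    total = sum(server_power_kw)
    if total == 0:
        share = rack_budget_kw / len(server_power_kw)
        return [share] * len(server_power_kw)
    return [rack_budget_kw * p / total for p in server_power_kw]


# Illustrative period: 10 kW demand, 6 kW fuel cell, 4 kJ left in the ESD.
budget, fuel_cell, esd_left = plan_rack_budget(10.0, 6.0, 4.0, 1.0, 2.0)
per_server = assign_server_budgets(budget, [1.0, 2.0])
```

In this example the ESD can only sustain 2 kW for the 2-second period, so 1 kW of demand is capped.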
Fuel Cell Aware, Workload Unaware Policies
- Goal: No power shortfalls, optimize performance in next period
- Power Budget Planner
  - Use ESD first
  - When ESD is used up
    - Ramp up rack power with maximum safe ramp rate
    - Dynamically adapted rate to guarantee no shortfalls only under current conditions, derived from fuel cell model
- Power Budget Assigner
  - Same as fuel cell and workload unaware policies
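The difference between a static and a dynamically adapted ramp rate can be sketched with a lookup table. The table below is a made-up placeholder for a fuel cell model, not measured data: it maps an output level to the largest ramp assumed safe at or above that level.

```python
# Hypothetical model: (output threshold in kW, safe ramp in kW per period).
SAFE_RAMP_TABLE = [(0.0, 2.0), (6.0, 1.5), (8.0, 1.0)]


def dynamic_safe_ramp(fuel_cell_kw):
    """Safe ramp for the current operating point (fuel cell aware)."""
    ramp = SAFE_RAMP_TABLE[0][1]
    for threshold_kw, value in SAFE_RAMP_TABLE:
        if fuel_cell_kw >= threshold_kw:
            ramp = value
    return ramp


def static_safe_ramp():
    """Single rate that is safe over the whole range (fuel cell unaware)."""
    return min(value for _, value in SAFE_RAMP_TABLE)
```

Under this assumed table the dynamic rate is never lower than the static one, which is why a fuel cell aware planner can raise rack power sooner.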
Fuel Cell and Workload Aware Policy
- Goal: No power shortfalls, optimize performance over multiple periods
  - Spend max ESD power now, cap aggressively in later periods
  - Save some power now and cap more, use power in later periods
- Power Budget Planner
  - Use fuel cell model to find all safe power capping settings
  - Use workload behavior to assign power
    - Look at how workload performs over next several periods under different power allocations
    - Pick power capping setting that maximizes long-term performance
- Power Budget Assigner
  - Similar to previous policies
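The trade-off between spending ESD energy now and saving it for later can be illustrated with a brute-force search over a short horizon. This sketch assumes the fuel cell power available in each future period is already known from a fuel cell model, and uses a hypothetical concave performance function `perf`; neither comes from this work.

```python
from itertools import product


def best_esd_schedule(fuel_cell_kw, esd_kj, period_s, perf, step_kj=1.0):
    """Return the per-period ESD energy plan that maximizes summed perf.

    fuel_cell_kw: assumed safe fuel cell power for each future period.
    perf: hypothetical function mapping rack power (kW) to performance.
    """
    levels = [i * step_kj for i in range(int(esd_kj // step_kj) + 1)]
    best_score, best_plan = float("-inf"), None
    for plan in product(levels, repeat=len(fuel_cell_kw)):
        if sum(plan) > esd_kj:
            continue  # plan would use more energy than the ESD holds
        score = sum(perf(fc + e / period_s) for fc, e in zip(fuel_cell_kw, plan))
        if score > best_score:
            best_score, best_plan = score, plan
    return best_plan


def sqrt_perf(kw):
    """Illustrative performance model with diminishing returns."""
    return kw ** 0.5


# Two periods, 4 kJ of ESD energy, fuel cell at 6 kW then 8 kW.
plan = best_esd_schedule([6.0, 8.0], esd_kj=4.0, period_s=1.0, perf=sqrt_perf)
```

With diminishing returns, the search spreads the ESD energy so both periods run at 9 kW instead of spending all 4 kJ in the first period. Exhaustive search grows exponentially with the horizon, so it is only practical for a few periods.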
Evaluation Methodology
- Simulation Configuration
  - Rack with 45 production servers
  - Each server runs power capping driver developed in-house
- Traces
  - Production traces collected from Microsoft data centers
  - WebSearch workload
- Metrics
  - Success rate: Percentage of requests completed within the maximum allowable service time
  - Average latency: Average service latency of all requests
  - P95 latency: 95th percentile (tail) latency
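For reference, the three metrics can be computed from a list of per-request latencies as sketched below. The nearest-rank definition of the 95th percentile is an assumption; the percentile method used in the evaluation is not stated on the slides.

```python
def evaluate(latencies_ms, max_service_time_ms):
    """Return (success_rate_pct, average_latency_ms, p95_latency_ms)."""
    n = len(latencies_ms)
    on_time = sum(1 for latency in latencies_ms if latency <= max_service_time_ms)
    success_rate = 100.0 * on_time / n
    average = sum(latencies_ms) / n
    ordered = sorted(latencies_ms)
    rank = (95 * n + 99) // 100        # integer ceiling of 0.95 * n
    p95 = ordered[rank - 1]            # nearest-rank percentile
    return success_rate, average, p95


# Illustrative input: latencies of 1..100 ms with a 90 ms service-time limit.
metrics = evaluate(list(range(1, 101)), max_service_time_ms=90)
```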
Key Evaluation Results
- SLA: Assume margins of 0.1% success rate, 3% average latency, and 10% P95 latency under fully-provisioned ESD
- D-FCA-WU: Safely reduces ESD size by 85% for Trace 1, by 50% for Trace 2, and meets SLA
- Policies with awareness of fuel cell and/or workload behavior reduce ESD size 10–20% more than unaware policies
[Figure: Rack Power (kW) vs. Time (hour) for Trace 1 (0 to 24 hours) and Trace 2 (0 to 2 hours)]
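A margin check of this kind could be sketched as follows. The interpretation is an assumption: the success-rate margin is treated as percentage points below the fully-provisioned baseline, and the latency margins as relative increases over it. The function name and tuple layout are hypothetical.

```python
def meets_sla(baseline, candidate, success_margin_pct=0.1,
              avg_margin=0.03, p95_margin=0.10):
    """Compare (success_rate_pct, avg_latency, p95_latency) tuples.

    baseline: metrics with a fully-provisioned ESD.
    candidate: metrics with the reduced ESD and power capping.
    """
    base_success, base_avg, base_p95 = baseline
    success, avg, p95 = candidate
    return (success >= base_success - success_margin_pct
            and avg <= base_avg * (1 + avg_margin)
            and p95 <= base_p95 * (1 + p95_margin))


# Illustrative numbers only.
baseline = (99.9, 100.0, 200.0)
within = meets_sla(baseline, (99.85, 102.0, 215.0))
violates = meets_sla(baseline, (99.85, 104.0, 215.0))
```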
Conclusion
- Fuel cells: efficient power source for data centers
- Problem: limited load following capability
  - Fuel cells only gradually increase output power when load increases
  - Power surges may lead to a power shortfall → server shutdown or damage
- Existing Approaches
  - Power capping: hurts performance
  - Energy storage device (ESD): increases cost
- Our Approach: SizeCap
  - Our goal: low cost, still guarantee workload performance
  - Key Idea 1: Size the ESD to cover only typical-case power surges
  - Key Idea 2: Use smart power capping, which is aware of fuel cell and workload behavior, to handle remaining power surges
- SizeCap safely reduces ESD size by 50–85%
Impact of ESD Cost on TCO
- ESD cost
  - Supercapacitor: $5.6 per kJ [McCawley, Fung Institute 2014]
- ESD sizes for our traces
  - Trace 1
    - Fully-provisioned: 112.5 kJ per rack → $630.00
    - After SizeCap: 16.9 kJ per rack → $94.50
  - Trace 2
    - Fully-provisioned: 68.0 kJ per rack → $380.80
    - After SizeCap: 34.0 kJ per rack → $190.40
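The per-rack costs follow from multiplying ESD capacity by the quoted cost per kJ. The sketch below reproduces the arithmetic. It assumes the 16.9 kJ figure is 112.5 kJ reduced by 85% (16.875 kJ) and then rounded, which is consistent with the $94.50 shown but is an inference rather than something the slide states.

```python
COST_PER_KJ_USD = 5.6  # supercapacitor cost quoted on the slide


def esd_cost_usd(capacity_kj):
    """Per-rack ESD cost for a given capacity, rounded to cents."""
    return round(capacity_kj * COST_PER_KJ_USD, 2)


trace1_full = esd_cost_usd(112.5)
trace1_sized = esd_cost_usd(112.5 * (1 - 0.85))   # assumed 85% reduction
trace2_full = esd_cost_usd(68.0)
trace2_sized = esd_cost_usd(68.0 * (1 - 0.50))    # 50% reduction
```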