How Much Power Oversubscription is Safe and Allowed in Data Centers?
Xing Fu (1,2), Xiaorui Wang (1,2), Charles Lefurgy (3)
(1) EECS @ University of Tennessee, Knoxville
(2) ECE @ The Ohio State University
(3) IBM Research, Austin
Introduction
• Power: a first-class constraint in data center design
• Power oversubscription through power capping
  • Improves power facility utilization
  • Improves server performance
• Power capping at different levels
  • Servers, racks, and data centers
• However, all of these share a common assumption:
  Power should never exceed the rated power capacity?
  • Otherwise the circuit breaker (CB) would trip?
  • Not really! Circuit breakers can sustain short overloads.
How Much Power Oversubscription is Safe?
• Whether a CB trips depends on
  • the magnitude of the overload
  • the duration of the overload
• Ideal upper bound of safe oversubscription?
  • The lower bound of the tolerance band
• This paper
  • Investigates CB trip features
  • Proposes adaptive power control to
    1. fully utilize the allowed overload interval for maximized server performance
    2. safely host more servers without upgrading power facilities
[Figure: Trip curve of a typical circuit breaker. Annotations: rated capacity; 1.17 × rated capacity; two minutes.]
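CB-Adaptive needs a tripping time for the current overload. A minimal sketch, assuming the trip curve is available as a table of (overload ratio, seconds) points on the lower bound of the tolerance band; `trip_time_s` is a hypothetical helper, not from the paper. Only the (1.17, 120 s) pair is suggested by the figure annotations; the other two points are hypothetical placeholders for datasheet values.

```python
import math
from bisect import bisect_left

def trip_time_s(overload_ratio, curve):
    """Estimate how long (s) a CB can carry a given overload before it may trip.

    overload_ratio: load divided by the rated capacity of the CB.
    curve: (ratio, seconds) points on the lower bound of the tolerance band,
           sorted by increasing ratio, every ratio > 1.0.
    """
    if overload_ratio <= 1.0:
        return math.inf  # at or below the rating: no trip expected
    ratios = [r for r, _ in curve]
    if overload_ratio <= ratios[0]:
        # A smaller overload trips no sooner, so the first point is a safe bound.
        return curve[0][1]
    if overload_ratio > ratios[-1]:
        raise ValueError("overload ratio is beyond the tabulated trip curve")
    i = bisect_left(ratios, overload_ratio)  # first point with ratio >= overload_ratio
    (r0, t0), (r1, t1) = curve[i - 1], curve[i]
    # Trip curves are drawn on log-log axes, so interpolate linearly in log space.
    frac = math.log(overload_ratio / r0) / math.log(r1 / r0)
    return t0 * (t1 / t0) ** frac

# Hypothetical curve: only (1.17, 120 s) is taken from the figure annotations.
curve = [(1.17, 120.0), (1.5, 30.0), (2.0, 10.0)]
print(f"{trip_time_s(1.3, curve):.1f} s")
```

Clamping below the first tabulated point errs on the safe side, while refusing to extrapolate past the last point avoids silently overestimating the time left before a trip.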
Proposed Solution: CB-Adaptive
• More than just a standalone controller
  • A methodology that adapts the parameters of existing power controllers to engineer their settling times
• Example: adapting a server power controller [Lefurgy ICAC'07]
  1. Obtain the tripping time from the CB tripping curve.
  2. Set the desired settling time to the tripping time.
  3. Adapt the controller parameter K to enforce that settling time.
[Figure: Server power control loop. The error between the power cap and the measured power is multiplied by K to set the CPU frequency of the server; CB-Adaptive adapts the controller parameter K.]
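The three steps can be sketched as follows. This assumes, beyond what the slide states, that the controller applies d(k) = K·(P_cap − p(k)) to the model p(k+1) = p(k) + A·d(k), and that "settled" means the power error has shrunk to 2% of its initial value; under those assumptions the error shrinks by a factor (1 − KA) every control period. The function names are hypothetical.

```python
import math

def settling_periods(trip_s, control_period_s):
    """Steps 1-2: the desired settling time is the tripping time, expressed in
    whole control periods (rounded down so the loop settles before the trip)."""
    if not math.isfinite(trip_s):
        raise ValueError("no overload: the CB is not expected to trip")
    return max(1, math.floor(trip_s / control_period_s))

def adapted_gain(m, a, tolerance=0.02):
    """Step 3: pick K so that (1 - K*a) ** m == tolerance, i.e. the power error
    falls to `tolerance` of its initial value within m control periods."""
    return (1.0 - tolerance ** (1.0 / m)) / a

# Example: a 120 s tripping time, a 5 s control period, and a hypothetical A.
m = settling_periods(120.0, 5.0)
K = adapted_gain(m, a=0.5)
print(m, round(K, 4))
```

A shorter tripping time (a larger overload) yields a smaller m and therefore a larger gain, so the controller reacts more aggressively exactly when the CB is closer to tripping.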
CB-Adaptive Design Details
• System model: $p(k+1) = p(k) + A\,d(k)$
  • p(k) is the power of the server
  • d(k) is the change to the CPU frequency
  • A is a hardware-specific parameter (for the server running LINPACK)
• How to adapt the controller parameter?
  • Derive the relationship between the parameter and the settling time
  • The parameter is a function of the measured power, the rated current of the CB, and the control period
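The closed loop can be simulated directly from the system model. A minimal sketch, assuming the proportional law d(k) = K·(P_cap − p(k)) from the control-loop diagram (K × error) and treating d(k) as continuous, although a real server can only choose discrete DVFS levels; A, the starting power, and the cap are hypothetical values.

```python
def simulate(p0, p_cap, a, k_gain, periods):
    """Iterate p(k+1) = p(k) + A*d(k) with the controller d(k) = K*(p_cap - p(k))."""
    powers = [p0]
    for _ in range(periods):
        d = k_gain * (p_cap - powers[-1])  # frequency change chosen by the controller
        powers.append(powers[-1] + a * d)  # server power in the next period
    return powers

A = 0.5                        # hypothetical hardware parameter
m = 24                         # desired settling time in control periods
K = (1 - 0.02 ** (1 / m)) / A  # gain that shrinks the error to 2% in m periods
trace = simulate(p0=140.0, p_cap=100.0, a=A, k_gain=K, periods=m)
# Each period the error is multiplied by the closed-loop pole 1 - K*A.
print(f"pole = {1 - K * A:.3f}, power after {m} periods = {trace[-1]:.2f} W")
```

Because 0 < 1 − KA < 1 here, the power approaches the cap monotonically without overshoot.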
Two CB-Adaptive Improvements
• Temperature-aware CB-Adaptive
  • The CB trip curve is affected by the ambient temperature.
  • The rated current of the CB is a linear function of the temperature.
  • K is therefore also a function of the ambient temperature.
• CB-Proactive
  • Carefully increases the DVFS level in a proactive way
  • Further improves server performance
  • When, and by how much, is the DVFS level increased?
    • When the CB enters the long-delay region
    • The frequency is raised to the highest level
[Figure: Rated current of the CB vs. ambient temperature; y axis from 0.9 to 1.2, x axis from 10 to 50 °C.]
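Temperature awareness can be folded in by derating the rated current before computing the overload ratio. A minimal sketch, assuming the linear form I_rated(T) = I_ref·(1 + s·(T − T_ref)); the reference temperature, slope, and currents below are hypothetical and would in practice come from the CB manufacturer's derating data. The resulting ratio would then drive the tripping-time lookup and the gain K, which is how K becomes a function of the ambient temperature.

```python
def rated_current(temp_c, i_ref, t_ref_c, slope_per_c):
    """Linear derating: rated current at temp_c, given the rating i_ref at the
    calibration temperature t_ref_c. With a negative slope, a hotter CB has a
    lower rating and therefore trips sooner at the same current."""
    return i_ref * (1.0 + slope_per_c * (temp_c - t_ref_c))

def overload_ratio(current_a, temp_c, i_ref, t_ref_c, slope_per_c):
    """Overload ratio the CB actually sees at the ambient temperature."""
    return current_a / rated_current(temp_c, i_ref, t_ref_c, slope_per_c)

# Hypothetical 10 A breaker rated at 25 °C, losing 0.5% of its rating per °C.
params = dict(i_ref=10.0, t_ref_c=25.0, slope_per_c=-0.005)
for t in (25.0, 45.0):
    print(t, round(overload_ratio(11.7, t, **params), 3))
```

The same 11.7 A load that is a 1.17 overload at the reference temperature becomes a larger overload when the CB runs hot, which shortens the allowed interval and explains why temperature-blind controllers can trip the breaker.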
Discussion on Power Oversubscription
• Possible application of CB-Adaptive
  • Hosting additional servers
• Safety issues
  • A typical power delivery system
  • Every component can tolerate overloads, as CBs do
  • Overload capacity: the power beyond which permanent damage occurs to the component
More Discussion
• Components other than CBs do not experience overloads frequently.
  • It is unlikely that many servers reach their peak power simultaneously.
  • Evidenced by a real Google data center [Fan ISCA'07]
• When only one branch circuit is overloaded
  • CB-Adaptive can be applied directly.
• When multiple branch circuits are overloaded
  • CB-Adaptive needs to consider the tripping times of components other than CBs.
Hardware Testbed
• Dell OptiPlex 380
• Rockwell Allen-Bradley 1489-A industrial CB
• Workloads
  • SPEC CPU2006
  • SPEC JBB
  • LINPACK
Baselines
• NoControl
  • Estimates the peak power consumption of a server
  • No power caps
  • Unsafe and conservative
• P-Control
  • Measures the power in every control period
  • A non-adaptive proportional controller calculates frequency changes to enforce a power budget
• P-Control-CB
  • The power budget differs from that of P-Control:
  • it is the upper bound of the long-delay region of the CB
Power Control Comparison
• NoControl causes the CB to trip: unsafe
• P-Control & P-Control-CB: unsafe and conservative
• CB-Adaptive fully utilizes the overload intervals of CBs
  • Raises the CPU frequency for higher performance
[Figure: Power consumption (W) vs. control period (5 s each), four panels: NoControl, with the rated limit and the long-delay upper limit; P-Control and P-Control-CB, with the long-delay upper limit; CB-Adaptive, with the rated limit and the long-delay upper limit (annotation: converges to 80 W after hours); CB-Proactive, with the rated limit and the long-delay upper limit (annotation: converges to 80 W).]
Performance Comparison
• CB-Adaptive outperforms P-Control by
  • 66% for LINPACK
  • 29% to 49% for SPEC CPU2006
  • 74% for SPEC JBB
[Figure: Performance of P-Control, P-Control-CB, CB-Adaptive, and CB-Proactive on LINPACK (Gflops) and SPECJBB (bops); performance of P-Control, CB-Adaptive, and CB-Proactive on SPEC CPU2006 (base ratio) for sphinx3, wrf, lbm, tonto, GemsFDTD, calculix, povray, soplex, dealII, namd, leslie3d, cactusADM, gromacs, zeusmp, milc, gamess, bwaves, xalancbmk, astar, omnetpp, h264ref, libquantum, sjeng, hmmer, gobmk, mcf, gcc, bzip2, and perlbench.]
Impact of Temperature
• Temperature significantly affects the trip time.
• The temperature-blind solutions (P-Control-CB, CB-Adaptive, and CB-Proactive) are not safe.
[Figure: Trip time (s, 200 to 500) vs. temperature (°C) for NoControl at 21.7 °C, 26.4 °C, 31.6 °C, and 34.8 °C, and for P-Control-CB, CB-Adaptive, and CB-Proactive at 45 °C.]
Temperature-Aware CB-Adaptive
• As the temperature increases, server performance decreases.
  • The decrease is modest.
Power Provisioning Analysis
• NoControl
  • The estimation is too conservative
  • 7 servers hosted per branch
• P-Control
  • Enforces a power budget instead of relying on an estimate of server power
  • 13 servers hosted per branch
• CB-Adaptive
  • Enforces a higher power budget than P-Control
  • 20 servers hosted per branch
$\text{Number of servers} = \frac{\text{Rated power of the CB}}{\text{Estimated server power}}$
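The provisioning rule is a simple division, rounded down because a fraction of a server cannot be hosted. A small sketch (the helper name is hypothetical) that also checks the server counts on this slide against the 54% figure in the conclusions, which appears to compare CB-Adaptive with P-Control:

```python
import math

def servers_per_branch(cb_rated_power_w, server_power_w):
    """Rated power of the CB divided by the estimated (or budgeted) power per server."""
    return math.floor(cb_rated_power_w / server_power_w)

# Servers hosted per branch, as reported on this slide.
hosted = {"NoControl": 7, "P-Control": 13, "CB-Adaptive": 20}
extra = (hosted["CB-Adaptive"] - hosted["P-Control"]) / hosted["P-Control"]
print(f"CB-Adaptive hosts {extra:.0%} more servers than P-Control")
# prints "CB-Adaptive hosts 54% more servers than P-Control"
```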
Conclusions
• A common assumption of existing power capping:
  • Peak power should never exceed the rated CB capacity
• This paper
  • Systematically studies CB tripping characteristics
  • Identifies the ideal upper bound of safe power oversubscription
  • Proposes two adaptive power control strategies
• Evaluation of safe power oversubscription
  • A single server: 38% performance improvement
  • A branch circuit: hosts 54% more servers without upgrading the power infrastructure
Questions?
• Acknowledgements
  • NSF CAREER Award CNS-0845390
  • NSF CSR Grant CNS-0720663
  • NSF SHF Grant CCF-1017336
  • Prof. Leon Tolbert at the University of Tennessee
Thank you!
Control-Theoretic Analysis
• How to adapt the controller parameter?
• Details of the derivation
  • Z-transform of the system model
  • Z-domain controller
  • Calculate the closed-loop transfer function
  • Inverse Z-transform
$K = \frac{1 - 0.02^{1/m}}{A}$, where m is the desired settling time in control periods
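One way to arrive at the gain K, following the four steps listed, under two assumptions the slide does not state: the controller is the proportional law d(k) = K·(P_s − p(k)) with a constant power cap P_s (a symbol introduced here), and the 0.02 is read as a 2% settling band, i.e. the power error e(k) = P_s − p(k) must decay to 2% of its initial value within m control periods.

```latex
\begin{aligned}
z\,P(z) &= P(z) + A\,D(z)
  && \text{Z-transform of } p(k+1) = p(k) + A\,d(k)\\
D(z) &= K\bigl(P_s(z) - P(z)\bigr)
  && \text{proportional controller}\\
\frac{P(z)}{P_s(z)} &= \frac{KA}{z - (1 - KA)}
  && \text{closed loop, pole at } 1 - KA\\
e(k) &= (1 - KA)^k\, e(0)
  && \text{inverse Z-transform, constant cap}\\
(1 - KA)^m &= 0.02 \;\Longrightarrow\; K = \frac{1 - 0.02^{1/m}}{A}
  && \text{2\% settling within } m \text{ periods}
\end{aligned}
```

Since 0 < 0.02^{1/m} < 1 for any m ≥ 1, the resulting pole 1 − KA lies strictly between 0 and 1, so the loop is stable and converges without oscillation.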
Power Provisioning Analysis
• The UPS cannot tolerate overloads
  • Not a problem, because each UPS runs at 50% of its capacity
• Factors limiting overload capacities