How Much Power Oversubscription is Safe and Allowed in Data Centers?

Xing Fu (1,2), Xiaorui Wang (1,2), Charles Lefurgy (3)

(1) EECS @ University of Tennessee, Knoxville
(2) ECE @ The Ohio State University
(3) IBM Research, Austin

Introduction

• Power: a first-class constraint in data center design
• Power oversubscription by power capping
  • Improves power facility utilization
  • Improves server performance
• Power capping at different levels
  • Servers, racks, and data centers
• However, they all share a common assumption
  • Power should never exceed the rated power capacity?
  • Otherwise the circuit breaker (CB) would trip?
• Not really! Circuit breakers can sustain short overloads.

How Much Power Oversubscription is Safe?

• Whether a CB trips depends on
  • Magnitude of the overload
  • Duration of the overload
• Ideal upper bound?
  • Lower bound of the tolerance band
• This paper
  • Investigates CB trip features
  • Proposes adaptive power control to
    1. Fully utilize the allowed overload interval for maximized server performance
    2. Safely host more servers without upgrading power facilities

[Figure: Trip curve of a typical circuit breaker. Annotations: rated capacity; 1.17 × rated capacity; two minutes.]
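As a rough illustration of how overload magnitude and duration together decide whether a CB trips, the sketch below interpolates a minimum trip time from a few points on the lower bound of a tolerance band. The curve points are hypothetical placeholders. Only the pair (1.17 × rated capacity, two minutes) echoes the figure annotations, on the assumption that they describe the same operating point. A real curve would come from the breaker's datasheet, and treating every load below the first point as sustainable is a simplification.

```python
import math

# (overload ratio = load / rated capacity, minimum trip time in seconds).
# HYPOTHETICAL points on the lower bound of the tolerance band; only the
# (1.17, 120 s) pair echoes the slide's figure annotations.
LOWER_BOUND_CURVE = [
    (1.17, 120.0),
    (1.50, 40.0),
    (2.00, 15.0),
    (3.00, 5.0),
]


def min_trip_time_s(overload_ratio: float) -> float:
    """Return the shortest time (s) the CB may take to trip at this load.

    Simplifications: loads below the first curve point are treated as
    sustainable indefinitely, and loads beyond the last point as tripping
    at once. Interpolation is linear in log-log space, the conventional
    axes of a time-current trip curve.
    """
    first_ratio, _ = LOWER_BOUND_CURVE[0]
    last_ratio, _ = LOWER_BOUND_CURVE[-1]
    if overload_ratio < first_ratio:
        return math.inf
    if overload_ratio > last_ratio:
        return 0.0
    for (r0, t0), (r1, t1) in zip(LOWER_BOUND_CURVE, LOWER_BOUND_CURVE[1:]):
        if r0 <= overload_ratio <= r1:
            frac = (math.log(overload_ratio) - math.log(r0)) / (math.log(r1) - math.log(r0))
            return math.exp(math.log(t0) + frac * (math.log(t1) - math.log(t0)))
    raise AssertionError("unreachable: ratio lies within the curve range")
```

Using the lower bound of the band, rather than its middle, keeps the estimate on the safe side: the breaker is not expected to trip sooner than the returned time.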

Proposed Solution: CB-Adaptive

• More than just a standalone controller
  • A methodology that adapts the parameters of existing power controllers to engineer their settling times
• Example: adapts a server power controller [Lefurgy ICAC’07]
  1. Obtain the tripping time from the CB tripping curve
  2. Set the desired settling time to the tripping time
  3. Adapt controller parameter K to enforce the settling time

[Figure: control loop. Labels: Power Cap, Power measured, Error, K × Error, CPU frequency, Server; annotation: adapt controller parameter K.]
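The three steps above can be sketched as follows, assuming a proportional controller d(k) = K × error acting on the linear server power model p(k+1) = p(k) + A·d(k), and taking "settled" to mean that the error has shrunk to 2% of its initial value. The function names, the 2% criterion, the value of A, and the starting power are illustrative assumptions, not the authors' implementation. The 5 s control period, the two-minute tripping time, and the 80 W target are borrowed from elsewhere in the slides.

```python
def gain_for_settling_time(trip_time_s: float, control_period_s: float, a: float) -> float:
    """Pick K so the power error decays to 2% within the tripping time.

    With d(k) = K * error(k) and p(k+1) = p(k) + a * d(k), the error shrinks
    by a factor (1 - K*a) each period, so (1 - K*a)**m = 0.02 after m periods.
    """
    m = max(1, int(trip_time_s // control_period_s))  # whole periods available
    return (1.0 - 0.02 ** (1.0 / m)) / a


def simulate(p0: float, cap: float, k_gain: float, a: float, periods: int) -> list[float]:
    """Run the proportional loop and return the power trace p(0)..p(periods)."""
    trace = [p0]
    for _ in range(periods):
        error = cap - trace[-1]          # negative while power is above the cap
        d = k_gain * error               # frequency change requested by the controller
        trace.append(trace[-1] + a * d)  # server power model
    return trace


# Hypothetical server: a = 0.5 W per frequency step, starting at 110 W.
K = gain_for_settling_time(trip_time_s=120.0, control_period_s=5.0, a=0.5)
trace = simulate(p0=110.0, cap=80.0, k_gain=K, a=0.5, periods=24)
print(f"K = {K:.4f}, power after 24 periods = {trace[-1]:.2f} W")
```

A smaller K stretches the settling time, so the server spends the whole tolerated overload interval above the cap instead of being throttled immediately.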

CB-Adaptive Design Details

• System model: $p(k+1) = p(k) + A\,d(k)$
  • p(k) is the power of the server
  • d(k) is the change to the CPU frequency
  • A is a hardware-specific parameter when the server runs LINPACK
• How to adapt the controller parameter?
  • The relationship between the parameter and the settling time
  • The parameter is a function of the measured power, the rated current of the CB, and the control period.

Two CB-Adaptive Improvements

• Temperature-aware CB-Adaptive
  • The CB trip curve is affected by the ambient temperature.
  • The rated current of the CB is a linear function of the temperature.
  • K is therefore also a function of the ambient temperature.
• CB-Proactive
  • Carefully raises the DVFS level in a proactive way
  • Further improves server performance
  • When, and to what extent, is the DVFS level increased?
    • When the CB enters the long-delay region
    • Increase the frequency to the highest level

[Figure: the rated current (axis 0.9 to 1.2) versus temperature (axis 10 to 50).]
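Since the rated current is described as a linear function of the ambient temperature, a temperature-aware controller could rescale the rated limit before computing the overload. The sketch below assumes a hypothetical linear derating line. Its reference temperature and slope are placeholders, not values read from the figure or from a datasheet.

```python
def derated_current_factor(temp_c: float,
                           ref_temp_c: float = 30.0,
                           slope_per_c: float = -0.005) -> float:
    """Multiplier on the nameplate rated current at a given ambient temperature.

    HYPOTHETICAL linear model: factor 1.0 at ref_temp_c, changing by
    slope_per_c for each degree Celsius away from it.
    """
    return 1.0 + slope_per_c * (temp_c - ref_temp_c)


def overload_ratio(measured_current_a: float, nameplate_current_a: float, temp_c: float) -> float:
    """Load relative to the temperature-adjusted rated current."""
    effective_rating = nameplate_current_a * derated_current_factor(temp_c)
    return measured_current_a / effective_rating
```

With a negative slope, the same measured current counts as a larger overload in a hot room, which shortens the tripping time and therefore changes the K the controller should use.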

Discussion on Power Oversubscription

• Possible applications of CB-Adaptive
  • Hosting additional servers
• Safety issues
  • A typical power delivery system
  • Like CBs, every component can tolerate overloads
  • Overload capacity: the power beyond which permanent damage occurs to the component

More Discussion

• Components other than CBs do not experience overloads frequently.
  • It is less likely that many servers reach their peak power simultaneously.
  • Evidenced by a real Google data center [Fan ISCA’07]
• When only a single branch circuit is overloaded
  • CB-Adaptive can be applied directly
• When multiple branch circuits are overloaded
  • CB-Adaptive needs to consider the tripping time of components other than CBs.

Hardware Testbed

• Dell OptiPlex 380
• Rockwell Allen-Bradley 1489-A industrial CB
• Workloads
  • SPEC CPU2006
  • SPEC JBB
  • LINPACK

Baselines

• NoControl
  • Estimates the peak power consumption of a server
  • No power caps
  • Unsafe and conservative
• P-Control
  • Measures the power in every control period
  • A non-adaptive proportional controller calculates frequency changes to enforce a power budget.
• P-Control-CB
  • The power budget is different from that of P-Control
  • Upper bound of the long-delay region of the CB

Power Control Comparison

• NoControl causes the CB to trip: unsafe
• P-Control and P-Control-CB: unsafe and conservative
• CB-Adaptive fully utilizes the overload intervals of CBs.
  • Raises the CPU frequency for higher performance

[Figure: four plots of power consumption (Watt, axis 50 to 150) versus control period (5 sec each, 0 to 100). Panels: (1) NoControl, with the rated limit and the long-delay upper limit; (2) P-Control-CB and P-Control, with the long-delay upper limit; (3) CB-Adaptive, with the rated limit and the long-delay upper limit, annotated "Converge to 80W after hours"; (4) CB-Proactive, with the rated limit and the long-delay upper limit, annotated "Converge to 80W".]

Performance Comparison

• CB-Adaptive outperforms P-Control by
  • 66% for LINPACK
  • 29% to 49% for SPEC CPU2006
  • 74% for SPEC JBB

[Figure: (1) LINPACK performance (Gflops, axis 0.0 to 2.0) and SPEC JBB performance (bops, axis 0 to 35000) under P-Control, P-Control-CB, CB-Adaptive, and CB-Proactive; (2) performance (base ratio, axis 0 to 18) of each SPEC2006 benchmark under P-Control, CB-Adaptive, and CB-Proactive.]

Impact of Temperature

• Temperature impacts the trip time significantly.
• The temperature-blind solutions (P-Control-CB, CB-Adaptive, and CB-Proactive) are not safe.

[Figure: trip time (sec, axis 200 to 500) by configuration and temperature (degrees Celsius): NoControl at 21.7 ℃, 26.4 ℃, 31.6 ℃, and 34.8 ℃; P-Control-CB, CB-Adaptive, and CB-Proactive at 45 ℃.]

Temperature-Aware CB-Adaptive

• As the temperature increases, the performance of servers decreases.
• The performance decrease is modest.

Power Provisioning Analysis

• NoControl
  • The estimation is too conservative
  • 7 servers hosted per branch
• P-Control
  • Enforces a power budget instead of an estimation of power
  • 13 servers hosted per branch
• CB-Adaptive
  • Enforces a higher power budget than P-Control
  • 20 servers hosted per branch

$$\text{Number of servers} = \frac{\text{Rated power of the CB}}{\text{Estimated server power}}$$
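A worked instance of the server-count formula, using the per-branch counts reported on this slide. The breaker rating and per-server wattage in the last call are hypothetical and chosen only to show the floor division.

```python
import math


def servers_per_branch(cb_rated_power_w: float, server_power_w: float) -> int:
    """Number of whole servers a branch can host at a given per-server power."""
    return math.floor(cb_rated_power_w / server_power_w)


def percent_more(new_count: int, old_count: int) -> float:
    """Relative increase in hosted servers, in percent."""
    return 100.0 * (new_count - old_count) / old_count


# Counts from the slide: P-Control hosts 13 servers per branch, CB-Adaptive 20.
gain = percent_more(20, 13)
print(f"CB-Adaptive hosts {gain:.0f}% more servers than P-Control")

# HYPOTHETICAL ratings, only to exercise the formula.
print(servers_per_branch(cb_rated_power_w=1600.0, server_power_w=150.0))
```

The relative gain from 13 to 20 servers rounds to 54%, which is the figure quoted for a branch circuit in the conclusions.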

Conclusions

• A common assumption of existing power capping
  • Peak power should never exceed the rated CB capacity
• This paper
  • Systematically studies the CB tripping characteristics
  • Identifies the ideal upper bound of safe power oversubscription
  • Proposes two adaptive power control strategies
• Evaluation on safe power oversubscription
  • A single server: 38% performance improvement
  • Branch circuit: hosts 54% more servers without upgrading the power infrastructure

Questions?

• Acknowledgements
  • NSF CAREER Award CNS-0845390
  • NSF CSR Grant CNS-0720663
  • NSF SHF Grant CCF-1017336
  • Prof. Leon Tolbert at the University of Tennessee

Thank you!

Backup

Control Theoretic Analysis

• How to adapt the controller parameter?
• Details of the derivation
  • Z-transform of the system model
  • Z-domain controller
  • Calculate the closed-loop transfer function
  • Inverse Z-transform

$$K = \frac{1 - \sqrt[m]{0.02}}{A}$$
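A sketch of how these derivation steps could fit together, assuming a proportional controller $d(k) = K\,(P_s - p(k))$ with a constant power set point $P_s$, and assuming that $m$ denotes the desired settling time in control periods under a 2% settling criterion. Both assumptions are inferred from the constant 0.02 and are not stated on the slide.

```latex
\begin{align*}
\text{System model:}\quad & p(k+1) = p(k) + A\,d(k)
  \;\Rightarrow\; G(z) = \frac{P(z)}{D(z)} = \frac{A}{z-1} \\
\text{Controller:}\quad & d(k) = K\,\bigl(P_s - p(k)\bigr)
  \;\Rightarrow\; C(z) = K \\
\text{Closed loop:}\quad & \frac{P(z)}{P_s(z)} = \frac{C(z)\,G(z)}{1 + C(z)\,G(z)}
  = \frac{KA}{z - (1 - KA)} \\
\text{Time domain:}\quad & P_s - p(k) = (1 - KA)^{k}\,\bigl(P_s - p(0)\bigr) \\
\text{Settling in } m \text{ periods:}\quad & (1 - KA)^{m} = 0.02
  \;\Rightarrow\; K = \frac{1 - \sqrt[m]{0.02}}{A}
\end{align*}
```

The single closed-loop pole at $1 - KA$ sets how fast the error decays, so choosing $K$ is equivalent to choosing the settling time.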

Power Provisioning Analysis

• UPS cannot tolerate overloads
  • Not a problem because each UPS runs at 50% of its capacity
• Factors limiting overload capacities
