Date post: 19-Dec-2015
Memory Power Management via Dynamic Voltage/Frequency Scaling

Howard David (Intel), Eugene Gorbatov (Intel), Ulf R. Hanebutte (Intel), Chris Fallin (CMU), Onur Mutlu (CMU)

Page 2: Memory Power is Significant

• Power consumption is a primary concern in modern servers
  • Many works: CPU, whole-system or cluster-level approach
  • But memory power is largely unaddressed
• Our server system*: memory is 19% of system power (avg)
  • Some work notes up to 40% of total system power
• Goal: Can we reduce this figure?

[Figure: System power and memory power (W, 0 to 400) for each SPEC CPU2006 benchmark.]

*Dual 4-core Intel Xeon®, 48GB DDR3 (12 DIMMs), SPEC CPU2006, all cores active. Measured AC power, analytically modeled memory power.

Page 3: Existing Solution: Memory Sleep States?

• Most memory energy-efficiency work uses sleep states
  • Shut down DRAM devices when no memory requests active
• But, even low-memory-bandwidth workloads keep memory awake
  • Idle periods between requests diminish in multicore workloads
  • CPU-bound workloads/phases rarely completely cache-resident

[Figure: Sleep State Residency. Time spent in sleep states (0% to 8%) for each SPEC CPU2006 benchmark.]

Page 4: Memory Bandwidth Varies Widely

• Workload memory bandwidth requirements vary widely
• Memory system is provisioned for peak capacity, so it is often underutilized

[Figure: Memory Bandwidth for SPEC CPU2006. Bandwidth per channel (GB/s, 0 to 6) for each benchmark.]

Page 5: Memory Power can be Scaled Down

• DDR can operate at multiple frequencies, which reduces power
  • Lower frequency directly reduces switching power
  • Lower frequency allows for lower voltage
  • Comparable to CPU DVFS
• Frequency scaling increases latency, which reduces performance
  • Memory storage array is asynchronous
  • But, bus transfer depends on frequency
  • When bus bandwidth is the bottleneck, performance suffers

Scaled quantity      Change    System Power
CPU Voltage/Freq.    ↓ 15%     ↓ 9.9%
Memory Freq.         ↓ 40%     ↓ 7.6%

Page 6: Observations So Far

• Memory power is a significant portion of total power
  • 19% (avg) in our system, up to 40% noted in other works
• Sleep state residency is low in many workloads
  • Multicore workloads reduce idle periods
  • CPU-bound applications send requests frequently enough to keep memory devices awake
• Memory bandwidth demand is very low in some workloads
• Memory power is reduced by frequency scaling
  • And voltage scaling can give further reductions

Page 7: DVFS for Memory

• Key Idea: observe memory bandwidth utilization, then adjust memory frequency/voltage, to reduce power with minimal performance loss
  • Dynamic Voltage/Frequency Scaling (DVFS) for memory
• Goal in this work: Implement DVFS in the memory system, by:
  • Developing a simple control algorithm to exploit opportunity for reduced memory frequency/voltage by observing behavior
  • Evaluating the proposed algorithm on a real system

Page 8: Outline

• Motivation
• Background and Characterization
  • DRAM Operation
  • DRAM Power
  • Frequency and Voltage Scaling
• Performance Effects of Frequency Scaling
• Frequency Control Algorithm
• Evaluation and Conclusions

Page 10: DRAM Operation

• Main memory consists of DIMMs of DRAM devices
• Each DIMM is attached to a memory bus (channel)
• Multiple DIMMs can connect to one channel

[Figure: A DIMM of eight DRAM devices, each with an 8-bit (/8) data connection, sharing a 64-bit memory bus to the memory controller.]
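
The 64-bit bus width fixes the peak bandwidth of a channel at each speed setting. The sketch below works that out for the three settings used later in the talk. It assumes the 800/1066/1333 labels denote DDR3 data rates in mega-transfers per second, which is the usual convention but is not stated on the slides; the function name is illustrative.

```python
# Peak bandwidth of one 64-bit channel at the three speeds used in the talk.
# Assumption: the slide labels (800/1066/1333) are data rates in MT/s.

BUS_WIDTH_BITS = 64  # from the slide: eight devices, 8 bits each


def peak_bandwidth_gb_s(data_rate_mt_s: float,
                        bus_width_bits: int = BUS_WIDTH_BITS) -> float:
    """Return peak channel bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mt_s * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    for rate in (800, 1066, 1333):
        print(f"{rate} MT/s: {peak_bandwidth_gb_s(rate):.2f} GB/s peak")
```

Under that assumption the slowest setting still offers 6.4 GB/s per channel, well above the 0.5 to 2 GB/s switching thresholds used in the evaluation.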

Page 11: Inside a DRAM Device

[Figure: Block diagram of a DRAM device: Bank 0 with row decoder, sense amps, and column decoder; I/O circuitry with registers, write FIFO, receivers, and drivers; on-die termination (ODT).]

Banks
• Independent arrays
• Asynchronous: independent of memory bus speed

I/O Circuitry
• Runs at bus speed
• Clock sync/distribution
• Bus drivers and receivers
• Buffering/queueing

On-Die Termination
• Required by bus electrical characteristics for reliable operation
• Resistive element that dissipates power when bus is active

Page 12: Effect of Frequency Scaling on Power

Reduced memory bus frequency:
• Does not affect bank power:
  • Constant energy per operation
  • Depends only on utilized memory bandwidth
• Decreases I/O power:
  • Dynamic power in bus interface and clock circuitry reduces due to less frequent switching
• Increases termination power:
  • Same data takes longer to transfer
  • Hence, bus utilization increases
• Tradeoff between I/O and termination results in a net power reduction at lower frequencies
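
The three effects can be put side by side in a toy model. This is only a sketch of the qualitative argument, not the analytical model from the paper: every coefficient below (`bank_nj_per_byte`, `io_w_per_gt_s`, `term_w_active`) is a hypothetical value picked so the terms are easy to compare, and the frequency labels are assumed to be data rates in MT/s on a 64-bit bus.

```python
# Toy model of the three power components on the slide.
# All coefficients are hypothetical; they are NOT taken from the paper.

def memory_power_w(rate_mt_s: float, bandwidth_gb_s: float, *,
                   bank_nj_per_byte: float = 0.5,
                   io_w_per_gt_s: float = 1.5,
                   term_w_active: float = 1.0) -> dict:
    """Return bank, I/O, termination, and total power in watts."""
    peak_gb_s = rate_mt_s * 8 / 1000          # 8 bytes per transfer
    utilization = min(bandwidth_gb_s / peak_gb_s, 1.0)

    # Bank power: constant energy per byte, so it tracks bandwidth only.
    bank = bank_nj_per_byte * bandwidth_gb_s  # nJ/B * GB/s = W
    # I/O power: switching power grows with bus frequency.
    io = io_w_per_gt_s * rate_mt_s / 1000
    # Termination power: dissipated while the bus is busy.
    term = term_w_active * utilization

    return {"bank": bank, "io": io, "termination": term,
            "total": bank + io + term}


if __name__ == "__main__":
    for rate in (1333, 800):
        print(rate, memory_power_w(rate, bandwidth_gb_s=1.0))
```

With these made-up coefficients, dropping from 1333 to 800 at 1 GB/s leaves bank power unchanged, lowers I/O power, raises termination power slightly, and lowers the total, which is the direction the slide describes.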

Page 13: Effects of Voltage Scaling on Power

• Voltage scaling further reduces power because all parts of memory devices will draw less current (at less voltage)
• Voltage reduction is possible because stable operation requires lower voltage at lower frequency:

[Figure: Minimum Stable Voltage for 8 DIMMs in a Real System. DIMM voltage (V, 1.0 to 1.6) at 1333MHz, 1066MHz, and 800MHz, with the Vdd used for the power model marked.]
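
To first order, power in a resistive or CMOS-switching load at a fixed frequency scales with the square of the supply voltage. The sketch below applies that textbook approximation; it is not the talk's analytical model. The 1.5 V default is the standard DDR3 nominal supply, while the 1.35 V operating point is a hypothetical value chosen for illustration, since the measured voltages in the figure are not recoverable from this transcript.

```python
# First-order estimate of how power scales with supply voltage (P ~ V^2).
# The 1.35 V example point is hypothetical, not a value from the talk.

DDR3_NOMINAL_V = 1.5


def relative_power(v_new: float, v_nominal: float = DDR3_NOMINAL_V) -> float:
    """Return power at v_new as a fraction of power at v_nominal."""
    return (v_new / v_nominal) ** 2


if __name__ == "__main__":
    ratio = relative_power(1.35)
    print(f"relative power at 1.35 V: {ratio:.2f}")
    print(f"estimated reduction: {1 - ratio:.0%}")
```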

Page 15: How Much Memory Bandwidth is Needed?

[Figure: Memory Bandwidth for SPEC CPU2006. Bandwidth per channel (GB/s, 0 to 7) for each benchmark.]

Page 16: Performance Impact of Static Frequency Scaling

• Performance impact is proportional to bandwidth demand
• Many workloads tolerate lower frequency with minimal performance drop

[Figure: Performance Loss, Static Frequency Scaling. Performance drop (%, 0 to 80) for each SPEC CPU2006 benchmark, for 1333->800 and 1333->1066.]

[Figure: Performance Loss, Static Frequency Scaling, with the y-axis limited to 0 to 8%. Same benchmarks and frequency settings.]

Page 18: Memory Latency Under Load

• At low load, most time is in array access and bus transfer
  • Small constant offset between bus-frequency latency curves
• As load increases, queueing delay begins to dominate
  • Bus frequency significantly affects latency

[Figure: Memory Latency as a Function of Bandwidth and Mem Frequency. Latency (ns, 60 to 180) versus utilized channel bandwidth (MB/s, 0 to 8000).]
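
The shape of these curves can be reproduced with a simple queueing sketch. The model below is a textbook M/D/1 approximation, used here purely for illustration; it is not the fit used in the talk. The 50 ns array latency and 64-byte transfer size are hypothetical parameters, and the frequency labels are assumed to be data rates in MT/s on a 64-bit bus.

```python
# Illustrative latency-under-load model (M/D/1 waiting time), not the
# talk's fitted curve. array_ns and line_bytes are hypothetical.

def latency_ns(bandwidth_mb_s: float, rate_mt_s: float,
               array_ns: float = 50.0, line_bytes: int = 64) -> float:
    """Estimate memory latency in ns at a given utilized bandwidth."""
    transfers = line_bytes / 8                  # 8 bytes per bus transfer
    transfer_ns = transfers / rate_mt_s * 1000  # bus time for one line
    peak_mb_s = rate_mt_s * 8
    rho = bandwidth_mb_s / peak_mb_s            # bus utilization
    if rho >= 1.0:
        raise ValueError("bandwidth exceeds channel peak")
    queue_ns = rho * transfer_ns / (2 * (1 - rho))  # M/D/1 mean wait
    return array_ns + transfer_ns + queue_ns


if __name__ == "__main__":
    for bw in (500, 3000, 6000):
        row = ", ".join(f"{r}: {latency_ns(bw, r):.1f} ns"
                        for r in (800, 1066, 1333))
        print(f"{bw} MB/s -> {row}")
```

At low bandwidth the curves differ only by the bus-transfer time, a few nanoseconds; near the 800 setting's peak the queueing term grows sharply and the gap widens, matching the two regimes described on the slide.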

Page 19: Control Algorithm: Demand-Based Switching

After each epoch of length Tepoch:
  Measure per-channel bandwidth BW
  if BW < T800: switch to 800MHz
  else if BW < T1066: switch to 1066MHz
  else: switch to 1333MHz

[Figure: Memory Latency as a Function of Bandwidth and Mem Frequency. Latency (ns, 60 to 180) versus utilized channel bandwidth (MB/s, 0 to 8000) for 800MHz, 1067MHz, and 1333MHz, with a fitted curve for 800MHz (800-fit) and the thresholds T800 and T1066 marked.]
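
The per-epoch decision can be written as a small function. This is a minimal sketch of the rule on the slide; the function name is illustrative, and the default thresholds are the BW(0.5, 1) values given in the evaluation parameters later in the talk.

```python
# Sketch of the demand-based switching rule. Defaults follow the
# BW(0.5, 1) variant from the evaluation; the name is illustrative.

def select_frequency_mhz(bw_gb_s: float,
                         t800_gb_s: float = 0.5,
                         t1066_gb_s: float = 1.0) -> int:
    """Pick the memory frequency for the next epoch from measured
    per-channel bandwidth (GB/s) in the epoch that just ended."""
    if bw_gb_s < t800_gb_s:
        return 800
    if bw_gb_s < t1066_gb_s:
        return 1066
    return 1333


if __name__ == "__main__":
    for bw in (0.2, 0.7, 3.0):
        print(f"{bw} GB/s -> {select_frequency_mhz(bw)} MHz")
```

The rule is stateless: each epoch's choice depends only on the bandwidth measured in the previous epoch.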

Page 20: Implementing V/F Switching

Halt Memory Operations
• Pause requests
• Put DRAM in Self-Refresh
• Stop the DIMM clock

Transition Voltage/Frequency
• Begin voltage ramp
• Relock memory controller PLL at new frequency
• Restart DIMM clock
• Wait for DIMM PLLs to relock

Begin Memory Operations
• Take DRAM out of Self-Refresh
• Resume requests

• Memory frequency already adjustable statically
• Voltage regulators for CPU DVFS can work for memory DVFS
• Full transition takes ~20µs
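
The ordering of the steps matters: requests stop and the DRAM enters self-refresh before any clock or voltage changes, and traffic resumes only after the PLLs have relocked. The sketch below encodes that ordering against a stand-in controller that merely records each step. The class and method names are hypothetical; a real implementation would live in hardware or firmware, and nothing here is an actual memory-controller interface.

```python
# Ordering of the V/F transition, recorded against a fake controller.
# FakeMemoryController and its methods are hypothetical stand-ins.

class FakeMemoryController:
    """Records steps instead of touching hardware."""

    def __init__(self, freq_mhz: int = 1333) -> None:
        self.freq_mhz = freq_mhz
        self.log: list[str] = []

    def step(self, name: str) -> None:
        self.log.append(name)


def switch_frequency(mc: FakeMemoryController, new_freq_mhz: int) -> None:
    """Run the halt / transition / resume sequence from the slide."""
    if new_freq_mhz == mc.freq_mhz:
        return  # nothing to do, avoid a needless transition

    # Halt memory operations
    mc.step("pause requests")
    mc.step("enter self-refresh")
    mc.step("stop DIMM clock")

    # Transition voltage/frequency
    mc.step("begin voltage ramp")
    mc.step(f"relock controller PLL at {new_freq_mhz} MHz")
    mc.freq_mhz = new_freq_mhz
    mc.step("restart DIMM clock")
    mc.step("wait for DIMM PLLs to relock")

    # Begin memory operations
    mc.step("exit self-refresh")
    mc.step("resume requests")


if __name__ == "__main__":
    mc = FakeMemoryController()
    switch_frequency(mc, 800)
    print("\n".join(mc.log))
```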

Page 22: Evaluation Methodology

• Real-system evaluation
  • Dual 4-core Intel Xeon®, 3 memory channels/socket
  • 48 GB of DDR3 (12 DIMMs, 4GB dual-rank, 1333MHz)
• Emulating memory frequency for performance
  • Altered memory controller timing registers (tRC, tB2BCAS)
  • Gives performance equivalent to slower memory frequencies
• Modeling power reduction
  • Measure baseline system (AC power meter, 1s samples)
  • Compute reductions with an analytical model (see paper)

Page 23: Evaluation Methodology

• Workloads
  • SPEC CPU2006: CPU-intensive workloads
  • All cores run a copy of the benchmark
• Parameters
  • Tepoch = 10ms
  • Two variants of algorithm with different switching thresholds:
    • BW(0.5, 1): T800 = 0.5GB/s, T1066 = 1GB/s
    • BW(0.5, 2): T800 = 0.5GB/s, T1066 = 2GB/s (more aggressive frequency/voltage scaling)
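
The two variants differ only in the upper threshold, so they diverge only for epochs whose bandwidth falls between 1 and 2 GB/s. The sketch below applies both to a hypothetical bandwidth trace (the trace values are invented for illustration) and also computes an upper bound on transition overhead, assuming one ~20µs switch in every 10ms epoch with memory unavailable for the whole transition.

```python
# Compare the two threshold variants on a made-up bandwidth trace and
# bound the switching overhead. The trace is hypothetical.

T_EPOCH_S = 10e-3       # Tepoch from the slide
T_TRANSITION_S = 20e-6  # approximate full V/F transition time from the talk

VARIANTS = {
    "BW(0.5, 1)": (0.5, 1.0),  # (T800, T1066) in GB/s
    "BW(0.5, 2)": (0.5, 2.0),
}


def choose_mhz(bw_gb_s: float, thresholds: tuple) -> int:
    """Apply the demand-based switching rule for one epoch."""
    t800, t1066 = thresholds
    if bw_gb_s < t800:
        return 800
    if bw_gb_s < t1066:
        return 1066
    return 1333


def schedule(trace_gb_s: list, thresholds: tuple) -> list:
    """Frequency chosen after each epoch of the trace."""
    return [choose_mhz(bw, thresholds) for bw in trace_gb_s]


def worst_case_overhead() -> float:
    """Fraction of time lost if every epoch ends in a transition."""
    return T_TRANSITION_S / T_EPOCH_S


if __name__ == "__main__":
    trace = [0.3, 0.8, 1.5, 2.5]  # hypothetical per-epoch GB/s
    for name, thresholds in VARIANTS.items():
        print(name, schedule(trace, thresholds))
    print(f"worst-case transition overhead: {worst_case_overhead():.2%}")
```

Even under that pessimistic assumption the bound is 0.2% of run time, so the epoch length leaves the transition cost small relative to the measurement interval.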

Page 24: Performance Impact of Memory DVFS

• Minimal performance degradation: 0.2% (avg), 1.7% (max)
• Experimental error ~1%

[Figure: Performance degradation (%, -1 to 4) for each SPEC CPU2006 benchmark and the average (AVG), for BW(0.5,1) and BW(0.5,2).]

Page 25: Memory Frequency Distribution

• Frequency distribution shifts toward higher memory frequencies with more memory-intensive benchmarks

[Figure: Distribution (0% to 100%) across 1333, 1066, and 800 for each benchmark.]

Page 26: Memory Power Reduction

• Memory power reduces by 10.4% (avg), 20.5% (max)

[Figure: Memory power reduction (%, 0 to 25) for each SPEC CPU2006 benchmark and the average (AVG), for BW(0.5,1) and BW(0.5,2).]

Page 27: System Power Reduction

[Figure: System power reduction (%, 0 to 4) for each SPEC CPU2006 benchmark and the average (AVG), for BW(0.5,1) and BW(0.5,2).]

• As a result, system power reduces by 1.9% (avg), 3.5% (max)
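
The system-level figure can be sanity-checked against two averages quoted earlier in the talk: memory is 19% of system power, and memory power falls by 10.4%. Multiplying them is only a rough check, since a product of averages is not the average of per-benchmark products, but it lands close to the reported 1.9%.

```python
# Rough consistency check using averages quoted in the talk.

MEMORY_SHARE_OF_SYSTEM = 0.19   # memory as a fraction of system power (avg)
MEMORY_POWER_REDUCTION = 0.104  # average memory power reduction
REPORTED_SYSTEM_REDUCTION = 0.019


def estimated_system_reduction(share: float = MEMORY_SHARE_OF_SYSTEM,
                               reduction: float = MEMORY_POWER_REDUCTION) -> float:
    """System power reduction if only the memory share shrinks."""
    return share * reduction


if __name__ == "__main__":
    est = estimated_system_reduction()
    print(f"estimated: {est:.2%}, reported: {REPORTED_SYSTEM_REDUCTION:.1%}")
```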

Page 28: System Energy Reduction

• System energy reduces by 2.4% (avg), 5.1% (max)

[Figure: System energy reduction (%, -1 to 6) for each SPEC CPU2006 benchmark and the average (AVG), for BW(0.5,1) and BW(0.5,2).]

Page 29: Related Work

• MemScale [Deng11], concurrent work (ASPLOS 2011)
  • Also proposes Memory DVFS
  • Application performance impact model to decide voltage and frequency: requires specific modeling for a given system; our bandwidth-based approach avoids this complexity
  • Simulation-based evaluation; our work is a real-system proof of concept
• Memory Sleep States (Creating opportunity with data placement [Lebeck00,Pandey06], OS scheduling [Delaluz02], VM subsystem [Huang05]; Making better decisions with better models [Hur08,Fan01])
• Power Limiting/Shifting (RAPL [David10] uses memory throttling for thermal limits; CPU throttling for memory traffic [Lin07,08]; Power shifting across system [Felter05])

Page 30: Conclusions

• Memory power is a significant component of system power
  • 19% average in our evaluation system, 40% in other work
• Workloads often keep memory active but underutilized
  • Channel bandwidth demands are highly variable
  • Use of memory sleep states is often limited
• Scaling memory frequency/voltage can reduce memory power with minimal system performance impact
  • 10.4% average memory power reduction
  • Yields 2.4% average system energy reduction
• Greater reductions are possible with wider frequency/voltage range and better control algorithms

Page 32: Why Real-System Evaluation?

• Advantages:
  • Capture all effects of altered memory performance
    • System/kernel code, interactions with IO and peripherals, etc.
  • Able to run full-length benchmarks (SPEC CPU2006) rather than short instruction traces
  • No concerns about architectural simulation fidelity
• Disadvantages:
  • More limited room for novel algorithms and detailed measurements
  • Inherent experimental error due to background-task noise, real power measurements, nondeterministic timing effects
• For a proof-of-concept, we chose to run on a real system in order to have results that capture all potential side-effects of altering memory frequency

Page 33: CPU-Bound Applications in a DRAM-rich system

• We evaluate CPU-bound workloads with 12 DIMMs: what about smaller memory, or IO-bound workloads?
• 12 DIMMs (48GB): are we magnifying the problem?
  • Large servers can have this much memory, especially for database or enterprise applications
  • Memory can be up to 40% of system power [1,2], and reducing its power in general is an academically interesting problem
• CPU-bound workloads: will it matter in real life?
  • Many workloads have CPU-bound phases (e.g., database scan or business logic in server workloads)
  • Focusing on CPU-bound workloads isolates the problem of varying memory bandwidth demand while memory cannot enter sleep states, and our solution applies for any compute phase of a workload

[1] L. A. Barroso and U. Holzle. "The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines." Synthesis Lectures on Computer Architecture. Morgan & Claypool, 2009.
[2] C. Lefurgy et al. "Energy Management for Commercial Servers." IEEE Computer, pp. 39-48, December 2003.

Page 34: Combining Memory & CPU DVFS?

• Our evaluation did not incorporate CPU DVFS:
  • Need to understand effect of single knob (memory DVFS) first
  • Combining with CPU DVFS might produce second-order effects that would need to be accounted for
• Nevertheless, memory DVFS is effective by itself, and mostly orthogonal to CPU DVFS:
  • Each knob reduces power in a different component
  • Our memory DVFS algorithm has negligible performance impact, hence negligible impact on CPU DVFS
  • CPU DVFS will only further reduce bandwidth demands relative to our evaluations, hence no negative impact on memory DVFS

Page 35: Why is this Autonomic Computing?

• Power management in general is autonomic: a system observes its own needs and adjusts its behavior accordingly
  • Lots of previous work comes from architecture community, but crossover in ideas and approaches could be beneficial
• This work exposes a new knob for control algorithms to turn, has a simple model for the power/energy effects of that knob, and observes opportunity to apply it in a simple way
• Exposes future work for:
  • More advanced control algorithms
  • Coordinated energy efficiency across rest of system
  • Coordinated energy efficiency across a cluster/datacenter, integrated with memory DVFS, CPU DVFS, etc.

