
Posted on 01-Apr-2015


Predicting Performance Impact of DVFS for Realistic Memory Systems

Rustam Miftakhutdinov, Eiman Ebrahimi, Yale N. Patt

Dynamic Voltage/Frequency Scaling

[Figure: illustration labeled V (voltage) and f (frequency). Image source: intel.com]

Impact of Frequency Scaling

[Figure: time, power, and energy plotted against frequency, with the optimal frequency f_opt marked.]

Impact of Frequency Scaling

[Figure: power and execution time plotted against frequency, with the reference frequency f_o marked.]

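Prediction Overview

[Figure: frequency and energy per instruction plotted over the instruction count (0 to 300K, in 100K-instruction steps). Predicted time against frequency (our work) is multiplied (×) by power against frequency, both around f_o, giving energy against frequency with its minimum at f_opt.]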


Outline

Intro to performance prediction

Why realistic memory systems?

Variable memory latency

Prefetching

Why Realistic Memory Systems?

Prior Work

• Stall time

• Leading loads (2010): S. Eyerman et al.; G. Keramidas et al.; B. Rountree

Prior work was evaluated with a constant access latency memory system.

Energy Savings

[Figure: normalized energy savings (%) of Stall time, Leading loads, Our predictor, and Oracle under three memory systems: Constant Access Latency, Realistic DRAM, and Realistic DRAM + Streaming Prefetcher. One bar is annotated "< 0.1".]

Gmean of relative savings for 13 memory-intensive SPEC 2006 benchmarks. Baseline: most energy-efficient static frequency for SPEC 2006.



Execution Example

[Figure: timeline of chip activity and memory requests A to E, with time segments labeled 1 to 4.]

$$T = T_{\text{memory}} + T_{\text{compute}}$$

where T_memory is independent of frequency and T_compute is proportional to cycle time.

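Linear Model

[Figure: execution time T plotted against cycle time t as a straight line; its intercept at t = 0 is T_memory, and T_compute is the remaining height at the measured cycle time t_o.]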
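The linear model can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: it assumes the total execution time and Tmemory of an interval were both measured at cycle time t_o, and the function name `predict_time` and the sample numbers are hypothetical.

```python
def predict_time(t_total_o: float, t_memory: float, t_o: float, t: float) -> float:
    """Predict execution time at cycle time t from one measurement at cycle time t_o.

    Linear model: T(t) = T_memory + T_compute(t_o) * (t / t_o), where
    T_compute(t_o) = T_total(t_o) - T_memory.
    """
    t_compute_o = t_total_o - t_memory  # compute part of the measured time
    return t_memory + t_compute_o * (t / t_o)  # only the compute part scales


# Hypothetical measurement: 10 ms total, 4 ms of it memory time, at 0.25 ns cycle time (4 GHz).
print(predict_time(10.0, 4.0, 0.25, 0.5))  # predicted time at 0.5 ns (2 GHz); prints 16.0
```

Because only the compute part scales, a fully memory-bound interval (Tmemory equal to the total time) is predicted to take the same time at every frequency.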

Measuring Tmemory

[Figure: timeline of chip activity and memory requests.]

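Causes of Request Dependences

• Pointer Chasing: each request's address comes from the data returned by the previous request (a chain of next pointers).

• Finite Chip Resources: a miss can only issue once its instruction is in the instruction window, so a later miss may have to wait until an earlier miss completes.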

Measuring Tmemory

[Figure: timeline of chip activity and memory requests; a request is issued at Tstart and returns at Tend.]

Critical Path Algorithm

For each memory request:

1. At Tstart, record Tstart and the current value of Tmemory.
2. At Tend, compute path = Tmemory(Tstart) + (Tend - Tstart), that is, the old critical path plus the request latency.
3. Set Tmemory = max(Tmemory, path). The new Tmemory is the length of the critical path.
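The three steps of the Critical Path Algorithm can be sketched as an event-driven pass over a trace of request start and end times. This is a hypothetical offline Python rendering for illustration (the `Request` type and `critical_path_tmemory` name are made up); it assumes every request has a positive latency and treats a request issued exactly when another completes as dependent on it.

```python
from dataclasses import dataclass


@dataclass
class Request:
    start: float  # Tstart: time the memory request is issued
    end: float    # Tend: time its data returns (assumed > start)


def critical_path_tmemory(requests: list[Request]) -> float:
    """Estimate Tmemory as the length of the critical path through memory requests."""
    # Process starts and ends in time order. At equal times, ends (0) sort before
    # starts (1), so a request issued the moment another completes (as in pointer
    # chasing) records the already-updated critical path.
    events = []
    for i, r in enumerate(requests):
        events.append((r.start, 1, i))
        events.append((r.end, 0, i))
    events.sort()

    t_memory = 0.0
    recorded: dict[int, float] = {}
    for _, kind, i in events:
        r = requests[i]
        if kind == 1:
            recorded[i] = t_memory  # step 1: record Tmemory at Tstart
        else:
            path = recorded[i] + (r.end - r.start)  # step 2: old critical path + latency
            t_memory = max(t_memory, path)          # step 3: keep the longer path
    return t_memory


dependent = [Request(0, 100), Request(100, 200)]   # pointer chasing: serialized
overlapping = [Request(0, 100), Request(10, 110)]  # independent misses in parallel
print(critical_path_tmemory(dependent))    # 200.0
print(critical_path_tmemory(overlapping))  # 100.0
```

Overlapping requests contribute their latency only once, while a dependent chain accumulates, and nothing in the procedure assumes that all requests take equally long.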

Linear Model

[Figure: the linear model (execution time T against cycle time t, intercept T_memory at t = 0, measured point at t_o) is redrawn as predicted time against frequency around f_o; multiplying (×) it by power against frequency gives energy against frequency, with its minimum at f_opt.]
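Combining the linear model with a power estimate yields the energy-optimal frequency shown in the figure. The sketch below assumes a simple hypothetical power model (static power plus a dynamic term proportional to f^3); the talk does not specify its power predictor here, so `hypothetical_power`, its constants, and the candidate frequencies are placeholders.

```python
from typing import Callable, Iterable


def predict_time_at(t_total_o: float, t_memory: float, f_o: float, f: float) -> float:
    """Linear model expressed in frequency: cycle time is 1/f, so t / t_o = f_o / f."""
    return t_memory + (t_total_o - t_memory) * (f_o / f)


def hypothetical_power(f: float, p_static: float = 10.0, k: float = 2.0) -> float:
    """HYPOTHETICAL power model: static power plus dynamic power proportional to f**3.

    The f**3 term assumes voltage scales linearly with frequency (P_dyn ~ V^2 * f).
    It is a placeholder for a real power predictor.
    """
    return p_static + k * f**3


def pick_f_opt(
    t_total_o: float,
    t_memory: float,
    f_o: float,
    candidates: Iterable[float],
    power: Callable[[float], float] = hypothetical_power,
) -> float:
    """Return the candidate frequency that minimizes predicted energy = time * power."""
    return min(
        candidates,
        key=lambda f: predict_time_at(t_total_o, t_memory, f_o, f) * power(f),
    )


freqs = [1.0, 1.5, 2.0, 3.0, 4.0]  # hypothetical settings (GHz); measured at f_o = 4.0
print(pick_f_opt(10.0, 10.0, 4.0, freqs))  # memory-bound interval: 1.0
print(pick_f_opt(10.0, 0.0, 4.0, freqs))   # compute-bound interval: 1.5
```

With these made-up constants, the fully memory-bound interval picks the lowest frequency because slowing down costs no time, while the compute-bound interval settles at 1.5, where lower static energy and higher dynamic power balance.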

Critical Path: Variable Access Latency

[Figure: timeline of chip activity and memory requests whose latencies vary.]

Leading Loads: Constant Access Latency

[Figure: timeline of chip activity and memory requests with equal latencies.]
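For contrast, a leading-loads estimate can be sketched as follows. It assumes the common formulation in which only the first request of each cluster of overlapping requests contributes its latency, and a new cluster begins with the first request issued after that leading request returns. The function name and traces are hypothetical.

```python
def leading_loads_tmemory(requests: list[tuple[float, float]]) -> float:
    """Leading-loads estimate of Tmemory (assumed formulation).

    Each request is a (start, end) pair. Only the first request of each cluster of
    overlapping requests contributes its latency; requests issued while that leading
    request is outstanding are ignored.
    """
    total = 0.0
    leader_end = float("-inf")  # completion time of the current leading request
    for start, end in sorted(requests):
        if start >= leader_end:  # previous leader has returned: new cluster
            total += end - start
            leader_end = end
    return total


constant = [(0, 100), (10, 110)]  # overlapping requests, equal latency
variable = [(0, 50), (10, 200)]   # short leading request, long overlapping one
print(leading_loads_tmemory(constant))  # 100.0
print(leading_loads_tmemory(variable))  # 50.0
```

With constant latency, both estimates agree. When the leading request is short and an overlapping one is long, however, leading loads reports 50, while applying the Critical Path Algorithm steps to the same trace gives 190.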

Leading Loads

[Figure: the linear model of execution time T against cycle time t, with T_memory estimated from leading loads; the resulting time against frequency around f_o is multiplied (×) by power against frequency to give energy against frequency and its minimum f_opt.]

Energy Savings

[Figure: normalized energy savings (%) of Stall time, Leading loads, Our predictor, and Oracle under Constant Access Latency and Realistic DRAM.]

Gmean of relative savings for 13 memory-intensive SPEC 2006 benchmarks. Baseline: most energy-efficient static frequency for SPEC 2006.


Streaming Workload

[Figure: timelines of chip activity and memory requests for a streaming workload, with the prefetcher OFF and with the prefetcher ON.]

Limited Bandwidth Model

[Figure: execution time T plotted against cycle time t, with labels T_demand, T_compute, T_memory,min, and the crossover cycle time t_crossover.]
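The slide gives only labels, so the following sketch encodes one plausible reading rather than the authors' exact model. Below the crossover cycle time, execution time is pinned at T_memory,min, presumably the time needed to move the workload's data at the available bandwidth. Above it, time follows a linear model with intercept T_demand plus a compute term proportional to cycle time. All function and parameter names are hypothetical.

```python
def predict_time_limited_bw(
    t_memory_min: float, t_demand: float, t_compute_o: float, t_o: float, t: float
) -> float:
    """ASSUMED limited-bandwidth model: T(t) = max(T_memory_min, T_demand + T_compute(t)),
    with T_compute(t) = T_compute(t_o) * t / t_o."""
    t_compute = t_compute_o * (t / t_o)
    return max(t_memory_min, t_demand + t_compute)


def crossover_cycle_time(
    t_memory_min: float, t_demand: float, t_compute_o: float, t_o: float
) -> float:
    """Cycle time at which T_demand + T_compute(t) reaches T_memory_min (assumed model)."""
    return (t_memory_min - t_demand) * t_o / t_compute_o


# Hypothetical numbers (arbitrary time units, t_o = 1.0):
print(crossover_cycle_time(10.0, 2.0, 4.0, 1.0))          # 2.0
print(predict_time_limited_bw(10.0, 2.0, 4.0, 1.0, 1.0))  # 10.0: bandwidth-bound
print(predict_time_limited_bw(10.0, 2.0, 4.0, 1.0, 3.0))  # 14.0: compute scales again
```

Under this reading, shortening the cycle time below t_crossover buys no speedup, which a plain linear model would not capture.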

Energy Savings

[Figure: normalized energy savings (%) of Stall time, Leading loads, Our predictor, and Oracle under Constant Access Latency, Realistic DRAM, and Realistic DRAM + Streaming Prefetcher. One bar is annotated "< 0.1".]

Gmean of relative savings for 13 memory-intensive SPEC 2006 benchmarks. Baseline: most energy-efficient static frequency for SPEC 2006.


Recap

Intro to performance prediction

Why realistic memory systems?

Variable memory latency

Prefetching


Final Thought

Performance predictors need realistic evaluation