A Read-Write Aware DRAM Scheduling for Power Reduction in Multi-Core Systems

2014/01/23

Authors: Chih-Yen Lai*, Gung-Yu Pan*, Hsien-Kai Guo*, Jing-Yang Jou*^

Affiliations: * National Chiao Tung University, Taiwan

^ National Central University, Taiwan

Outline

Introduction

DRAM power management

DRAM architecture

Related Works and Our Motivation

Proposed Techniques

Experimental Results

Conclusions and Future Works

DRAM Power Management

Demand of low power systems

DRAM consumes 25%~60% of the system power [2][3][4]

DRAM power management is important

The memory wall

Limits the system performance

Makes DRAM power management harder

Figure: Power consumption of an FBDIMM server: Memory 54%, CPU 21%, Drives 4%, NIC 3%, HBAs 2%, Others 16%

Figure: DRAM vs. CPU

*Sources: a speech by Samsung's S. Kadivar at Denali MemCon, 2009; a speech by Samsung's K. Han, 2012

DRAM Architecture

Dual Inline Memory Module (DIMM)

Composed of multiple DRAM chips

Rank

A set of DRAM chips in a DIMM

Bank

Spreads across chips in a rank

Memory controller

Reorder Queue (RQ)

Command Queue (CQ)

Scheduling

Power mode switching

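Figure: Four cores with caches share a last level cache; the memory controller holds the scheduler, the RQ, and one command queue per rank (CQ1 to CQ4); DIMMs are composed of ranks of DRAM chips, and each bank spreads across the chips of a rank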

Outline

Introduction

Related Works and Our Motivation

Related works

Motivation

Proposed Techniques

Experimental Results

Conclusions and Future Works

Related Works

Hybrid main memory

PDRAM [13][14], cached DRAM [15]

Re-designing the physical structure of the DRAM

Array rearrangement [2], Mini-rank [16]

Adding hardware components to the DRAM

Intelligent refresh [17], automatic data migration [18]

Power management policies on the memory controller

Power-down policy

Scheduler

Throttling mechanism

Power Management Policies

Power-down policy

Determines when to turn off idle ranks

Granularity: minimum number of chips to be switched

Overhead: transition delays

Time-out power-down [19]

Queue-aware power-down [21]

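Figure: Time-out power-down timeline on the DRAM clock: rank ri becomes idle; after the time-out 𝑡𝑇𝑂 it is turned off (entering power-down takes 𝑡𝑃𝐷𝑁); when ri receives a command it is turned on again (powering up takes 𝑡𝑃𝑈𝑃). *ri: rank i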
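
The time-out policy can be sketched for a single rank in a simple discrete-cycle model. This is a hypothetical illustration, not the controller evaluated in these slides: `simulate_timeout_powerdown`, its parameters `t_to`, `t_pdn`, `t_pup` (standing for 𝑡𝑇𝑂, 𝑡𝑃𝐷𝑁, 𝑡𝑃𝑈𝑃), and the fixed per-command `service` time are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class TimeoutResult:
    power_downs: int     # times the rank was turned off
    wakeup_penalty: int  # total cycles commands waited for the rank to power up


def simulate_timeout_powerdown(arrivals, service, t_to, t_pdn, t_pup):
    """Hypothetical single-rank model of time-out power-down.

    arrivals: sorted command arrival cycles; each command occupies the rank for
    `service` cycles. The rank is assumed on and idle at cycle 0. It is turned off
    once it has been idle for t_to cycles; entering power-down takes t_pdn cycles
    and powering up takes t_pup cycles.
    """
    power_downs = 0
    penalty = 0
    free_at = 0  # cycle at which the rank finishes its queued work
    for t in arrivals:
        if t >= free_at + t_to:
            # The time-out expired during the idle gap, so the rank was turned off.
            power_downs += 1
            # It cannot start powering up before power-down entry has completed.
            ready = max(t, free_at + t_to + t_pdn) + t_pup
            penalty += ready - t
            start = ready
        else:
            start = max(t, free_at)  # rank still on: serve after current work
        free_at = start + service
    return TimeoutResult(power_downs, penalty)


# The third command arrives long after the time-out and pays the power-up delay.
print(simulate_timeout_powerdown([0, 5, 100], service=10, t_to=20, t_pdn=2, t_pup=3))
# TimeoutResult(power_downs=1, wakeup_penalty=3)
```

In this model a longer `t_to` means fewer power-down transitions and fewer wake-up penalties, but more cycles spent idle in the active mode.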

Power Management Policies

Scheduler

Schedules the order of request commands in the memory controller

Performance driven scheduler

Power-aware scheduler [21]

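Figure: The same five requests (W = write, R = read, rk = target rank) in three orders, first to last:

Original RQ order: W1 r1, R2 r3, W3 r3, W4 r1, W5 r2

Grouped by rank: W1 r1, W4 r1, R2 r3, W3 r3, W5 r2

Interleaved across ranks: W1 r1, R2 r3, W5 r2, W4 r1, W3 r3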
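
The two reordered sequences in the figure can be reproduced by two simple rules. This is a sketch under assumptions: the function names are hypothetical, and attributing rank grouping to the power-aware scheduler [21] and round-robin interleaving to a performance-driven scheduler is an interpretation of the figure, not something the slide states.

```python
from itertools import zip_longest


def by_rank(commands):
    """Split (name, rank) commands into per-rank FIFO lists, ranks in order of first appearance."""
    groups = {}
    for cmd in commands:
        groups.setdefault(cmd[1], []).append(cmd)
    return list(groups.values())


def group_by_rank(commands):
    """Power-aware interpretation: serve all commands of one rank before the next."""
    return [cmd for group in by_rank(commands) for cmd in group]


def interleave_ranks(commands):
    """Performance-driven interpretation: round-robin across ranks for parallelism."""
    order = []
    for row in zip_longest(*by_rank(commands)):
        order.extend(cmd for cmd in row if cmd is not None)
    return order


rq = [("W1", 1), ("R2", 3), ("W3", 3), ("W4", 1), ("W5", 2)]  # first to last
print(group_by_rank(rq))     # W1, W4, R2, W3, W5
print(interleave_ranks(rq))  # W1, R2, W5, W4, W3
```

Grouping keeps each rank busy in one contiguous stretch, so the other ranks can stay powered down; interleaving spreads consecutive commands over ranks so they can be served in parallel.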

Power Management Policies

Throttling mechanism

Blocks commands until throttle delay (𝑡𝑇𝐷) is reached

Ranks stay in the low power mode for longer periods
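
Why throttling lengthens low-power periods can be seen in a toy model. Assumptions (not from the slides): a single rank that is powered down whenever it has no work, a fixed `service` time per command, and throttling modeled as holding each command until the next multiple of the throttle delay; `count_wakeups` and `throttle` are hypothetical names.

```python
import math


def throttle(arrivals, t_td):
    """Hold each command in the RQ until the next multiple of the throttle delay t_td."""
    return [math.ceil(t / t_td) * t_td for t in arrivals]


def count_wakeups(release_times, service):
    """Count power-ups of a rank that turns off as soon as it has no work.

    Each command occupies the rank for `service` cycles; commands released while
    the rank is busy queue behind the current work instead of waking it again.
    """
    wakeups = 0
    free_at = None  # None: the rank has not been used yet (it is off)
    for t in sorted(release_times):
        if free_at is None or t > free_at:
            wakeups += 1          # idle gap: the rank was off and must power up
            free_at = t + service
        else:
            free_at += service    # rank still on: extend its busy period
    return wakeups


arrivals = [3, 40, 85, 130, 170]
print(count_wakeups(arrivals, service=10))                 # 5 (no throttling)
print(count_wakeups(throttle(arrivals, 100), service=10))  # 2 (t_TD = 100 cycles)
```

In this toy trace throttling cuts the wake-ups from five to two, but each command may wait almost a full throttle delay before release, which is the power and performance trade-off the later experiments explore.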

Our Motivation

Reduce power with minimal hardware overhead

Criticality difference between reads and writes

Knobs

Throttling mechanism

Reorder request commands

Power mode control

We propose two techniques

Read-write aware throttling

Rank level read-write reordering

Outline

Introduction

Related Works and Our Motivation

Proposed Techniques

Overview

Read-write aware throttling

Rank level read-write reordering

A complete example

Experimental Results

Conclusions and Future Works

Overview

Greedy power-down memory controller (flowchart, repeated every cycle):

RQ receives commands from the last level cache

RQ pops a command to the CQ

Per rank: if CQi is empty, turn off ri; otherwise, turn on ri

Proceed to the next cycle
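
A cycle-by-cycle sketch of the greedy controller, under assumptions the flowchart does not fix: the RQ forwards at most one command per cycle in FIFO order, every command takes a fixed `service` time, and power-mode transitions are instantaneous. The function and parameter names are hypothetical.

```python
from collections import deque


def greedy_power_down(arrivals, num_ranks, service, cycles):
    """Cycle-by-cycle sketch of the greedy power-down memory controller.

    arrivals: {cycle: [(name, rank), ...]} commands sent by the last level cache.
    Ranks are numbered 1..num_ranks and rank ri is fed by command queue CQi.
    Returns {rank: cycles the rank spent turned on}.
    """
    rq = deque()
    cqs = {r: deque() for r in range(1, num_ranks + 1)}
    remaining = {r: 0 for r in cqs}   # cycles left for the command at the head of CQi
    on_cycles = {r: 0 for r in cqs}
    for cycle in range(cycles):
        rq.extend(arrivals.get(cycle, []))   # RQ receives commands from the LLC
        if rq:                               # RQ pops one command (FIFO) to its CQ
            name, rank = rq.popleft()
            cqs[rank].append(name)
        for r, cq in cqs.items():
            if not cq:                       # CQi is empty: ri is turned off
                continue
            on_cycles[r] += 1                # CQi is not empty: ri is turned on
            if remaining[r] == 0:
                remaining[r] = service       # start serving the head command
            remaining[r] -= 1
            if remaining[r] == 0:
                cq.popleft()                 # head command finished
    return on_cycles


arrivals = {0: [("W1", 1), ("R2", 3)], 1: [("W3", 3)]}
print(greedy_power_down(arrivals, num_ranks=4, service=2, cycles=10))
# {1: 2, 2: 0, 3: 4, 4: 0}
```

Each rank is on only while its CQ holds work, so its on-time equals its busy time; what this sketch leaves out is the power-up delay paid on every wake-up, which is what throttling tries to amortize.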

Overview

Previous work [21] (flowchart, extends the greedy power-down memory controller):

RQ receives commands from the last level cache

If 𝑡𝑇𝐷 is reached: cluster commands into command sets S1…Sn by rank in the RQ; per rank, if Si contains a request, allow commands in Si to be sent to CQi

RQ pops an allowed command to the CQ

Per rank: if CQi is empty, turn off ri; otherwise, turn on ri

Proceed to the next cycle

Overview

Our work (flowchart, extends previous work [21]):

RQ receives commands from the last level cache

If 𝑡𝑇𝐷 is reached: cluster commands into command sets S1…Sn by rank in the RQ; per rank, if Si contains a read request, apply rank level read-write reordering to Si in the RQ and allow commands in Si to be sent to CQi

RQ pops an allowed command to the CQ

Per rank: if CQi is empty, turn off ri; otherwise, turn on ri

Proceed to the next cycle

Read-Write Aware Throttling

Based on basic throttling [21]

Blocks commands in the RQ

Clusters commands into command sets 𝑆1, 𝑆2, …, 𝑆𝑛

Read-write aware throttling

Checks whether each command set contains read requests

Urgent rank (its command set contains a read) vs. trivial rank (its command set contains only writes)

Only allows commands targeting urgent ranks to be sent to the corresponding CQs

Figure: Clustering the RQ (front to end: W1 r1, R2 r3, W3 r3, W4 r1, W5 r2, W6 r1, W7 r1, R8 r1) into command sets by rank:

S1: W1 r1, W4 r1, W6 r1, W7 r1, R8 r1 (r1 on)

S2: W5 r2 (r2 on?)

S3: R2 r3, W3 r3 (r3 on)

S4: empty (r4 off)
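
The urgent/trivial split can be sketched for one throttle boundary. The `Cmd` type, the function names, and using the leading letter of a request name to tell reads from writes are assumptions for illustration; the slides do not say when writes held for a trivial rank are eventually released, so this sketch does not model that.

```python
from typing import NamedTuple


class Cmd(NamedTuple):
    name: str  # slide notation: "W" = write, "R" = read, then the request number
    rank: int

    @property
    def is_read(self) -> bool:
        return self.name.startswith("R")


def cluster(rq, num_ranks):
    """Cluster RQ commands into command sets S1..Sn by target rank, FIFO within a set."""
    sets = {r: [] for r in range(1, num_ranks + 1)}
    for cmd in rq:
        sets[cmd.rank].append(cmd)
    return sets


def allowed_ranks(rq, num_ranks, read_write_aware=True):
    """Ranks whose command set may be sent to its CQ at a throttle boundary.

    Basic throttling allows every non-empty set; read-write aware throttling
    allows only urgent ranks, i.e. sets that contain at least one read.
    """
    sets = cluster(rq, num_ranks)
    if read_write_aware:
        return {r for r, s in sets.items() if any(c.is_read for c in s)}
    return {r for r, s in sets.items() if s}


rq = [Cmd("W1", 1), Cmd("R2", 3), Cmd("W3", 3), Cmd("W4", 1),
      Cmd("W5", 2), Cmd("W6", 1), Cmd("W7", 1), Cmd("R8", 1)]  # front to end
print(sorted(allowed_ranks(rq, 4, read_write_aware=False)))  # [1, 2, 3]
print(sorted(allowed_ranks(rq, 4)))                          # [1, 3]
```

On the slide's RQ, basic throttling lets r1, r2 and r3 through, so r2 would be powered up only to serve the write W5; read-write aware throttling keeps r2 off, consistent with the "r2 on?" mark in the figure.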

Rank Level Read-Write Reordering

Reorder commands within a command set

Lets the read requests be processed as early as possible

Groups commands based on target addresses

At most one read request in a command group

FIFO order for commands in a command group

FIFO order for read requests across command groups

Figure: Read-write reordering of S1 (front to end)

Before: W1 addr1, W4 addr2, W6 addr3, W7 addr2, R8 addr2, R9 addr3

After, in three command groups: (W4 addr2, W7 addr2, R8 addr2), (W6 addr3, R9 addr3), (W1 addr1)
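
The grouping rules above can be sketched as follows. The rule that a command group closes at its first read is an assumption chosen to satisfy "at most one read request in a command group" while keeping same-address commands in order; `Cmd` and `read_write_reorder` are hypothetical names.

```python
from typing import NamedTuple


class Cmd(NamedTuple):
    name: str  # slide notation: "W" = write, "R" = read, then the request number
    addr: str

    @property
    def is_read(self) -> bool:
        return self.name.startswith("R")


def read_write_reorder(command_set):
    """Reorder one rank's command set so its reads are served as early as possible.

    Assumed grouping rule: commands to the same address join one command group in
    FIFO order, and the group closes at its first read (at most one read per group).
    Groups with a read go first, in FIFO order of their reads; read-free groups
    follow in FIFO order of their first command.
    """
    groups = []        # [position of the group's read or None, commands]
    open_group = {}    # addr -> index of the group still accepting commands
    for pos, cmd in enumerate(command_set):
        idx = open_group.get(cmd.addr)
        if idx is None:
            idx = len(groups)
            groups.append([None, []])
            open_group[cmd.addr] = idx
        groups[idx][1].append(cmd)
        if cmd.is_read:
            groups[idx][0] = pos
            del open_group[cmd.addr]  # close the group after its read
    with_read = sorted((g for g in groups if g[0] is not None), key=lambda g: g[0])
    without_read = [g for g in groups if g[0] is None]
    return [cmd for _, cmds in with_read + without_read for cmd in cmds]


s1 = [Cmd("W1", "addr1"), Cmd("W4", "addr2"), Cmd("W6", "addr3"),
      Cmd("W7", "addr2"), Cmd("R8", "addr2"), Cmd("R9", "addr3")]  # front to end
print([c.name for c in read_write_reorder(s1)])
# ['W4', 'W7', 'R8', 'W6', 'R9', 'W1']
```

Moving read-free groups to the tail never reorders two commands to the same address: under this rule a group without a read can only be the last group for its address, so only commands to different addresses change relative order.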

A Complete Example

Figure: Clustering the RQ (front to end: W1 r1, R2 r3, W3 r3, W4 r1, W5 r2, W6 r1, W7 r1, R8 r1) into S1: W1 r1, W4 r1, W6 r1, W7 r1, R8 r1; S2: W5 r2; S3: R2 r3, W3 r3

r1 is urgent

r3 is urgent

r2 is trivial

A Complete Example

Figure: Timeline of rank r1 from cycle k to cycle k+4. Read-write reordering turns S1 into the command groups (W4 addr2, W7 addr2, R8 addr2), (W1 addr1), (W6 addr3). Annotations: r1 is off, r1 is on, r1 is off; both W1 and W6 finish; proceed to next cycle; throttle delay is reached; commands from the last throttle period are finished.
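
The complete example can be replayed at a single throttle boundary by combining both techniques as the flowchart describes them. Assumptions: the grouping rule in `read_write_reorder` (a group closes at its first read), the `Cmd` type and the function names are hypothetical, and the addresses of R2, W3 and W5 are invented placeholders because the slides give addresses only for r1's commands.

```python
from typing import NamedTuple


class Cmd(NamedTuple):
    name: str  # slide notation: "W" = write, "R" = read, then the request number
    rank: int
    addr: str

    @property
    def is_read(self) -> bool:
        return self.name.startswith("R")


def read_write_reorder(cmds):
    """Rank level read-write reordering under the assumed rule that a command
    group (same address, FIFO order) closes at its first read."""
    groups, open_group = [], {}
    for pos, cmd in enumerate(cmds):
        # Reuse the open group for this address, or register a new one at the end.
        idx = open_group.setdefault(cmd.addr, len(groups))
        if idx == len(groups):
            groups.append([None, []])
        groups[idx][1].append(cmd)
        if cmd.is_read:
            groups[idx][0] = pos       # FIFO position of the group's only read
            del open_group[cmd.addr]
    with_read = sorted((g for g in groups if g[0] is not None), key=lambda g: g[0])
    without_read = [g for g in groups if g[0] is None]
    return [c for _, cs in with_read + without_read for c in cs]


def throttle_boundary(rq, num_ranks):
    """One throttle boundary: cluster by rank, release urgent sets (reordered) to
    their CQs, and keep commands for trivial ranks blocked in the RQ."""
    sets = {r: [] for r in range(1, num_ranks + 1)}
    for cmd in rq:
        sets[cmd.rank].append(cmd)
    released = {r: read_write_reorder(s) for r, s in sets.items()
                if any(c.is_read for c in s)}
    held = [c for c in rq if c.rank not in released]
    return released, held


# r1 addresses follow the slides; addrA, addrB, addrC are hypothetical placeholders.
rq = [Cmd("W1", 1, "addr1"), Cmd("R2", 3, "addrA"), Cmd("W3", 3, "addrB"),
      Cmd("W4", 1, "addr2"), Cmd("W5", 2, "addrC"), Cmd("W6", 1, "addr3"),
      Cmd("W7", 1, "addr2"), Cmd("R8", 1, "addr2")]  # front to end
released, held = throttle_boundary(rq, num_ranks=4)
print({r: [c.name for c in cs] for r, cs in released.items()})
# {1: ['W4', 'W7', 'R8', 'W1', 'W6'], 3: ['R2', 'W3']}
print([c.name for c in held])
# ['W5']
```

The r1 result agrees with the reordered S1 in the figure, and W5 stays in the RQ, so trivial rank r2 is never turned on during this throttle period.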

Outline

Introduction

Related Works and Our Motivation

Proposed Techniques

Experimental Results

Analysis on different techniques

Power and performance trade-off

Conclusions and Future Works

Simulation Environment

Simulators: Multi2Sim + DRAMSim2

System: ARM Cortex A9 + DDR2 SDRAM

Workloads: SPEC CPU2006 (multi-program), SPLASH-2 (multi-threaded)

Policies

Previous work [21]

Basic throttling, Power-aware scheduling, Queue-aware power-down

Oracle policy

Maximum power reduction at zero performance degradation

Our work

Read-write aware throttling, Rank level read-write reordering

Analysis on Different Techniques

At 𝑡𝑇𝐷 = 400 CPU cycles

Power and Performance Trade-Off

Trade-off characteristic for SPEC CPU2006

~75% power reduction, close to the upper bound

Our techniques work well with short throttle delays

Power and Performance Trade-Off

Trade-off characteristic for SPLASH-2

Small differences in power reduction among the policies

SPLASH-2 benchmarks are less memory intensive

Suite          Benchmark (combination)   Main memory requests per million cycles
SPEC CPU2006   fp1                       511391.80
SPEC CPU2006   fp2                       818090.00
SPEC CPU2006   fp3                       31697.60
SPEC CPU2006   fp4                       442664.80
SPLASH-2       cholesky                  894.36
SPLASH-2       fft                       2196.41
SPLASH-2       fmm                       271.93
SPLASH-2       radix                     801.45
SPLASH-2       barnes                    38.60
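
The memory-intensity gap behind this observation can be checked directly from the table. The snippet only recomputes averages and one ratio from the listed values; the variable names are illustrative.

```python
from statistics import mean

# Main memory requests per million cycles, copied from the table.
spec = {"fp1": 511391.80, "fp2": 818090.00, "fp3": 31697.60, "fp4": 442664.80}
splash = {"cholesky": 894.36, "fft": 2196.41, "fmm": 271.93,
          "radix": 801.45, "barnes": 38.60}

spec_mean = mean(spec.values())
splash_mean = mean(splash.values())
# Lightest SPEC CPU2006 combination vs. heaviest SPLASH-2 benchmark.
min_ratio = min(spec.values()) / max(splash.values())

print(f"SPEC CPU2006 mean: {spec_mean:.2f}")
print(f"SPLASH-2 mean:     {splash_mean:.2f}")
print(f"fp3 / fft:         {min_ratio:.1f}x")
```

Even the least memory-intensive SPEC CPU2006 combination (fp3) issues about 14 times as many main memory requests as the most intensive SPLASH-2 benchmark (fft), and the averages (about 451,000 vs. about 841) differ by more than two orders of magnitude.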

Outline

Introduction

Related Works and Our Motivation

Proposed Techniques

Experimental Results

Conclusions and Future Works

Conclusions and Future Works

Conclusions

We propose two techniques

Read-write aware throttling

Rank level read-write reordering

On average, our work improves the power reduction by 10%~15% compared to the previous work [21]

Our work achieves ~75% power reduction with 1%~3% system performance degradation

Future works

Run-time throttle delay controller

Integration with automatic data migration, write combining…

Utilizing deeper power modes
