
Rethinking DRAM Power Modes for Energy Proportionality · 2012-12-08 · 8 OoO Nehalem cores at...


Rethinking DRAM Power Modes for Energy Proportionality

Krishna Malladi¹, Ian Shaeffer², Liji Gopalakrishnan², David Lo¹, Benjamin Lee³, Mark Horowitz¹

Stanford University¹, Rambus Inc², Duke University³

[email protected]


Main Memory in Datacenters

Server power is the main energy bottleneck in datacenters
- PUE of ~1.1 → the rest of the system is energy efficient

Significant main memory (DRAM) power
- 25-40% of server power across all utilization points
- Low dynamic range → no energy proportionality


Outline

Inefficiencies of DRAM interfaces

Energy-proportionality via fast DRAM interfaces
- MemBlaze
- MemCorrect
- MemDrowsy



DDR3 Energy & Powermodes

DDR3 optimized for high bandwidth
- High speed interface with DLLs, CLKs, ODTs
- Very high static power in active-idle

Hard to powerdown to deep states
- Long impractical wakeup time to power up interface
- Insufficient idleness in workloads → significant active-idle time (88%!)

Power Mode       DIMM Idle Power (W)   Exit Latency (ns)
Active Idle      5.36                  0
Fast Powerdown   2.79                  20
Deep Powerdown   0.92                  768
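
To make the trade-off in the power mode table concrete, here is a minimal Python sketch that estimates how long an idle gap must be before a power-down mode saves energy. The power and latency numbers come from the table; the function names and the assumption that the DIMM draws active-idle power for the whole exit latency are illustrative choices, not taken from the slides.

```python
# (DIMM idle power in W, exit latency in ns), copied from the power mode table.
POWER_MODES = {
    "active_idle": (5.36, 0),
    "fast_powerdown": (2.79, 20),
    "deep_powerdown": (0.92, 768),
}
ACTIVE_IDLE_W = POWER_MODES["active_idle"][0]


def idle_gap_energy_nj(mode: str, gap_ns: float) -> float:
    """Energy in nJ for sitting in `mode` for `gap_ns`, then waking up.

    Assumption (not from the slides): the DIMM draws active-idle power
    during the exit latency. W * ns gives nJ directly.
    """
    power_w, exit_ns = POWER_MODES[mode]
    return power_w * gap_ns + ACTIVE_IDLE_W * exit_ns


def break_even_gap_ns(mode: str) -> float:
    """Shortest idle gap for which `mode` uses less energy than active idle."""
    power_w, exit_ns = POWER_MODES[mode]
    if power_w >= ACTIVE_IDLE_W:
        return 0.0  # active idle itself: nothing to amortize
    return ACTIVE_IDLE_W * exit_ns / (ACTIVE_IDLE_W - power_w)


for name in POWER_MODES:
    print(name, round(break_even_gap_ns(name), 1), "ns")
```

Under this simple model, deep powerdown only pays off for idle gaps of roughly a microsecond, and that is before counting the performance cost of the exposed wakeup latency.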


Path to Energy-Proportionality

Reduce active-idle power

Reduce time in active-idle
- Increase time in power-down

Reduce power-down power


DRAM Interfaces

Bits are short
- Sampling window is only 625ps

Data (DQ) and Clock (CLK) signals forwarded to DRAM
- Write data aligned to Clock edges
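
As a quick check on the sampling window, the sketch below computes the bit time (unit interval) for a given data rate. The slide does not say which data rate the 625ps figure refers to; it matches 1600 MT/s, while the DDR3-1333 parts named in the page header would have a bit time of about 750ps. The function name is illustrative.

```python
def bit_time_ps(transfers_per_second: float) -> float:
    """One bit time (unit interval) in picoseconds for a given data rate."""
    return 1e12 / transfers_per_second


print(bit_time_ps(1600e6))            # data rate that matches the 625ps on the slide
print(round(bit_time_ps(1333e6), 1))  # DDR3-1333, for comparison
```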


DRAM Interfaces

Dynamic chip variations affect Reads
- PVT variations → misaligned DQS and CLK signals
- Non-deterministic Read timing → incorrect sampling


DRAM Interfaces

On-chip DLLs
- Adjust delay to match chip temperature, voltage variations
- Align DQS, DQ to CLK

Power hungry, long settling time → poor powermodes


Live with Slow-Powerup

S/W mechanisms
- Batch requests (or) subset ranks (or) predict idleness
- Degrades application performance
- Degraded device density

H/W mechanisms
- Statically disable DLLs in BIOS → statically lowers bandwidth
- Worse performance

Use current deep powermodes
- Long memory wake-up latency


With Wakeup = 1 µs

E-D curves flat
- Can’t win with long wakeups


Faster Wakeups

Powerups should be much smaller

100ns



Fast DRAM Wakeups

Enabling deep powerdown needs low-latency wakeups

Rearchitect interface to reduce wakeup latency
- Fast wakeup with MemBlaze

Retain interface but powerdown aggressively
- Speculative wakeup with MemCorrect
- Lazy wakeup with MemDrowsy



Fast Wakeup with MemBlaze

No DLL
- Periodic timing reference signal stores DRAM offset in controller
- Current-mode logic (CML) clocking has fewer variations

Fast turn-on of datapath
- Capacitive boosting quickly restores bias values

Exit latency ~ 10ns


MemBlaze DRAM + Controller

Integrated into DRAMs. Fabricated and tested
- More details in the paper


Silicon Results


Methodology

Workloads

Memcached
- Key/value pairs with 100B and 10KB values
- Zipf popularity distribution with exponential inter-arrival times

Yahoo! Cloud Serving Benchmark (YCSB), SPECjbb

Multiprogrammed (MP) and Multithreaded (MT)
- SPEC CPU 2006, SPEC OMP 2001, PARSEC
- High BW (HB), Medium BW (MB), Low BW (LB)

Architecture
- 8 OoO Nehalem cores at 3GHz, 8MB shared L3 cache
- 32 GB DRAM, 2Gb DDR3-1333 chips
- Fast powerdown baseline, 15 cycle powerdown timer
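
The Memcached workload description (Zipf key popularity, exponential inter-arrival times) can be illustrated with a small request generator. This is only a sketch using the Python standard library; the key count, Zipf exponent, mean gap, and all function names are hypothetical parameters, not values from the slides.

```python
import random
from itertools import accumulate


def zipf_cumulative_weights(n_keys: int, s: float = 1.0) -> list:
    """Cumulative Zipf weights: the key of rank r has weight 1 / r**s."""
    return list(accumulate(1.0 / (rank ** s) for rank in range(1, n_keys + 1)))


def generate_requests(n_requests, n_keys=1000, mean_gap_ns=1000.0,
                      value_bytes=100, s=1.0, seed=0):
    """Return a list of (arrival_time_ns, key_rank, value_bytes) tuples.

    Arrival gaps are exponential with the given mean; keys follow a Zipf
    popularity distribution, with key 0 the most popular.
    """
    rng = random.Random(seed)
    cum = zipf_cumulative_weights(n_keys, s)
    now = 0.0
    requests = []
    for _ in range(n_requests):
        now += rng.expovariate(1.0 / mean_gap_ns)  # rate = 1 / mean gap
        key = rng.choices(range(n_keys), cum_weights=cum, k=1)[0]
        requests.append((now, key, value_bytes))
    return requests


trace = generate_requests(5, value_bytes=100)
for arrival_ns, key, size in trace:
    print(f"{arrival_ns:10.1f} ns  key={key}  value={size}B")
```

Calling it once with `value_bytes=100` and once with `value_bytes=10 * 1024` would cover the two value sizes listed for Memcached.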


MemBlaze Evaluation

66% lower memory energy with MemBlaze fastlock
- No performance penalty


Speculative Wakeup with MemCorrect

Fast wakeup
- Use deep power-down, which powers off DLL, CLK
- Transfer speculatively before the long DLL recalibration

Error Detection/Correction
- Detector fires if power-down period accumulated large skew
- Corrector waits for recalibration before transfer
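
The detect-then-correct behaviour can be sketched as a simple latency model: a wakeup is fast when the speculative timing is correct and pays the full recalibration latency when the detector fires. The class and field names below are hypothetical, the 20ns fast-path latency is an assumed placeholder, and 768ns is the deep powerdown exit latency from the DDR3 power mode table.

```python
from dataclasses import dataclass


@dataclass
class MemCorrectModel:
    fast_exit_ns: float = 20.0    # assumed speculative-path latency (not from the slides)
    recal_exit_ns: float = 768.0  # deep powerdown exit latency from the DDR3 table

    def wakeup_latency_ns(self, skew_detected: bool) -> float:
        """Latency of one wakeup: the corrector waits for recalibration on an error."""
        return self.recal_exit_ns if skew_detected else self.fast_exit_ns

    def expected_latency_ns(self, p_correct: float) -> float:
        """Average wakeup latency when timing is correct with probability p_correct."""
        if not 0.0 <= p_correct <= 1.0:
            raise ValueError("p_correct must be in [0, 1]")
        return (p_correct * self.fast_exit_ns
                + (1.0 - p_correct) * self.recal_exit_ns)


model = MemCorrectModel()
for p in (1.0, 0.9, 0.5):
    print(p, model.expected_latency_ns(p))
```

As p falls, more wakeups take the slow path and the average latency moves toward the full recalibration time.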


MemCorrect Evaluation

Vary probability of correct timing (p)
- 40% energy savings (esp. for datacenters)
- Small p → recalibration latency exposed
  - Degrades performance for high-BW apps
  - Increases energy/bit


Lazy Wakeup with MemDrowsy

Fast wakeup
- Wakeup from deep-powerdown
- Transfer at lower rate before DLL recalibration completes

Reduced Sampling Rate
- Lower data rate for READs during calibration time (~ 700ns)
- Transfer each bit multiple times → wider sampling window
- Eliminates timing uncertainty
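
A toy model of why repeating each bit helps: if every bit is held for Z bit times, the receiver can sample at the nominal centre of the widened slot and tolerate timing skew of up to half that slot. The sketch below uses the 625ps sampling window quoted earlier in the deck; the 400ps skew, the bit pattern, and the function name are hypothetical.

```python
def sample_with_skew(bits, ui_ps, z, skew_ps):
    """Recover bits that are each held for z unit intervals of ui_ps.

    The receiver samples at the nominal centre of each slot, shifted by
    skew_ps. Sampling is correct while |skew_ps| < z * ui_ps / 2.
    """
    slot_ps = ui_ps * z
    recovered = []
    for k in range(len(bits)):
        t = (k + 0.5) * slot_ps + skew_ps              # actual sampling instant
        idx = int(t // slot_ps)                        # slot on the wire at time t
        idx = min(max(idx, 0), len(bits) - 1)          # clamp at the stream edges
        recovered.append(bits[idx])
    return recovered


data = [1, 0, 1, 1, 0, 0, 1, 0]
print(sample_with_skew(data, ui_ps=625, z=1, skew_ps=400) == data)  # full rate
print(sample_with_skew(data, ui_ps=625, z=2, skew_ps=400) == data)  # each bit sent twice
```

The cost is throughput: with Z=2 only half as many bits cross the interface during the ~700ns calibration period.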


MemDrowsy Evaluation

Vary sampling reduction rate (Z)
- 40% energy savings for datacenter apps
- High Z harms both performance and energy/bit
  - Energy per bit increases from wake-ups, higher bus activity
  - Z=2 more realistic


MemCorrect + MemDrowsy

Combine MemCorrect and MemDrowsy
- If error detected, halve sampling rate instead of backoff
- ≤10% performance penalty
- 50% energy/bit savings
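
To see why halving the sampling rate can beat a full backoff, the sketch below compares the time to deliver a burst after a detected timing error under both policies. It is a simplified per-pin serial model, which is an assumption and not the evaluation setup from the talk; the 625ps bit time and ~700ns recalibration time are taken from earlier slides, and the function names are hypothetical.

```python
UI_PS = 625          # full-rate bit time, from the 625ps sampling window
CALIB_PS = 700_000   # ~700ns DLL recalibration time


def backoff_time_ps(n_bits: int) -> int:
    """MemCorrect alone: wait out recalibration, then send at full rate."""
    return CALIB_PS + n_bits * UI_PS


def drowsy_fallback_time_ps(n_bits: int, z: int = 2) -> int:
    """Combined scheme: send at 1/z rate until recalibration ends, then full rate."""
    slow_ui = UI_PS * z
    slow_bits = min(n_bits, CALIB_PS // slow_ui)  # bits that fit before recalibration ends
    elapsed = slow_bits * slow_ui
    remaining = n_bits - slow_bits
    if remaining > 0:
        elapsed = max(elapsed, CALIB_PS) + remaining * UI_PS
    return elapsed


for n in (64, 1000):
    print(n, backoff_time_ps(n), drowsy_fallback_time_ps(n))
```

In this model, short bursts finish well before recalibration completes, so the error costs only the slower transfer and not the full recalibration wait.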


Conclusion

DDR3 is energy-disproportional
- DRAMs dissipate high static power

DDR3 interfaces are efficiency bottlenecks
- High active-idle power
- Long wake-ups from power modes

Re-architect interfaces with MemBlaze
- Or use MemCorrect + MemDrowsy

Provide fast wake-up from power modes
- Energy efficiency improves by 40-70%
- Performance impact is ≤ 10%


Thank you for your attention!

Questions?

[email protected]

