
ProactiveDRAM: A DRAM-initiated Retention Management Scheme

Jue Wang, Pennsylvania State University, University Park, Pennsylvania 16802, USA ([email protected])

Xiangyu Dong, Qualcomm Technologies, Inc., San Diego, California 92121, USA ([email protected])

Yuan Xie, University of California, Santa Barbara, Santa Barbara, California 93106, USA ([email protected])

Abstract—DRAM cells are leaky and need periodic refresh, which hurts system performance and consumes additional energy. As DRAM scales toward sub-20 nm process technology, we expect a significant portion of DRAM cells to become weak cells that require a higher refresh rate, resulting in even higher refresh overhead. A possible solution is to selectively refresh those weak cells at a higher frequency while still refreshing the majority at the nominal rate. However, providing a multi-rate DRAM refresh scheme is not straightforward. Previous work on this topic was built on an obsolete refresh framework and was incompatible with modern DRAM standards, making it challenging to adopt in practice. In this work, we propose ProactiveDRAM, a novel scheme in which the DRAM proactively guides the timing of weak-cell refresh management and reuses the memory controller's command-scheduling capability. ProactiveDRAM offers retention-aware refresh at DRAM-row granularity and, more importantly, can be built atop any modern DRAM architecture. Our simulation results show that ProactiveDRAM can handle a 1% (or even 10%) weak-row population with negligible performance and energy overhead¹.

I. INTRODUCTION

Dynamic Random Access Memory (DRAM) has been the de facto main memory technology for decades because it successfully strikes a balance between performance and cost. A DRAM cell consists of one transistor and one capacitor. The capacitor can be either charged or discharged to represent a binary value (e.g., 0 or 1). This structure provides a high degree of integration, making DRAM the cheapest memory solution, but capacitors leak charge over time, and stored data eventually fades unless the DRAM cell is periodically refreshed. As a result, in modern computer systems, memory refresh operations are needed so that the charge on each capacitor can be restored to its original level. The maximum time that a cell can retain its data without refresh is called the retention time.

It is believed that continuous DRAM process scaling toward the sub-20 nm region poses a challenge in maintaining the current DRAM retention time level. Normally, DRAM cell retention time follows a tail distribution because of process variations [5], [12]. As more DRAM cells are integrated into a single chip, process imperfection introduces more DRAM cells that fail to reach the nominal retention time and become faulty cells if the refresh frequency is unchanged. These cells are treated as weak cells. In addition, variable retention time (VRT) is becoming a non-trivial phenomenon in the latest DRAM technologies [15]. VRT cells transition between a high-retention state and a low-retention state. Since they are unpredictable, we also have to conservatively count VRT cells as weak cells.

¹This work is supported in part by NSF grants 1218867, 1213052, and 1409798, and by the Department of Energy under Award Number DE-SC0005026.

Fig. 1. DRAM system organization: a processor (CPU and cache) with a memory controller drives Channel 0, which hosts Rank 0 (Chips A and B) and Rank 1 (Chips C and D), each chip containing banks; the ranks share a command bus and a data bus and are selected by CS0/CS1.

Combining these effects, we expect a non-negligible portion of a DRAM chip to be weak. Previous works share these concerns and show that DRAM refresh will soon become a system performance bottleneck and cause an energy waste as high as 20% [14], [17], [18].

One possible solution is to exploit the observation that DRAM retention time follows a distribution across all DRAM cells. In a simple fixed-rate refresh scheme, the worst DRAM cell in terms of retention decides the minimum refresh frequency. Therefore, the first goal of this paper is to find an efficient way of refreshing DRAM at multiple refresh rates. There are other efforts in the same direction (e.g., RAIDR [14]). However, they require special memory controller designs, and their proposed refresh schemes are incompatible with modern DRAM standards. Hence, another important goal of this paper is to ensure that the proposed technique fits into modern DRAM systems.

In this paper, we propose ProactiveDRAM, an innovative scheme that meets both of our goals. ProactiveDRAM allows a DRAM to self-manage the retention of its weak rows. To the best of our knowledge, ProactiveDRAM is novel in several aspects:

• It is compatible with the auto-refresh framework;
• It manages DRAM retention at row granularity;
• The DRAM chip can communicate with the memory controller and orchestrate the additional refreshes for its weak rows;
• It reuses the command queue in memory controllers;
• The performance and energy impacts are negligible.

II. BACKGROUND AND MOTIVATION

We first briefly review DRAM technology and then explain the motivation for this work.

A. DRAM Basics

Figure 1 shows the conceptual view of a DRAM system, which is hierarchically organized into channels, ranks, and banks.


Fig. 2. DRAM internal bank architecture: a DRAM bank is divided into subarrays; each subarray holds DRAM rows of cells addressed by wordlines and bitlines and read out by sense amplifiers.

A rank can consist of multiple DRAM chips, and a DRAM chip usually has 8 internal banks. Today's commodity DRAM chips have already reached the multi-gigabit scale, and a bank is commonly further divided into multiple subarrays, as illustrated in Figure 2. A DRAM subarray contains multiple rows (e.g., 512 rows in most designs), and DRAM is accessed at row granularity to amortize peripheral circuitry overhead. Modern DRAM standards specify row sizes from 1 KB [11] to 4 KB [10]. Memory controllers (MCs) manage each DRAM channel independently. The communication between an MC and a channel is through a command bus, a data bus, and chip-select (CS) signals.

Accessing data stored in DRAM starts by sending a row activation command (ACT). The ACT command carries the row address and triggers the bitline sense amplifiers (BLSAs) to read the data from the corresponding row. After that, a column access command (either READ or WRITE) selects part of the data latched in the BLSAs and transfers it between the DRAM subarray and the I/O pins. Finally, the open row must be closed before another row can be accessed; this is done by precharging all the bitlines back to half Vdd (PRE).
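To make the command sequence concrete, here is a minimal Python sketch of the open-row protocol just described; the `Bank` class and its bookkeeping are illustrative, not part of any DRAM standard.

```python
# Minimal sketch of the open-row access protocol (illustrative, not a
# standard-accurate model): ACT opens a row into the sense amps, READ/WRITE
# does the column access, PRE closes the row before a different one opens.
class Bank:
    def __init__(self):
        self.open_row = None  # row currently latched in the bitline sense amps

    def access(self, row, col, is_write=False):
        cmds = []
        if self.open_row not in (None, row):
            cmds.append("PRE")          # close the old row (bitlines back to Vdd/2)
        if self.open_row != row:
            cmds.append(f"ACT r{row}")  # latch the target row into the BLSA
            self.open_row = row
        cmds.append(f"{'WRITE' if is_write else 'READ'} c{col}")
        return cmds

bank = Bank()
print(bank.access(3, 5))                 # ['ACT r3', 'READ c5']
print(bank.access(3, 9, is_write=True))  # row hit: ['WRITE c9']
print(bank.access(7, 0))                 # row conflict: ['PRE', 'ACT r7', 'READ c0']
```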

B. DRAM Refresh

DRAM devices operate by storing electrical charge on capacitors. Since a capacitor loses its charge over time, it must be periodically refreshed back to the original charge level. The amount of time a DRAM bitcell can reliably maintain its stored data is called the "retention time," and every DRAM bitcell must be refreshed before its retention time expires. The retention time (tREF) depends on the device operating temperature. Commonly, tREF is 64 ms for DDRx [8], [11] or 32 ms for LPDDRx [9], [10].

DRAM refresh (REF) is performed by activating a row (ACT) and then precharging it (PRE) without any column access. Assuming a DRAM bank has 8,192 rows, 8,192 REF commands must be issued during every retention window, and the interval between two REF commands is defined as tREFI = tREF/8,192. Historically, early DRAM technologies (such as EDO DRAM in the 1990s) all used such a per-row refresh scheme. At that time, it was the MC's responsibility to satisfy the DRAM retention requirement: the MC maintained a refresh row counter and sent out an ACT command (a.k.a. row address strobe, RAS) within every tREFI interval. This scheme is called RAS-only refresh, or ROR.
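As a quick sanity check of the per-row beat, the arithmetic below evaluates the tREFI formula with the values from the text; it is only a back-of-envelope calculation.

```python
# Back-of-envelope check of the ROR refresh beat described above.
tREF_ms = 64            # DDRx retention window
refs_per_window = 8192  # one REF (RAS) per row per window
tREFI_us = tREF_ms * 1000 / refs_per_window
print(f"tREFI = {tREFI_us:.2f} us")   # 7.81 us between consecutive REF commands
```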

However, DRAM density has scaled up tremendously over the past decade, and each DRAM bank now commonly contains more than 32,768 rows.

Fig. 3. DRAM auto-refresh scheme: each refresh command refreshes multiple rows in a staged way (Banks 0 through 7 each refresh Row A and then Row B along a timeline; tRC', tRRD', and tRFC annotate the stage timing).

TABLE I. DDR3-1600 DRAM TIMING PARAMETERS

  Capacity          512 Mb    1 Gb      2 Gb      4 Gb      8 Gb
  # of rows         8,192     16,384    32,768    65,536    65,536
  tRRD              6 ns      6 ns      6 ns      6 ns      7.5 ns
  tFAW              30 ns     30 ns     30 ns     30 ns     40 ns
  tRC               47.5 ns   47.5 ns   47.5 ns   47.5 ns   47.5 ns
  tRFC (standard)   90 ns     110 ns    160 ns    300 ns    350 ns
  tRFC* (if ROR)    95.5 ns   155.5 ns  275.5 ns  515.5 ns  670 ns

It becomes challenging for MCs to manage DRAMs with different numbers of rows. In addition, ROR fails to work in this new scenario because refreshing one row at a time implies a very short tREFI, putting a significant burden on MC command scheduling. As a result, all modern DRAM designs have switched to a new refresh framework called auto-refresh (AR). In AR, the refresh row counter is incorporated into the DRAM chips themselves. The MC only provides a fixed refresh beat, always sending out 8,192 REF commands during one retention window. Upon receiving a REF command, the DRAM refreshes multiple rows as a group, with the group size depending on the DRAM capacity. For example, in an 8-bank DRAM design where each bank contains 32,768 rows, each refresh group has 64 rows. The latency of refreshing multiple rows (tRFC) can be roughly estimated as:

tRFC* = (N − 1) × tRRD + tRC    (1)

where tRRD is the minimum interval between two row activations, tRC is the minimum time required to open and then close a DRAM row, and N is the number of rows in a refresh group. Which N rows to refresh each time is determined by the DRAM vendor and can vary from vendor to vendor. This flexibility allows DRAM vendors to tighten their refresh circuitry timing and reach a shorter overall latency than the obsolete ROR framework can achieve. Today's DRAMs optimize tRFC by adding extra logic that refreshes multiple rows inside a bank concurrently, dividing a refresh group into subgroups. This change increases the effective tRC' and tRRD' because refreshing multiple rows generates more sensing noise and power-delivery pressure. Therefore, the actual tRFC is:

tRFC = (N/M − 1) × tRRD' + tRC'    (2)

where M is the number of rows in a refresh subgroup.

Figure 3 shows an example where 16 rows are refreshed in 2 subgroups. Refresh subgrouping gives AR a much shorter refresh latency than ROR. Table I lists the DDR3-1600 timing parameters [8]. Estimating with Equation (1), with the additional tFAW (four-activation window) constraint, we can see that if ROR were still used, tRFC would be much larger (the tRFC* row of Table I).
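The sketch below evaluates both equations. For Equation (1) we add the tFAW constraint mentioned in the text (at most four activations per tFAW window, greedily scheduled); this scheduling model is our reading, but it reproduces the tRFC* row of Table I exactly. The Equation (2) timing values are illustrative, since the standard does not publish tRRD' and tRC'.

```python
# Sketch of Equations (1) and (2) from the text.
def trfc_ror(n, tRRD, tFAW, tRC):
    """Eq. (1) plus tFAW: start time of the last of n serial ACTs, plus tRC.
    Greedy schedule: 4 ACTs per tFAW window, tRRD apart within a window."""
    last_act = (n - 1) // 4 * tFAW + (n - 1) % 4 * tRRD
    return last_act + tRC

def trfc_ar(n, m, tRRD_p, tRC_p):
    """Eq. (2): an n-row refresh group split into subgroups of m rows."""
    return (n // m - 1) * tRRD_p + tRC_p

# Rows per refresh group and (tRRD, tFAW) per capacity, 512 Mb to 8 Gb:
for n, tRRD, tFAW in ((8, 6, 30), (16, 6, 30), (32, 6, 30), (64, 6, 30), (64, 7.5, 40)):
    print(n, trfc_ror(n, tRRD, tFAW, 47.5))
# -> 95.5, 155.5, 275.5, 515.5, 670.0 ns: the tRFC* (if ROR) row of Table I

# Eq. (2) with illustrative tRRD' = tRC' = 55 ns (our assumption):
print(trfc_ar(16, 8, 55, 55))   # Figure 3's 16 rows in 2 subgroups -> 110.0 ns
```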

C. Retention Time Variation

DRAM retention time tREF can be modeled as

tREF ∝ CS / Ileak    (3)


where CS represents the DRAM cell capacitance and Ileak is the leakage current from the capacitor node. Retention time is highly temperature-dependent because the leakage current increases significantly at high temperature. Modern DRAMs maintain a retention time of 64 ms for standalone memory (e.g., DDR3 [8], DDR4 [11]) and 32 ms for mobile memory (e.g., LPDDR2 [9], LPDDR3 [10]) due to high-temperature concerns. Recent JEDEC standards (e.g., DDR4, LPDDR3) specify a 4X refresh mode in which the expected cell retention is reduced by a factor of 4 when the temperature reaches 85 °C, and the refresh frequency must be increased accordingly.

Besides the temperature dependency, it is increasingly challenging to keep retention time at the nominal level as the DRAM industry keeps scaling down the process feature size. As the cell size shrinks, DRAM faces an ever smaller CS and a more fluctuating Ileak, which make the retention time more sensitive to variation. Previous studies have shown that DRAM cell retention time follows a log-normal distribution [12], [13] due to process variation. Some cells inherently have a smaller CS because of process imperfections, and some have a larger leakage current path; these cells are identified as weak cells. Even worse, a new phenomenon, variable retention time (VRT), is expected to become a dominant effect. VRT is generally believed to be caused by charges trapped in the DRAM cell access transistor [15], and it gives some cells an unpredictable retention time: researchers commonly observe a cell's retention time dropping by a factor of 4 [15]. In summary, DRAM weak cells can be caused either by process variation or by VRT. We expect the weak-cell population to keep increasing, and it will soon exceed the level that traditional approaches such as row redundancy and ECC can effectively handle.

In this work, we use LPDDR3's 32 ms retention time as our nominal value and model the weak-cell distribution at a 20 nm process node, scaled from previous work [12]. We then derive the weak-row distribution by converting the weak-cell failure probability into a weak-row failure probability. Assuming each DRAM row has 16,384 cells (e.g., an LPDDR3 8 Gb DRAM has a 2 KB row size), we draw both the weak-cell and weak-row cumulative distribution functions in Figure 4. According to our model, 1% of LPDDR3 DRAM rows fail to reach the nominal 32 ms retention target. This 1% weak-row population is too high to repair using the traditional row redundancy scheme.
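A minimal sketch of the cell-to-row conversion, assuming independent cell failures (our assumption; the paper states the conversion is done but not its exact form):

```python
# Cell-to-row failure conversion under an independence assumption.
def row_failure_prob(p_cell, cells_per_row=16384):
    """Probability that a 16,384-cell row contains at least one weak cell."""
    return 1.0 - (1.0 - p_cell) ** cells_per_row

# A per-cell weak probability near 6.2e-7 already makes ~1% of rows weak:
print(row_failure_prob(6.2e-7))   # ~0.0101
```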

D. Retention-aware Refresh

RAIDR [14] is a well-known retention-aware refresh technique. Its key idea is to group DRAM rows into retention-time bins and apply a different refresh rate to each bin. RAIDR stores the retention-time bins in the memory controller, which causes two issues when applying RAIDR in practice.

First, RAIDR requires reviving the obsolete RAS-only refresh (ROR), in which the memory controller decides which row to refresh and when. However, as discussed in Section II-B, ROR is no longer used in modern DRAM designs due to its scalability drawbacks and its inability to leverage recent refresh circuit optimizations (e.g., concurrent refresh subgrouping).

Second, RAIDR implements whole-chip refresh by sending periodic ACT commands carrying the refreshing row's address.

Fig. 4. The weak-cell and weak-row cumulative distributions: cumulative failure probability (1E-9 to 1E+0) versus retention time (8 to 256 ms).

Such a scheme can congest the memory command bus, as it constantly consumes part of the command-bus bandwidth. Even worse, RAIDR deliberately spreads those ACT commands over time to avoid a burst refreshing stage, with the side effect that the DRAM never sees an idle period long enough to enter a power-saving state. Therefore, RAIDR can incur undesired performance and power overhead under our harsh retention-distribution setting.

III. DESIGN AND IMPLEMENTATION

Our design starts by modifying RAIDR and then reapplies its basic concept in a brand-new memory architecture.

A. Modified RAIDR

Our first attempt is to modify the original RAIDR scheme to make it AR-compatible. To do so, we revert some changes made by the original RAIDR and let DRAM chips maintain their refresh row counters as usual. Figure 5 shows our modification. Since MCs are no longer aware of which DRAM rows are under refresh, DRAMs maintain refresh row counters again and, in addition, store their own weak-row information. The memory system is then refreshed based on the nominal 1X retention time, so each DRAM row gets at least one refresh within a retention window. During every normal refresh, the DRAM looks up the weak-row table and checks whether the phase-shifted refresh groups have a reduced retention time. If weak rows are found in a phase-shifted region, the DRAM internally conducts another refresh operation to keep those weak rows alive.

Figure 6 demonstrates how phase-shifted rows are found. In this example, we assume one REF command refreshes 4 rows in a DRAM bank (i.e., 32 rows in a DRAM chip). Note that these 4 rows need not be consecutive as illustrated in Figure 6, since DRAM vendors might distribute them across subarrays to balance power consumption. If a weak row is defined as having a 4X shorter retention time, the phase-shifted rows of Row X are Row (X+N/4), Row (X+N/2), and Row (X+3N/4). In this example, weak rows are found in the N/4 and 3N/4 regions, so two more internal refresh operations are added.
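The lookup can be sketched as follows; the group, bank size, and weak-row set mirror the Figure 6 example, while the function itself is illustrative:

```python
# Sketch of the phase-shifted weak-row lookup in modified RAIDR.
def extra_refreshes(group, n, weak_rows):
    """Return (phase, weak rows hit) for each phase-shifted region that
    contains a weak row and therefore needs one extra internal refresh."""
    hits = []
    for phase in (1, 2, 3):                       # N/4, N/2, and 3N/4 shifts
        region = {(r + phase * n // 4) % n for r in group}
        if region & weak_rows:
            hits.append((phase, sorted(region & weak_rows)))
    return hits

N = 32768                                         # rows per bank
group = {0, 1, 2, 3}                              # 4 rows per REF per bank
weak = {8193, 24576}                              # one in N/4, one in 3N/4
print(extra_refreshes(group, N, weak))            # [(1, [8193]), (3, [24576])]
# Two extra refreshes are appended, as in the Figure 6 example.
```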

Considering that tRFC can now be dynamically doubled, tripled, or quadrupled depending on how many phase-shifted regions contain weak rows, MCs cannot reserve the refresh time upfront according to any standard. Therefore, we introduce a sideband signal that lets the DRAM indicate how long the current tRFC value shall be.


Fig. 5. Original RAIDR vs. modified RAIDR with auto-refresh compatibility: (a) original RAIDR uses ROR, keeping the refresh logic, row counters, and weak row tables in the memory controller, which issues extra ACT commands to non-JEDEC DRAMs; (b) modified RAIDR uses AR, moving the row counters and weak row tables into the DRAMs, which receive normal REF commands and assert a dynamic-tRFC control signal while a refresh is ongoing.

Fig. 6. An example of modified RAIDR refresh: upon a REF on the command bus, the DRAM refresh circuit performs the normal refresh while the retention checker examines the three phase-shifted regions; weak rows in the N/4 and 3N/4 regions trigger two extra refreshes, and the sideband signal (busy refreshing) stays asserted for the full 3X tRFC.

As shown in Figure 6, during the normal refresh period (i.e., the first tRFC), additional circuitry in the DRAM checks whether the next phase-shifted region contains any weak rows by looking up the weak-row list. If weak rows are found in a region, an extra refresh operation is internally appended to the end of the current one. Throughout the entire refresh, the DRAM keeps the sideband signal asserted, indicating a 3X tRFC in this example.

This modified RAIDR uses the AR refresh framework instead of ROR. The only interface change is the added sideband signal, which toggles on the order of 100 ns and can therefore be added without high-speed I/O circuit design considerations.
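A minimal sketch of the resulting sideband contract, with an illustrative 180 ns base tRFC (the Table III projection); the function name is ours:

```python
# Sideband contract sketch: the DRAM holds the busy signal asserted for one
# base tRFC per refresh pass, so the MC waits for deassertion instead of
# assuming a fixed tRFC.
def busy_time_ns(base_trfc_ns, weak_region_count):
    return (1 + weak_region_count) * base_trfc_ns   # normal pass + extra passes

print(busy_time_ns(180, 2))   # 540 ns: the 3X tRFC case of Figure 6
```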

B. ProactiveDRAM

The modified RAIDR scheme is the first step in making DRAMs provide on-the-fly feedback to MCs, which eases retention-aware refresh management. However, the management granularity is the refresh group (e.g., 32 rows in an 8 Gb LPDDR3 DRAM), which inevitably causes inefficiency. Returning to the example in Figure 6, even though only 2 weak rows are found in the phase-shifted regions, 3 normal rows in region N/4 and 3 rows in region 3N/4 also have to be refreshed at a 4X frequency.

To further optimize cell retention management, we propose ProactiveDRAM. The basic concept is to mix AR for normal refresh with ROR for extra refresh compensation. To implement this concept, the DRAM behaves like a memory operation requester: when it decides that some of its rows need supplementary refreshes, it sends ACT commands to the MC. On the MC side, these ACT commands are inserted into the normal command queues for scheduling. The DRAM becomes proactive, and only weak rows explicitly receive extra refreshes to compensate for their shorter retention.

Similar to Figure 5, we still deploy a sideband signal, but this signal now carries weak-row addresses. Figure 7 shows the proposed memory system. It is possible to use only one wire by serializing all the address information with a predefined header packet. The only change to the MC is a deserializer that translates the sideband information into an ACT command and inserts it into the normal command queue for future scheduling.

Fig. 7. ProactiveDRAM and its modification to the memory controller: the serialized weak-row address from the requester DRAM enters a deserializer in the MC's back-end bus protocol engine; the resulting additional ACT command with the weak-row address joins the command queue in the front-end transaction scheduler, ahead of command encoding and electrical signaling.

Fig. 8. An example of ProactiveDRAM: during a normal refresh, the DRAM detects weak rows (rows X and Y) that need extra refreshes and sends the memory controller a serialized sideband signal carrying the weak-row addresses; the MC inserts the corresponding commands into its command queue.

This scheme reuses the existing command queue for row-level retention management and AR for basic DRAM refreshing. If an extra refresh is urgent, we can assign a higher priority to the inserted ACT command; modern memory schedulers promptly execute high-priority commands, so the retention requirement of weak rows can be satisfied in time.
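The following sketch shows the intended division of labor; all names and the queue layout are illustrative, since the paper defines a mechanism rather than an API:

```python
# Sketch of the ProactiveDRAM flow: DRAM detects weak rows during a normal
# refresh, serializes their addresses over the sideband wire, and the MC's
# deserializer turns each address into an ACT in the ordinary command queue.
from collections import deque

command_queue = deque()   # the MC's ordinary command queue

def dram_weak_row_scan(refreshed_rows, weak_table):
    """DRAM side: rows found weak during a normal refresh are serialized
    onto the sideband wire (here, simply returned as a list)."""
    return [r for r in refreshed_rows if r in weak_table]

def mc_deserializer(sideband_rows, urgent=True):
    """MC side: each sideband address becomes an ACT command in the normal
    command queue; urgent extra refreshes get a higher priority."""
    for row in sideband_rows:
        cmd = {"op": "ACT", "row": row, "prio": "high" if urgent else "normal"}
        if urgent:
            command_queue.appendleft(cmd)   # promote urgent extra refreshes
        else:
            command_queue.append(cmd)

sideband = dram_weak_row_scan(range(32), weak_table={5, 17})   # rows X and Y
mc_deserializer(sideband)
print(list(command_queue))
```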

Figure 8 shows the conceptual view of ProactiveDRAM. Under the same retention distribution, the DRAM detects two weak rows (rows X and Y) during a normal refresh operation. Unlike modified RAIDR, the refresh operation still finishes within the standard tRFC time, so no dynamic tRFC control is needed on the MC side.

C. Hardware Overhead Analysis

Adding logic to DRAM chips is controversial. However, our proposals require only negligible hardware overhead.

First, a Bloom filter provides an efficient structure for representing set membership and can be used to reduce the hardware overhead of storing the weak-row information. Based on the analysis in previous work [4], the storage overhead for a 2-set (weak and not-weak) Bloom filter is only about 1.25 KB. The weak-row information is provided by DRAM vendors and is written during chip testing. An eFuse OTP (one-time programmable) memory block can serve this purpose: eFuse technology is DRAM-process-compatible and already widely used in DRAM manufacturing, e.g., modern DRAMs use eFuse to enable redundant rows during post-silicon repair [4]. Assuming an eFuse bitcell size of 6,500 F² [19], the overall die-size overhead of adding a 1.25 KB eFuse OTP is less than 0.1% in an 8 Gb DRAM. The weak-row checking logic also has a highly relaxed timing requirement, because we only need its result after the first tRFC, which is usually longer than 200 ns.
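For illustration, a 1.25 KB (10,240-bit) Bloom filter over row addresses might look like the sketch below; the hash function, the probe count, and the single-set simplification of the 2-set filter are our assumptions:

```python
# 10,240-bit Bloom filter over weak-row addresses (membership "weak").
# False positives only cost harmless extra refreshes; no false negatives.
import hashlib

M_BITS = 10 * 1024                    # 1.25 KB of eFuse bits
K = 4                                 # probes per lookup (illustrative)
bits = bytearray(M_BITS // 8)

def _probes(row_addr):
    for i in range(K):
        h = hashlib.blake2b(f"{i}:{row_addr}".encode(), digest_size=4)
        yield int.from_bytes(h.digest(), "little") % M_BITS

def mark_weak(row_addr):              # programmed once, at chip test time
    for p in _probes(row_addr):
        bits[p // 8] |= 1 << (p % 8)

def maybe_weak(row_addr):             # queried during each refresh
    return all(bits[p // 8] & (1 << (p % 8)) for p in _probes(row_addr))

mark_weak(8193)
print(maybe_weak(8193), maybe_weak(42))   # True False
```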

The row-address serializer on the DRAM and the deserializer on the MC are also needed. Both are simple FIFOs with relaxed timing requirements, because the sideband information can be processed concurrently with normal operations.


TABLE II. SIMULATION SETTINGS

  Core                4 cores, 3.2 GHz, ARMv7
  I-L1/D-L1 caches    private, 32 KB, 8-way, 64-byte cache line
  L2 cache            shared, 1 MB, 16-way, LRU, 64-byte cache line
  Memory controller   16-entry transaction queue, FR-FCFS scheduler
  DRAM                2 channels, 1 rank, 8 banks

TABLE III. 8 Gb LPDDR4 DRAM PROJECTION

  Architecture   2-channel, 8-bank, 32,768-row, 1,024-column
  Burst length   x16 I/O width, 16n prefetch
  Frequency      1600 MHz       Retention   32 ms
  RL             30 cycles      WL          12 cycles
  tRAS           42 ns          tRCD        29 ns
  tRRD           10 ns          tFAW        50 ns
  tRP            18 ns          tRTP        7.5 ns
  tWTR           10 ns          tWR         20 ns
  tRFC           180 ns         tDQSCK      2 ns

IV. EXPERIMENTS

We model both modified RAIDR and ProactiveDRAM. This section describes our simulation methodology and results.

A. Simulation Methodology

We model a 3.2 GHz quad-core out-of-order ARMv7 microprocessor using a modified version of gem5 [1] with an in-house DRAM simulator integrated. Our DRAM simulator is cycle-accurate and models both memory-subsystem performance and power consumption (including DRAM core power and I/O power). We adopt an open-page management policy with an FR-FCFS [20] scheduler. Table II lists the key system parameters. To evaluate retention-aware refresh in future DRAM products, we project the parameters of an 8 Gb LPDDR4 DRAM die based on preliminary JEDEC disclosures [21]. Table III lists the key DRAM parameters used in our simulations.

We select 15 memory-intensive benchmarks from SPEC 2006 [23], EEMBC 2.0 [2], and HPEC [6]. We fast-forward each simulation to a pre-defined code region, warm up for 10 million instructions, and simulate at least 1 billion instructions.

B. Performance Speedup

Figure 9 shows the performance comparison. The values are normalized to the baseline system, which uses 32 ms as the retention time and needs 1% row redundancy to cover the weak rows. The 4xRefresh system uses 8 ms as the DRAM retention time; based on Figure 4, this reduces the weak-row population to below 0.01%. However, as Figure 9 shows, its performance degrades by 20% on average compared to the baseline due to the more frequent refresh operations.

Both modified RAIDR and ProactiveDRAM maintain the 32 ms retention time and eliminate the need for redundant rows. Figure 9 shows that modified RAIDR improves system performance by 15% over the 4xRefresh system but is still 8% worse than the baseline. ProactiveDRAM, in contrast, is 8% better than modified RAIDR, and its performance degradation relative to the baseline is only 0.4%.

C. Energy Savings

Figure 10 shows the energy comparison. We can see that 4xRefresh causes 20% energy overhead because its refresh frequency is 4 times higher.

Fig. 9. Normalized system performance of 4xRefresh, modified RAIDR, and ProactiveDRAM, assuming 1% of DRAM rows fail to have normal retention.

Fig. 10. Normalized energy consumption of 4xRefresh, modified RAIDR, and ProactiveDRAM, assuming 1% of DRAM rows fail to have normal retention.

Modified RAIDR reduces the overhead to 8%, but the way it manages extra refreshes is still inefficient. ProactiveDRAM saves a further 7% of energy relative to modified RAIDR, and its energy overhead is only 0.5% above the baseline, which is negligible.

D. Sensitivity to Weak Row Population

In the previous simulations, we assumed that only 1% of LPDDR4 DRAM rows fail to reach the normal retention target. As the process scales, however, this percentage will increase. Figures 11 and 12 show the simulation results when 10% of rows fail to reach the normal retention target.

In this scenario, modified RAIDR loses another 7% of performance, for a total of 15% performance lost to managing the extra refreshes of the 10% weak rows. By comparison, ProactiveDRAM experiences only a 3.7% performance degradation. Note that there is no practical way to repair 10% weak rows using traditional methods such as row redundancy. Therefore, paying a 3.7% performance loss is a viable way to keep DRAM die yield at an acceptable level, since a die containing 10% weak rows can still be shipped as a good die.

As for energy, modified RAIDR consumes 16.5% more, whereas ProactiveDRAM keeps its energy overhead as low as 4.6% on average.

Considering its small performance and energy overhead, ProactiveDRAM manages a DRAM's weak rows efficiently even when their portion is as high as 10%. This makes ProactiveDRAM a useful feature for next-generation memory systems.

V. RELATED WORK

To our knowledge, ProactiveDRAM is the first work to propose low-overhead retention-aware refresh in which the DRAM itself detects weak rows and orchestrates the MC to add supplementary refreshes at row granularity.


Fig. 11. Normalized system performance of 4xRefresh, modified RAIDR, and ProactiveDRAM, assuming 10% of DRAM rows fail to have normal retention.

Fig. 12. Normalized energy consumption of 4xRefresh, modified RAIDR, and ProactiveDRAM, assuming 10% of DRAM rows fail to have normal retention.

RAIDR [14] is so far the closest prior art. However, RAIDR requires the entire memory refresh system to revert to the obsolete ROR scheme, which does not scale to today's gigabit-scale DRAM chips. RAIDR also burdens the MC by making it responsible for DRAM retention profiling.

Other work has focused on reducing DRAM refresh overhead. Mukundan et al. [17] studied the fine-grained refresh in the DDR4 standard and proposed a prioritizing scheme based on the rank staged-refresh schedule. Nair et al. [18] advocated a refresh pausing technique; however, their assumption that there are breakpoints in the middle of a refresh operation does not hold for common modern DRAMs. Other researchers exploited the fact that a DRAM row activation inherently refreshes that row: Song [22] proposed a smart DRAM that remembers the last reference time of each row so that a later refresh can be skipped if the row was recently accessed, and Ghosh and Lee [3] designed a similar scheme implemented as a memory controller feature. On the software side, RAPID [24], Flikker [16], and ESKIMO [7] all differentiate high-retention and low-retention DRAM regions and partition data between them based on software-level knowledge.

VI. CONCLUSION

Continuous DRAM scaling poses challenges for DRAM retention-time management. Process variation and VRT will soon cause a significant portion of DRAM cells to have shorter retention times, so a retention-aware intelligent refresh mechanism is needed. We therefore propose ProactiveDRAM, a novel DRAM system design that lets the DRAM itself detect the need for extra weak-row refreshes and orchestrate the memory controller to send out auxiliary row-activation commands. ProactiveDRAM introduces a DRAM-to-controller backward communication channel to the memory system design. Through this sideband channel, ProactiveDRAM runs retention-aware refresh management at row granularity while still utilizing DRAM's auto-refresh feature. It hides the existence of weak rows with low performance and power overhead, making ProactiveDRAM a valuable add-on feature for next-generation memory systems.

REFERENCES

[1] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu, J. Hestness, D. R. Hower, T. Krishna, S. Sardashti, R. Sen, K. Sewell, M. Shoaib, N. Vaish, M. D. Hill, and D. A. Wood, "The gem5 simulator," SIGARCH Computer Architecture News, vol. 39, no. 2, 2011.

[2] EEMBC, "EEMBC benchmark," http://www.eembc.org/.

[3] M. Ghosh and H.-H. S. Lee, "Smart refresh: An enhanced memory controller design for reducing energy in conventional and 3D die-stacked DRAMs," in MICRO, 2007.

[4] S. K. Goel and T.-C. Huang, "DRAM repair architecture for Wide I/O DRAM based 2.5D/3D system chips," US Patent App. 2013/0044554 A1, 2013.

[5] T. Hamamoto, S. Sugiura, and S. Sawada, "On the retention time distribution of dynamic random access memory (DRAM)," IEEE Transactions on Electron Devices, vol. 45, no. 6, 1998.

[6] HPEC, "HPEC benchmark," http://www.ll.mit.edu/HPECchallenge/.

[7] C. Isen and L. John, "ESKIMO: Energy savings using semantic knowledge of inconsequential memory occupancy for DRAM subsystem," in MICRO, 2009.

[8] JEDEC, "JESD79-3E DDR3," 2009.

[9] JEDEC, "JESD209-2E LPDDR2," 2010.

[10] JEDEC, "JESD209-3 LPDDR3," 2012.

[11] JEDEC, "JESD79-4 DDR4," 2013.

[12] K. Kim and J. Lee, "A new investigation of data retention time in truly nanoscaled DRAMs," IEEE Electron Device Letters, vol. 30, no. 8, Aug. 2009.

[13] Y. Li, H. Schneider, F. Schnabel, R. Thewes, and D. Schmitt-Landsiedel, "DRAM yield analysis and optimization by a statistical design approach," IEEE Transactions on Circuits and Systems I, vol. 58, no. 12, 2011.

[14] J. Liu, B. Jaiyen, R. Veras, and O. Mutlu, "RAIDR: Retention-aware intelligent DRAM refresh," in ISCA, 2012.

[15] J. Liu, B. Jaiyen, Y. Kim, C. Wilkerson, and O. Mutlu, "An experimental study of data retention behavior in modern DRAM devices: Implications for retention time profiling mechanisms," in ISCA, 2013.

[16] S. Liu, K. Pattabiraman, T. Moscibroda, and B. G. Zorn, "Flikker: Saving DRAM refresh-power through critical data partitioning," in ASPLOS, 2011.

[17] J. Mukundan, H. Hunter, K.-h. Kim, J. Stuecheli, and J. F. Martínez, "Understanding and mitigating refresh overheads in high-density DDR4 DRAM systems," in ISCA, 2013.

[18] P. J. Nair, C.-C. Chou, and M. K. Qureshi, "A case for refresh pausing in DRAM memory systems," in HPCA, 2013.

[19] Y.-B. Park, I.-H. Choi, D.-H. Lee, L. Jin, J.-H. Jang, P.-B. Ha, and Y.-H. Kim, "Design of an eFuse OTP memory of 8 bits based on a 0.35 µm BCD process," in ICMIC, 2011.

[20] S. Rixner, W. J. Dally, U. J. Kapasi, P. Mattson, and J. D. Owens, "Memory access scheduling," in ISCA, 2000.

[21] D. Skinner, "LPDDR4 moves mobile," in Mobile Forum, 2013.

[22] S. P. Song, "Method and system for selective DRAM refresh to reduce power consumption," US Patent 6,094,705 A, Jul. 25, 2000.

[23] SPEC CPU, "SPEC CPU2006," http://www.spec.org/cpu2006/.

[24] R. Venkatesan, S. Herr, and E. Rotenberg, "Retention-aware placement in DRAM (RAPID): Software methods for quasi-non-volatile DRAM," in HPCA, 2006.
