PAY-AS-YOU-GO STORAGE-EFFICIENT HARD ERROR CORRECTION
Moinuddin K. Qureshi, ECE, Georgia Tech
Research done while at: IBM T. J. Watson Research Center, New York
MICRO 2011 Dec 6, 2011
PAY-AS-YOU-GO, MICRO-2011
Introduction
PCM is a scalable technology. Device state changed by heating.
Over time, write operations break the heater, and the cell gets stuck
Reported write endurance: 10-100 million writes/cell
With good wear leveling, it is still possible to have 8+ years of lifetime
Not All Cells Are Created Equal
Variability in lifetime due to process variation: weak vs. strong cells
Weak cells fail much earlier and greatly reduce system lifetime
Lifetime is usually modeled as Gaussian with SDEV of 10-30% of mean
We use SDEV=20% of mean
P (5 SDEV from mean) ≈ 10^-6
For a 1GB memory bank, 8K bits fail at time 0, and more fail as we write!
PCM needs significant amount of error correction to handle variability
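As a rough check of the time-0 estimate, the sketch below multiplies the number of cells in a 1GB bank by the tail probability of about 10^-6 quoted on the slide. Reading 1GB as 2^30 bytes with one cell per bit is an assumption, and the constant names are illustrative.

```python
# Back-of-the-envelope check of the "8K bits fail at time 0" estimate.
# Assumptions (not from the slides): 1GB = 2**30 bytes, one cell per bit.
BANK_BYTES = 2**30
BITS_PER_BYTE = 8
P_FAIL_AT_TIME_ZERO = 1e-6  # tail probability quoted on the slide


def expected_failed_bits(bank_bytes: int, p_fail: float) -> float:
    """Expected number of cells that are already dead before any write."""
    return bank_bytes * BITS_PER_BYTE * p_fail


failed = expected_failed_bits(BANK_BYTES, P_FAIL_AT_TIME_ZERO)
print(f"expected failed bits at time 0: {failed:.0f}")
```

Under these assumptions the estimate lands between 8K and 9K bits, in line with the slide's figure.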
Write Efficient Code
Traditional ECC codes are write intensive, which causes more wear
Endurance related (hard) faults identified with checker read
Write-efficient code: Error Correcting Pointers [ISCA’10]
ECP needs 10 bits per entry. Handles multiple faults (needs 1 Full bit)
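[Figure: a 512-bit cache line (bits 0 to 511) with one failed cell marked X, and an ECP entry made of a 9-bit pointer and a 1-bit replacement data cell (D)]
For correcting N errors, ECP needs (10N+1) bits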
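The pointer-plus-replacement-bit idea and the (10N+1) storage formula can be sketched as follows. This is a minimal model under stated assumptions: the class and function names are illustrative and not taken from the ECP paper, and a line is modeled as a plain list of bits.

```python
# Minimal model of Error Correcting Pointers (ECP) for a 512-bit line.
# Names and data layout are illustrative, not taken from the ECP paper.
from dataclasses import dataclass

LINE_BITS = 512
POINTER_BITS = 9                 # enough to address 512 cells
ENTRY_BITS = POINTER_BITS + 1    # pointer + one replacement data bit


@dataclass
class EcpEntry:
    pointer: int      # index of the failed cell (0..511)
    replacement: int  # value the failed cell should have held


def ecp_storage_bits(n_errors: int) -> int:
    """Storage for ECP-N: N entries of 10 bits plus one 'full' bit."""
    return ENTRY_BITS * n_errors + 1


def read_with_ecp(raw_bits: list, entries: list) -> list:
    """Return the line with each failed cell overridden by its replacement bit."""
    corrected = list(raw_bits)
    for e in entries:
        corrected[e.pointer] = e.replacement
    return corrected


# Usage: cell 3 is stuck at 0, but the intended value was 1.
raw = [0] * LINE_BITS
fixed = read_with_ecp(raw, [EcpEntry(pointer=3, replacement=1)])
print(fixed[3], ecp_storage_bits(6))  # prints "1 61"
```

Because a correction only writes the replacement bit of one entry, the scheme avoids rewriting a full codeword on every store.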
Expensive to Correct Many Errors
To get 6+ years lifetime, we need to correct six errors per line
Storage: 61 bits/line (about 12%, or 1GB for every 8GB), which is expensive
Unlike ECC in current DRAM chips, this overhead is not optional
[Figure: baseline system lifetime (years, axis 0 to 7) for NoECP, ECP-1, ECP-2, ECP-3, ECP-4, ECP-5, and ECP-6]
Goal: Reduce storage significantly (3X-6X) while retaining lifetime
Motivation
Uniformly allocating error correction entries is inefficient (by ~20X)
We do not need to pay for error correction of each line upfront
Pay-As-You-Go: Give error correction entries in proportion to errors
Num Writes (Normalized)   No ECP used   Only ECP-1 used   ECP-2 to ECP-6 used   Average ECP Used
50%                       99.02%        0.97%             0.01%                 0.01
95%                       79.63%        18.14%            2.23%                 0.23
100%                      73.24%        22.82%            3.95%                 0.31
Table: Utilization of error correction entries per line
Key insight: Very few lines have large number of errors
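The end-of-life row of the utilization table can be roughly cross-checked. The table does not break down the ECP-2 to ECP-6 bucket, so the sketch below assumes every line in that bucket uses exactly two entries, which gives only a lower bound on the average; the variable names are illustrative.

```python
# Fractions of lines at 100% of normalized writes, from the utilization table.
no_ecp = 0.7324
only_ecp1 = 0.2282
ecp2_to_6 = 0.0395

# Sanity check: the three buckets should cover (almost exactly) all lines.
total = no_ecp + only_ecp1 + ecp2_to_6
assert abs(total - 1.0) < 0.001

# Assumption: every line in the ECP-2..ECP-6 bucket uses exactly 2 entries.
# This yields a lower bound on the average number of entries used per line.
avg_lower_bound = 1 * only_ecp1 + 2 * ecp2_to_6

UNIFORM_ENTRIES = 6  # uniform ECP-6 provisions six entries for every line
ratio = UNIFORM_ENTRIES / avg_lower_bound
print(f"average entries used per line >= {avg_lower_bound:.2f}")
print(f"provisioned / used <= {ratio:.1f}x")
```

The bound is close to the reported average of 0.31 entries per line, and the resulting ratio is consistent with the roughly 20X inefficiency of uniform allocation.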
Outline
• Introduction & Motivation
• PAYG Design
• Results
• Even More Storage Efficiency
• Related Work
• Summary
Naïve Design for PAYG
[Figure: each 64B memory line carries an OFB; the Global Error Correction (GEC) Pool is laid out as sets by ways (number of GEC entries per set), and each GEC entry holds V, TAG, and ECP-N]
Given 73% of lines have no error, why not give ECP-6 only on error?
GEC Pool structure: Set associative vs. Fully associative (impractical)
Three Key Problems
1. Set associative structure is inefficient (by ~8X for 8-way)
2. If we allocate six ECP entries for each GEC entry, most error correction entries still remain unused
3. Given that >25% of lines are likely to have at least one error, the latency impact of GEC is significant
Inefficiency of Set Associative GEC
There are tens to hundreds of thousands of sets, and any set could overflow
How many entries are used before one set overflows? This is a buckets-and-balls problem
An 8-way GEC is only 12% full when one set overflows, so it needs about 8x as many entries
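The buckets-and-balls effect can be explored with a small Monte Carlo sketch. It assumes that faulty lines map to sets uniformly at random, and it uses a much smaller pool (4096 sets) than the slide describes so that it runs quickly; both choices, along with the function name and seed, are illustrative.

```python
# Monte Carlo sketch of the buckets-and-balls argument: place entries into
# uniformly random sets until some set needs more than `ways` entries.
# num_sets is kept small here for speed; it is an illustrative choice.
import random


def fill_until_overflow(num_sets: int, ways: int, rng: random.Random) -> float:
    """Return the fraction of total capacity in use when the first set overflows."""
    counts = [0] * num_sets
    placed = 0
    while True:
        s = rng.randrange(num_sets)
        if counts[s] == ways:        # this set is already full: overflow
            return placed / (num_sets * ways)
        counts[s] += 1
        placed += 1


rng = random.Random(2011)  # fixed seed so the run is repeatable
trials = [fill_until_overflow(num_sets=4096, ways=8, rng=rng) for _ in range(20)]
occupancy = sum(trials) / len(trials)
print(f"mean occupancy at first overflow: {occupancy:.1%}")
```

The exact occupancy depends on the number of sets. With more sets, the first overflow arrives at a lower fill level, which is the direction of the slide's figure for a pool with hundreds of thousands of sets.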
Scalable Structure for GEC Pool
“Hash-Table With Chaining” structure for flexibility & low latency
[Figure: memory lines with OFB; a Set Associative Table (SAT) and a Global Collision Table (GCT) made of GEC entries; labels include PTR, GCT-HEAD, and a GCT set marked "TAKEN BY SOME OTHER SET"]
*PTR is two-way replicated
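A toy software model of the hash-table-with-chaining idea is sketched below. It is an assumption-heavy illustration, not the hardware design: the tag is modeled as the full line address, the SAT index is a simple modulo, GCT sets are claimed in order through a head index, and pointer replication is ignored. All class and method names are hypothetical.

```python
# Toy model of a GEC pool built as a hash table with chaining:
# a Set Associative Table (SAT) whose full sets spill into sets of a
# Global Collision Table (GCT). All names and policies are illustrative.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GecSet:
    entries: list = field(default_factory=list)  # (line_addr, ecp_entry) pairs
    ptr: Optional[int] = None                    # index of the next GCT set in the chain


class GecPool:
    def __init__(self, sat_sets: int, gct_sets: int, ways: int):
        self.sat = [GecSet() for _ in range(sat_sets)]
        self.gct = [GecSet() for _ in range(gct_sets)]
        self.ways = ways
        self.gct_head = 0  # next unclaimed GCT set

    def _chain(self, line_addr: int):
        """Yield the SAT set for this line, then any chained GCT sets."""
        s = self.sat[line_addr % len(self.sat)]
        while True:
            yield s
            if s.ptr is None:
                return
            s = self.gct[s.ptr]

    def insert(self, line_addr: int, ecp_entry) -> bool:
        """Store one entry; return False if the whole pool is exhausted."""
        last = None
        for s in self._chain(line_addr):
            if len(s.entries) < self.ways:
                s.entries.append((line_addr, ecp_entry))
                return True
            last = s
        if self.gct_head == len(self.gct):
            return False                 # no unclaimed GCT set is left
        last.ptr = self.gct_head         # claim a fresh GCT set and link it
        self.gct_head += 1
        self.gct[last.ptr].entries.append((line_addr, ecp_entry))
        return True

    def lookup(self, line_addr: int) -> list:
        """Collect every entry tagged with this line along its chain."""
        return [e for s in self._chain(line_addr)
                for (addr, e) in s.entries if addr == line_addr]


# Usage: with 2 ways, the third entry for line 8 spills into the GCT.
pool = GecPool(sat_sets=4, gct_sets=2, ways=2)
for i in range(3):
    pool.insert(line_addr=8, ecp_entry=("bit", i))
print(pool.lookup(8), pool.sat[0].ptr)
```

Most lookups touch only the SAT set. The chain is walked only for sets that overflowed, so latency stays close to that of a plain set associative table.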
Scalable Structure for GEC Pool
Let's say we want to store N entries
Structure                 Total Entries   Latency
Fully Associative         N               Very High
8-way Set Associative     8*N             1
8-way (SAT+GCT)           1.5*N           1+epsilon
Global Collision Table (GCT) with half as many sets as SAT is sufficient
Proposed GEC structure has latency similar to Set Associative Table while needing 5X fewer entries
Solving Other Two Problems
2. Fine-Grained Allocation for effectively utilizing ECP entries
• Each GEC entry has only ECP-1
• Each line can have multiple GEC entries
• We guarantee that all entries are in the same set of (SAT/GCT)
• A faulty line can get more than ECP-6 as well
3. Local Error Correction (LEC) for low latency in the common case
• Each line has a dedicated ECP-1 (handles 95% of lines)
• Ensures extra accesses (GEC) are needed for only a few lines
PAYG: Tying it All Together
PAYG performs on-demand allocation of error correction entries
PAYG has 3 levels: LEC is the first line of defense (lowers latency), SAT is the second, and GCT is the third (flexible)
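The three-level read path can be sketched as below. This is a simplified illustration under stated assumptions: the function name and arguments are hypothetical, the GEC pool is reduced to a lookup callback, and a GEC lookup is counted as a single extra access even though walking a GCT chain would cost more.

```python
# Sketch of the three-level PAYG read path. Data layout and names are
# hypothetical; the GEC pool is reduced to a lookup callback for brevity.
def correct_line(raw_bits, lec_entry, ofb, gec_lookup, line_addr):
    """Apply LEC first; consult the GEC pool only when the overflow bit is set."""
    bits = list(raw_bits)
    extra_accesses = 0
    if lec_entry is not None:               # level 1: dedicated ECP-1 in the line
        pointer, replacement = lec_entry
        bits[pointer] = replacement
    if ofb:                                 # levels 2 and 3: SAT, then GCT
        extra_accesses += 1                 # counted as one access in this sketch
        for pointer, replacement in gec_lookup(line_addr):
            bits[pointer] = replacement
    return bits, extra_accesses


# Usage: line 7 has two faulty cells; LEC covers one, the GEC pool the other.
gec_pool = {7: [(5, 1)]}
bits, extra = correct_line(
    raw_bits=[0] * 8, lec_entry=(2, 1), ofb=True,
    gec_lookup=lambda addr: gec_pool.get(addr, []), line_addr=7)
print(bits, extra)
```

Lines whose faults fit in LEC never set the overflow bit, so they pay no extra access at all.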
Evaluation Settings
Assumptions:
1. Mean writes 32 million, SDEV=20%, no correlation
2. Perfect wear leveling: all lines get the same number of writes
3. Each write is converted into a write followed by a read to detect faults
Configuration:
• PCM bank of 1GB with 64B lines, so 16 million lines per bank
• Write latency of 1 microsecond
• At 100% write traffic, lifetime is 18 years (if zero variance)
Figure of Merit:
• Uniform ECP-6 gets 35% of ideal lifetime, so 6.5 years
• We report lifetime with respect to Uniform ECP-6
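The zero-variance lifetime in the configuration can be reproduced with simple arithmetic. The sketch assumes the bank services one line write at a time, and it reads "32 million" and "16 million" as binary quantities (2^25 and 2^24); both are assumptions, and the constant names are illustrative.

```python
# Ideal (zero-variance) lifetime of a 1GB bank under the stated configuration.
# Assumptions: writes are serviced one at a time, and "32 million" writes and
# "16 million" lines are read as binary quantities (2**25 and 2**24).
BANK_BYTES = 2**30
LINE_BYTES = 64
MEAN_WRITES_PER_LINE = 32 * 2**20
WRITE_LATENCY_S = 1e-6
SECONDS_PER_YEAR = 365 * 24 * 3600

lines = BANK_BYTES // LINE_BYTES                  # number of lines in the bank
total_writes = lines * MEAN_WRITES_PER_LINE       # line writes the bank can absorb
ideal_years = total_writes * WRITE_LATENCY_S / SECONDS_PER_YEAR
print(f"{lines} lines, ideal lifetime ~ {ideal_years:.1f} years")
```

Under these assumptions the result comes out close to the 18 years quoted for 100% write traffic.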
Importance of Scalable GEC Pool
[Figure: lifetime with respect to ECP-6 (%), y-axis 0 to 110, with labels ECP-1 through ECP-6. NoFGA-NoGCT panel: x-axis Num SAT Sets, 32K to 1024K. NoFGA-wGCT panel: x-axis Num GCT Sets, 2K to 64K, with SAT Sets=128K. Total Sets 128K+64K=192K]
Proposed structure reduces storage overhead of GEC by more than 5X
Importance of Fine-Grained Alloc.
Num ECP Entries in Each GEC Entry    5    4    3    2    1
Num GEC Entry per Set (64B line)     8    9    12   16   24
Total ECP Entries per Set            40   36   36   32   24
[Figure: lifetime normalized to ECP-6 (%), y-axis 100 to 116, versus Num ECP Entries in Each GEC Entry (5, 4, 3, 2, 1)]
Fine-Grained Allocation improves the effectiveness of PAYG
Importance of LEC
We can get higher lifetime by increasing GEC size, but we still need LEC
[Figure: chart with a marker at 5 years]
For the first 5 years, PAYG incurs on average 1 extra access for < 0.4% of accesses
Without LEC, the latency impact is significant. With LEC, not so much
Storage Overhead
LEC Storage 13 bits/line (10 bit ECP + 1 valid + 2 OFB)
GEC Storage 6.5 bits/line on average
Total 19.5 bits/line
Scheme                     Storage Overhead (bits/line)   Lifetime
Uniform ECP-6              61                             1X
Uniform ECP-8              81                             1.13X
PAYG with ECP-1 in LEC     19.5                           1.13X
PAYG provides lifetime similar to ECP-8 at 3.1X less storage than ECP-6
(Total storage overhead to protect 1GB reduces from 122MB to 39MB, down 83MB)
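The storage figures on this slide follow from per-line bit counts. The sketch below redoes the arithmetic, assuming a 1GB bank of 2^24 lines and reporting sizes in units of 2^20 bytes; the function and variable names are illustrative.

```python
# Storage arithmetic for one 1GB bank of 64B lines.
# Assumption: sizes are reported in units of 2**20 bytes.
LINES_PER_BANK = 2**24
MIB = 2**20


def overhead_mib(bits_per_line: float) -> float:
    """Total error correction storage for one bank, in MiB."""
    return bits_per_line * LINES_PER_BANK / 8 / MIB


lec_bits = 10 + 1 + 2        # ECP entry + valid bit + 2 OFB bits
gec_bits = 6.5               # average GEC share per line, from the slide
payg_bits = lec_bits + gec_bits
ecp6_bits = 10 * 6 + 1       # uniform ECP-6: (10N+1) with N=6

savings = round(ecp6_bits / payg_bits, 1)
print(payg_bits, savings)                                    # 19.5 3.1
print(overhead_mib(ecp6_bits), overhead_mib(payg_bits))      # 122.0 39.0
```

The difference between the two totals is the 83MB saving quoted above.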
Efficient Single Bit Correction
LEC is responsible for most of the storage overhead (13 bits out of 19.5 bits)
Need an efficient scheme for single-bit hard faults: Alternate Data Retry (ADR)
ADR: Mask hard fault by storing data in either normal or inverted form
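[Figure: a line with one stuck-at-0 (SA-0) cell, shown with the pattern 110 1 and INV=0, and with the inverted pattern 001 0 and INV=1]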
ADR needs only 1 bit to mask a single stuck-at-fault (caveat: double write)
Reduce storage overhead of PAYG by using ADR instead of ECP-1 in LEC
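A minimal sketch of ADR for a single stuck-at cell is shown below. The cell model is illustrative: the function is told the stuck position and value directly, whereas real hardware would discover the mismatch with a checker read and then retry, which is where the double write comes from. Function names are hypothetical.

```python
# Sketch of Alternate Data Retry (ADR) for one stuck-at cell.
# The cell model (a list of bits plus a known stuck position/value) is illustrative.
def adr_write(data, stuck_pos, stuck_val):
    """Return (stored_bits, inv) so that the stuck cell already holds the right value."""
    inv = 0
    stored = list(data)
    if stored[stuck_pos] != stuck_val:      # normal form conflicts with the fault
        stored = [b ^ 1 for b in data]      # retry with inverted data
        inv = 1
    return stored, inv


def adr_read(stored, inv):
    """Undo the inversion recorded in the INV bit."""
    return [b ^ inv for b in stored]


# Usage: the last cell is stuck at 0 and the data wants a 1 there.
data = [1, 1, 0, 1]
stored, inv = adr_write(data, stuck_pos=3, stuck_val=0)
print(stored, inv, adr_read(stored, inv) == data)  # [0, 0, 1, 0] 1 True
```

One INV bit is enough because a single stuck cell agrees with either the data or its complement. With two stuck cells, neither form may fit.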
Comparisons
Scheme                     Storage Overhead (bits/line)   Lifetime
Uniform ECP-6              61                             1X
Uniform ECP-8              81                             1.13X
PAYG with ECP-1 in LEC     19.5                           1.13X
PAYG with ADR in LEC       9.5                            1.02X
PAYG with heterogeneous error correction reduces storage by 6X
Hard to scale ADR to multiple faults. SAFER [MICRO’10] partitions lines with multiple faults into single bit faults. SAFER needs 55 bits/line and lifetime ~ECP-6
Non Uniform Error Correction
Variable Strength ECC (VS-ECC) by Alameldeen+ ISCA’11
• Proposed for cache reliability at low voltages
• Each way has ECC-4 for one quarter of ways, allocated based on testing
• Difference: Cache line disabling works. Only set associative structure.
Layered ECP by Schechter+ ISCA’10
• ECP-1 for each line, and some ECP entries for each page
• In essence, this is a set-associative GEC with ECP-1 in LEC
• Difference: Set associative GEC requires 5X more entries (inefficient)
Line Sparing with FREE-p by Yoon+ HPCA’11
• A faulty line is remapped to a spare area using an embedded pointer
• Sparing needs 1 good line for 1 uncorrectable fault
• Difference: PAYG is much more storage efficient than sparing
FREE-p: Sparing vs. Correction
For 1 extra error bit, PAYG needs a 20-bit GEC entry, while FREE-p needs 512 bits
PAYG is more effective than line sparing with FREE-p
Summary
PCM: limited endurance, variability across cells reduces lifetime
Need to correct many (six) errors per line
Uniform allocation is expensive and inefficient (only 0.3 out of 6 used)
Pay-As-You-Go (PAYG): Allocate error correction entries on demand
PAYG has LEC + GEC Pool (Set Associative Table + Global Collision Table)
Provides 1.13X lifetime compared to ECP-6 at 3.1X lower overhead
Heterogeneous scheme (ADR for LEC) reduces storage by 6X
PAYG useful for efficient hard-error correction in other technologies too