PRACTICAL NONVOLATILE MULTILEVEL-CELL
PHASE CHANGE MEMORY
Jichuan Chang,
Robert S. Schreiber,
Norman P. Jouppi
Hewlett-Packard Labs
Doe Hyun Yoon
IBM T. J. Watson Research Center
MEMORY CAPACITY CHALLENGE IN HPC
• DRAM as main memory
– Scaling is slowing down
• Hard to meet ever-increasing capacity demand
• Byte-addressable nonvolatile memory
– Phase change memory (PCM), memristor, …
– Scales better than DRAM
– Multilevel-cell (MLC) capability
– Nonvolatility
• Checkpoint, in-situ post processing
• High-performance file system
• NV MLC PCM for continued capacity scaling
MAJOR CHALLENGE: RESISTANCE DRIFT
• Conventional 4-Level-Cell (4LC) Designs
– Naïve 4LC is useless
– Optimized 4LC is only barely usable
– Still need refresh -- it’s volatile memory
• Observation: Most errors in 4LC occur in one cell state
• Proposal: 3-Level-Cell (3LC) PCM
– Simple, genuinely nonvolatile (>10 years retention)
– 3-ON-2 and mark-and-spare
• Low-cost wearout tolerance for 3LC
– 1.41 bits/cell (vs. 1.52 in 4LC)
• Only 7% lower capacity than (volatile) 4LC
PCM AND RESISTANCE DRIFT
PHASE CHANGE MEMORY
• Best of DRAM and Flash
– Higher capacity, better scaling (vs. DRAM)
– Faster, byte-addressable NVM (vs. Flash)
• MLC (Multilevel-Cell) capability
– Store more than 1 bit per cell
• Ex) 2 bits per cell
• Caveats:
– Slow, low-bandwidth write (common problem in both SLC and MLC)
– Finite write endurance (common problem in both SLC and MLC)
– Resistance drift
RESISTANCE DRIFT
• PCM cell resistance increases over time
– $R(t) = R_0 \left( t / t_0 \right)^{\alpha}$
– R(t): cell resistance at time t (t > 0)
• A cell is programmed at t = 0
• Sensed as R0 at time t0 (> 0)
• α: drift rate (0 < α < 1)
• Drift errors
– Negligible in SLC PCM
– Major reliability problem in MLC PCM
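The power-law drift model can be evaluated directly. The sketch below is only an illustration of that formula: the function names and the default drift rate of 0.05 are hypothetical choices, not values from the slides.

```python
import math

def drifted_resistance(r0, t, t0=1.0, alpha=0.05):
    """R(t) = R0 * (t / t0) ** alpha, valid for t > 0 and t0 > 0.

    alpha=0.05 is a placeholder drift rate, not a measured value.
    """
    if t <= 0 or t0 <= 0:
        raise ValueError("t and t0 must be positive")
    return r0 * (t / t0) ** alpha

def log10_shift(t, t0=1.0, alpha=0.05):
    """Upward shift of log10(R) after drifting from t0 to t."""
    return alpha * math.log10(t / t0)

# Example: a cell sensed at 10 kOhm one second after programming,
# re-read after 10**6 seconds with an assumed drift rate of 0.1.
r_later = drifted_resistance(1e4, 1e6, t0=1.0, alpha=0.1)
```

On a log10 R axis the drift is a shift of α·log10(t/t0), which is why a larger drift rate in the higher-resistance states matters so much for MLC.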
DRIFT ERRORS IN 4LC PCM
• 4 cell states: S1, S2, S3, S4
– PDF is truncated Gaussian
• ±2.75σ around mean values
• Mean resistance values: μ1, μ2, μ3, μ4
– Thresholds between states: τ1, τ2, τ3
• Drift rate (α) increases with cell resistance
[Figure: pdf of log10 R (axis from 2.5 to 6.5) for states S1 to S4, with means μ1 to μ4 and thresholds τ1 to τ3 marked]
DRIFT ERROR RATES
• Monte-Carlo simulation
• Errors only in S2 and S3
[Figure: fraction of cells with an error (1E-10 to 1E+00) vs. time (2 s to 34,865 years), with curves for S2 and S3]
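A Monte-Carlo estimate of this kind can be sketched as follows. This is not the authors' simulator: the state mean, spread, threshold, and drift-rate distribution passed in the example are hypothetical placeholders, and only the truncation at ±2.75 standard deviations and the power-law drift come from the slides.

```python
import math
import random

def sample_truncated_gauss(rng, mean, sigma, k=2.75):
    """Rejection-sample a Gaussian truncated to mean +/- k*sigma."""
    while True:
        x = rng.gauss(mean, sigma)
        if abs(x - mean) <= k * sigma:
            return x

def drift_error_fraction(mean_log_r, sigma, threshold,
                         alpha_mean, alpha_sigma, times,
                         t0=1.0, n_cells=20000, seed=0):
    """Fraction of cells of one state whose log10(R) has drifted
    above the upper threshold at each time in `times`."""
    rng = random.Random(seed)
    # Each cell gets an initial log10(R0) and its own drift rate.
    # Negative sampled drift rates are clipped to 0 (assumption).
    cells = [(sample_truncated_gauss(rng, mean_log_r, sigma),
              max(0.0, rng.gauss(alpha_mean, alpha_sigma)))
             for _ in range(n_cells)]
    fractions = []
    for t in times:
        shift = math.log10(t / t0)
        errors = sum(1 for log_r0, alpha in cells
                     if log_r0 + alpha * shift > threshold)
        fractions.append(errors / n_cells)
    return fractions

# Hypothetical parameters for one intermediate state.
fractions = drift_error_fraction(
    mean_log_r=5.0, sigma=1 / 6, threshold=5.5,
    alpha_mean=0.06, alpha_sigma=0.024,
    times=[1.0, 1e3, 1e6, 1e9])
```

Resolving error rates as low as 1E-10 this way needs far more samples than the 20,000 used here, so the sketch only shows the mechanics.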
REFRESH
• Refresh before cells lose their data
– Consumes already limited PCM write BW
– Too frequent refresh will make PCM unavailable to users
• What PCM refresh interval is acceptable?
– At least 50% of write BW should be available to users
– Refresh interval >17 minutes
• Caveat: PCM w/ refresh is no longer nonvolatile
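The bandwidth argument can be written as a one-line bound. The sketch assumes that a refresh rewrites the whole device once per interval; the function name and the example numbers are hypothetical, and the 17-minute figure on the slide comes from the authors' own device parameters, which are not given here.

```python
def min_refresh_interval_s(capacity_bytes, write_bw_bytes_per_s,
                           refresh_share=0.5):
    """Shortest refresh interval (seconds) at which rewriting the whole
    device consumes at most `refresh_share` of the write bandwidth."""
    if not 0 < refresh_share <= 1:
        raise ValueError("refresh_share must be in (0, 1]")
    if capacity_bytes <= 0 or write_bw_bytes_per_s <= 0:
        raise ValueError("capacity and bandwidth must be positive")
    return capacity_bytes / (refresh_share * write_bw_bytes_per_s)

# Toy numbers only: a 1000-byte device written at 10 bytes/s can give
# half of its bandwidth to refresh if it refreshes every 200 s or slower.
interval = min_refresh_interval_s(1000, 10, refresh_share=0.5)
```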
CELL ERROR RATE
• What cell error rate is tolerable?
– Goal: 10-year device MTBF
• Fewer than 1 erroneous 64B block in a 16GB device for 10 years
– CER >1e-2
• Impossible to achieve the goal even with unrealistically strong ECC
– CER ~1e-3 @ 17min refresh
• Barely meets the goal with BCH-10
• More analysis in the paper
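One way to relate a cell error rate to the block-level goal is a binomial tail. The sketch below assumes independent cell errors and that each erroneous cell costs the code one correctable symbol. Both are simplifications, and the paper's own analysis may use a different model.

```python
from math import comb

def block_failure_prob(n_cells, cer, t):
    """P(more than t of n_cells are in error), with cells failing
    independently with probability `cer`. The tail is summed directly
    so that very small probabilities are not lost to rounding."""
    return sum(comb(n_cells, k) * cer**k * (1.0 - cer)**(n_cells - k)
               for k in range(t + 1, n_cells + 1))

def expected_bad_blocks(n_blocks, n_intervals, p_block):
    """Expected number of uncorrectable blocks over all refresh intervals."""
    return n_blocks * n_intervals * p_block

# Toy check: 2 cells, CER 0.5, no correction -> 1 - 0.5**2 = 0.75.
p_toy = block_failure_prob(2, 0.5, 0)
```

To test the "fewer than 1 erroneous block" goal, `n_blocks` would be the number of 64B blocks in the device and `n_intervals` the number of refresh intervals in 10 years.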
BASELINE 4LC PCM
NAÏVE DESIGN: 4LCN
• Equal probability for all 4 states
• 17min refresh caps CER at ~1e-2
[Figure: fraction of cells with an error (1E-10 to 1E+00) vs. refresh interval (2 s to 34,865 years) for 4LCn. Annotations: refresh interval > 17 min; CER ~1e-2, unacceptable]
OPTIMAL STATE MAPPING
• Drift only increases cell resistance
• Optimize μ2, μ3, τ1, τ2, τ3 to minimize CER
– minimize CER(μ2, μ3, τ1, τ2, τ3)
– subject to $\mu_i + 2.75\sigma + \delta < \tau_i < \mu_{i+1} - 2.75\sigma - \delta$
– for i = 1, 2, 3
– μi: mean of state Si; τi: threshold between Si and Si+1; σ: standard deviation; δ: minimum spacing
[Figure: pdf of cell resistance vs. log10 R (2.5 to 6.5) for S1 to S4 under the simple and the optimal mapping, with thresholds, means, and the minimum spacing marked]
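The constraint of this optimization can be checked mechanically before any search over candidate mappings. The sketch below only tests feasibility of a given set of state means and thresholds. The function name and example values are hypothetical, and the CER objective itself is not modeled.

```python
def mapping_is_feasible(means, thresholds, sigma, delta, k=2.75):
    """Return True if every threshold lies strictly between the upper
    tail of the state below it and the lower tail of the state above it,
    leaving at least `delta` of spacing on both sides."""
    if len(thresholds) != len(means) - 1:
        raise ValueError("need exactly one threshold between adjacent states")
    for i, tau in enumerate(thresholds):
        lower = means[i] + k * sigma + delta
        upper = means[i + 1] - k * sigma - delta
        if not (lower < tau < upper):
            return False
    return True

# Hypothetical evenly spaced mapping on the log10 R axis.
ok = mapping_is_feasible([3, 4, 5, 6], [3.5, 4.5, 5.5],
                         sigma=1 / 6, delta=0.01)
```

An optimizer would move the two middle means and the three thresholds within this feasible region to minimize the simulated CER.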
OPTIMAL STATE MAPPING: 4LCO
• CER ~1e-3 @ 17-min refresh
• With BCH-10, it meets the goal
[Figure: fraction of cells with an error (1E-10 to 1E+00) vs. refresh interval (2 s to 34,865 years) for 4LCn and 4LCo. Annotations: refresh interval > 17 min; 4LCo reaches CER ~1e-3, barely usable with 10-bit correcting ECC]
PROPOSAL: 3LC PCM
• Observation:
– Most errors occur in one state (S3)
• DO NOT USE IT
– Wide margin for S2
• Simple and optimal mapping (3LCn & 3LCo)
[Figure: two pdfs of cell resistance vs. log10 R (2.5 to 6.5). The first highlights S3; the second shows the 3LC states S1, S2, and S4 with two thresholds, a wide margin, and the simple and optimal mappings]
3LC DESIGNS (3LCN AND 3LCO)
• Reliable for >10 years w/o ECC & refresh
• Genuinely nonvolatile
[Figure: fraction of cells with an error (1E-10 to 1E+00) vs. refresh interval (2 s to 34,865 years) for 4LCn, 4LCo, 3LCn, and 3LCo. Annotation: CER ~1e-9 at 16 years; no ECC, no refresh]
3LC PCM DESIGN ISSUES
• How to store information?
– Binary information in ternary cells
• What about wearout failures?
• How to compensate for the reduced cell density?
– 3LC’s ideal capacity is 1.58 bits/cell ($\log_2 3$)
– vs. 2 bits/cell in 4LC
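The gap between the ideal 1.58 bits/cell and what a practical binary encoding achieves depends on how many cells are grouped together. The sketch below counts the whole bits that fit in a group of n cells with a given number of levels; the function names are illustrative only.

```python
import math

def bits_in_group(levels, n_cells):
    """Largest b such that 2**b <= levels**n_cells, i.e. the number of
    whole bits that a group of n_cells cells can represent."""
    return (levels ** n_cells).bit_length() - 1

def density(levels, n_cells):
    """Achieved bits per cell for that grouping."""
    return bits_in_group(levels, n_cells) / n_cells

ideal_3lc = math.log2(3)        # about 1.585 bits/cell
pair_density = density(3, 2)    # two ternary cells hold 3 bits
```

Grouping two ternary cells already reaches 1.5 bits/cell, while a single ternary cell can hold only 1 whole bit.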
HOW TO STORE BINARY INFO IN TERNARY CELLS?
• 3-ON-2
– Store three bits in two ternary cells
– 64B (512-bit) data block in 342 cells
• 9 states in 2 ternary cells
• 8 states for 3-bit data
• INVALID state
– (S4, S4)
– Use this for tolerating wearout failures

First cell   Second cell   3-bit data
S1           S1            000
S1           S2            001
S1           S4            010
S2           S1            011
S2           S2            100
S2           S4            101
S4           S1            110
S4           S2            111
S4           S4            INVALID
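The 3-ON-2 table translates directly into a lookup-based encoder and decoder. The sketch below follows the table's ordering. The zero padding of the last group is an assumption, since the slides do not say how the 513th bit position is filled.

```python
from itertools import product

STATES = ("S1", "S2", "S4")
# The 9 ordered pairs, in the same order as the table; the last one,
# (S4, S4), is the INVALID state and carries no data.
PAIRS = list(product(STATES, repeat=2))
INVALID = PAIRS[-1]
ENCODE = {format(i, "03b"): pair for i, pair in enumerate(PAIRS[:8])}
DECODE = {pair: bits for bits, pair in ENCODE.items()}

def encode_3on2(bits):
    """Map a bit string to a flat list of ternary cell states.
    The tail is zero-padded to a multiple of 3 bits (assumption)."""
    padded = bits + "0" * (-len(bits) % 3)
    cells = []
    for i in range(0, len(padded), 3):
        cells.extend(ENCODE[padded[i:i + 3]])
    return cells

def decode_3on2(cells, n_bits):
    """Inverse of encode_3on2; raises on an INVALID pair."""
    out = []
    for i in range(0, len(cells), 2):
        pair = (cells[i], cells[i + 1])
        if pair == INVALID:
            raise ValueError("INVALID pair at cell index %d" % i)
        out.append(DECODE[pair])
    return "".join(out)[:n_bits]

block = "10" * 256            # a 512-bit (64B) data block
cells = encode_3on2(block)    # 171 pairs, i.e. 342 cells
```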
TOLERATING WEAROUT FAILURES IN 3LC
• PCM has only finite write endurance
– ~10^8 writes per cell
• Mark-and-spare
– A low-cost wearout failure tolerance mechanism for 3LC
– Use 3LC’s INVALID state for marking a cell pair with a failure
– No need to store failed-cell location
– 2 spare cells per failure
• cf. ECP [Schechter+ ISCA’10]
– Needs a pointer and a spare cell for a failure
– 5 cells per failure with 512-bit data block and 4LC
MARK-AND-SPARE EXAMPLE
• Use INVALID (S4, S4) to mark a cell pair w/ failure
– A stuck-at cell can be revived by applying reverse current [Goux+ IEEE TED’09]
• Need a spare pair for tolerating a failure
[Figure: data cell pairs D0 to D7 followed by spare pairs S0 and S1; each pair of ternary cells stores 3 bits, and one pair with a wearout failure is marked]
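The marking idea can be sketched at the level of cell pairs. The code below assumes that data pairs are written in order, that each failed pair is marked INVALID and skipped, and that the reader drops INVALID pairs. The exact shifting scheme in the authors' design may differ, and all names here are hypothetical.

```python
INVALID = ("S4", "S4")
FILLER = ("S1", "S1")  # arbitrary content for unused spare pairs (assumption)

def write_pairs(data_pairs, failed_slots, n_slots):
    """Lay data pairs out over n_slots physical pairs, marking failed
    slots as INVALID and shifting the remaining data past them."""
    if INVALID in data_pairs:
        raise ValueError("INVALID is reserved and cannot carry data")
    slots, pending = [], list(data_pairs)
    for pos in range(n_slots):
        if pos in failed_slots:
            slots.append(INVALID)
        elif pending:
            slots.append(pending.pop(0))
        else:
            slots.append(FILLER)
    if pending:
        raise ValueError("more failures than spare pairs")
    return slots

def read_pairs(slots, n_data):
    """Skip INVALID pairs; the first n_data valid pairs are the data.
    No failure location needs to be stored."""
    return [p for p in slots if p != INVALID][:n_data]

# 8 data pairs and 2 spare pairs, with a wearout failure in slot 2.
data = [("S1", "S1"), ("S1", "S2"), ("S2", "S4"), ("S4", "S1"),
        ("S2", "S2"), ("S1", "S4"), ("S4", "S2"), ("S2", "S1")]
slots = write_pairs(data, failed_slots={2}, n_slots=10)
```

Each tolerated failure consumes one spare pair, which matches the 2 spare cells per failure stated for mark-and-spare.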
HOW TO CORRECT WEAROUT FAILURES?
[Figure: bar chart of # cells per block (axis 0 to 400) for 3LC and 4LC; values shown: 3LC 342 and 10; 4LC 256, 31, and 50]
CAPACITY: 3LC VS. 4LC
• 64B (512-bit) block
• 3LC needs fewer bits than 4LC for error correction
– 6 wearout failures: mark-and-spare (2 cells/failure) vs. ECP (5 cells/failure)
– Drift errors: BCH-1 vs. BCH-10
• 3LC: 1.41 bits/cell, 4LC: 1.52 bits/cell (7% difference)
• Besides, 3LC is nonvolatile

# cells   Data   Wearout failure correction   Drift error correction
3LC       342    12                           10
4LC       256    31                           50
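The bits/cell figures follow from the per-block cell counts in the chart (3LC: 342 data, 12 wearout, 10 drift; 4LC: 256 data, 31 wearout, 50 drift). The short sketch below reproduces that arithmetic; the dictionary layout and names are illustrative only.

```python
BLOCK_BITS = 512  # one 64B data block

CELLS = {
    "3LC": {"data": 342, "wearout": 12, "drift": 10},
    "4LC": {"data": 256, "wearout": 31, "drift": 50},
}

def bits_per_cell(design):
    """Data bits stored per physical cell, counting correction cells."""
    return BLOCK_BITS / sum(CELLS[design].values())

density_3lc = bits_per_cell("3LC")   # 512 / 364
density_4lc = bits_per_cell("4LC")   # 512 / 337
capacity_loss = 1 - density_3lc / density_4lc
```

Rounded to two decimals these give 1.41 and 1.52 bits/cell, a capacity loss of about 7%.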
CAPACITY VS. # WEAROUT FAILURES
• MLC has worse endurance than SLC
• May need to tolerate more than 6 wearout failures
[Figure: bits/cell (0.0 to 1.8) vs. # wearout failures tolerated (0 to 20) for 4LC and 3LC]
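How the two densities change with the number of tolerated failures can be sketched with a linear cost model. The model assumes 2 cells per failure for 3LC and 5 cells per failure for 4LC, as stated for mark-and-spare and ECP, on top of the data and drift-correction cells. It is a simplification, so the plotted curves remain the reference.

```python
BLOCK_BITS = 512

def bits_per_cell_3lc(failures):
    """342 data cells + 10 drift-correction cells + 2 cells per failure."""
    return BLOCK_BITS / (342 + 10 + 2 * failures)

def bits_per_cell_4lc(failures):
    """256 data cells + 50 drift-correction cells + 5 cells per failure
    (linear approximation of the ECP cost)."""
    return BLOCK_BITS / (256 + 50 + 5 * failures)

def first_crossover(max_failures=20):
    """Smallest failure count at which 3LC stores more bits per cell
    than 4LC under this model, or None if it never does."""
    for f in range(max_failures + 1):
        if bits_per_cell_3lc(f) > bits_per_cell_4lc(f):
            return f
    return None

curve = [(f, bits_per_cell_3lc(f), bits_per_cell_4lc(f))
         for f in range(21)]
```

Because 3LC pays fewer cells per failure, its density falls more slowly, and under these assumptions `first_crossover()` returns 16.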
COMPARISON TO TRI-LEVEL-CELL PCM
• Recent work on MLC drift errors [ISCA’13]
– Same observation
• Most errors occur in the S3 state
– Same solution
• Use 3 levels instead of 4 levels
• TLC paper does not address
– Wearout failures
– Optimal resistance/threshold mapping
• Baseline 4LC is overly pessimistic (not usable at all)
• Unique feature in TLC paper
– Bandwidth-Enhanced writes
MLC PCM FOR CONTINUED CAPACITY SCALING
• Major challenge: resistance drift
• Conventional 4LC PCM is not practical
– Strong ECC and frequent refresh:
• Performance/power penalty
• Loses nonvolatility
• Proposal: 3LC PCM
– Simple, genuinely nonvolatile
– 3-ON-2 & Mark-and-spare
• Low-cost wearout tolerance mechanism for 3LC
– Only 7% lower capacity than (volatile) 4LC
• Generalized non-power-of-two level cells
– 5LC, 6LC, …