LPDDR3/4-ECC DRAM for High-reliability
IoT, Automotive and Control System Applications
Wolfgang [email protected]
October 13th, 2015
Copyright Green Mountain Semiconductor Inc. 1/27
Overview
1 Memory Errors and Technology Scaling
2 Growth Markets for Memory
3 ECC for Safety Critical IoT, Automotive and Industrial Applications
4 A flexible ECC solution for LPDDR3/4 DRAM up to 2133Mbps
5 Conclusion
DRAM Faults
Failures in a DRAM can be classified as follows [Constantinescu, 2002].
Hard Faults
Permanent, recurring faults. These faults cause the memory location to persistently return incorrect data.
Intermittent Faults
These faults cause a memory location to occasionally return incorrect data. This may be due to a weak cell and/or more extreme operating conditions (e.g. temperature).
Transient Faults
Also known as soft errors, these faults are unpredictable and are not related to device damage. The memory location can be fixed by rewriting the correct data.
Fault Rates
Some study points on DRAM in a server farm [Sridharan, 2012]:
- There is a 2.95% chance that a 2GB DDR2 DIMM will have a hard fault over an 11-month period.
- Hard faults dominate, accounting for 70% of faults.
- 47% of faults are single-bit.
- 22.5% of faults are from a single column or row.
- These results are from a server farm, which is a controlled environment.
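To put the per-DIMM figure in system terms, the following minimal sketch estimates the chance that at least one DIMM in a system sees a hard fault over the same 11-month period. It assumes faults on different DIMMs are independent, which the slide does not state; the function name and the 8-DIMM example are illustrative only.

```python
def prob_any_hard_fault(p_per_dimm: float, n_dimms: int) -> float:
    """Probability that at least one of n_dimms sees a hard fault.

    Assumption (not from the slides): DIMMs fail independently.
    """
    return 1.0 - (1.0 - p_per_dimm) ** n_dimms


# 2.95% per DIMM over 11 months is the figure quoted from [Sridharan, 2012].
P_DIMM = 0.0295
for n in (1, 8, 64):
    print(f"{n:3d} DIMMs: {prob_any_hard_fault(P_DIMM, n):.1%}")
```

Under this assumption, even a small multi-DIMM system has a noticeably higher chance of seeing a hard fault than the single-DIMM figure suggests.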
Scaling of Hard and Intermittent Faults
VRT (variable retention time) errors increase with scaling [Kang, 2014]
- Cell capacitance decreasing
- Transistor leakage increasing (GIDL, charge trapping)
Circuit degradation increases with scaling [Constantinescu, 2007]
- Increased resistance, permanent or intermittent
- Crosstalk delays
- Ultra-thin oxide breakdown
Apparent hard and intermittent errors may result from system design errors, namely specification violations [Aichinger, 2012]
- Timing specification violations
- Refresh violations
Soft Error Rate Scaling DRAM
[Figure: DRAM soft error rate versus technology scaling. Annotations: effective scaling; vertical integration etc.; 2015; 12Gb]
System-level soft error rate (SER) has remained flat with technology scaling for DRAM, due to the reduction in cell area and to voltage and capacitance not scaling proportionally. Vertical or 2.5D integration will see a linear increase in system-level SER. [Baumann, 2005]
Soft Error Rate Scaling SRAM
[Figure: SRAM soft error rate versus technology scaling. Annotations: effective scaling; vertical integration?; with BPSG; 2015; 10nm]
SRAM soft error rate per bit remains flat with technology scaling due to the reduction in node capacitance and aggressive voltage scaling. System-level SER continues to increase with memory usage. [Baumann, 2005]
Disturb Error Rate Scaling DRAM
[Figure: DRAM disturb error rate versus technology scaling. Annotations: effective scaling; vertical integration (vertical disturb negligible?); possibility to address systematic weakness through system/specification]
Hypothesis, insufficient published data
- Errors demonstrated on 2GB DDR2 DRAM modules [Kim, 2014]
- Scaling risk: cells and word/bit lines move closer together
- 3D integration effect on disturb needs study
- Sensitive outlier cells may be screened (at high test cost)
- Targeted refresh ("PARA") difficult to employ in practice
DRAM Standard Densities
3D Stacking
DRAM is growing vertically
- Vertical stacks mean larger amounts of memory per device
- Repair is typically done after assembly of all layers
- More layers increase redundancy flexibility
- "Effective scaling" does not reduce the DRAM cell size, and therefore does not reduce the soft error rate per cell. This results in a potential increase in the soft error rate.
Markets
- The mobile market is maturing
- Growth is in other sectors
Memory Environments
Server Environment
- Temperature Control
- Replaceable
- Device Orientation Control
Mobile Environment
- No Thermal Control
- Uncontrolled Radiation
- High Physical Stress
- Electromagnetic Noise
IoT System Requirements
IoT Computer Market
Medical Aviation Manufacturing Automotive Energy Consumer Office Server
Data Security/Privacy X X X X X X X
Reliability X X X X X X
Hack/Sabotage Resistance X X X X X X X
Low Maintenance X X X X
Fail-Safe X X X X X X
Early Warning X X X X X X
Environmental (temp, humidity) X X X X X
Environmental (radiation) X X
Documentation X X X X X X X
Cost X X X X
Table: IoT system requirements, addressable by on-die ECC (yellow)
Traditional ECC
[Figure: DRAM without ECC; DRAM with ECC (ninth module contains parity bits)]
- Traditional DRAM ECC is done external to the module
- Typically one extra device per 8 devices is used to hold the parity for ECC
- ECC drives more data lines for the parity bits, increasing power consumption
Embedded System ECC Implementation
An in-package ECC solution is more suitable for compact systems
- Embedded, industrial and mobile computing devices use one or a few multi-chip memory packages, soldered directly to the board or even onto the processor.
- Fixed memory configuration, single board
- PoP and SiP package solutions
- On-die ECC for lower power
- On-die ECC allows retrofit
ECC to Lower Power
Some developers have used ECC to clean up weak cells.
ECC is under consideration for the LPDDR4 spec.
[Figure: cell charge over the refresh window tREFW for a low-leak cell and a high-leak cell; the high-leak cell fails due to leakage]
ECC can be used to push retention times even longer. ECC can correct the tail end of worst-case cells that contribute to retention fails. By letting ECC correct these fails, retention can be pushed out and total power consumption can be reduced.
LPDDR4
- LPDDR4 has a large data prefetch, which makes an ECC design on the device more appealing, since a large data word is more efficient for ECC.
- Data masking poses a problem for ECC by invalidating the ECC solution, so LPDDR4 has introduced a dedicated Masked Write command.
- This allows a read-modify-write operation to recalculate a new syndrome.
- However, correction on a large data word affects performance, and the ECC data still forces a fixed size penalty.
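The efficiency argument for large data words can be made concrete with a short calculation. This is a sketch assuming a Hamming-based SECDED code (the slides do not name the code used for LPDDR4), where the number of check bits r is the smallest value satisfying 2^r >= k + r + 1 for k data bits, plus one extra overall parity bit for double error detection. The function name is illustrative.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits needed by an extended Hamming SECDED code.

    Assumption: Hamming-based SECDED; other codes have different overheads.
    """
    r = 0
    # Smallest r whose syndrome can address every code bit plus "no error".
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1  # one more bit: overall parity for double error detection


for k in (16, 32, 64, 128, 256):
    r = secded_check_bits(k)
    print(f"{k:4d} data bits -> {r:2d} check bits ({100 * r / k:.1f}% overhead)")
```

For 64 data bits this gives 8 check bits, which matches the traditional one-extra-device-per-eight arrangement; wider words need proportionally fewer check bits.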
A flexible ECC solution
[Figure: wide-bus ECC. Labels: memory array (multiples of 16 bits); ECC blocks; data IO; memory array holding data + parity; data + parity; data IO]
Dedicated ECC architecture versus fully configurable solution.
A flexible ECC solution
[Figure: memory array split into general-purpose memory and ECC-protected memory. Labels: ECC blocks; data IO]
User configurable ECC allocation through mode register setting.
ECC Correct/Repair (Scrub) During Refresh
[Figure: array, sense amplifier and refreshed WL (ADDR 0) shown over successive tREFW intervals. Panels: corrected area of WL at first refresh; corrected area of WL at second refresh; corrected area of WL at third refresh]
ECC and Physical Repair During Refresh
During refresh, the address of a fail can be stored for analysis.
- If an address fails multiple times in a row, it is likely to be a hard fail.
- If the failing address is a hard fail, the cell can be repaired either with a register or with a spare element.
- Since refresh increments through all addresses, smarter repairs can be made.
- If two or more bits fail on a bitline, a spare bitline can be used for replacement.
- If two or more bits fail on a wordline, a spare wordline can be used for replacement.
- Overall health of the chip during refresh scrubbing can be stored in a user-available register for chip health monitoring.
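The classification rules above can be sketched in software as follows. This is an illustrative model only, not the device's repair logic: the class name, the streak threshold of 3 consecutive failing refresh passes, and the `(wordline, bitline)` cell addressing are all assumptions made for the example. Only the "two or more bits on a line" rule comes from the slide.

```python
from collections import Counter, defaultdict

HARD_FAIL_STREAK = 3  # assumed: consecutive failing passes before a cell counts as hard
LINE_FAIL_LIMIT = 2   # from the slide: two or more failing bits on one line


class ScrubLog:
    """Tracks ECC fails seen during refresh scrubbing (hypothetical model)."""

    def __init__(self):
        # (wordline, bitline) -> number of consecutive refresh passes that failed
        self.streak = defaultdict(int)

    def record_pass(self, failing_cells):
        failing = set(failing_cells)
        for cell in list(self.streak):
            if cell not in failing:
                del self.streak[cell]  # fail did not repeat: treat as transient
        for cell in failing:
            self.streak[cell] += 1

    def hard_fails(self):
        return {cell for cell, n in self.streak.items() if n >= HARD_FAIL_STREAK}

    def repair_plan(self):
        hard = self.hard_fails()
        per_wl = Counter(wl for wl, _ in hard)
        per_bl = Counter(bl for _, bl in hard)
        bad_wl = sorted(wl for wl, n in per_wl.items() if n >= LINE_FAIL_LIMIT)
        bad_bl = sorted(bl for bl, n in per_bl.items() if n >= LINE_FAIL_LIMIT)
        plan = [("spare_wordline", wl) for wl in bad_wl]
        plan += [("spare_bitline", bl) for bl in bad_bl]
        # Isolated hard fails are repaired individually (register or spare element).
        plan += [("single_cell", cell) for cell in sorted(hard)
                 if cell[0] not in bad_wl and cell[1] not in bad_bl]
        return plan


log = ScrubLog()
log.record_pass([(5, 1), (5, 9), (7, 3)])
log.record_pass([(5, 1), (5, 9), (7, 3), (2, 2)])  # (2, 2) fails only once
log.record_pass([(5, 1), (5, 9), (7, 3)])
print(log.repair_plan())
print("hard fails (health metric):", len(log.hard_fails()))
```

In this run the two repeating fails on wordline 5 are covered by one spare wordline, the isolated repeating fail at (7, 3) gets a single-cell repair, and the one-off fail at (2, 2) is dropped as transient. The count of hard fails stands in for the kind of value a health register might expose.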
Physical Repair during refresh
[Figure: three panels, each showing the array, sense amplifier, wordline and bitlines.
Panel 1: hard failing cell, replace with redundant element or with register.
Panel 2: hard fail along a bitline (replace with spare bitline).
Panel 3: hard fail along a wordline (replace with spare wordline).]
SECDED
[Figure: block diagram. Labels: memory array; ECC blocks; data IO; SEC-DED out; ERR register; self-repair logic; redundancy activation]
- Real-time Single Error Correct, Double Error Detect (SEC-DED) output
- Fail register and self-repair engine
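As background on what a SEC-DED output reports, here is a minimal software sketch of an extended Hamming code, a common textbook SECDED construction. The slides do not specify which code the device uses, so this is an assumption, and the 8-bit word and function names are chosen only to keep the example small.

```python
def encode(data_bits):
    """Extended Hamming encode. Index 0 holds the overall parity bit;
    power-of-two indices hold Hamming parity bits; the rest hold data."""
    k = len(data_bits)
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    n = k + r
    code = [0] * (n + 1)
    it = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two: data position
            code[pos] = next(it)
    for i in range(r):
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p:
                parity ^= code[pos]
        code[p] = parity
    code[0] = sum(code[1:]) % 2  # makes the whole codeword even parity
    return code


def decode(code):
    """Returns (data_bits, status); status is 'ok', 'corrected',
    'double_error' or 'uncorrectable'."""
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos
    overall_odd = sum(code) % 2 == 1
    word = list(code)
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd and syndrome <= n:
        word[syndrome] ^= 1  # single error; syndrome 0 means the overall parity bit
        status = "corrected"
    elif overall_odd:
        status = "uncorrectable"  # odd number (3+) of errors
    else:
        status = "double_error"  # even parity but non-zero syndrome
    data = [word[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return data, status


data = [1, 0, 1, 1, 0, 0, 1, 0]
cw = encode(data)
one_flip = list(cw); one_flip[6] ^= 1
two_flips = list(cw); two_flips[3] ^= 1; two_flips[10] ^= 1
print(decode(cw)[1], decode(one_flip)[1], decode(two_flips)[1])
```

The "corrected" and "double_error" outcomes correspond to the two kinds of event that a fail register and self-repair engine could log.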
Adaptive Dynamic Refresh
Using ECC to Control Refresh Rate
[Figure: refresh rate versus temperature. Curves: typical refresh margin setting; margin; cell distribution of retention fails; ECC-controlled refresh rate]
Increase the refresh rate if there are too many fails and reduce it if there are too few, so that the refresh rate always mimics the cell fail distribution. The system is self-calibrating, with no need for a tightly calibrated temperature sensor.
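The feedback rule can be sketched as a simple controller. This is a hedged illustration rather than the device's algorithm: the function name, the fail-count thresholds, the 1.25x step and the interval limits are all hypothetical values picked for the example.

```python
def next_refresh_interval(interval_ms, corrected_fails,
                          low=2, high=8, step=1.25,
                          min_ms=8.0, max_ms=256.0):
    """One control step: adjust the refresh interval from the number of
    ECC-corrected fails seen in the last refresh window.

    All thresholds and limits are assumed values, not from the slides.
    """
    if corrected_fails > high:
        interval_ms /= step   # too many fails: refresh more often
    elif corrected_fails < low:
        interval_ms *= step   # too few fails: refresh less often, saving power
    return min(max(interval_ms, min_ms), max_ms)


# Example: fail counts rise as the device heats up, then fall as it cools.
interval = 64.0
for fails in (0, 0, 5, 12, 20, 9, 3, 1):
    interval = next_refresh_interval(interval, fails)
    print(f"fails={fails:2d} -> refresh interval {interval:.1f} ms")
```

Because the controller reacts to observed fails rather than to a temperature reading, it tracks the actual retention behavior of the cells, which is the self-calibrating property the slide describes.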
Conclusion
- Device scaling is pushing technologies to the limits, with error rates expected to increase.
- Safety-critical embedded applications introduce reliability challenges not met by traditional ECC solutions.
- Green Mountain's architecture is fully backwards compatible, with no additional latency or speed derating up to 2133Mbps.
- Low circuit area overhead allows for an economical ECC/non-ECC combo chip architecture, lowering product development costs.
- Adaptive Dynamic Refresh enables ultra-low power at monitored, definable quality metrics.
References
Dr. Tilak Agerwala - Data Centric Systems - The Next Paradigm in Computing (2014), Keynote Lecture, ICCP 2014
Christian Constantinescu - Impact of Deep Submicron Technology on Dependability of VLSI Circuits (2002), International Conference on Dependable Systems and Networks (DSN), pp. 205-209
Vilas Sridharan and Dean Liberty - A Field Study of DRAM Errors (2012), International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp. 3-6
Robert C. Baumann - Radiation-Induced Soft Errors in Advanced Semiconductor Technologies (2005), IEEE Transactions on Device and Materials Reliability, Vol. 5, No. 3, pp. 6
Yoongu Kim - Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors (2014), ISCA 2014
Barbara P. Aichinger - Google Study: Could those memory failures be caused by design flaws?, MEMCON 2012
Christian Constantinescu - Impact of Intermittent Faults on Nanocomputing Devices (2007), Workshop on Dependable and Secure Nanocomputing (DSN)
Uksong Kang et al. - Co-Architecting Controllers and DRAM to Enhance DRAM Process Scaling, The Memory Forum 2014