LPDDR3/4-ECC DRAM for High-reliability IoT, Automotive and Control System Applications

Wolfgang Hokenmaier [email protected]

October 13th, 2015

Copyright Green Mountain Semiconductor Inc.

Overview

1 Memory Errors and Technology Scaling

2 Growth Markets for Memory

3 ECC for Safety Critical IoT, Automotive and Industrial Applications

4 A flexible ECC solution for LPDDR3/4 DRAM up to 2133Mbps

5 Conclusion


DRAM Faults

Failures in a DRAM can be classified as follows [Constantinescu, 2002].

Hard Faults

Permanent, recurring faults. These faults cause the memory location to persistently return incorrect data.

Intermittent Faults

These faults cause a memory location to occasionally return incorrect data. This may be due to a weak cell and/or more extreme operating conditions (for example, temperature).

Transient Faults

Also known as soft errors, these faults are unpredictable and are not related to device damage. The memory location can be fixed by re-writing the correct data.


Fault Rates

Some study points on DRAM in a server farm [Sridharan, 2012].

- There is a 2.95% chance a 2GB DDR2 DIMM will have a hard fault over an 11-month period.

- Hard faults dominate, accounting for 70% of faults

- 47% of faults are single bit

- 22.5% of faults are from a single column or row

- These results are from a server farm, which is a controlled environment
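
To put the per-DIMM figure in perspective, the short calculation below scales it to a population of DIMMs. It is only a sketch: it assumes faults on different DIMMs are independent, which the study does not state, and the function name and fleet sizes are illustrative.

```python
def prob_at_least_one_fault(p_per_dimm: float, n_dimms: int) -> float:
    """Probability that at least one of n independent DIMMs develops a hard fault."""
    return 1.0 - (1.0 - p_per_dimm) ** n_dimms


# Per 2GB DDR2 DIMM over 11 months [Sridharan, 2012]
P_HARD_FAULT = 0.0295

# Fleet sizes are illustrative, not taken from the study.
for n in (1, 8, 32, 128):
    print(f"{n:4d} DIMMs: {prob_at_least_one_fault(P_HARD_FAULT, n):.1%}")
```

Even a modest number of devices makes a hard fault somewhere in the system likely, which matters for embedded products that cannot be serviced like a server.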


Scaling of Hard and Intermittent Faults

VRT errors increase with scaling [Kang, 2014]

- Cell capacitance decreasing

- Transistor leakage increasing (GIDL, charge trapping)

Circuit degradation increases with scaling [Constantinescu, 2007]

- Increased resistance, permanent or intermittent

- Crosstalk delays

- Ultra-thin oxide breakdown

Apparent hard and intermittent errors may result from system design errors, namely specification violations [Aichinger, 2012]

- Timing specification violations

- Refresh violations


Soft Error Rate Scaling DRAM

Figure: DRAM soft error rate scaling, annotated "Effective Scaling", "Vertical Integration etc." and "2015, 12Gb".

System-level soft error rate (SER) has remained flat with technology scaling for DRAM, due to the reduction in cell area and because voltage and capacitance have not scaled proportionally. Vertical or 2.5D integration will see a linear increase in system-level SER [Baumann, 2005].


Soft Error Rate Scaling SRAM

Figure: SRAM soft error rate scaling, annotated "Effective Scaling", "Vertical Integration?", "with BPSG" and "2015, 10nm".

SRAM soft error rate per bit remains flat with technology scaling due to the reduction in node capacitance and aggressive voltage scaling. System-level SER continues to increase with memory usage [Baumann, 2005].


Disturb Error Rate Scaling DRAM

Figure: DRAM disturb error rate scaling, annotated "Effective Scaling", "Vertical Integration", "Vertical disturb negligible?" and "Possibility to address systematic weakness through system/specification".

Hypothesis, insufficient published data

- Errors demonstrated on 2GB DDR2 DRAM modules [Kim, 2014]

- Scaling risk: cells and word/bit lines move closer together

- 3D integration effect on disturb needs study

- Sensitive outlier cells may be screened (at high test cost)

- Targeted refresh ("PARA") difficult to employ in practice


DRAM Standard Densities


3D Stacking

DRAM is growing vertically

- Vertical stacks mean larger amounts of memory per device

- Repair is typically done after assembly of all layers

- More layers increase redundancy flexibility

- 'Effective scaling' does not reduce the DRAM cell size, and therefore does not reduce the soft error rate per cell. This results in a potential increase in the soft error rate
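
The last point can be made concrete with a one-line model: if the per-cell soft error rate stays fixed, the device-level rate grows in proportion to the number of stacked layers. The sketch below assumes every bit contributes the same rate; the per-bit rate and die density are hypothetical placeholders, not measured values.

```python
def system_ser_fit(ser_per_bit_fit: float, bits_per_layer: int, layers: int) -> float:
    """Device-level soft error rate, assuming every bit contributes the same rate."""
    return ser_per_bit_fit * bits_per_layer * layers


SER_PER_BIT_FIT = 1e-9       # hypothetical placeholder, not a measured value
BITS_PER_LAYER = 8 * 2**30   # hypothetical 8Gb die

for layers in (1, 2, 4, 8):
    rate = system_ser_fit(SER_PER_BIT_FIT, BITS_PER_LAYER, layers)
    print(f"{layers} layer(s): {rate:.2f} FIT")
```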


Markets

- Mobile market is maturing

- Growth is in other sectors


Memory Environments

Server Environment

- Temperature control

- Replaceable

- Device orientation control

Mobile Environment

- No thermal control

- Uncontrolled radiation

- High physical stress

- Electromagnetic noise


IoT System Requirements

IoT Computer Market

Medical Aviation Manufacturing Automotive Energy Consumer Office Server

Data Security/Privacy X X X X X X X

Reliability X X X X X X

Hack/Sabotage Resistance X X X X X X X

Low Maintenance X X X X

Fail-Safe X X X X X X

Early Warning X X X X X X

Environmental (temp, humidity) X X X X X

Environmental (radiation) X X

Documentation X X X X X X X

Cost X X X X

Table: IoT system requirements, addressable by on-die ECC (yellow)


Traditional ECC

DRAM without ECC

DRAM with ECC (ninth module contains parity bits)

- Traditional DRAM ECC is done external to the module

- Typically an extra device per 8 devices is used to include parity for ECC

- ECC drives more data lines for the parity bits, increasing power consumption
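
The "one extra device per eight" ratio follows from the number of check bits a single-error-correct, double-error-detect (SEC-DED) Hamming code needs. The sketch below computes that count from the Hamming bound; the function name is illustrative, and the list of word sizes is an arbitrary selection.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits for an extended Hamming (SEC-DED) code over data_bits."""
    r = 1
    # Hamming bound for single-error correction: 2**r >= data_bits + r + 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1  # one extra overall parity bit adds double-error detection


for k in (8, 16, 32, 64, 128, 256):
    c = secded_check_bits(k)
    print(f"{k:3d} data bits -> {c:2d} check bits ({c / k:.1%} overhead)")
```

A 64-bit word needs 8 check bits, which is what the ninth x8 device supplies. The relative overhead falls as the protected word gets wider.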


Embedded System ECC Implementation

In-package ECC solution more suitable for compact systems

- Embedded, industrial and mobile computing devices use one or a few multi-chip memory packages, soldered directly to the board or even onto the processor.

- Fixed memory configuration, single board

- PoP and SiP package solutions

- On-die ECC for lower power

- On-die ECC allows retrofit


ECC to Lower Power

Some developers have used ECC to clean up weak cells.

ECC is under consideration for LPDDR4 spec

Figure: cell charge over the refresh window (tREFW) for a low leak cell and a high leak cell, marking cell failure due to leakage.

ECC can be used to push retention times even longer. ECC can correct the tail end of worst-case cells that contribute to retention fails. By letting ECC correct these fails, retention can be pushed out and total power consumption can be reduced.
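
A rough way to see why correcting the retention tail is worthwhile: with single-error correction, a word fails only when two or more of its cells fail together. The sketch below compares word failure probability with and without correction. It assumes independent cell failures and a 64-bit word stored as a 72-bit codeword; the per-cell probabilities are hypothetical and the function names are illustrative.

```python
from math import comb, expm1, log1p


def p_word_fail_no_ecc(p_cell: float, data_bits: int = 64) -> float:
    """Probability that at least one cell of an unprotected word fails."""
    return -expm1(data_bits * log1p(-p_cell))


def p_word_fail_sec(p_cell: float, code_bits: int = 72) -> float:
    """Probability that a codeword holds two or more failing cells,
    which is beyond what single-error correction can repair."""
    return sum(
        comb(code_bits, k) * p_cell ** k * (1 - p_cell) ** (code_bits - k)
        for k in range(2, code_bits + 1)
    )


# Hypothetical per-cell retention-fail probabilities at a relaxed refresh interval
for p in (1e-9, 1e-7, 1e-5):
    print(f"p_cell={p:.0e}  no ECC: {p_word_fail_no_ecc(p):.3e}  "
          f"with SEC: {p_word_fail_sec(p):.3e}")
```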


LPDDR4

- LPDDR4 has a large data prefetch, which makes an ECC design more appealing, since ECC is more efficient on a large data word.

- Data masking poses a problem for ECC, because a partial write invalidates the stored ECC bits, so LPDDR4 has introduced a dedicated Masked Write command

- This allows for a read-modify-write operation to recalculate a new syndrome

- However, correction on a large data word affects performance, and the ECC data still forces a fixed size penalty.
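
The read-modify-write sequence behind a masked write can be sketched as follows. This is an illustration rather than the LPDDR4 command protocol: the 128-bit word size, the `EccWord` class and the helper names are assumptions, and the check bits are plain Hamming check bits (the XOR of the codeword positions of all set data bits).

```python
def data_positions(n_data_bits):
    """1-based codeword positions that hold data: every non-power-of-two position."""
    positions, pos = [], 1
    while len(positions) < n_data_bits:
        if pos & (pos - 1):          # non-zero means pos is not a power of two
            positions.append(pos)
        pos += 1
    return positions


POSITIONS = data_positions(128)      # assumed 128-bit ECC word


def hamming_check_bits(data) -> int:
    """XOR of the codeword positions of all set data bits."""
    check = 0
    for i, pos in enumerate(POSITIONS):
        if (data[i // 8] >> (i % 8)) & 1:
            check ^= pos
    return check


class EccWord:
    """One 16-byte array word stored together with its check bits."""

    def __init__(self):
        self.data = bytearray(16)
        self.check = hamming_check_bits(self.data)

    def masked_write(self, new_bytes, mask):
        # Read: fetch the full word (a real device would also correct it here).
        merged = bytearray(self.data)
        # Modify: replace only the bytes whose mask flag is False.
        for i, masked in enumerate(mask):
            if not masked:
                merged[i] = new_bytes[i]
        # Write: store the merged word with check bits recomputed over all of it.
        self.data = merged
        self.check = hamming_check_bits(merged)


word = EccWord()
word.masked_write(bytes(range(16)), [i % 2 == 0 for i in range(16)])
print(word.data.hex(), hex(word.check))
```

Writing only the unmasked bytes without the read step would leave check bits that no longer describe the stored word, which is why the masked write has to touch the whole ECC word.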


A flexible ECC solution

Figure: Wide Bus ECC, with ECC blocks between the memory array (multiples of 16 bits) and the data IO, shown alongside a memory array that stores data plus parity and passes data plus parity to the data IO.

Dedicated ECC architecture versus fully configurable solution.


A flexible ECC solution

Figure: memory array divided into general purpose memory and ECC protected memory, with ECC blocks ahead of the data IO.

User configurable ECC allocation through mode register setting.


ECC Correct/Repair (Scrub) During Refresh

Figure: array and sense amplifier shown over successive refresh windows (tREFW) at ADDR 0. Each refresh of the wordline (WL) corrects a further area of it: corrected area of WL at the first, second and third refresh.


ECC and Physical Repair During Refresh

During refresh, the address of a fail can be stored for analysis.

- If an address fails multiple times in a row, it is likely to be a hard fail

- If a failing address is a hard fail, the cell can be repaired either with a register or with a spare element

- Since refresh increments through all addresses, smarter repairs can be made

- If two or more bits fail on a bitline, a spare bitline can be used for replacement

- If two or more bits fail on a wordline, a spare wordline can be used for replacement.

- Overall health of the chip during refresh scrubbing can be stored in a user-available register for chip health monitoring.
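
The repair rules above can be sketched as a small bookkeeping model. This is not the device's logic, only an illustration under assumptions: the `ScrubLog` class, the threshold of three consecutive failing scrubs, and the order in which repairs are chosen (wordlines, then bitlines, then single cells) are all hypothetical.

```python
from collections import Counter, defaultdict

HARD_FAIL_THRESHOLD = 3  # consecutive failing scrubs before a cell counts as hard (hypothetical)


class ScrubLog:
    def __init__(self):
        # (wordline, bitline) -> number of consecutive scrub passes that corrected this cell
        self.streak = defaultdict(int)

    def record_pass(self, failing_cells):
        """Update streaks with the cells corrected during one full refresh pass."""
        failing = set(failing_cells)
        for cell in list(self.streak):
            if cell not in failing:
                del self.streak[cell]    # fail did not repeat: not treated as hard
        for cell in failing:
            self.streak[cell] += 1

    def hard_fails(self):
        return {c for c, n in self.streak.items() if n >= HARD_FAIL_THRESHOLD}

    def repair_plan(self):
        """Pick a spare wordline or bitline where two or more hard fails share a line."""
        hard = self.hard_fails()
        plan = []
        wl_counts = Counter(wl for wl, _ in hard)
        for wl in sorted(wl for wl, n in wl_counts.items() if n >= 2):
            plan.append(("spare_wordline", wl))
            hard = {c for c in hard if c[0] != wl}
        bl_counts = Counter(bl for _, bl in hard)
        for bl in sorted(bl for bl, n in bl_counts.items() if n >= 2):
            plan.append(("spare_bitline", bl))
            hard = {c for c in hard if c[1] != bl}
        plan.extend(("single_cell", c) for c in sorted(hard))
        return plan


log = ScrubLog()
persistent = {(5, 1), (5, 9), (7, 2), (8, 2), (20, 20)}
log.record_pass(persistent | {(1, 1)})   # (1, 1) fails once only
log.record_pass(persistent)
log.record_pass(persistent)
print(log.repair_plan())
```

The number of entries in `hard_fails()` is one candidate for the health value exposed in the user-available register.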


Physical Repair during refresh

Figure: three diagrams of the array with its sense amplifier, bitlines and wordline.

- Hard failing cell: replace with redundant element, or with register

- Hard fail along a bitline: replace with spare bitline

- Hard fail along a wordline: replace with spare wordline


SECDED

Figure: memory array with ECC blocks ahead of the data IO, a SEC-DED output, an ERR register, and self repair logic driving redundancy activation.

- Real-time Single Error Correct, Double Error Detect (SEC-DED) output

- Fail register and self repair engine
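
For readers unfamiliar with SEC-DED, the behaviour can be illustrated with a textbook extended Hamming code. The sketch below is a generic software model on a list of bits, not the circuit used in this architecture; the function names and status strings are illustrative.

```python
def encode(data_bits):
    """Encode a list of 0/1 data bits into an extended Hamming (SEC-DED) codeword."""
    n_check = 1
    while 2 ** n_check < len(data_bits) + n_check + 1:
        n_check += 1
    n = len(data_bits) + n_check
    code = [0] * (n + 1)             # index 0: overall parity; 1..n: Hamming codeword
    it = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):          # not a power of two: data position
            code[pos] = next(it)
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos
    for i in range(n_check):         # set check bits so the syndrome becomes zero
        code[1 << i] = (syndrome >> i) & 1
    code[0] = sum(code) % 2          # overall parity: total number of ones is even
    return code


def decode(code):
    """Return (status, codeword); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, len(code)):
        if code[pos]:
            syndrome ^= pos
    parity_ok = sum(code) % 2 == 0
    if syndrome == 0 and parity_ok:
        return "ok", code
    if not parity_ok and syndrome < len(code):
        # Odd number of flips: treat as a single error at `syndrome`
        # (a syndrome of 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        return "corrected", code
    return "uncorrectable", code     # detected, but cannot be corrected


word = encode([1, 0, 1, 1, 0, 0, 1, 0])
single = list(word); single[6] ^= 1
double = list(word); double[3] ^= 1; double[7] ^= 1
print(decode(word)[0], decode(single)[0], decode(double)[0])
```

The "uncorrectable" outcome corresponds to the double-error-detect signal that a fail register and repair engine could act on.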


Adaptive Dynamic Refresh

Using ECC to Control Refresh Rate

Figure: refresh rate versus temperature, showing the typical refresh margin setting, the cell distribution of retention fails, and the ECC controlled refresh rate.

Increase the refresh rate if there are too many fails and reduce it if there are too few, so that the refresh rate always tracks the cell fail distribution. The system is self-calibrating, with no need for a tightly calibrated temperature sensor.
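
The control rule can be sketched as a simple feedback loop driven by the number of ECC corrections seen in each scrub pass. Everything numeric here is a hypothetical placeholder (starting interval, thresholds, step size, limits), and the `AdaptiveRefresh` class is illustrative rather than a description of the actual controller.

```python
class AdaptiveRefresh:
    """Adjust the refresh interval from the count of ECC-corrected retention fails."""

    def __init__(self, interval_ms=64.0, low=1, high=8,
                 step=1.25, min_ms=8.0, max_ms=512.0):
        self.interval_ms = interval_ms   # current refresh interval (hypothetical start value)
        self.low = low                   # fewer fails than this: refresh less often
        self.high = high                 # more fails than this: refresh more often
        self.step = step                 # multiplicative adjustment per update
        self.min_ms = min_ms
        self.max_ms = max_ms

    def update(self, corrected_fails: int) -> float:
        if corrected_fails > self.high:
            # Too many fails: shorten the interval (higher refresh rate).
            self.interval_ms = max(self.min_ms, self.interval_ms / self.step)
        elif corrected_fails < self.low:
            # Too few fails: lengthen the interval (lower refresh rate, lower power).
            self.interval_ms = min(self.max_ms, self.interval_ms * self.step)
        return self.interval_ms


ctl = AdaptiveRefresh()
for fails in (0, 0, 3, 20, 20, 5):       # made-up fail counts per scrub pass
    print(fails, ctl.update(fails))
```

Because the fail count already reflects temperature and cell-to-cell variation, the loop needs no temperature input, which is the self-calibrating property described above.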

Conclusion

- Device scaling is pushing technologies to the limits, with error rates expected to increase.

- Safety critical embedded applications introduce reliability challenges not met by traditional ECC solutions.

- Green Mountain’s architecture is fully backwards compatible, with no additional latency or speed derating up to 2133Mbps.

- Low circuit area overhead allows for economical ECC/non-ECC combo chip architecture, lowering product development costs.

- Adaptive Dynamic Refresh enables ultra low power at monitored, definable quality metrics.


References

Dr. Tilak Agerwala - Data Centric Systems - The Next Paradigm in Computing (2014), Keynote Lecture, ICCP 2014

Christian Constantinescu - Impact of Deep Submicron Technology on Dependability of VLSI Circuits (2002), International Conference on Dependable Systems and Networks (DSN), pp. 205-209

Vilas Sridharan and Dean Liberty - A Field Study of DRAM Errors (2012), International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp. 3-6

Robert C. Baumann - Radiation-Induced Soft Errors in Advanced Semiconductor Technologies (2005), IEEE Transactions on Device and Materials Reliability, Vol. 5 No. 3, pp. 6

Yoongu Kim - Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors (2014), ISCA 2014

Barbara P. Aichinger - Google Study: Could those memory failures be caused by design flaws?, MEMCON 2012

Christian Constantinescu - Impact of Intermittent Faults on Nanocomputing Devices (2007), Workshop on Dependable and Secure Nanocomputing (DSN)

Uksong Kang et al. - Co-Architecting Controllers and DRAM to Enhance DRAM Process Scaling, The Memory Forum 2014

