Reliability Issues in Flash Memory Storage Devices 2011. 08. 01 Sang Lyul Min Seoul National University http://archi.snu.ac.kr/symin
Page 1: Reliability Issues in Flash Memory Storage Devicescse.snu.ac.kr/sites/default/files/node--seminar/20110801CIC_민상렬.pdf17 Technology Transfer Oct. 30, 2007, NotebookReview.com

Reliability Issues in Flash Memory Storage Devices

2011. 08. 01
Sang Lyul Min

Seoul National University
http://archi.snu.ac.kr/symin


Outline

- Flash Memory Basics
- Our 10-year Research and Technology Transfer
- Lessons Learned
- Conclusions


Conventional MOS Transistor

[Figure: cross-section of a MOS transistor with gate (G), n+ source (S), and n+ drain (D) on a p-substrate, shown next to its schematic symbol (G, S, D)]


Conventional MOS Transistor: A Constant-Threshold Transistor

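[Figure: Id versus Vgs curve with a single threshold Vth; when Vgs > Vth, the transistor is drawn as an on-resistance Ron between S and D]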


Flash Memory

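[Figure: cross-section of a flash memory cell: control gate, floating gate, thin tunneling oxide, n+ source and n+ drain on a p-substrate, with programming and erasure marked; shown next to its schematic symbol (G, S, D)]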


Flash Memory

[Figure: two cell cross-sections (control gate, n+ source, n+ drain, p-substrate) side by side, labeled Erased Cell and Programmed Cell]


Flash Memory: A “Programmable-Threshold” Transistor

[Figure: Id versus Vgs curves with two thresholds, Vth-0 and Vth-1, labeled “1” state and “0” state; below them, cell cross-sections labeled Erased state and Programmed state]
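The programmable-threshold idea can be sketched in a few lines of Python. This is an illustration only: the voltage values and the names `FlashCell`, `VTH_ERASED`, `VTH_PROGRAMMED`, and `V_READ` are assumptions, not values from the slides. The sketch relies on the usual convention that an erased cell (low threshold) conducts at the read voltage and reads as 1, while a programmed cell (high threshold) does not and reads as 0.

```python
from dataclasses import dataclass

# Illustrative voltages only (assumed); real devices differ.
VTH_ERASED = 1.0      # low threshold: no extra charge on the floating gate
VTH_PROGRAMMED = 5.0  # high threshold: charge trapped on the floating gate
V_READ = 3.0          # read voltage placed between the two thresholds


@dataclass
class FlashCell:
    vth: float = VTH_ERASED  # a fresh cell starts in the erased state

    def program(self) -> None:
        self.vth = VTH_PROGRAMMED  # raise the threshold

    def erase(self) -> None:
        self.vth = VTH_ERASED  # restore the low threshold

    def conducts(self, vgs: float) -> bool:
        return vgs > self.vth  # the transistor turns on only above its threshold

    def read(self) -> int:
        # A conducting cell is sensed as "1", a non-conducting cell as "0".
        return 1 if self.conducts(V_READ) else 0


cell = FlashCell()
erased_bit = cell.read()
cell.program()
programmed_bit = cell.read()
cell.erase()
print(erased_bit, programmed_bit, cell.read())  # prints: 1 0 1
```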


More Bits Per Transistor

Source: Eli Harari (SanDisk), “NAND at Center Stage,” Flash Memory Summit 2007.


(NAND) Flash Memory Interface

[Figure: a NAND flash chip drawn as 2^j blocks, each holding 2^i pages, with every page split into a data area and a spare area; it is set against ("vs.") an in-place read/write storage interface]

In-place read/write side of the comparison:
- Each read / (in-place) write takes 5~35 ms

NAND flash side of the comparison:
- Read physical page (chip #, block #, page #): 20~80 us
- Write physical page (chip #, block #, page #): 200~800 us
- Erase block (chip #, block #): 2~3 ms
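A toy Python model of the three NAND operations may help make the interface concrete. It is a sketch under assumptions: it models a single chip (so the chip # argument is dropped), the class and method names are hypothetical, and it enforces the well-known NAND constraint that a page must be erased, as part of its whole block, before it can be written again.

```python
class NandError(Exception):
    pass


class NandChip:
    """Toy single-chip model: pages are written once per erase; erase works on whole blocks."""

    def __init__(self, blocks: int = 4, pages_per_block: int = 8) -> None:
        self.pages_per_block = pages_per_block
        # None marks an erased (not yet written) page.
        self.pages = [[None] * pages_per_block for _ in range(blocks)]

    def read_page(self, block: int, page: int):
        return self.pages[block][page]

    def write_page(self, block: int, page: int, data: bytes, spare: bytes = b"") -> None:
        if self.pages[block][page] is not None:
            raise NandError("page already written; erase the block first")
        self.pages[block][page] = (data, spare)  # data area plus spare area

    def erase_block(self, block: int) -> None:
        self.pages[block] = [None] * self.pages_per_block


chip = NandChip()
chip.write_page(0, 0, b"v1", spare=b"meta")
try:
    chip.write_page(0, 0, b"v2")  # in-place overwrite is not allowed
    overwrite_rejected = False
except NandError:
    overwrite_rejected = True
chip.erase_block(0)               # erasing clears every page of the block
chip.write_page(0, 0, b"v2")
print(overwrite_rejected, chip.read_page(0, 0))  # prints: True (b'v2', b'')
```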


NAND Flash Memory Characteristics

The Good
- Low latency
- Low power consumption
- High shock/vibration resistance
- Small form factor
- Massive parallelism

….

From the dark night


NAND Flash Memory Market Trends

Year    DRAM ($/MB)    NAND Flash ($/MB)
2000    $0.97          $1.35
2001    0.22           0.43
2002    0.22           0.25
2003    0.17           0.21
2004    0.17           0.10
2005    0.11           0.05
2006    0.096          0.021
2007    0.057          0.012
2008    ~0.025         <0.005
CAGR    -32.1%/yr      -50.0%/yr

Source: Lane Mason (Denali Software), “NAND FlashPoint Platform”


NAND Flash Memory Market Trends

Year    DRAM (millions of GB)    NAND Flash (millions of GB)
2000    30                       1.1
2001    50                       1.6
2002    71                       4.6
2003    98                       14.6
2004    158                      68
2005    240                      200
2006    340                      600
2007    645                      1600
2008    1000                     4000
CAGR    +60.0%/yr                +150%/yr

Source: Lane Mason (Denali Software), “NAND FlashPoint Platform”


Outline

- Flash Memory Basics
- Our 10-year Research and Technology Transfer
- Lessons Learned
- Conclusions


Our 10-year Research on Flash Memory

[Timeline figure with year marks at 2000, 2002, 2004, 2005, 2006, and 2011]
- SSFTL (for commercial CF cards) (2000.05 ~ 2002.01)
- SSFTL - Seoul, SSFTL - Hong Kong, SSFTL - Vancouver
- USB 2.0-based SSD (Flash-only) (2004.06)
- Hybrid HDD
- Chameleon SSD (Flash/FRAM Hybrid) (2005.12)
- Hydra SSD (Flash-only) (2006.02)
- Hydra FPGA version Technology Transfer
- Hydra ASIC version Technology Transfer


Hydra SSD Platform

NV-RAM Modules

Samsung SLC NAND

Samsung MLC NAND

Hynix MLC NAND

FREESCALE MRAM (parallel)

RAMTRON FRAM (parallel)

RAMTRON FRAM (serial)


Performance Results

PC Mark 05 Results

[Figure: bar chart of throughput in MB/s (axis 0 to 80) for XP Startup, Application Loading, General Usage, Virus Scan, and File Write, one bar per device]

PCMark05 HDD Score
Seagate 2.5 in HDD      3431
Seagate 3.5 in HDD      5169
Adtron 3.5 in SSD       2255
M-Systems 2.5 in SSD    4494
Samsung 2.5 in SSD      6080
Hydra SSD               11045

Seong, Y.J., Nam, E.H., Yoon, J.H., Kim, H., Choi, J.-Y., Lee, S., Bae, Y.H., Lee, J., Cho, Y., Min, S.L. “Hydra: A block-mapped parallel flash memory solid-state disk architecture” (2010), IEEE Transactions on Computers, 59 (7), pp. 905-921.

Nam, E.H., Kim, S.J., Eom, H., and Min, S.L. “Ozone (O3): An Out-of-order Flash Memory Controller Architecture,” To appear in the IEEE Transactions on Computers.


Technology Transfer

Oct. 30, 2007, NotebookReview.com
“… In fact, it may well be the single fastest storage medium available to the customer today….”


Outline

- Flash Memory Basics
- Our 10-year Research and Technology Transfer
- Lessons Learned
- Conclusions


NAND Flash Memory Characteristics

The Good
- Low latency
- Low power consumption
- Small form factor
- Massive parallelism

….

The Bad
- Power failures
- Bad blocks (Program/Erase Errors)
- Program disturbance
- Read disturbance

From the dark night


On A Fine Spring Day in 2007

………..


Tragic Remains


Around That Time…

http://en.wikipedia.org/wiki/Edison_Chen_photo_scandal


A Few of the Recovered Photos


Happy Ending….


Reliability Analysis of NAND Flash Memory-based Storage Device (Ideal)

[Figure: one timeline per real user (user 1, user 2, user 3, user 4, ..., user n) on a time axis of 1 day, 1 month, 1 year, 10 years; power failures and flash operation failures occur along each timeline until a system failure; the failures counted between times t_k and t_k+1 build up a cumulative failure ratio (cumulative distribution, approaching 1) over time]


Reliability Analysis of NAND Flash Memory-based Storage Device (Practical)

[Figure: the same analysis with emulated users; power failures and flash operation failures occur along each emulated timeline until it reaches a fail state; a virtual-time axis (1 day, 1 month, 1 year, 10 years) is shown against a physical-time axis (1 min, 30 min, 6 hours, 3 days); the failures counted between times t_k and t_k+1 again build up a cumulative failure ratio (cumulative distribution, approaching 1)]
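The practical approach can be sketched as a small Monte Carlo run in Python. Everything numeric here is an assumption made for illustration: the per-day fault probability `p_fault`, the probability `p_fatal` that an injected fault leaves the device in a fail state, and the number of emulated users. The function names are hypothetical as well. The sketch only shows how emulated users running in virtual time yield a cumulative failure ratio.

```python
import random


def emulate_user(rng: random.Random, p_fault: float, p_fatal: float, horizon: int):
    """Return the virtual day on which this emulated user reaches a fail state, or None."""
    for day in range(1, horizon + 1):
        if rng.random() < p_fault:        # inject a power failure or flash operation failure
            if rng.random() < p_fatal:    # the device does not survive this fault
                return day
    return None


def cumulative_failure_ratio(fail_days, users: int, checkpoints):
    """Fraction of emulated users that have failed by each checkpoint (in virtual days)."""
    return [
        sum(1 for d in fail_days if d is not None and d <= t) / users
        for t in checkpoints
    ]


rng = random.Random(2011)            # fixed seed so the run is repeatable
USERS = 300                          # assumed population size
CHECKPOINTS = [1, 30, 365, 3650]     # 1 day, 1 month, 1 year, 10 years of virtual time
fail_days = [
    emulate_user(rng, p_fault=0.01, p_fatal=0.02, horizon=3650) for _ in range(USERS)
]
ratios = cumulative_failure_ratio(fail_days, USERS, CHECKPOINTS)
print(dict(zip(CHECKPOINTS, ratios)))
```

Ten years of virtual time for hundreds of emulated users finishes in about a second of physical time here, which is the point of the emulation.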


Failure Analysis

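[Figure: along virtual time, system states s1, s2, ..., sn are saved to a snapshot repository while workloads W1, W2, ..., Wn run, each ending good or fail. Reliability assessment: cumulative failure ratio (approaching 1) over time. Debugging: fail state diagnostics and regression tests restore a system state from the snapshot repository, then rollback & replay; the observed failures are grouped into symptom 1, symptom 2, symptom 3, symptom 4, ..., symptom k]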
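Snapshot, rollback, and replay can be sketched as follows. This is a minimal illustration, not the authors' tool: `EmulatedDevice`, the event mix, and the "bug" that makes a flash error fatal right after a power failure are all hypothetical. The key assumption is that the emulation is deterministic, so restoring a snapshot (including the random number generator state) and replaying reaches the same fail state.

```python
import copy
import random


class EmulatedDevice:
    """Toy device whose whole state, including its RNG, can be snapshotted."""

    def __init__(self, seed: int) -> None:
        self.rng = random.Random(seed)
        self.vtime = 0          # virtual time in steps
        self.log = []           # (virtual time, event) history
        self.failed = False

    def step(self) -> None:
        self.vtime += 1
        event = self.rng.choice(["write", "write", "power_failure", "flash_error"])
        self.log.append((self.vtime, event))
        # Hypothetical bug: a flash error right after a power failure is fatal.
        if event == "flash_error" and len(self.log) > 1 and self.log[-2][1] == "power_failure":
            self.failed = True


def run_with_snapshots(dev: EmulatedDevice, max_steps: int, every: int) -> dict:
    repo = {0: copy.deepcopy(dev)}                  # snapshot repository keyed by virtual time
    while not dev.failed and dev.vtime < max_steps:
        dev.step()
        if dev.vtime % every == 0 and not dev.failed:
            repo[dev.vtime] = copy.deepcopy(dev)
    return repo


def rollback_and_replay(repo: dict, fail_time: int):
    start = max(t for t in repo if t < fail_time)   # latest snapshot before the failure
    dev = copy.deepcopy(repo[start])                # restore system state
    while not dev.failed and dev.vtime < fail_time:
        dev.step()                                  # replay deterministically
    return start, dev


dev = EmulatedDevice(seed=7)
repo = run_with_snapshots(dev, max_steps=10_000, every=5)
start, replayed = rollback_and_replay(repo, dev.vtime)
print(start, replayed.vtime, replayed.log == dev.log)
```

Replaying from the nearest snapshot rather than from time zero keeps the debugging loop short even when the failure occurs late in virtual time.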


Post-mortem Analysis of Customer Returns

Causal relationship between bugs and symptoms

[Figure: bugs (bug 1, bug 2, bug 3, bug 4, ..., bug n) linked to the symptoms they cause (symptom 1, symptom 2, symptom 3, symptom 4, ..., symptom k); post-mortem analysis matches failure instances in the real world (symptom 1 through symptom 4) against that set of symptoms]
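One simple way to use such a bug-to-symptom relationship in a post-mortem is to rank bugs by how many of the observed symptoms they explain. The table below is entirely hypothetical (the slides give no concrete bugs or symptoms), and the ranking rule is an assumption chosen for illustration.

```python
# Hypothetical causal table: which symptoms each bug is known to produce.
CAUSES = {
    "bug 1": {"symptom 1", "symptom 2"},
    "bug 2": {"symptom 2", "symptom 3"},
    "bug 3": {"symptom 4"},
}


def candidate_bugs(observed: set, causes: dict) -> list:
    """Bugs whose known symptoms overlap the observed ones, best match first."""
    scored = [(len(symptoms & observed), bug) for bug, symptoms in causes.items()]
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [bug for score, bug in ranked if score > 0]


ranked = candidate_bugs({"symptom 2", "symptom 3"}, CAUSES)
print(ranked)  # prints: ['bug 2', 'bug 1']
```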


Two Key Findings

“Many” customer returns (cellular phones) are due to bugs in flash memory management software

“Most” bugs in flash memory management software are due to inadequate/incorrect handling of nested power failures and flash memory errors
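A nested power failure is a power failure that strikes while the software is still recovering from an earlier one. The Python sketch below shows one way to test for it exhaustively on a toy example. The redo-log update scheme, the names `Storage`, `update`, and `recover`, and the assumption that power is lost cleanly between writes (no torn writes) are all illustrative choices, not taken from the slides.

```python
class PowerFailure(Exception):
    pass


class Storage:
    """Persistent cells plus a countdown that cuts power after N more writes."""

    def __init__(self) -> None:
        self.cells = {"home": "old", "log": None, "committed": False}
        self.writes_until_failure = None   # None means power stays on

    def write(self, key, value) -> None:
        if self.writes_until_failure is not None:
            if self.writes_until_failure == 0:
                raise PowerFailure()       # power is lost before this write lands
            self.writes_until_failure -= 1
        self.cells[key] = value


def update(st: Storage, value: str) -> None:
    st.write("log", value)         # stage the new value
    st.write("committed", True)    # commit point
    st.write("home", value)        # apply
    st.write("committed", False)   # retire the log entry
    st.write("log", None)


def recover(st: Storage) -> None:
    """Redo a committed update, otherwise discard it; safe to rerun after interruption."""
    if st.cells["committed"]:
        st.write("home", st.cells["log"])
        st.write("committed", False)
    st.write("log", None)


def run_case(fail_in_update: int, fail_in_recovery: int) -> str:
    st = Storage()
    st.writes_until_failure = fail_in_update
    try:
        update(st, "new")
    except PowerFailure:
        pass
    st.writes_until_failure = fail_in_recovery   # nested failure during recovery
    try:
        recover(st)
    except PowerFailure:
        pass
    st.writes_until_failure = None
    recover(st)                                  # final, uninterrupted recovery
    assert st.cells["log"] is None and not st.cells["committed"]
    return st.cells["home"]


# Try every failure point in the update combined with every failure point in recovery.
outcomes = {(u, r): run_case(u, r) for u in range(6) for r in range(4)}
print(sorted(set(outcomes.values())))  # prints: ['new', 'old']
```

Every combination ends with either the old or the new value and a clean log, which is the property a recovery scheme needs to hold under nested failures.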


NAND Flash Memory Characteristics

The Good
- Low latency
- Low power consumption
- High Reliability
- Small form factor
- Massive parallelism

….

The Bad
- Power failures
- Bad blocks (Program/Erase Errors)
- Program disturbance
- Read disturbance

From the dark night

The Ugly
- Limited Endurance
- Retention errors


The State of Affairs – Flash Memory

Scanning tunneling microscope image of a silicon surface showing 10 nm is ~20 atoms across

Source: B. Shirley, “The Many Flavors of NAND … and More to Come,” Flash Memory Summit 2009


A “Deadly” Combination

Source: R. E. Kaufman, “Vaccine Role – History, Prevention and Future Projects,” Swine Flu: What Employers Need To Know, Sept. 24, 2009


Outline

- Flash Memory Basics
- Our 10-year Research and Technology Transfer
- Lessons Learned
- Conclusions


Conclusions (Call for Actions)

- “Provably-Correct” Flash Memory Software
  - Bad Block Management Scheme
  - Crash Recovery Scheme
  - etc.
- “Open” Reliability Evaluation Platform
  - High-fidelity Flash Memory Modeling
  - Configurable Fault Injection
- Prepare for the future “when everything fails”
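As a rough illustration of how a bad block management scheme and configurable fault injection fit together, the Python sketch below writes through a toy flash model whose program operation fails with a configurable probability. All names (`FaultyFlash`, `BadBlockManager`) and parameters (`fail_prob`, block counts) are hypothetical, and the retire-and-retry policy is just one simple choice among many.

```python
import random


class ProgramError(Exception):
    pass


class FaultyFlash:
    """Toy flash whose program operation fails with a configurable probability."""

    def __init__(self, blocks: int, fail_prob: float, seed: int = 0) -> None:
        self.blocks = blocks
        self.fail_prob = fail_prob
        self.rng = random.Random(seed)
        self.data = {}                            # physical block -> payload

    def program(self, block: int, payload: bytes) -> None:
        if self.rng.random() < self.fail_prob:    # injected program error
            raise ProgramError(block)
        self.data[block] = payload


class BadBlockManager:
    def __init__(self, flash: FaultyFlash) -> None:
        self.flash = flash
        self.bad = set()                          # retired physical blocks
        self.free = list(range(flash.blocks))     # physical blocks not yet used
        self.map = {}                             # logical block -> physical block

    def write(self, logical: int, payload: bytes) -> int:
        while self.free:
            physical = self.free.pop(0)
            try:
                self.flash.program(physical, payload)
            except ProgramError:
                self.bad.add(physical)            # retire the block, try the next free one
                continue
            self.map[logical] = physical
            return physical
        raise RuntimeError("no good blocks left")


flash = FaultyFlash(blocks=64, fail_prob=0.3, seed=1)
bbm = BadBlockManager(flash)
for lbn in range(8):
    bbm.write(lbn, b"data-%d" % lbn)
intact = all(flash.data[bbm.map[lbn]] == b"data-%d" % lbn for lbn in range(8))
print(len(bbm.bad), intact)
```

Raising `fail_prob` or changing the seed gives a different fault pattern without touching the code under test, which is what makes the injection configurable.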

