SSE3044: Operating Systems, Fall 2016, Jinkyu Jeong ([email protected])
Solid State Drives (SSDs)
Jinkyu Jeong ([email protected])
Computer Systems Laboratory
Sungkyunkwan University
http://csl.skku.edu
Memory Types
• ROM: High-density, Reliable, Low-cost, Suitable for high production with stable code
• EPROM: Non-volatile, High-density, Ultraviolet light for erasure
• EEPROM: Non-volatile, Lower reliability, Higher cost, Lowest density, Electrically byte-erasable
• DRAM: High-density, Low-cost, High-speed, High-power
• FLASH: High-density, Low-cost, High-speed, Low-power, High reliability
Source: Intel Corporation.
Flash Memory Characteristics
• Erase-before-write
  – Read
  – Write or Program: 1 → 0
  – Erase: 0 → 1
• Bulk erase
  – Program unit:
    • NOR: byte or word
    • NAND: page
  – Erase unit: block

[Figure: 1 1 1 1 1 1 1 1 → write (program) → 1 1 0 1 1 0 1 0 → erase → 1 1 1 1 1 1 1 1]
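Programming can only clear bits (1 → 0) and only erase can set them back to 1, so the rule can be modeled with a bitwise AND. The following is a toy Python sketch, not a device driver. The class name `FlashBlock` and its methods are hypothetical, each list entry stands for one byte of a block, and real-device limits such as page-granular programming are ignored.

```python
ERASED_BYTE = 0xFF  # an erased cell reads as 1, so an erased byte is 0b11111111


class FlashBlock:
    """Toy model of one erase block: program clears bits, erase sets them all."""

    def __init__(self, num_bytes: int = 1) -> None:
        self.cells = [ERASED_BYTE] * num_bytes

    def program(self, offset: int, value: int) -> None:
        old = self.cells[offset]
        # A bit that is 0 now but 1 in `value` would need a 0 -> 1 transition,
        # which programming cannot perform.
        if value & ~old & 0xFF:
            raise ValueError("0 -> 1 transition needed: erase the block first")
        self.cells[offset] = old & value  # programming can only clear bits

    def erase(self) -> None:
        # Erase is block-wide: every bit in the block returns to 1.
        self.cells = [ERASED_BYTE] * len(self.cells)


block = FlashBlock()
block.program(0, 0b11011010)       # 11111111 -> 11011010, as in the figure
try:
    block.program(0, 0b11111111)   # going back without an erase is rejected
except ValueError as err:
    print(err)
block.erase()                      # back to 11111111
```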
Logical View of NAND Flash
• A collection of blocks
• Each block has a number of pages
• The size of a block or a page depends on the technology (and it keeps getting larger)

[Figure: NAND flash as Block 0 … Block n-1; each block contains Page 0 … Page m-1, and each page has a data area and a spare area]
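Every block holds the same number of pages, so a (block, page) pair can be flattened into a single physical page number (PPN) and recovered again with simple arithmetic. Below is a minimal Python sketch built around a hypothetical `NandGeometry` class. The 4-page blocks and 4KB pages match the page-mapping example later in these slides. Real devices use far more pages per block, and the sketch leaves the spare area out of the capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NandGeometry:
    num_blocks: int        # n blocks, numbered 0 .. n-1
    pages_per_block: int   # m pages per block, numbered 0 .. m-1
    page_size: int = 4096  # bytes in the data area (spare area not counted)

    def ppn(self, block: int, page: int) -> int:
        """Flatten (block, page-within-block) into a physical page number."""
        if not (0 <= block < self.num_blocks and 0 <= page < self.pages_per_block):
            raise IndexError("block or page out of range")
        return block * self.pages_per_block + page

    def locate(self, ppn: int) -> tuple[int, int]:
        """Inverse of ppn(): return (block, page-within-block)."""
        return divmod(ppn, self.pages_per_block)

    def data_capacity(self) -> int:
        return self.num_blocks * self.pages_per_block * self.page_size


geo = NandGeometry(num_blocks=4, pages_per_block=4)
print(geo.ppn(2, 1))        # 9
print(geo.locate(9))        # (2, 1)
print(geo.data_capacity())  # 65536
```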
NAND Flash Types
• SLC NAND
  – Single Level Cell
  – 1 bit/cell
• MLC NAND
  – Multi-level Cell (misnomer)
  – 2 bits/cell
• TLC NAND
  – Triple-level Cell
  – 3 bits/cell
• 3D NAND
Source: Micron Technology, Inc.
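A cell that stores b bits must hold one of 2^b distinguishable charge (threshold-voltage) levels. Each step from SLC to MLC to TLC therefore stores more data in the same cells, but it leaves narrower margins between levels. The small Python illustration below uses a hypothetical function name, and its cell count is arbitrary.

```python
CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3}  # bits per cell, from the slide


def describe(cell_type: str, num_cells: int) -> tuple[int, int]:
    """Return (voltage levels per cell, total bits stored in num_cells)."""
    bits = CELL_TYPES[cell_type]
    levels = 2 ** bits  # each extra bit doubles the number of levels to tell apart
    return levels, bits * num_cells


for name in CELL_TYPES:
    levels, total_bits = describe(name, num_cells=1_000_000)
    print(f"{name}: {levels} levels/cell, {total_bits:,} bits in 1M cells")
```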
NAND Applications
• Universal Flash Drives (UFDs)
• Flash cards
  – CompactFlash, MMC, SD, Memory Stick, …
• Smartphones
  – eMMC (Embedded MMC)
  – UFS (Universal Flash Storage)
• SSDs (Solid State Drives)
• Other embedded devices
  – MP3 players, digital TVs, set-top boxes, car navigators, …
Commercial SSDs
http://www.enuri.com (As of May 14, 2016)
Anatomy of an SSD
• Samsung 850 Evo

[Figure: the drive's circuit board, with the SSD controller, NAND flash, and DRAM labeled]
http://www.anandtech.com/show/9451/the-2tb-samsung-850-pro-evo-ssd-review
HDDs vs. SSDs
¹ http://www.tomshardware.com/reviews/samsung-850-evo-850-pro-2tb-ssd,4205.html
² http://www.storagereview.com/samsung_spinpoint_m9t_hard_drive_review
³ http://www.enuri.com (As of Sep. 27, 2015)
Feature | SSD (Samsung) | HDD (Seagate)
--- | --- | ---
Model | MZ-75E2T0B (850 Evo) | ST2000LM003 (SpinPoint M9T)
Capacity | 2TB (128Gb 32-layer 3D V-NAND TLC × 16 dies/channel × 8 channels) | 2TB (3 discs, 6 heads, 5400 RPM)
Form factor | 2.5", 66g | 2.5", 130g
DRAM | 2GB | 32MB
Host interface | SATA-3 (6.0Gbps) | SATA-3 (6.0Gbps)
Power consumption (Active/Idle/Sleep) | 3.7W, 4.7W / 0.5W / 0.05W | 2.3W / 0.7W / 0.18W
Performance (test settings) | 850 Evo¹: sequential 128KB at QD2, random 4KB at QD32 | M9T²: sequential 2MB, random 4KB
Sequential read | 544 MB/s | 124 MB/s
Sequential write | 520 MB/s | 124 MB/s
Random read | 97,687 IOPS (QD32); 11,335 IOPS (QD1) | 56 IOPS
Random write | 89,049 IOPS (QD32); 38,433 IOPS (QD1) | 98 IOPS
Power-on to ready | | 3.5 sec
Average seek | | 12/14 ms
Average latency | | 5.6 ms
Price³ | 1,009,380 won (505 won/GB) | 117,060 won (59 won/GB)
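Several of the HDD numbers in the table follow from simple mechanics. On average, the platter must turn half a revolution before the target sector arrives. At 5400 RPM the average rotational latency is therefore 0.5 × 60,000 / 5400 ≈ 5.6 ms, and one small random request costs roughly seek time plus rotational latency. The Python sketch below uses hypothetical function names and ignores transfer time, which is assumed negligible for 4KB requests. It reproduces the 5.6 ms figure and, using the 12 ms seek value, gives a random-read estimate close to the measured 56 IOPS. It also recomputes the price per GB, taking 2TB = 2,000 GB.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Half a revolution on average: 0.5 * (60,000 ms per minute / rpm)."""
    return 0.5 * 60_000 / rpm


def est_random_iops(avg_seek_ms: float, rpm: float) -> float:
    """One request per (seek + rotational latency); transfer time ignored."""
    return 1000 / (avg_seek_ms + avg_rotational_latency_ms(rpm))


def won_per_gb(price_won: int, capacity_gb: int) -> float:
    return price_won / capacity_gb


print(f"{avg_rotational_latency_ms(5400):.1f} ms")           # 5.6 ms
print(f"{est_random_iops(12, 5400):.0f} IOPS (estimate)")    # 57 IOPS
print(f"SSD {won_per_gb(1_009_380, 2000):.0f} won/GB, "
      f"HDD {won_per_gb(117_060, 2000):.0f} won/GB")        # 505 and 59
```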
State of the Art
• World’s first 2.5” SAS 32TB SSD @ Flash Memory Summit 2016
Source: THESSDREVIEW, Samsung Newsroom
State of the Art
• Z-SSD @ Flash Memory Summit 2016
  – 4 times faster than NVMe Flash SSDs
Source: Samsung Newsroom
NAND Constraints
• No in-place update
  – Requires sector remapping (or address translation)
• Bit errors
  – Require error correction codes (ECCs)
• Bad blocks
  – Factory-marked and run-time bad blocks
  – Require bad block remapping
• Limited program/erase cycles
  – < 100K for SLCs, < 3K for MLCs, < 1K for TLCs
  – Require wear-leveling
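Wear-leveling spreads program/erase cycles evenly, so that no block reaches its P/E limit long before the others. One simple policy, often called dynamic wear-leveling, always hands out the free block with the lowest erase count. It retires a block once that block reaches its cycle limit. The Python sketch below assumes that policy, and its class and method names are hypothetical. Real FTLs also relocate long-lived data (static wear-leveling), which is not shown here.

```python
import heapq


class WearAwareAllocator:
    """Dynamic wear-leveling: always allocate the least-erased free block."""

    def __init__(self, num_blocks: int, max_pe_cycles: int) -> None:
        self.max_pe_cycles = max_pe_cycles
        self.erase_count = [0] * num_blocks
        self.retired: set[int] = set()  # blocks treated as run-time bad blocks
        # Min-heap of (erase_count, block_number) for every free block.
        self.free = [(0, block) for block in range(num_blocks)]
        heapq.heapify(self.free)

    def allocate(self) -> int:
        if not self.free:
            raise RuntimeError("no free blocks left")
        _, block = heapq.heappop(self.free)
        return block

    def erase_and_release(self, block: int) -> None:
        """Erase a block that holds no valid data and return it to the pool."""
        self.erase_count[block] += 1
        if self.erase_count[block] >= self.max_pe_cycles:
            self.retired.add(block)  # worn out: never hand it out again
        else:
            heapq.heappush(self.free, (self.erase_count[block], block))


alloc = WearAwareAllocator(num_blocks=3, max_pe_cycles=3000)  # MLC-like limit
first = alloc.allocate()         # block 0 (all counts are 0; ties go to the lowest number)
alloc.erase_and_release(first)   # block 0 now has 1 erase
print(alloc.allocate())          # block 1: it has fewer erases than block 0
```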
Flash Translation Layer (FTL)
• A software layer to make NAND flash fully emulate traditional block devices (e.g. disks)
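[Figure: Left, without an FTL: the file system issues ReadSectors/WriteSectors, but the flash memory and its device driver offer only Read/Write/Erase, so the interfaces do not match. Right, with an FTL: the FTL sits between the file system and the device driver. It accepts ReadSectors/WriteSectors from the file system and drives the flash memory through the device driver.]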
Source: Zeen Info. Tech.
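The most naive way to offer WriteSectors on top of Read/Write/Erase shows why the mismatch matters. To overwrite a page in place, it copies the whole block out, erases the block, and programs every page back. The Python sketch below counts flash operations for that approach. The class names and the one-sector-per-page mapping are assumptions for illustration, not how a real FTL works. A single sector overwrite costs a full block erase plus one program per page in the block, which is the cost that the remapping schemes on the following slides avoid.

```python
from typing import Optional


class RawNand:
    """Flash-side interface: read/program a page, erase a whole block."""

    def __init__(self, num_blocks: int, pages_per_block: int) -> None:
        self.pages_per_block = pages_per_block
        self.pages: list[Optional[bytes]] = [None] * (num_blocks * pages_per_block)
        self.ops = {"read": 0, "program": 0, "erase": 0}

    def read(self, ppn: int) -> Optional[bytes]:
        self.ops["read"] += 1
        return self.pages[ppn]

    def program(self, ppn: int, data: bytes) -> None:
        if self.pages[ppn] is not None:
            raise ValueError("page already programmed; erase its block first")
        self.ops["program"] += 1
        self.pages[ppn] = data

    def erase(self, block: int) -> None:
        self.ops["erase"] += 1
        start = block * self.pages_per_block
        for ppn in range(start, start + self.pages_per_block):
            self.pages[ppn] = None


class NaiveInPlaceBlockDevice:
    """Disk-side interface (read_sector/write_sector) done the naive way."""

    def __init__(self, nand: RawNand) -> None:
        self.nand = nand

    def read_sector(self, lba: int) -> Optional[bytes]:
        return self.nand.read(lba)  # sector n always lives in page n

    def write_sector(self, lba: int, data: bytes) -> None:
        # The model tracks erased pages directly; this check is not a flash read.
        if self.nand.pages[lba] is None:
            self.nand.program(lba, data)
            return
        # Overwrite: save the whole block, erase it, and program it all back.
        ppb = self.nand.pages_per_block
        start = (lba // ppb) * ppb
        saved = [self.nand.read(p) for p in range(start, start + ppb)]
        saved[lba - start] = data
        self.nand.erase(lba // ppb)
        for offset, page_data in enumerate(saved):
            if page_data is not None:
                self.nand.program(start + offset, page_data)


dev = NaiveInPlaceBlockDevice(RawNand(num_blocks=4, pages_per_block=4))
for lba in range(4):
    dev.write_sector(lba, b"v1")   # fill block 0: 4 programs
dev.write_sector(2, b"v2")         # one overwrite: 4 reads, 1 erase, 4 programs
print(dev.nand.ops)
```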
Address Mapping
• Required since flash pages cannot be overwritten
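[Figure: the host writes to an address in the LBA address space (as seen by the host). The mapping table redirects the write so that the new data goes to a different location in NAND flash, and the table no longer points at the old data.]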
Example: Page Mapping
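• Flash configuration
  – Page size: 4KB
  – # of pages / block = 4
• Current state
  – Written to page 0, 1, 2, 8, 4, 5
• Reading page 5

[Figure: Logical page #5 (shown as 0000000101) indexes the page map table (LPN → PPN). The table maps 0→0, 1→1, 2→2, 4→4, 5→5 and 8→3, and the other entries are empty, so the read is served from PPN 5, which is page 1 of PBN 1. Data blocks: PBN 0 (PPNs 0-3) holds logical pages 0, 1, 2, 8; PBN 1 (PPNs 4-7) holds 4 and 5 plus two free pages; PBN 2 (PPNs 8-11) and PBN 3 (PPNs 12-15) are free.]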
Example: Page Mapping
• New requests (in order)
  – Write to page 9
  – Write to page 3
  – Write to page 5

[Figure: the starting state is the same as on the previous slide. The page map holds 0→0, 1→1, 2→2, 4→4, 5→5 and 8→3. PPNs 0-5 hold logical pages 0, 1, 2, 8, 4, 5, and PPNs 6-15 are free.]
Example: Page Mapping
[Figure, after writing page 9: page 9 goes to the next free page, PPN 6 in PBN 1, and the page map gains 9→6.]
Example: Page Mapping
[Figure, after writing page 3: page 3 goes to PPN 7, which fills PBN 1, and the page map gains 3→7.]
Example: Page Mapping
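[Figure, after writing page 5: the updated page is written to PPN 8, the first page of PBN 2 (updated page write), and the map entry changes from 5→5 to 5→8. The old page at PPN 5 is invalidated (invalidate old page). Resulting map: 0→0, 1→1, 2→2, 3→7, 4→4, 5→8, 8→3, 9→6.]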
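The page-mapping steps above, including the read of page 5, can be replayed in a few lines of Python. This minimal sketch relies on a hypothetical `PageMappingFTL` class. The class appends every write to the next free physical page in PPN order and keeps the map in a dict. It has no garbage collection, so it simply fails once the flash is full. With the example's configuration, it reproduces the final table (3→7, 5→8, 9→6), with PPN 5 invalidated.

```python
class PageMappingFTL:
    """Page-level mapping with out-of-place updates (no GC in this sketch)."""

    def __init__(self, num_blocks: int, pages_per_block: int) -> None:
        self.num_pages = num_blocks * pages_per_block
        self.pages_per_block = pages_per_block
        self.page_map: dict[int, int] = {}  # logical page -> physical page
        self.invalid: set[int] = set()      # physical pages holding stale data
        self.next_free = 0                  # pages are written in PPN order

    def write(self, lpn: int) -> int:
        if self.next_free >= self.num_pages:
            raise RuntimeError("out of free pages: garbage collection needed")
        old = self.page_map.get(lpn)
        if old is not None:
            self.invalid.add(old)           # invalidate the old page
        ppn = self.next_free                # the updated page goes to a fresh page
        self.next_free += 1
        self.page_map[lpn] = ppn
        return ppn

    def read(self, lpn: int) -> tuple[int, int]:
        """Translate a logical page into (PBN, page offset within the block)."""
        return divmod(self.page_map[lpn], self.pages_per_block)


ftl = PageMappingFTL(num_blocks=4, pages_per_block=4)
for lpn in [0, 1, 2, 8, 4, 5]:      # current state
    ftl.write(lpn)
print(ftl.read(5))                  # (1, 1): PPN 5 is page 1 of PBN 1
for lpn in [9, 3, 5]:               # new requests, in order
    ftl.write(lpn)
print(sorted(ftl.page_map.items())) # [(0, 0), (1, 1), (2, 2), (3, 7), (4, 4), (5, 8), (8, 3), (9, 6)]
print(ftl.invalid)                  # {5}
```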
Garbage Collection
• Garbage collection (GC)
  – Eventually, the FTL will run out of free blocks to write to
  – GC must be performed to reclaim free space
  – The actual GC procedure depends on the mapping scheme
• GC in page-mapping FTL
  – Select victim block(s)
  – Copy all valid pages of the victim block(s) to a free block
  – Erase the victim block(s)
  – Note: At least one free block should be reserved for GC
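The slide leaves the victim-selection policy open. A common choice is the greedy policy, which picks the block with the fewest valid pages, because every valid page must be copied before the erase. Below is a minimal Python sketch under that assumption. The function names and the block-to-valid-pages dictionary are hypothetical. Given the block contents from the GC example that follows, just before page 1 is written, it selects PBN 1, which needs only one page copy.

```python
def select_victim(valid_pages: dict[int, list[int]], spare: int) -> int:
    """Greedy policy: the non-spare block with the fewest valid pages."""
    candidates = {pbn: ppns for pbn, ppns in valid_pages.items() if pbn != spare}
    return min(candidates, key=lambda pbn: len(candidates[pbn]))


def gc_cost(valid_pages: dict[int, list[int]], victim: int) -> dict[str, int]:
    """Flash operations needed to reclaim the victim block."""
    copies = len(valid_pages[victim])
    return {"page_reads": copies, "page_programs": copies, "block_erases": 1}


# Valid physical pages per block just before "Write to page 1" in the example.
valid = {0: [0, 1, 2], 1: [4], 2: [8, 9, 10, 11], 3: []}
victim = select_victim(valid, spare=3)
print(victim, gc_cost(valid, victim))
```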
Example: GC in Page Mapping
• Current state
  – Written to page 0, 1, 2, 8, 4, 5
  – Written to page 9, 3, 5
• New requests (in order)
  – Write to page 8
  – Write to page 9
  – Write to page 3
  – Write to page 1
  – Write to page 4
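[Figure: the starting state is the result of the previous example. Page map: 0→0, 1→1, 2→2, 3→7, 4→4, 5→8, 8→3, 9→6. PBN 0 holds pages 0, 1, 2, 8. PBN 1 holds 4, 5 (invalid), 9, 3. PBN 2 holds page 5 at PPN 8, with PPNs 9-11 free. PBN 3 (PPNs 12-15) is the spare block.]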
Example: GC in Page Mapping
[Figure, after writing page 8: page 8 goes to PPN 9 and the map becomes 8→9. The old copy at PPN 3 is now invalid, and PBN 3 remains the spare block.]
Example: GC in Page Mapping
[Figure, after writing page 9: page 9 goes to PPN 10 and the map becomes 9→10. The old copy at PPN 6 is now invalid.]
Example: GC in Page Mapping
[Figure, after writing page 3: page 3 goes to PPN 11 and the map becomes 3→11. The old copy at PPN 7 is now invalid. PBN 2 is full, so only the spare block (PBN 3) is left.]
Example: GC in Page Mapping
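[Figure, writing page 1: no free pages remain outside the spare block, so GC runs. PBN 1 is chosen as the victim. Its only valid page, logical page 4 at PPN 4, is copied to PPN 12 in the spare block (valid page copy), and the map becomes 4→12. The updated page 1 is then written to PPN 13 (updated page write), the map becomes 1→13, and PPN 1 is invalidated.]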
Example: GC in Page Mapping
[Figure, after writing page 4: the victim PBN 1 has been erased and becomes the new spare block. Page 4 is written to PPN 14, the map becomes 4→14, and PPN 12 is invalidated. Final map: 0→0, 1→13, 2→2, 3→11, 4→14, 5→8, 8→9, 9→10.]
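The whole GC example can be replayed with a page-mapping FTL that keeps one erased block in reserve and runs GC once only that reserve remains. The Python sketch below is minimal and makes several assumptions. The class name `PageFTLWithGC` is hypothetical, and writes fill blocks in order. Victims are chosen greedily (fewest valid pages), and erases are only counted. The sketch reproduces the final map on the slide (1→13, 3→11, 4→14, 8→9, 9→10), with PBN 1 as the new spare. The single extra copy shows up as write amplification: (14 host writes + 1 GC copy) / 14 ≈ 1.07.

```python
from typing import Optional


class PageFTLWithGC:
    """Page-mapping FTL that keeps one erased block in reserve for GC."""

    def __init__(self, num_blocks: int, pages_per_block: int) -> None:
        self.ppb = pages_per_block
        self.num_blocks = num_blocks
        self.page_map: dict[int, int] = {}  # LPN -> PPN
        # PPN -> LPN for valid pages; None for free or invalid pages.
        self.owner: list[Optional[int]] = [None] * (num_blocks * pages_per_block)
        self.free_blocks = list(range(num_blocks))  # erased blocks, oldest first
        self.active: Optional[int] = None           # block currently being filled
        self.next_offset = pages_per_block          # "full" until a block is opened
        self.host_writes = self.gc_copies = self.erases = 0

    def _program(self, lpn: int) -> None:
        """Write lpn to the next page of the active block, invalidating its old page."""
        old = self.page_map.get(lpn)
        if old is not None:
            self.owner[old] = None
        ppn = self.active * self.ppb + self.next_offset
        self.next_offset += 1
        self.owner[ppn] = lpn
        self.page_map[lpn] = ppn

    def _valid_count(self, block: int) -> int:
        start = block * self.ppb
        return sum(lpn is not None for lpn in self.owner[start:start + self.ppb])

    def _garbage_collect(self) -> None:
        spare = self.free_blocks.pop(0)  # the reserved block receives the copies
        in_use = [b for b in range(self.num_blocks)
                  if b != spare and b not in self.free_blocks]
        victim = min(in_use, key=self._valid_count)  # greedy: fewest valid pages
        self.active, self.next_offset = spare, 0
        start = victim * self.ppb
        for lpn in self.owner[start:start + self.ppb]:  # iterate over a copy
            if lpn is not None:
                self._program(lpn)  # valid page copy
                self.gc_copies += 1
        self.erases += 1            # erase the victim (none of its pages are valid now)
        self.free_blocks.append(victim)  # the erased victim is the new spare

    def write(self, lpn: int) -> None:
        if self.next_offset == self.ppb:  # active block is full (or none yet)
            if len(self.free_blocks) > 1:  # open a free block, keeping one in reserve
                self.active, self.next_offset = self.free_blocks.pop(0), 0
            else:
                self._garbage_collect()
                if self.next_offset == self.ppb:
                    raise RuntimeError("GC reclaimed no space: flash is full")
        self._program(lpn)
        self.host_writes += 1


ftl = PageFTLWithGC(num_blocks=4, pages_per_block=4)
for lpn in [0, 1, 2, 8, 4, 5, 9, 3, 5]:  # state at the start of the GC example
    ftl.write(lpn)
for lpn in [8, 9, 3, 1, 4]:              # new requests; writing page 1 triggers GC
    ftl.write(lpn)
print(sorted(ftl.page_map.items()))
# [(0, 0), (1, 13), (2, 2), (3, 11), (4, 14), (5, 8), (8, 9), (9, 10)]
print(ftl.free_blocks)                   # [1]: the erased victim is the new spare
waf = (ftl.host_writes + ftl.gc_copies) / ftl.host_writes
print(f"write amplification: {waf:.2f}")  # (14 host writes + 1 copy) / 14 = 1.07
```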
OS Implications
• NAND flash has different characteristics compared to disks
  – No seek time
  – Asymmetric read/write access times
  – No in-place update
  – Good sequential read/write and random read performance, but bad random write performance
  – Need for wear-leveling
  – …
• Traditional operating systems have been optimized for disks. What should be changed?
SSD Support in OS
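• Turn off “defragmentation” for SSDs
• New “TRIM” command
  – Remove-on-delete: the OS tells the SSD which sectors no longer hold live data, so the FTL can treat those pages as invalid
• Simpler I/O scheduler
• Align file system partition with SSD layout
• Flash-aware file systems (e.g. F2FS in Linux)
• Larger block size (4KB)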
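Partition alignment comes down to simple arithmetic. The partition's starting byte offset should be a multiple of the flash page size, and ideally of a larger unit, so that a file-system block never straddles two flash pages. The Python sketch below assumes 512-byte logical sectors, a 4KB flash page, and the 1 MiB boundary that many modern partitioning tools use by default. The function name is hypothetical, and a drive's actual page and erase-block sizes are often not exposed to the host. The old DOS-style start at sector 63 is misaligned, whereas a start at sector 2048 is aligned.

```python
SECTOR_BYTES = 512        # assumed logical sector size
FLASH_PAGE = 4 * 1024     # assumed flash page size (4KB)
MIB = 1024 * 1024         # common default alignment of partitioning tools


def is_aligned(start_sector: int, boundary: int, sector_bytes: int = SECTOR_BYTES) -> bool:
    """True if a partition starting at start_sector begins on a multiple of boundary bytes."""
    return (start_sector * sector_bytes) % boundary == 0


for start in (63, 2048):
    print(start, is_aligned(start, FLASH_PAGE), is_aligned(start, MIB))
# 63 False False   -> every 4KB file-system block spans two flash pages
# 2048 True True
```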
Beauty and the Beast
• NAND Flash memory is a beauty
  – Small, light-weight, robust, low-cost, low-power non-volatile device
• NAND Flash memory is a beast
  – Much slower program/erase operations (compared with reads)
  – No in-place update
  – Erase unit > write unit
  – Limited lifetime
  – Bit errors, bad blocks, …
• Software support is essential for performance and reliability!
Beyond Flash
• Resistance-based memory technologies
Source: IEEE Computer, August 2013.