Post on 13-Aug-2020
transcript
Wojciech Malikowski (NVM Solutions Group)
SPDK, PMDK & VTune™ Amplifier Summit
Legal Notices and Disclaimers
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.
Intel does not control or audit the design or implementation of third party benchmark data or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmark data are reported and confirm whether the referenced benchmark data are accurate and reflect performance of systems available for purchase.
Intel, the Intel logo, Intel Optane, Xeon, and others are trademarks of Intel Corporation in the U.S. and/or other countries.
© Intel Corporation.
*Other names and brands may be claimed as the property of others.
Agenda
• Flash Translation Layer (FTL) overview
• Example use cases
• Write buffer cache for 4K PLI
• Zoned namespace
WHY DO WE NEED FTL
• A standard SSD provides the Flash Translation Layer inside its firmware
• SPDK FTL provides block device access on top of a non-block SSD device that implements the Open Channel interface
• Open Channel lets the host control NAND flash data placement directly
• FTL logic should be moved from SSD firmware to the host
[Figure: Standard SSD vs. Open Channel SSD. Standard SSD: the host issues READ/WRITE on logical blocks; the SSD runs the Flash Translation Layer (data placement, IO scheduling, garbage collection) and media handling, and issues READ/WRITE/ERASE to the NAND dies. Open Channel SSD: the host runs the Flash Translation Layer (data placement, IO scheduling, garbage collection) and issues READ/WRITE/ERASE on physical pages; the SSD keeps media handling and issues READ/WRITE/ERASE to the NAND dies.]
FTL OVERVIEW: CORE COMPONENTS/CONCEPTS
• Geometry
• L2P table
• Write buffer
• Metadata
• Relocation module
• Abstracted NAND management (ANM)
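FTL OVERVIEW: GEOMETRY
• GROUP
• PU – parallel unit (tens of GB)
• CHUNK (tens of MB)
• LOGICAL BLOCK (4K)
[Figure: Open Channel SSD geometry. A media controller accepts READ/WRITE/ERASE on physical addresses and sits in front of two groups (GROUP 1, GROUP 2), each containing parallel units PU 1 to PU 3. A PU is divided into chunks, and a chunk into logical blocks.]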
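The geometry hierarchy above (group, parallel unit, chunk, logical block) can be pictured as a four-part physical address. The following is a minimal sketch, not SPDK code: the names `Geometry`, `PhysAddr`, `to_linear`, and `from_linear` are hypothetical, and the flat index is only one possible way to pack the four fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Geometry:
    """Device dimensions (hypothetical names, for illustration only)."""
    num_groups: int
    pus_per_group: int
    chunks_per_pu: int
    blocks_per_chunk: int

    def total_blocks(self) -> int:
        return (self.num_groups * self.pus_per_group
                * self.chunks_per_pu * self.blocks_per_chunk)


@dataclass(frozen=True)
class PhysAddr:
    """Physical address: group -> parallel unit -> chunk -> logical block."""
    group: int
    pu: int
    chunk: int
    block: int


def to_linear(geo: Geometry, addr: PhysAddr) -> int:
    """Pack a physical address into a single flat block index."""
    index = addr.group * geo.pus_per_group + addr.pu
    index = index * geo.chunks_per_pu + addr.chunk
    return index * geo.blocks_per_chunk + addr.block


def from_linear(geo: Geometry, index: int) -> PhysAddr:
    """Inverse of to_linear: unpack a flat block index."""
    index, block = divmod(index, geo.blocks_per_chunk)
    index, chunk = divmod(index, geo.chunks_per_pu)
    group, pu = divmod(index, geo.pus_per_group)
    return PhysAddr(group, pu, chunk, block)
```

For example, with `Geometry(2, 3, 4, 8)` the address `PhysAddr(1, 2, 3, 5)` round-trips through `to_linear` and `from_linear` unchanged.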
FTL OVERVIEW: BAND
• Band - a collection of chunks, each belonging to a different parallel unit
• The FTL write pointer iterates over the chunks in a band to achieve maximum write parallelism
• A band can be in the open, closed, or free state
[Figure: Open Channel SSD with a media controller (READ/WRITE/ERASE on physical addresses) and two groups of parallel units (PU 1 to PU 3 each). Every PU holds chunks CHK0 to CHKN. BAND 0 to BAND N each span one chunk per parallel unit.]
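The write-pointer behaviour described above can be sketched as a round-robin walk over the chunks of a band. This is an illustration under assumptions, not the SPDK implementation: `Band`, `write_order`, and the `xfer_blocks` stride are hypothetical names, and the slides do not state the exact order or stride SPDK uses.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Band:
    """One chunk per parallel unit (hypothetical structure)."""
    chunk_pus: List[int]      # parallel unit that owns each chunk of the band
    blocks_per_chunk: int


def write_order(band: Band, xfer_blocks: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield (pu, block_offset) pairs in the order writes are issued.

    Consecutive writes land on different parallel units, so they can be
    serviced in parallel; the offset advances only after every chunk in
    the band has received one transfer.
    """
    for offset in range(0, band.blocks_per_chunk, xfer_blocks):
        for pu in band.chunk_pus:
            yield pu, offset
```

With three parallel units, four blocks per chunk, and a stride of two blocks, the order is PU 0, 1, 2 at offset 0, then PU 0, 1, 2 at offset 2.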
FTL OVERVIEW: L2P TABLE
LBA   GROUP   PU   CHUNK   BLOCK
0     0       1    8       0
4K    2       0    1       3
8K    2       1    3       5
…
• Maps logical address to physical address on disk
• Allocated in DRAM
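As a rough illustration of the table above, the sketch below keeps one in-memory entry per 4K logical block and maps it to a (group, PU, chunk, block) tuple. The class `L2PTable` and its methods are hypothetical names chosen for this example; the actual SPDK data layout is not described in the slides.

```python
from typing import List, Optional, Tuple

PhysAddr = Tuple[int, int, int, int]   # (group, pu, chunk, block)
BLOCK_SIZE = 4096                      # 4K logical block, as on the slide


class L2PTable:
    """Logical-to-physical map held in memory, one entry per logical block."""

    def __init__(self, num_lbas: int) -> None:
        self._entries: List[Optional[PhysAddr]] = [None] * num_lbas

    def update(self, byte_offset: int, addr: PhysAddr) -> Optional[PhysAddr]:
        """Point a logical block at a new location; return the old location.

        The returned address is the copy that has just become invalid.
        """
        index = byte_offset // BLOCK_SIZE
        old = self._entries[index]
        self._entries[index] = addr
        return old

    def lookup(self, byte_offset: int) -> Optional[PhysAddr]:
        """Return the physical address for a logical offset, or None if unmapped."""
        return self._entries[byte_offset // BLOCK_SIZE]
```

Loading the three rows from the slide and then overwriting the 4K entry returns the previous location `(2, 0, 1, 3)`, which is the block a defragmentation pass would later treat as invalid.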
FTL OVERVIEW: METADATA
• Metadata is stored in each band so that the device can be restored and bands can be defragmented
• Head metadata is written when a band is opened; tail metadata is written when the band becomes full
• Head metadata contains: device UUID, sequence number, version, etc.
• Tail metadata contains the LBA map for its band and its validity
[Figure: BAND 0 to BAND N laid out across chunks CHK0 to CHKN of the parallel units. Detail of one band (CHUNK 1 to CHUNK N): HEAD META holds the UUID, version, and sequence number; TAIL META holds the physical-to-logical address map and the valid map.]
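One way to picture how this metadata supports restoring the device is to replay the tail metadata of every band in sequence-number order, so that the newest copy of each LBA wins. This is a sketch under that assumption only; the slides do not describe the actual recovery procedure, and `BandMeta` and `rebuild_l2p` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class BandMeta:
    """Per-band metadata as listed on the slide (field names are assumed)."""
    band_id: int
    seq: int                          # head metadata: sequence number
    lba_map: List[Optional[int]]      # tail metadata: LBA stored at each block offset
    valid: List[bool]                 # tail metadata: validity of each block offset


def rebuild_l2p(bands: Iterable[BandMeta]) -> Dict[int, Tuple[int, int]]:
    """Rebuild LBA -> (band_id, block_offset), oldest band first.

    Assumption: a higher sequence number means the band was written later,
    so its entries overwrite those of older bands.
    """
    l2p: Dict[int, Tuple[int, int]] = {}
    for band in sorted(bands, key=lambda b: b.seq):
        for offset, (lba, ok) in enumerate(zip(band.lba_map, band.valid)):
            if ok and lba is not None:
                l2p[lba] = (band.band_id, offset)
    return l2p
```

If LBA 11 appears in an older band and again in a newer one, the rebuilt table points at the newer copy.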
FTL OVERVIEW: WRITE BUFFER
• An Open Channel SSD defines a minimal write size unit, which can be a multiple of the block size, e.g. 32K
• The write buffer collects writes until they can be submitted to the disk
[Figure: Write buffer organized as BATCH 1 to BATCH N, each batch holding ENTRY 1 to ENTRY N.]
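The batching idea can be sketched as follows: 4K entries accumulate until they fill one minimal write unit (32K in the slide's example, so eight entries), and only then is a batch handed to the device. `WriteBuffer` and `push` are hypothetical names for this illustration, not SPDK APIs.

```python
from typing import List, Optional, Tuple

BLOCK_SIZE = 4096   # 4K logical block

Entry = Tuple[int, bytes]   # (lba, data)


class WriteBuffer:
    """Collects single-block writes into batches of the minimal write size."""

    def __init__(self, min_write_bytes: int = 32 * 1024) -> None:
        if min_write_bytes % BLOCK_SIZE != 0:
            raise ValueError("minimal write size must be a multiple of the block size")
        self.entries_per_batch = min_write_bytes // BLOCK_SIZE
        self._pending: List[Entry] = []

    @property
    def pending(self) -> int:
        """Number of entries waiting for the current batch to fill."""
        return len(self._pending)

    def push(self, lba: int, data: bytes) -> Optional[List[Entry]]:
        """Queue one block; return a full batch once enough entries exist."""
        if len(data) != BLOCK_SIZE:
            raise ValueError("each entry must be exactly one logical block")
        self._pending.append((lba, data))
        if len(self._pending) < self.entries_per_batch:
            return None
        batch, self._pending = self._pending, []
        return batch
```

With the default 32K unit, the first seven calls to `push` return `None` and the eighth returns a batch of eight entries ready to be written.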
FTL OVERVIEW: RELOCATION MODULE
• Manages the band defragmentation process
• Each band has its own merit, based on its age, write count, and validity
[Figure: A band before relocation and after relocation, with valid and invalid blocks marked.]
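The slides do not give the merit formula, so the sketch below swaps in the classic cost-benefit ratio from log-structured file system cleaning, (1 - u) * age / (1 + u), where u is the fraction of still-valid blocks. It uses only age and validity; write count is left out. Treat it as a stand-in to show how a merit score can drive band selection, not as SPDK's calculation. `BandStats`, `merit`, and `pick_band` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class BandStats:
    band_id: int
    age: float           # time since the band was closed, arbitrary units
    valid_blocks: int
    total_blocks: int


def merit(band: BandStats) -> float:
    """Cost-benefit score: higher means a better defragmentation candidate.

    Stand-in formula (assumption): free space gained, weighted by age,
    divided by the cost of reading the band and rewriting its valid data.
    """
    u = band.valid_blocks / band.total_blocks
    return (1.0 - u) * band.age / (1.0 + u)


def pick_band(bands: Iterable[BandStats]) -> Optional[BandStats]:
    """Return the band with the highest merit, skipping fully valid bands."""
    candidates = [b for b in bands if b.valid_blocks < b.total_blocks]
    return max(candidates, key=merit, default=None)
```

A mostly invalid band of moderate age outranks an old band that is still 90% valid, because relocating the latter moves a lot of data to reclaim little space.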
FTL OVERVIEW: ANM HANDLING
• ANM: Abstracted NAND management events
• The FTL relocates data in response to certain events from the SSD, such as read disturb or background data refresh
• It uses the Asynchronous Event Information specified in Section 5.2 of the NVMe 1.3 specification
• Vendor Specific Notification - Get Log Page - Chunk Notification Log Entry (Log Identifier D0h)
[Figure: The Open Channel SSD host/device split shown earlier (host: Flash Translation Layer with data placement, IO scheduling, garbage collection; SSD: media handling and NAND dies), annotated with ANM.]
Current state
• NVMe* driver with modifications for Open Channel - spdk/include/nvme_ocssd.h
• BDEV FTL module - spdk/lib/bdev/nvme/bdev_ftl.h
• FTL library - spdk/lib/ftl
• Upstreamed and merged to SPDK* 19.01
*Other names and brands may be claimed as the property of others.
[Figure: SPDK software stack, SPDK 18.07 vs. SPDK 19.01. Labels shown: Bdev layer, FTL bdev, FTL library, NVMe driver, Open Channel API, OC SSD.]
DATA ISOLATION EXAMPLE
[Figure: Bdev layer over the NVMe driver (Open Channel API) over an OC SSD. The SSD's media controller accepts READ/WRITE/ERASE on physical addresses across parallel units PU0 to PU11. Each command below adds or removes one FTL bdev / FTL library instance in the stack.]
• construct_ftl_bdev -b ftl1 -l 0-0
• construct_ftl_bdev -b ftl2 -l 1-4
• construct_ftl_bdev -b ftl3 -l 5-10
• delete_ftl_bdev -b ftl2
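The isolation in this example comes from giving each FTL bdev its own, non-overlapping range of parallel units. The sketch below models only that bookkeeping; it is not the SPDK RPC implementation, and `PuAllocator`, `construct`, and `delete` are hypothetical names. The range strings follow the `first-last` form used in the commands above.

```python
from typing import Dict, Tuple


class PuAllocator:
    """Tracks which parallel-unit range belongs to which FTL bdev."""

    def __init__(self, num_pus: int) -> None:
        self.num_pus = num_pus
        self.owners: Dict[str, Tuple[int, int]] = {}

    def construct(self, name: str, pu_range: str) -> None:
        """Reserve an inclusive 'first-last' PU range for a new bdev."""
        first, last = (int(part) for part in pu_range.split("-"))
        if not 0 <= first <= last < self.num_pus:
            raise ValueError(f"range {pu_range} is outside the device")
        if name in self.owners:
            raise ValueError(f"{name} already exists")
        for other, (lo, hi) in self.owners.items():
            if first <= hi and lo <= last:
                raise ValueError(f"range {pu_range} overlaps {other}")
        self.owners[name] = (first, last)

    def delete(self, name: str) -> None:
        """Release the PU range held by a bdev."""
        del self.owners[name]
```

Replaying the four commands from the example on a 12-PU device leaves `ftl1` on PU 0 and `ftl3` on PUs 5 to 10, with PUs 1 to 4 free for reuse.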
NEXT STEPS: Write buffer cache for 4K PLI
[Figure: SPDK software stack across SPDK 18.07, SPDK 19.01, and FUTURE. Existing labels: Bdev layer, FTL bdev, FTL library, NVMe driver, Open Channel API, Open Channel NVMe SSD. Labels added in the build: Intel® Optane SSD, NVMe API, Nvme Bdev, LVOL1, LVOL2, VBDEV mdcache 1, VBDEV mdcache 2.]
NEXT STEPS: ZONED NAMESPACE SUPPORT
[Figure: SPDK software stack across SPDK 18.07, SPDK 19.01, and FUTURE. Labels shown: Bdev/ZNS layer, FTL bdev, FTL library, NVMe driver, Open Channel API, Open Channel NVMe SSD, OCSSD/ZNS adapter, ZNS API, ZNS NVMe SSD.]
Summary
• FTL built on top of Open Channel provides more control to applications
• Extra control can be used to provide:
  • Better isolation
  • WAF reduction
  • Better QoS
Start using FTL with SPDK today: https://spdk.io/doc/ftl.html