Introduction to Solid State Drives

Posted on 29-Nov-2014

These are the slides from a tutorial I presented at LOPSA-East in 2013. It covers spinning media and solid state drives in detail. A video of the presentation can be found on YouTube: http://www.youtube.com/watch?v=G3wf1HMr6b0

Introduction to Solid State Drive Technology

Saturday, November 16, 13

Class Overview

• The Evolution of Storage Technology

• Spinning Disks

• Storage Metrics

• Solid State Technology

Goals: a better understanding of spinning disks, an understanding of high- and low-level flash, and the issues with SSDs.

The Evolution of Storage Technology

Pre-History

Charts: density over time; speed over time

Spinning Disks

The Parts of a Hard Drive

The Parts of a Hard Drive: Platters

The Parts of a Hard Drive: Actuator Arms and Heads

The Parts of a Hard Drive: Controller and Interface

Voltron Force Assemble!

Disk Interface

• Removable (USB/CF)

• SATA I / II / III

• Nearline SAS

• SAS

• Fibre Channel

• PCI-e

Spinning Disk Removable Media

• Advantages:

• Nigh Universal

• Disadvantages:

• Slower

• Fragile

• Easily lost.

• Abstraction Layer

Disk Interface

USB

SATA I / II / III

• Speeds: 1.5 / 3 / 6Gb/s

• Requires AHCI for things like NCQ

• Subset of SAS

• Shares IDE command set

Disk Interface

AHCI: Advanced Host Controller Interface (IDE mode is OK for TRIM). NCQ on an SSD ensures the SSD has work to do while the host is latent (Intel can queue 32 requests). Logo from SATA-IO (the international organization).

SATA 3.1

• Approved July 2011

• Universal Storage Module

• mSATA

• QTRIM

Disk Interface

(This time, it’s personal)

QTRIM: queued TRIM commands. USM is a mobile drive standard.

SAS / Nearline SAS

• SAS

• Enhanced CRC checking

• 512/520/528-byte blocks

• Low density, high reliability

• Nearline SAS

• ...not so much

Serial Attached SCSI: 16 bits of CRC
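The extra 8 bytes of a 520-byte sector are commonly used for T10 Protection Information (DIF): a 2-byte guard tag holding a CRC-16 of the 512 data bytes, a 2-byte application tag, and a 4-byte reference tag. The sketch below assumes that Type 1 layout and the T10-DIF CRC-16 parameters (polynomial 0x8BB7, zero initial value, no reflection, no final XOR); the function names are illustrative, not a driver API.

```python
import struct

T10_DIF_POLY = 0x8BB7  # CRC-16 polynomial used for the T10 DIF guard tag

def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16/T10-DIF: init 0, MSB-first, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10_DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def build_520_byte_sector(data: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Append 8 bytes of protection info (guard, app tag, ref tag) to 512 data bytes."""
    if len(data) != 512:
        raise ValueError("expected exactly 512 data bytes")
    guard = crc16_t10dif(data)
    # Big-endian: 2-byte guard CRC, 2-byte application tag, 4-byte reference tag
    # (Type 1 protection uses the low 32 bits of the LBA as the reference tag).
    return data + struct.pack(">HHI", guard, app_tag, lba & 0xFFFFFFFF)

def verify_sector(sector: bytes) -> bool:
    """Recompute the guard CRC over the data and compare it with the stored one."""
    data, protection_info = sector[:512], sector[512:]
    guard, _app_tag, _ref_tag = struct.unpack(">HHI", protection_info)
    return crc16_t10dif(data) == guard
```

Because a 16-bit CRC catches every error burst of up to 16 bits, corrupting any single byte of the data makes `verify_sector` return False.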

Disk Geometry

Disk Geometry: Platters

Disk Geometry: Tracks

Disk Geometry: Cylinders

Disk Geometry: Sectors

Logical Block Addressing

• First introduced as an abstraction layer

• Replaced CHS addressing

• Address Space is Linear (block 0 - n)

• Size of address space depends on the standard at time of manufacture.

Disk Geometry

Currently at 48-bit LBA.
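A sketch of the classic CHS-to-LBA conversion and of how the width of the LBA field bounds capacity. CHS sector numbers start at 1, which is why one is subtracted; the 16-head, 63-sectors-per-track geometry in the usage note is only an illustrative assumption, as is the 512-byte sector size.

```python
def chs_to_lba(c: int, h: int, s: int, heads_per_cyl: int, sectors_per_track: int) -> int:
    """Convert a cylinder/head/sector address to a linear block address."""
    if not 1 <= s <= sectors_per_track:
        raise ValueError("CHS sector numbers start at 1")
    return (c * heads_per_cyl + h) * sectors_per_track + (s - 1)

def lba_to_chs(lba: int, heads_per_cyl: int, sectors_per_track: int) -> tuple:
    """Inverse mapping, for drives that still report a geometry."""
    c, rem = divmod(lba, heads_per_cyl * sectors_per_track)
    h, s0 = divmod(rem, sectors_per_track)
    return c, h, s0 + 1

def max_capacity_bytes(lba_bits: int, sector_size: int = 512) -> int:
    """Largest addressable capacity for a given LBA width."""
    return (2 ** lba_bits) * sector_size
```

With a 16/63 geometry, `chs_to_lba(1, 0, 1, 16, 63)` is 1008. With 512-byte sectors, 28-bit LBA tops out at 137,438,953,472 bytes (128 GiB), while 48-bit LBA reaches 2^57 bytes (128 PiB).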

Variables Affecting Spinning Disk IO Rate

Platter Rotational Speed
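Rotational speed sets the average rotational latency: on average the head waits half a revolution for the target sector to come around, so the latency in milliseconds is 60,000 / (2 × RPM). A minimal sketch:

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for half a platter revolution, in milliseconds."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2

for rpm in (5400, 7200, 10_000, 15_000):
    print(f"{rpm:>6} RPM: {avg_rotational_latency_ms(rpm):.2f} ms")
```

That works out to roughly 5.56, 4.17, 3.00 and 2.00 ms respectively, before any seek time is added.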

Seek Speed

Data Density

Variables Affecting Spinning Disk Speed: Controller Cache

Cache size / battery-backed cache / hybrid drives

Spinning Disk Damage Vectors

Movement

• Movement vertical or parallel to platter

• Measured in G forces

• Head Crashes

• Spinning Down

• Head uses “Landing Strip”

• Repeated platter contact causes damage to the read/write head

Spinning Disk Damage Vectors

Used to manually park the heads. Putting your computer to sleep can cause the heads to park. Nanocoating on the bumpy landing strip.

Protection against movement

• “Active Drive Protection”: Free-fall sensor

• Has a lift arm to lift the head away from the platter

• Some protection systems are in the drive, some are in the controller

• Don’t mix the two

Apple: Sudden motion sensor, Lenovo/IBM: Hard Drive Active Protection System

Next slide: “You know, vibrations are movement...”

You know, vibrations are movement...

Yes, vibrations are important, too

Shelf Life

• Oil / Lubricants in bearings

• Temperature fluctuations

• Magnetic “events” (bit rot)

• Outgassing / vapor removal

Spinning Disk Damage Vectors

Long-term “archival quality” drives use long-life lubricant. Long-term “cold storage” arrays periodically spin up drives every few weeks to clean and scrub the data.

Spinning Disks in RAID

• Redundant Array of Inexpensive Disks

• Common RAID levels:

• 0, 1, 5, 6, 10

• Software / Hardware

Important Considerations

• Redundancy

• Capacity

• Speed

• Robust

Spinning Disks in RAID

Speed: Dedicated hardware? Single point of failure? Parity calculation? How long to rebuild a drive? NUMBER OF SPINDLES!! Redundancy: How many drive failures can it survive? UREs (unrecoverable read errors)? Capacity: Parity stripe or mirrors? (Harder, better, faster, stronger.)
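Two of the questions above lend themselves to quick arithmetic: how much usable capacity each RAID level leaves, and how likely a rebuild is to hit an unrecoverable read error. The sketch assumes identical drives, and the default URE rate of one per 10^14 bits is a typical consumer-drive datasheet figure used only as an example.

```python
import math

MIN_DRIVES = {0: 1, 1: 2, 5: 3, 6: 4, 10: 4}

def usable_capacity_tb(level: int, drives: int, drive_tb: float) -> float:
    """Usable capacity for common RAID levels, assuming identical drives."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level {level}")
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == 0:
        return drives * drive_tb            # striping only, no redundancy
    if level == 1:
        return drive_tb                     # every drive holds a full mirror
    if level == 5:
        return (drives - 1) * drive_tb      # one drive's worth of parity
    if level == 6:
        return (drives - 2) * drive_tb      # two drives' worth of parity
    return (drives // 2) * drive_tb         # RAID 10: striped mirror pairs

def p_ure_during_rebuild(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error while reading bytes_read."""
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed via log1p for numerical stability
    return 1 - math.exp(bits * math.log1p(-ure_per_bit))
```

For example, rebuilding a 4 × 2 TB RAID 5 means reading the three surviving drives, and `p_ure_during_rebuild(3 * 2e12)` comes out at roughly 0.38 under that assumed URE rate, which is one reason RAID 6 is often preferred for large drives.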

Advantages

• Linear Speed

• Price (Per Gigabyte)

• Well-Understood

Spinning Disks in RAID

Disadvantages

• Random Speed

• Price (per IOPS)

• Failure Rate

• Rebuild Speed

Spinning Disks in RAID

Storage Metrics

IOPS

• What are they?

• What aren’t they?

The Simplified Equation

$$\mathrm{IOPS} = \frac{1}{\frac{(R+W)/2}{1000} + \frac{L}{1000}}$$

R = Average Read Time (ms)
W = Average Write Time (ms)
L = Average Latency (ms)
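The equation translates directly into code, with R, W and L in milliseconds (the divisions by 1000 convert them to seconds). The seek and latency figures in the example are assumptions chosen to resemble a 7200 RPM drive: about 9 ms average seek, plus 4.17 ms average rotational latency from half a revolution.

```python
def drive_iops(avg_read_ms: float, avg_write_ms: float, avg_latency_ms: float) -> float:
    """Simplified per-drive IOPS: 1 / (average seek time + average latency), in seconds."""
    avg_seek_s = ((avg_read_ms + avg_write_ms) / 2) / 1000
    latency_s = avg_latency_ms / 1000
    return 1 / (avg_seek_s + latency_s)

latency_7200 = 60_000 / (2 * 7200)                  # half a revolution, in ms
print(round(drive_iops(8.5, 9.5, latency_7200)))    # 76
```

The result sits just under the 80-100 IOPS rule of thumb for 7200 RPM drives.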

Rule of Thumb Assumptions

RPM     IOPS
5400    50-80
7200    80-100
10k     130-150
15k     180-200

Determining IOPS

• Per Drive

• Manufacturer’s Stated Numbers

• Rule of Thumb

• Per RAID Array

• Write penalty
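One common way to account for the write penalty is to divide the array's raw back-end IOPS by the weighted cost of each host IO. The penalties used here (1 for RAID 0, 2 for RAID 1/10, 4 for RAID 5, 6 for RAID 6) are the usual textbook values for small random writes; controllers with write-back cache or full-stripe writes can do better, so treat this as a rough sketch.

```python
WRITE_PENALTY = {0: 1, 1: 2, 10: 2, 5: 4, 6: 6}   # back-end IOs per host write

def array_iops(drives: int, iops_per_drive: float, raid_level: int, read_fraction: float) -> float:
    """Estimated host-visible random IOPS for a RAID array under a read/write mix."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    raw = drives * iops_per_drive
    write_fraction = 1.0 - read_fraction
    # Each host read costs one back-end IO; each host write costs `penalty` back-end IOs.
    return raw / (read_fraction + WRITE_PENALTY[raid_level] * write_fraction)

print(round(array_iops(8, 150, 5, 0.7)))   # 632
```

Eight 150-IOPS drives give 1200 raw IOPS, but with a 70/30 read/write mix on RAID 5 the host sees only about 632.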

IO Profiling

• Active Tools

• Bonnie++

• dd

• Intel NAS Toolkit

• Passive Tools

• io(stat/meter/top), atop

• Resource Monitor / Process Explorer

http://www.intel.com/products/server/storage/NAS_Perf_Toolkit.htm
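Passive tools such as iostat derive their numbers from counters the kernel already keeps. On Linux those counters are in /proc/diskstats, where each row starts with the major number, minor number and device name, followed by reads completed (the 4th column) and, four columns later, writes completed (the 8th). A minimal sketch that samples the file twice and reports per-device read and write IOPS; the one-second interval is an arbitrary choice, and the sleep makes the interval approximate.

```python
import time

def read_counters(text: str) -> dict:
    """Map device name -> (reads_completed, writes_completed) from /proc/diskstats text."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue
        counters[fields[2]] = (int(fields[3]), int(fields[7]))
    return counters

def iops_between(before: dict, after: dict, seconds: float) -> dict:
    """Per-device (read IOPS, write IOPS) between two samples."""
    result = {}
    for dev, (r1, w1) in after.items():
        if dev in before:
            r0, w0 = before[dev]
            result[dev] = ((r1 - r0) / seconds, (w1 - w0) / seconds)
    return result

def sample(interval: float = 1.0, path: str = "/proc/diskstats") -> dict:
    """Take two snapshots `interval` seconds apart and return per-device IOPS."""
    with open(path) as f:
        before = read_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = read_counters(f.read())
    return iops_between(before, after, interval)
```

On a Linux host, `print(sample())` prints a dictionary such as `{'sda': (r/s, w/s), ...}`.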

Solid State Drive Technology

NOR Flash

• Reads and writes are atomic single-bit

• Expensive

• Small specific use cases

Won’t talk about NOR much.

NAND Flash

• Reads are based on “read blocks” (4k)

• Writes are based on “erasure blocks”

• Cheap (and getting cheaper)

• Broad use cases

Read / Write Profiles

• Logical addresses abstracted from LBA

• No seek time

• Reads are generally very fast

• Writes are typically slower

Random and linear IO have identical access time. Next slide: The Magic.

The Magic

Insulating Barrier

Pure Silicon

Doped silicon capable of holding an electrical charge

Barrier is a dielectric film (silicon oxide)

Quantum Tunneling

(transmission coefficient for a particle tunneling through a single potential barrier)

Hot carrier injection. Storage / erase uses Fowler-Nordheim tunnel injection / release.
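The slide's transmission-coefficient formula did not survive extraction. For reference, the standard approximation for an electron of energy E tunneling through a rectangular barrier of height V and thickness d (with E < V, and valid when κd is large), where m is the electron mass and ħ the reduced Planck constant, shows why the oxide barrier's thickness matters so much: the tunneling probability falls off exponentially with d. This is the textbook rectangular-barrier case, not necessarily the exact formula on the original slide; Fowler-Nordheim tunneling treats the field-tilted (triangular) barrier.

```latex
T \approx e^{-2\kappa d}, \qquad \kappa = \frac{\sqrt{2m\,(V - E)}}{\hbar}
```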

Doped Silicon

Single-Level Cell (SLC)

Multi-Level Cell (MLC)

Triple-Level Cell (TLC)

Charge pumps are used to get electrons through the barrier. Each charge level represents a binary value (for SLC, 1 or 0).

Gradual Destruction

Energy increases with cell levels

Multiple cells need multiple writes

Barrier accumulates electrons

Electrical potential difference of barrier and cells disappears

Difficulty Going Forward

SLC: 0, 1

MLC: 00, 01, 10, 11

TLC: 000, 001, 010, 011, 100, 101, 110, 111

4LC: 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111
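Each added bit doubles the number of charge levels a cell must hold apart (2 for SLC, 4 for MLC, 8 for TLC, 16 for 4LC), which shrinks the voltage margin between neighbouring levels. Multi-level designs commonly assign bit patterns to levels in Gray-code order, so that misreading a cell as an adjacent level flips only one bit. The sketch below illustrates that idea; it is not any particular vendor's mapping.

```python
def charge_levels(bits_per_cell: int) -> int:
    """Number of distinct charge levels a cell must hold."""
    return 2 ** bits_per_cell

def gray_code_levels(bits_per_cell: int) -> list:
    """Bit patterns ordered so that neighbouring charge levels differ in exactly one bit."""
    return [format(i ^ (i >> 1), f"0{bits_per_cell}b")
            for i in range(charge_levels(bits_per_cell))]

for name, bits in (("SLC", 1), ("MLC", 2), ("TLC", 3), ("4LC", 4)):
    print(name, charge_levels(bits), gray_code_levels(bits))
```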

Density

• 3-Dimensional

• Charge levels

• Size of cells

• “Dot Pitch” (Cells Per Inch)

• 5nm, 3nm, 2nm

• Varies with “level” count

SLC / ESLC

• Low Density

• Single (bit) Level Cell

• Quick: 25µs Read / 200-300µs Write

• More robust & long wear time

• Write endurance near 100,000 cycles

Expensive per unit of capacity; only available in 5nm / 3nm densities.

MLC / EMLC

• Reasonably High Density

• Two (bit) Level Cell

• Decently fast: 50µs Read / 600-900µs Write

• Medium lifetime

• Write endurance near 3,000 cycles

TLC

• Very High Density

• Three (bit) Level Cell

• Decently fast: 75µs Read / 900-1350µs Write

• Not very robust or durable :-(

• Write endurance ~ 1,000 cycles
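The endurance figures above can be turned into a rough lifetime estimate: the host data a drive can absorb is about capacity × P/E cycles, divided by the write amplification factor. The 256 GB capacity, write amplification of 2, and 50 GB/day workload in the example are assumptions; real drives publish their own endurance ratings.

```python
def host_writes_tb(capacity_gb: float, pe_cycles: int, write_amplification: float = 1.0) -> float:
    """Rough host data (TB) writable before the rated P/E cycles are used up."""
    if write_amplification < 1.0:
        raise ValueError("write amplification is at least 1 in this model")
    return capacity_gb * pe_cycles / write_amplification / 1000

def lifetime_years(tb_writable: float, gb_written_per_day: float) -> float:
    """How long that budget lasts at a steady daily write volume."""
    return tb_writable * 1000 / gb_written_per_day / 365

for cls, cycles in (("SLC", 100_000), ("MLC", 3_000), ("TLC", 1_000)):
    tb = host_writes_tb(256, cycles, write_amplification=2.0)
    print(f"{cls}: ~{tb:,.0f} TB, ~{lifetime_years(tb, 50):.1f} years at 50 GB/day")
```

Under these assumptions the MLC drive can absorb about 384 TB of host writes; halving the write amplification doubles that figure, which is why the next section matters.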

Write Amplification and Garbage Collection

Block Sizes

• Read Block

• 4k (aka “page”)

• Erasure Block

• (Large) multiple of 4k

• aka “block” (256KB erasure block size)

e-ink parallel
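With the sizes on this slide, an erasure block holds 256 KB / 4 KB = 64 pages. In the naive read-modify-write case the following slides illustrate, changing one 4K page in a full erasure block forces the controller to rewrite the whole block, so flash writes are 64 times the host write. A minimal sketch of that arithmetic, assuming those sizes:

```python
PAGE_BYTES = 4 * 1024             # read/program unit ("page")
ERASE_BLOCK_BYTES = 256 * 1024    # erasure block size from the slide

def pages_per_erase_block(page: int = PAGE_BYTES, block: int = ERASE_BLOCK_BYTES) -> int:
    if block % page:
        raise ValueError("erase block must be a whole number of pages")
    return block // page

def naive_rmw_write_amplification(pages_changed: int, page: int = PAGE_BYTES,
                                  block: int = ERASE_BLOCK_BYTES) -> float:
    """Flash bytes written / host bytes written when the full block must be rewritten."""
    per_block = pages_per_erase_block(page, block)
    if not 1 <= pages_changed <= per_block:
        raise ValueError("pages_changed must fit within one erase block")
    return per_block / pages_changed
```

Rewriting a single page gives a write amplification of 64; rewriting all 64 pages gives 1, which is why large sequential writes are gentler on flash than scattered small ones.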

Write Amplification

Written Data

Empty Cell

next - want to change the data in the upper right quadrant

Write Amplification

Written Data

Empty Cell

New Data

Old Data

next - big chunk of new data to write

Write Amplification

Written Data

Empty Cell

Old Data

New Data

Where does this go? We’re out of empty erasure blocks!

Write Amplification

Written Data

Empty Cell

New data written over old cell

w/o TRIM

Write Amplification

Written Data

Empty Cell

New data written over old cell

w/ TRIM

Garbage Collection
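Garbage collection can be illustrated with a toy page-mapped flash translation layer. Overwrites land in free pages and leave the old copy marked invalid; when no free page is left, the controller picks the block with the most invalid pages, relocates its still-valid pages, and erases it. This is a hypothetical, greatly simplified model (greedy victim selection, relocation through controller RAM, no wear leveling), and the class and method names are illustrative. Its `trim` method shows why TRIM helps: pages the host has deleted become invalid immediately instead of being copied around by garbage collection.

```python
import random

FREE, VALID, INVALID = 0, 1, 2

class ToyFTL:
    """Hypothetical page-mapped FTL with greedy garbage collection."""

    def __init__(self, blocks: int, pages_per_block: int, logical_pages: int):
        if logical_pages >= blocks * pages_per_block:
            raise ValueError("need some spare (over-provisioned) pages")
        self.ppb = pages_per_block
        self.logical_pages = logical_pages
        self.state = [[FREE] * pages_per_block for _ in range(blocks)]
        self.owner = [[None] * pages_per_block for _ in range(blocks)]
        self.mapping = {}                 # logical page -> (block, page)
        self.host_writes = self.flash_writes = self.erases = 0

    def _find_free(self):
        for b, pages in enumerate(self.state):
            for p, s in enumerate(pages):
                if s == FREE:
                    return b, p
        return None

    def _invalidate(self, lpn):
        old = self.mapping.pop(lpn, None)
        if old is not None:
            b, p = old
            self.state[b][p] = INVALID
            self.owner[b][p] = None

    def _place(self, lpn, b, p):
        self.state[b][p] = VALID
        self.owner[b][p] = lpn
        self.mapping[lpn] = (b, p)
        self.flash_writes += 1

    def _collect(self):
        # Greedy: the block with the most invalid pages is cheapest to reclaim.
        victim = max(range(len(self.state)), key=lambda b: self.state[b].count(INVALID))
        survivors = [self.owner[victim][p] for p in range(self.ppb)
                     if self.state[victim][p] == VALID]
        self.state[victim] = [FREE] * self.ppb     # erase the whole block
        self.owner[victim] = [None] * self.ppb
        self.erases += 1
        for p, lpn in enumerate(survivors):         # copy valid data back: extra flash writes
            self._place(lpn, victim, p)

    def write(self, lpn: int):
        if not 0 <= lpn < self.logical_pages:
            raise IndexError("logical page out of range")
        self.host_writes += 1
        self._invalidate(lpn)                       # the old copy becomes garbage
        slot = self._find_free()
        if slot is None:
            self._collect()
            slot = self._find_free()
        self._place(lpn, *slot)

    def trim(self, lpn: int):
        """The host says this logical page no longer holds useful data."""
        self._invalidate(lpn)

    def write_amplification(self) -> float:
        return self.flash_writes / self.host_writes if self.host_writes else 1.0

rng = random.Random(1)
ftl = ToyFTL(blocks=8, pages_per_block=32, logical_pages=224)   # 12.5% spare
for lpn in range(224):                    # fill the drive once
    ftl.write(lpn)
for _ in range(5_000):                    # then random 4K overwrites
    ftl.write(rng.randrange(224))
print(f"erases: {ftl.erases}, write amplification: {ftl.write_amplification():.2f}")
```

Because the logical space is always smaller than the physical one, a block with at least one invalid page exists whenever free pages run out, so `_collect` always reclaims space.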

IO Performance Profiles

Remember:

• Spinning Disks

• Linear is fast

• Random is slow

• Reads marginally faster than writes (sometimes)

Writes are slower when switching tracks.

With SSDs:

• Reads are fast

• Writes are slow(ish)

• Random or linear doesn’t matter (as much)

SSD Performance Overview

• Depends on

• Number of flash chips in use

• Number of busses from the processor

• Performance of controller CPU

• Contention

• Bus speed

• Number of erasure blocks used

• Number of previous writes to flash cells

• Chips

• IO Busses

• CPU Cores

Causes of Contention

• Legitimate use

• Garbage collection

• Legitimate (but latent) usage

• IO Blender!

(Bender Blender: http://bit.ly/10vc7Sf)

Latent: updatedb? atime? app-level garbage collection? (t-shirt at threadless)

Bus Speed

• SATA - 3 or 6 Gb/s?

• IOPS Calc

• Can your controller handle your disks?
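SATA uses 8b/10b encoding, so every data byte costs 10 bits on the wire: a 6 Gb/s link carries at most about 600 MB/s of payload (3 Gb/s about 300 MB/s), before protocol overhead. That ceiling can be turned into an IOPS figure and compared with the other stages between the flash and the host. The per-drive, controller and host-link numbers in the usage example are assumptions.

```python
def sata_payload_mb_s(line_rate_gbps: float) -> float:
    """Upper bound on payload throughput for an 8b/10b-encoded SATA link, in MB/s."""
    return line_rate_gbps * 1e9 / 10 / 1e6      # 10 line bits per data byte

def iops_to_saturate(link_mb_s: float, io_size_bytes: int = 4096) -> float:
    """How many IOs of a given size per second fill the link."""
    return link_mb_s * 1e6 / io_size_bytes

def bottleneck_mb_s(drives: int, per_drive_mb_s: float,
                    controller_mb_s: float, host_link_mb_s: float) -> float:
    """Aggregate throughput is capped by the slowest stage between the flash and the host."""
    return min(drives * per_drive_mb_s, controller_mb_s, host_link_mb_s)

print(sata_payload_mb_s(6.0))                   # 600.0
print(round(iops_to_saturate(600)))             # 4K IOs per second to fill 6 Gb/s SATA
# Assumed: 8 SSDs at 500 MB/s each, a 4000 MB/s controller, a 10 GbE (~1250 MB/s) host link
print(bottleneck_mb_s(8, 500, 4000, 1250))
```

In that assumed setup the drives could deliver 4000 MB/s together, but the host link caps the array at about 1250 MB/s.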

Read

• Very fast

• No seek time

• Moderately improved over spinning disk for linear reads; greatly improved for random reads

• Causes no damage to the media

• Generally scales up with capacity

Write

• Usually fast (depending on drive usage)

• No seek time

• Highly improved over spinning disk

• Causes wear to the media (flash cells survive a limited number of program/erase cycles)

• Generally scales up with capacity

Spinning Disk Read/Write Matrix (columns: Read, Write; rows: Linear, Random)

SSD Read/Write Matrix (columns: Read, Write; rows: Linear, Random)

Solid State in Practice

Solid State Form Factors

Removable Media

Drives

PCI Cards

Next slide: parts of an SSD

Parts of an SSD

Interface

USB

PCI

SATA/SAS

IDE (sadly?)

Controller

Main Processor

I/O Bus Lanes

RAM Cache

Battery / SuperCapacitor

Flash Chips

If individual chip capacity is finite, how do bigger drives increase capacity? What does this mean for performance?

Flash Controllers

• Flash Translation Layer (FTL)

• Stripe Writes

• Interpret bus instructions

• Wear Leveling

• Garbage Collection

Controllers do the heavy lifting, and they are the single largest problem with flash drives, without a doubt.
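Bigger drives get their capacity from more flash chips, and the controller gets its speed by keeping those chips busy in parallel. One simple way to stripe writes is round-robin: consecutive logical pages go to different channels first, then to different dies on each channel. The 8-channel, 4-die geometry below is an assumption, and real controllers combine striping with FTL remapping rather than using a fixed formula like this one.

```python
from typing import NamedTuple

class FlashLocation(NamedTuple):
    channel: int
    die: int
    page: int

def stripe(logical_page: int, channels: int = 8, dies_per_channel: int = 4) -> FlashLocation:
    """Round-robin placement: rotate across channels, then dies, then pages."""
    channel = logical_page % channels
    die = (logical_page // channels) % dies_per_channel
    page = logical_page // (channels * dies_per_channel)
    return FlashLocation(channel, die, page)
```

Pages 0 through 7 land on eight different channels, so a sequential 32 KB write can proceed on all channels at once; with more chips per channel, more of those transfers overlap.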

Flash Translation Layer

LBA (0...n blocks)

FLASH CHIPS

SSD Aspects & Concerns

Longevity

• Primarily determined by the class of flash

• (e)SLC, (e)MLC, TLC

• Related to wear-leveling

• Under-reported capacity

• Short-stroking improves lifetime (not speed)
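The under-reported capacity is spare area (over-provisioning) that the controller uses for wear leveling and garbage collection, usually expressed as (physical − user) / user. Part of it comes from units alone: flash is built in binary gibibytes while drives are sold in decimal gigabytes, so 256 GiB of raw flash sold as 256 GB already leaves about 7% spare. Short-stroking adds more, provided the controller knows the unpartitioned space is unused (never written, or trimmed). The capacities below are illustrative.

```python
GIB = 2 ** 30
GB = 10 ** 9

def overprovisioning(physical_bytes: int, user_bytes: int) -> float:
    """Spare area as a fraction of user-visible capacity."""
    if user_bytes <= 0 or physical_bytes < user_bytes:
        raise ValueError("physical capacity must be at least the user capacity")
    return (physical_bytes - user_bytes) / user_bytes

raw = 256 * GIB
print(f"{overprovisioning(raw, 256 * GB):.1%}")   # sold as 256 GB
print(f"{overprovisioning(raw, 240 * GB):.1%}")   # sold as 240 GB
print(f"{overprovisioning(raw, 200 * GB):.1%}")   # only 200 GB ever partitioned and written
```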

Partition Alignment

• Performance and longevity

• As big an issue as (or bigger than) it was with spinning disks

• Native 4k read blocks

• Far larger erasure blocks

• larger than is practical for alignment
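Alignment can be checked with simple arithmetic on a partition's starting sector. Linux exposes that start in 512-byte units (for example /sys/block/sda/sda1/start; the device and partition names here are placeholders). The common 2048-sector (1 MiB) start is a multiple of both 4 KiB and a 256 KiB erasure block, while the old DOS default of sector 63 is aligned to neither.

```python
SECTOR = 512   # sysfs reports partition starts in 512-byte units

def is_aligned(start_sector: int, boundary_bytes: int, sector_size: int = SECTOR) -> bool:
    """True if the partition's first byte falls on a multiple of boundary_bytes."""
    return (start_sector * sector_size) % boundary_bytes == 0

def read_partition_start(disk: str, part: str) -> int:
    """Start sector from Linux sysfs, e.g. read_partition_start('sda', 'sda1')."""
    with open(f"/sys/block/{disk}/{part}/start") as f:
        return int(f.read())

for start in (63, 2048):
    print(start, is_aligned(start, 4096), is_aligned(start, 256 * 1024))
```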

TRIM

• As a command, refers to ATA-8 spec

• SCSI equivalent is UNMAP, but both are often referred to as TRIM.

• Does not immediately delete unused blocks

• Allows for GC

Linux calls this “discard” - TRIM refers

Linux TRIM Support

• EXT4 / XFS / JFS / BTRFS - Native using ‘discard’ option

• Consider NOOP or Deadline IO scheduler

• fstrim (part of util-linux) for R/W vols

• zerofree for R/O vols

fstrim and zerofree run in userland; they are important for thin-provisioned volumes on SAN arrays that support them. Check the scheduler documentation for details: deadline prefers read queues (settings are under /sys/block).
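Whether a block device accepts discards, whether the kernel considers it non-rotational, and which IO scheduler is active can all be read from /sys/block/<dev>/queue on Linux. In this sketch a `discard_max_bytes` of 0 is taken to mean the device does not accept discards, and the active scheduler is the bracketed entry in the `scheduler` file (for example `noop deadline [cfq]`). The device name passed in is up to you.

```python
from pathlib import Path

def active_scheduler(scheduler_line: str):
    """Return the bracketed (active) scheduler from /sys/block/<dev>/queue/scheduler."""
    for token in scheduler_line.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None

def queue_info(dev: str) -> dict:
    """Summarize SSD-relevant queue settings for one block device."""
    queue = Path("/sys/block") / dev / "queue"

    def attr(name: str) -> str:
        return (queue / name).read_text().strip()

    return {
        "rotational": attr("rotational") == "1",
        "supports_discard": int(attr("discard_max_bytes")) > 0,
        "scheduler": active_scheduler(attr("scheduler")),
    }
```

For example, `queue_info("sda")` on an SSD with discard support would report `rotational: False` and `supports_discard: True`.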

OS X TRIM Support

• Comes by default on factory-installed SSDs

• Trim-Enabler

• http://www.groths.org/trim-enabler/

ZFS and SSDs

• ZFS Intent Log (ZIL)

• Adaptive Replacement Cache (ARC)

• arc_summary can help you decide

The ZIL is similar to a journal. The ARC is a RAM cache with disk backing it; SSDs can be used as a second-level ARC (L2ARC). https://code.google.com/p/jhell/wiki/arc_summary
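Among other things, arc_summary reports how often reads are served from the ARC. On ZFS on Linux the raw counters live in /proc/spl/kstat/zfs/arcstats as `name type data` rows, which makes a hit ratio easy to compute; a low ratio on a read-heavy workload is one hint that an SSD L2ARC might help. The path assumes ZFS on Linux; other platforms expose the same counters differently (on FreeBSD, through sysctl).

```python
def parse_kstat(text: str) -> dict:
    """Parse 'name type data' rows from a ZFS-on-Linux kstat file into {name: int}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    return stats

def arc_hit_ratio(stats: dict) -> float:
    """Fraction of ARC lookups that were hits."""
    hits, misses = stats.get("hits", 0), stats.get("misses", 0)
    total = hits + misses
    return hits / total if total else 0.0

def read_arcstats(path: str = "/proc/spl/kstat/zfs/arcstats") -> dict:
    with open(path) as f:
        return parse_kstat(f.read())
```

On a ZFS on Linux host, `print(f"{arc_hit_ratio(read_arcstats()):.1%}")` prints the overall ARC hit ratio since boot.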

Filesystems in General

• Standard journaling filesystems

• Mount options (atime/relatime, etc), /tmp->tmpfs

• Next-Gen

• ZFS / BTRFS

• Distributed Filesystems

• DRBD

ZFS: SSD cache pool. ZFS and BTRFS are copy-on-write (COW). DRBD: no TRIM.

Monitor Health w/ S.M.A.R.T.

• S.M.A.R.T. information

• vendor-specific

• Includes flash erase count

• smartctl on Linux and Mac

• Dozens of tools on Windows (check wiki)
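`smartctl -A` prints one row per attribute, with the columns ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED and RAW_VALUE. Because wear-related attributes are vendor-specific, a monitoring script has to look them up by name. The sketch below parses that table from text; the names in `WEAR_ATTRIBUTES` are examples (Wear_Leveling_Count appears on some Samsung drives, Media_Wearout_Indicator on some Intel drives) and should be adjusted for your hardware. Running smartctl usually requires root.

```python
import subprocess

WEAR_ATTRIBUTES = {"Wear_Leveling_Count", "Media_Wearout_Indicator"}  # examples; vendor-specific

def parse_smart_attributes(text: str) -> dict:
    """Map ATTRIBUTE_NAME -> (normalized VALUE, THRESH, RAW_VALUE) from `smartctl -A` output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit():
            value, thresh = int(fields[3]), int(fields[5])
            raw = " ".join(fields[9:])          # raw values may contain spaces
            attrs[fields[1]] = (value, thresh, raw)
    return attrs

def wear_report(device: str = "/dev/sda") -> dict:
    """Run smartctl and keep only the wear-related attributes."""
    out = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True).stdout
    return {k: v for k, v in parse_smart_attributes(out).items() if k in WEAR_ATTRIBUTES}
```

The normalized VALUE typically counts down toward THRESH as the flash wears, so alerting when the two get close is a reasonable starting point.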

Forensics

(http://bit.ly/fast11-wei-paper)

...Our results lead to three conclusions:

First, built-in commands are effective, but manufacturers sometimes implement them incorrectly.

Second, overwriting the entire visible address space of an SSD twice is usually, but not always, sufficient to sanitize the drive.

Third, none of the existing hard drive-oriented techniques for individual file sanitization are effective on SSDs

Reliably Erasing Data From Flash-Based Solid State Drives
Michael Wei∗, Laura M. Grupp∗, Frederick E. Spada†, Steven Swanson∗

∗Department of Computer Science and Engineering, University of California, San Diego
†Center for Magnetic Recording and Research, University of California, San Diego

SSD-enhanced RAID Array Considerations

Hardware / Software

• Dedicated CPU Power

• Battery-backed storage

Hardware RAID Controllers

• Trust (eyes on code)

• Excessive cost of HW

Software RAID Controllers

• Commercial Support

• Proprietary Tech

• Portability

• Spare CPU Cycles

single point of failure

TRIM / GC?

• Does the RAID software/device know enough to pass along TRIM?

• Will the array eventually crawl because of ongoing GC issues?

No software RAID that I know of supports it. Intel chipset RAID 0 supports TRIM.

Access Bandwidth

• How much data can a single drive transmit?

• How many drives are in the array?

• What is the aggregate bus speed to the array controller?

• What is the bus speed to the host(s)?

SSD Throughput Example

From Tech Radar: http://bit.ly/100UhvY

4.15Gb/s

Controller / Bus

• Speed / Ports

• How mature / reliable / tested?

Remember Me?

Just because buses exist in a storage array doesn’t make them magic or infinite in size.

Tiering / Caching

Very slow, cheap disks

Faster spinning disks

SSD tier - hot blocks

Very fast SDRAM

Future Technology

Enhanced Capacity

Kowloon Walled City

Enhanced Longevity

Telomeres in chromosomes

Smart SSDs

Active Flash

What should I buy?

Questions?
