HARD DISKS AND OTHER STORAGE DEVICES
Jehan-François Pâris, Spring 2015
Magnetic disks (I)
Sole part of computer architecture with moving parts
Data stored on circular tracks of a disk
Spinning speed between 5,400 and 15,000 rotations per minute
Accessed through a read/write head
Magnetic disks (II)
[Diagram: disk drive showing a platter, the R/W head, the arm, and the servo]
Magnetic disks (III)
Data are stored on circular tracks
Tracks are partitioned into a variable number of fixed-size sectors
Outside tracks have more sectors than inside tracks
If the disk drive has more than one platter, all tracks corresponding to the same position of the R/W head form a cylinder
Seagate ST4000DM000 (I)
Interface: SATA 6 Gb/s (750 MB/s)
Capacity: 4 TB
Cache: 64 MB, multisegmented
Average seek: read < 8.5 ms, write < 9.5 ms
Average data rate: 146 MB/s (read/write)
Maximum sustained data rate: 180 MB/s
Seagate ST4000DM000 (II)
Number of platters: 4
Number of heads: 8
Bytes per sector: 4,096
Irrecoverable read errors per bit read: 1 in 10^14
Power consumption: 7.5 W operating, 5 W idle, 0.75 W standby & sleep
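As a back-of-the-envelope check on these figures (my own sketch, not from the datasheet), we can compute how long a full sequential read of the drive takes and how many irrecoverable errors we should expect during it:

capacity_bytes = 4e12                  # 4 TB
sustained_rate = 180e6                 # 180 MB/s
print(capacity_bytes / sustained_rate / 3600)   # ~6.2 hours per full read

bits_read = capacity_bytes * 8         # 3.2e13 bits per full read
expected_errors = bits_read * 1e-14    # 1 error per 10^14 bits read
print(expected_errors)                 # ~0.32 expected irrecoverable errors

In other words, at these densities roughly one full-disk read in three hits an irrecoverable error, which is one motivation for the redundancy schemes discussed later.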
Sectors and blocks
Sectors are the smallest physical storage unit on a disk
Fixed-size, traditionally 512 bytes
Separated by intersector gaps
Blocks are the smallest transfer unit between the disk and the main memory
Magnetic disks (IV)
Disk spins at a speed varying between 5,400 rpm (laptops) and 15,000 rpm (Seagate Cheetah X15, …)
Accessing data requires:
Positioning the head on the right track: seek time
Waiting for the data to reach the head: on average, half a rotation
Transferring the data
Accessing disk contents
Each block on a disk has a unique address, normally a single number
Logical block addressing (LBA): standard since 1996
Older disks used a different scheme, cylinder-head-sector (CHS), which exposed the disk's internal organization
Old CHS triples can still be mapped onto LBA addresses, as sketched below
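A minimal sketch of the conventional CHS-to-LBA conversion; the geometry values (heads per cylinder, sectors per track) are illustrative, and the formula assumes the fixed geometry that CHS addressing imposes:

def chs_to_lba(c, h, s, heads_per_cyl, sectors_per_track):
    # CHS sector numbers are 1-based, hence the (s - 1)
    return (c * heads_per_cyl + h) * sectors_per_track + (s - 1)

print(chs_to_lba(0, 0, 1, 16, 63))   # 0: the first sector on the disk
print(chs_to_lba(1, 0, 1, 16, 63))   # 1008: first sector of cylinder 1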
Disk access times
Dominated by seek time and rotational delay
We try to reduce seek times by placing all data that are likely to be accessed together on nearby tracks or the same cylinder
Cannot do as much for rotational delay
Seek times (I)
Depend on the distance between the two tracks
Minimal delay for:
Seeks between adjacent tracks: track to track (1-3 ms)
Switching between tracks within the same cylinder
Worst delay for end-to-end seeks
Seek times (II)
[Chart: if a track-to-track seek takes x, an end-to-end seek takes roughly 3 to 5x]
Rotational latency
On average, half a rotation; same for reads and writes
One and a half rotations for write/verify
Average rotational delay
RPM      Delay (ms)
5,400      5.6
7,200      4.2
10,000     3.0
15,000     2.0
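These values follow directly from the rotation speed; a quick check in Python:

def avg_rotational_delay_ms(rpm):
    # One rotation takes 60,000/rpm ms; the target sector is,
    # on average, half a rotation away
    return 60_000 / rpm / 2

for rpm in (5_400, 7_200, 10_000, 15_000):
    print(rpm, round(avg_rotational_delay_ms(rpm), 1))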
Transfer rate (I)
Burst rate: observed while transferring a block
Highest for blocks on outside tracks, since there are more of them on each track
Sustained transfer rate: observed while reading sequential blocks; lower
Transfer rate (II)
[Graph: actual transfer rate]
Double buffering (I)
Speeds up handling of sequential files
[Diagram: file blocks B0 B1 B2 B3 B4 B5 B6 …; buffer B1 is processed by the DBMS while buffer B2 is in transfer]
Double buffering (II)
When both tasks are completed, the buffers swap roles
[Diagram: file blocks B0 B1 B2 B3 B4 B5 B6 …; buffer B2 is now processed by the DBMS while buffer B3 is in transfer]
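A minimal sketch of the idea in Python, with read_block and process standing in for the disk transfer and the DBMS work (both supplied by the caller; a one-thread pool plays the role of the disk):

from concurrent.futures import ThreadPoolExecutor

def double_buffered_scan(read_block, process, n_blocks):
    # Overlap the transfer of block i+1 with the processing of block i
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(read_block, 0)              # start first transfer
        for i in range(n_blocks):
            buf = pending.result()                      # wait for block i
            if i + 1 < n_blocks:
                pending = io.submit(read_block, i + 1)  # next transfer
            process(buf)                                # overlaps with transfer

blocks = ["B%d" % i for i in range(7)]
double_buffered_scan(lambda i: blocks[i], print, len(blocks))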
The five minute rule
Jim Gray: keep in memory any data item that will be used during the next five minutes
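The five minutes come from a break-even argument: a page is worth caching if the memory it occupies costs less than the disk arm time its re-reads would consume. A sketch of the Gray & Putzolu calculation; the prices and rates below are illustrative assumptions of that era, not figures from the slide:

pages_per_mb_ram = 128            # assuming 8 KB pages
accesses_per_sec_disk = 100       # random I/Os per second per drive
price_per_disk = 2000             # dollars (illustrative)
price_per_mb_ram = 5              # dollars (illustrative)

break_even_s = (pages_per_mb_ram / accesses_per_sec_disk) * \
               (price_per_disk / price_per_mb_ram)
print(break_even_s / 60)          # ~8.5 minutes with these numbers

As prices shift, the break-even interval shifts too, which is why later revisits of the rule arrive at different intervals.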
The internal disk controller
Printed circuit board attached to the disk drive
As powerful as the CPU of a personal computer of the early 80's
Functions include:
Speed buffering
Disk scheduling
…
Reliability Issues
Disk failure rates
Failure rates follow a bathtub curve:
High infantile mortality
Low failure rate during useful life
Higher failure rates as disks wear out
Disk failure rates (II)
[Figure: bathtub curve of failure rate over time, with infantile mortality, useful life, and wearout phases]
Disk failure rates (III)
Infant mortality effect can last for months for disk drives
Cheap SATA disk drives seem to age less gracefully than SCSI drives
The Backblaze study
Reported on the disk failure rates of more than 25,000 disks at Backblaze
Their disks tend to fail at a rate of:
5.1 percent per year during their first eighteen months
1.4 percent per year during the next eighteen months
11.8 percent per year after that
[Chart: yearly failure rate (percent) vs. time in months, showing the early failure stage at 5.1%, the random failure stage at 1.4%, and the wearout failure stage at 11.8%]
MTTF
Disk manufacturers advertise very high Mean Times To Fail (MTTF) for their products: 500,000 to 1,000,000 hours, that is, 57 to 114 years
Does not mean that a disk will last that long!
Means that disks will fail at an average rate of one failure per 500,000 to 1,000,000 hours during their useful life
More MTTF Issues (I)
Manufacturers' claims are not supported by solid experimental evidence
They are obtained by submitting disks to a stress test at high temperature and extrapolating the results to ideal conditions
The procedure raises many issues
More MTTF Issues (II)
Failure rates observed in the field are much higher: they can go up to 8 to 9 percent per year
The corresponding MTTFs are 11 to 12.5 years (the annual failure rate is roughly the inverse of the MTTF expressed in years)
If we have 100 disks and an MTTF of 12.5 years, we can expect an average of 100/12.5 = 8 disk failures per year
Flash Drives
What about flash?
Widely used in flash drives, most MP3 players and some small portable computers
Several important limitations:
Limited write bandwidth: must erase a whole block of data before overwriting any part of it
Limited endurance: 10,000 to 100,000 write cycles
Flash drives
Widely used in flash drives, most MP3 players and some small portable computers
Similar technology to EEPROM
Three technologies:
NOR flash
NAND flash
Vertical NAND
NOR Technology
Each cell has:
One end connected straight to ground
The other end connected straight to a bit line
Longest erase and write times
Allows random access to any memory location
Good choice for storing BIOS code: replaced older ROM chips
NAND Technology
Shorter erase and write times
Requires less chip area per cell
Up to ten times the endurance of NOR flash
Disk-like interface: data must be read on a page-wise basis
Block erasure: erasing older data must be performed one block at a time, typically 32, 64 or 128 pages
Vertical NAND Technology
Fastest
The flash drive controller
Performs:
Error correction: higher flash densities result in more errors
Wear leveling: distributes writes among blocks to prevent failures resulting from uneven numbers of erase cycles (sketched below)
Flash drives work best with sequential workloads
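A minimal wear-leveling sketch, not a real flash translation layer: the controller always places new writes on the free block with the fewest erase cycles, so no block wears out far ahead of the others.

import heapq

class WearLeveler:
    def __init__(self, n_blocks):
        # Min-heap of free blocks, ordered by erase count
        self.free = [(0, b) for b in range(n_blocks)]
        heapq.heapify(self.free)

    def allocate(self):
        # Hand out the least-worn free block
        erases, block = heapq.heappop(self.free)
        return block, erases

    def release(self, block, erases):
        # Erasing the block before reuse bumps its erase count
        heapq.heappush(self.free, (erases + 1, block))

wl = WearLeveler(4)
b, e = wl.allocate()   # block 0, 0 erases so far
wl.release(b, e)       # returns to the pool with erase count 1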
Performance data
Vary widely between models; one random pair of specs:
Read speed: 22 MB/s
Write speed: 15 MB/s
RAID level 0
No replication
Advantages:
Simple to implement
No overhead
Disadvantage: if the array has n disks, its failure rate is n times the failure rate of a single disk
RAID levels 0 and 1
[Diagram: RAID level 0 striping vs. RAID level 1 mirrors]
RAID level 1
Mirroring: two copies of each disk block
Advantages:
Simple to implement
Fault-tolerant
Disadvantage: requires twice the disk capacity of normal file systems
RAID level 4 (I)
Requires N+1 disk drives
N drives contain data (individual blocks, not chunks)
Blocks with the same disk address form a stripe
[Diagram: the data blocks of a stripe and their parity block]
RAID level 4 (II)
The parity drive contains the exclusive or (XOR) of the N blocks in each stripe:
p[k] = b[k] ⊕ b[k+1] ⊕ ... ⊕ b[k+N-1]
The parity block now reflects the contents of several blocks!
Can now do parallel reads/writes
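A small sketch of the parity computation, using byte strings as toy blocks; the same XOR rebuilds any one lost block from the parity and the survivors:

def parity_block(blocks):
    # Byte-wise XOR of all blocks in the stripe
    p = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            p[i] ^= byte
    return bytes(p)

stripe = [bytes([1, 2, 3]), bytes([4, 5, 6]), bytes([7, 8, 9])]
p = parity_block(stripe)
# Recover block 0 as if its disk had failed:
rebuilt = parity_block([p, stripe[1], stripe[2]])
assert rebuilt == stripe[0]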
RAID levels 4 and 5
[Diagram: RAID level 4 with its dedicated parity drive, the bottleneck, vs. RAID level 5 with parity blocks distributed over all drives]
RAID level 5
The single parity drive of RAID level 4 is involved in every write, which limits parallelism
RAID 5 distributes the parity blocks among the N+1 drives: much better
The small write problem
Specific to RAID 5
Happens when we want to update a single block
The block belongs to a stripe: how can we compute the new value of the parity block?
[Diagram: a stripe containing b[k], b[k+1], b[k+2], … and its parity block p[k]]
First solution
Read the values of the N-1 other blocks in the stripe
Recompute p[k] = b[k] ⊕ b[k+1] ⊕ ... ⊕ b[k+N-1]
Solution requires:
N-1 reads
2 writes (new block and new parity block)
Second solution
Assume we want to update block b[m]
Read the old values of b[m] and the parity block p[k]
Compute new p[k] = new b[m] ⊕ old b[m] ⊕ old p[k]
Solution requires:
2 reads (old values of the block and the parity block)
2 writes (new block and new parity block)
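A quick check that the two strategies compute the same parity, on toy byte values (the blocks here are single bytes for brevity):

old = [0x11, 0x22, 0x33, 0x44]        # data blocks of one stripe
old_p = 0x11 ^ 0x22 ^ 0x33 ^ 0x44     # parity block

new_b = 0x55                          # new value for block 0
# First solution: re-read the N-1 other blocks and recompute
p1 = new_b ^ old[1] ^ old[2] ^ old[3]
# Second solution: XOR new data, old data, and old parity
p2 = new_b ^ old[0] ^ old_p
assert p1 == p2

The second solution wins because its cost is constant (4 disk accesses) regardless of the stripe width N.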
Other RAID organizations (I)
RAID 6:
Two check disks
Tolerates two disk failures
More complex updates
Other RAID organizations (II)
RAID 10:
Also known as RAID 1 + 0
Data are striped (as in RAID 0 or RAID 5) over pairs of mirrored disks (RAID 1)
[Diagram: a RAID 0 stripe over four RAID 1 mirrored pairs]
Other RAID organizations (III)
Two-dimensional RAIDs:
Designed for archival storage
Data are written once and read maybe (WORM)
Update rate is less important than:
High reliability
Low storage costs
Complete 2D RAID arrays
Have n parity disks and n(n-1)/2 data disks
[Diagram: an n = 4 array with parity disks P1-P4 and data disks D12, D13, D14, D23, D24, D34]
Main advantage
Work in progress