A Survey of I/O Optimization Techniques
Sven Groot, Kitsuregawa Laboratory
December 7th 2007
Background

- Increase in CPU and memory speed not matched by disk drives
- Increasing disk and data sizes
- Disks limited by mechanical components: hard to improve

We must optimize I/O accesses.
Outline

- Hard disk drives
- Evaluating Optimizations Through the I/O Path [Riska et al, 2007]
  - File System Level
  - Device Driver Level
  - Disk Level
- Prefetching
  - Competitive Prefetching [Li et al, 2007]
  - DiskSeen [Ding et al, 2007]
Hard Disk Drives

[Diagram: disk platter showing a track, a sector, and the disk head]
- Bottleneck: disk rotation
- Bottleneck: head movement

Delay = Head Seek Time + Rotational Latency
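The delay formula above lends itself to a quick back-of-the-envelope check. The following sketch (with illustrative drive parameters) computes the average rotational latency as half a revolution and adds the seek time:

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half a revolution, in milliseconds."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2

def access_delay_ms(seek_ms: float, rpm: float) -> float:
    """Delay = head seek time + average rotational latency."""
    return seek_ms + rotational_latency_ms(rpm)

# A 15,000 RPM drive with a 3.6 ms average seek (matching the Seagate
# drives listed later in this survey):
print(rotational_latency_ms(15_000))  # 2.0 ms
print(access_delay_ms(3.6, 15_000))   # about 5.6 ms
```

At these speeds a single random access costs several milliseconds, which is why avoiding head movement and rotation dominates the optimizations that follow.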
The I/O Path [Riska et al, 2007]

File system
- Block allocation
- Request merging

Disk drive
- Request scheduling
- Caching
Evaluation Environment

Postmark file system benchmark; measures transactions.
Each transaction has two steps: create or delete a file, then read or append to a file.

Workload | File Size | Work Set | File Size | No. of Files | Transactions
SS       | Small     | Small    | 9-15 KB   | 10,000       | 100,000
SL       | Small     | Large    | 9-15 KB   | 200,000      | 100,000
LS       | Large     | Small    | 0.1-3 MB  | 1,000        | 20,000
LL       | Large     | Large    | 0.1-3 MB  | 4,250        | 20,000/40,000
File Systems

Ext2
- Block/cylinder groups
- Single, double, or triple indirect metadata blocks

Ext3
- Compatible with Ext2
- Journaling file system

ReiserFS
- Metadata in B+ trees
- Journaling file system

XFS
- Extent-based B+ trees
- Journaling file system
- Allocation groups
File System Throughput

[Chart: transactions/sec of Ext2, Ext3, ReiserFS, and XFS across the four workloads (small/large files × small/large working set)]

ReiserFS performs best:
- Efficient request merging
- Efficient block allocation
- Not much journaling overhead
Device Driver Level

Request reordering/merging:

Elevator algorithm
- Sweep the disk, processing each request as the head passes its location

Shortest Seek Time First (SSTF)
- Always process the closest request
- May lead to starvation
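A minimal sketch of the two reordering policies above, with illustrative cylinder numbers. SSTF greedily serves the request nearest the head; the elevator sweeps upward first and picks up the remaining requests on the way back:

```python
def sstf_order(head: int, requests: list[int]) -> list[int]:
    """Shortest Seek Time First: repeatedly serve the closest request."""
    pending, order = list(requests), []
    while pending:
        nxt = min(pending, key=lambda c: abs(c - head))
        pending.remove(nxt)
        order.append(nxt)
        head = nxt
    return order

def elevator_order(head: int, requests: list[int]) -> list[int]:
    """Elevator (SCAN): sweep upward, then reverse for requests left behind."""
    up = sorted(c for c in requests if c >= head)
    down = sorted((c for c in requests if c < head), reverse=True)
    return up + down

reqs = [98, 183, 37, 122, 14, 124, 65, 67]
print(sstf_order(53, reqs))      # [65, 67, 37, 14, 98, 122, 124, 183]
print(elevator_order(53, reqs))  # [65, 67, 98, 122, 124, 183, 37, 14]
```

Note how SSTF jumps back and forth around the head while the elevator's single sweep bounds how long any request can be bypassed, which is exactly the starvation issue mentioned above.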
I/O Schedulers

No-Op
- First-Come First-Served; no reordering

Deadline
- SSTF with aging to prevent starvation

Anticipatory (default)
- Similar to Deadline
- Waits for a better request under some circumstances

CFQ
- Elevator algorithm
- Gives each process an equal share of I/O time
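The "SSTF with aging" idea behind Deadline can be sketched as follows. This is a simplified illustration, not the actual Linux implementation (which also batches requests and keeps separate read and write queues); times and cylinder numbers are made up:

```python
def deadline_pick(head, now, requests, max_wait):
    """Pick the next request from a list of (cylinder, arrival_time).

    Serve in SSTF order, unless some request has waited longer than
    max_wait, in which case serve the oldest expired request first.
    """
    expired = [r for r in requests if now - r[1] > max_wait]
    if expired:
        return min(expired, key=lambda r: r[1])           # oldest first
    return min(requests, key=lambda r: abs(r[0] - head))  # plain SSTF

pending = [(180, 0.0), (60, 9.0)]
# Plain SSTF from cylinder 50 would pick 60, but the request at
# cylinder 180 has waited past the 5-unit deadline, so it wins:
print(deadline_pick(head=50, now=10.0, requests=pending, max_wait=5.0))
```

The aging term is what prevents the starvation that pure SSTF allows.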
Scheduler Throughput

[Chart: transactions/sec of the Deadline, Anticipatory, CFQ, and No-Op schedulers across the four workloads (small/large files × small/large working set)]

- All reordering schedulers outperform No-Op
- Deadline performs best
- No deceptive idleness in Postmark
Disk Drive Level

Request reordering. Disk drives used:

                ST318453LC   ST3146854LC   ST3300007LC
Capacity        18 GB        146 GB        300 GB
RPM             15,000       15,000        10,000
Platters        1            4             4
Linear density  64K TPI      85K TPI       105K TPI
Avg seek time   3.6/4 ms     3.4/4 ms      4.7/5.3 ms
Cache           8 MB         8 MB          8 MB
Disk Drive Results – Throughput

[Chart: transactions/sec of Ext2, Ext3, ReiserFS, and XFS on the 18 GB, 146 GB, and 300 GB drives, for the small files/small WS and large files/large WS workloads]
Prefetching

- Read data expected to be needed in the future
- Prefetching reduces the number of I/O switches between concurrent data streams
- Optimal strategy: read exactly the data needed
  - Requires a-priori knowledge of the stream size
- Aggressive prefetching: large prefetching depth
  - May fetch unnecessary data
- Conservative prefetching: small prefetching depth
  - Too many I/O switches
Competitive Prefetching [Li et al, 2007]

- Prefetching depth: the amount of data that can be read during the average I/O switch time
- Guarantees the time taken is no more than twice that of the optimal offline strategy
- Requires measuring I/O switch times and transfer rates
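The prefetch-depth rule above reduces to a one-line calculation. With this depth, the time spent switching between streams can never exceed the time spent transferring useful data, which is what yields the factor-of-two bound. The numbers below are illustrative, not measurements from the paper:

```python
def competitive_depth_kb(switch_time_ms: float, transfer_rate_mb_s: float) -> float:
    """Prefetch depth = data readable during one average I/O switch."""
    kb_per_ms = transfer_rate_mb_s * 1024 / 1000
    return switch_time_ms * kb_per_ms

# e.g. a 5.6 ms average I/O switch and an 80 MB/s sequential
# transfer rate give a depth of roughly 460 KB:
print(competitive_depth_kb(5.6, 80))
```

In practice the switch time and transfer rate vary across the disk, which is why the technique measures them at runtime.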
Competitive Prefetching – Results

[Charts: throughput vs. number of concurrent requests (1-64) for aggressive prefetching, competitive prefetching, and stock Linux, on a microbenchmark and an index-searching workload]
DiskSeen [Ding et al, 2007]

Problem: file-level prefetching has disadvantages
- File-level sequentiality may not be preserved at the disk level
- Inconvenient for recording access information
- Inter-file sequentiality not exploited
- File metadata blocks not prefetched

Solution: block-level prefetching
- Uses disk logical block numbers
- Works alongside the file-level prefetcher
DiskSeen – Sequence Detection

- A global counter is incremented on every block access
- Each accessed block stores the current counter value as its access index
- A sequence is detected when the access indices of sequential blocks grow uniformly
- Prefetch when a sequence is detected
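A hedged sketch of the detection rule above: each access records the global counter value as the block's access index, and a run of consecutive disk blocks whose indices grow uniformly counts as a sequence. The exact thresholds and data structures are simplifications, not the paper's:

```python
def is_sequence(access_index: dict, first_block: int, length: int) -> bool:
    """True if blocks first_block..first_block+length-1 were all accessed
    and their access indices grow uniformly (same positive step)."""
    indices = []
    for block in range(first_block, first_block + length):
        if block not in access_index:
            return False
        indices.append(access_index[block])
    steps = [b - a for a, b in zip(indices, indices[1:])]
    return len(set(steps)) == 1 and steps[0] > 0

# Simulate accesses: a sequential run of blocks 100-103, then a jump.
counter = 0
access_index = {}
for block in [100, 101, 102, 103, 500]:
    counter += 1
    access_index[block] = counter

print(is_sequence(access_index, 100, 4))  # True: indices 1,2,3,4
print(is_sequence(access_index, 101, 4))  # False: block 104 never accessed
```

Detecting the sequence at disk level catches sequential access even when the file system scattered it across multiple files.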
DiskSeen – History-Based Prefetching

- Keeps a limited history of past access indices
- Looks for trails from the current block
- Unlike sequences, trails can skip blocks or go backwards
- When a history trail is found, prefetch the trail's blocks

[Table: per-block access-index histories for blocks B1-B4, illustrating how a matching trail is found]
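The trail idea above can be sketched as follows: keep a history mapping past access indices to the blocks they hit, and from the current block follow the chain of indices that came immediately after it. The blocks on the trail may skip around or go backwards on disk, unlike a sequence. The mapping used here is a simplified assumption, not the paper's exact structures:

```python
def find_trail(index_to_block: dict, current_block: int, max_len: int = 4) -> list:
    """index_to_block: past access index -> block number accessed then.

    Return the blocks that historically followed current_block, to be
    prefetched; empty list if no trail is found.
    """
    starts = [i for i, b in index_to_block.items() if b == current_block]
    for start in starts:
        trail = []
        i = start + 1
        while i in index_to_block and len(trail) < max_len:
            trail.append(index_to_block[i])
            i += 1
        if trail:
            return trail
    return []

# Past accesses: block 70 was followed by 12, then 95, then 40 --
# a trail that skips blocks and goes backwards.
history = {100: 70, 101: 12, 102: 95, 103: 40}
print(find_trail(history, 70))  # [12, 95, 40]
print(find_trail(history, 99))  # [] -- block 99 has no history
```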
DiskSeen – Architecture

[Diagram: the OS caching area and a separate prefetching area (holding block access information and prefetched blocks) in front of the hard disk. Data flows: 1. on-demand read, 2. file-level prefetch, 3. prefetching, 4. move hit blocks, 5. delayed write-back]
DiskSeen – Results

[Chart: execution times of strided, reversed, CVS, diff, grep, TPC-H Q4, and TPC-H Q17 workloads on stock Linux 2.6.11 and on the first and second runs with DiskSeen]
DiskSeen – Results

[Figure: CVS benchmark, before vs. after DiskSeen]
Conclusion

- Disk performance remains a bottleneck
- Effective optimization opportunities exist at many levels
- Active area of research: FS2 [Huang et al, 2005], Preemptive Scheduling [Dimitrijevic et al, 2005], Distributed File Systems [e.g. Weil et al, 2006], Idletime Scheduling [Eggert et al, 2005], etc.
- Room for improvement:
  - Use of application/domain knowledge in schedulers/prefetchers
  - Anticipatory scheduler improvements
  - Scheduling at the disk level