X-RAY: A Non-Invasive Exclusive Caching Mechanism for RAIDs
Lakshmi N. Bairavasundaram
Muthian Sivathanu
Andrea C. Arpaci-Dusseau
Remzi H. Arpaci-Dusseau
ADvanced Systems Laboratory
Computer Sciences Department
University of Wisconsin – Madison
Introduction
  Caching in modern systems
    Multiple levels
  Storage: 2-level hierarchy
    Level 1: File system (FS) cache
      Software-managed
      Main memory of host/client
      LRU-like cache replacement
    Level 2: RAID cache
      Firmware-managed
      Memory inside RAID system
      Usually LRU replacement
  [Figure: storage hierarchy; labels: Application, Host, File system cache, RAID cache, RAID]
Introduction – contd.
  LRU
    Cache placement on read
    Replace LRU block
  2 levels of LRU
    Redundant contents
  Goal: Exclusive caching
  [Figure: LRU-to-MRU lists of the FS cache and the RAID cache; after "Read Block no. 10", block 10 is placed in both caches, and both lists hold the same blocks (10, 11, 12, ...)]
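The redundancy of two stacked LRU caches can be illustrated with a minimal sketch. The class `LRUCache`, the helper `read`, and the cache sizes are assumptions made for illustration; they are not taken from the talk.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache of block numbers (oldest entry first)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, block):
        """Return True on a hit; on a miss, place the block and evict the LRU block."""
        if block in self.blocks:
            self.blocks.move_to_end(block)      # refresh recency
            return True
        self.blocks[block] = None               # cache placement on read
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # replace LRU block
        return False

def read(fs_cache, raid_cache, block):
    # The RAID sees a request only when the FS cache misses.
    if not fs_cache.access(block):
        raid_cache.access(block)

fs_cache, raid_cache = LRUCache(3), LRUCache(3)
for b in [10, 11, 12]:
    read(fs_cache, raid_cache, b)

duplicated = set(fs_cache.blocks) & set(raid_cache.blocks)
print(sorted(duplicated))   # [10, 11, 12]
```

Every block placed in the RAID cache is also in the FS cache at that moment, so the second level adds little until the first level evicts.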
Improved RAID Caching
  Multi-Queue (Zhou et al. 2001)
    Add frequency component to cache policy
    Not strictly exclusive!
  DEMOTE (Wong and Wilkes 2002)
    Change interface to disk
    File system issues “cache place” command
    Has perfect information and hence perfectly exclusive caches
    Interface changes are difficult to deploy
Ideal RAID Cache
  Exclusive caching
    File system and RAID caches should have different contents
  Global LRU
    Known to work well
    RAID cache should be a victim cache
  No interface changes
  [Figure: FS cache and RAID cache drawn as one LRU-to-MRU list; labels: Block Read, Victim Block]
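A minimal sketch of the ideal behaviour, assuming the RAID somehow learns every FS cache victim (which is exactly the information a real RAID lacks without interface changes). The function `read_exclusive` and the cache sizes are hypothetical.

```python
from collections import OrderedDict

def read_exclusive(fs, raid, block, fs_size, raid_size):
    """Idealized read path: the RAID cache holds only victims of the FS cache."""
    if block in fs:
        fs.move_to_end(block)              # FS hit: refresh recency
        return "fs-hit"
    where = "raid-hit" if block in raid else "disk"
    raid.pop(block, None)                  # block moves up, so drop the RAID copy
    fs[block] = None
    if len(fs) > fs_size:
        victim, _ = fs.popitem(last=False)
        raid[victim] = None                # victim enters the RAID cache at its MRU end
        if len(raid) > raid_size:
            raid.popitem(last=False)
    return where

fs, raid = OrderedDict(), OrderedDict()
outcomes = [read_exclusive(fs, raid, b, 3, 3) for b in [10, 11, 12, 13, 10]]
print(outcomes)              # ['disk', 'disk', 'disk', 'disk', 'raid-hit']
print(list(fs), list(raid))  # [12, 13, 10] [11]
```

The two caches never hold the same block, so together they behave like one global LRU list of six blocks.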
X-RAY
  Observes disk traffic
    Reads and writes to data and metadata
  Builds a model of the FS cache
    Uses semantic knowledge
    Predicts size and contents of FS cache
  Identifies set of exclusive blocks
    Recent victims of the FS cache
  Reads blocks from disk into cache
  Result
    A nearly exclusive cache without interface changes
  [Figure: labels: Host, File system cache, RAID, RAID cache, X-RAY, Model of FS cache]
Talk Outline
  Introduction
  File Systems
  Information and Inferences
  X-RAY Cache Design
  Results
  Conclusion
File System Operation
  Applications perform file reads and writes
  File system (Unix)
    Translates file accesses to disk block requests
  Metadata
    To maintain application data on disk and manage disk blocks
    Periodically written to disk
    Examples: inodes, bitmap blocks
File System Operation
  Inode
    Pointers to data blocks
    File access information
  [Figure: a file's inode, holding the latest access time and pointers to the data blocks]
File System Operation
  File access
    Use inode to obtain pointers to disk data blocks
    Read corresponding blocks from disk if they are not in FS cache
    Update the access time information in inode
  Metadata updates
    Periodically check for “dirty” inodes and write to disk
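The file access and metadata update steps can be sketched as follows. This is a simplified illustration, not the file system's actual code: the `Inode` class, `read_file`, `flush_dirty_inodes`, and the integer timestamps are assumptions, and FS cache eviction is ignored.

```python
from dataclasses import dataclass

@dataclass
class Inode:
    blocks: list          # pointers to the file's data blocks
    atime: int = 0        # latest access time
    dirty: bool = False   # True when atime changed since the last inode write

def read_file(inode, fs_cache, disk_reads, now):
    """Whole-file read: fetch missing blocks from disk, then update the access time."""
    for block in inode.blocks:
        if block not in fs_cache:
            disk_reads.append(block)     # this request is visible to the RAID
            fs_cache.add(block)
    inode.atime = now
    inode.dirty = True

def flush_dirty_inodes(inodes, inode_writes):
    """Periodic metadata update: write every dirty inode to disk."""
    for number, inode in inodes.items():
        if inode.dirty:
            inode_writes.append((number, inode.atime))   # also visible to the RAID
            inode.dirty = False

inodes = {23: Inode(blocks=["D", "E"])}
fs_cache, disk_reads, inode_writes = set(), [], []
read_file(inodes[23], fs_cache, disk_reads, now=3)   # misses: D and E come from disk
flush_dirty_inodes(inodes, inode_writes)
read_file(inodes[23], fs_cache, disk_reads, now=6)   # hits: no disk read
flush_dirty_inodes(inodes, inode_writes)
print(disk_reads)     # ['D', 'E']
print(inode_writes)   # [(23, 3), (23, 6)]
```

The second read produces no disk read, yet the later inode write still carries the new access time to the disk.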
The Problem
  To observe disk traffic and infer the contents of FS cache
  Why difficult?
    FS cache size changes over time
      Shares main memory with virtual memory system
    Disk cannot observe all FS-level accesses
  Key observation
    We need information about accesses that hit in FS cache
    File system maintains access information in inodes
  [Figure: the FS cache and the RAID's FS cache model, each an LRU-to-MRU list, as blocks 10 through 13 are read; only reads that miss in the FS cache appear as disk reads, so the two lists end up with different contents]
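A small sketch of why a model built from disk reads alone drifts away from the real FS cache. The helper `lru_touch`, the cache size of 3, and the exact read sequence are assumptions chosen for illustration.

```python
from collections import OrderedDict

def lru_touch(cache, block, capacity):
    """Apply one access to an LRU cache (oldest first); return True on a hit."""
    hit = block in cache
    if hit:
        cache.move_to_end(block)
    else:
        cache[block] = None
        if len(cache) > capacity:
            cache.popitem(last=False)
    return hit

fs_cache, model = OrderedDict(), OrderedDict()
for block in [10, 11, 12, 10, 13]:
    if not lru_touch(fs_cache, block, 3):
        lru_touch(model, block, 3)    # only FS cache misses reach the disk

print(list(fs_cache))   # [12, 10, 13]
print(list(model))      # [11, 12, 13]
```

The second read of block 10 hits in the FS cache and never reaches the disk, so the model evicts block 10 while the real cache evicts block 11.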
Information
  Obtain information from observing disk traffic
    Knowledge of file system structures and operations
  File system maintains time of last access in inodes
    Periodic inode writes
    Assuming whole file access, all blocks are in FS cache
  Assume file system cache policy is LRU
Inferences
  Read for data block
    Block will be placed in file system cache (MRU block)
  Read for previously read data block
    Block became victim in file system cache
    Blocks with an earlier access time should also be victims
  Inode write: new access time, no disk read observed
    All blocks belonging to file are in FS cache
    Other blocks with later access time should also be present
Design
  Recency list (R-list)
    List of data blocks ordered by access time
  Cache Begin (CB) pointer
    Divides R-list into inclusive and exclusive regions
  RAID cache contents
    Subset of blocks in exclusive region
  [Figure: R-list from LRU to MRU, entries as (block number, access time): A,1  B,1  C,2  D,3  E,3  F,5. CB separates the exclusive region (blocks the RAID should cache) from the inclusive region (blocks expected to be in FS cache)]
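One possible in-memory representation of the R-list and CB pointer, as a sketch. The `Entry` type and the CB position are assumptions; the slide does not say where CB sits in this example.

```python
from typing import NamedTuple

class Entry(NamedTuple):
    block: str   # block number
    atime: int   # access time

# R-list from the slide, LRU end first.
r_list = [Entry("A", 1), Entry("B", 1), Entry("C", 2),
          Entry("D", 3), Entry("E", 3), Entry("F", 5)]
cb = 3   # hypothetical CB position: index of the first inclusive entry

exclusive = r_list[:cb]   # believed evicted from the FS cache: RAID caching candidates
inclusive = r_list[cb:]   # expected to be in the FS cache

print([e.block for e in exclusive])   # ['A', 'B', 'C']
print([e.block for e in inclusive])   # ['D', 'E', 'F']
```

With this layout the predicted FS cache size is simply the length of the inclusive region.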
Disk Read
  Read Block ‘D’; time = 6
  [Figure: R-list (LRU to MRU) before the read: A,1  B,1  C,2  D,3  E,3  F,4. After the read, D's entry carries access time 6 (D,6). CB separates the exclusive and inclusive regions]
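A hedged sketch of how a disk read might update the R-list, following the inferences stated earlier in the talk. The function `on_disk_read`, the list-of-pairs representation, the starting CB position, and the exact CB adjustment are assumptions, not the paper's algorithm.

```python
def on_disk_read(r_list, cb, block, now):
    """Update the R-list for an observed disk read and return the new CB index.

    r_list: list of (block, atime) pairs, LRU first; cb: index of the first
    inclusive entry. Assumes `now` is not earlier than any recorded access time.
    """
    pos = next((i for i, (b, _) in enumerate(r_list) if b == block), None)
    if pos is not None:
        # A re-read means the block had been evicted from the FS cache, and so
        # had every entry ordered before it: all of them must be exclusive.
        del r_list[pos]
        cb = cb - 1 if pos < cb else pos
    r_list.append((block, now))   # the block is now the MRU block of the FS cache
    return cb

r_list = [("A", 1), ("B", 1), ("C", 2), ("D", 3), ("E", 3), ("F", 4)]
cb = on_disk_read(r_list, 2, "D", now=6)   # hypothetical starting CB of 2
print(r_list)   # [('A', 1), ('B', 1), ('C', 2), ('E', 3), ('F', 4), ('D', 6)]
print(cb)       # 3
```

In this run the exclusive region grows from A, B to A, B, C, which shrinks the predicted FS cache size.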
Inode Write – Access time change
  Inode “23”: access time = 6
  Semantic knowledge: Inode “23” == blocks D & E
  Blocks D, E: access time = 6
  [Figure: R-list (LRU to MRU) before the inode write: A,1  B,1  C,2  D,3  E,4  F,5  G,7. After the inode write: A,1  B,1  C,2  F,5  D,6  E,6  G,7. CB separates the exclusive and inclusive regions]
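The inode-write update can be sketched in the same style. The function `on_inode_write`, the starting CB position, and the rule used to move CB are assumptions for illustration; the caller is assumed to have checked that no disk read was observed for the file's blocks.

```python
import bisect

def on_inode_write(r_list, cb, file_blocks, new_atime):
    """Reposition a file's blocks after an inode write with a new access time.

    r_list: list of (block, atime) pairs, LRU first; cb: index of the first
    inclusive entry. Returns (new_r_list, new_cb).
    """
    moved = [(b, new_atime) for b, _ in r_list if b in file_blocks]
    rest = [(b, t) for b, t in r_list if b not in file_blocks]
    # CB position once the file's blocks are taken out of the list.
    cb_rest = sum(1 for b, _ in r_list[:cb] if b not in file_blocks)
    if not moved:
        return rest, cb_rest
    # Re-insert the blocks where the new access time keeps the list ordered.
    at = bisect.bisect_right([t for _, t in rest], new_atime)
    new_list = rest[:at] + moved + rest[at:]
    # The file's blocks, and everything accessed later, are taken to be in the FS cache.
    return new_list, min(cb_rest, at)

r_list = [("A", 1), ("B", 1), ("C", 2), ("D", 3), ("E", 4), ("F", 5), ("G", 7)]
new_list, new_cb = on_inode_write(r_list, 5, {"D", "E"}, new_atime=6)
print(new_list)   # [('A', 1), ('B', 1), ('C', 2), ('F', 5), ('D', 6), ('E', 6), ('G', 7)]
print(new_cb)     # 3
```

Starting from a hypothetical CB of 5, blocks D and E leave the exclusive region, so the RAID should stop caching them.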
X-RAY Cache
  Keep track of additions to window in exclusive region
  Read newly-added blocks from disk
  Replace blocks no longer in the window
  Additional disk bandwidth
    Idle time, extra internal bandwidth, freeblock scheduling
  [Figure: R-list (LRU to MRU): A,1  B,1  C,2  F,5  D,6  E,6  G,7, with CB separating the exclusive and inclusive regions; RAID cache (size = 2 blocks)]
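A sketch of how the RAID cache contents might be derived from the window. It assumes the window is the group of exclusive entries closest to CB, that is, the most recent victims; the function `plan_cache_update`, the CB position, and the initial cache contents are hypothetical.

```python
def plan_cache_update(r_list, cb, raid_cache, raid_size):
    """Return (window, to_fetch, to_evict) for the current R-list and CB index.

    window: the raid_size exclusive blocks closest to CB (most recent victims);
    to_fetch: window blocks not yet cached, to be read from disk in idle time;
    to_evict: cached blocks that have left the window.
    """
    window = [b for b, _ in r_list[max(0, cb - raid_size):cb]]
    to_fetch = [b for b in window if b not in raid_cache]
    to_evict = sorted(raid_cache - set(window))
    return window, to_fetch, to_evict

r_list = [("A", 1), ("B", 1), ("C", 2), ("F", 5), ("D", 6), ("E", 6), ("G", 7)]
window, to_fetch, to_evict = plan_cache_update(r_list, 3, {"A", "B"}, raid_size=2)
print(window)     # ['B', 'C']
print(to_fetch)   # ['C']
print(to_evict)   # ['A']
```

Fetching only the difference between the old and new window keeps the extra disk traffic small, which matters because those reads compete for idle bandwidth.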
Talk Outline
  Introduction
  File Systems
  Information and Inferences
  X-RAY Cache Design
  Results
    Tracking FS Cache Contents
    RAID Cache Performance
  Conclusion
Results – Tracking
  Accurate size and content prediction
  Highly responsive to FS cache size changes
  Tolerates changes in inode write interval
  Partial file reads
    X-RAY performs well if percentage of partially accessed files is < 40% (typical traces have less than 30%)
Results – Cache Performance
  Performs better than LRU and Multi-Queue
  Close to DEMOTE, in spite of imperfect information
  Hit rate advantage translates to lower read latency
Additional Results
  File system cache policy is not LRU
    Clock, 2Q
    X-RAY performs nearly as well as before
    It performs better than both LRU and Multi-Queue
  Idle time requirements
    X-RAY reads blocks into cache only during idle time
    It performs well if idle time is greater than one-third of the actual idle time observed in the trace
  More in the paper …
Conclusion
  Easy deployment is an important goal in developing technology
    Avoid interface changes – use non-invasive mechanisms
  Higher-level systems maintain various pieces of information about data they manage
    Provide low-level systems with basic semantic knowledge
  Semantic intelligence for managing RAID caches
    Use access information in metadata to track file system cache contents and cache exclusive blocks
    In spite of imperfect information, X-RAY performs nearly as well as changing the interface
  Semantically-smart Disk Systems
    Availability, security and performance improvements
Questions?
ADvanced Systems Laboratory (ADSL)
Computer Sciences, University of Wisconsin-Madison
http://www.cs.wisc.edu/adsl