CS 5460: Operating Systems
CS5460: Operating Systems Lecture 17: Intro to File Systems
(Ch. 10)
Important From Last Time
Page replacement algorithms
– Optimal page replacement strategy evicts the page whose next use is farthest in the future
– LRU is a decent approximation of optimal
– Clock / second chance algorithm is a single-bit approximation of LRU; works well for most workloads
Thrashing happens when working sets do not fit into RAM
– Response: Swap out entire processes
– Last resort: Start killing processes
Copy-on-write optimizations
Memory-mapped file optimizations
Filesystem Layers
User’s viewpoint:
– Objects: Files, directories, bytes
– Operations: Create, read, write, delete, rename, move, seek, set attributes
Physical viewpoint:
– Objects: Sectors, tracks, disks
– Operations: Seek, read block, write block
User ↔ OS layer
– User library hides many details
– OS can directly read/write user data
OS ↔ Hardware layer
– I/O registers
– Interrupts
– DMA
[Figure: layer stack, top to bottom: User Apps; User Library; trap into the OS via Open() | Close() | Read() | Write(); Seek() | ReadBlk() | WriteBlk(); I/O regs, interrupts, DMA; Disk Hardware]
Typical Disk Organization
Coated with magnetic material that encodes bits
– Capacity increases come from improvements in bit density
Logically divided into:
– Spindles: individual disks
– Tracks: rings on a disk
– Sectors: portions of a track
– Cylinders: stacks of tracks
Read/write data (overview):
– Position disk head over track
– Wait for sector to rotate under head
– Read/write data from/to sector
[Figure: disk drive with labeled spindle, rotation, disk arm, R/W head, cylinder/track, and block/sector]
Disk Organization (cont’d)
Disk physics:
– Modern disks spin at 5400, 7200, 10000, and 15000 rpm
– Outside edge of 3.5” disk spins at over 150 mph
– Disk head “floats” on very thin cushion of air above platter
» Bernoulli effect used to “fly” as close as possible
» Head crash is exactly that → disk head contacts the surface
Disks organized as stacks of platters:
– Disk heads mounted on “combs” → often heads on both sides
– Separate disk heads moved independently
Disk controller
– Manages all the independent head movements
– Contains RAM to cache data read from and written to disk
– Accepts commands from CPU → responds using DMA/interrupts
Disk Hardware Trends
Year  Model        Size     Interface  Seek    RPM     Price
2001  ST320011A    20 GB    ATA/IDE    9.0 ms  7,200   $92
2001  ST318437LC   18 GB    U2-SCSI    3.6 ms  15,000  $329
2005  ST3120814A   120 GB   ATA-100    8.5 ms  7,200   $80
2005  ST373453LC   73 GB    U-SCSI     3.6 ms  15,000  $263
2007  ST3320820A   320 GB   PATA       4 ms    7,200   $99
2007  ST3146855LW  147 GB   Ultra320   2 ms    15,000  $313
2012  ST2000DM001  2000 GB  SATA       4.1 ms  7,200   $130
2012  ST3600057SS  600 GB   SAS        2 ms    15,000  $500
Data: Seagate, NewEgg, dirtcheapdrives.com
Eliminating seeks is critical to performance!
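A rough back-of-the-envelope sketch of why seeks matter so much: on average the head waits half a revolution for the sector to arrive, so a random access costs about the seek time plus 0.5 × 60000 / RPM milliseconds. The function names are illustrative, and the model assumes the seek figures listed above are averages and ignores transfer time and controller overhead.

```c
#include <assert.h>

/* Average rotational latency in milliseconds: half of one revolution. */
double avg_rotational_latency_ms(double rpm)
{
    assert(rpm > 0.0);
    return 0.5 * 60000.0 / rpm;
}

/* Rough cost of one random access: seek time plus average rotational delay. */
double random_access_ms(double seek_ms, double rpm)
{
    return seek_ms + avg_rotational_latency_ms(rpm);
}
```

Under this model a 7,200 rpm drive adds roughly 4.17 ms of rotational delay to every seek, so a random access costs several milliseconds no matter how little data is transferred.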
How to avoid seeks?
Design file system carefully
Use RAM as a cache for disk
– Once a block is read, cache it as long as possible
– When a block is written to, delay the actual write
Combine hard disk with solid state disk (SSD)
Replace hard disk with SSD
RAM cloud
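The two caching rules above (keep blocks once read, delay writes) can be sketched as a tiny write-back cache in front of a simulated disk. Everything here is an assumption made for illustration: the sizes, the direct-mapped placement, and all names. A real buffer cache would use a better replacement policy and would also flush periodically to bound data loss on a crash.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE   16   /* tiny blocks keep the example readable */
#define DISK_BLOCKS  8
#define CACHE_SLOTS  4

unsigned char disk[DISK_BLOCKS][BLOCK_SIZE];  /* simulated disk */
int disk_reads, disk_writes;                  /* count the "slow" operations */

struct slot {
    int valid, dirty, blockno;
    unsigned char data[BLOCK_SIZE];
};
struct slot cache[CACHE_SLOTS];

/* Write a dirty slot back to the simulated disk. */
static void flush_slot(struct slot *s)
{
    if (s->valid && s->dirty) {
        memcpy(disk[s->blockno], s->data, BLOCK_SIZE);
        disk_writes++;
        s->dirty = 0;
    }
}

/* Return the cached copy of a block, loading it from disk on a miss. */
static struct slot *get_block(int blockno)
{
    assert(blockno >= 0 && blockno < DISK_BLOCKS);
    struct slot *s = &cache[blockno % CACHE_SLOTS];
    if (!s->valid || s->blockno != blockno) {
        flush_slot(s);                        /* evict the old occupant first */
        memcpy(s->data, disk[blockno], BLOCK_SIZE);
        disk_reads++;
        s->valid = 1;
        s->dirty = 0;
        s->blockno = blockno;
    }
    return s;
}

unsigned char cache_read(int blockno, int off)
{
    assert(off >= 0 && off < BLOCK_SIZE);
    return get_block(blockno)->data[off];
}

/* Delayed write: only the cached copy changes until eviction or sync. */
void cache_write(int blockno, int off, unsigned char v)
{
    assert(off >= 0 && off < BLOCK_SIZE);
    struct slot *s = get_block(blockno);
    s->data[off] = v;
    s->dirty = 1;
}

/* Push every delayed write out to the disk. */
void cache_sync(void)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        flush_slot(&cache[i]);
}
```

Repeated writes to the same block cost a single disk write when the block is finally flushed, which is the payoff of delaying writes.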
What Do File System Users Need?
Persistence: Data persists beyond jobs, crash, …
– Disk provides basic non-volatile storage
– OS can enhance persistence via redundancy
Speed: Fast access to data
– Random access handled efficiently
– OS can enhance performance via file caching
Size: Can store lots of data
Sharing/protection:
– Users can control who/what has access to their data
Ease of use:
– Basic file abstraction (names, offsets, byte streams, …)
– Directories simplify naming and lookup
File System Abstractions
File: Basic container of persistent data
– Unix: flat byte stream
– IBM mainframes: series of records or objects
Directory system: Hierarchical naming relationships
– Directories are special “files” that index other files
– OS exports operations to manage directories indirectly
Common file access patterns:
– Sequential: data processed in order, byte/record at a time
» Example: Compiler reading a source file
– Random access: address blocks of data based on file offset
» Example: Demand paging reads, database searches
– Keyed access: address blocks based on “key” values
» Typically implemented using pairs of a key file (hash) and a data file
Common File System Operations
Data operations:
– Create()
– Delete()
– Open()
– Close()
– Read()
– Write()
– Seek()
Naming operations:
– HardLink()
– SoftLink()
– Rename()
Attribute operations:
– SetAttribute()
– GetAttribute()
Attributes include owner, protection, last accessed
File System Data Structures
Kernel (in-mem) Structures
– Global open file table
– Per-process open file table
– Free (disk) block list
– Free inode list
– File buffer cache: Cached disk blocks
– Inode cache
– Name cache
On-Disk Structures
– Superblock: File system format info
– File: Collection of blocks/bytes
– File descriptor (inode): File metadata
– Directory: Special kind of file
– Free block/inode maps
[Figure: file inode and file contents mapped onto disk contents]
Key: Provide this mapping efficiently and safely.
Key In-Memory Data Structures
Open file table: shared by all processes w/ open file
– Open count and “deleted” flag
– Copy of (or pointer to) file’s inode
– Location of file blocks in file buffer cache (see below)
Per-process file table: private for each process
– Pointer to entry in global open file table
– Current position in the file (“seek” pointer)
– Access mode (read, write, read-write)
File buffer cache: cache of file data blocks
– Indexed by file-blocknum pairs (hash structure)
– Used to reduce effective access time of disk operations
– Can hold blocks from user files, directories, file system metadata
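One possible shape for the "indexed by file-blocknum pairs" lookup is a chained hash table keyed on an inode number and a block number. This is a minimal sketch: the bucket count, the hash function, and every name are assumptions, and the block data and state flags are left out.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 64

struct buf {
    unsigned long inode_no;   /* which file */
    unsigned long blockno;    /* which block within that file */
    struct buf *next;         /* hash-chain link */
    /* a real cache would also hold the block data and state flags */
};

struct buf *buckets[NBUCKETS];

/* Combine the (file, block) pair into a bucket index. */
static size_t hash_pair(unsigned long inode_no, unsigned long blockno)
{
    return (size_t)((inode_no * 31u + blockno) % NBUCKETS);
}

/* Add a buffer to the front of its hash chain. */
void bcache_insert(struct buf *b)
{
    assert(b != NULL);
    size_t h = hash_pair(b->inode_no, b->blockno);
    b->next = buckets[h];
    buckets[h] = b;
}

/* Return the cached buffer for (inode_no, blockno), or NULL on a miss. */
struct buf *bcache_lookup(unsigned long inode_no, unsigned long blockno)
{
    for (struct buf *b = buckets[hash_pair(inode_no, blockno)]; b; b = b->next)
        if (b->inode_no == inode_no && b->blockno == blockno)
            return b;
    return NULL;   /* miss: caller must read the block from disk */
}
```

A hit avoids the disk entirely, which is how the cache lowers the effective access time.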
Key In-Memory Data Structures
Name cache: cache of recent name lookup results
– Indexed by full filename (hash structure)
– Used to eliminate directory traversals (disk ops) for name lookups
Free space “bitmap”:
– Used to track which blocks on disk are available
Free inode “bitmap”:
– Used to track which file index nodes on disk are available
Superblock: holds key metadata that describes disk
– Physical characteristics: size of disk, size of blocks, …
– Location of free space and free inode “bitmaps”
– Location of inodes
– Multiple copies stored in known location → redundancy
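The free space "bitmap" above can be sketched with one bit per disk block. The names, the 64-block disk, the first-fit scan, and the convention that a set bit means "in use" are all assumptions chosen for illustration; the same structure works for the free inode bitmap.

```c
#include <assert.h>
#include <limits.h>

#define NBLOCKS 64
unsigned char free_map[NBLOCKS / CHAR_BIT];   /* bit set = block in use */

/* Report whether block b is currently allocated. */
int block_in_use(int b)
{
    assert(b >= 0 && b < NBLOCKS);
    return (free_map[b / CHAR_BIT] >> (b % CHAR_BIT)) & 1;
}

/* Find the first free block, mark it used, and return its number; -1 if full. */
int alloc_block(void)
{
    for (int b = 0; b < NBLOCKS; b++) {
        if (!block_in_use(b)) {
            free_map[b / CHAR_BIT] |= (unsigned char)(1u << (b % CHAR_BIT));
            return b;
        }
    }
    return -1;
}

/* Mark block b as available again. */
void free_block(int b)
{
    assert(block_in_use(b));   /* freeing a free block indicates a bug */
    free_map[b / CHAR_BIT] &= (unsigned char)~(1u << (b % CHAR_BIT));
}
```

A bitmap costs one bit per block, so the whole map for a large disk is small enough to keep cached in memory.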
Key On-Disk Data Structures
File descriptor (aka “inode”)
– Link count
– Security attributes: UID, GID, …
– Size
– Access/modified times
– “Pointers” to blocks
– …
Directory file: array of…
– File name (fixed/variable size)
– Inode number
– Length of directory entry
Free block bitmap
Free inode bitmap
Superblock
File descriptor (inode):
    ulong  links;
    uid_t  uid;
    gid_t  gid;
    ulong  size;
    time_t access_time;
    time_t modified_time;
    addr_t blocklist…;
Directory file:
    Filename            inode#
    Filename            inode#
    REALLYLONGFILENAME  inode#
    Filename            inode#
    Short               inode#
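Because directory entries carry variable-size names, each entry records its own length so a scan can hop from one entry to the next. The sketch below assumes a hypothetical entry header (inode number, entry length, name length) followed by the name bytes; the field widths and function names are illustrative and do not describe any particular file system's format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk entry header; the name bytes follow immediately. */
struct dirent_hdr {
    uint32_t inode_no;   /* 0 marks an unused entry */
    uint16_t rec_len;    /* total length of this entry, header included */
    uint16_t name_len;   /* length of the name, no terminating NUL */
};

/* Append one entry at byte offset off; returns the offset just past it. */
size_t dir_append(unsigned char *dir, size_t off, uint32_t ino, const char *name)
{
    struct dirent_hdr h;
    h.inode_no = ino;
    h.name_len = (uint16_t)strlen(name);
    h.rec_len  = (uint16_t)(sizeof h + h.name_len);
    memcpy(dir + off, &h, sizeof h);
    memcpy(dir + off + sizeof h, name, h.name_len);
    return off + h.rec_len;
}

/* Scan len bytes of directory data; return the inode number or 0 if absent. */
uint32_t dir_lookup(const unsigned char *dir, size_t len, const char *name)
{
    size_t want = strlen(name);
    size_t off = 0;
    while (off + sizeof(struct dirent_hdr) <= len) {
        struct dirent_hdr h;
        memcpy(&h, dir + off, sizeof h);      /* avoids unaligned access */
        assert(h.rec_len >= sizeof h);        /* a corrupt length would loop forever */
        if (h.inode_no != 0 && h.name_len == want &&
            memcmp(dir + off + sizeof h, name, want) == 0)
            return h.inode_no;
        off += h.rec_len;                     /* entry length leads to the next entry */
    }
    return 0;
}
```

The scan is linear in the number of entries, which is one reason a name cache is worth having for large directories.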
Naming and Directories
Need a method to “name” files on disk:
– OS wants to use numbers or indices
– Users prefer textual/visual names and hierarchical organization
– Solution: Directories
Naming schemes:
– Simple: One name space for entire disk w/ unique names
– User-based: Each user has a single separate directory (TOPS-10)
– Hierarchical: Tree-structured name space (modern OSes)
» Store directories as special files flagged as “directory file”
» User programs can read directory like normal files
» Only special system calls can modify directory files
» Directory files contain <name, filedesc> pairs
» Special “root” directory
Traversing Directories (Simplified)
How do we locate file descriptor for “/foo/bar”?
– Divide file name into components (e.g., “/”, “foo”, and “bar”).
– Recursively descend directory hierarchy, at each step:
» Load file descriptor of “next” directory file
» Use file descriptor info to locate and load directory file contents
» Scan directory file for matching filename of next component
» If match found → extract file descriptor number from (name, filedesc)
» If no match → lookup failure
How can we speed up this process?
– Name cache
» Probe name cache for longest prefix contained in cache (e.g., “/foo”)
» Start recursive descent using longest prefix as starting point
Finding a File’s Inode on Disk
Locate inode for /foo/bar:
1. Find inode for “/”
– Always in known location
2. Read “/” directory into memory
3. Find “foo” entry
» If no match, fail lookup
4. Load “foo” inode from disk
5. Check permissions
» If no permission, fail lookup
6. Load “foo” directory blocks
7. Find “bar” entry
» If no match, fail lookup
8. Load “bar” inode from disk
9. Check permissions
» If no permission, fail lookup
[Figure: “/” inode → “/” directory (foo inode#) → “foo” inode → “foo” directory (bar inode#) → “bar” inode]
Note: Pointers are block/inode numbers, not addresses!
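The lookup steps for /foo/bar can be sketched over a toy in-memory inode table. This is an illustration under stated assumptions: the table contents, the fixed-size entry array, the `searchable` flag standing in for a permission check, and the name `resolve_path` are all invented here. A real implementation would read inodes and directory blocks from disk and would also check the final file's permissions against the requested access.

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 4
#define ROOT_INO    1           /* "/" lives at a known inode number */

struct dir_entry { const char *name; int ino; };

struct inode {
    int is_dir;
    int searchable;             /* stand-in for a real permission check */
    struct dir_entry entries[MAX_ENTRIES];   /* used only by directories */
};

/* Toy inode table indexed by inode number; 0 means "no inode". */
struct inode inodes[8] = {
    [1] = { 1, 1, { { "foo", 2 }, { "tmp", 4 } } },   /* "/"           */
    [2] = { 1, 1, { { "bar", 3 } } },                 /* "/foo"        */
    [3] = { 0, 1, { { NULL, 0 } } },                  /* "/foo/bar"    */
    [4] = { 1, 0, { { "secret", 5 } } },              /* "/tmp", no access */
    [5] = { 0, 1, { { NULL, 0 } } },                  /* "/tmp/secret" */
};

/* Scan one directory for a name of length n; return inode number or 0. */
static int dir_find(const struct inode *dir, const char *name, size_t n)
{
    for (int i = 0; i < MAX_ENTRIES; i++) {
        const char *e = dir->entries[i].name;
        if (e && strlen(e) == n && memcmp(e, name, n) == 0)
            return dir->entries[i].ino;
    }
    return 0;
}

/* Resolve an absolute path to an inode number; 0 means lookup failure. */
int resolve_path(const char *path)
{
    assert(path && path[0] == '/');
    int ino = ROOT_INO;
    const char *p = path;
    for (;;) {
        while (*p == '/')
            p++;                              /* skip separators */
        if (*p == '\0')
            return ino;                       /* no components left */
        size_t n = strcspn(p, "/");           /* length of next component */
        const struct inode *dir = &inodes[ino];
        if (!dir->is_dir || !dir->searchable)
            return 0;                         /* not a directory, or no permission */
        ino = dir_find(dir, p, n);
        if (ino == 0)
            return 0;                         /* no matching entry */
        p += n;
    }
}
```

Every value passed along the way is an inode number rather than a memory address, matching the note above.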
Finding a File’s Blocks on Disk
Conceptually, inode contains table:
– One entry per block in file
– Entry contains physical block address (e.g., platter 3, cylinder 1, sector 26)
– To locate data at offset X, read block (X / block_size)
[Table: Block Address 0, Block Address 1, …, Block Address N]
Issues → How do we physically implement this table?
– Most files are small
– Most disk space is occupied by (relatively few) large files
– Need to efficiently support both sequential and random access
– Want simple inode lookup and management mechanisms
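The conceptual table lookup is plain integer arithmetic: the table index is X / block_size and the position inside that block is X % block_size. The sketch below assumes a 4096-byte block, a fixed eight-entry table, and flat block numbers in place of platter/cylinder/sector addresses; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 4096u   /* assumed block size */
#define NDIRECT    8       /* assumed table length */

struct toy_inode {
    unsigned long size;                 /* file length in bytes */
    unsigned long block_addr[NDIRECT];  /* entry i = disk address of file block i */
};

/* Translate a byte offset into (disk block address, offset within block).
 * Returns 0 on success, -1 if the offset lies beyond the end of the file. */
int offset_to_block(const struct toy_inode *ip, unsigned long offset,
                    unsigned long *disk_block, unsigned long *within)
{
    assert(ip && disk_block && within);
    assert(ip->size <= (unsigned long)NDIRECT * BLOCK_SIZE);
    if (offset >= ip->size)
        return -1;
    *disk_block = ip->block_addr[offset / BLOCK_SIZE];
    *within = offset % BLOCK_SIZE;
    return 0;
}
```

A fixed-size table like this one is simple but caps the file size, which is the tension behind the design issues listed above.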
File System Operation Details
Create(name)
– Check permissions / quota
– Allocate disk space
– Create file descriptor w/ name, location on disk, attributes
– Add index to file descriptor in directory
– Optional: file type (e.g., Word doc)
» Richer interface
» More complicated implementation
Delete(name)
– Find directory containing file
– Remove filedesc from directory
– Free disk blocks used by file
– Note: Wait until last user closes
File System Operation Details
fid = Open(name, mode)
– Check if file already open. If not:
» Find the file (via a name lookup)
» Copy file descriptor into open file table
– Check protection → abort operation if access not allowed
– Increment open count in global open file table
– Create per-process file table entry
» Add pointer to corresponding entry in system open file table
» Initialize seek pointer to start of file
– Return per-process file table index
Close(fid)
– Remove entry in per-process file table
– Decrement open count in global open file table
» If 0, remove from open file table
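The bookkeeping in Open() and Close() can be sketched with a global table shared by all processes and a small table per process. This is a simplified illustration: table sizes and all names are assumptions, the file name stands in for the copied inode, and the name lookup and protection check are omitted.

```c
#include <assert.h>
#include <string.h>

#define MAX_GLOBAL 8
#define MAX_FDS    4

struct open_file {          /* entry in the global open file table */
    char name[32];          /* stand-in for a copy of the inode */
    int  open_count;        /* 0 = slot unused */
};

struct fd_entry {           /* entry in a per-process file table */
    struct open_file *file; /* NULL = slot unused */
    long pos;               /* seek pointer */
    int  mode;
};

struct process { struct fd_entry fds[MAX_FDS]; };

struct open_file global_table[MAX_GLOBAL];

/* Returns a per-process table index, or -1 if either table is full. */
int toy_open(struct process *p, const char *name, int mode)
{
    assert(p && name && strlen(name) < sizeof global_table[0].name);
    struct open_file *f = NULL, *unused = NULL;
    for (int i = 0; i < MAX_GLOBAL; i++) {
        if (global_table[i].open_count > 0 &&
            strcmp(global_table[i].name, name) == 0)
            f = &global_table[i];             /* file is already open */
        else if (global_table[i].open_count == 0 && !unused)
            unused = &global_table[i];
    }
    int fd = 0;
    while (fd < MAX_FDS && p->fds[fd].file)
        fd++;                                 /* first unused per-process slot */
    if (fd == MAX_FDS || (!f && !unused))
        return -1;
    if (!f) {                                 /* first opener: add global entry */
        f = unused;
        strcpy(f->name, name);
    }
    f->open_count++;
    p->fds[fd].file = f;
    p->fds[fd].pos = 0;                       /* seek pointer starts at offset 0 */
    p->fds[fd].mode = mode;
    return fd;
}

/* Returns 0 on success, -1 if fd does not name an open file. */
int toy_close(struct process *p, int fd)
{
    if (fd < 0 || fd >= MAX_FDS || !p->fds[fd].file)
        return -1;
    p->fds[fd].file->open_count--;   /* reaching 0 frees the global slot */
    p->fds[fd].file = NULL;
    return 0;
}
```

Two processes that open the same file share one global entry but keep separate seek pointers, so each reads at its own position.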
File System Operation Details
Read(fid, offset, size, buffer)
– Random access
– Reads “size” bytes starting at “offset” in the file into “buffer”
Read(fid, size, buffer)
– Sequential access
– Reads “size” bytes from current seek offset into “buffer”
– Increments current seek offset by number of bytes read
– May read fewer bytes than were requested
Write(…)
– Analogous to Read()
Seek(fid, offset)
– Sets “seek offset” to specified offset
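The two Read() variants and Seek() differ only in where the offset comes from. A minimal sketch over an in-memory "file" follows; the struct and function names are hypothetical, and a real implementation would fetch the data through the buffer cache instead of a single array.

```c
#include <assert.h>
#include <string.h>

struct toy_file {
    const char *data;   /* file contents, held in memory for this sketch */
    size_t size;        /* file length in bytes */
    size_t pos;         /* current seek offset */
};

/* Random access: copy up to size bytes starting at offset; returns bytes copied. */
size_t read_at(const struct toy_file *f, size_t offset, size_t size, char *buffer)
{
    assert(f && buffer);
    if (offset >= f->size)
        return 0;                         /* at or past end of file */
    if (size > f->size - offset)
        size = f->size - offset;          /* short read near end of file */
    memcpy(buffer, f->data + offset, size);
    return size;
}

/* Sequential access: read at the seek offset, then advance it. */
size_t read_seq(struct toy_file *f, size_t size, char *buffer)
{
    size_t n = read_at(f, f->pos, size, buffer);
    f->pos += n;
    return n;
}

/* Set the seek offset used by the next sequential read. */
void seek_to(struct toy_file *f, size_t offset)
{
    f->pos = offset;
}
```

Callers must use the returned count rather than the requested size, since a read near the end of the file returns fewer bytes.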
Important from Today
Key idea: Build hierarchical filesystem abstraction on top of a flat array of blocks
Filesystem goals
– Reads, writes, file management operations must be fast
– Efficient use of storage
– Data is durable in face of OS crashes (and maybe disk crashes)
– Implements OS’s security policy
Next time: How to implement a filesystem