

CS5460: Operating Systems
Lecture 17: Intro to File Systems (Ch. 10)

Important From Last Time
• Page replacement algorithms
  – The optimal page replacement strategy evicts the page whose next use is farthest in the future
  – LRU is a decent approximation of optimal
  – The clock (second-chance) algorithm is a single-bit approximation of LRU; it works well for most workloads
• Thrashing happens when working sets do not fit into RAM
  – Response: swap out entire processes
  – Last resort: start killing processes
• Copy-on-write optimizations
• Memory-mapped file optimizations
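The clock (second-chance) policy recapped above can be sketched in C as follows. This is a minimal illustration, assuming a hypothetical struct frame that exposes each frame's reference bit; a real kernel reads and clears hardware-maintained bits in page-table entries.

```c
#include <stdbool.h>
#include <stddef.h>

/* One physical frame as seen by the clock hand (hypothetical layout). */
struct frame {
    int  page;        /* virtual page held in this frame */
    bool referenced;  /* reference ("use") bit, set on every access */
};

/* Sweep the clock hand until it finds a frame whose reference bit is clear.
 * A frame with the bit set gets a second chance: the bit is cleared and the
 * hand moves on. In the worst case (every bit set) one full sweep clears all
 * bits and the starting frame is chosen. Returns the victim's index and
 * leaves *hand pointing just past it. */
size_t clock_pick_victim(struct frame *frames, size_t nframes, size_t *hand)
{
    for (;;) {
        size_t idx = *hand;
        *hand = (*hand + 1) % nframes;
        if (!frames[idx].referenced)
            return idx;
        frames[idx].referenced = false;   /* second chance */
    }
}
```

For example, with reference bits {set, clear, set} and the hand at frame 0, frame 0 loses its bit and frame 1 is evicted; the hand then rests at frame 2.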



Filesystem Layers
• User’s viewpoint:
  – Objects: files, directories, bytes
  – Operations: create, read, write, delete, rename, move, seek, set attributes
• Physical viewpoint:
  – Objects: sectors, tracks, disks
  – Operations: seek, read block, write block
• User ↔ OS layer
  – User library hides many details
  – OS can directly read/write user data
• OS ↔ Hardware layer
  – I/O registers
  – Interrupts
  – DMA

Figure: layer diagram from User Apps and the User Library down to Disk Hardware; interfaces shown: Open() | Close() | Read() | Write(), Seek() | ReadBlk() | WriteBlk(), trap, I/O registers, interrupts, and DMA.


Typical Disk Organization
• Platters are coated with magnetic material that encodes bits
  – Capacity increases come from improvements in bit density
• Logically divided into:
  – Spindles: individual disks
  – Tracks: rings on a disk
  – Sectors: portions of a track
  – Cylinders: stacks of tracks
• Reading/writing data (overview):
  – Position the disk head over the track
  – Wait for the sector to rotate under the head
  – Read/write data from/to the sector

Figure: disk anatomy, labeled block/sector, cylinder/track, R/W head, disk arm, rotation, and spindle.
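The cylinder/track/sector structure can be made concrete with the classic conversion from a (cylinder, head, sector) triple to a linear block number. This sketch uses the traditional ATA convention (sectors numbered from 1, cylinders and heads from 0) and an illustrative geometry; modern drives expose logical block addresses directly and the geometry they report does not match the physical layout, so treat this as a model of the idea rather than of any real controller.

```c
#include <stdint.h>

/* Classic CHS geometry (illustrative values supplied by the caller). */
struct chs_geometry {
    uint32_t heads_per_cylinder;   /* = tracks per cylinder */
    uint32_t sectors_per_track;
};

/* LBA = (C * heads_per_cylinder + H) * sectors_per_track + (S - 1) */
uint64_t chs_to_lba(const struct chs_geometry *g,
                    uint32_t c, uint32_t h, uint32_t s)
{
    return ((uint64_t)c * g->heads_per_cylinder + h) * g->sectors_per_track
           + (s - 1);
}

/* Inverse mapping: recover C, H, S from a logical block address. */
void lba_to_chs(const struct chs_geometry *g, uint64_t lba,
                uint32_t *c, uint32_t *h, uint32_t *s)
{
    uint64_t track = lba / g->sectors_per_track;   /* global track index */
    *s = (uint32_t)(lba % g->sectors_per_track) + 1;
    *h = (uint32_t)(track % g->heads_per_cylinder);
    *c = (uint32_t)(track / g->heads_per_cylinder);
}
```

With 16 heads and 63 sectors per track, (C, H, S) = (2, 3, 4) maps to block (2 × 16 + 3) × 63 + 3 = 2208. Consecutive block numbers fill a track, then the other tracks of the same cylinder, before the arm must move: that ordering is what lets sequential I/O avoid seeks.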


Disk Organization (cont’d)
• Disk physics:
  – Modern disks spin at 5400, 7200, 10000, and 15000 rpm
  – The outside edge of a 3.5” disk spins at over 150 mph
  – The disk head “floats” on a very thin cushion of air above the platter
    » The Bernoulli effect is used to “fly” the head as close as possible
    » A head crash is exactly that: the disk head contacts the surface
• Disks are organized as stacks of platters:
  – Disk heads are mounted on “combs”; often there are heads on both sides of a platter
  – Separate disk heads moved independently
• Disk controller
  – Manages all the independent head movements
  – Contains RAM to cache disk contents from/to disk
  – Accepts commands from the CPU and responds using DMA/interrupts


Disk Hardware Trends

• Eliminating seeks is critical to performance!

Year  Model        Size     Interface  Seek    RPM     Price
2001  ST320011A    20 GB    ATA/IDE    9.0 ms  7,200   $92
2001  ST318437LC   18 GB    U2-SCSI    3.6 ms  15,000  $329
2005  ST3120814A   120 GB   ATA-100    8.5 ms  7,200   $80
2005  ST373453LC   73 GB    U-SCSI     3.6 ms  15,000  $263
2007  ST3320820A   320 GB   PATA       4 ms    7,200   $99
2007  ST3146855LW  147 GB   Ultra320   2 ms    15,000  $313
2012  ST2000DM001  2000 GB  SATA       4.1 ms  7,200   $130
2012  ST3600057SS  600 GB   SAS        2 ms    15,000  $500

Data: Seagate, NewEgg, dirtcheapdrives.com
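The table's Seek and RPM columns combine into a rough per-request cost: on average the target sector is half a revolution away, so the average rotational latency is 0.5 × 60,000 / RPM milliseconds. The sketch below assumes the Seek column is an average seek time and ignores transfer time and controller overhead.

```c
/* Average rotational latency in ms: on average the desired sector is
 * half a revolution away from the head. */
double avg_rotational_ms(double rpm)
{
    double ms_per_rev = 60000.0 / rpm;
    return ms_per_rev / 2.0;
}

/* Rough random-access time: seek + average rotational latency.
 * Transfer time and controller overhead are ignored (assumption). */
double random_access_ms(double seek_ms, double rpm)
{
    return seek_ms + avg_rotational_ms(rpm);
}

/* Upper bound on random I/Os per second implied by that access time. */
double random_iops(double seek_ms, double rpm)
{
    return 1000.0 / random_access_ms(seek_ms, rpm);
}
```

For the ST2000DM001 row (4.1 ms, 7,200 rpm) this gives about 4.1 + 4.17 ≈ 8.3 ms per random access, or roughly 120 random I/Os per second; the ST3600057SS row (2 ms, 15,000 rpm) gives about 4 ms, or roughly 250. Capacity grew 100× across the table while per-request latency barely halved.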

How to avoid seeks?
• Design the file system carefully
• Use RAM as a cache for the disk
  – Once a block is read, cache it as long as possible
  – When a block is written, delay the actual write
• Combine a hard disk with a solid-state disk (SSD)
• Replace the hard disk with an SSD
• RAM cloud
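The “cache reads, delay writes” policy above can be sketched as a tiny write-back block cache in front of a simulated disk. The direct-mapped layout, sizes, and I/O counters are illustrative assumptions; real buffer caches use hashing and an LRU-like replacement policy.

```c
#include <stdbool.h>
#include <string.h>

#define BLOCK_SIZE  512
#define NUM_BLOCKS  64     /* size of the simulated disk (hypothetical) */
#define CACHE_SLOTS 8      /* direct-mapped: block b lives in slot b % 8 */

/* Simulated disk plus counters so the effect of caching is visible. */
static unsigned char disk[NUM_BLOCKS][BLOCK_SIZE];
static int disk_reads, disk_writes;

struct cache_slot {
    bool valid, dirty;
    int  blockno;
    unsigned char data[BLOCK_SIZE];
};
static struct cache_slot cache[CACHE_SLOTS];

/* Return the cached copy of block b, reading the disk only on a miss.
 * A dirty block being evicted is written back first (the delayed write). */
static struct cache_slot *cache_get(int b)
{
    struct cache_slot *s = &cache[b % CACHE_SLOTS];
    if (s->valid && s->blockno == b)
        return s;                                    /* hit: no disk I/O */
    if (s->valid && s->dirty) {
        memcpy(disk[s->blockno], s->data, BLOCK_SIZE);
        disk_writes++;
    }
    memcpy(s->data, disk[b], BLOCK_SIZE);
    disk_reads++;
    s->valid = true;
    s->dirty = false;
    s->blockno = b;
    return s;
}

void block_read(int b, unsigned char *out)
{
    memcpy(out, cache_get(b)->data, BLOCK_SIZE);
}

/* Writes only touch the cache. A miss still reads the old contents first
 * (simple read-allocate); whole-block writes could skip that read. */
void block_write(int b, const unsigned char *in)
{
    struct cache_slot *s = cache_get(b);
    memcpy(s->data, in, BLOCK_SIZE);
    s->dirty = true;
}

/* Push every dirty block to disk, as a periodic sync would. */
void cache_flush(void)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].valid && cache[i].dirty) {
            memcpy(disk[cache[i].blockno], cache[i].data, BLOCK_SIZE);
            disk_writes++;
            cache[i].dirty = false;
        }
    }
}
```

Repeated reads and writes of a cached block cost no disk I/O; a dirty block reaches the disk only on eviction or flush. The trade-off is durability: anything still dirty is lost if the machine crashes before the flush.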



What Do File System Users Need?
• Persistence: data persists beyond jobs, crashes, …
  – Disk provides basic non-volatile storage
  – OS can enhance persistence via redundancy
• Speed: fast access to data
  – Random access handled efficiently
  – OS can enhance performance via file caching
• Size: can store lots of data
• Sharing/protection:
  – Users can control who/what has access to their data
• Ease of use:
  – Basic file abstraction (names, offsets, byte streams, …)
  – Directories simplify naming and lookup


File System Abstractions
• File: basic container of persistent data
  – Unix: flat byte stream
  – IBM mainframes: series of records or objects
• Directory system: hierarchical naming relationships
  – Directories are special “files” that index other files
  – OS exports operations to manage directories indirectly
• Common file access patterns:
  – Sequential: data processed in order, a byte/record at a time
    » Example: a compiler reading a source file
  – Random access: address blocks of data based on file offset
    » Example: demand paging reads, database searches
  – Keyed access: address blocks based on “key” values
    » Typically implemented using key-file (hash) / data-file pairs
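The three access patterns map naturally onto POSIX calls: sequential access uses read(), which advances the file offset; random access uses pread() at a computed offset; keyed access first looks the key up in an index, then does a random read. The fixed-size records and the key_entry index (a linear scan standing in for a hash) are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Sequential: read front to back; each read() advances the file offset.
 * Returns the file's length in bytes, or -1 on error. */
long count_bytes_sequential(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    char buf[4096];
    long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;
    close(fd);
    return n < 0 ? -1 : total;
}

/* Random: fetch fixed-size record `recno` by computing its byte offset.
 * pread() does not change the file offset. */
ssize_t read_record(int fd, long recno, size_t recsize, void *out)
{
    return pread(fd, out, recsize, (off_t)recno * (off_t)recsize);
}

/* Keyed (hypothetical index): map a key to a record number, then read it.
 * A real key file would be a hash table or B-tree, not a linear scan. */
struct key_entry {
    const char *key;
    long recno;
};

ssize_t read_by_key(int fd, const struct key_entry *idx, size_t nidx,
                    const char *key, size_t recsize, void *out)
{
    for (size_t i = 0; i < nidx; i++)
        if (strcmp(idx[i].key, key) == 0)
            return read_record(fd, idx[i].recno, recsize, out);
    return -1;   /* key not found */
}
```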


Common File System Operations
• Data operations:
  – Create(), Delete(), Open(), Close(), Read(), Write(), Seek()
• Naming operations:
  – HardLink(), SoftLink(), Rename()
• Attribute operations:
  – SetAttribute(), GetAttribute()
  – Attributes include owner, protection, last-accessed time
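On POSIX systems the naming and attribute operations listed above correspond to link(), symlink(), rename(), stat()/lstat(), and chmod(). The sketch below exercises them in a scratch directory; the file names orig, hard, soft, and renamed are arbitrary.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>      /* open() */
#include <stdio.h>      /* rename(), snprintf() */
#include <sys/stat.h>   /* stat(), chmod() */
#include <unistd.h>     /* link(), symlink(), close() */

/* GetAttribute(): number of hard links to `path`, or -1 on error. */
long link_count(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_nlink : -1;
}

/* Run Create, HardLink, SoftLink, Rename, and SetAttribute inside `dir`
 * (which must exist and not yet contain these names). Returns 0 on success. */
int naming_demo(const char *dir)
{
    char orig[512], hard[512], soft[512], renamed[512];
    snprintf(orig, sizeof orig, "%s/orig", dir);
    snprintf(hard, sizeof hard, "%s/hard", dir);
    snprintf(soft, sizeof soft, "%s/soft", dir);
    snprintf(renamed, sizeof renamed, "%s/renamed", dir);

    int fd = open(orig, O_CREAT | O_WRONLY | O_TRUNC, 0644);   /* Create() */
    if (fd < 0)
        return -1;
    close(fd);

    if (link(orig, hard) != 0)        /* HardLink(): second name, same inode */
        return -1;
    if (symlink(orig, soft) != 0)     /* SoftLink(): new inode storing a path */
        return -1;
    if (rename(orig, renamed) != 0)   /* Rename(): only a directory entry changes */
        return -1;
    if (chmod(renamed, 0600) != 0)    /* SetAttribute(): protection bits */
        return -1;
    return 0;
}
```

Afterwards, “hard” and “renamed” are two names for one inode (link count 2, both mode 0600), while “soft” still stores the old path ending in /orig and now dangles. That difference is the essence of hard versus soft links.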


File System Data Structures
• Kernel (in-memory) structures
  – Global open file table
  – Per-process open file table
  – Free (disk) block list
  – Free inode list
  – File buffer cache: cached disk blocks
  – Inode cache
  – Name cache
• On-disk structures
  – Superblock: file system format info
  – File: collection of blocks/bytes
  – File descriptor (inode): file metadata
  – Directory: special kind of file
  – Free block/inode maps

Figure: file inode → file contents → disk contents. Key: provide this mapping efficiently and safely.


Key In-Memory Data Structures
• Open file table: shared by all processes with the file open
  – Open count and “deleted” flag
  – Copy of (or pointer to) the file’s inode
  – Location of file blocks in the file buffer cache (see below)
• Per-process file table: private to each process
  – Pointer to entry in global open file table
  – Current position in the file (“seek” pointer)
  – Access mode (read, write, read-write)
• File buffer cache: cache of file data blocks
  – Indexed by (file, block number) pairs (hash structure)
  – Used to reduce the effective access time of disk operations
  – Can hold blocks from user files, directories, and file system metadata
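A buffer cache indexed by (file, block number) pairs can be sketched as a chained hash table. Using the inode number as the file key, the table size, and the multiplicative hash are assumptions for illustration; eviction, locking, and dirty tracking are omitted.

```c
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 64   /* hypothetical table size */

/* One cached block, keyed by the (file, block number) pair. */
struct buf {
    uint32_t inum;       /* which file (inode number) */
    uint32_t blockno;    /* which block within that file */
    struct buf *next;    /* hash-chain link */
    unsigned char data[512];
};

static struct buf *buckets[NBUCKETS];

/* Mix both halves of the key so consecutive blocks of a file spread out. */
static unsigned hash_key(uint32_t inum, uint32_t blockno)
{
    return (inum * 2654435761u ^ blockno) % NBUCKETS;
}

/* Return the cached block, or NULL on a miss (caller then reads the disk). */
struct buf *bcache_lookup(uint32_t inum, uint32_t blockno)
{
    for (struct buf *b = buckets[hash_key(inum, blockno)]; b; b = b->next)
        if (b->inum == inum && b->blockno == blockno)
            return b;
    return NULL;
}

/* Add an entry for a freshly read block at the head of its chain. */
struct buf *bcache_insert(uint32_t inum, uint32_t blockno)
{
    struct buf *b = calloc(1, sizeof *b);
    if (b == NULL)
        return NULL;
    b->inum = inum;
    b->blockno = blockno;
    unsigned h = hash_key(inum, blockno);
    b->next = buckets[h];
    buckets[h] = b;
    return b;
}
```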


Key In-Memory Data Structures (cont’d)
• Name cache: cache of recent name lookup results
  – Indexed by full file name (hash structure)
  – Used to eliminate directory traversals (disk ops) for name lookups
• Free space “bitmap”:
  – Used to track which blocks on disk are available
• Free inode “bitmap”:
  – Used to track which file index nodes (inodes) on disk are available
• Superblock: holds key metadata that describes the disk
  – Physical characteristics: size of disk, size of blocks, …
  – Location of free space and free inode “bitmaps”
  – Location of inodes
  – Multiple copies stored in known locations for redundancy
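A free-space bitmap uses one bit per block; allocation scans for a clear bit and sets it. This sketch assumes bit i = 1 means block i is in use, a hypothetical 1024-block disk, and a scan that starts at a caller-supplied hint (for example, near the file’s previous block, to keep files contiguous). The same code would serve for the free inode bitmap.

```c
#include <stdint.h>

#define NBLOCKS 1024                       /* hypothetical disk size in blocks */

static uint8_t block_bitmap[NBLOCKS / 8];  /* bit i = 1 means block i is in use */

static int bit_is_set(uint32_t i)
{
    return (block_bitmap[i / 8] >> (i % 8)) & 1;
}

/* Allocate the first free block at or after `hint`, wrapping around.
 * Returns the block number, or -1 if every block is in use. */
long balloc(uint32_t hint)
{
    for (uint32_t k = 0; k < NBLOCKS; k++) {
        uint32_t i = (hint + k) % NBLOCKS;
        if (!bit_is_set(i)) {
            block_bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
            return (long)i;
        }
    }
    return -1;
}

/* Return block i to the free pool by clearing its bit. */
void bfree(uint32_t i)
{
    block_bitmap[i / 8] &= (uint8_t)~(1u << (i % 8));
}
```

A bitmap costs one bit per block (128 bytes here for 1024 blocks), small enough to cache in memory alongside the superblock.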


Key On-Disk Data Structures
• File descriptor (aka “inode”)
  – Link count
  – Security attributes: UID, GID, …
  – Size
  – Access/modified times
  – “Pointers” to blocks
  – …
• Directory file: array of…
  – File name (fixed/variable size)
  – Inode number
  – Length of directory entry
• Free block bitmap
• Free inode bitmap
• Superblock

Figure: File descriptor (inode): ulong links; uid_t uid; gid_t gid; ulong size; time_t access_time; time_t modified_time; addr_t blocklist…;
Figure: Directory file: a sequence of variable-length (filename, inode#) entries, e.g. a short name next to REALLYLONGFILENAME.
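The inode and directory-file layouts in the figure can be written as C structures. Field widths, the number of block pointers (NDIRECT), and the entry header (loosely modeled on ext2-style variable-length entries) are assumptions, not a specific on-disk format. The hypothetical dir_lookup shows why an entry length is stored: it lets a scan hop over short and long names alike.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>   /* uid_t, gid_t */
#include <time.h>        /* time_t */

#define NDIRECT 12       /* hypothetical number of block pointers */

/* File descriptor (inode) with the fields shown on the slide. */
struct dinode {
    uint32_t links;               /* link count: directory entries naming this inode */
    uid_t    uid;                 /* owner */
    gid_t    gid;                 /* group */
    uint64_t size;                /* file length in bytes */
    time_t   access_time;
    time_t   modified_time;
    uint32_t blocklist[NDIRECT];  /* disk block numbers, not memory addresses */
};

/* Header of a variable-length directory entry; name_len bytes of file name
 * follow it directly (not NUL-terminated). rec_len is the distance to the
 * next entry, so short and very long names can share one block. */
struct dirent_hdr {
    uint32_t inum;      /* 0 marks an unused entry in this sketch */
    uint16_t rec_len;
    uint16_t name_len;
};

/* Scan one directory block for `name`; return its inode number or 0. */
uint32_t dir_lookup(const unsigned char *blk, size_t blk_len, const char *name)
{
    size_t off = 0, want = strlen(name);
    while (off + sizeof(struct dirent_hdr) <= blk_len) {
        struct dirent_hdr h;
        memcpy(&h, blk + off, sizeof h);          /* avoids unaligned reads */
        if (h.rec_len == 0)
            break;                                /* malformed block: stop */
        if (h.inum != 0 && h.name_len == want &&
            off + sizeof h + want <= blk_len &&
            memcmp(blk + off + sizeof h, name, want) == 0)
            return h.inum;
        off += h.rec_len;
    }
    return 0;
}
```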


Naming and Directories
• Need a method to “name” files on disk:
  – OS wants to use numbers or indices
  – Users prefer textual/visual names and hierarchical organization
  – Solution: directories
• Naming schemes:
  – Simple: one name space for the entire disk, with unique names
  – User-based: each user has a single separate directory (TOPS-10)
  – Hierarchical: tree-structured name space (modern OSes)
    » Store directories as special files flagged as “directory file”
    » User programs can read directories like normal files
    » Only special system calls can modify directory files
    » Directory files contain <name, filedesc> pairs
    » Special “root” directory


Traversing Directories (Simplified)
• How do we locate the file descriptor for “/foo/bar”?
  – Divide the file name into components (e.g., “/”, “foo”, and “bar”)
  – Recursively descend the directory hierarchy; at each step:
    » Load the file descriptor of the “next” directory file
    » Use the file descriptor info to locate and load the directory file contents
    » Scan the directory file for a filename matching the next component
    » If a match is found, extract the file descriptor number from the (name, filedesc) pair
    » If there is no match, the lookup fails
• How can we speed up this process?
  – Name cache
    » Probe the name cache for the longest prefix contained in the cache (e.g., “/foo”)
    » Start the recursive descent using the longest prefix as the starting point
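The name-cache shortcut can be sketched as a longest-prefix probe: find the longest cached path that is a prefix of the requested path and ends on a component boundary, then resume the descent from there. The ncache_entry array and its linear scan are hypothetical stand-ins for a hash table keyed by full path.

```c
#include <stddef.h>
#include <string.h>

/* One name-cache entry: a full path and the inode it resolved to. */
struct ncache_entry {
    const char *path;     /* e.g. "/foo" */
    unsigned    inum;
};

/* Find the longest cached prefix of `path` that ends on a component
 * boundary. Returns its inode number (root_inum if nothing matches) and
 * sets *rest to the part of the path still to be resolved. */
unsigned ncache_longest_prefix(const struct ncache_entry *c, size_t n,
                               const char *path, unsigned root_inum,
                               const char **rest)
{
    size_t best_len = 0;
    unsigned best = root_inum;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(c[i].path);
        if (len > best_len && strncmp(path, c[i].path, len) == 0 &&
            (path[len] == '/' || path[len] == '\0')) {
            best_len = len;
            best = c[i].inum;
        }
    }
    *rest = path + best_len;
    return best;
}
```

With “/foo” cached, a lookup of “/foo/bar” starts at the inode for “/foo” and only has to resolve “/bar”; the boundary check keeps a cached “/fo” from matching.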


Finding a File’s Inode on Disk
Locate the inode for /foo/bar:
1. Find the inode for “/”
   – Always in a known location
2. Read the “/” directory into memory
3. Find the “foo” entry
   » If no match, fail lookup
4. Load the “foo” inode from disk
5. Check permissions
   » If no permission, fail lookup
6. Load the “foo” directory blocks
7. Find the “bar” entry
   » If no match, fail lookup
8. Load the “bar” inode from disk
9. Check permissions
   » If no permission, fail lookup

Figure: lookup chain: “/” inode → “/” directory (entry: foo inode#) → “foo” inode → “foo” directory (entry: bar inode#) → “bar” inode.
Note: Pointers are block/inode numbers, not addresses!
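The nine steps can be sketched as a loop over path components. Here a hypothetical in-memory table (toy_inode) stands in for inodes and directory blocks on disk, and a single searchable flag stands in for the execute (search) permission check; indices into the table play the role of inode numbers, in line with the note that pointers are numbers, not addresses.

```c
#include <stdbool.h>
#include <string.h>

#define ROOT_INUM   1        /* step 1: "/" lives at a known inode number */
#define MAX_ENTRIES 8

/* Toy stand-in for on-disk inodes and their directory contents. */
struct dentry { const char *name; int inum; };
struct toy_inode {
    bool is_dir;
    bool searchable;                   /* stands in for the x permission bit */
    int  nentries;
    struct dentry entries[MAX_ENTRIES];
};

/* Steps 3 and 7: scan a directory for one component of length `len`. */
static int dir_find(const struct toy_inode *dir, const char *name, size_t len)
{
    for (int i = 0; i < dir->nentries; i++)
        if (strlen(dir->entries[i].name) == len &&
            strncmp(dir->entries[i].name, name, len) == 0)
            return dir->entries[i].inum;
    return -1;
}

/* Resolve an absolute path to an inode number, or -1 on failure
 * (missing component, non-directory, or no search permission). */
int namei(const struct toy_inode *itab, const char *path)
{
    if (path[0] != '/')
        return -1;
    int cur = ROOT_INUM;
    const char *p = path;
    while (*p) {
        while (*p == '/')
            p++;                                  /* skip separators */
        if (*p == '\0')
            break;
        size_t len = strcspn(p, "/");             /* next component */
        const struct toy_inode *dir = &itab[cur];
        if (!dir->is_dir || !dir->searchable)     /* permission check */
            return -1;
        int next = dir_find(dir, p, len);
        if (next < 0)
            return -1;                            /* no match: fail lookup */
        cur = next;                               /* "load" the next inode */
        p += len;
    }
    return cur;
}
```

Each loop iteration costs a directory read and an inode read on a real disk, which is why the name cache above pays off.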


Finding a File’s Blocks on Disk
• Conceptually, the inode contains a table:
  – One entry per block in the file
  – Each entry contains a physical block address (e.g., platter 3, cylinder 1, sector 26)
  – To locate data at offset X, read table entry ⌊X / block_size⌋; the data starts X mod block_size bytes into that block
• Issues: how do we physically implement this table?
  – Most files are small
  – Most of the disk is contained in (relatively few) large files
  – Need to efficiently support both sequential and random access
  – Want simple inode lookup and management mechanisms

Figure: table of Block Address 0, Block Address 1, …, Block Address N
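The conceptual table translates a byte offset into a block with one division and one remainder. This sketch assumes a 4096-byte block and represents the table as a plain array of disk block numbers; how to store that array compactly for both small and huge files is exactly the open issue listed above.

```c
#include <stdint.h>

#define BLOCK_SIZE 4096u   /* hypothetical block size */

/* Conceptual per-file table: logical block i -> disk block block_addr[i]. */
struct file_map {
    uint64_t nblocks;
    const uint64_t *block_addr;
};

/* Translate byte offset x into (disk block, offset inside that block).
 * Returns 0 on success, -1 if x lies past the mapped blocks. */
int offset_to_block(const struct file_map *m, uint64_t x,
                    uint64_t *disk_block, uint32_t *within)
{
    uint64_t logical = x / BLOCK_SIZE;
    if (logical >= m->nblocks)
        return -1;
    *disk_block = m->block_addr[logical];
    *within = (uint32_t)(x % BLOCK_SIZE);
    return 0;
}

/* Number of blocks touched by a request of `len` bytes at offset `x`. */
uint64_t blocks_spanned(uint64_t x, uint64_t len)
{
    if (len == 0)
        return 0;
    return (x + len - 1) / BLOCK_SIZE - x / BLOCK_SIZE + 1;
}
```

For example, offset 5000 falls in logical block 1 at byte 904; a 200-byte read starting at offset 4000 straddles two blocks.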


File System Operation Details
• Create(name)
  – Check permissions / quota
  – Allocate disk space
  – Create a file descriptor with name, location on disk, attributes
  – Add an entry for the file descriptor to the directory
  – Optional: file type (e.g., Word doc)
    » Richer interface
    » More complicated implementation
• Delete(name)
  – Find the directory containing the file
  – Remove the filedesc from the directory
  – Free the disk blocks used by the file
  – Note: the blocks are freed only after the last user closes the file
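The “wait until the last close” rule is visible from user space on POSIX systems: unlink() removes the name immediately, but the data stays readable through any descriptor that is still open, and the space is reclaimed at the last close. The function name and the path used in testing are arbitrary.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <unistd.h>

/* Create `path`, write a short message, delete the file's only name while
 * it is still open, then read the message back through the descriptor.
 * Returns the number of bytes read back, or -1 on error. */
ssize_t read_after_unlink(const char *path, char *buf, size_t buflen)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    static const char msg[] = "still here";
    ssize_t n = -1;
    if (write(fd, msg, sizeof msg - 1) == (ssize_t)(sizeof msg - 1) &&
        unlink(path) == 0)                  /* Delete(name): the name is gone */
        n = pread(fd, buf, buflen, 0);      /* ...but the data is not */

    close(fd);   /* last close: the kernel may now free the blocks */
    return n;
}
```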


File System Operation Details
• fid = Open(name, mode)
  – Check if the file is already open. If not:
    » Find the file (via a name lookup)
    » Copy the file descriptor into the open file table
  – Check protection; abort the operation if access is not allowed
  – Increment the open count in the global open file table
  – Create a per-process file table entry
    » Add a pointer to the corresponding entry in the system open file table
    » Initialize the seek pointer to the start of the file
  – Return the per-process file table index
• Close(fid)
  – Remove the entry in the per-process file table
  – Decrement the open count in the global open file table
    » If 0, remove the entry from the open file table
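The Open()/Close() bookkeeping can be sketched with the two tables described earlier. This follows the slides’ model, in which the seek pointer lives in the per-process entry (on real POSIX systems the file offset is kept in the shared open file description instead). Name lookup and protection checks are omitted, and the table sizes and sketch_ function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_OPEN 16   /* hypothetical global table size */
#define MAX_FDS   8   /* hypothetical per-process table size */

/* Global open file table entry, shared by every process with the file open. */
struct open_file {
    int  inum;         /* stands in for the copied inode */
    int  open_count;   /* 0 means the slot is free */
    bool deleted;      /* set if Delete() ran while the file was open */
};

/* Per-process file table entry, as on the slide. */
struct proc_fd {
    struct open_file *file;   /* NULL means the slot is free */
    long seek;                /* current position in the file */
    int  mode;                /* access mode */
};

static struct open_file global_table[MAX_OPEN];

/* Open(): reuse the global entry if the file is already open, otherwise
 * claim a free one; then fill a per-process slot. Returns its index or -1. */
int sketch_open(struct proc_fd *fds, int inum, int mode)
{
    struct open_file *of = NULL;
    for (int i = 0; i < MAX_OPEN && of == NULL; i++)
        if (global_table[i].open_count > 0 && global_table[i].inum == inum)
            of = &global_table[i];
    for (int i = 0; i < MAX_OPEN && of == NULL; i++)
        if (global_table[i].open_count == 0) {
            of = &global_table[i];
            of->inum = inum;
            of->deleted = false;
        }
    if (of == NULL)
        return -1;                                   /* global table full */
    for (int fd = 0; fd < MAX_FDS; fd++)
        if (fds[fd].file == NULL) {
            of->open_count++;
            fds[fd] = (struct proc_fd){ of, 0, mode };  /* seek starts at 0 */
            return fd;
        }
    return -1;                                       /* per-process table full */
}

/* Close(): free the per-process slot and drop the shared count. Returns true
 * on the last close, when the global entry is released (and a deferred
 * delete could now free the file's blocks). */
bool sketch_close(struct proc_fd *fds, int fd)
{
    struct open_file *of = fds[fd].file;
    fds[fd].file = NULL;
    return --of->open_count == 0;
}
```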


File System Operation Details
• Read(fid, offset, size, buffer)
  – Random access
  – Reads “size” bytes starting at “offset” in the file into “buffer”
• Read(fid, size, buffer)
  – Sequential access
  – Reads “size” bytes from the current seek offset into “buffer”
  – Increments the current seek offset by the number of bytes read
  – May read fewer bytes than were requested
• Write(…)
  – Analogous to Read()
• Seek(fid, offset)
  – Sets the “seek offset” to the specified offset
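Because a sequential Read() may return fewer bytes than requested, callers usually loop. This sketch does that with POSIX read(), retrying on EINTR and stopping at end of file, and implements Seek() followed by a read with lseek(); pread() is the POSIX counterpart of the random-access Read(fid, offset, size, buffer) form. The function names read_fully and read_at are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Keep calling read() until `size` bytes arrive or end of file is reached.
 * Returns the byte count (possibly short at EOF) or -1 on error. */
ssize_t read_fully(int fd, void *buf, size_t size)
{
    size_t done = 0;
    while (done < size) {
        ssize_t n = read(fd, (char *)buf + done, size - done);
        if (n == 0)
            break;                     /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted by a signal: retry */
            return -1;
        }
        done += (size_t)n;             /* read() also advanced the file offset */
    }
    return (ssize_t)done;
}

/* Seek(fid, offset) followed by a sequential read. */
ssize_t read_at(int fd, off_t offset, void *buf, size_t size)
{
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;                     /* e.g. ESPIPE on a pipe */
    return read_fully(fd, buf, size);
}
```

On a pipe, which cannot seek, read_at fails with ESPIPE, while read_fully returns the 5 bytes available once the writer has closed its end.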

Important from Today
• Key idea: build a hierarchical filesystem abstraction on top of a flat array of blocks
• Filesystem goals
  – Reads, writes, and file management operations must be fast
  – Efficient use of storage
  – Data is durable in the face of OS crashes (and maybe disk crashes)
  – Implements the OS’s security policy
• Next time: how to implement a filesystem


Date post:	01-Aug-2020