CS 571 Operating Systems Angelos Stavrou, George Mason University File Systems & Performance
CS 571 Operating Systems

Angelos Stavrou, George Mason University

File Systems & Performance

GMU CS 571

File-System Interface

  File Concept
  File Operations
  Access Methods
  Directory Structure

File Concept

  A file is a named collection of related information that is recorded on secondary storage

  Information can be stored on several kinds of media (e.g., magnetic and optical disks)

  The operating system provides a uniform logical view of information storage

File Concept (Cont.)

  Files
    are mapped onto physical storage devices.
    represent programs (both source and object forms) and data.
    have a certain structure that may be considered as a sequence of bits, bytes, lines, or records.
    have attributes that are recorded by the O.S. (name, size, protection info, etc.)

  Information about files is kept in the directory structure, which is maintained on secondary storage.

Basic File Operations

  Create
  Write
  Read
  Delete
  Others
    reposition within the file, append, rename, ...

  For write/read operations, the operating system needs to keep a file position pointer for each process
    The pointer must be updated dynamically and correctly

  All these operations require that the directory structure first be searched for the target file

File Operations

  To avoid searching the directory entries repeatedly, many systems require that an Open system call be issued before a file is first used.

  The operating system keeps
    a system-wide open-file table containing information about all open files
    per-process open-file tables containing information about the open files of each process

  The open operation takes a file name and searches the directory, copying the directory entry into the open-file table. It returns a pointer to the entry in the open-file table.

File Operations (Cont.)

[Figure: Process A's Open-File Table, Process B's Open-File Table, and the System-Wide Open-File Table]

File Operations

  The per-process open-file table contains info about
    Position pointer
    Access rights
    Accounting
    Pointer to the system-wide open-file table entry

  The system-wide open-file table includes info about
    File location on the disk
    File size
    File open count (the number of processes using this file)

  A process that completes its operations on a given file will issue a Close system call.
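
The two-level table arrangement described above can be sketched in a few lines of Python. This is only an illustrative model, not how any particular kernel implements it: the class names, the dictionary standing in for the directory, and the choice of the smallest free integer as the returned handle are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SystemEntry:
    """One entry in the system-wide open-file table."""
    location: int        # hypothetical starting block of the file on disk
    size: int            # file size in bytes
    open_count: int = 0  # number of opens currently referring to this entry


@dataclass
class ProcessEntry:
    """One entry in a per-process open-file table."""
    position: int        # file position pointer for this open
    access: str          # access rights requested at open time
    system: SystemEntry  # pointer to the system-wide entry


class OpenFileTables:
    def __init__(self, directory):
        self.directory = directory  # name -> (location, size); stands in for the on-disk directory
        self.system_table = {}      # name -> SystemEntry
        self.process_tables = {}    # pid -> {handle: (name, ProcessEntry)}

    def open(self, pid, name, access="r"):
        entry = self.system_table.get(name)
        if entry is None:
            # The directory is searched only when the file is not already open.
            location, size = self.directory[name]
            entry = self.system_table[name] = SystemEntry(location, size)
        entry.open_count += 1
        table = self.process_tables.setdefault(pid, {})
        handle = 0
        while handle in table:      # pick the smallest unused handle
            handle += 1
        table[handle] = (name, ProcessEntry(0, access, entry))
        return handle

    def close(self, pid, handle):
        name, pentry = self.process_tables[pid].pop(handle)
        pentry.system.open_count -= 1
        if pentry.system.open_count == 0:
            # Last user is gone: drop the system-wide entry.
            del self.system_table[name]
```

Two processes opening the same file share one system-wide entry (open count 2) while each keeps its own position pointer; the entry disappears only after both have closed the file.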

Internal File Structure

  Disk systems have a well-defined block size determined by the size of a sector.
  All disk I/O is performed in units of one block (physical record).
    Each block is one or more sectors
    A sector can hold 32 to 4096 bytes

  Files are made of logical records. Typically, a fixed number of logical records will be packed into physical records.

  The operating system performs the translation from logical records to physical records.
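
As a rough illustration of that translation, the sketch below maps a logical record number to a block number and a byte offset within the block. It assumes fixed-length records that never span a block boundary; the function name and the example sizes (100-byte records, 512-byte blocks) are chosen only for the example.

```python
def locate_record(record_number, record_size, block_size):
    """Return (block number, byte offset in block) for a fixed-length record.

    Assumes records are packed whole into blocks, so any space left at the
    end of a block is wasted (internal fragmentation).
    """
    records_per_block = block_size // record_size
    if records_per_block == 0:
        raise ValueError("record does not fit in one block")
    block = record_number // records_per_block
    offset = (record_number % records_per_block) * record_size
    return block, offset
```

With 100-byte records in 512-byte blocks, five records fit per block and 12 bytes per block go unused; `locate_record(7, 100, 512)` gives block 1, offset 200.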

Access Methods

  Sequential Access
    Information is processed in order, one record after the other
    Example: editors and compilers

    read next
    write next
    reset (rewind)

Access Methods

  Direct Access
    The file is made up of fixed-length logical records that allow programs to read and write records rapidly in no particular order

    read n
    write n

    or alternatively:
    position to n
    read next
    write next

    n = relative block number
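
The "position to n, then read next" form of direct access corresponds closely to seeking before a read. The sketch below is one way to show it in Python; the 20-byte record layout (a 4-byte integer key plus a 16-byte name) and the helper names are assumptions made for the illustration.

```python
import struct
import tempfile

# Hypothetical fixed-length record: little-endian int key + 16-byte name.
RECORD = struct.Struct("<i16s")


def write_record(f, n, key, name):
    f.seek(n * RECORD.size)              # position to n
    f.write(RECORD.pack(key, name.encode()))  # write next


def read_record(f, n):
    f.seek(n * RECORD.size)              # position to n
    key, raw = RECORD.unpack(f.read(RECORD.size))  # read next
    return key, raw.rstrip(b"\0").decode()


def demo():
    with tempfile.TemporaryFile() as f:
        for n, name in enumerate(["alpha", "beta", "gamma", "delta"]):
            write_record(f, n, n * 10, name)
        # Records are fetched in arbitrary order, without reading the ones between.
        return read_record(f, 2), read_record(f, 0)
```

Because every record has the same length, the byte position of record n is simply n times the record size, which is what makes access in "no particular order" cheap.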

Storage Structure

  A disk can be used for:
    a single file system (in its entirety)
    multiple file systems
    in part for file systems, in part for other purposes (e.g. for swap space or unformatted (raw) disk space)

  These parts are known as partitions, slices or minidisks.

  The parts can be combined to form larger structures known as volumes (virtual disks).

Storage Structure (Cont.)

  Each partition can be either “raw” (containing no file system), or “cooked” (with a file system)

  Raw disk
    contains a large sequential array of logical blocks, without any file-system data
    can be used as swap space
    can be used for special (e.g. database) applications

Swap Space

  The portion of the storage used for virtual memory support. It may hold:
    entire process images (e.g., by copying the entire process image to the swap space at process start-up time and then performing demand paging from the swap space)
    individual pages pushed out of main memory during page replacement.

  It can be implemented in a separate raw partition
    Bypasses the file system
    Uses algorithms and disk allocation techniques designed specifically for speed

  Or alternatively, as a large file within the file system.

Storage Structure (Cont.)

  Each partition that contains a file system has a device directory
    The device directory keeps information (name, location, size, type, owner) for files on that partition.

Directory Structure

  The directory acts as a symbol table that translates file names into their directory entries.

  Operations on a directory
    Search for a file
    Create a file
    Delete a file
    List a directory
    Rename a file
    …

Directory Structure

  Early systems used single-level and two-level directory structures
  Tree-structured directories extend the structure to a tree of arbitrary height
    Users can create their own subdirectories
    Every file has a unique path name

Acyclic-Graph Directories

  Allows shared subdirectories and files.
  A shared file will “exist” in multiple directories at once.

Achieving File Sharing

  Option 1: Duplicate all information about the shared file in both directories (Problem?)

  Option 2: Create a new kind of directory entry called a link
    The link is effectively a pointer to another file or directory
    When the directory entry of a referenced file is a link, we resolve the link by using the path name (symbolic link in Unix)
    “ln -s reports/report1.txt myreport”

Achieving File Sharing (Cont.)

  Option 3: Each entry in a directory can point to a little data structure (File Control Block [FCB], or “i-node”) that keeps information about the file
    The directory entries corresponding to a shared file will all point to the same file control block
    Non-symbolic or “hard” links in Unix
    “ln reports/report1.txt myreport”

[Figure: the “report1.txt” entry in the “reports” directory and the “myreport” entry in the “root” directory both point to the FCB of the file]

Achieving File Sharing (Cont.)

  What to do when a shared file is deleted by a user?
    The deletion of a link should not affect the original file
    If the original file is deleted, we may be left with dangling pointers.

  Solutions
    Using backpointers, delete all links as well. The search may be expensive.

    Alternatively, leave the links intact until an attempt is made to use them (Unix symbolic links). May lead to infrequent but subtle problems.

    In the case of non-symbolic (or in Unix, “hard”) links: preserve the file until all references are deleted. Keep a count of the number of references, and delete the file when the count reaches zero.
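
The difference between the two link types on deletion can be observed directly on a POSIX system. The sketch below is a small experiment rather than a definitive test: it assumes a file system that supports both hard and symbolic links, and the file names inside the temporary directory are arbitrary.

```python
import os
import tempfile


def link_demo():
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "report1.txt")
        hard = os.path.join(root, "hard_link")
        soft = os.path.join(root, "soft_link")
        with open(original, "w") as f:
            f.write("data")

        os.link(original, hard)      # like: ln report1.txt hard_link
        os.symlink(original, soft)   # like: ln -s report1.txt soft_link
        count_before = os.stat(original).st_nlink  # references to the i-node

        os.remove(original)          # delete the original name
        count_after = os.stat(hard).st_nlink
        with open(hard) as f:
            still_readable = f.read() == "data"
        # The symbolic link still exists as an entry, but its target is gone.
        dangling = os.path.islink(soft) and not os.path.exists(soft)
        return count_before, count_after, still_readable, dangling
```

On a typical Unix file system the reference count drops from 2 to 1, the data stays reachable through the hard link, and the symbolic link is left dangling until someone tries to follow it.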

File System Implementation

  File System Structure
  File System Implementation
  Allocation Methods
  File System Performance

File System Structure

  An operating system may allow multiple file systems.

  Once the user interface is determined, the file system must be implemented to map the logical file system to the secondary-storage devices.

  File control block: a storage structure that keeps information about a given file (Unix “i-nodes”).
    Ownership, size, permissions, access date info, location of data blocks

Accessing Disk Sub-system

  Disk access time has two components

  Random access time that includes seek time and rotational latency (5-10 ms)

  Transfer time (at a transfer rate of about 10 MB/s)

  Compare to the memory access time of 10-100 nanoseconds
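
A quick calculation makes the gap concrete. The sketch below adds the two components for a single block read using the figures quoted above; the 4 KB block size and the reading of "MB" as 10^6 bytes are assumptions made for the example.

```python
def disk_read_time_ms(nbytes, positioning_ms, rate_mb_per_s=10.0):
    """Seek + rotational latency, plus transfer time, in milliseconds."""
    transfer_ms = nbytes / (rate_mb_per_s * 1_000_000) * 1000
    return positioning_ms + transfer_ms


# One 4 KB block: about 0.41 ms of transfer on top of 5-10 ms of positioning.
best_case = disk_read_time_ms(4096, 5)    # ~5.41 ms
worst_case = disk_read_time_ms(4096, 10)  # ~10.41 ms

# Against a 100 ns (0.0001 ms) memory access, the best case is still
# tens of thousands of times slower.
slowdown = best_case / 0.0001
```

Positioning dominates for small transfers, which is why the later techniques (caching, read-ahead, clustering related blocks) all aim to avoid or amortize seeks.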

Layered File System

  The file system is organized into layers

  Logical File System Layer manages the file-system structure (through directories and FCBs).

  File-Organization Module performs mapping between logical blocks and physical blocks. It also includes free-space manager and block allocation manager.

  Basic File System Layer issues generic commands to the appropriate device driver (I/O Control Layer) to read and write physical blocks on the disk

Implementing File Operations

  Opening a File
    The file name is passed to the logical file system
    The logical file system searches the directory structure
    Once the file is found, the FCB is copied into a system-wide open-file table.
    An entry is made in the per-process open-file table.
    The “open” system call returns a pointer to the appropriate entry in the per-process open-file table (“file descriptor” in Unix, “file handle” in Windows 2000/XP).

  In reality, the “open” system call first searches the system-wide open-file table to see if the file is already in use by another process.

Implementing File Operations

  Creating a File

  The application program calls the logical file system

  The logical file system allocates a new FCB, reads the appropriate directory into memory, updates it and writes it back to disk.

Implementation of “Open” and “Read”

  Figure (a) refers to opening a file.
  Figure (b) refers to reading a file.

Implementing File Operations

  Closing a file
    The per-process open-file table entry is removed
    The system-wide open-file table entry’s open count is decremented
    When the count reaches zero, the updated file information is copied back to the disk-based directory structure and the system-wide open-file table entry is removed.

  Incorporating Networking
    The FCB may also contain information for network connections and devices.

Allocation Methods

  The allocation method refers to how disk blocks are allocated for files:

  Contiguous allocation

  Linked allocation

  Indexed allocation

Contiguous Allocation

  Each file occupies a set of contiguous blocks on the disk.
  Simple: only the starting location (block #) and length (number of blocks) are required.

Contiguous Allocation

  Efficient access to multiple blocks of a file

  Both sequential and direct access can be supported.

  A major problem is determining how much space is needed for a new file.

  Finding space for a new file: First-fit and best-fit

  External fragmentation: free space is broken into multiple chunks.
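
The two placement strategies and the fragmentation problem can be shown with a toy free list. In the sketch below, each hole is a hypothetical `(start block, length)` pair; the function names and the example holes are assumptions made for illustration.

```python
def first_fit(holes, need):
    """Return the start of the first hole that is large enough, else None."""
    for start, length in holes:
        if length >= need:
            return start
    return None


def best_fit(holes, need):
    """Return the start of the smallest hole that is large enough, else None."""
    candidates = [(length, start) for start, length in holes if length >= need]
    return min(candidates)[1] if candidates else None


# Free space as (start block, length in blocks): 33 free blocks in total.
holes = [(0, 5), (10, 20), (40, 8)]
```

For a 6-block file, first-fit picks the hole at block 10 while best-fit picks the tighter hole at block 40. A 30-block file cannot be placed by either strategy even though 33 blocks are free in total, which is external fragmentation in action.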

Contiguous Allocation

  Many newer file systems use a modified contiguous allocation scheme.

  A contiguous chunk of space is allocated initially, and then, when this amount is not sufficient, another chunk (an extent) is added.

  The location of a file’s blocks is then recorded as a location and a block count, plus a link to the first block of the next extent.

Linked Allocation

  Each file is a linked list of disk blocks: blocks may be scattered anywhere on the disk.
  Each block contains a pointer to the next block.
  Each directory entry has a pointer to the first and last disk blocks of the file.

Linked Allocation

  External fragmentation is eliminated.
  The size of a file does not need to be declared at the time of creation.
  Can be used effectively only for sequential-access files (inefficient for direct-access files).
  Some disk space is needed for the pointers.
    One solution is to collect blocks into multiples (clusters) and to allocate the clusters rather than blocks.

  Another problem of linked allocation is reliability: what will happen if a pointer is lost or damaged?

File-Allocation Table (FAT)

  A variation of the linked allocation method

  A section of the disk at the beginning of each partition is used as the File Allocation Table.

  The table entries give the block number of the next block in the file.

  The scheme can result in a significant number of disk head seeks, unless the FAT is cached.

  Free block management
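
Following a file through the table can be sketched as below. The in-memory dictionary stands in for a cached FAT, and the `END` marker, the function name, and the block numbers are all illustrative assumptions rather than the on-disk FAT format.

```python
END = -1  # hypothetical end-of-file marker in a table entry


def file_blocks(fat, first_block):
    """Walk the chain of table entries starting at the file's first block."""
    blocks = []
    seen = set()
    block = first_block
    while block != END:
        if block in seen:
            # A damaged table could loop forever; stop instead.
            raise ValueError("cycle in allocation table")
        seen.add(block)
        blocks.append(block)
        block = fat[block]  # the entry gives the number of the next block
    return blocks


# A three-block file whose directory entry points at block 217.
fat = {217: 618, 618: 339, 339: END}
```

Here `file_blocks(fat, 217)` yields `[217, 618, 339]`. Reaching the i-th block still means following i entries, but when the table is cached those steps are memory lookups rather than disk seeks.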

Indexed Allocation

  Brings all pointers together in the index block.
  Each file has its own index block, which is an array of disk-block addresses.
  The ith entry in the index block points to the ith block of the file.

Indexed Allocation

  Indexed allocation supports direct access, without suffering from external fragmentation or size-declaration problems.

  However, wasted space may be a problem.
    How large should the index block be?

  To reduce the wasted space, we want to keep the index block small

  If the index block is too small, it will not be able to hold pointers for a large file.

  Ways to handle large files:
    Linked scheme
    Multilevel scheme
    Combined scheme

Combined Scheme (Unix)

  Keep the first (say) 15 pointers of the index block in the file’s i-node (FCB).

  The first 12 of these pointers point to direct blocks

  The next three pointers point to indirect blocks
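
To get a feel for how far 15 pointers can reach, the sketch below computes the largest file such an i-node could address. It assumes the three indirect pointers are single, double, and triple indirect (the usual Unix arrangement, though the slide does not spell it out), and it assumes 4 KB blocks and 4-byte block pointers; all of these are parameters of the example, not facts about any specific file system.

```python
def max_file_size(block_size=4096, pointer_size=4, direct=12):
    """Largest addressable file in bytes under the assumed layout."""
    per_block = block_size // pointer_size  # pointers held by one index block
    blocks = direct + per_block + per_block ** 2 + per_block ** 3
    return blocks * block_size


def pointer_level(block_index, block_size=4096, pointer_size=4, direct=12):
    """Which kind of pointer reaches the given logical block of the file."""
    per_block = block_size // pointer_size
    if block_index < direct:
        return "direct"
    block_index -= direct
    if block_index < per_block:
        return "single indirect"
    block_index -= per_block
    if block_index < per_block ** 2:
        return "double indirect"
    return "triple indirect"
```

Under these assumptions the first 48 KB of a file is reachable with no extra disk reads for index blocks, while the total addressable size comes to roughly 4 TiB, so small files stay cheap and large files remain possible.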

File System Performance

  Disk access is the bottleneck for file system performance

  Caching
    Most disk controllers have an on-board cache that can store entire tracks at a time
      Subsequent requests can be served through the on-board cache

    Most systems maintain a separate section of main memory for a disk cache (block cache, or buffer cache), where blocks are kept under the assumption that they will be re-used in the near future

File System Performance (Cont.)

  LRU is a reasonable block replacement policy, BUT: if a critical block (such as a File Control Block, or i-node) is read into the cache and modified, but not rewritten to the disk, a crash will leave the file system in an inconsistent state.

  Critical blocks must be written immediately.

  Avoiding inconsistency
    Write-through cache: write every modified block to disk as soon as it has been written

  UNIX solution
    The system call sync forces all the modified blocks out onto the disk immediately.
    A program, usually called update, is invoked in the background to call sync every 30 seconds.
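
These policies can be combined in a small model of a block cache. The sketch below is a simplification under stated assumptions: a dictionary stands in for the disk, the `critical` flag is a made-up way to mark blocks that must be written through, and `sync` here only imitates the effect of the system call on this toy cache.

```python
from collections import OrderedDict


class BlockCache:
    def __init__(self, disk, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.disk = disk             # block number -> data; stands in for the device
        self.capacity = capacity
        self.blocks = OrderedDict()  # cached blocks, least recently used first
        self.dirty = set()           # modified blocks not yet on disk

    def _touch(self, n, data):
        self.blocks[n] = data
        self.blocks.move_to_end(n)   # n is now the most recently used
        while len(self.blocks) > self.capacity:
            victim, vdata = self.blocks.popitem(last=False)  # evict the LRU block
            if victim in self.dirty:
                self.disk[victim] = vdata   # write back before dropping it
                self.dirty.discard(victim)

    def read(self, n):
        data = self.blocks[n] if n in self.blocks else self.disk[n]
        self._touch(n, data)
        return data

    def write(self, n, data, critical=False):
        if critical:
            self.disk[n] = data      # write-through: on disk immediately
            self.dirty.discard(n)
        else:
            self.dirty.add(n)        # delayed write: on disk at eviction or sync
        self._touch(n, data)

    def sync(self):
        """Force every modified block out to the disk."""
        for n in self.dirty:
            self.disk[n] = self.blocks[n]
        self.dirty.clear()
```

An ordinary write leaves the disk copy stale until eviction or `sync`, which is exactly the window in which a crash loses data; marking a block critical closes that window for the blocks whose loss would corrupt the file system.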

Memory-Mapped Files

  Memory-mapped file I/O allows file I/O to be treated as routine memory access by mapping disk blocks to pages in memory.

  A file is initially read using demand paging. A page-sized portion of the file is read from the file system into a physical page. Subsequent reads/writes to/from the file are treated as ordinary memory accesses.

  Simplifies file access by treating file I/O through memory rather than read() / write() system calls.

  Also allows several processes to map the same file allowing the pages in memory to be shared.
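
Python's standard `mmap` module gives a convenient way to try this out. The sketch below maps a small temporary file and then reads and updates it with ordinary slice operations instead of read()/write() calls; the file contents and the function name are arbitrary choices for the example.

```python
import mmap
import tempfile


def mmap_demo():
    with tempfile.TemporaryFile() as f:
        f.write(b"hello world")
        f.flush()                    # the file must be non-empty before mapping
        with mmap.mmap(f.fileno(), 0) as m:  # length 0 maps the whole file
            first = bytes(m[0:5])    # a read is just a memory access
            m[0:5] = b"HELLO"        # so is a write
            m.flush()                # push modified pages back to the file
        f.seek(0)
        return first, f.read()
```

After the mapping is flushed and closed, reading the file through the normal interface shows the change made through memory.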

Unified Virtual Memory

  Mapping can be done explicitly (through a system call), or the operating system can choose to memory-map a file to kernel space when a file is opened and accessed through ordinary system calls such as open(), read() and write().

  The file data can also be cached in a page cache: virtual-memory techniques are used to cache file data as pages rather than as file-system-oriented blocks (Solaris, Windows 2000/XP, new Linux releases). This is known as unified virtual memory.

More on File System Performance

  Block-read-ahead: When reading block k to the cache in memory, read also block k+1

  Reduce disk arm motion through
    Putting blocks that are likely to be accessed in sequence close to each other
    Disk scheduling algorithms that serve pending disk access requests in an order that reduces the delay
