Date post: 14-Mar-2019
Chapter 6: File Systems

Chapter 6, CS 1550, cs.pitt.edu (originally modified by Ethan L. Miller and Scott A. Brandt)

File systems

Files
Directories & naming
File system implementation
Example file systems


Long-term information storage

Must store large amounts of data
    Gigabytes -> terabytes -> petabytes
Stored information must survive the termination of the process using it
    Lifetime can be seconds to years
    Must have some way of finding it!
Multiple processes must be able to access the information concurrently


Naming files

Important to be able to find files after they’re created
Every file has at least one name
Name can be
    Human-accessible: “foo.c”, “my photo”, “Go Panthers!”, “Go Banana Slugs!”
    Machine-usable: 4502, 33481
Case may or may not matter
    Depends on the file system
Name may include information about the file’s contents
    Certainly does for the user (the name should make it easy to figure out what’s in it!)
    Computer may use part of the name to determine the file type


Typical file extensions


File structures

[Figure: three file structures: sequence of bytes, sequence of records, and tree]


File types

[Figure: layouts of an executable file and an archive]


Accessing a file

Sequential access
    Read all bytes/records from the beginning
    Cannot jump around
        May rewind or back up, however
    Convenient when medium was magnetic tape
    Often useful when whole file is needed
Random access
    Bytes (or records) read in any order
    Essential for database systems
    Read can be …
        Move file marker (seek), then read or …
        Read and then move file marker
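
The "move file marker (seek), then read" pattern can be sketched with the standard C library calls fseek and fread. The helper name read_record and the idea of fixed-size records are assumptions made for illustration; they do not come from the slides.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read record number `index` (0-based) of `rec_size`
 * bytes from `fp` into `buf`. The file marker is moved with fseek, then the
 * record is read. Returns 1 on success and 0 if the seek or read fails. */
int read_record(FILE *fp, long index, size_t rec_size, void *buf)
{
    assert(fp != NULL && buf != NULL && index >= 0 && rec_size > 0);
    if (fseek(fp, index * (long)rec_size, SEEK_SET) != 0)
        return 0;                            /* could not move the marker */
    return fread(buf, rec_size, 1, fp) == 1; /* 1 only if a whole record was read */
}
```

Sequential access is the special case in which the caller never seeks and simply calls fread repeatedly.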


File attributes


File operations

Create: make a new file
Delete: remove an existing file
Open: prepare a file to be accessed
Close: indicate that a file is no longer being accessed
Read: get data from a file
Write: put data to a file
Append: like write, but only at the end of the file
Seek: move the “current” pointer elsewhere in the file
Get attributes: retrieve attribute information
Set attributes: modify attribute information
Rename: change a file’s name


Using file system calls


Using file system calls, continued


Memory-mapped files

Segmented process before mapping files into its address space
Process after mapping
    Existing file abc into one segment
    Creating new segment for xyz
[Figure: before mapping, the address space holds program text and data; after mapping, it also holds segments for abc and xyz]


More on memory-mapped files

Memory-mapped files are a convenient abstraction
    Example: string search in a large file can be done just as with memory!
    Let the OS do the buffering (reads & writes) in the virtual memory system
Some issues come up…
    How long is the file?
        Easy if read-only
        Difficult if writes allowed: what if a write is past the end of file?
    What happens if the file is shared: when do changes appear to other processes?
    When are writes flushed out to disk?
Clearly, easier to memory map read-only files…


Directories

Naming is nice, but limited
Humans like to group things together for convenience
File systems allow this to be done with directories (sometimes called folders)
Grouping makes it easier to
    Find files in the first place: remember the enclosing directories for the file
    Locate related files (or just determine which files are related)


Single-level directory systems

One directory in the file system
Example directory
    Contains 4 files (foo, bar, baz, blah)
    Owned by 3 different people: A, B, and C (owners shown in red)
Problem: what if user B wants to create a file called foo?
[Figure: root directory holding foo (owner A), bar (owner A), baz (owner B), and blah (owner C)]


Two-level directory system

Solves naming problem: each user has her own directory
Multiple users can use the same file name
By default, users access files in their own directories
Extension: allow users to access files in others’ directories
[Figure: root directory with user directories A, B, and C; A holds foo and bar, B holds foo and baz, C holds bar, foo, and blah]


Hierarchical directory system

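[Figure: directory tree rooted at the root directory with user directories A, B, and C; entries owned by A include foo, Mom, Papers, Photos, Family, sunset, os.tex, and kids; entries owned by B include foo, foo.tex, Papers, and foo.ps; entries owned by C include bar, foo, and blah]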


Unix directory tree


Operations on directories

Create: make a new directory
Delete: remove a directory (usually must be empty)
Opendir: open a directory to allow searching it
Closedir: close a directory (done searching)
Readdir: read a directory entry
Rename: change the name of a directory
    Similar to renaming a file
Link: create a new entry in a directory to link to an existing file
Unlink: remove an entry in a directory
    Remove the file if this is the last link to this file


File system implementation issues

How are disks divided up into file systems?
How does the file system allocate blocks to files?
How does the file system manage free space?
How are directories handled?
How can the file system improve…
    Performance?
    Reliability?


Carving up the disk

[Figure: the entire disk consists of the master boot record and partition table followed by partitions 1 through 4; one partition is expanded into boot block, superblock, free space management, index nodes, and files & directories]


Contiguous allocation for file blocks

Contiguous allocation requires all blocks of a file to be consecutive on disk
Problem: deleting files leaves “holes”
    Similar to memory allocation issues
    Compacting the disk can be a very slow procedure…
[Figure: disk holding files A B C D E F; after B and D are deleted it reads A, free, C, free, E, F]


Contiguous allocation

Data in each file is stored in consecutive blocks on disk
Simple & efficient indexing
    Starting location (block #) on disk (start)
    Length of the file in blocks (length)
Random access well-supported
Difficult to grow files
    Must pre-allocate all needed space
    Wasteful of storage if file isn’t using all of the space
Logical to physical mapping is easy
    blocknum = (pos / 1024) + start;
    offset_in_block = pos % 1024;
[Figure: disk blocks 0 through 11 with one file marked Start=5, Length=2902]
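
The two-line mapping above can be wrapped in a small C function. This is a sketch under assumptions: the names struct disk_addr and contiguous_map are hypothetical, blocks are 1 KB as on the slide, and the Length=2902 in the figure is taken to be a size in bytes.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 1024u   /* 1 KB blocks, as on the slide */

struct disk_addr {
    uint32_t blocknum;         /* physical block number on disk */
    uint32_t offset_in_block;  /* byte offset inside that block */
};

/* Map byte position `pos` of a contiguously allocated file that starts at
 * physical block `start` and is `file_size` bytes long. */
struct disk_addr contiguous_map(uint32_t start, uint32_t file_size, uint32_t pos)
{
    struct disk_addr a;
    assert(pos < file_size);             /* position must lie inside the file */
    a.blocknum = pos / BLOCK_SIZE + start;
    a.offset_in_block = pos % BLOCK_SIZE;
    return a;
}
```

With start = 5, position 2500 falls in the third block of the file, so the result is block 7 at offset 452. No disk access is needed to compute the address, which is why random access is cheap here.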


Linked allocation

File is a linked list of disk blocks
    Blocks may be scattered around the disk drive
    Block contains both pointer to next block and data
    Files may be as long as needed
New blocks are allocated as needed
    Linked into list of blocks in file
    Removed from list (bitmap) of free blocks
[Figure: disk blocks 0 through 11 holding two linked files, one with Start=9, End=4, Length=2902 and one with Start=3, End=6, Length=1500]


Finding blocks with linked allocation

Directory structure is simple
    Starting address looked up from directory
    Directory only keeps track of first block (not others)
No wasted space - all blocks can be used
Random access is difficult: must always start at first block!
Logical to physical mapping is done by
    block = start;
    offset_in_block = pos % 1020;
    for (j = 0; j < pos / 1020; j++) {
        block = block->next;
    }
    Assumes that next pointer is stored at end of block
    May require a long time for seek to random location in file
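
A runnable version of the loop above might look like the following sketch. It models the disk as an in-memory array, which is an assumption made so the example can run; the names struct block, END_OF_FILE, and linked_find_block are hypothetical. As on the slide, each 1 KB block holds 1020 data bytes followed by a 4-byte next pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DATA_PER_BLOCK 1020u      /* 1 KB block minus a 4-byte next pointer */
#define END_OF_FILE 0xFFFFFFFFu   /* hypothetical end-of-chain marker */

struct block {
    uint8_t  data[DATA_PER_BLOCK];
    uint32_t next;                /* number of the next block, stored at the end */
};

/* Walk the chain that begins at block `start` until the block holding byte
 * `pos` is reached. Returns that block number, or END_OF_FILE if the chain
 * ends first. The offset inside the block is stored in *offset_in_block. */
uint32_t linked_find_block(const struct block *disk, uint32_t start,
                           uint32_t pos, uint32_t *offset_in_block)
{
    assert(disk != NULL && offset_in_block != NULL);
    uint32_t block = start;
    for (uint32_t j = 0; j < pos / DATA_PER_BLOCK && block != END_OF_FILE; j++)
        block = disk[block].next;   /* on a real disk, each hop is a disk read */
    *offset_in_block = pos % DATA_PER_BLOCK;
    return block;
}
```

The number of hops grows with the position in the file, which is the cost the slide warns about for random access.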


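Linked allocation using a RAM-based table

Links on disk are slow
Keep linked list in memory
Advantage: faster
Disadvantages
    Have to copy it to disk at some point
    Have to keep in-memory and on-disk copy consistent
[Figure: in-memory table with one entry per disk block; each entry holds the number of the next block of the file, or -1 where there is no next block; arrows mark the first blocks of files A and B]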
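
Following a chain through a RAM-based table can be sketched as below. The function name fat_nth_block and the use of -1 as the end-of-chain value are assumptions for illustration (the figure appears to use -1 for entries with no successor), and the table contents in any usage are made up rather than taken from the figure.

```c
#include <assert.h>
#include <stddef.h>

#define FAT_EOF (-1)   /* assumed marker: no next block */

/* Follow the in-memory table from block `start` for `hops` links.
 * table[b] holds the number of the block that follows b in its file.
 * Returns the block reached, or FAT_EOF if the chain ends earlier. */
int fat_nth_block(const int *table, size_t nblocks, int start, unsigned hops)
{
    assert(table != NULL);
    int block = start;
    for (unsigned i = 0; i < hops && block != FAT_EOF; i++) {
        assert(block >= 0 && (size_t)block < nblocks);
        block = table[block];   /* a memory lookup, not a disk access */
    }
    return block;
}
```

The walk is still linear in the number of blocks skipped, but every step is a memory reference, so seeking to a random position no longer costs one disk read per block.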


Using a block index for allocation

Store file block addresses in an array
    Array itself is stored in a disk block
    Directory has a pointer to this disk block
    Non-existent blocks indicated by -1
Random access easy
Limit on file size?
[Figure: disk blocks 0 through 11; directory entry with name grades, index 4, size 4802; the index block lists blocks 6, 9, 7, 0, 8]


Finding blocks with indexed allocation

Need location of index table: look up in directory
Random & sequential access both well-supported: look up block number in index table
Space utilization is good
    No wasted disk blocks (allocate individually)
    Files can grow and shrink easily
    Overhead of a single disk block per file
Logical to physical mapping is done by
    block = index[pos / 1024];
    offset_in_block = pos % 1024;
Limited file size: 256 pointers per index block, 1 KB per file block -> 256 KB per file limit
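
A minimal sketch of the indexed lookup, assuming 1 KB blocks and 256 four-byte pointers per index block as on the slide. The names indexed_map and NO_BLOCK are hypothetical, and the index block is modeled as a plain array in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE     1024u
#define PTRS_PER_INDEX 256u    /* 1 KB index block / 4-byte pointers */
#define NO_BLOCK       (-1)    /* non-existent blocks are marked -1 */

/* Return the physical block holding byte `pos`, or NO_BLOCK if `pos` lies
 * beyond the 256 KB limit or the slot is unallocated. On success the offset
 * inside the block is stored in *offset_in_block. */
int32_t indexed_map(const int32_t *index, uint32_t pos, uint32_t *offset_in_block)
{
    assert(index != NULL && offset_in_block != NULL);
    uint32_t slot = pos / BLOCK_SIZE;   /* logical block number within the file */
    if (slot >= PTRS_PER_INDEX)
        return NO_BLOCK;                /* past the single index block */
    *offset_in_block = pos % BLOCK_SIZE;
    return index[slot];
}
```

The lookup is one array access regardless of position, which is why both random and sequential access are well supported.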


Larger files with indexed allocation

How can indexed allocation allow files larger than a single index block?
Linked index blocks: similar to linked file blocks, but using index blocks instead
    Each index block holds 255 block pointers plus one pointer to the next index block
Logical to physical mapping is done by
    index = start;
    blocknum = pos / 1024;
    for (j = 0; j < blocknum / 255; j++) {
        index = index->next;
    }
    block = index[blocknum % 255];
    offset_in_block = pos % 1024;
File size is now unlimited
Random access slow, but only for very large files
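
The linked-index lookup can be sketched in C as follows. The struct layout is an assumption for illustration: 255 block pointers per index block, with the remaining slot used as the link to the next index block, and index blocks held in memory rather than read from disk. The names struct index_block, linked_index_map, and NO_BLOCK are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE     1024u
#define PTRS_PER_INDEX 255u    /* one slot of the index block is the link */
#define NO_BLOCK       (-1)

struct index_block {
    int32_t ptr[PTRS_PER_INDEX];    /* physical block numbers */
    struct index_block *next;       /* next index block, or NULL */
};

/* Return the physical block holding byte `pos`, or NO_BLOCK if the chain of
 * index blocks ends before reaching it. */
int32_t linked_index_map(const struct index_block *start, uint32_t pos,
                         uint32_t *offset_in_block)
{
    assert(start != NULL && offset_in_block != NULL);
    uint32_t blocknum = pos / BLOCK_SIZE;
    const struct index_block *index = start;
    for (uint32_t j = 0; j < blocknum / PTRS_PER_INDEX && index != NULL; j++)
        index = index->next;        /* one hop per 255 file blocks */
    if (index == NULL)
        return NO_BLOCK;
    *offset_in_block = pos % BLOCK_SIZE;
    return index->ptr[blocknum % PTRS_PER_INDEX];
}
```

Compared with linked file blocks, the chain is 255 times shorter, so the walk only becomes noticeable for very large files.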


Two-level indexed allocation

Allow larger files by creating an index of index blocks
    File size still limited, but much larger
    Limit for 1 KB blocks = 1 KB * 256 * 256 = 2^26 bytes = 64 MB
Logical to physical mapping is done by
    blocknum = pos / 1024;
    index = start[blocknum / 256];
    block = index[blocknum % 256];
    offset_in_block = pos % 1024;
    Start is the only pointer kept in the directory
    Overhead is now at least two blocks per file
This can be extended to more than two levels if larger files are needed...
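
A sketch of the two-level lookup, assuming 1 KB blocks and 256 pointers per index block as on the slide. For the sake of a runnable example the top-level index is modeled as an in-memory array of pointers to second-level arrays; the names two_level_map, PTRS, and NO_BLOCK are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 1024u
#define PTRS       256u     /* pointers per index block */
#define NO_BLOCK   (-1)

/* `top` stands in for the index of index blocks: top[i] points to the i-th
 * second-level index, or is NULL if that index block was never allocated.
 * Returns the physical block holding byte `pos`, or NO_BLOCK. */
int32_t two_level_map(const int32_t *const *top, uint32_t pos,
                      uint32_t *offset_in_block)
{
    assert(top != NULL && offset_in_block != NULL);
    uint32_t blocknum = pos / BLOCK_SIZE;
    if (blocknum >= PTRS * PTRS)
        return NO_BLOCK;                      /* beyond the 64 MB limit */
    const int32_t *index = top[blocknum / PTRS];
    if (index == NULL)
        return NO_BLOCK;                      /* second-level block missing */
    *offset_in_block = pos % BLOCK_SIZE;
    return index[blocknum % PTRS];
}
```

Every lookup costs exactly two index accesses, however large the file, in contrast to the linked-index scheme where the cost grows with the position.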


Block allocation with extents

Reduce space consumed by index pointers
    Often, consecutive blocks in file are sequential on disk
    Store <block,count> instead of just <block> in index
    At each level, keep total count for the index for efficiency
Lookup procedure is:
    Find correct index block by checking the starting file offset for each index block
    Find correct <block,count> entry by running through index block, keeping track of how far into file the entry is
    Find correct block in <block,count> pair
More efficient if file blocks tend to be consecutive on disk
    Allocating blocks like this allows faster reads & writes
    Lookup is somewhat more complex
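
The second and third lookup steps (running through the <block,count> entries of one index block) can be sketched as follows. The names struct extent and extent_map are hypothetical, and the sketch covers a single index block only; choosing the right index block by its starting file offset is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct extent {
    uint32_t block;   /* first physical block of the run */
    uint32_t count;   /* number of consecutive blocks in the run */
};

/* Return the physical block for logical block `file_block`, or -1 if the
 * extents do not cover it. `first` tracks how far into the file each
 * extent begins. */
int64_t extent_map(const struct extent *ext, size_t n, uint32_t file_block)
{
    assert(ext != NULL || n == 0);
    uint32_t first = 0;
    for (size_t i = 0; i < n; i++) {
        if (file_block < first + ext[i].count)
            return (int64_t)ext[i].block + (file_block - first);
        first += ext[i].count;
    }
    return -1;
}
```

For extents <100,4>, <50,2>, <200,10>, logical block 5 is the second block of the second extent, so it maps to physical block 51. Sixteen file blocks are described by three entries instead of sixteen pointers.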


Managing free space: bit vector

Keep a bit vector, with one entry per file block
    Number bits from 0 through n-1, where n is the number of file blocks on the disk
    If bit[j] == 0, block j is free
    If bit[j] == 1, block j is in use by a file (for data or index)
If words are 32 bits long, calculate appropriate bit by:
    wordnum = block / 32;
    bitnum = block % 32;
Search for free blocks by looking for words with bits unset (words != 0xffffffff)
Easy to find consecutive blocks for a single file
Bit map must be stored on disk, and consumes space
    Assume 4 KB blocks, 8 GB disk => 2M blocks
    2M bits = 2^21 bits = 2^18 bytes = 256 KB overhead
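
A minimal sketch of the bit vector with 32-bit words. The function names mark_used, mark_free, and find_free are hypothetical; the search skips any word equal to 0xffffffff, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Set bit[block] = 1: the block is now in use. */
void mark_used(uint32_t *map, uint32_t block)
{
    assert(map != NULL);
    map[block / 32] |= (uint32_t)1 << (block % 32);
}

/* Set bit[block] = 0: the block is free again. */
void mark_free(uint32_t *map, uint32_t block)
{
    assert(map != NULL);
    map[block / 32] &= ~((uint32_t)1 << (block % 32));
}

/* Return the lowest-numbered free block among blocks 0..nblocks-1,
 * or -1 if every block is in use. */
int64_t find_free(const uint32_t *map, uint32_t nblocks)
{
    assert(map != NULL);
    for (uint32_t w = 0; w < (nblocks + 31) / 32; w++) {
        if (map[w] == 0xffffffffu)
            continue;                     /* all 32 blocks of this word in use */
        for (uint32_t b = 0; b < 32; b++) {
            uint32_t block = w * 32 + b;
            if (block >= nblocks)
                break;                    /* padding bits past the last block */
            if (((map[w] >> b) & 1u) == 0)
                return block;
        }
    }
    return -1;
}
```

Checking a whole word at a time means a mostly full disk is scanned 32 blocks per comparison rather than one.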


Managing free space: linked list

Use a linked list to manage free blocks
    Similar to linked list for file allocation
    No wasted space for bitmap
    No need for random access unless we want to find consecutive blocks for a single file
Difficult to know how many blocks are free unless it’s tracked elsewhere in the file system
Difficult to group nearby blocks together if they’re freed at different times
    Less efficient allocation of blocks to files
    Files read & written more slowly because consecutive blocks are not nearby


Issues with free space management

OS must protect data structures used for free space management
OS must keep in-memory and on-disk structures consistent
    Update free list when block is removed: change a pointer in the previous block in the free list
    Update bit map when block is allocated
        Caution: on-disk map must never indicate that a block is free when it’s part of a file
        Solution: set bit[j] in free map to 1 on disk before using block[j] in a file and setting bit[j] to 1 in memory
        New problem: OS crash may leave bit[j] == 1 when block isn’t actually used in a file
        New solution: OS checks the file system when it boots up…
Managing free space is a big source of slowdown in file systems


What’s in a directory?

Two types of information
    File names
    File metadata (size, timestamps, etc.)
Basic choices for directory information
    Store all information in directory
        Fixed size entries
        Disk addresses and attributes in directory entry
    Store names & pointers to index nodes (i-nodes)
[Figure: two directories listing games, mail, news, and research; on one side, storing all information in the directory, each entry holds its attributes; on the other, using pointers to index nodes, each entry points to an i-node that holds the attributes]


Directory structure

Structure
    Linear list of files (often itself stored in a file)
        Simple to program
        Slow to run
        Increase speed by keeping it sorted (insertions are slower!)
    Hash table: name hashed and looked up in file
        Decreases search time: no linear searches!
        May be difficult to expand
        Can result in collisions (two files hash to same location)
    Tree
        Fast for searching
        Easy to expand
        Difficult to do in on-disk directory
Name length
    Fixed: easy to program
    Variable: more flexible, better for users


Handling long file names in a directory


Sharing files

[Figure: directory tree rooted at the root directory with user directories A, B, and C and subdirectories such as Papers, Photos, and Family; one entry is labeled ????]


Solution: use links

A creates a file, and inserts into her directory
B shares the file by creating a link to it
A unlinks the file
    B still links to the file
    Owner is still A (unless B explicitly changes it)
[Figure: three stages of the shared file: a.tex alone (Owner: A, Count: 1); a.tex and b.tex both linked (Owner: A, Count: 2); b.tex alone (Owner: A, Count: 1)]


Managing disk space

Dark line (left hand scale) gives data rate of a disk
Dotted line (right hand scale) gives disk space efficiency
All files 2 KB
[Figure: data rate and disk space efficiency plotted against block size]


Disk quotas


Backing up a file system

A file system to be dumped
    Squares are directories, circles are files
    Shaded items, modified since last dump
    Each directory & file labeled by i-node number
[Figure: file system tree to be dumped; one item is labeled as a file that has not changed]


Bitmaps used in a file system dump


Checking the file system for consistency

[Figure: four file system states: consistent; missing (“lost”) block; duplicate block in free list; duplicate block in two files]
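
One common way to detect the four states named above is to keep two counters per block: how many times it appears in files and how many times it appears in the free list. The sketch below assumes that approach; the enum values and function names are hypothetical, and the fifth state (a block both in a file and in the free list) is an addition not shown in the slide.

```c
#include <assert.h>
#include <stddef.h>

enum block_state {
    CONSISTENT,        /* in exactly one file, or exactly once in the free list */
    MISSING,           /* in no file and not in the free list ("lost") */
    DUP_IN_FREE_LIST,  /* appears more than once in the free list */
    DUP_IN_FILES,      /* appears in more than one file (or twice in one) */
    IN_FILE_AND_FREE   /* extra case: both in a file and in the free list */
};

/* Classify one block from its two counters. */
enum block_state classify_block(unsigned in_use, unsigned in_free)
{
    if (in_use > 1)
        return DUP_IN_FILES;
    if (in_free > 1)
        return DUP_IN_FREE_LIST;
    if (in_use == 0 && in_free == 0)
        return MISSING;
    if (in_use == 1 && in_free == 1)
        return IN_FILE_AND_FREE;
    return CONSISTENT;
}

/* Count the blocks whose counters are not consistent. */
size_t count_inconsistent(const unsigned *in_use, const unsigned *in_free, size_t n)
{
    assert(in_use != NULL && in_free != NULL);
    size_t bad = 0;
    for (size_t i = 0; i < n; i++)
        if (classify_block(in_use[i], in_free[i]) != CONSISTENT)
            bad++;
    return bad;
}
```

A checker would fill the two arrays by walking every file's block list and the free list, then repair each inconsistent block according to its state.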


File system cache

Many files are used repeatedly
    Option: read it each time from disk
    Better: keep a copy in memory
File system cache
    Set of recently used file blocks
    Keep blocks just referenced
    Throw out old, unused blocks
        Same kinds of algorithms as for virtual memory
        More effort per reference is OK: file references are a lot less frequent than memory references
Goal: eliminate as many disk accesses as possible!
    Repeated reads & writes
    Files deleted before they’re ever written to disk


File block cache data structures


Grouping data on disk


Log-structured file systems

Trends in disk & memory
    Faster CPUs
    Larger memories
Result
    More memory -> disk caches can also be larger
    Increasing number of read requests can come from cache
    Thus, most disk accesses will be writes
LFS structures entire disk as a log
    All writes initially buffered in memory
    Periodically write these to the end of the disk log
    When file opened, locate i-node, then find blocks
Issue: what happens when blocks are deleted?


Unix Fast File System indexing scheme

[Figure: an inode holding protection mode, link count, owner & group, timestamps, size, block count, direct pointers, and single, double, and triple indirect pointers; direct pointers lead to data blocks, and indirect pointers lead through index blocks to data blocks]


More on Unix FFS

First few block pointers kept in the inode
    Small files have no extra overhead for index blocks
    Reading & writing small files is very fast!
Indirect structures only allocated if needed
For 4 KB file blocks (common in Unix), max file sizes are:
    48 KB reachable directly from the inode (usually 12 direct blocks)
    1024 * 4 KB = 4 MB of additional file data for single indirect
    1024 * 1024 * 4 KB = 4 GB of additional file data for double indirect
    1024 * 1024 * 1024 * 4 KB = 4 TB for triple indirect
Maximum of 5 accesses for any file block on disk
    1 access to read inode & 1 to read file block
    Maximum of 3 accesses to index blocks
    Usually much fewer (1-2) because inode in memory
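
The size ranges and access counts above can be checked with a short sketch. It assumes 12 direct pointers and 1024 pointers per 4 KB index block (4-byte pointers), matching the slide's arithmetic; the function names ffs_indirection_level and ffs_disk_accesses are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NDIRECT        UINT64_C(12)     /* direct pointers in the inode */
#define PTRS_PER_BLOCK UINT64_C(1024)   /* 4 KB index block / 4-byte pointers */

/* Return how many index blocks must be read to reach logical block
 * `file_block`: 0 (direct), 1 (single), 2 (double), 3 (triple indirect),
 * or -1 if the block lies beyond the maximum file size. */
int ffs_indirection_level(uint64_t file_block)
{
    if (file_block < NDIRECT)
        return 0;
    file_block -= NDIRECT;
    if (file_block < PTRS_PER_BLOCK)
        return 1;
    file_block -= PTRS_PER_BLOCK;
    if (file_block < PTRS_PER_BLOCK * PTRS_PER_BLOCK)
        return 2;
    file_block -= PTRS_PER_BLOCK * PTRS_PER_BLOCK;
    if (file_block < PTRS_PER_BLOCK * PTRS_PER_BLOCK * PTRS_PER_BLOCK)
        return 3;
    return -1;
}

/* Disk accesses with nothing cached: 1 for the inode, one per index block,
 * and 1 for the data block itself. */
int ffs_disk_accesses(int level)
{
    assert(level >= 0 && level <= 3);
    return 1 + level + 1;
}
```

Logical block 12 is the first one that needs the single indirect block, and a block behind the triple indirect pointer costs 1 + 3 + 1 = 5 accesses, the maximum stated above.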


Directories in FFS

Directories in FFS are just special files
    Same basic mechanisms
    Different internal structure
Directory entries contain
    File name
    I-node number
Other Unix file systems have more complex schemes
    Not always simple files…
[Figure: a directory as a sequence of entries, each with inode number, record length, name length, and name]


CD-ROM file system


Directory entry in MS-DOS


MS-DOS File Allocation Table

Block size    FAT-12    FAT-16     FAT-32
0.5 KB        2 MB
1 KB          4 MB
2 KB          8 MB      128 MB
4 KB          16 MB     256 MB     1 TB
8 KB                    512 MB     2 TB
16 KB                   1024 MB    2 TB
32 KB                   2048 MB    2 TB


Windows 98 directory entry & file name

[Figure: byte layout of a Windows 98 directory entry, with field widths in bytes and a checksum field]


Storing a long name in Windows 98

Long name stored in Windows 98 so that it’s backwards compatible with short names
    Short name in “real” directory entry
    Long name in “fake” directory entries: ignored by older systems
OS designers will go to great lengths to make new systems work with older systems…

