ecs150 Fall 2006: Operating System
Operating System #5: File Systems (chapters: 6.4~6.7, 8)

Dr. S. Felix Wu
Computer Science Department
University of California, Davis
http://www.cs.ucdavis.edu/~wu/
[email protected]

UCDavis, ecs150, Fall 2006, 11/20/2006

File System Abstraction

– Files
– Directories


Layers, from top to bottom:

System-call interface
Active file entries
VNODE layer or VFS
Local naming (UFS)
FFS
Buffer cache
Block or character device driver
Hardware


DIR *dirp = opendir(filename);              /* const char *filename */
struct dirent *direntp = readdir(dirp);

struct dirent {
    ino_t d_ino;                /* i-node number */
    char  d_name[NAME_MAX+1];   /* file name */
};

[Figure: a directory as a list of dirent entries, each holding an i-node number and a file_name; each entry leads to a file.]


Local versus Remote

System call interface
V-node
Local versus remote
– NFS or i-node
– Stackable File System
Hard-disk blocks


File-System Structure

File structure
– Logical storage unit
– Collection of related information
The file system resides on secondary storage (disks).
The file system is organized into layers.
File control block: a storage structure consisting of information about a file.


File → Disk

Separate the disk into blocks.
Separate the file into blocks as well.
Page from the file to the disk.
File blocks stored in disk blocks: 4 - 7 - 2 - 10 - 12

How to represent the file? How to link these 5 pages together?


BitTorrent pieces

One big file (X gigabytes), of which a number of pieces (5%) are already downloaded (and being shared with others).

How much disk space do we need at this moment?


Hard Disk

Track, sector, head
– Track + heads → cylinder (the same track on every surface)
Performance
– seek time
– rotation time
– transfer time
LBA
– Logical Block Addressing


File → Disk blocks

File block   Disk block
0            4
1            7
2            2
3            10
4            12

What are the disadvantages?
1. Disk access can be slow for “random access”.
2. How big is each block? 64 bytes? 68 bytes?


Kernel Hacking Session

This Friday, from 7:30 p.m. until midnight, in 3083 Kemper
– Bring your laptop
– And bring your mug…


A File System

[Figure: a disk divided into partitions; each partition holds a superblock (sb), the i-list (i-node, i-node, ……, i-node), and the directory and data blocks.]


One Logical File → Physical Disk Blocks

efficient representation & access


An i-node

Typical: each block is 8K or 16K bytes
??? entries in one disk block

[Figure: an i-node and the blocks of a file]


inode (index node) structure

Meta-data of the file (field and size in bytes, 64 bytes in total):
– di_mode 2
– di_nlinks 2
– di_uid 2
– di_gid 2
– di_size 4
– di_addr 39
– di_gen 1
– di_atime 4
– di_mtime 4
– di_ctime 4


struct ufs2_dinode {
    u_int16_t     di_mode;          /*   0: IFMT, permissions; see below. */
    int16_t       di_nlink;         /*   2: File link count. */
    u_int32_t     di_uid;           /*   4: File owner. */
    u_int32_t     di_gid;           /*   8: File group. */
    u_int32_t     di_blksize;       /*  12: Inode blocksize. */
    u_int64_t     di_size;          /*  16: File byte count. */
    u_int64_t     di_blocks;        /*  24: Bytes actually held. */
    ufs_time_t    di_atime;         /*  32: Last access time. */
    ufs_time_t    di_mtime;         /*  40: Last modified time. */
    ufs_time_t    di_ctime;         /*  48: Last inode change time. */
    ufs_time_t    di_birthtime;     /*  56: Inode creation time. */
    int32_t       di_mtimensec;     /*  64: Last modified time. */
    int32_t       di_atimensec;     /*  68: Last access time. */
    int32_t       di_ctimensec;     /*  72: Last inode change time. */
    int32_t       di_birthnsec;     /*  76: Inode creation time. */
    int32_t       di_gen;           /*  80: Generation number. */
    u_int32_t     di_kernflags;     /*  84: Kernel flags. */
    u_int32_t     di_flags;         /*  88: Status flags (chflags). */
    int32_t       di_extsize;       /*  92: External attributes block. */
    ufs2_daddr_t  di_extb[NXADDR];  /*  96: External attributes block. */
    ufs2_daddr_t  di_db[NDADDR];    /* 112: Direct disk blocks. */
    ufs2_daddr_t  di_ib[NIADDR];    /* 208: Indirect disk blocks. */
    int64_t       di_spare[3];      /* 232: Reserved; currently unused */
};


struct ufs1_dinode {
    u_int16_t     di_mode;          /*   0: IFMT, permissions; see below. */
    int16_t       di_nlink;         /*   2: File link count. */
    union {
        u_int16_t oldids[2];        /*   4: Ffs: old user and group ids. */
    } di_u;
    u_int64_t     di_size;          /*   8: File byte count. */
    int32_t       di_atime;         /*  16: Last access time. */
    int32_t       di_atimensec;     /*  20: Last access time. */
    int32_t       di_mtime;         /*  24: Last modified time. */
    int32_t       di_mtimensec;     /*  28: Last modified time. */
    int32_t       di_ctime;         /*  32: Last inode change time. */
    int32_t       di_ctimensec;     /*  36: Last inode change time. */
    ufs1_daddr_t  di_db[NDADDR];    /*  40: Direct disk blocks. */
    ufs1_daddr_t  di_ib[NIADDR];    /*  88: Indirect disk blocks. */
    u_int32_t     di_flags;         /* 100: Status flags (chflags). */
    int32_t       di_blocks;        /* 104: Blocks actually held. */
    int32_t       di_gen;           /* 108: Generation number. */
    u_int32_t     di_uid;           /* 112: File owner. */
    u_int32_t     di_gid;           /* 116: File group. */
    int32_t       di_spare[2];      /* 120: Reserved; currently unused */
};


BitTorrent pieces

File size: 10 GB
Pieces downloaded: 512 MB
How much disk space do we need?


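#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>     /* sleep() */

int
main(void)
{
    FILE *f1 = fopen("./sss.txt", "w");
    int i;

    if (f1 == NULL) {
        perror("fopen");
        return 1;
    }
    /* Write 1000 short records at random offsets; the gaps are never written. */
    for (i = 0; i < 1000; i++) {
        fseek(f1, rand(), SEEK_SET);
        fprintf(f1, "%d%d%d%d", rand(), rand(), rand(), rand());
        if (i % 100 == 0)
            sleep(1);
    }
    fclose(f1);         /* flushes the stdio buffer and closes the file */
    return 0;
}

# ./t
# ls -l ./sss.txt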


An i-node

Typical: each block is 1K
??? entries in one disk block

[Figure: an i-node and the blocks of a file]


i-node

How many disk blocks can a FS have?
How many levels of i-node indirection will be necessary to store a file of 2G bytes? (I.e., 0, 1, 2 or 3)
What is the largest possible file size in an i-node?
What is the size of the i-node itself for a file of 10GB with only 512 MB downloaded?


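Answer

How many disk blocks can a FS have?
– $2^{64}$ or $2^{32}$: a pointer (to a block) is 8 or 4 bytes.
How many levels of i-node indirection will be necessary to store a file of 2G ($2^{31}$) bytes? (I.e., 0, 1, 2 or 3)
– $12 \cdot 2^{10} + 2^{8} \cdot 2^{10} + 2^{8} \cdot 2^{8} \cdot 2^{10} + 2^{8} \cdot 2^{8} \cdot 2^{8} \cdot 2^{10} \overset{?}{>} 2^{31}$
– Without the last (triple-indirect) term the sum is only about 64 MB, so 3 levels are needed.
What is the largest possible file size in an i-node?
– $12 \cdot 2^{10} + 2^{8} \cdot 2^{10} + 2^{8} \cdot 2^{8} \cdot 2^{10} + 2^{8} \cdot 2^{8} \cdot 2^{8} \cdot 2^{10}$ (1K blocks, each indirect block holding $2^{8}$ 4-byte pointers)
– $2^{64} - 1$ (the largest value di_size can hold)
– $2^{32} \cdot 2^{10}$ (every block addressable with 32-bit pointers, times the 1K block size)
You need to consider these three limits and take the minimum!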


Answer

How many pointers?
– 512 MB divided by the block size (1K) = 512K pointers
– 512K pointers times 8 (4) bytes = 4 (2) MB


FFS and UFS

/usr/src/sys/ufs/ufs/*
– Higher level: directory structure
/usr/src/sys/ufs/ffs/*
– Lower level: buffer, i-node
– Soft updates & Snapshot


Number of i-nodes

UFS1: pre-allocation
– i-nodes take about 3% of the disk, but usually < 25% of them are used.
UFS2: dynamic allocation
– Still a limited number of i-nodes


di_size vs. di_blocks

???


di_size vs. di_blocks

Logical: di_size, as reported by fstat
Physical: di_blocks, as reported by du


Extended Attributes in UFS2

Attributes associated with the file
– di_extb[2]: two blocks, but indirection if needed.
Format (field: size in bytes)
– Length: 4
– Name space: 1
– Content pad length: 1
– Name length: 1
– Name: padded to a multiple of 8
– Content: variable
Applications: ACLs, data labeling


Some thoughts…

What can you do with “extended attributes”?
How would you design and implement them?
– Should (can) we do it with “Stackable File Systems”?
– Otherwise, the program that manipulates the EAs will have to be very UFS2-dependent, or use FiST with a UFS2 optimization option.
Are there any counterexamples?
– Security and performance considerations.


struct dirent {
    ino_t d_ino;
    char  d_name[NAME_MAX+1];
};

struct stat {
    …
    nlink_t st_nlink;   /* number of directory entries (hard links) naming this i-node */
    …
};

[Figure: a directory as a list of dirent entries, each holding an i-node number and a file_name; each entry leads to a file.]


ln -s /usr/src/sys/sys/proc.h ppp.h    (symbolic link)
ln /usr/src/sys/sys/proc.h ppp.h       (hard link)


File System Buffer Cache

Application: read/write files
OS: translate files to disk blocks; maintain the buffer cache
Buffer cache: controls disk accesses (read/write blocks)
Hardware

Any problems?


File System Consistency

To maintain file system consistency, the order in which updates are written from the buffer cache to disk is critical.
Example:
– If the directory block is written back before the i-node and the system crashes, the directory structure will be inconsistent.


File System Consistency

File systems almost always use a buffer/disk cache for performance reasons.
This problem is critical especially for the blocks that contain control information: i-nodes, the free list, directory blocks.
Two copies of a disk block (buffer cache, disk) create a consistency problem if the system crashes before all the modified blocks are written back to disk.
Write back critical blocks from the buffer cache to disk immediately.
Data blocks are also written back periodically: sync.


Two Strategies

Prevention
– Use unbuffered I/O when writing i-nodes or pointer blocks
– Use buffered I/O for other writes and force a sync every 30 seconds
Detect and fix
– Detect the inconsistencies
– Fix them according to the “rules”
– fsck (File System Checker)


File System Integrity

Block consistency:
– Block-in-use table
– Free-list table
File consistency:
– How many directories point to that i-node (D)?
– What is its nlink (L)?
– Three cases: D == L, L > D, D > L. What should be done in the latter two cases?

Blocks 0 to 11:
Blocks in use: 0 1 1 1 0 0 0 1 0 0 0 2
Free blocks:   1 0 0 0 1 1 1 0 1 0 2 0


File System Integrity

File system states:
(a) consistent
(b) missing block
(c) duplicate block in free list
(d) duplicate data block


Metadata Operations

Metadata operations modify the structure of the file system:
– Creating, deleting, or renaming files, directories, or special files
– Directory & i-node
Data must be written to disk in such a way that the file system can be recovered to a consistent state after a system crash.


Metadata Integrity

FFS uses synchronous writes to guarantee the integrity of metadata:
– Any operation modifying multiple pieces of metadata writes its data to disk in a specific order
– These writes are blocking
This guarantees the integrity and durability of metadata updates.


Deleting a file (I)

[Figure: a directory with entries abc, def and ghi, pointing to i-node-1, i-node-2 and i-node-3.]

Assume we want to delete file “def”.


Deleting a file (II)

[Figure: entries abc, def and ghi; i-node-2 has been deleted, so “def” points to a missing i-node (?).]

Cannot delete the i-node before the directory entry “def”.


Deleting a file (III)

The correct sequence is:
1. Write to disk the directory block containing the deleted directory entry “def”.
2. Write to disk the i-node block containing the deleted i-node.
This leaves the file system in a consistent state.


Creating a file (I)

[Figure: a directory with entries abc and ghi, pointing to i-node-1 and i-node-3.]

Assume we want to create a new file “tuv”.


Creating a file (II)

[Figure: entries abc, ghi and tuv; “tuv” points to an i-node that has not been written yet (?).]

Cannot write the directory entry “tuv” before the i-node.


Creating a file (III)

The correct sequence is:
1. Write to disk the i-node block containing the new i-node.
2. Write to disk the directory block containing the new directory entry.
This leaves the file system in a consistent state.


Synchronous Updates

Used by FFS to guarantee the consistency of metadata:
– All metadata updates are done through blocking writes
This increases the cost of metadata updates and can significantly hurt the performance of the whole file system.


SOFT UPDATES

Use delayed writes (write-back).
Maintain dependency information about cached pieces of metadata, e.g., “this i-node must be updated before/after this directory entry”.
Guarantee that metadata blocks are written to disk in the required order.


3 Soft Update Rules

Never point to a structure before it has been initialized.

Never reuse a resource before nullifying all previous pointers to it.

Never reset the old pointer to a live resource before the new pointer has been set.


Problem #1 with S.U.

Synchronous writes guaranteed that metadata operations were durable once the system call returned.
Soft Updates guarantee that the file system will recover into a consistent state, but not necessarily the most recent one.
– Some updates could be lost


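We want to delete file “foo” and create a new file “bar”.

[Figure: Block A (a directory block) holds the entry “foo” (pointing to i-node-2) and the NEW entry “bar”; Block B (an i-node block) holds i-node-2 and the NEW i-node-3.]

What are the dependency relationships?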


We want to delete file “foo” and create a new file “bar”.

[Figure: the same Block A and Block B, with dependency X (2nd) and dependency Y (1st) forming a circular dependency.]


Problem #2 with S.U.

Cyclical dependencies:
– The same directory block contains entries to be created and entries to be deleted
– These entries point to i-nodes in the same block
Brainstorming:
– How can S.U. resolve this issue?


How to update: i-node first or directory block first?


Solution in S.U.

Roll back the metadata in one of the blocks to an earlier, safe state.
(The safe state does not contain the new directory entry.)

[Figure: Block A’, the rolled-back copy of Block A (entry shown: def).]


Write the first block with the metadata that were rolled back (block A’ in the example).
Write the blocks that can be written once the first block has been written (block B in the example).
Roll forward the block that was rolled back, and write that block.
This breaks the cyclical dependency, but block A must now be written twice.


Before any write operation: SU dependency checking (roll back if necessary)

After any write operation: SU dependency processing (task list updating; roll forward if necessary)


The two most popular approaches for improving the performance of metadata operations and recovery:
– Journaling
– Soft Updates

Journaling systems record metadata operations in an auxiliary log.

Soft Updates uses ordered writes.


JOURNALING

Journaling systems maintain an auxiliary log that records all metadata operations.

Write-ahead logging ensures that the log is written to disk before any blocks containing data modified by the corresponding operations.
– After a crash, the log can be replayed to bring the file system to a consistent state.


JOURNALING

Log writes are performed in addition to the regular writes.

Journaling systems incur log-write overhead, but:
– Log writes can be performed efficiently because they are sequential (block operation consideration)
– Metadata blocks do not need to be written back after each update

JOURNALING

Journaling systems can provide:
– the same durability semantics as FFS if the log is forced to disk after each metadata operation
– the laxer semantics of Soft Updates if log writes are buffered until entire buffers are full
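A small counting model in C makes the durability trade-off concrete. It assumes, for illustration only, 64-byte log records and a 4096-byte log buffer. `struct logbuf`, `log_op` and `log_flush` are hypothetical names. With `force_each_op` set, every metadata operation costs one log write. Without it, records accumulate and each full buffer goes out as a single sequential write.

```c
#include <assert.h>

/* Toy model of a journal's in-memory log buffer (hypothetical names). */
struct logbuf {
    int used;           /* bytes currently buffered */
    int cap;            /* buffer size in bytes */
    int flushes;        /* sequential log writes issued to disk */
    int force_each_op;  /* 1: force after every metadata op; 0: only when full */
};

/* Write the buffered records to the log in one sequential I/O. */
static void log_flush(struct logbuf *lb) {
    if (lb->used > 0) {
        lb->flushes++;
        lb->used = 0;
    }
}

/* Record one metadata operation of rec_bytes bytes. */
static void log_op(struct logbuf *lb, int rec_bytes) {
    assert(rec_bytes <= lb->cap);
    if (lb->used + rec_bytes > lb->cap)
        log_flush(lb);          /* no room: write out the full buffer */
    lb->used += rec_bytes;
    if (lb->force_each_op)
        log_flush(lb);          /* FFS-like: durable when the call returns */
}
```

For 1000 operations, the forced policy issues 1000 log writes and the buffered policy issues 16, counting a final flush. The buffered policy pays for this by losing up to one buffer of recent operations in a crash.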

Soft Updates vs. Journaling

Advantages and disadvantages

With Soft Updates??

Do we still need “FSCK” at boot time?

Recover the Missing Resources

In the background, in an active FS…
– We don’t want to wait for the lengthy FSCK process to complete…

A related issue:
– the virus scanning process
– what happens if we get a new virus signature?

Snapshot of the FS

Backup and restore: reliably dump an active file system
– What do we do today to dump “consistent” snapshots of our 40GB FS? (At midnight…)

“Background FSCK checks”

What is a snapshot? (I mean “conceptually”.)

Freeze all activities related to the FS. Copy everything to “some space”. Resume the activities.

How do we efficiently implement this concept such that the activities will only be blocked for about 0.25 seconds, and we don’t have to buy a really big hard drive?

Copy-on-Write

Snapshot: a file

Logical size versus physical size
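A snapshot file's logical size can far exceed the space it occupies, because unwritten regions are holes. The POSIX calls `open`, `lseek`, `write` and `fstat` are enough to see the same effect on an ordinary sparse file. `make_sparse` and `report` are hypothetical helper names, and the 16 MiB size used with them is arbitrary.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create `path` with logical size `logical` by writing only its last
   byte, then return its attributes in *st. Returns 0 on success, -1 on error. */
static int make_sparse(const char *path, off_t logical, struct stat *st) {
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0)
        return -1;
    /* Seeking past EOF leaves a hole; on file systems that support
       sparse files, no data blocks are allocated for it. */
    if (lseek(fd, logical - 1, SEEK_SET) < 0 || write(fd, "x", 1) != 1 ||
        fstat(fd, st) != 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}

/* Print the logical size next to the allocated space. st_blocks is
   counted in 512-byte units on BSD and Linux; POSIX leaves the unit unspecified. */
static void report(const struct stat *st) {
    printf("logical %lld bytes, physical %lld bytes\n",
           (long long)st->st_size, (long long)st->st_blocks * 512);
}
```

`st_size` reports the full logical size. `st_blocks` usually shows only a few kilobytes allocated on file systems that support holes, and the exact amount depends on the file system.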

Example

# mkdir /backups/usr/noon
# mount -u -o snapshot /usr/snap.noon /usr
# mdconfig -a -t vnode -u 0 -f /usr/snap.noon
# mount -r /dev/md0 /backups/usr/noon

/* do whatever you want to test it */

# umount /backups/usr/noon
# mdconfig -d -u 0
# rm -f /usr/snap.noon

The commands work as follows:
– mount -u -o snapshot writes a snapshot of /usr into the file /usr/snap.noon.
– mdconfig attaches that file as memory disk md0.
– md0 is then mounted read-only.
– The last three commands undo these steps.

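#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>     /* sleep() */

/* Write 1000 records at random offsets in ./sss.txt, pausing one second
   every 100 writes. Offsets up to RAND_MAX leave the file sparse. */
int
main(void)
{
    FILE *f1 = fopen("./sss.txt", "w");
    int i;

    if (f1 == NULL) {
        perror("fopen");
        return 1;
    }
    for (i = 0; i < 1000; i++) {
        fseek(f1, rand(), SEEK_SET);
        fprintf(f1, "%d%d%d%d", rand(), rand(), rand(), rand());
        if (i % 100 == 0)
            sleep(1);
    }
    fflush(f1);
    fclose(f1);
    return 0;
}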

Copy-on-Write

A File System

??? entries in one disk block

A file

A Snapshot i-node

??? entries in one disk block

A file

Not used or not yet copied

Copy-on-write

??? entries in one disk block

A file

Not used or not yet copied
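The snapshot i-node and copy-on-write steps can be sketched with a toy model in C. The assumptions are all hypothetical:
– the file system is an array of `NBLK` integer blocks;
– `ptr[i] == -1` plays the role of “not used or not yet copied”;
– `fs_write`, `snap_take` and `snap_read` stand in for the kernel paths.

Taking the snapshot copies no data. A block is copied at most once, just before its first overwrite, and only if it was in use when the snapshot was taken. The real FFS must also preserve blocks that are freed, which the toy omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBLK 8

static int live[NBLK];      /* the active file system's blocks */
static bool in_use[NBLK];   /* allocation map */

struct snapshot {
    int ptr[NBLK];          /* -1: not used or not yet copied; else index into copy[] */
    bool was_used[NBLK];    /* allocation map frozen at snapshot time */
    int copy[NBLK];         /* blocks copied into the snapshot file */
    int ncopied;            /* physical size, in blocks */
};

/* Taking a snapshot only records metadata; no data blocks are copied. */
static void snap_take(struct snapshot *s) {
    for (int i = 0; i < NBLK; i++) {
        s->ptr[i] = -1;
        s->was_used[i] = in_use[i];
    }
    s->ncopied = 0;
}

/* Every write to the live FS first saves the old contents if the
   snapshot still needs them (copy-on-write). */
static void fs_write(struct snapshot *s, int blk, int val) {
    if (s != NULL && s->was_used[blk] && s->ptr[blk] == -1) {
        s->copy[s->ncopied] = live[blk];
        s->ptr[blk] = s->ncopied++;
    }
    live[blk] = val;
    in_use[blk] = true;
}

/* Reading through the snapshot: copied blocks come from the snapshot;
   every other block in use is still unchanged in the live FS. */
static int snap_read(const struct snapshot *s, int blk) {
    return s->ptr[blk] >= 0 ? s->copy[s->ptr[blk]] : live[blk];
}
```

A read through the snapshot falls back to the live block whenever the pointer still says “not yet copied”, because that block has not changed since the snapshot was taken.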

Multiple Snapshots

About 20 snapshots
Interactions/sharing among snapshots

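VFS: the FS Switch

[Figure: the VFS layer stack. User space sits above the syscall layer (file, uio, etc.). Below it, the Virtual File System (VFS) dispatches to NFS (over the TCP/IP network protocol stack), FFS, LFS and other *FS modules. Device drivers are at the bottom.]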

Sun Microsystems introduced the virtual file system interface in 1985 to accommodate diverse filesystem types cleanly.

VFS allows diverse specific file systems to coexist in a file tree, isolating all FS-dependencies in pluggable filesystem modules.

VFS was an internal kernel restructuring with no effect on the syscall interface.

Incorporates object-oriented concepts: a generic procedural interface with multiple implementations.

Based on abstract objects with dynamic method binding by type... in C.

Other abstract interfaces in the kernel: device drivers, file objects, executable files, memory objects.

vnode

In the VFS framework, every file or directory in active use is represented by a vnode object in kernel memory.

[Figure: the syscall layer above NFS and UFS vnodes, plus a list of free vnodes.]

Each vnode has a standard file attributes struct.

Vnode operations are macros that vector to filesystem-specific procedures.

The generic vnode points at a filesystem-specific struct (e.g., inode, rnode), seen only by the filesystem.

Each specific file system maintains a cache of its resident vnodes.
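The idea of “abstract objects with dynamic method binding by type... in C” can be sketched with a function-pointer table. The names below (`struct vnodeops`, `vop_getsize`, `VOP_GETSIZE`, `toy_inode`, `toy_rnode`) are hypothetical simplifications, not the BSD definitions. In this sketch, the generic `struct vnode` carries an operations vector and an opaque `v_data` pointer. The macro vectors each call to the filesystem-specific procedure.

```c
#include <assert.h>

struct vnode;

/* Operations vector: one per file-system type (hypothetical, one op only). */
struct vnodeops {
    const char *vop_fsname;
    int (*vop_getsize)(struct vnode *vp, long *sizep);
};

/* Generic vnode: all that VFS code sees. v_data points at the
   filesystem-specific node, which only that file system interprets. */
struct vnode {
    const struct vnodeops *v_op;
    void *v_data;
};

/* In the spirit of the VOP_* macros: vector to the file system's procedure. */
#define VOP_GETSIZE(vp, sizep) ((vp)->v_op->vop_getsize((vp), (sizep)))

/* Toy local file system: its private node is modeled on an inode. */
struct toy_inode { long i_size; };

static int toyufs_getsize(struct vnode *vp, long *sizep) {
    const struct toy_inode *ip = vp->v_data;
    *sizep = ip->i_size;
    return 0;
}

static const struct vnodeops toyufs_ops = { "toyufs", toyufs_getsize };

/* Toy NFS client: its private node caches attributes from a server. */
struct toy_rnode { long r_size; int r_attrvalid; };

static int toynfs_getsize(struct vnode *vp, long *sizep) {
    const struct toy_rnode *rp = vp->v_data;
    if (!rp->r_attrvalid)
        return -1;  /* a real client would fetch attributes from the server */
    *sizep = rp->r_size;
    return 0;
}

static const struct vnodeops toynfs_ops = { "toynfs", toynfs_getsize };
```

The caller never looks inside `v_data`. Adding a file system means supplying a new operations vector, with no change to the code that calls through the macro.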

vnode Operations and Attributes

directories only
– vop_lookup (OUT vpp, name)
– vop_create (OUT vpp, name, vattr)
– vop_remove (vp, name)
– vop_link (vp, name)
– vop_rename (vp, name, tdvp, tvp, name)
– vop_mkdir (OUT vpp, name, vattr)
– vop_rmdir (vp, name)
– vop_symlink (OUT vpp, name, vattr, contents)
– vop_readdir (uio, cookie)
– vop_readlink (uio)

files only
– vop_getpages (page**, count, offset)
– vop_putpages (page**, count, sync, offset)
– vop_fsync ()

vnode attributes (vattr)
– type (VREG, VDIR, VLNK, etc.)
– mode (9+ bits of permissions)
– nlink (hard link count)
– owner user ID
– owner group ID
– filesystem ID
– unique file ID
– file size (bytes and blocks)
– access time
– modify time
– generation number

generic operations
– vop_getattr (vattr)
– vop_setattr (vattr)
– vhold()
– vholdrele()
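Most of the vattr fields surface in user space through `stat(2)` and `lstat(2)`. The sketch below uses only POSIX calls. The correspondence to vattr names in its output labels is approximate, and the helper name `show_attrs` is hypothetical. The generation number has no POSIX counterpart.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Fill *st for `path` (without following a final symlink) and print the
   fields that roughly line up with vattr. Returns 0, or -1 on error. */
static int show_attrs(const char *path, struct stat *st) {
    if (lstat(path, st) != 0) {
        perror(path);
        return -1;
    }
    const char *type = S_ISREG(st->st_mode) ? "regular (VREG)"
                     : S_ISDIR(st->st_mode) ? "directory (VDIR)"
                     : S_ISLNK(st->st_mode) ? "symlink (VLNK)" : "other";
    printf("type         %s\n", type);
    printf("mode         %04o\n", (unsigned)(st->st_mode & 07777));
    printf("nlink        %ju\n", (uintmax_t)st->st_nlink);
    printf("uid / gid    %ju / %ju\n", (uintmax_t)st->st_uid, (uintmax_t)st->st_gid);
    printf("fs / file ID %ju / %ju\n", (uintmax_t)st->st_dev, (uintmax_t)st->st_ino);
    printf("size         %jd bytes, %jd blocks\n",
           (intmax_t)st->st_size, (intmax_t)st->st_blocks);
    printf("atime/mtime  %jd / %jd\n",
           (intmax_t)st->st_atime, (intmax_t)st->st_mtime);
    return 0;
}
```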

Network File System (NFS)

[Figure: a client and a server connected by the network. The client has user programs, a syscall layer, VFS, UFS and the NFS client. The server has a syscall layer, VFS, UFS and the NFS server. The NFS client forwards file operations across the network to the NFS server, which performs them through its own VFS on UFS.]

vnode Cache

HASH(fsid, fileid)

VFS free list head

Active vnodes are reference-counted by the structures that hold pointers to them:
- system open file table
- process current directory
- file system mount points
- etc.

Each specific file system maintains its own hash of vnodes (BSD).
- specific FS handles initialization
- free list is maintained by VFS

vget(vp): reclaim cached inactive vnode from VFS free list
vref(vp): increment reference count on an active vnode
vrele(vp): release reference count on a vnode
vgone(vp): vnode is no longer valid (file is removed)
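The vnode cache can be sketched as a hash on (fsid, fileid) plus an LRU free list of vnodes that are unreferenced but still identified. The names (`toy_vget`, `toy_vref`, `toy_vrele`, `toy_vgone`, `NVNODE`) are hypothetical and deliberately differ from the kernel's, whose signatures and locking are far more involved. The pool is tiny so that recycling is easy to observe.

```c
#include <assert.h>
#include <stddef.h>

#define NVNODE 4     /* size of the toy vnode pool */
#define NHASH  8

struct tvnode {
    int fsid, fileid;        /* identity; fileid < 0 means none */
    int usecount;            /* refs from open files, cwd, mounts, ... */
    struct tvnode *hnext;    /* hash chain */
    struct tvnode *fnext;    /* free list link (only while usecount == 0) */
};

static struct tvnode pool[NVNODE];
static struct tvnode *hashtab[NHASH];
static struct tvnode *free_head, *free_tail;
static int npool;            /* pool entries handed out so far */

static unsigned hashfn(int fsid, int fileid) {
    return ((unsigned)fsid * 31u + (unsigned)fileid) % NHASH;
}

static void free_append(struct tvnode *vp) {
    vp->fnext = NULL;
    if (free_tail) free_tail->fnext = vp; else free_head = vp;
    free_tail = vp;
}

static void free_remove(struct tvnode *vp) {   /* vp must be on the list */
    struct tvnode **pp = &free_head, *prev = NULL;
    while (*pp != vp) { prev = *pp; pp = &(*pp)->fnext; }
    *pp = vp->fnext;
    if (free_tail == vp) free_tail = prev;
    vp->fnext = NULL;
}

static void hash_remove(struct tvnode *vp) {   /* vp must be hashed */
    struct tvnode **pp = &hashtab[hashfn(vp->fsid, vp->fileid)];
    while (*pp != vp) pp = &(*pp)->hnext;
    *pp = vp->hnext;
}

/* Find or create the vnode for (fsid, fileid) and take a reference. */
static struct tvnode *toy_vget(int fsid, int fileid) {
    struct tvnode *vp;
    for (vp = hashtab[hashfn(fsid, fileid)]; vp != NULL; vp = vp->hnext)
        if (vp->fsid == fsid && vp->fileid == fileid) {
            if (vp->usecount++ == 0)
                free_remove(vp);        /* cached but inactive: revive it */
            return vp;
        }
    if (npool < NVNODE) {
        vp = &pool[npool++];
    } else if (free_head != NULL) {
        vp = free_head;                 /* recycle least recently released */
        free_remove(vp);
        if (vp->fileid >= 0) hash_remove(vp);
    } else {
        return NULL;                    /* every vnode is active */
    }
    vp->fsid = fsid; vp->fileid = fileid; vp->usecount = 1;
    unsigned h = hashfn(fsid, fileid);
    vp->hnext = hashtab[h]; hashtab[h] = vp;
    return vp;
}

static void toy_vref(struct tvnode *vp) { vp->usecount++; }

/* Drop a reference; an unused vnode stays hashed (cached) on the free list. */
static void toy_vrele(struct tvnode *vp) {
    if (--vp->usecount == 0) free_append(vp);
}

/* The file was removed: drop the identity so lookups no longer find it. */
static void toy_vgone(struct tvnode *vp) {
    hash_remove(vp);
    vp->fileid = -1;
}
```

A released vnode keeps its identity on the free list, so a later lookup of the same file revives it. Only when the pool is exhausted is the least recently released vnode recycled for a new file.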

struct vnode {
    struct mtx      v_interlock;        /* lock for "i" things */
    u_long          v_iflag;            /* i vnode flags (see below) */
    int             v_usecount;         /* i ref count of users */
    long            v_numoutput;        /* i writes in progress */
    struct thread   *v_vxthread;        /* i thread owning VXLOCK */
    int             v_holdcnt;          /* i page & buffer references */
    struct buflists v_cleanblkhd;       /* i SORTED clean blocklist */
    struct buf      *v_cleanblkroot;    /* i clean buf splay tree */
    int             v_cleanbufcnt;      /* i number of clean buffers */
    struct buflists v_dirtyblkhd;       /* i SORTED dirty blocklist */
    struct buf      *v_dirtyblkroot;    /* i dirty buf splay tree */
    int             v_dirtybufcnt;
    /* ... (remaining fields not shown) */
};

How Stacking Works

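[Figure: a stacked read. In user space, a user process calls read(). In the kernel, the system call interface passes the call through the file system interface to ncryptfs_read() in NCryptfs, which calls ext2fs_read() in EXT2FS. Data and error codes return up the same path.]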
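Stacking can be sketched with two layers that share one read signature. All names here are hypothetical. `lower_read` stands in for `ext2fs_read()`, and `xorfs_read` stands in for a stacked layer such as `ncryptfs_read()`. The upper layer calls the layer below, transforms the returned bytes, and passes data or error codes back up. The one-byte XOR is a placeholder transform, not encryption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Common read signature shared by every layer (hypothetical). */
typedef long (*read_fn)(void *priv, char *buf, size_t len, long off);

struct layer {
    read_fn read;   /* this layer's read procedure */
    void *priv;     /* layer-private state */
};

/* Lower layer: a file held in memory, standing in for the base FS. */
struct memfile { const char *data; size_t size; };

static long lower_read(void *priv, char *buf, size_t len, long off) {
    struct memfile *mf = priv;
    if (off < 0)
        return -1;                         /* error code flows upward */
    if ((size_t)off >= mf->size)
        return 0;
    size_t n = mf->size - (size_t)off;
    if (n > len)
        n = len;
    memcpy(buf, mf->data + off, n);
    return (long)n;
}

/* Upper layer: call the layer below, then transform what came back. */
struct xorfs { struct layer *lower; unsigned char key; };

static long xorfs_read(void *priv, char *buf, size_t len, long off) {
    struct xorfs *x = priv;
    long n = x->lower->read(x->lower->priv, buf, len, off);
    for (long i = 0; i < n; i++)
        buf[i] = (char)(buf[i] ^ x->key);  /* placeholder "decryption" */
    return n;                              /* data count or error, unchanged */
}
```

The upper layer knows the lower one only through its `struct layer`. Any base file system that offers the same read signature could sit underneath it.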

FiST: File System Translator
– Language + compiler
– Code portability
– Average code size over other stackable file systems is reduced ten times.
– Average development time is reduced seven times.
– Developers need only describe the core functionality of their file systems.
– Basefs = minimalist template derived from Wrapfs
– Extends platform-specific vnode interfaces in a platform-independent way

Transaction-based FS

Performance versus consistency
“Atomic Writes” on Multiple Blocks

– See the paper titled “Atomic Writes for Data Integrity and Consistency in Shared Storage Devices for Clusters” by Okun and Barak, FGCS, vol. 20, pages 539-547, 2004.

– Modify SCSI handling

