11/13/2014
CS341: Operating System
Dr. A. Sahu, Dept of Comp. Sc. & Engg.
Indian Institute of Technology Guwahati
Lect 40: 13th Nov 2014
File System & Device Driver
• FS Implementation
  – File system implementation
• I/O subsystem & Device Drivers
File-System Structure
• File structure
  – Logical storage unit
  – Collection of related information
• File system resides on secondary storage (disks)
  – Provides user interface to storage, mapping logical to physical
  – Provides efficient and convenient access to disk by allowing data to be stored, located, and retrieved easily
File-System Structure
• Disk provides in-place rewrite and random access
  – I/O transfers performed in blocks of sectors (usually 512 bytes)
• File control block: storage structure consisting of information about a file
  – Inode
• Device driver controls the physical device
• File system organized into layers
Layered File System
Application Program
Logical File System
File Organization Module
Basic File System
I/O Control
Devices
File System Layers
• Device drivers
  – Manage I/O devices at the I/O control layer
  – Given commands like "read drive1, cylinder 72, track 2, sector 10, into memory location 1060"
  – Output low-level hardware-specific commands to the hardware controller
File System Layers
• Basic file system
  – Given a command like "retrieve block 123", translates it into a request to the device driver
  – Also manages memory buffers and caches (allocation, freeing, replacement)
    • Buffers hold data in transit
    • Caches hold frequently used data
File System Layers
• File organization module understands files, logical addresses, and physical blocks
  – Translates logical block # to physical block #
  – Manages free space, disk allocation
File System Layers
• Logical file system manages metadata information
  – Translates file name into file number, file handle, location by maintaining file control blocks (inodes in UNIX)
  – Directory management
  – Protection
File System Layers (Cont.)
• Layering
  – Useful for reducing complexity and redundancy
  – But adds overhead, which can decrease performance
• Logical file system translates file name into
  – File number, file handle, location
  – By maintaining file control blocks (inodes)
• Logical layers can be implemented by any coding method, according to the OS designer
File System Layers (Cont.)
• Many file systems, sometimes many within an operating system, each with its own format
  – CD-ROM is ISO 9660
  – Unix has UFS, FFS
  – Windows has FAT, FAT32, NTFS, as well as floppy, CD, DVD, and Blu-ray formats
  – Linux has more than 40 types, with extended file systems ext2 and ext3 leading, plus distributed file systems, etc.
  – New ones still arriving: ZFS, GoogleFS, Oracle ASM, FUSE
File‐System Implementation
• We have system calls at the API level, but how do we implement their functions?
  – On-disk and in-memory structures
• Boot control block contains info needed by system to boot OS from that volume
  – Needed if volume contains OS, usually first block of volume
• Volume control block (superblock, master file table) contains volume details
  – Total # of blocks, # of free blocks, block size, free block pointers or array
• Directory structure organizes the files
  – Names and inode numbers, master file table
File-System Implementation (Cont.)
• Per-file File Control Block (FCB) contains many details about the file
  – inode number, permissions, size, dates
  – NTFS stores this information in the master file table using relational DB structures
• Typical FCB contents:
  – File dates (create, access, write)
  – File permissions
  – File owner, group, ACL
  – File size
  – File data blocks or pointers to file data blocks
Virtual File Systems (Cont.)
• The API is to the VFS interface, rather than any specific type of file system
[Figure: File System Interface → VFS Interface → Local FS type 1 (Disk), Local FS type 2 (SSD), Remote FS (Network)]
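Contiguous Allocation
• Mapping from logical to physical: divide the logical address by the block size (512 bytes here) to get quotient Q and remainder R
  – Block to be accessed = Q + starting address
  – Displacement into block = R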
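The mapping above can be sketched in a few lines of C. This is only an illustration: the struct and function names (block_location, contiguous_map) are made up for this example, and the block size is passed in rather than fixed at 512.

```c
#include <assert.h>
#include <stddef.h>

/* Where a logical byte address of a file lands on disk. */
struct block_location {
    size_t block;   /* physical block number */
    size_t offset;  /* displacement (R) inside that block */
};

/* Contiguous allocation: the file occupies consecutive blocks beginning at
 * start_block, so the block is start_block + Q and the displacement is R,
 * where Q and R are the quotient and remainder of logical_addr / block_size. */
struct block_location
contiguous_map(size_t start_block, size_t logical_addr, size_t block_size)
{
    struct block_location loc;

    assert(block_size > 0);
    loc.block  = start_block + logical_addr / block_size;  /* Q + start */
    loc.offset = logical_addr % block_size;                /* R */
    return loc;
}
```

For example, with a file starting at block 14 and 512-byte blocks, logical address 1300 gives Q = 2 and R = 276, so the access goes to block 16 at displacement 276.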
Linked Allocation
Example of Indexed Allocation
[Figure: Inode-ptr → Data Block, Data Block, Data Block]
Indexed Allocation – Mapping (Cont.)
[Figure: Outer index → Index Table → File]
Efficiency and Performance
• Efficiency dependent on:
  – Disk allocation and directory algorithms
  – Types of data kept in file's directory entry
  – Pre-allocation or as-needed allocation of metadata structures
  – Fixed-size or varying-size data structures
Efficiency and Performance (Cont.)
• Performance
  – Keeping data and metadata close together
  – Buffer cache: separate section of main memory for frequently used blocks
  – Synchronous writes sometimes requested by apps or needed by OS
    • No buffering / caching: writes must hit disk before acknowledgement
    • Asynchronous writes more common, buffer-able, faster
  – Free-behind and read-ahead: techniques to optimize sequential access
  – Reads frequently slower than writes (a write can be buffered, but a read that misses the cache must wait for the disk)
Recovery
• Consistency checking: compares data in directory structure with data blocks on disk, and tries to fix inconsistencies
  – Can be slow and sometimes fails
• Use system programs to back up data from disk to another storage device (magnetic tape, other magnetic disk, optical)
• Recover lost file or disk by restoring data from backup
Log Structured File Systems
• Log structured (or journaling) file systems record each metadata update to the file system as a transaction
• All transactions are written to a log
  – A transaction is considered committed once it is written to the log (sequentially)
  – Sometimes to a separate device or section of disk
  – However, the file system may not yet be updated
Log Structured File Systems
• The transactions in the log are asynchronously written to the file system structures
  – When the file system structures are modified, the transaction is removed from the log
• If the file system crashes, all remaining transactions in the log must still be performed
• Faster recovery from crash, removes chance of inconsistency of metadata
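The commit / apply / recover cycle described above can be illustrated with a toy in-memory journal. This is a sketch under loud assumptions: the "metadata" is just a small integer array, the log is a fixed-size array, and every name here (journal_commit, journal_apply_one, journal_recover) is hypothetical rather than taken from any real file system.

```c
#include <assert.h>

#define LOG_CAPACITY 8   /* hypothetical log size */
#define NUM_SLOTS    4   /* hypothetical number of metadata slots */

struct txn { int slot; int value; };       /* one metadata update */

static struct txn log_buf[LOG_CAPACITY];   /* the sequential log */
static int log_len;                        /* transactions still in the log */
static int metadata[NUM_SLOTS];            /* the "file system structures" */

/* Commit: the update counts as committed once it is appended to the log.
 * Returns 0 on success, -1 if the log is full. */
int journal_commit(int slot, int value)
{
    assert(slot >= 0 && slot < NUM_SLOTS);
    if (log_len == LOG_CAPACITY)
        return -1;
    log_buf[log_len].slot = slot;
    log_buf[log_len].value = value;
    log_len++;
    return 0;
}

/* Apply the oldest logged transaction to the metadata, then remove it
 * from the log. Returns 1 if one was applied, 0 if the log was empty. */
int journal_apply_one(void)
{
    int i;

    if (log_len == 0)
        return 0;
    metadata[log_buf[0].slot] = log_buf[0].value;
    for (i = 1; i < log_len; i++)
        log_buf[i - 1] = log_buf[i];
    log_len--;
    return 1;
}

/* Recovery after a crash: perform every transaction still in the log.
 * Returns how many were replayed. */
int journal_recover(void)
{
    int replayed = 0;

    while (journal_apply_one())
        replayed++;
    return replayed;
}
```

A real journal lives on disk and must survive the crash itself; the point here is only the ordering: commit to the log first, update the structures later, and replay whatever is left after a failure.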
I/O Subsystem & LDD
I/O Hardware
• Variety of I/O devices: storage, transmission, human-interface
• Common concepts: signals from I/O devices interface with computer
  – Port: connection point for device
  – Bus: daisy chain or shared direct access
    • PCI (Peripheral Component Interconnect) bus common in PCs/servers, PCI Express (PCIe)
    • Expansion bus connects relatively slow devices
I/O Hardware
• Common concepts: signals from I/O devices interface with computer
  – Controller (host adapter): electronics that operate port, bus, device
    • Sometimes integrated, sometimes a separate circuit board
    • Contains processor, microcode, private memory, bus controller, etc.
    • Some talk to per-device controller with bus controller, microcode, memory, etc.
A Typical PC Bus Structure
I/O Hardware (Cont.)
• I/O instructions control devices
• Devices usually have registers where device driver places commands, addresses, and data to write, or reads data from registers after command execution
  – Data-in register, data-out register, status register, control register
  – Typically 1-4 bytes, or FIFO buffer
• Devices have addresses, used by
  – Direct I/O instructions
  – Memory-mapped I/O
    • Device data and command registers mapped to processor address space
    • Especially for large address spaces (graphics)
In your CS321 (Comp Peri. Interface)
• 8085-based interfacing
  – 8255 (I/O interface controller)
  – 8254 (Timer)
  – 8259 (Interrupt controller)
  – 8237 (DMA controller)
  – 8251 (UART controller)
• Microcontroller: all controllers are on chip
Device I/O Port Locations on PCs (partial)
Interrupts
• Polling can happen in 3 instruction cycles
  – Read status, logical-and to extract status bit, branch if not zero
  – How to be more efficient if non-zero infrequently?
• CPU interrupt-request line triggered by I/O device
  – Checked by processor after each instruction
• Interrupt handler receives interrupts
  – Maskable to ignore or delay some interrupts
• Interrupt vector to dispatch interrupt to correct handler
  – Context switch at start and end
  – Based on priority, but some are nonmaskable
  – Interrupt chaining if more than one device at same interrupt number
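The idea of an interrupt vector with maskable entries can be sketched in user-space C as a table of function pointers. This is a simulation only: real vectors are set up by hardware and kernel code, and all names below (vector_table, register_isr, dispatch_interrupt, and so on) are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_VECTORS 8                     /* hypothetical table size */

typedef void (*isr_t)(void);              /* an interrupt service routine */

static isr_t vector_table[NUM_VECTORS];   /* the interrupt vector */
static unsigned int irq_mask;             /* bit n set => vector n is masked */
static int keyboard_events;               /* work done by the sample handler */

/* Sample handler: just counts how many times it ran. */
void keyboard_isr(void) { keyboard_events++; }

/* Install a handler for interrupt number vec. */
void register_isr(unsigned int vec, isr_t handler)
{
    assert(vec < NUM_VECTORS);
    vector_table[vec] = handler;
}

/* Masking lets the system ignore or delay an interrupt source. */
void mask_irq(unsigned int vec)
{
    assert(vec < NUM_VECTORS);
    irq_mask |= (1u << vec);
}

void unmask_irq(unsigned int vec)
{
    assert(vec < NUM_VECTORS);
    irq_mask &= ~(1u << vec);
}

/* Dispatch: look up the handler by interrupt number and call it.
 * Returns 1 if a handler ran, 0 if the vector is masked or empty. */
int dispatch_interrupt(unsigned int vec)
{
    assert(vec < NUM_VECTORS);
    if (irq_mask & (1u << vec))
        return 0;
    if (vector_table[vec] == NULL)
        return 0;
    vector_table[vec]();
    return 1;
}
```

Interrupt chaining would replace each table entry with a list of handlers that are tried in turn; that is left out here to keep the sketch short.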
Direct Memory Access
• Used to avoid programmed I/O (one byte at a time) for large data movement
• Requires DMA controller
• Bypasses CPU to transfer data directly between I/O device and memory
Direct Memory Access
• OS writes DMA command block into memory
  – Source and destination addresses
  – Read or write mode
  – Count of bytes
  – Writes location of command block to DMA controller
  – Bus mastering of DMA controller: grabs bus from CPU
    • Cycle stealing from CPU but still much more efficient
  – When done, interrupts to signal completion
• Version that is aware of virtual addresses can be even more efficient: DVMA
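To make the command-block idea concrete, here is a small user-space simulation. It is not how a real DMA controller is programmed: the struct layout is hypothetical, memcpy() stands in for the hardware transfer, and a 'done' flag stands in for the completion interrupt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical DMA command block, as the OS would place it in memory. */
struct dma_command {
    const void *src;    /* source address */
    void       *dst;    /* destination address */
    size_t      count;  /* count of bytes to move */
    int         done;   /* set on completion (stands in for the interrupt) */
};

/* Simulated controller: given the location of a command block, it moves
 * the whole buffer without the caller copying one byte at a time, then
 * signals completion. */
void dma_controller_run(struct dma_command *cmd)
{
    assert(cmd != NULL && cmd->src != NULL && cmd->dst != NULL);
    memcpy(cmd->dst, cmd->src, cmd->count);
    cmd->done = 1;
}
```

In the real mechanism the CPU continues with other work after handing over the command block's address, and learns of completion through the interrupt rather than by reading a flag.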
Application I/O Interface
• I/O system calls encapsulate device behaviors in generic classes
• Device-driver layer hides differences among I/O controllers from kernel
• New devices talking already-implemented protocols need no extra work
• Each OS has its own I/O subsystem structures and device driver frameworks
• Devices vary in many dimensions
  – Character-stream or block
  – Sequential or random-access
  – Synchronous or asynchronous (or both)
  – Sharable or dedicated
  – Speed of operation
  – Read-write, read only, or write only
Characteristics of I/O Devices (Cont.)
• Subtleties of devices handled by device drivers
• Broadly I/O devices can be grouped by the OS into
  – Block I/O
  – Character I/O (stream)
  – Memory-mapped file access
  – Network sockets
• For direct manipulation of I/O device-specific characteristics, usually an escape / back door
  – Unix ioctl() call to send arbitrary bits to a device control register and data to device data register
Block and Character Devices
• Block devices include disk drives
  – Commands include read, write, seek
  – Raw I/O, direct I/O, or file-system access
  – Memory-mapped file access possible
    • File mapped to virtual memory and clusters brought via demand paging
  – DMA
• Character devices include keyboards, mice, serial ports
  – Commands include get(), put()
  – Libraries layered on top allow line editing
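Memory-mapped file access, mentioned above for block devices, can be tried from user space with the POSIX mmap() call. The sketch below assumes a POSIX system with a writable /tmp; the function name first_byte_via_mmap and the temporary file are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>     /* mkstemp() */
#include <string.h>     /* strlen() */
#include <sys/mman.h>   /* mmap(), munmap() */
#include <unistd.h>     /* write(), close(), unlink() */

/* Writes msg to a temporary file, maps the file into memory, and returns
 * the first byte as seen through the mapping (or -1 on any failure). */
int first_byte_via_mmap(const char *msg)
{
    char path[] = "/tmp/mmap_demo_XXXXXX";
    size_t len;
    char *p;
    int fd, c;

    assert(msg != NULL && msg[0] != '\0');
    len = strlen(msg);

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                    /* file vanishes once it is closed */
    if (write(fd, msg, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    /* After this call the file contents are reachable as ordinary memory;
     * pages are brought in on demand when first touched. */
    p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    c = (unsigned char)p[0];         /* plain memory read, no read() call */
    munmap(p, len);
    return c;
}
```

Calling first_byte_via_mmap("hello") should return the character code of 'h', read through the mapping rather than through read().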
Network Devices
• Varying enough from block and character to have own interface
• Linux, Unix, Windows and many others include socket interface
  – Separates network protocol from network operation
  – Includes select() functionality
• Approaches vary widely (pipes, FIFOs, streams, queues, mailboxes)
Clocks and Timers
• Provide current time, elapsed time, timer
• Normal resolution about 1/60 second
• Some systems provide higher-resolution timers
• Programmable interval timer used for timings, periodic interrupts
• ioctl() (on UNIX) covers odd aspects of I/O such as clocks and timers
Nonblocking and Asynchronous I/O
• Blocking: process suspended until I/O completed
  – Easy to use and understand
  – Insufficient for some needs
• Nonblocking: I/O call returns as much as available
  – User interface, data copy (buffered I/O)
  – Implemented via multi-threading
  – Returns quickly with count of bytes read or written
  – select() to find if data ready, then read() or write() to transfer
• Asynchronous: process runs while I/O executes
  – Difficult to use
  – I/O subsystem signals process when I/O completed
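The "select() to find if data ready, then read()" pattern can be demonstrated on a pipe. This is a minimal sketch assuming a POSIX system; data_ready and pipe_demo are hypothetical helper names, and a zero timeout is used so that select() never blocks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/select.h>   /* select(), fd_set, struct timeval */
#include <unistd.h>       /* pipe(), read(), write(), close() */

/* Returns 1 if fd has data ready to read right now, 0 if not, -1 on error.
 * The zero timeout makes select() return immediately instead of waiting. */
int data_ready(int fd)
{
    fd_set readfds;
    struct timeval no_wait = { 0, 0 };

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    return select(fd + 1, &readfds, NULL, NULL, &no_wait);
}

/* Demo: check an empty pipe, write 5 bytes, check again, then read.
 * Returns the number of bytes read, or -1 if anything went wrong. */
int pipe_demo(void)
{
    int fds[2];
    char buf[16];
    int before, after;
    ssize_t n = -1;

    if (pipe(fds) != 0)
        return -1;
    before = data_ready(fds[0]);            /* nothing written yet */
    if (write(fds[1], "hello", 5) == 5) {
        after = data_ready(fds[0]);         /* now data is waiting */
        if (before == 0 && after == 1)
            n = read(fds[0], buf, sizeof buf);   /* will not block */
    }
    close(fds[0]);
    close(fds[1]);
    return (int)n;
}
```

Because read() is only called after select() reports the descriptor as ready, the process never sits suspended waiting for input, which is the behaviour a user interface or a multi-descriptor server needs.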
Kernel I/O Subsystem
• Scheduling
  – Some I/O request ordering via per-device queue
  – Some OSs try fairness
  – Some implement Quality of Service (e.g., IPQOS)
• Buffering: store data in memory while transferring between devices
  – To cope with device speed mismatch
  – To cope with device transfer size mismatch
  – To maintain "copy semantics"
  – Double buffering: two copies of the data
    • Kernel and user
    • Varying sizes
    • Full / being processed and not-full / being used
    • Copy-on-write can be used for efficiency in some cases
Linux Device Driver
Linux Kernel Split View
[Figure from Linux Device Drivers, by Jonathan Corbet]
[Figure: the application calls into the standard "runtime" libraries (call / ret)]
We would write most of this source-code ("app.cpp"), but we would call some library-functions, e.g., open(), read(), write(), malloc(), ...
Then our code would get 'linked' with standard runtime libraries.
(So this is an example of "code reuse")
[Figure: user space (application, standard "runtime" libraries) and kernel space (Operating System kernel), connected by syscall / sysret]
Many standard library functions perform services that require executing privileged instructions (which only the kernel can do).
[Figure: the same picture with an installable module added next to the Operating System kernel in kernel space (call / ret)]
Linux allows us to write our own installable kernel modules and add them to a running system.
• Basic structure of a C program:
  – Comment-banner (showing title and abstract)
  – Preprocessor directives (e.g., for header-files)
  – Global data-declarations (if they are needed)
  – Required 'main()' function (as the entry-point)
  – Can invoke 'printf()' (for 'formatted' output)
  – Optionally may define some other functions
#include <stdio.h>   // header for printf()

int main( void )
{
    printf( "\n Hello world\n" );
    return 0;
}
• We're allowed to 'install' kernel objects:
    $ /sbin/insmod myLKM.ko
• We're allowed to 'remove' kernel objects:
    $ /sbin/rmmod myLKM
• Anyone is allowed to 'list' kernel objects:
    $ /sbin/lsmod
• Kernel module differs from a normal C application program (e.g., no 'main()' function)
• A kernel module cannot call any of the familiar functions from the standard C runtime libraries
• For any LKM, two entry-points are mandatory (one for 'initialization', and one for 'cleanup')
• Resembles normal layout of C programs
but
• Two 'module administration' functions [these are required]
plus
• Appropriate 'module service' functions [these are optional]
• Module uses 'printk()' instead of 'printf()'
• Includes the <linux/module.h> header-file
• Specifies a legal software license ("GPL")
• Compilation requires a special 'Makefile'
• Execution is "passive" (it's a 'side-effect')
• Module has no restriction on 'privileges'
• int init_module( void );
    // this gets called during module installation
• void cleanup_module( void );
    // this gets called during module removal
• A newer syntax allows memory-efficiency:
    module_init(my_init);
    module_exit(my_exit);
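#include <linux/module.h>   // for printk(), module_init(), module_exit()
#include <linux/init.h>     // for __init, __exit

static int __init my_init( void )
{
    printk( "\n Hello, everybody! \n\n" );
    return 0;   // zero tells the kernel the module loaded successfully
}

static void __exit my_exit( void )
{
    printk( "\n Goodbye now... \n\n" );
}

MODULE_LICENSE("GPL");
module_init(my_init);
module_exit(my_exit);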
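• You can modify the 'printk()' text-string so its message will be sure to be displayed; it will be output to the graphical desktop
• Here's how you can do it:
    printk( "<0> Hello, everybody! \n" );
  The "<0>" log-level indicates a 'kernel emergency'
  (On recent kernels the level is written with the KERN_EMERG macro instead: printk( KERN_EMERG " Hello, everybody! \n" );)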
[Figure: layers between the System Call Interface and the Hardware]
System Call Interface
VFS | Socket
File System | Network Protocol
Buffer Cache
Block Device Driver | Character Device Driver | Network Device Driver
Hardware
Device-driver LKM layout
• The module's 'payload' is a collection of callback-functions having prescribed prototypes (function, function, function, ...)
AND
• a 'package' of function-pointers (fops)
• The usual pair of module-administration functions:
  – init: registers the 'fops'
  – exit: unregisters the 'fops'
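The 'package of function-pointers' idea can be tried out in ordinary user-space C. The sketch below is only an analogy to the kernel's struct file_operations: the struct demo_fops, the register/unregister helpers, and the pretend device memory are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for the kernel's file_operations:
 * a package of pointers to callbacks with prescribed prototypes.
 * A driver leaves an entry NULL if it does not provide that service. */
struct demo_fops {
    long (*read)(char *buf, size_t len, long *pos);
    long (*write)(const char *buf, size_t len, long *pos);
};

static char device_mem[16] = "cmos-demo";    /* pretend device storage */
static const struct demo_fops *registered;   /* what 'init' registers */

/* A callback in the style of the driver's read: one byte per call. */
long demo_read(char *buf, size_t len, long *pos)
{
    if (len == 0 || *pos < 0 || *pos >= (long)sizeof device_mem)
        return 0;
    *buf = device_mem[*pos];
    *pos += 1;
    return 1;
}

/* What the module's init function would do: register the fops. */
int demo_register(const struct demo_fops *fops)
{
    assert(fops != NULL);
    registered = fops;
    return 0;
}

/* What the module's exit function would do: unregister the fops. */
void demo_unregister(void) { registered = NULL; }

/* What the system does when an application reads from the device: it
 * calls through the registered pointer instead of a fixed function. */
long demo_sys_read(char *buf, size_t len, long *pos)
{
    if (registered == NULL || registered->read == NULL)
        return -1;
    return registered->read(buf, len, pos);
}
```

The design point is that the caller never names the driver's functions directly; swapping in a different driver only means registering a different package of pointers.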
int open( const char *pathname, int flags, ... );
ssize_t read( int fd, void *buf, size_t count );
ssize_t write( int fd, const void *buf, size_t count );
off_t lseek( int fd, off_t offset, int whence );
int close( int fd );
(and other less-often-used file-I/O functions)
• UNIX systems treat hardware‐devices as special files, so that familiar functions can be used by application programmers to access devices (e.g., open, read, close)
• But a System Administrator has to create these device‐files (in the ‘/dev’ directory)
• Or alternatively (as we’ve seen), an LKM could create these necessary device‐files
# mknod /dev/cmos c 70 0
#include <linux/module.h>   // for printk()
#include <linux/fs.h>       // for register_chrdev()
#include <asm/uaccess.h>    // for put_user(), get_user()
#include <asm/io.h>         // for inb(), outb()

char modname[] = "cmosram"; // name of this kernel module
char devname[] = "cmos";    // name for the device's file
int my_major = 70;          // major ID-number for driver
int cmos_size = 128;        // total bytes of cmos memory
int write_max = 9;          // largest 'writable' address

// transfers one byte per call from the cmos location at offset *pos
ssize_t my_read( struct file *file, char *buf, size_t len, loff_t *pos )
{
    unsigned char datum;

    if ( *pos >= cmos_size ) return 0;          // end of the device
    outb( *pos, 0x70 );                         // select the cmos location
    datum = inb( 0x71 );                        // read its value
    if ( put_user( datum, buf ) ) return -EFAULT;
    *pos += 1;
    return 1;
}

// transfers one byte per call to the cmos location at offset *pos
ssize_t my_write( struct file *file, const char *buf, size_t len, loff_t *pos )
{
    unsigned char datum;

    if ( *pos >= cmos_size ) return 0;          // end of the device
    if ( *pos > write_max ) return -EPERM;      // location is not 'writable'
    if ( get_user( datum, buf ) ) return -EFAULT;
    outb( *pos, 0x70 );                         // select the cmos location
    outb( datum, 0x71 );                        // write the new value
    *pos += 1;
    return 1;
}
loff_t my_llseek( struct file *file, loff_t pos, int whence )
{
    loff_t newpos = -1;

    switch ( whence ) {
    case 0: newpos = pos; break;                 // SEEK_SET
    case 1: newpos = file->f_pos + pos; break;   // SEEK_CUR
    case 2: newpos = cmos_size + pos; break;     // SEEK_END
    }
    if (( newpos < 0 )||( newpos > cmos_size )) return -EINVAL;
    file->f_pos = newpos;
    return newpos;
}

struct file_operations my_fops = {
    .owner  = THIS_MODULE,
    .llseek = my_llseek,
    .write  = my_write,
    .read   = my_read,
};

static int __init my_init( void )
{
    printk( "<1>\nInstalling \'%s\' module ", devname );
    printk( "(major=%d) \n", my_major );
    return register_chrdev( my_major, devname, &my_fops );
}

static void __exit my_exit( void )
{
    unregister_chrdev( my_major, devname );
    printk( "<1>Removing \'%s\' module\n", devname );
}

module_init( my_init );
module_exit( my_exit );
MODULE_LICENSE("GPL");
#include <stdio.h>    // for printf(), perror()
#include <fcntl.h>    // for open()
#include <unistd.h>   // for read(), lseek(), close()

int main( int argc, char **argv )
{
    unsigned char status = 0;
    int fd = open( "/dev/cmos", O_RDONLY );
    if ( fd < 0 ) { perror( "/dev/cmos" ); return -1; }

    // Repeatedly reads Status_Reg until its bit has 'flipped' 30 times
    for (int i = 0; i < 30; i++) {
        do {   // busy-wait until UpdateInProgress is 'true'
            lseek( fd, 10, SEEK_SET );
            read( fd, &status, 1 );
        } while ( ( status & 0x80 ) == 0x00 );
        do {   // busy-wait until UpdateInProgress is 'false'
            lseek( fd, 10, SEEK_SET );
            read( fd, &status, 1 );
        } while ( ( status & 0x80 ) == 0x80 );
        printf( " %d Second Elapsed\n", i+1 );
    }
    close( fd );
    return 0;
}
• lspci -nn
• Peripheral devices in the early PCs used fixed i/o-ports and fixed memory-addresses, e.g.:
  – Video memory address-range: 0xA0000-0xBFFFF
  – Programmable timer i/o-ports: 0x40-0x43
  – Keyboard and mouse i/o-ports: 0x60-0x64
  – Real-Time Clock's i/o-ports: 0x70-0x71
  – Hard Disk controller's i/o-ports: 0x01F0-0x01F7
  – Graphics controller's i/o-ports: 0x03C0-0x03CF
  – Serial-port controller's i/o-ports: 0x03F8-0x03FF
  – Parallel-port controller's i/o-ports: 0x0378-0x037A
PCI Configuration Space
A non-volatile parameter-storage area for each PCI device-function (64 doublewords)
• PCI Configuration Space Header (16 doublewords, fixed format)
• PCI Configuration Space Body (48 doublewords, variable format)
Header layout (16 doublewords, two per row, each shown as bits 31..0):
Dwords  | Odd dword (31..0)                                             | Even dword (31..0)
1 - 0   | Status Register, Command Register                             | Device ID, Vendor ID
3 - 2   | BIST, Header Type, Latency Timer, Cache Line Size             | Class Code (Class/SubClass/ProgIF), Revision ID
5 - 4   | Base Address 1                                                | Base Address 0
7 - 6   | Base Address 3                                                | Base Address 2
9 - 8   | Base Address 5                                                | Base Address 4
11 - 10 | Subsystem Device ID, Subsystem Vendor ID                      | CardBus CIS Pointer
13 - 12 | reserved, capabilities pointer                                | Expansion ROM Base Address
15 - 14 | Maximum Latency, Minimum Grant, Interrupt Pin, Interrupt Line | reserved
[Figure: the CPU and main memory (holding a packet buffer) connect over the BUS to the nic, whose TX FIFO and RX FIFO connect through the transceiver to the LAN cable]
• Network Interface's hardware needs to implement 'filtering' of network packets
• Otherwise the PC's memory-usage and processor-time will be wasted handling packets not meant for this PC to receive
network packet's layout
Destination-address (6 bytes) | Source-address (6 bytes)
Each data-packet begins with the 6-byte device-address of the network interface which is intended to receive it
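The filtering decision can be sketched in software, although on a real NIC it is done by the hardware. The function name accept_frame is hypothetical, and accepting the all-ones broadcast address is an addition that the slide does not mention but that Ethernet interfaces normally do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6   /* a device-address is 6 bytes */

/* Returns 1 if an interface whose address is own_mac should accept the
 * frame, 0 otherwise. The frame starts with the 6-byte destination
 * address, followed by the 6-byte source address. */
int accept_frame(const uint8_t *frame, size_t frame_len,
                 const uint8_t own_mac[MAC_LEN])
{
    static const uint8_t broadcast[MAC_LEN] =
        { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

    assert(frame != NULL && own_mac != NULL);
    if (frame_len < 2 * MAC_LEN)
        return 0;                     /* too short to hold both addresses */
    return memcmp(frame, own_mac, MAC_LEN) == 0 ||
           memcmp(frame, broadcast, MAC_LEN) == 0;
}
```

With the example hardware address 00:11:43:C9:50:3A, a frame whose first six bytes are exactly those values is accepted, while a frame addressed to 00:11:43:C9:50:3B is dropped without costing the CPU any work.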
• You can see the Hardware Address of the ethernet controller on your PC by typing:
    $ /sbin/ifconfig
• Look for it in the first line of screen-output that is labeled 'eth0', for example:
    eth0 Link encap: Ethernet HWaddr 00:11:43:C9:50:3A
[Figure: the same 16-doubleword PCI Configuration Space Header, shown with DeviceID = 0x1677 and VendorID = 0x14E4 filled in for the example network controller]
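To see where the VendorID and DeviceID sit in the header, the following user-space sketch pulls them out of a raw byte copy of configuration space. The helper names are made up; the offsets (VendorID at byte 0, DeviceID at byte 2, both little-endian) follow from the header layout, where they share doubleword 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_CFG_BYTES 256   /* 64 doublewords */

/* Read a little-endian 16-bit field from a raw copy of a device's
 * configuration space. */
uint16_t cfg_read16(const uint8_t *cfg, size_t offset)
{
    assert(cfg != NULL && offset + 1 < PCI_CFG_BYTES);
    return (uint16_t)(cfg[offset] | (cfg[offset + 1] << 8));
}

/* Doubleword 0 holds the Vendor ID in bits 15..0 and the Device ID in
 * bits 31..16, i.e. at byte offsets 0 and 2. */
uint16_t cfg_vendor_id(const uint8_t *cfg) { return cfg_read16(cfg, 0x00); }
uint16_t cfg_device_id(const uint8_t *cfg) { return cfg_read16(cfg, 0x02); }
```

For a buffer beginning with the bytes E4 14 77 16, these helpers give VendorID 0x14E4 and DeviceID 0x1677, the same pair that appears in the figure. On Linux such a buffer can typically be obtained by reading a device's 'config' file under /sys/bus/pci/devices/.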
#include <linux/pci.h>

struct pci_dev *devp;
unsigned int iomem_base, iomem_size;
void *io;

// locate the device by its VendorID and DeviceID
devp = pci_get_device( 0x14E4, 0x1677, NULL );
if ( !devp ) return -ENODEV;

// find the memory region described by Base Address 0
iomem_base = pci_resource_start( devp, 0 );
iomem_size = pci_resource_len( devp, 0 );

// map the device's registers into the kernel's address-space
io = ioremap( iomem_base, iomem_size );
if ( !io ) return -EBUSY;
See the Demos