The Linux file system modules

Nezer J. Zaidenberg

Administration (Minhala)

In the 29.1 recitation I will publish the questions for ex. 1 and ex. 2, and the solution to ex. 2.

Students who have not yet submitted ex. 2 must do so before 29.1.

All students who submitted homework must schedule an oral exam before 29.1, or they will fail the homework!

Students who cannot meet the 29.1 deadline for a good reason should inform me. We will work something out.

Administration (Minhala) 2

You should submit ex. 3 before the test, or request an extension before the test.

To request an extension, send me an email with your team members' IDs. If you do not request an extension, we will publish your final grade after the exam.

Please send the requests to nzaidenberg@mac.com

We will not accept requests after the exam, and if you have sent a request you must still submit the exercise.

Administration (Minhala) 3

A review lesson (shiur hazara) will be held one day before the exam, at noon. (The room will be announced.)

I will answer all your questions and go over the questions we asked in HW 1 and 2, and some issues raised in the lectures and in the file system exercise.

Back to file system

What should we know

What a file system is

How the VFS calls file-system-specific functions via a “virtual table” (“inheritance in C”)

How to operate (start/stop) VMware

How to write simple (hello world) modules

How to write file system modules that register a file system and read the super-block

How to debug using printk and /var/log/messages
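The “virtual table” idea from the list above can be tried out in plain user-space C before touching the kernel. This is only a sketch: struct fs_ops_sketch, vfs_mount_sketch and the two read_super functions are hypothetical names invented for the illustration, not kernel API. The point is that the generic layer calls through a table of function pointers and never needs to know which file system sits behind it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, user-space stand-in for a VFS operations table. */
struct fs_ops_sketch {
    const char *name;
    int (*read_super)(char *out, size_t len);   /* the "virtual method" */
};

/* Two "derived" file systems, each with its own implementation. */
static int uxfs_read_super_sketch(char *out, size_t len)
{
    snprintf(out, len, "uxfs super-block");
    return 0;
}

static int minix_read_super_sketch(char *out, size_t len)
{
    snprintf(out, len, "minix super-block");
    return 0;
}

static const struct fs_ops_sketch uxfs_ops_sketch = {
    "uxfs", uxfs_read_super_sketch
};
static const struct fs_ops_sketch minix_ops_sketch = {
    "minix", minix_read_super_sketch
};

/* The generic layer only sees the table, never the concrete file system. */
static int vfs_mount_sketch(const struct fs_ops_sketch *ops,
                            char *out, size_t len)
{
    assert(out != NULL && len > 0);
    if (ops == NULL || ops->read_super == NULL)
        return -1;
    return ops->read_super(out, len);
}
```

Calling vfs_mount_sketch(&uxfs_ops_sketch, buf, sizeof buf) and then the same call with &minix_ops_sketch runs two different functions through the same call site, which is the whole trick.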

What next

Successful mount

Successful ls

Successful open/touch

Successful read/write

Successful mkdir/remove dir

Successful mmap/munmap

List of functions to implement

List of kernel functions we can use

A word of caution…

In order not to show all my cards, I have taken code from 3 different sources:

My uxfs

Minix

Ext2

This way you can still think about ex. 3 without getting all the code. But beware: not everything is done exactly the same way in all file systems.

You will also see examples of how “inheritance” is implemented in the Linux file system. (Think of a “generic file system” from which uxfs, minix and ext2 inherit.)

Working with block devices: references

UNIX Filesystems, p. 348 (scanning for a uxfs file system): very simplified

Understanding the Linux Kernel, 3rd edition, chapter 15.2: much more than we need

Buffer head: bread basics

When we access a block on a block device we call the bread() function.

bread() reads a block from the block device and returns a buffer_head object (this object can later be accessed for the data).

Each call to bread() must be followed by a call to brelse(), which releases the buffer.

A second call to bread() before brelse() was called will cause the operation to block.

sb_bread() is a wrapper around bread():

sb_bread(sb, block (== offset)) == bread(sb->s_dev, block, sb->s_blocksize)

We will use sb_bread() in most code samples (brelse() still applies).

Buffer head: writing and reading

To write a buffer head we mark it as dirty using mark_buffer_dirty(struct buffer_head *).

Dirty buffers are periodically written to disk (or written on brelse).

To access the data that was read, we read the b_data member of struct buffer_head.
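The read / modify / mark dirty / release cycle described above can be rehearsed in user space. The sketch below is not the kernel's buffer cache: sketch_disk is a hypothetical in-memory "device", struct buffer_sketch is a much reduced stand-in for struct buffer_head, and writing back at release time is a simplification chosen for the illustration. Only the member name b_data is taken from the slides.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE 512
#define SKETCH_NBLOCKS    8

/* Hypothetical in-memory "block device". */
static unsigned char sketch_disk[SKETCH_NBLOCKS][SKETCH_BLOCK_SIZE];

/* Hypothetical, much reduced stand-in for struct buffer_head. */
struct buffer_sketch {
    int   blocknr;
    int   dirty;
    char *b_data;   /* same member name the slides read from */
};

/* Like bread(): copy one block from the device into a fresh buffer. */
static struct buffer_sketch *sketch_bread(int blocknr)
{
    struct buffer_sketch *bh;

    if (blocknr < 0 || blocknr >= SKETCH_NBLOCKS)
        return NULL;
    bh = malloc(sizeof(*bh));
    if (!bh)
        return NULL;
    bh->b_data = malloc(SKETCH_BLOCK_SIZE);
    if (!bh->b_data) {
        free(bh);
        return NULL;
    }
    memcpy(bh->b_data, sketch_disk[blocknr], SKETCH_BLOCK_SIZE);
    bh->blocknr = blocknr;
    bh->dirty = 0;
    return bh;
}

/* Like mark_buffer_dirty(): only sets a flag, nothing is written yet. */
static void sketch_mark_dirty(struct buffer_sketch *bh)
{
    assert(bh != NULL);
    bh->dirty = 1;
}

/* Like brelse(): in this sketch a dirty buffer is written back on release. */
static void sketch_brelse(struct buffer_sketch *bh)
{
    if (!bh)
        return;
    if (bh->dirty)
        memcpy(sketch_disk[bh->blocknr], bh->b_data, SKETCH_BLOCK_SIZE);
    free(bh->b_data);
    free(bh);
}
```

A change made through b_data reaches sketch_disk only if sketch_mark_dirty() was called before sketch_brelse(); forgetting the dirty mark silently loses the change, which is the mistake to watch for in the exercise as well.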

Examples: ux_put_super + ux_write_super

void ux_put_super(struct super_block *s)
{
        struct ux_fs *fs = (struct ux_fs *) s->s_fs_info;
        struct buffer_head *bh = fs->u_sbh;

        printk(KERN_ERR "scipio : ux_put_super %s %d\n", __FILE__, __LINE__);
        /* bh was copied out of fs above, so fs can be freed first */
        kfree(fs);
        brelse(bh);
}

ux_write_super

void ux_write_super(struct super_block *sb)
{
        struct ux_fs *fs = (struct ux_fs *) sb->s_fs_info;
        struct buffer_head *bh = fs->u_sbh;

        printk(KERN_ERR "Scipio write super was called %s %d\n", __FILE__, __LINE__);
        lock_kernel();
        printk(KERN_ERR "Scipio write super after lock kernel %s %d\n", __FILE__, __LINE__);
        /* only a writable mount may push the super-block back to disk */
        if (!(sb->s_flags & MS_RDONLY)) {
                mark_buffer_dirty(bh);
        }
        sb->s_dirt = 0;
        printk(KERN_ERR "Scipio write super before unlock kernel %s %d\n", __FILE__, __LINE__);
        unlock_kernel();
        printk(KERN_ERR "Scipio write super after unlock kernel %s %d\n", __FILE__, __LINE__);
}

Completing the mount operation

And initial discussion on locking

So what does mount(1) check after mounting?

The mount(1) operation also reads the file system's root inode, verifying that the mount was indeed successful and that the root is a directory.

Some of you have demonstrated a mount that fails with a “not a directory” message.

For mount(1) to complete successfully we need the XX_iget implementation.

(The kernel knows which inode is the root inode because of the d_alloc_root function.)

ux_iget(): my iget (porting the book)

struct inode *ux_iget(struct super_block *sb, unsigned long ino)
{
        struct buffer_head *bh;
        struct ux_inode *di;
        int block;
        struct inode *inode;

        printk(KERN_ERR "scipio : ux_iget was called %s %d\n", __FILE__, __LINE__);
        /* look the inode up in the inode cache, or get a new one */
        inode = iget_locked(sb, ino);
        if (!(inode)) {
                printk(KERN_ERR "scipio : ux_iget iget_locked failed %s %d\n", __FILE__, __LINE__);
                return ERR_PTR(-ENOMEM);
        }
        /* cache hit: the inode is already filled in */
        if (!(inode->i_state & I_NEW))
                return inode;

        if (ino < UX_ROOT_INO || ino > UX_MAXFILES) {
                printk("uxfs: Bad inode number %lu\n", ino);
                printk(KERN_ERR "scipio : ux_iget bad inode number %lu, %s %d\n", ino, __FILE__, __LINE__);
                goto ux_iget_error;
        }

        /* Note that for simplicity, there is only one inode per block! */
        block = UX_INODE_BLOCK + ino;
        bh = sb_bread(inode->i_sb, block);
        if (!bh) {
                printk(KERN_ERR "scipio : ux_iget problem with sb_bread on inode %lu %s %d\n", ino, __FILE__, __LINE__);
                goto ux_iget_error;
        }
        di = (struct ux_inode *)(bh->b_data);

        inode->i_mode = di->i_mode;
        if (di->i_mode & S_IFDIR) {
                inode->i_mode |= S_IFDIR;
                inode->i_op = &ux_dir_inops;
                inode->i_fop = &ux_dir_operations;
        } else if (di->i_mode & S_IFREG) {
                inode->i_mode |= S_IFREG;
                inode->i_op = &ux_file_inops;
                inode->i_fop = &ux_file_operations;
                inode->i_mapping->a_ops = &ux_aops;
        }
        inode->i_uid = di->i_uid;
        inode->i_gid = di->i_gid;
        inode->i_nlink = di->i_nlink;
        inode->i_size = di->i_size;
        inode->i_blocks = di->i_blocks;
        inode->i_atime.tv_sec = di->i_atime;
        inode->i_mtime.tv_sec = di->i_mtime;
        inode->i_ctime.tv_sec = di->i_ctime;
        inode->i_atime.tv_nsec = 0;
        inode->i_mtime.tv_nsec = 0;
        inode->i_ctime.tv_nsec = 0;
        /*
         * Caution: i_private is a void *, so copying a whole struct ux_inode
         * over it writes past the pointer field. A safer port would kmalloc
         * a struct ux_inode and store its address in i_private.
         */
        memcpy(&inode->i_private, di, sizeof(struct ux_inode));
        brelse(bh);
        unlock_new_inode(inode);
        printk(KERN_ERR "scipio : ux_iget before return %s %d\n", __FILE__, __LINE__);
        return inode;

ux_iget_error:
        printk(KERN_ERR "scipio : ux_iget had error %s %d\n", __FILE__, __LINE__);
        iget_failed(inode);
        return ERR_PTR(-EINVAL);
}

The new iget_locked()

New way

Each file system has an fs_iget() which calls iget_locked().

iget_locked() searches for the inode in the inode cache (shared memory). If it is there, it is returned. If not, a new inode marked I_NEW is returned and the file system reads it from disk.

(Naturally, all shared memory operations are locked.)
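The lookup logic can be sketched in user space as a tiny fixed-size cache. Everything here is hypothetical and simplified: struct inode_sketch, SKETCH_I_NEW and the two functions only mirror the roles of struct inode, I_NEW, iget_locked() and unlock_new_inode(), and the locking that the real kernel does is left out completely.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CACHE_SLOTS 4
#define SKETCH_I_NEW       1   /* hypothetical flag, mirrors the role of I_NEW */

struct inode_sketch {
    unsigned long ino;
    int in_use;
    int state;
};

static struct inode_sketch sketch_cache[SKETCH_CACHE_SLOTS];

/*
 * Like iget_locked(): return the cached inode if present; otherwise
 * claim a free slot and hand it back flagged as new, so the caller
 * knows it still has to fill it in from disk.
 */
static struct inode_sketch *sketch_iget_locked(unsigned long ino)
{
    struct inode_sketch *free_slot = NULL;
    int i;

    for (i = 0; i < SKETCH_CACHE_SLOTS; i++) {
        if (sketch_cache[i].in_use && sketch_cache[i].ino == ino)
            return &sketch_cache[i];          /* cache hit */
        if (!sketch_cache[i].in_use && !free_slot)
            free_slot = &sketch_cache[i];     /* remember first free slot */
    }
    if (!free_slot)
        return NULL;                          /* cache full */
    free_slot->ino = ino;
    free_slot->in_use = 1;
    free_slot->state = SKETCH_I_NEW;
    return free_slot;
}

/* Like unlock_new_inode(): the inode is filled in, clear the "new" flag. */
static void sketch_unlock_new_inode(struct inode_sketch *inode)
{
    assert(inode != NULL && (inode->state & SKETCH_I_NEW));
    inode->state &= ~SKETCH_I_NEW;
}
```

The first sketch_iget_locked(2) returns a slot with SKETCH_I_NEW set, which is where a real fs_iget() would read the inode from disk. A second call for the same number returns the same slot without the flag, so the disk is not touched again.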

Old way

The iget() method; each fs had a read_inode().

Disappeared in 2.6.25 (not so very long ago!)

Problems: style and locking

For more information : http://kerneltrap.org/Linus/Removing_iget_and_read_inode

Some more kernel operations

printk: we know it already.

kmalloc/kfree: same as the non-kernel functions (kmalloc takes an extra parameter, GFP_KERNEL; more on this…). kzalloc = kmalloc + setting the memory to zero. kcalloc = like the normal calloc.

Most strXXX and memXXX functions are usable in the kernel the same way as in user mode (though the implementation is built into the kernel rather than coming from a library).

Complete kernel API reference: http://www.gelato.unsw.edu.au/~dsw/public-files/kernel-docs/kernel-api/index.html
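To make the kzalloc / kcalloc relation concrete, here is a user-space analogue built on malloc. The names sketch_zalloc and sketch_calloc are hypothetical; in the kernel you would call kzalloc(size, GFP_KERNEL) and kcalloc(n, size, GFP_KERNEL) and free with kfree(). The overflow check is there because n * size can wrap around for large arguments.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* User-space analogue of kzalloc(): allocate, then zero the memory. */
static void *sketch_zalloc(size_t size)
{
    void *p = malloc(size);

    if (p)
        memset(p, 0, size);
    return p;
}

/* Analogue of kcalloc()/calloc(): n zeroed elements, refusing overflow. */
static void *sketch_calloc(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;
    return sketch_zalloc(n * size);
}

/* Small self-check: zeroed memory, and a refused oversized request. */
static void sketch_alloc_selfcheck(void)
{
    int *v = sketch_calloc(4, sizeof(*v));

    assert(v != NULL && v[0] == 0 && v[3] == 0);
    free(v);
    assert(sketch_calloc(SIZE_MAX, 2) == NULL);
}
```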

Just a word of caution

The Linux kernel is an evolving beast, with APIs coming and going and practically no attempt at backward compatibility.

Examples: iget_locked became the required interface in kernel 2.6.25 (when iget() and read_inode() disappeared), while kzalloc was added in 2.6.14 (and doesn’t appear in the API reference).

Kernel development progresses via emails and posts on mailing lists, and everything is documented. When in doubt, ask Google.

Reading inode from disk: minix style

fs/minix/bitmap.c

struct minix_inode *
minix_V1_raw_inode(struct super_block *sb, ino_t ino, struct buffer_head **bh)
{
        int block;
        struct minix_sb_info *sbi = minix_sb(sb);
        struct minix_inode *p;

        if (!ino || ino > sbi->s_ninodes) {
                printk("Bad inode number on dev %s: %ld is out of range\n",
                       sb->s_id, (long)ino);
                return NULL;
        }
        ino--;
        block = 2 + sbi->s_imap_blocks + sbi->s_zmap_blocks +
                 ino / MINIX_INODES_PER_BLOCK;
        *bh = sb_bread(sb, block);
        if (!*bh) {
                printk("Unable to read inode block\n");
                return NULL;
        }
        p = (void *)(*bh)->b_data;
        return p + ino % MINIX_INODES_PER_BLOCK;
}
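The arithmetic in minix_V1_raw_inode is easy to get wrong by one, so it is worth trying with numbers in user space. The sketch below repeats only the calculation. struct layout_sketch, struct inode_pos_sketch and sketch_locate_inode are hypothetical names, and the layout values are passed in by hand, whereas the real code takes them from the super-block info.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout values; a real file system reads them from the super-block. */
struct layout_sketch {
    unsigned long ninodes;
    unsigned long imap_blocks;
    unsigned long zmap_blocks;
    unsigned long inodes_per_block;
};

struct inode_pos_sketch {
    unsigned long block;   /* block number to hand to sb_bread() */
    unsigned long index;   /* slot of the inode inside that block */
};

/* Returns 0 on success, -1 if ino is out of range (inode numbers start at 1). */
static int sketch_locate_inode(const struct layout_sketch *l, unsigned long ino,
                               struct inode_pos_sketch *pos)
{
    assert(l != NULL && pos != NULL && l->inodes_per_block > 0);
    if (ino == 0 || ino > l->ninodes)
        return -1;
    ino--;   /* on-disk slots are counted from 0 */
    /* skip the 2 leading blocks, then the inode and zone bitmaps */
    pos->block = 2 + l->imap_blocks + l->zmap_blocks + ino / l->inodes_per_block;
    pos->index = ino % l->inodes_per_block;
    return 0;
}
```

With one block for each bitmap and 32 inodes per block, inode 1 lands in block 4 at slot 0, inode 32 in block 4 at slot 31, and inode 33 in block 5 at slot 0. Compare this with uxfs, where there is one inode per block and the calculation is simply UX_INODE_BLOCK + ino.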

Writing inode

Is done via a call to iput(). (This will also call your routines.)

iput() marks the inode as used one less time. When the usage count reaches zero, the inode is written to disk and freed.

iget()/iget_locked() increase the usage count by 1.
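The usage counting can be sketched in a few lines of user-space C. This is a toy model, not the kernel's implementation: struct counted_inode_sketch and the two functions are hypothetical, and "written" and "freed" are just flags standing in for the file system's write_inode routine and for releasing the in-memory inode.

```c
#include <assert.h>

struct counted_inode_sketch {
    int count;     /* how many users currently hold the inode */
    int written;   /* set once the inode has been "written to disk" */
    int freed;     /* set once the in-memory inode has been dropped */
};

/* Like iget()/iget_locked(): one more user. */
static void sketch_iget(struct counted_inode_sketch *inode)
{
    inode->count++;
}

/* Like iput(): one user less; the last user triggers write-out and free. */
static void sketch_iput(struct counted_inode_sketch *inode)
{
    assert(inode->count > 0);   /* an iput() without a matching iget() is a bug */
    if (--inode->count == 0) {
        inode->written = 1;     /* stands in for the fs write_inode routine */
        inode->freed = 1;
    }
}
```

After two sketch_iget() calls, the first sketch_iput() changes nothing but the count; only the second one sets the written and freed flags. An unbalanced get/put pair in a real module shows up either as an inode that is never written or as a use after free.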

Write_inode (from minix)

fs/minix/inode.c

static int minix_write_inode(struct inode *inode, int wait)
{
        brelse(minix_update_inode(inode));
        return 0;
}

Still minix: fs/minix/inode.c

static struct buffer_head *minix_update_inode(struct inode *inode)
{
        if (INODE_VERSION(inode) == MINIX_V1)
                return V1_minix_update_inode(inode);
        else
                return V2_minix_update_inode(inode);
}
