The Linux file system modules

Post on 22-Mar-2016


Nezer J. Zaidenberg

Administration

At the 29.1 recitation I will publish the ex. 1 and ex. 2 questions, and the ex. 2 solution. Students who have not yet submitted ex. 2 must do so before 29.1.

All students who submitted homework must schedule an oral exam before 29.1, or they will fail the homework! Students who cannot meet the 29.1 deadline for a good reason should inform me; we will work something out.

Administration 2

You should submit ex. 3 before the test, or request an extension before the test. An extension is requested by sending me an email with your team members' IDs. If you do not request an extension, we will publish your final grade after the exam.

Please send the requests to nzaidenberg@mac.com

We will not accept requests after the exam, and if you have sent a request you must still submit the exercise.

Administration 3

A review session (shiur hazara) will be held one day before the exam, at noon (the room will be announced). I will answer all your questions and go over the questions we asked in HW 1 and 2, as well as issues raised in the lectures and in the file system exercise.

Back to file systems

What we should already know

- What a file system is
- How the VFS calls file-system-specific functions via a "virtual table" ("inheritance in C")
- How to operate (start/stop) VMware
- How to write simple (hello world) modules
- How to write file system modules that register a file system and read the super-block
- How to debug using printk and /var/log/messages
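To make the "virtual table" idea concrete, here is a minimal user-space sketch, not kernel code: a generic object holds a pointer to a table of function pointers, and each "derived" file system fills the table with its own functions. All names here (generic_fs, fs_ops, vfs_read_super, toy_ux_ops, toy_minix_ops) are hypothetical; the real VFS tables are struct super_operations, struct inode_operations and struct file_operations.

```c
#include <stdio.h>
#include <string.h>

struct generic_fs;

/* The "virtual table": one function pointer per operation. */
struct fs_ops {
    int (*read_super)(struct generic_fs *fs, char *out, size_t len);
};

/* The "base class": generic code only ever sees this. */
struct generic_fs {
    const char *name;
    const struct fs_ops *ops;   /* filled in by the concrete file system */
};

/* "Derived" file system 1 (hypothetical) */
static int toy_ux_read_super(struct generic_fs *fs, char *out, size_t len)
{
    return snprintf(out, len, "uxfs super-block for %s", fs->name);
}
static const struct fs_ops toy_ux_ops = { .read_super = toy_ux_read_super };

/* "Derived" file system 2 (hypothetical) */
static int toy_minix_read_super(struct generic_fs *fs, char *out, size_t len)
{
    return snprintf(out, len, "minix super-block for %s", fs->name);
}
static const struct fs_ops toy_minix_ops = { .read_super = toy_minix_read_super };

/* The generic layer never knows which file system it is talking to:
 * it simply calls through the table, the way the VFS does. */
int vfs_read_super(struct generic_fs *fs, char *out, size_t len)
{
    if (!fs->ops || !fs->ops->read_super)
        return -1;               /* operation not implemented */
    return fs->ops->read_super(fs, out, len);
}
```

Adding a third file system means writing one more table; vfs_read_super() does not change, which is the point of the design.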

What next

- Successful mount
- Successful ls
- Successful open/touch
- Successful read/write
- Successful mkdir/rmdir
- Successful mmap/munmap
- List of functions to implement
- List of kernel functions we can use

A word of caution

In order not to show all my cards, I have cited code from 3 different sources:

- My uxfs
- Minix
- Ext2

This way you can still think about ex. 3 without getting all the code. But beware: not everything is done exactly the same way in all file systems.

You will also see examples of how "inheritance" is implemented in Linux file systems. (Think of a "generic file system" from which uxfs, minix and ext2 inherit.)

Working with block devices

References:

- UNIX Filesystems, p. 348 (scanning for the uxfs file system): very simplified
- Understanding the Linux Kernel, 3rd edition, chapter 15.2: much more than we need

Buffer head: bread basics

When we access a block on a block device we call the bread() function. bread() reads a block from the block device and returns a buffer_head object (this object can later be accessed for the data).

Each call to bread() must be followed by a call to brelse(), which releases the buffer.

A second bread() of the same block before brelse() was called does not wait for the first reference to be released: it returns the same cached buffer_head with its usage count (b_count) incremented, so it needs its own matching brelse().

sb_bread() is a wrapper around bread(). In 2.6 kernels:

    sb_bread(sb, block) == __bread(sb->s_bdev, block, sb->s_blocksize)

where block is the block number (the offset in units of the block size).

We will use sb_bread() in most code samples (brelse() still applies).
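The bread()/brelse() pairing can be modelled in user space. The sketch below is a toy, assuming a tiny in-memory "device" and one cached buffer per block; toy_bread(), toy_brelse() and struct toy_bh are hypothetical stand-ins, not kernel functions. It shows the reference-count behaviour: reading the same block twice returns the same buffer with a count of 2, and each read needs its own release.

```c
#include <stdlib.h>
#include <string.h>

#define NBLOCKS   8      /* size of our toy "block device" */
#define BLOCKSIZE 16     /* bytes per block */

/* Toy stand-in for struct buffer_head (hypothetical, user space only). */
struct toy_bh {
    unsigned long b_blocknr;
    int  b_count;                 /* number of outstanding users */
    char b_data[BLOCKSIZE];
};

static char disk[NBLOCKS][BLOCKSIZE];     /* the "device" */
static struct toy_bh *cache[NBLOCKS];     /* at most one buffer per block */

/* Like bread(): return the cached buffer if present, otherwise "read" the
 * block from the device. Either way the caller holds one more reference. */
struct toy_bh *toy_bread(unsigned long block)
{
    if (block >= NBLOCKS)
        return NULL;
    if (!cache[block]) {
        struct toy_bh *bh = calloc(1, sizeof(*bh));
        if (!bh)
            return NULL;
        bh->b_blocknr = block;
        memcpy(bh->b_data, disk[block], BLOCKSIZE);
        cache[block] = bh;
    }
    cache[block]->b_count++;
    return cache[block];
}

/* Like brelse(): drop one reference. The buffer stays cached; a real
 * kernel may reclaim it only once b_count has dropped to zero. */
void toy_brelse(struct toy_bh *bh)
{
    if (bh && bh->b_count > 0)
        bh->b_count--;
}
```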

Buffer head: writing and reading

In order to write a buffer head we mark it as dirty using mark_buffer_dirty(struct buffer_head *bh). Dirty buffers are written to disk periodically by the kernel's writeback threads (or immediately with sync_dirty_buffer()); brelse() itself only drops our reference.

In order to access the data that was read, we read the b_data member of struct buffer_head.
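A similarly hypothetical toy shows why marking a buffer dirty matters: changing b_data alone only changes the in-memory copy, and only buffers flagged dirty are copied back by the periodic flush. toy_mark_buffer_dirty() and toy_sync() are illustrative names; in the kernel the flush is done by the writeback machinery, not by a function you call like this.

```c
#include <string.h>

#define NBLOCKS   4
#define BLOCKSIZE 16

struct toy_bh {
    int  dirty;                 /* set by toy_mark_buffer_dirty() */
    char b_data[BLOCKSIZE];     /* in-memory copy of the block */
};

static char disk[NBLOCKS][BLOCKSIZE];    /* the "device" */
static struct toy_bh buffers[NBLOCKS];   /* one buffer per block, always cached */

void toy_mark_buffer_dirty(struct toy_bh *bh)
{
    bh->dirty = 1;
}

/* Periodic flush: copy every dirty buffer back to the device and clear
 * its flag. Returns the number of blocks written. */
int toy_sync(void)
{
    int written = 0;
    for (int i = 0; i < NBLOCKS; i++) {
        if (buffers[i].dirty) {
            memcpy(disk[i], buffers[i].b_data, BLOCKSIZE);
            buffers[i].dirty = 0;
            written++;
        }
    }
    return written;
}
```

Changing buffers[1].b_data without calling toy_mark_buffer_dirty() leaves the "disk" untouched after toy_sync(); after marking, the next sync writes exactly that one block.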

Examples: ux_put_super + ux_write_super

    void ux_put_super(struct super_block *s)
    {
        struct ux_fs *fs = (struct ux_fs *) s->s_fs_info;
        struct buffer_head *bh = fs->u_sbh;

        printk(KERN_ERR "scipio : ux_put_super %s %d\n", __FILE__, __LINE__);
        kfree(fs);
        brelse(bh);   /* release the super-block buffer read at mount time */
    }

ux_write_super

    void ux_write_super(struct super_block *sb)
    {
        struct ux_fs *fs = (struct ux_fs *) sb->s_fs_info;
        struct buffer_head *bh = fs->u_sbh;

        printk(KERN_ERR "Scipio write super was called %s %d\n", __FILE__, __LINE__);
        lock_kernel();
        printk(KERN_ERR "Scipio write super after lock kernel %s %d\n", __FILE__, __LINE__);
        if (!(sb->s_flags & MS_RDONLY)) {
            mark_buffer_dirty(bh);   /* the super-block will be written back later */
        }
        sb->s_dirt = 0;              /* tell the VFS the super-block is clean */
        printk(KERN_ERR "Scipio write super before unlock kernel %s %d\n", __FILE__, __LINE__);
        unlock_kernel();
        printk(KERN_ERR "Scipio write super after unlock kernel %s %d\n", __FILE__, __LINE__);
    }

Completing the mount operation

And initial discussion on locking

So what does mount(1) check after mounting?

The mount(1) operation also reads the root inode, verifying that the mount was indeed successful and that the root is a directory.

Some of you have demonstrated a mount that fails with a "not a directory" message.

For mount(1) to complete successfully we need the XX_iget() implementation.

(The kernel finds the root inode through sb->s_root, which the file system sets up with the d_alloc_root() function.)

ux_iget(): my iget (ported from the book)

    struct inode *ux_iget(struct super_block *sb, unsigned long ino)
    {
        struct buffer_head *bh;
        struct ux_inode *di;
        int block;
        struct inode *inode;

        printk(KERN_ERR "scipio : ux_iget was called %s %d\n", __FILE__, __LINE__);
        inode = iget_locked(sb, ino);
        if (!inode) {
            printk(KERN_ERR "scipio : ux_iget iget_locked failed %s %d\n", __FILE__, __LINE__);
            return ERR_PTR(-ENOMEM);
        }
        if (!(inode->i_state & I_NEW))
            return inode;            /* found in the inode cache, already filled in */

        if (ino < UX_ROOT_INO || ino > UX_MAXFILES) {
            printk("uxfs: Bad inode number %lu\n", ino);
            printk(KERN_ERR "scipio : ux_iget bad inode number %lu, %s %d\n", ino, __FILE__, __LINE__);
            goto ux_iget_error;
        }

        /* Note that for simplicity, there is only one inode per block! */
        block = UX_INODE_BLOCK + ino;
        bh = sb_bread(inode->i_sb, block);
        if (!bh) {
            printk(KERN_ERR "scipio : ux_iget problem with sb_bread on inode %lu %s %d\n", ino, __FILE__, __LINE__);
            goto ux_iget_error;
        }
        di = (struct ux_inode *)(bh->b_data);

        inode->i_mode = di->i_mode;
        /* Test the file type with S_ISDIR/S_ISREG: a plain "& S_IFDIR" also
           matches block devices, and "& S_IFREG" also matches symlinks. */
        if (S_ISDIR(di->i_mode)) {
            inode->i_op = &ux_dir_inops;
            inode->i_fop = &ux_dir_operations;
        } else if (S_ISREG(di->i_mode)) {
            inode->i_op = &ux_file_inops;
            inode->i_fop = &ux_file_operations;
            inode->i_mapping->a_ops = &ux_aops;
        }

        inode->i_uid = di->i_uid;
        inode->i_gid = di->i_gid;
        inode->i_nlink = di->i_nlink;
        inode->i_size = di->i_size;
        inode->i_blocks = di->i_blocks;
        inode->i_atime.tv_sec = di->i_atime;
        inode->i_mtime.tv_sec = di->i_mtime;
        inode->i_ctime.tv_sec = di->i_ctime;
        inode->i_atime.tv_nsec = 0;
        inode->i_mtime.tv_nsec = 0;
        inode->i_ctime.tv_nsec = 0;

        /* i_private is only a pointer: copying the whole ux_inode over it
           would overwrite the rest of struct inode. Keep a heap copy instead
           (it must be kfree()d when the inode is destroyed). */
        inode->i_private = kmalloc(sizeof(struct ux_inode), GFP_KERNEL);
        if (!inode->i_private) {
            brelse(bh);
            goto ux_iget_error;
        }
        memcpy(inode->i_private, di, sizeof(struct ux_inode));
        brelse(bh);

        unlock_new_inode(inode);
        printk(KERN_ERR "scipio : ux_iget before return %s %d\n", __FILE__, __LINE__);
        return inode;

    ux_iget_error:
        printk(KERN_ERR "scipio : ux_iget had error %s %d\n", __FILE__, __LINE__);
        iget_failed(inode);
        return ERR_PTR(-EINVAL);
    }
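Why test the file type with S_ISDIR()/S_ISREG() rather than a bitwise AND? The file type occupies the S_IFMT field of the mode as a value, not as independent bits, so mode & S_IFDIR is also non-zero for other types such as block devices (S_IFBLK shares that bit), and mode & S_IFREG is non-zero for symbolic links. The user-space check below uses the POSIX macros from <sys/stat.h>; is_dir_buggy(), is_dir() and is_reg() are hypothetical helpers.

```c
#define _XOPEN_SOURCE 700   /* make S_IFBLK/S_IFLNK visible in strict modes */
#include <sys/stat.h>

/* Fragile test: true whenever the S_IFDIR bit happens to be set. */
int is_dir_buggy(mode_t mode)
{
    return (mode & S_IFDIR) != 0;
}

/* Correct tests: compare the whole file-type field (S_IFMT). */
int is_dir(mode_t mode)
{
    return S_ISDIR(mode) != 0;
}

int is_reg(mode_t mode)
{
    return S_ISREG(mode) != 0;
}
```

For a block device (S_IFBLK | 0600), is_dir_buggy() returns 1 while is_dir() returns 0.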

The new iget_locked()

New way:

Each file system has its own fs_iget(), which calls iget_locked(). iget_locked() searches for the inode in the inode cache (shared memory); if it is there, it is returned. If not, a new inode is allocated, marked I_NEW and returned locked; fs_iget() then fills it from disk and calls unlock_new_inode().

(Naturally, all shared-memory operations are locked.)

Old way:

The iget() method; each file system had a read_inode() operation.

Removed in 2.6.25 (not so very long ago!)

Problems: with style and locking.

For more information: http://kerneltrap.org/Linus/Removing_iget_and_read_inode
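The lookup-or-allocate logic of iget_locked() can be sketched in user space as follows. This is a simplified model, assuming a small hash table and no locking at all (the real function holds a lock while it searches and inserts, to handle races); toy_iget_locked(), toy_unlock_new_inode() and struct toy_inode are hypothetical, and only the I_NEW flag name mirrors the kernel (its value here is arbitrary).

```c
#include <stdlib.h>

#define I_NEW     1      /* "still being filled from disk" */
#define HASH_SIZE 16

struct toy_inode {
    unsigned long i_ino;
    int i_count;                  /* users of this in-memory inode */
    int i_state;
    struct toy_inode *next;       /* hash chain */
};

static struct toy_inode *hash_table[HASH_SIZE];

/* Look the inode up in the cache; if it is not there, allocate a fresh
 * one, insert it and flag it I_NEW so the caller (the file system's
 * fs_iget) knows it must fill it from disk. */
struct toy_inode *toy_iget_locked(unsigned long ino)
{
    struct toy_inode **head = &hash_table[ino % HASH_SIZE];

    for (struct toy_inode *p = *head; p; p = p->next) {
        if (p->i_ino == ino) {
            p->i_count++;
            return p;             /* cache hit: already initialised */
        }
    }

    struct toy_inode *p = calloc(1, sizeof(*p));
    if (!p)
        return NULL;
    p->i_ino = ino;
    p->i_count = 1;
    p->i_state = I_NEW;
    p->next = *head;
    *head = p;
    return p;
}

/* The file system calls this once the new inode has been filled in. */
void toy_unlock_new_inode(struct toy_inode *p)
{
    p->i_state &= ~I_NEW;
}
```

The first lookup of an inode number returns an I_NEW inode; a later lookup of the same number returns the same object with its count raised and I_NEW already cleared, which is exactly the case ux_iget() returns early on.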

Some more kernel operations

- printk: we already know it.
- kmalloc/kfree: same as the user-space malloc/free, except that kmalloc takes an extra flags argument, usually GFP_KERNEL (more on this…). kzalloc = kmalloc + zero the memory; kcalloc is like the normal calloc.
- Most strXXX and memXXX functions can be used in the kernel the same way as in user mode (though they are implemented inside the kernel rather than in a library).

Complete kernel API reference: http://www.gelato.unsw.edu.au/~dsw/public-files/kernel-docs/kernel-api/index.html

Just a word of caution

The Linux kernel is an evolving beast, with APIs coming in and going out with practically no attempt at backward compatibility.

Examples: iget() was removed in kernel 2.6.25, forcing file systems to switch to iget_locked(), while kzalloc was added in 2.6.14 (and doesn't appear in the API reference).

Kernel development progresses via emails and posts on mailing lists, and everything is documented. When in doubt, ask Google.

Reading an inode from disk, minix style (fs/minix/bitmap.c)

    struct minix_inode *
    minix_V1_raw_inode(struct super_block *sb, ino_t ino, struct buffer_head **bh)
    {
        int block;
        struct minix_sb_info *sbi = minix_sb(sb);
        struct minix_inode *p;

        if (!ino || ino > sbi->s_ninodes) {
            printk("Bad inode number on dev %s: %ld is out of range\n",
                   sb->s_id, (long)ino);
            return NULL;
        }
        ino--;
        block = 2 + sbi->s_imap_blocks + sbi->s_zmap_blocks +
                ino / MINIX_INODES_PER_BLOCK;
        *bh = sb_bread(sb, block);
        if (!*bh) {
            printk("Unable to read inode block\n");
            return NULL;
        }
        p = (void *)(*bh)->b_data;
        return p + ino % MINIX_INODES_PER_BLOCK;
    }
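The block arithmetic in minix_V1_raw_inode() is easy to check by hand. The helper below (hypothetical, user space) repeats it: skip block 0 (boot block) and block 1 (super-block), then the inode-bitmap and zone-bitmap blocks, and index into the inode table. For the V1 layout MINIX_INODES_PER_BLOCK is 32 (1024-byte blocks holding 32-byte on-disk inodes); the inode count and bitmap sizes in the test are made-up values.

```c
/* Where does minix inode number `ino` live on disk? */
struct inode_location {
    unsigned long block;   /* disk block holding the inode */
    unsigned long index;   /* slot of the inode inside that block */
};

/* Returns 0 on success, -1 for an out-of-range inode number. */
int minix_inode_location(unsigned long ino, unsigned long ninodes,
                         unsigned long imap_blocks, unsigned long zmap_blocks,
                         unsigned long inodes_per_block,
                         struct inode_location *loc)
{
    if (ino == 0 || ino > ninodes)
        return -1;         /* minix has no inode 0 */
    ino--;                 /* inode numbers start at 1, table slots at 0 */
    loc->block = 2 + imap_blocks + zmap_blocks + ino / inodes_per_block;
    loc->index = ino % inodes_per_block;
    return 0;
}
```

For example, with one inode-bitmap block and one zone-bitmap block, inode 40 lives in block 5 at slot 7 (39 / 32 = 1 and 39 % 32 = 7).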

Writing an inode

This is done via a call to iput() (which will also call your routines). iput() decrements the inode's usage count by one.

When the usage count reaches zero, the inode is written to disk (if it was modified) and freed.

iget()/iget_locked() increase the usage count by 1.
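The counting rule can be modelled with a toy. In this hypothetical sketch (toy_iget(), toy_iput() and toy_write_inode() are not kernel functions), the last toy_iput() writes a dirty inode and frees it. This is a simplification: the real kernel usually keeps an unused inode in its cache after the count drops to zero, and dirty inodes are also written back periodically.

```c
#include <stdlib.h>

/* Toy model of inode reference counting (hypothetical names). */
struct toy_inode {
    unsigned long i_ino;
    int i_count;     /* raised by toy_iget, lowered by toy_iput */
    int dirty;       /* in-memory copy differs from disk */
};

static int inodes_written;   /* how many times we "wrote to disk" */

static void toy_write_inode(struct toy_inode *inode)
{
    /* A real file system would copy the fields into its on-disk inode
     * here (compare minix_update_inode) and mark the buffer dirty. */
    inode->dirty = 0;
    inodes_written++;
}

struct toy_inode *toy_iget(struct toy_inode *inode)
{
    inode->i_count++;
    return inode;
}

/* Returns 1 if this call released the last reference (inode freed). */
int toy_iput(struct toy_inode *inode)
{
    if (--inode->i_count > 0)
        return 0;                /* someone still uses it */
    if (inode->dirty)
        toy_write_inode(inode);
    free(inode);
    return 1;
}
```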

write_inode (from minix, fs/minix/inode.c)

    static int minix_write_inode(struct inode *inode, int wait)
    {
        brelse(minix_update_inode(inode));
        return 0;
    }

Still minix, fs/minix/inode.c:

    static struct buffer_head *minix_update_inode(struct inode *inode)
    {
        if (INODE_VERSION(inode) == MINIX_V1)
            return V1_minix_update_inode(inode);
        else
            return V2_minix_update_inode(inode);
    }

More from minix, fs/minix/inode.c:

    static struct buffer_head *V1_minix_update_inode(struct inode *inode)
    {
        struct buffer_head *bh;
        struct minix_inode *raw_inode;
        struct minix_inode_info *minix_inode = minix_i(inode);
        int i;

        raw_inode = minix_V1_raw_inode(inode->i_sb, inode->i_ino, &bh);
        if (!raw_inode)
            return NULL;
        …
        mark_buffer_dirty(bh);
        return bh;
    }

Creating new files

When we call touch, for example, we need to allocate a new inode. We allocate a Linux inode and also a file system inode pointed to by it.

Please note: alloc_inode is a new method (it does not appear in the UNIX Filesystems book); do not confuse it with Pate's ux_ialloc(), which finds a free inode.

How ext2 allocates an inode

    static struct inode *ext2_alloc_inode(struct super_block *sb)
    {
        struct ext2_inode_info *ei;

        ei = (struct ext2_inode_info *)kmem_cache_alloc(ext2_inode_cachep, GFP_KERNEL);
        if (!ei)
            return NULL;
        /* scipio : I removed some #ifdef */
        ei->i_block_alloc_info = NULL;
        ei->vfs_inode.i_version = 1;
        return &ei->vfs_inode;
    }

For those who find it weird, ext2.h:

    struct ext2_inode_info {
        __le32 i_data[15];
        __u32 i_flags;
        __u32 i_faddr;
        …
        struct mutex truncate_mutex;
        struct inode vfs_inode;
        struct list_head i_orphan;   /* unlinked but open inodes */
    };

Explaining

struct inode is embedded in ext2_inode_info, so simple pointer arithmetic finds the enclosing ext2_inode_info from a struct inode pointer. That is done by the function

    static inline struct ext2_inode_info *EXT2_I(struct inode *inode)

(Though it may be more correct to say that an ext2 inode "is an" inode and not that it "has an" inode, kernel developers are more interested in speed and memory locality than in OOP. I have implemented it with two mallocs and it also works.)

Speed is of the utmost importance to kernel developers (but I would be most willing to explain the code line by line).
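The "pointer arithmetic" is the kernel's container_of() macro: subtract the offset of the embedded member from the member's address. A user-space sketch, assuming a simplified container_of without the kernel's type check; struct myfs_inode_info, MYFS_I(), myfs_alloc_inode() and myfs_destroy_inode() are hypothetical analogues of ext2_inode_info, EXT2_I(), ext2_alloc_inode() and ext2_destroy_inode().

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified container_of (the kernel version also type-checks ptr). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct inode {                 /* stand-in for the VFS inode */
    unsigned long i_ino;
};

struct myfs_inode_info {       /* hypothetical file-system-specific inode */
    unsigned int i_data[15];   /* file-system-private fields ... */
    struct inode vfs_inode;    /* ... with the VFS inode embedded, not pointed to */
};

/* Equivalent of EXT2_I(): from the VFS inode back to the enclosing struct. */
static inline struct myfs_inode_info *MYFS_I(struct inode *inode)
{
    return container_of(inode, struct myfs_inode_info, vfs_inode);
}

/* Equivalent of ext2_alloc_inode(): allocate the big struct once and
 * hand the VFS only a pointer to the embedded inode. */
struct inode *myfs_alloc_inode(void)
{
    struct myfs_inode_info *ei = calloc(1, sizeof(*ei));
    return ei ? &ei->vfs_inode : NULL;
}

void myfs_destroy_inode(struct inode *inode)
{
    free(MYFS_I(inode));       /* free the whole enclosing object */
}
```

Because the VFS inode is embedded, one allocation covers both structures and they sit next to each other in memory, which is the locality argument made above.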

Get block/put block

This works roughly the same as with inodes, but via a different data structure (blocks are read using sb_bread() and released using brelse(), after we mark the block as dirty). We may want to do our own locking (especially on SMP systems).

Kernel spinlocks and the BKL

Kernel spinlocks are mutual-exclusion locks: while the lock is held, nobody else can obtain it, and a CPU that tries busy-waits (spins) until the lock is released. Spinlocks are not recursive.

Previous versions of Linux had the "Big Kernel Lock" (BKL). Taking it locked the entire kernel (even unrelated parts). Unlike a spinlock, the BKL could be taken again by the task that already held it.

This lock is being phased out… But for simplicity and improved stability it may be a good idea to wrap all your functions in a lock_kernel() call. (The BKL is released with unlock_kernel().)
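To see what "spinning" means, here is a minimal test-and-set spinlock in user space using C11 <stdatomic.h>. It is a sketch of the idea only, not the kernel's spinlock_t (which also deals with preemption and interrupts); toy_spinlock_t and its functions are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal test-and-set spinlock (user-space sketch, not the kernel's). */
typedef struct {
    atomic_flag locked;
} toy_spinlock_t;

#define TOY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

void toy_spin_lock(toy_spinlock_t *l)
{
    /* Busy-wait: retry until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;   /* spin */
}

/* Take the lock only if it is free; never spins. */
bool toy_spin_trylock(toy_spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

void toy_spin_unlock(toy_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Note that a second trylock by the current holder fails: this lock is not recursive, so calling toy_spin_lock() twice from the same thread would spin forever.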

Example in kernel code (from fs/ext2/inode.c): the BKL

    lock_kernel();
    ext2_update_dynamic_rev(sb);
    EXT2_SET_RO_COMPAT_FEATURE(sb, EXT2_FEATURE_RO_COMPAT_LARGE_FILE);
    unlock_kernel();
    ext2_write_super(sb);

Reading directories

As far as inodes are concerned, reading a directory is identical to reading a file: the directory's inode is read the same way.

Reading directories requires a directory operations struct with different functions than the file operations.

When reading a directory, one has to fill a dirent structure (note that this is why the dirent structure has an inode number, which we never used in user space).
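The readdir/filldir contract can be sketched in user space: the file system walks its directory entries and hands each name and inode number to a callback, stopping when the callback reports that the caller's buffer is full. The directory layout (struct toy_dirent with a fixed 28-byte name), the callback type and all function names are hypothetical; the kernel's real filldir_t takes more arguments (name length, offset, file type).

```c
#define NAMELEN 28   /* hypothetical fixed name length */

/* Hypothetical on-disk directory entry: inode number + name. */
struct toy_dirent {
    unsigned long d_ino;     /* 0 means "unused slot" */
    char d_name[NAMELEN];
};

/* Callback in the spirit of filldir: returns non-zero to stop. */
typedef int (*toy_filldir_t)(void *buf, const char *name, unsigned long ino);

/* Walk a directory block and report every used entry.
 * Returns the number of entries accepted by the callback. */
int toy_readdir(const struct toy_dirent *entries, int n,
                void *buf, toy_filldir_t filldir)
{
    int emitted = 0;
    for (int i = 0; i < n; i++) {
        if (entries[i].d_ino == 0)
            continue;                    /* skip free slots */
        if (filldir(buf, entries[i].d_name, entries[i].d_ino))
            break;                       /* caller's buffer is full */
        emitted++;
    }
    return emitted;
}

/* Example callback: collect inode numbers until `max` (<= 8) are stored. */
struct toy_collect {
    int count, max;
    unsigned long inos[8];
};

int toy_collect_filldir(void *buf, const char *name, unsigned long ino)
{
    struct toy_collect *c = buf;
    (void)name;
    if (c->count >= c->max)
        return 1;                        /* full: tell readdir to stop */
    c->inos[c->count++] = ino;
    return 0;
}
```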

List of useful directory functions

- d_alloc_root() (p. 349, UNIX Filesystems): allocates the root dentry for the root inode, so the kernel can find it
- filldir (p. 353, UNIX Filesystems): copies directory content to user space
- d_XXX (see the kernel API): manipulate the kernel directory (dentry) cache; each does exactly what its name implies

NOW WHAT

- You should be able to create a file system (using a userland mkfs).
- You should be able to create a file system that supports reading and writing files and directories. (You have all the APIs and the kernel examples; feel free to examine other file systems.)
- You should be able to DIG into mmap and links alone… but I'll cover that next lecture.

Some more implementation hints

It may be a good idea to switch SELinux to permissive mode while working on the kernel:

    echo 0 > /selinux/enforce

It may be an even better idea to keep SELinux from starting, or to make it permissive, by editing /etc/selinux/config:

    SELINUX=disabled      (SELinux does not start)
or
    SELINUX=permissive    (SELinux only generates warnings)

http://www.geocities.com/ravikiran_uvs/articles/rkfs.html is a helpful (Beej-like) manual on how to write file system kernel modules; it may be worth your time.

It's never too late to start digging into the kernel!