
The Linux file system modules

Date post: 22-Mar-2016
Page 1: The Linux file system modules

The Linux file system modules

Nezer J. Zaidenberg

Page 2: The Linux file system modules

Administration
In the 29.1 recitation I will publish the ex. 1 and ex. 2 questions, and the ex. 2 solution.
Students who have not yet submitted ex. 2 must do so prior to 29.1.
All students who submitted HW must schedule an oral exam prior to 29.1 or they will fail the homework!
Students who cannot meet the 29.1 deadline with good reason should inform me. We will work something out.

Page 3: The Linux file system modules

Administration 2
You should submit ex. 3 before the test, or request an extension before the test.
If you do not request an extension (by sending me an email with your team members' IDs), we will publish your final grade after the exam.
Please send the requests to [email protected]
We will not accept requests after the exam, and if you have posted a request you should still submit the exercise.

Page 4: The Linux file system modules

Administration 3
A review lesson before the test will be held one day before the exam at noon. (Room will be announced.)
I will answer all your questions and go over the questions we asked in HW 1 and 2, and some issues that will be raised in the lectures and in the filesystem exercise.

Page 5: The Linux file system modules

Back to file system

Page 6: The Linux file system modules

What should we know
What a file system is
How the VFS calls file-system-specific functions via a "virtual table" ("inheritance in C")
How to operate (start/stop) VMware
How to write simple (hello world) modules
How to write file system modules that register a file system and read the super-block
How to debug using printk and /var/log/messages
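The "virtual table" idea can be tried out in user space before touching the kernel. The following is only a sketch: every name in it (model_fs, model_fs_ops, model_vfs_block_size and so on) is hypothetical and the block sizes are illustrative values. The kernel's real tables are structs of function pointers such as super_operations, inode_operations and file_operations.

```c
#include <stddef.h>

/* Hypothetical user-space model of "inheritance in C".
 * None of these names are kernel APIs. */
struct model_fs;

struct model_fs_ops {                      /* the "virtual table" */
    const char *(*name)(void);
    int (*block_size)(const struct model_fs *fs);
};

struct model_fs {                          /* the "base class" */
    const struct model_fs_ops *ops;
};

/* Two "derived" file systems; the returned values are illustrative. */
static const char *uxfs_name(void) { return "uxfs"; }
static int uxfs_block_size(const struct model_fs *fs) { (void)fs; return 512; }

static const char *minix_name(void) { return "minix"; }
static int minix_block_size(const struct model_fs *fs) { (void)fs; return 1024; }

static const struct model_fs_ops uxfs_ops  = { uxfs_name,  uxfs_block_size };
static const struct model_fs_ops minix_ops = { minix_name, minix_block_size };

/* The "VFS" side: it only sees the table, never the concrete type. */
static int model_vfs_block_size(const struct model_fs *fs)
{
    return fs->ops->block_size(fs);
}

static const char *model_vfs_name(const struct model_fs *fs)
{
    return fs->ops->name();
}
```

The caller picks a table once (at "mount" time) and from then on every call goes through the pointers, which is roughly how a file system module plugs its own functions into the generic code.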

Page 7: The Linux file system modules

What next
Successful mount
Successful ls
Successful open/touch
Successful read/write
Successful mkdir/remove dir
Successful mmap/munmap
List of functions to implement
List of kernel functions we can use

Page 8: The Linux file system modules

A word of caution…
In order not to show all my cards, I have cited code from 3 different sources:
My uxfs
Minix
Ext2
This way you can still think about ex. 3 without getting all the code…
But beware: not everything is done exactly the same way in all file systems.
You will also see examples of how the "inheritance" in the Linux file system is implemented. (Think about a "generic file system" from which uxfs, minix and ext2 inherit.)

Page 9: The Linux file system modules

Working with block devices
References:
UNIX Filesystems, p. 348 (scanning for uxfs file system): very simplified
Understanding the Linux Kernel, 3rd edition, chapter 15.2: much more than we need

Page 10: The Linux file system modules

Buffer head bread basics
When we access a block from a block device we call the bread() function.
The bread() function reads a block from the block device and returns a buffer_head object (this object can later be accessed for data).
Each call to bread() should be followed by a call to brelse(), which releases the buffer.
A 2nd call to bread() before brelse() was called may cause the operation to block.
sb_bread() is a wrapper around bread():
sb_bread(sb, block(==offset)) == bread(sb->s_dev, block, sb->s_blocksize)
We will use sb_bread() in most code samples (brelse() still applies).

Page 11: The Linux file system modules

Buffer head writing and reading
In order to write a buffer head we mark it as dirty using mark_buffer_dirty(struct buffer_head *).
The dirty buffers are periodically written to disk (or written on brelse).
In order to access the data that was read, we read the b_data member of struct buffer_head.
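The read, modify, mark dirty, release cycle can be modelled in user space. This is a sketch under loud assumptions: model_disk, model_bread, model_mark_buffer_dirty and model_brelse are hypothetical stand-ins, not the kernel's buffer cache. The model writes a dirty buffer back at release time, whereas the real kernel usually defers the write, and it does not model blocking at all.

```c
#include <stdlib.h>
#include <string.h>

#define MODEL_BLOCK_SIZE 512
#define MODEL_NBLOCKS    8

/* Hypothetical "disk": an in-memory array of blocks. */
static char model_disk[MODEL_NBLOCKS][MODEL_BLOCK_SIZE];

struct model_buffer_head {
    char b_data[MODEL_BLOCK_SIZE];  /* copy of the block's contents */
    int  b_blocknr;                 /* which block this buffer holds */
    int  b_dirty;                   /* set by model_mark_buffer_dirty() */
};

/* Plays the role of bread(): returns a buffer holding the block,
 * or NULL if the block number is out of range or memory runs out. */
static struct model_buffer_head *model_bread(int block)
{
    struct model_buffer_head *bh;

    if (block < 0 || block >= MODEL_NBLOCKS)
        return NULL;
    bh = malloc(sizeof(*bh));
    if (!bh)
        return NULL;
    memcpy(bh->b_data, model_disk[block], MODEL_BLOCK_SIZE);
    bh->b_blocknr = block;
    bh->b_dirty = 0;
    return bh;
}

static void model_mark_buffer_dirty(struct model_buffer_head *bh)
{
    bh->b_dirty = 1;
}

/* Plays the role of brelse(): in this model a dirty buffer is
 * written back immediately, then the buffer is freed. */
static void model_brelse(struct model_buffer_head *bh)
{
    if (!bh)
        return;
    if (bh->b_dirty)
        memcpy(model_disk[bh->b_blocknr], bh->b_data, MODEL_BLOCK_SIZE);
    free(bh);
}
```

A change made through b_data reaches the model disk only if the buffer was marked dirty before it was released, which is the habit the real API expects as well.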

Page 12: The Linux file system modules

Examples: ux_put_super + ux_write_super
void ux_put_super(struct super_block *s)
{
    struct ux_fs *fs = (struct ux_fs *) s->s_fs_info;
    struct buffer_head *bh = fs->u_sbh;   /* fetch before fs is freed */

    printk(KERN_ERR "scipio : ux_put_super %s %d\n", __FILE__, __LINE__);
    kfree(fs);
    brelse(bh);                           /* release the super-block buffer */
}

Page 13: The Linux file system modules

ux_write_super 1/2
void ux_write_super(struct super_block *sb)
{
    struct ux_fs *fs = (struct ux_fs *) sb->s_fs_info;
    struct buffer_head *bh = fs->u_sbh;

    printk(KERN_ERR "Scipio write super was called %s %d\n", __FILE__, __LINE__);
    lock_kernel();

Page 14: The Linux file system modules

ux_write_super 2/2
    printk(KERN_ERR "Scipio write super after lock kernel %s %d\n", __FILE__, __LINE__);
    if (!(sb->s_flags & MS_RDONLY)) {
        mark_buffer_dirty(bh);            /* super-block buffer will be written out */
    }
    sb->s_dirt = 0;
    printk(KERN_ERR "Scipio write super before unlock kernel %s %d\n", __FILE__, __LINE__);
    unlock_kernel();
    printk(KERN_ERR "Scipio write super after unlock kernel %s %d\n", __FILE__, __LINE__);
}

Page 15: The Linux file system modules

Completing the mount operation

And initial discussion on locking

Page 16: The Linux file system modules

So what does mount(1) check after mounting a file system?
The mount(1) operation also reads the root inode, verifying that the mount was indeed successful and that the root inode is a directory.
Some of you have demonstrated a mount that fails with a "not a directory" message.
For mount(1) to complete successfully we need the XX_iget implementation.
(The kernel knows which inode is the root inode to read because of the d_alloc_root function.)

Page 17: The Linux file system modules

ux_iget(): my iget (porting the book)
struct inode *ux_iget(struct super_block *sb, unsigned long ino)
{
    struct buffer_head *bh;
    struct ux_inode *di;
    int block;
    struct inode *inode;

    printk(KERN_ERR "scipio : ux_iget was called %s %d\n", __FILE__, __LINE__);
    inode = iget_locked(sb, ino);

Page 18: The Linux file system modules

My ux_iget (2/6)
    if (!inode) {
        printk(KERN_ERR "scipio : ux_iget iget_locked failed %s %d\n", __FILE__, __LINE__);
        return ERR_PTR(-ENOMEM);
    }
    if (!(inode->i_state & I_NEW))
        return inode;                     /* found in the inode cache */

    if (ino < UX_ROOT_INO || ino > UX_MAXFILES) {
        printk("uxfs: Bad inode number %lu\n", ino);
        printk(KERN_ERR "scipio : ux_iget bad inode number %lu, %s %d\n", ino, __FILE__, __LINE__);
        goto ux_iget_error;
    }

Page 19: The Linux file system modules

My ux_iget (3/6)
    // Note that for simplicity, there is only one inode per block!
    block = UX_INODE_BLOCK + ino;
    bh = sb_bread(inode->i_sb, block);
    if (!bh) {
        printk(KERN_ERR "scipio : ux_iget problem with sb_bread on inode %lu %s %d\n", ino, __FILE__, __LINE__);
        goto ux_iget_error;
    }
    di = (struct ux_inode *)(bh->b_data);
    inode->i_mode = di->i_mode;

Page 20: The Linux file system modules

My ux_iget (4/6)
    if (di->i_mode & S_IFDIR) {
        inode->i_mode |= S_IFDIR;
        inode->i_op = &ux_dir_inops;
        inode->i_fop = &ux_dir_operations;
    } else if (di->i_mode & S_IFREG) {
        inode->i_mode |= S_IFREG;
        inode->i_op = &ux_file_inops;
        inode->i_fop = &ux_file_operations;
        inode->i_mapping->a_ops = &ux_aops;
    }

Page 21: The Linux file system modules

My ux_iget (5/6)
    inode->i_uid = di->i_uid;
    inode->i_gid = di->i_gid;
    inode->i_nlink = di->i_nlink;
    inode->i_size = di->i_size;
    inode->i_blocks = di->i_blocks;
    inode->i_atime.tv_sec = di->i_atime;
    inode->i_mtime.tv_sec = di->i_mtime;
    inode->i_ctime.tv_sec = di->i_ctime;
    inode->i_atime.tv_nsec = 0;
    inode->i_mtime.tv_nsec = 0;

Page 22: The Linux file system modules

My ux_iget (6/6)
    inode->i_ctime.tv_nsec = 0;
    /* Caution: i_private is a void *, so this copy writes past it when
     * struct ux_inode is larger than a pointer; storing a pointer to a
     * kmalloc'ed copy would be safer. */
    memcpy(&inode->i_private, di, sizeof(struct ux_inode));
    brelse(bh);
    unlock_new_inode(inode);
    printk(KERN_ERR "scipio : ux_iget before return %s %d\n", __FILE__, __LINE__);
    return inode;

ux_iget_error:
    printk(KERN_ERR "scipio : ux_iget had error %s %d\n", __FILE__, __LINE__);
    iget_failed(inode);
    return ERR_PTR(-EINVAL);
}

Page 23: The Linux file system modules

The new iget_locked()

New way
Each file system has fs_iget(), which calls iget_locked().
iget_locked() searches for the inode in the inode cache (shared memory). If it is there, it is returned. If not, a new inode marked I_NEW is returned and the file system reads it from disk.
(Naturally, all shared memory operations are locked.)

Old way
iget() method
Each fs had read_inode()
Disappeared: 2.6.25 (not so very long ago!)
Problems: with style and locking

For more information: http://kerneltrap.org/Linus/Removing_iget_and_read_inode
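The lookup-or-create contract can be sketched in user space. Everything here is hypothetical (model_inode, model_cache, model_iget_locked, MODEL_I_NEW): it only imitates the behaviour that ux_iget relies on, where a cache hit is returned as-is and a miss comes back flagged as new so that the caller fills it in. Locking is left out of the model.

```c
#include <stddef.h>

#define MODEL_CACHE_SIZE 16
#define MODEL_I_NEW      1      /* stands in for the kernel's I_NEW flag */

struct model_inode {
    unsigned long i_ino;
    int           i_state;
    int           in_use;       /* slot occupied in the model cache */
};

static struct model_inode model_cache[MODEL_CACHE_SIZE];

/* Return the cached inode if present; otherwise hand back a fresh
 * slot flagged MODEL_I_NEW so the caller knows it must be filled
 * from "disk". Returns NULL when the model cache is full. */
static struct model_inode *model_iget_locked(unsigned long ino)
{
    struct model_inode *free_slot = NULL;
    size_t i;

    for (i = 0; i < MODEL_CACHE_SIZE; i++) {
        if (model_cache[i].in_use && model_cache[i].i_ino == ino)
            return &model_cache[i];             /* cache hit */
        if (!model_cache[i].in_use && !free_slot)
            free_slot = &model_cache[i];
    }
    if (!free_slot)
        return NULL;
    free_slot->in_use = 1;
    free_slot->i_ino = ino;
    free_slot->i_state = MODEL_I_NEW;
    return free_slot;
}

/* Counterpart of unlock_new_inode(): the inode is now fully set up. */
static void model_unlock_new_inode(struct model_inode *inode)
{
    inode->i_state &= ~MODEL_I_NEW;
}
```

This is why ux_iget tests the I_NEW flag right after the call: only a freshly created inode needs the disk read and the final unlock_new_inode().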

Page 24: The Linux file system modules

Some more kernel operations
printk: we know it already.
kmalloc/kfree: same as the non-kernel functions (kmalloc should get an extra parameter, the value GFP_KERNEL). More on this: kzalloc = kmalloc + set memory to zero; kcalloc = like the normal calloc.
Most strXXX and memXXX functions are usable in the kernel the same way as in user mode (though the implementation is built into the kernel, not provided via a library function).
Complete kernel API reference: http://www.gelato.unsw.edu.au/~dsw/public-files/kernel-docs/kernel-api/index.html
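As a user-space analogy for "kzalloc = kmalloc + set memory to zero", here is a small sketch. model_zalloc is a hypothetical helper built on malloc and memset; in a module the corresponding calls would be kmalloc(size, GFP_KERNEL) followed by memset, or simply kzalloc(size, GFP_KERNEL).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space analogue of kzalloc(): allocate, then zero.
 * Returns NULL on failure, like the functions it imitates. */
static void *model_zalloc(size_t size)
{
    void *p = malloc(size);

    if (p)
        memset(p, 0, size);
    return p;
}
```

As with kmalloc, the result must be checked for NULL before use and released exactly once (free here, kfree in the kernel).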

Page 25: The Linux file system modules

Just a word of caution
The Linux kernel is an evolving beast, with APIs coming and going with practically no attempt at backward compatibility.
Examples: iget() and read_inode() were removed in kernel 2.6.25 in favor of iget_locked(), while kzalloc was added in 2.6.14 (and doesn't appear in the API reference).
Kernel development progresses via emails and posts on mailing lists, and everything is documented. When in doubt, ask Google.

Page 26: The Linux file system modules

Reading inode from disk: minix style
fs/minix/bitmap.c
struct minix_inode *
minix_V1_raw_inode(struct super_block *sb, ino_t ino, struct buffer_head **bh)
{
    int block;
    struct minix_sb_info *sbi = minix_sb(sb);
    struct minix_inode *p;

    if (!ino || ino > sbi->s_ninodes) {
        printk("Bad inode number on dev %s: %ld is out of range\n",
               sb->s_id, (long)ino);
        return NULL;
    }

Page 27: The Linux file system modules

fs/minix/bitmap.c
    ino--;
    block = 2 + sbi->s_imap_blocks + sbi->s_zmap_blocks +
            ino / MINIX_INODES_PER_BLOCK;
    *bh = sb_bread(sb, block);
    if (!*bh) {
        printk("Unable to read inode block\n");
        return NULL;
    }
    p = (void *)(*bh)->b_data;
    return p + ino % MINIX_INODES_PER_BLOCK;
}
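The block arithmetic above can be checked with a few numbers in user space. In this sketch model_locate_inode and struct model_inode_location are hypothetical names, and MODEL_INODES_PER_BLOCK is assumed to be 32 (a 1024-byte block holding 32-byte V1 inodes). The constant 2 is read here as skipping the boot block and the super-block, which is an interpretation rather than something the slide states.

```c
/* Assumed value: 1024-byte block / 32-byte on-disk V1 inode. */
#define MODEL_INODES_PER_BLOCK 32

struct model_inode_location {
    unsigned long block;   /* disk block that holds the inode */
    unsigned long index;   /* position of the inode inside that block */
};

/* Same arithmetic as minix_V1_raw_inode(), without any disk access. */
static struct model_inode_location
model_locate_inode(unsigned long ino,
                   unsigned long imap_blocks,
                   unsigned long zmap_blocks)
{
    struct model_inode_location loc;

    ino--;                                  /* inode numbers start at 1 */
    loc.block = 2 + imap_blocks + zmap_blocks +
                ino / MODEL_INODES_PER_BLOCK;
    loc.index = ino % MODEL_INODES_PER_BLOCK;
    return loc;
}
```

With one inode-bitmap block and one zone-bitmap block, inode 1 lands in block 4 at index 0, inode 33 starts block 5, and inode 40 is at index 7 of block 5.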

Page 28: The Linux file system modules

Writing inode
Writing is done via a call to iput(). (This will also call your routines.)
iput() marks the inode as used one less time. When the usage count reaches zero, the inode is written to disk and freed.
iget()/iget_locked() increase the usage count by 1.
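The usage counting can be sketched as plain reference counting. The names below (model_ref_inode, model_iget, model_iput) are hypothetical and the model only counts events: where it bumps the written counter, a real file system's write_inode routine would run.

```c
/* Hypothetical user-space model of inode usage counting. */
struct model_ref_inode {
    int i_count;   /* number of current users */
    int dirty;     /* in-memory copy differs from "disk" */
    int written;   /* how many times the model wrote the inode out */
    int freed;     /* set once the last user is gone */
};

/* Taking a reference: what iget()/iget_locked() do to the count. */
static void model_iget(struct model_ref_inode *inode)
{
    inode->i_count++;
}

/* Dropping a reference: on the last put, write back if dirty and free. */
static void model_iput(struct model_ref_inode *inode)
{
    if (--inode->i_count > 0)
        return;                 /* still in use by someone else */
    if (inode->dirty) {
        inode->written++;       /* a write_inode routine would run here */
        inode->dirty = 0;
    }
    inode->freed = 1;
}
```

Every successful get needs a matching put; a missing put keeps the inode pinned in memory, and an extra put releases it while somebody still uses it.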

Page 29: The Linux file system modules

write_inode (from minix)
fs/minix/inode.c
static int minix_write_inode(struct inode *inode, int wait)
{
    brelse(minix_update_inode(inode));
    return 0;
}

Page 30: The Linux file system modules

Still minix: fs/minix/inode.c
static struct buffer_head *minix_update_inode(struct inode *inode)
{
    if (INODE_VERSION(inode) == MINIX_V1)
        return V1_minix_update_inode(inode);
    else
        return V2_minix_update_inode(inode);
}

Page 31: The Linux file system modules

More from minix
fs/minix/inode.c
static struct buffer_head *V1_minix_update_inode(struct inode *inode)
{
    struct buffer_head *bh;
    struct minix_inode *raw_inode;
    struct minix_inode_info *minix_inode = minix_i(inode);
    int i;

Page 32: The Linux file system modules

And… fs/minix/inode.c
    raw_inode = minix_V1_raw_inode(inode->i_sb, inode->i_ino, &bh);
    if (!raw_inode)
        return NULL;
    …
    mark_buffer_dirty(bh);
    return bh;
}

Page 33: The Linux file system modules

Creating new files
When we call touch, for example, we need to allocate a new inode.
We allocate a Linux inode and also a file system inode pointed to by the above.
Please note: alloc_inode is a new method (it does not appear in the UNIX Filesystems book). Do not confuse it with Pate's ux_ialloc(), which finds a free inode.

Page 34: The Linux file system modules

How ext2 allocates an inode
static struct inode *ext2_alloc_inode(struct super_block *sb)
{
    struct ext2_inode_info *ei;

    ei = (struct ext2_inode_info *)kmem_cache_alloc(ext2_inode_cachep, GFP_KERNEL);
    if (!ei)
        return NULL;
    // scipio : I removed some #ifdef
    ei->i_block_alloc_info = NULL;
    ei->vfs_inode.i_version = 1;
    return &ei->vfs_inode;
}

Page 35: The Linux file system modules

For those who find it weird: ext2.h
struct ext2_inode_info {
    __le32 i_data[15];
    __u32 i_flags;
    __u32 i_faddr;
    …
    struct mutex truncate_mutex;
    struct inode vfs_inode;
    struct list_head i_orphan; /* unlinked but open inodes */
};

Page 36: The Linux file system modules

Explaining
struct inode is embedded in ext2_inode_info, so using simple pointer arithmetic one can find the correct pointer… That is done via the function
static inline struct ext2_inode_info *EXT2_I(struct inode *inode)
(Though it may be more correct to say that an ext2_inode "is an" inode and not that it "has an" inode, kernel developers are more interested in speed and memory locality than in OOP. I've implemented it with two mallocs and it also works.)
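The pointer arithmetic can be demonstrated in user space with offsetof. This is a sketch with hypothetical names (model_inode, model_fs_inode_info, model_container_of, MODEL_I); the kernel has its own container_of macro for the same job.

```c
#include <stddef.h>

/* Hypothetical generic inode, as the "VFS" sees it. */
struct model_inode {
    unsigned long i_ino;
};

/* Hypothetical file-system-specific inode that embeds the generic one. */
struct model_fs_inode_info {
    unsigned int       i_flags;
    struct model_inode vfs_inode;
};

/* Step back from a member's address to the start of the enclosing struct. */
#define model_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Plays the role of EXT2_I(): generic inode in, specific inode out. */
static struct model_fs_inode_info *MODEL_I(struct model_inode *inode)
{
    return model_container_of(inode, struct model_fs_inode_info, vfs_inode);
}
```

Because both structures live in one allocation, a single allocation call serves the generic and the specific part, and the two stay next to each other in memory.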

Page 37: The Linux file system modules

Speed is of the utmost importance to kernel developers (but I would be most willing to explain code lines).

Page 38: The Linux file system modules

Get block/put block
Works roughly the same as with inodes, but via a different data structure.
(Blocks are read using sb_bread() and put using brelse() after we mark the block as dirty.)
We may want to do our own locking (especially on SMP systems).

Page 39: The Linux file system modules

Kernel spinlocks and the BKL
Kernel spinlocks are mutual-exclusion locks: when the lock is obtained, nobody else can obtain it (the other caller has to wait). Spinlocks are not recursive; the BKL described below is the lock that behaves like a "recursive mutex".
Previous versions of Linux relied on the "Big Kernel Lock" (BKL). Taking this lock locks the entire kernel (even unrelated parts).
This lock is beginning to be phased out…
But for simplicity and improved stability it may be a good idea to run all your functions between lock_kernel() and unlock_kernel(). (The BKL is taken with lock_kernel() and released with unlock_kernel().)
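To see what "nobody else can obtain the lock" means, here is a user-space spinlock sketch built on C11 atomic_flag. The names (model_spinlock, model_spin_lock and friends) are hypothetical and this is not the kernel's spinlock implementation; it only shows the test-and-set idea and why such a lock is not recursive.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space spinlock; not the kernel's spinlock_t. */
struct model_spinlock {
    atomic_flag locked;
};

#define MODEL_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once: returns true if we got the lock, false if it was held. */
static bool model_spin_trylock(struct model_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

/* Busy-wait until the lock becomes free. Calling this twice from the
 * same thread without an unlock in between would spin forever. */
static void model_spin_lock(struct model_spinlock *l)
{
    while (!model_spin_trylock(l))
        ;
}

static void model_spin_unlock(struct model_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

The critical section goes between model_spin_lock() and model_spin_unlock() and should stay short, since every waiter burns CPU time while it spins.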

Page 40: The Linux file system modules

Example in kernel code (from fs/ext2/inode.c)
BKL
    lock_kernel();
    ext2_update_dynamic_rev(sb);
    EXT2_SET_RO_COMPAT_FEATURE(sb, EXT2_FEATURE_RO_COMPAT_LARGE_FILE);
    unlock_kernel();
    ext2_write_super(sb);

Page 41: The Linux file system modules

Reading directories
Reading directories is identical to reading inodes as far as inodes are concerned.
Reading directories requires a directory_operation struct with different functions than the file operations.
When reading directories, one has to fill a dirent structure (note that this is why the dirent structure has an inode number, which we never used in user space).
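The "fill a dirent per entry" pattern is a callback loop, and it can be sketched in user space. All names here are hypothetical (model_dirent, model_filldir_t, model_readdir, model_collect), and treating d_ino == 0 as an unused slot is an assumption of this model. The kernel's real filldir callback takes more arguments than this one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical on-disk directory entry. */
struct model_dirent {
    unsigned long d_ino;        /* 0 marks an unused slot (assumption) */
    char          d_name[28];
};

/* Callback supplied by the caller; a negative return stops the walk. */
typedef int (*model_filldir_t)(void *ctx, const char *name,
                               unsigned long ino);

/* Walk the entries and report each used one through the callback.
 * Returns the number of entries reported. */
static int model_readdir(const struct model_dirent *entries, size_t count,
                         model_filldir_t filldir, void *ctx)
{
    int emitted = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        if (entries[i].d_ino == 0)
            continue;
        if (filldir(ctx, entries[i].d_name, entries[i].d_ino) < 0)
            break;              /* caller's buffer is full */
        emitted++;
    }
    return emitted;
}

/* Example callback: collect the names, separated by spaces. */
struct model_listing {
    char text[128];
};

static int model_collect(void *ctx, const char *name, unsigned long ino)
{
    struct model_listing *out = ctx;

    (void)ino;
    if (strlen(out->text) + strlen(name) + 2 > sizeof(out->text))
        return -1;
    if (out->text[0])
        strcat(out->text, " ");
    strcat(out->text, name);
    return 0;
}
```

The file system only walks its own on-disk format; what happens to each name and inode number is decided by whoever supplied the callback.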

Page 42: The Linux file system modules

List of useful directory functions
d_alloc_root (p. 349, UNIX Filesystems): allocates the root inode for the kernel to read
filldir (p. 353, UNIX Filesystems): copies directory content to user space
d_XXX (see the kernel API): manipulate the kernel directory cache; each does exactly what its name implies

Page 43: The Linux file system modules

NOW WHAT
You should be able to create a file system (using a userland mkfs).
You should be able to create a file system that supports reading and writing files and directories. (You have all the APIs and the kernel examples. Feel free to examine other file systems.)
You should be able to dig into mmap and links on your own… but I'll cover that next lecture.

Page 44: The Linux file system modules

Some more implementation hints
It may be a good idea to turn SELinux off while working on the kernel:
echo 0 > /selinux/enforce
It may be an even better idea to make SELinux not start, or to run it in permissive mode. Edit /etc/selinux/config:
SELINUX=disabled // disabled, or
SELINUX=permissive // generate warnings
http://www.geocities.com/ravikiran_uvs/articles/rkfs.html is a helpful (Beej-like) manual on how to write file system kernel modules. It may be worth your time.

Page 45: The Linux file system modules

It's never too late to start digging into the kernel!

