Kernel Internals

Post on 04-Apr-2018

transcript

  • 7/30/2019 Kernel Internals

    1/54

    Santosh Sam Koshy

    santoshk@cdac.in

    Centre for Development of Advanced Computing, Hyderabad

    Kernel Internals

    Agenda

    IOCTL

    Kernel Synchronization Techniques

    Wait Queues

    Time Delays

    Deferred Executions

    14/11/2012

    IOCTL

    Most drivers need, in addition to the ability to read and write the device, the ability to perform various types of hardware control via the device driver. These operations are normally supported via the ioctl method.

    In user space, the ioctl system call has the following prototype:

    int ioctl(int fd, unsigned long cmd, ...);

    The ioctl driver method has the prototype

    int (*ioctl) (struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg);

    (Kernels from 2.6.36 onward drop this method in favor of unlocked_ioctl, which omits the inode argument.)

    Magic Numbers

    Magic numbers are a mechanism for identifying the commands of a particular device. They should be unique across the system.

    Each ioctl command number is encoded by the kernel in four bit-fields:

    type: The magic number, chosen after consulting the file ioctl-number.txt. It is 8 bits wide

    number: The ordinal (sequential) number. It is also 8 bits wide

    direction: The direction of data transfer. Two bits

    size: The size of the user data involved. The width of this field is architecture dependent, and is generally 13 or 14 bits
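The four bit-fields can be inspected from user space with the encoding and decoding macros that Linux exposes through <sys/ioctl.h>. The sketch below assumes a Linux system; the magic number 'k' and the MYDEV_* command names are hypothetical and were chosen only for illustration.

```c
#include <stdio.h>
#include <sys/ioctl.h>   /* on Linux this pulls in _IO, _IOR, _IOW and _IOC_* */

/* Hypothetical magic number and commands for an imaginary device. */
#define MYDEV_IOC_MAGIC  'k'
#define MYDEV_IOCRESET   _IO(MYDEV_IOC_MAGIC, 0)        /* no data transfer */
#define MYDEV_IOCGVALUE  _IOR(MYDEV_IOC_MAGIC, 1, int)  /* driver -> user   */
#define MYDEV_IOCSVALUE  _IOW(MYDEV_IOC_MAGIC, 2, int)  /* user -> driver   */

/* Print the four bit-fields packed into one command number. */
void mydev_describe(unsigned int cmd)
{
    printf("type='%c' number=%u direction=%u size=%u\n",
           (char)_IOC_TYPE(cmd),
           (unsigned int)_IOC_NR(cmd),
           (unsigned int)_IOC_DIR(cmd),
           (unsigned int)_IOC_SIZE(cmd));
}
```

A driver's ioctl method would typically apply the same _IOC_TYPE and _IOC_NR checks to reject commands that do not belong to it before acting on them.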

    Kernel Synchronization

    Agenda

    Sources of Concurrency in the Kernel

    Mechanisms to manage concurrency

    Semaphores

    RW Semaphores

    Spinlocks

    RW Spinlocks

    Completions

    Atomic Variables

    Sequential Locks

    RCU

    What is Synchronization

    In the kernel there can be many tasks that execute pseudo-concurrently. This may lead to data inconsistencies when they access a common resource.

    Well-defined coordination between tasks accessing shared data is a must, and this coordination is what synchronization between tasks means.

    Sources of Concurrency

    In a Linux system, numerous processes may be executing in user space and making system calls into the kernel

    On SMP systems, your code can execute concurrently on more than one processor

    Kernel code is preemptible

    Interrupts are asynchronous events that can cause concurrent execution

    Delayed code execution mechanisms provided by the kernel can run code at any time

    Hot-pluggable devices can disappear while the code is in the middle of working with them

    Mechanisms to manage concurrency

    Any time a hardware or software resource is shared beyond a single thread of execution, there is a possibility that one thread gets an inconsistent view of that resource.

    This calls for some form of resource access management, which is brought about by mechanisms called locking or mutual exclusion: making sure that only one thread of execution can manipulate a shared resource at any one time.

    Semaphores

    At its core, a semaphore is a single integer value combined with a pair of functions that are typically called up and down.

    To use semaphores, the code must include <asm/semaphore.h>.

    The semaphore implementation in the kernel is just a structure, struct semaphore.

    struct semaphore {
        atomic_t count;
        int sleepers;
        wait_queue_head_t wait;
    };

    Semaphores

    There are two ways of creating a semaphore. The dynamic way uses the function

    void sema_init(struct semaphore *sem, int val);

    Statically, semaphores may be created by the macro

    static __DECLARE_SEMAPHORE_GENERIC(name, count);

    The count or val in both cases specifies the initial value of the semaphore. Setting it to 1 creates the semaphore as a binary semaphore or mutex (mutual exclusion semaphore).

    Semaphores

    Semaphores may also be declared in mutex mode by the following macros

    DECLARE_MUTEX(name);
    DECLARE_MUTEX_LOCKED(name);

    They may be initialized at runtime by the following

    void init_MUTEX(struct semaphore *sem);
    void init_MUTEX_LOCKED(struct semaphore *sem);

    Semaphores

    A semaphore is acquired by calling one of the following functions

    void down(struct semaphore *sem);
    int down_interruptible(struct semaphore *sem);
    int down_trylock(struct semaphore *sem);

    Once access to the critical section is completed, the semaphore may be released by the function

    void up(struct semaphore *sem);
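The counting behavior behind down_trylock and up can be sketched in user space. This is only a model built on C11 atomics, not the kernel implementation: the model_* names are hypothetical, and the sleeping variants (down, down_interruptible) are deliberately left out because they need a scheduler. As with the kernel's down_trylock, the model returns 0 when the semaphore was acquired and nonzero when it was not.

```c
#include <stdatomic.h>

/* Hypothetical user-space stand-in for struct semaphore: only the count. */
struct model_sem {
    atomic_int count;
};

/* Counterpart of sema_init(): val = 1 gives a binary semaphore. */
void model_sema_init(struct model_sem *sem, int val)
{
    atomic_init(&sem->count, val);
}

/* Counterpart of down_trylock(): 0 if acquired, 1 if unavailable. Never sleeps. */
int model_down_trylock(struct model_sem *sem)
{
    int old = atomic_load(&sem->count);

    while (old > 0) {
        /* On failure the compare-exchange reloads 'old' with the current count. */
        if (atomic_compare_exchange_weak(&sem->count, &old, old - 1))
            return 0;
    }
    return 1;
}

/* Counterpart of up(): release the semaphore by incrementing the count. */
void model_up(struct model_sem *sem)
{
    atomic_fetch_add(&sem->count, 1);
}
```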

    Reader/Writer Semaphores

    Code using rwsems must include <linux/rwsem.h>. The relevant data type for rwsem is struct rw_semaphore. An rwsem must be explicitly initialized at run time using

    void init_rwsem(struct rw_semaphore *sem);

    For read-only access,

    void down_read(struct rw_semaphore *sem);
    int down_read_trylock(struct rw_semaphore *sem);
    void up_read(struct rw_semaphore *sem);

    For write access,

    void down_write(struct rw_semaphore *sem);
    int down_write_trylock(struct rw_semaphore *sem);
    void up_write(struct rw_semaphore *sem);

    For cases in which a long read follows a quick write, a held write lock can be converted to a read lock with

    void downgrade_write(struct rw_semaphore *sem);

    Spinlocks

    A spinlock is a mutual exclusion device that can have only two values: locked and unlocked. It is usually implemented as a single bit in an integer value. Code wishing to take out a particular lock tests the relevant bit.

    Unlike semaphores, spinlocks may be used in code that cannot sleep.

    If the lock is taken by somebody else, the code goes into a tight loop where it repeatedly checks the lock until it becomes available.

    Spinlocks

    Spinlocks are intended for use on multiprocessor systems, although a uniprocessor workstation running a preemptive kernel behaves like SMP as far as concurrency is concerned

    If a non-preemptive uniprocessor system ever went into a spin on a lock, it would spin forever; no other thread would ever be able to obtain the CPU to release the lock

    For this reason, on uniprocessor kernels built without preemption, the Linux implementation compiles spinlock operations down to nothing, except for those that change the interrupt masking state

    Spinlocks

    The required include file for spinlock primitives is <linux/spinlock.h>. A spinlock has the type spinlock_t and has to be initialized before it is used

    The static initialization for a spinlock is done by

    spinlock_t my_lock = SPIN_LOCK_UNLOCKED;

    or at runtime as

    void spin_lock_init(spinlock_t *lock);

    A spinlock is obtained and released by

    void spin_lock(spinlock_t *lock);
    void spin_unlock(spinlock_t *lock);
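The "test a bit, then loop until it is free" behavior can be modeled in user space with a C11 atomic_flag. This is a sketch under stated assumptions, not the kernel's spinlock: the model_* names are hypothetical, and nothing here disables preemption or interrupts, which the real primitives take care of.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for spinlock_t: one lock bit. */
typedef struct {
    atomic_flag locked;
} model_spinlock_t;

void model_spin_lock_init(model_spinlock_t *lock)
{
    atomic_flag_clear(&lock->locked);
}

/*
 * One acquisition attempt. Test-and-set returns the previous value of the
 * flag, so a previous value of "false" means this caller took the lock.
 */
bool model_spin_trylock(model_spinlock_t *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->locked,
                                              memory_order_acquire);
}

/* Spin in a tight loop until the lock becomes available. */
void model_spin_lock(model_spinlock_t *lock)
{
    while (!model_spin_trylock(lock))
        ;   /* busy-wait: the caller never sleeps */
}

void model_spin_unlock(model_spinlock_t *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

Because the waiter burns CPU time for as long as the lock is held, the model also shows why critical sections under a spinlock should stay short.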

    Spinlocks and Interrupts

    Spinlocks can be used in interrupt handlers, whereas semaphores cannot, since they may sleep.

    If a lock is shared with an interrupt handler, local interrupts must be disabled before acquiring the lock. Otherwise the handler could interrupt the lock holder on the same CPU and spin forever on a lock that cannot be released.

    The kernel provides a separate interface for this, which disables local interrupts on acquiring the spinlock and restores their previous state on releasing it

    void spin_lock_irqsave(spinlock_t *lock, unsigned long flags);
    void spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags);

    Reader/Writer Spinlocks

    Using read/write spinlocks is similar to using rwsems

    A lock is initialized by

    rwlock_t my_rwlock = RW_LOCK_UNLOCKED; /* static way */

    rwlock_t my_rwlock;
    rwlock_init(&my_rwlock); /* dynamic way */

    Reader and writer locks may be acquired and released by

    void read_lock(rwlock_t *lock);
    void read_unlock(rwlock_t *lock);
    void write_lock(rwlock_t *lock);
    void write_unlock(rwlock_t *lock);

    Semaphores vs Spinlocks

    Requirement                            Recommended Lock
    Low overhead locking                   Spinlock
    Short lock hold time                   Spinlock
    Long lock hold time                    Semaphore
    Need to lock from interrupt context    Spinlock
    Need to sleep while holding lock       Semaphore

    Completions

    A common pattern in kernel programming is to initiate some activity outside the current execution flow and then wait for that activity to complete.

    Consider the following code snippet

    struct semaphore sem;
    init_MUTEX_LOCKED(&sem);
    start_external_task(&sem);
    down(&sem);

    The external task calls up(&sem) when its work is done.

    Completions

    Completions are a simple, lightweight mechanism with one task: allowing one thread to tell another that the job is done

    A completion can be created with

    DECLARE_COMPLETION(my_completion);

    Waiting for the completion is done by simply calling

    void wait_for_completion(struct completion *c);

    The actual completion event is signaled by one of the following; complete wakes a single waiting thread, while complete_all wakes all of them

    void complete(struct completion *c);
    void complete_all(struct completion *c);

    Atomic Variables

    Atomic variables are special data types provided by the kernel to perform simple operations in an atomic manner.

    The kernel provides an atomic integer type called atomic_t and a set of functions that have to be used to perform operations on the atomic variables.

    The operations are very fast, because they compile to a single machine instruction whenever possible

    Atomic Integer Operations

    Some important integer operations are

    void atomic_set(atomic_t *v, int i);
    int atomic_read(atomic_t *v);
    void atomic_add(int i, atomic_t *v);
    void atomic_sub(int i, atomic_t *v);
    void atomic_inc(atomic_t *v);
    void atomic_dec(atomic_t *v);
    int atomic_inc_and_test(atomic_t *v);
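A typical use of these operations is a reference counter. The sketch below models one in user space with C11 atomics, where atomic_int stands in for atomic_t; the model_ref_* names are hypothetical. The put operation mirrors the kernel's atomic_dec_and_test, a sibling of atomic_inc_and_test that is not listed on the slide: both report whether the result of the operation is zero.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference counter; atomic_int stands in for atomic_t. */
struct model_ref {
    atomic_int refs;
};

/* Start with one reference held by the creator, cf. atomic_set(&v, 1). */
void model_ref_init(struct model_ref *r)
{
    atomic_init(&r->refs, 1);
}

/* Take another reference, cf. atomic_inc(&v). */
void model_ref_get(struct model_ref *r)
{
    atomic_fetch_add(&r->refs, 1);
}

/*
 * Drop a reference, cf. atomic_dec_and_test(&v). atomic_fetch_sub returns
 * the value before the subtraction, so "== 1" means the count is now zero
 * and the caller released the last reference.
 */
bool model_ref_put(struct model_ref *r)
{
    return atomic_fetch_sub(&r->refs, 1) == 1;
}

/* Read the current count, cf. atomic_read(&v). */
int model_ref_read(struct model_ref *r)
{
    return atomic_load(&r->refs);
}
```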

    Atomic Bit Operations

    The atomic_t type is good for integer arithmetic. It is less well suited to manipulating individual bits. For that, the kernel provides functions that act atomically on single bits. These are declared in <asm/bitops.h>

    The available bit operations include:

    void set_bit(nr, void *addr);
    void clear_bit(nr, void *addr);
    void change_bit(nr, void *addr);
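The semantics of the three operations can be modeled in user space with C11 atomic read-modify-write operations on an array of unsigned long, where bit nr lives in word nr / BITS_PER_LONG. The model_* names are hypothetical, and model_test_bit is added here only so the result can be observed (the kernel has a test_bit of its own, which the slide does not list).

```c
#include <limits.h>
#include <stdatomic.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set bit nr in the bitmap starting at addr. */
void model_set_bit(unsigned int nr, atomic_ulong *addr)
{
    atomic_fetch_or(&addr[nr / MODEL_BITS_PER_LONG],
                    1UL << (nr % MODEL_BITS_PER_LONG));
}

/* Atomically clear bit nr. */
void model_clear_bit(unsigned int nr, atomic_ulong *addr)
{
    atomic_fetch_and(&addr[nr / MODEL_BITS_PER_LONG],
                     ~(1UL << (nr % MODEL_BITS_PER_LONG)));
}

/* Atomically toggle bit nr. */
void model_change_bit(unsigned int nr, atomic_ulong *addr)
{
    atomic_fetch_xor(&addr[nr / MODEL_BITS_PER_LONG],
                     1UL << (nr % MODEL_BITS_PER_LONG));
}

/* Return 1 if bit nr is set, 0 otherwise. */
int model_test_bit(unsigned int nr, atomic_ulong *addr)
{
    unsigned long word = atomic_load(&addr[nr / MODEL_BITS_PER_LONG]);

    return (int)((word >> (nr % MODEL_BITS_PER_LONG)) & 1UL);
}
```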

    seqlocks

    Seqlocks are a feature added in the 2.6 kernel that is intended to provide fast, lockless access to a shared resource.

    Seqlocks work in situations where write access is rare but must be fast

    They work by allowing readers free access to the resource but requiring those readers to check for collisions with writers and, when a collision occurs, retry their access

    They cannot be used to protect data structures involving pointers, because a reader may follow a pointer that has become invalid while a writer is changing the data structure

    seqlocks

    Seqlocks are defined in <linux/seqlock.h>. A seqlock may be initialized by

    seqlock_t lock1 = SEQLOCK_UNLOCKED;

    The write lock is obtained and released by

    void write_seqlock(seqlock_t *lock);
    /* write lock is held: make changes */
    void write_sequnlock(seqlock_t *lock);

    Readers follow this pattern

    unsigned int seq;

    do {
        seq = read_seqbegin(&lock1);
        /* read the data here */
    } while (read_seqretry(&lock1, seq));
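The retry logic rests on a sequence counter that is odd while a write is in progress and changes on every write. The user-space sketch below models that idea with C11 atomics; it is not the kernel's seqlock_t. The model_* names and the two-field payload are hypothetical, and the model assumes a single writer (the kernel additionally serializes writers with a spinlock).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: a sequence counter guarding a pair of values. */
struct model_seq_pair {
    atomic_uint seq;   /* odd while a write is in progress */
    atomic_int a;
    atomic_int b;
};

void model_seq_init(struct model_seq_pair *p)
{
    atomic_init(&p->seq, 0u);
    atomic_init(&p->a, 0);
    atomic_init(&p->b, 0);
}

/* Reader side: remember the sequence number seen before reading. */
unsigned int model_read_begin(struct model_seq_pair *p)
{
    return atomic_load(&p->seq);
}

/* True if the read collided with a writer and must be repeated. */
bool model_read_retry(struct model_seq_pair *p, unsigned int start)
{
    return (start & 1u) || atomic_load(&p->seq) != start;
}

/* Writer side: bump the counter before and after the update. */
void model_write_begin(struct model_seq_pair *p) { atomic_fetch_add(&p->seq, 1u); }
void model_write_end(struct model_seq_pair *p)   { atomic_fetch_add(&p->seq, 1u); }

void model_write_pair(struct model_seq_pair *p, int a, int b)
{
    model_write_begin(p);
    atomic_store(&p->a, a);
    atomic_store(&p->b, b);
    model_write_end(p);
}

/* The reader pattern: loop until a collision-free snapshot is obtained. */
void model_read_pair(struct model_seq_pair *p, int *a, int *b)
{
    unsigned int start;

    do {
        start = model_read_begin(p);
        *a = atomic_load(&p->a);
        *b = atomic_load(&p->b);
    } while (model_read_retry(p, start));
}
```

Readers never block the writer here, which is the property that makes the scheme attractive when writes are rare but must be fast.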

    Read Copy Update

    [Figure: two panels, each showing a shared resource, a copy of the shared resource, the pointers to them, and readers 1 to 5; the second panel adds a writer and its pointer.]

    Wait Queues, Delays and Deferred Execution

    Agenda

    Wait Queues

    HZ

    Jiffies

    Long Delays

    Kernel Timers

    Tasklets

    Work Queues

    Wait Queues

    Wait queues are a mechanism for putting a user space process to sleep whenever the kernel driver cannot yet satisfy the process's request.

    When a process is put to sleep, it is marked as being in a special state and removed from the scheduler's run queue. The process will not be scheduled again until an event wakes it up.

    The Linux scheduler defines two states that represent a waiting process: TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE

    Declaration of Wait Queues

    A wait queue may also be described as a list of processes waiting for a specific event.

    Wait queues are managed by means of a wait queue head, a structure of type wait_queue_head_t. It may be defined and initialized as

    DECLARE_WAIT_QUEUE_HEAD(name); /* static declaration */

    or

    wait_queue_head_t my_queue; /* dynamic initialization */
    init_waitqueue_head(&my_queue);

    This only creates an empty wait queue, to which waiting tasks are appended later

    Using Wait Queues

    When a process sleeps, it does so in the expectation that some condition will become true in the future. The simplest way of sleeping is to call the macro wait_event or one of its variants

    wait_event(queue, condition);
    wait_event_interruptible(queue, condition);
    wait_event_timeout(queue, condition, timeout);
    wait_event_interruptible_timeout(queue, condition, timeout);

    The wakeup is performed by some other thread of execution, which may be another process or an interrupt handler. It makes the condition true and then calls one of the appropriate functions

    void wake_up(wait_queue_head_t *queue);
    void wake_up_interruptible(wait_queue_head_t *queue);

    When to use Wait Queues

    There are two behaviors that warrant the use of wait queues

    If a process calls read but no data is available, the process must block. The process is awakened as soon as some data arrives, and that data is returned to the caller, even if there is less data than the amount requested in the count argument to the method.

    If a process calls write and there is no space in the buffer, the process must block, and it must be on a different wait queue from the one used for reading. When some data has been written to the hardware device, and space becomes free in the output buffer, the process is awakened and the write call succeeds, although the data may be only partially written if there isn't room in the buffer for the count bytes that were requested

    Exclusive Waits

    Thundering Herd

    In using wait queues, we may encounter a situation in which many processes are waiting for the occurrence of the same event.

    On wakeup, all processes waiting for the event are made ready to execute. This causes a herd of processes to thunder in together to gain exclusive access to the shared resource.

    Only one of these processes obtains the resource, and the rest have to go back to sleep. If this happens frequently, the repeated rush of processes for the CPU may degrade overall system performance.

    This is known as the Thundering Herd Problem and is addressed using exclusive wait mechanisms

    Exclusive Waits

    In response to the thundering herd problem, kernel developers have added an exclusive wait option to the kernel.

    There are two important differences in using exclusive waits:

    When a wait queue entry has the WQ_FLAG_EXCLUSIVE flag set, it is added to the end of the wait queue. Entries without that flag are added to the beginning

    When wake_up is called on a wait queue, it stops after waking the first process that has the WQ_FLAG_EXCLUSIVE flag set

    Putting a process into an exclusive wait is a simple matter of calling

    void prepare_to_wait_exclusive(wait_queue_head_t *queue, wait_queue_t *wait, int state);
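The two queueing rules can be illustrated with a small array-based model. It is a sketch of the ordering and wakeup policy only, not of the kernel's wait queue: all model_* names are hypothetical, nothing actually sleeps, and the model removes waiters from the queue as it wakes them, whereas in the kernel a woken process takes itself off the queue.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_WQ_MAX 16

struct model_waiter {
    int id;
    bool exclusive;   /* stands in for WQ_FLAG_EXCLUSIVE */
};

struct model_wait_queue {
    struct model_waiter entries[MODEL_WQ_MAX];
    size_t len;
};

/* Exclusive waiters join at the tail, all others at the head. */
bool model_add_waiter(struct model_wait_queue *q, int id, bool exclusive)
{
    if (q->len == MODEL_WQ_MAX)
        return false;                    /* queue full */
    if (exclusive) {
        q->entries[q->len] = (struct model_waiter){ id, true };
    } else {
        memmove(&q->entries[1], &q->entries[0],
                q->len * sizeof q->entries[0]);
        q->entries[0] = (struct model_waiter){ id, false };
    }
    q->len++;
    return true;
}

/*
 * Wake waiters from the head and stop after the first exclusive one.
 * The ids of woken waiters are written to out; the count is returned.
 */
size_t model_wake_up(struct model_wait_queue *q, int *out)
{
    size_t woken = 0;

    while (woken < q->len) {
        struct model_waiter w = q->entries[woken];

        out[woken++] = w.id;
        if (w.exclusive)
            break;
    }
    memmove(&q->entries[0], &q->entries[woken],
            (q->len - woken) * sizeof q->entries[0]);
    q->len -= woken;
    return woken;
}
```

With two exclusive and two non-exclusive waiters queued, one wakeup releases both non-exclusive waiters plus a single exclusive one, and the second exclusive waiter stays asleep until the next wakeup.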

    HZ

    The kernel keeps track of the flow of time by means of timer interrupts. These are generated by the system's timing hardware at regular intervals.

    This interval is programmed at system boot by the kernel according to the value of HZ, which is an architecture-dependent constant. The default values range from 50 to 1200 ticks per second, and HZ is typically set to 100 or 1000 on x86 machines

    A new value of HZ takes effect only once the kernel has been recompiled with it

    Jiffies

    Every time a timer interrupt occurs, the value of an internal kernel counter is incremented. The counter is initialized to 0 at system boot and therefore represents the number of timer ticks since the last boot.

    The counter is a 64-bit variable called jiffies_64. However, driver writers normally access the jiffies variable, an unsigned long that is the same as either jiffies_64 or its least significant bits.

    Using the Jiffies Counter

    The jiffies counter can be used to read the present time and thereby calculate a future timestamp. This may be illustrated as follows

    unsigned long j, stamp_1, stamp_2;

    j = jiffies;             /* the current time */
    stamp_1 = j + HZ;        /* one second in the future */
    stamp_2 = j + HZ / 2;    /* half a second in the future */
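Because jiffies is an unsigned long that eventually wraps around, timestamps computed this way should not be compared with plain < or >. The kernel provides time_after and time_before in <linux/jiffies.h> for that purpose. The user-space sketch below models the idea behind them; the model_* names and the MODEL_HZ value of 1000 are assumptions made for illustration.

```c
#include <stdbool.h>

#define MODEL_HZ 1000UL   /* assumed tick rate; the real HZ is fixed at kernel build time */

/*
 * True if timestamp a is later than timestamp b. The difference is taken
 * in unsigned arithmetic and then interpreted as signed, so the result
 * stays correct when the counter wraps between b and a (as long as the
 * two stamps are less than half the counter range apart).
 */
bool model_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* True if timestamp a is earlier than timestamp b. */
bool model_time_before(unsigned long a, unsigned long b)
{
    return model_time_after(b, a);
}

/* A timestamp secs seconds after now, as in "j + HZ". */
unsigned long model_seconds_from(unsigned long now, unsigned long secs)
{
    return now + secs * MODEL_HZ;
}
```

For a counter value ten ticks short of wrapping, the stamp one second ahead is numerically smaller than the current value, yet model_time_after still reports it as later.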

    Delaying Execution

    Long Delays:

    Occasionally a driver needs to delay execution for relatively long periods, that is, more than one clock tick. There are a few ways of implementing this

    Busy Waiting

    unsigned long delay = jiffies + 5 * HZ;   /* five seconds from now */

    while (time_before(jiffies, delay)) {
        /* do nothing */
    }

    Delaying Execution

    This method causes busy looping in the while statement, which hogs the CPU for no productive outcome

    Yielding the Processor

    /* delay holds the target time, e.g. jiffies + 5 * HZ */
    while (time_before(jiffies, delay)) {
        schedule();   /* yield the CPU */
    }

    The advantage of this method is that another process may get access to the CPU. The requested delay is guaranteed as a minimum, but the process may not be scheduled again exactly when the delay expires.

    Delaying Execution

    Short Delays:

    The kernel provides functions for delays shorter than one clock tick, which cannot be expressed with the jiffies counter. These delays are implemented as software loops (busy waits) whose details depend on the architecture.

    void ndelay(unsigned long nsecs);

    void udelay(unsigned long usecs);

    void mdelay(unsigned long msecs);
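    The reason jiffies cannot express such delays is simple arithmetic: one tick lasts 1/HZ seconds, so anything shorter rounds down to zero ticks. A small user-space sketch of that calculation follows; the helper names and the HZ values used are assumptions chosen for illustration.

```c
/* Length of one clock tick in microseconds for a tick rate of
 * `hz` ticks per second. */
static unsigned long tick_length_usecs(unsigned long hz)
{
    return 1000000UL / hz;
}

/* Whole ticks contained in `usecs` microseconds. A result of 0
 * means the delay is shorter than one tick and cannot be
 * expressed as a jiffies difference. */
static unsigned long usecs_to_whole_ticks(unsigned long usecs,
                                          unsigned long hz)
{
    /* widen before multiplying so the product cannot overflow */
    return (unsigned long)(((unsigned long long)usecs * hz) / 1000000ULL);
}
```

    With HZ assumed to be 250, a tick lasts 4000 microseconds, so a 50 microsecond delay corresponds to zero ticks and calls for udelay() rather than a jiffies-based wait.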

    Kernel Timers


    Kernel timers are used to schedule execution of a function at a later point in time, based on the clock tick.

    A kernel timer is a data structure that instructs the kernel to execute a user-defined function with a user-defined argument at a user-defined time.

    The declarations can be found in linux/timer.h and the source code in kernel/timer.c.


    A function scheduled through a timer does not necessarily run while the process that registered it is executing. It runs asynchronously, in interrupt context.

    Kernel timers may be considered software interrupt handlers and have certain constraints associated with their implementation.

    Primarily, timer functions have to be atomic, and there are additional constraints because they execute in interrupt context: for example, they cannot sleep or access user space.

    A timer function runs on the same CPU that registered the timer.

    The Timer API


    The kernel provides drivers with a number of functions to declare, register and remove kernel timers.

    struct timer_list {
        unsigned long expires;
        void (*function)(unsigned long);
        unsigned long data;
    };

    The member expires holds the absolute jiffies value at which the timer is expected to run, for example jiffies + HZ for one second from now. The member function points to a user-defined function, and the third member, data, is the argument passed to that function.


    The timer may be initialized by using the function

    void init_timer(struct timer_list *timer);

    The public fields of the structure may be initialized after returning from the function.

    The timer may be added to and deleted from the kernel using the functions

    void add_timer(struct timer_list *timer);

    int del_timer(struct timer_list *timer);

    The timer is a one-shot execution and is taken off the list before it is run.
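    The one-shot behaviour can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel implementation: struct model_timer, the model_* functions and the record_fire() callback are hypothetical, and a plain singly linked list stands in for the kernel's internal bookkeeping.

```c
#include <stddef.h>

/* Simplified stand-in for the timer structure described above. */
struct model_timer {
    unsigned long expires;              /* absolute tick at which to fire */
    void (*function)(unsigned long);    /* callback to run */
    unsigned long data;                 /* argument handed to the callback */
    struct model_timer *next;           /* list linkage (opaque to users) */
};

static struct model_timer *pending_list;    /* timers waiting to fire */

static void model_add_timer(struct model_timer *t)
{
    t->next = pending_list;
    pending_list = t;
}

/* Returns 1 if the timer was pending and has been removed, else 0. */
static int model_del_timer(struct model_timer *t)
{
    struct model_timer **pp;

    for (pp = &pending_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == t) {
            *pp = t->next;
            t->next = NULL;
            return 1;
        }
    }
    return 0;
}

/* Called once per simulated tick: runs every timer that is due. */
static void model_run_timers(unsigned long now)
{
    struct model_timer **pp = &pending_list;

    while (*pp != NULL) {
        struct model_timer *t = *pp;

        if ((long)(now - t->expires) >= 0) {
            *pp = t->next;          /* unlink first: one-shot behaviour */
            t->next = NULL;
            t->function(t->data);
        } else {
            pp = &t->next;
        }
    }
}

/* Example callback: accumulates its argument so firing is observable. */
static unsigned long fired_sum;

static void record_fire(unsigned long data)
{
    fired_sum += data;
}
```

    Because the timer is unlinked before its function runs, a later tick does not fire it again, and deleting it afterwards reports that it was no longer pending.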

    Other Timer APIs


    int mod_timer(struct timer_list *timer, unsigned long expires);

    Changes the expiration time of a timer; it can also be called on an inactive timer.

    int timer_pending(const struct timer_list *timer);

    Returns true or false to indicate whether the timer is currently scheduled to run, by reading one of the opaque fields of the structure.

    Applications of Kernel Timers


    Kernel timers find various applications, such as polling a device by checking its status registers at regular intervals when the hardware cannot fire interrupts.

    Other applications include turning off the floppy motor or shutting down the processor fan on system shutdown.
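    Since a timer fires only once, periodic polling is typically obtained by having the timer function re-arm its own timer. The user-space sketch below models that pattern; the device structure, the STATUS_READY bit, POLL_INTERVAL and all function names are hypothetical, and in a real driver the re-arm step would be a call such as mod_timer().

```c
#define POLL_INTERVAL 10UL      /* ticks between polls (assumed value) */
#define STATUS_READY  0x1UL     /* hypothetical "data ready" status bit */

/* Stand-in for a device that cannot raise interrupts. */
struct model_device {
    unsigned long status;       /* models a hardware status register */
    unsigned long polls;        /* how often the register was sampled */
    unsigned long ready_seen;   /* samples that found STATUS_READY set */
};

/* Minimal one-shot timer bound to one device. */
struct poll_timer {
    unsigned long expires;      /* absolute tick of the next poll */
    int pending;                /* 1 while the timer is armed */
    struct model_device *dev;
};

/* Timer function: sample the status register, then re-arm. */
static void poll_device(struct poll_timer *t, unsigned long now)
{
    t->dev->polls++;
    if (t->dev->status & STATUS_READY)
        t->dev->ready_seen++;

    t->expires = now + POLL_INTERVAL;   /* schedule the next poll */
    t->pending = 1;
}

/* One simulated clock tick: fire the timer if it is due. */
static void model_tick(struct poll_timer *t, unsigned long now)
{
    if (t->pending && (long)(now - t->expires) >= 0) {
        t->pending = 0;         /* one-shot: disarmed before it runs */
        poll_device(t, now);
    }
}

/* Runs `count` ticks starting at `from`; returns total polls so far. */
static unsigned long run_ticks(struct poll_timer *t, unsigned long from,
                               unsigned long count)
{
    unsigned long i;

    for (i = 0; i < count; i++)
        model_tick(t, from + i);
    return t->dev->polls;
}
```

    The polling rate is set entirely by the interval chosen at re-arm time, so the function can lengthen or shorten it depending on what it finds in the status register.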

    Tasklets


    Tasklets are another kernel facility for deferring the execution of a function to a later time.

    They are similar to kernel timers in that they run at interrupt time, they always run on the same CPU that schedules them, and they receive an unsigned long argument.

    They differ from kernel timers in that they cannot be asked to run at a particular time. Scheduling a tasklet only asks the kernel to run it at a later time of the kernel's choosing.

    The Tasklet Data Structure


    A tasklet exists as a data structure that must be initialized before use.

    struct tasklet_struct {
        void (*func)(unsigned long);
        unsigned long data;
    };

    Initialization is done by the function

    void tasklet_init(struct tasklet_struct *t, void (*func)(unsigned long), unsigned long data);

    Tasklets


    A tasklet can be disabled and re-enabled later; it won't be executed until it has been enabled as many times as it has been disabled.

    A tasklet can re-register itself.

    A tasklet can be scheduled to execute at normal priority or high priority.

    Tasklets may be run immediately if the system is not under heavy load, but never later than the next timer tick.
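    The rule that a tasklet runs only after as many enables as disables amounts to a nesting counter. The following user-space sketch models just that rule; struct model_tasklet, the model_* functions and count_run() are hypothetical names, and priorities and per-CPU queues are deliberately left out.

```c
/* Simplified stand-in for a tasklet. */
struct model_tasklet {
    void (*func)(unsigned long);    /* deferred function */
    unsigned long data;             /* its argument */
    int disable_count;              /* greater than 0 means disabled */
    int scheduled;                  /* 1 while waiting to run */
};

static void model_tasklet_disable(struct model_tasklet *t)
{
    t->disable_count++;
}

static void model_tasklet_enable(struct model_tasklet *t)
{
    if (t->disable_count > 0)
        t->disable_count--;
}

static void model_tasklet_schedule(struct model_tasklet *t)
{
    t->scheduled = 1;
}

/* One pass of the (modelled) softirq: returns 1 if the tasklet ran.
 * A disabled tasklet stays scheduled and is retried on a later pass. */
static int model_tasklet_run(struct model_tasklet *t)
{
    if (!t->scheduled || t->disable_count > 0)
        return 0;

    t->scheduled = 0;       /* cleared first so func may reschedule itself */
    t->func(t->data);
    return 1;
}

/* Example deferred function: adds its argument to a running total. */
static unsigned long runs;

static void count_run(unsigned long data)
{
    runs += data;
}
```

    A tasklet that was disabled twice and then scheduled stays queued through the first enable and runs only after the second.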

    Tasklet APIs


    void tasklet_disable(struct tasklet_struct *t);

    void tasklet_enable(struct tasklet_struct *t);

    void tasklet_schedule(struct tasklet_struct *t);

    void tasklet_hi_schedule(struct tasklet_struct *t);

    void tasklet_kill(struct tasklet_struct *t);

    Work Queues


    Work queues allow kernel code to request that a function be called at some future time. They differ from tasklets in the following ways:

    Work queue functions run in the context of a special kernel process.

    These functions can sleep.

    Kernel code can request that the execution of work queue functions be delayed for an explicit interval.

    The key difference between tasklets and work queues is that tasklets execute quickly, for a short period of time, and atomically. The same does not hold for work queues.
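    The explicit-delay feature can be pictured as a queue whose worker skips items that are not yet due. The sketch below is a single-threaded user-space model under stated assumptions: the structures, MAX_WORK, the model_* functions and add_one() are hypothetical, and it does not model the worker process or the ability to sleep.

```c
#define MAX_WORK 8      /* capacity of the modelled queue (assumed) */

/* One deferred function call. */
struct model_work {
    void (*func)(void *);
    void *arg;
    unsigned long run_at;   /* earliest tick at which the item may run */
};

struct model_workqueue {
    struct model_work items[MAX_WORK];
    int count;
};

/* Queues func(arg) to run no earlier than now + delay ticks.
 * Returns 0 on success, -1 if the queue is full. */
static int model_queue_delayed_work(struct model_workqueue *wq,
                                    void (*func)(void *), void *arg,
                                    unsigned long now, unsigned long delay)
{
    if (wq->count == MAX_WORK)
        return -1;

    wq->items[wq->count].func = func;
    wq->items[wq->count].arg = arg;
    wq->items[wq->count].run_at = now + delay;
    wq->count++;
    return 0;
}

/* One pass of the worker: runs every due item in queue order and
 * keeps the rest. Work functions in this sketch must not queue new
 * work. Returns the number of items that ran. */
static int model_worker_pass(struct model_workqueue *wq, unsigned long now)
{
    int i, kept = 0, ran = 0;

    for (i = 0; i < wq->count; i++) {
        struct model_work w = wq->items[i];

        if ((long)(now - w.run_at) >= 0) {
            w.func(w.arg);
            ran++;
        } else {
            wq->items[kept++] = w;
        }
    }
    wq->count = kept;
    return ran;
}

/* Example work function: increments the int it is given. */
static void add_one(void *arg)
{
    (*(int *)arg)++;
}
```

    Work queued with a delay of zero runs on the next worker pass, while delayed work stays in the queue until its interval has elapsed.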
