
OS09 K. Gopinath, IIScgopi/os09/extra-sep-os09.pdfIndian Inst of Science 1 OS09 K. Gopinath, IISc...

Date post: 28-Jan-2021
Indian Inst of Science 1 OS09 K. Gopinath, IISc Slides are from Tanenbaum, provided as part of his book Updated Tanenbaum slides from by Darrell Long/Ethan Miller at UCSC Mine own (K. Gopinath) Some papers/books too numerous to list Hence PL. DO NOT CIRCULATE
Transcript
  • Indian Inst of Science

    OS09
    K. Gopinath, IISc

    Slides are from Tanenbaum, provided as part of his book
    Updated Tanenbaum slides by Darrell Long/Ethan Miller at UCSC
    Mine own (K. Gopinath)
    Some papers/books too numerous to list
    Hence PL. DO NOT CIRCULATE

  • Sleep and wakeup: Producer-Consumer problem

    #include "prototypes.h"

    #define N 100

    int count = 0;

    void producer(void) {

    int item;

    while (TRUE) {

    produce_item(&item);

    if (count == N) sleep();

    enter_item(item);

    count = count + 1;

    if (count == 1) wakeup(consumer);

    }

    }

    void consumer(void) {

    int item;

    while (TRUE) {

    if (count == 0) sleep();

    remove_item(&item);

    count = count - 1;

    if (count == N - 1) wakeup(producer);

    consume_item(item);

    }

    }

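    Race: if the consumer is switched out after testing count == 0 but just before calling sleep(), the producer's wakeup() is lost; the consumer then sleeps, the producer eventually fills the buffer and sleeps too, and both sleep forever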
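One way to close this race is to make "check and sleep" a single atomic step. The sketch below is not from the slides: it assumes POSIX threads, and the names buffer_put and buffer_get are hypothetical. The mutex protects count, and pthread_cond_wait releases the mutex and sleeps atomically, so a wakeup cannot fall between the test and the sleep.

```c
#include <assert.h>
#include <pthread.h>

#define N 100                        /* buffer capacity, as on the slide */

static int buffer[N];
static int head, tail, count;        /* all three are protected by lock */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

/* Producer side (hypothetical name): block while full, then append. */
void buffer_put(int item) {
    pthread_mutex_lock(&lock);
    while (count == N)               /* re-check the condition after every wakeup */
        pthread_cond_wait(&not_full, &lock);   /* unlock + sleep, atomically */
    buffer[tail] = item;
    tail = (tail + 1) % N;
    count++;
    assert(count <= N);
    pthread_cond_signal(&not_empty); /* cannot be lost: waiters hold lock when testing */
    pthread_mutex_unlock(&lock);
}

/* Consumer side (hypothetical name): block while empty, then remove. */
int buffer_get(void) {
    pthread_mutex_lock(&lock);
    while (count == 0)
        pthread_cond_wait(&not_empty, &lock);
    int item = buffer[head];
    head = (head + 1) % N;
    count--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&lock);
    return item;
}
```

Unlike the slide's version, the condition is tested in a while loop, so a thread that wakes up and finds the condition false again simply goes back to sleep.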

  • Interrupts & Kernel code

    ● Interrupts (or process scheduling) can occur anytime
    ● Interrupt handler can also call brelse just like kernel code
        ● However, interrupt handler should not block
        ● Otherwise, the process on whose (kernel) stack the interrupt handler runs blocks
    ● Can expose data structures in an inconsistent state
        ● List manipulation requires multiple steps
        ● Interrupt can expose intermediate state
    ● Interrupt handler can manipulate linked lists that kernel code could also be manipulating
        ● Need to raise “processor execution level” to mask interrupts (or scheduling)
        ● Check & sleep (or test & set) should be atomic

  • Buffer Allocation Algs

    ● getblk: given a filesystem number and disk block number, get the buffer for it, locked
    ● brelse: given a locked buffer, wake up waiting procs and unlock it
    ● bread: read a given disk block into a buffer
    ● breada: bread + asynch. read ahead
    ● bwrite: write a given buffer to a disk block
    ● Buffer properties
        ● No file block in 2 different buffers
        ● Can be in free list or hash list: search free list if any buffer needed; hash list if a particular buffer needed
        ● Buf alloc safe: allocs during syscall & frees at end
            – Disk drive hw problem: cannot interrupt CPU: buf lost!
            – But no starvation guarantees

  • Data Structures

    ● inode: owner, file type (REG, DIR, FIFO, CHR, BLK, ...), access perms/times, #links, disk addrs for blocks in file, file size

    ● incore inode: addl fields: locked?, process waiting?, dirty?, mount point?; reference count (# of opens), ptrs to other incore inodes (free and hash q)

    ● superblock: size of FS/inode list, #free blocks/inodes, dirty?, list/bitmap of free blocks/inodes, index of next free block/inode, locks for lists/bitmap

  • algorithm getblk

    input: file sys #, block #; output: locked buffer that can now be used for block

    while (buffer not found) { 1

    if (block in hash queue) { 2

    if (buffer busy) { 3

    sleep (event buffer becomes free); 4

    continue; 5

    } 6

    mark buffer busy; 7

    remove buffer from free list; 8

    return buffer; 9

    } else { 10

    if (there are no buffers on free list){11

    sleep (event any buffer becomes free);

    continue;

    }

    remove buffer from free list;

    if (buffer marked for delayed write) {

    asynchronous write buffer to disk;

    continue;

    }

    mark buffer busy;

    remove buffer from old hash queue;

    put buffer onto new hash queue;

    return buffer;

    }

    }

    bread(filesystem f, block n) {

    getblk(f,n);

    if (buffer data valid) return buffer;

    initiate disk read;

    sleep(event disk read complete);

    return (buffer); }
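The pseudocode above can be sketched as a small single-threaded user-space simulation. This is an illustration under stated assumptions, not the kernel's code: the sizes NBUF and NHASH, the struct layout and the helper names are hypothetical, delayed writes are left out, and getblk returns NULL at the two points where the kernel would sleep.

```c
#include <assert.h>
#include <stddef.h>

#define NBUF  4                      /* hypothetical cache size */
#define NHASH 3                      /* hypothetical number of hash queues */

struct buf {
    int dev, blkno;                  /* cached block; dev == -1 means unassigned */
    int busy;                        /* locked by some process */
    struct buf *hash_next;           /* hash queue chain */
    struct buf *free_prev, *free_next;   /* free list links */
};

static struct buf bufs[NBUF];
static struct buf *hashq[NHASH];
static struct buf freelist;          /* sentinel of a circular free list */

static struct buf **bucket(int dev, int blkno) {
    return &hashq[(dev + blkno) % NHASH];
}
static void free_remove(struct buf *b) {
    b->free_prev->free_next = b->free_next;
    b->free_next->free_prev = b->free_prev;
    b->free_prev = b->free_next = NULL;
}
static void free_append(struct buf *b) {     /* tail = most recently used */
    b->free_prev = freelist.free_prev;
    b->free_next = &freelist;
    freelist.free_prev->free_next = b;
    freelist.free_prev = b;
}
static void hash_remove(struct buf *b) {
    if (b->dev < 0) return;                  /* never assigned to a block */
    struct buf **p = bucket(b->dev, b->blkno);
    while (*p != b) p = &(*p)->hash_next;
    *p = b->hash_next;
}
void cache_init(void) {
    freelist.free_prev = freelist.free_next = &freelist;
    for (int i = 0; i < NHASH; i++) hashq[i] = NULL;
    for (int i = 0; i < NBUF; i++) {
        bufs[i].dev = -1;
        bufs[i].blkno = 0;
        bufs[i].busy = 0;
        bufs[i].hash_next = NULL;
        free_append(&bufs[i]);
    }
}
/* Returns a busy buffer for (dev, blkno), or NULL where the kernel would sleep. */
struct buf *getblk(int dev, int blkno) {
    struct buf **q = bucket(dev, blkno);
    for (struct buf *b = *q; b != NULL; b = b->hash_next) {
        if (b->dev == dev && b->blkno == blkno) {
            if (b->busy) return NULL;        /* sleep: this buffer becomes free */
            b->busy = 1;
            free_remove(b);
            return b;
        }
    }
    if (freelist.free_next == &freelist)
        return NULL;                         /* sleep: any buffer becomes free */
    struct buf *b = freelist.free_next;      /* head = least recently used */
    free_remove(b);
    hash_remove(b);                          /* leave the old hash queue */
    b->dev = dev;
    b->blkno = blkno;
    b->busy = 1;
    b->hash_next = *q;                       /* join the new hash queue */
    *q = b;
    return b;
}
void brelse(struct buf *b) {
    assert(b->busy);
    b->busy = 0;
    free_append(b);                          /* contents kept for later cache hits */
}
```

A released buffer stays on its hash queue, so a later getblk for the same block is a cache hit until the buffer is recycled from the head of the free list.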

  • Race Conditions

    Scenario 1 (time runs downward):
    ● P1: block b not on hash Q; no free bufs; sleep
    ● P2: block b not on hash Q; no free bufs; sleep
    ● P3: free a buf; wakeup
    ● P1: use freed buf
    ● P2: wakeup; try from beg!

    Scenario 2 (time runs downward):
    ● P1: alloc buf to block b; lock buf; init I/O; sleep until I/O done
    ● P2: buf locked; sleep
    ● P3: wait for any buf
    ● P3: get buf b & reassign to b'
    ● P2: wakeup; try again!

  • algorithm brelse

    input : locked buffer

    output : none

    {wake up all procs : event, waiting for any buffer to become free

    wake up all procs : event, waiting for this buffer to become free

    raise processor execution level to block interrupts;

    if (buffer contents are valid and buffer not old)

    enqueue buffer at end of free list

    else enqueue buffer at the beginning of the free list

    lower processor execution level to allow interrupts;

    unlock(buffer);}

    bwrite(buf b) {

    initiate disk write;

    if (I/O synchronous) {sleep(event I/O complete); brelse(b);}

    else if (b marked for delayed write) mark buffer to put at head of list

    }

  • Problems

    ● 1st prob
        ● P1 finds buffer is busy (line 3) & “starts” to sleep
        ● P1 gets preempted as P2 finishes use of buffer (thru interrupt) and releases it
        ● P1 now begins sleep even though buffer free
    ● 2nd prob
        ● P1 finds no free buffers (line 11)
        ● P1 gets preempted as P2 finishes use of buffer (thru interrupt) and releases it
        ● P1 now begins sleep even though buffer free
    ● 3rd prob
        ● Both getblk (line 8) and brelse manipulate free list

    Atomicity violated in each case: interleaved execution of getblk with interrupt handler or with brelse

    Need to block interrupts; Also: have to watch out for hardware error!

  • Other file ops

    ● iget: get a locked inode (doing bread if necessary) given inode number
    ● iput: release an inode; if ref count 0, writes dirty inode
    ● bmap: given inode and byte offset, returns disk block num and offset
    ● namei: given a path, get the locked inode
    ● ialloc: assign a new disk inode for a newly created file
    ● ifree: free an inode (link count 0)
    ● alloc: allocate a free disk block and return buffer using getblk
    ● free: free a disk block

  • initsem(semaphore *sem, int val) { *sem = val; }

    void P(semaphore *sem) {

    *sem -= 1;

    while (*sem

  • Have we solved the problem?

    ● P() and V() must be executed atomically
    ● In uniprocessor system may disable interrupts
    ● In multiprocessor system, use hardware synchronization primitives
        ● TS, FAA, etc…
        ● Involves some limited amount of busy waiting

  • Simulation of a monitor with semaphores

    typedef int semaphore;

    semaphore mutex = 1;

    void enter_monitor(void) {

    down(mutex);

    }

    void leave_normally(void) {

    up(mutex);

    }

    void leave_with_signal(semaphore c) {

    /* signal on c & exit monitor */

    up(c);

    }

    void wait(semaphore c) {

    up(mutex);

    down(c);

    }

  • Java Monitors

    ● void wait(); Enter a monitor's wait set until notified by another thread
    ● void wait(long timeout); Enter a monitor's wait set until notified by another thread or timeout milliseconds elapses
    ● void wait(long timeout, int nanos); Enter a monitor's wait set until notified by another thread or timeout milliseconds plus nanos nanoseconds elapses
    ● void notify(); Wake up one thread waiting in the monitor's wait set. (If no threads are waiting, do nothing.)
    ● void notifyAll(); Wake up all threads waiting in the monitor's wait set. (If no threads are waiting, do nothing.)

  • Java (contd)

    ● Each Java monitor has a single anonymous condition variable on which a thread can wait() or signal one waiting thread with notify() or signal all waiting threads with notifyAll().
    ● This nameless condition variable corresponds to a lock on the object that must be obtained whenever a thread calls a synchronized method in the object.
        ● Only inside a synchronized method may wait(), notify(), and notifyAll() be called.
        ● Methods that are static can also be synchronized. There is a lock associated with the class that must be obtained when a static synchronized method is called.

  • Problems with Semaphores

    ● Too complex?
        ● Needs low-level atomic op to construct; blocking & unblocking involve context switches; manipulates scheduler and sleep Qs
    ● Good for resources held for long times, not for short
    ● Good as V only wakes up if someone can run
    ● But this can result in convoys
        ● Low priority process P1 that has locked an imp lock (L) preempted by P2 which then waits for L
            – Imp lock: often log lock in txnal systems
            – P3 also needs L, P4 also, ... all wait
        ● P1 scheduled again (FIFO) & unlocks L
        ● P2 gets lock (P1 preempted), P2 uses lock, then P3, ...
        ● For next upd, P2 goes back to Q again, then P3, P4, ...
        ● Lock-unlock: 100's of insts; lock-wait-dispatch-unlock: 1000's

  • Semaphore Problem?

    ● 1: T2 (P2) in cs (using a sem): blocks T3; T4 on run Q
    ● 2: T2 exits cs but active; T3 now gets sem but inactive
    ● 3: T1 (P1) now wants to enter cs but blocked by T3
        ● T1 blocked even if no one in cs! FIFO property!
        ● T4 scheduled on P1
            – T1 & T3 cannot run unless T2 or T4 give up
            – Processor 1: T1 > T1 > T4
            – Processor 2: T2 > T2 > T2
    ● Problem in step 2:
        ● Have to make sure that T3 does not get sem but on ready Q. T1 will then get sem & no context sw.
        ● Need different semantics: eg: condition variables

  • Message Passing: Mailboxes, Ports, CSP

    Send/Receive; Blocking/nonblocking

    typedef int message[MSIZE];

    void producer(void){

    int item;

    message m;

    while (TRUE) {

    produce_item(&item);

    receive(consumer, &m);

    build_message(&m, item);

    send(consumer, &m);

    }

    }

    void consumer(void){

    int item, i; message m;

    for (i = 0; i < N; i++) send(producer, &m);

    while (TRUE) {

    receive(producer, &m);

    extract_item(&m, &item);

    send(producer, &m);

    consume_item(item);

    }

    }

  • Fork & fork1 in MT processes

    ● Process with exactly 1 LWP => same semantics as “old Unix” process
    ● copy all LWPs on fork? Solaris9 but not Posix
        ● one LWP blocked in parent: what about in child? Restart? Concurrent syscalls? EINTR or wait(disk)?
        ● one LWP has open netw cnxn: if closed, unexpected user msg to remote node
        ● one LWP changing a shared data structure: corruption thru the new copy of LWP? How to make a “consistent” copy?
    ● copy only calling LWP? Fork1: Solaris10; good for exec'ing
        ● some user thrs not on LWPs that were in parent
        ● child process should not try to acq locks held by LWPs not in child (deadlock!) but user code cannot know! these locks may be held by ulib POSIX

  • fork1

    fork1(): only calling LWP created in child

    registration of fork_handlers (_atfork)

           prepare: prior to fork in the ctxt of calling LWP. LIFO

           parent: after fork. FIFO

           child: after fork in context of 1 thr in child. FIFO

           LIFO/FIFO order to enable preserving of locking order

              int pthread_atfork(void (*prepare)(void), void (*parent)(void), void (*child)(void));

       handles orphaned mutexes

          prepare fork handlers lock all mutexes (by calling thr)

          parent/child fork handlers unlock mutexes

       indep libs & appl progs can protect themselves

          lib provides fork handlers
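A library's use of these handlers might look like the sketch below. It is a minimal illustration assuming POSIX threads on a system where fork() runs the registered handlers; lib_lock, lib_init and fork_and_check_lock are hypothetical names. The prepare handler takes the library lock in the forking thread, so the lock is never copied into the child while some other thread holds it, and both sides release it after the fork.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;  /* hypothetical library lock */

static void prepare(void)   { pthread_mutex_lock(&lib_lock); }    /* before fork */
static void in_parent(void) { pthread_mutex_unlock(&lib_lock); }  /* after fork, parent */
static void in_child(void)  { pthread_mutex_unlock(&lib_lock); }  /* after fork, child */

/* Called once by the library at startup; returns 0 on success. */
int lib_init(void) {
    return pthread_atfork(prepare, in_parent, in_child);
}

/* Forks; the child reports through its exit status whether it could take
 * lib_lock. Returns that status (0 = lock was free), or -1 on error. */
int fork_and_check_lock(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int rc = pthread_mutex_trylock(&lib_lock);
        _exit(rc == 0 ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    assert(WIFEXITED(status));
    return WEXITSTATUS(status);
}
```

With several locks, the prepare handler would acquire them in the library's locking order, which is the reason prepare handlers run LIFO and parent/child handlers run FIFO.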

  • Fork and threads

    ● Thr A: locks mutex; modifies shared data
    ● Thr B: fork
    ● Thr B_ch: copy of locked mutex and inconsistent data struct.
      /* cannot drop mutex as data inconsistent nor can it take mutex: deadlock */
      /* memory leaks also! */

  • Solutions?

    ● programs that use fork() call an exec function soon afterwards in child process, thus resetting all states
    ● In the meantime, only a short list of async-signal-safe library routines are promised to be available
    ● But not good wrt multi-threaded (MT) libraries.
        ● Applications may not be aware that a MT library is in use, and feel free to call any number of library routines between the fork() and exec calls. They may be extant 1-threaded programs that cannot be expected to obey new restrictions imposed by the threads library.
        ● A MT library needs a way to protect its internal state during fork() in case it is re-entered later in the child process. eg. MT I/O libraries, which are invoked between the fork() and exec calls to effect I/O redirection.

  • Fork handling

    ● Lock global mutexes
        ● Other threads locked out of the critical regions of code protected by these mutexes
        ● Can take snapshot: copy of valid, stable data
    ● Reset synchr objects in the child process
        ● ensures they are properly cleansed of any artifacts from the threading subsystem of parent process
        ● eg. a mutex may inherit a wait queue of threads waiting for the lock; this wait queue makes no sense in child. Initialize mutex to remedy (deletes unnecessary data structures in child). Otherwise memory leaks!
    ● But how to correct or otherwise deal with the inconsistent state in the child?

  • With pthread_atfork: no orphaned locks!

    prepare: lock(mutex)    parent: unlock(mutex)    child: unlock(mutex)

    ● Thr A: locks mutex; modifies shared data
    ● Thr B: attempts fork but blocked as prepare (lock mutex) blocked
    ● Thr A: drops mutex (shared data now consistent!)
    ● Thr B: prepare succeeds (locks mutex); fork completes; unlocks mutex (parent)
    ● Thr B_ch: unlocks mutex (child)

  • Solutions

    0: pthread_atfork: provides MT libraries with a means to protect themselves from innocent appls that call fork(), and provides MT appls with a std mech for protecting themselves from fork() calls in a lib routine or the appl itself. But COMPLEX!!! Avoid problems by

    1: If possible, fork before creating any threads

    2: Instead of fork, create a new thread. If forking to exec a binary, can attempt to convert binary to a shared lib that can be linked to.

    3: Try a surrogate parent method. Fork at init time; the child will be a "surrogate" parent that will remain 1-threaded. When exec is needed, child is informed and it does a fork/exec

  • Posix Model of Concurrency

    ● Creation
        ● pthread_create(tp, attrp, fptr, argp)
        ● pthread_attr_xxx(): manipulate attr of a thread
            – Init/destroy; set/get detachstate, inheritsched, schedparam, schedpolicy, scope, stackaddr, stacksize
    ● Exit
        ● pthread_exit(retvalp)
        ● pthread_join(t, **v): wait for another thread's termination
        ● pthread_detach(t): storage for thread can be reclaimed when thread terminates (no zombie)
    ● Thread Specific Data (indexed by key)
        ● pthread_key_create(keyp, fpdestructor)/_delete()
        ● pthread_setspecific()/_getspecific(): mapping betw key and thread

  • ● Signal
        ● pthread_sigmask(how, newmask, saveprev): change signal mask for calling thread
        ● pthread_kill(t, sig)
        ● sigwait: suspend thr till sig
    ● ID
        ● pthread_self()
        ● pthread_equal(t1, t2)
    ● pthread_once(once?, fptr): ensure some init at most once
    ● Scheduling
        ● pthread_setschedparam()/_getschedparam()
    ● Cancellation (cancellation pts: _join, _cond_wait, _cond_timedwait, sem_wait, sigwait, _testcancel)
        ● pthread_cancel(t) by others / pthread_testcancel(void) by self
        ● pthread_setcancelstate()/type()
        ● pthread_cleanup_pop()/_push(): if a thread exits or is cancelled (with locked mutexes?), cleanup handlers executed; LIFO order

  • ● Mutex
        ● pthread_mutex_init()/_destroy()
        ● pthread_mutexattr_xxx()
            – Init/destroy; set/get pshared, protocol, prioceiling
        ● pthread_mutex_setprioceiling()/_getprioceiling()
        ● pthread_mutex_lock()/_trylock()/_unlock()
    ● Condition Variable
        ● pthread_cond_init()/_destroy()
        ● pthread_condattr_xxx()
            – Init/destroy; set/get pshared
        ● pthread_cond_wait()/_timedwait()
        ● pthread_cond_signal()
        ● pthread_cond_broadcast()

  • Condition variables

    int x,y;
    pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

    // (waiter) Wait until x is greater than y
    pthread_mutex_lock(&mut);
    while (x <= y) {
        pthread_cond_wait(&cond, &mut);
    }
    /* operate on x and y */
    pthread_mutex_unlock(&mut);

    // (signaller) Modify x and y; wake waiters if x > y
    pthread_mutex_lock(&mut);
    /* modify x and y */
    if (x > y) pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&mut);

  • // (waiter) if timeout also

    struct timeval now;

    struct timespec timeout;

    int retcode;

    pthread_mutex_lock(&mut);

    gettimeofday(&now, NULL);

    timeout.tv_sec = now.tv_sec + 5;

    timeout.tv_nsec = now.tv_usec * 1000;

    retcode = 0;

    while (x

  • ● Semaphore
        ● sem_init()/_destroy()
        ● sem_open()/_close()
        ● sem_wait()/_trywait()
        ● sem_post()
        ● sem_getvalue()
        ● sem_unlink()
    ● fork() Clean Up Handling
        ● pthread_atfork()
    ● Async safe? Some pthread calls not safe to call from sig handlers
        ● A user thr lib may have taken a lock to ensure, say, that only one user is changing Qs. If a sig handler then calls pthread_mutex_lock, etc, it may deadlock

  • Signals:

    ● oldest ipc method used by UNIX systems to signal asynchronous events. ONLY 1 BIT INFO!
    ● can be generated by a keyboard interrupt or an error condition or by other processes in the system (if they have the correct privileges)
        ● kernel & superuser can send a signal to any process
        ● a process can also send a signal to other processes with same uid/gid
    ● Processes can handle signals themselves or allow kernel to handle
        ● If kernel handles the signal, default action for the signal: eg, SIGFPE causes core dump and causes the process to exit
        ● SIGSTOP (causes a process to halt its execution) and SIGKILL handled only by kernel
    ● List of signals on a Linux/Intel machine: SIGHUP SIGINT SIGQUIT SIGILL SIGTRAP SIGIOT SIGBUS SIGFPE SIGKILL SIGUSR1 SIGSEGV SIGUSR2 SIGPIPE SIGALRM SIGTERM SIGCHLD SIGCONT SIGSTOP SIGTSTP SIGTTIN SIGTTOU SIGURG SIGXCPU SIGXFSZ SIGVTALRM SIGPROF SIGWINCH SIGIO SIGPWR

  • Signals (cont’d)

    ● void (*signal(int signo, void (*func)(int)))(int); equivalently:
        ● typedef void Sigfunc(int); Sigfunc *signal(int, Sigfunc *);
        ● signal is a func that returns a ptr to a func that returns void (the previous signal handler)
        ● Or, sighandler_t signal(int signum, sighandler_t handler);
    ● Linux implements signals using information stored in task_struct of process:
        ● struct sigpending pending: currently pending signals
        ● blocked: mask of blocked signals
        ● struct signal_struct *sig has array of sigactions that holds info about how the process handles each signal
    ● Signals generated by setting appropriate bit in signal field of pending. If not blocked, scheduler will run handler in the next system scheduling.
    ● Every time a process exits from a system call, the signal and blocked fields are checked, and if there is any unblocked signal, the handler is called.

  • #include <signal.h>

    static void sig_usr(int); /* one handler for both signals */

    int main(void) {

    if (signal(SIGUSR1, sig_usr) == SIG_ERR)

    err_sys("can't catch SIGUSR1");

    if (signal(SIGUSR2, sig_usr) == SIG_ERR)

    err_sys("can't catch SIGUSR2");

    for ( ; ; ) pause(); }

    static void sig_usr(int signo) { /* argument is signal number */

    if (signo == SIGUSR1) printf("received SIGUSR1\n");

    else if (signo == SIGUSR2)

    printf("received SIGUSR2\n");

    else err_dump("received signal %d\n", signo);

    return;}

  • No Qing for non-real-time signals!

    #include <signal.h>

    main() {
        int childPid, i;
        void SigIntHandler();

        sigblock(sigmask(SIGINT));
        signal(SIGINT, SigIntHandler);

        childPid = fork();
        if (childPid > 0) { /* parent */
            for (i=0; i < 10 ; i++) kill(childPid, SIGINT);
            printf("Parent has issued %d signals to the child\n", i);
        } else { /* child */
            sleep(2); /* sleep for 2 secs so that signals overwritten */
            while (1) sigpause(0);
        }
    }

    void SigIntHandler(int signo) { printf("Child : received a signal\n"); }

  • signal: V7, SVR2/3/4 (handler uninstalled, no blocking of signals, no autostart of interrupted system calls)

    sigset, sighold, sigrelse, sigignore, sigpause: SVR3/4 (no autostart)

    signal, sigvec, sigblock, sigsetmask (unblock a signal), sigpause: 4.x BSD (autostart 4.2; default 4.3/4.4)

    sigaction, sigprocmask, sigpending, sigsuspend: autostart unspecified (POSIX.1), optional (SVR4, 4.3/4.4BSD, Linux)

    sigprocmask: change the list of currently blocked signals

    sigpending: allows examination of pending signals (ones which have been raised while blocked)

    sigsuspend: replaces the signal mask with the given mask & suspends the process until a signal arrives

    int sigaction(int signo, const struct sigaction *act, struct sigaction *oact);

    struct sigaction {

    void (*sa_handler)();

    sigset_t sa_mask; /* addl signals to block */

    int sa_flags; /* restart?, alt stack?, waitchild?, uninstall handler? ...*/ };

  • Unreliable signals

    old V7 code: race with a new signal for process before signal reinstalled

    int sig_int();

    ...

    signal(SIGINT, sig_int);

    ...

    sig_int() {

    /* another signal can come here! can cause default action */

    signal(SIGINT, sig_int);

    ...

    }

  • Another race

    int sig_int_flag;

    main() {
        int sig_int();
        ...
        signal(SIGINT, sig_int);
        ...
        while (sig_int_flag==0)
            /* signal can come here! */
            pause();
        ...
    }

    sig_int() { signal(SIGINT, sig_int); sig_int_flag=1; }

  • int sighold(int sig); int sigrelse(int sig)    (SysV)

    sighold(SIGQUIT); sighold(SIGINT)

    c.s.

    sigrelse(SIGINT); sigrelse(SIGQUIT)

    int sig_int_flag;

    main() {

    int sig_int();

    ...

    signal(SIGINT, sig_int);

    ...

    sighold(SIGINT);

    while (sig_int_flag==0) sigpause(SIGINT); /* atomically release signal and pause: wait for a signal to occur */

    ...

  • Restarting of interrupted system calls by signals (4.3BSD)

    Can only call reentrant functions within signal handlers

    int oldmask; /* SIGQUIT: quit key + core image; SIGINT: interrupt key ^C */
    oldmask = sigblock(sigmask(SIGQUIT)|sigmask(SIGINT)); /* block SIGQUIT/INT */
    c.s.
    sigsetmask(oldmask); /* reset to old mask */

    int sig_int_flag;

    main() {
        int sig_int();
        ...
        signal(SIGINT, sig_int);
        ...
        sigblock(sigmask(SIGINT)); /* sigblock returns mask before */
        while (sig_int_flag==0)
            sigpause(0); /* wait for signal to occur */
        /* sigpause(0) is not the same as sigsetmask(0) + pause(), as a signal can arrive in betw */
        /* process signal... */
        ...
    }
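The same "block, test the flag, atomically unblock and pause" idiom can be written with the POSIX calls listed earlier (sigaction, sigprocmask, sigsuspend). The sketch below is an illustration rather than code from the slides; install_sig_int and wait_for_sig_int are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t sig_int_flag = 0;

static void sig_int(int signo) {
    (void)signo;
    sig_int_flag = 1;      /* the handler only raises a flag */
}

/* Install the handler with sigaction: it stays installed after delivery,
 * so the V7 reinstall race does not arise. Returns 0 on success. */
int install_sig_int(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sig_int;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;          /* restart interrupted system calls */
    return sigaction(SIGINT, &sa, NULL);
}

/* Wait until sig_int_flag is set, with no window between test and sleep. */
void wait_for_sig_int(void) {
    sigset_t block, old, wait_mask;

    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    sigprocmask(SIG_BLOCK, &block, &old);   /* SIGINT cannot arrive after this */

    wait_mask = old;
    sigdelset(&wait_mask, SIGINT);          /* mask used only while suspended */
    while (sig_int_flag == 0)
        sigsuspend(&wait_mask);             /* atomically unblock + pause */

    sigprocmask(SIG_SETMASK, &old, NULL);   /* restore the caller's mask */
    assert(sig_int_flag == 1);
}
```

sigsuspend restores the blocking mask before it returns, so the flag test in the loop always runs with SIGINT blocked.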

  • Executing Signal Handlers in Linux

    ● On signal (either from kernel or another process), ker checks some conditions (disp, etc) before calling do_signal
    ● do_signal runs in kernel while (user) signal handler runs in user mode
    ● After signal handler run, kernel code executed further
    ● However, ker stack no longer contains hw context of interrupted program as ker stack is emptied on every user-to-kernel mode transition
    ● Also, sig handlers can reenter kernel (syscalls, etc.)
    ● Solution: copy hw context saved in ker stack to user stack of curr process
    ● When sig handler terminates, sigreturn syscall automatically invoked to copy hw context back to kernel stack & restore the user stack
    ● Sigframe struct pushed on stack has some code for calling sigreturn: stack has to be executable!!!

  • pselect

    ● int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *utimeout);
    ● int pselect(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, const struct timespec *ntimeout, const sigset_t *sigmask);
    ● pselect() used to wait for a signal as well as data from a fd
    ● Programs that receive signals as events normally use the signal handler only to raise a global flag.
    ● The global flag indicates that the event must be processed in the main loop of the program.
    ● A signal will cause the select()/pselect() to return with errno set to EINTR. This behavior is essential so that signals can be processed in the main loop of the program, otherwise select() would block indefinitely.

  • Race condition

    ● Somewhere in the main loop, a conditional checks the global flag.
    ● What if a signal arrives after the conditional, but before the select() call?
    ● select() would block indefinitely, even though an event is actually pending.
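pselect() closes this window by keeping the signal blocked everywhere except inside the call itself. The sketch below is a minimal illustration, not the lecture's udp example: it uses SIGUSR1 as the event signal, and setup_signal and wait_fd_or_signal are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t got_event = 0;

static void on_signal(int signo) {
    (void)signo;
    got_event = 1;                 /* the handler only raises the global flag */
}

/* Install the handler and block SIGUSR1 for normal execution. 0 on success. */
int setup_signal(void) {
    struct sigaction sa;
    sigset_t block;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) < 0)
        return -1;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    return sigprocmask(SIG_BLOCK, &block, NULL);
}

/* Returns 1 if fd is readable, 2 on a signal event, 0 on timeout, -1 on error. */
int wait_fd_or_signal(int fd, long timeout_sec) {
    sigset_t during;               /* mask in effect only inside pselect() */
    fd_set rfds;
    struct timespec ts = { timeout_sec, 0 };

    assert(fd >= 0);
    sigprocmask(SIG_SETMASK, NULL, &during);   /* read the current mask */
    sigdelset(&during, SIGUSR1);

    if (got_event) {               /* safe test: SIGUSR1 is blocked here */
        got_event = 0;
        return 2;
    }
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    int n = pselect(fd + 1, &rfds, NULL, NULL, &ts, &during);
    if (n < 0 && errno == EINTR && got_event) {
        got_event = 0;
        return 2;
    }
    if (n < 0)
        return -1;
    return n > 0 ? 1 : 0;
}
```

A signal sent between the flag test and the pselect() call stays pending and is delivered as soon as pselect() installs the temporary mask, so the call returns with EINTR instead of blocking.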

  • pselect example

    ● udp broadcast

  • Linux Concurrency Model

    ● Within appl: clones (incl threads & processes of other systems)
    ● Inside kernel:
        ● Kernel threads: do not have USER context
        ● deferrable and interruptible ker funcs:
            – Softirq: reentrant: multiple softirqs of the same type can be run concurrently on several CPUs.
                ● No dyn alloc! Have to be statically defined at compile time.
            – Tasklet: multiple tasklets of the same type cannot run concurrently on several CPUs.
                ● Dyn alloc OK! Can be allocated and initialized at run time (loadable modules). Impl thru softirqs
            – Bottom Half: multiple bottom halves cannot be run concurrently on several CPUs. No dyn alloc!
                ● Impl thru tasklets
    ● Across HW: IPI

  • Spinlocks & Semaphores

    ● Shared data betw different parts of code in kernel
        ● most common: access to data structures shared between user process context and interrupt context
    ● In uniprocessor system: mutual excl by setting and clearing interrupts + flags
    ● SMP: three types of spinlocks: vanilla (basic), read-write, big-reader
        ● Read-write spinlocks when many readers and few writers
            – Eg: access to the list of registered filesystems.
        ● Big-reader spinlocks: a form of read-write spinlocks optimized for very light read access, with penalty for writes
            – limited number of big-reader spinlock users.
            – used in networking part of the kernel.
    ● semaphores: two types: basic and read-write semaphores. Different from IPC's
        ● Mutex or counting; up() & down(); interruptible/non

  • Spinlocks: (cont’d)

    ● A good example of using spinlocks: accessing a data structure shared betw a user context and an interrupt handler

    spinlock_t my_lock = SPIN_LOCK_UNLOCKED;

    my_ioctl() {                     // _ioctl: definitely process context!
        spin_lock_irq(&my_lock);     // and known that interrupts enabled!
        /* critical section */       // hence, _irq to disable interrupts
        spin_unlock_irq(&my_lock);
    }

    my_irq_handler() {               // _irq_handler: definitely system (or intr context)
        spin_lock(&my_lock);         // & hence known that intr disabled!
        /* critical section */       // can use simpler lock
        spin_unlock(&my_lock);
    }

    ● spin_lock: if interrupts disabled or no race with interrupt context
    ● spin_lock_irq: if interrupts enabled and has to be disabled
    ● spin_lock_irqsave: if interrupt state not known
    ● Basic premise of a spin lock: one thread busy-waits on a resource on one processor while another uses it on another (only true for MP). But code has to work for 1 or more processors. If all threads are on 1 processor and a thread tries to take a spin lock that is already held by another thread: deadlock.
    ● Never give up CPU when holding a spinlock!

  • Linux 2.4 buffer cache

    struct buffer_head * getblk(kdev_t dev, int block, int size) {

    for (;;) {

    struct buffer_head * bh;

    bh = get_hash_table(dev, block, size);

    if (bh)

    return bh;

    if (!grow_buffers(dev, block, size))

    free_more_memory();

    }

    }

  • struct buffer_head * get_hash_table(kdev_t dev, int block, int size)
    {
        struct buffer_head *bh, **p = &hash(dev, block);

        read_lock(&hash_table_lock);
        for (;;) {
            bh = *p;
            if (!bh) break;
            p = &bh->b_next;
            if (bh->b_blocknr != block) continue;
            if (bh->b_size != size) continue;
            if (bh->b_dev != dev) continue;
            get_bh(bh);
            break;
        }
        read_unlock(&hash_table_lock);
        return bh;
    }

    static inline void get_bh(struct buffer_head * bh) { atomic_inc(&(bh)->b_count); }

    #define hash(dev,block) hash_table[(_hashfn(HASHDEV(dev),block) & bh_hash_mask)]
    #define HASHDEV(dev) ((unsigned int) (dev))
    bh_hash_mask = (nr_hash - 1)

    Lock hierarchy: lru_list_lock > hash_table_lock > unused_list_lock

  • Linux down

    static inline void down(struct semaphore * sem) {

    __asm__ __volatile__( "# atomic down operation\n\t"

    LOCK "decl %0\n\t" /* --sem->count */

    "js 2f\n"

    "1:\n"

    LOCK_SECTION_START("")

    "2:\tcall __down_failed\n\t"

    "jmp 1b\n"

    LOCK_SECTION_END

    :"=m" (sem->count)

    :"c" (sem)

    :"memory");

    }

  • asm( ".text\n" ".align 4\n"

    ".globl __down_failed\n"

    "__down_failed:\n\t"

    #if defined(CONFIG_FRAME_POINTER)

    "pushl %ebp\n\t" "movl %esp,%ebp\n\t"

    #endif

    "pushl %eax\n\t"

    "pushl %edx\n\t"

    "pushl %ecx\n\t"

    "call __down\n\t"

    "popl %ecx\n\t"

    "popl %edx\n\t"

    "popl %eax\n\t"

    #if defined(CONFIG_FRAME_POINTER)

    "movl %ebp,%esp\n\t" "popl %ebp\n\t"

    #endif

    "ret" );

  • void __down(struct semaphore * sem) {
        struct task_struct *tsk = current;
        DECLARE_WAITQUEUE(wait, tsk);
        tsk->state = TASK_UNINTERRUPTIBLE;
        add_wait_queue_exclusive(&sem->wait, &wait);

        spin_lock_irq(&semaphore_lock);
        sem->sleepers++;
        for (;;) {
            int sleepers = sem->sleepers;

            /* Add "everybody else" into it. They aren't
             * playing, because we own the spinlock. */
            if (!atomic_add_negative(sleepers - 1, &sem->count)) {
                sem->sleepers = 0;
                break;
            }
            sem->sleepers = 1; /* us - see -1 above */
            spin_unlock_irq(&semaphore_lock);
            schedule();
            tsk->state = TASK_UNINTERRUPTIBLE;
            spin_lock_irq(&semaphore_lock);
        }
        spin_unlock_irq(&semaphore_lock);
        remove_wait_queue(&sem->wait, &wait);
        tsk->state = TASK_RUNNING;
        wake_up(&sem->wait);
    }

  • Analysis

    ● Semaphore open: count=1, sleepers=0: down makes count 0; __down not executed
    ● Semaphore closed & no sleeping processes: count=0, sleepers=0 => count -1 & sleepers 1
        ● Each iteration checks if count negative
            – Negative: schedule() & check again
            – Otherwise: sleepers=0; wakeup another (but Q empty)
    ● Semaphore closed & other sleeping processes: count, sleepers (-1,1) => (-2, 1)
        ● Sleepers temporarily 2, count becomes -1 again
            – Checks if count still negative as holding process may V
        ● If negative: schedule
        ● Not negative:

  • #define DECLARE_WAITQUEUE(name, tsk) wait_queue_t name = __WAITQUEUE_INITIALIZER(name, tsk)

    #define __WAITQUEUE_INITIALIZER(name, tsk) { task: tsk, task_list: { NULL, NULL }, __WAITQUEUE_DEBUG_INIT(name)}

    static spinlock_t semaphore_lock = SPIN_LOCK_UNLOCKED;

    struct semaphore {
        atomic_t count;
        int sleepers;
        wait_queue_head_t wait;
    #if WAITQUEUE_DEBUG
        long __magic;
    #endif
    };

    #define spin_lock_irq(lock) do { local_irq_disable(); spin_lock(lock); } while (0)
    #define local_irq_disable() __cli()

    struct __wait_queue {
        unsigned int flags;
    #define WQ_FLAG_EXCLUSIVE 0x01
        struct task_struct * task;
        struct list_head task_list;
    #if WAITQUEUE_DEBUG
        long __magic;
        long __waker;
    #endif
    };
    typedef struct __wait_queue wait_queue_t;

  • static inline void up(struct semaphore * sem) {

    __asm__ __volatile__( "# atomic up operation\n\t"

    LOCK "incl %0\n\t" /* ++sem->count */

    "jle 2f\n"

    "1:\n"

    LOCK_SECTION_START("")

    "2:\tcall __up_wakeup\n\t"

    "jmp 1b\n"

    LOCK_SECTION_END

    ".subsection 0\n"

    :"=m" (sem->count)

    :"c" (sem)

    :"memory");

    }

  • asm(

    ".text\n" ".align 4\n"

    ".globl __up_wakeup\n"

    "__up_wakeup:\n\t"

    "pushl %eax\n\t"

    "pushl %edx\n\t"

    "pushl %ecx\n\t"

    "call __up\n\t"

    "popl %ecx\n\t"

    "popl %edx\n\t"

    "popl %eax\n\t"

    "ret");

    #define wake_up(x) __wake_up((x),TASK_UNINTERRUPTIBLE |TASK_INTERRUPTIBLE, 1)

    void __up(struct semaphore *sem) {

    wake_up(&sem->wait);

    }

  • void __wake_up(wait_queue_head_t *q, unsigned int mode, int nr_exclusive) {

    unsigned long flags;

    if (unlikely(!q)) return;

    spin_lock_irqsave(&q->lock, flags);

    __wake_up_common(q, mode, nr_exclusive, 0);

    spin_unlock_irqrestore(&q->lock, flags);

    }

    static inline void __wake_up_common(wait_queue_head_t *q, unsigned int mode, int nr_exclusive, int sync) {

    struct list_head *tmp; unsigned int state; wait_queue_t *curr; task_t *p;

    list_for_each(tmp, &q->task_list) {

    curr = list_entry(tmp, wait_queue_t, task_list);

    p = curr->task;

    state = p->state;

    if ((state & mode) && try_to_wake_up(p, sync) && ((curr->flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)) break;

    }

    }

  • Process Tree

    ● Init reads /etc/inittab
    ● Opens tty
    ● Fd 0,1,2 set to dev
    ● Login printed
    ● Read user name
    ● Initial env set (-p: add to existing env; envp: TERM, etc.)
    ● uid, gid=0
    ● execle("/bin/login", "login", "-p", username, (char*)0, envp)
    ● getpwnam (get password file entry); getpass (get a password); use crypt/md5 to validate pwd
    ● Fail: login calls exit(1); noticed by init; respawn action
    ● Success: chdir; chown for terminal device; setgid; initgroups; init env (HOME, SHELL, USER, PATH, ...)
    ● Setuid; then execl("/bin/sh", "-sh", 0) (2nd arg: login shell)

    # Run gettys in standard runlevels
    1:2345:respawn:/sbin/mingetty tty1
    2:2345:respawn:/sbin/mingetty tty2
    3:2345:respawn:/sbin/mingetty tty3
    4:2345:respawn:/sbin/mingetty tty4
    5:2345:respawn:/sbin/mingetty tty5
    6:2345:respawn:/sbin/mingetty tty6
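The fork/exec/wait pattern that init uses to spawn and respawn children can be sketched in a few lines. This is an illustration only: spawn_shell is a hypothetical helper, and it runs a one-line /bin/sh command where init would exec getty.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `sh -c command` in a child process, as a parent such as init would.
 * Returns the child's exit status, or -1 if it could not be obtained. */
int spawn_shell(const char *command) {
    assert(command != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: replace the process image; fds 0,1,2 are inherited. */
        execl("/bin/sh", "sh", "-c", command, (char *)0);
        _exit(127);                    /* reached only if exec failed */
    }
    /* Parent: this is the point where init notices that a child exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A respawn action amounts to calling such a helper again whenever the wait reports that the child has exited.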

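  • [Figure: terminal login. init forks one init per tty; each forked init execs getty, which execs login. The login shell's fds 0,1,2 are connected (thru getty/login) to the term dev driver, which serves the user over an RS-232 cnxn.]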

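  • [Figure: network login. init does fork/exec of /bin/sh that executes /etc/rc script, which starts inetd; on a telnet req, inetd forks and the child execs telnetd. The login shell's fds 0,1,2 are connected (thru inetd, telnetd, login) to the term dev driver, which serves the user over a netw cnxn.]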

  • Network Logins

    ● Terminal device driver thru, say, RS232
        ● Shell (fd 0,1,2): user level
        ● Kernel level:
            – Line terminal disc (echo chars, assemble chars to lines, bs, Cu, gen SIGINT/SIGQUIT, CS, CQ, newline (CR+LF),...)
            – terminal device driver
    ● Network login: similar to terminal login
        ● init, inetd, telnetd/sshd, login
        ● Pseudoterminal device driver
            ● pseudoterminal is a special IPC that acts like a terminal
            ● data written to master side received by the slave side as if it was the result of a user typing at an ordinary terminal & vice versa
        ● Netw cnxn thru telnetd/sshd server & telnet/ssh client

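  • [Figure: rlogind and pseudoterminal. rlogind talks to the netw dev driver (TCP/IP) and to the pty master; the login shell (created by fork, exec, exec) has stdin and stdout/err on the pty slave thru the term disc. Netw dev driver, pty master, pty slave and term disc are in the KERNEL.]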


