OS09 K. Gopinath, IIScdrona.csa.iisc.ernet.in/~gopi/os09/extra-sep-os09.pdf · algorithm getblk...

OS09
K. Gopinath, IISc (Indian Inst of Science)

Sources for these slides:
● Tanenbaum's slides, provided as part of his book
● Updated Tanenbaum slides by Darrell Long/Ethan Miller at UCSC
● Mine own (K. Gopinath)
● Some papers/books too numerous to list
Hence PL. DO NOT CIRCULATE

Sleep and wakeup: Producer-Consumer problem

#include "prototypes.h"

#define N 100

int count = 0;

void producer(void) {

int item;

while (TRUE) {

produce_item(&item);

if (count == N) sleep();

enter_item(item);

count = count + 1;

if (count == 1) wakeup(consumer);

}

}

void consumer(void) {

int item;

while (TRUE) {

if (count == 0) sleep();

remove_item(&item);

count = count - 1;

if (count == N - 1) wakeup(producer);

consume_item(item);

}

}

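Race problem: if the consumer sees count == 0 and is swapped out just before it calls sleep(), the producer's wakeup(consumer) is lost. The consumer then sleeps anyway, the producer eventually fills the buffer and sleeps too, and both sleep forever.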
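One way to close this race is to make "check and sleep" atomic. The following is a minimal sketch using a POSIX mutex and two condition variables, not the sleep()/wakeup() primitives of the slide; the names not_full, not_empty, ITEMS and run_producer_consumer are assumptions made for illustration.

```c
#include <pthread.h>

#define N 4            /* buffer slots (small, so both sides have to wait) */
#define ITEMS 1000     /* number of items moved in this demo */

static int buf[N], head, tail, count;
static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

static void *producer(void *arg) {
    (void)arg;
    for (int item = 1; item <= ITEMS; item++) {
        pthread_mutex_lock(&mut);
        while (count == N)                       /* test is done under mut ... */
            pthread_cond_wait(&not_full, &mut);  /* ... and the wait releases mut atomically */
        buf[tail] = item;
        tail = (tail + 1) % N;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&mut);
    }
    return NULL;
}

static void *consumer(void *arg) {
    long *sum = arg;
    for (int i = 0; i < ITEMS; i++) {
        pthread_mutex_lock(&mut);
        while (count == 0)
            pthread_cond_wait(&not_empty, &mut);
        *sum += buf[head];
        head = (head + 1) % N;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&mut);
    }
    return NULL;
}

/* Runs one producer and one consumer; returns the sum of the consumed items. */
long run_producer_consumer(void) {
    pthread_t p, c;
    long sum = 0;
    head = tail = count = 0;
    pthread_create(&c, NULL, consumer, &sum);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return sum;
}
```

Because the producer cannot signal between the consumer's test of count and its wait (both happen under mut), the wakeup cannot be lost.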

Interrupts & Kernel code
● Interrupts (or process scheduling) can occur anytime
  ● Interrupt handler can also call brelse just like kernel code
  ● However, interrupt handler should not block
    – Otherwise, the process on whose (kernel) stack the interrupt handler runs blocks
● Can expose data structures in an inconsistent state
  ● List manipulation requires multiple steps
  ● Interrupt can expose intermediate state
● Interrupt handler can manipulate linked lists that kernel code could also be manipulating
  ● Need to raise “processor execution level” to mask interrupts (or scheduling)
  ● Check & sleep (or test & set) should be atomic

Buffer Allocation Algs
● getblk: given a filesystem number and disk block number, get the buffer for it, locked
● brelse: given a locked buffer, wake up waiting procs and unlock it
● bread: read a given disk block into a buffer
● breada: bread + asynch. read ahead
● bwrite: write a given buffer to a disk block
● Buffer properties
  ● No file block in 2 different buffers
  ● Can be in free list or hash list: search free list if any buffer needed; hash list if a particular buffer needed
  ● Buf alloc safe: allocs during syscall & frees at end
    – Disk drive hw problem: cannot interrupt CPU: buf lost!
    – But no starvation guarantees

Data Structures
● inode: owner, file type (REG, DIR, FIFO, CHR, BLK, ...), access perms/times, #links, disk addrs for blocks in file, file size
● incore inode: addl fields: locked?, process waiting?, dirty?, mount point?; reference count (# of opens), ptrs to other incore inodes (free and hash q)
● superblock: size of FS/inode list, #free blocks/inodes, dirty?, list/bitmap of free blocks/inodes, index of next free block/inode, locks for lists/bitmap

algorithm getblk
input: file sys #, block #; output: locked buffer that can now be used for block

while (buffer not found) { 1

if (block in hash queue) { 2

if (buffer busy) { 3

sleep (event buffer becomes free); 4

continue; 5

} 6

mark buffer busy; 7

remove buffer from free list; 8

return buffer; 9

} else { 10

if (there are no buffers on free list){11

sleep (event any buffer becomes free);

continue;

}

remove buffer from free list;

if (buffer marked for delayed write) {

asynchronous write buffer to disk;

continue;

}

mark buffer busy;

remove buffer from old hash queue;

put buffer onto new hash queue;

return buffer;

}

}
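The control flow of getblk can be tried out in user space. The sketch below is a single-threaded simplification under stated assumptions: the hash queue is replaced by a linear scan (lookup), delayed writes are left out, and where the kernel algorithm would sleep the function returns NULL instead. The names struct buf, cache_init, lookup, free_remove and free_append are hypothetical.

```c
#include <stddef.h>

#define NBUF 2                       /* tiny pool, so exhaustion is easy to show */

struct buf {
    int dev, blkno;                  /* block cached here (-1: none) */
    int busy;                        /* locked by some process */
    struct buf *next, *prev;         /* free-list links; NULL when off the list */
};

static struct buf pool[NBUF];
static struct buf freelist;          /* dummy head of a circular list */

static void free_remove(struct buf *b) {
    b->prev->next = b->next;
    b->next->prev = b->prev;
    b->next = b->prev = NULL;
}

static void free_append(struct buf *b) {   /* tail = most recently released */
    b->prev = freelist.prev;
    b->next = &freelist;
    freelist.prev->next = b;
    freelist.prev = b;
}

void cache_init(void) {
    freelist.next = freelist.prev = &freelist;
    for (int i = 0; i < NBUF; i++) {
        pool[i].dev = pool[i].blkno = -1;
        pool[i].busy = 0;
        free_append(&pool[i]);
    }
}

/* Stand-in for the hash queue search. */
static struct buf *lookup(int dev, int blkno) {
    for (int i = 0; i < NBUF; i++)
        if (pool[i].dev == dev && pool[i].blkno == blkno)
            return &pool[i];
    return NULL;
}

/* Returns a locked buffer, or NULL where the kernel version would sleep. */
struct buf *getblk(int dev, int blkno) {
    struct buf *b = lookup(dev, blkno);
    if (b != NULL) {
        if (b->busy) return NULL;            /* would sleep: buffer busy */
        b->busy = 1;
        free_remove(b);
        return b;
    }
    if (freelist.next == &freelist)
        return NULL;                         /* would sleep: no free buffer */
    b = freelist.next;                       /* least recently used buffer */
    free_remove(b);
    b->busy = 1;
    b->dev = dev;                            /* "move to new hash queue" */
    b->blkno = blkno;
    return b;
}

/* Unlocks a buffer and puts it at the end of the free list. */
void brelse(struct buf *b) {
    free_append(b);
    b->busy = 0;
}
```

A released buffer keeps its (dev, blkno) identity, so a later getblk for the same block is a cache hit until the buffer is reassigned.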

bread(filesystem f, block n) {

getblk(f,n);

if (buffer data valid) return buffer;

initiate disk read;

sleep(event disk read complete);

return (buffer); }

Race Conditions (time runs downward)

P1                      P2                      P3
block b not on hash Q
no free bufs
sleep
                        block b not on hash Q
                        no free bufs
                        sleep
                                                free a buf
                                                wakeup
use freed buf
                        wakeup
                        Try from beg!

P1                      P2                      P3
alloc buf to block b
lock buf
init I/O
sleep until I/O done
                        buf locked
                        sleep
                                                wait for any buf
                                                get buf b & reassign to b'
                        wakeup; try again!

algorithm brelse
input : locked buffer

output : none

{wake up all procs : event, waiting for any buffer to become free

wake up all procs : event, waiting for this buffer to become free

raise processor execution level to block interrupts;

if (buffer contents are valid and buffer not old)

enqueue buffer at end of free list

else enqueue buffer at the beginning of the free list

lower processor execution level to allow interrupts;

unlock(buffer);}

bwrite(buf b) {

initiate disk write;

if (I/O synchronous) {sleep(event I/O complete); brelse(b);}

else if (b marked for delayed write) mark buffer to put at head of list

}

Problems
● 1st prob
  ● P1 finds buffer is busy (line 3) & “starts” to sleep
  ● P1 gets preempted as P2 finishes use of buffer (thru interrupt) and releases it
  ● P1 now begins sleep even though buffer free
● 2nd prob
  ● P1 finds no free buffers (line 11)
  ● P1 gets preempted as P2 finishes use of buffer (thru interrupt) and releases it
  ● P1 now begins sleep even though buffer free
● 3rd prob
  ● Both getblk (line 8) and brelse manipulate free list

Atomicity violated in each case: interleaved execution of getblk with interrupt handler or with brelse

Need to block interrupts; Also: have to watch out for hardware error!

Other file ops
● iget: get a locked inode (doing bread if necessary) given inode number

● iput: release an inode; if ref count 0, writes dirty inode

● bmap: given inode and byte offset, returns disk block num and offset

● namei: given a path, get the locked inode

● ialloc: assign a new disk inode for a newly created file

● ifree: free an inode (link count 0)

● alloc: allocate a free disk block and return buffer using getblk

● free: free a disk block

void initsem(semaphore *sem, int val) {
    *sem = val;
}

void P(semaphore *sem) {
    *sem -= 1;
    if (*sem < 0) sleep;    /* V() wakes exactly one sleeper, which then proceeds */
}

void V(semaphore *sem) {
    *sem += 1;
    if (*sem <= 0) wakeup thread blocked on sem;
}

boolean_t CP(semaphore *sem) {
    if (*sem > 0) { *sem -= 1; return(TRUE); } else return(FALSE);
}

● Mutex thru Semaphore

semaphore sem;

initsem(&sem, 1);

P(&sem);

use resource

V(&sem);

● Event wait

semaphore event;

initsem(&event, 0);

P(&event);

event processing

V(&event);

● Countable Resources

semaphore counter;

initsem(&counter, count);

P(&counter); use resource; V(&counter)

Have we solved the problem?
● P() and V() must be executed atomically
● In uniprocessor system may disable interrupts
● In multiprocessor system, use hardware synchronization primitives
  ● TS, FAA, etc.
  ● Involves some limited amount of busy waiting
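To make the atomicity requirement concrete, here is a minimal user-space sketch of P, V and CP in which a POSIX mutex supplies the atomicity and a condition variable supplies the sleep/wakeup. It is an assumption-laden variant, not the kernel scheme above: the value never goes negative (waiters are not counted in it), and the struct layout and field names are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    int value;                  /* never negative in this variant */
    pthread_mutex_t lock;       /* makes P, V and CP atomic */
    pthread_cond_t nonzero;     /* waited on while value == 0 */
} semaphore;

void initsem(semaphore *s, int val) {
    s->value = val;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
}

void P(semaphore *s) {
    pthread_mutex_lock(&s->lock);
    while (s->value == 0)                       /* re-check after every wakeup */
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

void V(semaphore *s) {
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->nonzero);           /* wake one waiter, if any */
    pthread_mutex_unlock(&s->lock);
}

/* Conditional P: takes the semaphore only if that needs no blocking. */
bool CP(semaphore *s) {
    pthread_mutex_lock(&s->lock);
    bool got = s->value > 0;
    if (got) s->value--;
    pthread_mutex_unlock(&s->lock);
    return got;
}
```

Here the while loop in P is safe because the value itself, not the wakeup, grants entry; a woken thread that loses the race simply waits again.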

Simulation of a monitor with semaphores

typedef int semaphore;

semaphore mutex = 1;

void enter_monitor(void) {

down(mutex);

}

void leave_normally(void) {

up(mutex);

}

void leave_with_signal(semaphore c) {

/* signal on c & exit monitor */

up(c);

}

void wait(semaphore c) {

up(mutex);

down(c);

}

Java Monitors
● void wait(); Enter a monitor's wait set until notified by another thread
● void wait(long timeout); Enter a monitor's wait set until notified by another thread or timeout milliseconds elapse
● void wait(long timeout, int nanos); Enter a monitor's wait set until notified by another thread or timeout milliseconds plus nanos nanoseconds elapse
● void notify(); Wake up one thread waiting in the monitor's wait set. (If no threads are waiting, do nothing.)
● void notifyAll(); Wake up all threads waiting in the monitor's wait set. (If no threads are waiting, do nothing.)

Java (contd)
● Each Java monitor has a single nameless anonymous condition variable on which a thread can wait() or signal one waiting thread with notify() or signal all waiting threads with notifyAll().
● This nameless condition variable corresponds to a lock on the object that must be obtained whenever a thread calls a synchronized method in the object.
  ● Only inside a synchronized method may wait(), notify(), and notifyAll() be called.
  ● Methods that are static can also be synchronized. There is a lock associated with the class that must be obtained when a static synchronized method is called.

Problems with Semaphores
● Too complex?
  ● Needs low-level atomic op to construct, blocking & unblocking involve context switches, manipulates scheduler and sleep Qs
● Good for resources held for long times, not for short
● Good as V only wakes up if someone can run
● But this can result in convoys
  ● Low priority process P1 that has locked an imp lock (L) preempted by P2 which then waits for L
    – Imp lock: Often log lock in txnal systems
    – P3 also needs L, P4 also, ... all wait
  ● P1 scheduled again (FIFO) & unlocks L
  ● P2 gets lock (P1 preempted), P2 uses lock, then P3, ...
  ● For next upd, P2 goes back to Q again, then P3, P4,...
  ● Lock-unlock: 100's of insts; lock-wait-dispatch-unlock: 1000's

Semaphore Problem?
● 1: T2 (P2) in cs (using a sem): blocks T3; T4 on run Q
● 2: T2 exits cs but active; T3 now gets sem but inactive
● 3: T1 (P1) now wants to enter cs but blocked by T3
  ● T1 blocked even if no one in cs! FIFO property!
  ● T4 scheduled on P1
    – T1 & T3 cannot run unless T2 or T4 give up
    – Processor 1: T1 -> T1 -> T4
    – Processor 2: T2 -> T2 -> T2
● Problem in step 2:
  ● Have to make sure that T3 does not get sem but is put on ready Q. T1 will then get sem & no context sw.
  ● Need different semantics: eg: condition variables

Message Passing: Mailboxes, Ports, CSP
Send/Receive; Blocking/nonblocking

typedef int message[MSIZE];

void producer(void){

int item;

message m;

while (TRUE) {

produce_item(&item);

receive(consumer, &m);

build_message(&m, item);

send(consumer, &m);

}

}

void consumer(void){

int item, i; message m;

for (i = 0; i < N; i++) send(producer, &m);

while (TRUE) {

receive(producer, &m);

extract_item(&m, & item);

send(producer, &m);

consume_item(item);

}

}

Fork & fork1 in MT processes
● Process with exactly 1 LWP => same semantics as “old Unix” process
● copy all LWPs on fork? Solaris9 but not Posix
  ● one LWP blocked in parent: what about in child? Restart? Concurrent syscalls? EINTR or wait(disk)?
  ● one LWP has open netw cnxn: if closed, unexpected user msg to remote node
  ● one LWP changing a shared data structure: corruption thru the new copy of LWP? How to make a “consistent” copy?
● copy only calling LWP? Fork1: Solaris10; good for exec'ing
  ● some user thrs not on LWPs that were in parent
  ● child process should not try to acq locks held by LWPs not in child (deadlock!) but user code cannot know! these locks may be held by ulib

POSIX fork1

fork1(): only calling LWP created in child

registration of fork_handlers (_atfork)

prepare: prior to fork in the ctxt of calling LWP. LIFO

parent: after fork. FIFO

child: after fork in context of 1 thr in child. FIFO

LIFO/FIFO order to enable preserving of locking order

int pthread_atfork(void (*prepare) (void), void (*parent) (void), void (*child) (void));

handles orphaned mutexes

prepare fork handlers lock all mutexes (by calling thr)

parent/child fork handlers unlock mutexes

indep libs & appl progs can protect themselves

lib provides fork handlers

Fork and threads (time runs downward)

Thr A                   Thr B                   Thr B_ch
locks mutex
modifies shared data
                        fork
                                                copy of locked mutex and
                                                inconsistent data struct.
                                                /* cannot drop mutex as
                                                   data inconsistent nor can
                                                   it take mutex: deadlock */
                                                /* memory leaks also! */

Solutions?
● programs that use fork() call an exec function soon afterwards in child process, thus resetting all states
  ● In the meantime, only a short list of async-signal-safe library routines are promised to be available
● But not good wrt multi-threaded (MT) libraries.
  ● Applications may not be aware that a MT library is in use, and feel free to call any number of library routines between the fork() and exec calls. They may be extant 1-threaded programs that cannot be expected to obey new restrictions imposed by the threads library.
  ● A MT library needs a way to protect its internal state during fork() in case it is re-entered later in the child process. eg. MT I/O libraries, which are invoked between the fork() and exec calls to effect I/O redirection.

Fork handling
● Lock global mutexes
  ● Other threads locked out of the critical regions of code protected by these mutexes
  ● Can take snapshot: copy of valid, stable data
● Reset synchr objects in the child process
  ● ensures they are properly cleansed of any artifacts from the threading subsystem of parent process
  ● eg. a mutex may inherit a wait queue of threads waiting for the lock; this wait queue makes no sense in child. Initialize mutex to remedy (deletes unnecessary data structures in child). Otherwise memory leaks!
● But how to correct or otherwise deal with the inconsistent state in the child?

With pthread_atfork: no orphaned locks!

prepare: lock(mutex)
parent: unlock(mutex)
child: unlock(mutex)

Thr A                        Thr B                           Thr B_ch
locks mutex
modifies shared data
                             attempts fork but blocked as
                             prepare (lock mutex) blocked
drops mutex (shared data
now consistent!)
                             prepare succeeds (locks mutex)
                             fork completes
                             unlocks mutex (parent)          unlocks mutex (child)
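The timeline above can be sketched in C as follows. This is an illustrative sketch only: lib_lock stands for a library's internal mutex, and the handler names and fork_with_handlers are hypothetical. The child checks that it can take the mutex, which would fail if the lock had been inherited in the locked state.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;

static void prepare(void)   { pthread_mutex_lock(&lib_lock); }    /* before fork */
static void in_parent(void) { pthread_mutex_unlock(&lib_lock); }  /* after fork, parent */
static void in_child(void)  { pthread_mutex_unlock(&lib_lock); }  /* after fork, child */

static void register_handlers(void) {
    pthread_atfork(prepare, in_parent, in_child);
}

/* Forks once; returns 0 if the child could lock and unlock lib_lock,
   1 if it could not, and -1 on fork/wait errors. */
int fork_with_handlers(void) {
    static pthread_once_t once = PTHREAD_ONCE_INIT;
    pthread_once(&once, register_handlers);   /* register the handlers only once */

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {                           /* child: has a single thread */
        int rc = pthread_mutex_trylock(&lib_lock);
        if (rc == 0) pthread_mutex_unlock(&lib_lock);
        _exit(rc == 0 ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Since prepare blocks until any other thread has dropped lib_lock, the data it protects is consistent at the moment of the fork.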

Solutions

0: pthread_atfork: provides MT libraries with a means to protect themselves from innocent appls that call fork(), and provides MT appls with a std mech for protecting themselves from fork() calls in a lib routine or the appl itself. But COMPLEX!!! Avoid problems by

1: If possible, fork before creating any threads

2: Instead of fork, create a new thread. If forking to exec a binary, can attempt to convert binary to a shared lib that can be linked to.

3: Try a surrogate parent method. Fork at init time; the child will be a "surrogate" parent that will remain 1-threaded. When exec is needed, child is informed and it does a fork/exec

Posix Model of Concurrency
● Creation
  ● pthread_create(tp, attrp, fptr, argp)
  ● pthread_attr_xxx(): manipulate attr of a thread
    – Init/destroy; set/get detachstate, inheritsched, schedparam, schedpolicy, scope, stackaddr, stacksize
● Exit
  ● pthread_exit(retvalp)
  ● pthread_join(t, **v): wait for another thread's termination
  ● pthread_detach(t): storage for thread can be reclaimed when thread terminates (no zombie)
● Thread Specific Data (indexed by key)
  ● pthread_key_create(keyp, fpdestructor)/_delete()
  ● pthread_setspecific()/_getspecific(): mapping betw key and thread
● Signal: pthread_sigmask(how, newmask, saveprev): change signal mask for calling thread
  ● pthread_kill(t, sig)
  ● sigwait: suspend thr till sig
● ID: pthread_self()
  ● pthread_equal(t1, t2)
● pthread_once(once?, fptr): ensure some init at most once
● Scheduling
  ● pthread_setschedparam()/_getschedparam()
● Cancellation (cancellation pts: _join, _cond_wait, _cond_timedwait, sem_wait, sigwait, _testcancel)
  ● pthread_cancel(t) by others / pthread_testcancel(void) by self
  ● pthread_setcancelstate()/type()
  ● pthread_cleanup_pop()/_push(): if a thread exits or is cancelled (with locked mutexes?), cleanup handlers executed; LIFO order
● Mutex
  ● pthread_mutex_init()/_destroy()
  ● pthread_mutexattr_xxx()
    – Init/destroy; set/get pshared, protocol, prioceiling
  ● pthread_mutex_setprioceiling()/_getprioceiling()
  ● pthread_mutex_lock()/_trylock()/_unlock()
● Condition Variable
  ● pthread_cond_init()/_destroy()
  ● pthread_condattr_xxx()
    – Init/destroy; set/get pshared
  ● pthread_cond_wait()/_timedwait()
  ● pthread_cond_signal()
  ● pthread_cond_broadcast()
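A few of the calls listed above (pthread_create, pthread_join, pthread_once) fit into one short sketch. The worker function, the 16-thread cap and the helper names sum_of_squares and once_calls are assumptions chosen for illustration.

```c
#include <pthread.h>
#include <stdint.h>

static pthread_once_t once = PTHREAD_ONCE_INIT;
static int init_calls;                      /* how many times init_once ran */

static void init_once(void) { init_calls++; }

static void *worker(void *arg) {
    pthread_once(&once, init_once);         /* only the first caller runs init_once */
    intptr_t n = (intptr_t)arg;
    return (void *)(n * n);                 /* value picked up by pthread_join */
}

/* Starts nthreads workers (at most 16) with arguments 0..nthreads-1
   and returns the sum of the values they return. */
long sum_of_squares(int nthreads) {
    pthread_t t[16];
    long sum = 0;
    if (nthreads > 16) nthreads = 16;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, worker, (void *)(intptr_t)i);
    for (int i = 0; i < nthreads; i++) {
        void *ret;
        pthread_join(t[i], &ret);           /* also reclaims the thread's storage */
        sum += (long)(intptr_t)ret;
    }
    return sum;
}

int once_calls(void) { return init_calls; }
```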

Condition variables

int x,y;

pthread_mutex_t mut =PTHREAD_MUTEX_INITIALIZER;

pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

// (waiter) Wait until x is greater than y

pthread_mutex_lock(&mut);

while (x <= y) pthread_cond_wait(&cond, &mut);

/* operate on x and y */

pthread_mutex_unlock(&mut);

// (signaller) Signal if modifications on x and y st x>y

pthread_mutex_lock(&mut);

/* modify x and y */

if (x > y) pthread_cond_broadcast(&cond);

pthread_mutex_unlock(&mut);

// (waiter) if timeout also

struct timeval now;

struct timespec timeout;

int retcode;

pthread_mutex_lock(&mut);

gettimeofday(&now, NULL);

timeout.tv_sec = now.tv_sec + 5;

timeout.tv_nsec = now.tv_usec * 1000;

retcode = 0;

while (x <= y && retcode != ETIMEDOUT)

retcode = pthread_cond_timedwait(&cond, &mut, &timeout);

if (retcode == ETIMEDOUT) {/* timeout occurred */}

else { /* operate on x and y */}

pthread_mutex_unlock(&mut);


● Semaphore
  ● sem_init()/_destroy()
  ● sem_open()/_close()
  ● sem_wait()/_trywait()
  ● sem_post()
  ● sem_getvalue()
  ● sem_unlink()
● fork() Clean Up Handling
  ● pthread_atfork()
● Async safe? Some pthread calls not safe to call from sig handlers
  ● A user thr lib may have taken a lock to ensure, say, that only one user is changing Qs. If the handler calls pthread_mutex_lock, etc., may deadlock

Signals:
● oldest ipc method used by UNIX systems to signal asynchronous events. ONLY 1 BIT INFO!
● can be generated by a keyboard interrupt or an error condition or by other processes in the system (if they have the correct privileges)
  ● kernel & superuser can send a signal to any process
  ● a process can also send a signal to other processes with same uid/gid
● Processes can handle signals themselves or allow kernel to handle
  ● If kernel handles the signal, default action for the signal: eg, SIGFPE causes core dump and causes the process to exit
  ● SIGSTOP (causes a process to halt its execution) and SIGKILL handled only by kernel
● List of signals on a Linux/Intel machine: SIGHUP SIGINT SIGQUIT SIGILL SIGTRAP SIGIOT SIGBUS SIGFPE SIGKILL SIGUSR1 SIGSEGV SIGUSR2 SIGPIPE SIGALRM SIGTERM SIGCHLD SIGCONT SIGSTOP SIGTSTP SIGTTIN SIGTTOU SIGURG SIGXCPU SIGXFSZ SIGVTALRM SIGPROF SIGWINCH SIGIO SIGPWR

Signals (cont’d)

● void (*signal(int signo, void (*func) (int))) (int), which is the same as:
  ● typedef void Sigfunc(int); Sigfunc *signal(int, Sigfunc *)
  ● Signal is a func that returns a ptr to a func that ret void (prev sig handler)
  ● Or, sighandler_t signal(int signum, sighandler_t handler);
● Linux implements signals using information stored in task_struct of process:
  ● struct sigpending pending: currently pending signals
  ● blocked: mask of blocked signals
  ● struct signal_struct *sig has array of sigactions that holds info about how the process handles each signal
● Signals generated by setting appropriate bit in signal field of pending. If not blocked, scheduler will run handler in the next system scheduling.
● Every time a process exits from a system call, the signal and blocked fields are checked, and if there is any unblocked signal, the handler is called.

#include <signal.h>

static void sig_usr(int); /* one handler for both signals */

int main(void) {

if (signal(SIGUSR1, sig_usr) == SIG_ERR)

err_sys("can't catch SIGUSR1");

if (signal(SIGUSR2, sig_usr) == SIG_ERR)

err_sys("can't catch SIGUSR2");

for ( ; ; ) pause(); }

static void sig_usr(int signo) { /* argument is signal number */

if (signo == SIGUSR1) printf("received SIGUSR1\n");

else if (signo == SIGUSR2)

printf("received SIGUSR2\n");

else err_dump("received signal %d\n", signo);

return;}

No Qing for non-real-time signals!

#include <signal.h>

main() {
    int childPid, i;
    void SigIntHandler();

    sigblock(sigmask(SIGINT));
    signal(SIGINT, SigIntHandler);

    childPid = fork();
    if (childPid > 0) { /* parent */
        for (i = 0; i < 10; i++) kill(childPid, SIGINT);
        printf("Parent has issued %d signals to the child\n", i);
    } else { /* child */
        sleep(2); /* sleep for 2 secs so that signals overwritten */
        while (1) sigpause(0);
    }
}

void SigIntHandler(int signo) {
    printf("Child : received a signal\n");
}

signal: V7, SVR2/3/4 (handler uninstalled, no blocking of signals, no

autostart of interrupted system calls)

sigset, sighold, sigrelse, sigignore, sigpause: SVR3/4 (no autostart)

signal, sigvec, sigblock, sigsetmask(unblock a signal), sigpause: 4.x BSD (autostart 4.2; default 4.3/4.4)

sigaction, sigprocmask, sigpending, sigsuspend: autostart unspecified (POSIX.1), optional(SVR4, 4.3/4.4BSD, Linux)

sigprocmask: change the list of currently blocked signals

sigpending: allows examination of pending signals (ones

which have been raised while blocked)

sigsuspend: replaces the signal mask with the given one & suspends process until a signal arrives

int sigaction(int signo, const struct sigaction *act, struct sigaction *oact)

struct sigaction {

void (*sa_handler)();

sigset_t sa_mask; /* addl signals to block */

int sa_flags; /* restart?, alt stack?, waitchild?, uninstall handler? ...*/ }
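As a small illustration of the sigaction interface just shown, the sketch below installs a handler for SIGUSR1 that stays installed, blocks SIGUSR2 while it runs, and asks for interrupted system calls to be restarted. The choice of signals and the names on_usr1, install_usr1_handler and usr1_seen are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_usr1;      /* set by the handler only */

static void on_usr1(int signo) { (void)signo; got_usr1 = 1; }

/* Installs on_usr1 for SIGUSR1; returns 0 on success, -1 on error. */
int install_usr1_handler(void) {
    struct sigaction act;
    memset(&act, 0, sizeof act);
    act.sa_handler = on_usr1;
    sigemptyset(&act.sa_mask);
    sigaddset(&act.sa_mask, SIGUSR2);       /* addl signal blocked while handler runs */
    act.sa_flags = SA_RESTART;              /* restart interrupted system calls */
    return sigaction(SIGUSR1, &act, NULL);
}

int usr1_seen(void) { return got_usr1; }
```

Unlike the V7 signal() behaviour, the handler does not need to reinstall itself, since SA_RESETHAND is not set.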

Unreliable signals

old V7 code: race with a new signal for process before signal reinstalled

int sig_int();

...

signal(SIGINT, sig_int);

...

sig_int() {

/* another signal can come here! can cause default action */

signal(SIGINT, sig_int); /* reinstall handler */

...

}

Another race

int sig_int_flag;

main() {
    int sig_int();
    ...
    signal(SIGINT, sig_int);
    ...
    while (sig_int_flag == 0)
        /* signal can come here! */
        pause();
    ...
}

sig_int() {
    signal(SIGINT, sig_int);
    sig_int_flag = 1;
}

SysV: int sighold(int sig); int sigrelse(int sig)

sighold(SIGQUIT); sighold(SIGINT)

c.s.

sigrelse(SIGINT); sigrelse(SIGQUIT)

int sig_int_flag;

main() {

int sig_int();

...


...

sighold(SIGINT);

while (sig_int_flag==0) sigpause(SIGINT); /* atomically release signal and pause: wait for a signal to occur */

...

Restarting of interrupted system calls by signals (4.3BSD)
Can only call reentrant functions within signal handlers

int oldmask; /* SIGQUIT: quit key + core image; SIGINT: interrupt key ^C */
oldmask = sigblock(sigmask(SIGQUIT)|sigmask(SIGINT)); /* block SIGQUIT/INT */
c.s.
sigsetmask(oldmask); /* reset to old mask */

int sig_int_flag;
main() {
    int sig_int();
    ...
    signal(SIGINT, sig_int);
    ...
    sigblock(sigmask(SIGINT)); /* sigblock returns mask before */
    while (sig_int_flag == 0) sigpause(0); /* wait for signal to occur */
    /* sigpause(0) <> sigsetmask + pause as signal can come in betw */
    /* process signal... */
    ...
}

Executing Signal Handlers in Linux
● On signal (either from kernel or another process), ker checks some conditions (disp, etc) before calling do_signal
● do_signal runs in kernel while (user) signal handler runs in user mode
● After signal handler run, kernel code executed further
● However, ker stack no longer contains hw context of interrupted program as ker stack emptied on switch to user mode
● Also, sig handlers can reenter kernel (syscalls, etc.)
● Solution: copy hw context saved in ker stack to user stack of curr process
● When sig handler terminates, sigreturn syscall automatically invoked to copy hw context back to kernel stack & restore the user stack
● Sigframe struct pushed on stack has some code for calling sigreturn: stack has to be executable!!!

pselect
● int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *utimeout);
● int pselect(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, const struct timespec *ntimeout, const sigset_t *sigmask);
● pselect() used to wait for a signal as well as data from a fd
● Programs that receive signals as events normally use the signal handler only to raise a global flag.
● The global flag indicates that the event must be processed in the main loop of the program.
● A signal will cause the select()/pselect() to return with errno set to EINTR. This behavior is essential so that signals can be processed in the main loop of the program, otherwise select() would block indefinitely.

Race condition

● Somewhere in the main loop, a conditional checks the global flag.

● What if a signal arrives after the conditional, but before the select() call?

● select() would block indefinitely, even though an event is actually pending.

pselect example

● udp broadcast
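Separately from the udp broadcast example, the way pselect() closes the race described above can be sketched as follows. The assumptions are loud ones: SIGUSR1 is the event signal, raise() simulates a signal arriving after the flag check, an unused pipe plays the part of a descriptor with no data, and the names event_flag, on_event and wait_for_event are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t event_flag;    /* raised by the handler only */

static void on_event(int signo) { (void)signo; event_flag = 1; }

/* Returns 1 if the signal event was seen, 0 on timeout or fd activity,
   -1 on error. */
int wait_for_event(void) {
    struct sigaction act;
    sigset_t block, origmask, waitmask;
    struct timespec timeout = { 5, 0 };
    fd_set rfds;
    int fds[2], n, saved_errno;

    act.sa_handler = on_event;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    if (sigaction(SIGUSR1, &act, NULL) < 0) return -1;
    if (pipe(fds) < 0) return -1;

    /* Keep SIGUSR1 blocked everywhere except inside pselect(). */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &origmask);
    waitmask = origmask;
    sigdelset(&waitmask, SIGUSR1);

    event_flag = 0;
    raise(SIGUSR1);     /* simulated arrival after the flag check: stays pending */

    FD_ZERO(&rfds);
    FD_SET(fds[0], &rfds);                  /* read end of a pipe with no data */
    n = pselect(fds[0] + 1, &rfds, NULL, NULL, &timeout, &waitmask);
    saved_errno = errno;

    sigprocmask(SIG_SETMASK, &origmask, NULL);
    close(fds[0]);
    close(fds[1]);

    if (n < 0 && saved_errno == EINTR && event_flag) return 1;
    return n < 0 ? -1 : 0;
}
```

With plain select() the signal would have to be unblocked before the call, reopening the window between the flag check and the wait; pselect() swaps the mask and waits in one step.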

Linux Concurrency Model
● Within appl: clones (incl threads & processes of other systems)
● Inside kernel:
  ● Kernel threads: do not have USER context
  ● deferrable and interruptible ker funcs:
    – Softirq: reentrant: multiple softirqs of the same type can be run concurrently on several CPUs.
      ● No dyn alloc! Have to be statically defined at compile time.
    – Tasklet: multiple tasklets of the same type cannot run concurrently on several CPUs.
      ● Dyn alloc OK! Can be allocated and initialized at run time (loadable modules). Impl thru softirqs
    – Bottom Half: multiple bottom halves cannot be run concurrently on several CPUs. No dyn alloc!
      ● Impl thru tasklets
● Across HW: IPI

Spinlocks & Semaphores
● Shared data betw different parts of code in kernel
  ● most common: access to data structures shared between user process context and interrupt context
● In uniprocessor system: mutual excl by setting and clearing interrupts + flags
● SMP: three types of spinlocks: vanilla (basic), read-write, big-reader
  ● Read-write spinlocks when many readers and few writers
    – Eg: access to the list of registered filesystems.
  ● Big-reader spinlocks a form of read-write spinlocks optimized for very light read access, with penalty for writes
    – limited number of big-reader spinlock users.
    – used in networking part of the kernel.
● semaphores: Two types of semaphores: basic and read-write semaphores. Different from IPC's
  ● Mutex or counting; up() & down(); interruptible/non-interruptible

Spinlocks: (cont’d)
● A good example of using spinlocks: accessing a data structure shared betw a user context and an interrupt handler

spinlock_t my_lock = SPIN_LOCK_UNLOCKED;

my_ioctl() {                      // _ioctl: definitely process context!
    spin_lock_irq(&my_lock);      // and known that interrupts enabled!
    /* critical section */        // hence, _irq to disable interrupts
    spin_unlock_irq(&my_lock);
}

my_irq_handler() {                // _irq_handler: definitely system (or intr) context
    spin_lock(&my_lock);          // & hence known that intr disabled!
    /* critical section */        // can use simpler lock
    spin_unlock(&my_lock);
}

● spin_lock: if interrupts disabled or no race with interrupt context
● spin_lock_irq: if interrupts enabled and have to be disabled
● spin_lock_irqsave: if interrupt state not known
● Basic premise of a spin lock: one thread busy-waits on a resource on one processor while another thread uses it on another (only true for MP). But code has to work for 1 or more processors. If all threads are on 1 processor and a thread tries to take a spin lock that is already held by another thread: deadlock.
● Never give up CPU when holding a spinlock!
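The busy-waiting premise can be illustrated outside the kernel with a test-and-set lock built on C11 atomics. This is a user-space sketch, not the kernel's spinlock_t: the spinlock type, the demo counter and count_with_two_threads are hypothetical, and user threads (unlike kernel code holding a spinlock) can be preempted, so the single-processor deadlock noted above does not arise here.

```c
#include <pthread.h>
#include <stdatomic.h>

typedef struct { atomic_flag held; } spinlock;   /* hypothetical user-space type */

static spinlock lock = { ATOMIC_FLAG_INIT };
static long counter;                             /* protected by lock */

static void spin_lock(spinlock *l) {
    /* test-and-set: keep trying until the previous value was "clear" */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;                                        /* busy-wait */
}

static void spin_unlock(spinlock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static void *adder(void *arg) {
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        spin_lock(&lock);
        counter++;                               /* short critical section */
        spin_unlock(&lock);
    }
    return NULL;
}

/* Two threads each add per_thread to the shared counter; returns the total. */
long count_with_two_threads(long per_thread) {
    pthread_t a, b;
    counter = 0;
    pthread_create(&a, NULL, adder, &per_thread);
    pthread_create(&b, NULL, adder, &per_thread);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```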

Linux 2.4 buffer cache

struct buffer_head * getblk(kdev_t dev, int block, int size) {

for (;;) {

struct buffer_head * bh;

bh = get_hash_table(dev, block, size);

if (bh)

return bh;

if (!grow_buffers(dev, block, size))

free_more_memory();

}

}

struct buffer_head * get_hash_table(kdev_t dev, int block, int size){
    struct buffer_head *bh, **p = &hash(dev, block);

    read_lock(&hash_table_lock);
    for (;;) {
        bh = *p;
        if (!bh) break;
        p = &bh->b_next;
        if (bh->b_blocknr != block) continue;
        if (bh->b_size != size) continue;
        if (bh->b_dev != dev) continue;
        get_bh(bh);
        break;
    }
    read_unlock(&hash_table_lock);
    return bh;
}

static inline void get_bh(struct buffer_head * bh) {
    atomic_inc(&(bh)->b_count);
}

#define hash(dev,block) hash_table[(_hashfn(HASHDEV(dev),block) & bh_hash_mask)]

#define HASHDEV(dev) ((unsigned int ) (dev))

bh_hash_mask = (nr_hash - 1)

Lock hierarchy: lru_list_lock > hash_table_lock > unused_list_lock

Linux down

static inline void down(struct semaphore * sem) {

__asm__ __volatile__( "# atomic down operation\n\t"

LOCK "decl %0\n\t" /* --sem->count */

"js 2f\n"

"1:\n"

LOCK_SECTION_START("")

"2:\tcall __down_failed\n\t"

"jmp 1b\n"

LOCK_SECTION_END

:"=m" (sem->count)

:"c" (sem)

:"memory");

}

asm( ".text\n" ".align 4\n"

".globl __down_failed\n"

"__down_failed:\n\t"

#if defined(CONFIG_FRAME_POINTER)

"pushl %ebp\n\t" "movl %esp,%ebp\n\t"

#endif

"pushl %eax\n\t"

"pushl %edx\n\t"

"pushl %ecx\n\t"

"call __down\n\t"

"popl %ecx\n\t"

"popl %edx\n\t"

"popl %eax\n\t"

#if defined(CONFIG_FRAME_POINTER)

"movl %ebp,%esp\n\t" "popl %ebp\n\t"

#endif

"ret" );

void __down(struct semaphore * sem) {
    struct task_struct *tsk = current;
    DECLARE_WAITQUEUE(wait, tsk);
    tsk->state = TASK_UNINTERRUPTIBLE;
    add_wait_queue_exclusive(&sem->wait, &wait);

    spin_lock_irq(&semaphore_lock);
    sem->sleepers++;
    for (;;) {
        int sleepers = sem->sleepers;

        /* Add "everybody else" into it. They aren't
         * playing, because we own the spinlock. */
        if (!atomic_add_negative(sleepers - 1, &sem->count)) {
            sem->sleepers = 0;
            break;
        }
        sem->sleepers = 1; /* us - see -1 above */
        spin_unlock_irq(&semaphore_lock);
        schedule();
        tsk->state = TASK_UNINTERRUPTIBLE;
        spin_lock_irq(&semaphore_lock);
    }
    spin_unlock_irq(&semaphore_lock);
    remove_wait_queue(&sem->wait, &wait);
    tsk->state = TASK_RUNNING;
    wake_up(&sem->wait);
}

Analysis
● Semaphore open: count=1, sleepers=0: down makes count 0; __down not executed
● Semaphore closed & no sleeping processes: count=0, sleepers=0 => count -1 & sleepers 1
  ● Each iteration checks if count negative
    – Negative: schedule() & check again
    – Otherwise: sleepers=0; wakeup another (but Q empty)
● Semaphore closed & other sleeping processes: count, sleepers (-1,1) => (-2, 1)
  ● Sleepers temporarily 2, count becomes -1 again
    – Checks if count still negative as holding process may V
  ● If negative: schedule
  ● Not negative:

#define DECLARE_WAITQUEUE(name, tsk) wait_queue_t name = __WAITQUEUE_INITIALIZER(name, tsk)

#define __WAITQUEUE_INITIALIZER(name, tsk) { task: tsk, task_list: { NULL, NULL }, __WAITQUEUE_DEBUG_INIT(name)}

static spinlock_t semaphore_lock = SPIN_LOCK_UNLOCKED;

struct semaphore {
    atomic_t count;
    int sleepers;
    wait_queue_head_t wait;
#if WAITQUEUE_DEBUG
    long __magic;
#endif
};

#define spin_lock_irq(lock) do { local_irq_disable(); spin_lock(lock); } while (0)
#define local_irq_disable() __cli()

struct __wait_queue {
    unsigned int flags;
#define WQ_FLAG_EXCLUSIVE 0x01
    struct task_struct * task;
    struct list_head task_list;
#if WAITQUEUE_DEBUG
    long __magic;
    long __waker;
#endif
};
typedef struct __wait_queue wait_queue_t;

static inline void up(struct semaphore * sem) {

__asm__ __volatile__( "# atomic up operation\n\t"

LOCK "incl %0\n\t" /* ++sem->count */

"jle 2f\n"

"1:\n"

LOCK_SECTION_START("")

"2:\tcall __up_wakeup\n\t"

"jmp 1b\n"

LOCK_SECTION_END

".subsection 0\n"

:"=m" (sem->count)

:"c" (sem)

:"memory");

}

asm(

".text\n" ".align 4\n"

".globl __up_wakeup\n"

"__up_wakeup:\n\t"

"pushl %eax\n\t"

"pushl %edx\n\t"

"pushl %ecx\n\t"

"call __up\n\t"

"popl %ecx\n\t"

"popl %edx\n\t"

"popl %eax\n\t"

"ret");

#define wake_up(x) __wake_up((x),TASK_UNINTERRUPTIBLE |TASK_INTERRUPTIBLE, 1)

void __up(struct semaphore *sem) {

wake_up(&sem->wait);

}

void __wake_up(wait_queue_head_t *q, unsigned int mode, int nr_exclusive) {

unsigned long flags;

if (unlikely(!q)) return;

spin_lock_irqsave(&q->lock, flags);

__wake_up_common(q, mode, nr_exclusive, 0);

spin_unlock_irqrestore(&q->lock, flags);

}

static inline void __wake_up_common(wait_queue_head_t *q, unsigned int mode, int nr_exclusive, int sync) {

struct list_head *tmp; unsigned int state; wait_queue_t *curr; task_t *p;

list_for_each(tmp, &q->task_list) {

curr = list_entry(tmp, wait_queue_t, task_list);

p = curr->task;

state = p->state;

if ((state & mode) && try_to_wake_up(p, sync) && ((curr->flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)) break;

}

}

Process Tree
● Init reads /etc/inittab
● Opens tty
  ● Fd 0,1,2 set to dev
  ● Login printed
  ● Read user name
  ● Initial env set (-p: add to existing env; envp: TERM, etc.)
  ● uid, gid=0
  ● execle("/bin/login", "login", "-p", username, (char*)0, envp)
● getpwnam (get password file entry); getpass gets a password; use crypt/md5 to validate pwd
  ● Fail: login calls exit(1); noticed by init; respawn action
  ● Success: chdir; chown for terminal device; setgid; initgroups; initenv (HOME, SHELL, USER, PATH, ...)
  ● Setuid; then execl("/bin/sh", "-sh", 0) (2nd arg: login shell)

# Run gettys in standard runlevels
1:2345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2
3:2345:respawn:/sbin/mingetty tty3
4:2345:respawn:/sbin/mingetty tty4
5:2345:respawn:/sbin/mingetty tty5
6:2345:respawn:/sbin/mingetty tty6
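The fork-then-exec step that init, getty and login rely on can be sketched in a few lines. This is only an illustration of the pattern, not the login code path: run_in_shell is a hypothetical helper, and it assumes /bin/sh exists on the system.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs cmd under /bin/sh in a child process; returns the child's exit
   status, or -1 if fork/wait failed or the child did not exit normally. */
int run_in_shell(const char *cmd) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {                               /* child */
        execl("/bin/sh", "sh", "-c", cmd, (char *)0);
        _exit(127);                               /* reached only if exec failed */
    }
    int status;                                   /* parent: wait, as init does */
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent learning the child's exit status is what lets init notice a failed login and apply the respawn action.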

[Figure: terminal login. init forks one init per tty; each exec's getty, which exec's login. The login shell (fds 0,1,2), reached thru getty/login, talks to the user over an RS-232 cnxn via the term dev driver.]

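[Figure: network login. init does fork/exec of /bin/sh that executes /etc/rc script, which starts inetd; on a telnet req, inetd forks and exec's telnetd. The login shell (fds 0,1,2), reached thru inetd, telnetd, login, talks to the user over a netw cnxn via the term dev driver.]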

Network Logins
● Terminal device driver thru, say, RS232
  ● Shell (fd 0,1,2): user level
  ● Kernel level:
    – Line terminal disc (echo chars, assemble chars to lines, bs, C-u, gen SIGINT/SIGQUIT, C-S, C-Q, newline (CR+LF),...)
    – terminal device driver
● Network login: similar to terminal login
  ● init, inetd, telnetd/sshd, login
  ● Pseudoterminal device driver
    ● pseudoterminal is a special IPC that acts like a terminal
    ● data written to master side received by the slave side as if it was the result of a user typing at an ordinary terminal & vice versa
  ● Netw cnxn thru telnetd/sshd server & telnet/ssh client

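[Figure: pseudoterminal login. rlogind (fork; exec, exec of the login shell) uses TCP/IP thru the netw dev driver and holds the pty master; the login shell's stdin and stdout/err go thru the term disc to the pty slave. Drivers, term disc and the pty pair are in the KERNEL.]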
