Indian Inst of Science
OS09
K. Gopinath, IISc
Slides are from Tanenbaum, provided as part of his book; updated Tanenbaum slides by Darrell Long/Ethan Miller at UCSC; mine own (K. Gopinath); some papers/books too numerous to list. Hence PL. DO NOT CIRCULATE
Sleep and wakeup: Producer-Consumer problem
#include "prototypes.h"
#define N 100                  /* number of slots in the buffer */
int count = 0;                 /* number of items in the buffer */
void producer(void) {
    int item;
    while (TRUE) {
        produce_item(&item);
        if (count == N) sleep();               /* buffer full: go to sleep */
        enter_item(item);
        count = count + 1;
        if (count == 1) wakeup(consumer);      /* buffer was empty */
    }
}
void consumer(void) {
    int item;
    while (TRUE) {
        if (count == 0) sleep();               /* RACE: test and sleep are not atomic */
        remove_item(&item);
        count = count - 1;
        if (count == N - 1) wakeup(producer);  /* buffer was full */
        consume_item(item);
    }
}
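Race: the consumer reads count == 0 and is swapped out just before it calls sleep; the producer then inserts an item and sends a wakeup that is lost, because the consumer is not yet asleep. The consumer then sleeps forever, and the producer eventually fills the buffer and sleeps too.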
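A minimal sketch of the usual fix, assuming POSIX threads instead of the kernel's sleep/wakeup: the test of count and the decision to sleep happen under one mutex, and pthread_cond_wait() releases that mutex only once the caller is queued on the condition variable, so a wakeup can no longer fall into the gap. The buffer size NSLOTS, the helper run_producer_consumer() and the integer "items" are illustrative choices, not part of the slide.

```c
#include <assert.h>
#include <pthread.h>

#define NSLOTS 4                  /* small buffer (assumed), so both sides really block */

static int buffer[NSLOTS];
static int count, in, out;        /* shared state, protected by lock */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;
static int nitems;                /* items to produce in this run */
static long consumed_sum;         /* written only by the consumer */

static void *producer(void *arg) {
    (void)arg;
    for (int item = 1; item <= nitems; item++) {
        pthread_mutex_lock(&lock);
        while (count == NSLOTS)                    /* test ...                  */
            pthread_cond_wait(&not_full, &lock);   /* ... and sleep, atomically */
        buffer[in] = item;
        in = (in + 1) % NSLOTS;
        count++;
        pthread_cond_signal(&not_empty);           /* harmless if nobody waits */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    for (int i = 0; i < nitems; i++) {
        pthread_mutex_lock(&lock);
        while (count == 0)
            pthread_cond_wait(&not_empty, &lock);
        int item = buffer[out];
        out = (out + 1) % NSLOTS;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
        consumed_sum += item;                      /* "consume" outside the lock */
    }
    return NULL;
}

/* Runs one producer and one consumer over n items; returns the sum consumed. */
long run_producer_consumer(int n) {
    pthread_t p, c;
    nitems = n;
    consumed_sum = 0;
    count = in = out = 0;
    if (pthread_create(&p, NULL, producer, NULL) != 0 ||
        pthread_create(&c, NULL, consumer, NULL) != 0)
        return -1;                                 /* sketch: no cleanup on failure */
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return consumed_sum;
}
```

For example, run_producer_consumer(1000) returns 500500 (the sum 1..1000). Signalling on every insert and removal, rather than only on the count == 1 and count == N - 1 transitions, costs little and avoids reasoning about which transition matters.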
Interrupts & Kernel code
● Interrupts (or process scheduling) can occur anytime
● Interrupt handler can also call brelse just like kernel code
● However, interrupt handler should not block
– Otherwise, the process on whose (kernel) stack the interrupt handler runs blocks
● Can expose data structures in an inconsistent state
– List manipulation requires multiple steps
– Interrupt can expose intermediate state
● Interrupt handler can manipulate linked lists that kernel code could also be manipulating
– Need to raise “processor execution level” to mask interrupts (or scheduling)
– Check & sleep (or test & set) should be atomic
Buffer Allocation Algs
● getblk: given a filesystem number and disk block number, get the buffer for it locked
● brelse: given a locked buffer, wake up waiting procs and unlock it
● bread: read a given disk block into a buffer
● breada: bread + asynch. read ahead
● bwrite: write a given buffer to a disk block
● Buffer properties
– No file block in 2 different buffers
– Can be in free list or hash list: search free list if any buffer needed; hash list if a particular buffer needed
– Buf alloc safe: allocs during syscall & frees at end
– Disk drive hw problem: cannot interrupt CPU: buf lost!
– But no starvation guarantees:
Data Structures
● inode: owner, file type (REG, DIR, FIFO, CHR, BLK, ...), access perms/times, #links, disk addrs for blocks in file, file size
● incore inode: addl fields: locked?, process waiting?, dirty?, mount point?; reference count (# of opens), ptrs to other incore inodes (free and hash q)
● superblock: size of FS/inode list, #free blocks/inodes, dirty?, list/bitmap of free blocks/inodes, index of next free block/inode, locks for lists/bitmap
algorithm getblk
input: file sys #, block #
output: locked buffer that can now be used for block
while (buffer not found) {                          /* 1 */
    if (block in hash queue) {                      /* 2 */
        if (buffer busy) {                          /* 3 */
            sleep (event buffer becomes free);      /* 4 */
            continue;                               /* 5 */
        }                                           /* 6 */
        mark buffer busy;                           /* 7 */
        remove buffer from free list;               /* 8 */
        return buffer;                              /* 9 */
    } else {                                        /* 10 */
        if (there are no buffers on free list) {    /* 11 */
            sleep (event any buffer becomes free);
            continue;
        }
        remove buffer from free list;
        if (buffer marked for delayed write) {
            asynchronous write buffer to disk;
            continue;
        }
        mark buffer busy;
        remove buffer from old hash queue;
        put buffer onto new hash queue;
        return buffer;
    }
}
bread(filesystem f, block n) {
    buffer = getblk(f, n);
    if (buffer data valid) return buffer;
    initiate disk read;
    sleep(event disk read complete);
    return buffer;
}
Race Conditions
Scenario 1 (P1, P2, P3):
● P1: block b not on hash Q; no free bufs; sleep
● P2: block b not on hash Q; no free bufs; sleep
● P3: free a buf; wakeup
● P2: use freed buf
● P1: wakeup; try from beg!
Scenario 2 (P1, P2, P3):
● P1: alloc buf to block b; lock buf; init I/O; sleep until I/O done
● P2: buf locked; sleep
● P3: wait for any buf; get buf b & reassign to b'
● P2: wakeup; try again!
algorithm brelse
input: locked buffer
output: none
{
    wake up all procs: event, waiting for any buffer to become free;
    wake up all procs: event, waiting for this buffer to become free;
    raise processor execution level to block interrupts;
    if (buffer contents are valid and buffer not old)
        enqueue buffer at end of free list;
    else
        enqueue buffer at the beginning of the free list;
    lower processor execution level to allow interrupts;
    unlock(buffer);
}
bwrite(buf b) {
    initiate disk write;
    if (I/O synchronous) { sleep(event I/O complete); brelse(b); }
    else if (b marked for delayed write) mark buffer to put at head of free list;
}
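A single-threaded user-space sketch of getblk and brelse as given above, assuming a tiny cache (NBUF buffers, NHASH hash queues) and hypothetical names throughout. Where the algorithm would sleep, this version returns NULL instead; the "asynchronous" delayed write is modelled by leaving the buffer busy and marked old until the caller invokes brelse() as the I/O-completion step, which then puts it at the head of the free list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBUF  3                    /* buffers in the cache (assumed) */
#define NHASH 7                    /* hash queues (assumed) */

struct buf {
    int dev, blkno;                /* block identity; dev == -1: none yet */
    bool busy, valid, delwri, old;
    struct buf *hnext;             /* hash-queue chain */
    struct buf *fnext, *fprev;     /* doubly linked free list */
};

static struct buf bufs[NBUF];
static struct buf freelist;        /* sentinel of the circular free list */
static struct buf *hashq[NHASH];
int writes_started;                /* "asynchronous" writes begun */

static int hashfn(int dev, int blkno) { return (dev * 31 + blkno) % NHASH; }

static void free_remove(struct buf *bp) {
    bp->fprev->fnext = bp->fnext;
    bp->fnext->fprev = bp->fprev;
}
static void free_insert(struct buf *bp, bool at_head) {
    struct buf *prev = at_head ? &freelist : freelist.fprev;
    bp->fprev = prev;
    bp->fnext = prev->fnext;
    prev->fnext->fprev = bp;
    prev->fnext = bp;
}
static void hash_move(struct buf *bp, int dev, int blkno) {
    if (bp->dev != -1) {           /* remove buffer from old hash queue */
        struct buf **pp = &hashq[hashfn(bp->dev, bp->blkno)];
        while (*pp != bp) pp = &(*pp)->hnext;
        *pp = bp->hnext;
    }
    bp->dev = dev;
    bp->blkno = blkno;
    struct buf **head = &hashq[hashfn(dev, blkno)];   /* put onto new one */
    bp->hnext = *head;
    *head = bp;
}

void binit(void) {
    freelist.fnext = freelist.fprev = &freelist;
    for (int i = 0; i < NHASH; i++) hashq[i] = NULL;
    for (int i = 0; i < NBUF; i++) {
        bufs[i] = (struct buf){ .dev = -1, .blkno = -1 };
        free_insert(&bufs[i], false);
    }
    writes_started = 0;
}

/* Returns the buffer for (dev, blkno) marked busy, or NULL where the real
   getblk would sleep and then retry. */
struct buf *getblk(int dev, int blkno) {
    for (;;) {
        struct buf *bp = hashq[hashfn(dev, blkno)];
        while (bp && (bp->dev != dev || bp->blkno != blkno))
            bp = bp->hnext;
        if (bp) {                                  /* block in hash queue */
            if (bp->busy) return NULL;             /* would sleep: buffer busy */
            bp->busy = true;
            free_remove(bp);
            return bp;
        }
        if (freelist.fnext == &freelist) return NULL;   /* would sleep: none free */
        bp = freelist.fnext;                       /* least recently used */
        free_remove(bp);
        if (bp->delwri) {                          /* start async write, retry */
            bp->delwri = false;
            bp->busy = bp->old = true;             /* released by brelse() at I/O end */
            writes_started++;
            continue;
        }
        bp->busy = true;
        bp->valid = false;                         /* new identity, no data yet */
        hash_move(bp, dev, blkno);
        return bp;
    }
}

/* A kernel would also wake sleepers and block interrupts here. */
void brelse(struct buf *bp) {
    free_insert(bp, !(bp->valid && !bp->old));     /* old or invalid: head */
    bp->old = false;
    bp->busy = false;
}
```

Reading and writing the data itself are left out; a caller would set valid after a simulated read, exactly as bread() does after the disk read completes.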
Problems
● 1st prob
– P1 finds buffer is busy (line 3) & “starts” to sleep
– P1 gets preempted as P2 finishes use of buffer (thru interrupt) and releases it
– P1 now begins sleep even though buffer free
● 2nd prob
– P1 finds no free buffers (line 11)
– P1 gets preempted as P2 finishes use of buffer (thru interrupt) and releases it
– P1 now begins sleep even though buffer free
● 3rd prob
– Both getblk (line 8) and brelse manipulate free list
Atomicity violated in each case: interleaved execution of getblk with interrupt handler or with brelse
Need to block interrupts; Also: have to watch out for hardware error!
Other file ops
● iget: get a locked inode (doing bread if necessary) given inode number
● iput: release an inode; if ref count 0, writes dirty inode
● bmap: given inode and byte offset, returns disk block num and offset
● namei: given a path, get the locked inode
● ialloc: assign a new disk inode for a newly created file
● ifree: free an inode (link count 0)
● alloc: allocate a free disk block and return buffer using getblk
● free: free a disk block
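The arithmetic behind bmap can be sketched as follows, assuming a System V-style inode with 10 direct addresses plus single, double and triple indirect blocks, 1 KB blocks and 4-byte block numbers (hence 256 entries per indirect block). These parameters and the helper bmap_level() are assumptions for illustration; real file systems differ, and a real bmap would go on to read the indirect blocks.

```c
#include <assert.h>

#define BSIZE   1024LL           /* bytes per disk block (assumed) */
#define NDIRECT 10LL             /* direct block addresses in the inode (assumed) */
#define NINDIR  (BSIZE / 4)      /* 4-byte block numbers per indirect block */

/* Splits a byte offset into logical block + offset within that block, and
   returns how many levels of indirect blocks bmap must follow:
   0 = direct, 1 = single, 2 = double, 3 = triple indirect, -1 = too large. */
int bmap_level(long long offset, long long *lblk, long long *blkoff) {
    long long b = offset / BSIZE;
    *lblk = b;
    *blkoff = offset % BSIZE;
    if (b < NDIRECT) return 0;
    b -= NDIRECT;
    if (b < NINDIR) return 1;
    b -= NINDIR;
    if (b < NINDIR * NINDIR) return 2;
    b -= NINDIR * NINDIR;
    if (b < NINDIR * NINDIR * NINDIR) return 3;
    return -1;
}
```

For example, byte offset 9000 lies in logical block 8 at offset 808 and is still reached through a direct address, while logical block 10 is the first one that needs the single indirect block.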
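initsem(semaphore *sem, int val) {
    *sem = val;
}
void P(semaphore *sem) {       /* P, V and CP must each execute atomically */
    *sem -= 1;
    if (*sem < 0) sleep;       /* until a V wakes us; -*sem = # of sleepers */
}
void V(semaphore *sem) {
    *sem += 1;
    if (*sem <= 0) wakeup one thread blocked on sem;
}
boolean_t CP(semaphore *sem) { /* conditional P: never blocks */
    if (*sem > 0) { *sem -= 1; return(TRUE); } else return(FALSE);
}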
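A user-level counting semaphore with the same P/V/CP interface can be sketched on top of a pthread mutex (which makes each operation atomic) and a condition variable (for sleeping). Unlike the slide's version, the value here never goes negative, so it does not record how many threads are asleep; woken waiters simply re-check the value in a loop. The struct layout is an assumption of this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    int value;                     /* available units; never negative here */
    pthread_mutex_t m;             /* makes P, V and CP atomic */
    pthread_cond_t c;              /* where P sleeps */
} semaphore;

void initsem(semaphore *s, int val) {
    s->value = val;
    pthread_mutex_init(&s->m, NULL);
    pthread_cond_init(&s->c, NULL);
}

void P(semaphore *s) {
    pthread_mutex_lock(&s->m);
    while (s->value == 0)          /* re-check after every wakeup */
        pthread_cond_wait(&s->c, &s->m);
    s->value--;
    pthread_mutex_unlock(&s->m);
}

void V(semaphore *s) {
    pthread_mutex_lock(&s->m);
    s->value++;
    pthread_cond_signal(&s->c);    /* wake at most one sleeper */
    pthread_mutex_unlock(&s->m);
}

bool CP(semaphore *s) {            /* conditional P: never blocks */
    bool ok = false;
    pthread_mutex_lock(&s->m);
    if (s->value > 0) { s->value--; ok = true; }
    pthread_mutex_unlock(&s->m);
    return ok;
}
```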
● Mutex thru Semaphore
semaphore sem;
initsem(&sem, 1);
P(&sem);
use resource
V(&sem);
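● Event wait
semaphore event;
initsem(&event, 0);
P(&event);      /* waiting thread: blocks until the event is posted */
event processing
V(&event);      /* executed by the thread that signals the event */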
● Countable Resources
semaphore counter;
initsem(&counter, count);
P(&counter); use resource; V(&counter)
Have we solved the problem?
● P() and V() must be executed atomically
● In uniprocessor system may disable interrupts
● In multiprocessor system, use hardware synchronization primitives
– TS, FAA, etc… (test-and-set, fetch-and-add)
● Involves some limited amount of busy waiting
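A sketch of the two hardware primitives, with C11 atomics standing in for the machine instructions: atomic_flag_test_and_set() gives a test-and-set spinlock (the busy waiting mentioned above), and atomic_fetch_add() gives a counter that needs no lock at all. The thread count, iteration count and run_counters() helper are illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;
static long plain_counter;                 /* protected by the spinlock */
static atomic_long faa_counter;            /* updated with fetch-and-add */

static void spin_acquire(void) {
    /* test-and-set returns the old value: busy-wait while it was already set */
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        ;
}

static void spin_release(void) {
    atomic_flag_clear_explicit(&lock, memory_order_release);
}

static void *worker(void *arg) {
    int iters = *(int *)arg;
    for (int i = 0; i < iters; i++) {
        spin_acquire();
        plain_counter++;                   /* critical section */
        spin_release();
        atomic_fetch_add(&faa_counter, 1); /* atomic on its own */
    }
    return NULL;
}

/* Runs nthreads workers (at most 16); returns 1 if both counters are exact. */
int run_counters(int nthreads, int iters) {
    pthread_t t[16];
    if (nthreads < 1 || nthreads > 16) return 0;
    plain_counter = 0;
    atomic_store(&faa_counter, 0);
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&t[i], NULL, worker, &iters) != 0) return 0;
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    long expect = (long)nthreads * iters;
    return plain_counter == expect && atomic_load(&faa_counter) == expect;
}
```

Without spin_acquire()/spin_release() the plain increment would be a read-modify-write race and the final count could come up short.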
Simulation of a monitor with semaphores
typedef int semaphore;
semaphore mutex = 1;
void enter_monitor(void) {
    down(mutex);
}
void leave_normally(void) {
    up(mutex);
}
void leave_with_signal(semaphore c) {
    /* signal on c & exit monitor: mutex is not released here; */
    /* the thread woken from down(c) continues inside the monitor */
    up(c);
}
void wait(semaphore c) {
    up(mutex);
    down(c);
}
Java Monitors
● void wait(); Enter a monitor's wait set until notified by another thread
● void wait(long timeout); Enter a monitor's wait set until notified by another thread or timeout milliseconds elapses
● void wait(long timeout, int nanos); Enter a monitor's wait set until notified by another thread or timeout milliseconds plus nanos nanoseconds elapses
● void notify(); Wake up one thread waiting in the monitor's wait set. (If no threads are waiting, do nothing.)
● void notifyAll(); Wake up all threads waiting in the monitor's wait set. (If no threads are waiting, do nothing.)
Java (contd)
● Each Java monitor has a single anonymous condition variable on which a thread can wait(), signal one waiting thread with notify(), or signal all waiting threads with notifyAll().
● This condition variable is associated with the object's lock, which must be obtained whenever a thread calls a synchronized method of the object.
● wait(), notify() and notifyAll() may be called only while holding that lock (inside a synchronized method or block); otherwise IllegalMonitorStateException is thrown.
● Static methods can also be synchronized: a lock associated with the class must be obtained when a static synchronized method is called.
Problems with Semaphores
● Too complex?
● Needs low-level atomic op to construct; blocking & unblocking involve context switches and manipulate scheduler and sleep Qs
● Good for resources held for long times, not for short
● Good as V only wakes up if someone can run
● But this can result in convoys
– Low priority process P1 that has locked an imp lock (L) is preempted by P2, which then waits for L
– Imp lock: often the log lock in transactional systems
– P3 also needs L, P4 also, ... all wait
● P1 scheduled again (FIFO) & unlocks L
● P2 gets lock (P1 preempted), P2 uses lock, then P3, ...
● For next update, P2 goes back to Q again, then P3, P4, ...
● Lock-unlock: 100's of insts; lock-wait-dispatch-unlock: 1000's
Semaphore Problem?
● 1: T2 (on processor P2) in cs (using a sem): blocks T3; T4 on run Q
● 2: T2 exits cs but stays active; T3 now gets sem but is not running
● 3: T1 (on processor P1) now wants to enter cs but is blocked by T3
● T1 blocked even if no one in cs! FIFO property!
● T4 scheduled on P1
– T1 & T3 cannot run unless T2 or T4 give up
– Processor 1: T1 > T1 > T4
– Processor 2: T2 > T2 > T2
● Problem in step 2:
– Have to make sure that T3 does not get sem but is only put on the ready Q. T1 will then get sem & no context sw.
● Need different semantics: e.g., condition variables
Message Passing: Mailboxes, Ports, CSP
Send/Receive; Blocking/nonblocking
typedef int message[MSIZE];
void producer(void) {
    int item;
    message m;                         /* message buffer */
    while (TRUE) {
        produce_item(&item);
        receive(consumer, &m);         /* wait for an empty message */
        build_message(&m, item);
        send(consumer, &m);
    }
}
void consumer(void) {
    int item, i; message m;
    for (i = 0; i < N; i++) send(producer, &m);   /* send N empties */
    while (TRUE) {
        receive(producer, &m);
        extract_item(&m, &item);
        send(producer, &m);            /* send back an empty reply */
        consume_item(item);
    }
}
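The credit scheme above (the consumer first sends N empty messages, and the producer sends only after receiving an empty one) can be sketched with two POSIX pipes standing in for the mailboxes and two threads standing in for the processes. send_msg(), receive_msg(), run_mailbox() and the values of N and MSIZE are hypothetical choices for this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

#define N 8                        /* empty messages in circulation (assumed) */
#define MSIZE 4                    /* ints per message (assumed) */
typedef int message[MSIZE];

static int to_consumer[2], to_producer[2];   /* pipe fds: [0] read, [1] write */
static int nitems;
static long total;                 /* written only by the consumer thread */

/* One fixed-size message per write; writes under PIPE_BUF are atomic. */
static void send_msg(int fd, message *m) {
    ssize_t r = write(fd, m, sizeof *m);
    (void)r;                       /* sketch: errors ignored */
}
/* Blocks until a whole message has been read. */
static void receive_msg(int fd, message *m) {
    size_t got = 0;
    while (got < sizeof *m) {
        ssize_t r = read(fd, (char *)m + got, sizeof *m - got);
        if (r <= 0) return;        /* EOF or error: give up (sketch) */
        got += (size_t)r;
    }
}

static void *producer(void *arg) {
    (void)arg;
    message m;
    for (int item = 1; item <= nitems; item++) {
        receive_msg(to_producer[0], &m);     /* wait for an empty */
        m[0] = item;                         /* build_message */
        send_msg(to_consumer[1], &m);
    }
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    message m = {0};
    for (int i = 0; i < N; i++)
        send_msg(to_producer[1], &m);        /* prime with N empties */
    for (int i = 0; i < nitems; i++) {
        receive_msg(to_consumer[0], &m);
        int item = m[0];                     /* extract_item */
        send_msg(to_producer[1], &m);        /* return the empty */
        total += item;                       /* consume_item */
    }
    return NULL;
}

/* Runs both sides over n items; returns the sum the consumer received. */
long run_mailbox(int n) {
    pthread_t p, c;
    if (pipe(to_consumer) != 0 || pipe(to_producer) != 0) return -1;
    nitems = n;
    total = 0;
    if (pthread_create(&p, NULL, producer, NULL) != 0 ||
        pthread_create(&c, NULL, consumer, NULL) != 0)
        return -1;                           /* sketch: no cleanup */
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    close(to_consumer[0]); close(to_consumer[1]);
    close(to_producer[0]); close(to_producer[1]);
    return total;
}
```

No shared memory or lock is needed between the two sides: at most N full messages can be outstanding, because the producer blocks in receive_msg() until an empty comes back.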
Fork & fork1 in MT processes
● Process with exactly 1 LWP => same semantics as “old Unix” process
● Copy all LWPs on fork? Solaris 9 but not POSIX
– One LWP blocked in parent: what about in child? Restart? Concurrent syscalls? EINTR or wait(disk)?
– One LWP has open netw cnxn: if closed, unexpected user msg to remote node
– One LWP changing a shared data structure: corruption thru the new copy of LWP? How to make a “consistent” copy?
● Copy only calling LWP? fork1: Solaris 10; good for exec'ing
– Some user thrs not on LWPs that were in parent
– Child process should not try to acq locks held by LWPs not in child (deadlock!) but user code cannot know! These locks may be held by ulib
POSIX fork1
● fork1(): only calling LWP created in child
● Registration of fork handlers (_atfork)
– prepare: prior to fork in the ctxt of calling LWP. LIFO
– parent: after fork. FIFO
– child: after fork in context of 1 thr in child. FIFO
– LIFO/FIFO order to enable preserving of locking order
● int pthread_atfork(void (*prepare) (void), void (*parent) (void), void (*child) (void));
● Handles orphaned mutexes
– prepare fork handlers lock all mutexes (by calling thr)
– parent/child fork handlers unlock mutexes
● Indep libs & appl progs can protect themselves
– lib provides fork handlers
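A minimal sketch of a library protecting itself with pthread_atfork(), assuming a hypothetical library whose state lib_state is guarded by lib_lock. The prepare handler takes the lock before fork(), so the state copied into the child is consistent, and both post-fork handlers release it. pthread_once() keeps the handlers from being registered twice; a second registration would make prepare lock the same default mutex twice and deadlock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical library state, consistent whenever lib_lock is free. */
static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;
static int lib_state = 0;
static pthread_once_t once = PTHREAD_ONCE_INIT;

static void prepare(void) { pthread_mutex_lock(&lib_lock); }   /* before fork */
static void parent(void)  { pthread_mutex_unlock(&lib_lock); } /* in parent   */
static void child(void)   { pthread_mutex_unlock(&lib_lock); } /* in child    */

static void register_handlers(void) { pthread_atfork(prepare, parent, child); }

void lib_init(void) { pthread_once(&once, register_handlers); }

void lib_update(int v) {
    pthread_mutex_lock(&lib_lock);
    lib_state = v;
    pthread_mutex_unlock(&lib_lock);
}

/* Forks; the child checks that lib_lock is not orphaned and exits with
   lib_state. Returns the child's exit status, or -1 on failure. */
int fork_and_probe(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        if (pthread_mutex_trylock(&lib_lock) != 0) _exit(255);
        int s = lib_state;
        pthread_mutex_unlock(&lib_lock);
        _exit(s);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

After lib_init() and lib_update(7), fork_and_probe() should return 7. This driver does not itself race another thread against fork(); it only shows the registration pattern that makes such a race harmless.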
Fork and threads
● Thr A: locks mutex; modifies shared data
● Thr B: fork
● Thr B_ch (child): copy of locked mutex and inconsistent data struct.
/* cannot drop mutex as data inconsistent nor can it take mutex: deadlock */
/* memory leaks also! */
Solutions?
● Programs that use fork() call an exec function soon afterwards in the child process, thus resetting all states
– In the meantime, only a short list of async-signal-safe library routines are promised to be available
● But not good wrt multi-threaded (MT) libraries
– Applications may not be aware that an MT library is in use, and feel free to call any number of library routines between the fork() and exec calls. They may be extant single-threaded programs that cannot be expected to obey new restrictions imposed by the threads library.
– An MT library needs a way to protect its internal state during fork() in case it is re-entered later in the child process, e.g. MT I/O libraries, which are invoked between the fork() and exec calls to effect I/O redirection.
Fork handling
● Lock global mutexes
– Other threads locked out of the critical regions of code protected by these mutexes
– Can take snapshot: copy of valid, stable data
● Reset synchr objects in the child process
– Ensures they are properly cleansed of any artifacts from the threading subsystem of parent process
– E.g. a mutex may inherit a wait queue of threads waiting for the lock; this wait queue makes no sense in child. Initialize mutex to remedy (deletes unnecessary data structures in child). Otherwise memory leaks!
● But how to correct or otherwise deal with the inconsistent state in the child?
With pthread_atfork: no orphaned locks!
prepare: lock(mutex)
parent: unlock(mutex)
child: unlock(mutex)
● Thr A: locks mutex; modifies shared data
● Thr B: attempts fork but blocked as prepare (lock mutex) blocked
● Thr A: drops mutex (shared data now consistent!)
● Thr B: prepare succeeds (locks mutex); fork completes; unlocks mutex (parent)
● Thr B_ch: unlocks mutex (child)
Solutions
0: pthread_atfork: provides MT libraries with a means to protect themselves from innocent appls that call fork(), and provides MT appls with a std mech for protecting themselves from fork() calls in a lib routine or the appl itself. But COMPLEX!!! Avoid problems by
1: If possible, fork before creating any threads
2: Instead of fork, create a new thread. If forking to exec a binary, can attempt to convert binary to a shared lib that can be linked to.
3: Try a surrogate parent method. Fork at init time; the child will be a "surrogate" parent that will remain single-threaded. When exec is needed, child is informed and it does a fork/exec
Posix Model of Concurrency
● Creation
– pthread_create(tp, attrp, fptr, argp)
– pthread_attr_xxx(): manipulate attr of a thread: init/destroy; set/get detachstate, inheritsched, schedparam, schedpolicy, scope, stackaddr, stacksize
● Exit
– pthread_exit(retvalp)
– pthread_join(t, &retvalp): wait for another thread's termination
– pthread_detach(t): storage for thread can be reclaimed when thread terminates (no zombie)
● Thread Specific Data (indexed by key)
– pthread_key_create(keyp, fpdestructor)/_delete()
– pthread_setspecific()/_getspecific(): mapping betw key and thread-specific value
● Signal
– pthread_sigmask(how, newmask, saveprev): change signal mask for calling thread
– pthread_kill(t, sig); sigwait: suspend thr till sig
● ID
– pthread_self()
– pthread_equal(t1, t2)
– pthread_once(oncep, fptr): ensure some init at most once
● Scheduling
– pthread_setschedparam()/_getschedparam()
● Cancellation (cancellation pts: _join, _cond_wait, _cond_timedwait, sem_wait, sigwait, _testcancel)
– pthread_cancel(t) by others / pthread_testcancel(void) by self
– pthread_setcancelstate()/_setcanceltype()
– pthread_cleanup_push()/_pop(): if a thread exits or is cancelled (with locked mutexes?), cleanup handlers executed; LIFO order
● Mutex
– pthread_mutex_init()/_destroy()
– pthread_mutexattr_xxx(): init/destroy; set/get pshared, protocol, prioceiling
– pthread_mutex_setprioceiling()/_getprioceiling()
– pthread_mutex_lock()/_trylock()/_unlock()
● Condition Variable
– pthread_cond_init()/_destroy()
– pthread_condattr_xxx(): init/destroy; set/get pshared
– pthread_cond_wait()/_timedwait()
– pthread_cond_signal()
– pthread_cond_broadcast()
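How cancellation points and cleanup handlers fit together can be sketched as follows: the waiter holds a mutex while blocked in pthread_cond_wait() (a cancellation point), and the handler pushed with pthread_cleanup_push() releases it. POSIX specifies that when a condition wait is cancelled the mutex is re-acquired before the first cleanup handler runs, so unlocking in the handler is correct. All names are illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static bool ready = false;          /* never set: the waiter is cancelled instead */

static void unlock_mutex(void *arg) {
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *waiter(void *arg) {
    (void)arg;
    pthread_mutex_lock(&m);
    pthread_cleanup_push(unlock_mutex, &m);   /* runs if cancelled below */
    while (!ready)
        pthread_cond_wait(&cv, &m);           /* cancellation point */
    pthread_cleanup_pop(1);                   /* 1: also run it on normal exit */
    return NULL;
}

/* Cancels a thread blocked in pthread_cond_wait(); returns 1 if it ended as
   PTHREAD_CANCELED and its cleanup handler left the mutex unlocked. */
int cancel_demo(void) {
    pthread_t t;
    void *res;
    if (pthread_create(&t, NULL, waiter, NULL) != 0) return 0;
    pthread_cancel(t);                        /* deferred: acted on in cond_wait */
    pthread_join(t, &res);
    if (res != PTHREAD_CANCELED) return 0;
    if (pthread_mutex_trylock(&m) != 0) return 0;   /* would fail if orphaned */
    pthread_mutex_unlock(&m);
    return 1;
}
```

Without the push/pop pair, the cancelled thread would exit still owning m, and every later lock attempt would block forever.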
Condition variables
int x, y;
pthread_mutex_t mut =PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
// (waiter) Wait until x is greater than y
pthread_mutex_lock(&mut);
while (x <= y) pthread_cond_wait(&cond, &mut);
/* operate on x and y */
pthread_mutex_unlock(&mut);
// (signaller) Signal if modifications on x and y make x > y
pthread_mutex_lock(&mut);
/* modify x and y */
if (x > y) pthread_cond_broadcast(&cond);
pthread_mutex_unlock(&mut);
// (waiter) if timeout also
struct timeval now;
struct timespec timeout;
int retcode;
pthread_mutex_lock(&mut);
gettimeofday(&now, NULL);
timeout.tv_sec = now.tv_sec + 5;
timeout.tv_nsec = now.tv_usec * 1000;
retcode = 0;
while (x <= y && retcode != ETIMEDOUT)
retcode = pthread_cond_timedwait(&cond, &mut, &timeout);
if (retcode == ETIMEDOUT) {/* timeout occurred */}
else { /* operate on x and y */}
pthread_mutex_unlock(&mut);
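A runnable version of the timed waiter, assuming CLOCK_REALTIME (the default clock of a condition variable) and a timeout given in milliseconds; the absolute deadline is normalised so tv_nsec stays below one second. wait_x_gt_y(), make_x_gt_y() and signalled_wait() are hypothetical helpers around the slide's x, y, mut and cond.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

static int x = 0, y = 0;
static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

/* Waiter: returns 0 once x > y, or ETIMEDOUT after timeout_ms. */
int wait_x_gt_y(long timeout_ms) {
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);          /* absolute deadline */
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {             /* normalise */
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }
    int rc = 0;
    pthread_mutex_lock(&mut);
    while (x <= y && rc == 0)          /* spurious wakeups return 0: loop again */
        rc = pthread_cond_timedwait(&cond, &mut, &deadline);
    if (x > y) rc = 0;                 /* predicate may hold despite a timeout */
    pthread_mutex_unlock(&mut);
    return rc;
}

/* Signaller, run in its own thread. */
static void *make_x_gt_y(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mut);
    x = y + 1;
    if (x > y) pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&mut);
    return NULL;
}

/* Starts a signaller thread, then waits (up to 5 s) as the waiter. */
int signalled_wait(void) {
    pthread_t t;
    if (pthread_create(&t, NULL, make_x_gt_y, NULL) != 0) return -1;
    int rc = wait_x_gt_y(5000);
    pthread_join(t, NULL);
    return rc;
}
```

Because the predicate is tested before every wait, the waiter behaves correctly whether the signaller runs before or after it starts waiting.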
● Semaphore
– sem_init()/_destroy()
– sem_open()/_close()
– sem_wait()/_trywait()
– sem_post()
– sem_getvalue()
– sem_unlink()
● fork() Clean Up Handling
– pthread_atfork()
● Async safe? Some pthread calls not safe to call from sig handlers
– A user thr lib may have taken a lock to ensure, say, that only one user is changing Qs. If the handler then calls pthread_mutex_lock, etc., it may deadlock
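The POSIX unnamed-semaphore calls map onto the "countable resources" pattern from earlier: sem_init() sets the number of units, sem_trywait() behaves like CP(), and sem_post() is V(). This sketch assumes Linux/glibc (some platforms do not implement unnamed semaphores), and grab_units() is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

/* Makes n_requests non-blocking attempts on a pool of `units`; returns how
   many succeeded and stores the remaining value in *left. */
int grab_units(unsigned units, int n_requests, int *left) {
    sem_t pool;
    int got = 0;
    if (sem_init(&pool, 0, units) != 0) return -1;   /* 0: shared by threads only */
    for (int i = 0; i < n_requests; i++) {
        if (sem_trywait(&pool) == 0) got++;           /* like CP(): never blocks */
        else if (errno != EAGAIN) break;              /* real error */
    }
    sem_getvalue(&pool, left);
    for (int i = 0; i < got; i++) sem_post(&pool);    /* V() each unit back */
    sem_destroy(&pool);
    return got;
}
```

Replacing sem_trywait() with sem_wait() gives the blocking P(); in a single thread that would hang on the fourth request, which is why the sketch uses the non-blocking form.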
Signals:
● Oldest IPC method used by UNIX systems to signal asynchronous events. ONLY 1 BIT INFO!
● Can be generated by a keyboard interrupt or an error condition or by other processes in the system (if they have the correct privileges)
– Kernel & superuser can send a signal to any process
– A process can also send a signal to other processes with the same uid
● Processes can handle signals themselves or allow kernel to handle
– If kernel handles the signal, default action for the signal: e.g., SIGFPE causes core dump and causes the process to exit
– SIGSTOP (causes a process to halt its execution) and SIGKILL handled only by kernel
● List of signals on a Linux/Intel machine: SIGHUP SIGINT SIGQUIT SIGILL SIGTRAP SIGIOT SIGBUS SIGFPE SIGKILL SIGUSR1 SIGSEGV SIGUSR2 SIGPIPE SIGALRM SIGTERM SIGCHLD SIGCONT SIGSTOP SIGTSTP SIGTTIN SIGTTOU SIGURG SIGXCPU SIGXFSZ SIGVTALRM SIGPROF SIGWINCH SIGIO SIGPWR
Signals (cont’d)
● void (*signal(int signo, void (*func) (int))) (int)
– = typedef void Sigfunc(int); Sigfunc *signal(int, Sigfunc *);
● signal is a func that returns a ptr to a func that returns void (the previous handler)
– Or, sighandler_t signal(int signum, sighandler_t handler);
● Linux implements signals using information stored in the task_struct of the process:
– struct sigpending pending: currently pending signals
– blocked: mask of blocked signals
– struct signal_struct *sig has array of sigactions that holds info about how the process handles each signal
● Signals generated by setting appropriate bit in signal field of pending. If not blocked, scheduler will run handler in the next system scheduling.
● Every time a process exits from a system call, the signal and blocked fields are checked, and if there is any unblocked signal, the handler is called.
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
/* err_sys()/err_dump(): error helpers from Stevens' APUE ("apue.h") */
static void sig_usr(int); /* one handler for both signals */
int main(void) {
    if (signal(SIGUSR1, sig_usr) == SIG_ERR)
        err_sys("can't catch SIGUSR1");
    if (signal(SIGUSR2, sig_usr) == SIG_ERR)
        err_sys("can't catch SIGUSR2");
    for ( ; ; ) pause();
}
static void sig_usr(int signo) { /* argument is signal number */
    /* printf() is not async-signal-safe; tolerable only in a toy like this */
    if (signo == SIGUSR1) printf("received SIGUSR1\n");
    else if (signo == SIGUSR2) printf("received SIGUSR2\n");
    else err_dump("received signal %d\n", signo);
    return;
}
No queuing for non-real-time signals!
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
void SigIntHandler(int signo);
int main(void) {
    int childPid, i;
    sigblock(sigmask(SIGINT));          /* BSD: block SIGINT (inherited by child) */
    signal(SIGINT, SigIntHandler);
    childPid = fork();
    if (childPid > 0) { /* parent */
        for (i = 0; i < 10; i++) kill(childPid, SIGINT);
        printf("Parent has issued %d signals to the child\n", i);
    } else { /* child */
        sleep(2);  /* 2 secs: all 10 signals arrive while blocked and merge into one */
        while (1) sigpause(0);          /* BSD: unblock all & wait */
    }
}
void SigIntHandler(int signo) {
    printf("Child : received a signal\n");
}
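The same no-queuing effect can be shown in one process with the POSIX calls listed next: block SIGUSR1, raise it several times, confirm with sigpending() that it is pending, then unblock it. The pending set holds one bit per signal, so the handler should run once. SIGUSR1 and the helper name raise_while_blocked() are choices made for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t deliveries = 0;

static void on_usr1(int signo) { (void)signo; deliveries++; }

/* Raises SIGUSR1 n times while it is blocked, then unblocks it.
   Returns how many times the handler actually ran. */
int raise_while_blocked(int n) {
    struct sigaction sa;
    sigset_t block, old, pend;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0) return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &old);
    deliveries = 0;
    for (int i = 0; i < n; i++)
        raise(SIGUSR1);                 /* stays pending; repeats are merged */
    sigpending(&pend);
    if (!sigismember(&pend, SIGUSR1)) return -1;
    sigprocmask(SIG_SETMASK, &old, NULL);   /* the one pending signal is delivered here */
    return deliveries;
}
```

POSIX requires that a pending signal unblocked by sigprocmask() be delivered before the call returns, which is why deliveries can be read right afterwards.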
signal: V7, SVR2/3/4 (handler uninstalled, no blocking of signals, no autostart of interrupted system calls)
sigset, sighold, sigrelse, sigignore, sigpause: SVR3/4 (no autostart)
signal, sigvec, sigblock, sigsetmask (unblock a signal), sigpause: 4.x BSD (autostart 4.2; default 4.3/4.4)
sigaction, sigprocmask, sigpending, sigsuspend: autostart unspecified (POSIX.1), optional (SVR4, 4.3/4.4BSD, Linux)
sigprocmask: change the list of currently blocked signals
sigpending: allows examination of pending signals (ones which have been raised while blocked)
sigsuspend: replaces the signal mask with the given one & suspends the process until a signal is caught
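int sigaction(int signo, const struct sigaction *act, struct sigaction *oact);
struct sigaction {
    void (*sa_handler)(int);   /* handler, SIG_DFL or SIG_IGN */
    sigset_t sa_mask;          /* addl signals to block */
    int sa_flags;              /* restart?, alt stack?, waitchild?, uninstall handler? ... */
    void (*sa_sigaction)(int, siginfo_t *, void *);   /* used instead with SA_SIGINFO */
};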
Unreliable signals
old V7 code: race with a new signal for process before signal reinstalled
int sig_int();
...
signal(SIGINT, sig_int);
...
sig_int() {
/* another signal can come here! can cause default action */
signal(SIGINT, sig_int);
...
}
Another race
int sig_int_flag;
main() {
    int sig_int();
    ...
    signal(SIGINT, sig_int);
    ...
    while (sig_int_flag == 0)
        /* signal can come here! */
        pause();
    ...
}
sig_int() { signal(SIGINT, sig_int); sig_int_flag = 1; }
SysV: int sighold(int sig); int sigrelse(int sig);
sighold(SIGQUIT); sighold(SIGINT);
/* critical section */
sigrelse(SIGINT); sigrelse(SIGQUIT);
int sig_int_flag;
main() {
    int sig_int();
    ...
    signal(SIGINT, sig_int);
    ...
    sighold(SIGINT);
    while (sig_int_flag == 0)
        sigpause(SIGINT);   /* atomically release signal and pause: wait for a signal to occur */
    ...
Restarting of interrupted system calls by signals: 4.3BSD
Can only call reentrant functions within signal handlers
int oldmask; /* SIGQUIT: quit key + core image; SIGINT: interrupt key ^C */
oldmask = sigblock(sigmask(SIGQUIT) | sigmask(SIGINT)); /* block SIGQUIT/INT */
/* critical section */
sigsetmask(oldmask); /* reset to old mask */
int sig_int_flag;
main() {
    int sig_int();
    ...
    signal(SIGINT, sig_int);
    ...
    sigblock(sigmask(SIGINT)); /* sigblock returns mask before */
    while (sig_int_flag == 0)
        sigpause(0); /* wait for signal to occur */
    /* sigpause(0) is not the same as sigsetmask + pause: signal can come in betw */
    /* process signal... */
    ...
}
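The POSIX counterpart of sigpause() is sigsuspend(), which atomically installs a mask and pauses. A sketch, using SIGUSR1 in place of SIGINT so it can be driven with raise(), and deliberately sending the signal inside the window (after blocking, before the wait) that breaks the pause() version. It assumes SIGUSR1 is not already blocked on entry; wait_for_flag() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t sig_int_flag = 0;

static void sig_int(int signo) { (void)signo; sig_int_flag = 1; }

/* Race-free wait for the handler to set sig_int_flag. */
int wait_for_flag(void) {
    struct sigaction sa;
    sigset_t block, oldmask;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sig_int;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &oldmask);  /* close the window first */

    sig_int_flag = 0;
    raise(SIGUSR1);                  /* arrives "between the test and the pause" */
    while (sig_int_flag == 0)
        sigsuspend(&oldmask);        /* atomically unblock + pause */
    sigprocmask(SIG_SETMASK, &oldmask, NULL);
    return sig_int_flag;
}
```

With pause() in place of sigsuspend(), a signal landing between the flag test and the pause would set the flag too late and the loop could sleep forever; here the signal stays pending until sigsuspend() unblocks it.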
Executing Signal Handlers in Linux
● On signal (either from kernel or another process), ker checks some conditions (disp, etc.) before calling do_signal
● do_signal runs in kernel mode while the (user) signal handler runs in user mode
● After signal handler run, kernel code executed further
– However, ker stack no longer contains hw context of interrupted program, as ker stack is emptied on return to user mode
– Also, sig handlers can reenter kernel (syscalls, etc.)
● Solution: copy hw context saved in ker stack to user stack of curr process
– When sig handler terminates, sigreturn syscall automatically invoked to copy hw context back to kernel stack & restore the user stack
– Sigframe struct pushed on stack has some code for calling sigreturn: stack has to be executable!!!
pselect
● int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *utimeout);
● int pselect(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, const struct timespec *ntimeout, const sigset_t *sigmask);
● pselect() used to wait for a signal as well as data from an fd
– Programs that receive signals as events normally use the signal handler only to raise a global flag.
– The global flag indicates that the event must be processed in the main loop of the program.
– A caught signal will cause select()/pselect() to return -1 with errno set to EINTR. This behavior is essential so that signals can be processed in the main loop of the program; otherwise select() would block indefinitely.
Race condition
● Somewhere in the main loop, a conditional checks the global flag.
● What if a signal arrives after the conditional, but before the select() call?
● select() would block indefinitely, even though an event is actually pending.
● pselect() closes this window: keep the signal blocked while checking the flag, and let pselect() atomically install sigmask (with the signal unblocked) only for the duration of the wait.
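A sketch of that race-free pattern, assuming a pipe that never receives data as the watched fd and SIGUSR1 as the event: the signal is blocked before the flag is checked, it arrives right after the check, and pselect() atomically installs the original (unblocking) mask for the wait, so it returns -1 with EINTR at once instead of sleeping. With plain select() the same sequence would sleep until the 2 s safety timeout. pselect_demo() and the variable names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;
static void handler(int signo) { (void)signo; got_signal = 1; }

/* Returns pselect()'s result and stores errno in *err. */
int pselect_demo(int *err) {
    int fds[2];
    struct sigaction sa;
    sigset_t block, orig;
    fd_set rd;
    struct timespec ts = { 2, 0 };            /* safety timeout: 2 s */

    if (pipe(fds) != 0) return -2;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &orig);    /* block before testing the flag */

    got_signal = 0;
    int rc = 0;
    if (!got_signal) {                        /* main-loop flag check */
        raise(SIGUSR1);                       /* arrives after the check ... */
        FD_ZERO(&rd);
        FD_SET(fds[0], &rd);                  /* ... nothing is ever written */
        rc = pselect(fds[0] + 1, &rd, NULL, NULL, &ts, &orig);
        *err = errno;
    }
    sigprocmask(SIG_SETMASK, &orig, NULL);
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

The handler has run by the time pselect() returns, so the main loop can check got_signal again and process the event.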
pselect example
● udp broadcast
Linux Concurrency Model
● Within appl: clones (incl threads & processes of other systems)
● Inside kernel:
– Kernel threads: do not have USER context
– Deferrable and interruptible ker funcs:
– Softirq: reentrant: multiple softirqs of the same type can be run concurrently on several CPUs. No dyn alloc! Have to be statically defined at compile time.
– Tasklet: multiple tasklets of the same type cannot run concurrently on several CPUs. Dyn alloc OK! Can be allocated and initialized at run time (loadable modules). Impl thru softirqs
– Bottom Half: multiple bottom halves cannot be run concurrently on several CPUs. No dyn alloc! Impl thru tasklets
● Across HW: IPI
Spinlocks & Semaphores
● Shared data betw different parts of code in kernel
– Most common: access to data structures shared between user process context and interrupt context
● In uniprocessor system: mutual excl by setting and clearing interrupts + flags
● SMP: three types of spinlocks: vanilla (basic), read-write, big-reader
– Read-write spinlocks when many readers and few writers, e.g. access to the list of registered filesystems
– Big-reader spinlocks: a form of read-write spinlocks optimized for very light read access, with penalty for writes; limited number of big-reader spinlock users; used in networking part of the kernel
● Semaphores: two types: basic and read-write semaphores. Different from IPC's
– Mutex or counting; up() & down(); interruptible / non-interruptible
Spinlocks (cont’d)
● A good example of using spinlocks: accessing a data structure shared betw a user context and an interrupt handler
spinlock_t my_lock = SPIN_LOCK_UNLOCKED;
my_ioctl() {                      // _ioctl: definitely process context,
    spin_lock_irq(&my_lock);      // and known that interrupts enabled!
    /* critical section */        // hence _irq, to disable interrupts
    spin_unlock_irq(&my_lock);
}
my_irq_handler() {                // _irq_handler: definitely interrupt context,
    spin_lock(&my_lock);          // hence known that intr disabled!
    /* critical section */        // can use the simpler lock
    spin_unlock(&my_lock);
}
● spin_lock: if interrupts disabled or no race with interrupt context
● spin_lock_irq: if interrupts enabled and have to be disabled
● spin_lock_irqsave: if interrupt state not known
● Basic premise of a spin lock: one thread busy-waits on a resource on one processor while another thread uses it on another processor (only true for MP). But code has to work for 1 or more processors. If all threads are on 1 processor and a thread tries to take a spin lock that is already held by another thread, deadlock.
● Never give up CPU when holding a spinlock!
Linux 2.4 buffer cache
struct buffer_head * getblk(kdev_t dev, int block, int size) {
    for (;;) {
        struct buffer_head * bh;
        bh = get_hash_table(dev, block, size);
        if (bh)
            return bh;
        if (!grow_buffers(dev, block, size))
            free_more_memory();
    }
}
struct buffer_head * get_hash_table(kdev_t dev, int block, int size) {
    struct buffer_head *bh, **p = &hash(dev, block);
    read_lock(&hash_table_lock);
    for (;;) {
        bh = *p;
        if (!bh) break;
        p = &bh->b_next;
        if (bh->b_blocknr != block) continue;
        if (bh->b_size != size) continue;
        if (bh->b_dev != dev) continue;
        get_bh(bh);
        break;
    }
    read_unlock(&hash_table_lock);
    return bh;
}
static inline void get_bh(struct buffer_head * bh) {
    atomic_inc(&(bh)->b_count);
}
#define HASHDEV(dev) ((unsigned int) (dev))
#define hash(dev,block) hash_table[(_hashfn(HASHDEV(dev),block) & bh_hash_mask)]
/* bh_hash_mask = (nr_hash - 1) */
/* lock hierarchy: lru_list_lock > hash_table_lock > unused_list_lock */
Linux down
static inline void down(struct semaphore * sem) {
__asm__ __volatile__( "# atomic down operation\n\t"
LOCK "decl %0\n\t" /* --sem->count */
"js 2f\n"
"1:\n"
LOCK_SECTION_START("")
"2:\tcall __down_failed\n\t"
"jmp 1b\n"
LOCK_SECTION_END
:"=m" (sem->count)
:"c" (sem)
:"memory");
}
asm( ".text\n" ".align 4\n"
".globl __down_failed\n"
"__down_failed:\n\t"
#if defined(CONFIG_FRAME_POINTER)
"pushl %ebp\n\t" "movl %esp,%ebp\n\t"
#endif
"pushl %eax\n\t"
"pushl %edx\n\t"
"pushl %ecx\n\t"
"call __down\n\t"
"popl %ecx\n\t"
"popl %edx\n\t"
"popl %eax\n\t"
#if defined(CONFIG_FRAME_POINTER)
"movl %ebp,%esp\n\t" "popl %ebp\n\t"
#endif
"ret" );
void __down(struct semaphore * sem) {
    struct task_struct *tsk = current;
    DECLARE_WAITQUEUE(wait, tsk);
    tsk->state = TASK_UNINTERRUPTIBLE;
    add_wait_queue_exclusive(&sem->wait, &wait);
    spin_lock_irq(&semaphore_lock);
    sem->sleepers++;
    for (;;) {
        int sleepers = sem->sleepers;
        /* Add "everybody else" into it. They aren't
         * playing, because we own the spinlock. */
        if (!atomic_add_negative(sleepers - 1, &sem->count)) {
            sem->sleepers = 0;
            break;
        }
        sem->sleepers = 1;    /* us - see -1 above */
        spin_unlock_irq(&semaphore_lock);
        schedule();
        tsk->state = TASK_UNINTERRUPTIBLE;
        spin_lock_irq(&semaphore_lock);
    }
    spin_unlock_irq(&semaphore_lock);
    remove_wait_queue(&sem->wait, &wait);
    tsk->state = TASK_RUNNING;
    wake_up(&sem->wait);
}
Analysis
● Semaphore open: count=1, sleepers=0: down makes count 0; __down not executed
● Semaphore closed & no sleeping processes: count=0, sleepers=0 => count -1 & sleepers 1
– Each iteration checks if count negative
– Negative: schedule() & check again
– Otherwise: sleepers=0; wakeup another (but Q empty)
● Semaphore closed & other sleeping processes: count, sleepers (-1,1) => (-2, 1)
– Sleepers temporarily 2, count becomes -1 again
– Checks if count still negative as holding process may V
– If negative: schedule
– Not negative:
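To check the count/sleepers arithmetic by hand, the logic of down(), __down() and up() can be modelled single-threaded. This is a hypothetical model, not kernel code: waiting and holders are bookkeeping fields the real struct semaphore does not have, and each woken sleeper is assumed to run one pass of the __down() loop before anything else happens.

```c
#include <assert.h>

/* count/sleepers as in struct semaphore; waiting/holders: model-only. */
struct msem { int count, sleepers, waiting, holders; };

/* One pass of the __down() loop, done while holding semaphore_lock. */
static int down_pass(struct msem *s) {
    int sleepers = s->sleepers;
    s->count += sleepers - 1;      /* add "everybody else" into count */
    if (s->count >= 0) {           /* !atomic_add_negative(): we own it */
        s->sleepers = 0;
        return 1;
    }
    s->sleepers = 1;               /* just us; back to schedule() */
    return 0;
}

/* wake_up(): one exclusive sleeper runs a pass; if it gets the semaphore it
   leaves the queue and, like the tail of __down(), wakes the next one. */
static void wake_one(struct msem *s) {
    if (s->waiting > 0 && down_pass(s)) {
        s->waiting--;
        s->holders++;
        wake_one(s);
    }
}

/* Returns 1 if the caller now holds the semaphore, 0 if it went to sleep. */
int m_down(struct msem *s) {
    if (--s->count >= 0) {         /* LOCK decl; js not taken */
        s->holders++;
        return 1;
    }
    s->sleepers++;                 /* slow path: __down() */
    if (down_pass(s)) {
        s->holders++;
        wake_one(s);
        return 1;
    }
    s->waiting++;
    return 0;
}

void m_up(struct msem *s) {
    s->holders--;
    if (++s->count <= 0)           /* LOCK incl; jle taken: __up() */
        wake_one(s);
}
```

Starting from an open semaphore, A, B and C call m_down() in turn: A gets it, B and C sleep, and the state is count = -1, sleepers = 1. When A calls m_up(), B's pass succeeds and B wakes C, whose pass sends it back to sleep, leaving count = -1, sleepers = 1 again.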
#define DECLARE_WAITQUEUE(name, tsk) wait_queue_t name = __WAITQUEUE_INITIALIZER(name, tsk)
#define __WAITQUEUE_INITIALIZER(name, tsk) { task: tsk, task_list: { NULL, NULL }, __WAITQUEUE_DEBUG_INIT(name)}
static spinlock_t semaphore_lock = SPIN_LOCK_UNLOCKED;
struct semaphore {
    atomic_t count;
    int sleepers;
    wait_queue_head_t wait;
#if WAITQUEUE_DEBUG
    long __magic;
#endif
};
#define spin_lock_irq(lock) do { local_irq_disable(); spin_lock(lock); } while (0)
#define local_irq_disable() __cli()
struct __wait_queue {
    unsigned int flags;
#define WQ_FLAG_EXCLUSIVE 0x01
    struct task_struct * task;
    struct list_head task_list;
#if WAITQUEUE_DEBUG
    long __magic;
    long __waker;
#endif
};
typedef struct __wait_queue wait_queue_t;
static inline void up(struct semaphore * sem) {
__asm__ __volatile__( "# atomic up operation\n\t"
LOCK "incl %0\n\t" /* ++sem->count */
"jle 2f\n"
"1:\n"
LOCK_SECTION_START("")
"2:\tcall __up_wakeup\n\t"
"jmp 1b\n"
LOCK_SECTION_END
".subsection 0\n"
:"=m" (sem->count)
:"c" (sem)
:"memory");
}
asm(
".text\n" ".align 4\n"
".globl __up_wakeup\n"
"__up_wakeup:\n\t"
"pushl %eax\n\t"
"pushl %edx\n\t"
"pushl %ecx\n\t"
"call __up\n\t"
"popl %ecx\n\t"
"popl %edx\n\t"
"popl %eax\n\t"
"ret");
#define wake_up(x) __wake_up((x),TASK_UNINTERRUPTIBLE |TASK_INTERRUPTIBLE, 1)
void __up(struct semaphore *sem) {
wake_up(&sem->wait);
}
void __wake_up(wait_queue_head_t *q, unsigned int mode, int nr_exclusive) {
unsigned long flags;
if (unlikely(!q)) return;
spin_lock_irqsave(&q->lock, flags);
__wake_up_common(q, mode, nr_exclusive, 0);
spin_unlock_irqrestore(&q->lock, flags);
}
static inline void __wake_up_common(wait_queue_head_t *q, unsigned int mode, int nr_exclusive, int sync) {
struct list_head *tmp; unsigned int state; wait_queue_t *curr; task_t *p;
list_for_each(tmp, &q->task_list) {
curr = list_entry(tmp, wait_queue_t, task_list);
p = curr->task;
state = p->state;
if ((state & mode) && try_to_wake_up(p, sync) && ((curr->flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)) break;
}
}
Process Tree
● init reads /etc/inittab
● Opens tty
● fd 0,1,2 set to dev
● Login printed
● Read user name
● Initial env set (-p: add to existing env; envp: TERM, etc.)
● uid, gid = 0
● execle("/bin/login", "login", "-p", username, (char *)0, envp)
● getpwnam (get password file entry); getpass (get a password); use crypt/md5 to validate pwd
● Fail: login calls exit(1); noticed by init; respawn action
● Success: chdir; chown for terminal device; setgid; initgroups; initenv (HOME, SHELL, USER, PATH, ...)
● setuid; then execl("/bin/sh", "-sh", (char *)0) (2nd arg: login shell)
# Run gettys in standard runlevels
1:2345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2
3:2345:respawn:/sbin/mingetty tty3
4:2345:respawn:/sbin/mingetty tty4
5:2345:respawn:/sbin/mingetty tty5
6:2345:respawn:/sbin/mingetty tty6
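[Figure: terminal login. init forks one process per tty, which execs getty; getty execs login, and login execs the login shell. The shell's fds 0,1,2 (set up thru getty/login) go to the terminal device driver, connected to the user over RS-232.]
[Figure: network login. init fork/execs /bin/sh, which executes the /etc/rc script and starts inetd; on a telnet request inetd forks and execs telnetd, which leads (thru login) to the login shell. The shell's fds 0,1,2 (set up thru inetd, telnetd, login) reach the user over the network connection.]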
Network Logins
● Terminal device driver thru, say, RS-232
– Shell (fd 0,1,2): user level
– Kernel level:
– Terminal line discipline (echo chars, assemble chars to lines, backspace, Ctrl-U, gen SIGINT/SIGQUIT, Ctrl-S, Ctrl-Q, newline (CR+LF), ...)
– Terminal device driver
● Network login: similar to terminal login
– init, inetd, telnetd/sshd, login
– Pseudo-terminal device driver
● A pseudo-terminal is a special IPC that acts like a terminal
– Data written to the master side is received by the slave side as if it was the result of a user typing at an ordinary terminal & vice versa
● Netw cnxn thru telnetd/sshd server & telnet/ssh client
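[Figure: network login via rlogind. rlogind exchanges data with the network device driver over TCP/IP and relays it to the pty master; after fork and exec, exec, the login shell's stdin/stdout/stderr are the pty slave, which sits under the terminal line discipline in the KERNEL.]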
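The master/slave behaviour of a pseudo-terminal can be sketched with the POSIX calls posix_openpt(), grantpt(), unlockpt() and ptsname(): bytes written to the master pass through the slave's line discipline and are read from the slave as if typed. This assumes a Linux-like system with /dev/ptmx and the slave in its default canonical mode, so read() returns one complete line; pty_roundtrip() is a hypothetical helper.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes `line` to the pty master ("typing" it) and reads it back from the
   slave side. Returns the number of bytes read, or -1 on error. */
ssize_t pty_roundtrip(const char *line, char *out, size_t outsz) {
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0) return -1;
    if (grantpt(master) != 0 || unlockpt(master) != 0) {
        close(master);
        return -1;
    }
    const char *name = ptsname(master);          /* e.g. /dev/pts/N */
    int slave = name ? open(name, O_RDWR | O_NOCTTY) : -1;
    if (slave < 0) {
        close(master);
        return -1;
    }
    ssize_t n = -1;
    size_t len = strlen(line);
    if (write(master, line, len) == (ssize_t)len)
        n = read(slave, out, outsz);             /* canonical mode: one line */
    close(slave);
    close(master);
    return n;
}
```

Because echo is on by default, the line discipline also echoes the input back toward the master; a telnetd- or sshd-style server would read that echo from the master and send it to the remote client.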