Multiprocessors and Multithreading – classroom slides

Date post: 01-Jan-2016
Upload: suki-winters
Transcript
Page 1: Multiprocessors and Multithreading – classroom slides

Multiprocessors and Multithreading – classroom slides

Page 2: Multiprocessors and Multithreading – classroom slides

Example use of threads - 1

[Figure: (a) Sequential process, with labels: compute, I/O, I/O result needed, compute. (b) Multithreaded process, with labels: compute thread, I/O request, I/O thread, I/O complete, I/O result needed.]

Page 3: Multiprocessors and Multithreading – classroom slides

Example use of threads - 2

[Figure: Digitizer, Tracker, Alarm]

Page 4: Multiprocessors and Multithreading – classroom slides

Programming Support for Threads

• creation
  – pthread_create(top-level procedure, args)
• termination
  – return from top-level procedure
  – explicit kill
• rendezvous
  – creator can wait for children
    • pthread_join(child_tid)
• synchronization
  – mutex
  – condition variables

[Figure: (a) Before thread creation: the main thread reaches thread_create(foo, args). (b) After thread creation: the main thread and the foo thread.]

Page 5: Multiprocessors and Multithreading – classroom slides

Sample program – thread create/join

int foo(int n)
{
    .....
    return 0;
}

int main()
{
    int f;
    thread_type child_tid;
    .....
    child_tid = thread_create (foo, &f);
    .....
    thread_join(child_tid);
}
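The sample uses a generic thread_create/thread_join interface. A minimal sketch of the same create/join pattern with POSIX threads might look as follows. The helper name run_child and the doubling done inside foo are assumptions made for illustration; they stand in for the elided work.

```c
#include <assert.h>
#include <pthread.h>

/* Top-level procedure of the child thread. pthreads requires the
 * signature void *(*)(void *), so the int is passed by pointer. */
static void *foo(void *arg)
{
    int *n = arg;
    *n = *n * 2;            /* hypothetical stand-in for the elided work */
    return NULL;
}

/* Hypothetical helper: create one child running foo, wait for it,
 * and return the value the child left in f. */
int run_child(int value)
{
    pthread_t child_tid;
    int f = value;

    int rc = pthread_create(&child_tid, NULL, foo, &f);
    assert(rc == 0);
    rc = pthread_join(child_tid, NULL);   /* rendezvous with the child */
    assert(rc == 0);
    return f;                             /* safe to read after the join */
}
```

Calling run_child(21) creates the child, joins it, and returns the doubled value. Reading f is safe only because the join has completed.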

Page 6: Multiprocessors and Multithreading – classroom slides

Programming with Threads

• synchronization
  – for coordination of the threads
• communication
  – for inter-thread sharing of data
  – threads can be in different processors
  – how to achieve sharing in SMP?
    • software: accomplished by keeping all threads in the same address space by the OS
    • hardware: accomplished by hardware shared memory and coherent caches

[Figure: producer and consumer sharing a buffer]

Page 7: Multiprocessors and Multithreading – classroom slides

Need for Synchronization

digitizer()
{
    image_type dig_image;
    int tail = 0;
    loop {
        if (bufavail > 0) {
            grab(dig_image);
            frame_buf[tail mod MAX] = dig_image;
            tail = tail + 1;
            bufavail = bufavail - 1;
        }
    }
}

tracker()
{
    image_type track_image;
    int head = 0;
    loop {
        if (bufavail < MAX) {
            track_image = frame_buf[head mod MAX];
            head = head + 1;
            bufavail = bufavail + 1;
            analyze(track_image);
        }
    }
}

Problem?

Page 8: Multiprocessors and Multithreading – classroom slides

Shared data structure

[Figure: frame_buf has slots 0 to 99. head marks the first valid filled frame in frame_buf; tail marks the first empty spot in frame_buf. bufavail is shared: the digitizer executes bufavail = bufavail - 1; and the tracker executes bufavail = bufavail + 1;]
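Each update of bufavail is really a load, a modify, and a store, so the two unsynchronized updates can interleave. The sketch below replays one unlucky interleaving deterministically in a single thread. The function name lost_update and the two "register" variables are hypothetical, introduced only to make the steps visible.

```c
#include <assert.h>

/* Replays one bad interleaving of
 *   digitizer: bufavail = bufavail - 1;
 *   tracker:   bufavail = bufavail + 1;
 * and returns the final value of bufavail. */
int lost_update(int bufavail)
{
    assert(bufavail > 0);               /* mirrors the digitizer's guard */

    int digitizer_reg = bufavail;       /* digitizer loads bufavail      */
    int tracker_reg = bufavail;         /* tracker loads the same value  */

    digitizer_reg = digitizer_reg - 1;  /* digitizer computes new value  */
    tracker_reg = tracker_reg + 1;      /* tracker computes new value    */

    bufavail = digitizer_reg;           /* digitizer stores              */
    bufavail = tracker_reg;             /* tracker overwrites that store */

    return bufavail;
}
```

Starting from 5, one decrement plus one increment should leave 5, but this interleaving leaves 6: the digitizer's update is lost.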

Page 9: Multiprocessors and Multithreading – classroom slides

Synchronization Primitives

• lock and unlock
  – mutual exclusion among threads
  – busy-waiting Vs. blocking
  – pthread_mutex_trylock: no blocking
  – pthread_mutex_lock: blocking
  – pthread_mutex_unlock
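A minimal sketch of the difference between the blocking and non-blocking lock calls, assuming a default (non-recursive) pthread mutex. The helper names try_update and update are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t buflock = PTHREAD_MUTEX_INITIALIZER;

/* Non-blocking: returns 1 if the lock was free and the update was done,
 * 0 if the lock was busy (the caller can go do something else). */
int try_update(int *shared)
{
    int rc = pthread_mutex_trylock(&buflock);
    if (rc == EBUSY)
        return 0;
    assert(rc == 0);
    *shared += 1;
    pthread_mutex_unlock(&buflock);
    return 1;
}

/* Blocking: the caller waits until the lock becomes free. */
void update(int *shared)
{
    pthread_mutex_lock(&buflock);
    *shared += 1;
    pthread_mutex_unlock(&buflock);
}
```

pthread_mutex_trylock returns EBUSY immediately when the mutex is already held, which is what lets a thread poll instead of block.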

Page 10: Multiprocessors and Multithreading – classroom slides

Fix number 1 – with locks

digitizer()
{
    image_type dig_image;
    int tail = 0;
    loop {
        thread_mutex_lock(buflock);
        if (bufavail > 0) {
            grab(dig_image);
            frame_buf[tail mod MAX] = dig_image;
            tail = tail + 1;
            bufavail = bufavail - 1;
        }
        thread_mutex_unlock(buflock);
    }
}

tracker()
{
    image_type track_image;
    int head = 0;
    loop {
        thread_mutex_lock(buflock);
        if (bufavail < MAX) {
            track_image = frame_buf[head mod MAX];
            head = head + 1;
            bufavail = bufavail + 1;
            analyze(track_image);
        }
        thread_mutex_unlock(buflock);
    }
}

Problem?

Page 11: Multiprocessors and Multithreading – classroom slides

Fix number 2

digitizer()
{
    image_type dig_image;
    int tail = 0;
    loop {
        grab(dig_image);
        thread_mutex_lock(buflock);
        while (bufavail == 0) do nothing;
        thread_mutex_unlock(buflock);
        frame_buf[tail mod MAX] = dig_image;
        tail = tail + 1;
        thread_mutex_lock(buflock);
        bufavail = bufavail - 1;
        thread_mutex_unlock(buflock);
    }
}

tracker()
{
    image_type track_image;
    int head = 0;
    loop {
        thread_mutex_lock(buflock);
        while (bufavail == MAX) do nothing;
        thread_mutex_unlock(buflock);
        track_image = frame_buf[head mod MAX];
        head = head + 1;
        thread_mutex_lock(buflock);
        bufavail = bufavail + 1;
        thread_mutex_unlock(buflock);
        analyze(track_image);
    }
}

Problem?

Page 12: Multiprocessors and Multithreading – classroom slides

Fix number 3

digitizer()
{
    image_type dig_image;
    int tail = 0;
    loop {
        grab(dig_image);
        while (bufavail == 0) do nothing;
        frame_buf[tail mod MAX] = dig_image;
        tail = tail + 1;
        thread_mutex_lock(buflock);
        bufavail = bufavail - 1;
        thread_mutex_unlock(buflock);
    }
}

tracker()
{
    image_type track_image;
    int head = 0;
    loop {
        while (bufavail == MAX) do nothing;
        track_image = frame_buf[head mod MAX];
        head = head + 1;
        thread_mutex_lock(buflock);
        bufavail = bufavail + 1;
        thread_mutex_unlock(buflock);
        analyze(track_image);
    }
}

Problem?

Page 13: Multiprocessors and Multithreading – classroom slides

• condition variables
  – pthread_cond_wait: block for a signal
  – pthread_cond_signal: signal one waiting thread
  – pthread_cond_broadcast: signal all waiting threads

Page 14: Multiprocessors and Multithreading – classroom slides

Wait and signal with cond vars

[Figure: (a) Wait before signal: T1 calls cond_wait (c, m) and is blocked; T2 calls cond_signal (c); T1 is resumed. (b) Wait after signal: T2 calls cond_signal (c) first; T1 then calls cond_wait (c, m) and is blocked forever.]

Page 15: Multiprocessors and Multithreading – classroom slides

Fix number 4 – cond var

digitizer()
{
    image_type dig_image;
    int tail = 0;
    loop {
        grab(dig_image);
        thread_mutex_lock(buflock);
        if (bufavail == 0)
            thread_cond_wait(buf_not_full, buflock);
        thread_mutex_unlock(buflock);
        frame_buf[tail mod MAX] = dig_image;
        tail = tail + 1;
        thread_mutex_lock(buflock);
        bufavail = bufavail - 1;
        thread_cond_signal(buf_not_empty);
        thread_mutex_unlock(buflock);
    }
}

tracker()
{
    image_type track_image;
    int head = 0;
    loop {
        thread_mutex_lock(buflock);
        if (bufavail == MAX)
            thread_cond_wait(buf_not_empty, buflock);
        thread_mutex_unlock(buflock);
        track_image = frame_buf[head mod MAX];
        head = head + 1;
        thread_mutex_lock(buflock);
        bufavail = bufavail + 1;
        thread_cond_signal(buf_not_full);
        thread_mutex_unlock(buflock);
        analyze(track_image);
    }
}

This solution is correct so long as there is exactly one producer and one consumer
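A runnable sketch of this fix with POSIX threads, still for exactly one producer and one consumer. Several things here are assumptions made for illustration: frames are plain ints, MAX is 4, the loops are bounded by nframes, the "analysis" is a running sum, and run_pipeline is a hypothetical driver. The wait uses while rather than if, which also guards against spurious wakeups.

```c
#include <assert.h>
#include <pthread.h>

#define MAX 4                       /* assumed small capacity */

static int frame_buf[MAX];
static int bufavail = MAX;          /* number of empty slots */
static long tracked_sum;            /* hypothetical analysis result */
static pthread_mutex_t buflock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t buf_not_full = PTHREAD_COND_INITIALIZER;
static pthread_cond_t buf_not_empty = PTHREAD_COND_INITIALIZER;

static void *digitizer(void *arg)
{
    int nframes = *(int *)arg;
    int tail = 0;
    for (int i = 1; i <= nframes; i++) {      /* "grab" frame i */
        pthread_mutex_lock(&buflock);
        while (bufavail == 0)                 /* buffer full: wait */
            pthread_cond_wait(&buf_not_full, &buflock);
        pthread_mutex_unlock(&buflock);
        frame_buf[tail % MAX] = i;            /* slot is known to be empty */
        tail = tail + 1;
        pthread_mutex_lock(&buflock);
        bufavail = bufavail - 1;
        pthread_cond_signal(&buf_not_empty);
        pthread_mutex_unlock(&buflock);
    }
    return NULL;
}

static void *tracker(void *arg)
{
    int nframes = *(int *)arg;
    int head = 0;
    for (int i = 0; i < nframes; i++) {
        pthread_mutex_lock(&buflock);
        while (bufavail == MAX)               /* buffer empty: wait */
            pthread_cond_wait(&buf_not_empty, &buflock);
        pthread_mutex_unlock(&buflock);
        int track_image = frame_buf[head % MAX];
        head = head + 1;
        pthread_mutex_lock(&buflock);
        bufavail = bufavail + 1;
        pthread_cond_signal(&buf_not_full);
        pthread_mutex_unlock(&buflock);
        tracked_sum += track_image;           /* "analyze" the frame */
    }
    return NULL;
}

/* Hypothetical driver: runs one digitizer and one tracker to completion
 * and returns the sum of all frames the tracker saw. */
long run_pipeline(int nframes)
{
    pthread_t d, t;
    tracked_sum = 0;
    int rc = pthread_create(&d, NULL, digitizer, &nframes);
    assert(rc == 0);
    rc = pthread_create(&t, NULL, tracker, &nframes);
    assert(rc == 0);
    pthread_join(d, NULL);
    pthread_join(t, NULL);
    return tracked_sum;
}
```

Copying the frame outside the lock is safe here only because a single producer owns tail and a single consumer owns head; with more threads of either kind, the slot indices themselves would need protection.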

Page 16: Multiprocessors and Multithreading – classroom slides

Gotchas in programming with cond vars

acquire_shared_resource()
{
    thread_mutex_lock(cs_mutex);                      /* T3 is here */
    if (res_state == BUSY)
        thread_cond_wait (res_not_busy, cs_mutex);    /* T2 is here */
    res_state = BUSY;
    thread_mutex_unlock(cs_mutex);
}

release_shared_resource()
{
    thread_mutex_lock(cs_mutex);
    res_state = NOT_BUSY;
    /* T1 is here */
    thread_cond_signal(res_not_busy);
    thread_mutex_unlock(cs_mutex);
}

Page 17: Multiprocessors and Multithreading – classroom slides

State of waiting queues

(a) Waiting queues before T1 signals
    cs_mutex: T3
    res_not_busy: T2

(b) Waiting queues after T1 signals
    cs_mutex: T3, T2
    res_not_busy: (empty)

Page 18: Multiprocessors and Multithreading – classroom slides

Defensive programming – retest predicate

acquire_shared_resource()
{
    thread_mutex_lock(cs_mutex);                      /* T3 is here */
    while (res_state == BUSY)
        thread_cond_wait (res_not_busy, cs_mutex);    /* T2 is here */
    res_state = BUSY;
    thread_mutex_unlock(cs_mutex);
}

release_shared_resource()
{
    thread_mutex_lock(cs_mutex);
    res_state = NOT_BUSY;
    /* T1 is here */
    thread_cond_signal(res_not_busy);
    thread_mutex_unlock(cs_mutex);
}
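A runnable sketch of the retest pattern with POSIX threads. The worker, the holders/max_holders counters, and the run_workers driver are hypothetical instrumentation added to check that at most one thread ever holds the resource; they are not part of the slide.

```c
#include <assert.h>
#include <pthread.h>

enum { NOT_BUSY, BUSY };

static int res_state = NOT_BUSY;
static int holders, max_holders;    /* hypothetical instrumentation */
static pthread_mutex_t cs_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t res_not_busy = PTHREAD_COND_INITIALIZER;

void acquire_shared_resource(void)
{
    pthread_mutex_lock(&cs_mutex);
    while (res_state == BUSY)       /* retest the predicate after waking */
        pthread_cond_wait(&res_not_busy, &cs_mutex);
    res_state = BUSY;
    pthread_mutex_unlock(&cs_mutex);
}

void release_shared_resource(void)
{
    pthread_mutex_lock(&cs_mutex);
    res_state = NOT_BUSY;
    pthread_cond_signal(&res_not_busy);
    pthread_mutex_unlock(&cs_mutex);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        acquire_shared_resource();
        holders++;                  /* only the current holder runs this */
        if (holders > max_holders)
            max_holders = holders;
        holders--;
        release_shared_resource();
    }
    return NULL;
}

/* Hypothetical driver: returns the largest number of simultaneous
 * holders observed across three competing threads. */
int run_workers(void)
{
    pthread_t t[3];
    holders = 0;
    max_holders = 0;
    for (int i = 0; i < 3; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    return max_holders;
}
```

With while, a woken waiter that finds the resource already taken by a thread that got cs_mutex first (T3 in the slide) simply goes back to waiting instead of proceeding.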

Page 19: Multiprocessors and Multithreading – classroom slides

Threads as software structuring abstraction

[Figure: (a) Dispatcher model: a mailbox, a dispatcher, and workers. (b) Team model: a mailbox. (c) Pipelined model: a mailbox and stages.]

Page 20: Multiprocessors and Multithreading – classroom slides

Threads and OS

Traditional OS

• DOS
  – memory layout
  – protection between user and kernel?

[Figure: DOS memory layout. User: program and data. Kernel: DOS code and data.]

Page 21: Multiprocessors and Multithreading – classroom slides

• Unix
  – memory layout
  – protection between user and kernel?
  – PCB?

[Figure: Unix memory layout. User: processes P1 and P2, each with its own process code and data. Kernel: kernel code and data, with one PCB per process.]

Page 22: Multiprocessors and Multithreading – classroom slides

• programs in these traditional OS are single threaded
  – one PC per program (process), one stack, one set of CPU registers
  – if a process blocks (say disk I/O, network communication, etc.) then no progress for the program as a whole

Page 23: Multiprocessors and Multithreading – classroom slides

MT Operating Systems

How widespread is support for threads in OS?

• Digital Unix, Sun Solaris, Win95, Win NT, Win XP

Process Vs. Thread?

• in a single threaded program, the state of the executing program is contained in a process

• in a MT program, the state of the executing program is contained in several ‘concurrent’ threads

Page 24: Multiprocessors and Multithreading – classroom slides

Process Vs. Thread

– computational state (PC, regs, …) for each thread
– how different from process state?

[Figure: User: process P1 with threads T1, T2, T3 and process P2 with thread T1; each process has its own code and data. Kernel: kernel code and data, with one PCB per process.]

Page 25: Multiprocessors and Multithreading – classroom slides

[Figure: memory layout. (a) ST program: code, global, heap, stack. (b) MT program: code, global, heap, stack1, stack2, stack3, stack4.]
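The layout can be observed from a program: globals exist once per process, while each thread's locals live on its own stack. The sketch below is one way to check this with POSIX threads. All names (body, stacks_are_private, local_addr, and so on) are hypothetical, and the threads are held until all have arrived so that their locals are live at the same time.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NTHREADS 4

static int global_counter;               /* one copy, shared by all threads */
static uintptr_t local_addr[NTHREADS];   /* where each thread's local lives */
static int arrived;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t all_here = PTHREAD_COND_INITIALIZER;

static void *body(void *arg)
{
    int id = *(int *)arg;
    int local = id;                      /* on this thread's own stack */

    pthread_mutex_lock(&m);
    global_counter++;                    /* every thread sees the same global */
    local_addr[id] = (uintptr_t)&local;
    arrived++;
    if (arrived == NTHREADS)
        pthread_cond_broadcast(&all_here);
    while (arrived < NTHREADS)           /* stay alive until all have arrived */
        pthread_cond_wait(&all_here, &m);
    pthread_mutex_unlock(&m);
    return NULL;
}

/* Returns 1 if every thread's local variable had a distinct address. */
int stacks_are_private(void)
{
    pthread_t tid[NTHREADS];
    int ids[NTHREADS];

    global_counter = 0;
    arrived = 0;
    for (int i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        int rc = pthread_create(&tid[i], NULL, body, &ids[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);

    for (int i = 0; i < NTHREADS; i++)
        for (int j = i + 1; j < NTHREADS; j++)
            if (local_addr[i] == local_addr[j])
                return 0;
    return 1;
}
```

After a run, global_counter equals NTHREADS because all threads incremented the single shared copy, while the recorded addresses of local differ from thread to thread.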

Page 26: Multiprocessors and Multithreading – classroom slides

• threads
  – share address space of process
  – cooperate to get job done
• threads concurrent?
  – may be if the box is a true multiprocessor
  – share the same CPU on a uniprocessor
• threaded code different from non-threaded?
  – protection for data shared among threads
  – synchronization among threads

Page 27: Multiprocessors and Multithreading – classroom slides

Threads Implementation

• user level threads
  – OS independent
  – scheduler is part of the runtime system
  – thread switch is cheap (save PC, SP, regs)
  – scheduling customizable, i.e., more app control
  – blocking call by thread blocks process

Page 28: Multiprocessors and Multithreading – classroom slides

[Figure: user level threads. User: processes P1, P2, P3; a process's threads library keeps its threads (T1, T2, T3), its thread ready_q, and its mutex and cond_var state. Kernel: the process ready_q holding P1, P2, P3.]

Page 29: Multiprocessors and Multithreading – classroom slides

[Figure: User: process P1 with its threads library and threads T1, T2, T3. The currently executing thread makes a blocking call to the OS; the kernel makes an upcall to the threads library.]

Page 30: Multiprocessors and Multithreading – classroom slides

• solution to blocking problem in user level threads
  – non-blocking version of all system calls
  – polling wrapper in scheduler for such calls
• switching among user level threads
  – yield voluntarily
  – how to make preemptive?
    • timer interrupt from kernel to switch
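A sketch of the non-blocking call plus polling wrapper idea on a POSIX system, using read on a descriptor in O_NONBLOCK mode. The names make_nonblocking, try_read, and WOULD_BLOCK are hypothetical; a user-level scheduler would call something like try_read and switch to another thread when it reports that the call would have blocked.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum { WOULD_BLOCK = -2 };      /* hypothetical "try again later" code */

/* Puts fd into non-blocking mode; returns 0 on success, -1 on error. */
int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Polling wrapper: returns the number of bytes read, WOULD_BLOCK if no
 * data is available yet, or -1 on any other error. It never blocks the
 * whole process when fd is in non-blocking mode. */
ssize_t try_read(int fd, void *buf, size_t len)
{
    assert(len > 0);            /* a zero-length read is ambiguous with EOF */
    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return WOULD_BLOCK;
    return n;
}
```

On an empty pipe try_read returns WOULD_BLOCK instead of suspending the process; once data has been written it returns the byte count.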

Page 31: Multiprocessors and Multithreading – classroom slides

• Kernel level
  – expensive thread switch
  – makes sense for blocking calls by threads
  – kernel becomes complicated: process vs. threads scheduling
  – thread packages become non-portable
• problems common to user and kernel level threads
  – libraries
  – solution is to have thread-safe wrappers to such library calls

Page 32: Multiprocessors and Multithreading – classroom slides

[Figure: kernel level threads. User: processes P1, P2, P3 with their threads. Kernel: a process level scheduler with the process ready_q (P1, P2, P3) and a thread level scheduler for the threads of each process.]

Page 33: Multiprocessors and Multithreading – classroom slides

Solaris threads

[Figure: User: processes P1, P2, P3 with their threads. Kernel: lwp (lightweight process) entities.]

Page 34: Multiprocessors and Multithreading – classroom slides

Thread safe libraries

/* original version */
void *malloc(size_t size)
{
    ......
    ......
    return(memory_pointer);
}

/* thread safe version */
mutex_lock_type cs_mutex;
void *malloc(size_t size)
{
    thread_mutex_lock(cs_mutex);
    ......
    ......
    thread_mutex_unlock(cs_mutex);
    return (memory_pointer);
}
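The same wrapper idea as a runnable sketch. Instead of malloc itself, it wraps a hypothetical, deliberately simple bump allocator (arena_alloc_unsafe) whose shared state would be corrupted by concurrent callers. All names here are assumptions for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static char arena[1 << 16];
static size_t arena_used;           /* shared allocator state */
static pthread_mutex_t cs_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Original version: not safe to call from several threads at once,
 * because the read and update of arena_used can interleave. */
static void *arena_alloc_unsafe(size_t size)
{
    if (size > sizeof arena - arena_used)
        return NULL;
    void *memory_pointer = &arena[arena_used];
    arena_used += size;
    return memory_pointer;
}

/* Thread safe version: same routine wrapped in a mutex. */
void *arena_alloc(size_t size)
{
    pthread_mutex_lock(&cs_mutex);
    void *memory_pointer = arena_alloc_unsafe(size);
    pthread_mutex_unlock(&cs_mutex);
    return memory_pointer;
}

static void *alloc_many(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        void *p = arena_alloc(8);
        assert(p != NULL);
    }
    return NULL;
}

/* Hypothetical driver: four threads allocate concurrently; returns the
 * number of arena bytes handed out. */
size_t run_allocators(void)
{
    pthread_t t[4];
    arena_used = 0;
    for (int i = 0; i < 4; i++) {
        int rc = pthread_create(&t[i], NULL, alloc_many, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return arena_used;
}
```

Because every allocation goes through the locked wrapper, four threads making 1000 allocations of 8 bytes each account for exactly 32000 bytes, with no lost updates to arena_used.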

Page 35: Multiprocessors and Multithreading – classroom slides

Synchronization support

• Lock
  – Test and set instruction
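A sketch of a busy-waiting lock built on test-and-set, here expressed with C11's atomic_flag rather than a specific machine instruction. The names spin_lock, spin_unlock, and run_spinners are hypothetical, and the counter exists only to check mutual exclusion.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NSPINNERS 4
#define NITER 10000

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;
static long counter;                /* protected by lock_flag */

static void spin_lock(void)
{
    /* test-and-set atomically sets the flag and returns its old value;
     * an old value of "set" means another thread holds the lock. */
    while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire))
        ;                           /* busy-wait */
}

static void spin_unlock(void)
{
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

static void *spinner(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITER; i++) {
        spin_lock();
        counter = counter + 1;      /* critical section */
        spin_unlock();
    }
    return NULL;
}

/* Hypothetical driver: returns the final counter value. */
long run_spinners(void)
{
    pthread_t t[NSPINNERS];
    counter = 0;
    for (int i = 0; i < NSPINNERS; i++) {
        int rc = pthread_create(&t[i], NULL, spinner, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NSPINNERS; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

This is the busy-waiting flavor of lock; a blocking mutex would instead put the waiting thread to sleep.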

Page 36: Multiprocessors and Multithreading – classroom slides

SMP

[Figure: several CPUs and input/output attached to a shared bus, which connects them to shared memory]

Page 37: Multiprocessors and Multithreading – classroom slides

SMP with per-processor caches

[Figure: several CPUs, each with its own cache, attached to a shared bus that connects them to shared memory]

Page 38: Multiprocessors and Multithreading – classroom slides

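Cache consistency problem

[Figure: threads T1, T2, T3 run on processors P1, P2, P3. Each processor's cache holds a copy of X; the processors reach shared memory over the shared bus.]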

Page 39: Multiprocessors and Multithreading – classroom slides

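Two possible solutions

[Figure: (b) write-invalidate protocol: P1's cached copy changes from X to X' and an invalidate goes out on the shared bus; the copies of X in the caches of P2 and P3 become invalid. (c) write-update protocol: P1's cached copy changes from X to X' and an update goes out on the shared bus; the copies in the caches of P2 and P3 also change from X to X'.]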

Page 40: Multiprocessors and Multithreading – classroom slides

Given the following details about an SMP (symmetric multiprocessor):

Cache coherence protocol: write-invalidate
Cache to memory policy: write-back

Initially:
  The caches are empty
  Memory locations:
    A contains 10
    B contains 5

Consider the following timeline of memory accesses from processors P1, P2, and P3. Contents of caches and memory?

Time (in increasing order)

Processor P1 Processor P2 Processor P3

T1 Load A

T2 Load A

T3 Load A

T4 Store #40, A

T5 Store #30, B
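The transcript does not preserve which processor issues each access, so the sketch below ASSUMES one possible assignment purely for illustration: P1, P2, and P3 load A at T1, T2, and T3, then P1 stores 40 to A at T4 and P2 stores 30 to B at T5. It models a simple invalid/shared/modified scheme with write-invalidate and write-back (and write-allocate on a store miss, also an assumption). All names are hypothetical.

```c
#include <assert.h>

enum { NPROC = 3, NLOC = 2 };
enum loc { A, B };
enum state { INVALID, SHARED, MODIFIED };   /* INVALID == 0: empty caches */

struct line { enum state st; int val; };

static int memory[NLOC] = { 10, 5 };        /* A contains 10, B contains 5 */
static struct line cache[NPROC][NLOC];      /* zero-initialized: all INVALID */

/* Bus snoop on behalf of processor p: any other cache holding x in
 * MODIFIED state writes it back, then every other valid copy of x
 * moves to state `next`. */
static void snoop(int p, enum loc x, enum state next)
{
    for (int q = 0; q < NPROC; q++) {
        if (q == p)
            continue;
        if (cache[q][x].st == MODIFIED)
            memory[x] = cache[q][x].val;    /* write-back on demand */
        if (cache[q][x].st != INVALID)
            cache[q][x].st = next;
    }
}

int load(int p, enum loc x)
{
    assert(p >= 0 && p < NPROC);
    if (cache[p][x].st == INVALID) {        /* miss: fetch a shared copy */
        snoop(p, x, SHARED);
        cache[p][x].st = SHARED;
        cache[p][x].val = memory[x];
    }
    return cache[p][x].val;
}

void store(int p, enum loc x, int v)
{
    assert(p >= 0 && p < NPROC);
    snoop(p, x, INVALID);                   /* write-invalidate */
    cache[p][x].st = MODIFIED;
    cache[p][x].val = v;                    /* memory not updated: write-back */
}

/* ASSUMED timeline; processor indices 0, 1, 2 stand for P1, P2, P3. */
void run_timeline(void)
{
    load(0, A);        /* T1: P1 Load A       */
    load(1, A);        /* T2: P2 Load A       */
    load(2, A);        /* T3: P3 Load A       */
    store(0, A, 40);   /* T4: P1 Store #40, A */
    store(1, B, 30);   /* T5: P2 Store #30, B */
}
```

Under this assumed assignment, P1's cache ends with A = 40 (modified), the copies of A in P2 and P3 are invalid, P2's cache holds B = 30 (modified), and memory still holds A = 10 and B = 5 because nothing has been written back yet. A different assignment of accesses to processors would give a different final state.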

Page 41: Multiprocessors and Multithreading – classroom slides
Page 42: Multiprocessors and Multithreading – classroom slides

What is multithreading?

• technique allowing program to do multiple tasks

• is it a new technique?
  – has existed since the 70’s (concurrent Pascal, Ada tasks, etc.)
• why now?
  – emergence of SMPs in particular
  – “time has come for this technology”

Page 43: Multiprocessors and Multithreading – classroom slides

• allows concurrency between I/O and user processing even in a uniprocessor box

• threads in a uniprocessor?

[Figure labels: active, process]

Page 44: Multiprocessors and Multithreading – classroom slides

Multiprocessor: First Principles

• processors, memories, interconnection network
• Classification: SISD, SIMD, MIMD, MISD
• message passing MPs: e.g. IBM SP2
• shared address space MPs
  – cache coherent (CC)
    • SMP: a bus-based CC MIMD machine
      – several vendors: Sun, Compaq, Intel, ...
    • CC-NUMA: SGI Origin 2000
  – non-cache coherent (NCC)
    • Cray T3D/T3E

Page 45: Multiprocessors and Multithreading – classroom slides

• What is an SMP?
  – multiple CPUs in a single box sharing all the resources such as memory and I/O
• Is an SMP more cost effective than two uniprocessor boxes?
  – yes (roughly 20% more for a dual processor SMP compared to a uni)
  – modest speedup for a program on a dual-processor SMP over a uni will make it worthwhile
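One way to read "modest speedup", as a rough worked example: let a uniprocessor box cost C and deliver performance P, take the dual-processor SMP to cost about 1.2C (the roughly 20% figure), and let S be the program's speedup on the SMP. The symbols C, P, and S are introduced here for illustration only. Under these assumptions the SMP gives better performance per unit cost whenever:

```latex
\[
\frac{S \cdot P}{1.2\,C} \;>\; \frac{P}{C}
\quad\Longleftrightarrow\quad
S \;>\; 1.2
\]
```

So a speedup only a little above 1.2, out of an ideal 2, would already pay for the second processor, whereas two separate uniprocessor boxes would cost about 2C.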

