4.1 Introduction to Threads Overview Multithreading Models Thread Libraries Threading Issues...

Date post: 03-Jan-2016
Upload: anis-shields

4.1

Introduction to Threads

Overview

Multithreading Models

Thread Libraries

Threading Issues

Operating System Examples

Windows XP Threads

Linux Threads

4.2

Threads

A Thread is just a sequence of instructions to execute

Threads share the same memory space as other threads in the same application – so they automatically share data and variables.

Threads can run on different processor cores on a multicore processor, which can make applications faster and more responsive

Even on a single-core processor, threads make an application more responsive: if one thread blocks waiting for I/O, other threads can still run

Each process has its own virtual memory address space, and the OS takes a lot longer to switch between processes than between threads. Sharing data between processes requires additional steps, so in many applications processes have a lot more overhead than threads. Most applications have one process with several threads.

In C/C++, a thread typically runs the code in a C/C++ function and a special API call starts up a new thread running that function.
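A minimal sketch of this pattern, assuming the C++11 std::thread library as the "special API call" (the slides do not name one). The names compute_square and run_in_thread are hypothetical and used only for illustration.

```cpp
#include <cassert>
#include <thread>

// Hypothetical worker: the function the new thread runs.
// It writes its result through a pointer because threads share memory.
void compute_square(int input, int* result) {
    *result = input * input;
}

// Starts one thread running compute_square and waits for it to finish.
int run_in_thread(int input) {
    int result = 0;
    std::thread worker(compute_square, input, &result);  // thread starts here
    worker.join();                                       // wait for it to end
    return result;
}
```

Reading result only after join() is what makes this safe; reading it while the worker is still running would be a race.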

4.3

Single and Multithreaded Processes

4.4

Benefits of Threads

Responsiveness

Applications can run up to N times faster on an N core processor

Resource Sharing

Economy

Scalability

4.5

Multicore Programming

Applications only run on one processor core - unless they use multiple threads

Multicore systems are putting more pressure on programmers to use threads. Multithreaded application challenges include:

Dividing activities

Balancing the Computational Load

Data splitting

Data dependency

Testing and debugging
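As one illustration of data splitting, the sketch below sums an array by giving each thread its own contiguous chunk and its own result slot, so no two threads write the same variable. It assumes C++11 std::thread; parallel_sum and its parameters are hypothetical names, not something the slides define.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums data by splitting it into num_threads contiguous chunks.
long long parallel_sum(const std::vector<int>& data, std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<long long> partial(num_threads, 0);  // one slot per thread
    std::vector<std::thread> workers;
    const std::size_t chunk = data.size() / num_threads;

    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        // The last thread also takes any leftover elements.
        const std::size_t end =
            (t + 1 == num_threads) ? data.size() : begin + chunk;
        workers.emplace_back([&data, &partial, t, begin, end] {
            auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            partial[t] = std::accumulate(first, last, 0LL);
        });
    }
    for (std::thread& w : workers) w.join();  // wait for every chunk
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

The chunks here are equal-sized, which balances the load only when every element costs about the same to process.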

4.6

Concurrent Execution on a Single-core System

OS can time slice between the four Threads T1…T4

4.7

Parallel Execution on a Multicore System

OS can time slice the four Threads T1…T4 on two processor cores. Two threads can run in parallel on different cores. Application could run up to twice as fast. Without threads, an application can run on only one core!

4.8

User Threads

Thread management done by a user-level threads library

Three primary thread libraries:

POSIX Pthreads

Win32 threads

Java and C# threads

4.9

Thread Libraries

Thread library provides programmer with API for creating and managing threads

Two primary ways of implementing

Library entirely in user space

Kernel-level library supported by the OS

4.10

Pthreads

A POSIX standard (IEEE 1003.1c) API for thread creation and synchronization

API specifies behavior of the thread library; implementation is up to the developers of the library

Common in UNIX operating systems (Solaris, Linux, Mac OS X)

Can also be added to Windows by installing the optional Pthreads library
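A minimal sketch of the Pthreads calls for creating and joining a thread, written so it compiles as C++. pthread_create and pthread_join are the standard POSIX calls; add_one and increment_in_pthread are hypothetical names for illustration.

```cpp
#include <cassert>
#include <pthread.h>

// Thread entry point: Pthreads passes and returns untyped pointers.
void* add_one(void* arg) {
    int* value = static_cast<int*>(arg);
    *value += 1;
    return nullptr;
}

// Creates one thread that increments value, then waits for it.
// Returns false if the thread could not be created.
bool increment_in_pthread(int& value) {
    pthread_t tid;
    if (pthread_create(&tid, nullptr, add_one, &value) != 0) {
        return false;
    }
    pthread_join(tid, nullptr);  // block until add_one returns
    return true;
}
```

On some toolchains this needs the -pthread compiler flag.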

4.11

Java and C# Threads

Thread support is built into these newer languages with keywords

Java threads are managed by the JVM

C# thread support is in the .NET Framework (its runtime, the CLR, is the C# counterpart of the JVM)

Typically implemented using the threads model provided by underlying OS

Java threads may be created by:

Extending Thread class

Implementing the Runnable interface

C# threads are created by passing a delegate to the Thread class

4.12

Threading Issues

Semantics of fork() and exec() system calls

Thread cancellation of target thread

Asynchronous or deferred

Signal handling

Thread pools

Thread-specific data

Scheduler activations

4.13

Thread Cancellation

Terminating a thread before it has finished

Two general approaches:

Asynchronous cancellation terminates the target thread immediately

Deferred cancellation allows the target thread to periodically check if it should be cancelled
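Deferred cancellation can be sketched with a shared flag that the target thread polls between units of work. This assumes C++11 std::atomic and std::thread rather than any particular OS cancellation API; cancel_worker_deferred is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs a worker that polls a cancel flag, then requests cancellation.
// Returns true if the worker noticed the request and exited cleanly.
bool cancel_worker_deferred() {
    std::atomic<bool> cancel_requested{false};
    std::atomic<bool> exited_cleanly{false};

    std::thread worker([&] {
        while (!cancel_requested.load()) {
            // ... one unit of work, then back to the cancellation check
            std::this_thread::yield();
        }
        exited_cleanly.store(true);  // a safe point to release resources
    });

    cancel_requested.store(true);  // ask the target thread to stop
    worker.join();                 // it stops at its next check
    return exited_cleanly.load();
}
```

Because the worker chooses where it checks the flag, it never stops in the middle of updating shared data, which is the risk with asynchronous cancellation.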

4.14

Signal Handling

Signals are used in UNIX systems to notify a process that a particular event has occurred

A signal handler is used to process signals

1. Signal is generated by particular event

2. Signal is delivered to a process

3. Signal is handled

Options:

Deliver the signal to the thread to which the signal applies

Deliver the signal to every thread in the process

Deliver the signal to certain threads in the process

Assign a specific thread to receive all signals for the process
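The generate, deliver, handle sequence listed on this slide can be sketched in a single-threaded program with the standard std::signal and std::raise calls. SIGINT is an arbitrary choice here, and record_signal and raise_and_handle are hypothetical names.

```cpp
#include <cassert>
#include <csignal>

// Set by the handler; sig_atomic_t is the type that is safe to write there.
volatile std::sig_atomic_t last_signal = 0;

// Step 3: the handler runs when the signal is delivered.
extern "C" void record_signal(int signal_number) {
    last_signal = signal_number;
}

// Installs the handler, raises SIGINT in this process, and
// returns the signal number the handler recorded.
int raise_and_handle() {
    last_signal = 0;
    std::signal(SIGINT, record_signal);  // register the handler
    std::raise(SIGINT);                  // steps 1-2: generate and deliver
    std::signal(SIGINT, SIG_DFL);        // restore the default behavior
    return last_signal;
}
```

This sketch does not show which thread receives a signal in a multithreaded process; that choice is what the options above are about.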

4.15

Thread Pools

Create a number of threads in a pool where they await work

Advantages:

Usually slightly faster to service a request with an existing thread than create a new thread

Allows the number of threads in the application(s) to be bounded by the size of the pool
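A minimal thread pool sketch, assuming C++11 threads, a mutex, and a condition variable. ThreadPool, submit, and count_tasks_with_pool are hypothetical names; production pools usually add result handling and error reporting.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal fixed-size pool: workers block until a task is queued.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t size) {
        for (std::size_t i = 0; i < size; ++i) {
            workers_.emplace_back([this] { work(); });
        }
    }

    // Lets the workers finish all queued tasks, then joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& w : workers_) w.join();
    }

    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        wake_.notify_one();  // wake one waiting worker
    }

private:
    void work() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};

// Usage sketch: 100 small tasks served by 4 pooled threads.
int count_tasks_with_pool() {
    std::atomic<int> done{0};
    {
        ThreadPool pool(4);
        for (int i = 0; i < 100; ++i) pool.submit([&done] { ++done; });
    }  // the destructor drains the queue and joins the workers
    return done.load();
}
```

However many tasks are submitted, only four threads ever exist, which is the bound the slide refers to.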

4.16

Windows Threads

Implements the one-to-one mapping, kernel-level

Each thread contains

A thread id

Register set

Separate user and kernel stacks

Private data storage area

The register set, stacks, and private storage area are known as the context of the thread

4.17

Linux Threads

Linux refers to them as tasks rather than threads

Thread creation is done through clone() system call

clone() allows a child task to share the address space of the parent task (process)

Background on the need for Synchronization

• Threads may need to wait for other threads to finish an operation

• Additionally concurrent access to shared data with threads may result in data inconsistency (i.e., incorrect values)

• Maintaining data consistency requires mechanisms to ensure the orderly execution of cooperating processes (or threads)

Example Problem

• Suppose two threads share a common buffer array. The producer puts items in the buffer and the consumer removes them.

• A solution to a two thread consumer-producer problem that fills all the buffer space has an integer count that keeps track of the number of full buffers. Initially, count is set to 0. It is incremented by the producer after it produces a new buffer and is decremented by the consumer after it consumes a buffer.

Producer

    while (true) {
        /* produce an item and put in nextProduced */
        while (count == BUFFER_SIZE)
            ; // do nothing
        buffer[in] = nextProduced;
        in = (in + 1) % BUFFER_SIZE;
        count++;
    }

Consumer

    while (true) {
        while (count == 0)
            ; // do nothing
        nextConsumed = buffer[out];
        out = (out + 1) % BUFFER_SIZE;
        count--;
        /* consume the item in nextConsumed */
    }

Critical Section

• A code segment that reads or writes global data shared between threads or processes is called a “critical section”

• Possible race condition bugs on global variable values – example will follow

• OS Synchronization API used to solve this

• Must be careful and use OS synchronization primitives to control access to a critical section or hidden bugs will appear in code

Race Condition on Count

• count++ could be implemented as

    register1 = count
    register1 = register1 + 1
    count = register1

• count-- could be implemented as

    register2 = count
    register2 = register2 - 1
    count = register2

• Consider this execution interleaving with “count = 5” initially:

    S0: producer executes register1 = count {register1 = 5}
    S1: producer executes register1 = register1 + 1 {register1 = 6}
    S2: consumer executes register2 = count {register2 = 5}
    S3: consumer executes register2 = register2 - 1 {register2 = 4}
    S4: producer executes count = register1 {count = 6}
    S5: consumer executes count = register2 {count = 4}

• One increment and one decrement should leave count at 5, but this interleaving leaves it at 4: the producer's update is lost
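The interleaving S0 to S5 can be replayed deterministically in ordinary single-threaded code, with plain local variables standing in for the two CPU registers. replay_interleaving is a hypothetical name used only for this sketch.

```cpp
#include <cassert>

// Replays steps S0-S5 in order; plain ints stand in for the CPU registers.
int replay_interleaving(int count) {
    int register1 = count;      // S0: producer loads count
    register1 = register1 + 1;  // S1: producer increments its copy
    int register2 = count;      // S2: consumer loads the stale count
    register2 = register2 - 1;  // S3: consumer decrements its copy
    count = register1;          // S4: producer stores its result
    count = register2;          // S5: consumer overwrites the producer's store
    return count;
}
```

Starting from 5, the function returns 4 rather than 5, because the store in S5 discards the producer's increment.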

Need an Atomic Operation

• The count++ and count-- code must each run to completion before switching to the other thread to avoid bugs

• Atomic operation here means a basic operation which cannot be stopped or interrupted in the middle to switch to another thread

• Race conditions show up more often on systems with multiple processors since threads are running in parallel
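One way to get an atomic count++ and count-- in C++ is std::atomic (C++11), which the slides do not mention; it is used here as an assumption for illustration. atomic_count_after is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One thread increments and one decrements the same counter, each
// `iterations` times. With std::atomic every ++ and -- is indivisible.
int atomic_count_after(int initial, int iterations) {
    std::atomic<int> count{initial};
    std::thread producer([&] {
        for (int i = 0; i < iterations; ++i) ++count;
    });
    std::thread consumer([&] {
        for (int i = 0; i < iterations; ++i) --count;
    });
    producer.join();
    consumer.join();
    return count.load();
}
```

With a plain int in place of std::atomic<int>, the same program has a data race and the final value is unpredictable.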

Solution to Critical-Section Problem

1. Mutual Exclusion (Mutex) - If process Pi is executing in its critical section, then no other processes can be executing in their critical sections

2. Progress - If no process is executing in its critical section and there exist some processes that wish to enter their critical section, then the selection of the processes that will enter the critical section next cannot be postponed indefinitely

3. Bounded Waiting - A bound must exist on the number of times that other processes are allowed to enter their critical sections after a process has made a request to enter its critical section and before that request is granted

• Assume that each process executes at a nonzero speed

• No assumption concerning relative speed of the N processes

Solution to Critical-section Problem Using Mutex Locks

    do {
        acquire lock
            critical section
        release lock
            remainder section
    } while (TRUE);
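The acquire, critical section, release pattern can be sketched with std::mutex and std::lock_guard from C++11, an assumption since the slides do not tie the pattern to a specific API. locked_count_after is a hypothetical name.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Two threads adjust a shared counter; a mutex guards every update.
int locked_count_after(int initial, int iterations) {
    int count = initial;
    std::mutex count_lock;

    auto adjust = [&](int delta) {
        for (int i = 0; i < iterations; ++i) {
            std::lock_guard<std::mutex> guard(count_lock);  // acquire lock
            count += delta;                                 // critical section
        }  // guard goes out of scope here: release lock
    };

    std::thread producer(adjust, +1);
    std::thread consumer(adjust, -1);
    producer.join();
    consumer.join();
    return count;
}
```

lock_guard releases the mutex automatically at the end of each loop iteration, so the lock cannot be left held by accident.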

Deadlock and Starvation

• Deadlock – two or more processes or threads are waiting indefinitely for an event that can be caused by only one of the waiting processes

• Let S and Q be two semaphores initialized to 1 (i.e. a mutual exclusion lock)

    P0              P1
    wait (S);       wait (Q);
    wait (Q);       wait (S);
      .               .
      .               .
      .               .
    signal (S);     signal (Q);
    signal (Q);     signal (S);

• Starvation – indefinite blocking. A process may never be removed from the semaphore queue in which it is suspended

• Priority Inversion - Scheduling problem when a lower-priority process holds a lock needed by a higher-priority process. The lower-priority process may need to run first so that it can release the lock, which undermines the intended process priorities
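The S and Q deadlock arises because P0 and P1 take the two locks in opposite orders. One common remedy, not covered in the slides and shown here only as a sketch, is to acquire both locks together with std::lock (C++11), which uses a deadlock-avoidance algorithm. Mutexes stand in for the two semaphores, and run_without_deadlock is a hypothetical name.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// P0 names the locks S,Q and P1 names them Q,S, as in the deadlock
// example, but std::lock acquires each pair without deadlocking.
int run_without_deadlock(int iterations) {
    std::mutex S, Q;
    int shared = 0;

    std::thread p0([&] {
        for (int i = 0; i < iterations; ++i) {
            std::lock(S, Q);  // take both locks or neither
            std::lock_guard<std::mutex> hold_s(S, std::adopt_lock);
            std::lock_guard<std::mutex> hold_q(Q, std::adopt_lock);
            ++shared;
        }
    });
    std::thread p1([&] {
        for (int i = 0; i < iterations; ++i) {
            std::lock(Q, S);  // opposite order is still safe here
            std::lock_guard<std::mutex> hold_q(Q, std::adopt_lock);
            std::lock_guard<std::mutex> hold_s(S, std::adopt_lock);
            ++shared;
        }
    });
    p0.join();
    p1.join();
    return shared;
}
```

Another option is to make every thread take S before Q, so the circular wait can never form.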

RTOS

• Real Time Operating System (RTOS)

• Used in systems that need a fast response time to external events on the order of milliseconds

• This is about 10-100X faster than PCs

• The general purpose OS in a PC is optimized for throughput and a fast graphical user interface – but at the expense of the Real Time response

Mbed RTOS & Threads

• Runs a 1ms time slice to switch between threads; this is about 10-100X faster than PCs

• Memory limits an application to around 8 threads: each thread needs its own stack, and the RTOS also uses a fair chunk of RAM (32K). RAM is used for variables only. Nonvolatile Flash memory stores code and constants; there is 512K of it, so it is typically not the issue.

MBED RTOS

• The mbed RTOS also provides some basic synchronization primitives:

– Mutex Lock – used to lock and unlock access to shared memory (variables) and I/O devices

– On the mbed compiler, declaring a simple built-in global variable (but not an array) with the keyword volatile forces every read and write to go to memory, so a plain load or store of it is seen by other threads. This does not make read-modify-write operations such as count++ atomic; those still need a mutex

– Signals – can be used to send signals between threads

MBED RTOS

• Semaphores – a more advanced synchronization primitive than a mutex. Can count things, but also slower than a mutex.

• Thread::wait(x ms) – tells the RTOS scheduler to not run this thread again until x ms of time has passed. Useful to keep a thread from using too much processor time when it does not need it. Other threads run during the delay.

• Don’t use wait – use Thread::wait

Mbed RTOS

• Free for ARM mbed users. Many RTOSes require a license fee. Just need RTOS library in project and a new #include “rtos.h” after mbed.h include

• Documentation and code examples are found in the mbed Handbook under “Real Time Operating System”; click the “mbed RTOS” link

• Free networking libraries are also available that use the RTOS for Internet of Things Devices (IoT)

