Geoff ’s Primer On Multithreaded Programming

Copyright © 2011 FarWest Software, Inc.


Table of Contents

Introduction
A History Lesson
    The Batch Job And The Rise of Multitasking
    Multiple Processors
    Light Weight Processes and Threads
    Schedulers and Threads
What Are Threads?
    Processes In Miniature
    How Are Threads Created?
    How Do Threads Stop?
Sharing and Protecting Data
    Why Protect Memory?
    The Mutex
    The Read/Write Lock
        FIFO-Biased Priority
        Read-Biased Priority
        Write-Biased Priority
    Issues With Protection
Interthread Communication
    Queues
        The Basic FIFO Queue
        Priority-driven Queues
    MacOS and iOS Observers
Multithreaded Architectures
    The Pipeline
    The Pool
        The Static Pool
        The Dynamic Pool
    The Hybrid
Performance Issues
    Thread Starvation
    Single-Threading
Conclusion and Further Reading

Introduction

Writing software for a multi-CPU or multicore environment generally means writing something to be multithreaded. This primer is intended to help the reader better understand multithreaded programming, and to provide some guidance for building better, faster and more reliable programs that use threads. It includes a discussion of what threading is and how it works in some of the more common operating systems available today. There is also a short history lesson on how threading came about, to give a better sense of how we got to where we are today.

This primer will cover the basic architectures and design patterns for multithreaded programs, and include specific discussions on architectures for iOS and Android. It will also discuss the issues around memory protection, and talk about problems and solutions for common performance issues in multithreaded programming.

The primer will include code samples for Java and Objective-C, targeted at UNIX/Linux environments, as well as for iOS and Android.

This primer is not meant as an exhaustive treatment of multithreaded design and programming. It covers the basics that can make you productive quickly and help you take advantage of multi-core and multi-CPU systems. The point is to describe the common constructs and the basic approaches.


A History Lesson

The Batch Job And The Rise of Multitasking

A long time ago, in a computer room far, far away, a computer would run one and only one program at a time. Often known as a “batch job”, that single task had the computer dedicated solely to it, to the exclusion of everything else. These computers had a single processor or CPU, with associated memory and persistent storage such as disk, tape or punchcards.

Research began in the 1960s on allowing more than one program to run at a time. Over time, this developed into what we know as the “multitasking operating system”, where multiple active programs shared the single CPU. The way this worked was to allow a program to run for a brief period of time, interrupt it and save its current state, and then allow another program to run for a brief period of time. The change between programs was known as a context switch, and was either done in software (as part of the underlying operating system), or sometimes in hardware in the form of specific assembly instructions to make the switch occur. While it wasn’t true parallelism, since the programs only ran one at a time, the switch was usually fast enough that it appeared as if the programs were running simultaneously.

Multiple Processors

Research starting in the 1980s, and advancing further in the 1990s, began to add extra CPUs to a single computer, beginning what would become Symmetric Multiprocessing, or SMP. In these machines there were two or more CPUs, sharing memory, storage and other services, and acting like a single computer. This added some complication to the machines and the operating systems, since memory had to be protected to prevent two processors from trying to allocate the same memory space to two different processes, and things like CPU-based caches needed to be updated and kept synchronized.

To best take advantage of these multiple CPUs, developers started to write their software so that they could split workload among multiple, parallel instances of the same program.


This was the first simple approach to a form of crude “threading”, but it had some disadvantages. Any shared state had to either be managed via shared memory (which was typically fairly limited), via files in the filesystem, or by passing data back and forth using interprocess communication constructs like sockets.

Light Weight Processes and Threads

At the same time that SMP technologies were developing, work started on a new way to parallelize work, called “light weight processes” by some, or “threads” by others. The term “thread” became the common term for running multiple tasks within a single process. The early work on threads, driven in part by the now-defunct Sun Microsystems, involved running a small scheduler inside the process, which would “context switch” between threads. The operating system still saw the program as a single program, and managed and scheduled it as such. This type of threading, though, couldn’t really take advantage of multiple CPUs, because the scheduling was contained inside the process and was invisible to the OS outside.

In the early-1990s, operating systems were changed to become thread-aware, and to allow the operating system to schedule and manage the threads. The big leap that came from this was the ability of two threads to run truly in parallel if they were assigned to two different processors. From the thread’s perspective, they couldn’t tell if they were sharing a single CPU, or were running in parallel. They got to share the process’s memory space, and could communicate using in-memory structures.

Schedulers and Threads

There were two approaches to how an operating system would schedule and manage threads. One, taken initially by Solaris, was to schedule the threads within the context of the process (but unlike LWPs, the scheduling was done in the kernel, not in the process); the threads competed with each other, but all of them got context-switched out when the process did. To better manage processor usage, threading libraries included calls to set a thread’s priority, and higher-priority threads generally got more processing time than lower-priority ones. The problem, though, was that the whole thing could be preempted by some other process. Threads in this scheme also didn’t run as smoothly, since there were two levels of context switching: one occurring within a process and one between processes.


Other operating systems, such as Windows NT, scheduled threads across the entire machine, and the “process” simply became more of an administrative construct. Now, threads weren’t just competing for CPU and other resources within a single process, but with all threads across the entire machine. The concept of thread priority basically disappeared, and “process priority” was applied to all threads in a single process. The result was a “smoother” runtime experience, since there was only a single layer of context switching occurring. Most other UNIX operating systems, like AIX and HP-UX, took a similar approach, being “thread scheduled” systems rather than “process scheduled” systems.

The increased use of threading, though, also meant that programmers had to think differently about how to design, code and manage their programs. Problems arose that didn’t occur before, such as two threads trying to change a piece of memory simultaneously. Issues that only existed inside the kernel of the operating system now existed in the programs themselves.


What Are Threads?

Processes In Miniature

Threads can be thought of as miniature processes that run within the context of the larger program. They can be created and destroyed as necessary without having to stop or restart the process itself. When the process is killed, the threads die with it.

So how is this better or different than simply running multiple instances of the same program? Well, unlike multiple, parallel instances of a program, the threads all share the same memory space, which means that they can all have the same, shared state for data. This allows threads to read or update some internal variable or data structure without having to deal with sending messages to other programs that the state has changed, or manage and control access to slower services like disk to update state.

The result is that using shared memory within the process is far faster, and state changes between threads can be almost instantaneous. Doing the same between processes is much slower, and there are error conditions that you (or a framework you use) have to handle that don’t exist when the activity is constrained to a single process.

Most modern runtime frameworks and environments include a number of threads when your program starts. You may not have actually built anything with threads, but some exist anyway, created as part of the framework startup. For example, most Java programs start with at least two threads: one running the class that was invoked on startup, and another to handle garbage collection. Your program doesn’t see these other threads, but they are there nonetheless.

How Are Threads Created?

There are many different ways to create a thread, depending on the programming language you use and the runtime environment that the program is meant for. In Java running on a Linux system, creating a new thread is simple. You code your class, extending the Thread base class, and implementing the run() method where the thread’s work starts. Some other thread simply creates an instance of that class, calls the start() method, and the thread is off and running. The sample below shows a very simple thread class, and demonstrates how it is invoked.

class SimpleThread extends Thread {
    public void run() {
        // do some stuff here
    }
}

class ThreadStarter {
    public static void main(String[] argv) {
        SimpleThread t = new SimpleThread();
        t.start();
        try {
            t.join(); // wait for the thread to end
        }
        catch (InterruptedException ex) {
            // handle the exception here
        }
    }
}

On a mobile platform like iOS, you could do it as shown below. This shows the use of Grand Central Dispatch, which provides mechanisms for building multithreaded apps without some of the complications in traditional threads.

dispatch_queue_t aQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_async(aQueue, ^{
    // do parallel work here
});

Obviously, there is more involved to making a meaningful and useful multithreaded application, but the basic constructs aren’t overly complicated.

How Do Threads Stop?

Threads can stop on their own by simply returning from the method used to start the thread (for example, by returning from the run() method on a Java class that inherits from Thread). In some environments, you may be able to send the equivalent of a kill signal to the thread, and have it stop immediately, but this is typically not a recommended alternative.
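As a sketch of the cooperative approach (the class names here are invented for illustration), a Java thread can watch its interrupt flag and return from run() when asked to stop:

```java
// Cooperative shutdown: the worker loops until its interrupt flag is
// set, then returns from run(), which ends the thread.
class StoppableWorker extends Thread {
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Thread.sleep(50); // stand-in for a unit of real work
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt(); // restore the flag for the loop test
            }
        }
        // clean-up code would go here; returning ends the thread
    }
}

class StopDemo {
    // Start a worker, ask it to stop, wait for it, report whether it ended.
    static boolean stopWorker() throws InterruptedException {
        StoppableWorker w = new StoppableWorker();
        w.start();
        w.interrupt(); // the polite request to stop
        w.join();      // wait for the thread to finish
        return !w.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker stopped: " + stopWorker());
    }
}
```

The key point is that the worker decides when it is safe to exit, rather than being killed mid-operation.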


Threads can also be stopped by killing the entire process. In this case, all of the threads will stop immediately.

It is usually best to not have threads just hanging around if you don’t need to, since they do use up memory and resources. Whether this matters depends on the architecture of your program, and the memory constraints on your runtime environment.


Sharing and Protecting Data

Why Protect Memory?

In a multithreaded program, all of the threads have access to all of the public data. So, if you have a global variable or a singleton, all threads have access to that variable or singleton. Reading shared memory isn’t a big problem, since that can be done safely in multiple threads. However, whenever a thread needs to change a variable or piece of shared memory, that requires protection.

But why? Particularly for primitive types like int or long, isn’t setting its value pretty safe? In actuality, memory and variable operations are normally not atomic: they don’t happen in a single, discrete operation. They can be interrupted by the scheduler part-way through a write, meaning that another thread can read bad or odd data while a write to that variable is in progress. The problem is that only some of the data for that variable has been written, as the write is incomplete.

In order to avoid these types of problems, we need to protect shared data, including primitive types like char, int and long, so that they can be read safely and consistently. We need to avoid a situation where one thread is trying to read a variable while another is trying to write. More seriously, we want to avoid a situation where two threads are trying to write to the same variable at the same time, potentially leaving it with an unusable or incorrect value.
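To make the hazard concrete, here is a small Java sketch (the names are invented for illustration): two threads each increment a shared counter 100,000 times. Because the increment is protected, the total always comes out exactly 200,000; with the synchronized keywords removed, count++ becomes an unprotected read-modify-write and some increments can be lost.

```java
// Two threads hammer one shared counter. increment() is synchronized,
// so no update is ever lost and the final total is deterministic.
class SafeCounter {
    private long count = 0;

    public synchronized void increment() { count++; }
    public synchronized long get() { return count; }

    static long race() throws InterruptedException {
        SafeCounter c = new SafeCounter();
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) c.increment();
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        return c.get(); // always 200,000 because increment() is protected
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(race());
    }
}
```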

While there are many ways to protect data, the two most common are the mutual exclusion lock (or mutex) and the read/write lock. There are others, such as semaphores, counting semaphores and spinlocks. But the two discussed here are by far the most common, the simplest to use and the easiest to understand.

The Mutex

The mutual exclusion lock, or mutex, is a simple construct for protecting a variable or some block of code. For some languages, such as Java, the mutex is built into the language itself via the synchronized keyword, making the job of protecting variables or blocks of code very, very simple.

A mutex works by only allowing one thread at a time to access some path through the code. Think of it as a gate that requires requests to use a piece of code to be queued up, waiting their turn, and only one thread can execute that code at a time.

Bear in mind that, if performance is a concern, one way to minimize the impact of protecting code sections is to protect the smallest amount of logic possible. Try to structure the logic so that only the absolute minimum is inside the lock.
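A minimal sketch of that advice (the class and method names are invented): the expensive work is computed outside the lock, and only the shared-state update is protected.

```java
import java.util.ArrayList;
import java.util.List;

// Keep the critical section small: the formatting work happens outside
// the lock; only the shared list update is synchronized.
class ResultCollector {
    private final List<String> results = new ArrayList<>();

    public void process(String input) {
        // Thread-private work: no other thread can see 'formatted',
        // so no lock is needed while computing it.
        String formatted = input.trim().toUpperCase();

        // Only the touch on shared state is protected.
        synchronized (results) {
            results.add(formatted);
        }
    }

    public List<String> snapshot() {
        synchronized (results) {
            return new ArrayList<>(results);
        }
    }
}
```

If the formatting were done inside the synchronized block, every thread would serialize on work that didn’t need protection at all.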

As mentioned above, this construct is built into Java using the synchronized keyword. Java allows the use of synchronized at the method level (so that an entire method is protected, locking on the instance for instance methods or on the class for static methods) as well as on specific blocks of code. The code sample below shows a method that is protected with the synchronized keyword.

public synchronized void someMethod() {
    // do useful stuff here
}

In this case, only one thread at a time can execute the method, and if more than one thread wants to use it, the others have to wait their turn.

You can also protect discrete blocks of code, either at the class level, or by using an instantiated object. The code below locks access at the class level.

synchronized (SomeClass.class) {
    // do useful stuff here
}

As with the synchronized method, any thread requesting access to the block must wait its turn. It is also possible to synchronize based on an instantiated object.

SomeClass object = ...; // some instantiated object

synchronized (object) {
    // do useful stuff here
}


In this case, that code is protected by a specific object, and if more than one instance of the object exists, the code may be accessed in parallel. This approach is useful if you want to protect access to a particular object, and still allow parallel execution to continue.

The Read/Write Lock

The read/write lock is a special form of mutual exclusion lock that allows parallel reads, but only a single write to occur at a time. Threads indicate on a read/write lock whether they are trying to read or write in a particular code path. Threads that indicate they are reading can access the code path in parallel, which allows multiple threads to safely read simultaneously. When a thread indicates it wants to write, all reads in progress are allowed to finish, and then any other activity is blocked until the write is complete.

Read/write locks often include a way to indicate priority. There are basically three priorities for a read/write lock: simple FIFO, read-biased or write-biased.

FIFO-Biased Priority

In simple FIFO, the requests are handled in the order that they are received. If the next pending request is a write, all other requests will queue up behind it in the order they are received. Obviously, if a read is in progress and another read request is made with no other pending requests waiting, that read request gets to proceed immediately. But if a write request arrives while reads are in progress, all other requests wait, even other read requests.

Read-Biased Priority

In a read-biased lock, reads jump ahead of requests to write. If there is a pending write request while the section is already locked (either by reads or writes), and a read request is made, that read request will jump ahead of the write request.

Write-Biased Priority

In a write-biased lock, writes take priority over reads. If there is a pending write request, any other requests have to wait until that write completes. If another write request comes along, it jumps ahead of any read requests, but is queued with the other write requests in FIFO order.
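In Java, these ideas map onto java.util.concurrent.locks.ReentrantReadWriteLock, sketched below guarding an invented SharedConfig class. Its constructor accepts a fairness flag: new ReentrantReadWriteLock(true) grants the lock in approximately arrival order (FIFO-like), while the default non-fair mode leaves the ordering unspecified.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// A read/write lock guarding a shared map: many readers may hold the
// read lock at once; a writer holds the write lock exclusively.
class SharedConfig {
    private final Map<String, String> values = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public String get(String key) {
        lock.readLock().lock();      // parallel with other readers
        try {
            return values.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void put(String key, String value) {
        lock.writeLock().lock();     // exclusive: blocks readers and writers
        try {
            values.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The unlock calls sit in finally blocks so the lock is always released, even if the protected code throws.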


Issues With Protection

With mutexes and read/write locks, you have to be aware of which thread has locked a particular section of code, and when it unlocks it. In some languages, it is possible for a thread to deadlock itself by trying to lock the same mutex twice; this can happen with POSIX-style mutexes. You have to know when a thread has locked a mutex, and avoid trying to lock it a second time.
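One Java-specific note: the built-in synchronized monitor is reentrant, so a thread that already holds an object’s lock can acquire it again without deadlocking itself, unlike a default POSIX mutex. A tiny sketch (names invented):

```java
// Java monitors are reentrant: outer() holds the lock on 'this' and
// can still call inner(), which re-acquires the same lock safely.
class ReentrantDemo {
    public synchronized int outer() {
        return inner() + 1; // re-enters the monitor we already hold
    }

    public synchronized int inner() {
        return 1;
    }
}
```

The same call chain against a default (non-recursive) POSIX mutex would hang on the second lock attempt.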

It is also possible in some languages for two threads to deadlock each other, specifically if they need to lock two mutexes before they can do something. This problem is sometimes discussed in a thought experiment known as the Dining Philosophers Problem. There is a fairly good description of this problem on Wikipedia. [http://en.wikipedia.org/wiki/Dining_philosophers_problem]

There are some strategies to avoid this. One is to protect the two mutexes with a third, such that access to the other two is first controlled by a “master” mutex.

The other is to test a mutex before locking it, avoiding the deadlock. This isn’t possible in all environments, but where it is, it allows a thread to lock one mutex, test the other, and, if that one is already locked, unlock the first, back off and wait, then try again, only proceeding when both mutexes can be locked safely. The risk with this strategy is livelock: a thread may never manage to lock both mutexes simply because other threads keep locking and unlocking the ones it needs. In this situation, it may be better to step back and review the design and logic, and avoid the need to lock two mutexes where possible.
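The test-and-back-off strategy can be sketched in Java with ReentrantLock.tryLock(); the helper name, retry count and back-off delays below are invented for illustration:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.ReentrantLock;

// Lock two mutexes without deadlocking: take the first, *try* the
// second, and if it isn't available release the first and retry.
class TwoLockTask {
    static boolean withBothLocks(ReentrantLock a, ReentrantLock b, Runnable action)
            throws InterruptedException {
        for (int attempt = 0; attempt < 100; attempt++) {
            a.lock();
            try {
                if (b.tryLock()) {        // non-blocking attempt on the second lock
                    try {
                        action.run();     // both locks held: safe to proceed
                        return true;
                    } finally {
                        b.unlock();
                    }
                }
            } finally {
                a.unlock();               // couldn't get b: release a before backing off
            }
            // randomized back-off reduces the chance of repeated collisions
            Thread.sleep(ThreadLocalRandom.current().nextInt(1, 10));
        }
        return false; // gave up; the caller decides what to do next
    }
}
```

Because the first lock is always released before retrying, no thread ever holds one lock while blocking on the other, which is the condition that produces the deadlock.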

A better way, if possible, is to avoid resource dependencies. Granted, this isn’t always viable, but where you have control of the resources and how they are managed, careful design can usually avoid this type of issue.


Interthread Communication

Threads can communicate using a few different mechanisms, depending on the technology you use. For most systems, you can use some kind of queue to send messages to a thread. On MacOS and iOS, you can use observers to note the change in a variable, and that change can trigger some kind of activity.

Queues

The Basic FIFO Queue

One of the more common ways to communicate with a thread is to put some kind of message or object into a FIFO queue. The thread waits for messages to arrive on that queue, and then acts on them accordingly.

For Java programmers, the temptation is to use a simple ArrayList, protected with a mutex, as the queue. This can work, and it is simple, but it has some problems. First, the ArrayList doesn’t offer a way to wait for a message efficiently. The thread has to effectively poll the ArrayList to see if there is any data on it. Unless you put some kind of brief sleep between checks, this will cause a thread to use up a lot of CPU without doing any real work. This also steals CPU from other threads that do have work to do.

Fortunately, Java has a set of data structures built specifically for this type of activity: the BlockingQueue implementations. There are several types, but all of them provide the same basic functionality. The reading thread can block and wait efficiently (using little or no CPU) until a message is delivered. Other threads can safely put objects into the queue while the thread reading from it is waiting. Any thread reading from the queue will wake up after an object is placed on the queue.
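A minimal producer/consumer sketch using LinkedBlockingQueue (the class and message names are invented; the "STOP" sentinel is just one common way to tell the consumer to exit):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A consumer thread blocks in take() (using no CPU while idle) until
// the producer puts a message on the queue.
class QueueDemo {
    static String roundTrip() throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        StringBuilder received = new StringBuilder();

        Thread consumer = new Thread(() -> {
            try {
                String msg;
                while (!(msg = queue.take()).equals("STOP")) {
                    received.append(msg); // act on the message
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        queue.put("hello ");  // producer side: each put wakes the consumer
        queue.put("world");
        queue.put("STOP");    // sentinel telling the consumer to exit
        consumer.join();      // join() makes the consumer's writes visible here
        return received.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(roundTrip()); // prints "hello world"
    }
}
```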

Priority-driven Queues

The upside of a FIFO queue is that it is egalitarian: all messages have the same priority. However, there may be times when you want some messages to jump ahead in the queue, and bypass lower-priority messages. An example could be a message that is used to interrupt work in progress. Some environments offer priority-driven queues, but you can also imitate them using normal queues in some cases. To do this, you create a “meta-queue”, which is an object that contains multiple queues, plus a mechanism to allow the object to efficiently wait for notification that a queue has elements.

When a request is made to put an object on the “meta-queue”, it includes an indication as to what priority is requested. The object is put on the appropriate internal queue based on priority.

When a request is made to get an object, the “meta-queue” tests the queues in priority order, from first to last. In the diagram below, any request to get an object from the queue results in the Priority 1 queue being tested first, and until this queue is empty, all requests will come from this queue. The Priority 2 queue is only checked if the Priority 1 queue is empty, and so on.

[Diagram: a “meta-queue” containing a Priority 1 Queue, a Priority 2 Queue and a Priority 3 Queue, together with a Notification Object.]


The “notification object” will vary, depending on your environment. The goal is to allow threads to block, waiting for an item, but to do so efficiently. Polling non-stop will cause a thread to use CPU without any benefit. The notification object may be a queue itself, or it may be some other construct (such as the wait/notify construct in Java).
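A minimal Java sketch of such a meta-queue (the class name is invented), using one ArrayDeque per priority level and the wait/notify construct as the notification mechanism:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A minimal "meta-queue": one internal FIFO queue per priority level,
// plus wait/notify so a reader can block efficiently until work arrives.
class MetaQueue<T> {
    private final Deque<T>[] queues;

    @SuppressWarnings("unchecked")
    MetaQueue(int priorities) {
        queues = new Deque[priorities];
        for (int i = 0; i < priorities; i++) queues[i] = new ArrayDeque<>();
    }

    // priority 0 is the highest
    public synchronized void put(T item, int priority) {
        queues[priority].addLast(item);
        notifyAll(); // wake any thread blocked in get()
    }

    public synchronized T get() throws InterruptedException {
        while (true) {
            for (Deque<T> q : queues) {          // check in priority order
                if (!q.isEmpty()) return q.pollFirst();
            }
            wait(); // nothing queued: block without burning CPU
        }
    }
}
```

Whatever is already on a higher-priority queue is always handed out before anything on a lower-priority one, regardless of arrival order.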

MacOS and iOS Observers

Programs written for MacOS and iOS have a powerful construct called the observer. This allows other objects to be notified when the value of some attribute on another object has been changed. The concept is based on the Observer pattern.

The way it works is that an object can add an observer to a property on another object. When the property is “set”, the object that requested notification is told. There are issues with observers in MacOS and iOS that you need to be aware of, and it is best to consult the Apple documentation. The observer construct in MacOS and iOS is very powerful, and allows for cleaner object/state interaction than having to maintain and publish state yourself. A future primer will likely be written specifically about observers in MacOS and iOS.


Multithreaded Architectures

While there can be many different architectures for multithreaded applications, there are two that are the most common: the pipeline and the pool.

The Pipeline

The pipeline is basically a chain of threads, where work is passed from one to the next much like a factory assembly line.

Any objects to be processed are given to Stage 1. When it has completed its work on an object, it passes it to Stage 2, and then retrieves the next item of work.

This construct works well in situations where a workflow can be broken into multiple, discrete stages that can run in parallel.
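A toy two-stage pipeline in Java (the names and the trivial stage work are invented), with BlockingQueues carrying work between the stages:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A two-stage pipeline: stage 1 cleans the input, stage 2 formats it.
// Each stage is a thread; queues connect the stages.
class PipelineDemo {
    static String run() throws InterruptedException {
        BlockingQueue<String> toStage2 = new LinkedBlockingQueue<>();
        BlockingQueue<String> output = new LinkedBlockingQueue<>();

        Thread stage1 = new Thread(() -> {
            try {
                toStage2.put(" 42 ".trim());            // stage 1 work: clean up
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
        Thread stage2 = new Thread(() -> {
            try {
                output.put("value=" + toStage2.take()); // stage 2 work: format
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });

        stage1.start();
        stage2.start();
        String result = output.take(); // waits for the item to reach the end
        stage1.join();
        stage2.join();
        return result;
    }
}
```

In a real pipeline each stage would loop over many items, and both stages would genuinely run in parallel on different items at once.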

The Pool

The pool is a collection of one or more threads available to do work. There are a couple of approaches to dispatching work to a pool of threads. One is to have all of the waiting threads reading from a single dispatch queue, and the first thread to read a message gets to work.

[Diagrams: a pipeline of three stages (Stage 1 → Stage 2 → Stage 3), and a pool of threads all reading from a single queue.]


The other is for each thread to have its own queue, and work is handed out using some kind of scheme, with round-robin being the simplest.

There are two variants on the pool concept: a static pool and a dynamic pool.

The Static Pool

In a static pool, there are a set number of threads, typically created at startup. The number of threads never increases or decreases. Work is dispatched either via a shared queue or round-robin to each thread’s queue.

The Dynamic Pool

In a dynamic pool, threads are created as needed. If a message arrives but there is no thread available to work on it, then a new thread is created. A dynamic pool may also have an upper limit on its size, to avoid a situation where so many threads are created that they exhaust available memory and cause the whole program to crash.
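In Java, both pool styles are available ready-made through the executor framework; the sketch below (the class name is invented) uses Executors.newFixedThreadPool for a static pool and Executors.newCachedThreadPool for a dynamic one. Note that the cached pool grows without a configured upper bound, so it carries exactly the exhaustion risk described above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Java's executor framework offers both pool styles out of the box.
class PoolDemo {
    static int runTasks() throws InterruptedException {
        // Static pool: exactly 4 threads, all reading one shared work queue.
        ExecutorService fixed = Executors.newFixedThreadPool(4);
        // Dynamic pool: threads created on demand, idle ones reclaimed.
        ExecutorService cached = Executors.newCachedThreadPool();

        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < 10; i++) {
            fixed.submit(() -> { completed.incrementAndGet(); });
            cached.submit(() -> { completed.incrementAndGet(); });
        }

        fixed.shutdown();   // no new work accepted; queued work still runs
        cached.shutdown();
        fixed.awaitTermination(10, TimeUnit.SECONDS);
        cached.awaitTermination(10, TimeUnit.SECONDS);
        return completed.get(); // every submitted task ran exactly once
    }
}
```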

The Hybrid

A hybrid approach can either be a pool of pipelines (where work is dispatched to one of several pipelines) or a pipeline of pools (where work moves through the pipeline, but each stage is a pool of threads working in parallel on that stage of work). These types of constructs are rare, and often needlessly complex. But the concept shouldn’t be ignored.


Performance Issues

Simply making a “normal” application multithreaded doesn’t automatically mean you will see better performance. The structure of the app, and how the threads interact, can limit any performance gains, or in fact result in no gains at all. In some rare cases, it can actually make performance worse. Fortunately, a lot of these performance problems are solvable.

There are two fairly common issues when it comes to multithreaded performance: thread starvation and single-threading. In both cases, the result is less performance than you would expect, but with some refactoring and re-engineering, many systems can fix these problems.

Thread Starvation

Thread starvation is a situation where a thread (or pool of threads) basically doesn’t have “enough work”, and so it sits around, idle, waiting for the next thing to work on. This happens when the flow of logic “upstream” of the threads isn’t feeding data as fast as the downstream threads can process it.

Sometimes, this situation is unavoidable, simply because only so much data can be processed at a particular step in the system. Adding more threads at that step may not help, since the bottleneck may be a shared resource such as disc storage or a network connection.

However, you need to look carefully at the point where the bottleneck is occurring and see if there are optimizations that can be performed. In some cases, it is much like traditional performance tuning: you need a better algorithm. Picking a better approach to a computational problem at one stage of the program is often enough to speed up data generation, keeping other threads busier. Sometimes it is simply a hardware issue: better/more/faster devices (discs, network bandwidth, etc).

In the end, though, you may find that this problem isn’t solvable using the technology you have available, or with the budget that you have for hardware and software resources. In cases like this, you may actually decide to abandon or reduce the use of multithreading, because it adds complexity for little benefit. This isn’t necessarily a bad thing to do.

Singlethreading

This situation is sometimes mistaken for thread starvation, and the symptoms can be misleading. The most common evidence is an app or system where all of the threads are running slower than expected, yet none of them are actually stopped. On the surface, it is a case of “it should run faster, why isn’t it?”.

The problem could be singlethreading, a type of bottleneck where multiple threads are constrained by a shared resource protected by a mutex. Where that resource sits in the flow of logic (and how long the shared resource is exclusively locked) determines the performance impact.

The solution depends on the situation. The simplest case may be that the code section being protected is simply very long, and protecting only the bare minimum of the code path will speed things up. Be wary when you find yourself protecting large methods or long code paths.
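That refactoring can be sketched in Java with ReentrantLock. The names here (NarrowCriticalSection, expensiveTransform) are hypothetical stand-ins for whatever mutex and work your code actually has; the point is that the slow version holds the lock for the whole method, while the narrowed version locks only the shared update.

```java
import java.util.concurrent.locks.ReentrantLock;

public class NarrowCriticalSection {
    private final ReentrantLock mutex = new ReentrantLock();
    private long total = 0;

    // Too coarse: the expensive computation runs while holding the mutex,
    // so every other thread serializes behind it, even though the
    // computation itself touches no shared state.
    public void recordSlow(int input) {
        mutex.lock();
        try {
            total += expensiveTransform(input);
        } finally {
            mutex.unlock();
        }
    }

    // Narrowed: do the expensive work outside the lock, and protect only
    // the single shared update.
    public void recordFast(int input) {
        long value = expensiveTransform(input);
        mutex.lock();
        try {
            total += value;
        } finally {
            mutex.unlock();
        }
    }

    public long total() {
        mutex.lock();
        try {
            return total;
        } finally {
            mutex.unlock();
        }
    }

    // Stand-in for some costly, thread-local computation.
    private static long expensiveTransform(int n) {
        return (long) n * n;
    }
}
```

Both versions produce the same result; only the time spent holding the lock changes, which is exactly what singlethreading is about.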

In some cases, a shared resource that is protected by a mutex could be protected by a read/write lock instead, allowing threads to read the data in parallel. This only works, of course, if the shared resource is one that is primarily read by the threads, and only written to occasionally (either by the threads in question or by an outside resource).
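In Java, that swap is straightforward with ReentrantReadWriteLock. This hypothetical settings cache (the class and key names are invented for the example) is read constantly but updated only occasionally, which is the pattern where a read/write lock pays off.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SettingsCache {
    private final Map<String, String> settings = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Any number of threads may hold the read lock at the same time,
    // so the common read path no longer singlethreads.
    public String get(String key) {
        lock.readLock().lock();
        try {
            return settings.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers are exclusive: a writer waits for current readers to drain
    // and blocks new readers until the update is done.
    public void put(String key, String value) {
        lock.writeLock().lock();
        try {
            settings.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```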

Another possibility is to have multiple pools of shared data, allowing parallel access. The best way to explain this solution is with an example: threads will often create and destroy objects, which ultimately use chunks of memory from the process memory heap. In the past, that memory heap, in order to ensure consistency and protect the heap structure, was protected by a single mutex. That meant that the threads, although running in parallel, were constrained by the mutex on the single heap whenever they wanted to allocate and free memory.

How was this addressed? Simple: multiple heaps, each protected by its own mutex. Having multiple heaps, with the threads allocating and freeing data from different heaps, meant that the threads could run in parallel again without having to compete with each other for a single heap.

[Figure: on the left, several threads contending for one memory heap behind a single mutex; on the right, heap selection logic routing the same threads to multiple heaps, each protected by its own mutex]

Most memory allocation is now done using multiple pools. But you may find that parts of your system (or underlying components your system uses) still have a single pool for resources. It may be necessary to build your own layer on top of the shared resource to allow some amount of parallel access.
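The same layering idea applies to any mutex-protected resource, not just the heap. One common form is lock striping, sketched here in Java as a striped counter; the stripe count of 4 and routing by thread id are arbitrary choices for the illustration, just as the heap selection logic above could route threads by other criteria.

```java
import java.util.concurrent.locks.ReentrantLock;

public class StripedCounter {
    private static final int STRIPES = 4;
    private final ReentrantLock[] locks = new ReentrantLock[STRIPES];
    private final long[] counts = new long[STRIPES];

    public StripedCounter() {
        for (int i = 0; i < STRIPES; i++) {
            locks[i] = new ReentrantLock();
        }
    }

    // Route each thread to one stripe; threads landing on different
    // stripes never contend with each other, much like threads
    // allocating from different heaps.
    public void increment() {
        int stripe = (int) (Thread.currentThread().getId() % STRIPES);
        locks[stripe].lock();
        try {
            counts[stripe]++;
        } finally {
            locks[stripe].unlock();
        }
    }

    // Reading the total still touches every stripe, but if this path is
    // rare, the common path stays parallel.
    public long total() {
        long sum = 0;
        for (int i = 0; i < STRIPES; i++) {
            locks[i].lock();
            try {
                sum += counts[i];
            } finally {
                locks[i].unlock();
            }
        }
        return sum;
    }
}
```

The trade-off is the same one the multiple-heap allocators made: fast, mostly-uncontended updates in exchange for a more expensive aggregate view.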

Conclusion and Further Reading

While the need for multithreaded programs has been around for quite some time, there has been renewed interest, and renewed questions on how to best support it, given the rise of multicore processors on mobile devices. Hopefully this primer has been helpful in providing insight and guidance into multithreaded programming.

Once you have some experience, designing and building multithreaded software isn’t actually all that difficult. It does take some discipline, and you have to always “be aware of your surroundings” in ways that you don’t in a traditional singlethreaded program. But it really isn’t as scary and complicated as it may first appear. In time, you may come to rely on it extensively. In some cases, being able to build something in a multithreaded way is essential to gain the highest possible performance from your software. Keep these lessons in mind, and your programs should work just fine now, and in the future.

Have more questions? I can be reached via Twitter (on @farwestab) and I maintain a technology blog (http://farwestab.wordpress.com) where I discuss various technology and technology-related business topics. You can also go to my company’s website (http://www.farwest.ca).

A good resource for Java programmers is Brian Goetz’s Java Concurrency In Practice (ISBN 978-0321349606). It is a very thorough treatment on the topic. For iOS and MacOS programmers, the documentation on the Apple developer site is an essential resource, and gives a comprehensive and detailed description of the frameworks used for multithreaded programming.
