Concurrent Programming: Introduction
Frédéric Haziza
Department of Computer Systems
Uppsala University
Ericsson - Fall 2007
Good to know Scenario Definitions Hardware Classical Paradigms
Outline
1 Good to know
2 Scenario
3 Definitions
4 Hardware
5 Classical Paradigms
   Iterative Parallelism
   Recursive Parallelism
   Producer/Consumer
   Client/Server
   Interacting Peers
MP’07 | MP’07 (Introduction)
Literature
Gregory Andrews. Foundations of Multithreaded, Parallel, and Distributed Programming. Addison-Wesley, 1999 (ISBN: 0-201-35752-6)
Schedule
Date         Time         Comments
Tue 6 Nov    9.00-12.00   Setting the decor
Tue 13 Nov   9.00-12.00   Locks, Barriers + Lab
Tue 27 Nov   9.00-12.00   Remainder...
Scenario
Several cars want to drive from point A to point B.
Sequential Programming: They can compete for space on the same road and end up either:
following each other
or competing for positions (and having accidents!).
Parallel Programming: Or they could drive in parallel lanes, thus arriving at about the same time without getting in each other’s way.
Distributed Programming: Or they could travel different routes, using separate roads.
Definitions
Concurrent Program
Two or more processes working together to perform a task.
Each process is a sequential program (= sequence of statements executed one after another)
Single thread of control vs. multiple threads of control
Communication
• Shared Variables
• Message Passing
Synchronization
• Mutual Exclusion
• Condition Synchronization
Correctness
Wanna write a concurrent program?
What kinds of processes?
How many processes?
How should they interact?
Correctness: Ensure that process interaction is properly synchronized
Mutual Exclusion: Ensuring that critical sections of statements do not execute at the same time
Condition Synchronization: Delaying a process until a given condition is true
Our focus: imperative programs and asynchronous execution
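The two kinds of synchronization named above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `run_counter` and `run_handoff` are hypothetical, and Python's `threading.Lock` and `threading.Condition` stand in for whatever primitives the course uses later.

```python
import threading

def run_counter(num_threads=4, increments=10_000):
    """Mutual exclusion: only one thread at a time updates the counter."""
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        for _ in range(increments):
            with lock:            # critical section
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

def run_handoff(value):
    """Condition synchronization: the waiter is delayed until data exists."""
    cond = threading.Condition()
    box, received = [], []

    def waiter():
        with cond:
            while not box:        # delay until the condition is true
                cond.wait()
            received.append(box.pop())

    def signaller():
        with cond:
            box.append(value)
            cond.notify()         # wake the waiting thread

    threads = [threading.Thread(target=waiter),
               threading.Thread(target=signaller)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received[0]
```

With the lock in place, `run_counter(4, 1000)` always returns 4000; without it, updates could be lost when two threads interleave inside the read-modify-write.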
Amdahl’s law
P is the fraction of a calculation that can be parallelized
(1 − P) is the fraction that is sequential (i.e., cannot benefit from parallelization)
N processors
⇒ maximum speedup = $\frac{1}{(1 - P) + P/N}$
Example: If P = 90% ⇒ max speedup of 10, no matter how large the value of N used, since the speedup tends to 1/(1 − P) as N → ∞
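The formula is easy to evaluate numerically. The following is a minimal sketch; the function name `amdahl_speedup` is an assumption chosen for illustration.

```python
def amdahl_speedup(p, n):
    """Maximum speedup for a parallel fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

if __name__ == "__main__":
    # With P = 0.9 the speedup approaches, but never reaches, 10.
    for n in (10, 100, 1_000_000):
        print(n, amdahl_speedup(0.9, n))
```

For P = 0.9, ten processors give a speedup of about 5.26 and one hundred processors about 9.17, which shows how quickly the sequential 10% dominates.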
Single-Processor Machine
Memory Hierarchy
[Figure: memory hierarchy, from farthest to closest to the processor: Main Memory, Level 2 cache, Level 1 cache, CPU]
Why do we miss in the cache?
Compulsory miss: Touching the data for the first time
Capacity miss: The cache is too small
Conflict miss: Non-ideal cache implementation (data hash to the same cache line)
[Figure: the CPU accesses the Cache; a Hit is served by the Cache, a Miss goes to Main Memory]
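To make the miss categories concrete, here is a small sketch of a direct-mapped cache simulator. Everything in it is an assumption for illustration: the name `simulate_direct_mapped`, the tiny cache geometry, and the simplification that all non-compulsory misses are reported together (telling capacity from conflict misses would need a second, fully associative simulation).

```python
def simulate_direct_mapped(trace, num_lines=4, line_size=16):
    """Count hits and misses for a list of byte addresses."""
    lines = [None] * num_lines      # block currently held by each cache line
    seen_blocks = set()             # blocks touched at least once
    stats = {"hit": 0, "compulsory": 0, "capacity_or_conflict": 0}
    for address in trace:
        block = address // line_size
        index = block % num_lines   # the "hash" that picks the cache line
        if lines[index] == block:
            stats["hit"] += 1
        elif block not in seen_blocks:
            stats["compulsory"] += 1            # first touch of this block
        else:
            stats["capacity_or_conflict"] += 1  # evicted earlier, needed again
        seen_blocks.add(block)
        lines[index] = block
    return stats
```

In this model, addresses 0 and 64 map to the same line, so the trace `[0, 64, 0, 64]` never hits even though three of the four lines stay empty: those repeated misses are conflict misses.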
Locality
Temporal locality
Spatial locality
Inner loop stepping through an array:
A, B, C, A+1, B, C, A+2, B, C, ...
Spatial locality: A, A+1, A+2 (neighboring addresses). Temporal locality: B, C (the same addresses reused).
MultiProcessor world - Taxonomy
SIMD
MIMD
   Message Passing: Fine-grained, Coarse-grained
   Shared Memory: UMA, NUMA, COMA
Shared-Memory Multiprocessors
[Figure: several CPUs, each with its own Cache, connected through an interconnection network / bus to the Memory modules]
Programming Model
[Figure: several Threads, each with its own cache ($), on top of a single Shared Memory]
Cache coherency
[Figure: three Threads, each with its own cache ($), share a memory holding the data A and B. The threads issue a series of Read A operations, then a Write A and a Read B, followed by a further Read A.]
Summing up Coherence
There can be many copies of a datum, but only one value
Too strong!!!
There is a single global order of value changes to each datum
Memory Ordering
Coherence defines a per-datum order of value changes.
The memory model defines the order of value changes for all the data.
What ordering does the memory system guarantee?
“Contract” between the HW and the SW developers
Without it, we can’t say much about the result of a parallel execution
What order for these threads?
A’ denotes a modified value to the datum at address A
Thread 1: LD A; ST B’; LD C; ST D’; LD E; ...
Thread 2: ST A’; LD B’; ST C’; LD D; ST E’; ...
LD A happens before ST A’
Other possible orders?
Thread 1: LD A; ST B’; LD C; ST D’; LD E; ...
Thread 2: ST A’; LD B’; ST C’; LD D; ST E’; ...
[Figure: two diagrams of the same instruction streams, each illustrating a different possible ordering]
Memory model flavors
Sequentially Consistent: Programmer’s intuition
Total Store Order: Almost Programmer’s intuition
Weak/Release Consistency: No guarantee
Dekker’s algorithm
Initially A = 0, B = 0
“fork”
Left thread:
   A := 1
   if (B == 0) print(“A wins”);
Right thread:
   B := 1
   if (A == 0) print(“B wins”);
Can both A and B win?
Does the write become globally visible before the read is performed?
Left: The read (i.e., test if B == 0) can bypass the store (A := 1)
Right: The read (i.e., test if A == 0) can bypass the store (B := 1)
⇒ Both loads can be performed before any of the stores
⇒ Yes, it is possible that both win!
Dekker’s algorithm for Total Store Order
Initially A = 0, B = 0
“fork”
Left thread:
   A := 1
   Membar #StoreLoad;
   if (B == 0) print(“A wins”);
Right thread:
   B := 1
   Membar #StoreLoad;
   if (A == 0) print(“B wins”);
Can both A and B win?
Does the write become globally visible before the read is performed?
Membar: the read is started after all previous stores have been “globally ordered”
⇒ Behaves like a sequentially consistent machine
⇒ No, they won’t both win. Good job Mister Programmer!
Dekker’s algorithm, in general
Initially A = 0, B = 0
“fork”
Left thread:
   A := 1
   if (B == 0) print(“A wins”);
Right thread:
   B := 1
   if (A == 0) print(“B wins”);
Can both A and B win?
The answer depends on the memory model
Remember? ...Contract between the HW and SW developers.
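The dependence on the memory model can be checked by brute force. The sketch below is an illustration under assumptions, not a model of any real processor: it treats Total Store Order as "each thread has a FIFO store buffer that drains to memory at arbitrary times", treats the membar as "wait until the thread's own buffer is empty", and explores every interleaving. The function name `dekker_outcomes` and its flags are hypothetical.

```python
def dekker_outcomes(store_buffer, fence):
    """Explore every interleaving; return the set of (a_wins, b_wins) pairs."""
    programs = []
    for mine, other in (("A", "B"), ("B", "A")):
        ops = [("store", mine)]
        if fence:
            ops.append(("fence", None))
        ops.append(("load", other))
        programs.append(ops)

    index = {"A": 0, "B": 1}
    # State: program counters, memory (A, B), store buffers, loaded values.
    start = ((0, 0), (0, 0), ((), ()), (None, None))
    seen, stack, results = {start}, [start], set()
    while stack:
        pcs, mem, bufs, loaded = stack.pop()
        successors = []
        for t in (0, 1):
            if bufs[t]:  # the oldest buffered store may drain to memory
                new_mem = list(mem)
                new_mem[index[bufs[t][0]]] = 1
                new_bufs = list(bufs)
                new_bufs[t] = bufs[t][1:]
                successors.append((pcs, tuple(new_mem), tuple(new_bufs), loaded))
            if pcs[t] == len(programs[t]):
                continue  # this thread has finished its program
            kind, var = programs[t][pcs[t]]
            if kind == "fence" and bufs[t]:
                continue  # the fence waits until the thread's stores are visible
            new_pcs, new_mem = list(pcs), list(mem)
            new_bufs, new_loaded = list(bufs), list(loaded)
            new_pcs[t] += 1
            if kind == "store":
                if store_buffer:
                    new_bufs[t] = bufs[t] + (var,)   # buffered, not yet visible
                else:
                    new_mem[index[var]] = 1          # visible immediately
            elif kind == "load":
                # forward from the thread's own buffer, otherwise read memory
                new_loaded[t] = 1 if var in bufs[t] else mem[index[var]]
            successors.append((tuple(new_pcs), tuple(new_mem),
                               tuple(new_bufs), tuple(new_loaded)))
        if not successors:  # both threads done and both buffers drained
            results.add((loaded[0] == 0, loaded[1] == 0))
        for state in successors:
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return results
```

In this model, `(True, True)` (both win) is reachable only with `store_buffer=True` and `fence=False`; with immediate stores, or with the fence between the store and the load, the outcome disappears.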
So...
The memory model is a tricky issue
New issues
Compulsory miss
Capacity miss
Conflict miss
[Figure: shared-memory multiprocessor: CPUs, each with its own Cache, connected through an interconnection network / bus to the Memory modules]
Communication miss: Cache-to-cache transfer
False sharing: Side effect of large cache lines
What about the compiler? Code reordering? volatile keyword in C...
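False sharing comes down to address arithmetic: two unrelated variables interfere when they fall into the same cache line. The sketch below only does that arithmetic. The 64-byte line size, the assumption that the array starts on a line boundary, and the helper names are all illustrative choices, not properties of a particular machine.

```python
LINE_SIZE = 64  # bytes; an assumed, commonly used cache-line size

def cache_line(offset, line_size=LINE_SIZE):
    """Index of the cache line that holds the byte at this offset."""
    return offset // line_size

def counter_offsets(num_threads, stride):
    """Byte offsets of per-thread counters laid out 'stride' bytes apart."""
    return [t * stride for t in range(num_threads)]

def shares_line(offsets, line_size=LINE_SIZE):
    """True if at least two of the offsets land in the same cache line."""
    lines = [cache_line(o, line_size) for o in offsets]
    return len(set(lines)) < len(lines)
```

Four 8-byte counters packed back to back (`counter_offsets(4, 8)`) all sit in one line, so every write by one thread invalidates the copies of the others; padding each counter to a full line (`counter_offsets(4, 64)`) separates them.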
Good to know
Performance ⇒ Use of Cache
Memory hierarchy ⇒ Consistency problems
To get maximal performance on a given machine, the programmer has to know the characteristics of the memory system and write programs that account for them
Distributed Memory Architecture
[Figure: several CPUs, each with its own Cache and its own Memory, connected by an interconnection network]
Communication through Message Passing
Own cache, but memory not shared ⇒ No coherency problems
Classical Paradigms
Data Parallel
Task Parallel
5 paradigms:
Iterative parallelism
Recursive parallelism
Producer/Consumer
Client/Server
Interacting peers
Iterative Parallelism: Matrix multiplication
1: double a[n,n], b[n,n], c[n,n];
2: for [i = 0 to n-1] {        ⊲ iterating through the rows
3:   for [j = 0 to n-1] {      ⊲ iterating through the columns
4:     ⊲ Computes inner product of a[i,*] and b[*,j]
5:     c[i,j] = 0.0;
6:     for [k = 0 to n-1] {
7:       c[i,j] = c[i,j] + a[i,k]*b[k,j];
8:     }
9:   }
10: }
What can we parallelize? Lines 5 to 7
⇒ c[i,j] is written to, and a[i,k], b[k,j] are only read
⇒ every c[i,j] computation!
Iterative Parallelism: Matrix multiplication
Parallelizing the rows
co [i = 0 to n-1] {        ⊲ compute rows in parallel
  for [j = 0 to n-1] {
    c[i,j] = 0.0;
    for [k = 0 to n-1] {
      c[i,j] = c[i,j] + a[i,k]*b[k,j];
    }
  }
}
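The row-parallel version translates fairly directly into Python, with one task per row. This is a sketch under assumptions: the names `multiply_row` and `parallel_matmul` are hypothetical, matrices are plain lists of lists, and a thread pool replaces the `co` statement. In CPython the global interpreter lock means these threads illustrate the structure rather than deliver a real speedup for pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor

def multiply_row(a_row, b):
    """Inner products of one row of a with every column of b."""
    inner = len(b)
    columns = len(b[0])
    return [sum(a_row[k] * b[k][j] for k in range(inner))
            for j in range(columns)]

def parallel_matmul(a, b, workers=4):
    """Compute c = a * b, handing each row of c to a separate task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task writes only its own row of c and only reads a and b,
        # so the tasks need no synchronization with each other.
        return list(pool.map(lambda row: multiply_row(row, b), a))
```

The rows come back in order because `pool.map` preserves the order of its inputs, so the result can be used as the matrix c directly.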
Iterative Parallelism: Matrix multiplication
Parallelizing the columns
co [j = 0 to n-1] {        ⊲ compute columns in parallel
  for [i = 0 to n-1] {
    c[i,j] = 0.0;
    for [k = 0 to n-1] {
      c[i,j] = c[i,j] + a[i,k]*b[k,j];
    }
  }
}
Iterative Parallelism: Matrix multiplication
Parallelizing all rows and columns
co [i = 0 to n-1, j = 0 to n-1] {
  c[i,j] = 0.0;
  for [k = 0 to n-1] {
    c[i,j] = c[i,j] + a[i,k]*b[k,j];
  }
}
Recursive Parallelism: Adaptive Quadrature
[Figure: plot of f(x) with axes x and y; the area under the curve between a and b is $\int_a^b f(x)\,dx$]
Recursive Parallelism: Adaptive Quadrature
double fleft = f(a), fright, area = 0.0;
double width = (b-a) / INTERVALS;
for [x = (a+width) to b by width] {
  fright = f(x);
  ⊲ Compute the area of the small trapezoid
  area = area + (fleft + fright) * width / 2;
  fleft = fright;      ⊲ the right-hand value becomes the new left-hand value
}
[Figure: plot of f(x) against x, with axes x and y]
Divide and Conquer
[Figure: two plots of f(x) against x, comparing the old and the new area approximation]
Recurse while $|area_{new} - area_{old}| > EPSILON$
Divide and Conquer
double quad (double left, right, fleft, fright, oldarea) {
  double mid = (left + right)/2;      ⊲ find the middle point
  double fmid = f(mid);               ⊲ get its value
  double larea = (fleft + fmid) * (mid - left)/2;
  double rarea = (fmid + fright) * (right - mid)/2;
  if |(larea + rarea) - oldarea| > EPSILON {
    ⊲ Recurse to integrate both halves
    larea = quad (left, mid, fleft, fmid, larea);
    rarea = quad (mid, right, fmid, fright, rarea);
  }
  return (larea + rarea);
}
$\int_a^b f(x)\,dx \approx$ quad(a, b, f(a), f(b), (f(a) + f(b)) * (b - a)/2);
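For readers who want to run the recursion, here is a direct Python transcription of the sequential procedure. It is a sketch: the extra parameters `f` and `eps` and the wrapper `integrate` are assumptions added so the code is self-contained, and the tolerance applies to each interval separately, not to the final result.

```python
def quad(f, left, right, fleft, fright, old_area, eps):
    """Adaptive trapezoid rule on [left, right], as in the pseudocode."""
    mid = (left + right) / 2          # find the middle point
    fmid = f(mid)                     # get its value
    larea = (fleft + fmid) * (mid - left) / 2
    rarea = (fmid + fright) * (right - mid) / 2
    if abs((larea + rarea) - old_area) > eps:
        # The refinement changed the estimate too much: recurse on both halves.
        larea = quad(f, left, mid, fleft, fmid, larea, eps)
        rarea = quad(f, mid, right, fmid, fright, rarea, eps)
    return larea + rarea

def integrate(f, a, b, eps=1e-6):
    """Initial call: start from a single trapezoid over [a, b]."""
    fa, fb = f(a), f(b)
    return quad(f, a, b, fa, fb, (fa + fb) * (b - a) / 2, eps)
```

For example, `integrate(lambda x: x * x, 0.0, 1.0)` returns a value close to 1/3, slightly above it because trapezoids overestimate a convex function.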
Divide and Conquer - Parallel
double quad (double left, right, fleft, fright, oldarea) {
  double mid = (left + right)/2;      ⊲ find the middle point
  double fmid = f(mid);               ⊲ get its value
  double larea = (fleft + fmid) * (mid - left)/2;
  double rarea = (fmid + fright) * (right - mid)/2;
  if |(larea + rarea) - oldarea| > EPSILON {
    ⊲ Recurse to integrate both halves
    co [] {
      larea = quad (left, mid, fleft, fmid, larea);
      ⊲ in parallel!
      rarea = quad (mid, right, fmid, fright, rarea);
    }      ⊲ Must wait for larea and rarea
  }
  return (larea + rarea);
}
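The parallel recursion can be sketched in Python with one thread per half. The names `parallel_quad` and `parallel_integrate`, the `eps` parameter, and in particular the `max_parallel_depth` cut-off are assumptions introduced here: without some limit, the recursion would create a thread for every subinterval. Joining both threads plays the role of the implicit wait at the end of the `co` statement.

```python
import threading

def parallel_quad(f, left, right, fleft, fright, old_area, eps,
                  depth=0, max_parallel_depth=3):
    """Adaptive trapezoid rule; the two halves run in separate threads."""
    mid = (left + right) / 2
    fmid = f(mid)
    larea = (fleft + fmid) * (mid - left) / 2
    rarea = (fmid + fright) * (right - mid) / 2
    if abs((larea + rarea) - old_area) > eps:
        halves = [(left, mid, fleft, fmid, larea),
                  (mid, right, fmid, fright, rarea)]
        areas = [0.0, 0.0]

        def solve(slot):
            lo, hi, flo, fhi, area = halves[slot]
            areas[slot] = parallel_quad(f, lo, hi, flo, fhi, area, eps,
                                        depth + 1, max_parallel_depth)

        if depth < max_parallel_depth:
            threads = [threading.Thread(target=solve, args=(s,))
                       for s in (0, 1)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()          # must wait for both areas
        else:
            solve(0)              # deep in the recursion: stay sequential
            solve(1)
        larea, rarea = areas
    return larea + rarea

def parallel_integrate(f, a, b, eps=1e-6):
    """Initial call: start from a single trapezoid over [a, b]."""
    fa, fb = f(a), f(b)
    return parallel_quad(f, a, b, fa, fb, (fa + fb) * (b - a) / 2, eps)
```

Each thread writes only its own slot of `areas`, so the two halves need no lock; the only synchronization is the join before the areas are added.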
Producer / Consumer
[Figure: the Producer and the Consumer communicate through a Shared Resource]
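A bounded buffer is one common form of the shared resource. The sketch below uses Python's `queue.Queue`, which blocks the producer when the buffer is full and the consumer when it is empty. The `DONE` sentinel and the function names are conventions assumed for this example.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the stream (assumed convention)

def producer(items, buffer):
    for item in items:
        buffer.put(item)      # blocks while the buffer is full
    buffer.put(DONE)

def consumer(buffer, consumed):
    while True:
        item = buffer.get()   # blocks while the buffer is empty
        if item is DONE:
            break
        consumed.append(item)

def run_pipeline(items, capacity=2):
    """Run one producer and one consumer over a bounded buffer."""
    buffer = queue.Queue(maxsize=capacity)
    consumed = []
    threads = [threading.Thread(target=producer, args=(items, buffer)),
               threading.Thread(target=consumer, args=(buffer, consumed))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed
```

With a single producer and a single consumer, the items arrive in the order they were produced, whatever the buffer capacity.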
Client / Server
[Figure: Client 1 ... Client n each send a Request to the Server and receive a Reply]
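The request/reply pattern can be sketched with threads and queues: every client sends its request together with a private reply queue, and the server answers on that queue. The squaring "service", the `STOP` sentinel, and the function names are assumptions made up for this illustration.

```python
import queue
import threading

STOP = object()  # sentinel telling the server to shut down (assumed convention)

def server(requests):
    while True:
        message = requests.get()          # wait for the next request
        if message is STOP:
            break
        payload, reply_box = message
        reply_box.put(payload * payload)  # reply to the requesting client only

def client(requests, payload, results, slot):
    reply_box = queue.Queue(maxsize=1)    # private channel for the reply
    requests.put((payload, reply_box))
    results[slot] = reply_box.get()       # block until the server answers

def run_clients(payloads):
    """Start one server and one client per payload; return the replies."""
    requests = queue.Queue()
    results = [None] * len(payloads)
    server_thread = threading.Thread(target=server, args=(requests,))
    server_thread.start()
    clients = [threading.Thread(target=client, args=(requests, p, results, i))
               for i, p in enumerate(payloads)]
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    requests.put(STOP)
    server_thread.join()
    return results
```

The server handles requests one at a time in arrival order, but each client still receives its own answer because the reply channel travels with the request.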
Interacting Peers - Coordinator/Workers
[Figure: the Coordinator sends Data to Worker 1 ... Worker n−1 and receives Results from each of them]
Interacting Peers - Circular Pipeline
[Figure: Worker 1 ... Worker n−1 connected in a ring]
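A circular pipeline can be sketched as a ring of threads in which each worker receives from its left neighbor and sends to its right neighbor. In this assumed example, the workers add up one local value each: worker 0 starts the round and receives the total once the partial sum has travelled around the ring. The name `ring_sum` and the use of one inbox queue per worker are illustrative choices.

```python
import queue
import threading

def ring_sum(values):
    """Sum one value per worker by passing a partial sum around a ring."""
    n = len(values)
    if n == 0:
        raise ValueError("the ring needs at least one worker")
    inboxes = [queue.Queue() for _ in range(n)]   # one inbox per worker
    total = [None]

    def worker(i):
        right = inboxes[(i + 1) % n]              # neighbor to the right
        if i == 0:
            right.put(values[0])                  # start the round
            total[0] = inboxes[0].get()           # the sum comes back around
        else:
            partial = inboxes[i].get()            # wait for the left neighbor
            right.put(partial + values[i])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

No worker needs a global view: each one knows only its own value and its right-hand neighbor, which is what distinguishes interacting peers from the coordinator/workers arrangement.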