24 Multithreaded Algorithms

Date posted: 21-Jan-2018
Uploaded by: andres-mendez-vazquez
Page 1: 24 Multithreaded Algorithms

Analysis of Algorithms: Multi-threaded Algorithms

Andres Mendez-Vazquez

April 15, 2016

1 / 94

Page 2: 24 Multithreaded Algorithms

Outline

1 Introduction
    Why Multi-Threaded Algorithms?
2 Model To Be Used
    Symmetric Multiprocessor
    Operations
    Example
3 Computation DAG
    Introduction
4 Performance Measures
    Introduction
    Running Time Classification
5 Parallel Laws
    Work and Span Laws
    Speedup and Parallelism
    Greedy Scheduler
    Scheduling Raises the Following Issue
6 Examples
    Parallel Fibonacci
    Matrix Multiplication
    Parallel Merge-Sort
7 Exercises
    Some Exercises you can try!

Page 4: 24 Multithreaded Algorithms

Multi-Threaded Algorithms

Motivation
Until now, our serial algorithms have been quite suitable for running on a single-processor system. However, multiprocessor machines are now ubiquitous:
- Therefore, extending our serial models to a parallel computation model is a must.

Computational Model
There exist many competing models of parallel computation that are essentially different:
- Shared Memory
- Message Passing
- Etc.

Page 11: 24 Multithreaded Algorithms

The Model to Be Used

Symmetric Multiprocessor
The model that we will use is the Symmetric Multiprocessor (SMP), where a shared memory exists.

[Figure: four processors attached by a bus to the main shared memory; each processor holds four CPU cores, each core with private L1i and L1d caches and an L2 cache, all four cores sharing an L3 cache, with point-to-point (P-to-P) links between processors.]

Page 12: 24 Multithreaded Algorithms

Dynamic Multi-Threading

Dynamic Multi-Threading
In reality it can be difficult to handle multi-threaded programs on an SMP. Thus, we will assume a simple concurrency platform that handles all the resources:
- Schedules
- Memory
- Etc.
It is called Dynamic Multi-threading.

Dynamic Multi-Threading Computing Operations
- spawn
- sync
- parallel

Page 23: 24 Multithreaded Algorithms

SPAWN

SPAWN
When spawn precedes a procedure call, the parent procedure may continue to execute in parallel with the spawned child.

Note
The keyword spawn does not guarantee concurrent execution, but it allows it. The scheduler decides which computations actually run concurrently.

Page 26: 24 Multithreaded Algorithms

SYNC AND PARALLEL

SYNC
The keyword sync indicates that the procedure must wait for all its spawned children to complete.

PARALLEL
This operation applies to loops, making it possible to execute the body of the loop in parallel.
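As a rough Python sketch of these keywords' semantics (an illustration, not part of the slides), a parallel loop can be modeled with a thread pool; returning from the call plays the role of the implicit sync at the end of the loop:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_for(body, n, workers=4):
    # Sketch of the "parallel" loop keyword: iterations of the loop body may
    # run concurrently; the call returns only after every iteration has
    # completed, which is the implicit sync at the end of the loop.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(body, range(n)))

squares = [0] * 8
parallel_for(lambda i: squares.__setitem__(i, i * i), 8)
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The names `parallel_for`, `body`, and `workers` are assumptions made for this sketch; a real concurrency platform would also choose the number of workers for us.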

Page 29: 24 Multithreaded Algorithms

A Classic Parallel Piece of Code: Fibonacci Numbers

Fibonacci's Definition
F_0 = 0, F_1 = 1, F_i = F_{i-1} + F_{i-2} for i > 1.

Naive Algorithm
Fibonacci(n)
  if n ≤ 1 then
    return n
  else
    x = Fibonacci(n − 1)
    y = Fibonacci(n − 2)
    return x + y
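The pseudocode above transcribes directly into Python (a sketch added here for concreteness, not part of the slides):

```python
def fibonacci(n):
    # Naive recursive Fibonacci, exactly as in the pseudocode above.
    if n <= 1:
        return n
    x = fibonacci(n - 1)
    y = fibonacci(n - 2)
    return x + y

print([fibonacci(i) for i in range(8)])  # [0, 1, 1, 2, 3, 5, 8, 13]
```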

Page 31: 24 Multithreaded Algorithms

Time Complexity

Recursion and ComplexityRecursion T (n)=T (n − 1) + T (n − 2) + Θ (1).Complexity T (n) = Θ (Fn) = Θ (φn), φ = 1+

√5

2 .

13 / 94

Page 33: 24 Multithreaded Algorithms

There is a Better Way

We can order the first three numbers in the sequence as

$$\begin{pmatrix} F_2 & F_1 \\ F_1 & F_0 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$$

Then

$$\begin{pmatrix} F_2 & F_1 \\ F_1 & F_0 \end{pmatrix}\begin{pmatrix} F_2 & F_1 \\ F_1 & F_0 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} F_3 & F_2 \\ F_2 & F_1 \end{pmatrix}$$

Page 35: 24 Multithreaded Algorithms

There is a Better Way

Calculating in O(log n) when n is a power of 2(1 11 0

)n=(

F (n + 1) F (n)F (n) F (n − 1)

)

Thus(

1 11 0

) n2(

1 11 0

) n2

=(

F(

n2 + 1

)F(

n2

)F(

n2

)F(

n2 − 1

) )( F(

n2 + 1

)F(

n2

)F(

n2

)F(

n2 − 1

) )

However...We will use the naive version to illustrate the principles of parallelprogramming.

15 / 94
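The O(log n) idea above — repeated squaring of the matrix [[1,1],[1,0]] — can be sketched in Python as follows (an illustration added here, not part of the slides; this version handles any n, not just powers of 2):

```python
def mat_mult(A, B):
    # 2x2 matrix product, for the Fibonacci matrix identity above.
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

def fib_matrix(n):
    # Repeated squaring: O(log n) matrix multiplications instead of Θ(φ^n) calls.
    result = [[1, 0], [0, 1]]    # identity matrix
    base = [[1, 1], [1, 0]]
    while n > 0:
        if n & 1:
            result = mat_mult(result, base)
        base = mat_mult(base, base)
        n >>= 1
    return result[0][1]          # [[1,1],[1,0]]^n = [[F(n+1), F(n)], [F(n), F(n-1)]]

print(fib_matrix(10))  # 55
```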

Page 37: 24 Multithreaded Algorithms

The Concurrent Code

Parallel Algorithm
PFibonacci(n)
  if n ≤ 1 then
    return n
  else
    x = spawn PFibonacci(n − 1)
    y = PFibonacci(n − 2)
    sync
    return x + y
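One way to mimic spawn and sync with real OS threads is sketched below (an illustration under assumed names, not the slides' code; the depth cutoff bounds the thread count, a job a real work-stealing scheduler would handle for us):

```python
import threading

def pfib(n, depth=0, cutoff=3):
    if n <= 1:
        return n
    if depth >= cutoff:
        # Below the cutoff, fall back to plain serial recursion: threads are not free.
        return pfib(n - 1, depth + 1) + pfib(n - 2, depth + 1)
    result = {}
    # "spawn": compute PFibonacci(n-1) in a child thread while the parent
    # continues with PFibonacci(n-2).
    child = threading.Thread(
        target=lambda: result.setdefault("x", pfib(n - 1, depth + 1)))
    child.start()
    y = pfib(n - 2, depth + 1)
    child.join()  # "sync": wait for the spawned child before combining results.
    return result["x"] + y

print(pfib(10))  # 55
```

Note that in CPython the global interpreter lock prevents true CPU parallelism across threads, so this sketch illustrates the control structure, not an actual speedup.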

Page 41: 24 Multithreaded Algorithms

How do we compute a complexity? Computation DAG

Definition
A directed acyclic graph G = (V, E) where:
- The vertices V are sets of instructions.
- The edges E represent dependencies between sets of instructions, i.e., (u, v) means instruction set u must execute before v.

Notes
- A set of instructions without any parallel control is grouped into a strand.
- Thus, V represents a set of strands and E represents dependencies between strands induced by parallel control.
- A strand of maximal length will be called a thread.

Page 47: 24 Multithreaded Algorithms

How do we compute a complexity? Computation DAG

Thus
- If there is an edge between threads u and v, then they are said to be (logically) in series.
- If there is no edge, then they are said to be (logically) in parallel.

Page 49: 24 Multithreaded Algorithms

Example: PFibonacci(4)

ExamplePFibonacci(4)

PFibonacci(3) PFibonacci(2)

PFibonacci(2)

PFibonacci(1) PFibonacci(1)

PFibonacci(1) PFibonacci(0)

PFibonacci(0)

20 / 94

Page 50: 24 Multithreaded Algorithms

Edge Classification

Continuation Edge
A continuation edge (u, v) connects a thread u to its successor v within the same procedure instance.

Spawn Edge
When a thread u spawns a new thread v, then (u, v) is called a spawn edge.

Call Edge
Call edges represent normal procedure calls.

Return Edge
A return edge signals that a thread v has returned to its calling procedure.

Page 54: 24 Multithreaded Algorithms

Example: PFibonacci(4)

The Different Edges

PFibonacci(4)

PFibonacci(3) PFibonacci(2)

PFibonacci(2)

PFibonacci(1) PFibonacci(1)

PFibonacci(1) PFibonacci(0)

PFibonacci(0)

Init Thread

Spawn Edge

Continuation Edge

Return Edge

Final Thread

Call Edge

22 / 94

Page 56: 24 Multithreaded Algorithms

Performance Measures

WORK
The work of a multi-threaded computation is the total time to execute the entire computation on one processor:

$$\text{Work} = \sum_{i \in I} \text{Time}(\text{Thread}_i)$$

SPAN
The span is the longest time to execute the strands along any path of the DAG. In a DAG in which each strand takes unit time, the span equals the number of vertices on a longest (critical) path in the DAG.

Page 58: 24 Multithreaded Algorithms

Example: PFibonacci(4)

Example
[Figure: the PFibonacci(4) computation DAG with its critical path highlighted.]

Page 59: 24 Multithreaded Algorithms

Example

Example
In PFibonacci(4), we have:
- 17 threads.
- 8 vertices in the longest path.

We have that, assuming unit time:
- WORK = 17 time units
- SPAN = 8 time units

Note
Running time depends not only on work and span but also on:
- Available cores
- Scheduler policies
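Under the unit-time assumption, work and span can be read off a computation DAG directly: work is the number of strands, span the number of vertices on a critical path. A small sketch (a hypothetical helper, not from the slides):

```python
def work_and_span(dag):
    # dag: strand -> list of successor strands; each strand takes unit time.
    work = len(dag)                      # total unit-time strands
    memo = {}
    def longest(u):
        # Longest path starting at u; memoization is safe since the graph is acyclic.
        if u not in memo:
            memo[u] = 1 + max((longest(v) for v in dag[u]), default=0)
        return memo[u]
    span = max(longest(u) for u in dag)  # vertices on a critical path
    return work, span

# A diamond-shaped DAG: a before b and c (which are in parallel), both before d.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(work_and_span(dag))  # (4, 3)
```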

Page 69: 24 Multithreaded Algorithms

Running Time Classification

Single Processor
T_1: the running time on a single processor.

Multiple Processors
T_P: the running time on P processors.

Unlimited Processors
T_∞: the running time on an unlimited number of processors, also called the span, if we run each strand on its own processor.

Page 73: 24 Multithreaded Algorithms

Work Law

Definition
In one step, an ideal parallel computer with P processors can do:
- At most P units of work.
- Thus, in T_P time it can perform at most P·T_P work.

$$P\,T_P \geq T_1 \implies T_P \geq \frac{T_1}{P}$$

Page 77: 24 Multithreaded Algorithms

Span Law

Definition
- A P-processor ideal parallel computer cannot run faster than a machine with an unlimited number of processors.
- However, a computer with an unlimited number of processors can emulate a P-processor machine by simply using P of its processors. Therefore,

$$T_P \geq T_\infty$$

Page 79: 24 Multithreaded Algorithms

Work Calculations: Serial

Serial Computations
[Figure: computations A and B composed in series.]

Note
Work: T_1(A ∪ B) = T_1(A) + T_1(B).
Span: T_∞(A ∪ B) = T_∞(A) + T_∞(B).

Page 82: 24 Multithreaded Algorithms

Work Calculations: Parallel

Parallel Computations
[Figure: computations A and B composed in parallel.]

Note
Work: T_1(A ∪ B) = T_1(A) + T_1(B).
Span: T_∞(A ∪ B) = max(T_∞(A), T_∞(B)).
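These two composition rules can be written down directly; a sketch (illustrative helpers, not from the slides) operating on (work, span) pairs:

```python
def serial(a, b):
    # A followed by B: work and span both add.
    return (a[0] + b[0], a[1] + b[1])

def parallel(a, b):
    # A alongside B: work adds, span is the longer of the two.
    return (a[0] + b[0], max(a[1], b[1]))

A, B = (10, 4), (6, 5)   # hypothetical (work, span) of two sub-computations
print(serial(A, B))      # (16, 9)
print(parallel(A, B))    # (16, 5)
```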

Page 86: 24 Multithreaded Algorithms

Speedup and Parallelism

Speedup
The speedup of a computation on P processors is defined as T_1 / T_P. Then, by the work law, T_1 / T_P ≤ P. Thus, the speedup on P processors can be at most P.

Notes
- Linear speedup when T_1 / T_P = Θ(P).
- Perfect linear speedup when T_1 / T_P = P.

Parallelism
The parallelism of a computation is defined as T_1 / T_∞.
- Specifically, we are looking to have a lot of parallelism.
- This changes from algorithm to algorithm.
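Putting the work and span laws together gives a lower bound on T_P, and T_1 / T_∞ bounds the achievable speedup. A numeric sketch (added for illustration) using the PFibonacci(4) figures from the earlier example, T_1 = 17 and T_∞ = 8:

```python
def tp_lower_bound(t1, t_inf, p):
    # Work law: T_P >= T_1 / P.  Span law: T_P >= T_inf.
    return max(t1 / p, t_inf)

def parallelism(t1, t_inf):
    # Maximum achievable speedup, independent of P.
    return t1 / t_inf

t1, t_inf = 17, 8                      # PFibonacci(4), unit-time strands
print(tp_lower_bound(t1, t_inf, 2))    # 8.5  (work law dominates)
print(tp_lower_bound(t1, t_inf, 4))    # 8    (span law dominates)
print(parallelism(t1, t_inf))          # 2.125
```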

Page 93: 24 Multithreaded Algorithms

Outline1 Introduction

Why Multi-Threaded Algorithms?2 Model To Be Used

Symmetric MultiprocessorOperationsExample

3 Computation DAGIntroduction

4 Performance MeasuresIntroductionRunning Time Classification

5 Parallel LawsWork and Span LawsSpeedup and ParallelismGreedy SchedulerScheduling Rises the Following Issue

6 ExamplesParallel FibonacciMatrix MultiplicationParallel Merge-Sort

7 ExercisesSome Exercises you can try!!!

36 / 94

Page 94: 24 Multithreaded Algorithms

Greedy Scheduler

Definition
A greedy scheduler assigns as many strands to processors as possible in each time step.

Note
On P processors, if at least P strands are ready to execute during a time step, then we say that the step is a complete step.
Otherwise, we say that it is an incomplete step.

37 / 94

Page 98: 24 Multithreaded Algorithms

Greedy Scheduler Theorem and Corollaries

Theorem 27.1
On an ideal parallel computer with P processors, a greedy scheduler executes a multi-threaded computation with work T1 and span T∞ in time TP ≤ T1/P + T∞.

Corollary 27.2
The running time TP of any multi-threaded computation scheduled by a greedy scheduler on an ideal parallel computer with P processors is within a factor of 2 of optimal.

Corollary 27.3
Let TP be the running time of a multi-threaded computation produced by a greedy scheduler on an ideal parallel computer with P processors, and let T1 and T∞ be the work and span of the computation, respectively. Then, if P ≪ T1/T∞, we have TP ≈ T1/P, or equivalently, a speedup of approximately P.

38 / 94
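The bound of Theorem 27.1 can be checked directly; a small Python sketch (the work/span numbers are hypothetical) also verifies Corollary 27.2's factor-of-2 claim, since any schedule needs at least max(T1/P, T∞) time:

```python
def greedy_bound(T1, Tinf, P):
    # Theorem 27.1: a greedy scheduler finishes in TP <= T1/P + T_infinity
    return T1 / P + Tinf

T1, Tinf, P = 1000, 50, 8
tp = greedy_bound(T1, Tinf, P)
print(tp)  # 175.0

# Corollary 27.2: the bound is within a factor of 2 of the trivial
# lower bound max(T1/P, T_infinity) that any schedule must obey.
assert tp <= 2 * max(T1 / P, Tinf)
```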

Page 102: 24 Multithreaded Algorithms

Race Conditions

Determinacy Race
A determinacy race occurs when two logically parallel instructions access the same memory location and at least one of the instructions performs a write.

Example
Race-Example()

1 x = 0
2 parallel for i = 1 to 3 do
3     x = x + 1
4 print x

40 / 94
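A Python sketch of the same pattern: three threads increment a shared x, and a lock serializes the updates so the determinacy race disappears. Without the lock, the read-modify-write `x = x + 1` can interleave, and any of 1, 2 or 3 could be printed:

```python
import threading

x = 0
lock = threading.Lock()

def increment():
    global x
    with lock:       # without this lock, x = x + 1 is a determinacy race
        x = x + 1

threads = [threading.Thread(target=increment) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()         # wait for all three logically parallel strands

print(x)  # 3 with the lock; 1, 2 or 3 are all possible without it
```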

Page 104: 24 Multithreaded Algorithms

Example

Determinacy Race Example
(Figure: the computation DAG of Race-Example, with its instructions numbered 1-11, omitted.)

step   x   r1   r2   r3
  1    0
  2    0   0
  3    0   1
  4    0   1   0
  5    0   1   0   0
  6    0   1   0   1
  7    0   1   1   1
  8    1   1   1   1
  9    1   1   1   1
 10    1   1   1   1

41 / 94

Page 105: 24 Multithreaded Algorithms

Example

NOTE
Although this topic is of great importance, it is beyond the scope of this class.

For more about this topic, see:
I Maurice Herlihy and Nir Shavit, “The Art of Multiprocessor Programming,” Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2008.
I Andrew S. Tanenbaum, “Modern Operating Systems” (3rd ed.), Prentice Hall Press, Upper Saddle River, NJ, USA, 2007.

42 / 94

Page 110: 24 Multithreaded Algorithms

Example of Complexity: PFibonacci

Complexity

T∞(n) = max{T∞(n − 1), T∞(n − 2)} + Θ(1)

Finally

T∞(n) = T∞(n − 1) + Θ(1) = Θ(n)

Parallelism

T1(n)/T∞(n) = Θ(φⁿ/n)

44 / 94
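These recurrences can be evaluated numerically. The sketch below (assuming unit cost per node of the computation DAG, as in the analysis) confirms that the span is linear in n while the work, and hence the parallelism T1/T∞, grows roughly like φⁿ/n:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def work(n):
    # T1(n): total nodes in the P-Fib computation DAG (unit cost each)
    return 1 if n < 2 else work(n - 1) + work(n - 2) + 1

@lru_cache(maxsize=None)
def span(n):
    # T_inf(n) = max(T_inf(n-1), T_inf(n-2)) + Theta(1) = Theta(n)
    return 1 if n < 2 else max(span(n - 1), span(n - 2)) + 1

for n in (10, 20, 30):
    print(n, span(n), work(n) / span(n))  # parallelism explodes with n
```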

Page 114: 24 Multithreaded Algorithms

Matrix Multiplication

Trick
To multiply two n × n matrices, we perform 8 matrix multiplications of (n/2) × (n/2) matrices and one addition of n × n matrices.

Idea

A = [ A11 A12 ]    B = [ B11 B12 ]    C = [ C11 C12 ]
    [ A21 A22 ]        [ B21 B22 ]        [ C21 C22 ]

C = [ C11 C12 ] = [ A11 A12 ] [ B11 B12 ]
    [ C21 C22 ]   [ A21 A22 ] [ B21 B22 ]

  = [ A11B11  A11B12 ] + [ A12B21  A12B22 ]
    [ A21B11  A21B12 ]   [ A22B21  A22B22 ]

46 / 94

Page 116: 24 Multithreaded Algorithms

Any Idea to Parallelize the Code?

What do you think?
Did you notice the multiplications of sub-matrices?

Then What?
We have, for example, A11B11 and A12B21!!!

We can do the following

A11B11 + A12B21

47 / 94

Page 119: 24 Multithreaded Algorithms

The use of the recursion!!!

As always our friend!!!

48 / 94

Page 120: 24 Multithreaded Algorithms

Pseudo-code of Matrix-Multiply
Matrix-Multiply(C, A, B, n) // The result of A × B in C, with n a power of 2 for simplicity

1  if (n == 1)
2      C[1, 1] = A[1, 1] · B[1, 1]
3  else
4      allocate a temporary matrix T[1...n, 1...n]
5      partition A, B, C, T into (n/2) × (n/2) sub-matrices
6      spawn Matrix-Multiply(C11, A11, B11, n/2)
7      spawn Matrix-Multiply(C12, A11, B12, n/2)
8      spawn Matrix-Multiply(C21, A21, B11, n/2)
9      spawn Matrix-Multiply(C22, A21, B12, n/2)
10     spawn Matrix-Multiply(T11, A12, B21, n/2)
11     spawn Matrix-Multiply(T12, A12, B22, n/2)
12     spawn Matrix-Multiply(T21, A22, B21, n/2)
13     Matrix-Multiply(T22, A22, B22, n/2)
14     sync
15     Matrix-Add(C, T, n)

49 / 94
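A serial Python sketch of the same divide-and-conquer scheme, not the book's code; each recursive sub-product marked spawn in the pseudocode is independent and is what a parallel runtime would fork. For brevity, each quadrant of C is computed as the sum of its two sub-products directly:

```python
def mat_mult(A, B):
    """Divide-and-conquer product of two n x n matrices (n a power of 2)."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    def quad(M, i, j):
        # Extract the (i, j) quadrant of M as an h x h matrix
        return [row[j * h:(j + 1) * h] for row in M[i * h:(i + 1) * h]]
    def add(X, Y):
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    A11, A12, A21, A22 = quad(A, 0, 0), quad(A, 0, 1), quad(A, 1, 0), quad(A, 1, 1)
    B11, B12, B21, B22 = quad(B, 0, 0), quad(B, 0, 1), quad(B, 1, 0), quad(B, 1, 1)
    # The eight sub-products below are independent: these are the spawns
    C11 = add(mat_mult(A11, B11), mat_mult(A12, B21))
    C12 = add(mat_mult(A11, B12), mat_mult(A12, B22))
    C21 = add(mat_mult(A21, B11), mat_mult(A22, B21))
    C22 = add(mat_mult(A21, B12), mat_mult(A22, B22))
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bot = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bot

print(mat_mult([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```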

Page 125: 24 Multithreaded Algorithms

Explanation

Lines 1 - 2
Stop the recursion once you have a single pair of numbers to multiply.

Line 4
Extra matrix T for storing the second term in

[ A11B11  A11B12 ] + [ A12B21  A12B22 ]
[ A21B11  A21B12 ]   [ A22B21  A22B22 ]
                     (this second matrix is T)

Line 5
Do the desired partition!!!

50 / 94

Page 128: 24 Multithreaded Algorithms

Explanation

Lines 6 to 13
Calculate the products in

[ A11B11  A11B12 ] + [ A12B21  A12B22 ]
[ A21B11  A21B12 ]   [ A22B21  A22B22 ]

using recursion and parallel computations.

Line 14
A barrier to wait until all the parallel computations are done!!!

Line 15
Call Matrix-Add to add C and T.

51 / 94

Page 131: 24 Multithreaded Algorithms

Matrix ADD

Matrix Add Code
Matrix-Add(C, T, n) // Add matrices C and T in-place to produce C = C + T

1 if (n == 1)
2     C[1, 1] = C[1, 1] + T[1, 1]
3 else
4     partition C and T into (n/2) × (n/2) sub-matrices
5     spawn Matrix-Add(C11, T11, n/2)
6     spawn Matrix-Add(C12, T12, n/2)
7     spawn Matrix-Add(C21, T21, n/2)
8     Matrix-Add(C22, T22, n/2)
9     sync

52 / 94

Page 135: 24 Multithreaded Algorithms

Explanation

Lines 1 - 2
Stop the recursion once you have a single pair of numbers to add.

Line 4
To partition

C = [ A11B11  A11B12 ]        T = [ A12B21  A12B22 ]
    [ A21B11  A21B12 ]            [ A22B21  A22B22 ]

In lines 5 to 8
We do the following sum in parallel!!!

[ A11B11  A11B12 ] + [ A12B21  A12B22 ]
[ A21B11  A21B12 ]   [ A22B21  A22B22 ]
        C                    T

53 / 94

Page 139: 24 Multithreaded Algorithms

Calculating Complexity of Matrix Multiplication

Work of Matrix Multiplication
The work T1(n) of matrix multiplication satisfies the recurrence:

T1(n) = 8 T1(n/2) + Θ(n²) = Θ(n³)

where 8 T1(n/2) is the sequential cost of the eight sub-products and Θ(n²) is the sequential cost of the matrix addition.

54 / 94

Page 140: 24 Multithreaded Algorithms

Calculating Complexity of Matrix Multiplication

Span of Matrix Multiplication

T∞(n) = T∞(n/2) + Θ(log n) = Θ(log² n)

This is because:
T∞(n/2): all the (n/2) × (n/2) sub-products run at the same time thanks to parallelism, so only one of them counts toward the span.
Θ(log n) is the span of the addition of the matrices (remember, we are using unlimited processors), which has a critical path of length log n.

55 / 94

Page 143: 24 Multithreaded Algorithms

Collapsing the sum

Parallel Sum
(Figure: a balanced binary tree of + nodes that collapses n values in log n parallel steps.)

56 / 94

Page 144: 24 Multithreaded Algorithms

How much Parallelism?

The Final Parallelism in this Algorithm is

T1(n)/T∞(n) = Θ(n³/log² n)

Quite a lot!!!

57 / 94

Page 146: 24 Multithreaded Algorithms

Merge-Sort : The Serial Version

We have
Merge-Sort(A, p, r) // Sort the elements in A[p...r]

1 if (p < r) then
2     q = ⌊(p + r)/2⌋
3     Merge-Sort(A, p, q)
4     Merge-Sort(A, q + 1, r)
5     Merge(A, p, q, r)

59 / 94

Page 147: 24 Multithreaded Algorithms

Merge-Sort : The Parallel Version

We have
Merge-Sort(A, p, r) // Sort the elements in A[p...r]

1 if (p < r) then
2     q = ⌊(p + r)/2⌋
3     spawn Merge-Sort(A, p, q)
4     Merge-Sort(A, q + 1, r) // Not necessary to spawn this
5     sync
6     Merge(A, p, q, r)

60 / 94
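A rough Python rendition of this parallel version, one spawned thread per recursive call. CPython threads will not actually run the halves simultaneously because of the GIL, but the spawn/sync structure mirrors the pseudocode, and the two halves write disjoint slices of A:

```python
import threading

def merge(A, p, q, r):
    """Serial Merge: combine sorted A[p..q] and A[q+1..r] (inclusive)."""
    left, right = A[p:q + 1], A[q + 1:r + 1]
    i = j = 0
    for k in range(p, r + 1):
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            A[k] = left[i]
            i += 1
        else:
            A[k] = right[j]
            j += 1

def merge_sort(A, p, r):
    if p < r:
        q = (p + r) // 2
        t = threading.Thread(target=merge_sort, args=(A, p, q))  # spawn
        t.start()
        merge_sort(A, q + 1, r)   # the caller sorts the second half itself
        t.join()                  # sync
        merge(A, p, q, r)

data = [5, 2, 4, 7, 1, 3, 2, 6]
merge_sort(data, 0, len(data) - 1)
print(data)  # [1, 2, 2, 3, 4, 5, 6, 7]
```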

Page 148: 24 Multithreaded Algorithms

Calculating the Complexity of this Simple Parallel Merge-Sort

Work of Merge-Sort
The work T1(n) of this Parallel Merge-Sort satisfies the recurrence:

T1(n) = Θ(1)               if n = 1
        2 T1(n/2) + Θ(n)   otherwise

so T1(n) = Θ(n log n), by Case 2 of the Master Theorem.

Span

T∞(n) = Θ(1)             if n = 1
        T∞(n/2) + Θ(n)   otherwise

We have then:
T∞(n/2): the two recursive sorts run at the same time thanks to parallelism, so only one of them counts toward the span.
Then, T∞(n) = Θ(n), by Case 3 of the Master Theorem.

61 / 94

Page 152: 24 Multithreaded Algorithms

How much Parallelism?

The Final Parallelism in this Algorithm is

T1(n)/T∞(n) = Θ(log n)

Not a lot!!!

62 / 94

Page 153: 24 Multithreaded Algorithms

Can we improve this?

We have a problemWe have a bottleneck!!! Where?

Yes in the Merge part!!!We need to improve that bottleneck!!!

63 / 94

Page 155: 24 Multithreaded Algorithms

Parallel Merge

Example: Here, we use an intermediate array T

64 / 94

Page 156: 24 Multithreaded Algorithms

Parallel Merge

Step 1. Find x = T[q1], where q1 = ⌊(p1 + r1)/2⌋ is the midpoint of T[p1..r1]

65 / 94

Page 157: 24 Multithreaded Algorithms

Parallel Merge

Step 2. Use Binary Search in T[p2..r2] to find q2

66 / 94

Page 158: 24 Multithreaded Algorithms

Then

So that if we insert x between T[q2 − 1] and T[q2], the array

T[ p2 · · · q2 − 1  x  q2 · · · r2 ]

is sorted

67 / 94

Page 159: 24 Multithreaded Algorithms

Binary Search

It takes a key x and a sub-array T[p..r], and it does:
1 If T[p..r] is empty (r < p), then it returns the index p.
2 If x ≤ T[p], then it returns p.
3 If x > T[p], then it returns the largest index q in the range p < q ≤ r + 1 such that T[q − 1] < x.

68 / 94

Page 162: 24 Multithreaded Algorithms

Binary Search Code

BINARY-SEARCH(x, T, p, r)
1 low = p
2 high = max(p, r + 1)
3 while low < high
4     mid = ⌊(low + high)/2⌋
5     if x ≤ T[mid]
6         high = mid
7     else low = mid + 1
8 return high

69 / 94
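A direct Python transcription of this pseudocode; array indices are 0-based here, but the returned index has the same meaning (the insertion point for x in T[p..r]):

```python
def binary_search(x, T, p, r):
    """Return the largest q in p..r+1 such that every element
    of T[p..q-1] is strictly less than x."""
    low, high = p, max(p, r + 1)
    while low < high:
        mid = (low + high) // 2
        if x <= T[mid]:
            high = mid
        else:
            low = mid + 1
    return high

T = [0, 1, 3, 3, 5, 8]
print(binary_search(4, T, 0, 5))  # 4: T[0..3] are all < 4
print(binary_search(9, T, 0, 5))  # 6: x is larger than everything
print(binary_search(3, T, 2, 4))  # 2: x <= T[2], so p is returned
```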

Page 167: 24 Multithreaded Algorithms

Parallel Merge

Step 3. Copy x in A [q3] where q3 = p3 + (q1 − p1) + (q2 − p2)

70 / 94

Page 168: 24 Multithreaded Algorithms

Parallel Merge

Step 4. Recursively merge T [p1..q1 − 1] and T [p2..q2 − 1] and placeresult into A [p3..q3 − 1]

71 / 94

Page 169: 24 Multithreaded Algorithms

Parallel Merge

Step 5. Recursively merge T [q1 + 1..r1] and T [q2..r2] and placeresult into A [q3 + 1..r3]

72 / 94

Page 170: 24 Multithreaded Algorithms

The Final Code for Parallel Merge
Par-Merge(T, p1, r1, p2, r2, A, p3)

1  n1 = r1 − p1 + 1, n2 = r2 − p2 + 1
2  if n1 < n2
3      exchange p1 ↔ p2, r1 ↔ r2, n1 ↔ n2
4  if (n1 == 0)
5      return
6  else
7      q1 = ⌊(p1 + r1)/2⌋
8      q2 = Binary-Search(T[q1], T, p2, r2)
9      q3 = p3 + (q1 − p1) + (q2 − p2)
10     A[q3] = T[q1]
11     spawn Par-Merge(T, p1, q1 − 1, p2, q2 − 1, A, p3)
12     Par-Merge(T, q1 + 1, r1, q2, r2, A, q3 + 1)
13     sync

73 / 94
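A serial Python rendition of Par-Merge, with 0-based indices; the standard-library `bisect_left` plays exactly the role of BINARY-SEARCH, and the first recursive call is the one a parallel runtime would spawn:

```python
from bisect import bisect_left

def p_merge(T, p1, r1, p2, r2, A, p3):
    """Merge sorted T[p1..r1] and T[p2..r2] (inclusive) into A starting at p3."""
    n1, n2 = r1 - p1 + 1, r2 - p2 + 1
    if n1 < n2:                                # ensure n1 >= n2
        p1, p2, r1, r2, n1, n2 = p2, p1, r2, r1, n2, n1
    if n1 == 0:
        return                                 # both sub-arrays are empty
    q1 = (p1 + r1) // 2
    q2 = bisect_left(T, T[q1], p2, r2 + 1)     # BINARY-SEARCH(T[q1], T, p2, r2)
    q3 = p3 + (q1 - p1) + (q2 - p2)
    A[q3] = T[q1]                              # place the pivot x = T[q1]
    p_merge(T, p1, q1 - 1, p2, q2 - 1, A, p3)  # spawn in the parallel version
    p_merge(T, q1 + 1, r1, q2, r2, A, q3 + 1)
                                               # sync would go here

T = [1, 3, 5, 7, 2, 4, 6, 8]
A = [0] * 8
p_merge(T, 0, 3, 4, 7, A, 0)
print(A)  # [1, 2, 3, 4, 5, 6, 7, 8]
```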

Page 176: 24 Multithreaded Algorithms

Explanation

Line 1: Obtain the lengths of the two subarrays to be merged.

Lines 2-3: If the second subarray is larger, we exchange the variables so that we always work with the larger one first!!! This makes n1 ≥ n2.

Lines 4-5: If n1 == 0, both subarrays are empty and there is nothing to merge, so we return!!!

74 / 94


Page 179: 24 Multithreaded Algorithms

Explanation

Line 10It copies T [q1] directly into A [q3]

Line 11 and 12They are used to recurse using nested parallelism to merge the sub-arraysless and greater than x.

Line 13The sync is used to ensure that the subproblems have completed beforethe procedure returns.

75 / 94
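As a quick sanity check (not part of the original slides), the pseudocode above can be mirrored in Python as a sequential simulation: each spawn becomes an ordinary recursive call, and the standard-library bisect module plays the role of BinarySearch. The name par_merge and the 0-based inclusive indices are this sketch's own conventions.

```python
import bisect

def par_merge(T, p1, r1, p2, r2, A, p3):
    """Merge sorted T[p1..r1] and T[p2..r2] (inclusive, 0-based)
    into A starting at index p3. Sequential simulation: each
    'spawn' of the pseudocode is an ordinary recursive call."""
    n1, n2 = r1 - p1 + 1, r2 - p2 + 1
    if n1 < n2:                      # lines 2-3: work with the larger subarray
        p1, p2 = p2, p1
        r1, r2 = r2, r1
        n1, n2 = n2, n1
    if n1 == 0:                      # lines 4-5: both subarrays are empty
        return
    q1 = (p1 + r1) // 2              # line 7: midpoint of the larger subarray
    x = T[q1]
    # line 8: BinarySearch -> first index in [p2, r2 + 1) with T[q2] >= x
    q2 = bisect.bisect_left(T, x, p2, r2 + 1)
    q3 = p3 + (q1 - p1) + (q2 - p2)  # line 9: final position of x
    A[q3] = x                        # line 10
    par_merge(T, p1, q1 - 1, p2, q2 - 1, A, p3)   # line 11: 'spawn' here
    par_merge(T, q1 + 1, r1, q2, r2, A, q3 + 1)   # line 12
    # line 13: 'sync' would join the spawned task here

# Merge the two sorted runs stored side by side in T:
T = [1, 3, 5, 2, 4, 6]
A = [0] * len(T)
par_merge(T, 0, 2, 3, 5, A, 0)
print(A)                             # -> [1, 2, 3, 4, 5, 6]
```

In a real concurrency platform (Cilk-style spawn/sync) line 11 would run in parallel with line 12; here the two calls simply run one after the other, which preserves correctness because they write disjoint regions of A.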


Page 182: 24 Multithreaded Algorithms

First, the Span Complexity of Parallel Merge: T∞(n)

Suppositions: n = n1 + n2.

What case should we study? Remember that

T∞(n) = max{T∞(subproblem 1), T∞(subproblem 2)} + Θ(log n),

so we must study the largest subproblem.

We notice then that, because of the exchange in lines 2-3, n2 ≤ n1.

76 / 94


Page 185: 24 Multithreaded Algorithms

Span Complexity of Parallel Merge: T∞(n)

Then, since n2 ≤ n1:

2n2 ≤ n1 + n2 = n ⟹ n2 ≤ n/2

Thus, in the worst case, the recursive call in line 11 merges:

⌊n1/2⌋ elements of T[p1..r1] (remember we split the larger array at its midpoint),
with all n2 elements of T[p2..r2].

77 / 94


Page 189: 24 Multithreaded Algorithms

Span Complexity of Parallel Merge: T∞(n)

Thus, the number of elements involved in such a call is

⌊n1/2⌋ + n2 ≤ n1/2 + n2/2 + n2/2
            ≤ n1/2 + n2/2 + (n/2)/2
            = (n1 + n2)/2 + n/4
            ≤ n/2 + n/4 = 3n/4

78 / 94
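The 3n/4 bound can be checked by brute force; this throwaway script (not part of the deck) verifies that, under the invariant n2 ≤ n1 enforced by lines 2-3, ⌊n1/2⌋ + n2 never exceeds 3n/4 for any small instance.

```python
# Exhaustive check of floor(n1/2) + n2 <= 3n/4 for all small sizes
# satisfying the algorithm's invariant n2 <= n1, where n = n1 + n2.
for n1 in range(1, 300):
    for n2 in range(0, n1 + 1):
        n = n1 + n2
        assert n1 // 2 + n2 <= 3 * n / 4, (n1, n2)
print("bound holds for all n1 < 300")
```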


Page 193: 24 Multithreaded Algorithms

Span Complexity of Parallel Merge: T∞(n)

Knowing that the binary search takes

Θ(log n)

we get the span recurrence for parallel merge:

T∞(n) = T∞(3n/4) + Θ(log n)

This can be solved using exercise 4.6-2 in Cormen's book:

T∞(n) = Θ(log² n)

79 / 94
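To see the Θ(log² n) solution without invoking the exercise, the recurrence can be unrolled numerically (a sketch with all constants set to 1, not part of the deck): there are about log_{4/3} n levels, each contributing at most log n, so the total is quadratic in log n.

```python
import math

def span(n):
    """Numerically unroll T(n) = T(3n/4) + log2(n), T(1) = 0."""
    total = 0.0
    while n > 1:
        total += math.log2(n)  # the Theta(log n) binary-search term
        n = 3 * n / 4          # the 3n/4 worst-case subproblem
    return total

# The ratio span(n) / log2(n)^2 should settle near a constant,
# consistent with the Theta(log^2 n) solution.
for k in (10, 15, 20, 25):
    n = 2 ** k
    print(k, round(span(n) / math.log2(n) ** 2, 3))
```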


Page 196: 24 Multithreaded Algorithms

Calculating Work Complexity of Parallel Merge

Ok!!! Now we need to calculate the WORK:

T1(n) = Θ(something)

Thus, we need to establish both the upper and the lower bound.

80 / 94


Page 198: 24 Multithreaded Algorithms

Calculating Work Complexity of Parallel Merge

Work of Parallel Merge
The work T1(n) of this parallel merge satisfies:

T1(n) = Ω(n)

because each of the n elements must be copied from array T to array A.

What about the upper bound O?
First notice that one recursive call can receive as few as n/4 elements, when the other call receives the worst case of ⌊n1/2⌋ + n2 ≤ 3n/4 elements.
On top of that, each call does the work of the binary search, O(log n).

81 / 94


Page 205: 24 Multithreaded Algorithms

Calculating Work Complexity of Parallel Merge

Then
For some α ∈ [1/4, 3/4], we have the following recurrence for the parallel merge when we have one processor:

T1(n) = T1(αn) + T1((1 − α)n) + Θ(log n)

where the two T1 terms are the merge part and Θ(log n) is the binary search.

Remark: α varies at each level of the recursion!!!

82 / 94


Page 208: 24 Multithreaded Algorithms

Calculating Work Complexity of Parallel Merge

Then
Assume that T1(n) ≤ c1·n − c2·log n for positive constants c1 and c2.

Writing c3·log n for the Θ(log n) term, we then have:

T1(n) ≤ T1(αn) + T1((1 − α)n) + c3 log n
      ≤ c1αn − c2 log(αn) + c1(1 − α)n − c2 log((1 − α)n) + c3 log n
      = c1n − c2 log(α(1 − α)) − 2c2 log n + c3 log n    (splitting the log terms)
      = c1n − c2 (log n + log(α(1 − α))) − (c2 − c3) log n
      ≤ c1n − (c2 − c3) log n,    because log n + log(α(1 − α)) > 0

83 / 94


Page 213: 24 Multithreaded Algorithms

Calculating Work Complexity of Parallel Merge

Now, given 0 < α(1 − α) < 1, we have log(α(1 − α)) < 0.

Thus, making n large enough

log n + log (α(1− α)) > 0 (1)

Then

T1 (n) ≤ c1n − (c2 − c3) log n

84 / 94


Page 216: 24 Multithreaded Algorithms

Calculating Work Complexity of Parallel Merge

Now, we choose c2 and c3 such that

c2 − c3 ≥ 0

We have that

T1 (n) ≤ c1n = O(n)

85 / 94
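The substitution argument can be stress-tested numerically; the sketch below (not part of the deck) unrolls the work recurrence with the worst split α = 1/4 fixed at every level, which is pessimistic since α actually varies, and checks that T1(n)/n stays bounded.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def work(n):
    """Unroll T1(n) = T1(n/4) + T1(3n/4) + log2(n) with integer
    splits n//4 and n - n//4 and a small base case."""
    if n < 4:                      # avoid the degenerate split n//4 == 0
        return float(n)
    return work(n // 4) + work(n - n // 4) + math.log2(n)

# T1(n)/n should stay bounded as n grows: the work is Theta(n)
# even though every call pays a Theta(log n) binary search.
for k in (10, 14, 18):
    n = 2 ** k
    print(k, round(work(n) / n, 3))
```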


Page 218: 24 Multithreaded Algorithms

Finally

Then

T1 (n) = Θ (n)

The parallelism of Parallel Merge

T1(n)/T∞(n) = Θ(n/log² n)

86 / 94


Page 220: 24 Multithreaded Algorithms

Then What is the Complexity of Parallel Merge-Sort with Parallel Merge?

First, the new code - Input: A[p..r] - Output: B[s..s + r − p]

Par-Merge-Sort(A, p, r, B, s)

 1   n = r − p + 1
 2   if n == 1
 3       B[s] = A[p]
 4   else let T[1..n] be a new array
 5       q = ⌊(p + r)/2⌋
 6       q′ = q − p + 1
 7       spawn Par-Merge-Sort(A, p, q, T, 1)
 8       Par-Merge-Sort(A, q + 1, r, T, q′ + 1)
 9       sync
10       Par-Merge(T, 1, q′, q′ + 1, n, B, s)

87 / 94
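This pseudocode can again be sketched in Python as a sequential simulation (this sketch's own conventions: 0-based inclusive indices, and a plain two-finger merge standing in for Par-Merge, so only the sort-level structure is modeled; the spawn/sync points are marked in comments).

```python
def par_merge_sort(A, p, r, B, s):
    """Sort A[p..r] (inclusive, 0-based) into B[s..s + r - p].
    Sequential simulation of Par-Merge-Sort."""
    n = r - p + 1
    if n == 1:
        B[s] = A[p]
        return
    T = [None] * n                # scratch array T[0..n-1]
    q = (p + r) // 2
    qp = q - p + 1                # q' = size of the left half
    par_merge_sort(A, p, q, T, 0)         # 'spawn' here in the parallel code
    par_merge_sort(A, q + 1, r, T, qp)
    # 'sync' here: both halves of T are now sorted
    # sequential stand-in for Par-Merge(T, 1, q', q' + 1, n, B, s):
    i, j, k = 0, qp, s
    while i < qp and j < n:
        if T[i] <= T[j]:
            B[k] = T[i]; i += 1
        else:
            B[k] = T[j]; j += 1
        k += 1
    B[k:s + n] = T[i:qp] if i < qp else T[j:n]   # copy the leftover run

A = [5, 2, 9, 1, 7, 3]
B = [0] * len(A)
par_merge_sort(A, 0, len(A) - 1, B, 0)
print(B)                          # -> [1, 2, 3, 5, 7, 9]
```

Replacing the stand-in merge with the earlier parallel merge is what drops the span from Θ(n) to Θ(log³ n), as the following slides compute.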


Page 226: 24 Multithreaded Algorithms

Then, What is the amount of Parallelism of Parallel Merge-Sort with Parallel Merge?

Work
We can use the worst-case work of the parallel merge to generate the recurrence:

T1^PMS(n) = 2 T1^PMS(n/2) + T1^PM(n)
          = 2 T1^PMS(n/2) + Θ(n)
          = Θ(n log n)    (Case 2 of the Master Theorem)

88 / 94


Page 230: 24 Multithreaded Algorithms

Then, What is the amount of Parallelism of Parallel Merge-Sort with Parallel Merge?

Span
We get the following recurrence for the span, taking into account that lines 7 and 8 of parallel merge-sort run in parallel:

T∞^PMS(n) = T∞^PMS(n/2) + T∞^PM(n)
          = T∞^PMS(n/2) + Θ(log² n)
          = Θ(log³ n)    (Exercise 4.6-2 in Cormen's book)

89 / 94


Page 234: 24 Multithreaded Algorithms

Then, What is the amount of Parallelism of Parallel Merge-Sort with Parallel Merge?

Parallelism

T1^PMS(n)/T∞^PMS(n) = Θ(n log n / log³ n) = Θ(n/log² n)

90 / 94

Page 235: 24 Multithreaded Algorithms

Plotting both Parallelisms

We get the incredible difference between both algorithms.

91 / 94
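The plots themselves did not survive the conversion, but the two curves they compared are easy to regenerate (a sketch with all constants taken as 1): merge sort with a sequential merge has span Θ(n) and hence parallelism Θ(log n), while the version with Par-Merge reaches Θ(n/log² n).

```python
import math

def parallelism_sequential_merge(n):
    # work Theta(n log n) / span Theta(n)  ->  Theta(log n)
    return (n * math.log2(n)) / n

def parallelism_parallel_merge(n):
    # work Theta(n log n) / span Theta(log^3 n)  ->  Theta(n / log^2 n)
    return (n * math.log2(n)) / math.log2(n) ** 3

# The gap widens dramatically as n grows:
for k in (10, 20, 30):
    n = 2 ** k
    print(k, round(parallelism_sequential_merge(n), 1),
          round(parallelism_parallel_merge(n), 1))
```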

Page 236: 24 Multithreaded Algorithms

Plotting the T∞

We get the incredible difference when running both algorithms with an infinite number of processors!!!

92 / 94


Page 238: 24 Multithreaded Algorithms

Exercises

27.1-1, 27.1-2, 27.1-4, 27.1-6, 27.1-7
27.2-1, 27.2-3, 27.2-4, 27.2-5
27.3-1, 27.3-2, 27.3-3, 27.3-4

94 / 94

