
Multi-core Programming Speedup - Kentfarrell/mc08/lectures/02-Speedup.pdf•Running on dual-core S =...

Date post: 24-Feb-2021
Transcript

Multi-core Programming: Speedup

Based on slides from Intel Software College and "Multi-Core Programming – increasing performance through software multi-threading" by Shameem Akhter and Jason Roberts


Copyright © 2006, Intel Corporation. All rights reserved.

Threading Basics

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

Intel® Software College

Agenda

Concurrency vs. Parallelism

Characterization

Hardware & Parallel Computing

Threading Tools Overview

Thread Communication


Concurrency vs. Parallelism

• Concurrency: two or more threads are in progress at the same time

• Parallelism: two or more threads are executing at the same time
  • Multiple cores needed

[Figures: Thread 1 / Thread 2 timelines illustrating each case]


Speedup (Simple)

Measure of how much faster the computation executes versus the best serial code

• Serial time divided by parallel time

Example: Painting a picket fence
• 30 minutes of preparation (serial)
• One minute to paint a single picket
• 30 minutes of cleanup (serial)

Thus, 300 pickets take 30 + 300 + 30 = 360 minutes (serial time)


Computing Speedup

What if the fence owner uses a spray gun to paint 300 pickets in one hour?
• Better serial algorithm
• If no spray guns are available for multiple workers, what is the maximum parallel speedup?

Number of painters   Time (minutes)         Speedup
1                    30 + 300 + 30 = 360    1.0X
2                    30 + 150 + 30 = 210    1.7X
10                   30 + 30 + 30 = 90      4.0X
100                  30 + 3 + 30 = 63       5.7X
Infinite             30 + 0 + 30 = 60       6.0X

Illustrates Amdahl’s Law

Potential speedup is restricted by serial portion
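
The picket-fence numbers can be reproduced with a short Python sketch. The function name `fence_time` and its parameter names are illustrative choices, not part of the original slides; the sketch assumes the painting work divides evenly among painters.

```python
def fence_time(painters, pickets=300, prep=30, cleanup=30, minutes_per_picket=1):
    """Total minutes: serial prep + painting split across painters + serial cleanup."""
    return prep + pickets * minutes_per_picket / painters + cleanup

serial = fence_time(1)  # 30 + 300 + 30 = 360 minutes
for n in (1, 2, 10, 100):
    t = fence_time(n)
    print(f"{n:>4} painters: {t:6.1f} min, speedup {serial / t:.1f}X")

# With unlimited painters the painting time vanishes and only the
# serial prep and cleanup remain, which bounds the speedup.
limit = serial / (30 + 30)
print(f"upper bound: {limit:.1f}X")
```

The upper bound of 6.0X depends only on the 60 serial minutes, which is the point the slide makes about the serial portion.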


Amdahl’s Law

Describes the upper bound of parallel execution speedup

P = parallel fraction
n = number of processors

Tparallel = {(1-P) + P/n} Tserial

Speedup = Tserial / Tparallel

Serial code limits speedup

Example with P = 0.5:

n        (1-P) + P/n           Speedup
2        0.5 + 0.25 = 0.75     1.0/0.75 = 1.33
∞        0.5 + 0.0 = 0.5       1.0/0.5 = 2.0
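
As a minimal sketch, the formula above can be written as a Python function. The name `amdahl_speedup` is an illustrative choice; `math.inf` stands in for an unlimited number of processors.

```python
import math

def amdahl_speedup(p, n):
    """Speedup = Tserial / Tparallel, with Tparallel = ((1 - p) + p / n) * Tserial."""
    return 1.0 / ((1.0 - p) + p / n)

# Half of the work is parallel (P = 0.5), as in the slide's example.
print(round(amdahl_speedup(0.5, 2), 2))         # 1.33
print(round(amdahl_speedup(0.5, math.inf), 2))  # 2.0
```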


Amdahl’s Law

[Figure: Amdahl's Law (Ideal). Speedup versus number of processors, with curves for f = 0.0, f = 0.01, f = 0.05 and f = 0.1]


Amdahl’s Law (Continued)

[Figure: Amdahl's Law (Actual). Speedup versus number of CPUs, with curves for f = 0.0, f = 0.01 and the actual measured speedup]


Consequences of Amdahl’s Law

• Increasing the number of cores only affects the parallel portion
  • If code is only 10% parallel, the best one can do is run it in 90% of the time

• It is more important to reduce the proportion of the code that is sequential than to increase the number of cores
  • P = 0.3 (30% parallel)
    • Running on dual-core: S = (1-P) + P/n = .7 + .3/2 = .85
    • Running on quad-core: S = (1-P) + P/n = .7 + .3/4 = .775
  • Running on dual-core with double the amount of parallelized code, P = 0.6 (60% parallel)
    • Running on dual-core: S = (1-P) + P/n = .4 + .6/2 = .7

Here S is the run time as a fraction of the serial time (Tparallel / Tserial), so smaller is better; the corresponding speedup is 1/S.
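
The three cases can be checked with a small Python sketch. The helper name `relative_time` is illustrative; it returns the quantity the slide calls S, that is (1-P) + P/n, the parallel run time as a fraction of the serial run time.

```python
def relative_time(p, n):
    """Parallel run time as a fraction of serial run time: (1 - P) + P / n."""
    return (1 - p) + p / n

cases = [
    ("P = 0.3, dual-core", 0.3, 2),
    ("P = 0.3, quad-core", 0.3, 4),
    ("P = 0.6, dual-core", 0.6, 2),
]
for label, p, n in cases:
    s = relative_time(p, n)
    print(f"{label}: S = {s:.3f}, speedup = {1 / s:.2f}")
```

Doubling the parallel fraction on two cores (S = 0.7) beats doubling the core count at the original fraction (S = 0.775), which is the comparison the slide draws.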


Modification of Amdahl’s Law for thread overhead

• Need to consider the overhead of adding threads
  Speedup = 1/[ (1-P) + P/n + H(n)]
  where H(n) is thread overhead
  • Note that overhead is not linear on a good parallel machine

• Sources of overhead
  • Actual OS overhead
  • Inter-thread activities such as synchronization and communication

• If overhead is large enough, speedup can be less than 1
  • Threading can actually slow performance

• Important to design multi-threaded applications well
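
The modified formula can be explored with a short Python sketch. The slides do not give a form for H(n), so the logarithmic model `log_overhead` and its `cost` parameter are purely hypothetical, chosen only to show how overhead can push speedup below 1.

```python
import math

def speedup_with_overhead(p, n, overhead):
    """Speedup = 1 / ((1 - P) + P/n + H(n)); `overhead` is a callable giving H(n)."""
    return 1.0 / ((1.0 - p) + p / n + overhead(n))

def log_overhead(n, cost=0.05):
    # Hypothetical H(n): zero for one thread, growing with log2(n).
    # This model is an assumption for illustration, not taken from the slides.
    return cost * math.log2(n)

for n in (1, 2, 4, 8):
    print(n, round(speedup_with_overhead(0.3, n, log_overhead), 3))

# With a larger (still hypothetical) overhead cost, two threads run slower than one.
heavy = speedup_with_overhead(0.3, 2, lambda n: log_overhead(n, cost=0.2))
print("heavy overhead, n = 2:", round(heavy, 3))
```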


Efficiency

Measure of how effectively computation resources (threads) are kept busy

• Speedup divided by number of threads
• Expressed as average percentage of non-idle time

Number of painters   Time (minutes)         Speedup   Efficiency
1                    360                    1.0X      100%
2                    30 + 150 + 30 = 210    1.7X      85%
10                   30 + 30 + 30 = 90      4.0X      40%
100                  30 + 3 + 30 = 63       5.7X      5.7%
Infinite             30 + 0 + 30 = 60       6.0X      very low
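
A small Python sketch of the efficiency calculation for the painter example follows. The function name `efficiency` is illustrative. Note that the slide's 85% for two painters comes from the rounded speedup 1.7 / 2, while the unrounded value is closer to 85.7%.

```python
def efficiency(speedup, workers):
    """Efficiency = speedup divided by the number of workers (threads)."""
    return speedup / workers

serial_minutes = 360
timings = [(1, 360), (2, 210), (10, 90), (100, 63)]  # (painters, minutes)
for painters, minutes in timings:
    s = serial_minutes / minutes
    print(f"{painters:>3} painters: speedup {s:.1f}X, "
          f"efficiency {efficiency(s, painters):.1%}")
```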


Amdahl’s Law and Hyper-Threading

• With HT get 30% performance gain

• A thread on HT enabled processor runs slowe


Repeal of Amdahl’s Law

• Amdahl’s work seemed to limit the applicability of parallel computing

• Amdahl’s Law is based on assumptions
  • Best serial algorithm is limited by CPU cycles available
    • In multi-core there may be multiple caches, so more of the data may be in cache, reducing memory latency
  • Serial algorithm is best
    • Sometimes a parallel solution requires fewer computational steps
  • Problem size is fixed
    • Usually size increases with the resources available


Gustafson’s Law

Consider scaling the problem size as the processor count increases

Ts = serial execution time

Tp(N,W) = parallel execution time for same problem, size W, on N CPUs

S(N,W) = Speedup on problem size W, N CPUs

S(N,W) = (Ts + Tp(1,W) )/( Ts + Tp(N,W) )

Consider case where Tp(N,W) ~ W*W/N

S(N,W) -> (N*Ts + N*W*W)/(N*Ts + W*W)

If W-> ∞ as N -> ∞ then S(N,W) -> N

Gustafson’s Law provides hope for parallel applications to deliver on their promise.
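
The scaled-speedup formula can be tried numerically with the sketch below. It assumes, as in the slide's example, that Tp(N,W) = W*W/N; the function name `scaled_speedup` and the sample values for Ts, N and W are illustrative only.

```python
def scaled_speedup(n, w, ts):
    """S(N,W) = (Ts + Tp(1,W)) / (Ts + Tp(N,W)), assuming Tp(N,W) = W*W/N."""
    def tp(cpus):
        return w * w / cpus
    return (ts + tp(1)) / (ts + tp(n))

# Fixed serial time and CPU count; only the problem size W grows.
for w in (10, 100, 1000):
    print(f"W = {w:>4}: S(8, W) = {scaled_speedup(8, w, ts=100.0):.2f}")
```

With N = 8 held fixed, the speedup climbs toward 8 as W grows, because the serial time Ts becomes a shrinking share of the total work.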

