INTEL CONFIDENTIAL Predicting Parallel Performance Introduction to Parallel Programming – Part 10.

Date post: 30-Dec-2015
Upload: bryan-chase

INTEL CONFIDENTIAL

Predicting Parallel Performance
Introduction to Parallel Programming – Part 10

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Review & Objectives

Previously: Design and implementation of a task decomposition solution

At the end of this part you should be able to:
– Define speedup and efficiency
– Use Amdahl’s Law to predict maximum speedup

Speedup

Speedup is the ratio of sequential execution time to parallel execution time

For example, if the sequential program executes in 6 seconds and the parallel program executes in 2 seconds, the speedup is 3X
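
Written as a formula, the example above works out as follows. The names $T_{\text{seq}}$ and $T_{\text{par}}$ are assumed here for the sequential and parallel execution times; they do not appear on the slide.

$$\text{Speedup} = \frac{T_{\text{seq}}}{T_{\text{par}}} = \frac{6\ \text{s}}{2\ \text{s}} = 3$$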

[Figure: a typical speedup curve, plotting speedup (vertical axis) against number of cores (horizontal axis)]

Efficiency

Efficiency
– A measure of core utilization
– Speedup divided by the number of cores

Example
– Program achieves speedup of 3 on 4 cores
– Efficiency is 3 / 4 = 75%
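
The two definitions can be sketched in a few lines of Python. The function and parameter names are illustrative choices, not part of the original slides.

```python
def speedup(t_sequential: float, t_parallel: float) -> float:
    """Ratio of sequential execution time to parallel execution time."""
    if t_parallel <= 0:
        raise ValueError("parallel time must be positive")
    return t_sequential / t_parallel


def efficiency(speedup_value: float, cores: int) -> float:
    """Speedup divided by the number of cores (1.0 means full utilization)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return speedup_value / cores


if __name__ == "__main__":
    # The 6-second sequential run versus the 2-second parallel run.
    print(f"speedup = {speedup(6.0, 2.0):.1f}X")
    # A speedup of 3 on 4 cores.
    print(f"efficiency = {efficiency(3.0, 4):.0%}")
```

An efficiency below 100% means that some core time went to something other than useful work, such as waiting or overhead.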

[Figure: a typical efficiency curve, plotting efficiency (vertical axis) against number of cores (horizontal axis)]

Speedup Example

Painting a picket fence
– 30 minutes of preparation (serial)
– One minute to paint a single picket
– 30 minutes of cleanup (serial)

Thus, painting 300 pickets takes 360 minutes (serial time)

Speedup and Efficiency

Computing Speedup

Number of painters   Time (minutes)         Speedup
1                    30 + 300 + 30 = 360    1.0X
2                    30 + 150 + 30 = 210    1.7X
10                   30 + 30 + 30 = 90      4.0X
100                  30 + 3 + 30 = 63       5.7X
Infinite             30 + 0 + 30 = 60       6.0X
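
The rows of this table can be reproduced with a short Python sketch. It assumes the painting divides evenly among the painters; the constant and function names are illustrative.

```python
SERIAL_MINUTES = 30 + 30      # preparation + cleanup, cannot be shared
PICKETS = 300                 # one minute of painting per picket


def fence_time(painters: int) -> float:
    """Total minutes when the painting is split evenly among the painters."""
    return SERIAL_MINUTES + PICKETS / painters


def fence_speedup(painters: int) -> float:
    """Time with one painter divided by time with the given number of painters."""
    return fence_time(1) / fence_time(painters)


if __name__ == "__main__":
    for painters in (1, 2, 10, 100):
        print(f"{painters:>4} painters: {fence_time(painters):6.1f} min, "
              f"speedup {fence_speedup(painters):.1f}X")
    # With unlimited painters only the 60 serial minutes remain.
    print(f"limit: {fence_time(1) / SERIAL_MINUTES:.1f}X")
```

No matter how many painters are added, the serial 60 minutes cap the speedup at 360 / 60 = 6X.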

Efficiency Example

Number of painters   Time (minutes)         Speedup   Efficiency
1                    30 + 300 + 30 = 360    1.0X      100%
2                    30 + 150 + 30 = 210    1.7X      85%
10                   30 + 30 + 30 = 90      4.0X      40%
100                  30 + 3 + 30 = 63       5.7X      5.7%
Infinite             30 + 0 + 30 = 60       6.0X      very low
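
One practical use of these numbers is to ask how many painters can be hired before efficiency drops below some target. The sketch below is an illustration under the same even-split assumption; the 70% target and the function names are hypothetical. Note that it works with unrounded speedups, so two painters come out at about 85.7% rather than the 85% obtained from the rounded 1.7X.

```python
SERIAL_MINUTES = 60           # 30 min preparation + 30 min cleanup
PICKETS = 300                 # one minute of painting per picket


def fence_efficiency(painters: int) -> float:
    """Speedup divided by the number of painters."""
    serial_time = SERIAL_MINUTES + PICKETS
    parallel_time = SERIAL_MINUTES + PICKETS / painters
    return (serial_time / parallel_time) / painters


def max_painters(min_efficiency: float, limit: int = 1000) -> int:
    """Largest crew size (up to limit) whose efficiency meets the target."""
    best = 1
    for painters in range(1, limit + 1):
        if fence_efficiency(painters) >= min_efficiency:
            best = painters
    return best


if __name__ == "__main__":
    for painters in (1, 2, 10, 100):
        print(f"{painters:>4} painters: {fence_efficiency(painters):.1%}")
    print("largest crew at 70% efficiency or better:", max_painters(0.70))
```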

Idea Behind Amdahl’s Law

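[Figure: execution time (vertical axis) against number of cores (horizontal axis). Each bar has a sequential part s that stays the same size and a parallel part that shrinks from 1-s on one core to (1-s)/2, (1-s)/3, (1-s)/4 and (1-s)/5 on two to five cores.]

s: portion of computation that will be performed sequentially

1-s: portion of computation that will be executed in parallel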

Derivation of Amdahl’s Law

Speedup is the ratio of execution time on 1 core to execution time on p cores

Execution time on 1 core is s + (1-s)

Execution time on p cores is at least s + (1-s)/p

$$\text{Speedup} \le \frac{s + (1-s)}{s + (1-s)/p} = \frac{1}{s + (1-s)/p}$$
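
As a minimal sketch, the bound 1 / (s + (1-s)/p) can be evaluated directly. The function names are illustrative, and the sample value s = 60/360 is taken from the picket fence example, where 60 of the 360 minutes are serial.

```python
def amdahl_speedup(s: float, p: int) -> float:
    """Upper bound on speedup with sequential fraction s on p cores."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (s + (1.0 - s) / p)


def amdahl_limit(s: float) -> float:
    """Bound as p grows without limit: 1 / s (infinite when s is 0)."""
    return float("inf") if s == 0 else 1.0 / s


if __name__ == "__main__":
    s = 60 / 360                      # serial fraction of the fence job
    for p in (1, 2, 10, 100):
        print(f"p = {p:>3}: at most {amdahl_speedup(s, p):.1f}X")
    print(f"limit: {amdahl_limit(s):.1f}X")
```

The limit 1 / s is the useful takeaway: the sequential fraction alone sets a ceiling that no number of cores can exceed.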

Amdahl’s Law Is Too Optimistic

Amdahl’s Law ignores parallel processing overhead

Examples of this overhead include time spent creating and terminating threads

Parallel processing overhead is usually an increasing function of the number of cores (threads)
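
To see what growing overhead does to the prediction, here is a sketch that adds an overhead term to the Amdahl bound. The linear model (a fixed cost per additional core) and the sample values s = 0.1 and 0.005 are assumptions chosen for illustration; real overhead has to be measured. Times are normalized so that the single-core run takes 1.

```python
def speedup_with_overhead(s: float, p: int, cost_per_core: float) -> float:
    """Amdahl-style bound with an assumed overhead of cost_per_core * (p - 1)."""
    overhead = cost_per_core * (p - 1)      # hypothetical linear model
    return 1.0 / (s + (1.0 - s) / p + overhead)


def best_core_count(s: float, cost_per_core: float, max_cores: int = 64) -> int:
    """Core count in 1..max_cores with the highest modeled speedup."""
    return max(range(1, max_cores + 1),
               key=lambda p: speedup_with_overhead(s, p, cost_per_core))


if __name__ == "__main__":
    s, cost = 0.1, 0.005
    for p in (1, 4, 13, 32, 64):
        print(f"p = {p:>2}: {speedup_with_overhead(s, p, cost):.2f}X")
    print("best core count under this model:", best_core_count(s, cost))
```

Under this model the speedup peaks and then falls, whereas the plain Amdahl bound keeps rising toward 1 / s.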

Graph with Parallel Overhead Added

[Figure: execution time (vertical axis) against number of cores (horizontal axis), with a parallel overhead component that increases with the number of cores]

Other Optimistic Assumptions

Amdahl’s Law assumes that the computation divides evenly among the cores

In reality, the work often does not divide evenly among the cores

Core waiting time is another form of overhead
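
A small sketch makes the cost of imbalance concrete. It assumes all cores start the parallel region together and the region ends when the busiest core finishes; the function name and the sample workloads are hypothetical.

```python
from typing import Dict, List


def imbalance_report(work_per_core: List[float]) -> Dict[str, float]:
    """Parallel time is set by the busiest core; the other cores wait for it."""
    parallel_time = max(work_per_core)
    total_work = sum(work_per_core)
    waiting = [parallel_time - work for work in work_per_core]
    return {
        "parallel_time": parallel_time,
        "ideal_time": total_work / len(work_per_core),
        "total_waiting": sum(waiting),
        "speedup": total_work / parallel_time,
    }


if __name__ == "__main__":
    # Twelve units of work split unevenly over four cores.
    for name, value in imbalance_report([4.0, 3.0, 3.0, 2.0]).items():
        print(f"{name}: {value}")
```

With a perfect split each core would work for 3 units and the speedup would be 4; here the busiest core takes 4 units, so the speedup is only 3 and the other cores wait a combined 4 units.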

[Figure: per-core timelines running from "task started" to "task completed", divided into working time and waiting time]

Graph with Workload Imbalance Added

[Figure: execution time (vertical axis) against number of cores (horizontal axis), with an added component for time lost due to workload imbalance]

Illustration of the Amdahl Effect

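[Figure: speedup (vertical axis) against number of cores (horizontal axis), with curves for n = 100,000, n = 10,000 and n = 1,000 shown alongside the linear speedup line]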

Using Amdahl’s Law

Program executes in 5 seconds

Profile reveals 80% of time spent in function alpha, which we can execute in parallel

What would be maximum speedup on 2 cores?

$$\frac{1}{0.2 + (1 - 0.2)/2} = \frac{1}{0.6} \approx 1.67$$

New execution time ≥ 5 sec / 1.67 = 3 seconds
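
The same formula can be turned around to ask how many cores a target speedup would need. Solving 1 / (s + (1-s)/p) ≥ target for p gives p ≥ (1-s) / (1/target - s), which only has a solution when the target is below 1 / s. The sketch below is one way to compute this; the function name is illustrative, and exact fractions are used so that rounding does not push the answer to the wrong integer.

```python
from fractions import Fraction
from math import ceil
from typing import Optional


def min_cores(s: Fraction, target: Fraction) -> Optional[int]:
    """Smallest p with 1 / (s + (1 - s) / p) >= target, or None if unreachable."""
    s, target = Fraction(s), Fraction(target)
    if target <= 1:
        return 1
    headroom = 1 / target - s      # positive only when target < 1 / s
    if headroom <= 0:
        return None
    return ceil((1 - s) / headroom)


if __name__ == "__main__":
    s = Fraction(1, 5)             # 20% of the 5-second run is sequential
    for target in (Fraction(5, 3), Fraction(4), Fraction(5)):
        print(f"target {float(target):.2f}X -> cores needed: {min_cores(s, target)}")
```

For the program above, 2 cores reach the 1.67X bound, 16 cores are needed for a 4X bound, and 5X cannot be reached with any number of cores because 1 / 0.2 = 5 is only approached in the limit.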

Superlinear Speedup

According to our general speedup formula, the maximum speedup a program can achieve on p cores is p

Superlinear speedup is the situation where speedup is greater than the number of cores used

It means the computational rate of the cores is faster when the parallel program is executing

Superlinear speedup usually occurs because the parallel program has a higher cache hit rate: each core works on a smaller share of the data, so more of it fits in that core's cache

References

Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill (2004).

More General Speedup Formula

$\psi(n,p)$: Speedup for problem of size n on p cores

$\sigma(n)$: Time spent in sequential portion of code for problem of size n

$\varphi(n)$: Time spent in parallelizable portion of code for problem of size n

$\kappa(n,p)$: Parallel overhead

$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$$
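
For a fixed problem size the general bound is simple arithmetic on three times: sequential time, parallelizable time and parallel overhead. The sketch below names these sigma, phi and kappa, following the Greek letters used in the Quinn reference; the sample values (1 second sequential, 4 seconds parallelizable, 0.5 seconds overhead) are hypothetical.

```python
def general_speedup(sigma: float, phi: float, kappa: float, p: int) -> float:
    """Upper bound (sigma + phi) / (sigma + phi / p + kappa) for one problem size."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return (sigma + phi) / (sigma + phi / p + kappa)


if __name__ == "__main__":
    # 1 s sequential, 4 s parallelizable, 2 cores.
    print(f"no overhead:    {general_speedup(1.0, 4.0, 0.0, 2):.2f}X")
    print(f"0.5 s overhead: {general_speedup(1.0, 4.0, 0.5, 2):.2f}X")
```

With zero overhead the result matches the Amdahl bound of about 1.67X for a program that is 80% parallelizable; any overhead lowers it.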

Amdahl’s Law: Maximum Speedup

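$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$$

– The parallel overhead term $\kappa(n,p)$ is set to 0
– The term $\varphi(n)/p$ assumes parallel work divides perfectly among available cores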
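
The link to the earlier form 1 / (s + (1-s)/p) can be written out in one step. Here $\psi(n,p)$ is the speedup, $\sigma(n)$ the sequential time, $\varphi(n)$ the parallelizable time and $\kappa(n,p)$ the overhead; the Greek letters are assumed from the notation of the Quinn reference. With $\kappa(n,p) = 0$, dividing numerator and denominator by $\sigma(n) + \varphi(n)$ gives:

$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p} = \frac{1}{s + (1-s)/p}, \qquad s = \frac{\sigma(n)}{\sigma(n) + \varphi(n)}$$

So s is simply the sequential share of the single-core execution time.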

The Amdahl Effect

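$$\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$$

As $n \to \infty$, the parallelizable-work terms $\varphi(n)$ and $\varphi(n)/p$ dominate

Speedup is an increasing function of problem size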
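
The effect is easy to reproduce with made-up cost functions. In the sketch below the sequential time grows linearly with n, the parallelizable time grows quadratically, and the overhead grows as n times log2(p). All three are assumptions picked only so that the parallelizable work grows fastest; they are not taken from the slides.

```python
from math import log2


def sigma(n: int) -> float:
    """Sequential time (assumed: linear in n)."""
    return float(n)


def phi(n: int) -> float:
    """Parallelizable time (assumed: quadratic in n)."""
    return n * n / 100.0


def kappa(n: int, p: int) -> float:
    """Parallel overhead (assumed: n * log2(p), zero on one core)."""
    return n * log2(p)


def psi(n: int, p: int) -> float:
    """Speedup bound from the general formula."""
    return (sigma(n) + phi(n)) / (sigma(n) + phi(n) / p + kappa(n, p))


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        row = ", ".join(f"{psi(n, p):.2f}" for p in (1, 2, 4, 8, 16))
        print(f"n = {n:>7}: {row}")
```

On 16 cores this model gives roughly 2X for n = 1,000, 9X for n = 10,000 and 15X for n = 100,000: the larger the problem, the closer the curve gets to linear speedup.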

