INTEL CONFIDENTIAL Predicting Parallel Performance Introduction to Parallel Programming – Part 10.


Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.


Review & Objectives

Previously: Design and implementation of a task decomposition solution

At the end of this part you should be able to:
– Define speedup and efficiency
– Use Amdahl’s Law to predict maximum speedup



Speedup

Speedup is the ratio between sequential execution time and parallel execution time

For example, if the sequential program executes in 6 seconds and the parallel program executes in 2 seconds, the speedup is 3X

[Figure: typical speedup curve. x-axis: Cores; y-axis: Speedup]


Efficiency

Efficiency
– A measure of core utilization
– Speedup divided by the number of cores

Example
– Program achieves speedup of 3 on 4 cores
– Efficiency is 3 / 4 = 75%
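The two definitions can be written down directly. The following is a minimal sketch; the function names are illustrative, and pairing the 6-second/2-second run with 4 cores is an assumption made only to reproduce the 75% figure.

```python
def speedup(sequential_time, parallel_time):
    """Ratio of sequential execution time to parallel execution time."""
    return sequential_time / parallel_time


def efficiency(sequential_time, parallel_time, cores):
    """Speedup divided by the number of cores used."""
    return speedup(sequential_time, parallel_time) / cores


# 6 seconds sequential, 2 seconds parallel.
print(speedup(6.0, 2.0))          # 3.0
# Assumed pairing: the same run on 4 cores.
print(efficiency(6.0, 2.0, 4))    # 0.75
```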

[Figure: typical efficiency curve. x-axis: Cores; y-axis: Efficiency]


Speedup Example

Painting a picket fence
– 30 minutes of preparation (serial)
– One minute to paint a single picket
– 30 minutes of cleanup (serial)

Thus, 300 pickets take 360 minutes (serial time)


Speedup and Efficiency


Computing Speedup

Number of painters   Time (minutes)           Speedup
1                    30 + 300 + 30 = 360      1.0X
2                    30 + 150 + 30 = 210      1.7X
10                   30 + 30 + 30 = 90        4.0X
100                  30 + 3 + 30 = 63         5.7X
Infinite             30 + 0 + 30 = 60         6.0X




Efficiency Example

Number of painters   Time (minutes)           Speedup   Efficiency
1                    30 + 300 + 30 = 360      1.0X      100%
2                    30 + 150 + 30 = 210      1.7X      86%
10                   30 + 30 + 30 = 90        4.0X      40%
100                  30 + 3 + 30 = 63         5.7X      5.7%
Infinite             30 + 0 + 30 = 60         6.0X      very low
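The fence-painting numbers can be recomputed with a short script. This is a sketch that assumes the pickets divide evenly among the painters; the constant and function names are illustrative.

```python
PREP_MIN = 30       # serial preparation
CLEANUP_MIN = 30    # serial cleanup
PICKETS = 300       # one minute of painting per picket


def painting_time(painters):
    """Total minutes when the pickets are split evenly among the painters."""
    return PREP_MIN + PICKETS / painters + CLEANUP_MIN


def fence_table(painter_counts):
    """Return (painters, time, speedup, efficiency) rows."""
    serial = painting_time(1)
    rows = []
    for p in painter_counts:
        t = painting_time(p)
        s = serial / t          # speedup relative to one painter
        rows.append((p, t, s, s / p))
    return rows


for p, t, s, e in fence_table([1, 2, 10, 100]):
    print(f"{p:>4} painters: {t:6.1f} min, speedup {s:.1f}X, efficiency {e:.1%}")
```

Only the 300 minutes of painting shrink as painters are added; the 60 serial minutes stay fixed, which is why speedup approaches 6 while efficiency keeps falling.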



Idea Behind Amdahl’s Law

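[Figure: execution time versus number of cores. The sequential portion s is the same height for every core count, while the parallel portion shrinks from 1-s on one core to (1-s)/2, (1-s)/3, (1-s)/4, and (1-s)/5 on two to five cores.]

s: portion of computation that will be performed sequentially
1-s: portion of computation that will be executed in parallel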



Derivation of Amdahl’s Law

Speedup is the ratio of execution time on 1 core to execution time on p cores

Execution time on 1 core is s + (1-s)
Execution time on p cores is at least s + (1-s)/p

\[ \text{Speedup} \le \frac{s + (1-s)}{s + (1-s)/p} = \frac{1}{s + (1-s)/p} \]
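The bound derived here is easy to evaluate numerically. The sketch below uses illustrative function names and checks the result against the fence-painting example, where 60 of the 360 serial minutes cannot be shared.

```python
def amdahl_speedup(s, p):
    """Upper bound on speedup with sequential fraction s on p cores."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (s + (1.0 - s) / p)


def amdahl_limit(s):
    """Bound as p grows without limit: (1 - s)/p vanishes, leaving 1/s."""
    return float("inf") if s == 0 else 1.0 / s


# Fence-painting example: sequential fraction is 60/360.
s_fence = 60 / 360
print(round(amdahl_speedup(s_fence, 10), 2))   # 4.0
print(round(amdahl_limit(s_fence), 2))         # 6.0
```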


Amdahl’s Law Is Too Optimistic

Amdahl’s Law ignores parallel processing overhead
– Examples of this overhead include time spent creating and terminating threads
– Parallel processing overhead is usually an increasing function of the number of cores (threads)
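To see how overhead changes the picture, one can add an overhead term to the denominator of the Amdahl bound. The sketch below assumes, purely for illustration, that overhead grows linearly with the number of cores; the function names and the sample values (10% sequential work, 0.5% overhead per core) are hypothetical.

```python
def speedup_with_overhead(s, p, overhead_per_core):
    """Speedup when each core adds a fixed overhead, expressed as a
    fraction of the single-core run time. The linear model is an assumption."""
    overhead = overhead_per_core * p if p > 1 else 0.0
    return 1.0 / (s + (1.0 - s) / p + overhead)


def best_core_count(s, overhead_per_core, max_cores):
    """Core count in 1..max_cores with the highest modelled speedup."""
    return max(range(1, max_cores + 1),
               key=lambda p: speedup_with_overhead(s, p, overhead_per_core))


# Hypothetical inputs: 10% sequential work, 0.5% overhead per core.
for p in (1, 2, 4, 8, 16, 32, 64):
    print(p, round(speedup_with_overhead(0.10, p, 0.005), 2))
print("best:", best_core_count(0.10, 0.005, 64))
```

Under this model, speedup rises, peaks, and then falls as cores are added, whereas the overhead-free bound only ever increases.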



Graph with Parallel Overhead Added

[Figure: execution time versus number of cores, with parallel overhead added. Parallel overhead increases with the number of cores.]


Other Optimistic Assumptions

Amdahl’s Law assumes that the computation divides evenly among the cores

In reality, the amount of work does not divide evenly among the cores

Core waiting time is another form of overhead
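Waiting time can be estimated from per-task durations. The sketch below assumes a simple round-robin assignment of tasks to cores, which is only one possible scheduling policy; the function name and task durations are hypothetical.

```python
def imbalance_report(task_times, cores):
    """Assign tasks round-robin and report finish time and idle time.

    Round-robin assignment is an assumption made for this sketch.
    """
    loads = [0.0] * cores
    for i, t in enumerate(task_times):
        loads[i % cores] += t
    finish = max(loads)                       # everyone waits for the slowest core
    waiting = [finish - load for load in loads]
    ideal = sum(task_times) / cores           # perfectly even division
    return finish, ideal, waiting


# Hypothetical task durations in seconds.
finish, ideal, waiting = imbalance_report([4, 1, 1, 1, 3, 2], cores=3)
print(finish, ideal, waiting)
```

The gap between `finish` and `ideal` is the time lost to imbalance, and `waiting` lists how long each core sits idle before the slowest one completes.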

[Figure: per-core timeline from task started to task completed, showing working time and waiting time on each core.]


Graph with Workload Imbalance Added

[Figure: execution time versus number of cores, with time lost due to workload imbalance added.]


Illustration of the Amdahl Effect

[Figure: speedup versus number of cores for n = 1,000, n = 10,000, and n = 100,000, compared with linear speedup.]


Using Amdahl’s Law

Program executes in 5 seconds
Profile reveals 80% of time spent in function alpha, which we can execute in parallel
What would be maximum speedup on 2 cores?

\[ \text{Speedup} \le \frac{1}{0.2 + (1 - 0.2)/2} = \frac{1}{0.6} \approx 1.67 \]

New execution time ≥ 5 sec / 1.67 = 3 seconds
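The same numbers also give the ceiling for any number of cores. As a worked extension of this example (not part of the original slide), letting p grow without limit leaves only the sequential 20%:

\[ \lim_{p \to \infty} \frac{1}{0.2 + (1 - 0.2)/p} = \frac{1}{0.2} = 5, \qquad \text{execution time} \ge \frac{5\ \text{sec}}{5} = 1\ \text{second} \]

So no matter how many cores run function alpha, this program cannot finish in less than 1 second.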


Superlinear Speedup

According to our general speedup formula, the maximum speedup a program can achieve on p cores is p

Superlinear speedup is the situation where speedup is greater than the number of cores used

It means each core computes at a faster rate while the parallel program is executing

Superlinear speedup usually occurs because the parallel program has a higher cache hit rate


References

Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill (2004).



More General Speedup Formula

ψ(n,p) Speedup for problem of size n on p cores

σ(n) Time spent in sequential portion of code for problem of size n

φ(n) Time spent in parallelizable portion of code for problem of size n

κ(n,p) Parallel overhead

\[ \psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)} \]


Amdahl’s Law: Maximum Speedup

\[ \psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)} \]

– κ(n,p): this term is set to 0
– φ(n)/p: assumes parallel work divides perfectly among available cores
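The link back to the simpler form of Amdahl’s Law can be made explicit. Writing σ(n) for the sequential time and φ(n) for the parallelizable time, set the overhead term to 0 and define the sequential fraction s = σ(n) / (σ(n) + φ(n)). Dividing numerator and denominator by σ(n) + φ(n) then gives:

\[ \psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p} = \frac{1}{\dfrac{\sigma(n)}{\sigma(n) + \varphi(n)} + \dfrac{\varphi(n)}{\sigma(n) + \varphi(n)} \cdot \dfrac{1}{p}} = \frac{1}{s + (1 - s)/p} \]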


The Amdahl Effect

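\[ \psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)} \]

As n → ∞, these terms dominate: φ(n) in the numerator and φ(n)/p in the denominator

Speedup is an increasing function of problem size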
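The effect can be reproduced with a toy model. The sketch below evaluates the general speedup bound (sequential time plus parallelizable time, divided by sequential time plus parallelizable time over p plus overhead) using cost functions that are assumptions chosen for illustration: sequential time and overhead grow linearly in n, while parallelizable time grows quadratically. All names and constants are hypothetical.

```python
import math


def general_speedup(sigma, phi, kappa, p):
    """Bound (sigma + phi) / (sigma + phi/p + kappa) for given times."""
    return (sigma + phi) / (sigma + phi / p + kappa)


def modelled_speedup(n, p):
    """Speedup bound under assumed, purely illustrative cost functions."""
    sigma = 100.0 + n                            # sequential time: linear in n
    phi = n * n / 100.0                          # parallelizable time: quadratic in n
    kappa = n * math.log2(p) if p > 1 else 0.0   # overhead: linear in n
    return general_speedup(sigma, phi, kappa, p)


for n in (1_000, 10_000, 100_000):
    print(n, [round(modelled_speedup(n, p), 1) for p in (2, 4, 8, 16)])
```

Because the parallelizable term grows faster than the other two, each larger n yields a curve closer to linear speedup for the same core counts.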
