Parallel Programming in C with MPI and OpenMP

Date post: 28-Jan-2016
Upload: torin
Description:
Parallel Programming in C with MPI and OpenMP. Michael J. Quinn. Chapter 15: The Fast Fourier Transform. Outline: Fourier analysis, discrete Fourier transform, fast Fourier transform, parallel implementation.
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Parallel Programming in C with MPI and OpenMP. Michael J. Quinn
Transcript
Page 1: Parallel Programming in C with MPI and OpenMP

Parallel Programming in C with MPI and OpenMP

Michael J. Quinn

Page 2: Parallel Programming in C with MPI and OpenMP

Chapter 15

The Fast Fourier Transform

Page 3: Parallel Programming in C with MPI and OpenMP

Outline

Fourier analysis

Discrete Fourier transform

Fast Fourier transform

Parallel implementation

Page 4: Parallel Programming in C with MPI and OpenMP

Discrete Fourier Transform

Many applications in science and engineering

Examples: voice recognition, image processing

Straightforward implementation: $\Theta(n^2)$

Fast Fourier transform: $\Theta(n \log n)$

Page 5: Parallel Programming in C with MPI and OpenMP

Fourier Analysis

Fourier analysis: Represent continuous functions by potentially infinite series of sine and cosine functions

Discrete Fourier transform: Map a sequence over time to another sequence over frequency

Signal strength as a function of time

Fourier coefficients as a function of frequency

Page 6: Parallel Programming in C with MPI and OpenMP

DFT Example (1/4)

16 data points representing signal strength over time

Page 7: Parallel Programming in C with MPI and OpenMP

DFT Example (2/4)

DFT yields amplitudes and frequencies of sine/cosine functions

Page 8: Parallel Programming in C with MPI and OpenMP

DFT Example (3/4)

Plot of four constituent sine/cosine functions and their sum

Page 9: Parallel Programming in C with MPI and OpenMP

DFT Example (4/4)

Continuous function and original 16 samples

Page 10: Parallel Programming in C with MPI and OpenMP

DFT of Speech Sample

“Angora cats are furrier...”

Signal

Frequency and amplitude

Figure courtesy Ron Cole and Yeshwant Muthusamy of the Oregon Graduate Institute

Page 11: Parallel Programming in C with MPI and OpenMP

Computing DFT

Matrix-vector product $F_n x$

$x$ is the input vector (signal samples)

$f_{i,j} = \omega_n^{ij}$ for $0 \le i, j < n$, where $\omega_n$ is the primitive $n$th root of unity

Page 12: Parallel Programming in C with MPI and OpenMP

Example 1

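Compute DFT of vector (2, 3)

$\omega_2$, the primitive square root of unity, is $-1$

$$
\begin{pmatrix} \omega_2^{0 \cdot 0} & \omega_2^{0 \cdot 1} \\ \omega_2^{1 \cdot 0} & \omega_2^{1 \cdot 1} \end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \end{pmatrix}
=
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\begin{pmatrix} 2 \\ 3 \end{pmatrix}
=
\begin{pmatrix} 5 \\ -1 \end{pmatrix}
$$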

Page 13: Parallel Programming in C with MPI and OpenMP

Example 2

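Compute DFT of vector (1, 2, 4, 3)

The primitive 4th root of unity, $\omega_4$, is $i$

$$
\begin{pmatrix}
\omega_4^{0} & \omega_4^{0} & \omega_4^{0} & \omega_4^{0} \\
\omega_4^{0} & \omega_4^{1} & \omega_4^{2} & \omega_4^{3} \\
\omega_4^{0} & \omega_4^{2} & \omega_4^{4} & \omega_4^{6} \\
\omega_4^{0} & \omega_4^{3} & \omega_4^{6} & \omega_4^{9}
\end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & i & -1 & -i \\
1 & -1 & 1 & -1 \\
1 & -i & -1 & i
\end{pmatrix}
\begin{pmatrix} 1 \\ 2 \\ 4 \\ 3 \end{pmatrix}
=
\begin{pmatrix} 10 \\ -3 - i \\ 0 \\ -3 + i \end{pmatrix}
$$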
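Each output element is one matrix row times the input vector. Writing out the four rows of Example 2, using $i^2 = -1$, $i^3 = -i$, and $i^4 = 1$, gives:

```latex
\begin{aligned}
y_0 &= 1 + 2 + 4 + 3 = 10 \\
y_1 &= 1 + 2i + 4i^2 + 3i^3 = 1 + 2i - 4 - 3i = -3 - i \\
y_2 &= 1 + 2i^2 + 4i^4 + 3i^6 = 1 - 2 + 4 - 3 = 0 \\
y_3 &= 1 + 2i^3 + 4i^6 + 3i^9 = 1 - 2i - 4 + 3i = -3 + i
\end{aligned}
```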

Page 14: Parallel Programming in C with MPI and OpenMP

Fast Fourier Transform

A $\Theta(n \log n)$ algorithm to perform DFT

Based on divide-and-conquer strategy

Suppose we want to compute $f(x)$:

$$
f(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_{n-1} x^{n-1}
$$

We define two new functions, $f^{[0]}$ and $f^{[1]}$:

$$
\begin{aligned}
f^{[0]}(x) &= a_0 + a_2 x + a_4 x^2 + \dots + a_{n-2} x^{n/2-1} \\
f^{[1]}(x) &= a_1 + a_3 x + a_5 x^2 + \dots + a_{n-1} x^{n/2-1}
\end{aligned}
$$
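As a small illustration of the split, take the coefficient vector (1, 2, 4, 3) from Example 2 as the polynomial's coefficients, so $n = 4$. The even-indexed coefficients go to $f^{[0]}$ and the odd-indexed ones to $f^{[1]}$, and recombining the two halves recovers $f$:

```latex
\begin{aligned}
f(x) &= 1 + 2x + 4x^2 + 3x^3 \\
f^{[0]}(x) &= 1 + 4x \\
f^{[1]}(x) &= 2 + 3x \\
f^{[0]}(x^2) + x\, f^{[1]}(x^2) &= (1 + 4x^2) + x\,(2 + 3x^2) = f(x)
\end{aligned}
```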

Page 15: Parallel Programming in C with MPI and OpenMP

FFT (continued)

Note: $f(x) = f^{[0]}(x^2) + x f^{[1]}(x^2)$

Problem of evaluating $f(x)$ at $n$ values of $\omega$ reduces to:

Evaluating $f^{[0]}(x)$ and $f^{[1]}(x)$ at $n/2$ values of $\omega$

Performing $f^{[0]}(x^2) + x f^{[1]}(x^2)$

Leads to recursive algorithm with time complexity $\Theta(n \log n)$

Page 16: Parallel Programming in C with MPI and OpenMP

Iterative Implementation Preferable

Well-written iterative version performs fewer index computations than recursive version

Iterative version evaluates key common sub-expression only once

Easier to derive parallel FFT algorithm when sequential algorithm in iterative form

Page 17: Parallel Programming in C with MPI and OpenMP

Recursive → Iterative (1/3)

Recursive implementation of FFT

Page 18: Parallel Programming in C with MPI and OpenMP

Recursive → Iterative (2/3)

Determining which computations are performed for each function invocation

Page 19: Parallel Programming in C with MPI and OpenMP

Recursive → Iterative (3/3)

Tracking the flow of data values (input vector at bottom)

Page 20: Parallel Programming in C with MPI and OpenMP

Parallel Program Design

Domain decomposition

Associate primitive task with each element of input vector a and corresponding element of output vector y

Add channels to handle communications between tasks

Page 21: Parallel Programming in C with MPI and OpenMP

FFT Task/Channel Graph

[Figure: task/channel graph with one primitive task per element pair a[i], y[i], for i = 0 to 7]

Page 22: Parallel Programming in C with MPI and OpenMP

Agglomeration and Mapping

Agglomerate primitive tasks associated with contiguous elements of vector

Map one agglomerated task to each process

Page 23: Parallel Programming in C with MPI and OpenMP

After Agglomeration, Mapping

[Figure: input elements a[0] to a[15] and output elements y[0] to y[15] after agglomeration and mapping]

Page 24: Parallel Programming in C with MPI and OpenMP

Phases of Parallel FFT Algorithm

Phase 1: Processes permute a's (all-to-all communication)

Phase 2: First $\log n - \log p$ iterations of FFT. No message passing is required

Phase 3: Final $\log p$ iterations. Processes organized as logical hypercube. In each iteration every process swaps values with partner across a hypercube dimension
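The split between local and communicating iterations can be illustrated with a few lines of C. This is a sketch under stated assumptions, not the book's code: `local_iterations` and `phase3_partner` are hypothetical helper names, $n$ and $p$ are assumed to be powers of two with $p \le n$, each process is assumed to hold a contiguous block of $n/p$ elements, and butterfly spans are assumed to grow from 1 to $n/2$. Under those assumptions a butterfly stays inside one block while its span is smaller than $n/p$, and afterwards the partner's rank differs from the process's own rank in exactly one bit.

```c
#include <assert.h>

/* log2 of a power of two (helper for this sketch). */
static int ilog2(unsigned v)
{
    int k = 0;
    while (v > 1) {
        v >>= 1;
        k++;
    }
    return k;
}

static int is_pow2(unsigned v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Phase 2: number of FFT iterations needing no message passing,
 * log n - log p. */
int local_iterations(unsigned n, unsigned p)
{
    assert(is_pow2(n) && is_pow2(p) && p <= n);
    return ilog2(n) - ilog2(p);
}

/* Phase 3: partner of process `rank` in step `step` (0-based) of the
 * final log p iterations. Flipping bit `step` of the rank selects the
 * neighbor across one hypercube dimension. */
unsigned phase3_partner(unsigned rank, unsigned p, int step)
{
    assert(is_pow2(p) && rank < p && step >= 0 && step < ilog2(p));
    return rank ^ (1u << step);
}
```

For example, with $n = 16$ and $p = 4$ the first two iterations are local, and process 0 then exchanges with process 1 and afterwards with process 2. In an MPI program the exchange itself would typically be a paired send and receive with that partner.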

Page 25: Parallel Programming in C with MPI and OpenMP

Complexity Analysis

Each process performs equal share of computation: $\Theta(n \log n / p)$

All-to-all communication: $\Theta(n \log p / p)$

Sub-vector swaps during last $\log p$ iterations: $\Theta(n \log p / p)$
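These terms also give the total parallel overhead. As a short derivation, assuming the computation is divided perfectly so that the only overhead is communication, summing the two communication terms over all $p$ processes yields:

```latex
T_o(n, p) = p \left[ \Theta\!\left(\frac{n \log p}{p}\right) + \Theta\!\left(\frac{n \log p}{p}\right) \right] = \Theta(n \log p)
```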

Page 26: Parallel Programming in C with MPI and OpenMP

Isoefficiency Analysis

Sequential time complexity: $\Theta(n \log n)$

Parallel overhead: $\Theta(n \log p)$

Isoefficiency relation: $n \log n \ge C n \log p \Rightarrow \log n \ge C \log p \Rightarrow n \ge p^C$

Scalability function: $M(p^C)/p = p^C/p = p^{C-1}$

Scalability depends on $C$, a function of the ratio between computing speed and communication speed.
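To see what this means in practice, the memory needed per process can be worked out for a few values of $C$. The derivation assumes the memory function is linear, $M(n) = n$, which matches the slide's use of $M(p^C) = p^C$:

```latex
\frac{M(p^C)}{p} = p^{C-1}
\quad\Longrightarrow\quad
\begin{cases}
C = 1: & p^{0} = 1 \quad \text{(constant memory per process)} \\
C = 2: & p^{1} = p \quad \text{(memory per process grows linearly with } p\text{)} \\
C = 3: & p^{2} \quad \text{(memory per process grows quadratically with } p\text{)}
\end{cases}
```

So the algorithm scales well only when $C$ is close to 1, that is, when communication is fast relative to computation.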

Page 27: Parallel Programming in C with MPI and OpenMP

Summary

Discrete Fourier transform used in many scientific and engineering applications

Fast Fourier transform important because it implements DFT in time $\Theta(n \log n)$

Developed parallel implementation of FFT

Why isn’t scalability better?

$\Theta(n \log n)$ sequential algorithm

Parallel version requires all-to-all data exchange

