Parallel Programming in C with MPI and OpenMP

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.


Michael J. Quinn


Chapter 15

The Fast Fourier Transform


Outline

Fourier analysis
Discrete Fourier transform
Fast Fourier transform
Parallel implementation


Discrete Fourier Transform

Many applications in science and engineering

Examples: voice recognition, image processing

Straightforward implementation: $\Theta(n^2)$

Fast Fourier transform: $\Theta(n \log n)$


Fourier Analysis

Fourier analysis: Represent continuous functions by potentially infinite series of sine and cosine functions

Discrete Fourier transform: Map a sequence over time to another sequence over frequency
Signal strength as a function of time
Fourier coefficients as a function of frequency


DFT Example (1/4)

16 data points representing signal strength over time


DFT Example (2/4)

DFT yields amplitudes and frequencies of sine/cosine functions


DFT Example (3/4)

Plot of four constituent sine/cosine functions and their sum


DFT Example (4/4)

Continuous function and original 16 samples


DFT of Speech Sample

“Angora cats are furrier...”

Signal

Frequency and amplitude

Figure courtesy Ron Cole and Yeshwant Muthusamy of the Oregon Graduate Institute


Computing DFT

Matrix-vector product $F_n x$

$x$ is input vector (signal samples)

$f_{i,j} = \omega_n^{ij}$ for $0 \le i, j < n$, and $\omega_n$ is a primitive $n$th root of unity


Example 1

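Compute DFT of vector (2, 3)

$\omega_2$, the primitive square root of unity, is $-1$

$$
\begin{pmatrix} \omega_2^{0 \cdot 0} & \omega_2^{0 \cdot 1} \\ \omega_2^{1 \cdot 0} & \omega_2^{1 \cdot 1} \end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \end{pmatrix}
=
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\begin{pmatrix} 2 \\ 3 \end{pmatrix}
=
\begin{pmatrix} 5 \\ -1 \end{pmatrix}
$$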


Example 2

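Compute DFT of vector (1, 2, 4, 3)

The primitive 4th root of unity is $i$

$$
\begin{pmatrix}
\omega_4^{0} & \omega_4^{0} & \omega_4^{0} & \omega_4^{0} \\
\omega_4^{0} & \omega_4^{1} & \omega_4^{2} & \omega_4^{3} \\
\omega_4^{0} & \omega_4^{2} & \omega_4^{4} & \omega_4^{6} \\
\omega_4^{0} & \omega_4^{3} & \omega_4^{6} & \omega_4^{9}
\end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & i & -1 & -i \\
1 & -1 & 1 & -1 \\
1 & -i & -1 & i
\end{pmatrix}
\begin{pmatrix} 1 \\ 2 \\ 4 \\ 3 \end{pmatrix}
=
\begin{pmatrix} 10 \\ -3 - i \\ 0 \\ -3 + i \end{pmatrix}
$$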


Fast Fourier Transform

A $\Theta(n \log n)$ algorithm to perform DFT

Based on divide-and-conquer strategy

Suppose we want to compute $f(x)$:

$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1}$$

We define two new functions, $f^{[0]}$ and $f^{[1]}$:

$$f^{[0]}(x) = a_0 + a_2 x + a_4 x^2 + \cdots + a_{n-2} x^{n/2-1}$$

$$f^{[1]}(x) = a_1 + a_3 x + a_5 x^2 + \cdots + a_{n-1} x^{n/2-1}$$


FFT (continued)

Note: $f(x) = f^{[0]}(x^2) + x f^{[1]}(x^2)$

Problem of evaluating $f(x)$ at $n$ values of $\omega$ reduces to:
Evaluating $f^{[0]}(x)$ and $f^{[1]}(x)$ at $n/2$ values of $\omega$
Performing $f^{[0]}(x^2) + x f^{[1]}(x^2)$

Leads to recursive algorithm with time complexity $\Theta(n \log n)$


Iterative Implementation Preferable

Well-written iterative version performs fewer index computations than recursive version

Iterative version evaluates key common sub-expression only once

Easier to derive parallel FFT algorithm when sequential algorithm in iterative form


Recursive → Iterative (1/3)

Recursive implementation of FFT


Recursive → Iterative (2/3)

Determining which computations are performed for each function invocation


Recursive → Iterative (3/3)

Tracking the flow of data values (input vector at bottom)


Parallel Program Design

Domain decomposition

Associate primitive task with each element of input vector $a$ and corresponding element of output vector $y$

Add channels to handle communications between tasks


FFT Task/Channel Graph

[Figure: task/channel graph with one primitive task per element, inputs a[0] through a[7] and outputs y[0] through y[7]]


Agglomeration and Mapping

Agglomerate primitive tasks associated with contiguous elements of vector

Map one agglomerated task to each process


After Agglomeration, Mapping

[Figure: input elements a[0] through a[15] and output elements y[0] through y[15] after agglomeration and mapping to processes]


Phases of Parallel FFT Algorithm

Phase 1: Processes permute $a$'s (all-to-all communication)

Phase 2: First $\log n - \log p$ iterations of FFT
No message passing is required

Phase 3: Final $\log p$ iterations
Processes organized as logical hypercube
In each iteration every process swaps values with partner across a hypercube dimension
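
The split between Phase 2 and Phase 3 can be sketched with two small C helpers. These are hypothetical names (`local_iterations`, `partner_rank`), not MPI calls or code from the slides. The sketch assumes $n$ and $p$ are powers of two, each process holds a contiguous block of $n/p$ elements, and iteration $s$ (counting from 0) pairs elements whose indices differ by $2^s$. Under those assumptions the partner in a Phase 3 iteration differs from the current rank in one bit, starting with the lowest hypercube dimension.

```c
#include <assert.h>

/* Integer log base 2 of a positive power of two. */
static int ilog2(int v)
{
    int r = 0;
    while (v > 1) {
        v >>= 1;
        r++;
    }
    return r;
}

/* Number of FFT iterations needing no message passing: log n - log p.
 * Assumption: n and p are powers of two with p <= n. */
int local_iterations(int n, int p)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    assert(p > 0 && (p & (p - 1)) == 0 && p <= n);
    return ilog2(n) - ilog2(p);
}

/* Partner of `rank` in 0-based FFT iteration `iter`.
 * Returns `rank` itself for Phase 2 iterations (no communication).
 * Assumption: block decomposition, iteration s uses stride 2^s. */
int partner_rank(int rank, int iter, int n, int p)
{
    int local = local_iterations(n, p);

    assert(rank >= 0 && rank < p);
    assert(iter >= 0 && iter < ilog2(n));
    if (iter < local)
        return rank;
    return rank ^ (1 << (iter - local));  /* flip one hypercube dimension */
}
```

For example, with $n = 16$ and $p = 4$ the first two iterations are local, and process 0 then swaps with process 1 and process 2 in the final two iterations. In an MPI program the swap itself would typically be a paired send and receive with that partner.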


Complexity Analysis

Each process performs equal share of computation: $\Theta(n \log n / p)$

All-to-all communication: $\Theta(n \log p / p)$

Sub-vector swaps during last $\log p$ iterations: $\Theta(n \log p / p)$


Isoefficiency Analysis

Sequential time complexity: $\Theta(n \log n)$

Parallel overhead: $\Theta(n \log p)$

Isoefficiency relation:

$$n \log n \ge C n \log p \;\Rightarrow\; \log n \ge C \log p \;\Rightarrow\; n \ge p^C$$

Scalability function:

$$M(p^C)/p = p^C/p = p^{C-1}$$

Scalability depends on $C$, a function of the ratio between computing speed and communication speed.


Summary

Discrete Fourier transform used in many scientific and engineering applications

Fast Fourier transform important because it implements DFT in time $\Theta(n \log n)$

Developed parallel implementation of FFT

Why isn’t scalability better?
$\Theta(n \log n)$ sequential algorithm
Parallel version requires all-to-all data exchange
