Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Parallel Programming in C with MPI and OpenMP
Michael J. Quinn
Chapter 15
The Fast Fourier Transform
Outline
Fourier analysis
Discrete Fourier transform
Fast Fourier transform
Parallel implementation
Discrete Fourier Transform
Many applications in science and engineering
Examples
  Voice recognition
  Image processing
Straightforward implementation: $\Theta(n^2)$
Fast Fourier transform: $\Theta(n \log n)$
Fourier Analysis
Fourier analysis: Represent continuous functions by potentially infinite series of sine and cosine functions
Discrete Fourier transform: Map a sequence over time to another sequence over frequency
  Signal strength as a function of time
  Fourier coefficients as a function of frequency
DFT Example (1/4)
16 data points representing signal strength over time
DFT Example (2/4)
DFT yields amplitudes and frequencies of sine/cosine functions
DFT Example (3/4)
Plot of four constituent sine/cosine functions and their sum
DFT Example (4/4)
Continuous function and original 16 samples
DFT of Speech Sample
[Figure: spoken phrase “An gorra cats are furrier...”; panels show the signal and its frequency and amplitude]
Figure courtesy Ron Cole and Yeshwant Muthusamy of the Oregon Graduate Institute
Computing DFT
Matrix-vector product $F_n x$
  $x$ is input vector (signal samples)
  $f_{i,j} = \omega_n^{ij}$ for $0 \le i, j < n$, where $\omega_n$ is a primitive $n$th root of unity
Example 1
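Compute DFT of vector (2, 3)
$\omega_2$, the primitive square root of unity, is $-1$
$$\begin{pmatrix} \omega_2^0 & \omega_2^0 \\ \omega_2^0 & \omega_2^1 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 5 \\ -1 \end{pmatrix}$$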
Example 2
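Compute DFT of vector (1, 2, 4, 3)
The primitive 4th root of unity is $i$
$$\begin{pmatrix} \omega_4^0 & \omega_4^0 & \omega_4^0 & \omega_4^0 \\ \omega_4^0 & \omega_4^1 & \omega_4^2 & \omega_4^3 \\ \omega_4^0 & \omega_4^2 & \omega_4^4 & \omega_4^6 \\ \omega_4^0 & \omega_4^3 & \omega_4^6 & \omega_4^9 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 4 \\ 3 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 4 \\ 3 \end{pmatrix} = \begin{pmatrix} 10 \\ -3-i \\ 0 \\ -3+i \end{pmatrix}$$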
Fast Fourier Transform
A $\Theta(n \log n)$ algorithm to perform the DFT
Based on divide-and-conquer strategy
Suppose we want to compute $f(x)$, where
$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1}$$
We define two new functions, $f^{[0]}$ and $f^{[1]}$:
$$f^{[0]}(x) = a_0 + a_2 x + a_4 x^2 + \cdots + a_{n-2} x^{n/2-1}$$
$$f^{[1]}(x) = a_1 + a_3 x + a_5 x^2 + \cdots + a_{n-1} x^{n/2-1}$$
FFT (continued)
Note: $f(x) = f^{[0]}(x^2) + x\,f^{[1]}(x^2)$
Problem of evaluating $f(x)$ at the $n$ values $\omega_n^0, \omega_n^1, \ldots, \omega_n^{n-1}$ reduces to
  Evaluating $f^{[0]}(x)$ and $f^{[1]}(x)$ at the $n/2$ distinct values $\omega_n^0, \omega_n^2, \ldots, \omega_n^{n-2}$ (the squares of the $n$ values above)
  Performing $f^{[0]}(x^2) + x\,f^{[1]}(x^2)$
Leads to recursive algorithm with time complexity $\Theta(n \log n)$
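The reduction can be written out in full. Let $y^{[0]}_k$ and $y^{[1]}_k$ be the two half-size results. Each pair of outputs then needs only one multiplication by a power of $\omega_n$, a step usually called a butterfly. The derivation assumes $n$ is a power of 2, so that halving can continue down to size 1.

```latex
\begin{aligned}
&\text{For } 0 \le k < n/2:\quad y^{[0]}_k = f^{[0]}\bigl(\omega_n^{2k}\bigr), \qquad y^{[1]}_k = f^{[1]}\bigl(\omega_n^{2k}\bigr)\\
&f\bigl(\omega_n^{k}\bigr) = y^{[0]}_k + \omega_n^{k}\, y^{[1]}_k\\
&f\bigl(\omega_n^{k+n/2}\bigr) = y^{[0]}_k - \omega_n^{k}\, y^{[1]}_k
  \qquad \text{since } \omega_n^{n/2} = -1 \text{ and } \omega_n^{2(k+n/2)} = \omega_n^{2k}\\
&T(n) = 2\,T(n/2) + \Theta(n) \;\Rightarrow\; T(n) = \Theta(n \log n)
\end{aligned}
```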
Iterative Implementation Preferable
Well-written iterative version performs fewer index computations than recursive version
Iterative version evaluates key common sub-expression only once
Easier to derive parallel FFT algorithm when sequential algorithm is in iterative form
Recursive → Iterative (1/3)
Recursive implementation of FFT
Recursive → Iterative (2/3)
Determining which computations are performed for each function invocation
Recursive → Iterative (3/3)
Tracking the flow of data values (input vector at bottom)
Parallel Program Design
Domain decomposition
  Associate primitive task with each element of input vector $a$ and corresponding element of output vector $y$
Add channels to handle communications between tasks
FFT Task/Channel Graph
[Figure: task/channel graph with inputs a[0] through a[7] and outputs y[0] through y[7]]
Agglomeration and Mapping
Agglomerate primitive tasks associated with contiguous elements of vector
Map one agglomerated task to each process
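The block mapping takes only a few integer operations. This minimal C sketch assumes that $p$ divides $n$. In the FFT, $n$ and $p$ are both powers of 2, so each process receives exactly $n/p$ contiguous elements. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Elements per process; the FFT assumes n and p are powers of 2,
   but these helpers only require that p divides n. */
size_t block_size(size_t n, size_t p)
{
    assert(p > 0 && p <= n && n % p == 0);
    return n / p;
}

/* Global index of the first element owned by process id. */
size_t block_low(size_t id, size_t n, size_t p)
{
    return id * block_size(n, p);
}

/* Process that owns global element j. */
size_t block_owner(size_t j, size_t n, size_t p)
{
    return j / block_size(n, p);
}

/* Position of global element j within its owner's local array. */
size_t block_local_index(size_t j, size_t n, size_t p)
{
    return j % block_size(n, p);
}
```

With n = 16 and p = 4, element 5 belongs to process 1 at local position 1, and process 3's block begins at element 12.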
After Agglomeration, Mapping
[Figure: input a[0] through a[15] and output y[0] through y[15], grouped into contiguous blocks, one per process]
Phases of Parallel FFT Algorithm
Phase 1: Processes permute Phase 1: Processes permute aa’s (all-to-all ’s (all-to-all communication)communication)
Phase 2:Phase 2: First log First log nn –– log log pp iterations of FFT iterations of FFT No message passing is requiredNo message passing is required
Phase 3:Phase 3: Final log Final log pp iterations iterations Processes organized as logical hypercubeProcesses organized as logical hypercube In each iteration every process swaps values with In each iteration every process swaps values with
partner across a hypercube dimensionpartner across a hypercube dimension
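A few helpers can check the index arithmetic behind the three phases. They assume that $n$ and $p$ are powers of 2 and that data sits in contiguous blocks of $n/p$ elements. They also assume the iterations are numbered $s = 0, \ldots, \log n - 1$, where iteration $s$ pairs positions that differ only in bit $s$, as in an iterative radix-2 FFT. The sketch computes where Phase 1 sends each input, whether an iteration needs no messages, and which rank is the Phase 3 hypercube partner. All names are hypothetical, and no MPI calls appear. In a real program these values could drive calls such as `MPI_Alltoallv` and `MPI_Sendrecv`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* log2 of a power of 2. */
static unsigned log2_exact(size_t x)
{
    assert(x > 0 && (x & (x - 1)) == 0);
    unsigned k = 0;
    while (((size_t)1 << k) < x)
        k++;
    return k;
}

/* Reverse the low `bits` bits of i. */
static size_t bit_reverse(size_t i, unsigned bits)
{
    size_t r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1);
        i >>= 1;
    }
    return r;
}

/* Phase 1: a[i] moves to position bitrev(i); return the rank owning that position. */
size_t phase1_destination(size_t i, size_t n, size_t p)
{
    return bit_reverse(i, log2_exact(n)) / (n / p);
}

/* Iteration s pairs positions differing only in bit s. Both lie in the same
   block of n/p elements iff 2^s < n/p, i.e. for the first log n - log p iterations. */
bool iteration_is_local(unsigned s, size_t n, size_t p)
{
    return ((size_t)1 << s) < n / p;
}

/* Phase 3: for s >= log(n/p), the partner's rank differs in bit s - log(n/p). */
size_t hypercube_partner(size_t rank, unsigned s, size_t n, size_t p)
{
    assert(!iteration_is_local(s, n, p));
    unsigned d = s - log2_exact(n / p);
    return rank ^ ((size_t)1 << d);
}
```

With n = 16 and p = 4, input a[1] goes to process 2 (its bit-reversed index is 8). Iterations 0 and 1 are local. In iteration 3, process 1 exchanges values with process 3.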
Complexity Analysis
Each process performs equal share of computation: $\Theta(n \log n / p)$
All-to-all communication: $\Theta(n \log p / p)$
Sub-vector swaps during last $\log p$ iterations: $\Theta(n \log p / p)$
Isoefficiency Analysis
Sequential time complexity: $\Theta(n \log n)$
Parallel overhead: $\Theta(n \log p)$
Isoefficiency relation:
$$n \log n \ge C\, n \log p \;\Rightarrow\; \log n \ge C \log p \;\Rightarrow\; n \ge p^{C}$$
Scalability function, with memory requirement $M(n) = n$:
$$M(p^{C})/p = p^{C}/p = p^{C-1}$$
Scalability depends on $C$, a function of the ratio between computing speed and communication speed.
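An illustrative value shows how fast memory per process must grow. Take $C = 2$; this is a hypothetical choice, since the real $C$ depends on the machine's ratio of computation speed to communication speed.

```latex
\begin{aligned}
p = 16 &:\quad n \ge 16^{2} = 256, \qquad n/p \ge 256/16 = 16 = p^{C-1}\\
p = 64 &:\quad n \ge 64^{2} = 4096, \qquad n/p \ge 4096/64 = 64 = p^{C-1}
\end{aligned}
```

With $C = 2$, each process must hold four times as much data when the number of processes quadruples if efficiency is to stay constant.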
Summary
Discrete Fourier transform used in many scientific and engineering applications
Fast Fourier transform important because it implements DFT in time $\Theta(n \log n)$
Developed parallel implementation of FFT
Why isn't scalability better?
  $\Theta(n \log n)$ sequential algorithm
  Parallel version requires all-to-all data exchange