Chapter 6

Performing Bit-Reversal by Repeated Permutation of Intermediate Results

It has been shown in the previous chapter that if the input data are first permuted into bit-reversed order, then the radix-2 DIF_RN FFT can be used to obtain naturally ordered output. This process is depicted for an N = 8 example in Figure 6.1. When the static permutation step is not performed in place, the bit-reversed input data are available in array b after the reordering.

Figure 6.1 Bit-reversing the input before performing in-place DIF_RN FFT.

© 2000 by CRC Press LLC

Of course, the same result can be accomplished by bit-reversing the output from an NR FFT algorithm, as depicted in Figure 6.2 for the same example.

Figure 6.2 Bit-reversing the output after performing in-place DIF_NR FFT.
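The separate reordering phase described above can be sketched in a few lines of Python. This is only an illustration of an out-of-place bit-reversal permutation; the function names `bit_reverse` and `bit_reverse_copy` are hypothetical and do not come from the text, and the array length is assumed to be a power of two.

```python
def bit_reverse(index: int, num_bits: int) -> int:
    """Return index with its num_bits-bit binary representation reversed."""
    reversed_index = 0
    for _ in range(num_bits):
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index


def bit_reverse_copy(a: list) -> list:
    """Out-of-place static permutation: b[reverse(i)] = a[i]."""
    n = len(a)
    num_bits = n.bit_length() - 1  # assumes n is a power of two
    b = [None] * n
    for i, value in enumerate(a):
        b[bit_reverse(i, num_bits)] = value
    return b


order = bit_reverse_copy(list(range(8)))
print(order)  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Every element is read once and written once in this phase, which is the extra memory traffic that the rest of the chapter aims to avoid.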

6.1 Combining Permutation with Butterfly Computation

The cost of extra memory accesses in a separate bit-reversing phase can be completely eliminated if data permutation is combined with the butterfly computation at each step. Such an alternative is presented in this section.

6.1.1 The ordered radix-2 DIF_NN FFT

When input x and output X are both in natural order, the algorithm is referred to as an ordered FFT in the literature. The ordered radix-2 DIF FFT procedure was originally proposed by Stockham [30, 89]. The key to understanding what is required is to view each butterfly computation as consisting of one permutation step followed by one in-place computation step. These permutation steps reorder the initial input as well as the input to each subsequent subproblem, and the notation introduced in the previous chapters can be used to describe this process in a natural way.

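Again using the N = 8 example above, with the input in the natural order, i.e., $a[i_2 i_1 i_0] = x_{i_2 i_1 i_0}$, the first in-place butterfly is denoted by $\overset{\triangle}{i_2}\, i_1 i_0$, where the mark identifies the bit on which the butterfly operates. If this butterfly operation is preceded by permuting the data in $a[i_2 i_1 i_0]$ to $b[i_1 i_0 i_2]$, it is natural to use

$$\underset{a}{i_2 i_1 i_0} \;\longrightarrow\; \underset{b}{i_1 i_0 i_2}$$

to denote the permutation, which is followed by in-place butterfly computation denoted by

$$i_1 i_0 \overset{\triangle}{i_2}$$

To show the combined effect, the two sequences above are condensed into

$$\underset{a}{i_2 i_1 i_0} \;\longrightarrow\; \underset{b}{i_1 i_0 \overset{\triangle}{i_2}}$$

If the next step involves permuting the derivative $x^{(1)}_{i_2 i_1 i_0}$ in $b[i_1 i_0 i_2]$ to $a[i_0 i_1 i_2]$, then the derivatives $x^{(2)}_{i_2 i_1 i_0}$ and $x^{(3)}_{i_2 i_1 i_0}$ can both be computed in place in $a[i_0 i_1 i_2]$. Since $x^{(3)}_{i_2 i_1 i_0} = X_{i_0 i_1 i_2}$ is contained in $a[i_0 i_1 i_2]$, the output frequencies $X_m$ are naturally ordered in array a as desired.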
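The two address permutations just described can be checked numerically. The following Python sketch tracks only where each address bit goes and ignores the butterfly arithmetic; the helper names are hypothetical and chosen for this illustration.

```python
def rotate_left_3bit(i: int) -> int:
    """Map address i2 i1 i0 to i1 i0 i2 (first permutation, a -> b)."""
    return ((i << 1) & 0b111) | (i >> 2)


def swap_top_two_bits(j: int) -> int:
    """Map address i1 i0 i2 to i0 i1 i2 (second permutation, b -> a)."""
    top, mid, low = (j >> 2) & 1, (j >> 1) & 1, j & 1
    return (mid << 2) | (top << 1) | low


def reverse_3bit(i: int) -> int:
    """Map address i2 i1 i0 to i0 i1 i2."""
    return ((i & 1) << 2) | (i & 0b010) | (i >> 2)


for i in range(8):
    final = swap_top_two_bits(rotate_left_3bit(i))
    print(f"{i:03b} -> {rotate_left_3bit(i):03b} -> {final:03b}")
```

Composing the two permutations gives the bit-reversed address for every index, which is why no separate reordering phase is needed once the permutations are folded into the butterfly steps.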

However, the easiest way to understand an algorithm may not be the most efficient way to implement it. For example, two implementations of a single butterfly computation step involving naturally ordered input elements $a[2] = x_2$ and $a[6] = x_6$ are depicted in Figures 6.3 and 6.4.

In Figure 6.3, the ordered DIF FFT is implemented as one understands it; i.e., a permutation step actually precedes the butterfly computation. As reflected by the fragment of pseudo-code displayed in Figure 6.3, memory locations b[4] and b[5] are each modified twice.

In Figure 6.4, the ordered DIF FFT is implemented without first permuting a[2] to b[4], a[6] to b[5], and so on. Instead, the derivative $x^{(1)}_2$ is computed and stored directly into b[4], and likewise for the other elements. As reflected by the fragment of pseudo-code displayed in Figure 6.4, memory locations b[4] and b[5] are each modified only once. Since the same memory accessing pattern applies to all butterflies in every stage, this implementation eliminates all extra memory accesses in reordering intermediate results, and it is a more efficient way to implement the ordered DIF FFT algorithm. The complete pseudo-code program is given as Algorithm 6.1 below.


Figure 6.3 Naive implementation of the (ordered) DIF_NN FFT.


Figure 6.4 Implementing the (ordered) DIF_NN FFT with no extra memory access.
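The figures themselves are not reproduced here, so the contrast between the two approaches is sketched below in Python for the single butterfly on a[2] and a[6]. The sample data, the write counter, and the twiddle sign convention $\omega_N = e^{-2\pi i/N}$ are assumptions made for this illustration.

```python
import cmath

N = 8
# Twiddle factor for this butterfly (assumed sign convention).
w2 = cmath.exp(-2j * cmath.pi * 2 / N)

# Hypothetical sample input with a[k] = x_k.
a = [complex(k) for k in range(N)]


def naive_butterfly(a, w):
    """Permute a[2], a[6] into b[4], b[5], then compute in place in b."""
    b = [0j] * len(a)
    writes = 0
    b[4] = a[2]                     # permutation step
    writes += 1
    b[5] = a[6]
    writes += 1
    total = b[4] + b[5]             # in-place butterfly step
    b[5] = (b[4] - b[5]) * w
    writes += 1
    b[4] = total
    writes += 1
    return b[4], b[5], writes


def direct_butterfly(a, w):
    """Compute from a[2], a[6] and store straight into b[4], b[5]."""
    b = [0j] * len(a)
    writes = 0
    b[4] = a[2] + a[6]
    writes += 1
    b[5] = (a[2] - a[6]) * w
    writes += 1
    return b[4], b[5], writes


print(naive_butterfly(a, w2)[2], direct_butterfly(a, w2)[2])
```

Both functions produce the same two values; the naive version writes each of b[4] and b[5] twice, while the direct version writes each once.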


Algorithm 6.1 The (ordered) radix-2 DIF_NN FFT algorithm.

begin
    NumOfProblems := 1                               {Initially: one problem of size N}
    ProblemSize := N                                 {HalfSize = ProblemSize/2}
    Distance := 1
    NotSwitchInput := true
    while ProblemSize > 1 do                         {Halve each problem}
        if NotSwitchInput                            {Array a contains input; array b contains output}
            for JFirst := 0 to NumOfProblems − 1 do
                J := JFirst; Jtwiddle := 0
                K := JFirst
                while J < N − 1 do
                    W := w[Jtwiddle]
                    b[J] := a[K] + a[K + N/2]
                    b[J + Distance] := (a[K] − a[K + N/2]) ∗ W
                    Jtwiddle := Jtwiddle + NumOfProblems     {Assume w[ℓ] = ω_N^ℓ}
                    J := J + 2 ∗ NumOfProblems
                    K := K + NumOfProblems
                end while
            end for
            NotSwitchInput := false
        else                                         {Array b contains input; array a contains output}
            for JFirst := 0 to NumOfProblems − 1 do
                J := JFirst; Jtwiddle := 0
                K := JFirst
                while J < N − 1 do
                    W := w[Jtwiddle]
                    a[J] := b[K] + b[K + N/2]
                    a[J + Distance] := (b[K] − b[K + N/2]) ∗ W
                    Jtwiddle := Jtwiddle + NumOfProblems     {Assume w[ℓ] = ω_N^ℓ}
                    J := J + 2 ∗ NumOfProblems
                    K := K + NumOfProblems
                end while
            end for
            NotSwitchInput := true
        end if
        NumOfProblems := NumOfProblems ∗ 2
        ProblemSize := ProblemSize/2
        Distance := Distance ∗ 2
    end while
end
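As a cross-check, Algorithm 6.1 can be transcribed into Python. The sketch below is one possible transcription, not the authors' code: it assumes the twiddle convention $w[\ell] = e^{-2\pi i \ell/N}$, swaps the roles of the two arrays by exchanging references instead of duplicating the loop body, and includes a hypothetical `naive_dft` helper used only for comparison.

```python
import cmath


def ordered_dif_fft(x):
    """Ordered radix-2 DIF FFT using two arrays; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    # Assumed twiddle convention: w[l] = exp(-2*pi*i*l/n).
    w = [cmath.exp(-2j * cmath.pi * l / n) for l in range(n // 2)]
    src = [complex(v) for v in x]   # current input array
    dst = [0j] * n                  # current output array
    num_of_problems = 1
    distance = 1
    problem_size = n
    while problem_size > 1:         # halve each problem
        for j_first in range(num_of_problems):
            j = k = j_first
            j_twiddle = 0
            while j < n - 1:
                upper, lower = src[k], src[k + n // 2]
                dst[j] = upper + lower
                dst[j + distance] = (upper - lower) * w[j_twiddle]
                j_twiddle += num_of_problems
                j += 2 * num_of_problems
                k += num_of_problems
        src, dst = dst, src         # the output becomes the next input
        num_of_problems *= 2
        problem_size //= 2
        distance *= 2
    return src                      # naturally ordered output


def naive_dft(x):
    """Direct O(n^2) DFT, used here only to check the result."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * m / n) for t in range(n))
            for m in range(n)]


data = [1, 2, 3, 4, 5, 6, 7, 8]
print(max(abs(u - v) for u, v in zip(ordered_dif_fft(data), naive_dft(data))))
```

Because the arrays alternate on every stage, the result ends up in the second array when the number of stages is odd and in the first when it is even; returning `src` after the final exchange hides that detail from the caller.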


6.1.2 The shorthand notation

As usual, assuming that x is initially contained in a in the natural order, a second array b would alternately contain the data. The entire computation process, along with the use of the two arrays, is depicted below.

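$$\underset{a}{i_2 i_1 i_0} \;\longrightarrow\; \underset{b}{i_1 i_0 \overset{\triangle}{i_2}} \;\longrightarrow\; \underset{a}{i_0 \overset{\triangle}{i_1} \tau_2} \;\longrightarrow\; \underset{b}{\overset{\triangle}{i_0} \tau_1 \tau_2}$$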

Note that the corresponding twiddle factors are

$$\omega_N^{i_1 i_0}, \qquad \omega_N^{i_0 0}, \qquad \omega_N^{0} = 1,$$

because the DIF_NR, DIF_RN, and DIF_NN FFT algorithms all transform the same element $x_{i_2 i_1 i_0}$, although they refer to different addresses of $x_{i_2 i_1 i_0}$ in expressing the same algorithm.

Once again, all details of the (ordered) DIF_NN FFT can be captured by a shorthand notation together with the twiddle factors.
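The twiddle-factor exponents listed above can be compared with the values of Jtwiddle that the loop of Algorithm 6.1 generates. The Python sketch below does this under the assumption that the exponent in stage s is formed from the remaining low-order input bits followed by s zero bits; both function names are hypothetical.

```python
def twiddle_exponents_from_loop(n: int) -> list:
    """Collect the Jtwiddle values used in each stage of the ordered DIF FFT loop."""
    stages = []
    num_of_problems = 1
    while num_of_problems < n:
        used = []
        j, j_twiddle = 0, 0
        while j < n - 1:            # JFirst = 0 suffices: other JFirst values repeat these
            used.append(j_twiddle)
            j_twiddle += num_of_problems
            j += 2 * num_of_problems
        stages.append(used)
        num_of_problems *= 2
    return stages


def twiddle_exponents_from_bits(n: int) -> list:
    """Stage s uses the remaining low-order input bits followed by s zero bits."""
    num_bits = n.bit_length() - 1   # assumes n is a power of two
    return [[low << s for low in range(1 << (num_bits - 1 - s))]
            for s in range(num_bits)]


print(twiddle_exponents_from_loop(8))  # [[0, 1, 2, 3], [0, 2], [0]]
```

For N = 8 the three lists correspond to the exponents $i_1 i_0$, $i_0 0$, and $0$ read as binary numbers.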

6.2 Applying the Ordered DIF FFT to an N = 32 Example

Generalizing the shorthand notation for N = 32, the following sequence represents all five stages of permutation and computation depicted in Figure 6.5.

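$$\underset{a}{i_4 i_3 i_2 i_1 i_0} \;\longrightarrow\; \underset{b}{i_3 i_2 i_1 i_0 \overset{\triangle}{i_4}} \;\longrightarrow\; \underset{a}{i_2 i_1 i_0 \overset{\triangle}{i_3} \tau_4} \;\longrightarrow\; \underset{b}{i_1 i_0 \overset{\triangle}{i_2} \tau_3 \tau_4} \;\longrightarrow\; \underset{a}{i_0 \overset{\triangle}{i_1} \tau_2 \tau_3 \tau_4} \;\longrightarrow\; \underset{b}{\overset{\triangle}{i_0} \tau_1 \tau_2 \tau_3 \tau_4}$$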

The corresponding twiddle factors are

$$\omega_N^{i_3 i_2 i_1 i_0}, \qquad \omega_N^{i_2 i_1 i_0 0}, \qquad \omega_N^{i_1 i_0 00}, \qquad \omega_N^{i_0 000}, \qquad \omega_N^{0} = 1.$$
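The address patterns in the five-stage sequence follow a regular rule: in each stage the leading untransformed bit moves to sit just left of the already transformed bits, and the two arrays alternate. The Python sketch below generates the patterns as strings. The textual conventions are assumptions of this sketch only: `i<k>` for an untransformed bit, `t<k>` standing in for $\tau_k$, and square brackets around the bit on which the butterfly operates.

```python
def ordered_dif_patterns(num_bits: int) -> list:
    """Return (array, address pattern) pairs for the ordered DIF FFT shorthand."""
    # Naturally ordered input in array a.
    patterns = [("a", " ".join(f"i{k}" for k in reversed(range(num_bits))))]
    for stage in range(num_bits):
        active = num_bits - 1 - stage                       # bit transformed in this stage
        leading = [f"i{k}" for k in reversed(range(active))]
        done = [f"t{k}" for k in range(active + 1, num_bits)]
        array = "b" if stage % 2 == 0 else "a"              # arrays alternate every stage
        patterns.append((array, " ".join(leading + [f"[i{active}]"] + done)))
    return patterns


for array, pattern in ordered_dif_patterns(5):
    print(array, pattern)
```

Calling `ordered_dif_patterns(3)` gives the four patterns of the N = 8 example in the same way.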

By comparing Figure 6.6, where the butterflies associated with a particular pair of resulting subproblems are shown without the clutter of the others, with the two unordered DIF FFTs in Figures 4.4 and 5.3, one immediately observes that

all three variants of the DIF FFT treat exactly the same pairs of subproblems during each stage of the computation.

Thus they all implement the same radix-2 DIF FFT algorithm.


Figure 6.5 Butterflies of the (ordered) DIF_NN FFT algorithm.


Figure 6.6 Identifying the subproblems paired up by the (ordered) DIF_NN FFT.


6.3 In-Place Ordered (or Self-Sorting) Radix-2 FFT Algorithms

Another class of ordered FFTs performs in-place permutation and consequently does not need a second array; they are the so-called self-sorting in-place algorithms. This class contains variants of the prime-factor algorithms [20, 81, 99] and a radix-2 FFT [58]. This class has been further extended to include self-sorting in-place radix-3, radix-4, radix-5, and finally mixed-radix FFTs [101]. The radix-2 algorithm is relevant to the discussion in this chapter. Using the notation developed earlier, the process of applying the self-sorting in-place radix-2 DIF FFT to array a, which contains naturally ordered x, is depicted below for N = 32.

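$$\underset{a}{i_4 i_3 i_2 i_1 i_0} \;\longrightarrow\; \underset{a}{i_0 i_3 i_2 i_1 \overset{\triangle}{i_4}} \;\longrightarrow\; \underset{a}{i_0 i_1 i_2 \overset{\triangle}{i_3} \tau_4} \;\longrightarrow\; \underset{a}{i_0 i_1 \overset{\triangle}{i_2} \tau_3 \tau_4} \;\longrightarrow\; \underset{a}{i_0 \overset{\triangle}{i_1} \tau_2 \tau_3 \tau_4} \;\longrightarrow\; \underset{a}{\overset{\triangle}{i_0} \tau_1 \tau_2 \tau_3 \tau_4}$$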

Observe that the permutation always involves bits in symmetric positions: e.g., in step 1, the left-most bit $i_4$ switches with the right-most bit $i_0$, and in step 2, bit $i_3$, the second bit from the left end, switches with bit $i_1$, the second bit from the right end. Accordingly, the ordering of the bits is reversed after only two steps, and the permutation can be implemented using pairwise interchanges. The contents in $a[0\, i_3 i_2 i_1 1]$ and $a[1\, i_3 i_2 i_1 0]$ are switched in step 1, and the contents in $a[i_0 0\, i_2 1\, \tau_4]$ and $a[i_0 1\, i_2 0\, \tau_4]$ are switched in step 2. Since each pairwise interchange can be done using a single temporary location, the array b is not needed.
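The claim that two symmetric interchanges reverse a five-bit address can be verified with a short Python sketch. It shows only the data movement: in the actual self-sorting algorithm the interchanges are combined with the butterfly computations, which are omitted here. The function names are hypothetical.

```python
def swap_bits(index: int, p: int, q: int) -> int:
    """Return index with bit positions p and q exchanged."""
    if ((index >> p) & 1) != ((index >> q) & 1):
        index ^= (1 << p) | (1 << q)
    return index


def interchange_in_place(a: list, p: int, q: int) -> None:
    """Swap a[k] with a[swap_bits(k, p, q)] using one temporary per pair."""
    for k in range(len(a)):
        partner = swap_bits(k, p, q)
        if k < partner:                 # visit each pair exactly once
            temp = a[k]
            a[k] = a[partner]
            a[partner] = temp


def bit_reverse(index: int, num_bits: int) -> int:
    """Return index with its num_bits-bit binary representation reversed."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


a = list(range(32))             # a[k] = k stands in for the data
interchange_in_place(a, 4, 0)   # step 1: exchange bits i4 and i0
interchange_in_place(a, 3, 1)   # step 2: exchange bits i3 and i1
print(a == [bit_reverse(k, 5) for k in range(32)])  # True
```

The middle bit $i_2$ never moves, so the remaining three steps need no interchange, and a single temporary variable is the only storage required beyond array a.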


INSIDE the FFT BLACK BOX: Serial and Parallel Fast Fourier Transform Algorithms
Part II: Sequential FFT Algorithms
