FFT in Hardware and Software

Date post: 03-Jan-2016
Upload: edward-saunders
Page 1: FFT in Hardware and Software

FFT in Hardware and Software

Page 2: FFT in Hardware and Software

Background

• Core Algorithm
  – Original Algorithm, the DFT, O(n^2) complexity
  – New Algorithm, the FFT (Fast Fourier Transform), O(n log2(n)) depending on implementation.

Page 3: FFT in Hardware and Software

DFT Computation

• A summation over the whole input array for every single element in the output array.

• A VERY computationally inefficient algorithm to implement.

$$X(\Omega) = \mathrm{DFT}(x[n]) = \sum_{n} x[n]\, e^{-j\Omega n} \quad [1]$$
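A minimal sketch of this summation, assuming the N-point case in which the frequency is sampled at Ω = 2πk/N. The function name `dft` is illustrative and does not come from the slides. The nested loops make the O(N^2) cost visible: every output bin visits every input sample.

```python
import cmath

def dft(x):
    """Direct DFT: one full pass over the input for every output bin."""
    n_points = len(x)
    result = []
    for k in range(n_points):
        omega = 2 * cmath.pi * k / n_points  # assumed sampling of the frequency axis
        total = 0j
        for n, sample in enumerate(x):
            total += sample * cmath.exp(-1j * omega * n)
        result.append(total)
    return result

# A unit impulse has a flat spectrum: every bin equals 1.
spectrum = dft([1, 0, 0, 0])
```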

Page 4: FFT in Hardware and Software

FFT Computation

• A much more computationally efficient algorithm

• Works using the divide and conquer principle.

• Published by Cooley and Tukey in 1965 [2]
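The divide and conquer idea can be sketched as a recursive radix-2 transform: split the input into even- and odd-indexed halves, transform each half, then combine. This is a teaching sketch that assumes a power-of-two length; the name `fft` is illustrative, and optimized libraries do not work this way internally.

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time FFT for power-of-two lengths."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # divide: even-indexed samples
    odd = fft(x[1::2])    # divide: odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]            # conquer: combine the halves
        out[k + n // 2] = even[k] - w * odd[k]
    return out

# One period of a sine wave puts its energy in bins 1 and 3.
spectrum = fft([0, 1, 0, -1])
```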

Page 5: FFT in Hardware and Software

DFT vs. FFT (Number of Operations)

(Smaller is better in every column.)

Problem Size (N)   Standard DFT   FFT     % of DFT
2                  4              1       25
4                  16             4       25
8                  64             12      19
16                 256            32      13
32                 1024           80      8
64                 4096           192     5
128                16384          448     3
256                65536          1024    2
512                262144         2304    1
1024               1048576        5120    <1
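The numbers in this table are consistent with counting N^2 operations for the DFT and (N/2)·log2(N) operations for the FFT, one per butterfly. That formula is inferred from the values and is not stated on the slide. A short sketch that regenerates the rows under that assumption:

```python
def dft_ops(n):
    """Operation count assumed for the direct DFT."""
    return n * n

def fft_ops(n):
    """Operation count assumed for a radix-2 FFT: N/2 butterflies per layer."""
    layers = n.bit_length() - 1   # log2(n) for a power of two
    return (n // 2) * layers

for n in (2, 4, 8, 16, 32, 64, 128, 256, 512, 1024):
    percent = 100 * fft_ops(n) / dft_ops(n)
    print(f"{n:5d} {dft_ops(n):8d} {fft_ops(n):5d} {percent:6.2f}%")
```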

Page 6: FFT in Hardware and Software

DFT vs. FFT

[Chart: Quadratic Growth of DFT (Smaller is Better). x-axis: Problem Size, 0 to 1200; y-axis: Computations Required (thousands), 0 to 1200.]

[Chart: Nearly Linear Growth of FFT (Smaller is Better). x-axis: Problem Size, 0 to 1000; y-axis: Computations Required (thousands), 0 to 6.]

[Chart: Percent of DFT Computation Time (Smaller is Better). x-axis: Problem Size, 0 to 1200; y-axis: Percent of DFT Computation Time, 0% to 30%.]

Page 7: FFT in Hardware and Software

FFT Butterfly Operations

• Butterfly arrangement of computations

• Repeated on successive pairs of input data

• Then half as many times on alternating pairs

• Then half again as many times on every fourth element

• …

Page 8: FFT in Hardware and Software

The Butterfly

• Simple operations repeated many times

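[Diagram: one butterfly. Inputs x_e[n] and x_o[n]; outputs X[n] and X[n+N/2]; the x_o[n] branch is weighted by W_N^n on the way to X[n] and by -W_N^n on the way to X[n+N/2].]

$$X[n] = x_e[n] + W_N^{n}\, x_o[n], \qquad X[n+N/2] = x_e[n] - W_N^{n}\, x_o[n]$$

$$W_N^{nk} = e^{-j 2\pi nk / N}$$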
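As a sketch of how small the repeated operation is, one butterfly can be written as a single complex multiply followed by one addition and one subtraction. The function names `twiddle` and `butterfly` are illustrative, and the sign convention assumes the forward transform.

```python
import cmath

def twiddle(n, size):
    """W_N^n = exp(-j*2*pi*n/N), with N passed as `size`."""
    return cmath.exp(-2j * cmath.pi * n / size)

def butterfly(x_even, x_odd, n, size):
    """Return (X[n], X[n + N/2]) from the even and odd branch values."""
    product = twiddle(n, size) * x_odd   # the only multiplication
    return x_even + product, x_even - product

# A 2-point transform of [1, 1] is a single butterfly with W = 1.
top, bottom = butterfly(1, 1, 0, 2)
```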

Page 9: FFT in Hardware and Software

8-point FFT Demonstration: The Entire Calculation

[Figure: signal-flow graph of the 8-point FFT. Input Array (bit-reversed order): x[0], x[4], x[2], x[6], x[1], x[5], x[3], x[7]. Output: X[0], X[1], X[2], X[3], X[4], X[5], X[6], X[7]. Legend: "Multiplication by W factor" and "Addition" (+).]
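The input ordering in this diagram (0, 4, 2, 6, 1, 5, 3, 7) is the bit-reversed index order, which lets the butterflies run layer by layer over a single array. A minimal sketch of that layered calculation, assuming a power-of-two length; the names `bit_reverse` and `fft_iterative` are illustrative.

```python
import cmath

def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    reversed_index = 0
    for _ in range(bits):
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index

def fft_iterative(x):
    """Radix-2 FFT: bit-reversed reordering, then log2(N) butterfly layers."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    data = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    size = 2                      # span of each butterfly group in this layer
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)
                top = data[start + k]
                bottom = w * data[start + k + half]
                data[start + k] = top + bottom
                data[start + k + half] = top - bottom
        size *= 2
    return data

# Input order for the 8-point case, as drawn on the slide.
order = [bit_reverse(i, 3) for i in range(8)]
```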

Page 22: FFT in Hardware and Software

Why Hardware?

• Even more speed for FFT

• Extremely parallelizable

• A whole layer can be done in two FPGA clock cycles
  – 1 multiply cycle
  – 1 add cycle
  – (Assuming sufficient multipliers)
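A rough estimate of what "two cycles per layer" implies, assuming a radix-2 design in which every butterfly of a layer runs at once. The function name and the returned fields are illustrative, and the model ignores I/O, routing, and pipelining.

```python
def parallel_fft_estimate(n_points, cycles_per_layer=2):
    """Cycle and multiplier estimate for a fully parallel radix-2 FFT."""
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two, at least 2")
    layers = n_points.bit_length() - 1      # log2(N) butterfly layers
    butterflies_per_layer = n_points // 2   # one complex multiply each
    return {
        "layers": layers,
        "cycles": layers * cycles_per_layer,
        "complex_multipliers": butterflies_per_layer,
    }

estimate = parallel_fft_estimate(8)
```

For the 8-point case this gives 3 layers, 6 clock cycles, and 4 complex multipliers working in parallel, which is what "sufficient multipliers" has to cover under these assumptions.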

Page 23: FFT in Hardware and Software

Hardware Problems

• Complexity

• Input speed

• Output speed

• If the FPGA computes in 24.4 ns but takes 20 µs to transfer the input data, what gain is there?
  – i.e. 24.4 ns (compute) + 20 µs (input) + 20 µs (output) = ~40 µs!
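The arithmetic behind this point can be sketched as below. Reading the transfer figure as 20 µs in each direction is an assumption about the unit on the slide; the function name is illustrative.

```python
def end_to_end_ns(compute_ns, transfer_in_ns, transfer_out_ns):
    """Total latency when data must be moved to and from the FPGA."""
    return compute_ns + transfer_in_ns + transfer_out_ns

compute_ns = 24.4
transfer_ns = 20_000.0   # assumed: 20 microseconds per direction

total_ns = end_to_end_ns(compute_ns, transfer_ns, transfer_ns)
compute_share = compute_ns / total_ns   # fraction of time spent computing
```

Under that assumption the computation is well under one percent of the total, so a faster butterfly changes almost nothing until the transfer cost is reduced.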

Page 24: FFT in Hardware and Software

Mitigation of Hardware Problems

– Use a faster bus
  • AMD Opteron's HyperTransport
    – 20.8 GB/s (166.4 Gb/s) per link (version 3)
    – Modules that fit into an AMD 64-bit Opteron socket
    – http://www.drccomputer.com/pages/modules.html - Xilinx-based module
    – http://www.xtremedatainc.com/xd1000_brief.html - Altera-based module

Page 25: FFT in Hardware and Software

Mitigation of Hardware Problems

– Put the FPGA on the die with the DSP
  • Need silicon vendor support
  • FPGA can access memory on a very wide bus (e.g. 128 bits per cycle)

– Implement the entire project in FPGA
  • Time consuming to program
  • Possibly insufficient room on the FPGA

Page 26: FFT in Hardware and Software

8-point FFT Demonstration: In Hardware

[Figure: signal-flow graph of the 8-point FFT. Input Array (bit-reversed order): x[0], x[4], x[2], x[6], x[1], x[5], x[3], x[7]. Output: X[0], X[1], X[2], X[3], X[4], X[5], X[6], X[7]. Legend: "Multiplication by W factor" and "Addition" (+).]

Page 30: FFT in Hardware and Software

Why Not Software?

• Each butterfly must be done sequentially

• Only slight parallelism enabled by a DSP like the TigerSHARC

• Each Butterfly can be done in 2 cycles (after optimization).
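A back-of-the-envelope count for the sequential case, assuming a radix-2 FFT with (N/2)·log2(N) butterflies executed one after another at the quoted 2 cycles each. The function name is illustrative, and the model ignores loads, stores, and loop overhead.

```python
def sequential_fft_cycles(n_points, cycles_per_butterfly=2):
    """Cycles for a radix-2 FFT when butterflies run one at a time."""
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two, at least 2")
    layers = n_points.bit_length() - 1
    butterflies = (n_points // 2) * layers
    return butterflies * cycles_per_butterfly

cycles_8 = sequential_fft_cycles(8)
cycles_256 = sequential_fft_cycles(256)
```

Under these assumptions the cycle count grows as N·log2(N), whereas a fully parallel layer-per-two-cycles design grows only with log2(N).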

Page 31: FFT in Hardware and Software

Results of Testing

• Linear Profiling of FFT Algorithm in C++

Stage            Cycle count                         Time
                 8-point   32-point   256-point      8-point     32-point     256-point
Initialization   21        25         25             35.07 ns    41.75 ns     41.75 ns
Computation      1135      6922       174222         1.895 µs    11.559 µs    290.950 µs

Butterfly: 91 cycles, 151.97 ns
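The cycle counts and times in this table can be cross-checked against each other. The sketch below derives the cycle time from the initialization row (21 cycles in 35.07 ns) and applies it to the other counts; the function names are illustrative, and the implied clock rate is a derived figure that the slides do not state.

```python
def ns_per_cycle(cycles, time_ns):
    """Cycle time implied by one (cycle count, time) pair."""
    return time_ns / cycles

def cycles_to_us(cycles, cycle_ns):
    """Convert a cycle count to microseconds at the given cycle time."""
    return cycles * cycle_ns / 1000.0

cycle_ns = ns_per_cycle(21, 35.07)       # about 1.67 ns per cycle
clock_mhz = 1000.0 / cycle_ns            # implied clock, just under 600 MHz

for cycles in (1135, 6922, 174222):
    print(f"{cycles:7d} cycles -> {cycles_to_us(cycles, cycle_ns):8.3f} us")
```

With that cycle time, 1135 cycles corresponds to the 1.895 µs entry, 6922 cycles to the 11.559 µs entry, and 174222 cycles to the 290.950 µs entry, to within rounding.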

Page 32: FFT in Hardware and Software

Results of Testing

• Profiling of VHDL on FPGA

• Butterfly takes 24.377ns to execute– 62% is computational, 38% is routing on FPGA

Page 33: FFT in Hardware and Software

Product Offerings

• Most DSP Vendors
• Many FPGA Vendors (IP – Intellectual Property)
• Microcontroller Vendors (e.g. Blackfin)
• FFTW – The Fastest Fourier Transform in the West
• AMD Core Math Library (ACML)
• Intel Library
• Highly Optimized for the expected hardware

Page 34: FFT in Hardware and Software

Published Results

• The Radix 4 version delivers a 1 K points complex processing time of 25 microseconds at 200-MHz system speeds and uses only about 10 percent of the resources in a mid-range Stratix device. The Radix 2 is half the size of the Radix 4 and offers a 1 K points complex processing time of 50 microseconds at 200-MHz system speeds. Additional versions of the new cores are under development. [6]

FFT IP Core Published Results [7] (Smaller is Better)

FFT/IFFT length   Texas Instruments C6713   Single 4DSP FFT core   Quad 4DSP FFT core
256               12.3 µs                   3.68 µs                920 ns
512               27.3 µs                   6.24 µs                1.56 µs
1024              60.2 µs                   11.4 µs                2.85 µs
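To read the published figures [7] as speedups over the C6713, the times can be divided directly. The sketch below only restates the table values; the dictionary layout and the function name are illustrative.

```python
# Times in microseconds, copied from the table: (C6713, single core, quad core).
published_us = {
    256: (12.3, 3.68, 0.920),
    512: (27.3, 6.24, 1.56),
    1024: (60.2, 11.4, 2.85),
}

def speedup(baseline_us, candidate_us):
    """How many times faster the candidate is than the baseline."""
    return baseline_us / candidate_us

for length, (c6713, single, quad) in published_us.items():
    print(f"{length:5d}  single: {speedup(c6713, single):5.1f}x"
          f"  quad: {speedup(c6713, quad):5.1f}x")
```

The quad-core times are one quarter of the single-core times in every row, so the quad speedup is four times the single-core speedup.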

Page 35: FFT in Hardware and Software

References

[1] Signals Systems and Transforms

[2] James W. Cooley and John W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput. 19, 297–301 (1965).

[3] http://www.drccomputer.com/pages/modules.html - Xilinx-based module

[4] http://www.xtremedatainc.com/xd1000_brief.html - Altera-based module

[5] http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_2353,00.html

[6] http://www.us.design-reuse.com/news/news5650.html

[7] http://www.4dsp.com/fft.htm

