01205908

8/6/2019 01205908


IMPLEMENTATION OF A PROGRAMMABLE 64-2048-POINT FFT/IFFT PROCESSOR FOR OFDM-BASED COMMUNICATION SYSTEMS

Jen-Chih Kuo, Ching-Hua Wen, and An-Yeu (Andy) Wu

Graduate Institute of Electronics Engineering, and Department of Electrical Engineering, National Taiwan University, Taipei, 106, Taiwan, R.O.C.

ABSTRACT

Orthogonal Frequency Division Multiplexing (OFDM) systems are known for their robustness against frequency-selective fading channels. The Fast Fourier Transform (FFT) and Inverse FFT (IFFT) processors are used as the modulation/demodulation kernel in OFDM systems. The size of the FFT/IFFT processor varies across OFDM applications. In this paper, we design and implement a programmable 64-2048-point FFT/IFFT processor to cover the different specifications of OFDM applications. We adopt the cached-memory architecture as the VLSI system architecture. We implement the Processing Element (PE) with the CORDIC algorithm to replace the multiplier-based PE. We also propose π/4 pre-rotation and a modified EEAS-CORDIC VLSI architecture to reduce the iteration number and quantization noise. Finally, we implement the FFT processor with TSMC 0.35 µm 1P4M CMOS technology. The die area of the FFT/IFFT processor is 12.25 mm², including 2048×32 bits of memory. The input/output wordlength is 16 bits. The chip can operate at up to 80 MHz and meets most standard requirements (64-2048 points).

1. INTRODUCTION

The OFDM system is a form of Multi-Carrier Modulation (MCM) technology [1]. It has been widely implemented in high-speed digital communications, such as wireless LAN (IEEE 802.11a), digital audio/video broadcasting, ADSL, and VDSL systems [2-3]. One of the main reasons is its increased robustness against frequency-selective fading and narrowband interference. The modulation/demodulation kernel in an OFDM system is the FFT/IFFT operation, but the FFT/IFFT size differs from one OFDM application to another, as shown in Table 1. Traditionally, a separate FFT/IFFT processor has to be designed for each OFDM application, which wastes time and money. In this paper, we design and implement a programmable FFT/IFFT processor that can be used in various OFDM-based communication systems. We also use a CORDIC-based PE design with low switching activity to achieve low power consumption, which extends battery life in mobile applications and relieves the heating problem of the chip. We implement the FFT processor with TSMC 0.35 µm 1P4M CMOS technology. The die area of the FFT/IFFT processor is 12.25 mm², including a 2048×32-bit SRAM. The input/output wordlength is 16 bits. The maximum operating frequency is 80 MHz, which meets the requirements of most existing OFDM systems using 64-2048-point FFT/IFFT.

0-7803-7761-3/03/$17.00 ©2003 IEEE

Application   FFT/IFFT Size           Frequency spacing   [illegible]
WLAN          64                      0.3125 MHz          3.2 µs
ADSL          2×256                   4.3125 kHz          231 µs
VDSL          2×256×2^n, n = 0:4      4.3125 kHz          231 µs
DAB           256×2^n, n = 0:3        [illegible]         [illegible]
DVB-T         8192/2048               1.116/4.464 kHz     [illegible]

Table 1. FFT/IFFT Size for OFDM Applications.

2. PROPOSED SYSTEM ARCHITECTURE

There are various structures for implementing FFT processors, such as the single-memory, dual-memory, pipelined, and array architectures [4]. Conventional FFT algorithms are typically developed to minimize the number of multiplications and additions, while the memory operations are usually ignored. These hidden memory operations may account for half of the power consumption of the whole FFT calculation [5]. To reduce the number of memory accesses, we choose the cached-memory architecture [4] to realize the programmable 64-2048-point FFT processor. The basic idea of the cached FFT is shown in Fig. 1. Instead of processing one stage of butterfly operations at a time, we store data in local storage and process several passes (Pass 0, Pass 1, Pass 2) within one Super-Stage. To achieve this, we have to design two sections of data movement operations that differ from the traditional FFT, as shown in Fig. 2. The resulting operations are still very regular and do not add much complexity. Within each Super-Stage, data are read from and written to the cache memory only. Super-Stages greatly reduce the number of main-memory accesses, and the saving grows as the FFT size N becomes larger. The VLSI architecture of the programmable 64-2048-point FFT processor is shown in Fig. 3.
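The saving from Super-Stages can be illustrated with a rough access count. The sketch below is not taken from the paper: it assumes a radix-2 FFT in which every pass over the data reads and writes all N words once, and it assumes three passes per Super-Stage (suggested by Pass 0 to Pass 2 above). The function names are hypothetical.

```python
import math


def conventional_accesses(n: int) -> int:
    """Main-memory accesses when every radix-2 stage reads and writes all N words."""
    stages = n.bit_length() - 1          # log2(n) for a power-of-two size
    return 2 * n * stages


def cached_accesses(n: int, passes_per_super_stage: int = 3) -> int:
    """Main-memory accesses when only Super-Stage boundaries touch main memory.

    Assumption: each Super-Stage loads all N words into the cache once and
    stores them back once; the passes inside it work on the cache only.
    """
    stages = n.bit_length() - 1
    super_stages = math.ceil(stages / passes_per_super_stage)
    return 2 * n * super_stages


if __name__ == "__main__":
    for size in (64, 512, 2048):
        print(size, conventional_accesses(size), cached_accesses(size))
```

Under these assumptions a 64-point FFT needs two Super-Stages instead of six stage-wide sweeps over main memory, and a 2048-point FFT needs four instead of eleven.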

The programmable FFT/IFFT processor consists of four design units, including the Processing Element (PE), Address Generator (AG), and Control Logic Unit (CLU). The following sections discuss the design issues of those units.


Fig. 1. Cached FFT Dataflow Diagram [4].

Fig. 2. Cache-Memory FFT Processor Architecture.

Fig. 3. The proposed programmable 64-2048-point FFT/IFFT Processor VLSI Architecture.

3. PROCESSING ELEMENT (PE)

The COordinate Rotational DIgital Computer (CORDIC) algorithm is a well-known basis for VLSI arithmetic units. The basic concept of CORDIC is to decompose the desired rotation angle into several sub-angles that are easy to implement. We adopt the Extended Elementary Angle Set (EEAS) scheme [6] to compose these sub-angles. Each sub-angle corresponds to one micro-rotation. The hardware requirement of CORDIC is very simple. It also has the potential advantage of low switching activity, which lowers power. Hence, we employ the modified EEAS-CORDIC VLSI architecture to design our butterfly PE.
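The decomposition into micro-rotations can be sketched in a few lines. The code below shows the conventional CORDIC rotation mode with elementary angles atan(2^-i), not the EEAS scheme of [6] or the authors' modified architecture. The function name and the iteration count are illustrative assumptions, and floating point stands in for the fixed-point shifts and adds of a hardware PE.

```python
import math


def cordic_rotate(x: float, y: float, angle: float, iterations: int = 16):
    """Rotate (x, y) by `angle` radians using conventional CORDIC micro-rotations.

    Each micro-rotation uses only a shift (multiplication by 2**-i) and adds.
    The accumulated gain is removed at the end. Converges for |angle| below
    roughly 1.74 rad; larger angles need a pre-rotation step first.
    """
    z = angle
    gain = 1.0
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # direction of this micro-rotation
        shift = 2.0 ** -i
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)            # remaining angle to rotate
        gain *= math.sqrt(1.0 + shift * shift)
    return x / gain, y / gain


if __name__ == "__main__":
    # Rotating (1, 0) by 30 degrees should approximate (cos 30°, sin 30°).
    print(cordic_rotate(1.0, 0.0, math.pi / 6))
```

In a butterfly, the twiddle-factor multiplication is such a rotation of the complex sample, which is why a CORDIC unit can stand in for a complex multiplier.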

3.1. Modified VLSI Architecture of EEAS-CORDIC

In order to realize the IP core of the EEAS-CORDIC PE, we propose the modified EEAS-CORDIC VLSI architecture shown in Fig. 4. The important differences from the conventional CORDIC design are the arrangement of the parameter sequences, physical considerations, and circuit speed-up. These changes improve the performance of the CORDIC-based PE. The PE also employs Carry-Save Adders and Carry Look-ahead Adders to speed up the design.

We design a single stage of EEAS-CORDIC, which includes a micro-rotation mode and a scaling mode. In addition, the modified EEAS-CORDIC architecture needs a dedicated arrangement of the parameter sequences in the EEAS algorithm. We follow [6] to design the EEAS-CORDIC parameters that control the butterfly-PE circuit.


Table 2. Coefficient Storage Comparison of CORDIC-based and Multiplier-based PE.
[Columns: CORDIC PE, Multiplier PE. Legible row labels: π/2 pre-rotation; [0, π/2]; π/4 pre-rotation. Cell values are not recoverable.]

Fig. 5. Switching activity in 2's complement representation and CORDIC representation.
[Figure content: in 2's complement, 1 = (0000,0000,0001) and -1 = (1111,1111,1111); the rotational sequence of CORDIC is shown for the same values.]

4. CLU AND AG DESIGNS

4.1 Control Logic Unit (CLU) Design

The topmost Control Logic Unit (CLU) is composed of the following three individual circuits:

a. FFT/IFFT Operation Selection: A 1-bit input determines whether the FFT or the IFFT should be computed.
b. FFT Size Selection: A 3-bit FFT size selection input decides which FFT length should be calculated.
c. Data Movement: The processing kernel (PE and cache) and the main memory can operate at different frequencies to further reduce the power consumption.

4.2 Address Generator (AG) Unit Design

The cached FFT/IFFT address generation circuit can be viewed as a modified version of the one used in the traditional FFT. What we need to do is to find a grouping of the memory accesses such that a portion of the full FFT can be calculated using fewer than N words of memory. Table 3 [4] shows the generated addresses for a 64-point FFT.

To share the address generator among FFT sizes, we discard the LSBs of the addresses generated for a larger FFT. For example, we discard the LSB of the addresses generated for the 8-point FFT in Fig. 6(a) to obtain the final addresses of the 4-point FFT in Fig. 6(b).
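The LSB-discarding idea can be checked on a small case. The sketch below is an illustration only: it assumes the generator emits bit-reversed addresses, which is one common FFT addressing pattern and not necessarily the exact sequence of Fig. 6. The helper name `bit_reverse` is hypothetical.

```python
def bit_reverse(value: int, bits: int) -> int:
    """Return `value` with its lowest `bits` bits in reversed order."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


# Addresses an 8-point generator would produce (3 address bits).
addr8 = [bit_reverse(i, 3) for i in range(8)]

# Dropping the LSB of each 8-point address leaves 2-bit addresses.
addr4_shared = [a >> 1 for a in addr8]

# Addresses a dedicated 4-point generator would produce (2 address bits).
addr4 = [bit_reverse(i, 2) for i in range(4)]

if __name__ == "__main__":
    print(addr8)
    print(addr4_shared)
    print(addr4)
```

Under this assumption the first four shortened addresses coincide with the sequence of a dedicated 4-point generator, so one counter and one bit-reversal network can serve both sizes.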

Table 3. The generated Address for 64-point Cached FFT [4].
[Columns: Super-Stage, Pass, Memory Address, Cache Address, ROM Address. Cell values are not recoverable.]

Fig. 6. (a) The generated address in 8-point FFT. (b) The generated address after discarding the LSB in 4-point FFT.

5. IMPLEMENTATION RESULTS

The FFT/IFFT processor is implemented with TSMC 0.35 µm 1P4M CMOS technology. The die size is 3.9 × 3.9 mm² with a 2048-word memory, each word being 32 bits wide. The microphotograph of the processor is shown in Fig. 7.

Table 4 lists the physical implementation results of this programmable 64-2048-point FFT/IFFT processor. The FFT sizes and the respective operating frequencies and power consumption are listed in Table 5.


Fig. 7. Microphotograph of the FFT/IFFT Processor.

Technology       TSMC 0.35 µm 1P4M CMOS
Wordlength       16 bits
Gate Counts      14,732
Die Size         3.9 × 3.9 mm²
Core Size        2.6 × 2.6 mm²
Max Frequency    80 MHz
Power Range      126-574 mW

Table 4. Implementation result of programmable FFT processor.

Application           WLAN (IEEE 802.11a)   ADSL, VDSL, DAB   [illegible]   VDSL, DAB, DVB-T
FFT Size              64                    512               1024          2048
T_FFT                 3.2 µs                62 µs             128 µs        224 µs
Operating Frequency   65 MHz                [illegible]       25 MHz        60 MHz
Power Consumption     545 mW                126 mW            253 mW        574 mW

Table 5. FFT Size, T_FFT, Frequency, and Power Consumption.

6. COMPARISON

In order to eliminate the effect of different fabrication technologies, we adopt the normalized indices of [4]. The Normalized Area is the silicon area normalized to a 0.35 µm technology, as shown in eq. (2).

\[ \text{Normalized Area} = \frac{\text{Area}}{\left(\text{Technology} / 0.35\,\mu\text{m}\right)^{2}} \tag{2} \]

The FFTs per Energy index compares the number of FFT calculations per unit of energy, as shown in eq. (3).

\[ \text{FFTs per Energy} = \frac{\text{Technology} \times \text{Frequency}}{\text{Power}} \times 1000 \tag{3} \]

As shown in Table 6, the proposed processor compares favorably with the other designs according to these normalized indices.

CMOS Tech. (µm)   FFT Size   Freq. (MHz)   Area (mm²)   Normalized Area   FFTs per Energy
0.6               8192       16.6          62.4         21.23             10.8
0.6               8192       20            140          47.63             18.4

Table 6. Comparison of Various FFT/IFFT processors.
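The Normalized Area column can be reproduced from the technology and area columns. The sketch below assumes the scaling is quadratic in feature size, as in eq. (2); the function name and argument names are hypothetical.

```python
def normalized_area(area_mm2: float, technology_um: float,
                    reference_um: float = 0.35) -> float:
    """Scale a silicon area to the reference technology.

    Assumption: area scales with the square of the feature size, so a design
    in a coarser process is credited with a proportionally smaller area.
    """
    scale = (technology_um / reference_um) ** 2
    return area_mm2 / scale


if __name__ == "__main__":
    # The two 0.6 µm rows of Table 6.
    print(round(normalized_area(62.4, 0.6), 2))
    print(round(normalized_area(140.0, 0.6), 2))
    # A 0.35 µm design is unchanged by the normalization.
    print(normalized_area(12.25, 0.35))
```

With this assumption the 62.4 mm² and 140 mm² entries map to about 21.23 and 47.64, matching the tabulated 21.23 and 47.63 to within rounding.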

7. CONCLUSION

A programmable FFT/IFFT processor for OFDM applications has been demonstrated. We chose the cached-memory architecture and then defined the hardware architecture around it. The resulting 64-2048-point programmable FFT/IFFT processor meets the requirements of most OFDM applications.

8. REFERENCES

[1] I. Kalet, "The multitone channel," IEEE Trans. on Communications, pp. 119-124, February 1989.
[2] P. S. Chow, J. C. Tu, and J. M. Cioffi, "Performance Evaluation of a Multichannel Transceiver System for ADSL and VHDSL services," IEEE J. Selected Areas, Vol. SAC-9, No. 6, pp. 909-919, Aug. 1991.
[3] R. V. Paiement, "Evaluation of Single Carrier and Multicarrier Modulation Techniques for Digital ATV Terrestrial Broadcasting," CRC Report, No. CRC-RP-004, Ottawa, Canada, Dec. 1994.
[4] B. M. Baas, "A Low-Power, High-Performance, 1024-Point FFT Processor," IEEE J. of Solid-State Circuits, vol. 34, no. 3, pp. 380-387, Mar. 1999.
[5] W. Li and L. Wanhammar, "A Pipeline FFT Processor," IEEE Workshop on Signal Processing Systems, pp. 654-662, 1999.
[6] C. S. Wu and A. Y. Wu, "A novel rotational VLSI architecture based on extended elementary-angle set CORDIC algorithm," in Proc. 2nd IEEE Asia Pacific Conference on ASICs, (Cheju, Korea), pp. 111-114, 2000.
[7] E. Bidet, D. Castelain, et al., "A fast single-chip implementation of 8192 complex point FFT," IEEE Journal of Solid-State Circuits, vol. 30, Issue 3, pp. 300-305, March 1995.
[8] C. C. W. Hui, T. J. Ding, et al., "A New FFT Architecture and Chip Design for Motion Compensation Based on Phase Correlation," Proceedings of International Conference on Application Specific Systems, Architectures and Processors, pp. 83-92, 1996.
[9] J. Lihong, Y. Gao, et al., "A New VLSI-Oriented FFT Algorithm and Implementation," IEEE ASIC Conference, pp. 337-341, 1998.
