IEEE EIT 2007 Proceedings 345.
1-4244-0941-1/07/$25.00 ©2007 IEEE
Efficient Memory-Based FFT Processors for OFDM Applications
Chin-Long Wey, Shin-Yo Lin, and Wei-Chien Tang
Department of Electrical Engineering, National Central University, Jhongli, Taiwan
e-mail: [email protected]; URL: www.ee.ncu.edu.tw/~clwey
Abstract- This paper presents radix-2 memory-based FFT (MBFFT) processors. Taking advantage of the low hardware cost of MBFFT architectures, this study improves their speed performance. The improvement is achieved by an efficient memory retrieval scheme that reduces the control complexity and by a clock scheme with parallel structures that reduces the cycle time and latency. Instead of using dual-port memories for data storage and retrieval, our designs use single-port memories with pre-fetch registers to reduce hardware cost. Based on pre-layout simulation results, the developed MBFFT has a core area of 2.04mm2 and a maximum work frequency of 198MHz for N=8192 points (24 bits per word).
I. Introduction
OFDM (Orthogonal Frequency Division Multiplexing) [1,2], a form of multi-carrier modulation, is a special case of multi-carrier transmission in which a single data stream is transmitted over a number of lower-rate sub-carriers. OFDM has been widely implemented in high-speed digital communications to increase robustness against frequency-selective fading and narrowband interference. It is also used for wideband data communications over mobile radio FM channels, xDSL, DAB, and DVB-T/H. In these applications, efficient FFT (Fast Fourier Transform) processors are required for real-time operation.
FFT architectures can be classified into two categories: (1) pipelined architectures [3-8] and (2) memory-based architectures [9-13]. Taking advantage of structural regularity in VLSI implementation, the pipelined architecture employs more processing elements (PEs) to achieve higher performance than its counterpart. By contrast, a memory-based architecture requires only one butterfly PE, together with some memory blocks for storing input and intermediate data, to perform the real-time operation. Because of their high speed and low control complexity, pipelined architectures are commonly used in many applications, at the cost of increased chip area.
Taking advantage of the low hardware cost of memory-based FFT (MBFFT) architectures, this study aims to improve their speed performance. The improvement is achieved by an efficient memory retrieval scheme that reduces the control complexity and by a clock scheme with parallel structures that reduces the cycle time and latency. Instead of using dual-port memories for data storage and retrieval, our designs use single-port memories with pre-fetch registers to reduce hardware cost. The use of single-ported memories also contributes to the cycle time reduction. Latency is also an important parameter for OFDM applications; this study demonstrates the latency improvement obtained with multiple PEs and addresses the associated design issues.
II. Memory-Based FFT Processors
A. Basic FFT Operation
The DFT (Discrete Fourier Transform) of a finite-length sequence x[n] of length N is

$$X[k] = \sum_{n=0}^{N-1} x[n]\,W_N^{kn}, \qquad k = 0, 1, \ldots, N-1 \qquad (1)$$

where $W_N^{kn} = e^{-j(2\pi/N)nk}$. Note that X[k] and x[n] may be complex numbers. Consider the implementation of the radix-2 decimation-in-frequency (DIF) FFT. For any integer r = 0, 1, ..., (N/2)-1,

$$X[2r] = \sum_{n=0}^{(N/2)-1} \bigl(x[n] + x[n+N/2]\bigr)\,W_{N/2}^{nr} \qquad (2)$$

and

$$X[2r+1] = \sum_{n=0}^{(N/2)-1} \bigl(x[n] - x[n+N/2]\bigr)\,W_N^{n}\,W_{N/2}^{nr} \qquad (3)$$

A butterfly unit computes the terms of both (2) and (3) from the pair x[n] and x[N/2+n] as

$$y[n] = x[n] + x[N/2+n] \qquad (4)$$
$$y[N/2+n] = \bigl(x[n] - x[N/2+n]\bigr)\,W_N^{n} \qquad (5)$$

Figure 1 shows the signal flow graph of the 8-point FFT.
B. Memory-Based FFT (MBFFT) Architectures
A typical MBFFT architecture is shown in Figure 2, in which five dual-ported (N/4)-word RAMs are employed. Note that RAM5 is mainly used as a buffer for temporarily storing the computed data. Thus, the architecture requires a total memory
size of 1.25N words and has a latency of (N/2)+(N/2)log2 N cycles for an N-point FFT operation [11].

Figure 1. Signal Flow Graph for Radix-2 FFT with N=8 (inputs x[0] to x[7]; butterflies with -1 branches and twiddle factors WN^0 to WN^3).

Figure 3(b). FFT Operation with N=8 (R1 = RAM1, R2 = RAM2; wk denotes W8^k; "-" means no access):
CLK  Input  Read R1  Read R2  Operation                  Write R1  Write R2  Output
1    x0     -        -        -                          x0        -         -
2    x1     -        -        -                          x1        -         -
3    x2     -        -        -                          x2        -         -
4    x3     -        -        -                          x3        -         -
5    x4     x0       -        b0=x0+x4; b4=(x0-x4)*w0    b0        b4        -
6    x5     x1       -        b1=x1+x5; b5=(x1-x5)*w1    b1        b5        -
7    x6     x2       -        b2=x2+x6; b6=(x2-x6)*w2    b6        b2        -
8    x7     x3       -        b3=x3+x7; b7=(x3-x7)*w3    b7        b3        -
9    -      b0       b2       a0=b0+b2; a2=(b0-b2)*w0    a0        a2        -
10   -      b1       b3       a1=b1+b3; a3=(b1-b3)*w2    a3        a1        -
11   -      b6       b4       a4=b4+b6; a6=(b4-b6)*w0    a6        a4        -
12   -      b7       b5       a5=b5+b7; a7=(b5-b7)*w2    a5        a7        -
13   y0     a0       a1       z0=a0+a1; z1=(a0-a1)*w0    y0        -         z0, z1
14   y1     a3       a2       z2=a2+a3; z3=(a2-a3)*w0    y1        -         z3, z2
15   y2     a6       a7       z6=a6+a7; z7=(a6-a7)*w0    y2        -         z6, z7
16   y3     a5       a4       z4=a4+a5; z5=(a4-a5)*w0    y3        -         z5, z4
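To make Eqs. (1)-(5) concrete, the following Python sketch evaluates the DFT of Eq. (1) directly and builds a recursive radix-2 DIF FFT from the butterfly of Eqs. (4)-(5). It is a software illustration only, assuming floating-point complex arithmetic; the names `dft`, `butterfly`, and `fft_dif` are hypothetical, and unlike the in-place hardware (whose results emerge in bit-reversed order) this version returns X[k] in natural order.

```python
import cmath


def dft(x):
    """Direct O(N^2) evaluation of Eq. (1), used as a reference."""
    N = len(x)
    return [sum(v * cmath.exp(-2j * cmath.pi * k * n / N) for n, v in enumerate(x))
            for k in range(N)]


def butterfly(a, b, w):
    """Eqs. (4)-(5): the sum output and the twiddled difference output."""
    return a + b, (a - b) * w


def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a power of two")
    if N == 1:
        return [complex(x[0])]
    half = N // 2
    upper, lower = [], []
    for n in range(half):
        s, d = butterfly(x[n], x[n + half], cmath.exp(-2j * cmath.pi * n / N))
        upper.append(s)   # feeds X[2r]   through Eq. (2)
        lower.append(d)   # feeds X[2r+1] through Eq. (3)
    X = [0j] * N
    X[0::2] = fft_dif(upper)   # N/2-point DFT of the sums
    X[1::2] = fft_dif(lower)   # N/2-point DFT of the twiddled differences
    return X
```

For example, `fft_dif([1, 2j, -3, 0.5, 4, -1j, 2, 0])` agrees with `dft` of the same list to within rounding error.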
Since the memories dominate the chip area of an MBFFT processor, with an area ratio as high as 85%, we developed an alternative MBFFT processor [12] with a total memory size of N words and a latency of (N/4)+(N/2)log2 N cycles for an N-point FFT operation, as shown in Figure 3(a). The architecture employs two (N/2)-word memories, RAM1 and RAM2. First, the first N/2 input data, x[0], x[1], ..., x[N/2-1], are loaded and stored in RAM1. Then the data x[N/2+k], k=0,1,...,(N/2)-1, taken from the input, and x[k], read from RAM1 at address "k", undergo the following operations:

$$b[k] = x[k] + x[N/2+k], \qquad b[N/2+k] = \bigl(x[k] - x[N/2+k]\bigr)\,W_N^{k}$$

where b[k] and b[N/2+k] are stored at address "k" of RAM1 and RAM2, respectively, as shown in Figure 3(b) for N=8.
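The memory sizes and latencies quoted for the architectures of [11] and [12] can be tabulated with a small helper. This is a sketch of the stated formulas only: `mbfft_cost` is a hypothetical name, log N is taken as log2 N, and N is assumed to be a power of two.

```python
def mbfft_cost(n_points: int) -> dict:
    """Memory size (words) and latency (cycles) quoted for the MBFFTs of [11] and [12]."""
    if n_points < 4 or n_points & (n_points - 1):
        raise ValueError("N must be a power of two >= 4")
    log_n = n_points.bit_length() - 1
    half = n_points // 2
    return {
        "[11]": {"memory_words": 5 * (n_points // 4),          # five (N/4)-word RAMs
                 "latency_cycles": half + half * log_n},       # (N/2)+(N/2)log2 N
        "[12]": {"memory_words": n_points,                     # two (N/2)-word RAMs
                 "latency_cycles": n_points // 4 + half * log_n},  # (N/4)+(N/2)log2 N
    }


for name, cost in mbfft_cost(8192).items():
    print(name, cost)
```

For N=8192 it gives 10,240 words and 57,344 cycles for [11] versus 8,192 words and 55,296 cycles for [12].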
For example, in clock #5 (b0=x0+x4; b4=(x0-x4)*w0), the data x4 is taken from the input and x0 is read from R1 at address "00". Both data are processed, and the resulting values b0 and b4 are stored into R1 and R2 at address "00", respectively. In clock #10 (a1=b1+b3; a3=(b1-b3)*w2), b1 and b3 are read from R1 at "01" and R2 at "11", and the resulting values a3 and a1 are stored into R1 at "01" and R2 at "11", respectively. In this implementation, processed data may be stored back to the address from which they were loaded, or swapped to the location from which the other operand was loaded. In other words, in each clock cycle the two data to be processed are located in different RAMs, and only one address in each RAM is enabled.
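The clock-by-clock behaviour of Figure 3(b) can be checked in software. The sketch below is a hypothetical model, not the authors' RTL: it keeps two 4-word RAMs that store operand names, replays the butterflies of Figure 3(b) at the addresses listed in Figure 4(b), asserts that each pair of operands is found in different RAMs, and compares the results with a direct DFT. It assumes that the last stage sends z0 to z7 straight to the output and that the DIF results appear in bit-reversed order.

```python
import cmath

N = 8
# ROM contents: w_k = W_8^k = exp(-j*2*pi*k/8), k = 0..3
TW = [cmath.exp(-2j * cmath.pi * k / N) for k in range(N // 2)]

# Rows from Figure 3(b), addresses from Figure 4(b):
# (R1 addr, R2 addr, top, bottom, k, sum, diff, written to R1, written to R2)
PHASE1 = [(0, 0, "x0", "x4", 0, "b0", "b4", "b0", "b4"),
          (1, 1, "x1", "x5", 1, "b1", "b5", "b1", "b5"),
          (2, 2, "x2", "x6", 2, "b2", "b6", "b6", "b2"),
          (3, 3, "x3", "x7", 3, "b3", "b7", "b7", "b3")]
PHASE2 = [(0, 2, "b0", "b2", 0, "a0", "a2", "a0", "a2"),
          (1, 3, "b1", "b3", 2, "a1", "a3", "a3", "a1"),
          (2, 0, "b4", "b6", 0, "a4", "a6", "a6", "a4"),
          (3, 1, "b5", "b7", 2, "a5", "a7", "a5", "a7")]
# Last stage: results go to the output, so there are no write-back targets.
PHASE3 = [(0, 3, "a0", "a1", 0, "z0", "z1"),
          (1, 2, "a2", "a3", 0, "z2", "z3"),
          (2, 1, "a6", "a7", 0, "z6", "z7"),
          (3, 0, "a4", "a5", 0, "z4", "z5")]


def run_schedule(x):
    """Replay the N=8 schedule on two 4-word RAMs; return X[0..7] in natural order."""
    val = {f"x{i}": complex(v) for i, v in enumerate(x)}
    ram1 = [f"x{i}" for i in range(N // 2)]   # phase 0: x0..x3 stored in RAM1
    ram2 = [None] * (N // 2)                   # RAMs hold names; val holds numbers

    def butterfly(top, bot, k, s, d):          # Eqs. (4)-(5)
        val[s] = val[top] + val[bot]
        val[d] = (val[top] - val[bot]) * TW[k]

    for a1, a2, top, bot, k, s, d, w1, w2 in PHASE1:
        assert ram1[a1] == top                 # x[k] from RAM1, x[k+4] from the input
        butterfly(top, bot, k, s, d)
        ram1[a1], ram2[a2] = w1, w2
    for a1, a2, top, bot, k, s, d, w1, w2 in PHASE2:
        assert {ram1[a1], ram2[a2]} == {top, bot}   # operands sit in different RAMs
        butterfly(top, bot, k, s, d)
        ram1[a1], ram2[a2] = w1, w2            # results reuse the read addresses
    for a1, a2, top, bot, k, s, d in PHASE3:
        assert {ram1[a1], ram2[a2]} == {top, bot}
        butterfly(top, bot, k, s, d)
    # DIF leaves X[k] at the bit-reversed position: X[k] = z_{bitrev3(k)}
    return [val[f"z{int(format(k, '03b')[::-1], 2)}"] for k in range(N)]


def dft(x):
    """Direct evaluation of Eq. (1) as a reference."""
    return [sum(v * cmath.exp(-2j * cmath.pi * k * n / len(x)) for n, v in enumerate(x))
            for k in range(len(x))]
```

Running `run_schedule` on any 8-sample input completes without an assertion failure, which confirms that every butterfly finds one operand in each RAM.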
Figures 4(a) and 4(b) present the control signals for the MUXs and for memory access, respectively. In Figure 4(b), RA/WA denotes the address accessed for reading (RA) or writing (WA); its two columns, "1" and "2", refer to RAM1 (R1) and RAM2 (R2), respectively. DO and DI are the data output and data input of the memories, respectively.
Figure 4(c) shows the data flow, where two RAMs store the processed data and each RAM contains only four words, at locations 00, 01, 10, and 11. The upper half describes the data placement in RAM1, with the memory addresses given right after the inputs. The two data to be processed in Stage i are located in different RAMs, and the resulting data are stored at locations in different RAMs for Stage (i+1). The pattern can readily be extended to any N.
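For N=8, the RA/WA columns of Figure 4(b) follow a compact rule: RAM1 is addressed by the phase counter (w1w0) itself, and RAM2 by the counter XORed with a per-phase mask. The helper below is a hypothetical illustration read off Figure 4(b) for N=8 only; it is not claimed to be the authors' address generator for other N.

```python
# Per-phase RAM2 address masks read off the RA/WA columns of Figure 4(b), N = 8.
R2_MASK = {0: 0b00, 1: 0b00, 2: 0b10, 3: 0b11}


def ram_addresses(clk: int) -> tuple:
    """(RAM1 address, RAM2 address) accessed at clock clk (1..16) for N = 8."""
    phase, counter = divmod(clk - 1, 4)   # 4 cycles per phase, counter = w1w0
    return counter, counter ^ R2_MASK[phase]


for clk in range(1, 17):
    a1, a2 = ram_addresses(clk)
    print(f"clk {clk:2d}: R1 {a1:02b}  R2 {a2:02b}")
```

Because each RAM receives exactly one address per clock, a single counter plus an XOR per phase is enough to drive both RAMs in this example.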
Figure 2. A Typical MBFFT Architecture [11] (input data, RAM1 to RAM5, ROM, and a butterfly PE).
Figure 3. A MBFFT Architecture [12]: (a) Schematic (input data, RAM-1 and RAM-2 with read/write ports, ROM, one PE, MUXs ma, mb, and m1 to m4, output); and (b) FFT Operation with N=8.
Figure 5. MBFFT: (a) and (b) Control Signals of the MBFFT with N=2048 for the MUXs and for Memory Access, expressed in terms of the counter (w9w8...w1w0); and (c) Layout View of the MBFFT with N=8192.

In this implementation, a period of N/2 clock cycles is defined as a phase. For N=8, each phase contains 4 clock cycles, and a two-bit counter (w1w0) counts the clock cycles in each phase. Interestingly, the control signals in each phase can be expressed in terms of the counter bits w1 and w0. The 16 cycles in Figure 4(a) can be divided into four phases, and the control signals for the MUXs, RAMs, and ROM in each phase can be derived as shown in Figures 4(d) and 4(e). Thus, the control circuitry consists essentially of the 2-bit counter. The control signals could be synthesized as random logic; however, for structural simplicity and regularity, we use a simple finite state machine (FSM) to generate them. With this implementation, the FSM extends easily to FFT operations of any N, which is one of the salient features of the developed MBFFT processor. Figure 5 shows the control signals for both the MUXs and memory access for N=2048, where there are 12 phases of 1024 cycles each and a 10-bit counter (w9w8...w1w0) is employed.
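The phase/counter decomposition lends itself to a compact software model. In the sketch below, the m1 and m4 rules are read off Figure 4(a) for N=8, and the twiddle-ROM address (the counter shifted left by p-1 bits in phase p, truncated to log2(N/2) bits) is read off Figures 4(e) and 5(b). The name `control_signals` and the use of the same m1/m4 rule for other N are assumptions made for illustration; this is not the authors' FSM.

```python
def _bit(value: int, i: int) -> int:
    """Counter bit w_i."""
    return (value >> i) & 1


def control_signals(clk: int, n_points: int) -> dict:
    """Phase, counter, MUX selects m1/m4 and twiddle-ROM address at clock clk (1-based).

    None stands for a don't-care ('x') select or an idle ROM.
    """
    half = n_points // 2
    n = half.bit_length() - 1              # counter width: log2(N/2) bits
    phase, count = divmod(clk - 1, half)   # phases 0 .. n+1, each N/2 cycles long
    if not 0 <= phase <= n + 1:
        raise ValueError("clock outside one FFT frame")
    # m4 follows counter bit w_{n-p} in phases 1..n
    m4 = _bit(count, n - phase) if 1 <= phase <= n else None
    # m1 is 0 in phase 1 and follows w_{n+1-p} in phases 2..n+1
    if phase == 1:
        m1 = 0
    elif phase >= 2:
        m1 = _bit(count, n + 1 - phase)
    else:
        m1 = None
    # ROM address: counter shifted left by (p-1), truncated to n bits
    rom = (count << (phase - 1)) % half if phase >= 1 else None
    return {"phase": phase, "count": count, "m1": m1, "m4": m4, "rom": rom}
```

For N=2048 it reproduces the 12-phase, 1024-cycle structure described above: clock 1025 falls in phase 1 and clock 12288 in phase 11, where the ROM address is always 0 (twiddle W_N^0).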
Our simulation results show that the developed MBFFT processor takes (N/2)+(N/2)log2 N cycles to complete the first set of data. Because of the data overlapping shown in Figure 3(b), the MBFFT in Figure 3(a) takes (N/2)log2 N cycles to complete each subsequent set of data. Thus, the average latency over two data sets is (N/4)+(N/2)log2 N cycles.

Figure 5(c) shows the layout view of the MBFFT with N=8192, where the memories and circuitry are synthesized by the Artisan tool and Design Compiler, respectively, using the TSMC 0.18µm 1P6M digital CMOS process. Two 4K dual-ported RAMs are employed, where each word contains 24 bits, i.e., 12 bits for each of the real and imaginary parts of a complex number. Experimental results show that the maximum work frequency is approximately 117 MHz and the core area is 4.12mm2.
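The average-latency figure follows from spreading the first frame's extra N/2 cycles over consecutive frames. A minimal sketch, assuming exactly the cycle counts stated above; the function name `average_latency` is hypothetical.

```python
def average_latency(n_points: int, frames: int) -> float:
    """Average cycles per frame when `frames` data sets are processed back to back."""
    log_n = n_points.bit_length() - 1
    first = n_points // 2 + (n_points // 2) * log_n   # first set: (N/2)+(N/2)log2 N
    following = (n_points // 2) * log_n                # overlapped sets: (N/2)log2 N
    return (first + following * (frames - 1)) / frames


print(average_latency(8192, 2))
```

With frames=2 and N=8192 it gives 55,296 cycles, the value listed for one PE in Table 2; as the number of back-to-back frames grows, the average approaches (N/2)log2 N = 53,248 cycles.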
The two 4K dual-ported RAMs in Figure 5(c) occupy approximately 82.03% of the total area, or 3.38mm2. The ROM takes approximately 10.25%, whereas the PE and the control logic (including the MUXs) require only 2.13% and 0.71%, respectively. These data show that the control logic of this implementation is very simple and that the area of the PE is not significant.
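The percentages above convert to absolute areas as follows. This sketch simply applies the quoted shares to the 4.12mm2 core; the "unattributed" entry is whatever the four listed blocks leave over (about 4.9%), which the text does not break down. The names are hypothetical.

```python
CORE_AREA_MM2 = 4.12   # dual-ported MBFFT of Figure 5(c), N = 8192
SHARES_PCT = {"RAMs": 82.03, "ROM": 10.25, "PE": 2.13, "control": 0.71}


def area_breakdown(core_mm2: float, shares_pct: dict) -> dict:
    """Convert percentage shares of the core area into mm2."""
    areas = {name: core_mm2 * pct / 100 for name, pct in shares_pct.items()}
    areas["unattributed"] = core_mm2 - sum(areas.values())
    return areas


for name, mm2 in area_breakdown(CORE_AREA_MM2, SHARES_PCT).items():
    print(f"{name:>12}: {mm2:.3f} mm2")
```

The RAM entry reproduces the 3.38mm2 quoted above, and the PE plus control logic together come to roughly 0.12mm2.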
The goal of the present paper is to develop a low-cost FFT processor for DVB-T applications, with higher work frequency and smaller area given the highest priority.
Figure 4(a). Control Signals for MUXs (N=8; x = don't care):
CLK  ma  m1  m2  mb  m3  m4
1    0   x   xx  0   10  x
2    0   x   xx  0   10  x
3    0   x   xx  0   10  x
4    0   x   xx  0   10  x
5    1   0   10  0   01  0
6    1   0   10  0   01  0
7    1   0   10  0   00  1
8    1   0   10  0   00  1
9    x   0   00  0   01  0
10   x   0   00  0   00  1
11   x   1   01  0   01  0
12   x   1   01  0   00  1
13   0   0   00  1   10  x
14   0   1   01  1   10  x
15   0   0   00  1   10  x
16   0   1   01  1   10  x

Figure 4(b). Memory Access (N=8; RA/WA = read/write address, DO = data out, DI = data in; "-" = none):
CLK  Input  RA/WA R1  RA/WA R2  DO R1  DO R2  DI R1  DI R2
1    x0     00        00        -      -      x0     -
2    x1     01        01        -      -      x1     -
3    x2     10        10        -      -      x2     -
4    x3     11        11        -      -      x3     -
5    x4     00        00        x0     -      b0     b4
6    x5     01        01        x1     -      b1     b5
7    x6     10        10        x2     -      b6     b2
8    x7     11        11        x3     -      b7     b3
9    -      00        10        b0     b2     a0     a2
10   -      01        11        b1     b3     a3     a1
11   -      10        00        b6     b4     a6     a4
12   -      11        01        b7     b5     a5     a7
13   y0     00        11        a0     a1     -      -
14   y1     01        10        a3     a2     -      -
15   y2     10        01        a6     a7     -      -
16   y3     11        00        a5     a4     -      -

Figure 4. MBFFT [11]: (a) Control Signals for MUXs; (b) for Memory Access; (c) Signal Flow Graph; and (d)&(e) Simplified Version of Control Signals.

Figure 5(c). Layout View of MBFFT with N=8192 (two SRAM 4K x 24 blocks and a ROM).
III. Proposed FFT Processors
One way to reduce the area of the MBFFT processor in Figure 5(c) is to use single-port memory. Table 1 compares the synthesized results obtained by the Artisan tool for both single-ported and dual-ported RAMs.
Table 1: Comparison of Dual-port vs. Single-port RAMs (24 bits/word)
Area (mm2)    8K      2K      1K      512     256
Dual-port     2.044   1.439   1.390   1.295   1.278
Single-port   1.635   1.237   1.205   1.204   1.143
Delay (ns)    8K      2K      1K      512     256
Dual-port     2.75    0.84    0.48    0.31    0.20
Single-port   1.20    0.37    0.23    0.14    0.09

Evidently, single-port memory performs better in both area and speed. However, each cycle of the FFT operation in Figure 3(a) must both read data from the memories and write data to them. A dual-ported memory can perform both operations within one cycle, whereas a single-ported memory may take two cycles to do the same. To reduce the cycle time, we developed an alternative design that employs single-port memories with pre-fetch registers to accomplish the memory accesses within one cycle.

Figure 6(a) shows the proposed MBFFT processor, in which two pre-fetch registers are inserted; otherwise, the FFT operation is similar to that in Figure 3(a). The processor can be divided into two stages: (a) data storage and (b) data processing. The right-hand side of Figure 6(b) is the data storage stage, and the left-hand side is the data processing stage; the two stages can be performed in parallel. Figure 6(c) shows the clock control signals of the FFT operation. In this implementation, the FFT control signal is also used to synchronize the data processing stage. In addition, the FFT clock signal is divided into two sub-cycles, as shown in Figure 6(d). The first sub-cycle controls the operation that stores the processed data to memory, where the memory is triggered by the positive edge of the control signal. At the same time, the data held in the pre-fetch register is available for the PE to use. At the positive edge of the control signal in the second sub-cycle, data is read from memory and made ready for the pre-fetch register to fetch.

With the proposed clock scheme, single-ported memory with pre-fetch registers accomplishes the same task as dual-ported memory. At the same time, the clock cycle is reduced significantly, because the worst-case delay reduces to that of the data processing stage.

The MBFFT in Figure 6(a) has also been implemented, and its layout view is given in Figure 7. Experimental results show that the area is reduced from 4.12mm2 for the 8K MBFFT with dual-ported memories to 2.04mm2, a reduction of about 50%. The maximum working frequency increases from 115MHz for the design in Figure 5(c) to 198MHz, a significant performance improvement.

For OFDM applications, the speed performance of the FFT processor is determined by the work frequency and the latency. In the proposed architecture, the latency is (N/4)+(N/2)log2 N cycles for an N-point FFT operation. In general, the latency can be reduced by using more PEs.

Figure 7. Layout View of the Proposed MBFFT (two single-port RAMs and a ROM).
Figure 6. MBFFT with Single-port Memories: (a) Schematic; (b) Parallel Structure; and (c) & (d) Timing and Clock Control Signals. Panels (b) to (d) mark the activities "Write Processed Data to Memory", "Read Data from Memory", "Process Data", and "Buffer Data is Available".
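The two-sub-cycle scheme of Figure 6(d) can be illustrated with a behavioural model. The sketch below is hypothetical and simplified to a single RAM: `SinglePortRAM` raises an error if two accesses land on the same clock edge, and `update_in_place` performs one read-modify-write per FFT cycle by writing back in the first sub-cycle and pre-fetching the next operand in the second. It additionally assumes that consecutive accesses never target the same word; otherwise the pre-fetch would read a stale value.

```python
class SinglePortRAM:
    """Hypothetical single-port RAM model: one read OR one write per clock edge."""

    def __init__(self, contents):
        self.mem = list(contents)
        self.last_edge = None

    def _claim(self, edge):
        # A second access on the same edge would require a second port.
        if edge == self.last_edge:
            raise RuntimeError(f"port conflict on edge {edge}")
        self.last_edge = edge

    def read(self, edge, addr):
        self._claim(edge)
        return self.mem[addr]

    def write(self, edge, addr, value):
        self._claim(edge)
        self.mem[addr] = value


def update_in_place(ram, addrs, pe):
    """Apply pe() to ram[a] for every a in addrs, one FFT cycle per element.

    FFT cycle t uses edge 2t (sub-cycle 1) and edge 2t+1 (sub-cycle 2):
      sub-cycle 1: write back the previous result while the PE consumes
                   the operand already held in the pre-fetch register
      sub-cycle 2: read the next operand into the pre-fetch register
    """
    assert all(a != b for a, b in zip(addrs, addrs[1:])), "read-after-write hazard"
    prefetch = ram.read(-1, addrs[0])          # prime the pre-fetch register
    pending = None                             # (address, result) awaiting write-back
    for t, addr in enumerate(addrs):
        if pending is not None:
            ram.write(2 * t, *pending)         # sub-cycle 1: store processed data
        pending = (addr, pe(prefetch))         # PE works on the buffered operand
        if t + 1 < len(addrs):
            prefetch = ram.read(2 * t + 1, addrs[t + 1])  # sub-cycle 2: fetch next
    ram.write(2 * len(addrs), *pending)        # drain the final result
    return ram.mem
```

For instance, `update_in_place(SinglePortRAM([1, 2, 3, 4]), [0, 1, 2, 3], lambda v: 10 * v)` returns [10, 20, 30, 40] without any port conflict, whereas a read and a write on the same edge would raise `RuntimeError`.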
As mentioned, the area of a PE in Figure 5(c) is approximately 2.13% of the overall core area, which is not significant compared with the area of the memories. Thus, we may use two PEs, as shown in Figure 7(a); the associated signal flow is given in Figure 7(b) for N=16. The circuit in Figure 7(a) has also been implemented. The experimental results show that the core area is 2.33mm2 for N=8192 points (24 bits per word), and the maximum work frequency is 185 MHz. The area of the circuit increases by approximately 14.2% compared with the MBFFT with one PE. In this implementation, for N=8192, four 2K RAMs are employed. Both approaches employ a total of 8K words of RAM; however, the total area of four 2K RAMs is approximately 10% more than that of two 4K RAMs. The area increase due to the PE and control logic is not significant. Moreover, the latency of the MBFFT with 2 PEs is reduced to (3N/4)+(N/4)log2 N cycles. In other words, the latency is roughly halved for large N (for N=8192, from 55,296 to 32,768 cycles). Further, the MBFFT with 4 PEs has a core area of approximately 3.37 mm2, a working frequency of 162MHz, and a latency of N+(N/8)log2 N cycles.
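The latency expressions above can be checked against Table 2 and converted into time at each design's maximum work frequency. A minimal sketch; `latency_cycles` is a hypothetical helper that covers only the 1-, 2-, and 4-PE cases given in the text.

```python
def latency_cycles(n_points: int, pes: int) -> int:
    """Latency in clock cycles of the proposed MBFFT with 1, 2, or 4 PEs."""
    log_n = n_points.bit_length() - 1
    if pes == 1:
        return n_points // 4 + (n_points // 2) * log_n       # (N/4)+(N/2)log2 N
    if pes == 2:
        return 3 * n_points // 4 + (n_points // 4) * log_n   # (3N/4)+(N/4)log2 N
    if pes == 4:
        return n_points + (n_points // 8) * log_n            # N+(N/8)log2 N
    raise ValueError("formulas are given for 1, 2 and 4 PEs only")


# Table 2 (N = 8192): PEs -> (area in mm2, max. frequency in MHz, latency in cycles)
TABLE2 = {1: (2.04, 198, 55_296), 2: (2.33, 185, 32_768), 4: (3.37, 162, 21_504)}

for pes, (area_mm2, f_mhz, table_latency) in TABLE2.items():
    cycles = latency_cycles(8192, pes)
    assert cycles == table_latency
    print(f"{pes} PE(s): {cycles} cycles = {cycles / f_mhz:.1f} us "
          f"at {f_mhz} MHz, core area {area_mm2} mm2")
```

Dividing cycles by the frequency in MHz gives microseconds, which makes the 1-PE design's higher clock rate and the multi-PE designs' lower cycle counts directly comparable.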
In summary, the layout-based experimental data for the MBFFT with N=8192 points (24 bits per word) are tabulated in Table 2.
IV. Conclusion
This paper compares MBFFT designs using dual-ported and single-ported memories. Results show that the area of the design using dual-ported memory is about double that of the design using single-ported memory. Thus, a clock scheme is proposed that uses single-ported memory with pre-fetch registers to reduce the hardware cost. Because the area of the PE in an MBFFT processor is insignificant, this paper also demonstrates the area, maximum work frequency, and latency of MBFFTs with one, two, and four PEs. As listed in Table 2, the MBFFT processor with one PE has the smallest area and the highest work frequency, but the longest latency. In many OFDM applications, the choice of FFT architecture depends on speed, latency, and power consumption. A higher work frequency consumes more power, but it also processes more data.
This paper demonstrates the design tradeoffs assuming an FFT with the simple radix-2 structure. In fact, increasing the number of PEs and/or using a higher radix may improve the design performance, at the cost of increased design complexity.
Acknowledgment
This work was supported in part by the Taiwan National Science Council under grant numbers NSC94-2220-E-008-001 and NSC94-2220-E-008-008, and in part by Elan Microelectronics Corp., Taiwan.
Table 2. Performance of the Proposed MBFFT (N=8192, 24 bits per word)
              Area (mm2)   Max. Work Freq. (MHz)   Latency (cycles)
with 1 PE     2.04         198                     55,296
with 2 PEs    2.33         185                     32,768
with 4 PEs    3.37         162                     21,504
Figure 7. Proposed MBFFT with 2 PEs: (a) Schematic (input data, RAM-1 to RAM-4 with pre-fetch registers, two PEs, ROM, output); (b) Signal Path Flow with N=16 (RAM1 to RAM4); and (c) Layout View of MBFFT with N=8192 (four 2K x 24 RAMs and a ROM).
References
1. R. van Nee and R. Prasad, OFDM for Wireless Multimedia Communications, Artech House, 2000.
2. W.-Y. Zou and Y. Wu, "COFDM: an Overview," IEEE Trans. on Broadcasting, pp.1-8, March 1995.
3. G. Bi and E.V. Jones, "A Pipelined FFT Processor for Word-Sequence Data," IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol.37, pp.1982-1985, December 1989.
4. S. He and M. Torkelson, "Designing Pipeline FFT Processor for OFDM (de)modulation," Proc. of International Symposium on Signals, Systems, and Electronics (ISSSE), pp.257-262, 1998.
5. Y.N. Chang and K.K. Parhi, "An Efficient Pipelined FFT Architecture," IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, Vol.50, pp.322-325, June 2003.
6. L. Jia, Y. Gao, J. Isoaho, and H. Tenhunen, "A New VLSI-oriented FFT Algorithm and Implementation," Proc. of 11th Annu. IEEE International ASIC Conference, pp.337-341, September 1998.
7. E. Bidet, D. Castelain, C. Joanblanq, and P. Senn, "A Fast Single-chip Implementation of 8192 Complex Points FFT," IEEE Journal of Solid-State Circuits, Vol.20, pp.205-300, March 1995.
8. J. Lee, H. Lee, S.-I. Cho, and S.-S. Choi, "A High-speed Low-Complexity Radix-2^4 FFT Processor for MB-OFDM UWB Systems," Proc. of IEEE International Symp. on Circuits and Systems, pp.4719-4722, May 2006.
9. Y.-W. Lin, H.-Y. Liu, and C.-Y. Lee, "A Dynamic Scaling FFT Processor for DVB-T Applications," IEEE Journal of Solid-State Circuits, Vol.39, pp.2005-2013, November 2004.
10. C.M. Wu, M.D. Shieh, H.F. Lo, and M.H. Hu, "Implementation of Channel Demodulator for DAB Systems," Proc. of IEEE International Symposium on Circuits and Systems, Vol.2, pp.25-28, May 2003.
11. C.-K. Chang, C.-P. Hung, and S.-G. Chen, "An Efficient Memory-based FFT Architecture," Proc. of IEEE International Symposium on Circuits and Systems, pp.129-132, 2003.
12. C.L. Wey, W.-C. Tang, and S.Y. Lin, "Efficient Memory-Based FFT Architectures for Digital Video Broadcasting (DVB-T/H)," Proc. of International Symp. on VLSI Design, Automation, and Test (VLSI-DAT), Hsinchu, Taiwan, April 2007.
13. C.L. Wey, S.-Y. Lin, and W.-C. Tang, "Efficient VLSI Implementation of Memory-Based FFT Processors for DVB-T Applications," Proc. of IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Porto Alegre, Brazil, May 2007.