VLSI DSP Project Report_1.0


A 1024 POINT RADIX-2^2 AND COMPLEX FFT DESIGN USING WALLACE MULTIPLIER

Le Cai, Student ID: 4125589

Yuan Xu, Student ID: 4139225

Zhe Zhang, Student ID: 4137165

December 18th, 2009


TABLE OF CONTENTS

A 1024 POINT RADIX-2^2 AND COMPLEX FFT DESIGN USING WALLACE MULTIPLIER

OBJECTIVE

THE WALLACE MULTIPLIER DESIGN

    Booth recoding
    WallaceTree_Adder for Partial Product Reduction
    32-bit Brent Kung adder

SIMULATION RESULTS

SYNTHESIS RESULTS OF THE WALLACE MULTIPLIER

    Results of Critical Path
    Results of Power Consumption
    Results of Area
    Conclusion of Phase1

SEVERAL OTHER FFT DESIGNS

    R2MDC
    R2SDF
    R4SDF
    R4MDC
    R4SDC

FFT DESIGN BASED ON RADIX-2^2 ALGORITHM

RADIX-2^2 SDF ARCHITECTURE FOR 1024 POINTS COMPLEX FFT

SYNTHESIS RESULTS OF THE 1024 POINTS FFT

    Results of Power Dissipation
    Results of Critical Path
    Results of Area
    Conclusion of the Phase2

REFERENCES


LIST OF FIGURES

FIG 4. SIMULATION WAVEFORMS OF WALLACE MULTIPLIER

FIG 5. R2MDC(N=16)

FIG 7. R4SDF(N=256)

FIG 8. R4MDC(N=256)

FIG 9. R4SDC(N=256)

FIG 10. BUTTERFLY WITH DECOMPOSED TWIDDLE FACTORS

FIG 11. 1024 POINTS RADIX-2^2 FFT ARCHITECTURE

FIG 12. BF2I

FIG 13. BF2II

FIGURE 14. BUTTERFLY ARCHITECTURE OF THE 1024 POINTS RADIX-2^2 FFT

FIGURE 15. SIMULATION WAVEFORMS OF 1024 POINTS FFT

FIGURE 16. SCHEMATIC VIEW OF 1024 POINTS FFT

LIST OF TABLES

TABLE 2. TWIDDLE FACTORS OF EACH WIRE


OBJECTIVE:

Design a 1024-point radix-2^2 complex FFT module based on a Booth-recoded Wallace multiplier in Verilog. This project contains two stages.

In the first stage, implement a 16×16 multiplier based on a Wallace tree using the radix-4 Booth's algorithm. Using Verilog, design sub-blocks such as a half adder, full adder, Booth encoder, partial product generator, 32-bit Brent-Kung adder, and Wallace tree carry-save adder. A simulation is carried out to verify the correct function of the proposed multiplier.

In the second stage, design a 1024-point radix-2^2 complex FFT module based on the first stage. Based on He and Torkelson's paper, the proposed 1024-point FFT processor utilizes a simplified cascaded radix-2^2 single-path delay feedback (SDF) structure. The control circuit of the proposed simplified radix-2^2 FFT SDF architecture is simpler than that of the direct radix-4 FFT SDF structure. The multiplier cost of the proposed FFT architecture is less than that of the previous FFT structures in 1024-point FFT applications. Only 4 complex multipliers and a 1024 complex-word data memory are needed for the pipelined 1024-point FFT processor.


THE WALLACE MULTIPLIER DESIGN

Booth recoding

module Booth recoding

Booth multiplication is a technique that allows for smaller, faster multiplication circuits by recoding the numbers that are multiplied.

Reducing the Number of Partial Products

The number of partial products can be halved by using radix-4 Booth recoding. The basic idea is that, instead of shifting and adding for every column of the multiplier term and multiplying by 1 or 0, we only take every second column and multiply by ±1, ±2, or 0 to obtain the same result. So, to multiply by 7, we can multiply the partial product aligned against the least significant bit by -1, and multiply the partial product aligned with the third column by 2.

Partial Product 0 = Multiplicand * -1, shifted left 0 bits (x -1)

Partial Product 1 = Multiplicand * 2, shifted left 2 bits (x 8)

This is the same result as the equivalent shift and add method, since -1 + 8 = 7.

The advantage of this method is the halving of the number of partial products. This is important in circuit design as it relates to the propagation delay in the running of the circuit, and the complexity and power consumption of its implementation.

Radix-4 Booth Recoding

To Booth recode the multiplier term, we consider the bits in blocks of three, such that each block overlaps the previous block by one bit. Grouping starts from the LSB, and the first block only uses two bits of the multiplier (since there is no previous block to overlap):

Fig 1. Grouping of bits from the multiplier term for Booth recoding
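The grouping rule can be illustrated with a small software model. The following Python sketch is not the project's Verilog; the function names `booth_radix4_digits` and `booth_multiply` are hypothetical, and it assumes a two's-complement multiplier of even width with an implicit 0 to the right of the LSB.

```python
def booth_radix4_digits(multiplier, width=16):
    """Return the radix-4 Booth digits (least significant first).

    Each digit is in {-2, -1, 0, 1, 2} and is formed from three
    overlapping bits: digit = y[2i-1] + y[2i] - 2*y[2i+1].
    """
    if width <= 0 or width % 2:
        raise ValueError("width must be a positive even number")
    y = multiplier & ((1 << width) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of the LSB (first block has two bits)
    for i in range(0, width, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(prev + b0 - 2 * b1)
        prev = b1  # overlap: top bit of this block starts the next block
    return digits


def booth_multiply(multiplicand, multiplier, width=16):
    """Sum the shifted partial products selected by the Booth digits."""
    digits = booth_radix4_digits(multiplier, width)
    return sum((d * multiplicand) << (2 * i) for i, d in enumerate(digits))
```

For the multiply-by-7 example with a 4-bit multiplier, `booth_radix4_digits(7, 4)` yields the digits -1 and 2, which correspond to Partial Product 0 and Partial Product 1 above.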


Table 1. Booth recoding mapping calculation

We generate three signals depending on the input bits: S (for shift x2), N (when negative), and Z (when zero). The logic expressions are

Sign Extension Tricks

Once the Booth recoded partial products have been generated, they need to be shifted and added together. The problem with implementing this in hardware is that the first partial product needs to be sign extended by 6 bits, the second by four bits, and so on. This is easily achievable in hardware, but requires more logic gates than if those bits could be permanently kept constant.

The procedure to do this is:

Invert the most significant bit (MSB) of each partial product

Add an additional '1' to the MSB of the first partial product

Add an additional '1' in front of each partial product

This technique allows any sign bits to be correctly propagated, without the need to sign extend all of the bits.
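The arithmetic identity behind this trick can be checked in software. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, and it verifies the general identity (inverting the MSB and subtracting a constant 2^(w-1) equals sign extension, modulo the total width) rather than the exact placement of the constant '1' bits in this design's partial-product array.

```python
def sign_extend(value, field_width, total_width):
    """Conventional two's-complement sign extension of a field_width-bit value."""
    sign = (value >> (field_width - 1)) & 1
    if sign:
        # Fill every bit above the field with ones.
        value |= ((1 << (total_width - field_width)) - 1) << field_width
    return value & ((1 << total_width) - 1)


def invert_msb_form(value, field_width, total_width):
    """Same result without replicating the sign bit.

    Invert the MSB, then subtract the constant 2^(field_width-1).
    Because the constant does not depend on the data, the constants of all
    partial products can be pre-summed into fixed '1' bits.
    """
    flipped = value ^ (1 << (field_width - 1))
    return (flipped - (1 << (field_width - 1))) % (1 << total_width)
```

Since the subtracted term is data independent, only the inverted MSB depends on the partial product itself, which is why the extension bits can be held constant in hardware.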


WallaceTree_Adder for Partial Product Reduction

module Wallace tree adder for Partial product reduction Tables


Fig 2. Wallace tree adder for Partial product reduction

32-bit Brent Kung adder

module Brent Kung adder(VMA)

In order to build fast adders, it is necessary to organize carry propagation and generation into recursive trees.

Here is the definition:

Pi = Ai XOR Bi, Gi = Ai AND Bi

Pi (propagate) means that Cout = Cin, while Gi (generate) means that Cout = 1 independent of Cin.

Then, the expression of Sum and Cout of one adder could be:

Sumi = Pi XOR Cin, Cout = Gi OR (Pi AND Cin)

We need the dot operator for the recursive implementation:

(G, P) . (G', P') = (G OR (P AND G'), P AND P')

A 16-bit Brent-Kung adder is given as an example:

Fig 3. 16-bit Brent-Kung adder
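The recursive carry tree can be modelled in a few lines of Python. This is a behavioural sketch, not the synthesized netlist: the names `dot` and `brent_kung_add` are hypothetical, the width is assumed to be a power of two, and the carry-in is assumed to be 0.

```python
def dot(hi, lo):
    """Dot operator on (G, P) pairs: hi is the more significant group."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def brent_kung_add(a, b, width=16):
    """Add two unsigned width-bit integers; returns (sum, carry_out)."""
    if width < 1 or width & (width - 1):
        raise ValueError("width must be a power of two")
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]  # generate
    gp = list(zip(g, p))

    # Up-sweep: build group (G, P) over blocks of 2, 4, 8, ... bits.
    d = 1
    while d < width:
        for i in range(2 * d - 1, width, 2 * d):
            gp[i] = dot(gp[i], gp[i - d])
        d *= 2

    # Down-sweep: fill in the remaining prefixes from the block results.
    d = width // 4
    while d >= 1:
        for i in range(3 * d - 1, width, 2 * d):
            gp[i] = dot(gp[i], gp[i - d])
        d //= 2

    # gp[i][0] is now the carry out of bit i; carry into bit 0 is 0.
    carries = [0] + [gp[i][0] for i in range(width)]
    total = sum((p[i] ^ carries[i]) << i for i in range(width))
    return total, carries[width]
```

The up-sweep and down-sweep together use roughly 2*log2(width) levels of dot operators, which is the trade-off the Brent-Kung structure makes to keep the number of operators and the wiring low.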


SIMULATION RESULTS

The simulation is carried out with Modelsim SE 6.5c. The results are shown in Fig. 4 as follows:

Fig 4. Simulation Waveforms of Wallace Multiplier.

SYNTHESIS RESULTS OF THE WALLACE MULTIPLIER

With the aid of Synopsys Design Compiler, we employ the FreePDK 45 nm CMOS technology to obtain the synthesis results with respect to power dissipation, length of critical path, and silicon area.


Results of Critical Path

The result of Critical Path is shown as follows:

****************************************

Report : timing

-path full

-delay max

-max_paths 1

-sort_by group

Design : multiplier

Version: A-2007.12

Date : Tue Nov 17 16:39:59 2009

****************************************

Operating Conditions: typical Library: gscl45nm

Wire Load Model Mode: top

Startpoint: B[3] (input port)

Endpoint: sum[31] (output port)

Path Group: (none)

Path Type: max

Point Incr Path

--------------------------------------------------------------------------

input external delay 0.00 0.00 r

B[3] (in) 0.00 0.00 r

pre_coding/y[3] (booth_coding) 0.00 0.00 r

pre_coding/U63/Y (INVX1) 0.04 0.04 f

pre_coding/U50/Y (NAND3X1) 0.04 0.09 r

pre_coding/U13/Y (BUFX2) 0.04 0.12 r

pre_coding/U49/Y (OAI21X1) 0.01 0.14 f

pre_coding/C857/Z_11 (*SELECT_OP_5.15_5.1_15) 0.00 0.14 f

pre_coding/pp1[12] (booth_coding) 0.00 0.14 f

tree_adder/pp1[12] (wallace_tree_adder) 0.00 0.14 f

tree_adder/adder13/inb (full_adder_81) 0.00 0.14 f

tree_adder/adder13/U6/Y (XNOR2X1) 0.07 0.20 r

tree_adder/adder13/U3/Y (XOR2X1) 0.07 0.27 r

tree_adder/adder13/sum (full_adder_81) 0.00 0.27 r

tree_adder/adder50/ina (full_adder_53) 0.00 0.27 r






tree_adder/adder88/U6/Y (XNOR2X1) 0.06 0.47 r

tree_adder/adder88/U3/Y (XOR2X1) 0.07 0.54 r








tree_adder/adder142/U3/Y (XOR2X1) 0.05 0.78 f

tree_adder/adder142/sum (full_adder_1) 0.00 0.78 f

tree_adder/VMA/A[10] (brent_kung_28bitadder) 0.00 0.78 f

tree_adder/VMA/pg10/A (p_g_17) 0.00 0.78 f

tree_adder/VMA/pg10/U2/Y (AND2X1) 0.04 0.82 f

tree_adder/VMA/pg10/G (p_g_17) 0.00 0.82 f

tree_adder/VMA/U30/Y (AND2X1) 0.06 0.88 f

tree_adder/VMA/adder5/Gin (dot_com_18) 0.00 0.88 f

tree_adder/VMA/adder5/U4/Y (AOI21X1) 0.05 0.93 r

tree_adder/VMA/adder5/U1/Y (BUFX2) 0.03 0.96 r

tree_adder/VMA/adder5/U3/Y (INVX1) 0.02 0.98 f

tree_adder/VMA/adder5/Gout (dot_com_18) 0.00 0.98 f

tree_adder/VMA/adder18/G (dot_com_5) 0.00 0.98 f




tree_adder/VMA/adder18/Gout (dot_com_5) 0.00 1.04 f

tree_adder/VMA/adder22/G (dot_com_1) 0.00 1.04 f




tree_adder/VMA/adder22/Gout (dot_com_1) 0.00 1.11 f

tree_adder/VMA/adder26/G (half_dot_com_22) 0.00 1.11 f




tree_adder/VMA/adder26/Gout (half_dot_com_22) 0.00 1.20 f

tree_adder/VMA/adder41/Gin (half_dot_com_8) 0.00 1.20 f






tree_adder/VMA/adder43/Gin (half_dot_com_6) 0.00 1.30 f

tree_adder/VMA/adder43/U3/Y (AOI21X1) 0.04 1.34 r









tree_adder/VMA/adder47/Gin (half_dot_com_2) 0.00 1.51 f

tree_adder/VMA/adder47/U3/Y (AOI21X1) 0.04 1.54 r









tree_adder/VMA/U9/Y (XOR2X1) 0.04 1.73 r

tree_adder/VMA/sum[27] (brent_kung_28bitadder) 0.00 1.73 r

tree_adder/sum[31] (wallace_tree_adder) 0.00 1.73 r

sum[31] (out) 0.00 1.73 r

data arrival time 1.73

--------------------------------------------------------------------------

(Path is unconstrained)


Results of Power Consumption

The result of power consumption is shown as follows:

****************************************

Report : power

-analysis_effort low

Design : multiplier

Version: A-2007.12

Date : Tue Nov 17 16:39:24 2009

****************************************

Library(s) Used:

gscl45nm (File: /home/class/zhan0915/project/gscl45nm.db)



Global Operating Voltage = 1.1

Power-specific unit information :

Voltage Units = 1V

Capacitance Units = 1.000000pf

Time Units = 1ns

Dynamic Power Units = 1mW (derived from V,C,T units)

Leakage Power Units = 1nW

Cell Internal Power = 1.7201 mW (57%)

Net Switching Power = 1.2722 mW (43%)

---------

Total Dynamic Power = 2.9922 mW (100%)

Cell Leakage Power = 18.9980 uW


Results of Area

The result of silicon area is shown as follows:

****************************************

Report : area

Design : multiplier

Version: A-2007.12

Date : Thu Nov 17 16:47:24 2009

****************************************

Library(s) Used:

gscl45nm (File: /home/grads/zhan0884/FreePDK45/osu_soc/lib/files/gscl45nm.db)

Number of ports: 70

Number of nets: 486

Number of cells: 135

Number of references: 135

Combinational area: 4213.5213404

Noncombinational area: 0.000000

Net Interconnect area: undefined (No wire load specified)

Total cell area: 4213.5213404

Total area: undefined

Conclusion of Phase1

In Phase 1 of the project, a 16×16 multiplier is designed based on a Wallace tree using the radix-4 Booth's algorithm. Synthesis results show that the multiplier occupies a silicon area of 4213.521340 µm^2. The critical path of the multiplier is 1.73 ns, and the total dynamic power consumption of the multiplier is 2.9922 mW.


SEVERAL OTHER FFT DESIGNS

Before going into details of the new approach, it is beneficial to have a brief review of the various architectures for pipeline FFT processors. This section gives a brief review of previous approaches to FFT hardware design. Different approaches are put into functional blocks with unified terminology, where the additive butterfly has been separated from the multiplier to show the hardware requirement distinctively. The control and twiddle factor reading mechanisms have also been omitted for clarity.

R2MDC

Radix-2 Multi-path Delay Commutator (R2MDC) was probably the most straightforward approach for pipeline implementation of the radix-2 FFT algorithm. The input sequence is broken into two parallel data streams flowing forward, with the correct distance between data elements entering the butterfly scheduled by proper delays. Both butterflies and multipliers are at 50% utilization. (log2 N - 2) multipliers, log2 N radix-2 butterflies, and (3N/2 - 2) registers (delay elements) are needed.

Fig 5. R2MDC(N=16)

R2SDF

Radix-2 Single-path Delay Feedback (R2SDF) uses the registers more efficiently by storing the butterfly output in feedback shift registers. A single data stream goes through the multiplier at every stage. It has the same number of butterfly units and multipliers as the R2MDC approach, but with a much reduced memory requirement: (N - 1) registers. Its memory requirement is minimal.

Fig 6. R2SDF(N=16)


R4SDF

Radix-4 Single-path Delay Feedback (R4SDF) was proposed as a radix-4 version of R2SDF, employing Coordinate Rotational Digital Computer (CORDIC) iterations. The utilization of multipliers is increased to 75% by storing 3 out of the 4 radix-4 butterfly outputs. However, the utilization of the radix-4 butterfly, which is fairly complicated and contains at least 8 complex adders, drops to only 25%. It requires (log4 N - 1) multipliers, log4 N full radix-4 butterflies, and storage of size (N - 1).

Fig 7. R4SDF(N=256)

R4MDC

Radix-4 Multi-path Delay Commutator (R4MDC) is a radix-4 version of R2MDC. It has been used as the architecture for the initial VLSI implementation of a pipeline FFT processor and for massive wafer scale integration. However, it suffers from low (25%) utilization of all components, which can be compensated only in some special applications where four FFTs are being processed simultaneously. It requires 3 log4 N multipliers, log4 N full radix-4 butterflies, and (5N/2 - 4) registers.

Fig 8. R4MDC(N=256)

R4SDC

Radix-4 Single-path Delay Commutator (R4SDC) uses a modified radix-4 algorithm with programmable 1/4 radix-4 butterflies to achieve a higher (75%) utilization of multipliers. A combined Delay-Commutator also reduces the memory requirement to (2N - 2) from (5N/2 - 1), that of R4MDC. The butterfly and delay-commutator become relatively complicated due to the programmability requirement. R4SDC has been used recently in building the largest ever single chip pipeline FFT processor for HDTV application.

Fig 9. R4SDC(N=256)
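To compare the architectures at a concrete size, the counts quoted in this section can be evaluated numerically. The Python sketch below only tabulates the formulas stated above; the function name `pipeline_fft_costs` is hypothetical, N is assumed to be a power of 4, and the multiplier count of R4SDC is left as None because this section does not state it.

```python
def pipeline_fft_costs(n_points):
    """Multiplier and register counts from the formulas quoted in the text."""
    log2n = n_points.bit_length() - 1
    if n_points < 4 or (1 << log2n) != n_points or log2n % 2:
        raise ValueError("n_points must be a power of 4")
    log4n = log2n // 2
    return {
        "R2MDC": {"multipliers": log2n - 2, "registers": 3 * n_points // 2 - 2},
        "R2SDF": {"multipliers": log2n - 2, "registers": n_points - 1},
        "R4SDF": {"multipliers": log4n - 1, "registers": n_points - 1},
        "R4MDC": {"multipliers": 3 * log4n, "registers": 5 * n_points // 2 - 4},
        # Multiplier count for R4SDC is not given in this section.
        "R4SDC": {"multipliers": None, "registers": 2 * n_points - 2},
    }


if __name__ == "__main__":
    for name, cost in pipeline_fft_costs(1024).items():
        print(name, cost)
```

For N = 1024 the single-path delay feedback structures need N - 1 = 1023 registers, while the multi-path structures need noticeably more memory.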

FFT DESIGN BASED ON RADIX-2^2 ALGORITHM

In this section, we derive the hardware-oriented radix-2^2 algorithm for FFT implementation. One example of a 16-point radix-2^2 FFT is given in this section. Finally, the detailed butterfly trellis is plotted to guide the following hardware design.

The DFT of size N is defined by

\[
X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad 0 \le k < N
\]

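Applying the index maps n = (N/2) n_1 + (N/4) n_2 + n_3 and k = k_1 + 2k_2 + 4k_3 gives

\[
\begin{aligned}
X(k_1 + 2k_2 + 4k_3)
&= \sum_{n_3=0}^{\frac{N}{4}-1} \sum_{n_2=0}^{1} \sum_{n_1=0}^{1}
   x\!\left(\tfrac{N}{2} n_1 + \tfrac{N}{4} n_2 + n_3\right)
   W_N^{\left(\frac{N}{2} n_1 + \frac{N}{4} n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)} \\
&= \sum_{n_3=0}^{\frac{N}{4}-1} \sum_{n_2=0}^{1}
   \left\{ B_{N/2}^{k_1}\!\left(\tfrac{N}{4} n_2 + n_3\right)
   W_N^{\left(\frac{N}{4} n_2 + n_3\right) k_1} \right\}
   W_N^{\left(\frac{N}{4} n_2 + n_3\right)(2k_2 + 4k_3)} \\
&= \sum_{n_3=0}^{\frac{N}{4}-1} \sum_{n_2=0}^{1}
   B_{N/2}^{k_1}\!\left(\tfrac{N}{4} n_2 + n_3\right)
   W_N^{\left(\frac{N}{4} n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)}
\end{aligned}
\]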

where

\[
B_{N/2}^{k_1}\!\left(\tfrac{N}{4} n_2 + n_3\right)
= x\!\left(\tfrac{N}{4} n_2 + n_3\right)
+ (-1)^{k_1}\, x\!\left(\tfrac{N}{4} n_2 + n_3 + \tfrac{N}{2}\right)
\]

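The twiddle factor can be decomposed as

\[
\begin{aligned}
W_N^{\left(\frac{N}{4} n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)}
&= W_N^{N n_2 k_3}\; W_N^{\frac{N}{4} n_2 (k_1 + 2k_2)}\; W_N^{n_3 (k_1 + 2k_2)}\; W_N^{4 n_3 k_3} \\
&= (-j)^{n_2 (k_1 + 2k_2)}\; W_N^{n_3 (k_1 + 2k_2)}\; W_N^{4 n_3 k_3}
\end{aligned}
\]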

Then the following equation is deduced for the radix-2^2 FFT algorithm:

\[
X(k_1 + 2k_2 + 4k_3) = \sum_{n_3=0}^{\frac{N}{4}-1}
\left[ H(k_1, k_2, n_3)\, W_N^{n_3 (k_1 + 2k_2)} \right] W_{N/4}^{n_3 k_3}
\]

where

\[
H(k_1, k_2, n_3) =
\underbrace{
\overbrace{\left[ x(n_3) + (-1)^{k_1} x\!\left(n_3 + \tfrac{N}{2}\right) \right]}^{\text{BF I}}
+ (-j)^{(k_1 + 2k_2)}
\overbrace{\left[ x\!\left(n_3 + \tfrac{N}{4}\right) + (-1)^{k_1} x\!\left(n_3 + \tfrac{3N}{4}\right) \right]}^{\text{BF I}}
}_{\text{BF II}}
\]

This equation represents the first two stages of butterflies with only trivial multiplications in the signal flow graph (SFG), as BF I and BF II in Fig 10. After these two stages, full multipliers are required to compute the product of the decomposed twiddle factor $W_N^{n_3(k_1 + 2k_2)}$ in the equation for X(k1+2k2+4k3), as shown in Fig 10. Note the order of the twiddle factors is different from that of the radix-4 algorithm.
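The decomposition can be checked numerically against the DFT definition. The Python sketch below is a floating-point illustration only, not the fixed-point hardware data path; the function names are hypothetical, W_N is assumed to be exp(-j*2*pi/N), and N is assumed to be a multiple of 4.

```python
import cmath


def w(n_points, exponent):
    """Twiddle factor W_N^exponent, assuming W_N = exp(-j*2*pi/N)."""
    return cmath.exp(-2j * cmath.pi * exponent / n_points)


def dft(x):
    """Direct evaluation of the DFT definition."""
    n = len(x)
    return [sum(x[i] * w(n, i * k) for i in range(n)) for k in range(n)]


def radix22_decomposition(x):
    """Evaluate X(k1 + 2*k2 + 4*k3) through H(k1, k2, n3)."""
    n = len(x)
    if n % 4:
        raise ValueError("length must be a multiple of 4")
    q = n // 4
    out = [0j] * n
    for k1 in (0, 1):
        for k2 in (0, 1):
            rot = (-1j) ** (k1 + 2 * k2)  # trivial multiplication in BF II
            sign = (-1) ** k1             # add or subtract in BF I
            h = [
                (x[n3] + sign * x[n3 + 2 * q])
                + rot * (x[n3 + q] + sign * x[n3 + 3 * q])
                for n3 in range(q)
            ]
            for k3 in range(q):
                # Full multiplier W_N^{n3(k1+2k2)}, then a length-N/4 DFT.
                out[k1 + 2 * k2 + 4 * k3] = sum(
                    h[n3] * w(n, n3 * (k1 + 2 * k2)) * w(q, n3 * k3)
                    for n3 in range(q)
                )
    return out
```

Comparing `radix22_decomposition(x)` with `dft(x)` for a 16-point input should agree to within floating-point rounding, which confirms that only the factor W_N^{n3(k1+2k2)} needs a full multiplier after the first two butterfly stages.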


Fig 10. Butterfly with decomposed twiddle factors.

The radix-2^2 algorithm has the same multiplicative complexity as radix-4 algorithms, but still retains the radix-2 butterfly structures. The multiplicative operations are arranged so that only every other stage has non-trivial multiplications. This is a great structural advantage over other algorithms when a pipeline/cascade FFT architecture is under consideration.

RADIX-2^2 SDF ARCHITECTURE FOR 1024 POINTS COMPLEX FFT

Fig. 11 outlines an implementation of the R2^2SDF architecture for N = 1024; note the similarity of the data-path to R2SDF and the reduced number of multipliers. The implementation uses two types of butterflies: one identical to that in R2SDF, the other also containing the logic to implement the trivial twiddle factor multiplication, as shown in Fig. 12 and Fig. 13 respectively. Due to the spatial regularity of the radix-2^2 algorithm, the synchronization control of the processor is very simple. A (log2 N)-bit binary counter serves two purposes: synchronization controller and address counter for twiddle factor reading in each stage.
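The resource figures of this structure can be estimated with a short calculation. The sketch below rests on assumptions that are not spelled out in this section: as in R2SDF, butterfly stage s is assumed to have a feedback delay line of N / 2^(s+1) words, and a full complex multiplier is assumed after every second butterfly except the last pair. The function name `r22sdf_resources` is hypothetical.

```python
def r22sdf_resources(n_points):
    """Estimate butterflies, feedback memory and multipliers for R2^2SDF."""
    stages = n_points.bit_length() - 1  # log2(N) radix-2 butterflies
    if n_points < 4 or (1 << stages) != n_points or stages % 2:
        raise ValueError("n_points must be a power of 4")
    # Assumed feedback delay length per butterfly: N/2, N/4, ..., 1.
    delays = [n_points >> (s + 1) for s in range(stages)]
    return {
        "butterflies": stages,
        "feedback_delays": delays,
        "memory_words": sum(delays),             # equals N - 1
        "complex_multipliers": stages // 2 - 1,  # log4(N) - 1
        "counter_bits": stages,                  # (log2 N)-bit control counter
    }


if __name__ == "__main__":
    print(r22sdf_resources(1024))
```

For N = 1024 this gives 10 butterflies, 4 complex multipliers and a 10-bit counter, in line with the figures quoted in the objective; the delay lines sum to N - 1 words.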


Fig 11. 1024 points Radix-2^2 FFT architecture.

Fig 12. BF2I.

Fig 13. BF2II.

[Figure 14 diagram: inputs x(0) to x(1023) pass through butterflies BF 1 to BF 10 to outputs X(0) to X(1023); the networks between the butterflies are labeled N(0) to N(8), with wires labeled 0-0 to 8-1023.]

Figure 14. Butterfly architecture of the 1024 points radix-2^2 FFT.

The 1024-point FFT using the R2^2SDF architecture is shown in Fig. 14. It includes 5 stages, each containing two butterflies named BF1 and BF2. We describe the connection between consecutive butterflies as a network; thus there are 9 networks, named N(0) to N(8), shown in Fig 14. Each network contains 1024 wires, which we label m-n, where m represents the network and n stands for the particular wire. For instance, the 128th wire in network 7 is represented as 7-128. Thus, we can use the formulation given before to calculate each wire's twiddle factor. The twiddle factor values are specified in Table 2 as follows:


Table 2. Twiddle factors of each wire.

Network   Wire number                                      Twiddle factor

N(0)      0-m, 768 ≤ m ≤ 1023                              -j
          0-m, else                                        1

N(2)      0-m, 192+256n ≤ m ≤ 255+256n, (0 ≤ n ≤ 3)        -j
          0-m, else                                        1

N(4)      0-m, 48+64n ≤ m ≤ 63+64n, (0 ≤ n ≤ 15)           -j
          0-m, else                                        1

N(6)      0-m, 12+16n ≤ m ≤ 15+16n, (0 ≤ n ≤ 63)           -j
          0-m, else                                        1

N(8)      0-m, m = 3+4n, (0 ≤ n ≤ 255)                     -j
          0-m, else                                        1

N(1)      0-m, 768 ≤ m ≤ 1023                              W^(3(m-768))
          0-m, 512 ≤ m ≤ 767                               W^(m-512)
          0-m, 256 ≤ m ≤ 511                               W^(2(m-256))
          0-m, else                                        1

N(3)      0-m, 192+256n ≤ m ≤ 255+256n, (0 ≤ n ≤ 3)        W^(3(m-192-256n))
          0-m, 128+256n ≤ m ≤ 191+256n, (0 ≤ n ≤ 3)        W^(m-128-256n)
          0-m, 64+256n ≤ m ≤ 127+256n, (0 ≤ n ≤ 3)         W^(2(m-64-256n))
          0-m, else                                        1

N(5)      0-m, 48+64n ≤ m ≤ 63+64n, (0 ≤ n ≤ 15)           W^(3(m-48-64n))
          0-m, 32+64n ≤ m ≤ 47+64n, (0 ≤ n ≤ 15)           W^(m-32-64n)
          0-m, 16+64n ≤ m ≤ 31+64n, (0 ≤ n ≤ 15)           W^(2(m-16-64n))
          0-m, else                                        1

N(7)      0-m, 12+16n ≤ m ≤ 15+16n, (0 ≤ n ≤ 63)           W^(3(m-12-16n))
          0-m, 8+16n ≤ m ≤ 11+16n, (0 ≤ n ≤ 63)            W^(m-8-16n)
          0-m, 4+16n ≤ m ≤ 7+16n, (0 ≤ n ≤ 63)             W^(2(m-4-16n))
          0-m, else                                        1
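The regular pattern of Table 2 can be captured in a small lookup function. The Python sketch below is an illustration with a hypothetical name, `table2_twiddle`; it returns the kind of factor and the exponent of W rather than a numeric value, because the base of W for each network is not stated in the table. It also assumes that the 256 ≤ m ≤ 511 row of N(1) follows the same offset pattern as the corresponding rows of N(3), N(5) and N(7), that is, an exponent of 2(m-256).

```python
N_POINTS = 1024


def table2_twiddle(network, m):
    """Return (kind, exponent) for wire m of network N(network).

    kind is "1", "-j" or "W"; exponent is only meaningful for "W".
    """
    if not (0 <= network <= 8 and 0 <= m < N_POINTS):
        raise ValueError("network must be 0..8 and m must be 0..1023")
    block = N_POINTS >> (2 * (network // 2))  # 1024, 256, 64, 16, 4
    quarter = block // 4
    pos = m % block
    if network % 2 == 0:
        # Even networks: trivial factor -j on the last quarter of each block.
        return ("-j", 0) if pos >= 3 * quarter else ("1", 0)
    # Odd networks: quarters 0..3 use exponent multipliers 0, 2, 1, 3.
    multiplier = (0, 2, 1, 3)[pos // quarter]
    if multiplier == 0:
        return ("1", 0)
    return ("W", multiplier * (pos % quarter))
```

For example, wire 770 of N(1) falls in the range 768 ≤ m ≤ 1023, so the function reports W with exponent 3(770-768) = 6.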


SYNTHESIS RESULTS OF THE 1024 POINTS FFT

With the aid of Synopsys Design Compiler, we employ the FreePDK 45 nm CMOS technology to obtain the synthesis results with respect to power dissipation, length of critical path, and silicon area.

Figure 15. Simulation Waveforms of 1024 points FFT


Results of Power Dissipation

The result of power consumption is shown as follows:

****************************************

Report : power

-analysis_effort low

Design : fft_1024

Version: A-2007.12

Date : Tue Dec 22 07:40:01 2009

****************************************

Library(s) Used:

gscl45nm (File: /home/class/zhan0915/fft3/gscl45nm.db)



Global Operating Voltage = 1.1

Power-specific unit information :

Voltage Units = 1V

Capacitance Units = 1.000000pf

Time Units = 1ns

Dynamic Power Units = 1mW (derived from V,C,T units)

Leakage Power Units = 1nW

Cell Internal Power = 37.6966 mW (83%)

Net Switching Power = 7.7248 mW (17%)

---------

Total Dynamic Power = 45.4214 mW (100%)

Cell Leakage Power = 2.2162 mW


Results of Critical Path

The result of length of critical path is shown as follows:

****************************************

Report : timing

-path full

-delay max

-max_paths 1

-sort_by group

Design : fft_1024

Version: A-2007.12

Date : Tue Dec 22 07:41:08 2009

****************************************

# A fanout number of 1000 was used for high fanout net computations.



Startpoint: counter/q_reg[8]

(rising edge-triggered flip-flop)

Endpoint: imag_out[15]

(output port)

Path Group: (none)

Path Type: max

Point Incr Path

--------------------------------------------------------------------------

counter/q_reg[8]/CLK (DFFPOSX1) 0.00 # 0.00 r

counter/q_reg[8]/Q (DFFPOSX1) 0.36 0.36 r

counter/q[8] (ctr) 0.00 0.36 r

bf_2_0/s (bf_2_0) 0.00 0.36 r

bf_2_0/U279/Y (INVX1) 0.21 0.57 f

bf_2_0/U101/Y (OR2X1) 0.07 0.64 f

bf_2_0/U102/Y (INVX1) 0.74 1.38 r

bf_2_0/U278/Y (MUX2X1) 0.20 1.58 f

bf_2_0/U262/Y (INVX1) 0.10 1.67 r

bf_2_0/adder1/B[0] (vma16_35) 0.00 1.67 r

bf_2_0/adder1/ipg16/B[0] (p_g_16_35) 0.00 1.67 r


bf_2_0/adder1/ipg16/U31/Y (XOR2X1) 0.04 1.72 f

bf_2_0/adder1/ipg16/pg0[1] (p_g_16_35) 0.00 1.72 f

bf_2_0/adder1/ir1c1/pg[1] (partial_product_generator1_560)

0.00 1.72 f

bf_2_0/adder1/ir1c1/U2/Y (AOI21X1) 0.04 1.76 r

bf_2_0/adder1/ir1c1/U1/Y (INVX1) 0.04 1.79 f

bf_2_0/adder1/ir1c1/pgo (partial_product_generator1_560)

0.00 1.79 f

bf_2_0/adder1/ir2c3/pg0 (partial_product_generator1_559)

0.00 1.79 f

bf_2_0/adder1/ir2c3/U2/Y (AOI21X1) 0.04 1.83 r

bf_2_0/adder1/ir2c3/U1/Y (INVX1) 0.04 1.87 f


0.00 1.87 f

bf_2_0/adder1/ir3c7/pg0 (partial_product_generator1_558)

0.00 1.87 f




0.00 1.96 f

bf_2_0/adder1/ixor16/A[7] (xor_16_35) 0.00 1.96 f

bf_2_0/adder1/ixor16/U3/Y (XOR2X1) 0.04 2.00 f

bf_2_0/adder1/ixor16/S[7] (xor_16_35) 0.00 2.00 f

bf_2_0/adder1/S[7] (vma16_35) 0.00 2.00 f

bf_2_0/U233/Y (AOI22X1) 0.05 2.05 r

bf_2_0/U21/Y (BUFX2) 0.05 2.10 r

bf_2_0/U9/Y (AND2X1) 0.03 2.13 r

bf_2_0/U89/Y (INVX1) 0.03 2.15 f

bf_2_0/imag_out0[7] (bf_2_0) 0.00 2.15 f

mul0_i/A[7] (multiplier_7) 0.00 2.15 f

mul0_i/pre_coding/x[7] (booth_coding_7) 0.00 2.15 f

mul0_i/pre_coding/U240/Y (INVX1) 0.25 2.41 r

mul0_i/pre_coding/U663/Y (MUX2X1) 0.10 2.50 f

mul0_i/pre_coding/U662/Y (OAI21X1) 0.05 2.55 r

mul0_i/pre_coding/pp0[7] (booth_coding_7) 0.00 2.55 r

mul0_i/tree_adder/pp0[7] (wallace_tree_adder_7) 0.00 2.55 r

mul0_i/tree_adder/adder6/ina (full_adder_647) 0.00 2.55 r

mul0_i/tree_adder/adder6/U6/Y (XNOR2X1) 0.07 2.63 r

mul0_i/tree_adder/adder6/U3/Y (XOR2X1) 0.07 2.70 r

mul0_i/tree_adder/adder6/sum (full_adder_647) 0.00 2.70 r







mul0_i/tree_adder/adder81/U6/Y (XNOR2X1) 0.08 2.92 r

mul0_i/tree_adder/adder81/U3/Y (XOR2X1) 0.07 3.00 r


mul0_i/tree_adder/adder108/ina (half_adder_441) 0.00 3.00 r


mul0_i/tree_adder/adder108/sum (half_adder_441) 0.00 3.07 r




mul0_i/tree_adder/VMA/A[3] (brent_kung_28bitadder_7)

0.00 3.14 r

mul0_i/tree_adder/VMA/pg3/A (p_g_195) 0.00 3.14 r

mul0_i/tree_adder/VMA/pg3/U1/Y (XOR2X1) 0.08 3.22 r

mul0_i/tree_adder/VMA/pg3/P (p_g_195) 0.00 3.22 r

mul0_i/tree_adder/VMA/adder1/P (dot_com_167) 0.00 3.22 r

mul0_i/tree_adder/VMA/adder1/U1/Y (AND2X1) 0.04 3.26 r

mul0_i/tree_adder/VMA/adder1/Pout (dot_com_167) 0.00 3.26 r

mul0_i/tree_adder/VMA/adder20/P (half_dot_com_182) 0.00 3.26 r

mul0_i/tree_adder/VMA/adder20/U2/Y (AOI21X1) 0.02 3.28 f

mul0_i/tree_adder/VMA/adder20/U1/Y (INVX1) 0.05 3.33 r

mul0_i/tree_adder/VMA/adder20/Gout (half_dot_com_182)

0.00 3.33 r

mul0_i/tree_adder/VMA/U6/Y (XOR2X1) 0.08 3.42 r

mul0_i/tree_adder/VMA/sum[4] (brent_kung_28bitadder_7)

0.00 3.42 r

mul0_i/tree_adder/sum[8] (wallace_tree_adder_7) 0.00 3.42 r

mul0_i/sum[8] (multiplier_7) 0.00 3.42 r

bf_1_1/imag_in1[8] (bf_1_4) 0.00 3.42 r

bf_1_1/adder1/B[8] (vma16_31) 0.00 3.42 r

bf_1_1/adder1/ipg16/B[8] (p_g_16_31) 0.00 3.42 r




0.00 3.47 f



bf_1_1/adder1/ir1c9/pgo[0] (partial_product_generator2_338)


0.00 3.53 f


0.00 3.53 f


bf_1_1/adder1/ir5c9/U1/Y (INVX1) 0.03 3.58 f

bf_1_1/adder1/ir5c9/pgo (partial_product_generator1_490)

0.00 3.58 f


0.00 3.58 f




0.00 3.64 f


bf_1_1/adder1/ixor16/U15/Y (XOR2X1) 0.03 3.68 f

bf_1_1/adder1/ixor16/S[10] (xor_16_31) 0.00 3.68 f

bf_1_1/adder1/S[10] (vma16_31) 0.00 3.68 f

bf_1_1/U128/Y (MUX2X1) 0.04 3.71 r

bf_1_1/U127/Y (INVX1) 0.04 3.76 f

bf_1_1/imag_out0[10] (bf_1_4) 0.00 3.76 f

bf_2_1/imag_in1[10] (bf_2_4) 0.00 3.76 f

bf_2_1/U277/Y (MUX2X1) 0.08 3.84 r

bf_2_1/U261/Y (INVX1) 0.09 3.92 f

bf_2_1/adder1/B[10] (vma16_27) 0.00 3.92 f

bf_2_1/adder1/ipg16/B[10] (p_g_16_27) 0.00 3.92 f

bf_2_1/adder1/ipg16/U29/Y (XOR2X1) 0.04 3.97 f



0.00 3.97 f

bf_2_1/adder1/ir1c11/U1/Y (AND2X1) 0.04 4.01 f


0.00 4.01 f


0.00 4.01 f



0.00 4.05 f


0.00 4.05 f





0.00 4.12 f



bf_2_1/adder1/ixor16/S[11] (xor_16_27) 0.00 4.16 f

bf_2_1/adder1/S[11] (vma16_27) 0.00 4.16 f

bf_2_1/U244/Y (AOI22X1) 0.05 4.21 r

bf_2_1/U21/Y (BUFX2) 0.05 4.26 r

bf_2_1/U4/Y (AND2X1) 0.03 4.29 r

bf_2_1/U86/Y (INVX1) 0.03 4.31 f

bf_2_1/imag_out0[11] (bf_2_4) 0.00 4.31 f




mul1_i/pre_coding/U686/Y (MUX2X1) 0.10 4.66 f

mul1_i/pre_coding/U685/Y (OAI21X1) 0.05 4.71 r









mul1_i/tree_adder/adder47/U3/Y (XOR2X1) 0.07 5.01 r

mul1_i/tree_adder/adder47/sum (full_adder_429) 0.00 5.01 r













0.00 5.38 r

mul1_i/tree_adder/VMA/pg7/A (p_g_135) 0.00 5.38 r

mul1_i/tree_adder/VMA/pg7/U1/Y (XOR2X1) 0.05 5.43 f


mul1_i/tree_adder/VMA/pg7/P (p_g_135) 0.00 5.43 f

mul1_i/tree_adder/VMA/adder3/P (dot_com_117) 0.00 5.43 f

mul1_i/tree_adder/VMA/adder3/U1/Y (AND2X1) 0.05 5.49 f

mul1_i/tree_adder/VMA/adder3/Pout (dot_com_117) 0.00 5.49 f

mul1_i/tree_adder/VMA/adder19/P (dot_com_101) 0.00 5.49 f

mul1_i/tree_adder/VMA/adder19/U1/Y (AND2X1) 0.05 5.54 f

mul1_i/tree_adder/VMA/adder19/Pout (dot_com_101) 0.00 5.54 f

mul1_i/tree_adder/VMA/adder23/P (half_dot_com_129) 0.00 5.54 f

mul1_i/tree_adder/VMA/adder23/U2/Y (AOI21X1) 0.04 5.57 r

mul1_i/tree_adder/VMA/adder23/U1/Y (INVX1) 0.04 5.61 f

mul1_i/tree_adder/VMA/adder23/Gout (half_dot_com_129)

0.00 5.61 f



0.00 5.69 r

mul1_i/tree_adder/sum[12] (wallace_tree_adder_5) 0.00 5.69 r


bf_1_2/imag_in1[12] (bf_1_3) 0.00 5.69 r

bf_1_2/adder1/B[12] (vma16_23) 0.00 5.69 r

bf_1_2/adder1/ipg16/B[12] (p_g_16_23) 0.00 5.69 r




0.00 5.75 f

bf_1_2/adder1/ir1c13/U3/Y (AOI21X1) 0.04 5.79 r

bf_1_2/adder1/ir1c13/U2/Y (INVX1) 0.03 5.81 f


0.00 5.81 f


0.00 5.81 f




0.00 5.86 f


0.00 5.86 f




0.00 5.92 f





bf_1_2/adder1/S[14] (vma16_23) 0.00 5.95 f

bf_1_2/U120/Y (MUX2X1) 0.04 5.99 r

bf_1_2/U119/Y (INVX1) 0.04 6.04 f

bf_1_2/imag_out0[14] (bf_1_3) 0.00 6.04 f

bf_2_2/imag_in1[14] (bf_2_3) 0.00 6.04 f

bf_2_2/U273/Y (MUX2X1) 0.08 6.11 r

bf_2_2/U257/Y (INVX1) 0.09 6.20 f

bf_2_2/adder1/B[14] (vma16_19) 0.00 6.20 f

bf_2_2/adder1/ipg16/B[14] (p_g_16_19) 0.00 6.20 f

bf_2_2/adder1/ipg16/U21/Y (XOR2X1) 0.06 6.26 r

bf_2_2/adder1/ipg16/pg14[1] (p_g_16_19) 0.00 6.26 r

bf_2_2/adder1/ixor16/B[14] (xor_16_19) 0.00 6.26 r


bf_2_2/adder1/S[14] (vma16_19) 0.00 6.31 f

bf_2_2/U241/Y (AOI22X1) 0.05 6.36 r

bf_2_2/U24/Y (BUFX2) 0.05 6.40 r

bf_2_2/U13/Y (AND2X1) 0.03 6.43 r

bf_2_2/U83/Y (INVX1) 0.03 6.46 f

bf_2_2/imag_out0[14] (bf_2_3) 0.00 6.46 f



mul2_i/pre_coding/U251/Y (INVX1) 0.25 6.71 r

mul2_i/pre_coding/U680/Y (MUX2X1) 0.10 6.81 f





















mul2_i/tree_adder/adder142/ina (full_adder_188) 0.00 7.45 r

mul2_i/tree_adder/adder142/U6/Y (XNOR2X1) 0.08 7.52 r




0.00 7.60 r





mul2_i/tree_adder/VMA/sum[10] (brent_kung_28bitadder_3)

0.00 7.74 r



bf_1_3/imag_in1[14] (bf_1_2) 0.00 7.74 r

bf_1_3/adder1/B[14] (vma16_15) 0.00 7.74 r

bf_1_3/adder1/ipg16/B[14] (p_g_16_15) 0.00 7.74 r





bf_1_3/adder1/S[14] (vma16_15) 0.00 7.85 f

bf_1_3/U120/Y (MUX2X1) 0.04 7.89 r

bf_1_3/U119/Y (INVX1) 0.04 7.94 f

bf_1_3/imag_out0[14] (bf_1_2) 0.00 7.94 f

bf_2_3/imag_in1[14] (bf_2_2) 0.00 7.94 f

bf_2_3/U273/Y (MUX2X1) 0.08 8.02 r

bf_2_3/U257/Y (INVX1) 0.09 8.10 f

bf_2_3/adder1/B[14] (vma16_11) 0.00 8.10 f

bf_2_3/adder1/ipg16/B[14] (p_g_16_11) 0.00 8.10 f






bf_2_3/adder1/S[14] (vma16_11) 0.00 8.21 f


bf_2_3/U241/Y (AOI22X1) 0.05 8.26 r

bf_2_3/U29/Y (BUFX2) 0.05 8.30 r

bf_2_3/U13/Y (AND2X1) 0.03 8.33 r

bf_2_3/U96/Y (INVX1) 0.03 8.36 f

bf_2_3/imag_out0[14] (bf_2_2) 0.00 8.36 f

mul3_i/A[14] (multiplier_1) 0.00 8.36 f



mul3_i/pre_coding/U680/Y (MUX2X1) 0.10 8.71 f






mul3_i/tree_adder/adder13/U3/Y (XOR2X1) 0.07 8.91 r

mul3_i/tree_adder/adder13/sum (full_adder_82) 0.00 8.91 r









mul3_i/tree_adder/adder115/ina (full_adder_11) 0.00 9.20 r

mul3_i/tree_adder/adder115/U6/Y (XNOR2X1) 0.08 9.28 r








0.00 9.50 r






0.00 9.64 r




bf_1_4/imag_in1[14] (bf_1_1) 0.00 9.64 r

bf_1_4/adder1/B[14] (vma16_7) 0.00 9.64 r

bf_1_4/adder1/ipg16/B[14] (p_g_16_7) 0.00 9.64 r

bf_1_4/adder1/ipg16/U21/Y (XOR2X1) 0.07 9.71 r

bf_1_4/adder1/ipg16/pg14[1] (p_g_16_7) 0.00 9.71 r




bf_1_4/adder1/S[14] (vma16_7) 0.00 9.75 f

bf_1_4/U120/Y (MUX2X1) 0.04 9.79 r

bf_1_4/U119/Y (INVX1) 0.04 9.84 f

bf_1_4/imag_out0[14] (bf_1_1) 0.00 9.84 f

bf_2_4/imag_in1[14] (bf_2_1) 0.00 9.84 f

bf_2_4/U257/Y (MUX2X1) 0.08 9.91 r

bf_2_4/U241/Y (INVX1) 0.09 10.00 f

bf_2_4/adder1/B[14] (vma16_3) 0.00 10.00 f

bf_2_4/adder1/ipg16/B[14] (p_g_16_3) 0.00 10.00 f




0.00 10.04 f



0.00 10.09 f

bf_2_4/adder1/ir2c15/pg[1] (partial_product_generator2_26) 0.00 10.09 f



0.00 10.13 f


0.00 10.13 f



0.00 10.16 f


0.00 10.16 f




0.00 10.23 f



bf_2_4/adder1/ixor16/U10/Y (XOR2X1) 0.06 10.29 r

bf_2_4/adder1/ixor16/S[15] (xor_16_3) 0.00 10.29 r

bf_2_4/adder1/S[15] (vma16_3) 0.00 10.29 r

bf_2_4/U218/Y (AOI22X1) 0.04 10.33 f

bf_2_4/U54/Y (BUFX2) 0.11 10.45 f

bf_2_4/U217/Y (NAND2X1) 0.08 10.53 r

bf_2_4/imag_out0[15] (bf_2_1) 0.00 10.53 r

imag_out[15] (out) 0.00 10.53 r

data arrival time 10.53

--------------------------------------------------------------------------

(Path is unconstrained)
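To see which blocks along the path contribute most of the 10.53 ns, the increments in the report can be summed per top-level instance. The following is a minimal sketch, assuming each usable row has the form "pin (reference) incr path r/f" as in the listing above. The names `ROW` and `delay_per_block` are hypothetical helpers introduced only for this illustration, and the sample rows are the last six rows of the report. Rows whose pin name was lost are skipped, and because the report rounds each increment to two decimals, the sums can differ from the path column by about 0.01 ns.

```python
import re
from collections import defaultdict

# One report row: "<pin path> (<cell or module>) <incr> <path> <r|f>"
ROW = re.compile(r"^(\S+)\s+\((\w+)\)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+([rf])$")


def delay_per_block(report_text):
    """Sum the incremental delay of each row by its top-level instance name."""
    totals = defaultdict(float)
    for line in report_text.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue  # blank lines, headers, or rows whose pin name is missing
        pin, _ref, incr, _path, _edge = m.groups()
        totals[pin.split("/")[0]] += float(incr)
    return dict(totals)


# Last rows of the critical path, copied from the report above.
sample = """\
bf_2_4/adder1/ixor16/U10/Y (XOR2X1) 0.06 10.29 r
bf_2_4/adder1/S[15] (vma16_3) 0.00 10.29 r
bf_2_4/U218/Y (AOI22X1) 0.04 10.33 f
bf_2_4/U54/Y (BUFX2) 0.11 10.45 f
bf_2_4/U217/Y (NAND2X1) 0.08 10.53 r
imag_out[15] (out) 0.00 10.53 r
"""

totals = delay_per_block(sample)
for block, delay in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{block:15s} {delay:.2f} ns")
```

Applied to the full report, the same grouping would separate the multiplier instances (mul1_i, mul2_i, mul3_i) from the butterfly instances (bf_1_x, bf_2_x).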

Results of Area

The result of silicon area is shown as follows:

****************************************

Report : area

Design : fft_1024

Version: A-2007.12

Date : Tue Dec 22 07:40:53 2009

****************************************

Library(s) Used:

gscl45nm (File: /home/class/zhan0915/fft3/gscl45nm.db)

Number of ports: 66

Number of nets: 39672

Number of cells: 38565

Number of references: 31

Combinational area: 70080.567213

Noncombinational area: 262240.141182


Net Interconnect area: undefined (No wire load specified)

Total cell area: 332320.708395

Total area: undefined

Information: This design contains black box (unknown) components. (RPT-8)
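The split between combinational and noncombinational area follows directly from the figures in the report. The short sketch below only restates that arithmetic; the variable and function names are assumptions made for illustration, and the values are in the library's area units as reported.

```python
# Values copied from the area report above (library area units).
combinational = 70080.567213
noncombinational = 262240.141182
total_cell = 332320.708395


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


comb_pct = share(combinational, total_cell)
seq_pct = share(noncombinational, total_cell)

print(f"combinational:    {comb_pct:.2f} %")
print(f"noncombinational: {seq_pct:.2f} %")
```

Under these numbers roughly four fifths of the cell area is noncombinational. The total excludes interconnect area, which the report leaves undefined because no wire load model was specified, and it does not account for the black box components flagged by the tool.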

Conclusion of the Phase2

In the second stage, a 1024-point radix-2² FFT module was designed. Synthesis results show that the FFT occupies a total cell area of 332320.708 µm². The critical path of the FFT is 10.53 ns, and its total power consumption is 45.4214 mW. Fig. 16 shows the schematic view of the FFT.

Fig. 16 Schematic View of 1024-point FFT
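The critical path sets an upper bound on the clock rate. As a rough sketch, assuming the clock period can be no shorter than the 10.53 ns data arrival time and ignoring setup time, clock skew, and the fact that the reported path is unconstrained, the bound works out as follows (variable names are illustrative only):

```python
critical_path_ns = 10.53  # data arrival time from the timing report

# f = 1 / T; with T in nanoseconds, 1e3 / T gives the frequency in MHz.
f_max_mhz = 1e3 / critical_path_ns

print(f"upper bound on clock frequency: {f_max_mhz:.2f} MHz")
```

This gives a bound of just under 95 MHz; a constrained synthesis run would be needed to confirm an achievable clock frequency.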


REFERENCES

[1] Shousheng He and Mats Torkelson, A New Approach to Pipeline FFT Processor, 15-19 April 1996, Page(s): 766-770, Digital Object Identifier 10.1109/IPPS.1996.508145.

[2] Shousheng He and Mats Torkelson, Design and Implementation of a 1024-point Pipeline FFT Processor, 11-14 May 1998, Page(s): 131-134, Digital Object Identifier 10.1109/CICC.1998.694922.

[3] S. He and M. Torkelson, A complex array multiplier using distributed arithmetic, in Proc. IEEE CICC'96, pages 71-74, San Diego, CA, May 1996.

[4] Garrido, M.; Parhi, K.; Grajal, J., A Pipelined FFT Architecture for Real-Valued Signals, Volume PP, 2009, Page(s): 1-1, Digital Object Identifier 10.1109/TCSI.2009.2017125.

[5] Kia Bazargan, University of Minnesota Class Handouts, EE 5324 - VLSI Design II, Spring 2006.

[6] Kharrat, M.W.; Ben Ayed, M.A.; Loulou, M.; Masmoudi, N.; Kamoun, L., A new method to implement a constant operand multiplier, Microelectronics, The 14th International Conference on 2002 ICM, 11-13 Dec. 2002, Page(s): 62-65.

[7] Saeeid Tahmasbi Oskuii, Per Gunnar Kjeldsberg, Oscar Gustafsson, Power Optimized Partial Product Reduction Interconnect Ordering in Parallel Multipliers.
