
VLSI DSP Project Report_1.0

Date post: 04-Apr-2018
  • 7/30/2019 VLSI DSP Project Report_1.0

    A 1024-POINT RADIX-2² AND COMPLEX FFT DESIGN USING WALLACE MULTIPLIER

    Le Cai, Student ID: 4125589

    Yuan Xu, Student ID: 4139225

    Zhe Zhang, Student ID: 4137165

    December 18th, 2009


    TABLE OF CONTENTS

    A 1024-POINT RADIX-2² AND COMPLEX FFT DESIGN USING WALLACE MULTIPLIER

    OBJECTIVE

    THE WALLACE MULTIPLIER DESIGN
        Booth recoding
        WallaceTree_Adder for Partial Product Reduction
        32-bit Brent Kung adder

    SIMULATION RESULTS

    SYNTHESIS RESULTS OF THE WALLACE MULTIPLIER
        Results of Critical Path
        Results of Power Consumption
        Results of Area
        Conclusion of Phase1

    SEVERAL OTHER FFT DESIGNS

    FFT DESIGN BASED ON RADIX-2² ALGORITHM

    RADIX-2² SDF ARCHITECTURE FOR 1024 POINTS COMPLEX FFT

    SYNTHESIS RESULTS OF THE 1024 POINTS FFT
        Results of Power Dissipation
        Results of Critical Path
        Results of Area
        Conclusion of the Phase2

    REFERENCES


    LIST OF FIGURES

    FIG 4. SIMULATION WAVEFORMS OF WALLACE MULTIPLIER

    FIG 5. R2MDC(N=16)

    FIG 7. R4SDF(N=256)

    FIG 8. R4MDC(N=256)

    FIG 9. R4SDC(N=256)

    FIG 10. BUTTERFLY WITH DECOMPOSED TWIDDLE FACTORS

    FIG 11. 1024 POINTS RADIX-2² FFT ARCHITECTURE

    FIG 12. BF2I

    FIG 13. BF2II

    FIGURE 14. BUTTERFLY ARCHITECTURE OF THE 1024 POINTS RADIX-2² FFT

    FIGURE 15. SIMULATION WAVEFORMS OF 1024 POINTS FFT

    FIGURE 16. SCHEMATIC VIEW OF 1024 POINTS FFT

    LIST OF TABLES

    TABLE 2. TWIDDLE FACTORS OF EACH WIRE


    OBJECTIVE:

    Design a 1024-point radix-2² complex FFT module in Verilog, based on a Booth-recoded Wallace multiplier. The project has two stages.

    In the first stage, implement a 16×16 multiplier based on a Wallace tree using the radix-4 Booth algorithm. The sub-blocks, designed in Verilog, are a half adder, full adder, Booth encoder, partial product generator, 32-bit Brent-Kung adder, and Wallace tree carry-save adder. A simulation is carried out to verify that the proposed multiplier functions correctly.

    In the second stage, design a 1024-point radix-2² complex FFT module built on the first stage. Following He and Torkelson's paper, the proposed 1024-point FFT processor uses a simplified cascaded radix-2² single-path delay feedback (SDF) structure. The control circuit of this architecture is simpler than that of the direct radix-4 SDF structure, and its multiplier cost is lower than that of the previous FFT structures in 1024-point applications. The pipelined 1024-point FFT processor needs only 4 complex multipliers and a 1024-complex-word data memory.


    THE WALLACE MULTIPLIER DESIGN

    Booth recoding

    module Booth recoding

    Booth multiplication is a technique that allows for smaller, faster multiplication circuits, by recoding the numbers that are multiplied.

    Reducing the Number of Partial Products

    It is possible to halve the number of partial products by using radix-4 Booth recoding. The basic idea is that, instead of shifting and adding for every column of the multiplier term and multiplying by 1 or 0, we take only every second column and multiply by ±1, ±2, or 0 to obtain the same result. So, to multiply by 7, we can multiply the partial product aligned against the least significant bit by -1, and multiply the partial product aligned with the third column by 2:

    Partial Product 0 = Multiplicand * -1, shifted left 0 bits (x -1)

    Partial Product 1 = Multiplicand * 2, shifted left 2 bits (x 8)

    This is the same result as the equivalent shift and add method:

    Partial Product 0 = Multiplicand * 1, shifted left 0 bits (x 1)

    Partial Product 1 = Multiplicand * 1, shifted left 1 bit (x 2)

    Partial Product 2 = Multiplicand * 1, shifted left 2 bits (x 4)

    Partial Product 3 = Multiplicand * 0, shifted left 3 bits (x 0)

    The advantage of this method is that it halves the number of partial products. This matters in circuit design because it affects the propagation delay of the circuit, as well as the complexity and power consumption of its implementation.

    Radix-4 Booth Recoding

    To Booth recode the multiplier term, we consider the bits in blocks of three, such that each block overlaps the previous block by one bit. Grouping starts from the LSB, and the first block uses only two bits of the multiplier (since there is no previous block to overlap):

    Fig 1.Grouping of bits from the multiplier term for Booth recoding
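    The overlapping three-bit grouping can be sketched in Python. This is a minimal illustration rather than the report's Verilog: the function names are hypothetical, and each group (y[i+1], y[i], y[i-1]) is mapped to the digit -2*y[i+1] + y[i] + y[i-1], the standard radix-4 Booth rule, which is assumed here to match the mapping in Table 1.

```python
def booth_radix4_digits(multiplier, bits=16):
    """Return the radix-4 Booth digits (least significant first) of a
    `bits`-wide two's-complement multiplier. Each digit is in {-2,-1,0,1,2}."""
    if bits % 2:
        raise ValueError("bits must be even")
    pattern = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    padded = pattern << 1                     # implicit 0 to the right of the LSB
    digits = []
    for i in range(0, bits, 2):
        group = (padded >> i) & 0b111         # bits y[i+1], y[i], y[i-1]
        hi, mid, lo = (group >> 2) & 1, (group >> 1) & 1, group & 1
        digits.append(-2 * hi + mid + lo)
    return digits


def booth_multiply(multiplicand, multiplier, bits=16):
    """Sum the partial products: digit * multiplicand, shifted 2 bits per digit."""
    return sum(d * multiplicand * 4 ** i
               for i, d in enumerate(booth_radix4_digits(multiplier, bits)))


print(booth_radix4_digits(7, bits=4))  # [-1, 2], i.e. 7 = -1 + 2*4
print(booth_multiply(1234, -567))      # -699678
```

    A 16-bit multiplier yields 8 digits, so the 16×16 multiplier has 8 partial products instead of 16.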


    Table 1. Booth recoding mapping calculation

    We generate three signals depending on the input bits: S (shift, x2), N (negative), and Z (zero). The logic expressions are

    Sign Extension Tricks

    Once the Booth-recoded partial products have been generated, they need to be shifted and added together. The problem with implementing this in hardware is that the first partial product needs to be sign extended by 6 bits, the second by four bits, and so on. This is easily achievable in hardware, but requires more logic gates than if those bits could be permanently kept constant.

    The procedure to keep them constant is:

    Invert the most significant bit (MSB) of each partial product

    Add an additional '1' to the MSB of the first partial product

    Add an additional '1' in front of each partial product

    This technique allows any sign bits to be correctly propagated without the need to sign extend all of the bits.
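    The three steps can be checked numerically with a short Python sketch. It is an illustration under stated assumptions, not the report's circuit: the function name is hypothetical, every partial product is taken to be a signed value of the same `width`, and consecutive partial products are offset by two bit positions, as in radix-4 Booth multiplication.

```python
def sum_without_sign_extension(partial_products, width):
    """Add partial_products[i] * 4**i, where each partial product is a signed
    `width`-bit value, using constant '1' bits instead of sign extension."""
    count = len(partial_products)
    out_bits = width - 1 + 2 * count          # wide enough for the exact sum
    msb = 1 << (width - 1)
    acc = msb                                 # extra '1' at the MSB of the first partial product
    for i, value in enumerate(partial_products):
        pattern = value & ((1 << width) - 1)  # width-bit two's-complement pattern
        pattern ^= msb                        # invert the MSB
        pattern |= 1 << width                 # '1' in front of the partial product
        acc += pattern << (2 * i)
    acc &= (1 << out_bits) - 1                # drop carries beyond the result width
    if acc >> (out_bits - 1):                 # reinterpret the result as signed
        acc -= 1 << out_bits
    return acc


print(sum_without_sign_extension([-3, 5, -8], width=4))  # -111 = -3 + 5*4 - 8*16
```

    The trick works because inverting the MSB of a signed value adds or subtracts a fixed constant, and the inserted '1' bits cancel the sum of those constants modulo the result width.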


    WallaceTree_Adder for Partial Product Reduction

    module Wallace tree adder for Partial product reduction Tables


    Fig 2.Wallace tree adder for Partial product reduction
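    The idea behind the partial product reduction can be sketched in Python with word-level 3:2 carry-save compression. This is a simplified model and not the wiring of Fig 2: the function names are hypothetical, and the grouping of rows into threes is one possible schedule. Each level turns three rows into a sum row and a carry row without propagating carries, until two rows remain for the final carry-propagate adder.

```python
def carry_save(a, b, c):
    """3:2 compressor on whole words: three addends in, sum and carry words out."""
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carry


def wallace_reduce(addends):
    """Compress groups of three rows level by level until only two rows remain."""
    rows = list(addends)
    while len(rows) > 2:
        next_rows = []
        for i in range(0, len(rows) - 2, 3):
            next_rows.extend(carry_save(rows[i], rows[i + 1], rows[i + 2]))
        next_rows.extend(rows[len(rows) - len(rows) % 3:])  # leftover rows pass through
        rows = next_rows
    return rows


rows = [3, 5, 7, 11, 13, 17, 19, 23]   # stand-ins for 8 partial products
final_two = wallace_reduce(rows)
print(sum(final_two) == sum(rows))     # True
```

    With 8 rows this schedule goes 8 → 6 → 4 → 3 → 2, so four compressor levels precede the final adder.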

    32-bit Brent Kung adder

    module Brent Kung adder(VMA)

    In order to build fast adders, it is necessary to organize carry propagation and generation into recursive trees.

    Here is the definition:

    Pi (propagate) means that Cout = Cin, while Gi (generate) means that Cout = 1 independent of Cin.

    Then, the Sum and Cout of a one-bit adder can be expressed as:

    A dot operator is needed to implement the recursion.

    A 16-bit Brent-Kung adder is given as an example:

    Fig 3. 16-bit Brent-Kung adder
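    The recursive carry tree can be sketched in Python. The sketch assumes the usual textbook definitions, which may differ in notation from the report's: g = a AND b, p = a XOR b, sum = p XOR carry-in, and the dot operator (G, P) ∘ (G', P') = (G OR (P AND G'), P AND P'). The function name and the `width` parameter are hypothetical.

```python
def brent_kung_add(a, b, width=32, cin=0):
    """Add two `width`-bit numbers with a Brent-Kung prefix tree.
    Returns (sum modulo 2**width, carry out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # bit propagate
    G, P = g[:], p[:]

    def dot(i, j):
        # (G[i], P[i]) <- (G[i], P[i]) o (G[j], P[j]), with j below i
        G[i] |= P[i] & G[j]
        P[i] &= P[j]

    d = 1
    while d < width:                  # up-sweep: spans of 2, 4, 8, ... bits
        for i in range(2 * d - 1, width, 2 * d):
            dot(i, i - d)
        d *= 2
    d //= 2
    while d >= 1:                     # down-sweep: fill in the remaining prefixes
        for i in range(3 * d - 1, width, 2 * d):
            dot(i, i - d)
        d //= 2

    # G[i], P[i] now cover bits i..0, so the carry into bit i+1 is:
    carries = [cin] + [G[i] | (P[i] & cin) for i in range(width)]
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, carries[width]


print(brent_kung_add(0xFFFFFFFF, 1))  # (0, 1)
```

    The up-sweep and down-sweep together use about 2·log2(width) levels of dot operators, which is the trade-off Brent-Kung makes for low wiring and cell count.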


    SIMULATION RESULTS

    The simulation is carried out with Modelsim SE 6.5c. The results are shown in Fig. 4 as follows:

    Fig 4. Simulation Waveforms of Wallace Multiplier.

    SYNTHESIS RESULTS OF THE WALLACE MULTIPLIER

    With the aid of Synopsys Design Compiler, we use the FreePDK 45 nm CMOS technology to obtain synthesis results for power dissipation, critical path length, and silicon area.


    Results of Critical Path

    The result of Critical Path is shown as follows:

    ****************************************

    Report : timing

    -path full

    -delay max

    -max_paths 1

    -sort_by group

    Design : multiplier

    Version: A-2007.12

    Date : Tue Nov 17 16:39:59 2009

    ****************************************

    Operating Conditions: typical Library: gscl45nm

    Wire Load Model Mode: top

    Startpoint: B[3] (input port)

    Endpoint: sum[31] (output port)

    Path Group: (none)

    Path Type: max

    Point Incr Path

    --------------------------------------------------------------------------

    input external delay 0.00 0.00 r

    B[3] (in) 0.00 0.00 r

    pre_coding/y[3] (booth_coding) 0.00 0.00 r

    pre_coding/U63/Y (INVX1) 0.04 0.04 f

    pre_coding/U50/Y (NAND3X1) 0.04 0.09 r

    pre_coding/U13/Y (BUFX2) 0.04 0.12 r

    pre_coding/U49/Y (OAI21X1) 0.01 0.14 f

    pre_coding/C857/Z_11 (*SELECT_OP_5.15_5.1_15) 0.00 0.14 f

    pre_coding/pp1[12] (booth_coding) 0.00 0.14 f

    tree_adder/pp1[12] (wallace_tree_adder) 0.00 0.14 f

    tree_adder/adder13/inb (full_adder_81) 0.00 0.14 f

    tree_adder/adder13/U6/Y (XNOR2X1) 0.07 0.20 r

    tree_adder/adder13/U3/Y (XOR2X1) 0.07 0.27 r

    tree_adder/adder13/sum (full_adder_81) 0.00 0.27 r

    tree_adder/adder50/ina (full_adder_53) 0.00 0.27 r


    tree_adder/adder50/U6/Y (XNOR2X1) 0.06 0.34 r

    tree_adder/adder50/U3/Y (XOR2X1) 0.07 0.41 r

    tree_adder/adder50/sum (full_adder_53) 0.00 0.41 r

    tree_adder/adder88/ina (full_adder_26) 0.00 0.41 r

    tree_adder/adder88/U6/Y (XNOR2X1) 0.06 0.47 r

    tree_adder/adder88/U3/Y (XOR2X1) 0.07 0.54 r

    tree_adder/adder88/sum (full_adder_26) 0.00 0.54 r

    tree_adder/adder115/ina (full_adder_10) 0.00 0.54 r

    tree_adder/adder115/U6/Y (XNOR2X1) 0.06 0.60 r

    tree_adder/adder115/U3/Y (XOR2X1) 0.07 0.67 r

    tree_adder/adder115/sum (full_adder_10) 0.00 0.67 r

    tree_adder/adder142/ina (full_adder_1) 0.00 0.67 r

    tree_adder/adder142/U6/Y (XNOR2X1) 0.06 0.74 r

    tree_adder/adder142/U3/Y (XOR2X1) 0.05 0.78 f

    tree_adder/adder142/sum (full_adder_1) 0.00 0.78 f

    tree_adder/VMA/A[10] (brent_kung_28bitadder) 0.00 0.78 f

    tree_adder/VMA/pg10/A (p_g_17) 0.00 0.78 f

    tree_adder/VMA/pg10/U2/Y (AND2X1) 0.04 0.82 f

    tree_adder/VMA/pg10/G (p_g_17) 0.00 0.82 f

    tree_adder/VMA/U30/Y (AND2X1) 0.06 0.88 f

    tree_adder/VMA/adder5/Gin (dot_com_18) 0.00 0.88 f

    tree_adder/VMA/adder5/U4/Y (AOI21X1) 0.05 0.93 r

    tree_adder/VMA/adder5/U1/Y (BUFX2) 0.03 0.96 r

    tree_adder/VMA/adder5/U3/Y (INVX1) 0.02 0.98 f

    tree_adder/VMA/adder5/Gout (dot_com_18) 0.00 0.98 f

    tree_adder/VMA/adder18/G (dot_com_5) 0.00 0.98 f

    tree_adder/VMA/adder18/U4/Y (AOI21X1) 0.02 0.99 r

    tree_adder/VMA/adder18/U1/Y (BUFX2) 0.03 1.03 r

    tree_adder/VMA/adder18/U3/Y (INVX1) 0.02 1.04 f

    tree_adder/VMA/adder18/Gout (dot_com_5) 0.00 1.04 f

    tree_adder/VMA/adder22/G (dot_com_1) 0.00 1.04 f

    tree_adder/VMA/adder22/U4/Y (AOI21X1) 0.02 1.06 r

    tree_adder/VMA/adder22/U1/Y (BUFX2) 0.03 1.09 r

    tree_adder/VMA/adder22/U3/Y (INVX1) 0.02 1.11 f

    tree_adder/VMA/adder22/Gout (dot_com_1) 0.00 1.11 f

    tree_adder/VMA/adder26/G (half_dot_com_22) 0.00 1.11 f

    tree_adder/VMA/adder26/U3/Y (AOI21X1) 0.02 1.12 r

    tree_adder/VMA/adder26/U1/Y (BUFX2) 0.03 1.15 r

    tree_adder/VMA/adder26/U2/Y (INVX1) 0.04 1.20 f

    tree_adder/VMA/adder26/Gout (half_dot_com_22) 0.00 1.20 f

    tree_adder/VMA/adder41/Gin (half_dot_com_8) 0.00 1.20 f


    tree_adder/VMA/adder41/U3/Y (AOI21X1) 0.04 1.24 r

    tree_adder/VMA/adder41/U1/Y (BUFX2) 0.03 1.27 r

    tree_adder/VMA/adder41/U2/Y (INVX1) 0.03 1.30 f

    tree_adder/VMA/adder41/Gout (half_dot_com_8) 0.00 1.30 f

    tree_adder/VMA/adder43/Gin (half_dot_com_6) 0.00 1.30 f

    tree_adder/VMA/adder43/U3/Y (AOI21X1) 0.04 1.34 r

    tree_adder/VMA/adder43/U1/Y (BUFX2) 0.03 1.37 r

    tree_adder/VMA/adder43/U2/Y (INVX1) 0.03 1.41 f

    tree_adder/VMA/adder43/Gout (half_dot_com_6) 0.00 1.41 f

    tree_adder/VMA/adder45/Gin (half_dot_com_4) 0.00 1.41 f

    tree_adder/VMA/adder45/U3/Y (AOI21X1) 0.04 1.44 r

    tree_adder/VMA/adder45/U1/Y (BUFX2) 0.03 1.47 r

    tree_adder/VMA/adder45/U2/Y (INVX1) 0.03 1.51 f

    tree_adder/VMA/adder45/Gout (half_dot_com_4) 0.00 1.51 f

    tree_adder/VMA/adder47/Gin (half_dot_com_2) 0.00 1.51 f

    tree_adder/VMA/adder47/U3/Y (AOI21X1) 0.04 1.54 r

    tree_adder/VMA/adder47/U1/Y (BUFX2) 0.03 1.58 r

    tree_adder/VMA/adder47/U2/Y (INVX1) 0.03 1.60 f

    tree_adder/VMA/adder47/Gout (half_dot_com_2) 0.00 1.60 f

    tree_adder/VMA/adder48/Gin (half_dot_com_1) 0.00 1.60 f

    tree_adder/VMA/adder48/U3/Y (AOI21X1) 0.03 1.64 r

    tree_adder/VMA/adder48/U1/Y (BUFX2) 0.03 1.67 r

    tree_adder/VMA/adder48/U2/Y (INVX1) 0.02 1.69 f

    tree_adder/VMA/adder48/Gout (half_dot_com_1) 0.00 1.69 f

    tree_adder/VMA/U9/Y (XOR2X1) 0.04 1.73 r

    tree_adder/VMA/sum[27] (brent_kung_28bitadder) 0.00 1.73 r

    tree_adder/sum[31] (wallace_tree_adder) 0.00 1.73 r

    sum[31] (out) 0.00 1.73 r

    data arrival time 1.73

    --------------------------------------------------------------------------

    (Path is unconstrained)


    Results of Power Consumption

    The result of power consumption is shown as follows:

    ****************************************

    Report : power

    -analysis_effort low

    Design : multiplier

    Version: A-2007.12

    Date : Tue Nov 17 16:39:24 2009

    ****************************************

    Library(s) Used:

    gscl45nm (File: /home/class/zhan0915/project/gscl45nm.db)

    Operating Conditions: typical Library: gscl45nm

    Wire Load Model Mode: top

    Global Operating Voltage = 1.1

    Power-specific unit information :

    Voltage Units = 1V

    Capacitance Units = 1.000000pf

    Time Units = 1ns

    Dynamic Power Units = 1mW (derived from V,C,T units)

    Leakage Power Units = 1nW

    Cell Internal Power = 1.7201 mW (57%)

    Net Switching Power = 1.2722 mW (43%)

    ---------

    Total Dynamic Power = 2.9922 mW (100%)

    Cell Leakage Power = 18.9980 uW


    Results of Area

    The result of silicon area is shown as follows:

    ****************************************

    Report : area

    Design : multiplier

    Version: A-2007.12

    Date : Thu Nov 17 16:47:24 2009

    ****************************************

    Library(s) Used:

    gscl45nm (File: /home/grads/zhan0884/FreePDK45/osu_soc/lib/files/gscl45nm.db)

    Number of ports: 70

    Number of nets: 486

    Number of cells: 135

    Number of references: 135

    Combinational area: 4213.5213404

    Noncombinational area: 0.000000

    Net Interconnect area: undefined (No wire load specified)

    Total cell area: 4213.5213404

    Total area: undefined

    Conclusion of Phase1

    In Phase 1 of the project, a 16×16 multiplier is designed based on a Wallace tree using the radix-4 Booth algorithm. Synthesis results show that the multiplier occupies a silicon area of 4213.521340 µm², has a critical path of 1.73 ns, and consumes 2.9922 mW of total dynamic power.


    SEVERAL OTHER FFT DESIGNS

    Before going into the details of the new approach, this section gives a brief review of previous pipeline FFT processor architectures. The different approaches are drawn as functional blocks with unified terminology, where the additive butterfly has been separated from the multiplier to show the hardware requirements distinctly. The control and twiddle-factor reading mechanisms have been omitted for clarity.

    R2MDC

    The Radix-2 Multi-path Delay Commutator (R2MDC) is probably the most straightforward approach to a pipeline implementation of the radix-2 FFT algorithm. The input sequence is broken into two parallel data streams flowing forward, with the correct distance between the data elements entering the butterfly scheduled by proper delays. Both butterflies and multipliers are at 50% utilization. It requires (log2 N - 2) multipliers, log2 N radix-2 butterflies, and (3N/2 - 2) registers (delay elements).

    Fig 5. R2MDC(N=16)

    R2SDF

    The Radix-2 Single-path Delay Feedback (R2SDF) architecture uses the registers more efficiently by storing the butterfly output in feedback shift registers. A single data stream goes through the multiplier at every stage. It has the same number of butterfly units and multipliers as the R2MDC approach, but a much smaller memory requirement: (N - 1) registers, which is minimal.

    Fig 6. R2SDF(N=16)


    R4SDF

    The Radix-4 Single-path Delay Feedback (R4SDF) architecture was proposed as a radix-4 version of R2SDF, employing COordinate Rotation DIgital Computer (CORDIC) iterations. The utilization of the multipliers is increased to 75% by storing 3 of the 4 radix-4 butterfly outputs. However, the utilization of the radix-4 butterfly, which is fairly complicated and contains at least 8 complex adders, drops to only 25%. It requires (log4 N - 1) multipliers, log4 N full radix-4 butterflies, and storage of size (N - 1).

    Fig 7. R4SDF(N=256)

    R4MDC

    The Radix-4 Multi-path Delay Commutator (R4MDC) is a radix-4 version of R2MDC. It was used as the architecture for the initial VLSI implementation of a pipeline FFT processor and for massive wafer-scale integration. However, it suffers from a low 25% utilization of all components, which can be compensated only in special applications where four FFTs are processed simultaneously. It requires 3 log4 N multipliers, log4 N full radix-4 butterflies, and (5N/2 - 4) registers.

    Fig 8. R4MDC(N=256)

    R4SDC

    The Radix-4 Single-path Delay Commutator (R4SDC) uses a modified radix-4 algorithm with programmable 1/4 radix-4 butterflies to achieve a higher, 75% utilization of the multipliers. A combined delay-commutator also reduces the memory requirement to (2N - 2) from the (5N/2 - 1) of R4MDC. The butterfly and delay-commutator become relatively complicated due to the programmability requirement. R4SDC has been used recently in building the largest ever single-chip pipeline FFT processor for HDTV applications.

    Fig 9. R4SDC(N=256)
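    To compare the architectures at the size used in this project, the hardware counts quoted above can be evaluated for N = 1024. The Python sketch below only restates the formulas from the text. The function name is hypothetical, and entries that the text does not give (the R4SDC multiplier and butterfly counts) are left as None rather than guessed.

```python
def pipeline_fft_resources(n_points):
    """Multiplier, butterfly and register counts quoted in the review above.
    None marks a figure that the text does not state."""
    log2n = n_points.bit_length() - 1
    if n_points != 1 << log2n or log2n % 2:
        raise ValueError("n_points must be a power of 4")
    log4n = log2n // 2
    n = n_points
    return {
        "R2MDC": {"multipliers": log2n - 2, "butterflies": log2n,
                  "registers": 3 * n // 2 - 2},
        "R2SDF": {"multipliers": log2n - 2, "butterflies": log2n,
                  "registers": n - 1},
        "R4SDF": {"multipliers": log4n - 1, "butterflies": log4n,
                  "registers": n - 1},
        "R4MDC": {"multipliers": 3 * log4n, "butterflies": log4n,
                  "registers": 5 * n // 2 - 4},
        "R4SDC": {"multipliers": None, "butterflies": None,
                  "registers": 2 * n - 2},
    }


for name, counts in pipeline_fft_resources(1024).items():
    print(name, counts)
```

    For N = 1024 this gives 8 multipliers for the radix-2 architectures and 4 for R4SDF, with 1023 registers for both single-path delay feedback designs.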

    FFT DESIGN BASED ON RADIX-2² ALGORITHM

    In this section, we derive the hardware-oriented radix-2² algorithm for FFT implementation. A 16-point radix-2² example is given, and finally the detailed butterfly trellis is plotted to guide the subsequent hardware design.

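    The DFT of size N is defined by

    \[ X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad 0 \le k < N \]

    Applying the index map \( n = \langle \tfrac{N}{2} n_1 + \tfrac{N}{4} n_2 + n_3 \rangle_N \), \( k = \langle k_1 + 2k_2 + 4k_3 \rangle_N \), with \( n_1, n_2, k_1, k_2 \in \{0, 1\} \) and \( 0 \le n_3, k_3 < N/4 \), the DFT becomes

    \[
    \begin{aligned}
    X(k_1 + 2k_2 + 4k_3)
    &= \sum_{n_3=0}^{N/4-1} \sum_{n_2=0}^{1} \sum_{n_1=0}^{1}
       x\!\left(\tfrac{N}{2} n_1 + \tfrac{N}{4} n_2 + n_3\right)
       W_N^{\left(\frac{N}{2} n_1 + \frac{N}{4} n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)} \\
    &= \sum_{n_3=0}^{N/4-1} \sum_{n_2=0}^{1}
       \left\{ B_{N/2}^{k_1}\!\left(\tfrac{N}{4} n_2 + n_3\right)
       W_N^{\left(\frac{N}{4} n_2 + n_3\right) k_1} \right\}
       W_N^{\left(\frac{N}{4} n_2 + n_3\right)(2k_2 + 4k_3)} \\
    &= \sum_{n_3=0}^{N/4-1} \sum_{n_2=0}^{1}
       B_{N/2}^{k_1}\!\left(\tfrac{N}{4} n_2 + n_3\right)
       W_N^{\left(\frac{N}{4} n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)}
    \end{aligned}
    \]

    where

    \[ B_{N/2}^{k_1}\!\left(\tfrac{N}{4} n_2 + n_3\right)
       = x\!\left(\tfrac{N}{4} n_2 + n_3\right)
       + (-1)^{k_1}\, x\!\left(\tfrac{N}{4} n_2 + n_3 + \tfrac{N}{2}\right) \]

    Decomposing the composite twiddle factor gives

    \[
    \begin{aligned}
    W_N^{\left(\frac{N}{4} n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)}
    &= W_N^{N n_2 k_3}\; W_N^{\frac{N}{4} n_2 (k_1 + 2k_2)}\; W_N^{n_3 (k_1 + 2k_2)}\; W_N^{4 n_3 k_3} \\
    &= (-j)^{n_2 (k_1 + 2k_2)}\; W_N^{n_3 (k_1 + 2k_2)}\; W_N^{4 n_3 k_3}
    \end{aligned}
    \]

    Substituting this into the sum above and expanding the summation over n2 yields the equation for the radix-2² FFT algorithm:

    \[ X(k_1 + 2k_2 + 4k_3) = \sum_{n_3=0}^{N/4-1}
       \left[ H(k_1, k_2, n_3)\, W_N^{n_3 (k_1 + 2k_2)} \right] W_{N/4}^{n_3 k_3} \]

    \[ H(k_1, k_2, n_3) =
       \underbrace{
       \overbrace{\left[ x(n_3) + (-1)^{k_1} x\!\left(n_3 + \tfrac{N}{2}\right) \right]}^{\text{BF I}}
       + (-j)^{(k_1 + 2k_2)}
       \overbrace{\left[ x\!\left(n_3 + \tfrac{N}{4}\right) + (-1)^{k_1} x\!\left(n_3 + \tfrac{3}{4}N\right) \right]}^{\text{BF I}}
       }_{\text{BF II}} \]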

    This equation represents the first two stages of butterflies with only trivial multiplications in the signal flow graph (SFG), shown as BF I and BF II in Fig 10. After these two stages, full multipliers are required to compute the product with the decomposed twiddle factor \( W_N^{n_3 (k_1 + 2k_2)} \) in the expression for X(k1+2k2+4k3), as shown in Fig 10. Note that the order of the twiddle factors differs from that of the radix-4 algorithm.


    Fig 10. Butterfly with decomposed twiddle factors.
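    The decomposition can be checked numerically against the DFT definition. The Python sketch below is illustrative only: it evaluates BF I, BF II, the twiddle factor W_N^{n3(k1+2k2)} and the remaining length-N/4 DFT directly from the formulas, without any pipelining, and the function names are hypothetical.

```python
import cmath


def w(size, exponent):
    """Twiddle factor W_size^exponent = exp(-j*2*pi*exponent/size)."""
    return cmath.exp(-2j * cmath.pi * exponent / size)


def dft(x):
    """Direct evaluation of the DFT definition."""
    n = len(x)
    return [sum(x[i] * w(n, i * k) for i in range(n)) for k in range(n)]


def radix22_dft(x):
    """One radix-2^2 decomposition step: BF I, BF II, twiddle, N/4-point DFT."""
    n = len(x)
    q = n // 4
    out = [0j] * n
    for k1 in range(2):
        for k2 in range(2):
            for k3 in range(q):
                acc = 0j
                for n3 in range(q):
                    bf1_a = x[n3] + (-1) ** k1 * x[n3 + 2 * q]      # BF I
                    bf1_b = x[n3 + q] + (-1) ** k1 * x[n3 + 3 * q]  # BF I
                    h = bf1_a + (-1j) ** (k1 + 2 * k2) * bf1_b      # BF II
                    acc += h * w(n, n3 * (k1 + 2 * k2)) * w(q, n3 * k3)
                out[k1 + 2 * k2 + 4 * k3] = acc
    return out


x = [complex(i % 5, -(i % 3)) for i in range(16)]
error = max(abs(a - b) for a, b in zip(radix22_dft(x), dft(x)))
print(error < 1e-9)  # True
```

    Only the factor w(n, n3 * (k1 + 2 * k2)) needs a full complex multiplier. The (-1) and (-j) factors amount to sign changes and real/imaginary swaps.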

    The radix-2² algorithm has the same multiplicative complexity as radix-4 algorithms, but retains the radix-2 butterfly structure. The multiplications are arranged so that only every other stage has non-trivial multiplications. This is a great structural advantage over other algorithms when a pipeline/cascade FFT architecture is under consideration.

    RADIX-2² SDF ARCHITECTURE FOR 1024 POINTS COMPLEX FFT

    Fig. 11 outlines an implementation of the R2²SDF architecture for N = 1024. Note the similarity of the data path to R2SDF and the reduced number of multipliers. The implementation uses two types of butterflies: one identical to that in R2SDF, and one that also contains the logic to implement the trivial twiddle factor multiplication, as shown in Figs. 12 and 13 respectively. Due to the spatial regularity of the radix-2² algorithm, the synchronization control of the processor is very simple. A (log2 N)-bit binary counter serves two purposes: synchronization controller and address counter for twiddle factor reading in each stage.


    Fig 11. 1024 points Radix-2² FFT architecture.

    Fig 12. BF2I.

    Fig 13. BF2II.

    [Figure 14 (signal flow graph): inputs x(0) ... x(1023) pass through butterfly stages BF 1 to BF 10 and emerge as outputs X(0) ... X(1023). Consecutive butterfly stages are connected by networks N(0) to N(8), whose wires are labelled 0-0 ... 0-1023 through 8-0 ... 8-1023.]

    Figure 14. Butterfly architecture of the 1024 points radix-2² FFT.

    The 1024-point FFT using the R2²SDF architecture is shown in Fig. 14. It includes 5 stages, each containing two butterflies, named BF1 and BF2. We describe the connection between two consecutive butterflies as a network, so there are 9 networks, named N(0) to N(8), in Fig. 14. Each network contains 1024 wires, which we label m-n, where m is the network and n is the wire within it. For instance, the 128th wire in network 7 is labelled 7-128. We can then use the formulation given before to calculate the twiddle factor of each wire. The twiddle factor values are specified in Table 2:


    Table 2. Twiddle factors of each wire.

    Network   Wire number                                      Twiddle factor
    N(0)      0-m, 768 ≤ m ≤ 1023                              -j
              0-m, else                                        1
    N(2)      0-m, 192+256n ≤ m ≤ 255+256n, (0 ≤ n ≤ 3)        -j
              0-m, else                                        1
    N(4)      0-m, 48+64n ≤ m ≤ 63+64n, (0 ≤ n ≤ 15)           -j
              0-m, else                                        1
    N(6)      0-m, 12+16n ≤ m ≤ 15+16n, (0 ≤ n ≤ 63)           -j
              0-m, else                                        1
    N(8)      0-m, m = 3+4n, (0 ≤ n ≤ 255)                     -j
              0-m, else                                        1
    N(1)      0-m, 768 ≤ m ≤ 1023                              W^(3(m-768))
              0-m, 512 ≤ m ≤ 767                               W^(m-512)
              0-m, 256 ≤ m ≤ 511                               W^(2(m-256))
              0-m, else                                        1
    N(3)      0-m, 192+256n ≤ m ≤ 255+256n, (0 ≤ n ≤ 3)        W^(3(m-192-256n))
              0-m, 128+256n ≤ m ≤ 191+256n, (0 ≤ n ≤ 3)        W^(m-128-256n)
              0-m, 64+256n ≤ m ≤ 127+256n, (0 ≤ n ≤ 3)         W^(2(m-64-256n))
              0-m, else                                        1
    N(5)      0-m, 48+64n ≤ m ≤ 63+64n, (0 ≤ n ≤ 15)           W^(3(m-48-64n))
              0-m, 32+64n ≤ m ≤ 47+64n, (0 ≤ n ≤ 15)           W^(m-32-64n)
              0-m, 16+64n ≤ m ≤ 31+64n, (0 ≤ n ≤ 15)           W^(2(m-16-64n))
              0-m, else                                        1
    N(7)      0-m, 12+16n ≤ m ≤ 15+16n, (0 ≤ n ≤ 63)           W^(3(m-12-16n))
              0-m, 8+16n ≤ m ≤ 11+16n, (0 ≤ n ≤ 63)            W^(m-8-16n)
              0-m, 4+16n ≤ m ≤ 7+16n, (0 ≤ n ≤ 63)             W^(2(m-4-16n))
              0-m, else                                        1
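    The rows of Table 2 follow one repeating pattern, which the Python sketch below captures as a lookup. It is an illustration with a hypothetical function name. It assumes that every network repeats in blocks of 1024, 256, 64, 16 or 4 wires split into four quarters, and that the 256 ≤ m ≤ 511 row of N(1) follows the same W^(2·offset) pattern as the corresponding rows of N(3), N(5) and N(7).

```python
def table2_twiddle(network, m, n_points=1024):
    """Return the Table 2 entry for wire `m` of network N(network).

    Even networks give "-j" or "1". Odd networks give "1" or a pair
    (multiplier, offset), meaning the twiddle factor W^(multiplier * offset).
    """
    if not (0 <= network <= 8 and 0 <= m < n_points):
        raise ValueError("network must be 0..8 and m must be a valid wire index")
    stage = network // 2
    block = n_points // 4 ** stage          # repeating block: 1024, 256, 64, 16, 4 wires
    quarter, offset = divmod(m % block, block // 4)
    if network % 2 == 0:                    # trivial multiplication after the first butterfly
        return "-j" if quarter == 3 else "1"
    multiplier = {0: 0, 1: 2, 2: 1, 3: 3}[quarter]
    return "1" if multiplier == 0 else (multiplier, offset)


print(table2_twiddle(0, 800))   # -j
print(table2_twiddle(1, 1023))  # (3, 255), i.e. W^(3*(1023-768))
print(table2_twiddle(7, 9))     # (1, 1), i.e. W^(9-8)
```

    Expressing the table this way shows why a single binary counter is enough for control: the quarter and offset are just bit fields of the wire index.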


    SYNTHESIS RESULTS OF THE 1024 POINTS FFT

    With the aid of Synopsys Design Compiler, we use the FreePDK 45 nm CMOS technology to obtain synthesis results for power dissipation, critical path length, and silicon area.

    Figure 15. Simulation Waveforms of 1024 points FFT


    Results of Power Dissipation

    The result of power consumption is shown as follows:

    ****************************************

    Report : power

    -analysis_effort low

    Design : fft_1024

    Version: A-2007.12

    Date : Tue Dec 22 07:40:01 2009

    ****************************************

    Library(s) Used:

    gscl45nm (File: /home/class/zhan0915/fft3/gscl45nm.db)

    Operating Conditions: typical Library: gscl45nm

    Wire Load Model Mode: top

    Global Operating Voltage = 1.1

    Power-specific unit information :

    Voltage Units = 1V

    Capacitance Units = 1.000000pf

    Time Units = 1ns

    Dynamic Power Units = 1mW (derived from V,C,T units)

    Leakage Power Units = 1nW

    Cell Internal Power = 37.6966 mW (83%)

    Net Switching Power = 7.7248 mW (17%)

    ---------

    Total Dynamic Power = 45.4214 mW (100%)

    Cell Leakage Power = 2.2162 mW


    Results of Critical Path

    The length of the critical path is reported as follows:

    ****************************************

    Report : timing

    -path full

    -delay max

    -max_paths 1

    -sort_by group

    Design : fft_1024

    Version: A-2007.12

    Date : Tue Dec 22 07:41:08 2009

    ****************************************

    # A fanout number of 1000 was used for high fanout net computations.

    Operating Conditions: typical Library: gscl45nm

    Wire Load Model Mode: top

    Startpoint: counter/q_reg[8]

    (rising edge-triggered flip-flop)

    Endpoint: imag_out[15]

    (output port)

    Path Group: (none)

    Path Type: max

    Point Incr Path

    --------------------------------------------------------------------------

    counter/q_reg[8]/CLK (DFFPOSX1) 0.00 # 0.00 r

    counter/q_reg[8]/Q (DFFPOSX1) 0.36 0.36 r

    counter/q[8] (ctr) 0.00 0.36 r

    bf_2_0/s (bf_2_0) 0.00 0.36 r

    bf_2_0/U279/Y (INVX1) 0.21 0.57 f

    bf_2_0/U101/Y (OR2X1) 0.07 0.64 f

    bf_2_0/U102/Y (INVX1) 0.74 1.38 r

    bf_2_0/U278/Y (MUX2X1) 0.20 1.58 f

    bf_2_0/U262/Y (INVX1) 0.10 1.67 r

    bf_2_0/adder1/B[0] (vma16_35) 0.00 1.67 r

    bf_2_0/adder1/ipg16/B[0] (p_g_16_35) 0.00 1.67 r


    bf_2_0/adder1/ipg16/U31/Y (XOR2X1) 0.04 1.72 f

    bf_2_0/adder1/ipg16/pg0[1] (p_g_16_35) 0.00 1.72 f

    bf_2_0/adder1/ir1c1/pg[1] (partial_product_generator1_560) 0.00 1.72 f

    bf_2_0/adder1/ir1c1/U2/Y (AOI21X1) 0.04 1.76 r

    bf_2_0/adder1/ir1c1/U1/Y (INVX1) 0.04 1.79 f

    bf_2_0/adder1/ir1c1/pgo (partial_product_generator1_560) 0.00 1.79 f

    bf_2_0/adder1/ir2c3/pg0 (partial_product_generator1_559) 0.00 1.79 f

    bf_2_0/adder1/ir2c3/U2/Y (AOI21X1) 0.04 1.83 r

    bf_2_0/adder1/ir2c3/U1/Y (INVX1) 0.04 1.87 f

    bf_2_0/adder1/ir2c3/pgo (partial_product_generator1_559) 0.00 1.87 f

    bf_2_0/adder1/ir3c7/pg0 (partial_product_generator1_558) 0.00 1.87 f

    bf_2_0/adder1/ir3c7/U2/Y (AOI21X1) 0.04 1.91 r

    bf_2_0/adder1/ir3c7/U1/Y (INVX1) 0.05 1.96 f

    bf_2_0/adder1/ir3c7/pgo (partial_product_generator1_558) 0.00 1.96 f

    bf_2_0/adder1/ixor16/A[7] (xor_16_35) 0.00 1.96 f

    bf_2_0/adder1/ixor16/U3/Y (XOR2X1) 0.04 2.00 f

    bf_2_0/adder1/ixor16/S[7] (xor_16_35) 0.00 2.00 f

    bf_2_0/adder1/S[7] (vma16_35) 0.00 2.00 f

    bf_2_0/U233/Y (AOI22X1) 0.05 2.05 r

    bf_2_0/U21/Y (BUFX2) 0.05 2.10 r

    bf_2_0/U9/Y (AND2X1) 0.03 2.13 r

    bf_2_0/U89/Y (INVX1) 0.03 2.15 f

    bf_2_0/imag_out0[7] (bf_2_0) 0.00 2.15 f

    mul0_i/A[7] (multiplier_7) 0.00 2.15 f

    mul0_i/pre_coding/x[7] (booth_coding_7) 0.00 2.15 f

    mul0_i/pre_coding/U240/Y (INVX1) 0.25 2.41 r

    mul0_i/pre_coding/U663/Y (MUX2X1) 0.10 2.50 f

    mul0_i/pre_coding/U662/Y (OAI21X1) 0.05 2.55 r

    mul0_i/pre_coding/pp0[7] (booth_coding_7) 0.00 2.55 r

    mul0_i/tree_adder/pp0[7] (wallace_tree_adder_7) 0.00 2.55 r

    mul0_i/tree_adder/adder6/ina (full_adder_647) 0.00 2.55 r

    mul0_i/tree_adder/adder6/U6/Y (XNOR2X1) 0.07 2.63 r

    mul0_i/tree_adder/adder6/U3/Y (XOR2X1) 0.07 2.70 r

    mul0_i/tree_adder/adder6/sum (full_adder_647) 0.00 2.70 r

    mul0_i/tree_adder/adder43/ina (full_adder_619) 0.00 2.70 r

    mul0_i/tree_adder/adder43/U6/Y (XNOR2X1) 0.07 2.77 r

    mul0_i/tree_adder/adder43/U3/Y (XOR2X1) 0.07 2.85 r

    mul0_i/tree_adder/adder43/sum (full_adder_619) 0.00 2.85 r

    mul0_i/tree_adder/adder81/ina (full_adder_591) 0.00 2.85 r

    mul0_i/tree_adder/adder81/U6/Y (XNOR2X1) 0.08 2.92 r

    mul0_i/tree_adder/adder81/U3/Y (XOR2X1) 0.07 3.00 r

    mul0_i/tree_adder/adder81/sum (full_adder_591) 0.00 3.00 r

    mul0_i/tree_adder/adder108/ina (half_adder_441) 0.00 3.00 r

    mul0_i/tree_adder/adder108/U1/Y (XOR2X1) 0.07 3.07 r

    mul0_i/tree_adder/adder108/sum (half_adder_441) 0.00 3.07 r

    mul0_i/tree_adder/adder135/ina (half_adder_425) 0.00 3.07 r

    mul0_i/tree_adder/adder135/U1/Y (XOR2X1) 0.07 3.14 r

    mul0_i/tree_adder/adder135/sum (half_adder_425) 0.00 3.14 r

    mul0_i/tree_adder/VMA/A[3] (brent_kung_28bitadder_7)

    0.00 3.14 r

    mul0_i/tree_adder/VMA/pg3/A (p_g_195) 0.00 3.14 r

    mul0_i/tree_adder/VMA/pg3/U1/Y (XOR2X1) 0.08 3.22 r

    mul0_i/tree_adder/VMA/pg3/P (p_g_195) 0.00 3.22 r

    mul0_i/tree_adder/VMA/adder1/P (dot_com_167) 0.00 3.22 r

    mul0_i/tree_adder/VMA/adder1/U1/Y (AND2X1) 0.04 3.26 r

    mul0_i/tree_adder/VMA/adder1/Pout (dot_com_167) 0.00 3.26 r

    mul0_i/tree_adder/VMA/adder20/P (half_dot_com_182) 0.00 3.26 r

    mul0_i/tree_adder/VMA/adder20/U2/Y (AOI21X1) 0.02 3.28 f

    mul0_i/tree_adder/VMA/adder20/U1/Y (INVX1) 0.05 3.33 r

    mul0_i/tree_adder/VMA/adder20/Gout (half_dot_com_182)

    0.00 3.33 r

    mul0_i/tree_adder/VMA/U6/Y (XOR2X1) 0.08 3.42 r

    mul0_i/tree_adder/VMA/sum[4] (brent_kung_28bitadder_7)

    0.00 3.42 r

    mul0_i/tree_adder/sum[8] (wallace_tree_adder_7) 0.00 3.42 r

    mul0_i/sum[8] (multiplier_7) 0.00 3.42 r

    bf_1_1/imag_in1[8] (bf_1_4) 0.00 3.42 r

    bf_1_1/adder1/B[8] (vma16_31) 0.00 3.42 r

    bf_1_1/adder1/ipg16/B[8] (p_g_16_31) 0.00 3.42 r

    bf_1_1/adder1/ipg16/U3/Y (XOR2X1) 0.06 3.47 f

    bf_1_1/adder1/ipg16/pg8[1] (p_g_16_31) 0.00 3.47 f

    bf_1_1/adder1/ir1c9/pg[1] (partial_product_generator2_338)

    0.00 3.47 f

    bf_1_1/adder1/ir1c9/U3/Y (AOI21X1) 0.04 3.51 r

    bf_1_1/adder1/ir1c9/U2/Y (INVX1) 0.03 3.53 f

    bf_1_1/adder1/ir1c9/pgo[0] (partial_product_generator2_338)

    0.00 3.53 f

    bf_1_1/adder1/ir5c9/pg[0] (partial_product_generator1_490)

    0.00 3.53 f

    bf_1_1/adder1/ir5c9/U2/Y (AOI21X1) 0.02 3.55 r

    bf_1_1/adder1/ir5c9/U1/Y (INVX1) 0.03 3.58 f

    bf_1_1/adder1/ir5c9/pgo (partial_product_generator1_490)

    0.00 3.58 f

    bf_1_1/adder1/ir6c10/pg0 (partial_product_generator1_486)

    0.00 3.58 f

    bf_1_1/adder1/ir6c10/U2/Y (AOI21X1) 0.03 3.62 r

    bf_1_1/adder1/ir6c10/U1/Y (INVX1) 0.03 3.64 f

    bf_1_1/adder1/ir6c10/pgo (partial_product_generator1_486)

    0.00 3.64 f

    bf_1_1/adder1/ixor16/A[10] (xor_16_31) 0.00 3.64 f

    bf_1_1/adder1/ixor16/U15/Y (XOR2X1) 0.03 3.68 f

    bf_1_1/adder1/ixor16/S[10] (xor_16_31) 0.00 3.68 f

    bf_1_1/adder1/S[10] (vma16_31) 0.00 3.68 f

    bf_1_1/U128/Y (MUX2X1) 0.04 3.71 r

    bf_1_1/U127/Y (INVX1) 0.04 3.76 f

    bf_1_1/imag_out0[10] (bf_1_4) 0.00 3.76 f

    bf_2_1/imag_in1[10] (bf_2_4) 0.00 3.76 f

    bf_2_1/U277/Y (MUX2X1) 0.08 3.84 r

    bf_2_1/U261/Y (INVX1) 0.09 3.92 f

    bf_2_1/adder1/B[10] (vma16_27) 0.00 3.92 f

    bf_2_1/adder1/ipg16/B[10] (p_g_16_27) 0.00 3.92 f

    bf_2_1/adder1/ipg16/U29/Y (XOR2X1) 0.04 3.97 f

    bf_2_1/adder1/ipg16/pg10[1] (p_g_16_27) 0.00 3.97 f

    bf_2_1/adder1/ir1c11/pg[1] (partial_product_generator2_295)

    0.00 3.97 f

    bf_2_1/adder1/ir1c11/U1/Y (AND2X1) 0.04 4.01 f

    bf_2_1/adder1/ir1c11/pgo[1] (partial_product_generator2_295)

    0.00 4.01 f

    bf_2_1/adder1/ir2c11/pg[1] (partial_product_generator2_289)

    0.00 4.01 f

    bf_2_1/adder1/ir2c11/U1/Y (AND2X1) 0.04 4.05 f

    bf_2_1/adder1/ir2c11/pgo[1] (partial_product_generator2_289)

    0.00 4.05 f

    bf_2_1/adder1/ir6c11/pg[1] (partial_product_generator1_428)

    0.00 4.05 f

    bf_2_1/adder1/ir6c11/U2/Y (AOI21X1) 0.04 4.09 r

    bf_2_1/adder1/ir6c11/U1/Y (INVX1) 0.04 4.12 f

    bf_2_1/adder1/ir6c11/pgo (partial_product_generator1_428)

    0.00 4.12 f

    bf_2_1/adder1/ixor16/A[11] (xor_16_27) 0.00 4.12 f

    bf_2_1/adder1/ixor16/U14/Y (XOR2X1) 0.04 4.16 f

    bf_2_1/adder1/ixor16/S[11] (xor_16_27) 0.00 4.16 f

    bf_2_1/adder1/S[11] (vma16_27) 0.00 4.16 f

    bf_2_1/U244/Y (AOI22X1) 0.05 4.21 r

    bf_2_1/U21/Y (BUFX2) 0.05 4.26 r

    bf_2_1/U4/Y (AND2X1) 0.03 4.29 r

    bf_2_1/U86/Y (INVX1) 0.03 4.31 f

    bf_2_1/imag_out0[11] (bf_2_4) 0.00 4.31 f

    mul1_i/A[11] (multiplier_5) 0.00 4.31 f

    mul1_i/pre_coding/x[11] (booth_coding_5) 0.00 4.31 f

    mul1_i/pre_coding/U247/Y (INVX1) 0.25 4.57 r

    mul1_i/pre_coding/U686/Y (MUX2X1) 0.10 4.66 f

    mul1_i/pre_coding/U685/Y (OAI21X1) 0.05 4.71 r

    mul1_i/pre_coding/pp0[11] (booth_coding_5) 0.00 4.71 r

    mul1_i/tree_adder/pp0[11] (wallace_tree_adder_5) 0.00 4.71 r

    mul1_i/tree_adder/adder10/ina (full_adder_457) 0.00 4.71 r

    mul1_i/tree_adder/adder10/U6/Y (XNOR2X1) 0.07 4.79 r

    mul1_i/tree_adder/adder10/U3/Y (XOR2X1) 0.07 4.86 r

    mul1_i/tree_adder/adder10/sum (full_adder_457) 0.00 4.86 r

    mul1_i/tree_adder/adder47/ina (full_adder_429) 0.00 4.86 r

    mul1_i/tree_adder/adder47/U6/Y (XNOR2X1) 0.07 4.94 r

    mul1_i/tree_adder/adder47/U3/Y (XOR2X1) 0.07 5.01 r

    mul1_i/tree_adder/adder47/sum (full_adder_429) 0.00 5.01 r

    mul1_i/tree_adder/adder85/ina (full_adder_402) 0.00 5.01 r

    mul1_i/tree_adder/adder85/U6/Y (XNOR2X1) 0.08 5.08 r

    mul1_i/tree_adder/adder85/U3/Y (XOR2X1) 0.07 5.16 r

    mul1_i/tree_adder/adder85/sum (full_adder_402) 0.00 5.16 r

    mul1_i/tree_adder/adder112/ina (full_adder_385) 0.00 5.16 r

    mul1_i/tree_adder/adder112/U6/Y (XNOR2X1) 0.08 5.23 r

    mul1_i/tree_adder/adder112/U3/Y (XOR2X1) 0.07 5.31 r

    mul1_i/tree_adder/adder112/sum (full_adder_385) 0.00 5.31 r

    mul1_i/tree_adder/adder139/ina (half_adder_287) 0.00 5.31 r

    mul1_i/tree_adder/adder139/U1/Y (XOR2X1) 0.07 5.38 r

    mul1_i/tree_adder/adder139/sum (half_adder_287) 0.00 5.38 r

    mul1_i/tree_adder/VMA/A[7] (brent_kung_28bitadder_5)

    0.00 5.38 r

    mul1_i/tree_adder/VMA/pg7/A (p_g_135) 0.00 5.38 r

    mul1_i/tree_adder/VMA/pg7/U1/Y (XOR2X1) 0.05 5.43 f

    mul1_i/tree_adder/VMA/pg7/P (p_g_135) 0.00 5.43 f

    mul1_i/tree_adder/VMA/adder3/P (dot_com_117) 0.00 5.43 f

    mul1_i/tree_adder/VMA/adder3/U1/Y (AND2X1) 0.05 5.49 f

    mul1_i/tree_adder/VMA/adder3/Pout (dot_com_117) 0.00 5.49 f

    mul1_i/tree_adder/VMA/adder19/P (dot_com_101) 0.00 5.49 f

    mul1_i/tree_adder/VMA/adder19/U1/Y (AND2X1) 0.05 5.54 f

    mul1_i/tree_adder/VMA/adder19/Pout (dot_com_101) 0.00 5.54 f

    mul1_i/tree_adder/VMA/adder23/P (half_dot_com_129) 0.00 5.54 f

    mul1_i/tree_adder/VMA/adder23/U2/Y (AOI21X1) 0.04 5.57 r

    mul1_i/tree_adder/VMA/adder23/U1/Y (INVX1) 0.04 5.61 f

    mul1_i/tree_adder/VMA/adder23/Gout (half_dot_com_129)

    0.00 5.61 f

    mul1_i/tree_adder/VMA/U2/Y (XOR2X1) 0.08 5.69 r

    mul1_i/tree_adder/VMA/sum[8] (brent_kung_28bitadder_5)

    0.00 5.69 r

    mul1_i/tree_adder/sum[12] (wallace_tree_adder_5) 0.00 5.69 r

    mul1_i/sum[12] (multiplier_5) 0.00 5.69 r

    bf_1_2/imag_in1[12] (bf_1_3) 0.00 5.69 r

    bf_1_2/adder1/B[12] (vma16_23) 0.00 5.69 r

    bf_1_2/adder1/ipg16/B[12] (p_g_16_23) 0.00 5.69 r

    bf_1_2/adder1/ipg16/U25/Y (XOR2X1) 0.06 5.75 f

    bf_1_2/adder1/ipg16/pg12[1] (p_g_16_23) 0.00 5.75 f

    bf_1_2/adder1/ir1c13/pg[1] (partial_product_generator2_252)

    0.00 5.75 f

    bf_1_2/adder1/ir1c13/U3/Y (AOI21X1) 0.04 5.79 r

    bf_1_2/adder1/ir1c13/U2/Y (INVX1) 0.03 5.81 f

    bf_1_2/adder1/ir1c13/pgo[0] (partial_product_generator2_252)

    0.00 5.81 f

    bf_1_2/adder1/ir5c13/pg[0] (partial_product_generator1_363)

    0.00 5.81 f

    bf_1_2/adder1/ir5c13/U2/Y (AOI21X1) 0.02 5.83 r

    bf_1_2/adder1/ir5c13/U1/Y (INVX1) 0.03 5.86 f

    bf_1_2/adder1/ir5c13/pgo (partial_product_generator1_363)

    0.00 5.86 f

    bf_1_2/adder1/ir6c14/pg0 (partial_product_generator1_360)

    0.00 5.86 f

    bf_1_2/adder1/ir6c14/U2/Y (AOI21X1) 0.03 5.90 r

    bf_1_2/adder1/ir6c14/U1/Y (INVX1) 0.03 5.92 f

    bf_1_2/adder1/ir6c14/pgo (partial_product_generator1_360)

    0.00 5.92 f

    bf_1_2/adder1/ixor16/A[14] (xor_16_23) 0.00 5.92 f

    bf_1_2/adder1/ixor16/U11/Y (XOR2X1) 0.03 5.95 f

    bf_1_2/adder1/ixor16/S[14] (xor_16_23) 0.00 5.95 f

    bf_1_2/adder1/S[14] (vma16_23) 0.00 5.95 f

    bf_1_2/U120/Y (MUX2X1) 0.04 5.99 r

    bf_1_2/U119/Y (INVX1) 0.04 6.04 f

    bf_1_2/imag_out0[14] (bf_1_3) 0.00 6.04 f

    bf_2_2/imag_in1[14] (bf_2_3) 0.00 6.04 f

    bf_2_2/U273/Y (MUX2X1) 0.08 6.11 r

    bf_2_2/U257/Y (INVX1) 0.09 6.20 f

    bf_2_2/adder1/B[14] (vma16_19) 0.00 6.20 f

    bf_2_2/adder1/ipg16/B[14] (p_g_16_19) 0.00 6.20 f

    bf_2_2/adder1/ipg16/U21/Y (XOR2X1) 0.06 6.26 r

    bf_2_2/adder1/ipg16/pg14[1] (p_g_16_19) 0.00 6.26 r

    bf_2_2/adder1/ixor16/B[14] (xor_16_19) 0.00 6.26 r

    bf_2_2/adder1/ixor16/U11/Y (XOR2X1) 0.05 6.31 f

    bf_2_2/adder1/ixor16/S[14] (xor_16_19) 0.00 6.31 f

    bf_2_2/adder1/S[14] (vma16_19) 0.00 6.31 f

    bf_2_2/U241/Y (AOI22X1) 0.05 6.36 r

    bf_2_2/U24/Y (BUFX2) 0.05 6.40 r

    bf_2_2/U13/Y (AND2X1) 0.03 6.43 r

    bf_2_2/U83/Y (INVX1) 0.03 6.46 f

    bf_2_2/imag_out0[14] (bf_2_3) 0.00 6.46 f

    mul2_i/A[14] (multiplier_3) 0.00 6.46 f

    mul2_i/pre_coding/x[14] (booth_coding_3) 0.00 6.46 f

    mul2_i/pre_coding/U251/Y (INVX1) 0.25 6.71 r

    mul2_i/pre_coding/U680/Y (MUX2X1) 0.10 6.81 f

    mul2_i/pre_coding/U679/Y (OAI21X1) 0.05 6.86 r

    mul2_i/pre_coding/pp0[14] (booth_coding_3) 0.00 6.86 r

    mul2_i/tree_adder/pp0[14] (wallace_tree_adder_3) 0.00 6.86 r

    mul2_i/tree_adder/adder13/ina (full_adder_268) 0.00 6.86 r

    mul2_i/tree_adder/adder13/U6/Y (XNOR2X1) 0.07 6.93 r

    mul2_i/tree_adder/adder13/U3/Y (XOR2X1) 0.07 7.01 r

    mul2_i/tree_adder/adder13/sum (full_adder_268) 0.00 7.01 r

    mul2_i/tree_adder/adder50/ina (full_adder_240) 0.00 7.01 r

    mul2_i/tree_adder/adder50/U6/Y (XNOR2X1) 0.07 7.08 r

    mul2_i/tree_adder/adder50/U3/Y (XOR2X1) 0.07 7.15 r

    mul2_i/tree_adder/adder50/sum (full_adder_240) 0.00 7.15 r

    mul2_i/tree_adder/adder88/ina (full_adder_213) 0.00 7.15 r

    mul2_i/tree_adder/adder88/U6/Y (XNOR2X1) 0.08 7.23 r

    mul2_i/tree_adder/adder88/U3/Y (XOR2X1) 0.07 7.30 r

    mul2_i/tree_adder/adder88/sum (full_adder_213) 0.00 7.30 r

    mul2_i/tree_adder/adder115/ina (full_adder_197) 0.00 7.30 r

    mul2_i/tree_adder/adder115/U6/Y (XNOR2X1) 0.08 7.38 r

    mul2_i/tree_adder/adder115/U3/Y (XOR2X1) 0.07 7.45 r

    mul2_i/tree_adder/adder115/sum (full_adder_197) 0.00 7.45 r

    mul2_i/tree_adder/adder142/ina (full_adder_188) 0.00 7.45 r

    mul2_i/tree_adder/adder142/U6/Y (XNOR2X1) 0.08 7.52 r

    mul2_i/tree_adder/adder142/U3/Y (XOR2X1) 0.07 7.60 r

    mul2_i/tree_adder/adder142/sum (full_adder_188) 0.00 7.60 r

    mul2_i/tree_adder/VMA/A[10] (brent_kung_28bitadder_3)

    0.00 7.60 r

    mul2_i/tree_adder/VMA/pg10/A (p_g_76) 0.00 7.60 r

    mul2_i/tree_adder/VMA/pg10/U1/Y (XOR2X1) 0.07 7.67 r

    mul2_i/tree_adder/VMA/pg10/P (p_g_76) 0.00 7.67 r

    mul2_i/tree_adder/VMA/U11/Y (XOR2X1) 0.07 7.74 r

    mul2_i/tree_adder/VMA/sum[10] (brent_kung_28bitadder_3)

    0.00 7.74 r

    mul2_i/tree_adder/sum[14] (wallace_tree_adder_3) 0.00 7.74 r

    mul2_i/sum[14] (multiplier_3) 0.00 7.74 r

    bf_1_3/imag_in1[14] (bf_1_2) 0.00 7.74 r

    bf_1_3/adder1/B[14] (vma16_15) 0.00 7.74 r

    bf_1_3/adder1/ipg16/B[14] (p_g_16_15) 0.00 7.74 r

    bf_1_3/adder1/ipg16/U21/Y (XOR2X1) 0.07 7.81 r

    bf_1_3/adder1/ipg16/pg14[1] (p_g_16_15) 0.00 7.81 r

    bf_1_3/adder1/ixor16/B[14] (xor_16_15) 0.00 7.81 r

    bf_1_3/adder1/ixor16/U11/Y (XOR2X1) 0.04 7.85 f

    bf_1_3/adder1/ixor16/S[14] (xor_16_15) 0.00 7.85 f

    bf_1_3/adder1/S[14] (vma16_15) 0.00 7.85 f

    bf_1_3/U120/Y (MUX2X1) 0.04 7.89 r

    bf_1_3/U119/Y (INVX1) 0.04 7.94 f

    bf_1_3/imag_out0[14] (bf_1_2) 0.00 7.94 f

    bf_2_3/imag_in1[14] (bf_2_2) 0.00 7.94 f

    bf_2_3/U273/Y (MUX2X1) 0.08 8.02 r

    bf_2_3/U257/Y (INVX1) 0.09 8.10 f

    bf_2_3/adder1/B[14] (vma16_11) 0.00 8.10 f

    bf_2_3/adder1/ipg16/B[14] (p_g_16_11) 0.00 8.10 f

    bf_2_3/adder1/ipg16/U21/Y (XOR2X1) 0.06 8.16 r

    bf_2_3/adder1/ipg16/pg14[1] (p_g_16_11) 0.00 8.16 r

    bf_2_3/adder1/ixor16/B[14] (xor_16_11) 0.00 8.16 r

    bf_2_3/adder1/ixor16/U11/Y (XOR2X1) 0.05 8.21 f

    bf_2_3/adder1/ixor16/S[14] (xor_16_11) 0.00 8.21 f

    bf_2_3/adder1/S[14] (vma16_11) 0.00 8.21 f

    bf_2_3/U241/Y (AOI22X1) 0.05 8.26 r

    bf_2_3/U29/Y (BUFX2) 0.05 8.30 r

    bf_2_3/U13/Y (AND2X1) 0.03 8.33 r

    bf_2_3/U96/Y (INVX1) 0.03 8.36 f

    bf_2_3/imag_out0[14] (bf_2_2) 0.00 8.36 f

    mul3_i/A[14] (multiplier_1) 0.00 8.36 f

    mul3_i/pre_coding/x[14] (booth_coding_1) 0.00 8.36 f

    mul3_i/pre_coding/U251/Y (INVX1) 0.25 8.61 r

    mul3_i/pre_coding/U680/Y (MUX2X1) 0.10 8.71 f

    mul3_i/pre_coding/U679/Y (OAI21X1) 0.05 8.76 r

    mul3_i/pre_coding/pp0[14] (booth_coding_1) 0.00 8.76 r

    mul3_i/tree_adder/pp0[14] (wallace_tree_adder_1) 0.00 8.76 r

    mul3_i/tree_adder/adder13/ina (full_adder_82) 0.00 8.76 r

    mul3_i/tree_adder/adder13/U6/Y (XNOR2X1) 0.07 8.83 r

    mul3_i/tree_adder/adder13/U3/Y (XOR2X1) 0.07 8.91 r

    mul3_i/tree_adder/adder13/sum (full_adder_82) 0.00 8.91 r

    mul3_i/tree_adder/adder50/ina (full_adder_54) 0.00 8.91 r

    mul3_i/tree_adder/adder50/U6/Y (XNOR2X1) 0.07 8.98 r

    mul3_i/tree_adder/adder50/U3/Y (XOR2X1) 0.07 9.05 r

    mul3_i/tree_adder/adder50/sum (full_adder_54) 0.00 9.05 r

    mul3_i/tree_adder/adder88/ina (full_adder_27) 0.00 9.05 r

    mul3_i/tree_adder/adder88/U6/Y (XNOR2X1) 0.08 9.13 r

    mul3_i/tree_adder/adder88/U3/Y (XOR2X1) 0.07 9.20 r

    mul3_i/tree_adder/adder88/sum (full_adder_27) 0.00 9.20 r

    mul3_i/tree_adder/adder115/ina (full_adder_11) 0.00 9.20 r

    mul3_i/tree_adder/adder115/U6/Y (XNOR2X1) 0.08 9.28 r

    mul3_i/tree_adder/adder115/U3/Y (XOR2X1) 0.07 9.35 r

    mul3_i/tree_adder/adder115/sum (full_adder_11) 0.00 9.35 r

    mul3_i/tree_adder/adder142/ina (full_adder_2) 0.00 9.35 r

    mul3_i/tree_adder/adder142/U6/Y (XNOR2X1) 0.08 9.42 r

    mul3_i/tree_adder/adder142/U3/Y (XOR2X1) 0.07 9.50 r

    mul3_i/tree_adder/adder142/sum (full_adder_2) 0.00 9.50 r

    mul3_i/tree_adder/VMA/A[10] (brent_kung_28bitadder_1)

    0.00 9.50 r

    mul3_i/tree_adder/VMA/pg10/A (p_g_19) 0.00 9.50 r

    mul3_i/tree_adder/VMA/pg10/U1/Y (XOR2X1) 0.07 9.57 r

    mul3_i/tree_adder/VMA/pg10/P (p_g_19) 0.00 9.57 r

    mul3_i/tree_adder/VMA/U11/Y (XOR2X1) 0.07 9.64 r

    mul3_i/tree_adder/VMA/sum[10] (brent_kung_28bitadder_1)

    0.00 9.64 r

    mul3_i/tree_adder/sum[14] (wallace_tree_adder_1) 0.00 9.64 r

    mul3_i/sum[14] (multiplier_1) 0.00 9.64 r

    bf_1_4/imag_in1[14] (bf_1_1) 0.00 9.64 r

    bf_1_4/adder1/B[14] (vma16_7) 0.00 9.64 r

    bf_1_4/adder1/ipg16/B[14] (p_g_16_7) 0.00 9.64 r

    bf_1_4/adder1/ipg16/U21/Y (XOR2X1) 0.07 9.71 r

    bf_1_4/adder1/ipg16/pg14[1] (p_g_16_7) 0.00 9.71 r

    bf_1_4/adder1/ixor16/B[14] (xor_16_7) 0.00 9.71 r

    bf_1_4/adder1/ixor16/U11/Y (XOR2X1) 0.04 9.75 f

    bf_1_4/adder1/ixor16/S[14] (xor_16_7) 0.00 9.75 f

    bf_1_4/adder1/S[14] (vma16_7) 0.00 9.75 f

    bf_1_4/U120/Y (MUX2X1) 0.04 9.79 r

    bf_1_4/U119/Y (INVX1) 0.04 9.84 f

    bf_1_4/imag_out0[14] (bf_1_1) 0.00 9.84 f

    bf_2_4/imag_in1[14] (bf_2_1) 0.00 9.84 f

    bf_2_4/U257/Y (MUX2X1) 0.08 9.91 r

    bf_2_4/U241/Y (INVX1) 0.09 10.00 f

    bf_2_4/adder1/B[14] (vma16_3) 0.00 10.00 f

    bf_2_4/adder1/ipg16/B[14] (p_g_16_3) 0.00 10.00 f

    bf_2_4/adder1/ipg16/U21/Y (XOR2X1) 0.04 10.04 f

    bf_2_4/adder1/ipg16/pg14[1] (p_g_16_3) 0.00 10.04 f

    bf_2_4/adder1/ir1c15/pg[1] (partial_product_generator2_33)

    0.00 10.04 f

    bf_2_4/adder1/ir1c15/U1/Y (AND2X1) 0.04 10.09 f

    bf_2_4/adder1/ir1c15/pgo[1] (partial_product_generator2_33)

    0.00 10.09 f

    bf_2_4/adder1/ir2c15/pg[1] (partial_product_generator2_26)

    0.00 10.09 f

    bf_2_4/adder1/ir2c15/U1/Y (AND2X1) 0.04 10.13 f

    bf_2_4/adder1/ir2c15/pgo[1] (partial_product_generator2_26)

    0.00 10.13 f

    bf_2_4/adder1/ir3c15/pg[1] (partial_product_generator2_23)

    0.00 10.13 f

    bf_2_4/adder1/ir3c15/U1/Y (AND2X1) 0.04 10.16 f

    bf_2_4/adder1/ir3c15/pgo[1] (partial_product_generator2_23)

    0.00 10.16 f

    bf_2_4/adder1/ir4c15/pg[1] (partial_product_generator1_45)

    0.00 10.16 f

    bf_2_4/adder1/ir4c15/U2/Y (AOI21X1) 0.04 10.20 r

    bf_2_4/adder1/ir4c15/U1/Y (INVX1) 0.03 10.23 f

    bf_2_4/adder1/ir4c15/pgo (partial_product_generator1_45)

    0.00 10.23 f

    bf_2_4/adder1/ixor16/A[15] (xor_16_3) 0.00 10.23 f

    bf_2_4/adder1/ixor16/U10/Y (XOR2X1) 0.06 10.29 r

    bf_2_4/adder1/ixor16/S[15] (xor_16_3) 0.00 10.29 r

    bf_2_4/adder1/S[15] (vma16_3) 0.00 10.29 r

    bf_2_4/U218/Y (AOI22X1) 0.04 10.33 f

    bf_2_4/U54/Y (BUFX2) 0.11 10.45 f

    bf_2_4/U217/Y (NAND2X1) 0.08 10.53 r

    bf_2_4/imag_out0[15] (bf_2_1) 0.00 10.53 r

    imag_out[15] (out) 0.00 10.53 r

    data arrival time 10.53

    --------------------------------------------------------------------------

    (Path is unconstrained)
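
    A path this long is easier to read once the incremental delays are grouped by top-level block. The following is a minimal Python sketch of that grouping, not part of the original design flow. It assumes each row has the form "pin (cell) incr path edge" as in the report above, and that a long pin name may push the numbers onto the next line. The names SAMPLE, ROW, parse_rows and delay_by_block are hypothetical helpers introduced only for this illustration, and only a few rows of the report are copied in as sample data.

```python
import re

# A few rows copied from the report above. One entry is wrapped over two
# lines, as happens in the report when the pin name is long.
SAMPLE = """\
counter/q_reg[8]/CLK (DFFPOSX1) 0.00 # 0.00 r
counter/q_reg[8]/Q (DFFPOSX1) 0.36 0.36 r
bf_2_0/U279/Y (INVX1) 0.21 0.57 f
bf_2_0/U101/Y (OR2X1) 0.07 0.64 f
bf_2_0/U102/Y (INVX1) 0.74 1.38 r
bf_2_0/adder1/ir1c1/pg[1] (partial_product_generator1_560)
0.00 1.72 f
mul0_i/pre_coding/U240/Y (INVX1) 0.25 2.41 r
"""

# Assumed row layout: pin (cell) incr [#] path edge
ROW = re.compile(
    r"^(\S+)\s+\((\S+)\)\s+(\d+\.\d+)\s+(?:#\s+)?(\d+\.\d+)\s+([rf])$"
)


def parse_rows(text):
    """Return (pin, cell, incr, path, edge) tuples, joining wrapped rows."""
    rows, pending = [], ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        joined = (pending + " " + line).strip()
        match = ROW.match(line) or ROW.match(joined)
        if match:
            pin, cell, incr, path, edge = match.groups()
            rows.append((pin, cell, float(incr), float(path), edge))
            pending = ""
        else:
            # Not a complete row yet: keep it and try again with the next line.
            pending = joined
    return rows


def delay_by_block(rows):
    """Sum incremental delay per top-level instance (text before the first '/')."""
    totals = {}
    for pin, _cell, incr, _path, _edge in rows:
        block = pin.split("/")[0]
        totals[block] = totals.get(block, 0.0) + incr
    return totals


for block, delay in delay_by_block(parse_rows(SAMPLE)).items():
    print(f"{block:10s} {delay:.2f}")
```

    Run over the full report, the same grouping would show how the data arrival time is split between the butterfly blocks (bf_*) and the multipliers (mul*_i).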

    Results of Area

    The silicon area results are as follows:

    ****************************************

    Report : area

    Design : fft_1024

    Version: A-2007.12

    Date : Tue Dec 22 07:40:53 2009

    ****************************************

    Library(s) Used:

    gscl45nm (File: /home/class/zhan0915/fft3/gscl45nm.db)

    Number of ports: 66

    Number of nets: 39672

    Number of cells: 38565

    Number of references: 31

    Combinational area: 70080.567213

    Noncombinational area: 262240.141182

    Net Interconnect area: undefined (No wire load specified)

    Total cell area: 332320.708395

    Total area: undefined

    Information: This design contains black box (unknown) components. (RPT-8)
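
    As a quick consistency check on the report above, the total cell area should equal the sum of the combinational and noncombinational areas. The short Python sketch below only re-adds the reported figures and computes the noncombinational share. The variable names are illustrative, and the values are in whatever area unit the library reports.

```python
# Figures copied from the area report above (units as reported by the tool).
combinational_area = 70080.567213
noncombinational_area = 262240.141182
reported_cell_area = 332320.708395

# The two parts should add up to the reported total cell area.
cell_area = combinational_area + noncombinational_area
assert abs(cell_area - reported_cell_area) < 1e-6

# Fraction of the cell area taken by noncombinational (sequential) cells.
noncomb_share = noncombinational_area / cell_area
print(f"total cell area      : {cell_area:.6f}")
print(f"noncombinational part: {noncomb_share:.1%}")
```

    Roughly four fifths of the cell area is noncombinational. Because no wire load is specified and the design contains black-box components, these figures cover cell area only and may not reflect the final layout area.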

    Conclusion of Phase 2

    In the second stage, a 1024-point radix-2² FFT module was designed. Synthesis results show that the FFT occupies a total cell area of 332320.708 µm². The critical path of the FFT is 10.53 ns, and its total power consumption is 45.4214 mW. Fig. 16 shows the schematic view of the FFT.

    Fig. 16 Schematic view of the 1024-point FFT
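
    The critical path figure can be turned into a rough upper bound on the clock frequency. The Python sketch below does this arithmetic under stated assumptions: it ignores setup time and clock skew, it treats the unconstrained register-to-output path as the cycle-limiting path, and it assumes one sample enters the pipeline per clock cycle, which the synthesis reports do not confirm. All names are illustrative.

```python
CRITICAL_PATH_NS = 10.53  # data arrival time from the timing report
FFT_POINTS = 1024

# Upper bound on the clock frequency: one critical-path delay per cycle.
fmax_mhz = 1e3 / CRITICAL_PATH_NS

# ASSUMPTION (not stated in the report): one sample enters per clock cycle,
# so streaming one 1024-point block takes 1024 cycles.
block_time_us = FFT_POINTS * CRITICAL_PATH_NS / 1e3

print(f"max clock frequency : {fmax_mhz:.2f} MHz")
print(f"time per 1024 points: {block_time_us:.2f} us")
```

    Under these assumptions the design is limited to a little under 95 MHz. A constrained synthesis run with a defined clock would be needed to confirm the achievable frequency.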

    REFERENCES

    [1] Shousheng He and Mats Torkelson, "A New Approach to Pipeline FFT Processor," 15-19 April 1996, pp. 766-770. DOI: 10.1109/IPPS.1996.508145.

    [2] Shousheng He and Mats Torkelson, "Design and Implementation of a 1024-point Pipeline FFT Processor," 11-14 May 1998, pp. 131-134. DOI: 10.1109/CICC.1998.694922.

    [3] S. He and M. Torkelson, "A complex array multiplier using distributed arithmetic," in Proc. IEEE CICC'96, pp. 71-74, San Diego, CA, May 1996.

    [4] M. Garrido, K. Parhi, and J. Grajal, "A Pipelined FFT Architecture for Real-Valued Signals," vol. PP, 2009, pp. 1-1. DOI: 10.1109/TCSI.2009.2017125.

    [5] Kia Bazargan, University of Minnesota Class Handouts, EE 5324 - VLSI Design II, Spring 2006.

    [6] M. W. Kharrat, M. A. Ben Ayed, M. Loulou, N. Masmoudi, and L. Kamoun, "A new method to implement a constant operand multiplier," The 14th International Conference on Microelectronics (ICM 2002), 11-13 Dec. 2002, pp. 62-65.

    [7] Saeeid Tahmasbi Oskuii, Per Gunnar Kjeldsberg, and Oscar Gustafsson, "Power Optimized Partial Product Reduction Interconnect Ordering in Parallel Multipliers."
