lecture6_FPGA

Date post: 05-Apr-2018
Upload: zaheer-abbas

Transcript
  • 7/31/2019 lecture6_FPGA

    1/104

    George Mason University
    ECE 545 Introduction to VHDL

    FPGA Devices and FPGA Design Flow

    ECE 545 Lecture 6

    Resources

    Xilinx, Inc.
    Spartan-3 FPGA Introduction
        Features
        Architectural Overview
    Spartan-3 FPGA Functional Description
        CLB Overview
        Block RAM Overview
        Dedicated Multipliers

    http://direct.xilinx.com/bvdocs/publications/ds099.pdf

    Resources

    Integrated Interfaces: Active-HDL with Synplify
    http://www.aldec.com/products/active-hdl/multimediademo/movies/active_hdl_with_synplify/

    Integrated Synthesis and Implementation
    http://www.aldec.com/products/active-hdl/multimediademo/movies/fpga_synth_implement/

    Movie Demos
    Active-HDL Help

    Two competing implementation approaches

    ASIC (Application-Specific Integrated Circuit)
        designed all the way from behavioral description to physical layout
        designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry

    FPGA (Field-Programmable Gate Array)
        no physical layout design; design ends with a bitstream used to configure a device
        bought off the shelf and reconfigured by designers themselves

    Which Way to Go?

    ASICs
        High performance
        Low power
        Low cost in high volumes

    FPGAs
        Off-the-shelf
        Low development cost
        Short time to market
        Reconfigurability

    FPGA vendors and FPGA families

    Technology Timeline

    [Figure: timeline from 1945 to 2000 showing the introduction of FPGAs, ASICs, CPLDs, SPLDs, Microprocessors, SRAMs & DRAMs, ICs (General), and Transistors]

    Source: The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043
    Copyright 2004 Mentor Graphics Corp. (www.mentor.com)

    Major FPGA vendors

    SRAM-based FPGAs
        Xilinx Inc.                    www.xilinx.com
        Altera Corp.                   www.altera.com
        Atmel Corp.                    www.atmel.com
        Lattice Semiconductor Corp.    www.latticesemi.com

    Antifuse and flash-based FPGAs
        Actel Corp.                    www.actel.com
        QuickLogic Corp.               www.quicklogic.com

    Feature | SRAM | Antifuse | E2PROM / FLASH
    Technology node | State-of-the-art | One or more generations behind | One or more generations behind
    Reprogrammable | Yes (in system) | No | Yes (in-system or offline)
    Reprogramming speed (inc. erasing) | Fast | ---- | 3x slower than SRAM
    Volatile (must be programmed on power-up) | Yes | No | No (but can be if required)
    Requires external configuration file | Yes | No | No
    Good for prototyping | Yes (very good) | No | Yes (reasonable)
    Instant-on | No | Yes | Yes
    IP Security | Acceptable (especially when using bitstream encryption) | Very Good | Very Good
    Size of configuration cell | Large (six transistors) | Very small | Medium-small (two transistors)
    Power consumption | Medium | Low | Medium
    Rad Hard | No | Yes | Not really

    Source: The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043
    Copyright 2004 Mentor Graphics Corp. (www.mentor.com)

    The Programmable Marketplace, Q1 Calendar Year 2005

    [Figure: two pie charts of market share]
    PLD Segment: Xilinx 51%, Altera 33%, Lattice and Actel 5% and 7%, QuickLogic 2%, Other 2%
    FPGA Sub-Segment: Xilinx 58%, Altera 31%, All Others 11%

    Two dominant suppliers, indicating a maturing market

    Source: Company reports. Latest information available; computed on a 4-quarter rolling basis

    PLD Market Share

    Calendar year      1998    1999    2000    2001    2002    2003    2004
    PLD market size    $2.3B   $2.6B   $4.1B   $2.6B   $2.1B   $2.6B   $3.1B
    Xilinx             30%     35%     38%     44%     49%     50%     51%
    Altera             31%     33%     34%     32%     31%     32%     32%
    All Others         39%     32%     28%     24%     20%     18%     17%

    Source: Gartner Dataquest

    FPGA families

              Low-cost                             High-performance
    Xilinx    Spartan 3, Spartan 3E, Spartan 3L    Virtex 4 LX / SX / FX, Virtex 5 LX
    Altera    Cyclone II                           Stratix II, Stratix II GX

    Xilinx

    Primary products: FPGAs and the associated CAD software
        Programmable Logic Devices
        ISE Alliance and Foundation Series Design Software
    Main headquarters in San Jose, CA
    Fabless* Semiconductor and Software Company
        UMC (Taiwan) {*Xilinx acquired an equity stake in UMC in 1996}
        Seiko Epson (Japan)
        TSMC (Taiwan)

    Source: [Xilinx Inc.]

    Xilinx FPGA Families

    Old families
        XC3000, XC4000, XC5200
        Old 0.5 µm, 0.35 µm and 0.25 µm technology. Not recommended for modern designs.

    Low Cost Family
        Spartan/XL: derived from XC4000
        Spartan-II: derived from Virtex
        Spartan-IIE: derived from Virtex-E
        Spartan-3, Spartan 3E, Spartan 3L

    High-performance families
        Virtex (220 nm)
        Virtex-E, Virtex-EM (180 nm)
        Virtex-II, Virtex-II PRO (130 nm)
        Virtex-4 (90 nm)
        Virtex 5 (65 nm)

    Source: [Xilinx Inc.]

    Prices of the most recent families of Xilinx FPGAs

    Low-cost                    High-performance
    Spartan 3     < $130*       Virtex II, Virtex II-Pro    < $3,000*
    Spartan 3E    < $35*        Virtex 4                    < $3,000*

    * approximate cost of the largest device per unit for a batch of 10,000 units

    Xilinx FPGAs

    Xilinx FPGA

    [Figure: Xilinx FPGA floorplan showing Configurable Logic Blocks, I/O Blocks, and Block RAMs]

    Xilinx CLB

    [Figure: array of configurable logic blocks (CLBs); each CLB contains four slices, and each slice contains two logic cells]

    Source: The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043
    Copyright 2004 Mentor Graphics Corp. (www.mentor.com)

    Simplified view of a Xilinx Logic Cell

    [Figure: logic cell in which a 4-input LUT (also usable as a 16x1 RAM or a 16-bit SR) feeds a mux and a flip-flop with clock, clock enable, and set/reset; signals a, b, c, d, e, y, q]

    Source: The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043
    Copyright 2004 Mentor Graphics Corp. (www.mentor.com)

    LUT (Look-Up Table) Functionality

    Look-Up tables are primary elements for logic implementation
    Each LUT can implement any function of 4 inputs

    [Figure: a LUT with inputs x1, x2, x3, x4 and output y, shown with two example truth tables.
     Output column y for input rows x1 x2 x3 x4 = 0000 to 1111:
        0100 0101 0100 1100    (a function of all four inputs)
        1111 1111 1111 0000    (y = NOT (x1 AND x2), drawn as a gate with inputs x1, x2)]
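    The truth-table view of a LUT can be sketched in a few lines of Python. This is only a behavioral illustration, not Xilinx code: the helper name make_lut4 is hypothetical, and x1 is assumed to be the most significant address bit. The two bit strings are the output columns of the example truth tables on this slide.

```python
def make_lut4(init_bits):
    """Return y = f(x1, x2, x3, x4) backed by a 16-entry truth table.

    init_bits is a string of 16 '0'/'1' characters; entry i is the output
    for the input row whose binary value is i, with x1 as the MSB
    (an assumption made for this sketch).
    """
    if len(init_bits) != 16 or set(init_bits) - {"0", "1"}:
        raise ValueError("a 4-input LUT needs exactly 16 bits")
    table = [int(b) for b in init_bits]

    def lut(x1, x2, x3, x4):
        # The four inputs simply form the address of the stored bit.
        address = (x1 << 3) | (x2 << 2) | (x3 << 1) | x4
        return table[address]

    return lut


# Output column 1111 1111 1111 0000: y is 0 only when x1 = x2 = 1,
# so the LUT behaves as a NAND of x1 and x2 and ignores x3 and x4.
lut_nand_x1_x2 = make_lut4("1111111111110000")

# A different 16-bit pattern gives a different 4-input function.
lut_arbitrary = make_lut4("0100010101001100")
```

    Changing the function means changing only the 16 stored bits; the lookup hardware stays the same, which is why any function of 4 inputs fits in one LUT.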

    5-Input Functions implemented using two LUTs

    X5 X4 X3 X2 X1   Y
    0  0  0  0  0    0
    0  0  0  0  1    1
    0  0  0  1  0    0
    0  0  0  1  1    0
    0  0  1  0  0    1
    0  0  1  0  1    1
    0  0  1  1  0    0
    0  0  1  1  1    0
    0  1  0  0  0    1
    0  1  0  0  1    0
    0  1  0  1  0    0
    0  1  0  1  1    1
    0  1  1  0  0    1
    0  1  1  0  1    1
    0  1  1  1  0    1
    0  1  1  1  1    1
    1  0  0  0  0    0
    1  0  0  0  1    0
    1  0  0  1  0    0
    1  0  0  1  1    0
    1  0  1  0  0    0
    1  0  1  0  1    0
    1  0  1  1  0    0
    1  0  1  1  1    1
    1  1  0  0  0    0
    1  1  0  0  1    1
    1  1  0  1  0    0
    1  1  0  1  1    1
    1  1  1  0  0    0
    1  1  1  0  1    1
    1  1  1  1  0    0
    1  1  1  1  1    0

    [Figure: two LUTs combined to produce OUT]
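    One way to read this slide is as a Shannon expansion on X5: one LUT holds the rows with X5 = 0, the other the rows with X5 = 1, and a 2-to-1 multiplexer selects between them. The Python sketch below assumes exactly that structure (the slide itself only shows two LUTs and OUT); the function names are hypothetical and the bit string is the Y column of the table above.

```python
# Y column of the truth table, rows X5 X4 X3 X2 X1 = 00000 .. 11111.
Y_COLUMN = "01001100" "10011111" "00000001" "01010100"


def split_into_two_luts(y_column):
    """Split a 32-row truth table into two 16-entry LUT contents."""
    if len(y_column) != 32:
        raise ValueError("a 5-input function has 32 truth-table rows")
    lut_x5_low = [int(b) for b in y_column[:16]]    # rows with X5 = 0
    lut_x5_high = [int(b) for b in y_column[16:]]   # rows with X5 = 1
    return lut_x5_low, lut_x5_high


def eval_5_input(lut_x5_low, lut_x5_high, x5, x4, x3, x2, x1):
    """Both LUTs see X4..X1; X5 drives the select of an assumed 2-to-1 mux."""
    address = (x4 << 3) | (x3 << 2) | (x2 << 1) | x1
    return lut_x5_high[address] if x5 else lut_x5_low[address]


LUT_LOW, LUT_HIGH = split_into_two_luts(Y_COLUMN)
```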

    Distributed RAM

    CLB LUT configurable as Distributed RAM
        A LUT equals 16x1 RAM
        Implements Single and Dual-Ports
        Cascade LUTs to increase RAM size
    Synchronous write
    Synchronous/Asynchronous read
        Accompanying flip-flops used for synchronous read

    [Figure: one LUT used as RAM16X1S (D, WE, WCLK, A0-A3, O); two LUTs used as RAM32X1S (A0-A4), RAM16X2S (D0, D1, O0, O1), or dual-port RAM16X1D (SPO, DPO, DPRA0-DPRA3)]

    Shift Register

    Each LUT can be configured as shift register
        Serial in, serial out
    Dynamically addressable delay up to 16 cycles
    For programmable pipeline
    Cascade for greater cycle delays
    Use CLB flip-flops to add depth

    [Figure: LUT shown as a chain of D flip-flops with clock enable (CE); signals IN, CE, CLK, DEPTH[3:0], OUT]
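    A behavioral Python sketch of such a LUT-based shift register follows. The class name and the mapping of DEPTH to delay (DEPTH = 0 gives a 1-cycle delay, DEPTH = 15 gives 16 cycles) are assumptions made for illustration, not taken from the slide.

```python
class ShiftRegisterLUT:
    """16-bit serial-in, serial-out shift register with an addressable tap."""

    def __init__(self):
        self.bits = [0] * 16  # bits[0] holds the most recently shifted-in value

    def clock(self, data_in, ce=1):
        """Model one rising clock edge; the register shifts only when CE is 1."""
        if ce:
            self.bits = [data_in & 1] + self.bits[:-1]

    def read(self, depth):
        """Return the tap selected by DEPTH[3:0] (assumed: delay = depth + 1)."""
        if not 0 <= depth <= 15:
            raise ValueError("DEPTH is a 4-bit value")
        return self.bits[depth]


sr = ShiftRegisterLUT()
sr.clock(1)            # shift in a single 1 ...
for _ in range(3):
    sr.clock(0)        # ... followed by three 0s
```

    After these four clock edges the 1 sits at tap 3, so reading with DEPTH = 3 returns it while DEPTH = 0 returns the latest 0. Changing DEPTH changes the delay without reloading the register, which is what makes the delay dynamically addressable.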

    Shift Register

    Register-rich FPGA
        Allows for addition of pipeline stages to increase throughput
    Data paths must be balanced to keep desired functionality

    [Figure: two 64-bit data paths; Operation A (4 cycles) and Operation B (8 cycles) take 12 cycles, Operation C takes 3 cycles, giving a 9-cycle imbalance]
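    The balancing requirement can be checked with simple arithmetic. The sketch below assumes the figure shows Operation A followed by Operation B on one path and Operation C alone on the other; the function name is hypothetical.

```python
def balancing_delays(path_latencies):
    """Return the extra delay (in cycles) each path needs to match the longest one."""
    longest = max(path_latencies.values())
    return {name: longest - cycles for name, cycles in path_latencies.items()}


# Latencies in clock cycles, read from the slide.
PATHS = {"A then B": 4 + 8, "C": 3}
EXTRA_CYCLES = balancing_delays(PATHS)
```

    Here the path through Operation C needs 9 extra cycles of delay, which is the kind of delay a LUT-based shift register can provide without consuming flip-flops.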

    Xilinx Multipurpose LUT

    [Figure: a 4-input LUT that can also act as a 16 x 1 RAM or a 16-bit SR]

    Source: The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043
    Copyright 2004 Mentor Graphics Corp. (www.mentor.com)

    Carry & Control Logic

    [Figure: slice containing two look-up tables (inputs G1-G4 and F1-F4), two carry & control logic blocks, and two D flip-flops; signals COUT, CIN, CLK, CE, SR, BY, F5IN, YB, Y, XB, X]

    Fast Carry Logic

    Each CLB contains separate logic and routing for the fast generation of sum & carry signals
        Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
    Carry logic is independent of normal logic and routing resources

    [Figure: carry logic chain running from LSB to MSB next to the routing]

    Accessing Carry Logic

    All major synthesis tools can infer carry logic for arithmetic functions

    Addition (SUM

    CLB Slice Structure

    Each slice contains two sets of the following:
        Four-input LUT
            Any 4-input logic function,
            or 16-bit x 1 sync RAM (SLICEM only)
            or 16-bit shift register (SLICEM only)
        Carry & Control
            Fast arithmetic logic
            Multiplier logic
            Multiplexer logic
        Storage element
            Latch or flip-flop
            Set and reset
            True or inverted inputs
            Sync. or async. control

    Block RAM (BRAM)

    Block RAM

    Most efficient memory implementation
        Dedicated blocks of memory
    Ideal for most memory requirements
        4 to 104 memory blocks
            18 kbits = 18,432 bits per block (16 k without parity bits)
        Use multiple blocks for larger memories
    Builds both single and true dual-port RAMs

    [Figure: Spartan-3 dual-port block RAM with Port A and Port B]

    RAM Blocks and Multipliers in Xilinx FPGAs

    [Figure: columns of RAM blocks and multipliers embedded among the logic blocks]

    Source: The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043
    Copyright 2004 Mentor Graphics Corp. (www.mentor.com)

    Spartan-3 Block RAM Amounts

    Block RAM Port Aspect Ratios

    [Figure: one block RAM organized with different port widths]
        16k x 1          (addresses 0 to 16,383)
        8k x 2           (addresses 0 to 8,191)
        4k x 4           (addresses 0 to 4,095)
        2k x (8+1)       (addresses 0 to 2047)
        1024 x (16+2)    (addresses 0 to 1023)
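    The aspect ratios above are consistent with the 18,432-bit block size quoted earlier: the narrow organizations use only the 16 k data bits, while the x9 and x18 organizations also use the parity bits. A short Python check, with hypothetical names, also derives the address width of each organization.

```python
# (depth, data bits, parity bits) for each port organization listed above.
ASPECT_RATIOS = [
    (16 * 1024, 1, 0),
    (8 * 1024, 2, 0),
    (4 * 1024, 4, 0),
    (2 * 1024, 8, 1),
    (1024, 16, 2),
]


def address_bits(depth):
    """Number of address lines needed to select one of `depth` words."""
    return (depth - 1).bit_length()


def total_bits(depth, data_bits, parity_bits):
    """Storage used by one organization, including parity bits."""
    return depth * (data_bits + parity_bits)


# One row per organization: depth, data bits, parity bits, address bits, total bits.
SUMMARY = [(d, w, p, address_bits(d), total_bits(d, w, p)) for d, w, p in ASPECT_RATIOS]
```

    The address widths come out as 14, 13, 12, 11, and 10 bits, and the totals as 16,384 bits for the three narrow organizations and 18,432 bits for the two that include parity.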

    Single-Port Block RAM

    Dual-Port Bus Flexibility

    Each port can be configured with a different data bus width
    Provides easy data width conversion without any additional logic

    [Figure: RAMB4_S16_S8 dual-port block RAM
        Port A: In 1K-bit depth, Out 18-bit width; signals WEA, ENA, RSTA, ADDRA[9:0], CLKA, DIA[17:0], DOA[17:0]
        Port B: In 2K-bit depth, Out 9-bit width; signals WEB, ENB, RSTB, ADDRB[10:0], CLKB, DIB[8:0], DOB[8:0]]

    Two Independent Single-Port RAMs

    Added advantage of True Dual-Port
        No wasted RAM bits
    Can split a Dual-Port 16K RAM into two Single-Port 8K RAMs
    Simultaneous independent access to each RAM
    To access the lower RAM
        Tie the MSB address bit to Logic Low
    To access the upper RAM
        Tie the MSB address bit to Logic High

    [Figure: RAMB4_S1_S1 with Port A addressed by 0, ADDR[12:0] and Port B addressed by 1, ADDR[12:0]; each port In 8K-bit depth, Out 1-bit width; signals WEA/WEB, ENA/ENB, RSTA/RSTB, ADDRA[12:0]/ADDRB[12:0], CLKA/CLKB, DIA[0]/DIB[0], DOA[0]/DOB[0]]
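    The MSB trick can be illustrated with a small behavioral model in Python. The class names are hypothetical, clocking is reduced to one method call per access, and a read during a write returns the newly written bit; these are simplifications for the sketch, not statements about the device.

```python
class DualPortRAM16Kx1:
    """Shared 16K x 1 storage; each access models one port operation."""

    def __init__(self):
        self.mem = [0] * (16 * 1024)

    def access(self, address, we=0, din=0):
        if we:
            self.mem[address] = din & 1
        return self.mem[address]


class SinglePortHalf:
    """One port of the shared RAM with its MSB address bit tied to a constant."""

    def __init__(self, shared, msb):
        self.shared = shared
        self.msb = msb  # 0 selects the lower 8K, 1 selects the upper 8K

    def access(self, address13, we=0, din=0):
        full_address = (self.msb << 13) | (address13 & 0x1FFF)
        return self.shared.access(full_address, we, din)


shared_ram = DualPortRAM16Kx1()
lower_ram = SinglePortHalf(shared_ram, msb=0)  # Port A: MSB tied to Logic Low
upper_ram = SinglePortHalf(shared_ram, msb=1)  # Port B: MSB tied to Logic High

lower_ram.access(5, we=1, din=1)  # write 1 to address 5 of the lower RAM
```

    Because the two ports can never produce the same full address, the two halves behave as independent 8K x 1 memories even though they share one block.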

    Embedded Multipliers

    18 x 18 Embedded Multiplier

    Fast arithmetic functions
        Optimized to implement multiply / accumulate modules
    18 x 18 signed multiplier
    Fully combinational
    Optional registers with CE & RST (pipeline)
    Independent from adjacent block RAM

    18 x 18 Multiplier

    Embedded 18-bit x 18-bit multiplier
    2's complement signed operation
    Multipliers are organized in columns

    [Figure: 18 x 18 multiplier with inputs Data_A (18 bits) and Data_B (18 bits) and Output (36 bits)]
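    The 36-bit output width follows from 2's complement arithmetic: the product of two 18-bit signed values always fits in 36 bits. A Python sketch with hypothetical function names, operating on raw bit patterns:

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def mult18x18(data_a, data_b):
    """Signed 18 x 18 multiply; returns the 36-bit product as a raw bit pattern."""
    product = to_signed(data_a, 18) * to_signed(data_b, 18)
    return product & ((1 << 36) - 1)
```

    For example, the bit pattern 0x3FFFF is -1 in 18 bits, so mult18x18(0x3FFFF, 2) yields the 36-bit pattern for -2. The largest magnitude, (-2^17) x (-2^17) = 2^34, still fits in a 36-bit signed result.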

    Positions of Multipliers

    Asynchronous 18-bit Multiplier

    18-bit Multiplier with Register

    Input/Output Blocks (IOBs)

    Basic I/O Block Structure

    [Figure: I/O block with three D flip-flops (each with EC and SR inputs): one for Three-State Control (Three-State, FF Enable), one in the Output Path (Output, FF Enable), and one in the Input Path (Direct Input, Registered Input, FF Enable); common Clock and Set/Reset]

    IOB Functionality

    IOB provides interface between the package pins and CLBs
    Each IOB can work as uni- or bi-directional I/O
    Outputs can be forced into High Impedance
    Inputs and outputs can be registered
        advised for high-performance I/O
    Inputs can be delayed

    Spartan-3 Family Attributes

    Spartan-3 FPGA Family Members

    FPGA Design Flow

    Design process (1)

    Specification
        Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in experiment 5, this time your unit has to be able to perform the encryption algorithm by itself, executing 32 rounds.

    VHDL description (Your VHDL Source Files)
        library IEEE;
        use ieee.std_logic_1164.all;
        use ieee.std_logic_unsigned.all;

        entity RC5_core is
            port(
                clock, reset, encr_decr: in  std_logic;
                data_input:  in  std_logic_vector(31 downto 0);
                data_output: out std_logic_vector(31 downto 0);
                out_full:    in  std_logic;
                key_input:   in  std_logic_vector(31 downto 0);
                key_read:    out std_logic
            );
        end RC5_core;

    Functional simulation

    Synthesis

    Post-synthesis simulation

    Design process control from Active-HDL

    Logic Synthesis

    architecture MLU_DATAFLOW of MLU is

    signal A1:STD_LOGIC;

    signal B1:STD_LOGIC;

    signal Y1:STD_LOGIC;

    signal MUX_0, MUX_1, MUX_2, MUX_3: STD_LOGIC;

    begin

    A1

    Synthesis Tools

    and others

    Features of synthesis tools

    Interpret RTL code
    Produce synthesized circuit netlist in a standard EDIF format
    Give preliminary performance estimates
    Some can display circuit schematics corresponding to EDIF netlist

    Timing report after synthesis

    Performance Summary
    *******************

    Worst slack in design: -0.924

                      Requested    Estimated    Requested    Estimated                Clock       Clock
    Starting Clock    Frequency    Frequency    Period       Period       Slack       Type        Group
    -------------------------------------------------------------------------------------------------------
    exam1|clk         85.0 MHz     78.8 MHz     11.765       12.688       -0.924      inferred    Inferred_clkgroup_0
    System            85.0 MHz     86.4 MHz     11.765       11.572       0.193       system      default_clkgroup
    =======================================================================================================
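    The columns of this report are related by simple arithmetic: the period is the reciprocal of the frequency, and the slack is the requested period minus the estimated period. A Python sketch with hypothetical function names, using the numbers from the report:

```python
def period_ns(frequency_mhz):
    """Clock period in ns for a frequency given in MHz."""
    return 1000.0 / frequency_mhz


def slack_ns(requested_period_ns, estimated_period_ns):
    """Positive slack means the estimate is faster than requested."""
    return requested_period_ns - estimated_period_ns


def meets_timing(requested_period_ns, estimated_period_ns):
    return slack_ns(requested_period_ns, estimated_period_ns) >= 0.0


# Values read from the report above.
CLK_SLACK = slack_ns(11.765, 12.688)      # about -0.923 ns: constraint not met
SYSTEM_SLACK = slack_ns(11.765, 11.572)   # about  0.193 ns: constraint met
```

    The rounded periods give about -0.923 ns for exam1|clk, while the report prints -0.924; the small difference presumably comes from the tool working with unrounded values.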

    Implementation

    After synthesis the entire implementation process is performed by FPGA vendor tools

    Mapping

    [Figure: logic of the design mapped onto LUT0 to LUT5 and flip-flops FF1 and FF2]

    Placing

    [Figure: mapped logic placed into CLB slices of the FPGA]

    Routing

    [Figure: placed logic connected through the programmable connections of the FPGA]

    Map report header

    Release 7.1.03i Map H.41
    Xilinx Mapping Report File for Design 'exam1'

    Design Information
    ------------------
    Command Line   : c:\Xilinx\bin\nt\map.exe -p 2S200FG256-6 -o map.ncd -pr b -k 4 -cm area -c 100 -tx off exam1.ngd exam1.pcf
    Target Device  : xc2s200
    Target Package : fg256
    Target Speed   : -6
    Mapper Version : spartan2 -- $Revision: 1.26.6.4 $
    Mapped Date    : Wed Nov 02 11:15:15 2005

    Map report

    Design Summary
    --------------
    Number of errors:      0
    Number of warnings:    0
    Logic Utilization:
      Number of Slice Flip Flops:                        144 out of 4,704    3%
      Number of 4 input LUTs:                            173 out of 4,704    3%
    Logic Distribution:
      Number of occupied Slices:                         145 out of 2,352    6%
      Number of Slices containing only related logic:    145 out of   145  100%
      Number of Slices containing unrelated logic:         0 out of   145    0%
        *See NOTES below for an explanation of the effects of unrelated logic
    Total Number 4 input LUTs:                           210 out of 4,704    4%
      Number used as logic:                              173
      Number used as a route-thru:                         5
      Number used as 16x1 RAMs:                           32
    Number of bonded IOBs:                                74 out of   176   42%
    Number of GCLKs:                                       1 out of     4   25%
    Number of GCLKIOBs:                                    1 out of     4   25%
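    The figures in the map report can be cross-checked with a few lines of Python. The function name is hypothetical, and the use of truncation rather than rounding is an inference from the numbers themselves (173 out of 4,704 is about 3.68% but is reported as 3%), not something the report states.

```python
def utilization_percent(used, available):
    """Utilization as a whole-number percentage, truncated toward zero."""
    return (100 * used) // available


# (used, available, percentage printed in the report)
REPORT_ROWS = {
    "Slice Flip Flops": (144, 4704, 3),
    "4 input LUTs": (173, 4704, 3),
    "occupied Slices": (145, 2352, 6),
    "Total 4 input LUTs": (210, 4704, 4),
    "bonded IOBs": (74, 176, 42),
    "GCLKs": (1, 4, 25),
}

# The total LUT count is the sum of the three ways a LUT is used.
TOTAL_LUTS = 173 + 5 + 32  # logic + route-thru + 16x1 RAMs
```

    Note that the 32 LUTs used as 16x1 RAMs count toward the total of 210 LUTs but not toward the 173 used as logic.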

    Place & route report

    Timing Score: 0

    Asterisk (*) preceding a constraint indicates it was not met.
    This may be due to a setup or hold violation.

    --------------------------------------------------------------------------------
      Constraint                                | Requested  | Actual     | Logic
                                                |            |            | Levels
    --------------------------------------------------------------------------------
      TS_clk = PERIOD TIMEGRP "clk" 11.765 ns   | 11.765ns   | 11.622ns   | 13
      HIGH 50%                                  |            |            |
    --------------------------------------------------------------------------------
      OFFSET = OUT 11.765 ns AFTER COMP "clk"   | 11.765ns   | 11.491ns   | 1
    --------------------------------------------------------------------------------
      OFFSET = IN 11.765 ns BEFORE COMP "clk"   | 11.765ns   | 11.442ns   | 2
    --------------------------------------------------------------------------------

    Post layout timing report

    Timing summary:
    ---------------

    Timing errors: 0  Score: 0

    Constraints cover 42912 paths, 0 nets, and 1038 connections

    Design statistics:
       Minimum period:  11.622ns (Maximum frequency: 86.044MHz)
       Minimum input required time before clock:  11.442ns
       Minimum output required time after clock:  11.491ns

    Configuration of SRAM based FPGAs

    [Figure: SRAM configuration cells chained between "Configuration data in" and "Configuration data out"; legend: I/O pin/pad, SRAM cell]

    Source: The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043
    Copyright 2004 Mentor Graphics Corp. (www.mentor.com)

    Configuration times of selected FPGA devices

    Timing of digital circuits

    Timing Characteristics of Combinational Circuits

    Combinational Circuits Are Characterized by Propagation Delays
        through logic components (gates, LUTs)
        through interconnects (routing delays)

    [Figure: chain of three LUTs annotated with tp LUT and tp routing; total propagation delay through combinational logic]

    Timing Characteristics of Combinational Circuits (2)

    Total Propagation Delay of Logic Depends on the Number of Logic Levels and Delays of Logic Components
        Number of logic levels is the number of logic components (gates, LUTs) the signal propagates through
    Routing Delays Depend on:
        Length of interconnects
        Fanout

    Timing Characteristics of Combinational Circuits (3)

    Fanout: Number of Inputs Connected to One Output
        Each input has its capacitance
        Fast switching of outputs with high fanout requires higher currents and strong drivers

    [Figure: one LUT output driving the inputs of three LUTs]

    Timing Characteristics of Combinational Circuits (4)

    In Current Technologies Routing Delays Make 45-65% of the Total Propagation Delays

    Timing Characteristics of Sequential Circuits (1)

    Timing Features of Flip-flops
        Setup time tS: minimum time the input has to be stable before the rising edge of the clock
        Hold time tH: minimum time the input has to be stable after the rising edge of the clock
        Propagation delay tP: time to propagate input to output after the rising edge of the clock

    Timing Characteristics of Sequential Circuits (2)

    [Figure: D flip-flop and waveforms of clk, D, and Q, annotated with tS, tH, and tP. Input D must remain stable during the interval from tS before to tH after the rising clock edge; input D can freely change outside this interval]
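    The setup/hold requirement amounts to a window around each rising clock edge in which D must not change. A Python sketch of that check follows; the function name and the numeric values used in the comments are hypothetical and not taken from any data sheet.

```python
def violates_setup_hold(data_change_times, clock_edge, t_setup, t_hold):
    """Return True if D changes inside the window (edge - tS, edge + tH).

    All times are in the same unit (for example ns).
    """
    window_start = clock_edge - t_setup
    window_end = clock_edge + t_hold
    return any(window_start < t < window_end for t in data_change_times)


# Hypothetical flip-flop with tS = 0.5 ns and tH = 0.2 ns, clock edge at 10 ns.
SAFE = violates_setup_hold([9.0, 10.5], clock_edge=10.0, t_setup=0.5, t_hold=0.2)
UNSAFE = violates_setup_hold([9.8], clock_edge=10.0, t_setup=0.5, t_hold=0.2)
```

    Here a change of D at 9.8 ns falls only 0.2 ns before the edge, inside the 0.5 ns setup window, so it is flagged; changes at 9.0 ns and 10.5 ns are outside the window.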

    Critical Path (1)

    Critical Path: The Longest Path From Outputs of Registers to Inputs of Registers

    [Figure: register (in) followed by combinational logic with delay t_logic and a second register (out), both clocked by clk]

    t_Critical = t_FF-P + t_logic + t_FF-setup

    Critical Path (2)

    Min. Clock Period = Length of The Critical Path
    Max. Clock Frequency = 1 / Min. Clock Period
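    These two relations can be written as a short Python sketch. The function names and the three delay values are hypothetical; the 11.622 ns figure is the minimum period from the post layout timing report shown earlier.

```python
def critical_path_ns(t_ff_p, t_logic, t_ff_setup):
    """Flip-flop propagation delay + logic delay + flip-flop setup time (ns)."""
    return t_ff_p + t_logic + t_ff_setup


def max_frequency_mhz(min_period_ns):
    """Maximum clock frequency in MHz for a minimum period given in ns."""
    return 1000.0 / min_period_ns


# Hypothetical delays: 1 ns + 8 ns + 1 ns gives a 10 ns critical path, i.e. 100 MHz.
EXAMPLE_PERIOD = critical_path_ns(1.0, 8.0, 1.0)
EXAMPLE_FMAX = max_frequency_mhz(EXAMPLE_PERIOD)

# Minimum period from the post layout timing report: 11.622 ns.
REPORT_FMAX = round(max_frequency_mhz(11.622), 3)
```

    REPORT_FMAX evaluates to 86.044, matching the maximum frequency printed in the report.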

    Clock Jitter

    Rising Edge of The Clock Does Not Occur Precisely Periodically
        May cause faults in the circuit

    [Figure: clk waveform]

    Clock Skew

    Rising Edge of the Clock Does Not Arrive at Clock Inputs of All Flip-flops at The Same Time

    [Figure: two register-to-register circuits (in, out) in which a delay in the clk line makes the clock edge reach the two flip-flops at different times]

    Clock skew

    H clock tree used to minimize clock skew

    Dealing With Clock Problems

    Use Only Dedicated Clock Nets for Clock Signals
    Do Not Put Any Logic in Clock Nets

    Timing simulation after implementation

    Timing vs. functional simulation

    Simulation before synthesis is used to verify circuit functionality, and its results may differ from those after synthesis and implementation
    Implementation tool generates an SDF (Standard Delay Format) file as a standard delay file, and the netlist for the synthesized VHDL code with delays
    Generated netlist contains many component instantiation statements with library references

    SDF file

    A part of the SDF file is shown below. It indicates XOR gate delays (low to high, high to low) of minimum, typical and worst case timing

    ( DELAYFILE
      ( CELL
        ( CELLTYPE XOR)
        ( INSTANCE U34.Z_VTX)
        ( DELAY
          ( INCREMENT
            ( DEVICE 01 (0.385090:0.385090:0.385090) (0.235177:0.235177:0.235177)
    ) ) ) )
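    Each parenthesized group in the DEVICE entry is a minimum:typical:maximum triple, the first for the low-to-high transition and the second for high-to-low. The Python sketch below extracts such triples from one line of text. It is not a full SDF parser: the regular expression and function name are illustrative, and only non-negative decimal values are handled.

```python
import re

# Matches "(min:typ:max)" with optional spaces around the numbers.
TRIPLE = re.compile(r"\(\s*([\d.]+)\s*:\s*([\d.]+)\s*:\s*([\d.]+)\s*\)")


def parse_delay_triples(text):
    """Return a list of (min, typ, max) tuples found in the text."""
    return [tuple(float(v) for v in m.groups()) for m in TRIPLE.finditer(text)]


DEVICE_LINE = "( DEVICE 01 (0.385090:0.385090:0.385090) (0.235177: 0.235177: 0.235177)"
RISE_DELAY, FALL_DELAY = parse_delay_triples(DEVICE_LINE)
```

    For the entry above, the minimum, typical, and maximum values are identical within each triple: 0.385090 for low to high and 0.235177 for high to low.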

    Netlist from the synthesis tool

    library IEEE;
    library TC200G;
    use IEEE.std_logic_1164.all;
    use TC200G.components.all;

    entity CONSYN is
        port( RSTn, CLK, D0, D1, D2, D3, D4, D5, D6, D7 : in std_logic;
              FF_OUT, COMB_OUT, FF_COMB_OUT : out std_logic);
    end CONSYN;

    architecture structural of CONSYN is
        signal XOR8, FF, n70, n71, n72, n73, n74, n75,
               n76, n67, n68, n69 : std_logic;
    begin
        FF_OUT n75,
            D => XOR8, CP => CLK, CD => RSTn) ;
        U30 : MUX21L port map( Z => n71, A => n67, B => n68, S => n69);
        U31 : EN port map( Z => n67, A => D1, B => D0);
        U32 : IV port map( Z => n68, A => n67);
        U33 : EOP port map( Z => n69, A => D6, B => D7);
        U34 : EO3 port map( Z => n70, A => D3, B => D2, C => D4);
        U35 : EO port map( Z => n72, A => D5, B => n70);
        U36 : EOP port map( Z => XOR8, A => n72, B => n71);
        U37 : FA1A port map( S => n73, CO => n76, CI => D3, A => D2, B => FF);
        U38 : EO3 port map( Z => n74, A => n68, B => n73, C => D4);
        U39 : EOP port map( Z => FF_COMB_OUT, A => D5, B => n74);
    end structural;

    Timing parameters

    parameter | definition | units | pipelining
    delay | time from point to point | ns |
    clock period | time from rising edge to rising edge of clock | ns | good
    clock frequency | 1 / clock period | MHz | good
    latency | time from input to output | ns | bad
    throughput | #output bits / time unit | Mbits/s | good

    Basic iterative architecture of the encryption/decryption unit

    [Figure: a multiplexer and a register feeding the combinational logic of one round, whose output loops back to the multiplexer; inputs: round keys, enc_dec]

    Basic iterative architecture: Timing

    [Figure: timing diagram of CLK, IN (M1, M2, M3), and OUT (C1, C2); Latency: k clock_period]

    Increasing throughput using pipelining

    [Figure: pipeline of round 1 . . . round 16; target clock period, e.g., 20 ns]

    Throughput = block size / target_clock_period
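    The throughput formula is easy to evaluate numerically. In the Python sketch below, the 20 ns clock period and the 16 rounds come from the slide, while the 64-bit block size and the comparison with an iterative unit that needs 16 clock cycles per block are assumptions made only for illustration. The function name is hypothetical.

```python
def throughput_mbits_per_s(block_size_bits, clock_period_ns, cycles_per_block=1):
    """Throughput in Mbit/s when one block leaves every cycles_per_block clock cycles."""
    return block_size_bits * 1000.0 / (cycles_per_block * clock_period_ns)


BLOCK_SIZE_BITS = 64     # assumed block size, not stated on the slide
CLOCK_PERIOD_NS = 20.0   # target clock period from the slide

# Fully pipelined: one block per clock cycle.
PIPELINED = throughput_mbits_per_s(BLOCK_SIZE_BITS, CLOCK_PERIOD_NS)

# Iterative, assuming one round per clock cycle and 16 rounds per block.
ITERATIVE = throughput_mbits_per_s(BLOCK_SIZE_BITS, CLOCK_PERIOD_NS, cycles_per_block=16)
```

    Under these assumptions the pipelined unit reaches 3200 Mbit/s and the iterative one 200 Mbit/s. Latency is unchanged at 16 clock periods in both cases, which matches the "bad" and "good" entries for latency and throughput in the timing parameters table.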

    Optimization criteria

    Degrees of freedom and possible trade-offs

    speed    area
    power    testability

    Degrees of freedom and possible trade-offs

    speed    area
    latency    throughput

    Optimization methods

    Speed optimization methods (1)

    better architecture (e.g., CLA vs. ripple-carry adder)
    pipelining
    parallel processing
    optimization options of synthesis and implementation tools

    Speed optimization methods (2)

    reducing fanout of control signals
    better state encoding
    registered outputs from the state machine