Date post: | 05-Apr-2018 |
Category: |
Documents |
Upload: | zaheer-abbas |
7/31/2019 lecture6_FPGA
George Mason University
ECE 545 Introduction to VHDL
FPGA Devices and FPGA Design Flow
ECE 545 Lecture 6
Resources
Xilinx, Inc. Spartan-3 FPGA Introduction
Features
Architectural Overview
Spartan-3 FPGA Functional Description
CLB Overview,
Block RAM Overview
Dedicated Multipliers
http://direct.xilinx.com/bvdocs/publications/ds099.pdf
Resources
Integrated Interfaces: Active-HDL with Synplify
Integrated Synthesis and Implementation
Movie Demos
Active-HDL Help
http://www.aldec.com/products/active-hdl/multimediademo/movies/active_hdl_with_synplify/
http://www.aldec.com/products/active-hdl/multimediademo/movies/fpga_synth_implement/
Two competing implementation approaches
ASIC (Application-Specific Integrated Circuit)
  designed all the way from behavioral description to physical layout
  designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
FPGA (Field-Programmable Gate Array)
  no physical layout design; design ends with a bitstream used to configure a device
  bought off the shelf and reconfigured by designers themselves
Which Way to Go?
ASICs
  High performance
  Low power
  Low cost in high volumes
FPGAs
  Off-the-shelf
  Low development cost
  Short time to market
  Reconfigurability
FPGA vendors and FPGA families
Technology Timeline
[Figure: timeline from 1945 to 2000 showing the introduction of transistors, ICs (general), SRAMs & DRAMs, microprocessors, SPLDs, CPLDs, ASICs, and FPGAs]
The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043
Copyright 2004 Mentor Graphics Corp. (www.mentor.com)
Major FPGA vendors
SRAM-based FPGAs
  Xilinx Inc. www.xilinx.com
  Altera Corp. www.altera.com
  Atmel Corp. www.atmel.com
  Lattice Semiconductor Corp. www.latticesemi.com
Antifuse and flash-based FPGAs
  Actel Corp. www.actel.com
  QuickLogic Corp. www.quicklogic.com
Feature | SRAM | Antifuse | E2PROM / FLASH
Technology node | State-of-the-art | One or more generations behind | One or more generations behind
Reprogramming speed (inc. erasing) | Fast | ---- | 3x slower than SRAM
Volatile (must be programmed on power-up) | Yes | No | No (but can be if required)
Power consumption | Medium | Low | Medium
IP Security | Acceptable (especially when using bitstream encryption) | Very Good | Very Good
Size of configuration cell | Large (six transistors) | Very small | Medium-small (two transistors)
Rad Hard | No | Yes | Not really
Instant-on | No | Yes | Yes
Requires external configuration file | Yes | No | No
Good for prototyping | Yes (very good) | No | Yes (reasonable)
Reprogrammable | Yes (in system) | No | Yes (in-system or offline)
The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043
Copyright 2004 Mentor Graphics Corp. (www.mentor.com)
The Programmable Marketplace, Q1 Calendar Year 2005
Source: Company reports. Latest information available; computed on a 4-quarter rolling basis
[Figure: two pie charts of market share]
PLD Segment: Xilinx 51%, Altera 33%, Lattice, Actel: 5%, 7%, QuickLogic 2%, Other 2%
FPGA Sub-Segment: Xilinx 58%, Altera 31%, All Others 11%
Two dominant suppliers, indicating a maturing market
PLD Market Share
Source: Gartner Dataquest
Calendar year | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004
Market size | $2.3B | $2.6B | $4.1B | $2.6B | $2.1B | $2.6B | $3.1B
Xilinx (%) | 30% | 35% | 38% | 44% | 49% | 50% | 51%
Altera (%) | 31% | 33% | 34% | 32% | 31% | 32% | 32%
All Others (%) | 39% | 32% | 28% | 24% | 20% | 18% | 17%
FPGA families
Vendor | Low-cost | High-performance
Xilinx | Spartan 3, Spartan 3E, Spartan 3L | Virtex 4 LX / SX / FX, Virtex 5 LX
Altera | Cyclone II | Stratix II, Stratix II GX
Xilinx
Primary products: FPGAs and the associated CAD software
  Programmable Logic Devices
  ISE Alliance and Foundation Series Design Software
Main headquarters in San Jose, CA
Fabless* Semiconductor and Software Company
  UMC (Taiwan) {*Xilinx acquired an equity stake in UMC in 1996}
  Seiko Epson (Japan)
  TSMC (Taiwan)
Source: [Xilinx Inc.]
Xilinx FPGA Families
Old families
  XC3000, XC4000, XC5200
  Old 0.5 µm, 0.35 µm and 0.25 µm technology. Not recommended for modern designs.
Low Cost Family
  Spartan/XL: derived from XC4000
  Spartan-II: derived from Virtex
  Spartan-IIE: derived from Virtex-E
  Spartan-3, Spartan 3E, Spartan 3L
High-performance families
  Virtex (220 nm)
  Virtex-E, Virtex-EM (180 nm)
  Virtex-II, Virtex-II PRO (130 nm)
  Virtex-4 (90 nm)
  Virtex 5 (65 nm)
Source: [Xilinx Inc.]
Prices of the most recent families of Xilinx FPGAs
Category | Family | Price
Low-cost | Spartan 3 | < $130*
Low-cost | Spartan 3E | < $35*
High-performance | Virtex II, Virtex II-Pro | < $3,000*
High-performance | Virtex 4 | < $3,000*
* approximate cost of the largest device per unit for a batch of 10,000 units
Xilinx FPGAs
[Figure: Xilinx FPGA layout showing configurable logic blocks, I/O blocks, and block RAMs]
Xilinx CLB
[Figure: array of configurable logic blocks (CLBs); each CLB contains four slices and each slice contains two logic cells]
The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043
Copyright 2004 Mentor Graphics Corp. (www.mentor.com)
Simplified view of a Xilinx Logic Cell
[Figure: 4-input LUT (also usable as 16x1 RAM or 16-bit SR) with inputs a, b, c, d and output y, followed by a mux and a flip-flop with inputs e, clock, clock enable, set/reset and output q]
The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043
Copyright 2004 Mentor Graphics Corp. (www.mentor.com)
LUT (Look-Up Table) Functionality
Look-Up tables are primary elements for logic implementation
Each LUT can implement any function of 4 inputs
[Figure: a LUT with inputs x1, x2, x3, x4 and output y, shown with two example truth tables listed for x1 x2 x3 x4 = 0000, 0001, ..., 1111]
Example 1 (function of x1, x2, x3, x4): y = 0 1 0 0 0 1 0 1 0 1 0 0 1 1 0 0
Example 2 (depends only on x1, x2): y = 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0
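A LUT is simply a 16-entry truth table addressed by its four inputs. The following is a minimal behavioral sketch, assuming the second example table on this slide (y = 1 for the first twelve input combinations, 0 for the last four); the entity name `lut4_example` and the constant `INIT` are hypothetical names chosen for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity lut4_example is
  port (
    x1, x2, x3, x4 : in  std_logic;
    y              : out std_logic
  );
end entity lut4_example;

architecture behavioral of lut4_example is
  -- Truth-table contents; index 0 corresponds to x1 x2 x3 x4 = "0000",
  -- index 15 to "1111" (x1 is the most significant address bit).
  constant INIT : std_logic_vector(0 to 15) := "1111111111110000";
  signal addr   : std_logic_vector(3 downto 0);
begin
  addr <= x1 & x2 & x3 & x4;
  -- The LUT output is the table entry selected by the inputs.
  y <= INIT(to_integer(unsigned(addr)));
end architecture behavioral;
```

Changing only the `INIT` string yields any other function of the same four inputs, which is the point the slide makes.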
5-Input Functions implemented using two LUTs
X5 X4 X3 X2 X1 Y
0 0 0 0 0 0
0 0 0 0 1 1
0 0 0 1 0 0
0 0 0 1 1 0
0 0 1 0 0 1
0 0 1 0 1 1
0 0 1 1 0 0
0 0 1 1 1 0
0 1 0 0 0 1
0 1 0 0 1 0
0 1 0 1 0 0
0 1 0 1 1 1
0 1 1 0 0 1
0 1 1 0 1 1
0 1 1 1 0 1
0 1 1 1 1 1
1 0 0 0 0 0
1 0 0 0 1 0
1 0 0 1 0 0
1 0 0 1 1 0
1 0 1 0 0 0
1 0 1 0 1 0
1 0 1 1 0 0
1 0 1 1 1 1
1 1 0 0 0 0
1 1 0 0 1 1
1 1 0 1 0 0
1 1 0 1 1 1
1 1 1 0 0 0
1 1 1 0 1 1
1 1 1 1 0 0
1 1 1 1 1 0
[Figure: two LUTs combined to produce the output OUT]
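One way to realize the truth table above with two 4-input LUTs is to store the X5 = 0 half in one LUT, the X5 = 1 half in the other, and let X5 select between them. This is a sketch under that assumption (the slide does not show how the two LUT outputs are combined); the entity and constant names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity func5 is
  port (
    x : in  std_logic_vector(4 downto 0);  -- x(4) = X5 ... x(0) = X1
    y : out std_logic
  );
end entity func5;

architecture two_luts of func5 is
  -- Index 0 corresponds to X4 X3 X2 X1 = "0000".
  -- LUT_LO holds the Y column for X5 = 0, LUT_HI for X5 = 1.
  constant LUT_LO : std_logic_vector(0 to 15) := "0100110010011111";
  constant LUT_HI : std_logic_vector(0 to 15) := "0000000101010100";
  signal lo, hi : std_logic;
begin
  lo <= LUT_LO(to_integer(unsigned(x(3 downto 0))));
  hi <= LUT_HI(to_integer(unsigned(x(3 downto 0))));
  -- X5 selects which half of the truth table drives the output.
  y  <= hi when x(4) = '1' else lo;
end architecture two_luts;
```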
Distributed RAM
[Figure: distributed RAM primitives built from LUTs: RAM16X1S (D, WE, WCLK, A0-A3, O), RAM32X1S (D, WE, WCLK, A0-A4, O), RAM16X2S (D0, D1, WE, WCLK, A0-A3, O0, O1), RAM16X1D (D, WE, WCLK, A0-A3, DPRA0-DPRA3, SPO, DPO)]
CLB LUT configurable as Distributed RAM
  A LUT equals 16x1 RAM
  Implements Single and Dual-Ports
  Cascade LUTs to increase RAM size
Synchronous write
Synchronous/Asynchronous read
  Accompanying flip-flops used for synchronous read
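The behavior described above (synchronous write, asynchronous read, 16x1 organization) can be sketched in VHDL as follows. This is an illustrative description that synthesis tools would typically map to LUT-based RAM; the entity name `dist_ram16x1` is hypothetical, and the port names merely mirror the RAM16X1S pin names shown on the slide.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity dist_ram16x1 is
  port (
    wclk : in  std_logic;                     -- write clock
    we   : in  std_logic;                     -- write enable
    a    : in  std_logic_vector(3 downto 0);  -- address
    d    : in  std_logic;                     -- data in
    o    : out std_logic                      -- data out
  );
end entity dist_ram16x1;

architecture rtl of dist_ram16x1 is
  type ram_t is array (0 to 15) of std_logic;
  signal ram : ram_t := (others => '0');
begin
  -- Synchronous write on the rising clock edge.
  process (wclk)
  begin
    if rising_edge(wclk) then
      if we = '1' then
        ram(to_integer(unsigned(a))) <= d;
      end if;
    end if;
  end process;

  -- Asynchronous read: the output follows the address combinationally.
  o <= ram(to_integer(unsigned(a)));
end architecture rtl;
```

Registering `o` in a clocked process would give the synchronous-read variant that uses the accompanying flip-flop.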
Shift Register
[Figure: LUT configured as a chain of D flip-flops with clock enable (D, Q, CE); inputs IN, CE, CLK, DEPTH[3:0]; output OUT]
Each LUT can be configured as shift register
  Serial in, serial out
Dynamically addressable delay up to 16 cycles
For programmable pipeline
Cascade for greater cycle delays
Use CLB flip-flops to add depth
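A dynamically addressable shift register of this kind can be sketched as below. The description has no reset and a single serial input, which is the pattern synthesis tools commonly map onto a LUT-based shift register; that mapping is tool-dependent, and the entity name `srl16_example` is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity srl16_example is
  port (
    clk   : in  std_logic;
    ce    : in  std_logic;                     -- clock enable
    din   : in  std_logic;                     -- serial input (IN on the slide)
    depth : in  std_logic_vector(3 downto 0);  -- selects the tap
    dout  : out std_logic                      -- serial output (OUT on the slide)
  );
end entity srl16_example;

architecture rtl of srl16_example is
  signal sr : std_logic_vector(15 downto 0) := (others => '0');
begin
  -- Shift one position on every enabled clock edge; no reset is used.
  process (clk)
  begin
    if rising_edge(clk) then
      if ce = '1' then
        sr <= sr(14 downto 0) & din;
      end if;
    end if;
  end process;

  -- depth = 0 gives a delay of 1 cycle, depth = 15 a delay of 16 cycles.
  dout <= sr(to_integer(unsigned(depth)));
end architecture rtl;
```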
Shift Register
Register-rich FPGA
  Allows for addition of pipeline stages to increase throughput
Data paths must be balanced to keep desired functionality
[Figure: two 64-bit data paths. One passes through Operation A (4 cycles) and Operation B (8 cycles), 12 cycles in total; the other passes through Operation C (3 cycles); the result is a 9-cycle imbalance]
Xilinx Multipurpose LUT
[Figure: the same LUT can serve as a 4-input LUT, a 16 x 1 RAM, or a 16-bit SR]
The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043
Copyright 2004 Mentor Graphics Corp. (www.mentor.com)
Carry & Control Logic
[Figure: SLICE containing two look-up tables (inputs G4-G1 and F4-F1, output O), two carry & control logic blocks, and two D flip-flops; slice signals include CIN, COUT, CLK, CE, SR, BY, F5IN and outputs Y, YB, X, XB]
Fast Carry Logic
Each CLB contains separate logic and routing for the fast generation of sum & carry signals
  Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
Carry logic is independent of normal logic and routing resources
[Figure: carry logic chain running from LSB to MSB, separate from general routing]
Accessing Carry Logic
All major synthesis tools can infer carry logic for arithmetic functions
Addition (SUM
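As an illustration of inference, the sketch below describes an adder with the `+` operator only; a synthesis tool targeting a device with dedicated carry logic would typically map it onto the carry chain without any explicit instantiation. The entity name `adder_example`, the generic `WIDTH`, and the port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity adder_example is
  generic (WIDTH : positive := 16);
  port (
    a, b : in  std_logic_vector(WIDTH - 1 downto 0);
    sum  : out std_logic_vector(WIDTH downto 0)  -- one extra bit for the carry out
  );
end entity adder_example;

architecture rtl of adder_example is
begin
  -- Operands are extended by one bit so the carry out is not lost.
  sum <= std_logic_vector(resize(unsigned(a), WIDTH + 1) +
                          resize(unsigned(b), WIDTH + 1));
end architecture rtl;
```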
CLB Slice Structure
Each slice contains two sets of the following:
  Four-input LUT
    Any 4-input logic function,
    or 16-bit x 1 sync RAM (SLICEM only)
    or 16-bit shift register (SLICEM only)
  Carry & Control
    Fast arithmetic logic
    Multiplier logic
    Multiplexer logic
  Storage element
    Latch or flip-flop
    Set and reset
    True or inverted inputs
    Sync. or async. control
Block RAM (BRAM)
Block RAM
[Figure: Spartan-3 Dual-Port Block RAM with Port A and Port B]
Most efficient memory implementation
  Dedicated blocks of memory
Ideal for most memory requirements
  4 to 104 memory blocks
  18 kbits = 18,432 bits per block (16 k without parity bits)
  Use multiple blocks for larger memories
Builds both single and true dual-port RAMs
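A single block of 18,432 bits can be described, for example, as a 1024 x 18 single-port memory with a registered (synchronous) read. The sketch below is one common inference style, not a vendor template; whether a given tool maps it to block RAM depends on the tool and its settings, and the entity name `bram_1kx18` and port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity bram_1kx18 is
  port (
    clk  : in  std_logic;
    en   : in  std_logic;                      -- port enable
    we   : in  std_logic;                      -- write enable
    addr : in  std_logic_vector(9 downto 0);   -- 1024 locations
    din  : in  std_logic_vector(17 downto 0);  -- 16 data bits + 2 parity bits
    dout : out std_logic_vector(17 downto 0)
  );
end entity bram_1kx18;

architecture rtl of bram_1kx18 is
  type ram_t is array (0 to 1023) of std_logic_vector(17 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        if we = '1' then
          ram(to_integer(unsigned(addr))) <= din;
        end if;
        -- Registered read; on a write the previously stored word is returned.
        dout <= ram(to_integer(unsigned(addr)));
      end if;
    end if;
  end process;
end architecture rtl;
```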
RAM Blocks and Multipliers in Xilinx FPGAs
[Figure: columns of RAM blocks and multipliers embedded among the logic blocks]
The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043
Copyright 2004 Mentor Graphics Corp. (www.mentor.com)
Spartan-3 Block RAM Amounts
Block RAM Port Aspect Ratios
Organization | Data width (bits) | Address range
16k x 1 | 1 | 0 to 16,383
8k x 2 | 2 | 0 to 8,191
4k x 4 | 4 | 0 to 4,095
2k x (8+1) | 8+1 | 0 to 2,047
1024 x (16+2) | 16+2 | 0 to 1,023
Single-Port Block RAM
Dual-Port Bus Flexibility
Each port can be configured with a different data bus width
Provides easy data width conversion without any additional logic
[Figure: RAMB4_S16_S8. Port A: 1K-bit depth in, 18-bit width out; signals WEA, ENA, RSTA, ADDRA[9:0], CLKA, DIA[17:0], DOA[17:0]. Port B: 2k-bit depth in, 9-bit width out; signals WEB, ENB, RSTB, ADDRB[10:0], CLKB, DIB[8:0], DOB[8:0]]
Two Independent Single-Port RAMs
Added advantage of True Dual-Port
  No wasted RAM Bits
Can split a Dual-Port 16K RAM into two Single-Port 8K RAM
Simultaneous independent access to each RAM
To access the lower RAM
  Tie the MSB address bit to Logic Low
To access the upper RAM
  Tie the MSB address bit to Logic High
[Figure: RAMB4_S1_S1 with the two ports addressed as 0, ADDR[12:0] and 1, ADDR[12:0]. Port A: 8K-bit depth in, 1-bit width out; signals WEA, ENA, RSTA, ADDRA[12:0], CLKA, DIA[0], DOA[0]. Port B: 8K-bit depth in, 1-bit width out; signals WEB, ENB, RSTB, ADDRB[12:0], CLKB, DIB[0], DOB[0]]
Embedded Multipliers
18 x 18 Embedded Multiplier
Fast arithmetic functions
  Optimized to implement multiply / accumulate modules
18 x 18 signed multiplier
Fully combinational
Optional registers with CE & RST (pipeline)
Independent from adjacent block RAM
18 x 18 Multiplier
Embedded 18-bit x 18-bit multiplier
2s complement signed operation
Multipliers are organized in columns
[Figure: 18 x 18 multiplier with inputs Data_A (18 bits) and Data_B (18 bits) and Output (36 bits)]
Positions of Multipliers
Asynchronous 18-bit Multiplier
18-bit Multiplier with Register
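Both variants named on these slides, the combinational (asynchronous) multiplier and the multiplier with an output register, can be sketched in one entity. This is an illustrative description using the `*` operator on signed operands; the entity name `mult18x18` and the port names are hypothetical, and mapping to an embedded multiplier is left to the synthesis tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mult18x18 is
  port (
    clk    : in  std_logic;
    ce     : in  std_logic;                      -- clock enable of the output register
    rst    : in  std_logic;                      -- synchronous reset of the output register
    a, b   : in  std_logic_vector(17 downto 0);  -- two's complement operands
    p_comb : out std_logic_vector(35 downto 0);  -- combinational product
    p_reg  : out std_logic_vector(35 downto 0)   -- registered (pipelined) product
  );
end entity mult18x18;

architecture rtl of mult18x18 is
  signal prod : signed(35 downto 0);
begin
  -- 18-bit x 18-bit signed multiplication gives a 36-bit result.
  prod   <= signed(a) * signed(b);
  p_comb <= std_logic_vector(prod);

  -- Optional pipeline register with clock enable and reset.
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        p_reg <= (others => '0');
      elsif ce = '1' then
        p_reg <= std_logic_vector(prod);
      end if;
    end if;
  end process;
end architecture rtl;
```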
Input/Output Blocks (IOBs)
Basic I/O Block Structure
[Figure: I/O block with three flip-flops (D, EC, Q, SR), one each on the three-state control path, the output path, and the input path; signals include Three-State, Output, Clock, Set/Reset, FF Enable, Direct Input, and Registered Input]
IOB Functionality
IOB provides interface between the package pins and CLBs
Each IOB can work as uni- or bi-directional I/O
Outputs can be forced into High Impedance
Inputs and outputs can be registered
  advised for high-performance I/O
Inputs can be delayed
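The features listed above can be sketched as a bidirectional pin with registered input, registered output, and a registered three-state control. Whether the registers are actually packed into the IOB depends on tool settings; the entity name `bidir_io` and the port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity bidir_io is
  port (
    clk  : in    std_logic;
    oe   : in    std_logic;  -- '1' drives the pin, '0' releases it
    dout : in    std_logic;  -- value from internal logic to the pin
    din  : out   std_logic;  -- registered value read from the pin
    pad  : inout std_logic   -- package pin
  );
end entity bidir_io;

architecture rtl of bidir_io is
  signal out_reg, oe_reg, in_reg : std_logic := '0';
begin
  -- One flip-flop each for the output, three-state control, and input paths.
  process (clk)
  begin
    if rising_edge(clk) then
      out_reg <= dout;
      oe_reg  <= oe;
      in_reg  <= pad;
    end if;
  end process;

  -- The output is forced into high impedance when it is not enabled.
  pad <= out_reg when oe_reg = '1' else 'Z';
  din <= in_reg;
end architecture rtl;
```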
Spartan-3 Family Attributes
Spartan-3 FPGA Family Members
FPGA Design Flow
Design process (1)
Specification
Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in experiment 5, this time your unit has to be able to perform the encryption algorithm by itself, executing 32 rounds.
VHDL description (Your VHDL Source Files)
library IEEE;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
entity RC5_core is
  port(clock, reset, encr_decr: in std_logic;
       data_input: in std_logic_vector(31 downto 0);
       data_output: out std_logic_vector(31 downto 0);
       out_full: in std_logic;
       key_input: in std_logic_vector(31 downto 0);
       key_read: out std_logic
  );
end RC5_core;
Functional simulation
Synthesis
Post-synthesis simulation
Design Process control from Active-HDL
Logic Synthesis
architecture MLU_DATAFLOW of MLU is
signal A1:STD_LOGIC;
signal B1:STD_LOGIC;
signal Y1:STD_LOGIC;
signal MUX_0, MUX_1, MUX_2, MUX_3: STD_LOGIC;
begin
A1
Synthesis Tools
and others
Features of synthesis tools
Interpret RTL code
Produce synthesized circuit netlist in a standard EDIF format
Give preliminary performance estimates
Some can display circuit schematics corresponding to EDIF netlist
Timing report after synthesis
Performance Summary
*******************
Worst slack in design: -0.924

                Requested  Estimated  Requested  Estimated           Clock     Clock
Starting Clock  Frequency  Frequency  Period     Period     Slack    Type      Group
---------------------------------------------------------------------------------------------------
exam1|clk       85.0 MHz   78.8 MHz   11.765     12.688     -0.924   inferred  Inferred_clkgroup_0
System          85.0 MHz   86.4 MHz   11.765     11.572     0.193    system    default_clkgroup
===================================================================================================
Implementation
After synthesis the entire implementation process is performed by FPGA vendor tools
Mapping
[Figure: netlist elements LUT0-LUT5 and FF1, FF2 mapped onto FPGA resources]
Placing
[Figure: mapped logic placed into CLB slices of the FPGA]
Routing
[Figure: placed logic connected through the programmable connections of the FPGA]
Map report header
Release 7.1.03i Map H.41
Xilinx Mapping Report File for Design 'exam1'

Design Information
------------------
Command Line   : c:\Xilinx\bin\nt\map.exe -p 2S200FG256-6 -o map.ncd -pr b -k 4 -cm area -c 100 -tx off exam1.ngd exam1.pcf
Target Device  : xc2s200
Target Package : fg256
Target Speed   : -6
Mapper Version : spartan2 -- $Revision: 1.26.6.4 $
Mapped Date    : Wed Nov 02 11:15:15 2005
Map report
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of Slice Flip Flops: 144 out of 4,704 3%
  Number of 4 input LUTs: 173 out of 4,704 3%
Logic Distribution:
  Number of occupied Slices: 145 out of 2,352 6%
  Number of Slices containing only related logic: 145 out of 145 100%
  Number of Slices containing unrelated logic: 0 out of 145 0%
  *See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 210 out of 4,704 4%
  Number used as logic: 173
  Number used as a route-thru: 5
  Number used as 16x1 RAMs: 32
Number of bonded IOBs: 74 out of 176 42%
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%
Place & route report
Timing Score: 0
Asterisk (*) preceding a constraint indicates it was not met.
This may be due to a setup or hold violation.
--------------------------------------------------------------------------------
Constraint                                | Requested | Actual    | Logic Levels
--------------------------------------------------------------------------------
TS_clk = PERIOD TIMEGRP "clk" 11.765 ns   | 11.765ns  | 11.622ns  | 13
HIGH 50%                                  |           |           |
--------------------------------------------------------------------------------
OFFSET = OUT 11.765 ns AFTER COMP "clk"   | 11.765ns  | 11.491ns  | 1
--------------------------------------------------------------------------------
OFFSET = IN 11.765 ns BEFORE COMP "clk"   | 11.765ns  | 11.442ns  | 2
--------------------------------------------------------------------------------
Post layout timing report
Timing summary:
---------------
Timing errors: 0 Score: 0
Constraints cover 42912 paths, 0 nets, and 1038 connections
Design statistics:
  Minimum period: 11.622ns (Maximum frequency: 86.044MHz)
  Minimum input required time before clock: 11.442ns
  Minimum output required time after clock: 11.491ns
Configuration of SRAM based FPGAs
[Figure: SRAM configuration cells connected in a chain between the "Configuration data in" and "Configuration data out" I/O pins/pads]
The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043
Copyright 2004 Mentor Graphics Corp. (www.mentor.com)
Configuration times of selected FPGA devices
Timing of digital circuits
Timing Characteristics of Combinational Circuits
Combinational Circuits Are Characterized by Propagation Delays
  through logic components (gates, LUTs)
  through interconnects (routing delays)
[Figure: chain of three LUTs; the total propagation delay through combinational logic is made up of LUT delays (tp LUT) and routing delays (tp routing)]
Timing Characteristics of Combinational Circuits (2)
Total Propagation Delay of Logic Depends on the Number of Logic Levels and Delays of Logic Components
  Number of logic levels is the number of logic components (gates, LUTs) the signal propagates through
Routing Delays Depend on:
  Length of interconnects
  Fanout
Timing Characteristics of Combinational Circuits (3)
Fanout: Number of Inputs Connected to One Output
  Each input has its own capacitance
  Fast switching of outputs with high fanout requires higher currents and strong drivers
[Figure: one LUT output driving the inputs of three LUTs]
Timing Characteristics of Combinational Circuits (4)
In Current Technologies Routing Delays Make Up 45-65% of the Total Propagation Delays
Timing Characteristics of Sequential Circuits (1)
Timing Features of Flip-flops
  Setup time tS: minimum time the input has to be stable before the rising edge of the clock
  Hold time tH: minimum time the input has to be stable after the rising edge of the clock
  Propagation delay tP: time to propagate input to output after the rising edge of the clock
Timing Characteristics of Sequential Circuits (2)
[Figure: D flip-flop (D, Q, clk) and its timing diagram. Input D must remain stable during the interval from tS before to tH after the rising clock edge and can freely change outside this interval; Q changes tP after the edge]
Critical Path (1)
Critical Path: The Longest Path From Outputs of Registers to Inputs of Registers
[Figure: register (D, Q) at input "in" driving combinational logic with delay t_logic into a register (D, Q) at output "out", both clocked by clk]
t_Critical = t_FF-P + t_logic + t_FF-setup
Critical Path (2)
Min. Clock Period = Length of The Critical Path
Max. Clock Frequency = 1 / Min. Clock Period
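As a worked check of these two relations, the minimum period of 11.622 ns reported in the post-layout timing report earlier in this lecture can be converted to a maximum clock frequency. The symbols $T_{\min}$ and $f_{\max}$ are notation introduced here for illustration.

```latex
\begin{align*}
T_{\min} &= t_{\text{Critical}} = t_{\text{FF-P}} + t_{\text{logic}} + t_{\text{FF-setup}} \\
f_{\max} &= \frac{1}{T_{\min}} = \frac{1}{11.622\ \text{ns}} \approx 86.044\ \text{MHz}
\end{align*}
```

The result agrees with the maximum frequency printed in that report.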
Clock Jitter
Rising Edge of The Clock Does Not Occur Precisely Periodically
  May cause faults in the circuit
[Figure: clk waveform]
Clock Skew
Rising Edge of the Clock Does Not Arrive at Clock Inputs of All Flip-flops at The Same Time
[Figure: two circuits, each with two registers (D, Q) between "in" and "out" sharing clk, with a delay inserted in the clock path between the registers]
H-clock tree used to minimize clock skew
Dealing With Clock Problems
Use Only Dedicated Clock Nets for Clock Signals
Do Not Put Any Logic in Clock Nets
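A common way to follow the second rule is to replace a gated clock with a clock enable, so that the clock net itself stays free of logic. The sketch below shows this for a simple counter; the entity name `ce_counter` and its ports are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ce_counter is
  port (
    clk    : in  std_logic;
    rst    : in  std_logic;  -- synchronous reset
    enable : in  std_logic;  -- clock enable
    count  : out std_logic_vector(7 downto 0)
  );
end entity ce_counter;

architecture rtl of ce_counter is
  signal cnt : unsigned(7 downto 0) := (others => '0');
begin
  -- To be avoided: gated_clk <= clk and enable;  (logic in the clock net)
  -- Instead, every flip-flop is clocked by clk directly and the enable
  -- decides whether the register is updated.
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        cnt <= (others => '0');
      elsif enable = '1' then
        cnt <= cnt + 1;
      end if;
    end if;
  end process;

  count <= std_logic_vector(cnt);
end architecture rtl;
```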
Timing simulation after implementation
Timing vs. functional simulation
Simulation before synthesis is used to verify circuit functionality; its results may differ from those of simulation after synthesis and implementation
Implementation tool generates an SDF (Standard Delay Format) file as a standard delay file, together with the netlist for the synthesized VHDL code with delays
Generated netlist contains many component instantiation statements with library references
SDF file
A part of the SDF file is shown below. It indicates XOR gate delays (low to high, high to low) of minimum, typical and worst case timing
(DELAYFILE
  (CELL (CELLTYPE XOR)
    (INSTANCE U34.Z_VTX)
    (DELAY (INCREMENT
      (DEVICE 01 (0.385090:0.385090:0.385090) (0.235177:0.235177:0.235177)
) ) ) )
Netlist from the synthesis tool
library IEEE;
library TC200G;
use IEEE.std_logic_1164.all;
use TC200G.components.all;
entity CONSYN is
  port( RSTn, CLK, D0, D1, D2, D3, D4, D5,
        D6, D7 : in std_logic; FF_OUT,
        COMB_OUT, FF_COMB_OUT : out std_logic);
end CONSYN;
architecture structural of CONSYN is
  signal XOR8, FF, n70, n71, n72, n73, n74, n75,
         n76, n67, n68, n69 : std_logic;
begin
FF_OUT n75,
    D => XOR8, CP => CLK, CD => RSTn) ;
  U30 : MUX21L port map( Z => n71, A => n67, B => n68, S => n69);
  U31 : EN port map( Z => n67, A => D1, B => D0);
  U32 : IV port map( Z => n68, A => n67);
  U33 : EOP port map( Z => n69, A => D6, B => D7);
  U34 : EO3 port map( Z => n70, A => D3, B => D2, C => D4);
  U35 : EO port map( Z => n72, A => D5, B => n70);
  U36 : EOP port map( Z => XOR8, A => n72, B => n71);
  U37 : FA1A port map( S => n73, CO => n76, CI => D3, A => D2, B => FF);
  U38 : EO3 port map( Z => n74, A => n68, B => n73, C => D4);
  U39 : EOP port map( Z => FF_COMB_OUT, A => D5, B => n74);
end structural;
Timing parameters
parameter | definition | units | pipelining
delay | time point → point | ns |
clock period | rising edge → rising edge of clock | ns | good
clock frequency | 1 / clock period | MHz | good
latency | time input → output | ns | bad
throughput | #output bits / time unit | Mbits/s | good
Basic iterative architecture of the encryption/decryption unit
[Figure: iterative datapath consisting of a multiplexer, a register, and combinational logic implementing one round, with round keys and enc_dec as inputs]
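The structure in the figure (multiplexer, register, one round of combinational logic reused every clock cycle) can be sketched as follows. The round function here is a placeholder (rotate left by one bit, then XOR with the round key) and is not a real cipher round; the entity name, the generic `ROUNDS`, and all port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity iterative_unit is
  generic (ROUNDS : positive := 16);
  port (
    clk       : in  std_logic;
    start     : in  std_logic;  -- loads a new input block
    data_in   : in  std_logic_vector(31 downto 0);
    round_key : in  std_logic_vector(31 downto 0);  -- key for the current round
    data_out  : out std_logic_vector(31 downto 0);
    done      : out std_logic
  );
end entity iterative_unit;

architecture rtl of iterative_unit is
  signal state_reg : std_logic_vector(31 downto 0) := (others => '0');
  signal round_out : std_logic_vector(31 downto 0);
  signal count     : natural range 0 to ROUNDS := 0;
begin
  -- Combinational logic for one round (placeholder function only).
  round_out <= (state_reg(30 downto 0) & state_reg(31)) xor round_key;

  process (clk)
  begin
    if rising_edge(clk) then
      if start = '1' then
        state_reg <= data_in;    -- multiplexer selects the new input
        count     <= 0;
      elsif count < ROUNDS then
        state_reg <= round_out;  -- multiplexer selects the round feedback
        count     <= count + 1;
      end if;
    end if;
  end process;

  data_out <= state_reg;
  done     <= '1' when count = ROUNDS else '0';
end architecture rtl;
```

One block occupies the unit for `ROUNDS` clock cycles, which is why this architecture is small but has lower throughput than a pipelined one.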
Basic iterative architecture: Timing
[Figure: timing diagram with CLK, IN (M1, M2, M3) and OUT (C1, C2)]
Latency = k · clock_period
Increasing throughput using pipelining
[Figure: pipeline of round 1 ... round 16 with a target clock period of, e.g., 20 ns]
Throughput = block size / target_clock_period
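A worked comparison may help here. It assumes a 64-bit block (a hypothetical value, not stated on the slide), 16 rounds, and the 20 ns clock period given as an example, and it further assumes that the iterative architecture processes one block at a time, so that its throughput is the block size divided by its latency.

```latex
\begin{align*}
\text{Latency}_{\text{iterative}} &= k \cdot T_{\text{clk}} = 16 \cdot 20\ \text{ns} = 320\ \text{ns} \\
\text{Throughput}_{\text{iterative}} &= \frac{\text{block size}}{k \cdot T_{\text{clk}}} = \frac{64\ \text{bits}}{320\ \text{ns}} = 200\ \text{Mbits/s} \\
\text{Throughput}_{\text{pipelined}} &= \frac{\text{block size}}{T_{\text{clk}}} = \frac{64\ \text{bits}}{20\ \text{ns}} = 3200\ \text{Mbits/s}
\end{align*}
```

Under these assumptions pipelining raises throughput by a factor equal to the number of rounds, while the latency of a single block is not reduced.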
Optimization criteria
Degrees of freedom and possible trade-offs
speed, area, power, testability
[Figure: trade-off between speed (latency, throughput) and area]
Optimization methods
Speed optimization methods (1)
better architecture (e.g., CLA vs. ripple carry adder)
pipelining
parallel processing
optimization options of synthesis and implementation tools
Speed optimization methods (2)
reducing fanout of control signals
better state encoding
registered outputs from the state machine
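The last item can be illustrated with a small state machine whose output is assigned inside the clocked process, so it comes directly from a flip-flop rather than from decoding logic after the state register. This is a sketch only; the entity name `fsm_reg_out`, the states, and the ports are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity fsm_reg_out is
  port (
    clk  : in  std_logic;
    rst  : in  std_logic;  -- synchronous reset
    go   : in  std_logic;
    busy : out std_logic   -- registered output
  );
end entity fsm_reg_out;

architecture rtl of fsm_reg_out is
  type state_t is (IDLE, RUN, FINISH);
  signal state : state_t := IDLE;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state <= IDLE;
        busy  <= '0';
      else
        case state is
          when IDLE =>
            if go = '1' then
              state <= RUN;
              busy  <= '1';  -- output changes together with the state
            end if;
          when RUN =>
            state <= FINISH;
          when FINISH =>
            state <= IDLE;
            busy  <= '0';
        end case;
      end if;
    end if;
  end process;
end architecture rtl;
```

Because `busy` is driven by its own flip-flop, no combinational decoding of the state sits between the state register and the output.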