FPGA Architecture

Field Programmable Gate Array (FPGA)

Classification of the Programmable Devices:

- IC semiconductor technology used
- Volatile and non-volatile
- Programmability
- Flexibility
- Capacity
- Routing method
- Characteristics of the basic cell (virtual classification)

Example: 16-bit adder

- It can be implemented on an FPGA using only 20 cells, each built from fewer than 100 transistors. Hence, it needs fewer than 2,000 transistors.

- It needs about 73,000,000,000 bits when implemented using bit-cell devices such as ROM. Each bit cell needs about 1.5 transistors on average. Hence, a bit-cell implementation requires about 109,500,000,000 transistors.
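The ROM figure can be reproduced with a short calculation. This sketch assumes the ROM is addressed by both 16-bit operands (32 address bits, no carry-in) and stores a 17-bit result (16 sum bits plus a carry-out) in every word, so the bit count is $2^{32} \times 17$. The function name is illustrative.

```python
# Size of a ROM that implements an adder as a pure look-up table.
# Assumption: 2 * width address bits (both operands, no carry-in) and
# width + 1 data bits per word (sum bits plus carry-out).

def rom_adder_bits(width: int) -> int:
    """Number of ROM bit cells for a width-bit adder stored as a table."""
    address_bits = 2 * width          # both operands form the address
    words = 2 ** address_bits         # one word per input combination
    bits_per_word = width + 1         # sum bits plus carry-out
    return words * bits_per_word


bits = rom_adder_bits(16)
transistors = bits * 1.5              # average of 1.5 transistors per bit cell

print(f"{bits:,} bits")               # 73,014,444,032 bits
print(f"{transistors:,.0f} transistors")  # 109,521,666,048 transistors
```

The exact values round to the 73,000,000,000 bits and roughly 109,500,000,000 transistors quoted above.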

The use of programmable devices in DSP Systems

- Most DSP systems use gate-cell devices because they allow the Boolean expressions to be reduced.

- PLAs and PALs (SPLDs) have low flexibility and low capacity (at most 32 inputs and 32 outputs); therefore, they are used to implement only small DSP circuits.

- The Complex Programmable Logic Device (CPLD) is the first generation of the FPGA. It has high flexibility but low capacity, and it is used to implement small circuits and medium-sized systems with small control circuits.

- The FPGA is a newer type of programmable device with both high flexibility and high capacity. Its internal architecture differs from vendor to vendor (such as Altera, Xilinx, etc.).

Architecture of Field Programmable Gate Arrays (FPGA)

- An FPGA consists of an array of logic blocks that can be interconnected programmably to realize different designs.

- An FPGA is programmed via electrically programmable switches, much as traditional programmable logic devices (PLDs) are.

- FPGAs can be used to implement just about any hardware design.

- The common feature of all FPGAs is that they consist of a matrix of gates with free or semi-free connections.

- FPGA logic blocks differ greatly in size and implementation capability. A logic block can be as small as the two-transistor block used in the Crosspoint FPGA or as large as the look-up table used in the Xilinx 3000 series FPGA.

Programming Technologies

1- SRAM Programming Technology

- FPGA connections are achieved using pass-transistors, transmission gates, or multiplexers that are controlled by SRAM cells

- It is used in devices from Xilinx, Altera, Plessey, Algotronix, Concurrent Logic, and Toshiba.

Disadvantage: Its large area. It takes at least five transistors to implement an SRAM cell, plus at least one transistor to serve as a programmable switch.

Advantages:
1- Fast re-programmability (the FPGA can be programmed an unlimited number of times).
2- It requires only standard integrated circuit process technology.
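The multiplexer-based connections described above can be modeled behaviorally: SRAM cells hold the select value, so the routing choice stays fixed until the cells are rewritten. This is a minimal sketch; the class `SramMux` and its LSB-first bit ordering are hypothetical modeling choices, not any vendor's circuit.

```python
from typing import Sequence


class SramMux:
    """Behavioral model of a routing multiplexer controlled by SRAM cells.

    The configuration bits (written when the device is programmed) choose
    which of the candidate wires drives the multiplexer output.
    """

    def __init__(self, num_inputs: int) -> None:
        self.num_inputs = num_inputs
        self.select_bits = max(1, (num_inputs - 1).bit_length())
        self.config = [0] * self.select_bits  # SRAM cells, LSB first

    def program(self, source: int) -> None:
        """Write the SRAM cells so that input `source` is connected."""
        if not 0 <= source < self.num_inputs:
            raise ValueError("source index out of range")
        self.config = [(source >> i) & 1 for i in range(self.select_bits)]

    def output(self, wires: Sequence[int]) -> int:
        """Return the value on the wire selected by the SRAM cells."""
        index = sum(bit << i for i, bit in enumerate(self.config))
        return wires[index]


mux = SramMux(4)
mux.program(2)
print(mux.output([0, 0, 1, 0]))  # 1: wire 2 is connected
mux.program(0)                   # reprogramming only rewrites the SRAM cells
print(mux.output([0, 0, 1, 0]))  # 0
```

Reprogramming touches only the configuration bits, which is why the device can be reprogrammed an unlimited number of times; the same bits are lost when power is removed.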

2- Antifuse Programming Technology

- An antifuse is a two-terminal device whose unprogrammed state presents a very high resistance between its terminals.

- When a high voltage (from 11 to 20 volts, depending on the type of antifuse) is applied across its terminals, the antifuse will “blow” and create a low-resistance link.

- Antifuse technology is used in the FPGAs from Actel, QuickLogic, and Crosspoint.

Advantages:
1- Its small size. This advantage is somewhat reduced by the large size of the necessary programming transistors, which must be able to handle large currents, and by the isolation transistors that are sometimes needed to protect low-voltage transistors from the high programming voltages.
2- Its relatively low series resistance.

Disadvantage: Antifuse devices are one-time programmable (OTP); they can be programmed only once.

3- Floating Gate Programming Technology

- The floating gate (or EPROM/E2PROM) programming technology uses the technology found in ultraviolet-erasable EPROM and electrically erasable E2PROM devices. It is used in devices from Altera and Plus Logic.

- The programmable switch is a transistor that can be permanently “disabled”. This is accomplished by injecting charge onto the floating gate (gate 2 in the figure) using a high voltage between the control gate (gate 1) and the drain of the transistor. This charge increases the threshold voltage of the transistor so that it turns off.

Advantages:
1- Its re-programmability.
2- No external permanent memory (as needed with SRAM) is required to program the chip on power-up.

Disadvantage: An E2PROM cell is roughly twice the size of an EPROM cell.

Logic Block Architecture

Fine-Grain Logic Blocks

1- The Crosspoint FPGA

- The FPGA from Crosspoint Solutions uses a single transistor pair in the logic block.

- Since the transistors are connected together in rows, the two two-input NAND gates are isolated by turning off the pair of transistors between them.

2- The Plessey FPGA

- The main advantage of fine-grain logic blocks is that the usable blocks are fully utilized, because it is easier to use small logic gates efficiently.

- The main disadvantage of fine-grain blocks is that they require a relatively large number of wire segments and programmable switches. Such routing resources are costly in delay and area.

- As a result, FPGAs employing fine-grain blocks are in general slower and achieve lower densities than those employing coarse-grain blocks.

Coarse-Grain Logic Blocks

1- Actel Logic Block

Act-1: 702 logic functions can be realized.

Act-2: 766 logic functions can be realized.

- The 16-bit adder needs about 3,000 Act-2 logic blocks, each having 14 transistors. Hence, it needs about 42,000 transistors.
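Because the Actel block is multiplexer-based, a behavioral model shows how one fixed module realizes different functions when its inputs are tied to signals or constants. This sketch assumes the commonly published Act-1 structure (two first-level 2:1 multiplexers feeding a third whose select is the OR of two inputs); the pin names are illustrative, and the Act-2 module is not modeled.

```python
def act1_module(a0: int, a1: int, sa: int,
                b0: int, b1: int, sb: int,
                s0: int, s1: int) -> int:
    """Act-1-style logic module: three 2:1 multiplexers and one OR gate.

    Pin names are illustrative; each argument is 0 or 1 (a signal or a
    constant tied to the pin).
    """
    upper = a1 if sa else a0               # first-level multiplexer A
    lower = b1 if sb else b0               # first-level multiplexer B
    return lower if (s0 or s1) else upper  # output multiplexer, select = S0 OR S1


# One module, configured three ways as a function of signals x and y.
for x in (0, 1):
    for y in (0, 1):
        and_xy = act1_module(0, y, x, 0, 0, 0, 0, 0)  # upper mux: x ? y : 0
        or_xy = act1_module(0, 0, 0, 1, 1, 0, x, y)   # OR gate selects the constant 1
        xor_xy = act1_module(0, 1, y, 1, 0, y, x, 0)  # x selects y or NOT y
        print(x, y, and_xy, or_xy, xor_xy)
```

Tying pins to constants is what lets a single multiplexer structure cover hundreds of functions, as the 702 and 766 counts above suggest.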

2- QuickLogic Logic Block

- The logic block in the QuickLogic FPGA is similar to the Actel logic blocks in that it employs a four-to-one multiplexer, but each input of the multiplexer is fed by an AND gate.

- The QuickLogic logic block is unique among FPGA architectures in that it offers up to 14-input-wide gating functions. This allows many logic functions to be performed with a single block delay where other architectures would need two or more block delays.

Programmed Quick Logic FPGA for the logic function

3- The Altera Logic Block

The Altera 5000 Series logic block

- The architecture of the Altera FPGA evolved from the PLA-based architecture of traditional PLDs: its logic block consists of wide (20 to over 100 inputs) AND gates feeding an OR gate with three to eight inputs.

- The advantage of this type of block is that the wide AND gates can form logic functions with few levels of logic blocks, reducing the need for programmable interconnect. In addition, the logic connections also serve a routing function.

- A disadvantage of the wired-AND configuration is its use of pull-up devices that consume static power. An array full of these pull-ups consumes a significant amount of power. To mitigate this, each gate in the MAX 7000 series block can be programmed to consume about 60% less power, at the expense of about a 40% increase in delay.
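The wide-AND/OR structure can be sketched behaviorally as a sum of products. This is a generic model of the PLA-style idea, not Altera's circuit; the `or_inputs` limit simply echoes the three-to-eight-input OR gate described above, and all names are illustrative.

```python
from typing import Dict, List

# A product term maps an input name to the literal it needs:
# 1 = true input, 0 = complemented input; unlisted inputs are not connected.
ProductTerm = Dict[str, int]


def sop_block(terms: List[ProductTerm], inputs: Dict[str, int],
              or_inputs: int = 8) -> int:
    """PLD-style logic block: wide AND gates (product terms) feeding one OR gate.

    `or_inputs` is the fan-in of the OR gate.
    """
    if len(terms) > or_inputs:
        raise ValueError("more product terms than OR-gate inputs")
    for term in terms:  # each term models one wide AND gate
        if all(inputs[name] == literal for name, literal in term.items()):
            return 1    # one true product term makes the OR output 1
    return 0


# 3-input majority function: ab + ac + bc uses three product terms.
majority = [{"a": 1, "b": 1}, {"a": 1, "c": 1}, {"b": 1, "c": 1}]
print(sop_block(majority, {"a": 1, "b": 0, "c": 1}))  # 1
print(sop_block(majority, {"a": 0, "b": 0, "c": 1}))  # 0
```

A wide AND gate can take many literals at once, so functions with few product terms need only one level of logic blocks.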

4- The Xilinx Logic Block

The Xilinx 3000 logic block

- The basis for the Xilinx logic block is an SRAM functioning as a look-up table (LUT). The truth table for a K-input logic function is stored in a $2^K \times 1$ SRAM.

- The advantage of look-up tables is their high functionality: a K-input LUT can implement any function of K inputs, and there are $2^{2^K}$ such functions.

Lookup table-based logic

- Their disadvantage is that they become quite large for more than about five inputs, since the number of memory cells needed for a K-input LUT is $2^K$.

Parameter            Generally     4-input LUT            5-input LUT
LUT inputs           n             4                      5
PROM bits required   $2^n$         16                     32
Possible functions   $2^{2^n}$     $2^{16}$ = 65,536      $2^{32}$ = 4,294,967,296
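A LUT behaves like a small memory addressed by the logic inputs. The sketch below programs a LUT from any Python function and reproduces the counts in the table; the class `Lut` and the helper names are illustrative.

```python
from typing import Callable, List


class Lut:
    """K-input look-up table whose truth table is stored in 2**K memory bits."""

    def __init__(self, k: int, func: Callable[..., int]) -> None:
        self.k = k
        # Entry i holds func evaluated at input pattern i
        # (input 0 is the least significant address bit).
        self.bits: List[int] = [
            func(*[(i >> j) & 1 for j in range(k)]) & 1 for i in range(2 ** k)
        ]

    def __call__(self, *inputs: int) -> int:
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        address = sum((x & 1) << j for j, x in enumerate(inputs))
        return self.bits[address]   # evaluation is just a memory read


def memory_cells(k: int) -> int:
    """Memory bits needed by a K-input LUT."""
    return 2 ** k


def possible_functions(k: int) -> int:
    """Number of distinct K-input Boolean functions a LUT can hold."""
    return 2 ** (2 ** k)


xor4 = Lut(4, lambda a, b, c, d: a ^ b ^ c ^ d)
print(len(xor4.bits), xor4(1, 0, 1, 1))              # 16 1
print(possible_functions(4), possible_functions(5))  # 65536 4294967296
```

Changing the implemented function only changes the stored bits, so in this model the LUT's size depends on K alone, and it doubles with every added input.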

The Xilinx 4000 logic block

The two 4-input LUTs can be used directly as an SRAM block. This allows small amounts of memory to be implemented more efficiently. Another feature is the inclusion of circuitry that can be used to implement fast carry addition circuits.

Structure of a CLB of Virtex-E

- Each Virtex-E CLB contains four logic cells (LCs), organized in two similar slices.
- The Virtex-E CLB contains logic that combines function generators to provide functions of five or six inputs.
- The F6 input multiplexer combines the outputs of all four function generators in the CLB by selecting one of the F5-multiplexer outputs. This permits the implementation of any 6-input function, an 8:1 multiplexer, or selected functions of up to 19 inputs.
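The F5/F6 scheme can be read as Shannon expansion: each extra input selects between two smaller functions. The sketch below builds an arbitrary 6-input function from four 4-input LUTs, two F5-style multiplexers, and one F6-style multiplexer. It models logic only (not Virtex-E timing or the 19-input special cases), and the helper names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

SixInputFn = Callable[[int, int, int, int, int, int], int]


def lut4_table(f: SixInputFn, x5: int, x6: int) -> List[int]:
    """16-entry truth table of f with inputs x5 and x6 held constant."""
    # product() varies its last element fastest, so x1 is the index LSB.
    return [f(x1, x2, x3, x4, x5, x6) & 1
            for x4, x3, x2, x1 in product((0, 1), repeat=4)]


def lut4_read(table: List[int], x1: int, x2: int, x3: int, x4: int) -> int:
    return table[x1 | (x2 << 1) | (x3 << 2) | (x4 << 3)]


def compose_with_f5_f6(f: SixInputFn) -> SixInputFn:
    """Build f from four 4-input LUTs, two F5-style muxes and one F6-style mux."""
    luts: Dict[Tuple[int, int], List[int]] = {
        (x5, x6): lut4_table(f, x5, x6) for x5 in (0, 1) for x6 in (0, 1)
    }

    def circuit(x1: int, x2: int, x3: int, x4: int, x5: int, x6: int) -> int:
        out = {key: lut4_read(t, x1, x2, x3, x4) for key, t in luts.items()}
        f5_a = out[(1, 0)] if x5 else out[(0, 0)]  # F5 mux: 5-input function for x6 = 0
        f5_b = out[(1, 1)] if x5 else out[(0, 1)]  # F5 mux: 5-input function for x6 = 1
        return f5_b if x6 else f5_a                # F6 mux selects an F5 output

    return circuit


def target(x1, x2, x3, x4, x5, x6):
    return (x1 & x2) ^ (x3 | x4) ^ (x5 & x6)


circuit = compose_with_f5_f6(target)
mismatches = sum(circuit(*p) != target(*p) for p in product((0, 1), repeat=6))
print(mismatches)  # 0
```

Any 6-input function works here because each LUT holds one of the four cofactors of f with respect to the fifth and sixth inputs.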

Comparative Study of Different Types of Programmable Devices

Type of IC      Type of cell   Cost per cell (transistors)   Total cost (cells)*   Total cost (transistors)*
ROM             Bit            1.5                           73,000,000,000        109,500,000,000
Plessey FPGA    Fine gate      35                            500                   17,500
Xilinx FPGA     Coarse gate    100                           20                    2,000

*Approximate cost per 16-bit adder

- Circuits that have internal feedback must be built using gate-cell devices.

- Circuits with a forward transfer function (no internal feedback), such as an adder, must be built using either:
  - bit-cell devices, for circuits with fewer than 16 input bits, or
  - gate-cell devices, for circuits with more than 16 input bits.

- Small circuits that have internal feedback are recommended for fine-grain devices.

- Large-input circuits with simple transfer functions (outputs with short Boolean equations) are recommended for fine-grain devices.

- Large-input circuits with complex transfer functions (outputs with long and complex Boolean equations) are recommended for coarse-grain devices.

- Xilinx FPGA devices are strongly recommended for large-input circuits with complex transfer functions.

Routing Architecture

- The routing architecture of an FPGA is the manner in which the programmable switches and wiring segments are positioned to allow the programmable interconnection of the logic blocks.

A wire segment is a wire unbroken by programmable switches. One or more switches may attach to a wire segment.

A track is a sequence of one or more wire segments in a line.

A routing channel is a group of parallel tracks.

A Connection Block (CB) provides connectivity from the inputs and outputs of a logic block to the wire segments in the channels.

A Switch Block (SB) provides connectivity between the horizontal and vertical wire segments.

- In some architectures, the switch block and connection block are intermingled; in others, they are combined into a single structure.

- The switch block topology differs from one device to another. For example, in topology 1 wires A and B cannot be connected, while in topology 2 they can.
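Switch block topology decides which track pairs can ever be joined. As an illustration, the sketch below compares a "disjoint" switch block, in which track t on one side connects only to track t on the other sides, with a hypothetical fully connected block. The names `wire_a` and `wire_b` only echo the labels A and B; the two topologies are assumptions for illustration, not necessarily topologies 1 and 2.

```python
from typing import FrozenSet, List, Set, Tuple

SIDES = ("north", "east", "south", "west")
Pin = Tuple[str, int]          # (side of the switch block, track number)
Switch = FrozenSet[Pin]        # one programmable switch joins two pins


def side_pairs() -> List[Tuple[str, str]]:
    """All unordered pairs of different sides."""
    return [(s1, s2) for i, s1 in enumerate(SIDES) for s2 in SIDES[i + 1:]]


def disjoint_switches(width: int) -> Set[Switch]:
    """Track t on one side can reach only track t on the other sides."""
    return {frozenset({(s1, t), (s2, t)})
            for s1, s2 in side_pairs() for t in range(width)}


def full_switches(width: int) -> Set[Switch]:
    """Any track on one side can reach any track on another side."""
    return {frozenset({(s1, t1), (s2, t2)})
            for s1, s2 in side_pairs()
            for t1 in range(width) for t2 in range(width)}


def can_connect(switches: Set[Switch], a: Pin, b: Pin) -> bool:
    return frozenset({a, b}) in switches


wire_a, wire_b = ("west", 0), ("north", 1)
print(can_connect(disjoint_switches(3), wire_a, wire_b))  # False
print(can_connect(full_switches(3), wire_a, wire_b))      # True
print(len(disjoint_switches(3)), len(full_switches(3)))   # 18 54
```

The extra flexibility is paid for in programmable switches (54 versus 18 for three tracks per side), which is the area and delay cost of routing resources noted for fine-grain architectures.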

The Xilinx Routing Architecture

- Because SRAM programming switches are large, the connection block typically connects each pin to only two or three of the five tracks passing a block to save area, as shown in the expanded view in the upper-left corner of the figure.

- On all four sides of the logic block there are connection blocks that connect a total of 11 different logic block pins to the wire segments.

- General-purpose interconnect, consisting of wire segments that pass through switches in the switch block.

- Direct interconnect, consisting of wire segments that connect each logic block output directly to its four nearest neighbors (thick black lines).

- Long lines, which span the length or width of the chip, providing high-fan-out, uniform-delay connections (dashed lines).

- A clock line, which is a single net that spans the entire chip and is driven by a high-drive buffer.

Wire segment types

In the Xilinx 4000, double-length wires span two CLBs, offering lower routing delay for moderately long connections.

The Actel Routing Architecture

- The routing architecture is asymmetric because there are more uncommitted general purpose tracks in the horizontal direction than the vertical.

- There is no clearly separable switch block in the Actel architecture. Instead, the switching is distributed throughout the horizontal channels

- Each horizontal channel consists of 22 routing tracks, and each track is broken up into segments of different lengths

- This wide distribution of segment lengths makes it likely that a segment of the exact or close length of any given connection can be found, so that very few series programmable switches are needed in any intra-channel connection

- The routing architecture of the Crosspoint FPGA is similar to that of Actel. The QuickLogic architecture, which also uses antifuses, is again similar, except that its segments are of two classes: short tracks of length one, and long tracks that traverse the entire chip.

- In addition to the input segments and output segments, there are uncommitted vertical freeways that travel the entire height of the chip. These allow signals to travel longer vertical distances than the output segments permit.
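The argument about Actel's mix of segment lengths can be sketched as a small search: given tracks broken into segments, find one segment that covers a connection's span on its own, preferring the shortest such segment so that long segments stay free. The 12-column channel and its segmentation below are hypothetical, not Actel's actual layout.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int]  # (first column, last column) spanned, inclusive


def segments_from_lengths(lengths: List[int]) -> List[Segment]:
    """Cut one track into consecutive segments of the given lengths."""
    segments, start = [], 0
    for length in lengths:
        segments.append((start, start + length - 1))
        start += length
    return segments


def best_segment(tracks: List[List[Segment]], left: int,
                 right: int) -> Optional[Tuple[int, Segment]]:
    """Shortest single segment covering columns left..right, or None."""
    candidates = [
        (seg[1] - seg[0], track_no, seg)
        for track_no, track in enumerate(tracks)
        for seg in track
        if seg[0] <= left and right <= seg[1]
    ]
    if not candidates:
        return None  # the connection would need segments joined by switches
    _, track_no, seg = min(candidates)
    return track_no, seg


# Hypothetical channel with four differently segmented tracks.
tracks = [segments_from_lengths(lengths) for lengths in
          ([2, 2, 2, 2, 2, 2], [4, 4, 4], [3, 6, 3], [12])]
print(best_segment(tracks, 5, 6))  # (1, (4, 7))
print(best_segment(tracks, 2, 3))  # (0, (2, 3))
```

With a wide spread of lengths, most spans find a single close-fitting segment, so few series switches are needed; a channel with only one segment length would often return `None` and force switches into the path.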

The Altera Routing Architecture

- The routing architecture of the Altera FPGA is novel in that it has a two-level hierarchy

- At the first level of the hierarchy, 16 or 32 of the logic blocks are grouped into a logic array block (LAB)

- The structure of the LAB is very similar to a traditional PLD. Each ‘x’ in the figure indicates a point where a connection can be made

The tracks are dedicated to one of four types of connections:

1) Connections from the outputs of all logic blocks in this LAB.
2) Connections from the logic expanders.
3) Connections from the outputs of logic blocks in other LABs (the next level of the hierarchy, the PIA).
4) Connections from the I/O pads of the chip.

- The advantage of this scheme is that it makes the routing problem very easy, and the regularity of the physical design of the silicon allows it to be packed tightly and efficiently.

- A second advantage of this approach is that the delay through the PIA is the same regardless of which track is used since all tracks have identical loading.

- The disadvantage is that many switches are needed, and these may add more capacitive load than necessary.

The Plessey Routing Architecture

- Programmable routing is achieved using only a multiplexer as a connection block on each input of the two-input NAND gate. The multiplexers are controlled by SRAM cells.

- The inputs to each multiplexer are connected to:
  1) The output of the previous NAND gate in the row.
  2) The output of the NAND gate above or below this logic block, whichever is closer.
  3) A vertical long track.
  4) One of the following three connections, depending on which NAND input the multiplexer drives:
     - A horizontal long track (the upper input).
     - The NAND gate output two blocks before the current one (lower input of the Master block).
     - The output of the block diagonally away from the current one (lower input of the Slave block).

Large and Small Gate Cell

- Small-gate devices give low-cost circuits because they have no unused inputs, but they are slow because of the series connection of the small gates.

- Large-gate devices give higher-cost circuits because they generally have unused inputs. However, they are fast because they generate the large logic function directly.

6-input AND gate example:

(a) A 2-input AND gate cell device needs 6 AND gates with 5 routing paths. Hence, it needs about 11 transistors and has a delay of about 11 tg, where tg is the delay of one 2-input AND gate.

(b) An 8-input AND gate cell device needs 1 AND gate without any routing path, but it needs about 16 transistors and has a delay of about 2 tg using the same technology as the 2-input AND gate devices.
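The trade-off in the example can be checked functionally: a series chain of 2-input AND cells and a single 8-input AND cell with its two unused inputs tied to logic 1 produce the same 6-input AND. This is a behavioral sketch with illustrative function names; it does not model transistor counts or delays.

```python
from itertools import product
from typing import Sequence


def and2(a: int, b: int) -> int:
    """A 2-input AND gate cell."""
    return a & b


def and6_from_and2(x: Sequence[int]) -> int:
    """6-input AND built as a series chain of 2-input AND cells."""
    out = x[0]
    for bit in x[1:]:
        out = and2(out, bit)   # each stage adds a gate delay and a routing path
    return out


def and8_cell(x: Sequence[int]) -> int:
    """A single 8-input AND cell: one gate, no routing between stages."""
    if len(x) != 8:
        raise ValueError("an 8-input cell needs exactly 8 inputs")
    return int(all(x))


def and6_from_and8(x: Sequence[int]) -> int:
    """Use the 8-input cell for 6 inputs; the 2 unused inputs are tied high."""
    return and8_cell(list(x) + [1, 1])


# Both implementations agree on all 64 input combinations.
patterns = list(product((0, 1), repeat=6))
print(all(and6_from_and2(p) == and6_from_and8(p) for p in patterns))  # True
```

Tying the spare inputs high keeps the function correct, but two of the eight inputs (and their transistors) do no useful work, which is the cost side of large-gate cells.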

Effect of Logic Block Granularity on FPGA Density and Performance

Virtex-E Xilinx FPGA case study:

- Divide a circuit with a large number of inputs into small subcircuits so that each logic equation has fewer than 7 inputs.

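- The cost (in cells) and delay (in units of tg) of each circuit can be calculated from the following equations, where n is the number of inputs in the circuit:

$$
\text{Cost} = \begin{cases} 1, & n \le 4 \\ 2^{\,n-4}, & n > 4 \end{cases}
\qquad
\text{Delay} = \left\lceil \frac{n}{6} \right\rceil
$$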

4-input equations in subcircuits give the minimum cost, because each such function needs one cell and one unit of delay.

6-input equations in subcircuits give the maximum speed, because each 6-input function needs four cells (one CLB) and one unit of delay.

5-input equations in subcircuits give the optimal trade-off between cost and speed, because each 5-input function needs two cells and one unit of delay.
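The cost and delay rules can be applied mechanically. The sketch below encodes them as cost = 1 cell for n ≤ 4, cost = $2^{n-4}$ cells for n > 4, and delay = ⌈n/6⌉ tg, then applies them to a direct 5-bit adder in which output $S_i$ depends on $2i + 3$ variables (operand bits 0..i plus the carry-in) and Co depends on all 11 inputs. Function names are illustrative.

```python
import math


def cost_cells(n: int) -> int:
    """Cells needed for an n-input equation: 1 up to 4 inputs, then 2**(n-4)."""
    return 1 if n <= 4 else 2 ** (n - 4)


def delay_units(n: int) -> int:
    """Delay of an n-input equation in units of tg: ceil(n / 6)."""
    return math.ceil(n / 6)


# Direct 5-bit adder: every output is a single flat equation of its inputs.
variables = {"S0": 3, "S1": 5, "S2": 7, "S3": 9, "S4": 11, "Co": 11}
costs = {name: cost_cells(n) for name, n in variables.items()}
delays = {name: delay_units(n) for name, n in variables.items()}

print(costs)                 # {'S0': 1, 'S1': 2, 'S2': 8, 'S3': 32, 'S4': 128, 'Co': 128}
print(sum(costs.values()))   # 299
print(max(delays.values()))  # 2 (all outputs are computed in parallel)
```

The totals match the direct-implementation figures in the design example that follows; the exponential cost term is why splitting wide equations into 4- to 6-input pieces pays off.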

Design Example: 5-bit adder.

1- Direct implementation

Output              S0   S1   S2   S3   S4    Co    Total
No. of variables    3    5    7    9    11    11    11
Cost (cells)        1    2    8    32   128   128   299
Delay (tg)          1    1    2    2    2     2     2

2- Minimum cost implementation

Output              S0   S1   S2   S3   S4    Co    Total
No. of variables    3    3    3    3    3     3     11
Cost (cells)        2    2    2    2    1     1     10
Delay (tg)          1    1    1    1    1     1     5

3- Optimal cost/speed implementation

Output              S0   S1   S2   S3   S4    Co    Total
Cost (cells)        1    4    1    4    1     1     12
Delay (tg)          1    1    1    1    1     1     3

4- Another optimized implementation

Output              S0   S1   S2   S3   S4    Co    Total
Cost (cells)        1    2    16   1    2     2     24
Delay (tg)          1    1    2    1    1     1     3

FPGA Generic Design Flow

Design Entry: creation of design files using schematic editor or hardware description language

Design Synthesis: creation of a lower level of logic abstraction using a library of primitives.

Partition (or Mapping): assigning to each logic element a specific physical element

Place: maps logic into specific locations in the target FPGA chip.

Route: making the connections between the mapped logic.

Program Generation: a bit-stream file is generated to program the device.

Device Programming: downloading the bit-stream to the FPGA.

Design Verification: simulation is used to check functionality.

Date post:	02-Feb-2016
Upload:	asaad