
Lecture03 FPGA Doc


An introduction to FPGA Christoph Heer, December 2002

An introduction to FPGA

Christoph Heer, December 2002

Abstract

This document aims to give an overview of the technology of FPGAs (Field-Programmable Gate Arrays). It focuses on aspects of the architecture and gives insights into the design flow. FPGA devices are compliant with standard CMOS technology, with the exception of those FPGAs which use flash or fuse technology. With processes below 0.2 µm, macros of a reasonable capacity of some 10,000 gate equivalents can be embedded on-chip, constituting Configurable Systems-on-Chip (CSoC). Today, chip platforms integrate a standard microprocessor core, SRAM and an FPGA. Future systems will contain specialized cores.


Contents

Chapter 1 - Introduction

References

Chapter 2 - FPGA Architecture

2.1 Basic Structure

2.2 The Configurable Logical Cell

2.2.1. Simple Transistor/Multiplexer/Gate-Based Cells

2.2.2. LUT-Based Cells

2.2.3. PAL/PLA-Based Cells

2.2.4. ALU-Based Cells

2.3 Routing Structures

2.4 FPGA Configuration

2.5 Distributed SRAM

2.6 Input / Output Cells

References

Chapter 3 - FPGA Design Flow

Appendix A - PAL / PLA Architecture

Appendix B - List of Relevant Acronyms


1 Introduction

Digital integrated circuits may be broadly classified into three categories:

1. Programmable logic
2. Application-specific logic
3. Programmable standard architectures

Programmable logic is typically a means of storing amounts of data for quick access¹, in either a volatile or non-volatile manner with respect to the power supply. The programmability of such devices could be one-time only, repeatable, or even dynamic (in the case of RAMs). Application-specific logic is typically highly optimised in terms of functionality, performance, power and cost. The highest degree of optimisation is obtained with full-custom implementation, while semi-custom devices offer quicker design processes. Programmable standard architectures are highly flexible, generic devices, the functionality of which is determined by loaded software (program code). However, the processing time of a function is long because the code is executed sequentially; such devices are therefore typically used in applications which can tolerate these longer response times.

Gate arrays provide a highly standardised means to implement digital integrated circuit designs. They are manufactured as regular arrays of patterned blocks of transistors which can be interconnected to form logic elements such as gates, flip-flops and multiplexers. The advantage is that the manufacturer can pre-produce gate array wafers without interconnections in high volume. These are then configured in an additional process step in the factory. Once a customer provides a definition of the logic block interconnections, one or more layers of metal are added to form these connections. Sea-of-gates structures are slightly different: regular gate arrays provide blank routing space at regular intervals in the transistor array, whereas in a sea-of-gates device the added metal interconnects have to be placed over particular transistors, rendering them unusable. The advantage is a better area utilisation. These two types of devices are collectively known as MPGAs (Mask-Programmable Gate Arrays). As process technologies advance and feature sizes get smaller, it is becoming increasingly expensive to configure such devices.

FPGAs (Field-Programmable Gate Arrays) and CPLDs (Complex Programmable Logic Devices)² are digital devices based on configurable logical cells and configurable interconnect structures. They are manufactured using the latest technologies and offer very high capacity in equivalent ASIC gates. The Altera APEX 20KC, for example, reaches capacities of 1.5 million gates using 0.15 µm technology [1]. Unlike MPGAs, the configuration step does not involve a technological process but is done electrically. Re-configuration is therefore an option, during system boot-up and possibly dynamically during run-time, though one-time programmable FPGAs also exist. FPGA devices provide a very high degree of flexibility based on a standard architecture producible in large quantities. They support the implementation of a wide range of circuit types and offer a lot of potential for parallel processing. In this respect they appear superior to DSP architectures. The fact that there is no need to generate a mask to configure an FPGA architecture means that the hardware implementation of logic circuits is faster and that small quantities may be produced at a reasonable cost. FPGAs can be used for fast functional verification during the development phase, avoiding the long waiting times associated with simulation. The cost of prototyping and the time-to-market of new designs are therefore reduced, as is the cost of small-volume production of particular designs.

¹ A PLA / PAL may also be considered as a memory device, if the input vector to the array is viewed as an address vector and the output of the array as the contents of the memory location uniquely determined by that input / address.

² As is explained in Section 2.2.3, CPLDs may be considered to be a type of FPGA, and throughout this document, unless otherwise specified, the term FPGA will be used to refer to both FPGAs and CPLDs. The nature and complexity of the two types of devices are similar, even though they differ very much in architecture and possibly in the type of application too.

Most FPGAs remain re-configurable even after the chip has been deployed in its application. In particular, FPGA macros which are embedded together with standardised cores on the same die allow further flexibility. Thus, for example, if one such embedded FPGA (eFPGA) macro is used in a communications transceiver, changes in the communications protocol may be taken care of simply by re-configuring the eFPGA, rather than re-designing the whole transceiver.

All these advantages, however, come at the cost of increased signal delay and power consumption, and worse utilisation of chip area when compared to equivalent logic circuits implemented in full-custom or semi-custom.

To summarise, systems implemented using FPGAs offer the following advantages and disadvantages over semi-custom and full-custom devices:

Advantages:
• Fast and cheap procedure for implementing hardware
• Fast functional verification
• Low cost of low-volume production
• Improved time-to-market
• Re-configurability in the field

Disadvantages:
• Non-optimal utilisation of silicon area
• Signal delay and power consumption are higher
• Routing problems could limit flexibility
• Potential clock-skew problems

Despite these disadvantages, the market for stand-alone FPGA devices has in recent years exploded into a billion-dollar business and further growth is expected as process technologies improve. The main benefit of flexibility without the costs of mask generation will then be even more significant. Since FPGAs are compatible with standard CMOS processes, the embedding of FPGA macros into larger designs will be a common technique in the imminent future. The following market models are foreseeable:

1. Programmable once:
• derivatives of standard devices
• low cost of customisation even in low quantities
• protection of intellectual property, as read-outs of programmed gate arrays are harder to obtain than those of full-custom designs

2. Re-programmable:
• prototyping and functional development on standard platforms
• in-field customisation and updating
• multiple-application hardware

In conclusion, although FPGAs are sub-optimal in terms of physical implementation, they offer great potential for producing standard cores which are individually customisable at low cost.

References

[1] Altera, Data Sheet, APEX 20KC Programmable Logic Device, ver. 1.1, April 2000.


2 FPGA Architecture

2.1 Basic Structure

Figure 2.1 - Basic FPGA architecture [1].

The basic architecture of an FPGA (Figure 2.1) is an array of identical, configurable logical cells. The periphery of the device consists of a number of configurable input/output cells. The array is interwoven with configurable interconnect resources and switches, which provide connection routes between all these elements. Additionally, FPGAs may have small RAM blocks distributed in the array; these may also be configured to provide one logically lumped memory unit.

The array of configurable logical cells may be structured in several ways, as shown in Figure 2.2.

a) Symmetric matrix
b) Rows of cells
c) Sea of cells: this term refers to the fact that no dedicated routing resources exist between the structured logical cells but instead they are switched through the cells.
d) Hierarchical structure


Figure 2.2 - a) Symmetric matrix architecture b) Rows c) Sea of cells d) Hierarchy [2a].

An FPGA device is generally designed to allow the implementation of practically any logic circuit. This however requires an area trade-off between a sufficient number of flexible configurable logical cells and enough interconnect resources to allow all connections between these cells. As the majority of circuits will only utilise a small portion of routing and logic resources, this results in a loss in speed (incurred by signals passing through redundant routing elements) and density of logic when compared to the same circuit implemented in dedicated logic. An interesting concept is the grouping of different FPGA devices with related architecture into a family [3]. Each member in a family would be physically tailored to a certain class of application architecture, for example by replacing the switches in certain routes by hard shorts, or hard-wiring the logical cells internally in a certain manner. This member may now implement certain circuits more efficiently, but its reduced flexibility means that some circuits may not fit at all onto the device. Implementation of a circuit is now a question of choosing the right device from the FPGA family.

The IEEE Std. 1149.1 Joint Test Action Group (JTAG) standard describes boundary-scan test circuitry which facilitates functional verification and debugging of FPGA cores by allowing the observation of logic nodes without the need to bring these nodes out to an I/O pin. Dynamic configuration of the FPGA may also be done through the JTAG interface.


2.2 The Configurable Logical Cell

The CLC (Configurable Logical Cell) is used to implement a number of logic functions (generally one or two) of a larger number of inputs. A cell may consist of various combinations of the following elements:

• Transistors
• Basic gates (NAND, XOR, ...)
• Flip-flops
• Multiplexers
• Look-up tables (LUTs)
• AND-OR arrays (sum-of-products)

The term granularity refers to a quantification of the complexity of the CLC and can depend on the following:

• Number of logical functions which may be implemented by each CLC
• Number of equivalent NAND2 gates of each CLC
• Total number of transistors that physically constitute the CLC

An FPGA device of higher granularity therefore consists of a larger number of less complex CLCs, requiring more complex interconnections. FPGAs can therefore be classified according to the granularity of their array structures. Arrays of gates or transistors represent the highest extreme of the granularity scale, while arrays of microprocessors or ALUs are at the other end, since the CLCs in this case are of very high complexity and require simpler interconnect resources.

2.2.1 Simple Transistor / Multiplexer / Gate-Based Cells

Figure 2.4 - Cell of transistor chains [2a].


The most basic type of configurable logical cell consists of simple groupings of transistors. Programmable devices based on such cells are conceptually very similar to gate arrays and require complex routing to implement large logic circuits. Figure 2.4 shows a logical cell formed of transistor chains.

As a second example of a device with high granularity, Figure 2.5 shows a simple CLC based on multiplexers and a standard OR gate. This is used in the Actel 40MX family. The 8-input, 1-output cell can implement basic logic gates (NAND, AND, OR, NOR) with 2, 3 or 4 inputs. Efficient use of interconnecting resources allows the implementation of any logic function, including flip-flops, by wiring a number of gates together.

Figure 2.5 - Actel 40MX CLC [4].


2.2.2 LUT-Based Cells

Most FPGAs use logical cells which are based on Look-up Tables (LUTs), the largest exception being CPLDs. An LUT is realised as a number of memory locations (e.g. SRAM) which are set during the configuration phase. During operation, the vector of input signals selects one memory location, the content of which is switched to the output of the LUT. This is implemented by means of pass transistors.

In the example LUT shown in Figure 2.6, depending on the inputs A, B and C, a path is switched through a decision tree of depth three. The contents of the memory cell (in this case 1 bit) corresponding to that path then appear at the output. Using this architecture any combinational function of the three inputs may be implemented. An LUT with more inputs can implement more logic, thereby reducing the number of logical cells needed and with it the chip area needed to provide the routing between the cells (Figure 2.3). However, LUT complexity grows exponentially with the number of inputs: a k-input LUT needs 2^k memory cells. Previous research [5] has shown that a 4-input LUT is the most efficient in terms of area and most commercial FPGA vendors in fact use LUTs of this size.
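
The behaviour of an LUT can be sketched in a few lines of Python. This is a minimal illustrative model, not a description of any vendor's circuit: the class name `LUT`, its methods and the choice of the first input as the most significant address bit are all assumptions made for the example.

```python
from itertools import product


class LUT:
    """A k-input look-up table modelled as 2**k one-bit memory cells."""

    def __init__(self, num_inputs, config_bits):
        if len(config_bits) != 2 ** num_inputs:
            raise ValueError("a k-input LUT needs exactly 2**k configuration bits")
        self.num_inputs = num_inputs
        self.config_bits = list(config_bits)

    @classmethod
    def from_function(cls, num_inputs, func):
        """Fill the memory cells by evaluating func on every input vector."""
        bits = [int(bool(func(*inputs)))
                for inputs in product((0, 1), repeat=num_inputs)]
        return cls(num_inputs, bits)

    def read(self, *inputs):
        """The input vector acts as an address (first input = most significant bit)."""
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return self.config_bits[address]


# Example: a 3-input majority function of A, B and C.
majority = LUT.from_function(3, lambda a, b, c: (a + b + c) >= 2)
print(majority.config_bits)    # [0, 0, 0, 1, 0, 1, 1, 1]
print(majority.read(1, 0, 1))  # 1

# Memory cells needed per LUT grow exponentially with the number of inputs.
cells_per_lut = {k: 2 ** k for k in range(2, 7)}
print(cells_per_lut)           # {2: 4, 3: 8, 4: 16, 5: 32, 6: 64}
```

Changing the function implemented by the cell only means writing different bits into `config_bits`; the read path stays the same, which is what makes the cell configurable.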

It is also common practice to use two LUTs in parallel. The two outputs could either be dynamically selected using a multiplexer or propagated as two output ports of the logical cell. In the first case a logical cell of 4 inputs, for instance, could be implemented using two 3-input LUTs and one multiplexer which is switched by the fourth cell input. The benefit of splitting the LUT is increased flexibility in configuring the logical cell.
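
The split into two 3-input LUTs and a multiplexer corresponds to expanding the 4-input function about its fourth input. The sketch below illustrates this under that assumption; the function names and the parity example are hypothetical and chosen only for the demonstration.

```python
from itertools import product


def split_into_two_luts(func4):
    """Expand a 4-input function about its fourth input D.

    Returns the contents of two 3-input LUTs: one holding f(A, B, C, 0)
    and one holding f(A, B, C, 1), listed for ABC = 000 ... 111.
    """
    lut_d0 = [int(bool(func4(a, b, c, 0))) for a, b, c in product((0, 1), repeat=3)]
    lut_d1 = [int(bool(func4(a, b, c, 1))) for a, b, c in product((0, 1), repeat=3)]
    return lut_d0, lut_d1


def read_cell(lut_d0, lut_d1, a, b, c, d):
    """Both LUTs see A, B and C; the multiplexer switched by D picks one output."""
    address = (a << 2) | (b << 1) | c
    return lut_d1[address] if d else lut_d0[address]


# Example: 4-input odd parity, checked against the direct function.
def parity4(a, b, c, d):
    return a ^ b ^ c ^ d


lut_d0, lut_d1 = split_into_two_luts(parity4)
matches = all(
    read_cell(lut_d0, lut_d1, a, b, c, d) == parity4(a, b, c, d)
    for a, b, c, d in product((0, 1), repeat=4)
)
print(matches)  # True
```

If the multiplexer is bypassed, the same two tables provide two independent functions of A, B and C, which is the second option mentioned above.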

Figure 2.6 - LUT architecture [2b].


Whilst the LUT implements combinational logic circuits, logical cells must also contain flip-flops to be able to implement sequential logic. Figure 2.7 shows a simplified CLC for a typical FPGA.

Figure 2.7 - Basic CLC architecture [6].


Figure 2.8 - CLC configurations [7].

This simple cell can now be configured in several modes to implement various basic types of digital circuit (Figure 2.8). The most common configurations are:

• Synthesis mode: Any logic function of up to 4 variables in its registered or direct form.

• Arithmetic mode: The LUT is split to provide any two logic functions of the same 3 variables. In the arithmetic mode, the inputs A, B, C are the addends and the Carry-in, whilst the output functions are the Sum and the Carry-out.

• Multiplier mode: This mode also implements an adder, with the addends this time being partial products and Carry-in from the previous bit position. The partial product of A and B may be implemented with an AND gate. In the case of the Atmel AT40K device [7], from which these configurations were sourced, an AND gate is included in the architecture of the CLC for this purpose, avoiding the wasteful reservation of an LUT input to implement such a simple function.

• Counter mode: The LUT provides two logic functions (counter Output and Carry-out) of the same 2 variables, which are a Carry-in and the previous Output. The feedback loop to use this output as an input is normally provided for within the CLC; this could also be implemented externally by connecting appropriate routes.

• Multiplexer (2:1) mode: The LUT is configured to provide a logic function of 3 variables, where one selects one of the other two inputs. As an example, the case where C is the select line for A and B will be considered. In this case the 1-bit memory cells in the LUT are configured to implement the following truth table:

A  B  C  D  O/p
0  0  0  x  0
0  1  0  x  0
1  0  0  x  1
1  1  0  x  1
0  0  1  x  0
0  1  1  x  1
1  0  1  x  0
1  1  1  x  1
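
The memory-cell contents for these modes can be derived mechanically from the logic functions. The following sketch, with the hypothetical helper `lut3_contents`, generates the contents for the multiplexer mode (matching the truth table above, with D ignored) and for the two functions of the arithmetic mode.

```python
from itertools import product


def lut3_contents(func):
    """Memory-cell contents of a 3-input LUT, listed for ABC = 000 ... 111."""
    return [int(func(a, b, c)) for a, b, c in product((0, 1), repeat=3)]


# Multiplexer (2:1) mode: C selects A when it is 0 and B when it is 1.
mux = lut3_contents(lambda a, b, c: b if c else a)

# Arithmetic mode: two functions of the same inputs (A, B and Carry-in C).
sum_bits = lut3_contents(lambda a, b, c: a ^ b ^ c)
carry_bits = lut3_contents(lambda a, b, c: (a & b) | (a & c) | (b & c))

print(mux)         # [0, 0, 0, 1, 1, 0, 1, 1]
print(sum_bits)    # [0, 1, 1, 0, 1, 0, 0, 1]
print(carry_bits)  # [0, 0, 0, 1, 0, 1, 1, 1]
```

Note that the list order (ABC counting upwards in binary) differs from the row order of the truth table, which groups the rows by the value of C.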

Note that some configurations, namely Arithmetic, Counter and Multiplier modes, require 2 distinct functions of 3 inputs. Both Atmel and Actel in fact provide an architecture with two separate 3-input LUTs. This is equivalent to a 4-input LUT in terms of the number of transistors required to implement the LUT. In other words, extending the 3-input LUT in Figure 2.6 to a 4-input LUT involves inserting a fourth input line D and increasing the depth of the tree to four, which requires an additional 16 pass transistors. Since each 3-input LUT contains 14 transistors, having two 3-input LUTs and using D to select which of the outputs will be registered by the flip-flop also results in an additional 16 transistors. Figure 2.9 shows one such example in the Actel VariCore CLC [8] (ignore the Carry In and Carry Out lines at this point). Other devices, like Altera APEX 20K [9], have 4-input LUTs with a second output specifically providing a Carry-out line.
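
The transistor counts quoted above can be reproduced with a short calculation. This is a sketch under two assumptions that are inferred from the figures in the text rather than stated in it: every branch of the decision tree is switched by one pass transistor, and the selection between the two 3-input LUTs by D costs two pass transistors.

```python
def tree_pass_transistors(num_inputs):
    """Pass transistors in a binary decision tree of the given depth.

    Level i of the tree (counting from the output) has 2**i branches,
    each assumed to be switched by one pass transistor.
    """
    return sum(2 ** level for level in range(1, num_inputs + 1))


lut3 = tree_pass_transistors(3)   # 2 + 4 + 8
lut4 = tree_pass_transistors(4)   # 2 + 4 + 8 + 16
two_lut3_plus_mux = 2 * lut3 + 2  # two trees plus an assumed 2-transistor selector on D

print(lut3, lut4, two_lut3_plus_mux)          # 14 30 30
print(lut4 - lut3, two_lut3_plus_mux - lut3)  # 16 16
```

Under these assumptions both options cost 16 transistors more than a single 3-input LUT, which agrees with the equivalence claimed in the text.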

Additionally, the CLCs contain interfacing logic to the routing resources and in some cases specialised functionality such as fast carry and cascade chains to speed up arithmetic operations. The internal connectivity of each cell is determined by a number of multiplexers which can be used to configure all possible inter-connections between LUT, flip-flop and local routing lines.


Figure 2.9 - Actel VariCore CLC architecture [8].

An example of an LUT-based CLC of higher complexity (5 inputs / 2 outputs) is the Xilinx XC3000 CLC [10], which uses a 5-input LUT and 2 flip-flops to implement more complex functions with fewer cells. The obvious penalty is less efficient CLC utilisation.


Figure 2.10 - Xilinx XC3000 CLC [10].

2.2.3 PAL/PLA-Based Cells

Complex Programmable Logic Devices (CPLDs) are also devices with high cell complexity. The CLC of a CPLD is, not surprisingly, called a Simple Programmable Logic Device (SPLD) and is based on sum-of-products (also called AND-OR) logic. Each SPLD is made up of a PAL or PLA³, macrocells and input / output structures. The PLA / PAL produces a number of product terms which are functions of the inputs to the SPLD. The number of macrocells per PLA / PAL determines how many different logic functions may be obtained from a selection of the same set of product terms. Whether the OR logic is lumped in the PLA / PAL cell or the macrocell block is simply a question of labelling and is manufacturer-dependent.

³ The difference between a PAL and a PLA is explained in Appendix A.


Figure 2.11 - Generic CPLD architecture [11].

As with other FPGAs, all the logical cells can be interconnected using routing resources, though in the case of the CPLD, these tend to be simpler and based on signal lines running through the whole device, a characteristic of a low-granularity device. This also means that delays between cells are predictable. Figure 2.12 shows the CLC of Altera CPLDs, consisting of AND gates with high fan-in (gates with more than 20 inputs) which converge in OR gates of 3 to 8 inputs. This structure allows the implementation of complex logic functions using a minimal number of CLCs, reducing the required number of interconnections. In practice though it is very difficult to use the array to its maximum complexity, so density is wasted.
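
The sum-of-products principle behind these cells can be sketched as follows. The product terms, input names and macrocell selections are hypothetical and serve only to show how several macrocells can OR together different selections of the same product terms.

```python
def product_term(inputs, literals):
    """AND of selected literals.

    literals maps an input name to the value it must have
    (1 for the true input, 0 for the complemented input).
    """
    return int(all(inputs[name] == value for name, value in literals.items()))


def macrocell_output(inputs, product_terms, selected):
    """OR together the product terms selected for one macrocell."""
    return int(any(product_term(inputs, product_terms[i]) for i in selected))


# Hypothetical programming: three product terms shared by two macrocells.
product_terms = [
    {"a": 1, "b": 1},  # a AND b
    {"a": 0, "c": 1},  # (NOT a) AND c
    {"b": 1, "c": 0},  # b AND (NOT c)
]
inputs = {"a": 0, "b": 1, "c": 1}

f0 = macrocell_output(inputs, product_terms, selected=[0, 1])  # a.b + a'.c
f1 = macrocell_output(inputs, product_terms, selected=[0, 2])  # a.b + b.c'
print(f0, f1)  # 1 0
```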


Figure 2.12 - Altera CLC architecture [2c].

2.2.4 ALU-Based Cells

FPGAs based on arrays of ALUs have recently appeared on the market as very low-granularity programmable devices. Companies offering such solutions, or in the process of developing them, include Adaptive Silicon, LSI (architecture licensed from Adaptive Silicon), PACT Corporation and Elixent. Arrays of statically programmed ALUs can be configured into synchronous DSP pipelines yielding powerful instruction-level parallelism.

Figure 2.13 - Array of 4-bit ALUs [12].


2.3 Routing Structures

Four types of routing networks are needed in an FPGA device:

• Power feeding network
• Reset and multiple clock networks (local / global)
• Signal network interconnecting all cells
• Configuration lines

A strategy adopted by most manufacturers to different extents is the structuring of the device into some sort of hierarchy, by segmenting the array into groups of CLCs. Routing lines interconnecting the cells could then be broadly classified into three different types:

• Local routing lines directly interconnecting neighbours
• Interconnects to route signals within a cluster of cells
• Global interconnects to transmit signals throughout the whole array

Local routing lines are of low fan-out and limited length. The switching in this case is done from within the CLC, to create fast point-to-point interconnections useful for fast arithmetic operations, for instance. These connections allow the most efficient implementation of standard structures (such as multiplier elements, shift registers, etc.) in terms of utilisation and speed.


Figure 2.14 - Example of a routing resource using programmable switches [2a].

Routing within a cluster of cells is done by means of a matrix of interconnection lines, which may be configured to realise connections between any two CLCs or between one CLC and an I/O cell. Different routes are made using routing resources which consist of configurable pass transistor switches. An example of such a resource is shown in Figure 2.14. It is important to have efficient CAD tools which make good use of the CLCs and place them for minimal routing distance. Each of the switches in such programmable routing resources is equivalent to an RC element, meaning that it introduces a propagation delay to the signal. Figure 2.15 shows how the route between two CLCs, passing through a switching matrix and two programmable interconnection points (PIPs in Xilinx terminology) which connect the cell to a line, may be represented by an equivalent RC model. With FPGA devices of high granularity, the routing resources are more complex, meaning that there are a large number of very different routes between two cells, each of which has a very different associated delay. For this reason, low-granularity devices have more easily predictable delays between cells.
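
One common way to estimate the delay of such an RC chain is the Elmore approximation. The text does not name a delay model, so its use here is an assumption, and the resistance and capacitance values are hypothetical rather than taken from any data sheet.

```python
def elmore_delay(stages):
    """Elmore delay of an RC ladder.

    stages is a list of (resistance_ohm, capacitance_farad) pairs ordered
    from the driving cell to the receiving cell. Each capacitance is
    charged through all the resistance between it and the driver.
    """
    delay = 0.0
    upstream_resistance = 0.0
    for resistance, capacitance in stages:
        upstream_resistance += resistance
        delay += upstream_resistance * capacitance
    return delay


# Hypothetical element values for a route: PIP, switching matrix, PIP.
pip = (1000.0, 10e-15)
switch_matrix = (2000.0, 20e-15)
route = [pip, switch_matrix, pip]

print(f"{elmore_delay(route) * 1e12:.1f} ps")  # 110.0 ps
```

Because every additional switch adds its resistance in front of all downstream capacitance, routes that pass through more switches are disproportionately slower, which is why the choice of route matters for timing.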

Global interconnects require strong signal driving and do not use the above-mentioned routing matrices. They enable the transmission of global signals to all CLCs with minimal delay and attenuation of logic levels. Because of large distances, there could be the need for signal refresh using tri-state buffers.


Figure 2.15 - Breakdown of route into equivalent electrical model [2a].

The level of connectivity between cells in the FPGA has a direct effect on the total area of the circuit. Recent advancements in semiconductor process technology have increased the number of metal layers available for interconnection (from 2 to 7 layers), albeit at a cost. Extra layers can be used to reduce the amount of area required for more complex interconnectivity and allow the allocation of specific layers to particular functions such as power supply and clock signals.

Different FPGA manufacturers have adopted very different solutions to the complex question of routing between cells in an FPGA device. Therefore the routing architectures of the different devices will be addressed in more detail in the chapters concerning the particular devices.

2.4 FPGA Configuration

FPGA devices allow the configuration of all CLCs, I/O cells and interconnect resources. The gate of each configurable transistor is controlled by the contents of a 1-bit memory cell, with a logic '0' or logic '1' determining whether the gate is off or on.

To reduce the wiring required for configuration, the memory cells can be connected in a chain and the configuration is then loaded using a shift operation. Depending on the physical configuration mechanism, it is possible to classify FPGAs into three classes:

• One-time configurable devices
• Non-volatile re-configurable devices
• Volatile re-configurable devices

One-time programmable devices store configuration using fuses or anti-fuses. The former are normally closed structures, while the latter are normally open. A device based on fuse technology is programmed by physically breaking the connections between appropriate structures. On the other hand, a device based on anti-fuses is programmed by melting interconnections between particular cells to generate contacts. The Actel eX [13], MX [4] and SX [14] families are based on anti-fuse structures.

In the case of re-programmable devices, activation or deactivation of interconnects is implemented by means of pass transistors or tri-state buffers (Figure 2.16). Memory units also store the configuration of LUTs and static multiplexers in the CLC. If the type of memory used is EEPROM, the device is non-volatile, but the difficult mechanism of re-configuration imposes limitations on the application of the system. SRAM memory, on the other hand, loses the configuration once power is removed from the device (volatile), but it is simple and quick to configure. The use of SRAM allows for dynamic re-configuration of the device even during real-time operation. Small local SRAM blocks may also be used to store several configuration bits. In this case, unlike in the application of SRAM blocks for ordinary data storage, there is no need for a select of the read lines.

Figure 2.16 - Configuration of FPGA devices [15].

In commercial applications, a separate PROM device is used to store the configuration, which is then loaded into the FPGA SRAM at system start-up via a special configuration interface which usually allows both serial and parallel configuration modes. In systems which combine eFPGA cores with microprocessor cores, the processor could load new configurations into the FPGA. To facilitate system testing and debugging, many devices support read-out of configuration. The IEEE Std. 1149.1 JTAG standard describes boundary-scan circuitry which allows the observation and configuration of individual elements for such purposes.
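
The serial loading of a chain of configuration cells mentioned in this section can be sketched as a simple shift register. The function name, the chain length and the bitstream are hypothetical; real devices define their own bitstream formats and ordering.

```python
def shift_in(chain_length, bitstream):
    """Load a chain of 1-bit configuration cells, one bit per clock.

    On each clock every cell takes the value of its predecessor and the
    first cell takes the next bit from the serial input.
    """
    chain = [0] * chain_length
    for bit in bitstream:
        chain = [bit] + chain[:-1]
    return chain


# Hypothetical 8-cell chain; the first bit sent ends up in the last cell.
bitstream = [1, 0, 1, 1, 0, 0, 1, 0]
chain = shift_in(8, bitstream)
print(chain)  # [0, 1, 0, 0, 1, 1, 0, 1]
```

Only a data line and a clock need to be routed to the chain, which is the wiring saving that motivates this arrangement.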

2.5 Distributed SRAM

Several applications require the use of local memory units. For this purpose, many FPGAs include small SRAM blocks, which are distributed in an array-like structure throughout the device. This is known as distributed RAM and could be configured as one logical RAM unit. This type of RAM offers faster access by the FPGA and more flexibility of configuration of the memory as well as of the communication between different processes and memory blocks, when compared to a lumped memory block external to the FPGA core. In most cases these distributed memory blocks can be configured as multiple independent synchronous / asynchronous, single-port / dual-port RAM blocks, often offering a compromise between the width of the address and data busses. For example, Altera's FLEX 10K [16] allows the following configurations: 256x8, 512x4, 1024x2, 2048x1.
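
The four configurations listed all describe the same 2048 bits of storage, traded between address width and data width. A minimal sketch of this trade-off follows; the function name and the tuple layout are illustrative assumptions.

```python
BLOCK_BITS = 2048  # capacity implied by the 256x8 ... 2048x1 configurations


def aspect_ratios(block_bits, data_widths=(8, 4, 2, 1)):
    """List (depth, width, address_bits) for each data width of a fixed-size block."""
    configs = []
    for width in data_widths:
        depth = block_bits // width
        address_bits = (depth - 1).bit_length()
        configs.append((depth, width, address_bits))
    return configs


for depth, width, address_bits in aspect_ratios(BLOCK_BITS):
    print(f"{depth}x{width}: {address_bits} address lines, {width} data lines")
```

Each halving of the data width adds one address line, so the product of depth and width stays constant.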

The LUT in an LUT-based CLC could be looked at as a small memory unit with the flip-flop used to latch the output. Some FPGA devices, like the Xilinx XC4000 Series [17], also allow the configuration of several CLCs into distributed RAM, though of course this implies a loss in logic resources. In the survey carried out on commercially available FPGAs, the only type of distributed RAM described was that implemented as SRAM blocks distributed throughout the device.

2.6 Input / Output Cells

An important aspect of flexibility on an architectural level is the interface between an IC and external circuitry. There may be the need to support different bus standards with the same core logic, or to allow different IC pin-outs as required by different board layouts. The input / output cells on an FPGA device are programmable blocks situated on the periphery of the circuit. As an example, the basic structure of the IO cell of the Xilinx XC4000 Series [17] will be examined, as shown in the simplified block diagram in Figure 2.17. In general, it may be assumed that other manufacturers use similar architecture in the IO cells of their devices; if however there are large differences, then these are explained in the respective sections.


Figure 2.17 - Simplified block diagram of XC4000E Series IOC [17].

The structure incorporates the following features:

• D flip-flops which could be used to provide sequential buffering of the input or output line.
• The tri-state output buffer may be put in a state of high impedance by means of an activate signal, implementing tri-state outputs or bi-directional I/O.
• The output slew rate may be controlled at the configuration stage.
• The output pull-up device may be configured with either an n-channel transistor, pulling to one threshold level below Vcc, or a p-channel transistor to pull up to Vcc.
• The input thresholds can be configured for either TTL or CMOS logic levels.
• Programmable pull-up and pull-down resistors are used to tie floating pins to Vcc or ground respectively.

References

[1] J. Carrabina, F. Lisa and A. J. Velasco, Implementación con FPGAs, Chapter 11from the book Sistemas Digitales, 2000.

[2] S.A. Bota Ferragut, FPGAs, Internal Communication, Universitat de Barcelona[2a] Chapter 1. Introducción[2b] Chapter 3. Arquitectura Logic Cell Array (LCA) de Xilinx[2c] Chapter 5. Arquitectura Multiple Array Matrix (Max-plus) de Altera

[3] V. Betz and J. Rose, Using Architectural “Families” to Increase FPGA Speedand Density, University of Toronto.

Page 24: Lecture03 FPGA Doc

24

[4] Actel, Data Sheet, 40MX and 42MX FPGA Families, ver. 5.0, February 2001.

[5] J. Rose, R. J. Francis, D. Lewis and P. Chow, Architecture of ProgrammableGate Arrays: The Effect of Logic Block Functionality on Area Efficiency, IEEEJournal of Solid State Circuits, Oct. 1990, pp. 1217 - 1225.

[6] V. Betz and J. Rose, How Much Logic Should Go in an FPGA Logic Block?,University of Toronto.

[7] Atmel, Data Sheet, AT40K FPGAs, January 1999.

[8] Actel, Data Sheet, VariCore EPGA Family, rel. 1.0, February 2001.

[9] Altera, Data Sheet, APEX 20K PLD Family, ver. 4.0, August 2001.

[10] Xilinx, Data Sheet, XC3000 Series FPGAs, ver. 3.1, November 1998.

[11] A. Dhir, Introducing Xilinx and Programmable Logic Solutions for HomeNetworking, ver. 1.0, March 2001.

[12] www.elixent.com

[13] Actel, Data Sheet, eX Family FPGAs, ver. 0.3, March 2001.

[14] Actel, Data Sheet, 54SX Family FPGAs, ver. 3.0.1, May 2000.

[15] V. Betz and J. Rose, FPGA Routing Architecture: Segmentation and Buffering to Optimise Speed and Density, University of Toronto.

[16] Altera, Data Sheet, FLEX 10K Embedded PLD Family, ver. 4.1, March 2001.

[17] Xilinx, Data Sheet, XC4000E and XC4000X Series FPGAs, ver. 3.1, ver 1.6, May 1999.

3 FPGA Design Flow

The process of circuit design on FPGA devices is highly automated and involves the use of flexible and powerful CAD tools. The efficiency of these tools has a direct impact on the overall design time and on the efficiency of the FPGA implementation. The design flow consists of the following steps:

• Design Entry. This is the starting point of the design process and involves capturing the design using a hardware description language such as Verilog or VHDL. Alternatively, a schematic editor is used to enter the design at basic logic level, or by making use of generic blocks which are in turn described in a hardware description language. Other possibilities include entering the design as state diagrams. The CAD software provided by FPGA manufacturers includes libraries of standard circuits or macro-functions to quickly implement common circuits of varying complexity. The schematic or HDL description is then translated into a netlist describing the circuit in terms of logic gates and sequential elements.

• Logic Synthesis. This step optimises the circuit by regrouping logic functions and/or removing redundancies. The optimisation is carried out according to design constraints or rules, such as minimising area or maximising speed. Once the optimised netlist is obtained, it has to be mapped onto the logical cell of the FPGA (LUT / flip-flop, PLA, ...). The aim of this mapping is to minimise the total number of CLCs used.

• Floorplanning. The circuit to be designed is now divided into partitions, each of which is adjusted to be implemented in a particular area of an FPGA device. A partition usually corresponds to a large section of the circuit which has a particular functionality, e.g. a multiplier or a filter bank. In this step, the total number of FPGA devices required is also determined.

• Place and Route. A logic partition is now mapped onto an FPGA device by means of the placement tool, which assigns a physical place in the array of CLCs to each function (LUT / flip-flop, PLA, ...). Typical placement algorithms aim to minimise the total length of the interconnections in the final design, with the objective of maximising the speed of the device. Routing algorithms configure the routing elements to provide the required connections between logic elements. The primary aim of any routing algorithm is to ensure that 100% of the required routes can be realised. Other goals include finding the shortest possible paths between elements. Because interconnection resources are restricted, this step is the most restrictive of the flow.

• Layout Verification. This step involves extracting the physical layout of the design, simulating it with commercial simulators to obtain timing data, and checking it against the design rules (DRC). If the delays associated with the interconnections within the prototype fulfil the delay constraints imposed by the design specifications, the device may be programmed; otherwise the placement and routing steps have to be repeated until a satisfactory configuration is found.

• Macro Integration. This involves the provision of all the necessary files and data formats for integrating the macro in the design flow of the whole chip.
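
The placement objective described under Place and Route can be illustrated with a small sketch. The Python fragment below is not taken from any vendor tool: it assumes a hypothetical 2 x 2 array of cells, made-up cell and net names, and uses the half-perimeter of each net's bounding box as an estimate of interconnection length. Random pairwise swaps are kept only if they do not lengthen the total wiring, which is a far simpler strategy than the algorithms used in real placement tools.

```python
import random


def wirelength(placement, nets):
    """Estimate total wiring as the sum, over all nets, of the half-perimeter
    of the bounding box enclosing the cells connected by that net."""
    total = 0
    for net in nets:
        xs = [placement[cell][0] for cell in net]
        ys = [placement[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def improve_placement(placement, nets, iterations=2000, seed=0):
    """Greedy improvement: swap two cells at random and keep the swap only
    if the estimated wirelength does not increase."""
    rng = random.Random(seed)
    placement = dict(placement)          # work on a copy
    cells = sorted(placement)
    best = wirelength(placement, nets)
    for _ in range(iterations):
        a, b = rng.sample(cells, 2)
        placement[a], placement[b] = placement[b], placement[a]
        cost = wirelength(placement, nets)
        if cost <= best:
            best = cost                  # accept the swap
        else:
            # reject the swap and restore the previous positions
            placement[a], placement[b] = placement[b], placement[a]
    return placement, best


# Hypothetical example: four cells on a 2 x 2 array, connected by three nets.
initial = {"lut0": (0, 0), "ff0": (1, 1), "lut1": (0, 1), "ff1": (1, 0)}
nets = [("lut0", "ff0"), ("lut1", "ff1"), ("lut0", "lut1")]

final, cost = improve_placement(initial, nets)
print("wirelength before:", wirelength(initial, nets), "after:", cost)
```

A real flow would follow this with routing and timing extraction, and would repeat placement if the delay constraints were not met, as described under Layout Verification.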

Once the circuit has been verified, the design configuration is output in a format that can be read by the FPGA device to be programmed. Programming the device itself may take only a matter of minutes.
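
To make the mapping step of Logic Synthesis more concrete, the following Python sketch shows how a combinational function of up to four inputs could be reduced to the contents of a LUT. The function f, the 4-input LUT size and the bit ordering are assumptions chosen for illustration, not properties of a particular device.

```python
from itertools import product


def build_lut(func, n_inputs=4):
    """Tabulate func over every input combination. Entry i holds the output
    for the inputs whose bits spell i, first input being the most
    significant bit."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n_inputs)]


def lut_read(lut, *inputs):
    """Evaluate the table by using the input bits as an address."""
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return lut[address]


# Hypothetical function to be mapped: f = (a AND b) OR (c XOR d)
def f(a, b, c, d):
    return (a & b) | (c ^ d)


lut = build_lut(f)
print("LUT contents:", lut)
print("f(1, 1, 0, 0) read from the LUT:", lut_read(lut, 1, 1, 0, 0))
```

Any function of four inputs fits in the same 16-entry table, regardless of how many gates its netlist description would need.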

Appendix A - PAL / PLA Structure

Figure A.1 - PAL / PLA structure [1].

A PLA (Programmable Logic Array) provides a structured way of implementing combinational functions expressed as a sum of products of the input lines to the device. As shown in Figure A.1, PLAs are built of two distributed-gate arrays. These two arrays are programmed by forming connections between the array input lines and the logic gate (AND, OR) inputs. The first array provides the products (and is therefore known as the AND plane) and the second provides the desired sum of these products (and is known as the OR plane). A PAL device is a variation in which the OR plane is fixed.
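
The two-plane structure can be modelled in a few lines of Python. This is a minimal sketch under assumed conventions: the data layout (a dictionary per product term, a list of term indices per output) and the example functions are hypothetical and only illustrate the sum-of-products principle.

```python
def evaluate_pla(and_plane, or_plane, inputs):
    """Evaluate a PLA model.

    and_plane: one dict per product term, mapping an input index to the
               value it must have (1 = true literal, 0 = complemented).
    or_plane:  one list per output, naming the product terms it sums.
    inputs:    sequence of input bits.
    """
    # AND plane: a product term is true when all its literals match.
    products = [all(inputs[i] == v for i, v in term.items())
                for term in and_plane]
    # OR plane: an output is true when any of its product terms is true.
    return [int(any(products[p] for p in terms)) for terms in or_plane]


# Hypothetical programming for inputs (a, b, c):
#   p0 = a.b    p1 = /a.c    p2 = b.c
and_plane = [{0: 1, 1: 1}, {0: 0, 2: 1}, {1: 1, 2: 1}]
#   f0 = p0 + p1    f1 = p1 + p2  (p1 is shared between both outputs)
or_plane = [[0, 1], [1, 2]]

for bits in [(1, 1, 0), (0, 1, 1), (0, 0, 0)]:
    print(bits, "->", evaluate_pla(and_plane, or_plane, bits))
```

In this model a PAL would correspond to keeping or_plane constant and allowing only and_plane to be programmed.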

References

[1] Xilinx, Data Sheet, CoolRunner XPLA3 CPLD, ver. 1.4, April 2001.

Appendix B - List of Relevant Acronyms

ASIC Application-Specific Integrated Circuit

CLC Configurable Logical Cell

CPLD Complex PLD

CSoC Configurable SoC

DRC Design Rule Check

eFPGA embedded FPGA

FPGA Field-Programmable Gate Array

IOC Input / Output Cell

JTAG Joint Test Action Group

LUT Look-Up Table

MPGA Mask-Programmable Gate Array

PLA / PAL Programmable Logic Array / Programmable Array Logic

PLD Programmable Logic Device

SoC System-on-Chip

SPLD Simple PLD

