
Reconfigurable Computing - FPGA structures

Date post: 28-Jan-2016
Page 1: Reconfigurable Computing - FPGA structures

Reconfigurable Computing - FPGA structures

John Morris
Chung-Ang University
The University of Auckland

‘Iolanthe’ at 13 knots on Cockburn Sound, Western Australia

Page 2: Reconfigurable Computing - FPGA structures

FPGA Architectures

Programmable logic takes many forms

Originally, devices contained tens of gates and flip-flops
• These early devices were generally called PALs (Programmable Array Logic)
• A typical structure was:

With 10-20 inputs and outputs and ~20 flip-flops, they could
• implement small state machines and
• replace large amounts of discrete ‘glue’ logic

[Figure: programmable And-Or array with Inputs (20) on one side, feeding flip-flops (FF) that drive Outputs (20)]

Page 3: Reconfigurable Computing - FPGA structures

Programmable Logic

Memory should also be included in the class of programmable logic! It finds application in LUTs, state machines, ...

From early UV EPROMs with ~kbytes, we now have many styles of memory which retain values when power is removed, with capacities in Mbytes

Memory is an important consideration when designing reconfigurable systems. FPGA technology does not provide large amounts of memory and this can be a constraint - especially if you are trying to produce a compact, single-chip solution to your problem!

Page 4: Reconfigurable Computing - FPGA structures

Modern Programmable Logic

As technology has evolved, so have programmable devices

Today’s FPGAs contain
• Millions of ‘gates’
• Memory
• Support for several I/O protocols - TTL, LVDS, GTL, …
• Arithmetic units - adders, multipliers
• Processor cores

Page 5: Reconfigurable Computing - FPGA structures

FPGA Architecture

The ‘core’ architecture of most modern FPGAs consists of
• Logic blocks
• Interconnection resources
• I/O blocks

Page 6: Reconfigurable Computing - FPGA structures

Typical FPGA Architecture

Logic blocks embedded in a ‘sea’ of connection resources

CLB = logic block
IOB = I/O buffer
PSM = programmable switch matrix

This particular arrangement is similar to that in Xilinx 4000 (and onwards) chips - devices from other manufacturers are similar in overall structure

Page 7: Reconfigurable Computing - FPGA structures

Typical FPGA Architecture

Logic blocks embedded in a ‘sea’ of connection resources

CLB = logic block
IOB = I/O buffer
PSM = programmable switch matrix

Interconnections are critical
• Transmission gates on paths
• Flexibility: connect any LB to any other
but
• Much slower than connections within a logic block
• Much slower than long lines on an ASIC

Aside: This is a ‘universal’ problem - not restricted to FPGAs!
Applies to
• custom VLSI,
• ASICs,
• systems,
• parallel processors

Small transistors → high speed → high density → long, wide datapaths

Page 8: Reconfigurable Computing - FPGA structures

Logic Blocks

Combination of
• And-or array or Look-Up-Table (LUT)
• Flip-flops
• Multiplexors

General aim
• Arbitrary boolean function of several variables
• Storage

Designers try to estimate what combination of resources will produce the most efficient application circuit mappings

Xilinx 4000 (and on) CLB
• 3 LUT blocks
• 2 Flip-Flops (Asynch Reset)
• Multiplexors
• Clock / Reset Lines
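
A look-up table realises an arbitrary boolean function by storing its truth table and using the inputs as an address. The following is a minimal behavioural sketch in Python, not a description of any vendor's CLB; the names `make_lut` and `lut_eval` are illustrative assumptions.

```python
def make_lut(func, k=4):
    """Build the 2**k-entry truth table for a k-input boolean function."""
    table = []
    for index in range(2 ** k):
        # Bit j of the table index is the value of input j
        bits = [(index >> j) & 1 for j in range(k)]
        table.append(func(*bits) & 1)
    return table


def lut_eval(table, *inputs):
    """Evaluate the LUT: the input bits form the address of the stored bit."""
    index = sum(bit << j for j, bit in enumerate(inputs))
    return table[index]


# 'Program' one 4-input LUT as a full-adder sum (the fourth input is unused)
sum_lut = make_lut(lambda a, b, cin, unused: a ^ b ^ cin)
# 'Program' a second LUT as the full-adder carry (majority) function
carry_lut = make_lut(lambda a, b, cin, unused: (a & b) | (b & cin) | (a & cin))
```

Reprogramming the device amounts to loading a different table; the evaluation hardware does not change.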

Page 9: Reconfigurable Computing - FPGA structures

Adders

Adders appear in most designs
• Arithmetic
  • Adders (including subtracters)
  • Other arithmetic operators, eg multipliers, dividers
• Counters (including program counters in processors)
• Incrementors, decrementors, etc

They also often appear on the critical path
• Adder performance can be crucial for system performance

Because of their importance, researchers are still searching for better ways to add!

Adder structures proposed already
• Ripple carry
• Carry select
• Carry skip
• Carry look-ahead
• Manchester
• … and several dozen more variants

Page 10: Reconfigurable Computing - FPGA structures

Ripple Carry Adder

The simplest and most well known adder

How long does it take an n-bit adder to produce a result?
• n × propagation delay(FA: (a or b) → carry)
• We can do better than this - using one of many known better structures

but

What are the advantages of a ripple carry adder?
• Small
• Regular
• Fits easily into a 2-D layout!

[Figure: chain of full adders (FA) for bits 0, 1, …, n-2, n-1; stage i takes ai, bi and cin and produces si and cout, each cout feeds the next stage's cin, and the last stage gives carryout]

Very important in packing circuitry into the fixed 2-D layout of an FPGA!
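
The n × propagation delay figure can be seen in a small behavioural model. This is a sketch in Python rather than hardware; `full_adder` and `ripple_carry_add` are illustrative names, and the `stages` count simply records how many full adders the carry passes through in the worst case.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (a & cin)
    return s, cout


def ripple_carry_add(a, b, n, cin=0):
    """Add two n-bit integers bit by bit; returns (sum, carry out, stages)."""
    total = 0
    carry = cin
    stages = 0
    for i in range(n):
        # The carry out of bit i becomes the carry in of bit i+1
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
        stages += 1  # the carry chain is one full adder longer
    return total, carry, stages
```

For example, `ripple_carry_add(15, 1, 4)` makes the carry ripple through all four stages before the carry out is known.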

Page 11: Reconfigurable Computing - FPGA structures

Ripple Carry Adders

Ripple carry adder performance is limited by propagation of carries

[Figure: ripple carry adder chain of full adders (FA) for bits 0, 1, 2, 3, …, n-2, n-1, grouped into logic blocks (LB); the carry link between logic blocks is marked]

On an FPGA, this link (the carry connection between logic blocks) is often the major source of time delay
… because one or two FA blocks will often fit in a logic block!
• Connections within a logic block are fast!
• Connections between logic blocks are slower

Page 12: Reconfigurable Computing - FPGA structures

Using general interconnect

Interconnections are critical
• Transmission gates on paths
• Flexibility: connect any LB to any other
but
• Much slower than connections within a logic block
• Much slower than long lines on an ASIC

[Figure: detail of the general interconnect and a switch matrix]
• Every one of these connection points is a transmission gate
• This switch matrix is a mass of transmission gates too!

Page 13: Reconfigurable Computing - FPGA structures

‘Fast Carry’ Logic

Critical delay
• Transmission of carry out from one logic block to the next

Solution (most modern FPGAs)
• ‘Fast carry’ logic
• Special paths between logic blocks used specifically for carry out
• Very fast ripple carry adders!

More sophisticated adders?
• Carry select
  • Uses ripple carry blocks - so can use fast carry logic
  • Should be faster for wide datapaths?
• Carry lookahead
  • Uses large amounts of logic and multiple logic blocks
  • Hard to make it faster for small adders!

Page 14: Reconfigurable Computing - FPGA structures

Logic Blocks and fast carry

Xilinx solution
• Carry logic precedes LUTs
• Fast carry connections: up and down
• Adders must lie in a column of the FPGA
• Some (not serious?) constraint on layout

[Figure: Xilinx CLB with F and G LUTs (inputs F1-4 and G1-4), each preceded by carry logic, with Cin and Cout connections running up and down]

Note that carry chains can run either up or down (but not sideways!)
• Direct connections to CLB above (in the same column)
• Direct connections to CLB below (in the same column)

Page 15: Reconfigurable Computing - FPGA structures

Logic Blocks and fast carry - Altera version

Altera
• Simpler logic element
  • One LUT and one flip-flop / logic element
  • Four inputs + one output / logic element
  • Simpler LE → more LEs / device
• Carry logic follows LUT
• Carry chains in one direction only
• Additional fast link (cascade chain)
  • Efficient implementation of high fan-in functions
  • eg 4+-input gate spans 2+ LEs with a slow link
  • Cascade chain avoids the slow link
  • Efficient (fast) high fan-in functions
  • eg Potentially much faster carry look-ahead adder? Discussed later!

Is Xilinx better? It’s unlikely to be superior for all applications!!

Page 16: Reconfigurable Computing - FPGA structures

Carry Select Adder

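[Figure: carry select adder built from three n-bit ripple carry adders. The first adds a0-3, b0-3 and cin, producing sum0-3 and cout3. The other two both add a4-7 and b4-7, one with carry in 0 (producing sum^0_{4-7} and cout^0_7) and one with carry in 1 (producing sum^1_{4-7} and cout^1_7). 0/1 multiplexors controlled by cout3 select sum4-7 and the carry.]

Here we build an 8-bit adder from 4-bit blocks
• ‘Standard’ n-bit ripple carry adders
• n = any suitable value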

Page 17: Reconfigurable Computing - FPGA structures

Carry Select Adder

[Figure: carry select adder, as on page 16]

This block (the low order ripple carry adder) adds the 4 low order bits
• After 4*tpd it will produce a carry out

These two blocks (the high order ripple carry adders) ‘speculate’ on the value of cout3
• One assumes it will be 0
• The other assumes 1

Page 18: Reconfigurable Computing - FPGA structures

Carry Select Adder

[Figure: carry select adder, as on page 16]

This block (the low order ripple carry adder) adds the 4 low order bits
• After 4*tpd it will produce a carry out

After 4*tpd we will have:
• sum0-3 (final sum bits)
• cout3 (from low order block)
• sum^0_{4-7} and cout^0_7 (from block assuming 0 cin)
• sum^1_{4-7} and cout^1_7 (from block assuming 1 cin)

Page 19: Reconfigurable Computing - FPGA structures

Carry Select Adder

[Figure: carry select adder, as on page 16]

cout3 selects the correct sum4-7 and carry out

All 8 bits + carry are available after 4*tpd(FA) + tpd(multiplexor)
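
The timing claim can be put into a rough delay model. The sketch below is an assumption-laden estimate in Python: it assumes equal-sized blocks, counts only full adder and multiplexor delays, and ignores the interconnect delay between blocks; the function names and the delay values used to call them are placeholders, not measured figures.

```python
def ripple_delay(n, t_fa):
    """Worst-case delay of an n-bit ripple carry adder: n full adder delays."""
    return n * t_fa


def carry_select_delay(block_bits, num_blocks, t_fa, t_mux):
    """Worst-case delay of a carry select adder with equal-sized blocks.

    All ripple carry blocks work in parallel (block_bits * t_fa); the selected
    carry then passes through one multiplexor per block after the first.
    """
    return block_bits * t_fa + (num_blocks - 1) * t_mux
```

With two 4-bit blocks this reduces to 4*tpd(FA) + tpd(multiplexor), against 8*tpd(FA) for an 8-bit ripple carry adder.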

Page 20: Reconfigurable Computing - FPGA structures

Carry Select Adder

[Figure: carry select adder, as on page 16]

Each ripple carry block should use fast carry logic

Drawback: These links (marked in the figure) are relatively slow!
CSA only faster for n > ?

Page 21: Reconfigurable Computing - FPGA structures

Carry Select Adder

This scheme can be generalized to any number of bits
• Select a suitable block size (eg 4, 8)
• Replicate all blocks except the first
  • One with cin = 0
  • One with cin = 1
• Use final cout from preceding block to select correct set of outputs for current block
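
The generalized scheme can be sketched as a behavioural Python model. It models only the logic (which block result is selected), not the timing; `ripple_block` and `carry_select_add` are illustrative names and the default block size of 4 is just one of the sizes suggested above.

```python
def ripple_block(a, b, width, cin):
    """width-bit ripple carry block; returns (sum, carry out)."""
    total, carry = 0, cin
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (y & carry) | (x & carry)
    return total, carry


def carry_select_add(a, b, n, block=4, cin=0):
    """n-bit carry select adder built from `block`-bit ripple carry blocks."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for lo in range(0, n, block):
        width = min(block, n - lo)  # the top block may be narrower
        xa, xb = (a >> lo) & mask, (b >> lo) & mask
        if lo == 0:
            # The first block is not replicated: its carry in is known
            s, carry = ripple_block(xa, xb, width, carry)
        else:
            # Replicated blocks speculate on carry in = 0 and carry in = 1
            s0, c0 = ripple_block(xa, xb, width, 0)
            s1, c1 = ripple_block(xa, xb, width, 1)
            # The preceding block's carry out acts as the multiplexor select
            s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << lo
    return result, carry
```

In hardware the two speculative blocks run at the same time as the first block; the sequential loop here only reproduces the selection.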

Page 22: Reconfigurable Computing - FPGA structures

Better adders

Performance of a CSA is still O(n)

More sophisticated adders?
• Carry skip
  • Variant on carry select idea
• Carry lookahead
  • Uses large amounts of logic and multiple logic blocks
  • Hard to make it faster for small adders!

Page 23: Reconfigurable Computing - FPGA structures

Carry Lookahead Adder

Standard adder expressions
cout = a•b + b•cin + a•cin
sum = a ⊕ b ⊕ cin

Define two new symbols
• ‘Generate’ G = a•b
  If a and b are both 1, then a carry must be generated
• ‘Propagate’ P = a ⊕ b
  If a ≠ b (a ⊕ b = 1), then propagate the carry in

Now
sum = P ⊕ cin
cout = G + cin•P

For the ith bit
sum_i = P_i ⊕ c_i
c_{i+1} = G_i + c_i•P_i
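
That the generate/propagate form is equivalent to the standard expressions can be checked exhaustively, since a full adder has only eight input combinations. A small Python check, with illustrative function names:

```python
from itertools import product


def full_adder(a, b, cin):
    """Standard expressions: returns (sum, carry out)."""
    return a ^ b ^ cin, (a & b) | (b & cin) | (a & cin)


def gp_adder(a, b, cin):
    """Same adder written with generate (G) and propagate (P)."""
    g = a & b          # G = a.b
    p = a ^ b          # P = a xor b
    return p ^ cin, g | (cin & p)


# Collect every input combination on which the two forms disagree
mismatches = [bits for bits in product((0, 1), repeat=3)
              if full_adder(*bits) != gp_adder(*bits)]
```

The list `mismatches` comes out empty: when a = b = 1, P is 0 but G already forces the carry, so replacing (a + b) by (a ⊕ b) in the carry term changes nothing.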

Page 24: Reconfigurable Computing - FPGA structures

Carry Lookahead Adder

Expanding the carry expressions
c1 = G0 + c0P0
c2 = G1 + c1P1
   = G1 + P1(G0 + c0P0)
   = G1 + P1G0 + c0P1P0
c3 = G2 + c2P2
   = G2 + P2(G1 + c1P1)
   = G2 + P2(G1 + P1(G0 + c0P0))
   = G2 + P2G1 + P2P1G0 + c0P2P1P0
c4 = G3 + c3P3
   = G3 + P3(G2 + c2P2)
   = ...
   = G3 + P3G2 + P3P2G1 + P3P2P1G0 + c0P3P2P1P0

Note that c4 can be calculated in a logic block that permits
• 4 P inputs
• 4 G inputs
• c0 input

Fortuitously, this is exactly what a Xilinx CLB allows! This may be a coincidence!

An Altera LE can use cascade in to calculate c4 using several LEs without serious propagation delay penalty
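
The expanded expressions can be written out directly as a 4-bit behavioural model. This Python sketch computes every carry from G, P and c0 only, with no carry depending on another carry; the function names are illustrative assumptions.

```python
def gp_bits(a, b, n=4):
    """Per-bit generate and propagate signals for two n-bit operands."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    return g, p


def lookahead_carries(g, p, c0):
    """Flattened two-level expressions for c1 .. c4."""
    c1 = g[0] | (c0 & p[0])
    c2 = g[1] | (p[1] & g[0]) | (c0 & p[1] & p[0])
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (c0 & p[2] & p[1] & p[0]))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (c0 & p[3] & p[2] & p[1] & p[0]))
    return c1, c2, c3, c4


def cla4_add(a, b, c0=0):
    """4-bit carry lookahead adder; returns (sum, carry out c4)."""
    g, p = gp_bits(a, b)
    c1, c2, c3, c4 = lookahead_carries(g, p, c0)
    carries = [c0, c1, c2, c3]
    # sum_i = P_i xor c_i
    total = sum((p[i] ^ carries[i]) << i for i in range(4))
    return total, c4
```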

Page 25: Reconfigurable Computing - FPGA structures

Large Carry Lookahead Adders

Define ‘group’ generate and propagate symbols
For 4-bit ‘groups’
PG = P3P2P1P0
GG = G3 + P3G2 + P3P2G1 + P3P2P1G0

c5 = G4 + c4P4
   = G4 + P4(G3 + c3P3)
   = ...
   = G4 + P4G3 + P4P3G2 + P4P3P2G1 + P4P3P2P1G0 + c0P4P3P2P1P0
   = G4 + P4(GG + PGc0)

Note that the expression for c5 has exactly the same form as the expression for c1 - substituting (GG + PGc0) for c0

Thus large CLAs are built by cascading blocks that compute group G and P signals
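
A behavioural Python sketch of the group signals follows. It is a simplification: the carry into each group is derived from GG and PG as above, but the groups themselves are chained one after another rather than fed by a second-level lookahead block. It assumes n is a multiple of 4, and the names `group_gp` and `cla_add` are illustrative.

```python
def group_gp(a, b):
    """Bit-level and group generate/propagate for one 4-bit group."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]
    pg = p[3] & p[2] & p[1] & p[0]
    gg = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return g, p, gg, pg


def cla_add(a, b, n=16, c0=0):
    """n-bit adder from 4-bit groups (n assumed to be a multiple of 4)."""
    result, carry = 0, c0
    for lo in range(0, n, 4):
        g, p, gg, pg = group_gp((a >> lo) & 0xF, (b >> lo) & 0xF)
        c = carry
        for i in range(4):
            result |= (p[i] ^ c) << (lo + i)   # sum_i = P_i xor c_i
            c = g[i] | (c & p[i])              # carry used inside the group only
        # Carry into the next group needs only the group signals
        carry = gg | (pg & carry)
    return result, carry
```

For instance, `cla_add(0xFFFF, 1)` gives `(0, 1)`: every group propagates, so the carry in passes through all four PG terms.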

Page 26: Reconfigurable Computing - FPGA structures

Cascading blocks for large CLAs

[Figure: 4-bit adder blocks for bits 0-3, 4-7, 8-11, 12-15 and 16-19, each taking a carry in (c0, c4, c8, c12, …) and producing its sum bits plus group signals GG and PG; the group signals feed lookahead blocks at the next level (inputs G0 P0 … G3 P3, outputs c1 … c3 and their own GG, PG)]

Each 4-bit adder block is modified to calculate G and P

Page 27: Reconfigurable Computing - FPGA structures

Large Carry Lookahead Adders

Define ‘group’ generate and propagate symbols
• Thus large CLAs are built by cascading blocks that compute group G and P signals

As with CS Adders, optimum group size will be determined by the technology used
• It will need to match logic block capability
• It will need to balance internal LUT propagation delay with link delay
  + effectiveness of additional capabilities, eg Altera cascade chains
• Some experimentation will be needed to determine it!

CLA overheads are high
• Unlikely to be effective for small adders!
• Some experiments we have done suggest >64 bits needed!

Page 28: Reconfigurable Computing - FPGA structures

Fast Adders

Many other fast adder schemes have been proposed, eg
• Carry-skip
• Manchester
• Carry-save
• Carry Look Ahead

If implementing an adder (eg in programmable logic), do a little research first!

Page 29: Reconfigurable Computing - FPGA structures

Fast Adders

Challenge: What style of adder is fastest / most compact for any FPGA technology?
• Answer is not simple
• For small adders (n < ?),
  • fast carry logic will certainly make a simple ripple carry adder fastest
  • It will also use the minimum resources - but will need to be laid out as a column or row
• For larger adders (? < n < ?),
  • carry select styles are likely to be best
  • They use ripple carry blocks efficiently
• For very large adders (n > ?),
  • a carry look ahead adder may be faster?
  • But it will use considerably more resources!

Page 30: Reconfigurable Computing - FPGA structures

Exploiting a manufacturer’s fast carry logic

To use the Altera fast carry logic, write your adder like this:

LIBRARY ieee;
USE ieee.std_logic_1164.all;
LIBRARY lpm;
USE lpm.lpm_components.all;

ENTITY adder IS
  PORT ( c_in  : IN  STD_LOGIC;
         a, b  : IN  STD_LOGIC_VECTOR(15 DOWNTO 0);
         sum   : OUT STD_LOGIC_VECTOR(15 DOWNTO 0);
         c_out : OUT STD_LOGIC );
END adder;

ARCHITECTURE lpm_structure OF adder IS
BEGIN
  -- Instantiate the library adder/subtractor instead of describing the logic
  instance: lpm_add_sub
    GENERIC MAP (LPM_WIDTH => 16)
    PORT MAP ( cin => c_in, dataa => a, datab => b,
               result => sum, cout => c_out );
END lpm_structure;

Page 31: Reconfigurable Computing - FPGA structures

What about that carry in?

In an ALU, we usually need to do more than just add!
• Subtractions are common also
• Observe
    c = a - b
  is equivalent to
    c = a + (-b)
• So we can use an adder for subtractions if we can negate the 2nd operand
• Negation in 2’s complement arithmetic?

Page 32: Reconfigurable Computing - FPGA structures

Adder / Subtractor

Negation in 2’s complement arithmetic?
Rule:
• Complement each bit
• Add 1

eg
              Binary   Decimal
              0001      1
Complement    1110
Add 1         1111     -1

              0110      6
Complement    1001
Add 1         1010     -6

Page 33: Reconfigurable Computing - FPGA structures

Adder / Subtractor

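Using an adder
• Complement each bit using an inverter
• Use the carry in to add 1!

[Figure: row of full adders (FA) with inputs a and b, carry in cin, result c and carry out; a 0/1 multiplexor controlled by the add/subtract signal selects b or its complement]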
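
The adder / subtractor can be sketched behaviourally in Python. One assumption to note: the model XORs each b bit with the add/subtract control, which has the same effect as the inverter plus 0/1 multiplexor in the figure; `add_sub` and `to_signed` are illustrative names.

```python
def add_sub(a, b, n, subtract):
    """n-bit adder/subtractor: subtract=0 gives a + b, subtract=1 gives a - b."""
    result = 0
    carry = subtract                      # the carry in supplies the 'add 1'
    for i in range(n):
        x = (a >> i) & 1
        y = ((b >> i) & 1) ^ subtract     # complement b when subtracting
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (y & carry) | (x & carry)
    return result, carry


def to_signed(value, n):
    """Interpret an n-bit pattern as a 2's complement number."""
    return value - (1 << n) if value >> (n - 1) else value
```

Computing 0 - 6 with `add_sub(0, 6, 4, 1)` gives the pattern 1010, which `to_signed` reads as -6, matching the worked negation example.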

Page 34: Reconfigurable Computing - FPGA structures

Example - Generate

LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY adder IS
  GENERIC ( n : INTEGER := 16 );
  PORT ( c_in  : IN  std_ulogic;
         a, b  : IN  std_ulogic_vector(n-1 DOWNTO 0);
         sum   : OUT std_ulogic_vector(n-1 DOWNTO 0);
         c_out : OUT std_ulogic );
END adder;

ARCHITECTURE rc_structure OF adder IS
  -- c(i) is the carry into bit i
  SIGNAL c : std_ulogic_vector(1 TO n-1);
  COMPONENT fulladd
    PORT ( c_in, x, y : IN  std_ulogic;
           s, c_out   : OUT std_ulogic );
  END COMPONENT;
BEGIN
  -- Bit 0 takes the external carry in
  FA_0: fulladd PORT MAP ( c_in=>c_in, x=>a(0), y=>b(0), s=>sum(0), c_out=>c(1) );
  -- Bits 1 to n-2 are replicated by the GENERATE statement
  G_1: FOR i IN 1 TO n-2 GENERATE
    FA_i: fulladd PORT MAP ( c(i), a(i), b(i), sum(i), c(i+1) );
  END GENERATE;
  -- The top bit drives the external carry out
  FA_n: fulladd PORT MAP ( c(n-1), a(n-1), b(n-1), sum(n-1), c_out );
END rc_structure;

Page 35: Reconfigurable Computing - FPGA structures

Serial Circuits

• Space efficient
• Sloooow
  • One bit of result produced per cycle
• Sometimes this isn’t a problem
  • Highly parallel problems
    • Search
    • Many operations on the same data stream
  • eg search a text database for many keywords in parallel

[Figure: a text stream fed in parallel to comparators =key0? … =keyi? … =keyn?]

Data rate: x MB/s
Serial processing needs: 8x Mbits/s - Easy!

Effective performance may require comparison with 1000’s of keys
• space for key circuits critical!
• small, compact bit-serial comparator ideal!
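
A bit-serial comparator needs very little state: one ‘equal so far’ bit and a position counter. The Python sketch below is a behavioural model under stated assumptions: the stream is made of fixed-width ASCII words the same length as every key, bits arrive most significant bit first, and the names `BitSerialComparator`, `to_bits` and `search` are illustrative.

```python
class BitSerialComparator:
    """Compares a serial bit stream with a stored key, one bit per clock."""

    def __init__(self, key_bits):
        self.key_bits = list(key_bits)
        self.position = 0
        self.equal_so_far = 1

    def clock(self, bit):
        """Consume one bit; returns 1 or 0 at the end of a word, else None."""
        self.equal_so_far &= int(bit == self.key_bits[self.position])
        self.position += 1
        if self.position < len(self.key_bits):
            return None
        matched = self.equal_so_far
        self.position, self.equal_so_far = 0, 1   # ready for the next word
        return matched


def to_bits(text):
    """Bits of an ASCII string, most significant bit of each byte first."""
    return [(byte >> i) & 1 for byte in text.encode("ascii")
            for i in range(7, -1, -1)]


def search(words, keys):
    """Feed every bit to all comparators; returns {key: [matching indices]}."""
    comparators = {key: BitSerialComparator(to_bits(key)) for key in keys}
    hits = {key: [] for key in keys}
    for index, word in enumerate(words):
        for bit in to_bits(word):
            # In hardware all comparators see the bit in the same cycle
            for key, comparator in comparators.items():
                if comparator.clock(bit) == 1:
                    hits[key].append(index)
    return hits
```

Because each comparator is so small, thousands of them can share one stream, which is the trade-off described above.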

Page 36: Reconfigurable Computing - FPGA structures

Serial Circuits

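Bit serial adder

LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY serial_add IS
  PORT( a, b, clk : IN  std_logic;
        sum, cout : OUT std_logic );
END ENTITY serial_add;

ARCHITECTURE df OF serial_add IS
  -- Carry held over from the previous bit; no reset is shown here
  SIGNAL cint : std_logic;
BEGIN
  PROCESS( clk )
  BEGIN
    IF clk'EVENT AND clk = '1' THEN
      -- Both assignments use the value of cint from before the clock edge
      sum  <= a XOR b XOR cint;
      cint <= (a AND b) OR (b AND cint) OR (a AND cint);
    END IF;
  END PROCESS;
  cout <= cint;
END ARCHITECTURE df;

[Figure: full adder (FA) with inputs a, b and cin; its outputs are held in a 2-bit register driven by clock, giving sum and cout, with the stored carry fed back to cin]

Note: The synthesizer will insert the register (flip-flops) on the internal signals! It will recognize the IF clk'EVENT … pattern!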
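
The clocked behaviour can be mimicked in a few lines of Python. This is a behavioural sketch, not a simulation of the VHDL: it assumes the carry register starts at 0 and that operands are fed least significant bit first; `SerialAdder` and `serial_add` are illustrative names.

```python
class SerialAdder:
    """One full adder plus a 2-bit register (sum and carry)."""

    def __init__(self):
        self.cint = 0   # carry register, assumed cleared before the first bit
        self.sum = 0

    def clock(self, a, b):
        """Rising clock edge: both registers update from the old carry."""
        new_sum = a ^ b ^ self.cint
        new_carry = (a & b) | (b & self.cint) | (a & self.cint)
        self.sum, self.cint = new_sum, new_carry
        return self.sum


def serial_add(a, b, n):
    """Add two n-bit integers in n clock cycles, least significant bit first."""
    adder = SerialAdder()
    total = 0
    for i in range(n):
        total |= adder.clock((a >> i) & 1, (b >> i) & 1) << i
    return total, adder.cint
```

One result bit appears per cycle, so an n-bit addition takes n clocks but uses only a single full adder.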

