
Lecture 13: Mid-term 1 Review October 22, 2013

ECE 636

Reconfigurable Computing

Lecture 13

Mid-term I Review


SRAM-based FPGA

• SRAM bits can be programmed many times

• Each programming bit takes up five transistors

• Larger device area reduces speed versus EPROM and antifuse.

[Figure: programming bit (signals: Read or Write, Data, Q) and 2-input LUT (inputs I1, I2; programming bits P1, P2, P3, P4; output Out)]
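A 2-input LUT can be modeled as a four-entry table indexed by the two inputs. The sketch below is illustrative only: the function name `lut2` and the bit ordering (I1 taken as the high index bit) are assumptions, not something the slide specifies.

```python
def lut2(p, i1, i2):
    """Return the LUT output for inputs i1 and i2.

    p is a sequence of four programming bits (P1..P4).
    Assumed ordering: i1 is the high bit of the table index.
    """
    index = (i1 << 1) | i2
    return p[index]

# Program the LUT as an XOR gate: outputs for inputs 00, 01, 10, 11.
xor_bits = [0, 1, 1, 0]
truth_table = [lut2(xor_bits, a, b) for a in (0, 1) for b in (0, 1)]
print(truth_table)
```

Changing the four stored bits changes the implemented function without touching the surrounding wiring, which is what makes the SRAM bits "programming" bits.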


Field Programmable Gate Array


Connection Box Flexibility

• Fc: how many tracks does an input pin connect to?

• If the logic cluster is small, Fc is large (Fc = W).

• If the logic cluster is large, Fc can be smaller.

- Approximately 0.2W for Xilinx XC4000EX, Virtex

[Figure: logic cluster with an IO pin and output (Out) connecting to tracks T0, T1, T2; Fc = 3]


Switchbox Flexibility

• Switch box provides optimized interconnection area.

• Flexibility found to be not as important as Fc

• Six transistors needed for Fs = 3

[Figure: switch box joining tracks 0 and 1 on each side]


Switchbox Issues


Bidirectional vs Directional


Directional Wiring: Outputs can use switch block muxes

Dir Architecture

Single-driver wiring

New connectivity constraint


Fine-grained Approach

• For 4-input LUTs, 16 bits of information are available

• Can be chained together through programmable network.

• Decoder and multiplexer are an issue.

• Flexibility is a key aspect.

[Figure: LUT1 and LUT2 drawn as 16X1 memories with Addr, A, and D ports]


Hill Climbing Algorithms

• To avoid getting trapped in local minima, consider a “hill-climbing” approach

• Need to accept worse solutions or make “bad” moves to reach the global minimum.

• Acceptance is probabilistic. Only accept cost-increasing moves some of the time.

[Figure: cost plotted over the solution space]
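One common way to make acceptance probabilistic is the simulated-annealing rule, where a cost-increasing move is accepted with probability exp(-Δcost / T). The slide does not name a specific rule, so the temperature parameter and the function name `accept_move` below are assumptions made for illustration.

```python
import math
import random

def accept_move(delta_cost, temperature, rng=random):
    """Decide whether to accept a move that changes cost by delta_cost."""
    if delta_cost <= 0:
        return True  # improving (or neutral) moves are always accepted
    if temperature <= 0:
        return False  # no hill climbing once the temperature reaches zero
    # Cost-increasing moves are accepted only some of the time.
    return rng.random() < math.exp(-delta_cost / temperature)

rng = random.Random(0)
trials = 10000
hot = sum(accept_move(1.0, 10.0, rng) for _ in range(trials)) / trials
cold = sum(accept_move(1.0, 0.1, rng) for _ in range(trials)) / trials
print(f"accept rate at high temperature: {hot:.2f}")
print(f"accept rate at low temperature:  {cold:.2f}")
```

At a high temperature most "bad" moves are taken, which lets the search leave a local minimum; as the temperature drops the search behaves more and more like pure greedy descent.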


Maze Routing

• Evaluate shortest feasible paths based on a cost function

• As in row-based devices, the global route allocates channel bandwidth, not specific solutions.

• Formulate cost function as needed to address desired goal.

[Figure: routing example with blocks labeled L, L, C, S]
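A maze router can be sketched as a cheapest-path search over a grid of routing cells. The version below uses Dijkstra's algorithm with a per-cell cost; the grid representation, the name `maze_route`, and the use of `None` for blocked cells are all assumptions chosen for the example, not details from the lecture.

```python
import heapq

def maze_route(cost, src, dst):
    """Return (total_cost, path) for the cheapest path from src to dst.

    cost[r][c] is the cost of entering cell (r, c); None marks a blocked cell.
    Returns None when dst cannot be reached.
    """
    rows, cols = len(cost), len(cost[0])
    best = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == dst:
            break
        if d > best[cell]:
            continue  # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and cost[nr][nc] is not None:
                nd = d + cost[nr][nc]
                if nd < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    if dst not in best:
        return None
    # Trace back from the target to the source.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return best[dst], path[::-1]

grid = [
    [1, 1, 1, 1],
    [1, None, None, 1],
    [1, 5, 1, 1],
]
total, path = maze_route(grid, (0, 0), (2, 3))
print(total, path)
```

Raising the cost of a cell (the 5 in the bottom row) steers the route elsewhere without forbidding the cell, which is how a cost function can be shaped toward a desired goal such as avoiding congestion.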


Routing Tradeoffs

• Bias router to find first, best route.

• Vary number of node expansions using:

$pcost_i = (1 - a) \times pcost_{i-1} + ncost_i + a \times dist_i$
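The effect of the parameter a can be seen with a small numeric example. The sketch below evaluates the formula exactly as written on the slide; reading pcost as the path cost, ncost as the cost of the node being added, and dist as the remaining distance to the target is an interpretation, and the numbers are made up.

```python
def path_cost(prev_pcost, ncost, dist, a):
    """pcost_i = (1 - a) * pcost_{i-1} + ncost_i + a * dist_i"""
    return (1 - a) * prev_pcost + ncost + a * dist

prev = 4.0
near = {"ncost": 1.0, "dist": 1.0}  # candidate node close to the target
far = {"ncost": 1.0, "dist": 6.0}   # candidate node far from the target

for a in (0.0, 0.5):
    c_near = path_cost(prev, a=a, **near)
    c_far = path_cost(prev, a=a, **far)
    print(f"a={a}: near={c_near}, far={c_far}")
```

With a = 0 the two candidates cost the same, so the search expands in all directions; with a > 0 the candidate nearer the target is cheaper, so fewer nodes are expanded before a route is found.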


Architectural Limitation

• Routing architecture necessitates domain selection.

• Bigger effect for multi-fanout nets


Pathfinder

• Use a non-decreasing history value to represent congestion.

• Similarities to multi-commodity flow

• Can be implemented efficiently but does require substantial run time

• Only update after an iteration.

$c_i = (1 + h_n \cdot h_{fac}) \cdot (1 + p_n \cdot p_{fac}) + b_{n,n-1}$
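The cost formula and the history update can be sketched as follows. The slide does not define the symbols, so treating h_n as the history value, p_n as present congestion, and b as an extra term between a node and its predecessor is an interpretation; the overuse-based update rule and all names are assumptions.

```python
def node_cost(h_n, p_n, b, h_fac, p_fac):
    """c = (1 + h_n * h_fac) * (1 + p_n * p_fac) + b"""
    return (1 + h_n * h_fac) * (1 + p_n * p_fac) + b

def update_history(history, occupancy, capacity):
    """After an iteration, add each node's overuse to its history value.

    Overuse is never negative, so history values never decrease.
    """
    return {
        n: history[n] + max(0, occupancy[n] - capacity[n])
        for n in history
    }

history = {"A": 0, "B": 0}
capacity = {"A": 1, "B": 1}
occupancy = {"A": 2, "B": 1}  # node A is shared by two nets

for iteration in range(2):
    history = update_history(history, occupancy, capacity)
    costs = {n: node_cost(history[n], 0, 0, h_fac=0.5, p_fac=1.0) for n in history}
    print(iteration, history, costs)
```

A node that stays congested across iterations keeps getting more expensive, so nets that have an alternative gradually move away from it.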


Bipartitioning

• Perhaps biggest problem in multi-FPGA design is partitioning

• Partitioner must deal with logic and pin constraints.

• Could simultaneously attempt partitioning across all devices. Even “simple” algorithms are $O(n^3)$

• Better to recursively bipartition circuit.


KLFM Partitioning

• Identify nodes to swap to reduce overall cut size

• Lock moved nodes

• Algorithm continues until no un-locked node can be moved without violating size constraints

[Figure: nodes divided between Bin 1 and Bin 2]


KLFM Partitioning

• Key issue is implementing node costs in lists that can be easily accessed and updated.

• Many extensions to consider to speed up overall optimization

• Reasonably easy to implement in software
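A single KLFM-style pass can be sketched in a few lines. This is a simplified illustration, not the lecture's implementation: it assumes two-pin edges rather than hypergraph nets, recomputes gains on demand instead of keeping them in bucket lists, and uses hypothetical names (`fm_pass`, `max_bin`).

```python
def cut_size(edges, side):
    """Number of edges whose endpoints lie in different bins."""
    return sum(1 for u, v in edges if side[u] != side[v])

def fm_pass(nodes, edges, side, max_bin):
    """One simplified KLFM pass; returns (best assignment seen, its cut size).

    side maps node -> 0 or 1; max_bin is the largest allowed bin size.
    """
    side = dict(side)
    neighbors = {n: [] for n in nodes}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    def gain(n):
        # Cut edges removed minus uncut edges that would become cut.
        external = sum(1 for m in neighbors[n] if side[m] != side[n])
        return external - (len(neighbors[n]) - external)

    sizes = [sum(1 for n in nodes if side[n] == s) for s in (0, 1)]
    locked = set()
    current_cut = cut_size(edges, side)
    best_side, best_cut = dict(side), current_cut
    while True:
        # Unlocked nodes whose move keeps the destination bin within bounds.
        legal = [n for n in nodes
                 if n not in locked and sizes[1 - side[n]] < max_bin]
        if not legal:
            break
        n = max(legal, key=gain)
        current_cut -= gain(n)  # gain is evaluated before the node moves
        sizes[side[n]] -= 1
        side[n] = 1 - side[n]
        sizes[side[n]] += 1
        locked.add(n)  # a moved node stays locked for the rest of the pass
        if current_cut < best_cut:
            best_cut, best_side = current_cut, dict(side)
    return best_side, best_cut

# Two triangles joined by one edge, starting from a poor partition.
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
start = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}
best_side, best_cut = fm_pass(nodes, edges, start, max_bin=4)
print(cut_size(edges, start), "->", best_cut, best_side)
```

The pass keeps moving nodes even when the best available gain is negative and then returns the best assignment seen along the way. A production version would store gains in indexed lists so the best node can be found and updated without rescanning every node.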


Partition Preprocessing: Clustering

• Identify bin size

• Choose a seed block (node)

• Identify node with highest connectivity to join cluster

• Terminate when cluster size met.

• In practical terms cluster size of 4 works best
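The clustering steps above can be sketched as a greedy loop. The edge-list representation, the function names, and the choice to stop early when no remaining node connects to the cluster are assumptions made for this example.

```python
def connectivity(node, members, edges):
    """Count edges between node and the current cluster members."""
    return sum(1 for u, v in edges
               if (u == node and v in members) or (v == node and u in members))

def grow_cluster(seed, edges, unclustered, cluster_size):
    """Grow a cluster from seed by repeatedly adding the best-connected node."""
    cluster = [seed]
    candidates = set(unclustered) - {seed}
    while len(cluster) < cluster_size and candidates:
        members = set(cluster)
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(candidates),
                   key=lambda n: connectivity(n, members, edges))
        if connectivity(best, members, edges) == 0:
            break  # nothing left that touches the cluster
        cluster.append(best)
        candidates.remove(best)
    return cluster

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
cluster = grow_cluster("a", edges, {"a", "b", "c", "d", "e"}, cluster_size=4)
print(cluster)
```

Each resulting cluster is then treated as a single node by the partitioner, which shrinks the problem before KLFM runs.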


Clustering

• Technology mapping before partitioning is typically ineffective, since area is frequently secondary to interconnect

• Frequently bipartitioning continues after unclustering as well.

[Figure: flow of Cluster, KLFM, uncluster, KLFM]

• This allows for additional fine-grain moves.


Logic Replication

• Attempt to reduce cutset by replicating logic.

• Every input of the original cell must also feed the replicated cell.

• Replication can either be integrated into the partitioning process or used as a post-process technique.


Logic Emulation

• Emulation takes a sizable amount of resources

• Compilation time can be large due to FPGA compiles

• Emulation is one application; it also has direct ties to other FPGA computing applications.


Are Meshes Realistic?

• The number of wires leaving a partition grows with Rent’s Rule

$P = K G^{B}$

• Perimeter grows as $G^{0.5}$ but unfortunately most circuits grow at $G^{B}$ where $B > 0.5$

• Effectively, devices are highly pin limited

• What does this mean for meshes?
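A quick calculation shows why this matters. The constants K = 2.5 and B = 0.7 below are hypothetical values picked only to illustrate the trend; the slide gives no specific numbers.

```python
def rent_pins(k, g, b):
    """Rent's Rule: P = K * G**B."""
    return k * g ** b

K, B = 2.5, 0.7  # hypothetical constants, for illustration only
ratios = []
for gates in (100, 10_000, 1_000_000):
    pins = rent_pins(K, gates, B)
    perimeter = gates ** 0.5  # perimeter scales with the square root of area
    ratios.append(pins / perimeter)
    print(f"G={gates:>9}  P={pins:10.1f}  G^0.5={perimeter:7.1f}  "
          f"ratio={pins / perimeter:6.2f}")
```

Because B > 0.5, the ratio of required wires to available perimeter keeps growing with partition size, so a mesh that only offers perimeter-proportional connections falls further behind as the design gets larger.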


Multi-FPGA Software

• Missing high-level synthesis

• Global placement and routing similar to intra-device CAD


Virtual Wires

• Overcome pin limitations by multiplexing pins and signals

• Schedule when communication will take place.


Virtual Wires Software Flow

• Global router enhanced to include scheduling and embedding.

• Multiplexing logic synthesized from FPGA logic.


Why Compiling C is Hard

° General Language

° Not Designed For Describing Hardware

° Features that Make Analysis Hard

• Pointers

• Subroutines

• Linear code

° C has no direct concept of time

° C (and most procedural languages) are inherently sequential

• Most people think sequentially.

• Opportunities primarily lie in parallel data


Variables

° Handel-C has one basic type - integer

° May be signed or unsigned

° Can be any width, not limited to 8, 16, 32 etc.

Variables are mapped to hardware registers.

void main(void)
{
    unsigned 6 a;
    a = 45;
}

a = 1 0 1 1 0 1 = 0x2d (bit positions labeled MSB and LSB in the figure)


DeepC Compiler

• Consider loop based computation to be memory limited

• Computation partitioned across small memories to form tiles

• Inter-tile communication is scheduled

• RTL synthesis performed on resulting computation and communication hardware


DeepC Compiler

• Parallelizes compilation across multiple tiles

• Orchestrates communication between tiles

• Some dynamic (data dependent) routing possible.


Control FSM

• Result for each tile is a datapath, state machine, and memory block


Striped Architecture

• Same basic approach, pipelined communication, incremental modification

• Functions as a linear pipeline

• Each stripe is homogeneous to simplify computation

• Condition codes allow for some control flexibility

[Figure: FPGA fabric, control unit, configuration cache, configuration control and next address, address, condition codes, microprocessor interface]


Piperench Internals

• Only multi-bit functional units used

• Very limited resources for interconnect to neighboring programming elements

• Place and route greatly simplified


Convolutional Encoder

• Accepts information bits as a continuous stream

• Operates on the current b-bit input, where b ranges from 1 to 6, and some number of immediately preceding b-bit inputs to produce V output bits, V > b

[Figure: encoder built from two flip-flops (FF) and two adders (+); b = 1, V = 2]
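A b = 1, V = 2 encoder with two flip-flops can be sketched as below. The slides do not state which flip-flop outputs feed each adder, so the tap choice here (first output = input XOR both flip-flops, second output = input XOR the older flip-flop) is an assumption; it was picked because it reproduces the ENC IN / ENC OUT example on the trellis slide.

```python
def conv_encode(bits, state=(0, 0)):
    """Rate-1/2 encoder with two flip-flops.

    Returns (list of output bit pairs, final state).
    Tap choice is an assumption, not taken from the slides.
    """
    s1, s2 = state  # s1 holds the most recent input bit
    out = []
    for u in bits:
        out.append((u ^ s1 ^ s2, u ^ s2))
        s1, s2 = u, s1  # shift the new bit into the register
    return out, (s1, s2)

pairs, final_state = conv_encode([0, 1, 0])
print(pairs, final_state)
```

Each input bit produces two output bits, and the output depends on the two previous inputs held in the flip-flops, so the encoder has four possible states.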


Definitions

Constraint Length

• Number of successive b-bit groups of information bits for each encoding operation

• Denoted by K

Code Rate (or) Rate

• b/V

Typical values

• K : 7

• Rate : 1/2, 1/3


The Viterbi Algorithm

• Finds a bit-sequence in the set of all possible transmitted bit-sequences that most closely resembles the received data.

• Maximum likelihood algorithm

• Each bit received by decoder associated with a measure of correctness.

• Practical for short constraint length convolutional codes


State diagram

[Figure: state diagram with states 00, 10, 11, 01 and branches 0/00, 1/11, 1/01, 1/10, 0/01, 0/11, 1/00, 0/10]

• State: encoder memory

• Branch: k/ij, where i and j represent the output bits associated with input bit k


Trellis Diagram

[Figure: trellis over states 00, 01, 10, 11 for T=0, T=1, T=2, T=3, with branch outputs and per-state metrics]

ENC IN : 0 1 0

ENC OUT : 00 11 10

RECEIVED: 00 11 11

Accumulated metric

2+2,3+0 : 3

0+1,3+1 : 1

2+0,3+1 : 2

0+1,3+1 : 1

K = 3, Rate 1/2

Total number of states = $2^{K-1}$
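The accumulated metrics in this example can be reproduced with a short add-compare-select loop. The sketch assumes hard-decision Hamming distance as the branch metric and a particular tap choice for the K = 3, rate 1/2 encoder (first output = input XOR both memory bits, second output = input XOR the older memory bit); neither is stated explicitly on the slides. States are written as tuples with the most recent input bit first.

```python
def branch_output(state, u):
    """Return (output bit pair, next state) for input bit u.

    Tap choice is an assumption, not taken from the slides.
    """
    s1, s2 = state  # s1 holds the most recent input bit
    return (u ^ s1 ^ s2, u ^ s2), (u, s1)

def viterbi_metrics(received, start=(0, 0)):
    """Return {state: accumulated Hamming metric} after all received pairs."""
    metrics = {start: 0}
    for r in received:
        new_metrics = {}
        for state, m in metrics.items():
            for u in (0, 1):
                out, nxt = branch_output(state, u)
                d = m + (out[0] != r[0]) + (out[1] != r[1])
                # Add-compare-select: keep the smaller metric into each state.
                if d < new_metrics.get(nxt, float("inf")):
                    new_metrics[nxt] = d
        metrics = new_metrics
    return metrics

received = [(0, 0), (1, 1), (1, 1)]
final = viterbi_metrics(received)
for state in sorted(final):
    print(state, final[state])
```

Under these assumptions the final metrics for states 00, 01, 10, 11 come out as 3, 1, 2, 1, the same selected values listed under "Accumulated metric". A full decoder would also store the surviving predecessor of each state so the bit sequence can be traced back.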


Adaptive Viterbi Algorithm

Motivation

• Extremely large memory and logic for Viterbi Algorithm

• Fewer number of paths retained

• Reduced memory and computation

Definitions

• Path: bit sequence

• Path metric or cost: accumulated error metric of a path

• Survivor: path which is retained for the subsequent time step


Adaptive Viterbi Algorithm

Criterion for path survival

1. A threshold T is introduced such that a path is retained if and only if current path metric is less than dm+T, where dm is the minimum cost among all survivors of the previous time step.

2. The total number of survivors per time step is limited to a critical number called Nmax selected by user.

Only best Nmax paths have to be retained at any time.


Trellis Diagram for AVA


Architecture (contd.)

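[Figure: flowchart. Two adders (Add) take b1 and b2 and produce sum1 and sum2; each result is tested with di < dm + T; passing paths are counted; if Count < Nmax, memory is updated, otherwise T = T-2 and the test is repeated.]

Elimination of sorting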
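The threshold loop that replaces sorting can be sketched as follows. The dictionary representation and the names are assumptions; the flowchart writes the count test as "Count < Nmax", while this sketch uses "<=" so that up to Nmax paths are kept, which is also an assumption.

```python
def select_survivors(path_metrics, dm, threshold, nmax, step=2):
    """Apply both survival criteria without sorting the paths.

    path_metrics maps a path label to its current metric d_i.
    Returns (survivors, final threshold).
    """
    while True:
        # Criterion 1: keep paths whose metric is below dm + T.
        survivors = {p: d for p, d in path_metrics.items()
                     if d < dm + threshold}
        # Criterion 2: at most nmax survivors.
        if len(survivors) <= nmax:
            return survivors, threshold
        threshold -= step  # too many survivors: tighten T and test again

metrics = {"a": 1, "b": 1, "c": 2, "d": 3, "e": 4}
survivors, final_t = select_survivors(metrics, dm=0, threshold=5, nmax=3)
print(sorted(survivors), final_t)
```

Only comparisons against a single threshold are needed, which maps to simple parallel comparators in hardware. One caveat of this simple loop is that a large step can overshoot and leave fewer than Nmax survivors.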


Virtual Router

• Independent routing policies for each virtual router

Key challenges

• Isolation

• Performance

• Flexibility

• Scalability

[Figure: physical router with DEMUX and MUX around virtual router A and virtual router B, each with a Forwarding Table and Routing Control]


Virtualization using FPGAs

A novel network virtualization substrate which

• Uses FPGA to implement high performance virtual routers

• Introduces scalability through virtual routers in host software

• Exploits reconfiguration to customize hardware virtual routers

[Figure: FPGA containing Virtual Router 1, Virtual Router 2, Virtual Router 3, and Virtual Router 4]


Partial Reconfiguration

• Use partial reconfiguration to independently configure virtual routers


Full FPGA Reconfiguration

• Two virtual routers (A, B) initially in FPGA

• During reconfiguration router A migrated to software, the other eliminated

• After reconfiguration two virtual routers (A, B’) again in FPGA

[Figure label: Reduced Throughput]


Partial FPGA Reconfiguration

• A remains in hardware and operates at full speed

• 20X speedup in reconfiguration down time due to partial reconfiguration

[Figure label: Sustained Throughput]

