Beyond the Arithmetic Constraint: Depth-Optimal Mapping of Logic Chains in LUT-based FPGAs

by Michael T. Frederick and Arun K. Somani
Iowa State University, Ames, IA USA

FPGA 2008


Motivation

- HDL macros preserve chains through synthesis, technology mapping, clustering, placement, and routing
- Circuit delay is usually dominated by programmable interconnect; carry chains are one attempt to address this
  - 0 ps wire delay for adjacent-cell interconnect in Stratix
  - Carry logic delay is 58 ps
  - LUT delay is 366 ps
  - Programmable routing delay is typically in the range 300 ps to 2.0 ns
- About 70% of circuit delay is due to routing, 30% due to LUTs
- Next to no delay is due to carry chain connections

Callouts: Artificial constraint, High performance, Squeaky wheel, Overspecialized
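The delay figures above can be turned into a small back-of-the-envelope calculation. This is only a sketch: the path model (one LUT delay per level, one wire hop between consecutive LUTs) and the function names are assumptions for illustration, and the 1000 ps routing value used below is simply a point picked inside the quoted 300 ps to 2.0 ns range.

```python
LUT_PS = 366        # LUT delay quoted on the slide
CHAIN_WIRE_PS = 0   # adjacent-cell chain wire delay quoted on the slide


def path_delay_ps(n_luts, n_chain_hops, routing_ps):
    """Delay of a path of n_luts LUTs (assumed model, not a timing analyzer).

    The n_luts - 1 hops between consecutive LUTs are either chain hops
    (0 ps wire delay) or routed hops (routing_ps each).
    """
    routed_hops = (n_luts - 1) - n_chain_hops
    return n_luts * LUT_PS + routed_hops * routing_ps + n_chain_hops * CHAIN_WIRE_PS


def routing_share(n_luts, n_chain_hops, routing_ps):
    """Fraction of the path delay spent in programmable routing."""
    routed_hops = (n_luts - 1) - n_chain_hops
    return routed_hops * routing_ps / path_delay_ps(n_luts, n_chain_hops, routing_ps)
```

Under these assumptions, a four-LUT path with 1000 ps per routed hop takes 4464 ps, about 67% of it in routing; moving all three hops onto a chain leaves 1464 ps.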


Logic Chains: The Generalization of Arithmetic

- (K-1)-LUT operating mode
  - Carry-select addition
  - Separate (K-1)-LUT functions (e.g. cout and sum)
  - Altera Stratix & Cyclone
- K-LUT operating mode
  - Single K-LUT function
  - Altera Stratix LUT Chain
  - Carry chain reuse cell (Frederick et al. 2007)


Technology Mapping for Chains

- FlowMap (Cong et al. 1994)
  - First polynomial-time, optimal logic depth solution
  - Assumes that LUTs are connected solely with routing
- Arbitrary net delay FlowMap (Cong et al. 1994)
  - Static estimated net delay
  - Not adapted to chain characteristics
- Carry chain reuse (Frederick et al. 2007)
  - Post-technology-map heuristic methods for assigning carry chain nets


Redefining depth

- Minimize routing depth instead of logic depth
  - Chain net: negligible delay (~0 ps)
  - Routing net: non-zero delay (>0 ps)
- Chains are a series of depth increasing nodes
  - A depth increasing node is a Boolean node that increases logic depth, but not routing depth
  - Exclusivity: a chain net is an exclusive connection between adjacent LEs
- ChainMap is inspired by, and incorporates, FlowMap concepts
  - Labeling
    - Ascertain optimal routing depth of each node
    - Identify depth increasing nodes
    - Compute minimum routing and logic height cut for each node
  - Mapping: form LUTs defined by minimum height cuts
  - Duplication: comply with exclusivity constraint
  - Relaxation (optional): reduce the number of node duplications
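The difference between the two depth measures can be illustrated on an already-mapped LUT network. The sketch below assumes a simplified model in which every fanin is tagged as a `"chain"` or `"route"` net and primary inputs sit at depth 0; the data layout and the name `depths` are hypothetical and not taken from ChainMap itself.

```python
def depths(luts):
    """Return (logic_depth, routing_depth) for every LUT.

    luts maps a LUT name to a list of (fanin, kind) pairs, with kind either
    "chain" or "route". Entries must be listed in topological order; any
    fanin that is not itself a key is treated as a primary input (depth 0).
    """
    logic, routing = {}, {}
    for name, fanins in luts.items():
        # Every LUT level adds one to logic depth.
        logic[name] = max(logic.get(u, 0) for u, _ in fanins) + 1
        # Only routed nets add to routing depth; chain nets add nothing.
        routing[name] = max(
            routing.get(u, 0) + (0 if kind == "chain" else 1)
            for u, kind in fanins
        )
    return logic, routing


# Three LUTs in series, first fully routed, then linked by chain nets.
routed = {
    "u1": [("a", "route"), ("b", "route")],
    "u2": [("u1", "route"), ("c", "route")],
    "u3": [("u2", "route"), ("d", "route")],
}
chained = {
    "u1": [("a", "route"), ("b", "route")],
    "u2": [("u1", "chain"), ("c", "route")],
    "u3": [("u2", "chain"), ("d", "route")],
}
```

In the chained version the logic depth of `u3` is still 3, but its routing depth drops from 3 to 1, which is the quantity the slide proposes to minimize.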


Preliminaries

- Cut: a partition $(X, \bar{X})$ of the nodes in a DAG
- LUT(t): K-feasible set of nodes, determined by a minimum routing height cut, that implements t
- Routing height of node t is determined by cut height and depth increasing node
- Logic height of node t is determined by cut height

Objective 1: minimize routing depth

\[
g(t) = \min_{K\text{-feasible } (X,\bar{X})} \left[ h_G(X,\bar{X}) +
\begin{cases} 0 & d \in X \\ 1 & \text{otherwise} \end{cases} \right]
\]

Objective 2: minimize logic depth

\[
l(t) = \min_{K\text{-feasible } (X,\bar{X})} h_L(X,\bar{X}) + 1
\]

[Figure: cuts $(X, \bar{X})$ of the network with LUT(t) marked]
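To make the cut vocabulary concrete, the sketch below evaluates a single candidate cut. It borrows the FlowMap conventions (cut size is the number of X-side nodes feeding the LUT side, cut height is the largest label left on the X side) and assumes that the zero-cost case of the routing label applies when the depth increasing node d stays on the X side. All function names and the tiny network are hypothetical.

```python
def cut_inputs(preds, lut_nodes):
    """X-side nodes that drive at least one node inside the LUT (X-bar side)."""
    return {u for v in lut_nodes for u in preds.get(v, ()) if u not in lut_nodes}


def is_k_feasible(preds, lut_nodes, k):
    """A cut is K-feasible when the LUT it defines needs at most K inputs."""
    return len(cut_inputs(preds, lut_nodes)) <= k


def cut_height(label, cone_nodes, lut_nodes):
    """h(X, X-bar): largest label among the nodes left on the X side."""
    return max(label[u] for u in cone_nodes - lut_nodes)


def routing_label(g, cone_nodes, lut_nodes, d):
    """Routing label of t for one given cut (assumed reading of the formula).

    If d is outside the LUT it is assumed to reach the LUT over a chain net,
    so no routing level is added; otherwise one level is added.
    """
    extra = 0 if d in cone_nodes - lut_nodes else 1
    return cut_height(g, cone_nodes, lut_nodes) + extra


# Hypothetical cone of t: PIs a, b, c; n1 = f(a, b); n2 = f(n1, c); t = f(n2, c).
preds = {"n1": ["a", "b"], "n2": ["n1", "c"], "t": ["n2", "c"]}
cone_t = {"a", "b", "c", "n1", "n2", "t"}
g = {"a": 0, "b": 0, "c": 0, "n1": 1, "n2": 1}
```

With K = 2, the cut that keeps only t in the LUT is feasible (inputs n2 and c) and gives a routing label of 1 when n2 acts as d, against 2 when no chain net is used.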


Labeling Step 1: Find a depth increasing node

- Construct the cone N_t of t in DAG N
  - Exclude all non-predecessors of t
  - Join all PIs with global source s
- Create P, the predecessors of t where g(u) = p
  - t is included in P
  - d can be any node in P
  - Special case: when t = d then LUT(t) = P
- Partition P using DFS(d, P)
  - P_d consists of chain nodes
  - ~P_d consists of nodes that join LUT(t)
  - Special case: when t = d then P_d = P, ~P_d = Ø
- If no edge exists between P_d - {d} and ~P_d then d is a valid depth increasing node
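One way to read this step in code is sketched below. It assumes that DFS(d, P) means a backward traversal from d that never leaves P, and that p is the largest label among the predecessors of t; both are interpretations of the slide rather than confirmed details, and the helper names are hypothetical.

```python
def cone(preds, t):
    """t together with all of its predecessors."""
    seen, stack = set(), [t]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(preds.get(u, ()))
    return seen


def partition(preds, g, t, d):
    """Split P into (P_d, ~P_d) for a candidate depth increasing node d."""
    n_t = cone(preds, t)
    p = max(g[u] for u in n_t if u != t)
    big_p = {u for u in n_t if u != t and g[u] == p} | {t}
    if d not in big_p:
        raise ValueError("d must be chosen from P")
    p_d, stack = set(), [d]
    while stack:                      # backward DFS from d, restricted to P
        u = stack.pop()
        if u in big_p and u not in p_d:
            p_d.add(u)
            stack.extend(preds.get(u, ()))
    return p_d, big_p - p_d


def is_valid_d(preds, p_d, rest, d):
    """d is valid when no edge links P_d - {d} with ~P_d (either direction)."""
    inner = p_d - {d}
    forward = any(u in inner for v in rest for u in preds.get(v, ()))
    backward = any(u in rest for v in inner for u in preds.get(v, ()))
    return not (forward or backward)


preds = {"n1": ["a", "b"], "n2": ["n1", "c"], "t": ["n2", "c"]}
g = {"a": 0, "b": 0, "c": 0, "n1": 1, "n2": 1}
```

For this network and d = n2 the partition is P_d = {n1, n2} and ~P_d = {t}, and d is valid; adding a direct edge from n1 to t would make the same d invalid.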


Labeling Step 2: Isolate depth increasing node

- Collapse all of P_d into d' and all of ~P_d into t'
  - Special case t = d results in collapse of all P_d into d' = t'
  - Precludes a cut that bisects the nodes in P_d or ~P_d
- If N_t' can be formed, then there is a valid depth increasing node
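Collapsing a set of nodes into a single node is a plain graph rewrite. The helper below is a minimal sketch on an edge-set representation; the name `collapse` and the example network are assumptions for illustration.

```python
def collapse(edges, group, new_name):
    """Merge every node in group into new_name.

    Edges that end up inside the merged node are dropped, and parallel edges
    created by the merge are deduplicated by the set.
    """
    merged = set()
    for u, v in edges:
        u2 = new_name if u in group else u
        v2 = new_name if v in group else v
        if u2 != v2:
            merged.add((u2, v2))
    return merged


edges = {("a", "n1"), ("b", "n1"), ("n1", "n2"), ("c", "n2"), ("n2", "t"), ("c", "t")}
collapsed = collapse(edges, {"n1", "n2"}, "d'")
```

After collapsing {n1, n2} into d', no cut of the resulting graph can separate n1 from n2, which is the property the slide relies on.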


Labeling Step 3: Find min-height K-feasible cut

- Construct the flow residual graph N_t''
  - Split all nodes except {s, t}
  - Assign bridging edges a capacity of 1 and all other edges ∞ capacity
- Apply Max-flow, Min-cut
  - If more than K augmenting paths are found, there is no K-feasible cut and LUT(t) = t
  - If K or fewer augmenting paths are found, DFS can be used to find the cut set
- Any node in P can be d
- l(u) is used to select the cut with minimum logic height from among the K-feasible minimum routing height cuts
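The node-splitting construction and the augmenting-path bound can be sketched as follows. This is a generic unit-capacity max-flow on an edge list, not ChainMap's implementation: it assumes the collapse from the previous step has already been applied, it uses BFS for both augmentation and the final reachability pass (the slide mentions DFS for the latter), and it ignores the l(u) tie-break. The function name is hypothetical.

```python
from collections import defaultdict, deque

INF = float("inf")


def k_feasible_cut(edges, s, t, k):
    """Return the set of cut nodes, or None if more than k paths exist."""
    cap = defaultdict(lambda: defaultdict(int))
    nodes = {n for e in edges for n in e} - {s, t}

    def out_side(v):
        return v if v in (s, t) else (v, "out")

    def in_side(v):
        return v if v in (s, t) else (v, "in")

    for v in nodes:
        cap[(v, "in")][(v, "out")] = 1           # bridging edge, capacity 1
    for u, v in edges:
        cap[out_side(u)][in_side(v)] = INF       # original edge, unbounded

    def reachable():
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable()
        if t not in parent:
            break
        if flow == k:                            # a (k+1)-th path exists
            return None
        v = t
        while parent[v] is not None:             # push one unit along the path
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1
    # Cut nodes: input half reachable from s, output half not.
    return {v for v in nodes if (v, "in") in parent and (v, "out") not in parent}


edges = [("s", "a"), ("s", "b"), ("s", "c"),
         ("a", "n1"), ("b", "n1"), ("n1", "t"), ("c", "t")]
```

For this example a 2-feasible cut exists and consists of n1 and c, while asking for a 1-feasible cut returns `None`.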


Mapping

- Using minimum height cuts, generate a mapping solution
  - Exactly as in FlowMap
- Create set T, containing all POs
- For each t ∈ T, use its K-feasible cut to create LUT(t) = t'
- Update T to be (T - {t}) ∪ input(t')
- Repeat process until all nodes in T are PIs
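The mapping loop described above is short enough to sketch directly. The sketch assumes the labeling phase has already stored, for each node, the input set of its chosen LUT; the dictionary layout and the name `map_luts` are hypothetical.

```python
def map_luts(lut_inputs, primary_outputs, primary_inputs):
    """Walk back from the POs, instantiating one LUT per visited node.

    lut_inputs[t] is the set of nodes feeding LUT(t), as chosen by the
    labeling phase. Nodes absorbed inside another LUT are never visited,
    so they produce no LUT of their own.
    """
    todo = [t for t in primary_outputs if t not in primary_inputs]
    mapped = {}
    while todo:
        t = todo.pop()
        if t in mapped:
            continue
        mapped[t] = set(lut_inputs[t])
        todo.extend(u for u in lut_inputs[t]
                    if u not in primary_inputs and u not in mapped)
    return mapped


# Hypothetical labeling result: n2 is covered by LUT(t) and is never needed.
lut_inputs = {"t": {"n1", "c"}, "n1": {"a", "b"}, "n2": {"a", "c"}}
result = map_luts(lut_inputs, ["t"], {"a", "b", "c"})
```

Only `t` and `n1` receive LUTs here, since `n2` never appears as an input of a LUT that was actually created.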


Duplication: The Exclusivity Constraint

1. If either u or v has K inputs, u and v must be equivalent

2. If |input(u) ∪ input(v)| < K, u and v can compute different functions

3. u must generate a chain output and v a routing output, or vice versa

4. input(v) does not contain u and input(u) does not contain v


Duplication

- Traverse N in reverse topological order
  - If t violates exclusivity, it is duplicated as t'
  - Remap LE {u, v} ∈ output(t)
- Area bounded by O(n^2)
  - Worst case expansion occurs when each chain from s to its descendant POs is duplicated
  - For a chain with n nodes, O(n) x O(n) possible duplications


Relaxation

- Process nodes in reverse topological order (PO to PI)
- For each node t, identify LEs {u, v} for all u, v ∈ output(t)
  - If only 1 LE, leave alone
  - If more than 1 LE, relax all of the shortest paths to POs
- Designed to target long logic chains, typified by arithmetic


Experimental Flow

- Quartus II - HDL elaboration to VQM format (Malhotra et al. 2004)
- SIS - synthesis, technology decomposition, and chain extraction of VQM netlist (Sentovich et al. 1992)
- DMIG, FlowPack - K-bounding, logic network reduction (Chen et al. 1992)
- Four different experimental flows
  - forget - don't use chains after elaboration
  - before - insert chains after synthesis and before ChainMap
  - after - insert chains after ChainMap
  - normal - insert chains after FlowMap


Optimal Results

- Speedup increases as routing becomes more costly
- Area often not feasible
  - cfft, K=5, before, 2.07x
- before, forget outperform after
- before, forget closely mirror each other (<5%)
- Growth rate with G:L of K=[4,5] nearly the same, but K=6 slightly lower due to fewer nets


Relaxed Results

- before, forget closely mirror each other (<5%)
- before, forget trend differently for K=[4,6]
  - As G:L and K increase, ability to mask relaxed nets decreases
  - Vulnerability of relaxation technique for higher G:L
- after is always increasing
  - Few opportunities for tool-generated chains, yielding fewer relaxations
  - HDL chains more resilient against increased G:L


A closer look…

- Optimal routing depth is always as good as or better than traditional mapping
- Ubiquitous speedup for optimal solutions
- In some cases, the relaxation technique hinders performance
- LUT consumption often reduced


Conclusions & Future Work

- Formally, a generic logic chain is a subnetwork of adjacent nodes with equal routing depth and increasing logic depth
  - L ⊂ N s.t. g(u_j) = g(u_i), l(u_j) = l(u_i) + 1, for all u_i, u_j ∈ L
- Polynomial O(n^3) time identification of generic logic chains
  - Eliminates the need to preserve HDL
- Finish full design flow experiments, including clustering, placement, and routing
- Develop more creative & effective relaxation techniques
- Explore architectures that more effectively support generic chains
- Develop CAD tools that no longer depend on HDL preservation
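The formal chain definition on this slide translates into a one-line check. The sketch assumes the condition is meant for each pair of adjacent nodes, taken in chain order; the function name and label dictionaries are hypothetical.

```python
def is_logic_chain(nodes, g, l):
    """True if consecutive nodes keep the same routing depth g while the
    logic depth l grows by exactly one per node."""
    return all(
        g[b] == g[a] and l[b] == l[a] + 1
        for a, b in zip(nodes, nodes[1:])
    )


g = {"u1": 1, "u2": 1, "u3": 1}
l = {"u1": 1, "u2": 2, "u3": 3}
```

Here `is_logic_chain(["u1", "u2", "u3"], g, l)` holds, and raising the routing depth of `u3` to 2 breaks the chain.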


Questions?

Thanks!


References

1. Altera. Stratix Series User Guides. www.altera.com.
2. J. Cong and Y. Ding. FlowMap: an optimal technology mapping algorithm for delay optimization in lookup-table based FPGA designs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 13(1):1-12, 1994.
3. A. Farrahi and M. Sarrafzadeh. Complexity of the lookup-table minimization problem for FPGA technology mapping. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 13(11):1319-1332, 1994.
4. L. R. Ford and D. R. Fulkerson. Flows in Networks. Princeton Univ. Press, Princeton, NJ, 1962.
5. M. Frederick and A. Somani. Non-arithmetic carry chains for reconfigurable fabrics. In Proceedings of the 15th International Conference on Computer Design, pages 137-143, October 2007.
6. S. Malhotra, T. Borer, D. Singh, and S. Brown. The Quartus University Interface Program: enabling advanced FPGA research. In Proceedings of the 2004 IEEE Int'l Conference on Field-Programmable Technology, pages 225-230, Dec. 2004.
7. OpenCores. www.opencores.org.
8. E. Sentovich, K. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. Stephan, R. K. Brayton, and A. L. Sangiovanni-Vincentelli. SIS: A system for sequential circuit synthesis. Technical Report UCB/ERL M92/41, EECS Department, University of California, Berkeley, 1992.
9. S. Singh, J. Rose, P. Chow, and D. Lewis. The effect of logic block architecture on FPGA performance. Journal of Solid-State Circuits, 27:281-287, March 1992.


Abstract

Look-up table based FPGAs have migrated from a niche technology for design prototyping to a valuable end-product component and, in some cases, a replacement for general purpose processors and ASICs alike. One way architects have bridged the performance gap between FPGAs and ASICs is through the inclusion of specialized components such as multipliers, RAM modules, and microcontrollers. Another dedicated structure that has become standard in reconfigurable fabrics is the arithmetic carry chain. Currently, it is only used to map arithmetic operations as identified by HDL macros. For non-arithmetic operations, it is an idle but potentially powerful resource.

This work presents ChainMap, a polynomial-time delay-optimal technology mapping algorithm for the creation of generic logic chains in LUT-based FPGAs. ChainMap requires no HDL macros be preserved through the design flow. It creates logic chains, both arithmetic and non-arithmetic, in an arbitrary Boolean network whenever depth increasing nodes are encountered. Use of the chain is not reserved for arithmetic, but rather any set of gates exhibiting similar characteristics. By using the carry chain as a generic, near zero-delay adjacent cell interconnection structure an average optimal speedup of 1.4x is revealed, and an average relaxed speedup of 1.25x can be realized simultaneously with a 0.95x LUT utilization decrease.


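Proof of increasing routing depth

Given:

\[
g(t) = \min_{K\text{-feasible } (X,\bar{X})} \left[ h_G(X,\bar{X}) +
\begin{cases} 0 & d \in X \\ 1 & \text{otherwise} \end{cases} \right]
\]

Show: $p \le g(t) \le p + 1$, where $p = \max\{ g(u) : u \in input(t) \}$

\[
g(u) \le g(t) \le p + 1 \;\; \forall u \in input(t) \;\Rightarrow\; p \le g(t) \le p + 1
\]

Notation:

- $(X,\bar{X})$ - minimum height cut of $N_t$
- $(Y,\bar{Y})$ - cut in $N_u$ induced by $(X,\bar{X})$
- $(Z,\bar{Z})$ - minimum height cut of $N_u$

Case 1: $u \in X$, $X \ne N_t - \{t\}$

\[
g(t) \ge h_G(X,\bar{X}) \ge g(u) \;\Rightarrow\; g(t) \ge g(u)
\]

Case 2: $u \in \bar{X}$

Since $(Z,\bar{Z})$ is a minimum height cut of $N_u$, $h_G(Y,\bar{Y}) \ge h_G(Z,\bar{Z})$; since $Y \subseteq X$, $h_G(X,\bar{X}) \ge h_G(Y,\bar{Y})$. Hence

\[
h_G(X,\bar{X}) \ge h_G(Y,\bar{Y}) \ge h_G(Z,\bar{Z})
\]

(i) $g(t) = h_G(X,\bar{X}) + 1$, $g(u) = h_G(Z,\bar{Z})$:

\[
g(t) = h_G(X,\bar{X}) + 1 \ge h_G(Z,\bar{Z}) = g(u) \;\Rightarrow\; g(t) \ge g(u)
\]

(ii) $g(t) = h_G(X,\bar{X}) + 1$, $g(u) = h_G(Z,\bar{Z}) + 1$:

\[
g(t) - 1 = h_G(X,\bar{X}) \ge h_G(Z,\bar{Z}) = g(u) - 1 \;\Rightarrow\; g(t) \ge g(u)
\]

(iii) $g(t) = h_G(X,\bar{X})$, $g(u) = h_G(Z,\bar{Z})$:

\[
g(t) = h_G(X,\bar{X}) \ge h_G(Z,\bar{Z}) = g(u) \;\Rightarrow\; g(t) \ge g(u)
\]

(iv) $g(t) = h_G(X,\bar{X})$, $g(u) = h_G(Z,\bar{Z}) + 1$: here $g(t) = g(d) = h_G(X,\bar{X})$. Split on whether $d \in Y$; in both sub-cases, using $h_G(Y,\bar{Y}) \ge h_G(Z,\bar{Z})$,

\[
g(t) = g(d) \ge h_G(Z,\bar{Z}) + 1 = g(u) \;\Rightarrow\; g(t) \ge g(u)
\]

Case 3: $u \in X$, $X = N_t - \{t\}$

\[
g(t) = h_G(N_t - \{t\}, \{t\}) + 1 = p + 1 \;\Rightarrow\; g(t) \le p + 1
\]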

