Routing Wire Optimization through Generic Synthesis on FPGA Carry

Post on 22-Feb-2016



Routing Wire Optimization through Generic Synthesis on FPGA Carry

Hadi P. Afshar

Joint work with: Grace Zgheib, Philip Brisk and Paolo Ienne


FPGAs and ASICs Gaps*

• Performance: ratio of 3-4

• Area: ratio of 20-35

• Power: ratio of 7-15

*I. Kuon and J. Rose, "Measuring the gap between FPGAs and ASICs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 2, February 2007, pp. 203-215.

Routing resources consume ≈60-80% of the chip area and are significant contributors to circuit delay.

How to narrow the gap?
• Specialized (DSP) blocks
• Coarser-grained logic blocks
• Hard-wired connections

Concerns:
✘ Lack of generality and flexibility
✘ Underutilization
✘ Change in routing structure

Carry Chains

[Figure: array of CLBs with a detail view showing 4-LUTs, adders (+), and 8 inputs.]

Motivation Example

Problem Definition

LUT Mapped Flow Graph

Step1: Logic Matching

Step2: Chaining


Logic Matching

• Step1: Enumeration of the programmable part
• Step2: Identifying regular and independent segments
• Step3: Developing the alphabet library of the macro cell
• Step4: Mask division and library matching

[Figure: macro cell with two LUTs whose outputs A and B feed an adder (+) with carry input Cin and carry output Cout.]

Logic Matching (Example)

• Step1: Enumeration

i3  i2  i1  i0    LUT1  LUT2
0   0   0   0     A0    B0
0   0   0   1     A0    B1
0   0   1   0     A1    B0
0   0   1   1     A1    B1
0   1   0   0     A2    B2
0   1   0   1     A2    B3
0   1   1   0     A3    B2
0   1   1   1     A3    B3
1   0   0   0     A4    B4
1   0   0   1     A4    B5
1   0   1   0     A5    B4
1   0   1   1     A5    B5
1   1   0   0     A6    B6
1   1   0   1     A6    B7
1   1   1   0     A7    B6
1   1   1   1     A7    B7
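The enumeration can be reproduced with a short Python sketch. It assumes, based only on the pattern visible in the table, that LUT1 is addressed by (i3, i2, i1) and LUT2 by (i3, i2, i0); the slides do not state this wiring explicitly, and the function name is hypothetical.

```python
from itertools import product


def enumerate_cell():
    """Return one row per input vector: ((i3, i2, i1, i0), LUT1 entry, LUT2 entry)."""
    rows = []
    for i3, i2, i1, i0 in product((0, 1), repeat=4):
        # Assumption inferred from the table: LUT1 sees i3, i2, i1.
        a_index = (i3 << 2) | (i2 << 1) | i1
        # Assumption inferred from the table: LUT2 sees i3, i2, i0.
        b_index = (i3 << 2) | (i2 << 1) | i0
        rows.append(((i3, i2, i1, i0), f"A{a_index}", f"B{b_index}"))
    return rows


if __name__ == "__main__":
    for bits, lut1, lut2 in enumerate_cell():
        print(*bits, lut1, lut2)
```

Each printed line has the same six fields as a row of the table: the four input bits followed by the selected LUT1 and LUT2 entries.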

Logic Matching (Example)

• Step2: Regular and Independent Segments

i3  i2  i1  i0    LUT1  LUT2
0   0   0   0     A0    B0
0   0   0   1     A0    B1
0   0   1   0     A1    B0
0   0   1   1     A1    B1
0   1   0   0     A2    B2
0   1   0   1     A2    B3
0   1   1   0     A3    B2
0   1   1   1     A3    B3
1   0   0   0     A4    B4
1   0   0   1     A4    B5
1   0   1   0     A5    B4
1   0   1   1     A5    B5
1   1   0   0     A6    B6
1   1   0   1     A6    B7
1   1   1   0     A7    B6
1   1   1   1     A7    B7
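One way to find independent segments is to group rows that share any configuration bit. The sketch below is an illustration under the same wiring assumption read off the table (LUT1 on i3, i2, i1; LUT2 on i3, i2, i0); it is not necessarily how the authors identify segments, and `find_segments` is a hypothetical name.

```python
from itertools import product


def find_segments():
    """Group input rows into segments that share no LUT configuration bits."""
    # Rebuild the enumeration: each row maps to the two LUT entries it selects.
    rows = {}
    for i3, i2, i1, i0 in product((0, 1), repeat=4):
        a = f"A{(i3 << 2) | (i2 << 1) | i1}"  # assumed LUT1 addressing
        b = f"B{(i3 << 2) | (i2 << 1) | i0}"  # assumed LUT2 addressing
        rows[(i3, i2, i1, i0)] = {a, b}

    segments = []  # list of (rows_in_segment, config_bits_in_segment)
    for row, bits in rows.items():
        merged_rows, merged_bits = {row}, set(bits)
        untouched = []
        for seg_rows, seg_bits in segments:
            if seg_bits & bits:
                # This segment shares a configuration bit with the new row.
                merged_rows |= seg_rows
                merged_bits |= seg_bits
            else:
                untouched.append((seg_rows, seg_bits))
        segments = untouched + [(merged_rows, merged_bits)]
    return segments


if __name__ == "__main__":
    for seg_rows, seg_bits in find_segments():
        print(sorted(seg_rows), sorted(seg_bits))
```

Under these assumptions the table splits into four segments of four rows each, one per value of (i3, i2), and each segment depends on its own four configuration bits.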

Logic Matching (Example)

• Step3: Alphabet library of the cell

8-bit alphabets of the configuration mask dictionary (one column per configuration):

LUT1  LUT2  Cin    Col 1  Col 2  Col 3  Col 4  Col 5  …
A0    B0    0      0      0      0      0      0      …
A0    B1    0      0      0      0      0      0      …
A1    B0    0      0      0      0      0      0      …
A1    B1    0      0      0      0      0      0      …
A0    B0    1      0      1      0      1      1      …
A0    B1    1      0      1      0      1      0      …
A1    B0    1      0      0      1      1      1      …
A1    B1    1      0      0      1      1      0      …

Col 1: A0 = 0, A1 = 0, B0 = 0, B1 = 0
Col 2: A0 = 1, A1 = 0, B0 = 0, B1 = 0
Col 3: A0 = 0, A1 = 1, B0 = 0, B1 = 0
Col 4: A0 = 1, A1 = 1, B0 = 0, B1 = 0
Col 5: A0 = 0, A1 = 0, B0 = 1, B1 = 0
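The library for one segment can be sketched in Python. The slide does not say which function the 8-bit alphabet encodes; the visible entries are consistent with the adder's carry-out, i.e. the majority of A, B and Cin, so the sketch assumes that. All names are hypothetical.

```python
from itertools import product

# Row order as on the slide: (A0,B0), (A0,B1), (A1,B0), (A1,B1) for Cin = 0, then Cin = 1.
ROWS = [(a_sel, b_sel, cin) for cin in (0, 1) for a_sel in (0, 1) for b_sel in (0, 1)]


def alphabet(a0, a1, b0, b1):
    """8-bit alphabet of one segment for a given LUT configuration."""
    lut1, lut2 = (a0, a1), (b0, b1)
    bits = []
    for a_sel, b_sel, cin in ROWS:
        a, b = lut1[a_sel], lut2[b_sel]
        # Assumption: the alphabet is the carry-out, majority(A, B, Cin).
        bits.append((a & b) | (a & cin) | (b & cin))
    return tuple(bits)


def build_library():
    """Map each alphabet to the configurations (A0, A1, B0, B1) that produce it."""
    library = {}
    # A0 varies fastest, matching the column order shown on the slide.
    for b1, b0, a1, a0 in product((0, 1), repeat=4):
        library.setdefault(alphabet(a0, a1, b0, b1), []).append((a0, a1, b0, b1))
    return library


if __name__ == "__main__":
    for word, configs in build_library().items():
        print("".join(map(str, word)), configs)
```

Because the carry-out is symmetric in A and B, several configurations can share one alphabet, which is why each library entry holds a list of configurations.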

Logic Matching (Example)

• Step4: Mask segmented matching

[Figure: the mask is divided into four 8-bit segments, each of which is matched against the Library.]
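Segmented matching can be sketched as follows: split the mask into equal segments and look each one up in the library, failing as soon as a segment has no entry. The function names, the most-significant-first segment order, and the toy library contents are assumptions made for illustration only.

```python
def split_mask(mask, n_segments=4, width=32):
    """Split a width-bit mask into n_segments equal parts, most significant first."""
    seg_bits = width // n_segments
    low = (1 << seg_bits) - 1
    return [(mask >> (seg_bits * k)) & low for k in reversed(range(n_segments))]


def match_mask(mask, library, n_segments=4, width=32):
    """Return one library entry per segment, or None if any segment is unknown."""
    matched = []
    for segment in split_mask(mask, n_segments, width):
        if segment not in library:
            return None  # this function cannot be mapped onto the macro cell
        matched.append(library[segment])
    return matched


if __name__ == "__main__":
    # Toy library: 8-bit alphabet -> made-up configuration label.
    toy_library = {0x00: "cfg0", 0x0C: "cfg1", 0x03: "cfg2", 0x0F: "cfg3"}
    print(match_mask(0x0C030F00, toy_library))
    print(match_mask(0xFF000000, toy_library))
```

With the toy library, the first mask matches in all four segments, while the second fails on its first segment and is rejected.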

How much we gain?

• Assume that the mask is 32-bit

– N segments

– M patterns in each segment

– Our library size = $\frac{32}{N} \cdot M \cdot N$ bits

– Number of all configurations = $32 \cdot M^N$

Orders of magnitude less memory

Orders of magnitude fewer comparisons
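A quick calculation illustrates the scale of the saving. The formulas in the transcript are garbled, so this sketch assumes one reading: a segmented library stores M patterns of 32/N bits for each of N segments, while an exhaustive library stores all M^N combinations as full 32-bit masks. The comparison counts (N·M versus M^N in the worst case) are likewise an assumption, and N = 4, M = 16 are illustrative values only.

```python
def library_bits(mask_bits, n, m):
    """Bits stored by a segmented library: N segments x M patterns x (mask_bits / N) bits."""
    return (mask_bits // n) * m * n


def exhaustive_bits(mask_bits, n, m):
    """Bits stored if every combination of segment patterns were kept as a full mask."""
    return mask_bits * m ** n


if __name__ == "__main__":
    mask_bits, n, m = 32, 4, 16  # illustrative values, not taken from the slides
    print("segmented library bits :", library_bits(mask_bits, n, m))
    print("exhaustive library bits:", exhaustive_bits(mask_bits, n, m))
    print("worst-case comparisons :", n * m, "vs", m ** n)
```

For these values the segmented library needs 512 bits against 2,097,152 bits for the exhaustive one, and at most 64 comparisons against 65,536.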

Chaining Heuristic

[Figure: example flow graphs running from Input to Output, with numbered nodes.]

We need to find chains of functions, which are mappable to the macrocell, to be placed on the carry chains
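The slides do not spell out the chaining heuristic, so the following is only a minimal greedy sketch of the general idea, not the authors' algorithm: repeatedly extract the longest path through chainable nodes that are still unused, and stop once the longest remaining path is shorter than a minimum chain length. The graph encoding and all names are hypothetical; the input must be a DAG.

```python
def longest_path(edges, allowed):
    """Longest path in a DAG that only visits nodes in `allowed`."""
    memo = {}

    def walk(node):
        if node not in memo:
            tails = [walk(s) for s in edges.get(node, ()) if s in allowed]
            memo[node] = [node] + max(tails, key=len, default=[])
        return memo[node]

    return max((walk(n) for n in sorted(allowed)), key=len, default=[])


def extract_chains(edges, chainable, min_length=4):
    """Greedily pull out disjoint chains of at least min_length chainable nodes."""
    free = set(chainable)
    chains = []
    while True:
        best = longest_path(edges, free)
        if len(best) < min_length:
            break
        chains.append(best)
        free -= set(best)  # each node may sit on at most one chain
    return chains


if __name__ == "__main__":
    # Toy flow graph: node -> list of successor nodes.
    edges = {"a": ["b"], "b": ["c", "x"], "c": ["d"], "d": ["e"], "x": ["y"]}
    chainable = {"a", "b", "c", "d", "e", "x", "y"}
    print(extract_chains(edges, chainable, min_length=4))
    print(extract_chains(edges, chainable, min_length=2))
```

In the toy graph, a threshold of 4 yields the single chain a, b, c, d, e; lowering it to 2 also picks up the leftover chain x, y. A greedy longest-first choice is simple but not guaranteed to maximize the number of chained nodes.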

Synthesis and Chaining Results

Benchmark   Chainable   Chained   Max Chain Length   Average Chain Length
alu4        74%         39%       4                  3.5
pdc         69%         35%       6                  3.9
misex3      68%         42%       4                  3.1
ex1010      71%         41%       5                  3.4
ex5p        72%         40%       4                  3.5
des*        65%         31%       3                  3.0
apex2       73%         42%       4                  3.6
apex4       75%         39%       4                  3.7
spla        72%         43%       6                  4.2
seq         69%         38%       4                  3.4
Average     70%         39%       4.4                3.5

* The minimum threshold for the chain length is 4, except for "des", where it is 3.

Experimental Methodology

Goal: Extract chains of eligible functions from the synthesized netlist and place them on the logic chains; functions that are not chained remain unchanged.

Flow:
• Quartus-II: Synthesis and LUT Mapping
• VQM Parser: DAG Generation
• Our Synthesis Engine: Logic Matching, then Chaining Heuristic
• Netlist Generation
• Quartus-II: Place and Route

Local Routing Wires

26% saving in the number of local wires

Total Wire Lengths

9% saving in total wire lengths

Delay

3% delay penalty due to the large input-to-output delay of the adder

Conclusion

Narrow the FPGA and ASIC Gaps

Lighten the stress on routing resources

Hardwired connections + Dedicated logic

Improved Routability with a Lighter Network


Thanks for your attention.

hadi.parandehafshar@epfl.ch