
Electrical and Computer Engineering

Muhammad Noman Ashraf

Optimization of Data-Flow Computations Using Canonical TED Representation

M. Ciesielski, D. Gomez-Prado, Q. Ren, J. Guillot and Emmanuel Boutillon, “Optimization of Data-Flow Computations Using Canonical TED Representation”, in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

ECE 667 Synthesis and Verification of Digital Systems, Spring 2011

Slides adapted from D. Gomez-Prado, Q. Ren, M. Ciesielski, J. Guillot and Emmanuel Boutillon, “Optimizing Data Flow Graphs to Minimize Hardware Implementations”, DATE (2009)


Overview

• Motivation
• TED Review
• Related Work
• TED Decomposition System
• TED Linearization
• Product Term Extraction
• Sum-Term Extraction
• Reordering
• DFG Generation
• Replacing constant multipliers by Shifters
• Conclusion
• References


Motivation

F = a⋅(f⋅(g + d⋅c) + c⋅e⋅g)
• Minimum number of operations: 5MPY, 2ADD
• Res: 2MPY, 1ADD
• L = 3MPY + 2ADD (schedule of 5 steps)

F = a⋅f⋅g + a⋅f⋅d⋅c + a⋅c⋅e⋅g
• Number of operations: 8MPY, 2ADD

F = (a⋅f)⋅(g + d⋅c) + (a⋅c)⋅e⋅g
• Number of operations: 6MPY, 2ADD
• Res: 2MPY, 1ADD
• L = 3MPY + 1ADD (schedule of 4 steps)
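
The operation counts quoted for the three forms of F can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: the tuple-based expression trees and the helper names `chain`, `mul`, `add`, and `count_ops` are assumptions made for illustration, and multi-operand products are assumed to be broken into chains of two-input multiplications.

```python
from functools import reduce


def chain(op, *operands):
    """Left-fold operands into a chain of two-input operations."""
    return reduce(lambda left, right: (op, left, right), operands)


def mul(*operands):
    return chain("*", *operands)


def add(*operands):
    return chain("+", *operands)


def count_ops(expr):
    """Return (MPY, ADD) for an expression tree; a variable is a plain string."""
    if isinstance(expr, str):
        return (0, 0)
    op, left, right = expr
    left_mpy, left_add = count_ops(left)
    right_mpy, right_add = count_ops(right)
    return (left_mpy + right_mpy + (op == "*"),
            left_add + right_add + (op == "+"))


# F = a*(f*(g + d*c) + c*e*g)
factored = mul("a", add(mul("f", add("g", mul("d", "c"))), mul("c", "e", "g")))
# F = a*f*g + a*f*d*c + a*c*e*g
expanded = add(mul("a", "f", "g"), mul("a", "f", "d", "c"), mul("a", "c", "e", "g"))
# F = (a*f)*(g + d*c) + (a*c)*e*g
partial = add(mul(mul("a", "f"), add("g", mul("d", "c"))),
              mul(mul("a", "c"), "e", "g"))

print(count_ops(factored))  # (5, 2)
print(count_ops(expanded))  # (8, 2)
print(count_ops(partial))   # (6, 2)
```

The sketch counts every node of the tree, so a subexpression that occurs twice is counted twice; sharing such nodes is the job of common subexpression elimination.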


TED Review [Construction]

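F = x⋅(z⋅u + q⋅w) + p⋅w² + y⋅w

[Figure: TED for F, built bottom-up from z⋅u and q⋅w, then (z⋅u + q⋅w), x⋅(z⋅u + q⋅w), p⋅w², and y⋅w, which are combined by additions.]

Canonical for the given order: x, z, u, q, p, y, w

Notation: NON-LINEAR (the w node has a ^2 edge)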
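
As a reminder of where the edge labels come from, a TED decomposes a function by its Taylor series around zero with respect to the top variable. The block below states that general expansion and applies it to the two terms of F that depend only on p, y, and w. It is a sketch based on the general TED definition rather than on anything printed on this slide, and the sub-function name f is an assumption introduced for the example.

```latex
f(w) = f(0) + w\,f'(0) + \tfrac{1}{2}\,w^{2} f''(0) + \dots

% Example: f(w) = p\,w^{2} + y\,w
f(0) = 0, \qquad f'(0) = y, \qquad \tfrac{1}{2} f''(0) = p
```

Under this reading the ^1 edge of the w node leads to y and the ^2 edge leads to p, which is what makes the diagram non-linear in w.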


RELATED WORK

HDL compilers
• High-level synthesis systems: Cyber, Spark, Catapult C
• Lack local optimality

Kernel-based decomposition [Hosangadi et al., “Optimizing Polynomial Expressions by algebraic factorization and CSE”, IEEE Transactions, 2005]
• Lacks canonicity

Cut-based decomposition (TED-based) [Askar et al., “Data-flow transformations using Taylor expansion diagrams,” in Proc. Des. Autom. Test Eur., 2007]
• Limitation: only applicable to TEDs with the disjoint decomposition property


Cut-based decomposition (Related Work)
• Top-down approach
• Apply a series of cuts (additive and multiplicative) to the edges so that the graph separates into two disjoint sub-graphs
• A different sequence of cuts results in a different DFG

Sequence: A3, A1, M1, A2


Sequence: A1, A3, M1, A2


TED decomposition [TDS]

Cut-based decomposition, mentioned earlier, only works for TEDs with the disjoint decomposition property
• Many TEDs don’t have this property

New approach: bottom-up
• Identify algebraic operations and extract them from the graph
• Also works for TEDs without the disjoint decomposition property
• TED-based factorization, CSE, and decomposition are jointly referred to as TED decomposition

Systematically involves
• Linearization
• Product-term extraction
• Sum-term extraction
• Reordering
• DFG generation


TDS System Overview

[Figure: block diagram of the TDS flow next to the HLS flow]
• Inputs: matrix transforms, polynomials; C, behavioral HDL
• HLS flow: DFG extraction, original DFG, High Level Synthesis (GAUT), RTL VHDL
• TDS flow: functional TED, structural elements, structural DFG, optimized DFG, TDS netlist
• TED-based transformations: TED linearization, variable ordering, TED factorization & decomposition, constant multiplication & shifter generation, common subexpression elimination (CSE)
• DFG-based transformations: static timing analysis, latency optimization, resource constraints
• Behavioral transformations, design objectives, design constraints


TED Linearization

TED naturally represents a polynomial in its factored form

This efficiency is missing when considering non-linear expressions
• F = a²c + abc: a could be factored out
• Split a² into a1 and a2, so that a² = a1⋅a2
• F = a1⋅(a2 + b)⋅c

TED Linearization [back to previous example]

• Split w² into w1 and w2

TED Linearization [Concept]

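[Figure: a node x with edges ^0, ^1, …, ^n leading to F0, F1, …, Fn is replaced by a chain of nodes x1, x2, …, xn that have only ^0 and ^1 edges and lead to F0, F1, …, Fn-1, Fn.]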

• Split x^k = x1⋅x2⋅x3⋯xk, where xi = xj for all i, j

• Iteratively perform splitting on high-order nodes

• The above substitution results in Horner form, which contains the minimum number of multiplications
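
For a single variable, the linearized chain corresponds to evaluating F0 + x⋅(F1 + x⋅(F2 + …)). The following Python sketch compares that Horner evaluation with a term-by-term evaluation and counts multiplications in each. It is an illustration only: the function names `horner` and `naive`, and the choice of constant coefficients in place of the sub-functions F0 … Fn, are assumptions not taken from the slides.

```python
def horner(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) in Horner form.

    Returns (value, number of multiplications).
    """
    value = coeffs[-1]
    mults = 0
    for coeff in reversed(coeffs[:-1]):
        value = value * x + coeff
        mults += 1
    return value, mults


def naive(coeffs, x):
    """Evaluate the same polynomial term by term, rebuilding each power of x."""
    value = coeffs[0]
    mults = 0
    for degree, coeff in enumerate(coeffs[1:], start=1):
        power = x
        for _ in range(degree - 1):
            power *= x
            mults += 1
        value += coeff * power
        mults += 1
    return value, mults


coeffs = [5, 3, 0, 2]       # 5 + 3x + 2x^3
print(horner(coeffs, 2))    # (27, 3)
print(naive(coeffs, 2))     # (27, 6)
```

A polynomial of degree n needs n multiplications in Horner form, which is the saving the linearized TED exposes structurally.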


Product Term Extraction

Extractable product term: a product of variables which appear in the expression only once
• Can be extracted from the TED without duplicating any of its variables

Set of nodes connected by a series of multiplicative edges only
• Starting and ending nodes can have incident additive edges
• Starting and ending nodes can have more than one incoming or outgoing multiplicative edge
• Ending node can be terminal node 1

[TDS] recursively identifies such terms by traversing the graph in a bottom-up fashion
• For each node, use a depth-first approach for including nodes in the product term

Product-Term Extraction [back to example]

[Figure: bottom-up traversal of the example TED from the node marked “start”; product terms P1 and P2 are marked, with z⋅u as an extracted product.]
• u has only one * parent: YES; u has only one child path: YES (CONTINUE)
• z has only one * parent: YES; z has only one * child path: NO (BACKTRACK)

Sum Term Extraction

Extractable sum term: a sum of variables which appear in the expression only once
• Can be extracted from the TED without duplicating any of its variables

“Set of nodes incident to multiplicative edges joined at a single common node, such that nodes in question are connected by a chain of additive edges only”

[TDS] recursively identifies such terms by traversing the graph in a bottom-up fashion
• For each node, make a list of incident nodes and extract the nodes from the list if they are connected by additive edges only

[TDS] uses the associativity property of addition

Sum-Term Extraction [back to example]

[Figure: bottom-up traversal of the example TED from the node marked “start”; sum term S1 is marked, with the note “Keep support (irreducible)”.]


Example to illustrate Associativity*

S1=b+d

S2=a+c

Sum-Term Extraction [cont. – back to example]

• If sum-term extraction results in more product terms, go back to product-term extraction
• Stop when the TED is irreducible
• Now generate the DFG (to be explained later)

Reordering [Back to previous example -> Iteration 2 extraction]

[Figure: reordered TED with terms S2, S3, P3, P4, and P5 marked.]

• Stop when the TED is irreducible

DFG Generation and Optimization

Transform each irreducible TED into a simple DFG
• Additive edge -> addition operation
• Multiplicative edge -> multiplication operation
• Break multiple-operand operations into a chain of operations

[TDS] maintains a hash table for DFG nodes, keyed by the corresponding function
• Helps in reusing the node if the same function/expression is found again
• Captures redundancy due to poor variable order during factorization

DFG is not unique
• Can be restructured and balanced to minimize cost
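
The node-reuse idea can be sketched with a small hash-consed graph. This is a minimal illustration under stated assumptions, not the TDS implementation: the class name `DFG` and its methods are hypothetical, and nodes are keyed by their structure (operator plus sorted operand ids) rather than by a canonical TED function.

```python
class DFG:
    """Data-flow graph whose nodes are shared through a hash table."""

    def __init__(self):
        self.table = {}   # key describing an expression -> node id
        self.nodes = []   # node id -> key

    def _intern(self, key):
        # Reuse the existing node if the same expression was built before.
        if key not in self.table:
            self.table[key] = len(self.nodes)
            self.nodes.append(key)
        return self.table[key]

    def var(self, name):
        return self._intern(("var", name))

    def op(self, symbol, *operands):
        # Break a multi-operand operation into a chain of two-input operations.
        result = operands[0]
        for operand in operands[1:]:
            # + and * are commutative, so sorting the ids normalizes the key.
            left, right = sorted((result, operand))
            result = self._intern((symbol, left, right))
        return result

    def num_ops(self):
        return sum(1 for key in self.nodes if key[0] != "var")


g = DFG()
a, b, c = (g.var(name) for name in "abc")
f1 = g.op("+", g.op("*", a, b), c)   # a*b + c
f2 = g.op("+", c, g.op("*", b, a))   # c + b*a, the same function
print(f1 == f2)      # True
print(g.num_ops())   # 2
```

Both expressions resolve to the same node, so the graph holds one multiplier and one adder instead of two of each.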


Data Flow Graph

[Figure: DFG scheduled in 4 steps]

Reordering cost
• Total: 5MPY, 3ADD
• L = 2MPY + 2ADD
• Req 3MPY, 2ADD

Reordering [-> Iteration 3 extraction]

[Figure: reordered TED with terms S2, S3, P3, and P4 marked; L = 2MPY + 2ADD, Req 3MPY, 2ADD]

Cost involves
• Reordering of variables
• Extraction
• DFG generation
• Annotating latency and resource requirements

Generating and evaluating new Data Flow Graph [Iteration 3]

F = S3 = P4 + P3 = w⋅S2 + x⋅P1 = w⋅(q + S1) + x⋅(z⋅u) = w⋅(q + P2 + y) + x⋅z⋅u = w⋅(q + p⋅w + y) + x⋅z⋅u

Total: 4MPY, 3ADD

[Figure: two schedules of the DFG for F, one with 4 steps (L = 2MPY + 2ADD) and one with 5 steps (L = 2MPY + 3ADD, Req 1MPY, 1ADD)]

Reordering cost
• L = 2MPY + 2ADD
• Req 2MPY, 1ADD

Previous cost
• L = 2MPY + 2ADD
• Req = 3MPY, 2ADD
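
The two latency figures for F = w⋅(q + p⋅w + y) + x⋅z⋅u can be reproduced by measuring the critical path of two DFG shapes, one with the additions chained and one with them balanced. The Python sketch below is an illustration under assumptions that are not in the slides: expressions are nested tuples, the helpers `chain` and `critical_path` are hypothetical, resources are unconstrained, and a multiplier is assumed to be twice as slow as an adder.

```python
from functools import reduce

MPY_DELAY, ADD_DELAY = 2, 1   # assumed relative delays, not taken from the slides


def chain(op, *operands):
    """Left-fold operands into a chain of two-input operations."""
    return reduce(lambda left, right: (op, left, right), operands)


def path_delay(path):
    mpy, add = path
    return MPY_DELAY * mpy + ADD_DELAY * add


def critical_path(expr):
    """Return (MPY, ADD) counts along the slowest input-to-output path."""
    if isinstance(expr, str):   # a variable contributes no delay
        return (0, 0)
    op, left, right = expr
    mpy, add = max(critical_path(left), critical_path(right), key=path_delay)
    return (mpy + 1, add) if op == "*" else (mpy, add + 1)


xzu = chain("*", "x", "z", "u")
pw = chain("*", "p", "w")

# w*((q + p*w) + y) + x*z*u: additions chained left to right
chained = chain("+", chain("*", "w", chain("+", "q", pw, "y")), xzu)
# w*((q + y) + p*w) + x*z*u: q + y is computed while p*w is still running
balanced = chain("+", chain("*", "w", chain("+", chain("+", "q", "y"), pw)), xzu)

print(critical_path(chained))    # (2, 3)
print(critical_path(balanced))   # (2, 2)
```

Balancing the sum shortens the path from 2MPY + 3ADD to 2MPY + 2ADD without changing the operation count, which illustrates why the DFG is restructured after extraction.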

Reordering [-> Iteration 4 extraction, DFG generation]

Design Space Exploration

• Through reordering, all cases can be obtained

Replacing constant multipliers*

By shifters
• Transform constant multiplications into shifters, while considering factorization involving shifters

Steps
• Represent the constant in CSD format: use shift variable L^i (instead of 2^i) for shifting i bits
• Generate the TED with shift variables, linearize it, and perform decomposition
• Replace terms involving shift variables (L^i) by i-bit shifters

Example
• 7a + 6b
• (L³ − 1)⋅a + (L³ − L)⋅b
• L³⋅(a + b) − L⋅b − a
• ((a + b) << 3) − (a + (b << 1))
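
The 7a + 6b example can be checked with a short Python sketch. It derives canonical signed digit (CSD) digits for a constant, multiplies by shifting and adding or subtracting, and compares the result with the factored shifter form from the slide. The function names `csd_digits`, `times_constant`, and `seven_a_plus_six_b` are assumptions made for this illustration, and the sketch works on plain integers rather than on a TED.

```python
def csd_digits(n):
    """Return the CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0, or 1, and no two adjacent digits are non-zero.
    """
    digits = []
    while n:
        if n & 1:
            digit = 2 - (n & 3)   # +1 when n % 4 == 1, -1 when n % 4 == 3
            n -= digit
        else:
            digit = 0
        digits.append(digit)
        n >>= 1
    return digits


def times_constant(x, n):
    """Multiply x by the constant n using only shifts, additions, and subtractions."""
    return sum(digit * (x << shift) for shift, digit in enumerate(csd_digits(n)))


def seven_a_plus_six_b(a, b):
    """Factored shifter form from the slide: ((a + b) << 3) - (a + (b << 1))."""
    return ((a + b) << 3) - (a + (b << 1))


print(csd_digits(7))               # [-1, 0, 0, 1]  i.e. 7 = 2^3 - 1
print(csd_digits(6))               # [0, -1, 0, 1]  i.e. 6 = 2^3 - 2
print(times_constant(5, 7))        # 35
print(seven_a_plus_six_b(3, 5))    # 51
```

Multiplying a and b separately takes two shifts of 3 bits; factoring L³ out of both terms leaves a single 3-bit shifter, which is the saving the decomposition looks for.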


TDS – TED Decomposition System RECAP
• Read in the CDFG file (cdfg) or polynomial expression (poly), or use pre-coded DSP transforms (tr)
• Translate into functional TED (dfg2ted) and structural elements (comparators etc.)
• Linearize its data path (linearize)
• Iterate
  • Iterate
    • Product term extraction
    • Sum term extraction
  • Reorder to minimize latency (reorder)
• Set of irreducible TEDs
• Produce final DFG (ted2dfg) and annotate back the CDFG file (write)
• Data flow and computation intensive designs - DSP

Design Space Exploration


Conclusion

Results in the paper show a 15% latency improvement and a 7% area reduction when using the DFG generated by TDS instead of KBD
• Far better results when compared to the original DFG

TDS – front end to GAUT

Fundamental limitation – decomposition dependent upon variable reordering which is an expensive operation


REFERENCES

M. Ciesielski, D. Gomez-Prado, Q. Ren, J. Guillot and Emmanuel Boutillon, “Optimization of Data-Flow Computations Using Canonical TED Representation”, in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

M. Ciesielski, S. Askar, D. Gomez-Prado, J. Guillot, and E. Boutillon, “Data-flow transformations using Taylor expansion diagrams,” in Proc. Des. Autom. Test Eur., 2007, pp. 455–460

TDS – TED-Based Dataflow Decomposition System, Univ. Massachusetts, Amherst, MA. [Online]. Available: http://www.ecs.umass.edu/ece/labs/vlsicad/tds.html


QUESTIONS?


Experiment Setup*

[Figure: the TDS and HLS flows from the TDS System Overview slide, with three DFGs labeled KBD, ORIGINAL, and TED]

Results*


Results: Quintic Spline*


Results: Quartic spline*


Improvement over KBD and Original*
