
HW/SW Synthesis

Date post: 31-Jan-2016
Upload: kiri
Transcript
Page 1: HW/SW Synthesis

HW/SW Synthesis

Page 2: HW/SW Synthesis

Outline

• Synthesis
• CFSM Optimization
• Software synthesis
  – Problem
  – Task synthesis
  – Performance analysis
  – Task scheduling
  – Compilation

Page 3: HW/SW Synthesis

POLIS Methodology

[Figure: POLIS design flow. Entry languages (Graphical EFSM, Esterel, Java, EC, ...) pass through compilers to CFSMs. The flow then covers formal verification, partitioning, SW estimation and HW estimation, HW/SW co-simulation with performance/trade-off evaluation, SW synthesis (SW code + RTOS), HW synthesis (logic netlist), and physical prototyping.]

Aptix board consists of:
– micro of choice
– FPGAs
– FPICs

Page 4: HW/SW Synthesis

Hardware-Software Architecture

• Hardware (currently):
  – Programmable processors (micro-controllers, DSPs)
  – ASICs (FPGAs)
• Software:
  – Set of concurrent tasks
  – Customized Real-Time Operating System
• Interfaces:
  – Hardware modules
  – Software procedures (polling, interrupt handlers, ...)

Page 5: HW/SW Synthesis

System Partitioning

[Figure: seven CFSMs (CFSM1 to CFSM7) communicating through events (e1, e2, e3, e5, e7, e8, e9) and ports (port1 to port3, port5 to port7), grouped into HW partition 1, HW partition 2 and SW partition 3, together with a Scheduler.]

Page 6: HW/SW Synthesis

Software Synthesis

• Two-level process
  – “Technology” (processor) independent: best decision/assignment sequence given CFSM
  – “Technology” (processor) dependent: conversion into machine code (currently left to compiler)
    – instruction selection
    – instruction scheduling
    – register assignment
• Need performance and cost analysis
  – Worst Case Execution Time
  – code and data size

Page 7: HW/SW Synthesis

Software Synthesis

• Technology-independent phase:
  – Construction of Control-Data Flow Graph from CFSM (based on BDD representation of Transition Function)
  – Optimization of CDFG for execution speed and code size (based on BDD sifting algorithm)
• Technology-dependent phase:
  – Creation of (restricted) C code
  – Cost and performance analysis
  – Compilation

Page 8: HW/SW Synthesis

Software Implementation Problem

• Input:
  – Set of tasks (specified by CFSMs)
  – Set of timing constraints (e.g., input event rates and response constraints)
• Output:
  – Set of procedures that implement the tasks
  – Scheduler that satisfies the timing constraints
• Minimizing:
  – CPU cost
  – Memory size
  – Power, etc.

Page 9: HW/SW Synthesis

Software Implementation

How to do it?

• Traditional approach:
  – Hand-coding of procedures
  – Hand-estimation of timing input to scheduling algorithms
  – Long and error-prone
• Our approach: three-step automated procedure:
  – Synthesize each task separately
  – Extract (estimated) timing
  – Schedule the tasks
• Customized RT-OS (scheduler + drivers)

Page 10: HW/SW Synthesis

Software Implementation

• Current strategy:
  – Iterate between synthesis, estimation and scheduling
  – Designer chooses the scheduling algorithm
• Future work:
  – Top-down propagation of timing constraints
  – Software synthesis under constraints
  – Automated scheduling selection (based on CPU utilization estimates)

Page 11: HW/SW Synthesis

Software Synthesis Procedure

[Flowchart: Specification, partitioning → S-graph synthesis → Timing estimation → Scheduling, validation (feasible: continue; not feasible: iterate) → Code generation → Compilation → Testing, validation (pass: continue; fail: iterate) → Production]

Page 12: HW/SW Synthesis

Task Implementation

• Goal: quick response time, within timing and size constraints
• Problem statement:
  – Given a CFSM transition function and constraints
  – Find a procedure implementing the transition function while meeting the constraints
• The procedure code is acyclic:
  – Powerful optimization and analysis techniques
  – Looping, state storage etc. are implemented outside (in the OS)

Page 13: HW/SW Synthesis

SW Modeling Issues

• The software model should be:
  – Low-level enough to allow detailed optimization and estimation
  – High-level enough to avoid excessive details, e.g. register allocation, instruction selection
• Main types of “user-mode” instructions:
  – Data movement
  – ALU
  – Conditional/unconditional branches
  – Subroutine calls
• RTOS handles I/O, interrupts and so on

Page 14: HW/SW Synthesis

SW Modeling Issues

• Focus on control-dominated applications
  – Address only CFSM control structure optimization
  – Data path left as “don’t touch”
• Use Decision Diagrams (Bryant ’86)
  – Appropriate for control-dominated tasks
  – Well-developed set of optimization techniques
  – Augmented with arithmetic and Boolean operators, to perform data computations

Page 15: HW/SW Synthesis

ROBDDs

• Reduced Ordered BDDs [Bryant 86]
• A node represents a function given by the Shannon decomposition $f = x\,f_x + x'\,f_{x'}$
• A variable appears at most once on any path from root to terminal
• Variables are ordered
• No two vertices represent the same function
• Canonical
• Two functions are equal if and only if their BDDs are isomorphic ⇒ direct application in equivalence checking

[Figure: ROBDD of f = x1 + x2 x3, with nodes x1, x2, x3 and terminals 1 and 0]

Page 16: HW/SW Synthesis

ROBDDs and Combinational Verification

• Given two circuits:
  – Build the ROBDDs of the outputs in terms of the primary inputs
  – Two circuits are equivalent if and only if the ROBDDs are isomorphic
• Complexity of verification depends on the size of ROBDDs
  – Compact in many cases
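The equivalence check can be illustrated with a minimal sketch. It is not the POLIS implementation: the function `robdd` and the three example circuits are hypothetical, and the diagram is built by brute-force Shannon expansion as nested tuples (so shared subgraphs are repeated), which is only practical for a handful of inputs. Under a fixed variable order the reduced form is canonical, so structural equality of the tuples stands in for isomorphism.

```python
def robdd(f, order, env=None):
    """Canonical reduced decision diagram of f as nested tuples (var, low, high)."""
    env = env or {}
    if len(env) == len(order):          # all inputs assigned: terminal node
        return bool(f(**env))
    var = order[len(env)]
    low = robdd(f, order, {**env, var: False})
    high = robdd(f, order, {**env, var: True})
    # Reduction rule: a test whose two branches agree is redundant.
    return low if low == high else (var, low, high)

# Hypothetical example circuits over primary inputs a, b, c.
def circuit_a(a, b, c):
    return (a and b) or (a and c)

def circuit_b(a, b, c):
    return a and (b or c)

def circuit_c(a, b, c):
    return a or (b and c)

order = ["a", "b", "c"]
print(robdd(circuit_a, order) == robdd(circuit_b, order))  # equivalent circuits
print(robdd(circuit_a, order) == robdd(circuit_c, order))  # different functions
```

Both circuits must be built with the same variable order for the comparison to be meaningful.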

Page 17: HW/SW Synthesis

ROBDDs and Memory Explosion

• ROBDDs are not always compact
  – Size of an ROBDD can be exponential in number of variables
  – Can happen for real life circuits also, e.g. multipliers
• Commonly known as: Memory Explosion Problem of ROBDDs

Page 18: HW/SW Synthesis

Technique for Handling ROBDD Memory Explosion

[Figure: enhancements to ROBDDs]
– Variable ordering
– Relax ordering: Free BDDs
– Node decomposition: OFDDs, OKFDDs
– Partitioning: Partitioned ROBDDs

All the representations are canonical ⇒ combinational equivalence checking

Page 19: HW/SW Synthesis

Handling Memory Explosion: Variable Ordering

• BDD size very sensitive to variable ordering

[Figure: two ROBDDs of a1 b1 + a2 b2 + a3 b3]
– Good ordering (a1, b1, a2, b2, a3, b3): 8 nodes
– Bad ordering (a1, a2, a3, b1, b2, b3): 16 nodes
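The node counts on this slide can be reproduced with a small sketch. It assumes a brute-force construction (enumerate all assignments, then share identical nodes through a unique table), which is fine for six variables but is not how a real BDD package works; the name `robdd_size` is hypothetical. Terminal nodes are included in the count.

```python
def robdd_size(f, order):
    """Number of ROBDD nodes (internal plus terminals) of f under a variable order."""
    unique = {}        # (var, low, high) -> node id, so equal nodes are shared
    terminals = set()  # terminal values that are actually reached

    def build(level, env):
        if level == len(order):
            value = int(bool(f(env)))
            terminals.add(value)
            return value                 # terminals use ids 0 and 1
        var = order[level]
        low = build(level + 1, {**env, var: 0})
        high = build(level + 1, {**env, var: 1})
        if low == high:                  # redundant test: skip the node
            return low
        return unique.setdefault((var, low, high), len(unique) + 2)

    build(0, {})
    return len(unique) + len(terminals)

def f(env):
    return ((env["a1"] and env["b1"]) or (env["a2"] and env["b2"])
            or (env["a3"] and env["b3"]))

good = ["a1", "b1", "a2", "b2", "a3", "b3"]
bad = ["a1", "a2", "a3", "b1", "b2", "b3"]
print(robdd_size(f, good), robdd_size(f, bad))  # 8 16
```

For n product terms the interleaved order grows linearly, while the order with all a's first grows exponentially in n.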

Page 20: HW/SW Synthesis

Handling Memory Explosion: Variable Ordering

• Good static as well as dynamic ordering techniques exist
• Dynamic variable reordering [Rudell 93]
  – Change variable order automatically during computations
  – Repeatedly swap a variable with adjacent variable
  – Swapping can be done locally
  – Select the best location

[Figure: ROBDDs of a1 b1 + a2 b2 under two variable orders, illustrating a swap of adjacent variables]
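The idea of moving each variable to its best location can be sketched as follows. This is a simplified illustration, not Rudell's algorithm as implemented in BDD packages: instead of local adjacent swaps on a shared diagram, it rebuilds the whole diagram for every candidate position, which is far slower but shows the search. The names `robdd_size` and `sift` are hypothetical.

```python
def robdd_size(f, order):
    """Brute-force ROBDD node count (internal plus terminals) under an order."""
    unique, terminals = {}, set()

    def build(level, env):
        if level == len(order):
            value = int(bool(f(env)))
            terminals.add(value)
            return value
        var = order[level]
        low = build(level + 1, {**env, var: 0})
        high = build(level + 1, {**env, var: 1})
        if low == high:
            return low
        return unique.setdefault((var, low, high), len(unique) + 2)

    build(0, {})
    return len(unique) + len(terminals)

def sift(f, order):
    """One pass: try every position for each variable, keep the smallest result."""
    order = list(order)
    for var in list(order):
        rest = [v for v in order if v != var]
        # Candidate orders include the current one, so size never increases.
        candidates = [rest[:i] + [var] + rest[i:] for i in range(len(order))]
        order = min(candidates, key=lambda o: robdd_size(f, o))
    return order

def f(env):
    return ((env["a1"] and env["b1"]) or (env["a2"] and env["b2"])
            or (env["a3"] and env["b3"]))

start = ["a1", "a2", "a3", "b1", "b2", "b3"]
result = sift(f, start)
print(result, robdd_size(f, start), robdd_size(f, result))
```

Sifting is a greedy heuristic: each step is locally optimal for one variable, so the final order is not guaranteed to be the global optimum.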

Page 21: HW/SW Synthesis

SW Model: S-graphs

• Acyclic extended decision diagram computing a transition function
• S-graph structure:
  – Directed acyclic graph
  – Set of finite-valued variables
  – TEST nodes evaluate an expression and branch accordingly
  – ASSIGN nodes evaluate an expression and assign its result to a variable
• Basic block + branch is a general CDFG model (but we constrain it to be acyclic for optimization)

Page 22: HW/SW Synthesis

An Example of S-graph

– input event c
– output event y
– state int a
– input int b
– forever
    if (detect(c))
      if (a < b)
        a := a + 1
        emit(y)
      else
        a := 0
        emit(y)

[Figure: s-graph from BEGIN to END with TEST nodes detect(c) and a < b (branches labeled T and F), ASSIGN nodes a := a + 1 and a := 0, and a shared emit(y) node]
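One traversal of this s-graph from BEGIN to END can be written as a plain function. The sketch below is an illustration only: the name `transition`, the boolean `c_present` standing for detect(c), and the returned flag standing for emit(y) are assumptions, and the surrounding loop plays the role of the "forever" that the RTOS would provide.

```python
def transition(c_present, a, b):
    """One acyclic CFSM transition; returns (new value of a, whether y was emitted)."""
    if not c_present:        # TEST detect(c): F branch goes straight to END
        return a, False
    if a < b:                # TEST a < b
        a = a + 1            # ASSIGN a := a + 1
    else:
        a = 0                # ASSIGN a := 0
    return a, True           # emit(y), shared by both branches

# Hypothetical input trace: (c present?, value of b) for successive activations.
a = 0
trace = []
for c_present, b in [(True, 2), (True, 2), (False, 2), (True, 2)]:
    a, emitted = transition(c_present, a, b)
    trace.append((a, emitted))
print(trace)  # [(1, True), (2, True), (2, False), (0, True)]
```

The state variable `a` lives outside the function, matching the idea that looping and state storage are kept out of the acyclic procedure.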

Page 23: HW/SW Synthesis

S-graphs and Functions

• Execution of an s-graph computes a function from a set of input and state variables to a set of output and state variables:
  – Output variables are initially undefined
  – Traverse the s-graph from BEGIN to END
• Well-formed s-graph:
  – Every time a function depending on a variable is evaluated, that variable has a defined value
• How do we derive an s-graph implementing a given function?

Page 24: HW/SW Synthesis

S-graphs and Functions

• Problem statement:
  – Given: a finite-valued multi-output function over a set of finite-valued variables
  – Find: an s-graph implementing it
• Procedure based on Shannon expansion: $f = x\,f_x + x'\,f_{x'}$
• Result heavily depends on ordering of variables in expansion
  – Inputs before outputs: TESTs dominate over ASSIGNs
  – Outputs before inputs: ASSIGNs dominate over TESTs

Page 25: HW/SW Synthesis

Example of S-graph Construction

x = a b + c
y = a b + d

Order: a, b, c, d, x, y (inputs before outputs)

[Figure: s-graph with TEST nodes a, b, c and d (d appears twice) leading to four ASSIGN leaves: x := 1, y := 1; x := 1, y := 0; x := 0, y := 1; x := 0, y := 0]

Page 26: HW/SW Synthesis

Example of S-graph Construction

x = a b + c
y = a b + d

Order: a, b, x, y, c, d (interleaving inputs and outputs)

[Figure: s-graph with TEST nodes a and b; when both are 1, ASSIGN x := 1, y := 1; otherwise ASSIGN x := c, y := d]

Page 27: HW/SW Synthesis

S-graph Optimization

• General trade-off:
  – TEST-based is faster than ASSIGN-based (each variable is visited at most once)
  – ASSIGN-based is smaller than TEST-based (there is more potential for sharing)
• Implemented as constrained sifting of the Transition Function BDD
• The procedure can be iterated over s-graph fragments:
  – Local optimization, depending on fragment criticality (speed versus size)
  – Constraint-driven optimization (still to be explored)

Page 28: HW/SW Synthesis

From S-graphs to Instructions

• TEST nodes → conditional branches
• ASSIGN nodes → ALU ops and data moves
• No loops in a single CFSM transition (user loops handled at the RTOS level)
• Data flow handling:
  – “Don’t touch” them (except common sub-expression extraction)
  – Map expression DAGs to C expressions
  – C compiler allocates registers and selects op-codes
• Need source-level debugging environment (with any of the chosen entry languages)
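The mapping from TEST and ASSIGN nodes to C statements can be sketched with a tiny emitter. This is not the POLIS code generator: the tuple encoding of nodes and the function `emit_c` are hypothetical, and shared nodes are simply printed once per path (a real generator would need labels or similar to keep sharing).

```python
def emit_c(node, indent=1):
    """Return C source lines for an s-graph given as nested tuples.

    ("test", expr, on_true, on_false) becomes an if/else,
    ("assign", var, expr, next) becomes an assignment, None is END.
    """
    pad = "    " * indent
    if node is None:
        return [pad + "return;"]
    if node[0] == "assign":
        _, var, expr, nxt = node
        return [f"{pad}{var} = {expr};"] + emit_c(nxt, indent)
    _, expr, on_true, on_false = node
    return ([f"{pad}if ({expr}) {{"] + emit_c(on_true, indent + 1)
            + [f"{pad}}} else {{"] + emit_c(on_false, indent + 1)
            + [f"{pad}}}"])

# S-graph for x = a b + c, y = a b + d with order a, b, x, y, c, d.
both = ("assign", "x", "1", ("assign", "y", "1", None))
copy = ("assign", "x", "c", ("assign", "y", "d", None))
sgraph = ("test", "a", ("test", "b", both, copy), copy)

lines = emit_c(sgraph)
print("\n".join(lines))
```

Expressions are passed through as strings, in the spirit of leaving the data path to the C compiler.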

Page 29: HW/SW Synthesis

Software Synthesis Procedure

[Flowchart: Specification, partitioning → S-graph synthesis → Timing estimation → Scheduling, validation (feasible: continue; not feasible: iterate) → Code generation → Compilation → Testing, validation (pass: continue; fail: iterate) → Production]

Page 30: HW/SW Synthesis

POLIS: S-graph Level Estimation

[Figure: SW synthesis flow. CFSM → S-graph synthesis and optimization → S-graph → Code generation → SW code. Estimation works at the S-graph level and provides timing / code size information.]

Page 31: HW/SW Synthesis

Problems in Software Performance Estimation

• How to link behavior to assembly code?
  -> Model C code generated from S-graph and use a set of cost parameters
• How to handle the variety of compilers and CPUs?

Page 32: HW/SW Synthesis

Software Model

Generated C code:

    func(E)
    event E;
    {
        static int st;
        Initialization of local variables;
        Structure of mixed if or switch statements and assign statements;
        return;
    }

Time T and Size S:

$T = T_{pp} + k\,T_{init} + T_{struct}$
$S = S_{pp} + k\,S_{init} + S_{struct}$

[Figure annotations: the parts of the generated code are labeled with their time and size terms: (Tpp, Spp), (Tinit, Sinit) and (Tstruct, Sstruct)]

Page 33: HW/SW Synthesis

Execution Time of a Path and the Code Size

• Property: Form of each statement is determined by type of corresponding node.
• $T_{struct} = \sum_i p_i\,C_t(\mathrm{node\_type\_of}(i), \mathrm{variable\_type\_of}(i))$
  – $p_i$: takes value 1 if node i is on a path, otherwise 0.
  – $C_t(n,v)$: execution time for node type n and variable type v.
• $S_{struct} = \sum_i C_s(\mathrm{node\_type\_of}(i), \mathrm{variable\_type\_of}(i))$
  – $C_s(n,v)$: code size for node type n and variable type v.

[Figure: path on S-graph]
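The time and size model can be worked through on a toy example. Everything numeric below is made up for illustration: the cost tables `CT` and `CS`, the node list, and the pre/post and initialization costs are hypothetical, and k is assumed here to be the number of local variables that are initialized.

```python
# Hypothetical cost tables: (node type, variable type) -> cost.
CT = {("test", "event"): 12, ("test", "int"): 10, ("assign", "int"): 6}
CS = {("test", "event"): 9, ("test", "int"): 8, ("assign", "int"): 5}

def estimate(nodes, path, k, t_pp, t_init, s_pp, s_init):
    """Return (T, S) for one path through an s-graph.

    nodes maps node id -> (node type, variable type); path lists the ids on
    the executed path (p_i = 1 for these nodes, 0 for all others).
    """
    t_struct = sum(CT[nodes[i]] for i in path)         # only nodes on the path
    s_struct = sum(CS[kind] for kind in nodes.values())  # every node takes space
    return t_pp + k * t_init + t_struct, s_pp + k * s_init + s_struct

nodes = {
    0: ("test", "event"),
    1: ("test", "int"),
    2: ("assign", "int"),
    3: ("assign", "int"),
}
T, S = estimate(nodes, path=[0, 1, 2], k=1, t_pp=20, t_init=4, s_pp=15, s_init=3)
print(T, S)  # 52 45
```

Time depends on which path is taken, while size sums over all nodes regardless of the path.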

Page 34: HW/SW Synthesis

Cost Parameters

* Pre-calculated cost parameters for:
(1) $C_t(n,v)$, $C_s(n,v)$: execution time and code size for node type n and variable type v.
(2) $T_{pp}$, $S_{pp}$: pre- and post-execution time and code size.
(3) $T_{init}$, $S_{init}$: execution time and code size for local variable initialization.

Page 35: HW/SW Synthesis

Problems in Software Performance Estimation

• How to link behavior to assembly code?
• How to handle the variety of compilers and CPUs?
  -> Prepare cost parameters for each target

Page 36: HW/SW Synthesis

Extraction of Cost Parameters

[Figure: a set of benchmark programs is compiled with the target C compiler; a static analyzer, or execution and profiling, feeds the parameter extractor, which produces the cost parameters]

Page 37: HW/SW Synthesis

Algorithm

• Preprocess: extracting set of cost parameters.
• Weighting nodes and edges in given S-graph with cost parameters.
• Traversing weighted S-graph.
  – Finding maximum cost path and minimum cost path using Depth-First Search on S-graph.
  – Accumulating 'size' costs on all nodes.

Page 38: HW/SW Synthesis

S-graph Level Estimation: Algorithm

Cost C is a triple (min_time, max_time, code_size)

Algorithm: SGtrace(sg_i)
    if (sg_i == NULL) return (C(0, ,0));
    if (sg_i has been visited)
        return (pre-calculated C_i(*,*,0) associated with sg_i);
    C_i = initialize (max_time = 0, min_time = ∞, code_size = 0);
    for each child sg_j of sg_i {
        C_ij = SGtrace(sg_j) + edge cost for edge e_ij;
        C_i.max_time = max(C_i.max_time, C_ij.max_time);
        C_i.min_time = min(C_i.min_time, C_ij.min_time);
        C_i.code_size += C_ij.code_size;
    }
    C_i += node cost for node sg_i;
    return (C_i);
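A runnable sketch of the SGtrace traversal is given here under a few assumptions that the slide does not spell out: the s-graph is a dictionary from node to children, a node without children is treated as END with zero remaining time, and all cost numbers are invented. A revisited node returns its cached times with code size 0, so shared nodes are counted once for size.

```python
import math

def sg_trace(node, children, node_cost, edge_time, memo=None):
    """Return (min_time, max_time, code_size) for the s-graph rooted at node."""
    memo = {} if memo is None else memo
    if node in memo:                       # already visited: do not recount size
        lo, hi, _ = memo[node]
        return lo, hi, 0
    lo, hi, size = math.inf, 0, 0
    for child in children.get(node, []):
        c_lo, c_hi, c_size = sg_trace(child, children, node_cost, edge_time, memo)
        t = edge_time.get((node, child), 0)
        lo, hi, size = min(lo, c_lo + t), max(hi, c_hi + t), size + c_size
    if lo == math.inf:                     # no children: END of the s-graph
        lo = 0
    time, code = node_cost.get(node, (0, 0))
    memo[node] = (lo + time, hi + time, size + code)
    return memo[node]

# Hypothetical weights for the detect(c) / a < b example; costs are (time, size).
children = {"detect": ["lt", "END"], "lt": ["inc", "zero"],
            "inc": ["emit"], "zero": ["emit"], "emit": ["END"]}
node_cost = {"detect": (5, 4), "lt": (4, 3), "inc": (3, 2),
             "zero": (2, 2), "emit": (6, 5)}
edge_time = {("detect", "lt"): 2, ("detect", "END"): 1}

print(sg_trace("detect", children, node_cost, edge_time))  # (6, 20, 16)
```

Because results are memoized per node, each node is expanded once, so the traversal is linear in the number of nodes and edges.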

Page 39: HW/SW Synthesis

Experiments

* Proposed methods implemented and examined in POLIS system.
* Target CPU and compiler: M68HC11 and Introl C compiler.
* Difference D is defined as

$D = \dfrac{cost_{estimated} - cost_{measured}}{cost_{measured}}$
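The difference metric is a relative error with respect to the measured cost. A minimal sketch, assuming D is reported as a percentage (the helper name `percent_difference` is hypothetical), applied to the FRC min and max figures from the experimental results:

```python
def percent_difference(estimated, measured):
    """D = (estimated - measured) / measured, expressed in percent."""
    return 100.0 * (estimated - measured) / measured

# FRC model, estimated vs. measured values from the results table.
print(round(percent_difference(158, 141), 2))  # 12.06  (min)
print(round(percent_difference(469, 496), 2))  # -5.44  (max)
```

A positive D means the estimate is above the measurement, a negative D means it is below.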

Page 40: HW/SW Synthesis

Experimental Results: S-graph Level

model         cost   estimated   measured   % difference
FRC           min          158        141          12.06
              max          469        496          -5.44
              size         654        690          -5.22
TIMER         min          223        191          16.75
              max          938        912           2.85
              size       1,573      1,436           9.54
ODOMETER      min          145        131          10.69
              max          361        363          -0.55
              size         454        457          -0.66
SPEEDOMETER   min          314        335          -6.27
              max          880        969          -9.18
              size         764        838          -8.83
BELT          min          119        111           7.21
              max          322        323          -0.31
              size         511        520          -1.73
FUEL          min          197        171          15.20
              max          533        586          -9.04
              size         637        647          -1.55
CROSSDISP     min          262        221          18.55
              max       16,289     16,979          -4.06
              size      32,592     38,618         -15.60

Page 41: HW/SW Synthesis

Performance and Cost Estimation: Summary

• S-graph: low-level enough to allow accurate performance estimation
• Cost parameters assigned to each node, depending on:
  – System type (CPU, memory, bus, ...)
  – Node and expression type
• Cost parameters evaluated via simple benchmarks
  – Need timing and size measurements for each target system
  – Currently implemented for MIPS, 68332 and 68HC11 processors

Page 42: HW/SW Synthesis

Performance and Cost Estimation

• Example: 68HC11 timing estimation
• Cost assigned to s-graph edges (different for taken/not taken branches)
• Estimated time:
  – Min: 26 cycles
  – Max: 126 cycles
• Accuracy: within 20% of profiling

[Figure: s-graph of the earlier example (BEGIN, detect(c), a < b, a := a + 1, a := 0, emit(y), END) annotated with edge costs 40, 26, 41, 63, 14, 18 and 9]

Page 43: HW/SW Synthesis

Open Problems

• Better synthesis techniques
  – Add state variables to simplify s-graph
  – Performance-driven synthesis of critical paths
  – Exact memory/speed trade-off
• Estimation of caching and pipelining effects
  – May have little impact on control-dominated systems (frequent branches and context switches)
  – Relatively easy during co-simulation

