Winter-Spring 2001, Codesign of Embedded Systems
Co-Synthesis Algorithms: HW/SW Partitioning
Part of HW/SW Codesign of Embedded Systems Course (CE 40-226)

Topics
  - Introduction
  - Preliminaries
  - Hardware/Software Partitioning
  - Distributed System Co-Synthesis

Topics
  - Introduction
  - A Classification
  - Examples
    - Vulcan
    - Cosyma

Introduction to HW/SW Partitioning
  - The first variety of co-synthesis applications
  - Definition
    - A HW/SW partitioning algorithm implements a specification on some sort of multiprocessor architecture
  - Usually
    - Multiprocessor architecture = one CPU + some ASICs on the CPU bus

Introduction to HW/SW Partitioning (cont’d)
  - Terminology
    - Allocation
      - Synthesis methods which design the multiprocessor topology along with the processing elements (PEs) and the SW architecture
    - Scheduling
      - The process of assigning PE (CPU and/or ASIC) time to processes so that they get executed

Introduction to HW/SW Partitioning (cont’d)
  - In most partitioning algorithms
    - Type of CPU is fixed and given
    - ASICs must be synthesized
      - What function to implement on each ASIC?
      - What characteristics should the implementation have?
    - The problem solved is a single-rate synthesis problem
      - CDFG is the starting model

HW/SW Partitioning (cont’d)
  - Normal use of architectural components
    - CPU performs less computationally-intensive functions
    - ASICs used to accelerate core functions
  - Where to use?
    - High-performance applications
      - No CPU is fast enough for the operations
    - Low-cost applications
      - ASIC accelerators allow use of a much smaller, cheaper CPU

A Classification
  - Criterion: Optimization Strategy
    - Trade-off between Performance and Cost
  - Primal Approach
    - Performance is the primary goal
    - First, all functionality in ASICs. Progressively move more to CPU to reduce cost.
  - Dual Approach
    - Cost is the primary goal
    - First, all functions in the CPU. Move operations to the ASIC to meet the performance goal.
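
The two strategies can be contrasted with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the timing and area numbers, the purely sequential timing model, and the greedy orderings. It is not how Vulcan or Cosyma actually select candidates.

```python
# Toy data (hypothetical): software time, hardware time and hardware area per function.
FUNCS = {
    "fir":  {"t_sw": 40, "t_hw": 5, "area": 30},
    "fft":  {"t_sw": 60, "t_hw": 8, "area": 50},
    "ctrl": {"t_sw": 10, "t_hw": 6, "area": 20},
}

def total_time(hw):
    # Assumes purely sequential execution, so the times simply add up.
    return sum(f["t_hw"] if n in hw else f["t_sw"] for n, f in FUNCS.items())

def total_area(hw):
    return sum(FUNCS[n]["area"] for n in hw)

def primal(deadline):
    """Start all-HW, then move functions to the CPU while the deadline still holds."""
    hw = set(FUNCS)
    # Try the largest-area functions first: moving them saves the most cost.
    for n in sorted(FUNCS, key=lambda n: -FUNCS[n]["area"]):
        if total_time(hw - {n}) <= deadline:
            hw.remove(n)
    return hw

def dual(deadline):
    """Start all-SW, then move functions to the ASIC until the deadline is met."""
    hw = set()
    # Try the functions with the best time saved per unit area first.
    gain = lambda n: (FUNCS[n]["t_sw"] - FUNCS[n]["t_hw"]) / FUNCS[n]["area"]
    for n in sorted(FUNCS, key=gain, reverse=True):
        if total_time(hw) <= deadline:
            break
        hw.add(n)
    return hw
```

With a deadline of 60 time units, `primal(60)` keeps only `fft` in hardware while `dual(60)` ends with `fir` and `fft` in hardware. Both meet the deadline, but the greedy heuristics reach different cost points because they approach the trade-off from opposite ends.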

A Classification (cont’d)
  - Classification due to optimization strategy (cont’d)
    - Example co-synthesis systems
      - Vulcan (Stanford): Primal strategy
      - Cosyma (Braunschweig, Germany): Dual strategy

Co-Synthesis Algorithms: HW/SW Partitioning
  HW/SW Partitioning Examples: Vulcan

Partitioning Examples: Vulcan
  - Gupta, De Micheli, Stanford University
  - Primal approach
    1. All-HW initial implementation.
    2. Iteratively move functionality to CPU to reduce cost.
  - System specification language
    - HardwareC
    - Is compiled into a flow graph

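Partitioning Examples: Vulcan (cont’d)
  [Figure: two HardwareC fragments and their flow graphs.
   HardwareC "x=a; y=b;": a nop node with two outgoing edges, each labeled 1, to the nodes x=a and y=b.
   HardwareC "if (c>d) x=e; else y=f;": a cond node with an edge labeled c>d to the node x=e and an edge labeled c<=d to the node y=f.]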

Partitioning Examples: Vulcan (cont’d)
  - Flow Graph Definition
    - A variation of a (single-rate) task graph
    - Nodes
      - Represent operations
      - Typically low-level operations: mult, add
    - Edges
      - Represent data dependencies
      - Each contains a Boolean condition under which the edge is traversed
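
A flow graph of this kind can be sketched as nodes holding operations and edges carrying Boolean conditions. The representation below (a dict of lambdas, a list of edge tuples, and the `run` helper) is a hypothetical illustration, not Vulcan's internal data structure. It also assumes an acyclic graph without join nodes.

```python
# Flow graph for the HardwareC fragment: if (c > d) x = e; else y = f;
# nodes: name -> operation applied to the variable environment
nodes = {
    "cond": lambda env: None,                      # decision node, no side effect
    "x=e":  lambda env: env.update(x=env["e"]),
    "y=f":  lambda env: env.update(y=env["f"]),
}

# edges: (source, destination, Boolean condition under which the edge is traversed)
edges = [
    ("cond", "x=e", lambda env: env["c"] > env["d"]),
    ("cond", "y=f", lambda env: env["c"] <= env["d"]),
]

def run(start, env):
    """Execute the nodes reachable from start, following only edges whose condition holds."""
    trace, frontier = [], [start]
    while frontier:
        n = frontier.pop(0)
        nodes[n](env)
        trace.append(n)
        frontier += [dst for src, dst, cond in edges if src == n and cond(env)]
    return trace
```

Calling `run("cond", {"c": 3, "d": 1, "e": 7, "f": 9})` visits `cond` and then `x=e`, because only the edge whose condition `c > d` holds is traversed.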

Partitioning Examples: Vulcan (cont’d)
  - Flow Graph
    - is executed repeatedly at some rate
    - can have initiation-time constraints for each node
        $t(v_i) + l_{ij} \le t(v_j) \le t(v_i) + u_{ij}$
    - can have rate constraints on each node
        $m_i \le R_i \le M_i$
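
As a small sketch, both kinds of constraint can be checked against a candidate schedule. It assumes the constraints take the usual min/max form $t(v_i) + l_{ij} \le t(v_j) \le t(v_i) + u_{ij}$ and $m_i \le R_i \le M_i$. The function names and data layout are hypothetical.

```python
def check_min_max(t, constraints):
    """t: node -> initiation time. constraints: (i, j, l_ij, u_ij) tuples.
    Returns the constraints that the schedule t violates."""
    return [(i, j, l, u) for i, j, l, u in constraints
            if not (t[i] + l <= t[j] <= t[i] + u)]

def check_rate(rate, bounds):
    """rate: node -> execution rate R_i. bounds: node -> (m_i, M_i).
    Returns the nodes whose rate falls outside its bounds."""
    return [v for v, (lo, hi) in bounds.items() if not (lo <= rate[v] <= hi)]
```

For example, with `t = {"a": 0, "b": 5}` the constraint `("a", "b", 2, 4)` is reported as violated, since node `b` starts 5 time units after `a` but must start at most 4 units after it.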

Partitioning Examples: Vulcan (cont’d)
  - Vulcan Co-synthesis Algorithm
    - Partitioning quantum is a thread
      - Algorithm divides the flow graph into threads and allocates them
      - Thread boundary is determined by
        1. (always) a non-deterministic delay element, such as a wait for an external variable
        2. (on choice) other points of the flow graph
    - Target architecture
      - CPU + Co-processor (multiple ASICs)

Partitioning Examples: Vulcan (cont’d)
  - Vulcan Co-synthesis algorithm (cont’d)
    - Allocation
      - Primal approach
    - Scheduling
      - is done by a scheduler on the target CPU
      - the scheduler is generated as part of the synthesis process
      - the scheduler schedules all threads (both HW and SW threads)
      - scheduling cannot be static, because some threads have non-deterministic initiation times

Partitioning Examples: Vulcan (cont’d)
  - Vulcan Co-synthesis algorithm (cont’d)
    - Cost estimation
      - SW implementation
        - Code size: relatively straightforward
        - Data size: the biggest challenge. Vulcan puts some effort into finding bounds for each thread
      - HW implementation
        - ?

Partitioning Examples: Vulcan (cont’d)
  - Vulcan Co-synthesis algorithm (cont’d)
    - Performance estimation
      - Both SW and HW implementations
        - From the flow graph, and basic execution times for the operators

Partitioning Examples: Vulcan (cont’d)
  - Algorithm Details
    - Partitioning goal
      - Allocate each thread to one of two partitions
        - CPU set: S
        - Co-processor set: H
      - Required execution rate must be met, and total cost minimized

Partitioning Examples: Vulcan (cont’d)
  - Algorithm Details (cont’d)
    - Algorithm steps
      1. Put all threads in the H set
      2. Iteratively do
        2.1. Move some operations to S.
          2.1.1. Select a group of operations to move to S.
          2.1.2. Check performance feasibility, by computing the worst-case delay through the flow graph given the new thread times
          2.1.3. Do the move, if feasible
        2.2. Incrementally update the cost function to reflect the new partition
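
The loop above can be sketched as follows. This is a simplified illustration under stated assumptions, not Vulcan's implementation: threads are moved one at a time, the flow graph is an acyclic graph of threads, the worst-case delay is taken as the longest path through it, and the cost update of step 2.2 is left out. All names (`worst_case_delay`, `vulcan_like_partition`, `succ`, `t_hw`, `t_sw`) are hypothetical.

```python
from functools import lru_cache

def worst_case_delay(succ, delay):
    """Longest path (sum of thread delays) through an acyclic thread graph."""
    @lru_cache(maxsize=None)
    def longest_from(n):
        return delay[n] + max((longest_from(s) for s in succ.get(n, ())), default=0)
    return max(longest_from(n) for n in delay)

def vulcan_like_partition(succ, t_hw, t_sw, deadline):
    H, S = set(t_hw), set()            # step 1: every thread starts in hardware
    moved = True
    while moved:                       # step 2: repeat until no move is feasible
        moved = False
        for th in sorted(H):           # 2.1.1: candidate selection (one thread at a time)
            # Thread times if th joined the threads already in software.
            trial = {n: (t_sw[n] if n in S or n == th else t_hw[n]) for n in t_hw}
            if worst_case_delay(succ, trial) <= deadline:   # 2.1.2: feasibility check
                H.remove(th)           # 2.1.3: commit the move, no back-tracking
                S.add(th)
                moved = True
                break
    return H, S
```

On a diamond-shaped graph `a -> {b, c} -> d` with a slow software version of `b`, the sketch moves `a`, `c` and `d` to the CPU and leaves `b` in hardware, since moving `b` would push the longest path past the deadline.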

Partitioning Examples: Vulcan (cont’d)
  - Algorithm Details (cont’d)
    - Vulcan cost function
        $f(w) = c_1 S_h(H) - c_2 S_s(S) + c_3 B - c_4 P + c_5 |m|$
      - $c_1, \dots, c_5$: weight constants
      - $S_h()$, $S_s()$: Size functions
      - $B$: Bus utilization (<1)
      - $P$: Processor utilization (<1)
      - $m$: total number of variables to be transferred between the CPU and the co-processor
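
The cost function is a weighted sum and can be transcribed directly. In the sketch below the parameter names and the default weights of 1.0 are assumptions, and the two size terms are passed in as plain numbers rather than computed from H and S.

```python
def vulcan_cost(size_hw, size_sw, bus_util, cpu_util, n_vars,
                c=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """f = c1*Sh(H) - c2*Ss(S) + c3*B - c4*P + c5*|m|

    size_hw, size_sw: values of the size functions Sh(H) and Ss(S)
    bus_util, cpu_util: bus and processor utilization, each below 1
    n_vars: number of variables transferred between CPU and co-processor
    c: the five weight constants (defaults are placeholders)
    """
    c1, c2, c3, c4, c5 = c
    return c1 * size_hw - c2 * size_sw + c3 * bus_util - c4 * cpu_util + c5 * n_vars
```

Note the signs: hardware size, bus utilization and transferred variables raise the cost, while software size and processor utilization lower it, which rewards moving work onto the CPU.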

Partitioning Examples: Vulcan (cont’d)
  - Algorithm Details (cont’d)
    - Complementary notes
      - A heuristic to minimize communication
        - Once a thread is moved to S, its immediate successors are placed in the list for evaluation in the next iteration.
      - No back-tracking
        - Once a thread is assigned to S, it remains there
      - Experimental results
        - The designs produced are considerably faster than all-SW implementations, but much cheaper than all-HW designs

Co-Synthesis Algorithms: HW/SW Partitioning
  HW/SW Partitioning Examples: Cosyma

Partitioning Examples: Cosyma
  - Rolf Ernst, et al.: Technical University of Braunschweig, Germany
  - Dual approach
    1. All-SW initial implementation.
    2. Iteratively move basic blocks to the ASIC accelerator to meet the performance objective.
  - System specification language
    - Cx
    - Is compiled into an ESG (Extended Syntax Graph)
      - ESG is much like a CDFG

Partitioning Examples: Cosyma (cont’d)
  - Cosyma Co-synthesis Algorithm
    - Partitioning quantum is a Basic Block
      - A Basic Block is a branch-free block of the program
    - Target Architecture
      - CPU + accelerator ASIC(s)
    - Scheduling
    - Allocation
    - Cost Estimation
    - Performance Estimation
    - Algorithm Details

Partitioning Examples: Cosyma (cont’d)
  - Cosyma Co-synthesis Algorithm (cont’d)
    - Performance Estimation
      - SW implementation
        - Done by examining the object code for the basic block generated by a compiler
      - HW implementation
        - Assumes one operator per clock cycle.
        - Creates a list schedule for the DFG of the basic block.
        - Depth of the list gives the number of clock cycles required.
      - Communication
        - Done by data-flow analysis of the adjacent basic blocks.
        - In Shared-Memory
          - Proportional to the number of variables to be accessed
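
The HW estimate can be approximated with a short sketch. It swaps in an ASAP schedule with unlimited functional units in place of a resource-constrained list schedule, and it reads "one operator per clock cycle" as every operator taking exactly one cycle. Both are assumptions, as are the function name and the DFG encoding.

```python
def hw_cycles(dfg):
    """dfg: operator -> list of operators whose results it consumes (acyclic).
    Returns the schedule depth in clock cycles, i.e. the length of the
    critical path when every operator takes one cycle."""
    level = {}

    def depth(op):
        # An operator is scheduled one cycle after its latest predecessor.
        if op not in level:
            level[op] = 1 + max((depth(p) for p in dfg[op]), default=0)
        return level[op]

    return max((depth(op) for op in dfg), default=0)

# Basic block computing (a*b + c*d) - e: two multiplies, one add, one subtract.
example_dfg = {"m1": [], "m2": [], "add": ["m1", "m2"], "sub": ["add"]}
```

For `example_dfg` the two multiplications share the first cycle, so the estimate is three cycles even though the block contains four operators.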

Partitioning Examples: Cosyma (cont’d)
  - Algorithm Steps
    - Change in execution time caused by moving basic block b from CPU to ASIC:
        $\Delta c(b) = w \left( t_{HW}(b) - t_{SW}(b) + t_{com}(Z) - t_{com}(Z \cup b) \right) \times It(b)$
      - $w$: Constant weight
      - $t_{HW}(b)$, $t_{SW}(b)$: Execution time of basic block b in HW and in SW
      - $t_{com}(Z)$: Estimated communication time between the CPU and the accelerator ASIC, given a set Z of basic blocks implemented on the ASIC
      - $It(b)$: Total number of times that b is executed
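
The increment can be transcribed term by term, as in the sketch below. The formula is taken as given on the slide. The communication model `t_com`, the `VARS` table, and the cost of two cycles per shared variable are hypothetical stand-ins for Cosyma's data-flow analysis.

```python
# Hypothetical data: the variables each basic block reads or writes.
VARS = {"b1": {"x", "y"}, "b2": {"y", "z"}, "b3": {"z"}}

def t_com(Z, cycles_per_var=2):
    """Shared-memory estimate: proportional to the number of variables
    used on both sides of the HW/SW boundary when the blocks in Z are on the ASIC."""
    hw_vars = set().union(*(VARS[b] for b in Z))
    sw_vars = set().union(*(VARS[b] for b in VARS if b not in Z))
    return cycles_per_var * len(hw_vars & sw_vars)

def delta_cost(b, Z, t_hw, t_sw, iterations, w=1.0):
    """w * (t_HW(b) - t_SW(b) + t_com(Z) - t_com(Z u {b})) * It(b)"""
    return w * (t_hw[b] - t_sw[b] + t_com(Z) - t_com(Z | {b})) * iterations[b]
```

The multiplication by `iterations[b]` is what makes blocks inside hot loops dominate the ranking: the same per-execution difference counts a hundred times for a block that runs a hundred times.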

Partitioning Examples: Cosyma (cont’d)
  - Experimental Results
    - By moving only basic blocks to HW
      - Typical speedup of only 2x
      - Reason: Limited intra-basic-block parallelism
      - Cure: Implement several control-flow optimizations to increase parallelism in the basic block, and hence in the ASIC
        - Examples: loop pipelining, speculative branch execution with multiple branch prediction, operator pipelining
      - Result:
        - Speedups: 2.7 to 9.7
        - CPU times: 35 to 304 seconds on a typical workstation

What we learned today
  - HW/SW Partitioning: One broad category of co-synthesis algorithms
  - Criteria by which a co-synthesis algorithm is categorized

