Hardware-Software Co-partitioning for Distributed Embedded Systems.

Post on 02-Jan-2016


Outline

1. Introduction
2. Related Work
3. Distributed Embedded System and System Model
4. Multi-Level Partitioning
5. Case Study

1. Introduction

• Hardware-Software Codesign
• Distributed Embedded System
• Motivation
    Task Graph
    Physical Restrictions
• Distributed Embedded System Codesign (DESC)
    Object Modeling Technique (OMT)
    Linear Hybrid Automata (LHA)
    SES Models

1. Introduction (cont’)

• Multi-Level Partitioning
    Partitioning Algorithm
    Sharing, Clustering
• Case Studies

2. Related Work

• Target Embedded System
    1-CPU and 1-ASIC Topology
    n-CPU and m-ASIC Topology
        Optimal Codesign
        Heuristic Codesign

2. Related Work (cont’)

• Codesign of 1-CPU and 1-ASIC Topology
    Kumar et al. 1993
    Kalavade and Lee 1993
    Thomas et al. 1993
    Gupta and De Micheli 1993
    Barros et al. 1994

2. Related Work (cont’)

• Codesign of n-CPU and m-ASIC Topology
    Optimal Codesign Approaches:
        Mixed integer linear programming
            Prakash and Parker 1992
        Exhaustive search
            Wolf 1994, Haworth et al. 1993
            D’Ambrosio and Hu 1994

2. Related Work (cont’)

    Heuristic Codesign Approaches: Iterative and Constructive
        Iterative:
            Dick and Jha 1998 --- MOGAC, CORDS
            Dick and Jha 1999 --- MOCYN

2. Related Work (cont’)

        Constructive:
            Wolf 1996 --- object-oriented
            Yen and Wolf 1996 --- sensitivity-driven
            Dave, Lakshminarayana, and Jha 1999 --- COSYN
            Dave and Jha 1999 --- COFTA
            Dave and Jha 1998 --- COHRA
    Our proposed approach: Distributed Embedded System Codesign (DESC)

3. Distributed Embedded Systems and System Models

• An embedded computer system is a system which uses computers but is not a general-purpose computer.
• In 1971, there were about 142,000 computers world-wide.
• As of 1999, there are some 350 to 400 million personal computers alone and at least an order of magnitude more embedded devices.

3. Distributed Embedded Systems and System Models (cont’)

• There are several reasons to build a distributed hardware engine for an embedded system:
    Cheaper
    Faster response time
    The devices being controlled may be physically distributed

3. Distributed Embedded Systems and System Models (cont’)

• System Models
    Object Modeling Technique (OMT) Models
        Object Model
        Dynamic Model
        Functional Model

3. Distributed Embedded Systems and System Models (cont’)

    Linear Hybrid Automata (LHA) Models
        Internal system model
        For verifying systems
    SES Models
        SES/workbench is a popular modeling and simulation tool for system performance evaluation

4. Multi-Level Partitioning

• Multi-Level Partitioning (MLP)

    Three Main Phases
        Codesign Space Exploration (CSE)
        System Structural Partitioning (SSP)
        Binary Search Copartitioning (BSC)

[Figure: Overall Flow Chart of Multi-Level Partitioning. After Initialization, the CSE level explores the design space (number of CPU and hardware cost), the SSP level generates a structural partition (CPU allocation to distributed subsystems), and the BSC level performs copartitioning with CPU Sharing, ASIC Sharing, Hardware Clustering, and Software Grouping. Decision nodes: "Last Structural Partition?" (No: next structural partition) and "Last Design Alternative?". Final step: Output Heuristically Optimal Partition.]

Detailed Flow Diagram of Multi-Level Partitioning

Inputs: OMT Models, LHA Models

Initialization
    Place all objects of hardware parts into ILA and all other objects into MLA

CSE level
    Select number of CPU and hardware cost (Explore Design Space)

SSP level
    Allocate CPU to distributed subsystem (Generate Structural Partition)

BSC level (Copartitioning)
    1. Calculate the CPD ratio of each object in MLA.
    2. Sort all MLA objects in ascending order of their CPD ratios.
    3. Select an object with the median CPD ratio.
    4. Use software to implement all objects with CPD ratios not less than that of the selected median object.
    5. Use hardware to implement all objects with CPD ratios less than that of the selected median object.
    6. Check if the partition result satisfies system constraints.
    7. Check if the partition is a heuristically optimal solution. If yes, store the structural partition result and perform sharing and clustering.

    Outcomes of the constraint check:
        Cost and performance constraints are satisfied
        Cost constraint is not satisfied, but performance constraints are satisfied
        Cost constraint is satisfied, but performance constraints are not satisfied
        No satisfactory partition

    Adjustments: Increase software objects; Increase hardware objects
    Priorities: Performance is more important; Cost is more important

Loop control
    Last structural partition? If no, go to the next structural partition.
    Last design alternative?
    Partition found? If yes, output the least costly partition; if no, print “No partition”.
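The BSC-level steps can be read as a binary search over the split point in the CPD-sorted object list. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `binary_search_copartition`, the `check` callback, and the rule that a cost violation moves the split toward more software while a performance violation moves it toward more hardware are all assumptions made for illustration. The example data reuses the CPD values and rows A, B, D, and F of the VPMS case study, with the VPMS constraints ($1,300, 14,000 µs, 250 µs).

```python
from typing import Callable, Dict, List, Optional, Tuple

Partition = Tuple[List[str], List[str]]  # (hardware objects, software objects)
Check = Callable[[List[str], List[str]], Tuple[bool, bool]]  # -> (cost_ok, perf_ok)


def binary_search_copartition(cpd: Dict[str, float], check: Check) -> Optional[Partition]:
    """Binary search over the hardware/software split of CPD-sorted objects."""
    names = sorted(cpd, key=lambda n: cpd[n])  # ascending CPD ratio
    lo, hi = 0, len(names)
    best: Optional[Partition] = None
    while lo <= hi:
        k = (lo + hi) // 2  # first step: the median object starts the software side
        hardware, software = names[:k], names[k:]
        cost_ok, perf_ok = check(hardware, software)
        if cost_ok and perf_ok:
            best = (hardware, software)
            hi = k - 1  # assumption: keep looking for a cheaper split (more software)
        elif perf_ok:   # only the cost constraint is violated
            hi = k - 1  # increase software objects
        elif cost_ok:   # only the performance constraints are violated
            lo = k + 1  # increase hardware objects
        else:
            break       # no satisfactory partition along this search path
    return best


# CPD values from the VPMS case study.
VPMS_CPD = {"Sensor Driver": 7.622, "Counter": 32.533, "Motor Driver": 202.381}

# (cost $, display response us, gate response us), keyed by the hardware set.
VPMS_ROWS = {
    frozenset(): (1225, 13200, 1030.0),                           # partition F
    frozenset({"Sensor Driver"}): (1250, 13100, 215.0),           # partition D
    frozenset({"Sensor Driver", "Counter"}): (1280, 190, 215.0),  # partition B
    frozenset(VPMS_CPD): (1450, 190, 0.2),                        # partition A
}


def vpms_check(hardware: List[str], software: List[str]) -> Tuple[bool, bool]:
    cost, display_us, gate_us = VPMS_ROWS[frozenset(hardware)]
    return cost <= 1300, (display_us <= 14000 and gate_us <= 250)


result = binary_search_copartition(VPMS_CPD, vpms_check)
print(result)
```

Under these assumptions the search settles on the Sensor Driver in hardware and the Counter and Motor Driver in software, which corresponds to partition D in the case study.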

4. Multi-Level Partitioning (cont’)

CPD: Cost-Performance Difference

$$\mathrm{CPD}(x) = \frac{\mathrm{Hardware\_Cost}(x) - \mathrm{Software\_Cost}(x)}{\left|\mathrm{Hardware\_Perf}(x) - \mathrm{Software\_Perf}(x)\right| \,/\, \mathrm{Perf\_Constraint}(x)}$$

where x is an object

4. Multi-Level Partitioning (cont’)

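• CPU/ASIC Sharing
    Sharing Threshold Distance (STD)
    SLI: Subsystem Location Inter-distance

[Figure: SLI axis starting at 0 with STD marked on it; the region from 0 to STD is labeled "Sharing" and the region beyond STD is labeled "No Sharing".]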

4. Multi-Level Partitioning (cont’)

Interconnect Cost (IC) Model

IC(X1, X2) = α × SLI(S1, S2) × #Link(X1, S2) × BW(X1, S2) + EC(X1)

SLI: Subsystem Location Inter-distance
S1 and S2: Subsystems
X1 and X2: A component (PE or ASIC)
α: A parameter that depends on the interconnection technology
#Link(X1, S2): The number of links between X1 and S2
BW(X1, S2): The communication bandwidth between X1 and S2
EC(X1): The cost for enhancing X1 such that both S1 and S2 can use X1.
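As a small worked instance of the interconnect cost model, the sketch below evaluates the formula for one shared component. The function name `interconnect_cost` and all numeric inputs are hypothetical and chosen only to show how the terms combine; the slides do not give concrete values for α, link counts, bandwidth, or enhancement cost.

```python
def interconnect_cost(alpha: float, sli: float, links: int,
                      bandwidth: float, enhancement_cost: float) -> float:
    """IC = alpha * SLI(S1, S2) * #Link(X1, S2) * BW(X1, S2) + EC(X1)."""
    return alpha * sli * links * bandwidth + enhancement_cost


# Hypothetical inputs: subsystems 0.8 m apart, 2 links, bandwidth 10 units,
# and an enhancement cost of 20 for making X1 usable by both subsystems.
example_cost = interconnect_cost(alpha=0.5, sli=0.8, links=2,
                                 bandwidth=10.0, enhancement_cost=20.0)
print(example_cost)
```

The distance-dependent term grows linearly with SLI, which is consistent with sharing being considered only for subsystems that are close together.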

4. Multi-Level Partitioning (cont’)

Algorithm 5.2 Share Components Algorithm

Share_Components(s) {
    /* s = <s1, s2, …, sn>, si = (si1, si2), where si1 is the number of PE in
       subsystem Si and si2 is the number of ASIC in subsystem Si;
       si1, si2 ∈ {0, 1, …}; n is the number of subsystems */
    for (i = 1; i ≤ n; i++) {
        for (j = i; j ≤ n; j++) {
            if (SLI(si, sj) ≤ STD) {
                if (si1 > 0 && sj1 > 0)
                    Share_PE(Si, Sj);     /* Refer to Algorithm 5.3 */
                if (si2 > 0 && sj2 > 0)
                    Share_ASIC(Si, Sj);   /* Refer to Algorithm 5.4 */
            }
        }
    }
}
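The pairwise test in the Share Components algorithm can be sketched in Python as follows. This is an illustration under assumptions rather than the authors' code: the comparison operators (SLI not greater than STD, both counts positive) are inferred because they did not survive in the transcript, only distinct subsystem pairs are visited, and Share_PE and Share_ASIC (Algorithms 5.3 and 5.4, not shown in the slides) are replaced by recording the pair. The PE and ASIC counts in the example are hypothetical; the distances are those listed for VPMS-2.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def share_components(subsystems: Dict[str, Tuple[int, int]],
                     sli: Callable[[str, str], float],
                     std: float) -> List[Tuple[str, str, str]]:
    """Return the sharing actions implied by the threshold test.

    subsystems maps a name to (number of PE, number of ASIC).
    """
    actions = []
    for a, b in combinations(subsystems, 2):  # distinct pairs only (assumption)
        if sli(a, b) <= std:                  # close enough to share (assumed <=)
            if subsystems[a][0] > 0 and subsystems[b][0] > 0:
                actions.append(("Share_PE", a, b))    # stand-in for Algorithm 5.3
            if subsystems[a][1] > 0 and subsystems[b][1] > 0:
                actions.append(("Share_ASIC", a, b))  # stand-in for Algorithm 5.4
    return actions


# Distances (m) listed for VPMS-2; STD = 1.0 m.
DISTANCES = {
    frozenset({"ENTRY", "EXIT"}): 0.5,
    frozenset({"DISPLAY", "EXIT"}): 3.0,
    frozenset({"DISPLAY", "ENTRY"}): 3.0,
}
# Hypothetical (PE, ASIC) counts before sharing.
COUNTS = {"ENTRY": (1, 1), "EXIT": (1, 1), "DISPLAY": (1, 0)}

actions = share_components(COUNTS, lambda a, b: DISTANCES[frozenset({a, b})], std=1.0)
print(actions)
```

With these inputs only the ENTRY and EXIT subsystems pass the distance test, so both of their component types become candidates for sharing.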

4. Multi-Level Partitioning (cont’)

• Hardware Clustering and Software Grouping
    In DESC, hardware clustering is based on the basic Kernighan and Lin graph partitioning algorithm, enhanced to include DEMS characteristics.
    Software grouping uses a technique similar to load balancing on multiple processors.
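The slides describe software grouping only as being similar to load balancing on multiple processors. As one possible reading, the sketch below uses a standard greedy longest-processing-time heuristic: each software object goes to the currently least-loaded CPU. The function name `group_software` and the load figures are hypothetical; this is a generic load-balancing stand-in, not the grouping procedure used in DESC.

```python
import heapq
from typing import Dict, List


def group_software(loads: Dict[str, float], num_cpus: int) -> List[List[str]]:
    """Greedy load balancing: heaviest object first, onto the least-loaded CPU."""
    heap = [(0.0, cpu) for cpu in range(num_cpus)]  # (total load, CPU index)
    heapq.heapify(heap)
    groups: List[List[str]] = [[] for _ in range(num_cpus)]
    for name, load in sorted(loads.items(), key=lambda item: -item[1]):
        total, cpu = heapq.heappop(heap)
        groups[cpu].append(name)
        heapq.heappush(heap, (total + load, cpu))
    return groups


# Hypothetical CPU utilisation of three software objects.
loads = {"ENTRY Motor Driver": 0.4, "EXIT Motor Driver": 0.4, "Counter": 0.1}
groups = group_software(loads, num_cpus=2)
print(groups)
```

A greedy heuristic is used here because it is short and deterministic; a production grouping step would also need to respect the performance constraints checked elsewhere in MLP.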

4. Multi-Level Partitioning (cont’)

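• Analysis and Validation of MLP
    Complexity analysis
        r: the number of objects
        a second symbol (not legible in this transcript): the number of subsystems

[Equations: the MLP running time is written in terms of Init_time, BSC, sp(p), Share_time, and Cluster_time, with a sum over p = 0, …; the closed-form bound that follows, in terms of r and log2, is not legible in this transcript.]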

5. Case Studies

• Vehicle Parking Management System (VPMS)

• Examples of Sharing and Clustering in MLP

• Application of MLP to Coal Mine System

5. Case Study (cont’)

• Vehicle Parking Management System (VPMS)
    VPMS Specifications
        A VPMS consists of three subsystems: ENTRY management, EXIT management, and DISPLAY.
        An ENTRY (or an EXIT) subsystem consists of three parts: a ticket facility, a gate controlled by a gate-motor, and a pair of sensors.
        A DISPLAY subsystem

5. Case Study (cont’)

    Constraints for the VPMS system
        A maximum cost of $1,300,
        A maximum display response time of 14,000 µs, and
        A maximum ENTRY (EXIT) gate response time of 250 µs.

5. Case Study (cont’)

    Specification and Mapping of VPMS
        VPMS is described using OMT models consisting of Object, Dynamic, and Functional models.

[Figure: Object Model of VPMS. The Vehicle Parking Management System is composed of an ENTRY Management System, an EXIT Management System, and a Display System. Classes shown: Gate Controller (with Motor and Control Unit), with ENTRY Gate and EXIT Gate linked by isa; Sensor (with Send/Receive Device and Control Unit), with ENTRY Sensor and EXIT Sensor linked by isa; Ticket Checker; Time Stamp; Display Device (7-Segment, LCD, Dot Matrix); Control System (with Counter and Display Interface).]

[Figure: Dynamic Model of a DISPLAY Subsystem. States: Idle, Decrement counter, Increment counter, Read count, Update Display. Events and conditions: Car in; Car out; Push time stamp button; Count > 0, send ACK!; Count = 0, out of space.]
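The behaviour named in the dynamic model of the DISPLAY subsystem can be sketched as a small counter object. This is an illustrative reading, not the authors' model: the class name `DisplayCounter` and its method names are hypothetical, the mapping "car in decrements, car out increments" is inferred from the "Count = 0, out of space" condition, and the guard that stops the count from going below zero is an added assumption. The initial value of 500 comes from the LHA figures of the same subsystem.

```python
class DisplayCounter:
    """Counts free parking spaces for the DISPLAY subsystem."""

    def __init__(self, capacity: int = 500) -> None:
        self.count = capacity  # Count := 500 in the LHA figures

    def car_in(self) -> None:
        # Assumed mapping: a car entering takes one free space.
        if self.count > 0:
            self.count -= 1

    def car_out(self) -> None:
        # Assumed mapping: a car leaving frees one space.
        self.count += 1

    def push_time_stamp_button(self) -> str:
        # Read count: acknowledge if space is left, otherwise report full.
        return "ACK" if self.count > 0 else "out of space"


display = DisplayCounter(capacity=1)
first = display.push_time_stamp_button()   # one space left
display.car_in()
second = display.push_time_stamp_button()  # no space left
display.car_out()
third = display.push_time_stamp_button()   # space available again
print(first, second, third)
```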

[Figure: Functional Model of a DISPLAY Subsystem. Actors: ENTRY Sensor, EXIT Sensor. Processes: Increment Counter, Decrement Counter, Update Display. Data store: Counter. Data flows: Car in signal, Car out signal, Counter Data.]

5. Case Study (cont’)

• LHA Model of VPMS
    Hardware LHA Model
    Software LHA Model

[Figure: Hardware LHA of a DISPLAY Subsystem. Locations: Idle, Increment Counter, Decrement Counter, Read Count, Update Display. Initialization: Count := 500, t := 0. Transition labels: Car in, t := 0; Car out, t := 0; Push time stamp button, t := 0; t = 42 ns, t := 0, Count := Count − 1; t = 42 ns, t := 0, Count := Count + 1; t = 100 ns; t = 18 ns.]

[Figure: Software LHA of a DISPLAY Subsystem. Locations: Polling, Increment Counter, Decrement Counter, Read Count, Update Display. Initialization: Count := 500, t := 0, x := 0. Transition labels: t = 10 ms, t := 0, x := 0; Car in, t := 0; Car out, t := 0; Push time stamp button, t := 0; t = 3.2 µs, t := 0, Count := Count − 1; t = 3.2 µs, t := 0, Count := Count + 1; t = 10 µs, t := 0; t = 5.12 µs, t := 0. Several transitions also carry a guard relating x to 33 ms (comparison operator not legible) together with x := 0.]

5. Case Study (cont’)

• SES Models
    Using SES/workbench Model
        A car-simulator
        An ENTRY management subsystem
        An EXIT management subsystem
        A DISPLAY subsystem

5. Case Study (cont’)

[Figure: SES Model of a DISPLAY Subsystem]

5. Case Study (cont’)

• Applying MLP to VPMS
    Calculation of CPD for VPMS

Parts           Hardware Cost   Software Cost   Hardware Performance   Software Performance   CPD
Sensor Driver   115             90              210                    1,030                  7.622
Counter         120             90              290                    13,200                 32.533
Motor Driver    260             90              820                    1,030                  202.381
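The CPD column can be checked numerically. The sketch below assumes that CPD is the hardware-minus-software cost difference divided by the absolute performance difference normalized by the applicable performance constraint, and that the Sensor Driver and Motor Driver are judged against the 250 µs gate constraint while the Counter is judged against the 14,000 µs display constraint. That pairing of parts with constraints is an inference, adopted because it reproduces the tabulated values; the function name `cpd` is hypothetical.

```python
def cpd(hw_cost: float, sw_cost: float, hw_perf: float,
        sw_perf: float, perf_constraint: float) -> float:
    """Cost-Performance Difference of one object."""
    return (hw_cost - sw_cost) / (abs(hw_perf - sw_perf) / perf_constraint)


# (hardware cost, software cost, hardware perf, software perf, assumed constraint)
parts = {
    "Sensor Driver": (115, 90, 210, 1030, 250),
    "Counter": (120, 90, 290, 13200, 14000),
    "Motor Driver": (260, 90, 820, 1030, 250),
}

ratios = {name: round(cpd(*values), 3) for name, values in parts.items()}
for name, value in ratios.items():
    print(f"{name}: {value}")
```

Sorting these ratios in ascending order gives Sensor Driver, Counter, Motor Driver, which is the order the binary search copartitioning step works on.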

5. Case Study (cont’)

    Applying MLP to the VPMS Example

The first column belongs to Codesign Space Exploration (CSE); the remaining columns belong to Binary Search Copartitioning (BSC).

Number of CPU   Partitions (SSP)   Cost ($)   Response time (µs)    Response time (µs)   Feasibility
                                              (sensor to display)   (sensor to gate)
0               A(HC, HS, HM)      1,450      190                   0.2                  No
1               B(HC, HS, SM)      1,280      190                   215.0                Yes
                C(HC, HS, S²M)     1,370      13,200                820.0                No
2               D(SC, HS, SM)      1,250      13,100                215.0                Yes
                E(S²C, HS, SM)     1,340      13,100                210.0                No
3               F(SC, SS, SM)      1,225      13,200                1,030.0              No

H: hardware, S: software; subscripts: C = Counter, S = Sensor Driver, M = Motor Driver; superscripts: 1 = One CPU, 2 = Two CPUs, 3 = Three CPUs

5. Case Study (cont’)

• VPMS Emulation Block Diagram for Prototype D(SC, HS, SM)

[Figure: VPMS emulation block diagram. Blocks: two Single-chip Processors (8751), Time Stamp Machine, Ticket Checker, Display Device, M Interface Entry gate, M Interface Exit gate, Entry Sensor & Driver with Signal Processing, Exit Sensor & Driver with Signal Processing. Signals, with (i) for input and (o) for output: Car in (i), Car out (i), Parking fees paid (i), Push time stamp button (i), Ticket taken (i), Acknowledgment (o), Display scan data (o), and Open (o) or Close (o) for each gate.]

5. Case Study (cont’)

VPMS Emulation Results

Partitions                               B(HC, HS, SM)   D(SC, HS, SM)
Cost ($)                                 1278            1240
Power Consumption (W)                    4.76            4.20
Response time (µs) (sensor to display)   180             13,000
Response time (µs) (sensor to gate)      210             210

5. Case Study (cont’)

• Examples of Sharing and Clustering in MLP
    Sharing and clustering techniques in MLP, based on several variants of the VPMS case study.
    How object-oriented modeling can be advantageous in hierarchical partitioning.
    Coal mine control and monitoring system

Advantage of Sharing in MLP

Partitioning Results for three VPMS Specifications with and without Sharing

                                VPMS-1                     VPMS-2                        VPMS-3
Specifications
  STD (m)                       1.0                        1.0                           1.0
  SLI(ENTRY, EXIT) (m)          6.0                        0.5                           0.8
  SLI(Display, EXIT) (m)        7.0                        3.0                           0.5
  SLI(Display, ENTRY) (m)       2.0                        3.0                           0.5
Partitioning Results
  Number and Locations of PE    3                          2                             1
                                (1) ENTRY gate control     (1) ENTRY/EXIT gate control   (1) ENTRY/EXIT/Display Subsystem
                                (2) EXIT gate control      (2) Display
                                (3) Display
  Number and Locations of ASIC  2                          1                             1
                                (1) ENTRY sensor control   (1) ENTRY/EXIT sensor         (1) ENTRY/EXIT/Display Subsystem Interface
                                (2) EXIT sensor control
  System Cost ($)               1,430                      1,250                         1,180
  Performance
    Display response time (µs)  13,200                     13,200                        14,020
    Gate response time (µs)     210                        210                           1,030
MLP Execution Time (sec)        0.602                      3.857                         14.789
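The number of PEs reported for the three specifications can be related to the distances with a short check: treat two subsystems as sharing whenever their SLI does not exceed STD, merge sharing subsystems transitively, and count the resulting groups. The transitive merge and the function name `count_shared_groups` are assumptions made for illustration; the distances and STD are the values listed in the table.

```python
from typing import Dict, FrozenSet, List


def count_shared_groups(subsystems: List[str],
                        sli: Dict[FrozenSet[str], float],
                        std: float) -> int:
    """Count groups of subsystems connected by distances within the threshold."""
    parent = {s: s for s in subsystems}

    def find(s: str) -> str:
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for pair, distance in sli.items():
        if distance <= std:  # assumed comparison: share when SLI <= STD
            a, b = tuple(pair)
            parent[find(a)] = find(b)
    return len({find(s) for s in subsystems})


SUBSYSTEMS = ["ENTRY", "EXIT", "DISPLAY"]


def distances(entry_exit: float, display_exit: float, display_entry: float):
    return {
        frozenset({"ENTRY", "EXIT"}): entry_exit,
        frozenset({"DISPLAY", "EXIT"}): display_exit,
        frozenset({"DISPLAY", "ENTRY"}): display_entry,
    }


group_counts = {
    "VPMS-1": count_shared_groups(SUBSYSTEMS, distances(6.0, 7.0, 2.0), std=1.0),
    "VPMS-2": count_shared_groups(SUBSYSTEMS, distances(0.5, 3.0, 3.0), std=1.0),
    "VPMS-3": count_shared_groups(SUBSYSTEMS, distances(0.8, 0.5, 0.5), std=1.0),
}
print(group_counts)
```

The group counts of 3, 2, and 1 agree with the number of PEs listed for VPMS-1, VPMS-2, and VPMS-3.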

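Advantage of Clustering in MLP

Partitioning Results for five VPMS Specifications with and without Clustering

VPMS-A
    Number of Subsystems: 1
    Subsystems: (1) ENTRY/EXIT/Display Subsystem
    Number and locations of PE: 1; (1) Motor Driver/Counter
    Number and locations of ASIC: 1; (1) Sensor Driver
    System Cost ($): 1,180
    Display response time (µs): 14,020
    Gate response time (µs): 1,030

VPMS-B
    Number of Subsystems: 2
    Subsystems: (1) ENTRY/EXIT Subsystem, (2) Display Subsystem
    Number and locations of PE: 2; (1) Motor Driver, (2) Counter
    Number and locations of ASIC: 1; (1) Sensor Driver
    System Cost ($): 1,250
    Display response time (µs): 13,200
    Gate response time (µs): 210

VPMS-C
    Number of Subsystems: 2
    Subsystems: (1) ENTRY/Display Subsystem, (2) EXIT Subsystem
    Number and locations of PE: 2; (1) ENTRY Motor Driver/Counter, (2) EXIT Motor Driver
    Number and locations of ASIC: 2; (1) ENTRY Sensor, (2) EXIT Sensor
    System Cost ($): 1,340
    Display response time (µs): 13,100
    Gate response time (µs): 110

VPMS-D
    Number of Subsystems: 2
    Subsystems: (1) ENTRY Subsystem, (2) EXIT/Display Subsystem
    Number and locations of PE: 2; (1) ENTRY Motor Driver, (2) EXIT Motor Driver/Counter
    Number and locations of ASIC: 2; (1) ENTRY Sensor, (2) EXIT Sensor
    System Cost ($): 1,340
    Display response time (µs): 13,100
    Gate response time (µs): 110

VPMS-E
    Number of Subsystems: 3
    Subsystems: (1) ENTRY Subsystem, (2) EXIT Subsystem, (3) Display Subsystem
    Number and locations of PE: 3; (1) ENTRY Motor Driver, (2) EXIT Motor Driver, (3) Counter
    Number and locations of ASIC: 2; (1) ENTRY Sensor, (2) EXIT Sensor
    System Cost ($): 1,430
    Display response time (µs): 13,200
    Gate response time (µs): 110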