HW/SW Partitioning in the Implementation of Real-Time Systems on FPGAs with Softcores

Outline

• Intro & Motivation

• Model

• Algorithms

• Experiments

Intro and Motivation

• Past work on design optimization for single-processor scheduling

– Realizing that the schedulability condition can be viewed as a feasibility region in the domain of the design variables

– Realizing that such a region is convex for EDF under reasonable assumptions

• Availability of Softcores for FPGAs

– NIOS II for Altera

• Co-design problem

– a functionality can be implemented in HW (inside the FPGA) or in SW (inside or outside the FPGA), executed by one or more (how many?) Softcores.

Motivation

• Start from some system model (Simulink)

• Explore different HW design options (0-1-2-4-… NIOS)

• For each design option find the optimal design configuration by means of convex linear optimization

• HW implementation is subject to area constraints

• SW implementation is subject to schedulability constraints

HW (area) Constraints

• Models available:

– Single-dimension

• Condition: linear bound

$\sum_i a_i \le A$

[Figure: four blocks numbered 1 to 4, shown in a slotted layout and in a linear layout]

HW (area) Constraints

• Models available:

– 2-dimensions

[Figure: seven rectangles numbered 1 to 7 packed in a two-dimensional area (cutting stock problem)]

• Complex, more realistic and extremely well-studied problem (real-world implications)

• Linear bound solutions can be found in the operations research literature!

Reality of FPGAs (additional resource constraints)

Schedulability constraints

• EDF (or L&L sufficient) bound

$U = \sum_i \frac{e_i}{p_i} \le U_{lub}$

• How realistic is it?

• Implementations of FP and EDF on NIOS exist

• How about deadlines = periods, independence and so on?
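A utilization test of this kind can be sketched in a few lines of Python. This is only an illustration, assuming independent periodic tasks with deadlines equal to periods; the function names and the `(execution_time, period)` tuple layout are hypothetical and not taken from the slides.

```python
from typing import Sequence, Tuple

Task = Tuple[float, float]  # (worst-case execution time, period), same time unit


def utilization(tasks: Sequence[Task]) -> float:
    """Total processor utilization: sum of execution time over period."""
    return sum(e / p for e, p in tasks)


def liu_layland_bound(n: int) -> float:
    """Liu & Layland sufficient bound for n tasks under rate-monotonic priorities."""
    return n * (2 ** (1 / n) - 1)


def is_schedulable(tasks: Sequence[Task], policy: str = "EDF") -> bool:
    """Compare the utilization with the least upper bound of the chosen policy.

    "EDF": bound is 1 (exact for independent tasks with deadline = period).
    "RM":  Liu & Layland bound (sufficient only).
    """
    if not tasks:
        return True
    if policy == "EDF":
        u_lub = 1.0
    elif policy == "RM":
        u_lub = liu_layland_bound(len(tasks))
    else:
        raise ValueError("unknown policy: " + policy)
    return utilization(tasks) <= u_lub


if __name__ == "__main__":
    tasks = [(2.0, 4.0), (2.0, 5.0)]      # utilization 0.5 + 0.4
    print(is_schedulable(tasks, "EDF"))    # within the EDF bound
    print(is_schedulable(tasks, "RM"))     # above the 2-task L&L bound
```

The linearity of the test in the utilizations is what makes it usable as a constraint in a linear program.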

The Model

• Starting point: Simulink model

The Model

• implementation of a Simulink model

• HW implementation: market tools exist (Celoxica) for implementing Simulink blocks in FPGA.

The Model

• SW implementation: market tools exist, Real-Time Workshop + Embedded Coder (MathWorks) or TargetLink (dSPACE), for implementing Simulink blocks as a set of concurrent threads.

• Threads inherit the sampling period of the blocks (periodic model)

• No overrun is permitted (deadlines = periods)

• Communication is by switched buffers (asynchronous, tasks are independent)

• Of course, code generation and switched buffers are not commercially available for EDF, but nothing prevents their implementation

The Model

• FPGA = rectangular area of Logic Elements (LEs). All dimensions are expressed in LEs

• FPGA height = H

• FPGA width = W

• Assume a homogeneous two-dimensional model of the FPGA (array of LEs)

• k Softcores $CPU_l$, l = 1..k, are implemented in the FPGA: each core requires an area $s_w \times s_h$ (k = 0, 1, 2, ...)

[Figure: FPGA of height H and width W containing a softcore block of height s_h and width s_w]

The Model

• System model = network of blocks

• $V = \{F_1, F_2, \dots, F_n\}$ is the set of functional blocks

• A block $F_i$ can be implemented in HW or SW, according to the value of $s_{il} \in \{0,1\}$: $s_{il} = 1$ if block $F_i$ is executed in SW on $CPU_l$. If it is not executed in SW, a block MUST be implemented in HW.

• If implemented in HW, a block requires an area $w_i \times h_i$

• If implemented in SW, a block $F_i$ has a worst-case computation time $e_i$ and a period of execution $t_i$, giving a utilization $u_i = e_i / t_i$ (a HW implementation has $e_i \approx 0$)

The Model

• If implemented in SW, a block is executed in the context of a thread with the same period.

• $m_{i,j} = 1$ if $F_i$ is mapped for execution in thread j and 0 otherwise (these are constants, not optimization variables!)

• Schedulability constraint (for each NIOS)

Results to be exploited

• Cutting stock approximate (linear) solution: level packing (Lodi)

• Pack the items in rows forming levels

– the first level is the bottom of the bin, the second level is built on top of the first, and so on …

• In each level, the leftmost item is the tallest one

• The bottom level is the tallest one

• Items are sorted and renumbered by non-increasing $h_i$ values.
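The level structure described above can be illustrated with a greedy heuristic. The sketch below uses first-fit decreasing height, which is a swapped-in simple heuristic and not the integer linear model of the slides; it only shows how sorting by non-increasing height makes the first item of every level the tallest one. The function name and the `(width, height)` tuple layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def level_pack(items: Sequence[Tuple[int, int]],
               bin_width: int) -> Tuple[List[List[int]], int]:
    """First-fit decreasing-height level packing.

    items: (width, height) of each block.
    Returns (levels, total_height), where each level lists the original item
    indices it contains, the first index being the item that initializes it.
    """
    # Visit items by non-increasing height, so that the item opening a level
    # is the tallest of that level and fixes the level height.
    order = sorted(range(len(items)), key=lambda i: -items[i][1])
    levels: List[List[int]] = []
    used_width: List[int] = []
    for i in order:
        w, _ = items[i]
        if w > bin_width:
            raise ValueError("item %d is wider than the bin" % i)
        for lvl, used in enumerate(used_width):
            if used + w <= bin_width:      # first level with enough room
                levels[lvl].append(i)
                used_width[lvl] += w
                break
        else:                              # no level fits: open a new one
            levels.append([i])
            used_width.append(w)
    total_height = sum(items[level[0]][1] for level in levels)
    return levels, total_height


if __name__ == "__main__":
    blocks = [(4, 3), (3, 5), (5, 2), (2, 4)]
    print(level_pack(blocks, bin_width=7))
```

The total height is the sum of the heights of the items that initialize a level, which is the quantity the exact model minimizes.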


• An example:

• there are n potential levels (one for each initializing block)


• Variables:

• $y_i = 1$ if item i initializes level i and 0 otherwise

• Objective (original):

– minimize the height of the required rectangle

• Constraints (original):

– $x_{ij}$, $i \in \{1, \dots, n-1\}$, $j > i$: $x_{ij} = 1$ if item j is packed in level i, 0 otherwise

• Each item is packed exactly once

• Width constraint

Reusing Results

• These results can be reused as follows:

• The original objective can be retained or it can become a constraint

$\sum_{i=1}^{n} h_i y_i \le H$


• The existence of a packing (each item is packed exactly once)

• Becomes …

• Each item is packed exactly once or it is executed on a CPU

$\sum_{i=1}^{j-1} x_{ij} + y_j + \sum_{l=1}^{k} s_{jl} = 1$


• The width constraint is retained …

• A schedulability constraint must be added for each CPU

$\sum_{i=1}^{n} u_i s_{il} \le U_{lub} \quad (l = 1, \dots, k)$
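The two constraints introduced for the co-design problem (each block is packed in HW exactly once or runs on exactly one CPU, and each CPU respects its utilization bound) can be checked on a candidate solution with a short routine. This is a sketch under assumptions: the 0/1 variables are stored as nested lists with 0-based indices, the function name is hypothetical, and the width constraint is deliberately left out.

```python
from typing import List, Sequence


def check_partition(x: Sequence[Sequence[int]],
                    y: Sequence[int],
                    s: Sequence[Sequence[int]],
                    u: Sequence[float],
                    u_lub: float = 1.0) -> bool:
    """Check a candidate 0/1 solution of the partitioning problem.

    x[i][j] = 1 if block j is packed in the level initialized by block i (i < j)
    y[j]    = 1 if block j initializes a level
    s[j][l] = 1 if block j runs in SW on CPU l (empty rows when there is no CPU)
    u[j]    = utilization of block j when implemented in SW
    """
    n = len(y)
    k = len(s[0]) if n else 0
    # Each block is in exactly one place: an earlier level, its own level, or a CPU.
    for j in range(n):
        packed_in_level = sum(x[i][j] for i in range(j))
        on_cpu = sum(s[j])
        if packed_in_level + y[j] + on_cpu != 1:
            return False
    # Each CPU must respect the utilization bound.
    for l in range(k):
        if sum(u[i] * s[i][l] for i in range(n)) > u_lub:
            return False
    return True


if __name__ == "__main__":
    x = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]   # block 1 sits in the level of block 0
    y = [1, 0, 0]                            # block 0 initializes a level
    s = [[0], [0], [1]]                      # block 2 runs on the only CPU
    print(check_partition(x, y, s, u=[0.3, 0.2, 0.6]))
```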

• Options:

• Minimize height with the utilization constraint

• Minimize utilization with height constraint
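For reference, the first option (minimize height subject to the utilization constraint) can be written as a single integer linear program. This is a sketch: the width constraint is given in the form commonly used in level-packing formulations, which is an assumption here, since the slides only state that it is retained.

```latex
\begin{align}
\min\quad & \sum_{i=1}^{n} h_i y_i \\
\text{s.t.}\quad
& \sum_{i=1}^{j-1} x_{ij} + y_j + \sum_{l=1}^{k} s_{jl} = 1
  && j = 1, \dots, n \\
& \sum_{j=i+1}^{n} w_j x_{ij} \le (W - w_i)\, y_i
  && i = 1, \dots, n-1 \quad \text{(assumed form of the width constraint)} \\
& \sum_{i=1}^{n} u_i s_{il} \le U_{lub}
  && l = 1, \dots, k \\
& x_{ij},\; y_i,\; s_{il} \in \{0, 1\}
\end{align}
```

The second option swaps the roles: the height expression moves into a constraint bounded by H and the utilization becomes the quantity to minimize.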

Problem

• The available area is not rectangular!

• The area necessary for implementing the k CPUs must be considered

• Solution:

• start with the 1-CPU case: there are two possible partitionings

[Figure: FPGA of height H and width W with one softcore of size s_w × s_h; the remaining area is split into rectangles with sides H - s_h and W - s_w]

• Duplicate all packing variables (the complexity of the problem increases correspondingly)
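The two partitionings of the 1-CPU case can be enumerated explicitly. The sketch below assumes the softcore sits in a corner of the FPGA, so that the free area is L-shaped and can be cut into two rectangles either horizontally or vertically; the function name is hypothetical. Each partitioning yields two rectangular bins, which is why the packing variables have to be duplicated.

```python
from typing import List, Tuple

Rect = Tuple[int, int]  # (width, height) in Logic Elements


def one_cpu_partitionings(W: int, H: int, sw: int, sh: int) -> List[List[Rect]]:
    """Return the two ways to split the free area into two rectangles.

    Assumes a softcore of size sw x sh placed in a corner of a W x H FPGA.
    """
    if sw > W or sh > H:
        raise ValueError("the softcore does not fit in the FPGA")
    # Horizontal cut: a full-width strip beyond the core, plus the strip beside it.
    horizontal_cut = [(W, H - sh), (W - sw, sh)]
    # Vertical cut: a full-height strip beside the core, plus the strip beyond it.
    vertical_cut = [(W - sw, H), (sw, H - sh)]
    return [horizontal_cut, vertical_cut]


if __name__ == "__main__":
    for rects in one_cpu_partitionings(W=10, H=8, sw=3, sh=2):
        free_area = sum(w * h for w, h in rects)
        print(rects, free_area)
```

Both splits cover the same free area, W·H minus s_w·s_h, but they offer different bin widths and heights to the level-packing model.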

Problem

• For the k-CPU case additional assumptions are required (CPUs are packed by rows, columns, or …)

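[Figure: FPGA of height H and width W with k softcores of size s_w × s_h packed by rows or by columns, leaving free sides H - k s_h and W - k s_w; a second panel shows the 2-CPU case with H - 2 s_h and W - 2 s_w]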

Experimenting with GLPK

• Demo …

