HW/SW partitioning in the implementation of real-time systems on FPGAs with softcores
Outline
• Intro & Motivation
• Model
• Algorithms
• Experiments
Intro and Motivation
• Past work on design optimization for single-processor scheduling
– Realizing that the schedulability condition can be viewed as a feasibility region in the domain of the design variables
– Realizing that such a region is convex for EDF under reasonable assumptions
• Availability of softcores for FPGAs
– NIOS II for Altera
• Co-design problem
– a functionality can be implemented in HW (inside the FPGA) or in SW (inside or outside the FPGA) and executed by one or more (how many?) softcores
Motivation
• Start from some system model (Simulink)
• Explore different HW design options (0-1-2-4-… NIOS)
• For each design option find the optimal design configuration by means of convex linear optimization
• HW implementation is subject to area constraints
• SW implementation is subject to schedulability constraints
HW (area) Constraints
• Models available:
– Single-dimension
• Condition: linear bound $\sum_i a_i \le A$
[Figure: blocks 1 to 4 placed in a single-dimension area, in "slotted" and "linear" variants]
HW (area) Constraints
• Models available:
– 2-dimensions (cutting stock problem)
[Figure: seven blocks, numbered 1 to 7, packed in a two-dimensional area]
• Complex, more realistic and extremely well-studied problem (real-world implications)
• Linear-bound solutions can be found in the operations research literature!
Reality of FPGAs (additional resource constraints)
Schedulability constraints
• EDF (or L&L sufficient) bound: $U = \sum_i \frac{e_i}{p_i} \le U_{lub}$
• How realistic is it?
• Implementations of FP and EDF on NIOS exist
• How about deadlines=periods, independence and so on?
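The two bounds named on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' tooling: the function names and the example task set are hypothetical, and it assumes independent periodic tasks with deadlines equal to periods, for which the EDF bound is 1 and the Liu and Layland (L&L) sufficient bound for n fixed-priority tasks is n(2^(1/n) - 1).

```python
def utilization(tasks):
    """Total utilization of a list of (wcet, period) pairs."""
    return sum(wcet / period for wcet, period in tasks)


def liu_layland_bound(n):
    """L&L sufficient utilization bound for n rate-monotonic tasks."""
    return n * (2 ** (1.0 / n) - 1)


def schedulable(tasks, policy="EDF"):
    """Utilization-based test: exact for EDF, only sufficient for 'RM'."""
    if not tasks:
        return True  # an empty task set is trivially schedulable
    u_lub = 1.0 if policy == "EDF" else liu_layland_bound(len(tasks))
    return utilization(tasks) <= u_lub


# Hypothetical task set: (worst-case execution time, period)
tasks = [(1, 4), (2, 8), (3, 10)]
edf_ok = schedulable(tasks, "EDF")
rm_ok = schedulable(tasks, "RM")
```

With this task set the utilization is 0.8, so the EDF test passes while the L&L test (bound of about 0.78 for three tasks) is inconclusive and returns False.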
The Model
• Starting point: Simulink model
The Model
• implementation of a Simulink model
• HW implementation: commercial tools exist (Celoxica) for implementing Simulink blocks in FPGA.
The Model
• SW implementation: commercial tools exist, Real-Time Workshop + Embedded Coder (MathWorks) or TargetLink (dSPACE), for implementing Simulink blocks as a set of concurrent threads.
• Threads inherit the sampling period of the blocks (periodic model)
• No overrun is permitted (deadlines=periods)
• Communication is by switched buffers (asynchronous, tasks are independent)
• Of course code generation and switched buffers are not commercially available for EDF, but there is nothing that prevents their implementation
The Model
• FPGA = rectangular area of Logic Elements (LEs). All dimensions will be in terms of LEs
• FPGA height = H
• FPGA width = W
• Assume homogeneous bidimensional model of FPGA (array of LEs)
• k softcores $CPU_l$, $l = 1..k$, are implemented in the FPGA: each core requires an area $s_w \times s_h$ (k = 0, 1, 2, ..)
[Figure: FPGA of height H and width W containing a softcore of height $s_h$ and width $s_w$]
The Model
• System model = network of blocks
• $V = \{F_1, F_2, \ldots, F_n\}$ is the set of functional blocks
• A block $F_i$ can be implemented in HW or SW, according to the value of $s_{il} \in \{0,1\}$: $s_{il} = 1$ if block $F_i$ is executed in SW on $CPU_l$. If not executed in SW, a block MUST be implemented in HW.
• If implemented in HW, a block requires an area $w_i \times h_i$
• If implemented in SW, a block $F_i$ has a worst-case computation time $c_i$ and a period of execution $t_i$ (a HW implementation has $c_i \approx 0$); its utilization is $u_i = c_i / t_i$
The Model
• If implemented in SW, a block is executed in the context of a thread with the same period.
• $m_{i,j} = 1$ if $F_i$ is mapped for execution in thread j and 0 otherwise (these are not optimization variables but constants!)
• Schedulability constraint (for each NIOS)
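The per-CPU schedulability check on a given HW/SW assignment can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the assignment is a 0/1 matrix `s` with one row per block and one column per softcore, `u` holds the block utilizations, and the default bound of 1.0 corresponds to EDF. The function names and the example numbers are hypothetical.

```python
def hw_blocks(s):
    """Indices of blocks not assigned to any CPU, i.e. implemented in HW."""
    return [i for i, row in enumerate(s) if sum(row) == 0]


def sw_feasible(u, s, u_lub=1.0):
    """Check the assignment matrix s (s[i][l] in {0, 1}).

    Each block may run on at most one CPU, and the utilization summed
    over the blocks assigned to each CPU must not exceed u_lub.
    """
    if any(sum(row) > 1 for row in s):
        return False  # a block cannot be mapped to two CPUs
    k = len(s[0]) if s else 0
    for l in range(k):
        load = sum(u[i] * s[i][l] for i in range(len(u)))
        if load > u_lub:
            return False
    return True


# Hypothetical example: three blocks, two softcores
u = [0.5, 0.4, 0.3]
split = [[1, 0], [0, 1], [0, 0]]      # F1 on CPU1, F2 on CPU2, F3 in HW
overloaded = [[1, 0], [1, 0], [1, 0]]  # all three blocks on CPU1
split_ok = sw_feasible(u, split)
overloaded_ok = sw_feasible(u, overloaded)
```

In the first assignment both cores stay below the bound and block 3 falls back to HW; in the second, CPU1 would carry a utilization of 1.2, so the check fails.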
Results to be exploited
• Cutting Stock approximate (linear) solution: Level packing (Lodi)
• Pack the items in rows forming levels
– the first level is the bottom of the bin, the second level is built on top of the first and so on …
• In each level, the leftmost item is the tallest one
• The bottom level is the tallest one
• Items are sorted and renumbered by non-increasing $h_i$ values.
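The level structure described above can be illustrated with a short Python sketch. Note that this swaps in a greedy first-fit decreasing-height heuristic, not the integer linear model used in the following slides; it only shows what a level packing looks like. The function name and the example items are hypothetical.

```python
def level_pack(items, W):
    """Greedy level packing of (width, height) items into a strip of width W.

    Items are visited by non-increasing height. Each item goes into the
    first level with enough free width; otherwise it initializes a new
    level on top of the previous ones. Returns (levels, total_height).
    """
    order = sorted(range(len(items)), key=lambda i: -items[i][1])
    levels = []  # each level: {"height", "free", "items"}
    for i in order:
        w, h = items[i]
        if w > W:
            raise ValueError(f"item {i} is wider than the strip")
        for lev in levels:
            if lev["free"] >= w:
                lev["items"].append(i)
                lev["free"] -= w
                break
        else:
            # The initializing item is the tallest (leftmost) of its level
            levels.append({"height": h, "free": W - w, "items": [i]})
    return levels, sum(lev["height"] for lev in levels)


# Hypothetical blocks as (w_i, h_i), packed into a strip of width 8
items = [(4, 3), (3, 5), (5, 2), (2, 4), (3, 1)]
levels, total_height = level_pack(items, W=8)
```

Because levels are opened in order of non-increasing height, the bottom level is the tallest one and the first item of each level is its tallest, matching the properties listed on the slide.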
Results to be exploited
• An example:
• there are n potential levels (one for each initializing block)
Results to be exploited
• Variables:
• yi = 1 if item i initializes level i and 0 otherwise
• Objective (original):
– minimize the height of the required rectangle
Results to be exploited
• Constraints (original):
– $x_{ij}$, $i \in \{1..n-1\}$, $j > i$: $x_{ij} = 1$ if item j is packed in level i, 0 otherwise
• Each item is packed exactly once
• Width constraint
Reusing Results
• These results can be reused as follows:
• The original objective can be retained or it can become a constraint: $\sum_{i=1}^{n} h_i y_i \le H$
Results to be exploited
• The existence of a packing (each item is packed exactly once)
• Becomes …
• Each item is packed exactly once or it is executed on a CPU: $\sum_{i=1}^{j-1} x_{ij} + y_j + \sum_{l=1}^{k} s_{jl} = 1$
Results to be exploited
• The width constraint is retained …
• A schedulability constraint must be added for each CPU: $\sum_{i=1}^{n} u_i s_{il} \le U_{lub} \quad (l = 1, \ldots, k)$
• Options:
– Minimize height with the utilization constraint
– Minimize utilization with height constraint
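For reference, the first option (minimize height with the utilization constraint) can be written out as one model. This is a sketch under assumptions: $y_i$, $x_{ij}$, $s_{il}$, $u_i$, $w_i$, $h_i$, $W$ and $U_{lub}$ are used as defined on the slides, while the width constraint is given in the form common in the level-packing literature, since its formula does not appear in the text. The area taken by the CPUs themselves is not accounted for here.

```latex
\begin{align*}
\min\ & \sum_{i=1}^{n} h_i y_i \\
\text{s.t.}\ & \sum_{i=1}^{j-1} x_{ij} + y_j + \sum_{l=1}^{k} s_{jl} = 1
  && j = 1, \ldots, n \\
& \sum_{j=i+1}^{n} w_j x_{ij} \le (W - w_i)\, y_i
  && i = 1, \ldots, n-1 \\
& \sum_{i=1}^{n} u_i s_{il} \le U_{lub}
  && l = 1, \ldots, k \\
& x_{ij},\ y_i,\ s_{il} \in \{0, 1\}
\end{align*}
```

For the second option, the height expression would move into the constraints as $\sum_{i=1}^{n} h_i y_i \le H$ and a utilization measure would take its place in the objective.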
Problem
• The available area is not rectangular!
• The area necessary for implementing the k CPUs must be considered
• Solution:
– start with the 1-CPU case: there are two possible partitionings
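[Figure: the two partitionings of the H × W FPGA around one core of size $s_w \times s_h$, with remaining dimensions $H - s_h$ and $W - s_w$]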
• Duplicate all packing variables (the complexity of the problem is correspondingly increased)
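One way to read the two partitionings is sketched below in Python. It assumes the core sits in a corner of the FPGA, so that the free area is L-shaped and can be cut into two rectangles either horizontally or vertically; each rectangle would then get its own copy of the packing variables. The function name and the example dimensions are hypothetical.

```python
def one_cpu_partitionings(H, W, sh, sw):
    """Split the area left free by one corner-placed core into rectangles.

    Returns two alternatives, each a list of two (width, height) pairs:
    a horizontal cut at height sh and a vertical cut at width sw.
    """
    if sh > H or sw > W:
        raise ValueError("the core does not fit in the FPGA")
    horizontal_cut = [(W, H - sh), (W - sw, sh)]
    vertical_cut = [(W - sw, H), (sw, H - sh)]
    return horizontal_cut, vertical_cut


def area(rectangles):
    """Total area of a list of (width, height) rectangles."""
    return sum(w * h for w, h in rectangles)


# Hypothetical sizes in LEs: FPGA 8 wide and 10 high, core 2 wide and 3 high
horizontal_cut, vertical_cut = one_cpu_partitionings(H=10, W=8, sh=3, sw=2)
free_area = 10 * 8 - 3 * 2
```

Both alternatives cover the same free area (W·H minus the core area); they differ only in the shapes offered to the packing, which is why both need to be explored.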
Problem
• For the k-CPU case additional assumptions are required (CPUs are packed by rows, columns, or …)
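[Figure: k-CPU layouts on the H × W FPGA. Cores of size $s_w \times s_h$ packed in a row or a column leave $H - k s_h$ or $W - k s_w$; a further layout shows $H - 2 s_h$ and $W - 2 s_w$]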
Experimenting with GLPK
• Demo …