HEPSYCODE
A System-Level Methodology for HW/SW Co-Design of
Heterogeneous Parallel Dedicated Systems
Center of Excellence DEWS, Università degli Studi dell’Aquila
ITALY
1st Italian Workshop on Embedded Systems (IWES 2016)
Overview
• The Proposed Methodology
  – System Behaviour Specification
  – Functional Simulation
  – Co-Analysis & Co-Estimation
    • Technologies Library
    • Co-Analysis
    • Co-Estimation
  – Design Space Exploration
    • HW/SW Partitioning, Mapping and Architecture Definition
    • Timing Co-Simulation
    • Iterations
    • Example
• Main References
The Proposed Methodology
• The proposed methodology starts from a model of the system behaviour, based on a Concurrent Processes MoC, and leads to a heterogeneous parallel dedicated system able to satisfy given F/NF requirements
  – In particular, the goal is to suggest to the designer
    • How to partition processes between HW and SW
    • Which kind of heterogeneous parallel architecture to use
      – How many processors, which kind, how to connect them
    • How to map processes to processors
      – GPP, ASP, SPP
  – Current NF requirements are mainly architectural and timing constraints, but the methodology can be extended
    • e.g. power/energy, reliability, monitorability, mixed-criticality
• Reference Co-Design Flow
[Figure: reference co-design flow. Blocks: Algorithm-Level Flow, System-Level Flow. Inputs: System Behaviour Model, Reference Inputs, Timing Constraints, Architectural Constraints, Scheduling Directives, Technologies Library (processors, memories, interconnections). Output: Heterogeneous Parallel Dedicated System.]
• Reference Co-Design Flow
  – System-Level
[Figure: System-Level Flow. Steps: Functional Simulation; Co-Analysis and Co-Estimation (Affinity, Timing, Size, Concurrency, Load, Bandwidth); Design Space Exploration (HW/SW Partitioning, Mapping and Architecture Definition; Timing Co-Simulation). Inputs: System Behaviour Model, Reference Inputs, Timing Constraints, Architectural Constraints, Scheduling Directives, Technologies Library (processors, memories, interconnections). Output: Heterogeneous Parallel Dedicated System. Also shown: Algorithm-Level Flow.]
• Inputs
  – Functional Requirements
    • System Behaviour Model
      – An executable/simulatable model of the system behaviour based on a Concurrent Processes MoC
    • Reference Inputs
      – Relevant input data sets
• Inputs
  – Non-Functional Requirements/Constraints
    • Timing Constraints
      – Time-To-Completion constraint
        » Currently the only timing constraint (WIP: real-time constraints)
    • Architectural Constraints
      – Target Form Factor (TFF)
        » ASIC, FPGA, SOB (PCB), SO(P)C
      – Target Template Architecture (TTA)
        » min/max # of processor and interconnection link instances
        » Total available area (or an equivalent metric)
    • Scheduling Directives
      – Available scheduling policies
        » RR, Priority, HPV, etc.
• Inputs
  – Technologies Library
    • Characterization of the available processors, memories and links
      – Actual attributes depend on the TFF
        » ASIC, FPGA, SOB, SO(P)C
• Outputs
  – Definition of a heterogeneous parallel dedicated system
    • HW/SW partitioning of processes
    • HW/SW Architecture
      – How many processors, which kind, how to connect them, which scheduling policies on SW processors
    • How to map processes to processors
System Behaviour Specification
• SBS = {SBM, RI, T}
  – SBM = {PS, CH}: Concurrent Processes MoC → CSP-based
    • A MoC that also explicitly defines a model of communication
      – Unidirectional point-to-point blocking channels for data exchange
    • Such a MoC is well suited to describing system-level behaviour, since it unifies HW and SW and enables the “processes to processors” mapping
      – Languages suitable for describing CSP
        » SystemC, OCCAM, Handel-C, ADA
      – More abstract languages
        » UML, SysML, Simulink
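The unidirectional point-to-point blocking channels described above can be illustrated with a minimal sketch. It is written in Python with threads rather than in SystemC, and the names `Channel`, `producer`, `doubler` and `run_pipeline`, as well as the `None` end-of-stream marker, are assumptions made for illustration only; they are not part of the methodology or of its tooling.

```python
import queue
import threading


class Channel:
    """Unidirectional point-to-point blocking channel (one writer, one reader)."""

    def __init__(self):
        self._slot = queue.Queue(maxsize=1)

    def write(self, value):
        # Blocks until the single reader has actually taken the value.
        self._slot.put(value)
        self._slot.join()

    def read(self):
        # Blocks until the writer offers a value, then releases the writer.
        value = self._slot.get()
        self._slot.task_done()
        return value


def producer(out_ch, values):
    for v in values:
        out_ch.write(v)
    out_ch.write(None)  # hypothetical end-of-stream marker


def doubler(in_ch, out_ch):
    while (v := in_ch.read()) is not None:
        out_ch.write(2 * v)
    out_ch.write(None)


def run_pipeline(values):
    """Two concurrent processes connected by channels ch1 and ch2."""
    ch1, ch2 = Channel(), Channel()
    workers = [
        threading.Thread(target=producer, args=(ch1, values)),
        threading.Thread(target=doubler, args=(ch1, ch2)),
    ]
    for w in workers:
        w.start()
    results = []
    while (v := ch2.read()) is not None:
        results.append(v)
    for w in workers:
        w.join()
    return results
```

Because processes interact only through channels, the same description says nothing about whether a process ends up in HW or SW, which is what makes the later “processes to processors” mapping possible.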
• SBS = {SBM, RI, T}
  – RI: a set of inputs (possibly timed), as representative as possible of the typical operating conditions of the system, together with the related expected outputs
    • To be used for analysis and validation
  – T: time-to-completion timing constraint
    • To be satisfied by each RI
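• Reference Example
  – RI
    • i1 (timed inputs)
      – 10 ms: 10 20
      – 20 ms: 5 15
      – 30 ms: 50 22
      – 40 ms: 5 77
      – …
      – 100 ms: 10 10
    • o1 (expected outputs): 5124422…10
  – T: 1000 ms
    • From the first input to the last output: max 1000 ms
[Figure: Stimulus, System and Display blocks, with input stream i1 and output stream o1]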
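The time-to-completion constraint of the reference example can be checked with a few lines of code. The sketch below is only illustrative: the function names are assumptions, the input timestamps are the ones listed in the example (the elided ones are left out), and the output timestamps are hypothetical values standing in for what a timing co-simulation might report.

```python
def time_to_completion(input_times_ms, output_times_ms):
    """Elapsed time from the first input to the last output."""
    return max(output_times_ms) - min(input_times_ms)


def satisfies_t(input_times_ms, output_times_ms, t_ms):
    """True if the time-to-completion constraint T is met."""
    return time_to_completion(input_times_ms, output_times_ms) <= t_ms


# Input timestamps from the reference example (elided entries omitted).
input_times_ms = [10, 20, 30, 40, 100]
# Hypothetical output timestamps, for illustration only.
output_times_ms = [45, 70, 95, 130, 640]

print(time_to_completion(input_times_ms, output_times_ms))  # 630
print(satisfies_t(input_times_ms, output_times_ms, 1000))   # True
```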
• Reference Example
  – PS and CH
[Figure: process network between Stimulus and Display, with processes p1 to p8 and channels ch1 to ch15]
Functional Simulation
• This step makes it possible to “validate” the SBM by means of a functional simulation
  – Such a simulation takes timed inputs into account (i.e. there is a concept of simulated time), but it does not consider the time needed to execute the statements composing the processes, nor the time needed for communications
    • Currently based on the standard SystemC kernel
  – If the SBM is not correct (i.e. wrong outputs or critical conditions such as deadlocks), it should be properly modified and simulated again
Co-Analysis & Co-Estimation
• This step is composed of two independent activities
  – Co-Analysis
    • Static: Affinity
    • Dynamic: Concurrency
  – Co-Estimation
    • Static: Timing, Size
    • Dynamic: Load, Bandwidth
• Both are based on a given Technologies Library
[Figure: Co-Analysis and Co-Estimation blocks with inputs System Behaviour Model, Reference Inputs, Timing Constraints, Architectural Constraints and Technologies Library (processors, memories, interconnections)]
Co-Analysis & Co-Estimation: Technologies Library
• The TL contains the characterization of the available processors, interconnection links and memories
  – It is used to perform analysis and estimations and, later, to build the final architecture during the DSE step
    • However, different TLs are needed depending on the TFF
      – The main differences concern the attributes (different attributes, or a different meaning of the same attribute)
  – In general
    • TL = {PC, IL, M}
      – PC: {pc1, …, pcn}
        » Set of processors
      – IL: {il1, …, iln}
        » Set of interconnection links
      – M: {m1, …, mn}
        » Set of memories
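The structure TL = {PC, IL, M} can be pictured as a small data model. The following is a sketch under stated assumptions: the class names, the free-form `attributes` dictionary and the `"kind"` key are hypothetical and chosen only because the slides say that the actual attributes depend on the TFF; the real library format is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One TL entry; attribute names and values are hypothetical."""
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class TechnologiesLibrary:
    tff: str                                  # Target Form Factor this TL refers to
    pc: list = field(default_factory=list)    # PC: set of processors
    il: list = field(default_factory=list)    # IL: set of interconnection links
    m: list = field(default_factory=list)     # M: set of memories

    def processors_of_kind(self, kind):
        """Processors whose (assumed) 'kind' attribute matches, e.g. GPP, ASP, SPP."""
        return [p for p in self.pc if p.attributes.get("kind") == kind]


def example_tl():
    """A tiny illustrative library; all entries are placeholders."""
    return TechnologiesLibrary(
        tff="FPGA",
        pc=[Element("pc1", {"kind": "GPP"}), Element("pc2", {"kind": "SPP"})],
        il=[Element("il1")],
        m=[Element("m1")],
    )
```

Keeping the attributes in a dictionary is one way to let the same structure serve different TFFs, where the set of attributes, or their meaning, changes.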
• Co-Analysis
  – This activity evaluates two kinds of metrics
    • Static analysis
      – Affinity
    • Dynamic analysis
      – Concurrency
[Figure: Co-Analysis (static: Affinity; dynamic: Concurrency) with inputs Technologies Library (processors, memories, interconnections), System Behaviour Model, Reference Inputs and Architectural Constraints]
• Co-Estimation
  – This activity performs two kinds of dependent estimations
    • Static estimations
      – Timing
      – Size
    • Dynamic estimations
      – Load
      – Bandwidth
[Figure: Co-Estimation (static: Timing, Size; dynamic: Load, Bandwidth) with inputs Technologies Library (processors, memories, interconnections), System Behaviour Model, Timing Constraints, Reference Inputs and Architectural Constraints]
Design Space Exploration
• This step is composed of two iterative activities
  – HW/SW Partitioning, Mapping and Architecture Definition
  – Timing Co-Simulation
  – The final goal is the automatic identification of
    • an HW/SW partitioning of the processes in PS
    • a heterogeneous parallel architecture composed of several connected processors with local memory (i.e. blocks), built starting from the TL and able to satisfy the architectural constraints
    • a mapping of the partitioned processes to the blocks that is able to satisfy the timing constraint
Design Space Exploration: HW/SW Partitioning, Mapping and Architecture Definition
• HW/SW Partitioning, Mapping and Architecture Definition
  – Main inputs
    • Annotated SBM
      – SBM + process-level metrics/estimations
    • Technologies Library
    • Architectural Constraints
      – To limit cost, to ensure feasibility, or to model an existing platform
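To make the role of these inputs concrete, here is a toy exhaustive search over mappings. It is explicitly not the heuristic used by the methodology: it assumes that processes mapped to the same block run one after the other, ignores communication and concurrency, and uses made-up per-process time estimates, block costs and names (`explore`, `est_time_ms`, `blocks`). It only shows how annotated estimates, candidate blocks and an architectural constraint on the number of blocks might interact with T.

```python
from itertools import product


def explore(est_time_ms, blocks, t_ms, max_blocks):
    """Return (cost, completion_ms, mapping) of the cheapest feasible mapping.

    est_time_ms[p][kind]: estimated time of process p on a processor kind.
    blocks: candidate block instances as (name, kind, cost) tuples.
    """
    processes = sorted(est_time_ms)
    best = None
    for assignment in product(range(len(blocks)), repeat=len(processes)):
        used = set(assignment)
        if len(used) > max_blocks:      # architectural constraint
            continue
        busy = {}
        for p, b in zip(processes, assignment):
            busy[b] = busy.get(b, 0) + est_time_ms[p][blocks[b][1]]
        completion = max(busy.values())
        if completion > t_ms:           # timing constraint T
            continue
        cost = sum(blocks[b][2] for b in used)
        mapping = {p: blocks[b][0] for p, b in zip(processes, assignment)}
        if best is None or (cost, completion) < best[:2]:
            best = (cost, completion, mapping)
    return best


# Hypothetical annotated estimates (ms) and candidate blocks.
est = {
    "p1": {"GPP": 400, "SPP": 100},
    "p2": {"GPP": 500, "SPP": 150},
    "p3": {"GPP": 300, "SPP": 80},
}
candidates = [("bb1.1", "GPP", 1), ("bb1.2", "GPP", 1), ("bb2.1", "SPP", 5)]
print(explore(est, candidates, t_ms=1000, max_blocks=2))
```

With these numbers a single GPP is too slow (1200 ms), so the search settles on two GPP blocks with p1 and p3 sharing one of them, for an estimated completion of 700 ms at cost 2. Exhaustive enumeration grows exponentially with the number of processes, which is why a real exploration relies on heuristics.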
• HW/SW Partitioning, Mapping and Architecture Definition
  – 2-Phase Approach
    • 1st phase: PAM1 (Annotated SBM + TA)
      – Partial Architecture: number and type of BBs
      – HW/SW Partitioning
      – Mapping
    • 2nd phase: PAM2 (BB Interaction Graph)
      – Final Architecture: number and type of BBs, number and type of interconnection links, topology
      – HW/SW Partitioning
      – Mapping
• HW/SW Partitioning, Mapping and Architecture Definition
  – Main Outputs
    • Heterogeneous Parallel Dedicated System (HPDS)
      – A set of blocks connected by means of a set of links
        » Architecture Graph
    • Mapping between SBM and blocks/links
[Figure: example Architecture Graph with blocks bb1.1, bb1.2, bb2.1 and bb3.1 connected by links il1.1 and il2.1, and processes p1 to p8 mapped onto the blocks]
Design Space Exploration: Timing Co-Simulation
• Timing Co-Simulation
  – The timing co-simulation activity considers the suggested HPDS (i.e. architecture and mapping) and all the relevant information previously collected to check whether T is going to be satisfied
    • Scheduling Directives
      – Additionally, the designer can select a scheduling policy to be used
        » e.g. round-robin, priority-based (if any), etc.
  – Currently based on the standard SystemC kernel integrated with specific extensions
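The effect of a scheduling directive on timing can be illustrated with a very small round-robin model of several SW processes sharing one processor. This is a sketch, not the SystemC-based co-simulator: the function name, the time quantum and the execution times are assumptions, and context-switch overhead, communication and blocking on channels are ignored.

```python
from collections import deque


def round_robin_completion(exec_time_ms, quantum_ms):
    """Completion time (ms) of each process sharing one SW processor."""
    ready = deque(exec_time_ms.items())
    clock = 0
    done = {}
    while ready:
        name, remaining = ready.popleft()
        run = min(quantum_ms, remaining)
        clock += run
        if remaining > run:
            ready.append((name, remaining - run))  # back of the ready queue
        else:
            done[name] = clock
    return done


# Hypothetical estimated execution times for three processes on one processor.
times = round_robin_completion({"p1": 30, "p2": 10, "p3": 20}, quantum_ms=10)
print(times)                 # {'p2': 20, 'p3': 50, 'p1': 60}
print(max(times.values()))   # 60
```

The last completion time is the quantity to compare against T; a different policy on the same mapping would change the individual completion times and, once blocking communication is modelled, possibly the overall one as well.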
Design Space Exploration: Iterations
• Iterations
  – If the proposed mapping/architecture does not satisfy T, the designer has to perform the design space exploration again
    • by changing scheduling directives
    • by changing some parameters in the DSE heuristics
    • by changing architectural constraints
  – If still no solution is found, the designer has to make other changes in the previous steps
    • by modifying the SBM
      – in order to apply semantically equivalent transformations that better expose relevant features (e.g. concurrency or affinity) that the methodology could exploit
    • by modifying elements in the TL or by relaxing T
      – This means that T is not feasible with the selected technologies!
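The iteration scheme above amounts to retrying the exploration with progressively changed settings until T is met. A minimal sketch of that loop follows; `iterate_dse`, `toy_dse` and `toy_co_simulation` are hypothetical stand-ins for the real DSE and timing co-simulation steps, and the settings dictionaries are invented for illustration.

```python
def iterate_dse(run_dse, co_simulate, t_ms, attempts):
    """Try each (label, settings) in order; return the first one that meets T."""
    for label, settings in attempts:
        hpds = run_dse(settings)
        if hpds is not None and co_simulate(hpds, settings) <= t_ms:
            return label, hpds
    return None  # no solution: revisit the SBM, the TL, or T itself


# Hypothetical stand-ins for the real steps.
def toy_dse(settings):
    return {"blocks": settings["max_blocks"]}


def toy_co_simulation(hpds, settings):
    return 1200 / hpds["blocks"]  # pretend the work splits evenly


attempts = [
    ("initial settings", {"policy": "RR", "max_blocks": 1}),
    ("changed scheduling directive", {"policy": "Priority", "max_blocks": 1}),
    ("changed architectural constraints", {"policy": "RR", "max_blocks": 2}),
]
result = iterate_dse(toy_dse, toy_co_simulation, 1000, attempts)
print(result)  # ('changed architectural constraints', {'blocks': 2})
```

A `None` result corresponds to the second case in the list: the designer has to go back to the earlier steps.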
Design Space Exploration: Example
• Example
  – TL
    • Intel MPU8051
    • Microchip DSPIC (Pic24)
    • Xilinx Spartan3AN
[Figure: per-process chart comparing requested time and estimated time]
Main References
• L. Pomante, D. Sciuto, F. Salice, W. Fornaciari, C. Brandolese, “Affinity-Driven System Design Exploration for Heterogeneous Multiprocessor SoC”, IEEE Transactions on Computers, vol. 55, no. 5, May 2006.
• L. Pomante, “System-Level Design Space Exploration for Dedicated Heterogeneous Multi-Processor Systems”, IEEE International Conference on Application-specific Systems, Architectures and Processors, 2011.
• L. Pomante, “HW/SW Co-Design of Dedicated Heterogeneous Parallel Systems: an Extended Design Space Exploration Approach”, IET Computers & Digital Techniques, Institution of Engineering and Technology, vol. 7, no. 6, pp. 246–254, 2013.
• L. Pomante, “Electronic System-Level HW/SW Co-Design of Heterogeneous Multi-Processor Embedded Systems”, The River Publishers Series in Circuits and Systems, 2016.