Software Estimation

Alberto Sangiovanni-Vincentelli

Thanks to Prof. Sharad Malik at Princeton University and Prof. Reinhard Wilhelm at Universität des Saarlandes for some of the slides

Outline

SW estimation overview
Program path analysis
Micro-architecture modeling
Implementation examples: Cinderella
SW estimation in VCC
SW estimation in AI
SW estimation in POLIS


SW estimation problems in HW/SW co-design

The structure and behavior of synthesized programs are known in the co-design system
Quick (and as accurate as possible) estimation methods are needed
  Quick methods for HW/SW partitioning [Hu94, Gupta94]
  Accurate method using a timing-accurate co-simulation [Henkel93]

SW estimation in HW-SW co-design

[Figure: Function and Architecture are combined by Mapping into HW and SW. Labels: "No concept of SW"; "Fast with moderate accuracy and low cost"; "Accurate at any cost"; "Capacity"]

SW estimation overview: motivation

SW estimation helps to
  Evaluate HW/SW trade-offs
  Check performance/constraints
    Higher reliability
  Reduce system cost
    Allow slower hardware, smaller size, lower power consumption

SW estimation overview: tasks

Architectural evaluation
  processor selection
  bus capacity
Partitioning evaluation
  HW/SW partition
  co-processor needs
System metric evaluation
  performance met?
  power met?
  size met?

SW estimation overview: Static vs. Dynamic

Static estimation
  Determination of runtime properties at compile time
  Most of the (interesting) properties are undecidable => use approximations
  An approximate program analysis is safe if its results can always be depended on. Results are allowed to be imprecise as long as they err on the safe side
  Quality of the results (precision) should be as good as possible

SW estimation overview: Static vs. Dynamic

Dynamic estimation
  Determination of properties at runtime
  DSP processors
    relatively data independent
    most time spent in hand-coded kernels
    static data-flow consumes most cycles
    small number of threads, simple interrupts
  Regular processors
    arbitrary C, highly data dependent
    commercial RTOS, many threads
    complex interrupts, priorities

SW estimation overview: approaches

Two aspects to be considered
  The structure of the code (program path analysis)
    E.g. loops and false paths
  The system on which the software will run (micro-architecture modeling)
    CPU (ISA, interrupts, etc.), HW (cache, etc.), OS, Compiler
Needs to be done at high/system level
  Low-level
    e.g. gate-level, assembly-language level
    Easy and accurate, but long design iteration time
  High/system-level
    Reduces the exploration time of the design space

Conventional system design flow

[Figure: flow diagram with the stages requirements, system partition, HW design, SW design, system debug, and performance analysis; feedback steps are re-partitioning and performance tuning. Design criteria: performance, cost, modifiability, testability, reliability. Annotations: "Long iteration loop!"; "Low-level performance estimation"]

System-level software model

Must be fast - whole system simulation
Processor model must be cheap
  "What if" my processor did X
  future processors not yet developed
  evaluation of processor not currently used
Must be convenient to use
  no need to compile with cross-compilers
  debug on my desktop
Must be accurate enough for the purpose

Accuracy vs Performance vs Cost

[Table: Hardware emulation, cycle-accurate model, cycle-counting ISS, static spreadsheet, and dynamic estimation, each rated with +/- marks on Accuracy, Speed, and $$$*]

* $$$ = NRE + per model + per design



Basic blocks
  A basic block is a program segment which is only entered at the first statement and only left at the last statement.
  Example: function calls
  The WCET (or BCET) of a basic block is determined
A program is divided into basic blocks
  Program structure is represented on a directed program flow graph with basic blocks as nodes.
  A longest / shortest path analysis on the program flow graph identifies WCET / BCET
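The longest-path step can be sketched as follows for a loop-free flow graph. This is a minimal illustration, not the method of any particular tool: the `flow_graph` structure, the `longest_path` function, and the requirement that blocks are numbered in topological order are all assumptions made for the example.

```c
#include <stddef.h>

#define MAX_BLOCKS 16

/* Hypothetical flow-graph representation. Blocks are assumed to be numbered
 * in topological order, so every edge goes from a lower to a higher index. */
struct flow_graph {
    size_t n;                          /* number of basic blocks (n >= 1)  */
    long cost[MAX_BLOCKS];             /* per-block WCET in cycles         */
    int edge[MAX_BLOCKS][MAX_BLOCKS];  /* edge[u][v] != 0: u may jump to v */
};

/* Returns the cost of the longest path from block 0 to block n-1,
 * or -1 if block n-1 cannot be reached from block 0. */
long longest_path(const struct flow_graph *g)
{
    long best[MAX_BLOCKS];

    for (size_t v = 0; v < g->n; v++) {
        /* Block 0 is the entry; all other blocks start as unreachable. */
        best[v] = (v == 0) ? g->cost[0] : -1;
        for (size_t u = 0; u < v; u++) {
            if (g->edge[u][v] && best[u] >= 0 &&
                best[u] + g->cost[v] > best[v]) {
                best[v] = best[u] + g->cost[v];
            }
        }
    }
    return best[g->n - 1];
}
```

For an if/else diamond with block costs 2, 5, 3, 1 (entry, then-branch, else-branch, exit), the function returns 8, the cost of the path through the more expensive branch. Replacing the maximum with a minimum would give the corresponding BCET bound. Loops are not handled here; they need the bounds discussed on the following slides.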

Program path analysis
  Determine extreme case execution paths.
  Avoid exhaustive search of program paths.
  Eliminate false paths:
    Make use of path information provided by the user.

    for (i = 0; i < 100; i++) {
        if (rand() > RAND_MAX / 2) j++;
        else k++;
    }
  2^100 possible worst-case paths!

    if (ok) i = i*i + 1;
    else i = 0;

    if (i) j++;
    else j = j*j;
  Always executed together! (i = i*i + 1 with j++, and i = 0 with j = j*j)


Path profile algorithm
  Goal: determine how many times each acyclic path in a routine executes
  Method: identify sets of potential paths with states
  Algorithm:
    Number final states from 0, 1, to n-1, where n is the number of potential paths in a routine; a final state represents the single path taken through a routine
    Place instrumentation so that transitions need not occur at every conditional branch
    Assign states so that transitions can be computed by a simple arithmetic operation
    Transform a control-flow graph containing loops or huge numbers of potential paths into an acyclic graph with a limited number of paths
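The state-numbering idea can be illustrated on a routine with two consecutive if/else statements, which has four acyclic paths. The sketch below is hand-instrumented; the routine, the increments, and the names `profiled_routine` and `path_count` are made up for the example and are not taken from a specific profiler.

```c
#define NUM_PATHS 4

/* path_count[s] = number of executions that ended in final state s. */
static unsigned long path_count[NUM_PATHS];

/* A routine with two consecutive if/else statements (4 acyclic paths).
 * The state r is updated by a plain addition on selected branches only,
 * so that each path ends in a distinct final state 0..3. */
int profiled_routine(int a, int b)
{
    int r = 0;        /* path state */
    int result = 0;

    if (a > 0) {
        result += 1;
        r += 2;       /* instrumentation on the taken branch only */
    } else {
        result -= 1;  /* no instrumentation needed on this branch */
    }

    if (b > 0) {
        result *= 2;
        r += 1;
    } else {
        result = 0;
    }

    path_count[r]++;  /* one counter update per routine execution */
    return result;
}
```

Only two of the four branch arms carry instrumentation, and the final state doubles as the index into the counter array, which is what keeps the profiling overhead low.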

Transform the problem into an integer linear programming (ILP) problem.
  Basic idea:

  $$\max \sum_i c_i x_i$$

  where $x_i$ is the execution count of basic block $B_i$ (integer variable) and $c_i$ is the single execution time of basic block $B_i$ (constant), subject to a set of linear constraints that bound all feasible values of the $x_i$'s.
  Assumption for now: simple micro-architecture model (constant instruction execution time)

Program path analysis: structural constraints

Linear constraints constructed automatically from program's control flow graph.

Example: While loop

Source code:
    /* p >= 0 */
    q = p;
    while (q < 10)
        q++;
    r = q;

Control flow graph:
    B1: q = p;           count x1, entered by d1, left by d2
    B2: while (q < 10)   count x2, entered by d2 and d4, left by d3 and d5
    B3: q++;             count x3, entered by d3, left by d4
    B4: r = q;           count x4, entered by d5, left by d6

Structural constraints (at each node: exec. count of Bi = inputs = outputs):
    $x_1 = d_1 = d_2$
    $x_2 = d_2 + d_4 = d_3 + d_5$
    $x_3 = d_3 = d_4$
    $x_4 = d_5 = d_6$

Functional constraints (provide loop bounds and other path information):
    $0 x_1 \le x_3 \le 10 x_1$
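As a worked instance of the while-loop example, the ILP can be solved by hand. The sketch assumes that the routine is entered exactly once ($d_1 = 1$, which the slide does not state), that every block cost $c_i$ is positive, and that the constraints are $x_1 = d_1 = d_2$, $x_2 = d_2 + d_4 = d_3 + d_5$, $x_3 = d_3 = d_4$, $x_4 = d_5 = d_6$ and $0 \le x_3 \le 10 x_1$.

```latex
\begin{align*}
x_1 &= d_1 = 1 \\
x_3 &= 10\,x_1 = 10
    && \text{(upper loop bound, reached when } p = 0\text{)} \\
x_2 &= d_2 + d_4 = x_1 + x_3 = 11 \\
x_4 &= d_5 = x_2 - d_3 = 11 - 10 = 1 \\
\text{WCET} &= c_1 x_1 + c_2 x_2 + c_3 x_3 + c_4 x_4
             = c_1 + 11\,c_2 + 10\,c_3 + c_4
\end{align*}
```

The loop test B2 runs once more than the loop body B3 because of the final, failing comparison; the structural constraints capture this without any extra user input.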

Program path analysis: functional constraints

Provide loop bounds (mandatory).
Supply additional path information (optional).

Nested loop:
    x1  for (i = 0; i < 10; ++i)
    x2      for (j = 0; j < i; ++j)
    x3          A[i] += B[i][j];

  loop bounds:  $x_2 = 10 x_1$,  $0 x_2 \le x_3 \le 9 x_2$
  path info.:   $x_3 = 45 x_1$

If statements:
    x1  if (ok)
    x2      i = i*i + 1;
        else
    x3      i = 0;
    x4  if (i)
    x5      j = 0;
        else
    x6      j = j*j;

  True statement executed at most 50%:  $x_2 \le 0.5 x_1$
  B2 and B5 have same execution counts:  $x_2 = x_5$
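The counts for the nested loop can be checked by instrumenting it. In this sketch each `for` line is counted once per time it is entered, which is one reading of the slide's labels (under it, x2 is 10 times x1); the struct and function names are invented for the example.

```c
/* Execution counts for the three annotated lines of the nested loop. */
struct loop_counts {
    long x1;  /* outer for statement                   */
    long x2;  /* inner for statement                   */
    long x3;  /* innermost statement A[i] += B[i][j];  */
};

struct loop_counts count_nested_loop(void)
{
    struct loop_counts c = {0, 0, 0};
    int A[10] = {0};
    int B[10][10] = {{0}};

    c.x1++;                           /* outer loop entered once            */
    for (int i = 0; i < 10; ++i) {
        c.x2++;                       /* inner loop entered once per i      */
        for (int j = 0; j < i; ++j) {
            c.x3++;                   /* body runs i times for a given i    */
            A[i] += B[i][j];
        }
    }
    (void)A;  /* data values do not matter here, only the counts */
    return c;
}
```

The body count is 0 + 1 + ... + 9 = 45 per execution of the outer loop, which is tighter than what the loop bounds alone allow (at most 9 iterations for each of the 10 inner-loop entries, that is 90). This is why the extra path information is worth supplying.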



Micro-architecture modeling
  Model hardware and determine the execution time of sequences of instructions.
  Caches, CPU pipelines, etc. make WCET computation difficult since they make it history-sensitive
  Program path analysis and micro-architecture modeling are inter-related.

[Figure: "Worst case path" and "Instruction execution time" shown as mutually dependent]


Pipeline analysis
  Determine each instruction's worst case effective execution time by looking at its surrounding instructions within the same basic block.
  Assume constant pipeline execution time for each basic block.
Cache analysis
  Dominant factor.
  Global analysis is required.
  Must be done simultaneously with path analysis.


Other architecture feature analysis
  Data dependent instruction execution times
    Typical for CISC architectures
    e.g. shift-and-add instructions
  Superscalar architectures

Micro-architecture modeling: pipeline features

Pipelines are hard to predict
  Stalls depend on execution history and cache contents
  Execution times depend on execution history
Worst case assumptions
  Instruction execution cannot be overlapped
  If a hazard cannot be safely excluded, it must be assumed to happen
  For some architectures, hazard and non-hazard must be considered (interferences with instruction fetching and caches)
Branch prediction
  Predict which branch to fetch based on
    Target address (backward branches in loops)
    History of that jump (branch history table)
    Instruction encoding (static branch prediction)

Micro-architecture modeling: pipeline features

On average, branch prediction works well
  Branch history correctly predicts most branches
  Very low delays due to jump instructions
Branch prediction is hard to predict
  Depends on execution history (branch history table)
  Depends on pipeline: when does fetching occur?
  Incorporates additional instruction fetches not along the execution path of the program (mispredictions)
  Changes instruction cache quite significantly
Worst case scenarios
  Instruction fetches occur along all possible execution paths
  Prediction is wrong: re-fetch along other path
  I-cache contents are ruined

Micro-architecture modeling: pipeline analysis

Goal: calculate all possible pipeline states at a program point
Method: perform a cycle-wise evolution of the pipeline, determining all possible successor pipeline states
Implemented from a formal model of the pipeline, its stages and communication between them
Generated from a PAG specification
Results in WCET for basic blocks
Abstract state is a set of concrete pipeline states; try to obtain a superset of the collecting semantics
Sets are small as pipeline is not too history-sensitive
Joins in CFG are set union

Micro-architecture modeling: I-cache analysis

Without cache analysis
  For each instruction, determine:
    total execution count
    execution time
  Instructions within a basic block have same execution counts
    Group them together.
With i-cache analysis
  For each instruction, determine:
    cache hit execution count
    cache miss execution count
    cache hit execution time
    cache miss execution time
  Instructions within a basic block may have different cache hit/miss counts
    Need other grouping method.
  Extend previous ILP formulation
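One way the extended objective could be written, as a sketch built only from the four quantities listed above: split each execution count into a hit part and a miss part. The index $j$ and the superscripted symbols are notation chosen for this illustration; $j$ ranges over whatever groups of instructions share the same hit/miss behavior.

```latex
\begin{align*}
\text{maximize}\quad
  & \sum_j \left( c_j^{\text{hit}}\, x_j^{\text{hit}}
                + c_j^{\text{miss}}\, x_j^{\text{miss}} \right) \\
\text{subject to}\quad
  & x_j = x_j^{\text{hit}} + x_j^{\text{miss}}
    \quad \text{for every group } j
\end{align*}
```

The structural and functional constraints on the $x_j$ stay as before. Additional constraints describing the cache behavior are needed to bound the $x_j^{\text{miss}}$; without them the maximization would simply treat every access as a miss.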

Micro-architecture modeling: D-cache analysis

Difficulties:
  Data flow analysis is required.
  Load/store address may be ambiguous.
  Load/store address may change.
Simple solution:
  Extend cost function to include data cache hit/miss penalties.
  Simulate a block of code with known execution path to obtain data hits and misses.

    x1  if (something) {
    x2      for (i = 0; i < 10; ++i)
    x3          for (j = 0; j < i; ++j)
    x4              A[i] += B[i][j];
        } else {
    x5      /* ... */
        }

  Data hits/misses of this loop nest can be simulated.
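Simulating the loop nest is possible because its access sequence does not depend on input data. The sketch below replays that sequence through a small direct-mapped cache model. Everything about the cache and memory layout is an assumption made for illustration: 8 lines of 16 bytes, 4-byte elements, `A` at 0x1000, `B` at 0x2000 in row-major order, and three accesses per iteration (load `B[i][j]`, load `A[i]`, store `A[i]`).

```c
#include <stdint.h>

#define LINE_BYTES 16u  /* assumed cache line size   */
#define NUM_LINES  8u   /* assumed number of lines   */

/* Minimal direct-mapped data cache model with hit/miss counters. */
struct dcache {
    uint32_t tag[NUM_LINES];
    int valid[NUM_LINES];
    long hits;
    long misses;
};

static void dcache_access(struct dcache *c, uint32_t addr)
{
    uint32_t block = addr / LINE_BYTES;
    uint32_t idx = block % NUM_LINES;
    uint32_t tag = block / NUM_LINES;

    if (c->valid[idx] && c->tag[idx] == tag) {
        c->hits++;
    } else {
        c->misses++;           /* fill the line on a miss */
        c->valid[idx] = 1;
        c->tag[idx] = tag;
    }
}

/* Replays the data accesses of the loop nest A[i] += B[i][j]. */
struct dcache simulate_loop_nest(void)
{
    struct dcache c = {0};
    const uint32_t a_base = 0x1000u;  /* assumed address of A */
    const uint32_t b_base = 0x2000u;  /* assumed address of B */
    const uint32_t elem = 4u;         /* assumed element size */

    for (uint32_t i = 0; i < 10; ++i) {
        for (uint32_t j = 0; j < i; ++j) {
            dcache_access(&c, b_base + (i * 10u + j) * elem); /* load B[i][j] */
            dcache_access(&c, a_base + i * elem);             /* load A[i]    */
            dcache_access(&c, a_base + i * elem);             /* store A[i]   */
        }
    }
    return c;
}
```

The resulting `hits` and `misses` fields are constants for the chosen layout, so they can be folded into the cost of the block as fixed hit/miss penalties.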



Objectives

To be faster than co-simulation of the target processor (at least one order of magnitude)
To provide more flexible and easier to use bottleneck analysis than emulation (e.g., who is causing the high cache miss rate?)
To support fast design exploration (what-if analysis) after changes in the functionality and in the architecture
To support derivative design
To support well-designed legacy code (clear separation between application layer and API SW platform layer)


Approaches
  Various trade-offs between simplicity, compilation/simulation speed and precision
  Virtual Processor Model: it compiles C source to simplified "object code" used to back-annotate C source with execution cycle counts and memory accesses
    Typically ISS uses object code, Cadence CC-ISS uses assembly code, commercial CC-ISSs use object code
  CABA: C-Source Back Annotation and model calibration via Target Machine Instruction Set
  Instruction-Set Simulator: it uses target object code to:
    either reconstruct annotated C source (Compiled-Code ISS)
    or be executed on an interpreted ISS

SW estimation in VCC: Scenarios

[Figure: estimation scenarios starting from White Box C. Tool blocks: VCC Virtual Compiler (VCC Virtual Processor Instruction Set), Target Processor Compiler (Target Processor Instruction Set), Host Compiler, ASM 2 C. Intermediate artifacts: annotated White Box C, .obj files, target assembly code. Resulting simulators: Compiled Code Virtual Instruction Set Simulator, Compiled Code Instruction Set Simulator, Interpreted Instruction Set Simulator, and Compiled Code Processor Model, used in co-simulation. An inset shows annotated code in which basic blocks carry delays Δ1 to Δ4 and "MT update" calls.]


Limitations
  C (or assembler) library routine estimation (e.g. trigonometric functions): the delay should be part of the library model
  Import of arbitrary (especially processor or RTOS-dependent) legacy code
    Code must adhere to the simulator interface including embedded system calls (RTOS): the conversion is not the aim of software estimation

SW estimation in VCC: Virtual Processor Model (VPM)
compiled code virtual instruction set simulator

Pros:
  does not require target software development chain
  fast simulation model generation and execution
  simple and cheap generation of a new processor model
  Needed when target processor and compiler not available
Cons:
  hard to model target compiler optimizations (requires "best in class" Virtual Compiler that can also act as a C-to-C optimizer for the target compiler)
  low precision, especially for data memory accesses


Interpreted instruction set simulator (I-ISS)

Pros:
  generally available from processor IP provider
  often integrates fast cache model
  considers target compiler optimizations and real data and code addresses
Cons:
  requires target software development chain
  often low speed
  different integration problem for every vendor (and often for every CPU)
  may be difficult to support communication models that require waiting to complete an I/O or synchronization operation


Compiled code instruction set simulator (CC-ISS)

Pros:
  very fast (almost same speed as VPM, if low precision is required)
  considers target compiler optimizations and real data and code addresses
Cons:
  often not available from CPU vendor, expensive to create
  requires target software development chain


CABA - VI

For each processor:
  Group target instructions into m Virtual Instructions (e.g., ALU, load, store, ...)
  For each one of n (much larger than m) benchmarks
    Run ISS and get benchmark cycle count and VIs execution count
  Derive average execution time for each VI (processor BSS file) by best fit on benchmark run data
For each functional block:
  Compile source and extract VI composition for each ASM Basic Block
  Split source into BBs and back-annotate estimated execution time using ASM BBs' VI composition and BSS
  Run VCC and get functional block cycle count


CABA - VI

CABA-VI: uses a calibration-like procedure to obtain average execution timing for each target instruction (or instruction class, called Virtual Instruction (VI)). Unlike the similar VPM technique, the VIs are target-dependent. The resulting BSS is used to generate the performance annotations (delay, power, bus traffic) and its accuracy is not limited to the calibration codes.

In both cases, part of the CCISS infrastructure is re-used to:
  parse the assembler,
  identify the basic blocks,
  identify and remove the cross-reference tags,
  handle embedded waits and other constructs,
  generate code for bus traffic.


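CABA - VI

Each benchmark used for calibration generates an equation of the form:

$$n_1 \cdot v_1 + n_2 \cdot v_2 + \dots + n_n \cdot v_n = b$$

Taken together, the benchmarks give the system

$$\begin{cases}
n_{11} \cdot v_1 + n_{12} \cdot v_2 + \dots + n_{1n} \cdot v_n = b_1 \\
n_{21} \cdot v_1 + n_{22} \cdot v_2 + \dots + n_{2n} \cdot v_n = b_2 \\
\quad\vdots \\
n_{m1} \cdot v_1 + n_{m2} \cdot v_2 + \dots + n_{mn} \cdot v_n = b_m
\end{cases}
\qquad
\begin{cases}
v_1 > 0 \\
v_2 > 0 \\
\quad\vdots \\
v_n > 0
\end{cases}$$

Error function to minimize: $e = \sum_{i=1}^{m} (\dots)^2$, where the squared term is a difference involving $b_i$ and $B_i$.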
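The best-fit step can be sketched for the smallest interesting case, two virtual instructions. The code minimizes the plain sum of squared residuals between the predicted and measured cycle counts by solving the 2x2 normal equations. This is an assumption: the error function on the slide may be normalized differently, and the sketch does not enforce the positivity constraints on the VI times. The names `vi_fit` and `fit_two_vi` are invented for the example.

```c
#include <stddef.h>

/* Result of the fit: average cycles per virtual instruction. */
struct vi_fit {
    double v1;
    double v2;
    int ok;     /* 0 if the benchmarks do not determine v1 and v2 */
};

/* n1[i], n2[i]: execution counts of VI 1 and VI 2 in benchmark i.
 * b[i]: measured cycle count of benchmark i (e.g. from an ISS run).
 * m: number of benchmarks.
 * Minimizes sum_i (n1[i]*v1 + n2[i]*v2 - b[i])^2. */
struct vi_fit fit_two_vi(const double n1[], const double n2[],
                         const double b[], size_t m)
{
    double s11 = 0.0, s12 = 0.0, s22 = 0.0, t1 = 0.0, t2 = 0.0;
    struct vi_fit f = {0.0, 0.0, 0};

    for (size_t i = 0; i < m; i++) {
        s11 += n1[i] * n1[i];
        s12 += n1[i] * n2[i];
        s22 += n2[i] * n2[i];
        t1  += n1[i] * b[i];
        t2  += n2[i] * b[i];
    }

    /* Normal equations: [s11 s12; s12 s22] * [v1; v2] = [t1; t2]. */
    double det = s11 * s22 - s12 * s12;
    if (det != 0.0) {
        f.v1 = (t1 * s22 - t2 * s12) / det;
        f.v2 = (s11 * t2 - s12 * t1) / det;
        f.ok = 1;
    }
    return f;
}
```

With three benchmarks whose counts are (100, 10), (50, 40), (10, 30) and whose cycle counts are 130, 170, 100, the fit recovers 1 cycle for the first VI and 3 cycles for the second, since those values reproduce all three measurements exactly. Real calibration data will not fit exactly, which is why many more benchmarks than VIs are used.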

SW estimation in VCC: Results

Benchmark        Simulation     PSIM       RelErr
bs_cfg           48053.9        48236      -0.12%
crc_cfg          330345         320862      2.99%
insertsort_cfg   480090         480381     -0.03%
jfdctint_cfg     1.20559e+06    1205844    -0.01%
lms_cfg          438952         430956      1.88%
matmul_cfg       1.14307e+06    1143308    -0.01%
fir_cfg          2.61924e+06    2597397     0.85%
fft1k_cfg        1.32049e+06    1298882     1.67%
fibcall_cfg      120073         120324     -0.10%
fibo_cfg         6.28005e+06    6280268    -0.00%
fft1_cfg         1.00826e+06    984526      2.42%
ludcmp_cfg       1.9772e+06     1956308     1.07%
minver_cfg       1.12565e+06    1114693     0.99%
qurt_cfg         1.46096e+06    1421282     2.80%
select_cfg       824290         746637     10.42%

Very small errors where the C source was annotated by analyzing the non-tagged assembler (not always possible).
Larger errors are due to errors in the matching mechanism (a one-to-one correspondence between the C source and assembler basic blocks is not possible) or influences of the tagging on the compiler optimizations.

SW estimation in VCC: Conclusions

VPM-C
  Features a high accuracy when simulating the code it was tuned for.
  The BSS file generation can be automated
  In case of limited code coverage during the BSS generation phase, it might feature unpredictable accuracy variations when the code or input data changes.
  The code coverage depends also on the data set used as input to generate the model.
  Assumes a perfect cache.
  Requires cycle accurate ISS and target compiler (only by the modeler, not by the user of the model)
  Good for achieving accurate simulations for data dominated flows, whose control flow remains pretty much unchanged with data variations (e.g., MPEG decoding)
  Development time for a new BSS ranges from 1 day to 1 week. Fine tuning the BSS to improve the accuracy may go up to 1 month, mostly due to extensive simulations
  Good if not developing extremely time-critical software (e.g. Interrupt Service Routines), or when the precision of SWE is sufficient for the task at hand (e.g., not for final validation after partial integration on an ECU)
  Good if SW developer is comfortable in using the Microsoft VC++ IDE, rather than the target processor development environment, which may be more familiar to the designer (and more powerful or usable)

SW estimation in VCC: Conclusions

CABA
  Fast simulation, comparable with VPM.
  Good to very good accuracy, since the measurements are based on the real assembler and target architecture effects.
  Good stability with respect to code or execution flow changes
  The production target compiler is needed (both modeler and user)
  About 1 man-month for building a CABA-VI infrastructure, with one processor model.
  From 2 weeks to 2 months to integrate a new processor, depending upon the simulation time required for the calibration
  Combines the fast simulation that characterizes the VPM-based techniques with the high accuracy of the object code analysis techniques, such as CCISS and ISS integration.
  Too few experiments were conducted to know how well it suits various kinds of targets, or how accurate and stable it is under input data and control flow variations, but the results so far appear promising.
