
HeroeS: Virtual Platform Driven Integration of Heterogeneous Software Components for Multi-Core Real-Time Architectures

Markus Becker
University of Paderborn/C-LAB
Paderborn, Germany
Email: [email protected]

Ulrich Kiffmeier
dSPACE GmbH
Paderborn, Germany
Email: [email protected]

Wolfgang Mueller
University of Paderborn/C-LAB
Paderborn, Germany
Email: [email protected]

Abstract—This article presents the HeroeS virtual platform driven methodology for embedded multi-core and real-time SW design. The methodology’s focus is on early integration, testing and performance estimation of heterogeneous SW stacks, i.e., SW components and layers at mixed abstraction levels and/or targeting different instruction sets. We take into account current system-level methodologies such as Transaction Level Modeling (TLM) and Real-Time Operating System (RTOS) modeling. For this, a SystemC virtual platform framework is presented combining state of the art simulation techniques according to the proposed methodology. This includes host-compiled target SW abstraction, abstract RTOS and Hardware Abstraction Layer (HAL) models in SystemC, extended QEMU user and system mode emulation and TLM 2.0 bus models. Efficient yet accurate performance estimates can be provided through static and dynamic annotation. We apply binary mutation testing, i.e., a test assessment and improvement approach for instruction level SW testing. Our approach was investigated by prototypical integration into a commercial AUTOSAR environment. Experimental results were obtained by an automotive case study: a fault-tolerant fuel injection control system, which is part of an in-car network.

I. INTRODUCTION

Today’s embedded systems increasingly include networks of multi-core CPUs running more and more SW with real-time constraints. In some embedded application domains, such as mobile computing or the automotive industry, SW has already replaced HW as the key driver for innovation. However, HW greatly impacts SW in terms of performance. Thus, SW development is tightly coupled with the HW, especially at lower levels of the software stack, which are referred to as Hardware-dependent Software (HdS). Virtual platforms enable early SW development by providing execution environments based on more or less abstract models before the actual HW is available. In order to manage the ever increasing platform complexity, TLM methodologies were introduced, raising the level of abstraction and thereby elevating HW/SW codesign and verification to the Electronic System Level (ESL). However, current approaches such as OSCI TLM 2.0 are still too focused on HW. Although the advent of RTOS modeling approaches in SystemC partially closed the SW methodology gap, there remains a lack of guidance for applying the different methodologies and libraries in a unified design framework providing smooth HW/SW corefinement.

Embedded SW engineering has mainly become layer and component oriented, enabling reuse across projects and companies, e.g., in Commercial Off-The-Shelf (COTS) driven deployment flows. Such components are typically subject to intellectual property protection. Thus, source code is not available at the integration stage. Moreover, hardware platforms are often shipped with SW components, e.g., standard libraries, device drivers and boot firmware. Such SW is typically provided in formats targeting a certain Instruction Set Architecture (ISA), e.g., assembler code, linkable object code and binary images. Thus, engineers are frequently confronted with the integration of heterogeneous SW components. For this, practical design environments are required that offer more flexibility in supporting early integration and estimation of such SW stacks.

In this article we propose the HeroeS methodology focusing on advanced models for heterogeneous SW in the context of current system-level methodologies such as TLM and RTOS modeling. Our methodology is supported by a virtual platform framework enabling integration of heterogeneous SW components at different abstraction levels. We consider early performance and real-time estimation techniques addressing the speed vs. accuracy trade-off. Integration testing is covered by advanced mutation analysis techniques which apply to both source level and instruction level SW. The implemented virtual platform framework is based on SystemC [1], utilizing state of the art techniques for simulation acceleration. This includes techniques such as host-compiled target SW abstraction, extended QEMU user mode and full system emulation, performance modeling through static and dynamic annotation of SW segments, abstract RTOS and HAL models and TLM 2.0 based system bus abstraction.

We investigate the applicability of the HeroeS methodology in the context of automotive SW design flows. For this, a prototypical integration into a commercial AUTOSAR tool chain was achieved. Experimental results were obtained by an industrial example application: a distributed motor management SW which is part of an in-car network.

The remainder of the article is organized as follows. In Section II we give an overview of current system level methodologies. In Section III we introduce the HeroeS methodology. In Section IV we describe a virtual platform framework according to the presented methodology. In Section V we consider AUTOSAR as a case study for applying the HeroeS approach to the automotive domain. In Section VI we discuss related work in the field of system level SW modeling and analysis. Finally, Section VII concludes our paper and gives an outlook on future work.

II. ELECTRONIC SYSTEM LEVEL DESIGN

In order to deal with complex mixed HW/SW systems, system-level methodologies and abstraction levels were introduced beyond Register Transfer Level (RTL). In 2003, [2] introduced the notion of Transaction Level Modeling (TLM). They address the abstraction of cycle-timed RTL via approximately-timed TLM (bus arbitration) abstraction to untimed system specifications with respect to computation and communication. In parallel, [3][4] introduced clockless TLM abstractions to interface embedded SW and HW to capture logic operations on buses for master and slave communication. For the smooth transition from timed to untimed models, the levels of Programmers View with and without timing were introduced (PV & PV-T), which finally contributed to the OSCI TLM 1.0 standard [5]. Thereafter, PV-T was revised by the OSCI and refined to loosely- and approximately-timed coding styles as they can be found in TLM 2.0 [6]. Thus, the term TLM is somewhat ambiguous as different definitions were given by individual research groups. We refer to TLM as the general concept of starting system design from a combined HW/SW model of processes communicating via read and write primitives accessing channels such as FIFOs. Whenever we use the term TLM 2.0 we refer to the more specific approach provided by the Open SystemC Initiative (OSCI) [1] that focuses mainly on abstract modeling of memory mapped system buses. For this, they provide a library and coding styles based on concepts such as target and initiator ports, blocking and non-blocking interfaces, the quantum keeper, the generic payload and Direct Memory Interface (DMI). In order to connect with models of different abstraction levels, so-called transactors are required, which convert the individual interfaces and protocols to the respective TLM protocol.

A. Processor and SW modeling

In former times of HW/SW codesign, unified modeling stopped after the HW/SW partitioning step by continuing SW development in cosimulated Instruction Set Simulators (ISS). In the context of system-level design, additional abstraction levels for processor and SW modeling have been introduced through recent years, referred to as (abstract) RTOS modeling. Such approaches are typically based on the introduction of SW schedulers and RTOS APIs in a system-level design language, today mainly in SystemC.

Fig. 1. Common organization of layered software stacks by Ecker et al. [7].

Schirner et al. presented a layered approach in [8], incrementally describing processor modeling with essential features of task mapping, dynamic scheduling, interrupt handler, low-level firmware and hardware interrupt handling. At the highest level, the application is running natively on the simulation host. At Task Level, an abstract RTOS model is introduced and processes are refined into tasks. In the processor models at the Firmware (FW) and TLM refinement steps, hardware abstraction (HAL) and processor hardware layers are introduced. The FW and TLM level add models of the external bus communication and the interrupt handling chain on the software and hardware side, respectively. Finally, a Bus-Functional Model (BFM) of the processor includes pin- and cycle-accurate models of the bus interfaces and the protocol state machines driving and sampling the external wires.

III. HETEROGENEOUS SW-CENTRIC SYSTEM MODELING

Fig. 1 shows the common organization of a layered software stack as it can be found in embedded systems. Such SW stacks are divided into layers of the application software, middleware and the Hardware-dependent Software (HdS), which can be further divided into a Hardware Abstraction Layer (HAL) and the portable part of the system software, i.e., the RTOS, communication protocols, boot firmware and device drivers. We propose the advanced HeroeS design methodology in order to deal with the integration of such embedded SW stacks, which are highly heterogeneous by nature.

The starting point of our methodology (see Fig. 2) is an Application SW model (I. ASM), i.e., a SW partition of a system specification model. We assume the application to be defined by source code, e.g., C/C++ code either written by hand or automatically derived from higher level models such as state flow models in MATLAB/Simulink or UML. For heterogeneous SW modeling, we propose four intermediate refinement models decoupling code refinement from interface refinement, namely, Source Level Task Model (II. STM), Target Level Task Model (III. TTM), Source Level System Software Model (IV. SSM) and Target Level System Software Model (V. TSM). For this, we employ abstract RTOS modeling, further separating task modeling and system software modeling steps, which can be modeled at both the source code level and the target instruction level. By also decoupling the refinement of individual CPUs or RTOS tasks, a high degree of modeling flexibility can be achieved. Finally, the target specific SW Implementation Model (VI. SIM) contains all SW layers for running on real hardware.

Boxes in the background of Fig. 2 depict the relationship of our methodology to transaction-level modeling as introduced in Section II. Our focus is on SW modeling and refinement. However, comodeling of HW and SW is essential to our methodology. For this, we consider TLM as the HW/SW interface throughout the refinement process. Although some address related information can be partially derived from models I.-V., such as the frequency and locality of host or virtual memory accesses, the full benefits of OSCI TLM 2.0 can be exploited only with the SIM, which includes register accurate access to physical addresses through the definition of a HAL.

Fig. 2. The HeroeS SW-centric methodology for heterogeneous SW.

We do not consider performance modeling as an explicit part of the SW refinement since performance is estimated by the virtual platform. Nevertheless, as indicated by the z-axis in Fig. 2, we identified four general modeling techniques which can be applied for SW performance estimation depending on the SW model and the available details of the HW. More details on individual methods will follow in subsequent sections.

A. Code and Interface Refinement

Code and interface refinement consists of four major steps (a.-d.). Each refinement step comprises several smaller design decisions. In order to provide more flexibility in terms of integrating instruction level target SW we consider code refinement and interface refinement as orthogonal steps. Thus, (b.) system software modeling and (c.) target instruction set architecture modeling can be carried out independently. The four refinement steps are as follows:

a. Task modeling requires mapping of SW processes to tasks and assigning tasks to single-core or multi-core scheduling contexts. Each task has a set of assigned properties such as periodic vs. aperiodic activation and preemptive vs. non-preemptive execution. Each scheduling context can have a user-defined scheduling policy such as fixed priorities, dynamic priorities and/or time slicing. Primitive calls such as read/write or send/receive have to be refined to RTOS and communication APIs. This can be automated/abstracted by the generation of a middleware/adapter layer such as the Run-Time Environment (RTE) layer approach in AUTOSAR [9]. The introduction of APIs can be stepwise, starting from a canonical RTOS API and refining towards standard APIs such as POSIX [10] or OSEK/AUTOSAR. Multi-core programming support can be introduced on top of dedicated middleware APIs such as OpenMP [11] or MCAPI [12]. RTOS and communication models are abstract at this point as the actual system services implementation remains undefined.

b. System software modeling requires the definition of system services which are typically based on a HAL API. Such services include the portable part of the RTOS, higher layers of the communication stacks and second level device drivers. At this point the abstract RTOS model must be refined to an actual RTOS implementation, e.g., the task activation mechanisms and task schedulers. For this, HW interrupts such as timers or I/O requests need to be introduced and the corresponding Interrupt Service Routines (ISR) must be defined. Communication APIs need to be refined according to the system topology, e.g., shared memory, inter-process communication, I/O or network communication. As the system SW layer interfaces with the HW through APIs, the SW can still be defined in high level languages such as C. Thus, there is no dependence on modeling details of the target architecture and the physical memory layout.

c. Target instruction set architecture modeling requires the definition of the target ISA, e.g., ARM or PowerPC, and compilation of the source code to processor specific object and binary code. This implies the selection of a compiler tool chain and compiler flags such as the specific instruction set and code optimization flags. Moreover, application binary interface (ABI) agreements are decided, i.e., calling conventions, byte ordering, padding and alignment. As such, the user space memory layout is defined by the compiler/linker, introducing a virtual address space.

d. Hardware abstraction layer modeling requires the definition of the HAL according to the actual physical memory layout of the system/memory bus. This includes the definition of the low level assembler routines such as HW initializations, the RTOS context switch and other routines accessing HW registers directly, e.g., clock, timer, interrupt and power configuration. Resource allocation might require SW refinements of the higher layers according to the refined HW platform model. A binary image of the software stack can be derived in order to move to cycle-accurate ISS/RTL models or real hardware for more precise performance and real-time verification. However, all design decisions concerning the SW stack design have been taken at this point. Thus, the result can be considered as a SW implementation model targeting a real HW platform.
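The orthogonality of steps (b.) and (c.) can be illustrated with a small Python sketch. It encodes the six HeroeS models as nodes of a directed graph and enumerates the possible refinement orders from ASM to SIM. The edge set is an assumption derived from the step descriptions above rather than a transcription of Fig. 2, and the names REFINEMENTS and refinement_paths are purely illustrative.

```python
# Hypothetical encoding of the refinement steps as a directed graph.
# Edge labels a-d name the four steps; the edge set itself is an assumption
# drawn from the text (b and c are independent, d comes last).
REFINEMENTS = {
    "ASM": [("a", "STM")],
    "STM": [("b", "SSM"), ("c", "TTM")],
    "SSM": [("c", "TSM")],
    "TTM": [("b", "TSM")],
    "TSM": [("d", "SIM")],
    "SIM": [],
}


def refinement_paths(model, goal="SIM"):
    """Return every sequence of step labels that leads from model to goal."""
    if model == goal:
        return [[]]
    paths = []
    for step, successor in REFINEMENTS[model]:
        for tail in refinement_paths(successor, goal):
            paths.append([step] + tail)
    return paths


for path in refinement_paths("ASM"):
    print(" -> ".join(path))
```

Under these assumptions there are exactly two orders, one refining the interface first (via SSM) and one refining the code first (via TTM).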

B. Performance Modeling

SW performance can be modeled at different accuracy levels according to the available details of the HW model. In general, we distinguish four approaches for SW performance modeling: (i) untimed (sometimes referred to as zero-timed), (ii) static execution time annotation, (iii) dynamic execution time estimation and (iv) bus transaction modeling. The most abstract approach is untimed or zero-timed modeling, which is applied when there are no details of the HW platform at all. Here, we assume an idealized processor executing software with infinite speed. Virtual time advances through discrete events used to block/trigger SW processes. However, the SW execution itself does not have any execution time (i.e., zero-time). This method can be used to investigate the effects of causal relationships between SW and time-stamped events but not for estimating performance of the SW being executed on a certain HW.

ASM STM SSM TTM TSM SIM
Untimed (zero-timed) X X X X X X
Static execution time annotation X X X X X
Dynamic execution time estimation X X X
TLM 2.0 bus, Loosely-timed X
TLM 2.0 bus, Approximately-timed X

TABLE I
APPLICATION OF SW PERFORMANCE MODELING TECHNIQUES.

A simple method for performance estimation is to annotate software segments, such as instructions, compiler basic blocks, functions or tasks, with static execution delays. Such delays can be derived from the specification in terms of worst-case considerations, or actual delays can be analyzed in advance with some static model of the HW platform using analysis tools like AbsInt aiT [13]. However, the accuracy of static annotation might be insufficient due to dynamic execution delays which cannot be resolved off-line. Here, dynamic estimation can provide more accuracy through abstract models of the processor core, e.g., an ALU timing model for data-dependent delays. Depending on the system complexity, performance is heavily influenced by the memory subsystem, resulting in more or less deterministic SW timing. Modeling the memory delay as a constant overhead per access can be sufficient for deterministic platforms such as those used for hard real-time applications. However, modern mobile platforms are highly dynamic since they are equipped with shared memories, hierarchical caches, pipelines and branch predictors. In such a case, accurate estimations require more sophisticated models simulating the sequences of memory transactions. In order to avoid the overhead of cycle-accurate bus modeling, the TLM 2.0 coding styles Loosely-Timed (LT) or Approximately-Timed (AT) are used to abstract from pin-accurate models and to make synchronization among the bus components less tight. This provides a good accuracy vs. speed trade-off for early virtual platforms. Performance modeling techniques apply to the proposed HeroeS models as depicted by Table I.
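As a worked illustration of static annotation combined with a constant memory overhead per access, the following Python sketch sums annotated segment delays and adds a fixed cost per memory access. The function name, the segment names and all numbers are made up for the example and are not taken from the framework.

```python
def estimate_time_us(segment_cycles, executions, mem_accesses,
                     cycles_per_access, clock_mhz):
    """Static estimate in microseconds.

    segment_cycles:    annotated cycle count per SW segment
    executions:        how often each segment was executed
    mem_accesses:      total number of memory accesses
    cycles_per_access: constant memory delay (assumed deterministic platform)
    clock_mhz:         CPU clock; cycles / MHz yields microseconds
    """
    compute = sum(segment_cycles[name] * count
                  for name, count in executions.items())
    memory = mem_accesses * cycles_per_access
    return (compute + memory) / clock_mhz


# Illustrative values only.
cycles = {"init": 120, "loop_body": 35}
counts = {"init": 1, "loop_body": 100}
estimate = estimate_time_us(cycles, counts, mem_accesses=250,
                            cycles_per_access=4, clock_mhz=100)
print(f"{estimate:.1f} us")
```

Here 3620 compute cycles plus 1000 memory cycles at 100 MHz give 46.2 µs. Such a model ignores caches and bus contention, which is precisely where the LT and AT bus models become necessary.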

C. Component Integration Testing

The integration of components into a system often reveals unforeseeable failures which cannot be avoided by thorough unit and component testing alone. This can be due, for instance, to incompatible interface specifications. Thus, additional testing is required at the integration stage, which can be a very challenging task when dealing with heterogeneous SW stacks. In order to reach sufficient test coverage, integration testing might require sophisticated test patterns which have to be different compared to component level testing. Such test patterns must be able to propagate test stimuli from the interfaces of the integrated Design Under Test (DUT) through interconnected components and stacked SW layers. Here, mutation based testing approaches combined with Automatic Test Pattern Generation (ATPG) approaches turned out to be valuable tools for providing high quality test environments. The basic principle of mutation based testing is to insert small faults into the DUT which are likely to be coupled with real faults. Such faults are typically applied to source code and mutants are derived by compilation. Each mutant is executed with the available set of test cases. A mutant is said to be killed in case a deviation of the output w.r.t. the original DUT was detected. A mutation score is computed in order to provide a quality metric for the current test environment. The more mutants were killed, the better the test environment is rated. ATPG methods, such as random and/or constraint based approaches, can be applied in case the test environment reveals an insufficient quality.

ASM STM SSM TTM TSM SIM
Source level mutation testing X X X
Instruction level mutation testing X X X

TABLE II
APPLICATION OF MUTATION TESTING TECHNIQUES.

While white box testing methods apply well to source level SW, they are not suitable for instruction level SW such as binaries. In contrast, black box testing is limited w.r.t. ATPG as there is no insight into the DUT. We provide a method for mutation testing at the instruction level utilizing target specific fault models. Our method can be applied to the unmodified binary under test without assuming source code access or any knowledge of the binary’s behavior. Instruction level mutation testing comprises four major steps: (i) binary analysis, (ii) automatic test pattern generation, (iii) binary mutation testing and (iv) evaluation. We apply static control flow and data flow analysis in order to derive a constraint annotated control flow graph (CFG) which serves as input model for SAT solving. Mutation tables are derived from a target-specific fault model and the annotated CFGs. The actual application of the mutation table is performed at run-time. For this, we extended the QEMU binary translator flow to inject and detect faults while the binary under test is executed. Testing results are back annotated to the CFG model in order to compute additional test cases. A combined formal/heuristic approach is implemented by randomly solving path constraints in order to increase the probability of code and path coverage, which also correlates with the mutation score. In case the achieved mutation score indicates an insufficient test quality, steps (ii)-(iv) can be repeated. More details on this method can be found in [14][15]. SW mutation testing applies to HeroeS models as depicted by Table II.
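The interplay of mutation table, run-time fault injection and mutation score can be sketched on a toy example. The Python code below is not the XEMU implementation: the three-instruction accumulator machine, the FAULT_MODEL mapping and all function names are hypothetical stand-ins chosen to keep the example self-contained.

```python
# Toy accumulator machine: each instruction is (opcode, constant operand).
OPS = {
    "add": lambda acc, k: acc + k,
    "sub": lambda acc, k: acc - k,
    "mul": lambda acc, k: acc * k,
}

# Hypothetical fault model: every opcode is replaced by one other opcode.
FAULT_MODEL = {"add": "sub", "sub": "add", "mul": "add"}

PROGRAM = [("add", 3), ("mul", 2), ("sub", 1)]  # computes (x + 3) * 2 - 1


def build_mutation_table(program):
    """Derive one mutant (pc, replacement opcode) per instruction."""
    return [(pc, FAULT_MODEL[op]) for pc, (op, _) in enumerate(program)]


def execute(program, x, mutation=None):
    """Run the unmodified program; a mutation is applied only at run-time."""
    acc = x
    for pc, (op, operand) in enumerate(program):
        if mutation is not None and mutation[0] == pc:
            op = mutation[1]  # inject the fault for this instruction
        acc = OPS[op](acc, operand)
    return acc


def mutation_score(program, test_inputs):
    """Fraction of mutants whose output deviates for at least one test."""
    table = build_mutation_table(program)
    killed = sum(
        any(execute(program, x, m) != execute(program, x) for x in test_inputs)
        for m in table
    )
    return killed / len(table)


print(mutation_score(PROGRAM, [-1]))
print(mutation_score(PROGRAM, [-1, 0]))
```

With the single input -1, the mutant that replaces `mul` by `add` survives because both variants yield 3, so only two of three mutants are killed. Adding the input 0 kills the remaining mutant, which mirrors the role of additional test pattern generation when the score is insufficient.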

IV. VIRTUAL PLATFORM FRAMEWORK

In this section we present a virtual platform framework according to the presented HeroeS design methodology. The platform’s backplane is a SystemC kernel for event driven simulation. We utilize different SystemC libraries to achieve efficient execution environments for each of the proposed SW models. For ASM simulation, the application SW is mapped to plain SystemC. In STM, SSM, TTM and TSM, lower layers of the SW stack are abstracted by RTOS and HAL models in SystemC which are replaced stepwise by source level and instruction level production code. Source level target SW is abstracted in ASMs, STMs and SSMs through host-compilation and native execution. Here, performance can be modeled through static binary code analysis and back annotation to source code. We extended QEMU to support mutation and dynamic performance estimation for instruction level SW, which we refer to as XEMU [15]. Wrappers for XEMU were developed for interfacing with SystemC in order to simulate TTMs, TSMs and SIMs. TLM 2.0 bus models can be connected to the memory interface of XEMU system mode for SIM simulation. Fig. 3 shows the mapping of the HeroeS methodology to the SystemC virtual platform framework.

A. SystemC RTOS and HAL Models

In SystemC, concurrency modeling is achieved by means of a cooperative scheduler implementing pseudo parallelization. In order to yield control to the simulation kernel a SystemC thread (SC_THREAD) has to explicitly invoke wait, blocking for an event or an amount of virtual time. The semantics of a native SC_THREAD do not apply to SW modeling since SW threads are inherently preemptive due to CPU interrupts. Thus, our abstract RTOS and HAL models in SystemC provide an additional synchronization layer on top of the SystemC scheduler for modeling of preemptive SW processes. For this, we introduce two special SystemC module types, SC_RTOS_MODULE and SC_HAL_MODULE. These modules define APIs for abstract RTOS and HAL services. A dedicated function CONSUME_CPU_TIME is provided in order to account for a SW segment’s execution time by implementing an interruptible wait statement.
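The effect of an interruptible wait can be illustrated without SystemC. The Python sketch below computes when a SW segment finishes if its execution time is stretched by interrupt service routines that preempt it. The function is a hypothetical analogue of CONSUME_CPU_TIME whose real signature is not given here; it assumes non-nested ISRs that run to completion.

```python
def consume_cpu_time(start, duration, interrupts):
    """Return the virtual time at which a segment of `duration` finishes.

    interrupts: list of (arrival_time, isr_duration) pairs. While an ISR
    runs, the segment is suspended and its remaining time is preserved.
    Assumption: ISRs do not nest and are served in arrival order.
    """
    now, remaining = start, duration
    for arrival, isr_time in sorted(interrupts):
        if arrival >= now + remaining:
            break                      # segment completes before this interrupt
        if arrival > now:
            remaining -= arrival - now  # segment ran until it was preempted
            now = arrival
        now += isr_time                # ISR executes, segment stays suspended
    return now + remaining


print(consume_cpu_time(0, 10, [(4, 3)]))
```

A plain wait for 10 time units would end at time 10 regardless of the interrupt. The interruptible variant ends at 13, because the 3 units spent in the ISR are added to the completion time of the preempted segment.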

Tasks and cores are modeled by connecting SC_THREADs to RTOS contexts. Each RTOS context can be assigned a user-defined or common scheduling policy such as fixed priorities, EDF or Round-Robin. In addition to a generic RTOS API the model provides standard OS interfaces such as POSIX or OSEK/AUTOSAR. The SC_HAL_MODULE provides system service modeling on top of a HAL API with primitives for process creation, context switching and interrupt management. An ISR is modeled by an SC_THREAD which must be connected to a HW event and an RTOS context. Dedicated ISR schedulers can be defined according to the interrupt policy of the HW platform. More details on this method can be found in [16].
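How a scheduling policy can be plugged into an RTOS context may be sketched as follows. The classes Task and RtosContext and the policy functions are hypothetical and only mimic the idea of a context with an exchangeable policy. The convention that a larger priority value is more urgent is an assumption of this example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int   # assumption: larger value means more urgent
    deadline: int   # absolute deadline in virtual time units


def fixed_priority(ready):
    """Pick the ready task with the highest static priority."""
    return max(ready, key=lambda task: task.priority)


def edf(ready):
    """Earliest Deadline First: pick the task whose deadline is nearest."""
    return min(ready, key=lambda task: task.deadline)


class RtosContext:
    """A scheduling context with a user-selectable policy function."""

    def __init__(self, policy):
        self.policy = policy
        self.ready = []

    def activate(self, task):
        self.ready.append(task)

    def dispatch(self):
        task = self.policy(self.ready)
        self.ready.remove(task)
        return task


def first_dispatched(policy):
    context = RtosContext(policy)
    context.activate(Task("control", priority=2, deadline=50))
    context.activate(Task("logging", priority=1, deadline=20))
    return context.dispatch().name


print(first_dispatched(fixed_priority), first_dispatched(edf))
```

The same task set is dispatched in a different order depending on the policy: fixed priorities select the control task first, while EDF selects the logging task because its deadline is earlier.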

Fig. 3. Mapping of HeroeS methodology and virtual platform framework.

B. Host-Compiled Target SW Abstraction

For the native execution of host-compiled target SW we provide a wrapper for linking shared object code to the SystemC model. In order to abstract from the wrapping mechanism we defined a common interface which is shared by source level and instruction level wrappers. The wrapped SW can be triggered by SystemC events through control API functions such as run(MyRoutine). Callbacks coming from wrapped SW are forwarded to SystemC APIs. For this, a function call is represented by a structure syscall_t which wraps the callback data in a target independent format. Callbacks are also used for piggybacking accumulated execution delay in order to synchronize locally decoupled time with the global SystemC time by calling the CONSUME_CPU_TIME primitive.
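The callback mechanism with piggybacked delay can be outlined in a few lines of Python. The record SysCall stands in for syscall_t, whose actual fields are not specified here. SimulationSide, my_routine and run are likewise hypothetical names used only for this sketch.

```python
from dataclasses import dataclass


@dataclass
class SysCall:
    """Target independent record of one callback (stand-in for syscall_t)."""
    name: str
    args: tuple = ()
    accumulated_delay: int = 0  # cycles spent since the previous callback


class SimulationSide:
    """Plays the role of the SystemC model receiving forwarded callbacks."""

    def __init__(self):
        self.now = 0
        self.log = []

    def consume_cpu_time(self, cycles):
        self.now += cycles  # advance global time by the piggybacked delay

    def dispatch(self, call):
        # Synchronize time first so the API call happens at the right instant.
        self.consume_cpu_time(call.accumulated_delay)
        self.log.append((self.now, call.name, call.args))


def my_routine(callback):
    """Wrapped SW: computes locally, then issues callbacks with its delay."""
    callback(SysCall("send", (1,), accumulated_delay=120))
    callback(SysCall("send", (2,), accumulated_delay=80))


def run(routine, sim):
    """Control API analogue: trigger a wrapped routine from the simulation."""
    routine(sim.dispatch)


sim = SimulationSide()
run(my_routine, sim)
print(sim.log)
```

The two send calls are logged at times 120 and 200, although the wrapped routine itself never interacts with the simulation clock between callbacks.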

Static Execution Time Annotation: At the source level, target SW execution time can be modeled by instrumenting the host-compiled SW. For this, code has to be segmented into time annotated behaviors such as tasks or functions. We apply segmentation to the level of source code branches in order to accurately capture the control flow of the application such as loops or conditional statements. For this, a C grammar has been developed to place preprocessor marks in the source code, spanning a CFG. The marks can be replaced to instrument the source code with execution delays or cycle count accumulation statements according to the transition between two source code marks. Such values can be approximated, measured or analyzed statically. We implemented a tool chain for back annotation of static timing analysis. For this, source code marks are replaced by volatile declared assembler labels for target compilation. Such labels can be used to identify linear instruction level SW segments according to the source code CFG. Best case and worst case execution time (BCET/WCET) for segments can be derived in order to be back annotated to source code. The approach can provide sufficient accuracy for early estimates. However, the accuracy depends on the target platform’s complexity and the target compiler, as mapping of the instruction level and source level CFGs is challenging when the compiler applies advanced optimization techniques such as function in-lining or loop unrolling. Details on this method can be found in [17].
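The accumulation of back annotated delays along transitions between source code marks can be sketched as follows. The mark names and the BCET/WCET values in SEGMENT_BOUNDS are invented for illustration; in the described tool chain they would come from static timing analysis of the target binary.

```python
# (from_mark, to_mark) -> (BCET, WCET) in cycles. All values are made up.
SEGMENT_BOUNDS = {
    ("entry", "loop"): (10, 14),
    ("loop", "loop"): (6, 9),
    ("loop", "exit"): (4, 4),
}


def accumulate(trace):
    """Sum best and worst case delays over the transitions of a mark trace."""
    best = worst = 0
    for edge in zip(trace, trace[1:]):
        bcet, wcet = SEGMENT_BOUNDS[edge]
        best += bcet
        worst += wcet
    return best, worst


# One execution that enters the loop and iterates three times.
print(accumulate(["entry", "loop", "loop", "loop", "exit"]))
```

For this trace the accumulated bounds are 26 and 36 cycles. Because delays are attached to transitions rather than to marks, the same mark can contribute different delays depending on the path taken through the CFG.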

Traditional cycle-accurate and binary code interpreting ISS turned out to be a performance bottleneck in fast virtual platforms. QEMU is an open source SW emulator implementing the Dynamic Binary Translation (DBT) technique. Unlike static binary translators, only code encountered at runtime is considered, avoiding unnecessary translation overhead. In contrast to traditional ISS, DBT is performed at compiler basic block level, i.e., linear code segments until a final branch instruction. Moreover, translated blocks (TB) are buffered in a translation cache in order to provide execution speed close to native execution by avoiding redundant translations. QEMU is suitable for fast functional CPU modeling when there is no need for a detailed model of the CPU’s micro architecture. Besides x86, QEMU supports many different embedded systems architectures such as ARM, PowerPC, SPARC, MIPS or MicroBlaze. In general, QEMU operates in two emulation modes: user mode and system mode. The user mode provides CPU emulation for a single user program on top of a Linux kernel. QEMU (full) system mode provides emulation of an entire target system including CPU cores, system bus and I/O in order to run a complete software stack, i.e., including boot firmware, operating system and HAL. QEMU can be considered as state of the art concerning fast ISS. However, QEMU does not naturally provide performance estimates for the executed target SW. We extended QEMU’s binary translator for dynamic timing estimation and run-time code mutation (which we refer to as XEMU [15]).
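The benefit of a translation cache at basic block granularity can be demonstrated with a toy model. The guest instruction format, the translate function and the block layout below are invented for this Python sketch and bear no relation to QEMU’s internal data structures. The closure returned by translate merely stands in for generated host code.

```python
def translate(block):
    """Turn one guest basic block into a host callable (stand-in for DBT)."""
    body, terminator = block

    def host_code(state):
        for op in body:
            if op[0] == "set":
                state["r0"] = op[1]
            elif op[0] == "dec":
                state["r0"] -= 1
        if terminator[0] == "jmp":
            return terminator[1]
        if terminator[0] == "bnz":  # branch if r0 is not zero
            return terminator[1] if state["r0"] != 0 else terminator[2]
        return None                 # halt

    return host_code


def run(blocks, entry=0):
    """Execute guest blocks, translating each block only on first use."""
    cache, translations, executions = {}, 0, 0
    state, pc = {"r0": 0}, entry
    while pc is not None:
        if pc not in cache:         # translation cache miss
            cache[pc] = translate(blocks[pc])
            translations += 1
        pc = cache[pc](state)
        executions += 1
    return state, translations, executions


# Guest program: r0 = 5; do { r0 -= 1 } while (r0 != 0); halt
GUEST = {
    0: ([("set", 5)], ("jmp", 1)),
    1: ([("dec",)], ("bnz", 1, 2)),
    2: ([], ("halt",)),
}
print(run(GUEST))
```

The loop block is executed five times but translated once, so seven block executions require only three translations. Code that is never reached would not be translated at all.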

We integrate both the user mode and the system mode of XEMU according to interface refinement of the SW models. For this, we implemented two SystemC wrappers. The user mode wrapper connects XEMU with SystemC using a common SW wrapper interface which we also use to connect to the host-compiled target SW wrapper. As such, functions of the wrapped SW stacks can be triggered by SystemC. Trapped API calls coming from wrapped SW stacks are forwarded to the RTOS, HAL and TLM APIs in SystemC. Callbacks trapped by XEMU user mode need special care as the wrapped instruction level code is typically compiled for a different target ISA. Thus, the XEMU user mode wrapper requires a specific target to host adapter dealing with binary interface issues such as calling conventions and byte ordering. Moreover, pointer arguments need to be converted from target to host address space. For this, the adapter needs to subtract the base address of the emulated target memory, which is mapped to the host’s address space using an offset.
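The two adapter tasks named above, byte order conversion and pointer translation, can be sketched with the Python standard library. The base address TARGET_BASE, the big-endian 32-bit argument layout and the decoded send(buf_ptr, length) call are assumptions of this example and not the adapter’s actual interface.

```python
import struct

TARGET_BASE = 0x80000000  # hypothetical base address of emulated target RAM


def target_to_host(target_addr, mem_size):
    """Convert a target address into an offset into the host-side buffer."""
    offset = target_addr - TARGET_BASE
    if not 0 <= offset < mem_size:
        raise ValueError(f"target address {target_addr:#x} is not mapped")
    return offset


def read_u32_be(mem, target_addr):
    """Read a big-endian 32-bit word as stored by the (assumed) target ISA."""
    offset = target_to_host(target_addr, len(mem))
    return struct.unpack_from(">I", mem, offset)[0]


def decode_send_call(mem, arg_block_addr):
    """Decode a hypothetical send(buf_ptr, length) call trapped in the target.

    Both arguments are 32-bit big-endian words; buf_ptr is a target pointer
    that must be translated before the payload can be read on the host.
    """
    buf_ptr = read_u32_be(mem, arg_block_addr)
    length = read_u32_be(mem, arg_block_addr + 4)
    start = target_to_host(buf_ptr, len(mem))
    return bytes(mem[start:start + length])


# Build a small target memory image: argument block at offset 0, payload at 16.
memory = bytearray(64)
struct.pack_into(">II", memory, 0, TARGET_BASE + 16, 5)
memory[16:21] = b"hello"
print(decode_send_call(memory, TARGET_BASE))
```

Explicit format strings such as `">I"` keep the conversion independent of the host’s native byte order, which is the point of a target independent callback format.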

XEMU for Time Estimation and Mutation Testing: We applied two extensions to QEMU in order to provide time estimation (XEMU-T) and mutation testing (XEMU-M) for instruction level target code. For this, we modified the binary translator to instrument the generated back end code. XEMU-T annotates code blocks with static cycle counts according to the CPU timing specification. In order to capture data-dependent delays, evaluation code is inserted into translated blocks which then accumulates additional CPU cycles during execution. XEMU-M emulates small SW faults at run-time by modifying translated code blocks according to a mutation table which is generated by code analysis in advance. More details on this method can be found in [14].
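The combination of a static cycle annotation with inserted evaluation code for data-dependent delays can be sketched as a wrapper around a translated block. The timing rule in mul_extra_cycles is invented for illustration and does not describe any real CPU. The function instrument is a hypothetical analogue of the instrumentation step, not XEMU-T code.

```python
def mul_extra_cycles(state):
    """Made-up ALU model: one extra cycle per significant byte of r1."""
    return max(1, (state["r1"].bit_length() + 7) // 8)


def instrument(block, static_cycles, dynamic_model=None):
    """Wrap a translated block so that it accumulates cycles when executed."""

    def instrumented(state, counter):
        counter["cycles"] += static_cycles              # static annotation
        if dynamic_model is not None:
            counter["cycles"] += dynamic_model(state)   # data-dependent part
        block(state)

    return instrumented


def mul_block(state):
    """Stand-in for a translated block containing one multiply instruction."""
    state["r0"] *= state["r1"]


timed_mul = instrument(mul_block, static_cycles=3, dynamic_model=mul_extra_cycles)

counter = {"cycles": 0}
registers = {"r0": 2, "r1": 0x10}
timed_mul(registers, counter)   # small multiplier: 3 static + 1 dynamic
registers["r1"] = 0x12345
timed_mul(registers, counter)   # larger multiplier: 3 static + 3 dynamic
print(counter["cycles"])
```

The same block contributes 4 cycles on the first execution and 6 on the second, for a total of 10, which a purely static annotation could not express.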

C. Synchronization

Synchronization of time can be either loose or tight, impacting the speed vs. accuracy trade-off. Different schemes can be applied according to the SW abstraction levels. The proposed performance models rely on time annotated functional segments. As such, we do not apply any clock cycle synchronization. SW processes must call CONSUME_CPU_TIME in order to synchronize locally accumulated execution delays, which is limited by the granularity of functional segments. For source level SW models, processes can be segmented down to the level of linear control flow segments according to source code. For instruction level SW models, segments can be smaller, such as compiler basic blocks or even single processor instructions. However, for the sake of performance we apply a more coarse-grained synchronization scheme [18]. As such, local time is decoupled by accumulating time annotations. In order to simulate the application’s data flow in a correct causal relationship, a synchronization must be performed before each communication. Synchronization in TLM 2.0 depends on the coding style of the bus model. For LT buses, synchronization with XEMU system mode can be applied on the level of compiler basic blocks in order to benefit from efficient dynamic binary translation. Thus, we do not propose application of a fixed time quantum as the XEMU execution is driven by translated basic blocks which can have varying execution time estimates. However, the maximum size of a basic block in terms of the instruction count can be limited, resulting in less simulation performance. For AT buses, each memory transaction needs to be synchronized according to the bus protocol. For this, XEMU system mode has to be executed in single instruction mode, providing more accuracy at the cost of a major performance degradation.
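The coarse-grained scheme, accumulating local delays and synchronizing only before a communication, can be sketched as follows. Kernel and DecoupledProcess are hypothetical Python stand-ins for the simulation kernel and a SW process. The counter sync_count exists only to make the saving visible.

```python
class Kernel:
    """Minimal stand-in for the global simulation time."""

    def __init__(self):
        self.now = 0
        self.sync_count = 0

    def consume_cpu_time(self, delay):
        self.now += delay
        self.sync_count += 1  # each call models one kernel synchronization


class DecoupledProcess:
    """SW process that accumulates annotated delays in local time."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.local_delay = 0

    def annotate(self, cycles):
        self.local_delay += cycles  # no kernel interaction here

    def sync(self):
        if self.local_delay:
            self.kernel.consume_cpu_time(self.local_delay)
            self.local_delay = 0

    def communicate(self, channel, value):
        self.sync()  # restore causality before data leaves the process
        channel.append((self.kernel.now, value))


kernel = Kernel()
process = DecoupledProcess(kernel)
channel = []

for _ in range(5):
    process.annotate(10)
process.communicate(channel, "a")
for _ in range(3):
    process.annotate(10)
process.communicate(channel, "b")

print(channel, kernel.sync_count)
```

Eight annotated segments cause only two synchronizations, yet both messages carry the correct time stamps 50 and 80. Synchronizing after every segment would give the same time stamps at four times the number of kernel interactions.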

V. CASE STUDY: AUTOMOTIVE SOFTWARE DESIGN

A. AUTOSAR

The AUTomotive Open System ARchitecture (AUTOSAR) [9] initiative is a partnership of leading automotive companies founded in 2003. The automotive industry is widely organized as chains of suppliers and integrators such as tool vendors, automotive HW/SW suppliers and car manufacturers (OEM). In order to increase interoperability among companies the initiative developed a methodology based on an XML meta model and a well-defined software architecture for Electronic Control Units (ECU). Application SW models in AUTOSAR are strictly component-oriented. Atomic components can be organized hierarchically and connected via ports and interfaces using client/server or sender/receiver patterns. The internal behavior of atomic components is modeled through runnables, data accesses and events. This view is referred to as the Virtual Functional Bus (VFB) which is defined upon the Run-Time Environment (RTE) middleware layer API. The work in [19] describes a mapping of the VFB view to SystemC/TLM concepts which matches the ASM concepts of the HeroeS methodology and framework. By defining the RTE layer the application SW is deployed to task sets running on a network of ECUs. This requires mapping of RTE concepts to the RTOS and communication API, which we refer to as the Task Modeling step. The ECU architecture furthermore defines an interface for the RTE layer, the Microcontroller Abstraction Layer (MCAL), upon which the Basic Software (BSW), i.e., the system software, is defined. Following the HeroeS methodology, such layers can be introduced stepwise. We investigated the mapping of AUTOSAR and HeroeS methodologies by means of a commercial AUTOSAR design environment. For this, we considered the integration of the HeroeS virtual platform framework into the AUTOSAR tool chain provided by dSPACE in order to simulate virtual ECU (V-ECU) networks.

B. Example Application: Fuel Injection Controller

For experimental results, we investigated the SW of a fault-tolerant fuel injection controller, which is part of a motor management system. The fuel injection controller is modeled by a SW component (SWC) which is internally composed of atomic components for sensor correction and for fuel rate computation. The controller requires four sensor signals, such as throttle angle and engine speed. The sensor correction is able to compensate one signal fault at a time by use of approximation. Based on the corrected sensor data, the fuel rate computation calculates a fuel injection rate. The controller’s output signals are transmitted to a further component which drives a combi instrument. For testing purposes, the application comes with another component wrapping a physical model of the engine, which generates stimuli in a closed-loop manner. The generated application C code consists of 10 functions with a total size of 3397 lines of code. According to the AUTOSAR methodology, the application was mapped to a network of three ECUs connected by a CAN bus. We used dSPACE SystemDesk [20] to generate the ECU production code with a total size of approx. 30,000 lines.
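The single-fault compensation of the sensor correction component can be sketched as follows. This Python example is purely illustrative and is not taken from the case study code: the approximation formula, the signal names `signal_3` and `signal_4`, and the function `correct_sensors` are assumptions, since the article only names throttle angle and engine speed and does not specify the approximation.

```python
def correct_sensors(readings, estimators):
    """Replace at most one faulty reading (marked as None) by an approximation.

    readings:   dict mapping signal name to value, or None if the sensor failed
    estimators: dict mapping signal name to a function of the other readings
    """
    faulty = [name for name, value in readings.items() if value is None]
    if len(faulty) > 1:
        raise ValueError("only one signal fault can be compensated at a time")
    corrected = dict(readings)
    for name in faulty:
        corrected[name] = estimators[name](readings)
    return corrected

# Hypothetical approximation: estimate the throttle angle from the engine speed.
estimators = {"throttle_angle": lambda r: r["engine_speed"] / 200.0}

readings = {
    "throttle_angle": None,   # failed sensor
    "engine_speed": 3000.0,
    "signal_3": 0.5,          # placeholder for a signal not named in the article
    "signal_4": 0.4,          # placeholder for a signal not named in the article
}

corrected = correct_sensors(readings, estimators)
print(corrected["throttle_angle"])
```

A second simultaneous fault is rejected rather than approximated, which mirrors the stated limit of one compensated fault at a time.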

C. AUTOSAR Tool Chain Integration

Fig. 4 shows the prototypical integration of the HeroeS framework into the dSPACE tool chain. For this, an interface for the XCP protocol and A2L (ASAP2) descriptions was implemented. Furthermore, we implemented an interface for virtual ECU descriptions (V-ECU) in order to import SW stacks generated by SystemDesk. The platform can be operated in two modes: (i) fast simulation mode and (ii) interactive mode. Mode (i) executes as fast as possible for testing and estimating different system configurations. Mode (ii) uses host-adaptive speed control to interface with an experimentation environment for interactive control and measurement of variables. Fig. 5 shows a snapshot of the running tool setup. Tasks, signals and events can be traced in VCD format through SystemC tracing facilities.
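The difference between the two modes can be sketched with a simple pacing loop. This Python example is an assumption about how host-adaptive speed control might work in principle, not the HeroeS implementation: the simulation waits whenever simulated time runs ahead of host time. The names `pacing_delay` and `run` and the injectable `clock` and `sleep` parameters are hypothetical.

```python
import time

def pacing_delay(sim_time_s, wall_elapsed_s, speed_factor=1.0):
    """Seconds to wait so that simulated time does not run ahead of host time."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return max(0.0, sim_time_s / speed_factor - wall_elapsed_s)

def run(steps, step_s, interactive, clock=time.monotonic, sleep=time.sleep):
    """Advance a dummy simulation and return the total time spent waiting."""
    start = clock()
    sim_time = 0.0
    waited = 0.0
    for _ in range(steps):
        sim_time += step_s  # stands in for executing one simulation quantum
        if interactive:
            delay = pacing_delay(sim_time, clock() - start)
            sleep(delay)
            waited += delay
    return waited

# Fast simulation mode never waits for the host clock.
print(run(steps=4, step_s=0.25, interactive=False))
```

Passing `interactive=True` makes the loop track the host clock, which is what an attached measurement and calibration tool needs in order to observe variables at a usable rate.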

Fig. 4. Prototypical integration into dSPACE tool chain.

VI. RELATED WORK

Embedded software is typically analyzed by formal timing analysis or by means of a virtual prototype. For static formal timing analysis, Worst Case Execution/Response Time (WCET/WCRT) analysis is applied on a frequent basis. WCRT analysis is typically based on an event stream abstraction, where the individual tasks in the model are activated by events [21][22]. The event stream abstraction takes composable event FIFOs at the components’ inputs and outputs for performance analysis. The underlying theory defines a workload for the individual tasks within a specific time interval, so that minimal and maximal distances between events can be determined, e.g., by means of the sliding window technique [23].
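The notion of minimal and maximal event distances can be made concrete on a recorded trace. The following Python sketch slides a window of n consecutive events over a list of activation timestamps. It is an empirical illustration only, not the formal analysis of [21]-[23], and the function name `event_distances` and the trace values are hypothetical.

```python
def event_distances(timestamps, n):
    """Return the minimal and maximal time span covered by any n consecutive events."""
    if n < 2 or n > len(timestamps):
        raise ValueError("n must be between 2 and the number of events")
    ts = sorted(timestamps)
    # Slide a window of n events over the trace and record the span of each window.
    spans = [ts[i + n - 1] - ts[i] for i in range(len(ts) - n + 1)]
    return min(spans), max(spans)

# Activation times of a roughly periodic task with jitter (made-up values, in ms).
trace = [0, 9, 21, 30, 41, 50]

print(event_distances(trace, 2))
print(event_distances(trace, 3))
```

A small minimal distance indicates a burst of activations and therefore a high workload within a short interval, while the maximal distance bounds how long a consumer may have to wait for the next events.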

For the implementation of a virtual system prototype, a system level language, today mainly SystemC, is applied in combination with an abstract RTOS [24] and/or an ISS [25]. Abstract RTOS models have the advantage that they provide a significantly faster simulation speed for time-based simulations at the cost of less accuracy in the simulation results. As such, significant speed-ups of up to 40,000x compared to ISS with simulation errors of less than 2% have been reported [26]. The combination of a SystemC RTOS scheduler and interpretive ISS was first proposed in [27]. Later, more hybrid approaches for source code and instruction level target SW were proposed, such as HVP [28], HySim [29] and HyCoS [30]. However, the authors do not investigate the employment of advanced emulation techniques such as QEMU user mode or full system emulation. The combination of the QEMU full system emulator and SystemC/TLM 2.0 for early SW development was first proposed by GreenSocs [31], focusing on functional aspects. Later, more advanced approaches were introduced, e.g., TIMA RABBITS [32], addressing efficient investigations of non-functional properties such as timing and power. Some other virtual platforms also rely on SystemC/TLM 2.0 and similar DBT technology, such as OVPsim [33] and Synopsys CoMET [34]. However, they are either commercial or their source code is partially closed. Thus, they are not as suitable for academic research. To the best of our knowledge, no existing approach evaluates the combination of QEMU/DBT technology and SystemC models at different HW/SW and SW/SW interfaces such as the system bus, the HAL API and the RTOS API.

The HeroeS approach proposed in this article continues our former work w.r.t. system-level RTOS-aware HW/SW modeling flows. A dedicated approach for the refinement from functional SystemC PV models to SystemC/ISS cosimulation was introduced in [35]. It supports smooth HW/SW partitioning by replacing SystemC threads with POSIX threads without code modifications with the help of the SC2OS library. However, the approach is limited to the functional evaluation of SW and does not consider RTOS properties. Therefore, we extended the flow in [36] to a four-level RTOS-aware TLM 2.0-based refinement that starts from functional untimed SystemC PV models. In a first step, the methodology applies SC2OS before models are ported to a canonical abstract RTOS, namely aRTOS, which is finally replaced by the target RTOS running in the full system mode of the QEMU software emulator. In the third step, aRTOS system calls for abstract RTOS simulation are introduced [16], which are replaced by system calls of the target RTOS in the final step. Later, we refined the flow by combining QEMU user mode emulation with SystemC at the RTOS API [18].

VII. CONCLUSION

We presented the novel HeroeS methodology and virtual platform framework for early SW integration and performance estimation of heterogeneous SW stacks for embedded multi-core and real-time architectures. The implemented virtual platform framework is based on SystemC, employing several state of the art abstraction techniques for generating fast simulation models by means of TLM, RTOS/HAL models and extended QEMU user/system mode emulation. Our approach was evaluated by prototypical integration into a commercial AUTOSAR design environment. Our future work will focus on scalability investigations w.r.t. the performance vs. accuracy trade-off that can be provided by the proposed platform models.

VIII. ACKNOWLEDGEMENTS

This work was partly funded by the German Ministry of Education and Research (BMBF) through the project SANITAS (16M3088I), the DFG SFB 614, the ITEA2 projects VERDE (01S09012) and TIMMO-2-USE (01IS10034).

REFERENCES

[1] Homepage of SystemC: http://www.systemc.org/.

[2] L. Cai and D. Gajski, “Transaction Level Modeling: An Overview,” in Proc. of the Int. Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2003.

[3] A. Clouard, K. Jain, F. Ghenassia, L. Maillet-Contoz, and J.-P. Strassen, “Using Transactional Level Models in a SoC Design Flow,” in SystemC – Methodologies and Applications. Springer, 2003.

[4] F. Ghenassia (ed.), “Transaction-Level Modeling with SystemC.” Springer, 2005.

Fig. 5. SystemC virtual platform connected to measurement and calibration environment via XCP/A2L interfaces.

[5] OSCI. Transaction Level Modeling Library, Release 1.0. April 2005.

[6] OSCI. Transaction Level Modeling Library, Release 2.0.1. July 2009.

[7] W. Ecker, V. Esen, T. Steininger, and M. Velten, “HW/SW Interface Implementation and Modeling,” in Hardware-dependent Software, 2009, pp. 95–150.

[8] G. Schirner and R. Domer, “Introducing Preemptive Scheduling in Abstract RTOS Models Using Result Oriented Modeling,” in Proceedings of Design, Automation and Test in Europe (DATE), 2008.

[9] Homepage of AUTOSAR: http://www.autosar.org/.

[10] The Open Group. IEEE Std 1003.1-2008. [Online]. Available: http://pubs.opengroup.org/onlinepubs/9699919799/

[11] Homepage of OpenMP: http://www.openmp.org/.

[12] Homepage of MCAPI: http://www.multicore-association.org/.

[13] Homepage of AbsInt: http://www.absint.com/.

[14] M. Becker, C. Kuznik, M. Joy, T. Xie, and W. Mueller, “Binary Mutation Testing Through Dynamic Translation,” in Proc. of the 42nd International Conference on Dependable Systems and Networks (DSN), 2012.

[15] M. Becker, D. Baldin, C. Kuznik, M. Joy, T. Xie, and W. Mueller, “XEMU: An Efficient QEMU Based Binary Mutation Testing Framework for Embedded Software,” in Proc. of the International Conference on Embedded Software (EMSOFT), 2012.

[16] H. Zabel, W. Mueller, and A. Gerstlauer, “Accurate RTOS Modeling and Analysis with SystemC,” in Hardware-dependent Software, 2009, pp. 233–260.

[17] M. Joy, M. Becker, W. Mueller, and E. Mathews, “Automated Source Code Annotation for Timing Analysis of Embedded Software,” in Proc. of the Advanced Computing and Communications Conference (ADCOM), 2012.

[18] M. Becker, H. Zabel, and W. Mueller, “A Mixed Level Simulation Environment for Stepwise RTOS Refinement,” in Proc. of the Conference on Distributed and Parallel Embedded Systems (DIPES), 2010.

[19] M. Krause, O. Bringmann, and W. Rosenstiel, “Verification of AUTOSAR Software by SystemC-Based Virtual Prototyping,” in Hardware-dependent Software, 2009, pp. 267–293.

[20] Homepage of dSPACE GmbH: http://www.dspace.com/.

[21] S. Chakraborty, S. Kuenzli, and L. Thiele, “A General Framework for Analysing System Properties in Platform-Based Embedded System Design,” in Proc. of the Design, Automation and Test in Europe Conference (DATE), 2003.

[22] S. Schliecker, A. Hamann, R. Racu, and R. Ernst, “Formal Methods for System Level Performance Analysis and Optimization,” in Proc. of the Design Verification Conference (DVCON), 2008.

[23] J. Lehoczky, “Fixed Priority Scheduling of Periodic Task Sets with Arbitrary Deadlines,” in Proc. of the Real-Time Systems Symposium (RTSS), 1990.

[24] A. Gerstlauer, H. Yu, and D. Gajski, “RTOS Modeling for System Level Design,” in DATE’03: Design, Automation and Test in Europe, 2003.

[25] A. Nohl, G. Braun, O. Schliebusch, R. Leupers, H. Meyr, and A. Hoffmann, “A Universal Technique for Fast and Flexible Instruction-Set Architecture Simulation,” in Proc. of the Design Automation Conference (DAC), 2002.

[26] G. Schirner, A. Gerstlauer, and R. Domer, “Abstract, Multifaceted Modeling of Embedded Processors for System Level Design,” in Proc. of the Conference on Asia South Pacific Design Automation (ASPDAC), 2007.

[27] M. Krause, D. Englert, O. Bringmann, and W. Rosenstiel, “Combination of Instruction Set Simulation and Abstract RTOS Model Execution for Fast and Accurate Target Software Evaluation,” in Proc. of the Int. Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2008.

[28] J. Ceng, W. Sheng, J. Castrillon, A. Stulova, R. Leupers, G. Ascheid, and H. Meyr, “A High-Level Virtual Platform for Early MPSoC Software Development,” in Proc. of the Int. Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2009.

[29] S. Kraemer, L. Gao, J. Weinstock, R. Leupers, G. Ascheid, and H. Meyr, “HySim: A Fast Simulation Framework for Embedded Software Development,” in Proc. of the Int. Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2007.

[30] Z. Wang and J. Henkel, “HyCoS: Hybrid Compiled Simulation of Embedded Software with Target Dependent Code,” in Proc. of the Int. Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2012.

[31] M. Monton, A. Portero, M. Moreno, B. Martinez, and J. Carrabina, “Mixed SW/SystemC SoC Emulation Framework,” 2007.

[32] M. Gligor, N. Fournel, and F. Petrot, “Using Binary Translation in Event Driven Simulation for Fast and Flexible MPSoC Simulation,” in Proc. of the Int. Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2009.

[33] Homepage of Open Virtual Platforms. http://www.ovpworld.org/.

[34] Homepage of Synopsys. http://www.synopsys.org/.

[35] P. Destro, F. Fummi, and G. Pravadelli, “A Smooth Refinement Flow for Co-Designing HW and SW Threads,” in Proc. of Design, Automation and Test in Europe (DATE), 2007.

[36] M. Becker, G. D. Guglielmo, F. Fummi, W. Mueller, G. Pravadelli, and T. Xie, “RTOS-Aware Refinement for TLM2.0-based HW/SW Designs,” in Proc. of the Conference on Design, Automation and Test in Europe (DATE), 2010.
