
Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Date post: 13-Jan-2016
Description:
Reliability study of an embedded operating system for industrial applications. Pardo, J., Campelo, J.C., Serrano, J.J. Juan Pardo, Fault Tolerant Systems Group, Polytechnic University of Valencia, Spain.
Transcript
Page 1: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Reliability study of an embedded operating system for industrial applications

Pardo, J., Campelo, J.C., Serrano, J.J.

Juan Pardo
Fault Tolerant Systems Group
Polytechnic University of Valencia, Spain

Page 2: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

SEPT’04 WSRS '04

Research Objectives

Critical industrial applications and fault-tolerant applications need operating systems (OS) that guarantee correct and safe behaviour despite the appearance of errors.

To validate the behaviour of an operating system in the presence of errors, software fault injection techniques can be used.

These techniques can be used to corrupt the information of some of the operating system calls, to see how the system reacts to invalid or corrupted values at the kernel calls.

Page 3: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Research Objectives

The research work presented covers the development of, and the results from, software fault injection in an embedded system composed of a Real-Time Operating System (RTOS) and a microcontroller.

A software fault injection tool has been developed. The proposed methodology treated the operating system as a black box whose source code was not available.

With this objective, a layer between the operating system and the application to be executed has been developed.

OS error detection coverage has been measured, and observations have been made about critical OS data structures that should be improved in order to increase the final robustness of the operating system.

Page 4: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Introduction

Computer software is involved in many aspects of our lives.

Despite their enormous expansion, software systems are still far from perfect.

Measuring the quality of software requires testing.

Fault tolerance deals with software's ability to hide problems, specifically the effects of faults [Voas98].

Robustness is the degree to which a system operates correctly in the presence of exceptional inputs or stressful environmental conditions.

Robustness can thus be viewed as an indication of the OS's capacity to resist or react to faults induced by the applications running on top of it, or originating from the hardware layer or from device drivers [DBench02].

Page 5: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Introduction

Fault Tolerant System

Fault tolerance is intended to preserve the delivery of correct service in the presence of active faults. It is generally implemented by error detection and subsequent system recovery.

A system able to continue working despite the appearance of errors.

Safe behaviour: a known state which doesn't produce any risk to the system.

Dependability

• To avoid the loss of human lives or important economic quantities
• Final product quality
• Validation before going to market

Page 6: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Introduction

Dependability

• Attributes: Availability, Reliability, Safety, Confidentiality, Integrity, Maintainability
• Means: Fault prevention, Fault tolerance, Fault removal, Fault forecasting
• Threats: Faults, Errors, Failures

Dependability: Dependability of a computing system is the ability to deliver service that can justifiably be trusted.

A. Avizienis, J.-C. Laprie, B. Randell

Page 7: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

State of the art: Fault Injection

[Diagram: fault injection techniques]
• FI on simulated models: VHDL simulation models, other languages
• FI on prototypes: hardware injection (HWIFI), software injection (SWIFI)
• HWIFI (external, internal): HWIFI at pin level, electromagnetic perturbations, heavy-ion radiation, laser radiation, scan chain
• SWIFI: time level (static, dynamic); high level, machine language

Injection objectives:
• Prediction
• Elimination

Page 8: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Advantages & drawbacks (SWIFI)

• Total control over when and where to inject: controllability
• Simulation of higher-level faults
• Reduced cost
• Higher reachability
• Higher portability: flexibility
• Low risk of damaging the circuit under test
• Easy automation of the injection campaigns
• Good observability: processors have more and more internal tools for debugging

Page 9: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Advantages & drawbacks (SWIFI)

• There are zones which software cannot reach.
• Less precise timing measurements: interference with the system, overload, etc.
• Injection and activation agents overload the system.
• Runtime injection: little intrusion
• Objective: minimize the overload
• Drawback for RTOS
• Easy automation of injection campaigns
• Pre-runtime injection: less intrusion

Page 10: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

SW Fault Injection

SW fault injection tools:

• FIAT: Fault Injection Based Automated Testing Environment, Carnegie Mellon University.
• EFI, PROFI: Processor Fault Injector, Dortmund University.
• FERRARI: Fault and ERRor Automatic Real-time Injector, Texas University.
• SFI, DOCTOR: integrateD sOftware implemented fault injeCTiOn enviRonment, Michigan University.
• FINE: Fault Injection and moNitoring Environment, Illinois University.
• FTAPE: Fault Tolerance and Performance Evaluator, Illinois University.
• XCEPTION: Coimbra University.
• MAFALDA, MAFALDA-RT: Microkernel Assessment by Fault injection AnaLysis and Design Aid, LAAS-CNRS, Toulouse.
• BALLISTA: Carnegie Mellon University.

Page 11: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Tools

• MicroC/OS-II RTOS
• Infineon C166 microcontroller
• Tasking compiler, debugger, ...

[Figure: Applications for the C166 family (C161, C163, C164, C165, C166, C167)]
• Automotive: engine management, transmission control, ABS/ASK, active suspension
• Industrial control: robotics, PLCs, servo drives, motor control, power inverters, machine-tool control (CNC)
• Consumer: DVD / CD-ROM, TV / monitor, VCR / sat receiver, set-top box, games, video surveillance
• Telecom / Datacom: communication boards (LAN), modems, PBX, mobile communication
• EDP: hard disk drives, tape drives, printers, scanners, digital copiers, fax machines

[Figure: C166 block diagram: core (CPU), ROM / Flash, RAM 1 KByte, XRAM 1 KByte, interrupt unit with PEC control, watchdog timer (WDT), oscillator, ports, external bus control, X-Bus, CAPCOM 1+2, ADC, PWM, USART, SSC, GPT 1+2, CAN]

Infineon microcontroller characteristics:
• 16 bits, high performance
• On-chip CMOS
• 16.5 MIPS, 25/33 MHz
• Advantages from CISC & RISC
• High functionality for peripherals
• Typical for automotive

Page 12: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

COTS components

The main motivation for using Commercial Off-The-Shelf (COTS) components in a system design is the notable cost reduction associated with final product development.

The use of COTS components is a cost-effective method for rapid prototyping of complex software systems.

On the other hand, COTS software components have serious certification problems because their design process is unknown.

Page 13: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

COTS components

COTS software is composed of general-purpose components which have poor dependability specifications.

Usually, COTS components are like a black box: the source code is not available and their internal architecture (structure and data flow) is not adequately documented.

Page 14: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

µC/OS-II Operating System

It was selected because it has been widely used for several years.

First version: MicroC/OS, 1992

Industrial robots, motor control, medical instruments, etc.

It is 99% compliant with the Motor Industry Software Reliability Association (MISRA) C Coding Standards.

All Modified Condition Decision Coverage (MCDC) code in MicroC/OS-II has been removed, improving code quality for RTCA / EUROCAE DO-178B Level A-certified environments for avionics applications.

Validated Software Comp.

Page 15: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

µC/OS-II: Characteristics

Portable: uC/OS-II is written in highly portable ANSI C, with target microprocessor-specific code written in assembly language.

ROMable: uC/OS-II was designed for embedded applications. This means that if you have the proper tool chain (i.e., C compiler, assembler, and linker/locator), you can embed uC/OS-II as part of a product.

Scalable: it's possible to use only the services needed in the application. This reduces the amount of memory (both RAM and ROM) needed. Scalability is accomplished with the use of conditional compilation.

Preemptive: uC/OS-II is a fully preemptive real-time kernel. This means that uC/OS-II always runs the highest priority task that is ready.

Multitasking: uC/OS-II can manage up to 64 tasks; however, the current version of the software reserves eight of these tasks for system use. This leaves your application up to 56 tasks. Each task has a unique priority assigned to it, which means that uC/OS-II cannot do round-robin scheduling.

Jean J. Labrosse
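The preemptive rule described above (always run the highest-priority ready task, with at most one task per priority) can be illustrated with a small sketch. It is not taken from the uC/OS-II sources: the bitmap representation and the names `ReadySet`, `make_ready`, `make_not_ready` and `highest_ready` are illustrative assumptions, and the sketch assumes that a numerically lower priority value means a higher priority.

```python
# Illustrative sketch only; not taken from the uC/OS-II sources.
# Assumption: priorities run from 0 to 63 and a LOWER number means a HIGHER priority.
from typing import Optional

MAX_TASKS = 64  # from the slide: up to 64 tasks, one per priority level


class ReadySet:
    """Tracks ready tasks as a 64-bit bitmap, one bit per priority."""

    def __init__(self) -> None:
        self.bitmap = 0

    def make_ready(self, prio: int) -> None:
        if not 0 <= prio < MAX_TASKS:
            raise ValueError("priority out of range")
        self.bitmap |= 1 << prio

    def make_not_ready(self, prio: int) -> None:
        self.bitmap &= ~(1 << prio)

    def highest_ready(self) -> Optional[int]:
        """Return the numerically lowest ready priority, or None if nothing is ready."""
        if self.bitmap == 0:
            return None
        # x & -x isolates the lowest set bit; bit_length() - 1 gives its index.
        return (self.bitmap & -self.bitmap).bit_length() - 1
```

Because each priority holds at most one task, choosing the next task reduces to choosing one bit, which is consistent with the slide's remark that round-robin scheduling is not available.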

Page 16: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

µC/OS-II: Characteristics

Deterministic: The execution time of all uC/OS-II functions and services is deterministic. You can always know how much time uC/OS-II will take to execute a function or a service. Furthermore, the execution time of all uC/OS-II services does not depend on the number of tasks running in your application.

Task Stacks: Each task requires its own stack; uC/OS-II allows each task to have a different stack size. This allows you to reduce the amount of RAM needed in your application.

Services: system services such as mailboxes, queues, semaphores, fixed-sized memory partitions, time-related functions, etc.

Interrupt Management: Interrupts can suspend the execution of a task. If a higher priority task is awakened as a result of the interrupt, the highest priority task will run as soon as all nested interrupts complete. Interrupts can be nested up to 255 levels deep.

Robust and Reliable: uC/OS-II is based on uC/OS, which has been used in hundreds of commercial applications since 1992.

Jean J. Labrosse

Page 17: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Black-box approach

The aim was to use a black-box approach for the OS study.

The OS source code was therefore not modified, to avoid intrusion in the OS behaviour as far as possible.

With this objective, a layer named Meta-Kernel was developed between the OS and the application to be executed.

Through this layer, faults were injected into any of the parameters of the system calls to measure the OS robustness.

In black-box testing, input is fed into a program and the output is checked. What goes on inside the program (the black box) is unimportant (Voas98).

COTS SW

Page 18: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

System Design

MicroC/OS-II OS
• Black box
• OS source code not modified

Injector
• Layer between the OS and the application
• Injection on the parameters of system calls
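The idea of a layer that sits between the application and an unmodified OS can be sketched as follows. This is a minimal illustration and not the authors' Meta-Kernel: the class `MetaKernel`, its methods, the stand-in service `fake_time_dly` and the 16-bit parameter width are all hypothetical choices made for the example.

```python
# Minimal sketch of an interposition layer; all names here are hypothetical.
from typing import Callable, Optional, Tuple


def flip_bit(value: int, bit: int, width: int = 16) -> int:
    """Flip one bit of an unsigned integer of the given width."""
    return (value ^ (1 << bit)) & ((1 << width) - 1)


class MetaKernel:
    """Forwards application calls to the (unmodified) OS, corrupting one if armed."""

    def __init__(self) -> None:
        # Pending fault: (call name, parameter index, bit), or None.
        self.pending: Optional[Tuple[str, int, int]] = None
        self.log = []

    def arm(self, call: str, param: int, bit: int) -> None:
        self.pending = (call, param, bit)

    def invoke(self, name: str, os_call: Callable[..., int], *args: int) -> int:
        params = list(args)
        if self.pending and self.pending[0] == name:
            _, idx, bit = self.pending
            params[idx] = flip_bit(params[idx], bit)
            self.pending = None  # one transient fault per experiment
        result = os_call(*params)
        self.log.append((name, tuple(params), result))
        return result


def fake_time_dly(ticks: int) -> int:
    """Stand-in for an OS service: returns 0 on success, 1 on an invalid argument."""
    return 0 if 0 < ticks <= 1000 else 1
```

The application calls `invoke` instead of the OS service directly, so the OS code itself stays untouched and the log records what the OS actually received and returned.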

Page 19: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Injector Attributes

[Diagram: software fault injection attributes]
• Objectives: fault prediction, fault removal
• Time: pre-runtime, runtime
• Faults: level, localization, persistence, type, duration
• Multiplicity: number of faults injected simultaneously in each experiment
• Workload: real applications, benchmarks, synthetic programs

Injector attributes:
• Prediction, elimination
• Pre-runtime & runtime
• High level
• Transient faults
• Changing one bit at the system calls (bit-flip)
• One fault injected in each experiment
• Workload for tool testing

Page 20: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Workload Design

Characteristics:
• Makes maximum use of system calls
• System calls for synchronization, semaphores, memory, queues, messages, task handling, timing management, etc.
• Open module to include calculations
• Workload for testing the injection tool and the OS

Page 21: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Workload Design

The system workload ran continuously and consisted of a series of tasks executing the application.

Meanwhile, an injection agent was in charge of injecting faults and invalid values at the kernel calls in order to monitor the system robustness.

Page 22: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Errors Classification

• Errors which could affect the system
• Classification related to the detection mechanisms
• Measures of error detection coverage and latency times

[Diagram: events after the fault injection]
• Detected errors: OS error code, C167 error code, application error
• No error (correct result): system call not used, or system call used but the injection has no effect
• Others: not safe faults (NFS)

Page 23: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Injection Model

The faultload is the most critical dimension of an OS benchmark, and more generally of any dependability benchmark.

Two techniques for system call parameter corruption could be used: the 'bit-flip technique', which systematically flips bits of the target parameters, and the 'selective substitution technique', in which invalid data values are introduced in the system call parameters.

Studies have demonstrated the equivalence of the errors provoked by the two techniques [DBench02].
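The difference between the two corruption techniques can be made concrete with a short sketch. The function names, the 8-bit width and the particular set of invalid values are assumptions chosen for illustration; they are not taken from the presentation or from [DBench02].

```python
# Sketch of the two parameter-corruption techniques; names and values are illustrative.

def bit_flips(value: int, width: int = 8):
    """Bit-flip technique: systematically flip each bit of the parameter in turn."""
    return [value ^ (1 << bit) for bit in range(width)]


# Assumption: a hand-picked set of boundary values for an 8-bit parameter.
INVALID_VALUES = (0x00, 0xFF, 0x80)


def selective_substitutions(value: int, invalid=INVALID_VALUES):
    """Selective substitution: replace the parameter with chosen invalid values."""
    return [v for v in invalid if v != value]
```

Bit flipping needs no knowledge of what the parameter means, while selective substitution relies on someone choosing values that are invalid for that particular parameter.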

Page 24: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Injection Model

Bit-flip technique

Chosen randomly at runtime:
1. System call
2. Parameter
3. Bit

Consequence of physical faults:
• EMI interference
• Noise
• Hardware faults
• ...
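The three random choices listed above (system call, then parameter, then bit) can be sketched as below. The catalogue `SYSCALLS`, its call names and the parameter widths are hypothetical; a seeded generator is used so that an injection campaign can be repeated.

```python
import random

# Hypothetical catalogue: system call name -> bit width of each parameter.
SYSCALLS = {
    "sem_pend": [16, 16],
    "task_create": [16, 16, 8],
    "time_dly": [16],
}


def pick_target(rng: random.Random):
    """Randomly choose (1) a system call, (2) one of its parameters, (3) one bit."""
    call = rng.choice(sorted(SYSCALLS))
    param = rng.randrange(len(SYSCALLS[call]))
    bit = rng.randrange(SYSCALLS[call][param])
    return call, param, bit
```

Passing the same seed reproduces the same target, which helps when an experiment that caused a hang needs to be re-run.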

Page 25: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Analysis of the obtained results

Codification of the different output values:
• D0: No error, correct output (the fault injection didn't affect the system).
• D1: Error detected by the operating system (µC/OS-II error code).
• D2: Error detected by the application (the application result was not correct).
• D3: Error which caused the system to hang (system failure).
• D4: Error detected by the microcontroller.

Page 26: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Analysis of the obtained results

[Figure: bar chart of the percentage of injections in each detection class (DETECC), D0 to D4]

Detection class (DETECC):

Class   Frequency   Percentage   Valid percentage   Cumulative percentage
D0      756         65.7         65.7               65.7
D1      241         21.0         21.0               86.7
D2      23          2.0          2.0                88.7
D3      101         8.8          8.8                97.5
D4      29          2.5          2.5                100.0
Total   1150        100.0        100.0

Coverage [Powell95, Constantinescu95]:

Complete system (µC/OS-II + microcontroller):
C_cs = D0 + D1 + D2 + D4 = 65.7 + 21.0 + 2.0 + 2.5 = 91.2 %

Operating system (µC/OS-II):
C_OS = D0 + D1 = 86.7 %
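The two coverage figures can be recomputed directly from the frequencies in the table. The sketch below uses the counts reported on the slide; the function name `coverage` and the dictionary layout are illustrative.

```python
# Counts per outcome class, copied from the frequency table (1150 injections in total).
COUNTS = {"D0": 756, "D1": 241, "D2": 23, "D3": 101, "D4": 29}


def coverage(counts, covered):
    """Percentage of injections whose outcome falls in one of the `covered` classes."""
    total = sum(counts.values())
    return 100.0 * sum(counts[c] for c in covered) / total


c_cs = coverage(COUNTS, ["D0", "D1", "D2", "D4"])  # complete system
c_os = coverage(COUNTS, ["D0", "D1"])              # operating system alone
```

Rounded to one decimal place these give 91.2 % and 86.7 %; the remaining 8.8 % of the injections are the D3 cases in which the system hung.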

Page 27: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Analysis of the obtained results

Error detection latencies
• Time between the injection and detection by the OS
• Mean value obtained: 304 μs
• One built-in timer of the microcontroller used to measure latencies
• High precision

Descriptive statistics for latency (LATENC), N = 241:

Statistic                                     Value        Std. error
Mean                                          0.30422573   0.0197
95% confidence interval for mean, lower bound 0.26533773
95% confidence interval for mean, upper bound 0.34311372
5% trimmed mean                               0.27924537
Median                                        0.12800000
Variance                                      0.09392
Std. deviation                                0.30646466
Minimum                                       0.102400
Maximum                                       0.972800
Range                                         0.870400
Interquartile range                           0.49920000
Skewness                                      1.213        0.157
Kurtosis                                      -0.287       0.312

[Figure: box plot of LATENC, N = 241, axis from 0.0 to 1.2]
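The standard error and the 95% confidence interval in the table follow from the mean, the standard deviation and the sample size. The sketch below reproduces them; it assumes the table values are in milliseconds (which matches the 304 μs mean quoted on the slide) and hard-codes an approximate Student's t critical value for 240 degrees of freedom.

```python
import math

# Values copied from the descriptive statistics table (assumed to be in milliseconds).
N = 241
MEAN = 0.30422573
STD_DEV = 0.30646466

# Assumption: two-sided 95% critical value of Student's t with N - 1 = 240
# degrees of freedom, hard-coded to avoid depending on a statistics library.
T_CRIT = 1.9699

std_err = STD_DEV / math.sqrt(N)   # standard error of the mean
lower = MEAN - T_CRIT * std_err    # lower bound of the 95% interval
upper = MEAN + T_CRIT * std_err    # upper bound of the 95% interval
```

The median (0.128) lies well below the mean, in line with the positive skewness reported in the table: most errors are detected quickly and a smaller number take much longer.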

Page 28: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Other Results

Frequency tables of the most typical error codes given by the OS

Valid data   Frequency   Percentage   Cumulative percentage   Error code
E1           111         41.1         41.1                    OS_ERR_EVENT_TYPE
E11          14          5.2          46.3                    OS_MEM_INVALID_PART
E40          8           3.0          49.3                    OS_TASK_DEL_ERR
E41          3           1.1          50.4                    OS_PRIO_ERR
E42          69          25.6         75.9                    OS_PRIO_INVALID
E60          13          4.8          80.7                    OS_TASK_DEL_ERR
E81          11          4.1          84.8                    OS_TIME_INVALID_MINUTES
E82          2           0.7          85.6                    OS_TIME_INVALID_SECONDS
E83          10          3.7          89.3                    OS_TIME_INVALID_MILLI
Ex           29          10.7         100.0                   NO CODE
Total        270         100.0

'E1' ('OS_ERR_EVENT_TYPE') was the most frequent error code. It was produced when the fault was injected in a semaphore, message queue or mailbox. The system reacted by going to a hanging state.

The second most frequent, error code 'E42' ('OS_PRIO_INVALID'), was obtained when the injection was at system calls for task management.

Page 29: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Other Results

Error Propagation

Moreover, after the injection campaigns it was possible to see how errors propagated through the system. The corrupted system call was recorded, together with the system call that finally detected the error and the time the system took to detect it.

[Table: cross-tabulation of the corrupted system call (LLAMAD) against the system call that detected the error (PROPAG), 270 errors in total]

Totals per corrupted system call (LLAMAD):
LL1 31, LL10 4, LL13 5, LL15 19, LL16 12, LL17 1, LL18 5, LL19 1, LL20 6, LL21 2, LL22 34, LL23 4, LL24 19, LL28 32, LL3 5, LL30 23, LL4 14, LL5 14, LL50 29, LL6 4, LL8 4, LL9 2

Totals per detecting system call (PROPAG):
LL10 8, LL13 5, LL15 19, LL16 9, LL17 4, LL18 5, LL20 9, LL22 29, LL23 4, LL24 28, LL28 32, LL3 5, LL30 23, LL4 45, LL5 14, LL50 29, LL9 2
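A propagation table of this kind can be built by counting pairs of (corrupted call, detecting call). The sketch below shows the bookkeeping only: the `LLnn` labels follow the slide, but the pairs in `EXPERIMENTS` are made up for illustration and are not the authors' data.

```python
from collections import Counter

# Hypothetical experiment log: (corrupted call, call that detected the error).
# The LLnn labels follow the slide; these particular pairs are made up.
EXPERIMENTS = [
    ("LL16", "LL16"),
    ("LL16", "LL10"),
    ("LL22", "LL22"),
    ("LL22", "LL18"),
    ("LL22", "LL22"),
]


def propagation_table(pairs):
    """Count how often an error injected in one call is detected by each call."""
    return Counter(pairs)


def propagated(pairs):
    """Experiments in which the detecting call differs from the corrupted one."""
    return [(src, dst) for src, dst in pairs if src != dst]
```

Entries where the two labels differ are the cases in which the error left the corrupted call undetected and was caught later by another service.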

Page 30: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Other Results

Finally, results on which system calls were the most critical were obtained, with the aim of improving their robustness and, ultimately, the final OS dependability.

For example, there are some data structures, related to the event control block, in which the injection produced many failures, and most of the time the system hung.

This is because these structures store the list of tasks waiting for some event, so if the injection corrupts that information, the system loses the sequence of the next actions and goes to a non-safe state without knowing how to react (the system hangs).

This shows where special attention is needed, because an error in those data structures could provoke critical failures in the system.

Page 31: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Conclusions

After the experiments, the error detection coverage, error detection latency times, error propagation, typical OS error codes, etc. have been obtained.

Fault injection into the code and data memory segments of the microkernel will be implemented too.

Possible improvements to increase the dependability of MicroC/OS-II should take into account that some detected errors in certain data structures could provoke critical failures in the system.

The data structures identified should implement some mechanism to protect the information they host.
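The presentation does not say which protection mechanism is meant. As one possible illustration (an assumption, not the authors' proposal), a checksum can be stored next to a wait list and verified before the list is used, so that a flipped bit is detected instead of silently followed. The class `GuardedWaitList` and the 64-bit bitmap layout are hypothetical.

```python
import struct
import zlib

# One possible protection mechanism (an assumption, not the authors' proposal):
# keep a CRC-32 next to the wait list and verify it before every use.


class GuardedWaitList:
    """A 64-bit 'tasks waiting for this event' bitmap guarded by a checksum."""

    def __init__(self, bitmap: int = 0) -> None:
        self.bitmap = bitmap
        self.crc = self._checksum(bitmap)

    @staticmethod
    def _checksum(bitmap: int) -> int:
        # Pack the bitmap as 8 little-endian bytes and compute its CRC-32.
        return zlib.crc32(struct.pack("<Q", bitmap))

    def update(self, bitmap: int) -> None:
        """Legitimate update: store the new bitmap and refresh the checksum."""
        self.bitmap = bitmap
        self.crc = self._checksum(bitmap)

    def is_intact(self) -> bool:
        """True if the stored checksum still matches the bitmap."""
        return self.crc == self._checksum(self.bitmap)
```

A check like this only detects the corruption; recovering from it (for example by rebuilding the list) would need a separate mechanism, and the extra computation has a timing cost that matters in a real-time kernel.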

Page 32: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Future Research

In future work, these data have to be compared with other COTS RTOSs working under the same conditions.

RT fault injector to minimize intrusion (without internal debug support, intrusion > 0)

Nexus-implemented fault injection
• Other architecture: Motorola MPC565
• Intrusion → null
• Preliminary results
• Better controllability and observability
• Best option to validate RTOS and applications

Page 33: Juan Pardo Fault Tolerant Systems Group Polytechnic University of Valencia Spain

Contact Data

Juan Pardo
Fault Tolerant Systems Group
Polytechnic University of Valencia, Spain

Email: [email protected]
Web: http://www.disca.upv.es/gstf/

