What activates a bug? A refinement of the Laprie terminology model.
Page 1: What activates a bug? A refinement of the Laprie terminology model.

Peter Tröger*, Lena Feinbube+, Matthias Werner*

26th IEEE International Symposium on Software Reliability Engineering

*Operating Systems Group, Technical University Chemnitz, Germany +Operating Systems and Middleware Group, Hasso-Plattner-Institute, Germany

WAP: What activates a bug? A refinement of the Laprie terminology model

Page 2: What activates a bug? A refinement of the Laprie terminology model.


Describing Software Bugs

▶ 'Buggy' code producing an error only in the 'right' state

▶ Dormant design fault, activated by execution?
▶ Dormant design fault, activated for some state of argv[1]?
▶ Erroneous argument as external fault?
▶ Erroneous argument as propagating error?
▶ Mandelbug?


#include <string.h>

#define BUFSIZE 256

int main(int argc, char **argv) {
    char buf[BUFSIZE];
    strcpy(buf, argv[1]);
}

[CWE ID 121]
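The question can be made concrete by separating the code defect from its activating states. The sketch below is our own illustrative reading of the slide's example (the predicate name `fault_condition` and the bounded variant are assumptions, not part of the original): the strcpy defect is only activated when argv[1] is missing or at least BUFSIZE characters long.

```c
#include <string.h>
#include <stdio.h>

#define BUFSIZE 256

/* Hypothetical activation predicate: the overflow fault in the slide's
 * code is only activated when the argument state fulfils this condition
 * (a missing argv[1] is a second, distinct activation condition for the
 * strcpy call). */
int fault_condition(const char *arg) {
    return arg == NULL || strlen(arg) >= BUFSIZE;
}

/* A variant whose set of activating states is empty: truncation
 * replaces the overflow. */
void safe_copy(char buf[BUFSIZE], const char *arg) {
    if (arg == NULL) arg = "";
    strncpy(buf, arg, BUFSIZE - 1);
    buf[BUFSIZE - 1] = '\0';
}
```

In the terminology discussed later, `fault_condition` describes the fault-enabling environment state, while the call to strcpy is the fault activation.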

Page 3: What activates a bug? A refinement of the Laprie terminology model.


Terminology in Use

▶ Meta-study of 144 SE papers
▶ Different terminology models in use
▶ Orthogonal Defect Classification (ODC)

R. Chillarege, I. S. Bhandari, J. K. Chaar, M. J. Halliday, D. S. Moebus, B. K. Ray, and M.-Y. Wong, "Orthogonal defect classification – a concept for in-process measurements," IEEE Transactions on Software Engineering, vol. 18, no. 11, pp. 943–956, 1992.

▶ IEEE Software Engineering Glossary
J. Radatz, A. Geraci, and F. Katki, "IEEE Standard Glossary of Software Engineering Terminology," IEEE Std 610.12-1990, 1990.

▶ Laprie / Avizienis
A. Avizienis, J.-C. Laprie, B. Randell, and C. Landwehr, "Basic concepts and taxonomy of dependable and secure computing," IEEE Transactions on Dependable and Secure Computing, vol. 1, no. 1, pp. 11–33, 2004.

▶ Binder
R. Binder, "Testing Object-Oriented Systems: Models, Patterns, and Tools." Addison-Wesley Professional, 2000.

▶ Cristian
F. Cristian, "Understanding fault-tolerant distributed systems," Communications of the ACM, vol. 34, pp. 56–78, 1991.


Fig. 12: Occurrences of tag combinations from the terminology and target categories. The ODC terminology is mainly used to describe code and interface problems, whereas the usage of the IEEE terminology is more widespread across targets such as architecture, test and specification. INTERPRETATION: ODC is tailored mainly for code aspects. The code and interface targets are well covered by ODC, as they are explicitly described by the ODC "Target" and "Defect Type" categories. Testing and requirements are also discussed in the ODC papers, but mainly in the context of how a bug can be reproduced or by mapping code defects to requirements. On the other hand, the IEEE has standardized terminology not only for software faults, but also for many other aspects in the whole software development life cycle. This might explain why the IEEE terminology is used for a broader range of targets.

Fig. 13: Occurrences of tag combinations from the model and terminology categories. The diverging sets of terminology appear to be used for different types of models: while ODC and IEEE are strong with fault models, the Laprie terminology is also frequently used for error models, as well as meta studies. ODC appears to have a broader range of applicability than the IEEE terminology. INTERPRETATION: The definitions of error states and failure events in ODC are based on Laprie, and therefore use a precise state-based definition. On the other hand, IEEE defined "fault" (error in our terminology) as a "manifestation of an error in software", which is rather vague. It may be the case that IEEE terminology is tailored mainly to suit static assets.

Page 4: What activates a bug? A refinement of the Laprie terminology model.


Goal

▶ Vocabulary is crucial
▶ Fault, error, failure, defect, bug, problem, recovery, outage, crash, ...
▶ Team communication, document writing

▶ Terminology model for software bugs
▶ Focus on state-related issues

▶ Approach: Refine the proven existing terminology
▶ Step 1: Unambiguous description of "the" Laprie model
▶ Step 2: Create system model for software specifics
▶ Step 3: Refine the Laprie model accordingly


indicates where the main balance of interest and activity lies in each case.

The dependability and security specification of a system must include the requirements for the attributes in terms of the acceptable frequency and severity of service failures for specified classes of faults and a given use environment. One or more attributes may not be required at all for a given system.

2.4 The Means to Attain Dependability and Security

Over the course of the past 50 years many means have been developed to attain the various attributes of dependability and security. Those means can be grouped into four major categories:

. Fault prevention means to prevent the occurrence or introduction of faults.

. Fault tolerance means to avoid service failures in the presence of faults.

. Fault removal means to reduce the number and severity of faults.

. Fault forecasting means to estimate the present number, the future incidence, and the likely consequences of faults.

Fault prevention and fault tolerance aim to provide the ability to deliver a service that can be trusted, while fault removal and fault forecasting aim to reach confidence in that ability by justifying that the functional and the dependability and security specifications are adequate and that the system is likely to meet them.

2.5 Summary: The Dependability and Security Tree

The schema of the complete taxonomy of dependable and secure computing as outlined in this section is shown in Fig. 2.

3 THE THREATS TO DEPENDABILITY AND SECURITY

3.1 System Life Cycle: Phases and Environments

In this section, we present the taxonomy of threats that may affect a system during its entire life. The life cycle of a system consists of two phases: development and use.

The development phase includes all activities from presentation of the user's initial concept to the decision that the system has passed all acceptance tests and is ready to deliver service in its user's environment. During the development phase, the system interacts with the development environment and development faults may be introduced into the system by the environment. The development environment of a system consists of the following elements:

1. the physical world with its natural phenomena,
2. human developers, some possibly lacking competence or having malicious objectives,
3. development tools: software and hardware used by the developers to assist them in the development process,
4. production and test facilities.

The use phase of a system's life begins when the system is accepted for use and starts the delivery of its services to the users. Use consists of alternating periods of correct service delivery (to be called service delivery), service outage, and service shutdown. A service outage is caused by a service failure. It is the period when incorrect service (including no service at all) is delivered at the service interface. A service shutdown is an intentional halt of service by an authorized entity. Maintenance actions may take place during all three periods of the use phase.

During the use phase, the system interacts with its use environment and may be adversely affected by faults originating in it. The use environment consists of the following elements:

1. the physical world with its natural phenomena;
2. administrators (including maintainers): entities (humans or other systems) that have the authority to manage, modify, repair and use the system; some authorized humans may lack competence or have malicious objectives;
3. users: entities (humans or other systems) that receive service from the system at their use interfaces;
4. providers: entities (humans or other systems) that deliver services to the system at its use interfaces;
5. the infrastructure: entities that provide specialized services to the system, such as information sources (e.g., time, GPS, etc.), communication links, power sources, cooling airflow, etc.;
6. intruders: malicious entities (humans and other systems) that attempt to exceed any authority they might have and alter service or halt it, alter the system's functionality or performance, or to access confidential information. Examples include hackers, vandals, corrupt insiders, agents of hostile governments or organizations, and malicious software.

As used here, the term maintenance, following common usage, includes not only repairs, but also all modifications

4 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 1, NO. 1, JANUARY-MARCH 2004

Fig. 1. Dependability and security attributes.

Fig. 2. The dependability and security tree. [Avizienis et al., 2004]

Page 5: What activates a bug? A refinement of the Laprie terminology model.


Step 1: Failure Automaton


II. RELATED WORK

Beizer [9] presented an early taxonomy of faults along dimensions such as structure, data, documentation etc. It has formed the basis for several statistical analyses of bug repositories. With the recent advent of large scale open source code repositories, there has also been more research on software fault patterns [10], [11], [12]. However, the reasons for error states, or fault (activation) conditions, are not scrutinized in much detail. Our impression is that, while code design faults are heavily studied and formalized, fault enabling and activation conditions are under-represented. Nevertheless, they have implicitly been studied in other forms:

Cristian [13] presented a formal description of exceptions in software programs. The approach is focused on the internal exception handling and detection mechanisms, and assumes that only the initial state of program invocation is relevant – similarly to the input trajectory concept in the Laprie model.

The Orthogonal Defect Classification (ODC) [14] is an integrated approach to classify software defects in the context of the entire development process. However, the granularity of ODC triggers is coarse and there is no clear distinction between static and dynamic properties or environmental and internal states. For instance, while the Bug-Fix trigger explains how a fault was introduced, the Workload trigger describes a runtime phenomenon in the environment.

Several certification standards, such as ISO 9001 or ISO 26262, define their own terminology model for software flaws. Most of these definitions are derived from [15], [14], or [6] and therefore show the same ambiguities.

One typical example for the mix-up of concepts is the software security taxonomy by Tsipenyuk et al. [16]. An impressive range of software security problems is classified in this model, which unquestionably shows the authors' expertise in the domain. However, faults, fault activations and errors are all treated in the same way.

Many software testing approaches, like the purely code coverage based ones, focus mainly on the internal state of the investigated software. It has been shown, however, that focusing only on coverage and neglecting dynamics and environment state is insufficient to uncover all bugs [17].

III. LAPRIE FAILURE AUTOMATON

Since specialized vocabulary is unavoidable in large projects and across multiple communities, we see a benefit in stepping back and approaching the problem by discussing software-specific issues with the common base terms. Therefore, we discuss here a refinement of the original Laprie model. Alternative approaches could be the mutual mapping of models (such as Laprie vs. ODC), or the creation of a grand-unifying meta model. While the first approach could be an interesting work on its own, we see no advantage in defining another completely new meta model beside the existing ones.

We restrict the discussion to dependability threats and their relationship, especially with respect to software reliability and availability. Our chosen version [6] of the Laprie model was published by Avizienis, Laprie and Randell in 2001. It has shown to be the most widely cited publication of the

Figure 1. Failure automaton with the classical Laprie terminology [6]. States: Normal, Dormant Fault, Active Fault / Latent Error, Detected Error, Outage. Edges: Internal Fault, External Fault, Activation, Detection, Failure, Error Handling, Restoration. Since "deactivation" is not an explicitly defined term in the original source, the corresponding edge remains unlabeled.

model, even though more recent versions exist. In order to correctly represent all relevant statements from the source, we decided to visualize the terminology relationships as a non-deterministic finite automaton (NFA) that we call failure automaton (see Figure 1). It relies on the common one-fault-at-a-time assumption, although in practice, faults and their activations may happen simultaneously.

With the definition of fault as a cause of an error, structural defects, behavioral issues and external influences are all covered by the same term. Given that, we consider both fault states and fault events in the automaton by introducing the states dormant fault and active fault, as well as the events internal fault and external fault.

We assume that an explicit restoration event at least fixes the error situation in the system, but cannot always fix the original problem. Therefore, we interpret restoration as an event that moves the system back to the dormant fault state.
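To make the transition relation explicit, the automaton can be encoded directly. This is a sketch under our own deterministic reading of Figure 1; the names and the edge targets for external fault and error handling are assumptions where the extracted figure is ambiguous.

```c
#include <assert.h>

/* States of the classical failure automaton (Figure 1). */
enum l_state { NORMAL, DORMANT_FAULT, ACTIVE_FAULT, DETECTED_ERROR, OUTAGE };

/* Events from the edge labels; DEACTIVATION names the unlabeled edge. */
enum l_event { INTERNAL_FAULT, EXTERNAL_FAULT, ACTIVATION, DEACTIVATION,
               DETECTION, FAILURE, ERROR_HANDLING, RESTORATION };

/* Successor state for one event; undefined transitions keep the state. */
enum l_state l_step(enum l_state s, enum l_event e) {
    switch (s) {
    case NORMAL:
        if (e == INTERNAL_FAULT) return DORMANT_FAULT;
        if (e == EXTERNAL_FAULT) return ACTIVE_FAULT;  /* assumption: external faults need no activation */
        break;
    case DORMANT_FAULT:
        if (e == ACTIVATION) return ACTIVE_FAULT;
        break;
    case ACTIVE_FAULT:
        if (e == DEACTIVATION) return DORMANT_FAULT;
        if (e == DETECTION) return DETECTED_ERROR;
        if (e == FAILURE) return OUTAGE;
        break;
    case DETECTED_ERROR:
        if (e == ERROR_HANDLING) return NORMAL;        /* assumed edge target */
        if (e == FAILURE) return OUTAGE;
        break;
    case OUTAGE:
        if (e == RESTORATION) return DORMANT_FAULT;    /* restoration fixes the error, not the fault */
        break;
    }
    return s;
}
```

The return of `s` for undefined transitions reflects that the NFA simply has no edge for those event / state combinations.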

IV. ABSTRACT SYSTEM MODEL

Given the automaton representation of the Laprie terminology, our next step is to formulate an abstract understanding of the most relevant properties in software-based systems. The focus lies on the (more) explicit separation of fault conditions and fault activations. In order to do that, we want two major aspects to be represented: system layers and layer-driven system state.

We distinguish between the investigated layer and the environment layer of the system. Laprie and Kanoun call this separation an X-ware system with a hierarchy of interpreters [4]. Both the investigated layer and the environment layer are internal parts of the system, while other entities are understood as external stakeholders. The typical N-tier architecture for software is represented here, since the environment layers can be recursively treated as investigated layer, until the lowest physical hardware level is reached. Therefore, the investigated layer is always assumed to be software, while the environment layer may be hardware or software.


[quotes from Avizienis et al., 2001]

"The delivery of incorrect service is a system outage."

"A system failure is an event that occurs when the delivered service deviates from correct service."

"Fault prevention: how to prevent the occurrence or introduction of faults."

"Errors that are present but not detected are latent errors."

"A fault is active when it produces an error, otherwise it is dormant. ... Most internal faults cycle between their dormant and active states."

"A transition from incorrect service to correct service is service restoration."

Page 6: What activates a bug? A refinement of the Laprie terminology model.


Step 2: System Model for Software

▶ Recursive concept for n-tier systems

▶ Investigated layer I
▶ State vector sI: Correct or incorrect
▶ Incorrect sI (error) may be detectable and/or externally visible (failure)
▶ Input / output through environment
▶ Next state depends on environment

▶ Environment layer E
▶ State vector sE: Expected or unexpected
▶ Can have progress on its own


Figure 2. Abstract system model: the investigated layer I on top of the environment layer E, separated by an internal/external boundary; input and output pass through the environment layer.

The choice for the right granularity of layers is a widely debated topic. Often, it is discussed with the unit-of-mitigation idea in mind [29]: The smallest acceptable granularity is the one where dependability strategies, such as spatial fault tolerance, are still implementable in the layer itself.

The failure of the highest investigated system layer is the one that ultimately becomes visible to the user, since it offers the service interface of the system. The user can be either a human or another system, in accordance with the classical Laprie understanding. Error propagation happens from the environment up to the investigated layer. The failure of the environment layer therefore influences the fault conditions in the investigated layer.

It may be argued that environment layer failures can lead to a direct system failure, without prior error propagation through the investigated layer. We argue here that, given the assumptions above, this should also be interpreted as a case of (immediate) error propagation. One example is the crash of an application server (environment layer) that hosts a web application (investigated layer). In this case, propagation occurs in the form of an implicit termination of the running application as a distinct progress step.

It can also be argued that error propagation may happen inside the investigated layer as well. However, for such cases, a separate system model at a lower level of granularity can be defined.

B. System State

The possible states of a layer can be described as state space Ω. We interpret one state from this set as an arbitrarily complex vector of information, implemented in hardware or software. Or, to use the words of Avizienis et al. [5]:

"The total state of a given system is the set of the following states: computation, communication, stored information, interconnection, and physical condition."

Given a chosen granularity, both the investigated layer I and the environment layer E have current state vectors sI ∈ ΩI and sE ∈ ΩE at any discrete point in time. ΩI and ΩE may overlap in parts, for example when one physical memory location in a computer contributes to both state sets.

The investigated layer has a set of correct states XI ⊂ ΩI which lead to a non-failing operation of the system. The environment layer is typically a black box for the developer, so we denote XE ⊂ ΩE as its expected states, which just expresses an external observer assumption regardless of whether this is an internal error state for E. The current states sI and sE may or may not be in the set of correct or expected states, respectively.

Most software systems contain both volatile state and persistent state at any given time. Any restart or other kind of recovery resets only the volatile state, so the persistent state must be protected from corruption under all circumstances. The classical example is a database that, as any other software, operates on volatile state, but tries to make sure that the persistent state on disk always keeps its integrity. Our understanding of 'current state' includes both the volatile and the persistent state.

Another relevant issue to be clarified in the system model are input and output activities. Software engineers typically link the idea of system 'input' with something that is driven by a – potentially unknown – third party from outside. Furthermore, input is also treated as something that can be queued, restricted, checked or controlled in some way. Since this is difficult to express in a generalized fashion, we decided to represent system input (e.g. from the network) as a state change in the environment layer. This seems to be close to reality – no real application ever directly processes input from the outside world. Instead, multiple levels of input buffers and caches make the consumed input a part of the environment layer. Similarly, the generation of output by the investigated layer is modeled as a triggered state change in the environment layer, and not as a direct action. The investigated layer therefore has no input or output events of its own.

Both the environment and the investigated layer need a progress concept, meaning that their states evolve at discrete points in time. The most common approach to model state changes are discrete events for an 'atomic' execution step. The atomicity may e.g. refer to the processor hardware, the semantics of the programming language as with C sequence points, or to the execution model of the virtual runtime environment as with PLC loop-based computers.

The choice of the next active state in I depends on the combination of sI and sE, specifically at the moment when the execution step happens.

Progress in I potentially also changes the state in E, for example when system calls take place. Therefore, we assume mutual influence between the layers, a direct one from investigated to environment layer, and an indirect one from environment to investigated layer.

In E, the decision about the next state relies only on the current sE.

The assumptions of our abstract system model are summarized in Table II.

VII. REFINED LAPRIE FAILURE AUTOMATON

We now combine the previously described layered system model with the Laprie failure automaton from Figure 1. This mainly relates to the question of how error and failure states can occur in systems represented by the model.


Table II. STATE CONCEPT FOR THE ABSTRACT SYSTEM MODEL.

Investigated layer states: ΩI
Current state: sI ∈ ΩI
Correct states: XI ⊂ ΩI
Incorrect states: EI = ΩI \ XI
Detectable incorrect states: DI ⊂ EI
Externally visible incorrect states: FI ⊂ EI
Investigated layer progress: fI : ΩI × ΩE → ΩI × ΩE, (sI, sE)(t) ↦ (sI, sE)(t + 1)

Environment layer states: ΩE
Current state: sE ∈ ΩE
Expected states: XE ⊂ ΩE
Unexpected states: EE = ΩE \ XE
Environment layer progress: fE : ΩE → ΩE, sE(t) ↦ sE(t + 1)
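This state concept can be sketched in code. The instantiation below is purely illustrative (integer states and the concrete predicates are our own assumptions); it only shows the shape of the two progress functions: fE depends on sE alone, while fI reads both state vectors and may also change sE.

```c
#include <assert.h>

/* Toy instantiation of the layered state concept: each layer's state is
 * a single integer; real state vectors are arbitrarily complex. */
typedef struct { int sI; int sE; } system_state;

/* Illustrative (assumed) choice of correct / expected state sets. */
static int in_XI(int sI) { return sI >= 0; }   /* correct states X_I */
static int in_XE(int sE) { return sE >= 0; }   /* expected states X_E */

/* f_E : Omega_E -> Omega_E: the environment makes progress on its own,
 * depending only on s_E. */
static int f_E(int sE) { return sE + 1; }

/* f_I : Omega_I x Omega_E -> Omega_I x Omega_E: one execution step of
 * the investigated layer reads both state vectors and may also change
 * s_E (output is modeled as a triggered environment state change). */
static system_state f_I(system_state s) {
    system_state next = s;
    if (!in_XE(s.sE))
        next.sI = -1;        /* error propagation: an unexpected
                                environment state pushes I out of X_I */
    else
        next.sE = s.sE + 1;  /* e.g. output written via a system call */
    return next;
}
```

Note how the signature of f_I alone already expresses the mutual influence between the layers described in the text.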


Page 7: What activates a bug? A refinement of the Laprie terminology model.

[email protected] 2015

Refined Terminology

▶ Refined definition for 'software fault': Minimal set of code deviations from correct code, such that the execution of the deviating code can trigger an error.

▶ Fault Model: Description of possibilities for faulty code
▶ Fault Condition Model: Description of fault-enabling system states

▶ Fault Enabling: Change of system state to allow some error
▶ Fault Activation: Execution of faulty code leading to that error


Page 8: What activates a bug? A refinement of the Laprie terminology model.


Step 3: Refined Failure Automaton


Figure 3. Failure automaton with fault activation conditions. States and their state-vector conditions: Disabled Fault (sI ∈ XI, sE ∈ XE), Dormant Fault (sI ∈ XI, sE ∈ ΩE), Active Fault / Latent Error (sI ∈ EI, sE ∈ ΩE), Detected Error (sI ∈ DI, sE ∈ ΩE), Outage (sI ∈ FI, sE ∈ ΩE). Transitions: CON (Enabling), COFF (Disabling), EXECF (Activation), Deactivation, Detection, FAIL, Mitigation, Recovery, Restoration. Some events may occur in more than one of the states: CON (fault condition is fulfilled now), COFF (fault condition is no longer fulfilled), EXECF (faulty code is executed), FAIL (failure).

is no longer fulfilled. The precise formulation of such eventsdepends on the representation of states in the system.

Figure 3 shows the resulting failure automaton. It has asimilar set of dependability-related operational states. Whilethe system is running, it may reside for an unrestricted periodof time in all of these operational states.

In disabled fault state, the fault conditions are not fulfilled.The state of the investigated and the environment layer are asexpected by the developers and operators (sI 2 XI , sE 2 XE).The fault / code defect / bug may be present in the system,but does not influence the system functionality in any way.We intentionally treat the existence of faults as normal here.Avoiding the introduction of the fault is described as faultprevention, finding and fixing an existing fault as fault removal,both in accordance with the classical terminology.

With the change of sI and / or sE into something thatfulfills an activation condition, the system moves into thedormant fault state, a transition which is triggered by the socalled CON event. Similarly, a change of sI and / or sE mayresult in an event (COFF ) that brings the dependability stateback to disabled fault.

CON can be an isolated event that happens once and thennever again, or it may happen dynamically and multiple timesduring runtime. The CON event may even happen immediatelyat application start, for example when the environment layer isnot in the right shape to execute this software (sE 2 EE). Suchare, for instance, scenarios where the runtime platform and thetarget platform used during compilation do not match. Codewhich is executed on the wrong platform can exhibit arbitrarily

wrong behavior. Note that COFF may not always be possiblein a specific system, e.g. when there are configuration errorsin the environment layer.

An interesting question is the nature of investigated andenvironment layer in case of a dormant fault. Since an incorrectstate of the investigated layer is by definition an error (i.e., adifferent state in our failure automaton), the system must stillbe in a correct state (sI 2 XI ) while the fault is dormant.Initially, it seems more intuitive to define the existence ofbroken internal state as precondition for an error. However,even this would have been a result of some root causebeforehand, when we assume that the investigated system hada correct state at start. The point here is that we are typicallynot interested in the end of the error propagation chain, but inthe initial root cause. Therefore, it makes sense to define thatthe combination of correct own state and arbitrary environmentstate serves as precondition for activation.

The environment layer state is arbitrary during a dormantfault situation (sE 2 ⌦). When it still works correctly(sE 2 XE) in dormant fault state, we denote this as hardfault, since the activation only depends on the right sI and theexecution (EXECF ). An unexpected environment condition(sE 2 EE) in dormant fault state is a situation that is hard topredict with software testing, as long as the environment layeris not completely identical to the one used during operation(Mandelbug case).

In the moment where the faulty code is executed (EXECF) and the fault condition is still fulfilled (no COFF), the internal state changes into something not expected by developers or operators (sI ∈ EI): the system enters the active fault / latent error state.


EXECF: Event when faulty code is executed

CON: Event when activation condition is established

COFF: Event when activation condition is no longer given

EI: Incorrect states

DI: Detectable incorrect states

FI: Externally visible incorrect states
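The events and states in the legend above can be sketched as a plain C state machine. This is an illustrative reading of the automaton, not the paper's formal definition, and it omits the mitigation, restoration, and recovery transitions; all identifiers are invented for the sketch.

```c
#include <assert.h>

/* Dependability states, following the legend above. */
typedef enum {
    DISABLED_FAULT,   /* sI in XI, sE in XE: activation condition not fulfilled */
    DORMANT_FAULT,    /* activation condition fulfilled, fault not yet executed */
    ACTIVE_FAULT,     /* latent error: sI in EI */
    DETECTED_ERROR,   /* sI in DI */
    OUTAGE            /* sI in FI: externally visible */
} dep_state;

/* Events, following the legend above. */
typedef enum { C_ON, C_OFF, EXEC_F, DETECT, FAIL_EV } dep_event;

/* One transition step of the failure automaton. */
dep_state step(dep_state s, dep_event e) {
    switch (s) {
    case DISABLED_FAULT:
        return (e == C_ON) ? DORMANT_FAULT : DISABLED_FAULT;  /* enabling */
    case DORMANT_FAULT:
        if (e == C_OFF)  return DISABLED_FAULT;               /* disabling */
        if (e == EXEC_F) return ACTIVE_FAULT;                 /* activation */
        return DORMANT_FAULT;
    case ACTIVE_FAULT:
        if (e == DETECT)  return DETECTED_ERROR;              /* detection */
        if (e == FAIL_EV) return OUTAGE;                      /* failure */
        return ACTIVE_FAULT;   /* CON, COFF, EXECF keep the state */
    case DETECTED_ERROR:
        return (e == FAIL_EV) ? OUTAGE : DETECTED_ERROR;
    default:
        return s;
    }
}
```

Note how COFF only leads back to disabled fault while the fault is still dormant; once the faulty code has executed, the error persists regardless of later condition changes.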

Page 9: What activates a bug? A refinement of the Laprie terminology model.

[email protected] 2015

Discussion

▶ Most Laprie concepts remain the same
  ▶ Fault prevention, fault removal and fault tolerance
  ▶ External physical faults may impact both sI and sE

▶ Fault handling in the original sense is now fault disabling or fault removal
  ▶ Might be interesting to focus on fault disabling
  ▶ Adding software dependencies makes fault disabling harder

▶ Activation conditions with unexpected environment states are key
  ▶ How to test that?
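One common answer to the testing question above is environment fault injection: the test forces an unexpected environment state (sE ∈ EE) that ordinary runs rarely reach. A minimal sketch, with all names invented, simulating memory exhaustion:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Injection switch: test code flips this to emulate an unexpected
   environment state (here: allocation failure). */
static bool inject_alloc_fault = false;

/* Allocation shim that the code under test uses instead of raw malloc. */
void *checked_alloc(size_t n) {
    if (inject_alloc_fault)
        return NULL;            /* simulated sE in EE */
    return malloc(n);
}

/* Code under test: must tolerate the injected environment fault
   by reporting the error instead of propagating it. */
bool make_buffer(size_t n, void **out) {
    *out = checked_alloc(n);
    return *out != NULL;
}
```

With the switch off the test exercises the normal path; with it on, the test checks that the activation condition does not turn into a failure.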


Page 10: What activates a bug? A refinement of the Laprie terminology model.


Describing Software Bugs

▶ Unexpected input: Fault enabling due to input received in the environment

▶ Race condition: Fault enabling / disabling due to timing of the environment

▶ Missing libraries: Fault enabling immediately on application start, no COFF

▶ Automatic variable initialization: Reduction of activation conditions

▶ Common-cause error: Same sE in multiple activation conditions
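The "automatic variable initialization" case can be made concrete with a small hypothetical sketch (function name invented). Initializing the buffer removes the dependence on leftover stack contents, so one activation condition disappears:

```c
#include <assert.h>
#include <string.h>

/* Without the `= {0}` initializer, flags[7] would hold whatever the
   stack happened to contain, since strncpy with n = 7 touches only
   indices 0..6. The fault would then activate only for certain prior
   executions, a classic Mandelbug ingredient. Initialization makes
   the result deterministic for every input. */
int flag_set(const char *input) {
    char flags[8] = {0};                       /* disables the fault */
    strncpy(flags, input, sizeof flags - 1);   /* copy at most 7 bytes */
    return flags[7] == '\0';                   /* always true now */
}
```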


#include <string.h>
#define BUFSIZE 256

int main(int argc, char **argv) {
    char buf[BUFSIZE];
    strcpy(buf, argv[1]);   /* stack-based buffer overflow for long argv[1] */
    return 0;
}

[CWE ID 121]
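A hedged sketch of one possible repair of the CWE-121 example (helper name invented): bounding the copy is fault removal in the model's terms, because afterwards no value of argv[1] fulfills an activation condition any more.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* snprintf always NUL-terminates and never writes more than dstlen
   bytes, so oversized input is truncated instead of overflowing.
   The NULL check also covers the argc < 2 case (argv[1] == NULL). */
int safe_copy(char *dst, size_t dstlen, const char *src) {
    if (dst == NULL || src == NULL || dstlen == 0)
        return -1;
    snprintf(dst, dstlen, "%s", src);
    return 0;
}
```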

Page 11: What activates a bug? A refinement of the Laprie terminology model.


Summary

▶ Refinement of the proven Laprie terminology model
  ▶ Separation of code defects and their enabling states
  ▶ Separation of investigated and environment layer
  ▶ Basic concepts (propagation, fault / error / failure) remain the same

▶ Fault model: Missing and defective code

▶ Fault condition model: System states enabling faults

▶ Error model: System states with activated faults that may lead to failure


Figure 3. Failure automaton with fault activation conditions. Some events may occur in more than one of the states: CON (fault condition is fulfilled now), COFF (fault condition is no longer fulfilled), EXECF (faulty code is executed), FAIL (failure).

States: Disabled Fault (sI ∈ XI, sE ∈ XE), Dormant Fault (sI ∈ XI, sE ∈ ΩE), Active Fault / Latent Error (sI ∈ EI, sE ∈ ΩE), Detected Error (sI ∈ DI, sE ∈ ΩE), Outage (sI ∈ FI, sE ∈ ΩE). Transitions: CON (enabling), COFF (disabling), EXECF (activation), deactivation, FAIL, detection, mitigation, restoration, recovery.

is no longer fulfilled. The precise formulation of such events depends on the representation of states in the system.

Figure 3 shows the resulting failure automaton. It has a similar set of dependability-related operational states. While the system is running, it may reside for an unrestricted period of time in all of these operational states.

In disabled fault state, the fault conditions are not fulfilled. The state of the investigated and the environment layer are as expected by the developers and operators (sI ∈ XI, sE ∈ XE). The fault / code defect / bug may be present in the system, but does not influence the system functionality in any way. We intentionally treat the existence of faults as normal here. Avoiding the introduction of the fault is described as fault prevention, finding and fixing an existing fault as fault removal, both in accordance with the classical terminology.


Page 12: What activates a bug? A refinement of the Laprie terminology model.


