
Center for Reliability Engineering

Integrating Software into PRA

B. Li, M. Li, A. Sinha, Y. Wei, C. Smidts

Presented by

Bin Li

Center for Reliability Engineering

University of Maryland, College Park

July 20, 2004


Research Objectives

• The objective of our research is to extend the current PRA (Probabilistic Risk Assessment) methodology to integrate software in the risk assessment process.

• Such an extension requires modeling the software, the computer platform on which it resides, and the interactions it has with other systems.


Framework for Integrating Software into PRA

PRA Question 1: Initiators
– Analysis step: Initiating Event Analysis
– Techniques: Checklists, PHA, FMEA, HAZOP, Master Logic Diagram
– Questions that this research should answer:
  1. What are the software related failures in the system?
  2. How should the software related failures be classified?
  3. At which level should the software related failures be considered?
– Methodology: Software related failure modes

PRA Question 2: Consequences
– Analysis step: Accident-Sequence Construction
– Techniques: Event Tree Analysis, Event Sequence Diagram, Petri Nets, Markov Chains
– Questions that this research should answer:
  1. Which methods should one use if software participates in the unfolding of the accident?
– Methodology: Modeling Approaches and Techniques

PRA Question 3: Probabilities
– Analysis step: Accident-Sequence Quantification
– Techniques: Fault Tree Analysis, Probabilistic Methods, Statistical Methods, Common Cause Analysis, Human Reliability Analysis
– Questions that this research should answer:
  1. Which quantification models or methods can be used to quantify software related failures?
  2. Which kinds of data will be needed for these quantification models?
  3. If these data are not available, how can such cases be handled?
– Methodology: Quantification Models and Data

PRA Question 4: Uncertainty
– Analysis step: Uncertainty Analysis
– Techniques: Classical Statistics, Bayesian Statistics, Sensitivity Analysis
– Questions that this research should answer:
  1. What uncertainty analysis is necessary for this research?
  2. Which kinds of uncertainty analysis should be done?
– Methodology: Uncertainty Analysis Methodology


Software related failure mode taxonomy

[Diagram: Input (Human, Software, Hardware) → Software → Output (Human, Software, Hardware), shown together with the Computer and the Environment.]


Software related failure mode taxonomy

Software related failures

• Internal failures
– Function failures
– Attribute failures
– Function set failures

• Interaction failures
– Input failures
– Output failures
– Support failures
– Multiple interaction failures
– Environmental factors


Validation of the Failure Mode Taxonomy

• Validation Process:
– Phase 1: Training
– Phase 2: JSC Classification
– Phase 3: UMD Verification of the taxonomy and Consolidation with JSC

• Validation Criteria:
– Completeness
– Consistency
– Repeatability
– Applicability


Completeness and Applicability

Failure Modes Added By JSC (failure mode: definition)

• Value-Initialization: Value at initialization is incorrect. Value input or output at initialization is incorrect. A function receives a bad value from a file at time = 0.

• Value Logic: An incorrect value is used due to a problem with logic.

• Value-Additional Logic: Additional logic added to handle a value.

• Value-Display: Display value added, deleted or modified. Incorrect value displayed in data field.

• Value-Hardware: Value changes due to hardware changes or modifications. Changing the NIC changes the MAC address and corrupts the license key.

• Inadequate Requirements: Requirements were incorrect. The developer followed the requirements to the letter and they were determined to be incorrect.

• Propagation of Failure: Failure upstream of the module causes an unstable state or failure of modules downstream.

• User error: User knowingly or unknowingly uses software incorrectly or outside of the intended design boundaries.


Repeatability and Consistency

The conflicts in two rounds (rows: first round; columns: second round)

Failure modes by category, used for both rows and columns:
– Functional: Attribute, Function
– I/O: Amount, Range, Time, Type, Value, Rate
– Support: CPU, Peripheral, Resource
– Multiple Interaction: Communication

Counts recorded for each first-round failure mode, listed in column order with empty cells omitted:
– Attribute: 26, 1, 3, 12
– Function: 64, 14, 1
– Amount: 2, 2, 1, 1
– Range: 4, 2, 7, 1
– Time: 3, 1, 4, 1
– Type: 1, 5, 1
– Value: 25, 42, 1, 61, 1
– Rate: 1, 2
– CPU: 1, 0
– Peripheral: 1, 0
– Resource: 1, 1, 4
– Communication: 3


Repeatability

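First round \ Second round | Failure mode 1 | Failure mode 2 | …… | Failure mode n | Row total
Failure mode 1 | P11 | P12 | …… | P1n | P1+
Failure mode 2 | P21 | P22 | …… | P2n | P2+
…… | …… | …… | …… | …… | ……
Failure mode n | Pn1 | Pn2 | …… | Pnn | Pn+
Column total | P+1 | P+2 | …… | P+n |

$$P_{i+} = \sum_{j=1}^{n} P_{ij}, \qquad P_{+j} = \sum_{i=1}^{n} P_{ij}$$

$$P_o = \sum_{i=1}^{n} P_{ii}, \qquad P_e = \sum_{i=1}^{n} P_{i+} P_{+i}$$

$$R = \frac{P_o - P_e}{1 - P_e}$$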

Repeatability is measured by the repeatability coefficient R (Cohen’s Kappa). Kappa values less than 0.45 indicate inadequate repeatability, values above 0.62 indicate good repeatability, and values above 0.78 indicate excellent repeatability.

R = 0.46
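The repeatability coefficient can be computed directly from a two-round classification matrix. The following is a minimal sketch, assuming the matrix holds raw counts (rows: first round, columns: second round); the function name `cohens_kappa` and the small `example` matrix are hypothetical and are not the data from the study.

```python
def cohens_kappa(counts):
    """Cohen's Kappa for a square matrix of classification counts.

    counts[i][j] is the number of items assigned to failure mode i in the
    first round and to failure mode j in the second round.
    """
    n = len(counts)
    total = sum(sum(row) for row in counts)
    # Convert counts to proportions P_ij.
    p = [[c / total for c in row] for row in counts]
    row_marginals = [sum(p[i]) for i in range(n)]                        # P_i+
    col_marginals = [sum(p[i][j] for i in range(n)) for j in range(n)]   # P_+j
    p_o = sum(p[i][i] for i in range(n))                                 # observed agreement
    p_e = sum(row_marginals[i] * col_marginals[i] for i in range(n))     # chance agreement
    return (p_o - p_e) / (1 - p_e)


# Hypothetical 3-mode example (illustrative numbers only).
example = [
    [20, 5, 0],
    [3, 15, 2],
    [1, 4, 10],
]
kappa = cohens_kappa(example)
print(f"R = {kappa:.3f}")
```

Against the thresholds quoted above, a value between 0.45 and 0.62 would be read as adequate but not yet good repeatability.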



Results of the Validation of the Taxonomy

• The UMD and the JSC teams reached the following consensus:

– The taxonomy is complete and can be applied to aerospace systems of various natures;

– The taxonomy includes failure modes applicable to autonomous real time systems and mission critical systems;

– The taxonomy considers all the failure modes in software;

– There is sufficient data available for the validation and enough flexibility to use alternative data.

• Repeatability and Consistency are adequate.


Test-Based Approach - Procedure

• Identify events/components controlled by software in the MLD

• Identify events/components controlled by software in accident scenarios

• Specify the functions involved

• Modeling of the Software Component in ESDs/ETs and Fault Trees

• Quantification


Identify Software Controlled Events/Components in the MLD

[Figure: Master Logic Diagram. Top event: Loss of Occupants (AND gate), fed by Internal Accident Disaster and Exit Fails. Internal Accident Disaster (OR gate): Fire, Gas, Chemical Explosion, Poison Materials (anthrax). Exit Fails (AND gate): Emergency Exit Fails, Normal Exit Path Fails. Lower-level events, combined through OR gates: Gate Fails, Software Fails, Temperature. Labels also shown: Software Fails, Initiating Events.]


Identify Software Controlled Events/Components in Accident Scenarios

[Figure: Event Sequence Diagram for the initiating event Fire, with Yes/No branches leading to the end states Safe and Loss of Occupants across Sequences 1 to 15 (Sequence 1, 2, 3, 4, 5-6, 7, 8-11, 12, 13-15). Events shown: Fire Protection; Emergency Exit; Guard there; Guard action; The user inserts the card; The user inserts the PIN; PACS (1) B3; PACS (2) B1; PACS (3) B2; PACS (4) B2; LED1; Gate; Delay; Delay of opening of gate; timing checks T1 < Tcritical, T2 < Tcritical, T3 < Tcritical.]


Specify the Functions Involved

• Identify software behavior from ESD/ET
– Identify stimuli and results

• Identify software component from requirements specifications
– Identify inputs and outputs

• Match stimuli/inputs and results/outputs


Modeling Software Component in ESDs/ETs and FTs

[Figure: flowchart for modeling a software component. Main path: Input → Delay of Execution → SW Execution. Yes/No decision points:
– Does the support platform function normally?
– Is the support platform fully nonfunctional?
– Does this support failure lead to a safe condition?
– Behavior specified in requirements is consistent, unique and the actual behavior is adequate per requirement
– Does the required SW output match the input required by the next component?
– Does the required output lead to a safe condition?
– Does the erroneous behavior lead to a safe condition?
When the support platform behaves in a degraded mode, the Input → Delay of Execution → SW Execution path and its decision points are repeated. End states: Continue on safe branch; Unsafe State.]


Quantification

• Testing is used to obtain the probability that the software leads to an unsafe state.

• The process is as follows:
– Define the test cases. These test cases cover both the normal input and the abnormal input. The testing strategy includes the identification of the normal input space and the abnormal input space. Test cases are randomly sampled from these spaces.
– Build a Finite State Machine (FSM) model of the software component to represent its behavior (the oracle). The operational profile derived from the input tree is also embedded into this FSM model.
– Automate the testing using the test scripts generated from the FSM model.
– Define and identify the software component’s safe and unsafe conditions within the context of each ESD sequence.
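The quantification steps can be illustrated with a small Monte Carlo sketch. It is not the TestMaster/WinRunner tool chain used in the study: the function `estimate_unsafe_probability`, the toy `software` and `oracle` functions, the `is_unsafe` predicate, and the profile weights are all hypothetical, and a plain Python function stands in for the FSM oracle.

```python
import random


def estimate_unsafe_probability(software, oracle, is_unsafe, profile, n_tests, seed=0):
    """Estimate P(unsafe state) by random testing under an operational profile.

    profile maps an input-class name to (probability, sampler), where
    sampler(rng) draws one test input from that class.
    """
    rng = random.Random(seed)
    names = list(profile)
    weights = [profile[name][0] for name in names]
    unsafe = 0
    for _ in range(n_tests):
        # Pick an input class according to the operational profile,
        # then sample one test case from that class.
        name = rng.choices(names, weights=weights)[0]
        x = profile[name][1](rng)
        actual, expected = software(x), oracle(x)
        # Count a test as unsafe only if the software deviates from the
        # oracle and the deviation is classified as unsafe.
        if actual != expected and is_unsafe(x, actual):
            unsafe += 1
    return unsafe / n_tests


# Toy component (assumption): inputs 0..99 are normal, 100..199 are abnormal.
def oracle(x):
    return "open" if x < 100 else "alarm"


def software(x):
    return "open" if x < 100 else "hang"  # seeded defect on abnormal input


def is_unsafe(x, actual):
    return actual == "hang"


profile = {
    "normal": (0.9, lambda rng: rng.randrange(0, 100)),
    "abnormal": (0.1, lambda rng: rng.randrange(100, 200)),
}

p_unsafe = estimate_unsafe_probability(software, oracle, is_unsafe, profile, 20000)
print(f"estimated P(unsafe) = {p_unsafe:.3f}")
```

In this toy setup every abnormal input triggers the seeded defect, so the estimate should land near the 0.1 weight given to the abnormal input space. In practice the safe and unsafe conditions would be defined per ESD sequence, as the last step above states.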



Scalability

• The test-based approach can be used for large-scale systems because large finite state machines have been built and large systems can be tested by WinRunner.

• Scalability describes the relationship between the effort needed to use this method for large systems and the effort needed for the smaller systems which are part of the investigation.

• Contributors to the effort are:
– The modeling effort (time to build the finite state machine)
– The test case generation time (time to generate the test cases in TestMaster)
– The test execution time (time to execute test cases in WinRunner)


Modeling Time

COCOMO II is used to calculate the time to construct the finite state machine model.

$$PM = A \cdot (Size)^{E} \cdot 27\% \cdot 25\%$$

with $A = 2.94$, $E_{min} = 0.91$, $E_{max} = 1.226$.

• $Size_1 = 70$ FP: $PM_{max} = 1$ and $PM_{min} = 0.63$

• $Size_2 = 700$ FP: $PM_{max} = 16.5$ and $PM_{min} = 5.4$
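The effort relation can be evaluated with a few lines of code. This is a sketch under stated assumptions: the published COCOMO II model takes Size in KSLOC, and the slide does not give the factor used to convert function points to code size, so `sloc_per_fp` below is a hypothetical parameter and the resulting numbers need not match the values on the slide.

```python
A = 2.94                      # COCOMO II constant from the slide
E_MIN, E_MAX = 0.91, 1.226    # exponent range from the slide
FRACTIONS = 0.27 * 0.25       # the two percentage factors from the slide


def modeling_effort_pm(size_ksloc, exponent):
    """Person-months for FSM modeling: PM = A * Size^E * 27% * 25%."""
    return A * size_ksloc ** exponent * FRACTIONS


def effort_range(size_ksloc):
    """(PM_min, PM_max) over the exponent range E_MIN..E_MAX."""
    return (modeling_effort_pm(size_ksloc, E_MIN),
            modeling_effort_pm(size_ksloc, E_MAX))


def fp_to_ksloc(function_points, sloc_per_fp):
    """Convert function points to KSLOC; sloc_per_fp is an assumed,
    language-dependent factor that the slide does not state."""
    return function_points * sloc_per_fp / 1000.0


# Illustrative only: 50 SLOC per function point is an arbitrary choice.
for fp in (70, 700):
    pm_min, pm_max = effort_range(fp_to_ksloc(fp, sloc_per_fp=50))
    print(f"{fp} FP: PM_min = {pm_min:.2f}, PM_max = {pm_max:.2f}")
```

Because the exponent exceeds 1 at the upper end of the range, the gap between PM_min and PM_max widens as the system grows, which is the pattern visible in the two sizes reported above.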



Test Generation Time

• Test generation time in full coverage is a function of the size of the model.

• Empirical relations of the following form can be found:

$$time = A \cdot size^{C}$$

where size is determined by the number of states, the number of transitions, and the number of models.

• Empirical study shows:

$$time = 0.0018 \cdot (size)^{0.055}$$


Calculation of FSM Model Size

• Size of the model is a function of Function Points and the operational profile:

$$size = f(FP, OP)$$

• Procedure for calculating the size
– Determine the basic size from Function Point calculations for the system:

$$size = 0.0434\,(FP)^{2} + 3.6861\,(FP)$$

– Determine the reliability requirement for the testing process.
– Calculate the number of iterations required for the target reliability:

$$n = \frac{\log(\text{threshold probability of failure})}{\log(\text{probability of the path with least frequency})}$$

– Calculate the size of the largest iterating sub-model.
– Calculate the modified size:

$$size_{actual} = size + n \cdot size_{submodel}$$
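The size procedure can be traced with a short worked sketch. It reads the slide's relations as size = 0.0434·FP² + 3.6861·FP, n = log(threshold probability of failure) / log(probability of the least frequent path), and actual size = size + n·(sub-model size); the additive operators are an assumption, since they are not legible in the extracted formulas. Rounding n up to an integer is also an assumption, and all function names and input values are hypothetical.

```python
import math


def base_size(function_points):
    """Basic FSM model size from function points (quadratic fit from the slide)."""
    return 0.0434 * function_points ** 2 + 3.6861 * function_points


def iterations(threshold_failure_prob, least_path_prob):
    """Number of iterations n needed for the target reliability.

    Rounded up to a whole number of iterations (assumption).
    """
    return math.ceil(math.log(threshold_failure_prob) / math.log(least_path_prob))


def actual_size(function_points, threshold_failure_prob, least_path_prob, submodel_size):
    """Modified size: base size plus n copies of the largest iterating sub-model."""
    n = iterations(threshold_failure_prob, least_path_prob)
    return base_size(function_points) + n * submodel_size


# Illustrative inputs only: 70 FP, threshold 1e-6, least frequent path 0.05,
# largest iterating sub-model of size 20.
n = iterations(1e-6, 0.05)
size = actual_size(70, 1e-6, 0.05, submodel_size=20)
print(f"n = {n}, base size = {base_size(70):.3f}, actual size = {size:.3f}")
```

The iteration count grows only logarithmically as the failure threshold is tightened, so the reliability requirement inflates the model size far more slowly than the function point count does.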



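Test Execution Time

• Test execution time ($t_{exec}$) is a linear function of the number of inputs/outputs ($n_{i/o}$), the number of check points ($m$), and the waiting time for responses ($T_s$):

$$t_{exec} = \alpha \cdot (n_{i/o}) + \beta \cdot m \cdot T_s$$

where $\alpha$ and $\beta$ are fitted coefficients.

• Empirical study shows:

$$t_{exec} = 0.0000703 \cdot (n_{i/o}) + 0.0016 \cdot m \cdot T_s$$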
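A minimal sketch of the execution-time relation follows, reading the empirical fit as t_exec = 0.0000703·n_i/o + 0.0016·m·T_s (the plus sign is an assumption, as the operator is not legible in the extracted formula). The function name `execution_time` and the input values are hypothetical, and the slide does not state the time units, so none are claimed here.

```python
def execution_time(n_io, checkpoints, wait_time, alpha=0.0000703, beta=0.0016):
    """Test execution time from the empirical fit on the slide.

    n_io        -- number of inputs/outputs exercised by the test cases
    checkpoints -- number of check points (m)
    wait_time   -- waiting time for responses (T_s)
    alpha, beta -- fitted coefficients; defaults are the slide's values
    """
    return alpha * n_io + beta * checkpoints * wait_time


# Illustrative inputs only.
t_small = execution_time(n_io=1000, checkpoints=10, wait_time=2.0)
t_large = execution_time(n_io=10000, checkpoints=100, wait_time=2.0)
print(f"t_small = {t_small:.4f}, t_large = {t_large:.4f}")
```

Scaling both the input/output count and the check point count by ten scales the estimate by ten as well, which is the linear growth in effort that the scalability argument relies on.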



Summary of Scalability Study

The results of the scalability study show that:

• Modeling time can be calculated by using COCOMO II;

• Test generation time in full coverage is a function of the size of the model;

• Test execution time is a linear function of the number of inputs/outputs, the number of check points, and the waiting time for responses.


Ongoing and Future Research

• Continue the application to a large-scale system
– The application we chose is CM1 from the Metrics Data Program

• Finalize the scalability study

• Continue the support failure modes study

• Continue the output failure modes study

• Conduct the fault propagation study