Chapter 9 Probabilistic Safety Assessment
9.1 Introduction
Probabilistic safety assessment (PSA), also called probabilistic risk assessment (PRA), is currently being widely applied in many fields, viz., chemical and process plants, nuclear facilities, aerospace, and even financial management. PSA essentially aims at identifying the events and their combination(s) that can lead to severe accidents, assessing the probability of occurrence of each combination, and evaluating the consequences. PSA provides a quantitative estimate of risk which is useful for comparing alternatives in different design and engineering areas. PSA has been accepted all over the world as an important tool to assess the safety of a facility and to aid in ranking safety issues by order of importance. The main benefit of PSA is to provide insights into design, performance, and environmental impacts, including the identification of dominant risk contributors and the comparison of options for reducing risk. In addition, it provides inputs to decisions on design and back-fitting, plant operation, safety analysis, and regulatory issues. PSA offers a consistent and integrated framework for safety-related decision-making.
PSA is considered to complement the deterministic analysis for design-basis events, and it supplements that information for beyond-design-basis accidents that involve multiple failures, including operator errors and low-probability events of high consequence. PSA has been shown to provide important safety insights in addition to those provided by deterministic analysis. PSA differs from traditional deterministic safety analysis in that it provides a methodological approach to identifying accident sequences that can follow from a broad range of initiating events (IEs), and it includes the systematic and realistic determination of accident frequencies and consequences.
In spite of the benefits, it is well recognized that PSA has its own limitations. The accuracy of a PSA depends on the uncertainties in aspects like data and models on common-cause failures (CCFs) and human reliability. Nevertheless, the benefits that accrue from PSA outweigh its limitations. So worldwide, utilities are performing PSA of their plants, and many regulatory bodies are using it in a risk-informed approach to decision-making, some even following a risk-based approach. Over the years, the PSA methodology has matured, and new applications like living PSA/risk monitors, technical specification optimization, reliability-centered maintenance, and risk-based in-service inspection have emerged.
9.2 Concept of Risk and Safety
The word risk is generally defined as “the chance of injury or loss resulting from exposure to a source of danger,” while safety means “freedom from danger or hazard.” A problem is that in a technological society, no activity has zero risk. Thus, there cannot be absolute safety, and the two terms are related to one another, with low risk meaning the same as a high level of safety. However, psychologically, we tend to be more comfortable with the term safety than with the term risk, whether we are discussing nuclear reactors, airplanes, chemicals, or automobiles.
Risk is related to “chance of loss.” The qualitative definition can be converted into a quantitative one by putting risk on a mathematical foundation. Let “chance” be probability, “loss” be consequences, and “of” be multiplication. This defines risk as probability times consequences; thus risk combines both probability and consequences:
Risk = Probability × Consequences. (9.1)
Producing probability distributions for the consequences yields a much more detailed description of risk.
The notion of risk can be further refined by defining it as a set of triplets; as explained by Kaplan and Garrick [1], it is a set of scenarios Si, each of which has a probability pi and a consequence xi. If the scenarios are ordered in terms of increasing severity of the consequences, then a risk curve can be plotted, for example as shown in Figure 9.1.
Risk = {⟨Si, pi, xi⟩}, i = 1, 2, 3, …, n. (9.2)
The risk curve represents the probability that the consequences of the accident will be greater than some value of x. Mathematically, the exceedance probability P(X > x) is the integral of the consequence distribution from x to ∞; the curve is known as the complementary cumulative distribution function.
Further, instead of the probability of the event, the frequency with which such an event might take place can also be used.
Figure 9.1 Risk curve (exceedance probability P(X ≥ x) versus consequences x)
Table 9.1 Probabilities and consequences
Event       Consequence xi ($)  Probability  Risk
Accident A  1000                4 × 10⁻²     40
Accident B  500                 2 × 10⁻²     10
Accident C  100                 1 × 10⁻²     1
Accident D  1500                8 × 10⁻²     120
Accident E  10000               5 × 10⁻⁴     5
Accident F  5000                5 × 10⁻³     25
Accident G  2500                1 × 10⁻³     2.5
Accident H  750                 3 × 10⁻²     22.5
Accident I  8000                3 × 10⁻⁴     2.4
Accident J  7000                1 × 10⁻⁴     0.7
Example 1 In a chemical plant, event probabilities and consequences for accidents A–J are known (as shown in Table 9.1). Construct the risk curve.
Solution: Calculations are given in Table 9.2.
Table 9.2 Risk calculations
x         100     500     750     1000    1500    2500    5000    7000    8000    10000
P(X ≥ x)  0.1869  0.1769  0.1569  0.1269  0.0869  0.0069  0.0059  0.0009  0.0008  0.0005
Figure 9.2 Risk curve for Example 1 (P(X ≥ x) versus consequences x)
The risk for each accident is calculated (Table 9.2), and the constructed risk curve is shown in Figure 9.2. By providing safeguards against accidents D, A, and F, the risk can be reduced significantly.
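The calculations in Tables 9.1 and 9.2 can be sketched in a few lines of code. Assuming the accidents are mutually exclusive, P(X ≥ x) is the sum of the probabilities of all accidents whose consequence is at least x (an illustrative sketch, not part of the original solution):

```python
# Risk curve (complementary cumulative distribution) for Example 1.
# Data from Table 9.1: (consequence in $, probability) per accident.
accidents = {
    "A": (1000, 4e-2), "B": (500, 2e-2), "C": (100, 1e-2),
    "D": (1500, 8e-2), "E": (10000, 5e-4), "F": (5000, 5e-3),
    "G": (2500, 1e-3), "H": (750, 3e-2), "I": (8000, 3e-4),
    "J": (7000, 1e-4),
}

# Point risk of each accident: probability times consequence (Equation 9.1).
risk = {k: p * x for k, (x, p) in accidents.items()}

# P(X >= x): sum the probabilities of all accidents whose consequence
# is at least x (accidents assumed mutually exclusive).
levels = sorted({x for x, _ in accidents.values()})
ccdf = {x: sum(p for (xi, p) in accidents.values() if xi >= x)
        for x in levels}

print(round(risk["D"], 1))   # dominant contributor
print(round(ccdf[100], 4))   # 0.1869, as in Table 9.2
print(round(ccdf[1500], 4))  # 0.0869
```

Sorting the consequence levels gives the abscissa of the risk curve plotted in Figure 9.2.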
9.3 Probabilistic Safety Assessment Procedure
The procedure presented here is general and is derived based on the insights from [2–5], as shown in Figure 9.3.
PSA begins with the definition of the objectives and scope of the analysis, which are necessary for the organization and management of the study. This is also useful to inform the specification of consequence categories (for accident sequence development) and the frequency cut-off that serve to bound the analysis.
A thorough understanding of the system is the prerequisite for doing PSA. System awareness through system/plant visits and discussions with designers and operating and maintenance engineers enhances this understanding. The information for system familiarization and for further analysis is obtained from design reports, toxic inventory, technical specification reports (for test, maintenance, and operating description), history cards, and safety-related unusual occurrence reports.
A list should be made of all the sources of hazard (toxic substances, radioactivity, fire, etc.) from which accidents can progress. Several approaches are available for identifying the hazards or accident initiators. These include preliminary hazard analysis (PHA), HAZOP, master logic diagrams (MLDs), and failure mode and effect analysis (FMEA). After identifying the IEs, the potential states of the plant to be analyzed are determined, and the safety functions incorporated into the plant are identified. The relationships between IEs, safety functions, and systems are established and categorized.
Figure 9.3 PSA procedure (flowchart: objective definition and system description; hazard identification (initiating events); accident sequence modeling; incident frequency estimation with data; consequence estimation; risk estimation; risk assessment; risk management)
After identifying IEs, the next step is the development of a model to simulate the initiation of an accident and the response of the plant/system. A typical accident sequence consists of an IE, specific system failures/successes, human errors, and associated consequences. Accident sequence modeling can be divided into:
• event sequence modeling;
• system modeling;
• consequence analysis.
An event sequence model provides the sequence of events, following an IE, leading to different consequences. There are several methods available for event sequence modeling:
• event sequence diagrams;
• event trees;
• cause-consequence diagrams.
Once the response of the plant to the IEs has been modeled by one of the available event sequence modeling techniques, system modeling is used to quantify system failure probabilities and subsequently accident sequence frequencies. There are several methods available for system modeling:
• fault trees;
• state space diagrams;
• reliability block diagrams (RBDs);
• GO charts.
Consequence analysis determines the expected severity of each sequence obtained from the event sequence model. It is specific to the system, and the units could be the number of fatalities, radioactivity dose, or damage in terms of dollars. Risk estimation combines the consequences and frequencies (or likelihoods) of all accident sequences to provide a measure of risk.
After obtaining a quantitative measure from risk estimation, risk evaluation is carried out, taking into account judgments about the significance of hazardous events and risk levels. It also requires the introduction of acceptance standards. Risk assessment covers risk estimation and evaluation.
Risk assessment also identifies the major contributors to the overall risk. This is the main input to risk management, where decisions are to be made regarding efforts to reduce the risk. Risk management focuses on the development of implementation strategy, examination of policy options, environmental monitoring, and operations auditing. Thus, risk management includes risk estimation, evaluation, and decision-making.
Interested readers can refer to [6–9] for further details on PSA and specific applications, including the chemical and nuclear industries.
9.4 Identification of Hazards and Initiating Events
9.4.1 Preliminary Hazard Analysis
PHA identifies major hazards of the system, their causes, and the severity of the consequences. PHA is usually used during the preliminary design stage. A guide list containing components which are potentially hazardous (e.g., toxic materials, fuels, pressure containers, heating devices, and radioactive materials) is very useful information for beginning a PHA. Understanding the physics of these components and their interaction with neighboring equipment is useful in identifying hazardous situations. A thorough understanding of the system, along with the operating experience of the specific system, aids in completing the PHA tables. A typical PHA table for a nuclear power plant (NPP) is shown in Table 9.3. The information from the PHA is very elementary, and it does not identify specific components in the systems which have the potential to create hazardous scenarios. FMEA and HAZOP are most widely used in chemical and process plants for hazard identification.
Table 9.3 PHA format for an NPP (example entry)

Hazardous element: Reactor core
Event causing hazardous situation: Rupture of header due to corrosion
Hazardous situation: Reactor shutdown system failure
Event leading to potential accident: Safety system emergency core cooling system is unavailable
Potential accident: Release of radioactive material
Effects: Radiation dose to operator
Preventive measures: Proper water chemistry to prevent corrosion
9.4.2 Master Logic Diagram
The MLD is a hierarchical, top-down display of IEs, showing general types of undesired events at the top, proceeding to increasingly detailed event descriptions at lower tiers, and displaying IEs at the bottom. The goal is not only to support identification of a comprehensive set of IEs, but also to group them according to the challenges that they pose (the responses that are required as a result of their occurrence). IEs that are completely equivalent in the challenges that they pose, including their effects on subsequent pivotal events, are equivalent in the risk model.
A useful starting point for identification of IEs is a specification of “normal” operation in terms of (a) the nominal values of a suitably chosen set of physical variables and (b) the envelope in this variable space outside of which an IE would be deemed to have occurred. A comprehensive set of process deviations can thereby be identified, and causes for each of these can then be addressed in a systematic way. An MLD for a typical pressurized heavy-water reactor (PHWR) is shown in Figure 9.4.
9.5 Event Tree Analysis
Event tree analysis is an inductive method which shows all possible outcomes resulting from an IE. The IE can be a subsystem failure, an external event (like flood, fire, or earthquake), or an operator error. The event tree models the sequences containing the relationships among the IE and subsequent responses, along with the end states. The subsequent response events (branches of the event tree) are safety systems, also known as pivotal events. Various accident sequences can be identified, and the probability of occurrence of each sequence is further quantified.
Although it is theoretically possible to develop risk models using only event trees or only fault trees, it would be very difficult in the case of complex real systems. Hence a combination of event trees and fault trees is used to develop the risk models. Event trees model the accident sequences, whereas fault trees model individual system responses. However, if applicable probabilistic data are available for safety systems or pivotal events, then fault tree modeling is not required.
9.5.1 Procedure for Event Tree Analysis
The following steps are used for carrying out event tree analysis (see Figure 9.5). The construction of the event tree begins with an IE. An exhaustive list of accident IEs is identified, from which accident sequences can be postulated during different operational states of the overall system/plant/entity. There are several approaches available for preparing the list of IEs, such as the operational history of the system, reference lists for similar systems, MLDs, and PHA/HAZOP. The IEs include both internal events, such as equipment failure, human error, or software failure, and external events, such as fires, floods, and earthquakes.
Once the exhaustive list of IEs is prepared, a detailed analysis of each listed IE should be carried out to assess its causes and consequences. As the analysis of a large list wastes resources, IEs which may have the same consequences are grouped. This can be done if the demands these IEs place on safety functions, front-line systems, and support systems are the same. Hence, the safety functions that need to be performed by the pivotal events involved in responding to each IE are identified. Based on this information, the IE group can be modeled using the same event tree analysis.
Figure 9.4 MLD of typical PHWR [4] (top event: core damage; intermediate levels include decrease in RCS flow rate, decrease in reactor coolant inventory, reactivity and power distribution anomalies, increase and decrease in heat removal from the secondary side, increase in reactor coolant inventory, and others such as process water, end shield cooling, and compressed air system failures; bottom-level IEs include PHT header and piping failure, coolant channel failure, steam generator (SG) tube failure, loss of regulation, steam system piping failures inside and outside the RB, loss of normal feed flow, and power supply failure)
Figure 9.5 Procedure for carrying out event tree analysis (flowchart: list all accident initiators; identify safety functions; group the IEs; choose a group; order the safety functions and develop branches; identify accident sequences and corresponding end states; determine minimal cut sets for all sequences; under the given criterion of failure, identify the accident sequences; determine minimal cut sets for the overall system; quantify and document the results; repeat for all IE groups)
Figure 9.6 Typical event tree having three safety systems (S̄i = failed state; Si = success state)
Event trees are graphical models that order and reflect events according to the requirements for mitigation of each group of IEs. Events or “headings” of an event tree can be any (or a combination) of safety functions, safety systems, basic events, and operator actions. The event tree headings are generally ordered according to their time of intervention. A support system must enter the sequence before the systems it supports in order for a support system failure to fail those systems.
For each heading of the event tree, the set of possible success and failure states is defined and enumerated. Each state gives rise to a branching of the event tree. The general practice is to assign “success” to the top branch and “failure” to the bottom branch.
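The branching just described can be sketched by enumerating every success/failure combination of the pivotal events; with three safety systems this reproduces the eight end states of Figure 9.6 (a minimal sketch; the heading names are illustrative):

```python
from itertools import product

# Pivotal events (event tree headings), ordered by time of intervention.
systems = ["SS1", "SS2", "SS3"]

# Each path through the tree is one combination of branch states;
# "success" is listed first, matching the top-branch convention.
sequences = [dict(zip(systems, states))
             for states in product(("success", "failure"),
                                   repeat=len(systems))]

print(len(sequences))  # 8 = 2**3 end states
print(sequences[0])    # the all-success path (sequence 1)
print(sequences[-1])   # the all-failure path (sequence 8)
```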
The combination of all the states through the event tree branching logic gives different paths ending in accident or healthy scenarios. Each path consists of an IE and a combination of safety system states; such a path is known as an accident sequence. A typical event tree is shown in Figure 9.6; it has three safety systems (SS1, SS2, SS3), and the binary (success and failure) state combinations lead to eight (2³) end states. Table 9.4 lists the Boolean expressions for all the accident sequences in the event tree. In complex systems there can be many safety systems, making an event tree with a large number of sequences. However, the number of sequences can be reduced by eliminating physically impossible nodes. For example, a simplified event tree for a large loss-of-coolant accident (LLOCA) in a PHWR is shown in Figure 9.7. It has three pivotal events:
• reactor protection system (RPS);
• total power supply system (TPS);
• emergency core cooling system (ECCS).
Theoretically, as in the event tree of Figure 9.6, it should have eight paths, but it has only four. This is because:
• RPS failure will directly have significant consequence irrespective of other events.
• ECCS is dependent on the power supply.
The right side of the event tree represents the end state that results from each sequence through the event tree. The end state can have healthy or accident consequences. To determine such consequences, a thorough understanding of the system, operating experience, and analyses of accidents (like thermal hydraulic or chemical reaction studies) are required.
Figure 9.7 Event tree for LLOCA (headings: LLOCA, RPS, TPS, ECCS; four accident sequences)
Table 9.4 Boolean expressions for accident sequences

Accident sequence  Boolean expression
1                  I ∩ S1 ∩ S2 ∩ S3
2                  I ∩ S1 ∩ S2 ∩ S̄3
3                  I ∩ S1 ∩ S̄2 ∩ S3
4                  I ∩ S1 ∩ S̄2 ∩ S̄3
5                  I ∩ S̄1 ∩ S2 ∩ S3
6                  I ∩ S̄1 ∩ S2 ∩ S̄3
7                  I ∩ S̄1 ∩ S̄2 ∩ S3
8                  I ∩ S̄1 ∩ S̄2 ∩ S̄3
The pivotal events (branches of the event tree) may be simple basic events or may be complex systems which require a fault tree to obtain the minimal cut sets. The minimal cut set expression for each accident sequence is determined using Boolean algebra rules. This is illustrated with the following example.
Example 2 Consider the event tree shown in Figure 9.8. Let S̄1 and S̄2 be derived from fault trees and S̄3 be a basic event, given by the following expressions. Calculate the Boolean expression for accident sequence 8.
Solution: The Boolean expressions for the top events of the fault trees are

S̄1 = ab + ac,
S̄2 = (b + c)d,
S̄3 = e.

Accident sequence 8 has the Boolean expression

AS8 = I·S̄1·S̄2·S̄3.

Substituting the Boolean expression for each branch and using absorption (X + XY = X),

AS8 = I·[(ab + ac)·((b + c)d)]·e
AS8 = I·[abd + abcd + abcd + acd]·e
AS8 = I·[abd + acd]·e
AS8 = I·a·b·d·e + I·a·c·d·e.

Quantitative evaluation, the probability calculation of the accident sequence, is similar to fault tree evaluation. The probability of accident sequence 8 is

P(AS8) = P(I·a·b·d·e) + P(I·a·c·d·e) − P(I·a·b·c·d·e).
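The absorption steps can be checked mechanically. Representing each product term as a set of basic events, a term is redundant whenever it is a proper superset of another term (X + XY = X); this is a generic sum-of-products reduction sketch, not code from the text:

```python
def minimal_cut_sets(terms):
    """Drop any product term that is a proper superset of another
    (Boolean absorption X + XY = X); duplicates collapse as well."""
    terms = [frozenset(t) for t in terms]
    return {t for t in terms if not any(o < t for o in terms)}

# Example 2 fault tree top events (failed states, overbars omitted
# in the code): S1 = ab + ac, S2 = (b + c)d, S3 = e.
s1 = [{"a", "b"}, {"a", "c"}]
s2 = [{"b", "d"}, {"c", "d"}]
s3 = [{"e"}]

# AS8 = I . S1 . S2 . S3: expand the product of sums term by term.
expanded = [{"I"} | t1 | t2 | t3 for t1 in s1 for t2 in s2 for t3 in s3]
as8 = minimal_cut_sets(expanded)

print(sorted(sorted(t) for t in as8))
# [['I', 'a', 'b', 'd', 'e'], ['I', 'a', 'c', 'd', 'e']]
```

The two surviving cut sets are exactly I·a·b·d·e and I·a·c·d·e, matching the hand calculation.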
Figure 9.8 Event tree with fault trees of associated branches
In complex systems like NPPs, there will be several IE groups. Hence, minimal cut sets are determined for all the accident sequences once the system failure criteria are defined and all the accident sequences having the same consequences which satisfy the failure criteria are identified. The Boolean expression for the overall system is obtained in the form of minimal cut sets. The probabilities of the basic events are subsequently used to quantify the system measure.
An event tree for the IE class IV power supply failure of a PHWR is shown in Figure 9.9. Software is required to carry out the analysis for such large event trees.
Figure 9.9 Event tree for class IV failure of PHWR [10] (headings: Class IV, RPS, Class III, SSR, AFWS, SDCS, FFS; each sequence ends in a consequence and a frequency)
9.6 Importance Measures
Quantification of system risk/reliability gives only the overall system performance measure. Where improvements in system reliability or reduction in risk are required, one has to rank the components or, in general, the parameters of the system model. Importance measures determine the changes in the system metric due to changes in the parameters of the model. Based on these importance measures, critical parameters are identified. By focusing more resources on the most critical parameters, system performance can be improved effectively. Importance measures also provide invaluable information on the prioritization of components for inspection and maintenance activities. This section discusses various importance measures used in PSA.
9.6.1 Birnbaum Importance
The Birnbaum measure of importance is defined as the change in system risk for a change in the failure probability of a basic event. The basic event can be a component failure, a human error, or a parameter of the system risk model. It is mathematically expressed as
I_i^B = ∂R/∂p_i, (9.3)
where R is the system risk or unavailability, which is a function of n basic events:
R = f(x1, x2, …, xn), (9.4)
and pi is the probability of basic event xi. In PSA or design reliability analysis, R is expressed as the probability of the union of minimal cut sets. With the rare-event approximation, it is mathematically the sum of products of basic event probabilities. Separating the terms having the ith basic event probability pi from the sum of products gives the following equation:
R = p_i A_i + B, (9.5)
where B is the sum of products not having pi, and Ai is the sum of products containing pi, with pi factored out. Now the Birnbaum importance measure can be written as
I_i^B = ∂R/∂p_i = ∂(p_i A_i + B)/∂p_i = A_i. (9.6)
It can be observed from the final expression for the Birnbaum measure of importance that it does not contain the probability of the basic event, pi. This can make highly important but highly reliable basic events rank high in Birnbaum importance. For example, a passive component like a bus in an electrical power supply may have a high ranking. But the reliability level of the bus is already very high, so there is little to gain by focusing on it for system metric improvement.
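The independence of the Birnbaum measure from pi itself can be seen on a small two-cut-set model; the model R = p1p2 + p1p3 and its numbers below are invented for illustration (factoring out p1 gives A1 = p2 + p3):

```python
# Hypothetical risk model with cut sets {1,2} and {1,3}:
# R = p1*p2 + p1*p3 = p1*(p2 + p3), so A1 = p2 + p3 and B = 0.
def risk(p1, p2, p3):
    return p1 * p2 + p1 * p3

p1, p2, p3 = 1e-4, 1e-2, 2e-2

# Birnbaum importance of event 1 via a central finite difference;
# it equals A1 = p2 + p3 no matter how small p1 is.
h = 1e-8
birnbaum_1 = (risk(p1 + h, p2, p3) - risk(p1 - h, p2, p3)) / (2 * h)

print(round(birnbaum_1, 6))  # 0.03 = p2 + p3, independent of p1
```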
9.6.2 Inspection Importance
Inspection importance is the risk due to the cut sets containing component i. It is the Birnbaum importance of a basic event multiplied by the probability of that basic event:

I_i^I = p_i × ∂R/∂p_i = p_i A_i. (9.7)
9.6.3 Fussell–Vesely Importance
This is the fractional change in risk for a fractional change in a basic event probability, i.e.,

I_i^FV = (∂R/R)/(∂p_i/p_i) = (p_i/R) × ∂R/∂p_i. (9.8)

Since ∂R/∂p_i = A_i, we have

I_i^FV = p_i A_i / R. (9.9)
The three importance measures discussed previously deal with basic event probabilities one event at a time. Borgonovo and Apostolakis [11] proposed a differential importance measure (DIM) which considers all basic event probabilities. It is defined as follows:
I_i^DIM = (∂R/∂p_i) dp_i / Σ_{j=1}^{n} (∂R/∂p_j) dp_j. (9.10)
Assuming a uniform change for all basic events, the DIM can be expressed as a function of Birnbaum importance:

I_i^DIM = I_i^B / Σ_{j=1}^{n} I_j^B. (9.11)
Assuming a uniform percentage change for all basic event probabilities (δpi/pi), the DIM can be expressed as a function of Fussell–Vesely importance:

I_i^DIM = I_i^FV / Σ_{j=1}^{n} I_j^FV. (9.12)
This is applicable under all analysis conditions, for example when the parameters of the model have different dimensions.
Example 3 An emergency power supply has three diesel generators (DGs). One DG is sufficient for all the emergency loads, and the loads are connected to the buses in such a way that failure of any one bus will not affect the performance of the system. The line diagram of the power supply is shown in Figure 9.10. Construct the fault tree and calculate the minimal cut sets. Using importance measures, rank the components of the system.
Figure 9.10 Line diagram of emergency power supply
Solution: The fault tree for the emergency power supply is shown in Figure 9.11. The minimal cut sets of the system are
T = B1·B2 + B1·DG2·DG3 + B1·DG2·CB2 + B2·DG1·DG3 + B2·DG1·CB1 + DG1·DG2·DG3 + DG1·DG2·CB1·CB2.
Using the formulae mentioned above, importance measures are evaluated for each component as shown in Table 9.5.
Figure 9.11 Fault tree for emergency supply failure (top event Q = 2.267 × 10⁻⁷; basic events: bus failure Q = 2.560 × 10⁻⁶, DG unavailable Q = 6.096 × 10⁻³, circuit breaker transfer failure Q = 1.712 × 10⁻⁴)
Table 9.5 Importance measures for each component
Component  Birnbaum  Inspection  Fussell–Vesely  DIM(1)   DIM(2)
B1         4.07E-5   1.043E-10   4.6E-4          0.211    1.53E-4
B2         4.07E-5   1.043E-10   4.6E-4          0.211    1.53E-4
DG1        3.71E-5   2.26E-7     9.995E-1        0.192    0.333
DG2        3.71E-5   2.226E-7    9.995E-1        0.192    0.333
DG3        3.719E-5  2.267E-7    9.999E-1        0.1926   0.3333
CB1        2.197E-8  3.76E-12    1.658E-5        1.13E-4  5.52E-6
CB2        2.197E-8  3.76E-12    1.658E-5        1.13E-4  5.52E-6
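Under the rare-event approximation, the entries of Table 9.5 can be reproduced directly from the minimal cut sets and the unavailabilities annotated in Figure 9.11; the sketch below (illustrative, with values rounded) computes the Birnbaum, inspection, and Fussell–Vesely measures:

```python
# Basic event unavailabilities from Figure 9.11.
q = {"B1": 2.560e-6, "B2": 2.560e-6, "CB1": 1.712e-4, "CB2": 1.712e-4,
     "DG1": 6.096e-3, "DG2": 6.096e-3, "DG3": 6.096e-3}

# Minimal cut sets of the top event (emergency supply failure).
cut_sets = [{"B1", "B2"}, {"B1", "DG2", "DG3"}, {"B1", "DG2", "CB2"},
            {"B2", "DG1", "DG3"}, {"B2", "DG1", "CB1"},
            {"DG1", "DG2", "DG3"}, {"DG1", "DG2", "CB1", "CB2"}]

def prod(events):
    p = 1.0
    for e in events:
        p *= q[e]
    return p

# Rare-event approximation: R is the sum over minimal cut sets.
R = sum(prod(cs) for cs in cut_sets)

def birnbaum(i):
    # A_i: cut set products containing i, with q_i factored out (Eq. 9.6).
    return sum(prod(cs - {i}) for cs in cut_sets if i in cs)

def inspection(i):           # Eq. 9.7
    return q[i] * birnbaum(i)

def fussell_vesely(i):       # Eq. 9.9
    return inspection(i) / R

print(f"{R:.3e}")                      # 2.267e-07, the top event Q
print(f"{birnbaum('B1'):.3e}")         # 4.076e-05 (Table 9.5: 4.07E-5)
print(f"{fussell_vesely('DG3'):.4f}")  # 0.9999
```

The DIM columns follow by normalizing either the Birnbaum or the Fussell–Vesely column (Equations 9.11 and 9.12).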
Table 9.6 Importance ranking
Component  Birnbaum  Inspection  Fussell–Vesely  DIM(1)  DIM(2)
B1         1, 2      4, 5        4, 5            1, 2    4, 5
B2         1, 2      4, 5        4, 5            1, 2    4, 5
DG1        4, 5      2, 3        2, 3            4, 5    2, 3
DG2        4, 5      2, 3        2, 3            4, 5    2, 3
DG3        3         1           1               3       1
CB1        6, 7      6, 7        6, 7            6, 7    6, 7
CB2        6, 7      6, 7        6, 7            6, 7    6, 7
The ranking is then assigned to each component based on the obtained values, as shown in Table 9.6.
9.7 Common-cause Failure Analysis
Dependencies that exist inherently in engineering systems impose limitations on achieving high reliability and safety. By providing a high factor of safety and redundancy, reliability and safety can be improved up to a certain level, but beyond that it is a challenge to improve them due to the dependencies. For example, all redundant trains may fail due to exposure to a harsh physical environment. The recognition of dependent failures can provide insights into the strong and weak points of system design and operation. All the dependencies should be listed separately and should also be properly included in fault tree/event tree models in order to evaluate their impact on the level of risk correctly. Nevertheless, not all dependent failures have specific root causes that can be incorporated directly into fault trees/event trees. Dependent failures whose root causes are not explicitly modeled are known as CCFs. This section provides a brief description of various CCF models available in the literature.
9.7.1 Treatment of Dependent Failures
In the probability framework, the probability of the simultaneous occurrence of two events A and B is given by

P(A ∩ B) = P(A)·P(B|A).

Generally in PSA calculations, it is assumed that A and B are independent, and simply the product of P(A) and P(B) is used:

P(A ∩ B) = P(A)·P(B).
However, in the presence of positive dependency, P(A ∩ B) > P(A)·P(B), due to the fact that P(B|A) > P(B). Thus, the independence assumption may underestimate the risk value if positive dependency exists in reality.
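A small numeric illustration of this point (the numbers are hypothetical, not from the text): two redundant pumps that each fail with probability 0.01 give a joint failure probability of 10⁻⁴ under the independence assumption, but a conditional failure probability of 0.1 raises it tenfold:

```python
# Effect of positive dependency on a joint failure probability
# (illustrative numbers, not taken from the text).
p_a = 0.01          # P(A): pump A fails
p_b = 0.01          # P(B): pump B fails
p_b_given_a = 0.1   # P(B|A) > P(B): a shared cause couples the failures

independent = p_a * p_b          # P(A)P(B),   the usual PSA shortcut
dependent = p_a * p_b_given_a    # P(A)P(B|A), the correct joint value

# The independence assumption underestimates the joint probability.
print(round(dependent / independent))  # 10
```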
There can be many different classifications of dependencies. As per the standard on CCF, NUREG/CR-4780 [12] and IAEA 50-P-4 [3] categorize dependences into the following types.
9.7.1.1 Functional Dependences
These dependences exist among systems, trains, subsystems, or components due to the sharing of hardware or due to a process coupling. Shared hardware refers to the dependence of multiple systems, trains, subsystems, or components on the same equipment. In process coupling, the function of one system, train, subsystem, or component depends directly or indirectly on the function of another. A direct dependence exists when the output of one system, train, subsystem, or component constitutes an input to another. An indirect dependence exists when the functional requirements of one system, train, subsystem, or component depend on the state of another. Possible direct process couplings between systems, trains, subsystems, or components include electrical, hydraulic, pneumatic, and mechanical connections.
9.7.1.2 Physical Dependences
There are two types of physical dependences:
• Those dependences that cause an IE and also possibly the failure of plant mitigating systems due to the same influence, e.g., external hazards and internal events. Such events are certain transients, earthquakes, fires, floods, etc.
• Those dependences that increase the probability of multiple system failures. Often they are associated with extreme environmental stresses created by the failure of one or more systems after an IE, or by the IE directly. Examples are fluid jets and environmental effects caused by LOCAs.
It should be emphasized that proximity is not the only environmental coupling inducing physical dependence. A ventilation duct, for example, might create an environmental coupling among systems, trains, subsystems, or components located in seemingly decoupled locations. Radiation coupling and electromagnetic coupling are two other forms not directly associated with a common spatial domain.
9.7.1.3 Human Interaction Dependence
Two types of dependence introduced by human actions can be distinguished: those based on cognitive behavioral processes and those based on procedural behavioral processes. Cognitive human errors can result in multiple faults once an event has been initiated. Dependences due to procedural human errors include multiple maintenance errors that result in dependent faults with effects that may not be immediately apparent (e.g., miscalibration of redundant components).
In all the dependency categories, the concern is the failure of two or more components due to a shared cause or event. If a clear cause-effect relationship causing the failure of multiple events can be identified, then it should be modeled explicitly in the system model.
For example, fires, floods, and earthquakes are treated explicitly as initiating events of event trees in PSA. Human errors are also included as branches of event trees. However, multiple failure events for which no clear root cause can be identified are modeled using implicit methods categorized as CCF models; thus CCFs represent residual dependencies that are not explicitly modeled in event/fault trees. CCFs can therefore belong to any of the above-mentioned types.
CCFs are classified as due to design, construction, procedural, and environmental causes. These can be further subdivided into functional deficiencies, realization faults, manufacturing, installation, test and maintenance, operation, human error, normal extremes, and energetic extremes. The predominant causes are design (30–50%), operation and maintenance errors (30%), and the rest are due to normal and extreme environmental causes (30%).
Examples of CCFs:

• fire at Browns Ferry NPP;
• failure of all three redundant auxiliary feed water pumps during the Three Mile Island accident.
9.7.1.4 Defense Against Common-cause Failure
By adopting the following defenses during design and operating practices, one can eliminate or reduce the vulnerability of the system to CCFs:
• diversity (e.g., in NPPs shutdown can be achieved by inserting shutoff rods or by injecting liquid poison into the moderator; the physical principles of operation are completely independent; diversity can also be provided on the manufacturing side);
• staggered testing;
• staggered maintenance;
• physical barriers.
9.7 Common-cause Failure Analysis 345
9.7.2 Procedural Framework for Common-cause Failure Analysis
The procedure for the CCF analysis is divided into three phases.

Phase 1 – screening analysis
Steps:
1.1 plant familiarization, problem definition, and system modeling;
1.2 preliminary analysis of CCF vulnerability;
    1.2.1 qualitative screening;
    1.2.2 quantitative screening.

Phase 2 – detailed qualitative analysis
Steps:
2.1 review of operating experience;
2.2 development of root cause defense matrices.

Phase 3 – detailed quantitative analysis
Steps:
3.1 selection of probability models for common-cause basic events (CCBEs);
3.2 data analysis;
3.3 parameter estimation;
3.4 quantification;
3.5 sensitivity analysis;
3.6 reporting.
Interested readers can refer to [13, 14] for detailed description of the procedural framework that was developed for performing a CCF analysis.
9.7.3 Treatment of Common-cause Failures in Fault Tree Models
Components having implicit shared causes of failure are identified in the model. The fault trees are then modified to explicitly include these shared causes. Based on the number of components, new basic events are introduced to represent these common causes. The fault trees are then solved to obtain minimal cut sets, which are now updated with CCBEs. To illustrate the treatment of CCFs, consider a system having two identical redundant components. If CCF is not considered, there is only one cut set, as shown in Figure 9.12 (a), where A denotes the failure of component A and B denotes the failure of component B.
In the presence of CCF, an event C_AB can be defined as the failure of both components A and B due to a common cause. Each component basic event becomes a subtree containing its independent failure and CCBEs. Figure 9.12 (b) shows the fault tree model considering the CCBE. Using Boolean algebra, it can be simplified to the fault tree shown in Figure 9.12 (c). Thus, the Boolean expression of the system is given by T = A_I·B_I + C_AB.
Figure 9.12 Active parallel redundancy (a) without CCF and (b), (c) with CCF
To illustrate the difference between the case where CCF is not considered and the case with CCF, a numerical example is considered here.
Example 4 The total probability of failure of each of the components A and B is 0.05. It is known from experience that 10% of the time both components fail due to a common cause. Compare the probability of failure of the system consisting of A and B in active parallel redundancy (i) without considering CCF and (ii) considering CCF.
Solution:
Case (i): refer to Figure 9.12 (a).
P(T) = P(A·B) = P(A)P(B) = 2.5E–3.
Case (ii): refer to Figure 9.12 (b,c).
Here P(C_AB) = 0.1 × 0.05 = 0.005 and P(A_I) = P(B_I) = 0.05 − 0.005 = 0.045, so

P(T) = P(A_I·B_I + C_AB)
     = P(A_I)P(B_I) + P(C_AB) − P(A_I)P(B_I)P(C_AB)
     = (0.045)(0.045) + 0.005 − (0.045)(0.045)(0.005)
     = 7.015E–3.
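The two cases can be checked with a short script; this is a minimal sketch (variable names are illustrative, not from the text):

```python
# Worked check of Example 4: two redundant components, Qt = 0.05,
# with 10% of each component's failures attributable to a shared cause.
q_t = 0.05                 # total failure probability of each component
beta = 0.10                # common-cause fraction
q_ccf = beta * q_t         # P(C_AB), the common-cause basic event
q_ind = (1 - beta) * q_t   # independent part, P(A_I) = P(B_I)

# Case (i): CCF ignored -> single cut set {A, B}
p_no_ccf = q_t ** 2

# Case (ii): T = A_I.B_I + C_AB, probability of the union of two events
p_ccf = q_ind**2 + q_ccf - q_ind**2 * q_ccf

print(f"without CCF: {p_no_ccf:.3e}")  # 2.500e-03
print(f"with CCF:    {p_ccf:.3e}")     # 7.015e-03
```

The roughly threefold difference shows why neglecting CCF is non-conservative for redundant trains.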
Thus neglecting CCF can underestimate the final result.

In the case of two redundant components, there is only one CCBE, C_AB. In a system of three redundant components A, B, and C, the CCBEs are C_AB, C_AC, C_BC, and C_ABC. The first three events represent CCBEs involving any two components and the fourth is a CCBE involving all three components. Each component can fail due to its independent cause and its associated CCBEs. For example, the component A failure can be represented by the subtree shown in Figure 9.13.
The Boolean expression for the total failure of component A is given by

A = A_I + C_AB + C_AC + C_ABC.

Let

P(A_I) = Q_1,  P(C_AB) = P(C_AC) = Q_2,  P(C_ABC) = Q_3.

Then

P(A) = Q_1 + 2Q_2 + Q_3.
Generalizing to n-component common-cause groups, assume that the probability of a CCBE depends only on the number of components involved in that basic event and not on the specific components.

Let Q_1 represent each independent failure probability and Q_i represent the probability of failure of a CCBE involving i components. The total probability of failure of a component, Q_t, is expressed as

Q_t = Σ_{i=1}^{n} C(n−1, i−1) Q_i,

where C(n−1, i−1) is the binomial coefficient.
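The combinatorial sum can be written compactly; a minimal sketch (the function name is illustrative):

```python
from math import comb

def total_failure_prob(q, n):
    """Q_t = sum_{i=1..n} C(n-1, i-1) * Q_i for an n-component
    common-cause group; q is the list [Q_1, ..., Q_n]."""
    return sum(comb(n - 1, i - 1) * q[i - 1] for i in range(1, n + 1))

# For n = 3 this reduces to Q_t = Q_1 + 2*Q_2 + Q_3, as derived above.
q_t = total_failure_prob([0.012, 0.001, 0.0006], 3)
print(q_t)
```

With the values used in Example 5 below, the sum evaluates to Q_t = 0.0146.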
Qi values can be computed from experience. To account for the lack of data, parameter models such as beta-factor, alpha-factor, and multiple-Greek-letter (MGL) models are used which put less stringent requirements on the data.
Figure 9.13 Component A failure
Example 5 Consider the system of three identical parallel pumps in the pumping system of Figure 9.14. Operation of one pump is sufficient for successful operation of the system. Develop the fault tree with CCBEs and quantify the failure probability of the system given Q_1 = 0.012, Q_2 = 0.001, and Q_3 = 0.0006.
Solution: From the fault tree (Figure 9.15), we have the following gate expressions:

T = A·B·C,
A = A_I + C_AB + C_AC + C_ABC,
B = B_I + C_AB + C_BC + C_ABC,
C = C_I + C_BC + C_AC + C_ABC.
Using the absorption and idempotent laws of Boolean algebra, A·B can be simplified to

A·B = (A_I + C_AB + C_AC + C_ABC)(B_I + C_AB + C_BC + C_ABC)
    = A_I·B_I + C_AB + A_I·C_BC + B_I·C_AC + C_AC·C_BC + C_ABC,

and

A·B·C = (A_I·B_I + C_AB + A_I·C_BC + B_I·C_AC + C_AC·C_BC + C_ABC)(C_I + C_BC + C_AC + C_ABC).
Figure 9.14 Pumping system
Figure 9.15 Fault tree for pumping system
After simplification, the final Boolean expression is

T = C_ABC + A_I·C_BC + B_I·C_AC + C_I·C_AB + C_AB·C_BC + C_AC·C_BC + C_AC·C_AB + A_I·B_I·C_I.
Using the rare-event approximation and assuming

P(A_I) = P(B_I) = P(C_I) = Q_1,
P(C_AB) = P(C_BC) = P(C_AC) = Q_2,
P(C_ABC) = Q_3,

the failure probability of the system is

P(T) = Q_1^3 + 3Q_1·Q_2 + 3Q_2^2 + Q_3.

Substituting the given values, P(T) = 6.41E–4.
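The rare-event sum over the minimal cut sets can be rechecked with a short script (a sketch; each coefficient counts the cut sets of that form):

```python
# Rare-event sum over the minimal cut sets of the three-pump system:
# T = C_ABC + A_I.C_BC + B_I.C_AC + C_I.C_AB
#       + C_AB.C_BC + C_AC.C_BC + C_AB.C_AC + A_I.B_I.C_I
q1, q2, q3 = 0.012, 0.001, 0.0006

p_t = q3 + 3 * q1 * q2 + 3 * q2**2 + q1**3
print(f"P(T) = {p_t:.3e}")
```

The triple common-cause event Q_3 dominates the result, which is typical for highly redundant trains.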
9.7.4 Common-cause Failure Models
CCF models are used to quantify CCBEs (Table 9.7). The common-cause models can be classified, based on how multiple failures occur, into two major categories [12]: non-shock models and shock models. These are described below.
9.7.4.1 Non-shock Models
Non-shock models are CCF models that estimate multiple failure probabilities without postulating a model for the underlying failure mechanism. Examples are the basic-parameter, beta-factor, MGL, and alpha-factor models.
Beta-factor Model
The first CCF model applied in risk and reliability analysis is the beta-factor model introduced by Fleming [15]. The beta-factor model is a single-parameter model; that is, it uses one parameter in addition to the total component failure probability to calculate the CCF probabilities. This model assumes that a constant fraction (β) of the component failure rate can be associated with CCBEs shared by the other components in that group. Another assumption is that whenever a CCBE occurs, all components within the common-cause component group are assumed to fail.
Q_I = (1 − β)Q_t,
Q_m = βQ_t.     (9.13)

This implies that

β = Q_m / (Q_I + Q_m),     (9.14)
where Q_t is the total failure probability of one component (Q_t = Q_I + Q_m), Q_I is the independent failure probability of the single component, Q_m is the probability of the basic event failure involving m specific components, and m is the maximum number of components in the common-cause group. To generalize the equation, it can be written for m components involving failure of k components (k ≤ m):

Q_k = (1 − β)Q_t,   k = 1,
Q_k = 0,            2 ≤ k < m,
Q_k = βQ_t,         k = m,     (9.15)

where Q_k is the probability of a basic event involving k specific components.
Table 9.7 CCF models and parameter estimates

Basic parameter
  General form: the Q_k are themselves the model parameters
  Point estimator: Q_k = n_k / (C(m,k)·N_D)

Beta-factor (β)
  General form: Q_k = (1 − β)Q_t for k = 1; Q_k = 0 for 2 ≤ k < m; Q_k = βQ_t for k = m
  Point estimator: β = Σ_{k=2}^{m} k·n_k / Σ_{k=1}^{m} k·n_k

Multiple Greek letters (β, γ, δ, …)
  General form: Q_k^(m) = (1 / C(m−1, k−1)) (Π_{l=1}^{k} ρ_l)(1 − ρ_{k+1}) Q_t,
  with ρ_1 = 1, ρ_2 = β, ρ_3 = γ, …, ρ_{m+1} = 0
  Point estimator: ρ_l = Σ_{k=l}^{m} k·n_k / Σ_{k=l−1}^{m} k·n_k

Alpha-factor (α)
  General form: Q_k^(m) = (k / C(m−1, k−1)) (α_k^(m) / α_t) Q_t, where α_t = Σ_{k=1}^{m} k·α_k^(m)
  Point estimator: α_k = n_k / Σ_{k=1}^{m} n_k

Binomial failure rate
  General form: Q_1 = Q_I + μp(1 − p)^{m−1} for k = 1;
  Q_k = μp^k(1 − p)^{m−k} for 2 ≤ k < m;
  Q_m = μp^m + ω for k = m
  Point estimators: Q_I = n_I / (m·N_D); λ_t = n_t / N_D; ω = n_L / N_D;
  p is the solution of Σ_{k=1}^{m} k·n_k = m·n_t·p / (1 − (1 − p)^m); μ = λ_t / (1 − (1 − p)^m)
A practical and useful feature of this model is that the estimators of β do not explicitly depend on system or component success data, which are not generally available. Also, estimates of β for widely different types of components do not appear to vary much compared to Q_k. These two observations and the simplicity of the model are the main reasons for its wide use in risk and reliability studies. However, application of this model should be limited to certain values of m.
Estimator for the Beta-factor Model Parameter
Although the beta-factor model was originally developed for a system of two redundant components, and the estimators often presented in the literature also assume that the data are collected from two-unit systems, a generalized beta-factor estimator can be defined for a system of m redundant components. Such an estimator is based on the following general definition of the beta-factor (identical to the way it is defined in the more general MGL model):
β = (1 / Q_t) Σ_{k=2}^{m} C(m−1, k−1) Q_k   or   β = Σ_{k=2}^{m} k·n_k / Σ_{k=1}^{m} k·n_k.     (9.16)
Example 6 Two redundant DGs are provided as part of the emergency power supply to take safety loads at the time of grid supply failure. The information given below is available from operational experience. Calculate the total failure probability of one component, Q_t; the independent failure probability of a single DG, Q_I; and the probability of the basic event failure involving both DGs, Q_2.
Solution:
No. of demands = N_D = 1500;
No. of times DG1 or DG2 alone failed = n_1 = 50;
No. of times both DGs failed = n_2 = 4.
Parameter estimation:

β = Σ_{k=2}^{m} k·n_k / Σ_{k=1}^{m} k·n_k.

With m = 2 in the given example, β = 2n_2/(n_1 + 2n_2) = 0.138.

Calculation of failure probabilities: the general expression for Q_t, irrespective of CCF model, is

Q_t = (1 / (m·N_D)) Σ_{k=1}^{m} k·n_k;

Q_t = (n_1 + 2n_2)/(2N_D) = 0.01933;
Q_I = (1 − β)Q_t = 1.667E–2;
Q_2 = β·Q_t = 2.668E–3.
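The estimation steps above can be sketched in a few lines (variable names are illustrative):

```python
# Beta-factor estimation from operating experience (Example 6)
n_d = 1500   # number of demands
n1 = 50      # events in which a single DG failed
n2 = 4       # events in which both DGs failed
m = 2        # redundancy level

beta = 2 * n2 / (n1 + 2 * n2)     # CCF fraction, ~0.138
q_t = (n1 + 2 * n2) / (m * n_d)   # total failure probability per DG
q_2 = beta * q_t                  # CCBE involving both DGs
q_i = (1 - beta) * q_t            # independent failure probability

print(beta, q_t, q_2, q_i)
```

Note the weighting by k in the numerator and denominator: an event failing both DGs contributes two component failures to the exposure count.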
Multiple-Greek-letter Model
The MGL model is an extension of the beta-factor model introduced by Fleming and Kalinowski [16]. The MGL model was the one used most frequently in the international CCF reliability benchmark exercise. In this model, other parameters in addition to the beta-factor are introduced to account more explicitly for higher-order redundancies and to allow for different probabilities of failure of subgroups of the common-cause component group.
The MGL parameters consist of the total component failure probability Q_t, which includes the effects of all independent and common-cause contributions to component failure, and a set of failure fractions. These fractions are used to quantify the conditional probabilities of all the possible ways a CCF of a component can be shared with other components in the same group, given that a component failure has occurred. For a group of m redundant components and for each given failure mode, m different parameters are defined.
For a general case,

Q_k^(m) = (1 / C(m−1, k−1)) (Π_{l=1}^{k} ρ_l)(1 − ρ_{k+1}) Q_t,
where ρ_1 = 1, ρ_2 = β, ρ_3 = γ, ρ_4 = δ, …, ρ_{m+1} = 0.

β is the conditional probability that the cause of a component failure will be shared by one or more additional components, given that a specific component has failed.
γ is the conditional probability that the cause of a component failure that is shared by one or more components will be shared by two or more additional components, given that two specific components have failed.

δ is the conditional probability that the cause of a component failure that is shared by two or more components will be shared by three or more additional components, given that three specific components have failed.
The following equations express the probability of multiple component failures due to common cause, Q_k, in terms of the MGL parameters. For a three-component group:

Q_1^(3) = (1 − β)Q_t;  Q_2^(3) = (β(1 − γ)/2)Q_t;  Q_3^(3) = βγQ_t.
For a four-component group:

Q_1^(4) = (1 − β)Q_t;  Q_2^(4) = (β(1 − γ)/3)Q_t;  Q_3^(4) = (βγ(1 − δ)/3)Q_t;  Q_4^(4) = βγδQ_t.
In general,

Q_t = Σ_{k=1}^{m} C(m−1, k−1) Q_k^(m),

so for a four-component group the MGL model has four parameters:

Q_t = Q_1^(4) + 3Q_2^(4) + 3Q_3^(4) + Q_4^(4),     (9.17)

β = (3Q_2^(4) + 3Q_3^(4) + Q_4^(4)) / (Q_1^(4) + 3Q_2^(4) + 3Q_3^(4) + Q_4^(4)),     (9.18)

γ = (3Q_3^(4) + Q_4^(4)) / (3Q_2^(4) + 3Q_3^(4) + Q_4^(4)),     (9.19)

δ = Q_4^(4) / (3Q_3^(4) + Q_4^(4)).     (9.20)
Estimators for the MGL Parameters
Based on the definition of the MGL parameters, the simple point estimators are

ρ_l = Σ_{k=l}^{m} k·n_k / Σ_{k=l−1}^{m} k·n_k   (l = 2, 3, …, m),

where n_k is defined as the number of events involving the failures of exactly k components.
Example 7 In a four-unit redundant system, for which we have the following data, estimate the parameters of the MGL model and calculate the CCF failure probabilities:
N_D = 1000;
Redundancy level = m = 4;
No. of independent events = n_1 = 50;
No. of events involving 2 components = n_2 = 10;
No. of events involving 3 components = n_3 = 4;
No. of events involving 4 components = n_4 = 1.
Solution:

Parameter estimation:

ρ_2 = β = Σ_{k=2}^{4} k·n_k / Σ_{k=1}^{4} k·n_k = 0.42,
ρ_3 = γ = Σ_{k=3}^{4} k·n_k / Σ_{k=2}^{4} k·n_k = 0.44,
ρ_4 = δ = 4n_4 / Σ_{k=3}^{4} k·n_k = 0.25.

Q_t = (n_1 + 2n_2 + 3n_3 + 4n_4) / (4N_D) = (2.15E–2)/d.

Calculation of failure probabilities:

Q_1^(4) = (1 − β)Q_t = 1.25E–2;
Q_2^(4) = (β(1 − γ)/3)Q_t = 1.68E–3;
Q_3^(4) = (βγ(1 − δ)/3)Q_t = 9.9E–4;
Q_4^(4) = βγδQ_t = 9.9E–4.
Alpha-factor Model
The alpha-factor model explained by Mosleh [17] defines CCF probabilities from a set of failure frequency ratios and the total component failure frequency, Qt. In terms of the basic event probabilities, the alpha-factor parameters are defined as
α_k^(m) = C(m,k) Q_k^(m) / Σ_{k=1}^{m} C(m,k) Q_k^(m),     (9.21)

where C(m,k) Q_k^(m) is the frequency of events involving k component failures in a common-cause group of m components, and the denominator is the sum of such frequencies. In other words, α_k^(m) is the ratio of the probability of failure events involving any k components to the total probability of all failure events in a group of m components, and Σ_k α_k^(m) = 1. The basic event probabilities can be expressed in terms of Q_t and the alpha-factors as follows:
Q_k^(m) = (k / C(m−1, k−1)) (α_k^(m) / α_t) Q_t,   where α_t = Σ_{k=1}^{m} k·α_k^(m).     (9.22)
CCBE probabilities for three components:

Q_1 = (α_1/α_t)Q_t;  Q_2 = (α_2/α_t)Q_t;  Q_3 = (3α_3/α_t)Q_t.
CCBE probabilities for four components:

Q_1 = (α_1/α_t)Q_t;  Q_2 = (2α_2/3α_t)Q_t;  Q_3 = (α_3/α_t)Q_t;  Q_4 = (4α_4/α_t)Q_t.
Estimators for the alpha-factor model parameters. An estimator for each of the alpha-factor parameters (α_k) can be based on its definition as the fraction of total failure events that involve k component failures due to common cause. Therefore, for a system of m redundant components,

α_k = n_k / Σ_{k=1}^{m} n_k.
Example 8 Assuming an alpha-factor model, calculate the CCF probabilities for Example 7.
Solution:
Parameter estimation:

α_1 = n_1 / Σ_{k=1}^{4} n_k = 50/65 = 0.769;
α_2 = n_2 / Σ_{k=1}^{4} n_k = 0.154;
α_3 = n_3 / Σ_{k=1}^{4} n_k = 0.062;
α_4 = n_4 / Σ_{k=1}^{4} n_k = 0.015.

Calculation of failure probabilities:

Q_2^(4) = (2/3)(α_2/α_t)Q_t, with Q_t = 2.15E–2 and α_t = Σ_{k=1}^{4} k·α_k = 1.325;

Q_2^(4) = 1.67E–3/d.
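Equation 9.22 applies uniformly to all k, so all four CCBE probabilities can be generated at once; a sketch (full precision, so α_t comes out as about 1.323 rather than the rounded 1.325):

```python
from math import comb

# Alpha-factor estimation for the same data as Example 7
n = {1: 50, 2: 10, 3: 4, 4: 1}
nd, m = 1000, 4

total = sum(n.values())                    # 65 events in all
alpha = {k: n[k] / total for k in n}       # alpha_k = n_k / sum(n_k)
alpha_t = sum(k * alpha[k] for k in n)     # ~1.32

q_t = sum(k * n[k] for k in n) / (m * nd)  # 2.15e-2 per demand
# Eq. 9.22: Q_k = (k / C(m-1, k-1)) * (alpha_k / alpha_t) * Q_t
q = {k: k / comb(m - 1, k - 1) * alpha[k] / alpha_t * q_t for k in n}

print(alpha, q)
```

Q_2^(4) comes out near 1.67E–3, matching the hand calculation above.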
Extended Beta-factor Model
The extended (checklist) beta-factor model is based on that developed by R. A. Humphreys. The method involves scoring the system design against eight criteria, namely (i) separation, (ii) similarity, (iii) design complexity, (iv) analysis, (v) procedures, (vi) training, (vii) environmental control, and (viii) environmental tests. The total score is then converted, on a regression basis, into a beta-value. This model reflects levels of redundancy by defining generic beta-factors for different levels of redundancy. For instance, redundancy level 2 or 3 uses β_1, level 4 or 5 uses β_2, and levels above 5 use β_3, with β_1 > β_2 > β_3. These beta-factors are selected based on comparison between the failure probability of redundant components as obtained by this method and by the more precise MGL method.
9.7.4.2 Shock Models
Shock models are CCF models that estimate multiple failure probabilities by postulating a shock that impacts the system at a certain frequency and causes multiple failures. One such example is explained below.
Binomial Failure Rate Model
The binomial failure rate (BFR) model considers two types of failures. The first represents independent component failures; the second type is caused by shocks that can result in failure of any number of components in the system. According to this model, there are two types of shocks: lethal and non-lethal. When a non-lethal shock occurs, each component within the common-cause component group is assumed to have a constant and independent probability of failure. The name of this model arises from the fact that, for a group of components, the distribution of the number of failed components resulting from each non-lethal shock occurrence follows a binomial distribution. The BFR model is, therefore, more restrictive because of these assumptions than the other multi-parameter models presented here. When a lethal shock occurs, all components are assumed to fail with a conditional probability of unity. Application of the BFR model with lethal shocks requires the use of the following set of parameters:

Q_I  independent failure frequency for each component;
μ    frequency of occurrence of non-lethal shocks;
p    conditional probability of failure of each component, given a non-lethal shock;
ω    frequency of occurrence of lethal shocks;
m    total number of components in the common-cause group.
Thus, the frequency of basic events involving k specific components is given as

Q_1 = Q_I + μp(1 − p)^{m−1},   k = 1,
Q_k = μp^k(1 − p)^{m−k},       2 ≤ k < m,
Q_m = μp^m + ω,                k = m.     (9.23)
Estimators for Binomial Failure Rate Model
The main parameters of the BFR model are Q_I, λ_t, ω, and p. Let λ_t be the rate of non-lethal shocks that cause at least one component failure, and

n_t = Σ_{k=1}^{m} n_k,     (9.24)

where n_k is the number of basic events involving k components, n_L is the number of lethal shocks, and n_I is the number of individual component failures, excluding those due to lethal and non-lethal shocks. Then

Q_I = n_I / (m·N_D),     (9.25)

λ_t = n_t / N_D,

ω = n_L / N_D,

and p is the solution of

Σ_{k=1}^{m} k·n_k = m·n_t·p / (1 − (1 − p)^m).     (9.26)

An estimator for μ can be obtained from the above estimators:

μ = λ_t / (1 − (1 − p)^m).     (9.27)
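Equation 9.26 has no closed-form solution for p, but its left-hand side is monotone in p, so a simple bisection suffices. A sketch (the function name is illustrative, and the event counts below reuse the Example 7 data purely for illustration, not from the text):

```python
def solve_p(n, m, tol=1e-10):
    """Solve Eq. 9.26 for p by bisection.
    n is {k: n_k}; finds p with sum(k*n_k) = m*n_t*p / (1 - (1-p)**m)."""
    n_t = sum(n.values())
    target = sum(k * nk for k, nk in n.items())
    f = lambda p: m * n_t * p / (1 - (1 - p) ** m) - target
    lo, hi = 1e-9, 1.0          # f(lo) <= 0 <= f(hi) since n_t <= target <= m*n_t
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p = solve_p({1: 50, 2: 10, 3: 4, 4: 1}, m=4)
mu = (65 / 1000) / (1 - (1 - p) ** 4)   # Eq. 9.27 with lambda_t = n_t/N_D
print(p, mu)
```

For these counts p comes out near 0.18, i.e., a non-lethal shock fails each component with probability of roughly one in five.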
Example 9 A power supply system (shown in Figure 3.43) consists of four uninterruptible power supplies (UPSs) and three bus bars. The power supply at any of the three buses is sufficient for feeding the loads. UPS4 is the standby for any failed UPS. Considering CCBEs for both UPSs and buses, develop the fault tree and calculate the system failure probability from the following information. The total unavailability of each UPS and each bus is 5.5E–4 and 1.6E–6, respectively. Assume the alpha-factors for the buses are α_1 = 0.95, α_2 = 0.04, and α_3 = 0.01, and for the UPSs α_1 = 0.95, α_2 = 0.035, α_3 = 0.01, and α_4 = 0.005. Calculate the CCBE probabilities and the unavailability of the system.
Figure 9.16 Fault tree of MCPS with CCF
Solution: The fault tree for the given system is shown in Figure 9.16. Minimal cut sets are shown in Table 9.8. There are 43 cut sets. Assuming symmetry, and as the components are identical, the following notation is used:
For UPSs: U1I = U2I = U3I = U4I = P_1; U12 = U13 = U14 = U23 = U24 = U34 = P_2; U123 = U124 = U134 = U234 = P_3; U1234 = P_4.
For buses: F2I = F4I = F6I = Q_1; F24 = F26 = F46 = Q_2; F246 = Q_3.
Table 9.8 Minimal cut sets of power supply system

No. Cut set      No. Cut set
1   U123         23  U24.U13
2   U124         24  U24.U34
3   U134         25  U2I.U34
4   U234         26  U12.U34
5   U1234        27  U13.U2I
6   F24          28  U12.U3I
7   F246         29  U23.U4I
8   F26          30  U23.U14
9   F46          31  U23.U24
10  U1I.U24      32  U23.U34
11  U13.U23      33  U12.U13
12  U12.U4I      34  U12.U23
13  U12.U14      35  F2I.F4I
14  U12.U24      36  U14.U2I
15  U14.U3I      37  F2I.F6I
16  U14.U34      38  U14.U24
17  U1I.U34      39  F4I.F6I
18  U1I.U23      40  U1I.U2I.U4I
19  U13.U4I      41  U2I.U3I.U4I
20  U13.U14      42  U1I.U2I.U3I
21  U13.U34      43  U1I.U3I.U4I
22  U24.U3I
The probability over the sum of the minimal cut sets, using the rare-event approximation, simplifies to

P(T) = 4P_1^3 + 12P_1·P_2 + 15P_2^2 + 4P_3 + P_4 + 3Q_1^2 + 3Q_2 + Q_3.
CCBE probabilities for buses:

α_t = Σ_{k=1}^{3} k·α_k = α_1 + 2α_2 + 3α_3 = 1.06,

Q_1 = (α_1/α_t)Q_t = 1.43E–6;  Q_2 = (α_2/α_t)Q_t = 6.037E–8;  Q_3 = (3α_3/α_t)Q_t = 4.53E–8.

CCBE probabilities for UPSs:

α_t = Σ_{k=1}^{4} k·α_k = α_1 + 2α_2 + 3α_3 + 4α_4 = 1.07,

P_1 = 4.883E–4;  P_2 = 3.59E–5;  P_3 = 5.14E–6;  P_4 = 1.028E–5.
Substituting all these CCBE probabilities in P(T), the unavailability of the system is obtained as 3.129E–5.
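As a numerical cross-check of the cut-set sum, using the CCBE values quoted in the solution (each coefficient counts the cut sets of that form in Table 9.8):

```python
# Rare-event evaluation of the 43 minimal cut sets (Example 9)
p1, p2, p3, p4 = 4.883e-4, 3.59e-5, 5.14e-6, 1.028e-5   # UPS CCBEs
q1, q2, q3 = 1.43e-6, 6.037e-8, 4.53e-8                 # bus CCBEs

p_t = (4 * p1**3        # 4 triple-independent UPS cut sets
       + 12 * p1 * p2   # 12 independent-plus-double-CCF pairs
       + 15 * p2**2     # 15 pairs of double-CCF events
       + 4 * p3 + p4    # UPS triple and quadruple CCF events
       + 3 * q1**2      # 3 independent bus pairs
       + 3 * q2 + q3)   # bus double and triple CCF events

print(f"system unavailability = {p_t:.3e}")  # ~3.13e-05
```

The UPS triple and quadruple CCF events dominate, showing again that higher-order CCBEs control the unavailability of highly redundant configurations.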
9.8 Human Reliability Analysis
Failure causes of complex engineering systems include hardware failures, software failures, and human errors. Traditional risk/reliability studies assumed that the majority of system failures were due to hardware failures, but accident history shows that human error causes 20–90% of all major system failures [9]. For example, in the aerospace industry, 90% of accidents were due to human error, whereas in NPPs the figure was 46%. Hence, human interactions (HIs) should be considered with special emphasis in PSA. Human reliability analysis (HRA) is used to model the influence of human error and HI with the system in the calculation of system risk/reliability. The main objective of HRA is to ensure that the key HIs are systematically incorporated into the assessment, in order to identify which of these actions are dominant risk contributors that should be carefully attended to in order to reduce the human-error probability (HEP) in operator actions. This section addresses the key issues involved in the incorporation of HRA into PSA.
9.8.1 Human Behavior and Errors
A successful mission in engineering systems involves human actions in system design, operation, and maintenance. All these human actions are considered in HRA to identify and quantify the important interactions. The classification of HIs is a very important step in HRA. Due to the complex nature of human behavior, several classifications are available in the literature; some of the relevant ones are mentioned here.
A classification or taxonomy of human errors aids human-error identification and analysis. A typical taxonomy for the observable manifestation of human error (external error mode) and the psychological error mechanism would be as follows:
• Error of omission occurs when an operator omits a step in a task or the entire task, amounting to an unintended or unnoticed inaction.
• Error of commission occurs when the person does the task but does it incorrectly, amounting to an unintended action (excluding inaction).
• Psychological error is the operator’s internal failure mode such as attention failure, perception failure, memory failure, etc.
Rasmussen’s stepladder model [18] provides a generally accepted framework to identify different types of behavior and associated error mechanisms (Figure 9.17). Human actions can be grouped into the following types based on work complexity:
• Skill based: highly practiced activities that can be performed with little apparent thought.
• Rule based: performance of tasks, as per the procedures, within the normal experience and ability of the particular individual.
• Knowledge based: performance of tasks in unforeseen situations where familiar patterns and rules cannot be applied directly, as symptoms could be ambiguous, the state of the plant may be complicated by multiple failures, an instrument may not give a true representation of the situation, etc. A high level of cognitive processing is necessary.
This model is based on the assumption that humans will generally perform tasks at the lowest level possible to minimize the amount of decision-making or cognitive thought required. Skill-based tasks require little or no decision-making, and hence a task proceeds directly from the initial stimuli, i.e., activation, to the execution stage. Rule-based tasks require some decision-making and hence move first from the initial stimuli to the integration phase, where information is processed, and only then is an appropriate procedure or rule selected. Finally, the selected procedure or rule is executed. Knowledge-based tasks require the highest degree of decision-making; this leads to interpretation of information and then to its evaluation before appropriate procedures can be selected and tasks executed.
Figure 9.17 Rasmussen’s decision-making model
The error mechanisms associated with Rasmussen’s model of human behavior are of two kinds, slips and mistakes:
• A slip is an error in implementing a set goal – the intention is correct but the execution fails, e.g., a failure to open a valve. A type of slip is a lapse, an omission, e.g., forgetting to open a valve.
• A mistake occurs when a correct and necessary action is performed on the wrong system (wrong system is selected) or an erroneous action (wrong action is selected) is performed on the right system.
9.8.2 Categorization of Human Interactions in Probabilistic Safety Assessment
Three categories of HIs can be defined to facilitate the incorporation of HRA into the PSA structure. The three categories are as follows [19].
9.8.2.1 Category A: Pre-initiators
Pre-initiators consist of those HIs associated with maintenance, testing, and calibration which, on account of errors made during their performance, can cause equipment/systems to become unavailable when required post-fault. System availability could be degraded because the HIs may cause failure of a component/component group or may leave components in an inoperable state. Especially important are errors that result in concurrent failure of multiple channels of safety-related systems. This unavailability is added to other failure contributions for components or systems at the fault tree level. In these HIs, the time available for action is not a major constraint, i.e., time-related stress is not a significant influential factor. Further, pre-initiator errors usually occur prior to an IE and can remain latent. Recovery action for such human errors could follow an error alarm, post-maintenance testing, or post-maintenance inspection with checks, and may be modeled as applicable at the quantification stage.
9.8.2.2 Category B: Initiators
Initiators are those HIs that contribute to IEs or plant transients. They are usually implicit in the selection of IEs and contribute to the total IE frequency. Errors in these actions, either by themselves or in combination with other failures (other than human errors), can cause IEs. Most important are errors that not only precipitate an accident sequence but also concurrently cause failure of safety or safety-related systems. Such “common-cause initiators” need special emphasis in HRA.
9.8.2.3 Category C: Post-initiators
These are post-incident HIs comprising the actions performed after an IE, with the intent to bring the plant to a safe state. Errors in these interactions can exacerbate the fault sequence. These HIs can be separated into three different types:
1. Procedural safety actions. These actions involve success/failure in following established procedures in response to an accident sequence and are incorporated explicitly into event trees. These include emergency operating procedure responses and other manual reinforcement actions.
2. Aggravating actions/errors. These actions/errors occur post-fault, following an IE, and significantly exacerbate the accident progression. They are the most difficult to identify and model. One type occurs when the operator’s mental image of the plant differs from the actual plant state, causing the operator to perform the supposedly right action for an event that has been wrongly interpreted. Such an error also occurs when the operator correctly diagnoses an event but adopts a less than optimal strategy for dealing with it. Once the actions and their significant consequences are identified, they can be incorporated explicitly into the event/fault tree.
3. Improvisations and recovery/repair actions. These consist of recovery actions, which are included in accident sequences that would otherwise dominate risk profiles. They may include the recovery of previously unavailable equipment or the use of non-standard procedures (improvisations) to mitigate accident conditions. These can be incorporated into the PSA as recovery actions in the accident sequence event trees.
Some diagnosis is required for type 1, type 2, and type 3 actions, and time is usually a limiting factor. The general approach for dealing with type 1 and type 3 actions is the same, and the two can be treated as a single category. For convenience, type 2 actions are also included in this category, although specific measures are usually outlined for dealing with them.
9.8.3 Steps in Human Reliability Analysis
The stepwise structured process of HRA based on the systematic human action reliability procedure (SHARP) is presented below [20].
9.8.3.1 Definition
The task needs to be defined to ensure that all relevant human actions are adequately considered in the study. This also clarifies the boundaries of the study and the interface with other assessments being performed. The scope and objective determine the tasks, and analysis of the tasks helps in identifying potential human errors. For completeness of coverage and modeling, the HIs identified are grouped into the three categories discussed in the previous section for appropriate modeling in HRA, depending on their timing and impact on plant operation.
9.8.3.2 Screening
This step identifies the human actions that are significant to the operation and safety of the plant. Screening can be carried out by the analyst using a combination of qualitative and quantitative approaches, with appropriate justification.
9.8.3.3 Qualitative Analysis
A detailed description of important human actions is developed by defining the key influential factors necessary to complete the modeling. The potential for errors and the mechanisms for recovery from the identified errors should be determined or estimated. Recovery actions identified in emergency operating procedures, as well as those appearing in cut sets but not covered by an approved procedure, should be included. This will often require information additional to that collected during the initial task analysis. The above steps should have identified most of the specific constraints and performance-shaping factors (PSFs) associated with the overall task (e.g., time available, sequence of steps, specific context of the task). But there may be other factors that need to be considered, particularly relating to the individuals performing the tasks (experience, level of training, stress levels, etc.).
9.8.3.4 Representation and Model Integration
Once HIs are selected after screening and broken down into elements with detailed descriptions, the next task is to select and apply techniques for depicting important human actions in logic structures. There are various models advocated by experts: time-independent models include the technique for human error rate prediction (THERP); time-dependent models include human cognitive reliability (HCR) and the operator action tree (OAT). Some of these HRA models are discussed below.
Time-independent Models
THERP. In time-independent models, the time available to the operator is not a major constraint on action, i.e., the probability of the operator taking the action is not significantly altered by reducing or increasing the time available for action. Errors related to such cases usually (but not always) occur before the IE. The models are therefore also referred to as latent-error models. For modeling of such errors, THERP is used.
THERP is somewhat analogous to hardware reliability analysis with human actions substituted for component outputs. In THERP, tasks and task steps are identified along with the PSFs that influence the steps. The task failure event is modeled
in what is called an HRA event tree (to distinguish it from a PRA event tree). The HRA event tree (Figure 9.18) structures the activities, potential failures, and dependencies (redundancies) in HIs in failure logic and includes a failure probability at the end of each failure path. Diagnosis in THERP is considered to be a holistic process and is assigned a single HEP value. HEP data are taken from the THERP Handbook.
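The quantification of a simple series-task HRA event tree such as the calibration task of Figure 9.18 can be sketched as follows. The HEP values used here are hypothetical illustrations, not data from the THERP Handbook, and no recovery or dependence between steps is credited.

```python
# Minimal sketch of quantifying a two-step THERP-style HRA event tree
# (cf. Figure 9.18). HEP values are hypothetical, steps assumed independent.

def hra_event_tree_failure(heps):
    """Probability that at least one step of a series task fails.

    heps: basic human error probabilities, one per task step.
    """
    p_success = 1.0
    for hep in heps:
        p_success *= (1.0 - hep)  # complete success path: every step succeeds
    return 1.0 - p_success

# A = failure to set up test equipment, B = failure to calibrate (hypothetical)
p_A, p_B = 0.01, 0.005
print(hra_event_tree_failure([p_A, p_B]))
```

In a full THERP analysis, dependence between steps and recovery factors would modify the branch probabilities before they are combined.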
Accident sequence evaluation program (ASEP) HRA procedure. In complex systems like NPPs, HRA can be an involved and time-consuming process. THERP was therefore expanded into a more cost-effective three-stage HRA procedure called ASEP for application to PSA. The three stages are:

1. screening HRA, using the screening HEP assignment methodology of ASEP;
2. nominal HRA, using the nominal HEP assignment methodology of ASEP for those tasks whose estimated HEPs are greater than the screening limit;
3. THERP HRA methodology, applied to those tasks whose HEPs are greater than both the screening HRA and the nominal HRA limit values.
Figure 9.18 HRA event tree for calibration task
The ASEP HRA procedure includes time reliability curves for both the screening model and the nominal diagnosis model. Details of the ASEP HRA procedure are given in [21].
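The three-stage cascade described above can be sketched as a routing decision. The limit values below are hypothetical placeholders for illustration only; actual screening and nominal limits come from the ASEP procedure itself.

```python
# Sketch of routing a task through the three ASEP stages.
# SCREENING_LIMIT and NOMINAL_LIMIT are hypothetical placeholders,
# not ASEP values.

SCREENING_LIMIT = 0.05
NOMINAL_LIMIT = 0.01

def asep_stage(screening_hep, nominal_hep=None):
    """Decide how far a task must be carried through the ASEP procedure."""
    if screening_hep <= SCREENING_LIMIT:
        return "screened out (screening HEP acceptable)"
    if nominal_hep is not None and nominal_hep <= NOMINAL_LIMIT:
        return "resolved by nominal HRA"
    return "full THERP HRA required"

print(asep_stage(0.02))        # passes the screening stage
print(asep_stage(0.2, 0.005))  # nominal HRA sufficient
print(asep_stage(0.2, 0.05))   # escalates to full THERP
```

The cost saving comes from the ordering: most tasks are dispatched by the cheap screening stage, and only the remainder receive a full THERP treatment.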
Time-dependent Models
In these models, the time available to the operator for action is, in many cases, a major constraint on the operator's ability to act. Most of the time-dependent models are based on a time reliability correlation, which allows an engineering-oriented quantification of human reliability in terms of HEPs. Examples of such
(Legend for Figure 9.18: A = failure to set up the test equipment properly; B = failure to calibrate the instrument correctly; a and b denote the corresponding success branches.)
models are HCR and OAT. HCR, which allows practical handling of significant HIs, uses a normalized three-parameter Weibull distribution to represent the correlation between the time available for response and the probability of failure to respond. OAT (Figure 9.19) is a representation that identifies alternative actions on the basis of operator interpretations associated with observation, diagnosis, and selection of response. The analyst can display the potential of different decision strategies to affect the accident sequence.
Figure 9.19 OAT model (branch points: event occurs; operator detects alarm; operator diagnoses problem; operator responds properly; each branch terminates in success S or failure F)
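The success path of the OAT of Figure 9.19 requires detection, diagnosis, and proper response in sequence, so the overall non-response probability can be sketched as the complement of that chain. The per-stage probabilities below are hypothetical and, for simplicity, the stages are treated as independent.

```python
# Sketch of the OAT logic of Figure 9.19: the operator succeeds only if
# the alarm is detected AND the problem is diagnosed AND the response is
# proper. Stage probabilities are hypothetical and assumed independent.

def oat_failure(p_detect, p_diagnose, p_respond):
    """Overall non-response probability for the operator action tree."""
    return 1.0 - p_detect * p_diagnose * p_respond

print(oat_failure(0.99, 0.95, 0.98))
```

Each failure branch of the tree (missed alarm, wrong diagnosis, improper response) contributes to this total, which is why displaying the branches separately helps the analyst compare decision strategies.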
The HCR model. This is one of the models used in the SHARP technique. The HCR model has been developed for quantification of control-room crew success/failure probability as a function of time, allowing for skill-, rule-, and knowledge-type behavior that can result in different probabilities. The model also allows for selected PSFs that can influence crew response times. An assumption made is that the probability distribution for crews responding to a plant event is a function of normalized time (actual crew response time to the event divided by the median response time for a number of crews). It depends on the behavior involved (skill, rule, or knowledge). PSFs affect the response probability by changing the median response time. The PSFs considered in the use of the HCR model are operating experience, stress, and quality of the human–machine interaction.
The HCR correlation is given below; the model relates the non-response probability P(t) to the normalized time t/T1/2:

P(t) = exp{ -[ ((t/T1/2) - Bi) / Ai ]^Ci } ,  (9.28)
where t is the time available to complete the action or set of actions following a stimulus, and T1/2 is the estimated median time to complete the task (action or set
of actions) as adjusted by specific PSFs. This is arrived at on the basis of an analysis of simulator data for similar plants or on the basis of discussions with crews. Ai, Bi, Ci are the correlation coefficients specified for skill-, rule-, and knowledge-based processing (Table 9.9).
This model involves the four steps given below:
1. Determine the cognitive process (skill, rule, or knowledge) applicable to the HI involved.
2. Estimate the time window by thermal-hydraulic/transient analysis.
3. Estimate the median time, reflecting key plant- and task-specific PSFs.
4. Estimate the crew response probability using the HCR correlation.
Simulator data have been used to examine the validity of the HCR model. EPRI’s Operator Reliability Experiment Project also examined the validity of the HCR model and arrived at the conclusion that the operator response time can be well represented by a lognormal probability distribution, which provides as good a fit as the Weibull and is easier to use.
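Following the observation that crew response times are well represented by a lognormal distribution, a lognormal time reliability correlation can be sketched as P(t) = 1 − Φ(ln(t/T1/2)/σ), where Φ is the standard normal CDF. The log-scale spread σ below is a hypothetical illustration, not a value from the EPRI study.

```python
import math

# Sketch of a lognormal time reliability correlation: the non-response
# probability is P(response time > t) for lognormally distributed
# response times with median t_median. sigma is hypothetical.

def lognormal_nonresponse(t, t_median, sigma=0.5):
    """P(crew has not responded by time t)."""
    z = math.log(t / t_median) / sigma
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    return 1.0 - phi

print(lognormal_nonresponse(10.0, 5.0))  # twice the median time available
```

By construction the non-response probability is exactly 0.5 at the median time, and it is the ease of fitting this two-parameter form to simulator data that makes the lognormal attractive relative to the three-parameter Weibull.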
HCR interim parameters. The analyst should use his/her judgment on the applicability of any model to the specific tasks in the plant. Model integration is done to describe how the significant human actions are integrated into the plant and system models of the PSA, at either the fault tree or the event tree stage.
Table 9.9 Correlation coefficients
Cognitive processing type   Ai      Bi     Ci
Skill                       0.407   0.7    1.2
Rule                        0.601   0.6    0.9
Knowledge                   0.791   0.5    0.8
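The HCR correlation of Equation 9.28, together with the coefficients of Table 9.9, can be evaluated directly. The sketch below implements the four steps for a given available time and PSF-adjusted median time; the example times are hypothetical.

```python
import math

# Sketch of the HCR correlation (Equation 9.28) using the correlation
# coefficients of Table 9.9. t is the time available for action and
# t_median the PSF-adjusted median response time T1/2.

HCR_COEFFS = {
    # behavior type: (Ai, Bi, Ci)
    "skill":     (0.407, 0.7, 1.2),
    "rule":      (0.601, 0.6, 0.9),
    "knowledge": (0.791, 0.5, 0.8),
}

def hcr_nonresponse(t, t_median, behavior="rule"):
    """Non-response probability P(t) from the HCR correlation."""
    a, b, c = HCR_COEFFS[behavior]
    x = t / t_median - b
    if x <= 0.0:  # within the initial delay Bi: no credit for response
        return 1.0
    return math.exp(-((x / a) ** c))

# Hypothetical example: skill-based action, twice the median time available
print(hcr_nonresponse(2.0, 1.0, "skill"))
```

As the coefficients imply, for the same normalized time a knowledge-based response carries a higher non-response probability than a skill-based one, reflecting the extra cognitive burden.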
9.8.3.5 Quantification
This step involves the use of appropriate data or quantification methods to assign probabilities to the various actions examined, determining sensitivities and establishing uncertainty ranges. Human-error data collection and analysis are an important aspect of assuring quality in quantification.
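A common way to express an HEP uncertainty range, in the style of THERP, is a median value with a lognormal error factor EF, the 5th and 95th percentiles being median/EF and median×EF. The numerical values below are hypothetical illustrations.

```python
# Sketch of a THERP-style HEP uncertainty range: median HEP with a
# lognormal error factor EF giving 5th/95th percentile bounds.
# The median and EF used here are hypothetical.

def hep_bounds(median_hep, error_factor):
    """Return (5th percentile, median, 95th percentile) of the HEP."""
    return (median_hep / error_factor, median_hep, median_hep * error_factor)

lo, med, hi = hep_bounds(3e-3, 5.0)
print(lo, med, hi)
```

Ranges of this form feed directly into the sensitivity and uncertainty analyses mentioned above, propagating through the event and fault tree quantification.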
References

1. Kaplan S, Garrick BJ (1981) On the quantitative definition of risk. Risk Analysis 1: 11–37
2. NASA (2002) Probabilistic risk assessment – procedures guide for NASA managers and practitioners. Version 1.1, NASA report
3. IAEA (1992) Procedure for conducting probabilistic safety assessment of nuclear power plants (level 1). Safety series no. 50-P-4, International Atomic Energy Agency, Vienna
4. AERB (2002) Probabilistic safety assessment guidelines. AERB safety guide, AERB, Mumbai
5. USNRC (1975) Reactor safety study. WASH-1400, NUREG-75/014, United States Nuclear Regulatory Commission
6. Vose D (2000) Risk analysis – a quantitative guide. John Wiley & Sons, New York
7. Bedford T, Cooke R (2001) Probabilistic risk analysis: foundations and methods. Cambridge University Press, London
8. Fullwood RR (2000) Probabilistic safety assessment in the chemical and nuclear industries. Butterworth-Heinemann
9. Stewart MG (1997) Probabilistic risk assessment of engineering systems. Chapman & Hall
10. BARC (1996) Level 1 PSA report. Bhabha Atomic Research Centre report, Mumbai
11. Borgonovo E, Apostolakis GE (2001) A new importance measure for risk-informed decision making. Reliability Engineering and System Safety 72: 193–212
12. Mosleh A et al (1988) Procedures for treating common cause failures in safety and reliability studies. US Nuclear Regulatory Commission and Electric Power Research Institute, NUREG/CR-4780 and EPRI NP-5613, vols 1 and 2
13. USNRC (1993) Procedure for analysis of common-cause failures in probabilistic safety analysis. NUREG/CR-5801 (SAND91-7087)
14. USNRC (2007) Common-cause failure database and analysis system: event data collection, classification, and coding. NUREG/CR-6268
15. Fleming KN (1975) A reliability model for common mode failure in redundant safety systems. General Atomic report GA-A13284
16. Fleming KN, Kalinowski AM (1983) An extension of the beta factor method to systems with high levels of redundancy. Pickard, Lowe and Garrick, Inc., PLG-0289
17. Mosleh A, Siu NO (1987) A multi-parameter, event-based common cause failure model. Proceedings of the Ninth International Conference on Structural Mechanics in Reactor Technology, Lausanne, Switzerland
18. Rasmussen J (1979) On the structure of knowledge – a morphology of mental models in a man-machine context. RISO-M-2192, RISO National Laboratory, Roskilde, Denmark
19. IAEA (1995) Human reliability analysis in PSA for nuclear power plants. Safety series no. 50-P-10, IAEA, Vienna
20. Hannaman GW, Spungin AJ (1984) Systematic human action reliability procedure (SHARP). EPRI NP-3583
21. Subramaniam K, Saraf RK, Sanyasirao VVS, Venkatraj V (2000) A perspective on human reliability analysis and studies on the application of HRA to Indian pressurised heavy water reactors. Report BARC/2000/E-013