Date post: | 20-Jan-2015 |
Category: |
Health & Medicine |
Upload: | yoshio-sakai |
An Extended Notation of FTA for Risk Assessment of Software-intensive Medical Devices.
- Recognition of The Risk Class Before and After The Risk Control Measure -
Yoshio SAKAI, Engineering Promotion Center, NIHON KOHDEN CORPORATION
Seiko SHIRASAKA, The Graduate School of System Design and Management, KEIO University
Yasuharu NISHI, Department of Systems Engineering, The University of Electro-Communications
[email protected] 24th ISSRE / 1st MedSRDR 2013 Nov 7, 2013
Flow of the Presentation
1. Explanation of the traditional FTA, which lacks consideration of software.
2. Explanation of the risk assessment method in ISO 14971, which lacks consideration of software.
3. Explanation of a solution using an extended notation of FTA.
1. Traditional FTA (OLD)
2. Risk Assessment Method in ISO 14971 (OLD)
3. An Extended Notation of FTA (NEW)
The old methods lack consideration of software failure.
[Figure: ISO 14971 risk model for software-intensive devices. A sequence of events leads from a hazard to a hazardous situation (exposure, P1) and then to harm (P2). Risk combines the severity of the harm with the probability of occurrence of harm, P1 × P2.]
The History of FTA (Fault Tree Analysis)
Fault Tree Analysis (FTA) was originally developed in 1962 at Bell Laboratories by H.A. Watson for the Minuteman missile. FTA was designed because the electronic system could not endure vibration and broke down.
Boeing improved the completeness of FTA.
1962
1965
NOW: FTA is widely used.
The cause of the trouble was hardware failure, not software.
The traditional FTA, which lacks consideration of software
• When FTA was developed, failures caused by software were not among the failures that FTA considered.
• The traditional FTA does not make clear
– the effectiveness of a risk control measure, that is, the risk before and after it.
– that the software in the system and in the risk control measure affects the top event.
• The failure rate calculation of FTA cannot be used for failures caused by software.
HARDWARE: ○
SOFTWARE: ×
The Traditional Risk Assessment Method
The example is water boiled in an electric kettle.
1. Hazard: the hot water, as thermal energy.
2. Hazardous situation: the cover opens and hot water spills.
3. Harm: getting burned.
Fig. 3. ISO 14971
P1 is the probability of a hazardous situation occurring. P2 is the probability of a hazardous situation leading to harm.
What about software?
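As a rough numeric illustration of the ISO 14971 model on this slide, the probability of occurrence of harm is the product P1 × P2. The function name and the kettle numbers below are hypothetical, chosen only for illustration; they come from neither the standard nor the slides.

```python
def probability_of_harm(p1: float, p2: float) -> float:
    """Probability of occurrence of harm = P1 x P2.

    p1: probability that the hazardous situation occurs
    p2: probability that the hazardous situation leads to harm
    """
    for name, p in (("p1", p1), ("p2", p2)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {p}")
    return p1 * p2


# Hypothetical kettle numbers (assumptions for illustration only):
# the cover opens and spills hot water in 1 of 1,000 uses (P1),
# and a spill burns the user in 1 of 10 cases (P2).
kettle_harm = probability_of_harm(p1=0.001, p2=0.1)
print(f"{kettle_harm:.6f}")
```

This calculation only makes sense when P1 and P2 can be estimated statistically, which is the point the slides go on to question for software.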
The Estimation of the probability of a hazardous situation
HARDWARE: the failure rate of random hardware failure.
USABILITY: the likelihood of the usability failure, on a scale from HIGH to LOW: Frequent, Probable, Occasional, Remote, Improbable. (Likelihood SOURCE: IEC 80001-2-1, Step by Step)
SOFTWARE:
• Software is invisible.
• Failures caused by software occur systematically, not statistically.
We cannot estimate the probability or the likelihood of a failure caused by software.
Feature of Systematic Failure
Systematic failure is unwanted behaviour which is
• repeatable
– if the conditions can be exactly replicated
• predictable (but not accurately)
– all systems have flaws
• indefensible
– it should not occur...
– … but it is extremely hard to prevent
The definition and explanation of Systematic Failure
NOTE 4: This International Standard
• sets requirements for the avoidance and control of systematic faults, which are based on experience and judgment from practical experience gained in industry. Even though the probability of occurrence of systematic failures cannot in general be quantified the standard does, however, allow a claim to be made, for a specified safety function, that the target failure measure associated with the safety function can be considered to be achieved if all the requirements in the standard have been met;
SOURCE: IEC 61508-3:2010
Systematic Failure: failure, related in a deterministic way to a certain cause, that can only be eliminated by a change of the design or of the manufacturing process, operational procedures, documentation or other relevant factors
SOURCE: ISO 26262-1:2011
Two types of evaluation of the hazard caused by Systematic Software Failure
1. The probability of such failure shall be assumed to be 100 percent. (IEC 62304:2006)
• The probability is 100%.
• This 100 percent principle was chosen to be conservative, but it is not practical in real applications.
2. If the hazard could arise from a failure of the software, the risk evaluation should address the following two concerns. (IEC 62304:2006 Amd.1, This Study)
• The 1st concern is the risk level, as the severity of the harm, before the risk control measures.
• The 2nd concern is the risk level, as the severity of the harm, after the risk control measures.
• The evaluation of the residual risk is important, but when the cause is software, the probability of occurrence of harm before the risk control measures is not.
The procedure of evaluation of the hazard caused by Systematic Software Failure
RISK → RISK CONTROL MEASURES → RESIDUAL RISK
If the hazardous situation is caused by Systematic Software Failure, the safety is affected by
• the hardware used as the risk control measure, and
• the reliability of the critical software component.
After the risk control measures, we have to evaluate the residual risk for safety.
The probability of occurrence of harm caused by the software before the risk control measures is not necessary for the risk assessment.
Method of evaluating Systematic Failure
Medical device manufacturers can evaluate the residual risk class after countermeasures from the following combination:
a. The severity of the residual risk
b. The reliability of the software items that could contribute to a hazardous situation
c. The safe architecture of the software system
These are not elements of probability.
Relation between the risk control measures and Architecture.
• Complicated Software Items (low cohesion and high coupling)
• Segregated Software Items (high cohesion and low coupling)
• Layered Architecture (3 layers: Presentation, Domain and Data Source)
• Result of continuous additions (a real software system)
[Figure: each structure is rated on a scale from "Clear" to "Not Clear".]
The Principles of Electrosurgical Knife
The mode, cut or coagulation, is switched by software.
Mode: Principles
Cut: For cutting, a continuous single frequency sine wave is often employed.
Coagulation: For coagulation, the average power is typically reduced below the threshold of cutting. Generally, the sine wave is turned on and off in rapid succession.
There are serious hazardous situations in the software system.
Electrosurgical Knife Block Diagram
The wave is controlled and switched by the software.
The most serious hazard is unintended hemorrhage caused by abnormal output of the electrosurgical knife.
Let's look at the fault tree analysis on the following slides.
[Figure: block diagram with two blocks marked "High Risk Software Component".]
Extended Notation of FTA (1)
FTA Example (each event is labeled with its risk class; indentation shows the inputs of each gate):
Abnormal Output of Electrosurgical Knife: Class A(C)s = OR (A(C)s, A(C)s)
  Abnormal Output caused by Hardware: Class A(C)s = AND (C, --Bs)
    Output Hardware Failure: Class C = OR (C, C, B)
      High-frequency Wave Failure: Class C
      Wave Circuit Failure: Class C
      Timer Failure: Class B
    Abnormal Monitoring Failure: Class Bs
  Unintended Output caused by Software: Class A(C)s = AND (Cs, --Bs)
    Cut/Coag Mode Mismatch: Class Cs
    Failure of the Abnormal Detection: Class Bs = AND (Bs, B)
      Abnormal Monitoring Failure: Class Bs
      A/D Convertor Failure: Class B
First level from the bottom, left side of the FTA example:
a. There are three hardware failures.
b. Each failure is classified by its risk class.
c. The three basic events are connected with an OR gate.
d. The OR function adopts the highest risk class.
Risk Class Definition (Source IEC 62304:2006)
Class A No injury or damage to health is possible
Class B Non-serious injury is possible
Class C Death or serious injury is possible
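The OR rule on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' tooling; the names `RISK_ORDER` and `or_gate` are hypothetical.

```python
# Risk classes in ascending order of severity (IEC 62304:2006).
RISK_ORDER = ("A", "B", "C")


def or_gate(*classes: str) -> str:
    """Return the highest risk class among the inputs of an OR gate."""
    if not classes:
        raise ValueError("an OR gate needs at least one input")
    for c in classes:
        if c not in RISK_ORDER:
            raise ValueError(f"unknown risk class: {c!r}")
    return max(classes, key=RISK_ORDER.index)


# Output Hardware Failure = OR (C, C, B), as on the slide.
output_hardware_failure = or_gate("C", "C", "B")
print(f"Class {output_hardware_failure}")  # prints "Class C"
```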
Extended Notation of FTA (2)
Second level from the bottom, left side of the FTA example:
a. The right basic event is an abnormal monitoring failure.
b. This event is caused by the software.
c. It is labeled Class Bs: risk Class B as the impact level, and "s" as the effect of the software.
d. The abnormal monitoring inhibits and controls the output hardware failure. This is indicated in the AND function as AND(C, --Bs). The number of minus marks shows the number of stages of inhibition. In this case, the risk control measure lowers the risk level by two stages, from C to A.
Class C → Class A: the risk control measure "--" (Class Bs) lowers the risk level by two stages.
Class A(C)s = AND (C, --Bs)
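The inhibiting AND function can be sketched as a small string-based helper. This is an illustration under assumptions: the function name `and_with_inhibit` is hypothetical, and the floor at Class A (the class never drops below A, however many minus marks are given) is an assumption not stated on the slides.

```python
RISK_ORDER = ("A", "B", "C")  # ascending severity, IEC 62304:2006


def and_with_inhibit(event: str, control: str) -> str:
    """Evaluate AND(event, control) where `control` inhibits `event`.

    event:   class of the inhibited event, e.g. "C" or "Cs"
    control: risk control measure, e.g. "--Bs"; each leading "-" lowers
             the risk class of `event` by one stage (assumed floor: Class A)
    Returns a label such as "A(C)s": the residual class, the class before
    the risk control measure in parentheses, and "s" if software is involved.
    """
    stages = len(control) - len(control.lstrip("-"))
    if stages == 0:
        raise ValueError("the control measure needs at least one '-' mark")
    before = event[:1]
    if before not in RISK_ORDER or control.lstrip("-")[:1] not in RISK_ORDER:
        raise ValueError("risk classes must be A, B or C")
    residual = RISK_ORDER[max(RISK_ORDER.index(before) - stages, 0)]
    software = event.endswith("s") or control.endswith("s")
    return f"{residual}({before}){'s' if software else ''}"


print(and_with_inhibit("C", "--Bs"))   # Abnormal Output caused by Hardware
print(and_with_inhibit("Cs", "--Bs"))  # Unintended Output caused by Software
```

Both calls yield the label `A(C)s`, matching the two AND functions in the FTA example.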
Extended Notation of FTA (3)
First level from the bottom, right side of the FTA example:
a. The abnormal monitoring failure is caused by the software.
b. The A/D convertor failure is caused by hardware.
c. If a basic event does not inhibit the other basic event, the AND function adopts the highest risk class. (This method is inspired by the notation of ASIL decomposition in ISO 26262-9.)
d. The subscript "s" is inherited through the function, from the left side to the right side, as the effect of the software on the system.
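A minimal sketch of rules c and d, assuming each event is written as a class letter with an optional "s" suffix. The function name `and_without_inhibit` is hypothetical.

```python
RISK_ORDER = ("A", "B", "C")  # ascending severity, IEC 62304:2006


def and_without_inhibit(*events: str) -> str:
    """AND gate whose inputs do not inhibit each other.

    Each input is a class letter with an optional "s" suffix, e.g. "Bs".
    The result takes the highest class and inherits "s" from any input.
    """
    if not events:
        raise ValueError("an AND gate needs at least one input")
    classes = [e.rstrip("s") for e in events]
    for c in classes:
        if c not in RISK_ORDER:
            raise ValueError(f"unknown risk class: {c!r}")
    highest = max(classes, key=RISK_ORDER.index)
    software = any(e.endswith("s") for e in events)
    return highest + ("s" if software else "")


# Failure of the Abnormal Detection = AND (Bs, B), as on the slide.
print(f"Class {and_without_inhibit('Bs', 'B')}")  # prints "Class Bs"
```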
Extended Notation of FTA (4)
Top level of the FTA example:
a. The OR function adopts the highest risk class. In this case, the risk classes are the same.
b. The risk class of the top event is finally expressed as Class A(C)s.
• This notation makes the following recognizable:
– The risk class of the residual risk is A.
– The highest risk class before the risk control measure is C.
– The software affects the top event or the risk control measure in the system.
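To check that the rules compose to Class A(C)s at the top event, the whole example tree can be evaluated with a small sketch. All names here (`Risk`, `basic`, `gate`, `inhibited`) are hypothetical, and two points are assumptions rather than statements from the slides: the residual class never drops below Class A, and the class "before" the measure is taken from the inhibited event.

```python
from dataclasses import dataclass
from typing import Iterable

RISK_ORDER = ("A", "B", "C")  # ascending severity, IEC 62304:2006


@dataclass(frozen=True)
class Risk:
    residual: str           # class after the risk control measures
    before: str             # highest class before the risk control measures
    software: bool = False  # the "s" mark: software affects this event

    def label(self) -> str:
        core = self.residual
        if self.before != self.residual:
            core += f"({self.before})"
        return f"Class {core}{'s' if self.software else ''}"


def basic(risk_class: str, software: bool = False) -> Risk:
    """A basic event: no risk control measure has been applied yet."""
    if risk_class not in RISK_ORDER:
        raise ValueError(f"unknown risk class: {risk_class!r}")
    return Risk(risk_class, risk_class, software)


def highest(classes: Iterable[str]) -> str:
    return max(classes, key=RISK_ORDER.index)


def gate(*inputs: Risk) -> Risk:
    """OR gate, or AND gate whose inputs do not inhibit each other."""
    return Risk(
        residual=highest(r.residual for r in inputs),
        before=highest(r.before for r in inputs),
        software=any(r.software for r in inputs),
    )


def inhibited(event: Risk, control: Risk, stages: int) -> Risk:
    """AND gate where `control` lowers `event` by `stages` minus marks."""
    index = max(RISK_ORDER.index(event.residual) - stages, 0)
    return Risk(RISK_ORDER[index], event.before,
                event.software or control.software)


# Left side: Abnormal Output caused by Hardware.
output_hw = gate(basic("C"), basic("C"), basic("B"))       # OR (C, C, B)
monitoring = basic("B", software=True)                     # Class Bs
by_hardware = inhibited(output_hw, monitoring, stages=2)   # AND (C, --Bs)

# Right side: Unintended Output caused by Software.
detection = gate(basic("B", software=True), basic("B"))    # AND (Bs, B)
mode_mismatch = basic("C", software=True)                  # Class Cs
by_software = inhibited(mode_mismatch, detection, stages=2)  # AND (Cs, --Bs)

top = gate(by_hardware, by_software)                       # OR (A(C)s, A(C)s)
print(top.label())  # prints "Class A(C)s"
```

Keeping `residual`, `before` and `software` as separate fields mirrors the three things the notation is meant to make visible: the residual risk class, the class before the risk control measure, and the involvement of software.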
Effectiveness of this Notation
This notation is effective in the following ways.
• Safety analysts can recognize
– the risk class before and after the risk control measure.
– that the software in the system and in the risk control measure affects the top event.
– the effect of the risk control, from the minus marks in the AND function.
• When an event in the fault tree carries the mark "s", safety analysts can find the starting point of the software's effect on system safety.
• When an event carries both the mark "s" and a minus mark, safety analysts can recognize the risk introduced by changing the software of the risk control measure.
Effectiveness of this Notation
[Figure: the FTA example with callouts. One callout marks the starting point of the software's effect on system safety. Two callouts mark the risk introduced by changing the software of the risk control measure.]
Attention!
• FTA is an excellent way to show the structure of the mechanism by which the top event, an "undesired state of the system", is generated.
• On the other hand, calculating failure rates on an FTA also has a dangerous side.
1. The evaluation of the residual risk is important.
2. We can evaluate the severity of the harm before and after the risk control measures. Therefore, we should focus on the architecture of the software system and the structure of the risk control measures.
Before Systematic Software Failure was recognized, the analysis of a radiation therapy machine named Therac-25 included the software in the fault trees but used a "generic failure rate" of 10^-4 for software events.
This number was justified based on the historical performance of the Therac-25 software. (Source: SAFEWARE by Prof. Nancy Leveson)
But now we understand the characteristics of software well and recognize that this is not realistic.
Thank you. I hope this notation will be used in the real development of medical devices.
REFERENCES
[1] Dolores R. Wallace, D. Richard Kuhn, "Failure Modes in Medical Device Software: An Analysis of 15 Years of Recall Data", 2001
[2] S. Shirasaka, Y. Sakai, Y. Nishi, "Feature Analysis of Estimated Causes of Failures in Medical Device Software and Proposal of Effective Measures", ISSRE 2011
[3] ISO 14971:2007 Medical devices - Application of risk management to medical devices
[4] ISO 26262-1:2011 Road vehicles - Functional safety - Part 1: Vocabulary
[5] IEC/TR 80001-2-1 Application of risk management for IT-networks incorporating medical devices – Part 2-1: Step-by-step risk management of medical IT-networks – Practical applications and examples
[6] IEC 62304:2006 Medical device software - Software life cycle processes
[7] Katerina Goseva-Popstojanova, Ahmed Hassan, Ajith Guedem, Walid Abdelmoez, Diaa Eldin M. Nassar, Hany Ammar, Ali Mili, "Architectural-Level Risk Analysis Using UML", IEEE Transactions on Software Engineering, Vol. 29, No. 10, October 2003
[8] Sherif M. Yacoub, Hany H. Ammar, "A Methodology for Architecture-Level Reliability Risk Analysis", IEEE Transactions on Software Engineering, Vol. 28, No. 6, June 2002
Therac-25 FTA
• The probability of the computer choosing the wrong energy was given as 10^-11.
• The probability of the computer choosing the wrong mode was given as 4×10^-9.
• A hardware safety device was removed for economic reasons.
• Systematic Software Failure had not been recognized.
• This number was justified based on the historical performance of the Therac-25 software.
The probability is 10^-11? The probability is 4×10^-9?
[Figure: the Therac-25 with its VT100 terminal and PDP-11 computer, and a fault tree fragment for "System outputs the wrong energy" containing the events "Computer chooses the wrong energy" (0.00000000001) and "Computer chooses the wrong mode" (0.000000004).]
IEC 80001-2-1 Figure 8
Work Sheet Example of Hazard Analysis
New Hazard Analysis of the real medical devices.
• "Probability" should be replaced with "Effect of Risk Control Measure" (e.g. Major/Moderate/Minor).
• If hardware faults and software errors occur in combination, we should separate the concerns: Hardware, Usability or Software.
• Add "Risk Control Measure Type of Concern": SOFTWARE, USABILITY, HARDWARE, COMBINATION of ・・・
• "Probability" should be replaced with "Probability or Likelihood or NA (Software)", where NA means Not Applicable.
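One way to picture the revised worksheet is as a row type that refuses a probability for software. This is a hypothetical sketch: the class and field names are not from IEC 80001-2-1 or the slides, and the elided "COMBINATION of ・・・" concern is deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Concern(Enum):
    HARDWARE = "hardware"
    USABILITY = "usability"
    SOFTWARE = "software"


@dataclass
class HazardRow:
    hazard: str
    concern: Concern
    estimate: Optional[str]  # probability or likelihood; None means NA
    control_effect: str      # effect of the risk control measure

    def __post_init__(self) -> None:
        if self.control_effect not in ("Major", "Moderate", "Minor"):
            raise ValueError("control_effect must be Major, Moderate or Minor")
        if self.concern is Concern.SOFTWARE and self.estimate is not None:
            raise ValueError("probability is not applicable (NA) for software")
        if self.concern is not Concern.SOFTWARE and self.estimate is None:
            raise ValueError("hardware and usability rows need an estimate")

    def estimate_column(self) -> str:
        """Text for the column that used to be called 'Probability'."""
        if self.concern is Concern.SOFTWARE:
            return "NA"
        kind = "Probability" if self.concern is Concern.HARDWARE else "Likelihood"
        return f"{kind}: {self.estimate}"


# Hypothetical rows, for illustration only.
software_row = HazardRow("Cut/Coag mode mismatch", Concern.SOFTWARE, None, "Major")
usability_row = HazardRow("Wrong mode selected by user", Concern.USABILITY, "Remote", "Moderate")
print(software_row.estimate_column())
print(usability_row.estimate_column())
```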
Separation of The Concern for the risk assessment
[Figure: the risk assessment separated into a 1st, 2nd and 3rd concern.]
HARDWARE: probability (statistical).
USABILITY: likelihood.
SOFTWARE: NA → the risk level, that is, the risk level before the risk control measures and the risk level after the risk control measures.
IEC 80001-2-1 Table D.3
Usability: likelihood applies (○). Software: likelihood does not apply (×).
If the hazardous situation is caused by software, we can estimate the risk level only as the severity of the harm after the risk control measures.
[Figure: a medical device system (hardware and software) passing through Requirements Analysis, Risk Assessment and Risk Reduction, with the labels User Needs, Intended Use, Hazard, Hazardous Situation & Harm, Software Architecture, Risk Control Measure and Residual Risk. Beside it, the ISO 14971 risk model: a sequence of events leads from a hazard to a hazardous situation (exposure, P1) and then to harm (P2); risk combines the severity of the harm with the probability of occurrence of harm, P1 × P2.]
Change the method of the risk assessment!
The important aspects
We should focus on the architecture of the software system and the structure of the risk control measures.
IEC 62304:2006 Amd1 CD 4.3 Software safety classification
This chart and our study use the same classification method.
The Types of Safety Design
[Figure: four types of safety design, Fault Avoidance, Fail Safe, Fault Tolerance and Error Proof (Fool Proof), contrasted as Total Optimization versus Specific Optimization, with the labels Usability, USER, Contrasting Method and Architecture.]
Specific optimization, as in the Fault Avoidance approach, is not realistic for large-scale, complicated software systems.
The total optimization approach is reasonable for today's medical device software.
Safety Design Method: Realization Technique
Fault Avoidance: Formal Method, High Coverage Testing
Fail Safe: Interlock, Lockout, Safeguard
Fault Tolerance: Space Tolerance (Main, Sub), Time Tolerance (1st, 2nd), Information Tolerance (Main Information, Error Correction)
Error Proof / Fool Proof: Easy Operation, Home button, Safety Label
ISO 26262-9 Figure 2 — ASIL decomposition schemes
• If a basic event does not inhibit the other basic event, the AND function adopts the highest risk class. (This method is inspired by the notation of ASIL decomposition in ISO 26262-9.)
An AND function with no inhibiting risk control element should select the maximum level among the failures, because the notation focuses on the risk class before and after the risk control measures.