Safety Critical Systems
A safety-critical computer system is a computer system whose failure may cause injury or death to human beings, or damage to the environment.

Examples:
- Aircraft control systems (fly-by-wire, ...)
- Nuclear power station control systems
- Control systems in cars (anti-lock brakes, ...)
- Health systems (heart pacemakers, ...)
- Railway control systems
- Communication systems
- Wireless Sensor Network applications?
What is Safety?
"The avoidance of death, injury or poor health to customers, employees, contractors and the general public; also avoidance of damage to property and the environment"

Safety is NOT an absolute quantity!

Safety is also defined as "freedom from unacceptable risk of harm".

A basic concept in System Safety Engineering is the avoidance of "hazards".
Safety vs. Security
These two concepts are often mixed up. In German, there is just one term for both!

Safety = the system does not cause harm
Security = the system is protected against attacks
SILs and Dangerous Failure Probability
Safety Integrity Level   High demand mode of operation (probability of dangerous failure per hour)
SIL 4                    10⁻⁹ ≤ P < 10⁻⁸
SIL 3                    10⁻⁸ ≤ P < 10⁻⁷
SIL 2                    10⁻⁷ ≤ P < 10⁻⁶
SIL 1                    10⁻⁶ ≤ P < 10⁻⁵
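The SIL bands above amount to a simple interval lookup; a minimal sketch in Python (function name and structure are illustrative, not from any standard or library):

```python
def sil_for_dangerous_failure_rate(p):
    """Map a dangerous-failure probability per hour (high demand mode)
    to a Safety Integrity Level per the table above."""
    bands = [
        (1e-9, 1e-8, 4),  # SIL 4: 10^-9 <= P < 10^-8
        (1e-8, 1e-7, 3),  # SIL 3
        (1e-7, 1e-6, 2),  # SIL 2
        (1e-6, 1e-5, 1),  # SIL 1
    ]
    for low, high, sil in bands:
        if low <= p < high:
            return sil
    return None  # outside the SIL 1..4 bands

print(sil_for_dangerous_failure_rate(5e-9))  # falls in the SIL 4 band
```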
Railway Signalling Systems
- Signalling and switching
- Axle counters
- Applications for ETCS

An incorrect output may lead to an incorrect signal, causing a major accident!
Safety Integrity Level 4 (highest)
(Old) Interlocking Systems
Mechanical / electromechanical systems
Signal Box / Interlocking Tower
Electric system with some electronics
Modern Signal Box / Interlocking Tower
Lots of electronics and computer systems
What is a Hazard?
- A physical condition of the platform that threatens the safety of personnel or the platform, i.e. can lead to an accident
- A condition of the platform that, unless mitigated, can develop into an accident through a sequence of normal events and actions
- "An accident waiting to happen"

Examples:
- Oil spilled on a staircase
- Failed train detection system at an automatic railway level crossing
- Loss of thrust control on a jet engine
- Loss of communication
- Distorted communication
- Undetectably incorrect output
Hazard Severity Level (Example)
Category      Id.   Definition
CATASTROPHIC  I     General: A hazard which may cause death, system loss, or severe property or environmental damage.
CRITICAL      II    General: A hazard which may cause severe injury, or major system, property or environmental damage.
MARGINAL      III   General: A hazard which may cause marginal injury, or marginal system, property or environmental damage.
NEGLIGIBLE    IV    General: A hazard which does not cause injury, or system, property or environmental damage.
Hazard Probability Level (Example)
Level        Probability [h⁻¹]     Definition                                                  Occurrences per year
Frequent     P ≥ 10⁻³              may occur several times a month                             more than 10
Probable     10⁻³ > P ≥ 10⁻⁴       likely to occur once a year                                 1 to 10
Occasional   10⁻⁴ > P ≥ 10⁻⁵       likely to occur in the life of the system                   10⁻¹ to 1
Remote       10⁻⁵ > P ≥ 10⁻⁶       unlikely but possible to occur in the life of the system    10⁻² to 10⁻¹
Improbable   10⁻⁶ > P ≥ 10⁻⁷       very unlikely to occur                                      10⁻³ to 10⁻²
Incredible   P < 10⁻⁷              extremely unlikely, if not inconceivable, to occur          less than 10⁻³
Risk Classification Scheme (Example)
Hazard Probability   CATASTROPHIC   CRITICAL   MARGINAL   NEGLIGIBLE
Frequent             A              A          A          B
Probable             A              A          B          C
Occasional           A              B          C          C
Remote               B              C          C          D
Improbable           C              C          D          D
Incredible           C              D          D          D
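The matrix above is a pure table lookup; a minimal Python sketch (the encoding and function name are illustrative):

```python
# Risk class by (probability level, severity category), per the matrix above.
SEVERITIES = ["CATASTROPHIC", "CRITICAL", "MARGINAL", "NEGLIGIBLE"]
RISK_MATRIX = {
    "Frequent":   ["A", "A", "A", "B"],
    "Probable":   ["A", "A", "B", "C"],
    "Occasional": ["A", "B", "C", "C"],
    "Remote":     ["B", "C", "C", "D"],
    "Improbable": ["C", "C", "D", "D"],
    "Incredible": ["C", "D", "D", "D"],
}

def risk_class(probability_level, severity):
    """Return the risk class (A..D) for a hazard."""
    return RISK_MATRIX[probability_level][SEVERITIES.index(severity)]

print(risk_class("Occasional", "CRITICAL"))  # prints "B"
```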
Risk Class Definition (Example)
Risk Class   Interpretation
A            Intolerable
B            Undesirable and shall only be accepted when risk reduction is impracticable.
C            Tolerable with the endorsement of the authority.
D            Tolerable with the endorsement of the normal project reviews.
Risk Acceptability

Having identified the level of risk for the product, we must determine how acceptable and tolerable that risk is to:
- Regulator / customer
- Society
- Operators

Decision criteria for risk acceptance / rejection:
- Absolute vs. relative risk (compare with previous, background)
- Risk-cost trade-offs
- Risk-benefit of technological options
Risk Tolerability
The severity and probability of a hazard together determine the risk. The risk is compared against the risk criteria: if it is tolerable, it is accepted; if not, risk reduction measures are applied and the risk is re-assessed.
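The assess-reduce-reassess loop can be sketched as follows (the numbers, measure names, and multiplicative reduction model are purely illustrative assumptions, not from the source):

```python
# Illustrative sketch of the risk-tolerability loop: keep applying
# risk reduction measures until the residual risk meets the criteria.
def reduce_until_tolerable(risk, tolerable, measures):
    """measures: list of (name, reduction_factor) pairs, applied in order."""
    applied = []
    for name, factor in measures:
        if risk <= tolerable:
            break  # risk already tolerable, stop reducing
        risk *= factor
        applied.append(name)
    return risk, applied

residual, applied = reduce_until_tolerable(
    risk=4e-4,                      # initial hazard rate per hour (made up)
    tolerable=1e-6,                 # acceptance criterion (made up)
    measures=[("interlock", 0.001), ("alarm", 0.1)],
)
print(residual, applied)  # only the interlock is needed here
```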
What are Safety Requirements?
The system requirements specification (or sub-system/equipment specification, as appropriate) may be considered in two parts:
1. Requirements which are not related to safety
2. Requirements which are related to safety

Requirements which are related to safety are usually called safety requirements. These may be contained in a separate safety requirements specification.
Safety integrity relates to the ability of a safety-related system to achieve its required safety functions. The higher the safety integrity, the lower the likelihood that the system will fail to carry out its required safety functions.

Safety integrity comprises two parts:
1. Systematic failure integrity
2. Random failure integrity
Systematic failure integrity is the non-quantifiable part of the safety integrity and relates to hazardous systematic faults (hardware or software). Systematic faults are caused by human errors in the various stages of the system/sub-system/equipment life-cycle.
Examples of systematic faults are:
1. Specification errors
2. Design errors
3. Manufacturing errors
4. Installation errors
5. Operation errors
6. Maintenance errors
7. Modification errors
Random failure integrity is that part of the safety integrity which relates to hazardous random faults, in particular random hardware faults, which are the result of the finite reliability of hardware components.
Examples of random faults are:
1. Failure of a resistor
2. Failure of an IC
etc.
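Because random hardware faults stem from finite component reliability, they can be quantified. A common sketch is the constant-failure-rate (exponential) model, a standard reliability-engineering assumption rather than anything prescribed by the source: with component failure rates λᵢ, a series system fails at rate Σλᵢ, and the probability of at least one failure within t hours is 1 − e^(−λt).

```python
import math

def failure_probability(failure_rates_per_hour, hours):
    """Probability of at least one random hardware failure within `hours`,
    assuming constant failure rates and a series system (any component
    failure fails the system): P = 1 - exp(-sum(lambda_i) * t)."""
    total_rate = sum(failure_rates_per_hour)
    return 1.0 - math.exp(-total_rate * hours)

# e.g. a resistor and an IC (illustrative rates, not from any datasheet)
p = failure_probability([2e-9, 5e-8], hours=10_000)
print(f"{p:.6f}")
```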
Diversity
Goal: fault tolerance / detection.

Diversity is "a means of achieving all or part of the specified requirements in more than one independent and dissimilar manner." It can tolerate/detect a wide range of faults.

"The most certain and effectual check upon errors which arise in the process of computation, is to cause the same computations to be made by separate and independent computers; and this check is rendered still more decisive if they make their computations by different methods."
Dionysius Lardner, 1834
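Lardner's idea, restated in software: compute the same specified result with two independently written routines and compare before use. A minimal sketch (the function names and the fail-safe reaction are illustrative):

```python
# Two dissimilar implementations of the same specification
# (here: the sum 1 + 2 + ... + n), compared before the result is used.
def sum_iterative(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def sum_closed_form(n):
    return n * (n + 1) // 2  # Gauss formula: a dissimilar method

def diverse_sum(n):
    """Run both channels and compare; on disagreement, fail safe."""
    a, b = sum_iterative(n), sum_closed_form(n)
    if a != b:
        raise RuntimeError("channel disagreement: entering safe state")
    return a

print(diverse_sum(100))  # both channels agree on 5050
```

A disagreement here does not tell us which channel is wrong, only that something is; the safe reaction is to stop rather than to pick a winner.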
Layers of Diversity
Abstraction layers (top to bottom) and example diversity mechanisms at each layer:
- Concept of operation (e.g. specifications): e.g. two different paradigms, such as rule-based and functional
- Design (e.g. design descriptions): e.g. n-version design
- Implementation (e.g. source code): e.g. n-version coding
- Realisation (e.g. object code): e.g. diverse compilers
- HW (CPU, memory, ...): e.g. diverse CPUs
Examples for Diversity
- Specification diversity
- Design diversity
- Data diversity
- Time diversity
- Hardware diversity
- Compiler diversity
- Automated systematic diversity
- Testing diversity
- Diverse safety arguments
- ...

Some faults to be targeted: programming bugs, specification faults, compiler faults, CPU faults, random hardware faults (e.g. bit flips), security attacks, ...
Compiler Diversity
Use of two diverse compilers to compile one common source code.

Common source code (pseudocode):

    Module A {
        int i;
        int end;
        get(end);
        for i = 1 to end
            result = func(i, result);
            POS[i] = result;
        next
    }

Compiler A and Compiler B each translate this source, yielding diverse object code (?), e.g.:

    Compiler A:          Compiler B:
    move $4, A           add ($66533), A
    jmp  $54256          ret
    add  ($5436), B      move $4, C

Diverse compilers may come from:
- different manufacturers
- different versions
- different compiler options
Compiler Diversity: Issues
Targeted faults:
- Systematic compiler faults
- Some systematic and permanent hardware faults (if executed on one board)

Issues:
- To some degree possible with one compiler and different compile options (optimization on/off, ...)
- If compilers from different manufacturers are used, independence must be ensured
Systematic Automatic Diversity
What can be "diversified":
- memory usage
- execution sequence
- statement structures
- array references
- data coding
- register usage
- addressing modes
- pointers
- mathematical and logic rules
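One item from the list above, diverse data coding, can be illustrated in software: store each value together with its bitwise complement and check consistency on every read, so a random bit flip in either copy is detected. This is a common coded-storage idiom; the class and names below are illustrative, not from the source:

```python
class CodedWord:
    """Store a 32-bit value and its bitwise complement; a bit flip in
    either copy makes the pair inconsistent and is detected on read."""
    MASK = 0xFFFFFFFF

    def __init__(self, value):
        self.value = value & self.MASK
        self.complement = ~value & self.MASK

    def read(self):
        # A consistent pair XORs to all ones; anything else is corruption.
        if (self.value ^ self.complement) != self.MASK:
            raise RuntimeError("bit flip detected: entering safe state")
        return self.value

w = CodedWord(42)
print(w.read())       # consistent pair, returns 42
w.value ^= 1 << 7     # simulate a random hardware bit flip
# w.read() would now raise RuntimeError
```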