How safe is safe enough? (and how do we demonstrate that?)
Dr David Pumfrey
High Integrity Systems Engineering Group
Department of Computer Science
University of York
Why System Safety?

Why do we strive to make systems safe?
- Self interest:
  - we wouldn't want to be harmed by systems we develop and use
  - unsafe systems are bad business
- We have to do so:
  - required by law
  - required by standards
- But what do the law and standards represent?
  - laws try to prevent what society finds morally unacceptable; ultimately assessed by the courts, as representatives of society
  - standards try to define what is acceptable practice, to discharge legal and moral responsibilities
Perception of Safety

Perception (and hence individual acceptance) of risk is affected by many factors:
- (Apparent) degree of control
- Number of deaths in one accident (aircraft versus cars)
- Familiarity vs. novelty
- "Dreadness" of risk ("falling out of the sky", nuclear radiation)
- Voluntary vs. involuntary risk (hang gliding vs. nuclear accident)
- Politics and journalism: frequency / profile of reporting of accidents / issues
- Experience
- Individual factors: age, sex, religion, culture

How do companies (engineers?) make decisions given this diversity of views?
Getting it wrong – some examples

- Ariane 5 (mis-use of legacy software)
- A330 ADIRUs (software could not cope with a hardware failure mode)
- Boeing 777 (software error allowed switch back to an ADIRU that had previously been detected as faulty)
- Therac 25 (software errors contributed to radiotherapy overdose accidents)
Boeing 777

An incident of massive altitude fluctuations on a flight out of Perth.
- Problem caused by the Air Data Inertial Reference Unit (ADIRU)
- Software contained a latent fault which was revealed by a change
- Problem was in the fault management / dispatch logic
- June 2001: accelerometer #5 fails with erroneous high output values; the ADIRU discards its output values
- A power cycle of the ADIRU occurs each time the aircraft electrical system is restarted
- Aug 2006: accelerometer #6 fails; the latent software error allows use of the previously failed accelerometer #5
http://www.atsb.gov.au/publications/investigation_reports/2005/AAIR/aair200503722.aspx
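The shape of the latent fault can be sketched abstractly. This is a hypothetical model of the kind of dispatch-logic error the ATSB report describes, not the actual ADIRU code; all names are illustrative:

```python
# Hypothetical sketch of a latent fault-dispatch error of the kind
# described in the ATSB report (not the actual ADIRU software).

class Adiru:
    def __init__(self):
        self.failed = set()          # accelerometers flagged as failed

    def power_cycle(self):
        # BUG: in-memory failure flags are lost on restart, so the
        # fault status of a previously failed unit is forgotten.
        self.failed.clear()

    def usable_accels(self, detected_failures):
        # On a new failure, the logic falls back to the remaining
        # units -- which may include one that failed in the past.
        self.failed |= detected_failures
        return [n for n in range(1, 7) if n not in self.failed]

adiru = Adiru()
adiru.usable_accels({5})         # June 2001: #5 fails, is excluded
adiru.power_cycle()              # restart: failure flag lost
print(adiru.usable_accels({6}))  # Aug 2006: #6 fails; #5 is wrongly back in use
```

The point of the sketch is that the fault is latent: the system behaves correctly for years, and the error only becomes observable when a second, unrelated failure occurs after a power cycle.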
Therac 25

- Therac 25 was a development of (safe, successful) earlier medical machines
- Intended for operation on tumours
- Uses a linear accelerator to produce an electron stream and generate X-rays (both can be used in treatments)
- X-ray therapy requires about 100 times more electron energy than electron therapy
  - this level of electron energy is hazardous if the patient is exposed to it directly
- Selection of treatment type is controlled by a turntable
Therac 25 Schematic

[Figure: turntable schematic showing mirror, counterweight, electron mode scan target, X-ray mode target, position sense microswitch assembly, and locking plunger]
Software in Therac-25

- On older models, there were mechanical interlocks on turntable position and beam intensity
- In Therac-25, the mechanical interlocks were removed; turntable position and beam activation were both computer controlled
- Older models required the operator to enter data twice – at the patient's side and in the shielded area – and the entries were then cross-checked
- In Therac-25, data was only entered once (to speed up therapy sessions)
- Very poor user interface:
  - display updated so slowly that experienced therapists could "type ahead"
  - undocumented error codes occurred so often that operators ignored them
- Six over-dosage accidents (resulting in deaths); there may have been many more cases where ineffective treatment was given
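The hazard behind "type ahead" can be illustrated with a simplified race: if the beam setup task samples the treatment parameters once and then runs slowly, a correction typed while it is still running is silently ignored. This is a deliberately minimal sketch of that failure pattern, not the actual Therac-25 code:

```python
# Simplified, hypothetical illustration of a "type-ahead" race: the beam
# setup task snapshots the parameters once, so an edit made while setup
# is still in progress has no effect on the configured beam.

import threading, time

params = {"mode": "xray", "intensity": "high"}   # operator's first entry

def setup_beam(snapshot, result):
    time.sleep(0.1)              # slow magnet / turntable setup
    result.update(snapshot)      # beam configured from the stale snapshot

configured = {}
t = threading.Thread(target=setup_beam, args=(dict(params), configured))
t.start()

# Operator notices the mistake and quickly "types ahead" a correction...
params["mode"] = "electron"
params["intensity"] = "low"

t.join()
# ...but the setup task already captured the old values.
print(configured)   # {'mode': 'xray', 'intensity': 'high'}
```

A safe design would either lock out edits while setup is in progress or re-validate the parameters before the beam is activated; Therac-25's interlocks to do this were in software, and flawed.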
Westland Helicopters Merlin (EH101)

Helicopter Electric Actuation Technology (HEAT) project

To replace traditional flight controls...
- rods and links
- power assistance from high pressure hydraulics

...with electrical actuation:
- smaller, lighter
- reduction in fire risk

BUT
- totally fly-by-wire – no mechanical reversion
- flight control electronics become extremely critical
Eurofighter Typhoon: Display Processor Hardware

[Figure: block diagram of the display processor. Two processors, each with a second MMU, private RAM, private ROM and timers on a private bus; shared RAM and shared ROM on a local bus; I/O and specialist hardware; arbitration logic connecting the buses to the system bus]
Timing diagram

[Figure: one master cycle of the display processor schedule across the CPE, IC and PE1–PE6 processors. Each processor alternates between SUPERVISOR and USER modes, interleaving context switches, system health monitor runs, processor CBIT, SG BLT, message CBIT and checksum tasks, bus data localisation and global bus input data transfers, plus specific activities such as the HUD monitor, radar transfer, radar interface manager and discrete I/F servicing. Synchronisation points Sync 1–7 are driven by a level 6 timer interrupt and level 5 broadcast interrupts; the cycle also shows the VME bus block transfer period, non-synchronised supervisor operations, master cycle timer update, radar and IFF EOF interrupts (CPU level 5, no latency), sync with data word (CPU level 2), acyclic interrupts, warning interrupts, and multi-mission data / PDS load]
Recursive Resource Dependency

[Figure: dependency diagram relating events (interrupts, output events, master cycle clock), memory (RAM, ROM, program ROM, stack RAM, critical variables), CPU registers and I/O registers (MMU registers, bus arbitration control registers, timer registers, interrupt configuration registers) and initialisation routines; nested sets distinguish all resources, intrinsically critical resources and primary control resources]

Initialisation routines for primary control resources use system resources, and dependencies become cyclic.
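Cyclic initialisation dependencies of this kind can be checked mechanically with a depth-first search over the dependency graph. A minimal sketch follows; the graph contents are illustrative only, not taken from the actual system:

```python
# Minimal sketch: detect a cycle in a resource-initialisation dependency
# graph using depth-first search. The example graph is illustrative.

def find_cycle(deps):
    """Return True if the dependency graph contains a cycle."""
    visiting, done = set(), set()

    def dfs(node):
        if node in visiting:
            return True              # back edge: a cycle exists
        if node in done:
            return False
        visiting.add(node)
        for dep in deps.get(node, ()):
            if dfs(dep):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(n) for n in deps)

# Primary control resources whose init routines use system resources,
# which in turn require the init routines to have run:
deps = {
    "timer_registers": ["init_routines"],
    "init_routines":   ["ram", "timer_registers"],   # cyclic
    "ram":             [],
}
print(find_cycle(deps))   # True
```

Detecting the cycle is the easy part; the engineering problem on the slide is that some such cycles are unavoidable at start-up, so the initialisation order and the safety argument must account for them explicitly.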
Safety Cases: Who are they for?

Many people and organisations will have an interest in a safety case:
- supplier / manufacturer
- operator
- regulatory authorities
- bodies that conduct acceptance trials
- people who will work with the system, and their representatives (unions)
- "neighbours" (e.g. the general public who live around an air base)
- emergency services

May need more than one "presentation" of the safety case to suit different audiences. Who has the greatest interest?
Goal Structuring Notation

Purpose of a goal structure: to show how goals are broken down into sub-goals, and eventually supported by evidence (solutions), whilst making clear the strategies adopted, the rationale for the approach (assumptions, justifications) and the context in which goals are stated.
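A goal structure is essentially a tree of goals, strategies, context and solution nodes. As a minimal sketch (the node kinds, names and traversal are illustrative, not part of any GSN tooling), one useful automated check is finding goals not yet supported by anything:

```python
# Minimal sketch of a GSN-style goal structure as a tree; all node names
# and content are illustrative.

from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                 # "goal", "strategy", "context", "solution"
    text: str
    children: list = field(default_factory=list)

    def undeveloped_goals(self):
        """Goals not yet supported by any child node."""
        found = []
        if self.kind == "goal" and not self.children:
            found.append(self.text)
        for child in self.children:
            found += child.undeveloped_goals()
        return found

root = Node("goal", "Control system is safe", [
    Node("strategy", "Argue over all identified hazards", [
        Node("goal", "H1 eliminated", [Node("solution", "Fault tree analysis")]),
        Node("goal", "P(H2) < 1e-6 per annum"),   # not yet supported
    ]),
])
print(root.undeveloped_goals())   # ['P(H2) < 1e-6 per annum']
```

In GSN such unsupported goals are explicitly marked "undeveloped", which is what makes progressive construction of the argument (shown later) auditable.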
A Simple Goal Structure

[Figure: example GSN structure.
- Top goal: Control System is Safe, in the context of hazards identified from FHA (Ref Y) and tolerability targets (Ref Z)
- Strategy: all identified hazards eliminated / sufficiently mitigated, with sub-goals:
  - H1 has been eliminated (solution: Formal Verification)
  - Probability of H2 occurring < 1x10^-6 per annum (solution: Fault Tree Analysis; justification: 1x10^-6 p.a. limit for catastrophic hazards)
  - Probability of H3 occurring < 1x10^-3 per annum (solution: Fault Tree Analysis)
- Strategy: software developed to I.L. appropriate to hazards involved, in the context of I.L. process guidelines defined by Ref X, with sub-goals:
  - Primary Protection System developed to I.L. 4 (solution: process evidence of I.L. 4)
  - Secondary Protection System developed to I.L. 2 (solution: process evidence of I.L. 2)]
HEAT: Developing the Argument

- Top goal: Trials aircraft is acceptably safe to fly with HEAT/ACT fitted
- System: HEAT/ACT system is acceptably safe
- Clearance: Procedures for flight clearance and certification followed
- Integration: Trials a/c remains acceptably safe with HEAT fitted
- SMS: SMS implemented to DS 00-56
- Product: All identified hazards have been suitably addressed
- Process: All relevant requirements and standards have been complied with
Progressive Development

[Figure: three successive elaborations of goal G 1.1.4.7 "Hazard Log requirement satisfied".
1. Initially supported directly by the Hazard Log Application.
2. Then broken into sub-goals: G 1.1.4.7.1 Hazard Log initiated (Hazard Log Application; Hazard Log Guidance Notes document); G 1.1.4.7.2 Hazard Log correctly maintained; G 1.1.4.7.3 Hazard Log used to assess levels of risk throughout project (Safety Review minutes).
3. Finally, G 1.1.4.7.2 is supported by the ISAT Hazard Log audit report and further sub-goals: G 1.1.4.7.2.1 Access rights to Hazard Log correctly controlled; G 1.1.4.7.2.2 Sign-off procedure and rights to Hazard Log correctly controlled; G 1.1.4.7.2.3 Hazard Log used consistently; G 1.1.4.7.2.4 Hazard Log update procedure understood and correctly followed]
An analogy

A safety case is like a legal case presented in court. Like a legal case, a safety case must:
- be clear
- be credible
- be compelling
- make best use of available evidence

Like a legal case, a safety case will always be subjective:
- there is no such thing as absolute safety
- safety can never be proved
- we are always making an argument of acceptability
What is a convincing argument? Example: The Completeness Problem

[Figure: goal structure for G1.1.2.1.1 "All relevant airworthiness requirements have been identified completely and correctly", in the context of the Basis of Certification document (BoC), Def Stan 00-970 (DS970), JAR 29, and the EH101 General Requirement Specification (GRS).
- Strategy AwComplete: argument by showing extreme improbability of overlooking relevant requirements, with sub-goals:
  - G1.1.2.1.1.1: Airworthiness requirements specified
  - G1.1.2.1.1.2: Relevant airworthiness requirements satisfy mandated standards where applicable
  - G1.1.2.1.1.3: Relevance of airworthiness requirements to HEAT/ACT assessed by competent staff (solutions: CompAwStaff – competencies of staff used to filter airworthiness requirements; AwSigs – competencies of specialists used to vet and approve requirements)
- Strategy AwCorrect: argument by showing assumptions used to derive requirements were correct, with sub-goal:
  - G1.1.2.1.1.4: Assumptions are proven correct by flight test (solution: FltTest – assumptions proven by flight test)]
How is evidence used?

Think about evidence as used in a legal (court) case:
- Direct – supports a conclusion with no "intermediate steps", e.g. a witness testifies that he saw the suspect at point X at time Y
- Circumstantial – requires an inference to be made to reach a conclusion, e.g. a ballistics test proves the suspect's gun fired the fatal shot

Safety case evidence is similar:
- testing is direct – it shows how the system behaves in a specific instance
- conformance to design rules is indirect – it allows the inference that the system is fit for purpose (if the rules have been proven)

Evidence may "stack up" in different ways:
- strong, specific – individually compelling; taken together, shows system properties
- weak, general – compelling only in sum
Conclusions

- Demonstrating safety is a challenge
- We are building ever more complex systems
- Much of the "bespoke" complexity is in software
- It is essential that safety is a design driver...
- ... and also that we design for the ability to demonstrate safety