Safety Critical Systems
T 79.5303
Design for safety hardware and software
Ilkka Herttua
V - Lifecycle model
[Figure: V-lifecycle model]
- Specification branch: Requirements Analysis, Systems Analysis & Design, Software Design, Software Implementation & Unit Test
- Integration and test branch: Module Integration & Test, System Integration & Test, System Acceptance
- Artefacts linking the branches: Requirements Document, Requirements Model, Specification Document, Functional / Architectural Model, Test Scenarios
- Knowledge Base **
** Configuration-controlled knowledge that increases in understanding until completion of the system:
• Requirements Documentation
• Requirements Traceability
• Model Data/Parameters
• Test Definition/Vectors
Designing for Safety
• Fault groups
- requirement/specification errors
- random component failures
- systematic faults in design (software)
• Approaches to tackle problems
- right system architecture (fault-tolerant)
- reliability engineering (component, system)
- quality management (design and production processes)
Designing for Safety
• Hierarchical design
- simple modules, encapsulated functionality
- separated safety kernel for the safety-critical functions
• Maintainability
- preventive versus corrective maintenance
- scheduled maintenance routines for the whole lifecycle
- faults easy to find and repair: short MTTR (mean time to repair)
• Reduce human error
- Proper HMI (human-machine interface)
Hardware Faults
• Intermittent faults
- Fault occurs and recurs over time (loose connector)
• Transient faults
- Fault occurs and may not recur (lightning)
- Electromagnetic interference
• Permanent faults
- Fault persists / physical processor failure (design fault: overcurrent)
• Fault-tolerant hardware
- Achieved mainly by redundancy
- Adds cost, weight, power consumption, complexity
• Other means
- Improved maintenance, single system with better materials (higher mean time between failures, MTBF)
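MTBF and MTTR are commonly combined into a steady-state availability figure, A = MTBF / (MTBF + MTTR). The sketch below illustrates that standard relation; the function name and the example values are assumptions made for illustration, not part of the lecture material.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the unit is operational.

    Assumes a repairable unit that alternates between running (mean MTBF)
    and being repaired (mean MTTR).
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical values: 1000 h between failures, 10 h to repair.
    print(availability(1000.0, 10.0))
```

Raising MTBF (better materials) or lowering MTTR (easier fault finding) both move the result towards 1.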
Fault Tolerance
Redundancy types
Active Redundancy:
- Redundant units are always operating in parallel
Dynamic Redundancy (standby):
- Failure has to be detected
- Changeover to other module
Hardware redundancy techniques
Active techniques:
- Parallel (k of N)
- Voting (majority/simple)
Standby techniques:
- Operating: hot standby
- Non-operating: cold standby
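The two active techniques can be sketched in a few lines. The code below is a minimal illustration, assuming identical units that fail independently; the function names `majority_vote` and `k_of_n_reliability` are hypothetical. Independence is exactly what common-cause failures violate, so the computed figure is an upper bound in practice.

```python
from collections import Counter
from math import comb


def majority_vote(outputs):
    """Return the value agreed on by a strict majority of channels, else None."""
    if not outputs:
        raise ValueError("at least one channel output is required")
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


def k_of_n_reliability(k: int, n: int, r: float) -> float:
    """Probability that at least k of n independent units, each with
    reliability r, are working (binomial sum)."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    print(majority_vote([42, 42, 17]))     # one faulty channel is outvoted
    print(k_of_n_reliability(2, 3, 0.9))   # 2-of-3 (triple modular redundancy)
```

A voter that returns `None` on disagreement leaves the choice of safe reaction (shutdown, degraded mode) to the caller.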
Hardware reliability prediction
• Electronic Components
- Based on probability and statistics
- MIL-Handbook 217 (MIL-HDBK-217): experimental data on actual device behaviour
- Manufacturer information and allocated circuit types
- Bathtub curve: burn-in, useful life, wear-out
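In the useful-life region of the bathtub curve the failure rate is usually treated as constant, which gives the exponential model R(t) = exp(-λt) with MTBF = 1/λ. The sketch below assumes that model and a simple parts-count approach in which the failure rates of components that are all required simply add; the names and numbers are illustrative assumptions, not values from MIL-HDBK-217.

```python
import math


def reliability(failure_rate_per_hour: float, hours: float) -> float:
    """Probability of surviving `hours` at a constant failure rate."""
    return math.exp(-failure_rate_per_hour * hours)


def series_failure_rate(rates):
    """Parts-count estimate: every component is needed, so rates add."""
    return sum(rates)


if __name__ == "__main__":
    # Hypothetical board with three components (failures per hour).
    lam = series_failure_rate([2e-6, 5e-7, 1e-6])
    print("MTBF [h]:", 1.0 / lam)
    print("R(1 year):", reliability(lam, 8760))
```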
Safety Critical Hardware
Fault Detection:
- Routines to check that the hardware works
- Signal comparisons
- Information redundancy: parity check etc.
- Watchdog timers
- Bus monitoring: check that the processor is alive
- Power monitoring
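Information redundancy can be illustrated with a single even-parity bit. This is a minimal sketch with hypothetical function names; it detects any odd number of flipped bits but misses an even number of flips, which is why stronger codes (CRC, ECC) are used where more coverage is needed.

```python
def even_parity_bit(data: int) -> int:
    """Parity bit that makes the total number of 1-bits even."""
    if data < 0:
        raise ValueError("data must be a non-negative integer")
    return bin(data).count("1") % 2


def encode(data: int) -> int:
    """Append the parity bit as the least significant bit."""
    return (data << 1) | even_parity_bit(data)


def check(word: int) -> bool:
    """True if the received word still has an even number of 1-bits."""
    return bin(word).count("1") % 2 == 0


if __name__ == "__main__":
    word = encode(0b1011001)
    print(check(word))            # intact word passes
    print(check(word ^ 0b100))    # single bit flip is detected
    print(check(word ^ 0b110))    # double bit flip goes unnoticed
```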
Safety Critical Hardware
1. Commercial Microprocessors
- No safety firmware, least assurance
- Redundancy improves matters, but common-cause failures remain possible
- Fabrication failures, microcode and documentation errors
- Use components that have a service history and failure statistics.
Safety Critical Hardware
2. Special High-Reliability Microprocessors
- Collins Avionics/Rockwell AAMP2
- Used in Boeing 747-400 (30+ units)
- High cost: bench testing, documentation, formal verification
- Other models: SPARC V7, TSC695E, ERC32 (ESA radiation-tolerant), 68HC908GP32 (airbag)
Safety Critical Hardware
3. Programmable Logic Controllers (PLC)
• Contains power supply, interface and one or more processors.
• Designed for high mean time between failures (MTBF)
• Solid firmware
• Program stored in EEPROMs
• Programmed with ladder or function block diagrams
Safety Critical Software
Software development:
- Iteration is normally needed to develop a working solution (writing code, testing and modification).
- In a non-critical environment, code is accepted when the tests pass.
- Testing alone is not enough for a safety-critical application: the software needs an assessment process comprising dynamic/static testing, simulation, code analysis and formal verification.
Safety Critical Software
Dependable Software:
- Process for development
- Work discipline
- Well documented
- Quality management
- Validated/verified
Safety-Critical Software
Software faults:
- Requirements defects: failure of the software requirements to specify the environment in which the software will be used, or ambiguous requirements
- Design defects: the design does not satisfy the requirements, or documentation defects
- Code defects: failure of the code to conform to the software design.
Safety-Critical Software
Software faults:
- Subprogram side effects: a variable of the calling program may be changed by the called subprogram.
- Aliasing: two names refer to the same storage location.
- Initialisation failures: variables are used before they are assigned values.
- Memory management: buffer, stack and memory overflows
- Expression evaluation errors: divide-by-zero/arithmetic overflow
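Subprogram side effects and aliasing often appear together. The sketch below is an illustrative assumption (the function names are hypothetical): one routine silently modifies its caller's data through an alias, the other returns a fresh result and leaves the input untouched.

```python
def scale_in_place(samples, factor):
    """Side effect: overwrites the list owned by the caller."""
    for i in range(len(samples)):
        samples[i] *= factor
    return samples


def scale_copy(samples, factor):
    """No side effect: builds and returns a new list."""
    return [s * factor for s in samples]


if __name__ == "__main__":
    raw = [1, 2, 3]
    alias = raw                  # two names, one storage location
    scale_in_place(alias, 2)
    print(raw)                   # the "untouched" raw data has changed

    raw = [1, 2, 3]
    scaled = scale_copy(raw, 2)
    print(raw, scaled)           # raw is preserved
```

Safety-oriented language subsets restrict or forbid such constructs so that they can be caught by static analysis rather than by testing.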
Safety Critical Software
Safety Critical Programming Language:
- Logical soundness: unambiguous definition of the language, with no dialects (unlike C++)
- Simple definitions: complexity can lead to errors in compilers or other support tools
- Expressive power: the language shall allow domain features to be expressed efficiently and easily
- Security of definitions: violations of the language definition shall be detected
- Verification: the language supports verification, i.e. proving that the produced code is consistent with the specification.
- Memory/time constraints: stack, register and memory usage are controlled.
Safety Critical Software
Language comparison:
- Structured assembler (wild jumps, exhaustion of memory, well understood)
- Ada (wild jumps, data typing, exception handling, separate compilation)
- Subset languages: CORAL, SPADE and Ada (Alsys CSMART Ada kernel)
- Validated compilers for Pascal and Ada
- Available expertise: common languages bring higher productivity and fewer mistakes, but C is still not appropriate.
Safety Critical Software
Languages used:
- Boeing uses mostly Ada, but about 75 languages were still used for the 747-400.
- ESA mandated Ada for mission-critical systems.
- NASA: Space Station in Ada, some systems in C and assembler.
- Car ABS systems in assembler
- Train control systems in Ada
- Medical systems in Ada and assembler
- Nuclear reactor core and shutdown systems in assembler, migrating to Ada.
Safety Critical Software
Tools
- Highly reliable, validated tools are required: faults in a tool can result in faults in the safety-critical software.
- Widespread tools are better tested
- Use a proven process for using the tool
- Analyse the output of the tool: static analysis of the object code
- Use alternative products and compare results
- Use different tools (diversity) to reduce the likelihood of wrong test results.
Safety Critical Software
Design Principles 1
- New software features add complexity; try to keep the software simple
- Plan for avoiding human error: unambiguous human-computer interface
- Removal of hazardous modules (Ariane 5 unused code)
Safety Critical Software
Design Principles 2
- Add barriers: hardware/software locks for critical parts
- Minimise single points of failure: increase safety margins, exploit redundancy and allow recovery.
- Isolate failures: don't let things get worse.
- Fail-safe: panic shutdowns, watchdog code
- Avoid common-mode failures: use diversity (different programmers, n-version programming)
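The fail-safe principle with watchdog code can be sketched as a latching software watchdog. This is an illustration under stated assumptions: the class `Watchdog`, its methods and the injected timestamps are hypothetical, and a real system would normally back this with an independent hardware timer, because software cannot reliably supervise itself.

```python
class Watchdog:
    """Calls `on_expire` once if the supervised task stops kicking in time."""

    def __init__(self, timeout_s: float, on_expire):
        self.timeout_s = timeout_s
        self.on_expire = on_expire
        self.last_kick = 0.0
        self.tripped = False

    def kick(self, now: float) -> None:
        """Called by the supervised task to signal that it is alive."""
        if not self.tripped:          # latched: no recovery after a trip
            self.last_kick = now

    def poll(self, now: float) -> bool:
        """Called periodically by the supervisor; returns True once tripped."""
        if not self.tripped and now - self.last_kick > self.timeout_s:
            self.tripped = True
            self.on_expire()          # e.g. drive outputs to the safe state
        return self.tripped


if __name__ == "__main__":
    wd = Watchdog(1.0, lambda: print("panic shutdown"))
    wd.kick(0.0)
    print(wd.poll(0.5))   # task is alive
    print(wd.poll(2.0))   # no kick for more than 1 s: shutdown requested
```

Timestamps are passed in explicitly so that the behaviour can be tested deterministically; the trip is latched so that a late kick cannot mask the failure.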
Safety Critical Software
Design Principles 3
- Fault tolerance: recovery blocks; if one module fails, execute an alternative module.
- Don't rely on run-time systems
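A recovery block runs a primary module, checks its result with an acceptance test, and falls back to an alternative module if the check fails. The sketch below is a simplified illustration: `recovery_block` and the example modules are hypothetical names, and state restoration is reduced to handing each alternate a fresh copy of the input state.

```python
import copy


def recovery_block(acceptance_test, alternates, state):
    """Try each alternate in order; return the first accepted result."""
    for alternate in alternates:
        checkpoint = copy.deepcopy(state)     # each attempt starts from clean state
        try:
            result = alternate(checkpoint)
        except Exception:
            continue                          # a crash counts as a failed attempt
        if acceptance_test(state, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


if __name__ == "__main__":
    def primary(s):            # faulty primary module (illustrative)
        return -1.0

    def secondary(s):          # simpler alternative module
        return s["x"] ** 0.5

    def is_square_root(s, r):  # acceptance test
        return r >= 0 and abs(r * r - s["x"]) < 1e-6

    print(recovery_block(is_square_root, [primary, secondary], {"x": 9.0}))
```

The quality of the acceptance test bounds the protection: a result that is wrong but passes the test is returned as if it were correct.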
Safety-Critical Software
Techniques/Tools:
- Fault prevention: preventing the introduction or occurrence of faults by using design support tools (UML with a CASE tool)
- Fault removal: testing, debugging and code modification
Safety Critical Software
Software tool faults:
- Faults in software tools (development/modelling) can result in system faults.
- Techniques for software development (language/design notation) can have a great impact on the performance of the people involved and also determine the likelihood of faults.
- The characteristics of the programming system and its runtime determine how great the impact of possible faults on the overall software subsystem can be.
Practical Design Process (by I-Logix, manufacturer of the Statemate tool)
Improved Development Process
Integrated Development Process
Verified software process
Safety Critical Software
Reduction of Hazardous Conditions
- Simplify: code contains only the minimum features, with no unnecessary or undocumented features and no unused executable code
- Diversity: data and control redundancy
- Multi-version programming: a shared specification still leads to common-mode failures, and synchronisation code increases complexity
Home assignments 2 a
• Neil Storey's book: Safety Critical Computer Systems
- 5.10 Describe a common cause of incompleteness within specifications. How can this situation cause problems?
- 9.17 Describe the advantages and disadvantages of the reuse of software within safety critical projects.
Cont.
Home assignments 2 b
- 7.15 A system may be described by the following reliability model, where the numbers within the boxes represent the module reliability. Calculate the system reliability.
Email by 1. March to [email protected]
[Figure: reliability block diagram with module reliabilities 0.7, 0.7, 0.7, 0.9, 0.98, 0.97 and 0.99]
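Reliability block diagrams of this kind are evaluated by reducing series and parallel groups step by step. The sketch below shows the two standard combination rules under the assumption of independent module failures; the helper names and the example values are hypothetical and deliberately not those of exercise 7.15.

```python
from math import prod


def series(*reliabilities: float) -> float:
    """All modules are required: reliabilities multiply."""
    return prod(reliabilities)


def parallel(*reliabilities: float) -> float:
    """At least one module is required: unreliabilities multiply."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


if __name__ == "__main__":
    # Hypothetical layout: two redundant 0.8 modules feeding one 0.9 module.
    print(series(parallel(0.8, 0.8), 0.9))
```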