STPA A new hazard analysis technique based on the STAMP model of accident causation
Transcript
Slide 1
STPA A new hazard analysis technique based on the STAMP model
of accident causation
Slide 2
2 Outline
- What is STPA
- STPA process
- Example: Robot
- Example: TCAS
- Comparison and Results
- In-class STPA
Slide 3
3 STAMP-Based Hazard Analysis (STPA)
- Basic premise is to prevent accidents by enforcing safety constraints
  on system behavior (controlling hazardous system states)
- Goals (same as any hazard analysis):
  - Identification of system hazards and related safety constraints
    necessary to ensure acceptable risk
  - Design for safety
  - Accumulation of information about how hazards can occur. Use this
    information to eliminate, mitigate, and control hazards in system
    design, development, manufacturing, and operations
Slide 4
4 Controlling States
- Since hazardous states can be prevented through appropriate control
  (enforcing safety constraints), this hazard analysis method seeks to
  find instances of inadequate control
- Inadequate control occurs when there are state transitions to
  hazardous states
- The commands, decisions, or actions that lead to violation of safety
  constraints are called Inadequate Control Actions
Slide 5
5 Inadequate Control Actions
Identify inadequate control actions:
1. A required control action is not provided or not followed
2. An incorrect or unsafe control action is provided
3. A potentially correct control action is provided too late or too
   early (at the wrong time)
4. A correct control action is stopped too soon
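One way to apply these four categories systematically is to cross every control action with each category, giving the analyst a worksheet of questions to review. The sketch below is illustrative only: the `InadequateControl` enum, the `worksheet` helper, and the example control actions are assumptions made for this example, not part of STPA itself.

```python
from enum import Enum


class InadequateControl(Enum):
    """The four ways a control action can be inadequate (wording from the slide)."""
    NOT_PROVIDED = "required control action is not provided or not followed"
    UNSAFE_PROVIDED = "incorrect or unsafe control action is provided"
    WRONG_TIMING = "potentially correct control action is provided too late or too early"
    STOPPED_TOO_SOON = "correct control action is stopped too soon"


def worksheet(control_actions):
    """Pair every control action with every category, as prompts for the analyst."""
    return [(action, kind) for action in control_actions for kind in InadequateControl]


# Hypothetical control actions, chosen only for illustration.
rows = worksheet(["open valve", "close valve"])
for action, kind in rows:
    print(f"{action!r}: {kind.value}?")
```

Each row is a question, not a finding: the analyst still has to decide whether that combination can actually lead to a hazardous state.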
Slide 6
6 Control Flaw Taxonomy
- Design of the control algorithm does not enforce constraints
  - Flaw(s) in creation process
  - Process changes without appropriate change in control algorithm
    (asynchronous evolution)
  - Incorrect modification or adaptation
- Process models are inconsistent, incomplete, or incorrect
  - Flaw(s) in creation process
  - Flaw(s) in updating process
  - Inadequate or missing feedback
    - Not provided in system design
    - Communication flaw
    - Time lag
    - Inadequate sensor operation
Slide 7
7 Control Flaw Taxonomy (cont)
- Time lags and measurement inaccuracies not accounted for
- Expected process inputs are wrong or missing
- Expected control inputs are wrong or missing
- Disturbance model is wrong
  - Amplitude, frequency, or period is out of range
  - Unidentified disturbance
- Inadequate coordination among controllers and decision makers
Slide 8
8 Inadequate Control Execution
Inadequate execution of control actions:
- Communication flaw
- Inadequate actuator operation
- Time lag
Copyright Nancy Leveson, Aug. 2006
Slide 9
9 STPA: A New Hazard Analysis Technique Based on STAMP
[Figure: control loop with Controller, Actuator(s), and Sensor(s),
labeled with the following flaws]
- Controlled Process Failure
- Inadequate Sensor Operation
- Inadequate Actuator Operation
- Process Model Wrong
- Inadequate Control Algorithm
- Control Input Wrong or Missing
- Feedback Wrong or Missing
- Inadequate Control Commands
- Process Input Wrong or Missing
- Process Output Wrong or Missing
- Disturbances Unidentified or Out of Range
Slide 10
10 How to Perform STPA
1. High-level hazard analysis: identify
   - Accidents
   - Hazards
   - High-level safety constraints
2. Identify inadequate control actions
   - Control structure
3. Control flaws
   - In the design
4. Change design to eliminate, mitigate, or control potentially unsafe
   control actions and behaviors. Or accept
5. Iterate
Slide 11
11 Identifying and Specifying Safety Constraints
- Most requirements only specify nominal behavior
- Need to specify off-nominal behavior
- Need to specify what system and software must NOT do
- What the system must not do is not the inverse of what it must do
- Derive from system hazard analysis
Slide 12
12 Example: Mobile Robot
Slide 13
13 Thermal Tile Robot Example
1. Identify high-level functional requirements and environmental
   constraints, e.g. size of physical space, crowded area
2. Identify high-level hazards
   a. Violation of minimum separation between mobile base and objects
      (including orbiter and humans)
   b. Mobile robot becomes unstable (e.g., could fall over)
   c. Manipulator arm hits something
   d. Fire or explosion
   e. Contact of human with DMES
   f. Inadequate thermal control (e.g., damaged tiles not detected,
      DMES not applied correctly)
   g. Damage to robot
Slide 14
14 Thermal Tile Robot Example (2)
3. Restate hazards as high-level safety constraints, e.g. robot must
   not allow humans to come in contact with DMES
4. Try to eliminate hazards from system design
5. If hazards cannot be eliminated or adequately controlled at system
   design level, will need to refine the constraints and allocate them
   to system components
Slide 15
15 Design Constraints are Refined and Traced to Components
1.4.2.1 Mobile Base (MB)
Requirements:
MB-FR1: The mobile base shall be able to carry all the mobile robot
  subsystem components [2.6.3(73)]
MB-FR2: The mobile base shall be able to move smoothly in any direction
  and to cross cable covers on the floor [EA.3(15), H3(38), 2.6.2(73)]
MB-FR3: The mobile base shall be able to raise its inspection and
  injection equipment to the level required for servicing the tiles,
  from 2.9 meters to 4 meters [EA.2(15), 2.6.3(73), 2.10.1(81),
  2.10.4(81)]
Slide 16
16 Design Constraints are Refined and Traced to Components (2)
Design Constraints:
MB-C1: The mobile base must be no more than 2.5 meters long and 1 meter
  wide. While moving, it must fit under structural beams as low as
  1.75 meters [EA.2(15), 4.6)]
Safety-Related Design Constraints:
MB-SC1: The mobile base must be able to ensure accuracy of 10 cm for
  positioning and 1 mm for tile servicing (inspection and injection
  tasks) [EA.2(15), H4(38), 2.6.1(73), 2.6.4(73)]
MB-SC2: The mobile base design must protect against fire and explosion
  [H6(39), 2.6.5(73), 2.6.6(73)]
MB-SC3: It must be possible to move the mobile base out of the way in
  case of an emergency [2.9.2(79)]
Slide 17
17 Design Constraints are Refined and Traced to Components (3)
Motor Controller:
2.9.2 The drivetrains for locomotion are within the diameter of the
  wheel hub and consist of a brushless DC motor, resolver for
  positioning and commutation, a brake, a cycloidal reducer providing
  225:1 gear reduction with exceptional stiffness, and a locking hub
  that couples the output of the reducer to the wheel. The locking hub
  allows the operator to disengage the wheels from the drivetrain
  completely [MB-SC3(20)]
  Rationale: In an emergency, the ability to disengage the wheels will
  allow towing or pushing the machine out of the way.
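The bracketed references make traceability two-way: MB-SC3 cites design section 2.9.2, and 2.9.2 cites MB-SC3 back. A minimal sketch of how such links could be checked mechanically follows. The `cites` dictionary and the `missing_back_links` helper are assumptions made for this example; the data covers only links visible on these slides, and the parenthesized numbers after each reference are dropped.

```python
# Links as they appear on the slides: each item lists the items it cites.
# This is NOT the full specification, only what the slides show.
cites = {
    "MB-SC2": ["H6", "2.6.5", "2.6.6"],
    "MB-SC3": ["2.9.2"],
    "2.9.2": ["MB-SC3"],
}


def missing_back_links(cites):
    """Return (source, target) pairs where target has no recorded link back to source."""
    return [
        (source, target)
        for source, targets in cites.items()
        for target in targets
        if source not in cites.get(target, [])
    ]


gaps = missing_back_links(cites)
for source, target in gaps:
    print(f"{target} has no recorded link back to {source}")
```

Here the MB-SC3 / 2.9.2 pair passes, while the MB-SC2 targets are reported only because their sections do not appear on the slides, not because the underlying specification is known to lack them.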
Slide 18
18 Define preliminary control structure and refine constraints
and design in parallel.
Slide 19
19 Refinement and Allocation
- After defining initial control structure, refine constraints and
  design in parallel.
- Identify potentially hazardous control actions by each of system
  components that would violate system design constraints. Restate as
  component safety design requirements and constraints.
- Perform hazard analysis using STPA to identify how safety-related
  requirements and constraints could be violated (the potential causes
  of inadequate control and enforcement of safety-related constraints).
- Augment the basic design to eliminate, mitigate, or control potential
  unsafe control actions and behaviors.
- Iterate over the process, i.e. perform STPA on the new augmented
  design and continue to refine the design until all hazardous
  scenarios are eliminated, mitigated, or controlled.
- Document design rationale and trace requirements and constraints to
  the related design decisions.
Slide 20
20 Try to eliminate hazards from system conceptual design. If not
possible, then identify controls and new design constraints.
For the unstable base hazard:
- System Safety Constraint: Mobile base must not be capable of falling
  over under worst case operational conditions
Slide 21
21 First try to eliminate:
1. Make base heavy
   - Could increase damage if hits someone or something
   - Difficult to move out of way manually in emergency
2. Make base long and wide
   - Eliminates hazard but violates environmental constraints
3. Use lateral stability legs that are deployed when manipulator arm
   extended but must be retracted when mobile base moves
Two new design constraints:
- Manipulator arm must move only when stabilizer legs are fully deployed
- Stabilizer legs must not be retracted until manipulator arm is fully
  stowed
Slide 22
22 Identify potentially hazardous control actions by each of system
components
1. A required control action is not provided or not followed
2. An incorrect or unsafe control action is provided
3. A potentially correct or inadequate control action is provided too
   late or too early (at the wrong time)
4. A correct control action is stopped too soon
Hazardous control of stabilizer legs:
- Legs not deployed before arm movement enabled
- Legs retracted when manipulator arm extended
- Legs retracted after arm movements are enabled or retracted before
  manipulator arm fully stowed
- Leg extension stopped before they are fully extended
Slide 23
23 Restate as safety design constraints on components
1. Controller must ensure stabilizer legs are extended whenever arm
   movement is enabled
2. Controller must not command a retraction of stabilizer legs when
   manipulator arm extended
3. Controller must command deployment of stabilizer legs before arm
   movements are enabled. Controller must not command retraction of
   legs before manipulator arm fully stowed
4. Controller must not stop leg deployment before they are fully
   extended
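Constraints like these can be checked against every combination of component states to see which combinations are hazardous. The following is a simplified sketch: it tests states rather than controller commands, covers only the first two constraints, and all names (`violations`, `LEGS`, `ARM`) are assumptions made for this example.

```python
from itertools import product

LEGS = ("extended", "retracted")
ARM = ("stowed", "extended")


def violations(arm_enabled, legs, arm):
    """Return the constraints a state violates (empty list means acceptable)."""
    found = []
    if arm_enabled and legs != "extended":
        found.append("legs must be extended whenever arm movement is enabled")
    if arm == "extended" and legs == "retracted":
        found.append("legs must not be retracted while the arm is extended")
    return found


# Enumerate all 2 x 2 x 2 states and keep the ones that break a constraint.
hazardous = {
    state: violations(*state)
    for state in product((True, False), LEGS, ARM)
    if violations(*state)
}
for state, broken in hazardous.items():
    print(state, "violates:", broken)
```

A real analysis would also have to cover timing (constraints 3 and 4), which a static state table like this cannot express.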
Slide 24
24 Do same for all hazardous commands:
e.g., Arm controller must not enable manipulator arm movement before
stabilizer legs are completely extended.
At this point, may decide to have arm controller and leg controller in
same component
Slide 25
25 To produce detailed scenarios for violation of safety constraints,
augment control structure with process models
Process model variables and their possible values:
- Arm Movement: Enabled, Disabled, Unknown
- Stabilizer Legs: Extended, Retracted, Unknown
- Manipulator Arm: Stowed, Extended, Unknown
How could the process model become inconsistent with the real state?
e.g. issue command to extend stabilizer legs, but external object could
block extension or extension motor could fail
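The point about inconsistency can be made concrete: a controller that updates its process model as soon as it issues a command will believe the legs are extended even if an object blocked them. A minimal sketch of the alternative, where only feedback sets a definite value, is shown here. The class and method names are assumptions made for this example.

```python
from enum import Enum


class Legs(Enum):
    EXTENDED = "extended"
    RETRACTED = "retracted"
    UNKNOWN = "unknown"


class LegProcessModel:
    """The controller's belief about the stabilizer legs."""

    def __init__(self):
        self.believed = Legs.UNKNOWN
        self.pending = None

    def command_issued(self, target):
        # A command alone proves nothing: an object could block the legs
        # or the motor could fail, so the belief becomes UNKNOWN.
        self.pending = target
        self.believed = Legs.UNKNOWN

    def feedback_received(self, sensed):
        # Only sensor feedback sets the belief to a definite value.
        self.believed = sensed
        if sensed is self.pending:
            self.pending = None


model = LegProcessModel()
model.command_issued(Legs.EXTENDED)
after_command = model.believed
model.feedback_received(Legs.EXTENDED)
after_feedback = model.believed
print(after_command, after_feedback)
```

This does not remove the problem entirely, since the feedback itself can be wrong or late (see the control flaw taxonomy), but it keeps the model from drifting on commands alone.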
Slide 26
26 Problems often in startup or shutdown:
- e.g., emergency shutdown while servicing tiles. Stabilizer legs are
  manually retracted to move robot out of way. On restart, controller
  assumes stabilizer legs are still extended, and arm movement could be
  commanded. So use unknown state when starting up
Do not need to know all causes, only safety constraints:
- May decide to turn off arm motors when legs extended or when arm
  extended. Could use interlock or tell computer to power it off.
- Must not move when legs extended? Power down wheel motors while legs
  extended.
Coordination problems
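The restart scenario suggests initializing the process model to an unknown state and refusing arm movement until the leg position is confirmed. A minimal sketch under that assumption follows. The `RobotController` class and its methods are hypothetical names used only for illustration.

```python
UNKNOWN, EXTENDED, RETRACTED = "unknown", "extended", "retracted"


class RobotController:
    def __init__(self):
        # Never carry the pre-shutdown belief across a restart.
        self.legs = UNKNOWN
        self.arm_enabled = False

    def sense_legs(self, sensed):
        """Record the leg position reported by feedback."""
        self.legs = sensed

    def enable_arm(self):
        """Enable arm movement only when the legs are known to be extended."""
        self.arm_enabled = self.legs == EXTENDED
        return self.arm_enabled


controller = RobotController()        # restart after an emergency shutdown
first_try = controller.enable_arm()   # leg state unknown, so not enabled
controller.sense_legs(RETRACTED)      # legs had been retracted by hand
second_try = controller.enable_arm()  # still not enabled
controller.sense_legs(EXTENDED)
third_try = controller.enable_arm()   # enabled once extension is confirmed
print(first_try, second_try, third_try)
```

The same guard could be realized as a hardware interlock instead of software; the slide leaves that design choice open.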
Slide 27
27 Example: TCAS
Slide 28
28 Step 1: Identify hazards and translate into high-level requirements
and constraints on behavior
TCAS Hazards:
1. A near mid-air collision (NMAC): two controlled aircraft violate
   minimum separation standards
2. A controlled maneuver into ground
3. Loss of control of aircraft
4. Interference with other safety-related aircraft systems
5. Interference with the ground-based ATC system
6. Interference with ATC safety-related advisory
System Safety Design Constraints:
- TCAS must not cause or contribute to an NMAC
- TCAS must not cause or contribute to a controlled maneuver into the
  ground
Slide 29
29 Step 2: Define basic control structure
Slide 30
30 Component Responsibilities
TCAS:
- Receive and update information about its own and other aircraft
- Analyze information received and provide pilot with
  - Information about where other aircraft in the vicinity are located
  - An escape maneuver to avoid potential NMAC threats
Pilot:
- Maintain separation between own and other aircraft using visual
  scanning
- Monitor TCAS displays and implement TCAS escape maneuvers
- Follow ATC advisories
Air Traffic Controller:
- Maintain separation between aircraft in controlled airspace by
  providing advisories (control action) for pilot to follow
Slide 31
31 Aircraft components (e.g., transponders, antennas):
- Execute control maneuvers
- Receive and send messages to/from aircraft
- Etc.
Airline Operations Management:
- Provide procedures for using TCAS and following TCAS advisories
- Train pilots
- Audit pilot performance
Air Traffic Control Operations Management:
- Provide procedures
- Train controllers
- Audit performance of controllers
- Audit performance of overall collision avoidance system
Slide 32
32 For the NMAC hazard:
TCAS:
1. The aircraft are on a near collision course and TCAS does not
   provide an RA
2. The aircraft are in close proximity and TCAS provides an RA that
   degrades vertical separation
3. The aircraft are on a near collision course and TCAS provides an RA
   too late to avoid an NMAC
4. TCAS removes an RA too soon
Pilot:
1. The pilot does not follow the resolution advisory provided by TCAS
   (does not respond to the RA)
2. The pilot incorrectly executes the TCAS resolution advisory
3. The pilot applies the RA but too late to avoid the NMAC
4. The pilot stops the RA maneuver too soon
Slide 33
33 Step 3b: Use identified inadequate control actions to refine system
safety design constraints
- When two aircraft are on a collision course, TCAS must always provide
  an RA to avoid the collision
- TCAS must not provide RAs that degrade vertical separation
- The pilot must always follow the RA provided by TCAS
Slide 34
34 Step 4: Determine how potentially hazardous control actions could
occur (scenarios of how constraints can be violated). Eliminate from
design or control in design or operations.
- Step 4a: Augment control structure with process models for each
  control component.
- Step 4b: For each of the inadequate control actions, examine parts of
  control loop to see if they could cause it. Guided by a set of
  generic control loop flaws.
- Step 4c: Design controls and mitigation measures.
- Step 4d: Consider how designed controls could degrade over time.
Slide 35
35
Slide 36
36 TCAS does not provide an RA when required to avoid an NMAC
- Unit is not operational
  -- Pilot does not turn it on
  -- Self-monitor turns off TCAS unit
  -- Component failure
- TCAS does not perceive a conflict
  -- Current location of aircraft is incorrect
     --- TCAS thinks other aircraft is on the ground
     --- Incorrect altitude provided to TCAS
     --- Uneven terrain
     --- TCAS puts other aircraft outside protected volume
  -- Location of own aircraft incorrect
     --- Altimeter error
     --- Delay in receipt of information about altitude change
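A causal breakdown like this one is a tree, and each root-to-leaf path is one scenario to eliminate or control. A small sketch of how the tree might be stored and flattened into scenarios follows. The `scenarios` dictionary includes only the first two levels from the slide, because the deeper nesting is not clear from the transcript; `causal_paths` is a hypothetical helper.

```python
# First two levels of causes for "TCAS does not provide an RA when required".
scenarios = {
    "Unit is not operational": {
        "Pilot does not turn it on": {},
        "Self-monitor turns off TCAS unit": {},
        "Component failure": {},
    },
    "TCAS does not perceive a conflict": {
        "Current location of aircraft is incorrect": {},
        "Location of own aircraft incorrect": {},
    },
}


def causal_paths(tree, prefix=()):
    """Yield each root-to-leaf chain of causes as a tuple."""
    for cause, children in tree.items():
        path = prefix + (cause,)
        if children:
            yield from causal_paths(children, path)
        else:
            yield path


paths = list(causal_paths(scenarios))
for path in paths:
    # Print the leaf cause first, then what it leads to.
    print(" -> ".join(reversed(path)))
```

Each path can then be assigned a control or mitigation measure, which is Step 4c of the process.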
Slide 37
37 Comparison with Traditional HA Techniques
- Top-down (vs. bottom-up like FMECA)
- Considers more than just component failure and failure events
  (includes these but more general)
- Guidance in doing analysis (vs. FTA)
- Handles dysfunctional interactions and system accidents, software,
  management, etc.
Slide 38
38 Comparisons (2)
- Concrete model (not just in head)
  - Not physical structure (HAZOP) but control (functional) structure
- General model of inadequate control (based on control theory)
  - HAZOP guidewords based on model of accidents being caused by
    deviations in system variables
  - Includes HAZOP model but more general
- Compared with TCAS II Fault Tree (MITRE)
  - STPA results more comprehensive
  - Included Ueberlingen accident
Slide 39
39 Ballistic Missile Defense System (BMDS) Non-Advocate Safety
Assessment using STPA
- A layered defense to defeat all ranges of threats in all phases of
  flight (boost, mid-course, and terminal)
- Made up of many existing systems (BMDS Element)
  - Early warning radars
  - Aegis
  - Ground-Based Midcourse Defense (GMD)
  - Command and Control, Battle Management, and Communications (C2BMC)
  - Others
- MDA used STPA to evaluate the residual safety risk of inadvertent
  launch prior to deployment and test
Slide 40
40 Results
- Deployment and testing held up for 6 months because so many scenarios
  identified for inadvertent launch. In many of these scenarios:
  - All components were operating exactly as intended
  - Complexity of component interactions led to unanticipated system
    behavior
- STPA also identified component failures that could cause inadequate
  control (most analysis techniques consider only these failure events)
- As changes are made to the system, the differences are assessed by
  updating the control structure diagrams and assessment analysis
  templates
- Adopted as primary safety approach for BMDS
Slide 41
In-Class STPA Subway Train Doors
Slide 42
42 Train Doors
- What is (are) the system goal(s)?
- What are the accidents?
- What are the hazards?
- Translate the hazards into safety constraints
Slide 43
43
Slide 44
44 What to do for Train Doors Exercise
To do in your groups or individually, your choice. See slide 10, How to
Perform STPA, and follow it to do an STPA hazard analysis on an existing
train door design you are familiar with (feel free to add design changes
that are interesting to you).
Be sure to include:
- Control structure and control loops, process models (the controller's
  model of what the system/process is doing)
- The expected control inputs, measurements, sensors, etc. that you find
  to be relevant as you are going through the STPA process
- Inadequate control actions (slide 5)
- Control flaws and inadequate control executions (slides 6-9)
Slide 45
45 What to do for Train Doors Exercise
Once you've found the inadequate control actions and related control
flaws and inadequate control executions, identify new safety constraints
on the system and new design decisions to enforce the safety constraints
(and prevent inadequate control).
We'll talk about the Train Doors STPA in class next week. No need to
turn in any papers, but bring what you've done so we can go over it as a
group. Feel free to contact me with questions.
Maggie Stringfellow: [email protected]