STPA: A New Hazard Analysis Technique Based on the STAMP Model of Accident Causation


2

Outline

• What is STPA

• STPA process

• Example: Robot

• Example: TCAS

• Comparison and Results

• In-class STPA

3

STAMP-Based Hazard Analysis (STPA)

• Basic premise is to prevent accidents by enforcing safety constraints on system behavior (controlling hazardous system states)

• Goals (same as any hazard analysis)

– Identification of system hazards and related safety constraints necessary to ensure acceptable risk

• Design For Safety

– Accumulation of information about how hazards can occur.

– Use info to eliminate, mitigate and control hazards in system design, development, manufacturing, and operations

4

Controlling States

• Since hazardous states can be prevented through appropriate control (enforcing safety constraints), this hazard analysis method seeks to find instances of Inadequate Control

• Inadequate control occurs when there are state transitions to hazardous states

• The commands, decisions, or actions that lead to violation of safety constraints:

• “Inadequate Control Actions”

5

Inadequate Control Actions

Identify inadequate control actions

1. A required control action is not provided or not followed

2. An incorrect or unsafe control action is provided

3. A potentially correct control action is provided too late or too early (at the wrong time)

4. A correct control action is stopped too soon.
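The four categories work as a checklist that is applied to every control action in the control structure. A minimal Python sketch of that idea follows; the names `ICAType` and `ica_prompts` are hypothetical and not part of STPA itself.

```python
from enum import Enum


class ICAType(Enum):
    """The four ways a control action can be inadequate (from the list above)."""
    NOT_PROVIDED = "is not provided or not followed"
    UNSAFE_PROVIDED = "is provided but is incorrect or unsafe"
    WRONG_TIMING = "is provided too late or too early"
    STOPPED_TOO_SOON = "is stopped too soon"


def ica_prompts(controller: str, action: str) -> list[str]:
    """Return one analysis prompt per category for a single control action."""
    return [f"{controller}: '{action}' {kind.value}" for kind in ICAType]


if __name__ == "__main__":
    # Hypothetical controller and action names, for illustration only.
    for prompt in ica_prompts("Controller", "command X"):
        print(prompt)
```

Each prompt is only a question for the analyst: whether a given combination is actually hazardous depends on the system hazards.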

6

Control Flaw Taxonomy

• Design of the control algorithm does not enforce constraints

– Flaw(s) in creation process

– Process changes without appropriate change in control algorithm (asynchronous evolution)

– Incorrect modification or adaptation

• Process models are inconsistent, incomplete, or incorrect

– Flaw(s) in creation process

– Flaw(s) in updating process

– Inadequate or missing feedback

  • Not provided in system design

  • Communication flaw

  • Time lag

  • Inadequate sensor operation

7

Control Flaw Taxonomy (cont)

• Time lags and measurement inaccuracies not accounted for

• Expected process inputs are wrong or missing

• Expected control inputs are wrong or missing

• Disturbance model is wrong

– Amplitude, frequency, or period is out of range

– Unidentified disturbance

• Inadequate coordination among controllers and decision makers

8

Inadequate Control Execution

Inadequate Execution of Control Actions

• Communication flaw

• Inadequate actuator operation

• Time lag
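One way to use the taxonomy during analysis is to hold it as data and walk each inadequate control action through it. The sketch below assumes that approach; the dictionary layout and the `checklist` helper are illustrative choices, and the entries are copied from the taxonomy slides above.

```python
# Category -> more specific flaws; an empty list means the slides give no sub-items.
CONTROL_FLAWS: dict[str, list[str]] = {
    "Design of the control algorithm does not enforce constraints": [
        "Flaw(s) in creation process",
        "Process changes without appropriate change in control algorithm",
        "Incorrect modification or adaptation",
    ],
    "Process models are inconsistent, incomplete, or incorrect": [
        "Flaw(s) in creation process",
        "Flaw(s) in updating process",
        "Inadequate or missing feedback",
    ],
    "Time lags and measurement inaccuracies not accounted for": [],
    "Expected process inputs are wrong or missing": [],
    "Expected control inputs are wrong or missing": [],
    "Disturbance model is wrong": [
        "Amplitude, frequency, or period is out of range",
        "Unidentified disturbance",
    ],
    "Inadequate coordination among controllers and decision makers": [],
    "Inadequate execution of control actions": [
        "Communication flaw",
        "Inadequate actuator operation",
        "Time lag",
    ],
}


def checklist(flaws: dict[str, list[str]]) -> list[str]:
    """Flatten the taxonomy into one line per question to ask."""
    items: list[str] = []
    for category, details in flaws.items():
        if details:
            items.extend(f"{category}: {detail}" for detail in details)
        else:
            items.append(category)
    return items


if __name__ == "__main__":
    for line in checklist(CONTROL_FLAWS):
        print("-", line)
```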

© Copyright Nancy Leveson, Aug. 2006

9

[Figure: "STPA: A New Hazard Analysis Technique Based on STAMP". Generic control loop with Controller, Actuator(s), and Sensor(s), annotated with the places where control can be inadequate: Inadequate Control Algorithm; Process Model Wrong; Control Input Wrong or Missing; Inadequate Control Commands; Inadequate Actuator Operation; Process Input Wrong or Missing; Controlled Process Failure; Disturbances Unidentified or Out of Range; Process Output Wrong or Missing; Inadequate Sensor Operation; Feedback Wrong or Missing.]

10

How to Perform STPA

1. High-level Hazard Analysis:

• Identify Accidents

• Hazards

• High-level Safety Constraints

2. Identify Inadequate Control Actions

• Control structure

3. Control Flaws

• In the design

4. Change design to eliminate, mitigate, or control potentially unsafe control actions and behaviors.

• Or accept

5. Iterate

11

Identifying and Specifying Safety Constraints

• Most requirements only specify nominal behavior

– Need to specify off-nominal behavior

– Need to specify what system and software must NOT do

• What the system must not do is not simply the inverse of what it must do

• Derive from system hazard analysis


12

Example: Mobile Robot

13

1. Identify high-level functional requirements and environmental constraints.

e.g. size of physical space, crowded area

2. Identify high-level hazards

a. Violation of minimum separation between mobile base and objects (including orbiter and humans)

b. Mobile robot becomes unstable (e.g., could fall over)

c. Manipulator arm hits something

d. Fire or explosion

e. Contact of human with DMES

f. Inadequate thermal control (e.g., damaged tiles not detected, DMES not applied correctly)

g. Damage to robot

Thermal Tile Robot Example


14

3. Restate hazards as high-level safety constraints

e.g. Robot must not allow humans to come in contact with DMES

4. Try to eliminate from system design

5. If cannot be eliminated or adequately controlled at system design level, will need to refine and allocate them to system components.

Thermal Tile Robot Example (2)


15

Design Constraints are Refined and Traced to Components

1.4.2.1 Mobile Base (MB):

Requirements:

MB-FR1: The mobile base shall be able to carry all the mobile robot subsystem components [2.6.3(73)]

MB-FR2: The mobile base shall be able to move smoothly in any direction and to cross cable covers on the floor [EA.3(15), H3(38), 2.6.2(73)]

MB-FR3: The mobile base shall be able to raise its inspection and injection equipment to the level required for servicing the tiles, from 2.9 meters to 4 meters [EA.2(15), 2.6.3(73), 2.10.1(81), 2.10.4(81)]


16

Design Constraints are Refined and Traced to Components (2)

Design Constraints:

MB-C1: The mobile base must be no more than 2.5 meters long and 1 meter wide. While moving, it must fit under structural beams as low as 1.75 meters [EA.2(15), 4.6)]
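A constraint like MB-C1 is concrete enough to check a candidate design against it mechanically. The sketch below is illustrative only: the `BaseDesign` type, the function name, and the sample dimensions in the usage note are assumptions, and "fit under" is read as strictly lower than the beam.

```python
from dataclasses import dataclass

# Limits taken from MB-C1 above.
MAX_LENGTH_M = 2.5
MAX_WIDTH_M = 1.0
LOWEST_BEAM_M = 1.75


@dataclass(frozen=True)
class BaseDesign:
    """Hypothetical candidate design for the mobile base."""
    length_m: float
    width_m: float
    height_moving_m: float  # height in the travelling configuration


def mb_c1_violations(design: BaseDesign) -> list[str]:
    """Return a description of every part of MB-C1 that the design violates."""
    violations: list[str] = []
    if design.length_m > MAX_LENGTH_M:
        violations.append(f"length {design.length_m} m exceeds {MAX_LENGTH_M} m")
    if design.width_m > MAX_WIDTH_M:
        violations.append(f"width {design.width_m} m exceeds {MAX_WIDTH_M} m")
    # Assumption: the base must be strictly lower than the lowest beam.
    if design.height_moving_m >= LOWEST_BEAM_M:
        violations.append(
            f"moving height {design.height_moving_m} m does not fit under "
            f"{LOWEST_BEAM_M} m beams"
        )
    return violations
```

For example, `mb_c1_violations(BaseDesign(3.0, 1.5, 1.6))` reports both the length and the width, which is the kind of conflict a "longer and wider base" design option runs into.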

Safety-Related Design Constraints

MB-SC1: The mobile base must be able to ensure accuracy of 10 cm for positioning and 1 mm for tile servicing (inspection and injection tasks) [EA.2(15), H4(38), 2.6.1(73), 2.6.4(73)]

MB-SC2: The mobile base design must protect against fire and explosion [H6(39), 2.6.5(73), 2.6.6(73)]

MB-SC3: It must be possible to move the mobile base out of the way in case of an emergency [2.9.2(79)]


17

Design Constraints are Refined and Traced to Components (3)

Motor Controller:

2.9.2 The drivetrains for locomotion are within the diameter of the wheel hub and consist of a brushless DC motor, resolver for positioning and commutation, a brake, a cycloidal reducer providing 225:1 gear reduction with exceptional stiffness, and a locking hub that couples the output of the reducer to the wheel. The locking hub allows the operator to disengage the wheels from the drivetrain completely [MB-SC3(20)]

Rationale: In an emergency, the ability to disengage the wheels will allow towing or pushing the machine out of the way.
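The bracketed tags are what make the tracing checkable: MB-SC3 points down to design section 2.9.2, and 2.9.2 points back up to MB-SC3. The sketch below checks that such links are bidirectional. It assumes each tag has the form `label(number)`, with the number being a page or location reference; the function names are hypothetical and the item texts are abridged from the excerpts above.

```python
import re

# Assumed tag notation: "<label>(<number>)", e.g. "2.9.2(79)" or "MB-SC3(20)".
TRACE_TAG = re.compile(r"([A-Za-z0-9.\-]+)\((\d+)\)")

# Two items quoted (abridged) from the specification excerpts above.
ITEMS = {
    "MB-SC3": "It must be possible to move the mobile base out of the way "
              "in case of an emergency [2.9.2(79)]",
    "2.9.2": "The locking hub allows the operator to disengage the wheels "
             "from the drivetrain completely [MB-SC3(20)]",
}


def trace_targets(text: str) -> set[str]:
    """Return the labels listed in the trailing [...] trace list of an item."""
    bracket = re.search(r"\[([^\]]*)\]\s*$", text)
    if bracket is None:
        return set()
    return {label for label, _number in TRACE_TAG.findall(bracket.group(1))}


def missing_back_links(items: dict[str, str]) -> list[tuple[str, str]]:
    """List (source, target) pairs where target does not trace back to source."""
    missing = []
    for source, text in items.items():
        for target in sorted(trace_targets(text)):
            if target in items and source not in trace_targets(items[target]):
                missing.append((source, target))
    return missing


if __name__ == "__main__":
    print(missing_back_links(ITEMS))
```

Targets that are not in `items` are skipped rather than reported, since only part of the specification is shown here.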


18

Define preliminary control structure and refine constraints and design in parallel.


19

Refinement and Allocation

• After defining initial control structure, refine constraints and design in parallel.

– Identify potentially hazardous control actions by each of system components that would violate system design constraints. Restate as component safety design requirements and constraints.

– Perform hazard analysis using STPA to identify how safety-related requirements and constraints could be violated (the potential causes of inadequate control and enforcement of safety-related constraints).

– Augment the basic design to eliminate, mitigate, or control potential unsafe control actions and behaviors.

– Iterate over the process, i.e. perform STPA on the new augmented design and continue to refine the design until all hazardous scenarios are eliminated, mitigated, or controlled.

• Document design rationale and trace requirements and constraints to the related design decisions.


20

Try to eliminate hazards from system conceptual design. If not possible, then identify controls and new design constraints.

For unstable base hazard

System Safety Constraint:

Mobile base must not be capable of falling over under worst case operational conditions


21

First try to eliminate:

1. Make base heavy

Could increase damage if hits someone or something.

Difficult to move out of way manually in emergency

2. Make base long and wide

Eliminates hazard but violates environmental constraints

3. Use lateral stability legs that are deployed when manipulator arm extended but must be retracted when mobile base moves.

Two new design constraints:

• Manipulator arm must move only when stabilizer legs are fully deployed

• Stabilizer legs must not be retracted until manipulator arm is fully stowed.
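The two new design constraints can be written as guard conditions that a controller checks before issuing a command. This is a minimal sketch, assuming the true leg and arm positions are known; the enum and function names are hypothetical.

```python
from enum import Enum


class LegPosition(Enum):
    RETRACTED = 1
    PARTIALLY_DEPLOYED = 2
    FULLY_DEPLOYED = 3


class ArmPosition(Enum):
    STOWED = 1
    EXTENDED = 2  # assumption: anything other than fully stowed


def arm_may_move(legs: LegPosition) -> bool:
    """Manipulator arm must move only when stabilizer legs are fully deployed."""
    return legs is LegPosition.FULLY_DEPLOYED


def legs_may_retract(arm: ArmPosition) -> bool:
    """Stabilizer legs must not be retracted until the arm is fully stowed."""
    return arm is ArmPosition.STOWED
```

Partially deployed legs are deliberately treated the same as retracted legs, because the constraint requires full deployment.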


22

Identify potentially hazardous control actions by each of system components

1. A required control action is not provided or not followed

2. An incorrect or unsafe control action is provided

3. A potentially correct or inadequate control action is provided too late or too early (at the wrong time)

4. A correct control action is stopped too soon.

Hazardous control of stabilizer legs:

• Legs not deployed before arm movement enabled

• Legs retracted when manipulator arm extended

• Legs retracted after arm movements are enabled or retracted before manipulator arm fully stowed

• Leg extension stopped before they are fully extended


23

Restate as safety design constraints on components

1. Controller must ensure stabilizer legs are extended whenever arm movement is enabled

2. Controller must not command a retraction of stabilizer legs when manipulator arm extended

3. Controller must not command retraction of stabilizer legs after arm movements are enabled. Controller must not command retraction of legs before manipulator arm is fully stowed

4. Controller must not stop leg deployment before they are fully extended


24

Do same for all hazardous commands:

e.g., Arm controller must not enable manipulator arm movement before stabilizer legs are completely extended.

At this point, may decide to have arm controller and leg controller in same component


25

To produce detailed scenarios for violation of safety constraints, augment control structure with process models

Arm Movement: Enabled, Disabled, Unknown

Stabilizer Legs: Extended, Retracted, Unknown

Manipulator Arm: Stowed, Extended, Unknown

How could the process model become inconsistent with the real state? e.g., a command is issued to extend the stabilizer legs, but an external object could block extension or the extension motor could fail


26

Problems often in startup or shutdown:

e.g., Emergency shutdown while servicing tiles. Stability legs manually retracted to move robot out of way. On restart, the controller assumes the stabilizer legs are still extended, so arm movement could be commanded. So use an “unknown” state when starting up

Do not need to know all causes, only safety constraints:

- May decide to turn off arm motors when legs extended or when arm extended. Could use interlock or tell computer to power it off.

- Must not move when legs extended? Power down wheel motors while legs extended.
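The start-up problem and the wheel-motor interlock can be sketched with a small process model for the stabilizer legs. This is an illustrative sketch only: the class and method names, and the two boolean limit-sensor readings, are assumptions and not part of the robot design described here.

```python
from enum import Enum


class Legs(Enum):
    EXTENDED = "extended"
    RETRACTED = "retracted"
    UNKNOWN = "unknown"


class LegProcessModel:
    """Controller's belief about the stabilizer legs (hypothetical sketch)."""

    def __init__(self) -> None:
        # Start-up, including restart after an emergency shutdown:
        # never assume the state from before the shutdown still holds.
        self.legs = Legs.UNKNOWN

    def on_feedback(self, fully_extended: bool, fully_retracted: bool) -> None:
        """Update the model from sensor feedback only, never from commands issued."""
        if fully_extended and not fully_retracted:
            self.legs = Legs.EXTENDED
        elif fully_retracted and not fully_extended:
            self.legs = Legs.RETRACTED
        else:
            # Legs in transit, extension blocked, or sensors disagree.
            self.legs = Legs.UNKNOWN

    def arm_movement_allowed(self) -> bool:
        """Arm movement may be commanded only when legs are known to be extended."""
        return self.legs is Legs.EXTENDED

    def wheel_motors_powered(self) -> bool:
        """Wheel motors stay powered down while legs are extended."""
        # Assumption: UNKNOWN is treated as conservatively as EXTENDED.
        return self.legs is Legs.RETRACTED
```

Because the model changes only on feedback, a blocked leg or a failed extension motor leaves it at "unknown" instead of silently recording "extended".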

Coordination problems


27

Example: TCAS

28

Step 1: Identify hazards and translate into high-level requirements and constraints on behavior

TCAS Hazards:

1. A near mid-air collision (NMAC): two controlled aircraft violate minimum separation standards

2. A controlled maneuver into ground

3. Loss of control of aircraft

4. Interference with other safety-related aircraft systems

5. Interference with the ground-based ATC system

6. Interference with ATC safety-related advisory

System Safety Design Constraints:

– TCAS must not cause or contribute to an NMAC

– TCAS must not cause or contribute to a controlled maneuver into the ground

– …


29

Step 2: Define basic control structure


30

Component Responsibilities

TCAS:

• Receive and update information about its own and other aircraft

• Analyze information received and provide pilot with

– Information about where other aircraft in the vicinity are located

– An escape maneuver to avoid potential NMAC threats

Pilot

• Maintain separation between own and other aircraft using visual scanning

• Monitor TCAS displays and implement TCAS escape maneuvers

• Follow ATC advisories

Air Traffic Controller

• Maintain separation between aircraft in controlled airspace by providing advisories (control action) for pilot to follow


31

Aircraft components (e.g., transponders, antennas)

• Execute control maneuvers

• Receive and send messages to/from aircraft

• Etc.

Airline Operations Management

• Provide procedures for using TCAS and following TCAS advisories

• Train pilots

• Audit pilot performance

Air Traffic Control Operations Management

• Provide procedures

• Train controllers

• Audit performance of controllers

• Audit performance of overall collision avoidance system


32

For the NMAC hazard:

TCAS:

1. The aircraft are on a near collision course and TCAS does not provide an RA

2. The aircraft are in close proximity and TCAS provides an RA that degrades vertical separation.

3. The aircraft are on a near collision course and TCAS provides an RA too late to avoid an NMAC

4. TCAS removes an RA too soon

Pilot:

1. The pilot does not follow the resolution advisory provided by TCAS (does not respond to the RA)

2. The pilot incorrectly executes the TCAS resolution advisory.

3. The pilot applies the RA but too late to avoid the NMAC

4. The pilot stops the RA maneuver too soon.


33

Step 3b: Use identified inadequate control actions to refine system safety design constraints

• When two aircraft are on a collision course, TCAS must always provide an RA to avoid the collision

• TCAS must not provide RAs that degrade vertical separation

• …

• The pilot must always follow the RA provided by TCAS

• …


34

Step 4: Determine how potentially hazardous control actions could occur (scenarios of how constraints can be violated). Eliminate from design or control in design or operations.

Step 4a: Augment control structure with process models for each control component.

Step 4b: For each inadequate control action, examine the parts of the control loop to see if they could cause it.

Guided by a set of generic control loop flaws

Step 4c: Design controls and mitigation measures

Step 4d: Consider how designed controls could degrade over time.


35


36

TCAS does not provide an RA when required to avoid an NMAC

- Unit is not operational

-- Pilot does not turn it on

-- Self-monitor turns off TCAS unit

-- Component failure

- TCAS does not perceive a conflict

-- Current location of aircraft is incorrect

TCAS thinks other aircraft is on the ground

Incorrect altitude provided to TCAS

Uneven terrain

TCAS puts other aircraft outside protected volume

-- Location of own aircraft incorrect

Altimeter error

Delay in receipt of information about altitude change
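A causal breakdown like this one is a tree, and each path from the inadequate control action to a leaf is one scenario to eliminate or control. The sketch below records the tree as nested dictionaries and lists the paths. The nesting of the lines without dash markers is an assumption: each is placed directly under the preceding "--" item. The function name is hypothetical.

```python
# Each key is a cause; its value holds the more detailed causes beneath it.
SCENARIOS = {
    "TCAS does not provide an RA when required to avoid an NMAC": {
        "Unit is not operational": {
            "Pilot does not turn it on": {},
            "Self-monitor turns off TCAS unit": {},
            "Component failure": {},
        },
        "TCAS does not perceive a conflict": {
            "Current location of aircraft is incorrect": {
                # Assumed nesting: these four lines carry no dash markers.
                "TCAS thinks other aircraft is on the ground": {},
                "Incorrect altitude provided to TCAS": {},
                "Uneven terrain": {},
                "TCAS puts other aircraft outside protected volume": {},
            },
            "Location of own aircraft incorrect": {
                "Altimeter error": {},
                "Delay in receipt of information about altitude change": {},
            },
        },
    },
}


def leaf_paths(tree: dict, prefix: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Return every root-to-leaf path, one per causal scenario."""
    paths: list[tuple[str, ...]] = []
    for cause, children in tree.items():
        path = prefix + (cause,)
        if children:
            paths.extend(leaf_paths(children, path))
        else:
            paths.append(path)
    return paths


if __name__ == "__main__":
    for path in leaf_paths(SCENARIOS):
        print(" -> ".join(path))
```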


37

Comparison with Traditional HA Techniques

• Top-down (vs. bottom-up like FMECA)

• Considers more than just component failure and failure events (includes these but more general)

• Guidance in doing analysis (vs. FTA)

• Handles dysfunctional interactions and system accidents, software, management, etc.


38

Comparisons (2)

• Concrete model (not just in head)

– Not physical structure (HAZOP) but control (functional) structure

– General model of inadequate control (based on control theory)

• HAZOP guidewords based on model of accidents being caused by deviations in system variables

• Includes HAZOP model but more general

• Compared with TCAS II Fault Tree (MITRE)

STPA results more comprehensive

Included Ueberlingen accident


39

Ballistic Missile Defense System (BMDS) Non-Advocate Safety Assessment using STPA

• A layered defense to defeat all ranges of threats in all phases of flight (boost, mid-course, and terminal)

• Made up of many existing systems (BMDS Element)

– Early warning radars

– Aegis

– Ground-Based Midcourse Defense (GMD)

– Command and Control Battle Management and Communications (C2BMC)

– Others

• MDA used STPA to evaluate the residual safety risk of inadvertent launch prior to deployment and test

40

Results

• Deployment and testing held up for 6 months because so many scenarios identified for inadvertent launch. In many of these scenarios:

– All components were operating exactly as intended

– Complexity of component interactions led to unanticipated system behavior

• STPA also identified component failures that could cause inadequate control (most analysis techniques consider only these failure events)

• As changes are made to the system, the differences are assessed by updating the control structure diagrams and assessment analysis templates.

• Adopted as primary safety approach for BMDS

In-Class STPA

Subway Train Doors

42

Train Doors

• What is (are) the system goal(s)?

• What are the accidents?

• What are the hazards?

• Translate the hazards into safety constraints

43

44

What to do for Train Doors Exercise

• To do in your groups or individually, your choice.

• See slide 10, “How to Perform STPA”, and follow it to do an STPA hazard analysis on an existing train door design you are familiar with (feel free to add design changes that are interesting to you).

• Be sure to include:

– Control structure and control loops,

– process models (the controller’s model of what the system/process is doing),

– The expected control inputs, measurements, sensors etc that you find to be relevant as you are going through the STPA process.

– Inadequate control actions (slide 5)

– Control flaws and inadequate control executions. (slides 6-9)

45

What to do for Train Doors Exercise

Once you’ve found the inadequate control actions and related control flaws and inadequate control executions, identify

– new safety constraints on the system and

– new design decisions to enforce the safety constraints (and prevent inadequate control)

• We’ll talk about the Train Doors STPA in class next week. No need to turn in any papers, but bring what you’ve done so we can go over it as a group.

• Feel free to contact me with questions. Maggie Stringfellow: [email protected]
