Risk management tools
Patrick Hudson, Tim Hudson
Hudson Global Consulting
How can we manage risk?
• We can manage risk by hoping it won’t happen
• We can manage risk by offering sacrifices to the Gods
• We can manage risk by understanding what we are doing
• The first two don’t work
• The third is what a Safety Management System does
Risk
• Risk is a complex concept
• A combination of two different components
– RISK = Outcome x Probability of that outcome
• Outcomes – what could happen
– Usually seen as a scenario
– Worst case - conservative
– Most credible worst case
• Probability of those outcomes
– Often measured as frequency of occurrence
– Needs to be applied before anything has gone wrong
– Probabilities are difficult to estimate
– Knowing the probability may change its value
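Here is a minimal sketch of RISK = Outcome x Probability, written in Python. It scores two scenarios for the same hazard: a conservative worst case and a most credible worst case. The `Scenario` class, the severity scores and the annual probabilities are hypothetical illustrations and do not come from this presentation.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    outcome: float      # hypothetical severity score of the consequence
    probability: float  # hypothetical probability per year of that outcome


def risk(scenario: Scenario) -> float:
    """RISK = Outcome x Probability of that outcome."""
    return scenario.outcome * scenario.probability


# Hypothetical numbers for one hazard, for illustration only.
worst_case = Scenario("worst case (conservative)", outcome=1000.0, probability=0.0001)
credible_worst_case = Scenario("most credible worst case", outcome=100.0, probability=0.01)

for s in (worst_case, credible_worst_case):
    print(f"{s.name}: risk = {risk(s):.3f}")
```

With these made-up numbers, the most credible worst case carries more risk than the conservative worst case. The choice of scenario therefore matters as much as the multiplication. The probabilities remain the hard part, because they have to be estimated before anything has gone wrong.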
Session 16 Building World Class SMS
[Figure: "No Structure" versus "Structure" in a safety management system. Cycle labels: Plan, Do, Check, Feedback, Continuous Improvement, Engage. Elements: HSE Policy, HSE Plan, Objectives/Targets, Organization Structure, Alcohol & Drugs Policy, Road Safety Plan, Audit Plans, Hazards & Effects Mgmt., EA, Health Risk Assess., Unsafe Act Audit, Incident Potential Matrix, TRIPOD]
There is more to an SMS than lots of good intentions
Safety Management System (SMS)
[Figure: Production versus Protection, bounded by the DISASTER and BANKRUPTCY regions]
Better defenses converted to increased production
Safety Management System (SMS)
[Figure: Production versus Protection, bounded by the DISASTER and BANKRUPTCY regions]
Best practice operations under SMS
Generic HSE Management System (Shell)
1 - Leadership and Commitment
2 - Policy and Strategic Objectives
3 - Organisation, Responsibilities, Resources and Standards
4 - Hazards & Effects Mgt (Risk Mgt)
5 - Planning & Procedures
6 - Implementation, Monitoring
7 - Audit
8 - Management Review
[Figure: the elements arranged as a PLAN, DO, CHECK, FEEDBACK loop, with Corrective Action feedback at several stages]
Hazard-based approach
HEMP - Hazard and Effects Management Process
• Identify - What are the hazards?
• Assess - How big are those hazards?
• Control - How do we control the hazards?
• Recover - What if it still goes wrong?
Step 1. Identification
• First identify your hazards
– What is going to hurt you?
– Needs to be specific enough to manage practically, e.g. not just potential and kinetic energy
– General enough to manage specifics in the same way
– Accumulate in a list – the Hazard Register
• A range of tools and methods help here
– Brainstorming - proactive
– HAZID
– Incident analyses - reactive
– Reporting
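At its simplest, a Hazard Register is an accumulated list of hazards, each detailed enough to manage. The sketch below assumes a hypothetical record layout with four fields:

- the hazard
- a description
- the method that found it
- whether that method was proactive or reactive

Real registers usually carry many more fields, such as owners and links to controls.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HazardEntry:
    hazard: str       # specific enough to manage practically
    description: str
    found_by: str     # hypothetical field: brainstorming, HAZID, incident analysis, reporting
    proactive: bool   # True if found before anything went wrong


@dataclass
class HazardRegister:
    entries: List[HazardEntry] = field(default_factory=list)

    def add(self, entry: HazardEntry) -> bool:
        """Add a hazard unless it is already registered; return True if added."""
        if any(e.hazard == entry.hazard for e in self.entries):
            return False
        self.entries.append(entry)
        return True

    def found_reactively(self) -> List[HazardEntry]:
        return [e for e in self.entries if not e.proactive]


register = HazardRegister()
register.add(HazardEntry("Jet fuel during refuelling", "Flammable liquid transfer on the apron", "HAZID", True))
register.add(HazardEntry("Wet runway on landing", "Reduced braking action", "incident analysis", False))
register.add(HazardEntry("Jet fuel during refuelling", "Same hazard reported again", "reporting", False))

for entry in register.entries:
    print(f"{entry.hazard} ({entry.found_by})")
```

The duplicate check keeps one entry per hazard. This reflects the goal of being "general enough to manage specifics in the same way".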
Step 2. Assess
• How big is the risk you are taking and running?
• A wide range of tools is available
• Not an exact science – whatever anyone tells you
• Small risks can be ignored
• Large risks may not be taken
• Usually framed in terms of ALARP
– As Low As Reasonably Practicable
– Not intended to be as low as possible
• Risk assessment should point to what to do about the hazard in question
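Step 2 describes three possible results of an assessment, and they can be written as a simple decision rule:

- ignore small risks
- refuse large ones
- reduce everything in between ALARP

The two thresholds and the risk score below are hypothetical placeholders. In practice they come from the organisation's own risk acceptance criteria, and, as the slide says, this is not an exact science.

```python
# Hypothetical thresholds on a hypothetical risk score; real values come from
# the organisation's own risk acceptance criteria.
BROADLY_ACCEPTABLE = 1.0   # below this, the risk is small enough to ignore
INTOLERABLE = 100.0        # at or above this, the risk may not be taken


def assess(risk_score: float) -> str:
    """Point to what to do about a hazard, given an estimated risk score."""
    if risk_score >= INTOLERABLE:
        return "do not take the risk"
    if risk_score < BROADLY_ACCEPTABLE:
        return "ignore"
    return "reduce ALARP"


for score in (0.5, 20.0, 250.0):
    print(f"{score:>6}: {assess(score)}")
```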
Step 3. Manage and control
• Primarily preventative
• Success is measured by nothing going wrong
• Prevention involves a variety of approaches
– Use of the hierarchy of controls
– Barriers to keep hazards in place
– Controls to prevent them escaping
• Management is directly responsible for the provision of controls and barriers
– Requires resourcing, procurement and continuous evaluation
• Front line personnel are responsible for their use once provided and supported
– Requires ability to operate the controls and barriers
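"Use of the hierarchy of controls" means preferring more effective types of control over less effective ones. This sketch assumes the commonly cited five-level hierarchy:

1. Elimination
2. Substitution
3. Engineering controls
4. Administrative controls
5. Personal protective equipment

The slides do not spell out which hierarchy they mean. The proposed controls in the example are hypothetical.

```python
from enum import IntEnum
from typing import List, Tuple


class ControlType(IntEnum):
    """Commonly cited hierarchy of controls; a lower value is generally more effective."""
    ELIMINATION = 1
    SUBSTITUTION = 2
    ENGINEERING = 3
    ADMINISTRATIVE = 4
    PPE = 5


Control = Tuple[str, ControlType]


def rank_controls(controls: List[Control]) -> List[Control]:
    """Order controls from most to least preferred according to the hierarchy."""
    return sorted(controls, key=lambda control: control[1])


# Hypothetical controls proposed for a refuelling hazard.
proposed = [
    ("Refuelling checklist", ControlType.ADMINISTRATIVE),
    ("Gloves and goggles", ControlType.PPE),
    ("Automatic overfill shut-off", ControlType.ENGINEERING),
]
for description, kind in rank_controls(proposed):
    print(f"{kind.name:<14} {description}")
```

Ranking the controls makes it easy to see when a barrier set leans entirely on the lower, people-dependent levels.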
Step 4. Recovery
• Recovery is necessary after control over a hazardous process has been lost
• But before the worst case consequences have been achieved
• Recovery controls and barriers are reactive
• The term Mitigation applies best here
• These controls are usually much more expensive than preventative controls
• Sometimes challenged because “We’ve never used that so we can get rid of it and save money”
Tools
• Risk management tools are intended to help one or more of the 4 steps
– Usually applied continuously to improve
– Especially on the feedback loops
• Audits
• Incident investigations
• Reporting
• Performance assessment for predictive improvement
• Identify – discover unexpected hazards
• Assess – evaluate what needs to be done
• Control – systematically list the controls to see if they are adequate to reduce the risk to acceptable levels
• Recover – identify what will reduce the consequences
• Successful risk management allows us to take the risks that enable us to get the benefits without disaster
• These can easily be mapped onto the ICAO components
– Not just the risk management elements
– Also all the other elements
Minimising Regret, Maximising Opportunity
Decision | Regret | No Regret
Go | Incident | Normal Operations
No-Go | Missed Opportunity | Safe
Risk Assessment Matrices
• A simple way of supporting estimates of the product of outcome and probability
• Not a discrete set of values, but an easy way of representing the distributions of severity of outcomes and their probabilities
• So – there is no single CORRECT Matrix
Probability increases from A to E:
A = Never heard of in industry
B = Incident heard of in industry
C = Incident heard of in company
D = Incident happens several times per year in company
E = Incident happens several times per year in a location

Consequence rating | People | Assets | Environment | A | B | C | D | E
0 | No injury | No damage | No effects | Low risk | Low risk | Low risk | Low risk | Low risk
1 | Slight injury | Slight damage | Slight effect | Low risk | Low risk | Low risk | Med/low risk | Med/low risk
2 | Minor injury | Minor damage | Minor effect | Med/low risk | Med/low risk | Med/low risk | Med/low risk | Med/low risk
3 | Major injury | Local damage | Localised effect | Med/low risk | Med/low risk | Medium risk | Medium risk | High risk
4 | Single fatality | Major damage | Major effect | Medium risk | Medium risk | Medium risk | High risk | High risk
5 | Multiple fatality | Extensive damage | Massive effect | Medium risk | High risk | High risk | High risk | High risk
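Using a risk assessment matrix is a table lookup: a consequence rating selects the row and a probability column selects the cell. This sketch copies the cell values from the example matrix on this slide. Because there is no single correct matrix, the values are data rather than logic and would be replaced by each organisation's own. The `rate` function name is hypothetical.

```python
# Probability columns: A (never heard of in industry) to E (several times per year in a location).
COLUMNS = "ABCDE"

# Rows are consequence ratings 0 (no injury, damage or effect) to 5 (multiple
# fatality, extensive damage, massive effect), as in the example matrix.
MATRIX = [
    ["Low", "Low", "Low", "Low", "Low"],                        # 0
    ["Low", "Low", "Low", "Med/low", "Med/low"],                # 1
    ["Med/low", "Med/low", "Med/low", "Med/low", "Med/low"],    # 2
    ["Med/low", "Med/low", "Medium", "Medium", "High"],         # 3
    ["Medium", "Medium", "Medium", "High", "High"],             # 4
    ["Medium", "High", "High", "High", "High"],                 # 5
]


def rate(consequence: int, probability: str) -> str:
    """Look up the risk level for a consequence rating (0-5) and a probability column (A-E)."""
    if not 0 <= consequence < len(MATRIX):
        raise ValueError("consequence rating must be 0-5")
    column = probability.upper()
    if len(column) != 1 or column not in COLUMNS:
        raise ValueError("probability column must be A-E")
    return MATRIX[consequence][COLUMNS.index(column)]


# Single fatality, incident happens several times per year in the company.
print(rate(4, "D"))
```

The explicit single-letter check matters. `"ABCDE".index("")` would silently return 0 for an empty string.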
Risk Assessment Matrix
The colour determines the level of active risk management required
[Figure: matrix cells numbered 0 to 14, with "Now" and "After" positions showing risk reduced by reduced exposure (left side) and by mitigation (right side)]
Risk Calculations
[Figure: numerical cell values ranging from 0 to 200, with reduced exposure (left side) and mitigation (right side)]
Risk matrix alternative
The numbers are a reflection of how unacceptable the matrix cell is
What is ALARP?
[Figure: risk to stakeholders and cost plotted for options 1 to 6 on a scale of 0 to 120, with the legal minimum requirements marked]
ALARP = As Low As Reasonably Practicable
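One way to read the ALARP chart is as a comparison of options ordered from least to most protective:

- The chosen option must at least meet the legal minimum requirements.
- Further options are adopted until the extra cost becomes grossly disproportionate to the extra risk reduction.

This sketch assumes hypothetical (risk, cost) values for six options and a hypothetical disproportion factor. It illustrates the reasoning only and is not a regulatory procedure.

```python
from typing import List, Tuple

Option = Tuple[float, float]  # (risk to stakeholders, cost)


def alarp_option(options: List[Option], legal_max_risk: float, disproportion: float) -> int:
    """Return the index of the option to adopt under a simple gross-disproportion test.

    options must be ordered from least to most protective.
    legal_max_risk: the legal minimum requirement, expressed as a maximum allowed risk.
    disproportion: a further option is not reasonably practicable when its extra cost
    exceeds disproportion times the extra risk reduction it buys.
    """
    chosen = next((i for i, (risk, _) in enumerate(options) if risk <= legal_max_risk), None)
    if chosen is None:
        raise ValueError("no option meets the legal minimum requirements")
    for candidate in range(chosen + 1, len(options)):
        risk_reduction = options[chosen][0] - options[candidate][0]
        extra_cost = options[candidate][1] - options[chosen][1]
        if extra_cost > disproportion * risk_reduction:
            break  # grossly disproportionate: stop here
        chosen = candidate
    return chosen


# Hypothetical options 1 to 6: risk falls while cost rises.
options = [(100, 5), (70, 10), (45, 20), (30, 35), (22, 70), (18, 120)]
print("adopt option", alarp_option(options, legal_max_risk=80, disproportion=3) + 1)
```

With these made-up numbers, the legal minimum alone would allow option 2, but the test continues to option 4. Moving beyond option 4 would cost far more than the risk it removes, which is why ALARP is not "as low as possible".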
How can we understand our controls?
• The Bowtie is an industry standard in many high-hazard activities
• Bowties cover both control and recovery
• Bowties are not primarily intended to be quantitative, but can be used for calculations
• Bowties visually express the extent and types of control and are easy for managers to understand
– Is everything procedural?
– Does one person have to do everything?
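A bowtie can be held as a simple data structure with four parts:

- threats on the left
- preventive controls
- recovery (mitigation) barriers
- consequences on the right

The two questions on the slide can then be checked directly against that structure. The class layout, the barrier categories and the refuelling example below are hypothetical illustrations and are not a standard bowtie schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Barrier:
    name: str
    kind: str   # hypothetical categories, e.g. "engineering" or "procedural"
    owner: str  # the person or role that has to make the barrier work


@dataclass
class Bowtie:
    hazard: str
    top_event: str                  # undesirable event with potential for harm or damage
    threats: List[str]              # events and circumstances (left side)
    consequences: List[str]         # harm to people, assets or environment (right side)
    prevention: List[Barrier] = field(default_factory=list)  # controls, left side
    recovery: List[Barrier] = field(default_factory=list)    # recovery/mitigation, right side

    def barriers(self) -> List[Barrier]:
        return self.prevention + self.recovery

    def everything_procedural(self) -> bool:
        """Is everything procedural? (False if there are no barriers at all.)"""
        return bool(self.barriers()) and all(b.kind == "procedural" for b in self.barriers())

    def one_person_does_everything(self) -> bool:
        """Does one person have to do everything?"""
        return len({b.owner for b in self.barriers()}) == 1


# Hypothetical example for illustration only.
bowtie = Bowtie(
    hazard="Jet fuel",
    top_event="Fuel spill during refuelling",
    threats=["Hose failure", "Tank overfill"],
    consequences=["Fire", "Ground contamination"],
    prevention=[
        Barrier("Hose inspection", "procedural", "refueller"),
        Barrier("Automatic overfill shut-off", "engineering", "maintenance"),
    ],
    recovery=[Barrier("Spill response", "procedural", "refueller")],
)
print("Everything procedural:", bowtie.everything_procedural())
print("One person does everything:", bowtie.one_person_does_everything())
```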
Bow-tie Concept
[Figure: Events and Circumstances on the left; the HAZARD and an undesirable event with potential for harm or damage at the centre; CONSEQUENCES (harm to people and damage to assets or environment) on the right; CONTROLS supported by engineering activities, maintenance activities and operations activities]
Bow-tie Concept for a specific event
[Figure: Events and Circumstances on the left; the HAZARD and an undesirable event with potential for harm or damage at the centre; CONSEQUENCES (harm to people and damage to assets or environment) on the right; RISK CONTROLS supported by engineering activities, maintenance activities and operations activities]
A problem for aviation
• Simple models have difficulty in capturing recent major commercial aviation incidents
• Asiana 214, QF 32, AF 447, BA 38
A Diversion - Causality
• Simple accidents are simply caused
– Linear and deterministic
• Complex accidents are more complex
• The 80-20 rule suggests simple accidents are 80%
• The remaining 20% require us to recognize complexity
Theory 1 - how accidents are caused
• Linear causes – A causes B causes C
• Deterministic - either it is a cause or it isn’t
• We can compute both backwards and forwards
• People are seen as the problem – human error, etc.
• Probably good enough to catch 80% of the accidents we are likely to have
• Covers most private and GA operators
[Figure: private users]
Theory 2 - how accidents are caused
• Non-linear causes
– Cause and consequence may be disproportionate
– These causes are organizational, not individual
• Deterministic dynamics - either it is a cause or it isn’t
• We can compute both backwards and forwards
– Increasingly difficult with non-linear causes
• This is the Organizational Accident Model
• Probably good enough to catch 80% of the remaining accidents, giving 96% in total
• Probably best suited to GA and professional operations
[Figure: oilfield operations]
Non-linearity
• Linearity: the size of an effect (consequence) is proportional to the input
• Non-linearity is different
– The size of an effect (bad consequences) gets disproportionately bigger (or smaller after a while) as a function of the input
– The improvement in performance gets smaller (almost always) even though the input gets bigger
• Linearity works fine to start with, but only for 80% of the cases
Linear and non-linear functions
[Figure: effect plotted against cause. Panels: Linear; Non-linear ("Suddenly gets a lot worse")]
More non-linear functions
[Figure: effect plotted against cause. Panels: Non-linear ("It can’t get much worse"); Both ("Starts bad, tails off")]
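The four cause-effect shapes can be compared numerically. The formulas below are assumptions chosen only to match the panel descriptions; they were not read off the original slides:

- a straight line for "Linear"
- a cubic for "Suddenly gets a lot worse"
- a saturating exponential for "It can't get much worse"
- a logistic S-curve for "Both"

```python
import math

# Illustrative shapes only: the formulas are assumptions chosen to match the
# panel labels, not curves read off the original slides.


def linear(x: float) -> float:
    return x


def suddenly_gets_worse(x: float) -> float:
    # Accelerating: small effects at first, then rapid growth.
    return x ** 3


def cannot_get_much_worse(x: float) -> float:
    # Saturating: rises quickly, then levels off towards 1.
    return 1 - math.exp(-5 * x)


def s_curve(x: float) -> float:
    # "Both": slow, then steep, then tailing off (logistic curve centred on 0.5).
    return 1 / (1 + math.exp(-12 * (x - 0.5)))


shapes = (linear, suddenly_gets_worse, cannot_get_much_worse, s_curve)
print("cause  " + "  ".join(f.__name__ for f in shapes))
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{x:5.2f}  " + "  ".join(f"{f(x):.3f}" for f in shapes))
```

Doubling the cause from 0.5 to 1.0 doubles the linear effect. The same doubling multiplies the cubic effect by eight and barely changes the saturating one. This is the sense in which linear reasoning stops working.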
Determinism
• A causes B
• If A happens, then B will happen next
Non-determinism
• Move from A causes B to A makes B more likely
• Causation is probabilistic
• Probabilities are distributions, not points
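"A makes B more likely" can be made concrete by comparing two conditional probabilities: P(B | A) and P(B | not A). To show that such an estimate is a distribution rather than a point, this sketch uses a bootstrap, a resampling technique swapped in here for illustration. The records are hypothetical counts, and the function names are illustrative.

```python
import random
from fractions import Fraction
from typing import List, Tuple

Record = Tuple[bool, bool]  # (A present?, B occurred?)


def p_b_given(records: List[Record], a_present: bool) -> Fraction:
    """Point estimate of P(B | A) or P(B | not A) from counts."""
    outcomes = [b for a, b in records if a == a_present]
    if not outcomes:
        raise ValueError("no records for this condition")
    return Fraction(sum(outcomes), len(outcomes))


def bootstrap(records: List[Record], a_present: bool, n: int = 2000, seed: int = 1) -> List[float]:
    """Resample the records to show the estimate as a distribution, not a point."""
    rng = random.Random(seed)
    outcomes = [b for a, b in records if a == a_present]
    return sorted(sum(rng.choices(outcomes, k=len(outcomes))) / len(outcomes) for _ in range(n))


# Hypothetical records: B followed A in 3 of 100 cases, and occurred without A in 1 of 300.
records = ([(True, True)] * 3 + [(True, False)] * 97
           + [(False, True)] * 1 + [(False, False)] * 299)

with_a, without_a = p_b_given(records, True), p_b_given(records, False)
print(f"P(B|A) = {float(with_a):.3f}, P(B|not A) = {float(without_a):.4f}")

samples = bootstrap(records, True)
low, high = samples[int(0.05 * len(samples))], samples[int(0.95 * len(samples))]
print(f"P(B|A) 90% range: {low:.2f} to {high:.2f}")
```

A is not "the" cause of B here. It raises the probability of B, and with so few events the size of that increase is itself uncertain.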
Conditionalize on latest aircraft generation
4th generation aircraft have predominantly weird accidents
Types of accidents
• Theory 1
– Simple models may cover 80% of all accidents
– These are the simple personal accidents
• Theory 2
– The next step gets 80% of the remainder = 96%
– These are the complex personal accidents and some organizational accidents
• Theory 3
– The probabilistic approach may net the next 80% = 99.2%
– These are the complex process accidents
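The 80%, 96% and 99.2% figures all follow from one assumption stated on the slides: each new theory catches 80% of the accidents the previous theories missed. Under that assumption, the share missed after n theories is 0.2^n, so coverage is 1 - 0.2^n. The function name below is hypothetical.

```python
def cumulative_coverage(share_per_step: float, steps: int) -> float:
    """Share of accidents covered if each new theory catches share_per_step of those still missed."""
    return 1 - (1 - share_per_step) ** steps


for theory in (1, 2, 3):
    print(f"Theory {theory}: {cumulative_coverage(0.8, theory):.1%}")
```

Each step adds less in absolute terms: 80, then 16, then 3.2 percentage points. This is why the last, WEIRD accidents are so hard to reach.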
Theory 3 - how accidents are caused
• Non-linear causes
• Non-deterministic dynamics
– Probabilistic rather than specific
– Influences on outcomes by people and the organisation
• Probabilities may be distributions rather than single values
• We cannot compute both backwards and forwards
• The dominant accidents that remain are WEIRD
– WILDLY
– ERRATIC
– INCIDENTS
– RESULTING IN
– DISASTER
• Prior to an event there may be a multitude of possible future outcomes
Unusual or WEIRD Accidents
• In commercial aviation major accidents are now extremely rare
• Simple risk assessment and analysis models often fail to capture how these accidents are caused
• We need to understand our risk space better
• The Rule of Three is an example of how to do this
The Rule of Three
• Accidents have many causes (50+)
• A number of dimensions were marginal
• Marginal conditions score as Orange
• No-Go conditions score as Red
• The Rule of 3 is Three Oranges = Red
Aircraft Operation Dimensions
• Crew factors: Experience, Duty time, CRM
• Aircraft performance: Category, Aids, Fuel, ADDs
• Weather: Cloud base, wind, density altitude, icing
• Airfield: Nav aids, ATC, Dimensions, Topography
• Environment: Night/day, Traffic, En route situation
• Plan: Change, Adequacy, Pressures, Timing
• Platform: Design, Stability, Management
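The Rule of Three combines per-dimension scores across the operation dimensions listed above. Any Red is a No-Go, and three Oranges also count as Red. This sketch scores each dimension as green, orange or red and makes two assumptions of its own:

- one or two Oranges leave the operation marginal (orange)
- only whole Oranges are counted, although the outcome chart is scaled in half-Oranges

The example flight is hypothetical.

```python
from typing import Dict

GREEN, ORANGE, RED = "green", "orange", "red"


def rule_of_three(scores: Dict[str, str]) -> str:
    """Combine per-dimension scores: any red is red, and three oranges = red."""
    unknown = set(scores.values()) - {GREEN, ORANGE, RED}
    if unknown:
        raise ValueError(f"unknown scores: {sorted(unknown)}")
    if RED in scores.values():
        return RED  # a No-Go condition on any dimension
    oranges = sum(1 for s in scores.values() if s == ORANGE)
    if oranges >= 3:
        return RED
    return ORANGE if oranges else GREEN  # assumption: 1-2 oranges means marginal


# Hypothetical flight: long duty day, low cloud base, night approach.
flight = {
    "Crew factors": ORANGE,
    "Aircraft performance": GREEN,
    "Weather": ORANGE,
    "Airfield": GREEN,
    "Environment": ORANGE,
    "Plan": GREEN,
    "Platform": GREEN,
}
print(rule_of_three(flight))
```

No single dimension is a No-Go in this example, yet the combination is. The rule exists to catch exactly this case.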
The Rule of Three
[Figure: outcome plotted against number of Oranges (½ to 3½); outcome labels: No problem, Problem, We fixed it, Big Sky, Crash]
Why does the rule work?
• People use cognitive capacity to allow for increasing risk
• As the oranges increase the remaining available capacity is reduced
• At 3 oranges there is little available capacity remaining
• Any trigger can de-stabilize the system
• An accident suddenly becomes very likely
[Figure: load > strength]
How random numbers combine
[Figure: variation between a normal upper limit and a normal lower limit]
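"Load > strength" can be made concrete with the classic load-strength interference calculation. The sketch makes three assumptions:

- load and strength are independent and normally distributed
- the safety margin (strength minus load) is therefore also normal
- failure is the probability that this margin is negative

The numbers are hypothetical. `statistics.NormalDist` is in the Python standard library (3.8+).

```python
from statistics import NormalDist


def failure_probability(load: NormalDist, strength: NormalDist) -> float:
    """P(load > strength), assuming independent, normally distributed load and strength."""
    # The safety margin (strength - load) is normal with these parameters.
    margin = NormalDist(strength.mean - load.mean,
                        (strength.stdev ** 2 + load.stdev ** 2) ** 0.5)
    return margin.cdf(0.0)  # probability that the margin is negative


# Hypothetical values: same averages, different spreads.
narrow = failure_probability(NormalDist(50, 10), NormalDist(80, 10))
wide = failure_probability(NormalDist(50, 20), NormalDist(80, 20))
print(f"narrow spread: {narrow:.3f}, wide spread: {wide:.3f}")
```

The average margin is 30 in both cases and only the variability differs. Even so, the chance of load exceeding strength rises more than eightfold, because the two random quantities combine in their tails.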
The danger zone/safe zone – safe operating envelope concept
[Figure: a Defined Operational Boundary containing a known danger zone and an unknown danger zone (swiss cheese defect). The normal path runs through the safe field; when it is blocked by an uncommon circumstance, the path can enter the unknown danger zone]
Risk
• Risk is a complex concept
• Classically probability x outcome
• Safety management is about:
– Taking risk – acceptable (ALOS) vs unacceptable
– Running risk – getting away with it
– Can be based on luck or on professionalism
• The granularity of the outcomes and how they can be reached is essential
• Most approaches are crude
– Salami slicing is a way to evade regulation
Risk Space
[Figures: a risk space with high risk areas and low risk/resilient areas; single distributions A, B and C, each with a known danger zone; the combined distribution (A,B,C) with known danger zones and an unexpected danger zone]
Simple view of combined distribution
[Figures: low average risk despite danger zone; medium average risk despite danger zone; high average risk due to sufficient granularity]
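The granularity point can be shown with a one-dimensional slice through a risk space. The slice is mostly low risk but contains a narrow danger zone. Averaging over coarse blocks dilutes the danger zone into a low or medium average, while fine blocks show it at full strength. The risk values and block sizes are hypothetical.

```python
from statistics import mean
from typing import List


def block_averages(risk_map: List[float], block: int) -> List[float]:
    """Average a fine-grained risk profile over consecutive blocks of `block` points."""
    if block <= 0:
        raise ValueError("block size must be positive")
    return [mean(risk_map[i:i + block]) for i in range(0, len(risk_map), block)]


# Hypothetical slice through a risk space: mostly low risk (1.0),
# with a narrow danger zone (100.0) at positions 40-43.
risk_map = [1.0] * 100
for i in range(40, 44):
    risk_map[i] = 100.0

for block in (100, 20, 4):
    worst = max(block_averages(risk_map, block))
    print(f"block size {block:3d}: worst block average = {worst:.1f}")
```

The danger zone is identical in all three views. Only the resolution of the analysis decides whether it shows up as low, medium or high risk.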
Mission Creep and Drift into Danger
• Success with risks makes people willing to accept greater risks– This is a consequence of risk homeostasis
• This can look like complacency, but is a natural consequence of their successes, so far
• Failure to understand the finer detail of the risk space makes this drift into danger more likely
Conclusion
• Conventional risk assessment involves uncovering the potential for bad consequences
• Modern commercial aviation is very safe, so the accidents we wish to avoid may not be caught by standard techniques
• Advanced risk analysis involves increasing our understanding of the risk space we operate in