
Dezfuli.h

Date post: 28-May-2015
Upload: nasapmc
A “Systems/Case-based” Approach to System Safety Presented at NASA Project Management Challenge 2012 February 22-23, 2012 Homayoon Dezfuli, Ph.D. NASA Technical Fellow for System Safety Office of Safety and Mission assurance (OSMA) NASA Headquarters
Transcript
1. A Systems/Case-based Approach to System Safety
  Presented at NASA Project Management Challenge 2012, February 22-23, 2012
  Homayoon Dezfuli, Ph.D., NASA Technical Fellow for System Safety
  Office of Safety and Mission Assurance (OSMA), NASA Headquarters

2. Introduction
  • We have developed a System Safety Framework under which system safety activities are conducted and communicated
  • The three elements of the framework are:
      • Safety objectives
      • System safety activities
      • Risk-Informed Safety Case (RISC)
  • Guidance on the System Safety Framework is contained in the NASA System Safety Handbook
      • Volume 1: System Safety Framework and Concepts for Implementation (NASA/SP-2010-580)
      • Volume 1 will be followed by Volume 2 on methods
  Presented by Homayoon Dezfuli

3. Motivation
  • Development of the System Safety Framework is motivated by a desire to:
      • Foster a systems view of safety (i.e., a holistic, systems engineering view of safety)
      • Improve integration and effectiveness of system safety activities
      • Establish a process for defining adequate safety
      • Establish a means for presenting a coherent case for the safety of the system to decision makers
      • Establish a process that is compatible with the growing trend toward insight/oversight relationships with commercial providers

4. Safety Objectives

5. What is Safety?
  • "Safety is freedom from those conditions that can cause death, injury, occupational illness, damage to or loss of equipment or property, or damage to the environment" (NPR 8715.3)
  • The specific scope of safety is application-specific, and must be clearly defined by the stakeholders in terms of the entities to which it applies and the consequences against which it is assessed
  • The degree of safety that is considered acceptable is also application-specific
  • We strive to attain a degree of safety that fulfills obligations to the at-risk communities and addresses agency priorities
  • We do not expect to attain absolute safety (nor consider it possible to do so)

6. Adequate Safety
  • Achieving an adequately safe system requires adherence to the following fundamental safety principles:
      • The system meets or exceeds a minimum tolerable level of safety.
        Below this level the system is considered unsafe
      • The system is as safe as reasonably practicable (ASARP)
  [Figure: objectives hierarchy. Top objective: achieve an adequately safe system. It divides into (a) achieve a system that meets or exceeds the minimum tolerable level of safety and (b) achieve a system that is as safe as reasonably practicable (ASARP). Under (a): design the system to meet or exceed, build the system to meet or exceed, and operate the system to continuously meet or exceed the minimum tolerable level of safety. Under (b): design the system to be, build the system to be, and operate the system to continuously be as safe as reasonably practicable.]
  • The minimum tolerable level of safety is not necessarily static, and may evolve over the course of the system life cycle
  • The principles of adequate safety must be maintained throughout all phases of the system life cycle

7. NASA Safety Thresholds & Goals
  • NASA's minimum level of tolerable safety for human spaceflight missions is articulated in NASA's agency-level safety goals and thresholds for crew transportation system missions to the ISS
  • They reflect a tolerance for an initial safety performance that is acceptable at first but falls below long-term expectations

8. NASA Safety Thresholds and Goals: Accounting for the Unknowns
  "There are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know."
    Former United States Secretary of Defense Donald Rumsfeld, 2002
  • Meeting quantitative safety requirements means more than simply showing that the known-known risks do not exceed the applicable goal or threshold
  • We must also be able to show that:
      • The known-unknowns (risks that have been identified but are not quantifiable) and the unknown-unknowns (risks that exist but have not been identified) are bounded
      • The bounds of the unknowns do not threaten the quantitative safety requirements
  • Methods for doing this include:
      • Reliability growth analyses of U.S. vehicles and other countries' vehicles
      • Analyses of historical precursors and anomalies

9. As Safe As Reasonably Practicable (ASARP)
  • "ASARP entails weighing the safety performance of a system against the sacrifice needed to further improve it. A system is ASARP if an incremental improvement in safety would require a disproportionate deterioration of system performance in other areas." (From SS Handbook)
  • The ASARP concept is closely related to the "as low as reasonably achievable" (ALARA) and "as low as reasonably practicable" (ALARP) concepts that are found in U.S. nuclear applications and U.K. Health and Safety law
  • ASARP implies that:
      • A comprehensive spectrum of alternative means for achieving operational objectives has been identified
      • The performance of each alternative has been analyzed to determine the relative gains and losses in performance (operational effectiveness, safety, cost, and schedule) that would result from selecting one alternative over another
      • Safety performance is given priority in the selection of an alternative, insofar as the selection is within operational constraints

10. ASARP (Cont.)
  • The ASARP region contains those alternatives whose safety performance is as high as can be achieved without resulting in intolerable performance in one or more of the other mission execution domains
  • ASARP is a region of the trade space and can contain more than one specific alternative
  • The ASARP concept makes no explicit reference to the absolute value of a system's safety performance
  • Improvements to cost, schedule, or technical performance beyond minimum tolerable levels are not justifiable if they come at the expense of safety performance

11. Deriving Operational Safety Objectives
  • The fundamental safety principles set the stage for the further development of safety objectives, negotiated on an application-specific basis
  • Safety objectives are developed using an objectives hierarchy down to a level where they can be clearly addressed by system safety activities, thereby creating a link that:
      • Assures that system safety activities are directed towards accomplishing defined safety objectives
      • Enables the system safety activities to be assessed in terms of the degree to which their target safety objectives have been met
  • The safety objectives at the bottom level of the objectives hierarchy represent the operational definition of safety for the system under consideration, and are referred to as operational safety objectives

12. System Safety Objectives Hierarchy
  [Figure: system safety objectives hierarchy]

13. System Safety Activities

14. System Safety Activities as a Part of the System Safety Framework
  • System safety activities are conducted as part of the overall systems engineering technical process activities
  • System safety activities are designed to promote the development of safe systems and to provide evidence to help demonstrate, through the Risk-Informed Safety Case (discussed later), that the stated system safety objectives have been achieved
  [Figure: System Safety Framework. System Safety Objectives (define safety); System Safety Activities (achieve safety); Risk-Informed Safety Case (demonstrate safety); RISC Evaluation (confirm safety).]

15. System Safety Activities: Early Design
  1. Initial constraints focus on applicable safety requirements, design alternatives, operational constraints, and risk tolerances
  2. The risk-informed decision making (RIDM) process provides models and results to evaluate trade-offs in the search for a final design that is ASARP
  3. ISAs (integration of hazard analysis, physical response analysis, and probabilistic analysis) provide input needed to demonstrate the system meets quantitative safety requirements
  4. Under the ASARP objective, trade studies are performed to examine how variations (e.g., in design) affect not only safety but also the other mission execution domains
  [Figure: early-design activity flow. 1. Set Initial Constraints; 2. Conduct RIDM; 3. ISAs; 4. Trades; 5. Select Design; 6. Initialize; 7. Inform; 8. Allocate.]
  Note: Interfaces are shown in some cases by nesting rather than by arrows. The nesting format automatically implies an arrow from the smaller activities within the nest to the larger activity surrounding it.

16. System Safety Activities: Early Design (Cont.)
  5. The process of down-selecting from the design alternatives to one particular design concept is conducted through a risk-informed deliberation by the decision makers
  6. The initialization role of continuous risk management (CRM) is to complete the risk modeling started during RIDM to include all hazards and associated scenarios that affect the risks
  7. Informed compliance with requirements that have been developed historically and are recognized as best practices in their engineering disciplines tends to provide protection against known unknowns and unknown unknowns
  8. The process for determining lower-level performance requirements involves a risk-informed allocation of requirements from system to sub-system level

17. System Safety Activities: Detailed Design
  1. During detailed design, the role of CRM evolves to include the development and implementation of new controls when needed to counteract any new or changed risks
  2. Program controls and commitments include management activities to promote an environment within which design opportunities for improving safety without incurring unreasonable cost, schedule, and technical impacts are sought out and implemented
  [Figure: detailed-design safety objectives and supporting activities. Top objective: design a safe system during detailed design, divided into (a) design the system to meet or exceed the minimum tolerable level of safety and (b) design the system to be as safe as reasonably practicable. Lower-level objectives: maintain allocation of requirements consistent with achievable safety performance; risk-inform design solution decisions; minimize the introduction of potentially adverse conditions during system design; be responsive to new information during system design; comply with levied requirements that affect safety. System safety activities (within systems engineering): 1. Conduct CRM (analytic deliberative process), and also conduct RIDM if major re-planning is needed; 2. Program control and commitments. RISC evaluation confirms safety.]

18. Risk-Informed Safety Case

19. Risk-Informed Safety Case (RISC)
  • The risk-informed safety case (RISC) is the means by which the satisfaction of the system's safety objectives is demonstrated and communicated to decision makers at major milestones such as Key Decision Points (KDPs)
  • The RISC presents decision makers with a coherent case for safety, rather than presenting them with a set of individual safety analysis and safety management products

20. Risk-Informed Safety Case (RISC) (cont.)
  • A risk-informed safety case (RISC) is a structured argument, supported by a body of evidence, that provides a compelling, comprehensible and valid case that a system is or will be adequately safe for a given application in a given environment.
    This is accomplished by addressing each of the operational safety objectives that have been negotiated for the system, including articulation of the roadmap for the achievement of safety objectives that are applicable to later phases of the system life cycle. (From NASA/SP-2010-580, SS Handbook)
  • The term "risk-informed" is used to emphasize that adequate safety is the result of a deliberative decision making process that involves an assessment of risks, and strives for a proper balance between safety performance and performance in other mission execution domains

21. Risk-Informed Safety Case (RISC) (cont.)
  • The elements of the RISC are:
      • An explicit set of safety claims about the system(s), for example, the probability of an accident or a group of accidents is lower than a specified value and/or as low as reasonably practicable
      • Supporting evidence for the claim, for example, representative operating history, redundancy in design, or results of analysis
      • Structured safety arguments that link claims to evidence and that use logically valid rules of inference
  • RISCs produced by lower-level organizational units (e.g., sub-system-level units) can be used as sub-claims of the RISC at the next higher level of the NASA hierarchy

22. RISC Life Cycle Considerations
  • The RISC addresses the full system life cycle, regardless of the particular point in the life cycle at which the RISC is developed. This results in two types of safety claims:
      • Claims related to the safety objectives of the current or previous phases argue that the objectives have been met
      • Claims related to the safety objectives of future phases argue that necessary planning and preparation have been conducted, and that commitments are in place to satisfy the objectives at the appropriate time

23. Example RISC Safety Claims Derived from Safety Objectives
  • The claims made (and defended) by the RISC dovetail with the safety objectives negotiated at the outset of system formulation
  • RISC Design Claims Derived from Design Objectives:
      • Top-level claim: The system design is adequately safe
          • The system design meets or exceeds the minimum tolerable level of safety
          • The system design is as safe as reasonably practicable (ASARP)
      • Lower-level claims:
          • Appropriate historically-informed defenses against unknown and un-quantified safety hazards have been incorporated into the design
          • Requirements have been allocated consistent with achievable safety performance
          • Design solution decisions have been risk informed

24. Example RISC Structure
  Claim: The system design meets or exceeds the minimum tolerable level of safety
  • An ISA has been properly conducted
      • The design solution has been sufficiently well developed to support the ISA
          Design solution elements: ConOps; DRMs; operating environments; system schematics; design drawings; ...
      • The ISA methods used are appropriate to the level of design solution definition and the decision context
          ISA methods: identify hazards comprehensively; characterize initiating events and system control responses probabilistically; quantify events consistent with physics and available data; ...
      • The ISA analysts are fully qualified to conduct the ISA
  • The ISA shows that the design solution meets the allocated safety goal/threshold requirements
  • Unknown and un-quantified safety hazards do not significantly impact safety performance
      • The design is robust against identified but un-quantified hazards
          The design incorporates: historically-informed margins against comparable stresses; appropriate redundancies; appropriate materials for intended use; appropriate inspection and maintenance accesses; ...
      • The design minimizes the potential for vulnerability to unknown hazards
          The design incorporates: minimal complexity; appropriate TRL items; proven solutions to the extent possible; appropriate inspection and maintenance accesses; ...
      • Adjusted/waived requirements, standards, best practices do not significantly increase vulnerabilities to unknown/unquantified hazards

25. Example RISC Structure (cont.)
  Claim: Design solution decisions are risk informed
  • RIDM has been conducted to select the design that maximizes safety without excessive performance penalties in other mission execution domains
      • Stakeholder objectives are understood and requirements (or imposed constraints) have been allocated from the level above
      • The RIDM methods used are appropriate to the life cycle phase and the decision context
          RIDM methods: identify alternatives; analyze the risks associated with each alternative; support the risk-informed, deliberative selection of a design alternative
      • The RIDM analysts are fully qualified to conduct RIDM
  • The tailored set of requirements, standards, and best practices with which the design complies supports a design solution that is as safe as reasonably practicable
      • The set of applicable requirements, standards, and best practices was comprehensively identified
      • There is an appropriate analytical basis for all adjustments/waivers to requirements, standards, and best practices
          Adjusted/waived requirements, standards, best practices: improve the balance between analyzed performance measures; preserve safety performance as a priority; do not significantly increase vulnerabilities to unknown/unquantified hazards

26. Example RISC Structure (cont.)
  Claim: Appropriate historically-informed defenses against unknown and un-quantified safety hazards are incorporated into the design
  • The design is robust against identified but un-quantified hazards
      The design incorporates: historically-informed margins against comparable stresses; appropriate redundancies; appropriate materials for intended use; appropriate inspection and maintenance accesses; ...
  • The design minimizes the potential for vulnerability to unknown hazards
      The design incorporates: minimal complexity; appropriate TRL items; proven solutions to the extent possible; appropriate inspection and maintenance accesses; ...
  • Adjusted/waived requirements, standards, best practices do not significantly increase vulnerabilities to unknown/unquantified hazards
  Claim: Requirements are allocated consistent with achievable safety performance
  • Allocated requirements are consistent with achievable safety performance
      • Performance requirements are consistent with the performance commitments developed during RIDM
      • Allocated requirements have been negotiated between the requirements owner and the organization responsible for meeting the requirements

27. Independent Evaluation of the RISC
  • It is good practice for an evaluator to have one or more checklists for determining whether the evidence is sufficient to support a claim
  • The checklist should be organized independently from the RISC and should tend to be generically applicable rather than application specific
  EVALUATION BY ANALYSIS TYPE
  Analysis types (columns): Physical Responses; Hazards; Individual Risks; Aggregate Risks; Risk Drivers; Risk Allocations
  Analysis attributes (rows), each with a Grade and a Comment entry per analysis type:
      • Important issues are identified and evaluated
      • Models are graded according to the importance of the issue
      • Tests support models and analysis of important issues
      • Best available models are used for all risk significant issues
      • Etc.
  PROGRAMMATIC CONTROL EVALUATION
  Items, each with a Grade and a Comment entry:
      • Plans related to programmatic controls are comprehensively and clearly documented.
      • Management will actively promote an environment within which design opportunities for improving safety without incurring unreasonable cost, schedule, and technical impacts are sought out and implemented during each phase.
      • Protocols are in place that will promote effective and timely communication among design teams from different organizations working on different parts of the system.
      • Etc.

28. Putting It All Together

29. Challenges Ahead
  • Organizational challenges
      • Integrating system safety personnel/activities more closely with systems engineering, operations management, and risk management
  • Analytical challenges
      • Integrating/refining existing analysis activities to support the development of an integrated safety analysis (ISA)
      • Meaningful accounting for unknown and under-evaluated risks in determining whether safety thresholds and goals have been achieved
  • Procedural and regulatory challenges
      • Development of standards and practices for formulating and evaluating risk-informed safety cases (RISCs)
      • Development of guidelines for excising unnecessary requirements while maintaining safety-beneficial requirements

30. Backup Slides

31. Independent Evaluation of the RISC
  • A flow-down checklist for evaluating the RISC has the advantage of explicitly showing how arguments based on evidence support claims.
  [Figure: flow-down checklist, with items numbered 1.0 through 1.4.4 in the original figure. The recoverable text follows.]
  1.0 TOP-LEVEL CLAIM
  This flow-down checklist examines how safe the system is (or will be),* how well it is demonstrated, and what is being done to make sure that the top-level safety claim is true (or remains true).* This is the technical basis for the claim:
      • Evidence, including operating experience, testing, associated engineering analysis, and a comprehensive, integrated design and safety analysis (IDSA), including scenario modeling using Probabilistic Safety Analysis (PSA)
      • A credible set of performance commitments, deterministic requirements, and implementation measures.
  *The nature and specificity of the claim, and the character of the underlying evidence, depend on the life cycle phase at which the safety case is being applied.
  Side labels: Safety Performance Measures; Safety Performance Requirements (including Goal and Threshold); Engineering Requirements; Process Requirements
  Second-level claims:
      • The design intent is characterized in terms of design reference missions, CONOPS, and deterministic requirements to be satisfied. The design itself is characterized at a level of detail appropriate to the current life cycle phase.
      • The results of analysis have been clearly presented, conditional on an explicitly characterized baseline allocation of levels of performance, risk-informed requirements, and operating experience. An effective process for identifying departures from this baseline and/or addressing future emergent issues that are not addressed by this baseline has been developed.
      • It has been successfully demonstrated that no further improvements to the design or operations are currently net-beneficial (as safe as reasonably practicable).
      • The implementation aspects needed to achieve the level of safety claimed are correctly understood, and the necessary measures have been committed to.
  Lower-level checklist items:
      • The design and mission intent is well characterized.* (Concept of Operation; Design Reference Missions; Operation Environments; Historically Informed Elements)
      • The design for the current life cycle phase (including requirements and controls) is well specified.*
      • The nominal performance and dynamic responses in design reference phases are well understood and justified
      • The performance tailoring and allocation are well understood and justified
      • What is credited is reasonable and justifiable
      • Hazard controls, crew survival methods (if applicable), deterministic requirements, and fault protection approaches have been formulated effectively in a risk-informed manner
      • Analyses performed provide the following results: aggregate risk results; dominant accident scenarios; comparison with threshold/goal; established baseline for precursor analysis
      • An effective process for addressing unresolved and non-quantified safety issues (issues invalidating the baseline case) has been identified
      • A reasonable defense against unknown safety issues is included in the design and controls
      • In addition to reviewing existing information sources and operating experience, the best processes known for identifying previously unrecognized safety hazards have been applied.
      • The limits of the safety models are recognized, the caliber of evidence used in the models has been evaluated, and uncertainty and sensitivity analyses have been performed. (Completeness issue; understanding of key phenomenology and assumptions)
      • An effective process has been carried out to identify significant safety improvements, but no candidate measures have been formulated.
      • It has been demonstrated that further improvements in safety would unacceptably affect schedule
      • It has been demonstrated that further improvements in safety would incur excessive performance penalties
      • It has been demonstrated that further improvements in safety would incur excessive cost
      • It has been confirmed that allocated performance is feasible
      • An effective process has been developed for monitoring and assuring ongoing satisfaction of allocated performance levels, and there are commitments to implement these measures
      • A reasonable attempt has been made to identify and prioritize all significant risks in the risk management program
      • An effective process has been developed for evaluating flight and test experience for the presence of accident precursors
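
The flow-down idea described in this deck (a top-level claim supported by sub-claims, each ultimately resting on evidence) can be illustrated with a small data structure. The sketch below is only an illustration under stated assumptions: the `Claim` class, its field names, the rule that a leaf claim needs at least one evidence item, and the identifiers and evidence labels in the example are all hypothetical and are not taken from the NASA System Safety Handbook or from the original figure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Claim:
    """One node of a hypothetical RISC claim tree (illustrative only)."""
    ident: str
    statement: str
    evidence: List[str] = field(default_factory=list)     # e.g. analysis or test reports
    subclaims: List["Claim"] = field(default_factory=list)

    def is_supported(self) -> bool:
        # Assumed rule: a leaf claim needs at least one item of evidence.
        if not self.subclaims:
            return bool(self.evidence)
        # Assumed rule: a parent claim holds only if every sub-claim holds.
        return all(sub.is_supported() for sub in self.subclaims)

    def unsupported(self) -> List[str]:
        """Return identifiers of leaf claims that still lack evidence."""
        if not self.subclaims:
            return [] if self.evidence else [self.ident]
        gaps: List[str] = []
        for sub in self.subclaims:
            gaps.extend(sub.unsupported())
        return gaps


# Illustrative tree; identifiers and evidence labels are made up for the example.
top = Claim("1.0", "The system is (or will be) adequately safe", subclaims=[
    Claim("1.1", "The design intent is well characterized",
          evidence=["ConOps", "DRMs"]),
    Claim("1.2", "Analysis results are clearly presented", subclaims=[
        Claim("1.2.1", "Aggregate risk is compared with threshold/goal",
              evidence=["ISA report"]),
        Claim("1.2.2", "Unresolved and non-quantified issues are addressed"),
    ]),
])

print(top.is_supported())   # whether every leaf claim has evidence
print(top.unsupported())    # leaf claims an evaluator would flag
```

An evaluator's checklist could walk such a tree independently of how the RISC itself is organized, which matches the deck's recommendation that the checklist be generic rather than application specific. A real evaluation would grade the sufficiency of evidence rather than merely check for its presence.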