National Aeronautics and Space Administration
IMM
Adapting NASA-STD-7009 to Assess the Credibility of Biomedical Models and Simulations
Lealem Mulugeta1, Marlei Walton2, Emily Nelson3 and Jerry Myers3
1. Universities Space Research Association, DSLS 2. Wyle Science, Technology & Engineering Group
3. NASA Glenn Research Center
ASME Verification and Validation Conference May 7-9, 2014 – Las Vegas, NV
Background
• The standard was initially developed for engineering systems M&S
• NASA’s Digital Astronaut Project (DAP) and Integrated Medical Model (IMM) have successfully adapted NASA-STD-7009 for biomedical models for clinical, management, and research applications
• Given the highly comprehensive nature of the standard, substantial steps have been taken to establish a systematic process for applying it to HRP needs
– Systematic analysis of the model application criticality
– Weighting of factors for consistency with the model application
2
M&S Criticality and Risk Assessment
• How are the models and simulations (M&S) going to be used?
– What are the decisions to be made?
– Is it for research or clinical applications?
– Do the M&S provide insight to guide decisions, or are they the decision-making tool?
– Is there substantial data to strengthen confidence in the results?
• What is the impact on human health or the mission?
3
[Figure: M&S criticality matrix – DAP Reduced-g, DAP Micro-g, and IMM; must apply 7009 per HRP-47069]
Main Elements of 7009
1. System & Analysis Frameworks – This is where the evaluator documents details regarding the real world system (RWS) to be represented and includes the basic structure of the M&S, along with the abstractions and assumptions.
2. M&S Analysis Results & Caveats - This is where the evaluator documents details regarding the uncertainty in the M&S results and any further qualifying statements surrounding the analysis.
3. M&S Credibility Assessment – This is where the evaluator documents details regarding the integrity of the data and processes used to develop and vet the M&S.
4
M&S System & Analysis Frameworks (Scope)
5
Legend: (A) Analyst, (D) Developer, (O) Operator
M&S Analysis Results & Caveats
6
Legend: (A) Analyst, (D) Developer, (O) Operator
Credibility Levels of Evidence – Sufficiency Thresholds
Red: IMM thresholds
Blue: DAP Biomechanics model thresholds
7
Technical Review Subfactor Scoring

Technical Review levels (subfactor specific):
4 – Favorable external peer review accompanied by independent factor evaluation
3 – Favorable external peer review
2 – Favorable formal internal review
1 – Favorable informal internal review
0 – Insufficient evidence

Example:
Subfactor                            Weight   Assessed score
Evidence – subfactor scoring level    0.7         2
Review – subfactor scoring level      0.3         3
Factor score: 0.7 × 2 + 0.3 × 3 = 2.3

• 5 factors require technical review: Verification, Validation, Input Pedigree, Results Uncertainty, Results Robustness
• Review (subfactor) scoring listed in the table above
• Weighting between evidence and peer review is customer defined
• Peer review NOT to be weighted more than 30%
8
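The factor-score roll-up in the worked example can be sketched as follows. This is an illustrative Python sketch, not code from the standard; the `factor_score` helper name is my own.

```python
# Sketch of the factor-score roll-up described on the slide: each factor that
# requires technical review combines an evidence subscore and a review
# subscore, with the peer-review weight capped at 30% (customer defined).

def factor_score(evidence_score, review_score, review_weight):
    """Combine evidence and peer-review subscores (0-4 scale) into one
    factor score; review_weight is the fraction assigned to peer review."""
    if not (0.0 <= review_weight <= 0.30):
        raise ValueError("peer review weight must not exceed 30%")
    evidence_weight = 1.0 - review_weight
    return evidence_weight * evidence_score + review_weight * review_score

# The worked example from the slide: evidence level 2 (weight 0.7),
# review level 3 (weight 0.3) -> factor score 2.3
score = factor_score(evidence_score=2, review_score=3, review_weight=0.3)
print(round(score, 2))  # 2.3
```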
Credibility Assessment Matrix: Proposed Weighting Strategy
9
Factor                     Weight (Proposed)
                           Deterministic   Probabilistic
1 Verification                 0.2            0.075
2 Validation                   0.25           0.1
3 Input Pedigree               0.1            0.175
4 Results Uncertainty          0.1            0.2
5 Results Robustness           0.1            0.15
6 Use History                  0.15           0.15
7 M&S Management               0.05           0.05
8 People Qualifications        0.05           0.1
TOTAL                          1.0            1.0
0.05 ≤ Wi ≤ 0.25; ΣWi = 1
Based on Application by DAP and IMM for HRP
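The weighting constraints (each factor weight between 0.05 and 0.25, weights summing to 1) can be checked mechanically. A minimal sketch, assuming the probabilistic default weights from the backup slides; the `validate_weights` helper is my own, not part of the standard:

```python
# Illustrative check that a proposed weighting scheme satisfies the stated
# constraints: 0.05 <= w_i <= 0.25 for every factor, and sum(w_i) == 1.

import math

def validate_weights(weights):
    """weights: dict mapping credibility factor name -> weight."""
    for factor, w in weights.items():
        if not (0.05 <= w <= 0.25):
            raise ValueError(f"{factor}: weight {w} outside [0.05, 0.25]")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1.0")
    return True

# Proposed probabilistic default weighting from the deck
probabilistic = {
    "Verification": 0.075, "Validation": 0.1, "Input Pedigree": 0.175,
    "Results Uncertainty": 0.2, "Results Robustness": 0.15,
    "Use History": 0.15, "M&S Management": 0.05, "People Qualifications": 0.1,
}
print(validate_weights(probabilistic))  # True
```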
Example of Credibility Scoring – With Factor Weighting
10
*Threshold: The required score agreed to by the end-user/customer and M&S provider to achieve sufficient confidence in the M&S for intended use
Unweighted – Model would have a CS = 0
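The weighted-versus-unweighted contrast above can be sketched as follows. The assumption that the unweighted credibility score rolls up as the minimum factor score is mine (it is consistent with a single zero factor driving CS to 0 on the slide), and all factor scores below are hypothetical:

```python
# Sketch of the two roll-ups contrasted on the slide. Assumption (mine, not
# stated in the deck): the unweighted credibility score is the minimum factor
# score, so one zero factor forces CS = 0, while the weighted score sums the
# factor scores by the agreed weights.

def weighted_cs(scores, weights):
    return sum(weights[f] * scores[f] for f in scores)

def unweighted_cs(scores):
    return min(scores.values())

# Hypothetical factor scores for illustration only
scores = {
    "Verification": 3, "Validation": 2, "Input Pedigree": 2,
    "Results Uncertainty": 0, "Results Robustness": 2,
    "Use History": 3, "M&S Management": 4, "People Qualifications": 4,
}
# Deterministic default weights from the deck
deterministic = {
    "Verification": 0.2, "Validation": 0.25, "Input Pedigree": 0.1,
    "Results Uncertainty": 0.1, "Results Robustness": 0.1,
    "Use History": 0.15, "M&S Management": 0.05, "People Qualifications": 0.05,
}
print(unweighted_cs(scores))                          # 0
print(round(weighted_cs(scores, deterministic), 2))   # 2.35
```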
Lessons Learned and Takeaways
• The sooner M&S credibility assessment is integrated into the M&S development and implementation process, the more likely:
– Researchers and decision makers gain confidence in the M&S
– The M&S can have a positive impact on biomedical research and operations
– The greater medical community will see the potential of M&S to inform clinical interventions
• The sooner the end-user/customer is engaged to inform the M&S development and implementation process, the more likely the end product will have a high impact
• It is important to appropriately weight the different credibility assessment factors for the problem of interest
– M&S should be applied within their validation domain to maintain the highest confidence in results
• The greater medical community recognizes the importance of rigorously vetting computational models and looks to NASA for leadership
11
Getting It Right: Better Validation Key to Progress in Biomedical Computing - Bringing models closer to reality - 10/19/12
Thank you! Questions?
Backup slides
13
Verification and Validation
• Verification is the process of determining if the model implementation accurately represents the developer’s conceptual/mathematical description (underlying physical principles) and its solution.
• Validation is the process of determining the degree to which a model is an accurate representation of the real-world system from the perspective of the intended uses of the model (e.g., comparing simulated exercise outputs with measurements from subjects performing the same exercise).
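As a toy illustration of the comparison the validation definition describes (all numbers are made up, and the choice of relative error as the metric is mine):

```python
# Toy example of the kind of comparison validation calls for: quantify
# agreement between a simulated output and a measurement from subjects
# performing the same exercise. Values are hypothetical.

def relative_error(simulated, measured):
    return abs(simulated - measured) / abs(measured)

# Hypothetical peak ground-reaction forces (N) for one exercise repetition
simulated_peak_force = 1820.0
measured_peak_force = 1750.0

err = relative_error(simulated_peak_force, measured_peak_force)
print(f"{err:.1%}")  # 4.0%
```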
14
What is NASA-STD-7009?
Comprehensive set of requirements and processes for developing and applying models and simulations (M&S)
• Credibility assessment ensures that the application domains of the M&S are appropriate
• Provides a foundation for deriving the confidence level for any given M&S
• Documentation is critical for appropriate interpretation of the M&S results by the end-user
15
NASA-STD-7009: Standard for Models and Simulations (7009)
16
https://standards.nasa.gov/documents/detail/3315599
M&S Implementation Key Personnel
• Operators (O) – Execute the model to perform a simulation; generally the least technical role, but the most familiar with using the model.
• Analysts (A) – Usually define the initial conditions and boundaries of a simulation and review its results. Above all, analysts are responsible for the credibility/validation of the simulations (not the model).
• Developers (D) – Develop the fundamental principles and mathematical abstractions of the model. They can (and should) play a role in the other two areas; however, their responsibility is the scientific/technical application of various principles to provide a means of creating relevant simulations. They are responsible for the credibility and validation of the model.
17
Credibility Levels of Evidence
18
Credibility Levels of Evidence - Thresholds
19
[Figure: radar plot of sufficiency thresholds (scale 0–4) across the eight credibility factors – Verification, Validation, Input Pedigree, Results Uncertainty, Results Robustness, Use History, M&S Management, People Qualifications]
DAP Biomechanics model thresholds
IMM Thresholds
Weighting of Credibility Assessment Score – Deterministic M&S
20
Deterministic Models and Simulations – Weights and Explanations for default values

Verification (0.2) – The complexity of such models requires that verification of the implementation of the underlying concept be of relatively high importance to the model.
Validation (0.25) – Due to the use of the model, achieving the customer's desired level of validation, quantified by direct comparison to the real-world system, is considered imperative and must be assigned the highest weighting possible.
Input Pedigree (0.1) – Although important, the IP is assumed to be at the highest level possible due to limited HRP data set availability. Weighting should reflect this situational condition.
Results Uncertainty (0.1) – From an HRP point of view, RU is more critical in understanding the limits of the validation activity. Weighting should reflect the partial capture of this parameter under the validation condition.
Results Robustness (0.1) – Sensitivity of the model to parameter variation is partially captured in the validation parameter. Weighting should reflect the importance of understanding model performance outside the known operational space.
Use History (0.15) – Under HRP, successful use of the model for decision or research support in respected works is considered important and is thus weighted third in the overall weighting strategy.
M&S Management (0.05) – Management is relatively equal during the model development activities due to program and project oversight and required processes. Weighting reflects these in-place conditions.
People's Qualification (0.05) – Although critical in general, the use of competitive peer review of proposed work is considered to recruit qualified people specific to the model application. Weighting reflects this built-in quality control.

Total: 1.0 (NOTE: The sum of the weightings must equal 1.0)
Developed by DAP and IMM for HRP
The customer/end-user may use a different weighting scheme, but the minimum weight that can be assigned to any factor is 0.05 and the maximum is 0.25.
Weighting of Credibility Assessment Score – Probabilistic M&S
21
Probabilistic Models and Simulations – Weights and Justifications/Explanations for default values

Verification (0.075) – The mathematics of such models is considered relatively straightforward. Verification remains important; however, the implementation of the underlying model is not considered complex, so less weight should be placed on its contribution to credibility.
Validation (0.1) – Achieving the customer's desired level of validation remains important, although quantified direct comparison to the real-world system is difficult. It should contribute significantly to the overall credibility when performing the validation is possible.
Input Pedigree (0.175) – The second most critical factor in defining the likelihood and consequence. The assumption is that IP must be at the highest level possible due to limited HRP data set availability. Weighting should reflect this important situational condition.
Results Uncertainty (0.2) – From an HRP point of view, RU is the most critical factor in capturing the knowledge regarding the likelihood and consequence. Weighting should reflect the importance of this parameter under the validation condition.
Results Robustness (0.15) – Sensitivity of the model to parameter variation is critical in understanding the importance of the contributing parameters in the underlying logic. Weighting should reflect this importance.
Use History (0.15) – Under HRP, successful use of the model for decision or research support in respected works is considered important and is thus weighted highly in contributing to credibility.
M&S Management (0.05) – Management is relatively equal during the model development activities due to program and project oversight and required processes. Weighting reflects these in-place conditions.
People's Qualification (0.1) – Although critical in general, the use of competitive peer review of proposed work is considered to recruit qualified people specific to the model application. Weighting reflects this built-in quality control.

Total: 1.0 (NOTE: The sum of the weightings must equal 1.0)
Developed by DAP and IMM for HRP
The customer/end-user may use a different weighting scheme, but the minimum weight that can be assigned to any factor is 0.05 and the maximum is 0.25.
Key Steps to Credibility Assessment
1. Sufficiency threshold levels need to be established for each credibility factor (highly dependent on the available data and expertise)
2. The target user community or customer should be consulted in setting the minimum thresholds
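The threshold check these steps set up can be sketched as follows; the factor scores and thresholds below are hypothetical, and the `shortfalls` helper is my own illustration:

```python
# Sketch of the sufficiency-threshold check: compare each factor's assessed
# score against the threshold agreed with the customer/end-user, and report
# any factors that fall short. Values are hypothetical.

def shortfalls(assessed, thresholds):
    """Return {factor: (assessed, threshold)} for factors below threshold."""
    return {f: (assessed[f], thresholds[f])
            for f in thresholds if assessed[f] < thresholds[f]}

thresholds = {"Verification": 2, "Validation": 3, "Input Pedigree": 2}
assessed = {"Verification": 3, "Validation": 2, "Input Pedigree": 2}

gaps = shortfalls(assessed, thresholds)
print(gaps)  # {'Validation': (2, 3)}
```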
22
Example: DAP’s M&S Development and Implementation Strategy
23
ARED M&S have had an impact on exercise research and operations sooner than anticipated and continue to provide high value
M&S Validation and Application Domain
24
M&S Credibility Assessment (1 of 2)
25
Sufficiency threshold = target score
M&S Credibility Assessment (2 of 2)
26
Sufficiency threshold = target score
Visual Representation of Credibility Assessment
27
Sufficiency threshold
Score
Spider (Radar) Plot
Impact in the Medical/Healthcare Field (3 of 3)
28
Getting It Right: Better Validation Key to Progress in Biomedical Computing - Bringing models closer to reality
The ground laid by DAP and IMM was featured in the 2012 fall issue (10/19/12) of the Biomedical Computation Review magazine and lauded as a “Comprehensive Validation” method.
http://biomedicalcomputationreview.org/content/getting-it-right-better-validation-key-progress-biomedical-computing
Example of Credibility Scoring – Without Factor Weighting
29
*Threshold: The required score agreed to by the end-user/customer and M&S provider to achieve sufficient confidence in the M&S for intended use
Impact in the Medical/Healthcare Field (2 of 3)
• As a direct consequence of a presentation given to NIH/IMAG regarding how NASA uses 7009 to vet biomedical models, the Food and Drug Administration is heavily leveraging 7009 to develop a new standard for “Verification and Validation of Computational Modeling of Medical Devices”
• The FDA regularly consults with IMM and DAP in the development of this new standard
• DAP Project Scientist has been invited to be a member of the ASME V&V40 Sub-committee that is working with the FDA to develop the standard for “Verification and Validation of Computational Modeling of Medical Devices”
30
Weighting of Credibility Assessment Score – Overview
31
Factor                     Deterministic   Probabilistic   Uniform
Verification                   0.2            0.075          0.125
Validation                     0.25           0.1            0.125
Input Pedigree                 0.1            0.175          0.125
Results Uncertainty            0.1            0.2            0.125
Results Robustness             0.1            0.15           0.125
Use History                    0.15           0.15           0.125
M&S Management                 0.05           0.05           0.125
People's Qualification         0.05           0.1            0.125
Total:                         1.0            1.0            1.0
The customer/end-user may use a different weighting scheme, but the minimum weight that can be assigned to any factor is 0.05 and the maximum is 0.25.
Developed by DAP and IMM for HRP