SSRR 2018 November 8, 2018
Formal Methods in Resilient Systems Design
using a Flexible Contract Approach
Sponsor: DASD(SE)
By Dr. Azad M. Madni
10th Annual SERC Sponsor Research Review
November 8, 2018
FHI 360 CONFERENCE CENTER
1825 Connecticut Avenue NW, 8th Floor
Washington, DC 20009
www.sercuarc.org
For Official Use Only
Outline
■ Research Objectives
■ Engineered Resilience
■ Innovative Approach
■ Accomplishments
■ Exemplar Real-World Problem
■ Prototype Implementation
■ Initial Findings and Lessons Learned
■ Way Ahead
Team
■ Prof. Azad Madni, Principal Investigator
■ Prof. Dan Erwin, Co-Investigator
■ Dr. Ayesha Madni, Project Manager
■ Edwin Ordoukhanian, Research Assistant
■ Parisa Pouya, Research Assistant
Research Overview
■ Objective
➢ develop a formal modeling approach for designing resilient systems
■ Approach
➢ based on Resilience Contract (RC), a formal, probabilistic construct
➢ RC = Traditional Contract + flexible assumptions + Partially
Observable Markov Decision Process + in-use learning
■ Application
➢ planning and decision making in multi-UAV swarm and spacecraft
swarm
➢ problem of interest to both DOD and civilian sector
Problem
■ Systems and networks in the 21st century are required to be resilient in the face of uncertainty and systemic and external disruptions
■ Predictability, flexibility and adaptability are essential for verifiable, resilient behavior of systems and system-of-systems networks
■ For predictable system operation, system (model) has to be verifiable in terms of both static properties and dynamic behavior
■ For flexibility, model needs to be modifiable by an external agent
■ For adaptability, system (model) needs to have the ability to self-adjust (i.e., self-restructure, self-reorganize, self-reconstitute)
■ These requirements lead to the need for formal and probabilistic modeling to address tradeoffs between system (model) verifiability, flexibility and adaptability
■ This recognition provided the impetus for our research
Engineered Resilience: A Messy Problem
■ Engineered resilience is messy
➢ requirements can be imprecise (especially initially)
➢ actions can be unclear (especially initially)
➢ system states can be ambiguous (partial observability)
➢ incompatible with invariant methods
■ Want a formal and flexible modeling methodology
➢ provides value even with partial information
➢ model has ability to incrementally learn from new evidence
➢ facilitates key tradeoff: flexibility (resilience) vs. formality (V&V)
System Resilience: Multiple Interpretations
■ Recoverability: Ability of system to rebound and return to
equilibrium at original, slightly degraded, or better state
■ Robustness: Ability of system to absorb a disturbance within
design envelope without any structural change
■ Extensibility: Ability of system to extend gracefully (i.e., add
capacity/resources) to fulfill increase in demand
■ Adaptability: Ability of system to monitor and adjust continually
through restructuring or reconfiguration to counter disruptions
Resilient DoD Systems / SoS: Requirements
■ Operate safely in dynamic, uncertain environments
■ Tolerate / survive systemic faults and failures
■ Accomplish goals (e.g., navigate safely) even with
incomplete information (e.g., partial observability)
■ Adjust / adapt to environment disruptions
■ Protect against physical and cyber threats
■ Reconfigure / restructure to minimize impact of disruptions
(e.g., security breaches, loss of sensing node or comm link)
Innovative Approach
■ Combines formal and probabilistic modeling with heuristics
within a two-level architectural framework spanning planning
and decision making (top level) and control (bottom level)
■ Architecture characteristics
➢ Layered: comprising planning and decision making layer and
control layer
➢ Decisions and information flow from planning and decision
making layer to control layer
➢ Execution constraints flow from control layer to planning and
decision making layer
➢ In case of conflicts, satisfaction of global objectives has
precedence over satisfaction of local AV goals
■ Employs Resilience Contract, a new construct, to perform
trade-off between system verifiability and system flexibility
Accomplishments
■ System Modeling
➢ probabilistic modeling (POMDP)
➢ traditional control
➢ POMDP: planning and decision making (higher level)
➢ traditional control: execution (lower level)
➢ simple scenarios require only control layer
➢ complex scenarios (i.e., multiple choices, uncertainty) require both
■ Experimentation Testbed
➢ hardware-software integration
➢ skeletal Digital Twin
➢ system behavior exploration
■ Smart Dashboard
➢ monitoring individual UAVs and UAV-network
➢ intervening to redirect assets
➢ maintaining audit trail of decisions and behaviors
System Modeling
Modeling Challenges
■ Support verification and testing of system model and SoS
network in uncertain environments prone to disruptions
■ Accommodate changes in structure and behavior to cope
with disruptions
■ Support bi-directional reasoning (evidence, model)
■ Be scalable and extensible (no. of agents, interconnections)
■ Provide value even with partial information (not “data
hungry”)
■ Learn incrementally from new evidence (reinforcement
learning)
Resilience Contract
■ Combines traditional contract, probabilistic contract
(implemented as POMDP), and heuristics to achieve desired
level of verifiability, flexibility, and scalability
➢ Traditional contract (assert-guarantee): correctness, verifiability
➢ Probabilistic contract (POMDP; belief-reward): flexibility,
adaptability
➢ Heuristics: complexity reduction, contain combinatorial explosion
Resilience Contract: Key Characteristics
■ Extension of traditional (“assert-guarantee”) contract
➢ relaxes assertions in traditional contract to “belief-reward” (flexibility)
➢ Partially Observable Markov Decision Process (uncertainty handling)
➢ in-use reinforcement learning (hidden states, transitions)
➢ heuristics/pattern recognition (complexity reduction)
■ Exhibits desired characteristics
➢ Verifiable models: key to safety and security
➢ Flexible models: key to adaptability and resilience
➢ Learning models: key to performance improvement
■ Exemplar Applications
➢ multi-UAV swarm planning & decision making; self-driving car network
Resilience Contract (RC)
[Figure: Resilience Contract diagram with Policy and Execution blocks]
■ RC evaluates POMDP reward; typical responses:
➢ Keep going
➢ Stop
➢ Enforce trajectory to a safe state
➢ Notify support team
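One way to read the reward evaluation on this slide is as an expected-reward comparison across the four typical responses. The sketch below is illustrative only: the two-state belief (`nominal`, `hazard`), the reward table, and the function names are all assumptions, not values or code from the project.

```python
from typing import Dict

# Hypothetical reward table REWARDS[action][state]; values are illustrative.
REWARDS: Dict[str, Dict[str, float]] = {
    "keep_going":          {"nominal": 10.0, "hazard": -100.0},
    "stop":                {"nominal": -5.0, "hazard": -10.0},
    "enforce_safe_state":  {"nominal": -8.0, "hazard": 5.0},
    "notify_support_team": {"nominal": -2.0, "hazard": -20.0},
}


def expected_reward(action: str, belief: Dict[str, float]) -> float:
    """Expected reward of an action, weighted by the belief over states."""
    return sum(belief[state] * REWARDS[action][state] for state in belief)


def select_response(belief: Dict[str, float]) -> str:
    """Pick the response with the highest expected reward under the belief."""
    return max(REWARDS, key=lambda action: expected_reward(action, belief))


confident_nominal = {"nominal": 0.95, "hazard": 0.05}
likely_hazard = {"nominal": 0.20, "hazard": 0.80}
print(select_response(confident_nominal))
print(select_response(likely_hazard))
```

With these assumed rewards, a belief concentrated on the nominal state favors continuing, while a belief concentrated on the hazard state favors steering to a safe state. The response shifts with the belief rather than with a hard assertion.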
Experimentation Testbed
Experimentation Testbed: Layered Architecture
■ Data Sources: sensors, physical vehicle, Digital Twin, environment
■ System Control: traditional controller
■ Planning and Decision-Making: probabilistic (POMDP), deterministic
■ Experimenter Interface: initial conditions, what-if injects, dashboard
■ Experimenter
Testbed Rationale
■ Concept Exploration and Model Verification
➢ critical for exploring different CONOPS and verifying system
models through static correctness analysis and simulation-
based evaluation in a controlled experimentation environment
■ Digital Twin Development and Management
➢ ideal environment to develop and maintain Digital Twin
➢ produce significant time and cost savings
■ Demonstration Platform
➢ ideal for incrementally developing and demonstrating system
and SoS capabilities
Hardware Used in Experimentation Testbed
Potential Testbed Capabilities
■ Apply theoretical concepts in an instrumented and controlled
physical world setting before planning next steps
■ Allow “test-drive” of various AV resilience CONOPS
■ Verify realizability of models and software algorithms in
integrated hardware-software configurations
■ Enable controlled experimentation with formal models in a safe
environment
■ Enable deeper understanding of the realities of state-space
modeling, self-learning, and adaptive control options
Smart Dashboard
Dashboard
■ Exploring different modeling, verification, simulation, and
visualization options in a controlled environment
■ Comprises:
➢ user-selectable situation displays (plan view; UAV views)
➢ health status displays (UAV resource availability/use, UAVs)
➢ dynamic context management
➢ scenario monitoring and visualization
➢ user intervention during execution (redirect asset/adapt plan)
➢ data collection and analytics (e.g., miles flown without
incident, inject handling effectiveness)
Multi-UAV Monitoring and Control Dashboard
■ Demonstration
➢ customizable dashboard for monitoring and control of
simulated/physical vehicles
■ Underlying technologies
➢ dronekit platform with visualization facilities
➢ dronekit is a software platform that allows commands to be issued
to both flying hardware and simulation model
➢ quadcopters (hardware) and quadcopter simulation models
■ Key capabilities
➢ simulated vehicles exhibit behavior of physical vehicle
(quadcopters)
➢ same commands used to control simulated and physical vehicles
(quadcopters)
➢ can easily replace simulated vehicles with physical vehicles
Illustrative Example
Illustrative Example: Multi-UAV Operations
■ Multiple UAVs tasked to jointly conduct operational mission
➢ e.g., search and rescue, NEO, HADR
■ The environment can be uncertain and potentially deceptive
➢ e.g., partial observability, noisy sensors, hostile actors
■ UAV can experience malfunctions
➢ loss of communication or sensing
➢ subsystem fault/failure
■ UAV swarm can experience disruptions
➢ loss of UAV, loss of communication within swarm
■ UAV swarm needs to be able to complete mission safely
with original/descoped objectives
Operational Use Cases
■ QC swarm operational use cases
➢ numerous; multiple variations in state combinations
■ Fall into discrete stages (context)
➢ e.g., deployment, en route, action on objectives, redeployment
■ Influencing factors
➢ METT-TC: Mission, Enemy, Terrain and weather, Troops and
support available, Time available, and Civil Considerations
Quadcopter Testbed
■ Each Quadcopter
➢ is driven by Raspberry Pi and Navio Flight Controller
➢ has full IMU: 3-axis accelerometers, rate gyros, magnetometer
➢ takes inputs from laptop and/or remote controller
o control values (throttle, roll-pitch-yaw)
o able to perform autonomous flight
■ Testbed Capabilities
➢ can run customized Python scripts and control the vehicles
o using dronekit framework and commands
➢ perform semi-autonomous flights
o able to launch, take off, hover, and perform limited waypoint navigation
➢ dashboard to monitor vehicle status and control vehicle’s position
o ability to communicate with simulated vehicles as well as hardware
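A script of the kind this slide describes might look like the following sketch. It uses calls from DroneKit-Python's published API (`connect`, `VehicleMode`, `simple_takeoff`). The function names, connection string, target altitude, and the 95% altitude threshold are assumptions chosen for illustration and are not taken from the project's scripts.

```python
import time


def altitude_reached(current_alt_m: float, target_alt_m: float,
                     fraction: float = 0.95) -> bool:
    """True once the vehicle is within `fraction` of the target altitude."""
    return current_alt_m >= target_alt_m * fraction


def launch_and_hover(connection_string: str, target_alt_m: float,
                     hover_s: float = 5.0) -> None:
    """Arm, take off to target_alt_m, hover briefly, then land."""
    # Imported here so altitude_reached() is usable without dronekit installed.
    from dronekit import connect, VehicleMode

    vehicle = connect(connection_string, wait_ready=True)
    try:
        while not vehicle.is_armable:        # wait for pre-arm checks
            time.sleep(1)
        vehicle.mode = VehicleMode("GUIDED")
        vehicle.armed = True
        while not vehicle.armed:             # wait for arming to complete
            time.sleep(1)
        vehicle.simple_takeoff(target_alt_m)
        while not altitude_reached(
                vehicle.location.global_relative_frame.alt, target_alt_m):
            time.sleep(1)
        time.sleep(hover_s)                  # hover in place
        vehicle.mode = VehicleMode("LAND")
    finally:
        vehicle.close()
```

Under these assumptions, `launch_and_hover("127.0.0.1:14550", 10.0)` would target a simulated vehicle at a hypothetical local address. Passing the hardware's connection string instead would issue the same commands to a physical quadcopter.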
Exemplar Swarm Control Architecture
[Figure: block diagram with an Agent (State Estimator, Belief MDP Model, Policy) connected to the UAV Swarm, Environment, and Sensors through Actions, Observations, and Belief Estimates]
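Update Iterations
[Figure: belief update over two states, S1 (threat on left) and S2 (threat on right). From the initial belief vector b0, the action "observe" returns the observation o = "threat on left", giving the updated belief vector b1.]
$b_1(s_i) = \frac{P(o \mid s_i, a) \sum_{s_j \in S} P(s_i \mid s_j, a)\, b_0(s_j)}{P(o \mid a, b_0)}$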
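The belief update on this slide can be worked through numerically for the two-state example. The sketch below assumes that observing does not move the threat (identity transition) and that the sensor reports the correct side 85% of the time. Those probabilities, the uniform initial belief, and all names in the code are illustrative assumptions.

```python
from typing import Dict

STATES = ("threat_left", "threat_right")

# Assumed models for the "observe" action (illustrative values only).
# TRANSITION[s_j][s_i] = P(s_i | s_j, a): observing does not move the threat.
TRANSITION = {
    "threat_left":  {"threat_left": 1.0, "threat_right": 0.0},
    "threat_right": {"threat_left": 0.0, "threat_right": 1.0},
}
# OBSERVATION[s_i][o] = P(o | s_i, a): sensor is correct 85% of the time.
OBSERVATION = {
    "threat_left":  {"threat_left": 0.85, "threat_right": 0.15},
    "threat_right": {"threat_left": 0.15, "threat_right": 0.85},
}


def update_belief(b0: Dict[str, float], obs: str) -> Dict[str, float]:
    """Return b1 given prior belief b0 and the observation obs."""
    unnormalized = {}
    for s_i in STATES:
        # Prediction step: sum over s_j of P(s_i | s_j, a) * b0(s_j)
        predicted = sum(TRANSITION[s_j][s_i] * b0[s_j] for s_j in STATES)
        # Correction step: weight by P(o | s_i, a)
        unnormalized[s_i] = OBSERVATION[s_i][obs] * predicted
    evidence = sum(unnormalized.values())  # normalizer P(o | a, b0)
    if evidence == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s: p / evidence for s, p in unnormalized.items()}


b0 = {"threat_left": 0.5, "threat_right": 0.5}
b1 = update_belief(b0, "threat_left")
b2 = update_belief(b1, "threat_left")
print(b1)
print(b2)
```

Starting from a uniform belief, one "threat on left" observation moves the belief in S1 to 0.85 under these assumed sensor probabilities. A second matching observation raises it further, which is the iterative sharpening the slide title refers to.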
Dashboard Showing Camera View of Flying Quadcopter
Dashboard Showing 3 Flying QCs With One Low on Battery and Landing
Initial Findings
■ When implementing hybrid model, need to resolve mismatch
between planning and decision-making and vehicle control layers
■ Mismatch can be resolved by
➢ ensuring that propagated commands from planning and decision
making layer to controller do not violate physical/regulatory constraints
➢ propagating execution constraints from control layer to planning layer
for planning layer to take into account when issuing commands
➢ incorporating heuristics (e.g., priorities, region of influence) to resolve
conflicts and simplify computation
■ POMDP and vehicle controller work on different time scales
➢ dynamics model runs every 0.01 seconds (accuracy)
➢ POMDP runs slower (high level decisions/commands)
o basically a waypoint navigation problem
o ideal sampling period for POMDP determined experimentally
o overall response time to action needs to be minimized
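The two time scales can be illustrated with a small simulation in which the controller steps every 0.01 s and a stand-in planner issues a new waypoint at a slower, fixed period. The 0.5 s planner period, the speed limit standing in for an execution constraint, the waypoint spacing, and the one-dimensional dynamics are all assumptions made for illustration.

```python
CONTROL_DT_S = 0.01          # control/dynamics step, as stated on the slide
PLANNER_PERIOD_TICKS = 50    # assumed: planner runs every 0.5 s
MAX_SPEED_M_S = 2.0          # assumed execution constraint from control layer
WAYPOINT_SPACING_M = 1.0     # assumed spacing between successive waypoints


def plan_waypoint(position_m: float, goal_m: float) -> float:
    """Stand-in for the planner: says where to go next, not how."""
    if goal_m >= position_m:
        return min(position_m + WAYPOINT_SPACING_M, goal_m)
    return max(position_m - WAYPOINT_SPACING_M, goal_m)


def control_step(position_m: float, waypoint_m: float) -> float:
    """Stand-in controller: move toward the waypoint within the speed limit."""
    max_move = MAX_SPEED_M_S * CONTROL_DT_S
    error = waypoint_m - position_m
    return position_m + max(-max_move, min(max_move, error))


def simulate(duration_ticks: int, goal_m: float):
    """Run the fast control loop, invoking the planner on its slower period."""
    position = 0.0
    waypoint = 0.0
    planner_calls = 0
    for tick in range(duration_ticks):
        if tick % PLANNER_PERIOD_TICKS == 0:   # slow loop: high-level command
            waypoint = plan_waypoint(position, goal_m)
            planner_calls += 1
        position = control_step(position, waypoint)  # fast loop: execution
    return position, planner_calls


position, planner_calls = simulate(duration_ticks=100, goal_m=5.0)
print(position, planner_calls)
```

Counting in integer ticks rather than accumulating floating-point time keeps the two rates aligned. Changing `PLANNER_PERIOD_TICKS` is one way to explore the planner sampling period experimentally, as the slide suggests.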
Initial Findings (cont’d)
■ Concurrent creation of testbed facilitates experimentation and
data collection
➢ currently able to switch between simulation model and physical
system
➢ In the future, will be able to take operational data including wear
and tear from physical system to incorporate into virtual model –
this will be a step in developing a Digital Twin
■ Introducing a monitoring and execution dashboard facilitated
both understanding and debugging of vehicle behaviors
POMDP Modeling: Lessons Learned (General)
■ POMDP is equivalent to a rule-based system for simple scenarios
■ POMDP needs to work with traditional controller to transform execution
decisions into execution actions
■ POMDP states should be defined and created based on various
conditions that system/SoS can potentially experience
■ Ability to acquire new knowledge through reinforcement learning makes
POMDP attractive for complex scenarios with partial observability
■ Value function and time horizon are important parameters that shape
system and SoS network behaviors
■ The best action (the action with the highest reward) for a particular
state is also an escape route for that state
➢ e.g., an action that would bring the vehicle into a safe state
POMDP Modeling: Lessons Learned (Specific)
■ POMDP generates only high-level commands; it does not specify command
details
➢ POMDP identifies “What?” and “When?”, not “How?” or “How much?”
■ Translation of POMDP commands into executable actions is performed by
the controller
➢ if POMDP’s one-time command is sufficient to cause a transition in the vehicle’s
state, the POMDP will receive a different observation (with high probability)
that identifies the new state of POMDP
➢ otherwise, POMDP gives the same command to the controller until it observes a
transition in the vehicle’s actual state
■ Rewards (and penalties) for performing actions in different states can be
assigned based on real-world experience
➢ These values can be adjusted later through reinforcement learning
➢ Rewards are assigned to actions that we believe contribute most to goal
achievement, or are “safest” to take if headed toward an unsafe state
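The repeat-until-transition behavior described on this slide can be sketched as a loop that re-issues the same high-level command until the belief indicates the target state. Everything here is hypothetical: the function names, the toy vehicle that needs three commands to start hovering, and the belief values are illustrative stand-ins for the real controller and observation model.

```python
from typing import Callable, Dict, List, Tuple

Belief = Dict[str, float]


def most_likely_state(belief: Belief) -> str:
    """State with the highest probability under the belief."""
    return max(belief, key=belief.get)


def issue_until_transition(
    command: str,
    target_state: str,
    belief: Belief,
    observe_and_update: Callable[[Belief, str], Belief],
    send_to_controller: Callable[[str], None],
    max_steps: int = 20,
) -> Tuple[int, Belief]:
    """Re-issue `command` until the belief points at `target_state`."""
    for step in range(1, max_steps + 1):
        send_to_controller(command)            # controller decides the "how"
        belief = observe_and_update(belief, command)
        if most_likely_state(belief) == target_state:
            return step, belief
    raise RuntimeError("no transition observed within max_steps")


# Toy stand-ins: the vehicle starts hovering after three commands.
sent: List[str] = []


def send_to_controller(command: str) -> None:
    sent.append(command)


def observe_and_update(belief: Belief, command: str) -> Belief:
    if len(sent) >= 3:
        return {"on_ground": 0.1, "hovering": 0.9}
    return {"on_ground": 0.9, "hovering": 0.1}


steps, final_belief = issue_until_transition(
    "take_off", "hovering", {"on_ground": 1.0, "hovering": 0.0},
    observe_and_update, send_to_controller)
print(steps, final_belief)
```

The `max_steps` guard is a design choice added for the sketch so that a command which never takes effect surfaces as an error instead of looping forever.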
Summary
■ 21st Century DoD Systems need to be safe, resilient and affordable
■ They have to operate in uncertain, potentially hostile and deceptive
environments with partial observability
■ System model verifiability is needed for system safety
■ System model flexibility is needed to enable resilient behavior
■ Modeling such systems is beyond capability of traditional systems
modeling approaches
■ Resilience Contract, a probabilistic approach based on POMDP,
when coupled with heuristics and reinforcement learning, can satisfy
the need for safety, resilience and improved performance
■ Our research is demonstrating the viability of this approach while
simultaneously developing a testbed for controlled experimentation
with different CONOPS and resilience methods
Azad M. Madni
SERC Principal Investigator
■ Professor, Astronautical Engineering, University of Southern California
■ Executive Director, Systems Architecting and Engineering Program
■ Founder and Chairman, Intelligent Systems Technology Inc.
■ Life Fellow, IEEE; Fellow, AAAS, AIAA, INCOSE; Life Fellow, SDPS & IETE
■ Ph.D., M.S., B.S. in Engineering, UCLA
■ Research Sponsors: DoD-SERC, AFRL, AFOSR, ARI, ARL, RDECOM, ERDC,
TATRC, DARPA, OSD, NASA, DOE, NIST, ONR, NAVAIR, NAVSEA, SPAWAR,
MARCOR, DTRA, MDA, GM, Boeing, NGC, Raytheon, LM Orincon, SAIC, others
■ Recent Awards
➢ 2018 INCOSE Outstanding Service Award
➢ 2018 IEEE SMC Systems Science and Engineering Award for MBSE TC (most influential)
➢ 2017 Dean’s Award for Innovation in Teaching and Education
➢ 2017 John F. Guarrera Engineering Educator of the Year from Engineer’s Council
➢ 2017 James E. Ballinger Engineer of the Year Award from OCEC
➢ 2016 Boeing Lifetime Achievement Award (Contributions to Boeing, Aerospace and Nation)
➢ 2016 Boeing Visionary Systems Engineering Leadership Award
➢ 2014 INCOSE Lifetime Achievement Award
➢ 2013 IIE Innovation in Curriculum Award
➢ 2011 INCOSE Pioneer Award
■ Recent Authored Books
➢ Transdisciplinary Systems Engineering: Exploiting Convergence in a Hyper-Connected World
(foreword by Norm Augustine) Springer, 2018
➢ Tradeoff Decisions in System Design (foreword by John Slaughter), Springer, 2016
Way Ahead
■ Year One: Extend Phase I prototype to incorporate POMDP decision-making
➢ Develop verifiable utility function (i.e., Reward Function) to evaluate available
options
➢ Refine method to assign value to reward
➢ Employ testbed to refine transition probabilities among vehicle states
➢ Develop plans to transition to target site(s)
➢ Publish research in key conferences (e.g., NDIA, CSER, AIAA Scitech, INCOSE
IS, SYSCON)
■ Year Two: Expand testbed capability
➢ Introduce greater complexity into scenarios
➢ Expand POMDP model to address more complex scenarios (e.g., more states)
➢ Run simulations to collect data and refine methodology
➢ Introduce additional capabilities into testbed (data collection, behaviors/patterns
identification)
➢ Publish research in one or more key conferences (e.g., NDIA, CSER, AIAA
Scitech, INCOSE IS, SYSCON)