Deliberation Scheduling for Planning in Real-Time
David J. Musliner, Honeywell Laboratories
Robert P. Goldman, SIFT, LLC
Kurt Krebsbach, Lawrence University
April 18, 2023
Outline
Application summary.
Deliberation scheduling problem.
Analytic experiments.
Demonstration tests.
Conclusions.
Planning and Action for Real-Time Control
Adaptive Mission Planner: Decomposes an overall mission into multiple control problems, with limited performance goals designed to make the controller synthesis problem solvable with available time and available execution resources.
Controller Synthesis Module: For each control problem, synthesizes a real-time reactive controller according to the constraints sent from AMP.
Real Time Subsystem: Continuously executes synthesized control reactions in hard real-time environment; does not “pause” waiting for new controllers.
[Figure: architecture: the Adaptive Mission Planner drives the Controller Synthesis Module, which feeds controllers to the Real Time System.]
Controller Synthesis Module (CSM)

[Figure: the CSM takes a problem configuration (available actions, uncontrollable transitions, initial state description, goal state description) and produces a timed-automata controller design and an executable reactive controller.]
AMP Overview
The mission is the main input: threats and goals, specific to different mission phases (e.g., ingress, attack, egress).
Threats are safety-critical: the system must guarantee to maintain safety (sometimes probabilistically) in the worst case, using real-time reactions.
Goals are best-effort: no guarantee is required.
Each mission phase requires a plan (or controller), built by the CSM to handle a problem configuration.
Changes in capabilities, mission, or environment can lead to the need for additional controller synthesis.
AMP Responsibilities
Divide the mission into phases, subdividing them as necessary to handle resource restrictions.
Build problem configurations for each phase, to drive the CSM.
Modify problem configurations, both internally and via negotiation with other AMPs, to handle resource limitations:
Capabilities (assets).
Bounded rationality: deliberation resources.
Bounded reactivity: execution resources.
AMP Deliberation Scheduling

An MDP-based approach for the AMP to adjust CSM problem configurations and algorithm parameters to maximize the expected utility of deliberation.

Issues:
Complex utility function for the overall mission plan.
Survival dependencies between sequenced controllers.
Requires CSM algorithm performance profiles.
Planning that is expected to complete further in the future must be discounted.

Differences from other deliberation scheduling techniques:
CSM planning is not an anytime algorithm --- it is more a Las Vegas than a Monte Carlo algorithm.
It is not a problem of trading deliberation against action: deliberation and action proceed in concert.
Survival of the platform is the key concern.
AMP Deliberation Scheduling
Mission phases are characterized by:
Probability of survival/failure.
Expected reward.
Expected start time and duration.

The agent keeps the reward from all executed phases.
Different CSM problem-configuration operators yield different types of plan improvements:
Improve probability of survival.
Improve expected reward (number or likelihood of goals).

Configuration operators can be applied to the same phase in different ways (via parameters).
Configuration operators have different expected resource requirements (computation time/space).
Expected Mission Utility
Markov-chain behavior in the mission phases:
Probability of surviving vs. entering an absorbing failure state.
Reward expectations unevenly distributed.

U = Σ_{i=1..n} R_i Π_{j=1..i} s_j

[Figure: a chain of phases 1 through 5; each phase i is survived with probability s_i (with probability 1 - s_i the platform enters the absorbing FAILURE state) and, if reached, yields reward R_i.]
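The expected-utility sum can be computed directly: reward R_i is collected only if the platform survives phases 1 through i. A minimal sketch (the phase values below are made up):

```python
def expected_utility(phases):
    # phases: list of (survival probability s_i, reward R_i) pairs.
    # U = sum_i R_i * prod_{j<=i} s_j
    u, alive = 0.0, 1.0
    for s, r in phases:
        alive *= s      # must survive this phase...
        u += alive * r  # ...to collect its reward
    return u

# Two phases: 90% survivable worth reward 10, then 80% survivable worth 5:
# U = 0.9*10 + 0.9*0.8*5 = 12.6
```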
The Actions: CSM Performance Profiles
[Figure: log-scale scatter of planner time (secs, 1 to 1000) vs. number of threats (0 to 6); points marked Found Plan, No Plan Found, or Timed Out.]
AMP attempts to predict time-to-plan from domain characteristics, so AMP can be smart about configuring CSM problems in time-constrained situations.
Histogram of Same Performance Results
[Figure: histogram of planner time (secs, 5 to 50) vs. frequency (0 to 70), with 100 samples per threat group and 0.5-second-wide sample bins; one curve each for 1, 2, 3, and 4 threats, with deliberation-quantum markers at 1Q, 2Q, 4Q, and 7Q.]

Note the increasing spread (uncertainty of runtime) as the problem grows.

AMP's performance estimate: 80% likely to find a plan in a given number of deliberation quanta for a given number of threats (T = threats; Q = 4 seconds).
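The 80%-likely estimate can be read off such a histogram as an empirical quantile. A sketch with invented runtimes; the nearest-rank quantile here is illustrative, not necessarily the AMP's actual estimator:

```python
def quantile(samples, q):
    # Nearest-rank empirical quantile of a list of samples.
    s = sorted(samples)
    return s[min(len(s) - 1, int(q * len(s)))]

# Hypothetical planner runtimes (secs) for one threat count.
runtimes = [3.1, 4.0, 4.2, 4.8, 5.5, 6.1, 7.0, 7.9, 9.5, 14.2]
t80 = quantile(runtimes, 0.8)   # 80% of runs finish within t80 seconds
quanta = -(-t80 // 4)           # round up to whole 4-second quanta (Q = 4 s)
```

The AMP would commit this many quanta to the CSM run, accepting a roughly 20% chance of a timeout.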
Modeling the Problem as MDP
Actions: commit to the 80%-success time for a CSM planning run. All actions have equal probability of success; durations vary.
States: sink states (destruction and mission completion); other states are vectors of survival probabilities.
Utility model: goal achievement + survival.
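One way the state and action spaces described above might be encoded; this is an illustrative sketch, not the paper's implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DelibState:
    # Non-sink MDP state: current survival probability of each phase's plan.
    # The two sink states (destruction, mission completion) live outside this.
    survival: tuple

@dataclass(frozen=True)
class DelibAction:
    # Commit the CSM to one configuration, budgeting its 80%-success time.
    phase: int           # which phase's plan this run would improve
    new_survival: float  # survival probability if the run succeeds
    quanta: int          # duration, in 4-second deliberation quanta

def succeed(state, action):
    # Successor state if the committed planning run pays off.
    s = list(state.survival)
    s[action.phase] = action.new_survival
    return DelibState(tuple(s))
```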
Algorithms
Optimal MDP solution: Bellman backup (finite-horizon problem). Very computationally expensive.
Greedy one-step lookahead: assume you will do only one computational action, and choose the best one. Discounted variant.
Strawmen: shortest-action-first, earliest-phase-first, etc.
Conducted a number of comparison experiments (results published elsewhere).
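The greedy one-step lookahead can be sketched over the phase model: score each configuration operator as if it were the only remaining computational action, and pick the one yielding the highest expected mission utility. The encodings and numbers below are illustrative:

```python
def expected_utility(phases):
    # U = sum_i R_i * prod_{j<=i} s_j over (survival, reward) pairs.
    u, alive = 0.0, 1.0
    for s, r in phases:
        alive *= s
        u += alive * r
    return u

def apply_op(phases, op):
    # op = (phase index, survival delta, reward delta).
    i, ds, dr = op
    out = list(phases)
    s, r = out[i]
    out[i] = (min(1.0, s + ds), r + dr)
    return out

def greedy_choice(phases, ops):
    # One-step lookahead: evaluate each operator in isolation and
    # pick the one with the best resulting expected utility.
    return max(ops, key=lambda op: expected_utility(apply_op(phases, op)))

phases = [(0.9, 10.0), (0.7, 20.0)]
ops = [(0, 0.05, 0.0),  # small survival gain in phase 0
       (1, 0.20, 0.0)]  # large survival gain in phase 1
best = greedy_choice(phases, ops)  # picks the phase-1 operator
```

This is myopic by construction, which is exactly why the undiscounted version can be distracted by large far-future improvements.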
Discount Factors
Greedy use of the basic expected-utility formula requires discounting to take two important effects into account:
Window of opportunity for deliberation: there is more future time to deliberate on phases that start later. Otherwise, large potential improvements in far-out phases can distract from near-term improvements.
Split phase when a new plan is downloaded during execution: the amount of improvement is limited by the time remaining in the phase.
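The window-of-opportunity effect can be illustrated by scaling an operator's expected utility gain by how soon its phase starts; the linear form below is purely illustrative, not the paper's actual discount:

```python
def window_discount(gain, phase_start, now, horizon):
    # Scale down improvements to phases that start far in the future:
    # there will be more chances to deliberate on them later.
    slack = max(0.0, phase_start - now)
    if horizon <= 0:
        return gain
    return gain * max(0.0, 1.0 - slack / horizon)

# A gain of 10 in a phase starting 60 quanta from now, with a
# 120-quantum mission horizon, counts only half as much right now.
```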
Runtime Comparison of Optimal & Greedy
[Figure: deliberation time (ms, log scale, 1 to 10^7) for each experiment (sorted by optimal runtime, 0 to 90), comparing optimal-delib-time against greedy-delib-time.]
Quality Result for Medium Scenarios
“Medium” Quality Comparison Summary
Discounted greedy agent beats simple greedy agent 79 times, ties 3, loses 2.
Discounted greedy agent averages 86% of optimal expected utility; simple greedy averages 79%.
More difficult domains challenge myopic policies, and crush random policy (73% overall). Discounted greedy beats random 83/84 times.
Even on easy scenarios, optimal is far too slow!
Mission Testing
Modified AMP to incorporate deliberation scheduling algorithms.
Tested three different agents: S – shortest problem first; U – simple greedy DS; DU – greedy with discounting.
Tested in mission with multiple threats and two goals.
Mission Overview
[Figure: mission map with waypoints 0 through 6 across the Ingress, Attack, and Egress phases.]
Demo Outcome
Shortest: builds all the easy single-threat plans quickly and survives the entire mission, but waits too long before building plans for goal achievement; fails to hit targets.
Utility: builds safe plans for most threats, but gets distracted by the high-reward goal in the egress phase; dies in the attack phase due to an unhandled threat.
Discounted utility: completes the entire mission successfully.
Expected Payoff vs. Time

[Figure: expected future payoff (0 to 180) vs. time (quanta, 0 to 250) for the Shortest, Utility, and Discounted Utility agents.]

Utility chooses badly: it tries to plan for egress but ignores a threat during attack.
Shortest chooses badly: it discards good plans and tries goal plans too late.
The apparent drop in utility is due to a phase update.
Demo 2: Ingress Phase
• All three are attacked but defend selves successfully.
Demo 2: Attack Phase
• Utility and Discounted utility hit targets.
• Utility dies from unhandled threat.
• Shortest stays safe but does not strike target.
Demo 2: Second Attack Phase (“Egress”)
• Only Discounted utility hits second target.
• Shortest stays safe but does not strike target.
Summary
The End
Related Topics
Conventional deliberation scheduling work: typically assumes the object-level computation is based on anytime algorithms. CSM algorithms are not readily converted to anytime; performance improvements are discrete and all-or-nothing.
Because the Real Time System and the AI system run in true parallel, there are no conventional think/act tradeoffs.
Design-to-time: appropriate, but it builds full schedules rather than single action choices. A comparison may be possible.
MDP solvers: either infinite-horizon, or finite-horizon with offline policy computation. We have on-line decision making with a dynamic MDP.
Demo Scenario
Three types of threats (IR, radar, radar2) during the ingress, attack, and egress phases; targets in the attack and egress phases.
Overall, there are 41 different valid problem configurations that can be sent to the CSM. Some are unsolvable in the allocated time.
Performance profiles are approximate: predicted planning times range from 1 to 60 seconds; some configurations take less than predicted, and some take more and time out rather than finishing.
The mission begins as soon as the first plan is available (< 1 second) and lasts approximately 4 minutes. Building all plans would require 22.3 minutes.