Deliberation Scheduling for Planning in Real-Time
David J. Musliner, Honeywell Laboratories
Robert P. Goldman, SIFT, LLC
Kurt Krebsbach, Lawrence University
April 18, 2023
Outline
Application summary.
Deliberation scheduling problem.
Analytic experiments.
Demonstration tests.
Conclusions.
Planning and Action for Real-Time Control
Adaptive Mission Planner: Decomposes an overall mission into multiple control problems, with limited performance goals designed to make the controller synthesis problem solvable with available time and available execution resources.
Controller Synthesis Module: For each control problem, synthesizes a real-time reactive controller according to the constraints sent from AMP.
Real Time Subsystem: Continuously executes synthesized control reactions in hard real-time environment; does not “pause” waiting for new controllers.
[Figure: architecture: the Adaptive Mission Planner drives the Controller Synthesis Module, which feeds controllers to the Real Time System.]
Controller Synthesis Module (CSM)

[Figure: the CSM takes a problem configuration (available actions, uncontrollable transitions, initial state description, goal state description) and produces a timed-automata controller design and an executable reactive controller.]
AMP Overview
The mission is the main input: threats and goals, specific to different mission phases (e.g., ingress, attack, egress).
Threats are safety-critical: the system must guarantee to maintain safety (sometimes probabilistically) in the worst case, using real-time reactions.
Goals are best-effort: no guarantee is required.
Each mission phase requires a plan (or controller), built by the CSM to handle a problem configuration.
Changes in capabilities, mission, or environment can lead to the need for additional controller synthesis.
AMP Responsibilities
Divide the mission into phases, subdividing them as necessary to handle resource restrictions.
Build problem configurations for each phase, to drive the CSM.
Modify problem configurations, both internally and via negotiation with other AMPs, to handle resource limitations:
Capabilities (assets).
Bounded rationality: deliberation resources.
Bounded reactivity: execution resources.
AMP Deliberation Scheduling

An MDP-based approach for the AMP to adjust CSM problem configurations and algorithm parameters to maximize the expected utility of deliberation.

Issues:
Complex utility function for the overall mission plan.
Survival dependencies between sequenced controllers.
Requires CSM algorithm performance profiles.
Planning that is expected to complete further in the future must be discounted.

Differences from other deliberation scheduling techniques:
CSM planning is not an anytime algorithm --- it is more a Las Vegas than a Monte Carlo algorithm.
It is not a problem of trading deliberation against action: deliberation and action proceed in concert.
Survival of the platform is the key concern.
AMP Deliberation Scheduling
Mission phases are characterized by:
Probability of survival/failure.
Expected reward.
Expected start time and duration.

The agent keeps the reward from all executed phases.
Different CSM problem-configuration operators yield different types of plan improvements:
Improve probability of survival.
Improve expected reward (number or likelihood of goals).

Configuration operators can be applied to the same phase in different ways (via parameters).
Configuration operators have different expected resource requirements (computation time/space).
Expected Mission Utility
Markov-chain behavior in the mission phases:
Probability of surviving vs. entering an absorbing failure state.
Reward expectations unevenly distributed.

U = Σ_{i=1..n} R_i Π_{j=1..i} s_j

[Figure: a chain of phases 1 through 5; each phase i is survived with probability s_i (with probability 1 - s_i the platform enters the absorbing FAILURE state) and, if reached, yields reward R_i.]
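The expected-utility sum can be computed directly: reward R_i is collected only if the platform survives phases 1 through i. A minimal sketch (the phase values below are made up):

```python
def expected_utility(phases):
    # phases: list of (survival probability s_i, reward R_i) pairs.
    # U = sum_i R_i * prod_{j<=i} s_j
    u, alive = 0.0, 1.0
    for s, r in phases:
        alive *= s      # must survive this phase...
        u += alive * r  # ...to collect its reward
    return u

# Two phases: 90% survivable worth reward 10, then 80% survivable worth 5:
# U = 0.9*10 + 0.9*0.8*5 = 12.6
```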
The Actions: CSM Performance Profiles
[Figure: log-scale scatter of planner time (secs, 1 to 1000) vs. number of threats (0 to 6); points marked Found Plan, No Plan Found, or Timed Out.]
AMP attempts to predict time-to-plan from domain characteristics, so AMP can be smart about configuring CSM problems in time-constrained situations.
Histogram of Same Performance Results
[Figure: histogram of planner time (secs, 5 to 50) vs. frequency (0 to 70), with 100 samples per threat group and 0.5-second-wide sample bins; one curve each for 1, 2, 3, and 4 threats, with deliberation-quantum markers at 1Q, 2Q, 4Q, and 7Q.]

Note the increasing spread (uncertainty of runtime) as the problem grows.

AMP's performance estimate: 80% likely to find a plan in a given number of deliberation quanta for a given number of threats (T = threats; Q = 4 seconds).
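The 80%-likely estimate can be read off such a histogram as an empirical quantile. A sketch with invented runtimes; the nearest-rank quantile here is illustrative, not necessarily the AMP's actual estimator:

```python
def quantile(samples, q):
    # Nearest-rank empirical quantile of a list of samples.
    s = sorted(samples)
    return s[min(len(s) - 1, int(q * len(s)))]

# Hypothetical planner runtimes (secs) for one threat count.
runtimes = [3.1, 4.0, 4.2, 4.8, 5.5, 6.1, 7.0, 7.9, 9.5, 14.2]
t80 = quantile(runtimes, 0.8)   # 80% of runs finish within t80 seconds
quanta = -(-t80 // 4)           # round up to whole 4-second quanta (Q = 4 s)
```

The AMP would commit this many quanta to the CSM run, accepting a roughly 20% chance of a timeout.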
Modeling the Problem as MDP
Actions: commit to the 80%-success time for a CSM planning run. All actions have equal probability of success; durations vary.
States: sink states (destruction and mission completion); other states are vectors of survival probabilities.
Utility model: goal achievement + survival.
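One way the state and action spaces described above might be encoded; this is an illustrative sketch, not the paper's implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DelibState:
    # Non-sink MDP state: current survival probability of each phase's plan.
    # The two sink states (destruction, mission completion) live outside this.
    survival: tuple

@dataclass(frozen=True)
class DelibAction:
    # Commit the CSM to one configuration, budgeting its 80%-success time.
    phase: int           # which phase's plan this run would improve
    new_survival: float  # survival probability if the run succeeds
    quanta: int          # duration, in 4-second deliberation quanta

def succeed(state, action):
    # Successor state if the committed planning run pays off.
    s = list(state.survival)
    s[action.phase] = action.new_survival
    return DelibState(tuple(s))
```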
Algorithms
Optimal MDP solution: Bellman backup (finite-horizon problem). Very computationally expensive.
Greedy one-step lookahead: assume you will do only one computational action, and choose the best one. Discounted variant.
Strawmen: shortest-action-first, earliest-phase-first, etc.
Conducted a number of comparison experiments (results published elsewhere).
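The greedy one-step lookahead can be sketched over the phase model: score each configuration operator as if it were the only remaining computational action, and pick the one yielding the highest expected mission utility. The encodings and numbers below are illustrative:

```python
def expected_utility(phases):
    # U = sum_i R_i * prod_{j<=i} s_j over (survival, reward) pairs.
    u, alive = 0.0, 1.0
    for s, r in phases:
        alive *= s
        u += alive * r
    return u

def apply_op(phases, op):
    # op = (phase index, survival delta, reward delta).
    i, ds, dr = op
    out = list(phases)
    s, r = out[i]
    out[i] = (min(1.0, s + ds), r + dr)
    return out

def greedy_choice(phases, ops):
    # One-step lookahead: evaluate each operator in isolation and
    # pick the one with the best resulting expected utility.
    return max(ops, key=lambda op: expected_utility(apply_op(phases, op)))

phases = [(0.9, 10.0), (0.7, 20.0)]
ops = [(0, 0.05, 0.0),  # small survival gain in phase 0
       (1, 0.20, 0.0)]  # large survival gain in phase 1
best = greedy_choice(phases, ops)  # picks the phase-1 operator
```

This is myopic by construction, which is exactly why the undiscounted version can be distracted by large far-future improvements.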
Discount Factors
Greedy use of the basic expected-utility formula requires discounting to take two important effects into account:
Window of opportunity for deliberation: there is more future time to deliberate on phases that start later. Otherwise, large potential improvements in far-out phases can distract from near-term improvements.
Split phase when a new plan is downloaded during execution: the amount of improvement is limited by the time remaining in the phase.
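The window-of-opportunity effect can be illustrated by scaling an operator's expected utility gain by how soon its phase starts; the linear form below is purely illustrative, not the paper's actual discount:

```python
def window_discount(gain, phase_start, now, horizon):
    # Scale down improvements to phases that start far in the future:
    # there will be more chances to deliberate on them later.
    slack = max(0.0, phase_start - now)
    if horizon <= 0:
        return gain
    return gain * max(0.0, 1.0 - slack / horizon)

# A gain of 10 in a phase starting 60 quanta from now, with a
# 120-quantum mission horizon, counts only half as much right now.
```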
Runtime Comparison of Optimal & Greedy
[Figure: deliberation time (ms, log scale, 1 to 10^7) for each experiment (sorted by optimal runtime, 0 to 90), comparing optimal-delib-time against greedy-delib-time.]
Quality Result for Medium Scenarios
“Medium” Quality Comparison Summary
Discounted greedy agent beats simple greedy agent 79 times, ties 3, loses 2.
Discounted greedy agent averages 86% of optimal expected utility; simple greedy averages 79%.
More difficult domains challenge myopic policies, and crush random policy (73% overall). Discounted greedy beats random 83/84 times.
Even on easy scenarios, optimal is far too slow!
Mission Testing
Modified AMP to incorporate deliberation scheduling algorithms.
Tested three different agents: S – shortest problem first; U – simple greedy DS; DU – greedy with discounting.
Tested in mission with multiple threats and two goals.
Mission Overview
[Figure: mission map with waypoints 0 through 6 across the Ingress, Attack, and Egress phases.]
Demo Outcome
Shortest: builds all the easy single-threat plans quickly and survives the entire mission, but waits too long before building plans for goal achievement; fails to hit targets.
Utility: builds safe plans for most threats, but gets distracted by the high-reward goal in the egress phase; dies in the attack phase due to an unhandled threat.
Discounted utility: completes the entire mission successfully.
Expected Payoff vs. Time

[Figure: expected future payoff (0 to 180) vs. time (quanta, 0 to 250) for the Shortest, Utility, and Discounted Utility agents.]

Utility chooses badly: it tries to plan for egress but ignores a threat during attack.
Shortest chooses badly: it discards good plans and tries goal plans too late.
The apparent drop in utility is due to a phase update.
Demo 2: Ingress Phase
• All three are attacked but defend selves successfully.
Demo 2: Attack Phase
• Utility and Discounted utility hit targets.
• Utility dies from unhandled threat.
• Shortest stays safe but does not strike target.
Demo 2: Second Attack Phase (“Egress”)
• Only Discounted utility hits second target.
• Shortest stays safe but does not strike target.
Summary
The End
Related Topics
Conventional deliberation scheduling work: typically assumes the object-level computation is based on anytime algorithms. CSM algorithms are not readily converted to anytime; performance improvements are discrete and all-or-nothing.
Because the Real Time System and the AI system run in true parallel, there are no conventional think/act tradeoffs.
Design-to-time: appropriate, but it builds full schedules rather than single action choices. A comparison may be possible.
MDP solvers: either infinite-horizon, or finite-horizon with offline policy computation. We have on-line decision making with a dynamic MDP.
Demo Scenario
Three types of threats (IR, radar, radar2) during the ingress, attack, and egress phases; targets in the attack and egress phases.
Overall, there are 41 different valid problem configurations that can be sent to the CSM. Some are unsolvable in the allocated time.
Performance profiles are approximate: predicted planning times range from 1 to 60 seconds; some configurations take less than predicted, and some take more and time out rather than finishing.
The mission begins as soon as the first plan is available (< 1 second) and lasts approximately 4 minutes. Building all plans would require 22.3 minutes.