REPLANNING IN DOMAINS WITH PARTIAL INFORMATION AND SENSING ACTIONS
Guy Shani, Ronen Brafman
Ben-Gurion University
Online Planning under Uncertainty with Partial Observability and Sensing
• Deterministic actions
• Concrete goal condition
• Uncertainty about the initial state
• Non-stochastic model: states are either possible or impossible
• Sensing actions provide information about the world
• We can generate a conditional plan
• Online planning: we do not plan for all contingencies ahead of time, just until the next observation
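A minimal sketch of this setting as Python data structures; the names (Action, ContingentProblem) and fields are illustrative assumptions, not the paper's formalism:

```python
# A minimal sketch of the setting above; all names are illustrative,
# not taken from the SDR implementation. States are sets of ground
# facts; the non-stochastic belief is a plain set of possible states.
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set

@dataclass(frozen=True)
class Action:
    name: str
    preconditions: FrozenSet[str]          # facts that must hold to apply
    add: FrozenSet[str] = frozenset()      # facts made true
    delete: FrozenSet[str] = frozenset()   # facts made false
    observes: Optional[str] = None         # sensing actions report one fact

@dataclass
class ContingentProblem:
    actions: List[Action]                  # deterministic + sensing
    initial_states: Set[FrozenSet[str]]    # uncertainty about the start
    goal: FrozenSet[str]                   # concrete goal condition
```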
Examples
• Toy problems from CLG [Albore et al.]
• Doors: gate location unknown
  Start state: (oneof (door-at 2,1) … (door-at 2,5)) (oneof (door-at 4,1) … (door-at 4,5))
• Wumpus: monster location unknown; must correlate observations from multiple locations
  Start state: (oneof (wumpus-at 2,3) (wumpus-at 3,2)) (oneof (wumpus-at 3,4) (wumpus-at 4,3))
• Localize: agent location unknown; must reason from history
  Start state: (oneof (at 1,1) (at 2,1) … (at 5,5))
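To make these start states concrete, here is a small sketch (not the authors' code) of expanding oneof clauses into the set of possible initial states: pick one disjunct from each clause and take the cross product.

```python
# Expand "oneof" start-state clauses into concrete initial states.
from itertools import product

def expand_oneof(known_facts, oneof_clauses):
    """Yield every concrete initial state consistent with the clauses."""
    for choice in product(*oneof_clauses):
        yield frozenset(known_facts) | frozenset(choice)

# Wumpus example from the slide: two independent oneof clauses.
clauses = [
    ["wumpus-at-2-3", "wumpus-at-3-2"],
    ["wumpus-at-3-4", "wumpus-at-4-3"],
]
states = list(expand_oneof(["at-1-1"], clauses))
assert len(states) == 4  # 2 x 2 possible initial states
```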
Why work on this problem?
• Uncertainty and partial observability need no motivation
• Study the challenge of planning to sense/learn
  • Many POMDP methods cope poorly with information-gathering sub-plans that do not provide rewards
• We study this in a slightly simpler setting obtained by:
  • A simpler form of uncertainty: non-stochastic, deterministic actions
  • Structured actions and state (a la STRIPS)
• Extend existing replanning techniques, which focus on fully observable settings, to contingent planning
Our contributions
• Extending replanning techniques to handle this case
• A lazy technique for (not) maintaining the belief state
Replanning (basic idea)
• Pros: very simple, fast, and often effective
• Cons: a greedy approach with the usual drawbacks
  • A simplistic classical model can lead to poor choices
  • Can get caught in dead-ends
  • Smart sampling may reduce these problems
The loop (see the sketch below):
1. Reduce uncertainty: generate a simpler classical problem, e.g. by choosing one initial state
2. Plan for the reduced classical problem
3. Execute the plan until things break, e.g. an observation doesn't agree with the selected state; then go back to step 1
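A skeleton of this loop in Python; the belief interface and the planner/executor callbacks are assumed placeholders, not a real API:

```python
# A skeleton of the generic replanning loop sketched above. The belief
# interface (satisfies_goal, choose_one_state, update) and the
# plan_classical / execute_step callbacks are assumed placeholders.
def replanning_loop(belief, plan_classical, execute_step):
    while not belief.satisfies_goal():
        state = belief.choose_one_state()      # reduce uncertainty
        plan = plan_classical(state)           # solve the classical problem
        for action in plan:
            observation, consistent = execute_step(action)
            belief = belief.update(action, observation)
            if not consistent:                 # things broke: replan
                break
```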
Replanning with PO and Sensing – Take 1
• Determinize the problem by determinizing the current state
• Plan for this one initial state only
• Execute until observations conflict with the deterministic model
• Replan!
Problem: the planner will make no effort to sense; it plans as if it knows everything. We need a more sophisticated model, one that captures the agent's belief state.
Solution: Use Palacios and Geffner's Translation-based Approach
• Explicitly represent the agent's knowledge
  • Knowledge predicates replace regular predicates
  • Kp = know that p is true
• Must ground knowledge on some initial features
• What follows is a short tutorial with zero details.
Translation to Classical Planning
• Maintain predicate values given an initial state
  • p|si: we know that p is true given that si was the initial state (it may be false given that sj was the initial state)
• Kp means that we know that p is true in all valid states
• Revise actions:
  • A conditional effect c → l is transformed into c|si → l|si, one copy per possible initial state si
  • A precondition p is transformed to Kp, i.e. an action can be applied only if its preconditions hold in all valid states
Example (Wumpus, conditioned on initial state s0):
• K(wumpus-at p-2,3)|s0
• K(not (wumpus-at p-3,2))|s0
• K(stench-at p-2,4)|s0
• K(stench-at p-2,2)
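A sketch of this compilation in Python, with illustrative helper names (conditioned, knows, compile_effect) that are assumptions, not the SDR source:

```python
# Per-initial-state knowledge compilation: each fact p gets one copy
# "p|s" per possible initial state s, and Kp abbreviates "p|s holds
# for every still-valid s". Illustrative names only.
def conditioned(fact, state_id):
    return f"{fact}|{state_id}"

def compile_fact(fact, state_ids):
    """All copies of a fact, one per possible initial state."""
    return [conditioned(fact, s) for s in state_ids]

def knows(fact, valid_state_ids, classical_state):
    """Kp: p holds given every initial state still considered valid."""
    return all(conditioned(fact, s) in classical_state
               for s in valid_state_ids)

def compile_effect(cond, literal, state_ids):
    """Conditional effect c -> l becomes one copy c|s -> l|s per state."""
    return [(conditioned(cond, s), conditioned(literal, s))
            for s in state_ids]
```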
Translation to Classical Planning (cont.)
• Sensing actions reveal an unknown predicate p, and hence have effect Kp or K(not p)
• Actions to eliminate states from the belief
  • If the observed value of p disagrees with p|s, then we know that s was not the initial state; the effect marks s as no longer valid
• Warning! Many details are missing…
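One of those details, state elimination, can be sketched as follows (illustrative names; the real translation does this with classical actions and conditional effects rather than Python):

```python
# After observing the true value of p, any sampled initial state whose
# copy p|s predicts the opposite value cannot be the real initial state.
def eliminate_states(observed_fact, observed_value, valid_state_ids,
                     classical_state):
    """Drop every initial state s whose copy p|s contradicts the
    observation; return the surviving state ids."""
    survivors = set()
    for s in valid_state_ids:
        predicted = f"{observed_fact}|{s}" in classical_state
        if predicted == observed_value:
            survivors.add(s)
    return survivors
```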
Replanning with PO and Sensing – Take 2
• Use the knowledge-domain translation
• Feed the translation into a classical planner
• Execute the plan until things break
  • E.g. an observation is inconsistent with expectations
• Replan!
• Still missing…
  • What happens when sensing actions are executed in the knowledge domain?
  • The translation size is often huge!
Replanning with PO and Sensing – Take 2
• Problem 1: Sensing actions translate into non-deterministic actions
• Solution: Determinize sensing by choosing an initial state s0; all observations will be consistent with this state
  • Sensing actions then have conditional (deterministic) effects
• The planner must KNOW the preconditions of actions and the goal
  • It must use explicit sensing actions

Action        Move-right          K-Move-right          K-Sense-right
Precondition  Free-right          K(free-right)         –
Effect        (at 1,1)→(at 2,1)   K(at 1,1)→K(at 2,1)   Free-right→K(free-right)
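A sketch of a determinized sensing effect under s0 (illustrative only; the real translation emits conditional effects in the classical encoding, not Python):

```python
# Determinized sensing under the chosen state s0: the effect is
# conditional and deterministic, because we commit to s0's answer.
def apply_sense(fact, s0, classical_state):
    """Sensing `fact` when s0 makes it true yields K(fact); when s0
    makes it false, it yields K(not fact)."""
    if fact in s0:
        classical_state.add(f"K{fact}")
    else:
        classical_state.add(f"K-not-{fact}")
    return classical_state
```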
• Problem: The translation is often huge
  • Given N possible initial states, there is a copy of each predicate for each initial state (N copies)
  • Each condition in every action is copied 2N times
  • There are actions to eliminate every initial state on every predicate
• Solution: sample a small number of possible initial states

Localize 5×5:
             Original   N=19   N=2
Predicates   24         963    162
Actions      9          1201   130
Conditions   59         2309   421
Start state  –          533    85
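The sampling step itself is simple; a sketch (illustrative, not the SDR code):

```python
# Keep the translation small: draw a few states S from the current
# belief and fix one of them, s0, to determinize what sensing reports.
import random

def sample_states(belief_states, n):
    states = list(belief_states)
    S = random.sample(states, min(n, len(states)))
    s0 = random.choice(S)  # observations are generated as if s0 holds
    return S, s0
```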
Replanning with PO and Sensing – Take 2
To summarize:
1. Sample a subset S out of the possible current states
   • To reduce the translation size
2. Sample s0 from S
   • Base observations on s0
3. Generate the knowledge-domain translation, given S and s0
4. Solve using a classical planner
Still missing … Belief Maintenance
• Must recognize if the goal was reached
• Must recognize if the preconditions of the next action are guaranteed to be true
• Requires maintaining information about the current belief state (the set of valid states)
• This issue is orthogonal to how we generate the plan
Belief Maintenance through Regression
• A (very) lazy approach
  • Maintain b0 as a formula
  • Maintain the history a1,o1,…,at,ot
• Cons: must regenerate a formula on every query
• Pros: the generated formula is focused only on the current query and remains small
To check whether a condition ct holds at the current belief bt:
1. Regress ct through at,ot, resulting in ct-1
2. Continue regressing through at-1,ot-1, and so on
3. Regress through a1,o1, resulting in c0
4. Solve a SAT problem: if b0 ∧ ¬c0 has no satisfying assignment, then ct holds at bt
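A toy sketch of this query in Python. Here "formulas" are Python predicates over states and the SAT check is brute-force enumeration over the finite initial belief; the paper regresses symbolic formulas and hands b0 ∧ ¬c0 to MiniSat. The apply/consistent methods on actions are assumed helpers.

```python
# Toy belief maintenance by regression: conditions are Python
# predicates over states, "SAT" is enumeration over the initial belief.
def regress(condition, action, observation):
    """Condition on the pre-step state equivalent to: IF this run is
    consistent with (action, observation), THEN `condition` holds
    after the step. States inconsistent with the observation leave
    the belief, so they satisfy the regressed condition vacuously."""
    def before(state):
        nxt = action.apply(state)                 # assumed helper
        if not action.consistent(nxt, observation):  # assumed helper
            return True
        return condition(nxt)
    return before

def holds_at_current_belief(condition, initial_states, history):
    """c_t holds at b_t iff b0 AND not(c0) is unsatisfiable, i.e. every
    possible initial state satisfies the fully regressed condition."""
    for action, observation in reversed(history):
        condition = regress(condition, action, observation)
    return all(condition(s) for s in initial_states)
```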
Sample, Determinize, Replan – SDR
PLAN:
1. Select S and s0
2. Translate to classical planning
3. Run a classical planner (FF)
EXECUTE:
4. Regress the goal (solved using MiniSat); if it holds, the goal is achieved – terminate
5. Regress the next action's precondition
6. Execute the action
7. Check observation consistency; if consistent, continue from step 4, otherwise replan
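Putting the phases together, a sketch of the SDR loop; component names such as translate, classical_plan, holds_by_regression, predicted, and the belief/environment interfaces all stand in for the pieces described on the previous slides:

```python
# A sketch of the full SDR loop under the assumed interfaces above.
def sdr(belief, goal, env, sample_size=2):
    while not holds_by_regression(goal, belief):           # goal check
        S, s0 = sample_states(belief.states, sample_size)  # sample + determinize
        plan = classical_plan(translate(S, s0))            # e.g. FF
        for action in plan:
            if not holds_by_regression(action.preconditions, belief):
                break                          # precondition not guaranteed
            observation = env.execute(action)
            belief = belief.advance(action, observation)
            if observation is not None and \
                    observation != predicted(action, s0):
                break                          # sample refuted: replan
```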
CLG vs. SDR
• The CLG translation generates non-deterministic effects for observation actions
  • In offline mode, all possibilities are checked
  • In online mode, the environment is queried (as we do)
• CLG uses a specialized semi-classical planner (an FF variant); SDR can use any black-box planner (experiments use FF)
• CLG uses tags
  • In many (most) cases more efficient than complete states
  • The complete translation still blows up rapidly
Results
Domain          SDR #Actions   SDR Time   CLG #Actions   CLG Time
Wumpus15        92.5           42         103            240.7
Wumpus20        115.9          156.1      160            1224.8
doors13         177.1          25.1       111.8          264.5
doors17         306.8          96.9       PF             X
localize15      56.5           35.6       PF             X
localize17      71.7           75         PF             X
colorballs-9-3  660.2          209.1      227.8          707.1
colorballs-9-7  1343.9         693.3      TF             X
medpks150       88.8           268        CSU            X
medpks199       89.9           502.9      PF             X
Summary
• SDR – a contingent replanner under partial observability
  • Sample a set of possible states from the current belief
  • Create a classical planning translation
  • Execute the plan until the sample is proven invalid or the goal is reached
• SDR is shown to be faster and to scale to larger domains than CLG (the state of the art)
Future Work
• Sensing costs
  • Sensing can have a cost (e.g. sensor warmup)
  • There should be a tradeoff between sensing and acting
  • Remove sensed preconditions – the agent should decide whether it wants to sense or not
• Dead-ends – a well-known pitfall of replanning algorithms
• Smarter sampling techniques
• Scaling up – currently not much better than POMDPs!
Thank you