REPLANNING IN DOMAINS WITH PARTIAL INFORMATION AND SENSING ACTIONS
Guy Shani, Ronen Brafman
Ben-Gurion University
Online Planning under Uncertainty with Partial Observability and Sensing
• Deterministic actions
• Concrete goal condition
• Uncertainty about the initial state
• Non-stochastic model: states are either possible or impossible
• Sensing actions provide information about the world
• We can generate a conditional plan
• Online planning: we do not plan for all contingencies ahead of time, just until the next observation
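A minimal sketch of this setting as Python data structures; the names (Action, ContingentProblem) and fields are illustrative assumptions, not the paper's formalism:

```python
# A minimal sketch of the setting above; all names are illustrative,
# not taken from the SDR implementation. States are sets of ground
# facts; the non-stochastic belief is a plain set of possible states.
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set

@dataclass(frozen=True)
class Action:
    name: str
    preconditions: FrozenSet[str]          # facts that must hold to apply
    add: FrozenSet[str] = frozenset()      # facts made true
    delete: FrozenSet[str] = frozenset()   # facts made false
    observes: Optional[str] = None         # sensing actions report one fact

@dataclass
class ContingentProblem:
    actions: List[Action]                  # deterministic + sensing
    initial_states: Set[FrozenSet[str]]    # uncertainty about the start
    goal: FrozenSet[str]                   # concrete goal condition
```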
Examples
• Toy problems from CLG [Albore et al.]
• Doors: gate location unknown
  Start state: (oneof (door-at 2,1) … (door-at 2,5)) (oneof (door-at 4,1) … (door-at 4,5))
• Wumpus: monster location unknown; must correlate observations from multiple locations
  Start state: (oneof (wumpus-at 2,3) (wumpus-at 3,2)) (oneof (wumpus-at 3,4) (wumpus-at 4,3))
• Localize: agent location unknown; must reason from history
  Start state: (oneof (at 1,1) (at 2,1) … (at 5,5))
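To make these start states concrete, here is a small sketch (not the authors' code) of expanding oneof clauses into the set of possible initial states: pick one disjunct from each clause and take the cross product.

```python
# Expand "oneof" start-state clauses into concrete initial states.
from itertools import product

def expand_oneof(known_facts, oneof_clauses):
    """Yield every concrete initial state consistent with the clauses."""
    for choice in product(*oneof_clauses):
        yield frozenset(known_facts) | frozenset(choice)

# Wumpus example from the slide: two independent oneof clauses.
clauses = [
    ["wumpus-at-2-3", "wumpus-at-3-2"],
    ["wumpus-at-3-4", "wumpus-at-4-3"],
]
states = list(expand_oneof(["at-1-1"], clauses))
assert len(states) == 4  # 2 x 2 possible initial states
```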
Why work on this problem?
• Uncertainty and partial observability need no motivation
• Study the challenge of planning to sense/learn
  • Many POMDP methods cope poorly with information-gathering sub-plans that do not provide rewards
• We study this in a slightly simpler setting obtained by:
  • A simpler form of uncertainty: non-stochastic, deterministic actions
  • Structured actions and state (a la STRIPS)
• Extend existing replanning techniques, which focus on fully observable settings, to contingent planning
Our contributions
• Extending replanning techniques to handle this case
• A lazy technique for (not) maintaining the belief state
Replanning (basic idea)
• Pros: very simple, fast, and often effective
• Cons: a greedy approach with the usual drawbacks
  • A simplistic classical model can lead to poor choices
  • Can get caught in dead-ends
  • Smart sampling may reduce these problems
The loop (see the sketch below):
1. Reduce uncertainty: generate a simpler classical problem, e.g. by choosing one initial state
2. Plan for the reduced classical problem
3. Execute the plan until things break, e.g. an observation doesn't agree with the selected state; then go back to step 1
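A skeleton of this loop in Python; the belief interface and the planner/executor callbacks are assumed placeholders, not a real API:

```python
# A skeleton of the generic replanning loop sketched above. The belief
# interface (satisfies_goal, choose_one_state, update) and the
# plan_classical / execute_step callbacks are assumed placeholders.
def replanning_loop(belief, plan_classical, execute_step):
    while not belief.satisfies_goal():
        state = belief.choose_one_state()      # reduce uncertainty
        plan = plan_classical(state)           # solve the classical problem
        for action in plan:
            observation, consistent = execute_step(action)
            belief = belief.update(action, observation)
            if not consistent:                 # things broke: replan
                break
```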
Replanning with PO and Sensing – Take 1
• Determinize the problem by determinizing the current state
• Plan for this one initial state only
• Execute until observations conflict with the deterministic model
• Replan!
Problem: the planner will make no effort to sense; it plans as if it knows everything. We need a more sophisticated model, one that captures the agent's belief state.
Solution: Use Palacios and Geffner's Translation-based Approach
• Explicitly represent the agent's knowledge
  • Knowledge predicates replace regular predicates
  • Kp = know that p is true
• Must ground knowledge on some initial features
• What follows is a short tutorial with zero details.
Translation to Classical Planning
• Maintain predicate values given an initial state
  • p|si: we know that p is true given that si was the initial state (it may be false given that sj was the initial state)
• Kp means that we know that p is true in all valid states
• Revise actions:
  • A conditional effect c → l is transformed into c|si → l|si, one copy per possible initial state si
  • A precondition p is transformed to Kp, i.e. an action can be applied only if its preconditions hold in all valid states
Example (Wumpus, conditioned on initial state s0):
• K(wumpus-at p-2,3)|s0
• K(not (wumpus-at p-3,2))|s0
• K(stench-at p-2,4)|s0
• K(stench-at p-2,2)
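A sketch of this compilation in Python, with illustrative helper names (conditioned, knows, compile_effect) that are assumptions, not the SDR source:

```python
# Per-initial-state knowledge compilation: each fact p gets one copy
# "p|s" per possible initial state s, and Kp abbreviates "p|s holds
# for every still-valid s". Illustrative names only.
def conditioned(fact, state_id):
    return f"{fact}|{state_id}"

def compile_fact(fact, state_ids):
    """All copies of a fact, one per possible initial state."""
    return [conditioned(fact, s) for s in state_ids]

def knows(fact, valid_state_ids, classical_state):
    """Kp: p holds given every initial state still considered valid."""
    return all(conditioned(fact, s) in classical_state
               for s in valid_state_ids)

def compile_effect(cond, literal, state_ids):
    """Conditional effect c -> l becomes one copy c|s -> l|s per state."""
    return [(conditioned(cond, s), conditioned(literal, s))
            for s in state_ids]
```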
Translation to Classical Planning (cont.)
• Sensing actions reveal an unknown predicate p, and hence have effect Kp or K(not p)
• Actions to eliminate states from the belief
  • If the observed value of p disagrees with p|s, then we know that s was not the initial state; the effect marks s as no longer valid
• Warning! Many details are missing…
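One of those details, state elimination, can be sketched as follows (illustrative names; the real translation does this with classical actions and conditional effects rather than Python):

```python
# After observing the true value of p, any sampled initial state whose
# copy p|s predicts the opposite value cannot be the real initial state.
def eliminate_states(observed_fact, observed_value, valid_state_ids,
                     classical_state):
    """Drop every initial state s whose copy p|s contradicts the
    observation; return the surviving state ids."""
    survivors = set()
    for s in valid_state_ids:
        predicted = f"{observed_fact}|{s}" in classical_state
        if predicted == observed_value:
            survivors.add(s)
    return survivors
```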
Replanning with PO and Sensing – Take 2
• Use the knowledge-domain translation
• Feed the translation into a classical planner
• Execute the plan until things break
  • E.g. an observation is inconsistent with expectations
• Replan!
• Still missing…
  • What happens when sensing actions are executed in the knowledge domain?
  • The translation size is often huge!
Replanning with PO and Sensing – Take 2
• Problem 1: Sensing actions translate into non-deterministic actions
• Solution: Determinize sensing by choosing an initial state s0; all observations will be consistent with this state
  • Sensing actions then have conditional (deterministic) effects
• The planner must KNOW the preconditions of actions and the goal
  • It must use explicit sensing actions

Action        Move-right          K-Move-right          K-Sense-right
Precondition  Free-right          K(free-right)         –
Effect        (at 1,1)→(at 2,1)   K(at 1,1)→K(at 2,1)   Free-right→K(free-right)
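A sketch of a determinized sensing effect under s0 (illustrative only; the real translation emits conditional effects in the classical encoding, not Python):

```python
# Determinized sensing under the chosen state s0: the effect is
# conditional and deterministic, because we commit to s0's answer.
def apply_sense(fact, s0, classical_state):
    """Sensing `fact` when s0 makes it true yields K(fact); when s0
    makes it false, it yields K(not fact)."""
    if fact in s0:
        classical_state.add(f"K{fact}")
    else:
        classical_state.add(f"K-not-{fact}")
    return classical_state
```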
• Problem: The translation is often huge
  • Given N possible initial states, there is a copy of each predicate for each initial state (N copies)
  • Each condition in every action is copied 2N times
  • There are actions to eliminate every initial state on every predicate
• Solution: sample a small number of possible initial states

Localize 5×5:
             Original   N=19   N=2
Predicates   24         963    162
Actions      9          1201   130
Conditions   59         2309   421
Start state  –          533    85
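The sampling step itself is simple; a sketch (illustrative, not the SDR code):

```python
# Keep the translation small: draw a few states S from the current
# belief and fix one of them, s0, to determinize what sensing reports.
import random

def sample_states(belief_states, n):
    states = list(belief_states)
    S = random.sample(states, min(n, len(states)))
    s0 = random.choice(S)  # observations are generated as if s0 holds
    return S, s0
```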
Replanning with PO and Sensing – Take 2
To summarize:
1. Sample a subset S out of the possible current states
   • To reduce the translation size
2. Sample s0 from S
   • Base observations on s0
3. Generate the knowledge-domain translation, given S and s0
4. Solve using a classical planner
Still missing … Belief Maintenance
• Must recognize if the goal was reached
• Must recognize if the preconditions of the next action are guaranteed to be true
• Requires maintaining information about the current belief state (the set of valid states)
• This issue is orthogonal to how we generate the plan
Belief Maintenance through Regression
• A (very) lazy approach
  • Maintain b0 as a formula
  • Maintain the history a1,o1,…,at,ot
• Cons: must regenerate a formula on every query
• Pros: the generated formula is focused only on the current query and remains small
To check whether a condition ct holds at the current belief bt:
1. Regress ct through at,ot, resulting in ct-1
2. Continue regressing through at-1,ot-1, and so on
3. Regress through a1,o1, resulting in c0
4. Solve a SAT problem: if b0 ∧ ¬c0 has no satisfying assignment, then ct holds at bt
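A toy sketch of this query in Python. Here "formulas" are Python predicates over states and the SAT check is brute-force enumeration over the finite initial belief; the paper regresses symbolic formulas and hands b0 ∧ ¬c0 to MiniSat. The apply/consistent methods on actions are assumed helpers.

```python
# Toy belief maintenance by regression: conditions are Python
# predicates over states, "SAT" is enumeration over the initial belief.
def regress(condition, action, observation):
    """Condition on the pre-step state equivalent to: IF this run is
    consistent with (action, observation), THEN `condition` holds
    after the step. States inconsistent with the observation leave
    the belief, so they satisfy the regressed condition vacuously."""
    def before(state):
        nxt = action.apply(state)                 # assumed helper
        if not action.consistent(nxt, observation):  # assumed helper
            return True
        return condition(nxt)
    return before

def holds_at_current_belief(condition, initial_states, history):
    """c_t holds at b_t iff b0 AND not(c0) is unsatisfiable, i.e. every
    possible initial state satisfies the fully regressed condition."""
    for action, observation in reversed(history):
        condition = regress(condition, action, observation)
    return all(condition(s) for s in initial_states)
```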
Sample, Determinize, Replan – SDR
PLAN:
1. Select S and s0
2. Translate to classical planning
3. Run a classical planner (FF)
EXECUTE:
4. Regress the goal (solved using MiniSat); if it holds, the goal is achieved – terminate
5. Regress the next action's precondition
6. Execute the action
7. Check observation consistency; if consistent, continue from step 4, otherwise replan
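Putting the phases together, a sketch of the SDR loop; component names such as translate, classical_plan, holds_by_regression, predicted, and the belief/environment interfaces all stand in for the pieces described on the previous slides:

```python
# A sketch of the full SDR loop under the assumed interfaces above.
def sdr(belief, goal, env, sample_size=2):
    while not holds_by_regression(goal, belief):           # goal check
        S, s0 = sample_states(belief.states, sample_size)  # sample + determinize
        plan = classical_plan(translate(S, s0))            # e.g. FF
        for action in plan:
            if not holds_by_regression(action.preconditions, belief):
                break                          # precondition not guaranteed
            observation = env.execute(action)
            belief = belief.advance(action, observation)
            if observation is not None and \
                    observation != predicted(action, s0):
                break                          # sample refuted: replan
```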
CLG vs. SDR
• The CLG translation generates non-deterministic effects for observation actions
  • In offline mode, all possibilities are checked
  • In online mode, the environment is queried (as we do)
• CLG uses a specialized semi-classical planner (an FF variant); SDR can use any black-box planner (experiments use FF)
• CLG uses tags
  • In many (most) cases more efficient than complete states
  • The complete translation still blows up rapidly
Results
Domain          SDR #Actions   SDR Time   CLG #Actions   CLG Time
Wumpus15        92.5           42         103            240.7
Wumpus20        115.9          156.1      160            1224.8
doors13         177.1          25.1       111.8          264.5
doors17         306.8          96.9       PF             X
localize15      56.5           35.6       PF             X
localize17      71.7           75         PF             X
colorballs-9-3  660.2          209.1      227.8          707.1
colorballs-9-7  1343.9         693.3      TF             X
medpks150       88.8           268        CSU            X
medpks199       89.9           502.9      PF             X
Summary
• SDR – a contingent replanner under partial observability
  • Sample a set of possible states from the current belief
  • Create a classical planning translation
  • Execute the plan until the sample is proven invalid or the goal is reached
• SDR is shown to be faster and to scale to larger domains than CLG (the state of the art)
Future Work
• Sensing costs
  • Sensing can have a cost (e.g. sensor warmup)
  • There should be a tradeoff between sensing and acting
  • Remove sensed preconditions – the agent should decide whether it wants to sense or not
• Dead-ends – a well-known pitfall of replanning algorithms
• Smarter sampling techniques
• Scaling up – currently not much better than POMDPs!
Thank you