
Learning Heuristic Search via Imitation

Mohak Bhardwaj, Sanjiban Choudhury, Sebastian Scherer

Motivation · Problem Formulation · Approach and Algorithm

Imitate a clairvoyant oracle planner [2,3] (backward Dijkstra's algorithm).

[Figures: Alternating Gaps and Shifting Gaps environments]

References
1. J. Pearl. Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley, 1984.
2. S. Ross and J. A. Bagnell. Reinforcement and imitation learning via interactive no-regret learning. arXiv, 2014.
3. S. Choudhury, et al. Adaptive information gathering via imitation learning. RSS, 2017.

Construct Graph → Search → Path

A heuristic is a policy that maps the state of the search to the node to expand.

Planning must focus on expected performance on the actual distribution of worlds, which machine learning can optimize directly.

Environments: Single Bugtrap · Multiple Bugtraps · Gaps and Forest · Mazes

The oracle solves the full problem to get the true expansions-to-go.
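Such an oracle can be computed with a single backward Dijkstra pass from the goal. A minimal sketch, assuming an undirected graph given as an adjacency map (names are illustrative, not the authors' implementation):

```python
import heapq

def backward_dijkstra(adj, goal):
    """Clairvoyant oracle: exact cost-to-go from every node to the goal.

    adj: dict node -> list of (neighbor, edge_cost). Because the graph is
    assumed undirected, running Dijkstra outward from the goal yields every
    node's true cost-to-go in one pass.
    """
    cost_to_go = {goal: 0.0}
    pq = [(0.0, goal)]
    while pq:
        c, u = heapq.heappop(pq)
        if c > cost_to_go[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nc = c + w
            if nc < cost_to_go.get(v, float("inf")):
                cost_to_go[v] = nc
                heapq.heappush(pq, (nc, v))
    return cost_to_go
```

With unit edge costs this cost-to-go acts as the oracle's expansions-to-go and serves as the regression target for the learner.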

A heuristic guides the search to minimize the number of expansions [1].

Search MDP: recast search as sequential decision making under uncertainty (over the world map).
- State: the current state of the search
- Action: the node to expand next
- Reward: penalizes each expansion (search effort)
- Transition model: depends on the true map
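As a sketch of this MDP, here is a best-first search loop in which a policy plays the role of the heuristic; the function names and state layout are assumptions for illustration:

```python
def run_search(policy, graph, start, goal, max_expansions=10_000):
    """Best-first search as one MDP episode.

    State: the open and closed lists. Action: which open node to expand.
    Reward: -1 per expansion. Transition: the successors revealed depend
    on the true map (here, `graph`).
    """
    open_list, closed = {start}, set()
    parent = {start: None}
    reward = 0
    while open_list and reward > -max_expansions:
        node = policy(open_list, closed, goal)   # action: pick a node to expand
        open_list.discard(node)
        closed.add(node)
        reward -= 1                              # one unit of search effort
        if node == goal:
            return parent, reward                # success: recover path via `parent`
        for nbr, _cost in graph.get(node, []):   # transition depends on the true map
            if nbr not in parent:                # not yet discovered
                parent[nbr] = node
                open_list.add(nbr)
    return parent, reward                        # failure or budget exhausted
```

A hand-designed heuristic corresponds to a `policy` that returns the open node with minimum f-value; SaIL instead learns this selection rule.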

[Plots: SaIL convergence characteristics; scalability]

SaIL for motion planning:
1. Sample a world from a database.
2. Query the oracle planner for the true cost-to-go (expansions-to-go).
3. Roll in the mixture policy; choose a random action; collect a data point.

- Converges quickly and consistently across environments.
- Converges far faster than model-free RL.
- Approximates search effort from the belief over worlds.
- Detects and escapes local minima.

[Diagram: the Learner imitates the Oracle]

4. Repeat steps (1-3) to add m samples.
Repeat steps (1-4) to train N policies: aggregate data, update the policy, and repeat.
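Put together, training is a DAgger-style loop. A minimal sketch under stated assumptions: `rollin_mixture` is a hypothetical helper that runs the search with a blend of oracle and learner up to a random timestep and returns the search state; `featurize` and `fit` stand in for the feature extractor and regressor; `backward_dijkstra` reuses the oracle sketch above.

```python
import random

def train_sail(worlds, featurize, fit, N=10, m=100, mix_prob=0.5):
    """SaIL outer loop: N policies, m fresh samples each, aggregated data."""
    data, learner = [], None
    for _ in range(N):                                        # train N policies
        for _ in range(m):                                    # add m samples
            world = random.choice(worlds)                     # 1. sample a world
            oracle = backward_dijkstra(world.graph, world.goal)  # 2. query oracle
            # 3. roll in the oracle/learner mixture to a random timestep,
            #    pick a random open node, and label it with the oracle's
            #    cost-to-go (capped for unreachable nodes).
            state = rollin_mixture(world, oracle, learner, mix_prob)
            node = random.choice(list(state.open_list))
            label = oracle.get(node, 1e6)
            data.append((featurize(state, node), label))
        X, y = zip(*data)                                     # aggregate all data so far
        learner = fit(X, y)                                   # update policy and repeat
    return learner
```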

Baselines: Euclidean heuristic; behavior cloning.
Helicopter planning in simulation (XYZH with curvature constraints).

Real-world Quadrotor Planning (Trained completely in Sim)

Training Dataset (Canyon)

Dataset: 70% of gaps below, 30% on top; gap placed uniformly along the wall.

Search as Imitation Learning (SaIL)

Note: Feature calculation should not expend extra search effort!

Compress the search state into a feature vector for each candidate node.

[Figure: A* search (2531 expansions, 7000 ms) vs. SaIL (18 expansions, 100 ms)]

[Figure: SaIL (18 exp) vs. A* (1910 exp)]

Onboard Camera

Third Person View

Path planned onboard in 100 ms

Virtual Maze (Robot View)

Alternative representations: image patches, distance transforms, number of invalid successors/siblings etc.

Representing Search State

Adapts to changes in the world distribution; exploits environment structure.

Feature classes: search-based (including the node's g-value) and world-based.
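As an illustration of features that respect the note above (feature calculation should not expend extra search effort), here is a hypothetical per-node feature vector built only from quantities the search already maintains; the exact feature set differs from the poster's:

```python
import math

def node_features(state, node, goal):
    """Per-node features from bookkeeping the search keeps anyway."""
    g = state.g[node]                          # search-based: cost-from-start
    dx, dy = goal[0] - node[0], goal[1] - node[1]
    return [
        g,                                     # g-value (as on the poster)
        math.hypot(dx, dy),                    # world-based: Euclidean distance to goal
        float(state.depth[node]),              # search-based: tree depth of the node
        float(len(state.open_list)),           # search-based: current open-list size
    ]
```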
