
Learning Heuristic Search via Imitation

Mohak Bhardwaj, Sanjiban Choudhury, Sebastian Scherer

Motivation · Problem Formulation · Approach and Algorithm

Imitate a clairvoyant oracle planner [2,3] (backward Dijkstra's algorithm).

[Figures: Alternating Gaps and Shifting Gaps environments]

References
1. J. Pearl. Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley, 1984.
2. S. Ross and J. A. Bagnell. Reinforcement and imitation learning via interactive no-regret learning. arXiv, 2014.
3. S. Choudhury, et al. Adaptive information gathering via imitation learning. RSS, 2017.

Construct Graph → Search → Path

A heuristic is a policy that maps the state of the search to the node to expand.

Planning must focus on expected performance on the actual distribution of worlds, which machine learning can optimize directly.

Environments: Single Bugtrap · Multiple Bugtraps · Gaps and Forest · Mazes

The oracle solves the full problem to get the true expansions-to-go.
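Such an oracle can be computed with a single backward Dijkstra pass from the goal. A minimal sketch, assuming an undirected graph given as an adjacency map (names are illustrative, not the authors' implementation):

```python
import heapq

def backward_dijkstra(adj, goal):
    """Clairvoyant oracle: exact cost-to-go from every node to the goal.

    adj: dict node -> list of (neighbor, edge_cost). Because the graph is
    assumed undirected, running Dijkstra outward from the goal yields every
    node's true cost-to-go in one pass.
    """
    cost_to_go = {goal: 0.0}
    pq = [(0.0, goal)]
    while pq:
        c, u = heapq.heappop(pq)
        if c > cost_to_go[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nc = c + w
            if nc < cost_to_go.get(v, float("inf")):
                cost_to_go[v] = nc
                heapq.heappush(pq, (nc, v))
    return cost_to_go
```

With unit edge costs this cost-to-go acts as the oracle's expansions-to-go and serves as the regression target for the learner.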

A heuristic guides the search to minimize the number of expansions [1].

Search MDP: recast search as sequential decision making under uncertainty (over the world map).
- State: the current state of the search
- Action: the node to expand next
- Reward: penalizes each expansion (search effort)
- Transition model: depends on the true map
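As a sketch of this MDP, here is a best-first search loop in which a policy plays the role of the heuristic; the function names and state layout are assumptions for illustration:

```python
def run_search(policy, graph, start, goal, max_expansions=10_000):
    """Best-first search as one MDP episode.

    State: the open and closed lists. Action: which open node to expand.
    Reward: -1 per expansion. Transition: the successors revealed depend
    on the true map (here, `graph`).
    """
    open_list, closed = {start}, set()
    parent = {start: None}
    reward = 0
    while open_list and reward > -max_expansions:
        node = policy(open_list, closed, goal)   # action: pick a node to expand
        open_list.discard(node)
        closed.add(node)
        reward -= 1                              # one unit of search effort
        if node == goal:
            return parent, reward                # success: recover path via `parent`
        for nbr, _cost in graph.get(node, []):   # transition depends on the true map
            if nbr not in parent:                # not yet discovered
                parent[nbr] = node
                open_list.add(nbr)
    return parent, reward                        # failure or budget exhausted
```

A hand-designed heuristic corresponds to a `policy` that returns the open node with minimum f-value; SaIL instead learns this selection rule.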

[Plots: SaIL convergence characteristics; scalability]

SaIL for motion planning:
1. Sample a world from a database.
2. Query the oracle planner for the true cost-to-go (expansions-to-go).
3. Roll in the mixture policy; choose a random action; collect a data point.

- Converges quickly and consistently across environments.
- Converges far faster than model-free RL.
- Approximates search effort from the belief over worlds.
- Detects and escapes local minima.

[Diagram: the Learner imitates the Oracle]

4. Repeat steps (1-3) to add m samples.
Repeat steps (1-4) to train N policies: aggregate data, update the policy, and repeat.
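Put together, training is a DAgger-style loop. A minimal sketch under stated assumptions: `rollin_mixture` is a hypothetical helper that runs the search with a blend of oracle and learner up to a random timestep and returns the search state; `featurize` and `fit` stand in for the feature extractor and regressor; `backward_dijkstra` reuses the oracle sketch above.

```python
import random

def train_sail(worlds, featurize, fit, N=10, m=100, mix_prob=0.5):
    """SaIL outer loop: N policies, m fresh samples each, aggregated data."""
    data, learner = [], None
    for _ in range(N):                                        # train N policies
        for _ in range(m):                                    # add m samples
            world = random.choice(worlds)                     # 1. sample a world
            oracle = backward_dijkstra(world.graph, world.goal)  # 2. query oracle
            # 3. roll in the oracle/learner mixture to a random timestep,
            #    pick a random open node, and label it with the oracle's
            #    cost-to-go (capped for unreachable nodes).
            state = rollin_mixture(world, oracle, learner, mix_prob)
            node = random.choice(list(state.open_list))
            label = oracle.get(node, 1e6)
            data.append((featurize(state, node), label))
        X, y = zip(*data)                                     # aggregate all data so far
        learner = fit(X, y)                                   # update policy and repeat
    return learner
```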

Baselines: Euclidean heuristic; behavior cloning.
Helicopter planning in simulation (XYZH with curvature constraints).

Real-world Quadrotor Planning (Trained completely in Sim)

Training Dataset (Canyon)

Dataset: 70% of gaps below, 30% on top; gap placed uniformly along the wall.

Search as Imitation Learning (SaIL)

Note: Feature calculation should not expend extra search effort!

Compress the search state into a feature vector for each candidate node.

[Figure: A* search (2531 expansions, 7000 ms) vs. SaIL (18 expansions, 100 ms)]

[Figure: SaIL (18 exp) vs. A* (1910 exp)]

Onboard Camera

Third Person View

Path planned onboard in 100 ms

Virtual Maze (Robot View)

Alternative representations: image patches, distance transforms, number of invalid successors/siblings etc.

Representing Search State

Adapts to changes in the world distribution; exploits environment structure.

Feature classes: search-based (including the node's g-value) and world-based.
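As an illustration of features that respect the note above (feature calculation should not expend extra search effort), here is a hypothetical per-node feature vector built only from quantities the search already maintains; the exact feature set differs from the poster's:

```python
import math

def node_features(state, node, goal):
    """Per-node features from bookkeeping the search keeps anyway."""
    g = state.g[node]                          # search-based: cost-from-start
    dx, dy = goal[0] - node[0], goal[1] - node[1]
    return [
        g,                                     # g-value (as on the poster)
        math.hypot(dx, dy),                    # world-based: Euclidean distance to goal
        float(state.depth[node]),              # search-based: tree depth of the node
        float(len(state.open_list)),           # search-based: current open-list size
    ]
```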
