Problem Formulation
Motivation
Approach and Algorithm
Mohak Bhardwaj, Sanjiban Choudhury, Sebastian Scherer
Learning Heuristic Search via Imitation
Imitate clairvoyant oracle planner [2,3] (Backward Dijkstra’s Algorithm)
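A minimal sketch of what a backward Dijkstra oracle could compute: the cost-to-go from every free cell to the goal, obtained by searching outward from the goal. The 4-connected grid, the unit edge costs, the `'#'` obstacle encoding and the name `backward_dijkstra` are assumptions made for illustration and are not taken from the poster.

```python
import heapq

def backward_dijkstra(grid, goal):
    """Cost-to-go from every reachable free cell to `goal`.

    grid: list of strings, '#' marks an obstacle (assumed encoding).
    Edges are 4-connected with unit cost (assumption for this sketch).
    """
    rows, cols = len(grid), len(grid[0])
    cost_to_go = {goal: 0}
    frontier = [(0, goal)]
    while frontier:
        cost, (r, c) = heapq.heappop(frontier)
        if cost > cost_to_go.get((r, c), float("inf")):
            continue  # stale queue entry, a cheaper one was already processed
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_cost = cost + 1
                if new_cost < cost_to_go.get((nr, nc), float("inf")):
                    cost_to_go[(nr, nc)] = new_cost
                    heapq.heappush(frontier, (new_cost, (nr, nc)))
    return cost_to_go

world = ["....",
         ".##.",
         "...."]
values = backward_dijkstra(world, goal=(0, 3))
```

Because the search starts at the goal, a single pass labels every cell, so the oracle can answer queries for any node the learner's search happens to reach.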
Alternating Gaps
Shifting Gaps
References
1. J. Pearl. Heuristics: Intelligent Search Strategies for Computer Problem Solving, 1984.
2. Ross and Bagnell. Reinforcement and imitation learning via interactive no-regret learning. arXiv, 2014.
3. Choudhury, et al. Adaptive information gathering via imitation learning. RSS, 2017.
Construct Graph
Search Path
A heuristic is a policy that maps the state of the search to the node to expand
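To make the "heuristic as policy" view concrete, here is a small sketch of a best-first search in which the heuristic alone decides which open node is expanded next, and the number of expansions is the quantity being counted. The grid encoding, the function names and the choice of a Euclidean heuristic as the example policy are assumptions for illustration.

```python
import heapq
import math

def best_first_search(grid, start, goal, heuristic):
    """Expand the open node with the lowest heuristic score until the goal
    is selected. Returns the number of expansions, or None if no path exists.
    """
    rows, cols = len(grid), len(grid[0])
    open_list = [(heuristic(start, goal), start)]
    seen = {start}
    expansions = 0
    while open_list:
        _, node = heapq.heappop(open_list)   # the policy's choice of node
        if node == goal:
            return expansions
        expansions += 1
        r, c = node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] != "#" and nxt not in seen):
                seen.add(nxt)
                heapq.heappush(open_list, (heuristic(nxt, goal), nxt))
    return None

def euclidean(node, goal):
    return math.hypot(node[0] - goal[0], node[1] - goal[1])

free_world = ["....", "....", "...."]
n_expansions = best_first_search(free_world, (0, 0), (0, 3), euclidean)
```

Swapping `euclidean` for a learned scoring function changes only which node is popped next, which is why the search effort depends so directly on the heuristic.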
Planning must focus on expected performance over the actual distribution of worlds, using machine learning
Single Bugtrap
Multiple Bugtraps
Gaps and forest
Mazes
Solves full problem to get true expansions-to-go
A heuristic guides search to minimize number of expansions [1]
State
Action
Reward
Transition Model: depends on true map
Search MDP
Recast Search as sequential decision making under uncertainty (over World Map).
Scalability
SaIL convergence characteristics
Motion Planning:
1. Sample a world from a database
2. Query oracle planner for
3. Roll-in mixture policy; choose random action; collect
Converges quickly and consistently across environments
Converges much faster than model-free RL
Approximates search effort from belief
detects and escapes local minima
Learner
Oracle
IMITATION
Repeat steps (1-3) to add m samples
Repeat steps (1-4) to train N policies
4. Aggregate data, update policy and repeat
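The four steps and the two repeat loops can be sketched as a data-aggregation loop. This is a toy illustration under several assumptions that are not from the poster: random 6x6 grid worlds, a unit-cost breadth-first oracle, a single Manhattan-distance feature, a least-squares line as the learner, a geometrically decaying mixing weight `beta`, and a collected datapoint of the form (feature, oracle cost-to-go). All function and parameter names are hypothetical.

```python
import random
from collections import deque

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))

def sample_world(rng, size=6, p_obstacle=0.2):
    """Step 1: draw a random grid (True = obstacle); corners stay free."""
    grid = [[rng.random() < p_obstacle for _ in range(size)] for _ in range(size)]
    grid[0][0] = grid[size - 1][size - 1] = False
    return grid

def neighbors(grid, node):
    n = len(grid)
    for dr, dc in MOVES:
        r, c = node[0] + dr, node[1] + dc
        if 0 <= r < n and 0 <= c < n and not grid[r][c]:
            yield (r, c)

def query_oracle(grid, goal):
    """Step 2: cost-to-go of every cell that can reach the goal (unit costs)."""
    dist, queue = {goal: 0}, deque([goal])
    while queue:
        node = queue.popleft()
        for nxt in neighbors(grid, node):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def feature(node, goal):
    return abs(node[0] - goal[0]) + abs(node[1] - goal[1])  # Manhattan distance

def roll_in(grid, start, goal, pick_scorer, steps):
    """Step 3: expand up to `steps` nodes; each step uses one scorer of the mixture."""
    open_set, closed = {start}, set()
    for _ in range(steps):
        node = min(sorted(open_set), key=pick_scorer())
        if node == goal:
            break
        open_set.remove(node)
        closed.add(node)
        open_set.update(n for n in neighbors(grid, node) if n not in closed)
    return sorted(open_set)

def fit(dataset):
    """Step 4: least-squares line through (feature, cost-to-go) pairs."""
    n = len(dataset)
    mx = sum(x for x, _ in dataset) / n
    my = sum(y for _, y in dataset) / n
    var = sum((x - mx) ** 2 for x, _ in dataset)
    w = sum((x - mx) * (y - my) for x, y in dataset) / var if var > 0 else 0.0
    return w, my - w * mx

def train_sail(n_policies=3, m_samples=20, max_steps=15, beta0=0.5, seed=0):
    rng = random.Random(seed)
    dataset, policies, learner = [], [], None
    for i in range(n_policies):                        # repeat steps 1-4
        beta = 1.0 if learner is None else beta0 ** i  # oracle share of the mixture
        for _ in range(m_samples):                     # repeat steps 1-3
            while True:                                # keep only solvable worlds
                grid = sample_world(rng)
                start, goal = (0, 0), (len(grid) - 1, len(grid) - 1)
                oracle = query_oracle(grid, goal)
                if start in oracle:
                    break
            oracle_score = lambda n, o=oracle: o[n]
            learner_score = lambda n, g=goal, p=learner: p[0] * feature(n, g) + p[1]
            def pick_scorer():
                return oracle_score if rng.random() < beta else learner_score
            open_nodes = roll_in(grid, start, goal, pick_scorer,
                                 rng.randrange(max_steps + 1))
            node = rng.choice(open_nodes)              # random action
            dataset.append((feature(node, goal), oracle[node]))
        learner = fit(dataset)                         # aggregate, then update
        policies.append(learner)
    return policies, dataset

policies, dataset = train_sail()
```

Keeping every earlier sample in `dataset` means each new policy is fit on states visited by all previous mixtures, which is the aggregation idea named in step 4.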
Euclidean Behavior Cloning
Helicopter planning in simulation (XYZH with curvature constraints)
Real-world Quadrotor Planning (Trained completely in Sim)
Training Dataset (Canyon)
Dataset
70% gaps below; 30% top
Gap uniformly along wall
Search As Imitation Learning
Note: Feature calculation should not expend extra search effort!
Compress search state to get for each
A* search (2531 expansions, 7000 ms)
SaIL (18 expansions, 100 ms)
SaIL (18 exp)
A* (1910 exp)
Onboard Camera
Third Person View
Path planned onboard in 100 ms
Virtual Maze
Robot View
Alternative representations: image patches, distance transforms, number of invalid successors/siblings etc.
Representing Search State
adapts to change in world distribution
exploits environment structure
Search based: [ , , g, , ]
World based: [ , , , , ]
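As an illustration of a per-node feature vector that costs no extra expansions, the sketch below combines a few quantities the search already holds. Only g (cost from start) and the count of invalid successors are named in the text above. The two distance-to-goal features, the grid encoding and the name `node_features` are hypothetical stand-ins, since the symbols in the feature lists did not survive.

```python
import math

def node_features(node, goal, g_value, grid):
    """Feature vector for one open-list node, built from information the
    search already has. grid: list of strings, '#' = obstacle (assumed)."""
    rows, cols = len(grid), len(grid[0])
    r, c = node
    invalid = 0
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == "#":
            invalid += 1
    return [
        math.hypot(r - goal[0], c - goal[1]),  # straight-line distance to goal (assumed feature)
        abs(r - goal[0]) + abs(c - goal[1]),   # Manhattan distance to goal (assumed feature)
        g_value,                               # cost from start, as tracked by the search
        invalid,                               # blocked or out-of-map successors
    ]

grid = ["....", ".##.", "...."]
features = node_features((0, 1), goal=(2, 3), g_value=1, grid=grid)
```

Each entry is a constant-time lookup or arithmetic on the node's coordinates, in keeping with the note that feature calculation should not expend extra search effort.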