Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation
Neural Information Processing Systems (NeurIPS), 2020
Zhiwei Deng, Karthik Narasimhan, Olga Russakovsky
Humans communicate with robots through language
Robots interact with environments: they perceive visual information, perform planning, and take actions
Human language
Environment observation
Planning
Vision-and-Language Navigation Task
Unseen environment
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments, Peter Anderson et al., CVPR 2018
Photorealistic images
Facing the end of the bed, take an immediate right and exit the bedroom through the open doorway. Walk straight until you see a large red painting. At the painting make a turn towards and go through the doorway on the right of the painting. Enter the bedroom, you will reach your destination
Human-annotated instructions
Start / Target
Navigation in a room
Challenge 1: Reason over observations and language
Challenge 2: Perform error correction and recovery
Deviate from correct path
Incorrect action / Correct action
Current navigation architectures
Agent / Action / Encoder / Decision space
Existing navigation architectures for VLN: constrained local decision space
Observation + decision space
Alignment confusion
Need to make multi-step decisions, making error correction harder
Our work: Evolving Graphical Planner
A differentiable graphical planner
Evolving Graphical Structure
Proxy graphs for planning
Graph Pool
Message Passing
Graph Unpool
Graph-augmented supervision
Condensation
Standard VLN navigation agent
A differentiable graphical planner: global decision space helps
Topological map
Visit to expand / Visit to stop
A differentiable graphical planner: Graphical memory – topological connections + raw features
Graphical memory
Instructions
Observations (visual + angle)
$G_t = (V_t, E_t)$
$v_i^t = (\text{visual}_i^t, \text{angle}_i^t)$
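A minimal sketch of such a graphical memory, assuming a plain dictionary-and-set representation. The class name `GraphMemory`, its method names, and the toy node ids and feature values are hypothetical and only illustrate how a topological map $G_t = (V_t, E_t)$ can grow while exposing every observed but unvisited node as a global decision space.

```python
from dataclasses import dataclass, field


@dataclass
class GraphMemory:
    """Topological map G_t = (V_t, E_t) that grows as the agent moves."""
    features: dict = field(default_factory=dict)  # node id -> (visual, angle)
    edges: set = field(default_factory=set)       # undirected edges as frozensets
    visited: set = field(default_factory=set)     # nodes the agent has stood on

    def add_node(self, node, visual, angle):
        # Keep the first feature recorded for a node.
        self.features.setdefault(node, (visual, angle))

    def visit(self, node, neighbors):
        """Mark `node` as visited and attach its newly observed neighbors.

        `neighbors` maps neighbor id -> (visual, angle).
        """
        if node not in self.features:
            raise KeyError(f"unknown node: {node!r}")
        self.visited.add(node)
        for nb, (visual, angle) in neighbors.items():
            self.add_node(nb, visual, angle)
            self.edges.add(frozenset((node, nb)))

    def decision_space(self):
        """Global decision space: every observed but unvisited node."""
        return sorted(set(self.features) - self.visited)


# Toy usage with made-up features.
memory = GraphMemory()
memory.add_node("bed", visual=[0.1], angle=0.0)
memory.visit("bed", {"door": ([0.2], 1.57), "window": ([0.3], 3.14)})
memory.visit("door", {"hall": ([0.4], 0.0)})
print(memory.decision_space())  # ['hall', 'window']
```

Note that "window", seen at the first step, stays selectable after the agent has moved on, which is what separates a global decision space from a local one.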
o Grounding: global alignment
o Follow the memorized path
o Decision made in single step
o Easier error correction
Ever-expanding graph…
A differentiable graphical planner: Proxy graphs
Operate on the full graph: high planning cost
Pool
Hierarchical Graph Representation Learning with Differentiable Pooling, Ying et al. NeurIPS’18
$\tilde{V}_t = A_t^{\top} V_t$
$\tilde{E}_t = A_t^{\top} E_t A_t$
$G_t = (V_t, E_t)$, $\tilde{G}_t = (\tilde{V}_t, \tilde{E}_t)$
Pooling matrix $A_t$: soft “attention” or aggregation over the original graph
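The pooling step can be sketched in NumPy as follows, assuming a row-wise softmax assignment in the style of differentiable pooling. The function name `soft_pool`, the random logits standing in for a learned scoring function, and the toy sizes are all assumptions for illustration.

```python
import numpy as np


def soft_pool(V, E, scores):
    """Pool an N-node graph into P proxy nodes.

    V: (N, d) node features, E: (N, N) adjacency, scores: (N, P) raw logits.
    Returns the pooling matrix A (N, P), pooled features (P, d),
    and pooled adjacency (P, P).
    """
    # Row-wise softmax: each original node spreads one unit of mass
    # over the P proxy nodes.
    shifted = scores - scores.max(axis=1, keepdims=True)
    A = np.exp(shifted)
    A /= A.sum(axis=1, keepdims=True)
    V_pooled = A.T @ V       # pooled node features
    E_pooled = A.T @ E @ A   # pooled connectivity
    return A, V_pooled, E_pooled


rng = np.random.default_rng(0)
N, P, d = 6, 3, 4
V = rng.normal(size=(N, d))
E = np.ones((N, N)) - np.eye(N)  # fully connected toy graph
A, V_p, E_p = soft_pool(V, E, rng.normal(size=(N, P)))
print(V_p.shape, E_p.shape)  # (3, 4) (3, 3)
```

Because every row of the pooling matrix sums to one, the total edge mass of the original graph is preserved in the pooled graph. The proxy graph has a fixed size P however large the original graph grows.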
Pooling matrix $A_t$: obtained from $f(G_t, \text{language}, \text{agent state})$
Neural message passing: GraphNeuralNetworks($G_t$, $k$ = steps)
Relational inductive biases, deep learning, and graph networks, Battaglia et al., arXiv’18
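As a rough illustration of running k steps of message passing on a small graph, the sketch below uses a simple mean-aggregation update with a tanh nonlinearity. This update rule, the function name `message_passing`, and the weight matrices are assumptions chosen for brevity, not the update used in the talk.

```python
import numpy as np


def message_passing(V, E, W_self, W_nbr, steps):
    """Run `steps` rounds of mean-aggregation message passing.

    V: (N, d) node features, E: (N, N) non-negative adjacency,
    W_self, W_nbr: (d, d) weight matrices (hypothetical parameters).
    """
    # Normalize each row so a node averages over its neighbors.
    deg = E.sum(axis=1, keepdims=True)
    E_norm = E / np.maximum(deg, 1e-8)
    H = V
    for _ in range(steps):
        H = np.tanh(H @ W_self + E_norm @ H @ W_nbr)
    return H


# Path graph 0 - 1 - 2 with 2-dimensional features.
V = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
E = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
W = 0.5 * np.eye(2)
H0 = message_passing(V, E, W, W, steps=0)  # no propagation
H3 = message_passing(V, E, W, W, steps=3)  # information travels up to 3 hops
```

With `steps=0` the features pass through unchanged, which corresponds to planning without any propagation between nodes.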
Un-pool
Pooling matrix $A_t$: its transpose serves as the un-pool matrix
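A small sketch of the un-pool step, assuming pooling was done with $A_t^{\top}$ so that un-pooling multiplies by $A_t$. Scoring nodes by a dot product with a query vector and masking non-candidates are illustrative assumptions, and `unpool_and_score` is a hypothetical name.

```python
import numpy as np


def unpool_and_score(A, H_pooled, query, candidate_mask):
    """Map proxy-node features back to the full graph and score candidates.

    A: (N, P) pooling matrix, H_pooled: (P, d) proxy-node features,
    query: (d,) agent/language query vector (an assumption of this sketch),
    candidate_mask: (N,) bool, True for nodes the agent may choose.
    Assumes at least one candidate is available.
    """
    H_full = A @ H_pooled    # un-pool: (N, d)
    logits = H_full @ query  # one score per original node
    logits = np.where(candidate_mask, logits, -np.inf)
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()


# Hard assignment of 4 nodes to 2 proxy nodes, for readability.
A = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
H_pooled = np.array([[1.0, 0.0], [0.0, 1.0]])
query = np.array([0.0, 2.0])
mask = np.array([False, True, True, True])  # node 0 is not selectable
probs = unpool_and_score(A, H_pooled, query, mask)
next_node = int(np.argmax(probs))
```

The resulting distribution is defined over nodes of the original graph, so the proposed next action can be any selectable node rather than only a neighbor of the current position.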
Propose next action
A differentiable graphical planner: Proxy graphs – multi-channel
$\{\tilde{G}_t^{k} = (\tilde{V}_t^{k}, \tilde{E}_t^{k})\}, \; k = 1, \dots, K$
A differentiable graphical planner: how to supervise the imitation learner?
Self-Monitoring Navigation Agent for Vision-and-Language Navigation, Ma et al., ICLR’19
Speaker-Follower Models for Vision-and-Language Navigation, Fried & Hu et al., NeurIPS’18
The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation, Ma et al., ICCV’19
Expert trajectories are provided
How to use expert trajectory supervision?
Option 1: “teacher forcing”
Expert trajectory dataset: $D = \{(a_1, a_2, \dots, a_{T_i})_i\}$
$P(a_1, a_2, \dots, a_T \mid s) = P(a_1 \mid s) \prod_{t=2}^{T} P(a_t \mid a_1, a_2, \dots, a_{t-1}, s)$
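The factorization above turns the sequence likelihood into a sum of per-step log-probabilities, which is what a teacher-forcing loss minimizes. The sketch below uses hand-written step distributions in place of a real policy; `sequence_log_prob` and the toy action names are hypothetical.

```python
import math


def sequence_log_prob(step_distributions, expert_actions):
    """log P(a_1, ..., a_T | s) as a sum of per-step log-probabilities.

    step_distributions[t] maps each action to its probability given the
    expert prefix a_1..a_{t-1} and the start state s (teacher forcing:
    the prefix always comes from the expert, never from the agent).
    """
    return sum(
        math.log(dist[action])
        for dist, action in zip(step_distributions, expert_actions)
    )


# Made-up policy outputs along one expert trajectory.
dists = [
    {"left": 0.5, "right": 0.5},
    {"forward": 0.8, "stop": 0.2},
    {"stop": 1.0},
]
expert = ["right", "forward", "stop"]
log_p = sequence_log_prob(dists, expert)
loss = -log_p  # negative log-likelihood minimized during training
```

Here the trajectory probability is 0.5 × 0.8 × 1.0 = 0.4, so the loss is −log 0.4.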
Option 1: “teacher forcing” – drifting issue on unseen data
Option 2: “student forcing”
Option 2: “student forcing” – generate new supervision (shortest path)
$D^* = \{(a_1^*, a_2^*, \dots, a_{T_i}^*)_i\}$
$D \cup D^*$
Option 2: “student forcing” – shortest-path supervision mismatch
Option 2: “student forcing” – graph-augmented supervision
Decision space
o Ground truth always exists
o No mismatch problem
o No need to access the environment
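One way these three properties could arise is sketched below. It assumes the supervision target is the first node of the expert trajectory that the agent has not yet visited. That selection rule, the function name `graph_augmented_target`, and the node ids are assumptions for illustration, not necessarily the rule used in the talk.

```python
def graph_augmented_target(expert_path, visited, observed):
    """Pick a supervision target from the agent's own graph.

    expert_path: list of node ids from start to goal.
    visited: set of nodes the agent has stood on (includes the start).
    observed: set of all nodes currently in the graph memory.

    Assumed rule: target the first expert node not yet visited. Its
    predecessor on the expert path has been visited, so the node has
    already been observed and is part of the graph; no query to the
    simulator is needed.
    """
    for node in expert_path:
        if node not in visited:
            if node not in observed:
                raise ValueError(f"expert node {node!r} missing from the graph")
            return node
    # Every expert node has been visited: the goal is the place to stop.
    return expert_path[-1]


expert = ["a", "b", "c", "d"]
# The agent followed a -> b, then drifted off the expert path to x.
visited = {"a", "b", "x"}
observed = {"a", "b", "c", "x", "y"}
target = graph_augmented_target(expert, visited, observed)  # "c"
```

Under this rule the agent that drifted to "x" is still supervised toward "c", a node that lies on the annotated path, so the target agrees with the instruction even after a deviation.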
A differentiable graphical planner: full training process
Instructions
Observations (visual + angle)
Graphical memory
Multi-channel planner
Action
Loss
A differentiable graphical planner: test-time inference matches training
Instructions
Observations (visual + angle)
Graphical memory
Multi-channel planner
Action
Experiments
• Room-to-Room (R2R): all trajectories are shortest paths; the benchmark emphasizes goal reaching
Contribution of each component
[Figure: Does the global decision space help? Success rate (%) for Top-K = 3, Top-K = 5, Top-K = 10, and Top-K = All]
The global decision space, the planner, and the new supervision strategy all improve navigation success rate
[Figure: Proxy graph choices, success rate (%): message passing steps = 0, message passing steps = 3, multi-channel planner]
[Figure: Supervision strategy for imitation learner, success rate (%): shortest path vs. graph-augmented (ours)]
Comparison to existing backbones
We outperform previous backbone architectures
Room-for-Room with pure imitation learning
• Room-for-Room (R4R): measured by Coverage weighted by Length Score (CLS), normalized Dynamic Time Warping (nDTW), and Success rate weighted normalized Dynamic Time Warping (SDTW); the benchmark emphasizes path following
We achieve state-of-the-art results using pure imitation learning
Contributions
o A differentiable graphical planner that extends the decision space globally
o A new supervision strategy for training imitation-learning agents in navigation
o Proxy graphs that improve the efficiency of planning
Email: [email protected]