Object Goal Navigation using Goal-oriented Semantic Exploration
Devendra Singh Chaplot
Abhinav Gupta
Ruslan Salakhutdinov
Dhiraj Gandhi
Winner, CVPR 2020 Habitat ObjectNav Challenge: Team Arnold (SemExp)
Webpage: https://devendrachaplot.github.io/projects/semantic-exploration
Object Goal Navigation
Object Goal: dining table
Passive:
Semantic Scene Understanding: object detection and segmentation
Geometric Scene Understanding: understanding navigable space
Active:
Episodic Memory: keeping track of explored and unexplored areas
Learning Semantic Priors: where is ‘dining table’ more likely to be found?
Active Neural SLAM [Chaplot et al. ICLR-20]
Observation ($s_t$)
Sensor Pose Reading ($x'_t$)
Neural SLAM ($f_{SLAM}$)
Map ($m_t$)
Pose Estimate ($\hat{x}_t$)
Global Policy ($\pi_G$)
Long-term goal ($g^l_t$)
$f_{Plan}$
Short-term goal ($g^s_t$)
Local Policy ($\pi_L$)
Action ($a_t$)
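The diagram names a planning step that turns a long-term goal into a short-term goal on the map, but it does not say how. As a minimal sketch only, the block below uses a breadth-first search on a grid as a stand-in; it is not claimed to be the planner the authors use. The function name `short_term_goal`, the `lookahead` parameter, and the list-of-lists grid format are assumptions made for illustration.

```python
from collections import deque

def short_term_goal(obstacles, start, long_term_goal, lookahead=3):
    """Return a waypoint on a shortest path from start to long_term_goal.

    obstacles: list of lists of 0/1, where 1 marks a blocked cell.
    start, long_term_goal: (row, col) tuples.
    Returns the cell `lookahead` steps along the path (or the goal itself if
    it is closer), or None when the goal is unreachable.
    """
    rows, cols = len(obstacles), len(obstacles[0])
    parent = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == long_term_goal:
            break
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and obstacles[nr][nc] == 0 and nxt not in parent):
                parent[nxt] = cell
                queue.append(nxt)
    if long_term_goal not in parent:
        return None
    # Walk back from the goal to recover the path, then reverse it.
    path = []
    cell = long_term_goal
    while cell is not None:
        path.append(cell)
        cell = parent[cell]
    path.reverse()
    return path[min(lookahead, len(path) - 1)]


# Hypothetical 3 x 3 map: the middle row is blocked except its last cell.
grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
waypoint = short_term_goal(grid, start=(0, 0), long_term_goal=(2, 0), lookahead=2)
```

A local policy would then only need to steer toward `waypoint`, which is why a nearby short-term goal is easier to act on than the distant long-term goal.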
Incorporating Semantics
Obstacle Map Representation (Active Neural SLAM)
Obstacle Map (2 × M × M): Obstacles, Explored Area
Semantic Map Representation (SemExp)
Semantic Map (K × M × M), K = C + 2: Semantic categories (C), Obstacles, Explored Area
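To make the K = C + 2 layout concrete, here is a small NumPy sketch of such a map tensor. The category list, the map size, and the channel order (obstacles first, explored area second, categories after) are assumptions for illustration; the slide only fixes the number of channels.

```python
import numpy as np

# Hypothetical category list; the slide only says there are C categories.
CATEGORIES = ["chair", "couch", "dining table"]
C = len(CATEGORIES)
K = C + 2          # obstacle + explored-area channels, plus one per category
M = 8              # tiny map side length, for illustration only

# Assumed channel order: 0 = obstacles, 1 = explored area, 2.. = categories.
OBSTACLE, EXPLORED = 0, 1

def category_channel(name):
    """Index of the map channel that stores the given category."""
    return 2 + CATEGORIES.index(name)

semantic_map = np.zeros((K, M, M), dtype=np.float32)

# Mark one cell as an explored obstacle that belongs to a dining table.
row, col = 3, 5
semantic_map[OBSTACLE, row, col] = 1.0
semantic_map[EXPLORED, row, col] = 1.0
semantic_map[category_channel("dining table"), row, col] = 1.0

print(semantic_map.shape)
```

With C = 0 the same tensor reduces to the 2 × M × M obstacle map of Active Neural SLAM.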
Semantic Mapping
RGB ($I_t$) -> Mask RCNN -> First-person Semantic Predictions
Depth ($D_t$) -> Point Cloud (X, Y, Z)
Point Cloud + Semantic Labels (C1, C2, C3) -> Voxel (C + 1) × H × M × M
Sum across height (category-wise, all obstacles, all cells) -> Projection Map (C + 2) × M × M: Semantic categories (C), Obstacles, Explored Area
Projection Map -> Denoising Network -> Semantic Map Prediction (C + 2) × M × M
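The voxel and projection steps can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: points are assumed to be already in the map frame with Z as height, height bin 0 is assumed to be the floor (so obstacles are points in bins 1 and above), the channel order is assumed, and the Mask RCNN and denoising network stages are left out. The function name `project_to_map` and all of its parameters are hypothetical.

```python
import numpy as np

def project_to_map(points, labels, num_categories, map_size, height_bins,
                   cell_size, bin_height):
    """Bin labelled 3D points into voxels, then sum across height.

    points: (N, 3) array of X, Y, Z coordinates in metres (Z is height).
    labels: (N,) integer array; 0..num_categories-1 for a semantic category,
            -1 for points with no predicted category.
    Returns (voxels, projection) with shapes (C + 1, H, M, M) and (C + 2, M, M).
    """
    C, H, M = num_categories, height_bins, map_size
    voxels = np.zeros((C + 1, H, M, M), dtype=np.float32)

    ix = np.floor(points[:, 0] / cell_size).astype(int)
    iy = np.floor(points[:, 1] / cell_size).astype(int)
    iz = np.floor(points[:, 2] / bin_height).astype(int)
    inside = (ix >= 0) & (ix < M) & (iy >= 0) & (iy < M) & (iz >= 0) & (iz < H)
    ix, iy, iz, labels = ix[inside], iy[inside], iz[inside], labels[inside]

    # Channel 0 counts every point; channels 1..C count points per category.
    np.add.at(voxels[0], (iz, iy, ix), 1.0)
    has_label = labels >= 0
    np.add.at(voxels, (labels[has_label] + 1, iz[has_label],
                       iy[has_label], ix[has_label]), 1.0)

    projection = np.zeros((C + 2, M, M), dtype=np.float32)
    # Assumption: bin 0 is the floor, so obstacles are points in bins >= 1.
    projection[0] = voxels[0, 1:].sum(axis=0) > 0      # all obstacles
    projection[1] = voxels[0].sum(axis=0) > 0          # all cells -> explored area
    projection[2:] = voxels[1:].sum(axis=1) > 0        # category-wise
    return voxels, projection


# Three hypothetical points: floor, unlabelled obstacle, obstacle of category 1.
points = np.array([[0.1, 0.1, 0.0],
                   [0.6, 0.1, 0.3],
                   [0.6, 0.6, 0.3]])
labels = np.array([-1, -1, 1])
voxels, projection = project_to_map(points, labels, num_categories=3,
                                    map_size=4, height_bins=2,
                                    cell_size=0.5, bin_height=0.25)
```

The floor point marks its cell as explored but not as an obstacle, while the labelled point sets the obstacle, explored, and category channels of its cell.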
SemExp Model Overview
Observation (RGBD) ($s_t$)
Sensor Pose Reading ($x_t$)
Object Goal ($G$ = “chair”)
Semantic Mapping
Semantic Map ($m_t$)
Goal-Oriented Semantic Policy
Long-term goal ($g_t$)
Deterministic Local Policy ($\pi_L$)
Action ($a_t$)
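The overview can be read as a loop: update the semantic map, pick a long-term goal, and let the local policy choose an action. The sketch below wires those stages together with placeholder callables. Every name here (`run_episode`, `env_step`, the callable signatures) is hypothetical, and re-sampling the long-term goal every `goal_interval` steps is an assumption rather than something the slide states.

```python
def run_episode(env_step, semantic_mapping, semantic_policy, local_policy,
                first_obs, first_pose, goal_category,
                max_steps=500, goal_interval=25):
    """Run one ObjectNav episode with the stages from the overview.

    All callables are hypothetical placeholders:
      semantic_mapping(sem_map, obs, pose) -> updated semantic map m_t
      semantic_policy(sem_map, goal_category) -> long-term goal g_t
      local_policy(sem_map, pose, long_term_goal) -> action a_t
      env_step(action) -> (obs, pose, done)
    Returns the list of actions taken.
    """
    obs, pose = first_obs, first_pose
    sem_map, long_term_goal = None, None
    actions = []
    for t in range(max_steps):
        sem_map = semantic_mapping(sem_map, obs, pose)
        # Assumption: the long-term goal is re-sampled every goal_interval steps.
        if t % goal_interval == 0:
            long_term_goal = semantic_policy(sem_map, goal_category)
        action = local_policy(sem_map, pose, long_term_goal)
        actions.append(action)
        obs, pose, done = env_step(action)
        if done:
            break
    return actions
```

Keeping the three stages as separate callables mirrors the modular layout of the diagram: the mapping and the local policy can be swapped out without touching the goal-selection policy.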
ObjectGoal Navigation Results
Gibson
Method                       Success Rate   SPL
Random                       0.004          0.004
RGBD + RL [1]                0.082          0.027
RGBD + Semantics + RL [2]    0.159          0.049
Classical Map + FBE          0.403          0.124
Active Neural SLAM [3]       0.446          0.145
SemExp                       0.544          0.199
*Adapted from [1] Savva et al. ICCV-19, [2] Mousavian et al. ICRA-19, [3] Chaplot et al. ICLR-20
ObjectGoal Navigation Results
MP3D
Method                       Success Rate
Random                       0.005
RGBD + RL [1]                0.037
RGBD + Semantics + RL [2]    0.031
Classical Map + FBE          0.311
Active Neural SLAM [3]       0.321
SemExp                       0.360
*Adapted from [1] Savva et al. ICCV-19, [2] Mousavian et al. ICRA-19, [3] Chaplot et al. ICLR-20
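The results report Success Rate and SPL without defining them. SPL (Success weighted by Path Length) is commonly defined as the average over episodes of success multiplied by the ratio of the shortest-path length to the larger of the agent's path length and the shortest-path length. The sketch below computes both metrics under that definition; the function name, the tuple format, and the example episodes are assumptions for illustration, not the challenge's evaluation code or data.

```python
def success_rate_and_spl(episodes):
    """Compute (success rate, SPL) over a list of episodes.

    episodes: list of (success, shortest_path_length, agent_path_length).
    success is True/False; both lengths use the same unit (e.g. metres) and
    shortest_path_length is assumed to be greater than zero.
    """
    n = len(episodes)
    successes = sum(1 for s, _, _ in episodes if s)
    # Failed episodes contribute zero; successful ones contribute their
    # path-efficiency ratio, which is capped at 1.
    spl_total = sum(
        shortest / max(taken, shortest)
        for s, shortest, taken in episodes if s
    )
    return successes / n, spl_total / n


# Hypothetical episodes, not taken from the reported results.
example = [
    (True, 5.0, 10.0),   # reached the goal with twice the optimal path
    (True, 4.0, 4.0),    # reached the goal along the optimal path
    (False, 6.0, 20.0),  # failed
    (False, 3.0, 1.0),   # failed
]
success, spl = success_rate_and_spl(example)
```

Because each successful episode contributes at most 1, SPL can never exceed the success rate, which matches the pattern in the results above.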
Habitat Challenge Leaderboard
                      Test-standard               Minival
Method                SPL     Success  Dist       SPL     Success  Dist
Arnold (SemExp)       0.071   0.179    8.818      0.246   0.467    3.334
Active Exploration    0.041   0.089    9.461      0.108   0.167    5.079
DD-PPO                0.021   0.062    9.316      -       -        -
Blue Ox               0.017   0.060    8.903      0.083   0.133    4.254
SRCB-robot-sudoer     0.002   0.004    10.276     0.124   0.233    4.848
PPO RGBD              -       -        -          0       0        6.055
Random                0.000   0.000    10.330     0       0        6.379
Real-world Transfer
See video at https://devendrachaplot.github.io/projects/semantic-exploration
Object Goal Navigation using Goal-oriented Semantic Exploration
Devendra Singh Chaplot, Dhiraj Gandhi, Abhinav Gupta, Ruslan Salakhutdinov
CVPR 2020
Webpage: https://devendrachaplot.github.io/projects/semantic-exploration
Devendra Singh Chaplot
Webpage: http://devendrachaplot.github.io/
Email: [email protected]
@dchaplot
Thank you