Post on 08-Aug-2020

Object Goal Navigation using Goal-oriented Semantic Exploration

Devendra Singh Chaplot

Abhinav Gupta

Ruslan Salakhutdinov

Dhiraj Gandhi

Winner, CVPR 2020 Habitat ObjectNav Challenge: Team Arnold (SemExp)

Webpage: https://devendrachaplot.github.io/projects/semantic-exploration

Object Goal Navigation

Object Goal: dining table

Passive

Semantic Scene Understanding: object detection and segmentation

Geometric Scene Understanding: understanding navigable space

Active

Episodic Memory: keeping track of explored and unexplored areas

Learning Semantic Priors: where is ‘dining table’ more likely to be found?
Active Neural SLAM [Chaplot et al., ICLR-20]

Inputs: Observation ($s_t$) and Sensor Pose Reading ($x'_t$)

Neural SLAM ($f_{SLAM}$): produces the Map ($m_t$) and the Pose Estimate ($\hat{x}_t$)

Global Policy ($\pi_G$): produces the Long-term goal ($g^l_t$)

Planner ($f_{Plan}$): produces the Short-term goal ($g^s_t$)

Local Policy ($\pi_L$): produces the Action ($a_t$)

Incorporating Semantics

Obstacle Map Representation (Active Neural SLAM): Obstacle Map ($2 \times M \times M$) with channels for Obstacles and Explored Area

Semantic Map Representation (SemExp): Semantic Map ($K \times M \times M$) with channels for Obstacles, Explored Area, and Semantic categories ($C$)

$K = C + 2$
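To make the K = C + 2 layout concrete, here is a minimal NumPy sketch of a semantic map tensor. The sizes, the channel order (obstacles first, explored area second, categories after), and the helper `mark_cell` are assumptions made for illustration, not details taken from the slides.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
C = 3        # number of semantic categories
M = 8        # map side length in cells
K = C + 2    # obstacles + explored area + C categories

# Assumed channel order: 0 = obstacles, 1 = explored area, 2.. = categories.
OBSTACLE, EXPLORED, FIRST_CATEGORY = 0, 1, 2

semantic_map = np.zeros((K, M, M), dtype=np.float32)


def mark_cell(sem_map, row, col, category=None, obstacle=True):
    """Mark one cell as explored, optionally as an obstacle of a given category."""
    sem_map[EXPLORED, row, col] = 1.0
    if obstacle:
        sem_map[OBSTACLE, row, col] = 1.0
    if category is not None:
        sem_map[FIRST_CATEGORY + category, row, col] = 1.0


mark_cell(semantic_map, 2, 5, category=1)        # an object of category 1
mark_cell(semantic_map, 4, 4, obstacle=False)    # free space that has been seen
```

With C = 0 the same tensor reduces to the 2 × M × M obstacle map used by Active Neural SLAM.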

Semantic Mapping

RGB ($I_t$) -> Mask RCNN -> First-person Semantic Predictions

Depth ($D_t$) -> Point Cloud (X, Y, Z)

Point Cloud + First-person Semantic Predictions -> Semantic Labels (C1, C2, C3) for each point

Labeled points -> Voxel representation, $(C + 1) \times H \times M \times M$

Voxel -> sum across height -> Projection Map, $(C + 2) \times M \times M$:

Category-wise -> Semantic categories ($C$)

All obstacles -> Obstacles

All cells -> Explored Area

Projection Map -> Denoising Network -> Semantic Map Prediction, $(C + 2) \times M \times M$
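The "sum across height" step can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation. It assumes that voxel channel 0 holds occupancy and channels 1..C hold categories, and that "all obstacles" means occupancy inside a height band the agent could collide with. The function name `project_voxels` and the parameters `z_min` and `z_max` are hypothetical.

```python
import numpy as np


def project_voxels(voxels, z_min, z_max):
    """Collapse a (C + 1, H, M, M) voxel grid into a (C + 2, M, M) top-down map.

    Assumed input layout: channel 0 = occupancy, channels 1..C = categories.
    Output layout: 0 = obstacles, 1 = explored area, 2.. = categories.
    """
    band = voxels[:, z_min:z_max]             # height slice relevant for collisions
    obstacles = band[0].sum(axis=0)           # "all obstacles": occupancy in the band
    explored = voxels[0].sum(axis=0)          # "all cells": occupancy at any height
    categories = band[1:].sum(axis=1)         # "category-wise": one channel per class
    proj = np.concatenate([obstacles[None], explored[None], categories], axis=0)
    return (proj > 0).astype(np.float32)      # binarize the summed counts


# Toy example: C = 2 categories, H = 4 height bins, 3 x 3 map.
voxels = np.zeros((3, 4, 3, 3), dtype=np.float32)
voxels[0, 0, 0, 0] = 1.0    # a floor point, below the collision band
voxels[0, 2, 1, 2] = 1.0    # an occupied point inside the band
voxels[1, 2, 1, 2] = 1.0    # the same point, labeled as the first category

projection = project_voxels(voxels, z_min=1, z_max=3)
```

In this toy input the floor point contributes only to the explored channel, while the labeled point sets the obstacle, explored, and first category channels. The denoising network that follows in the slides is not modeled here.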

SemExp Model Overview

Inputs: Observation (RGBD) ($s_t$), Sensor Pose Reading ($x_t$), and Object Goal ($G$ = “chair”)

Semantic Mapping: produces the Semantic Map ($m_t$)

Goal-Oriented Semantic Policy: produces the Long-term goal ($g_t$)

Deterministic Local Policy ($\pi_L$): produces the Action ($a_t$)
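The data flow of the overview (mapping, then long-term goal selection, then a deterministic local step) can be illustrated with a toy grid-world loop. Everything in this sketch is a stand-in: the sensor, the rule that replaces the learned goal-oriented policy, the greedy local policy, and the stopping test are hypothetical simplifications, not the SemExp components themselves.

```python
import numpy as np

M = 5
world = np.zeros((M, M), dtype=int)
world[3, 4] = 1    # one object of category 1 (the goal object in this toy setup)


def observe(world, pose):
    """Toy sensor: (row, col, label) for every cell within one step of the agent."""
    r, c = pose
    return [(rr, cc, int(world[rr, cc]))
            for rr in range(max(r - 1, 0), min(r + 2, M))
            for cc in range(max(c - 1, 0), min(c + 2, M))]


def semantic_mapping(sem_map, observation):
    """Write observed cells into the map: 0 = obstacles, 1 = explored, 2.. = categories."""
    for rr, cc, label in observation:
        sem_map[1, rr, cc] = 1.0
        if label > 0:
            sem_map[0, rr, cc] = 1.0
            sem_map[1 + label, rr, cc] = 1.0
    return sem_map


def goal_policy(sem_map, goal_label):
    """Stand-in for the learned policy: go to the goal if mapped, else to unexplored space."""
    seen = np.argwhere(sem_map[1 + goal_label] > 0)
    target = seen[0] if len(seen) > 0 else np.argwhere(sem_map[1] == 0)[0]
    return int(target[0]), int(target[1])


def local_policy(pose, goal):
    """Deterministic greedy step: move one cell toward the goal, rows first."""
    (r, c), (gr, gc) = pose, goal
    if r != gr:
        return r + int(np.sign(gr - r)), c
    if c != gc:
        return r, c + int(np.sign(gc - c))
    return pose


pose = (0, 0)
sem_map = np.zeros((3, M, M), dtype=np.float32)    # K = C + 2 with C = 1
for _ in range(50):
    sem_map = semantic_mapping(sem_map, observe(world, pose))
    goal = goal_policy(sem_map, goal_label=1)
    goal_seen = bool(sem_map[2].any())
    if goal_seen and max(abs(pose[0] - goal[0]), abs(pose[1] - goal[1])) <= 1:
        break    # stop once the agent is next to the mapped goal object
    pose = local_policy(pose, goal)
```

The point of the sketch is the loop structure: the map is the only state carried between steps, and the long-term goal is recomputed from it before each local action.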

Demo Video: https://youtu.be/h56dA2uxpGU

ObjectGoal Navigation Results*

Gibson

Method                      Success Rate  SPL
Random                      0.004         0.004
RGBD + RL [1]               0.082         0.027
RGBD + Semantics + RL [2]   0.159         0.049
Classical Map + FBE         0.403         0.124
Active Neural SLAM [3]      0.446         0.145
SemExp                      0.544         0.199

MP3D

Method                      Success Rate
Random                      0.005
RGBD + RL [1]               0.037
RGBD + Semantics + RL [2]   0.031
Classical Map + FBE         0.311
Active Neural SLAM [3]      0.321
SemExp                      0.360

The SPL values listed with the MP3D chart (0.004, 0.027, 0.049, 0.124, 0.145, 0.199) are identical to the Gibson SPL values.

*Adapted from [1] Savva et al., ICCV-19; [2] Mousavian et al., ICRA-19; [3] Chaplot et al., ICLR-20
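For readers unfamiliar with the SPL metric reported above: SPL (Success weighted by Path Length) averages, over episodes, the success indicator multiplied by the ratio of the shortest-path length to the larger of the agent's path length and the shortest-path length. A minimal sketch of that standard definition follows; the function name and the example numbers are made up for illustration and are not results from the talk.

```python
def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length, averaged over episodes.

    successes        : 1 if the episode succeeded, else 0
    shortest_lengths : shortest-path distance from start to goal (must be > 0)
    path_lengths     : length of the path the agent actually travelled
    """
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        total += s * l / max(p, l)
    return total / len(successes)


# Three made-up episodes: a success with a detour, a failure, an optimal success.
example = spl(successes=[1, 0, 1],
              shortest_lengths=[5.0, 4.0, 2.0],
              path_lengths=[10.0, 8.0, 2.0])
print(example)  # 0.5
```

Because every term is at most 1, SPL can never exceed the success rate, which is consistent with the SPL column being lower than the Success Rate column in the results above.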

Habitat Challenge Leaderboard

                     Test-standard              Minival
Method               SPL    Success  Dist       SPL    Success  Dist
Arnold (SemExp)      0.071  0.179    8.818      0.246  0.467    3.334
Active Exploration   0.041  0.089    9.461      0.108  0.167    5.079
DD-PPO               0.021  0.062    9.316      -      -        -
Blue Ox              0.017  0.060    8.903      0.083  0.133    4.254
SRCB-robot-sudoer    0.002  0.004    10.276     0.124  0.233    4.848
PPO RGBD             -      -        -          0      0        6.055
Random               0.000  0.000    10.330     0      0        6.379

Real-world Transfer

See video at https://devendrachaplot.github.io/projects/semantic-exploration

Object Goal Navigation using Goal-oriented Semantic Exploration

Devendra Singh Chaplot, Dhiraj Gandhi, Abhinav Gupta, Ruslan Salakhutdinov

CVPR 2020

Webpage: https://devendrachaplot.github.io/projects/semantic-exploration

Devendra Singh Chaplot

Webpage: http://devendrachaplot.github.io/

Email: chaplot@cs.cmu.edu

Twitter: @dchaplot

Thank you