Date post: 13-Jul-2020
Scene Understanding with 3D Deep Networks Thomas Funkhouser Princeton University
Transcript
Page 1: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep

Scene Understanding

with 3D Deep Networks

Thomas Funkhouser

Princeton University

Page 2:

Disclaimer: I am talking about the work of these people …

Shuran Song, Andy Zeng, Fisher Yu

Angela Dai, Matthias Niessner, Matt Fisher, Jianxiong Xiao

Maciej Halber, Yinda Zhang

Page 3:

Goal

Understanding indoor scenes observed in RGB-D images

• Robotics

• Augmented reality

• Virtual tourism

• Surveillance

• Home remodeling

• Real estate

• Telepresence

• Forensics

• Games

• etc.

Page 4:

Goal

Understanding indoor scenes observed in RGB-D images

Input RGB-D Image(s)

Semantic Segmentation

Page 5:

Goal

Understanding indoor scenes observed in RGB-D images in 3D

3D Scene Understanding

Input RGB-D Image(s)

Semantic Segmentation

Page 6:

Goal

Understanding indoor scenes observed in RGB-D images in 3D

• Surface reconstruction

• Amodal object detection

• Object relationships

• Materials, lights, etc.

• Physical properties

• Novel views

• Info sharing

• Spatial inference

• Simulation

• etc.

Semantic Segmentation

Page 7:

Goal for This Talk

Learn ConvNets to recognize patterns in voxels

• Local shape descriptor

• Amodal object detection

• Semantic scene completion

Page 8:

Talk Outline

Local shape descriptor

Amodal object detection

Semantic scene completion

Scale

Small

Large

Page 9:

Talk Outline

Local shape descriptor

Amodal object detection

Semantic scene completion

Scale

Small

Large

A. Zeng, S. Song, M. Niessner, M. Fisher, J. Xiao, T. Funkhouser, “3DMatch: Learning Local Geometric Descriptors from 3D Reconstructions,” submitted to CVPR 2017

Page 10:

Local Shape Descriptor

Goal: train a discriminating 3D local shape descriptor from data

Local shape descriptor Local shape descriptor

…0.58 0.21 0.92 0.67 0.04 0.53

Match!

0.58 0.21 0.92 0.67 0.04 0.53 …
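
The matching step illustrated on this slide can be sketched as a nearest-neighbor search over descriptor vectors. This is only a minimal sketch: the Euclidean distance, the `max_distance` threshold, and the first candidate vector are assumptions made for illustration and are not given on the slide.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptor(query, candidates, max_distance=0.1):
    """Return (index, distance) of the closest candidate descriptor,
    or None when even the closest one is farther than max_distance.
    max_distance is a hypothetical threshold, not a value from the talk."""
    best_index, best_distance = None, float("inf")
    for index, candidate in enumerate(candidates):
        distance = l2_distance(query, candidate)
        if distance < best_distance:
            best_index, best_distance = index, distance
    if best_index is None or best_distance > max_distance:
        return None
    return best_index, best_distance


# Descriptor values copied from the slide; the first candidate is made up.
query = [0.58, 0.21, 0.92, 0.67, 0.04, 0.53]
candidates = [
    [0.10, 0.80, 0.30, 0.20, 0.90, 0.40],
    [0.58, 0.21, 0.92, 0.67, 0.04, 0.53],
]
result = match_descriptor(query, candidates)
```

Here `result` reports the second candidate as the match, because its descriptor is identical to the query.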

Page 11:

Local Shape Descriptor

Challenge: where to get training data?

Page 12:

Local Shape Descriptor: “3D Match”

Approach: train on wide-baseline correspondences in RGB-D reconstructions

“Ground truth” match between RGB-D images from different views

Page 13:

Local Shape Descriptor: “3D Match”

Approach: train on wide-baseline correspondences in RGB-D reconstructions

Page 14:

Local Shape Descriptor: “3D Match”

Method: sample true/false correspondences from RGB-D reconstructions, train Siamese network
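
A Siamese network applies the same descriptor network to both patches of a pair and is trained so that true correspondences end up close and false ones far apart. The slide does not name the loss; the sketch below assumes a standard contrastive loss with a hypothetical `margin` of 1.0, and takes the two descriptors as already computed.

```python
import math


def contrastive_loss(desc_a, desc_b, is_match, margin=1.0):
    """Contrastive loss for one descriptor pair (an assumed loss; the slide
    only says that a Siamese network is trained on true/false pairs).

    True correspondences are pulled together (loss = d^2); false ones are
    pushed apart until they are at least `margin` away."""
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(desc_a, desc_b)))
    if is_match:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical 2-D descriptors, used only to exercise the function.
loss_true_pair = contrastive_loss([0.0, 0.0], [0.5, 0.0], is_match=True)
loss_false_close = contrastive_loss([0.0, 0.0], [0.5, 0.0], is_match=False)
loss_false_far = contrastive_loss([0.0, 0.0], [2.0, 0.0], is_match=False)
```

A false pair that is already farther apart than the margin contributes no loss, so training effort concentrates on the confusable pairs.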

Page 15:

Local Shape Descriptor: “3D Match”

Result: learns to discriminate local shapes found in real-world data

Page 16:

Local Shape Descriptor: “3D Match” Results

Result 1: learned feature descriptor predicts RGB-D point correspondences more accurately than hand-tuned descriptors

Match classification error at 95% recall

Fragment Alignment Success Rate

Page 17:

Local Shape Descriptor: “3D Match” Results

Result 2: feature descriptor learned from RGB-D reconstructions provides matching for recognizing poses of small objects in the Amazon Picking Challenge

Predicting pose of 3D object model in RGB-D scan

Object pose prediction accuracy

Page 18:

Local Shape Descriptor: “3D Match” Results

Result 3: feature descriptor learned from RGB-D reconstructions provides discriminative matching of semantic correspondences on 3D meshes

Page 19:

Talk Outline

Local Shape Descriptor

Amodal object detection

Semantic scene completion

Scale

Small

Large

S. Song and J. Xiao, “Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images,” CVPR 2016

Page 20:

Object Detection

Goal: given an RGB-D image, find objects (labeled 3D amodal bounding boxes)

Input: single RGB-D image

Output: labeled 3D amodal boxes

Page 21:

Object Detection

Most previous work:

[CVPR13] Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images

[IJCV14] Indoor Scene Understanding with RGB-D Images: Bottom-up Segmentation, Object Detection and Semantic Segmentation

[ECCV14] Object Detection and Segmentation using Semantically Rich Image and Depth Features

[CVPR15] Aligning 3D Models to RGB-D Images of Cluttered Scenes

[CVPR16] Cross Modal Distillation for Supervision Transfer

[Figure: pipeline of most previous work, with stages labeled 3D Input (Image, Depth Map), 2D Operations, and 3D Output (3D Amodal Detection Result). Step labels: Encode Depth Map as Extra Channels, 2D Contour Detection, 2D Region Proposal, 2D Object Detection, 2D Instance Segmentation, Coarse Pose Classification, Point Cloud Alignment]

Page 22:

Object Detection: “Deep Sliding Shapes”

Approach:

3D Deep Learning

[Figure: 3D Input (Image, Depth Map) → 3D Operations → 3D Output (3D Amodal Detection Result)]

Page 23:

Object Detection: “Deep Sliding Shapes”

[Figure: RGB-D Image → Region Proposal Network → Object Recognition Network → “bed”]

Page 25:

Object Detection: “Deep Sliding Shapes”

Data encoding:

1) Estimate major directions of room

2) Compute TSDF

Page 26:

Object Detection: “Deep Sliding Shapes”

Data encoding:

1) Estimate major directions of room

2) Compute TSDF

[Figure: TSDF volume with extents labeled 5.2 m, 5.2 m, and 2.5 m]
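
A truncated signed distance function (TSDF) stores, for every voxel, a clamped signed distance to the observed surface. The sketch below computes the simpler projective variant for a single voxel, measured along the camera ray; this variant, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the `truncation` distance, and the toy depth map are all assumptions for illustration and may differ from the encoding used in the talk.

```python
def projective_tsdf(voxel_center, depth_map, fx, fy, cx, cy, truncation=0.1):
    """Projective TSDF value in [-1, 1] for one voxel, or None if unobserved.

    voxel_center is (x, y, z) in camera coordinates in meters, with z
    pointing forward. depth_map is a list of rows of depths in meters,
    with 0 marking missing measurements. All parameters are illustrative."""
    x, y, z = voxel_center
    if z <= 0:
        return None  # behind the camera
    # Pinhole projection of the voxel center to the nearest pixel.
    u = int(round(fx * x / z + cx))
    v = int(round(fy * y / z + cy))
    if not (0 <= v < len(depth_map) and 0 <= u < len(depth_map[0])):
        return None  # projects outside the image
    depth = depth_map[v][u]
    if depth <= 0:
        return None  # no depth measurement at this pixel
    # Positive in front of the surface (free space), negative behind it.
    signed_distance = depth - z
    return max(-1.0, min(1.0, signed_distance / truncation))


# Toy 4x4 depth map: a flat wall 2 m in front of the camera.
wall = [[2.0] * 4 for _ in range(4)]
near_surface = projective_tsdf((0.0, 0.0, 1.95), wall, 2.0, 2.0, 2.0, 2.0)
free_space = projective_tsdf((0.0, 0.0, 1.0), wall, 2.0, 2.0, 2.0, 2.0)
behind_wall = projective_tsdf((0.0, 0.0, 3.0), wall, 2.0, 2.0, 2.0, 2.0)
off_image = projective_tsdf((10.0, 0.0, 1.0), wall, 2.0, 2.0, 2.0, 2.0)
```

Evaluating this function at every voxel center of the room-aligned volume would give the dense grid that a 3D ConvNet takes as input.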

Page 28:

Object Detection: “Deep Sliding Shapes”

3D region proposal network:

[Figure: TSDF → Region Proposal Network → 3D Region Proposals]

Page 29:

Object Detection: “Deep Sliding Shapes”

3D region proposal network:

[Figure: Pixel Area vs. Physical Size, with labels ×3 and ×50]

Page 30:

Object Detection: “Deep Sliding Shapes”

Multiscale 3D region proposal network:

Page 31:

Object Detection: “Deep Sliding Shapes”

Multiscale 3D region proposal network:

[Network diagram: Input: TSDF → Conv 1, ReLU + Pool → Conv 2, ReLU + Pool]

Page 32:

Object Detection: “Deep Sliding Shapes”

Multiscale 3D region proposal network:

[Network diagram: Input: TSDF → Conv 1, ReLU + Pool → Conv 2, ReLU + Pool → Conv 3, ReLU + Pool → Conv Class (Softmax) and Conv 3D Box (Smooth L1)]

Receptive field: 0.4 m³
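
The two output heads in the diagram pair a classification output with a softmax loss and a 3D box output with a smooth L1 loss. The sketch below writes out the standard textbook form of both losses for a single anchor; the input values are hypothetical, and how the talk weights or normalizes the two terms is not stated on the slide.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Softmax cross-entropy for one anchor, computed in a numerically
    stable way by subtracting the largest logit before exponentiating."""
    largest = max(logits)
    log_sum = largest + math.log(sum(math.exp(l - largest) for l in logits))
    return log_sum - logits[target_index]


def smooth_l1(predicted, target):
    """Smooth L1 loss summed over box parameters: quadratic for small
    errors (|e| < 1) and linear for large ones."""
    total = 0.0
    for p, t in zip(predicted, target):
        error = abs(p - t)
        total += 0.5 * error * error if error < 1.0 else error - 0.5
    return total


# Hypothetical values for one anchor: object/non-object logits and a
# two-parameter box offset (a real 3D box has more parameters).
class_loss = softmax_cross_entropy([0.0, 0.0], target_index=0)
box_loss = smooth_l1([0.5, 3.0], [0.0, 1.0])
```

The smooth L1 form limits the influence of badly mislocalized anchors compared with a squared error, which is the usual reason for choosing it in box regression.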

Page 33:

Object Detection: “Deep Sliding Shapes”

Multiscale 3D region proposal network:

[Network diagram: Input: TSDF → Conv 1, ReLU + Pool → Conv 2, ReLU + Pool → Conv 3, ReLU + Pool → Conv Class (Softmax) and Conv 3D Box (Smooth L1)]

Receptive field: 0.4 m³

Level 1 Anchors

0.6×0.2×0.4 m

0.5×0.5×0.2 m

0.6×0.2×0.4 m
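
Anchor boxes such as the 0.6×0.2×0.4 m box listed on this slide are typically compared with ground-truth boxes by 3D intersection over union (IoU) to decide which anchors count as positives. The sketch below assumes axis-aligned boxes given by a center and a size; the helper names are hypothetical, and any IoU thresholds used in the talk are not stated here.

```python
def box_from_center(center, size):
    """Axis-aligned box as (min_corner, max_corner) from center and size."""
    lo = tuple(c - s / 2.0 for c, s in zip(center, size))
    hi = tuple(c + s / 2.0 for c, s in zip(center, size))
    return lo, hi


def volume(box):
    """Volume of an axis-aligned box given as (min_corner, max_corner)."""
    lo, hi = box
    result = 1.0
    for a, b in zip(lo, hi):
        result *= b - a
    return result


def iou_3d(box_a, box_b):
    """Intersection over union of two axis-aligned 3D boxes."""
    intersection = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(box_a[0], box_a[1], box_b[0], box_b[1]):
        overlap = min(hi_a, hi_b) - max(lo_a, lo_b)
        if overlap <= 0:
            return 0.0  # boxes are disjoint along this axis
        intersection *= overlap
    return intersection / (volume(box_a) + volume(box_b) - intersection)


# An anchor of the size shown on the slide, placed at the origin.
anchor = box_from_center((0.0, 0.0, 0.0), (0.6, 0.2, 0.4))
same_box = iou_3d(anchor, anchor)
far_box = iou_3d(anchor, box_from_center((5.0, 0.0, 0.0), (0.6, 0.2, 0.4)))
# Two unit cubes shifted by half a unit along x share 0.5 of 1.5 total volume.
half_shift = iou_3d(((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
                    ((0.5, 0.0, 0.0), (1.5, 1.0, 1.0)))
```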

Page 34:

Object Detection: “Deep Sliding Shapes”

Multiscale 3D region proposal network:

[Network diagram: Input: TSDF → Conv 1, ReLU + Pool → Conv 2, ReLU + Pool → Conv 3, ReLU + Pool → Conv Class (Softmax) and Conv 3D Box (Smooth L1), receptive field 0.4 m³; Conv 4, ReLU + Pool → Conv Class (Softmax) and Conv 3D Box (Smooth L1), receptive field 1 m³]

Page 35:

Object Detection: “Deep Sliding Shapes”

Level 2 Anchors

[Network diagram: Conv 4, ReLU + Pool → Conv Class (Softmax) and Conv 3D Box (Smooth L1)]

Receptive field: 1 m³

Page 36:

Object Detection: “Deep Sliding Shapes”

[Figure: RGB-D Image → Region Proposal Network → Object Recognition Network → “bed”]

Page 37:

project to 2D

Object Detection: “Deep Sliding Shapes”

Joint object recognition network:
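
The “project to 2D” step can be sketched as projecting the eight corners of a 3D box into the image and taking their 2D bounding rectangle, which then selects an image patch. This is a minimal sketch assuming an axis-aligned box in camera coordinates and a pinhole camera; the intrinsics and box values are hypothetical.

```python
from itertools import product


def project_box_to_2d(box_min, box_max, fx, fy, cx, cy):
    """Project a 3D box (camera coordinates, z forward, in meters) to its
    2D bounding rectangle (u_min, v_min, u_max, v_max) in pixels."""
    us, vs = [], []
    # product over (min, max) per axis enumerates the eight corners.
    for x, y, z in product(*zip(box_min, box_max)):
        if z <= 0:
            raise ValueError("box must lie entirely in front of the camera")
        us.append(fx * x / z + cx)
        vs.append(fy * y / z + cy)
    return min(us), min(vs), max(us), max(vs)


# Hypothetical 1 m wide box between 2 m and 4 m from a 640x480 camera.
rectangle = project_box_to_2d((-0.5, -0.5, 2.0), (0.5, 0.5, 4.0),
                              fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

The nearest face of the box determines the rectangle here, because closer points project farther from the image center.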

Page 38:

TSDF

Image Patch

Object Detection: “Deep Sliding Shapes”

Joint object recognition network:

Page 39:

Object Detection: “Deep Sliding Shapes”

Joint object recognition network:

Page 40:

Object Detection: “Deep Sliding Shapes”

Joint object recognition network:

[Network diagram labels: 3D ConvNet (Conv 1, ReLU + Pool; Conv 2, ReLU + Pool; Conv 3, ReLU); FC 2; 2D VGG on ImageNet; Concatenation; FC 3; FC Class (Softmax); FC 3D Box (Smooth L1)]

Page 41:

Object Detection: “Deep Sliding Shapes” Experiments

Train and test on amodal boxes provided in SUN RGB-D

S. Song, S. Lichtenberg, and J. Xiao, “SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite,” CVPR 2015

Page 42:

Object Detection: “Deep Sliding Shapes” Results

Quantitative comparisons:

[Table: object detection accuracy on NYU v2 dataset (mAP), with methods grouped as 3D Non-Deep Learning, 2D Deep Learning, and 3D Deep Learning]

Page 43:

Sliding Shapes: sofa Ours: bathtub

Object Detection: “Deep Sliding Shapes” Results

Qualitative comparisons:

Page 44:

Sliding Shapes: chair Ours: sofa

Object Detection: “Deep Sliding Shapes” Results

Qualitative comparisons:

Page 45:

Sliding Shapes: table Ours: bed

Object Detection: “Deep Sliding Shapes” Results

Qualitative comparisons:

Page 46:

Sliding Shapes: miss Ours: table and chairs

Object Detection: “Deep Sliding Shapes” Results

Qualitative comparisons:

Page 47:

Sliding Shapes: toilet Ours: garbage bin+bed

Object Detection: “Deep Sliding Shapes” Results

Qualitative comparisons:

Page 48:

Talk Outline

Local Shape Descriptor

Amodal object detection

Semantic scene completion

Scale

Small

Large

S. Song, F. Yu, A. Zeng, A. Chang, M. Savva, and T. Funkhouser, “Semantic Scene Completion from a Single Depth Image,” submitted to CVPR 2017

Page 49:

Input: Single view depth map Output: Semantic scene completion

Semantic Scene Completion

Goal: given an RGB-D image, label all voxels by semantic class

Page 50:

3D Scene

visible surface

free space

occluded space

outside view

outside room

Semantic Scene Completion

Goal: given an RGB-D image, label all voxels by semantic class
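
The voxel categories named on this slide can be told apart by comparing each voxel with the depth map. The sketch below is one hedged way to do that for a single voxel; the pinhole intrinsics, the `surface_band` tolerance, the axis-aligned `room_bounds`, the extra "unknown" label for missing depth, and the order of the checks are all assumptions, not details from the talk.

```python
def classify_voxel(center, depth_map, fx, fy, cx, cy,
                   room_bounds=None, surface_band=0.02):
    """Assign one of the slide's labels to a voxel center (camera
    coordinates in meters, z forward). depth_map is a list of rows."""
    x, y, z = center
    if z <= 0:
        return "outside view"
    u = int(round(fx * x / z + cx))
    v = int(round(fy * y / z + cy))
    if not (0 <= v < len(depth_map) and 0 <= u < len(depth_map[0])):
        return "outside view"
    if room_bounds is not None:
        lo, hi = room_bounds
        if any(c < a or c > b for c, a, b in zip(center, lo, hi)):
            return "outside room"
    depth = depth_map[v][u]
    if depth <= 0:
        return "unknown"  # missing depth; not one of the slide's labels
    if abs(z - depth) <= surface_band:
        return "visible surface"
    return "free space" if z < depth else "occluded space"


# Toy 4x4 depth map: a flat surface 2 m in front of the camera.
depth = [[2.0] * 4 for _ in range(4)]
camera = dict(fx=2.0, fy=2.0, cx=2.0, cy=2.0)
room = ((-1.0, -1.0, 0.0), (1.0, 1.0, 2.5))
labels = [
    classify_voxel((0.0, 0.0, 1.0), depth, **camera),
    classify_voxel((0.0, 0.0, 2.0), depth, **camera),
    classify_voxel((0.0, 0.0, 3.0), depth, **camera),
    classify_voxel((5.0, 0.0, 1.0), depth, **camera),
    classify_voxel((0.0, 0.0, 3.0), depth, room_bounds=room, **camera),
]
```

Only the occluded voxels inside the view and the room need a learned prediction; the other categories follow directly from the geometry of the observation.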

Page 51:

visible surface

free space

occluded space

outside view

outside room

3D Scene

Semantic Scene Completion

Goal: given an RGB-D image, label all voxels by semantic class

Page 52:

Semantic Scene Completion

Prior work: segmentation OR completion

• Surface segmentation: Silberman et al.

• Scene completion: Firman et al.

• Semantic scene completion: this paper

The occupancy and the object identity are tightly intertwined!

[Figure: 3D Scene]

Page 53:

Semantic Scene Completion: “SSCNet”

Approach: end-to-end deep network

[Figure: Input: single view depth map → SSCNet (3D ConvNet) → Output: semantic scene completion; prediction: N+1 classes]

Page 54:

Semantic Scene Completion: “SSCNet”

Page 56:

Semantic Scene Completion: “SSCNet”

Encode 3D space using flipped TSDF

Page 57:

Semantic Scene Completion: “SSCNet”

Encode 3D space using flipped TSDF

Voxel size: 0.02 m
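
The slide does not give the formula for the flipped TSDF. A common definition, assumed here, is d_flipped = sign(d) · (d_max − |d|), so that the magnitude is largest at the surface and falls to zero at the truncation distance. The sketch below applies that assumed formula to a single value; `d_max` is a hypothetical truncation distance.

```python
import math


def flipped_tsdf(d, d_max=1.0):
    """Flip a truncated signed distance so that its magnitude peaks at the
    surface: sign(d) * (d_max - |d|). This formula is an assumption; the
    slide only names the encoding."""
    d = max(-d_max, min(d_max, d))  # truncate to [-d_max, d_max]
    return math.copysign(d_max - abs(d), d)


# Values just in front of and just behind a surface get large magnitudes,
# while values at the truncation distance go to zero.
in_front = flipped_tsdf(0.1)
behind = flipped_tsdf(-0.1)
far_away = flipped_tsdf(1.0)
```

With a standard TSDF the steepest change sits at the truncation boundary in empty space, whereas the flipped form concentrates it at the surface, where the geometry of interest lies.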

Page 58:

Semantic Scene Completion: “SSCNet”

Local geometry

Receptive field: 0.98 m

Page 59:

Semantic Scene Completion: “SSCNet”

High-level 3D context via big receptive field provided by dilated convolution

Receptive field: 2.26 m

Page 60:

Semantic Scene Completion: “SSCNet”

Multi-scale aggregation

Receptive field: 0.98 m

Receptive field: 1.62 m

Receptive field: 2.26 m
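
The three receptive fields on this slide can be checked with the usual recurrence: each layer grows the field by (kernel − 1) × dilation × the product of earlier strides. The layer list below is an assumption, chosen because it reproduces the slide's numbers at a 0.02 m voxel size; the slide itself does not list the layers.

```python
def receptive_fields(layers, voxel_size):
    """Receptive field after each layer, as (voxels, meters) pairs.

    layers is a list of (kernel, stride, dilation) tuples applied in order."""
    field, jump = 1, 1
    fields = []
    for kernel, stride, dilation in layers:
        field += (kernel - 1) * dilation * jump
        jump *= stride
        fields.append((field, round(field * voxel_size, 2)))
    return fields


# Assumed layer stack (kernel, stride, dilation), not taken from the slide.
layers = (
    [(7, 2, 1)]          # first convolution, stride 2
    + [(3, 1, 1)] * 2    # two 3x3x3 convolutions
    + [(2, 2, 1)]        # pooling, stride 2
    + [(3, 1, 1)] * 4    # four 3x3x3 convolutions
    + [(3, 1, 2)] * 4    # four dilated 3x3x3 convolutions, dilation 2
)
fields = receptive_fields(layers, voxel_size=0.02)
local_geometry = fields[7]
mid_context = fields[9]
high_level_context = fields[11]
```

Under these assumptions the dilated layers add 16 voxels of context each without further downsampling, which is how the field grows from 0.98 m to 2.26 m at a fixed output resolution.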

Page 61:

Semantic Scene Completion: “SSCNet” Experiments

Where to get training data?

Page 62:

Semantic Scene Completion: “SSCNet” Experiments

Where to get training data?

No dense volumetric ground truth with semantic labels for a complete scene

• SUN3D: no semantic labels

• NYU: only visible surfaces

Page 63:

Semantic Scene Completion: “SSCNet” Experiments

SUNCG dataset

Page 64:

Semantic Scene Completion: “SSCNet” Experiments

SUNCG dataset

• 46K houses

• 50K floors

• 400K rooms

• 5.6M object instances

Page 65:

Semantic Scene Completion: “SSCNet” Experiments

SUNCG dataset

[Figure panels: synthetic camera views; depth; ground truth semantic scene completion]

Page 67:

Train on SUNCG Test on NYU

Semantic Scene Completion: “SSCNet” Experiments

Page 68:

Semantic Scene Completion: “SSCNet” Results

Result: better than previous volumetric completion algorithms

Comparison to previous algorithms for volumetric completion

Page 69:

[Figure panels: Color Image; Observed Surface; Zhang et al.; Firman et al.; Ours (SSCNet); Ground Truth]

Page 70:

Semantic Scene Completion: “SSCNet” Results

Result: better than previous 3D model fitting algorithms

Comparison to previous algorithms for 3D model fitting

Page 71:

[Figure panels: Color Image; Observed Surface; Lin et al.; Geiger and Wang; Ours (SSCNet); Ground Truth]

Page 74:

Summary

Three projects where ConvNets are trained to recognize patterns in voxels with different …

• Tasks

• Scales

• Training data

• Loss functions

• Network architectures

• Training protocols

Page 75:

Future Challenges

Acquiring larger data sets

Leveraging geometric structure

Leveraging semantic structure

Better integration of RGB and D

Better surface parameterizations

Finer-grained categories

Higher resolution

etc.

Page 77:

Future Challenges

Acquiring larger data sets

Leveraging geometric structure

Leveraging semantic structure

Better integration of RGB and D

Better surface parameterizations

Finer-grained categories

Higher resolution

etc.

1,500 surface reconstructions, 36,213 labeled objects

A. Dai, A. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Niessner, “ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes,” submitted to CVPR 2017.

Page 78:

Future Challenges

Acquiring larger data sets

Leveraging geometric structure

Leveraging semantic structure

Better integration of RGB and D

Better surface parameterizations

Finer-grained categories

Higher resolution

etc.

M. Halber, T. Funkhouser, “Fine-to-Coarse Registration of RGB-D Scans,” submitted to CVPR 2017

Page 81:

Future Challenges

Acquiring larger data sets

Leveraging geometric structure

Leveraging semantic structure

Better integration of RGB and D

Better surface parameterizations

Finer-grained categories

Higher resolution

etc.

[Figure: scene labeled “Sleeping Area” with object labels: ottoman, bed, sofa, dresser with mirror, dresser, nightstand, lamp, wall]

Y. Zhang, M. Bai, J. Xiao, P. Kohli, and S. Izadi, “DeepContext: Context-Encoding Neural Pathways for 3D Holistic Scene Understanding,” submitted to CVPR 2017

Page 82:

Acknowledgments

Princeton:

• Angel Chang, Maciej Halber, Manolis Savva, Elena Sizikova, Shuran Song, Jianxiong Xiao, Fisher Yu, Yinda Zhang, Andy Zeng

Collaborators:

• Angela Dai, Matt Fisher, Matthias Niessner, Ersin Yumer

Data:

• SUN3D, 7-Scenes, Analysis-by-Synthesis, NYU, Trimble, Planner5D

Funding:

• Intel, NSF, Adobe

Thank You!

