Reconstructing PASCAL VOC

Sara Vicente*, Anthropics Technology

Lourdes Agapito, University College London

Jorge Batista, ISR - University of Coimbra

João Carreira*, UC Berkeley / ISR

* First two authors contributed equally

Data Matters

1960. Goal: everything. Test data: toy images. Training data: 3D models.

1990. Goal: image classification. Test data: cropped images. Training data: hundreds of images, class labels.

2010. Goal: object localization. Test data: simple images. Training data: 10K-1M images, class labels, segmentations and keypoints (example labels: Person, Motorbike).

Present

Renewed interest in joint object reconstruction and recognition

Seeing 3D chairs: exemplar part-based 2D-3D alignment using a large dataset of CAD models, M. Aubry, D. Maturana, A. Efros, B. Russell and J. Sivic

Estimating Image Depth Using Shape Collections, H. Su, Q. Huang, N. Mitra, Y. Li and L. Guibas

Beyond PASCAL: A Benchmark for 3D Object Detection in the Wild, Y. Xiang, R. Mottaghi and S. Savarese

Detailed 3D Representations for Object Recognition and Modeling, Z. Zia, M. Stark, B. Schiele and K. Schindler

Image-based Synthesis and Re-Synthesis of Viewpoints Guided by 3D Models. K. Rematas, T. Ritschel, M. Fritz, and T. Tuytelaars

Parsing IKEA objects: Fine Pose Estimation. J. Lim, H. Pirsiavash and A. Torralba

But the great recognition datasets (PASCAL VOC, ImageNet), which took years to collect and which everyone uses, have only 2D annotations

PASCAL VOC annotations:

Available: class labels (e.g. Person, Motorbike), segmentations, keypoints (not shown)

Unavailable: aligned 3D shapes

Proposed Solution

Bootstrap reconstructions for all objects in detection datasets from existing 2D annotations

Facilitate a new attack on joint recognition and reconstruction

Available Reconstructed

Class-based Reconstruction – Prior Work

A Morphable Model for the Synthesis of 3D Faces, Volker Blanz and Thomas Vetter, SIGGRAPH 1999

What shape are dolphins? Building 3D morphable models from 2D images, Thomas J. Cashman and Andrew W. Fitzgibbon, PAMI 2013

Model Evolution: An Incremental Approach to Non-Rigid Structure from Motion, Shengqi Zhu, Li Zhang, Brandon M. Smith, CVPR 2010

Morphable models built from (ordered from more to less information):

• Multiple 3D scans

• Single 3D mesh + 2D data

• 2D data (non-rigid SFM)

But… how? Example images from PASCAL VOC: birds, chairs, aeroplanes, boats

Key Idea

Assume for each object in a class there are a small number of similar ones seen from different viewpoints (shape surrogates)

Target Object Other objects in same category

Reconstruct an object using standard rigid multiview techniques with the images of surrogates as additional views

Surrogates are hard to identify, so perform viewpoint-biased sampling

Proposed Approach

Jointly over all objects in a class:

1. Viewpoint Estimation (Rigid Structure from Motion)

For each object:

2. 3D Reconstruction (Visual Hull Sampling)

3. Reconstruction Ranking

Output: ranked reconstructions (arrow label: Better)

Step 1 of 3: Class-based Viewpoint Estimation


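Factorization-based rigid SFM:

\[
\underbrace{\begin{bmatrix}
x_1^1 & \cdots & x_1^k \\
y_1^1 & \cdots & y_1^k \\
x_2^1 & \cdots & x_2^k \\
y_2^1 & \cdots & y_2^k \\
\vdots & & \vdots \\
x_N^1 & \cdots & x_N^k \\
y_N^1 & \cdots & y_N^k
\end{bmatrix}}_{\text{Measurement matrix (known)}}
=
\underbrace{\begin{bmatrix}
M_1 \\ M_2 \\ \vdots \\ M_N
\end{bmatrix}}_{\text{Motion matrices (unknown)}}
\times
\underbrace{\begin{bmatrix}
x^1 & x^2 & \cdots & x^k \\
y^1 & y^2 & \cdots & y^k \\
z^1 & z^2 & \cdots & z^k
\end{bmatrix}}_{\text{Shape (unknown)}}
\]

Estimating 3D shape from degenerate sequences with missing data, Manuel Marques, João Paulo Costeira, CVIU 2009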
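The factorization can be illustrated with a rank-3 SVD. This is only a minimal sketch under simplifying assumptions that the slides do not make: all keypoints are visible in every instance (no missing data), cameras are orthographic, and no metric upgrade is applied, so motion and shape are recovered only up to an affine ambiguity. The function name `factorize` and the synthetic data are hypothetical.

```python
import numpy as np

def factorize(W):
    """Rank-3 affine factorization of a 2N x k measurement matrix W.

    Returns motion M (2N x 3), shape S (3 x k) and per-row translations
    t (2N x 1) such that M @ S + t approximates W.
    Assumes complete data; missing keypoints are NOT handled here.
    """
    t = W.mean(axis=1, keepdims=True)            # centroid of each image row
    U, s, Vt = np.linalg.svd(W - t, full_matrices=False)
    root = np.sqrt(s[:3])                        # split singular values evenly
    M = U[:, :3] * root                          # stacked motion matrices
    S = root[:, None] * Vt[:3]                   # 3D keypoint coordinates
    return M, S, t

# Synthetic check: 5 instances of a rigid shape with 12 keypoints.
rng = np.random.default_rng(0)
S_true = rng.normal(size=(3, 12))
cams = []
for _ in range(5):
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    cams.append(Q[:2])                           # two orthonormal rows: orthographic camera
W = np.vstack(cams) @ S_true                     # noise-free measurement matrix

M, S, t = factorize(W)
residual = np.abs(M @ S + t - W).max()
```

With noise-free input the reprojection residual is at the level of floating-point error. Real class-level data would additionally need the missing-data handling and robustness that the cited method addresses.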


Idea: exploit segmentation information. Occluded keypoints should project inside the silhouette

Figure panels: Original view, Side view

Estimated keypoints (occluded)

Estimated keypoints (visible)

Ground truth keypoints (only visible ones are available)


Estimated elevation for airplanes:

Step 2 of 3: 3D Reconstruction (Visual Hull)

Well-known multiview reconstruction algorithm

Efficient

Easy to implement

Multiple views of same aeroplane model
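A visual hull can be sketched as voxel carving: a voxel survives only if it projects inside every silhouette. The block below is a minimal illustration, assuming orthographic cameras given as 3x3 rotation matrices and silhouettes that cover the square [-extent, extent]^2. The function name `visual_hull` and all parameters are hypothetical and not taken from the released code.

```python
import numpy as np

def visual_hull(masks, rotations, res=32, extent=1.0):
    """Carve a res^3 voxel grid using binary silhouettes.

    masks:     list of H x W boolean arrays (True inside the object).
    rotations: list of 3 x 3 rotation matrices, one per mask.
    Assumes orthographic projection onto the first two rotated axes.
    """
    axis = np.linspace(-extent, extent, res)
    X, Y, Z = np.meshgrid(axis, axis, axis, indexing="ij")
    pts = np.stack([X.ravel(), Y.ravel(), Z.ravel()])   # 3 x res^3 voxel centres
    inside = np.ones(pts.shape[1], dtype=bool)
    for mask, R in zip(masks, rotations):
        h, w = mask.shape
        uv = (R @ pts)[:2]                               # orthographic projection
        col = np.round((uv[0] + extent) / (2 * extent) * (w - 1)).astype(int)
        row = np.round((uv[1] + extent) / (2 * extent) * (h - 1)).astype(int)
        valid = (col >= 0) & (col < w) & (row >= 0) & (row < h)
        hit = np.zeros_like(inside)
        hit[valid] = mask[row[valid], col[valid]]
        inside &= hit                                    # keep voxels seen in all views
    return inside.reshape(res, res, res)

# Two disk silhouettes seen from the front and from the side.
g = np.linspace(-1, 1, 64)
yy, xx = np.meshgrid(g, g, indexing="ij")
disk = xx**2 + yy**2 <= 0.5**2
front = np.eye(3)
side = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]])
hull = visual_hull([disk, disk], [front, side], res=32)
```

The result here is the intersection of two cylinders. With only a few silhouettes the hull overestimates the true shape, which is why the choice of views matters.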


Making the multiview reconstruction assumptions hold

Sampling approach:

• Randomly select multiple pairs of silhouettes, hoping that one pair arises from shape surrogates

• Bias sampling to the most informative viewpoints

Cars Aeroplanes


Typically:

• Left/Right

• Top/Bottom

• Front/Back


Principal Component Analysis on 3D points from SFM returns an intuitive set of 3 informative viewpoints

Cluster together objects up to 15° away from these viewpoints

Cars Aeroplanes

Informative viewpoints = PCA(3D points from SFM)
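The PCA and clustering steps might look roughly like the sketch below. It assumes viewpoints are represented as unit viewing directions and that opposite directions (for example left and right) fall into the same cluster. Both are assumptions made for illustration, as are the function names.

```python
import numpy as np

def informative_viewpoints(points):
    """Principal axes (as rows, strongest first) of an n x 3 point cloud."""
    centered = points - points.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt                                      # 3 x 3, unit-length rows

def cluster_by_viewpoint(view_dirs, axes, max_angle_deg=15.0):
    """Assign each unit viewing direction to the axis within max_angle_deg.

    Returns an index in {0, 1, 2} per direction, or -1 when no axis is close.
    The sign of the direction is ignored (assumption: opposite views share
    a cluster).
    """
    cos = np.abs(view_dirs @ axes.T)               # n x 3 |cosine| to each axis
    best = cos.argmax(axis=1)
    close = cos.max(axis=1) >= np.cos(np.radians(max_angle_deg))
    return np.where(close, best, -1)

# Deterministic box-shaped point cloud: longest along x, thinnest along z.
xs, ys, zs = np.meshgrid(np.linspace(-5, 5, 11),
                         np.linspace(-2, 2, 5),
                         np.linspace(-0.5, 0.5, 3), indexing="ij")
cloud = np.stack([xs.ravel(), ys.ravel(), zs.ravel()], axis=1)
axes = informative_viewpoints(cloud)

deg = np.radians
views = np.array([
    [1.0, 0.0, 0.0],                               # along the first axis
    [np.sin(deg(10)), np.cos(deg(10)), 0.0],       # 10 degrees from the second
    [0.0, 0.0, -1.0],                              # opposite the third axis
    [np.sin(deg(20)), np.cos(deg(20)), 0.0],       # 20 degrees away: too far
])
labels = cluster_by_viewpoint(views, axes)
```

Objects labelled -1 are simply left out of the three clusters in this sketch.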

Step 2 of 3: Visual Hull Reconstruction

Randomly sample silhouettes from 2 out of the 3 clusters multiple times and reconstruct from each combination together with the target image (shown in gray)
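The sampling loop can be sketched as follows, assuming each draw takes one silhouette from each of two distinct clusters. The data layout and the name `sample_surrogate_pairs` are hypothetical. Each returned pair would then be combined with the target silhouette for one visual hull reconstruction.

```python
import random

def sample_surrogate_pairs(clusters, n_samples, seed=0):
    """Draw n_samples pairs of candidate surrogate silhouettes.

    clusters: dict mapping cluster id -> list of object ids.
    Each sample picks 2 non-empty clusters and one object from each.
    Raises ValueError if fewer than 2 clusters are non-empty.
    """
    rng = random.Random(seed)
    usable = [c for c, members in clusters.items() if members]
    samples = []
    for _ in range(n_samples):
        first, second = rng.sample(usable, 2)      # two distinct clusters
        samples.append(((first, rng.choice(clusters[first])),
                        (second, rng.choice(clusters[second]))))
    return samples

# Toy clusters for the three informative viewpoints.
clusters = {0: ["obj_a", "obj_b"], 1: ["obj_c"], 2: ["obj_d", "obj_e"]}
pairs = sample_surrogate_pairs(clusters, n_samples=20)
```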

Step 2 of 3: Imprinted Visual Hull Reconstruction

Optimize each reconstruction to conform exactly to the reference silhouette

Figure panels: Non-imprinted, Imprinted, Reference silhouette

Step 3 of 3: Reconstruction Ranking

Select the mesh whose projected boundaries best match the average masks
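One way to score a boundary match is a symmetric chamfer distance between the boundary of each projected mesh mask and the boundary of a thresholded average mask. The slides do not specify the scoring function, so the chamfer distance, the 0.5 threshold and all names below are assumptions made for illustration. The sketch also uses a single average mask, not one per viewpoint.

```python
import numpy as np

def boundary(mask):
    """Pixels of a boolean mask with at least one 4-neighbour outside it."""
    p = np.pad(mask, 1, constant_values=False)
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return mask & ~interior

def chamfer(a, b):
    """Symmetric mean nearest-neighbour distance between two boundary masks."""
    pa = np.argwhere(a).astype(float)
    pb = np.argwhere(b).astype(float)
    d = np.linalg.norm(pa[:, None] - pb[None], axis=2)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def rank_candidates(projected_masks, average_mask, threshold=0.5):
    """Return (indices sorted best first, score per candidate)."""
    target = boundary(average_mask >= threshold)
    scores = [chamfer(boundary(m), target) for m in projected_masks]
    order = [int(i) for i in np.argsort(scores)]
    return order, scores

# Toy example: the average mask is a disk of radius 10 pixels.
yy, xx = np.mgrid[-20:21, -20:21]
r2 = xx**2 + yy**2
average = (r2 <= 10**2).astype(float)
candidates = [r2 <= 5**2, r2 <= 10**2, r2 <= 14**2]
order, scores = rank_candidates(candidates, average)
```

Here the second candidate matches the average mask exactly, so it is ranked first with a score of zero.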

Figure: car average masks and SFM model; selected reconstruction

Figure: target object and its reconstruction ranking (arrow label: Better)

Experiments

Reconstructed 9,087 annotated and unoccluded objects across the 20 PASCAL VOC categories

Also reconstructed 1,000 renderings of a synthetic extension of PASCAL VOC to obtain quantitative results

Synthetic Dataset: Reconstruction Error (smaller is better)

category        Ours    Inflation   SFMCvxHull
aeroplane       3.58     9.64        5.79
bicycle         4.30    10.51        6.56
bird            9.98     8.76       12.01
boat            5.91     8.81        6.52
bottle          8.09     6.25       12.13
bus             6.45    11.02        7.34
car             3.04    11.07        3.22
cat             6.98    11.39        9.61
chair           5.36     8.13        7.37
cow             5.44     9.17        7.50
dining table    8.97     8.67        9.52
dog             7.08    11.61        9.91
horse           6.05     6.90        7.41
motorbike       4.12     9.24        5.32
person          7.35     9.14       19.46
potted plant    7.72     7.58       17.86
sheep           7.18     8.77        7.16
sofa            6.11     8.06        5.75
train          15.73    17.01       17.47
tv/monitor      9.73     9.67       10.08
mean            6.96     9.57        9.40

Ours: this method

Inflation: shape inflation baseline (Playing with puffball: simple scale-invariant inflation for use in vision and graphics, N. Twarog, M. Tappen, and E. Adelson, ACM Symp. on Applied Perception, 2012)

SFMCvxHull: convex hull of SFM points
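The mean row can be checked with a few lines of arithmetic. The sketch below assumes the three columns are, in order, this method, the inflation baseline and the SFM convex hull. That ordering is inferred from the slide legend, not stated next to the numbers.

```python
# Per-category reconstruction errors copied from the table above.
# Assumed column order: (this method, inflation baseline, SFM convex hull).
errors = {
    "aeroplane": (3.58, 9.64, 5.79),   "bicycle": (4.3, 10.51, 6.56),
    "bird": (9.98, 8.76, 12.01),       "boat": (5.91, 8.81, 6.52),
    "bottle": (8.09, 6.25, 12.13),     "bus": (6.45, 11.02, 7.34),
    "car": (3.04, 11.07, 3.22),        "cat": (6.98, 11.39, 9.61),
    "chair": (5.36, 8.13, 7.37),       "cow": (5.44, 9.17, 7.5),
    "dining table": (8.97, 8.67, 9.52), "dog": (7.08, 11.61, 9.91),
    "horse": (6.05, 6.9, 7.41),        "motorbike": (4.12, 9.24, 5.32),
    "person": (7.35, 9.14, 19.46),     "potted plant": (7.72, 7.58, 17.86),
    "sheep": (7.18, 8.77, 7.16),       "sofa": (6.11, 8.06, 5.75),
    "train": (15.73, 17.01, 17.47),    "tv/monitor": (9.73, 9.67, 10.08),
}

n = len(errors)
# Column-wise means over the 20 categories, rounded as in the table.
means = [round(sum(row[i] for row in errors.values()) / n, 2) for i in range(3)]
# Number of categories where the first column has the lowest error.
first_best = sum(1 for row in errors.values() if row[0] == min(row))
```

The computed means agree with the reported mean row, and the first column has the lowest error in 13 of the 20 categories.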

Code available online: http://www2.isr.uc.pt/~joaoluis/carvi/index.html

Conclusions

Rigid SFM can be made robust to challenging intra-category variation

Class-based reconstruction by sampling visual hulls with different putative surrogate shapes

Bootstrapped coarse 3D viewpoint and shape information from existing 2D annotations on PASCAL VOC

Future work:

• Learn more powerful recognition models from the new 3D data

• Relax the need for annotations

Thanks!