Reconstructing PASCAL VOC
Sara Vicente*, Anthropics Technology
Lourdes Agapito, University College London
Jorge Batista, ISR - University of Coimbra
João Carreira*, UC Berkeley / ISR
* First two authors contributed equally
Data Matters
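1960. Goal: everything. Test data: toy images. Training data: 3D models.
1990. Goal: image classification. Test data: cropped images. Training data: hundreds of images, class labels.
2010. Goal: object localization. Test data: simple images. Training data: 10K-1M images, class labels, segmentations and keypoints.
[Figure: example images labelled "Person" and "Motorbike"]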
Present
Renewed interest in joint object reconstruction and recognition
Seeing 3D chairs: exemplar part-based 2D-3D alignment using a large dataset of CAD models, M. Aubry, D. Maturana, A. Efros, B. Russell and J. Sivic
Estimating Image Depth Using Shape Collections, H. Su, Q. Huang, N. Mitra, Y. Li and L. Guibas
Beyond PASCAL: A Benchmark for 3D Object Detection in the Wild, Y. Xiang, R. Mottaghi and S. Savarese
Detailed 3D Representations for Object Recognition and Modeling, Z. Zia, M. Stark, B. Schiele and K. Schindler
Image-based Synthesis and Re-Synthesis of Viewpoints Guided by 3D Models, K. Rematas, T. Ritschel, M. Fritz and T. Tuytelaars
Parsing IKEA Objects: Fine Pose Estimation, J. Lim, H. Pirsiavash and A. Torralba
But the best recognition datasets (PASCAL VOC, ImageNet), which took years to collect and which everyone uses, have only 2D annotations
PASCAL VOC annotations
Available: class labels (e.g. person, motorbike), segmentations, keypoints (not shown)
Unavailable: aligned 3D shapes
Proposed Solution
Bootstrap reconstructions for all objects in detection datasets from existing 2D annotations
Enable a new attack on joint recognition and reconstruction
[Figure: available 2D annotations next to the reconstructed 3D shapes]
Class-based Reconstruction – Prior Work
A Morphable Model for the Synthesis of 3D Faces, Volker Blanz and Thomas Vetter, SIGGRAPH 1999
What shape are dolphins? Building 3D morphable models from 2D images, Thomas J. Cashman and Andrew W. Fitzgibbon, PAMI 2013
Model Evolution: An Incremental Approach to Non-Rigid Structure from Motion, Shengqi Zhu, Li Zhang and Brandon M. Smith, CVPR 2010
Morphable models built from (in order of decreasing information):
• Multiple 3D scans
• Single 3D mesh + 2D data
• 2D data (non-rigid SFM)
But… how?
[Example images from PASCAL VOC: birds, chairs, aeroplanes, boats]
Key Idea
Assume that, for each object in a class, there are a small number of similarly shaped objects seen from different viewpoints (shape surrogates)
[Figure: target object and other objects in the same category]
Reconstruct an object using standard rigid multiview techniques, with the images of its surrogates as additional views
Surrogates are hard to identify directly, so perform viewpoint-biased sampling instead
Proposed Approach
Jointly over all objects in a class:
1. Viewpoint Estimation (Rigid Structure from Motion)
For each object:
2. 3D Reconstruction (Visual Hull Sampling)
3. Reconstruction Ranking
Output: reconstructions ranked from better to worse
Step 1 of 3: Class-based Viewpoint Estimation
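Factorization-based rigid SFM:

\[
\underbrace{\begin{bmatrix}
x_1^1 & \cdots & x_1^k \\
y_1^1 & \cdots & y_1^k \\
x_2^1 & \cdots & x_2^k \\
y_2^1 & \cdots & y_2^k \\
\vdots &        & \vdots \\
x_N^1 & \cdots & x_N^k \\
y_N^1 & \cdots & y_N^k
\end{bmatrix}}_{\text{Measurement matrix (known)}}
=
\underbrace{\begin{bmatrix}
M_1 \\ M_2 \\ \vdots \\ M_N
\end{bmatrix}}_{\text{Motion matrices (unknown)}}
\times
\underbrace{\begin{bmatrix}
x_1 & x_2 & \cdots & x_k \\
y_1 & y_2 & \cdots & y_k \\
z_1 & z_2 & \cdots & z_k
\end{bmatrix}}_{\text{Shape (unknown)}}
\]

Estimating 3D shape from degenerate sequences with missing data, Manuel Marques and João Paulo Costeira, CVIU 2009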
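The factorization on this slide can be illustrated with a minimal sketch. It assumes a complete measurement matrix (every keypoint visible in every image), whereas the cited method handles missing data, and it recovers motion and shape only up to an unresolved 3x3 ambiguity. The function name `factorize_rigid_sfm` is hypothetical and not taken from the released code.

```python
import numpy as np

def factorize_rigid_sfm(W):
    """Rank-3 factorization of a complete 2N x k measurement matrix.

    Assumption: no missing entries. Returns motion M (2N x 3), shape S (3 x k)
    and per-row translations t (2N x 1) such that W is approximated by M @ S + t.
    """
    W = np.asarray(W, dtype=float)
    # Subtracting each row's mean removes the per-image translation.
    t = W.mean(axis=1, keepdims=True)
    Wc = W - t
    # The best rank-3 approximation comes from the top three singular triplets.
    U, s, Vt = np.linalg.svd(Wc, full_matrices=False)
    root = np.sqrt(s[:3])
    M = U[:, :3] * root            # stacked motion matrices
    S = root[:, None] * Vt[:3]     # 3D shape, one column per keypoint
    return M, S, t
```

Splitting the singular values evenly between the two factors is only a convention; a metric upgrade (enforcing orthonormal camera rows) would be a further step not shown here.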
Step 1 of 3: Class-based Viewpoint Estimation
Idea: exploit segmentation information. Occluded keypoints should project inside the silhouette.
[Figure: original view and side view, showing estimated keypoints (occluded), estimated keypoints (visible) and ground truth keypoints (only visible ones are available)]
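The silhouette constraint can be checked with a short sketch like the one below. It only flags projected keypoints that land outside a binary mask. How the constraint enters the SFM optimization is not stated on the slide, so that part is not attempted. The function name and the (x, y) pixel convention are assumptions.

```python
import numpy as np

def keypoints_outside_silhouette(mask, points_xy):
    """Return a boolean array: True where a projected keypoint violates the
    constraint, i.e. falls outside the silhouette or outside the image.

    mask: H x W binary silhouette. points_xy: n x 2 array of (x, y) pixels.
    """
    mask = np.asarray(mask, dtype=bool)
    pts = np.rint(np.asarray(points_xy, dtype=float)).astype(int)
    h, w = mask.shape
    x, y = pts[:, 0], pts[:, 1]
    in_image = (x >= 0) & (x < w) & (y >= 0) & (y < h)
    inside = np.zeros(len(pts), dtype=bool)
    # Only index the mask with coordinates that lie inside the image.
    inside[in_image] = mask[y[in_image], x[in_image]]
    return ~inside
```

A count of violations, or a distance-to-silhouette penalty, could then serve as a cost term for occluded keypoints.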
Step 1 of 3: Class-based Viewpoint Estimation
Estimated elevation for aeroplanes:
Step 2 of 3: 3D Reconstruction (Visual Hull)
Well-known multiview reconstruction algorithm
Efficient
Easy to implement
Multiple views of same aeroplane model
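As a rough illustration of why the visual hull is easy to implement, here is a voxel-carving sketch. It assumes orthographic cameras given as 3x3 rotation matrices, silhouettes that cover the square [-1, 1] x [-1, 1] in normalised image coordinates, and a voxel grid output; the talk works with meshes, so this is a simplification and not the authors' implementation.

```python
import numpy as np

def visual_hull(silhouettes, rotations, grid_size=32):
    """Carve a voxel grid in [-1, 1]^3: a voxel survives only if it projects
    inside every silhouette. Cameras are orthographic (an assumption)."""
    axis = np.linspace(-1.0, 1.0, grid_size)
    X = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    X = X.reshape(-1, 3)
    occupied = np.ones(len(X), dtype=bool)
    for mask, R in zip(silhouettes, rotations):
        mask = np.asarray(mask, dtype=bool)
        h, w = mask.shape
        # Orthographic projection: rotate, then drop the depth coordinate.
        uv = X @ np.asarray(R, dtype=float)[:2].T
        col = np.rint((uv[:, 0] + 1.0) / 2.0 * (w - 1)).astype(int)
        row = np.rint((uv[:, 1] + 1.0) / 2.0 * (h - 1)).astype(int)
        valid = (col >= 0) & (col < w) & (row >= 0) & (row < h)
        inside = np.zeros(len(X), dtype=bool)
        inside[valid] = mask[row[valid], col[valid]]
        occupied &= inside
    return occupied.reshape(grid_size, grid_size, grid_size)
```

Each additional silhouette can only remove voxels, which is why views from well-separated directions are the most useful.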
Step 2 of 3: 3D Reconstruction (Visual Hull)
Making the multiview reconstruction assumptions hold
Sampling approach:
• Randomly select multiple pairs of silhouettes, hoping that one pair arises from shape surrogates
• Bias sampling to the most informative viewpoints
[Figure panels: cars, aeroplanes]
Typically:
• Left/Right
• Top/Bottom
• Front/Back
Step 2 of 3: 3D Reconstruction (Visual Hull)
Principal Component Analysis on 3D points from SFM returns an intuitive set of 3 informative viewpoints
Cluster together objects up to 15° away from these viewpoints
Informative viewpoints = PCA(3D points from SFM)
[Figure panels: cars, aeroplanes]
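A minimal sketch of this step, under stated assumptions: the three principal axes of the SFM point cloud are taken as the informative viewing directions, each object's viewpoint is represented by a unit viewing direction, and opposite directions along one axis (for example left and right) fall into the same cluster. Both function names are hypothetical.

```python
import numpy as np

def informative_viewpoints(points_3d):
    """Principal axes (as rows) of a k x 3 point cloud, by decreasing variance."""
    points_3d = np.asarray(points_3d, dtype=float)
    centred = points_3d - points_3d.mean(axis=0)
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    return Vt

def cluster_by_viewpoint(view_dirs, axes, max_angle_deg=15.0):
    """Label each viewing direction with the index of the axis it lies within
    max_angle_deg of (either sign), or -1 if no axis is close enough."""
    view_dirs = np.asarray(view_dirs, dtype=float)
    view_dirs = view_dirs / np.linalg.norm(view_dirs, axis=1, keepdims=True)
    cosines = np.abs(view_dirs @ np.asarray(axes, dtype=float).T)
    best = cosines.argmax(axis=1)
    close = cosines.max(axis=1) >= np.cos(np.radians(max_angle_deg))
    return np.where(close, best, -1)
```

Objects labelled -1 are simply not used as candidate surrogates in this sketch.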
Step 2 of 3: Visual Hull Reconstruction
Repeatedly sample silhouettes at random from 2 of the 3 clusters, and reconstruct from each sampled combination together with the target image (shown in gray)
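The sampling loop might look like the following sketch. It assumes that each draw picks two distinct viewpoint clusters and one silhouette from each, which is one reading of "pairs of silhouettes"; the cluster names and the `sample_surrogate_pairs` helper are illustrative only.

```python
import random

def sample_surrogate_pairs(clusters, n_samples, seed=None):
    """Draw n_samples candidate surrogate pairs.

    clusters: dict mapping a cluster name to a list of silhouette identifiers.
    Each sample is ((cluster_a, silhouette_a), (cluster_b, silhouette_b)) with
    cluster_a != cluster_b. Empty clusters are skipped.
    """
    rng = random.Random(seed)
    usable = [name for name, members in clusters.items() if members]
    if len(usable) < 2:
        raise ValueError("need at least two non-empty viewpoint clusters")
    pairs = []
    for _ in range(n_samples):
        first, second = rng.sample(usable, 2)
        pairs.append(((first, rng.choice(clusters[first])),
                      (second, rng.choice(clusters[second]))))
    return pairs
```

Each sampled pair, together with the target silhouette, would then be passed to a visual hull routine to produce one candidate reconstruction.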
Step 2 of 3: Imprinted Visual Hull Reconstruction
Optimize each reconstruction to conform exactly to the reference silhouette
[Figure: non-imprinted reconstruction, imprinted reconstruction, reference silhouette]
Step 3 of 3: Reconstruction Ranking
Select the mesh whose projected boundaries best match the average masks
[Figure: car average masks and SFM model; target object; reconstruction ranking with an arrow marked "Better"; selected reconstruction]
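One simple way to turn this ranking into code is sketched below. Note the substitution: the slide scores projected boundaries against the average masks, while this sketch uses a soft region overlap (intersection over union) as a stand-in, because the exact boundary score is not given here. It assumes each candidate has already been rendered to a mask from the viewpoint of each average mask.

```python
import numpy as np

def soft_iou(projected_mask, average_mask):
    """Overlap between a projected mask and an average mask in [0, 1]."""
    p = np.asarray(projected_mask, dtype=float)
    a = np.asarray(average_mask, dtype=float)
    union = np.maximum(p, a).sum()
    return float(np.minimum(p, a).sum() / union) if union > 0 else 0.0

def rank_reconstructions(projections, average_masks):
    """projections[i][v] is candidate i rendered from the viewpoint of
    average_masks[v]. Returns candidate indices, best first, and the scores."""
    scores = [
        float(np.mean([soft_iou(p, a) for p, a in zip(views, average_masks)]))
        for views in projections
    ]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores
```

The first index in `order` plays the role of the selected reconstruction.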
Experiments
Reconstructed 9,087 annotated, unoccluded objects across the 20 PASCAL VOC categories
Also reconstructed 1,000 renderings from a synthetic extension of PASCAL VOC to obtain quantitative results
Synthetic Dataset: Reconstruction Error
[Plot: reconstruction error, smaller is better. Methods compared: this method, shape inflation, SFM convex hull]
Playing with puffball: simple scale-invariant inflation for use in vision and graphics, N. Twarog, M. Tappen and E. Adelson, ACM Symposium on Applied Perception, 2012
Reconstruction error per class (smaller is better)
Inflation: shape inflation baseline
SFMCvxHull: convex hull of SFM points

Class          Ours    Inflation   SFMCvxHull
aeroplane      3.58    9.64        5.79
bicycle        4.3     10.51       6.56
bird           9.98    8.76        12.01
boat           5.91    8.81        6.52
bottle         8.09    6.25        12.13
bus            6.45    11.02       7.34
car            3.04    11.07       3.22
cat            6.98    11.39       9.61
chair          5.36    8.13        7.37
cow            5.44    9.17        7.5
dining table   8.97    8.67        9.52
dog            7.08    11.61       9.91
horse          6.05    6.9         7.41
motorbike      4.12    9.24        5.32
person         7.35    9.14        19.46
potted plant   7.72    7.58        17.86
sheep          7.18    8.77        7.16
sofa           6.11    8.06        5.75
train          15.73   17.01       17.47
tv/monitor     9.73    9.67        10.08
mean           6.96    9.57        9.4
Code available online: http://www2.isr.uc.pt/~joaoluis/carvi/index.html
Conclusions
Rigid SFM can be made robust to challenging intra-category variation
Class-based reconstruction by sampling visual hulls with different putative surrogate shapes
Bootstrapped coarse 3D viewpoint and shape information from existing 2D annotations on PASCAL VOC
Future work:
• Learn more powerful recognition models from the new 3D data
• Relax the need for annotations
Thanks!