1
Acquiring 3D Indoor Environments with Variability and Repetition
Young Min Kim, Stanford University
Niloy J. Mitra, UCL / KAUST
Dong-Ming Yan, KAUST
Leonidas Guibas, Stanford University
2
Data Acquisition via Microsoft Kinect
Raw data:
• Noisy point clouds
• Unsegmented
• Occlusion issues

Our tool: Microsoft Kinect
• Real-time
• Provides depth and color
• Small and inexpensive
3
Dealing with Pointcloud Data
• Object-level reconstruction [Chang and Zwicker 2011]
• Scene-level reconstruction [Xiao et al. 2012]
4
Mapping Indoor Environments
• Mapping outdoor environments
  – Roads to drive vehicles
  – Flat surfaces
• General indoor environments contain both objects and flat surfaces
  – Diversity of objects of interest
  – Objects are often cluttered
  – Objects deform and move

Solution: utilize semantic information
5
Nature of Indoor Environments
• Man-made objects can often be well-approximated by simple building blocks
  – Geometric primitives
  – Low-DOF joints
• Many repeating elements
  – Chairs, desks, tables, etc.
• Relations between objects give good recognition cues
6
Indoor Scene Understanding with Pointcloud Data
• Patch-based approach [Silberman et al. 2012], [Koppula et al. 2011]
• Object-level understanding [Shao et al. 2012], [Nan et al. 2012]
7
Comparisons
[1] An Interactive Approach to Semantic Modeling of Indoor Scenes with an RGBD Camera
[2] A Search-Classify Approach for Cluttered Indoor Scene Understanding

              [1]               [2]                 Ours
Prior model   3D database       3D database         Learned
Deformation   Scaling           Part-based scaling  Learned
Matching      Classifier        Classifier          Geometric
Segmentation  User-assisted     Iteration           Iteration
Data          Microsoft Kinect  Mantis Vision       Microsoft Kinect
8
Contributions
• Novel approach based on a learning stage
  – The learning stage builds a model specific to the environment
• Build an abstract model composed of simple parts and relationships between parts
  – Uniquely explains the possible low-DOF deformations
• Recognition stage can quickly acquire large-scale environments
  – About 200 ms per object
9
Approach
• Learning: build a high-level model of the repeating elements
• Recognition: use the model and relationships to recognize the objects

(figure: example joints, translational and rotational)
10
Approach
• Learning – build a high-level model of the repeating elements
11
Output Model: A Simple, Lightweight Abstraction

• Primitives
  – Observable faces
• Connectivity
  – Rigid
  – Rotational
  – Translational
  – Attachment
• Relationships
  – Placement information
(figure: primitives m1, m2, m3 with a translational joint l1, a rotational joint l3, and a ground contact g; scale marks at 1 m, 2 m, 3 m; model M = {m1, m2, m3, l1, l3})
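The abstraction above can be sketched as a small data structure (a minimal illustration only; all class and field names are assumptions, not from the paper):

```python
from dataclasses import dataclass, field
from enum import Enum

class JointType(Enum):
    RIGID = "rigid"
    ROTATIONAL = "rotational"
    TRANSLATIONAL = "translational"
    ATTACHMENT = "attachment"

@dataclass
class Primitive:
    name: str
    size: tuple    # (width, height, depth) of the observable face
    normal: tuple  # dominant face normal

@dataclass
class Joint:
    kind: JointType
    parts: tuple              # (parent index, child index)
    limits: tuple = (0.0, 0.0)  # allowed deformation range

@dataclass
class Model:
    primitives: list = field(default_factory=list)
    joints: list = field(default_factory=list)
    placement: dict = field(default_factory=dict)  # relations, e.g. ground contact

# Example: a chair abstracted as seat + back, linked by a rotational joint
chair = Model(
    primitives=[Primitive("seat", (0.45, 0.05, 0.45), (0, 1, 0)),
                Primitive("back", (0.45, 0.50, 0.05), (0, 0, 1))],
    joints=[Joint(JointType.ROTATIONAL, (0, 1), (-0.3, 0.3))],
    placement={"on_ground": True},
)
```

The key design point is that the model stores only a handful of boxes, joint types, and placement relations, which is what makes per-object recognition cheap.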
12
Joint Matching and Fitting
• Individual segmentation
  – Group by similar normals
• Initial matching
  – Focus on large parts
  – Use size, height, relative positions
  – Keep consistent matches
• Joint primitive fitting
  – Add joints if necessary
  – Incrementally complete the model
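The first step, grouping points by similar normals, can be sketched as a greedy assignment (a toy stand-in that ignores spatial connectivity; names and thresholds are illustrative):

```python
import numpy as np

def group_by_normals(normals, angle_thresh_deg=15.0):
    """Greedily assign each normal to the first group whose seed normal
    is within the angle threshold; opposite-facing normals are treated
    as the same plane orientation."""
    cos_t = np.cos(np.radians(angle_thresh_deg))
    seeds, labels = [], []
    for n in normals:
        n = n / np.linalg.norm(n)
        for i, s in enumerate(seeds):
            if abs(np.dot(n, s)) >= cos_t:
                labels.append(i)
                break
        else:
            seeds.append(n)
            labels.append(len(seeds) - 1)
    return labels

normals = np.array([[0, 1, 0], [0.05, 1, 0], [1, 0, 0], [0, -1, 0.02]])
print(group_by_normals(normals))  # → [0, 0, 1, 0]
```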
13
Approach
• Learning – build a high-level model of the repeating elements
14
Approach
• Learning – build a high-level model of the repeating elements
• Recognition – use the model and relationships to recognize the objects
15
Hierarchy
• Ground plane and desk
• Objects
  – Isolated clusters
• Parts
  – Group by normals
• The segmentation is approximate and is corrected later

Scene: S = {o1, o2, ...}; each object: oi = {p1, p2, ...}
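The hierarchy extraction can be illustrated with a toy routine that peels off a ground plane by height and clusters the remaining points into isolated objects (the threshold values and the greedy single-linkage clustering are assumptions for illustration, not the paper's algorithm):

```python
import numpy as np

def extract_hierarchy(points, ground_eps=0.02, cluster_eps=0.1):
    """Split points into a ground set (near z = 0) and isolated object
    clusters, i.e. a toy version of the scene hierarchy S = {o1, o2, ...}."""
    pts = np.asarray(points, float)
    ground = pts[pts[:, 2] < ground_eps]
    rest = pts[pts[:, 2] >= ground_eps]
    clusters = []
    for p in rest:
        for c in clusters:  # join the first cluster within reach
            if min(np.linalg.norm(p - q) for q in c) < cluster_eps:
                c.append(p)
                break
        else:
            clusters.append([p])
    return ground, clusters

pts = [(0, 0, 0.0), (1, 0, 0.01),    # ground
       (0, 0, 0.5), (0.05, 0, 0.5),  # object 1
       (2, 0, 0.5)]                  # object 2
ground, objects = extract_hierarchy(pts)
print(len(ground), len(objects))  # → 2 2
```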
16
Bottom-Up Approach
• Initial assignment for parts vs. primitives
  – Simple comparison of height, normal, size
  – Robust to deformation
  – Low false-negative rate
• Refined assignment for objects vs. models
  – Iteratively solve for position, deformation, and segmentation
  – Low false-positive rate
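The loose initial part-vs.-primitive comparison might look like the following (attribute names and tolerance values are illustrative; the point is that the thresholds are deliberately generous so deformed instances still match, and the refined stage later prunes false positives):

```python
def match_part(part, primitives, height_tol=0.2, size_tol=0.5, normal_tol=0.3):
    """Return indices of candidate primitives whose height, size, and
    normal roughly match the part (low false negatives by design)."""
    candidates = []
    for i, prim in enumerate(primitives):
        if (abs(part["height"] - prim["height"]) <= height_tol
                and abs(part["size"] - prim["size"]) <= size_tol * prim["size"]
                and abs(part["normal_z"] - prim["normal_z"]) <= normal_tol):
            candidates.append(i)
    return candidates

seat = {"height": 0.45, "size": 0.20, "normal_z": 1.0}
prims = [{"height": 0.45, "size": 0.20, "normal_z": 1.0},   # chair seat
         {"height": 0.75, "size": 1.20, "normal_z": 1.0},   # desk top
         {"height": 0.90, "size": 0.25, "normal_z": 0.0}]   # chair back
print(match_part(seat, prims))  # → [0]
```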
17
Bottom-Up Approach
• Initial assignment for parts vs. primitive nodes
• Refined assignment for objects vs. models

(figure: input points → initial objects → models matched → refined objects)
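The refined-assignment loop, alternating between solving for the model position and re-segmenting the points, can be sketched in one dimension (the actual system also solves for low-DOF deformation; this toy only fits a translation, and all names and thresholds are illustrative):

```python
import numpy as np

def refine_fit(points, template, n_iter=5, inlier_thresh=2.0):
    """Alternate between (1) a least-squares offset aligning the template
    to the current inliers and (2) re-segmenting: keep only points close
    to the shifted template."""
    pts = np.asarray(points, float)
    inliers = np.ones(len(pts), bool)
    offset = 0.0
    for _ in range(n_iter):
        offset = float(np.mean(pts[inliers]) - np.mean(template))
        dists = np.abs(pts[:, None] - (np.asarray(template) + offset)).min(axis=1)
        inliers = dists < inlier_thresh
    return offset, inliers

template = [0.0, 1.0]
points = [2.0, 3.0, 9.0]  # last point is clutter
offset, inliers = refine_fit(points, template)
print(round(offset, 2), inliers.tolist())  # → 2.0 [True, True, False]
```

Note how the clutter point at 9.0 is dropped from the segmentation once the fit converges, which is the "low false-positives" behavior claimed on this slide.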
18
Results
Data available:
http://www0.cs.ucl.ac.uk/staff/n.mitra/research/acquire_indoor/paper_docs/data_learning.zip
http://www0.cs.ucl.ac.uk/staff/n.mitra/research/acquire_indoor/paper_docs/data_recognition.zip
19
Synthetic Scene
Recognition speed: about 200 ms per object
20
Synthetic Scene
21
Synthetic Scene
22
(precision/recall plot, comparing a different pair vs. a similar pair of models under Gaussian noise levels 0.004, 0.3, and 1.0)
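The plotted quantities can be computed from recognized vs. ground-truth object sets in the usual way (a generic sketch, not the paper's exact evaluation protocol):

```python
def precision_recall(detected, ground_truth):
    """Precision and recall over sets of recognized object ids,
    matching the axes of the plot above."""
    tp = len(set(detected) & set(ground_truth))
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall

p, r = precision_recall({"chair1", "chair2", "desk"},
                        {"chair1", "chair2", "monitor"})
print(round(p, 2), round(r, 2))  # → 0.67 0.67
```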
24
(precision/recall plots: left, varying Gaussian noise at 0.004 vs. 0.008, 0.3 vs. 0.5, and 1.0 vs. 2.0; right, varying sampling density from 0.4 to 0.8)
25
Office 1
(labels: trash bin, 4 chairs, 2 monitors, 2 whiteboards)
26
Office 2
27
Office 3
28
Deformations
(labels: drawer deformations, monitor, laptop, missed monitor, chair)
29
Auditorium 1

(label: open table)
30
Auditorium 2
(labels: open table, open chairs)
31
Seminar Room 1
missed chairs
32
Seminar Room 2
missed chairs
33
Limitations
• Missing data
  – Occlusion, material, …
• Errors in initial segmentation
  – Cluttered objects are merged into a single segment
  – The viewpoint sometimes separates a single object into pieces
34
Conclusion
• We present a system that recognizes repeating objects in cluttered 3D indoor environments.
• We use a purely geometric approach based on learned attributes and deformation modes.
• The recognized objects provide high-level scene understanding and can be replaced with high-quality CAD models for visualization (as shown in the previous talks!).
35
Thank You
• Qualcomm Corporation
• Max Planck Center for Visual Computing and Communications
• NSF grants 0914833 and 1011228
• A KAUST AEA grant
• Marie Curie Career Integration Grant 303541
• Stanford Bio-X travel subsidy