Date posted: 15-Aug-2015
Category: Technology
Uploaded by: embedded-vision-alliance
Copyright © 2015 videantis GmbH 1
Marco Jacobs
12 May 2015
3D from 2D: Theory, Implementation and
Applications of Structure from Motion
Videantis & VISCODA Company Overview
videantis:
• Founded in 2004
• Headquarters in Hannover
• Vision and video processor IP
• In production for many years
• Many millions of units shipped in automotive
VISCODA:
• Founded in 2011
• Headquarters in Hannover
• Computer vision and imaging algorithms and services
• Structure from motion algorithm for automotive and movie applications
Why Do You Need 3D?
It’s a 3D World
• 3D helps you to measure:
  • Distance of objects
  • Size of objects
  • Directions and speeds of objects
  • Camera position & direction
• Makes object segmentation easier
• Applications:
  • Automotive
  • Augmented reality
  • Mobile positioning
  • Many more
[Images: Augmented reality, Positioning, Automotive]
How to Get 3D?
How Do We Humans Perceive 3D?
• Binocular
  • Stereopsis
  • Convergence
• Monocular
  • Structure from motion:
    • Motion parallax
    • Depth from motion
    • Kinetic depth effect
  • Aerial & curvilinear perspective, size, accommodation, occlusion, texture gradient, lighting and shading, defocus blur, elevation
Humans primarily use monocular vision to sense depth
Structure From Motion
Structure from motion algorithm
+ camera origin and direction
+ calibrated camera
Structure from Motion—Video
Depth Sensing Comparison Chart
Sensor                     Resolution  Distance  Cost
Ultrasound                 -           -         $
3D sensor (ToF/SL)         +           -         $$$
Radar                      -           +         $$$
Lidar                      +           +         $$$$$
Stereo cameras             +           +         $$$
Structure from motion      +           +         $
All of the above (fusion)  ++          ++        $$$$$
Structure from motion reuses the monocular cameras already available in the system: it captures 3D data with a small form factor and at low cost
Structure from Motion Algorithm
Structure from Motion
[Figure: pipeline overview with numbered steps 1, 2 and 3]
Feature Detection
1. Find edges in horizontal and vertical directions
   1. Sobel in x
   2. Sobel in y
2. Two strong gradient directions? Found corner
   1. Derivative calc
   2. Box filter
   3. Harris calc
3. Select corners: R > threshold K (local maxima)
   4. Max location
   5. Threshold
   6. Dilate
   7. Select

$$M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$

$$(\lambda_0, \lambda_1) = \operatorname{eigenvalues}(M)$$

[Figure: classification in the (λ0, λ1) plane]
• λ0 ≈ λ1, both big: “corner”
• λ1 >> λ0: “edge”
• λ0 >> λ1: “edge”
• λ0 ≈ λ1, both small: “flat”
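The three detection steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the videantis implementation: the function names `filter3x3` and `harris_corners`, the 3x3 box window, the constant `k = 0.04`, the relative threshold, and the use of the common Harris measure R = det(M) - k·trace(M)² are all assumptions, since the slide only names R and the step order.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def filter3x3(img, kernel):
    """Correlate img with a 3x3 kernel, replicating border pixels."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def harris_corners(img, k=0.04, rel_threshold=0.1):
    """Return (corner coordinates as (row, col), response map R)."""
    img = img.astype(np.float64)
    # Step 1: edges in horizontal and vertical directions.
    ix = filter3x3(img, SOBEL_X)
    iy = filter3x3(img, SOBEL_Y)
    # Step 2: entries of M, summed over a 3x3 box window w(x, y).
    box = np.ones((3, 3))
    sxx = filter3x3(ix * ix, box)
    syy = filter3x3(iy * iy, box)
    sxy = filter3x3(ix * iy, box)
    # Assumed Harris measure: large only when both eigenvalues are large.
    r = sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2
    # Step 3: dilate (3x3 max), keep local maxima above the threshold.
    h, w = r.shape
    padded = np.pad(r, 1, mode="constant", constant_values=-np.inf)
    dilated = np.max(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    mask = (r == dilated) & (r > rel_threshold * r.max())
    return np.argwhere(mask), r
```

On a synthetic image containing a bright square, the response is positive only near the four corners, negative along the edges and zero in flat regions, which matches the “corner”, “edge” and “flat” cases in the eigenvalue diagram.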
Pyramidal Lucas-Kanade Algorithm
• Find optical flow
  • [x,y] vector for each feature
• Algorithm:
  • Generate multiscale pyramid
  • For all features
    • For all scales
      • Calculate gradient matrix
      • For 1..K (or error<threshold)
        • Use gradient matrix to calculate next location
      • Find flow vector estimate
      • Reuse guess for next level
[Figure: image n and image n+1 with their image pyramids; optical flow: find v]
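The loop structure above can be sketched for a single feature in plain NumPy. This is an illustrative floating-point sketch under stated assumptions, not the videantis code: the names `bilinear`, `downsample` and `lk_track`, the 2x2 averaging pyramid, the window size and the iteration limits are all hypothetical choices.

```python
import numpy as np

def bilinear(img, ys, xs):
    """Sample img at float coordinates, clamped to the image area."""
    h, w = img.shape
    xs = np.clip(xs, 0, w - 1.001)
    ys = np.clip(ys, 0, h - 1.001)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    fx, fy = xs - x0, ys - y0
    return ((1 - fy) * (1 - fx) * img[y0, x0] + (1 - fy) * fx * img[y0, x0 + 1]
            + fy * (1 - fx) * img[y0 + 1, x0] + fy * fx * img[y0 + 1, x0 + 1])

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def lk_track(img0, img1, pt, levels=3, win=7, iters=10, eps=1e-3):
    """Track pt = (x, y) from img0 to img1; returns the flow (vx, vy)."""
    # Generate multiscale pyramids for both images.
    pyr0, pyr1 = [img0.astype(float)], [img1.astype(float)]
    for _ in range(levels - 1):
        pyr0.append(downsample(pyr0[-1]))
        pyr1.append(downsample(pyr1[-1]))
    dy, dx = np.mgrid[-win:win + 1, -win:win + 1].astype(float)
    v = np.zeros(2)
    for lvl in reversed(range(levels)):          # coarse to fine
        a, b = pyr0[lvl], pyr1[lvl]
        # Approximate feature position at this scale.
        ys, xs = pt[1] / 2 ** lvl + dy, pt[0] / 2 ** lvl + dx
        gy_img, gx_img = np.gradient(a)          # d/drow, d/dcol
        tpl = bilinear(a, ys, xs)
        gx, gy = bilinear(gx_img, ys, xs), bilinear(gy_img, ys, xs)
        # Gradient matrix, computed once per level.
        G = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                      [np.sum(gx * gy), np.sum(gy * gy)]])
        for _ in range(iters):                   # 1..K or error < threshold
            diff = tpl - bilinear(b, ys + v[1], xs + v[0])
            step = np.linalg.solve(G, [np.sum(diff * gx), np.sum(diff * gy)])
            v += step
            if np.hypot(*step) < eps:
                break
        if lvl > 0:
            v *= 2                               # reuse guess for next level
    return v
```

The coarse levels only need to bring the estimate close enough for the finest level to converge, which is why the guess is doubled and reused rather than recomputed from zero.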
Optical Flow Lesson Learned: Fixed Point Versus Floating Point
• 32-bit float and 32-bit fixed point carry the same amount of information
  • Float: dynamic range, less accuracy
  • Fixed: trade off accuracy for range; change precision on the fly, e.g. y = 1/x with x in Q30.1 and y in Q1.30
• Vision algorithms: range is limited, so fixed point gives more accuracy
• LK tracking example:
  • Precision (Q1.30): the fraction has 7 more bits than the float mantissa, 100x better
  • OpenCV on x86/GPU loses feature points
  • Videantis fixed-point tracks correctly
• Some algorithms require dynamic range
[Figure: 32-bit word layouts. Float: exponent (8 bits) and mantissa (23 bits). Fractional integer: integer and fraction parts with a movable split ("change precision"), more accurate than float.]
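The precision claim can be checked numerically. The sketch below is an assumption-laden illustration: it models Q1.30 as an integer scaled by 2^30 (held in an `int64` for convenience rather than a real 32-bit register), and the helper names `to_q1_30`, `from_q1_30` and `reciprocal_q30_1` are hypothetical. For values in [1, 2) a float32 resolves steps of 2^-23, while Q1.30 resolves steps of 2^-30, a factor of 128.

```python
import numpy as np

FRAC_BITS = 30  # Q1.30: 30 fraction bits

def to_q1_30(x):
    """Quantize real values to Q1.30 (round to nearest)."""
    return np.round(np.asarray(x, dtype=np.float64) * (1 << FRAC_BITS)).astype(np.int64)

def from_q1_30(q):
    """Convert Q1.30 integers back to real values."""
    return np.asarray(q, dtype=np.float64) / (1 << FRAC_BITS)

def reciprocal_q30_1(x_q):
    """y = 1/x with x in Q30.1 (1 fraction bit) and y in Q1.30.

    x = x_q / 2, so y * 2**30 = 2**31 / x_q; add x_q // 2 to round.
    """
    return ((1 << 31) + x_q // 2) // x_q

def max_quantization_errors(x):
    """Worst-case rounding error of Q1.30 and float32 for the samples x."""
    err_fixed = np.abs(from_q1_30(to_q1_30(x)) - x).max()
    err_float = np.abs(x.astype(np.float32).astype(np.float64) - x).max()
    return err_fixed, err_float
```

For example, `reciprocal_q30_1(6)` computes 1/3 from x = 3.0 stored as the Q30.1 integer 6. The input format spends its bits on range and the output format spends them on fraction, which is the "change precision on the fly" idea from the slide.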
Finding a Uniform Grid of Feature Points
Find and track feature points:
• OpenCV finds strongest points in image
• But we need a uniform distribution
Solution: define a grid (e.g. 16x16) of cells:
• detect strongest point inside each cell
• track this point from frame to frame
• if a cell is empty, find a new strong feature there
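A minimal sketch of the grid idea, assuming a precomputed corner response map and a list of currently tracked points given as (row, col). The function name `refill_grid` and the `min_response` parameter are hypothetical. Cells that already contain a tracked point are left alone; every empty cell receives its strongest candidate.

```python
import math
import numpy as np

def refill_grid(response, tracked, grid=(16, 16), min_response=0.0):
    """Return new (row, col) points for grid cells without a tracked point."""
    h, w = response.shape
    ch, cw = h / grid[0], w / grid[1]            # cell size, may be fractional
    occupied = {
        (min(int(r // ch), grid[0] - 1), min(int(c // cw), grid[1] - 1))
        for r, c in tracked
    }
    new_points = []
    for gr in range(grid[0]):
        for gc in range(grid[1]):
            if (gr, gc) in occupied:
                continue                          # keep tracking the old point
            r0, r1 = math.ceil(gr * ch), math.ceil((gr + 1) * ch)
            c0, c1 = math.ceil(gc * cw), math.ceil((gc + 1) * cw)
            cell = response[r0:r1, c0:c1]
            if cell.size == 0:
                continue
            r, c = np.unravel_index(np.argmax(cell), cell.shape)
            if cell[r, c] > min_response:         # strongest point in the cell
                new_points.append((int(r0 + r), int(c0 + c)))
    return new_points
```

Calling it with an empty `tracked` list performs the initial detection; calling it on later frames with the surviving tracks tops up only the cells that have lost their feature.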
Feature Detect Parallelization Strategy
• Divide image into n strips (selectable at runtime)
• 2 pixels overlap between strips (for filters)
• Each strip is processed on a different core
[Figure: image divided into Strip 1, Strip 2, ..., Strip n]
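The strip split can be sketched as follows. One assumption is made loudly here: the "2 pixels overlap" is read as one extra row on each side of an interior strip boundary (`halo=1`), which is what a 3x3 filter needs. The names `strip_bounds`, `vertical_box3` and `filter_in_strips` are hypothetical, and a 3-row box sum stands in for the real filter stages.

```python
import numpy as np

def strip_bounds(height, n_strips, halo=1):
    """Yield (start, stop, read_start, read_stop) row ranges per strip."""
    edges = np.linspace(0, height, n_strips + 1).astype(int)
    for start, stop in zip(edges[:-1], edges[1:]):
        start, stop = int(start), int(stop)
        yield start, stop, max(start - halo, 0), min(stop + halo, height)

def vertical_box3(img):
    """3-row box sum with edge replication (stand-in for a 3x3 filter)."""
    p = np.pad(img, ((1, 1), (0, 0)), mode="edge")
    return p[:-2] + p[1:-1] + p[2:]

def filter_in_strips(img, n_strips, halo=1):
    """Filter each strip independently; results match whole-image filtering."""
    out = np.empty(img.shape, dtype=np.float64)
    for start, stop, r_start, r_stop in strip_bounds(img.shape[0], n_strips, halo):
        # Each iteration touches only its own rows plus the halo,
        # so it could run on a separate core.
        part = vertical_box3(img[r_start:r_stop].astype(np.float64))
        out[start:stop] = part[start - r_start:stop - r_start]
    return out
```

The halo rows are filtered with the wrong border values inside each strip, but they are discarded, so the stitched output is identical to filtering the whole image at once.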
Feature Tracking Parallelization Strategy
1. Build image pyramids: each processor works on strips
2. Track features:
  • Group of features processed per core
  • Doesn't work on wide SIMD processors
  • Works well on multicore architectures (close to linear speedups)
[Figure: features split into Group 1, Group 2, ..., Group n]
8-Point Algorithm (Longuet-Higgins ’81)
Epipolar constraint:
$$x_2^T \hat{T} R x_1 = 0$$
[Figure: two cameras with centers c0 and c1, related by (R, T), observe the 3D point P at image points x1 and x2]
1. Find camera motion
2. Find 3D location of point
For 8 point pairs, combine into linear system:
$$\chi E^s = 0, \quad \chi = (a^1, a^2, \ldots, a^n)^T$$
SVD decomposition to find R and T
Then find distances and 3D locations using least squares
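The linear step can be sketched with NumPy. Assumptions in this sketch: points are given in normalized (calibrated) homogeneous coordinates, the essential matrix is stacked row by row so that each row of the system is `kron(x2, x1)`, and the names `hat` and `eight_point` are hypothetical. The sketch stops at the essential matrix; decomposing it into R and T and triangulating the points are not shown.

```python
import numpy as np

def hat(t):
    """Skew-symmetric matrix so that hat(t) @ v equals cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def eight_point(x1, x2):
    """Estimate E (up to scale and sign) with x2^T E x1 = 0.

    x1, x2: arrays of shape (n, 3), n >= 8, normalized homogeneous points.
    """
    # One row per point pair: x2^T E x1 = kron(x2, x1) . E.ravel()
    chi = np.stack([np.kron(b, a) for a, b in zip(x1, x2)])
    # The stacked E is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(chi)
    e = vt[-1].reshape(3, 3)
    # Project onto the set of essential matrices: singular values (1, 1, 0).
    u, _, vt = np.linalg.svd(e)
    return u @ np.diag([1.0, 1.0, 0.0]) @ vt
```

With noise-free correspondences generated from a known motion, the result is parallel to `hat(T) @ R`. With real, noisy tracks the same code returns only a least-squares estimate.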
Issues And Robustness
• Ambiguity w.r.t. scale
  • Camera moved 2x? Or scene is 2x?
  • Calibrate using other sensors
• Errors in feature point location, tracking
  • No guarantee solution is close to true solution
  • No guarantee reconstruction will be consistent
  • Need more complex model and solver
  • Determine most likely camera parameters and point locations (Bayesian formulation)
• Scene isn’t static
  • Need segmentation, assume rigid bodies
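The scale ambiguity can be made explicit with one line of algebra. The depths λ1 and λ2 of the point along the two viewing rays are symbols introduced here for illustration; R, T, x1 and x2 are the quantities used in the epipolar constraint. If a point with image coordinates x1 and x2 satisfies the rigid-motion relation, then so does every scaled copy of the scene and translation:

$$\lambda_2 x_2 = R \lambda_1 x_1 + T \quad\Longrightarrow\quad (s\lambda_2)\, x_2 = R\, (s\lambda_1)\, x_1 + sT \quad \text{for any } s > 0$$

The image points x1 and x2 are unchanged, so the images alone cannot distinguish a camera that moved twice as far from a scene that is twice as large. A single metric measurement, such as a known camera speed or a distance from another sensor, fixes s.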
Conclusions
• SfM uses standard cameras to acquire 3D point clouds
• Robust implementations can yield impressive results
• The algorithms can be implemented efficiently at high resolution while
consuming low power on the videantis parallel signal processor
Please drop by our booth for a real-time demonstration
Questions?
Resources
• TU München, 20+ hour course on Multiple View Geometry by Prof. D. Cremers
  • https://www.youtube.com/playlist?list=PLTBdjV_4f-EJn6udZ34tht9EVIW7lbeo4
• Structure from Motion (UCF Comp Vis Video Lectures 2012)
  • https://www.youtube.com/watch?v=zdKX7Xo3Cb8
• School of Mines lecture
  • https://www.youtube.com/watch?v=kfN76APa4HE
  • http://inside.mines.edu/~whoff/courses/EENG512/lectures/