"3D from 2D: Theory, Implementation, and Applications of Structure from Motion," a Presentation From...

Copyright © 2015 videantis GmbH 1

Marco Jacobs

12 May 2015

3D from 2D: Theory, Implementation and Applications of Structure from Motion


Videantis & VISCODA Company Overview

videantis
• Founded in 2004
• Headquarters in Hannover
• Vision and video processor IP
• In production for many years
• Many millions of units shipped in automotive

VISCODA
• Founded in 2011
• Headquarters in Hannover
• Computer vision and imaging algorithms and services
• Structure from motion algorithm for automotive and movie applications


Why Do You Need 3D?


It’s a 3D World

• 3D helps you measure:
  • Distance of objects
  • Size of objects
  • Directions and speeds of objects
  • Camera position & direction
• Makes object segmentation easier
• Applications:
  • Automotive
  • Augmented reality
  • Mobile positioning
  • Many more

[Figure: example applications: augmented reality, positioning, automotive]


How to Get 3D?


How Do We Humans Perceive 3D?

• Binocular
  • Stereopsis
  • Convergence
• Monocular
  • Motion parallax
  • Depth from motion
  • Kinetic depth effect
  • Aerial & curvilinear perspective, size, accommodation, occlusion, texture gradient, lighting and shading, defocus blur, elevation

[Figure: the motion-based monocular cues are grouped under "Structure from motion"]

Humans primarily use monocular vision to sense depth.


Structure From Motion

[Diagram: calibrated camera + structure from motion algorithm + camera origin and direction]


Structure from Motion—Video


Depth Sensing Comparison Chart

Sensor                       Resolution   Distance   Cost
Ultrasound                   -            -          $
3D sensor (ToF/SL)           +            -          $$$
Radar                        -            +          $$$
Lidar                        +            +          $$$$$
Stereo cameras               +            +          $$$
Structure from motion        +            +          $
All of the above (fusion)    ++           ++         $$$$$

ToF = time of flight, SL = structured light.

Structure from motion reuses the monocular cameras already available in the system, capturing 3D data with a small form factor at low cost.


Structure from Motion Algorithm


Structure from Motion

[Diagram: the structure from motion pipeline in three stages, 1, 2 and 3]


Feature Detection

1. Find edges in the horizontal and vertical directions (Sobel in x, Sobel in y)

2. Two strong gradient directions? Found a corner. From the derivatives Ix and Iy, accumulated over a window w(x, y):

$$M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}, \qquad (\lambda_0, \lambda_1) = \operatorname{eigenvalues}(M)$$

• λ0 ≈ λ1, both big: "corner"
• λ1 >> λ0 or λ0 >> λ1: "edge"
• λ0 ≈ λ1, both small: "flat"

3. Select corners: R > threshold K (local maxima), where R is the Harris corner response computed from M

Processing chain:
1. Derivative calc
2. Box filter
3. Harris calc
4. Max location
5. Threshold
6. Dilate
7. Select
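The three steps and the seven-stage processing chain above can be sketched in plain Python. This is a minimal illustration, not the videantis implementation: it assumes the standard Harris response R = det(M) - k * trace(M)^2 with k = 0.04, a 3x3 box window for w(x, y), and a 3x3 neighbourhood for the dilate/select step. The function names are hypothetical.

```python
def sobel(img):
    """Derivative calc: 3x3 Sobel derivatives (Ix, Iy); border pixels are left at 0."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                        - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            iy[y][x] = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                        - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
    return ix, iy


def box_sum(a, y, x, r):
    """Box filter: sum of a over the (2r+1)x(2r+1) window at (y, x), clipped at borders."""
    h, w = len(a), len(a[0])
    return sum(a[j][i]
               for j in range(max(0, y - r), min(h, y + r + 1))
               for i in range(max(0, x - r), min(w, x + r + 1)))


def harris_response(img, k=0.04, r=1):
    """Harris calc: R = det(M) - k * trace(M)^2 per pixel (assumed standard form)."""
    ix, iy = sobel(img)
    h, w = len(img), len(img[0])
    ixx = [[ix[y][x] * ix[y][x] for x in range(w)] for y in range(h)]
    iyy = [[iy[y][x] * iy[y][x] for x in range(w)] for y in range(h)]
    ixy = [[ix[y][x] * iy[y][x] for x in range(w)] for y in range(h)]
    resp = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            a = box_sum(ixx, y, x, r)
            b = box_sum(ixy, y, x, r)
            c = box_sum(iyy, y, x, r)
            resp[y][x] = (a * c - b * b) - k * (a + c) ** 2
    return resp


def select_corners(resp, threshold):
    """Threshold, then keep only 3x3 local maxima (a dilate-and-compare step)."""
    h, w = len(resp), len(resp[0])
    corners = []
    for y in range(h):
        for x in range(w):
            v = resp[y][x]
            if v <= threshold:
                continue
            neigh = max(resp[j][i]
                        for j in range(max(0, y - 1), min(h, y + 2))
                        for i in range(max(0, x - 1), min(w, x + 2)))
            if v >= neigh:
                corners.append((y, x))
    return corners


# Usage: a bright quadrant whose single corner sits at row 6, column 6
img = [[1.0 if (y >= 6 and x >= 6) else 0.0 for x in range(12)] for y in range(12)]
resp = harris_response(img)
print(select_corners(resp, 1000.0))  # [(6, 6)]
```

In this test image the response is positive only near the corner, negative along the two straight edges (one large eigenvalue), and zero in the flat regions, matching the eigenvalue classification above.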





Pyramidal Lucas-Kanade Algorithm

• Find optical flow
  • [x,y] vector for each feature
• Algorithm:
  • Generate multiscale pyramid
  • For all features
    • For all scales
      • Calculate gradient matrix
      • For 1..K (or error < threshold)
        • Use gradient matrix to calculate next location
      • Find flow vector estimate
      • Reuse guess for next level

[Figure: image pyramids of image n and image n+1; optical flow: find the vector v]
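The loop structure on this slide can be sketched as follows. The parameters are illustrative choices, not taken from the presentation: a two-level pyramid built by 2x2 averaging, bilinear sampling, central-difference gradients, an 11x11 window (half = 5), and K = 20 iterations with a 10^-3 pixel stopping threshold. The function names and the synthetic Gaussian-blob test image are hypothetical.

```python
import math


def sample(img, x, y):
    """Bilinear interpolation of img at real-valued (x, y), clamped to the image."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.001)
    y = min(max(y, 0.0), h - 1.001)
    x0, y0 = int(x), int(y)
    ax, ay = x - x0, y - y0
    top = img[y0][x0] * (1 - ax) + img[y0][x0 + 1] * ax
    bottom = img[y0 + 1][x0] * (1 - ax) + img[y0 + 1][x0 + 1] * ax
    return top * (1 - ay) + bottom * ay


def downsample(img):
    """Next pyramid level: average each 2x2 block."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def lk_level(prev_img, next_img, px, py, gx, gy, half=5, iters=20, eps=1e-3):
    """Iterative Lucas-Kanade at one pyramid level, starting from the guess (gx, gy)."""
    # Gradient matrix G = sum [Ix^2 IxIy; IxIy Iy^2], computed once per level
    window, gxx, gxy, gyy = [], 0.0, 0.0, 0.0
    for j in range(-half, half + 1):
        for i in range(-half, half + 1):
            x, y = px + i, py + j
            ix = (sample(prev_img, x + 1, y) - sample(prev_img, x - 1, y)) / 2.0
            iy = (sample(prev_img, x, y + 1) - sample(prev_img, x, y - 1)) / 2.0
            window.append((x, y, sample(prev_img, x, y), ix, iy))
            gxx, gxy, gyy = gxx + ix * ix, gxy + ix * iy, gyy + iy * iy
    det = gxx * gyy - gxy * gxy
    if abs(det) < 1e-12:
        return 0.0, 0.0  # untextured window: flow is not observable
    vx = vy = 0.0
    for _ in range(iters):
        bx = by = 0.0
        for x, y, ival, ix, iy in window:
            diff = ival - sample(next_img, x + gx + vx, y + gy + vy)
            bx, by = bx + diff * ix, by + diff * iy
        # Next location: eta = G^-1 b, using the closed-form inverse of the 2x2 matrix
        ex = (gyy * bx - gxy * by) / det
        ey = (gxx * by - gxy * bx) / det
        vx, vy = vx + ex, vy + ey
        if ex * ex + ey * ey < eps * eps:
            break
    return vx, vy


def pyramidal_lk(prev_img, next_img, px, py, levels=2):
    """Flow of the feature at (px, py) from prev_img to next_img, coarse to fine."""
    pyr_prev, pyr_next = [prev_img], [next_img]
    for _ in range(levels - 1):
        pyr_prev.append(downsample(pyr_prev[-1]))
        pyr_next.append(downsample(pyr_next[-1]))
    gx = gy = 0.0
    for lvl in range(levels - 1, -1, -1):
        s = 2 ** lvl
        dx, dy = lk_level(pyr_prev[lvl], pyr_next[lvl], px / s, py / s, gx, gy)
        gx, gy = gx + dx, gy + dy
        if lvl > 0:
            gx, gy = 2 * gx, 2 * gy  # reuse the guess at the next (finer) level
    return gx, gy


def blob(cx, cy, n=32, sigma=4.0):
    """Hypothetical test image: a Gaussian blob centred at (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma * sigma))
             for x in range(n)] for y in range(n)]


# Usage: the blob moves by (+1.5, -1.0) pixels between image n and image n+1
fx, fy = pyramidal_lk(blob(15.0, 15.0), blob(16.5, 14.0), 15.0, 15.0)
print(round(fx, 2), round(fy, 2))
```

The printed flow should come out close to the true shift. A production tracker would additionally reject features whose G is poorly conditioned or whose residual error stays high.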


Optical Flow Lesson Learned: Fixed Point Versus Floating Point

• 32-bit float and 32-bit fixed point carry the same amount of information
• Float: large dynamic range, but less accuracy
• Fixed point: trade off accuracy for range
  • Change the precision on the fly, e.g. for y = 1/x, store x as Q30.1 and y as Q1.30
• Vision algorithms: limited range, so fixed point gives more accuracy
• LK tracking example:
  • Precision Q1.30 has 7 more fraction bits than float (30 vs. 23), i.e. about 100x better
  • OpenCV on x86/GPU loses feature points
  • Videantis fixed point tracks correctly
• Some algorithms do require dynamic range

[Figure: 32-bit layouts. Float: sign bit, 8-bit exponent, 23-bit mantissa. Fixed point: integer and fraction fields whose boundary can be moved to change precision; a fractional integer is more accurate than float.]
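The Q-format arithmetic behind these bullets can be checked with a short Python sketch. It assumes that QI.F means I integer bits and F fraction bits plus a sign bit in a 32-bit word, with rounding to nearest and saturation on overflow. `to_q`, `from_q` and `q_reciprocal` are hypothetical helpers; `struct` is used only to round values to float32 for comparison.

```python
import struct


def to_float32(v):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def to_q(v, frac_bits, total_bits=32):
    """Quantize v to a signed fixed-point integer with frac_bits fraction bits (saturating)."""
    raw = round(v * (1 << frac_bits))
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, raw))


def from_q(raw, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return raw / (1 << frac_bits)


def q_reciprocal(x_raw):
    """y = 1/x with x in Q30.1 (1 fraction bit) and y in Q1.30 (30 fraction bits).

    y_raw = y * 2**30 = 2**30 / (x_raw / 2) = 2**31 / x_raw, rounded to nearest.
    Assumes x >= 1 so that y fits in the Q1.30 range [-2, 2).
    """
    return ((1 << 31) + x_raw // 2) // x_raw


# Usage: compare fixed-point and float32 results for y = 1/3
x = 3.0
y_fixed = from_q(q_reciprocal(to_q(x, 1)), 30)  # Q30.1 in, Q1.30 out
y_float = to_float32(1.0 / x)
print(abs(y_fixed - 1 / 3), abs(y_float - 1 / 3))
```

For y = 1/3 the fixed-point error is at most 2^-31, while float32 is only accurate to about 2^-26 at that magnitude. The 2^7 = 128x gain behind the "100x" figure applies to values in [1, 2), where float32 spacing is 2^-23 versus 2^-30 for Q1.30; for smaller magnitudes the float exponent narrows the gap.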


Finding a Uniform Grid of Feature Points

Find and track feature points:
• OpenCV finds the strongest points in the image
• But we need a uniform distribution

Solution: define a grid (e.g. 16x16) of cells:
• Detect the strongest point inside each cell
• Track this point from frame to frame
• If a cell becomes empty, find a new strong feature there
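The grid strategy can be sketched as follows. It assumes candidates arrive as (x, y, score) tuples (for example, Harris responses) and that a cell counts as occupied while it still contains a tracked point; `grid_select` and its parameters are hypothetical.

```python
def grid_select(candidates, tracked, width, height, cells=16):
    """Keep tracked points and add the strongest candidate in every empty grid cell.

    candidates: list of (x, y, score); tracked: list of (x, y) still being tracked.
    """
    cw, ch = width / cells, height / cells

    def cell_of(x, y):
        # Clamp so points on the right/bottom border fall into the last cell
        return (min(int(y / ch), cells - 1), min(int(x / cw), cells - 1))

    occupied = {cell_of(x, y) for x, y in tracked}
    best = {}
    for x, y, score in candidates:
        c = cell_of(x, y)
        if c in occupied:
            continue  # this cell already has a tracked feature
        if c not in best or score > best[c][2]:
            best[c] = (x, y, score)
    return list(tracked) + [(x, y) for x, y, _ in best.values()]


# Usage: a 32x32 image split into a 2x2 grid; cell (0, 0) is already tracked
cands = [(6, 6, 10.0), (20, 4, 3.0), (25, 10, 7.0), (3, 30, 1.0)]
print(grid_select(cands, [(5, 5)], 32, 32, cells=2))  # [(5, 5), (25, 10), (3, 30)]
```

The strong candidate at (6, 6) is ignored because its cell is still covered by a tracked point, which is what keeps the feature distribution uniform.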


Feature Detect Parallelization Strategy

• Divide the image into n strips (selectable at runtime)
• 2 pixels of overlap between strips (for the filters)
• Each strip is processed on a different core

[Figure: image divided into strips 1, 2, ..., n]
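A possible way to compute the strip boundaries is sketched below. It assumes the 2-pixel overlap is implemented as a read-only halo of extra rows on each interior strip edge, which would be enough for a 3x3 Sobel followed by a 3x3 box filter; `split_strips` is a hypothetical helper.

```python
def split_strips(height, n, halo=2):
    """Split `height` image rows into n strips.

    Returns (own_start, own_end, read_start, read_end) per strip: a core writes
    rows [own_start, own_end) but reads `halo` extra rows on each interior side,
    so neighbourhood filters stay valid right up to the strip border.
    """
    base, extra = divmod(height, n)
    strips, start = [], 0
    for i in range(n):
        end = start + base + (1 if i < extra else 0)  # spread leftover rows over the first strips
        strips.append((start, end, max(0, start - halo), min(height, end + halo)))
        start = end
    return strips


print(split_strips(10, 3))
# [(0, 4, 0, 6), (4, 7, 2, 9), (7, 10, 5, 10)]
```

Because n is only a function argument, the number of strips can be chosen at runtime to match the available cores.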


Feature Tracking Parallelization Strategy

1. Build image pyramids: each processor works on strips
2. Track features:
   • A group of features is processed per core
   • Doesn't work on wide SIMD processors
   • Works well on multicore architectures (close to linear speedups)

[Figure: features divided into groups 1, 2, ..., n]
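The per-core grouping can be sketched with Python's `concurrent.futures`. `track_one` stands for a hypothetical per-feature tracker such as a Lucas-Kanade step. In CPython, threads only illustrate the work split here: pure-Python CPU-bound code does not run in parallel under the global interpreter lock, so a real implementation would use processes or native code.

```python
from concurrent.futures import ThreadPoolExecutor


def split_groups(features, n):
    """Divide the feature list into n contiguous groups of near-equal size."""
    base, extra = divmod(len(features), n)
    groups, start = [], 0
    for i in range(n):
        end = start + base + (1 if i < extra else 0)
        groups.append(features[start:end])
        start = end
    return groups


def track_all(features, track_one, n_cores=4):
    """Track each group on its own worker; features are independent, so no locking is needed."""
    groups = split_groups(features, n_cores)
    with ThreadPoolExecutor(max_workers=n_cores) as pool:
        per_group = list(pool.map(lambda g: [track_one(f) for f in g], groups))
    # pool.map keeps group order, so results line up with the input features
    return [result for group in per_group for result in group]


print(track_all(list(range(10)), lambda f: f * 2, n_cores=3))
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Each worker handles many features one after another, which suits independent cores; it does not map onto wide SIMD lanes, where every feature would need to follow the same iteration count.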



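8-point Algorithm (Longuet-Higgins, 1981)

Epipolar constraint:

$$x_2^\top \hat{T} R \, x_1 = 0$$

[Figure: camera centers c0 and c1 related by the motion (R, T); a 3D point P projects to the image points x1 and x2]

1. Find camera motion: for 8 point pairs, combine the constraints into a linear system

$$\chi E^s = 0, \qquad \chi = (a^1, a^2, \dots, a^n)^\top$$

where $E = \hat{T} R$ is the essential matrix, $E^s$ stacks its columns into a 9-vector, and each row is $a^j = x_1^j \otimes x_2^j$. Use singular value decomposition (SVD) to find R and T.

2. Find 3D location of point: then find distances and 3D locations using least squares.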
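The construction of χ can be checked numerically. The sketch below uses a made-up motion (R, T) and eight made-up points, builds one row a^j = x1^j ⊗ x2^j per correspondence, and confirms that the true E^s (the columns of T̂R stacked) satisfies χE^s = 0. Actually solving for E^s (the right singular vector of χ with the smallest singular value) and decomposing E into R and T needs an SVD routine such as `numpy.linalg.svd` and is not shown; all helper names are hypothetical.

```python
import math


def hat(t):
    """Skew-symmetric matrix T^ with T^ v = t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def rot_y(theta):
    """Rotation about the camera's y axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def kron(u, v):
    """Kronecker product u (x) v of two 3-vectors."""
    return [ui * vj for ui in u for vj in v]


def stack(m):
    """E^s: stack the columns of a 3x3 matrix into a 9-vector."""
    return [m[i][j] for j in range(3) for i in range(3)]


def build_chi(pairs):
    """One row a^j = x1^j (x) x2^j per correspondence, so that chi E^s = 0."""
    return [kron(x1, x2) for x1, x2 in pairs]


def synthetic_pairs(R, T, points):
    """Project 3D points (camera-1 frame) into both cameras; camera 2 sees X2 = R X1 + T."""
    pairs = []
    for X1 in points:
        X2 = [a + b for a, b in zip(matvec(R, X1), T)]
        pairs.append(([X1[0] / X1[2], X1[1] / X1[2], 1.0],
                      [X2[0] / X2[2], X2[1] / X2[2], 1.0]))
    return pairs


# Usage: 8 correspondences for a known motion (R, T)
R, T = rot_y(0.1), [1.0, 0.2, 0.05]
points = [(-1, -1, 4), (1, -1, 5), (-1, 1, 6), (1, 1, 7),
          (0, 0, 5), (2, 0, 8), (0, 2, 6), (-2, 1, 5)]
chi = build_chi(synthetic_pairs(R, T, points))
E_s = stack(matmul(hat(T), R))
residuals = [sum(a * e for a, e in zip(row, E_s)) for row in chi]
print(max(abs(r) for r in residuals))  # tiny: floating-point round-off only
```

Since x2ᵀ E x1 = (x1 ⊗ x2)ᵀ E^s when E^s is stacked column by column, each correspondence contributes exactly one linear equation in the nine unknowns of E^s. This is why eight pairs are needed to fix E^s up to scale.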


Issues and Robustness

• Ambiguity with respect to scale
  • Did the camera move 2x as far, or is the scene 2x as large?
  • Calibrate the scale using other sensors
• Errors in feature point location and tracking
  • No guarantee the solution is close to the true solution
  • No guarantee the reconstruction will be consistent
  • Need a more complex model and solver
  • Determine the most likely camera parameters and point locations (Bayesian formulation)
• Scene isn’t static
  • Need segmentation; assume rigid bodies
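The scale ambiguity can be demonstrated directly: scaling every scene point and the camera translation by the same factor s leaves all image projections unchanged, so s must come from elsewhere. The sketch assumes normalized pinhole projection. `metric_scale` is a hypothetical helper that takes a distance measured by another sensor for the same camera motion (wheel odometry, for example, is an assumption here).

```python
import math


def project(R, T, X):
    """Normalized pinhole projection of world point X into a camera with pose (R, T)."""
    Xc = [sum(R[i][k] * X[k] for k in range(3)) + T[i] for i in range(3)]
    return Xc[0] / Xc[2], Xc[1] / Xc[2]


def metric_scale(T_est, measured_distance):
    """Factor mapping an up-to-scale translation onto a distance measured by another sensor."""
    return measured_distance / math.sqrt(sum(t * t for t in T_est))


# Usage: doubling both the scene and the camera translation leaves every image point unchanged
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [0.5, 0.0, 0.1]
scene = [[1.0, 2.0, 10.0], [-3.0, 0.5, 12.0]]
s = 2.0
same = all(project(R, T, X) == project(R, [s * t for t in T], [s * c for c in X])
           for X in scene)
print(same)  # True
```

Multiplying the reconstructed points and T by `metric_scale(T_est, measured_distance)` then expresses the whole reconstruction in the external sensor's units.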


Conclusions

• SfM uses standard cameras to acquire 3D point clouds
• Robust implementations can yield impressive results
• The algorithms can be implemented efficiently at high resolution while consuming low power on the videantis parallel signal processor

Please drop by our booth for a real-time demonstration.


Questions?


Resources

• TU München, 20+ hour course on Multiple View Geometry by Prof. D. Cremers
  • https://www.youtube.com/playlist?list=PLTBdjV_4f-EJn6udZ34tht9EVIW7lbeo4
• Structure from Motion (UCF Comp Vis Video Lectures 2012)
  • https://www.youtube.com/watch?v=zdKX7Xo3Cb8
• Colorado School of Mines lecture (EENG512)
  • https://www.youtube.com/watch?v=kfN76APa4HE
  • http://inside.mines.edu/~whoff/courses/EENG512/lectures/

Date post:	15-Aug-2015
Category:	Technology
Upload:	embedded-vision-alliance