Stereoscopic 3D reconstruction

Post on 29-Sep-2020


Stereoscopic 3D reconstruction

• Narrower formulation: given a rectified binocular stereo pair, fuse it to produce a depth image

[Figure: image 1 + image 2 → dense depth map]

Depth from disparity

[Figure: scene point X at depth z, imaged at x and x’ by cameras with centers O and O’, focal length f, and baseline B = B1 + B2]

Similar triangles give

x / f = B1 / z        −x’ / f = B2 / z

so

disparity = x − x’ = (B1 + B2) · f / z = B · f / z

Disparity is inversely proportional to depth…
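The inverse relation z = f·B/d can be sketched in a few lines (a minimal NumPy version; the focal length f is assumed to be in pixels and the baseline B in the units you want depth in):

```python
import numpy as np

def depth_from_disparity(disparity, f, B):
    """Convert disparity to depth via z = f * B / d.

    f: focal length in pixels; B: baseline (e.g. meters).
    Zero disparity maps to infinite depth.
    """
    disparity = np.asarray(disparity, dtype=float)
    with np.errstate(divide="ignore"):
        return f * B / disparity

# A point with disparity 8 px, f = 400 px, B = 0.1 m lies 5 m away.
print(depth_from_disparity(8.0, f=400.0, B=0.1))  # 5.0
```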


The role of the baseline in depth reconstruction accuracy

[Figure: for baselines B1 < B2, the one-pixel uncertainty region of a scene point shrinks as the baseline grows, while the field of view of the stereo pair shrinks]

Optical axes of the two cameras need not be parallel (vergence)

• Field of view decreases with increasing baseline and vergence
• Accuracy increases with baseline and vergence

The correspondence problem

• …, so if we could find the corresponding points in two images, we could estimate relative depth…

• Epipolar geometry constrained our search, but we still have to solve the correspondence problem.

• Goal/motivation: establish dense correspondences between the views of a stereo pair, given the epipolar constraint, so as to perform 3D reconstruction

Matching cost

[Figure: left and right scanlines; the matching cost plotted as a function of disparity]

Correspondence search

• Slide a window along the right scanline and compare the contents of that window with the reference window in the left image
• Matching cost: SSD or normalized correlation

[Figure: left/right scanline windows with the SSD cost curve]

[Figure: left/right scanline windows with the normalized correlation curve]
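The two matching costs named above can be sketched as follows (a minimal NumPy version; `w1` and `w2` are assumed to be same-sized grayscale windows):

```python
import numpy as np

def ssd(w1, w2):
    # Sum of squared differences: lower is a better match.
    return np.sum((w1.astype(float) - w2.astype(float)) ** 2)

def ncc(w1, w2):
    # Normalized cross-correlation: higher is better, in [-1, 1].
    # Subtracting the means makes it invariant to brightness offsets;
    # the normalization makes it invariant to contrast scaling.
    a = w1.astype(float) - w1.mean()
    b = w2.astype(float) - w2.mean()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return np.sum(a * b) / denom if denom > 0 else 0.0
```

Note that NCC is unaffected by an affine brightness change of one window, which is why it is often preferred over SSD on real images.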

Basic stereo matching algorithm

• If necessary, rectify the two stereo images to transform epipolar lines into scanlines
• For each pixel x in the first image
  – Find the corresponding epipolar scanline in the right image
  – Examine all pixels on the scanline and pick the best match x’
  – Compute the disparity x – x’ and set depth(x) = B*f/(x – x’)
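The steps above can be sketched as a brute-force loop (a NumPy illustration, not an efficient implementation; it assumes rectified grayscale images and the convention that a left-image pixel at column x matches column x − d in the right image):

```python
import numpy as np

def basic_stereo(left, right, window=3, max_disp=16):
    """Window-based matching along rectified scanlines with SSD cost.

    For each left-image pixel, compare its window against windows
    shifted 0..max_disp pixels leftward in the right image and keep
    the shift with the lowest SSD. Border pixels stay at disparity 0.
    """
    h, w = left.shape
    half = window // 2
    disp = np.zeros((h, w), dtype=int)
    L, R = left.astype(float), right.astype(float)
    for y in range(half, h - half):
        for x in range(half, w - half):
            ref = L[y - half:y + half + 1, x - half:x + half + 1]
            best, best_d = np.inf, 0
            for d in range(min(max_disp, x - half) + 1):
                cand = R[y - half:y + half + 1,
                         x - d - half:x - d + half + 1]
                cost = np.sum((ref - cand) ** 2)
                if cost < best:
                    best, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

On a synthetic pair where the right image is the left image shifted by a known amount, the loop recovers that shift everywhere in the interior.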

Effect of window size

• Smaller window
  + More detail
  − More noise
• Larger window
  + Smoother disparity maps
  − Less detail

[Figure: disparity maps for W = 3 and W = 20]

Results with window search

[Figure: input data, window-based matching result, ground truth]

Failures of correspondence search

• Textureless surfaces
• Occlusions, repetition
• Non-Lambertian surfaces, specularities…

How can we improve window-based matching?

• The similarity constraint is local (each reference window is matched independently)
• Need to enforce non-local correspondence constraints


Non-local constraints

• Uniqueness

– For any point in one image, there should be at

most one matching point in the other image

• Ordering

– Corresponding points should be in the same

order in both views

• Smoothness

– We expect disparity values to change slowly

(for the most part)

Scanline stereo

• Try to coherently match pixels on the entire scanline
• Different scanlines are still optimized independently

[Figure: left image and right image with intensity profiles along a scanline]

“Shortest paths” for scan-line stereo

[Figure: the left and right scanline intensities S_left, S_right define a grid; a shortest path through the grid alternates correspondence moves (cost C_corr) and occlusion moves in either image (cost C_occl)]

Can be implemented with dynamic programming (Ohta & Kanade ’85, Cox et al. ’96):

C(i, j) = min{ C(i−1, j) + C_occl, C(i, j−1) + C_occl, C(i−1, j−1) + C_corr(i, j) }

Slide credit: Y. Boykov
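The recurrence above can be implemented directly (a minimal sketch for one scanline pair; the constant occlusion cost and the squared-intensity-difference correspondence cost are illustrative choices):

```python
import numpy as np

def scanline_dp(left_row, right_row, c_occl=1.0):
    """Dynamic-programming scanline matching (Ohta & Kanade style).

    C[i, j] = cost of the best alignment of the first i left pixels
    with the first j right pixels. Moves: occlude a pixel in either
    image (cost c_occl) or match (squared intensity difference).
    Returns the total cost of the optimal path.
    """
    n, m = len(left_row), len(right_row)
    C = np.zeros((n + 1, m + 1))
    C[:, 0] = np.arange(n + 1) * c_occl  # all-left-occluded border
    C[0, :] = np.arange(m + 1) * c_occl  # all-right-occluded border
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c_corr = (float(left_row[i - 1]) - float(right_row[j - 1])) ** 2
            C[i, j] = min(C[i - 1, j] + c_occl,      # occlusion in right
                          C[i, j - 1] + c_occl,      # occlusion in left
                          C[i - 1, j - 1] + c_corr)  # correspondence
    return C[n, m]
```

Backtracking through C (not shown) yields the disparity assignment; identical rows align at zero cost, and an extra pixel in one row is absorbed as a single occlusion.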

Dynamic Programming - Result

• No inter-scanline consistency is enforced

Stereo matching as energy minimization

[Figure: images I1, I2 and disparity map D; windows W1(i) and W2(i + D(i))]

E(D) = Σ_i [W1(i) − W2(i + D(i))]²  +  Σ_{neighbors i,j} |D(i) − D(j)|
       (data term)                     (smoothness term)

• Energy functions of this form can be minimized using graph cuts

Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, PAMI 2001
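For illustration, the energy of a given disparity map can be evaluated directly (a sketch with 1-pixel “windows” and an L1 smoothness term; actually minimizing E, e.g. with graph cuts, is the hard part and is not shown):

```python
import numpy as np

def stereo_energy(left, right, D, lam=1.0):
    """Evaluate E(D) = sum_i (I1(i) - I2(i + D(i)))^2
                     + lam * sum_{neighbor pairs} |D(p) - D(q)|.

    D holds signed disparities; pixels whose match falls outside
    the image are skipped in the data term.
    """
    h, w = left.shape
    data = 0.0
    for y in range(h):
        for x in range(w):
            xr = x + int(D[y, x])
            if 0 <= xr < w:
                data += (float(left[y, x]) - float(right[y, xr])) ** 2
    # Smoothness over 4-connected neighbor pairs (vertical + horizontal).
    Df = D.astype(float)
    smooth = np.abs(np.diff(Df, axis=0)).sum() + np.abs(np.diff(Df, axis=1)).sum()
    return data + lam * smooth
```

A constant disparity map on a correctly shifted image pair drives both terms to zero, which is the sanity check one expects of the formulation.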

Results

[Figure: window-based matching, ground truth, dynamic programming, graph cuts]

Another example

[Figure: dynamic programming vs. graph cuts]

Active stereo with structured light

• Project “structured” light patterns onto the object
• Simplifies the correspondence problem
• Allows us to use only one camera

L. Zhang, B. Curless, and S. M. Seitz. Rapid Shape Acquisition Using Color Structured Light and Multi-pass Dynamic Programming. 3DPVT 2002

Laser scanning

Optical triangulation
• Project a single stripe of laser light
• Scan it across the surface of the object
• This is a very precise version of structured light scanning

Digital Michelangelo Project, Levoy et al., http://graphics.stanford.edu/projects/mich/

Source: S. Seitz

Laser scanned models

The Digital Michelangelo Project, Levoy et al.

Source: S. Seitz

1.0 mm resolution

Aligning range images

• A single range scan is not sufficient to describe a complex surface
• Need techniques to register multiple range images

B. Curless and M. Levoy, A Volumetric Method for Building Complex Models from Range Images, SIGGRAPH 1996

Kinect: Structured infrared light

http://bbzippo.wordpress.com/2010/11/28/kinect-in-infrared/

The third eye can be used to provide further constraints or for verification

Beyond two-view stereo

Volumetric stereo

[Figure: scene volume V observed by calibrated input images]

Goal: Determine occupancy + “color” of points in V

Discrete formulation: Voxel Coloring

[Figure: discretized scene volume observed by calibrated input images]

Goal: Assign RGB values to voxels in V photo-consistent with the images

The idea…

[Figure: a point on the scene surface projects to “Red, Red, Red” in all views (photo-consistent), while a non-surface voxel projects to “Blue, Gray, Red” (inconsistent)]

Voxel Coloring Approach

1. Choose a voxel
2. Project and correlate
3. Color if consistent (standard deviation of the pixel colors below a threshold)
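Step 3’s consistency test can be sketched as follows (a toy version: the caller is assumed to have already gathered the RGB samples that the voxel projects to, in the images where it is visible):

```python
import numpy as np

def photo_consistent(pixel_colors, threshold=10.0):
    """Voxel coloring consistency test.

    pixel_colors: (k, 3) array of RGB samples, one per image in which
    the voxel is visible. If every channel's standard deviation is
    below the threshold, the voxel is kept and colored with the mean;
    otherwise it is rejected (returns None).
    """
    colors = np.asarray(pixel_colors, dtype=float)
    if np.all(colors.std(axis=0) < threshold):
        return colors.mean(axis=0)  # the color assigned to the voxel
    return None  # inconsistent: leave the voxel empty
```

The threshold absorbs sensor noise and slight calibration error; too tight a threshold carves away true surface, too loose a one keeps floating “color blobs”.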

Complexity and computability

[Figure: discretized scene volume with N voxels and C colors; the space of all scenes (Cᴺ of them) contains the set of photo-consistent scenes, which contains the true scene]

Photo-consistency

• A photo-consistent scene is a scene that exactly reproduces your input images from the same camera viewpoints
• You can’t use your input cameras and images to tell the difference between a photo-consistent scene and the true scene

Which shape do you get?

• The Photo Hull is the UNION of all photo-consistent scenes in V
  – It is a photo-consistent scene reconstruction
  – Tightest possible bound on the true scene

[Figure: true scene vs. photo hull inside the volume V]

Source: S. Seitz

Voxel Coloring Approach

Visibility Problem: in which images is each voxel visible?

Space Carving

[Figure: images 1 … N observing the volume]

Space Carving Algorithm
• Initialize to a volume V containing the true scene
• Repeat until convergence
  – Choose a voxel on the outside of the volume
  – Project to visible input images
  – Carve if not photo-consistent

K. N. Kutulakos and S. M. Seitz, A Theory of Shape by Space Carving, ICCV 1999
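The carving loop can be sketched as follows (a simplified version, not the multi-sweep algorithm of the paper; the photo-consistency test is abstracted into a caller-supplied predicate over voxel coordinates):

```python
import numpy as np

def space_carve(volume, consistent, max_iters=100):
    """Minimal space-carving loop on a binary occupancy grid.

    Repeatedly visit surface voxels and delete those the `consistent`
    predicate rejects, until a full pass carves nothing.
    """
    vol = volume.copy()
    for _ in range(max_iters):
        carved = False
        for v in _surface_voxels(vol):
            if not consistent(v):
                vol[v] = False
                carved = True
        if not carved:
            break
    return vol

def _surface_voxels(vol):
    # A voxel is on the surface if occupied and any 6-neighbor is empty.
    padded = np.pad(vol, 1, constant_values=False)
    interior = np.ones_like(vol, dtype=bool)
    for axis in range(3):
        for shift in (1, -1):
            interior &= np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    return [tuple(idx) for idx in np.argwhere(vol & ~interior)]
```

Starting from a solid block that contains the true scene, carving peels away inconsistent voxels layer by layer until only the photo-consistent shape remains.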

Space Carving Algorithm

• The Basic Algorithm is Unwieldy
  – Complex update procedure
• Alternative: Multi-Pass Plane Sweep
  – Efficient, can use texture-mapping hardware
  – Converges quickly in practice
  – Easy to implement

Space Carving Results: African Violet

[Figure: input image (1 of 45) and views of the reconstruction]

Space Carving Results: Hand

[Figure: input image (1 of 100) and views of the reconstruction]

Voxel coloring solutions

1. C = 2 (shape from silhouettes)
• Volume intersection [Baumgart 1974]
> For more info: R. Szeliski, Rapid octree construction from image sequences, CVGIP: Image Understanding, 58(1):23–32, July 1993, or
> W. Matusik, C. Buehler, R. Raskar, L. McMillan, and S. J. Gortler, Image-Based Visual Hulls, SIGGRAPH 2000

Why use silhouettes?

• Can be computed robustly
• Can be computed efficiently

[Figure: image − background = foreground difference; thresholding yields binary background/foreground masks]

Reconstruction from Silhouettes

• The case of binary images: a voxel is photo-consistent if it lies inside the object’s silhouette in all views

Finding the silhouette-consistent shape (visual hull):
• Backproject each silhouette
• Intersect the backprojected volumes

Volume intersection

• The reconstruction contains the true scene
  – But is generally not the same

Voxel algorithm for volume intersection

• Color a voxel black if it lies on the silhouette in every image
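The voxel algorithm can be sketched with a toy setup of three axis-aligned orthographic views standing in for calibrated cameras (an assumption for illustration only; with real cameras each voxel would be projected through the calibration matrices instead):

```python
import numpy as np

def visual_hull_ortho(sil_xy, sil_xz, sil_yz):
    """Volume intersection from three orthographic silhouettes.

    Voxel (x, y, z) is kept iff its projection lies inside the
    silhouette in every view. Broadcasting each 2D silhouette along
    the axis it was projected over intersects all three in one step.
    """
    return sil_xy[:, :, None] & sil_xz[:, None, :] & sil_yz[None, :, :]
```

For a convex object like a cube, the visual hull equals the object; concavities, by contrast, never show up in any silhouette and so cannot be recovered.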

Photo-consistency vs. silhouette-consistency

[Figure: true scene, photo hull, visual hull]

Visual Hull Results

• Download data and results from http://www-cvr.ai.uiuc.edu/ponce_grp/data/visual_hull/

Properties of Volume Intersection

• Pros

– Easy to implement

• Cons

– No concavities

– Reconstruction is not photo-consistent if

texture information is available

– Requires silhouette extraction

Carved visual hulls

• The visual hull is a good starting point for optimizing photo-consistency
  – Easy to compute
  – Tight outer boundary of the object
  – Parts of the visual hull (rims) already lie on the surface and are already photo-consistent
• Thus:
  1. Compute the visual hull
  2. Carve the visual hull to optimize photo-consistency

Yasutaka Furukawa and Jean Ponce, Carved Visual Hulls for Image-Based Modeling, ECCV 2006.

From multiple views to textured 3D meshes

K. Tzevanidis, X. Zabulis, T. Sarmis, P. Koutlemanis, N. Kyriazis, A.A. Argyros, “From multiple views to textured 3D meshes: a GPU-powered approach”, in Proceedings of the Computer Vision on GPUs Workshop, CVGPU’2010, in conjunction with ECCV’2010, Heraklion, Crete, Greece, 10 September 2010.

Shape from silhouettes

[Figure: multi-view silhouette-based reconstructions from the paper above]