
Focus Range Sensors

Shree K. Nayar, Department of Computer Science, Columbia University, New York, NY 10027, U.S.A.

Minori Noguchi, Masahiro Watanabe, Yasuo Nakagawa, Production Engineering Research Laboratory, Hitachi Ltd., Yokohama 244, Japan

Abstract

Structures of dynamic scenes can only be recovered using a real-time range sensor. Focus analysis offers a direct solution to fast and dense range estimation. It is computationally efficient as it circumvents the correspondence problem faced by stereo and by feature tracking in structure from motion. However, accurate depth estimation requires theoretical and practical solutions to a variety of problems including recovery of textureless surfaces, precise blur estimation, and magnification variations caused by defocusing. Both textured and textureless surfaces are recovered using an illumination pattern that is projected via the same optical path used to acquire images. The illumination pattern is optimized to ensure maximum accuracy and spatial resolution in computed depth. A prototype focus range sensor has been developed that produces up to 512x480 depth estimates at 30 Hz with an accuracy better than 0.3%. In addition, a microscopic shape from focus sensor is described that uses the derived illumination pattern and a sequence of images to recover depth with an accuracy of 1 micron. Several experimental results are included to demonstrate the performances of both sensors. We conclude with a brief summary of our recent results on passive focus analysis.

1 Introduction

Of all problems studied in computational vision, recovery of three-dimensional scene structure has by far attracted the most attention. This has resulted in a panoply of sensors and algorithms [15][2] that can be broadly classified into two categories: passive and active. Passive techniques such as shape from shading and shape from texture attempt to extract structure from a single image. These algorithms are still under investigation and, given the assumptions they are forced to invoke, they are expected to prove complementary to other techniques but not serve as stand-alone strategies. Other passive methods such as stereo and structure from motion use multiple views to resolve shape ambiguities inherent in a single image. The primary bottleneck for these methods has proved to be correspondence and feature tracking. Recently, it was demonstrated that stereo could achieve real-time performance, but only with the use of significant customized hardware [16]. Further, passive algorithms have yet to demonstrate the accuracy and robustness required for high-level perception tasks such as object recognition and pose estimation.

Hitherto, high quality depth maps have resulted only from the use of active sensors based on time of flight or light striping [15]. From a practical perspective, light stripe range finding has emerged as a clear winner. In structured environments, where active radiation of a scene is feasible, it offers a robust yet inexpensive solution to a variety of problems. However, it has suffered from one inherent drawback, namely, speed. To achieve depth maps with sufficient spatial resolution, a large number (say, N) of closely spaced stripes are used. If all stripes are projected simultaneously it is impossible to associate a unique stripe with any given image point, a process that is necessary to compute depth by triangulation. The classical approach is to obtain N images, one for each stripe. If T_f is the time required to sense and digitize an image, the scanning of N stripes takes at least N·T_f. Substantial improvements can be made by assigning Gray codes to the stripes and scanning the entire collection of stripes in sets [14]. All the information needed is then acquired in log2(N)·T_f, a significant improvement. An alternative approach uses color-coded stripe patterns [5]; this however is practical only in a gray world that reflects all wavelengths of light.
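As a brief aside (an illustration of the log2(N) figure, not part of the striping systems cited above), the binary-reflected Gray code assigns each of the N stripes a ceil(log2(N))-bit label, and each projected pattern contributes one bit of that label at every pixel; the stripe count and index below are illustrative.

```python
from math import ceil, log2

# Sketch: Gray-coded structured light resolves N stripe labels from only
# ceil(log2(N)) projected patterns, since each pattern supplies one bit of
# the stripe index at every pixel.
def gray_code(i: int) -> int:
    return i ^ (i >> 1)            # binary-reflected Gray code of stripe index i

N = 256                            # number of stripes (example value)
num_patterns = ceil(log2(N))       # images needed instead of N
stripe = 5                         # a pixel lying on stripe 5, say
bits = [(gray_code(stripe) >> k) & 1 for k in range(num_patterns)]
print(num_patterns, bits)          # 8 patterns suffice; the bits identify the stripe
```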


New hope for light stripe range finding has been instilled by advances in VLSI. Based on the notion of cell parallelism [17], a computational sensor was developed where each sensor element records a stripe detection time-stamp as a single laser stripe sweeps the scene at high speed. Depth maps are produced in as little as 1 msec, though present day silicon packaging limits the total number of cells, and hence spatial depth resolution, to 28x32 [12].

Here, we summarize our work on a class of range sensors that are based on focus analysis. In particular, we describe a real-time range sensor that produces high-resolution (512x480) depth maps at 30 Hz (video rate) [25][32]. Focus analysis has a major advantage over stereo and structure from motion. Two or more images of a scene are taken under different optical settings but from the same viewpoint, as initially demonstrated by [27][29] and subsequently by others¹. This circumvents the need for correspondence or feature tracking. The real-time sensor mentioned above uses only two scene images. These images correspond to different levels of focus, and local frequency analysis, implemented typically via linear operators, yields depth estimates. However, differences between the two images tend to be very subtle, and we believe that previous solutions to depth from defocus have met with limited practical success as they are based on rough approximations to the optical and sensing mechanisms involved in focus analysis. In contrast, our approach is based on a careful physical modeling of all the optical, sensing, and computational elements at work: the optical transfer function, defocus, image sensing and sampling, and focus measure operators.

Depth from defocus shares one inherent weakness with stereo and motion, in that it requires that the scene have high frequency textures. A textureless surface appears the same focused or defocused, and the resulting images do not contain the information necessary for depth computation. This has prompted us to develop a range sensor that uses active illumination. The key idea is to force a texture on the scene and then analyze the relative defocus of the texture in two images.

¹ All work in focus based depth computation can be broadly classified into depth from focus and depth from defocus. The former relies on a large number of images taken by displacing the sensor in small increments and uses a focus operator to detect the image of maximum focus for each scene point (see [20, 7, 22, 21, 19, 1, 34, 26]). In contrast, depth from defocus typically uses two images and estimates relative blurring to get depth (see [27, 29, 11, 28, 4, 8, 34, 10]).

Illumination projection has been suggested in the past [9][28] for both depth from defocus and depth from pattern size distortion under perspective projection. However, these projected patterns were selected in a more or less arbitrary fashion and do not guarantee the desired precision in computed depth. A critical problem therefore is determining an illumination pattern that would maximize the accuracy and robustness of depth from defocus. We arrive at a solution to this problem through a detailed Fourier analysis of the entire depth from defocus system. First, theoretical models developed for each of the optical and computational elements of the system are expressed in spatial and Fourier domains. The derivation of the illumination pattern (or filter) is then posed as an optimization problem in Fourier domain. The optimal pattern is one that maximizes sensitivity of the focus measure to depth variations while minimizing the size of the focus operator to achieve high spatial resolution in computed depth.

An implementational problem that has repeatedly surfaced in previous work is the variation in image magnification that occurs when images are taken under different focus settings [33]. This manifests as a correspondence-like problem. It has forced investigators to resort to techniques varying from image registration and warping [7] to the use of precise lens calibration for correcting magnification variations [33][7]. We have a simple but effective optical solution to this problem [31]. By appending an additional aperture to the optics, we show that the focus setting of an imaging system can be varied substantially without altering magnification.

A prototype real-time focus range sensor has been developed. It uses two CCD image detectors that view the scene through the same optical elements [25][32]. The derived illumination pattern is fabricated using micro-lithography and incorporated into the sensor. The illumination pattern is projected onto the scene via the same optical path used to image the scene. This results in several advantages. It enables precise registration of the illumination pattern with the sampling grid of the image sensors. Light rays projected out through the imaging optics are subjected to similar geometric distortions as rays reflected back to the sensors. Therefore, despite ever-present lens distortions, the illumination pattern and the sensing grid of the detector are well registered.


The coaxial illumination and imaging also result in a shadowless image; all surface regions that are visible to the sensor are also illuminated. Furthermore, since both images are acquired from the same viewing direction, the missing part or occlusion problem in stereo is avoided. Several experiments have been conducted to evaluate the accuracy and real-time capability of the sensor. In addition to a quantitative error analysis, real-time depth map sequences of moving objects are shown.

Finally, we describe a second system we have developed that is based on focus analysis [26]. This system is complementary to the first one, in that it is based on depth from focus (rather than defocus) and is capable of recovering microscopic objects with an accuracy as good as 1 micron. The starting point for this system is an algorithm described in [22], where a shape from focus method was developed for microscopic objects. The high magnification of a microscope results in images that capture brightness variations caused by the micro-structure of the surface. Most surfaces that appear smooth and non-textured to the naked eye produce highly textured images under a microscope. Examples of such surfaces are paper, plastics, ceramics, etc. Microscopic shape from focus [22] was therefore demonstrated to be an effective approach, offering solutions to a variety of challenging shape inspection problems. However, there exist surfaces that are smooth at the micro-structure level and consequently do not produce sufficient texture even under a microscope.

The illumination pattern we used in the real-time range sensor was initially incorporated into the microscopic shape from focus system [26]. The illumination pattern is projected onto the sample via the path of the bright field illumination of the microscope. As with the depth from defocus sensor, this enables very precise registration of the illumination pattern with the sampling grid of the image sensor; light rays projected out through the imaging optics are subjected to the same geometric distortions as rays reflected back to the image sensor. The motorized stage of the microscope is used to automatically acquire a set of images (typically 10-20) by moving the sample towards the objective lens. The depth from focus algorithm [22] is then applied to the image set to obtain a complete depth map of the sample. As examples, we show accurate and detailed depth maps of structures on silicon substrates and solder joints. These are samples of significant industrial import that have been found hard to recover using other vision techniques.

We conclude with a brief discussion of our most recent results on focus analysis, which include a passive bifocal vision sensor [23] and a novel algorithm for passive depth from defocus [30] that uses a minimal operator set to recover scenes with unknown and complex textures.

2 Depth from Defocus

Fundamental to depth from defocus is the relationship between focused and defocused images [3]. Figure 1 shows the basic image formation geometry. All light rays that are radiated by object point P and pass the aperture A are refracted by the lens to converge at point Q on the image plane. For a thin lens, the relationship between the object distance d, the focal length of the lens f, and the image distance d_i is given by the Gaussian lens law:

$$
\frac{1}{d} + \frac{1}{d_i} = \frac{1}{f} \qquad (1)
$$

Figure 1: Image formation and depth from defocus.

Each point on the object plane is projected onto a single point on the image plane, causing a clear or focused image I_f to be formed. If, however, the sensor plane does not coincide with the image plane and is displaced from it, the energy received from P by the lens is distributed over a patch on the sensor plane. The result is a blurred image of P. It is clear that a single image does not include sufficient information for depth estimation, as two scenes defocused to different degrees can produce identical images. A solution to depth is achieved by using two images, I_1 and I_2, separated by a known physical distance β. The problem is reduced to analyzing the relative blurring of each scene point in the two images and computing the distance α of its focused image.


Then, using d_i = γ − α, the lens law (1) yields the depth d of the scene point. Simple as this procedure may appear, several technical problems emerge when implementing an algorithm of practical value. These include (a) accurate estimation of relative defocus in the two images, (b) recovery of textured and textureless surfaces, and (c) achieving constant magnification that is invariant to the degree of defocus.
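As a small numerical sketch of this step, the snippet below applies the lens law (1) to an estimated α; the focal length matches the lens used in Section 8, but γ and α are illustrative assumptions, not calibrated sensor values.

```python
# Minimal sketch of the lens-law step (1): given the distance gamma of the
# sensor plane behind the lens and an estimated displacement alpha of the
# focused image, recover object depth d. The numbers are illustrative only.
def depth_from_alpha(f_mm: float, gamma_mm: float, alpha_mm: float) -> float:
    d_i = gamma_mm - alpha_mm              # image distance of the focused image
    return 1.0 / (1.0 / f_mm - 1.0 / d_i)  # Gaussian lens law: 1/d + 1/d_i = 1/f

if __name__ == "__main__":
    f = 12.5       # focal length (mm), e.g. the lens used in Section 8
    gamma = 13.0   # assumed distance of the reference image plane (mm)
    alpha = 0.1    # assumed displacement of the focused image (mm)
    print(f"object depth d = {depth_from_alpha(f, gamma, alpha):.1f} mm")
```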

3 Constant-Magnification Defocus

We begin with the last of the problems mentioned above. In the imaging system shown in Figure 1, the effective image location of point P moves along ray R as the sensor plane is displaced. This causes a shift in the image coordinates of P that in turn depends on the unknown scene coordinates of P. This variation in image magnification with defocus manifests as a correspondence-like problem in depth from defocus, as the right set of points in images I_1 and I_2 are needed to estimate blurring. We approach this problem from an optical perspective rather than a computational one. Consider the image formation model shown in Figure 2. The only modification made with respect to the model in Figure 1 is the use of the external aperture A′. The aperture is placed at the front-focal plane, i.e. a focal length in front of the principal point O of the lens. This simple addition solves the prevalent problem of magnification variation with distance α of the sensor plane from the lens. Simple geometrical analysis reveals that a ray of light R′ from any scene point that passes through the center O′ of aperture A′ emerges parallel to the optical axis on the image side of the lens [18]. Furthermore, this parallel ray is the axis of a cone that includes all light rays radiated by the scene point, passed through A′ and intercepted by the lens. As a result, despite blurring, the effective image coordinates of point P in both images I_1 and I_2 are the same, namely, the coordinate of its focused image Q on I_f.

This invariance of magnification to defocus holds true for any depth from defocus configuration (all values of α and β). It can also be shown that the constant-magnification property is unaffected by the aperture radius a′. Furthermore, the lens law of (1) remains valid. This modification is realizable not only in single lens systems but in any compound lens system. Given an off-the-shelf lens, such an aperture is easily appended to the casing of the lens. The resulting optical system is called a telecentric lens. While the nominal and effective F-numbers of the classical optics in Figure 1 are f/a and d_i/a, respectively, they are both equal to f/a′ in the telecentric case.

Figure 2: A constant-magnification imaging system for depth from defocus is achieved by simply placing an aperture at the front-focal plane of the optics.

4 Modeling

Effective solutions to both illumination projection and depth estimation require careful modeling and analysis of all physical phenomena involved in depth from defocus. There are five different elements, or components, that play a critical role, namely, the illumination pattern, the optical transfer function, defocusing, image sensing, and the focus operator. All of these together determine the relation between the depth of a scene point and its two focus measures. Since we have used the telecentric lens (Figure 2) in our implementation, its parameters are used in developing each model. However, all of the following expressions can be made valid for the classical lens system (Figure 1) by simply replacing the factor f/a′ by d_i/a. Though we use both spatial and Fourier (frequency) models of the above components, for brevity we will present Fourier models only when they are needed to make pertinent observations.

4.1 Illumination Pattern

Before the parameters of the illumination pattern can be determined, an illumination model must be defined. Such a model must be flexible in that it must subsume a large enough variety of possible illumination patterns. As we will describe shortly, the image sensor used has rectangular pixels arranged on a rectangular spatial grid. Hence, the basic building block of the model is a rectangular illuminated patch, or cell, with uniform intensity:


$$
i_c(x, y; b_x, b_y) = {}^{2}\Pi\!\left(\frac{x}{b_x}, \frac{y}{b_y}\right) \qquad (2)
$$

where ²Π() is the two-dimensional Rectangular function [6]. The unknown parameters of this illumination cell are b_x and b_y, the length and width of the cell. This cell is assumed to be repeated on a two-dimensional grid to obtain a periodic pattern. This periodicity is essential since our goal is to achieve spatial invariance in depth accuracy. The periodic grid is defined as:

$$
i_g(x, y; t_x, t_y) = {}^{2}\mathrm{III}\!\left(\frac{1}{2}\left(\frac{x}{t_x} + \frac{y}{t_y}\right), \frac{1}{2}\left(\frac{x}{t_x} - \frac{y}{t_y}\right)\right) \qquad (3)
$$

where ²III() is the two-dimensional Shah function [6], and 2t_x and 2t_y determine the periods of the grid in the x and y directions. The final illumination pattern is obtained by convolving the cell i_c(x, y) with the grid i_g(x, y), i.e. i(x, y) = i_c(x, y) ∗ i_g(x, y). The exact pattern is therefore determined by four parameters, namely, b_x, b_y, t_x and t_y. The above illumination grid is not as restrictive as it may appear upon initial inspection. For instance, the parameters b_x, b_y, 2t_x and 2t_y can each be stretched to obtain repeated illumination and non-illumination stripes in the horizontal and vertical directions, respectively. Alternatively, they can also be adjusted to obtain a checkerboard illumination pattern with large or small illuminated patches. The exact values for b_x, b_y, t_x and t_y will be evaluated by the optimization procedure described later.

The Fourier transforms of the illumination cell, grid, and pattern are denoted as I_c(u, v), I_g(u, v), and I(u, v), respectively, and are related as:

$$
I(u, v; b_x, b_y, t_x, t_y) = I_c(u, v) \cdot I_g(u, v) \qquad (4)
$$
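The sketch below builds such a pattern by literally convolving a discrete cell with a discrete grid of impulses, as in i(x, y) = i_c(x, y) ∗ i_g(x, y); the pixel-unit sizes chosen here are illustrative, not the optimized values derived in Section 5.

```python
import numpy as np
from scipy.signal import convolve2d

def illumination_pattern(height, width, bx, by, tx, ty):
    # Impulse grid i_g of eq. (3): impulses where both diagonal grid
    # coordinates are integers.
    grid = np.zeros((height, width))
    for y in range(height):
        for x in range(width):
            u = 0.5 * (x / tx + y / ty)
            v = 0.5 * (x / tx - y / ty)
            if abs(u - round(u)) < 1e-9 and abs(v - round(v)) < 1e-9:
                grid[y, x] = 1.0
    cell = np.ones((by, bx))          # uniform rectangular cell i_c of eq. (2)
    # i(x, y) = i_c * i_g (2-D convolution), clipped to a binary on/off pattern
    return np.clip(convolve2d(grid, cell, mode="same"), 0.0, 1.0)

# With tx = ty = 2 pixels and a 2x2-pixel cell, this gives a checkerboard of
# 2x2 blocks, in the spirit of the pattern of Figure 4(a).
pattern = illumination_pattern(32, 32, bx=2, by=2, tx=2, ty=2)
```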

4.2 Optical Transfer Function

Adjacent points on the illuminated surface reflect light waves that interfere with each other to produce diffraction effects. The angle of diffraction increases with the spatial frequency of the surface texture. Since the lens aperture of the imaging system (Figure 2) is of finite radius a′, it does not capture the higher order diffractions radiated by the surface (see [3] for details). This effect places a limit on the optical resolution of the imaging system, characterized by the optical transfer function (OTF):

$$
O(u, v; a', f) =
\begin{cases}
\left(\dfrac{a'}{f}\right)^{2}(\gamma - \sin\gamma), & \sqrt{u^2 + v^2} \le \dfrac{2a'}{\lambda f} \\[6pt]
0, & \sqrt{u^2 + v^2} > \dfrac{2a'}{\lambda f}
\end{cases}
\qquad (5)
$$

where γ = 2 cos⁻¹( (λf/a′) · √(u² + v²) / 2 ).

Figure 3: Spatial and frequency models for the optical and sensing elements of depth from defocus.

Here, (u, v) is the spatial frequency of the two-dimensional surface texture as seen from the image side of the lens, f is the focal length of the lens, and λ is the wavelength of incident light. It is clear from the above expression that only spatial frequencies below the limit 2a′/(λf) will be imaged by the optical system (Figure 3). This in turn places restrictions on the frequency of the illumination pattern.
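A small sketch of this cutoff, using equation (5) directly, is given below; the aperture radius, focal length, and wavelength are assumed values for illustration only.

```python
import math

# Sketch of the optical cutoff in eq. (5): frequencies above 2a'/(lambda*f)
# are not imaged. All optical parameters below are illustrative assumptions.
def otf(freq, a_prime, f, wavelength):
    cutoff = 2.0 * a_prime / (wavelength * f)
    if freq > cutoff:
        return 0.0
    gamma = 2.0 * math.acos(wavelength * f * freq / (2.0 * a_prime))
    return (a_prime / f) ** 2 * (gamma - math.sin(gamma))

a_prime = 2.0e-3   # aperture radius a' (m), assumed
f = 12.5e-3        # focal length (m)
lam = 550e-9       # wavelength of incident light (m)
print("cutoff frequency:", 2 * a_prime / (lam * f), "cycles/m")
print("OTF at half the cutoff:", otf(a_prime / (lam * f), a_prime, f, lam))
```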

4.3 Defocusing

The defocus function is described in detail in previous work (see [3][13] for example). As in Figure 2, let α be the distance between the focused image of a surface point and its defocused image formed on the sensor plane. The light energy radiated by the surface point and collected by the imaging optics is uniformly distributed over a circular patch on the sensor plane. This patch, also called the pillbox, is the defocus function (Figure 3):

$$
h(x, y; \alpha, a', f) = \frac{f^2}{2\pi a'^2 \alpha^2}\,\Pi\!\left(\frac{f}{2 a' \alpha}\sqrt{x^2 + y^2}\right) \qquad (6)
$$

where a′ is the radius of the telecentric lens aperture. The Fourier transform of the defocus function is:

$$
H(u, v; \alpha, a', f) = \frac{f}{2\pi a' \alpha \sqrt{u^2 + v^2}}\, J_1\!\left(\frac{2\pi a' \alpha}{f}\sqrt{u^2 + v^2}\right) \qquad (7)
$$

where J_1 is the first-order Bessel function [3]. As is evident from the above expression, defocus serves as a low-pass filter. The bandwidth of the filter increases as α decreases, i.e. as the sensor plane gets closer to the plane of focus. Note that in a defocused image, all frequencies are attenuated at the same time. In the case of passive depth from focus or defocus, this poses a serious problem; different frequencies in an unknown scene are bound to have different (and unknown) magnitudes and phases. It is therefore difficult to estimate the degree of defocus of an image region without the use of a large set of narrow-band focus operators that analyze each frequency in isolation. Hence, it would be desirable to have an illumination pattern that has a single dominant frequency, enabling robust estimation of defocus and hence depth.

4.4 Image Sensing

We assume the image sensor to be a typical CCD TV camera that can be modeled as a rectangular array of rectangular sensing elements (pixels). The quantum efficiency [13] of each pixel is assumed to be uniform over the area of the pixel. Let m(x, y) be the continuous image formed on the sensor plane. The finite pixel area has the effect of averaging the continuous image m(x, y). In the spatial domain, the averaging function is the rectangular cell:

$$
s_c(x, y; w_x, w_y) = {}^{2}\Pi\!\left(\frac{x}{w_x}, \frac{y}{w_y}\right) \qquad (8)
$$

where w_x and w_y are the length and width of the pixel, respectively. The discrete image is obtained by sampling the convolution of m(x, y) with s_c(x, y). This sampling function is a rectangular grid:

$$
s_g(x, y; p_x, p_y, \phi_x, \phi_y) = \frac{1}{p_x p_y}\,{}^{2}\mathrm{III}\!\left(\frac{x - \phi_x}{p_x}, \frac{y - \phi_y}{p_y}\right) \qquad (9)
$$

where p_x and p_y are the spacings between discrete samples in the two spatial dimensions, and (φ_x, φ_y) is the phase shift of the grid. The final discrete image is therefore:

$$
m_d(x, y) = \left(s_c(x, y) * m(x, y)\right) \cdot s_g(x, y) \qquad (10)
$$

The parameters w_x, w_y, p_x, and p_y are all determined by the particular image sensor used. These parameters are therefore known and their values are substituted after the optimization is done. On the other hand, the phases (φ_x, φ_y) of the sampling function are defined with respect to the illumination pattern and are also viewed as parameters to be optimized. In Fourier domain, the final discrete image is:

$$
M_d(u, v) = \left(S_c(u, v) \cdot M(u, v)\right) * S_g(u, v) \qquad (11)
$$
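A minimal sketch of equation (10) follows: a finely rasterized stand-in for m(x, y) is box-averaged over the pixel area and then sampled on a rectangular grid. Sizes are in fine-grid units and chosen only for illustration.

```python
import numpy as np
from scipy.signal import convolve2d

# Sketch of the sensing model of eq. (10): pixel averaging s_c followed by
# sampling on the grid s_g. The "continuous" image is approximated by a fine
# raster; pixel size and sampling period below are illustrative.
def sense(m, wx, wy, px, py, phix=0, phiy=0):
    cell = np.ones((wy, wx)) / (wx * wy)       # pixel averaging function s_c
    averaged = convolve2d(m, cell, mode="same")
    return averaged[phiy::py, phix::px]        # sampling grid s_g

fine = np.random.rand(64, 64)                  # stand-in for m(x, y)
image = sense(fine, wx=4, wy=4, px=4, py=4)    # 16 x 16 discrete image
```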

4.5 Focus Operator

Since defocusing has the effect of suppressing high-frequency components in the focused image, it is desirable that the focus operator respond to high frequencies in the image. For the purpose of illumination optimization, we use the Laplacian. However, the derived pattern will remain optimal for a large class of symmetric focus operators. In the spatial domain, the 3x3 discrete Laplacian is:

$$
l(x, y; q_x, q_y) = 4\,\delta(x)\,\delta(y) - \left[\,\delta(x)\,\delta(y - q_y) + \delta(x)\,\delta(y + q_y) + \delta(x - q_x)\,\delta(y) + \delta(x + q_x)\,\delta(y)\,\right] \qquad (12)
$$

Here, q_x and q_y are the spacings between neighboring elements of the discrete Laplacian kernel and are given by the image sensor. The Fourier transform of the Laplacian is:

$$
L(u, v; q_x, q_y) = 4 - 2\cos(2\pi q_x u) - 2\cos(2\pi q_y v) \qquad (13)
$$

The required discrete nature of the focus operator comes with a price: it tends to broaden the bandwidth of the operator. Once the illumination pattern has been determined, the above filter will be tuned to maximize sensitivity to the fundamental illumination frequency while minimizing the effects of spurious frequencies caused either by the scene's inherent texture or by image noise.
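For reference, the 3x3 discrete Laplacian of equation (12) and its frequency response, equation (13), can be written down directly; the sketch below assumes kernel element spacings of one pixel.

```python
import numpy as np

# The 3x3 discrete Laplacian of eq. (12) (qx = qy = one pixel).
laplacian = np.array([[0, -1, 0],
                      [-1, 4, -1],
                      [0, -1, 0]], dtype=float)

def laplacian_response(u, v, qx=1.0, qy=1.0):
    """L(u, v) = 4 - 2 cos(2*pi*qx*u) - 2 cos(2*pi*qy*v), eq. (13)."""
    return 4.0 - 2.0 * np.cos(2.0 * np.pi * qx * u) - 2.0 * np.cos(2.0 * np.pi * qy * v)

# The response grows toward the Nyquist frequency (u = v = 1/2 cycles/pixel),
# which is why the illumination fundamental is pushed as high as the OTF allows.
print(laplacian_response(0.25, 0.25), laplacian_response(0.5, 0.5))
```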

4.6 Focus Measure

The focus measure is simply the output of the focus operator. It is related to defocus α (and hence depth d) via all of the components modeled above. Note that the illumination pattern (i_c ∗ i_g) is projected through optics that is similar to that used for image formation. Consequently, the pattern is also subjected to the limits imposed by the optical transfer function o and the defocus function h. Therefore, the texture projected on the scene is:

$$
i(x, y; b_x, b_y, t_x, t_y) * o(x, y; a', f) * h'(x, y; \alpha', a', f) \qquad (14)
$$

where α′ represents the defocus of the illumination itself, which depends on the depth of the illuminated point. However, the illumination pattern, once incident on a surface patch, plays the role of surface texture, and hence the defocus α′ of the illumination does not have any significant effect on depth estimation. The projected texture is reflected by the scene and projected by the optics back onto the image plane to produce the discrete image:

$$
\left\{\, i(x, y; b_x, b_y, t_x, t_y) * o^{*2}(x, y; a', f) * h'(x, y; \alpha', a', f) * h(x, y; \alpha, a', f) * s_c(x, y; w_x, w_y) \,\right\} \cdot s_g(x, y; p_x, p_y, \phi_x, \phi_y) \qquad (15)
$$

where o^{*2} = o ∗ o. The final focus measure function g(x, y) is the result of applying the discrete Laplacian to the above discrete image:

$$
g(x, y) = \left\{ (i * o^{*2} * h' * h * s_c) \cdot s_g \right\} * l \qquad (16)
$$

Since the distance between adjacent weights of the Laplacian kernel must be an integer multiple of the period of the image sampling function s_g, (16) can be rearranged as:

$$
g(x, y) = (i * o^{*2} * h' * h * s_c * l) \cdot s_g = g_0 \cdot s_g \qquad (17)
$$

where g_0 = i ∗ o^{*2} ∗ h′ ∗ h ∗ s_c ∗ l. Alternately, in Fourier domain we have:

$$
G(u, v) = (I \cdot O^2 \cdot H' \cdot H \cdot S_c \cdot L) * S_g = G_0 * S_g \qquad (18)
$$

The above expression gives us the final output of the focus operator for any value of the defocus parameter α.

5 Optimization

The illumination optimization problem is formulated as follows: establish closed-form relationships between the illumination parameters (b_x, b_y, t_x, t_y), sensor parameters (w_x, w_y, p_x, p_y, φ_x, φ_y), and discrete Laplacian parameters (q_x, q_y) so as to maximize the sensitivity, robustness, and spatial resolution of the focus measure g(x, y). High sensitivity implies that a small variation in the degree of focus results in a large variation in g(x, y). By robustness we mean that all pixels with the same degree of defocus produce the same focus measure independent of their location on the image plane. This ensures that depth estimation accuracy is invariant to location on the image plane. Lastly, high spatial resolution is achieved by minimizing the size of the focus operator.

The details of the optimization process are given in [24] and will be omitted in the interest of space. Here, we briefly outline the arguments we have used to arrive at the optimal pattern. In order to minimize smoothing effects and maximize spatial resolution of computed depth, the support (or span) of the discrete Laplacian must be as small as possible. This in turn requires the frequency of the illumination pattern to be as high as possible. However, the optical transfer function described in Section 4.2 imposes limits on the highest frequency that can be imaged by the optical system. This maximum allowable frequency is 2a′/(λf), determined by the numerical aperture of the telecentric lens. Our objective then is to maximize the fundamental spatial frequency (1/t_x, 1/t_y) of the illumination. In order to maximize this frequency while maintaining high detectability, we must have √((1/t_x)² + (1/t_y)²) close to the optical limit 2a′/(λf). This in turn pushes all higher harmonics in the illumination pattern outside the optical limit. What we are left with is a surface texture whose image has only the quadrupole fundamental frequencies (±1/t_x, ±1/t_y). Using this observation, the illumination pattern parameters (b_x, b_y, t_x, t_y) and the illumination phase shift (φ_x, φ_y) that maximize ||G(1/t_x, 1/t_y)|| are determined in [24]. Two optimal patterns were found and are shown in Figure 4. Exactly how such high resolution patterns can be projected and perfectly registered with the image detector will be described in the experimental section.

Figure 4: Optimal illumination filter patterns: (a) t_x = 2p_x, t_y = 2p_y, φ_x = 0, φ_y = 0; and (b) t_x = 4p_x, t_y = 4p_y, φ_x = t_x/8, φ_y = t_y/8. Here, (t_x, t_y) is the illumination period, (p_x, p_y) is the pixel size, and (φ_x, φ_y) is the illumination phase shift with respect to the sensing grid.


6 Tuned Focus Operator

For the purpose of illumination optimization, we used the Laplacian. The resulting illumination pattern has only a single dominant absolute frequency, (1/t_x, 1/t_y). Given this, we are in a position to further refine our focus operator so as to minimize the effects of all other frequencies caused either by the physical texture of the scene or by image noise.

Given that the operator must eventually be discrete and of finite support, there is a limit to the extent to which it can be tuned. To constrain the problem, we impose the following conditions. (a) To maximize spatial resolution in computed depth, we force the operator kernel to be 3x3. (b) Since the fundamental frequency of the illumination pattern has a symmetric quadrupole arrangement, the focus operator must be rotationally symmetric. (c) The operator must not respond to any DC component in image brightness. The above conditions yield a set of equations with the operator elements as variables [32]. These equations were solved to find that the operator, with its symmetric structure, has only one free variable. This variable was optimized so as to yield a frequency response with the sharpest peaks, i.e. a power spectrum with the lowest second moment around the illumination frequency (±1/t_x, ±1/t_y). This tuned focus operator was found to have substantially sharper peaks than the discrete Laplacian.
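The sketch below illustrates the idea rather than reproduces the actual operator of [32]: a 3x3 kernel that is rotationally symmetric and rejects DC has, up to scale, a single free variable (here taken as the corner-to-edge weight ratio), which can be searched numerically so that the power of its frequency response concentrates around an assumed illumination fundamental.

```python
import numpy as np

# Hedged sketch of Section 6: a one-parameter family of 3x3, rotationally
# symmetric, zero-DC kernels, tuned by a simple spectral-concentration
# criterion. The true tuned operator and criterion of [32] may differ.
def kernel(r):
    edge, corner = 1.0, r
    center = -4.0 * (edge + corner)              # zero response to DC
    return np.array([[corner, edge, corner],
                     [edge, center, edge],
                     [corner, edge, corner]])

def response(u, v, r):
    # DTFT of the kernel above (pixel spacing = 1)
    return (-4.0 * (1.0 + r)
            + 2.0 * (np.cos(2 * np.pi * u) + np.cos(2 * np.pi * v))
            + 4.0 * r * np.cos(2 * np.pi * u) * np.cos(2 * np.pi * v))

u0 = v0 = 0.25                                   # assumed fundamental: tx = 4 px
uu, vv = np.meshgrid(np.linspace(-0.5, 0.5, 101), np.linspace(-0.5, 0.5, 101))
best = min(np.linspace(-1.0, 1.0, 201),
           key=lambda r: np.sum(((uu - u0) ** 2 + (vv - v0) ** 2) * response(uu, vv, r) ** 2)
                         / np.sum(response(uu, vv, r) ** 2))
print("corner/edge ratio with the most concentrated response:", round(float(best), 3))
```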

7 Depth from Two Images

Depth estimation uses two images of the scene, I_1(x, y) and I_2(x, y), that correspond to different effective focal lengths as shown in Figure 2. The depth of each scene point is determined by estimating the displacement α of the focused plane I_f for the scene point. The tuned focus operator is applied to both images to get focus measure images g_1(x, y) and g_2(x, y). Since the image now has a single dominant frequency, namely (±1/t_x, ±1/t_y), a relation between the focus measures and the defocus α can be derived using (18):

$$
q = \frac{g_1 - g_2}{g_1 + g_2} = \frac{H\!\left(\tfrac{1}{t_x}, \tfrac{1}{t_y}; \alpha\right) - H\!\left(\tfrac{1}{t_x}, \tfrac{1}{t_y}; \alpha - \beta\right)}{H\!\left(\tfrac{1}{t_x}, \tfrac{1}{t_y}; \alpha\right) + H\!\left(\tfrac{1}{t_x}, \tfrac{1}{t_y}; \alpha - \beta\right)} \qquad (19)
$$

As shown in Figure 5, q is a monotonic function of α such that −p ≤ q ≤ p, p ≤ 1. In practice, the above relation can be pre-computed and stored as a look-up table that maps q to a unique α.

Since α represents the position of the focused image, the lens law (1) yields the depth d of the corresponding scene point. Note that the tuned focus operator designed in the previous section is a linear filter, making it feasible to compute depth maps of scenes in real-time using simple image processing hardware.
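A compact sketch of this pipeline is given below; the focus operator is the ordinary Laplacian and the q-to-α look-up table is a placeholder, whereas the real sensor uses the tuned operator and a table derived from equation (19) and the calibrated optics [32].

```python
import numpy as np
from scipy.signal import convolve2d

# Sketch of Section 7: focus measures from two images, the normalized ratio
# q = (g1 - g2) / (g1 + g2), and a monotonic look-up table mapping q to the
# defocus alpha. Table values and the operator are placeholders/assumptions.
LAPLACIAN = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=float)
Q_TABLE = np.linspace(-0.9, 0.9, 256)               # q values (placeholder)
ALPHA_TABLE = np.linspace(0.25e-3, -0.25e-3, 256)   # corresponding alpha (m)

def depth_map(i1, i2, eps=1e-6):
    g1 = np.abs(convolve2d(i1, LAPLACIAN, mode="same"))
    g2 = np.abs(convolve2d(i2, LAPLACIAN, mode="same"))
    q = (g1 - g2) / (g1 + g2 + eps)
    idx = np.searchsorted(Q_TABLE, np.clip(q, Q_TABLE[0], Q_TABLE[-1]))
    return ALPHA_TABLE[np.clip(idx, 0, len(ALPHA_TABLE) - 1)]   # alpha per pixel

alpha = depth_map(np.random.rand(64, 64), np.random.rand(64, 64))
```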

Figure 5: Relation between the focus measures g_1 and g_2 and the defocus parameter α.

8 Real Time Range Sensor

Based on the above results, we have implemented [25][32] the real-time focus range sensor shown in Figure 6. The scene is imaged using a standard 12.5 mm Fujinon lens with an additional aperture added to convert it to a telecentric lens. Light rays passing through the lens are split in two directions using a beam-splitting prism. This produces two images that are simultaneously detected using two Sony XC-77RR 8-bit CCD cameras. The positions of the two cameras are precisely fixed such that one obtains a near-focus image while the other obtains a far-focus image. In this setup, a physical displacement of 0.25 mm between the effective focal lengths of the two CCD cameras translates to a sensor depth of field of approximately 30 cm. This detectable range of the sensor can be varied either by changing the sensor displacement or the focal length of the imaging optics.

The illumination pattern shown in Figure 4(b) was etched on a glass plate using microlithography, a process widely used in VLSI. The filter was then placed in the path of a 300 W Xenon arc lamp. The illumination pattern generated is projected using a telecentric lens identical to the one used for image formation. A half-mirror is used to ensure that the illumination pattern projects onto the scene via the same optical path used to acquire images. As a result, the pattern is almost perfectly registered with respect to the pixels of the two CCD cameras. If objects in the scene have a strong specular reflection component, cross-polarized filters can be attached to the illumination and imaging lenses to filter out specularities.

Figure 6: (a) The real-time focus range sensor and its key components. (b) The sensor can produce depth maps up to 512x480 in resolution at 30 Hz.

Images from the two CCD cameras are digitized and processed using MV200 Datacube image processing hardware. The present configuration includes the equivalent of two 8-bit digitizers, two A/D converters, and one 12-bit convolver. This hardware enables simultaneous digitization of the two images, convolution of both images with the tuned focus operator, and the computation of a depth map, all within a single frame time of 33 msec with a lag of 33 msec. A look-up table is used to map each pair of focus measures to a unique depth estimate (see [32] for details).

Several experiments were conducted on both textured and textureless surfaces to test the performance of the sensor [25][32]. The performance evaluation results are summarized in Table 1 and discussed in detail in [32].

                          Simultaneous Image Grab   Successive Image Grab
Spatial Resolution        256 x 240                 512 x 480
Speed                     30 Hz                     30 Hz
Delay                     33 msec                   33 msec
Depth Accuracy (rms)      0.24 %                    0.34 %
Repeatability (rms)       0.23 %                    0.29 %

Table 1: Performance characteristics of the sensor.

As stated earlier, structures of dynamic scenes can only be recovered using a real-time sensor such as the one we have developed. Figure 7 illustrates the power of such a high-speed, high-resolution sensor. The figure shows two brightness images and the computed depth map of a cup with milk flowing out of it.

Figure 7: (a) Two images of a scene taken using different focus settings. (b) A depth map of the scene computed in 33 msec by the focus range sensor.

Figure 8 shows a scene with polyhedral objects. The computed depth map in Figure 8(b) is fairly accurate despite the complex textural properties of the objects. All surface discontinuities and orientation discontinuities are well preserved. Figure 9 shows an object's depth map computed as it rotates on a motorized turntable. Such depth map sequences are valuable for automatic CAD model generation from sample objects.

9 Microscopic Shape from Focus

The optimal illumination filter shown in Figure 4(b) was also incorporated into the shape from focus system developed in [22]. A set of sample images is obtained by automatically moving the microscope stage in increments of ∆z = z_i − z_{i−1}. The Laplacian focus operator is applied to each image to obtain a set of focus measure values at each image point (x, y); the number of focus measures equals the number of images taken.


Figure 8: (a) Near and far focused images of a set of polyhedral objects. (b) Computed depth map.

Figure 9: Depth maps generated by the sensor at 30 Hz while an object rotates on a motorized turntable.

The discrete stage position z_j that yields the maximum focus measure value at an image point can be used as an approximation of the depth of the corresponding surface point. A more accurate depth estimate z is obtained by applying Gaussian interpolation [22] to the three focus measures corresponding to z_{j−1}, z_j, and z_{j+1}.
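The following sketch implements that refinement for one pixel; fitting a Gaussian to three samples is equivalent to fitting a parabola to their logarithms, and the stage spacing and synthetic focus measures below are illustrative.

```python
import numpy as np

# Sketch of the peak refinement of Section 9: refine the coarse stage position
# z_j by Gaussian interpolation over the focus measures at z_{j-1}, z_j, z_{j+1}.
def gaussian_peak(z_j, dz, f_prev, f_peak, f_next, eps=1e-12):
    a, b, c = (np.log(max(v, eps)) for v in (f_prev, f_peak, f_next))
    denom = a - 2.0 * b + c
    if abs(denom) < eps:               # flat neighborhood: keep the coarse estimate
        return z_j
    return z_j + 0.5 * dz * (a - c) / denom

# Example: focus measures sampled from a Gaussian peaked at z = 5.2 um.
dz, true_peak = 0.5, 5.2
z = np.arange(0.0, 10.0, dz)
f = np.exp(-(z - true_peak) ** 2 / 2.0)
j = int(np.argmax(f))
print("coarse:", z[j], "refined:", round(gaussian_peak(z[j], dz, f[j-1], f[j], f[j+1]), 3))
```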

It was shown in [22] that at least 3 focus measures are needed for depth estimation by Gaussian interpolation. In the active illumination system proposed here, the fundamental frequency of the surface texture is determined by the illumination filter used and is simply √((1/t_x)² + (1/t_y)²). For this frequency the defocus measure has a zero-crossing at the defocus value α′ such that (2πaα′/d)·√((1/t_x)² + (1/t_y)²) = 3.83. This gives us an upper limit on the usable defocus range for any point on the surface, which is −α′ ≤ α ≤ α′. Therefore, on the image side of the microscope optics, we need to obtain (for any surface point) at least 3 images within the above defocus range. This gives us the following maximum distance between consecutive focused images on the image side of the optics:

$$
\Delta z' \le \frac{2\alpha'}{4} = \frac{1}{2}\cdot\frac{3.83\, d}{2\pi a\sqrt{(1/t_x)^2 + (1/t_y)^2}} \qquad (20)
$$

This distance on the image side is related to the maximum allowable microscope stage displacement (between consecutive images) on the sample side of the optics by the magnification M of the objective lens used:

$$
\Delta z \simeq \Delta z' \cdot \frac{1}{M^2} \qquad (21)
$$

In our experiments, the magnification of the objective lens is M = 20, the ratio a/d = 0.025, and the illumination parameters are t_x = 44 µm, t_y = 52 µm. Using these values in eqs. (20) and (21) we get the maximum allowable stage displacement ∆z ≤ 0.5 µm. Experiments reported in [26] illustrate that ∆z = 0.5 µm does in fact produce the best results. Further decreasing ∆z does not significantly improve the accuracy of the depth maps.

The sample in Figure 10 has rectangular structures fabricated on a smooth silicon wafer. Silicon wafer inspection is of great relevance in a variety of chip manufacturing processes. The surface of the wafer is very smooth, resulting in images (see Figure 10(a)) that are more or less textureless. This renders the original shape from focus system [22] ineffective.


A total of 16 images were taken for this sample. The derived illumination pattern produces very accurate shape information that is far superior to that produced by the bright field illumination of the microscope [26]. Similar results were obtained for the solder joint sample shown in Figure 11. For this sample, an objective lens with M = 10 was used and a total of 23 sample images were taken. This sample exhibits noticeable texture under a microscope. However, the texture is not consistent over the entire surface. As a result, the illumination pattern is necessary to get an accurate depth map. Solder shape inspection has remained a challenging and unresolved industrial problem. These results indicate that microscopic shape from focus may provide an effective solution to this important problem.

Figure 10: (a) Camera image and (b) depth map of rectangular structures on a smooth silicon substrate. The structures are approximately 13 µm tall.

Figure 11: (a) Camera image and (b) depth map of a solder joint on a circuit board. The solder joint is approximately 150 µm high and 100 µm wide.

10 Summary

We have summarized our results on a variety of issues related to depth estimation by focus analysis. Accurate modeling of optics and sensing was shown to be essential for precise depth estimation. Both textured and textureless surfaces are recovered by using an optimized illumination pattern that is registered with the image sensor. We also presented an optical solution to constant-magnification defocusing, a problem that has limited the precision of depth from defocus algorithms. All of these results were used to implement a real-time focus range sensor that produces high resolution depth maps at frame rate. This sensor is unique in its ability to produce fast, dense, and precise depth information at a very low cost. With time we expect the sensor to find applications ranging from visual recognition and robot control to automatic CAD model generation for visualization and virtual reality.

The second system we described targets a different class of objects, namely, microscopic structures. Using the derived illumination pattern, we have demonstrated a fully automated microscopic shape from focus system that can recover depth with an accuracy within 1 micron. This system has well-defined applications in the industrial arena, where a depth sensor for samples such as silicon wafers and solder joints is much sought after.

The obvious extension to this work is the development of passive focus range finders for both indoor and outdoor scenes. We have already implemented a passive bifocal vision sensor and are in the process of evaluating its capabilities [23]. Such a sensor cannot afford the luxury of projected illumination. It must rely on complex scene textures for depth estimation. In this regard, we have recently developed an efficient depth from defocus algorithm [30] that uses a minimal set of operators to recover the structures of scenes with unknown textures.

References

[1] N. Asada, H. Fujiwara, and T. Matsuyama. Edge and depth from focus. Proc. of Asian Conf. on Comp. Vis., pages 83–86, Nov. 1993.

[2] P. J. Besl. Range imaging sensors. Technical Report GMR-6090, General Motors Research Laboratories, March 1988.


[3] M. Born and E. Wolf. Principles of Optics. London: Pergamon, 1965.

[4] V. M. Bove, Jr. Entropy-based depth from focus. Jrnl. of Opt. Soc. of Am. A, 10:561–566, Apr. 1993.

[5] K. L. Boyer and A. C. Kak. Color-encoded structured light for rapid active sensing. IEEE Trans. on Pattern Analysis and Machine Intelligence, 9(1):14–28, January 1987.

[6] R. N. Bracewell. The Fourier Transform and Its Applications. McGraw Hill, 1965.

[7] T. Darrell and K. Wohn. Pyramid based depth from focus. Proc. of IEEE Conf. on Comp. Vis. and Patt. Rec., pages 504–509, June 1988.

[8] J. Ens and P. Lawrence. A matrix based method for determining depth from focus. Proc. of IEEE Conf. on Comp. Vis. and Patt. Rec., pages 600–609, June 1991.

[9] B. Girod and S. Scherock. Depth from focus of structured light. Proc. of SPIE: Optics, Illum., and Image Sng for Mach. Vis. IV, 1194, Nov. 1989.

[10] M. Gokstorp. Computing depth from out-of-focus blur using a local frequency representation. Proc. of Intl. Conf. on Patt. Recog., October 1994.

[11] P. Grossman. Depth from focus. Patt. Recog., 9(1):63–69, 1987.

[12] A. Gruss, S. Tada, and T. Kanade. A VLSI smart sensor for fast range imaging. Proc. of ARPA Image Understanding Workshop, pages 977–986, April 1993.

[13] B. K. P. Horn. Robot Vision. MIT Press, 1986.

[14] S. Inokuchi, K. Sato, and F. Matsuda. Range imaging system for 3-D object recognition. Proc. of 7th Intl. Conf. on Pattern Recognition, pages 806–808, July 1984.

[15] R. A. Jarvis. A perspective on range finding techniques for computer vision. IEEE Trans. on Pattern Analysis and Machine Intelligence, 5(2):122–139, March 1983.

[16] T. Kanade. Development of a video-rate stereo machine. Proc. of ARPA Image Understanding Workshop, pages 549–557, Nov. 1994.

[17] T. Kanade, A. Gruss, and L. R. Carley. A very fast VLSI rangefinder. Proc. of Intl. Conf. on Robotics and Automation, pages 1322–1329, April 1991.

[18] R. Kingslake. Optical System Design. Academic Press, 1983.

[19] A. Krishnan and N. Ahuja. Range estimation from focus using a non-frontal imaging camera. Proc. of AAAI Conf., pages 830–835, July 1993.

[20] E. Krotkov. Focusing. Intl. Jrnl. of Comp. Vis., 1:223–237, 1987.

[21] H. N. Nair and C. V. Stewart. Robust focus ranging. Proc. of IEEE Conf. on Comp. Vis. and Patt. Rec., pages 309–314, June 1991.

[22] S. K. Nayar and Y. Nakagawa. Shape from focus. IEEE Trans. on Patt. Anal. and Mach. Intell., 16(8):824–831, Aug. 1994.

[23] S. K. Nayar and M. Watanabe. Passive bifocal vision sensor. Technical Report (in preparation), Dept. of Computer Science, Columbia University, New York, NY, USA, Dec. 1995.

[24] S. K. Nayar, M. Watanabe, and M. Noguchi. Real-time focus range sensor. Technical Report CUCS-028-94, Dept. of Computer Science, Columbia University, New York, NY, USA, November 1994.

[25] S. K. Nayar, M. Watanabe, and M. Noguchi. Real-time focus range sensor. Proc. of Intl. Conf. on Computer Vision, pages 995–1001, June 1995.

[26] M. Noguchi and S. K. Nayar. Microscopic shape from focus using active illumination. Proc. of Intl. Conf. on Patt. Recog., Oct. 1994.

[27] A. Pentland. A new sense for depth of field. IEEE Trans. on Patt. Anal. and Mach. Intell., 9(4):523–531, July 1987.

[28] A. Pentland, S. Scherock, T. Darrell, and B. Girod. Simple range cameras based on focal error. Jrnl. of Opt. Soc. of Am. A, 11(11):2925–2935, Nov. 1994.

[29] M. Subbarao. Parallel depth recovery by changing camera parameters. Proc. of Intl. Conf. on Comp. Vis., pages 149–155, Dec. 1988.


[30] M. Watanabe and S. K. Nayar. Rational filters for passive depth from defocus. Technical Report CUCS-035-95, Dept. of Computer Science, Columbia University, New York, NY, USA, September 1995.

[31] M. Watanabe and S. K. Nayar. Telecentric optics for constant magnification imaging. Technical Report CUCS-026-95, Dept. of Computer Science, Columbia University, New York, NY, USA, September 1995.

[32] M. Watanabe, S. K. Nayar, and M. Noguchi. Real-time implementation of depth from defocus. Proceedings of SPIE Conference, October 1995.

[33] R. G. Willson and S. A. Shafer. Modeling and calibration of automated zoom lenses. Technical Report CMU-RI-TR-94-03, The Robotics Institute, Carnegie Mellon University, Jan. 1994.

[34] Y. Xiong and S. A. Shafer. Variable window Gabor filters and their use in focus and correspondence. Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pages 668–671, June 1994.

