
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, DRAFT

Recovering Translucent Objects using a Single Time-of-Flight Depth Camera

Hyunjung Shim1∗, Member, IEEE, Seungkyu Lee2, Member, IEEE

Abstract— Translucency introduces great challenges to 3D acquisition because of complicated light behavior such as refraction and transmittance. In this paper, we describe the development of a unified 3D data acquisition framework that reconstructs translucent objects using a single commercial Time-of-Flight (ToF) camera. In our capture scenario, we record a depth map and intensity image of the scene twice using a static ToF camera; first, we capture the depth map and intensity image of an arbitrary background, and then we position the translucent foreground object and record a second depth map and intensity image with both the foreground and background. As a result of material characteristics, the translucent object yields systematic distortions in the depth map. We developed a new distance representation that interprets the depth distortion induced as a result of translucency. By analyzing ToF depth sensing principles, we constructed a distance model governed by the level of translucency, foreground depth, and background depth. Using an analysis-by-synthesis approach, we can recover the 3D geometry of a translucent object from a pair of depth maps and their intensity images. Extensive evaluation and case studies demonstrate that our method is effective for modeling the non-linear depth distortion due to translucency and for reconstruction of a three-dimensional translucent object.

Index Terms— 3D reconstruction, translucency, Time-of-Flight.

I. INTRODUCTION

In recent years, the introduction of three-dimensional data has resulted in a new paradigm shift in the multimedia industry. Starting with the success of 3D films such as Avatar in 2010, 3D multimedia contents have received increasing attention within the industries of film, gaming, display, and animation [4]. More recently, the advent of 3D printing has leveraged the innovation of various manufacturing sectors and led to important changes in various industries and related businesses. Despite the explosive demand for 3D multimedia content, 3D image data are rather limited in their use compared to 2D data because the 3D data acquisition technology is not mature enough to guarantee reliable performance. Current state-of-the-art techniques in 3D data acquisition show satisfactory performance when capturing an opaque Lambertian object; however, a variety of objects in the real world exhibit non-Lambertian material characteristics such as specularity, translucency, and transparency. To date, there is no unified framework for 3D data acquisition that is commonly applicable to objects regardless of their material properties. In particular, transparency and translucency are caused by light refraction and transmittance in addition to light reflection. It is therefore important to account for light refraction and transmittance in a 3D acquisition framework.

1* Corresponding author, School of Integrated Technology, Yonsei University, 162-1 Songdo-dong, Yonsu-gu, Incheon, South Korea

2 Department of Computer Engineering, Kyung Hee University, 1732, Deogyeong-daero, Giheung-gu, Yongin-si, Gyeonggi-do, South Korea

A. Related Work

To keep pace with this reality, the reconstruction of non-Lambertian objects has been actively studied in recent years. Existing work in 3D reconstruction can be categorized by the target object type. Each prior work focuses on a specific non-Lambertian problem, such as a purely specular, transparent, or translucent surface, or the global illumination introduced by multipath interference. Table I summarizes existing techniques.

In this section, we focus on important approaches to recovering translucent surfaces while summarizing the ideas described in related work. 1) Pure specular surface: It is possible to exploit the mirror (pure specular) reflection property for 3D acquisition. The most representative approaches are shape from distortion, shape from specular flow, and shape from specularity. Shape from distortion projects a known or unknown pattern onto the target object and analyzes the distortion of the patterns induced by specular reflection. Shape from specular flow [46], [1], [2] exploits the motion of specular flows as depth cues. Shape from specularity [28], [49], [26], [38] uses specular highlights to constrain the surface normals and depth information.

2) Transparent surface: Similar to pure specular surfaces, the appearance of a transparent object is solely governed by the surrounding environment, as described quantitatively by Snell's law. Various techniques have been developed by extending shape from distortion or direct ray measurement, as introduced in [27]. Several approaches [41], [42], [3], [40] have been developed based on the principle of shape from distortion. Instead of distortions by mirror-like specular reflection, these techniques consider light distortion caused by the refraction of transparent objects, e.g., water, crystals, or glass surfaces. Murase [41], [42] first solved the problem of refractive surface reconstruction in computer vision. He captured the water surface using a single camera by placing an unknown pattern at the bottom of a water tank. As the shape of the water surface changed over time, he recorded the corresponding distorted images. Assuming a known refractive index, Murase analyzed the trajectory of distortion and used the average traveling path to approximate the background pattern. Given the camera position and estimated pattern, a surface gradient can be estimated for each frame, resulting in a final water shape by integrating surface normals.


TABLE I
SUMMARY OF RECENT WORK FOR 3D RECONSTRUCTION OF NON-LAMBERTIAN OBJECTS

Material properties    | Class of techniques         | Representative work
-----------------------|-----------------------------|---------------------------------------------------
Pure specular surface  | Shape from distortion       | [47], [23], [5], [50], [10]
                       | Shape from specular flow    | [46], [1], [2]
                       | Shape from specularity      | [28], [49], [26], [38]
Transparent surface    | Shape from distortion       | [41], [42], [3], [40]
                       | Direct ray measurement      | [34], [35], [39]
Translucent surface    | Structured light            | [45], [44], [7], [8], [9], [18], [21], [19], [20]
                       | Polarization imaging        | [24], [7]
                       | Shape from scattering       | [29], [30], [11]
                       | Shape from specularity      | [6], [38]
Multipath interference | Multiple modulation         | [33], [15], [12], [17], [16], [22]
(Global illumination)  | Modified ToF imaging        | [32], [51], [52]
                       | Restoration, deconvolution  | [31], [14], [13]
                       | Shape from interreflection  | [43], [37]

Like other shape-from-X techniques, shape from distortion aims only to recover the surface normals and suffers from depth-normal ambiguity (i.e., a single normal does not correspond to a single depth value). Direct ray measurement introduces additional observations to recover both depth and normals simultaneously. Kutulakos and Steger [34], [35] analyzed the feasibility of recovering the depth and normal for a refractive surface and investigated the requirements on viewpoints and scene configuration for success of the algorithm. Finally, with a known refractive index and a known ground pattern, they used piece-wise light triangulation to recover the refractive surface. More recently, Morris and Kutulakos [39] developed a stereo system with a known background pattern for recovering a dynamic liquid surface with an unknown refractive index.

3) Translucent surface: The appearance of translucent surfaces is the result of complex light transport, including refraction, transmittance, and multiple reflections. Existing techniques are classified into shape from specularity, modified structured light, polarization-based techniques, or shape from scattering, as described below.

Shape from specularity can also be applied to translucent surfaces. Since specular reflection is independent of subsurface scattering, specular highlights are considered robust cues for identifying translucent surfaces. Chen et al. [6] used this property to acquire the mesostructure of translucent and refractive objects. Ma et al. [38] also used specular reflection to recover translucent surfaces such as human skin.

Structured light techniques are among the most popular 3D acquisition approaches and have been extensively studied in the computer vision community over the last few decades. Recently, extensions of structured light techniques have been investigated in order to handle scenes in the presence of global illumination, including translucent surfaces. Park et al. [45] combined multiview stereo with a structured light setup to reduce the effects of global illumination, whereas Hermans et al. [25] made use of an illumination pattern moving with a constant velocity and showed reliable performance under global illumination. Nayar et al. [44] also applied structured illumination to separate the indirect illumination from images and inspired several later studies in phase-modulated structured light techniques that account for global illumination [7], [8], [9], [18]. Most recently, Gupta et al. [19], [20] adopted structured patterns motivated by logical coding and decoding methods. They analyzed the range effects of global illumination and developed structured patterns resilient to both short- and long-range effects. Additionally, Gupta and Nayar [21] invented a modulated phase-shifted pattern to further improve the accuracy of 3D reconstruction under global illumination and illumination defocus.

Polarization-based techniques use polarized light that propagates in a single plane direction [24]. A variety of techniques use polarization to reconstruct translucent surfaces [7].

Shape from scattering represents a class of techniques that model the scattering behavior explicitly and simultaneously estimate the shape and scattering parameters [29], [30]. Inoshita et al. [29] formulated the relationship between the intensity of single scattering and the shape of a translucent object. Assuming a homogeneous surface with a known refractive index, they computed the height map of a translucent object in the presence of inter-reflection. Most recently, Inoshita et al. [30] and Dong et al. [11] used a deconvolution technique to eliminate the scattering illumination from images. They then applied the conventional photometric stereo approach to recover the surface normals of a planar and optically thick translucent object.

4) Multipath interference in ToF imaging: Recently, ToF imaging techniques have been developed to resolve the depth distortion caused by multipath interference [33], [15], [12], [17], [16], [31], [14], [13], [22]. The first approach to handling multipath interference [33], [15], [12], [17], [16], [22] uses depth measurements at multiple modulation frequencies and extracts the correct depth from the contaminated depth. However, multi-frequency-based approaches often require an exponential fitting, which is known to be ill-posed and difficult to implement.

Lately, Velten et al. [51] proposed a modified ToF imaging framework for recovering the shape of objects beyond the line of sight. The key idea is to use an ultra-fast Time-of-Flight imaging technique and measure the light reflecting off walls or other obstacles. A similar idea was introduced by Wu et al. [52], who modeled global illumination using Time-of-Flight (ToF) imaging. They measured the time profile at each pixel and used it to identify the regions affected by global illumination and detect the edges. Note that the ToF imaging technique used in those studies records a series of reflected light over time to obtain an x-y-t volume. The ToF imaging technique used in [51] and [52] is different from the sensing technique implemented in a commercial ToF depth camera. The ToF depth camera emits either box or sinusoidal waves, measures the reflected wave, identifies the phase delay, and calculates the depth from the delay. Therefore, the ToF depth camera produces one depth map per time step, and this method is suitable for practical implementation. In contrast, the ToF imaging technique described in [51] and [52] requires highly accurate temporal sampling (e.g., picosecond resolution) and is prohibitive for practical implementation. Kadambi et al. [32] developed a custom time-coded ToF imaging framework inspired by coded exposure techniques. They formulated the multipath interference as the illumination signal convolved with a scene-dependent kernel and solved the estimation by adopting sparsity and scene-dependent constraints.

Another line of approaches [31], [14], [13] develops a radiometric model to predict the sensor measurement and optimizes the correct depth with several scene constraints such as Lambertian patches or ambient lighting. The multipath problem is similar to ours in that it suffers from scene-dependent depth distortion and attempts to factorize the mixed measurement. However, the distortion is only a few centimeters from the correct depth because the source of errors is the indirect light from adjacent surface patches. As a result, the contaminated measurement can be used as an initial estimate, and the inverse problem can be solved using either multi-frequency-based samples or scene constraints. Because the depth distortion caused by translucency ranges from a few centimeters to a few meters, it is infeasible to recover the correct depth from raw measurements using constraint-based optimization.

For more information about related work, please refer to reference [27].

B. Proposed Algorithm

For acquiring non-Lambertian materials, even the most recent techniques require specialized equipment and a controlled environment. Hence, despite their impressive performance and technical contributions, the current approaches to reconstructing non-Lambertian objects are prohibitive for an ordinary user and general practical applications. Unfortunately, reproducibility by ordinary users is crucial for the success of commercial technology because nowadays a considerable proportion of multimedia data is created by users. In this paper, we propose a new, unified 3D data acquisition framework that reconstructs translucent objects using a single, commercial Time-of-Flight (ToF) camera. Compared to existing techniques for recovering non-Lambertian objects, our work has two important advantages. First, our framework is applicable to practical scenarios and easily operated by ordinary users.

Without constraints on environmental conditions or user proficiency, our work shows reliable performance using only a single consumer depth camera. Second, we provide a unified framework for both Lambertian and translucent objects. All existing techniques consider a single class of object materials, either Lambertian, specular, or translucent, and then develop an optimal 3D reconstruction methodology for that specific class of object. However, recognizing the class of object materials and localizing the corresponding regions is another challenging research problem. Consequently, integrating the existing techniques is unlikely to offer an ultimate solution for reliable 3D reconstruction. Our method can automatically localize the translucent region of an object and process the depth measurements to recover its original depth information. We then develop a generalized representation associated with the level of translucency. In this way, we are able to model both Lambertian and translucent objects.

Our framework consists of three stages. The first stage is data acquisition. In our capture scenario, we record the depth map and intensity image of the scene twice using a static ToF camera, first capturing a depth map of an arbitrary background, and then placing the target translucent object in the foreground and recording a second depth map with both the foreground and background. Because of light transmittance and refraction, the translucent object yields systematic depth distortion in the depth map. In the second stage, we identify the translucent surface points given a pair of recorded depth maps. Since the camera position is fixed, the two depth maps are perfectly aligned. We then compare the two depth maps, find 3D points with significant discrepancy between the pair of depth measures, and identify them as translucent points. The final stage is to analyze the depth distortion of the translucent points and recover the original depth values. For that, we construct a distance model that interprets the depth distortion caused by translucency. To develop the mathematical representation, we assume that most translucent objects can be represented by a single-layer homogeneous surface with negligible light refraction, and we treat subsurface scattering as a spatial redistribution of the incident light. We ignore subsurface scattering effects because the errors produced by subsurface scattering are rather small, in the range of a few millimeters. Also, as the amount of error is consistent over a surface with uniform materials, these effects do not yield strong shape distortion. In this paper, we focus on large-scale depth distortion, which is often a few meters. Because the appearance of a purely transparent object is mostly governed by light refraction, our assumptions are not valid for transparent objects. Finally, we develop a two-layer distance model consisting of a single-layer foreground and a background. By analyzing the ToF depth sensing principle, we construct a distance model governed by the level of translucency, the foreground depth, and the background depth. With an analysis-by-synthesis approach, we can recover the 3D geometry of a translucent object from a pair of depth maps and their intensity images.

Fig. 1. Light transport in a two-layer scene. The foreground is a translucent surface and the background is an opaque surface.

Whereas Gupta et al. [19], [20] focus on the global illumination caused by a short-range effect (e.g., the surface meso-structure) and a long-range effect (e.g., the local inter-reflection within the object), we handle the depth distortion caused by a far-long-range effect (e.g., between the foreground and background). A far-long-range effect yields significant depth distortion, and it is almost impossible to retain the original shape of the object. The major contributions of this paper can be summarized as follows:

• We reconstruct a translucent object using a single Time-of-Flight depth camera.

• We provide a practical solution for acquiring the 3D geometry of translucent objects that is operable even by an ordinary user. Our framework is reproducible without modifying the camera hardware or firmware.

• To the best of our knowledge, this is the first attempt to resolve the far-long-range effect of depth distortion for translucent objects.

The paper is organized as follows. In Section II, we develop our two-layer distance model that simulates the light transport accounting for a far-long-range effect. In Section III, we further derive the specialized distance model for the ToF sensor, a special case of the general two-layer distance model. We then employ this model to reconstruct the 3D object using a pair of depth maps (Section IV). To show that our framework is effective on various real objects, extensive experiments are conducted for model validation, quantitative evaluation, and qualitative evaluation; these results are provided in Section V. We address some interesting findings and limitations in Section VI. Finally, this paper is concluded in Section VII by highlighting the advantages of the proposed work and stating our plans for future work.

II. N LAYER GEOMETRY

Considering a translucent foreground and an opaque background surface, we can represent the amount of light recorded at each pixel by a combination of light rays interacting with the foreground and background surfaces. Suppose that the direction of an emitted light ray is equivalent to that of the viewing ray at each pixel. This assumption is valid for general active imaging devices. Fig. 1 illustrates the light rays arriving at the pixel via the N-th bounce and the light rays transmitting through the foreground via the N-th transmittance.

We denote the foreground distance by d_f, the translucent parameter by ξ, and the brightness of the emitted IR light by L_in. The translucent parameter depends on the thickness and absorption ratio of the surface. The distance d is computed by √(x²+y²+z²), where {x, y, z} is the 3D coordinate of the point. The 1st bounce is an emitted light ray reflected by the foreground. Its brightness is a weighted fraction of the emitted light, precisely the fraction determined by ξ weighted by the surface albedo and incident angle. Since light is inversely proportional to the square of its traveling distance, the intensity of a 1st bounce ray (L_1) can be expressed as

$$ L_1 = \frac{(1-\xi)\,\rho_f \cos\theta_f \, L_{in}}{d_f^2}. \tag{1} $$

After the first reflection at the foreground, the 1st transmittance is determined by the incoming light transmitting through the foreground. The trajectory of light might change depending on the refractive index of the foreground. However, for simplicity we regard the foreground as a single layer and express the light transmittance independent of the refractive index in Fig. 1. The 1st transmittance ray reflects off the background, arrives at the foreground, transmits through the foreground from the back, and finally returns to the pixel. This forms the 2nd bounce ray. We denote its brightness as L_2, and it can be modeled by

$$ L_2 = \frac{\rho_b \cos\theta_b \, \xi^2 \rho_f^2 \cos^2\theta_f \, L_{in}}{d_b^2}. \tag{2} $$

The fraction of light traveling from the background reflects off the foreground and returns to the background, namely the 2nd transmittance. The N-th bounce and N-th transmittance can be defined in the same manner. Finally, we formulate the brightness of light recorded at each pixel as the sum of an infinite number of bounce rays as follows:

$$ L_{total} = \frac{(1-\xi)\rho_f\cos\theta_f L_{in}}{d_f^2} + \frac{\rho_b\cos\theta_b\,\xi^2\rho_f^2\cos^2\theta_f L_{in}}{d_b^2} + \sum_{i=0}^{\infty}\frac{\xi^2(1-\xi)^{i+1}(\rho_f\cos\theta_f)^{i+3}(\rho_b\cos\theta_b)^{i+2} L_{in}}{\bigl(d_b+(i+1)(d_b-d_f)\bigr)^2}. \tag{3} $$

Fig. 2. Computing {Q_1, Q_2, Q_3, Q_4} from the phase delay. This figure demonstrates how the four switches in a ToF depth camera collect electric charges. The red region indicates the amount of electric charge for each switch given the reflected IR wave.
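To make the two-layer reflection model above concrete, the following is a minimal numerical sketch of Eqs. (1)–(3). The function name, the example parameter values, and the truncation of the infinite sum to a fixed number of terms are illustrative choices made here, not part of the original formulation.

```python
def two_layer_brightness(xi, rho_f, cos_tf, rho_b, cos_tb, d_f, d_b, L_in, n_extra=20):
    """Total reflected brightness of Eq. (3): 1st bounce + 2nd bounce +
    a truncated series of higher-order bounces (illustrative truncation)."""
    L1 = (1.0 - xi) * rho_f * cos_tf * L_in / d_f**2                        # Eq. (1)
    L2 = rho_b * cos_tb * xi**2 * rho_f**2 * cos_tf**2 * L_in / d_b**2      # Eq. (2)
    higher = 0.0
    for i in range(n_extra):                                                # sum in Eq. (3)
        num = (xi**2 * (1.0 - xi)**(i + 1) * (rho_f * cos_tf)**(i + 3)
               * (rho_b * cos_tb)**(i + 2) * L_in)
        den = (d_b + (i + 1) * (d_b - d_f))**2
        higher += num / den
    return L1, L2, L1 + L2 + higher

# Example: a fairly translucent foreground at 1.0 m in front of a wall at 3.3 m
L1, L2, L_total = two_layer_brightness(xi=0.9, rho_f=0.8, cos_tf=1.0,
                                       rho_b=1.0, cos_tb=1.0,
                                       d_f=1.0, d_b=3.3, L_in=1.0)
print(L1, L2, L_total)   # the 2nd bounce can dominate for highly translucent surfaces
```

For a highly translucent foreground, the sketch shows that the 2nd bounce contributes a large share of the recorded brightness, which is the root cause of the depth distortion analyzed in the next section.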

III. TWO-LAYER DEPTH MODEL FOR TOF DEPTH CAMERA

Given a translucent object, Eq. 3 explains the brightness of light detected by general active imaging devices. A Time-of-Flight (ToF) camera uses infrared (IR) LEDs to illuminate the scene and measures the response to calculate the depth map of the scene. Based on the principle of a ToF depth camera, the emitted light is an IR wave, and we detect the returned waves to compute the phase delay of the IR wave. More specifically, the depth camera uses several switches (see Fig. 2) with connected memory elements (e.g., capacitors) and collects the incoming light. In the following illustration and our implementation, we use four switches with two memory elements and consider the emitted IR wave to be a square signal. For a real device, a sinusoidal signal is often used for approximating the IR wave. However, in our formulation we use a square wave because it is intuitive and simple to understand. Given the speed of light c, the distance value d [36] can be computed by

$$ d = \frac{c}{2}\tan^{-1}\left\{\frac{L(Q_3-Q_4)}{L(Q_1-Q_2)}\right\} = \frac{c}{2}\tan^{-1}\left\{\frac{Q_3-Q_4}{Q_1-Q_2}\right\}. \tag{4} $$

Notice that Q_m stands for the electric charge of the m-th switch, and the phase shift between Q_{m-1} and Q_m is exactly 90 degrees. In Fig. 2, each capacitor collects the difference of electric charges, and these differences are equally weighted by the amplitude of light L. As a result, their ratio reveals the distance value. Since a ray direction can be identified by its pixel position, we can compute the 3D coordinate {x, y, z} from its distance value. The current depth sensing technology uses Eq. 4 for implementation. This formula provides a unique and correct distance solution only if the reflected IR wave comes from a single, well-illuminated surface point. In other words, the current sensing principle inherently poses the opaque Lambertian surface assumption. Therefore, this formula becomes invalid with a translucent object.
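For illustration, the sketch below evaluates Eq. (4) for a single pixel. The use of atan2 (to keep the correct quadrant) and the synthetic sinusoidal charge values are assumptions made only for this example; a real sensor also folds its modulation frequency into the phase-to-distance conversion, which is omitted here.

```python
import math

C = 3.0e8  # speed of light (m/s)

def distance_from_charges(Q1, Q2, Q3, Q4):
    """Eq. (4): d = (c/2) * arctan{(Q3 - Q4) / (Q1 - Q2)}."""
    phase = math.atan2(Q3 - Q4, Q1 - Q2)   # atan2 keeps the correct quadrant
    return 0.5 * C * phase

# Synthetic charges for a single opaque surface (toy sinusoidal model, not sensor data)
true_phase = 0.8
Q1, Q2 = math.cos(true_phase), -math.cos(true_phase)
Q3, Q4 = math.sin(true_phase), -math.sin(true_phase)
d = distance_from_charges(Q1, Q2, Q3, Q4)
print(d / (0.5 * C))   # recovers the phase of 0.8 rad
```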

Fig. 3 shows that the translucent surface yields multiple IR waves with different phase delays. From now on, we ignore the high-order bounce rays (i.e., of order greater than or equal to 3) and approximate the reflected wave by the sum of the 1st and 2nd bounce rays. This is reasonable since high-order bounce rays are significantly attenuated by traveling distance and surface albedo. As a result of two waves traveling from the foreground and background, the reflected IR wave is no longer a square wave. In Eq. 3, we derive the general expression of the reflected light caused by a translucent surface. Employing this general expression for the ToF depth camera, we can formulate a two-layer distance model that determines the distance value d of a translucent point. That is,

$$ d = \frac{c}{2}\tan^{-1}\left\{\frac{L_2(Q_3^b-Q_4^b)+L_1(Q_3^f-Q_4^f)}{L_2(Q_1^b-Q_2^b)+L_1(Q_1^f-Q_2^f)}\right\}, \tag{5} $$

where Q_m^b and Q_m^f represent the electric charge of the m-th switch for the background and foreground, respectively. Given this two-layer distance model, our goal is to find the foreground distance d_f, written as

$$ d_f = \frac{c}{2}\tan^{-1}\left\{\frac{Q_3^f-Q_4^f}{Q_1^f-Q_2^f}\right\}. \tag{6} $$

Fig. 3. ToF depth sensing with a two-layer translucent scene. Given the two-layer scene, the total reflected IR wave is the sum of two waveforms, and the integral of this waveform results in a wrong distance value.
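To see how the mixture in Eq. 5 distorts the measurement, the following sketch simulates a pixel observing a translucent foreground over an opaque background. Representing the charge differences with a sinusoidal phase model and mapping distance to phase as the inverse of Eq. 4 are simplifying assumptions made only for this sketch, as are the example amplitudes L1 and L2.

```python
import math

C = 3.0e8  # speed of light (m/s); phase is treated as 2*d/c, the inverse of Eq. (4)

def charge_diffs(d):
    """Assumed sinusoidal model: (Q1 - Q2, Q3 - Q4) for a surface at distance d."""
    phase = 2.0 * d / C
    return math.cos(phase), math.sin(phase)

def mixed_distance(d_f, d_b, L1, L2):
    """Eq. (5): apparent distance when the reflected wave mixes a foreground
    return (weight L1) with a background return (weight L2)."""
    cf, sf = charge_diffs(d_f)
    cb, sb = charge_diffs(d_b)
    return 0.5 * C * math.atan2(L2 * sb + L1 * sf, L2 * cb + L1 * cf)

# Example: foreground at 1.0 m, background at 3.3 m
for L1, L2 in [(1.0, 0.0), (0.5, 0.5), (0.1, 0.9)]:
    print(L1, L2, mixed_distance(1.0, 3.3, L1, L2))
# With L2 = 0 the true foreground depth is returned (Eq. 6); as L2 grows,
# the measurement drifts away from the foreground toward the background.
```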


IV. RECONSTRUCTING A TRANSLUCENT SURFACE USING A TOF DEPTH CAMERA

To recover the 3D surface of a translucent object, we capture the scene twice using a ToF depth camera. We capture the first depth map of the arbitrary background, and then place the target translucent object in the foreground and capture the second depth map of the scene in the presence of both the foreground and background. When acquiring each depth map, we simultaneously capture the IR intensity image. This is always available for any ToF depth camera. In our scenario, we obtain two intensity images corresponding to the two depth maps and denote them I_b and I, respectively, which have the values ρ_b cosθ_b L_in/d_b² and L_total from Eq. 3. Finally, our observations are d_b, I_b, d, and I. By comparing this pair of depth maps, we identify the translucent surface points. Under the fixed camera position, the two depth maps are perfectly aligned. As a result, we find the set of 3D points with significant discrepancy between the pair of depth measures and label them as translucent points. The remaining part of this section applies to translucent points.
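A minimal sketch of this identification step is given below. The array shapes and variable names are our own; the 30 mm threshold anticipates the value reported in Section V-C.

```python
import numpy as np

def find_translucent_points(depth_bg, depth_fg, threshold_mm=30.0):
    """Label pixels whose depth changed significantly between the background-only
    capture and the capture with the object present. Both maps come from the
    same static camera, so they are pixel-aligned."""
    diff = np.abs(depth_fg - depth_bg)
    return diff > threshold_mm            # boolean mask of translucent points

# Toy example with 2x2 depth maps in millimeters
depth_bg = np.array([[2000.0, 2000.0], [2000.0, 2000.0]])
depth_fg = np.array([[2000.0, 1650.0], [1700.0, 2005.0]])
print(find_translucent_points(depth_bg, depth_fg))
# [[False  True]
#  [ True False]]
```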

In Eq. 5, we have a set of unknown variables {ρ_b cosθ_b, ρ_f cosθ_f, d_f, L_in, ξ}. Fortunately, we can substitute ρ_b cosθ_b and ρ_f cosθ_f with functions of I_b and I. To do that, we first approximate L_total by L_1 + L_2 to give I = L_1 + L_2. Since I_b carries ρ_b cosθ_b, we can rewrite I as

$$ I = \frac{(1-\xi)\rho_f\cos\theta_f L_{in}}{d_f^2} + I_b\,\xi^2\rho_f^2\cos^2\theta_f. \tag{7} $$

Assigning X = ρ_f cosθ_f, Eq. 7 becomes a quadratic equation in X. That is,

$$ \xi^2 I_b X^2 + \frac{(1-\xi)L_{in}}{d_f^2}\,X - I = 0. \tag{8} $$

The solution of Eq. 8 becomes

$$ X = \frac{-\dfrac{(1-\xi)L_{in}}{d_f^2} \pm \sqrt{\dfrac{(1-\xi)^2 L_{in}^2}{d_f^4} + 4\xi^2 I_b I}}{2\xi^2 I_b}. \tag{9} $$

From the physical constraints on each variable, we know X ≥ 0, 0 ≤ ξ ≤ 1, and L_in, d_f ≥ 0. Then, X is uniquely determined by

$$ X = \frac{-\dfrac{(1-\xi)L_{in}}{d_f^2} + \sqrt{\dfrac{(1-\xi)^2 L_{in}^2}{d_f^4} + 4\xi^2 I_b I}}{2\xi^2 I_b}. \tag{10} $$

Suppose L_in is given by a calibration technique. From this derivation, the remaining unknowns needed to compute ρ_f cosθ_f are d_f and ξ. Also, ρ_b cosθ_b is identified by I_b/L_in. Finally, we reduce the set of unknowns to {d_f, ξ}.

Combining Eqs. 1, 2, and 10, we rewrite the formula in Eq. 5 in terms of two unknowns {d_f, ξ} and four known values {L_in, I_b, I, d_b}:

$$ d = f(d_f,\xi) = \frac{c}{2}\tan^{-1}\left\{\frac{A(Q_3^b-Q_4^b)+B(Q_3^f-Q_4^f)}{A(Q_1^b-Q_2^b)+B(Q_1^f-Q_2^f)}\right\}, \tag{11} $$

where

$$ A = \frac{-(1-\xi)L_{in} + \sqrt{(1-\xi)^2 L_{in}^2 + 4\xi^2 I_b I\, d_f^4}}{2 L_{in} d_f^2}, \qquad B = \frac{1-\xi}{d_f^2}. $$

Consequently, we can predict the observation d for hypothesized d_f and ξ using Eq. 11. This is possible because Q_3 − Q_4 and Q_1 − Q_2 map to a unique distance value. Fig. 2 describes how to compute Q_1^b − Q_2^b and Q_3^b − Q_4^b from d_b (or Q_1^f − Q_2^f and Q_3^f − Q_4^f from d_f). For every pixel, we set a hypothesis on d_f and ξ. Based on an analysis-by-synthesis strategy, we evaluate each hypothesis by comparison with the observation d and choose the pair that minimizes the prediction error. Our objective function is written as follows:

$$ [d_f,\xi] = \arg\min_{d_f,\xi} \|d - f(d_f,\xi)\|^2. \tag{12} $$

Since we have a single equation for two unknowns, this is still an ill-posed problem. To eliminate this ambiguity, we impose the homogeneous surface assumption. By letting all translucent pixels vote for the same ξ, we make this estimation problem reliable and tractable. Under the homogeneous surface assumption, we regard ξ as independent of position. Therefore, we first hypothesize ξ and estimate a set of optimal distances for the translucent points. By accumulating the errors over all translucent points, we determine the error for ξ and choose the value corresponding to the minimum error. Based on this iterative approach, we can determine ξ and a set of d_f for the translucent points.
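The voting procedure can be sketched as a brute-force search, shown below. The forward model reuses the sinusoidal phase approximation from the earlier sketch, and the hypothesis grids and default values are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

C = 3.0e8  # phase is modeled as 2*d/c, the inverse of Eq. (4) (assumption)

def predict_distance(d_f, xi, d_b, I_b, I, L_in):
    """Eq. (11): predicted (distorted) distance for a hypothesis (d_f, xi)."""
    A = (-(1 - xi) * L_in
         + np.sqrt((1 - xi)**2 * L_in**2 + 4 * xi**2 * I_b * I * d_f**4)) \
        / (2 * L_in * d_f**2)
    B = (1 - xi) / d_f**2
    pf, pb = 2 * d_f / C, 2 * d_b / C            # assumed distance-to-phase mapping
    num = A * np.sin(pb) + B * np.sin(pf)
    den = A * np.cos(pb) + B * np.cos(pf)
    return 0.5 * C * np.arctan2(num, den)

def estimate(d_obs, I_obs, d_b, I_b, L_in,
             xi_grid=np.linspace(0.05, 0.95, 19),
             df_grid=np.arange(500.0, 3000.0, 10.0)):
    """Homogeneous-surface voting: one global xi shared by all translucent
    pixels, a per-pixel d_f minimizing the prediction error (Eq. 12)."""
    best_xi, best_df, best_total = None, None, np.inf
    for xi in xi_grid:                           # hypothesize a shared xi
        total, dfs = 0.0, []
        for d, I in zip(d_obs, I_obs):           # per translucent pixel
            errs = np.abs(d - predict_distance(df_grid, xi, d_b, I_b, I, L_in))
            total += errs.min()
            dfs.append(df_grid[errs.argmin()])
        if total < best_total:                   # keep the xi with minimum total error
            best_xi, best_df, best_total = xi, np.array(dfs), total
    return best_xi, best_df
```

Searching d_f on a fixed grid per pixel while evaluating ξ once for the whole translucent region keeps the search tractable; a coarse-to-fine schedule over both grids could further reduce the cost.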

A. Emitted Light Distribution

Previously, L_in was assumed to be known. In this subsection, we explain how to compute the emitted light distribution L_in for an arbitrary depth camera. Notice that L_in is the amount of light leaving the sensor, which is different from the radiance arriving at a patch. We use a white opaque plane as a calibration object. The reflected light is modeled by ρ cosθ L_in/d², which is a special case of Eq. 1 with ξ = 0. Considering the plane object, we fit the plane using least squares and identify its surface normal. This reveals cosθ, the cosine of the angle between the surface normal and the line of sight. We consider the incoming light direction to be equivalent to the line of sight, i.e., a vector connecting the 3D point and the optical center of the camera. We set ρ of this white plane to 1 and use the distance of each pixel to compute d. Since the reflected light is measured by the IR intensity I, we formulate L_in as follows:

$$ L_{in} = \frac{I d^2}{\rho\cos\theta}. \tag{13} $$

Then, it is possible to determine L_in at every pixel. This procedure is required just once per depth camera. In practice, L_in is proportional to the exposure time t. L_in at an arbitrary exposure can thus be rewritten as

$$ L_{in}\big|_{t=t_0} = \frac{t_0\, L_{in}\big|_{t=t_1}}{t_1}. \tag{14} $$

In our experiment, we set t = 1.5 ms to obtain L_in and weight it according to the exposure of the input object by Eq. 14.
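A per-pixel calibration of L_in along the lines of Eqs. (13) and (14) could be sketched as follows. The SVD-based plane fit, the variable names, and the synthetic example are our own choices for illustration.

```python
import numpy as np

def fit_plane_normal(points):
    """Least-squares plane fit: unit normal of an (N, 3) point set."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[-1]                            # direction of smallest variance

def calibrate_L_in(points, intensity, rho=1.0):
    """Eq. (13): L_in = I * d^2 / (rho * cos(theta)) for a white plane (rho = 1).
    `points` are the 3D coordinates of the plane's pixels, `intensity` their IR values."""
    normal = fit_plane_normal(points)
    d = np.linalg.norm(points, axis=1)       # d = sqrt(x^2 + y^2 + z^2) per pixel
    view = -points / d[:, None]              # line of sight toward the camera at the origin
    cos_theta = np.abs(view @ normal)        # angle between normal and line of sight
    return intensity * d**2 / (rho * cos_theta)

def rescale_exposure(L_in_ref, t_ref, t_new):
    """Eq. (14): L_in scales linearly with exposure time."""
    return L_in_ref * t_new / t_ref

# Synthetic example: noisy samples of a tilted white plane (illustration only)
rng = np.random.default_rng(0)
xy = rng.uniform(-0.3, 0.3, size=(200, 2))
z = 1.5 + 0.2 * xy[:, 0] + 0.001 * rng.standard_normal(200)
pts = np.column_stack([xy, z])
I = 0.8 / np.linalg.norm(pts, axis=1)**2     # toy intensity falling off with distance
L_in = calibrate_L_in(pts, I)
L_in_at_3ms = rescale_exposure(L_in, t_ref=1.5, t_new=3.0)
```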

V. EXPERIMENTAL RESULTS

In this section, we first validate the proposed distance model with real objects and then conduct experiments to recover translucent objects from a pair of depth maps (with corresponding IR intensity images). Our experiments include both quantitative and qualitative evaluation. For quantitative evaluation, we measure the accuracy of the proposed algorithm using planar objects. We chose planar objects because the groundtruth depth is easy to obtain using the shape prior, while the evaluation remains legitimate because we process each pixel independently without any shape prior. For qualitative evaluation, we include experimental objects of various shapes and materials.

A pair of depth maps is acquired with a MESA SR4000 camera for each target object. The operating range of the depth camera is between 0.5 and 5 meters from the camera, which is equivalent to our operating range for both the foreground and background surfaces. Because our scene configuration occupies a wide range of distances, the dynamic range of the raw data is often insufficient to capture valid depth measurements. The lack of dynamic range causes either saturated or severely noisy depth measurements, and the effective depth accuracy is far below its ideal specification. The low dynamic range of the sensor becomes particularly problematic when the target surface is too dark, too far, or highly specular. To address this issue, we apply a hybrid exposure technique for depth imaging based on [48]. That is, we combine depth maps under two exposures and determine an optimal depth for each pixel. Throughout this paper, the raw data represent the results of the hybrid exposure technique, and we conduct the experiments on top of these processed data. Although theoretically this approach allows us to capture the entire range of opaque surfaces, it often fails to capture mirror-type objects because the shortest exposure of commercial depth cameras is not short enough to record valid depth measurements of mirrors. Because of this practical concern, the current implementation cannot handle mirror materials for the background.
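The hybrid exposure step is only summarized here (it follows [48]); the sketch below shows one plausible per-pixel combination rule. The amplitude-based validity test and the threshold values are our own guesses at a reasonable criterion, not the procedure of [48].

```python
import numpy as np

def combine_exposures(depth_long, amp_long, depth_short, amp_short,
                      amp_min=50.0, amp_max=30000.0):
    """Merge depth maps taken with a long and a short exposure. Prefer the long
    exposure (better SNR) unless its amplitude is saturated or too weak; fall
    back to the short exposure there. Thresholds are illustrative sensor counts."""
    long_ok = (amp_long > amp_min) & (amp_long < amp_max)
    return np.where(long_ok, depth_long, depth_short)

# Toy example: one saturated pixel in the long exposure
depth_long = np.array([1200.0, 1210.0, 0.0])
amp_long = np.array([400.0, 380.0, 32000.0])     # last pixel saturated
depth_short = np.array([1195.0, 1205.0, 1250.0])
amp_short = np.array([60.0, 55.0, 900.0])
print(combine_exposures(depth_long, amp_long, depth_short, amp_short))
```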

A. Validating the Proposed Model using Real Objects

In this section, we validate the proposed model using real objects and a commercial ToF depth camera. Our experimental objects are two translucent file folders presenting different levels of translucency. The depth camera used throughout this paper was a MESA SR4000 with the following specification: resolution of 176×144, FOV of 44×35°, and operating range between 100 and 5000 mm. To collect the data, we first capture the background scene, a white planar wall. Next, we record a series of depth maps and IR intensity images by locating the target object at different positions. We place the target object perpendicular to the optical axis of the camera and consider it a planar surface. To acquire the groundtruth 3D points, we attach a white matte paper to half of the plane, use the 3D points of the matte surface to fit a planar surface, and consider this the groundtruth 3D information for the target object.

Fig. 4. Model fitting using real objects. Note that we approximate ξ = 0.72 for object 1 and ξ = 0.91 for object 2.

As a result, we obtain a set of actual 3D measurements and IR intensities, and their groundtruth and background 3D points. We are also given L_in by a separate calibration method. Consequently, ξ is the single remaining unknown variable in Eq. 11. In this experiment, we acquire 20 different depth maps, each corresponding to a foreground position at {700, 800, ..., 2600} mm from the camera. The background is fixed at 3300 mm from the camera. Finally, we find the optimal ξ that approximates all 20 depth maps simultaneously. Using this optimal ξ, we synthesize the 3D points of the target object and compare them with the actual measurements. Fig. 4 illustrates our model approximation compared to the actual measurements.

The photographs of the two file folders clearly show their different translucency, which yields a significant difference in the 3D measurements: the more translucent the surface, the more distortion in the 3D measurement. For object 2, a foreground depth of less than 800 mm appears to be behind the background position. Such a depth reversal occurs when 1) the shape of the reflected waveform in Fig. 3 forms two separate square waves, and 2) the amplitude of the foreground waveform is much lower than that of the background. In this circumstance, the ratio of electric charges becomes ambiguous, equivalent to the case when the 3D point is somewhat behind the background. This happens when the distance between the foreground and background is greater than or equal to half of the maximum operating range and the foreground object is nearly transparent. In Fig. 4, we observe a depth reversal when the foreground is at 800 mm, the background is at 3300 mm, and the translucency is 0.91. Since the interval between the foreground and background is 2500 mm, which is exactly half the maximum operating range (5000 mm), and the foreground object is highly translucent, both the observation and our prediction show depth reversal. Even if the interval between the foreground and background is large enough, depth reversal never appears with a foreground of low translucency. This behavior is correctly reproduced by the proposed model.

The overall approximation error between the measurement and our estimate is 70 mm in terms of root mean square error (RMSE). As noted in Fig. 4, the estimated ξ is 0.72 for object 1 and 0.91 for object 2.

B. Quantitative Evaluation

We next chose two planar objects, different from those used in Section V-A, to show that our method is effective for reconstruction of various planar objects. These objects were a translucent document holder and a file box. To assess the accuracy of the reconstructed 3D points, we apply the same scheme introduced in Section V-A: attaching a thin white matte paper to half of the object, finding a plane using least squares, and computing the root mean square (RMS) distance from this plane. Each target object is placed at {500, 750, 1000, 1750} mm given the background at 2000 mm. As in the validation experiment, our background scene is a white wall. Since our test objects present both translucency and opacity for ground-truthing purposes, we are unable to automatically localize translucent points. Therefore, we manually select both the groundtruth plane and the region of interest (ROI) (see the green and cyan boxes in Fig. 6) and compute RMS errors, as shown in Fig. 5.

From Fig. 5 (a), the average error in the raw data is approximately 195.3 mm, whereas our results present an average error of 10.6 mm. Considering that the standard deviation (STD) for the groundtruth plane was 2.4 mm, our 3D reconstruction is fairly accurate. In Fig. 5, the target object for (b) is more translucent than that used for (a). In line with our prediction, this causes more distortion in the raw data; the errors in the raw data are much greater in Fig. 5 (b). The average error of the raw data in Fig. 5 (b) is 725.6 mm, whereas the average error of our reconstruction results is 51.7 mm. In addition to average errors, we also report the STD of our estimates as bars in the graph. The average STD of our estimates is 3.1 mm for Fig. 5 (a) and 8.0 mm for Fig. 5 (b), comparable to the value of 2.4 mm for the groundtruth plane. These results confirm that our method is fairly reliable for recovery of the original shape of a target object.

These experiments yielded some interesting findings that are interpreted well by our model. First, we observe that the errors significantly increase when the foreground object is located close to the background. In Fig. 5 (a), the error suddenly increases at 1750 mm. This corresponds to the case when the distance between the foreground and background is approximately 250 mm. We interpret this behavior as the result of high-order bounce rays due to inter-reflection between the foreground and background. Although high-order bounce rays are ignorable in typical circumstances, they can affect depth measurements if the interval between the foreground and background is considerably small (e.g., less than 250 mm in our experiment).

Second, the reconstruction problem becomes more challenging for nearly transparent objects. From the data for object 2 in Fig. 4, we confirm that our model correctly predicts the depth distortion of a nearly transparent object using 20 different depth maps. Even though our model successfully synthesizes the depth of nearly transparent surfaces, reconstruction performance varies according to the noise level of the observation. In fact, the noise level is significantly increased by the level of translucency. The measurement from a nearly transparent object is composed of a small amount of 1st bounce ray and a large amount of 2nd bounce ray; the smaller the amount of 1st bounce ray, the lower the signal-to-noise ratio. Such a measurement is less reliable for extraction of the foreground information. This behavior stands out if a target object is far away from the camera because the small amount of 1st bounce ray is further attenuated with traveling distance. As a result, reconstruction errors increase as the foreground object moves farther from the camera. We observe the same trend in Fig. 5 (b). The overall reconstruction error is worse than that of Fig. 5 (a) and increases with foreground distance. The error in the raw data decreases as the foreground distance increases. This is because the raw data consistently lie near the background when the foreground is nearly transparent. Consequently, the error in the raw data is proportional to the distance between the foreground and background. Additionally, translucency is a source of non-linear depth distortion in the raw data and causes complete loss of the original object shape. Fig. 6 (a) and (b) visualize the 3D point clouds of both the raw data and our estimates. Grey dots correspond to raw data, cyan dots indicate our reconstruction within the ROI, and red dots are raw data within the ROI. The red dots exhibit severe distortions, breaking the original plane shape. Note that our results consistently recover a plane shape similar to the groundtruth.

C. Qualitative Evaluation

Figs. 7 and 8 visualize our reconstructions for several real objects. Our algorithm is capable of reconstructing objects with various shapes (e.g., planar and cylindrical shapes) and different materials (e.g., plastic, paraffin). Unlike the previous experiments for model validation and quantitative evaluation, we automatically determine translucent regions by comparing the foreground depth map with the background depth map. We simply choose pixels for which the difference between the two maps is larger than 30 mm and consider those as translucent points. We choose 30 mm as the threshold because the standard deviation of the background is approximately 10 mm in our experiments, and we find that 30 mm is a sufficient threshold value to avoid selecting noisy pixels.

(a) Document holder (Estimated ξ = 0.65)  (b) File box (Estimated ξ = 0.94)

Fig. 5. Quantitative evaluation of reconstruction accuracy using real objects. These plots illustrate RMS errors (raw 3D data as dotted lines and reconstructed 3D data as solid lines) as a function of the groundtruth distance of the planes. Bars indicate standard deviation, and graph (a) includes a close-up of the errors. Notice that the standard deviation for measuring the groundtruth plane was 2.4 mm on average and the background plane is located at 2000 mm.

(a) Document holder (Estimated ξ = 0.65, d_f = 1250 mm)

(b) File box (Estimated ξ = 0.94, d_f = 500 mm)

Fig. 6. 3D/2D visualization of raw data and our reconstruction. We visualize our reconstruction using cyan dots and the corresponding raw data using red dots in 3D space. The color-coded map presents the depth map of the raw data and our results.

Fig. 7 shows several planar objects made of various materials. To better illustrate our results and the raw data, we present both 2D depth maps and 3D point clouds from two different perspectives: a close-up view and an overhead view (i.e., a view of the object from above). From the raw data (denoted by red dots), we observe severe depth distortions caused by translucency. As expected from our distance model, a nearly transparent object such as a file folder exhibits significant shape distortion. For various materials, the proposed algorithm consistently produces reasonable results in recovering the original planar shapes. Compared to the groundtruth position in 3D space (annotated by a green line) and the reference depth map, our reconstruction predicts the original position and shape of objects reasonably well. To create a reference depth map, we fit the 3D points of our estimates to planar surfaces. The reference is not exactly the groundtruth but provides a good reference for the underlying shape. In practice, all of the target objects in Fig. 7 have an inhomogeneous surface to some extent. A cabinet bin is less transparent at the center and more transparent around the center. A file folder presents some specularity at the center, resulting in missing points in our reconstruction. The boundary of every object is nearly opaque due to its thickness, which is the source of additional errors around boundary points. Because we explicitly assume a homogeneous surface, the inhomogeneity of real-world objects can add some errors.

(a) Cabinet bin (Estimated ξ = 0.85)

(b) File folder (Estimated ξ = 0.95)

(c) Storage drawer (Estimated ξ = 0.89)

Fig. 7. Various planar objects. First column: picture of the target object. Second column: 2D depth map. Third column: close-up view. Fourth column: overhead view (a view of the object from above). We mark our reconstruction with cyan dots and the corresponding raw data with red dots in 3D space. The green line represents the actual position where each target object is located.

Fig. 8 presents a set of cylindrical objects made of various materials. Similar to the planar objects, we create reference depth maps by fitting the 3D points of our estimates to cylindrical surfaces. A round object is particularly challenging in our framework because it breaks our two-layer assumption. At every line of sight, the distance value is the result of three bounce rays: two from the foreground and one from the background. Hence, our reconstruction of a pen container appears rather flat and different from the original round shape. Interestingly, the two-layer assumption is still valid for other objects such as an aroma candle and a basket filled with water because these objects are filled with scattering media. Once the object is filled with scattering media, the entire body is regarded as a single-layer surface with some thickness. Although our model does not explicitly account for subsurface scattering, we approximate a scattering object with dense materials as a single-layer homogeneous surface. As seen in Fig. 8, our framework provides reasonable estimates in practice because the errors caused by subsurface scattering are not significant. Based on these experiments, we confirm that our framework successfully performs shape reconstruction under adverse conditions.

(a) Aroma candle (Estimated ξ = 0.3)

(b) Basket filled with water (Estimated ξ = 0.15)

(c) Pen container (Estimated ξ = 0.64)

Fig. 8. Various cylindrical objects. First column: picture of the target object and reference depth map. Second column: 2D depth map. Third column: close-up view. Fourth column: overhead view. We mark our reconstruction with cyan dots and the corresponding raw data with red dots in 3D space. The green line represents the actual position where each target object is located.

From the 2D illustration of the depth maps, we observe that non-linear depth distortions in the raw data appear at the foremost tip of the objects. The distortion forms a peak at the foremost tip of the object. Except for this peak, the remaining portion is shifted toward the background, leading to non-linear depth distortions. For various materials, our reconstruction recovers the original cylindrical shape and approximates the groundtruth position reasonably well. The performance, however, becomes less effective at the bottom of objects where they intersect with the floor, because of inter-reflections between the object and the floor. This effect becomes severe when dealing with objects filled with scattering media. The same types of errors are present in the raw data. Our two-layer distance model ignores high-order bounce rays, and therefore we cannot avoid errors caused by inter-reflections.

From these experiments, we can summarize some important information regarding the performance of the proposed algorithm. First, our distance model can handle various translucent objects presenting diverse shapes and materials. Extensive evaluation on various objects shows that the proposed algorithm can effectively characterize various real objects. Second, our framework is not limited to a single thin-layer surface as long as the object is filled with a homogeneous medium. Since many real objects have closed surfaces, this property can extend the application scenarios of our work. Third, performance is slightly reduced by inhomogeneity of the surface. This issue could be alleviated by enforcing homogeneity within a local surface instead of over the entire surface. Fourth, many of the example objects used in our experiments present specularity. Currently, we do not explicitly handle specular illumination because specular highlights occupy only a small proportion of the object surface.


(a) Vinyl (Estimated ξ = 0.97)

(b) Basket (Estimated ξ=0.3)

Fig. 9. Challenging objects. First column: picture of the target object and reference depth map. Second column: 2D depth map. Third column: close-up view. Fourth column: overhead view. We mark our reconstruction with cyan dots and the corresponding raw data with red dots in 3D space. The green line represents the actual position where each target object is located.

VI. DISCUSSION

A. Effect of Refraction

Although we initially assume that surfaces do not produce light refraction, some translucent objects present refraction. In the presence of light refraction, a background point b in Fig. 1 no longer lies on the line of sight: b and db change with the incident angle θf. Knowing this limitation, we examined the decrease in performance for a refractive object, a piece of vinyl, as shown in Fig. 9 (a). As expected, the original shape is not fully recovered in our reconstruction. Instead, our reconstruction corrects the general position of the object to a certain extent. In the future, we plan to add a refraction parameter to our model to predict the exact traveling path of light. We expect that this will help develop a complete distance model accounting for more general translucent objects.
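As a rough illustration of this geometry (not part of our pipeline), the following Python sketch applies Snell's law at the foreground surface to estimate how far the background intersection point moves off the line of sight. The refractive index n2 and the foreground-background gap are assumed values chosen only for illustration, and, for simplicity, the transmitted ray is assumed to keep the refracted direction until it reaches the background (exit refraction at the back face is ignored).

import numpy as np

def refracted_offset(theta_f_deg, gap, n1=1.0, n2=1.5):
    """Estimate the lateral offset of the background hit point caused by
    refraction at the foreground surface (single flat interface assumed).

    theta_f_deg : incident angle at the foreground surface, in degrees
    gap         : assumed distance between foreground surface and background [m]
    n1, n2      : refractive indices of air and of the translucent layer
                  (n2 = 1.5 is an assumed, vinyl-like value)
    """
    theta_f = np.radians(theta_f_deg)
    # Snell's law: n1 * sin(theta_f) = n2 * sin(theta_t)
    theta_t = np.arcsin(np.clip(n1 * np.sin(theta_f) / n2, -1.0, 1.0))
    # Offset between where the unrefracted and refracted rays meet the background
    return gap * (np.tan(theta_f) - np.tan(theta_t))

# Example: 30 degree incidence and a 0.5 m foreground-background separation
# move the background point b by roughly 0.11 m off the line of sight.
print(refracted_offset(30.0, 0.5))

Even this crude estimate shows that the displacement of b grows with both the incident angle and the foreground-background separation, which is why ignoring refraction degrades the reconstruction of the vinyl sheet.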

B. Multi-layered Objects

In Section V-C, we approximate cylindrical objects using our two-layer distance model, considering two-layered foreground objects as single layers. This approximation is valid if an object is filled with homogeneous scattering media. We test our performance on a multi-layer object, the same basket shown in Fig. 8 (b) but without water. The corresponding result is presented in Fig. 9 (b). Comparing the raw data in Fig. 8 (b) and Fig. 9 (b), we observe that the empty basket shows stronger shape distortion than the filled basket. Although our reconstruction is close to the ground-truth position, the original object shape is not correctly recovered.

To address this issue, we plan to extend our approach to a multi-layered model in the future. Although a multi-layered model is advantageous for building a rich representation, a high-dimensional model is prohibitive for inference because it must recover multiple parameters from few observations. Hence, we plan to study an optimal distance model that approximates a major class of translucent objects and to determine an effective inference scheme by introducing useful constraints such as surface smoothness.

C. Subsurface Scattering vs. Interreflection

One might consider a solid scattering volume to be equivalent to an infinite number of dense layers subject to inter-reflections. However, even though both originate from multiple bounces of reflected light, subsurface scattering produces a short-range error whereas inter-reflections cause a long-range error, as addressed in [19], [20]. Because subsurface scattering is generated within the foreground object, we can approximate its depth distortion by fitting a translucent parameter (typically, a small value is suitable for scattering objects). In contrast, inter-reflections are introduced by high-order bounce rays. As discussed in Section VI-B, our current implementation is incapable of handling inter-reflections.

D. A Single Layer vs. Two-Layer Depth Model

When an object becomes nearly opaque (e.g., ξ = 0 for an opaque Lambertian object), depth distortions are less significant because only a small amount of light reflected from



the background reaches the camera. Under this circumstance, a single layer L1 might be sufficient to represent the depth model. However, it is hard to distinguish nearly opaque objects from nearly transparent objects without prior knowledge of the scene. In fact, detecting the specific type of object is yet another research problem. Therefore, we apply a two-layer depth model and provide a unified framework for handling translucent as well as Lambertian objects.
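The following sketch is not the distance model derived in Sections II and III; it is a generic two-return phasor-mixing illustration, under an assumed modulation frequency and an assumed background mixing weight (a stand-in for the translucent parameter ξ), of why a translucent foreground pulls the measured ToF distance non-linearly toward the background and why a single-layer model suffices when ξ is close to zero.

import numpy as np

C = 299_792_458.0          # speed of light [m/s]
F_MOD = 30e6               # assumed AMCW modulation frequency [Hz]

def mixed_tof_distance(d_f, d_b, xi):
    """Illustrative two-return model of a ToF measurement.

    d_f : true foreground distance [m]
    d_b : background distance [m]
    xi  : assumed weight of the background return (0 = opaque foreground);
          this is only an illustrative stand-in for the translucent parameter.
    """
    phi_f = 4.0 * np.pi * F_MOD * d_f / C     # phase of the foreground return
    phi_b = 4.0 * np.pi * F_MOD * d_b / C     # phase of the background return
    # The sensor observes the sum of both returns as a single phasor and
    # converts its phase back to a distance.
    phasor = (1.0 - xi) * np.exp(1j * phi_f) + xi * np.exp(1j * phi_b)
    return C * np.angle(phasor) / (4.0 * np.pi * F_MOD)

for xi in (0.0, 0.15, 0.3, 0.64):
    print(xi, mixed_tof_distance(1.0, 1.5, xi))
# xi = 0 reproduces the true foreground distance; larger xi pulls the
# measurement non-linearly toward the background, i.e., the distortion
# depends on both the translucency and the foreground-background gap.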

VII. CONCLUSION

Recovering translucent objects is a challenging research problem because illumination from a translucent object is often affected by its surrounding environment, such as the background, scene configuration, or lighting. In this paper, we propose a new framework for recovering a translucent object using a single ToF camera. We derive a generic distance model considering a two-layered scene (Section II) and then focus on its special case built upon Time-of-Flight sensing principles (Section III). Our framework is reproducible, operable by an ordinary user, and practical because it does not require modifying the camera hardware or firmware. Also, our distance model predicts depth distortion as a long-range effect; both the translucency and the distance between the background and foreground cause significant shape distortions. Although most existing techniques are limited to a specific class of objects, we provide a unified framework for both Lambertian and translucent objects. Based on extensive evaluations and experimental studies of corner cases, we show that the proposed method is effective for recovering the shape of translucent objects.

In the future, we plan to incorporate light refraction into our model, develop a high-order distance model for multi-layered objects, address inter-reflections by adding high-order bounce rays into our model, and determine an effective inference scheme by introducing useful constraints (e.g., surface smoothness). Moreover, the current configuration requires user interaction to insert a target object into the scene. In future studies, we will extend this framework to multiple depth cameras and the acquisition of multi-view depth maps, which we can use to simultaneously estimate the foreground, background, and the translucent parameter. In this way, we expect to be able to capture and process dynamic objects.

ACKNOWLEDGEMENT

This research was supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the IT Consilience Creative Program (NIPA-2014-H0201-14-1001) supervised by the NIPA (National IT Industry Promotion Agency), and also by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (2013062644).

REFERENCES

[1] Y. Adato, Y. Vasilyev, O. Ben-Shahar, and T. Zickler. Towards a theory of shape from specular flow. In Proceedings of IEEE International Conference on Computer Vision, pages 1–8, 2007.

[2] Y. Adato, Y. Vasilyev, T. Zickler, and O. Ben-Shahar. Shape from specular flow. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(11):2054–2070, Nov 2010.

[3] S. Agarwal, S. Mallick, D. Kriegman, and S. Belongie. On refractive optical flow. In Proceedings of European Conference on Computer Vision, pages 483–494, 2004.

[4] P. Benzie, J. Watson, P. Surman, I. Rakkolainen, K. Hopf, H. Urey, V. Sainov, and C. Von Kopylow. A survey of 3DTV displays: Techniques and technologies. IEEE Transactions on Circuits and Systems for Video Technology, 17(11):1647–1658, Nov 2007.

[5] T. Bonfort and P. Sturm. Voxel carving for specular surfaces. In Proceedings of IEEE International Conference on Computer Vision, pages 591–596, 2003.

[6] T. Chen, M. Goesele, and H.-P. Seidel. Mesostructure from specularity. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 17–22, 2006.

[7] T. Chen, H. P. A. Lensch, C. Fuchs, and H.-P. Seidel. Polarization and phase shifting for 3D scanning of translucent objects. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2007.

[8] T. Chen, H.-P. Seidel, and H. P. A. Lensch. Modulated phase-shifting for 3D scanning. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2008.

[9] V. Couture, N. Martin, and S. Roy. Unstructured light scanning to overcome interreflections. In Proceedings of IEEE International Conference on Computer Vision, pages 1895–1902, 2011.

[10] Y. Ding and J. Yu. Recovering shape characteristics on near-flat specular surfaces. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2008.

[11] B. Dong, K. D. Moore, W. Zhang, and P. Peers. Scattering parameters and surface normals from homogeneous translucent materials using photometric stereo. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 2299–2306, June 2014.

[12] A. A. Dorrington, J. P. Godbaz, M. J. Cree, A. D. Payne, and L. V. Streeter. Separating true range measurements from multipath and scattering interference in commercial range cameras. In Proc. SPIE, 7864:786404T–786404T–10, 2011.

[13] D. Freedman, E. Krupka, Y. Smolin, I. Leichter, and M. Schmidt. SRA: Fast removal of general multipath for ToF sensors. In Proceedings of European Conference on Computer Vision, 2014.

[14] S. Fuchs. Multipath interference compensation in time-of-flight camera images. In Proceedings of the 20th International Conference on Pattern Recognition (ICPR), pages 3583–3586, Aug 2010.

[15] J. Godbaz, M. Cree, and A. A. Dorrington. Mixed pixel return separation for a full-field ranger. In Proceedings of the 23rd International Conference on Image and Vision Computing New Zealand (IVCNZ), pages 1–6, Nov 2008.

[16] J. P. Godbaz, M. J. Cree, and A. A. Dorrington. Multiple return separation for a full-field ranger via continuous waveform modelling. In Proc. SPIE, 7251:72510T–72510T–12, 2009.

[17] J. P. Godbaz, M. J. Cree, and A. A. Dorrington. Closed-form inverses for the mixed pixel/multipath interference problem in AMCW lidar. In Proc. SPIE, 8296:829618T–829618T–15, 2012.

[18] J. Gu, T. Kobayashi, M. Gupta, and S. Nayar. Multiplexed illumination for scene recovery in the presence of global illumination. In Proceedings of IEEE International Conference on Computer Vision, pages 691–698, 2011.

[19] M. Gupta, A. Agrawal, A. Veeraraghavan, and S. Narasimhan. Structured light 3D scanning in the presence of global illumination. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 713–720, 2011.

[20] M. Gupta, A. Agrawal, A. Veeraraghavan, and S. Narasimhan. A practical approach to 3D scanning in the presence of interreflections, subsurface scattering and defocus. International Journal of Computer Vision, 102(1):33–55, 2013.

[21] M. Gupta and S. Nayar. Micro phase shifting. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 813–820, 2012.

[22] M. Gupta, S. K. Nayar, M. B. Hullin, and J. Martin. Phasor imaging: A generalization of correlation-based time-of-flight imaging. Columbia Technical Report, 2014.

[23] M. A. Halstead, B. A. Barsky, S. A. Klein, and R. B. Mandell. Reconstructing curved surfaces from specular reflection patterns using spline surface fitting of normals. In Proceedings of ACM SIGGRAPH, pages 335–342, 1996.

[24] E. Hecht. Optics. Addison-Wesley, page 698, 2002.

[25] C. Hermans, Y. Francken, T. Cuypers, and P. Bekaert. Depth from sliding projections. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 1865–1872, 2009.

[26] M. Holroyd, J. Lawrence, G. Humphreys, and T. Zickler. A photometric approach for estimating normals and tangents. ACM Trans. Graph., 27(5):133:1–133:9, Dec 2008.

[27] I. Ihrke, K. Kutulakos, H. Lensch, M. Magnor, and W. Heidrich. State of the art in transparent and specular object reconstruction. Computer Graphics Forum, 29(8):2400–2426, 2010.

[28] K. Ikeuchi. Determining surface orientations of specular surfaces by using the photometric stereo method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 3(6):661–669, Nov 1981.

[29] C. Inoshita, Y. Mukaigawa, Y. Matsushita, and Y. Yagi. Shape from single scattering for translucent objects. In Proceedings of European Conference on Computer Vision, pages 371–384, 2012.

[30] C. Inoshita, Y. Mukaigawa, Y. Matsushita, and Y. Yagi. Surface normal deconvolution: Photometric stereo for optically thick translucent objects. In Proceedings of European Conference on Computer Vision, pages 346–359, 2014.

[31] D. Jimenez, D. Pizarro, M. Mazo, and S. Palazuelos. Modelling and correction of multipath interference in time of flight cameras. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 893–900, June 2012.

[32] A. Kadambi, R. Whyte, A. Bhandari, L. Streeter, C. Barsi, A. Dorrington, and R. Raskar. Coded time of flight cameras: Sparse deconvolution to address multipath interference and recover time profiles. ACM Trans. Graph., 32(6):167:1–167:10, Nov 2013.

[33] A. Kirmani, A. Benedetti, and P. Chou. SPUMIC: Simultaneous phase unwrapping and multipath interference cancellation in time-of-flight cameras using spectral methods. In Proceedings of IEEE International Conference on Multimedia and Expo (ICME), pages 1–6, July 2013.

[34] K. N. Kutulakos and E. Steger. A theory of refractive and specular 3D shape by light-path triangulation. In Proceedings of IEEE International Conference on Computer Vision, pages 1448–1455, 2005.

[35] K. N. Kutulakos and E. Steger. A theory of refractive and specular 3D shape by light-path triangulation. International Journal of Computer Vision, 76(1):13–29, 2008.

[36] R. Lange and P. Seitz. Solid-state time-of-flight range camera. IEEE Journal of Quantum Electronics, 37(3):390–397, Mar 2001.

[37] S. Liu, T. Ng, and Y. Matsushita. Shape from second-bounce of light transport. In Proceedings of European Conference on Computer Vision, pages 280–293, 2010.

[38] W.-C. Ma, T. H. P. Peers, C.-H. Chabert, M. Weiss, and P. Debevec. Rapid acquisition of specular and diffuse normal maps from polarized spherical gradient illumination. In Proceedings of the 18th Eurographics Conference on Rendering Techniques, pages 183–194, 2007.

[39] N. J. Morris and K. N. Kutulakos. Dynamic refraction stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1518–1531, 2011.

[40] N. J. W. Morris and K. N. Kutulakos. Reconstructing the surface of inhomogeneous transparent scenes by scatter-trace photography. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2007.

[41] H. Murase. Surface shape reconstruction of an undulating transparent object. In Proceedings of IEEE International Conference on Computer Vision, pages 313–317, Dec 1990.

[42] H. Murase. Surface shape reconstruction of a nonrigid transparent object using refraction and motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(10):1045–1052, Oct 1992.

[43] S. Nayar, K. Ikeuchi, and T. Kanade. Shape from interreflections. In Proceedings of IEEE International Conference on Computer Vision, pages 2–11, 1990.

[44] S. Nayar, G. Krishnan, M. D. Grossberg, and R. Raskar. Fast separation of direct and global components of a scene using high frequency illumination. ACM Trans. on Graphics (also Proc. of ACM SIGGRAPH), 25(3):935–944, 2006.

[45] J. Park and C. Kak. 3D modeling of optically challenging objects. IEEE Transactions on Visualization and Computer Graphics, 14(2):246–262, 2008.

[46] S. Roth and M. J. Black. Specular flow and the recovery of surface structure. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 1869–1876, 2006.

[47] H. Schultz. Retrieving shape information from multiple images of a specular surface. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(2):195–201, 1994.

[48] H. Shim and S. Lee. Hybrid exposure for depth imaging of a time-of-flight depth sensor. Opt. Express, 22(11):13393–13402, Jun 2014.

[49] J. Solem, H. Aanaes, and A. Heyden. A variational analysis of shape from specularities using sparse data. In Proceedings of the International Symposium on 3D Data Processing, Visualization and Transmission, pages 26–33, Sept 2004.

[50] M. Tarini, H. P. A. Lensch, M. Goesele, and H.-P. Seidel. 3D acquisition of mirroring objects. Graphical Models, 67(4):233–259, 2005.

[51] A. Velten, T. Willwacher, O. Gupta, A. Veeraraghavan, M. G. Bawendi, and R. Raskar. Recovering three-dimensional shape around a corner using ultra-fast time-of-flight imaging. Nature Communications, 2012.

[52] D. Wu, M. O'Toole, A. Velten, A. Agrawal, and R. Raskar. Decomposing global light transport using time of flight imaging. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 366–373, 2012.

HyunJung Shim received her B.S. degree in electrical engineering from Yonsei University, Korea, in 2002, and the M.S. and Ph.D. degrees in electrical and computer engineering from Carnegie Mellon University, Pittsburgh, PA, in 2004 and 2008, respectively. She is currently an assistant professor in the School of Integrated Technology at Yonsei University. From November 2008 to February 2013, she was with the Samsung Advanced Institute of Technology, Samsung Electronics, Korea. Her research interests include 3D modeling and reconstruction, inverse lighting and reflectometry, face modeling, image-based relighting and rendering, light field capturing and processing algorithms, and color enhancement algorithms. Most recently, her research activities focus on depth image-based scene understanding and analysis.

Seungkyu Lee received his Ph.D. degree in computer science and engineering from Penn State University, US. He has been a research engineer at the Korea Broadcasting System Technical Research Institute, where he carried out research on HD image processing, MPEG4-AVC, and the standardization of Terrestrial Digital Mobile Broadcasting. He has been a principal research scientist at the Advanced Media Lab, Samsung Advanced Institute of Technology. He is an assistant professor at Kyung Hee University. His research interests include ToF depth cameras, color/depth image processing, symmetry-based computer vision, and 3D modeling and reconstruction.

