
Pattern Recognition Letters 33 (2012) 1349–1363


Simultaneous line matching and epipolar geometry estimation based on the intersection context of coplanar line pairs

Hyunwoo Kim (a,*), Sukhan Lee (b,*)
a Multimedia Research Team, Daum Communications Corp., 714 Hannam-dong, Yongsan-gu, Seoul 140-894, Republic of Korea
b College of Information and Communication Engineering, Sungkyunkwan University, 300 Cheoncheon-dong, Jangan-gu, Suwon, Gyeonggi-do 440-746, Republic of Korea


Article history: Received 30 August 2010; Available online 28 March 2012

Communicated by S. Sarkar

Keywords: Line matching; Scene modeling; Line coplanarity; Epipolar geometry estimation

0167-8655/$ - see front matter © 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.patrec.2012.03.014

* Corresponding authors. Tel.: +82 2 6718 0193; fax: +82 2 6718 0905 (H. Kim); Tel.: +82 31 290 7150; fax: +82 31 299 6479 (S. Lee).

E-mail addresses: [email protected] (H. Kim), [email protected] (S. Lee).

Abstract

This paper presents a novel line matching method based on the intersection context of coplanar line pairs. The proposed method is designed to be especially effective for dealing with poorly structured and/or textured scenes. To overcome the ambiguity in line matching based on single line segments, the intersecting line pairs in 2D images that are coplanar in 3D are chosen instead for use in matching. The coplanarity of intersecting line pairs and their corresponding intersection context discriminate the true intersecting line pairs from the false intersecting ones in 3D. Compared to previous approaches, the method proposed herein offers efficient yet robust matching performance under poor line topologies or junction structures, while simultaneously estimating unknown camera geometry. This is due to the fact that the proposed method neither resorts to comprehensive topological relations among line segments nor relies on the presence of well-defined junction structures. The intersecting line pairs, used here as matching features, are more informative than single line segments and simpler than comprehensive topological relations. Also, the coplanarity criteria are more generally applicable than the requirement of junction structures. Comparison studies and experimental results prove the accuracy and speed of the proposed method for various real world applications.


1. Introduction

3D scene modeling is an active and important research field in computer vision and graphics, with applications including TV/film production, augmented reality, robotics, navigation, and surveillance systems. In scene modeling, the detection and matching of image features is the first crucial step because, given feature correspondences among multiple views, 3D scenes may be reconstructed by triangulation (Snavely et al., 2006; Mordohai et al., 2006; Pollefeys et al., 1999; Baillard et al., 1999; Bartoli and Sturm, 2003; Micusik and Kosecka, 2009).

Many of the scene modeling methods to date have been proposed under the proposition that interest points (Schmid et al., 2000), e.g., corners, can be detected and matched based on the invariant properties associated with such photometric data as intensity, color, shape, and texture. However, approaches based on interest points are effective only for richly structured scenes with sufficient texture information to be used for the extraction and matching of interest points. In real world situations, it is often the case that scenes contain poorly textured objects, including such man-made objects as tables, desks, chairs, sinks, refrigerators, microwave ovens, monotone walls, and hallways, making interest points hardly detectable and poorly localizable (Kim et al., 2008). In this case, line features can be good alternatives to interest points as image features, because man-made objects are often configured from several parts with well-defined geometric shapes that offer distinct 3D lines and edges. As reported, in poorly textured indoor scenes, 2D lines may sometimes be the only image feature that can be utilized for automatic 3D modeling (Baillard et al., 1999; Bartoli and Sturm, 2003; Tang et al., 2006; Micusik and Kosecka, 2009).

While line features are regarded as robust to environmental variations for detection and localization in 2D image planes, they are difficult to match because of the lack of photometric invariance to be used for measuring similarity. The many approaches to line matching available to date can be categorized based on whether or not they perform the estimation of camera geometry, i.e., the geometric relationship among multiple views, simultaneously with line matching. Most of the conventional line matching methods have been based on the assumption that camera geometry may be estimated a priori from the matching of corresponding interest points or from calibration using calibration patterns. With known camera geometry, matching can be constrained to the epipolar lines, effectively converting the problem of line matching into that of matching between the points on the lines that satisfy the epipolar constraints.


Fig. 1. Overall flow of the proposed method.


Schmid and Zisserman (1997) automatically matched line segments by exploiting the intensity neighborhood of the line segments, guided by the epipolar constraints between different camera views, to provide point-to-point correspondences along the line segments. Resolving the resulting ambiguity, Werner and Zisserman (2002) improved the previous algorithm by a "line sweep", or search to register the photometric neighborhood. Those approaches presume that accurate camera geometry has been estimated beforehand. However, when these methods are applied to poorly textured scenes, the detection of interest points is rather ineffective, such that the estimation of camera geometry becomes erroneous or inaccurate, resulting in the failure of the algorithms. Therefore, the need exists for an automatic line matching algorithm without predetermined camera geometry, possibly estimating camera geometry and matching line segments simultaneously.

Recently, several reported investigations pioneered methodologies for matching line segments without assumed camera geometry in unstructured real world situations. Bay et al. (2005) proposed a line matching algorithm targeting poorly textured scenes. First, an initial set of line segment correspondences is obtained by comparing the histograms of the neighboring color profiles in both views. Then a topological filter is applied to find correct line matches while removing wrong candidates from the initial matches. Finally, a coplanar grouping stage using homography allows the fundamental matrix to be estimated even from line segments. Kim et al. (2007) introduced a spectral line matching algorithm to find the subset of correspondences with the greatest consistency, learned using logistic classifiers. Furthermore, line matching techniques for wide-baseline stereo have been successfully presented in Wang et al. (2009a,b). Those approaches require heavy computation via topological relation analysis, a learning stage, or heavy neighborhood region analysis. Besides, the photometric information of single line segments alone is not discriminative enough to match line segments.

Another group of researchers has utilized junction features for line matching. Vincent and Laganière (2004) matched junctions by estimating the local perspective distortion between the neighborhoods of junctions, then estimated the fundamental matrix based on a constrained minimization, assuming crude camera pose estimates. Bay et al. (2006) identified polyhedral junctions resulting from the intersections of line segments, then segmented the images into planar polygons using an algorithm based on a binary space partitioning tree. However, junction-based approaches apply only to well-structured scenes, where lines and junctions are robustly extracted. Such scenes are limited to houses built of brick or tile, well-structured indoor aisles, and well-edged furniture.

To address the line matching problem in poorly textured scenes, we use the intersection context of line features, combining the geometric invariance of 3D line intersections with the photometric invariance of the projected 2D line intersections. Instead of matching single line segments individually, matching of pairs of line segments that are coplanar in 3D is investigated. Coplanarity of the intersecting line pairs and their corresponding intersection context discriminates true intersecting line pairs from false intersecting ones in the 3D world. Compared to previous approaches, the proposed method offers efficient yet robust matching performance under poor line topologies or junction structures, while simultaneously estimating unknown camera geometry. The proposed method neither resorts to comprehensive topological relations among line segments nor relies on the presence of well-defined junction structures. The intersecting line pairs, used here as matching features, are not only more informative than single line segments and simpler than comprehensive topological relations, but the coplanarity criteria are also more general to apply than the requirement of junction structures.

The novelty of the proposed method is threefold:

(1) First, in feature extraction, the intersection of line pairs is modeled as an image feature, which has good localization properties and captures geometrically meaningful structures, e.g., corners and junctions of furniture or electronic appliances. The newly modeled feature also has fair photometric invariance between different camera views. Each feature consists of a line pair and its intersection context. Based on the observation that geometrically important line pairs are located close to each other, a proximity rule is also used in line pairing. The features are called "Line Intersection Context Features (LICFs)" in this paper. Compared to conventional interest points, the detected intersection features are smaller in quantity but more geometrically representative in quality (Section 2).

(2) Second, the feature matching stage performs camera geometry estimation and line matching simultaneously: line segments are matched without knowing camera geometry in advance. While conventional algorithms consist of two steps, i.e., camera geometry estimation and line matching, the proposed one combines the two steps into one. The benefit of the one-step algorithm is that the error of camera geometry estimation is not propagated into the subsequent line matching step, and even in the case of inaccurate camera geometry estimation, reasonable line matching results are given for further multi-view integration. Coplanarity of the intersecting line pairs and their corresponding intersection context discriminates true intersecting line pairs from false intersecting ones in the 3D world (Section 3).

(3) Last, a practical and fast solution is presented for the case when camera geometry is given. It is useful for off-the-shelf stereo cameras, which have become prevalent in robot vision, and for fixed multi-camera surveillance systems. When camera geometry can be pre-computed by calibration, or by semi-automatic or automatic correspondence matching using calibration patterns, the stereo pairs can be rectified and the proposed method can be simplified, making the approach adequate for near real-time line-based scene modeling. Compared to the previous approaches based on single line feature invariance and topology, the computational speed is much faster and the chance of mismatching is lower (Section 4).

The overall algorithm consists of two stages: feature extraction and matching. The procedure is illustrated in Fig. 1. First, in the feature extraction stage, line segments are extracted from each image; then intersecting line pairs are determined based on a proximity rule. The intersecting positions and neighborhoods of the line pairs are newly defined and used as image features, called "Line Intersection Context Features (LICFs)".

Next, in the feature matching stage, the LICFs from the two images are initially matched based on the photometric invariance of the intersection context using a region descriptor. The initial matches are then refined while simultaneously estimating camera geometry using a RANSAC technique.

In Sections 2 and 3, feature extraction and matching techniques are described, respectively. Section 4 introduces a fast and simplified version for rectified stereo images, and Section 5 presents the comparison studies and experimental results in real world scenes, including line-based scene modeling. Finally, Section 6 concludes with a discussion of the performance of the line matching method reported herein and future investigation.

Fig. 2. Finding intersecting line segment pairs. For example, in this figure, the line segments l2 and l3 form intersecting pairs with the line segment l1 because the end points e2 and s3 are within the intersecting regions, respectively, but the line segment l4 is considered not to intersect l1. So the intersecting line pairs {{l1, l2}, {l1, l3}} are obtained.

2. Feature extraction

2.1. Coplanarity of intersecting line pairs

To solve the line matching problem, coplanar line pairs are considered as matching features instead of single line segments. Two line segments can produce more constraints for matching by combining their individual similarities, but a simple combination of the similarities of two line segments does not improve the matching result much. The geometric relation and photometric invariance of coplanar line pairs warrant investigation.

For line pairs or groups of lines, special 3D geometric relations can be observed and measured in 2D image space: parallelism, orthogonality, and coplanarity. The first two, parallelism and orthogonality, are strong geometric constraints. The 2D projections of 3D parallel lines meet at a vanishing point, and sets of parallel lines with different directions are projected into images so as to construct a vanishing line (Criminisi et al., 1999; Micusik and Kosecka, 2009). However, accurate and automatic computation of those features is not easy because the accuracy of the extracted lines is limited by line quantization.

The latter, coplanarity, is a rather weaker constraint compared to the first two properties, but it can be determined easily among line pairs and/or groups. A coplanar line group can be determined based on the inter-image homography between different views (Bay et al., 2006). However, doing so without information about the cameras and/or structures can be difficult, because homography alone is not only insufficient to discriminate coplanar line pairs from non-coplanar line pairs accurately, but measuring coplanarity for all four-line groups is also computationally expensive.

In this paper, aside from inter-image homography, the coplanarity of intersecting line pairs is investigated. First, when two lines are coplanar in 3D, the projected lines in 2D are always coplanar. Moreover, when the intersection of a 3D coplanar line pair is projected into 2D images, the projected intersection is also the intersection of the 2D line pair projected from the 3D coplanar line pair. Therefore, the intersection of a coplanar line pair is geometrically invariant under perspective projection. That is, the geometric relationship among coplanar line pairs and their intersections is preserved among different views under perspective projection.

However, determining coplanar line pairs from 2D line pairs still remains a problem. Any line pair in a 2D image can be either coplanar or non-coplanar. The lines of a coplanar line pair meet in 3D, but those of a non-coplanar pair do not. To discriminate a coplanar line pair from a non-coplanar line pair, the intersection of the lines of the pair is further considered in the following steps. (1) First, since the lines of a 2D pair that intersect far from the end points of the line segments are unlikely to be coplanar in 3D, except for parallel line pairs, a proximity rule is applied in determining the intersecting line pairs from the line segments (Section 2.2). (2) Next, because an image patch centered at the intersection point of a coplanar line pair in different views comes from the same surface patch centered at the line intersection in 3D, the photometric property of the local neighborhood of the intersection point is preserved, providing photometric invariance to check coplanarity (Section 2.3).
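In homogeneous coordinates this projective invariance is cheap to exploit: the intersection of two image lines is just their cross product. The following minimal sketch (our own illustration, not code from the paper; all function names are ours) computes the 2D intersection point that serves as a LICF position:

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line l = p x q through two image points given as (x, y)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def intersection(l1, l2):
    """Intersection x = l1 x l2 of two homogeneous lines.

    Returns the inhomogeneous point (x, y), or None when the lines are
    parallel (their intersection is a point at infinity)."""
    x = np.cross(l1, l2)
    if abs(x[2]) < 1e-12:
        return None
    return x[:2] / x[2]

# The same construction applies in both views: projecting the 3D
# intersection of a coplanar pair and intersecting the projected 2D
# lines yield the same image point.
```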

2.2. Determining intersecting line pairs

The proposed algorithm begins with the extraction of line segments from each image using a Canny edge detector with hysteresis, followed by edge linking, with the linked edges fitted to line segments. The extracted line segments in an image I are represented by {l1, . . . , lk1}, where k1 denotes the number of extracted line segments (Werner, 2007).
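As a rough illustration of this stage, the sketch below uses OpenCV's Canny detector; since the paper's edge-linking and line-fitting step is not spelled out, cv2.HoughLinesP is used here as a common stand-in, and all thresholds are assumed values:

```python
import cv2
import numpy as np

def extract_line_segments(image_gray, canny_lo=50, canny_hi=150):
    """Approximate the Canny + edge-linking + line-fitting stage.
    Returns an (N, 4) array of segments [x1, y1, x2, y2]."""
    edges = cv2.Canny(image_gray, canny_lo, canny_hi)  # hysteresis thresholds
    # Probabilistic Hough transform as a stand-in for edge linking + fitting.
    segs = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                           threshold=50, minLineLength=30, maxLineGap=3)
    return segs.reshape(-1, 4) if segs is not None else np.empty((0, 4))
```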

Due to noise and occlusion, line segments that intersect in 3D do not necessarily meet explicitly in 2D images. Thus, to find candidates for intersecting line pairs, the end points of the line segments are extended virtually. A proximity rule is used to avoid pairing lines that do not intersect in 3D: in an image, starting from a line segment, other line segments are searched for within a certain distance in order to remove the possibility of pairing line segments that are falsely intersected in 3D.

Given a line segment l1, its end points s1 and e1 are extended to the centers of the circles, sx1 and ex1, by a user-specified distance d_th. A line segment is considered to form an intersecting pair when at least one of its end points is within the given threshold d_th, that is, within the sky-blue areas defined by the circles and the rectangle. The proximity rule is illustrated by examples in Fig. 2.

Formally, given the index set P (⊂ ℕ²) of intersecting line pairs of the image I, represented by p(k) = (i, j) such that d(l_i, l_j) < d_th, where d(l_i, l_j) denotes the distance between line segments l_i and l_j as defined in Fig. 2, the intersecting line pairs are represented by

$$L_{pair} = \bigl\{ \{ l_{p(k)_1},\, l_{p(k)_2} \} \mid (p(k)_1, p(k)_2) \in P \bigr\} \tag{1}$$

where k = 1, . . . , #P. In the same way, given the index set P′ of intersecting line pairs of the image I′, represented by p′(k) = (i, j) such that d(l′_i, l′_j) < d_th, the intersecting line pairs are represented by

$$L'_{pair} = \bigl\{ \{ l'_{p'(k)_1},\, l'_{p'(k)_2} \} \mid (p'(k)_1, p'(k)_2) \in P' \bigr\} \tag{2}$$

where k = 1, . . . , #P′.
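A minimal sketch of the pairing step of Eq. (1) follows. It approximates d(l_i, l_j) of Fig. 2 by the smallest endpoint-to-endpoint distance (the paper's region test is richer), and it also skips near-parallel pairs, anticipating the degeneracy discussed in Section 3.4; d_th and min_angle_deg are assumed parameters:

```python
import numpy as np
from itertools import combinations

def endpoints(seg):
    """Two end points of a segment stored as (x1, y1, x2, y2)."""
    return np.array([[seg[0], seg[1]], [seg[2], seg[3]]], dtype=float)

def pair_distance(si, sj):
    """Proxy for d(l_i, l_j): smallest distance between the segments' end points."""
    return min(np.linalg.norm(p - q) for p in endpoints(si) for q in endpoints(sj))

def intersecting_pairs(segments, d_th=15.0, min_angle_deg=10.0):
    """Eq. (1): keep index pairs (i, j) with d(l_i, l_j) < d_th.
    Near-parallel pairs are skipped to stay away from the degenerate case."""
    pairs = []
    for i, j in combinations(range(len(segments)), 2):
        if pair_distance(segments[i], segments[j]) >= d_th:
            continue
        a1 = np.arctan2(segments[i][3] - segments[i][1], segments[i][2] - segments[i][0])
        a2 = np.arctan2(segments[j][3] - segments[j][1], segments[j][2] - segments[j][0])
        d = abs(a1 - a2) % np.pi               # direction difference modulo pi
        if min(d, np.pi - d) < np.deg2rad(min_angle_deg):
            continue
        pairs.append((i, j))
    return pairs
```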

2.3. Line intersection context feature (LICF)

Now, a new image feature is defined based on the candidate intersecting line pairs. For each line pair, the intersection context of the pair is utilized as an image feature. The intersection context includes the intersecting position and the local texture region centered at that position. The feature is called the "Line Intersection Context Feature (LICF)" in this paper. The newly defined feature contains geometric information as well as photometric information: the former is the positional information of the intersection computed from a line pair, and the latter is the region information of the local image patch centered at the intersection position. Formally, the LICFs are represented by the intersection positions and the corresponding region patches in an image, as follows:

$$C_{L_{pair}} \triangleq \{x_k, R(x_k)\} = \{x_k, R(x_k), l_{p(k)_1}, l_{p(k)_2}\} \tag{3}$$

where k = 1, . . . , #P; x_k denotes the intersection position of the corresponding intersecting line pair L_pair_k (= {l_{p(k)_1}, l_{p(k)_2}}), and R(x_k) denotes the region patch centered at the intersection position x_k. For convenience, a LICF (C_{L_pair_k}) sometimes refers only to the position (x_k) instead of the set of the position and the neighboring region patch (R(x_k)).

Fig. 3 illustrates the model of the LICF and an example, and Fig. 4 additionally compares LICFs with Harris corners by showing examples. The texture regions around LICFs have weaker texture characteristics compared to Harris corners, a typical interest point feature (Schmid et al., 2000; Harris and Stephens, 1988). Harris corners are chosen because they are effective for detecting corners/junctions, as opposed to other interest point detectors, which extract blobs. The Harris corner detector finds the 100 strongest locations in terms of corner response. We can see that the Harris corner detector responds to points on the lines rather than to real junctions (e.g., the table corners). To compare Harris corners and LICFs, eigen-analysis of their neighborhoods is performed (Mikolajczyk et al., 2005). While Harris corners are extracted from corners and junctions with mainly two large eigenvalues, LICFs also include areas whose intersections are covered by coplanar line pairs with wide-ranging eigenvalues. In addition, interest point features, e.g., Harris corners, and LICFs are complementary in detection and matching, so that the unification of both features can produce more informative image features.

Fig. 3. (Left) An example of a LICF. The line pair, colored in magenta and cyan, intersects at the circled position, with an intersection context bounded by the red box. (Middle) The zoomed-in image of the intersection context. (Right) A model of the LICF.

Fig. 4. Comparison of Harris corners and LICFs. Detected features and matching features are marked in black and in blue, respectively. Samples of neighboring texture regions around Harris corners and LICFs are also shown.

The characteristics of LICFs can be summarized as follows: (1) Good localization, owing to derivation from the line pair intersection. (Line features themselves have sub-pixel accuracy, so their intersections also have sub-pixel accuracy; LICFs naturally inherit it.) (2) Fair photometric invariance of the local patch centered at the intersection position, including junctions, corners, lines, and flat areas. (The features are 2D projections of 3D surface patches with photometric information.) (3) Fast line matching, because similarity is measured only over the intersection context instead of the entire line segment context. (While conventional line matching methods use all neighboring context areas along the line segments, the line intersection context compares only the local regions of the intersection constructed by the intersecting lines. The texture area is smaller than in conventional methods, so the matching speed is faster. In addition, the intersection context area has the most discriminative photometric information compared to the other neighborhoods along the points of each line segment. The texture descriptor is explained in detail in the following section on the feature matching stage.)

3. Feature matching

3.1. Matching of LICFs

To match the line intersection context features (LICFs) between different views, feature similarity is defined. In this stage, the LICFs projected from true 3D points need to be matched while rejecting those from false 3D points, which do not exist in the 3D world.


While LICFs projected from true 3D points have a local texture region around the features with similar photometric properties under projective distortion among different camera views, ones projected from false 3D points do not. In this paper, the region information of the intersection context of the LICFs is utilized for matching. The transformation of the local texture region between different views can be modeled as a 2D projective transformation with scale, translation, rotation, and shearing under the assumption of local planarity.

For implementing a region descriptor of LICFs, any local feature descriptor can be used to measure texture similarity under perspective distortion and illumination variation. These include the sum of squared differences (SSD), sum of absolute differences (SAD), normalized cross-correlation (NCC) (Gonzalez and Woods, 2006), color histograms, the scale-invariant feature transform (SIFT) (Lowe, 2004), maximally stable extremal regions (MSER) (Matas et al., 2002), etc. Assuming that the perspective distortion is small in the local area, NCC is sufficient to describe the local texture invariance; it is known to be fast and robust despite moderate illumination change and small perspective distortion. Although NCC is considered adequate for narrow-baseline stereo matching with small scale and rotation changes in local texture regions, in practice NCC has been applied to match moderately wide-baseline stereo pairs, as demonstrated in Section 5, owing to the characteristics of LICFs. In particular, NCC is a good similarity measure for scene modeling with off-the-shelf stereo cameras or fixed multiple cameras. However, matching in wide-baseline camera views with large perspective distortion can be achieved by using a scale- and rotation-invariant version of NCC (Zhao et al., 2006) or by incorporating more advanced local feature descriptors, such as SIFT and MSER, with LICFs, which will be our future work, especially for general visual recognition.

Fig. 6. Degenerate configurations. (Left) A coplanar line lying in the epipolar plane. (Middle) A non-coplanar line lying in the epipolar plane. (Right) A line pair lying in the epipolar plane.

Fig. 5. LICF matching using the epipolar constraint and coplanarity. (Left) Coplanar line pair case. (Right) Non-coplanar line pair case. C and e (C′ and e′) are the camera projection center and the epipole of the first (second) camera, respectively.

For matching LICFs using NCC, the same process as with interest points is adopted (Gonzalez and Woods, 2006). The NCC of LICFs C_{L_pair_i} = {x_i, R(x_i)} and C′_{L′_pair_j} = {x′_j, R′(x′_j)} between different views is represented by

$$\mathrm{NCC}\bigl(C_{L_{pair_i}},\, C'_{L'_{pair_j}}\bigr) \triangleq \mathrm{NCC}(x_i, x'_j) = \frac{1}{N_{ncc}-1}\,\frac{\sum_{N_{ncc}}\bigl(R(x_i)-\overline{R}(x_i)\bigr)\bigl(R'(x'_j)-\overline{R}'(x'_j)\bigr)}{\sigma_{R(x_i)}\,\sigma_{R'(x'_j)}}, \tag{4}$$

where N_ncc denotes the number of pixels of the local region patch of the LICF, and $\overline{R}(x_i)$ ($\sigma_{R(x_i)}$) and $\overline{R}'(x'_j)$ ($\sigma_{R'(x'_j)}$) are the means (standard deviations) of the region patches, respectively. After computing NCC scores from both images, the matches most strongly correlated with each other in both images are selected and represented by

$$M^{init}_{x,x'} = \{x_k, x'_k, L_{pair_k}, L'_{pair_k}\}, \quad k = 1, \ldots, K_{init} \tag{5}$$

where {x_k, L_pair_k} ∈ C_{L_pair} and {x′_k, L′_pair_k} ∈ C′_{L′_pair}, and K_init denotes the number of matching LICFs by NCC.
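Eq. (4) is the standard Pearson-style normalized cross-correlation over the two patches. A direct transcription (our own sketch; extracting the patch around each x_i is assumed to happen elsewhere):

```python
import numpy as np

def ncc(patch_a, patch_b):
    """Eq. (4): normalized cross-correlation of two equal-size gray patches."""
    a = patch_a.astype(float).ravel()
    b = patch_b.astype(float).ravel()
    n = a.size
    sa, sb = a.std(ddof=1), b.std(ddof=1)   # sample standard deviations
    if sa == 0 or sb == 0:                  # flat patch: correlation undefined
        return 0.0
    # 1/(N-1) * sum((R - mean(R)) * (R' - mean(R'))) / (sigma * sigma')
    return float(np.dot(a - a.mean(), b - b.mean()) / ((n - 1) * sa * sb))
```

Mutual best matches, i.e., pairs in which each LICF is the other's strongest correlation in both directions, then form the initial match set of Eq. (5).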

3.2. Matching refinement

Not surprisingly, after the matching process based on NCC, the results contain mismatches because of the limited discriminative power of the similarity measure; therefore, a refinement stage is crucial. While conventional line matching approaches assume known camera geometry, our approach is to estimate the line matching and camera geometry simultaneously by using LICFs. For the 2-view, 3-view, and N-view cases, the camera geometry is represented up to projective transformation by the fundamental matrix, the trifocal tensor, and the camera projection matrices, respectively (Hartley and Zisserman, 2004). In this paper, the results are shown for the 2-view case. Extension to 3-view and N-view is also possible in the same framework.

Fig. 5 presents the geometric relations of a LICF between two different views. One is the configuration of a coplanar line pair, and the other is that of a non-coplanar line pair in 3D. Given two camera views, the relations between cameras and 3D points/lines are constrained by epipolar geometry in 2D image space, and, based on the epipolar geometry, the two configurations can be discriminated. Note that there is a third configuration: when lines intersect at 3D corners/junctions, they intersect in 3D (i.e., they are coplanar), but the area around the intersection point is not planar. This configuration needs to be explicitly handled for wide-baseline views, but it can fall into the coplanar case for narrow-baseline views.

Fig. 8. Comparison study results for the scene "apt". (First row) Images from the first view. (Second row) Images from the second view. In each column, the original test images, matching line segments, matching LICFs, and matching Harris corners are drawn in order. The estimated epipolar constraints are overlaid with red lines.

Fig. 7. LICF matching in the rectified stereo pair case.

Given a coplanar line pair, L1 and L2, the lines intersect at a 3D point X and are located on the same plane (colored in pink). The projections, x and x′, of the 3D intersection point into the two different views, I and I′, are also the intersection points of the projected line pairs, {l1, l2} and {l′1, l′2}, in 2D image space. Since the intersection points x and x′, computed from the projected line pairs, come from the true 3D point X, they meet the epipolar constraint. That is, the 2D intersection point x in the first view is on the epipolar line F^T x′ transferred from the corresponding point x′ in the second view, and x′ is likewise on the epipolar line Fx transferred from x.

On the contrary, the non-coplanar paired lines L1 and L2 do not intersect in 3D. The lines l1 and l2 projected into the first view intersect at a point x, constructing a LICF; likewise, in the second view, the projected lines l′1 and l′2 intersect at x′. Since the two LICFs do not back-project to a common real 3D point, they neither meet the epipolar constraint nor lie on the same epipolar plane. Although such LICFs may be matched by NCC in the initial matching step, the LICFs from non-coplanar line pairs can be eliminated by testing whether they meet the epipolar constraints.

In the refinement stage, based on the LICF correspondences between the two views, the fundamental matrix is estimated and mismatches are removed using RANdom SAmple Consensus (RANSAC) (Torr, 2002; Torr and Murray, 1997). The refinement stage using RANSAC is the same as the conventional one except that the algorithm is applied to matching LICFs instead of matching interest points. To find the fundamental matrix giving the best matches, the fundamental matrix with the maximum number of inliers is selected as the solution. The inliers are the matching LICFs for which the fitting error is within a user-defined threshold E_th (0.5 pixel in our experiments). The fitting error, given a fundamental matrix, is defined by the symmetric transfer error:

$$E_{trans}(x, x') = \frac{1}{N_{trans}} \sum_{i=1}^{N_{trans}} \Bigl( d\bigl(x'_i,\, F x_i\bigr)^2 + d\bigl(x_i,\, F^{\top} x'_i\bigr)^2 \Bigr) \tag{6}$$

where $d(x, y) = \dfrac{y^{\top} x}{\sqrt{y_1^2 + y_2^2}}$ is the distance of the point x from the line y, and N_trans denotes the number of matching LICFs identified as inliers. The refined matching LICFs are represented by

$$M^{refine}_{x,x'} = \{x_k, x'_k, L_{pair_k}, L'_{pair_k}\}, \quad k = 1, \ldots, K_{refine} \tag{7}$$

where {x_k, L_pair_k} ∈ C_{L_pair} and {x′_k, L′_pair_k} ∈ C′_{L′_pair}, and K_refine denotes the number of matching LICFs after the matching refinement.

Fig. 9. Comparison study results for the scene "tcorner". Refer to Fig. 8 for details.
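The fitting error of Eq. (6) is straightforward to transcribe; below is a sketch under the assumption that the matched LICF positions are stored as (N, 2) arrays (names are ours):

```python
import numpy as np

def line_point_distance(x, l):
    """d(x, l): distance of the point x = (u, v) from the homogeneous line l."""
    return abs(l[0] * x[0] + l[1] * x[1] + l[2]) / np.hypot(l[0], l[1])

def symmetric_transfer_error(F, xs, xs_p):
    """Eq. (6): mean symmetric squared point-to-epipolar-line distance."""
    total = 0.0
    for x, xp in zip(xs, xs_p):
        xh = np.array([x[0], x[1], 1.0])     # homogeneous coordinates
        xph = np.array([xp[0], xp[1], 1.0])
        total += (line_point_distance(xp, F @ xh) ** 2 +
                  line_point_distance(x, F.T @ xph) ** 2)
    return total / len(xs)
```

Inside the RANSAC loop, matches with error below E_th (0.5 pixel in the paper's experiments) count as inliers; off-the-shelf estimators such as OpenCV's cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC) implement the same sample-and-score scheme.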

3.3. Line segment matching

Given a matching LICF ({x_k, x′_k, L_pair_k, L′_pair_k}), the matching between the individual line segments ({l_{p(k)_1}, l_{p(k)_2}} and {l′_{p′(k)_1}, l′_{p′(k)_2}}) of the coplanar line pairs ({L_pair_k, L′_pair_k}) must be resolved. To find the match of each line segment from the matched line pair, a sophisticated method based on oriented projective geometry can be used (Werner, 2007; Hartley and Zisserman, 2004) if the epipolar geometry and the matches of the line pair sets are known. However, in this paper, selecting the line segment with the smaller angle difference between the lines of a matched pair is sufficient, assuming a small rotation change between views. This assumption is reasonable when a monocular camera is attached to a mobile platform that moves on the ground, or when stereo cameras are verging toward the same direction to capture depth information of a scene. Moreover, the method is robust even when the fundamental matrix estimation is erroneous.

The corresponding line pairs {l1, l2, l_a, l_b} ∈ M^refine_{x,x′} between the two camera views are assumed to be given. For a line segment l1 in the first image, the matching line segment with index k ∈ {a, b} is found from the matching line pairs in the second image by selecting the line segment with the smaller angle difference using the following equation:

$$k = \arg\min_{j \in \{a, b\}} \bigl\{ \min\bigl(|\theta_1 - \theta_j|,\ |\pi - \theta_1 + \theta_j|\bigr) \bigr\}. \tag{8}$$

The final matching lines are represented as follows:

$$M_{l,l'} = \bigl\{ l^m_k,\, l^{m'}_k \bigr\}, \quad k = 1, \ldots, K_{match} \tag{9}$$

where $\{L_{pair_k}, L'_{pair_k}\} = \{l^m_{k1}, l^m_{k2}, l^{m'}_{k1}, l^{m'}_{k2}\} \in M^{refine}_{x,x'}$, and K_match denotes the number of matching line segments satisfying Eq. (8).
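A sketch of the angle test of Eq. (8), assuming segments stored as (x1, y1, x2, y2) and directions compared modulo π (function names are ours):

```python
import numpy as np

def seg_angle(seg):
    """Orientation of a segment (x1, y1, x2, y2) in radians."""
    return np.arctan2(seg[3] - seg[1], seg[2] - seg[0])

def pick_matching_segment(l1, candidates):
    """Eq. (8): index of the candidate segment whose direction differs
    least from that of l1; line directions are compared modulo pi."""
    th1 = seg_angle(l1)

    def angle_diff(seg):
        d = abs(th1 - seg_angle(seg)) % np.pi
        return min(d, np.pi - d)

    return min(range(len(candidates)), key=lambda j: angle_diff(candidates[j]))
```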

3.4. Degeneracy analysis

The proposed algorithm works for most line pair configurations, depicted in Fig. 5 and called general configurations, but there are exceptional configurations, called degenerate configurations, in which the detection and matching of LICFs and/or line segments are not possible. These degenerate configurations occur when one or two lines of a 3D line pair lie in the epipolar plane, as illustrated in Fig. 6.

In the first degenerate configuration, one line of a coplanar 3D line pair lies in the epipolar plane (Fig. 6, left). When a 3D line L2 lies in the epipolar plane (colored in sky blue) and is coplanar with the other line L1 on a 3D plane (colored in pink), there is no degeneracy in detecting and matching the LICFs x and x′ in both views, because the LICFs are the projections of the real 3D intersection point X. Considering the finite representation of line segments, however, it is impossible to determine the end points of the matching 2D line segments l2 and l′2 using the epipolar constraint, because they are coincident with the epipolar lines F^T x′ and Fx in the two images, respectively. Yet in image applications, knowing only the line equations of matching lines instead of the finite end point locations of matching line segments is still very useful and meaningful.

Fig. 10. Comparison study results for the scene "valbonne". Refer to Fig. 8 for details.

In the second degenerate configuration, one line of a non-coplanar 3D line pair lies in the epipolar plane (Fig. 6, middle). In this case, a line L2 lies in the epipolar plane (colored in sky blue) and does not meet the other line L1 in 3D. The LICFs x and x′ do not correspond to a real 3D point, so the LICFs differ in their photometric information. However, since such LICFs are not matched by photometric invariance, this case is avoided, as with non-coplanar line pairs in general configurations.

The final degenerate configuration involves a (coplanar) 3D line pair whose two lines both lie in the epipolar plane (Fig. 6, right). Since both 3D lines, L1 and L2, and their intersection X lie in the same epipolar plane, their projections x, l1, and l2 (x′, l′1, and l′2) in the first (second) view are on the epipolar lines F^T x′ (Fx). This case cannot be handled by the proposed method because those lines are merged into a single line in the line detection stage. However, line pairs whose relative angle difference is very small are close to this degenerate configuration, which can be avoided by not pairing those lines.

4. Simplification by known camera geometry

Given known camera geometry, a simplified and fast algorithm is further proposed by rectifying the image pairs in advance (Hartley and Zisserman, 2004). It is suitable for real-time practical applications, such as robotics or video surveillance. For fixed stereo camera configurations, such as off-the-shelf stereo cameras and fixed multiple wide-baseline cameras, accurate camera geometry, including the epipolar constraints, can be pre-computed by calibration or manual correspondence, allowing rectified image pairs to be easily produced. When the rectified image pairs are given, the initial matching and the refinement matching are merged into one stage because camera geometry estimation can be omitted. The algorithm developed for rectified stereo pairs with known camera geometry is called "the simplified version," as opposed to the original proposed algorithm, "the full version," which performs line matching and camera geometry estimation at the same time.

Table 1. Quantitative comparison for the fundamental matrix estimation.

                            apt       tcorner   valbonne  kampa      lab1      lab2      table     sink      Average (std. dev.)
Harris
# of detected features      978:966   978:978   978:978   978:978    978:978   978:978   978:978   978:978   –
# of initial matches        299       279       147       100        296       218       404       411       269.25 (110.95)
# of refined matches        202       179       99        29         126       106       229       246       152 (74.38)
Inlier ratio (%)            67.56     64.16     67.35     29.00      42.57     48.62     56.68     59.85     54.47 (13.57)
Transfer error              0.1917    0.1976    0.2957    3.3197     0.3047    0.2510    0.2646    0.2365    0.6327 (1.0865)
Max sampling number         124,394   450,898   118,696   369,879    241,591   189,770   487,076   128,115   263,802 (151,649)

LICF
# of detected lines         166:173   172:192   186:242   419:449    774:668   162:170   146:157   185:207   –
# of detected LICFs         361:362   281:305   266:584   1090:1217  387:334   442:444   288:364   401:451   –
# of initial matches        78        90        52        65         54        94        130       178       92.63 (42.78)
# of refined LICF matches   45        78        42        23         49        53        74        141       63.13 (36.04)
# of refined line matches   53        70        34        34         48        49        67        107       57.75 (23.86)
Inlier ratio (%)            57.70     86.67     80.77     35.39      52.94     56.38     56.92     79.21     63.248 (17.37)
Transfer error              0.2228    0.1535    0.3125    0.8145     0.3221    0.2416    0.2467    0.2575    0.3214 (0.2060)
Max sampling number         49,761    336,294   111,918   251,309    77,871    405,371   352,656   278,592   232,972 (134,970)

Fig. 11. Comparison study results for the scene "kampa". Refer to Fig. 8 for details.

Fig. 12. Results in the poorly textured scene "lab1". Refer to Fig. 8 for details.


Fig. 14. Experimental results and comparison study for the scene "table". (First and third rows) Images and results from the first view. (Second and fourth rows) Images and results from the second view. In each column of the 1st and 2nd rows are the original test images, matching lines when not using known epipolar geometry, and matching lines after rectification, in that order. In each column of the 3rd and 4th rows are matching Harris corners, matching LICFs when not using known epipolar constraints, and matching LICFs after rectification, in that order, with their estimated epipolar constraints.

Fig. 13. Results in the poorly textured scene ‘‘lab2’’. Refer to Fig. 8 for details.



The procedure is as follows. First, stereo image pairs are rectified using the known camera geometry; then LICFs are extracted from the line segments and matched along the known epipolar lines. The LICF matching step is done on the scan line, as illustrated in Fig. 7. For a LICF x in one image, a matching LICF can be found on the same scan line in the other image among the LICFs x′1, x′2, and x′3. Among them, the LICF with the highest NCC is selected as the match. The distance differences between the LICF x in the first view and the LICFs x′1, x′2, and x′3 in the second view correspond to the disparities of the stereo image pair, −d1, d2, and d3, respectively. Matching LICFs with high NCC scores are regarded as non-occluded features, and those with low NCC scores as occluded.
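A sketch of this scan-line matching, reusing the ncc function from the Section 3.1 sketch; the LICF lists, patch arrays, and the row_tol and ncc_min thresholds are all assumptions of ours:

```python
import numpy as np

def match_on_scanlines(licfs1, licfs2, patches1, patches2,
                       row_tol=1.0, ncc_min=0.8):
    """Scan-line LICF matching for a rectified stereo pair.

    licfs1/licfs2: lists of (x, y) LICF positions in the two views;
    patches1/patches2: the corresponding intersection-context patches.
    Returns (i, j, disparity) triples; candidates scoring below ncc_min
    are treated as occluded and dropped."""
    matches = []
    for i, (x1, y1) in enumerate(licfs1):
        best_j, best_score = None, ncc_min
        for j, (x2, y2) in enumerate(licfs2):
            if abs(y1 - y2) > row_tol:   # rectified: epipolar lines are rows
                continue
            score = ncc(patches1[i], patches2[j])  # NCC sketch from Section 3.1
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None:
            matches.append((i, best_j, x1 - licfs2[best_j][0]))  # disparity
    return matches
```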

5. Experimental results

5.1. Comparison studies

First, the proposed method is compared with the conventional interest-point-based matching method using Harris corners, by evaluating the number of (correct) matching features and the accuracy of the estimated fundamental matrix in terms of the symmetric transfer error defined in Eq. (6). For fundamental matrix estimation, the eight-point algorithm is implemented (Hartley, 1997).

Fig. 15. Experimental results and comparison study for the scene "sink". Refer to Fig. 14 for details.

The test image pairs are from richly textured outdoor scenes, because such scenes include both line features and corner features, and they were collected from Internet websites (Lloyd, 2003; Werner, 2007). Lloyd (2003) and Werner (2007) provide narrow-baseline stereo image pairs and relatively wide-baseline ones, respectively.

Overall, the comparison studies show that the proposed method matches line features as correctly as the interest-point-based method does in well-textured regions. Moreover, the proposed method works better in poorly textured regions. More interestingly, correct line matching is observed even in scenes and local regions with large perspective distortion. The proposed method is not intended for line matching in wide-baseline stereo, but it can potentially be extended to those cases.

Fig. 8 shows the results for the scene "apt". The feature matching results for both Harris corners and LICFs are fairly correct and accurate, and both give reasonable fundamental matrix estimates. The difference is that, while Harris corners are detected and accurately matched more often in textured regions, such as the apartments and white cars, LICFs occur at the junctions of the balconies/bricks of the apartment and the parking lot lanes. In the parking lot lanes, we observe matching performance under the large perspective distortion resulting from the close distance between the scene and the cameras. Looking at the details, a mismatch of LICFs occurs at #21, where repetitively textured LICFs were matched on the same epipolar lines, resulting in a mismatch of the related individual line segment #24. Note that the end points of matching line segments are not determined here, so the line equations of the matches should be compared. The figures of our experimental results are best viewed in color and with PDF magnification.

Fig. 9 presents the results for the scene "tcorner". This scene is harder to match than the previous one because of the more repetitive patterns of the window frames and the poor line distribution in the bottom part of the image. In this scene, the accuracy of the fundamental matrix estimated from the Harris matches is slightly better than that from the LICF matches. Some LICFs (#19, #63, and #66) are incorrectly matched, and these errors propagate into the line matching results (#19 and #62), because the actual matching regions are not visible in the other image of the pair, so the features are matched to the most similar available regions. Image margins can be set for safer matching. In general, in the presence of repetitive patterns, interest-point-based matching is more accurate because it reduces the chance of mismatching by detecting more discriminative features with strong photometric information.

For the small set of relatively widely separated image pairs provided in Werner (2007), we obtained correct matching results, although the proposed method is not designed for wide-baseline image matching, even in a scene for which the interest-point-based method fails. In the scene "valbonne," shown in Fig. 10, the feature matching results of both Harris corners and LICFs are accurate, producing slightly different epipolar geometry. The matching results are all correct except the line pairs from the matching LICF #9. Interestingly, LICF #9, which comes from the incorrect line pairs #9 and #10, is not from a coplanar line pair, but was matched because it was projected from a true 3D point. Eliminating this mistaken LICF match does not lead to the failure of epipolar geometry estimation.

The scene "kampa" in Fig. 11 is more challenging because the image pairs have large perspective distortion. The proposed method gives correct matching results with correct and accurate epipolar geometry, while the interest-point-based method fails in this regard. As mentioned earlier, the reason is that the proposed method detects meaningful geometric structures while discarding the less meaningful photometric features. However, the matching NCC scores in wide-baseline stereo matching are low compared to narrow-baseline stereo matching; therefore, sophisticated handling of widely separated matching must be part of future work.

Table 2. Quantitative comparison between the full version and the simplified version in poorly textured scenes.

                           Table        Sink
The full version
# of LICF matches          74           141
# of line matches          67           107
Symmetric transfer error   0.2467       0.2575

The simplified version
# of LICF matches          77           124
# of line matches          64           101
Symmetric transfer error   1.0700e−26   2.2123e−26

Fig. 16. Reconstructed line structure of the scene "table". The figures are the first view image as well as the scenes viewed from the first camera, from the top, and from the side, in that order.

To quantitatively evaluate the performance in the two narrow-baseline scenes, the symmetric transfer error, the inlier ratio, and the maximum sampling number are analyzed (see Table 1). In terms of the symmetric transfer error, the accuracy of the fundamental matrices estimated from the Harris matches and the LICF matches is comparable in both scenes. For the scene "apt", the symmetric transfer error of the Harris matches is slightly smaller (0.1917) than that of the LICF matches (0.2228), but for the scene "tcorner", the result is the opposite (0.1976 for the Harris corner matches, 0.1535 for the LICF matches).

For the inlier ratio, the results of the two matching methods are also comparable. Harris corners are more reliable for matching than LICFs in the scene "apt", but the opposite holds in the scene "tcorner". The inlier ratios of Harris matching are 67.56% and 64.16% for the scenes "apt" and "tcorner", respectively; those of LICF matching are 57.70% and 86.67%.

In terms of the maximum sampling number, LICF matching is slightly better: it needs fewer samples to find the best matches than Harris corner matching does, so it is faster and requires fewer iterations in the refinement. The maximum sampling numbers of Harris matching are 124,394 and 450,898 for the scenes "apt" and "tcorner", respectively; those of LICF matching are 49,761 and 336,294.

5.2. Results on poorly textured scenes

Second, the proposed method is applied to poorly textured scenes to show its line matching performance in real world situations and to demonstrate the accuracy of the simplified version compared to the full version. The image pairs are captured by off-the-shelf cameras, PointGrey Bumblebee stereo cameras (30 Hz, 640 × 480 pixels), attached to a mobile platform. To simulate the unknown-camera-geometry case, general stereo image pairs are captured from the same off-the-shelf stereo camera but at a different time frame for each pair.

The experimental results on poorly textured scenes show that the proposed method can detect and match the most important geometric structures of poorly textured scenes, including tables, chairs, sinks, and electronic appliances, while the interest-point-based method fails to match and recover these geometric structures. In addition, when camera geometry is given, the simplified version is comparable to the full version and runs fast enough for line matching and 3D recovery in robotics applications.

Figs. 12 and 13 are the results of the scenes, ‘‘lab1’’ and ‘‘lab2’’,which contain noticeable scale change and perspective effect due

age as well the scenes viewed from the first camera, from the top, and from the side,

Page 13: Simultaneous line matching and epipolar geometry estimation based on the intersection context of coplanar line pairs

H. Kim, S. Lee / Pattern Recognition Letters 33 (2012) 1349–1363 1361

to forward camera motion into the objects. The experimentalresults show that the proposed method can detect and match mostimportant structures, such as the edges and patterns of the tables,the box lines of the refrigerator, the vertical and horizontal lines ofthe walls, the textures/shadows on the table tops, and the linesof the floor. In those scenes, although the estimation of the

Fig. 17. Reconstructed line structure of the scene ‘‘sink’’. The figures are the first view imaorder.

Fig. 18. Line matching results with respect to varying orientation and distance betweeshown from the stereo camera pairs which are 1.8-m-distant; (3rd and 4th rows) 2.1-m-dcolumn, from top to bottom, the scenes with different orientations between the long edgthat order.

fundamental matrix is not accurate due to the uneven distributionof features, most line segments are correctly matched due tosimultaneous camera geometry estimation and line matching, withthe exception of a few mismatches, like LICF #48 in Fig. 13. Notethat the end points of the matching line segments are determinedby the estimated or known epipolar constraints in this section,

Fig. 16. Reconstructed line structure of the scene ‘‘table’’. The figures are the first view image as well as the scenes viewed from the first camera, from the top, and from the side, in that order.

Fig. 17. Reconstructed line structure of the scene ‘‘sink’’. The figures are the first view image, the scenes viewed from the first camera, from the top, and from the front, in that order.

Fig. 18. Line matching results with respect to varying orientation and distance between objects and cameras. (1st and 2nd rows) The first and second views of the table shown from the stereo camera pairs which are 1.8-m-distant; (3rd and 4th rows) 2.1-m-distant; (5th and 6th rows) 2.4-m-distant; (7th and 8th rows) 2.7-m-distant. In each column, from top to bottom, the scenes with different orientations between the long edge of the table and the stereo camera baseline: 0°, 15°, 30°, 45°, 60°, 75°, and 90°, in that order.

Figs. 14 and 15 are the stereo image pairs that are captured by the off-the-shelf camera and rectified using the pre-calibrated camera parameters. These scenes were acquired in a kitchen designed for demonstrating the real-situation performance of personal robots, especially a robot's ability to serve food and/or drinks to the elderly.

The matching results are very impressive. In the scene ‘‘table’’, shown in Fig. 14, all of the matching lines are correct matches, capturing the most crucial details of the dining table, the chair, and the environment. The scene ‘‘sink’’, shown in Fig. 15, is more complex because it includes many objects, such as a soda fountain, a tray, electronic appliances, and the sink. The overall matching result is quite accurate, except for the mismatches of the matching line pairs {#76, #79} and {#46, #91}. The first comes from a mismatch of LICFs due to similar texture areas, and the latter results from the failure of individual line segment matching based on the relative line angle.
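As an orientation-only illustration, a generic relative-angle test between two line directions can be sketched as follows (Python/NumPy; `relative_line_angle` is a hypothetical helper, and the paper's actual descriptor, e.g. its sign conventions and the reference line it is measured against, may differ):

```python
import numpy as np

def relative_line_angle(d1, d2):
    """Unsigned angle between two 2D line directions, in [0, pi/2].
    Lines are undirected, so antiparallel directions are folded."""
    c = abs(np.dot(d1, d2)) / (np.linalg.norm(d1) * np.linalg.norm(d2))
    return np.arccos(np.clip(c, 0.0, 1.0))
```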

5.2.1. Experimental results of the simplified version

Additionally, the full version, the simplified version, and the interest-point-based method are applied to the scenes in Figs. 14 and 15 to compare their performance. Performance is evaluated in terms of the accuracy of the estimated camera geometry and of the matching results. When the full version and the interest-point-based method are applied, the known camera geometry is blinded and, instead, is estimated within the algorithms. The results show that the full version is slightly better than the simplified version in terms of the number of matched lines and the accuracy of the fundamental matrix. The error in camera calibration seems to explain the difference: in the full version, this error does not affect the line matching results because the camera geometry, i.e., the fundamental matrix, is estimated simultaneously. The full version captures more matching line segments (e.g., #4, #48, #49, and #65 in the scene ‘‘table’’, and #6, #19, #66, and #98 in the scene ‘‘sink’’) than does the simplified version. For the interest-point-based method, the estimation of camera geometry is incorrect because of the poor textures in the scenes. Harris corner matching and fundamental matrix estimation are quite comparable in the scene ‘‘table’’, as shown in Fig. 14, but are inferior in the scene ‘‘sink’’ (Fig. 15) because Harris corners are not detected and matched in the bottom-left part of the image pair.

Table 2 presents the number of matching LICFs, the number of matching lines, and the symmetric transfer error, both for the full version and for the simplified version. Note that the symmetric transfer error in the rectified case is close to zero because the matches are made on the epipolar lines, owing to the rectification process.
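To make the last point concrete: for a rectified pair the fundamental matrix takes, up to scale, a fixed form, so the epipolar line of a point (u, v) is simply the same scanline v′ = v. This is a standard identity (Hartley and Zisserman, 2004), not something derived in the paper:

```latex
F_{\mathrm{rect}} = [e]_{\times} =
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix},
\qquad
F_{\mathrm{rect}} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
= \begin{pmatrix} 0 \\ -1 \\ v \end{pmatrix},
```

where e = (1, 0, 0)ᵀ is the epipole at infinity along the baseline. Any match found on the corresponding scanline therefore has near-zero symmetric transfer error.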

Fig. 19. Results in terms of varying orientations: 0°, 15°, 30°, 45°, 60°, 75°, and 90°. The values shown are the average, minimum, and maximum.

Fig. 20. Results in terms of varying distances: 1.8 m, 2.1 m, 2.4 m, and 2.7 m. The values shown are the average, minimum, and maximum.

Line reconstruction: Furthermore, 3D structures are recovered from the matching line segments using the calibrated camera parameters. Instead of direct line reconstruction, the end points of the matching line segments are reconstructed (Werner and Zisserman, 2002; Hartley and Sturm, 1997). In the scene ‘‘table’’, the rectangular shape of the tabletop and the orthogonal structure between the tabletop and the legs of the table and the chair are modeled properly (Fig. 16). In the scene ‘‘sink’’ (Fig. 17), the T-shape and the rectangular shapes of the sink are reconstructed correctly.
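For the end-point reconstruction itself, a minimal linear (DLT) triangulation sketch is given below (Python/NumPy). This simple variant is only an assumption for illustration; the paper cites the optimal method of Hartley and Sturm (1997), which differs:

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one end point from two views.

    P1, P2 : 3x4 camera projection matrices.
    x1, x2 : (u, v) image coordinates of the matched end points.
    Returns the 3D point in inhomogeneous coordinates."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of the smallest
    # singular value of A.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```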

Speed: The simplified version of the proposed algorithm for rectified stereo image pairs is implemented in C/C++ for fast processing in near real-time. The processing unit consists of a 4 GHz Pentium CPU and 2 GB of RAM. The line feature extraction module runs in 200 ms for both views, and the line matching module, including line pair determination, runs in 350 ms.

For comparison purposes, color-histogram-based matching is implemented. The proposed method (350 ms) is faster than the color-histogram-based method (500 ms), while producing more line matches and more accurate matching results. Although the color-histogram-based method needs more sophisticated stages to be used as a line matcher (Bay et al., 2005), the accuracy and speed improvements of the proposed line matching method are already significant. Note that Bay et al. (2005) report that their implementation takes 8 s on average on a 1.6 GHz Pentium 4, excluding line detection.

5.2.2. Sensitivity analysis under orientation and distance variations

Finally, the matching results of LICFs and segments are presented as the orientation and distance between cameras and objects vary, in order to verify the stability and reliability under those changes. Fig. 18 shows the matching results with respect to orientation and distance variation. The orientation varies from 0° to 90° in steps of 15°, and the distance varies among 1.8 m, 2.1 m, 2.4 m, and 2.7 m. The results show that the matching remains quite even as the orientation and the distance vary.

Figs. 19 and 20 illustrate the numbers of matching LICFs and matching lines with respect to orientation and distance. These figures show that the matching numbers are consistent under orientation and distance variation. However, the standard deviation of the numbers is large because the background clutter is not eliminated and the viewing volume for the table is not carefully controlled.

6. Concluding remarks

In this paper, a novel line matching algorithm based on the line intersection context feature was presented. Experimental results showed that its performance is comparable and complementary to that of interest-point-based matching. The proposed algorithm works well for poorly textured scenes, where interest-point-based matching often fails. Simultaneous line matching and fundamental matrix estimation achieved correct line matching independent of the accuracy of the fundamental matrix estimation. The simplified version for rectified images, using the pre-computed fundamental matrix, also showed comparable results at near real-time speed.

This paper focused on poorly textured indoor scenes for domestic robotics using off-the-shelf stereo cameras. Future work will extend the algorithm to widely separated views, including 3D rotations and translations, by incorporating sophisticated region descriptors, such as SIFT, and possibly a new descriptor utilizing the intersection context for general visual recognition. Also, for tracking, a sequential line matching and 3D scene reconstruction scheme can be developed to handle multiple views.

Acknowledgments

This work was supported in part by the Intelligent Robot Program under the Frontier R&D Initiative, in part by the KORUS-Tech Program (KT-2008-SW-AP-FSO-0004), and in part by the ITRC Program NIPA-2012-(C1090-1221-0008) of MKE, Korea, as well as in part by the WCU Program (R31-10062-0) and PRCP (2011-0018397) of NRF/MEST, Korea.

References

Baillard, C., Schmid, C., Zisserman, A., Fitzgibbon, A., 1999. Automatic line matching and 3D reconstruction of buildings from multiple views. In: ISPRS Conference on Automatic Extraction of GIS Objects from Digital Imagery, IAPRS, vol. 32, Part 3-2W5, pp. 69–80.

Bartoli, A., Sturm, P., 2003. Multiple-view structure and motion from line correspondences. In: ICCV '03: Proceedings of the Ninth IEEE International Conference on Computer Vision, IEEE Computer Society, Washington, DC, USA, p. 207.

Bay, H., Ferrari, V., Van Gool, L., 2005. Wide-baseline stereo matching with line segments. In: CVPR05, pp. I: 329–336.

Bay, H., Ess, A., Neubeck, A., Van Gool, L., 2006. 3D from line segments in two poorly-textured, uncalibrated images. In: 3DPVT06, pp. 496–503.

Criminisi, A., Reid, I., Zisserman, A., 1999. Single view metrology. Int. J. Comput. Vis. 40, 123–148.

Gonzalez, R.C., Woods, R.E., 2006. Digital Image Processing, 3rd ed. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.

Harris, C., Stephens, M., 1988. A combined corner and edge detector. In: Proceedings of the Fourth Alvey Vision Conference, pp. 147–151.

Hartley, R.I., 1997. In defense of the eight-point algorithm. IEEE Trans. Pattern Anal. Mach. Intell. 19, 580–593.

Hartley, R.I., Sturm, P., 1997. Triangulation. Comput. Vis. Image Underst. 68, 146–157.

Hartley, R.I., Zisserman, A., 2004. Multiple View Geometry in Computer Vision, second ed. Cambridge University Press, ISBN: 0521540518.

Kim, G., Hebert, M., Park, S.K., 2007. Preliminary development of a line feature-based object recognition system for textureless indoor objects. In: Lee, S., Hong Suh, I., Sang Kim, M. (Eds.), Recent Progress in Robotics: Viable Robotic Service to Human. Springer-LNCIS, pp. 255–268.

Kim, E., Medioni, G., Lee, S., 2008. Planar patch based 3D environment modeling with stereo camera. In: RO-MAN07, Jeju Island, Korea, pp. 516–521.

Lloyd, B., 2003. Computation of the fundamental matrix. http://www.cs.unc.edu/~blloyd/comp290-089/fmatrix/.

Lowe, D.G., 2004. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60, 91–110.

Matas, J., Chum, O., Martin, U., Pajdla, T., 2002. Robust wide baseline stereo from maximally stable extremal regions. In: Proceedings of the British Machine Vision Conference, London, pp. 384–393.

Micusik, B., Kosecka, J., 2009. Piecewise planar city 3D modeling from street view panoramic sequences. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, pp. 2906–2912.

Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L., 2005. A comparison of affine region detectors. Int. J. Comput. Vis. 65, 43–72.

Mordohai, P., Akbarzadeh, A., Frahm, J.-M., Engels, C., Gallup, D., Merrell, P., Phelps, M., Sinha, S., Talton, B., Wang, L., Yang, Q., Stewenius, H., Yang, R., Welch, G., Towles, H., Nister, D., Pollefeys, M., 2006. Towards urban 3D reconstruction from video. In: 3DPVT, pp. 1–8.

Pollefeys, M., Koch, R., Van Gool, L., 1999. Self-calibration and metric reconstruction in spite of varying and unknown intrinsic camera parameters. Int. J. Comput. Vis. 32, 7–25.

Schmid, C., Zisserman, A., 1997. Automatic line matching across views. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 666–671.

Schmid, C., Mohr, R., Bauckhage, C., 2000. Evaluation of interest point detectors. Int. J. Comput. Vis. 37, 151–172.

Snavely, N., Seitz, S.M., Szeliski, R., 2006. Photo tourism: Exploring photo collections in 3D. In: SIGGRAPH Conference Proceedings. ACM Press, New York, NY, USA, pp. 835–846.

Tang, A.W.K., Ng, T.P., Hung, Y.S., Leung, C.H., 2006. Projective reconstruction from line-correspondences in multiple uncalibrated images. Pattern Recogn. 39, 889–896.

Torr, P.H.S., 2002. A Structure and Motion Toolkit in Matlab: Iterative Adventures in S and M. Technical Report MSR-TR-2002-56, Microsoft Research.

Torr, P.H.S., Murray, D.W., 1997. The development and comparison of robust methods for estimating the fundamental matrix. Int. J. Comput. Vis. 24, 271–300.

Vincent, E., Laganière, R., 2004. Junction matching and fundamental matrix recovery in widely separated views. In: Proceedings of the British Machine Vision Conference, London, UK, pp. 77–86.

Wang, L., Neumann, U., You, S., 2009a. Wide-baseline image matching using line signatures. In: 2009 IEEE 12th International Conference on Computer Vision, IEEE, pp. 1311–1318.

Wang, Z., Wu, F., Hu, Z., 2009b. MSLD: A robust descriptor for line matching. Pattern Recogn. 42, 941–953.

Werner, T., 2007. Lmatch: Matlab toolbox for matching line segments across multiple calibrated images. http://cmp.felk.cvut.cz/~werner/software/lmatch/.

Werner, T., Zisserman, A., 2002. New techniques for automated architectural reconstruction from photographs. In: ECCV '02: Proceedings of the 7th European Conference on Computer Vision — Part II, Springer-Verlag, London, UK, pp. 541–555.

Zhao, F., Huang, Q., Gao, W., 2006. Image matching by multiscale oriented corner correlation. In: ACCV (1), pp. 928–937.

