Page 1: A Local Iterative Refinement Method for Adaptive Support ...psrg.unl.edu/...Method-for-Adaptive-Support-Weight-Stereo-Matching … · A Local Iterative Refinement Method for Adaptive

A Local Iterative Refinement Method for Adaptive Support-Weight Stereo Matching

Eric T. Psota, Jędrzej Kowalczuk, Jay Carlson, and Lance C. Pérez
Department of Electrical Engineering, University of Nebraska, Lincoln, NE, U.S.A.

Abstract— A new stereo matching algorithm is introduced that performs iterative refinement on the results of adaptive support-weight stereo matching. During each iteration of disparity refinement, adaptive support-weights are used by the algorithm to penalize disparity differences within local windows. Analytical results show that the addition of iterative refinement to adaptive support-weight stereo matching does not significantly increase complexity. In addition, this new algorithm does not rely on image segmentation or plane fitting, which are used by the majority of the most accurate stereo matching algorithms. As a result, this algorithm has lower complexity, is more suitable for parallel implementation, and does not force locally planar surfaces within the scene. When compared to other algorithms that do not rely on image segmentation or plane fitting, results show that the new stereo matching algorithm is one of the most accurate listed on the Middlebury performance benchmark.

Keywords: Stereo matching, stereo correspondence, adaptive support weights, iterative stereo.

1. Introduction

For the past thirty years, stereo matching has been one of the most thoroughly researched problems in computer vision, with recent research trends focused on either maximizing speed at the expense of accuracy or maximizing accuracy at the expense of speed. The vast majority of stereo matching algorithms focused on achieving high accuracy use both color-based image segmentation and plane fitting [1]–[3]. Image segmentation significantly enhances stereo matching performance, particularly in areas of low contrast; however, image segmentation methods, such as mean-shift, are generally difficult to implement in parallel hardware [2]. Thus, algorithms that use image segmentation, though highly accurate, are typically much slower than those with parallel implementations.

Regardless of whether they are focused on improving speed or accuracy, most algorithms use some form of window-based matching costs. Since its introduction, the adaptive support-weight (ASW) algorithm for computing the stereo matching cost criterion [4] has become the most extensively used window-based cost criterion in the field. Unlike many prior approaches to window-based stereo matching that attempt to find the optimal shape of the support window [5], [6], the ASW algorithm computes the level of similarity between each pixel in the window and the central pixel of interest using both spatial and color distance. The resulting similarity measure is then used to compute a weighted average of the matching cost.

Adaptive support-weight stereo matching has since been incorporated into many of the most effective stereo matching algorithms. One of the most accurate methods, introduced in 2009 by Yang et al. [2], uses a combination of ASW costs, belief propagation, image segmentation, and plane fitting. In addition, several real-time stereo matching algorithms use the ASW costs to improve accuracy [7], [8].

Here, a new stereo matching algorithm is introduced that performs iterative refinement on the results of ASW stereo matching. This new algorithm uses the set of adaptive support-weights for both cost computation and iterative refinement of the disparity map. The iterative refinement method assigns a cost penalty whenever a pixel’s disparity differs from the disparity of its neighboring pixels, where the level of similarity is measured using adaptive support-weights. Thus, the adaptive support-weights computed for matching costs are reused to determine the magnitude of the iterative refinement cost. The disparity penalty cost is added to the original matching costs at each iteration, and a combination of both dynamic programming (DP) and winner-take-all (WTA) strategies is used to determine reliable matches.

Unlike many of the most accurate stereo matching algorithms, this new algorithm does not require image segmentation or plane fitting. Thus, the new algorithm benefits from the associated reduction in computational complexity, and it is more suitable for parallel implementation. It also does not require that segmented areas within the scene lie on the same plane, an assumption that can be problematic for stereo matching of curved surfaces [9]. When compared to all other algorithms that do not require image segmentation or plane fitting, results show that this new algorithm is among the most accurate listed on the Middlebury performance benchmark [10].

Section 2 introduces the necessary background and notation related to both window-based stereo matching and dynamic programming. Section 3 introduces the local iterative refinement method used by the proposed stereo matching algorithm. Finally, Section 4 presents results comparing this new algorithm to other relevant algorithms.

2. Background

Let IL be the left image and IR be the right image. The color of a pixel p = (m, n) ∈ IL, with row location m ∈ {1, . . . , M} and column location n ∈ {1, . . . , N}, is given by the three-dimensional color intensity vector (Rp, Gp, Bp). Similarly, the color of p′ = (m′, n′) ∈ IR, where m′ ∈


{1, . . . , M′} and n′ ∈ {1, . . . , N′}, is given by (Rp′, Gp′, Bp′). Throughout this paper, indices ending with a prime will always refer to the right image, and indices without a prime will refer to the left image.

The goal of a typical dense stereo matching algorithm is to find a set of correspondences between pixels in the left and right images. In order to limit the search for correspondences to one dimension, images are typically rectified so that epipolar lines are horizontal and corresponding pixels are made to lie on the same line in both images [11]. Throughout this paper, it is assumed that all image pairs have been properly rectified and the total number of rows in each image is M′ = M. It is worth noting that, while much of the analysis presented in this paper is concerned with matching pixels between a single pair of corresponding lines in the images, a dense matching between IL and IR requires that each line m = 1, . . . , M is matched.

A common method for visualizing correspondences between matching pixels is via the disparity image. When the pixel corresponding to p ∈ IL is p′ ∈ IR, the disparity estimate of pixel p in the left image is denoted by EL(p) = (n − n′), and the disparity estimate of pixel p′ in the right image is denoted by ER(p′) = (n − n′).

2.1 Window-Based Stereo Matching

Window-based stereo matching uses information from each pixel’s surrounding neighborhood to estimate correspondences between pixels. A typical window-based matching approach operates under the assumption that surfaces in the two images are locally continuous and approximately parallel to the image planes of both cameras. Using a square window with pixel p in the center, the set of neighboring pixels in the area surrounding pixel p is defined by

Ωp = { l = (m + j, n + k) | j = −ω, . . . , ω and k = −ω, . . . , ω },

where 2ω + 1 is the width of the window that contains p. The cost of matching pixel p to pixel p′ is given by

D(p, p′) = ∑_{l∈Ωp, l′∈Ωp′} d(l, l′),

where D is a matrix that contains the pairwise cost of matching all elements p and p′ on the same row m in the images. In this paper, the two distance metrics of interest are the sum of absolute differences (SAD), given by

dSAD(l, l′) = |Rl − Rl′| + |Gl − Gl′| + |Bl − Bl′|,

and the Euclidean distance, given by

dE(l, l′) = √((Rl − Rl′)² + (Gl − Gl′)² + (Bl − Bl′)²).

For further discussion related to distance metrics, the reader is referred to [12].
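As a toy illustration of the two metrics (not code from the paper; the function names are ours), both can be computed directly over (R, G, B) triples:

```python
# Illustrative sketch of the two pixel distance metrics: the sum of
# absolute differences (SAD) and the Euclidean distance over colors.
import math

def d_sad(c1, c2):
    """Sum of absolute differences between two color vectors."""
    return sum(abs(a - b) for a, b in zip(c1, c2))

def d_euclidean(c1, c2):
    """Euclidean distance between two color vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

# Example: two nearby colors.
print(d_sad((100, 150, 200), (104, 147, 200)))                   # 7
print(round(d_euclidean((100, 150, 200), (104, 147, 200)), 3))   # 5.0
```

Note that SAD is cheaper to evaluate (no square root), which is one reason it is a common choice for the raw matching cost.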

One of the most popular methods for obtaining correspondences from a matching cost matrix D is the winner-take-all (WTA) method, which is essentially an energy minimization scheme that assigns a match for every pixel p in the left image given by

m(p) = argmin_{p′=(m,n′) | n′=1,...,N′} D(p, p′), (1)

and a match for every pixel p′ in the right image given by

m(p′) = argmin_{p=(m,n) | n=1,...,N} D(p, p′). (2)

Another common method for determining matches between pixels is dynamic programming, which is discussed later in Section 2.3.
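The WTA criteria of Equations (1) and (2) amount to row-wise and column-wise minimization of the cost matrix. A minimal sketch, assuming a small precomputed cost matrix indexed as D[n][n′] (helper names are illustrative, not from the paper):

```python
# Winner-take-all matching on a toy cost matrix D, where D[n][n'] is the
# cost of matching left column n to right column n'.

def wta_left(D):
    """For each left pixel n, pick the right column n' minimizing D[n][n']."""
    return [min(range(len(row)), key=row.__getitem__) for row in D]

def wta_right(D):
    """For each right pixel n', pick the left column n minimizing D[n][n']."""
    N, Np = len(D), len(D[0])
    return [min(range(N), key=lambda n: D[n][np_]) for np_ in range(Np)]

D = [[1.0, 0.2, 0.9],
     [0.8, 0.1, 0.7],
     [0.9, 0.6, 0.3]]
print(wta_left(D))   # [1, 1, 2]
print(wta_right(D))  # [1, 1, 2]
```

Because the two minimizations are independent, WTA does not guarantee left-right consistency; the proposed algorithm later exploits exactly this check (Section 3.1).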

2.2 Adaptive Support-Weight Stereo Matching

Adaptive support-weight window-based stereo matching applies weights to each of the pixels surrounding the pixel of interest during matching [4]. Under the Gestalt principle of proximity, a typical surface is locally continuous and has locally consistent color [13]. Consider a pixel l = (m + i, n + j). It is more likely that pixel l is on the same surface as pixel p = (m, n) when the Euclidean distance ∆d(p, l) = √(i² + j²) between them is small. It is also more likely that pixel l is on the same surface as pixel p when the difference in color between them, given by dE(p, l), is small. Note that the CIELab color space is used to compute dE(p, l) in place of the RGB color space because it more accurately reflects human perceptual differences through three-dimensional distance.

In order to make use of the Gestalt principle of proximity for the purposes of stereo matching, ASW window-based stereo matching applies weights to the pixels based upon their Euclidean distance and color difference. The weight applied to a pixel l is given by

w(p, l) = e^(−(dE(p, l)/γc + ∆d(p, l)/γd)), (3)

where the values of γc and γd are chosen empirically. When applied to window-based stereo matching, the weights are used to compute the cost of matching pixel p in the left image to pixel p′ in the right image, given by

D(p, p′) = [ ∑_{l∈Ωp, l′∈Ωp′} w(p, l)w(p′, l′)dE(l, l′) ] / [ ∑_{l∈Ωp, l′∈Ωp′} w(p, l)w(p′, l′) ]. (4)

Once the distance matrix D is computed, it is possible to obtain a set of matches using either the WTA criteria [4] or dynamic programming [7]. The method presented in this paper uses a combination of both WTA and dynamic programming to obtain a set of reliable matches.
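A minimal sketch of Equations (3) and (4), assuming toy windows given as (color, spatial distance) pairs. For brevity, the color distances below use RGB rather than the CIELab space the paper specifies, and the γ defaults are the values quoted later in Section 4; the function names are ours:

```python
# Sketch of the adaptive support-weight (Eq. 3) and the weighted cost
# aggregation (Eq. 4) on toy windows. RGB distances stand in for CIELab.
import math

def asw_weight(color_p, color_l, dist_pl, gamma_c=4.48, gamma_d=12.58):
    """w(p,l) = exp(-(d_E(p,l)/gamma_c + delta_d(p,l)/gamma_d))."""
    d_color = math.sqrt(sum((a - b) ** 2 for a, b in zip(color_p, color_l)))
    return math.exp(-(d_color / gamma_c + dist_pl / gamma_d))

def asw_cost(win_L, win_R, center_L, center_R):
    """Weighted average of color distances over paired window pixels.
    Each window entry is (color, spatial_distance_to_center)."""
    num = den = 0.0
    for (cl, dl), (cr, dr) in zip(win_L, win_R):
        w = asw_weight(center_L, cl, dl) * asw_weight(center_R, cr, dr)
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(cl, cr)))
        num += w * d
        den += w
    return num / den

# A pixel identical in color and position to the center gets weight 1,
# and identical windows produce zero aggregated cost.
print(asw_weight((0, 0, 0), (0, 0, 0), 0.0))  # 1.0
```

The key property is that pixels far from the center, or differing strongly in color, contribute almost nothing to the aggregated cost.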

2.3 Dynamic Programming

The term Dynamic Programming (DP) refers to a process of constructing global solutions to multi-stage decision problems [14]. In stereo matching applications, DP is used to determine an optimal sequence of matches between corresponding rows of pixels in IL and IR. The first step in DP is to aggregate matching costs by creating a matrix D of distances between all pairs of elements in the two rows. Once D is created, DP finds a mapping between the rows of pixels expressed by an ordered set Φ = { φk = (n, n′) | 1 ≤ k ≤ K, 1 ≤ n ≤ N, 1 ≤ n′ ≤ N′ }. Such sets, also known as warping paths in D, are required to meet the following conditions:


1) Boundary constraint: The path starts at D(1, 1) and ends at D(N, N′), that is, φ1 = (1, 1) and φK = (N, N′).

2) Continuity constraint: Any two consecutive pairs φk and φk+1 correspond to adjacent elements of D.

3) Monotonicity constraint: For all consecutive pairs φk = (n, n′) and φk+1 = (n₊, n′₊), both n₊ − n ≥ 0 and n′₊ − n′ ≥ 0.

Among exponentially many paths satisfying the given properties, the particular path of interest is the one associated with minimum global matching cost. To find this path, the DP algorithm constructs an accumulated distance matrix A, where each element A(n, n′) contains the distance accumulated along the minimum-cost path leading to that element. Each element of A is evaluated by adding its associated cost to the smallest cost from the three possible adjacent elements (top, top-left, and left, which follows from properties 2) and 3) above):

A(n, n′) = D(n, n′) + min{ A(n − 1, n′), A(n − 1, n′ − 1), A(n, n′ − 1) },

after initializing A(1, 1) = D(1, 1). Once this matrix has been populated, we find the optimal path through D by starting from A(N, N′) and backtracking to A(1, 1), following the path which minimizes the accumulated distance and abides by the rules established in properties 2) and 3).

In order to encourage smooth matching from dynamic programming, it is common to penalize singularities, i.e., vertical or horizontal moves in A. A cost penalty λ can be incorporated in the formation of the cumulative distance matrix by using

A(n, n′) = D(n, n′) + min{ A(n − 1, n′) + λ, A(n − 1, n′ − 1), A(n, n′ − 1) + λ }. (5)

After DP is complete, a set of continuous matches between the rows of pixels is given in the form of Φ. Finally, this set of continuous matches can be translated into a sequence of disparity estimates given by (n − n′) for all φk = (n, n′) ∈ Φ.
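The penalized recursion of Equation (5) and the backtracking step can be sketched as follows (0-based indices and illustrative names; a minimal sketch, not the authors' implementation):

```python
# Penalized DP over a toy cost matrix D: fill the accumulated matrix A
# per Equation (5), then backtrack from the last element to (0, 0).

def dp_path(D, lam=0.0):
    N, Np = len(D), len(D[0])
    INF = float("inf")
    A = [[INF] * Np for _ in range(N)]
    A[0][0] = D[0][0]
    for n in range(N):
        for np_ in range(Np):
            if n == 0 and np_ == 0:
                continue
            best = min(
                A[n - 1][np_] + lam if n > 0 else INF,            # vertical
                A[n - 1][np_ - 1] if n > 0 and np_ > 0 else INF,  # diagonal
                A[n][np_ - 1] + lam if np_ > 0 else INF,          # horizontal
            )
            A[n][np_] = D[n][np_] + best
    # Backtrack along the minimizing predecessors.
    path, n, np_ = [(N - 1, Np - 1)], N - 1, Np - 1
    while (n, np_) != (0, 0):
        cands = []
        if n > 0:
            cands.append((A[n - 1][np_] + lam, (n - 1, np_)))
        if n > 0 and np_ > 0:
            cands.append((A[n - 1][np_ - 1], (n - 1, np_ - 1)))
        if np_ > 0:
            cands.append((A[n][np_ - 1] + lam, (n, np_ - 1)))
        n, np_ = min(cands)[1]
        path.append((n, np_))
    return path[::-1]

# A cost matrix with a cheap diagonal yields the diagonal warping path.
print(dp_path([[0, 5, 5], [5, 0, 5], [5, 5, 0]], lam=1))
# [(0, 0), (1, 1), (2, 2)]
```

Vertical and horizontal moves (occlusions/singularities) each pay the extra λ, which is what biases the path toward smooth, diagonal matching.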

3. Iterative Refinement of Adaptive Support-Weight Stereo Matching

In this section, a new method for iterative refinement of adaptive support-weight stereo matching is derived. Recall the decision criterion, given by Equation (1), for finding a match for pixel p, given by m(p) = p′. In the ideal case, the resulting match p′ is the candidate with the highest probability of being the correct match, given the two images. Thus,

m(p) = argmax_{p′=(m,n′) | n′=1,...,N′} P(p ↔ p′ | IL, IR) (6)

is a generalization of an ideal matching, where p ↔ p′ denotes a correct correspondence between pixels p and p′.

Using window-based stereo matching, only the windows Ωp and Ωp′ surrounding the pixels p and p′ are used to determine matches. This significantly reduces the complexity of matching by converting a global problem to a local problem. Thus, Equation (6) can be reduced to

m(p) ≈ argmax_{p′=(m,n′) | n′=1,...,N′} P(p ↔ p′ | Ωp, Ωp′). (7)

Using Bayes’ theorem, the probability on the right-hand side of Equation (7) can be reformulated as

P(p ↔ p′ | Ωp, Ωp′) = [ P(Ωp, Ωp′ | p ↔ p′) × P(p ↔ p′) ] / P(Ωp, Ωp′). (8)

To simplify the expression, it is assumed that all Ωp and Ωp′ are independent and equiprobable. Thus, the decision criterion of Equation (7) can be reformulated as

m(p) ≈ argmax_{p′=(m,n′) | n′=1,...,N′} P(Ωp, Ωp′ | p ↔ p′) × P(p ↔ p′). (9)

To reduce the complexity associated with considering every possible Ωp and Ωp′, it is assumed that there exists pairwise independence between all pairs of pixels li, lj ∈ Ωp. Thus, we can reduce Equation (9) to

m(p) ≈ argmax_{p′=(m,n′) | n′=1,...,N′} [ ∏_{l∈Ωp, l′∈Ωp′} P(l, l′ | p ↔ p′) ] × P(p ↔ p′)

     = argmax_{p′=(m,n′) | n′=1,...,N′} ∑_{l∈Ωp, l′∈Ωp′} log( P(l, l′ | p ↔ p′) ) + log( P(p ↔ p′) ). (10)

Using the SAD matching criterion is equivalent to assuming that the distribution of P(l, l′ | p ↔ p′) ≈ e^(−dSAD(l, l′)). By substituting this approximation into Equation (10), we have the commonly known energy minimization given by

m(p) ≈ argmin_{p′=(m,n′) | n′=1,...,N′} ∑_{l∈Ωp, l′∈Ωp′} dSAD(l, l′) − log( P(p ↔ p′) ). (11)

By applying adaptive support-weights to the SAD distances (Equation (4)), the matching criterion is given by

m(p) ≈ argmin_{p′=(m,n′) | n′=1,...,N′} [ ∑_{l∈Ωp, l′∈Ωp′} w(p, l)w(p′, l′)dSAD(l, l′) ] / [ ∑_{l∈Ωp, l′∈Ωp′} w(p, l)w(p′, l′) ] − log( P(p ↔ p′) ). (12)

During the first iteration of matching, there is no estimate of disparity, and log( P(p ↔ p′) ) is a constant for all p and p′. Thus, during the first iteration log( P(p ↔ p′) ) can be ignored because it does not affect the minimization. However, during subsequent iterations, there exists an estimated disparity that can be used to compute log( P(p ↔ p′) ). In the following section, the derivation of log( P(p ↔ p′) ) is given, along with a new method for using this estimate for iterative refinement.


[Figure 1 depicts the processing pipeline: the left and right images IL and IR are used to generate adaptive weights; costs are aggregated into D0; disparity penalties PiL and PiR are computed from the previous iteration's estimates and confidences and added to the costs; the 1st and 2nd minimum costs (MC1st, MC2nd) are found for each image; dynamic programming combines the results; and the loop repeats until the final iteration, after which post-processing produces the dense disparity map.]

Fig. 1: Flow diagram of the iterative ASW stereo matching algorithm.

3.1 Iterative Disparity Refinement

Once the first iteration of stereo matching is complete, an estimated disparity is available for use in subsequent iterations. We will denote the estimated disparity of pixel p after the ith iteration as E^i(p). The set of disparity estimates from the previous iteration i − 1, denoted by E^{i−1}(p), is used to guide matching in the current iteration i. The additive term from Equation (12) is approximated using the distribution P(p ↔ p′) ≈ e^(−δ × |E^{i−1}(p) − (n − n′)|), where the scalar δ is determined empirically. Thus, the cost of deviating from the estimated disparity is −log( P(p ↔ p′) ) ≈ δ × |E^{i−1}(p) − (n − n′)|. However, similar to the way ASW stereo matching computes a weighted average of the color difference cost, a weighted average of the disparity difference penalty is used here.

Along with each disparity estimate E^i(p) is an associated level of confidence assigned to that match, given by C^i(p). Both prior estimates and confidences are used to compute the disparity difference penalties, given by

P^i_L(p, p′) = δ × ∑_{l∈Ωp\m} w(p, l)C^{i−1}_L(l) × | [ ∑_{l∈Ωp\m} w(p, l)C^{i−1}_L(l)E^{i−1}_L(l) ] / [ ∑_{l∈Ωp\m} w(p, l)C^{i−1}_L(l) ] − (n − n′) |, (13)

where the window Ωp\m includes all pixels in Ωp except those on line m. The window Ωp\m is used to discourage incorrect matches from reinforcing themselves, and to reduce the characteristic streaking behavior of DP. The method for computing E^i(p) and C^i(p) is now introduced.
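Under this reading of Equation (13), the penalty compares the confidence-weighted mean of the previous disparities in Ωp\m against the candidate disparity, scaled by the total confidence-weighted support. A hedged sketch with flattened neighbor lists (the names and the flattening are ours, not the paper's):

```python
# Sketch of the disparity difference penalty of Equation (13).
# weights[k]    = w(p, l_k)           for pixels l_k in Omega_p \ m
# confidences[k] = C^{i-1}(l_k)
# est_prev[k]   = E^{i-1}(l_k)
# cand_disp     = (n - n') for the candidate match p'.

def disparity_penalty(weights, confidences, est_prev, cand_disp, delta=0.12):
    wc = [w * c for w, c in zip(weights, confidences)]
    total = sum(wc)
    if total == 0.0:
        return 0.0  # no confident neighbors -> no penalty applied
    mean_disp = sum(v * e for v, e in zip(wc, est_prev)) / total
    return delta * total * abs(mean_disp - cand_disp)

# Agreeing with confident neighbors costs nothing; deviating is penalized.
print(disparity_penalty([1, 1], [1, 1], [5, 5], 5))  # 0.0
```

Candidate disparities that agree with the confidence-weighted neighborhood mean incur no cost, so the refinement only reshapes the cost surface where the neighborhood disagrees with the candidate.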

Multiple factors are used to determine the confidence C^i(p) of a given pixel disparity estimate E^i(p). First, a WTA match mWTA(p) for pixel p in the left image is found along with the associated minimum cost given by

MC1st(p) = min_{p′=(m,n′) | n′=1,...,N′} ( D(p, p′) + P^i_L(p, p′) ). (14)

Note that the disparity difference penalty P^i_L(p, p′) is equal to zero during iteration i = 1. After computing the minimum cost, the 2nd minimum cost is computed using

MC2nd(p) = min_{p′=(m,n′) | n′=1,...,N′, |p′ − mWTA(p)| > 1} ( D(p, p′) + P^i_L(p, p′) ). (15)

Finally, the level of confidence for the match is given by

C^i_L(p) = | ( MC2nd(p) − MC1st(p) ) / MC2nd(p) |. (16)

The confidence measure C^i_L(p) essentially rates the uniqueness of the min-cost match by comparing it to its nearest competitor. Penalties, matches, minimums, and confidences are found similarly for each pixel p′ in the right image.
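Equations (14)–(16) can be sketched on a single row of penalized costs; excluding the winner's immediate neighbors mirrors the constraint |p′ − mWTA(p)| > 1 (names are illustrative):

```python
# Uniqueness-based confidence: best vs. second-best penalized cost.

def confidence(costs):
    """costs[n'] = D(p, p') + P(p, p') for each candidate column n'.
    Returns (winning column, confidence in [0, 1])."""
    best = min(range(len(costs)), key=costs.__getitem__)
    mc1 = costs[best]
    # Second minimum, excluding the winner and its immediate neighbors.
    rivals = [c for n, c in enumerate(costs) if abs(n - best) > 1]
    if not rivals:
        return best, 0.0
    mc2 = min(rivals)
    return best, abs((mc2 - mc1) / mc2)

match, conf = confidence([0.9, 0.2, 0.8, 1.0])
print(match, round(conf, 2))  # 1 0.8
```

A sharp, isolated minimum yields a confidence near 1, while a flat cost row (an ambiguous match) yields a confidence near 0.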

After establishing the initial confidence for each pixel based upon the uniqueness of the WTA matches, the cost matrices are added together element-by-element, D^i = D^0 + P^i_L + P^i_R, and DP is performed on D^i. The following steps are then taken to derive matching estimates and refine the confidence. For each p:

1) If the match between p and p′ corresponds to a diagonal move in the optimal DP path, and mWTA(mWTA(p)) − p = 0 (i.e., matches are consistent), set E^i(p) = (n − n′) and leave the confidence C^i(p) unchanged.

2) If |mWTA(mWTA(p)) − p| ≤ 1 and C^i(p) > 0.25, keep E^i(p) unchanged and set C^i(p) = C^i(p)².

3) If neither 1) nor 2) is satisfied, set E^i(p) = ∅ and its confidence C^i(p) = 0.0.

The first condition corresponds to an agreement between the decisions made by DP and WTA. Naturally, such correspondences should be considered highly reliable. The second condition corresponds to reliable matches from WTA that did not appear on the optimal DP path. These matches are penalized by squaring the value of their confidence, which resides between 0.0 and 1.0. Thus, a confidence of 1.0 is not penalized, while a confidence of 0.25 is changed to 0.0625. This helps to preserve the confidence of highly reliable matches that DP does not find. The third condition corresponds to a situation where p has an unreliable match. In this case, the confidence is set to zero and the disparity estimate is set to ∅. Figure 1 shows a flow diagram of the iterative ASW algorithm.
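The three rules above can be sketched as a small dispatch function; dp_diag and lr_check are illustrative stand-ins for the DP-path test and the left-right consistency check, and ∅ is encoded as None:

```python
# Three-rule estimate/confidence update from Section 3.1.
# dp_diag:  does (p, p') lie on a diagonal move of the optimal DP path?
# lr_check: |m_WTA(m_WTA(p)) - p|, the left-right consistency distance.

def update(dp_diag, lr_check, disparity, conf):
    if dp_diag and lr_check == 0:
        return disparity, conf           # rule 1: DP and WTA agree
    if lr_check <= 1 and conf > 0.25:
        return disparity, conf ** 2      # rule 2: reliable WTA, off DP path
    return None, 0.0                     # rule 3: unreliable match

print(update(True, 0, 7, 0.9))   # (7, 0.9)
print(update(False, 1, 7, 0.5))  # (7, 0.25)
print(update(False, 3, 7, 0.5))  # (None, 0.0)
```

Squaring in rule 2 leaves a confidence of 1.0 untouched while rapidly discounting marginal matches, exactly as described in the text.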

3.2 Adaptive Lambda

It should be noted that an adaptive λ is used during dynamic programming. The value of the adaptive lambda takes into account


Table 1: The number of estimated points and the associated error rate of iterative ASW stereo matching (results shown in Figure 2).

Iteration         | 1      | 2      | 4      | 8      | Final
Points with E ≠ ∅ | 68083  | 74808  | 78792  | 80181  | 85438
% of Errors       | 1.46%  | 1.30%  | 0.99%  | 0.83%  | 0.85%

knowledge that depth discontinuities in the scene are often associated with a high degree of pixel intensity shifts. Thus, adaptive lambda is given by

λL(n) = λ × max{ dE(p−, p+), 30 } / max_{n=2,...,N−1}( max{ dE(p−, p+), 30 } ), (17)

where dE(p−, p+) is the Euclidean distance between the colors of the pixels to the left and right of p = (m, n). The adaptive lambda for the right image, denoted λR, is computed similarly. Finally, the adaptive lambdas are incorporated into the cumulative distance matrix given by Equation (5) by using A(n − 1, n′) + λL(n) and A(n, n′ − 1) + λR(n′).
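A sketch of Equation (17) over a single row of colors; the floor value 30 and the default λ = 7.21 follow the text, while the function name and list-based representation are ours:

```python
# Adaptive lambda (Eq. 17): scale the base DP penalty by the (floored)
# local color gradient, normalized by the maximum gradient along the row.
import math

def adaptive_lambda(row_colors, lam=7.21, floor=30.0):
    """row_colors[n] = (R, G, B); returns lambda_L(n) for n = 1..N-2."""
    def d_e(c1, c2):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))
    grads = [max(d_e(row_colors[n - 1], row_colors[n + 1]), floor)
             for n in range(1, len(row_colors) - 1)]
    peak = max(grads)
    return [lam * g / peak for g in grads]
```

On a uniform row every gradient hits the floor, so the penalty stays at its full value λ everywhere; near a strong color edge the local value dominates the normalization, allowing cheaper vertical/horizontal moves where discontinuities are likely.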

3.3 Post-Processing

Finally, in order to produce a dense disparity map, some post-processing is used. For all matches that are designated an estimate of E(p) = ∅, neighbors within a radius of ω = 2 and a specific SAD color distance threshold are used to fill in these empty estimates. The color distance threshold is increased until all empty estimates are filled. The resulting dense disparity map is then passed through 10 rounds of 3 × 3 median filtering to produce the final output.
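The filtering stage can be sketched as below; the color-guided hole filling is omitted for brevity, so this shows only the repeated 3 × 3 median filter (an illustrative sketch, not the authors' code):

```python
# Repeated 3x3 median filtering of a disparity map stored as a list of
# lists. Border pixels are left unchanged in this simplified sketch.

def median_filter_3x3(disp, rounds=1):
    M, N = len(disp), len(disp[0])
    for _ in range(rounds):
        out = [row[:] for row in disp]
        for m in range(1, M - 1):
            for n in range(1, N - 1):
                block = sorted(disp[m + j][n + k]
                               for j in (-1, 0, 1) for k in (-1, 0, 1))
                out[m][n] = block[4]  # median of the 9 window values
        disp = out
    return disp

# A single outlier is removed while the surrounding disparities survive.
noisy = [[5, 5, 5], [5, 99, 5], [5, 5, 5]]
print(median_filter_3x3(noisy))  # [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
```

Median filtering is chosen over averaging here because it removes isolated outliers without blurring depth discontinuities.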

3.4 Complexity

The computational complexity of iterative adaptive support-weight stereo matching can be determined by considering the following five steps: 1) generating adaptive support-weights, 2) computing matching costs, 3) generating disparity penalties, 4) performing dynamic programming, and 5) finding WTA matches. Let the search space for candidate disparities be limited to some range R < min(N, N′) of pixels, and recall that ω is the window radius, M is the number of rows in both images, and N and N′ are the number of columns in the left and right images, respectively.

The complexity of generating adaptive support-weights and disparity penalties is O(MNN′ω²), the complexity of computing matching costs is O(MNRN′ω²), and the complexity of both dynamic programming and WTA is O(MNR). Therefore, the overall complexity of the first iteration is bounded by O(MNRN′ω²), whereas further iterations, which do not require the computation of matching costs, are bounded by O(MNN′ω²). Thus, the complexity associated with each additional iteration is small when compared to the complexity of the first iteration.

Fig. 2: Disparity images on Tsukuba: (a) 1st iteration, (b) 2nd iteration, (c) 4th iteration, (d) 8th iteration, and (e) final output.

4. Results

The results presented in this section were obtained using iterative adaptive support-weight stereo matching with the parameters ω = 9, γc = 4.48, γd = 12.58, λ = 7.21, and δ = 0.12. The four image sets used in this section, along with their ground truth disparities, are available on the Middlebury College stereo website [10] [12] [15]: http://vision.middlebury.edu/stereo/.

In Figure 2, results are presented to illustrate the effect of iterative processing on the Tsukuba image set. The error rate of the disparity estimates is given in Table 1 after iterations 1, 2, 4, and 8, and after post-processing. With each successive iteration, the number of valid disparity estimates (with E ≠ ∅) increases, and the percentage of errors decreases. Improvements in the disparity map can be seen as iterations increase. For example, the disparity significantly improves in the area surrounding the camera lens in the background, and in the area to the right of the sculpture head (the original image can be seen in Figure 4).

Results given in Figure 3 show that the percentage of estimates and errors nearly converges after 8 iterations for all four image pairs. Thus, only 8 iterations were needed to produce the results given in Figure 4. The new algorithm performs particularly well on the Tsukuba, Venus, and Cones image sets by preserving sharp depth discontinuities and effectively eliminating outlying clusters of incorrect matches. In fact, the new algorithm produces the most accurate disparity image for Tsukuba when compared to all other algorithms listed on the Middlebury website. However, like most algorithms that do not


Fig. 3: Percentage of estimates and errors using Iterative ASW. (Left plot: % of pixels estimated, ranging from 0.65 to 0.9 over iterations 2–18; right plot: % of errors, ranging from 0 to 5; curves shown for Tsukuba, Venus, Teddy, and Cones.)

incorporate some form of image segmentation, the algorithm suffers in the white, untextured area to the right of the stuffed bear in Teddy. The algorithm also produces poor accuracy on the floor, where the highly slanted surface challenges many window-based algorithms. Large, planar surfaces such as these benefit greatly from image segmentation and plane fitting.

Our results are presented alongside the results of other stereo matching algorithms that do not require image segmentation in Table 2. When compared to the original ASW algorithm of [4], the results improve significantly for all four images. It is also worth noting that results for the original ASW algorithm were obtained using 35 × 35 windows, whereas the iterative ASW results were obtained using 19 × 19 windows. Iterative refinement allows information to propagate beyond the range of the window radius, allowing for smaller window sizes. This window size reduction reduces the complexity of computing the raw matching cost by a factor of ≈ 3.4.

The results of geodesic support-weight stereo matching [16] are also given in Table 2, showing significant improvement over ASW stereo matching for most images. By using geodesic distances, this method is able to essentially perform local image segmentation within windows to more effectively isolate pixels that belong to the same smooth surface as the central pixel of interest. While this method can be considered a local algorithm, a significant penalty is introduced in terms of complexity, as the geodesic weights cannot be efficiently computed in parallel. Along with the geodesic support-weight stereo matching algorithm, the new iterative ASW algorithm is one of the two most accurate stereo matching algorithms that do not require image segmentation or plane fitting.

5. Conclusion

A new stereo matching algorithm has been introduced that performs iterative refinement of the results of adaptive support-weight stereo matching. The new iterative ASW stereo matching algorithm presented here was constructed using a probabilistic framework and a series of approximations of the matching cost distributions. It has been shown that the proposed algorithm adds a negligible amount of complexity to the ASW stereo matching algorithm. In fact, by incorporating iterative processing, it is possible to achieve improved accuracy while also reducing the size of the support windows, thus significantly reducing the complexity of the cost computations. Finally, it has been shown that the new algorithm provides exceptional accuracy when compared to similar algorithms that do not rely on image segmentation or plane fitting.

References

[1] A. Klaus, M. Sormann, and K. Karner, “Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure,” in Pattern Recognition, 2006. ICPR 2006. 18th International Conference on, vol. 3, pp. 15–18, 2006.

[2] Q. Yang, L. Wang, R. Yang, H. Stewenius, and D. Nister, “Stereo matching with color-weighted correlation, hierarchical belief propagation, and occlusion handling,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, pp. 492–504, March 2009.

[3] Z.-F. Wang and Z.-G. Zheng, “A region based stereo matching algorithm using cooperative optimization,” in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pp. 1–8, 2008.

[4] K.-J. Yoon and I. S. Kweon, “Adaptive support-weight approach for correspondence search,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 28, pp. 650–656, April 2006.

[5] A. Fusiello, V. Roberto, and E. Trucco, “Efficient stereo with multiple windowing,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 858–863, June 1997.

[6] S. B. Kang, R. Szeliski, and J. Chai, “Handling occlusions in dense multi-view stereo,” in Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, vol. 1, pp. I-103–I-110, 2001.

[7] L. Wang, M. Liao, M. Gong, R. Yang, and D. Nister, “High-quality real-time stereo using adaptive cost aggregation and dynamic programming,” in 3DPVT ’06: Proceedings of the Third International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT’06), (Washington, DC, USA), pp. 798–805, IEEE Computer Society, 2006.

[8] W. Yu, T. Chen, F. Franchetti, and J. C. Hoe, “High performance stereo vision designed for massively data parallel platforms,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 20, pp. 1509–1519, November 2010.

[9] A. Banno and K. Ikeuchi, “Disparity map refinement and 3d surface smoothing via directed anisotropic diffusion,” in Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Conference on, pp. 1870–1877, October 2009.

[10] D. Scharstein and R. Szeliski, “Middlebury stereo vision research page,” <http://vision.middlebury.edu/stereo/eval/>.

[11] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, 2nd ed., 2003.

[12] D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” International Journal of Computer Vision, pp. 7–42, 2002.

[13] D. Marr, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. W.H. Freeman, 1982.

[14] R. E. Bellman, “The theory of dynamic programming,” Bulletin of the American Mathematical Society, vol. 60, no. 6, pp. 503–515, 1954.

[15] D. Scharstein and R. Szeliski, “High-accuracy stereo depth maps using structured light,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 195–202, June 2003.

[16] A. Hosni, M. Bleyer, M. Gelautz, and C. Rhemann, “Local stereo matching using geodesic support weights,” in 16th IEEE International Conference on Image Processing, pp. 2093–2096, November 2009.

[17] K.-J. Yoon and I. S. Kweon, “Stereo matching with the distinctive similarity measure,” in Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pp. 1–7, October 2007.

[18] Z. Gu, X. Su, Y. Liu, and Q. Zhang, “Local stereo matching with adaptive support-weight, rank transform and disparity calibration,” Pattern Recognition Letters, vol. 29, no. 9, pp. 1230–1235, 2008.


Table 2: Results on the Middlebury test set, as measured by the percent of pixels in the dense disparity map with absolute error greater than 1 and 2. Rank is given in terms of overall performance on non-occluded (NonOcc) pixels, and bold entries denote a #1 ranking among all algorithms listed on the Middlebury stereo vision benchmark.

Method                  Threshold  Rank  |----- Tsukuba ----|------ Venus -----|------ Teddy -----|------ Cones -----|
                                          NonOcc  All  Disc  NonOcc  All  Disc  NonOcc  All  Disc  NonOcc  All  Disc
Iterative ASW           Error > 1   18     0.85  1.28  4.59    0.35 0.86  4.53    7.60 14.5  17.3    3.20 9.36  8.49
                        Error > 2   16     0.60  0.99  3.26    0.17 0.49  2.31    4.77 9.23  10.7    2.10 7.02  6.04
Geodesic Support [16]   Error > 1   16     1.45  1.83  7.71    0.14 0.26  1.90    6.88 13.2  16.1    2.94 8.89  8.32
                        Error > 2   21     1.25  1.61  6.65    0.13 0.22  1.87    3.76 7.27  9.38    2.32 7.37  6.78
Dist. Sim. Meas. [17]   Error > 1   32     1.21  1.75  6.39    0.35 0.69  2.63    7.45 13.0  18.1    3.91 9.91  8.32
                        Error > 2   20     0.92  1.33  4.90    0.19 0.37  1.75    4.89 7.91  11.7    1.95 7.16  5.44
Adapt. Disp. Cal. [18]  Error > 1   23     1.19  1.42  6.15    0.23 0.34  2.50    7.80 13.6  17.3    3.62 9.33  9.72
                        Error > 2   35     1.03  1.23  5.29    0.20 0.28  2.46    5.09 9.55  11.7    2.71 8.02  7.73
BP + Dir. Diff. [9]     Error > 1   35     2.90  4.47  15.1    0.65 1.20  4.52    5.07 14.7  15.7    2.94 12.6  7.50
                        Error > 2   34     1.61  2.89  8.58    0.29 0.69  2.50    3.05 12.6  10.0    2.26 11.5  5.60
Adapt. Supp.-Wgt. [4]   Error > 1   43     1.38  1.85  6.90    0.71 1.19  6.13    7.88 13.3  18.6    3.97 9.79  8.26
                        Error > 2   41     1.08  1.41  5.40    0.51 0.82  5.06    7.50 11.8  11.8    2.26 7.16  5.20

Fig. 4: Results of iterative ASW stereo matching on the Tsukuba, Venus, Teddy, and Cones images from the Middlebury stereo image set. (Columns, left to right: Left Image, Ground Truth Disparity, Our Result, Errors.)
