1
Real-time Stereo Matching on CUDA using an Iterative Refinement Method for Adaptive Support-Weight Correspondences
IEEE Transactions on Circuits and Systems for Video Technology, 2011
University of Nebraska-Lincoln
Jedrzej Kowalczuk, Eric T. Psota, Lance C. Pérez
2
Outline
• Introduction
• Related work
• Iterative model
• Implementation on parallel hardware
• Result
• Conclusion
3
Introduction
• A novel real-time stereo matching method is presented, using
  ▫ a two-pass approximation of adaptive support-weight aggregation.
  ▫ a low-complexity iterative disparity refinement technique.
• The refinement technique is constructed using a probabilistic framework.
4
Introduction
• The two-pass method
  ▫ produces an accurate approximation of the support weights.
  ▫ reduces the complexity of aggregation.
• The method has been implemented on massively parallel hardware using the CUDA computing engine.
5
Introduction
• In this paper, a real-time stereo matching method is introduced, using
  ▫ window-based cost aggregation.
  ▫ a low-complexity iterative technique implemented on CUDA.
6
Introduction
• Many real-time methods focus on reducing complexity at the expense of accuracy.
• The proposed approach takes full advantage of the GTX 580's computing capabilities to produce a highly accurate stereo matching method.
7
Outline
• Introduction
• Related work
• Iterative model
• Implementation on parallel hardware
• Result
• Conclusion
8
Related work
• Adaptive support-weight (ASW)
  ▫ mimics the process of visual grouping in the human visual system (HVS).
  ▫ assigns a weight that decreases as the geometric distance between p and q increases.
  ▫ relies on typical scene surfaces having locally consistent color.
9
Adaptive Support-Weight
[equations not recoverable from the extracted slide]
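The adaptive support weight is commonly written in the exponential form introduced by Yoon and Kweon, w(p, q) = exp(-(Δc/γc + Δg/γg)), where Δc is the color difference and Δg the geometric distance between pixels p and q. The sketch below assumes that form; the function name `support_weight` and the use of Euclidean distance for both terms are illustrative choices, not taken from the slides.

```python
import math

def support_weight(color_p, color_q, pos_p, pos_q, gamma_c, gamma_g):
    """Assumed ASW form: w(p, q) = exp(-(dc / gamma_c + dg / gamma_g))."""
    dc = math.dist(color_p, color_q)  # color difference (Euclidean, assumed)
    dg = math.dist(pos_p, pos_q)      # geometric distance in pixels
    return math.exp(-(dc / gamma_c + dg / gamma_g))

# Same color, growing distance: the weight falls off with distance.
w_near = support_weight((100, 100, 100), (100, 100, 100), (0, 0), (0, 1), 30.91, 28.21)
w_far = support_weight((100, 100, 100), (100, 100, 100), (0, 0), (0, 5), 30.91, 28.21)
```

A pixel compared with itself receives weight 1, and the weight shrinks as either the color difference or the distance grows.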
10
Adaptive Support-Weight
• The complexity of ASW makes it unsuitable for cost aggregation in real-time applications.
• It is necessary to reduce the complexity of raw adaptive support-weight cost aggregation.
  ▫ two-pass adaptive support weights [21]
  ▫ approximated joint bilateral filtering [22]
  ▫ exponential step-size adaptive weights [9]
  ▫ cross-based support weight [11]
11
Two-pass Adaptive Support-Weight
• Instead of aggregating over a square window, the two-pass approach approximates ASW by performing cost aggregation along the vertical and then the horizontal direction.
• Complexity is reduced from O(n²) to O(n).
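The two-pass idea can be sketched as two one-dimensional weighted averages, first down each column and then along each row. This is a simplified illustration, not the authors' kernel: it assumes a grayscale image, a single disparity slice of the cost volume, weights taken from the reference image only, and normalization by the weight sum. The names `weight`, `aggregate_1d`, and `two_pass_aggregate` are hypothetical. For a window of radius r, each pixel reads 2(2r + 1) samples instead of (2r + 1)².

```python
import math

def weight(img, p, q, gamma_c, gamma_g):
    """Assumed exponential support weight between pixels p and q."""
    (py, px), (qy, qx) = p, q
    dc = abs(img[py][px] - img[qy][qx])   # intensity difference (grayscale)
    dg = math.hypot(py - qy, px - qx)     # geometric distance
    return math.exp(-(dc / gamma_c + dg / gamma_g))

def aggregate_1d(img, cost, radius, gamma_c, gamma_g, vertical):
    """Weighted average of costs along one axis, normalized by the weights."""
    h, w = len(cost), len(cost[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            num = den = 0.0
            for k in range(-radius, radius + 1):
                qy, qx = (y + k, x) if vertical else (y, x + k)
                if 0 <= qy < h and 0 <= qx < w:   # skip samples outside the image
                    wt = weight(img, (y, x), (qy, qx), gamma_c, gamma_g)
                    num += wt * cost[qy][qx]
                    den += wt
            out[y][x] = num / den                 # den >= 1 because k = 0 is included
    return out

def two_pass_aggregate(img, cost, radius, gamma_c=30.91, gamma_g=28.21):
    """Vertical pass followed by a horizontal pass over its output."""
    vert = aggregate_1d(img, cost, radius, gamma_c, gamma_g, vertical=True)
    return aggregate_1d(img, vert, radius, gamma_c, gamma_g, vertical=False)
```

The default γ values are the aggregation parameters quoted in the Result slides; whether they suit grayscale intensities is not something the slides state.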
12
Two-pass Adaptive Support-Weight
• The two-pass approach fails to accurately approximate the support weights under certain conditions.
13
Compare the Four Modifications
Two-pass
Bilateral Filtering
ESAW
Cross-based
14
Outline
• Introduction
• Related work
• Iterative model
• Implementation on parallel hardware
• Result
• Conclusion
15
Flow Diagram
16
Iterative model
• Goal: improve the accuracy of adaptive support-weight stereo matching.
• Let [symbol lost in extraction] denote a probabilistic event.
  ▫ [equation not recoverable]
17
Iterative model
• Bayes' theorem
18
Iterative model
• Stereo matching is performed using an additive distance metric, arbitrarily denoted by δ(q, q̄).
  ▫ [equation not recoverable]
• [equation not recoverable]
19
Iterative model
• [equation not recoverable]
20
Iterative Disparity Refinement
• Let D_p^i be the disparity estimate for pixel p obtained in the i-th iteration of matching.
• Let F_p^i express the confidence level associated with the disparity estimate of pixel p.
• [equation not recoverable]
21
Iterative Disparity Refinement
• Penalty function
22
Iterative Disparity Refinement
• After the matching costs are computed, the minimum-cost matches are found for both the reference and target images using the winner-take-all (WTA) decision criterion.
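Winner-take-all selection simply picks, for each pixel, the disparity whose aggregated cost is smallest. A minimal sketch, assuming the costs for one pixel are stored as a list indexed by disparity; the function names are hypothetical, and the same routine would be applied to the cost volumes of both the reference and target images.

```python
def winner_take_all(costs):
    """Return the disparity index with the minimum cost (first one on ties)."""
    return min(range(len(costs)), key=costs.__getitem__)

def wta_disparity_row(cost_row):
    """Apply WTA to every pixel of a scanline; cost_row[x][d] is the cost."""
    return [winner_take_all(costs) for costs in cost_row]
```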
23
Iterative Disparity Refinement
• If p̄ = m(p) and p’ = m(p̄), then
  ▫ disparity d(p, p̄) is assigned to the reference disparity map.
  ▫ disparity d(p’, p̄) is assigned to the target disparity map.
• If |d(p, p̄) - d(p’, p̄)| > 1, then the confidence F_p^i is set to zero.
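The consistency test above can be sketched for a single scanline. The sketch assumes that a reference pixel at column x with disparity d matches the target pixel at column x - d (reference image on the left), and it treats matches that fall outside the image as inconsistent; both are assumptions, as is the name `cross_check`. It returns a boolean mask, where False marks the pixels whose confidence would be set to zero.

```python
def cross_check(disp_ref, disp_tgt, max_diff=1):
    """Flag reference pixels whose disparity agrees with the target map."""
    width = len(disp_ref)
    consistent = []
    for x, d in enumerate(disp_ref):
        x_bar = x - d  # assumed location of the matched target pixel
        if 0 <= x_bar < width:
            # Compare d(p, p_bar) with the disparity the target assigns to p_bar.
            consistent.append(abs(d - disp_tgt[x_bar]) <= max_diff)
        else:
            consistent.append(False)  # match outside the image (assumed invalid)
    return consistent

mask = cross_check([0, 1, 1, 3], [1, 1, 0, 0])
```

In this example only the last pixel fails: its disparity of 3 differs by 2 from the disparity stored at its matched target pixel.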
24
Outline
• Introduction
• Related work
• Iterative model
• Implementation on parallel hardware
  ▫ CUDA execution model
  ▫ stereo matching on CUDA
  ▫ complexity and runtime distribution
• Result
• Conclusion
25
Flow Diagram
26
CUDA execution model
• A block of threads is an abstract representation of a multiprocessor and is capable of performing operations in parallel.
  ▫ The threads are executed on the graphics device equipped with a GPU.
  ▫ At runtime, each block of threads gets mapped to a single multiprocessor on the device.
27
CUDA execution model
• The implementation of the proposed method utilizes the NVIDIA GeForce GTX 580 GPU computing processor, equipped with 512 CUDA cores.
• The device code is encapsulated in special functions called kernels, which are invoked by the host and executed in parallel by multiple threads.
28
Stereo Matching on CUDA
• The kernels are designed such that each thread within a block is responsible for computing the matching cost for a single pair of pixels.
• This granularity of computation allows the threads in each warp to take advantage of memory coalescing.
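Memory coalescing occurs when the threads of a warp read consecutive addresses, so the hardware can serve them with few memory transactions. The Python sketch below only models the index arithmetic, not a CUDA kernel: it assumes a row-major image and a hypothetical mapping in which consecutive threads handle consecutive pixels. The names `pixel_for_thread` and `warp_addresses` are illustrative.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs

def pixel_for_thread(block_idx, thread_idx, block_dim, width):
    """Map a (block, thread) pair to pixel (y, x) of a row-major image."""
    linear = block_idx * block_dim + thread_idx
    return divmod(linear, width)

def warp_addresses(block_idx, block_dim, width, warp=0):
    """Element offsets read by the 32 threads of one warp in a block."""
    first = warp * WARP_SIZE
    offsets = []
    for t in range(first, first + WARP_SIZE):
        y, x = pixel_for_thread(block_idx, t, block_dim, width)
        offsets.append(y * width + x)  # row-major offset of the pixel
    return offsets
```

With this mapping, every warp touches 32 consecutive offsets, which is the access pattern coalescing rewards.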
29
Stereo Matching on CUDA
30
31
32
33
Complexity and Runtime Distribution
• Complexity of computing the matching cost volume is O(mnwr/s).
• Complexity of iterative refinement is O(mnwk/s).
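The two bounds share the factor mnw/s, so, taking the expressions at face value, the cost of refinement relative to the cost volume depends only on k and r. The slide does not define the symbols; reading k as the number of refinement iterations and r as the disparity range is an assumption.

```latex
\frac{mnwk/s}{mnwr/s} = \frac{k}{r}
```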
34
Percentages of the total execution time
35
Outline
• Introduction
• Related work
• Iterative model
• Implementation on parallel hardware
• Result
• Conclusion
36
Result
• γc = 30.91 and γg = 28.21 were used for matching cost aggregation.
• γc = 10.94 and γg = 118.78 were used for iterative disparity refinement, and the disparity penalty was set to α = 0.085.
37
Result
38
Result
39
40
41
42
Outline
• Introduction
• Related work
• Iterative model
• Implementation on parallel hardware
• Result
• Conclusion
43
Conclusion
• The refinement technique iteratively improves the accuracy of the disparity map and typically converges after only six iterations.
• The added complexity associated with iterative refinement is shown both analytically and experimentally to be relatively small.