IEEE Transactions on Circuits and Systems for Video Technology, 2011

1

Real-time Stereo Matching on CUDA using an Iterative Refinement Method for Adaptive Support-Weight Correspondences

University of Nebraska-Lincoln

Jedrzej Kowalczuk, Eric T. Psota, Lance C. Pérez

2

Outline
• Introduction
• Related work
• Iterative model
• Implementation on parallel hardware
• Result
• Conclusion

3

Introduction
• A novel real-time stereo matching method is presented that uses
▫ a two-pass approximation of adaptive support-weight aggregation, and
▫ a low-complexity iterative disparity refinement technique.
• The refinement technique is constructed using a probabilistic framework.

4

Introduction
• The two-pass method
▫ produces an accurate approximation of the support weights, and
▫ reduces the complexity of aggregation.
• This method has been implemented on massively parallel hardware using the CUDA computing engine.

5

Introduction
• In this paper, a real-time stereo matching method is introduced that uses
▫ window-based cost aggregation, and
▫ a low-complexity iterative technique implemented on CUDA.

6

Introduction
• Many real-time methods focus on reducing complexity at the expense of accuracy.
• The proposed approach takes full advantage of the GTX 580’s computing capabilities to produce a highly accurate stereo matching method.

7


8

Related work
• Adaptive support-weight
▫ mimics the process of visual grouping in the human visual system (HVS).
▫ The support weight decreases as the geometric distance between p and q increases.
▫ Typical scene surfaces have locally consistent color.
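
The two cues listed here, color similarity and geometric proximity, can be combined into a single weight. The following is a minimal sketch in Python, assuming the exponential form commonly used in the adaptive support-weight literature, w(p, q) = exp(-(Δc/γc + Δg/γg)). The function name, the Euclidean color difference, and the sample pixels are illustrative assumptions; the default γ values reuse the aggregation parameters reported in the Result slide.

```python
import math

def support_weight(color_p, color_q, pos_p, pos_q, gamma_c=30.91, gamma_g=28.21):
    """Weight of neighbor q in the support window centered on p (assumed form)."""
    # Color difference: Euclidean distance between the two color vectors.
    delta_c = math.dist(color_p, color_q)
    # Geometric difference: Euclidean distance between the pixel coordinates.
    delta_g = math.dist(pos_p, pos_q)
    return math.exp(-(delta_c / gamma_c + delta_g / gamma_g))

# Hypothetical pixels: same color at two distances, and a differently colored neighbor.
same_color_near = support_weight((100, 100, 100), (100, 100, 100), (5, 5), (5, 6))
same_color_far = support_weight((100, 100, 100), (100, 100, 100), (5, 5), (5, 15))
other_color_near = support_weight((100, 100, 100), (200, 50, 100), (5, 5), (5, 6))
```

With these inputs the weight shrinks both when q moves farther from p and when its color departs from that of p, which is the behavior the bullets describe.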

9

Adaptive Support-Weight
• .
• .
• .

10

Adaptive Support-Weight
• The complexity of ASW makes it unsuitable for cost aggregation in real-time applications.
• It is necessary to reduce the complexity of raw adaptive support-weight cost aggregation.
▫ two-pass adaptive support weights [21]
▫ approximated joint bilateral filtering [22]
▫ exponential step-size adaptive weights [9]
▫ cross-based support weight [11]

11

Two-pass Adaptive Support-Weight
• Instead of using square windows for matching, the two-pass approach approximates the ASW by performing cost aggregation along the vertical and then the horizontal direction.
• Complexity is reduced from O(n²) to O(n).
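
The vertical-then-horizontal scheme can be sketched as follows. This is a simplified Python illustration, not the authors' implementation: it assumes a grayscale image, weights taken from the reference image only, a normalized weighted average, and that n in the complexity bullet denotes the window side length. All function names are hypothetical, and the default γ values are the aggregation parameters from the Result slide.

```python
import math

def weight(img, p, q, gamma_c, gamma_g):
    """Support weight between pixels p and q of a grayscale image (assumed form)."""
    (py, px), (qy, qx) = p, q
    delta_c = abs(img[py][px] - img[qy][qx])
    delta_g = math.hypot(py - qy, px - qx)
    return math.exp(-(delta_c / gamma_c + delta_g / gamma_g))

def aggregate_1d(img, cost, radius, dy, dx, gamma_c, gamma_g):
    """Weighted average of cost along one direction: (dy, dx) is (1, 0) or (0, 1)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            num = den = 0.0
            for k in range(-radius, radius + 1):
                qy, qx = y + k * dy, x + k * dx
                if 0 <= qy < h and 0 <= qx < w:
                    wgt = weight(img, (y, x), (qy, qx), gamma_c, gamma_g)
                    num += wgt * cost[qy][qx]
                    den += wgt
            out[y][x] = num / den  # den > 0: the center pixel always has weight 1
    return out

def two_pass_aggregate(img, cost, radius, gamma_c=30.91, gamma_g=28.21):
    """Vertical pass followed by a horizontal pass over the vertical result."""
    vertical = aggregate_1d(img, cost, radius, 1, 0, gamma_c, gamma_g)
    return aggregate_1d(img, vertical, radius, 0, 1, gamma_c, gamma_g)

# Hypothetical 3x3 image and a constant cost slice for one disparity.
img = [[10, 10, 200], [10, 10, 200], [10, 200, 200]]
cost = [[2.0] * 3 for _ in range(3)]
aggregated = two_pass_aggregate(img, cost, radius=1)
```

For a window of side n = 2 * radius + 1, each pixel evaluates 2n weights across the two passes instead of the n² weights of a full square window.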

12

Two-pass Adaptive Support-Weight
• The two-pass approach fails to accurately approximate the support weights under certain conditions.

13

Compare the Four Modifications
[Figure panels: Two-pass, Bilateral Filtering, ESAW, Cross-based]

14


15

Flow Diagram

16

Iterative model
• Improve the accuracy of adaptive support-weight stereo matching.
• Let denote a probabilistic event
▫ .

17

Iterative model
• Bayes’ theorem
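
For reference, the general statement of Bayes' theorem is given below. A and B are generic placeholder events used only for illustration; they are not the specific events defined in the paper.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}, \qquad P(B) > 0
```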

18

Iterative model
• Stereo matching is performed using an additive distance metric, arbitrarily denoted by δ(q, q̄).
▫ .
• .

19

Iterative model
• .

20

Iterative Disparity Refinement
• Let D_p^i be the disparity estimate for pixel p obtained in the i-th iteration of matching.
• Let F_p^i express the confidence level associated with the disparity estimate of pixel p.
• .

21

Iterative Disparity Refinement
• Penalty function

22

Iterative Disparity Refinement
• After the matching costs are computed, the minimum-cost matches are found for both the reference and target images using the winner-take-all (WTA) decision criterion.
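
The WTA step amounts to taking, for each pixel, the disparity with the lowest aggregated cost. A minimal Python sketch follows; the `cost_volume[y][x][d]` layout, the function name, and the sample costs are assumptions made for illustration. In the method described here the step would be run once for the reference image and once for the target image.

```python
def winner_take_all(cost_volume):
    """Return a disparity map holding, per pixel, the index of the minimum cost."""
    return [
        [min(range(len(costs)), key=costs.__getitem__) for costs in row]
        for row in cost_volume
    ]

# Hypothetical 1x2 image with three candidate disparities per pixel.
costs = [[[5.0, 1.0, 3.0], [0.5, 2.0, 2.0]]]
disparity = winner_take_all(costs)
```

Ties are resolved in favor of the smallest disparity, which is a property of Python's `min` and not something the slides specify.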

23

Iterative Disparity Refinement
• If p̄ = m(p) and p’ = m(p̄), then
▫ disparity d(p, p̄) is assigned to the reference disparity map.
▫ disparity d(p’, p̄) is assigned to the target disparity map.
• If |d(p, p̄) - d(p’, p̄)| > 1, then the confidence F_p^i is set to zero.
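
The consistency test on this slide can be sketched in Python as below. The sketch assumes rectified images in which a reference pixel at column x with disparity d matches the target pixel at column x - d, and it also zeroes the confidence when that match falls outside the image; both choices, along with the function name and the sample maps, are assumptions for illustration.

```python
def cross_check(disp_ref, disp_tgt, confidence):
    """Zero the confidence of reference pixels whose two disparities disagree by more than 1."""
    h, w = len(disp_ref), len(disp_ref[0])
    checked = [row[:] for row in confidence]  # leave the input untouched
    for y in range(h):
        for x in range(w):
            d_ref = disp_ref[y][x]            # d(p, p_bar)
            x_bar = x - d_ref                 # column of the matched target pixel (assumed convention)
            if not 0 <= x_bar < w:
                checked[y][x] = 0.0           # assumed handling of out-of-image matches
                continue
            d_tgt = disp_tgt[y][x_bar]        # d(p', p_bar): the target pixel's own best match
            if abs(d_ref - d_tgt) > 1:
                checked[y][x] = 0.0
    return checked

# Hypothetical one-row disparity maps and unit confidences.
disp_ref = [[0, 1, 1, 3]]
disp_tgt = [[1, 1, 0, 0]]
confidence = [[1.0, 1.0, 1.0, 1.0]]
checked = cross_check(disp_ref, disp_tgt, confidence)
```

Only the last pixel fails here: its reference disparity is 3 while the matched target pixel reports 1, a difference greater than the threshold of 1.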

24

Outline
• Introduction
• Related work
• Iterative model
• Implementation on parallel hardware
▫ CUDA execution model
▫ Stereo matching on CUDA
▫ Complexity and runtime distribution
• Result
• Conclusion

25

Flow Diagram

26

CUDA execution model
• A block of threads is an abstract representation of a multiprocessor and is capable of performing operations in parallel.
▫ The threads are executed on the graphics device equipped with a GPU.
▫ At runtime, each block of threads gets mapped to a single multiprocessor on the device.

27

CUDA execution model
• The implementation of the proposed method uses the NVIDIA GeForce GTX 580 GPU, equipped with 512 CUDA cores.
• The device code is encapsulated in special functions called kernels, which are invoked by the host and executed in parallel by multiple threads.

28

Stereo Matching on CUDA
• The kernels are designed such that each thread within a block is responsible for computing the matching cost for a single pair of pixels.
• This granularity of computation allows the threads in each warp to take advantage of memory coalescing.
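
The link between one-thread-per-pixel-pair and coalescing comes down to index arithmetic: when consecutive threads of a warp touch consecutive addresses, the hardware can combine their accesses into fewer memory transactions. The following Python stand-in mimics the index computation a kernel might perform; it is not CUDA code, and the row-major `cost[d][y][x]` layout, the block size of 256, and the 640x480 image are all hypothetical choices.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs

def thread_to_address(block_idx, thread_idx, block_dim, width, height, disparity):
    """Flat cost-volume index handled by one thread (assumed row-major cost[d][y][x])."""
    pixel = block_idx * block_dim + thread_idx   # global thread id doubles as pixel id
    y, x = divmod(pixel, width)
    return (disparity * height + y) * width + x

# Addresses touched by the first warp of block 0 for disparity 3.
addresses = [thread_to_address(0, t, 256, 640, 480, 3) for t in range(WARP_SIZE)]
is_coalesced = all(b - a == 1 for a, b in zip(addresses, addresses[1:]))
```

Under this layout the 32 threads of the warp read 32 adjacent cost entries, which is the access pattern that coalescing rewards.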

29

Stereo Matching on CUDA

30

31

32

33

Complexity and Runtime Distribution
• Complexity of computing the matching cost volume is O(mnwr/s).
• Complexity of iterative refinement is O(mnwk/s).

34

Percentages of the total execution time

35


36

Result
• γc = 30.91 and γg = 28.21 for matching cost aggregation.
• γc = 10.94 and γg = 118.78 for iterative disparity refinement, and the disparity penalty was set to α = 0.085.

37

Result

38

Result

39

40

41

42


43

Conclusion
• The refinement technique iteratively improves the accuracy of the disparity map and typically converges after only six iterations.
• The added complexity associated with iterative refinement is shown both analytically and experimentally to be relatively small.
