
Computer Vision and Image Understanding 123 (2014) 23–40


Robust obstacle detection based on a novel disparity calculation method and G-disparity ☆

http://dx.doi.org/10.1016/j.cviu.2014.02.014
1077-3142/Crown Copyright © 2014 Published by Elsevier Inc. All rights reserved.

☆ This paper has been recommended for acceptance by Narendra Ahuja.
⇑ Corresponding author. Address: University of Bristol, Merchant Venturers Building, Room 1.15, Woodland Road, BS8 1UB Bristol, UK. Fax: +44 (0)117 954 5206.
E-mail address: [email protected] (Y. Gao).

Yifei Wang, Yuan Gao ⇑, Alin Achim, Naim Dahnoun
Merchant Venturers School of Engineering, University of Bristol, Merchant Venturers Building, Woodland Road, BS8 1UB Bristol, UK

Article info

Article history:
Received 17 September 2011
Accepted 28 February 2014
Available online 13 March 2014

Keywords:
Obstacle detection
U–V disparity
Free space calculation
Stereo vision

Abstract

This paper presents a disparity calculation algorithm based on stereo-vision for obstacle detection and free space calculation. This algorithm incorporates line segmentation, multi-pass aggregation and efficient local optimisation in order to produce accurate disparity values. It is specifically designed for traffic scenes where most of the objects can be represented by planes in the disparity domain. The accurate horizontal disparity gradients of the side planes are also extracted during the disparity optimisation stage. Then, an obstacle detection algorithm based on the U–V-disparity is introduced. Instead of using the Hough transform for line detection, which is extremely sensitive to the parameter settings, the G-disparity image is proposed for the detection of side planes. The vertical planes are then detected separately after removing all the side planes. Faster detection speed, lower parameter sensitivity and improved performance are achieved compared with the Hough transform based detection. After the obstacles are located and removed from the disparity map, most of the remaining pixels are projections from the road surface. Using a spline as the road model, the vertical profile of the road surface is estimated. Finally, the free space is calculated based on the vertical road profile, which is not restricted by the planar road surface assumption.

Crown Copyright © 2014 Published by Elsevier Inc. All rights reserved.

1. Introduction

Obstacle detection has been an active research area for two decades. It is widely used in many Advanced Driver Assistance Systems (ADAS) and Intelligent Transportation Systems (ITS). The most important goal is to accurately locate obstacles ahead and measure the distances to them. Existing solutions are based on either active sensors or passive sensors. Active sensor based systems provide simple and accurate distance measurements. However, they cannot produce results with adequate spatial resolution in real-time. Active sensors also face the problem of mutual interference. Although this problem has been solved by the pseudo-random laser modulation scheme [1], the LiDAR system requires an unreasonably high power to detect objects in the far field if adequate spatial resolution is required. Therefore, our research is focused on passive sensor based obstacle detection.

The passive sensor based systems can be further divided into three main types: segmentation-based [2–7], motion-based [8–10] and depth-based [11–16] obstacle detection. The segmentation-based solutions separate the obstacles from the background by identifying and analysing potential obstacle features in the scene. Various features have been used, such as colour [4], edge [6], edge symmetry [3], texture [7] and so on. The distances to the detected obstacles are normally estimated using fixed camera parameters and a flat-road assumption. Most of the motion-based methods are based on perspective transformation [9,17] and global-motion estimation algorithms [8,10]. Differentiating the motion of local image areas from the global motion introduced by the vehicle ego-motion allows the detection of objects. If the ego-motion of the vehicle is directly available, perspective transformation can be applied to warp the current frame, f(i), to produce an 'expected' next frame, \hat{f}(i+1), assuming a flat road. Then, for obstacle detection, the actual frame, f(i+1), can be compared with frame \hat{f}(i+1). When the global motion is unavailable, algorithms such as optical flow can be applied. The global motion needs to be identified first by analysing the flow field. The positions where the motion flows do not agree with the global motion field indicate possible dynamic objects, which are very likely to be obstacles such as vehicles or pedestrians [18–20]. Generally, both shape-based and motion-based algorithms are able to provide accurate detection. However, the distance measurements can only be achieved with a flat-road assumption (or accurate prior road vertical profile detection), due to the lack of direct distance measurement. In reality, only near-field roads can be treated as flat planes. Therefore, the accuracy of the far-field distance measurements becomes unreliable with vertical road gradient variations.

Stereo vision-based solutions hold an advantage over shape-based and motion-based approaches. Direct distance calculations can be achieved by evaluating the disparities between the two images without any assumption of the road structure. By doing so, obstacles are detected based on the disparity maps instead of intensity images, and so the process can be significantly simplified. However, calculating a dense disparity map requires a significant amount of computation which is burdensome for the overall system. Block-based methods evaluate the similarities of local image areas. The position where the highest similarity is achieved can be selected as the correspondence. Global optimisation-based algorithms include more constraints, such as a smoothness term. Combining the constraints with the cost function, a global energy function representing the goodness of the current match can be formed. Many optimisation techniques [21–23] have been applied to find the maximum/minimum of the global energy function with varying degrees of success, depending on the test images. However, the global optimisation process is normally computationally demanding and difficult to implement in real-time. Dynamic programming (DP)-based approaches [24,25] optimise disparity values based on a Disparity Space Image (DSI), generated using a scanline. It is difficult for this method to incorporate inter-scanline information and to relax the monotonicity or ordering constraint (described in [26,27]).

When designing a disparity calculation algorithm, one of the most important steps is to decide the cost function. Traditional cost functions are generally based on intensity differences such as the SSD, MD and the SAD. Intensity difference based cost functions are the simplest as they do not include divisions and are suitable for implementation on fixed-point embedded systems. However, these functions are sensitive to the differences of the intensity gains between the two cameras. The NCC is also a popular cost function. Although the NCC demands higher computational complexity, it is definitely preferred when intensity differences are non-negligible. Other methods are also proposed, focusing on reducing the sensitivity towards intensity differences, including image gradient and non-parametric transform (rank and census transforms) [28–30] based cost functions.

One of the general problems during stereo matching is that errors are likely to be introduced in the homogeneous areas (pixels in a large image area that share very similar intensities). In automotive applications, large homogeneous regions exist on the road surface. This leads to the matching cost being unpredictable in various locations. The position corresponding to the minimum cost might not be the correct correspondence. Grouping the pixels together within the homogeneous region during the cost aggregation solves the problem, but it is difficult to allow smooth and gradual disparity changes within the group. Another factor that induces error is the occlusion, which is introduced due to the position difference between the left and right cameras. Existing solutions are mainly based on the left and right consistency check or global optimisation [22,21], which are computationally complicated.

In this paper, we propose a disparity calculation algorithm which readily solves the homogeneous area problem. It also has partial occlusion handling capability. The algorithm consists of four main steps: cost calculation, image line segmentation, cost aggregation and optimisation. Line segmentation is applied to the reference image horizontally and vertically. The calculated costs are aggregated in both the horizontal and vertical directions. The horizontal aggregation enables a correct correspondence to be found in the homogeneous and occluded areas. Vertical aggregation allows gradual disparity variation within each horizontal segment. Segment-based optimisation is then performed to identify the final disparity for each pixel. The proposed technique is especially suitable for this application since a gradual change of disparity in the road area and side planes must be encouraged (many global optimisation-based algorithms involve a smoothness term which adds a small penalty even to gradual disparity changes). During the optimisation, the proposed technique also generates an accurate horizontal disparity gradient which simplifies the side plane recognition during obstacle detection.

Once the disparity map is available, the potential obstacles in the scene can be extracted. Many existing systems assume the road surface is planar [11]. This assumption allows the reference image to be perspective projected and matched with the other image. The final intensity difference map reveals the potential obstacles. The biggest advantage of such systems is that the disparity matching step can be omitted. However, if the flat road assumption is violated, the system performance will be heavily affected. In order to relax the flat road assumption while saving computation, disparities are only calculated at certain feature points in [31–33]. However, sophisticated feature extractors are computationally complicated. Most importantly, a large amount of valuable distance information is not included and cannot be shared with the other components in the autonomous vehicle system. With fast growing computing systems, dense disparity matching can be achieved in real-time. One of the most significant contributions, based on a dense disparity map, is called the U–V-disparity map [13]. By calculating the line disparity histograms horizontally and vertically, the V-disparity and U-disparity images can be generated respectively. Each plane in the world coordinate system will be shown as a line on the U–V-disparity map. Categorising road surfaces and obstacles into different types of planes simplifies the obstacle detection problem into a line detection one. The Hough transform has been the most popular algorithm to extract lines (mostly corresponding to horizontal or oblique planes) in the U–V-disparity space [13,34]. However, the Hough transform requires pre-set parameters whose settings may cause serious over-detection or the neglect of small obstacles. It is also extremely difficult to find one set of parameters that works in the fast varying traffic environment. In [13], the detection of the road plane is carried out prior to the obstacle detection. The obstacles are found by evaluating the disparity difference in the road disparity profile. This type of algorithm assumes that a very large percentage of the scene is occupied by the road surface, so that the longest line in the V-disparity map corresponds to the road surface. However, in the cases when obstacles are in the near-field, the road surface can be blocked and this results in large errors.

In our system, the detection is separated into side and vertical plane detection. Side planes are located first on the U-disparity and the proposed G-disparity, which is calculated based on the disparity gradient. The detected obstacles are removed so that minimum outliers are involved during vertical plane detection. This approach does not include sensitive parameter settings and it can be implemented very efficiently. Furthermore, the proposed obstacle detection does not rely on the prior road surface exclusion and allows the presence of obstacles in the foreground. Another advantage is that, once all the obstacles are removed from the scene, the road surface can be easily detected and modelled. As proposed in [35], the vertical profile of the road surface is modelled using a spline. In our system, the model parameters are then optimised using the least squares method. An extra refinement step ensures the boundary between the obstacles and the road surface is accurately defined.

The rest of this paper is organised as follows: Section 2 introduces the proposed stereo matching algorithm. Section 3 focuses on the obstacle detection. Section 4 describes the road modelling and free space computation. Section 5 presents experimental results and corresponding discussions. Section 6 concludes the paper.


Fig. 2. Relationship between the disparity d and the depth Z.

Fig. 3. Point P is only visible to the reference (left) image. The black rectangle represents a possible object that is nearer to the camera than P.


2. Disparity calculation

2.1. Theoretical preliminaries

First of all, we define the world coordinate system as shown in Fig. 1. The relationship between disparity and depth is a result of perspective projection. As shown in Fig. 2, defining a point in the 3D space as P and projecting this point onto the stereo vision system with a horizontal baseline B, the resultant points on the left and right image planes can be represented as pL and pR respectively. The relationship between these two points and the depth Z can be found as shown in Eq. (1):

d = \frac{Bf}{Z}    (1)

where d is the disparity, which can be calculated as the horizontal position difference between pL and pR in the image space, and f is the focal length.
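As a quick illustration of Eq. (1), the sketch below converts a disparity into a metric depth; the baseline and focal length are hypothetical values, not the parameters of the rig used in this paper.

# Minimal sketch of Eq. (1): depth from disparity for a rectified stereo pair.
def depth_from_disparity(d_pixels, baseline_m=0.30, focal_px=700.0):
    """Return the depth Z in metres for a disparity d given in pixels.

    baseline_m and focal_px are illustrative values only.
    """
    if d_pixels <= 0:
        raise ValueError("Disparity must be positive for a finite depth.")
    return baseline_m * focal_px / d_pixels


print(depth_from_disparity(22))  # ~9.5 m for the assumed rig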

In order to locate the correct correspondence for each pixel in the reference image, we need to utilise the intensity information surrounding the point of interest. Even calibrated cameras can exhibit a small amount of intensity difference, which can lead to incorrect correspondences. In the case where the pixel disparities within a homogeneous region are identical, this problem can be solved by image segmentation and restricting the disparities of pixels within the same segment. However, in obstacle detection applications, the disparity of the road surface changes gradually, so assigning a single disparity is not an option. Furthermore, changes in distance also introduce changes in object appearance from one image to the other, which can prevent a corresponding segment from being found. Another factor that induces errors is occlusion, which occurs due to the position difference between the left and right cameras. This causes background pixels to be visible in the reference image, but not in the other one (as shown in Fig. 3). In theory, the disparity of the occluded areas cannot be measured and should be excluded from the final result. Existing ways of excluding these pixels are mainly based on the left and right consistency check and global optimisation [21–23], which are computationally complex. However, in most situations, the disparities of occluded areas can be estimated with the disparity of the neighbouring background object.

In this paper, a multi-pass cost aggregation approach based on line segmentation is presented in order to solve the problems introduced by homogeneous and occluded regions. The system block diagram of the proposed disparity calculation algorithm is shown in Fig. 4. As discussed in [34], planes within a normal road scene can be separated into horizontal, oblique, vertical and side planes. Although the shapes of the road surfaces on the two images are different, the widths of the road areas on both images are identical. Therefore, line segmentation instead of region segmentation is implemented in our system to calculate the disparities of the oblique and vertical planes. Aggregating the costs horizontally allows vertical disparity changes while restricting the segment disparities using object widths. In order to cope with the occluded areas, a weighted aggregation is conducted to suppress the contributions of occluded pixels. When the object is a side plane, the disparity values change linearly in the horizontal direction. In this case, vertical line segmentation is also employed. By aggregating the cost within each vertical segment, the disparity of side surfaces can be calculated accurately. The final segment disparities are then optimised based on the multi-pass aggregation results. The horizontal aggregation results are first optimised using the inter-scanline information. After that, an energy function is evaluated based on the horizontal segment DSIs [36] to determine the final disparity for each pixel.

Fig. 1. Perspective projection of point P in the world coordinate system onto the image plane.

Fig. 4. Block diagram of the proposed disparity calculation algorithm.

Fig. 5. Disparity map based on the NCC cost function and WTA optimisation method.

Fig. 6. Cost of a pixel in the homogeneous region. The maximum does not correspond to the correct position.

2.2. Cost calculation

Cost calculation provides the most basic information needed by all high-level algorithms, such as the popular BP and the SGM, in order to produce the final disparity map. While choosing the cost function, the most important factors are accuracy and speed. Some of the simple functions such as the SAD and the SSD can produce accurate results if the input images are well calibrated (including brightness) and contain rich texture details (many colour/intensity changes). As many of our test image sets from EISATS (the Enpeda Image Sequence Analysis Test Site) contain significant brightness differences between the left and right views, the NCC is implemented due to its robustness against intensity changes. The NCC coefficient or cost δ_{n,m}(d) of pixel I(n, m) can be calculated as given in Eq. (2).

\delta_{n,m}(d) = \frac{1}{(2h+1)^2 - 1} \sum_{x=n-h}^{n+h} \sum_{y=m-h}^{m+h} \frac{\left(I_l(x,y) - \bar{I}_l\right)\left(I_r(x+d,y) - \bar{I}_r\right)}{\sigma_l \sigma_r}    (2)

where δ_{n,m}(d) represents the normalised cross-correlation coefficient at disparity d, \bar{I}_l and \bar{I}_r denote the means of the intensities within the left and right blocks respectively, and σ represents the standard deviation. The normalising term ensures δ_{n,m}(d) is insensitive to any intensity variations.

Unlike the SAD and SSD, larger NCC coefficients correspond to the likely matches. Therefore, in this paper, disparities corresponding to large costs are regarded as desirable matches.
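A minimal sketch of the NCC cost of Eq. (2) is given below, assuming grayscale floating-point images and ignoring window clipping at the image borders; the small epsilon guarding against zero variance is our addition, not part of the paper.

import numpy as np

def ncc_cost(left, right, n, m, d, h=3, eps=1e-6):
    """Normalised cross-correlation of Eq. (2) for pixel (n, m) at disparity d.

    left is the reference image, indexed as image[row, column]; the candidate
    block in the right image is shifted horizontally by d. Larger values
    indicate more likely matches. eps is a numerical guard we add.
    """
    wl = left[m - h:m + h + 1, n - h:n + h + 1].astype(np.float64)
    wr = right[m - h:m + h + 1, n + d - h:n + d + h + 1].astype(np.float64)
    wl -= wl.mean()
    wr -= wr.mean()
    denom = ((2 * h + 1) ** 2 - 1) * (wl.std() * wr.std() + eps)
    return float((wl * wr).sum() / denom)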

2.3. Line segmentation and cost aggregation

If the Winner-Take-All (WTA) optimisation method is used to locate the maximum cost position, the disparity map can be produced as shown in Fig. 5.

This is a typical road scene with vehicles and boundary fences. A large amount of error is produced in the homogeneous regions and occluded areas. By inspecting the cost of a pixel inside the homogeneous region (shown in Fig. 6), we can see that the local cost calculation does not reflect the correct match, and the maximum in this case corresponds to an incorrect disparity value. In the occluded regions, correct correspondences do not exist. Therefore, errors cannot be avoided at this stage.

In order to illustrate the influence of a homogeneous region during cost calculation, the NCC coefficients of one pixel inside a homogeneous region are shown in Fig. 6. As all intensities inside the region are very similar, the maximum corresponds to an incorrect position. Therefore, the costs of pixels inside a homogeneous area need to be aggregated to find the correct disparity values.


The proposed cost aggregation method is based on the horizontal and vertical line segmentations. First, the gradient of the reference (left) image is calculated horizontally by using a Sobel mask. Then, a small threshold is applied to identify the intensity changes. This threshold is easy to determine since only the areas with very little texture need to be grouped. Once the vertical edges are found, any pixels between two edge points are grouped into a segment. The cost δ_{n,m}(d) can therefore be represented as δ_s(j, d), where s is the segment index and j is the pixel index within segment s.
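The sketch below illustrates this horizontal segmentation step, assuming an 8-bit grayscale reference image; the gradient threshold value is hypothetical, since the paper only notes that a small threshold suffices.

import numpy as np
from scipy.ndimage import sobel

def horizontal_segments(ref_img, grad_thresh=20.0):
    """Split every image row into segments bounded by vertical edges.

    Returns one list of (start_col, end_col) pairs per row. Edges are taken
    where the absolute horizontal Sobel response exceeds grad_thresh.
    """
    gx = np.abs(sobel(ref_img.astype(np.float64), axis=1))  # horizontal gradient
    all_segments = []
    for row in gx:
        edges = np.flatnonzero(row > grad_thresh)
        bounds = np.concatenate(([0], edges, [row.size - 1]))
        segs = [(int(a), int(b)) for a, b in zip(bounds[:-1], bounds[1:]) if b > a + 1]
        all_segments.append(segs)
    return all_segments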

The distribution of disparity values within a homogeneous line segment can be separated into three cases:

• The segment of interest corresponds to either the horizontal (parallel to the X–Z plane), oblique (the X-axis is parallel to this type of plane) or the vertical planes (parallel to the X–Y plane) in the 3D space, with no occlusions. These planes are all parallel to the X-axis. While projecting points on these planes to the image plane, only the ones lying on the same line parallel to the image plane will be projected onto the same image row. Therefore, the resultant disparities within the line segment are identical.
• The horizontal segment also corresponds to either the horizontal, oblique or vertical planes (parallel to the X–Y plane) in the 3D space, but with occlusions.
• Finally, if the line segment corresponds to a side plane (planes that are perpendicular to the road plane), the true disparities gradually change horizontally. Assigning a single disparity value to all pixels in the segment is thus inappropriate.

A multi-pass aggregation method is developed in our system to accommodate each case differently.

For the first case, the DSI of an image line segment within a homogeneous region is shown in Fig. 7. Each pixel in Fig. 7 corresponds to a cost value (normalised correlation coefficient). The x-axis represents the number of pixels inside this line segment. The disparity range increases along the y-axis. Defining the disparity d and the pixel index j within a segment s as two random variables, the normalised cost \hat{δ}^A_s(j, d) can be treated as a probability density of correct correspondence and is shown below as:


Fig. 7. The DSI image of pixels on the same image row and inside a homogeneous region. The red line indicates the correct disparity at d = 22. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 9. The DSI of a segment including an occluded region.


\hat{\delta}^A_s(j,d) = \frac{\delta_s(j,d)}{\iint \delta_s(j,d)\,\mathrm{d}j\,\mathrm{d}d}    (3)

The denominator is a normalising term to ensure the probability density integrates to one. The objective is to find the cost of the whole segment depending on d and to eliminate the term j. This is achieved by calculating the marginal density function as follows:

C^A_s(d) = \int \hat{\delta}^A_s(j,d)\,\mathrm{d}j    (4)

where C^A_s(d) represents the probability of d being the correct disparity for the whole segment. If d results in a large number of high values in \hat{δ}^A_s(j, d), the integrated result will provide a large value at d, and vice versa. The aggregated cost should contain a unique peak which corresponds to the correct disparity. Fig. 8 illustrates the cost aggregation result of the segment containing the pixel whose cost is shown in Fig. 6. As the figure shows, the maximum position is now correct at d = 22. This indicates that cost aggregation within a homogeneous region can result in a distinct peak which is more suitable when used to estimate the disparity map.
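A compact sketch of the aggregation of Eqs. (3) and (4) is shown below, operating on the DSI of one segment stored as a (disparities × pixels) array; clipping negative correlations to zero follows the convention stated in Section 2.4, and the epsilon is a numerical guard we add.

import numpy as np

def aggregate_segment_cost(dsi, eps=1e-12):
    """Aggregate a segment DSI into one cost per disparity (Eqs. (3)-(4)).

    dsi[d, j] holds the NCC cost of pixel j at disparity d. The DSI is first
    normalised into a probability density (Eq. (3)), then the pixel index j is
    marginalised out (Eq. (4)).
    """
    dsi = np.clip(dsi, 0.0, None)            # negative correlations treated as 0
    density = dsi / (dsi.sum() + eps)        # Eq. (3)
    return density.sum(axis=1)               # Eq. (4)

# The segment disparity estimate is the maximising index:
# d_hat = int(np.argmax(aggregate_segment_cost(dsi)))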

Fig. 8. The aggregated cost of a horizontal segment without normalisation.

For the second case, an example of the DSI of a segment including an occluded region is shown in Fig. 9. The costs in the occluded area are completely different from the rest. If the cost is aggregated using Eq. (4), there is a possibility that the final result will be influenced by the occluded region and cause errors. Therefore, it is important to ignore the occluded areas during cost aggregation. However, the determination of the occluded area from the DSI is not an easy task. In [36], the authors detect the occluded areas using Dynamic Programming (DP). Since occlusion detection needs to be performed on every image row, the overall computational complexity would be dramatically increased. As the left image is selected as the reference, occlusions can only occur on the right side of an object. In order to restrict the confidence of the costs of right-side pixels during the aggregation, a weight function is applied that gives higher support to the pixels close to the left end of a segment while suppressing the cost contribution of the pixels close to the right end. In the proposed approach, a Gaussian weighting function w_1 is employed. It provides more flexibility while tuning, since the weight distribution is controlled by its mean and standard deviation values. Simpler weight functions such as the ramp function can also be taken into consideration. In our case, linearly distributed weighting coefficients do not suppress the occlusion problem in cases of large (wide) segments. Therefore, a Gaussian function is employed for good searching range coverage on the left end of the segment and sufficient restriction of occlusion on the right side. When the Gaussian weighting function w_1 is applied to δ_s(j, d), the normalisation process becomes:

\hat{\delta}^L_s(j,d) = \frac{w_1(j)\,\delta_s(j,d)}{\iint w_1(j)\,\delta_s(j,d)\,\mathrm{d}j\,\mathrm{d}d}    (5)

Hence, another aggregation function is created as shown below:

C^L_s(d) = \int \hat{\delta}^L_s(j,d)\,\mathrm{d}j    (6)

If j is normalised within the range [-1, 1], the mean of the Gaussian function is set at -1 and the standard deviation is chosen to be 0.4. This parameter is determined by manually tuning the value and evaluating the resultant disparity map using selected image sets from the Enpeda Image Sequence Analysis Test Site. The small standard deviation means the weights at the right side of the segment are very low. This choice is based on the fact that some of the occlusion effects are severe and can occupy a large proportion of a segment or even cover the whole segment. If the complete segment is within an occluded area, the aggregated result of Eq. (6) cannot indicate the correct disparity. However, the probability of this occurring is generally low in a road scene.

With the above two aggregation functions, the costs of pixels within a segment are aggregated twice. The aggregation results of Eq. (4) can be affected by the occluded areas, but it produces excellent results in occlusion-free areas. The results of Eq. (6) are less sensitive to the negative influence of the occluded areas but may produce errors due to the exclusion of a large portion of pixels on the right side of the segment. In order to make sure that at least one of the aggregation results corresponds to the correct disparity, another aggregation is performed using Eq. (6) but with a different weighting function. This time, the mean of the Gaussian function is at -0.7 and the standard deviation is chosen to be 0.8 (decided by tuning the value manually and evaluating the experimental results). This function assigns higher weights to more pixels while preventing errors introduced by small occluded areas. The resultant cost normalisation function is shown as follows:

\hat{\delta}^M_s(j,d) = \frac{w_2(j)\,\delta_s(j,d)}{\iint w_2(j)\,\delta_s(j,d)\,\mathrm{d}j\,\mathrm{d}d}    (7)

The marginal density is therefore:

C^M_s(d) = \int \hat{\delta}^M_s(j,d)\,\mathrm{d}j    (8)

The final decision about which aggregation result corresponds to the correct disparity will be discussed in detail in Section 2.4.
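The weighted aggregations of Eqs. (5)-(8) can be sketched as below; pixel indices are normalised to [-1, 1] as in the text, the Gaussian means and standard deviations follow the values reported above, and the epsilon is again only a numerical guard.

import numpy as np

def weighted_aggregate(dsi, mean, std, eps=1e-12):
    """Gaussian-weighted cost aggregation over one horizontal segment.

    dsi[d, j] is the NCC cost of pixel j at disparity d. Weights centred near
    the left end suppress occlusions, which (with a left reference image) can
    only appear on the right side of objects.
    """
    length = dsi.shape[1]
    j = np.linspace(-1.0, 1.0, length)
    w = np.exp(-0.5 * ((j - mean) / std) ** 2)
    weighted = np.clip(dsi, 0.0, None) * w[np.newaxis, :]
    density = weighted / (weighted.sum() + eps)   # Eqs. (5)/(7)
    return density.sum(axis=1)                    # Eqs. (6)/(8)

# C_L uses the narrow weighting, C_M the wider one:
# C_L = weighted_aggregate(dsi, mean=-1.0, std=0.4)
# C_M = weighted_aggregate(dsi, mean=-0.7, std=0.8)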

For the third case, in order to allow gradual changes within the group while generating reliable aggregated results, segmentation is applied again vertically using the horizontal edges. The heights of a projected line on a side surface should be identical in the left and right images. Therefore, if d corresponds to the true disparity, a vertical segment in the reference image should find the one in the right image with the same height to be the correspondence. If v and l are used to represent the vertical segment and pixel indexes respectively, the normalisation process can be expressed by:

\hat{\delta}^A_v(l,d) = \frac{\delta_v(l,d)}{\iint \delta_v(l,d)\,\mathrm{d}l\,\mathrm{d}d}    (9)

The aggregation of pixels inside a vertical segment is therefore:

C^A_v(d) = \int \hat{\delta}^A_v(l,d)\,\mathrm{d}l    (10)

The occluded areas are not considered during vertical aggregation since they only exist on the right side of an object and it is difficult to identify them in the vertical direction.

The proposed multi-pass approach aggregates the cost four times in total. Each aggregation utilises different information, aiming to solve a specific problem. These aggregation results need to be further processed in order to generate an optimised disparity map. The four available aggregation results are C^A_s(d), C^L_s(d), C^M_s(d) and C^A_v(d) (Eqs. (4), (6), (8) and (10)).

2.4. Segment-based disparity optimisation

With the above four sets of cost functions, the WTA optimisation is applied to each set and produces four disparity maps D^i_W, where i = 1, 2, 3, 4, as shown in Fig. 10. Each pixel should select its disparity among the four results. As Fig. 10 shows, each of the four disparity maps contains errors. However, most of the errors are not shared by all. For example, D^4_W produces accurate disparities for the side planes but a large number of errors is introduced on the oblique planes. The objective of the optimisation is to find the optimum value among the four for each pixel or segment. Many global optimisation methods (such as SA [37], SGM [21] and BP) can be applied to solve the problem. As the disparity levels are restricted to only four values, the required computation to achieve the final result is significantly reduced. In our system, a local optimisation method is conducted based on segments instead of pixels in order to further decrease the required computational power.

Fig. 10. WTA optimisations based on four sets of aggregation results.

Although the computational complexity of a 2D optimisation is greatly reduced after the cost aggregation, it is still difficult to achieve real-time performance. In our system, the horizontally segmented disparity maps (D^{i=1,2,3}_W) are optimised first. The vertical smoothness of local image areas is evaluated. These local areas are determined by the horizontal and vertical segmentation. In practice, it is reasonable to assume that the disparity within a vertical segment should be identical (such as in the obstacle areas) or slowly changing (such as in the road surface area). Any sudden changes are irregular and should be removed. On each of the three maps based on horizontal segmentation, vertical disparity differences are calculated. The locations where the differences are greater than 1 are labelled with 1s on a positive difference map r^i_+, where i = 1, 2, 3. The locations where the differences are smaller than -1 are labelled with 1s on a negative difference map r^i_-. The final uncertainty map r^i can be calculated using information contained in both difference maps as follows:

r^i(v) = \min\!\left(\sum_{l} r^i_+(v,l),\; \sum_{l} r^i_-(v,l)\right)    (11)

r^i provides an accurate measure of the number of distinctive disparity changes within a vertical segment. At pixel (v, l), the disparity D_H can be represented as:

D_H(s,j) = D^{i_p}_W(s,j) \quad \text{where} \quad r^{i_p}(s) = \min_{i}\left(\sum_{j} r^{i=1,2,3}(s,j)\right)    (12)

where D_H is the optimised result based on the horizontally segmented disparity maps. This optimisation process enforces the inter-scanline relationship on each horizontal segment so that errors can be identified while a smooth vertical disparity change is not interrupted.
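The following sketch mirrors the idea of Eqs. (11) and (12) under a simplification: each full pixel column stands in for a vertical segment, which is not exactly the paper's segmentation but keeps the example short; d_maps holds the three horizontally segmented WTA maps and segments_per_row the horizontal segments.

import numpy as np

def vertical_uncertainty(disp_map):
    """Per-column count of irregular vertical disparity jumps (Eq. (11) sketch).

    Returns, for every column, the smaller of the number of upward and
    downward jumps larger than one disparity level.
    """
    diff = np.diff(disp_map.astype(np.int32), axis=0)
    pos = (diff > 1).sum(axis=0)
    neg = (diff < -1).sum(axis=0)
    return np.minimum(pos, neg)

def select_horizontal_result(d_maps, segments_per_row):
    """Per horizontal segment, keep the candidate map with the fewest
    irregular jumps (Eq. (12) sketch). d_maps holds D_W^1..D_W^3."""
    uncertainties = [vertical_uncertainty(d) for d in d_maps]
    out = np.zeros_like(d_maps[0])
    for r, segments in enumerate(segments_per_row):
        for a, b in segments:
            scores = [u[a:b + 1].sum() for u in uncertainties]
            best = int(np.argmin(scores))
            out[r, a:b + 1] = d_maps[best][r, a:b + 1]
    return out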

Although D_H is now optimised, it is only based on the horizontal segment results. The disparity values representing the side planes are still inaccurate. In order to obtain the final disparity map, these disparity values in D_H must be replaced by the corresponding information stored in D^4_W (a disparity map calculated based on the vertical segments). Unlike the inter-scanline optimisation discussed above, the disparities for the side planes are selected based on the DSIs of horizontal segments.

It is important to note that, unlike D_H, D^4_W has not yet been analysed and optimised. It could contain errors introduced by insufficient pixels in a segment or occlusions. If D_H is directly compared with D^4_W without preprocessing, it is very likely to cause inaccurate optimisation. The objects most likely to appear on the sides of the vehicle could be other vehicles, buildings and fences. The side surfaces of these objects are approximately planar. Even in some extreme cases where the objects have curved side surfaces, it is very unlikely for the complete curved surface to be a homogeneous area which cannot be separated during line segmentation. If the curved surface is segmented into a few segments, disparity changes within each segment can be treated as approximately linear. For the above reasons, the disparity contained in a horizontal segment of D^4_W is restricted to be on a straight line on the DSI. The parameters of the line corresponding to horizontal segment s are calculated with least squares fitting based on the smoothed and sub-sampled D^4_W(s), where D^4_W(s) contains D^4_W(s, j) for all j. Since sub-pixel resolution is not required, the lines are then sampled, digitised and stored in D_V(s, j). It is important to note that the value of D_V(s, j) changes according to the fitted line, whereas D_H(s, j) contains the same value for different j. For D_H, each segment disparity is unique and so it shows as a horizontal line on the DSI according to the disparity value. The confidence of the segment disparities being correct is evaluated by averaging the cost values crossed by the line as follows:

F(s, D_x) = \frac{\sum_{j} \delta_s\big(j, D_x(s,j)\big)}{J}    (13)

where D_x is a disparity map which can be either D_V or D_H. Since negative normalised correlation values are set to 0, the cost values vary from 0 to 1. J is the number of pixels within segment s, so F(s, D_x) varies from 0 to 1 for each group. If F(s, D_V) > F(s, D_H), the final disparities of the segment D(s, j) are set to D_V(s, j). Similarly, if F(s, D_V) ≤ F(s, D_H), the final disparities of the segment D(s, j) are set to D_H(s, j).
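A small sketch of this confidence test (Eq. (13)) is given below; the segment DSI layout and integer disparity profiles follow the earlier sketches, and the clipping of negative correlations matches the convention stated above.

import numpy as np

def segment_confidence(dsi, disparities):
    """Average DSI cost crossed by a candidate disparity profile (Eq. (13)).

    dsi[d, j] is the cost of pixel j at disparity d; disparities gives one
    integer disparity per pixel of the segment (a row of D_H or D_V).
    """
    j = np.arange(dsi.shape[1])
    costs = np.clip(dsi[disparities, j], 0.0, None)
    return float(costs.mean())

def choose_segment_disparity(dsi, d_h, d_v):
    """Keep the fitted side-plane profile D_V only when it scores higher."""
    return d_v if segment_confidence(dsi, d_v) > segment_confidence(dsi, d_h) else d_h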

This line fitting process not only correctly optimises the disparity map, but also calculates the disparity gradients of the segments. The detection of the side surfaces can be significantly simplified by utilising this information during the obstacle detection stage (discussed in detail in Section 3).

Finally, the disparity values of the edge positions are filled with the WTA optimisation based on the NCC cost. High frequency components on the disparity map D are removed and interpolated. An example of the final disparity map is shown in Fig. 11, and more experimental results and comparisons can be found in Section 5.

3. Obstacle detection

3.1. Preliminary background

Fig. 11. An example of the final disparity map achieved with the proposed algorithm.

With the disparity maps extracted earlier, we now focus on obstacle detection. Our work is based on the U–V-disparity representation [13] and extends it for faster and more accurate detection. The projection of a disparity map in the U-disparity and V-disparity domains can be treated as two histograms of disparity values. From this point, 'the projection of the disparity map in the V/U-disparity domain/space' is generally stated as 'the U/V-disparity'. For the V-disparity, the histogram of each horizontal line is calculated. If the number of pixels sharing the same disparity on each line is represented by brightness, an image in which vertical and road planes are represented by lines is formed. An example of a V-disparity is shown in Fig. 12. The longest diagonal line in the V-disparity represents the road plane. The vertical lines represent the obstacles with vertical planes. The length of each vertical line indicates the height of the corresponding obstacle. The U-disparity is similar to the V-disparity but is the histogram of the columns of the disparity map. Likewise, the U-disparity (as shown in Fig. 12) encodes the horizontal information of obstacles. The road plane is no longer visible but the side surfaces are included and are represented by non-horizontal lines. As carefully analysed in [34], the U–V-disparity includes essential information for most planes that would appear in a road scene. It reduces the time consuming plane detection problem to a much simpler line detection one. However, all objects are included in the histogram and it is difficult to extract an obstacle without the interference of other outliers. Many algorithms detect the road surface first with a flat road assumption. The relationship between the road plane and vertical planes can then be defined to restrict the pixels of interest. The Hough transform has been one of the most commonly used algorithms for U–V-disparity based line detection. However, it has been found in our previous work [38] that the parameters of the Hough transform are very difficult to determine in changing environments. It requires a threshold to binarise the U–V-disparity, another threshold for the Hough space accumulator, a parameter to determine the acceptable point spacing, a line orientation restriction and more parameters involved in the 2D peak detection. Even though the parameters are carefully selected based on a single set of test data, errors are unavoidable if shorter lines (far range obstacles) need to be detected. Including a post-processing step is almost essential if the Hough transform is used. However, implementing this step means including even more parameters. In summary, the Hough transform takes a large amount of computational power and memory space but cannot consistently produce promising results.
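The U- and V-disparity histograms can be built directly from the dense disparity map, as in the sketch below; the array layout (disparity rows for the U-disparity, image rows for the V-disparity) is our choice for illustration.

import numpy as np

def uv_disparity(disp_map, num_disp):
    """Build U- and V-disparity histograms from a dense disparity map.

    u_disp[d, n] counts the pixels of column n having disparity d, and
    v_disp[m, d] counts the pixels of row m having disparity d. Pixels with
    a disparity outside [0, num_disp) are ignored as invalid.
    """
    height, width = disp_map.shape
    u_disp = np.zeros((num_disp, width), dtype=np.int32)
    v_disp = np.zeros((height, num_disp), dtype=np.int32)
    for m in range(height):
        for n in range(width):
            d = int(disp_map[m, n])
            if 0 <= d < num_disp:
                u_disp[d, n] += 1
                v_disp[m, d] += 1
    return u_disp, v_disp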

In our system, obstacle detection is based on the U-disparity and the proposed G-disparity space (which encodes disparity gradient information). The V-disparity is not used to extract the vertical profiles of obstacles. The most important reason to exclude the V-disparity is that the heights of the side planes are difficult to extract from it. Furthermore, errors will be introduced in a special case. If two obstacles with different heights have an identical disparity, the V-disparity will result in a single line with changing intensities. Although the two obstacles can be separated horizontally by analysing the U-disparity, the difference in their heights is very difficult to distinguish. A similar situation will happen to the U-disparity when two obstacles with similar disparity exist vertically. Fortunately, this happens very rarely in traffic scenes. In the proposed approach, the disparity gradient information extracted earlier is used, and the obstacle detection problem is separated into two stages: side plane recognition and vertical plane detection. By doing so, the detection of obstacles can be faster and more accurate.

Fig. 12. Example U-disparity and V-disparity image.

Fig. 14. G-disparity created based on the horizontal disparity gradient of Fig. 13. Gradients close to zero are not shown on the G-disparity.

3.2. Side plane detection

The detection of side planes is a more complicated problem than the vertical plane detection since the orientations of the lines are not known in advance. When the Hough transform is used for line detection, it is almost essential to include the local edge orientation to achieve accurate and fast performance. However, the orientations of the edge points in the U-disparity are not accurate due to the discrete nature of the disparity map (low resolution along the disparity axis). In most cases, disparity changes slowly in the horizontal direction on a side plane. The disparity level only changes every few pixels. In this case, the orientations of the edges corresponding to the side planes are inaccurate and mostly vertical, as shown in Fig. 13. Thus, a range of orientations needs to be allowed, which increases the system complexity. Another problem is that once the detection of small objects is allowed (represented by very short lines on the U-disparity), the horizontal segments of the side lines can easily be confused with small vertical planes.

The proposed obstacle detection system extracts lines representing the side planes before the detection of vertical planes. Once the detected side planes are removed from the disparity map, vertical planes can be very easily extracted. During the disparity calculation stage (discussed in Section 2), the disparities of some of the horizontal segments are selected from D_V. These segment disparity values are fitted with a straight line with a non-zero gradient. They are very likely to be included in the side planes and the gradient of the fitted lines can be useful. However, it is possible that not all pixels from the side planes are correctly chosen from D_V. Therefore, using these pixels to represent the side planes would result in incorrect sizes and positions. In this paper, we propose an approach similar to the generation of the U-disparity. The gradients of the segments corresponding to the same side plane can be aggregated, which results in the proposed G-disparity and allows convenient side line detection.

Fig. 13. A small section of a non-horizontal line.

Defining the disparity gradient map as D_G, the corresponding gradient for each pixel can be represented as D_G(n, m) or D_G(s, j), where n and m represent the column and row indices. As before, s and j represent the jth pixel within a horizontal segment s. A histogram of the disparity gradient can then be calculated for each column of D_G, like the calculation of the U-disparity. The result is called the G-disparity. In most situations, many pixels of D_G are zero, which corresponds to oblique or vertical planes. These pixels are excluded during the G-disparity calculation since the objective is to locate the side planes. An example G-disparity is illustrated in Fig. 14. As the figure shows, similar gradients in the same column are accumulated and form horizontal lines on the G-disparity. The widths and the horizontal positions of these lines are identical to those of the side lines in the U-disparity (as shown in Figs. 14 and 15). This indicates that the gradient and horizontal profiles of side lines in the U-disparity can be extracted by locating the horizontal lines in the G-disparity. In our system, the detection of these horizontal lines is achieved by analysing the pixel connectivities, but other methods can also be applied. Once the horizontal lines are found, the corresponding horizontal positions, widths and gradients of the side lines in the U-disparity space are available. The only unknown parameter of each line on the U-disparity is the line offset. For every horizontal line, a one-dimensional Hough transform with a fixed line orientation is applied to extract the offset. The number of Hough transform executions is equivalent to the number of horizontal lines on the G-disparity space. Each time, only a vertical band of the U-disparity is used. This vertical band is determined by the position and width of the corresponding horizontal line on the G-disparity. Unlike applying the Hough transform directly to the U-disparity, the proposed algorithm finds the horizontal profile and line orientation prior to the use of the Hough transform. Another important fact is that only the offset corresponding to the maximum in the Hough space needs to be selected. During the whole process, no manually selected parameter is needed except a small threshold T_U on the U-disparity for binarisation. The determination of this threshold is based on the performance of the disparity calculation algorithm and the size of the smallest detectable obstacle. In our system, it has been chosen to be 10 so that most of the pixels are preserved (this means a small object with a height of only 10 pixels will be included). A side line detection result is shown in Fig. 15. The resultant line detection is fast, accurate and requires only one parameter, which can be easily determined.
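A sketch of the G-disparity construction is given below; the gradient bin edges and the near-zero cut-off are hypothetical choices, since the paper only states that near-zero gradients are excluded.

import numpy as np

def g_disparity(grad_map, grad_bins, zero_tol=1e-3):
    """Column-wise histogram of the horizontal disparity gradient (G-disparity).

    grad_map is the per-pixel gradient map D_G; grad_bins is an array of bin
    edges used to quantise the gradient. Gradients within zero_tol of zero
    (oblique or vertical planes) are dropped before accumulation.
    """
    height, width = grad_map.shape
    g_disp = np.zeros((len(grad_bins) - 1, width), dtype=np.int32)
    for n in range(width):
        column = grad_map[:, n]
        column = column[np.abs(column) > zero_tol]
        g_disp[:, n], _ = np.histogram(column, bins=grad_bins)
    return g_disp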

The vertical profiles of the side planes are detected based on the original disparity map. For each side line, the indicated disparities are used for side plane detection. The height of a plane is detected by vertically locating the locally connected pixels with approximately identical disparity values. The result of a side plane detection is shown in Fig. 17, where the side planes are highlighted in yellow. As the figure illustrates, the side planes are detected accurately.

3.3. Vertical plane detection

Once the side planes are detected, all the pixels belonging to the sides are removed from the disparity map. The U-disparity is recalculated and now contains almost only the horizontal lines. This step is very important since significantly fewer outliers are included during vertical plane detection. The same threshold T_U is applied to the U-disparity for binarisation.

For each row d, the adjacent pixels are grouped to form a line. If the distance between two valid pixels is smaller than T_L, they are treated as being on the same line. This parameter is related to T_U, since if more pixels are preserved, a lower T_L is needed to connect the points together. Thus, this relationship is assumed to be linear, as shown below:

T_L = c \cdot T_U    (14)

It has been found by experiment that setting c = 0.4 is a suitable choice for our test data.

Fig. 15. Side line detection result based on the G-disparity and U-disparity image.

Fig. 17. Final obstacle detection result. The side and vertical planes are marked in yellow and red respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Finally, the lines on row d that are shorter than an object size threshold T_S(d) (shown in Eq. (15)) are removed. This is a linear equation, so T_S changes from small to large according to the disparity. This is because objects appear smaller in the far-field and larger in the near-field; the size threshold should therefore also change with depth.

T_S(d) = T_S(0) + d \cdot \frac{T_S(d_{max}) - T_S(0)}{d_{max}}    (15)

where T_S(0) = 5 and T_S(d_{max}) = 20 define the size thresholds at the furthest and nearest distances respectively. These size limits are very small, which allows the detection of small objects, and it is extremely rare that an obstacle in the road scene is smaller than these limits. All of the above threshold choices are based on a 640 × 300 input image with 30 disparity levels. If the resolution of the input data changes significantly, these thresholds need to be tuned accordingly. This is reasonable since all these parameters are in pixel units; identical objects will be represented by more pixels in higher resolution images. The result of this line selection process is shown in Fig. 16. As we can see, the detection is accurate due to the exclusion of outliers.
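The two thresholds of Eqs. (14) and (15) are simple enough to write down directly; the sketch below uses the values quoted for the 640 × 300, 30-level configuration.

def gap_threshold(t_u, c=0.4):
    """Maximum pixel gap T_L allowed when grouping a U-disparity row (Eq. (14))."""
    return c * t_u

def size_threshold(d, d_max, t_far=5.0, t_near=20.0):
    """Minimum accepted line length T_S(d) on U-disparity row d (Eq. (15)).

    Small (far) disparities use the small threshold, large (near) disparities
    the large one.
    """
    return t_far + d * (t_near - t_far) / d_max

# Example: with 30 disparity levels, a line at d = 15 must span at least
# size_threshold(15, 30) = 12.5 pixels to be kept.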

The height and vertical position of a vertical plane are extracted based on the vertical image band defined by the horizontal line obtained earlier. Furthermore, the corresponding d of the line indicates that only pixels with the same d belong to the obstacle of interest. Therefore, a one-dimensional accumulator can be used to store the number of pixels having the same d within the vertical region. A similar pixel connecting process is carried out as described for the detection of horizontal lines. If more than one line is detected, extra potential obstacles are created to allow the detection of multiple vertically positioned obstacles having an identical disparity value.

The final obstacle detection result is shown in Fig. 17. The proposed algorithm uses a minimum number of parameters, which are easy to determine, to detect both small and large obstacles, and it identifies the side and vertical planes quickly and accurately. It does not rely on prior road plane detection and successfully avoids the disadvantages of applying the Hough transform.

4. Free-space calculation

In this section, we include a free-space calculation algorithm to illustrate the benefit of the proposed obstacle detection approach. It is important to note that our aim is to detect all the image areas that fit with the road surface and are free of any obstacles. Therefore, some off-road areas and road surfaces beneath obstacles could be included in the results. With the obstacles removed from the disparity map, the free-space calculation becomes a much easier task. The remaining areas are mostly occupied by the road, the sky and some other far-field objects. The most significant advantage of the system is that the road surface is not assumed to be a very large area. Also, the number of outliers involved during model fitting is reduced to a minimum, which enables easier and faster computation. The vertical profile of the road surface is modelled using a spline, as proposed in [35]. This flexible model has very few restrictions and therefore allows most road vertical profiles to be accurately represented.

Fig. 16. Horizontal line detection results. Detected lines are represented by red lines. The extreme far-field is not included in the detection. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

The first step is to pick the road region from the disparity map. In order to remove the area of sky, the furthest possible distance of the road to be modelled is limited to 100 m. This is the case where no obstacle is ahead of the vehicle. If an obstacle blocks the view of the far-distance road surface, the distance of that obstacle is used as the furthest modelling distance.

An example of a disparity map without obstacles and extreme far-field pixels is shown in Fig. 18. Since the nearest road surface pixels always lie on the lowest row of the image, the search for the road surface starts expanding from the centre (which can vary depending on the mounting position of the camera) of the lowest line. The width of the area is limited by any discontinuities of valid pixels. From the lowest line upwards, the search stops once no valid pixels can be found in the upper row or the number of pixels on the row is smaller than a threshold. This threshold is set to confirm that enough pixels are available for the following steps. In our experiments, the threshold is chosen to be 60 while the image width is 640.

Then the V-disparity is constructed based on the selected road area. The resultant image contains a clear line corresponding to the road, as shown in Fig. 19. On each row of the V-disparity, the $d$ corresponding to the maximum intensity is selected to represent the road disparity. This process reduces the number of pixels involved during line fitting to the minimum and removes any potential outliers.
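A compact sketch of building the V-disparity from the selected road area and picking the per-row maximum is given below; it assumes integer disparities in $[0, d_{max}]$ and uses hypothetical variable names.

```python
import numpy as np

def road_v_disparity(disparity, road_mask, d_max):
    """Accumulate a V-disparity histogram over the road mask and select,
    for every image row, the disparity with the highest count."""
    h = disparity.shape[0]
    v_disp = np.zeros((h, d_max + 1), dtype=np.int32)
    rows, cols = np.nonzero(road_mask & (disparity > 0))
    np.add.at(v_disp, (rows, disparity[rows, cols].astype(int)), 1)

    road_rows = np.flatnonzero(v_disp.sum(axis=1) > 0)
    road_d = v_disp[road_rows].argmax(axis=1)   # max-intensity disparity per row
    return v_disp, road_rows, road_d
```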

A least squares line fit is then performed based on the data points extracted earlier. The B-spline equation is defined as:

$$X(t) = \sum_i B_{i,o}(t)\, Q_i \qquad (16)$$

where $X(t)$ represents the curve defined by the B-spline basis functions $B$ and control points $Q$. The degree of the spline is represented by $o$. $t \in [t_o, t_{K-o-1}]$ denotes sampling points on the curve, where $K$ is the number of knots. The control point index is represented by $i$. In our system, a cubic B-spline is used ($o = 3$).



Fig. 18. Disparity map without the obstacles and extreme far-field pixels.

Fig. 19. V-disparity based on disparity without the obstacles and extreme far-field pixels.

Fig. 21. Free-space calculation result. The free-space is highlighted in green and the obstacles are highlighted in red. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


The least squares algorithm finds the minimum of the summed squared error between the spline and the data points by setting the derivative of the error function to zero. The resultant function is shown in Eq. (17).

$$\mathbf{B}^{T}\mathbf{B}\,Q - \mathbf{B}^{T}P = 0 \qquad (17)$$

where $\mathbf{B}$ contains all basis-function components $B_{i,o}$, $Q$ is a matrix containing all control points $Q_i$, and $P$ represents the data points. The control points $Q$ can then be calculated. The advantage of the least squares method is that a global minimum can be guaranteed. However, the algorithm can be sensitive to outliers since the residuals of all points are involved in the calculation. This does not cause any issue in our case since all data points are carefully sampled from the road surface. The result of curve fitting based on Fig. 19 is shown in Fig. 20. The chosen disparity $d$ at each image row $m$ is illustrated as small black circles, and the blue line is the spline fitted to these points. As this figure shows, although the resolution of $d$ is low, the fitted line is accurate and represents a flat road. More spline fitting results can be found in the supplementary data.

Fig. 20. Result of least squares spline fitting. The selected points are represented by small circles and the fitted spline is shown in blue. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

The original disparity map is compared to the disparities obtained from the vertical road profile. The initial selection of the road area is expanded if the horizontally adjacent non-road area contains the same disparity level as the road profile. At far distances where the road disparity is not modelled, the spline is linearly extrapolated based on the gradient at the minimum of the modelled $d$. This step enables the road area to be extended to very far distances. Finally, morphological processing (erosion and dilation) is applied to the road area to remove expansion errors. The final free-space calculation result is shown in Fig. 21. As the figure shows, the free-space is detected correctly up to a very far distance.
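A minimal sketch of this expansion and clean-up step is given below, assuming a per-row disparity tolerance of ±1 and the default structuring element of scipy.ndimage for the morphological opening; neither value is specified above, and the fitted spline is assumed to be a SciPy BSpline as in the previous sketch.

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def expand_free_space(disparity, spline, road_rows, tol=1):
    """Label as free space every pixel whose disparity matches the modelled
    road profile, then remove expansion errors by a morphological opening
    (erosion followed by dilation)."""
    h, _ = disparity.shape
    profile = np.full(h, np.nan)
    profile[road_rows] = spline(road_rows)

    # Linear extrapolation towards the far field, using the spline gradient
    # at the smallest modelled disparity (the top of the fitted range).
    top = int(road_rows.min())
    grad = float(spline.derivative()(top))
    for r in range(top - 1, -1, -1):
        profile[r] = profile[r + 1] - grad

    free = (disparity > 0) & (np.abs(disparity - profile[:, None]) <= tol)
    return binary_dilation(binary_erosion(free))
```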

5. Experimental results

5.1. Disparity calculation results

Although the proposed disparity calculation algorithm is specifically designed to work on traffic scenes, its performance was also evaluated using the stereo images provided by Middlebury.2 These stereo image sets are very commonly used for result evaluations and the ground truths are also available. It is worth noting that these test images contain rich textures and are very different from road scenes. These textures produce a large amount of edges, which leads to serious over-segmentation for edge based approaches. This further reduces the performance of the occlusion handling part of the proposed algorithm (the occluded areas are not grouped with the non-occluded areas, so they cannot be corrected). Ideally, a texture-based segmentation algorithm should be used instead of an edge based one. However, since the focus of the proposed algorithm is on road scenarios with the smallest amount of computation, edge detection is still used for segmentation during the evaluation.

Four sets of images that are regarded as standard evaluation data are used in our tests. During the tests, the parameter settings remain constant and the results are compared with the ground truth. Performance evaluation is conducted based on the bad-pixel percentage $\mu$ as shown below [26]:

$$\mu = \frac{1}{MN}\sum_{m,n}\big(|D(n,m) - D_T(n,m)| > \delta_d\big) \qquad (18)$$

where $D$ is the disparity map and $D_T$ represents the ground truth. $\delta_d$ is a threshold controlling the error tolerance. In the experiments, $\delta_d$ is set to 1 as it is the default choice for the published Middlebury evaluation results. The proposed algorithm does not estimate the disparities on the left-most side of the reference image, since this section might not be included in the right image. Apart from this region, all the available pixels in the ground truth are used instead of only the non-occluded areas.
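The metric of Eq. (18) is straightforward to compute; a sketch follows, where the evaluation over valid ground-truth pixels only and the optional left-border exclusion are our reading of the text above.

```python
import numpy as np

def bad_pixel_percentage(disp, gt, delta_d=1.0, left_margin=0):
    """Percentage of pixels whose absolute disparity error exceeds delta_d,
    evaluated where ground truth is available (gt > 0) and outside the
    unestimated left border of the reference image."""
    valid = gt > 0
    if left_margin:
        valid[:, :left_margin] = False
    errors = np.abs(disp[valid] - gt[valid]) > delta_d
    return 100.0 * errors.mean()
```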

2 Middlebury Computer Vision website. (http://vision.middlebury.edu/stereo/).



Fig. 22 illustrates the test images, the disparity map calculated with the proposed approach and the difference image with respect to the ground truth.

As the figure shows, most of the errors are concentrated on the edge areas and are caused by occlusions and interpolations. The disparities within the texture-less regions are accurate and smooth. The error percentage is shown in Table 1 along with a comparison with several top ranked algorithms in Middlebury. The parameters for the listed algorithms are manually tuned so that optimum performance can be achieved, whereas the same parameters are used for all four images in our test. According to the error rate, the proposed algorithm would not be placed at the top of the chart in the Middlebury evaluation rank, since many algorithms are specifically tuned for the Middlebury data sets. Nevertheless, the performance of the proposed algorithm is still higher than the traditional Winner-Take-All (WTA) methods (e.g. SSD + MF [39]), and it achieved a lower error rate in all four tests. The only algorithm employing WTA that stays at the top of the Middlebury chart is [40]. However, this algorithm utilises a Belief Propagation (BP) based technique for cooperative optimisation [40]. In fact, most of the top ranked methods in Middlebury are BP based algorithms (e.g. [41–44]). When compared to BP based approaches, the proposed algorithm achieved a relatively higher error rate in the first three test sets. However, the error rate of 'Cones' came very close to the results produced using BP based methods. It is worth noting that Middlebury tests are not truly indicative of the performance of an algorithm on real-world traffic scenes: as also suggested by [28,45], performance on such indoor scenes and on real-world traffic scenes can be completely different. This explains the fact that the proposed algorithm achieved a competitive performance on Cones, since the large number of horizontal and vertical surfaces within that scene demonstrates high similarity to a real-world traffic scene.

Fig. 22. Disparity calculation results based on the proposed algorithm using the test image sets available on the Middlebury Computer Vision Pages. The first column shows the input images. The disparity calculation results are shown in the second column. The third column illustrates the difference maps between the resultant disparity images and the ground truths. All differences below or equal to $\delta_d = 1$ are set to zero.


Table 1
Percentage of 'bad-pixels' evaluation using a selection of top 10 ranking Middlebury methods and the proposed algorithm.

Method                 Tsukuba (%)   Venus (%)   Teddy (%)   Cones (%)   Average (%)
AdaptingBP [43]        1.37          0.21        7.06        7.92        4.14
CoopRegion [40]        1.16          0.21        8.31        7.18        4.21
DoubleBP [44]          1.29          0.45        8.30        8.78        4.70
RDP [42]               1.39          0.38        9.94        7.69        4.85
BP [22]                5.50          3.28        9.60        8.01        6.60
Proposed Algorithm     4.53          1.12        14.11       8.29        7.01
SSD + MF [39]          7.07          5.16        24.8        19.8        14.2

Fig. 23. Stereo vision results based on different traffic scenes.


The Middlebury images were tested here in order to compare the performance of the proposed algorithm with existing algorithms in well controlled situations.

Following the testing under well controlled indoor environments, the real-world performance of the proposed disparity calculation algorithm was evaluated based on the stereo image sets provided on Kitti.3 The results obtained with the proposed approach were compared to the ones obtained with the traditional Winner-Take-All (WTA) [26] and one of the simplest BP methods [22], both of which are based on the NCC cost function. It is not suitable to compare against results achieved using the SAD cost function, since the test images contain large intensity variations.

3 Kitti Vision Benchmark.

Because the WTA technique only uses local information, comparison with the WTA optimisation results is the most appropriate way of illustrating the improvements achieved by the proposed algorithm. BP based approaches are high performance global optimisation techniques. Many algorithms based on BP have achieved leading performance in the Middlebury stereo evaluations as well as in comparisons using real-world traffic images [46]. Our implementation is based on the same algorithm tested in [46] but with the NCC cost function instead of the census transform. The authors in [28] also illustrated that BP performs better on the edge map than on the original image. In our experiments, both types of input images achieved a similar performance due to the fact that the NCC calculation is insensitive to brightness changes.

All the cost calculations for the compared algorithms (the WTA example, the BP example and the proposed algorithm) are based on the same 5 × 5 block size.


Fig. 24. Stereo vision results based on different traffic scenes.


The WTA approach does not require any parameter settings. It is certainly the fastest, but a large amount of errors is introduced in the homogeneous and occlusion regions. The BP algorithm contains a few parameters mainly aimed at tuning the overall smoothness of the disparity map. These parameters are not easy to determine and the final result is sensitive to modifications. In our implementation of BP, a set of optimum parameters is selected based on experiments in order to balance between depth smoothness and accuracy. For the proposed approach, the required parameters are discussed in detail in the previous sections. They are straightforward to determine and not as sensitive as the ones included in the BP approach.
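For reference, a minimal sketch of a WTA baseline of this kind is given below: a per-pixel NCC cost over 5 × 5 blocks, keeping the disparity with the highest correlation. It is an illustration under stated assumptions (np.roll wraps around at the image border, which is then masked out), not the exact implementation used in these experiments.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def wta_ncc(left, right, d_max, win=5):
    """Baseline WTA disparity map: per-pixel NCC cost over win x win blocks,
    keeping the disparity with the highest correlation."""
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    best = np.full(left.shape, -np.inf)
    disparity = np.zeros(left.shape, dtype=np.int32)

    def stats(img):
        mean = uniform_filter(img, win)
        var = uniform_filter(img * img, win) - mean ** 2
        return mean, np.sqrt(np.maximum(var, 1e-6))

    mu_l, sd_l = stats(left)
    for d in range(d_max + 1):
        shifted = np.roll(right, d, axis=1)          # wraps at the border
        mu_r, sd_r = stats(shifted)
        ncc = (uniform_filter(left * shifted, win) - mu_l * mu_r) / (sd_l * sd_r)
        better = ncc > best
        best[better] = ncc[better]
        disparity[better] = d
    disparity[:, :d_max] = 0                         # left border is unreliable
    return disparity
```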

Some of the experimental results are shown in Figs. 23–25. The test images shown in this section are some of the most typical from the sequences. Most of these scenes contain large homogeneous and occluded areas along with large camera gain differences. In the evaluations of the three algorithms, the proposed algorithm achieved a comparable performance to the BP approach, and much better disparity maps are obtained when compared to the ones based on the WTA optimisation. The proposed algorithm has a partial occlusion handling capability. However, unlike the BP approach, the occlusion handling strength cannot be controlled by parameters. The performance of the proposed algorithm in homogeneous regions is the most impressive. The errors are minimised by selecting the horizontal segment disparity for each individual pixel. Meanwhile, smooth disparity transitions are allowed by incorporating the vertical segment disparities.

As a result, accurate and high resolution disparity maps can be obtained. The proposed algorithm also provides important gradient information to allow the simplification of side plane recognition during obstacle detection. Finally, the proposed algorithm is a line based method, which is much faster than the BP approach and most of the global optimisation algorithms. The implementation of the proposed algorithm is in Matlab. Apart from the cost calculation, optimisation based on BP takes 92.6 seconds and the proposed algorithm takes 7.05 seconds on average with an i7 CPU. It is important to note that Matlab implementations are not accurate measurements of the algorithm speed; the processing times are shown here only for indication purposes.

As Figs. 23–25 show, compared with the other two, the proposed algorithm achieved the best performance in road areas. The disparities on the road surface are accurate and smooth. For the vertical planes, both the proposed algorithm and the BP achieved high performance. The proposed algorithm is also able to distinguish the side planes and produce smooth disparity transitions in these areas. In the occluded areas, the proposed algorithm demonstrated a high degree of handling ability if the occluded areas are within a large segment with more non-occluded pixels. However, if this condition is violated, the occluded areas will introduce errors. Generally, the BP exhibits a stronger ability to correct the disparities in occluded areas. However, the strength of occlusion handling is controlled by adjusting the general smoothness for the whole disparity image.


Fig. 25. Stereo vision results based on different traffic scenes.

4 Kitti Vision Benchmark. (http://www.cvlibs.net/datasets/kitti/).


It is difficult for the BP to remove all occlusions while not producing errors in other areas. Both the proposed algorithm and the BP approach are able to produce accurate disparities up to far distances. The performance of the BP approach relies on parameter tuning, whereas the proposed algorithm is less constrained. Overall, it is fair to say that the proposed approach has achieved significantly better results than the WTA method in real world traffic scenes. In comparison with the global optimisation algorithm, BP, the proposed algorithm also demonstrated advantages in terms of parameter sensitivity, speed and calculation accuracy in the non-occluded areas.

While the proposed algorithm demonstrates a more impressive performance compared to general disparity calculation approaches, further evaluations were carried out based on the stereo image sets from the Kitti vision benchmark. The Kitti evaluation results are produced by stereo vision algorithms specially designed and tuned for real-world traffic scenes, so the proposed algorithm can be compared with other 'road traffic specialists'. The proposed algorithm computed disparity maps for the 195 Kitti test image pairs. These test sequences contain challenging illumination conditions, high image resolution and large pixel displacements. The results were evaluated against the ground truth disparity maps generated by a Velodyne HDL-64E laser scanner. The Kitti evaluation chart ranks all methods according to the number of non-occluded erroneous pixels at the specified disparity error threshold. All methods providing less than 100% density have been interpolated using simple background interpolation.

All test pairs were processed using the same parameters in order to reflect the adaptability of the tested algorithm across various real-world traffic scenarios. All details of the listed methods can be found on Kitti.4 The main evaluation ranking is decided according to the number of non-occluded erroneous pixels at the 3-pixel error threshold. Table 2 shows the performance evaluation of the proposed system compared with methods from the Kitti stereo benchmark. In Table 2, the evaluation categories are Out-Noc (percentage of erroneous pixels in non-occluded areas), Out-All (percentage of erroneous pixels in total), Avg-Noc (average disparity error in non-occluded areas) and Avg-All (average disparity error in total). It is worth noting that some methods (e.g. PCBP-SS) receive support from spatio-temporal information rather than purely stereo information.
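The four Kitti categories listed in Table 2 can be computed as sketched below, where noc_mask marks non-occluded ground-truth pixels and the 3-pixel threshold is the benchmark default; the function name and masking details are our assumptions.

```python
import numpy as np

def kitti_stats(disp, gt, noc_mask, tau=3.0):
    """Out-Noc / Out-All (percentage of erroneous pixels) and Avg-Noc /
    Avg-All (mean absolute disparity error), evaluated only where the
    ground truth exists (gt > 0)."""
    err = np.abs(disp - gt)
    valid_all = gt > 0
    valid_noc = valid_all & noc_mask

    def stats(mask):
        e = err[mask]
        return 100.0 * np.mean(e > tau), float(e.mean())

    out_noc, avg_noc = stats(valid_noc)
    out_all, avg_all = stats(valid_all)
    return out_noc, out_all, avg_noc, avg_all
```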

During the early stages of the algorithm development, the proposed algorithm was coded and tested in a Matlab environment. However, in the Kitti benchmark evaluation chart, all the top ranked methods have their results obtained in a C/C++ environment. In order to present a clearer processing speed comparison with other algorithms in the Kitti benchmark evaluation chart, a C/C++ version of the proposed algorithm has been coded and has undergone the same set of evaluation image sequences. The repetitive nature of the core section of the proposed algorithm results in a much lower processing time in C than in its Matlab version. The Matlab version of the proposed algorithm takes 4 min on average to compute a single disparity image on a PC with a single-core i7 at 2.8 GHz.


Table 2
Performance evaluation of the proposed system compared with methods from the Kitti stereo benchmark.

Rank  Method              Out-Noc (%)  Out-All (%)  Avg-Noc (px)  Avg-All (px)  Runtime   Environment
1     SceneFlow           2.98         3.86         0.8           1.0           6 min     Matlab + C/C++
2     PCBP-SS             3.40         4.72         0.8           1.0           5 min     Matlab + C/C++
3     gtRF-SS             3.83         4.59         0.9           1.0           1 min     Matlab + C/C++
4     StereoSLIC          3.92         5.11         0.9           1.0           2.3 s     C/C++
5     PR-Sf+E             4.02         4.87         0.9           1.0           200 s     Matlab + C/C++
6     PCBP                4.04         5.37         0.9           1.1           5 min     Matlab + C/C++
7     PR-Sceneflow        4.36         5.22         0.9           1.1           150 s     Matlab + C/C++
8     wSGM                4.97         6.18         1.3           1.6           6 s       C/C++
9     ATGV                5.02         6.88         1.0           1.6           6 min     Matlab + C/C++
10    rSGM                5.03         6.60         1.1           1.5           0.3 s     C/C++
31    linBP               8.56         10.70        1.7           2.7           1.6 min   C/C++
32    S+GF                9.03         11.21        2.1           3.4           140 s     C/C++
33    SM_GPTM             9.79         11.38        2.1           2.6           6.5 s     C/C++
34    LAMC-DS             9.82         11.49        2.1           2.7           10.8 min  Matlab
35    Two-step            9.91         11.30        1.7           1.9           7 min     Matlab + C/C++
36    MS-DSI              10.54        11.98        1.9           2.2           10.3 s    C/C++
37    SDM                 10.95        12.14        2.0           2.3           1 min     C/C++
38    GF                  11.65        13.76        4.5           5.6           120 s     C/C++
39    BSM                 11.74        13.44        2.2           2.8           2.5 min   C/C++
40    GCSF                12.05        13.24        1.9           2.1           2.4 s     C/C++
41    OCV-BM-post         12.28        13.76        2.1           2.3           0.1 s     C/C++
42    GCS                 13.38        14.54        2.1           2.3           2.2 s     C/C++
43    Proposed Algorithm  16.73        18.49        3.4           4.3           5 s       C/C++
44    MPA-1               18.92        20.84        4.9           6.3           4 min     Matlab
45    CostFilter          19.99        21.08        5.0           5.4           4 min     Matlab

Fig. 26. Comparison of stereo vision results computed by ELAS, SNCC and the proposed algorithm; the left input image, the disparity error map and the estimated (and interpolated) disparity are shown. The error map scales linearly between 0 (black) and ≥5 (white) pixels of error. Red denotes all occluded pixels, falling outside the image boundaries. The false colour map is scaled to the largest ground truth disparity value. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 27. Comparison of stereo vision results computed by ELAS, SNCC and the proposed algorithm; the left input image, the disparity error map and the estimated (and interpolated) disparity are shown. The error map scales linearly between 0 (black) and ≥5 (white) pixels of error. Red denotes all occluded pixels, falling outside the image boundaries. The false colour map is scaled to the largest ground truth disparity value. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)



Fig. 28. Obstacle detection results based on different image sequences. The side planes and the vertical planes are highlighted in yellow and red respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 29. Line detection result based on the input image shown in Fig. 23(e). (a) illustrates the Hough transform detection result and (b) shows the line detection result based on the proposed approach. The image is interpolated vertically so that the line gradient can be shown.


The C version of the proposed algorithm runs on the same PC with the same set of parameters, but the processing time is dramatically cut down to 5 s. Furthermore, it produces more accurate results, with 16.73% erroneous pixels in non-occluded areas, compared with 19.04% for the Matlab version. Although the accuracy of the proposed algorithm is not as impressive as that of the top ranked algorithms on Kitti, it demonstrates a very short processing time, which is critical for applications in the automotive industry. By employing a state-of-the-art Digital Signal Processor such as the TI C6678, with the strength of multi-core processing, a real-time processing speed of 18–20 frames per second is predicted.

Some of the testing results are shown in Figs. 26 and 27. SNCC [47] employs a cost function based on a modified version of the Normalised Cross-Correlation (NCC) calculation, which is ideal for comparison with the proposed algorithm. ELAS [48] is one of the top performing methods for scenes containing a large amount of homogeneous surfaces; it has been used to evaluate the performance of the proposed algorithm in homogeneous areas. Fig. 26 demonstrates a typical street scene with a wide road and vehicle side surfaces. The proposed algorithm produces disparity results on the road surface and vehicle side surfaces that are as complete as those of ELAS. SNCC, despite employing a more complex cost function calculation, had a similar performance to the proposed algorithm, while its smoothness suffered in pavement areas. In Fig. 27, all three compared algorithms demonstrated good performance. However, the small tree area was given faulty disparity values by ELAS and was nearly completely missed by SNCC, whereas the proposed algorithm retained good accuracy in local disparity calculations while still performing competitively in large homogeneous regions.

5.2. Obstacle detection results

The proposed obstacle detection algorithm utilises the disparity gradient and simplifies the detection of side planes by constructing the proposed G-disparity. The obstacle detection results shown in Fig. 28 are obtained based on the disparity maps calculated using the proposed algorithm shown in Figs. 23–25. As the figures show, the proposed algorithm is able to clearly identify side planes and vertical planes, and both far-field and near-field obstacles can be quickly detected. Fig. 29 illustrates a comparison between the proposed approach and the Hough transform line detection in the U-disparity space. The Hough transform is based on peak detection instead of thresholding in order to avoid severe over-detection.


Fig. 30. Candidate free-space detection results. The free-spaces and obstacles are highlighted in green and red respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


The parameters for each image are tuned carefully (different parameters for different images) so that optimum results can be achieved, whereas the parameters of the proposed approach are fixed for all test data. It can be seen that the proposed line detection approach successfully minimises the outliers included in the operation and the detection result is more accurate. Once the side lines are removed from the U-disparity, the problem is simplified to the detection of horizontal lines. For the Hough transform, parameters are tuned in order to detect the small object in the centre of Fig. 29. However, detection errors still exist in other regions. The experimental results also demonstrate that the proposed algorithm is less sensitive to parameter changes compared with the Hough transform based approach.

5.3. Free-space calculation results

By excluding the obstacles from the disparity image, the vertical road profile can be easily determined using least squares line fitting. Fig. 20 shows a line fitting result based on the spline model. The data points are correctly sampled from the road surface and the vertical road gradient changes are accurately estimated. The final free-space is calculated based on the initial road area selection and the vertical profile. Some of the detection results are shown in Fig. 30. As the figure shows, free-spaces can be correctly calculated in various conditions. Even when the foreground road area is small, the detection result is still accurate. Detecting the free-space after obstacle detection is a much easier job where no assumption is needed.

6. Conclusion

In this paper, a novel stereo vision based obstacle detection system is presented. For the disparity calculation, a multi-pass horizontal and vertical aggregation process allows accurate cost evaluation in textureless regions. The algorithm also has a good partial occlusion handling capability.

An accurate horizontal disparity gradient is extracted during disparity optimisation, which simplifies the detection of side planes. In our experiments, the proposed stereo vision algorithm achieved comparable performance to the BP based global optimisation technique but at a much lower complexity. Based on the calculated disparity and disparity gradient, the proposed G-disparity can be constructed and used in conjunction with the U–V-disparity images to achieve efficient and accurate detection. Finally, vertical profiles of road surfaces are modelled with splines and the free-space calculation is performed. By removing the detected obstacles from a disparity map, the free driving spaces can be conveniently located even in the presence of large obstacles in the near-field. For future work, the possibility of using temporal information for stabilisation will be explored.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.cviu.2014.02.014.

References

[1] X. Ai, R. Nock, J.G. Rarity, N. Dahnoun, High-resolution random-modulation cw lidar, Appl. Opt. 50 (2011) 4478–4488.

[2] I. Ulrich, I. Nourbakhsh, Appearance-based obstacle detection with monocular color vision, in: Proceedings of the National Conference on Artificial Intelligence, AAAI Press; MIT Press, Menlo Park, CA; Cambridge, MA; London, 1999, 2000, pp. 866–871.

[3] W. von Seelen, C. Curio, J. Gayko, U. Handmann, T. Kalinke, Scene analysis and organization of behavior in driver assistance systems, in: International Conference on Image Processing, vol. 3, 2000, pp. 524–527. http://dx.doi.org/10.1109/ICIP.2000.899483.

[4] S. Buluswar, B. Draper, Color machine vision for autonomous vehicles, Eng. Appl. Artif. Intell. 11 (2) (1998) 245–256.

[5] E. Dickmanns, R. Behringer, D. Dickmanns, T. Hildebrandt, M. Maurer, F. Thomanek, J. Schiehlen, The seeing passenger car 'vamors-p', in: Proceedings of the IEEE Intelligent Vehicles Symposium, IEEE, 1994, pp. 68–73.

[6] C. Goerick, D. Noll, M. Werner, Artificial neural networks in real-time car detection and tracking applications, Pattern Recogn. Lett. 17 (4) (1996) 335–343.

[7] T. Kalinke, C. Tzomakas, W. Seelen, A texture-based object detection and an adaptive model-based classification, in: Proc. IEEE Intelligent Vehicles Symposium '98, 1998.


[8] A. Giachetti, M. Campani, V. Torre, The use of optical flow for road navigation, IEEE Trans. Robot. Autom. 14 (1) (1998) 34–48.

[9] H. Mallot, H. Bulthoff, J. Little, S. Bohrer, Inverse perspective mapping simplifies optical flow computation and obstacle detection, Biol. Cybernet. 64 (3) (1991) 177–185.

[10] W. Kruger, W. Enkelmann, S. Rossle, Real-time estimation and tracking of optical flow vectors for obstacle detection, in: Proceedings of the Intelligent Vehicles '95 Symposium, IEEE, 1995, pp. 304–309.

[11] M. Bertozzi, A. Broggi, GOLD: a parallel real-time stereo vision system for generic obstacle and lane detection, IEEE Trans. Image Process. 7 (1998) 62–81.

[12] A. Broggi, C. Caraffi, R. Fedriga, P. Grisleri, Obstacle detection with stereo vision for off-road vehicle navigation, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition – Workshops (CVPR Workshops), IEEE, 2005, pp. 65–65.

[13] R. Labayrade, D. Aubert, J. Tarel, Real time obstacle detection in stereovision on non flat road geometry through v-disparity representation, in: Proceedings of the IEEE Intelligent Vehicle Symposium, vol. 2, IEEE, 2002, pp. 646–651.

[14] S. Nedevschi, R. Danescu, D. Frentiu, T. Marita, F. Oniga, C. Pocol, R. Schmidt, T. Graf, High accuracy stereo vision system for far distance obstacle detection, in: Proceedings of the IEEE Intelligent Vehicles Symposium, 2004, pp. 292–297. http://dx.doi.org/10.1109/IVS.2004.1336397.

[15] T. Williamson, A high-performance stereo vision system for obstacle detection, Ph.D. thesis, Robotics Institute, Carnegie Mellon University, 1998.

[16] K. Huh, J. Park, J. Hwang, D. Hong, A stereo vision-based obstacle detection system in vehicles, Opt. Lasers Eng. 46 (2) (2008) 168–178.

[17] S. Carlsson, Object detection using model based prediction and motion parallax, in: Computer Vision – ECCV 90, Springer, 1990, pp. 297–306.

[18] W. Enkelmann, Obstacle detection by evaluation of optical flow fields from image sequences, in: Computer Vision – ECCV 90, Springer, 1990, pp. 134–138.

[19] A. Yilmaz, X. Li, M. Shah, Contour-based object tracking with occlusion handling in video acquired using mobile cameras, IEEE Trans. Pattern Anal. Machine Intell. 26 (11) (2004) 1531–1536.

[20] W. Kruger, W. Enkelmann, S. Rossle, Real-time estimation and tracking of optical flow vectors for obstacle detection, in: Proceedings of the Intelligent Vehicles Symposium, 1995, pp. 304–309. http://dx.doi.org/10.1109/IVS.1995.528298.

[21] H. Hirschmüller, Accurate and efficient stereo processing by semi-global matching and mutual information, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, 2005.

[22] J. Sun, N. Zheng, H. Shum, Stereo matching using belief propagation, IEEE Trans. Pattern Anal. Machine Intell. 25 (2003) 787–800.

[23] V. Kolmogorov, R. Zabih, Computing visual correspondence with occlusions using graph cuts, in: Eighth IEEE International Conference on Computer Vision, IEEE Computer Society, 2001.

[24] P. Belhumeur, A Bayesian approach to binocular stereopsis, Int. J. Comput. Vis. 19 (3) (1996) 237–260.

[25] Y. Deng, X. Lin, A fast line segment based dense stereo algorithm using tree dynamic programming, in: Computer Vision – ECCV 2006, Springer, 2006, pp. 201–212.

[26] D. Scharstein, R. Szeliski, A taxonomy and evaluation of dense two-frame stereo correspondence algorithms, Int. J. Comput. Vis. 47 (1) (2002) 7–42.

[27] J. Kalomiros, J. Lygouras, Hardware principles for the design of a stereo-matching state machine based on dynamic programming, J. Eng. Sci. Technol. Rev. 1 (2008) 19–24.

[28] S. Guan, R. Klette, Belief-propagation on edge images for stereo analysis of image sequences, in: Proceedings of the 2nd International Conference on Robot Vision, 2008, pp. 291–302.

[29] D. Scharstein, Matching images by comparing their gradient fields, in: Proceedings of the 12th IAPR International Conference on Pattern Recognition, vol. 1, IEEE, 1994, pp. 572–575.

[30] R. Zabih, J. Woodfill, Non-parametric local transforms for computing visual correspondence, in: Computer Vision – ECCV '94, Springer, 1994, pp. 151–158.

[31] U. Franke, I. Kutzbach, Fast stereo based object detection for stop and go traffic, in: Proceedings of the IEEE Intelligent Vehicles Symposium, 1996, pp. 339–344.

[32] Q. Yu, H. Araujo, H. Wang, Stereo-vision based real time obstacle detection for urban environments, in: Proceedings of the 11th International Conference on Advanced Robotics, ICAR, vol. 3, 2003, pp. 1671–1676.

[33] S. Gehrig, F. Stein, Collision avoidance for vehicle-following systems, IEEE Trans. Intell. Transport. Syst. 8 (2) (2007) 233–244.

[34] Z. Hu, F. Lamosa, K. Uchimura, A complete uv-disparity study for stereovision based 3d driving environment analysis, in: Fifth International Conference on 3D Digital Imaging and Modeling, IEEE, 2005, pp. 204–211.

[35] A. Wedel, H. Badino, C. Rabe, H. Loose, U. Franke, D. Cremers, B-spline modeling of road surfaces with an application to free-space estimation, IEEE Trans. Intell. Transport. Syst. 10 (4) (2009) 572–583.

[36] S. Intille, A. Bobick, Disparity-space images and large occlusion stereo, in: Computer Vision – ECCV '94, Springer, 1994, pp. 179–186.

[37] H. Youssef et al., Simulated annealing and tabu search: a comparative study, Eng. Appl. Artif. Intell. 14 (2) (2001) 167–181.

[38] Y. Gao, X. Ai, Y. Wang, N. Dahnoun, UV-disparity based obstacle detection with 3D camera and steerable filter, in: IEEE Intelligent Vehicles Symposium 2011 (IV 2011), 2011.

[39] D. Scharstein, R. Szeliski, A taxonomy and evaluation of dense two-frame stereo correspondence algorithms, Int. J. Comput. Vis. 47 (1–3) (2002) 7–42.

[40] Z.-F. Wang, Z.-G. Zheng, A region based stereo matching algorithm using cooperative optimization, 2008, pp. 1–8.

[41] Q. Yang, L. Wang, R. Yang, S. Wang, M. Liao, D. Nister, Real-time global stereo matching using hierarchical belief propagation, vol. 6, 2006, pp. 989–998.

[42] X. Sun, X. Mei, S. Jiao, M. Zhou, H. Wang, Stereo matching with reliable disparity propagation, 2011, pp. 132–139.

[43] A. Klaus, M. Sormann, K. Karner, Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure, vol. 3, 2006, pp. 15–18.

[44] Q. Yang, L. Wang, R. Yang, H. Stewénius, D. Nistér, Stereo matching with color-weighted correlation, hierarchical belief propagation, and occlusion handling, IEEE Trans. Pattern Anal. Machine Intell. 31 (3) (2009) 492–504.

[45] T. Vaudrey, C. Rabe, R. Klette, J. Milburn, Differences between stereo and motion behaviour on synthetic and real-world stereo sequences, in: 23rd International Conference on Image and Vision Computing, IEEE, 2008, pp. 1–6.

[46] K. Schauwecker, S. Morales, S. Hermann, R. Klette, A comparative study of stereo-matching algorithms for road modelling in the presence of windscreen wipers, Tech. Rep. 67, The .enpeda.. Project, The University of Auckland, 2011.

[47] N. Einecke, J. Eggert, A two-stage correlation method for stereoscopic depth estimation, 2010, pp. 227–234. http://dx.doi.org/10.1109/DICTA.2010.49.

[48] A. Geiger, M. Roser, R. Urtasun, Efficient large-scale stereo matching, in: Computer Vision – ACCV 2010.

