Multi-model Fitting Using Particle Swarm Optimization for 3D Perception in Robot Vision

Kai Zhou, Michael Zillich, Markus Vincze, Alen Vrečko and Danijel Skočaj

Abstract— Attention operators based on 2D image cues (such as color or texture) are well known and discussed extensively in the vision literature, but are not ideally suited for robotic applications. In such contexts it is the 3D structure of scene elements that makes them interesting or not. We show how a bottom-up exploration mechanism that selects spaces of interest (SOIs) based on scene elements that pop out from planes is used within a larger architecture for a cognitive system. This mechanism reduces object localization to single-plane detection, which is however not practical when dealing with real scenes that contain objects with complicated structures (e.g. objects in a multi-layer shelf). The key requirement in this situation is therefore multi-plane estimation, which we solve in this paper using Particle Swarm Optimization (PSO).

I. INTRODUCTION

Imagine a robot entering a room with the task to locate an object, say the ubiquitous coffee mug. This is quite a challenge, as the mug might be partially occluded on a cluttered office desk, hidden in shadow on a shelf, too far away and only a few pixels large, or simply out of the current view. This highlights the importance of attention mechanisms for robotic vision applications, as has also been argued previously, e.g. in [1]. While much of the object recognition literature operates on centered objects which are large in the image (at least for training), a major problem in robotic applications is to get such nice views in the first place.

Although attention has been the subject of much interest in the psychology and vision literature, relatively little of it is concerned with attention based on 3D cues (see [2] for a good overview).

However, psychophysical studies show that spontaneous, exploratory eye movements not only depend on 2D features such as contrast and edge intensity, as used in popular saliency models [3][4], but are also influenced by the 3D structure of the visual scene: e.g. they follow the depth gradient of a slanted plane [5], or fall on the 3D center of gravity of objects rather than the 2D c.o.g. of the projection [6].

Several authors have addressed the issue of attention based on 3D cues. [7] combines disparity, image flow and motion cues into an attentional operator that is designed to follow close moving targets. [8] extends the standard Koch & Ullman [9] model of visual attention based on color images with a depth channel. Similarly, [10] combines two

Kai Zhou, Michael Zillich and Markus Vincze are with the Institute of Automation and Control, Vienna University of Technology, Austria. [zhou, zillich, vincze]@acin.tuwien.ac.at

Alen Vrečko and Danijel Skočaj are with the Visual Cognitive Systems Laboratory, University of Ljubljana, Slovenia. [alen.vrecko, danijel.skocaj]@fri.uni-lj.si

2D saliency maps from the reflectance and range image of a 3D laser range scanner and show improved object recognition performance.

Putting an emphasis on biological plausibility, the authors of [11] extend their Selective Tuning Model of attention to the binocular case to select areas and disparities of optimal matches between left and right images. Moreover, their model handles issues of binocular rivalry, i.e. it can put attention on a salient region in one eye when the corresponding region in the other eye is occluded.

Showing the use of attention in a robotic system, [1] presents a strategy for a mobile robot equipped with a stereo head to search for a target object in an unknown 3D environment, optimizing the probability of finding the target given a fixed cost limit in terms of the total number of robotic actions required for detection. Their approach maintains a 3D grid of detection likelihoods that is used to plan the next best positions and views.

Most of the above-mentioned approaches handle 3D attention by treating the disparity image like another channel next to color. [12] use the 3D reconstructed point cloud for segmenting objects from a ground plane as a preprocessing step in a robotic scenario for learning object properties. The segmentation from stereo data, which can be of unsatisfactory accuracy depending on the amount of available texture, is refined with graph-cut segmentation in the color image. However, they focus on affordance learning for manipulation and apply RANSAC (RANdom SAmple Consensus) [13] to fit a plane to the table surface. The well-recognised limitation of the RANSAC scheme, namely that it permits the estimation of only a single model, restricts its practical application to real scenes that contain objects with complicated structures (e.g. a multi-layer shelf, or a sofa and adjoining tea table).

We extend that approach to a wider range of scenes and develop a 3D attention operator for a robotic system aimed at various indoor tasks. Among these are object recognition and learning of objects and their properties. We make the assumption that objects presented to the robot for learning, as well as objects the robot is asked to pick up, are resting on supporting surfaces such as tables, shelves or simply the floor. Accordingly, we place attention on anything that sticks out from supporting surfaces.

Our major contribution is a novel multi-model estimation approach based on particle swarm optimisation. The resulting multiple supporting planes serve in our robot system as a fundamental component for bottom-up exploration. In Section II the related work on model estimation is outlined.

The subsequent sections describe the system overview and the employed methods in detail, and present the results.

II. RELATED WORK

In model estimation from a set of observed data with outliers, RANSAC approaches based on randomly selected model hypotheses have proven efficient. Each hypothesis is evaluated by means of a cost function f(·). The hypothesis that minimizes f(·), i.e. h = arg min f(·), is considered the solution of the estimation. As the candidate hypotheses are discrete, RANSAC algorithms offer only approximate results. Meanwhile, in order to ensure that all the inlier samples are covered by the hypotheses, a massive number of hypotheses must be generated in most RANSAC-based algorithms. The inlier ratio must be defined before hypothesis generation, either by the user [13][14] or by pre-calculation [15][16]. The RANSAC approach was originally designed for single model estimation, and only a few works report successful extensions to multi-model fitting [17][18].
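As a point of reference, the hypothesize-and-verify scheme described above can be sketched for a 2D line (a stand-in for the plane case). This is an illustrative toy implementation, not the code of [13]; the hypothesis count and inlier threshold are arbitrary choices.

```python
import random

def ransac_line(points, n_hyp=200, thresh=0.05, seed=0):
    """Minimal RANSAC sketch for a 2D line (illustrative, not the paper's code).

    Each hypothesis is the line through two randomly sampled points; the cost
    f(h) counts points farther than `thresh` from the line, and the hypothesis
    minimizing f is returned together with its cost.
    """
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(n_hyp):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        # Line through the two samples: a*x + b*y + c = 0, normalised.
        a, b, c = y2 - y1, x1 - x2, x2 * y1 - x1 * y2
        norm = (a * a + b * b) ** 0.5
        if norm == 0.0:  # degenerate sample (coincident points)
            continue
        a, b, c = a / norm, b / norm, c / norm
        cost = sum(1 for (x, y) in points if abs(a * x + b * y + c) > thresh)
        if cost < best_cost:
            best, best_cost = (a, b, c), cost
    return best, best_cost
```

Note that the result can only ever be one of the discretely sampled hypotheses, which is exactly the limitation the paper's parameter-space view addresses.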

Since the hypotheses used by RANSAC are usually described with parameters (e.g. a line r(θ) = r0 sec(θ − ϕ) in a polar coordinate system is represented by the two parameters [r0, ϕ]), the target models can alternatively be estimated in parameter space [19][20][21]. The parameter-space analysis approaches cope naturally with multi-model estimation. Diverse statistical analysis methods, such as skewness and kurtosis detection, are deployed in the parameter space of the model. In [19], Xu et al. build a histogram in parameter space and detect its peaks to get the best solution; Subbarao and Meer [20] utilize mean-shift to cluster the manifold structures, which in essence applies the Gauss-Newton algorithm to find the minimum of an elaborate cost function in parameter space.

Recent developments [14][15][16] in robust multi-model estimation lie to a large part in automatically terminating the process. Instead of evaluating each hypothesis individually, the candidates are checked by means of their distribution, and the relationships between one hypothesis and the others are also considered (such as J-linkage in [14] and the Mercer kernel in [16]). However, the search space of solutions is still limited to the pre-selected hypotheses, as for the RANSAC family.

Recently, in [21], Delong et al. proposed an algorithm based on energy minimization, which is widely used for labelling problems in computer vision such as image segmentation and stereo matching. By controlling the number of labels in the solution, this method proves to be effective for multi-model fitting.

All the above-mentioned methods deliver discrete solutions and are therefore sensitive to unevenly distributed data sets, i.e. once the distribution of inliers is uneven, some models are dropped because of the manifold regularization.

We propose a novel method to estimate multiple models in parameter space. The hypothetical models are mapped to parameter space and represented as particles, which can move continuously. Particle swarm optimization (PSO) is employed to search for the optimal position in the parameter space, where the corresponding model minimizes the cost function. Since PSO was originally designed to simulate social behaviours aimed at global optimization, information sharing and mutual cooperation among particles (solutions) are incorporated, as in [14][15][16]. Therefore, the model fitting can terminate automatically once all the optimal models have been found. Another advantage of our algorithm is that it delivers the estimation results model by model, suited to real-time performance in a mobile robot system, differing from the global analysis algorithms in which all the estimated models are output as a whole solution after the entire estimation process [14][15][16].

III. SYSTEM OVERVIEW

Figure 1 shows (part of) the visual processing happening within a larger robotics framework. The framework is based on a software architecture toolkit [22] which handles issues like threading, lower level drivers and communication via shared working memories.

[Figure 1: the diagram shows the vision subarchitecture (video, stereo, plane pop-out, supporting planes, SOIs, segmentation, proto-objects, recognition, learning, labels) connected via a shared working memory to the language, spatial reasoning, planning and navigation subarchitectures.]
Fig. 1. System overview: Attention-driven visual processing in a cognitive robotics architecture

Attention in this context serves several purposes. First of all there is the obvious usage as a primer for costly object recognition. The 3D point cloud from stereo reconstruction is used to extract dominant planes as well as things sticking out from these planes, a process which we refer to as plane pop-out. The sticking-out parts form spherical 3D spaces of interest (SOIs) (termed so to avoid confusion with typical 2D image regions of interest, ROIs), which are handed to the segmentation component. The segmenter is coarsely initialised with colours obtained from back-projected 3D points inside the SOIs and then refines the projected contour of the SOIs, generating what we term proto-objects. These form an intermediate level: more object-like (i.e. more likely to correspond to an actual scene object) than just "stuff" that caught our attention, but not quite recognised (labelled) objects yet. Proto-objects are subsequently handed to the (SIFT-based) recogniser, which only needs to process the segmented image regions.

IV. THE PROPOSED APPROACH

There are of course potentially many planes in an indoor environment, but we are only interested in supporting, i.e. horizontal, planes. We either know the tilt angle of the camera from the mounting on the mobile platform and thus know the horizontal direction or, in case we don't have a calibrated system, we find it based on the assumption that the ground plane is the dominant plane in the first view when entering a new room (initialisation phase).

A. MULTI-MODEL FITTING

Here we describe how the plane estimation problem can be effectively solved with particle swarm optimization. Given input data (the point cloud from stereo) L = {xi}, i = 1, ..., N, and plane hypotheses H = {θj}, j = 1, ..., M, generated from individuals randomly sampled from L, each plane is represented by D parameters. The optimal plane can thus be estimated by searching a D-dimensional space, and each plane can be considered a particle in that space. A cost function f(θ) is defined to evaluate the current position of model θ in the D-dimensional parameter space. The problem of plane model estimation is to find arg min{f(θ), θ ∈ R^D}.
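To make the parameter-space view concrete, a possible D = 3 plane parameterisation and its geometric distance might look as follows. The paper does not fix a specific encoding, so the angle/offset parameterisation here is purely an assumption.

```python
import math

def plane_from_params(theta):
    """Map a particle position theta = (phi, psi, d) in a D = 3 parameter
    space to a plane n . x = d, with the unit normal n given by azimuth phi
    and elevation psi. (Illustrative parameterisation, not the paper's.)
    """
    phi, psi, d = theta
    n = (math.cos(psi) * math.cos(phi),
         math.cos(psi) * math.sin(phi),
         math.sin(psi))
    return n, d

def point_plane_distance(x, theta):
    """Geometric distance dist(x, theta) usable inside the cost f(theta)."""
    (nx, ny, nz), d = plane_from_params(theta)
    return abs(nx * x[0] + ny * x[1] + nz * x[2] - d)
```

Because the normal is always unit length by construction, a particle can move continuously through this space without leaving the set of valid planes.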

Particle Swarm Optimization (PSO) is a metaheuristic optimization method which was first intended for simulating the social behaviour of birds [23]. PSO maintains a swarm of candidate solutions (called particles) and makes these particles fly around in the search space according to their own personal best found position and the swarm's best known position. To rate each particle, PSO introduces a fitness function with a similar functionality to the energy function used in [21]. Each particle has a velocity that directs its movement and is adjusted iteratively. After several iterations, the particles converge at a position where the evaluation function has its minimum value.

v^{t+1}_{jd} = χ(w v^t_{jd} + c1 r1 (p^t_{jd} − θ^t_{jd}) + c2 r2 (p^t_{gd} − θ^t_{jd}))
θ^{t+1}_{jd} = θ^t_{jd} + v^{t+1}_{jd}
χ = 2 / |2 − c − √(c² − 4c)|,  c = c1 + c2    (1)

where v_j = (v_{j1}, v_{j2}, ..., v_{jD}) denotes the velocity of particle j in dimension d (1 ≤ d ≤ D), p_j = (p_{j1}, p_{j2}, ..., p_{jD}) is the jth particle's own best position found so far, and p_{gd} is the swarm's current global best position. For all particles in d dimensions, v^{t+1} and θ^{t+1} are the updated velocity and the new position derived from it. As in [24], χ is the "constriction factor", introduced to limit the maximum velocity and the dynamic search area in each dimension. w, c1 and c2 are the inertia weight and two acceleration constants, where c1 is the factor that influences the "cognitive" behaviour, i.e. how much the particle follows its own best solution, and c2 is the factor for "social" behaviour, i.e. how much the particle follows the swarm's best solution. c1 and c2 are typically set to 2.1 and 2.0; χ is a function of c1 and c2 as reflected in (1). r1 and r2 are uniformly distributed random numbers in the interval [0, 1]. The iteration continues until a termination criterion is met (e.g. a number of iterations performed, or adequate fitness reached), at which point p_{gd} holds the best solution found. The termination criterion as well as the fitness function used in our approach are addressed later. Algorithm 1 lists the PSO scheme.

Algorithm 1 PSO scheme
1: ∀θj initialise vj, pj randomly
2: while maximum iteration or minimum error criterion is not attained do
3:   ∀θj, evaluate each particle using fitness function f(θj)
4:   if f(θj) < f(pj) then
5:     Replace pj with θj
6:   end if
7:   if f(θj) < f(pg) then
8:     Replace pg with θj
9:   end if
10:  ∀θj, calculate new vj and update θj according to (1)
11: end while

We do not simply use the current best position of the swarm, but select one particle from a list of global bests using roulette wheel selection. This roulette wheel selection mechanism [25] is introduced to promote the diversification of (assumed) global optima and to avoid local optima caused by premature convergence. To implement the roulette wheel selection, a list of particles with the last n fitness values is first stored. These particles are rated using their fitness, and the fittest individuals get the largest share of the roulette wheel. If fk is the fitness of particle k, its probability of being selected as the global best is as follows:

P(f = fk) = fk / ∑(m=1..n) fm    (2)

Since this roulette wheel selection mechanism is called whenever a particle needs to update its velocity, the number of cells on the roulette (equal to the length of the global best list) should be reasonably limited.
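A possible implementation of this roulette wheel selection over the global-best list is sketched below, assuming (as Eq. (2) implies) that the fitness values are non-negative goodness scores where larger means fitter.

```python
import random

def roulette_select(candidates, fitness, rng):
    """Pick one entry of the global-best list with probability
    P(k) = fitness[k] / sum(fitness), per Eq. (2).

    Assumes fitness values are non-negative goodness scores
    (higher = fitter); rng is a random.Random instance.
    """
    total = sum(fitness)
    r = rng.random() * total       # spin the wheel
    acc = 0.0
    for cand, fit in zip(candidates, fitness):
        acc += fit                  # each candidate owns a slice of the wheel
        if r < acc:
            return cand
    return candidates[-1]           # guard against floating-point round-off
```

A candidate with zero fitness is never selected, while the fittest candidates occupy the largest slices, matching the description above.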

Easy case: one model estimation: We use the same general cost function as the RANSAC family as the fitness function in PSO. It can be formulated as follows:

f(θt) = ∑(xi ∈ L) Loss(dist(xi, θt))    (3)

where i denotes the index of each input datum and θ is the model hypothesis. dist is an error function, such as the geometric distance between an individual datum and the model hypothesis. The loss function is a truncation function, which can be illustrated as follows:

Loss(dist) = { dist    if |dist| < const
             { const   otherwise    (4)
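The fitness of Eq. (3) with the truncated loss of Eq. (4) can be sketched as follows; the truncation constant and the simple horizontal-plane distance used below are illustrative assumptions.

```python
def truncated_loss(dist, const=0.02):
    """Loss from Eq. (4): pass small residuals through, cap the rest.
    Geometric distances are non-negative, so abs() is taken defensively."""
    return abs(dist) if abs(dist) < const else const

def fitness(theta, points, dist_fn, const=0.02):
    """Cost f(theta) from Eq. (3): summed truncated residuals over the data.

    dist_fn(x, theta) is the geometric point-to-model distance; any model
    parameterisation works as long as it supplies such a function.
    """
    return sum(truncated_loss(dist_fn(x, theta), const) for x in points)
```

The truncation makes gross outliers contribute a bounded constant, so the optimum of f is driven by the inliers, exactly as in the RANSAC family.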

We use a time-based measure to detect convergence (the termination criterion) as in [26]. The swarm is said to have converged if the global best location has not moved for s steps; this measure is invariant to the scale of the problem space.

Fitness Function "Deflation": When more than one model exists in the dataset, the fitness function has more than one optimum, and the swarm is therefore likely to fall into one of the following behaviours: either it finds only one optimum, or it wanders back and forth among the optima. We propose a "deflation" function g(θ) that transforms the fitness function f(θ) once any (possibly local) optimum has been found. This transformation lessens the fitness of the found optima as well as of their surrounding areas, so that particles near an already found optimum have a lower probability of being selected as the global best position.

g(θ) = f(θ) ∏(k ∈ O) (λ e^(−λ Dθθk) + 1)    (5)

where θk is the kth optimum found and Dθθk denotes the distance between the current particle θ and θk. The set O contains all the optima found so far. λ is a parameter which is inversely proportional to the size of the convergence area around each found optimum.
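Eq. (5) translates almost directly into code; the parameter-space distance dist_fn and the value of λ below are left as assumptions.

```python
import math

def deflated_fitness(theta, f, found, dist_fn, lam=5.0):
    """Deflation g(theta) from Eq. (5): inflate the cost near every optimum
    already found, pushing the swarm away from recovered models.

    found     -- the set O of optima recovered so far
    dist_fn   -- parameter-space distance D between two positions (assumed)
    lam       -- width parameter; larger lam -> narrower deflated region
    """
    factor = 1.0
    for opt in found:
        factor *= lam * math.exp(-lam * dist_fn(theta, opt)) + 1.0
    return f(theta) * factor
```

Far from every found optimum the exponential vanishes and g(θ) ≈ f(θ), so undiscovered optima keep their original cost.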

Algorithm 2 PSO scheme for multiple model fitting
1: ∀θj initialise vj, pj randomly
2: while completion criterion is not attained do
3:   if sizeof(O) == 0 then
4:     ∀θj, E(θj) = f(θj)
5:   else
6:     ∀θj, E(θj) = g(θj)
7:   end if
8:   if E(θj) < f(pj) then
9:     Replace pj with θj
10:  end if
11:  if E(θj) < f(pg) then
12:    Replace pg with θj
13:  end if
14:  ∀θj, calculate new vj and update θj according to (1)
15:  if convergence is detected then
16:    ∀θj reinitialise vj, pj
17:    O.push_back(pg)
18:  end if
19: end while

Each convergence can be detected in the same way as in the one-model estimation problem described before. The completion criterion, i.e. that all the optima have been found, can be derived by analysing the stability of the global best list used in roulette wheel selection: once all the optima are found, the fitness values of new hypothetical particles counterbalance each other, and the stability of the global best list reflects this.
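Putting Algorithm 2 together on a toy 1-D cost with two minima gives a sketch like the following. Convergence is simply assumed after a fixed number of iterations here, and all constants (λ, iteration count, bounds) are illustrative rather than the paper's settings.

```python
import math, random

def pso_multi_minima(f, bounds, n_models=2, n_particles=40, iters=150,
                     lam=8.0, seed=0):
    """Sketch of Algorithm 2 on a 1-D cost: after each (assumed) convergence
    the global best is pushed into O, the swarm is reinitialised, and the
    deflated cost g of Eq. (5) hides the optima already found."""
    rng = random.Random(seed)
    c1, c2 = 2.1, 2.0
    c = c1 + c2
    chi = 2.0 / abs(2.0 - c - (c * c - 4.0 * c) ** 0.5)
    lo, hi = bounds
    found = []                       # the set O of recovered optima

    def g(x):                        # deflated cost, Eq. (5)
        factor = 1.0
        for opt in found:
            factor *= lam * math.exp(-lam * abs(x - opt)) + 1.0
        return f(x) * factor

    for _ in range(n_models):
        pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
        vel = [0.0] * n_particles
        pbest = pos[:]
        gbest = min(pbest, key=g)
        for _ in range(iters):
            for j in range(n_particles):
                r1, r2 = rng.random(), rng.random()
                vel[j] = chi * (vel[j] + c1 * r1 * (pbest[j] - pos[j])
                                + c2 * r2 * (gbest - pos[j]))
                pos[j] += vel[j]
                if g(pos[j]) < g(pbest[j]):
                    pbest[j] = pos[j]
                if g(pos[j]) < g(gbest):
                    gbest = pos[j]
        found.append(gbest)          # convergence assumed after `iters` steps
    return found
```

Note that multiplicative deflation requires f > 0 at the optima (as with the truncated loss on noisy data); the test below therefore uses a cost with a positive floor.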

B. OBJECT SEGMENTATION

Plane fitting is called iteratively until no more horizontal planes of reasonable size can be found. The remaining points sticking out from these planes are then segmented using 3D flood-filling, and the resulting clusters together with a bounding sphere form SOIs. Note that the bounding sphere is slightly larger than the actual point cluster to ensure it also contains some of the plane points, which is needed for the following segmentation step. Figure 4 shows the different scenes and the corresponding reconstructed point clouds. Different planes are shown in different colours and the remaining sticking-out points are shown in yellow. Because of the inherent limitations of stereo reconstruction at poorly textured surface parts, and shadowing effects between the left and right camera, we refine the results using 2D colour-based segmentation.

The 2D segmentation is based on energy minimisation with graph cuts. The back-projected 3D points within the SOI provide colour and spatial cues for the object and its background. The cost function for the object combines the colour cost with the spatial cost, while the cost function for the background consists of the colour cost component only. The spatial cost is simply the distance between the point and the nearest back-projected 3D object point. The colour cost, on the other hand, is the average distance between the point's colour and the K nearest colours from the sample (K is determined based on the sample size). Besides the foreground and background cost functions, there is a third cost function with a fixed cost to cover areas where both former functions have high costs. While these areas are considered uncertain and might be resolved at higher levels of the system's cognition, they are meanwhile treated as background by the recogniser.
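The three per-pixel data costs described above can be sketched as follows. A plain Euclidean colour metric, the weights and the fixed uncertain cost are placeholder assumptions standing in for the paper's actual choices (the HLS metric of Eqs. (6)-(7) in particular).

```python
import math

def nearest_k_mean(c, samples, k):
    """Colour cost: mean distance from colour c to its K nearest sample
    colours. Euclidean distance stands in for the paper's HLS metric."""
    dists = sorted(math.dist(c, s) for s in samples)
    k = min(k, len(dists))
    return sum(dists[:k]) / k

def pixel_costs(pix, obj_colors, bg_colors, obj_xy_points, xy, k=3,
                uncertain_cost=0.5, w_spatial=0.01):
    """Per-pixel data costs for the three graph-cut labels described in the
    text: object (colour + spatial), background (colour only), and a fixed
    'uncertain' cost. Weights are illustrative assumptions."""
    # Spatial cost: distance to the nearest back-projected 3D object point.
    spatial = min(math.dist(xy, p) for p in obj_xy_points)
    cost_obj = nearest_k_mean(pix, obj_colors, k) + w_spatial * spatial
    cost_bg = nearest_k_mean(pix, bg_colors, k)
    return cost_obj, cost_bg, uncertain_cost
```

A graph-cut solver would take these three costs as the data terms of its labels; the uncertain label wins wherever both the object and background models fit poorly.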

The distance between two colours is calculated in the HLS colour space:

∆HLS = ∆S² + (1 − ∆S)∆HL    (6)
∆HL = S∆H + (1 − S)∆L,    (7)

where ∆H, ∆L and ∆S are the distances between the two colours' HLS components, and S is the average saturation of the two colours. All the parameters are normalised to values between 0 and 1. The H distance has to be additionally normalised and truncated because of its circular space. The contribution of each colour component to the overall distance between the two colours is thus determined by the saturation difference and the saturation average.
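Eqs. (6)-(7) map to a small function; since the exact hue renormalisation is not spelled out in the text, the circular-wrap handling below is an assumption.

```python
def hls_distance(c1, c2):
    """Colour distance per Eqs. (6)-(7). Inputs are (H, L, S) tuples with
    all components already normalised to [0, 1]."""
    h1, l1, s1 = c1
    h2, l2, s2 = c2
    dh = abs(h1 - h2)
    dh = min(dh, 1.0 - dh)        # hue is circular: wrap around
    dh = min(2.0 * dh, 1.0)       # renormalise the halved range (assumed)
    dl = abs(l1 - l2)
    ds = abs(s1 - s2)
    s_avg = (s1 + s2) / 2.0       # average saturation of the two colours
    d_hl = s_avg * dh + (1.0 - s_avg) * dl   # Eq. (7)
    return ds * ds + (1.0 - ds) * d_hl       # Eq. (6)
```

As the text notes, hue dominates for saturated colour pairs (S close to 1), while lightness dominates for desaturated ones.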

The code for the graph cut algorithm was kindly provided by Boykov, Kolmogorov and Veksler [27][28][29].

V. EXPERIMENTAL RESULTS

To demonstrate the progress of convergence, we first tested our PSO-based multi-model estimation algorithm on synthetic 2D line detection, to show how the particles move and converge to the solution. In figure 2, each particle is a solution to the line-fitting problem and the color/height represents the fitness of that solution. This figure records all the positions that every particle has "flown through" during the entire line-fitting procedure.

Fig. 2. Convergence of all the particles; the horizontal plane is the 2-dimensional parameter space of the solution to the line fitting problem and the vertical axis is the value of the fitness function.

Fig. 3. Movement of the global best; the horizontal plane is the 2-dimensional parameter space of the solution to the line fitting problem and the vertical axis is the iteration count of PSO.

Figure 3 shows that after 100 iterations the global best is almost stable and static; this is the criterion for convergence detection in our experiments. The line-fitting results are reported in figures 6 and 7.

As shown in figure 4, our system was tested on different real scenes. The multiple layers of a shelf are estimated in figure 4(a); the chair surface and floor are detected in figure 4(b). Figure 5 shows a typical result for a shelf consisting of three planes. The results of the subsequent segmentation step are shown in the right part of figure 5. On the left side, the SOIs are marked on the original image with bounding rectangles; the following 2D graph-cut segmentation processes only these rectangles. The right side zooms in on these areas: the top images show the positions of the back-projected 3D points (green for object, red for background) and the segmentation (grey for object, white for background), while the bottom images represent the graph cut cost functions for object and background, where a brighter colour denotes a greater cost. We can see that although the back-projected 3D points are not very precise due to rather large noise, the graph-cut segmentation can be successfully initialised and provides a precise object contour.

VI. CONCLUSION

We presented an attentional mechanism based on plane pop-out in 3D stereo data and its use within a robotic framework.

(a) multi-layer shelf scene
(b) chair and floor scene
Fig. 4. PSO-based multi-plane estimation tested on two different scenes; the detected planes are illustrated in the right images. Note that the figure is best viewed in color.

Fig. 5. Segmentation of back-projected spaces of interest

Our major contribution is the novel robust multi-plane estimation framework. To our knowledge, this is the first use of Particle Swarm Optimisation as the workhorse for this problem. As PSO is a variant of metaheuristics, other metaheuristic methods could in principle also be applied to multi-model fitting; our successful application of PSO points out this new direction for future studies.

Future work will on the one hand focus on more robust extraction of supporting planes in cases where only small textured parts of the plane are visible, as in the case of (densely filled) shelves. To this end we plan to fuse cues from line-based stereo with dense stereo. On the other hand, we are currently integrating the recently proposed segmentation-with-fixation method of [30] as an alternative to the more generic graph-cut segmentation used now.

VII. ACKNOWLEDGMENTS

The research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under grant agreement No. 215181, CogX.

Fig. 6. Demonstration of 2D line fitting using PSO. The PSO iteration procedure is illustrated: the initial 100 particles are shown in the first figure, the next three figures are captured after 50, 100 and 150 iterations, and the last figure is the final estimation result. Input data of 800 points, with 100 inliers and 700 gross outliers.

Fig. 7. The multiple line fitting results are depicted, either with models that spatially overlap (last two) or that just occlude each other. The input data of each test contains 100 points per line and 700 gross outliers.

REFERENCES

[1] J. K. Tsotsos and K. Shubina, "Attention and Visual Search: Active Robotic Vision Systems that Search," in The 5th International Conference on Computer Vision Systems, 2007.

[2] S. Frintrop, E. Rome, and H. Christensen, "Computational Visual Attention Systems and their Cognitive Foundations: A Survey," ACM Transactions on Applied Perception, 2009.

[3] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on PAMI, vol. 20, no. 11, pp. 1254–1259, Nov 1998.

[4] J. Tsotsos, S. Culhane, W. Wai, Y. Lai, N. Davis, and F. Nuflo, "Modeling visual attention via selective tuning," Artificial Intelligence, vol. 78, no. 1–2, pp. 507–547, 1995.

[5] M. Wexler and N. Quarti, "Depth Affects Where We Look," Current Biology, vol. 18, pp. 1872–1876, Dec. 2008.

[6] D. Vishwanath and E. Kowler, "Saccadic localization in the presence of cues to three-dimensional shape," J. Vis., vol. 4, no. 6, pp. 445–458, May 2004.

[7] A. Maki, P. Nordlund, and J.-O. Eklundh, "A computational model of depth-based attention," in Proceedings of the 13th ICPR, vol. 4, Aug 1996, pp. 734–739.

[8] N. Ouerhani and H. Hugli, "Computing visual attention from scene depth," in ICPR 2000, vol. 1, 2000, pp. 375–378.

[9] C. Koch and S. Ullman, "Shifts in selective visual attention: towards the underlying neural circuitry," Human Neurobiology, pp. 219–227, 1985.

[10] S. Frintrop, E. Rome, A. Nuchter, and H. Surmann, "A bimodal laser-based attention system," J. of Computer Vision and Image Understanding (CVIU), Special Issue on Attention and Performance in Computer Vision, vol. 100, no. 1–2, pp. 124–151, 2005.

[11] N. D. B. Bruce and J. K. Tsotsos, "An attentional framework for stereo vision," in Proc. of Canadian Conference on Computer and Robot Vision, 2005.

[12] B. Ridge, D. Skocaj, and A. Leonardis, "Unsupervised learning of basic object affordances from object properties," in Proceedings of the Fourteenth Computer Vision Winter Workshop, 2009, pp. 21–28.

[13] M. A. Fischler and R. C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Commun. ACM, vol. 24, no. 6, pp. 381–395, 1981.

[14] R. Toldo and A. Fusiello, "Robust multiple structures estimation with J-linkage," in Proceedings of the European Conference on Computer Vision, 2008, pp. 537–547.

[15] W. Zhang and J. Kosecka, "Nonparametric estimation of multiple structures with outliers," in Dynamical Vision Workshop, 2006.

[16] T. Chin, H. Wang, and D. Suter, "Robust fitting of multiple structures: The statistical learning approach," in ICCV, 2009.

[17] M. Zuliani, C. S. Kenney, and B. S. Manjunath, "The multiRANSAC algorithm and its application to detect planar homographies," in IEEE International Conference on Image Processing, 2005.

[18] C. V. Stewart, "Bias in robust estimation caused by discontinuities and multiple structures," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, pp. 818–833, 1997.

[19] L. Xu, E. Oja, and P. Kultanen, "A new curve detection method: randomized Hough transform (RHT)," Pattern Recogn. Lett., vol. 11, no. 5, pp. 331–338, 1990.

[20] R. Subbarao and P. Meer, "Nonlinear mean shift for clustering over analytic manifolds," in Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006, pp. 1168–1175.

[21] A. Delong, A. Osokin, H. N. Isack, and Y. Boykov, "Fast approximate energy minimization with label costs," in CVPR, 2010.

[22] N. Hawes and J. Wyatt, "Engineering intelligent information-processing systems with CAST," Advanced Engineering Informatics, vol. 24, no. 1, pp. 27–39, 2010.

[23] J. Kennedy and R. Eberhart, "Particle swarm optimization," in Proceedings of IEEE International Conference on Neural Networks, 1995, pp. 1942–1948.

[24] R. Eberhart and Y. Shi, "Comparing inertia weights and constriction factors in particle swarm optimization," in Proceedings of the 2000 IEEE Congress on Evolutionary Computation, vol. 1, 2000, pp. 84–88.

[25] T. Back, Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, genetic algorithms. Oxford, UK: Oxford University Press, 1996.

[26] S. Bird and X. Li, "Enhancing the robustness of a speciation-based PSO," in Proceedings of the 2006 IEEE Congress on Evolutionary Computation, 2006.

[27] Y. Boykov and V. Kolmogorov, "An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision," IEEE Transactions on PAMI, vol. 26, no. 9, pp. 1124–1137, 2004.

[28] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts," in ICCV (1), 1999, pp. 377–384. [Online]. Available: citeseer.ist.psu.edu/article/boykov99fast.html

[29] V. Kolmogorov and R. Zabih, "What energy functions can be minimized via graph cuts?" IEEE Transactions on PAMI, vol. 26, no. 2, pp. 147–159, 2004.

[30] A. Mishra, Y. Aloimonos, and C. L. Fah, "Active Segmentation with Fixation," in ICCV, 2009.

