Swarm Intell (2010) 4: 57–89. DOI 10.1007/s11721-009-0037-5

A cooperative particle swarm optimizer with migration of heterogeneous probabilistic models

Mohammed El-Abd · Mohamed S. Kamel

Received: 14 July 2008 / Accepted: 25 September 2009 / Published online: 14 October 2009
© Springer Science + Business Media, LLC 2009

Abstract Particle Swarm Optimization (PSO) is a stochastic optimization approach that originated from simulations of bird flocking, and that has been successfully used in many applications as an optimization tool. Estimation of distribution algorithms (EDAs) are a class of evolutionary algorithms which perform a two-step process: building a probabilistic model from which good solutions may be generated and then using this model to generate new individuals. Two distinct research trends that emerged in the past few years are the hybridization of PSO and EDA algorithms and the parallelization of EDAs to exploit the idea of exchanging the probabilistic model information. In this work, we propose the use of a cooperative PSO/EDA algorithm based on the exchange of heterogeneous probabilistic models. The model is heterogeneous because the cooperating PSO/EDA algorithms use different methods to sample the search space. Three different exchange approaches are tested and compared in this work. In all these approaches, the amount of information exchanged is adapted based on the performance of the two cooperating swarms. The performance of the cooperative model is compared to the existing state-of-the-art PSO cooperative approaches using a suite of well-known benchmark optimization functions.

Keywords Particle swarm optimization · Estimation of distribution algorithms · Cooperative search · Hybrid algorithms · Probabilistic models migration · Non-linear function optimization

1 Introduction

Particle Swarm Optimization (PSO) (Kennedy and Eberhart 1995) is an optimization method widely used to solve the problem of continuous nonlinear function optimization. It is a stochastic optimization technique that was inspired by simulations of bird flocking behavior.

M. El-Abd (✉) · M.S. Kamel, ECE Department, University of Waterloo, 200 University Av. W., Waterloo, Ontario N2L 3G1, Canada. e-mail: [email protected]

Page 2: A cooperative particle swarm optimizer with …liacs.leidenuniv.nl › ~csnaco › SWI › papers › particle.swarm...Swarm Intell (2010) 4: 57–89 DOI 10.1007/s11721-009-0037-5

58 Swarm Intell (2010) 4: 57–89

Fig. 1 The EDA general structure

1 : P ⇐ Initialize the population
2 : Evaluate the initial population
3 : while iter_number ≤ Max_iterations
4 :     Ps ⇐ Select the top s individuals from P
5 :     M ⇐ Estimate a new model from Ps
6 :     Pn ⇐ Sample n individuals from M
7 :     Evaluate Pn
8 :     P ⇐ Select n individuals from P ∪ Pn
9 :     iter_number = iter_number + 1
10: end while

Estimation of distribution algorithms (EDAs) (Larrañaga and Lozano 2001) are evolutionary algorithms that originated from genetic algorithms (GAs). In any GA, the population evolves from one generation to the next by manipulating its individuals through a sequence of selection, crossover, and mutation operators. In EDAs, on the other hand, the selection operator is the only one present. The selected individuals can be regarded as a sample drawn from an unknown probability distribution. EDAs try to estimate this probability distribution by using the selected individuals to construct a probabilistic model. This model is subsequently used to generate a new population to replace the current one, and so on. Hence, instead of maintaining a group of individuals from one generation to the next as in GAs, EDAs maintain a continuously updated probabilistic model. Although EDAs were originally introduced to tackle combinatorial optimization problems, applications to continuous optimization have also been proposed (Rudolph and Köppen 1996; Servet et al. 1997; Sebag and Ducoulombier 1998; Gallagher et al. 1999). The general structure of an EDA is shown in Fig. 1.
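As a concrete illustration, the loop of Fig. 1 can be sketched in Python for a continuous problem using a simple univariate Gaussian model. This is a minimal sketch: the function names, parameter values, and the Gaussian model itself are illustrative, not the specific EDAs studied later in the paper.

```python
import math
import random

def eda_minimize(f, dim, pop_size=50, top=25, iterations=200, bounds=(-5.0, 5.0)):
    """Minimal univariate-Gaussian EDA sketch following the structure of Fig. 1."""
    lo, hi = bounds
    # Steps 1-2: initialize and evaluate the population
    pop = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(iterations):
        # Step 4: select the top s individuals
        selected = sorted(pop, key=f)[:top]
        # Step 5: estimate a model (per-dimension mean and std) from the selection
        mu = [sum(x[d] for x in selected) / top for d in range(dim)]
        sigma = [math.sqrt(sum((x[d] - mu[d]) ** 2 for x in selected) / top) + 1e-12
                 for d in range(dim)]
        # Steps 6-7: sample n new individuals from the model
        new = [[random.gauss(mu[d], sigma[d]) for d in range(dim)]
               for _ in range(pop_size)]
        # Step 8: select n individuals from the union of old and new populations
        pop = sorted(pop + new, key=f)[:pop_size]
    return min(pop, key=f)

# usage: minimize the sphere function in 5 dimensions
best = eda_minimize(lambda x: sum(v * v for v in x), dim=5)
```

With truncation selection and elitist replacement, the model quickly concentrates around the basin of the optimum on a unimodal function such as the sphere.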

The introduction of parallel EDAs is a new research direction that has been pursued in the past few years (Hiroyaso et al. 2003; Ahn et al. 2004; de la Ossa et al. 2004, 2006; Madera et al. 2006; Schwarz et al. 2007; Jaros and Schwarz 2007). The general idea is to have different EDAs running in parallel and exchanging information among them. The information exchanged could either be a group of individuals, as in parallel GAs, or the probabilistic model maintained by every algorithm.

In this work, we investigate the use of a cooperative PSO model that is based on exchanging information about probabilistic models rather than exchanging particles. To our knowledge, we also introduce the first heterogeneous model that incorporates different EDAs in the cooperating populations.

The motivation behind this work is twofold. Firstly, to our knowledge, there have been few attempts to hybridize PSO with EDAs, and hence all the cooperative systems utilizing PSO algorithms have relied on the classical migration of particles; there has not been any investigation of the benefits of exchanging probabilistic models. Secondly, all the parallel EDAs proposed in the literature so far are based on an island-model approach where all the islands use the same probabilistic model to sample the search space. Hence, the islands might actually run into the same problems caused by the model used. Nevertheless, these approaches did manage to produce better results for a number of different cases. The intuition behind our approach is that by using different probabilistic models (i.e., different approaches to sample the search space), one could end up taking advantage of the benefits of the different approaches.

As part of our investigation, three different information exchange approaches are tested in this work using the proposed model. The first approach is the classical migration of particles. The second approach is for each swarm to use the received probabilistic model to generate new particles to replace the worst particles in the receiving swarm. The third is for each swarm to combine the received probabilistic model with its own and continue with the search. However, since every swarm receives a model that is in a different form than its own, a new procedure needs to be performed by the recipient swarm before performing the model combination step. This procedure is referred to as model conversion.

An adaptive cooperative model using these different approaches is implemented and compared to its individual components, as well as to some state-of-the-art PSO cooperative optimization algorithms, using a benchmark of different continuous optimization functions.

The paper is organized as follows: a brief background about PSO is covered in Sect. 2. Section 3 surveys the different hybrid implementations combining both PSO and EDAs. Section 4 covers the different parallel EDA models proposed in the literature. An explanation of the different exchange approaches tested in this work is given in Sect. 5. The new cooperative PSO model is detailed in Sect. 6. The experimental results are illustrated and discussed in Sect. 7. Section 8 shows a comparison between our algorithm and some state-of-the-art cooperative PSO implementations proposed in the literature. Finally, the paper is concluded in Sect. 9.

2 Particle swarm optimization

The PSO method is a population-based method, where the population is referred to as a swarm. The swarm consists of a number of individuals called particles. Each particle i in the swarm holds the following information: (i) the current position x_i, which represents a solution to the problem, (ii) the current velocity v_i, (iii) the best position pbest_i, the one associated with the best objective function value the particle has achieved so far, where this objective function value is calculated using a function f(.) that evaluates the desirability of a solution, and (iv) the neighborhood best position nbest_i, the one associated with the best objective function value found in the particle's neighborhood. The choice of nbest_i depends on the neighborhood topology adopted by the swarm; different neighborhood topologies have been studied by Kennedy and Mendes (2002).

In traditional PSO, each particle adjusts its own position in every iteration in order to move towards its best position and the neighborhood best according to the following equations:

v_ij^{t+1} = w · v_ij^t + c1 · r1 · (pbest_ij^t − x_ij^t) + c2 · r2 · (nbest_ij^t − x_ij^t),  (1)

x_ij^{t+1} = x_ij^t + v_ij^{t+1},  (2)

for j ∈ {1 … d}, where d is the number of dimensions, and i ∈ {1 … n}, where n is the number of particles; t is the iteration number, w is the inertia weight, r1 and r2 are two random numbers uniformly distributed in the range [0, 1), and c1 and c2 are the acceleration factors. The parameters controlling the algorithm's performance include w, c1, c2, and the swarm size n.

After changing their position, each particle updates its personal best position using (assuming a minimization problem):

pbest_i^{t+1} = { pbest_i^t   if f(pbest_i^t) ≤ f(x_i^{t+1}),
                  x_i^{t+1}   if f(pbest_i^t) > f(x_i^{t+1}).  (3)


Finally, the global best of the swarm is updated using the following equation:

gbest^{t+1} = arg min_{pbest_i^{t+1}} f(pbest_i^{t+1}).  (4)

This model is referred to as the lbest (local best) model. Another model is the gbest (global best) model, which is the case when the particle's neighborhood is defined as the whole swarm.
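As a sketch, (1)–(4) translate into the following Python for the gbest model (assuming minimization; the helper names and parameter values are illustrative choices, not prescribed by the paper):

```python
import random

def pso_step(xs, vs, pbests, f, w=0.7, c1=1.5, c2=1.5):
    """One gbest-model PSO iteration implementing (1)-(4)."""
    gbest = min(pbests, key=f)            # (4): neighborhood best = whole swarm
    for i in range(len(xs)):
        for j in range(len(xs[i])):
            r1, r2 = random.random(), random.random()
            # (1): inertia + cognitive + social velocity components
            vs[i][j] = (w * vs[i][j]
                        + c1 * r1 * (pbests[i][j] - xs[i][j])
                        + c2 * r2 * (gbest[j] - xs[i][j]))
            xs[i][j] += vs[i][j]          # (2): position update
        if f(xs[i]) < f(pbests[i]):       # (3): personal-best update
            pbests[i] = list(xs[i])
    return xs, vs, pbests

# usage: minimize the sphere function with 20 particles in 3 dimensions
f = lambda x: sum(v * v for v in x)
xs = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(20)]
vs = [[0.0] * 3 for _ in range(20)]
pbests = [list(x) for x in xs]
for _ in range(100):
    xs, vs, pbests = pso_step(xs, vs, pbests, f)
best = min(pbests, key=f)
```

With a convergent parameter setting such as w = 0.7 and c1 = c2 = 1.5, the swarm contracts onto the optimum of a unimodal function.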

3 Particle swarm optimization based on probabilistic models

3.1 EDPSO

An estimation of distribution particle swarm optimizer (EDPSO) was proposed by Iqbal and Montes de Oca (2006). The method borrowed some ideas from a development in Ant Colony Optimization (ACO) for solving continuous optimization problems (Socha and Dorigo 2005, 2008). The approach models the particles' distribution in the search space as a joint probability distribution using mixtures of weighted Gaussian functions. The Gaussian functions are defined through an archive of k solutions (the pbests of the particles). Each dimension d of a particle is either updated using the PSO equations or by sampling a Gaussian distribution selected from the archive. The values of dimension d across all the solutions in the archive compose the vector μ_d, which is the vector of means for the univariate Gaussian distributions:

μ_d = 〈pbest_1d, pbest_2d, …, pbest_kd〉.  (5)

To select one of these distributions, a weight vector w, which holds the weights associated with each distribution, is calculated. This is done by ranking the solutions according to their objective function value, with the best solution having a rank of 1. The weight is calculated for each solution as follows:

w = 〈w_1, w_2, …, w_k〉,

w_l = (1 / (q · k · √(2π))) · e^{−(l−1)² / (2 · q² · k²)},  (6)

where q determines how much we prefer good solutions, and l is the solution rank. The Gaussian function to be used is selected probabilistically. The probability of selecting a certain Gaussian function is proportional to its weight. This probability is calculated as follows:

p = 〈p_1, p_2, …, p_k〉,

p_l = w_l / Σ_{r=1}^{k} w_r.  (7)

After selecting a certain Gaussian function G_d, denoted by its mean pbest_gd, where 1 ≤ g ≤ k, the standard deviation for this function is calculated as

σ_gd = ξ · Σ_{i=1}^{k} |pbest_id − pbest_gd| / (k − 1),  (8)

where ξ is a parameter to balance the exploration–exploitation behaviors.


Finally, the selected Gaussian function is evaluated (not sampled) to generate a value r, which is used to decide how the particle will update its position. This is done by generating a uniformly distributed random number, U(0,1). If it is less than r, the particle moves using the normal PSO equations. Otherwise, the Gaussian function is sampled in order to move the particle.
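The archive-based selection and sampling of (5)–(8) for a single dimension can be sketched as follows. The archive is assumed to be sorted best-first, so that list position corresponds to rank l, and the values of q and ξ are illustrative:

```python
import math
import random

def archive_weights(k, q=0.1):
    """Rank-based weights of (6): w_l = exp(-(l-1)^2 / (2 q^2 k^2)) / (q k sqrt(2*pi))."""
    return [math.exp(-((l - 1) ** 2) / (2 * q * q * k * k)) / (q * k * math.sqrt(2 * math.pi))
            for l in range(1, k + 1)]

def sample_dimension(archive_d, q=0.1, xi=0.85):
    """Pick one Gaussian from the pbest archive with probability proportional to
    its rank weight, per (7), then draw from it using the mean of (5) and the
    average-distance spread of (8)."""
    k = len(archive_d)
    w = archive_weights(k, q)
    total = sum(w)
    p = [wl / total for wl in w]                   # (7): selection probabilities
    g = random.choices(range(k), weights=p)[0]     # index of the selected Gaussian
    mean = archive_d[g]                            # (5): mean = pbest_gd
    sigma = xi * sum(abs(v - mean) for v in archive_d) / (k - 1)   # (8)
    return random.gauss(mean, sigma)

# archive values of one dimension, sorted best-first (rank 1 first)
value = sample_dimension([0.1, 0.3, -0.2, 0.8, 1.5])
```

Because the weights of (6) decay with rank, dimensions are sampled preferentially around the better archive members, with ξ controlling how far the samples spread.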

3.2 EDA–PSO

A hybrid EDA–PSO approach was proposed by Zhou and Jin (2006). The algorithm samples an independent univariate Gaussian distribution based on the best half of the swarm. The mean and standard deviation of the model are calculated in every iteration as

μ_j = (1/M) · Σ_{i=1}^{M} x_ij,

σ_j = √( (1/M) · Σ_{i=1}^{M} (x_ij − μ_j)² ),  (9)

where M = n/2 for a swarm with n particles, i is the particle number, and j refers to the j-th dimension.

The choice of whether to update the particle using the normal PSO equations or to sample the particle from the estimated distribution is made with a probability p, referred to as the participation ratio. If p = 0, the algorithm will behave as a pure EDA algorithm, and if p = 1, it will be a pure PSO algorithm. In the hybrid approach, where 0 < p < 1, each particle is either totally updated by the PSO equations or totally sampled from the estimated distribution. This is different from EDPSO, where this choice is made at the level of the variables (not the particles). Finally, the particle is updated only if its objective function value improves.
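A minimal sketch of this hybrid update, with the model of (9) built from the best half of the swarm. The function names are illustrative, and the PSO move is passed in as a black box; only the branch choice and the greedy acceptance test follow the description above:

```python
import math
import random

def build_model(swarm, f):
    """Estimate the univariate Gaussian model of (9) from the best half of the swarm."""
    n, dim = len(swarm), len(swarm[0])
    best_half = sorted(swarm, key=f)[: n // 2]
    M = len(best_half)
    mu = [sum(x[j] for x in best_half) / M for j in range(dim)]
    sigma = [math.sqrt(sum((x[j] - mu[j]) ** 2 for x in best_half) / M)
             for j in range(dim)]
    return mu, sigma

def hybrid_update(particle, pso_move, mu, sigma, f, p=0.5):
    """With probability p the particle is fully moved by PSO; otherwise it is fully
    sampled from the estimated distribution.  The move is kept only on improvement."""
    if random.random() < p:
        candidate = pso_move(particle)                                  # PSO branch
    else:
        candidate = [random.gauss(mu[j], sigma[j])                      # EDA branch
                     for j in range(len(particle))]
    return candidate if f(candidate) < f(particle) else particle

# usage with the sphere function and a toy stand-in for the PSO move
f = lambda x: sum(v * v for v in x)
swarm = [[random.uniform(-5, 5) for _ in range(2)] for _ in range(10)]
mu, sigma = build_model(swarm, f)
updated = hybrid_update([2.0, 2.0], lambda x: [0.9 * v for v in x], mu, sigma, f)
```

The greedy acceptance test guarantees the particle's objective value never worsens, regardless of which branch produced the candidate.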

Zhou and Jin (2006) also proposed different approaches to adaptively set the parameter p. These approaches depend on the success rate of each of the PSO and EDA parts in improving a particle's objective function value. These approaches are:

– The Generation based, where the success rates are calculated based on the information gathered during the last iteration,

p^{t+1} = (sum_PSO^t / num_PSO^t) / (sum_PSO^t / num_PSO^t + sum_EDA^t / num_EDA^t).  (10)

– The All historical information, where the success rates are calculated based on the information gathered during the entire search,

p^{t+1} = Σ_{i=1}^{t} (sum_PSO^i / num_PSO^i) / ( Σ_{i=1}^{t} (sum_PSO^i / num_PSO^i) + Σ_{i=1}^{t} (sum_EDA^i / num_EDA^i) ).  (11)

– The Sliding window, where the success rates are calculated considering only the information in the last m iterations,

p^{t+1} = Σ_{i=t−m+1}^{t} (sum_PSO^i / num_PSO^i) / ( Σ_{i=t−m+1}^{t} (sum_PSO^i / num_PSO^i) + Σ_{i=t−m+1}^{t} (sum_EDA^i / num_EDA^i) ),  (12)

where m is the window size.


In the above, num_PSO^t and sum_PSO^t refer to the number of improvements (number of particles improved) and the sum of improvements (sum of the improvements in the particles' objective function values) achieved by the PSO component of the algorithm at iteration t, while num_EDA^t and sum_EDA^t refer to the number and sum of improvements achieved by the EDA component at iteration t.
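Since the three rules (10)–(12) differ only in how much history they aggregate, they can be sketched as one function. The tuple layout of `history` is an assumed representation:

```python
def adaptive_p(history, m=None):
    """Success-rate based participation ratio of (10)-(12).  `history` is a list of
    per-iteration tuples (sum_PSO, num_PSO, sum_EDA, num_EDA); m=1 gives the
    generation-based rule (10), m=None the all-historical rule (11), and any
    other m the sliding window of size m (12)."""
    window = history if m is None else history[-m:]
    pso_rate = sum(sp / n_pso for sp, n_pso, _, _ in window if n_pso > 0)
    eda_rate = sum(se / n_eda for _, _, se, n_eda in window if n_eda > 0)
    if pso_rate + eda_rate == 0:
        return 0.5       # no improvements observed: stay neutral (an assumed fallback)
    return pso_rate / (pso_rate + eda_rate)

# one iteration in which PSO improved 2 particles by 4.0 total, EDA 1 particle by 1.0
p_next = adaptive_p([(4.0, 2, 1.0, 1)], m=1)   # average PSO gain 2.0 vs EDA gain 1.0
```

The ratio rises towards 1 (more PSO moves) when PSO has been producing the larger average improvements, and falls towards 0 (more EDA sampling) otherwise.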

de la Ossa et al. (2006) proposed calculating σ_j using the whole population instead of only the best half, as done for μ_j. This was found to produce better results due to the induced diversity that prevents premature convergence. The same approach is used in this work when applying the EDA–PSO algorithm.

The performance of EDA–PSO is also enhanced by incorporating the method used by Grahl et al. (2006) for updating the variance of the Gaussian model. In this method, the variance of the Gaussian model is either enlarged or reduced based on the area covered by the model. If the model is following a slope, the variance of the Gaussian model is adjusted according to the performance of the algorithm. If the Gaussian model is covering an optimum, the variance is kept as is. Whether the Gaussian model is covering a slope or an optimum is found out by calculating the correlation between the ranks of the objective function values and the densities of the sampled individuals. If the correlation r > −0.55, adaptive scaling is used (the model is covering a slope). Otherwise, the variance is kept the same (the model is covering an optimum).

In the case of the model covering a slope, the adjustment of the variance is done by scaling the variance with a coefficient CAVS, which has an initial value of 1. This coefficient is adjusted according to the performance of the algorithm. The coefficient is increased (i.e., multiplied by 1.1) if the best individual has improved from the previous iteration or decreased (i.e., multiplied by 0.9) otherwise. If the value of CAVS drops below 0.1 or becomes higher than 10, it is reset to 10 to encourage exploration. All these values are the same as used in Grahl et al. (2006). This approach is applied after calculating μ and σ of the Gaussian model and before continuing with the update of the different particles.
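The variance-adaptation rule can be condensed into a short helper. Combining the per-iteration scaling and the CAVS bookkeeping into one function is a simplification for illustration; the constants match the description above:

```python
import math

def scale_sigma(sigma, cavs, correlation, improved):
    """Correlation-triggered variance scaling: when the model covers a slope
    (correlation r > -0.55) the standard deviation is scaled by sqrt(CAVS);
    CAVS grows by 1.1 on improvement, shrinks by 0.9 otherwise, and is reset
    to 10 whenever it leaves [0.1, 10]."""
    if correlation > -0.55:                         # slope: apply adaptive scaling
        sigma = [math.sqrt(cavs) * s for s in sigma]
    cavs = cavs * 1.1 if improved else cavs * 0.9   # performance feedback
    if cavs < 0.1 or cavs > 10:
        cavs = 10.0                                 # reset to encourage exploration
    return sigma, cavs
```

Scaling σ by √CAVS corresponds to scaling the variance σ² by CAVS, which is how the coefficient is defined in Grahl et al. (2006).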

The complete algorithm for EDA–PSO with the correlation-triggered variance adaptation approach is shown in Fig. 2.

3.3 PSO_Bounds

A population-based incremental learning (PBIL) approach for continuous search spaces was proposed by Servet et al. (1997). The algorithm explored the search space by dividing the domain of each gene into two equal intervals, referred to as the low and high intervals. A probability h_d, which is initially set to 0.5, is the probability of dimension number d being in the high interval, as shown:

x_d ∈ [a, b],  h_d = Probability( x_d > (a + b)/2 ).  (13)

After each generation, this distribution was updated according to the dimension values of the best individual using the following formula:

p = { 0 if xbest_d < (a + b)/2,
      1 otherwise,

h_d^{t+1} = (1 − α) · h_d^t + α · p,  (14)


input Max_Function_Evaluations, number_of_particles, w, c1, c2

1 : Initialize the swarm
2 : Max_Iterations = Max_Function_Evaluations / Num_Particles
3 : iter_number = 1
4 : CAVS = 1
5 : while iter_number ≤ Max_Iterations
6 :     Calculate μ using the top n/2 particles
7 :     Calculate σ using the whole swarm
8 :     Calculate the correlation factor r
9 :     if r > −0.55
10:         σ = √CAVS · σ
11:     end if
12:     for every particle i
13:         if U(0,1) < p
14:             candidate_particle = PSO(i)
15:         else
16:             candidate_particle = Gauss(μ, σ)
17:         end if
18:         if candidate_particle has a better objective function value than the original particle
19:             x_i = candidate_particle
20:             Update pbest_i if necessary
21:         end if
22:     end for
23:     iter_number = iter_number + 1
24:     Update gbest if necessary
25:     if EDA–PSO is successful
26:         CAVS = CAVS · 1.1
27:     else
28:         CAVS = CAVS · 0.9
29:     end if
30:     if CAVS < 0.1 or CAVS > 10
31:         CAVS = 10
32:     end if
33: end while
34: return gbest

Fig. 2 The EDA–PSO algorithm

where α is the relaxation factor, and t is the iteration number. If h_d drops below h_d^min or rises above h_d^max, the population is resampled in the corresponding interval, [a, (a + b)/2] or [(a + b)/2, b], respectively.

The current authors (El-Abd and Kamel 2007) introduced PSO_Bounds, in which the concepts of PBIL are integrated into PSO. At the beginning of the algorithm, the particles are initialized in the predefined domain. After every iteration, the probability h_d of each dimension d is adjusted according to the probability of the value associated with this dimension being in the high interval of the defined domain. To prevent premature convergence, this probability is calculated using information from all the particles and not only gbest. Hence,


the original equations of PBIL are changed for continuous optimization as follows:

p_id^t = { 0 if pbest_id^t < (a + b)/2,
           1 otherwise,

p_d^t = (1/n) · Σ_{i=1}^{n} p_id^t,

h_d^{t+1} = (1 − α) · h_d^t + α · p_d^t,  (15)

where i ∈ {1 … n}, n is the number of particles, t is the iteration number, and d is the dimension.

When h_d becomes specific enough, the domain of dimension d is adjusted accordingly, and h_d is reinitialized to 0.5. In this model, different dimensions may end up having different domains and different velocity bounds, which does not happen in normal PSO.

In order to overcome the problem of the bounds overlapping, thus preventing further particle movement, the width of the adjusted bounds is taken into consideration whenever the algorithm needs to adjust these bounds for a certain dimension d. If the width drops below a predetermined percentage of the initial search domain width, controlled by a parameter T, the bounds are reset to the initial bounds of the search space, and the velocity component is also reinitialized. This allows the particles to move in different directions and in large steps in the next iteration while still taking the old pbest and gbest information into account, hence not losing any previous information gathered during the search.
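The bound-adjustment logic for one dimension can be sketched as follows. Treating T as a fraction of the initial domain width follows the prose description above, and the return signature is an illustrative choice:

```python
def adjust_bounds(h_d, lo, hi, init_lo, init_hi, h_min=0.1, h_max=0.9, T=0.01):
    """Domain adjustment sketch for PSO_Bounds: halve the domain towards the
    interval that h_d points at; if the domain becomes narrower than a fraction
    T of the initial width, reset it to the initial search bounds.  Returns
    (lo, hi, h_d, was_reset)."""
    if h_min <= h_d <= h_max:
        return lo, hi, h_d, False             # h_d is not specific enough yet
    mid = (lo + hi) / 2
    if h_d < h_min:
        hi = mid                              # keep the low interval
    else:
        lo = mid                              # keep the high interval
    if hi - lo < T * (init_hi - init_lo):     # bounds collapsed: reset the domain
        return init_lo, init_hi, 0.5, True
    return lo, hi, 0.5, False                 # reinitialize h_d after adjusting
```

Velocity bounds would be recomputed from the new domain width after each call, as in lines 14 and 17 of Fig. 3.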

The algorithm for PSO_Bounds is shown in Fig. 3, where x_d^min and x_d^max refer to the minimum and maximum bounds for dimension d, while v_d^min and v_d^max refer to the velocity bounds.

4 Parallel estimation of distribution algorithms

A recent research direction is the idea of parallel EDAs that exchange either a group of individuals or their built-in probabilistic models.

4.1 Exchanging individuals

A distributed probabilistic model-building genetic algorithm (DPMBGA) was presented by Hiroyaso et al. (2003). Principal Component Analysis (PCA) was used to handle the correlation between the design variables. DPMBGA uses the island model, where migration occurs in a directed ring topology. The migrant individuals were randomly chosen and used to replace the worst individuals in the next sub-population. DPMBGA provided better results than a real-coded GA, and the best results were achieved when using PCA in only half of the sub-populations.

In Madera et al. (2006), the authors proposed the use of a distributed version of EDA (dEDA) and applied it to both combinatorial and numerical problems. They used the island model, in which each processor executed a Univariate Marginal Distribution Algorithm (UMDA) (Mühlenbein 1998). Each processor exchanged information with other processors according to a certain migration policy. The information was exchanged so that the best individuals in one population replace the worst individuals in another. The experiments showed that the distributed model was able to solve problems of considerable complexity by using a suitable configuration of the migration parameters. The authors also introduced the


input Max_Function_Evaluations, number_of_particles, w, c1, c2, α, h_d^min, h_d^max, T

1 : Initialize the swarm
2 : Max_Iterations = Max_Function_Evaluations / Num_Particles
3 : iter_number = 1
4 : while iter_number ≤ Max_Iterations
5 :     Update the swarm
6 :     for each dimension d
7 :         Calculate p_d^t and update h_d
8 :         if h_d < h_d^min or h_d > h_d^max
9 :             if h_d < h_d^min
10:                 x_d^max = b = (a + b)/2
11:             else if h_d > h_d^max
12:                 x_d^min = a = (a + b)/2
13:             end if
14:             v_d^max = −v_d^min = (x_d^max − x_d^min)/2
15:             if x_d^max − x_d^min < T
16:                 Set x_d^min and x_d^max to the initial search bounds
17:                 v_d^max = −v_d^min = (x_d^max − x_d^min)/2
18:                 Re-initialize v_d
19:             end if
20:             h_d = 0.5
21:         end if
22:     end for
23:     iter_number = iter_number + 1
24: end while
25: return gbest

Fig. 3 The PSO_Bounds algorithm

idea of implementing a heterogeneous system where different processors execute different algorithms. However, our work introduces the first attempt to implement such an approach.

4.2 Exchanging probabilistic models

A basic framework for implementing a parallel EDA was introduced by Ahn et al. (2004) and applied using PBIL. Each island has a resident probability distribution vector (rPV) that estimates the distribution of the promising resident individuals. At every communication step, each island receives the immigrant PVs (iPV) from the neighboring islands. The evolution of each island proceeds through three different phases: the generation phase, the selection phase, and the update phase. In the generation phase, each island generates three types of individuals, namely, the resident individuals created by the rPV, the immigrant individuals created by the iPVs, and the crossbred individuals resulting from the crossover of the rPV and iPVs. In the selection phase, the best individuals are proportionally selected from the whole population. Finally, in the update phase, the selected individuals are used in updating the different PVs. This framework was used in implementing a discrete parallel EDA based on PBIL, referred to as P²BIL. The introduced approach was found to produce results that are comparable to those of multiple-deme parallel GAs.


De la Ossa et al. (2004) proposed an island EDA model with the migration of univariate distributions to solve combinatorial problems. Each island adopted UMDA. The migrant information between the cooperating islands was a tuple 〈M, f〉, where M is the probabilistic model, and f is the average objective function value of the best 10% of the population's individuals. When an island receives an immigrant model, M_i, this model is combined with the resident model, M_r, using the formula

M_r = β · M_r + (1 − β) · M_i,  (16)

where β was set to 0.9. The authors also proposed an adaptive approach for setting this parameter as follows (for a maximization problem):

β = { f_r / (f_i + f_r)  if f_i ≥ f_r,
      0.9 otherwise,  (17)

where f_r and f_i are the average objective function values related to the resident and immigrant models, respectively. The authors came to the conclusion that migrating a probabilistic model generally gives better results than migrating a group of individuals.
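For univariate models stored as parameter vectors, the combination rule of (16) with the adaptive β of (17) can be sketched as follows; the element-wise blend over a flat parameter list is an assumed representation of the model:

```python
def combine_models(M_r, M_i, f_r, f_i):
    """Blend the resident model M_r with the immigrant model M_i per (16),
    with the adaptive beta of (17) (maximization setting)."""
    beta = f_r / (f_i + f_r) if f_i >= f_r else 0.9          # (17)
    # (16): keep a beta-weighted share of the resident parameters
    return [beta * mr + (1 - beta) * mi for mr, mi in zip(M_r, M_i)]
```

Note the asymmetry of (17): when the immigrant model looks better (f_i ≥ f_r), β shrinks and the immigrant contributes more; otherwise the resident model keeps a fixed 0.9 share.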

In de la Ossa et al. (2006), the application of island-based parallel EDAs was extended to continuous domains. The authors experimented with islands that adopt either UMDA or EMNA_GLOBAL (Larrañaga and Lozano 2001), where the latter is used to capture multivariate dependencies. The normal distribution was used to model the promising individuals. Instead of the combination model previously proposed in de la Ossa et al. (2004) and shown in (16), mixture models were used, allowing the combination of single distributions into a joint model. In the mixture model, each variable of the individual is set by sampling either the resident model or the immigrant model. The sampled model is probabilistically chosen based on a probability β that is adaptively set as in (17). The following equation illustrates this approach:

Individual = { sample(M_r) if random(0,1) < β,
               sample(M_i) otherwise.  (18)

The experiments conducted in de la Ossa et al. (2006) showed that the parallelization was more beneficial when using UMDA. When using EMNA_GLOBAL, the islands required very large populations to correctly model the distribution, and this resulted in a performance deterioration. It was also shown that the migration of a probabilistic model is better than the migration of individuals, especially when setting β adaptively.
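The per-variable mixture sampling of (18) can be sketched as follows, representing each univariate model as a list of (mean, std) pairs, which is an assumed representation:

```python
import random

def sample_mixture(resident, immigrant, beta):
    """(18): build one individual by choosing, per variable, whether to sample
    the resident or the immigrant univariate Gaussian model."""
    individual = []
    for (mu_r, s_r), (mu_i, s_i) in zip(resident, immigrant):
        if random.random() < beta:
            individual.append(random.gauss(mu_r, s_r))   # resident model
        else:
            individual.append(random.gauss(mu_i, s_i))   # immigrant model
    return individual
```

Setting β to 1 or 0 recovers pure resident or pure immigrant sampling, so the rule degrades gracefully when one model clearly dominates.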

Schwarz et al. (2007) and Jaros and Schwarz (2007) proposed the use of a parallel bivariate marginal distribution algorithm (BMDA). The island model was used in a directed ring topology. The authors proposed two approaches for combining the immigrant and resident models. One approach was the mixed learning of the dependency graphs, experimenting with both the max and random operators. The other approach was the adaptive learning of dependency graphs, employing equations similar to (17) and (18). The authors reached the conclusion that the migration of probability models with adaptation can significantly improve the performance over the migration of individuals. They also found the sequential BMDA to produce competitive results when compared with the adaptive parallel version, but with the drawback of increased time complexity.


5 Which information should be exchanged?

When adopting a PSO cooperative model (or any cooperating model in general), there are some important issues that need to be addressed in order to make the model efficient. El-Abd and Kamel (2008) showed that one needs to determine, for each swarm, which information to exchange, when to exchange it, what to do with it, and to whom it should be sent.

In this work we discuss the type of information exchanged. In all the models discussed in the previous section, the cooperative EDAs either exchanged a group of individuals or exchanged probabilistic models that were combined using an adaptive scheme. However, there has been no comparison of the two approaches using the same model.

While investigating the use of the cooperative EDA-based particle swarm optimizer, we compare three different exchange approaches using a 2-swarm cooperative model. The information exchange approaches tested in this work are:

– The classical migration of particles, where a set containing the best particles of each swarm is sent to replace the worst particles in the other swarm.

– The migration of the probabilistic model:
  – One approach is that each swarm combines the received model with its own and then continues with the search.
  – Another approach is that each swarm uses the received probabilistic model to generate a set of particles to replace the worst particles that it contains.

For the first exchange approach, one needs to decide whether to replace the whole particle information or just its position x. In this work, we replace the whole information contained in the particle. One disadvantage of this approach is that, when particles are exchanged both ways, both swarms will hold a large number of exactly the same particles right after the exchange. This increases the probability of both swarms searching the same area of the search space, because they would start from the exact same points, namely those shared during the communication.
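As a rough illustration, a minimal Python sketch of this two-way particle migration is given below; the function name and the (position, fitness) particle representation are our own illustrative assumptions, with lower fitness taken as better:

```python
def migrate_particles(swarm_a, swarm_b, k):
    """Two-way migration for minimization: the k best particles of each
    swarm replace the k worst particles of the other swarm.
    Each particle is a (position, fitness) tuple; lower fitness is better."""
    best_a = sorted(swarm_a, key=lambda p: p[1])[:k]
    best_b = sorted(swarm_b, key=lambda p: p[1])[:k]
    # Sort descending by fitness so the worst particles come first.
    swarm_a = sorted(swarm_a, key=lambda p: p[1], reverse=True)
    swarm_b = sorted(swarm_b, key=lambda p: p[1], reverse=True)
    # Overwrite the worst k with copies of the other swarm's best k.
    swarm_a[:k] = [tuple(p) for p in best_b]
    swarm_b[:k] = [tuple(p) for p in best_a]
    return swarm_a, swarm_b
```

After the call, both swarms contain identical copies of the migrated particles, which is exactly the duplication issue noted above.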

For the second exchange approach, usually taken by researchers in the EDA field, probabilistic models are combined adaptively, as discussed earlier, based on the findings of previous research. However, randomly replacing some dimensions of the probabilistic model might actually guide the particles away from good regions: there is no guarantee that replacing one dimension of the resident model with the same dimension from the received model is of any benefit.

For the third exchange approach, using the received model to generate particles that replace the worst particles in the receiving swarm should overcome the previous two problems. The information present in the received model is transmitted through the generated particles. These particles will not harm the search in any way, as they replace the worst particles in the swarm and, at the same time, are not identical to the ones present in the sending swarm. Additionally, the resident model remains the same unless these particles are good enough to influence it.

Another interpretation is that the generated particles can be regarded as mutated versions of the best particles of the swarm that built the probabilistic model. This mutation is guaranteed to be small, as the sample distribution is drawn from the best particles of that swarm, and it allows the two swarms to keep searching different points.

When more swarms are added to the model, another parameter might affect the performance of the different exchange approaches: the swarm topology. In many previous studies, the ring topology was shown to be the best one when more than two modules cooperate. When the ring topology is adopted, each swarm sends its information to the next swarm while receiving information from the previous one. This might change the relative performance of the different exchange strategies. Although this work focuses on a 2-swarm model, the same exchange approaches are to be tested in future work when more swarms are added to the cooperating model.

6 Proposed model

In this work, we investigate the use of a cooperative EDA-based particle swarm optimizer that is based on the exchange of heterogeneous probabilistic models. The model uses two swarms, each of which employs a different PSO and EDA hybrid.

6.1 Cooperative swarms

In the hybrid model shown in Fig. 4 and illustrated in Fig. 5, one swarm uses the PSO_Bounds algorithm, while the other uses the EDA–PSO approach. At every communication step, each swarm sends its resident model to the other swarm, along with the average objective function value of the best half of its individuals, while receiving the same kind of information back. For the PSO_Bounds swarm, the model sent is a vector containing, for every dimension, the lower bound, the upper bound, and the probability of the value being in the upper half:

M_PSO_Bounds = ⟨(a_1, b_1, h_1), (a_2, b_2, h_2), …, (a_n, b_n, h_n)⟩,    (19)

where n is the problem size.
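As an illustration of how such a ⟨a, b, h⟩ model can drive sampling, the sketch below draws one position from a bounds-style model; the function name is hypothetical, and the interpretation of h_d as the probability of sampling in the upper half of [a_d, b_d] follows the description above:

```python
import random

def sample_from_bounds_model(model, rng=random):
    """Draw one position from a PSO_Bounds-style model: a list of
    (a_d, b_d, h_d) triples, where h_d is the probability that the
    sampled value falls in the upper half of the interval [a_d, b_d]."""
    position = []
    for a, b, h in model:
        mid = (a + b) / 2.0
        if rng.random() < h:
            position.append(rng.uniform(mid, b))   # upper half
        else:
            position.append(rng.uniform(a, mid))   # lower half
    return position
```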

Fig. 4 Hybrid cooperative model

input Max_Function_Evaluations, number_of_particles, w, c1, c2, Synchronization_Period

1 : Initialize the two swarms

2 : Max_Iterations = Max_Function_Evaluations / Num_Particles

3 : iter_number = 1

4 : while iter_number ≤ Max_Iterations

5 : Update PSO_Bounds swarm

6 : Update EDA-PSO swarm

7 : if Synchronization

8 : Exchange information

9 : end if

10: iter_number = iter_number + 1

11: end while

12: return the better of the two gbest solutions

Fig. 5 The sequential algorithm for the cooperative model
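The algorithm in Fig. 5 can be sketched in Python roughly as follows; the swarm interface (update(), exchange_with(), and a gbest attribute holding a (position, value) pair) is a hypothetical abstraction, not the authors' actual implementation:

```python
def run_cooperative(swarms, max_evals, num_particles, sync_period):
    """Schematic driver for the sequential cooperative model of Fig. 5.
    Each swarm object is assumed (hypothetically) to expose update(),
    exchange_with(other), and a gbest attribute (position, value)."""
    max_iterations = max_evals // num_particles
    for iter_number in range(1, max_iterations + 1):
        for swarm in swarms:
            swarm.update()
        if iter_number % sync_period == 0:   # synchronization point
            swarms[0].exchange_with(swarms[1])
    # Return the better (lower) of the two gbest solutions.
    return min((s.gbest for s in swarms), key=lambda g: g[1])
```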


On the other hand, the EDA–PSO model received consists of the means and standard deviations of the normal distributions used to sample the different dimensions in the other swarm:

M_EDA–PSO = ⟨(μ_1, σ_1), (μ_2, σ_2), …, (μ_n, σ_n)⟩.    (20)

Note that f_PSO_Bounds is the average objective function value of the best half of the PSO_Bounds swarm, while f_EDA–PSO is the average objective function value of the best half of the EDA–PSO swarm.

The communication approach adopted by the model is the synchronous one. In this approach, both swarms perform the same predetermined number of iterations before attempting communication. This parameter is referred to as the synchronization period.

When the third exchange approach is adopted, the new particles' velocities are initialized randomly within the range specified by the received model. In case the received model is a Gaussian one, the velocities are initialized in the range specified by the PSO_Bounds (recipient) swarm, as the velocities would be set to these bounds anyway.

6.2 Probabilistic model sharing

This section gives further details on the second information exchange approach. At every communication step, each swarm has to combine the received model with its own. However, since these models are different, this is done in two steps, namely model conversion and model combination.

6.2.1 Model conversion

In the model conversion step, each swarm converts the immigrant model to an equivalent model in the same form as its resident one. For the PSO_Bounds swarm, the immigrant model is converted as shown below:

a_d = μ_d − γ · σ_d,
b_d = μ_d + γ · σ_d,    (21)
h_d = 0.5.

The value of γ can be interpreted as how much we trust the received model. If the Gaussian model already surrounds an optimum, it is better to use a small value of γ, to restrict the search tightly around this optimum. However, if the Gaussian model covers a slope, γ should be large, so that the search is not confined to the middle area covered by the model, which does not contain good solutions. For simplicity, this parameter is fixed inside the PSO_Bounds swarm, although a method similar to the one used by Grahl et al. (2006) could be used to set its value adaptively.
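A direct transcription of (21) in Python might look as follows (the function name is ours):

```python
def gaussian_to_bounds(gaussian_model, gamma=3.0):
    """Convert an EDA-PSO Gaussian model, a list of (mu_d, sigma_d)
    pairs, into a PSO_Bounds model of (a_d, b_d, h_d) triples using
    Eq. (21): the interval spans gamma standard deviations on each
    side of the mean, and h_d is reset to the neutral value 0.5."""
    return [(mu - gamma * sigma, mu + gamma * sigma, 0.5)
            for mu, sigma in gaussian_model]
```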

For the EDA–PSO swarm, the process starts by checking the value of h_d to see whether a larger number of particles lies in the lower or the upper half of the received interval. The process then continues by adjusting the received interval and using it to generate an equivalent Gaussian model according to the following equations, which is equivalent to setting γ equal to 3 in (21):

[a_d, b_d] = [a_d, (a_d + b_d)/2]   if h_d < 0.5,
             [(a_d + b_d)/2, b_d]   otherwise,    (22)

μ_d = (a_d + b_d)/2,
σ_d = (b_d − a_d)/6.    (23)

Figure 6 illustrates the model conversion process carried out by the PSO_Bounds and EDA–PSO swarms.

Fig. 6 Probabilistic models conversion

Fig. 7 The PSO_Bounds swarm combination function:

input M_r, f_PSO_Bounds, M_i, f_EDA–PSO
1 : if f_EDA–PSO < f_PSO_Bounds
2 :     β = f_EDA–PSO / (f_EDA–PSO + f_PSO_Bounds)
3 : else
4 :     β = 0.9
5 : end if
6 : for every dimension d
7 :     if U(0,1) < β
8 :         M_res^d = M_r^d
9 :     else
10:         M_res^d = M_i^d
11:     end if
12: end for
13: return M_res
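The reverse conversion, from a bounds model to a Gaussian one as described by (22) and (23), can be sketched as follows (the function name is ours):

```python
def bounds_to_gaussian(bounds_model):
    """Convert a PSO_Bounds model of (a_d, b_d, h_d) triples into a
    Gaussian model of (mu_d, sigma_d) pairs, following Eqs. (22)-(23):
    keep the half of the interval holding more particles, then set
    mu to its centre and sigma to one sixth of its width (gamma = 3)."""
    gaussian = []
    for a, b, h in bounds_model:
        mid = (a + b) / 2.0
        # Eq. (22): shrink to the more populated half of [a, b].
        a, b = (a, mid) if h < 0.5 else (mid, b)
        # Eq. (23): centre and spread of the equivalent Gaussian.
        gaussian.append(((a + b) / 2.0, (b - a) / 6.0))
    return gaussian
```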

6.2.2 Model combination

After the conversion is done, each swarm will have two models in the same form that it needs to combine. In the PSO_Bounds swarm, the model combination step uses the mixture model approach adopted by de la Ossa et al. (2006) on a dimension-by-dimension basis, as shown in Fig. 7 for a minimization problem, where M_r, M_i, and M_res are the resident, immigrant, and resultant models, respectively, and U(0,1) is a uniformly distributed number in the range [0,1). After this step, the PSO_Bounds swarm continues with the search process using the ⟨a, b, h⟩ values of the combined model.
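A sketch of this dimension-wise mixture combination (the logic of Fig. 7, for minimization) is given below; the function name and the generic per-dimension model lists are illustrative assumptions:

```python
import random

def combine_models(resident, immigrant, f_res, f_imm, rng=random):
    """Dimension-wise mixture combination (Fig. 7, minimization).
    Each model is a list of per-dimension parameters. beta is the
    probability of keeping the resident dimension; the better the
    immigrant swarm's average fitness f_imm, the lower beta gets."""
    if f_imm < f_res:
        beta = f_imm / (f_imm + f_res)
    else:
        beta = 0.9
    return [res if rng.random() < beta else imm
            for res, imm in zip(resident, immigrant)]
```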

For the EDA–PSO swarm, the immigrant model is combined with the resident one following the same approach. The resultant model is then used to generate a number of new particles to replace the worst particles in the swarm.


6.3 Information adaptation

In this section, an adaptive cooperative version is proposed. The benefit of this adaptive version is to increase the influence of the component that performs better on the function under study. Instead of exchanging a fixed amount of information, the swarm that performs better during the search sends more information to the other swarm while receiving less.

This is done by observing the performance of both components at every iteration during the search process. Two counters are used, one for each component; when a component has the better performance during a given iteration, its counter is incremented. To accommodate the fact that the components behave differently during the search, a sliding window approach is taken by resetting the counters every 50 iterations. At the end of every window, the percentage of iterations in which PSO_Bounds had the better performance is calculated and used to control the amount of information flowing between the two swarms.
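A small helper illustrating this sliding-window bookkeeping might look as follows (the class name and reporting interface are our own assumptions):

```python
class PerformanceWindow:
    """Sliding-window performance tracker (hypothetical helper).
    Each iteration, the component with the better (lower) best value
    gets a win; at the end of a window the share of wins of the first
    component is reported and both counters are reset."""
    def __init__(self, window_size=50):
        self.window_size = window_size
        self.wins = [0, 0]
        self.iteration = 0

    def record(self, best_a, best_b):
        """Record one iteration; return the win share of component A
        at the end of a window, and None otherwise."""
        self.wins[0 if best_a < best_b else 1] += 1
        self.iteration += 1
        if self.iteration % self.window_size == 0:
            share = self.wins[0] / self.window_size
            self.wins = [0, 0]   # reset for the next window
            return share
        return None
```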

For the first exchange approach, the classical migration of particles, the number of particles migrating from one swarm to another is controlled. Normally, the number of particles exchanged is 20% of the swarm size. When one component performs better than the other, it receives only 10% and sends 40% to the other swarm.

For the third exchange approach, in which the migrant model is used to generate new particles, the number of particles generated by the migrant probabilistic model is controlled in the exact same manner.

For the second exchange approach, in which the migrant model is combined with the resident one, the PSO_Bounds swarm's influence is increased by raising the number of particles replaced in the EDA–PSO swarm, similarly to the previous approaches. On the other hand, to increase the EDA–PSO swarm's influence, the calculation of β in the PSO_Bounds swarm is changed as follows:

β = factor · f_EDA–PSO / (f_EDA–PSO + f_PSO_Bounds)   if f_EDA–PSO < f_PSO_Bounds,
β = factor · 0.9                                      otherwise,    (24)

where factor controls the influence of the EDA–PSO swarm. Normally, factor is set to 1; decreasing it increases the influence of the EDA–PSO swarm.
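Equation (24) translates directly into a small helper (the function name is ours):

```python
def adaptive_beta(f_eda_pso, f_pso_bounds, factor=1.0):
    """Eq. (24): beta used by the PSO_Bounds swarm when combining
    models. Since beta is the probability of keeping the resident
    (PSO_Bounds) dimension, decreasing factor below 1 increases the
    influence of the EDA-PSO swarm."""
    if f_eda_pso < f_pso_bounds:
        return factor * f_eda_pso / (f_eda_pso + f_pso_bounds)
    return factor * 0.9
```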

The adaptive cooperative model is shown in Fig. 8, where the synchronization period is assumed to be equal to the window size.

7 Experimental results

7.1 Experimental settings

Table 1 shows the settings used for applying the algorithms under study. For all experiments, the particles have been randomly initialized in the specified domain using a uniform distribution. For EDA–PSO, the value for p is set adaptively using the all-historical-information approach, as it was found to be the best one in our experiments. For PSO_Bounds, the values for (α, h_d^min, h_d^max) are changed from (0.01, 0.1, 0.9) in Servet et al. (1997) to (0.1, 0.2, 0.8) to allow a faster process of varying the bounds.


input Max_Function_Evaluations, number_of_particles, w, c1, c2, Synchronization_Period

1 : Initialize the two swarms

2 : Max_Iterations = Max_Function_Evaluations / Num_Particles

3 : iter_number = 1

4 : while iter_number ≤ Max_Iterations

5 : Update PSO_Bounds swarm

6 : Update EDA-PSO swarm

7 : if Synchronization

8 : Measure the two components’ performances

9 : Adjust amount of exchanged information

10: Exchange information

11: end if

12: iter_number = iter_number + 1

13: end while

14: return the better of the two gbest solutions

Fig. 8 The adaptive sequential algorithm for the cooperative model

Table 1 Parameter settings

Model                          Parameter  Value
Decreasing inertia weight PSO  w          0.9 to 0.1 (linearly decreasing with the iterations)
                               c1, c2     2
EDA–PSO                        p          Adaptive (all historical information)
PSO_Bounds                     α          0.1
                               h_d^min    0.2
                               h_d^max    0.8
                               T          0.0001

The experiments are conducted using the classical benchmark functions shown in Table 2 and the benchmark functions f6–f14 proposed in CEC2005 (Suganthan et al. 2005b), shown in Table 3 and available at Suganthan et al. (2005a).

In order to constrain the particles' movement within the specified domain for the CEC05 functions, any violating particle has its position randomly reinitialized inside the specified domain. This approach was chosen over the absorbing or reflecting boundary handling approaches, as it produced better results in our experiments.
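This boundary-handling rule can be sketched as follows; the function name and the scalar [low, high] domain are illustrative simplifications:

```python
import random

def enforce_domain(position, low, high, rng=random):
    """Boundary handling used for the CEC05 functions: any particle
    that leaves the domain has its whole position randomly
    reinitialized inside [low, high] (uniformly), rather than being
    absorbed at or reflected off the boundary."""
    if all(low <= x <= high for x in position):
        return position
    return [rng.uniform(low, high) for _ in position]
```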

The experiments are run for problem dimensionalities of 10, 30, and 50 with 40 particles in the swarm performing 100,000, 100,000, and 200,000 function evaluations, respectively. The results reported are the means and standard deviations of the solution value after the performed function evaluations, taken over 30 runs.

In the model conversion step of the PSO_Bounds swarm, we set the value of γ to 3. In this way, the interval [a, b] should contain about 99% of the values that could be generated using the received Gaussian distribution. In the model combination step of the EDA–PSO swarm, the number of newly generated particles was empirically chosen to be 20% of the swarm.

The PSO algorithm used in all the components of this work is based on the gbest model and uses the decreasing inertia weight approach.


Table 2 Benchmark functions

Function    Equation                                                                    Domain
Sphere      f(x) = Σ_{d=1}^{n} x_d^2                                                    (−100, 100)
Rosenbrock  f(x) = Σ_{d=1}^{n−1} (100 · (x_d^2 − x_{d+1})^2 + (x_d − 1)^2)              (−30, 30)
Griewank    f(x) = (1/4000) Σ_{d=1}^{n} x_d^2 − Π_{d=1}^{n} cos(x_d/√d) + 1             (−600, 600)
Ackley      f(x) = 20 + e − 20 · exp(−0.2 √((1/n) Σ_{d=1}^{n} x_d^2))
                   − exp((1/n) Σ_{d=1}^{n} cos(2πx_d))                                  (−30, 30)
Rastrigin   f(x) = Σ_{d=1}^{n} (x_d^2 − 10 · cos(2πx_d) + 10)                           (−5.12, 5.12)
Schwefel    f(x) = 418.9829 · n + Σ_{d=1}^{n} (−x_d sin √|x_d|)                         (−500, 500)

Table 3 CEC05 benchmark functions

Benchmark function  Description                                   Domain
f6                  shifted Rosenbrock                            (−100, 100)
f7                  shifted rotated Griewank                      (0, 600)
f9                  shifted Rastrigin                             (−5, 5)
f10                 shifted rotated Rastrigin                     (−5, 5)
f11                 shifted rotated Weierstrass                   (−0.5, 0.5)
f12                 expanded shifted Schwefel                     (−100, 100)
f13                 expanded extended Griewank plus Rosenbrock    (−3, 1)
f14                 expanded rotated extended Schaffer            (−100, 100)

In this approach, w is decreased linearly from 0.9 at the beginning of the search, to emphasize exploration, to 0.1 at the end, to encourage exploitation.
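A minimal sketch of this linear schedule (hypothetical helper name; the exact denominator used by the authors is not specified, so we assume max_iterations − 1 steps):

```python
def inertia_weight(iteration, max_iterations, w_start=0.9, w_end=0.1):
    """Linearly decreasing inertia weight: w goes from w_start at the
    first iteration (exploration) to w_end at the last (exploitation)."""
    frac = iteration / (max_iterations - 1) if max_iterations > 1 else 1.0
    return w_start + (w_end - w_start) * frac
```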

7.2 Results of the individual algorithms

Tables 4, 5, and 6 show the results obtained by applying the decreasing inertia weight PSO algorithm, EDA–PSO, and PSO_Bounds for different problem sizes. The PSO_Bounds algorithm was not applied to f7, as this function is not bounded by a specified domain (the bounds shown in Table 3 are used only as an initialization range). To test for the significance of the results, we used the Mann–Whitney nonparametric statistical test, where the null hypothesis is rejected with a 95% confidence level.

For the classical functions, all the algorithms have comparable performance. However, for the CEC05 benchmark functions, where the global optimum is not in the middle of the search space, both EDA–PSO and PSO_Bounds outperform the decreasing inertia weight PSO algorithm, as the latter has a bias towards the origin of the search space. This is verified by running experiments similar to the ones reported in Monson and Seppi (2005) for a dimensionality of 10. The algorithm is run using the center offset approach (shifting the function under study using a value of c equal to 0.25, 0.5, and 0.75) and applied to the Sphere, Rosenbrock, and Rastrigin functions.

Fig. 9 Decreasing inertia weight PSO performance under center offset

Figure 9 shows the convergence behavior of the algorithm for all values of c across the different functions. The algorithm consistently performs best when c = 0.5, which is the case when the origin is in the middle of the search space.

Finally, to compare the algorithms studied in this work against a simple EDA, the results of EDA_mvg are taken directly from Yuan and Gallagher (2005) and shown in Table 7. EDA_mvg is a continuous EDA applied to the CEC05 benchmark functions tested in this work, where mvg refers to the multivariate Gaussian distribution used to model and generate successive populations; the mvg model was chosen to capture the correlation between the different variables. EDA_mvg was only applied to dimensionalities of 10 and 30, using 200 and 1,000 individuals performing 100,000 and 300,000 function evaluations, respectively (the results in the table for a dimensionality of 30 are achieved after 100,000 function evaluations). The results show that EDA_mvg is outperformed by the decreasing inertia weight PSO, in both dimensionalities for some cases or in the higher dimensionality for others, except for f12. Keep in mind also that EDA_mvg needed a very large population to achieve these results. These results encourage the introduction of PSO components into EDAs, and hence the use of the hybrid algorithms studied in this work.


Table 4 Results of all the algorithms for the classical functions; 10, 30, and 50 dimensions. Each cell shows the mean with the standard deviation in parentheses.

Benchmark   Dim  Decreasing inertia weight PSO  EDA–PSO                 PSO_Bounds
Sphere      10   1.690e−146 (8.440e−146)        1.78e−160 (9.59e−160)   2.102e−156 (8.017e−156)
Rosenbrock  10   7.192e+00 (1.446e+01)          5.321e−01 (1.379e+00)   9.663e+00 (9.174e+00)
Griewank    10   7.633e−02 (3.623e−02)          5.242e−02 (4.096e−02)   2.025e−02 (1.468e−02)
Ackley      10   3.194e−15 (1.598e−15)          1.268e+00 (2.258e−15)   7.071e−16 (6.486e−16)
Rastrigin   10   2.753e+00 (1.625e+00)          1.857e+00 (1.449e+00)   0 (0)
Schwefel    10   3.617e+02 (1.516e+02)          1.579e+03 (2.291e+02)   5.390e+02 (2.232e+02)
Sphere      30   1.435e−35 (2.668e−35)          2.037e−21 (5.851e−21)   9.022e−26 (4.530e−25)
Rosenbrock  30   8.024e+01 (1.195e+02)          1.926e+01 (2.222e+00)   7.827e+01 (7.351e+01)
Griewank    30   1.517e−02 (1.903e−02)          1.655e−02 (1.730e−02)   3.015e−02 (2.844e−02)
Ackley      30   1.681e−16 (6.443e−15)          1.586e+00 (9.034e−16)   9.429e−12 (5.164e−11)
Rastrigin   30   3.270e+01 (7.375e+00)          1.138e+01 (2.846e+00)   6.418e+00 (1.553e+01)
Schwefel    30   3.205e+03 (5.122e+02)          5.252e+03 (6.468e+02)   1.674e+03 (3.225e+02)
Sphere      50   2.807e−33 (6.159e−33)          5.232e−06 (2.864e−05)   5.462e−09 (2.562e−08)
Rosenbrock  50   1.484e+02 (1.784e+02)          3.826e+01 (1.086e+01)   1.094e+02 (4.634e+01)
Griewank    50   8.770e−03 (1.313e−02)          3.842e−06 (2.026e−05)   2.498e−02 (3.249e−02)
Ackley      50   4.867e−14 (4.285e−14)          1.641e+00 (2.556e−16)   1.609e−06 (6.044e−06)
Rastrigin   50   8.583e+01 (1.520e+01)          2.710e+01 (3.118e+01)   1.110e+01 (1.718e+01)
Schwefel    50   7.628e+03 (9.964e+02)          8.834e+03 (1.287e+03)   2.990e+03 (6.008e+02)


Table 5 Results of all the algorithms for the CEC05 functions, 10 and 30 dimensions. Each cell shows the mean with the standard deviation in parentheses.

Benchmark  Dim  Decreasing inertia weight PSO  EDA–PSO                PSO_Bounds
f6         10   8.576e+00 (3.192e+01)          1.463e+00 (1.954e+00)  1.500e+01 (1.437e+01)
f7         10   2.752e−01 (1.524e−01)          4.530e−02 (3.065e−02)  –
f9         10   2.355e+00 (1.266e+00)          2.720e+00 (1.752e+00)  0 (0)
f10        10   1.383e+01 (5.389e+00)          3.152e+00 (1.925e+00)  9.784e+00 (4.173e+00)
f11        10   4.363e+00 (1.042e+00)          5.100e+00 (2.456e+00)  3.602e+00 (1.345e+00)
f12        10   6.552e+03 (4.840e+03)          1.496e+04 (6.251e+03)  3.277e+03 (2.481e+03)
f13        10   6.266e−01 (1.792e−01)          8.142e−01 (3.086e−01)  4.111e−01 (1.103e−01)
f14        10   2.709e+00 (4.667e−01)          2.725e+00 (4.708e−01)  2.486e+00 (5.332e−01)
f6         30   8.305e+01 (9.054e+01)          3.722e+01 (3.050e+01)  2.245e+02 (3.021e+02)
f7         30   2.060e−02 (1.764e−02)          1.607e−02 (1.645e−02)  –
f9         30   3.618e+01 (7.420e+00)          1.602e+01 (3.948e+00)  4.906e+00 (5.795e+00)
f10        30   9.752e+01 (4.407e+01)          2.052e+01 (1.679e+01)  4.777e+01 (1.640e+01)
f11        30   2.994e+01 (4.081e+00)          2.625e+01 (1.347e+01)  2.803e+01 (3.119e+00)
f12        30   3.369e+05 (2.093e+05)          8.596e+05 (2.114e+05)  2.997e+05 (2.471e+05)
f13        30   3.352e+00 (8.214e−01)          3.138e+00 (7.602e−01)  2.155e+00 (6.103e−01)
f14        30   1.296e+01 (3.511e−01)          1.230e+01 (4.349e−01)  1.260e+01 (4.956e−01)

7.3 Comparison of the different exchange approaches

The three exchange approaches are evaluated using a fixed synchronization period of 50 iterations (equal to the information adjustment window size). The results are shown in Tables 8, 9, and 10.


Table 6 Results of all the algorithms for the CEC05 functions, 50 dimensions. Each cell shows the mean with the standard deviation in parentheses.

Benchmark  Dim  Decreasing inertia weight PSO  EDA–PSO                PSO_Bounds
f6         50   1.624e+02 (1.658e+02)          4.407e+01 (1.870e+01)  7.175e+02 (1.450e+03)
f7         50   3.510e−02 (3.226e−02)          6.000e−04 (1.632e−03)  –
f9         50   9.247e+01 (1.671e+01)          3.287e+01 (5.679e+00)  1.503e+01 (1.158e+01)
f10        50   2.211e+02 (7.229e+01)          3.658e+01 (6.425e+00)  8.627e+01 (2.439e+01)
f11        50   6.351e+01 (5.214e+00)          5.195e+01 (2.594e+01)  5.501e+01 (6.475e+00)
f12        50   7.271e+05 (6.204e+05)          3.952e+06 (9.902e+05)  2.818e+05 (6.999e+05)
f13        50   7.498e+00 (1.334e+00)          5.751e+00 (1.026e+00)  3.598e+00 (7.236e−01)
f14        50   2.279e+01 (3.393e−02)          2.163e+01 (7.393e−01)  2.219e+01 (5.788e−01)

Table 7 Results of EDA_mvg, 10 and 30 dimensions. Each cell shows the mean with the standard deviation in parentheses.

Benchmark  Dimension = 10          Dimension = 30
f6         4.182e−02 (1.508e−01)   1.067e+03* (4.726e+03)
f7         4.205e−01 (1.330e−01)   7.327e−01 (1.377e−01)
f9         5.418e+00 (1.910e+00)   1.895e+02 (1.260e+01)
f10        5.289e+00 (2.774e+00)   2.054e+02* (1.860e+01)
f11        3.945e+00 (3.120e+00)   4.049e+01* (1.032e+00)
f12        4.423e+02 (1.169e+03)   1.868e+04 (1.723e+04)
f13        1.842e+00 (3.340e−01)   1.720e+01* (1.207e+00)
f14        2.630e+00 (3.942e−01)   1.355e+01* (1.249e−01)

*These results, obtained after 300,000 function evaluations, are still outperformed by the algorithms in this work performing only 100,000 function evaluations.

The results show that exchanging probabilistic models both ways and then using them to generate particles that replace the worst particles of the receiving swarm is the best approach. As indicated earlier, this approach strikes a balance between the other two approaches, as it overcomes their disadvantages.

A number of experiments are performed using different synchronization periods, and a sample of the results is shown in Figs. 10 and 11. The results show that most of the functions have similar behaviors in response to changing the synchronization period and that


Table 8 Results of all the exchange approaches for the classical functions; 10, 30, and 50 dimensions. Each cell shows the mean with the standard deviation in parentheses.

Benchmark   Dim  Model combination      Particles exchange     Models generating particles
Sphere      10   0 (0)                  0 (0)                  0 (0)
Rosenbrock  10   4.817e+00 (1.600e+01)  2.767e+00 (6.735e+00)  1.914e+00 (4.323e+00)
Griewank    10   3.984e−02 (2.963e−02)  4.154e−02 (2.907e−02)  3.002e−02 (2.038e−02)
Ackley      10   0 (0)                  0 (0)                  0 (0)
Rastrigin   10   1.658e−01 (4.588e−01)  3.640e−01 (9.563e−01)  9.950e−02 (3.036e−01)
Schwefel    10   4.318e+02 (1.898e+02)  4.774e+02 (2.735e+02)  4.251e+02 (1.711e+02)
Sphere      30   0 (0)                  0 (0)                  0 (0)
Rosenbrock  30   5.371e+01 (5.796e+01)  1.783e+02 (5.533e+02)  4.775e+01 (3.631e+01)
Griewank    30   1.063e−02 (1.199e−02)  2.616e−02 (3.037e−02)  1.442e−02 (1.598e−02)
Ackley      30   3.729e−01 (4.435e−01)  4.266e−01 (1.302e+00)  2.020e−05 (5.066e−05)
Rastrigin   30   2.538e+01 (1.074e+01)  1.223e+01 (1.035e+01)  7.702e+00 (1.000e+01)
Schwefel    30   2.533e+03 (1.018e+03)  1.948e+03 (8.448e+02)  1.855e+03 (5.391e+02)
Sphere      50   2.033e−06 (1.114e−05)  0 (0)                  0 (0)
Rosenbrock  50   1.003e+02 (9.900e+01)  9.940e+01 (1.381e+02)  1.067e+02 (9.557e+01)
Griewank    50   1.073e−02 (2.011e−02)  3.259e−02 (3.803e−02)  9.811e−03 (1.803e−02)
Ackley      50   3.277e−01 (1.524e+00)  0 (0)                  5.123e−04 (1.297e−03)
Rastrigin   50   3.652e+01 (1.453e+01)  5.195e+00 (7.847e+00)  1.512e+01 (2.683e+01)
Schwefel    50   3.809e+03 (1.334e+03)  3.096e+03 (6.495e+02)  3.163e+03 (8.179e+02)


Table 9 Results of all the exchange approaches for the CEC05 functions, 10 and 30 dimensions. Each cell shows the mean with the standard deviation in parentheses.

Benchmark  Dim  Model combination      Particles exchange     Models generating particles
f6         10   5.873e+00 (1.758e+01)  8.488e+00 (3.149e+01)  2.161e+00 (3.746e+00)
f9         10   5.388e−01 (8.675e−01)  1.165e+00 (1.309e+00)  6.506e−01 (8.375e−01)
f10        10   3.814e+00 (2.934e+00)  9.371e+00 (5.066e+00)  3.250e+00 (1.732e+00)
f11        10   3.443e+00 (1.618e+00)  4.069e+00 (1.566e+00)  3.007e+00 (1.636e+00)
f12        10   1.725e+03 (2.812e+03)  9.962e+03 (6.109e+03)  7.399e+02 (9.237e+02)
f13        10   5.070e−01 (1.307e−01)  5.372e−01 (1.605e−01)  4.750e−01 (1.591e−01)
f14        10   2.479e+00 (4.013e−01)  2.224e+00 (4.330e−01)  2.427e+00 (3.804e−01)
f6         30   7.462e+01 (9.737e+01)  8.818e+01 (8.786e+01)  8.414e+01 (9.221e+01)
f9         30   2.129e+01 (6.387e+00)  2.176e+01 (8.921e+00)  1.876e+01 (5.750e+00)
f10        30   4.479e+01 (3.296e+01)  4.750e+01 (1.538e+01)  2.441e+01 (1.135e+01)
f11        30   2.689e+01 (7.409e+00)  2.972e+01 (5.095e+00)  2.417e+01 (5.067e+00)
f12        30   3.245e+05 (2.273e+05)  5.569e+05 (2.064e+05)  5.013e+04 (3.424e+04)
f13        30   3.302e+00 (8.462e−01)  3.048e+00 (1.012e+00)  2.839e+00 (6.715e−01)
f14        30   1.221e+01 (6.402e−01)  1.243e+01 (3.801e−01)  1.239e+01 (5.145e−01)

the synchronization period at which the best result is obtained differs from one function to another. However, the experiments show that our default synchronization period of 50 iterations is usually a reasonable choice across different functions and dimensions. This value is kept fixed in order to avoid tuning such a parameter and to make a fair comparison with the other cooperative PSO models.

7.4 Convergence analysis

The convergence behavior of the two cooperating components and of the model using the best exchange approach is shown in Figs. 12, 13, and 14.


Table 10 Results of all the exchange approaches for the CEC05 functions, 50 dimensions. Each cell shows the mean with the standard deviation in parentheses.

Benchmark  Dim  Model combination      Particles exchange     Models generating particles
f6         50   1.194e+02 (8.599e+01)  1.308e+02 (1.014e+02)  8.956e+01 (6.116e+01)
f9         50   3.529e+01 (1.125e+01)  1.983e+01 (8.090e+00)  2.998e+01 (1.085e+01)
f10        50   3.360e+01 (7.125e+00)  9.061e+01 (2.267e+01)  6.169e+01 (2.141e+01)
f11        50   5.232e+01 (1.378e+01)  4.521e+01 (7.993e+00)  4.602e+01 (9.117e+00)
f12        50   7.024e+04 (4.908e+04)  6.691e+04 (9.928e+04)  4.384e+04 (3.087e+04)
f13        50   5.723e+00 (9.960e−01)  5.219e+00 (1.086e+00)  5.200e+00 (1.088e+00)
f14        50   2.135e+01 (7.338e−01)  2.209e+01 (6.963e−01)  2.127e+01 (7.738e−01)

Fig. 10 Synchronization period effect for the Rosenbrock function; 10 and 30 dimensions

The plots illustrate that EDA–PSO generally converges more slowly than PSO_Bounds. As for the cooperative model, more than one scenario indicates that the model generally tracks the component with the best performance:

– For f6 and f10, the model tracks PSO_Bounds at the beginning of the search, as PSO_Bounds converges faster, but then switches to track EDA–PSO towards the end of the search.

– For f11 and f12, the model generally follows the behavior of PSO_Bounds and gives a marginally better result for f11 and a superior result for f12.

– For f13, both components have very similar behavior, and the model follows both of them.


Fig. 11 Synchronization period effect for two sample CEC05 benchmark functions; 10 and 30 dimensions

– For f14, the model follows PSO_Bounds, which has the better performance on the small problem size. For higher dimensionalities, the model follows EDA–PSO, which has the best performance.

8 Performance of the adaptive model in comparison with other algorithms

The proposed adaptive cooperative model is compared with the following state-of-the-art cooperative PSO algorithms:

– CPSO_S (van den Bergh and Engelbrecht 2004): A cooperative PSO approach where each dimension is optimized by a separate swarm. The approach uses 10 particles per swarm, as this was shown in van den Bergh and Engelbrecht (2004) to be the optimal setting.

– DMS-L-PSO (Liang and Suganthan 2005a, 2005b): A dynamic lbest multi-swarm approach in which particles are randomly and continuously assigned to different swarms. DMS-L-PSO is also combined with the Quasi-Newton method to improve its local search ability. The approach uses 20 swarms and 3 particles per swarm. These swarms are randomly re-constructed every 5 iterations, and the Quasi-Newton method is performed on the best 20% of the particles' pbests every 100 iterations.

Fig. 12 Convergence behavior of the three algorithms for the CEC05 benchmark functions f6, f9, and f10; 10 and 30 dimensions


Fig. 13 Convergence behavior of the three algorithms for the CEC05 benchmark functions f11 and f12; 10 and 30 dimensions

– TRIBES-D (Clerc 2006a; Cooren et al. 2009): A parameter-free PSO with multiple swarms, referred to as tribes. The tribes share information among themselves and have the ability to destroy bad particles and/or randomly generate new ones to form a new tribe. The source code is available at Clerc (2006b).

Tables 11, 12, and 13 show the complete results of these algorithms for the classical functions and the CEC05 benchmark functions in all dimensions. Table 14 summarizes this comparison, showing the number of cases in which each algorithm was the best out of the 18 cases (6 functions in 3 dimensions) for the classical functions and the 21 cases (7 functions in 3 dimensions) for the CEC05 benchmark functions.

The comparison also shows the function for which each algorithm provided the best result in all dimensions. The results show that the new cooperative model is very competitive with the state-of-the-art PSO cooperative algorithms and that there is no algorithm that outperforms all the others on more than two functions. A dash indicates that there is no function for which the corresponding algorithm provided the best result across all the dimensions.


Fig. 14 Convergence behavior of the three algorithms for the CEC05 benchmark functions f13 and f14; 10 and 30 dimensions

9 Conclusions

In this work, we propose a new cooperative PSO algorithm based on the migration of heterogeneous probabilistic models. The model is applied to continuous nonlinear function optimization. The model utilizes two PSO and EDA hybrids, namely, EDA–PSO and PSO_Bounds. The two algorithms exchange their information at a predetermined number of iterations. Three different exchange approaches are tested in this work. The first approach is the classical migration of particles. The second approach involves adaptively combining the migrant model with the resident model before continuing with the search. The third approach is the migration of probabilistic models, where each model is used to generate some particles to replace the worst particles in the receiving swarm. However, since the two models are different, the second approach involves a model conversion step in which each algorithm converts the received model into an equivalent model in the same form as its resident one. The PSO_Bounds algorithm combines the received model with its resident one and continues with the search. On the other hand, EDA–PSO uses the combined received–resident model to generate new particles to replace its worst particles.
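The third exchange approach (migration of probabilistic models) can be sketched as follows. This is a hedged illustration under our own assumptions, not the paper's exact implementation: the migrant model is taken to be a per-dimension Gaussian (means and standard deviations), as is common in continuous EDAs, and the function name and replacement policy are ours.

```python
import random

def migrate_model(swarm, fitness, model, n_new, lo=-5.0, hi=5.0, rng=None):
    """Third exchange approach (sketch): particles sampled from the
    received probabilistic model replace the worst resident particles.
    The resident model itself is left untouched."""
    rng = rng or random.Random(0)
    means, stds = model
    # Generate candidates by sampling the migrant model, clipped to bounds
    new = [[min(hi, max(lo, rng.gauss(m, s))) for m, s in zip(means, stds)]
           for _ in range(n_new)]
    # Rank resident particles from worst to best (minimization) and
    # replace the worst n_new of them
    order = sorted(range(len(swarm)),
                   key=lambda i: fitness(swarm[i]), reverse=True)
    for idx, cand in zip(order[:n_new], new):
        swarm[idx] = cand
    return swarm
```

The migrated information thus enters the receiving swarm only through freshly sampled particles; the resident model changes later only if those particles turn out to be good enough to guide the search.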


Table 11 Results of all the algorithms for the classical functions; 10, 30, and 50 dimensions (each cell: mean ± standard deviation)

Benchmark    Dim  Adaptive model         DMS-L-PSO              CPSO_S                  TRIBES-D
Sphere       10   0 ± 0                  3.45e−59 ± 5.72e−59    1.33e−155 ± 5.35e−155   0 ± 0
Rosenbrock   10   4.36e+00 ± 7.41e+00    6.14e−11 ± 1.08e−11    4.80e+00 ± 7.72e+00     3.19e+00 ± 2.77e+00
Griewank     10   3.00e−02 ± 2.08e−02    0 ± 0                  4.89e−02 ± 2.46e−02     4.30e−02 ± 2.45e−02
Ackley       10   0 ± 0                  2.31e−15 ± 1.08e−15    1.08e−14 ± 4.25e−15     0 ± 0
Rastrigin    10   9.95e−02 ± 3.04e−02    0 ± 0                  0 ± 0                   0 ± 0
Schwefel     10   4.25e+02 ± 1.71e+02    1.03e+02 ± 9.70e+01    1.15e+03 ± 2.85e+02     1.27e−04 ± 2.76e−20
Sphere       30   0 ± 0                  8.98e−27 ± 1.08e−26    4.94e−49 ± 1.48e−48     0 ± 0
Rosenbrock   30   4.77e+01 ± 3.63e+01    2.68e−01 ± 1.01e+00    2.14e+01 ± 1.72e+01     2.89e+01 ± 3.62e+01
Griewank     30   1.44e−02 ± 1.60e−02    1.48e−16 ± 1.10e−16    2.46e−02 ± 1.88e−02     4.82e−02 ± 4.87e−02
Ackley       30   2.02e−05 ± 5.07e−05    1.12e−12 ± 2.10e−12    4.05e−14 ± 1.24e−14     3.92e−04 ± 8.46e−04
Rastrigin    30   7.70e+00 ± 1.00e+01    1.78e+01 ± 4.06e+00    0 ± 0                   2.29e+00 ± 1.55e+00
Schwefel     30   1.86e+03 ± 5.39e+02    2.63e+03 ± 9.26e+02    3.97e+03 ± 1.05e+03     5.38e+02 ± 2.96e+02
Sphere       50   0 ± 0                  1.36e−29 ± 1.15e−29    4.56e−57 ± 2.17e−56     0 ± 0
Rosenbrock   50   1.07e+02 ± 9.56e+01    8.09e+00 ± 3.34e+00    6.32e+01 ± 2.87e+01     5.88e+01 ± 4.16e+01
Griewank     50   9.81e−03 ± 1.80e−02    1.37e−16 ± 4.78e−17    1.35e−02 ± 1.17e−02     4.90e−02 ± 5.51e−02
Ackley       50   5.12e−04 ± 1.28e−03    3.80e−12 ± 3.66e−12    9.08e−14 ± 6.79e−14     1.26e−04 ± 2.39e−04
Rastrigin    50   1.51e+01 ± 2.68e+01    3.50e+01 ± 6.95e+00    2.98e−01 ± 5.32e−01     3.12e+00 ± 1.65e+00
Schwefel     50   3.16e+03 ± 8.18e+02    6.85e+03 ± 1.77e+03    6.81e+03 ± 7.45e+02     7.95e+02 ± 2.99e+02

Table 12 Results of all the algorithms for the CEC05 benchmark functions; 10 and 30 dimensions (each cell: mean ± standard deviation)

Benchmark    Dim  Adaptive model         DMS-L-PSO              CPSO_S                 TRIBES-D
f6           10   2.16e+00 ± 3.75e+00    5.00e−06 ± 1.92e−05    2.45e+01 ± 2.94e+01    1.35e+01 ± 2.15e+01
f9           10   6.50e−01 ± 8.38e−01    0 ± 0                  0 ± 0                  0 ± 0
f10          10   3.25e+00 ± 1.73e+00    4.48e+00 ± 1.27e+00    3.82e+01 ± 1.89e+01    9.65e+00 ± 3.43e+00
f11          10   3.01e+00 ± 1.64e+00    4.76e+00 ± 6.99e−01    6.67e+00 ± 1.73e+00    4.06e+00 ± 1.08e+00
f12          10   7.40e+02 ± 9.24e+02    2.48e+00 ± 4.36e+00    4.39e+02 ± 6.69e+02    1.15e+03 ± 6.68e+02
f13          10   4.75e−01 ± 1.59e−01    3.77e−01 ± 9.26e−02    2.75e−01 ± 1.46e−01    4.27e−01 ± 1.08e−01
f14          10   2.42e+00 ± 3.80e−01    2.66e+00 ± 2.44e−01    3.80e+00 ± 3.53e−01    2.89e+00 ± 5.09e−01
f6           30   8.41e+01 ± 9.22e+01    7.53e+01 ± 5.63e+01    8.54e+01 ± 7.57e+01    5.87e+01 ± 5.06e+01
f9           30   1.88e+01 ± 5.75e+00    2.28e+01 ± 5.30e+00    3.32e−02 ± 1.82e−01    3.06e+00 ± 1.73e+00
f10          30   2.44e+01 ± 1.13e+01    4.46e+01 ± 9.34e+00    1.78e+02 ± 4.35e+01    1.45e+02 ± 3.21e+01
f11          30   2.41e+01 ± 5.07e+00    3.13e+01 ± 8.98e−01    2.11e+01 ± 3.62e+00    2.96e+01 ± 2.26e+00
f12          30   5.01e+04 ± 3.42e+04    9.17e+02 ± 1.34e+03    6.60e+03 ± 4.83e+03    1.05e+05 ± 3.85e+04
f13          30   2.84e+00 ± 6.72e−01    3.05e+00 ± 5.25e−01    1.15e+00 ± 2.25e−01    2.53e+00 ± 7.03e−01
f14          30   1.24e+01 ± 5.15e−01    1.23e+01 ± 3.36e−01    1.32e+01 ± 4.97e−01    1.29e+01 ± 2.95e−01

Experimental results obtained using a group of classical optimization functions and the CEC05 benchmark library show that the third exchange approach is the best among the three. This approach overcomes the drawbacks of the other methods. In this approach, the information present in the received model is transmitted through the generated particles. These particles will not harm the search in any way, as they replace the worst particles in the swarm and, at the same time, are not identical to the ones present in the sending swarm (as happens in the first approach). Additionally, the resident model remains unchanged unless these particles are good enough to guide the search.

Finally, in the cooperative model proposed, the amount of information exchanged is adaptively set based on the performance of the cooperating components. This is done using a sliding window approach: every predetermined number of iterations, the performance of the two components is measured and used to control the number of particles that are generated by the exchanged probabilistic models. The cooperative model is also shown to be very competitive with some of the state-of-the-art cooperative PSO algorithms when applied to the functions under study.
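The sliding-window adaptation can be sketched as follows. The concrete rule below (a linear split of the exchange budget proportional to each component's recent improvement) is our assumption for illustration; the paper's exact adaptation rule, and the component names `eda_pso` and `pso_bounds`, are not taken from this section.

```python
def adapt_exchange(prev_best, curr_best, total_particles, min_n=1):
    """Sliding-window adaptation (sketch): the improvement achieved by
    each component over the last window controls how many particles its
    migrant model may generate in the other swarm.

    prev_best / curr_best map component name -> best fitness value at the
    window boundaries (minimization), e.g. {'eda_pso': ..., 'pso_bounds': ...}.
    """
    # Improvement over the window, floored at zero
    imp = {k: max(prev_best[k] - curr_best[k], 0.0) for k in prev_best}
    total_imp = sum(imp.values())
    if total_imp == 0.0:                    # no progress: share equally
        return {k: total_particles // 2 for k in imp}
    shares = {}
    for k in imp:
        # A component that improved more generates a larger share,
        # clamped so neither component is silenced completely
        n = round(total_particles * imp[k] / total_imp)
        shares[k] = max(min_n, min(total_particles - min_n, n))
    return shares
```

Clamping with `min_n` keeps a small flow of information in both directions even when one component dominates the window.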

Table 13 Results of all the algorithms for the CEC05 benchmark functions; 50 dimensions (each cell: mean ± standard deviation)

Benchmark    Dim  Adaptive model         DMS-L-PSO              CPSO_S                 TRIBES-D
f6           50   8.96e+01 ± 6.12e+01    1.31e+00 ± 1.67e+00    1.37e+02 ± 1.52e+02    9.23e+01 ± 8.24e+01
f9           50   3.00e+01 ± 1.08e+00    5.98e+01 ± 1.15e+01    6.63e−02 ± 2.52e−01    5.47e+00 ± 2.76e+00
f10          50   6.17e+01 ± 2.14e+01    9.72e+01 ± 1.39e+01    3.43e+02 ± 7.62e+01    3.65e+02 ± 7.50e+01
f11          50   4.60e+01 ± 9.12e+00    6.00e+01 ± 1.31e+00    3.81e+01 ± 5.41e+00    5.39e+01 ± 4.55e+00
f12          50   4.38e+04 ± 3.09e+04    3.53e+03 ± 3.86e+03    1.40e+04 ± 1.31e+03    4.08e+05 ± 1.12e+05
f13          50   5.20e+00 ± 1.09e+00    6.03e+00 ± 9.83e−01    2.22e+00 ± 4.51e−01    4.38e+00 ± 1.07e+00
f14          50   2.13e+01 ± 7.74e−01    2.13e+01 ± 4.42e−01    2.28e+01 ± 5.05e−01    2.25e+01 ± 2.98e−01

Table 14 Comparison of all the algorithms

Algorithm    Classical functions            CEC05 functions          Total
             # Cases  Best in               # Cases  Best in         # Cases
Adaptive     4        Sphere                8        f10, f14        12
DMS-L-PSO    6        Rosenbrock            9        f6, f12         15
CPSO_S       6        Rastrigin             9        f9, f13         15
TRIBES-D     8        Sphere, Schwefel      2        –               10

In future work, we intend to have a different number of function evaluations performed by each component, as this might actually improve the performance when one of the components is much better than the other. The model could adaptively set this ratio according to each component's performance during the search. We are also planning to study different model conversion and model combination approaches that could enhance the performance of the second exchange approach. Another direction we will consider is to introduce more cooperating swarms running different algorithms (e.g., EDPSO) into the system, as this might affect the performance of the different exchange strategies.

One additional point that needs to be addressed in PSO_Bounds is the need to specify the search bounds. In some cases, it is not possible to know beforehand where the optimum might be, and this renders PSO_Bounds useless (as in f7). We intend to tackle this issue in order to apply the algorithm to more functions.

Acknowledgements The authors would like to thank the reviewers and the editors for their helpful insights and suggestions.

References

Ahn, C. W., Goldberg, D. E., & Ramakrishna, R. S. (2004). Multiple-deme parallel estimation of distribution algorithms: basic framework and applications. In Lecture notes in computer science: Vol. 3019. Proceedings of international conference on parallel processing and applied mathematics (pp. 544–551). Berlin: Springer.

Clerc, M. (2006a). Particle swarm optimization. London: ISTE.

Clerc, M. (2006b). TRIBES-D code. http://clerc.maurice.free.fr/pso/Tribes/TRIBES-D.zip.

Cooren, Y., Clerc, M., & Siarry, P. (2009). Performance evaluation of TRIBES, an adaptive particle swarm optimization algorithm. Swarm Intelligence, 3(2), 149–178.

de la Ossa, L., Gamez, J., & Puerta, J. (2004). Migration of probability models instead of individuals: an alternative when applying the island models to EDAs. In X. Yao, E. Burke, J. A. Lozano, J. Smith, J. J. Merelo-Guervós, J. A. Bullinaria, J. Rowe, P. Tino, A. Kabán, & H. P. Schwefel (Eds.), Lecture notes in computer science: Vol. 3242. Proceedings of parallel problem solving from nature (pp. 242–252). Berlin: Springer.

de la Ossa, L., Gamez, J., & Puerta, J. (2006). Initial approaches to the application of island-based parallel EDAs in continuous domains. Journal of Parallel and Distributed Computing, 66(8), 991–1001.

El-Abd, M., & Kamel, M. S. (2007). Particle swarm optimization with varying bounds. In Proceedings of IEEE congress on evolutionary computation (pp. 4757–4761). Piscataway: IEEE Press.

El-Abd, M., & Kamel, M. S. (2008). A taxonomy of cooperative search algorithms. International Journal of Computational Intelligence Research, 4(2), 137–144.

Gallagher, M., Frean, M., & Downs, T. (1999). Real-valued evolutionary optimization using a flexible probability density estimator. In Proceedings of genetic and evolutionary computation conference (Vol. 1, pp. 840–846). San Francisco: Morgan Kaufmann.

Grahl, J., Bosman, P. A. N., & Rothlauf, F. (2006). The correlation-triggered adaptive variance scaling idea. In Proceedings of genetic and evolutionary computation conference (pp. 397–404). New York: ACM.

Hiroyasu, T., Miki, M., Sano, M., Shimosaka, H., Tsutsui, S., & Dongarra, J. (2003). Distributed probabilistic model-building genetic algorithm. In Lecture notes in computer science: Vol. 2724. Proceedings of genetic and evolutionary computation conference (pp. 1015–1028). Berlin: Springer.

Iqbal, M., & Montes de Oca, M. A. (2006). An estimation of distribution particle swarm optimization algorithm. In M. Dorigo, L. M. Gambardella, M. Birattari, A. Martinoli, R. Poli, & T. Stützle (Eds.), Lecture notes in computer science: Vol. 4150. Proceedings of the fifth international workshop on ant colony optimization and swarm intelligence (pp. 72–83). Berlin: Springer.

Jaros, J., & Schwarz, J. (2007). Parallel BMDA with probability model migration. In Proceedings of IEEE congress on evolutionary computation (pp. 1059–1066). Piscataway: IEEE Press.

Kennedy, J., & Eberhart, R. C. (1995). Particle swarm optimization. In Proceedings of IEEE international conference on neural networks (Vol. 4, pp. 1942–1948). Piscataway: IEEE Press.

Kennedy, J., & Mendes, R. (2002). Population structure and particle swarm performance. In Proceedings of IEEE congress on evolutionary computation (Vol. 2, pp. 1671–1676). Washington: IEEE Computer Society.

Larrañaga, P., & Lozano, J. A. (2001). Estimation of distribution algorithms. A new tool for evolutionary computation, genetic algorithms and evolutionary computation (Vol. 2). Berlin: Springer.

Liang, J. J., & Suganthan, P. N. (2005a). Dynamic multi-swarm particle swarm optimizer. In Proceedings of IEEE swarm intelligence symposium (pp. 124–129). Piscataway: IEEE Press.

Liang, J. J., & Suganthan, P. N. (2005b). Dynamic multi-swarm particle swarm optimizer with local search. In Proceedings of IEEE congress on evolutionary computation (pp. 522–528). Washington: IEEE Computer Society.

Madera, J., Alba, E., & Ochoa, A. (2006). A parallel island model for estimation of distribution algorithms. In J. A. Lozano, P. Larrañaga, I. Inza, & E. Bengoetxea (Eds.), Advances in the estimation of distribution algorithms, studies in fuzziness and soft computing: Vol. 192. Towards a new evolutionary computation (pp. 159–186). Berlin: Springer.

Monson, C. K., & Seppi, K. D. (2005). Exposing origin-seeking bias in PSO. In Proceedings of genetic and evolutionary computation conference (pp. 241–248). New York: ACM Press.

Mühlenbein, H. (1998). The equation for response to selection and its use for prediction. IEEE Transactions on Evolutionary Computation, 5(3), 303–346.

Rudolph, S., & Köppen, M. (1996). Stochastic hill climbing with learning by vectors of normal distributions. In First on-line workshop on soft computing (pp. 60–70), Nagoya, Japan: Nagoya University.

Schwarz, J., Jaros, J., & Ocenasek, J. (2007). Migration of probabilistic models for island-based bivariate EDA algorithm. In Proceedings of genetic and evolutionary computation conference (Vol. 1, p. 631). New York: ACM.

Sebag, M., & Ducoulombier, A. (1998). Extending population-based incremental learning to continuous search spaces. In Lecture notes in computer science: Vol. 1498. Proceedings of parallel problem solving from nature (pp. 418–427). Berlin: Springer.

Servet, I., Trave-Massuyes, L., & Stern, D. (1997). Telephone network traffic overloading diagnosis and evolutionary computation technique. In Lecture notes in computer science: Vol. 1363. Proceedings of artificial evolution (pp. 137–144). Berlin: Springer.


Socha, K., & Dorigo, M. (2005). Ant colony optimization for continuous domains (Technical Report TR/IRIDIA/2005-037). Université Libre de Bruxelles, Brussels, Belgium.

Socha, K., & Dorigo, M. (2008). Ant colony optimization for continuous domains. European Journal of Operational Research, 185(3), 1155–1173.

Suganthan, P. N., Hansen, N., Liang, J. J., Deb, K., Chen, Y. P., Auger, A., & Tiwari, S. (2005a). CEC05 benchmark functions. http://staffx.webstore.ntu.edu.sg/MySite/Public.aspx?accountname=epnsugan.

Suganthan, P. N., Hansen, N., Liang, J. J., Deb, K., Chen, Y. P., Auger, A., & Tiwari, S. (2005b). Problem definitions and evaluation criteria for the CEC 2005 special session on real-parameter optimization (Technical Report 2005005). IIT Kanpur, India.

van den Bergh, F., & Engelbrecht, A. P. (2004). A cooperative approach to particle swarm optimization. IEEE Transactions on Evolutionary Computation, 8(3), 225–239.

Yuan, B., & Gallagher, M. (2005). Experimental results for the special session on real-parameter optimization at CEC 2005: a simple, continuous EDA. In Proceedings of IEEE congress on evolutionary computation (Vol. 2, pp. 1792–1799). Washington: IEEE Computer Society.

Zhou, Y., & Jin, J. (2006). EDA/PSO—a new hybrid intelligent optimization algorithm. In Proceedings of the University of Michigan graduate student symposium (p. 231). Michigan. http://www.engin.umich.edu/students/gradsymposium/2006pamphlet.pdf.

