
Estimating Differential Evolution crossover parameter with VNS approach for continuous global optimization

Darko Kovačević 1, Bratislav Petrović 2, Pavle Milošević 3

Faculty of Organizational Sciences, University of Belgrade, Belgrade, Serbia

Abstract

Two metaheuristics have received considerable attention in the past years from both academics and practitioners. VNS (Variable Neighborhood Search) is a unique and effective combinatorial and global metaheuristic, based on the principle of systematic neighborhood changes combined with local searches. Differential Evolution (DE), on the other hand, is a simple evolutionary algorithm for global search over continuous spaces that is used in many practical applications. In this paper we use the VNS idea to estimate the crossover parameter, an important parameter of the DE heuristic. For that purpose we propose a family of distributions for controlling distances in the search space.

Keywords: global optimization, metaheuristics, Differential Evolution, Variable Neighborhood Search, parametric distributions

1 Email: [email protected]
2 Email: [email protected]
3 Email: [email protected]

Electronic Notes in Discrete Mathematics 39 (2012) 257–264

doi:10.1016/j.endm.2012.10.034


1 Introduction

This paper deals with multimodal problems of finding global optima in difficult unconstrained nonlinear problems over continuous spaces. DE was proposed by Storn and Price [9]. It is a very simple and straightforward strategy consisting of three main parts: mutation strategy, crossover, and selection. There are two basic strategy approaches: "DE/rand/1/bin", which usually demonstrates slow convergence speed and bears stronger exploration capability, and "DE/best/1/bin", which represents the degenerate case of the previous one, usually has high convergence speed, and performs well on unimodal problems. Crossover determines whether the target or the trial vector survives to the next generation. Most attention in this paper will be devoted to the mechanism of self-adaptation of the crossover parameter CR. Finally, selection is based on the choice of better solutions. Vesterstroem and Thomsen [13] compared the DE algorithm with Particle Swarm Optimization (PSO) and Evolutionary Algorithms (EAs) on numerical benchmark problems. DE outperformed PSO and EAs in terms of solution quality on most problems. Sun et al. [10] proposed a combination of DE and the Estimation of Distribution Algorithm (EDA), which tries to guide its search toward a promising area by sampling new solutions from a probability model. Liu and Lampinen [3] reported that the effectiveness, efficiency, and robustness of the DE algorithm are sensitive to the settings of the control parameters, namely F and CR. The best settings for the control parameters can differ between functions, and even for the same function under different requirements. Bearing this in mind, we propose a self-adaptive algorithm based on DE which incorporates features of the VNS [4,5] approach.

The VNS approach does not follow vector trajectories. It explores increasingly distant neighborhoods of the current best solution. If a better solution is found, VNS jumps from the current solution to the new one. These neighborhoods in R^N are used for estimating the parameter CR. There is a variety of hybrid solutions combining VNS with other optimizers.

2 Self-adaptive Differential Evolution

In conventional realizations of DE, the choice of three basic parameters, denoted by N (population size), F (mutation parameter), and CR (crossover parameter), largely determines the outcome of the optimization. There is a great number of empirical findings, e.g. [6], which help us choose a set of values, but there is no strict set of parameters that applies to the broader set of practical problems. As for the population size, we know that higher values allow greater diversification of the population vectors in the solution space and therefore a better search of that space. On the other hand, a large population causes slow convergence, which can be reflected in the slowness of the algorithm in the case of large-scale optimization problems. Large values of the parameter F promote exploration of the solution space, while smaller values favor exploitation. The problem of selecting values of the parameter F is solved by introducing a roulette method which gradually gives greater probability to drawing the successful values of F.

Control parameter F will be implemented by introducing competition into the heuristic. Let us have a number of settings for parameter F, denoted by H, and choose among them at random with probability p_h, h = 1, 2, ..., H. The probabilities can be adjusted according to the success rate of the preceding steps of searching the solution space. The h-th setting is successful if it generates a trial point y such that f(y) < f(x_i). When n_h is the current number of successes of the h-th setting, probability p_h can be calculated as the relative frequency

$$p_h = \frac{n_h + n_0}{\sum_{j=1}^{H}(n_j + n_0)} \qquad (1)$$

where n_0 > 0 is a constant. Setting n_0 > 1 prevents a dramatic change in p_h from a single use of the h-th parameter setting. In order to avoid premature convergence of the probabilities p_h, the current values of p_h are reset to the starting values p_h = 1/H if any of the probabilities drops below a given threshold δ, 0 < δ < 1. The competition structure favors the parameters related to success, in the sense that the algorithm self-adapts the probabilities p_h of the successful settings each time a better solution is found.
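
As an illustration, the following is a minimal Python sketch of this competitive roulette selection of F, including the reset rule above; the class and method names are ours, not the authors'.

```python
import random

class CompetitiveF:
    """Roulette-wheel competition among H candidate F settings (Eq. 1).

    Illustrative sketch; class and method names are not from the paper."""

    def __init__(self, f_values=(0.4, 0.6, 0.8, 1.0), n0=2, delta=0.05):
        self.f_values = list(f_values)
        self.H = len(self.f_values)
        self.n0, self.delta = n0, delta
        self.successes = [0] * self.H          # n_h success counters

    def probabilities(self):
        # p_h = (n_h + n0) / sum_j (n_j + n0)   -- Eq. (1)
        weights = [n + self.n0 for n in self.successes]
        total = sum(weights)
        probs = [w / total for w in weights]
        if min(probs) < self.delta:            # avoid premature convergence:
            self.successes = [0] * self.H      # reset to starting values 1/H
            probs = [1.0 / self.H] * self.H
        return probs

    def sample(self):
        """Pick a setting index h and its F value by roulette."""
        h = random.choices(range(self.H), weights=self.probabilities())[0]
        return h, self.f_values[h]

    def report_success(self, h):
        """Record that setting h produced f(y) < f(x_i)."""
        self.successes[h] += 1
```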

If we consider CR, we see that, depending on its value, the degree of similarity between the offspring and the trial vector changes. We define the degree of similarity of these two vectors as follows. The nearest surroundings of the offspring are defined by small values of the parameter CR, meaning that changes in vectors affect only a small number of dimensions. On the other hand, for higher values of CR we get more distant dimensional surroundings, and the offspring vector can change in more dimensions at once. On the example of the rotated ellipse, Salomon [8] illustrates why the CR factor should not keep low values during the whole course of the optimization. If the CR parameter is small, the algorithm may not be able to perform a jump that would find a better solution by changing a small number of dimensions. For large values of CR, the entire population could converge too fast, so the population could remain trapped in a local optimum. Because of this, we want to start our parameter from the closer neighborhoods of the offspring vector. If our algorithm is trapped in a local optimum or is on the correct path to the global optimum, we want higher values of CR for two reasons. First, the algorithm can escape from a local optimum along several dimensions, allowing an effective jump to any point of the solution space; second, if it is located in the vicinity of the global optimum, searching over a large number of dimensions, or neighbors, can speed up the convergence towards the global optimum. We will rely on the choice of probability distributions from which we draw the parameter CR. Since we are talking about the probability of selecting the parameter CR, we are not defining a clear boundary between neighborhoods, i.e., the dimensional distance from the offspring vector, just the expected number of changed dimensions.

3 Variable Neighborhood Search adaptation

Numerous studies show that it is difficult to choose the crossover parameter, because different values of CR correspond to different sets of problems. We introduce a family of adaptive distributions (Figure 1) that depend on a variable neighborhood parameter, par, from which we choose CR values, since the parameter is very sensitive to the type of problem that we want to explore. VNS is a concept in which the search around the current vector starts from the closest neighborhoods. If a better solution is not found, the neighborhood is progressively enlarged. This idea is applied to the CR parameter. When the algorithm finds a child vector better than the parent vector, in the next iterations the par factor is kept at low values, which implies crossover over just a few dimensions, i.e., the closest dimensional neighborhoods. In this way it is ensured that the entire population does not converge too quickly, giving a more detailed search of the area around the population vectors. For the same reason, in iterations in which the algorithm cannot find a more satisfactory solution, the value of the neighborhood factor is gradually increased using a step factor, allowing new solutions to be found at larger dimensional distances from the parent vector. In the case of finding a favorable child vector, the algorithm resets the distribution according to the objective function. This adaptiveness allows the algorithm to adjust to a given problem, so there is no need for parametrization by the user. The analysis was started by observing the distributions from which CR draws its values. A large set of problems was divided into three classes: additive multi-modal problems (Schwefel, Ackley, Griewank, Rastrigin, Molecular potential energy (MPE), Michalewicz and Six-hump), unimodal problems (Sphere, Rosenbrock and De Jong's variations), and non-continuous problems (step function, molecule or sphere packing).

For each group of problems we repeated the optimization 100 times using the "DE/rand/1/bin" algorithm, with the CR and F parameters drawn as a random process from a uniform distribution. The values of F and CR were recorded in the cases where a fitness value better than the old one was found, and in instances where the algorithm achieved a predefined tolerance. We define moving windows over the evaluation function, so the bins contain the obtained values of the parameter CR in the first iterations of the algorithm, in the middle iterations, as well as in the final part of the run, when the algorithm succeeded in reaching the global optimum. The recorded distributions for the defined classes are presented in the following figure.

Figure 1: Recorded CR distributions for the defined problem classes.

It can be shown that, for a wide class of problems, the parameter CR can be modeled as a random process drawn from a beta distribution.

Johnson [2] used the triangular distribution as a proxy for the beta distribution. We use its extended version, the two-sided power (TSP) distribution [12], because its parameters allow a comprehensible evaluation of the control parameter set.

$$f(x \mid a, m, b, par) = \begin{cases} \dfrac{par}{b-a}\left(\dfrac{x-a}{m-a}\right)^{par-1}, & a < x < m \\[2mm] \dfrac{par}{b-a}\left(\dfrac{b-x}{b-m}\right)^{par-1}, & m < x < b \end{cases} \qquad (2)$$

Parameters a and b are defined by the limits of parameter CR, i.e., a = CR_min, b = CR_max. The shape of the distributions for continuous multi-modal problems suggests using parameter m = 0. For the continuous multi-modal problems we can see that the algorithm achieves the best results starting from a TSP distribution that favors small values of CR, i.e., a detailed search over the solution space, which is consistent with the multi-modal definition of the problem, i.e., for par → 0. On the other hand, in the final iterations all CR values are equally favored, leading to values of the parameter par → 1.
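
To show how CR can be drawn from Eq. (2), the following is a minimal inverse-CDF sampling sketch, assuming the van Dorp–Kotz parameterization of the TSP density (exponent par − 1); the function name is illustrative, not the authors' code.

```python
import random

def sample_tsp(a: float, m: float, b: float, par: float) -> float:
    """Draw one value from the two-sided power density of Eq. (2) by
    inverting its CDF; par = 1 reduces to Uniform(a, b)."""
    u = random.random()
    pi = (m - a) / (b - a)      # probability mass to the left of the mode m
    if u < pi:                  # left branch (only reachable when m > a)
        return a + (m - a) * (u / pi) ** (1.0 / par)
    return b - (b - m) * ((1.0 - u) / (1.0 - pi)) ** (1.0 / par)

# The paper's setting a = 0, m = 0, b = 1 uses only the right branch,
# i.e. CR = 1 - (1 - u) ** (1 / par).
cr = sample_tsp(0.0, 0.0, 1.0, par=0.5)
```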


Algorithm 1
Step 1: Randomly initialize a population of N individuals P_G = {X_{1,G}, ..., X_{N,G}}, X_{i,G} = {x_{1,i,G}, ..., x_{D,i,G}}, i = 1, ..., N
Step 2: Evaluate the population
Step 3: Set the initial roulette probabilities for parameter F
Step 4: WHILE the stopping criterion is not met
Step 4.1: Calculate the roulette probabilities for parameter F (Eq. 1)
Step 4.2: Sample CR from the adaptive TSP distribution (Eq. 2, a = 0, b = 1)
Step 4.3: Apply strategy "Rand/local-best/bin/1" with the obtained F and CR
Step 4.4: Evaluate the child vector f(y_child)
Step 4.5: IF the child is better than the parent vector (f(y_child) < f(y_parent))
Step 4.5.1: par_new = max(par_min, par_old - (f(y_child) - f(y_parent)))
Step 4.5.2: parent = child
Step 4.6: ELSE
Step 4.6.1: par_new = par_old + step_factor
Step 4.6.2: par_new = min(par_new, par_max)
Step 4.7: END IF
Step 5: END WHILE
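
Putting the pieces together, here is a condensed Python sketch of the main loop of Algorithm 1, reusing sample_tsp and CompetitiveF from the sketches above. It is not the authors' code: we substitute plain DE/rand/1/bin for "Rand/local-best/bin/1", and we decrease par by the size of the improvement on success, following the Section 3 description of the par dynamics.

```python
import math
import random
import numpy as np

def de_vns(fobj, bounds, pop_size=34, max_evals=100_000, tol=1e-6,
           f_values=(0.4, 0.6, 0.8, 1.0), par_min=0.0, par_max=1.0):
    """Condensed sketch of Algorithm 1. Assumes the global optimum value
    is 0, so the run stops once the best fitness is within tolerance."""
    D = len(bounds)
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    pop = lo + np.random.rand(pop_size, D) * (hi - lo)       # Step 1
    fit = np.array([fobj(x) for x in pop])                   # Step 2
    roulette = CompetitiveF(f_values)                        # Step 3
    par = par_min
    step_factor = 1.0 / (10 * D * math.log2(D))              # Section 4 setting
    evals = pop_size
    while evals < max_evals and fit.min() > tol:             # Step 4
        for i in range(pop_size):
            h, F = roulette.sample()                         # Step 4.1
            cr = sample_tsp(0.0, 0.0, 1.0, max(par, 1e-12))  # Step 4.2
            idx = [j for j in range(pop_size) if j != i]
            r1, r2, r3 = np.random.choice(idx, 3, replace=False)
            mutant = pop[r1] + F * (pop[r2] - pop[r3])       # Step 4.3 (DE/rand/1)
            cross = np.random.rand(D) < cr                   # binomial crossover
            cross[random.randrange(D)] = True                # keep >= 1 mutant gene
            child = np.clip(np.where(cross, mutant, pop[i]), lo, hi)
            f_child = fobj(child)                            # Step 4.4
            evals += 1
            if f_child < fit[i]:                             # Step 4.5
                # par shrinks by the improvement (Section 3 description)
                par = max(par_min, par - (fit[i] - f_child))
                pop[i], fit[i] = child, f_child              # Step 4.5.2
                roulette.report_success(h)
            else:                                            # Step 4.6
                par = min(par + step_factor, par_max)        # Steps 4.6.1-4.6.2
    best = int(fit.argmin())
    return pop[best], float(fit[best])
```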

4 Numerical results

We compare several algorithms with significant global-search features against our DE-VNS. We use two different versions of DE, one with promising parameter selection according to [7] and one with roulette parameter selection, Debr18 [11], as well as a VNS algorithm with distinctive global characteristics [1]. Seven common benchmark functions are used for the numerical experiments: Sphere, Ackley, Griewank, MPE, Rastrigin, Schwefel, and Rosenbrock. To show the robustness of the proposed algorithm we use from 10 to 100 dimensions, covering both lower-dimensional and large-scale problems. In addition to the number of function evaluations (FEs), we also report the success rate (SR), the percentage of runs that reach the predefined tolerance around the global optimum. For the tolerance we use a value of 1e-6. The results are shown in Table 1. Gray areas indicate that, for a given problem, the tolerance was not reached in 100% of the cases. The tables also display the basic statistics for FEs: the minimum number of FEs (eval min), the average number of FEs (eval avg), and the maximum number of FEs (eval max). Each problem is repeated 20 times in order to obtain the statistics of FEs and SR. In cases where the tolerance was not met, the starred value presents the average of the minimized cost function. The best results are marked in bold, provided that the optimizer achieved an SR of 100%. Minus signs in the tables indicate that data were not available.


Each algorithm block lists: eval min | eval avg | eval max | SR | fmin.
Algorithm order: DE-VNS || Debr18 || DE/Rand/1 (F=0.5, CR=0.3) || Gauss VNS.

SCHWEFEL
D=10: 7,901 | 8,901 | 10,101 | 100% | 1.00E-06 || 13,801 | 14,711 | 15,301 | 100% | 1.00E-06 || 9,001 | 9,441 | 9,901 | 100% | 1.00E-06 || - | - | - | - | -
D=20: 21,201 | 23,431 | 27,301 | 100% | 1.00E-06 || 42,901 | 43,851 | 44,801 | 100% | 1.00E-06 || 23,601 | 24,601 | 25,601 | 100% | 1.00E-06 || - | - | - | - | -
D=50: 70,401 | 79,081 | 89,101 | 100% | 1.00E-06 || 161,201 | 165,541 | 168,001 | 100% | 1.00E-06 || 94,301 | 101,161 | 105,301 | 100% | 1.00E-06 || - | - | - | - | -
D=100: 172,201 | 189,791 | 210,001 | 100% | 1.00E-06 || 417,301 | 420,211 | 421,101 | 100% | 1.00E-06 || 350,501 | 385,181 | 408,701 | 100% | 1.00E-06 || - | - | - | - | -

RASTRIGIN
D=10: 10,001 | 10,691 | 11,401 | 100% | 1.00E-06 || 15,101 | 15,641 | 16,801 | 100% | 1.00E-06 || 11,601 | 13,151 | 14,201 | 100% | 1.00E-06 || - | 85,589 | - | 100% | 1.00E-05
D=20: 25,001 | 27,811 | 31,101 | 100% | 1.00E-06 || 44,401 | 45,261 | 46,101 | 100% | 1.00E-06 || 73,601 | 78,551 | 86,601 | 100% | 1.00E-06 || - | 287,075 | - | 100% | 1.00E-05
D=50: 85,001 | 89,741 | 95,601 | 100% | 1.00E-06 || 165,601 | 167,591 | 171,301 | 100% | 1.00E-06 || 1,000,000 | 1,000,000 | 1,000,000 | 0% | 118.18 || - | 1,524,701 | - | 100% | 1.00E-05
D=100: 200,101 | 220,541 | 243,701 | 100% | 1.00E-06 || 414,401 | 418,761 | 422,501 | 100% | 1.00E-06 || 2,000,000 | 2,000,000 | 2,000,000 | 0% | 508.05 || - | 6,248,753 | - | 100% | 1.00E-05

GRIEWANK
D=10: 11,801 | 22,101 | 47,401 | 100% | 1.00E-06 || 31,401 | 35,861 | 43,201 | 100% | 1.00E-06 || 16,101 | 17,501 | 20,001 | 100% | 1.00E-06 || - | - | - | - | -
D=20: 22,701 | 24,961 | 28,101 | 100% | 1.00E-06 || 46,001 | 51,441 | 56,701 | 100% | 1.00E-06 || 21,301 | 23,971 | 26,901 | 100% | 1.00E-06 || - | - | - | - | -
D=50: 67,401 | 73,821 | 80,201 | 100% | 1.00E-06 || 153,701 | 158,431 | 172,201 | 100% | 1.00E-06 || 68,401 | 71,001 | 74,001 | 100% | 1.00E-06 || - | - | - | - | -
D=100: 158,801 | 168,961 | 180,701 | 100% | 1.00E-06 || 383,801 | 390,411 | 399,401 | 100% | 1.00E-06 || 212,601 | 216,321 | 218,501 | 100% | 1.00E-06 || - | - | - | - | -

ACKLEY
D=10: 9,401 | 10,441 | 13,501 | 100% | 1.00E-06 || 18,301 | 19,251 | 20,001 | 100% | 1.00E-06 || 11,401 | 11,771 | 12,101 | 100% | 1.00E-06 || - | 50,149 | - | 100% | 1.00E-05
D=20: 22,701 | 26,241 | 30,001 | 100% | 1.00E-06 || 55,401 | 57,261 | 58,601 | 100% | 1.00E-06 || 27,501 | 28,211 | 29,001 | 100% | 1.00E-06 || - | 158,412 | - | 100% | 1.00E-05
D=50: 78,101 | 86,231 | 98,901 | 100% | 1.00E-06 || 208,201 | 211,941 | 216,901 | 100% | 1.00E-06 || 94,301 | 96,011 | 97,501 | 100% | 1.00E-06 || - | 1,143,721 | - | 100% | 1.00E-05
D=100: 182,801 | 204,521 | 237,601 | 100% | 1.00E-06 || 524,001 | 530,321 | 537,501 | 100% | 1.00E-06 || 293,201 | 297,981 | 304,001 | 100% | 1.00E-06 || - | - | - | - | -

MPE
D=10: 6,101 | 6,831 | 7,601 | 100% | 1.00E-06 || 27,901 | 27,901 | 27,901 | 100% | 1.00E-06 || 45,001 | 49,761 | 51,601 | 100% | 1.00E-06 || - | 5,015 | - | 100% | 1.00E-05
D=20: 16,801 | 19,121 | 20,901 | 100% | 1.00E-06 || 66,401 | 72,041 | 75,901 | 100% | 1.00E-06 || 400,000 | 400,000 | 400,000 | 0% | 0.0015 || - | 21,172 | - | 100% | 1.00E-05
D=50: 52,701 | 59,221 | 66,701 | 100% | 1.00E-06 || 249,401 | 262,921 | 267,301 | 100% | 1.00E-06 || 1,000,000 | 1,000,000 | 1,000,000 | 0% | 8.9852 || - | 143,309 | - | 100% | 1.00E-05
D=100: 122,901 | 143,831 | 160,701 | 100% | 1.00E-06 || 623,801 | 646,181 | 663,601 | 100% | 1.00E-06 || 2,000,000 | 2,000,000 | 2,000,000 | 0% | 39.5348 || - | 1,183,873 | - | 100% | 1.00E-05

ROSENBROCK
D=10: 45,001 | 53,901 | 69,001 | 100% | 1.00E-06 || 29,001 | 33,001 | 39,001 | 100% | 1.00E-06 || 455,001 | 866,901 | 1,001,001 | 50% | 2.88E-02 || - | - | - | - | -
D=20: 183,001 | 234,301 | 415,001 | 100% | 1.00E-06 || 132,001 | 143,101 | 150,001 | 100% | 1.00E-06 || 2,001,001 | 2,001,001 | 2,001,001 | 0% | 0.1 || - | - | - | - | -
D=30: 281,001 | 424,301 | 516,001 | 100% | 1.00E-06 || 280,001 | 333,601 | 359,001 | 100% | 1.00E-06 || 3,000,001 | 3,000,001 | 3,000,001 | 0% | 0.043 || - | - | - | - | -
D=50: 729,001 | 1,142,001 | 1,442,001 | 100% | 1.00E-06 || 619,001 | 946,201 | 1,073,001 | 100% | 1.00E-06 || 5,001,001 | 5,001,001 | 5,001,001 | 0% | 1.16 || - | - | - | - | -

SPHERE
D=10: 8,001 | 8,201 | 9,001 | 100% | 1.00E-06 || 13,001 | 13,701 | 15,001 | 100% | 1.00E-06 || 8,001 | 8,501 | 9,001 | 100% | 1.00E-06 || - | - | - | - | -
D=20: 19,001 | 21,001 | 22,001 | 100% | 1.00E-06 || 39,001 | 40,601 | 42,001 | 100% | 1.00E-06 || 20,001 | 20,401 | 21,001 | 100% | 1.00E-06 || - | - | - | - | -
D=50: 64,001 | 72,101 | 83,001 | 100% | 1.00E-06 || 153,001 | 155,901 | 162,001 | 100% | 1.00E-06 || 70,001 | 70,801 | 71,001 | 100% | 1.00E-06 || - | - | - | - | -
D=100: 153,001 | 165,001 | 189,001 | 100% | 1.00E-06 || 394,001 | 397,801 | 402,001 | 100% | 1.00E-06 || 224,001 | 226,901 | 230,001 | 100% | 1.00E-06 || - | - | - | - | -

Table 1: Comparison of DE-VNS with other global optimization heuristics

For all DE variants used in this paper, we set the maximum number of function evaluations (eval max) and the population size (pop) as shown in Table 2.

Dimensions (D) | eval max | pop
10 | 1e+5–5e+5 | 34
20 | 2e+5–2e+6 | 44
30 | 1.5e+6–3e+6 | 50
50 | 5e+5–5e+6 | 80
100 | 1e+6–2e+6 | 100

Table 2: eval max and pop parameters for the DE variants

Because of the characteristics of the parameters discussed above, we use par_min = 0 and par_max = 1 for the neighborhood parameter, and an arbitrary step of step_factor = 1/(10 · D · log2(D)). Parameters a, b and m are defined as a = 0, b = 1, m = 0. For the competitive settings of F we use F = 0.4, 0.6, 0.8 and 1, with n_0 = 2 and δ = 0.05.
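
For concreteness, a hypothetical call of the de_vns sketch above with the D = 10 settings from Table 2 (pop = 34, budget up to 5e+5, tolerance 1e-6); the Sphere search bounds are our choice for illustration.

```python
import numpy as np

# Hypothetical run of the de_vns sketch on the 10-dimensional Sphere
# function; the [-100, 100] bounds are illustrative, not from the paper.
sphere = lambda x: float(np.sum(x ** 2))
best_x, best_f = de_vns(sphere, bounds=[(-100.0, 100.0)] * 10,
                        pop_size=34, max_evals=5 * 10**5, tol=1e-6)
```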

5 Conclusion

In this paper we presented a new approach for solving global optimization of continuous problems, based on the application of Variable Neighborhood Search within the Differential Evolution heuristic. The change of neighborhoods is used for automatic estimation of the crossover parameter CR. Numerical results show good performance of the proposed algorithm. Our new approach compares favorably with some very promising variants of DE and with VNS used alone. Moreover, we have shown that the hybrid algorithm had a success rate of 100% on all observed problems, even for high-dimensional ones.

References

[1] Carrizosa, E., M. Drazic, Z. Drazic, and N. Mladenovic, Gaussian variable neighborhood search for continuous optimization, Computers & Operations Research 39 (2012), 2206–2213.

[2] Johnson, D., The triangular distribution as a proxy for the beta distribution in risk analysis, The Statistician 46 (1997), 387–398.

[3] Liu, J., and J. Lampinen, On setting the control parameter of the differential evolution method, In: Proc. 8th Int. Conf. Soft Computing (2002), 11–18.

[4] Mladenovic, N., and P. Hansen, Variable neighborhood search, Computers & Operations Research 24 (1997), 1097–1100.

[5] Mladenovic, N., M. Drazic, V. Kovacevic-Vujcic, and M. Cangalovic, General variable neighborhood search for the continuous optimization, European Journal of Operational Research 191 (2008), 753–770.

[6] Price, K., R. Storn, and J. Lampinen, "Differential Evolution: A Practical Approach to Global Optimization", Springer, 2005.

[7] Qin, A. K., V. L. Huang, and P. N. Suganthan, Differential evolution algorithm with strategy adaptation for global numerical optimization, IEEE Transactions on Evolutionary Computation 13 (2009), 398–417.

[8] Salomon, R., Re-evaluating genetic algorithm performance under coordinate rotation of benchmark functions: a survey of some theoretical and practical aspects of genetic algorithms, Biosystems 39 (1996), 263–278.

[9] Storn, R., and K. Price, Differential Evolution: A Simple and Efficient Adaptive Scheme for Global Optimization over Continuous Spaces, J. Global Optimization 11 (1997), 341–359.

[10] Sun, J., Q. Zhang, and E. Tsang, DE/EDA: A new evolutionary algorithm for global optimization, Information Sciences 169 (2004), 249–262.

[11] Tvrdik, J., Differential Evolution with Competitive Setting of its Control Parameters, TASK Quarterly 11 (2007), 169–179.

[12] Van Dorp, J., and S. Kotz, A Novel Extension of the Triangular Distribution and its Parameter Estimation, The Statistician 51 (2002), 63–79.

[13] Vesterstroem, J., and R. Thomsen, A comparative study of differential evolution, particle swarm optimization, and evolutionary algorithms on numerical benchmark problems, In: Proc. IEEE Congr. Evolutionary Computation, Portland, OR, (2004), 1980–1987.


