
Method of Regular Simplexes:

A Difference-Assisted Simplex-Based Search Algorithm

Arman Azad∗ and Jorn S. Hansen†

Institute for Aerospace Studies, University of Toronto,

4925 Dufferin St., Toronto, Ontario, M3H 5T6, Canada

A new difference-assisted simplex-based algorithm which has significant advantages in handling optimization problems with large dimensions is introduced. First, fundamental principles are utilized to illustrate a theorem that provides the basis for computation of the simplex-based differences. Then, a set of new outside-expansion and inside-contraction points are defined, and with the help of function values at these points, a bi-directional search pattern is constructed. The primary search direction is determined using the blended difference information that is readily available at the outset of the analysis. In order to compensate for inaccuracies associated with the difference-assisted descent path, a secondary search direction is also defined with the help of the available information about function values. Examples are given to demonstrate the accuracy and efficiency of the new approach, and the performance of the algorithm in parallel environments is discussed.

I. Introduction

Zeroth-Order Methods (ZOMs) are a class of optimization schemes in which derivatives are not used in the identification or the assessment of potential solutions. These methods require less information about, for example, the differentiability of the functions or the accuracy and continuity of the derivatives. However, they have been discounted by the mainstream mathematical optimization community since the early 1960s because of their slow rate of convergence, problem size limitations, and a lack of concrete theoretical proof about their expected behavior. In contrast, the simplicity of ZOMs, along with their independence from the calculation of derivatives and the associated accuracy analysis (which is, indeed, a long-standing endeavor in its own right), gives them increased adaptability and flexibility. Such characteristics are highly desirable for tackling difficult optimization problems and programming in parallel environments. The latter endeavor has recently sparked a renewed interest in ZOMs.

Simplex-based search schemes constitute a family of ZOMs in which the trial solutions obtained at the vertices of a simplex are used for determining what the next trial solution will be. Among the pioneering works in this class of algorithms, the simplex method of Spendley et al.14 has largely been credited as the original non-linear simplex method in the literature.5 The method later evolved into the highly-cited Nelder-Mead simplex algorithm11 in 1965. Since their publications, however, the original methods have been repeatedly reassessed and modified for increased performance. A comprehensive survey of papers that propose modified simplex-based search schemes with improved convergence properties is given by Kolda et al. in 2003.7

It is recognized that, despite the diversity of the proposed modifications, the improved algorithms usually pursue a common idea: a single descent path defined in the reflection step (for example, along the line that connects the worst vertex to the centroid of the remaining vertices) can be inefficient, especially when the angle between the selected direction and the gradient is too large. Based on the strategies chosen to address this issue, most of the modified methods can be classified into three categories: fortified single-direction search schemes, univariate single-direction search schemes, and multidirectional search schemes.

∗Ph.D. Student, student member, [email protected]
†Professor, member, [email protected]

12th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, 10-12 September 2008, Victoria, British Columbia, Canada

AIAA 2008-6022

Copyright © 2008 by Arman Azad. Published by the American Institute of Aeronautics and Astronautics, Inc., with permission.

A. Fortified single-direction search schemes

In order to improve the efficiency of a direct search method, it is natural to explore the possibility of reducing the angle between the trial descent path and the (anti)gradient vector. This can be accomplished in several ways, but the most straightforward methodology was proposed by Bortz and Kelley2 in 1998.

The algorithm is based on the construction of a right-angled simplex using an initial point X(x_1, x_2, ..., x_n) with edges of length h. Hence, the vertices of the simplex are X and X + h e_i, where the e_i are the unit vectors representing the n-dimensional Cartesian coordinate system. Next, forward finite differences are calculated, and a new trial solution X+ is found using a univariate line search method. One iteration of the algorithm is terminated after a predefined number of calculations. The optimization process is stopped after several iterations with decreasing h, or when the norm of the approximate gradient is less than a small positive number. Although the algorithm is deceptively simple and easy to program, the choice of approximate (forward-difference) gradients as the only guide for selecting both the descent direction and the termination criterion might result in slow convergence.

Alternatively, in order to use available information and construct a more efficient simplex-based search method, Yu18 (1979), Rykov13 (1980), and Tseng16 (1999) propose more general reflection steps in which, instead of a single vertex, a subset of vertices of the simplex, or the centroid of a subset of vertices, is chosen and reflected. Because of the improved descent path, the search is more effective; however, the corresponding algorithms are relatively more complicated than the original simplex methods of Spendley et al., and Nelder and Mead.

B. Univariate single-direction search schemes

Another reasonable modification to the original simplex-based search methods can be accomplished by performing a more rigorous univariate line search along the selected descent path. Therefore, instead of sampling a few points in the space defined by reflection, expansion or contraction, a line search can be used to locate the next trial solution. In 2002, Nazareth and Tseng10 proposed incorporation of the well-known golden-section method into the Nelder-Mead algorithm, along with a restrictive measure to avoid sharp-angled simplexes. While this remedy seems relatively simple to implement, it is recognized that golden-section search may involve numerous unnecessary function evaluations, especially at the beginning of the analysis along a non-gradient-based descent path.

C. Multidirectional search schemes

The first multidirectional simplex search algorithm was proposed by Torczon,15 and Dennis and Torczon,4 in the early 1990s. In this algorithm, the best candidate solution in the simplex is identified and chosen as the pivot point. Next, the other vertices of the simplex are reflected, expanded, or contracted with respect to the pivot, and a lattice of candidate solutions along a diverse set of directions is produced. The size of the lattice is scaled according to the number of processors available. Although such a strategy might require more computations, due to parallelization the final result may be obtained in a shorter time than with sequential methods on single-processor machines. It should be noted, however, that the method becomes inefficient as the number of design variables increases.

D. Relevance and the scope of this research

In this paper, an alternative simplex-based algorithm is introduced which, at every step of the process, uses regular simplexes to construct the search pattern. A comprehensive discussion about the selection of the search directions is reserved for the next section, but it should be noted that, to establish these directions in an n-dimensional space, the algorithm does not require more than n+4 function evaluations. Because the function evaluations can be performed on parallel machines, regardless of the relation between the dimension of the problem and the number of processors available, the method can easily be scaled for multiprocessor systems.

The primary attention of this work is focused on minimization problems in which the nonlinear objective function exhibits one or more of the following properties:

• A large number of system or state variables are involved in the evaluation of the function. Because of the complexity of the function, its behavior, for example the continuity and/or noisiness of the derivatives, is either unknown or difficult to predict.


• Accurate function evaluations are expensive, or, due to the uncertainties associated with inputs, the accuracy of calculated values beyond a few significant digits is doubtful. Thus, accurate finite differences or complex-step differentiation are not applicable.

• The function values are obtained using a complicated code or commercial software, and automatic differentiation techniques are not feasible.

Optimization problems with cost functions that exhibit the above properties are frequently encountered in practice. In such cases, there is little doubt that ZOMs remain the most viable, and sometimes the only feasible, option for obtaining an optimal solution.

The remainder of this paper is organized as follows. Section 2 provides details about basic definitions and the mathematical foundation required in the subsequent sections. The key ideas developed in the paper, along with a complete description of a single iteration of the optimization process, are provided in Section 3. In Section 4, test examples are solved on a single-processor machine. Finally, the results are compared with those obtained using the Nelder-Mead simplex and the steepest-descent algorithms (as the reference simplex- and gradient-based methods) in terms of accuracy and required function evaluations.

II. Fundamental Concepts

An exclusive property of regular simplexes is explored in this section. This property will be utilized to construct the difference-assisted scheme. First, in order to ensure consistency throughout the rest of this paper, clarifications on terminology along with some fundamental definitions are provided.

Definition 1: A simplex in R^n is a generalization of a triangle to n-dimensional spaces. It is the convex hull of n+1 points X_j(x_{j1}, x_{j2}, ..., x_{jn}), j = 1, ..., n+1. The points (X_1, X_2, ..., X_{n+1}) are the vertices of the simplex.

Definition 2: A simplex in R^n, n ≥ 2, is called a Regular Simplex if the distances between the vertices are equal. This common length, hereafter called the edge of the simplex, L_S, is a measure of the size and will be used as a reference length for normalization purposes. In a regular simplex in R^n, n ≥ 2, if one of the vertices is omitted, the dimension of the problem is reduced to R^{n-1}. The remaining lower-order simplex is also regular for n > 2.

Definition 3: In a regular simplex in R^n, the centroid is the intersection of all hyperplanes that pass through the vertices and divide the simplex into two equal parts. In a regular simplex in R^n the vertices are equally distant from the centroid. The distance can be calculated as a function of L_S.
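Definition 3 leaves the centroid-to-vertex distance implicit. For reference, the standard value for a regular simplex with edge L_s in R^n (a known result quoted here for completeness, not derived in the paper) is

\[
\operatorname{dist}\left(X_j,\ \text{centroid}\right) \;=\; L_s\,\sqrt{\frac{n}{2(n+1)}}\,, \qquad j = 1, \ldots, n+1,
\]

which reduces to the familiar circumradius L_s/\sqrt{3} of an equilateral triangle for n = 2.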

[Figure 1: two panels in R^3. Panel (a) shows the line connecting the vertex X_{i+3} to X_cen, with a point O on it; panel (b) additionally marks the point X_con. The vertices X_i, X_{i+1} and X_{i+2} surround X_cen.]

Figure 1. The line that connects X_{n+1} to X_cen in R^3.

Lemma: In a regular simplex in R^n, n ≥ 2, the straight line that connects an arbitrarily chosen vertex to the centroid of the remaining vertices, X_cen, is the geometric locus of points O that are equidistant from all vertices, excluding the initially selected vertex.

The proof is provided in the appendix. Figure 1, however, illustrates this concept in a three-dimensional space. As indicated, any point located on X_{i+3}X_cen is equidistant from X_i, X_{i+1} and X_{i+2}.

Theorem: In a regular simplex, among all points O that lie on the line that connects an arbitrarily chosen vertex, for example X_{n+1}, to the centroid of the remaining vertices, X_cen, there exists a point X_con whose distance from all vertices, excluding the initially selected vertex, is (\sqrt{2}/2)L_s. In this case,

\angle X_1 X_{con} X_2 = \angle X_2 X_{con} X_3 = \cdots = \angle X_i X_{con} X_{i+1} = \cdots = \angle X_{n-1} X_{con} X_n = \frac{\pi}{2}.


Proof is given in the appendix. As illustrated in Figure 1(b), if X_iX_con = X_{i+1}X_con = X_{i+2}X_con = (\sqrt{2}/2)L_S, then \angle X_iX_{con}X_{i+1} = \angle X_iX_{con}X_{i+2} = \angle X_{i+1}X_{con}X_{i+2} = \pi/2. Consequently, in the right-angled simplex X_conX_1X_2...X_n, unit vectors along X_conX_1, X_conX_2, ..., X_conX_n constitute a Cartesian coordinate system.

The distance between X_con and one end of X_{n+1}X_cen, for example X_cen, can be calculated as a function of L_s. Values of L_n = ||X_cenX_con||/L_s obtained for two- to ten-dimensional regular simplexes are given in Table 1.

Table 1. L_n = ||X_cen X_con|| / L_s calculated for two- to ten-dimensional simplexes.

Dimension of the simplex (n)    ||X_cen X_con|| / L_s
 2                              0.500000000000000
 3                              0.408248290463863
 4                              0.353553390593274
 5                              0.316227766016838
 6                              0.288675134594813
 7                              0.267261241912424
 8                              0.250000000000000
 9                              0.235702260395516
10                              0.223606797749979
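The tabulated entries coincide with L_n = 1/\sqrt{2n}. A short numerical check follows; the circumradius expression for the regular (n-1)-simplex formed by the retained vertices is an assumption made explicit here, not stated in the text.

```python
import math

def L_n(n):
    """||X_cen X_con|| / L_s: Pythagoras along the X_{n+1}-X_cen axis, using
    the theorem's distance sqrt(2)/2 from X_con to each retained vertex and
    the circumradius sqrt((n-1)/(2n)) of the retained (n-1)-simplex (L_s = 1)."""
    return math.sqrt(0.5 - (n - 1) / (2 * n))   # simplifies to 1/sqrt(2*n)

for n in range(2, 11):
    print(f"{n:2d}  {L_n(n):.15f}")   # reproduces Table 1, e.g. n=2 -> 0.5
```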

It should be noted that if an arbitrarily chosen vertex is connected to the centroid of the remaining vertices, X_cen, and an appropriate value of L_n is used to extend the line outside the simplex, the resulting point, X_exp, can also be utilized to construct an alternative right-angled simplex. In this case, unit vectors along X_expX_1, X_expX_2, ..., X_expX_n represent a Cartesian coordinate system in R^n. Figure 2 illustrates the coordinate systems with X_exp and X_con as their origins in R^3. Identification of the origin constitutes the fundamental step of the new algorithm that is comprehensively described in the next section. At every step of the optimization process only one of these two points (X_exp or X_con) appears in the analysis; therefore, in order to avoid repetition, an auxiliary parameter X_O will be used to represent the origin hereafter.

III. The Method of Regular Simplexes (MRS)

A step-by-step description of the new algorithm is provided in this section. Notation and terminology, for example reflection, have been chosen to coincide with comparable steps in the Nelder-Mead method.

Step 1: Initialize. Use a relatively small (on the order of 10^-1 to 10^-4) simplex edge length L_s and an arbitrarily chosen starting point X_1. Set the iteration counter k = 1.

Step 2: Evaluate. Construct a regular simplex in R^n with the help of the available information about the starting point and L_s. Calculate function values at all of the vertices.

Step 3: Sort. Order the vertices (X_1^{(k)}, X_2^{(k)}, ..., X_{n+1}^{(k)}) such that F(X_1^{(k)}) ≤ F(X_2^{(k)}) ≤ ... ≤ F(X_{n+1}^{(k)}). Since function values at the vertices of the simplex are available, the convergence of the results must be examined.

Termination Criterion. In order to define a termination criterion based on available function values, as originally proposed by Nelder and Mead, Kuester8 suggests the calculation of the function value at the centroid of all vertices excluding the worst vertex, F(X_cen^{(k)}). The process is terminated when

\left[\frac{1}{n+1}\sum_{i=1}^{n+1}\left(F(X_i^{(k)}) - F(X_{cen}^{(k)})\right)^2\right]^{1/2} \le \varepsilon_1

where ε_1 is a small positive number. Since neither the Nelder-Mead nor the new algorithm uses the value of F(X_cen^{(k)}), this paper pursues an alternative way to establish the termination criterion, as


proposed in Ref. 5. In the new approach, instead of using F(X_cen^{(k)}), the mean of all function values except the best is calculated, μ = (1/n) \sum_{i=2}^{n+1} F(X_i^{(k)}). The process is terminated if

\left[\frac{1}{n+1}\sum_{i=1}^{n+1}\left(F(X_i^{(k)}) - \mu\right)^2\right]^{1/2} \le \varepsilon

where, similar to Kuester's assumption, ε is a small positive number, for example on the order of 10^-7 to 10^-10.
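A minimal sketch of Steps 1 to 3 in Python. The Spendley et al. offsets used to build the initial regular simplex are an assumption made for concreteness, since the paper only asks for "a regular simplex"; the termination test follows the expression above.

```python
import numpy as np

def regular_simplex(x1, Ls):
    """n+1 vertices of a regular simplex with edge Ls anchored at x1.
    Uses the classical Spendley et al. offsets p and q (an assumption here)."""
    n = len(x1)
    p = Ls * (np.sqrt(n + 1.0) + n - 1.0) / (n * np.sqrt(2.0))
    q = Ls * (np.sqrt(n + 1.0) - 1.0) / (n * np.sqrt(2.0))
    V = np.tile(np.asarray(x1, dtype=float), (n + 1, 1))
    V[1:] += q                       # each added vertex gets q in every coordinate
    V[1:] += (p - q) * np.eye(n)     # and p instead of q in its own coordinate
    return V                         # all pairwise distances equal Ls

def mrs_converged(f_values, eps):
    """Step 3 test: RMS deviation of all n+1 vertex values about the mean mu
    of the n values that exclude the best (smallest) one."""
    fs = np.sort(np.asarray(f_values, dtype=float))
    mu = fs[1:].mean()
    return np.sqrt(((fs - mu) ** 2).mean()) <= eps
```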

where, similar to Kuester’s assumption, ε is small positive number, for example, on the order 10−7 to 10−10.Step 4 Expand and/or Contract and Identify the Origin Exclude the worst vertex X

(k)n+1 from the

analysis; compute the centroid of the resulting lower-order simplex in Rn−1, X(k)cen(x1cen, x2cen, ..., xncen).

Calculate Ln and find X(k)exp, as indicated in Figure 2(a) in R3, calculate F (X(k)

exp). Set X(k)O =X

(k)exp,

F (X(k)O )=F (X(k)

exp). If F (X(k)exp) > F (X(k)

n ), which means the function value at origin is worse than F (X(k)n )

(maximum among available options) use Ln and find X(k)con. Figure 2(b) illustrates the process in R3. Set

X(k)O =X

(k)con, F (X(k)

O )=F (X(k)con).

[Figure 2: two panels in R^3 showing the vertices X_1, X_2, X_3, X_{n+1}, the centroid X_cen, the simplex-based unit vectors i_1, i_2, i_3, and two sets of coordinate axes; panel (a) is built around X_exp, panel (b) around X_con.]

Figure 2. (a): X_exp and (b): X_con are the origins of the simplex-based coordinate systems in R^3.

Step 5: Calculate Simplex-Based Forward Differences. Due to the availability of difference information in the right-angled simplex X_O X_1 ... X_n, calculate the vector of forward differences along the edges that include X_O, that is, along η_1, η_2, ..., η_n in R^n:

\frac{\Delta F}{\Delta\eta} = \left(\left(\frac{\Delta F}{\Delta\eta}\right)_1, \ldots, \left(\frac{\Delta F}{\Delta\eta}\right)_n\right), \quad \text{where} \quad \left(\frac{\Delta F}{\Delta\eta}\right)_i = \frac{F(X_i^{(k)}) - F(X_O^{(k)})}{\frac{\sqrt{2}}{2} L_s}.
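Under the definitions above, Step 5 reduces to a one-line vector operation; a sketch assuming the values F(X_1), ..., F(X_n) and F(X_O) are already in hand:

```python
import numpy as np

def simplex_forward_differences(f_vertices, f_origin, Ls):
    """(dF/deta)_i = (F(X_i) - F(X_O)) / ((sqrt(2)/2) * Ls), i = 1..n."""
    return (np.asarray(f_vertices, dtype=float) - f_origin) / (np.sqrt(2.0) / 2.0 * Ls)
```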

This forward-difference information is blended with central differences in the next step. The enriched differences will be used to constitute the primary search path.

Step 6: Reflect. Based on the value of F(X_O^{(k)}), constitute the first search direction by reflection.

1. If F(X_O^{(k)}) ≤ F(X_1^{(k)}), the direction defined by -i_n along η_n represents the largest difference in function values among the available data.

• Use X_O^{(k)} as the pivot and reflect X_n^{(k)} such that X_ref^{(k)} = X_O^{(k)} + (X_O^{(k)} - X_n^{(k)}).

• Calculate F(X_ref^{(k)}). Use F(X_n^{(k)}), F(X_O^{(k)}) and F(X_ref^{(k)}) to construct a quadratic interpolation function. Calculate the minimum of the approximating function, X_min1^{(k)}.

• Use F(X_ref^{(k)}) and replace the old difference approximation along η_n with

\left(\frac{\Delta F}{\Delta\eta}\right)_n = \frac{F(X_n^{(k)}) - F(X_{ref}^{(k)})}{\sqrt{2} L_s}.


2. If F(X_O^{(k)}) > F(X_1^{(k)}), based on available function values, the direction defined by i_1 along η_1 represents a viable descent path.

• Use X_1^{(k)} as the pivot and reflect X_O^{(k)} such that X_ref^{(k)} = X_1^{(k)} + (X_1^{(k)} - X_O^{(k)}).

• Calculate F(X_ref^{(k)}). Use F(X_1^{(k)}), F(X_O^{(k)}) and F(X_ref^{(k)}) to construct a quadratic interpolation function. Calculate the minimum of the approximating function, X_min1^{(k)}.

• Use F(X_ref^{(k)}) and replace the old difference approximation along η_1 with

\left(\frac{\Delta F}{\Delta\eta}\right)_1 = \frac{F(X_{ref}^{(k)}) - F(X_O^{(k)})}{\sqrt{2} L_s}.

At this stage, two pieces of crucial information are available: an approximate minimum X_min1^{(k)} obtained along either η_1 or η_n, and a blended difference vector ΔF/Δη.
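The paper does not write out the quadratic interpolation used to locate X_min1. A standard three-point parabola for the equally spaced samples of Step 6 (for example X_n, X_O, X_ref in case 1) would look as follows; the fallback for non-convex samples is an assumption added here.

```python
def quadratic_min(x_a, x_b, x_c, f_a, f_b, f_c):
    """Minimizer of the parabola through three equally spaced collinear
    points x_a, x_b, x_c (numpy arrays) with values f_a, f_b, f_c.
    t parameterizes the line in units of the spacing, with t = 0 at x_b."""
    curvature = f_a - 2.0 * f_b + f_c
    if curvature <= 0.0:
        # Parabola has no interior minimum; return the best sampled point.
        pts = [(f_a, x_a), (f_b, x_b), (f_c, x_c)]
        return min(pts, key=lambda p: p[0])[1]
    t_star = 0.5 * (f_a - f_c) / curvature
    return x_b + t_star * (x_c - x_b)
```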

Step 7: Construct the Difference-Assisted Search Direction. Start from the best available choice, that is, either X_1^{(k)} or X_O^{(k)}, and use the blended difference as the descent path; use a line search algorithm (for example, quadratic polynomial approximation) and obtain the minimum X_min2^{(k)}. It is recognized that the difference vector is originally calculated in the simplex-based η_1η_2...η_n coordinates; therefore, it must first be recalculated in the global coordinates ζ_1ζ_2...ζ_n. In order to constitute the transformation matrix T, first, the unit vectors (i_1, i_2, ..., i_n) in the η_1η_2...η_n coordinate system are calculated:

i_j = \frac{X_j - X_O}{\frac{\sqrt{2}}{2} L_s}, \qquad j = 1, \ldots, n.

The unit vectors i_1 to i_n are next substituted into the corresponding rows of the n × n matrix T. It is recognized that the elements of the unit vector i_j, (m_{j1}, m_{j2}, ..., m_{jn}), represent the direction cosines,

\cos\theta_{j1} = m_{j1}, \quad \cos\theta_{j2} = m_{j2}, \quad \ldots, \quad \cos\theta_{jn} = m_{jn},

where θ_{jk} is the angle between i_j and the k-th global axis. Hence, matrix T can be used for transforming a vector-valued quantity D_\eta^T, for example approximate gradients, from the simplex-based η-coordinates to the global ζ-coordinates:

D_\zeta^T = T \cdot D_\eta^T.
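A sketch of the transformation, with the unit vectors stored as the rows of T as the text prescribes. Note that, with that row convention, recombining η-components into global ζ-components multiplies by the transpose; the expression above corresponds to the matching column convention, so the sketch makes the choice explicit.

```python
import numpy as np

def eta_to_zeta(vertices, x_origin, Ls, d_eta):
    """Express a vector given in simplex-based eta-components in global
    zeta-coordinates.  Row j of T holds the direction cosines of the unit
    vector along the edge from X_O to X_j."""
    T = (np.asarray(vertices, dtype=float) - x_origin) / (np.sqrt(2.0) / 2.0 * Ls)
    return T.T @ np.asarray(d_eta, dtype=float)  # sum_j (d_eta)_j * i_j
```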

Step 8: Examine the Termination Criterion and Restart. Compare the objective function values at X_min2^{(k)}, X_min1^{(k)}, X_O^{(k)} and X_1^{(k)} and sort them. If the iteration number k is greater than a predefined number, print the best result and terminate the calculations. Otherwise, set the iteration number k = k+1 and choose the point that corresponds to the minimum as the starting point for the next iteration, X_1^{(k+1)}.

Calculate the distance between X_1^{(k+1)} and the point that gives the second-best function value among the three remaining candidates. If the distance is less than L_s^{(k)}, use it as the edge of the regular simplex in the next iteration, L_s^{(k+1)}, and restart from Step 2.

IV. Results and Discussion

In this section, test examples are used to investigate the capabilities of the new algorithm. The optimization results obtained using the MRS algorithm are compared with those obtained using the Nelder-Mead simplex, and the DFP and BFGS algorithms. In order to reach physically meaningful comparisons, the initial conditions (starting points) and the termination criterion (the selected error tolerance) are chosen consistently.

A. Comparisons with the Nelder-Mead Method

Since its introduction in the early 1960s, the Nelder-Mead method has without doubt been the most commonly used simplex-based algorithm; however, it has also been repeatedly criticized for its inefficiency in higher dimensions, lower convergence speed (especially in high-dimensional problems), and its lack of robustness in terms of dependence on the starting point and initial simplex size. In this section, a Fortran code


published in Ref. 8 is used as the reference Nelder-Mead program. In order to be consistent with the MRS method, some minor modifications regarding the original parameter precision and error tolerance ε were inevitable. The code was also slightly refined to minimize the total number of function evaluations in a single iteration.

First, the effect of problem size on the performance of the algorithms was investigated. The following quadratic test function was minimized for different values of n (n = 2, 5, 10, 20, 50, 70, 100, 200, 350):

f(x) = \sum_{i=1}^{n} \frac{1}{i}\, x_i^2. \qquad (1)
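Transcribed directly, Eq. (1) weights coordinate i by 1/i, so the conditioning worsens as n grows:

```python
import numpy as np

def f_quadratic(x):
    """Equation (1): f(x) = sum_{i=1}^{n} x_i**2 / i, minimum 0 at the origin."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(x * x / np.arange(1.0, x.size + 1.0)))
```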

The optimization results, presented in Table 2, indicate that, while the function becomes increasingly poorly scaled as n increases, the new algorithm drastically outperforms the Nelder-Mead algorithm in terms of the number of function evaluations and the accuracy of the converged solutions.

Table 2. Effect of problem dimensionality on the performance of the MRS and the Nelder-Mead method.

Number of      Method of Regular Simplexes        Nelder-Mead Method
variables      Function       Converged           Function       Converged
(n)            Evaluations    Solution            Evaluations    Solution

  2                85         0.000 x 10^-6           91         0.000 x 10^-6
  5               299         0.000 x 10^-6          502         0.002 x 10^-6
 10               518         0.003 x 10^-6         1364         0.004 x 10^-6
 20              1156         0.005 x 10^-6         5686         0.019 x 10^-6
 50              4225         0.025 x 10^-6        16471         0.087 x 10^-6
 70              5855         0.048 x 10^-6        27695         0.353 x 10^-6
100              9934         0.126 x 10^-6        44536         0.557 x 10^-6
200             30369         0.246 x 10^-6        91124         8.184 x 10^-6
350             64818         0.540 x 10^-6       247951        28.542 x 10^-6

Optimal solutions are obtained based on the termination criterion ε = 1 x 10^-10. It should be noted that, since the best trial solution is excluded from the analysis in the calculation of μ, the inconsistencies between the solutions obtained by these methods are related to the geometry of the final simplex in R^n.

The results obtained in this example can be utilized to elaborate on other highly desirable features of the new algorithm. These include the capability of being used in parallel environments, improved asymptotic convergence speed, and the ability to avoid premature convergence to a non-minimal solution.

1. Sequentiality vs. parallel programming capabilities

Despite the fact that most established optimization algorithms pursue sequential procedures, there are great algorithmic differences among the corresponding sequential search schemes. To further clarify this issue, available data about the number of completed iterations for n = 350 are compared below.

In the Nelder-Mead method, the output of the computer code indicates that, in order to obtain a converged solution, 227156 iterations of the algorithm and 247951 function evaluations are completed. It should be recognized that, excluding the shrink step (which is rarely observed in practice), in a complete iteration the Nelder-Mead algorithm takes either one step (reflection) or two steps (a sequence of reflection and extension, or reflection and contraction). Because, in each iteration, the decision about the secondary step (extension or contraction) is taken after the initial reflection step, the process cannot be performed in a parallel environment without excessive penalties for unnecessary function evaluations. In addition, regardless of the number of steps taken in single iterations, the 227156 iterations must be completed sequentially.

In the new method, the converged solution was obtained after 64818 function evaluations performed during 182 iterations of the optimization algorithm. At the beginning of each iteration, function values at the n+1 = 351 vertices of a regular simplex must be calculated. This can be done on parallel machines. As a result,

\underbrace{1 \times 351}_{\text{1st iteration}} \; + \; \underbrace{181 \times 350}_{\text{2nd to 182nd iterations}} \; = \; 63701,


out of 64818, or 98.27% of, the function value calculations can be performed in parallel during the 182 iterations. Thus, if a sufficient number of processors is available, 63701 function evaluations can be completed within the same time-frame required for only 182 such calculations on a single-processor machine. The balance, that is, 64818 - 63701 = 1117 function evaluations, is required for line search and quadratic polynomial optimization purposes. Since the procedures are independent, a considerable portion of the remaining function value calculations can also be distributed among different machines without much difficulty.

In practice, function evaluations usually require execution of large commercial software packages or in-house-developed computer codes. In this situation, due to the simplicity of the new algorithm, the CPU time required for hundreds of iterations of the optimization algorithm can be assumed to be negligible compared to a single run of the software. As such, its natural parallelization capabilities seem very promising for researchers in the field.
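The authors' implementation is Fortran; as a language-neutral illustration of the point, the n+1 independent vertex evaluations map directly onto a process pool (a sketch, not the paper's code):

```python
from concurrent.futures import ProcessPoolExecutor

def evaluate_vertices(f, vertices, max_workers=None):
    """Evaluate the objective at every simplex vertex concurrently; with one
    processor per vertex the wall time approaches that of a single call."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(f, vertices))
```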

2. Improved asymptotic convergence rate

Zeroth-Order Methods have been repeatedly criticized in the literature for a slow asymptotic convergence rate. As seen in Table 2, regardless of the dimensionality of the problem, for ε = 1 x 10^-10 the new algorithm requires less computational effort than the Nelder-Mead algorithm.

The effect of the predefined value of the termination criterion on the convergence of the algorithms is investigated for the fifty-dimensional test function (1). The results, illustrated in Table 3, indicate that the Nelder-Mead method decelerates more rapidly as the required level of accuracy increases.

Table 3. Effect of the preset accuracy level on the convergence rate. Solutions are obtained for n=50.

Termination          Method of Regular Simplexes          Nelder-Mead Method
criterion            Function      Converged              Function      Converged
(Accuracy level)     Evaluations   Solution               Evaluations   Solution

1 x 10^-07           2066          0.127138 x 10^-04       6021         1.000101 x 10^-04
1 x 10^-08           3008          0.346993 x 10^-05       8857         2.428242 x 10^-05
1 x 10^-09           3991          0.053797 x 10^-06      12302         1.044875 x 10^-06
1 x 10^-10           4225          2.513256 x 10^-08      16471         8.668825 x 10^-08
1 x 10^-11           5149          1.326754 x 10^-09      24852         4.902235 x 10^-09
1 x 10^-12           5666          2.840951 x 10^-10      32957         9.698841 x 10^-10
1 x 10^-13           6535          2.043505 x 10^-11      44351         2.317900 x 10^-11
1 x 10^-14           7114          3.404607 x 10^-12      51454         6.406820 x 10^-12
1 x 10^-15           8097          0.176327 x 10^-12      63212         3.710810 x 10^-12

3. Premature convergence to a non-optimal solution

Although many practitioners believe that ZOMs in general, and the Nelder-Mead method in particular, are more robust than gradient-based methods, in 1998 McKinnon9 proved that in some special cases the Nelder-Mead algorithm can be trapped inside repetitive cycles of contraction, and is thus forced to converge to a non-stationary point. Furthermore, due to a lack of theoretical proof concerning the convergence of the most established simplex-based methods, or because of the inaccuracies associated with the search schemes used in these algorithms, it is often suggested12 or required2 to restart the process and examine the results after a converged potential solution is obtained. The re-initiation of the optimization procedure, however, requires extra computations. This additional computational effort, which can be regarded as the hidden cost of the optimization process, is problem dependent and cannot be evaluated in advance. In the new algorithm, the results are obtained in a single run of the iterative process without any hidden additional function evaluations.

B. Comparisons with DFP and BFGS Methods

In the new algorithm, it is recognized that the size of the edge of the simplex, L_s, gradually shrinks to a small number near the optimum (Step 8). In this situation, (\sqrt{2}/2)L_s, the forward-difference step size, would be a


small number as well. As a result, the primary search direction, which uses the difference values, represents an approximate \nabla f. Despite the fact that the descent direction defined using -\nabla f is the one along which f decreases most rapidly, following exact gradients might not be an efficient approach.

In order to compare the performance of the algorithm with gradient-based methods, two Quasi-Newton algorithms, i.e., DFP and BFGS (the latter developed and programmed at Northwestern University by Ciyou Zhu et al.19), were utilized in this section. The quadratic test example (1), introduced in Section IV.A of this paper, was minimized for different values of n (n = 20, 50, 70, 100, 200, 350). Results are illustrated in Table 4. It should be noted that, in order to conduct a physically meaningful comparison, instead of exact gradients, central differences were calculated and used in both Quasi-Newton methods.
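For reference, a generic central-difference gradient of the kind supplied to DFP and BFGS in place of exact derivatives (the step size h is a free choice here, at a cost of 2n evaluations per gradient):

```python
import numpy as np

def central_difference_gradient(f, x, h=1e-6):
    """grad_i ~= (f(x + h e_i) - f(x - h e_i)) / (2 h)."""
    x = np.asarray(x, dtype=float)
    g = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g
```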

Table 4. Effect of problem dimensionality on the performance of the algorithms.

Number of     MRS Method                   DFP Method                   BFGS Method
variables     Function     Converged       Function     Converged       Function     Converged
(n)           Evaluations  Solution        Evaluations  Solution        Evaluations  Solution

 20            1156        0.005 x 10^-6    1278        0.005 x 10^-6     945        0.005 x 10^-6
 50            4225        0.025 x 10^-6    3363        0.029 x 10^-6    3437        0.026 x 10^-6
 70            5855        0.048 x 10^-6    6058        0.077 x 10^-6    5643        0.059 x 10^-6
100            9934        0.126 x 10^-6   11121        0.136 x 10^-6    9051        0.149 x 10^-6
200           30369        0.246 x 10^-6   42456        0.922 x 10^-6   25868        0.252 x 10^-6
350           64818        0.540 x 10^-6   46555        0.572 x 10^-6   56086        0.556 x 10^-6

In the MRS method, optimal solutions are obtained based on the termination criterion ε = 1 x 10^-10. In the Quasi-Newton algorithms, the termination criterion is set at |∇f| ≤ 1 x 10^-6, and the solution closest to the one obtained by MRS (or the closest higher solution) is reported.

Despite the fact that the number of function evaluations alone does not provide a fair basis for a comprehensive comparison, Table 4 indicates that the new bi-directional search scheme is effectively capable of handling optimization problems with a large number of design variables.

V. Conclusion

It was asserted that, unlike the Nelder-Mead simplex and virtually all gradient-based algorithms, the MRS algorithm can easily be implemented on parallel machines. While this useful characteristic appears attractive to practitioners, especially for solving complicated optimization problems, the new algorithm has other features that should not be underestimated, including:

1. the capability of handling higher-dimensional problems without the deteriorating convergence speed seen in the Nelder-Mead method;

2. faster near-the-optimum convergence speed, as opposed to the Nelder-Mead method in higher dimensions;

3. no extra optimization costs associated with expensive gradient calculations or re-initiation of the process;

4. increased robustness, due to the bi-directionality of the search scheme.

In addition, the new algorithm neither relies on derivatives (and their accuracy, continuity, etc.) nor is constrained by differentiability conditions; therefore, it can be used with more flexibility, especially when the behavior of the objective function is not predictable in advance. In particular, this can be very useful in developing a sequential unconstrained optimization technique for constrained optimization problems that uses the MRS and an appropriate function that combines the objective and the constraints, for example the Kreisselmeier-Steinhauser function.


References

1. Arora J. S., Introduction to Optimum Design, 2nd ed., Elsevier Academic Press, 2004.

2. Bortz D. M. and Kelley C. T., The simplex gradient and noisy optimization problems, in Computational Methods in Optimal Design and Control, Progr. Syst. Control Theory 24, J. T. Borggaard, J. Burns, E. Cliff, and S. Schreck (eds.), Birkhauser, Boston, pp. 77-90, 1998.

3. Chow K. L., Parallel Unconstrained Optimization, Dept. Paper, Department of Computer Sciences, University of Toronto, October 1993 (available at http://citeseer.ist.psu.edu/chow93parallel.html, Copyright Penn State and NEC).

4. Dennis J. E. and Torczon V., Direct search methods on parallel machines, SIAM Journal on Optimization, 1, pp. 448-474, 1991.

5. Gurson A. P., Simplex Search Behavior in Nonlinear Optimization, Honors B.Sc. Thesis in Computer Science, College of William and Mary, VA, 2000.

6. Hooke R. and Jeeves T. A., Direct search solution of numerical and statistical problems, J. ACM, 8, pp. 212-229, 1961.

7. Kolda T. G., Lewis R. M., and Torczon V., Optimization by direct search: new perspectives on some classical and modern methods, SIAM Review, Vol. 45, No. 3, pp. 385-482, 2003.

8. Kuester J. L. and Mize J. H., Optimization Techniques with Fortran, McGraw-Hill Book Company, 1973.

9. McKinnon K. I. M., Convergence of the Nelder-Mead simplex method to a nonstationary point, SIAM Journal on Optimization, Vol. 9, No. 1, pp. 148-158, 1998.

10. Nazareth L. and Tseng P., Gilding the lily: a variant of the Nelder-Mead algorithm based on golden-section search, Comput. Optim. Appl., 22, pp. 133-144, 2002.

11. Nelder J. A. and Mead R., A simplex method for function minimization, Comput. J., 7, pp. 308-313, 1965.

12. Press W. H., Teukolsky S. A., Vetterling W. T. and Flannery B. P., Numerical Recipes in Fortran 77: The Art of Scientific Computing, 2nd ed., Cambridge University Press, 1997.

13. Rykov A. S., Simplex methods of direct search, Engrg. Cybernetics, 18, pp. 12-18, 1980.

14. Spendley W., Hext G. R., and Himsworth F. R., Sequential application of simplex designs in optimisation and evolutionary operation, Technometrics, 4, pp. 441-461, 1962.

15. Torczon V., Multidirectional Search: A Direct Search Algorithm for Parallel Machines, Technical Report 90-7, Department of Mathematical Sciences, Rice University, TX, 1990.

16. Tseng P., Fortified-descent simplicial search method: a general approach, SIAM J. Optim., 10, pp. 269-288, 1999.

17. Wright M. H., Direct search methods: once scorned, now respectable, in Numerical Analysis 1995 (Proceedings of the 1995 Dundee Biennial Conference in Numerical Analysis), D. F. Griffiths and G. A. Watson (eds.), Addison Wesley Longman, Harlow, UK, pp. 191-208, 1996.

18. Yu W., The convergence property of the simplex evolutionary techniques, Scientia Sinica, Special Issue of Mathematics 1, pp. 68-77, 1979.

19. Zhu C., Byrd R. H., Lu P., and Nocedal J., L-BFGS-B: FORTRAN Subroutines for Large Scale Bound Constrained Optimization, Tech. Report NAM-11, EECS Department, Northwestern University, 1994.

Appendix

Lemma: In a regular simplex in R^n, n ≥ 2, the straight line that connects an arbitrarily chosen vertex, for example X_{n+1}, to the centroid of the remaining vertices, X_cen, is the geometric locus of points O that are equidistant from all vertices, excluding the initially selected vertex.

[Figure 3: the vertices X_1, X_2, X_3 and X_{n+1} in R^3, with a point O on the line from X_{n+1} through the centroid of the remaining vertices.]

Figure 3. The line that connects X_{n+1} to X_cen in R^3.

Proof:

Definition 2 \Rightarrow X_1X_{n+1} = X_2X_{n+1} = \cdots = X_nX_{n+1}

Definition 3 \Rightarrow X_1X_{cen} = X_2X_{cen} = \cdots = X_nX_{cen}

X_{cen}X_{n+1} \text{ common}

\Rightarrow \triangle X_1X_{cen}X_{n+1} \equiv \triangle X_2X_{cen}X_{n+1} \equiv \cdots \equiv \triangle X_nX_{cen}X_{n+1}

\Rightarrow \angle X_1X_{cen}X_{n+1} = \angle X_2X_{cen}X_{n+1} = \cdots = \angle X_nX_{cen}X_{n+1} \qquad (I)


For an arbitrarily chosen O, the following relations hold in the triangles \triangle X_1X_{cen}O, \triangle X_2X_{cen}O, ..., \triangle X_nX_{cen}O:

(I) \Rightarrow \angle X_1X_{cen}O = \angle X_2X_{cen}O = \cdots = \angle X_nX_{cen}O

X_1X_{cen} = X_2X_{cen} = \cdots = X_nX_{cen}

X_{cen}O \text{ common}

\Rightarrow \triangle X_1X_{cen}O \equiv \triangle X_2X_{cen}O \equiv \cdots \equiv \triangle X_nX_{cen}O \Rightarrow X_1O = X_2O = \cdots = X_nO.

Theorem: In a regular simplex, among all points O that lie on the line that connects an arbitrarily chosen vertex, for example X_{n+1}, to the centroid of the other vertices, X_cen, there exists a point X_con whose distance from all vertices, except X_{n+1}, is (\sqrt{2}/2)L_s. In this case, \angle X_1X_{con}X_2 = \angle X_2X_{con}X_3 = \cdots = \angle X_iX_{con}X_{i+1} = \cdots = \angle X_{n-1}X_{con}X_n = \pi/2.

[Figure 4: X_con and three consecutive vertices X_i, X_{i+1}, X_{i+2}, with edges of length L_S and the foot H of the perpendicular from X_con to X_iX_{i+1}.]

Figure 4. X_con and three consecutive but arbitrarily chosen vertices X_i, X_{i+1} and X_{i+2}: (a) all four points depicted in R^3; (b) X_con, X_i and X_{i+1} shown in R^2.

Proof: The three distinct points X_i, X_{i+1} (i = 1, ..., n-1) and X_con lie on a plane in R^n. If the line that connects X_con to the center of X_iX_{i+1} is called X_conH, then

X_iH = X_{i+1}H,\;\; \triangle X_iX_{con}X_{i+1} \text{ isosceles} \;\Rightarrow\; X_{con}H \perp X_iX_{i+1} \;\Rightarrow\; X_{con}H = \sqrt{X_iX_{con}^2 - X_iH^2}

\Rightarrow X_{con}H = \left(\tfrac{1}{2}L_s^2 - \tfrac{1}{4}L_s^2\right)^{1/2} \;\Rightarrow\; X_{con}H = X_iH = \tfrac{1}{2}L_s \qquad (I)

(I) \Rightarrow \angle HX_iX_{con} = \angle HX_{con}X_i,\;\; \angle X_iHX_{con} = \tfrac{\pi}{2} \;\Rightarrow\; \angle HX_iX_{con} = \angle HX_{con}X_i = \tfrac{\pi}{4}

Similarly, it can be shown that

\angle HX_{i+1}X_{con} = \angle HX_{con}X_{i+1} = \tfrac{\pi}{4} \;\Rightarrow\; \angle X_iX_{con}X_{i+1} = \tfrac{\pi}{2}, \text{ or } X_iX_{con} \perp X_{con}X_{i+1}.

Since X_i and X_{i+1} (i = 1, ..., n-1) are two arbitrarily chosen consecutive vertices, it can be concluded that \angle X_1X_{con}X_2 = \angle X_2X_{con}X_3 = \cdots = \angle X_iX_{con}X_{i+1} = \cdots = \angle X_{n-1}X_{con}X_n = \tfrac{\pi}{2}.


