
Bulletin of the Iranian Mathematical Society Vol. XX No. X (201X), pp XX-XX.

A LIMITED MEMORY TRUST-REGION METHOD WITH ADAPTIVE

RADIUS FOR LARGE-SCALE UNCONSTRAINED OPTIMIZATION

MASOUD AHOOKHOSH, KEYVAN AMINI∗, MORTEZA KIMIAEI AND M. REZA PEYGHAMI

Communicated by

Abstract. This study concerns a trust-region-based method for solving unconstrained optimization problems. The approach takes advantage of the compact limited memory BFGS updating formula together with an appropriate adaptive radius strategy. In our approach, the adaptive technique decreases the number of subproblem solves, while the structure of the limited memory quasi-Newton formulas helps to handle large-scale problems. Theoretical analysis indicates that the new approach preserves global convergence to first-order stationary points under classical assumptions. Moreover, superlinear and quadratic convergence rates are also established under suitable conditions. Preliminary numerical experiments on standard test problems show the effectiveness of the proposed approach for solving large-scale unconstrained optimization problems.

1. Introduction

Over the past few decades, large-scale unconstrained optimization has received considerable attention owing to its many applications in applied sciences such as biology, physics, geophysics, chemistry, engineering and industry. In general, an unconstrained optimization problem can be formulated as follows:

(1.1) \quad \text{minimize } f(x) \quad \text{subject to } x \in \mathbb{R}^n,

MSC(2010): Primary: 90C30; Secondary: 65K05; Third: 65K10.

Keywords: Unconstrained optimization, Trust-region framework, Compact quasi-Newton representation, Limited memory technique, Adaptive strategy, Convergence theory.

Received: date, Accepted: date.

∗Corresponding author

© 2011 Iranian Mathematical Society.



where f : ℝⁿ → ℝ is assumed to be continuously differentiable.

Motivation & history. There exist many iterative schemes, such as Newton, quasi-Newton, variable metric, gradient and conjugate gradient methods, that have been introduced and developed for solving the unconstrained problem (1.1). In most cases, they are required to exploit one of the general globalization techniques, i.e., line search or trust-region techniques, in order to guarantee global convergence (see [26]).

For a given x_k, a line search technique refers to a procedure that computes a step-size α_k along a specific direction d_k and generates a new point as x_{k+1} = x_k + α_k d_k. Many line search strategies for determining this step-size have been proposed, for instance the exact line search or the Armijo, Wolfe and Goldstein inexact line searches (see [26]). On the other hand, a quadratic model-based trust-region technique computes a trial step d_k by solving the quadratic subproblem

(1.2) \quad \text{minimize } m_k(x_k + d) = f_k + g_k^T d + \tfrac{1}{2} d^T B_k d \quad \text{subject to } d \in \mathbb{R}^n,\ \|d\| \le \Delta_k,

where ‖·‖ denotes the Euclidean norm, f_k = f(x_k), g_k = ∇f(x_k), B_k is the exact Hessian, i.e., G_k = ∇²f(x_k), or a symmetric approximation of it, and ∆_k is the trust-region radius. In order to evaluate the agreement between the model and the objective function, and to accept or reject the trial step d_k, a criterion based on the actual and the model reductions is required. Traditional monotone trust-region methods do this by defining the ratio

(1.3) \quad \rho_k = \frac{f(x_k) - f(x_k + d_k)}{m_k(x_k) - m_k(x_k + d_k)},

where the numerator and the denominator are called the actual and the predicted reductions, respectively. The trial step d_k is accepted whenever ρ_k is greater than a positive constant. More precisely, if ρ_k is greater than a constant µ_2 > 0, then the new point x_{k+1} = x_k + d_k is accepted and the iterate is called very successful. If ρ_k ≥ µ_1 for 0 < µ_1 < µ_2, then the new point is still generated by x_{k+1} = x_k + d_k, the iterate is called successful, and the trust-region radius is updated appropriately based on the value of ρ_k. Otherwise, the trial step is rejected and the quadratic subproblem (1.2) is resolved at the current point with a reduced trust-region radius.

On the basis of the literature, one can see that traditional trust-region methods are very sensitive to the initial trust-region radius ∆_0 and to its updating scheme. This fact has led researchers to look for appropriate procedures for choosing the initial trust-region radius as well as its updating rule. In 1997, Sartenaer [30] proposed an approach to determine the initial trust-region radius by monitoring the agreement between the model and the objective function along the steepest descent direction. A possible drawback of this approach is the dependency of its parameters on problem information. Recently, Gould et al. [18] examined the sensitivity of traditional trust-region methods to


the parameters related to the step acceptance test and the trust-region radius update with extensive numerical experiments. Despite their comprehensive tests on a large number of test problems, they did not claim to find the best parameters for the updating scheme. In 2002, motivated by a problem in the field of neural networks, the first adaptive trust-region radius was proposed by Zhang et al. [35], in which the information of the current iterate is used more effectively to introduce an adaptive scheme. They introduced the following adaptive trust-region radius

(1.4) \quad \Delta_k = c^{p_k} \|g_k\|\, \|\hat{B}_k^{-1}\|,

where c ∈ (0, 1) is a constant, p_k ∈ ℕ ∪ {0}, and the matrix \hat{B}_k = B_k + E_k is a safely positive definite matrix based on the Schnabel and Eskow modified Cholesky factorization, see [31]. As numerical results show, their method works very well on small-scale unconstrained optimization problems, but the situation changes dramatically for large-scale and even medium-scale problems because of the computation of the inverse matrix \hat{B}_k^{-1}. Subsequently, Shi and Guo [33] proposed another interesting adaptive radius,

(1.5) \quad \Delta_k = -c^{p_k} \frac{g_k^T q_k}{q_k^T \hat{B}_k q_k}\, \|q_k\|,

where c ∈ (0, 1) is a constant, p_k ∈ ℕ ∪ {0}, and \hat{B}_k is generated by the procedure

\hat{B}_k = B_k + iI,

where i is the smallest nonnegative integer such that q_k^T \hat{B}_k q_k > 0, and q_k satisfies the well-known angle condition

(1.6) \quad \frac{-g_k^T q_k}{\|g_k\|\, \|q_k\|} \ge \tau,

in which τ ∈ (0, 1] is a constant. An important advantage of this method is the freedom to select an appropriate q_k in order to obtain a more robust method. For instance, Shi and Guo proposed q_k = -g_k and q_k = -\hat{B}_k^{-1} g_k. Preliminary numerical results and theoretical analysis showed that their method is promising for solving medium-scale problems without any need of searching for an appropriate initial trust-region radius. For more references on adaptive trust-region radii, see also [1, 2, 3, 4, 6].

Although the adaptive radius of Shi and Guo has some advantages, such as decreasing the total computational cost by reducing the number of subproblem solves and determining a good initial radius, it suffers from some drawbacks as well. First of all, it can easily be seen that q_k = -g_k does not generate an appropriate radius (see [33]), and the computation of q_k = -\hat{B}_k^{-1} g_k requires the computation of the inverse matrix \hat{B}_k^{-1} or the solution of a linear system of equations, which makes it inappropriate for large-scale problems. Secondly, the process of generating \hat{B}_k guarantees that the denominator of (1.5) is bounded away from zero; however, if the numerator -g_k^T q_k is close to zero, the trust-region radius becomes tiny, which possibly increases the total number of iterates. Thirdly, numerical experiments have shown that when the ratio ρ_k is very close to 1, i.e., for very successful iterates, the method does not necessarily enlarge the trust-region radius sufficiently. In addition, the procedure of constructing \hat{B}_k is rather unusual and sometimes costly. Finally, the necessity of storing B_k for computing the term q_k^T \hat{B}_k q_k in (1.5) and the term d_k^T B_k d_k in the predicted reduction may make the method unsuitable for large-scale problems.

It is known that limited memory quasi-Newton methods are customized versions of quasi-Newton methods for large-scale optimization problems. Their implementations are almost identical to those of the quasi-Newton methods; however, the Hessian and inverse Hessian approximations are not formed explicitly. Instead, they are defined based on the information of a small number of previous iterates in order to reduce the required memory. As another advantage, some limited memory quasi-Newton formulas, such as the compact limited memory BFGS updating formula (see [10]), can preserve positive definiteness under some cheap conditions. Due to these remarkable advantages, limited memory quasi-Newton formulas are widely utilized for large-scale optimization problems, see [8, 10, 19, 20, 21, 24, 25].

Content. In this paper, we propose an improved version of the adaptive trust-region radius (1.5) to attain better performance when the number of variables of the underlying function is large. More specifically, we first replace (1.5) by a modified formula to overcome the above-mentioned disadvantages. We then take advantage of the compact limited memory BFGS formula in order to calculate the term d_k^T B_k d_k cheaply. Since the method preserves positive definiteness of B_k under mild conditions, it also avoids calculating \hat{B}_k. The analysis of the new approach shows that it inherits both the stability of adaptive trust-region approaches and the effectiveness of the limited memory BFGS. We also investigate the global convergence of the proposed method to first-order stationary points and establish superlinear and quadratic convergence rates. To show the efficiency of the proposed method, some numerical results are reported.

The remainder of this paper is organized as follows. In Section 2, we describe the motivation behind the proposed algorithm and outline the algorithm. Section 3 is devoted to investigating the global, superlinear and quadratic convergence properties of the algorithm. Numerical results are provided in Section 4 to show the promising behaviour of the proposed method for solving large-scale unconstrained optimization problems. Finally, some conclusions are given in Section 5.


2. Algorithmic framework

In this section, we first discuss some alternatives to overcome the disadvantages of the trust-region radius (1.5) mentioned in the previous section. We then construct our trust-region-based algorithm and establish some of its properties.

It can easily be seen that if one exploits a positive definite quasi-Newton formula for B_k, then there is no need to define \hat{B}_k in (1.5), due to the fact that q_k^T B_k q_k > 0 for every arbitrary non-zero vector q_k. Therefore, the method is exempt from generating \hat{B}_k. As mentioned before, q_k = -g_k does not generate an appropriate trust-region radius, and the calculation of q_k = -B_k^{-1} g_k imposes a remarkable computational cost on the method. Thus, a new q_k satisfying (1.6) with less computational cost is needed. For this purpose, the following q_k is employed throughout the paper:

(2.1) \quad q_k = -H_k g_k,

which clearly has less computational cost in comparison with q_k = -B_k^{-1} g_k, since it partially avoids the costly calculation of the inverse matrix B_k^{-1}. In order to show that q_k = -H_k g_k satisfies (1.6), it is sufficient that the spectral condition number κ_k of H_k, i.e., the ratio λ_1/λ_n of the largest to the smallest eigenvalue of H_k, is bounded above independently of k. Let θ_k denote the angle between q_k = -H_k g_k and -g_k; then the facts that ‖H_k g_k‖ ≤ λ_1 ‖g_k‖ and g_k^T H_k g_k ≥ λ_n ‖g_k‖² imply

\sin\left(\frac{\pi}{2} - \theta_k\right) = \cos(\theta_k) = \frac{g_k^T H_k g_k}{\|g_k\|\, \|H_k g_k\|} \ge \frac{\lambda_n \|g_k\|^2}{\lambda_1 \|g_k\|^2} = \kappa_k^{-1}.

From the inequality sin(x) ≤ x, it can be concluded that

\theta_k \le \frac{\pi}{2} - \kappa_k^{-1}.

Therefore, if the spectral condition number κ_k is bounded above for all k ∈ ℕ, then θ_k is bounded away from π/2, i.e., the angle condition (1.6) is satisfied; see for example [16].
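To illustrate this bound, the following minimal NumPy sketch (ours, not part of the paper's MATLAB implementation) verifies numerically that q_k = -H_k g_k satisfies the angle condition (1.6) with τ = 1/κ_k for a random symmetric positive definite H_k:

```python
import numpy as np

# Check cos(theta_k) >= 1/kappa(H_k) for q_k = -H_k g_k, per the bound above.
rng = np.random.default_rng(0)
n = 50
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)          # random SPD matrix playing the role of H_k
g = rng.standard_normal(n)
q = -H @ g
cos_theta = (-g @ q) / (np.linalg.norm(g) * np.linalg.norm(q))
kappa = np.linalg.cond(H)            # spectral condition number lambda_1/lambda_n
assert cos_theta >= 1.0 / kappa      # the angle condition (1.6) with tau = 1/kappa
```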

It is known that the compact limited memory BFGS formula can be used to decrease the computational costs of calculating q_k^T B_k q_k, d_k^T B_k d_k and H_k g_k (see [10, 20]). Besides, it remains positive definite if the curvature condition y_k^T s_k > 0 holds for all previous iterates, where s_k = x_{k+1} - x_k and y_k = g_{k+1} - g_k. We also increase the trust-region radius more than the rule (1.5) does when the procedure encounters very successful iterates. These changes are expected to lead to an approach needing fewer iterates and function evaluations.
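As a concrete illustration of this bookkeeping, the following sketch (function name and tolerance are our assumptions, not the paper's code) maintains at most m_1 recent pairs and skips updates that violate the curvature condition:

```python
import numpy as np

def update_pairs(S, Y, s, y, m1=5, eps=1e-12):
    """Keep at most m1 recent pairs (s, y) as columns of S and Y, rejecting a
    pair whose curvature y^T s is not safely positive; this is what keeps the
    compact BFGS matrix B_k positive definite."""
    if y @ s <= eps * np.linalg.norm(s) * np.linalg.norm(y):
        return S, Y                       # curvature condition fails: skip
    S = np.column_stack([S, s])[:, -m1:]  # retain the m = min(k, m1) newest
    Y = np.column_stack([Y, y])[:, -m1:]
    return S, Y
```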

It is clear that traditional trust-region methods and the radius defined in (1.5) require storing the n × n matrix B_k, which needs a large amount of memory, especially for large-scale problems. Hence it is worthwhile to explore a way to avoid storing this matrix. The pioneering work on such methods was proposed by Nocedal in [25] and is called the limited memory quasi-Newton method. On the basis of their interesting features, limited memory quasi-Newton methods have been widely used in various fields of optimization, see [8, 10, 19, 20, 21, 24, 25] and references therein.

Here, we briefly describe the compact limited memory BFGS formula proposed by Byrd et al. in [10]. For a positive integer constant m_1, let the matrices S_k and Y_k be defined as follows:

(2.2) \quad S_k = [s_{k-m}, \dots, s_{k-1}], \qquad Y_k = [y_{k-m}, \dots, y_{k-1}],

where m = min{k, m_1}. Suppose that D_k is the m × m diagonal matrix

(2.3) \quad D_k = \mathrm{diag}\left[s_{k-m}^T y_{k-m}, \dots, s_{k-1}^T y_{k-1}\right],

and let L_k be the m × m lower triangular matrix

(2.4) \quad (L_k)_{i,j} = \begin{cases} s_{k-m+i-1}^T y_{k-m+j-1} & \text{if } i > j, \\ 0 & \text{otherwise.} \end{cases}

Then, using the relations (2.2)–(2.4), the compact representation of the BFGS formula proposed by Byrd et al. in [10] can be expressed as follows:

(2.5) \quad B_k = B_k^{(0)} - \begin{bmatrix} Y_k & B_k^{(0)} S_k \end{bmatrix} \begin{bmatrix} -D_k & L_k^T \\ L_k & S_k^T B_k^{(0)} S_k \end{bmatrix}^{-1} \begin{bmatrix} Y_k^T \\ S_k^T B_k^{(0)} \end{bmatrix},

where the basic matrix B_k^{(0)} is defined as B_k^{(0)} = σ_k I, for some positive scalar σ_k. Defining A_k = [Y_k, S_k] and following [20], we can easily write

B_k = \sigma_k I - A_k \begin{bmatrix} I & 0 \\ 0 & \sigma_k I \end{bmatrix} \begin{bmatrix} -D_k & L_k^T \\ L_k & \sigma_k S_k^T S_k \end{bmatrix}^{-1} \begin{bmatrix} I & 0 \\ 0 & \sigma_k I \end{bmatrix} A_k^T.

Consequently, using the Cholesky factorization, this formula can be stated as

(2.6) \quad B_k = \sigma_k I - A_k \begin{bmatrix} I & 0 \\ 0 & \sigma_k I \end{bmatrix} \begin{bmatrix} -D_k^{1/2} & D_k^{-1/2} L_k^T \\ 0 & J_k^T \end{bmatrix}^{-1} \begin{bmatrix} D_k^{1/2} & 0 \\ -L_k D_k^{-1/2} & J_k \end{bmatrix}^{-1} \begin{bmatrix} I & 0 \\ 0 & \sigma_k I \end{bmatrix} A_k^T,

where J_k is a lower triangular matrix satisfying J_k J_k^T = σ_k S_k^T S_k + L_k D_k^{-1} L_k^T. Notice that the matrix D_k is a positive definite diagonal matrix, so D_k^{1/2} exists and is a positive definite diagonal matrix as well. Moreover, from the positive definiteness of D_k, one can easily conclude that σ_k S_k^T S_k + L_k D_k^{-1} L_k^T is also positive definite, and therefore the matrix J_k evidently exists. Under the curvature assumption s_k^T y_k > 0 for each k, it is not difficult to show that the matrix B_k generated by the formula (2.6) is positive definite. In the rest of the paper, we use (2.6) to approximate the exact Hessian G_k = ∇²f(x_k); however, we never form it explicitly.
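As a sanity check (ours, under the assumption B_k^{(0)} = σI), the following sketch verifies numerically that the compact representation (2.5) reproduces the ordinary recursive BFGS update:

```python
import numpy as np

# Verify that the compact form (2.5) equals m recursive BFGS updates of
# B0 = sigma * I with pairs (s_i, y_i) satisfying the curvature condition.
rng = np.random.default_rng(1)
n, m, sigma = 8, 3, 2.0
S = rng.standard_normal((n, m))
Y = S + 0.1 * rng.standard_normal((n, m))   # keeps s_i^T y_i > 0 here

B = sigma * np.eye(n)                        # recursive BFGS
for i in range(m):
    s, y = S[:, i], Y[:, i]
    Bs = B @ s
    B += np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)

SY = S.T @ Y
middle = np.block([[-np.diag(np.diag(SY)), np.tril(SY, -1).T],
                   [np.tril(SY, -1), sigma * (S.T @ S)]])
W = np.hstack([Y, sigma * S])                # [Y_k, B0 S_k]
B_compact = sigma * np.eye(n) - W @ np.linalg.solve(middle, W.T)
assert np.allclose(B, B_compact)
```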

It is obvious that the direct computation of the term H_k g_k in (2.1) is costly, especially when the number of variables is large. Therefore, this term can be calculated using the compact limited memory technique customized for the approximate inverse Hessian H_k. Following [10, 20], the compact limited memory representation of the approximate inverse Hessian H_k can be constructed by

(2.7) \quad H_k = \sigma_k^{-1} I + A_k \begin{bmatrix} 0 & \sigma_k^{-1} I \\ R_k^{-T} & 0 \end{bmatrix} \begin{bmatrix} D_k + \sigma_k^{-1} Y_k^T Y_k & -I \\ -I & 0 \end{bmatrix} \begin{bmatrix} 0 & R_k^{-1} \\ \sigma_k^{-1} I & 0 \end{bmatrix} A_k^T,

where

(R_k)_{i,j} = \begin{cases} s_{k-m+i-1}^T y_{k-m+j-1}, & \text{if } i \le j; \\ 0, & \text{otherwise.} \end{cases}

Since the matrix R_k is an m × m upper triangular matrix, for any arbitrary vector η_k it is easy to compute the vector θ_k = R_k^{-1} η_k by solving the upper triangular system R_k θ_k = η_k. Thanks to the updating formula of H_k in (2.7), for an arbitrary vector v_k ∈ ℝⁿ, we compute the vector H_k v_k by the following scheme:

Scheme 1: Calculation of H_k v_k based on the formula (2.7)

Step 1. Determine ξ = A_k^T v_k and partition it as ξ = (ξ_1, ξ_2) with ξ_1, ξ_2 ∈ ℝᵐ. Then solve R_k w = ξ_2.

Step 2. Set β = (D_k + σ_k^{-1} Y_k^T Y_k) w - σ_k^{-1} ξ_1 and solve R_k^T γ = β.

Step 3. Set H_k v_k = σ_k^{-1} (v_k - Y_k w) + S_k γ.
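For illustration, here is a minimal NumPy sketch of Scheme 1 (ours, not the paper's MATLAB code); it assumes the pairs are stored as the columns of S and Y and that the curvature products are positive:

```python
import numpy as np
from scipy.linalg import solve_triangular

def hk_times_v(v, S, Y, sigma):
    """Compute H_k @ v via Scheme 1 / formula (2.7) without ever forming H_k.

    S, Y  : n-by-m arrays whose columns are the stored pairs s_i, y_i.
    sigma : positive scalar defining B_k^(0) = sigma * I.
    """
    SY = S.T @ Y                         # m-by-m matrix of products s_i^T y_j
    R = np.triu(SY)                      # upper triangular R_k
    D = np.diag(np.diag(SY))             # diagonal D_k
    xi1, xi2 = Y.T @ v, S.T @ v          # xi = A_k^T v, partitioned
    w = solve_triangular(R, xi2, lower=False)        # Step 1: R_k w = xi_2
    beta = (D + (Y.T @ Y) / sigma) @ w - xi1 / sigma
    gamma = solve_triangular(R.T, beta, lower=True)  # Step 2: R_k^T gamma = beta
    return (v - Y @ w) / sigma + S @ gamma           # Step 3
```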

Using the fact that D_k and R_k are respectively diagonal and triangular matrices, we observe that Scheme 1 requires 2mn + m(m+1)/2 multiplications for Step 1, m² + 3m + m(m+1)/2 multiplications for Step 2 and 2mn + n multiplications for Step 3. Thus Scheme 1 requires 4mn + 2m² + 4m + n multiplications in total for computing the term H_k v_k.

As mentioned in the description of Shi and Guo's algorithm in [33], we also need to calculate terms of the form v_k^T B_k v_k, both in the subproblem (1.2) and in the trust-region radius. Direct computation of such terms is expensive for large-scale problems. As discussed in [10, 20], they can be calculated more efficiently by applying the compact version of the limited memory quasi-Newton updates. Based on the formula (2.6), for an arbitrary vector v_k ∈ ℝⁿ, we establish an effective scheme to compute v_k^T B_k v_k as follows:

Scheme 2: Calculation of v_k^T B_k v_k based on the formula (2.6)

Step 1. Compute ξ = A_k^T v_k and partition it as ξ = (ξ_1, ξ_2) with ξ_1, ξ_2 ∈ ℝᵐ.

Step 2. Let ζ = (ξ_1, σ_k ξ_2). Solve the following two triangular systems:

D_k^{1/2} t_1 = \zeta_1 \quad \text{and} \quad J_k t_2 = \zeta_2 + L_k (D_k^{-1/2} t_1).

Step 3. Set v_k^T B_k v_k = σ_k v_k^T v_k + t_1^T t_1 - t_2^T t_2.

In Scheme 2, the vector ξ is available from Step 1 of Scheme 1, so we do not need to recompute it. The lower triangular 2m × 2m linear system in Step 2 needs m² + 3m multiplications. Finally, Step 3 requires only 2(n + m) multiplications. Hence Scheme 2 requires 2n + 2m² + 5m multiplications to compute the scalar v_k^T B_k v_k for an arbitrary vector v_k.
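A matching NumPy sketch of Scheme 2 (again ours; for clarity it recomputes quantities that a real implementation would cache):

```python
import numpy as np
from scipy.linalg import solve_triangular, cholesky

def v_bk_v(v, S, Y, sigma):
    """Compute the scalar v^T B_k v via Scheme 2 / formula (2.6)."""
    SY = S.T @ Y
    L = np.tril(SY, -1)                  # strictly lower triangular L_k
    d = np.diag(SY)                      # diagonal of D_k (curvature products)
    # J_k is the lower Cholesky factor: J_k J_k^T = sigma*S^T S + L D^{-1} L^T
    J = cholesky(sigma * (S.T @ S) + L @ np.diag(1.0 / d) @ L.T, lower=True)
    zeta1, zeta2 = Y.T @ v, sigma * (S.T @ v)        # zeta = (xi_1, sigma*xi_2)
    t1 = zeta1 / np.sqrt(d)                          # D^{1/2} t_1 = zeta_1
    t2 = solve_triangular(J, zeta2 + L @ (t1 / np.sqrt(d)), lower=True)
    return sigma * (v @ v) + t1 @ t1 - t2 @ t2       # Step 3
```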

We here describe the k-th step of our novel scheme. Let us define

\beta_k = -\frac{g_k^T q_k}{q_k^T B_k q_k}\, \|q_k\|,

where q_k is an arbitrary vector satisfying (1.6), for example q_k = -g_k or q_k = -H_k g_k, and B_k is defined by the compact limited memory BFGS formula (2.6). Note that the formula (2.6) leads to a positive definite matrix B_k, and consequently q_k^T B_k q_k > 0 for any arbitrary non-zero vector q_k. We now define s_k based on the definition of β_k as follows:

(2.8) \quad s_k := \begin{cases} \|g_0\| & \text{if } k = 0; \\ \beta_k & \text{if } k \ge 1,\ \mu_1 \le \rho_{k-1} < \mu_2; \\ c_1 \beta_k & \text{if } k \ge 1,\ \rho_{k-1} \ge \mu_2, \end{cases}

where c_1 > 1 and 0 < µ_1 ≤ µ_2 ≤ 1. We then compute the adaptive trust-region radius ∆_k by

(2.9) \quad \Delta_k = c^{p_k} s_k,

in which c ∈ (0, 1) and p_k is the smallest integer in ℕ ∪ {0} guaranteeing ρ_k ≥ µ_1. In view of this discussion, the new limited memory trust-region algorithm with adaptive radius is outlined in the following:

Algorithm 1: LMATR (limited memory trust-region algorithm with adaptive radius)

Input: x_0 ∈ ℝⁿ, B_0 ∈ ℝ^{n×n}, k_max, 0 < µ_1 ≤ µ_2 ≤ 1, 0 < c < 1, c_1 > 1, ε > 0;
Output: x_b; f_b;

1:  begin
2:    ∆_0 ← ‖g_0‖; k ← 0; p ← 0;
3:    while ‖g_k‖ ≥ ε and k ≤ k_max do
4:      compute q_k by (2.1) using Scheme 1;
5:      compute q_k^T B_k q_k using Scheme 2;
6:      compute s_k using (2.8);
7:      solve the subproblem (1.2) to specify d_k;
8:      x_{k+1} ← x_k + d_k; compute f(x_{k+1});
9:      compute d_k^T B_k d_k using Scheme 2;
10:     determine ρ_k using (1.3);
11:     while ρ_k < µ_1 do
12:       p ← p + 1;
13:       solve the subproblem (1.2) with ∆_k = c^p s_k to specify d_k;
14:       x_{k+1} ← x_k + d_k; compute f_{k+1} = f(x_{k+1});
15:       determine ρ_k using (1.3);
16:     end
17:     accept x_{k+1} and f_{k+1}; m ← min{k, m_1};
18:     update S_k, Y_k, D_k, L_k, Y_k^T Y_k and R_k;
19:     k ← k + 1;
20:   end
21:   x_b ← x_{k+1}; f_b ← f_{k+1};
22: end

The loop that starts at Line 3 and ends at Line 20 is called the outer loop, and the loop that starts at Line 11 and ends at Line 16 is called the inner loop.
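The following compact sketch (ours; the callables and names are assumptions, not the paper's interface) shows how (2.8)–(2.9) drive one outer iteration, with the inner loop shrinking the radius until the ratio test succeeds:

```python
def lmatr_step(x, f, solve_subproblem, model_decrease, s_k, c=0.2, mu1=0.05):
    """One outer iteration of Algorithm 1 (LMATR), sketched.

    solve_subproblem(x, delta) is assumed to return a trial step d for (1.2);
    model_decrease(x, d) returns m_k(x_k) - m_k(x_k + d) > 0.
    """
    p = 0
    while True:
        d = solve_subproblem(x, (c ** p) * s_k)          # radius (2.9)
        rho = (f(x) - f(x + d)) / model_decrease(x, d)   # ratio (1.3)
        if rho >= mu1:                                   # successful iterate
            return x + d, rho
        p += 1                                           # inner loop: shrink
```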

3. Convergence analysis

This section is devoted to analyzing the global convergence of the proposed algorithm. We first give some properties of the algorithm and then investigate its global convergence to first-order stationary points. The local superlinear and quadratic convergence rates of the proposed algorithm are established in the sequel.


To establish the global convergence property, we assume that the decrease on the model m_k is at least as much as a fraction of that obtained by the Cauchy point, i.e., there exists a constant β ∈ (0, 1) such that, for all k,

(3.1) \quad m_k(x_k) - m_k(x_k + d_k) \ge \beta \|g_k\| \min\left[\Delta_k, \frac{\|g_k\|}{\|B_k\|}\right].

This inequality is called the sufficient reduction condition and has been investigated by many authors in the context of inexact methods for approximately solving the subproblem (1.2); for example see [12, 26, 27]. Formula (3.1) implies that d_k ≠ 0 whenever g_k ≠ 0. Furthermore, throughout the paper, we consider the following two assumptions in order to analyze the convergence properties of Algorithm 1:

(H1) The objective function f(x) is twice continuously differentiable and has a lower bound on the level set L(x_0) = {x ∈ ℝⁿ | f(x) ≤ f(x_0)}, x_0 ∈ ℝⁿ.

(H2) The approximate Hessian matrix B_k is uniformly bounded, i.e., there exists a constant M > 0 such that ‖B_k‖ ≤ M for all k ∈ ℕ ∪ {0}.

Remark 3.1. Suppose that the objective function f(x) is twice continuously differentiable and the level set L(x_0) is bounded. Then (H1) implies that ‖∇²f(x)‖ is uniformly continuous and bounded above on an open bounded convex set Ω containing L(x_0). As a result, there exists a constant L > 0 such that ‖∇²f(x)‖ ≤ L for all x ∈ Ω. Therefore, using the mean value theorem, it can be concluded that, for all x, y ∈ Ω,

‖g(x) - g(y)‖ ≤ L ‖x - y‖,

which means that the gradient of the objective function is Lipschitz continuous on the open bounded convex set Ω.

In the next two lemmas, it is proved that the inner loop of Algorithm 1 stops after a finite number of steps.

Lemma 3.2. Suppose that (H2) holds and the sequence {x_k} is generated by Algorithm 1. Then, we have

|f(x_k + d_k) - m_k(x_k + d_k)| \le O(\|d_k\|^2).

Proof. Taylor's expansion along with Remark 3.1 and the definition of m_k imply that

|f(x_k + d_k) - m_k(x_k + d_k)| \le \left| -d_k^T G_k d_k + d_k^T B_k d_k \right| + O(\|d_k\|^2) = \left| d_k^T (B_k - G_k) d_k \right| + O(\|d_k\|^2) \le (L + M) \|d_k\|^2 + O(\|d_k\|^2) = O(\|d_k\|^2),

giving the result.

Lemma 3.3. Suppose that (H2) holds and the sequence {x_k} is generated by Algorithm 1. Then the inner loop of Algorithm 1 terminates after a finite number of steps.

Proof. By contradiction, assume that the inner loop of Algorithm 1 cycles infinitely at some iteration k. Then, letting p denote the inner counter, we have

(3.2) \quad \Delta_k^p \to 0, \quad \text{as } p \to \infty.

Since the current iterate x_k is not a stationary point of problem (1.1), there exists a positive constant ε such that ‖g_k‖ ≥ ε. This fact, together with (3.1) and (H2), yields

(3.3) \quad m_k(x_k) - m_k(x_k + d_k^p) \ge \beta \varepsilon \min\left[\Delta_k^p, \frac{\varepsilon}{M}\right],

where d_k^p is the solution of the subproblem (1.2) corresponding to p ∈ ℕ ∪ {0}. Now, by Lemma 3.2, (3.2) and (3.3), we obtain

\left| \frac{f(x_k) - f(x_k + d_k^p)}{m_k(x_k) - m_k(x_k + d_k^p)} - 1 \right| = \frac{\left| f(x_k + d_k^p) - m_k(x_k + d_k^p) \right|}{m_k(x_k) - m_k(x_k + d_k^p)} \le \frac{O(\|d_k^p\|^2)}{\beta \varepsilon \min\left[\Delta_k^p, \varepsilon / M\right]} \le \frac{O((\Delta_k^p)^2)}{\beta \varepsilon\, \Delta_k^p} \to 0, \quad (p \to \infty).

Therefore, there exists a sufficiently large constant p_1 such that, for all p ≥ p_1, the inequality

\rho_k = \frac{f(x_k) - f(x_k + d_k^p)}{m_k(x_k) - m_k(x_k + d_k^p)} \ge \mu_1

holds. This indicates that there exists a finite nonnegative integer p_k with ρ_k ≥ µ_1, which contradicts the assumption of an infinite inner cycle. Therefore, the inner loop of Algorithm 1 terminates after a finite number of steps.

In order to establish global convergence to first-order critical points, we require the following condition to hold:

(3.4) \quad m_k(x_k) - m_k(x_k + d_k) \ge \frac{c^{p_k}}{2M} \left[\frac{g_k^T q_k}{\|q_k\|}\right]^2, \quad \text{for all } k \in \mathbb{N} \cup \{0\}.

Hence the subsequent lemma plays an important role in proving the global convergence of the sequence {x_k} generated by Algorithm 1.

Lemma 3.4. Suppose that (H2) holds, the sequence {x_k} is generated by Algorithm 1 and d_k is a solution of the subproblem (1.2). Then (3.4) holds.


Proof. By setting

\bar{d}_k = -c^{p_k} \frac{g_k^T q_k}{q_k^T B_k q_k}\, q_k,

it is clear that ‖\bar{d}_k‖ ≤ ∆_k, i.e., \bar{d}_k is a feasible point of the subproblem (1.2). On the other hand, from (1.6), we have g_k^T q_k < 0. Consequently, using these facts and the assumption (H2), we obtain

m_k(x_k) - m_k(x_k + d_k) \ge m_k(x_k) - m_k(x_k + \bar{d}_k) = -g_k^T \bar{d}_k - \frac{1}{2} \bar{d}_k^T B_k \bar{d}_k = c^{p_k} \frac{(g_k^T q_k)^2}{q_k^T B_k q_k} \left[1 - \frac{1}{2} c^{p_k}\right] \ge c^{p_k} \frac{(g_k^T q_k)^2}{q_k^T B_k q_k} \left[1 - \frac{1}{2}\right] = \frac{1}{2}\, c^{p_k} \frac{(g_k^T q_k)^2}{q_k^T B_k q_k} \ge \frac{c^{p_k}}{2M} \left[\frac{g_k^T q_k}{\|q_k\|}\right]^2,

completing the proof.

We here establish the global convergence to first-order stationary points of Algorithm 1 under the mentioned assumptions.

Theorem 3.5. Suppose that (H1) and (H2) hold and q_k satisfies (1.6). Then Algorithm 1 either stops at a stationary point of (1.1) or generates an infinite sequence {x_k} such that

(3.5) \quad \lim_{k \to \infty} \|g_k\| = 0.

Proof. If Algorithm 1 stops at a stationary point of the problem (1.1), there is nothing to prove. Otherwise, we first show that

(3.6) \quad \lim_{k \to \infty} \frac{-g_k^T q_k}{\|q_k\|} = 0.

By contradiction, we suppose that there exist a constant ε_0 > 0 and an infinite subset K ⊆ ℕ ∪ {0} such that

(3.7) \quad \frac{-g_k^T q_k}{\|q_k\|} \ge \varepsilon_0, \quad \forall k \in K.


Lemma 3.4 and (3.7) imply that, for all k ∈ K,

f_k - f(x_k + d_k) \ge \mu_1 \left(m_k(x_k) - m_k(x_k + d_k)\right) \ge \frac{\mu_1}{2M}\, c^{p_k} \left[\frac{g_k^T q_k}{\|q_k\|}\right]^2 \ge \frac{1}{2M}\, \mu_1\, c^{p_k} \varepsilon_0^2.

Summing this inequality over k ∈ K and using the assumption (H1), we get

\frac{1}{2M}\, \mu_1 \varepsilon_0^2 \sum_{k \in K} c^{p_k} \le \sum_{k=0}^{\infty} (f_k - f_{k+1}) = f_0 - \lim_{k \to \infty} f(x_k) < \infty.

This fact leads to

\lim_{k \in K,\, k \to \infty} c^{p_k} = 0,

which clearly implies that p_k → ∞ as k → ∞ in K, a contradiction with Lemma 3.3. Therefore, (3.6) holds.

Using (3.6) and the fact that the vector q_k satisfies (1.6), we obtain

0 \le \tau \|g_k\| \le \frac{-g_k^T q_k}{\|g_k\|\, \|q_k\|}\, \|g_k\| = \frac{-g_k^T q_k}{\|q_k\|} \to 0, \quad (k \to \infty).

Therefore, (3.5) holds and the proof is completed.

Theorem 3.5 is the key step in the convergence analysis of Algorithm 1. It implies that, if the sequence {x_k} generated by Algorithm 1 has limit points, then all of its limit points satisfy the first-order necessary condition.

In what follows, we verify the superlinear and quadratic convergence rates of Algorithm 1 under some classical conditions that have been widely used in the nonlinear optimization literature, see [26].

Theorem 3.6. Suppose that assumptions (H1) and (H2) hold, the sequence {x_k} generated by Algorithm 1 converges to x^*, d_k = -B_k^{-1} g_k, the matrix G(x) = ∇²f(x) is continuous in a neighborhood N(x^*, ε) of x^*, and B_k satisfies the condition

\lim_{k \to \infty} \frac{\|[B_k - G(x^*)]\, d_k\|}{\|d_k\|} = 0.

Then the sequence {x_k} converges to x^* superlinearly. Moreover, if B_k = G(x_k) and G(x) is Lipschitz continuous in a neighborhood N(x^*, ε), then the sequence {x_k} converges to x^* quadratically.

Proof. The proof is similar to those of Theorems 4.3 and 4.4 in [33], and therefore the details are omitted.


4. Preliminary numerical experiments

In this section, we report some numerical results of LMATR on a large set of standard test problems taken from [5, 23], in which the problem dimensions vary from 500 to 10000. To show the efficiency of LMATR, we compare its results with those obtained by the limited memory quasi-Newton algorithm (LMQNM) proposed on page 48 of [26], the limited memory version of the traditional trust-region algorithm (LMTTR), and the limited memory version of the adaptive trust-region algorithm of Shi and Guo [33] (LMTRS) with q_k = -H_k g_k.

All of these algorithms take advantage of the compact limited memory BFGS with m_1 = 5, rewritten in MATLAB from the L-BFGS-B Fortran code [37], which is based on [9, 22, 36]. Furthermore, all the algorithms are coded in double precision in MATLAB 7.4, and the trust-region subproblems are solved by a modified Steihaug-Toint procedure (see [12]) that employs the compact limited memory BFGS. Similar to Bastin et al. in [7], the Steihaug-Toint algorithm terminates at x_k + d when

\|\nabla f(x_k) + B_k d\| \le \min\left[\frac{1}{10},\, \|g_k\|^{1/2}\right] \|g_k\| \quad \text{or} \quad \|d\| = \Delta_k

holds. Moreover, all the algorithms are stopped whenever the total number of iterates exceeds 20000 or the condition

\|g_k\| \le 10^{-5}

is satisfied. We set µ_1 = 0.05 and µ_2 = 0.9 for the algorithms LMATR, LMTTR and LMTRS. Besides, we exploit c = 0.2 and c_1 = 1.55 for LMATR, and the parameters of LMTRS are chosen the same as those reported in [33]. Similar to [12], LMTTR employs ∆_0 = \frac{1}{10}\|g_0\| and updates the trust-region radius by

\Delta_{k+1} = \begin{cases} \alpha_1 \|d_k\| & \text{if } \rho_k < \mu_1; \\ \Delta_k & \text{if } \mu_1 \le \rho_k < \mu_2; \\ \max\{\alpha_2 \|d_k\|, \Delta_k\} & \text{if } \rho_k \ge \mu_2, \end{cases}

where α_1 = 0.25 and α_2 = 3.5. During the run of the algorithms, we make sure that all of the codes converge to the same point; therefore, we only report those results in which all the algorithms converge to an identical point. The results are summarized in Table 1.
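For concreteness, a minimal sketch of such a truncated CG solver with this stopping rule is given below (ours, under the assumption that the product B_k v is available, e.g. from the compact form (2.6); the actual implementation is the modified procedure of [12]):

```python
import numpy as np

def steihaug_toint(g, Bv, delta, max_iter=250):
    """Truncated CG (Steihaug-Toint) sketch for the subproblem (1.2).

    g : gradient g_k; Bv : callable v -> B_k v; delta : trust-region radius.
    Stops when ||g + B d|| <= min(1/10, ||g||^(1/2)) * ||g|| or at the boundary.
    """
    d = np.zeros_like(g)
    r, p = g.copy(), -g.copy()            # residual r = g + B d
    tol = min(0.1, np.sqrt(np.linalg.norm(g))) * np.linalg.norm(g)

    def to_boundary(d, p):                # largest tau >= 0: ||d + tau p|| = delta
        dp, pp = d @ p, p @ p
        return (-dp + np.sqrt(dp * dp + pp * (delta ** 2 - d @ d))) / pp

    for _ in range(max_iter):
        Bp = Bv(p)
        pBp = p @ Bp
        if pBp <= 0:                      # negative curvature: cut at boundary
            return d + to_boundary(d, p) * p
        alpha = (r @ r) / pBp
        if np.linalg.norm(d + alpha * p) >= delta:
            return d + to_boundary(d, p) * p
        d = d + alpha * p
        r_new = r + alpha * Bp
        if np.linalg.norm(r_new) <= tol:  # the inexactness test above
            return d
        p = -r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return d
```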

Thanks to the structure of Algorithm 1 (LMATR) and of the other presented algorithms, the total number of iterates, N_i, is identical to the total number of gradient evaluations, N_g. Therefore, in Table 1, we only report the number of iterates N_i, the number of function evaluations N_f and the running time T as efficiency measures. From Table 1, it can be seen that, in most cases, both the number of iterates and the number of function evaluations of Algorithm 1 are remarkably smaller than those of the other algorithms. Although the proposed algorithm is not the best on every problem, it has better overall computational performance. We also take advantage of the performance profile of Dolan and Moré [13], a statistical tool for comparing the efficiency of algorithms, and illustrate the results of Table 1 in Figures 1–4 according to the total number of iterates, the total number of function evaluations, the combined measure N_f + 3N_i and the running time, respectively.

From Figure 1, it is observed that LMATR attains the most wins among the considered algorithms. More precisely, it solves about 51% of the test functions more efficiently and faster than the others. We also see that LMTRS performs better than LMQNM and LMTTR with regard to the total number of iterates. Moreover, considering the ability to complete the run successfully, LMATR is the best algorithm among the others, as its profile grows faster than those of the other algorithms; this means that whenever the proposed algorithm is not the best, it performs close to the best. Figure 2 shows that LMATR, LMQNM and LMTTR are quite competitive with regard to the total number of function evaluations, and they perform better than LMTRS; furthermore, LMATR has the most wins, in about 40% of the test functions. Figure 3 uses the efficiency measure N_f + 3N_i and shows that LMATR outperforms the others. The last comparison, in Figure 4, concerns the running time, where LMQNM has the better performance. Our preliminary computational experiments show that LMATR is promising for solving large-scale unconstrained optimization problems.
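For reference, a performance profile can be computed from the data of Table 1 with a few lines of NumPy (a sketch under our naming, following [13]):

```python
import numpy as np

def performance_profile(T):
    """Dolan-More performance profile from a cost matrix T (one row per
    problem, one column per solver), e.g. T[p, s] = N_i of solver s on
    problem p, with np.inf marking a failed run."""
    ratios = T / T.min(axis=1, keepdims=True)            # r_{p,s}
    taus = np.unique(ratios[np.isfinite(ratios)])
    # P(r_{p,s} <= tau) for each solver s at every breakpoint tau
    profile = np.array([[np.mean(ratios[:, s] <= t) for t in taus]
                        for s in range(T.shape[1])])
    return taus, profile
```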

[Figure 1. Performance profile for the number of iterates. Panels: (a) τ = 7, (b) τ = 2; axes: τ versus P(r_{p,s} ≤ τ : 1 ≤ s ≤ n_s); curves: LMQNM, LMTTR, LMTRS, LMATR.]

[Figure 2. Performance profile for the number of function evaluations. Panels: (a) τ = 7, (b) τ = 2.]

[Figure 3. Performance profile for N_f + 3N_i. Panels: (a) τ = 7, (b) τ = 2.]

[Figure 4. Performance profile for the running time. Panels: (a) τ = 10, (b) τ = 5.]

Table 1. Numerical results (entries are N_i/N_f/T)

Problem name        | Dim   | LMQNM             | LMTTR             | LMTRS             | LMATR
--------------------|-------|-------------------|-------------------|-------------------|------------------
POWER               | 500   | 4273/4397/6.28    | 6515/6860/23.89   | 6707/8061/31.31   | 6164/6543/26.52
Hager               | 500   | 35/40/0.61        | 36/41/0.12        | 38/49/0.22        | 38/49/0.16
Raydan 1            | 500   | 165/176/0.99      | 146/155/0.52      | 144/178/1.00      | 143/157/1.29
G. W. & Holst       | 500   | 6423/9262/20.60   | 6390/8445/26.18   | 5764/15836/49.57  | 5577/9099/48.69
BIGGSB1             | 500   | 1409/1449/4.02    | 1368/1447/5.10    | 1365/1668/8.39    | 1521/1598/9.74
G. Rosenbrock       | 1000  | 6892/9686/25.14   | 6248/8041/37.78   | 5983/13247/61.96  | 5735/8362/53.27
E. W. & Holst       | 1000  | 76/111/0.70       | 66/100/0.33       | 46/123/0.74       | 57/94/0.72
Partial p. quad.    | 1000  | 188/217/3.83      | 170/193/2.73      | 169/229/3.68      | 164/187/3.41
DIXON3DQ            | 1000  | 5327/5671/35.12   | 4692/4971/20.73   | 4849/6009/32.72   | 5156/5494/34.16
*FLETCHCR           | 1000  | 5333/6271/19.72   | 5814/6423/34.14   | 7173/9008/61.24   | 4687/5559/39.39
TRIDIA              | 1000  | 871/895/1.31      | 868/924/3.62      | 790/920/5.81      | 631/669/3.55
Quad. QF2           | 1000  | 1322/1681/4.91    | 289/308/1.84      | 278/342/3.25      | 273/289/2.61
VARDIM              | 1000  | 77/112/0.37       | 81/114/0.39       | 84/245/1.03       | 81/111/0.57
ENGVAL1             | 1000  | 25/32/0.08        | 22/27/0.25        | 23/42/0.40        | 20/26/0.21
Diagonal 2          | 1000  | 145/160/1.00      | 150/162/0.84      | 140/167/1.40      | 141/164/1.49
E. tridiag. 2       | 1000  | 27/31/0.45        | 29/32/0.15        | 28/43/0.41        | 27/30/0.42
E. Penalty          | 1000  | 69/71/0.55        | 61/76/0.27        | 62/140/0.61       | 56/71/0.50
ARWHEAD             | 1000  | 14/16/0.12        | 9/15/0.04         | 9/48/0.13         | 10/17/0.06
EG2                 | 1000  | 153/241/0.86      | 26/39/0.39        | 21/101/0.69       | 20/29/0.24
CUBE                | 1000  | 1263/2081/16.76   | 1903/2235/18.17   | 1646/6606/39.67   | 1280/2328/17.45
E. Maratos          | 1000  | 153/232/0.88      | 649/689/2.77      | 112/423/0.92      | 105/217/0.74
E. Powell           | 5000  | 69/77/2.09        | 71/94/4.04        | 49/198/6.13       | 64/112/4.74
E. Wood             | 5000  | 178/236/0.51      | 50/73/2.20        | 34/102/2.89       | 49/77/2.68
P. quad.            | 5000  | 597/623/17.46     | 608/637/38.57     | 575/702/41.55     | 592/631/38.45
P. tridiag. quad.   | 5000  | 557/583/14.63     | 635/676/33.47     | 591/693/39.91     | 555/595/34.29
Broyden tridiag.    | 5000  | 68/87/1.17        | 52/58/2.73        | 51/74/3.67        | 33/40/2.07
Almost p. quad.     | 5000  | 580/597/16.87     | 629/661/37.83     | 599/710/40.29     | 592/623/37.74
NONDQUAR            | 5000  | 1834/2036/16.31   | 2096/2542/82.08   | 2292/3949/143.37  | 2155/2746/112.47
EDENSCH             | 5000  | 26/30/0.16        | 20/25/0.69        | 21/39/1.17        | 21/24/0.71
Quad. QF1           | 5000  | 587/604/15.33     | 607/633/31.43     | 588/696/37.46     | 598/640/37.64
E. Fre. and Roth    | 5000  | 20/23/0.66        | 18/23/0.74        | 17/45/0.72        | 19/26/0.90
E. tridiag. 1       | 5000  | 27/31/0.79        | 28/31/1.12        | 24/44/1.53        | 32/38/1.52
BDEXP               | 5000  | 31/32/0.40        | 36/37/1.00        | 31/32/0.96        | 31/32/0.95
HIMMELBG            | 5000  | 2/3/0.02          | 37/38/1.37        | 2/3/0.07          | 2/3/0.06
QUARTC              | 5000  | 31/32/1.29        | 31/32/1.51        | 28/32/1.75        | 26/28/1.29
LIARWHD             | 5000  | 34/42/0.93        | 39/55/2.14        | 33/96/3.07        | 31/40/1.65
E. PSC1             | 5000  | 21/24/0.43        | 12/14/0.46        | 12/26/0.79        | 14/18/0.60
E. BD1              | 5000  | 14/17/0.40        | 15/19/0.76        | 14/17/0.94        | 14/17/0.71
E. quad. p. QP1     | 5000  | 34/36/1.62        | 28/35/2.29        | 27/61/3.43        | 24/31/2.19
E. DENSCHNB         | 5000  | 10/12/0.27        | 10/11/0.51        | 9/13/0.70         | 9/11/0.55
E. DENSCHNF         | 5000  | 16/18/0.46        | 13/18/0.87        | 14/38/1.90        | 13/22/1.04
SINCOS              | 5000  | 21/24/0.89        | 12/14/0.90        | 12/26/1.25        | 14/18/1.04
COSINE              | 5000  | 17/20/0.16        | 14/17/0.78        | 15/26/1.53        | 15/18/0.75
Raydan 2            | 5000  | 8/9/0.30          | 7/8/0.50          | 8/9/0.68          | 8/9/0.71
Diag. 4             | 5000  | 4/6/0.08          | 6/9/0.41          | 8/23/0.99         | 7/11/0.46
Diag. 7             | 5000  | 11/14/0.44        | 7/8/0.47          | 7/9/0.56          | 7/10/0.53
Diag. 8             | 5000  | 6/8/0.24          | 6/7/0.45          | 7/11/0.57         | 5/7/0.32
NONSCOMP            | 5000  | 52/60/1.94        | 41/50/2.77        | 636/2466/127.13   | 34/40/2.55
DIXMAANE            | 6000  | 332/349/23.57     | 363/386/46.97     | 288/339/44.67     | 336/360/47.96
DIXMAANF            | 6000  | 257/268/16.93     | 252/262/46.42     | 209/241/43.36     | 245/259/35.67
DIXMAANG            | 6000  | 240/247/22.79     | 325/349/47.25     | 225/267/28.19     | 289/307/35.89
DIXMAANI            | 6000  | 2417/2478/150.34  | 2568/2744/240.75  | 811/965/99.72     | 3157/3376/354.40
DIXMAANH            | 9000  | 498/586/82.93     | 249/258/71.60     | 233/278/89.83     | 268/283/82.25
DIXMAANA            | 9000  | 8/11/1.36         | 9/10/2.42         | 9/17/3.36         | 10/13/3.33
DIXMAANB            | 9000  | 7/10/2.19         | 7/8/3.13          | 8/16/6.74         | 9/12/4.71
DIXMAANC            | 9000  | 9/12/4.51         | 10/12/5.64        | 10/20/5.95        | 9/12/5.63
DIXMAAND            | 9000  | 10/14/6.53        | 9/11/6.09         | 11/24/9.81        | 13/17/7.76
DIXMAANJ            | 9000  | 284/298/86.18     | 227/236/103.86    | 447/513/205.29    | 388/406/167.55
DIXMAANK            | 9000  | 376/382/112.50    | 403/414/161.14    | 414/458/188.66    | 349/364/135.36
DIXMAANL            | 9000  | 1656/1714/461.84  | 353/364/178.31    | 330/384/156.68    | 316/326/142.45
DQDRTIC             | 10000 | 10/14/0.91        | 18/30/3.59        | 15/33/5.15        | 14/23/2.90
Diag. 5             | 10000 | 5/6/0.39          | 7/8/1.07          | 5/6/0.80          | 5/6/0.78
NONDIA              | 10000 | 11/13/0.95        | 9/19/2.02         | 11/61/5.37        | 9/19/2.18
E. TET              | 10000 | 10/12/0.46        | 10/11/1.17        | 10/18/1.86        | 9/12/1.32
E. Beale            | 10000 | 18/20/1.07        | 18/19/2.18        | 16/27/3.05        | 19/23/2.66
Full Hessian FH3    | 10000 | 4/6/0.33          | 4/11/1.06         | 4/35/3.36         | 4/11/1.09
G. PSC1             | 10000 | 44/46/0.28        | 38/41/3.29        | 39/56/5.23        | 33/38/3.33
E. Himmelblau       | 10000 | 13/16/0.60        | 11/14/2.36        | 12/24/2.50        | 10/13/1.48
E. quad. e. EP1     | 10000 | 6/8/0.26          | 4/8/0.67          | 4/25/1.96         | 4/9/0.79


5. Concluding remarks

We present an iterative scheme for solving large-scale unconstrained optimization problems based on a trust-region framework equipped with an adaptive radius and the compact limited memory BFGS formula. As is known, using an appropriate adaptive radius can decrease the total number of subproblem solves in trust-region methods. We describe some disadvantages of the adaptive trust-region radius of Shi and Guo [33], especially for solving large problems, and propose some reformulations of this radius to overcome these drawbacks. Moreover, limited memory quasi-Newton schemes have been developed to cope with large-scale optimization problems. We therefore unify these two ideas into a trust-region algorithm that decreases the computational cost compared with the traditional trust-region framework.

From the theoretical point of view, the proposed algorithm inherits the global convergence of traditional trust-region algorithms to first-order stationary points under classical assumptions. The superlinear and quadratic convergence rates are also established. Finally, our preliminary numerical experiments on a set of standard test problems indicate that the proposed algorithm is efficient for solving large-scale unconstrained optimization problems.

Acknowledgement. The authors are grateful for the valuable comments and suggestions of the anonymous referees.

References

[1] M. Ahookhosh, K. Amini, A nonmonotone trust region method with adaptive radius for unconstrained optimization problems, Computers and Mathematics with Applications 60 (2010) 411–422.

[2] M. Ahookhosh, H. Esmaeili, M. Kimiaei, An effective trust-region-based approach for symmetric nonlinear systems, International Journal of Computer Mathematics 90 (2013) 671–690.

[3] K. Amini, M. Ahookhosh, A hybrid of adjustable trust-region and nonmonotone algorithms for unconstrained optimization, Applied Mathematical Modelling 38 (2014) 2601–2612.

[4] K. Amini, M. Ahookhosh, Combination adaptive trust region method by non-monotone strategy for unconstrained nonlinear programming, Asia-Pacific Journal of Operational Research 28(5) (2011) 585–600.

[5] N. Andrei, An unconstrained optimization test functions collection, Advanced Modeling and Optimization 10(1) (2008) 147–161.

[6] D. Ataee Tarzanagh, M.R. Peyghami, H. Mesgarani, A new nonmonotone trust region method for unconstrained optimization equipped by an efficient adaptive radius, Optimization Methods and Software 29(4) (2014) 819–836.

[7] F. Bastin, V. Malmedy, M. Mouffe, Ph.L. Toint, D. Tomanos, A retrospective trust-region method for unconstrained optimization, Mathematical Programming 123(2) (2008) 395–418.

[8] J.V. Burke, A. Wiegmann, L. Xu, Limited memory BFGS updating in a trust-region framework, SIAM Journal on Optimization, submitted, 2008.

[9] R.H. Byrd, P. Lu, J. Nocedal, A limited memory algorithm for bound constrained optimization, SIAM Journal on Scientific and Statistical Computing 16(5) (1995) 1190–1208.


[10] R. Byrd, J. Nocedal, R. Schnabel, Representation of quasi-Newton matrices and their use in limited memory methods, Mathematical Programming 63 (1994) 129–156.

[11] A.R. Conn, N.I.M. Gould, Ph.L. Toint, LANCELOT: A Fortran Package for Large-Scale Nonlinear Optimization (Release A), Springer Series in Computational Mathematics, No. 17, Springer, Berlin Heidelberg New York, 1992.

[12] A.R. Conn, N.I.M. Gould, Ph.L. Toint, Trust-Region Methods, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, 2000.

[13] E. Dolan, J.J. Moré, Benchmarking optimization software with performance profiles, Mathematical Programming 91 (2002) 201–213.

[14] H. Esmaeili, M. Kimiaei, An efficient adaptive trust-region method for systems of nonlinear equations, International Journal of Computer Mathematics (2014), http://dx.doi.org/10.1080/00207160.2014.887701.

[15] H. Esmaeili, M. Kimiaei, A new adaptive trust-region method for system of nonlinear equations, Applied Mathematical Modelling 38(11–12) (2014) 3003–3015.

[16] R. Fletcher, Practical Methods of Optimization, Wiley, New York, 2000.

[17] N. Gould, S. Lucidi, M. Roma, Ph.L. Toint, Solving the trust-region subproblem using the Lanczos method, SIAM Journal on Optimization 9(2) (1999) 504–525.

[18] N.I.M. Gould, D. Orban, A. Sartenaer, Ph.L. Toint, Sensitivity of trust region algorithms to their parameters, 4OR: A Quarterly Journal of Operations Research 3 (2005) 227–241.

[19] D.C. Liu, J. Nocedal, On the limited memory BFGS method for large scale optimization, Mathematical Programming 45 (1989) 503–528.

[20] L. Kaufman, Reduced storage, quasi-Newton trust region approaches to function optimization, SIAM Journal on Optimization 10(1) (1999) 56–69.

[21] J.L. Morales, J. Nocedal, Enriched methods for large-scale unconstrained optimization, Computational Optimization and Applications 21 (2002) 143–154.

[22] J.L. Morales, J. Nocedal, Remark on Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization, to appear in ACM Transactions on Mathematical Software, 2011.

[23] J.J. Moré, B.S. Garbow, K.E. Hillstrom, Testing unconstrained optimization software, ACM Transactions on Mathematical Software 7 (1981) 17–41.

[24] S.G. Nash, J. Nocedal, A numerical study of the limited memory BFGS method and the truncated Newton method for large scale optimization, SIAM Journal on Optimization 1 (1991) 358–372.

[25] J. Nocedal, Updating quasi-Newton matrices with limited storage, Mathematics of Computation 35 (1980) 773–782.

[26] J. Nocedal, S.J. Wright, Numerical Optimization, Springer, New York, 2006.

[27] J. Nocedal, Y. Yuan, Combining trust region and line search techniques, in: Y. Yuan (Ed.), Advances in Nonlinear Programming, Kluwer Academic Publishers, Dordrecht, 1996, pp. 153–175.

[28] M.J.D. Powell, A new algorithm for unconstrained optimization, in: J.B. Rosen, O.L. Mangasarian, K. Ritter (Eds.), Nonlinear Programming, Academic Press, London, 1970, pp. 31–65.

[29] M.J.D. Powell, On the global convergence of trust region algorithms for unconstrained optimization, Mathematical Programming 29 (1984) 297–303.

[30] A. Sartenaer, Automatic determination of an initial trust region in nonlinear programming, SIAM Journal on Scientific Computing 18(6) (1997) 1788–1803.

[31] R.B. Schnabel, E. Eskow, A new modified Cholesky factorization, SIAM Journal on Scientific Computing 11(6) (1990) 1136–1158.


[32] G.A. Schultz, R.B. Schnabel, R.H. Byrd, A family of trust-region-based algorithms for unconstrained minimization with strong global convergence, SIAM Journal on Numerical Analysis 22 (1985) 47–67.

[33] Z.J. Shi, J.H. Guo, A new trust region method with adaptive radius, Computational Optimization and Applications 41 (2008) 225–242.

[34] T. Steihaug, The conjugate gradient method and trust regions in large scale optimization, SIAM Journal on Numerical Analysis 20(3) (1983) 626–637.

[35] X.S. Zhang, J.L. Zhang, L.Z. Liao, An adaptive trust region method and its convergence, Science in China 45 (2002) 620–631.

[36] C. Zhu, R.H. Byrd, J. Nocedal, Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization, ACM Transactions on Mathematical Software 23(4) (1997) 550–560.

[37] http://users.eecs.northwestern.edu/~nocedal/lbfgsb.html

Masoud Ahookhosh
Faculty of Mathematics, University of Vienna, Oskar-Morgenstern-Platz 1, 1090 Vienna, Austria
Email: [email protected]

Keyvan Amini
Department of Mathematics, Razi University, Kermanshah, Iran
Email: [email protected]

Morteza Kimiaei
Department of Mathematics, Asadabad Branch, Islamic Azad University, Asadabad, Iran
Email: [email protected]

M. Reza Peyghami
Department of Mathematics, K.N. Toosi University of Technology, P.O. Box 16315-1618, Tehran, Iran
Email: [email protected]

