A differentiable merit function for the shifted perturbed KKT conditions of the nonlinear semidefinite programming

Yuya Yamakawa and Nobuo Yamashita

July 22, 2013

Abstract. In this paper we consider a primal-dual interior point method for solving nonlinear semidefinite programming problems, which is based on the shifted perturbed KKT conditions. The main task of the interior point method is to get a point approximately satisfying the shifted perturbed KKT conditions. We first propose a differentiable merit function whose stationary points always satisfy the conditions. The function is an extension of that proposed by Forsgren and Gill for the nonlinear programming problem. Then, we develop a Newton type method that finds a stationary point of the merit function. We show the global convergence of the proposed Newton type method under some mild conditions. Finally, we report some numerical results which show that the proposed method is competitive with the existing primal-dual interior point method based on the perturbed KKT conditions.

Key Words. The nonlinear semidefinite programming, a primal-dual interior point method, the shifted perturbed KKT conditions, a merit function, the Newton type method

1 Introduction

In this paper we consider the following nonlinear semidefinite programming (SDP) problem:

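\[
\begin{array}{ll}
\displaystyle \mathop{\mathrm{minimize}}_{x \in \mathbb{R}^n} & f(x), \\
\mathrm{subject\ to} & g(x) = 0, \quad X(x) \succeq 0,
\end{array}
\tag{1.1}
\]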

where $f : \mathbb{R}^n \to \mathbb{R}$, $g : \mathbb{R}^n \to \mathbb{R}^m$ and $X : \mathbb{R}^n \to \mathbb{S}^d$ are twice continuously differentiable functions, and $\mathbb{S}^d$ denotes the set of $d \times d$ real symmetric matrices. Let $\mathbb{S}^d_{++}$ ($\mathbb{S}^d_{+}$) denote the set of $d \times d$ real symmetric positive (semi)definite matrices. For a matrix $M \in \mathbb{S}^d$, $M \succeq 0$ and $M \succ 0$ mean that $M \in \mathbb{S}^d_{+}$ and $M \in \mathbb{S}^d_{++}$, respectively. If the functions $f$, $g$ and $X$ are linear, then the nonlinear SDP (1.1) reduces to the linear SDP.

The nonlinear SDP is a wide class of mathematical programming problems, and has many applications [5, 7, 11, 20, 24]. The linear programming, the second order cone programming, the linear SDP and the nonlinear programming can be cast as the nonlinear SDP. The linear SDP has been studied extensively by many researchers [1, 3, 6, 21, 22, 23]. However, there exist important applications that are formulated as the nonlinear SDP, but cannot be reduced to the linear SDP. For example, the Gaussian channel capacity problems [24], the minimization (or maximization) of the minimal (or maximal) eigenvalue problems [16], the nearest correlation matrix problems [17] and the static output feedback problems [18] are such applications. Thus it is worthwhile to develop solution methods for the nonlinear SDP.

Until now several solution methods for the nonlinear SDP have been proposed [5, 9, 19, 26]. Basically, these methods are extensions of the existing methods for the nonlinear programming, such as the sequential quadratic programming method, the successive linearization method, the augmented Lagrangian method and the interior point method.

Freund, Jarre and Vogelbusch [5] proposed the sequential semidefinite programming method for the nonlinear SDP. However, they consider only the case where the objective function is a quadratic function and the constraint functions are affine. Kanzow, Nagel, Kato and Fukushima [9] extended the successive linearization method with a certain exact penalty function and the trust region-type technique. They show that the extended method is globally convergent under rather strong assumptions on the generated sequence, which cannot be verified in advance. Stingl [19] presented the augmented Lagrangian method for the nonlinear SDP. The method needs to calculate the eigenvalue decomposition of a matrix, and hence it may not be suitable for solving some large-scale problems. Yamashita, Yabe and Harada [26] applied the primal-dual interior point method to the nonlinear SDP, and they exploit a nondifferentiable L1 merit function to determine a step length. They showed the global convergence of their algorithm under some unclear assumptions on the generated sequences. These assumptions are discussed in Section 4.3.

The purpose of this work is to propose an interior point method for (1.1) that converges globally under milder conditions than the above existing methods. In particular, we give concrete conditions related to the problem data, i.e., f, g and X only. We show that these conditions hold for the linear SDP.

Recently, Kato, Yabe and Yamashita [10] proposed a primal-dual interior point method based on the shifted perturbed KKT conditions, which is an extension of the method proposed by Forsgren and Gill [4] for the nonlinear programming. The method generates points satisfying the shifted perturbed KKT conditions at each iteration. In order to find such a point, Kato, Yabe and Yamashita [10] exploit a merit function which is an extension of [25]. However, since the merit function is rather complicated, it might be difficult to implement it appropriately. In this paper, we propose a new merit function F whose stationary points satisfy the shifted perturbed KKT conditions. It is an extension of the merit function of [4] for the nonlinear programming. It consists of simple functions of matrices, such as log-determinant and trace, and hence it is easy to implement. We show the following important properties of the merit function F.

(i) The merit function F is differentiable;

(ii) Any stationary point of the merit function F is a shifted perturbed KKT point;

(iii) The level set of the merit function F is bounded under some reasonable assumptions.

Note that Kato, Yabe and Yamashita [10] showed that their merit function also enjoys (i) and (ii), but they did not show the property (iii). Due to these properties, we can find a point satisfying the shifted perturbed KKT conditions by minimizing the merit function F. For the minimization of F, we further propose a Newton type method based on the nonlinear equations in the shifted perturbed KKT conditions. We show that the Newton direction is a sufficient descent direction for the merit function F. As a result, we prove the global convergence of the proposed Newton type method. These details are discussed in Section 4.

The present paper is organized as follows. In Section 2, we introduce some operators in $\mathbb{S}^d$ and important concepts, which are used in the subsequent sections. In Section 3, we present a primal-dual interior point algorithm based on the shifted perturbed KKT conditions. In Section 4, we first propose a merit function F for the shifted perturbed KKT point and present its properties. Secondly, we propose a Newton type algorithm for minimization of the merit function. Moreover, we prove the global convergence of the Newton type algorithm. In Section 5, we report some numerical results for the proposed method. Finally, we make some concluding remarks in Section 6.

Throughout this paper, we use the following notation. Let $p$ and $q$ be positive integers. For matrices $A, B \in \mathbb{R}^{p \times q}$, $\langle A, B \rangle$ denotes the inner product of $A$ and $B$ defined by

\[ \langle A, B \rangle \equiv \mathrm{tr}(A^\top B), \]

where $\mathrm{tr}(M)$ denotes the trace of a square matrix $M$, and the superscript $\top$ denotes the transpose of a vector or a matrix. Note that if $q = 1$, then $\langle \cdot, \cdot \rangle$ denotes the inner product of vectors in $\mathbb{R}^p$. For a given vector $w \in \mathbb{R}^p$ and a matrix $W \in \mathbb{R}^{p \times q}$, $w_i$ denotes the $i$-th element of the vector $w$, and $W_{ij}$ denotes the $(i, j)$-th element of the matrix $W$. Moreover, $\|w\|$ denotes the Euclidean norm of the vector $w$ defined by

\[ \|w\| \equiv \sqrt{\langle w, w \rangle}, \]

and $\|W\|_F$ denotes the Frobenius norm of the matrix $W$ defined by

\[ \|W\|_F \equiv \sqrt{\langle W, W \rangle}. \]

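Let $\mathcal{V} \equiv \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{S}^d$. For a given $v \in \mathcal{V}$, we use the following notation for simplicity:

\[ v = \begin{bmatrix} x \\ y \\ Z \end{bmatrix} \quad \text{or} \quad v = (x, y, Z), \]

where $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$ and $Z \in \mathbb{S}^d$, respectively. We further define the inner product $\langle \cdot, \cdot \rangle$ and the norm $\| \cdot \|$ on $\mathcal{V}$ as $\langle v_1, v_2 \rangle \equiv \langle x_1, x_2 \rangle + \langle y_1, y_2 \rangle + \langle Z_1, Z_2 \rangle$ and $\|v\| \equiv \sqrt{\langle v, v \rangle}$, where $v_1 = (x_1, y_1, Z_1) \in \mathcal{V}$ and $v_2 = (x_2, y_2, Z_2) \in \mathcal{V}$. For a given matrix $U \in \mathbb{S}^d$, $\lambda_1(U), \ldots, \lambda_d(U)$ denote the eigenvalues of the matrix $U$. In particular, $\lambda_{\min}(U)$ and $\lambda_{\max}(U)$ denote the minimum and the maximum eigenvalues of the matrix $U$, respectively. For a given matrix $V \in \mathbb{S}^d_{+}$, $V^{1/2} \in \mathbb{S}^d_{+}$ denotes the matrix such that $V = V^{1/2} V^{1/2}$. Note that $V^{1/2} \equiv Q \Lambda Q^\top$, where

\[ \Lambda = \begin{bmatrix} \sqrt{\lambda_1(V)} & & O \\ & \ddots & \\ O & & \sqrt{\lambda_d(V)} \end{bmatrix}, \]

and $Q$ is a certain orthogonal matrix such that $V = Q \Lambda^2 Q^\top$. Let $\Phi : P_1 \times P_2 \to P_3$, where $P_1$ and $P_2$ are open sets. We denote the Fréchet derivative of $\Phi$ by $\nabla \Phi$. We further denote the Fréchet derivative of $\Phi$ with respect to a variable $Z \in P_1$ by $\nabla_Z \Phi$. Moreover, if $\Phi$ is a vector-valued function, then $J_\Phi$ denotes the Jacobian of $\Phi$.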

2 Preliminaries

In this section, we first introduce some operators. Then we present some useful properties of the log-determinant function on $\mathbb{S}^d$. Moreover, we introduce the (approximate) KKT conditions related to the primal-dual interior point methods for the nonlinear SDP.

2.1 Some operators and their properties

Let U, V ∈ Sd, P,Q ∈ Rd×d and x,w ∈ Rn. We use the following notations.

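(i) A product of the matrices $U$ and $V$ is defined by

\[ U \circ V \equiv \frac{UV + VU}{2}. \]

(ii) A partial derivative of $X(x)$ with respect to $x_i$ is denoted by $A_i(x) \in \mathbb{S}^d$, that is,

\[ A_i(x) \equiv \frac{\partial}{\partial x_i} X(x) \quad \text{for } i = 1, \ldots, n. \]

(iii) An operator $\mathcal{A}(x)$ from $\mathbb{R}^n$ to $\mathbb{S}^d$ is defined by

\[ \mathcal{A}(x) w \equiv \sum_{i=1}^{n} w_i A_i(x). \]

(iv) The adjoint operator of $\mathcal{A}(x)$ is denoted by $\mathcal{A}^*(x)$, that is,

\[ \mathcal{A}^*(x) U = \begin{bmatrix} \langle A_1(x), U \rangle \\ \vdots \\ \langle A_n(x), U \rangle \end{bmatrix}. \]

(v) An operator $P \odot Q$ from $\mathbb{S}^d$ to $\mathbb{S}^d$ is defined by

\[ (P \odot Q) U \equiv \frac{1}{2} (P U Q^\top + Q U P^\top). \tag{2.1} \]

If $X(x) = \sum_{i=1}^{n} x_i A_i$ with some constant matrices $A_i \in \mathbb{S}^d$, $i = 1, \ldots, n$, then $A_i(x) = A_i$, $i = 1, \ldots, n$. Note that $U \circ V = 0$ is equivalent to $UV = 0$ if $U$ and $V$ are symmetric positive semidefinite. Note also that the gradient of $\langle X(x), U \rangle$ with respect to $x$ is given by

\[ \nabla_x \langle X(x), U \rangle = \mathcal{A}^*(x) U. \tag{2.2} \]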
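The operators (i), (iii), (iv) and (v) translate almost directly into NumPy. The following is a minimal sketch, not the authors' implementation: the function names and the random test data are assumptions made for illustration. It checks the adjoint identity $\langle \mathcal{A}(x) w, U \rangle = \langle w, \mathcal{A}^*(x) U \rangle$ and that $P \odot Q$ maps symmetric matrices to symmetric matrices.

```python
import numpy as np

def sym_product(U, V):
    """Symmetrized product U ∘ V = (UV + VU) / 2."""
    return 0.5 * (U @ V + V @ U)

def apply_A(A_list, w):
    """A(x) w = sum_i w_i A_i(x), where A_list = [A_1(x), ..., A_n(x)]."""
    return sum(w_i * A_i for w_i, A_i in zip(w, A_list))

def apply_A_adjoint(A_list, U):
    """A*(x) U = (<A_1(x), U>, ..., <A_n(x), U>)."""
    return np.array([np.trace(A_i.T @ U) for A_i in A_list])

def odot(P, Q, U):
    """(P ⊙ Q) U = (P U Q^T + Q U P^T) / 2, as in (2.1)."""
    return 0.5 * (P @ U @ Q.T + Q @ U @ P.T)

def random_symmetric(rng, d):
    M = rng.standard_normal((d, d))
    return 0.5 * (M + M.T)

rng = np.random.default_rng(0)
d, n = 4, 3
A_list = [random_symmetric(rng, d) for _ in range(n)]  # stands in for A_i(x)
U = random_symmetric(rng, d)
V = random_symmetric(rng, d)
w = rng.standard_normal(n)
P = rng.standard_normal((d, d))
Q = rng.standard_normal((d, d))

# Adjoint identity: <A(x) w, U> = <w, A*(x) U>.
adjoint_gap = abs(np.trace(apply_A(A_list, w).T @ U) - w @ apply_A_adjoint(A_list, U))

# Both U ∘ V and (P ⊙ Q) U are symmetric when U and V are symmetric.
circ_symmetry_gap = np.linalg.norm(sym_product(U, V) - sym_product(U, V).T)
odot_symmetry_gap = np.linalg.norm(odot(P, Q, U) - odot(P, Q, U).T)
```

All three gaps should be at the level of rounding error.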

We list some useful properties of the operator P ⊙Q. See [22] and [26] for their proofs.

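Proposition 1 Let $P$ and $Q$ be nonsingular matrices in $\mathbb{R}^{d \times d}$. Then the following statements hold.

(a) The operator $P \odot Q$ is invertible.

(b) $\langle U, (P \odot Q) V \rangle = \langle (P^\top \odot Q^\top) U, V \rangle$ for all $U, V \in \mathbb{S}^d$, and $\langle U, (P \odot Q)^{-1} V \rangle = \langle (P^\top \odot Q^\top)^{-1} U, V \rangle$ for all $U, V \in \mathbb{S}^d$.

(c) $(P \odot P)^{-1} = (P^{-1} \odot P^{-1})$. □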

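Some interior point methods for SDP exploit a scaling of $X(x)$ and $Z$, where $Z \in \mathbb{S}^d$ corresponds to a Lagrange multiplier matrix for $X(x) \succeq 0$ in (1.1). (The details of $Z$ are given in Subsection 2.3. We will exploit the scaling in the proposed method.) Let $T$ be a nonsingular matrix in $\mathbb{R}^{d \times d}$. We consider the scaled matrices $\tilde{X}(x)$ and $\tilde{Z}$ defined by

\[ \tilde{X}(x) \equiv (T \odot T) X(x) \quad \text{and} \quad \tilde{Z} \equiv (T^{-\top} \odot T^{-\top}) Z. \]

We show some useful properties of $\tilde{X}(x)$ and $\tilde{Z}$.

Proposition 2 The following statements hold.

(a) Let $\psi(x, Z) \equiv \tilde{X}(x) \circ \tilde{Z}$. Then we have $\nabla_x \psi(x, Z) = \frac{1}{2} (\tilde{Z} \odot I)(T \odot T) \mathcal{A}(x)$ and $\nabla_Z \psi(x, Z) = \frac{1}{2} (\tilde{X}(x) \odot I)(T^{-\top} \odot T^{-\top})$.

(b) Suppose that $X(x)$ and $Z$ are symmetric positive definite. Suppose also that $\tilde{X}(x)$ and $\tilde{Z}$ commute. Then we have

\[ \langle (\tilde{Z} \odot I)(\tilde{X}(x) \odot I) U, U \rangle \geq 0 \quad \text{for all } U \in \mathbb{S}^d. \]

Furthermore, the strict inequality holds in the above if and only if $U \neq 0$.

(c) Suppose that $\tilde{X}(x)$ and $\tilde{Z}$ commute. Then we have

\[ (\tilde{X}(x) \odot I)(\tilde{Z} \odot I) = (\tilde{Z} \odot I)(\tilde{X}(x) \odot I). \]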

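Proof. (a) Let $x \in \mathbb{R}^n$ be fixed. Since the function $X$ is differentiable, we have

\[ X(x + h) = X(x) + \sum_{i=1}^{n} h_i A_i(x) + o(\|h\|) \quad \text{for } h \in \mathbb{R}^n, \]

and hence

\[
\begin{aligned}
\tilde{X}(x + h) &= \tilde{X}(x) + (T \odot T) \sum_{i=1}^{n} h_i A_i(x) + o(\|h\|) \\
&= \tilde{X}(x) + (T \odot T) \mathcal{A}(x) h + o(\|h\|).
\end{aligned}
\]

It then follows that

\[
\begin{aligned}
\psi(x + h, Z) &= \tilde{X}(x + h) \tilde{Z} + \tilde{Z} \tilde{X}(x + h) \\
&= \{\tilde{X}(x) + (T \odot T) \mathcal{A}(x) h + o(\|h\|)\} \tilde{Z} + \tilde{Z} \{\tilde{X}(x) + (T \odot T) \mathcal{A}(x) h + o(\|h\|)\} \\
&= \tilde{X}(x) \tilde{Z} + \tilde{Z} \tilde{X}(x) + \{(T \odot T) \mathcal{A}(x) h\} \tilde{Z} + \tilde{Z} \{(T \odot T) \mathcal{A}(x) h\} + o(\|h\|) \\
&= \psi(x, Z) + \frac{1}{2} (\tilde{Z} \odot I)(T \odot T) \mathcal{A}(x) h + o(\|h\|),
\end{aligned}
\]

which implies that $\nabla_x \psi(x, Z) = \frac{1}{2} (\tilde{Z} \odot I)(T \odot T) \mathcal{A}(x)$. Next we give $\nabla_Z \psi(x, Z)$. Let $H \in \mathbb{S}^d$. Then we have

\[
\begin{aligned}
\psi(x, Z + H) &= \tilde{X}(x) (T^{-\top} \odot T^{-\top})(Z + H) + (T^{-\top} \odot T^{-\top})(Z + H) \tilde{X}(x) \\
&= \tilde{X}(x) \tilde{Z} + \tilde{Z} \tilde{X}(x) + \tilde{X}(x) (T^{-\top} \odot T^{-\top}) H + (T^{-\top} \odot T^{-\top}) H \tilde{X}(x) \\
&= \psi(x, Z) + \frac{1}{2} (\tilde{X}(x) \odot I)(T^{-\top} \odot T^{-\top}) H,
\end{aligned}
\]

which implies that $\nabla_Z \psi(x, Z) = \frac{1}{2} (\tilde{X}(x) \odot I)(T^{-\top} \odot T^{-\top})$. Thus, (a) is proved.

(b) Since the matrices $X(x)$ and $Z$ are symmetric positive definite, $\tilde{X}(x)$ and $\tilde{Z}$ are also symmetric positive definite. It then follows from the commutativity of $\tilde{X}(x)$ and $\tilde{Z}$ that $\tilde{X}(x) \tilde{Z}$ is symmetric positive definite. Thus, there exists $(\tilde{X}(x) \tilde{Z})^{1/2}$ such that $\tilde{X}(x) \tilde{Z} = (\tilde{X}(x) \tilde{Z})^{1/2} (\tilde{X}(x) \tilde{Z})^{1/2}$. Let $U \in \mathbb{S}^d$. Writing $\tilde{X}$ for $\tilde{X}(x)$, we have

\[
\begin{aligned}
\langle (\tilde{Z} \odot I)(\tilde{X} \odot I) U, U \rangle
&= \frac{1}{4} \mathrm{tr}\big( (\tilde{Z} \tilde{X} U + \tilde{X} U \tilde{Z} + \tilde{Z} U \tilde{X} + U \tilde{X} \tilde{Z}) U \big) \\
&= \frac{1}{4} \mathrm{tr}(\tilde{X} U \tilde{Z} U) + \frac{1}{4} \mathrm{tr}(\tilde{Z} U \tilde{X} U) + \frac{1}{4} \mathrm{tr}(U \tilde{X} \tilde{Z} U) + \frac{1}{4} \mathrm{tr}(U \tilde{Z} \tilde{X} U) \\
&= \frac{1}{4} \mathrm{tr}(\tilde{X}^{1/2} U \tilde{Z}^{1/2} \tilde{Z}^{1/2} U \tilde{X}^{1/2}) + \frac{1}{4} \mathrm{tr}(\tilde{Z}^{1/2} U \tilde{X}^{1/2} \tilde{X}^{1/2} U \tilde{Z}^{1/2}) \\
&\quad + \frac{1}{4} \mathrm{tr}(U \tilde{X} \tilde{Z} U) + \frac{1}{4} \mathrm{tr}(U \tilde{X} \tilde{Z} U) \\
&= \frac{1}{2} \mathrm{tr}(\tilde{X}^{1/2} U \tilde{Z}^{1/2} \tilde{Z}^{1/2} U \tilde{X}^{1/2}) + \frac{1}{2} \mathrm{tr}\big( U (\tilde{X} \tilde{Z})^{1/2} (\tilde{X} \tilde{Z})^{1/2} U \big) \\
&= \frac{1}{2} \| \tilde{X}^{1/2} U \tilde{Z}^{1/2} \|_F^2 + \frac{1}{2} \| (\tilde{X} \tilde{Z})^{1/2} U \|_F^2 \\
&\geq 0,
\end{aligned}
\]

where the third equality follows from the commutativity of $\tilde{X}(x)$ and $\tilde{Z}$. Note that, since $\tilde{X}(x)^{1/2}$, $\tilde{Z}^{1/2}$ and $(\tilde{X}(x) \tilde{Z})^{1/2}$ are positive definite, the strict inequality holds in the above if and only if $U \neq 0$.

(c) For any $U \in \mathbb{S}^d$, we have

\[
\begin{aligned}
(\tilde{X}(x) \odot I)(\tilde{Z} \odot I) U
&= \frac{1}{4} (\tilde{X}(x) \tilde{Z} U + \tilde{Z} U \tilde{X}(x) + \tilde{X}(x) U \tilde{Z} + U \tilde{Z} \tilde{X}(x)) \\
&= \frac{1}{4} (\tilde{Z} \tilde{X}(x) U + \tilde{X}(x) U \tilde{Z} + \tilde{Z} U \tilde{X}(x) + U \tilde{X}(x) \tilde{Z}) \\
&= (\tilde{Z} \odot I)(\tilde{X}(x) \odot I) U,
\end{aligned}
\]

where the second equality follows from the commutativity of $\tilde{X}(x)$ and $\tilde{Z}$. Hence, we obtain $(\tilde{X}(x) \odot I)(\tilde{Z} \odot I) = (\tilde{Z} \odot I)(\tilde{X}(x) \odot I)$. □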
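Statements (b) and (c) of Proposition 2 can be checked numerically. The sketch below is only an illustration under assumed data: it builds two commuting symmetric positive definite matrices by giving them the same (randomly chosen) eigenvectors, and the names `X_t` and `Z_t` stand in for the scaled matrices.

```python
import numpy as np

def odot_I(M, U):
    """(M ⊙ I) U = (M U + U M) / 2 for a symmetric matrix M, from (2.1)."""
    return 0.5 * (M @ U + U @ M)

rng = np.random.default_rng(0)
d = 4

# Commuting SPD matrices: shared orthonormal eigenvectors, positive eigenvalues.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
X_t = Q @ np.diag(rng.uniform(0.5, 2.0, d)) @ Q.T
Z_t = Q @ np.diag(rng.uniform(0.5, 2.0, d)) @ Q.T

B = rng.standard_normal((d, d))
U = 0.5 * (B + B.T)  # a nonzero symmetric test matrix

# (c): the operators (X_t ⊙ I) and (Z_t ⊙ I) commute.
commute_gap = np.linalg.norm(
    odot_I(X_t, odot_I(Z_t, U)) - odot_I(Z_t, odot_I(X_t, U))
)

# (b): the quadratic form <(Z_t ⊙ I)(X_t ⊙ I) U, U> is positive for U != 0.
quad = np.trace(odot_I(Z_t, odot_I(X_t, U)).T @ U)
```

With this construction `commute_gap` should be at rounding level and `quad` strictly positive.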

2.2 Properties of the log-determinant function

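Let $\phi : \mathbb{S}^d_{++} \to \mathbb{R}$ be defined by $\phi(M) \equiv -\log \det M$. Let $\Omega$ be defined by $\Omega \equiv \{ x \in \mathbb{R}^n \mid X(x) \succ 0 \}$, and let $\varphi : \Omega \to \mathbb{R}$ be defined by

\[ \varphi(x) \equiv \phi(X(x)). \tag{2.3} \]

We first give the differentiability and convexity of $\varphi$.

Proposition 3

(a) The function $\varphi$ is differentiable on $\Omega$, and its derivative is given by $\nabla \varphi(x) = -\mathcal{A}^*(x) X(x)^{-1}$.

(b) Suppose that

\[ X(\lambda u + (1 - \lambda) v) - \lambda X(u) - (1 - \lambda) X(v) \succeq 0 \quad \text{for } \lambda \in [0, 1] \text{ and } u, v \in \Omega. \tag{2.4} \]

Then $\varphi$ is convex on $\Omega$. Moreover, if $X$ is injective on $\Omega$, then $\varphi$ is strictly convex.

(c) Suppose that (2.4) holds. Suppose also that $A_1(x), \ldots, A_n(x)$ are linearly independent for all $x \in \Omega$. Then $\varphi$ is strictly convex.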

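Proof. (a) It follows from [21, Section 5] that

\[ \nabla \phi(M) = -M^{-1}. \tag{2.5} \]

Then, we have from the chain rule that

\[ \nabla \varphi(x) = -\mathcal{A}^*(x) X(x)^{-1}. \tag{2.6} \]

(b) First note that $\det A \leq \det B$ if $0 \preceq A$ and $0 \preceq B - A$ from [8, Corollary 7.7.4]. It then follows from (2.4) that for any $\lambda \in [0, 1]$ and $u, v \in \Omega$ such that $u \neq v$,

\[ \det[\lambda X(u) + (1 - \lambda) X(v)] \leq \det[X(\lambda u + (1 - \lambda) v)]. \]

Since $-\log$ is a decreasing function on $(0, \infty)$ and $\phi$ is strictly convex from [8, Theorem 7.6.7], we have

\[
\begin{aligned}
\varphi(\lambda u + (1 - \lambda) v) &= -\log \det[X(\lambda u + (1 - \lambda) v)] \\
&\leq -\log \det[\lambda X(u) + (1 - \lambda) X(v)] \\
&= \phi(\lambda X(u) + (1 - \lambda) X(v)) \\
&\leq \lambda \phi(X(u)) + (1 - \lambda) \phi(X(v)) \\
&= \lambda \varphi(u) + (1 - \lambda) \varphi(v),
\end{aligned}
\]

which shows that $\varphi$ is convex on $\Omega$.

Suppose that $u \neq v$. Then, since $X$ is injective on $\Omega$, $X(u) \neq X(v)$. Moreover, since $\phi$ is strictly convex,

\[
\begin{aligned}
\varphi(\lambda u + (1 - \lambda) v) &\leq \phi(\lambda X(u) + (1 - \lambda) X(v)) \\
&< \lambda \phi(X(u)) + (1 - \lambda) \phi(X(v)) \\
&\leq \lambda \varphi(u) + (1 - \lambda) \varphi(v)
\end{aligned}
\]

for $\lambda \in (0, 1)$. Thus, $\varphi$ is strictly convex.

(c) Since $X$ is twice differentiable, $X(v + \lambda(u - v)) - X(v) = \lambda \mathcal{A}(v)(u - v) + o(\lambda)$ for $u, v \in \Omega$ and $\lambda \in (0, 1)$. Then (2.4) can be written as $\lambda \mathcal{A}(v)(u - v) - \lambda (X(u) - X(v)) + o(\lambda) \succeq 0$. Dividing both sides by $\lambda$, we have $\mathcal{A}(v)(u - v) - X(u) + X(v) + \frac{o(\lambda)}{\lambda} \succeq 0$. Letting $\lambda \to 0$ yields

\[ \mathcal{A}(v)(u - v) - X(u) + X(v) \succeq 0. \]

Let $M \equiv \mathcal{A}(v)(u - v) - X(u) + X(v)$. Since $M$ and $X(v)^{-1}$ are symmetric positive semidefinite, there exist $M^{1/2}$ and $X(v)^{-1/2}$. Then we have

\[ \langle X(v)^{-1}, M \rangle = \mathrm{tr}(X(v)^{-1} M) = \mathrm{tr}(X(v)^{-1/2} M^{1/2} M^{1/2} X(v)^{-1/2}) = \| M^{1/2} X(v)^{-1/2} \|_F^2. \]

It then follows from the convexity of $\phi$, (2.3) and (2.5) that

\[
\begin{aligned}
\varphi(u) - \varphi(v) &= \phi(X(u)) - \phi(X(v)) \\
&\geq \langle -X(v)^{-1}, X(u) - X(v) \rangle \\
&= \langle X(v)^{-1}, M \rangle + \langle X(v)^{-1}, -\mathcal{A}(v)(u - v) \rangle \\
&= \| M^{1/2} X(v)^{-1/2} \|_F^2 + \langle -\mathcal{A}^*(v) X(v)^{-1}, u - v \rangle \\
&\geq \langle \nabla \varphi(v), u - v \rangle,
\end{aligned}
\tag{2.7}
\]

where the last inequality follows from (2.6).

Since $\varphi$ is convex by (b), it suffices for (c) to show that $u = v$ if and only if $\varphi(u) - \varphi(v) = \langle \nabla \varphi(v), u - v \rangle$. If $u = v$, then it is clear that $\varphi(u) - \varphi(v) = \langle \nabla \varphi(v), u - v \rangle$. Conversely, suppose that $\varphi(u) - \varphi(v) = \langle \nabla \varphi(v), u - v \rangle$. Then the equality holds throughout (2.7). It follows from (2.7) that $\| M^{1/2} X(v)^{-1/2} \|_F = 0$ and $\phi(X(u)) - \phi(X(v)) = \langle -X(v)^{-1}, X(u) - X(v) \rangle$. The first equality gives $M = 0$, and the second, together with the strict convexity of $\phi$, gives $X(u) = X(v)$. Then, we have $\mathcal{A}(v)(u - v) = 0$ from the definition of $M$. Since $A_1(x), \ldots, A_n(x)$ are linearly independent for all $x \in \Omega$, we have $u = v$. □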
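The gradient formula (2.6) is easy to validate against finite differences. The following sketch assumes an affine map $X(x) = A_0 + \sum_i x_i A_i$ with made-up data (so that $A_i(x) = A_i$); it is an illustration of the formula, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 2

def random_symmetric(d):
    M = rng.standard_normal((d, d))
    return 0.5 * (M + M.T)

# Hypothetical affine data; A0 = 5 I keeps X(x) positive definite near x = 0.
A0 = 5.0 * np.eye(d)
A_list = [random_symmetric(d) for _ in range(n)]

def X_of(x):
    return A0 + sum(x_i * A_i for x_i, A_i in zip(x, A_list))

def phi_of_x(x):
    """varphi(x) = -log det X(x), defined only where X(x) is positive definite."""
    sign, logabsdet = np.linalg.slogdet(X_of(x))
    if sign <= 0:
        raise ValueError("X(x) is not positive definite")
    return -logabsdet

def grad_phi_of_x(x):
    """Formula (2.6): the i-th entry is -<A_i(x), X(x)^{-1}>."""
    X_inv = np.linalg.inv(X_of(x))
    return -np.array([np.trace(A_i @ X_inv) for A_i in A_list])

x = np.array([0.1, -0.2])
h = 1e-6
# Central finite differences along each coordinate direction.
fd = np.array([(phi_of_x(x + h * e) - phi_of_x(x - h * e)) / (2 * h) for e in np.eye(n)])
grad_gap = np.linalg.norm(grad_phi_of_x(x) - fd)
```

The two gradients should agree up to the finite-difference error.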

Note that Proposition 3 (b) does not assume the differentiability of $X$.

We next show that the matrices in a level set of $\phi$ are uniformly positive definite, which is a key property for the level boundedness of the merit function proposed in Section 4.

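Proposition 4 For a given $\gamma \in \mathbb{R}$, let $L_\phi(\gamma) = \{ U \in \mathbb{S}^d_{++} \mid \phi(U) \leq \gamma \}$. Let $\Gamma$ be a bounded subset of $\mathbb{S}^d$. Then, there exists $\lambda > 0$ such that $\lambda_{\min}(U) \geq \lambda$ for all $U \in L_\phi(\gamma) \cap \Gamma$.

Proof. Suppose the contrary, that is, there exists a sequence $\{U_j\} \subset L_\phi(\gamma) \cap \Gamma$ such that $\lambda_{\min}(U_j) \to 0$ as $j \to \infty$. Then

\[ -\log \lambda_{\min}(U_j) \to \infty. \tag{2.8} \]

Since $U_j \in L_\phi(\gamma)$, we have $\gamma \geq \phi(U_j) = -\log \det U_j = -\sum_{i=1}^{d} \log \lambda_i(U_j)$. It then follows from (2.8) that there exist an index $k$ and an infinite subset $J$ such that $\lim_{j \to \infty, j \in J} -\log \lambda_k(U_j) = -\infty$, that is, $\lim_{j \to \infty, j \in J} \lambda_k(U_j) = \infty$. However, this is contrary to the boundedness of $\{U_j\}$. Therefore, there exists $\lambda > 0$ such that $\lambda_{\min}(U) \geq \lambda$ for all $U \in L_\phi(\gamma) \cap \Gamma$. □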

2.3 The shifted perturbed KKT conditions for the nonlinear SDP

We first introduce optimality conditions for the nonlinear SDP (1.1). Let $v = (x, y, Z)$. The Lagrangian function $L$ of (1.1) is given by

\[ L(v) \equiv f(x) - g(x)^\top y - \langle X(x), Z \rangle, \]

where $y \in \mathbb{R}^m$ and $Z \in \mathbb{S}^d$ are the Lagrange multiplier vector and matrix for $g(x) = 0$ and $X(x) \succeq 0$, respectively. From (2.2), the gradient of the Lagrangian function $L$ with respect to $x$ is given by

\[ \nabla_x L(v) = \nabla f(x) - J_g(x)^\top y - \mathcal{A}^*(x) Z. \]

The Karush-Kuhn-Tucker (KKT) conditions of (1.1) are written as

\[ \begin{bmatrix} \nabla_x L(v) \\ g(x) \\ X(x) Z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{2.9} \]

and

\[ X(x) \succeq 0, \quad Z \succeq 0. \tag{2.10} \]

Most of the solution methods for the nonlinear SDP are developed to find a point $v = (x, y, Z)$ that satisfies the KKT conditions. However, it is difficult to get such a point directly due to the complementarity condition $X(x) Z = 0$ with $X(x) \succeq 0$ and $Z \succeq 0$. To overcome this difficulty, the primal-dual interior point method proposed by Yamashita, Yabe and Harada [26] exploits the following perturbed KKT conditions with a parameter $\mu > 0$:

\[ \begin{bmatrix} \nabla_x L(v) \\ g(x) \\ X(x) Z - \mu I \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{2.11} \]

and

\[ X(x) \succ 0, \quad Z \succ 0. \tag{2.12} \]

They [26] proposed a Newton type algorithm to get a point satisfying the perturbed KKT conditions.

In this paper, we focus on the following shifted perturbed KKT conditions. For $\mu > 0$,

\[ \begin{bmatrix} \nabla_x L(v) \\ g(x) + \mu y \\ X(x) Z - \mu I \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{2.13} \]

and

\[ X(x) \succ 0, \quad Z \succ 0. \tag{2.14} \]

The above shifted perturbed KKT conditions were derived by Forsgren and Gill [4] for the nonlinear programming. In what follows, we call a point $v$ satisfying the shifted perturbed KKT conditions a shifted perturbed KKT point. Furthermore, we define a set $\mathcal{W} \subset \mathcal{V}$ by

\[ \mathcal{W} \equiv \{ (x, y, Z) \in \mathcal{V} \mid X(x) \succ 0, \ Z \succ 0 \}. \]

We call a point $v \in \mathcal{W}$ an interior point.

3 A primal-dual interior point method based on the shifted perturbed KKT conditions

In this section, we introduce a prototype of an interior point algorithm based on the shifted perturbed KKT conditions (2.13) and (2.14). Note that the prototype has already been proposed in [10].

The primal-dual interior point method generates a sequence $\{v_k\} \subset \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{S}^d$ such that the point $v_k$ approximately satisfies the shifted perturbed KKT conditions (2.13) and (2.14) with $\mu = \mu_k > 0$, where $\{\mu_k\}$ is a positive sequence such that $\mu_k \to 0$ ($k \to \infty$).

To construct a concrete algorithm, it is important to define the approximate shifted perturbed KKT point, and to provide a method for finding the approximate shifted perturbed KKT point.

We first give a concrete definition of the approximate shifted perturbed KKT point. To this end, let

\[ r(v; \mu) \equiv \begin{bmatrix} \nabla_x L(v) \\ g(x) + \mu y \\ X(x) Z - \mu I \end{bmatrix}. \]

Moreover, let

\[ \rho(v; \mu) \equiv \sqrt{ \left\| \begin{bmatrix} \nabla_x L(v) \\ g(x) + \mu y \end{bmatrix} \right\|^2 + \| X(x) Z - \mu I \|_F^2 }. \]

For a given $\varepsilon > 0$, we define the approximate shifted perturbed KKT point as a point $v$ satisfying $\rho(v; \mu) \leq \varepsilon$ and $v \in \mathcal{W}$. Note that $\rho(v; \mu) = 0$ and $v \in \mathcal{W}$ if and only if $v$ is the shifted perturbed KKT point. Note also that $\rho(v; 0) = 0$, $X(x) \succeq 0$ and $Z \succeq 0$ if and only if $v$ is an original KKT point of the nonlinear SDP (1.1).
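The residual norm $\rho(v; \mu)$ can be evaluated directly from the problem functions. The sketch below is an illustration only: the calling convention (callables `grad_f`, `g`, `Jg`, `X`, `A_list`) and the small affine test problem are assumptions, not part of the paper. The test point is constructed so that it is an exact shifted perturbed KKT point, by choosing $Z = \mu X(x)^{-1}$, $y = -g(x)/\mu$ and a linear objective whose gradient cancels $\nabla_x L$.

```python
import numpy as np

def rho(x, y, Z, mu, grad_f, g, Jg, X, A_list):
    """rho(v; mu) for v = (x, y, Z), following the definition above."""
    Xx = X(x)
    A_adj_Z = np.array([np.trace(A_i @ Z) for A_i in A_list(x)])  # A*(x) Z
    grad_L = grad_f(x) - Jg(x).T @ y - A_adj_Z
    shifted = g(x) + mu * y
    comp = Xx @ Z - mu * np.eye(Xx.shape[0])
    return np.sqrt(grad_L @ grad_L + shifted @ shifted + np.sum(comp ** 2))

# Hypothetical problem data (n = 2, m = 1, d = 2).
mu = 0.1
A0 = np.eye(2)
A1 = np.array([[1.0, 0.0], [0.0, -1.0]])
A2 = np.array([[0.0, 1.0], [1.0, 0.0]])
G = np.array([[1.0, 1.0]])
b = np.array([1.0])

X = lambda x: A0 + x[0] * A1 + x[1] * A2
A_list = lambda x: [A1, A2]
g = lambda x: G @ x - b
Jg = lambda x: G

# Build a point that satisfies (2.13) exactly, then pick f(x) = c^T x to match.
x_star = np.array([0.2, 0.3])
Z_star = mu * np.linalg.inv(X(x_star))
y_star = -g(x_star) / mu
c = G.T @ y_star + np.array([np.trace(A1 @ Z_star), np.trace(A2 @ Z_star)])
grad_f = lambda x: c

rho_at_point = rho(x_star, y_star, Z_star, mu, grad_f, g, Jg, X, A_list)
rho_perturbed = rho(x_star + 0.05, y_star, Z_star, mu, grad_f, g, Jg, X, A_list)
```

Here `rho_at_point` should vanish up to rounding, while `rho_perturbed` is clearly positive.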

Now, we give the framework of the primal-dual interior point method.

Algorithm 1

Step 0. Let $\{\mu_k\}$ be a positive sequence such that $\mu_k \to 0$ as $k \to \infty$. Choose constants $\sigma, \epsilon > 0$. Set $k = 0$.

Step 1. Find an approximate shifted perturbed KKT point $v_{k+1}$ with $\varepsilon = \sigma \mu_k$, that is, $v_{k+1} \in \mathcal{W}$ such that $\rho(v_{k+1}; \mu_k) \leq \sigma \mu_k$.

Step 2. If $\rho(v_{k+1}; 0) \leq \epsilon$, then stop.

Step 3. Set $k = k + 1$ and go to Step 1. □
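The outer loop of Algorithm 1 is short once an inner solver is available. The sketch below treats the inner solver as a black box; `solve_inner`, the finite list `mus` and the one-dimensional toy problem are all assumptions made for illustration. The toy problem is to minimize $x$ subject to $x \geq 0$ (so $n = d = 1$ and there are no equality constraints), whose shifted perturbed KKT point for a given $\mu$ is $(x, Z) = (\mu, 1)$ in closed form.

```python
def interior_point_outer_loop(solve_inner, rho, mus, sigma, eps):
    """Steps 1-3 of Algorithm 1 over a finite list of barrier parameters."""
    for k, mu_k in enumerate(mus):
        # Step 1: approximate shifted perturbed KKT point with tolerance sigma * mu_k.
        v_next = solve_inner(mu_k, sigma * mu_k)
        # Step 2: stop once the unperturbed KKT residual is small enough.
        if rho(v_next, 0.0) <= eps:
            return v_next, k
    raise RuntimeError("mu sequence exhausted before the stopping test passed")

def toy_rho(v, mu):
    """rho(v; mu) for: minimize x subject to x >= 0, with v = (x, Z)."""
    x, Z = v
    grad_L = 1.0 - Z   # gradient of x - x * Z with respect to x
    comp = x * Z - mu  # X(x) Z - mu I in the 1 x 1 case
    return (grad_L ** 2 + comp ** 2) ** 0.5

def toy_solve_inner(mu, tol):
    """Hypothetical inner solver: returns the exact point (mu, 1)."""
    v = (mu, 1.0)
    assert toy_rho(v, mu) <= tol
    return v

mus = [0.5 ** k for k in range(40)]  # mu_k -> 0
v_final, k_final = interior_point_outer_loop(
    toy_solve_inner, toy_rho, mus, sigma=1.0, eps=1e-3
)
```

In this toy case $\rho(v_{k+1}; 0) = \mu_k$, so the loop stops at the first $k$ with $0.5^k \leq 10^{-3}$. In practice Step 1 would be carried out by an iterative method such as the one developed in Section 4.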

The following theorem gives conditions for the global convergence of Algorithm 1. It can be proven in a way similar to [26, Theorem 1]. Thus, we omit the proof.

Theorem 1 Suppose that the approximate shifted perturbed KKT point $v_{k+1}$ is found in Step 1 at every iteration. Moreover, suppose that the sequence $\{x_k\}$ is bounded, and that the MFCQ condition holds at any accumulation point of $\{x_k\}$, i.e., for any accumulation point $x^*$ of $\{x_k\}$, the matrix $J_g(x^*)$ is of full rank and there exists a nonzero vector $w \in \mathbb{R}^n$ such that

\[ J_g(x^*) w = 0 \quad \text{and} \quad X(x^*) + \sum_{i=1}^{n} w_i A_i(x^*) \succ 0. \]

Then, the sequences $\{y_k\}$ and $\{Z_k\}$ are bounded, and any accumulation point of $\{v_k\}$ satisfies the KKT conditions (2.9) and (2.10). □

The theorem guarantees the global convergence if the approximate shifted perturbed KKT point $v_{k+1}$ is found at each iteration. Thus it is important to present a concrete algorithm that finds the point. In the next section, we will propose a merit function for the shifted perturbed KKT point and a Newton type algorithm for solving the unconstrained minimization problem of the merit function.

4 Finding a shifted perturbed KKT point

In order to find the approximate shifted perturbed KKT point in Step 1 of Algorithm 1, we may solve the following unconstrained minimization problem:

\[
\begin{array}{ll}
\mathrm{minimize} & \rho(v; \mu)^2, \\
\mathrm{subject\ to} & v \in \mathcal{V},
\end{array}
\]

where we recall that $\mathcal{V} = \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{S}^d$. Unfortunately, a stationary point of the problem is not necessarily a shifted perturbed KKT point unless $\nabla r(v; \mu)$ is invertible. In this section, we first construct a differentiable merit function $F$ whose stationary point is a shifted perturbed KKT point. Moreover, we show that a Newton direction for the nonlinear equations $r(v; \mu) = 0$ is a descent direction of the merit function $F$. Next, we propose a Newton type algorithm for solving the unconstrained minimization of the merit function $F$. Finally, we show that the proposed algorithm finds a shifted perturbed KKT point under some mild assumptions.

4.1 Merit function and its properties

We propose the following merit function $F : \mathcal{W} \to \mathbb{R}$ for the shifted perturbed KKT point:

\[ F(x, y, Z) \equiv F_{BP}(x) + \nu F_{PD}(x, y, Z), \]

where $\nu$ is a positive constant, and the functions $F_{BP} : \Omega \to \mathbb{R}$ and $F_{PD} : \mathcal{W} \to \mathbb{R}$ are

\[ F_{BP}(x) \equiv f(x) + \frac{1}{2\mu} \|g(x)\|^2 - \mu \log \det X(x), \]

and

\[ F_{PD}(x, y, Z) \equiv \frac{1}{2\mu} \|g(x) + \mu y\|^2 + \langle X(x), Z \rangle - \mu \log \{ \det X(x) \det Z \}, \]

respectively. The functions $F_{BP}$ and $F_{PD}$ are called the primal barrier penalty function and the primal-dual barrier penalty function, respectively. Note that $F$ is convex with respect to $x$ when $f$ is convex and $g$, $X$ are linear. The merit function $F$ is an extension of that proposed by Forsgren and Gill [4] for the nonlinear programming.

Remark 1 For the perturbed KKT conditions, Kato, Yabe and Yamashita [10] also proposed the following merit function $\tilde{F} : \mathcal{W} \to \mathbb{R}$:

\[ \tilde{F}(x, y, Z) \equiv F_{BP}(x) + \nu \tilde{F}_{PD}(x, y, Z), \]

where $\tilde{F}_{PD}(w)$ is defined by

\[ \tilde{F}_{PD}(x, y, Z) \equiv \frac{1}{2} \|g(x) + \mu y\|^2 + \log \frac{ \frac{1}{d} \langle X(x), Z \rangle + \| Z^{1/2} X(x) Z^{1/2} - \mu I \|_F^2 }{ (\det(X(x) Z))^{1/d} }. \]

They showed that $\tilde{F}$ has nice properties like those of the merit function $F$. However, $\tilde{F}$ is more complicated than $F$, and hence it might not be easy to implement the Newton type method based on $\tilde{F}$ in [10]. Furthermore, even if $f$ is convex and $g$, $X$ are linear, $\tilde{F}$ is not necessarily convex with respect to $x$.

In the rest of this subsection, we present some useful properties of the merit function $F$ such as the differentiability, the equivalence between a stationary point of $F$ and a shifted perturbed KKT point, and the level boundedness.

First of all, we present a concrete formula for the derivative of the merit function $F$.

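Theorem 2 The merit function $F$ is differentiable at $w = (x, y, Z) \in \mathcal{W}$. Moreover, its derivative is given by

\[ \nabla F(w) = \begin{bmatrix} \nabla F_{BP}(x) + \nu \nabla_x F_{PD}(w) \\ \nu \nabla_y F_{PD}(w) \\ \nu \nabla_Z F_{PD}(w) \end{bmatrix}, \]

where

\[
\begin{aligned}
\nabla F_{BP}(x) &= \nabla f(x) + \frac{1}{\mu} J_g(x)^\top g(x) - \mu \mathcal{A}^*(x) X(x)^{-1}, \\
\nabla_x F_{PD}(w) &= \frac{1}{\mu} J_g(x)^\top (g(x) + \mu y) + \mathcal{A}^*(x) (Z - \mu X(x)^{-1}), \\
\nabla_y F_{PD}(w) &= g(x) + \mu y, \\
\nabla_Z F_{PD}(w) &= X(x) - \mu Z^{-1}.
\end{aligned}
\]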

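Proof. By the definition of the merit function $F$, we have

\[ \nabla_x F(w) = \nabla F_{BP}(x) + \nu \nabla_x F_{PD}(w), \quad \nabla_y F(w) = \nu \nabla_y F_{PD}(w), \quad \nabla_Z F(w) = \nu \nabla_Z F_{PD}(w). \]

Thus, the derivative of $F$ is given by

\[ \nabla F(w) = \begin{bmatrix} \nabla_x F(w) \\ \nabla_y F(w) \\ \nabla_Z F(w) \end{bmatrix} = \begin{bmatrix} \nabla F_{BP}(x) + \nu \nabla_x F_{PD}(w) \\ \nu \nabla_y F_{PD}(w) \\ \nu \nabla_Z F_{PD}(w) \end{bmatrix}. \]

By Proposition 3 (a), we obtain

\[
\begin{aligned}
\nabla F_{BP}(x) &= \nabla f(x) + \frac{1}{\mu} J_g(x)^\top g(x) - \mu \mathcal{A}^*(x) X(x)^{-1}, \\
\nabla_x F_{PD}(w) &= \frac{1}{\mu} J_g(x)^\top (g(x) + \mu y) + \mathcal{A}^*(x) (Z - \mu X(x)^{-1}).
\end{aligned}
\]

From [21, Section 5], we also get $\nabla_Z F_{PD}(w) = X(x) - \mu Z^{-1}$. Moreover, $\nabla_y F_{PD}(w) = g(x) + \mu y$. □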
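The formulas of Theorem 2 can be checked against finite differences of $F$. The sketch below is an illustration under assumed data: a quadratic $f$, an affine $g$ and an affine $X$ with randomly generated coefficients, and arbitrary values of $\mu$ and $\nu$. None of these choices come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, d = 3, 2, 3
mu, nu = 0.5, 2.0

def sym(M):
    return 0.5 * (M + M.T)

# Hypothetical data: f(x) = c^T x + ||x||^2 / 2, g(x) = G x - b, X(x) = A0 + sum x_i A_i.
A0 = 5.0 * np.eye(d)
A = [sym(rng.standard_normal((d, d))) for _ in range(n)]
c = rng.standard_normal(n)
G = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def X_of(x):
    return A0 + sum(x_i * A_i for x_i, A_i in zip(x, A))

def g(x):
    return G @ x - b

def logdet(M):
    sign, value = np.linalg.slogdet(M)
    if sign <= 0:
        raise ValueError("matrix is not positive definite")
    return value

def A_adjoint(U):
    return np.array([np.trace(A_i @ U) for A_i in A])

def merit(x, y, Z):
    X = X_of(x)
    F_bp = c @ x + 0.5 * x @ x + g(x) @ g(x) / (2 * mu) - mu * logdet(X)
    r = g(x) + mu * y
    F_pd = r @ r / (2 * mu) + np.trace(X @ Z) - mu * (logdet(X) + logdet(Z))
    return F_bp + nu * F_pd

def merit_grad(x, y, Z):
    """Gradient blocks of F from Theorem 2."""
    X_inv = np.linalg.inv(X_of(x))
    grad_bp = (c + x) + G.T @ g(x) / mu - mu * A_adjoint(X_inv)
    grad_x_pd = G.T @ (g(x) + mu * y) / mu + A_adjoint(Z - mu * X_inv)
    grad_y_pd = g(x) + mu * y
    grad_Z_pd = X_of(x) - mu * np.linalg.inv(Z)
    return grad_bp + nu * grad_x_pd, nu * grad_y_pd, nu * grad_Z_pd

# An interior test point and a symmetric direction H for the Z block.
x = 0.05 * rng.standard_normal(n)
y = rng.standard_normal(m)
B = rng.standard_normal((d, d))
Z = B @ B.T + np.eye(d)
H = sym(rng.standard_normal((d, d)))

gx, gy, gZ = merit_grad(x, y, Z)
h = 1e-6
fd_x = np.array([(merit(x + h * e, y, Z) - merit(x - h * e, y, Z)) / (2 * h) for e in np.eye(n)])
fd_y = np.array([(merit(x, y + h * e, Z) - merit(x, y - h * e, Z)) / (2 * h) for e in np.eye(m)])
fd_Z_dir = (merit(x, y, Z + h * H) - merit(x, y, Z - h * H)) / (2 * h)

err_x = np.linalg.norm(gx - fd_x)
err_y = np.linalg.norm(gy - fd_y)
err_Z = abs(np.trace(gZ @ H) - fd_Z_dir)  # directional derivative <grad_Z F, H>
```

All three errors should be small compared with the gradient magnitudes. The $Z$ block is compared through a directional derivative along a symmetric $H$, which matches the inner product on $\mathbb{S}^d$.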

Next, we show the equivalence between a stationary point of the merit function $F$ and a shifted perturbed KKT point.

Theorem 3 A point $w^* \in \mathcal{W}$ is a stationary point of the merit function $F$ if and only if $w^*$ is a shifted perturbed KKT point.

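Proof. First, let $w^* = (x^*, y^*, Z^*) \in \mathcal{W}$ be a stationary point of the merit function $F$. It then follows from Theorem 2 that

\[ \nabla f(x^*) + \frac{1}{\mu} J_g(x^*)^\top (2 g(x^*) + \mu y^*) + \mathcal{A}^*(x^*) (Z^* - 2\mu X(x^*)^{-1}) = 0, \tag{4.1} \]

\[ g(x^*) + \mu y^* = 0, \quad X(x^*) - \mu (Z^*)^{-1} = 0. \tag{4.2} \]

Thus we have

\[
\begin{aligned}
\nabla_x L(w^*) &= \nabla f(x^*) - J_g(x^*)^\top y^* - \mathcal{A}^*(x^*) Z^* \\
&= \nabla f(x^*) + \frac{1}{\mu} J_g(x^*)^\top g(x^*) - \mu \mathcal{A}^*(x^*) X(x^*)^{-1} \\
&= -\frac{1}{\mu} J_g(x^*)^\top (g(x^*) + \mu y^*) - \mathcal{A}^*(x^*) (Z^* - \mu X(x^*)^{-1}) \\
&= 0,
\end{aligned}
\]

where the second and third equalities follow from (4.2) and (4.1), respectively. Therefore, $w^*$ is a shifted perturbed KKT point.

Conversely, let $w^* = (x^*, y^*, Z^*)$ be a shifted perturbed KKT point. Then, we obtain that

\[ \nabla_x L(w^*) = \nabla f(x^*) - J_g(x^*)^\top y^* - \mathcal{A}^*(x^*) Z^* = 0, \]

\[ g(x^*) + \mu y^* = 0, \quad X(x^*) Z^* - \mu I = 0. \]

It then follows from Theorem 2 that

\[
\begin{aligned}
\nabla_x F(w^*) &= \nabla f(x^*) + \frac{1}{\mu} J_g(x^*)^\top (2 g(x^*) + \mu y^*) + \mathcal{A}^*(x^*) (Z^* - 2\mu X(x^*)^{-1}) \\
&= \nabla f(x^*) + \frac{1}{\mu} J_g(x^*)^\top g(x^*) - \mu \mathcal{A}^*(x^*) X(x^*)^{-1} \\
&\quad + \frac{1}{\mu} J_g(x^*)^\top (g(x^*) + \mu y^*) + \mathcal{A}^*(x^*) (Z^* - \mu X(x^*)^{-1}) \\
&= \nabla f(x^*) - J_g(x^*)^\top y^* - \mathcal{A}^*(x^*) Z^* \\
&\quad + \frac{1}{\mu} J_g(x^*)^\top (g(x^*) + \mu y^*) + \mathcal{A}^*(x^*) (Z^* - \mu X(x^*)^{-1}) \\
&= 0, \\
\nabla_y F(w^*) &= g(x^*) + \mu y^* = 0, \\
\nabla_Z F(w^*) &= X(x^*) - \mu (Z^*)^{-1} = (X(x^*) Z^* - \mu I)(Z^*)^{-1} = 0.
\end{aligned}
\]

Therefore, we have $\nabla F(w^*) = 0$, that is, $w^*$ is a stationary point of $F$. □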

This theorem is an extension of [4, Lemma 3.1] for the nonlinear programming. From this theorem, we can find an approximate shifted perturbed KKT point by solving the following unconstrained minimization problem:

\[
\begin{array}{ll}
\mathrm{minimize} & F(w), \\
\mathrm{subject\ to} & w \in \mathcal{W}.
\end{array}
\tag{4.3}
\]

One of the sufficient conditions under which descent methods find a stationary point is that a level set of the objective function is bounded. Thus, it is worth providing sufficient conditions for the level boundedness of the merit function $F$. For a given $\alpha \in \mathbb{R}$, we define the level set $\mathcal{L}(\alpha)$ of $F$ by

\[ \mathcal{L}(\alpha) = \{ w \in \mathcal{W} \mid F(w) \leq \alpha \}. \]

We first give two lemmas.

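Lemma 1 Let $w = (x, y, Z) \in \mathcal{W}$ and $\mu > 0$. Then the following properties hold.

(a) $\langle X(x), Z \rangle - \mu \log \det X(x) Z \geq d\mu(1 - \log \mu)$.

(b) $F_{PD}(w) \geq d\mu(1 - \log \mu)$. The equality holds if and only if $g(x) + \mu y = 0$ and $X(x) Z - \mu I = 0$.

(c) $\displaystyle \lim_{\langle X(x), Z \rangle \downarrow 0} F_{PD}(w) = \infty$ and $\displaystyle \lim_{\langle X(x), Z \rangle \uparrow \infty} F_{PD}(w) = \infty$.

Proof. The properties (a), (b) and (c) directly follow from [26, Lemma 1]. □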
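The bound in Lemma 1 (a) can be observed numerically. The sketch below samples random symmetric positive definite pairs (an assumption made purely for illustration) and also evaluates the left-hand side at $Z = \mu X^{-1}$, where $XZ = \mu I$ and the bound is attained.

```python
import numpy as np

def lemma1_lhs(X, Z, mu):
    """<X, Z> - mu * log det(X Z) for symmetric positive definite X and Z."""
    _, logdet_X = np.linalg.slogdet(X)
    _, logdet_Z = np.linalg.slogdet(Z)
    return np.trace(X @ Z) - mu * (logdet_X + logdet_Z)

def random_spd(rng, d):
    B = rng.standard_normal((d, d))
    return B @ B.T + 0.1 * np.eye(d)

rng = np.random.default_rng(0)
d, mu = 4, 0.3
bound = d * mu * (1.0 - np.log(mu))

# The smallest sampled value should not fall below the bound.
values = [lemma1_lhs(random_spd(rng, d), random_spd(rng, d), mu) for _ in range(200)]
min_slack = min(values) - bound

# Equality case: X Z = mu I.
X = random_spd(rng, d)
equality_gap = abs(lemma1_lhs(X, mu * np.linalg.inv(X), mu) - bound)
```

The reason is visible eigenvalue by eigenvalue: for each eigenvalue $t > 0$ of $XZ$, the scalar function $t - \mu \log t$ is minimized at $t = \mu$ with value $\mu(1 - \log \mu)$.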

Lemma 2 Suppose that an infinite sequence $\{w_j = (x_j, y_j, Z_j)\}$ is included in $\mathcal{L}(\alpha)$. Suppose also that the sequence $\{x_j\}$ is bounded. Then, the sequences $\{y_j\}$ and $\{Z_j\}$ are also bounded. In addition, the sequences $\{X(x_j)\}$ and $\{Z_j\}$ are uniformly positive definite.

Proof. Since $\{x_j\}$ is bounded, the sequence $\{-\log \det X(x_j)\}$ is bounded below. Thus, there exists a real number $M_1$ such that

\[ M_1 \leq f(x_j) + \frac{1}{2\mu} \|g(x_j)\|^2 - \mu \log \det X(x_j) = F_{BP}(x_j) \quad \text{for all } j. \tag{4.4} \]

It then follows from $w_j \in \mathcal{L}(\alpha)$ and the definition of $F$ that

\[ F_{PD}(w_j) = \frac{1}{\nu} (F(w_j) - F_{BP}(x_j)) \leq \frac{1}{\nu} (\alpha - M_1) \quad \text{for all } j, \tag{4.5} \]

which can be rewritten as

\[
\begin{aligned}
\frac{1}{2\mu} \|g(x_j) + \mu y_j\|^2 &\leq \frac{1}{\nu} (\alpha - M_1) - \langle X(x_j), Z_j \rangle + \mu \log \det X(x_j) Z_j \\
&\leq \frac{1}{\nu} (\alpha - M_1) - d\mu(1 - \log \mu),
\end{aligned}
\]

where the last inequality follows from Lemma 1 (a). Hence, the sequence $\{y_j\}$ is bounded.

Next we show that $\{X(x_j)\}$ is uniformly positive definite. From Lemma 1 (b) and (4.4), we have

\[ M_1 \leq F_{BP}(x_j) = F(w_j) - \nu F_{PD}(w_j) \leq \alpha - \nu F_{PD}(w_j) \leq \alpha - \nu d\mu(1 - \log \mu) \quad \text{for all } j, \]

and hence the sequence $\{F_{BP}(x_j)\}$ is bounded. It then follows from the boundedness of $\{x_j\}$ and $F_{BP}(x_j) = f(x_j) + \frac{1}{2\mu} \|g(x_j)\|^2 - \mu \log \det X(x_j)$ that $\{-\log \det X(x_j)\}$ is also bounded. From Proposition 4, the boundedness of $\{-\log \det X(x_j)\}$ and $\{X(x_j)\}$ implies that $\{X(x_j)\}$ is uniformly positive definite, that is, there exists $\lambda$ such that $\lambda_{\min}(X(x_j)) \geq \lambda > 0$ for all $j$.

Next we show that $\{Z_j\}$ is bounded. From Lemma 1 (b) and (4.5), we have

\[ d\mu(1 - \log \mu) \leq F_{PD}(w_j) \leq \frac{1}{\nu} (\alpha - M_1) \quad \text{for all } j, \]

and hence the sequence $\{F_{PD}(w_j)\}$ is bounded. It then follows from Lemma 1 (c) that $\{\langle X(x_j), Z_j \rangle\}$ is bounded. Thus, there exists a real number $M_2$ such that for all $j$,

\[ M_2 \geq \mathrm{tr}(X(x_j) Z_j) \geq \lambda_{\min}(X(x_j)) \, \mathrm{tr}(Z_j) \geq \lambda \, \mathrm{tr}(Z_j) = \lambda \sum_{k=1}^{d} \lambda_k(Z_j), \tag{4.6} \]

where the second inequality follows from [2, Proposition 8.4.13]. Since $Z_j$ is positive definite, $\lambda_k(Z_j) > 0$ for $k = 1, \ldots, d$. It then follows from (4.6) that $\{\lambda_k(Z_j)\}$ is bounded for $k = 1, \ldots, d$, and hence $\{Z_j\}$ is bounded.

Finally, we show that $\{Z_j\}$ is uniformly positive definite. Recall that

\[ F_{PD}(w_j) = \frac{1}{2\mu} \|g(x_j) + \mu y_j\|^2 + \langle X(x_j), Z_j \rangle - \mu \log \det X(x_j) - \mu \log \det Z_j, \]

and that the sequences $\{x_j\}$, $\{y_j\}$, $\{\langle X(x_j), Z_j \rangle\}$, $\{-\log \det X(x_j)\}$ and $\{F_{PD}(w_j)\}$ are bounded. Therefore, $\{-\log \det Z_j\}$ is also bounded. It then follows from Proposition 4 and the boundedness of $\{Z_j\}$ that $\{Z_j\}$ is uniformly positive definite. □

We now give sufficient conditions under which any level set of the merit function $F$ is bounded.

Theorem 4 Suppose that the following five assumptions hold.

(i) The function $f$ is convex;

(ii) The functions $g_1, \ldots, g_m$ are affine;

(iii) The function $X$ satisfies $X(\lambda u + (1 - \lambda) v) - \lambda X(u) - (1 - \lambda) X(v) \succeq 0$ for $\lambda \in [0, 1]$ and $u, v \in \Omega$;

(iv) The matrices $A_1(x), \ldots, A_n(x)$ are linearly independent for all $x \in \Omega$;

(v) There exists a shifted perturbed KKT point.

Then, the level set $\mathcal{L}(\alpha)$ of $F$ is bounded for all $\alpha \in \mathbb{R}$.

Proof. Let (xk, yk, Zk) be infinite sequence in L(α). We first show that the sequence xkis bounded. In order to prove this by contradiction, we suppose that there exists a subsetI ⊂ 0, 1, . . . such that limk→∞,k∈I ∥xk∥ = ∞. Without loss of generality, we suppose that∥xk∥ > 1 for all k ∈ I. From the assumption (v), there exists a shifted perturbed KKT pointw∗ = (x∗, y∗, Z∗). Let u : Rn → Rn be defined by

u(xk) ≡1

∥xk∥xk +

(1− 1

∥xk∥

)x∗. (4.7)

Then, since ∥xk∥ > 1, we have

∥u(xk)∥ =

∥∥∥∥ 1

∥xk∥xk +

(1− 1

∥xk∥

)x∗∥∥∥∥ ≤ 1 +

(1− 1

∥xk∥

)∥x∗∥ < 1 + ∥x∗∥ for all k ∈ I,

which implies that the sequence u(xk)I is bounded. Therefore, there exists at least oneaccumulation point of u(xk)I . Let u∗ be an accumulation point of u(xk)I . Then, thereexists a subset J ⊂ I such that limk→∞,k∈J u(xk) = u∗. From the definition (4.7) of u, weobtain

∥u(xk)− x∗∥ =∥xk − x∗∥

∥xk∥≥

∣∣∣∣∥xk∥ − ∥x∗∥∥xk∥

∣∣∣∣ = ∣∣∣∣1− ∥x∗∥∥xk∥

∣∣∣∣ for all k ∈ J ,

and hence

∥u∗ − x∗∥ = limk→∞,k∈J

∥u(xk)− x∗∥ ≥ limk→∞,k∈J

∣∣∣∣1− ∥x∗∥∥xk∥

∣∣∣∣ = 1. (4.8)

Since w∗ is the shifted perturbed KKT point, we have from Theorem 2 that

0 = ∇xL(w∗) = ∇f(x∗) + 1

µJg(x

∗)⊤g(x∗)− 1

µA∗(x∗)X(x∗)−1 = ∇FBP (x

∗). (4.9)

16

Note that FBP is strictly convex from Proposition 3 (c) and the assumptions (i)–(iv). It thenfollows from (4.9) that x∗ is the unique global minimizer of FBP . Thus, (4.8) implies that

0 < FBP (u∗)− FBP (x

∗). (4.10)

Let γk ≡ FBP (u(xk)) − FBP (x∗) and γ∗ ≡ FBP (u

∗) − FBP (x∗). Note that γ∗ is positive from

(4.10) and limk→∞,k∈J γk = γ∗. Therefore, for any ε ∈ (0, γ∗), there exists a positive integer k0such that |γk − γ∗| < ε for all k ∈ J such that k ≥ k0, which yields that

0 < γ∗ − ε < γk = FBP (u(xk))− FBP (x∗) < γ∗ + ε for all k ∈ J such that k ≥ k0. (4.11)

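By the definition of $F$ and Lemma 1 (b), we have

$$F_{BP}(x_k) + \nu d\mu(1 - \log\mu) \le F_{BP}(x_k) + \nu F_{PD}(w_k) = F(w_k) \le \alpha \quad \text{for all } k \in J,$$

which implies that

$$F_{BP}(x_k) = f(x_k) + \frac{1}{2\mu}\|g(x_k)\|^2 - \mu \log\det X(x_k) \le \beta \quad \text{for all } k \in J, \tag{4.12}$$

where $\beta \equiv \alpha - \nu d\mu(1 - \log\mu)$. From the convexity of $F_{BP}$, we obtain

$$F_{BP}(u(x_k)) \le \frac{1}{\|x_k\|} F_{BP}(x_k) + \left(1 - \frac{1}{\|x_k\|}\right) F_{BP}(x^*) \quad \text{for all } k \in J,$$

which means that

$$F_{BP}(x^*) + \left(F_{BP}(u(x_k)) - F_{BP}(x^*)\right)\|x_k\| \le F_{BP}(x_k) \quad \text{for all } k \in J. \tag{4.13}$$

It then follows from (4.11), (4.12) and (4.13) that

$$F_{BP}(x^*) + (\gamma^* - \varepsilon)\|x_k\| < F_{BP}(x^*) + \gamma_k\|x_k\| \le \beta \quad \text{for all } k \in J \text{ such that } k \ge k_0.$$

Rearranging the above inequality, we have

$$\|x_k\| \le \frac{\beta - F_{BP}(x^*)}{\gamma^* - \varepsilon} < \infty \quad \text{for all } k \in J \text{ such that } k \ge k_0.$$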

The inequality contradicts $\|x_k\| \to \infty$ ($k \to \infty$). Hence, for any sequence $\{(x_k, y_k, Z_k)\} \subset L(\alpha)$, the sequence $\{x_k\}$ is bounded. Since $\{x_k\}$ is bounded and $\{F(w_k)\}$ is bounded above, it follows from Lemma 2 that the sequences $\{y_k\}$ and $\{Z_k\}$ are also bounded. □
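The two bounds on the auxiliary point $u(x_k)$ used in the proof can be checked numerically. The following is only an illustrative sketch: the vector `x_star` and the test points are hypothetical values chosen for the demonstration and are not taken from the paper.

```python
import math

def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(t * t for t in v))

def u(x, x_star):
    """u(x) = x/||x|| + (1 - 1/||x||) x_star, as in (4.7); assumes ||x|| > 1."""
    nx = norm(x)
    return [xi / nx + (1.0 - 1.0 / nx) * si for xi, si in zip(x, x_star)]

x_star = [0.5, -1.0]                      # hypothetical KKT point
for scale in [10.0, 1e3, 1e6]:
    x = [3.0 * scale, 4.0 * scale]        # ||x|| = 5 * scale, growing without bound
    ux = u(x, x_star)
    dist = norm([a - b for a, b in zip(ux, x_star)])
    # ||u(x)|| stays below 1 + ||x*|| although ||x|| grows
    assert norm(ux) < 1.0 + norm(x_star)
    # ||u(x) - x*|| >= |1 - ||x*|| / ||x|||, which tends to 1
    assert dist >= abs(1.0 - norm(x_star) / norm(x)) - 1e-12
    print(scale, norm(ux), dist)
```

As the scale grows, the printed distance approaches 1 from above, which is the content of (4.8).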

Due to Theorems 2–4, we can solve the unconstrained minimization problem (4.3) by any descent method, such as the quasi-Newton method or the steepest descent method, and hence obtain an approximate shifted perturbed KKT point $v_{k+1}$ in Step 1 of Algorithm 1.

Remark 2 The level boundedness of the merit function for the nonlinear programming is not given in the original paper [4]. Applying Theorem 4, it is easy to show that the merit function $M$ in [4] is level bounded if the objective function $f$ is convex, the constraint functions $c_i$ ($i \in E$) are affine, and $\mathrm{rank}(J_c) = n$.

Remark 3 Kato, Yabe and Yamashita [10] showed that their merit function $F$ is differentiable and that its stationary point is a shifted perturbed KKT point. However, they did not present the level boundedness of their merit function.


4.2 Newton algorithm for minimization of the merit function

In this subsection, we propose a Newton type method for the unconstrained minimization problem (4.3) of the merit function $F$.

We exploit the scaling of $X(x)$ and $Z$ discussed in Subsection 2.1. Let $T \in \mathbb{R}^{d\times d}$ be a nonsingular matrix such that

$$T X(x) T^\top\, T^{-\top} Z T^{-1} = T^{-\top} Z T^{-1}\, T X(x) T^\top. \tag{4.14}$$

Let $\tilde{X}(x)$ and $\tilde{Z}$ be defined by

$$\tilde{X}(x) = T X(x) T^\top = (T \odot T) X(x), \qquad \tilde{Z} = T^{-\top} Z T^{-1} = (T^{-\top} \odot T^{-\top}) Z,$$

respectively. Note that $\tilde{X}(x)$ and $\tilde{Z}$ commute, that is, $\tilde{X}(x)\tilde{Z} = \tilde{Z}\tilde{X}(x)$, from (4.14). As seen later, the scaling enables us to analyze and calculate a Newton direction easily. In the subsequent discussions, for simplicity, we denote $X(x)$ and $\tilde{X}(x)$ by $X$ and $\tilde{X}$, respectively.

Next, we give a Newton direction and show that it is a descent direction for the merit function $F$. The Newton direction is derived from the nonlinear equations $r(w;\mu) = 0$ in the shifted perturbed KKT conditions (2.13). However, the matrix $\Delta Z$ of a pure Newton direction $(\Delta x, \Delta y, \Delta Z)$ for $r(w;\mu) = 0$ is not necessarily symmetric, because the product $XZ$ in $XZ - \mu I = 0$ is not symmetric in general. Thus, we consider the following symmetrized shifted perturbed KKT conditions with the scaling.

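$$r_S(w;\mu) \equiv \begin{bmatrix} \nabla_x L(w) \\ g(x) + \mu y \\ \tilde{X} \circ \tilde{Z} - \mu I \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{4.15}$$

and

$$\tilde{X} \succ 0, \quad \tilde{Z} \succ 0.$$

Note that $\tilde{X} \circ \tilde{Z} - \mu I = 0$ is equivalent to $XZ - \mu I = 0$ if $X$ and $Z$ are symmetric positive semidefinite [26]. Moreover, $\tilde{X}(x) \succ 0$ and $\tilde{Z} \succ 0$ if and only if $X(x) \succ 0$ and $Z \succ 0$. Therefore, the symmetrized shifted perturbed KKT conditions (4.15) are essentially the same as the original shifted perturbed KKT conditions (2.13).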

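We apply the Newton method to the equations (4.15). Before giving the concrete Newton equations, we provide a first order approximation of $\tilde{X} \circ \tilde{Z} - \mu I$ at $(x + \Delta x, Z + \Delta Z)$. From Proposition 2 (a), it is written as

$$\tilde{X} \circ \tilde{Z} - \mu I + (\tilde{Z} \odot I)(T \odot T)\mathcal{A}(x)\Delta x + (\tilde{X} \odot I)(T^{-\top} \odot T^{-\top})\Delta Z. \tag{4.16}$$

Let $\Delta X$ be defined as

$$\Delta X \equiv \sum_{i=1}^{n} \Delta x_i A_i(x) = \mathcal{A}(x)\Delta x,$$

and let $\Delta\tilde{X}$ and $\Delta\tilde{Z}$ be the scaled matrices of $\Delta X$ and $\Delta Z$ with $T$, that is,

$$\Delta\tilde{X} \equiv T \Delta X T^\top = (T \odot T)\Delta X \quad \text{and} \quad \Delta\tilde{Z} \equiv T^{-\top} \Delta Z T^{-1} = (T^{-\top} \odot T^{-\top})\Delta Z,$$

respectively. Then, the first order approximation (4.16) can be written as

$$\tilde{X} \circ \tilde{Z} - \mu I + (\tilde{Z} \odot I)\Delta\tilde{X} + (\tilde{X} \odot I)\Delta\tilde{Z} = \frac{1}{2}(\tilde{X}\tilde{Z} + \tilde{Z}\tilde{X}) - \mu I + \frac{1}{2}(\tilde{Z}\Delta\tilde{X} + \Delta\tilde{X}\tilde{Z}) + \frac{1}{2}(\tilde{X}\Delta\tilde{Z} + \Delta\tilde{Z}\tilde{X}),$$

where we used $(\tilde{Z} \odot I)\Delta\tilde{X} = \frac{1}{2}(\tilde{Z}\Delta\tilde{X} + \Delta\tilde{X}\tilde{Z})$ and the analogous identity for $(\tilde{X} \odot I)\Delta\tilde{Z}$.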

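Consequently, the Newton equations for the nonlinear equations (4.15) are written as

$$G\Delta x - J_g(x)^\top \Delta y - \mathcal{A}^*(x)\Delta Z = -\nabla_x L(w), \tag{4.17}$$

$$J_g(x)\Delta x + \mu\Delta y = -g(x) - \mu y, \tag{4.18}$$

$$\tilde{Z}\Delta\tilde{X} + \Delta\tilde{X}\tilde{Z} + \tilde{X}\Delta\tilde{Z} + \Delta\tilde{Z}\tilde{X} = 2\mu I - \tilde{X}\tilde{Z} - \tilde{Z}\tilde{X}, \tag{4.19}$$

where $G$ denotes the Hessian matrix of the Lagrangian function $L$ with respect to $x$ or an approximation of it. In what follows, we call the solution $\Delta w \equiv (\Delta x, \Delta y, \Delta Z)$ of the Newton equations (4.17)–(4.19) the Newton direction.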

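Next, we give the explicit form of the Newton direction $\Delta w$. From (4.18), we have

$$\Delta y = -\frac{1}{\mu}\left(g(x) + \mu y + J_g(x)\Delta x\right). \tag{4.20}$$

Moreover, since $(\tilde{X} \odot I)\Delta\tilde{Z} = \frac{1}{2}(\tilde{X}\Delta\tilde{Z} + \Delta\tilde{Z}\tilde{X})$, $(\tilde{X} \odot I)(\mu\tilde{X}^{-1} - \tilde{Z}) = \frac{1}{2}(2\mu I - \tilde{Z}\tilde{X} - \tilde{X}\tilde{Z})$ and $(\tilde{Z} \odot I)\Delta\tilde{X} = \frac{1}{2}(\tilde{Z}\Delta\tilde{X} + \Delta\tilde{X}\tilde{Z})$, the equation (4.19) can be rewritten as

$$(\tilde{X} \odot I)\Delta\tilde{Z} + (\tilde{Z} \odot I)\Delta\tilde{X} = (\tilde{X} \odot I)(\mu\tilde{X}^{-1} - \tilde{Z}). \tag{4.21}$$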

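Since the matrix $X$ is positive definite and the scaling matrix $T$ is nonsingular, the matrix $\tilde{X} = TXT^\top$ is also positive definite. Therefore, the operator $(\tilde{X} \odot I)$ is invertible from Proposition 1 (a). Moreover, $\tilde{X}^{-1} = (TXT^\top)^{-1} = T^{-\top}X^{-1}T^{-1} = (T^{-\top} \odot T^{-\top})X^{-1}$. It then follows from (4.21) that

$$\begin{aligned} \Delta\tilde{Z} &= (\mu\tilde{X}^{-1} - \tilde{Z}) - (\tilde{X} \odot I)^{-1}(\tilde{Z} \odot I)\Delta\tilde{X} \\ &= (T^{-\top} \odot T^{-\top})(\mu X^{-1} - Z) - (\tilde{X} \odot I)^{-1}(\tilde{Z} \odot I)(T \odot T)\mathcal{A}(x)\Delta x, \end{aligned} \tag{4.22}$$

where the last equality follows from the definitions of the scaled matrices $\tilde{Z}$ and $\Delta\tilde{X}$. Since $\Delta\tilde{Z} = (T^{-\top} \odot T^{-\top})\Delta Z$ and $(T^{-\top} \odot T^{-\top})^{-1} = (T^\top \odot T^\top)$ from Proposition 1 (c), multiplying both sides of (4.22) by $(T^{-\top} \odot T^{-\top})^{-1}$ yields

$$\Delta Z = \mu X^{-1} - Z - (T^\top \odot T^\top)(\tilde{X} \odot I)^{-1}(\tilde{Z} \odot I)(T \odot T)\mathcal{A}(x)\Delta x. \tag{4.23}$$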
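The inverse operator $(\tilde{X} \odot I)^{-1}$ in (4.22) and (4.23) amounts to solving the Lyapunov-type equation $\frac{1}{2}(\tilde{X}V + V\tilde{X}) = R$. The sketch below shows one standard way to do this with an eigendecomposition of $\tilde{X}$. It assumes the definition $(P \odot Q)U = \frac{1}{2}(PUQ^\top + QUP^\top)$, which is consistent with the identities quoted before (4.21). The function names are illustrative and do not come from the paper.

```python
import numpy as np

def odot(P, Q, U):
    """Apply (P ⊙ Q) to U, assuming (P ⊙ Q)U = (P U Q^T + Q U P^T) / 2."""
    return 0.5 * (P @ U @ Q.T + Q @ U @ P.T)

def solve_x_odot_i(X, R):
    """Solve (X ⊙ I)V = R, i.e. (X V + V X)/2 = R, for symmetric positive definite X.

    With X = Q diag(lam) Q^T, the equation decouples entrywise in the eigenbasis:
    (lam_i + lam_j)/2 * Vt_ij = Rt_ij, where Vt = Q^T V Q and Rt = Q^T R Q.
    """
    lam, Q = np.linalg.eigh(X)
    Rt = Q.T @ R @ Q
    Vt = 2.0 * Rt / (lam[:, None] + lam[None, :])
    return Q @ Vt @ Q.T

rng = np.random.default_rng(0)
d = 4
B = rng.standard_normal((d, d))
X = B @ B.T + d * np.eye(d)               # symmetric positive definite test matrix
R = rng.standard_normal((d, d))
R = 0.5 * (R + R.T)                       # symmetric right-hand side
V = solve_x_odot_i(X, R)
residual = np.linalg.norm(odot(X, np.eye(d), V) - R)
print("residual:", residual)
```

Because all eigenvalues of a positive definite $\tilde{X}$ are positive, the denominators $\lambda_i + \lambda_j$ never vanish, which mirrors the invertibility stated in Proposition 1 (a).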

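Finally, we give the concrete form of $\Delta x$. Substituting (4.20) and (4.23) into (4.17), we obtain

$$\begin{aligned} \left(G + H + \frac{1}{\mu} J_g(x)^\top J_g(x)\right)\Delta x &= -\nabla_x L(w) - \frac{1}{\mu} J_g(x)^\top (g(x) + \mu y) + \mathcal{A}^*(x)(\mu X^{-1} - Z) \\ &= -\left(\nabla f(x) + \frac{1}{\mu} J_g(x)^\top g(x) - \mu\,\mathcal{A}^*(x)X^{-1}\right), \end{aligned} \tag{4.24}$$

where

$$H = \mathcal{A}^*(x)(T^\top \odot T^\top)(\tilde{X} \odot I)^{-1}(\tilde{Z} \odot I)(T \odot T)\mathcal{A}(x).$$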


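Note that $H$ is a linear operator from $\mathbb{R}^n$ to $\mathbb{R}^n$, and

$$Hu = \mathcal{A}^*(x)(T^\top \odot T^\top)(\tilde{X} \odot I)^{-1}(\tilde{Z} \odot I)(T \odot T)\mathcal{A}(x)u \quad \text{for all } u \in \mathbb{R}^n.$$

From the definitions of $\mathcal{A}(x)$ and $\mathcal{A}^*(x)$, the linear operator $H$ is regarded as the matrix whose $(i,j)$-th element is written as

$$H_{ij} = \left\langle A_i(x),\ (T^\top \odot T^\top)(\tilde{X} \odot I)^{-1}(\tilde{Z} \odot I)(T \odot T)A_j(x) \right\rangle. \tag{4.25}$$

Since $J_g(x)^\top J_g(x)$ is positive semidefinite, we can solve the linear equation (4.24) with respect to $\Delta x$ if $G + H$ is positive definite. Fortunately, $H$ is positive semidefinite as shown below.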

Lemma 3 Suppose that $X$ and $Z$ are symmetric positive definite. Then, $H$ is symmetric positive semidefinite. Furthermore, if $A_1(x), \ldots, A_n(x)$ are linearly independent for all $x \in \mathbb{R}^n$, then $H$ is symmetric positive definite.

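Proof. Since $\tilde{X}$ is positive definite, the operator $\tilde{X} \odot I$ is invertible from Proposition 1 (a). Let $u \in \mathbb{R}^n$ and $V = (\tilde{X} \odot I)^{-1}(T \odot T)\mathcal{A}(x)u$. Then, we have

$$\begin{aligned} \langle Hu, u \rangle &= \left\langle \mathcal{A}^*(x)(T^\top \odot T^\top)(\tilde{X} \odot I)^{-1}(\tilde{Z} \odot I)(T \odot T)\mathcal{A}(x)u,\ u \right\rangle \\ &= \left\langle (\tilde{Z} \odot I)(T \odot T)\mathcal{A}(x)u,\ (\tilde{X} \odot I)^{-1}(T \odot T)\mathcal{A}(x)u \right\rangle \\ &= \left\langle (\tilde{Z} \odot I)(\tilde{X} \odot I)(\tilde{X} \odot I)^{-1}(T \odot T)\mathcal{A}(x)u,\ (\tilde{X} \odot I)^{-1}(T \odot T)\mathcal{A}(x)u \right\rangle \\ &= \left\langle (\tilde{Z} \odot I)(\tilde{X} \odot I)V,\ V \right\rangle \ge 0, \end{aligned} \tag{4.26}$$

where the second equality follows from Proposition 1 (b) and the last inequality follows from Proposition 2 (b). Therefore, $H$ is positive semidefinite.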

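Next we show that $H$ is symmetric. From (4.25), we have

$$\begin{aligned} H_{ij} &= \left\langle A_i(x),\ (T^\top \odot T^\top)(\tilde{X} \odot I)^{-1}(\tilde{Z} \odot I)(T \odot T)A_j(x) \right\rangle \\ &= \mathrm{tr}\left(A_i(x)\,(T^\top \odot T^\top)(\tilde{X} \odot I)^{-1}(\tilde{Z} \odot I)(T \odot T)A_j(x)\right) \\ &= \mathrm{tr}\left((T^\top \odot T^\top)(\tilde{X} \odot I)^{-1}(\tilde{Z} \odot I)(T \odot T)A_j(x)\,A_i(x)\right) \\ &= \left\langle (T^\top \odot T^\top)(\tilde{X} \odot I)^{-1}(\tilde{Z} \odot I)(T \odot T)A_j(x),\ A_i(x) \right\rangle \\ &= \left\langle (T^\top \odot T^\top)(\tilde{X} \odot I)^{-1}(\tilde{Z} \odot I)(\tilde{X} \odot I)(\tilde{X} \odot I)^{-1}(T \odot T)A_j(x),\ A_i(x) \right\rangle \\ &= \left\langle (T^\top \odot T^\top)(\tilde{X} \odot I)^{-1}(\tilde{X} \odot I)(\tilde{Z} \odot I)(\tilde{X} \odot I)^{-1}(T \odot T)A_j(x),\ A_i(x) \right\rangle \\ &= \left\langle (T^\top \odot T^\top)(\tilde{Z} \odot I)(\tilde{X} \odot I)^{-1}(T \odot T)A_j(x),\ A_i(x) \right\rangle \\ &= \left\langle A_j(x),\ (T^\top \odot T^\top)(\tilde{X} \odot I)^{-1}(\tilde{Z} \odot I)(T \odot T)A_i(x) \right\rangle \\ &= H_{ji}, \end{aligned}$$

where the sixth equality follows from Proposition 2 (c) and the eighth equality follows from Proposition 1 (b).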

Furthermore, suppose that $A_1(x), \ldots, A_n(x)$ are linearly independent for all $x \in \mathbb{R}^n$ and $u \neq 0$. Then, we have $V = (\tilde{X} \odot I)^{-1}(T \odot T)\mathcal{A}(x)u \neq 0$. It follows from Proposition 2 (b) and (4.26) that $\langle Hu, u \rangle > 0$, i.e., $H$ is positive definite. □

Remark 4 In the case of the linear SDP, $A_1(x), \ldots, A_n(x)$ are usually supposed to be linearly independent for $x \in \mathbb{R}^n$. Then, $H$ is positive definite from Lemma 3.
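The statement of Lemma 3 can be checked numerically by assembling $H$ entry by entry from (4.25). The following sketch is only an illustration under stated assumptions: it uses the definition $(P \odot Q)U = \frac{1}{2}(PUQ^\top + QUP^\top)$, random symmetric matrices as hypothetical $A_i(x)$, and the scaling $T = X^{-1/2}$ so that $\tilde{X}$ and $\tilde{Z}$ commute. The helper names are not from the paper.

```python
import numpy as np

def odot(P, Q, U):
    """(P ⊙ Q)U = (P U Q^T + Q U P^T) / 2 (assumed definition of the operator)."""
    return 0.5 * (P @ U @ Q.T + Q @ U @ P.T)

def solve_odot_identity(Xt, R):
    """Return (Xt ⊙ I)^{-1} R for symmetric positive definite Xt."""
    lam, Q = np.linalg.eigh(Xt)
    return Q @ (2.0 * (Q.T @ R @ Q) / (lam[:, None] + lam[None, :])) @ Q.T

def build_H(A_list, X, Z, T):
    """Assemble H_ij = <A_i, (T^T⊙T^T)(X~⊙I)^{-1}(Z~⊙I)(T⊙T) A_j> as in (4.25)."""
    Tinv = np.linalg.inv(T)
    Xt = T @ X @ T.T                      # scaled X
    Zt = Tinv.T @ Z @ Tinv                # scaled Z
    I = np.eye(X.shape[0])
    n = len(A_list)
    H = np.empty((n, n))
    for j, Aj in enumerate(A_list):
        M = odot(T, T, Aj)                # (T ⊙ T) A_j
        M = odot(Zt, I, M)                # (Z~ ⊙ I) ...
        M = solve_odot_identity(Xt, M)    # (X~ ⊙ I)^{-1} ...
        M = odot(T.T, T.T, M)             # (T^T ⊙ T^T) ...
        for i, Ai in enumerate(A_list):
            H[i, j] = np.trace(Ai @ M)
    return H

rng = np.random.default_rng(0)
d, n = 4, 3

def rand_sym(k):
    M = rng.standard_normal((k, k))
    return 0.5 * (M + M.T)

C1, C2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))
X = C1 @ C1.T + np.eye(d)                 # hypothetical X(x), positive definite
Z = C2 @ C2.T + np.eye(d)                 # hypothetical Z, positive definite
lam, Q = np.linalg.eigh(X)
T = (Q / np.sqrt(lam)) @ Q.T              # T = X^{-1/2}, hence X~ = I
A_list = [rand_sym(d) for _ in range(n)]  # hypothetical A_1(x), ..., A_n(x)
H = build_H(A_list, X, Z, T)
min_eig = np.linalg.eigvalsh(0.5 * (H + H.T))[0]
print("symmetric:", np.allclose(H, H.T), " min eigenvalue:", min_eig)
```

With randomly generated $A_i(x)$ the matrices are linearly independent with probability one, so the smallest eigenvalue is expected to be strictly positive.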

To sum up the above discussion, we give the concrete formulae of the Newton direction $\Delta w$ in the following theorem.

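Theorem 5 Let $\mu > 0$ and $w = (x, y, Z) \in W$. Suppose that the matrix $G + H$ is positive definite. Then, the Newton equations (4.17)–(4.19) have the unique solution $\Delta w = (\Delta x, \Delta y, \Delta Z)$ such that

$$\Delta x = -\left(G + H + \frac{1}{\mu} J_g(x)^\top J_g(x)\right)^{-1} \left(\nabla f(x) + \frac{1}{\mu} J_g(x)^\top g(x) - \mu\,\mathcal{A}^*(x)X^{-1}\right), \tag{4.27}$$

$$\Delta y = -\frac{1}{\mu}\left(g(x) + \mu y + J_g(x)\Delta x\right),$$

$$\Delta Z = \mu X^{-1} - Z - (T^\top \odot T^\top)(\tilde{X} \odot I)^{-1}(\tilde{Z} \odot I)(T \odot T)\mathcal{A}(x)\Delta x.$$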

Proof. It is clear that $\frac{1}{\mu} J_g(x)^\top J_g(x)$ is positive semidefinite. Thus, the positive definiteness of $G + H$ and (4.24) yield that

$$\Delta x = -\left(G + H + \frac{1}{\mu} J_g(x)^\top J_g(x)\right)^{-1} \left(\nabla f(x) + \frac{1}{\mu} J_g(x)^\top g(x) - \mu\,\mathcal{A}^*(x)X^{-1}\right).$$

Furthermore, $\Delta y$ and $\Delta Z$ directly follow from (4.20) and (4.23), respectively. □
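The formulae of Theorem 5 can be exercised on a small example. The sketch below is a hedged illustration, not the authors' implementation: it assumes a hypothetical linear instance $X(x) = A_0 + \sum_i x_i A_i$, $f(x) = c^\top x$, $g(x) = Bx - b$ (so that $G = 0$), and the scaling $T = X^{-1/2}$, for which $\tilde{X} = I$ and the operator in (4.23) reduces to $\Delta X \mapsto \frac{1}{2}(Z\Delta X X^{-1} + X^{-1}\Delta X Z)$. It then verifies that the computed direction satisfies (4.17)–(4.19).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, d, mu = 3, 1, 4, 0.1

def rand_sym(k):
    M = rng.standard_normal((k, k))
    return 0.5 * (M + M.T)

# Hypothetical linear data: X(x) = A0 + sum_i x_i A_i, f(x) = c^T x, g(x) = B x - b.
A = [rand_sym(d) for _ in range(n)]
A0 = 10.0 * np.eye(d)
c, b = rng.standard_normal(n), rng.standard_normal(m)
B = rng.standard_normal((m, n))           # Jacobian J_g

def A_op(v):
    """A(x) v = sum_i v_i A_i."""
    return sum(vi * Ai for vi, Ai in zip(v, A))

def A_adj(U):
    """A*(x) U = (<A_1, U>, ..., <A_n, U>)."""
    return np.array([np.trace(Ai @ U) for Ai in A])

# An interior point w = (x, y, Z).
x = 0.1 * rng.standard_normal(n)
y = rng.standard_normal(m)
C = rng.standard_normal((d, d))
Z = C @ C.T + np.eye(d)
X = A0 + A_op(x)
Xinv = np.linalg.inv(X)
g = B @ x - b
grad_L = c - B.T @ y - A_adj(Z)           # gradient of the Lagrangian in x
G = np.zeros((n, n))                      # Hessian of L vanishes for linear data

# H from (4.25) with T = X^{-1/2}: H_ij = tr(A_i (Z A_j X^{-1} + X^{-1} A_j Z)) / 2.
H = np.array([[0.5 * np.trace(Ai @ (Z @ Aj @ Xinv + Xinv @ Aj @ Z)) for Aj in A]
              for Ai in A])

# Newton direction: (4.27), (4.20), (4.23).
rhs = -(c + B.T @ g / mu - mu * A_adj(Xinv))
dx = np.linalg.solve(G + H + B.T @ B / mu, rhs)
dy = -(g + mu * y + B @ dx) / mu
dX = A_op(dx)
dZ = mu * Xinv - Z - 0.5 * (Z @ dX @ Xinv + Xinv @ dX @ Z)

# Residuals of the Newton equations (4.17)-(4.19); (4.19) is checked in scaled form.
r17 = G @ dx - B.T @ dy - A_adj(dZ) + grad_L
r18 = B @ dx + mu * dy + g + mu * y
lam, Q = np.linalg.eigh(X)
T = (Q / np.sqrt(lam)) @ Q.T              # X^{-1/2}
Tinv = (Q * np.sqrt(lam)) @ Q.T           # X^{1/2}
Zt, dXt, dZt = Tinv @ Z @ Tinv, T @ dX @ T, Tinv @ dZ @ Tinv
r19 = Zt @ dXt + dXt @ Zt + 2.0 * dZt - (2.0 * mu * np.eye(d) - 2.0 * Zt)
print(np.linalg.norm(r17), np.linalg.norm(r18), np.linalg.norm(r19))
```

All three residual norms should be at the level of rounding error, and `dZ` is symmetric by construction.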

One of the main burdens in computing the Newton direction $\Delta w$ is the calculation of the operator $(\tilde{X} \odot I)^{-1}$ in (4.23) and (4.25). Note that $(\tilde{X} \odot I)^{-1}$ appears in (4.23) and (4.25) as $(\tilde{X} \odot I)^{-1}(\tilde{Z} \odot I)$. Hence, when $\tilde{X} = I$, it is clear that $(\tilde{X} \odot I)^{-1}(\tilde{Z} \odot I) = \tilde{Z} \odot I$. On the other hand, when $\tilde{X} = \tilde{Z}$, $(\tilde{X} \odot I)^{-1}(\tilde{Z} \odot I)$ is the identity mapping. Thus, if we choose the scaling matrix $T$ such that $\tilde{X} = I$ or $\tilde{X} = \tilde{Z}$, we do not have to handle the operator $(\tilde{X} \odot I)^{-1}$ explicitly. This is one of the reasons why we exploit the scaling. The choices of $T$ such that $\tilde{X} = I$ and $\tilde{X} = \tilde{Z}$ are well known as the HRVW/KSH/M direction and the NT direction, respectively.

(i) HRVW/KSH/M choice

Let $T = X^{-1/2}$. Then we have $\tilde{X} = I$ and $\tilde{Z} = X^{1/2} Z X^{1/2}$. This choice corresponds to the dual HRVW/KSH/M choice for the linear SDP [6, 12, 13].

(ii) NT choice

Let $T = W^{-1/2}$, where $W = X^{1/2}(X^{1/2} Z X^{1/2})^{-1/2} X^{1/2}$. Then we have $\tilde{X} = W^{-1/2} X W^{-1/2} = W^{1/2} Z W^{1/2} = \tilde{Z}$. This choice corresponds to the NT choice for the linear SDP [14, 15].

Next, we show that the Newton direction is a descent direction for the merit function $F$. For this purpose, we first show the following two lemmas.


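Lemma 4 Let $\mu > 0$ and $w = (x, y, Z) \in W$. Suppose that $G + H$ is positive definite. Let $\Delta x$ be given by (4.27). Then we have

$$\nabla F_{BP}(x)^\top \Delta x = -\Delta x^\top \left(G + H + \frac{1}{\mu} J_g(x)^\top J_g(x)\right)\Delta x \le 0.$$

Furthermore, $\nabla F_{BP}(x)^\top \Delta x = 0$ if and only if $\Delta x = 0$.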

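Proof. We easily see that $G + H + \frac{1}{\mu} J_g(x)^\top J_g(x)$ is positive definite from the positive definiteness of $G + H$. Since $\nabla F_{BP}(x) = \nabla f(x) + \frac{1}{\mu} J_g(x)^\top g(x) - \mu\,\mathcal{A}^*(x)X^{-1}$ from Theorem 2, it then follows from (4.24) that

$$\begin{aligned} \nabla F_{BP}(x)^\top \Delta x &= \Delta x^\top \left(\nabla f(x) + \frac{1}{\mu} J_g(x)^\top g(x) - \mu\,\mathcal{A}^*(x)X^{-1}\right) \\ &= -\Delta x^\top \left(G + H + \frac{1}{\mu} J_g(x)^\top J_g(x)\right)\Delta x \\ &\le 0. \end{aligned}$$

Furthermore, since $G + H + \frac{1}{\mu} J_g(x)^\top J_g(x)$ is positive definite, $\nabla F_{BP}(x)^\top \Delta x = 0$ if and only if $\Delta x = 0$. □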

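Lemma 5 Let $\mu > 0$ and $w = (x, y, Z) \in W$. Let $\Delta w = (\Delta x, \Delta y, \Delta Z)$ be given in Theorem 5. Then we have

$$\langle \nabla F_{PD}(w), \Delta w \rangle = -\frac{1}{\mu}\|g(x) + \mu y\|^2 - \left\|(\tilde{X}\tilde{Z})^{-1/2}(\mu I - \tilde{X}\tilde{Z})\right\|_F^2 \le 0.$$

Furthermore, $\langle \nabla F_{PD}(w), \Delta w \rangle = 0$ if and only if $g(x) + \mu y = 0$ and $\tilde{X}\tilde{Z} - \mu I = 0$.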

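Proof. From Theorem 2, we obtain

$$\begin{aligned} \langle \nabla F_{PD}(w), \Delta w \rangle &= \langle \nabla_x F_{PD}(w), \Delta x \rangle + \langle \nabla_y F_{PD}(w), \Delta y \rangle + \langle \nabla_Z F_{PD}(w), \Delta Z \rangle \\ &= \frac{1}{\mu}\Delta x^\top J_g(x)^\top (g(x) + \mu y) + \Delta x^\top \mathcal{A}^*(x)(Z - \mu X^{-1}) \\ &\quad + (g(x) + \mu y)^\top \Delta y + \left\langle X - \mu Z^{-1}, \Delta Z \right\rangle. \end{aligned} \tag{4.28}$$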

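On the other hand, we have from the definitions of $\mathcal{A}^*(x)$ and $\Delta X$ that

$$\begin{aligned} \Delta x^\top \mathcal{A}^*(x)(Z - \mu X^{-1}) &= \sum_{i=1}^{n} \Delta x_i \left\langle A_i(x), Z - \mu X^{-1} \right\rangle = \left\langle \sum_{i=1}^{n} \Delta x_i A_i(x),\ Z - \mu X^{-1} \right\rangle \\ &= \left\langle \mathcal{A}(x)\Delta x,\ Z - \mu X^{-1} \right\rangle = \left\langle \Delta X,\ Z - \mu X^{-1} \right\rangle. \end{aligned} \tag{4.29}$$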


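From Proposition 1 (c) and (b), we have

$$\begin{aligned} \left\langle \Delta X, Z - \mu X^{-1} \right\rangle &= \left\langle (T \odot T)^{-1}(T \odot T)\Delta X,\ Z - \mu X^{-1} \right\rangle \\ &= \left\langle (T^{-1} \odot T^{-1})(T \odot T)\Delta X,\ Z - \mu X^{-1} \right\rangle \\ &= \left\langle (T \odot T)\Delta X,\ (T^{-\top} \odot T^{-\top})(Z - \mu X^{-1}) \right\rangle. \end{aligned}$$

Moreover, since $\tilde{X}^{-1} = ((T \odot T)X)^{-1} = (TXT^\top)^{-1} = T^{-\top}X^{-1}T^{-1} = (T^{-\top} \odot T^{-\top})X^{-1}$, we further have

$$\begin{aligned} \left\langle \Delta X, Z - \mu X^{-1} \right\rangle &= \left\langle \Delta\tilde{X},\ \tilde{Z} - \mu\tilde{X}^{-1} \right\rangle = \left\langle \Delta\tilde{X},\ (I - \mu\tilde{X}^{-1}\tilde{Z}^{-1})\tilde{Z} \right\rangle \\ &= \mathrm{tr}\left(\Delta\tilde{X}(I - \mu\tilde{X}^{-1}\tilde{Z}^{-1})\tilde{Z}\right) \\ &= \mathrm{tr}\left((I - \mu\tilde{X}^{-1}\tilde{Z}^{-1})\tilde{Z}\Delta\tilde{X}\right). \end{aligned} \tag{4.30}$$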

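Since $\tilde{X}$ and $\tilde{Z}$ commute, $\tilde{X}^{-1}$ and $\tilde{Z}^{-1}$ also commute. Then we get

$$\begin{aligned} \mathrm{tr}\left((I - \mu\tilde{X}^{-1}\tilde{Z}^{-1})\tilde{Z}\Delta\tilde{X}\right) &= \mathrm{tr}\left(\tilde{Z}(I - \mu\tilde{Z}^{-1}\tilde{X}^{-1})\Delta\tilde{X}\right) \\ &= \mathrm{tr}\left(\tilde{Z}(I - \mu\tilde{X}^{-1}\tilde{Z}^{-1})\Delta\tilde{X}\right) \\ &= \mathrm{tr}\left((I - \mu\tilde{X}^{-1}\tilde{Z}^{-1})\Delta\tilde{X}\tilde{Z}\right). \end{aligned} \tag{4.31}$$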

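From (4.30) and (4.31), we obtain

$$\begin{aligned} \left\langle \Delta X, Z - \mu X^{-1} \right\rangle &= \frac{1}{2}\mathrm{tr}\left((I - \mu\tilde{X}^{-1}\tilde{Z}^{-1})\tilde{Z}\Delta\tilde{X}\right) + \frac{1}{2}\mathrm{tr}\left((I - \mu\tilde{X}^{-1}\tilde{Z}^{-1})\Delta\tilde{X}\tilde{Z}\right) \\ &= \frac{1}{2}\left\langle I - \mu\tilde{X}^{-1}\tilde{Z}^{-1},\ \tilde{Z}\Delta\tilde{X} \right\rangle + \frac{1}{2}\left\langle I - \mu\tilde{X}^{-1}\tilde{Z}^{-1},\ \Delta\tilde{X}\tilde{Z} \right\rangle. \end{aligned} \tag{4.32}$$

Note that $\langle X - \mu Z^{-1}, \Delta Z \rangle = \langle \Delta Z, X - \mu Z^{-1} \rangle$. In a way similar to the proof of (4.32), we also have

$$\left\langle X - \mu Z^{-1}, \Delta Z \right\rangle = \frac{1}{2}\left\langle I - \mu\tilde{X}^{-1}\tilde{Z}^{-1},\ \tilde{X}\Delta\tilde{Z} \right\rangle + \frac{1}{2}\left\langle I - \mu\tilde{X}^{-1}\tilde{Z}^{-1},\ \Delta\tilde{Z}\tilde{X} \right\rangle. \tag{4.33}$$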

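From (4.28), (4.29), (4.32) and (4.33), we obtain

$$\begin{aligned} \langle \nabla F_{PD}(w), \Delta w \rangle &= \frac{1}{\mu}(g(x) + \mu y)^\top (J_g(x)\Delta x + \mu\Delta y) \\ &\quad + \frac{1}{2}\left\langle I - \mu\tilde{X}^{-1}\tilde{Z}^{-1},\ \tilde{Z}\Delta\tilde{X} + \Delta\tilde{X}\tilde{Z} + \tilde{X}\Delta\tilde{Z} + \Delta\tilde{Z}\tilde{X} \right\rangle. \end{aligned} \tag{4.34}$$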

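Note that since $\tilde{X}$ and $\tilde{Z}$ are symmetric positive definite and commute, $\tilde{X}\tilde{Z}$ is symmetric positive definite, and hence $(\tilde{X}\tilde{Z})^{-1/2}$ exists. Then, by substituting (4.18) and (4.19) into (4.34), we have

$$\begin{aligned} \langle \nabla F_{PD}(w), \Delta w \rangle &= -\frac{1}{\mu}\|g(x) + \mu y\|^2 + \left\langle I - \mu\tilde{X}^{-1}\tilde{Z}^{-1},\ \mu I - \tilde{X}\tilde{Z} \right\rangle \\ &= -\frac{1}{\mu}\|g(x) + \mu y\|^2 - \left\langle (\tilde{Z}\tilde{X})^{-1}(\mu I - \tilde{Z}\tilde{X}),\ \mu I - \tilde{X}\tilde{Z} \right\rangle \\ &= -\frac{1}{\mu}\|g(x) + \mu y\|^2 - \left\langle (\tilde{X}\tilde{Z})^{-1/2}(\mu I - \tilde{X}\tilde{Z}),\ (\tilde{X}\tilde{Z})^{-1/2}(\mu I - \tilde{X}\tilde{Z}) \right\rangle \\ &= -\frac{1}{\mu}\|g(x) + \mu y\|^2 - \left\|(\tilde{X}\tilde{Z})^{-1/2}(\mu I - \tilde{X}\tilde{Z})\right\|_F^2 \\ &\le 0, \end{aligned}$$

where the third equality follows from the commutativity of $\tilde{X}$ and $\tilde{Z}$. Moreover, it is clear that $\langle \nabla F_{PD}(w), \Delta w \rangle = 0$ if and only if $g(x) + \mu y = 0$ and $\tilde{X}\tilde{Z} - \mu I = 0$. □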

Now, we show that the Newton direction $\Delta w$ is a descent direction for the merit function $F$.

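Theorem 6 Let $\mu > 0$ and $w = (x, y, Z) \in W$. Assume that $G + H$ is positive definite. Then, the direction $\Delta w = (\Delta x, \Delta y, \Delta Z)$ given in Theorem 5 is a descent direction for the merit function $F$, i.e.,

$$\begin{aligned} \langle \nabla F(w), \Delta w \rangle &= -\Delta x^\top \left(G + H + \frac{1}{\mu} J_g(x)^\top J_g(x)\right)\Delta x \\ &\quad - \frac{\nu}{\mu}\|g(x) + \mu y\|^2 - \nu\left\|(\tilde{X}\tilde{Z})^{-1/2}(\mu I - \tilde{X}\tilde{Z})\right\|_F^2 \\ &\le 0. \end{aligned}$$

Furthermore, $\langle \nabla F(w), \Delta w \rangle = 0$ if and only if $w$ is a shifted perturbed KKT point.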

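Proof. From Lemmas 4 and 5, we have

$$\begin{aligned} \langle \nabla F(w), \Delta w \rangle &= \nabla F_{BP}(x)^\top \Delta x + \nu \langle \nabla F_{PD}(w), \Delta w \rangle \\ &= -\Delta x^\top \left(G + H + \frac{1}{\mu} J_g(x)^\top J_g(x)\right)\Delta x \\ &\quad - \frac{\nu}{\mu}\|g(x) + \mu y\|^2 - \nu\left\|(\tilde{X}\tilde{Z})^{-1/2}(\mu I - \tilde{X}\tilde{Z})\right\|_F^2 \\ &\le 0. \end{aligned} \tag{4.35}$$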

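Now, we show the second part of this theorem. Suppose that $w$ is a shifted perturbed KKT point, i.e., $\nabla f(x) - J_g(x)^\top y - \mathcal{A}^*(x)Z = 0$, $g(x) + \mu y = 0$ and $XZ - \mu I = 0$. Then we have

$$\nabla f(x) + \frac{1}{\mu} J_g(x)^\top g(x) - \mu\,\mathcal{A}^*(x)X^{-1} = \nabla f(x) - J_g(x)^\top y - \mathcal{A}^*(x)Z = 0.$$

It then follows from (4.27) that $\Delta x = 0$. Moreover, we have $\langle \nabla F(w), \Delta w \rangle = 0$ from (4.35). Conversely, suppose that $\langle \nabla F(w), \Delta w \rangle = 0$. Since it follows from Lemmas 4 and 5 that $\nabla F_{BP}(x)^\top \Delta x \le 0$ and $\langle \nabla F_{PD}(w), \Delta w \rangle \le 0$, we have $\nabla F_{BP}(x)^\top \Delta x = 0$ and $\langle \nabla F_{PD}(w), \Delta w \rangle = 0$. It further follows from Lemmas 4 and 5 that $\Delta x = 0$, $g(x) + \mu y = 0$ and $\tilde{X}\tilde{Z} - \mu I = 0$, that is, $XZ - \mu I = 0$. Then we have from (4.27) that

$$\nabla_x L(w) = \nabla f(x) - J_g(x)^\top y - \mathcal{A}^*(x)Z = \nabla f(x) + \frac{1}{\mu} J_g(x)^\top g(x) - \mu\,\mathcal{A}^*(x)X^{-1} = 0.$$

Thus, $w$ is a shifted perturbed KKT point. □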

Theorem 6 guarantees that $F(w + \alpha\Delta w) < F(w)$ for sufficiently small $\alpha > 0$ if $w$ is not a shifted perturbed KKT point.


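Now, we discuss how to choose an appropriate step size $\alpha$ such that $F(w + \alpha\Delta w) < F(w)$. The merit function $F$ and the Newton equations (4.17)–(4.19) are well defined only on $W$. Therefore, the new point $w + \alpha\Delta w$ is required to be an interior point, and we must choose the step size $\alpha \in (0, 1]$ such that $X(x + \alpha\Delta x) \succ 0$ and $Z + \alpha\Delta Z \succ 0$. To this end, we first calculate

$$\bar{\alpha}_x = \begin{cases} -\dfrac{\tau}{\lambda_{\min}(X^{-1/2}\Delta X X^{-1/2})} & \text{if } \lambda_{\min}(X^{-1/2}\Delta X X^{-1/2}) < 0, \\ 1 & \text{otherwise} \end{cases}$$

and

$$\bar{\alpha}_z = \begin{cases} -\dfrac{\tau}{\lambda_{\min}(Z^{-1/2}\Delta Z Z^{-1/2})} & \text{if } \lambda_{\min}(Z^{-1/2}\Delta Z Z^{-1/2}) < 0, \\ 1 & \text{otherwise,} \end{cases}$$

where $\tau \in (0, 1)$ is a given constant. Set

$$\bar{\alpha} = \min\{1, \bar{\alpha}_x, \bar{\alpha}_z\}. \tag{4.36}$$

Then $Z + \alpha\Delta Z \succ 0$ for any $\alpha \in (0, \bar{\alpha}]$. Moreover, $X(x + \alpha\Delta x) \succ 0$ for any $\alpha \in (0, \bar{\alpha}]$ if $X$ is linear. Note that if $X$ is nonlinear, $X(x + \alpha\Delta x)$ is not necessarily positive definite for every $\alpha \in (0, \bar{\alpha}]$.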
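The step bound (4.36) needs only the smallest eigenvalue of a symmetrically scaled direction. A minimal sketch, with hypothetical function names and small diagonal test matrices chosen so that the result can be verified by hand:

```python
import numpy as np

def inv_sqrt(M):
    """M^{-1/2} for a symmetric positive definite matrix M."""
    lam, Q = np.linalg.eigh(M)
    return (Q / np.sqrt(lam)) @ Q.T

def max_step(M, dM, tau=0.95):
    """Step bound for M + alpha*dM: -tau/lambda_min if lambda_min < 0, else 1."""
    S = inv_sqrt(M)
    lam_min = np.linalg.eigvalsh(S @ dM @ S)[0]   # eigenvalues are sorted ascending
    return -tau / lam_min if lam_min < 0 else 1.0

# Hypothetical data: diagonal matrices make the eigenvalues easy to read off.
X, dX = np.diag([1.0, 2.0]), np.diag([-4.0, 1.0])   # lambda_min = -4
Z, dZ = np.eye(2), np.diag([0.5, -0.5])             # lambda_min = -0.5

alpha_x = max_step(X, dX)
alpha_z = max_step(Z, dZ)
alpha_bar = min(1.0, alpha_x, alpha_z)
print(alpha_x, alpha_z, alpha_bar)
```

Here the bound for $X$ is $0.95/4 = 0.2375$, which is the active one, and both $X + \bar{\alpha}\Delta X$ and $Z + \bar{\alpha}\Delta Z$ remain positive definite because $\tau < 1$.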

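Next we choose a step size $\alpha \in (0, \bar{\alpha}]$ such that $F(w + \alpha\Delta w) < F(w)$ and $X(x + \alpha\Delta x) \succ 0$. For this purpose, we adopt the following Armijo line search rule: find the smallest nonnegative integer $l$ such that

$$F(w + \bar{\alpha}\beta^l\Delta w) \le F(w) + \varepsilon_0\bar{\alpha}\beta^l \langle \nabla F(w), \Delta w \rangle, \qquad X(x + \bar{\alpha}\beta^l\Delta x) \succ 0,$$

and set $\alpha = \bar{\alpha}\beta^l$, where $\beta, \varepsilon_0 \in (0, 1)$. Note that the second condition is not necessary when $X$ is linear.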
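The backtracking rule above can be sketched generically. The code below is an illustration rather than the authors' implementation: the merit function is passed as a one-dimensional function of the step length, the interior condition as a predicate, and the toy function used in the demonstration is a hypothetical stand-in with a barrier at $t = 0.5$.

```python
import math

def armijo_step(F, slope, is_interior, alpha_bar, beta=0.95, eps0=0.5, max_backtracks=500):
    """Return alpha_bar * beta**l for the smallest l >= 0 satisfying both conditions.

    F(t)           : merit value along the search ray, t -> F(w + t * dw)
    slope          : <grad F(w), dw>, assumed to be negative
    is_interior(t) : True when X(x + t * dx) is positive definite
    """
    F0 = F(0.0)
    t = alpha_bar
    for _ in range(max_backtracks):
        # Check feasibility first so that F is never evaluated outside its domain.
        if is_interior(t) and F(t) <= F0 + eps0 * t * slope:
            return t
        t *= beta
    raise RuntimeError("no acceptable step found")

# Toy example: phi(t) = -log(1 - 2t) - 3t is defined for t < 0.5 and phi'(0) = -1.
phi = lambda t: -math.log(1.0 - 2.0 * t) - 3.0 * t
step = armijo_step(phi, slope=-1.0, is_interior=lambda t: t < 0.5,
                   alpha_bar=1.0, beta=0.5, eps0=0.5)
print(step)
```

In this toy run the trial steps 1 and 0.5 are rejected as not interior, 0.25 fails the sufficient decrease test, and 0.125 is accepted.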

Now, we describe a concrete Newton type method for Step 1 of Algorithm 1. Recall that the subscript $k$ denotes the $k$-th iteration of Algorithm 1.

Algorithm 2 (for Step 1 of Algorithm 1)

Step 0. Choose $\beta, \varepsilon_0, \tau \in (0, 1)$ and set $j = 0$ and $w_0 = v_k$.

Step 1. If $\rho(w_j, \mu_k) \le \sigma\mu_k$, then set $v_{k+1} = w_j$ and return.

Step 2. Obtain the Newton direction $\Delta w_j = (\Delta x_j, \Delta y_j, \Delta Z_j)$ by solving the Newton equations (4.17)–(4.19).

Step 3. Set $\alpha_j = \bar{\alpha}_j\beta^{l_j}$, where $\bar{\alpha}_j$ is given by (4.36) and $l_j$ is the smallest nonnegative integer such that

$$F(w_j + \bar{\alpha}_j\beta^{l_j}\Delta w_j) \le F(w_j) + \varepsilon_0\bar{\alpha}_j\beta^{l_j} \langle \nabla F(w_j), \Delta w_j \rangle, \qquad X(x_j + \bar{\alpha}_j\beta^{l_j}\Delta x_j) \succ 0.$$

Step 4. Set $w_{j+1} = w_j + \alpha_j\Delta w_j$ and $j = j + 1$, and go to Step 1.


4.3 Global convergence of Algorithm 2

In this subsection, we prove the global convergence of Algorithm 2. For this purpose, we make the following assumptions.

Assumptions

(A1) The functions $f, g_1, \ldots, g_m$ and $X$ are twice continuously differentiable.

(A2) The sequence $\{x_j\}$ generated by Algorithm 2 remains in some compact set $\Omega$ of $\mathbb{R}^n$.

(A3) The matrix $G_j + H_j + \frac{1}{\mu} J_g(x_j)^\top J_g(x_j)$ is uniformly positive definite and the sequence $\{G_j\}$ is bounded.

(A4) The sequences $\{T_j\}$ and $\{T_j^{-1}\}$ are bounded.

Note that Assumption (A2) holds under the assumptions of Theorem 4. Assumption (A3) guarantees that the Newton equations (4.17)–(4.19) have a unique solution.

Remark 5 Assumptions (A1)–(A3) hold for the linear SDP such that $A_1(x_j), \ldots, A_n(x_j)$ are linearly independent. In fact, it is clear that Assumption (A1) holds. Theorem 4 guarantees that Assumption (A2) holds. Moreover, $H_j$ is positive definite from Remark 4 and $G_j = 0$. Thus, Assumption (A3) holds.

Remark 6 Yamashita, Yabe and Harada [26] showed that their Newton type algorithm globally converges to a perturbed KKT point satisfying (2.11) and (2.12) under the boundedness of the sequence $\{y_j\}$ in addition to Assumptions (A1)–(A4). However, they do not give sufficient conditions for the boundedness of $\{y_j\}$.

Remark 7 Kato, Yabe and Yamashita [10] also showed that the Newton type algorithm with the merit function $F$ can find a shifted perturbed KKT point under the same assumptions. However, they do not give concrete sufficient conditions for Assumption (A2).

First of all, we show that the sequence $\{w_j\}$ generated by Algorithm 2 is bounded.

Lemma 6 Suppose that Assumption (A2) holds. Then, the sequence $\{w_j = (x_j, y_j, Z_j)\}$ generated by Algorithm 2 is bounded. Furthermore, the matrices $\{X_j\}$ and $\{Z_j\}$ are uniformly positive definite.

Proof. Since the sequence $\{F(w_j)\}$ is monotonically decreasing, we have $F(w_j) \le F(w_0)$ for all $j$. The desired results then follow from Assumption (A2) and Lemma 2. □

Note that the above lemma guarantees that Assumption (A4) holds if the scaling matrix $T$ is given by the HRVW/KSH/M choice or the NT choice.

Lemma 7 Suppose that Assumptions (A2)–(A4) hold. Then, the sequence $\{\Delta w_j\}$ generated by Algorithm 2 is bounded.


Proof. It follows from Assumptions (A2)–(A4), Lemma 6 and Theorem 5 that the sequence $\{\Delta w_j\}$ generated by Algorithm 2 is bounded. □

We now show the global convergence of Algorithm 2. Here, we suppose that Algorithm 2 generates an infinite sequence and that $w_j$ is not a shifted perturbed KKT point for all $j$.

Theorem 7 Suppose that Assumptions (A1)–(A4) hold. Then, the sequence $\{w_j = (x_j, y_j, Z_j)\}$ generated by Algorithm 2 has an accumulation point $w^* = (x^*, y^*, Z^*)$. Moreover, the accumulation point $w^*$ is a shifted perturbed KKT point.

Proof. Since the sequence $\{w_j\}$ is bounded from Lemma 6, it has at least one accumulation point $w^*$.

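Next, we prove that $w^*$ is a shifted perturbed KKT point. To this end, we first show that the sequence $\{\bar{\alpha}_j\}$ given in Step 3 of Algorithm 2 is bounded away from zero, that is, there exists a real number $\underline{\alpha}$ such that $0 < \underline{\alpha} \le \bar{\alpha}_j$ for all $j$. Note that from Lemmas 6 and 7, the sequences $\{X_j\}$, $\{Z_j\}$, $\{\Delta X_j\}$ and $\{\Delta Z_j\}$ are bounded. Moreover, the matrices $\{X_j\}$ and $\{Z_j\}$ are uniformly positive definite. Hence, the sequences $\{\lambda_{\min}(X_j^{-1/2}\Delta X_j X_j^{-1/2})\}$ and $\{\lambda_{\min}(Z_j^{-1/2}\Delta Z_j Z_j^{-1/2})\}$ are also bounded. It then follows from the definition of $\bar{\alpha}_j$ that there exists a real number $\underline{\alpha}$ such that $0 < \underline{\alpha} \le \bar{\alpha}_j$ for all $j$.

Next, we show $\langle \nabla F(w_j), \Delta w_j \rangle \to 0$ as $j \to \infty$. From the Armijo line search strategy in Step 3, we have

$$F(w_{j+1}) - F(w_j) \le \varepsilon_0\bar{\alpha}_j\beta^{l_j} \langle \nabla F(w_j), \Delta w_j \rangle, \qquad X(x_j + \bar{\alpha}_j\beta^{l_j}\Delta x_j) \succ 0.$$

Summing the first inequality over the iterations $i = 1, \ldots, j$, we have

$$F(w_{j+1}) - F(w_1) \le \varepsilon_0 \sum_{i=1}^{j} \bar{\alpha}_i\beta^{l_i} \langle \nabla F(w_i), \Delta w_i \rangle.$$

It then follows from $\langle \nabla F(w_i), \Delta w_i \rangle \le 0$ (Theorem 6) and $\underline{\alpha} \le \bar{\alpha}_i$ that

$$F(w_{j+1}) - F(w_1) \le \varepsilon_0\underline{\alpha} \sum_{i=1}^{j} \beta^{l_i} \langle \nabla F(w_i), \Delta w_i \rangle.$$

Since the sequence $\{w_j\}$ is bounded, the sequence $\{F(w_j)\}$ is also bounded, and hence

$$-\infty < \sum_{j=1}^{\infty} \beta^{l_j} \langle \nabla F(w_j), \Delta w_j \rangle \le 0.$$

Therefore, we have

$$\lim_{j\to\infty} \beta^{l_j} \langle \nabla F(w_j), \Delta w_j \rangle = 0.$$

Now we consider two cases: $\liminf_{j\to\infty} \beta^{l_j} > 0$ and $\liminf_{j\to\infty} \beta^{l_j} = 0$.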


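Case 1: $\liminf_{j\to\infty} \beta^{l_j} > 0$. Then, we have

$$\lim_{j\to\infty} \langle \nabla F(w_j), \Delta w_j \rangle = 0.$$

Case 2: $\liminf_{j\to\infty} \beta^{l_j} = 0$. In this case, there exists a subset $J \subset \{0, 1, \ldots\}$ such that $\lim_{j\to\infty,\,j\in J} l_j = \infty$. Since $\{X(x_j)\}$ is uniformly positive definite and $\{\Delta x_j\}$ is bounded, there exists $\bar{l}$ such that $X(x_j + \bar{\alpha}_j\beta^{l}\Delta x_j) \succ 0$ for all $l > \bar{l}$. Therefore, without loss of generality, we suppose that $X(x_j + \bar{\alpha}_j\beta^{l_j - 1}\Delta x_j) \succ 0$ for all $j \in J$. Furthermore, since $l_j - 1$ does not satisfy the Armijo rule in Step 3, we have

$$\varepsilon_0 t_j \langle \nabla F(w_j), \Delta w_j \rangle < F(w_j + t_j\Delta w_j) - F(w_j),$$

where $t_j \equiv \bar{\alpha}_j\beta^{l_j - 1}$. Let $h(t) \equiv F(w_j + t\Delta w_j)$. It then follows from the mean value theorem for $h$ that there exists $\theta_j \in (0, 1)$ such that

$$\begin{aligned} \varepsilon_0 t_j \langle \nabla F(w_j), \Delta w_j \rangle &< F(w_j + t_j\Delta w_j) - F(w_j) \\ &= h(t_j) - h(0) \\ &= t_j h'(\theta_j t_j) \\ &= t_j \langle \nabla F(w_j + \theta_j t_j\Delta w_j), \Delta w_j \rangle, \end{aligned}$$

which yields that

$$\begin{aligned} 0 < (\varepsilon_0 - 1) \langle \nabla F(w_j), \Delta w_j \rangle &< \langle \nabla F(w_j + \theta_j t_j\Delta w_j) - \nabla F(w_j), \Delta w_j \rangle \\ &\le \|\nabla F(w_j + \theta_j t_j\Delta w_j) - \nabla F(w_j)\|\,\|\Delta w_j\|, \end{aligned} \tag{4.37}$$

where the last inequality follows from the Cauchy-Schwarz inequality. Since $\{w_j\}$ and $\{\Delta w_j\}$ are bounded and $\lim_{j\to\infty,\,j\in J} t_j = 0$, we have from Assumption (A1)

$$\lim_{j\to\infty,\,j\in J} \|\nabla F(w_j + \theta_j t_j\Delta w_j) - \nabla F(w_j)\| = 0.$$

It then follows from (4.37) that

$$\lim_{j\to\infty,\,j\in J} \langle \nabla F(w_j), \Delta w_j \rangle = 0.$$

From both cases, we can conclude that

$$\lim_{j\to\infty} \langle \nabla F(w_j), \Delta w_j \rangle = 0. \tag{4.38}$$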

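From the boundedness of $\{w_j\}$ and Assumptions (A3) and (A4), there exists a subset $K \subset \{0, 1, \ldots\}$ such that

$$\lim_{j\to\infty,\,j\in K} w_j = w^*, \quad \lim_{j\to\infty,\,j\in K} G_j = G^*, \quad \lim_{j\to\infty,\,j\in K} T_j = T^*.$$

Moreover, from (2.1), the sequences $\{T_j \odot T_j\}_K$ and $\{T_j^\top \odot T_j^\top\}_K$ converge to $T^* \odot T^*$ and $(T^*)^\top \odot (T^*)^\top$, respectively. Then we have from (4.25) that

$$\lim_{j\to\infty,\,j\in K} H_j = H^*.$$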


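Note that the matrix $G^* + H^* + \frac{1}{\mu} J_g(x^*)^\top J_g(x^*)$ is positive definite from Assumption (A3). It then follows from (4.27) that the subsequence $\{\Delta x_j\}_K$ converges to $\Delta x^*$, where

$$\Delta x^* = -\left(G^* + H^* + \frac{1}{\mu} J_g(x^*)^\top J_g(x^*)\right)^{-1} \left(\nabla f(x^*) + \frac{1}{\mu} J_g(x^*)^\top g(x^*) - \mu\,\mathcal{A}^*(x^*)X(x^*)^{-1}\right).$$

Similarly, $\{\Delta y_j\}_K$ and $\{\Delta Z_j\}_K$ converge to $\Delta y^*$ and $\Delta Z^*$, where

$$\Delta y^* = -\frac{1}{\mu}\left(g(x^*) + \mu y^* + J_g(x^*)\Delta x^*\right),$$

$$\Delta Z^* = \mu X(x^*)^{-1} - Z^* - ((T^*)^\top \odot (T^*)^\top)(\tilde{X}(x^*) \odot I)^{-1}(\tilde{Z}^* \odot I)(T^* \odot T^*)\mathcal{A}(x^*)\Delta x^*,$$

and $\tilde{Z}^* = ((T^*)^{-\top} \odot (T^*)^{-\top})Z^*$. It then follows from (4.38) that

$$\langle \nabla F(w^*), \Delta w^* \rangle = 0.$$

Then, from Theorem 6, we have

$$\nabla_x L(w^*) = 0, \quad g(x^*) + \mu y^* = 0 \quad \text{and} \quad X(x^*)Z^* - \mu I = 0,$$

which means that $w^*$ is a shifted perturbed KKT point. □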

5 Numerical experiments

In this section, we report some numerical experiments for the proposed algorithm (Algorithm 1 with Algorithm 2). We compare the proposed algorithm with the interior point method [26] based on the perturbed KKT conditions. We present the number of iterations and the CPU time of both algorithms. The programs are written in MATLAB R2010a and run on a machine with an Intel Core i7 920 2.67GHz CPU and 3.00GB RAM. The parameter $\mu_k$ used in both algorithms is updated by $\mu_{k+1} = \mu_k/10$ with $\mu_0 = 0.1$. Moreover, we exploit the approximate Hessian $G_k$ updated by the Levenberg-Marquardt type algorithm [26, Remark 3]. We adopt the scaling matrix $T = X^{-1/2}$, and use the following parameters.

$$M_c = 3.5, \quad \nu = 1.0, \quad \tau = 0.95, \quad \beta = 0.95, \quad \varepsilon_0 = 0.50.$$

We solved the following four test problems used in [26] from the initial points indicated in [26].

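Gaussian channel capacity problem:

$$\begin{array}{ll} \text{minimize} & \dfrac{1}{2}\displaystyle\sum_{i=1}^{n} \log(1 + t_i), \\ \text{subject to} & \dfrac{1}{n}\displaystyle\sum_{i=1}^{n} X_{ii} \le P, \quad X_{ii} \ge 0, \quad t_i \ge 0, \\ & \begin{bmatrix} 1 - a_i t_i & \sqrt{r_i} \\ \sqrt{r_i} & a_i X_{ii} + r_i \end{bmatrix} \succeq 0 \quad (i = 1, \ldots, n), \end{array}$$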


where the decision variables are $X_{ii}$ and $t_i$ for $i = 1, \ldots, n$. In the experiment, the constants $r_i$ and $a_i$ for $i = 1, \ldots, n$ are randomly chosen from the interval $[0, 1]$, and $P$ is set to 1. Note that the objective function of the problem is nonconvex and the constraint functions are linear.

Minimization of the minimal eigenvalue problem:

$$\begin{array}{ll} \text{minimize} & \mathrm{tr}(\Pi M(q)), \\ \text{subject to} & \mathrm{tr}(\Pi) = 1, \\ & \Pi \succeq 0, \\ & q \in Q, \end{array}$$

where $Q \subset \mathbb{R}^p$, $M$ is a function from $\mathbb{R}^p$ to $S^n$, and the decision variables are $q \in \mathbb{R}^p$ and $\Pi \in S^n$. In the experiment, $p$ is set to 2, and the function $M$ is given by $M(q) \equiv q_1 q_2 M_1 + q_1 M_2 + q_2 M_3$, where $M_1, M_2, M_3 \in S^n$ are given constant matrices whose elements are randomly chosen from the interval $[-1, 1]$. The constraint region $Q$ is set to $[-1, 1] \times [-1, 1]$. Note that the objective function is nonconvex and the constraint functions are linear.

Nearest correlation matrix problem:

$$\begin{array}{ll} \underset{X \in S^n}{\text{minimize}} & \dfrac{1}{2}\|X - A\|_F^2, \\ \text{subject to} & X \succeq \epsilon I, \\ & X_{ii} = 1 \quad (i = 1, \ldots, n), \end{array}$$

where $A \in S^n$ is a given constant matrix, and $\epsilon \in \mathbb{R}$ is a given constant. Note that $X \succeq \epsilon I$ is equivalent to $X - \epsilon I \succeq 0$. In the experiment, elements of the matrix $A$ are randomly chosen from the interval $[-1, 1]$ with $A_{ii} = 1$ for $i = 1, \ldots, n$. Moreover, we set $\epsilon = 10^{-3}$. Note that the objective function is quadratic and the constraint functions are linear. Therefore, the problem is convex.
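A test instance of this form can be generated in a few lines. The sketch below is illustrative only: the random number generator, the seed and the function names are assumptions, and the instances used in the paper are those of [26], which are not reproduced here. The identity matrix serves as a simple feasible point for checking the constraints.

```python
import numpy as np

def make_instance(n, seed=0):
    """Random symmetric A with off-diagonal entries in [-1, 1] and unit diagonal."""
    rng = np.random.default_rng(seed)
    M = rng.uniform(-1.0, 1.0, size=(n, n))
    A = np.triu(M, 1)                     # strictly upper triangular part
    A = A + A.T                           # symmetrize
    np.fill_diagonal(A, 1.0)
    return A

def objective(X, A):
    """(1/2) ||X - A||_F^2."""
    return 0.5 * np.linalg.norm(X - A, "fro") ** 2

def is_feasible(X, eps=1e-3, tol=1e-10):
    """Check X - eps*I >= 0 (up to tol) and X_ii = 1."""
    lam_min = np.linalg.eigvalsh(X - eps * np.eye(len(X)))[0]
    return bool(lam_min >= -tol and np.allclose(np.diag(X), 1.0))

A = make_instance(5)
X0 = np.eye(5)                            # feasible starting guess
print(is_feasible(X0), objective(X0, A))
```

A random $A$ of this kind is typically indefinite, so it is not itself feasible and the constraint $X \succeq \epsilon I$ is active in a nontrivial way.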

Static output feedback (SOF) problem:

$$\begin{array}{ll} \text{minimize} & \mathrm{tr}(X), \\ \text{subject to} & P \succeq 0, \\ & F(Q)P + PF(Q)^\top + DD^\top \preceq 0, \\ & \begin{bmatrix} X & G(Q)P \\ PG(Q)^\top & P \end{bmatrix} \succeq 0, \end{array}$$

where $X \in S^{n_z \times n_z}$, $P \in S^{n_x \times n_x}$ and $Q \in \mathbb{R}^{n_u \times n_y}$ are decision variables, and the functions $F$ and $G$ are defined by

$$F(Q) = A + MQC \quad \text{and} \quad G(Q) = B + NQC.$$

Moreover, the matrices $A \in \mathbb{R}^{n_x \times n_x}$, $B \in \mathbb{R}^{n_z \times n_x}$, $C \in \mathbb{R}^{n_y \times n_x}$, $D \in \mathbb{R}^{n_x \times n_w}$, $M \in \mathbb{R}^{n_x \times n_u}$ and $N \in \mathbb{R}^{n_z \times n_u}$ are given constant matrices, and the elements of these matrices are randomly chosen from the interval $[0, 1]$. Since the objective function is linear and the constraint functions are nonconvex, the problem is nonconvex.

Table 1: Gaussian channel capacity problem

          Algorithm 1              SDPIP
  n       iteration   time(s)      iteration   time(s)
  5       19          0.37         19          0.42
  10      17          1.82         17          1.78
  15      22          9.52         21          8.41
  20      22          28.82        21          28.03
  25      39          129.14       36          130.51
  30      29          196.47       24          181.20
  35      31          443.46       27          388.13
  40      32          848.94       27          785.54

Table 2: Minimization of the minimal eigenvalue problem

Table 3: Nearest correlation matrix problem

Table 4: SOF-H2 problem

                    Algorithm 1              SDPIP
  Problem   n       iteration   time(s)      iteration   time(s)
  AC1       27      191         6.14         191         6.10
  AC2       39      142         9.32         142         9.28
  AC3       38      162         10.30        162         10.19
  AC6       64      182         51.16        182         52.25
  AC17      22      11          0.27         11          0.28
  HE1       15      12          0.19         12          0.19
  HE2       24      22          0.60         22          0.60
  HE3       115     245         223.74       245         223.52
  REA1      26      98          2.88         98          2.90
  DIS1      88      257         127.47       257         127.95
  DIS2      16      10          0.21         10          0.17
  DIS3      58      99          14.85        99          15.41
  DIS4      66      16          3.05         16          3.34
  AC4       13      54          0.80         54          0.78
  BDT1      96      145         96.54        145         102.84
  MFP       26      167         4.90         167         4.91
  EB1       59      9           2.24         9           2.37
  NN15      20      13          0.27         13          0.28
  PSM       49      87          11.29        87          11.27

For the termination criteria, we set $\varepsilon = 1.0\mathrm{e}{-4}$ for the Gaussian channel capacity problem, the minimization of the minimal eigenvalue problem and the nearest correlation matrix problem, and $\varepsilon = 1.0\mathrm{e}{-3}$ for the SOF-H2 problem.

We show the numerical results in Tables 1–4. In these tables, SDPIP denotes the interior point algorithm in [26]. From Tables 1–4, we see that Algorithm 1 is competitive with SDPIP.

6 Concluding remarks

In this paper, we have proposed a new merit function $F$ for the shifted perturbed KKT conditions and shown its properties. In particular, we established the level boundedness of the merit function $F$, which is not given in other related papers for the nonlinear SDP. Moreover, we have proposed a Newton type method (Algorithm 2) to find an approximate shifted perturbed KKT point, and we have proved its global convergence under weaker assumptions than those in [26]. In the numerical experiments, we have shown that Algorithm 1 is competitive with Algorithm SDPIP.

As future research, it is worthwhile to show that Algorithm 1 converges superlinearly under appropriate conditions.

References

[1] Alizadeh, F., Haeberly, J. A., Overton, M. L.: Primal-dual interior-point methods for semidefinite programming: convergence rates, stability and numerical results, SIAM Journal on Optimization 8, 746-768 (1998).

[2] Bernstein, D. S.: Matrix Mathematics, Princeton University Press, 2009.

[3] Fares, B., Noll, D., Apkarian, P.: Robust control via sequential semidefinite programming, SIAM Journal on Control and Optimization 40, 1791-1820 (2002).

[4] Forsgren, A., Gill, P. E.: Primal-dual interior methods for nonconvex nonlinear programming, SIAM Journal on Optimization 8, 1132-1152 (1998).

[5] Freund, R. W., Jarre, F., Vogelbusch, C. H.: Nonlinear semidefinite programming: sensitivity, convergence, and an application in passive reduced-order modeling, Mathematical Programming 109, 581-611 (2007).

[6] Helmberg, C., Rendl, F., Vanderbei, R. J., Wolkowicz, H.: An interior-point method for semidefinite programming, SIAM Journal on Optimization 6, 342-361 (1996).

[7] Hol, C. W. J., Scherer, C. W., Van der Meche, E. G., Bosgra, O. H.: A nonlinear SDP approach to fixed-order controller synthesis and comparison with two other methods applied to an active suspension system, European Journal of Control 9, 13-28 (2003).

[8] Horn, R. A., Johnson, C. R.: Matrix Analysis, Cambridge University Press, 1985.

[9] Kanzow, C., Nagel, C., Kato, H., Fukushima, M.: Successive linearization methods for nonlinear semidefinite programs, Computational Optimization and Applications 31, 251-273 (2005).

[10] Kato, A., Yabe, H., Yamashita, H.: An interior point method with a primal-dual quadratic barrier penalty function for nonlinear semidefinite programming, Technical Report, Department of Mathematical Information Science, Tokyo University of Science, 2012.

[11] Kocvara, M., Leibfritz, F., Stingl, M., Henrion, D.: A nonlinear SDP algorithm for static output feedback problems in COMPlib. LAAS-CNRS Research Report No. 04508, October 2004, LAAS Toulouse.

[12] Kojima, M., Shindoh, S., Hara, S.: Interior-point methods for the monotone semidefinite linear complementarity problem in symmetric matrices, SIAM Journal on Optimization 7, 86-125 (1997).

[13] Monteiro, R. D. C.: Primal-dual path-following algorithms for semidefinite programming, SIAM Journal on Optimization 7, 663-678 (1997).

[14] Nesterov, Y. E., Todd, M. J.: Self-scaled barriers and interior-point methods for convex programming, Mathematics of Operations Research 22, 1-42 (1997).

[15] Nesterov, Y. E., Todd, M. J.: Primal-dual interior-point methods for self-scaled cones, SIAM Journal on Optimization 8, 324-364 (1998).

[16] Overton, M. L.: Large-scale optimization of eigenvalues, SIAM Journal on Optimization 2, 88-120 (1992).

[17] Qi, H., Sun, D.: A quadratically convergent Newton method for computing the nearest correlation matrix, SIAM Journal on Matrix Analysis 28, 360-385 (2006).

[18] Quoc, T. D., Gumussoy, S., Michiels, W., Diehl, M.: Combining convex-concave decompositions and linearization approaches for solving BMIs, with application to static output feedback, IEEE Transactions on Automatic Control 57, 1377-1390 (2012).

[19] Stingl, M.: On the Solution of Nonlinear Semidefinite Programs by Augmented Lagrangian Methods, Doctoral Thesis, University of Erlangen, 2005.

[20] Stingl, M., Kocvara, M., Leugering, G.: A new non-linear semidefinite programming algorithm with an application to multidisciplinary free material optimization, International Series of Numerical Mathematics 158, 275-295 (2009).

[21] Todd, M. J.: Semidefinite optimization, Acta Numerica 10, 515-560 (2001).

[22] Todd, M. J., Toh, K. C., Tutuncu, R. H.: On the Nesterov-Todd direction in semidefinite programming, SIAM Journal on Optimization 8, 769-796 (1998).

[23] Vandenberghe, L., Boyd, S.: Semidefinite programming, SIAM Review 38, 49-95 (1996).

[24] Vandenberghe, L., Boyd, S., Wu, S. P.: Determinant maximization with linear matrix inequality constraints, SIAM Journal on Matrix Analysis and Applications 19, 499-533 (1998).

[25] Yamashita, H., Yabe, H.: An interior point method with a primal-dual quadratic barrier penalty function for nonlinear optimization, SIAM Journal on Optimization 14, 479-499 (2003).

[26] Yamashita, H., Yabe, H., Harada, K.: A primal-dual interior point method for nonlinear semidefinite programming, Mathematical Programming 135, 89-121 (2012).
