+ All Categories
Home > Documents > FUNCTION SPACE OPTIMIZATION

FUNCTION SPACE OPTIMIZATION

Date post: 13-Feb-2022
Category:
Upload: others
View: 1 times
Download: 0 times
Share this document with a friend
52
FUNCTION SPACE OPTIMIZATION Andreas Griewank Graduiertenkolleg 1128 Institut f¨ ur Mathematik Humboldt-Universit¨at zuBerlin Script, version June 4, 2008 By Levis Eneya & Lutz Lehmann Winter Semester 2007/2008
Transcript
Page 1: FUNCTION SPACE OPTIMIZATION

FUNCTION SPACE OPTIMIZATION

Andreas GriewankGraduiertenkolleg 1128Institut fur Mathematik

Humboldt-Universitat zu Berlin

Script, version June 4, 2008

By Levis Eneya & Lutz Lehmann

Winter Semester 2007/2008

Page 2: FUNCTION SPACE OPTIMIZATION

Contents

Overview 1

1 Existence Theory 21.1 Existence via Arzela-Ascoli . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.2 Existence via weak compactness of S . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 (Generalized) Differentiation 82.1 Directional derivatives for S ⊂ X Banach . . . . . . . . . . . . . . . . . . . . . . . . 82.2 Subdifferential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92.3 Generalized Jacobians and Hessians . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3 Frechet and Gateaux Differentiability 19

4 Tangent Cones and Sensitivity 284.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284.2 Basic Review of Adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314.3 Inequality Constraints via Cones . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

5 Duality Results 40

6 Remarks on the Two-Norm Discrepancy 45

Appendix 47

References 50

i

Page 3: FUNCTION SPACE OPTIMIZATION

Overview

We consider abstract mathematical optimization problems

minx∈S

f(x) for f : S ⊃ S → R

i.e. find x = x∗ ∈ S such that f∗ := f(x) ≤ f(x) for x ∈ S. We call S a feasible set (Jahn[Jah96]:constrained set). Generally S ⊂ X is an open convex domain in Banach space (X, ‖.‖).

Optimization Activities:

(0) Prove globally existence of a minimizer x.

(1) Characterize locally the minimality of x by necessary and sufficient optimality conditions.

(2) Approximate the problem by finite dimensional ”discretization”

minx∈Sn

f(x), Sn ⊂ Xn ⊂ X with dim(Xn) <∞

(3) Generate iterates x(k)n such that

limn→∞

(

limk→∞

x(k)n = xn

)

= x ∈ f−1(f∗).

(4) Analyze and implement fast algorithms for generating iterates x(k)n .

In this course we will mostly do (0) and (1) and some of (2) - (4). In terms of notation andorganization, we follow Jahn [Jah96], while finite dimensional terminology is as in Bonnans[BGLS06].For nonsmooth aspects, we follow Klatte & Kummer [KK02] and Clarke [Cla90]; and Ponstein[Pon80] for the abstract framework.

1

Page 4: FUNCTION SPACE OPTIMIZATION

1 Existence Theory

Definition. Any sequence xj∞j=0 ⊂ S such that f∗ ≡ infx∈S (f(x)) = limj→∞ f(xj) is calledinfimizing (minimizing) sequence.

f∗ always exists but may be −∞. General task: Show that some infimizing sequences have in somesense a limit x ∈ S such that f(x) = f∗. Classical scenario: f is continuous on closed and boundedS ⊂ Rn. This implies that any infimizing sequence has a convergent subsequence xjk → x ∈ S withf(x) = f∗.

1.1 Existence via Arzela-Ascoli

Proposition 1.1. dim(X) =∞ ⇐⇒ not every closed and bounded subset is sequentially compact.

Examples:

• ℓ2 ≡ space of square summable sequences ≡ R∞.

x = (xi)∞i=1 ∈ ℓ2 ⇐⇒ ‖x‖ =

√√√√

∞∑

i=1

x2i <∞

xj = ej = (0, 0, · · · , 0, 1︸︷︷︸

jth pos.

, 0, · · · ).

‖xj − xk‖ =√

2 if j 6= k.

=⇒ No subsequence of (xi)∞i=1 can be Cauchy.

• X = C[0, 1] ≡ space of continuous functions on [0, 1] with ‖x‖ = max |x(ω)|, ω ∈ [0, 1]e.g. xj = sin(jωπ) or xj = ωj.‖xj − xk‖ ≥ 1 =⇒ no Cauchy subsequence. C(Ω) is known to be a complete normed spacebut pointwise limit

x∗(ω) = limj→∞

xj(ω) = limj→∞

ωj =

0 if 0 ≤ ω < 1,1 if ω = 1.

=⇒ lim xj does not exist in C[0, 1].

Observation: In the second example, the slopes of the xj(ω) tend towards infinity near ω = 1.Question: Why does ωj contain no Cauchy subsequence?Pick j0 = ωj0. ∃ ω0 < 1 such that ωj0

0 > 12. But ωk

0 → 0, so that limk→∞ ‖xj0 − xk‖ ≥ 12.

General Assumptions:

(i) X = C(Ω) with Ω ⊂ Rd satisfying the cone condition used in Sobolev embedding.

(ii) Ω is compact in Rd.

Definition. S ⊂ X = C(Ω) is called equicontinuous if ∀ε > 0 ∃δ > 0 ∀ ω, ω′ ∈ Ω, ∀x ∈ S it holdsthat ‖ω′ − ω‖ < δ =⇒ |x(ω)− x(ω′)| < ε.

2

Page 5: FUNCTION SPACE OPTIMIZATION

Proposition 1.2. S ⊂ X = C(Ω) ≡ C0 is equicontinuous if it is contained in some subset of Cthat is bounded with respect to the Holder norm

‖x‖0,α = ‖x‖0 + supω 6=ω′

|x(ω)− x(ω′)|‖ω − ω′‖α (1.1)

for α ∈ (0, 1]. Thus γ = sup‖x‖0,1 : x ∈ S <∞ =⇒ S is equicontinuous in C(Ω).

Proof. Take δ = ε1/α.γ.

Proposition 1.3 (Arzela-Ascoli). S is precompact in X = C(Ω), i.e. the closure S is compact inX if and only if

(i) S is bounded, i.e. supx∈S(‖x‖) <∞

(ii) S is equicontinuous.

Proof. (See Lebedev & Vorovich[LV03])

Question: Why does C0,1([0, 1]) ≡ x ∈ C0[0, 1] : ‖x‖0,1 <∞ not have a compact unit ball? Inother words, is Proposition 1.1 really true?Note that from equation (1.1), any bounded sequence has limit x∗ ∈ C0[0, 1]. By ”intimidation”,‖x∗‖0,1 ≤ lim sup ‖xj‖0,∞ → x∗ ∈ C0,1[0, 1]. See Examples 1 and 2 in Appendix 1.

1.2 Existence via weak compactness of S

Observation: infimizing sequence xj∞j=1 ⊂ S ⊂ X need not have a strong accumulation point forthe existence of x∗ ∈ S with f(x∗) = infx∈S(f(x)).

Definition. A sequence xj ∈ X is called weakly convergent to some weak limit x∗ ifℓ(x∗) = limj→∞ ℓ(xj) for all ℓ ∈ X∗, i.e. ℓ is a continuous linear functional on X. Then one writesxj x∗.

Observation: ‖xj − x∗‖ → 0 ⇐⇒ xj → x∗ =⇒ xj x∗ because ℓ are by definition continuouswith respect to ‖.‖.

Definition. A set S ⊂ X is called weakly closed if S ⊃ xjj≥0 and xj x∗ =⇒ x∗ ∈ S.

Proposition 1.4. S is weakly closed =⇒ S closed.

Note: The converse is only true if dim(X) <∞ or S convex.

Proof. Exercise.

Definition. S is called weakly precompact if any sequence xj∞j=1 ⊂ S has a weakly convergentsubsequence xjk

x∗. S is called weakly compact if it is weakly precompact and weakly closed, sothat x∗ ∈ S.

Proposition 1.5. Bounded sets in Hilbert spaces and other reflexive Banach spaces are precompact(e.g. Lp(Ω), 1 < p <∞).

Recall: X is reflexive ⇐⇒ (X∗)∗ ∼= X. Unfortunately C(Ω) is not reflexive.

Definition. A function f : S → R is called weakly lower semi-continuous (w.l.s.c) ifxj x =⇒ f(x) ≤ lim infj→∞ f(xj), which implies lower semi-continuity.

3

Page 6: FUNCTION SPACE OPTIMIZATION

Definition. For a given f : S → R the set E(f) ≡ (x, α) ∈ S × R : f(x) ≤ α is called theepigraph of f on S.

Proposition 1.6. For S ⊂ X weakly closed and f : S → R, the following statements are equivalent:

(i) f is w.l.s.c on S.

(ii) E(f) is weakly closed.

(iii) Sα := x ∈ S : f(x) ≤ α is closed or empty for all levels α ∈ R.

Proof. .

• (i) =⇒ (ii): Consider (xj , αj)∞j=1 ⊂ E(f) with (xj , αj) (x, α). Then x ∈ S because S isweakly closed. Furthermore, lim αj = α because projection on last component is continuous.Thus, f(x) ≤ lim infj→∞(f(xj)) ≤ lim infj→∞(αj) = α =⇒ f(x) ≤ α andx ∈ S =⇒ (x, α) ∈ E(f) =⇒ E(f) is weakly closed.

• (ii) =⇒ (iii): E(f) and S × α weakly closed =⇒ E(f) ∩ S × α = Sα × α weaklyclosed.(because (x, α) ∈ E(f) ∩ S × α ⇐⇒ f(x) ≤ α ⇐⇒ (x, α) ∈ Sα × α)=⇒ Sα ≡ x ∈ S : f(x) ≤ α weakly closed

• (iii) =⇒ (i): Suppose xj x with f(x) > limj→∞ f(xj) =⇒ ∃α ∈ R such thatlimj→∞ f(xj) < α < f(x) =⇒ xj ∈ Sα but x /∈ Sα =⇒ Sα is not weakly closed, which is acontradiction.

Observation/Implications

Continuity ⇐ Weak-continuity

⇓ ⇓Lower-semi-continuity ⇐ Weak-lower-semi-continuity

Proposition 1.7. Let S be a bounded and weakly closed subset in a reflexive Banach space andif f : S → R be weakly lower semicontinuous on S. Then there exists an x∗ ∈ f−1(f∗); i.e.f(x∗) = infx∈S f(x).

Sufficient conditions for lower semicontinuity:

Assume the definitions of convex sets and convex functions are known.

Definition. f is called quasi-convex on S if Sα = x ∈ S : f(x) ≤ α is convex for any α ∈ R

Observation: g : R → R monotonic and f quasiconvex =⇒ g f quasi-convex, which appliesespecially if f is convex.

Example: f(x) = x for x ∈ R+ = x ∈ R, x ≥ 0; g = ln x =⇒ g f = ln x quasi-convex.Converse holds if g is strictly monotonic because then g−1 exists and is monotonic.

Proposition 1.8. If S is convex and closed; f continuous and quasi-convex, then f is w.l.s.c.

Proof. Sα is convex and closed for any α ∈ R =⇒ Sα is weakly closed ⇐⇒ f is w.l.s.c (byProposition 1.6)

4

Page 7: FUNCTION SPACE OPTIMIZATION

Bρ( x ) −

ρ/4B

x−

x

x~

x

ρ

.

.

.

Figure 1: Open Neighborhoods of x

Lemma. If f is convex and bounded on open neighborhood N ⊂ X of x then f is Lipschitz contin-uous near x.

Proof. Assume |f(x)| ≤ f for all x ∈ Bρ(x) (refer to Figure 1)

f(x) ≡ f

[

x

(

1− 2‖x− x‖ρ

)

+2‖x− x‖

ρ

(

x +(x− x)

‖x− x‖ ·ρ

2

)]

(‖x− x‖ ≤ ρ2

=⇒ ‖x− x‖ ≤ 2ρ2

= ρ)

≤(

1− 2‖x− x‖ρ

)

f(x) +2‖x− x‖

ρf(x).

≤ f(x) +2‖x− x‖

ρ(f(x)− f(x))

≤ f(x) +4f

ρ‖x− x‖

f(x) ≤ f(x) +

(4f

ρ

)

‖x− x‖

=⇒ |f(x)− f(x)| ≤ ‖x− x‖4fρ

Lemma. Let f : S → R be convex and bounded on some open neighborhood in S. Then f is boundedon some open neighborhood of any x ∈ S.

Proof. Suppose f is bounded on Bρ(x) (see Figures 2)∀x ∈ Bρ(x)∃y ∈ Bρ(x), where ρ = (1− 1

λ)ρ, ∃α ∈ [0, 1] such that x = (1− α)x + αy

=⇒ f(x) ≤ (1− α)f(x) + f(x) ≤ maxf(x), f(x) ≤ f(x) + supy∈Bρ(x)

f(y) <∞

Corollary: If f : S → R convex and bounded on some open neighborhood in S. =⇒ f is Lipschitzcontinuous on some neighborhood of any x ∈ int(S).

Application to approximation problem.

Lemma. S convex and closed =⇒ f convex and continuous =⇒ f is w.l.s.c.

5

Page 8: FUNCTION SPACE OPTIMIZATION

.

Bρ( x )− y

B~ρ

x = x + xλ (1−λ)^

for λ >1

x~

x

..

x

δS

Figure 2: Open Neighborhoods of x and x

Proof. By triangle inequality for norm.

Proposition 1.9. On a reflexive Banach space X, any closed convex set S is proximinal in that forany x ∈ X the distance f(x) = ‖x− x‖ : S → R attains a minimum.

Proof. Assume f(xn) → f∗ = infx∈S(f(x)) ≥ 0 monotonically. We may restrict xn to a convexclosed and bounded set

Sf(x0) = x ∈ S : ‖x− x‖ = f(x) ≤ ‖x0 − x‖

=⇒ Sf(x0) is weakly compact, f is w.l.s.c. =⇒ minimizer x xkj exists.

Question: Is x unique? Answer: Generally not. Counter example: X = R2, ‖.‖ = ‖.‖∞, S =x ∈ R2 : ‖x‖∞ = max|x1|, |x2| = 1. x = (2, 0), xα = (1, α) ∈ S with |α| ≤ 1; =⇒ ‖xα− x‖ =‖(−1, α)‖∞ = max1, |α| = 1

Converse: A Banach space X is reflexive if and only if all closed convex S ⊂ X are proximinal!

Proof. see Jahn [Jah96].

Definition. A norm ‖.‖ on a Banach space X is called uniformly convex if ∀ε > 0 ∃ ρ > 0 suchthat x, y ∈ X, ‖x‖ = ‖y‖ = 1 and ‖x− y‖ ≥ ε =⇒ ‖1

2(x + y)‖ ≤ 1− ρ

Example. In Hilbert space,‖1

2(x + y)‖2 = 1

4+ 1

4+ 1

2〈x, y〉 = 1− 1

4− 1

4+ 1

2〈x, y〉 = 1− ‖1

2(x− y)‖2 = 1− 1

4‖x− y‖2 ≤ 1− 1

4ε2.

Hence we may choose ρ = 14ε2

Conclusion: On a uniformly convex Banach space any approximation problem has a unique solutionand minimizing sequences converge strongly.

Summary: Implications for f : S → R on convex S.

• quasi-convex and continuity =⇒ w.l.s.c.

• convex and bounded =⇒ continuity and convexity =⇒ quasi convex and continuity.

• convex and bounded somewhere =⇒ convex and bounded anywhere =⇒ convex andbounded.

6

Page 9: FUNCTION SPACE OPTIMIZATION

Properties in reflexive Banach space X

(i) Bounded weakly closed sets are weakly compact.

(ii) Convex closed sets are proximinal ( i.e. minimizer not always unique).

(iii) There exists a topologically equivalent norm that is uniformly convex (Renorming Theorem).Non uniqueness on (R2, ‖.‖∞). (Rn, ‖.‖∞) topologically equivalent to (Rn, ‖.‖2) i.e ∃ c1, c2 :

1 ≤ ‖x‖1‖x‖2 ≤

√n. Most importantly, continuity and convergence properties are the same.

Proposition 1.10. Uniform convexity of ‖.‖ on X implies

(i) (xk x and ‖xn‖ → ‖x‖) ⇐⇒ xk → x ⇐⇒ ‖xk − x‖ → 0.

(ii) minimizer x of f(x) = ‖x− x‖ over x ∈ S (closed and convex) is unique.

(iii) Any infimizing sequence for (ii) converges strongly to x.

Proof. (i) ”⇐= ” obvious.” =⇒ ” for Hilbert case:

‖xk − x‖2 = 〈xk − x, xk − x〉 = ‖x‖2 − 2 〈x, xk〉+ ‖xk‖2limk→∞‖xk − x‖2 = ‖x‖2 − 2〈x, x〉+ lim

n→∞‖xk‖2 = ‖x‖2(1− 2 + 1) = 0

(ii) Suppose f(x1) = f(x2) = d(s, x) = infx∈S f(x), but x1 6= x2

x1 =x1 − x

d(s, x)6= x2 =

x2 − x

d(s, x)

=⇒ ε = ‖x2 − x1‖ 6= 0 =⇒ ∃ δ > 0 such that ‖12(x1 + x2)‖ ≤ 1− δ =⇒ f(1

2(x1 + x2)) =

d(s, x)‖12(x1 + x2)‖ ≤ d(s, x)(1− δ) which contradicts the assumed optimality of x1&x2.

(iii) Any subsequence of an infimizing sequence xk ⊂ S must have a weakly convergent subse-quence whose weak limit is x. Hence we have xk x ⇐⇒ xk − x x − x. By definitionof infimizing sequences f(xk) = ‖xk − x‖ →

k‖x − x‖ = infx∈S f(x). By (i) this implies

xk − x→ x− x ⇐⇒ xk → x.

7

Page 10: FUNCTION SPACE OPTIMIZATION

2 (Generalized) Differentiation

2.1 Directional derivatives for S ⊂ X Banach

Definition. For f : S → R, x ∈ S, h ∈ X set

f ′(x; h) = limλ→0+

f(x + λh)− f(x)

λ∈ R

provided x + λh ∈ S for 0 < λ ≤ λ > 0 for some λ = λ(h) > 0 and the limit exists. If f ′(x; h) iswell defined for all h ∈ X then f is said to be (Gateaux-) directionally differentiable at x.

Example: f(x1, x2) = x2 sin(|x1|) is directionally differentiable everywhere, even where x1 = 0.Counter example: Any not at least one-sided differentiable univariate function, e.g.

f(x) =

0 if x = 0

x sin(1/x) otherwise

Observation: f ′(x; h) is positively homogeneous i.e. f ′(x; αh) = αf ′(x; h) for α ≥ 0.

Proposition 2.1. For convex f : S → R we have

(i) f ′(x; h) ∈ R, exists uniquely if x + λh ∈ S for λ ∈ [−λ, λ].

(ii) f(x + h) ≥ f(x) + f ′(x; h), if (i) applies.

(iii) f ′(x; h + h) ≤ f ′(x; h) + f ′(x; h) (sublinearity) (i.e. subgradient by Hahn Banach).

Proof. After rescaling of h assume λ = 1. For fixed x, h consider ∆f(λ) = f(x+λh)−f(x), convex.

To bound ∆f(λ)λ

from below consider

0 = ∆f(0) = ∆f

[1

1 + λλ +

1

1 + λ(−λ)

]

≤ 1

1 + λ∆f(λ) +

λ

1 + λ∆f(−1), with λ > 0.

≤ ∆f(λ) + λ∆f(−1)

=⇒ ∆f(λ)

λ> −∆f(−1).

For any µ > λ,

∆f(λ) = ∆f

(µ− λ

µ· 0 +

λ

µ· u)

≤ µ− λ

µ∆f(0) +

λ

µ∆f(u) =⇒ ∆f(λ)

λ≤ ∆f(µ)

µfor 0 < λ ≤ µ.

Hence ∆f(λ)λ

is bounded below by −∆f(−1) and monotonically increasing on λ ∈ [0, 1]. Hence thereexists a unique limit f ′(x; h) = limλ→0

1λ∆f(λ). This proves (i). Part (ii) follows immediately from

monotonicity: f(x + h) = f(x) + ∆f(1)1≥ f(x) + f ′(x; h).

Finally for (iii):

f(x + λ(h + h)) = f(12(x + 2λh) + 1

2(x + 2λh))

≤ 12f(x + 2λh) + 1

2f(x + 2λh)

subtracting f divide by λ :

f(x + λ(h + h))− f(x)

λ≤ f(x + 2λh)− f(x)

2λ+

f(x + 2λh)− f(x)

=⇒ f ′(x, h + h) ≤ f ′(x; h) + f ′(x; h), as asserted.

8

Page 11: FUNCTION SPACE OPTIMIZATION

Definition. (i) S is called star-shaped with respect to x ∈ S if x ∈ S =⇒ αx + (1−α)x ∈ S forany 0 ≤ α ≤ 1.

(ii) f : S → R is called convex at x if ∀x ∈ S the real function f(αx + (1 − α)x) is well definedand convex.

(iii) h is a tangent of a star-shaped S at x if x + αh ∈ S only for 0 < α < α.

Proposition 2.2 (Directional first order optimality conditions). .

(i) x ∈ S can only be a (local) minimizer of f on S if all existing directional derivatives f ′(x; h)along tangents h of S at x are nonnegative.

(ii) x must be a global minimizer of f on star-shaped S if f is convex at x and all f ′(x; h) with htangent exist and are nonnegative.

Proof. (i) For any tangent h we have, by local minimality, f(x + λh)− f(x) ≥ 0 for small λ > 0,=⇒ f ′(x; h) ≥ 0 if it exists.

(ii) The convexity of f at x implies as in proof of Proposition 2.1 that 1λ[f(x + λh) − f(x)] is

monotonically increasing with respect to λ. However, this difference quotient need not bebounded below, so that existence had to be assumed. Then monotonicity implies that forλ = 1 and h = x − x, f(x) ≥ f(x + (x − x)) + f ′(x; x − x) ≥ f(x) since f ′(x; x − x) ≥ 0 byassumption.

Warning: For a polynomial f(x1, x2) one may have that f(λh) has local minimum at λ = 0 for anyh ∈ Rn\0 without x = 0 being a local minimizer.

Observation: The above first optimality conditions require inspection of f ′(x; h) for all h ∈ h ∈ X :‖h‖ = 1, which is dimensionally only one magnitude below examining all differences f(x)− f(x).In smooth finite dimensional case we look at ∇f(x) ∈ X.

2.2 Subdifferential

Definition. A linear functional g ∈ X∗ is called a subgradient of f : S → R at x if f(x) ≥f(x) + 〈g, x− x〉 for all x ∈ X. The set of all such subgradients is denoted by ∂f(x) and called thesubdifferential.

Proposition 2.3. (i) For f : S → R the condition 0 ∈ ∂f(x) is necessary and sufficient for globaloptimality.

(ii) ∂f(x) is a convex, closed and even weak∗ closed subset of X∗.

(iii) For f : S → R convex and x an interior point of S, the set ∂f(x) is nonempty and furthermorebounded if f is locally Lipschitz near x.

Proof. Pre-remark: X∗ ∋ x∗k∗x∗ ∈ X∗ ⇐⇒ 〈x∗k, z〉−→k 〈x∗, z〉 for all z ∈ X. Hence weak∗ and

weak convergence are equivalent if X and hence X∗ is reflexive.x∗k x∗ =⇒ 〈x∗k, g〉 = 〈x∗, g〉 for all g ∈ X∗∗ ⊃ X.Theorem of Alaoglu: Bounded sets in X∗ are weak∗ compact.

(i) Global optimality: f(x)− f(x) ≥ 0 for all x ∈ X =⇒ f(x)− f(x) ≥ 〈0, x− x〉 for all x ∈ X.

9

Page 12: FUNCTION SPACE OPTIMIZATION

(ii)

g, g ∈ ∂f(x) =⇒ 〈g, x− x〉 ≤ f(x)− f(x) ≥ 〈g, x− x〉=⇒ 〈αg + (1− α)g, x− x〉 ≤ f(x)− f(x) ⇐⇒ convexity.

(gk, x− x) ≤ f(x)− f(x) for all x ∈ X and let gk g

=⇒ limk→∞〈gk, x− x〉 = 〈g, x− x〉 ≤ f(x)− f(x).

(iii) As shown in Proposition 2.1 f ′(x; h) is for fixed x a sublinear function of h ∈ X. Hence thereexists, by Hahn-Banach, a linear functional g ∈ X∗ such that 〈g, h〉 ≤ f ′(x; h) for all h ∈ X.This makes g a subgradient since f ′(x; h) ≤ f(x + h) − f(x) by monotonicity of differencequotient. Furthermore, if f has local Lipschitz constant L, we have

|f ′(x; h)| = limλ→0

∣∣∣∣

f(x + λh)− f(x)

λ

∣∣∣∣≤ L‖λh‖

λ= L‖h‖

Then Hahn-Banach ensures that also ‖g‖ ≤ L. It remains to be shown (exercise) that not onlyone but all elements g ∈ ∂f(x) are bounded by L.

Corollary of (iii): Given ∂f(x) the directional derivatives f ′(x; h) are uniquely characterized as socalled support function f ′(x; h) = maxg∈∂f(x)〈g, h〉.Proof. f ′(x; h) ≥ 〈g, h〉 for all g follows from subgradient property. f ′(x; h) = 〈g, h〉 for someg = g(h) follows from version of Hahn-Banach where the linear functional g is prescribed on onevector h to be identical with sublinear bound 〈g, h〉 ≡ f ′(x; h).

Generalization to locally Lipschitz functions

Proposition 2.4. If f : X → R has Lipschitz constant L on some neighborhood of x then

(i) There exists for each h ∈ S a unique value

f 0(x; h) = lim supx→x; λց0

f(x + λh)− f(x)

λ∈ [−L‖h‖, L‖h‖]

(ii) f 0(x; h) is sublinear with respect to h and there exists (by Hahn-Banach) a generalized subdif-ferential ∂0f(x) = g ∈ X∗ : 〈g, h〉 ≤ f 0(x; h)

(iii) For x to be a local unconstrained minimizer of f in S it is necessary that 0 ∈ ∂0f(x; h).

Proof. (i) Existence of f 0(x; h) follows from boundedness.

(ii) Homogeneity f 0(x; ρh) = ρf 0(x, h) is also clear.Subadditivity for h, h ∈ X:

1λ[f(x + λ(h + h))− f(x)] = 1

λ[f(x + λh + λh)− f(x + λh)] + 1

λ[f(x + λh)− f(x)]

As x→ x and x + λh→ x we get λց 0

f 0(x; h + h) = lim sup 1λ[f(x + λ(h + h))− f(x)]

≤ lim sup 1λ[f(x + λh + λh))− f(x + λh)] + f 0(x, h)

= f 0(x, h) + f 0(x, h)

Existence of nonempty ∂0f(x) follows again by Hahn-Banach.

10

Page 13: FUNCTION SPACE OPTIMIZATION

(iii) 0 ≤ lim supx→x; λց01λ

[f(x + λh)− f(x)] (by optimality of f(x))

≤ lim supx=x; λց01λ[f(x + λh)− f(x)] ≡ f 0(x; h).

=⇒ 0 ≤ f 0(x; h) ≥ 〈0, h〉 for all h ∈ X, =⇒ 0 ∈ ∂0f(x).

Comment: The above is not a all sufficient because for f(x) = −|x| we have 0 ∈ ∂0f(0) sinceh ∈ +1,−1,

f 0(0, h) = lim supx→0;λց0

−|x + λh|+ |x|λ

If h = 1 & x < 0,

f 0(0, h) ≥ limxր0;λ<|x|

−x + x + λh

λ= h = 1

h = −1 =⇒ f 0(0, h) = −1 =⇒ ∂0f(0) = [−1, 1]

f 0(0; h) = |h| ≥ αh for any α ∈ [−1, 1]

As in convex case, there is a 1− 1 relation between the convex set ∂0f(x) and its support functionf 0(x; h) ≡ max〈g, h〉 : g ∈ ∂0f(x).

Proposition 2.5. Suppose f1 and f2 are Lipschitz near x then ∂0(f1 + f1)(x) ⊂ ∂0f1(x) + ∂0f2(x)where ∂0f1(x) + ∂0f2(x) = (g1 + g2) ∈ X∗ : gi ∈ ∂0fi(x) for i = 1, 2

Lemma. Any C ⊂ X∗ such that f 0(x; h) = max〈g, h〉 : g ∈ C is equal to the set of all generalizedgradients ∂0f(x).

Proof. If g ∈ ∂0(f1 + f2)(x) then for f = f1 + f2,

〈g, h〉 ≤ f 0(x; h) = lim supx→x; λց0

1λ[f1(x + λh) + f2(x + λh)− f1(x)− f2(x)]

≤ f 01 (x; h) + f 0

2 (x; h) for all h ∈ X

The RHS is the support function of convex set ∂0f1(x) + ∂0f2(x). For any h we can choose gi ∈∂0f1(x) such that 〈gi, h〉 = f 0

i (x; h) for i = 1, 2. Hence g can be decomposed into sum g = g1+g2.

Difficulty: Inclusion is in general not equality.Example: f1 = |x|; f2 = −|x|. f(x) = f1 + f2 = 0∂0f(0) = 0 ⊂ [−1, 1] + [−1, 1] = [−2, 2].

Lemma. For Lipschitz continuous f1 and f2 one has ∂0(f1 + f2) = ∂0f1(x) + ∂0f2(x)if f1 or f2 is strictly differentiable in that

f 0(x; h) = limx→x; λց0

f(x + λh)− f(x)

λ= 〈g; h〉,

for a unique g ∈ X∗.

Generalization: For n functions fi(x) we have

(n∑

i=1

fi(x)

)0

⊂n∑

i=1

∂0fi(x) ,

with equality holding if at most one of fi’s is not strictly differentiable.

11

Page 14: FUNCTION SPACE OPTIMIZATION

Proposition 2.6 (Chain Rule). F = (Fi)i=1,··· ,n : X → Rn, ϕ : Rn → R.Fi’s and ϕ are locally Lipschitz near x ∈ X and F (x) ∈ Rn, respectively.

∂0(ϕ F )(x) ⊂ co

n∑

i=1

αigi, gi ∈ ∂0Fi(x), (α1, · · · , αn) ∈ Rn ≡ (Rn)∗ ∈ ∂0(ϕ(F (x))

where co ≡ weak closure of convex hull.

Proof. See Clarke [Cla90].

Remark: Equality holds for example if n = 1 and ϕ strictly differentiable.

Example: f(x, y) = maxx, y = 12(x + y + |x− y|)

∂0f(0, 0) = 12(1, 1) + 1

2∂0(|x− y|)(0) by sum rule. ϕ = |.| =⇒ n = 1, α1 = α ∈ [−1, 1]

∂0f(0, 0) ≤ 12(1, 1) + 1

2coα(1,−1) : α ∈ [−1, 1]

= 12(1, 1) + 1

2(−1, 1) + 2α(1,−1) for α ∈ [0, 1]

= (0, 1) + α(1,−1) for α ∈ [0, 1]

= (α, 1− α) for 0 ≤ α ≤ 1 = ∂0f(0, 0)

Check support property:

f 0(0, h) = lim sup(x,y)→(0,0); hց0

max(x + λ∆x, y + λ∆y)−max(x, y)

λ

≥ ∆xα + ∆y(1− α) for 0 ≤ α ≤ 1.

Observation: ∂0(f1 + f2), ∂0f1, ∂

0f2 and ∂0(f1 + f2) are bounded, convex & closed and thus weak*compact subsets of X∗

Proposition 2.7. For any such G and G ∈ X∗ we have(with σ : X → R; G(σ) = g ∈ X∗ : 〈g, h〉 ≤ σ(h) for h ∈ X):

(i) σG(h) ≡ supg∈G〈g, h〉 = maxg∈G〈g, h〉, with σG(h) being sublinear =⇒

(ii) G ⊂ G ⇐⇒ σG(h) ≤ σG(h) for h ∈ X i.e. there is a 1-1 relation between G’s and associatedsupport functions.

(iii) Denoting the association of G with σ by G(σ) we have G(σ1 + σ2) = G(σ1) + G(σ2)

Proof. .

(i) Boundedness of G by γ = sup〈g,h〉‖h‖ , g ∈ G, 0 6= x ∈ X

implies σG(h) ≤ |〈g, h〉| ≤ γ‖h‖.Sublinearity follows simply and implies sup ≡ max by Hahn-Banach.

(ii) ” =⇒ ” This direction is immediate by definition.”⇐= ” Proof under construction (assertion in Clarke [Cla90])

(iii) We have

g1 ≤ σ1 and g2 ≤ σ2 =⇒ (g1 + g2) ≤ σ1 + σ2 =⇒ G(σ1) + G(σ2) ⊆ G(σ1 + σ2).

σ(G(σ1)+G(σ2))(h) = supg∈(G(σ1)+G(σ2))

〈g, h〉 = supgi∈G(σi)

〈g1, h〉+ 〈g2, h〉 = σ1(h) + σ2(h)

=⇒ G(σ1 + σ2) ⊆ G(σ1) + G(σ2)

12

Page 15: FUNCTION SPACE OPTIMIZATION

Mean-Value Theorem

Proposition 2.8. Suppose f has Lipschitz constant L on an open neighborhood of a line-segment[x, y] = xα = x(1 − α) + αy : 0 ≤ α ≤ 1. Then there exists a z = xα with 0 ≤ α ≤ 1 such thatf(y)− f(x) ∈ 〈∂0f(z), y − x〉

Proof. Consider the Lipschitz continuous function ϕ(α) = f(xα) − f(x)(1 − α) − αf(y), so thatϕ(0) = f(x) − f(x) = 0 = ϕ(1) = f(y)− f(y). Hence there exists a stationary point α ∈ (0, 1) ofϕ(α) where necessarily either ϕ or −ϕ has minimum and thus0 ∈ ∂0ϕ(α) ⊂ ∂0f(xα)|α + f(x)− f(y) =⇒ f(y)− f(x) ∈ ∂0f(xα)|α

ϕ(α) = f(xα) = f(x(1− α) + αy)

ϕ0(α; h) = lim suphց0; α→α

ϕ(α + λh)− ϕ(α)

λ

= lim suphց0; α→α

f(xα + λh(y − x))− f(xα)

λ

≤ lim supλց0; z→xα

f(z + λh(y − x))− f(z)

λ

= f 0(xα; h(y − x))

∂0ϕ(α) ⊂ 〈∂0f(xα); y − x〉

Application: Characterization of convexity via monotonicity

Proposition 2.9. If f is Lipschitz on a convex neighborhood N of x then it is convex if and onlyif ∂0f(x) is monotone on N in that x, x ∈ N , g ∈ ∂0f(x), g ∈ ∂0f(x) =⇒ 〈g − g, x− x〉 ≥ 0

Proof. (w.l.o.g. x 6= x). Convexity implies f(x)− f(x) ≥ 〈g, x− x〉. At the same time,f(x)− f(x) ≥ 〈g, x− x〉 = 〈−g, x− x〉. Adding we get 0 ≥ 〈g − g, x− x〉 ⇐⇒ monotonicity.

0 ?≥ f(x(1− α) + αx)− f(x)(1− α)− αf(x)

= (1− α)[f(x(1− α) + αx)− f(x)] + α[f(x(1− α) + αx)− f(x)]

= (1− α)〈g, α(x− x)〉 − α〈g, (1− α)(x− x)〉

=(1− α)α

γ[〈g, z − z〉 − 〈g, z − z〉]

=(1− α)α

γ〈g − g, z − z〉 ≤ 0 since γ > 0, α ∈ (0, 1)

Corollary. Under the assumptions of proposition 2.9 the monotonicity of ∂0f(x) on N ∋ x and0 ∈ ∂0f(x) is sufficient for x to be an unconstrained local minimum of f .

Proof. As in proposition 2.4 and convexity.

Question. Can we somehow express ∂0f(x) in terms of ”conventional” derivatives?Answer Yes, in X = Rn using Rademacher and forming convex upper semi-continuous hull.

13

Page 16: FUNCTION SPACE OPTIMIZATION

Proposition 2.10. Suppose f is Lipschitz near x. Then ∂0f(x) is a singleton g with g ∈ X∗ ifand only if f is strictly differentiable, i.e.

limx→x; λց0

f(x+λh)−f(x)λ

= 〈g, h〉

for some g ∈ X∗ and all h ∈ X.

Proof. ”⇐= ” (Exercise).For any h : f 0(x; h) = 〈g, h〉 for some g ∈ ∂0f(x) and hence for the unique g = ∂0f(x). But f 0 ismerely limsup, so consider

lim infhց0; x→x

1λ[f(x + λh)− f(x)] = − lim sup

hց0; x→x

1λ[f(x + λh− λh)− f(x + λh)]

= − lim supx→x; λց0

1λ[f(x + λ(−h))− f(x)]

= −f 0(x;−h) = −〈g,−h〉= 〈g, h〉

=⇒ lim inf = lim sup = lim

=⇒ strict differentiability

Proposition 2.11. ∂0f(x) = g ⇐⇒ f strictly differentiable ⇐= continuous differentiability

Example

f(x) =

x2 sin 1

xif x 6= 0

0 if x = 0

f differentiable at x = 0 with f ′(x) = 0Observation: If f is differentiable then ∇f(x) ∈ ∂f(x).Question How do we characterize or compute other elements of ∂0f(x) where x is not strictlydifferentiable?

Consider the example above with h = ±1.

f 0(0; 1) = lim suphց0

x→0

f(x+λ−f(x))λ

= limλց0;

x→0

f ′(x + λ), λ ∈ (0, λ)

= ¯limx→0f′(x) = ¯limx→0(− cos 1

x+ 2x sin 1

x)

= ¯limx→0 − cos 1x

= 1

f 0(0,−1) = ¯limx→0 cos 1x

= 1

f 0(0, h) = |h| the same as for f(x) = |x|. =⇒ ∂0f(0) = [−1, 1], because g · h ≤ |h| for allh ⇐⇒ |g| ≤ 1

Notice that here ∂0f = [−1, 1] = limi→∞ f ′(xi), xi → 0 = all cluster points of sequences f ′(xi)for xi → 0 = x.

Rademacher Theorem. f : S ⊆ X = Rn → R locally Lipschitz has a set of exceptional points

Sf ⊂ S of Lebesgue measure zero such that ∇f(x) =(

∂f∂xi

)

i=1,...,nexists at all x /∈ Sf . i.e. every

locally Lipschitz function is almost everywhere differentiable on Rn.

14

Page 17: FUNCTION SPACE OPTIMIZATION

1

f(x) = 1/2 x

f

x

Proof. see Federer [Fed96]

Example:

f has Lipschitz constant 2 and is differentiable except at a countable set of points.There is no directional derivative: f ′(0, 1). But f 0(0, 1) = 2

f 0(0,−1) =

2 if f is odd0 if f is even0 if f(x) = 0 for x ≤ 0

Generalized derivative:

∂0f(0) =

[0, 2] if f odd[−2, 2] if f even[0, 2] otherwise

= conv[0, 2] = [0, 2]

Proposition 2.12. Let f be Lipschitz continuous near x, S be of measure 0. Then

∂f(x) = conv(

limk→∞∇f(xk) xk → x and xk /∈ Sf ∪ S

)

≡ convex hull of all cluster points of conventional gradients of points outside Sf ∪ S near x.

Proof. ”⊃”: Consider g = limk←∞∇f(xk), lim xk = x and any h ∈ Rn = X. Pick multipliers λk

such thatf(xk + λkh)− f(xk)

λk≥ ∇f(xk)h−

1

k

Applying limsup we get f 0(x; h) ≥ g⊤h =⇒ g is subgradient ⇐⇒ g ∈ ∂0f(x). Convexificationfollows immediately.”⊂ ”: This direction is left as an exercise.

Example. f(x1, x2) = max(x1, x2), which is Lipschitz on R2 with constant 1. It is differentiableeverywhere except where x1 6= x2.

15

Page 18: FUNCTION SPACE OPTIMIZATION

∇f(x1, x2) =

(0, 1) if x1 > x2

(1, 0) if x1 < x2

∂0(2, 2) = (1, 0), (0, 1) = (1− α, α) for 0 ≤ α ≤ 1f 0((2, 2), (∆x1, ∆x2)) = max

g∈∂0f(0)(∆x1g1, ∆x2g2)

= max0≤α≤1

(∆x1(1− α) + α∆x2) = max(∆x1, ∆x2).

Corollary. If X ⊂ Rn and f Lipschitz on N ⊂ X, then the multifunction ∂0f : NX∗ is upper-semi-continuous in that gk ∈ ∂0f(xk), xk → x, gk → g =⇒ g ∈ ∂0f(x).

Proof. By diagonalization. Suppose gi = limj→∞ gij , gij = ∇f(xij), limj→∞ xij = xi.Take j = j(i) such that ‖xij − xi‖ < 1

4‖xi − x‖ and ‖gij − gi‖ < 1

4‖gi − g‖ then limj→∞ xij(i) = x;

limj→∞ gij(i) = g =⇒ g ∈ ∂0f(x).

Result can be extended to convex combination gi =∑n

i=1 gki α

(k)i

∑α

(k)i = 1. Here finite dim(X) is

important.

Proposition 2.13 (Griewank, Jongen, Mankwong[GJK91]). Let f : D ⊂ Rn → R, D open, beuniform Lipschitz on an open neighborhood of L (:= x ∈ D : f(x) ≤ µ + g⊤x; µ ∈ R, g ∈ Rn).Further, if f(x) = µ + g⊤x, then g 6∈ ∂0f(x). Then the following statements are equivalent

(i) L is convex and f is strictly convex in the interior L0 of L so that for all x 6= y,x, y ∈ L0, and α ∈ (0, 1), we have

f(x(1− α) + αy) < (1− α)f(x) + αf(y)

(ii) The generalized differential is injective in that for x, y ∈ L0 we have∂0f(x) ∩ ∂0f(y) 6= ∅ =⇒ x = y.

(iii) The tangent plane defined by g ∈ ∂f(x) at x ∈ L0 is strictly supporting in that for all y ∈L0\x, f(y) > f(x) + g⊤(y − x).

Proof. Exercise.

16

Page 19: FUNCTION SPACE OPTIMIZATION

2.3 Generalized Jacobians and Hessians

Let F : Rn → Rm be locally Lipschitz. Rademacher’s theorem implies that the Jacobian F ′(x) ∈Rm×n exists at all points x ∈ ΩF =

⋂mi=1 ΩFi

with the complement Rn \ ΩF still of measure zero.Hence we have generalized Jacobian

∂0F (x) = conv

limk→∞

F ′(xk), xk → x, xk ∈ ΩF

Properties of generalized Jacobian for Lipschitz F

(i) ∂0F (x) is nonempty, convex and compact at x.

(ii) ∂0F : RnRm×n is upper semi-continuous.

(iii) F is continuously differentiable =⇒ ∂0F (x) = F ′(x)

(iv) ∂0F (x) ⊆ ∂0F1(x)× ∂0F2(x)× . . .× ∂0Fm(x)

Example for (iv):

F (x) =

[|x|−|x|

]

: R1 → R2

F ′(x) =

[1−1

]

if x > 0

[−1

1

]

if x < 0

=⇒ F ′(0) =

[α− (1− α)−α + (1− α)

]

=

[2α− 11− 2α

]

α∈[0,1][

∂0F1(0)∂0F2(0)

]

⊆ conv

[−1−1

]

,

[−1

1

]

,

[1−1

]

,

[11

]

=

[[+1,−1][ 1,−1]

]

=

[αβ

]

, α ∈ [−1, 1], β ∈ [−1, 1]

Chain Rule. Let both F : D ⊂ Rn → Rm and G : E ⊂ Rm → Rp be locally Lipschitz. We have

∂0(G F )(x) ( NM : M ∈ ∂0F (x) and N ∈ ∂0G(F (x))

where NM denote matrix multiplication.

Mean Value Theorem. If F : D ⊂ Rn → Rm is locally Lipschitz on the convex domain D andx, y ∈ D then

F (y)− F (x) ∈ convZ(y − x) : Z ∈ ∂0F (z) ⊂ conv∂0F (z)(y − x)

where z = (1− α)x + αy.

Corollary. If m = n and for some z ∈ D0 all M ∈ ∂0F (z) are nonsingular, then for all x, y insome neighborhood U of z, x 6= y =⇒ F (x) 6= F (y). Moreover there exists a neighborhood V ⊆ Rn

of F (z) and a Lipschitz function G : V → Rn such that

G F (x) = x for all x ∈ U and F G(y) = y for all y ∈ V.

17

Page 20: FUNCTION SPACE OPTIMIZATION

Remark. There is also a corresponding implicit function theorem.

Definition. If F = ∇f for some C1,1 function f : Rn → R, then

∂2f(x) = ∂0∇f(x) = conv

limk→∞∇2f(xk), xk → x, xk /∈ ΩF

Notation: We write ∇2f(x) to denote the proper Hessian and ∂2f(x) for generalized Hessian.

Proposition 2.14. If ∇f(x) is Lipschitz continuous on some neighborhood of [x,y] then

f(y)− f(x) = ∇f(x)T (y − x) + 12(y − x)⊤Hz(y − x)

where Hz(y − x) ∈ ∂2f(z) for some z = (1− α)x + αy, 0 < α < 1.

Proof. see Hiriart-Urruty/Lemarechal [HUL01]

Corollary 2.15. Under assumptions of Proposition 3.1, the condition that ∇f(x) = 0 and ∂2f(x)contain only positive definite matrices is sufficient for x to be local minimizer of f .

Proof. Because ∂f is a generalized Jacobian and thus upper-semi-continuous, all generalized Hes-sians ∂2f(x) for x in some neighborhood U of x also contain only positive definite matrices. Oth-erwise, there would be a sequence Hk ∈ ∂2f(xk) with xk → x and Hk indefinite. Then any clusterpoint H = limj→∞Hkj

∈ ∂2f(x) would also be indefinite. Note that the positive definite matricesform an open subset of the linear space of symmetric n × n matrices. Local optimality of f on Ufollows from mean value theorem.

Proposition 2.16 (Saddle point property). Suppose ∇f(x) = 0 and all elements of ∂2f(x) arenonsingular and indefinite. Then x is neither a local minimizer nor a local maximizer of f.

Proof. See Vogel and Griewank [GV]

Proposition 2.17. At any local minimizer x that does not satisfy the second order sufficient condi-tion in Corollary 3.2, the generalized Hessian ∂2f(x) must contain at least one singular or positivedefinite matrix as well as indefinite matrices.

18

Page 21: FUNCTION SPACE OPTIMIZATION

3 Frechet and Gateaux Differentiability

Definition. Let F : X → Y where X and Y are Banach spaces.

(i) F is said to be Gateaux differentiable at x ∈ X if for some bounded linear operatorF ′(x) ∈ B(x, y) and all h ∈ X

limh→0

1

λ[F (x + λh)− F (x)] = F ′(x) · h

(ii) F is called Frechet differentiable at x if

lim‖h‖→0

1

‖h‖ [F (x + h)− F (x)− F ′(x) · h] = 0.

Remark. (ii) =⇒ (i); while (i) =⇒ (ii) only if dim(x) <∞.

The following example highlights the difference: Let F : X ≡ L2[0, 1] → Y = X be given byF (x(t)) = sin(x(t)).

Proposition 3.1. F is uniformly Lipschitz continuous with F ′(x)h = cos(x(t))h(t) and Lipschitzconstant L = 1, and is everywhere Gateaux differentiable but nowhere Frechet differentiable.

Proof.

1) Lipschitz continuity:

‖F (x)− F (y)‖2L2 =

∫ 1

0

[sin(x(t))− sin(y(t))]2dt

≤∫ 1

0

(x(t)− y(t))2dt = 1 · ‖x− y‖2L2

2) Gateaux differentiability:

0 ?=

limλ→0

∥∥∥∥

[sin(x(t) + λh(t))− sin(x(t))]

λ− cos(x(t))h(t)

∥∥∥∥

2

2

= limλ→0

∫ 1

0

1λ2 [cos(x(t) + λh(t))λh(t)− cos(x(t)λh(t))]2dt, |h(t)| ≤ |h(t)|

where h(t) ≥ 0 : x(t) ≤ x(t) + λh(t) ≤ x(t) + λh(t) with 0 ≤ h(t) ≤ h(t)

where h(t) ≤ 0 : x(t) + λh(t) ≤ x(t) + λh(t) ≤ x(t), with h(t) ≤ h(t) ≤ 0

= limλ→0

∫ 1

0

[cos(x(t) + λh(t))− cos(x(t))]2h2(t)dt

≤ limλ→0

‖h‖≤∆>0λ2h4(t)dt +

‖h‖≥∆4h2(t)dt ≤ lim

λ→0λ2

∫ 1

0

∆2dt + 4‖h‖2 − 4

‖h‖≤∆

h2(t)dt

︸ ︷︷ ︸

(∗) ≤ 4[‖h‖2L2 −∫

‖h‖≤∆

h2(t)dt]

By dominated convergence, lim∆→∞∫

‖h‖≤∆h2(t)dt = ‖h‖2. Hence the limit is indeed zero, i.e.

F is Gateaux differentiable at a given arbitrary x(t).

19

Page 22: FUNCTION SPACE OPTIMIZATION

3) Nowhere Frechet differentiable: Assume x = 0, cos(x(t)) = 1 and consider perturbations hn(t) =2nπ for t ∈ [0, n−3] and hn(t) = 0, outside. Thus sin(x(t) + hn(t)) = 0 and cos(x(t) + hn(t)) = 1.

∫ 1

0

h2n(t)dt =

∫ 1n3

0

4n2π2 = (2n)2

n3 π2 = 4nπ2 =⇒ ‖hn‖L2 = 2π√

n−→n→∞

0

Need to show:

∫ 1n3

0

[sin(x(t) + hn(t))− sin(x(t))− cos(x(t))hn(t)]2 · n −→ 0 ?

But

n

∫ 1n3

0

cos2(x(t))h2n(t) = n

∫ 1n3

0

hn(t)2dt = n ‖hn‖2L2 = 4π2

Conclusion. Uniformly Lipschitz continuous mappings from a (the) separable Hilbert space intoitself may not be Frechet differentiable anywhere.

Proposition 3.2 (Preiss 1990, [Pre90]). If X∗ is separable, then any Lipschitz continuous functionf : X → R is Frechet differentiable on a dense set of points in X.

Lemma. Where F : X → Y is Frechet differentiable, it must be locally Lipschitz continuous.

Proof. Exercise.

Proposition 3.3 (First Order Optimality for Gateaux differentiable). If f : N ⊂ X → R is Gateauxdifferentiable at some x ∈ N open, it can only be a local minimizer if f ′(x) = 0 ∈ X∗ ⇐⇒ f ′(x)h =0 for all h ∈ X.

Proof. By proposition 2.2 we must have f ′(x; h) = f ′(x)h ≥ 0 ≤ f ′(x;−h) = −f ′(x)h. =⇒f ′(x)h = 0 =⇒ f ′(x) = 0 ∈ X∗.

Application to variational problem

min f(x) ≡∫ b

a

ϕ(x(t), x(t), t)dt where x(a) = xa ∈ Rn, x(b) = xb ∈ Rn

W.l.o.g, take a = 0, b = 1, x(0) = 0 = x(1). Set x(t) = x(t)− (1− t)xa− txb, ˙x(t) = x(t)+(xa−xb)=⇒ x ∈ C1

0 ([0, 1], Rn) once continuously differentiable with homogeneous boundary condition.

‖x‖ = max0≤t≤1

‖x(t)‖2 =⇒ ‖x(t)‖2 = ‖∫ t

0

x(τ)dτ‖

≤∫ t

0

‖x(τ)‖dτ ≤ t max0≤τ≤1

‖x(τ)‖2 ≤ ‖x‖C1[0,1].

Also assume autonomy, i.e ϕ(x, x, t) ≡ ϕ(x, x) and ϕ : R2n −→ R and∇ϕ = (ϕx, ϕx) : R2n =⇒ R2n with global Lipschitz constant L ≥ 0 for ∇ϕ =⇒ ϕ ∈ C1,1(R2n).

20

Page 23: FUNCTION SPACE OPTIMIZATION

Lemma. f : C10 ([0, 1], Rn) −→ R is Frechet differentiable with gradient f ′(x)h =

∫ 1

0(ϕx(x(t), x(t)))h(t)+

ϕx(x(t), x(t))h(t)dt

Proof.

|f(x + h)− f(x)− f ′(x)h| = |∫ 1

0

ϕ(x(t) + h(t), x(t) + h(t))− ϕx(x(t), x(t))h(t)− ϕx(x(t), x(t))h(t)|

≤∫ 1

0

12L(h(t)2 + h(t)2)dt =

∫ 1

0

12L‖h‖2C1

0 [0,1]2dt

= L‖h‖2forh ∈ C10 [0, 1].

Condition for classical solution x ∈ C10 [0, 1] minimizing f is that for all h ∈ C1

0

∫ 1

0

[

ϕx(x(t), x(t))h(t) + ϕx(x(t), x(t))h(t)]

dt = 0

Assuming ϕx(x(t), x(t)) is differentiable with respect to t we obtain integrating by parts

∫ 1

0

[

ϕx(x(t), x(t))− d

dtϕx(x(t), x(t))

]

h(t) = −ϕx(x(t), x(t))h(t)|10

Tentative consequence: Euler-Lagrange Equation:

ϕx(x(t), x(t)) = ddt

ϕx(x(t), x(t))

for 0 ≤ t ≤ 1. This is first order optimality condition for f at x ∈ C10 .

Question: Does ddt

ϕx(x(t), x(t) really exist?Answer: Yes as shown below.

Proposition 3.4. If u, v ∈ C0([0, 1], Rn) and for any h ∈ C10 [0, 1]:

∫ 1

0

(u(t)Th(t) + v(t)T h(t))dt = 0,

then

v(t) = v(0) +

∫ t

0

u(τ) dτ

and thus v = u ∈ C0[0, 1].

Proof.

∫ 1

0

[u(t)⊤h(t) + v(t)⊤h(t)] = 0 for h ∈ C10 ([0, 1], Rn), h(0) = 0 = h(1)

=

∫ 1

0

(v(t)− U(t))⊤h(t)dt for U(t) =

∫ t

0

u(τ)dτ

=

∫ 1

0

(v(t)− U(t)− c)⊤h(t)dt = 0− c⊤∫ τ

0

h(t)dt = c⊤(h(1)− h(0)) = 0 for any c ∈ R.

Pick

c =

∫ 1

0

(v(t)− U(t))dt such that as a result

∫ 1

0

(v(t)− U(t)− c)dt = 0.

21

Page 24: FUNCTION SPACE OPTIMIZATION

Then consider h = (v(t)− U(t)− c) corresponding to

h(t) =

∫ t

0

(v(τ)− U(τ)− c)dt =⇒ h(1) = 0 = h(0)

For this we obtain

0 =

∫ 1

0

‖v(t)− U(t)− c‖2 =

∫ 1

0

(v(t)− U(t)− c)⊤h(t)dt

=⇒ v(t) = U(t) + c =⇒ v(0) = c and v(t) = u(t)

=⇒ v ∈ C1([0, 1]).

Proposition 3.5 (Legendre–Clebsch). When x ∈ C10 [0, 1] locally minimizes f and ϕ ∈ C2 then

ϕxx(x(t), x(t)) ∈ Rn×n must be positive semidefinite at all 0 ≤ t ≤ 1

Proof.

d2

dα2f(x + αh)

∣∣∣∣α=0

=

∫ 1

0

h(t)⊤ϕxx(x(t), x(t))h(t)dt +

∫ 1

0

hϕxx(x(t), x(t))h(t)dt

+ 2

h⊤ϕxx(x(t), x(t))h(t)dt ≥ 0

Assume ω⊤ϕxx(x(t0), x(t0))ω < 0 for ω ∈ Rn, ‖ω‖ = 1

=⇒ ω⊤ϕxx(x(t), x(t))ω ≤ δ < 0 for t0 − ε ≤ t ≤ t0 + ε

hε =

εω(1 + cos(π

ε(t− t0))), if |t− t0| ≤ ε

0, otherwise

for ε ≤ ε :

hε =

−πω sin(πε(t− t0)), if |t− t0| ≤ ε

0, otherwise

∫ 1

0

‖h(t)‖2dt ≤ ε2 · 4 · 2ε = 8ε3

∫ 1

0

‖h(t)‖‖h(t)‖dt ≤ πε2ε = 2πε2

d2

dα2f(x + αh)

∣∣∣∣α=0

≤∫

ω⊤ϕxxω‖hε‖2dt + o(ε2)

≤ −δ

∫ 1

0

‖hε‖dt = −δπ2

∫ 1

0

sin2(πε(t− t0))dt = −δπ2ε + o(ε2);

which would yield negative curvature for sufficiently small ε > 0.

Corollary. If in addition ϕ ∈ C2 and if ϕxx(x(t), x(t)) is nonsingular for any t ∈ (0, 1) it followsfrom the implicit function theorem that x is a differentiable function of x(t) and ϕx(x(t), x(t)) sothat x(t) is in fact twice differentiable.

22

Page 25: FUNCTION SPACE OPTIMIZATION

Proof.

ϕxx(x(t), x(t)) = ϕx(x(0), x(0) +

∫ t

0

ϕx(x(τ), x(τ))dτ)

︸ ︷︷ ︸

:=g(t)∈C1, w.r.t t

Hence IFT is applicable to

G(x(t), x, t) = ϕx(x, x)− g(t) : Rn+1 → Rn

Whenever ∂G∂x

= ϕxx is regular, there is a locally unique function x = F (t) with the derivative

d

dtx(t) = x(t) = −ϕxx(x, x)−1

︸ ︷︷ ︸

(Gx)−1

(ϕxx(x(t)))

Which means that x(t) is in fact twice continuously differentiable.

In other words we are allowed to do implicit differentiation on the Euler–Lagrange equation toobtain the second order boundary value problem (BVP) in ODEs

ϕxx(x, x)x + ϕxx(x, x)x = ϕx(x, x) s.t x(0) = 0 = x(1)

Question: Does the BVP always have a solution if det[ϕxx(x, x)] 6= 0 globally?Answer: No. Even not in the scalar case where ϕxx > 0.

Examples:

(i) x + 4π2(x− t) = 0.y(t) = x(t) + t =⇒ y(t) + 4π2y(t) = 0, y(0) = 0, y(1) = 1.Any solution has the form

y(t) = α cos 2πt + β sin 2πt =⇒ y(0) = y(1)

so that the condition y(0) = 0, y(1) = 1 is not satisfied.

(ii) Another example where the variational problem has a solution outside C10 [0, 1] is the double

well potential

ϕ(x, x) = 12(x2 − 1)2.

min f(x) = 12

∫ 1

0

(x(t)2 − 1)2dt such that x(0) = 0 = x(1)

has minimizers

x(t) =

t for 0 ≤ t ≤ 12

1− t for 12≤ t ≤ 1

x(t) =

1 for 0 ≤ t < 12

−1 for 12

< t ≤ 1NaN if t = 1

2

23

Page 26: FUNCTION SPACE OPTIMIZATION

Conclusion: The Euler–Lagrange need not hold everywhere along an optimal solution. It doeshold whenever x(t) is differentiable. Then x(t) solves the local problem

min

∫ b

a

ϕ(x(t), x(t))dt s.t x(a) = xa and x(b) = xb

ϕx ≡ 0; ϕx = (x2 − 1)2x ≡ 0 if |x| = 1 =⇒ ddt

ϕx = 0 = ϕx.

The above solution is contained in H10 (0, 1) ⊇ C1

0 [0, 1]

(iii) An example where no solution exists even in H10 ([0, 1]) is the following:

ϕ(x) = 12(x2 − 1)2 + 1

2x2 =⇒ min f(x) =

∫ 1

0

(12(x2 − 1)2 + 1

2x2)dt, x(0) = 0 = x(1)

k+11/1

0

1/2

1/2

2x (t)

x (t)

0 ≤ f(xk) ≤∫ 1

0

1

4k+1dt ≤ 1

22k+3−→ 0 as k →∞

Hence

infx∈C1

0 ([0,1])f(x) = inf

x∈H10 [0,1]

f(x) = f∗

But f∗ = 0 is not obtained by the pointwise limit x∗(t) ≡ 0

f(x∗) =

∫ 1

0

(12

+ 0)dt =

1

2> lim inf

k→∞f(xk)

even though xk 0 weakly in H10 [0, 1]. That means f is not weakly lower semicontinuous in

H10 ([0, 1]).

Partial remedy: Extension of variational function f(x) from x ∈ C10 [0, 1] onto H1

0 [0, 1] (whilestill assuming that ϕ : R2n → R is C1,1 with global Lipschitz constant L)

Observation: H10 [0, 1] is the closure of C1

0 [0, 1] ∋ x with respect to the norm

‖x‖H1=

√∫ 1

0

‖x(t)‖2dt =⇒ C10 is a dense subspace of H1

0

x ∈ H10 has the representation x = limk→∞ xk, xk ∈ C1

0 . Then f(x) = limk→∞ f(xk) is unique

24

Page 27: FUNCTION SPACE OPTIMIZATION

provided f is Lipschitz continuous on C10 with respect to ‖.‖H1

0

|f(y)− f(x)| ≤∫ 1

0

|ϕ(y(t), y(t))− ϕ(x(t), x(t))|dt

≤∫ 1

0

L√

‖y(t)− x(t)‖2 + ‖y(t)− x(t)‖2dt

≤ L

[∫ 1

0

(‖y(t)− x(t)‖2 + ‖y(t)− x(t)‖2)dt

]12

≤√

2L

∫ 1

0

‖x(t)− y(t)‖2dt =√

2L‖x− y‖H10

Conclusion: f(x) is well-defined on H10 ([0, 1]).

Question: What is sufficient for f to be weakly lower semicontinuous so that reflexivity of the Hilbertspace H1

0 guarantees existence of minimizers by Proposition 1.7?Answer: For a general f : H1

0 → R a sufficient condition is convexity. For f of integral form it sufficesthat ϕ(x, x) is convex with respect to x, which follows locally from Legendre–Clebsch condition iff ∈ C2.

Proposition 3.6. If ϕ(x, x) has a uniform Lipschitzian gradient and is convex with respect to xthen

f(x) =

∫ 1

0

ϕ(x(t), x(t))dt for x ∈ H10 [0, 1]

is weakly lower semicontinuous on H10 .

Proof. Suppose xk x∗ ∈ H10 =⇒ ‖xk − x∗‖L2 → 0. Hence we have

limk→∞

f(xk) = limk→∞

∫ 1

0

[ϕ(xk, xk)− ϕ(x∗, xk) + ϕ(x∗, xk)]dt

≥ limk→∞

∫ 1

0

∇ϕ(x∗(t), x∗(t))(xk − x∗)dt + f(x∗)dt

The first term gives

limk→∞

[∫ 1

0

|ϕ(xk, xk)− ϕ(x∗, xk)|]2

≤[∫ 1

0

‖∇xϕ(

x∗(t) + δ(t)(xk(t)− x∗(t)), x(t) + δ(t)(xk(t)− x∗(t)))]2

≤ L

∫ 1

0

(C + ‖(xk(t)‖+ ‖(x∗(t)‖+ ‖(xk(t)‖+ ‖(x∗(t)‖)2

∫ 1

0

‖xk(t)− x∗(t)‖2dt

︸ ︷︷ ︸

‖xk−x∗‖2L→0

≤ limk→∞

L(

‖xk‖2H10

+ ‖x∗‖2H10

)

· 0 = 0.

∫ 1

0

∇xϕ(x∗(t), x∗(t))v(t)dt

25

Page 28: FUNCTION SPACE OPTIMIZATION

is a bounded linear functional on v ∈ H10 so that by assumed weak convergence

lim

∫ 1

0

∇xϕ(x∗(t), x∗(t))(xk(t)− x∗(t))dt = 0

Remark: Proposition 3.6 is also true for ϕ = ϕ(x, x, t).

Example (Weierstrass): ϕ(x, x, t) = 12t(x− 1)2, x(0) = 0 = x(1)

f(x) = 12

∫ 1

0

t(x(t)− 1)2dt ≥ 0

and is weak lower semicontinuous. Since ϕxx = t ≥ 0 everywhere =⇒ convexity with respect to x.

Let us look at Euler-Lagrange:

ϕx ≡ 0 ≡ d

dtϕx =

d

dtt(x− 1) = x− 1 + tx

x

x− 1= −1

t=⇒ ln(x− 1) = c− ln t =⇒ x = 1 +

c

t

=⇒ x(t) = t + c ln t + d

Again: BVP has no solution since x(0) is undefined unless C = 0 in which case x(0) = d 6= x(1) =d + 1 =⇒ no C1 minimizer. Construct infimizing sequence by truncation

xn(t) =

t for 0 ≤ t ≤ 1n

t− 1− ln(t)ln(n)

for 1n≤ t ≤ 1

xn(t) =

t for 0 ≤ t ≤ 1

n

1− 1t ln(n)

for 1n≤ t ≤ 1

xn(1

nt) =

−n

ln(n)

f(xn) =

∫ 1n

0

0dt +

∫ 1

1n

t

[1

t ln n

]2

dt

=

∫ 1

1n

1

t ln(n)2dt =

ln(t)

ln(n)2

]1

1n

= − ln(1/n)

ln(n)2=

1

ln(n)→ 0, as n→∞.

i.e xn ∈ H10 [0, 1] form infimizing sequence with f∗ = limn→∞ f(xn) = 0.

Question: Does the sequence xn have a weak cluster point which would have to be a minimizerof f in H1

0 by weak lower semicontinuity?

Answer: No, because ‖xn‖H10

grows unbounded as shown below.

26

Page 29: FUNCTION SPACE OPTIMIZATION

‖xn‖2H10

=

∫ 1

0

(x(t))2dt =

∫ 1n

0

1dt +

∫ 1

1n

(1− 1

t ln(n))2dt

=1

n+

[

t− 2 ln t

ln n− 1

t(ln(n))2

]1

1n

=1

n+ 1− 1

n− 2 ln(n)

ln(n)− 1

(ln(n))2+

n

ln(n)2

= −1− 1

(ln(n))2+

n

ln(n)2→∞, as n→∞.

Compare to minx>0f(x) = 1

x.

Conclusion : For unconstrained S = X we need bounded level sets which is implied by coercivityi.e. lim‖x‖→∞ f(x) =∞.

In the variational example:

ϕ(x, x, t) ≥ C|x|2 − d, C > 0, d ∈ R

implies that f(x) ≥ C‖x‖H10− d =⇒ infimizing sequence must be contained in the ball

S =

x ∈ X : ‖x‖H10≤ 2

f ∗+d

c

.

That feasible set S is convex, bounded closed and thus weakly closed and weakly sequentially com-pact. Hence Proposition 1.7 applies.

Review/Summary

(1) Attainment of infimal value by the minimizer is guaranteed if

(i) dim(X) <∞. The feasible set S ⊂ X is bounded and closed, and continuous.

(ii) X = C(Ω), f : X → R continuous and S equicontinuous ∀ ε > 0, ∃ δ > 0 ∀ x ∈ X ∀ ω ∈Ω ∋ ω′ : |ω − ω| < δ =⇒ |x(ω) − x(ω)| < ε then S is sequentially compact in X andinfimizing sequences converge to minimizers. Example: Open pit mining problem (seeAppendix for details).

(iii) X is reflexive, f weakly lower semicontinuous (implied by quasi-convexity) and S weaklyclosed and bounded (implied by quasi-convexity) and S weakly closed and bounded (im-plied by coercivity in unconstrained case).

(2) Necessary Conditions for local optimality of x ∈ S:

(i) f 0(x; h) ≥ 0 if x + λh ∈ S for 0 ≤ λ ≤ λ and f locally Lipschitz continuous.

(ii) 0 ∈ ∂0f(x) if x is in interior of S and f is Lipschitz continuous.

(iii) 0 = f ′(x) if f is Gateaux-differentiable at x.

(iv) Euler–Lagrange holds for x ∈ C1 on variational problem with ϕ ∈ C1,1.

(v) Legendre–Clebsch for x ∈ C2 on variational problem with ϕ ∈ C2,1. Local minimalityimplies global minimality for convex f on convex S.

27

Page 30: FUNCTION SPACE OPTIMIZATION

4 Tangent Cones and Sensitivity

4.1 Motivation

Unconstrained optimization → (explicitly) constrained optimization problems.

Definition (Cone). Let C ⊂ X be a subset of a Banach space. C is a cone if x ∈ C and λ ≥ 0 =⇒λx ∈ C. A cone is pointed if −x ∈ C ∋ x =⇒ x = 0 (i.e. only rays no whole lines belong to C).Example: positive octant (R+)n

Lemma (4.0). C ⊂ X is convex ⇐⇒ (x, y ∈ C =⇒ x + y ∈ C)

Proof. Exercises

Definition. For any S ⊆ X set Cone(S) = λx : λ ≥ 0, x ∈ S

Definition. h ∈ X is called tangent to S ⊂ X at x if ∃ xk ⊂ S, λk > 0 such that xk → x and(xk − x)λk → h. The set of all such tangents is denoted by T (S, x) and called “contingent cone” or“Bouligand tangent cone”.

Observation: Without loss of generality, h = 0 or λk = ‖h‖‖xk−x‖

Proposition 4.1.

(i) If x ∈ S, then T (S, x) ⊂ Closure(Cone(S − x))

(ii) If S is starshaped at x then Cone(S − x) ⊂ T (S, x) where S − x 6= S\x =: x ∈ S : x 6= x

Proof.

(i) h = limk→∞ λk(xk − x) = limk→∞ xk, with xk = (xk − x)λk ∈ Cone(S − x)

(ii) h ∈ Cone(S − x) = λ(x− x) = λk(1k(x− x)) = λk(x + 1

k(x− x)− x)

Corollary. If S is starshaped then T (S, x) = Closure(Cone(S − x))

Proposition 4.2. For all S ⊂ X, T (S, x) is closed.

Proof.

h = limk→∞

hk, hk ∈ T (S, x) =⇒ ∃ xk,j ∈ S, λk,j > 0 s.t. λk,j(xk,j − x)→j

hk ∈ T

=⇒ ∃ j(k) s.t. ‖λk,j(k)(xk,j(k) − x)− hk‖ ≤ 1k≥ ‖xk,j(k) − x‖

limk→∞‖λk,j(k)(xk,j(k) − x)− h‖ ≤ lim

k→∞

(‖λk,j(k)(xk,j(k) − x)− hk‖

)+ ‖h− hk‖

≤ limk→∞‖h− hk‖+ 1

k= 0 and lim

n→∞xk,j(k) = x

28

Page 31: FUNCTION SPACE OPTIMIZATION

Lemma 4.3. S convex =⇒ T (S, x) convex.

Proof. λk(xk − x)→ h, λk(xk − x)→ hwith λ = λk + λk > 0 and

x =λk

λk + λk

xk +λk

λk + λk

xk ∈ S

by convexity we get

h + h = limk→∞

(λk + λk)( λk

λk + λk

xk +λk

λk + λk

xk − x)

= limk→∞

λk(xk − x) ∈ T (S, x).

=⇒ T (S, x) is convex by Lemma (4.0).

Proposition 4.4. For x to be a local minimizer on S for an f that is Frechet differentiable on anopen neighborhood of S it is necessary that f ′(x, h) = f ′(x)h ≥ 0 for all h ∈ T (S, x)

Proof.

f ′(x)h = limk→∞

λkf′(x)(xk − x) since h = lim

k→∞λk(xk − x)

= limk→∞

λkf(xk)− f(x)− [f(xk)− f(x)− f ′(x)(xk − x)]

≥ − limk→∞

λk‖xk − x∗‖|f(xk)− f(x)− f ′(x)(xk − x)|

‖xk − x‖= −‖h‖ · 0 by Frechet differentiability.

Question: Is it good enough for f ′(x, h) to be locally Lipschitz continuous with respect to h?

Corollary 4.5. Under the assumptions of Proposition 4.4 and with S starshaped at x and pseudo-convex, i.e f ′(x, h) ≥ 0 =⇒ f(x + λh) ≥ f(x) if λh ∈ S − x, then x must in fact be (global)minimizer of f on S.

General representation of feasible Set

S = x ∈ S : h(x) = 0 ∈ Z; g(x) ∈ −C ⊂ Y for S ⊂ X : g(x) ≤ 0

where h : X → Z, g : X → Y . X,Y,Z are Banach spaces.

First step consider S = X and omit g. S ≡ x ∈ X : h(x) = 0 = h−1(0).

Lemma 4.6. If h is Frechet differentiable at x ∈ S then T (S, x) ⊂ ker(h′(x)) ≡ v ∈ X : h′(x)v =0Proof.

T (S, x) ∋ v = limk→∞

λk(xk − x)︸ ︷︷ ︸

vk

with λk > 0, xk ∈ S lim xk = x.

=⇒ h′(x).v = limk→∞

λkh′(x)(xk − x)

= limk→∞

λk‖xk − x‖‖h(xk)− h(x)− h′(x)(xk − x)‖‖xk − x‖

= limk→∞‖vk‖ · 0 = 0 by Frechet differentiability

29

Page 32: FUNCTION SPACE OPTIMIZATION

Question: Give simple example where T (S, x) 6= ker(h′(x)).

Remark: h′(x) may not be surjective and continuous with respect to x.

Answer: Lack of surjectivity X = R2, h(x) = x21 + x2

2 =⇒ h−1(0) = 0 =⇒ T (h−1(0), 0) = 0.ker(h′(0)) = ker([0, 0]) = R2 ) 0

Assume from now on that h′(x) is continuous with respect to x in the vicinity of x and that h′(x)is surjective; i.e. Range(h′(x)) = Y .

Construction of a feasible arg x(λ) = x + λv + r(λ) ∈ h−1(0) for small λ ∈ [0, λ) ⊂ [0, 1) withr(λ) ∈ o(λ):

(i) By the open mapping theorem one concludes from the boundedness and surjectivity of h′(x)that h′(x)Bρ(0X) ⊃ B1(0Y ) for some 0 < ρ <∞, where Bρ(p) = x ∈ X : ‖x− p‖ ≤ ρ. Hereρ generalizes the norm of h′(x)−1 if it were to exist, which it usually does not.

(ii) For ε ∈ ( 0, 1/(2ρ) ) pick σ > 0 such that for x ∈ Bσ(x)

‖h′(x)− h′(x)‖ ≤ ε =⇒ ‖h(x)− h(x)− h′(x)(x− x)‖ ≤ ε‖x− x‖

for all pairs x ∈ Bσ(x) ∋ x. To apply the mean value theorem for scalar functions consider forthe given pair (x, x) (with ∆x = x − x) some functional ℓ ∈ Y ∗ such that by Hahn-Banach‖ℓ‖ ≤ 1 and

〈ℓ, h(x)− h(x)− h′(x)∆x〉 = ‖h(x)− h(x)− h′(x)∆x‖ .

Consider ϕ : R→ R with

ϕ(t) = 〈ℓ, h(x + t∆x)− h(x)− th′(x)∆x〉=⇒ ϕ(0) = 0

ϕ(1) = ‖h(x)− h(x)− h′(x)∆x‖ϕ′(t) = 〈ℓ, h′(x + t∆x)∆x− h′(x)∆x〉|ϕ′(t)| ≤ ‖h′(x + t∆x)− h′(x)‖ ‖∆x‖ ≤ ε‖∆x‖ for 0 ≤ t ≤ 1.

By the mean value theorem, for some t ∈ (0, 1)

|ϕ(1)− ϕ(0)| = |ϕ′(t)(1− 0)|=⇒ ‖h(x)− h(x)− h′(x)∆x‖ = |ϕ′(t)| ≤ ε‖∆x‖

(iii) Choose some v ∈ ker(h′(x)) with 0 6= ‖v‖ ≤ σ and compute r(λ) by applying a Newton-likeiteration r0 = 0, rk+1 = rk − uk for k = 0, 1, ... with uk satisfyingh′(x)uk = h(x + λv + rk) ∈ Y while ‖uk‖ ≤ ρ‖h(x + λv + rk)‖ which is solvable by (i).

d(λ) = ‖h(x + λv)‖ = ‖h(x + λv)− h(x)− h′(x)λv‖≤ o(λ)‖v‖ ≤ o(λ)σ — initial residual

30

Page 33: FUNCTION SPACE OPTIMIZATION

Successive residuals:

‖h(x + λv + rk − uk)‖ = ‖h(x + λv + rk − uk)− [h(x + λv + rk)− h′(x)uk]‖≤ ε‖uk‖≤ ρε‖h(x + λv + rk)‖ ≤ 1

2‖h(x + λv + rk)‖

≤ (12)k+1‖h(x + λv + r0)‖ = (1

2)k+1d(λ)

=⇒ ‖uk‖ ≤ ρ ‖h(x + λv + rk)‖ ≤ ρ(12)k d(λ).

supk‖rk‖ ≤

∞∑

k=0

‖uk‖ ≤∞∑

k=0

ρ (12)k d(λ)

≤ 2ρd(λ) ≤ 2ρo (λ)σ ≤ σ · (1− λ) for small λ < λ

By continuity of h, which follows from differentiability we have for r(λ) =∑∞

k=0 uk thath(x + λv + r(λ)) = 0 and ‖r(λ)‖ ≤ 2ρo(λ)σSo that ‖r(λ)‖/λ = o(λ0) −−→

λ→00.

Proposition 4.7. If h : X → Z has Frechet derivative h′(x) that is surjective at x and continuouswith respect to x near x then T (h−1(0), x) = ker(h′(x))

Corollary 4.8. Under the assumption of Proposition 4.7, if x is a local minimizer of a Frechetdifferentiable f : X → R on h−1(0) ⊂ X then there exists a functional λ ∈ Z∗ such that f ′(x)v =〈λ, h′(x)v〉 for v ∈ X

Proof. See page 32.

4.2 Basic Review of Adjoints

Let X, Y be Banach spaces, A ∈ B(X, Y ), X −→A

Y ; X∗ ←−A∗

Y ∗, where X∗ = B(X, R), Y ∗ =

B(Y, R) are spaces of bounded linear functionals. A∗ : Y ∗ → X∗ is defined such that for all y∗ ∈ Y ∗

〈A∗y∗, x〉 = 〈y∗, Ax〉 for x ∈ X

Lemma (Properties of the Adjoint).

(i) ‖A∗‖ = ‖A‖ ( =⇒ A∗ ∈ B(Y ∗, X∗))

(ii) (A + B)∗ = A∗ + B∗

(iii) (αA)∗ = αA∗

Thus, adjoining, ∗, is a bounded linear operator in the real case; ∗ : B(X, Y )→ B(Y ∗, X∗)

Proof. We prove part (i)

|〈A∗y∗, x〉| = |〈y∗, Ax〉|≤ ‖y∗‖‖Ax‖ ≤ ‖y∗‖‖A‖‖x‖

‖A∗y∗‖ ≤ sup|〈A∗y∗, x〉|‖x‖

≤ ‖y∗‖‖A‖ =⇒ ‖A∗‖ ≤ ‖A‖

31

Page 34: FUNCTION SPACE OPTIMIZATION

∀ x0 ∈ X and y0 = Ax0 ∃ y∗0 : ‖y∗0‖ ≤ 1, |〈y∗0, y0〉| = ‖y0‖‖Ax0‖ = |〈y∗0, Ax0〉| = |〈A∗y∗0, x0〉| ≤ ‖A∗y∗0‖‖x0‖‖Ax0‖‖x0‖

≤ ‖A∗y∗0‖ ≤ ‖A∗‖‖y∗0‖ = ‖A∗‖ for any x0 ∈ X.

=⇒ ‖A‖ ≤ ‖A∗‖ ≤ ‖A‖.

Definition.

• ker(A) = x ∈ X : Ax = 0,• Range(A) = y ∈ Y : y = Ax, x ∈ X• X∗ ⊃ (ker(A))⊥ = x∗ ∈ X∗ : 〈x∗, x〉 = 0 if Ax = 0• (Range(A))⊥ = y∗ ∈ Y ∗ : 〈y∗, y〉 = 0 if y = Ax

Correspondingly, we have Range(A∗), Range(A∗)⊥, ker(A∗), ker(A∗)⊥

Proposition 4.9. For A ∈ B(X, Y )

(i) ker(A∗) = Range(A)⊥ ⊂ Y ∗

(ii) If furthermore Range(A) = Y , then ker(A)⊥ = Range(A∗) and ker(A)⊥ = Range(A∗) ⊂ X∗

Proof. (i) ” ⊂ ” : Suppose y∗ ∈ ker(A∗) and y = Ax

=⇒ 〈y∗, y〉 = 〈y∗, Ax〉 = 〈A∗y∗, x〉 = 0 since A∗y∗ = 0.

” ⊃ ”: Suppose y∗ ∈ Range(A)⊥, x ∈ X =⇒ 〈y∗, Ax〉 = 0 =⇒ 〈A∗y∗, x〉 = 0 =⇒ A∗y∗ =0 since x is arbitrary.

(ii) ” ⊃ ”: Let x∗ ∈ Range(A∗), then x∗ = A∗y∗ =⇒ ∀x ∈ ker(A)

〈x∗, x〉 = 〈A∗y∗, x〉 = 〈y∗, Ax〉 = 0 =⇒ x∗ ∈ ker(A)⊥

” ⊂ ”: Let x∗ ∈ ker(A)⊥. Now construct a y∗ ∈ Y ∗ with 〈y∗, y〉 ≡ 〈x∗, x〉 for any y ∈Range(A) = Y and any x such that Ax = y. y∗ exists and is unique since for x with Ax = y,〈x∗, x〉 = 〈x∗, x〉 as x − x ∈ ker(A). Hence y∗ is uniquely defined on Y . Next we must showboundedness. By open mapping we can pick x such that ‖x‖ ≤ ρ‖y‖ with Ax = y for someρ <∞|〈y∗, y〉| = |〈x∗, x〉| ≤ ‖x∗‖‖x‖ ≤ ‖x∗‖‖y‖ρ =⇒ ‖y∗‖ ≤ ‖x∗‖ρ and 〈y∗, Ax〉 = 〈x∗, x〉 ∀ x

So that necessarily A∗y∗ = x∗ =⇒ x∗ ∈ Range(A∗).

Remark: The result can be generalized like the concepts of adjoints themselves to (only closed)semi-Fredholm operators (Kato).

Proof of Corollary 4.8 :

Proof. Minimality requires by Proposition 4.7 that f ′(x)v = 0 for all v ∈ T (h−1(0), x) = ker(h′(x)) =⇒f ′(x) ∈ ker(h′(x))⊥. This implies, by assumed surjectivity of h′(x) that

f ′(x) ∈ Range(h′(x)∗) =⇒ f ′(x) = (h′(x))∗λ∗, λ∗ ∈ Z∗.

32

Page 35: FUNCTION SPACE OPTIMIZATION

Practical Consequences

(i) x can under suitable conditions be computed together with λ∗ as a root to the KKT system:

h(x) = 0

f ′(x)− h′(x)∗λ∗ = 0

i.e. root of the mapping (x, λ∗) ∈ (X × Z∗) to (h(x), f ′(x)− h′(x)∗λ) ∈ Z ×X∗

(ii) The multiplier vector λ can be interpreted as sensitivity as follows: Under suitable assumption,the perturbed problem

min f(x) s.t h(x(t)) = tz for fixed z ∈ Z and t ∈ (−ε, ε) ∈ R

has differentiable paths of solutions x(t), λ∗(t) with x(0) = x, λ∗(0) = λ∗. Then at t = 0 theoptimal value has the derivative

d

dtf(x(t)) = f ′(x(t))x(t) = 〈λ∗(t), h′(x(t))x(t)〉 = 〈λ∗, z〉.

4.3 Inequality Constraints via Cones

A linear space X is partially ordered by ” ≤ ”” if

x ≤ y (reflexivity)

x ≤ y ∧ y ≤ z =⇒ x ≤ z (transitivity)

x ≤ y ∧ z ∈ X : x + z ≤ y + z (additivity)

x ≤ y ∧ α ≥ 0 =⇒ αx ≤ αy (multiplicativity)

Lemma. Due to additivity and multiplicativity

x ≤ y ⇐⇒ y − x ∈ C ≡ z ∈ X : z ≥ 0

with C being a convex cone, which is pointed if and only if ” ≤ ” is antisymmetric(i.e. x ≤ y ∧ y ≤ x =⇒ x = y).

Proof. Exercise.

Definition. The dual cone

C∗ ≡ x∗ ∈ X∗ : 〈x∗, x〉 ≥ 0 for x ∈ C

by definition is always closed and convex even for nonconvex C.

Examples:

(i) X = R2, C ≡ (x1, x2) ∈ R2 : x1 ≥ 0 ≤ x2 (nonnegative orthant)

(ii) In C0[0, 1] C = x ∈ C0[0, 1] : x(ω) ≥ 0X∗ ⊃ C∗ ≡ Stieltjes integrals represented by nondecreasing functions of bounded variation.

33

Page 36: FUNCTION SPACE OPTIMIZATION

The inequality constraints g : X → Y, g(x) ≤ 0 ⇐⇒ g(x) ∈ −C (by the definition of ” ≤ ”). Inmany cases we assume that C ∈ Y has nonempty interior Int(C) .

Typical example: NLP

g(x) =

(g1(x)g2(x)

)

∈ R2

C = R2+ =⇒ nonempty interior .

g1(x) ≤ 0

g2(x) ≤ 0⇐⇒ g(x) ∈ −C.

Examples of cones without interior in C[0, 1]

• Convex functions

• Lipschitz continuous functions on [0, 1]. x ≤ y ⇐⇒ y − x is convex.

From now on, we will consider the more general problem

min f(x) s.t x ∈ S ≡ x ∈ X : g(x) ∈ −C, h(x) = 0

where g : X → Y and h : X → Z

One (bad) idea is to replace h(x) = 0 by h(x) ≤ 0 ∧ −h(x) ≤ 0. This idea is bad because wecan never have a vector v ∈ X such that at x ∈ h−1(0) strictly h′(x)v < 0 and −h′(x)v < 0. Butexistence of such a v is necessary for various constraint qualifications. In finite dimension, thismeans two active constraints with gradients of opposite sign that must be linearly dependent.

Proposition 4.10 (No strict descent direction). Suppose f, g and h are Frechet differentiable at x, h′ is continuous near x, and h′(x) is surjective. Then x can only be a local minimizer if there exists no v ∈ X such that 〈f ′(x), v〉 < 0, h′(x)v = 0 and g(x) + g′(x)v ∈ −Int(C).

Proof. Since v ∈ ker(h′(x)) = T (h−1(0), x) there exists a sequence xk ∈ h−1(0) with

xk → x and (xk − x)/‖xk − x‖ → v/‖v‖, where w.l.o.g. ‖v‖ = 1.

By Frechet differentiability we have

[f(xk) − f(x) − f ′(x)(xk − x)] / ‖xk − x‖ → 0 =⇒ [f(xk) − f(x)] / ‖xk − x‖ → f ′(x)v < 0.

This contradicts local minimality provided we can show that g(xk) ∈ −C for large k and thus xk ∈ S:

[g(xk) − g(x)] / ‖xk − x‖ → g′(x)v,

yk ≡ g(x) + [g(xk) − g(x)] / ‖xk − x‖ → g(x) + g′(x)v ∈ −Int(C) =⇒ yk ∈ −C for large k

=⇒ g(xk) = ‖xk − x‖ yk + (1 − ‖xk − x‖) g(x) ∈ −C,

since this right-hand side is a convex combination of two elements of −C (note ‖xk − x‖ ≤ 1 for large k).


Geometric Motivation of Lagrange Multiplier via Separation

Suppose we have the map

(f, g, h) : X → R × Y × Z.

At any x where this map from X into R × Y × Z is open, arbitrary variations of the value triple (f(x), g(x), h(x)) are possible, i.e. they can be realized by perturbations of x. Thus such an x cannot possibly be extremal. Hence for x to be a minimizer we need that (f(x), g(x), h(x)) lies on the boundary of the set

M = {(f(x) + α, g(x) + y, h(x)) : x ∈ X, α > 0, y ∈ Int(C)}.

If M is convex then it has a supporting hyperplane at any boundary point; at the candidate minimizer x this means there exists (µ, y∗, z∗) ≠ 0 such that for all elements (f(x̃) + α, g(x̃) + y, h(x̃)) of M

µ(f(x̃) + α) + 〈y∗, g(x̃) + y〉 + 〈z∗, h(x̃)〉 ≥ µf(x) + 〈y∗, g(x)〉 + 〈z∗, h(x)〉.

Problem: M is generally not convex. Remedy: Replace M by the linearization

M = {(f ′(x)v + α, g(x) + g′(x)v + y, h′(x)v) : 0 < α ∈ R, v ∈ X, y ∈ Int(C)}.

Under suitable assumptions M is convex, open, and does not contain 0.

Lemma. For x ∈ X where f, g, and h are Frechet differentiable

M = {(f ′(x)v + α, g(x) + g′(x)v + y, h′(x)v) : 0 < α ∈ R, v ∈ X, y ∈ Int(C)}

(i) is always convex

(ii) is open if h′(x) is surjective

(iii) does not contain 0 if x is a local minimizer, so that there are no directions of descent according to Proposition 4.10.

Proof. Let (vi, αi, yi) ∈ X × R+ × Int(C) for i = 1, 2

(i) Then we have for any λ ∈ [0, 1]

λ(f ′(x)v1 + α1) + (1 − λ)(f ′(x)v2 + α2) = f ′(x)(λv1 + (1 − λ)v2) + λα1 + (1 − λ)α2,

λ(g(x) + g′(x)v1 + y1) + (1 − λ)(g(x) + g′(x)v2 + y2) = g(x) + g′(x)(λv1 + (1 − λ)v2) + λy1 + (1 − λ)y2,

λh′(x)v1 + (1 − λ)h′(x)v2 = h′(x)(λv1 + (1 − λ)v2)

=⇒ M is convex as an affine image of convex X × R+ × Int(C).

(ii) By open mapping there exists ρ such that for any ∆h ∈ Z there exists ∆v ∈ X with

h′(x)∆v = ∆h and ‖∆v‖ ≤ ρ‖∆h‖ ⇐⇒ h′(x)(v + ∆v) = h + ∆h


Then look for ∆y such that

g(x) + g′(x)(v + ∆v) + y + ∆y = g + ∆g ∈ Y

For given ∆g and g = g(x) + g′(x)v + y

∆y = ∆g − g′(x)∆v

‖∆y‖ ≤ ‖∆g‖+ ‖g′(x)‖‖∆v‖ ≤ ‖∆g‖+ ‖g′(x)‖ρ‖∆h‖

For given ∆f look for ∆α such that

f ′(x)∆v + ∆α = ∆f =⇒ |∆α| ≤ ‖∆f‖+ ‖f ′(x)‖ρ‖∆h‖

Overall we obtain the estimate

‖∆v‖+ ‖∆y‖+ |∆α| ≤ |∆f |+ ‖∆g‖+ ‖∆h‖ρ(1 + ‖g′(x)‖+ ‖f ′(x)‖)

Since (v, y, α) ∈ Int(X × Int(C) × R+) = X × Int(C) × Int(R+), all sufficiently small changes (∆f, ∆g, ∆h) can be realized, i.e. M is open.

(iii) Suppose 0 ∈ M . Then

h′(x)v = 0 ∧ g(x) + g′(x)v + y = 0 ∧ f ′(x)v + α = 0 for some v ∈ X, α > 0, y ∈ Int(C)

=⇒ g(x) + g′(x)v = −y ∈ −Int(C) ∧ f ′(x)v = −α < 0 ∧ h′(x)v = 0.

This contradicts minimality of x by Proposition 4.10

Proposition 4.11. Suppose f, g, h are Frechet differentiable at x and that h′(x) is surjective and h′ is continuous with respect to x. Then x can only be a local minimizer of f on S ≡ {x ∈ X : h(x) = 0, g(x) ∈ −C} if there exists 0 ≠ (µ, y∗, z∗) ∈ R≥0 × C∗ × Z∗ such that

(i) µf ′(x)v + 〈y∗, g′(x)v〉+ 〈z∗, h′(x)v〉 = 0 for all v ∈ X

(ii) 〈y∗, g(x)〉 = 0 (complementarity)

Moreover, if there is a v ∈ X such that h′(x)v = 0 and g(x) + g′(x)v ∈ −Int(C), then µ > 0 and thus w.l.o.g. µ = 1, which means we have KKT rather than Fritz-John conditions satisfied.

Proof. Convexity and openness of M follow from the previous Lemma. By the Eidelheit separation theorem M can be separated from 0 by some 0 ≠ (µ, y∗, z∗) such that for all v ∈ X, y ∈ Int(C), and α > 0,

µ(α + f ′(x)v) + 〈y∗, g(x) + g′(x)v + y〉+ 〈z∗, h′(x)v〉 > 0.

By continuity and because R≥0 and C are closures of their interiors we obtain for any α ≥ 0 and y ∈ C

µ(α + f ′(x)v) + 〈y∗, g(x) + g′(x)v + y〉+ 〈z∗, h′(x)v〉 ≥ 0

Take α = 1, v = 0 and y = −g(x) ∈ C. Then the inequality reduces to µ ≥ 0.


Taking α = 0, v = 0, y ∈ C gives

〈y∗, g(x) + y〉 ≥ 0 =⇒ 〈y∗, g(x) + λy〉 ≥ 0 for all λ > 0 ⇐⇒ 〈y∗, (1/λ)g(x) + y〉 ≥ 0,

and letting λ → ∞ yields 〈y∗, y〉 ≥ 0 for all y ∈ C, i.e. y∗ ∈ C∗. Taking y = 0 gives 〈y∗, g(x)〉 ≥ 0, while g(x) ∈ −C and y∗ ∈ C∗ give 〈y∗, g(x)〉 = −〈y∗, −g(x)〉 ≤ 0, hence 〈y∗, g(x)〉 = 0 (complementarity condition). Thus for α = 0, using 〈y∗, g(x)〉 = 0, we get

µf ′(x)v + 〈y∗, g′(x)v〉 + 〈z∗, h′(x)v〉 ≥ 0,

which must hold for all v ∈ X and thus also for −v. Hence we have in fact equality as asserted in (i).

Case µ > 0: Under MFCQ there exists v such that h′(x)v = 0 and g(x) + g′(x)v ∈ −Int(C). For that v, (i) gives

µf ′(x)v + 〈y∗, g′(x)v〉 + 〈z∗, h′(x)v〉 = 0,

where 〈z∗, h′(x)v〉 = 0 and, by complementarity, 〈y∗, g′(x)v〉 = 〈y∗, g(x) + g′(x)v〉 ≤ 0. If µ were 0, then also 〈y∗, g(x) + g′(x)v〉 = 0, which together with g(x) + g′(x)v ∈ −Int(C) forces y∗ = 0; then h′(x)∗z∗ = 0 with surjective h′(x) gives z∗ = 0, contradicting (µ, y∗, z∗) ≠ 0. Hence µ > 0 and w.l.o.g. µ = 1.
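As a finite-dimensional sanity check of Proposition 4.11 with C = R²₊, the following sketch (toy problem and data assumed purely for illustration) solves a small inequality-constrained problem with SciPy, recovers multipliers from the stationarity condition by least squares, and verifies nonnegativity and complementarity.

    import numpy as np
    from scipy.optimize import minimize

    # Toy problem: min (x1-2)^2 + (x2-1)^2  s.t.  g1 = x1^2 + x2^2 - 1 <= 0,  g2 = -x1 <= 0
    f  = lambda x: (x[0] - 2)**2 + (x[1] - 1)**2
    g  = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, -x[0]])
    df = lambda x: np.array([2*(x[0] - 2), 2*(x[1] - 1)])
    dg = lambda x: np.array([[2*x[0], 2*x[1]], [-1.0, 0.0]])   # rows = gradients of g_i

    res = minimize(f, x0=np.array([0.5, 0.5]), method="SLSQP",
                   constraints=[{"type": "ineq", "fun": lambda x: -g(x)}])
    x_bar = res.x

    # recover multipliers y* from stationarity  f'(x) + g'(x)* y* = 0  (least squares)
    y_star, *_ = np.linalg.lstsq(dg(x_bar).T, -df(x_bar), rcond=None)
    print("x_bar =", x_bar)
    print("y* =", y_star, "(expected componentwise >= 0)")
    print("complementarity <y*, g(x_bar)> =", y_star @ g(x_bar))        # ~ 0
    print("stationarity residual:", df(x_bar) + dg(x_bar).T @ y_star)   # ~ 0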

Review of Constraint Qualifications

1) Slater condition: ∃ x̃ ∈ h−1(0) such that g(x̃) ∈ −Int(C).

2) Mangasarian-Fromowitz constraint qualification (MFCQ) at x ∈ S: ∃ x̃ such that h′(x)(x̃ − x) = 0 and g(x) + g′(x)(x̃ − x) ∈ −Int(C). This can be regarded as the Slater condition for the local linearization of S.

3) Kurcyusz-Zowe (Robinson) condition (KZCQ) at x ∈ S: g′(x) ker(h′(x)) + Cone(g(x) + C) = Y.

Lemma. MFCQ implies KZCQ

Proof. Let x̃ be as in MFCQ and pick any y ∈ Y. Since g(x) + g′(x)(x̃ − x) ∈ −Int(C), for some sufficiently large 0 < λ ∈ R we have

g(x) + g′(x)(x̃ − x) − (1/λ) y ∈ −C

=⇒ y ∈ g′(x)[λ(x̃ − x)] + λg(x) + C ⊂ g′(x) ker(h′(x)) + Cone(g(x) + C).

µ > 0 under KZCQ: Assume µ = 0. Pick an arbitrary y ∈ Y and write, using KZCQ, y = g′(x)v + λ(g(x) + ỹ) with v ∈ ker(h′(x)), λ ≥ 0, ỹ ∈ C. Then we have

〈y∗, y〉 = 〈y∗, g′(x)v〉 + λ〈y∗, g(x)〉 + λ〈y∗, ỹ〉 = −〈z∗, h′(x)v〉 + 0 + λ〈y∗, ỹ〉 = λ〈y∗, ỹ〉 ≥ 0,

using (i) with µ = 0, complementarity, v ∈ ker(h′(x)) and y∗ ∈ C∗. Hence 〈y∗, y〉 ≥ 0 for all y ∈ Y =⇒ 〈y∗, y〉 = 0 for all y ∈ Y =⇒ y∗ = 0. Then (i) gives 〈z∗, h′(x)v〉 = 0 for all v ∈ X, i.e. h′(x)∗z∗ = 0 =⇒ z∗ = 0 because ker(h′(x)∗) = Range(h′(x))⊥ = {0}. Thus (µ, y∗, z∗) = 0 ∈ R × C∗ × Z∗, which contradicts the construction by Eidelheit separation.


Generalization to more general problems:

min_{x∈S} f(x) where S ≡ {x ∈ Ŝ : g(x) ∈ −C, h(x) = 0}

and Ŝ ⊂ X is closed and convex with nonempty interior.

Kurcyusz-Zowe: {(g′(x)v, h′(x)v) : v ∈ Cone(Ŝ − x)} + Cone(g(x) + C) × {0} = Y × Z

The multiplier (µ, y∗, z∗) is as in Proposition 4.11, with µ ≠ 0 under KZCQ, but the adjoint equality becomes the inequality

0 ≤ µf ′(x)(x̃ − x) + 〈y∗, g′(x)(x̃ − x)〉 + 〈z∗, h′(x)(x̃ − x)〉 for all x̃ ∈ Ŝ.

Remark: To obtain sufficient optimality conditions requires some kind of convexity of f and g and linearity of h.

Definition. The map g : X → Y (Banach spaces) with Y partially ordered by C is called convex if for all x, x̃ ∈ X and λ ∈ [0, 1]

g((1 − λ)x + λx̃) − (1 − λ)g(x) − λg(x̃) ∈ −C.

Lemma. If g is convex and y∗ ∈ C∗, then ϕ(x) ≡ 〈y∗, g(x)〉 : X → R is convex in the classical (= scalar) sense.

Proof. ϕ((1 − λ)x + λx̃) − (1 − λ)ϕ(x) − λϕ(x̃) = 〈y∗, g((1 − λ)x + λx̃) − (1 − λ)g(x) − λg(x̃)〉 ≤ 0, since the argument lies in −C and y∗ ∈ C∗.

Proposition 4.12. Suppose (x, y∗, z∗) ∈ X × C∗ × Z∗ satisfies KKT conditions

h(x) = 0, g(x) ∈ −C

〈y∗, g(x)〉 = 0

f ′(x) + g′(x)∗y∗ + h′(x)∗z∗ = 0

Assume further that f and g are convex and h is linear. Then x is a global minimizer of the problem

min_{x∈S} f(x) with S = {x ∈ X : g(x) ∈ −C, h(x) = 0}.

Proof. The Lagrangian

L(x) = f(x) + 〈y∗, g(x)〉 + 〈z∗, h(x)〉

is convex and Frechet differentiable, and by the KKT conditions it is stationary at x. Hence by Corollary 4.5, x is a global minimizer of L. For any feasible x̃ ∈ S we have

f(x̃) = L(x̃) − 〈y∗, g(x̃)〉 − 〈z∗, h(x̃)〉 = L(x̃) + 〈y∗, −g(x̃)〉 ≥ L(x̃) ≥ L(x) = f(x),

using h(x̃) = 0, −g(x̃) ∈ C, y∗ ∈ C∗ and L(x) = f(x) by complementarity. Hence x is also a global minimizer of f over S.
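A minimal numerical sketch of Proposition 4.12 for a convex quadratic objective with a linear equality constraint (all data below are arbitrary illustrative choices, not from the script): the KKT system is linear, and random feasible points confirm global minimality.

    import numpy as np

    # toy convex problem:  min 1/2 x^T Q x + c^T x   s.t.   A x = b
    rng = np.random.default_rng(1)
    Q = np.diag([2.0, 1.0, 0.5]); c = np.array([1.0, -2.0, 0.0])
    A = np.array([[1.0, 1.0, 1.0]]); b = np.array([1.0])

    # KKT system:  Q x + c + A^T z* = 0,  A x = b   (f convex, h linear => sufficient)
    K = np.block([[Q, A.T], [A, np.zeros((1, 1))]])
    sol = np.linalg.solve(K, np.concatenate([-c, b]))
    x_bar, z_star = sol[:3], sol[3:]
    f = lambda x: 0.5 * x @ Q @ x + c @ x

    # check global minimality over random feasible points x_bar + (null space of A) t
    N = np.linalg.svd(A)[2][1:].T            # columns span ker(A)
    best_sampled = min(f(x_bar + N @ rng.normal(size=2)) for _ in range(10000))
    print("f(x_bar) =", f(x_bar), " min over sampled feasible points =", best_sampled)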

Corollary 4.13 (Lagrange multipliers as sensitivities). Under the assumptions of Proposition 4.12on f , g and h consider the following pair of optimization problems

min f(x) : g(x) + yi ∈ −C ∧ h(x) + zi = 0 for i = 0, 1, with yi ∈ Y and zi ∈ Z.

Let (xi, y∗i, z∗i) ∈ X × C∗ × Z∗ be KKT points of these problems for i = 0, 1. Then we have

〈y∗0, y1 − y0〉+ 〈z∗0 , z1 − z0〉 ≤ f(x1)− f(x0) ≤ 〈y∗1, y1 − y0〉+ 〈z∗1 , z1 − z0〉


Proof. Since the problems are convex, xi is a global minimizer over X of the Lagrangian Li(x) = f(x) + 〈y∗i, g(x) + yi〉 + 〈z∗i, h(x) + zi〉, and Li(xi) = f(xi) by feasibility and complementarity. Hence

f(x1) = f(x1) + 〈y∗1, g(x1) + y1〉 + 〈z∗1, h(x1) + z1〉 ≤ f(x0) + 〈y∗1, g(x0) + y1〉 + 〈z∗1, h(x0) + z1〉.

Since g(x0) + y0 ∈ −C, y∗1 ∈ C∗ and h(x0) + z0 = 0, the right-hand side is at most f(x0) + 〈y∗1, y1 − y0〉 + 〈z∗1, z1 − z0〉, so

f(x1) − f(x0) ≤ 〈y∗1, y1 − y0〉 + 〈z∗1, z1 − z0〉.

Exchanging the indices (by symmetry) gives

f(x0) − f(x1) ≤ 〈y∗0, y0 − y1〉 + 〈z∗0, z0 − z1〉,

and changing signs yields the left inequality in the assertion.

Remark: By the (generalized) Cauchy-Schwarz inequality |〈y∗, y〉| ≤ ‖y∗‖‖y‖ we get

f(x1) − f(x0) ≤ ‖y∗1‖‖y1 − y0‖ + ‖z∗1‖‖z1 − z0‖,
f(x0) − f(x1) ≤ ‖y∗0‖‖y1 − y0‖ + ‖z∗0‖‖z1 − z0‖.
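The two-sided sensitivity bounds can be checked on a small perturbed equality-constrained problem (a hypothetical toy instance, chosen only for illustration): f(x) = 1/2‖x‖², h(x) = x1 + x2 − 1, with constraints h(x) + z_i = 0 for two right-hand sides.

    import numpy as np

    def solve(z):
        # KKT system of the convex quadratic problem: x + z* (1,1)^T = 0, x1 + x2 = 1 - z
        K = np.array([[1.0, 0.0, 1.0],
                      [0.0, 1.0, 1.0],
                      [1.0, 1.0, 0.0]])
        x1, x2, z_star = np.linalg.solve(K, np.array([0.0, 0.0, 1.0 - z]))
        x = np.array([x1, x2])
        return x, z_star, 0.5 * x @ x

    z0, z1 = 0.0, 0.2
    (x0, zs0, f0), (x1, zs1, f1) = solve(z0), solve(z1)
    lower, upper = zs0 * (z1 - z0), zs1 * (z1 - z0)
    print(lower, "<=", f1 - f0, "<=", upper)   # the bounds of Corollary 4.13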

Corollary 4.14 (The value function). f̂(y, z) ≡ inf{f(x) : g(x) + y ∈ −C, h(x) + z = 0} is convex under the assumptions of Proposition 4.12, and at any (y0, z0) with finite f̂(y0, z0) there is a subgradient (y∗0, z∗0) ∈ C∗ × Z∗ such that

f̂(y, z) − f̂(y0, z0) ≥ 〈y∗0, y − y0〉 + 〈z∗0, z − z0〉.

Proof.

f̂((1 − λ)y0 + λy1, (1 − λ)z0 + λz1)
= inf{ f(x) : g(x) + (1 − λ)y0 + λy1 ∈ −C, h(x) + (1 − λ)z0 + λz1 = 0 }
≤ inf{ f((1 − λ)x0 + λx1) : g(xi) + yi ∈ −C, h(xi) + zi = 0 for i = 0, 1 }
≤ inf{ (1 − λ)f(x0) + λf(x1) : g(xi) + yi ∈ −C, h(xi) + zi = 0 for i = 0, 1 }
= (1 − λ)f̂(y0, z0) + λf̂(y1, z1),

where the first inequality holds because, by convexity of g and linearity of h, (1 − λ)x0 + λx1 is feasible for the averaged perturbation, and the second follows from convexity of f. This establishes convexity of f̂. Hence subgradients exist, and (y∗0, z∗0) ∈ ∂f̂(y0, z0) follows from the left inequality in Corollary 4.13 applied with (y1, z1) = (y, z) and f(x1) = f̂(y, z).


5 Duality Results

Consider now the “primal” problem

min f(x) : x ∈ S ≡ {x ∈ Ŝ : g(x) ∈ −C},

where Ŝ ⊂ X is closed and convex and may incorporate linear equality constraints.

Definition. The dual objective ϕ : C∗ → R is given by

ϕ(y∗) = inf_{x∈Ŝ} ( f(x) + 〈y∗, g(x)〉 ).

The set S∗ ≡ {y∗ ∈ C∗ : ϕ(y∗) > −∞} is called the feasible set of the dual problem.

Lemma (Weak Duality). Without any convexity assumptions on f or g,

ϕ̄ ≡ sup_{y∗∈S∗} ϕ(y∗) ≤ inf_{x∈S} f(x) = f̄.

Proof. For any y∗ ∈ C∗ and any x ∈ S ⊂ Ŝ we have ϕ(y∗) ≤ f(x) + 〈y∗, g(x)〉 ≤ f(x).

Remark: The difference f̄ − ϕ̄ ≥ 0 is called the duality gap. It can be shown to vanish under suitable conditions, usually involving some kind of convexity. (Even in semi-definite programming convexity may not be enough.)

Semi-Infinite Programming (SIP):

min c⊤x such that a(t)⊤x ≤ β(t) for all t ∈ [0, 1] with a : [0, 1]→ Rn and β : [0, 1]→ R

Semi-Definite Programming (SDP):

min 〈c, x〉 such that 〈Ai, x〉 ≤ βi ∈ R,

where 〈M, x〉 = Trace(M⊤x), M = M⊤ and x = x⊤ ⪰ 0; here x ⪰ 0 ⇐⇒ t⊤xt ≥ 0 for all t ∈ Rⁿ with ‖t‖ = 1.
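A quick numerical illustration of the trace inner product and the semidefinite order (random data, assumed only for illustration): the sampled quadratic-form test and the eigenvalue test agree.

    import numpy as np

    rng = np.random.default_rng(2)
    M = rng.normal(size=(3, 3)); M = (M + M.T) / 2       # symmetric M
    X = rng.normal(size=(3, 3)); X = X @ X.T             # X = X^T, positive semidefinite

    print("<M, X> = trace(M^T X) =", np.trace(M.T @ X))
    # X >= 0  <=>  t^T X t >= 0 for unit vectors t  <=>  all eigenvalues >= 0
    t = rng.normal(size=(1000, 3)); t /= np.linalg.norm(t, axis=1, keepdims=True)
    print("min over sampled t of t^T X t:", np.einsum('ij,jk,ik->i', t, X, t).min())
    print("min eigenvalue of X:", np.linalg.eigvalsh(X).min())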

Definition. g : X → Y is called convex-like on S ⊂ X with respect to C ⊂ Y if g(S) + C isconvex in Y.

Lemma. Any convex g is convex-like.

Proof. For xi ∈ S and yi ∈ C, i = 0, 1, and λ ∈ [0, 1],

(1 − λ)(g(x0) + y0) + λ(g(x1) + y1)
= (1 − λ)y0 + λy1 + [(1 − λ)g(x0) + λg(x1) − g((1 − λ)x0 + λx1)] + g((1 − λ)x0 + λx1)
= g((1 − λ)x0 + λx1) + y with y ∈ C
∈ g(S) + C,

since the bracketed term lies in C by convexity of g, and (1 − λ)y0 + λy1 ∈ C by convexity of C.


Examples concerning duality gap

(i) min f(x) = −x² s.t. x ∈ Ŝ = [0, 2], g(x) = (x − 1, 1 − x)⊤ ≤ (0, 0)⊤, with C = {(y1, y2) : y1 ≥ 0, y2 ≥ 0}.

The only feasible point is x = 1, so the optimal value is f̄ = f(1) = −1. The dual objective is

ϕ(y∗1, y∗2) = inf_{0≤x≤2} [ −x² + y∗1(x − 1) + y∗2(1 − x) ] = inf_{0≤x≤2} [ −x² + (y∗1 − y∗2)(x − 1) ]
= min{ −(y∗1 − y∗2), −4 + (y∗1 − y∗2) } ≤ −2,

since the infimum of the concave quadratic is attained at x = 0 or x = 2. Hence

sup_{y∗1≥0, y∗2≥0} ϕ(y∗1, y∗2) = ϕ(2, 0) = −2 = ϕ̄ < −1 = f̄

=⇒ duality gap = f̄ − ϕ̄ = 1, even though 〈y∗, g(x)〉 = 2·(1 − 1) + 0·(1 − 1) = 0 at the primal solution. (A small numerical check of this example follows after Example (ii).)

(ii) min −x² s.t. x − 1 ≤ 0, −x ≤ 0, with Ŝ = R, so that S = [0, 1] and f̄ = f(1) = −1. Here

ϕ(y∗1, y∗2) = inf_{x∈R} [ −x² + y∗1(x − 1) + y∗2(−x) ] = −∞ for all (y∗1, y∗2),

so ϕ̄ = −∞ < f̄ = −1.

Check convex-likeness: with the substitution g2 = −x,

(f(x), g1(x), g2(x)) = (−x², x − 1, −x) = (−g2², −1 − g2, g2),

so (f, g)(Ŝ) + (R+ × C) = { (−g2² + α, −1 − g2 + β, g2 + γ) : g2 ∈ R, α ≥ 0, β ≥ 0, γ ≥ 0 }, which is not convex (see Figure 3).

Figure 3: Illustration of the failure of convex-likeness in Example (ii): the set (f, g)(Ŝ) + C sketched over the (g1, g2) plane, with the curve g1 = −1 − g2 marking the boundary and points outside labeled “not in set”.
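The duality gap in Example (i) can be reproduced numerically with a simple grid search (a rough sketch; grid sizes are arbitrary choices): the dual supremum comes out near −2 while the primal value is −1.

    import numpy as np

    # Example (i): f(x) = -x^2 on S_hat = [0, 2], g(x) = (x - 1, 1 - x) <= 0, f_bar = -1.
    x = np.linspace(0.0, 2.0, 2001)

    def phi(y1, y2):
        # dual objective: infimum of the Lagrangian over [0, 2]
        return (-x**2 + y1 * (x - 1) + y2 * (1 - x)).min()

    ys = np.linspace(0.0, 5.0, 251)
    vals = np.array([[phi(y1, y2) for y2 in ys] for y1 in ys])
    i, j = np.unravel_index(vals.argmax(), vals.shape)
    print("sup phi ~", vals[i, j], "attained near (y1*, y2*) ~", (ys[i], ys[j]))
    print("duality gap ~", -1.0 - vals[i, j])     # expected ~ 1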


Proposition 5.1 (Strong duality). Suppose (i) (f, g) is convex-like on Ŝ with respect to R+ × C; (ii) f, g are continuously Frechet differentiable on a neighborhood of Ŝ; (iii) g(x̂) ∈ −Int(C) for some x̂ ∈ Ŝ (Slater condition); and (iv) the primal problem has a minimizer x̄ ∈ X. Then the dual objective

ϕ(y∗) = inf_{x∈Ŝ} ( f(x) + 〈y∗, g(x)〉 ) : C∗ → R

has a maximizer ȳ∗ and there is no duality gap, i.e. f(x̄) = ϕ(ȳ∗).

Proof. Look at the set

M = { (f(x) + α, g(x) + y) ∈ R × Y : x ∈ Ŝ, α > 0, y ∈ C }, i.e. (α, y) ranges over R+ × C.

Convex-likeness ensures exactly that M is convex, and Int(M) ≠ ∅ since Int(R+ × C) ≠ ∅. By solvability of the primal problem we have (f(x̄), 0) ∉ M. Using Eidelheit separation one finds 0 ≠ (µ, y∗) ∈ R × Y∗ such that for all x ∈ Ŝ, α ≥ 0 and y ∈ C,

µ(f(x) + α) + 〈y∗, g(x) + y〉 ≥ µf(x̄).

Using α = 0 we get µ(f(x) − f(x̄)) + 〈y∗, g(x) + y〉 ≥ 0. Positive scaling of y, keeping everything else fixed, yields 〈y∗, y〉 ≥ 0, i.e. y∗ ∈ C∗; similarly, letting α grow yields µ ≥ 0. Using y = 0 and x = x̄ we get 〈y∗, g(x̄)〉 ≥ 0, while g(x̄) ∈ −C and y∗ ∈ C∗ give 〈y∗, g(x̄)〉 ≤ 0, hence 〈y∗, g(x̄)〉 = 0. Moreover µ > 0: otherwise y∗ ≠ 0, and x = x̂, y = 0 would give 〈y∗, g(x̂)〉 ≥ 0, whereas 0 ≠ y∗ ∈ C∗ and g(x̂) ∈ −Int(C) imply 〈y∗, g(x̂)〉 < 0. So w.l.o.g. µ = 1 and, for all x ∈ Ŝ,

f(x) + 〈y∗, g(x)〉 ≥ f(x̄) + 〈y∗, g(x̄)〉 = f(x̄)

=⇒ ϕ(y∗) ≥ f(x̄) = f̄. Together with weak duality this gives ϕ̄ = sup_{y∗∈C∗} ϕ(y∗) = f̄, attained at ȳ∗ = y∗.

Example: min x s.t. sin x ≤ 0, with Ŝ = [a, ∞), a ∈ [−π/2, π/2].

The dual objective is

ϕ(y∗) = inf_{a≤x} ( x + y∗ sin x ) ≥ a − y∗ > −∞ for y∗ ≥ 0.

The infimum is attained either at the lower boundary x = a or at one of the stationary points of L(x, y∗) = x + y∗ sin x with respect to x, i.e. where 0 = ∂xL(x, y∗) = 1 + y∗ cos x. If y∗ ∈ [0, 1), there are no solutions of this condition. If y∗ ≥ 1, the stationary points are x ∈ {± arccos(−1/y∗) + 2πk : k ∈ Z}, and

L(x, y∗) = 2πk ± arccos(−1/y∗) + y∗ sin(± arccos(−1/y∗)) = 2πk ± ( arccos(−1/y∗) + √((y∗)² − 1) ).


Since x ≥ a ≥ −π/2, the stationary interior point with the smallest value of L(x, y∗) is one of

x+ = arccos(−1/y∗) with L(x+, y∗) = arccos(−1/y∗) + √((y∗)² − 1), and
x− = −arccos(−1/y∗) + 2π with L(x−, y∗) = 2π − arccos(−1/y∗) − √((y∗)² − 1).

One finds that L(x−, y∗) ≤ L(x+, y∗), so in summary

ϕ(y∗) = a + y∗ sin(a) for y∗ ∈ [0, 1),
ϕ(y∗) = min{ a + y∗ sin(a), 2π − arccos(−1/y∗) − √((y∗)² − 1) } for y∗ ≥ 1.

Case a = −π/2: f̄ = −π/2 = x̄. Here Proposition 5.1 applies due to convex-likeness. We get

ϕ(y∗) = −π/2 − y∗ < 0 if 0 ≤ y∗ < 1,
ϕ(y∗) = min{ −π/2 − y∗, 2π − arccos(−1/y∗) − √((y∗)² − 1) } if y∗ ≥ 1,
      = −π/2 − y∗ for all y∗ ≥ 0

=⇒ ȳ∗ = 0 =⇒ ϕ̄ = ϕ(ȳ∗) = −π/2 = f̄ (i.e. no duality gap).

Case a = π/2: f̄ = π = x̄. Here Proposition 5.1 does not apply due to lack of convex-likeness.

ϕ(y∗) = π/2 + y∗ for y∗ ∈ [0, 1),
ϕ(y∗) = min{ π/2 + y∗, 2π − arccos(−1/y∗) − √((y∗)² − 1) } for y∗ ≥ 1,

from which one concludes by numerical methods that ȳ∗ ≈ 1.38005 with ϕ̄ = ϕ(ȳ∗) ≈ 2.95085

=⇒ ϕ̄ ≈ 2.95085 < π ≈ 3.14159 = f̄, i.e. there is a duality gap.
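The numbers above for a = π/2 can be reproduced by a crude grid evaluation of the dual objective (a sketch: the grid and truncation are arbitrary choices, and the infimum over [a, ∞) is attained within the first two periods, so a truncated interval suffices).

    import numpy as np

    a = np.pi / 2
    x = np.linspace(a, a + 4 * np.pi, 20001)      # truncated grid for the inner infimum

    def phi(y_star):
        return (x + y_star * np.sin(x)).min()     # dual objective (grid approximation)

    ys = np.linspace(0.0, 3.0, 3001)
    vals = np.array([phi(y) for y in ys])
    k = vals.argmax()
    print("y_bar* ~", ys[k], " phi_bar ~", vals[k], " f_bar = pi =", np.pi)
    # expected output: y_bar* ~ 1.38, phi_bar ~ 2.9508, confirming the duality gap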

Consequence of strong duality: The Lagrangian function

L(x, y∗) = f(x) + 〈y∗, g(x)〉 : Ŝ × C∗ → R

has a saddle point at (x̄, ȳ∗) in the sense that

L(x̄, y∗) ≤ L(x̄, ȳ∗) ≤ L(x, ȳ∗) for all x ∈ Ŝ and y∗ ∈ C∗.

Question: Is the converse true, i.e. does a saddle point imply that there is no duality gap?
Answer: Yes, as we show below.

Lemma. For any L : Ŝ × C∗ → R ∪ {−∞, +∞},

ϕ̄ ≡ sup_{y∗∈C∗} inf_{x∈Ŝ} L(x, y∗) ≤ inf_{x∈Ŝ} sup_{y∗∈C∗} L(x, y∗) ≡ f̄

without any restriction other than C∗ ≠ ∅ and Ŝ ≠ ∅.


Proof. If ϕ̄ = −∞ the assertion is trivial. Otherwise there exists, for any ε > 0, a y∗ε such that

ϕ̄ − ε ≤ inf_{x∈Ŝ} L(x, y∗ε) ≤ ϕ̄

=⇒ ϕ̄ − ε ≤ L(x, y∗ε) ≤ sup_{y∗∈C∗} L(x, y∗) for all x ∈ Ŝ

=⇒ ϕ̄ − ε ≤ inf_{x∈Ŝ} sup_{y∗∈C∗} L(x, y∗) = f̄.

Since ε may be chosen arbitrarily small, we get ϕ̄ ≤ f̄.

Proposition 5.2. If there is a saddle point (x̄, ȳ∗), i.e.

L(x̄, y∗) ≤ L(x̄, ȳ∗) ≤ L(x, ȳ∗) for all x ∈ Ŝ and y∗ ∈ C∗,

then we have no duality gap, i.e. ϕ̄ = f̄ with ϕ̄, f̄ as defined above.

Proof.

L(x̄, ȳ∗) = inf_{x∈Ŝ} L(x, ȳ∗) ≤ sup_{y∗∈C∗} inf_{x∈Ŝ} L(x, y∗)
         ≤ inf_{x∈Ŝ} sup_{y∗∈C∗} L(x, y∗)
         ≤ sup_{y∗∈C∗} L(x̄, y∗) ≤ L(x̄, ȳ∗).

Thus exact equality holds throughout, and in particular ϕ̄ = f̄.

Example (No duality gap but also no saddle point). Take f(x) = e^{−x}, Ŝ ≡ R, g(x) = −x ≤ 0, y∗ ≥ 0, so that

L(x, y∗) = e^{−x} − y∗x.

f̄ = inf_{x∈R} sup_{y∗≥0} ( e^{−x} − y∗x ) = inf_{x∈R} { +∞ for x < 0, e^{−x} for x ≥ 0 } = 0,

ϕ̄ = sup_{y∗≥0} inf_{x∈R} ( e^{−x} − y∗x ) = sup_{y∗≥0} { −∞ if y∗ > 0, 0 if y∗ = 0 } = 0.

But the primal problem has no optimal solution x̄.
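A small truncated-grid illustration of this example (the truncation points are arbitrary; on the real line the inner infimum for y∗ > 0 is genuinely −∞):

    import numpy as np

    # L(x, y*) = exp(-x) - y* x on S_hat = R, y* >= 0
    x = np.linspace(-5.0, 50.0, 100001)
    L = lambda y_star: np.exp(-x) - y_star * x

    # inner infimum: keeps decreasing with the truncation for y* > 0, equals 0 for y* = 0
    for y_star in (0.0, 0.01, 0.1):
        print("y* =", y_star, " min of L on truncated grid:", L(y_star).min())

    # f_bar = inf_x sup_{y*>=0} L = inf_{x>=0} exp(-x) = 0, but it is not attained
    print("inf of exp(-x) over x >= 0 on grid:", np.exp(-x[x >= 0]).min())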

Proposition 5.3. If for some (x̄, ȳ∗) ∈ Ŝ × C∗

inf_{x∈Ŝ} L(x, ȳ∗) = max_{y∗∈C∗} inf_{x∈Ŝ} L(x, y∗) = min_{x∈Ŝ} sup_{y∗∈C∗} L(x, y∗) = sup_{y∗∈C∗} L(x̄, y∗),

then (x̄, ȳ∗) is a saddle point of L(x, y∗).

Proof. By assumption inf_{x∈Ŝ} L(x, ȳ∗) = sup_{y∗∈C∗} L(x̄, y∗). Hence for all x ∈ Ŝ and y∗ ∈ C∗

L(x̄, y∗) ≤ sup_{y∗∈C∗} L(x̄, y∗) = inf_{x∈Ŝ} L(x, ȳ∗) ≤ L(x, ȳ∗).

Taking x = x̄ on the right and y∗ = ȳ∗ on the left yields L(x̄, y∗) ≤ L(x̄, ȳ∗) ≤ L(x, ȳ∗), i.e. (x̄, ȳ∗) is a saddle point.

Proposition 5.4. Under the assumptions of Proposition 5.1, x̄ is a minimizer of f(x) and ȳ∗ is a maximizer of ϕ(y∗) if and only if (x̄, ȳ∗) is a saddle point of the Lagrangian

L(x, y∗) = f(x) + 〈y∗, g(x)〉.


6 Remarks on the Two-Norm Discrepancy

Definition. f : U ⊂ X → R is called twice continuously Frechet differentiable on an open domain U if the gradient map

f ′ : U ⊂ X → X∗

has a Frechet derivative f ′′(x) which is continuous with respect to x. Note that f ′′ : U → L(X, X∗) and f ′′(x)[u, v] = f ′′(x)[v, u] ∈ R is bilinear in u, v ∈ X.

Lemma. Under the above assumptions on f, at x ∈ U,

f(x + v) = f(x) + f ′(x)v + (1/2) f ′′(x)[v, v] + r(x, v), where r(x, v) = o(‖v‖²), i.e. lim_{‖v‖→0} r(x, v)/‖v‖² = 0.

Proposition 6.1. If f is twice continuously Frechet differentiable on U ∋ x̄, f ′(x̄) = 0, and for some δ > 0

f ′′(x̄)[v, v] ≥ δ‖v‖² for all v ∈ X,

then x̄ is a local minimizer of f, and there exists an ε > 0 such that

f(x) ≥ f(x̄) + (δ/3)‖x − x̄‖² if ‖x − x̄‖ ≤ ε.

Proof.

f(x) − f(x̄) = (1/2) f ′′(x̄)[x − x̄, x − x̄] + r(x̄, x − x̄)
            ≥ (δ/2)‖x − x̄‖² − o(‖x − x̄‖²)
            ≥ (δ/3)‖x − x̄‖² for ‖x − x̄‖ ≤ ε with ε > 0 sufficiently small.

(Counter)Example:

f(x) = −∫₀¹ cos(x(ω)) dω, x ∈ X = L²[0, 1],

f ′(x)v = ∫₀¹ sin(x(ω)) v(ω) dω.

A candidate minimizer is x̄(ω) ≡ 0, where

f(x̄) = −1 ≤ f(x) for all x ∈ X,

f ′(x̄)v = ∫₀¹ sin(0) v(ω) dω = 0,

f ′′(x̄)[v, v] = ∫₀¹ cos(0) v(ω)² dω = ‖v‖²_{L²}.

For δ = 1, Proposition 6.1 seems to imply that x̄ = 0 is the only minimizer in some neighborhood of x̄ = 0. But for any ε > 0 the profile

xε(ω) = 2π for 0 ≤ ω ≤ ε, xε(ω) = 0 for ε < ω ≤ 1,


also attains the minimal value f(xε) = −1, and

‖xε − x̄‖² = ∫₀¹ xε(ω)² dω = ∫₀^ε (2π)² dω = 4π²ε

becomes arbitrarily small as ε → 0. Proposition 6.1 does not apply because the gradient

f ′(x)v = ∫₀¹ sin(x(ω)) v(ω) dω

is nowhere Frechet differentiable.
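A discretized illustration of the two-norm discrepancy (a uniform grid stands in for L²[0, 1]; the grid size is an arbitrary choice): the competing profiles xε all attain the minimal value while their L² distance to x̄ = 0 shrinks to zero.

    import numpy as np

    w = np.linspace(0.0, 1.0, 200001)
    f = lambda x: -np.mean(np.cos(x))               # approximates -int_0^1 cos(x(w)) dw

    for eps in (0.1, 0.01, 0.001):
        x_eps = np.where(w <= eps, 2 * np.pi, 0.0)  # the competing minimizer x_eps
        l2 = np.sqrt(np.mean(x_eps**2))             # approximates ||x_eps||_{L^2} = 2*pi*sqrt(eps)
        print(f"eps={eps}: f(x_eps)={f(x_eps):+.6f}, ||x_eps||_2={l2:.4f}, "
              f"2*pi*sqrt(eps)={2*np.pi*np.sqrt(eps):.4f}")
    print("f(x_bar = 0) =", f(np.zeros_like(w)))    # the candidate minimizer also gives -1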

Possible Remedies: It can be shown that f ∈ C² on the space X = L∞[0, 1], i.e. the space of all essentially bounded functions on [0, 1]. But with respect to the ‖·‖∞ norm we do not have f ′′(0)[v, v] ≥ δ‖v‖²∞ for any δ > 0. Second order sufficiency thus works neither with respect to ‖·‖₂ nor with respect to ‖·‖∞. However, one can show that

f(x) ≥ f(x̄) + (1/3)‖x − x̄‖²₂ if ‖x − x̄‖∞ ≤ ε.


Appendix: Examples concerning existence of minimizers

Example 1: Consider a sequence of functions xj as shown in Figure 4:

Figure 4: Sequence of functions xj

‖xj‖∞ → 0 = ‖x∗‖, but ‖xj‖0,1 = 1 for all j. x∗ = 0 is a weak limit, which is good enough: lim f(xj) ≥ f(x∗) for suitable f : C[0, 1] → R.

Example 2 (for an existence proof via Arzela-Ascoli): We consider an open pit mining problem as illustrated in Figure 5.

Figure 5: Open pit mine over the domain Ω: the mountain surface z = x0(ω) at reference level z = 0, the excavation profile x(ω) ≥ x0(ω) (representing the excavation depth), with the slope limited by the material.

Program:

• Define set of feasible profiles x ∈ S ⊆ C(Ω).

• Show that S is closed and has a uniform Lipschitz constant ( =⇒ sequentially compact in C(Ω)).


• Define the objective G(x) ≡ gain and the capacity constraint E(x) ≤ Ē.

• Show that both G and E are Lipschitz continuous on C(Ω).

• Conclude that optimal x∗ exists.

Conditions on feasible profiles:

(i) z = x(ω) ≥ x0(ω) for ω ∈ Ω and x(ω) = x0(ω) for ω ∈ ∂Ω, i.e. x − x0 ∈ C0(Ω) ∩ {v : v ≥ 0}, where C0(Ω) denotes the continuous functions satisfying homogeneous boundary conditions and {v ≥ 0} is the nonnegative orthant.

(ii) For all ω, ω̃ ∈ Ω: |x(ω) − x(ω̃)| ≤ ϕ(ω, x(ω))‖ω − ω̃‖ + o(‖ω − ω̃‖), for some given function ϕ : Ω × Z → R (depending on the position in space) with Z = [z̲, z̄]; taking ‖·‖ = ‖·‖₂ on Ω yields orientation invariance. Then S = {x ∈ x0 + C0(Ω) : x satisfies (i) and (ii)}.

Question: Under which conditions on ϕ is S closed?
Answer 0: Certainly if ϕ is continuous, but that is too restrictive.
Example: Consider

ϕ(ω, z) = ϕα(ω) = 1 if ω > 0, 0 if ω < 0, α if ω = 0.

=⇒ All xε for ε > 0 are feasible, but x0 = lim_{ε→0} xε is only feasible if α = 1 ⇐⇒ ϕα(ω) is upper semi-continuous (u.s.c.).

Definition. f : X → R is called upper or lower semi-continuous (u.s.c. or l.s.c.) if

lim_{j→∞} xj = x∗ =⇒ f(x∗) ≥ lim sup_{j→∞} f(xj) or f(x∗) ≤ lim inf_{j→∞} f(xj), respectively.

Theorem. If an infimizing sequence has a limit x∗ and f is l.s.c., then f(x∗) = f∗.

Proposition. If ϕ : Ω × Z → R is upper semi-continuous then S is closed and uniformly Lipschitz continuous, i.e. for some L > 0 and all x ∈ S we have |x(ω) − x(ω̃)| ≤ L‖ω − ω̃‖ for ω, ω̃ ∈ Ω (so Arzela-Ascoli is applicable).

(iii) See (∗) in the proof below:

Proof. Consider x ∈ S̄, the closure of S in C(Ω). Due to u.s.c. of ϕ there exists, for every ε > 0 and ω̄ ∈ Ω, a δ > 0 such that for all ω ∈ Ω and y ∈ S with ‖ω − ω̄‖ < δ and ‖y − x‖∞ < δ,

ϕ(ω, y(ω)) ≤ ϕ(ω̄, x(ω̄)) + ε/4.

Now we prove by contradiction that for all ω, ω̃ ∈ Bδ(ω̄) ⊂ Ω with ω ≠ ω̃ and all y ∈ S with ‖y − x‖∞ < δ,

|y(ω) − y(ω̃)| / ‖ω − ω̃‖ ≤ ϕ(ω̄, x(ω̄)) + ε/2.   (∗)


With the midpoint increment ∆ω ≡ (ω̃ − ω)/2, the difference quotient in (∗) can be estimated by

|y(ω) − y(ω̃)| / ‖ω − ω̃‖ = |y(ω) − y(ω + ∆ω) + y(ω + ∆ω) − y(ω̃)| / (2‖∆ω‖)
≤ [ |y(ω) − y(ω + ∆ω)| + |y(ω + ∆ω) − y(ω̃)| ] / ( ‖∆ω‖ + ‖∆ω‖ )
≤ max{ |y(ω) − y(ω + ∆ω)| / ‖∆ω‖, |y(ω + ∆ω) − y(ω̃)| / ‖∆ω‖ }.

Hence (∗) can only be violated by (ω, ω̃) if it is also violated by (ω, ω + ∆ω) or by (ω + ∆ω, ω̃). Continuing this halving we generate sequences (ωj, ω̃j) of violating pairs, ωj = ω + αj∆ω and ω̃j = ω + α̃j∆ω with α̃j > αj and ‖ωj − ω̃j‖ → 0.

Since the halving produces nested intervals, ωj and ω̃j converge to a common limit ω∗ ∈ [ωj, ω̃j], so that ‖ωj − ω̃j‖ = ‖ωj − ω∗‖ + ‖ω̃j − ω∗‖, and we obtain the inequalities

ϕ(ω∗, y(ω∗)) + ε/4 ≤ ϕ(ω̄, x(ω̄)) + ε/2
≤ |y(ωj) − y(ω̃j)| / ‖ωj − ω̃j‖
≤ [ |y(ωj) − y(ω∗)| + |y(ω̃j) − y(ω∗)| ] / [ ‖ωj − ω∗‖ + ‖ω̃j − ω∗‖ ]
≤ max{ |y(ωj) − y(ω∗)| / ‖ωj − ω∗‖, |y(ω̃j) − y(ω∗)| / ‖ω̃j − ω∗‖ }
≤ ϕ(ω∗, y(ω∗)) + o(1) as j → ∞,

where the first inequality uses the u.s.c. estimate above and the last one uses that y ∈ S satisfies condition (ii) at ω∗. Letting j → ∞ yields ε ≤ 0, a contradiction since ε > 0.

(Return to x ∈ S̄.) For every ε > 0 and every pair ω, ω̃ ∈ Ω with ‖ω − ω̃‖ < δ there exists y ∈ S such that ‖y − x‖∞ ≤ ‖ω − ω̃‖ ε/4. Then

|x(ω) − x(ω̃)| / ‖ω − ω̃‖ ≤ [ |y(ω) − y(ω̃)| + 2‖y − x‖∞ ] / ‖ω − ω̃‖
≤ ϕ(ω̄, x(ω̄)) + ε/2 + 2‖ω − ω̃‖(ε/4) / ‖ω − ω̃‖ ≤ ϕ(ω̄, x(ω̄)) + ε.

Taking ω̄ = ω and letting ε → 0 (and δ → 0) we get

|x(ω) − x(ω̃)| ≤ ϕ(ω, x(ω))‖ω − ω̃‖ + o(‖ω − ω̃‖),

i.e. x satisfies condition (ii). Hence S is closed in C(Ω). Uniform Lipschitz continuity follows similarly.

Remark. Given any function ϕ ∈ L∞(Ω × Z) we can define its upper semi-continuous envelope ϕ̄ by setting ϕ̄(ω, z) = lim sup_{(ω̃,z̃)→(ω,z)} ϕ(ω̃, z̃).

(iv) E(x) = ∫_Ω ∫_{x0(ω)}^{x(ω)} e(ω, z) dz dω, where e is the excavation effort density;

(v) G(x) = ∫_Ω ∫_{x0(ω)}^{x(ω)} g(ω, z) dz dω, where g is the gain density (net profit),

with e, g ∈ L∞(Ω × Z) and e(ω, z) ≥ 0.


Thus we have the static optimization problem max G(x) s.t. x ∈ S and E(x) ≤ Ē. Existence of a solution x∗ follows from the compactness of {x ∈ S : E(x) ≤ Ē} and the continuity of G with respect to ‖·‖∞. In fact E and G are uniformly Lipschitz continuous, i.e.

|E(x) − E(x̃)| = | ∫_Ω ∫_{x̃(ω)}^{x(ω)} e(ω, z) dz dω | ≤ ∫_Ω |x(ω) − x̃(ω)| sup_{z∈Z} |e(ω, z)| dω ≤ |Ω| ‖x − x̃‖∞ ‖e‖∞,

and similarly

|G(x) − G(x̃)| ≤ |Ω| ‖x − x̃‖∞ ‖g‖∞.
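As a rough discretized sketch of the excavation functional and its Lipschitz bound (all densities, profiles and grid sizes below are made-up illustrative choices with Ω = [0, 1] and Z = [0, 2]):

    import numpy as np

    w = np.linspace(0.0, 1.0, 401)
    z = np.linspace(0.0, 2.0, 401)
    W, Z = np.meshgrid(w, z, indexing="ij")
    e = 1.0 + 0.5 * np.sin(3 * W) * np.cos(2 * Z)           # effort density e >= 0
    x0 = 0.2 + 0.1 * np.sin(2 * np.pi * w)                   # mountain surface x0(w)

    def E(x):
        # integrate e over {x0(w) <= z <= x(w)} by masking the (w, z) grid
        mask = (Z >= x0[:, None]) & (Z <= x[:, None])
        return np.sum(e * mask) * (w[1] - w[0]) * (z[1] - z[0])

    x_a = x0 + 0.8 * np.sin(np.pi * w) ** 2                  # two feasible profiles x >= x0
    x_b = x_a + 0.05 * np.cos(4 * w) ** 2
    lhs = abs(E(x_a) - E(x_b))
    rhs = 1.0 * np.max(np.abs(x_a - x_b)) * np.abs(e).max()  # |Omega| ||x_a-x_b||_inf ||e||_inf
    print("|E(x_a) - E(x_b)| =", lhs, " <=  Lipschitz bound", rhs)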

Observation: Neither E nor G is affine. Even assuming e to be concave with respect to z, and thus E(x) to be convex, is unrealistic, since multiple optima are typical.

References

[BGLS06] J. Frederic Bonnans, J. Charles Gilbert, Claude Lemarechal, and Claudia A. Sagastizabal. Numerical optimization. Theoretical and practical aspects. Transl. from the French. 2nd revised ed. Universitext. Berlin: Springer. xiv, 490 p., 2006.

[Cla90] Frank H. Clarke. Optimization and nonsmooth analysis. Reprint. Classics in Applied Mathematics, 5. Philadelphia, PA: SIAM, Society for Industrial and Applied Mathematics. xii, 308 p., 1990.

[Fed96] Herbert Federer. Geometric measure theory. Repr. of the 1969 ed. Classics in Mathematics. Berlin: Springer-Verlag. xvi, 680 p., 1996.

[GJK91] Andreas Griewank, Hubertus Th. Jongen, and Man Kam Kwong. The equivalence of strict convexity and injectivity of the gradient in bounded level sets. Math. Program., Ser. A, 51(2):273–278, 1991.

[GV] A. Griewank and O. Vogel. Manuscript.

[HUL01] Jean-Baptiste Hiriart-Urruty and Claude Lemarechal. Fundamentals of convex analysis. Grundlehren. Text Editions. Berlin: Springer. x, 259 p., 2001.

[Jah96] Johannes Jahn. Introduction to the theory of nonlinear optimization. 2nd rev. ed. Berlin: Springer. viii, 257 p., 1996.

[KK02] Diethard Klatte and Bernd Kummer. Nonsmooth equations in optimization. Regularity, calculus, methods and applications. Nonconvex Optimization and Its Applications 60. Dordrecht: Kluwer Academic Publishers. xxviii, 2002.

[LV03] L.P. Lebedev and I.I. Vorovich. Functional analysis in mechanics. Revised and extended translation of the Russian edition. Springer Monographs in Mathematics. New York, NY: Springer. xi, 238 p., 2003.

[Pon80] J. Ponstein. Approaches to the Theory of Optimization. Cambridge University Press, Cambridge, England, 1980.

[Pre90] D. Preiss. Differentiability of Lipschitz functions on Banach spaces. J. Funct. Anal., 91(2):312–345, 1990.
