
Lecture Notes
Algorithms and Preconditioning in PDE-Constrained Optimization

Prof. Dr. R. Herzog

held in July 2010 at the Summer School on Analysis and Numerics of PDE Constrained Optimization, Lambrecht


Please send comments to: [email protected]

Last updated: January 9, 2012


Contents

Chapter 1. Algorithms in PDE-Constrained Optimization
1 A Taxonomy of Methods
2 Methods for Unconstrained Problems
2.1 Black-Box Methods
2.2 All-at-once Methods
3 Treatment of Inequality Constraints
3.1 Control Constraints: Primal-Dual Active Set Strategy
3.2 Mixed Control-State Constraints
3.3 State Constraints

Chapter 2. Preconditioning in PDE-Constrained Optimization
4 Introduction
5 Properties of Saddle Point Problems
5.1 Saddle Point Problems Arising in Optimization
5.2 Hilbert Space Setting
5.3 Spectral Properties
5.4 An Optimal Control Example
6 Preconditioning KKT Systems
6.1 Early Approaches
6.2 Constraint Preconditioners
6.3 Preconditioned Conjugate Gradients
6.4 Implementation Details

Appendix A. Software
7 Source Code with Comments

Bibliography


CHAPTER 1

Algorithms in PDE-Constrained Optimization

Contents
1 A Taxonomy of Methods
2 Methods for Unconstrained Problems
2.1 Black-Box Methods
2.1.1 Steepest Descent Method
2.1.2 Nonlinear Conjugate Gradient Methods
2.1.3 Newton's Method
2.2 All-at-once Methods
2.2.1 First-Order Augmented Lagrangian Methods
2.2.2 SQP Methods
2.2.3 Augmented-Lagrangian SQP Methods
3 Treatment of Inequality Constraints
3.1 Control Constraints: Primal-Dual Active Set Strategy
3.1.1 Primal-Dual Active Set Strategy as an Outer Iteration
3.1.2 Primal-Dual Active Set Strategy within Newton and SQP Iterations
3.1.3 Primal-Dual Active Set Strategy as a Semismooth Newton Iteration
3.2 Mixed Control-State Constraints
3.3 State Constraints

§ 1 A Taxonomy of Methods

Let us consider mathematical optimization problems which may involve equality and inequality constraints:

Minimize f(x)
s.t. (subject to)  e(x) = 0
and  g(x) ≤ 0.     (1.1)

We assume throughout that f, e, g are sufficiently smooth functions. The optimization variable x can be an element from some finite or infinite dimensional space. In the latter case, and when e(x) = 0 involves a partial differential equation (PDE), we call (1.1) a PDE-constrained optimization problem.

Examples for this class of problems are:

• optimal control problems,
• parameter identification problems,
• shape optimization problems.


It is a particular feature in PDE-constrained optimization that the variable x can be partitioned into x = (y, u). This partitioning is induced by the PDE equality constraint. For well-posed problems, the state variable y can be uniquely (or at least locally uniquely) determined from the PDE e(y, u) = 0, for any given u. In this context, u is called the control variable or, more generally, design variable. For the examples of problem classes above, u is either the control variable, the parameter to be identified, or the shape variable.

This observation implies that in PDE-constrained optimization, we have a choice:

(a) whether we keep the constraint e(y, u) = 0 as a side constraint in our optimization problem and treat both (y, u) as optimization variables,

(b) or whether we eliminate the PDE constraint by means of a solution operator (control-to-state map) y = S(u) (which solves e(S(u), u) = 0 for us); then we replace y by S(u) and keep only the design variable u as an optimization variable.

Note that this choice sets apart PDE-constrained optimization problems from generic nonlinear optimization problems (1.1), for which it is often not evident how to eliminate the equality constraints by solving for some of the variables. However, the choice whether to eliminate or keep the PDE constraints may not always be ours to make. For instance, we might be asked to build an optimization loop around an existing solver y = S(u) for a particular equation.

Algorithms for solving PDE-constrained optimization problems can be classified along a number of different dimensions. Here is an attempt at naming at least some of these dimensions.

(a) How is the PDE constraint treated?

In black-box methods, the PDE constraint is eliminated using a solver y = S(u). The only optimization variable is the design (control) variable u.

In all-at-once methods, the PDE constraint is kept explicitly as a side constraint. Both the state and control variables x = (y, u) are optimization variables.

(b) What is the highest order of derivatives (or approximations thereof) used by the algorithm?

Gradient-based methods use only first-order derivatives of f, e and g.

Hessian-based methods use second-order derivatives of at least one of f, e and g.

(c) What is the typical local rate of convergence of the method?

The method may typically exhibit a q-linear rate of convergence.

Or it may converge at least q-superlinearly.

(d) How costly is each iteration?


No solutions of auxiliary optimization problems are needed.

Every iteration requires the solution of at least one auxiliary optimization problem.

(e) Does the method produce iterates which are feasible w.r.t. some of the constraints?

(f) Does the method provide some mechanism to improve its global convergence properties?

(g) Will the method take advantage of the fact that some of the constraints may be simple or linear?

(h) Can the method be applied in function space, or can it be applied to discretized (finite dimensional) problems only?

Of course, these dimensions are not independent of each other. For instance, in order to achieve a faster-than-linear rate of convergence, a method will typically need to use more than just gradient information, and an auxiliary optimization problem (e.g., based on a second-order Taylor expansion of the original problem) has to be solved in each iteration.

§ 2 Methods for Unconstrained Problems

We begin by briefly reviewing methods for problems without inequality constraints, which we shall call 'unconstrained' problems:

Minimize f(y, u) over (y, u) ∈ Y × U
s.t. e(y, u) = 0.     (2.1)

It is assumed that we have at our disposal the control-to-state map y = S(u) which solves e(S(u), u) = 0, so that we may choose to eliminate the PDE constraint e(·), which maps Y × U → Z′ with Hilbert spaces Y, U, Z.

A simple and often used example is the following optimal control problem.

Minimize (1/2) ‖y − yΩ‖²_L2(Ω) + (ν/2) ‖u‖²_L2(Ω)
s.t. (∇y, ∇v)_L2(Ω) = (u, v)_L2(Ω) for all v ∈ H10(Ω)     (2.2)

with state space Y = H10(Ω) and control space U = L2(Ω). Here e : Y × U → Z′ represents the weak form of Poisson's equation,

⟨e(y, u), v⟩_Z′,Z = (∇y, ∇v)_L2(Ω) − (u, v)_L2(Ω) for all v ∈ H10(Ω).

Figure 2.1 shows an overview of frequently used black-box (left column) and all-at-once methods (right column). We consider all of them except the SLP (sequential linear programming) in this section. Methods which typically exhibit a linear rate of convergence are shown in blue, while higher order methods are shown in red. We emphasize that we can give here only a brief overview of classes of methods, and we omit many aspects which are important to make practical methods, like issues of inexactness and globalization, for instance.


[Figure 2.1. Overview of black-box algorithms (left column: steepest descent, nonlinear CG, Newton) and all-at-once algorithms (right column: Augmented Lagrangian, SQP, AL-SQP, SLP).]

§ 2.1 Black-Box Methods

Black-box methods treat the reduced problem

Minimize f(u) := f(S(u), u) over u ∈ U. (2.3)

§ 2.1.1 Steepest Descent Method

Literature: [Kelley, 1999, Section 3.1]

The steepest descent method, or gradient method, uses the negative gradient as a search direction in every iteration. Note that we need to carefully distinguish between the gradient and the derivative of f. By definition, the derivative of f at u, denoted by f′(u), is an element of U′, the dual space of U. We use the Riesz isomorphism R : U′ → U to define the gradient ∇f(u) = R f′(u) as an element of U. This implies that

⟨f′(u), δu⟩_U′,U = (∇f(u), δu)_U

holds, where (·, ·)_U is the scalar product of U. The distinction between f′ and ∇f becomes important as soon as we leave Rn with the standard inner product behind (where f′(u) and ∇f(u) are just transposes of each other). Note that the Riesz isomorphism and thus the definition of the gradient depend on the scalar product in U.

The efficient evaluation of the derivative (or the gradient) uses the adjoint technique. Note that by the chain rule,

f′(u) δu = fy(S(u), u) S′(u) δu + fu(S(u), u) δu

holds, or in terms of gradients,

(∇f(u), δu)_U = (∇yf(·), S′(u) δu)_Y + (∇uf(·), δu)_U.

Also by the chain rule, we obtain

e(S(u), u) = 0 ⇒ ey(·)S ′(u) + eu(·) = 0 ⇒ S ′(u) = −ey(·)−1eu(·).


Here and in the sequel, we abbreviate (S(u), u) by (·). This implies that

⟨f′(u), δu⟩_U′,U = ⟨fy(·), −ey(·)−1 eu(·) δu⟩_Y′,Y + ⟨fu(·), δu⟩_U′,U
               = ⟨−ey(·)−* fy(·), eu(·) δu⟩_Z,Z′ + ⟨fu(·), δu⟩_U′,U
               = ⟨−eu(·)* ey(·)−* fy(·), δu⟩_U′,U + ⟨fu(·), δu⟩_U′,U
               = ⟨eu(·)* p + fu(·), δu⟩_U′,U,

where the adjoint state p ∈ Z is defined as the solution of

ey(y, u)* p = −fy(y, u)

with y = S(u).

In practice the formula for the adjoint equation can be found conveniently using the (formal) Lagrangian calculus, see [Tröltzsch, 2010, Section 2.10]. For our example (2.2), the adjoint state is defined as the solution of

(∇v, ∇p)_L2(Ω) = −(y − yΩ, v)_L2(Ω) for all v ∈ H10(Ω).

This happens to be the same PDE as the state equation, with a different right hand side, since the differential operator is self-adjoint in this example. The gradient is given by

∇f(u) = ν u − p.

Using the negative gradient of f at the current iterate un as a search direction, we deduce the steepest descent method, Algorithm 1.

Algorithm 1 Steepest descent method
Input: u0 ∈ U, nmax
Output:
1: Set n := 0 and done := false
2: while not done and n < nmax do
3:   Calculate rn := −∇f(un)
4:   if convergence criterion satisfied then
5:     Set done := true
6:   else
7:     Find an appropriate step length tn ≈ arg min_{t>0} f(un + t rn)
8:     Set un+1 := un + tn rn
9:     Set n := n + 1
10:  end if
11: end while

It is sufficient to determine the step length in step 7 by the Armijo rule with an initial trial step length of t = 1.

The steepest descent algorithm has the advantage of being easy to implement. It usually makes good progress during the first iterations but has poor convergence properties. It is therefore usually not recommended as a stand-alone method, but can be used as a fall-back to globalize more advanced methods.


§ 2.1.2 Nonlinear Conjugate Gradient Methods

Literature: [Nocedal and Wright, 2006, Section 5.2], [Kelley, 1999, Section 3.2.4], Volkwein [2004, 2003], Hager and Zhang [2006]

The (linear) conjugate gradient (CG) method is known as an iterative method with attractive convergence properties for the solution of linear systems Ax = b of equations with symmetric and positive definite coefficient matrix A. Since solving this equation and finding the minimizer of (1/2) xᵀAx − bᵀx are the same, CG can be viewed as an optimization algorithm for strictly convex quadratic objective functions. Its nonlinear versions can be employed to find minimizers of unconstrained optimization problems with more general nonlinear objectives.

A nonlinear CG method is stated as Algorithm 2. In step 4, a search procedure is needed to determine an appropriate step length which minimizes ϕ(t) = f(uk + t dk), or finds a zero of ϕ′(t) = (∇f(uk + t dk), dk)_U. In principle, Newton's method can be used for this purpose, but this requires the repeated evaluation of the Hessian of f, which is prohibitive in the context of nonlinear CG methods. As a remedy, one may resort to a secant method in which ϕ(t) is approximated by

ϕ(t) ≈ ϕ(0) + ϕ′(0) t + [(ϕ′(σ) − ϕ′(0)) / (2σ)] t²

with some σ > 0. Minimization of the right hand side then leads to

t = − σ ϕ′(0) / (ϕ′(σ) − ϕ′(0)) = − σ (∇f(uk), dk)_U / [ (∇f(uk + σ dk), dk)_U − (∇f(uk), dk)_U ].     (2.4)

Typically, only few iterations of (2.4) are carried out, which generate a sequence of step lengths ti and linearization points σi+1 = −ti, starting from an arbitrary initial value σ0 > 0. Such inexact line searches, however, may lead to search directions which are not descent directions for the objective f. A common solution is then to restart the method by re-setting d to the negative reduced gradient whenever (r, d)_U ≤ 0 is found. This is the purpose of step 10 in Algorithm 2. An alternative step length selection strategy in step 4 based on Wolfe conditions is given in [Nocedal and Wright, 1999, eq. (5.42)].

Algorithm 2 Nonlinear conjugate gradient algorithm
Input: u0 ∈ U, nmax
Output:
1: Set n := 0 and done := false
2: Evaluate dn := rn := −∇f(un)
3: while not done and n < nmax do
4:   Calculate a step length tn satisfying (∇f(un + tn dn), dn)_U = 0
5:   Set un+1 := un + tn dn
6:   Set rn+1 := −∇f(un+1)
7:   Determine the parameter βn+1 by one of the formulas below
8:   Set dn+1 := rn+1 + βn+1 dn and increase n
9:   if (rn, dn)_U ≤ 0 then
10:    Set dn := rn
11:  end if
12: end while


Several common choices exist for the selection of the parameter βk+1 in step 7. Among them are the Fletcher-Reeves and the Polak-Ribière formulas

βFR_k+1 := (rk+1, rk+1)_U / (rk, rk)_U,    βPR_k+1 := (rk+1, rk+1 − rk)_U / (rk, rk)_U,    βPR+_k+1 := max{βPR_k+1, 0}.

We mention that nonlinear CG methods generally outperform the steepest descent method, and refer to [Nocedal and Wright, 1999, Section 5.2] and the survey article Hager and Zhang [2006] for a comparison of the various step length selection strategies. For an application of nonlinear conjugate gradient methods for the solution of optimal control problems, we refer to, e.g., Volkwein [2004, 2003].
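The following Python sketch (again an added illustration, not the appendix code) combines Algorithm 2 with a single secant step (2.4) and the PR+ formula; grad is an assumed user-supplied reduced gradient, and the restart of steps 9–10 guards against non-descent directions.

import numpy as np

def nonlinear_cg(grad, u0, n_max=200, tol=1e-8, sigma=1e-2):
    """Sketch of Algorithm 2 with one secant step (2.4) per iteration and PR+ update."""
    u = np.asarray(u0, dtype=float)
    r = -grad(u)
    d = r.copy()
    for n in range(n_max):
        if np.linalg.norm(r) <= tol:
            break
        g0 = np.dot(grad(u), d)                    # phi'(0)
        gs = np.dot(grad(u + sigma * d), d)        # phi'(sigma)
        t = -sigma * g0 / (gs - g0) if abs(gs - g0) > 1e-14 else 1.0   # secant step (2.4)
        u = u + t * d
        r_new = -grad(u)
        beta = max(np.dot(r_new, r_new - r) / np.dot(r, r), 0.0)       # Polak-Ribiere(+)
        d = r_new + beta * d
        r = r_new
        if np.dot(r, d) <= 0:                      # steps 9-10: restart with steepest descent
            d = r.copy()
    return u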

§ 2.1.3 Newton’s Method

Literature: [Ito and Kunisch, 2008, Chapter 5.2], [Kelley, 1999, Section 2]

Newton's method attacks the necessary optimality condition f′(u) = 0. Its application results in the iteration

∇2f(un) dn = −f ′(un), un+1 := un + dn. (2.5)

Here the Hessian ∇2f(u) is understood as an element of L(U,U ′), and hence dn ∈ U .

Algorithm 3 Newton's method
Input: u0 ∈ U, nmax
Output:
1: Set n := 0 and done := false
2: while not done and n < nmax do
3:   Calculate rn := −f′(un)
4:   if convergence criterion satisfied then
5:     Set done := true
6:   else
7:     Solve ∇²f(un) dn = rn
8:     Set un+1 := un + dn
9:     Set n := n + 1
10:  end if
11: end while

The reduced Hessian matrix is usually not formed explicitly due to the tremendous computational effort to do so. Instead, (2.5) is solved iteratively, using a Krylov method such as Minres or CG, which take advantage of the symmetry of ∇²f(un). Every iteration then requires the evaluation of one matrix-vector product ∇²f(u) δu. Algorithm 4 describes how to achieve this.

Quasi-Newton methods, such as BFGS, offer an alternative to the exact evaluation of the Hessian matrix. Instead, they store and accumulate gradient information from iteration to iteration as a substitute for second derivatives. Due to the high dimension of discretized optimal control problems, limited-memory versions such as LM-BFGS should be employed.

The basic Newton method as stated in Algorithm 3 has good local convergence properties (at q-superlinear or even q-quadratic rates). But in order to solve truly nonlinear problems, it has to be globalized, for instance, by embedding it into a


trust-region framework, or by using a truncated version, which stops the iterative solution of (2.5) when encountering directions of negative curvature, see [Nocedal and Wright, 2006, Section 6].

Algorithm 4 Evaluation of the reduced Hessian times a vector ∇²f(u) δu
Input: y = S(u), p = −ey(y, u)−* fy(y, u) (adjoint state)
Output: ∇²f(u) δu
1: Solve the linearized state equation ey(y, u) δy = −eu(y, u) δu for δy ∈ Y
2: Calculate the right hand side t := Lyu(y, u, p)(·, δu) + Lyy(y, u, p)(·, δy) ∈ Y′
3: Solve the adjoint state equation ey(y, u)* δp = −t for δp ∈ Z
4: Return ∇²f(u) δu := Luu(y, u, p)(·, δu) + Luy(y, u, p)(·, δy) + eu(y, u)* δp ∈ U′
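After discretization, Algorithm 4 is conveniently wrapped as a matrix-free operator for the Krylov solver of (2.5). The sketch below is my own illustration; all callables are assumed, user-supplied building blocks, and scipy's LinearOperator is used so that the reduced Hessian never has to be formed.

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def reduced_hessian_operator(solve_lin_state, solve_adjoint, Lyy, Lyu, Luy, Luu, eu_adj, n_u):
    """Wrap Algorithm 4 as a LinearOperator of size n_u x n_u.
    solve_lin_state(du): returns dy solving  e_y dy = -e_u du   (step 1)
    solve_adjoint(rhs):  returns dp solving  e_y* dp = -rhs     (step 3)
    Lyy, Lyu, Luy, Luu, eu_adj: callables applying the Lagrangian blocks and e_u*."""
    def matvec(du):
        dy = solve_lin_state(du)                       # step 1
        t = Lyu(du) + Lyy(dy)                          # step 2: right hand side
        dp = solve_adjoint(t)                          # step 3
        return Luu(du) + Luy(dy) + eu_adj(dp)          # step 4
    return LinearOperator((n_u, n_u), matvec=matvec)

# Inside Algorithm 3, the Newton step (2.5) can then be solved matrix-free, e.g.
#   H = reduced_hessian_operator(...); d_n, info = cg(H, -fprime_u)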

§ 2.2 All-at-once Methods

As was already mentioned, all-at-once methods keep the PDE constraint e(y, u) = 0 as an explicit side constraint during the optimization. Both the state and control variables x = (y, u) are optimization variables now. The Lagrangian associated with problem (2.1) is defined as

L(y, u, p) = f(y, u) + ⟨p, e(y, u)⟩_Z,Z′.

We will often combine x = (y, u) into one variable for conciseness of our notation. We briefly review three important classes of methods for the solution of problem (2.1).

§ 2.2.1 First-Order Augmented Lagrangian Methods

Literature: [Ito and Kunisch, 2008, Chapter 3], [Bertsekas, 1996, Chapter 2]

Augmented Lagrangian methods (a.k.a. method of multipliers) are related to Lagrangian methods as well as penalty methods. Lagrangian methods take turns in minimizing w.r.t. x the Lagrangian with pn fixed,

f(x) + ⟨pn, e(x)⟩,

and then updating pn so that the Lagrange dual function is increased.

An example for the class of penalty methods is the quadratic penalty approach, which—in the present context—considers a family of unconstrained problems

Minimize f(y, u) + (c/2) ‖e(y, u)‖²_Z′ over (y, u) ∈ Y × U.

The disadvantage of these methods is that the penalty parameter c needs to be driven to ∞, which renders the resulting problems increasingly ill-conditioned.

Combining these two ideas leads to Algorithm 5. It stands out as a feature that the parameter c does not need to be taken to ∞.

Let us introduce the Augmented Lagrangian functional

Lc(x, p) := f(x) + ⟨p, e(x)⟩ + (c/2) ‖e(x)‖²_Z′.

The derivative w.r.t. x of Lc is given by

Lc,x(x, p) δx = Lx(x, p) δx + c (e(x), ex(x) δx)_Z′.     (2.6)


This implies that every KKT point (satisfying e(x∗) = 0 and Lx(x∗, p∗) = 0) will also be a stationary point for Lc,x for any c > 0.

The Hessian w.r.t. x of Lc is given by

Lc,xx(x, p)(δx1, δx2) = Lxx(x, p)(δx1, δx2) + c (ex(x) δx1, ex(x) δx2)_Z′ + c (e(x), exx(x)(δx1, δx2))_Z′.     (2.7)

Suppose that (x∗, p∗) is a point satisfying second-order sufficient conditions, i.e., (x∗, p∗) is a KKT point and the Hessian Lxx(x∗, p∗) is positive definite on the nullspace of ex(x∗). Then we infer from (2.7) that for c > 0, the Hessian of Lc has better coercivity properties than the Hessian of the original Lagrangian L:

Lc,xx(x∗, p∗)(δx, δx) = Lxx(x∗, p∗)(δx, δx) + c ‖ex(x∗) δx‖²_Z′ ≥ Lxx(x∗, p∗)(δx, δx).

Indeed, one can prove under appropriate assumptions that for c ≥ c0, the Hessian Lc,xx(x, p∗) is positive definite on all of X, uniformly for any x in some neighborhood of x∗.

Algorithm 5 First-order Augmented Lagrangian method
Input: p0 ∈ Z, nmax
Output:
1: Set n := 0 and done := false
2: while not done and n < nmax do
3:   Solve for xn+1:
       Minimize Lcn(x, pn) := f(x) + ⟨pn, e(x)⟩ + (cn/2) ‖e(x)‖²_Z′ over x ∈ X
4:   Update the adjoint state pn+1 := pn + σn+1 e(xn+1)
5:   if convergence criterion satisfied then
6:     Set done := true
7:   else
8:     Set cn+1 and n := n + 1
9:   end if
10: end while

The choice of the augmentation parameter cn in Algorithm 5 is important in practice, yet "no general purpose techniques appear to be available" (see [Ito and Kunisch, 2008, p. 77] and [Bertsekas, 1996, Chapter 2]). In the update step 4 for the adjoint state, another parameter σn+1 has to be chosen.

Finally, we comment on why Algorithm 5 is termed a first-order algorithm. To this end, we introduce the Lagrangian dual function associated with problem (2.1), i.e.,

q(p) = inf_{x∈X} L(x, p) = inf_{x∈X} { f(x) + ⟨p, e(x)⟩_Z,Z′ }.

At the minimum x (suppose it exists), necessarily fx(x) + ⟨p, ex(x)⟩ = 0 holds, and we consider x(·) a function of p. By the chain rule, the derivative of q is

q′(p) δp = ⟨fx(x) + ex(x)* p, x′(p) δp⟩ + ⟨δp, e(x)⟩ = ⟨δp, e(x)⟩,


since the first term is zero. Recalling that the Lagrange dual problem is

Maximize q(p) over p ∈ Z,

step 4 of Algorithm 5 is a step in the direction of steepest ascent towards the solution of the dual problem, hence the attribute 'first-order'.

Clearly, the main effort in every iteration is the solution of the primal unconstrained minimization problem in step 3. Due to ill-conditioning of the problem for large values of cn, gradient-based methods are out of the question and Newton-type methods (§ 2.1.3) should be used.
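As an illustration of the overall loop (not of any specific implementation from these notes), the sketch below realizes Algorithm 5 for a discretized problem. The inner solver minimize_al and the residual e_res are assumed user-supplied; following the remarks above, minimize_al would typically be a Newton-type method.

import numpy as np

def augmented_lagrangian(minimize_al, e_res, p0, c0=1.0, sigma=1.0, c_growth=10.0,
                         n_max=20, tol=1e-8):
    """Sketch of Algorithm 5: alternate primal minimization and first-order dual update."""
    p = np.asarray(p0, dtype=float)
    c = c0
    x = None
    for n in range(n_max):
        x = minimize_al(p, c)                 # step 3: minimize L_c(., p_n) over x
        r = e_res(x)                          # constraint residual e(x_{n+1})
        p = p + sigma * r                     # step 4: steepest-ascent update of the dual variable
        if np.linalg.norm(r) <= tol:          # convergence criterion
            break
        c *= c_growth                         # step 8: choose the next augmentation parameter
    return x, p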

§ 2.2.2 SQP Methods

Literature: [Ito and Kunisch, 2008, Chapter 5.3], [Nocedal and Wright, 2006, Chapter 18], [Tröltzsch, 2010, Section 4.11], Alt [1994], Hinze and Kunisch [2001]

SQP (sequential quadratic programming) methods solve a sequence of QP (quadratic programming) problems built from successive second-order models of the original problem. At a given, or current, point (xn, pn), this QP is

Minimize (1/2) Lxx(xn, pn)(x − xn, x − xn) + fx(xn)(x − xn) over x ∈ X
s.t. ex(xn)(x − xn) + e(xn) = 0.     (2.8)

In the most basic form of the SQP algorithm, the solution x of (2.8) (suppose it exists) and the Lagrange multiplier p (adjoint state) associated with the linearized equality constraint are used as subsequent iterates (xn+1, pn+1).

Algorithm 6 Basic SQP algorithm
Input: x0 ∈ X, p0 ∈ Z, nmax
Output:
1: Set n := 0 and done := false
2: while not done and n < nmax do
3:   Solve the QP (2.8) for (xn+1, pn+1)
4:   if convergence criterion satisfied then
5:     Set done := true
6:   else
7:     Set n := n + 1
8:   end if
9: end while

It is easy to verify that Algorithm 6 is equivalent to Newton's method, applied to the KKT conditions Lx(x, p) = 0 and e(x) = 0. Therefore, the SQP method is sometimes referred to as the Lagrange-Newton method. The Newton step reads

[ Lyy(·)  Lyu(·)  ey(·)* ] [ δy ]      [ Ly(·) ]
[ Luy(·)  Luu(·)  eu(·)* ] [ δu ]  = − [ Lu(·) ]     (2.9)
[ ey(·)   eu(·)   0      ] [ δp ]      [ e(·)  ]

where (·) stands for the current iterate (yn, un) or (yn, un, pn) as appropriate. Indeed, there are many similarities between Algorithm 6 and Newton's method for the


reduced problem, Algorithm 3. It can be shown that the Newton direction dn in Algorithm 3 satisfies

[ Lyy(·)  Lyu(·)  ey(·)* ] [ δy ]      [ 0     ]
[ Luy(·)  Luu(·)  eu(·)* ] [ dn ]  = − [ Lu(·) ]
[ ey(·)   eu(·)   0      ] [ δp ]      [ 0     ]

However, δy and δp are not used as updates in Algorithm 3, but they are discarded and the state equation and adjoint equation are solved exactly in every iteration.
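For a discretized problem, the Newton step (2.9) is a sparse saddle point system. The following sketch is illustrative only: the blocks Lyy, Lyu, Luu, ey, eu and the residuals are assumed to be given as sparse matrices and vectors, with the adjoints realized by transposes, and one Lagrange-Newton step is solved directly; Chapter 2 discusses how iterative solvers with suitable preconditioners replace spsolve for large problems.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def lagrange_newton_step(Lyy, Lyu, Luu, ey, eu, Ly, Lu, e):
    """Assemble and solve the KKT system (2.9).
    ey.T, eu.T play the role of ey(.)*, eu(.)*; Luy = Lyu.T by symmetry of the Hessian."""
    K = sp.bmat([[Lyy,   Lyu, ey.T],
                 [Lyu.T, Luu, eu.T],
                 [ey,    eu,  None]], format='csc')
    rhs = -np.concatenate([Ly, Lu, e])
    step = spsolve(K, rhs)
    ny, nu = Ly.size, Lu.size
    return step[:ny], step[ny:ny + nu], step[ny + nu:]   # (delta_y, delta_u, delta_p)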

§ 2.2.3 Augmented-Lagrangian SQP Methods

Literature: [Ito and Kunisch, 2008, Chapter 6], Volkwein [2000, 1997]

The weakest link in the first-order Augmented Lagrangian Algorithm 5 (see Section 2.2.1) is the first-order (gradient-based) update of the dual variables (the adjoint state). Augmented-Lagrangian SQP methods improve on this point and employ a second-order update formula. It will become clear that the resulting method is closely related to SQP methods, which determines the name of the algorithm.

Our starting point is again the Augmented Lagrangian functional

Lc(x, p) := f(x) + ⟨p, e(x)⟩ + (c/2) ‖e(x)‖²_Z′

and the observation that every KKT point for (2.1) (satisfying e(x∗) = 0 and Lx(x∗, p∗) = 0) will also be a stationary point for Lc,x for any c > 0. Let us apply Newton's method to the system

Lc,x(x, p) = 0,   e(x) = 0,

which results in the iteration

[ Lc,xx(xn, pn)  ex(xn)* ] [ x − xn ]      [ Lc,x(xn, pn) ]
[ ex(xn)         0       ] [ p − pn ]  = − [ e(xn)        ]     (2.10)

A straightforward computation ([Ito and Kunisch, 2008, p. 158–159]) shows that (2.10) is equivalent to first setting pn := pn + c R e(xn), with R : Z′ → Z the Riesz isomorphism, and then solving

[ Lxx(xn, pn)  ex(xn)* ] [ x − xn ]      [ Lx(xn, pn) ]
[ ex(xn)       0       ] [ p − pn ]  = − [ e(xn)      ],     (2.11)

which avoids the need to actually form derivatives of Lc.

Altogether, we obtain Algorithm 7, which coincides with [Ito and Kunisch, 2008, Algorithm 6.3]. For the choice c = 0, Algorithm 7 reduces to the SQP method (Algorithm 6).

Note that, in contrast to the first-order Augmented Lagrangian Algorithm 5, the system (2.11) combines the primal and dual updates into one iteration rather than carrying them out subsequently. At some additional cost per iteration, the next iterate xn+1 could be alternatively determined from a primal minimization w.r.t. x, and then (2.11) might only be used to update the dual variable, see [Ito and Kunisch, 2008, Algorithms 6.1, 6.2].


Algorithm 7 Basic Augmented Lagrangian SQP algorithm
Input: x0 ∈ X, p0 ∈ Z, nmax, c ≥ 0
Output:
1: Set n := 0 and done := false
2: while not done and n < nmax do
3:   Set pn := pn + c R e(xn)
4:   Solve the QP (2.11) for (xn+1, pn+1)
5:   if convergence criterion satisfied then
6:     Set done := true
7:   else
8:     Set n := n + 1
9:   end if
10: end while

§ 3 Treatment of Inequality Constraints

Inequality constraints always add nonlinearity to the optimization problem (1.1), even if f, e, and g are linear. This is due to the complementarity condition ⟨µ, g(x)⟩ = 0, which introduces a multiplicative coupling.

It becomes evident already in finite dimensional optimization that it is important to distinguish between simple coordinatewise constraints (a ≤ x ≤ b) and more involved constraints such as coupled linear (a ≤ Ax ≤ b) or nonlinear ones. In PDE-constrained optimization, an additional feature which separates 'easy' from 'hard' constraints is whether the inequality constraint involves the control and/or the state variables.

In a first attempt, one may try and treat the inequality constraints in an outer loop. For instance, barrier methods or penalty methods can be used to remove the inequality constraints and convert them into additional terms in the objective such as

−c ∫_Ω ln(yb − y) dx   or   (c/2) ‖max{0, y − yb}‖²_L2(Ω)

in case of a pointwise state constraint y ≤ yb in Ω. Then any of the approaches from § 2 can be used in the inner loop. This is illustrated in Figure 3.1.

[Figure 3.1. Schematic of algorithms with inequalities treated in an outer loop: each of the methods of Figure 2.1 (steepest descent, nonlinear CG, Newton; Augmented Lagrangian, SQP, AL-SQP, SLP) is wrapped into an outer 'treat inequalities' iteration.]

However, more efficient strategies are usually obtained when dealing with the inequality constraints in the main optimization loop. We highlight some popular algorithms for important special cases in the subsequent sections.

§ 3.1 Control Constraints: Primal-Dual Active Set Strategy

Literature: Bergounioux et al. [1999], [Ito and Kunisch, 2008, Chapter 7], Rösch and Kunisch [2002], Ulbrich [2003], [Tröltzsch, 2010, Section 2.12]

Let us state the following model problem:

Minimize f(y, u) over (y, u) ∈ Y × U
s.t. e(y, u) = 0
and ua ≤ u ≤ ub in Ω0.     (3.1)

The control constraint is posed on the set Ω0 where the control is defined, e.g., a subset of Ω, or part of its boundary. The case of finite dimensional controls is also included with appropriate interpretation. Under suitable assumptions, necessary optimality conditions for (3.1) are given by the existence of an adjoint state p and a Lagrange multiplier µ such that

fy(y, u) δy + ⟨p, ey(y, u) δy⟩ = 0 for all δy ∈ Y     (3.2a)
fu(y, u) δu + ⟨p, eu(y, u) δu⟩ + ⟨µ, δu⟩ = 0 for all δu ∈ U     (3.2b)
e(y, u) = 0     (3.2c)
ua ≤ u ≤ ub,  u = ua where µ < 0,  u = ub where µ > 0.     (3.2d)

Equation (3.2d) gives rise to a nonlinear relationship between the Lagrange multiplier µ and the control variables u.

§ 3.1.1 Primal-Dual Active Set Strategy as an Outer Iteration

We briefly review the primal-dual active set strategy (PDAS) which converts (3.1) into a sequence of equality-constrained problems (3.7), so that methods from § 2 can be applied. The PDAS can be motivated by starting off from the equivalent re-formulation of (3.1)

Minimize f(y, u) + I_Uad(u) over (y, u) ∈ Y × U
s.t. e(y, u) = 0     (3.3)

which uses the indicator function

I_Uad(u) = { 0 if u ∈ Uad,   +∞ if u ∉ Uad }

of the convex admissible set

Uad = {u ∈ U : ua ≤ u ≤ ub in Ω0}.

The optimality conditions for (3.3) involve an element µ ∈ ∂I_Uad(u) from the convex subdifferential of I_Uad, which is equivalent to the complementarity relation (3.2d).

In convex analysis, the generalized Moreau-Yosida approximation

ϕ_c(u; µ) = inf_{v∈U} { ϕ(u − v) + (µ, v)_U + (c/2) ‖v‖²_U }   for given µ ∈ U


of a convex function ϕ : U → R ∪ {+∞} (with further properties) is a powerful tool to regularize the potentially non-smooth function ϕ. In the case of ϕ = I_Uad and U = L2(Ω0), one can show

I_Uad,c(u; µ) = ∫_Ω0 i_Uad,c(u(ξ); µ(ξ)) dξ     (3.4)

(I_Uad,c)′(u; µ) = c ( u + (1/c) µ − proj_Uad(u + (1/c) µ) ),     (3.5)

where

i_Uad,c(u; µ) = −(1/(2c)) |µ|² + (c/2) |max{0, u − ub + (1/c) µ}|² + (c/2) |min{0, u − ua + (1/c) µ}|²

and proj_Uad is the pointwise projection onto Uad, i.e.,

proj_Uad(u) = max{ua, min{u, ub}}.

An illustration of I_Uad,c(· ; µ) for the case U = R and various values of c and µ is given in Figure 3.2.

[Figure 3.2. An illustration of the generalized Moreau-Yosida regularization of the indicator function I of an interval in R, for µ ∈ {0, 10, 20} (columns) and c ∈ {10, 50, 250} (rows).]

At this point, there are several possibilities to use the Moreau-Yosida approximation:

(a) We could solve, instead of (3.3), a family of regularized problems

Minimize f(y, u) + I_Uad,c(u; µ) over (y, u) ∈ Y × U
s.t. e(y, u) = 0,     (3.6)

perhaps for some fixed choice of µ, and drive c → ∞. This is a penalty method, and µ ≠ 0 causes a shift of the threshold where the penalty kicks in, see Figure 3.2.


(b) However, there is a more efficient method to treat control constraints, which uses the fact that

µ ∈ ∂I_Uad(u)  ⇔  µ = (I_Uad,c)′(u; µ)

for some (and then for all) c > 0, see [Ito and Kunisch, 2000, Theorem 2.4]. This relationship, together with (3.5), motivates the use of the multiplier rule

µ = c ( u + (1/c) µ − proj_Uad(u + (1/c) µ) )

for some c > 0 as a prediction strategy for the sets where u = ua or u = ub should hold. (Recall that this is encoded in the multiplier by (3.2d).) One arrives at Algorithm 8, the primal-dual active set strategy (PDAS).

Active set strategies have long been used in optimization to estimate which of the inequalities of a given problem will be active (satisfied with equality) at the solution. One of the distinguishing features of PDAS is that it uses both primal (u) and dual (µ) variables for this estimation.

Algorithm 8 Primal-dual active set strategy
Input: u0 ∈ U, µ0 ∈ U, nmax
Output:
1: Set n := 0 and done := false
2: while not done and n < nmax do
3:   Determine the active and inactive sets
       A−_n := {x ∈ Ω0 : µn − c (ua − un) < 0},
       A+_n := {x ∈ Ω0 : µn + c (un − ub) > 0},
       I_n := Ω0 \ (A+_n ∪ A−_n)
4:   if convergence criterion satisfied then
5:     Set done := true
6:   else
7:     Solve the equality-constrained problem (3.7) for un+1 and the associated Lagrange multipliers µ±
8:     Set µn+1 := µ+ − µ−
9:     Set n := n + 1
10:  end if
11: end while

In every iteration of PDAS, the following equality-constrained problem has to be solved.

Minimize f(y, u) over (y, u) ∈ Y × U
s.t. e(y, u) = 0
and u = ua in A−_n, u = ub in A+_n.     (3.7)

This can be achieved, for instance, by restricting u to the inactive set I_n and then using any of the algorithms considered in § 2, or alternatively, by adding u = ua and u = ub to the problem as additional equality constraints. We obtain PDAS-Newton or PDAS-SQP, for instance.
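A discretized version of Algorithm 8 can be sketched as follows (an illustration with assumed ingredients, not the appendix code): the arrays u, mu, ua, ub live on the nodes of Ω0, and solve_eqp is an assumed solver for the equality-constrained subproblem (3.7) that returns the new control together with the multipliers on the active sets. A common convergence criterion, used here, is that the active sets do not change from one iteration to the next.

import numpy as np

def pdas(solve_eqp, u0, mu0, ua, ub, c=1.0, n_max=30):
    """Sketch of Algorithm 8 (primal-dual active set strategy) for a discretized problem."""
    u, mu = np.asarray(u0, float), np.asarray(mu0, float)
    A_minus_old = A_plus_old = None
    for n in range(n_max):
        A_minus = (mu - c * (ua - u)) < 0          # step 3: predict u = ua here
        A_plus = (mu + c * (u - ub)) > 0           # step 3: predict u = ub here
        if (A_minus_old is not None and np.array_equal(A_minus, A_minus_old)
                and np.array_equal(A_plus, A_plus_old)):
            break                                  # active sets settled: stop
        u, mu_plus, mu_minus = solve_eqp(A_minus, A_plus)   # step 7: solve (3.7)
        mu = mu_plus - mu_minus                    # step 8
        A_minus_old, A_plus_old = A_minus, A_plus
    return u, mu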


§ 3.1.2 Primal-Dual Active Set Strategy within Newton and SQP Iterations

Literature: Tröltzsch [1999], Griesse et al. [2008], [Griesse, 2007, Chapter 2]

We have just outlined in the previous section an algorithmic scheme as in Figure 3.1, where the inequalities are removed (by the primal-dual active set approach) through an outer iteration, and then Newton's method or an SQP method is used as an inner loop.

However, Newton's method (§ 2.1.3) and the SQP method (§ 2.2.2) can naturally be extended to handle inequality constraints, and thus the PDAS might as well be used in an inner loop, even in the case of pointwise nonlinear inequality constraints.

Despite the complementarity conditions present in problems with inequality constraints, the optimality condition can be written in a form close to an equation, namely a generalized equation, see Robinson [1980]. In the case of the reduced formulation appropriate for Newton's method, this generalized equation takes the form

0 ∈ F (u, µ) +N (µ), (3.8)

where N(µ) is the normal cone of the set of non-negative Lagrange multipliers

K+ = {µ ∈ L2(Ω0) : µ ≥ 0 a.e. in Ω0}.

For example, in the case of problem (2.2) with the one-sided pointwise control constraint u ≤ ub in Ω0, one has

F(u, µ) = ( ν u − p(u) + µ,  u − ub ),

where p(u) is the adjoint state belonging to u, and the normal cone is given by

N(µ) = {0} × { ∅ if µ ∉ K+;  {z ∈ L2(Ω0) : (z, ν − µ)_L2(Ω0) ≤ 0 for all ν ∈ K+} if µ ∈ K+ }.

The Newton iteration for (3.8) reads

0 ∈ F (un, µn) + F ′(un, µn)(u− un, µ− µn) +N (µ),

which is to be solved for (un+1, µn+1) and which constitutes the optimality system for a QP with linearized inequality constraints, which can be solved using PDAS, for instance. This leads to Newton-PDAS.

In the same manner, SQP methods naturally handle nonlinear inequality constraints, say, g(u) ≤ 0. Then the sequence of QPs will contain linearized inequality constraints (compare (2.8)):

Minimize (1/2) Lxx(xn, pn)(x − xn, x − xn) + fx(xn)(x − xn) over x ∈ X
s.t. ex(xn)(x − xn) + e(xn) = 0
and g′(un)(u − un) + g(un) ≤ 0.     (3.9)

Again, (3.9) can be solved using PDAS, which results in the SQP-PDAS approach. We depict this situation in Figure 3.3.


[Figure 3.3. Schematic of algorithms with inequalities treated in an inner loop: Newton's method for the generalized equation, SQP with inequalities, and AL-SQP with inequalities each contain an inner 'treat inequalities' step.]

The close relationship pointed out in § 2.2.2 between SQP and Newton's method is maintained since (3.9), too, can be interpreted as a Newton step for a generalized equation of the form

0 ∈ F (y, u, p, µ) +N (µ),

which now involves the state and the adjoint state as well.

Finally, we remark that the Augmented Lagrangian-SQP approach (§ 2.2.3) can also be extended to cover inequality constraints. We refer to [Ito and Kunisch, 2008, Chapter 6.3, Algorithm 6.6] for details.

§ 3.1.3 Primal-Dual Active Set Strategy as a Semismooth Newton Iteration

Literature: Hintermüller et al. [2002], [Ito and Kunisch, 2008, Chapter 8.4], Ulbrich [2003]

There is even one further possibility which combines the linearization and PDAS into one single loop. This interpretation is based on a reformulation of the complementarity condition (3.2d) using a particular nonlinear complementarity (NCP) function. Evidently, max{a, b} = 0 holds if and only if a ≤ 0, b ≤ 0 and a b = 0. This remains true if b is replaced by c b for any c > 0. It is easy to check that (3.2d) can be equivalently stated as

µ − min{0, µ − c (ua − u)} − max{0, µ + c (u − ub)} = 0 in Ω0,     (3.10)

and hence (3.2) becomes a system of equations, which are, however, not differentiable in the usual sense. Nevertheless, this modified system can be shown to be semismooth under certain conditions. Note that the subsets of Ω0 where the max or min are attained by the non-zero terms coincide with the active sets occurring in Algorithm 8. The derivative of max{0, u} can be determined pointwise and it is either 1 or 0, depending on whether u > 0 or u ≤ 0. Consequently, the resulting semismooth Newton iteration takes the form of an active set method.

Depending on whether we use the nonlinear complementarity condition (3.10) in the context of the black-box or the all-at-once framework, we obtain two more methods which differ from PDAS-Newton and Newton-PDAS, or from PDAS-SQP and SQP-PDAS, mainly with respect to when the active sets are updated.
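For a discretized control constraint, the semismooth reformulation is easy to write down explicitly. The helpers below (an added illustration) evaluate the residual (3.10) pointwise and the active sets on which the pointwise generalized derivative of the max/min terms equals one; these are exactly the sets used in Algorithm 8.

import numpy as np

def ncp_residual(u, mu, ua, ub, c=1.0):
    """Pointwise residual of (3.10): mu - min{0, mu - c(ua - u)} - max{0, mu + c(u - ub)}."""
    return (mu - np.minimum(0.0, mu - c * (ua - u))
               - np.maximum(0.0, mu + c * (u - ub)))

def active_sets(u, mu, ua, ub, c=1.0):
    """Sets where the min/max terms in (3.10) are nonzero; they coincide with A_n^-, A_n^+."""
    lower = (mu - c * (ua - u)) < 0
    upper = (mu + c * (u - ub)) > 0
    return lower, upper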

§ 3.2 Mixed Control-State Constraints

Literature: Meyer et al. [2005], Meyer et al. [2007], Griesse et al. [2008], Rösch and Wachsmuth [2010]


Mixed control-state constraints, such as ma ≤ y + u ≤ mb or nonlinear variants thereof, behave in many ways like the control constraints considered in § 3.1. There is a natural extension of the primal-dual active set method in the all-at-once context. We refer to the literature above.

§ 3.3 State Constraints

Literature: Ito and Kunisch [2003], Hintermüller and Kunisch [2006], Hintermüller and Hinze [2008]

The treatment of pointwise state constraints, such as ya ≤ y ≤ yb on Ω, is more involved. The reason lies partly with the associated Lagrange multiplier µ, which is not, in general, a function, but only a Borel measure µ ∈ C(Ω)′, see Casas [1986, 1993]. Consequently, the ideas which led to the PDAS method for control constraints cannot be repeated. In particular, we fail to give a meaning to the analog of µ = (I_Uad,c)′(u; µ). Neither can we pursue the alternative motivation (3.10) of PDAS, because the complementarity condition does not have a pointwise interpretation in case of state constraints.

At this point we recall that Augmented Lagrangian methods (§ 2.2.1) were capable of removing undesired constraints by way of adding a penalty term to the Lagrangian. We only need to extend this idea to inequality constraints. For simplicity we consider for a moment only an upper bound y − yb ≤ 0, which we convert into an equality constraint y − yb + s = 0 by means of a slack variable s ≥ 0, which becomes an additional optimization variable.

Applying the Augmented Lagrangian idea to our state-constrained problem with slack variable

Minimize f(y, u) over (y, u, s) ∈ Y × U × L2(Ω)
s.t. e(y, u) = 0
and y − yb + s = 0 in Ω, s ≥ 0 in Ω,     (3.11)

we arrive at the Augmented Lagrangian functional

Lc(y, u, µ, s) := f(y, u) + (µ, y − yb + s)_L2(Ω) + (c/2) ‖y − yb + s‖²_L2(Ω).     (3.12)

Note that we made a conscious choice here by augmenting the constraint y − yb + s = 0 with respect to the L2(Ω) norm. We also chose not to augment the PDE constraint e(y, u) = 0; rather, we keep it explicitly as a side constraint.

Looking back at the first-order Augmented Lagrangian method (Algorithm 5, step 3) we see that we will need to find a minimizer of Lc with respect to the primal variables (y, u, s). A closer look at (3.12) shows that Lc is actually uniformly convex w.r.t. s, and hence a partial minimization w.r.t. s can be carried out analytically. If it weren't for the constraint s ≥ 0, we would find the minimizer to be s(y, µ) = −(y − yb) − µ/c. With the constraint, we have

s(y, µ) = max{0, −(y − yb) − (1/c) µ}.

By plugging this expression into Lc we arrive at

Lc(y, u, µ) := f(y, u) + (c/2) ‖max{0, y − yb + (1/c) µ}‖²_L2(Ω) − (1/(2c)) ‖µ‖²_L2(Ω).     (3.13)


Wait a minute, we have encountered terms like these before. Indeed, the Augmented Lagrangian functional and the original objective plus a generalized Moreau-Yosida penalty,

Lc(y, u, µ) = f(y, u) + I_Yad,c(y; µ),

are the same! Here we use Yad = {y ∈ L2(Ω) : y ≤ yb} and I_Yad,c(y; µ) is similar to (3.4).

We can therefore proceed from here in either of two ways:

(a) We may use (3.13) within a first-order Augmented Lagrangian method, by minimizing (3.13) w.r.t. (y, u, p), subject to e(y, u) = 0, for some current values of cn ∈ R and µn ∈ L2(Ω), then perform an update for the dual variable µn associated with the augmented constraint. This is Algorithm 9, with the setting slightly extended to cover bilateral constraints ya ≤ y ≤ yb, and the term −(1/(2c)) ‖µ‖²_L2(Ω) omitted from the objective in step 3 since it is constant from the point of view of the 'primal' variables (y, u, p).

(b) Or we may use (3.13) in the spirit of a penalty method, perhaps keep µ fixed, but drive c → ∞.

Both variants were analyzed and tested in Ito and Kunisch [2003], and the penalty approach was found to perform better in practice.

Algorithm 9 First-order Augmented Lagrangian method for state-constrained problems
Input: µ0 ∈ L2(Ω), nmax
Output:
1: Set n := 0 and done := false
2: while not done and n < nmax do
3:   Solve for (yn+1, un+1, pn+1):
       Minimize Lcn(y, u, µn) := f(y, u) + (cn/2) ‖max{0, y − yb + (1/cn) µn}‖²_L2(Ω)
                                        + (cn/2) ‖min{0, y − ya + (1/cn) µn}‖²_L2(Ω)   over (y, u) ∈ Y × U
       s.t. e(y, u) = 0
4:   Update the Lagrange multiplier µn+1 := max{0, µn + cn (y − yb)} + min{0, µn − cn (ya − y)}
5:   if convergence criterion satisfied then
6:     Set done := true
7:   else
8:     Set cn+1 and n := n + 1
9:   end if
10: end while

We briefly mention an alternative method of regularizing state constraints, which goes back to Meyer et al. [2005, 2007]. The idea is to replace ya ≤ y ≤ yb by the mixed control-state constraint ya ≤ y + ε u ≤ yb, which is benign for fixed values of


ε > 0 (§ 3.2). This approach is termed Lavrentiev regularization and it clearly requires that u and y are defined on the same domain. It does not work, for instance, with distributed constraints and boundary control. For this case, Krumbiegel and Rösch [2009] have developed the so-called virtual control concept, which has similarities to the generalized Moreau-Yosida penalty approach.


CHAPTER 2

Preconditioning in PDE-Constrained Optimization

Contents
4 Introduction
5 Properties of Saddle Point Problems
5.1 Saddle Point Problems Arising in Optimization
5.2 Hilbert Space Setting
5.3 Spectral Properties
5.4 An Optimal Control Example
6 Preconditioning KKT Systems
6.1 Early Approaches
6.2 Constraint Preconditioners
6.3 Preconditioned Conjugate Gradients
6.4 Implementation Details
6.4.1 PCG Solver
6.4.2 Scalar Product
6.4.3 Preconditioner

§ 4 Introduction

Literature: Saad [2003], Simoncini and Szyld [2007]

Despite remarkable progress in the area of sparse direct solvers for linear systems of equations, iterative solvers are most often the only feasible solution approaches for very large-scale problems. In particular, sparse factorizations of PDE problems discretized by finite element, finite difference or finite volume methods in 3D (three spatial dimensions) suffer from significant fill-in, which limits the applicability of direct solution methods.

The convergence of iterative solvers for Ax = b depends on the spectral properties of A, like the distribution of the eigenvalues or its condition number. Unfortunately, matrices arising in most discretization approaches to PDEs are ill-conditioned. For instance, the finite element stiffness matrix A of a 2nd-order elliptic PDE operator has a condition number proportional to h−2. Therefore, iterative solvers will only be effective when combined with a preconditioner P. The preconditioned solver will then solve the equivalent system P−1Ax = P−1b, which hopefully has better spectral properties. It is a requirement that systems Py = f are significantly easier to solve than the problem Ax = b itself.
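The following small Python example (illustrative only; it uses a 1D model stiffness matrix rather than a KKT system) shows the mechanics: scipy's cg accepts a preconditioner through its argument M, which is applied as P−1 in every iteration. An incomplete LU factorization serves as P here purely for demonstration; for symmetric problems one would normally choose a symmetric preconditioner.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, spilu, LinearOperator

n = 200
h = 1.0 / (n + 1)
# 1D Laplacian stiffness matrix; its condition number grows like h^-2
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format='csc') / h**2
b = np.ones(n)

ilu = spilu(A, drop_tol=1e-3)                       # incomplete factorization of A
P_inv = LinearOperator((n, n), matvec=ilu.solve)    # action of P^{-1}

x_unprec, info_unprec = cg(A, b)                    # plain CG
x_prec, info_prec = cg(A, b, M=P_inv)               # preconditioned CG: P^{-1} A x = P^{-1} b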

It was already mentioned in Chapter 1 that those optimization methods with a faster-than-linear rate of convergence typically need to solve, in every iteration, a simplified version of the original problem. To illustrate this point, we restrict our discussion to the SQP (§ 2.2.2) and Augmented Lagrangian SQP (§ 2.2.3) methods. In


both of these methods, we have to solve in every iteration a quadratic programming (QP) subproblem, which is represented by a linear system of equations governed by the matrix

[ Lxx  ex* ]
[ ex   0   ]

evaluated at the current iterate, compare (2.9) and (2.11). Matrices of this type are termed saddle point matrices, and the corresponding linear systems are saddle point systems. The name comes from the fact that solving KKT systems is (at least for convex problems) equivalent to finding a saddle point for the Lagrangian.

We conclude that the efficient solution of saddle point problems is of great importance for the efficient solution of nonlinear optimization problems. Therefore, we address in § 5 their spectral properties and in § 6 their preconditioned solution by appropriate Krylov subspace methods.

§ 5 Properties of Saddle Point Problems

We begin by explaining some concepts in finite dimensions and later switch back to the Hilbert space setting in Section 5.2. Sometimes saddle point problems are defined to be matrices partitioned in the following way:

[ A   B1ᵀ ]
[ B2  −C  ].

This definition is so general that actually every matrix becomes a saddle point matrix. We will here adhere to the special case

K = [ A  Bᵀ ]     with A = Aᵀ     (5.1)
    [ B  0  ]

and use the term saddle point matrix exclusively for these. Notice that the SQP and AL-SQP methods produce subproblems of this type. This remains true also in the presence of inequality constraints, for instance, with control constraints treated using the primal-dual active set strategy, see (6.8), or with state constraints as in § 3.3, see [Herzog and Sachs, 2010, Section 3]. We mention that in some optimization methods, most notably interior point methods, the −C block is present, but these will not be considered here.

Let us suppose from now on that A ∈ Rn×n and B ∈ Rm×n with m ≤ n.

The first question is: Under what conditions is (5.1) invertible? This is easy to characterize in case A = Aᵀ ⪰ 0 (A is positive semidefinite). Then (5.1) is invertible if and only if B has full row rank m, and ker A ∩ ker B = {0}. When the conditions A ⪰ 0 or A = Aᵀ are dropped, the situation becomes more difficult, see Gansterer et al. [2003].

In the sequel we shall work with the standing assumption that B has full row rank m ≤ n, and one of the following conditions holds:

A = Aᵀ ≻ 0 on ker B     (STD)
A = Aᵀ ≻ 0 on ker B,  A ⪰ 0     (STD+)
A = Aᵀ ≻ 0 on all of Rn     (STD++)


Note that (STD++) ⇒ (STD+) ⇒ (STD).

It is well known that saddle point matrices are indefinite. Under assumption (STD), K has exactly n positive and m negative eigenvalues, see for instance Chabrillac and Crouzeix [1984]. In other words, its inertia is In(K) = (n, m, 0). If (STD++) holds, then A is invertible, and the claim can be easily seen using the factorization

[ A  Bᵀ ]   [ I     0 ] [ A  0 ] [ I  A−1Bᵀ ]
[ B  0  ] = [ BA−1  I ] [ 0  S ] [ 0  I     ].

Here S = −BA−1Bᵀ denotes the Schur complement, which is negative definite.
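Both statements are easy to verify numerically. The snippet below (an added illustration with random dense data satisfying (STD++)) checks that the inertia of K is (n, m, 0) and that the Schur complement is negative definite.

import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                 # symmetric positive definite, i.e. (STD++)
B = rng.standard_normal((m, n))             # full row rank with probability one

K = np.block([[A, B.T], [B, np.zeros((m, m))]])
lam = np.linalg.eigvalsh(K)
print("inertia of K:", (int((lam > 0).sum()), int((lam < 0).sum()), int((lam == 0).sum())))

S = -B @ np.linalg.solve(A, B.T)            # Schur complement S = -B A^{-1} B^T
print("S negative definite:", bool(np.all(np.linalg.eigvalsh(S) < 0)))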

§ 5.1 Saddle Point Problems Arising in Optimization

Each of the conditions just mentioned actually has a natural meaning in optimization. To discuss this, consider the finite dimensional problem

Minimize (1/2) xᵀAx − fᵀx
s.t. Bx = g.     (5.2)

The Lagrangian for (5.2) is L(x, p) = (1/2) xᵀAx − fᵀx + pᵀ(Bx − g) and its KKT conditions are

[ A  Bᵀ ] [ x ]   [ f ]
[ B  0  ] [ p ] = [ g ].     (5.3)

Condition (STD) is sufficient to ensure the existence of a unique minimizer x∗ of (5.2). To see this, let Z be a matrix whose columns form a basis of ker B. Choose x0 ∈ Rn satisfying Bx0 = g (this works since B was assumed to be onto) and consider the equivalent reduced problem

Minimize (1/2) (x0 + Zy)ᵀ A (x0 + Zy) − fᵀ(x0 + Zy)

for y ∈ Rn−m. Due to the assumption that A is positive definite on ker B, it follows that ZᵀAZ is positive definite, therefore the problem above has a unique solution y∗, its objective being uniformly convex. The solution of (5.2) is then x∗ = x0 + Zy∗, and the unique associated p∗ can be found from solving

BBᵀ p∗ = B (f − Ax∗).

This follows again from the condition that B has full row rank, which is usually called the LICQ (linear independence constraint qualification) in optimization, and which is known to entail the uniqueness of the Lagrange multiplier p.

Let us discuss the necessity of condition (STD). If A were not positive definite on ker B, there would exist in ker B a direction of negative curvature for (1/2) xᵀAx − fᵀx, and hence the reduced objective would be unbounded below on the feasible set {x ∈ Rn : Bx = g}.

The stronger requirement (STD+) means that the objective is convex (but not necessarily uniformly so) on the whole space Rn. In this case, we say that (5.2) is a convex problem, and solving (5.3) is actually equivalent to finding the unique saddle point of the Lagrangian, characterized by

L(x∗, p) ≤ L(x∗, p∗) ≤ L(x, p∗) for all x ∈ Rn, p ∈ Rm.

Finally, (STD++) implies that the objective is uniformly convex on the whole space Rn.
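The constructive argument above translates directly into a small nullspace ('reduced') solver for (5.2)/(5.3). The sketch below (an added illustration under the assumptions (STD) and full row rank of B) obtains Z and a particular solution from the SVD of B, solves the reduced problem, and recovers the multiplier from BBᵀ p∗ = B(f − Ax∗).

import numpy as np

def solve_equality_qp(A, B, f, g):
    """Nullspace method for (5.2): assumes B has full row rank and A is SPD on ker B."""
    m, n = B.shape
    U, s, Vt = np.linalg.svd(B)
    Z = Vt[m:].T                                   # columns form a basis of ker B
    x0 = np.linalg.lstsq(B, g, rcond=None)[0]      # some x0 with B x0 = g
    y = np.linalg.solve(Z.T @ A @ Z, Z.T @ (f - A @ x0))   # reduced, uniformly convex problem
    x = x0 + Z @ y
    p = np.linalg.solve(B @ B.T, B @ (f - A @ x))  # unique multiplier (LICQ)
    return x, p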


While (5.2) was a quadratic programming problem, we are often interested in solving nonlinear optimization problems:

Minimize f(x)
s.t. e(x) = 0.     (5.4)

The Lagrangian associated with (5.4) is L(x, p) = f(x) + pᵀe(x). As was already mentioned in § 4, SQP type methods try to approach a solution by solving a sequence of problems of type (5.3), with A = Lxx(xn, pn) (Hessian of the Lagrangian) and B = ex(xn) (linearization of the constraints). Let us relate the properties of the nonlinear problem (5.4) to those of (5.2). The key is the second-order sufficient optimality conditions, which, in the case of (5.4), read:

Suppose that the first-order necessary conditions Lx(x∗, p∗) = 0 and e(x∗) = 0 hold, and that ex(x∗) has full row rank, i.e., LICQ holds at x∗. Suppose in addition that

Lxx(x∗, p∗) is positive definite on ker ex(x∗).     (5.5)

Then x∗ is a strict local minimizer of (5.4).

We see that these conditions match precisely our minimal standard assumptions (STD) stated above. By a continuity argument, we can expect them to hold for all matrices

[ Lxx(x, p)  ex(x)* ]
[ ex(x)      0      ]

at least in a neighborhood of a point (x∗, p∗) satisfying the second-order sufficient condition.

§ 5.2 Hilbert Space Setting

Literature: [Girault and Raviart, 1986, Chapter I, Section 4], [Quarteroni and Valli, 1994, Section 7]

In the Hilbert space setting, the matrices A and B have to be replaced by bilinear forms a(·, ·) and b(·, ·), which act on Hilbert spaces X and Q. The equivalent of problem (5.3) becomes: Find x ∈ X, q ∈ Q such that

a(x, z) + b(z, q) = ⟨f, z⟩ for all z ∈ X
b(x, r) = ⟨g, r⟩ for all r ∈ Q.     (5.6)

We need to explore how our standard assumptions are to be modified for this infinite dimensional setting. First of all, we assumed B to have full row rank. This is equivalent to saying that the smallest singular value σmin(B) is still positive. Since singular values can be characterized by a min–max property, this is in turn equivalent to requiring

min_{p∈Rm} max_{x∈Rn} pᵀBx / (‖p‖ ‖x‖) > 0.

This is precisely the condition one usually imposes in the infinite dimensional setting:

inf_{p∈Q} sup_{x∈X} b(x, p) / (‖p‖_Q ‖x‖_X) ≥ k0 > 0,

the so-called inf–sup condition. In addition, we need to require b to be bounded, i.e., b(x, p) ≤ ‖b‖ ‖x‖_X ‖p‖_Q, which is automatic in finite dimensions.

The conditions on A are more straightforward to translate. We require that


• a is symmetric: a(x, z) = a(z, x) for all x, z ∈ X

• a is bounded: a(x, z) ≤ ‖a‖ ‖x‖X ‖z‖X for all x, z ∈ X

• a is coercive (positive definite) on ker B: a(x, x) ≥ α0 ‖x‖²_X for all x ∈ ker B.

These assumptions correspond to (STD) in the Hilbert space case. It should now be clear how (STD+) and (STD++) are to be understood.

Under assumption (STD) one can show that (5.6) has a unique solution. Solving (5.6) is equivalent to solving the infinite dimensional QP

Minimize (1/2) a(x, x) − ⟨f, x⟩
s.t. b(x, r) = ⟨g, r⟩ for all r ∈ Q.     (5.7)

We already mention at this point that the four numbers ‖a‖, α0 (pertaining to a(·, ·)) and ‖b‖, k0 (pertaining to b(·, ·)) contain a lot of information about the spectral properties of problem (5.6).

§ 5.3 Spectral Properties

Literature: Rusten and Winther [1992], Gould and Simoncini [2009]

We already mentioned that the spectral properties, e.g., distribution of eigenvaluesand condition number, are important data to estimate the convergence behavior ofiterative solutions methods. We turn again to the finite dimensional (discretized)setting, and we need to investigate in particular if and how the spectral propertiesdepend on the mesh size h.

We quote from Rusten and Winther [1992] the following result. Suppose that(STD++) holds, and let us denote by

• µ1 ≥ µ2 ≥ · · · ≥ µn > 0 the eigenvalues of A

• σ1 ≥ σ2 ≥ · · · ≥ σm > 0 the singular values of B.

Then the following bounds hold for the eigenvalues of K: Its spectrum σ(K) iscontained in the intervals I− ∪ I+, where

I− =

[1

2

(µn −

√µ2n + 4σ2

1

),

1

2

(µ1 −

õ2

1 + 4σ2m

)]⊂ R−,

I+ =

[µn,

1

2

(µ1 +

õ2

1 + 4σ21

)]⊂ R+.

An inspection of the proof of this result shows that it actually extends verbatim tothe case (STD+), i.e., with µn ≥ 0 instead of µn > 0.

This information can be used to estimate the spectral condition number of K, since

κ(K) = ‖K‖‖K−1‖ =σmax(K)

σmin(K)

holds and for symmetric matrices, the singular values coincide with the absolutevalues of their eigenvalues, i.e., σi(K) = |λi(K)|. This implies that the condition

Page 30: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

30 Chapter 2. Preconditioning in PDE-Constrained Optimization

number will be no larger than the absolute value of the bounds of I− and I+ farthestaway from zero, divided by the absolute value of the bounds of these intervals closestto zero:

κ(K) ≤max1

2

(µ1 +

õ2

1 + 4σ21

), −1

2

(µn −

√µ2n + 4σ2

1

)

minµn, −12

(µ1 −

õ2

1 + 4σ2m

)

. (5.8)

Unfortunately, this estimate cannot be sharp since it yields ∞ if only (STD+) issatisfied but not (STD++). Then µn = 0 holds and the denominator is zero, but wealready know that K is invertible (even under the much weaker condition (STD)),hence indeed κ(K) <∞.

Recently, the eigenvalue bounds for K have been improved by Gould and Simoncini[2009]. They even apply to the case (STD) now, when nothing is known about thesign of the smallest eigenvalue µn of A. In their most general result ([Gould andSimoncini, 2009, Proposition 2.2]), the lower bound, µn, in I+ above is replaced bya number γ, which is defined as the smallest positive root of the cubic equation

µ3 − µ2(µ+ µn) + µ (µ µn − ‖a‖2 − σ2m) + µ σ2

m.

It is also shown that γ cannot be larger than the smallest eigenvalue of the reducedproblem, µ := µmin(Z>AZ), where Z is a matrix whose columns form a basis ofkerB. Since γ > 0 holds, the corresponding condition number estimate

κ(K) ≤max1

2

(µ1 +

õ2

1 + 4σ21

), −1

2

(µn −

√µ2n + 4σ2

1

)

minγ, −12

(µ1 −

õ2

1 + 4σ2m

)

(5.9)

will not yield ∞ even under our weakest assumption (STD).

We calculate the extreme eigenvalues and singular values and the corresponding esti-mate for the condition number in the next section, using as an example a discretizedoptimal control problem.

§ 5.4 An Optimal Control Example

We consider the following standard test example in optimal control with ν > 0:

Minimize1

2‖y − yΩ‖2

L2(Ω0) +ν

2‖u‖2

L2(Ω) over y ∈ H10 (Ω), u ∈ L2(Ω)

s.t. (∇y,∇v)L2(Ω) = (u, v)L2(Ω) for all v ∈ H10 (Ω).

(5.10)

The domain in this example is Ω = (0, 1)2 ⊂ R2. Note that the state is observedonly on a subset, Ω0 = (0.5, 1)× (0, 1).

We discretized the problem using the standard finite element spaces

Yh consisting of piecewise linear (P1), continuous functions,Uh consisting of piecewise constant (P0), discontinuous functions.

The standard basis functions are denoted by ϕi ⊂ Yh and ψi ⊂ Uh. The discreteproblem is a quadratic programming (QP) problem whose necessary (and sufficient)optimality conditions are given by(

A BB 0

)(xp

)=

(fg

)(5.11)

Page 31: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

§ 5. Properties of Saddle Point Problems 31

with building blocks

A =

(M

χΩ0yy 00 Muu

), B =

(K −Myu

).

The individual blocks areM

χΩ0yy state space mass matrix with the characteristic function

of Ω0 as a weight (χΩ0ϕj, ϕi)L2(Ω),

Muu control space mass matrix (ψj, ψi)L2(Ω),

K stiffness matrix (∇ϕj,∇ϕi)L2(Ω),

Myu mixed state/control space mass matrix (ϕj, ψi).

We computed the spectral estimates according to [Gould and Simoncini, 2009,Proposition 2.2] on a sequence of uniformly refined meshes for the parameter ν = 1.The results are shown in Table 5.1.

level µ1(A) µn(A) µ(A) σ1(B) σm(B)

1 3.97e-02 -8.27e-24 3.03e-02 7.08e+00 1.55e-012 1.39e-02 -9.93e-24 7.68e-03 7.73e+00 4.58e-023 3.79e-03 -3.31e-24 1.93e-03 7.93e+00 1.25e-024 9.69e-04 -4.96e-24 4.83e-04 7.98e+00 3.26e-03

level [I− I−] [I+ I+] κ(K)

1 -7.08e+00 -1.36e-01 2.84e-02 7.09e+00 2.50e+022 -7.73e+00 -3.94e-02 7.02e-03 7.74e+00 1.10e+033 -7.93e+00 -1.07e-02 1.76e-03 7.93e+00 4.50e+034 -7.98e+00 -2.81e-03 4.43e-04 7.98e+00 1.80e+04

Table 5.1. Spectral bounds (upper row) for matrices A and B inproblem (5.10) on various grid levels, with respect to the standardinner product. Computed eigenvalue bounds (lower row) based on[Gould and Simoncini, 2009, Proposition 2.2] and corresponding con-dition number estimates.

We can make the following observations:

(a) The A block is only positive semidefinite. It has a number of zero eigenval-ues, µk(A) = µk+1(A) = · · · = µn(A) = 0, which is due to the observationof the state being restricted to the subdomain Ω0 in the objective of (5.10).

(b) Nevertheless, the A block is positive definite on kerB, as can be seen fromµ(A) (the smallest eigenvalue of the restriction of A to kerB) remainingpositive. This is in accordance with our expectation, since assumption(STD) is satisfied for our problem (5.10).

(c) The largest eigenvalue µ1(A) of the A block decreases like ∼ h2, which is inaccordance with known results for mass matrices in 2D.

(d) The largest singular value σ1(B), that is the norm of B, is almost constant.This is due to the fact that σ1(B) is dominated by the largest eigenvalue ofthe stiffness matrix K, which is bounded above by a constant, independentof the mesh level.

Page 32: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

32 Chapter 2. Preconditioning in PDE-Constrained Optimization

(e) The smallest singular value σm(B) decreaes like ∼ h2.

All in all, we conclude that the upper bound of I−, i.e., 12

(µ1 −

õ2

1 + 4σ2m

),

approaches zero from below with a rate of ∼ h2. At the same time, the lowerbound of I+, i.e., γ, which is bounded above by µ, approaches zero from above withthe same rate. Consequently, the condition number estimate κ(K) shown in thelast column, is of order ∼ h−2, which can also be confirmed by direct verification.The same rate for κ(K) also holds in the 3D case. The need to use preconditionediterative solvers (§ 6) becomes evident at this point.

On second thought, however, one thing is curious. We noticed earlier that the spec-tral properties of problem (5.10) should actually be described in the four numbers‖a‖, α0, ‖b‖ and k0, which are mesh independent. Indeed, it is an easy exercise(compare [Herzog and Sachs, 2010, Lemma 3.1]) to show that for our problem

‖a‖ = max1, ν, α0 = ν/(4 cP ), ‖b‖ = 2, k0 = (1/cP )2

hold, where cP is the Poincaré constant ‖y‖H1(Ω) ≤ cP |y|H1(Ω).

A first attempt of an explanation might be that the discretization alters these con-stants. Well, this is indeed a concern. Since we are using conforming discretizationsYh ⊂ Y = H1

0 (Ω) and Uh ⊂ U = L2(Ω), it is easy to see that the constants ‖a‖ and‖b‖ must also be valid at all discrete levels, independent of the mesh size. But thisis not the case for α0 and k0, since in general

xh ∈ Xh : b(xh, ph) = 0 for all ph ∈ Qh 6⊂ kerB

and, given ph ∈ Qh,

the sup-generating element x ∈ X ofb(x, ph)

‖x‖X‖ph‖Qmay not belong to Xh.

The constants valid on the discrete level indeed depend on the choice of the spacesXh = Yh × Uh and Qh = Yh. With our choice above, however, we do satisfy thediscrete stability condition, which means that we may have to live with smaller butmesh independent constants 0 < α′0 ≤ α0 and 0 < k′0 ≤ k0.

Therefore, the reason for the condition number κ(K) going to infinity with rate∼ h−2 must lie elsewhere. To make the story short, in our computations abovewe have (implicitly) equipped the spaces Xh and Qh with the scalar products ofRn and Rm. However, this does not make a lot of sense for vectors which actuallyrepresent elements of function spaces. Therefore, we should have equipped thefinite dimensional spaces with appropriate scalar products, so that the norms ofthe coefficient vectors reflect the norms of the functions represented by them. Forthe state and adjoint variables y and u, appropriate scalar products are induced bythe matrices (K + Myy), while for the control variable we take Muu. With thesescalar products, our discrete coefficient vectors have norms which correctly reflectthe H1(Ω) or L2(Ω) norms of the functions they represent.

How can we re-compute the spectral parameters of our matrices A and B usingthis knowledge? Changing the scalar products amounts to using the algorithms forgeneralized eigenvalue and singular value computation. To illustrate this point, µ isa generalized eigenvalue of A with respect to the scalar product mentioned above if(

MχΩ0yy 00 Muu

)(yu

)= µ

(K +Myy 0

0 Muu

)(yu

)

Page 33: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

§ 6. Preconditioning KKT Systems 33

holds for some nonzero vector (y, u). By re-computing all values using this general-ized formulation, we arrive at Table 5.2. It shows that w.r.t. to the appropriate scalarproducts, the condition number is actually benign and in particular independent ofthe mesh size.

level µ1(A) µn(A) µ(A) σ1(B) σm(B)

1 1.00e+00 -1.49e-23 9.11e-01 4.31e+00 9.78e-012 1.00e+00 -6.62e-24 9.12e-01 4.32e+00 9.77e-013 1.00e+00 -5.79e-23 9.14e-01 4.33e+00 9.77e-014 1.00e+00 -6.95e-23 9.15e-01 4.33e+00 9.77e-01

level [I− I−] [I+ I+] κ(K)

1 -4.31e+00 -5.98e-01 4.03e-01 4.84e+00 1.20e+012 -4.32e+00 -5.98e-01 4.03e-01 4.85e+00 1.20e+013 -4.33e+00 -5.97e-01 4.04e-01 4.85e+00 1.20e+014 -4.33e+00 -5.97e-01 4.04e-01 4.85e+00 1.20e+01

Table 5.2. Same information as in Table 5.1, but with respect toappropriate scalar products, i.e., using generalized eigenvalue and sin-gular value computations.

§ 6 Preconditioning KKT Systems

Literature: Benzi et al. [2005], Stoll and Wathen [2008]

The following quotes are taken from Barrett et al. [1994] and Benzi et al. [2005]:

• ”The convergence rate of iterative methods depends on spectral propertiesof the coefficient matrix. Hence one may attempt to transform the linearsystem into one that is equivalent in the sense that it has the same solution,but that has more favorable spectral properties. A preconditioner is amatrix that effects such a transformation.”

• ”For saddle point problems, the construction of high-quality preconditionersnecessitates exploiting the block structure of the problem, together withdetailed knowledge about the origin and structure of the various blocks.Because the latter varies greatly from application to application, there isno such thing as the ’best’ preconditioner for saddle point problems.”

Using (left) preconditioning, we convert a saddle point problem of the form (5.3)into the equivalent system

K−1

(A B>

B 0

)(xp

)= K−1

(fg

)(6.1)

with the purpose that the preconditioned matrix has better spectral properties.Since K−1 has to be applied to a given vector during the iterative solution process,i.e., linear systems with K have to be solved, the computational costs for this hasto be balanced with the improvement in convergence achieved by preconditioning.Clearly, the ’best’ preconditioner from the convergence point of view of the matrixitself, but solving with K is then equivalent to solving the original system.

Page 34: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

34 Chapter 2. Preconditioning in PDE-Constrained Optimization

There are a number of classes of preconditioners available in the literature whichrespect the block structure and properties of saddle point problems. We refer to[Benzi et al., 2005, Section 10] for an overview. Many of the well established pre-conditioners were designed with applications other than optimal control in mind. Aprominent example is the Stokes problem(

−µ4 ∇− div 0

)(up

)=

(f0

)with viscosity µ > 0.

A typical Hilbert space setting for this problem is X = H10 (Ω)d, Q = L2

0(Ω) and1

a(u,v) = µ (∇u,∇v)L2(Ω)d×d , b(u, p) = −(divu, p)L2(Ω).

It is easy to see from here that the Stokes problem verifies assumption (STD++).By contrast, we saw in § 5.4 that optimality systems for optimal control problemstypically satisfy only the much weaker condition (STD). Therefore, most of thepreconditioners for the Stokes problem cannot be directly applied to KKT systemsin optimal control.

In the remainder of this section, we briefly touch upon several classes of precon-ditioners which make do with the assumption of (STD). We use again the KKTconditions of our model problem (5.10) as an illustrative example. In continuousform, this system reads

K

yup

=

χΩ0 0 −40 νI −I−4 −I 0

yup

=

χΩ0yΩ

00

(6.2)

where −4 stands for the negative Laplacian with homogeneous Dirichlet boundarydata.

§ 6.1 Early Approaches

Literature: Battermann and Heinkenschloss [1998], Battermann and Sachs [2001],Haber and Ascher [2000], Biros and Ghattas [2005]

The main idea behind early approaches in preconditioning KKT systems from PDE-constrained optimization problems was to exploit available preconditioners for theforward operator, i.e., the operator governing the state equation (−4 in our exam-ple). This led to preconditioners of the class

K =

0 0 −40 H −I−4 −I 0

where H stands for the reduced Hessian matrix ∇2f(·), or

H =

(4−1

I

)?(χΩ0 00 νI

)(4−1

I

)and where the general symbol M represents a preconditioner for the matrix oroperator M .

1L20(Ω) denotes the space of L2 functions on Ω with zero average. It is needed to normalize

the pressure, which is determined only up to a constant.

Page 35: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

§ 6. Preconditioning KKT Systems 35

Since K is equivalent to a lower triangular matrix, every solve with K requires twosolves with the preconditioners −4 and one with the approximate reduced HessianH. Often H = νI was used, because the true reduced Hessian is just a compact per-turbation of this. Since both K and K are indefinite operators (indefinite matricesafter discretization), general purpose solvers like QMR or GMRES were generallyused with these preconditioners. (MINRES requires positive definite precondition-ers and therefore wasn’t considered.)

§ 6.2 Constraint Preconditioners

Literature: Gould et al. [2001], Keller et al. [2000], Dollar and Wathen [2006]

Constraint preconditioners are of the following structure:

K =

(G B>

B 0

)with G symmetric positive definite on kerB.

They open up the possibility of using the ppcg (preconditioned projected conjugategradient) method for the solution of(

A BB 0

)(xp

)=

(fg

)in spite of K’s indefiniteness. Indeed, the conjugate gradient iteration taking placecan be shown to be equivalent to one in the reduced space kerB, see Gould et al.[2001], provided that the initial iterate is consistent with the equation Bx = g.

It may appear worthwhile at first glance to apply constraint preconditioners toPDE constrained optimization problems. However, there is actually little benefit ofpreconditioning the benign block A, consisting of mass matrices, while leaving theB block intact. Indeed, solving systems with K is as hard as solving systems withK since the ’difficult’ operator is our PDE operator (−4 in the example), which isnot preconditioned at all.

The simple trick of swapping rows and columns in (6.2) does not help either becausethe only useful constellation would beχΩ0 −4 0

−4 0 −I0 −I νI

which is not of the form (5.1).

We return later however, to a constraint preconditioner as a building block of anotherpreconditioner to cope with issues arising due to control inequality constraints, see§ 6.4.3.

§ 6.3 Preconditioned Conjugate Gradients

Literature: Schöberl and Zulehner [2007], Herzog and Sachs [2010]

The conjugate gradient method is known to be a reliable iterative solver only forsymmetric and positive definite matrices. Since saddle point matrices are indefinite(§ 5.3), the possibility of solving them via CG seems to be ruled out. In principle,

instead of solving K(~x~q

)=

(~f~g

), one could solve K>K

(~x~q

)= K>

(~f~g

)which is

Page 36: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

36 Chapter 2. Preconditioning in PDE-Constrained Optimization

governed by a symmetric and positive definite matrix. However, this is not recom-mended due to the now squared condition number.

In their seminal paper Bramble and Pasciak [1988], to everybody’s surprise, theauthors found a way of applying the preconditioned conjugate gradient iteration tosaddle-point problems. This became possible through a clever combination of a pre-conditioner and a non-standard scalar product D, which render the preconditionedmatrix self-adjoint and positive definite with respect to D! The conjugate gradientmethod with this non-standard inner product can thus be applied to the precondi-tioned matrix. Ever since, the Bramble-Pasciak idea and extensions thereof haverisen to popular solution methods for saddle point problems arising mainly from thediscretization of PDEs. It seems that this technique was long underestimated if notunknown in the optimization and optimal control communities.

We say that a matrix A is self-adjoint w.r.t. the scalar product D if(x,Ay

)D =

(Ax, y

)D for all x, y

holds, i.e., if A = D−1A>D. We say that it is positive definite w.r.t. D if(x,Ax

)D ≥ c ‖x‖2

D for all x,

with a constant c > 0. Stoll and Wathen [2008] have more on self-adjointness ingeneral inner products, in particular for saddle point problems.

In 2007, Schöberl and Zulehner [2007] gave a thorough analysis and systematicconstruction of a class of symmetric indefinite preconditioners, together with suitablescalar products, which effect the desired magic. The preconditioners are of the form

K =

(A B>

B BA−1B> − S

)=

(I 0

BA−1 I

)(A B>

0 −S

), (6.3)

where A and S are symmetric and nonsingular matrices that we define below. Thisfactorization reveals that every application of the preconditioner requires two solveswith A and one solve with S. The interesting fact of the matter is that suitablepreconditioner building blocks are very easy to construct. One may simply takeproperly scaled preconditioners for the scalar product matrices (let’s call them Xand Q) in the spaces X and Q:

A =1

σX , S =

σ

τQ,

where X and Q are suitable approximations to X and Q, respectively.This finding is of great practical value because excellent preconditioners for thescalar product matrices are available ’off-the-shelf’ for very many combinations ofspaces and finite element discretizations. Here are some examples:

• For the spaceX = H1(Ω), the standard scalar product matrix is the stiffnessmatrix associated with the weak form of the problem

−4y + y = 0 in Ω, ∂ny = 0 on ∂Ω.

For the space X = H10 (Ω), an alternative scalar product is the one associ-

ated with the weak form of

−4y = 0 in Ω, y = 0 on ∂Ω.

Page 37: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

§ 6. Preconditioning KKT Systems 37

For both problems, the geometric multigrid approach is one convenient wayof obtaining a suitable preconditioner X of optimal complexity for a broadchoice of discretizations.

• For the space X = L2(Ω), the scalar product matrix is known as the massmatrix in the finite element context. Due to its mesh independent conditionnumber, it can be preconditioned with very little effort, e.g., by diagonallypreconditioned conjugate gradients, symmetric Gauss-Seidel iterations, orChebyshev semi-iteration (see Wathen and Rees [2008] for the latter).

• For product spaces, one may simply use independent preconditioners foreach factor space.

Schöberl and Zulehner [2007] found explicit bounds for σ and τ to make suitablescaling parameters. These bounds are

σ <1

‖a‖and τ >

1

(1− qX)(1− qQ)

1

k20

, (6.4)

and they depend only on two of the four governing constants2 and also on the qualityof the preconditioners, for which we make the non-restrictive assumptions of spectralequivalence

(1− qX) X X X , (1− qQ) Q Q Q.

With σ and τ chosen to satisfy (6.4), the matrix

D = K − K =

(A− A 0

0 BA−1B> − S

)(6.5)

defines a scalar product, and the preconditioned matrix K−1K is self-adjoint andpositive definite w.r.t. D. Moreover, the preconditioned condition number can beestimated by

κ(K−1K) ∼ ‖a‖α0

(‖b‖k0

)2

. (6.6)

Recall that the contraction rate of conjugate gradient iterations is√κ− 1√κ+ 1

.

Estimate (6.6) thus shows that the performance of the preconditioned conjugategradient method will depend exclusively on properties of the infinite dimensionalproblem and it will not deteriorate with decreasing mesh size (under the conditionof stable discretizations, see the footnote).

2We recall from Section 5.4 that the coercivity and inf-sup constants α′0 and k′0 observed afterdiscretization could be worse (smaller) than those belonging to the undiscretized problem. Themesh independence of α′0 and k′0 imposes a discrete stability constraint on the choice of the finiteelement pairs which generate the finite dimensional spaces. We assume here that this condition issatisfied, as was the case in our example in Section 5.4, because this is an issue of discretization,not of preconditioning.

Page 38: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

38 Chapter 2. Preconditioning in PDE-Constrained Optimization

§ 6.4 Implementation Details

Literature: Herzog and Sachs [2010]

In this section we show how to implement and apply the aforementioned precondi-tioner to the following problem, taken from [Herzog and Sachs, 2010, Section 3.1].3

Minimize1

2‖y − yd‖2

L2(Ω) +ν

2‖u− ud‖2

L2(Ω)

s.t.−4y + y = u in Ω

∂ny = 0 on ∂Ω

and ua ≤ u ≤ ub a.e. in Ω.

(6.7)

Note that this closely resembles our earlier example, but control constraints areincluded. We point out that the source code for your own numerical experiments isavailable from

http://www.tu-chemnitz.de/mathematik/part_dgl/publications.php.

We already know from § 3.1, Algorithm 8, that every step in the primal-dual activeset strategy (or semismooth Newton method) requires the solution of the followingsystem:

I · L∗ ·· νI −I IL −I · ·· c χAk

· χIk

yk+1

uk+1

pk+1

ξk+1

=

ydud0

c(χA+

kub + χA−k

ua) , (6.8)

where χA+k, χA−k and χAk

denote the characteristic functions of A+k , A

−k and Ak =

A+k ∪ A

−k , respectively, and L represents the differential operator of the PDE con-

straint in (6.7). In our present example, we have L = −4 + I with homogeneousNeumann boundary conditions in weak form, considered as an operator from H1(Ω)into H1(Ω)∗.

System (6.8) changes from iteration to iteration due to changes in the active sets.Since, however, we focus here on the efficient solution of individual Newton steps, wedrop the iteration index from now on. From (6.8) one infers ξI = 0 (the restrictionof ξ to the inactive set I), and we may eliminate this variable from the problem.The Newton system then attains an equivalent symmetric saddle point form:

I · L∗ ·· νI −I χAL −I · ·· χA · ·

yupξA

=

ydud0

χA+ ub + χA− ua

, (6.9)

which fits into our framework with the following identificationsx = (y, u) ∈ X = H1(Ω)× L2(Ω)

q = (p, ξA) ∈ Q = H1(Ω)× L2(A)

and bilinear forms

a((y, u), (z, v)

):= (y, z)L2(Ω) + ν (u, v)L2(Ω) (6.10a)

b((y, u), (p, ξA)

):= (y, p)H1(Ω) − (u, p)L2(Ω) + (u, ξA)L2(A). (6.10b)

3with the slight extension that ud ∈ L2(Ω) appears in the objective

Page 39: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

§ 6. Preconditioning KKT Systems 39

It is an easy exercise (compare again [Herzog and Sachs, 2010, Lemma 3.1]) to showthat our problem satisfies assumptions (STD) with constants

‖a‖ = max1, ν, α0 = ν/2, ‖b‖ = 2, k0 = 1/2,

independent of the active sets.

When discretizing the problem with standard finite elements, say, piecewise linearcontinuous functions for both, the state and control variables, we get a discretevariant of equation (6.10b). As in the continuous setting, we infer ~µI = ~0 andeliminate this variable to obtain

Mh · L>h ·· ν Mh −Mh P>ALh −Mh · ·· PA · ·

~y~u~p~µA

=

Mh~ydMh~ud

0PA+ ~ub + PA− ~ua

. (6.11)

PA is a rectangular matrix consisting of those rows of the diagonal 0-1-matrix χAwhich belong to the active indices, and similarly for PA± .

Some comments concerning the discrete system (6.11) are in order. The variable ~µis the Lagrange multiplier associated to the discrete constraint ~ua ≤ ~u ≤ ~ub imposedon the coefficient vector, and the relations ~µ = P>A ~µA and ~µA = PA ~µ hold. Ifwe set ~ξ = M−1

h ~µ, then ~ξ is the coordinate vector of a function in L2(Ω) whichapproximates the multiplier ξ in the continuous system. This observation must bereflected by the choice of norms on the discrete level, see (6.12) below.

The settings~x = (~y, ~u) ∈ Xh = Rn × Rn

~q = (~p, ~µA) ∈ Qh = Rn × RnA

and bilinear forms

ah((~y, ~u), (~z,~v)

):= ~z>Mh~y + ν ~v>Mh~u

bh((~y, ~u), (~p, ~µA)

):= ~p>Lh~y − ~p>Mh~u+ ~µ>A(PA~u)

define the setting for the discrete problem. As mentioned above, care has to betaken in choosing an appropriate norm for the discrete multiplier ~µA. We use scalarproducts in the spaces Xh and Qh represented by the following matrices:

X =

(Kh

Mh

)and Q =

(Kh

P>AM−1h PA

)(6.12)

with Mh and Kh defined as

Mh = (ϕi, ϕj)L2(Ω) mass matrix (6.13a)Lh = Kh = (∇ϕi,∇ϕj)L2(Ω) + (ϕi, ϕj)L2(Ω) stiffness matrix, (6.13b)

where ϕini=1 is the standard basis of piecewise linear continuous basis functions.

We recall that the preconditioner has the form

K =

(I 0

BA−1 I

)(A B>

0 −S

)

Page 40: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

40 Chapter 2. Preconditioning in PDE-Constrained Optimization

and that we solely require correctly scaled preconditioners for the scalar productmatrices defined in (6.12). We use

A =1

σ

(Kh

Mh

)and S =

σ

τ

(Kh

PAM−1h P>A

)(6.14)

and refer to Herzog and Sachs [2010] for details on the automatic choice of scalingparameters σ and τ .

The remainder of these notes is devoted to explaining some more implementationdetails to convince the reader that the proposed preconditioner is actually easy toapply.

§ 6.4.1 PCG Solver

We begin by showing the pcg method in Algorithm 10. The reader will verify thatthis is actually a standard implementation of the conjugate gradient method (see,e.g., Saad [2003] or Shewchuk [1994]), applied to the preconditioned matrix K−1K,and with respect to the D scalar product, whose implementation is described in thenext subsection.

§ 6.4.2 Scalar Product

It is to be noted that the scalar product 〈(~tx,~tq), (~rx, ~rq)〉D, with D given in (6.5),cannot usually be evaluated for arbitrary pairs of vectors. The reason is that matrix-vector products with A and S are usually not available (in contrast to products withA−1 and S−1, which are realized by applications of the preconditioners; think ofmultigrid). And thus A ~rx and S ~rq cannot be evaluated unless (~rx, ~rq) = K−1(~sx, ~sq)holds. That is, the evaluation of the scalar product is implementable if one ofthe factors is known to be the preconditioner applied to another pair of vectors.(Fortunately, this is the case during the pcg iterations.) We denote this situationby 〈(~tx,~tq), (~rx, ~rq); (~sx, ~sq)〉D. Under these circumstances the scalar product can beevaluated as follows:

〈(~tx,~tq), (~rx, ~rq); (~sx, ~sq)〉D = (~tx)>(~sx −B>~rq − A~rx) + (~tq)

>(~sq −B~rx).

As a consequence, it is necessary to maintain the relations ~r = K−1~s and ~q = K−1~ethroughout the iteration, which requires the storage of one extra vector comparedto common conjugate gradient implementations with respect to the standard innerproduct.

§ 6.4.3 Preconditioner

Algorithm 11 below describes in detail the application of the preconditioner (6.14)in terms of

K(~rx~rq

)=

(~sx~sq

)(6.15)

where ~rx = (~ry, ~ru), ~rq = (~rp, ~rµA) and ~sx = (~sy, ~su), ~sq = (~sp, ~sµA) hold. Thebuilding blocks of the preconditioner are as follows:

• (Kh)−1~b is realized by one multigrid V-cycle applied to the linear system

with the scalar product matrix Kh (representing the discrete H1(Ω) scalarproduct) and right hand side ~b. A number of νGS forward and reverse

Page 41: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

§ 6. Preconditioning KKT Systems 41

Algorithm 10 Conjugate gradient method for K−1K w.r.t. to scalar product D.

Input: right hand side (~bx,~bq) and initial iterate (~xx, ~xq)

Output: solution (~xx, ~xq) of K (~xx, ~xq) = (~bx,~bq)1: Set n := 0 and compute initial residual(

~sx~sq

):=

(~bx − A~xx −B>~xq

~bq −B ~xx

)and

(~dx~dq

):=

(~rx~rq

):= K−1

(~sx~sq

)2: Set δ0 := δ+ := 〈(~rx, ~rq), (~rx, ~rq); (~sx, ~sq)〉D3: while n < nmax and δ+ > ε2

rel δ0 and δ+ > ε2abs do

4: Set (~ex~eq

):=

(A ~dx +B>~dq

B ~dx

)and

(~qx~qq

):= K−1

(~ex~eq

)5: Set α := δ+/〈(~dx, ~dq), (~qx, ~qq); (~ex, ~eq)〉D6: Update the solution (

~xx~xq

):=

(~xx~xq

)+ α

(~dx~dq

)7: Update the residual(

~rx~rq

):=

(~rx~rq

)− α

(~qx~qq

)and

(~sx~sq

):=

(~sx~sq

)− α

(~ex~eq

)8: Set δ := δ+ and δ+ := 〈(~rx, ~rq), (~rx, ~rq); (~sx, ~sq)〉D9: Set β := δ+/δ10: Update the search direction(

~dx~dq

):=

(~rx~rq

)+ β

(~dx~dq

)11: Set n := n+ 112: end while13: return (~xx, ~xq)

Gauss-Seidel smoothing steps are used, starting from an initial guess of ~0.In Algorithm 11, the evaluation of (Kh)

−1~b is denoted by multigrid(~b).

• (Mh)−1~b corresponds to νSGS symmetric Gauss-Seidel steps for the mass

matrix Mh (representing the scalar product in L2(Ω)) with right hand side~b, and with an initial guess ~0. This is denoted by SGS(~b) in Algorithm 11.

• Note that the evaluation of ~µA = (PAM−1h P>A )−1~bA is equivalent to solving

the linear system (Mh P>APA 0

)(~r~µA

)= −

(~0~bA

), (6.16)

where ~r is a dummy variable. System (6.16) can be solved efficiently by thepreconditioned projected conjugate gradient (ppcg) method (see § 6.2) in

Page 42: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

42 Chapter 2. Preconditioning in PDE-Constrained Optimization

the standard scalar product, where(diag(Mh) P>A

PA 0

)serves as a (constraint) preconditioner. In Algorithm 11, this correspondsto the call ppcg(~bA,A+,A−). In practice, we use a tight relative termi-nation tolerance of 10−12 for the residual in ppcg, which took at most 13steps to converge in all examples.4 Note that the projected conjugated gra-dient method requires an initial iterate consistent with the second equationPA ~r = −~bA in (6.16). Due to the structure of PA, we may simply take~rA = −~bA, ~rI = ~0 and ~µA = ~0 as initial iterate.

Algorithm 11 Application of the preconditioner according to (6.15)Input: right hand sides ~sx = (~sy, ~su) and ~sq = (~sp, ~sµA), scaling parameters σ, τ ,

and active sets A+,A−

Output: solution ~rx = (~ry, ~ru) and ~rq = (~rp, ~rµA) of (6.15)

1: ~r ′y := multigrid(σ ~sy)

2: ~r ′~u := SGS(σ ~su)

3:

(~s ′p~s ′µA

):= B

(~r ′y~r ′u

)−(~sp~sµA

)4: ~rp := multigrid(τ ~s ′p/σ)

5: ~rµA := ppcg(τ ~s ′µA/σ,A+,A−)

6:

(~s ′y~s ′u

):=

(~sy~su

)−B

(~rp~µA

)7: ~ry := multigrid(σ ~s ′y)

8: ~ru := SGS(σ ~s ′u)

9: return ~ry, ~ru, ~rp, ~rµA

After this, Herzog and Sachs [2010] continues with the investigation of problemswith state constraints, regularized either by the Lavrentiev technique (leading tomixed control-state constraints), or by the Moreau-Yosida penalty approach, see§ 3.3.

We stop here with the presentation of some numerical experiments for the controlconstrained case in 2D and 3D. The setup is described in [Herzog and Sachs, 2010,Example 4.2]. Our conclusion is, not surprisingly, that for moderate discretizationsin 2D a sparse direct solver is not easy to beat, while for 3D problems our iterativesolver wins immediately.

We point out once again that the source code for this example is available from

http://www.tu-chemnitz.de/mathematik/part_dgl/publications.php.

4The reason for solving (6.16) practically to convergence is that intermediate iterates in con-jugate gradient iterations depend nonlinearly on the right hand side, and thus early terminationwould effectively yield a nonlinear preconditioner S not covered by the theory.

Page 43: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

§ 6. Preconditioning KKT Systems 43

Most of it (excluding the standard multigrid routines) is also shown in the appendixwith some additional comments.

Figure 6.1. The plots show the average solution time per Newtonstep vs. the dimension of the discretized state space. We compare thepcg method to Matlab’s sparse direct solver applied to the linearizedoptimality system (6.9) of problem (6.7) in 2D (left) and 3D (right).The triangle has slope 1 and it visualizes the linear complexity of theproposed pcg solver w.r.t. the number of unknowns.

Figure 6.2. The plots show the convergence history of the pcg resid-ual in the 2D (left) and 3D (right) cases on the finest grid, for allNewton steps.

Page 44: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010
Page 45: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

APPENDIX A

Software

§ 7 Source Code with Comments for the Preconditioned CG Code

We show here an actual Matlab implementation of the preconditioned conjugategradient algorithm for the solution of the control constrained problem (6.7) in 2D or3D with the setup described in [Herzog and Sachs, 2010, Example 4.2]. The completecode for this example is available from http://www.tu-chemnitz.de/mathematik/part_dgl/publications.php. It is identical to the code shown below except thatthe latter was stripped from a number of fprintf commands. We also do notaddress standard multigrid routines here.

We begin by loading the problem data for the 3D version of problem (6.7). Forconvenience, the .mat file already contains a hierarchy of all matrices across thegrid levels as well as the multigrid transfer operators. With the prepost parameter,we may override the default number of Gauss-Seidel pre- and post-smoothing stepsfor the multigrid V-cycle if desired. Explore also the other data in the problemstructure.

% Load the problem dataload Example_CC3D.mat;

problem.prepost = 2;

% Get matrices and sizesM = problem.fem.Mproblem.levels ; % mass matrixL = problem.fem.Lproblem.levels ; % PDE operator matrixny = problem.fem.npproblem.levels ; % degrees of freedom for

state/control/adjoint state

Next we initialize our variables (y, u, p, µ) and enter the PDAS (semismooth Newton)loop, which was described in § 3.1.

% Initialize variablesy = zeros(ny ,1); u = zeros(ny ,1); % same # of dofs (

distributed control)p = zeros(ny ,1); mu = zeros(ny ,1);

% Initialize flags and countersiter = 0; done = 0;

% Do a simple semismooth Newton loopwhile (iter < problem.maxSSN & ~done)

% Determine active and inactive sets% ---------------------------------------------------

45

Page 46: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

46 Appendix A. Software

Aplus = find(mu + problem.cSSN * (u - problem.ub) > 0);Aminus = find(mu + problem.cSSN * (u - problem.ua) < 0);

Next we set up the linear system and right hand side, see (6.11). In a truly highperformance code, the matrices A and B would not be formed of course, but matrix-vector products would be used instead.

% Solve the Newton system by Bramble -Pasciak like cg% ---------------------------------------------------% Set up the active set projectorPA = zeros(ny ,1);PA(Aplus) = 1; PA(Aminus) = 1;PA = spdiags(PA ,0,ny,ny);PA = PA(union(Aplus ,Aminus) ,:);nA = length(Aplus)+length(Aminus);

% Set up some zero matricesZ1 = sparse(ny,ny); Z2 = sparse(ny,nA);

% Set up the saddle point blocksA = [M Z1; Z1 problem.nu*M];B = [L -M; Z2’ PA];

% Set up right hand sidebx = [M*problem.fem.yd; problem.nu*M*problem.fem.ud];bq = zeros(ny ,1);bq(Aplus) = problem.ub;bq(Aminus) = problem.ua;bq = [zeros(ny ,1); PA*bq];

We are now ready to call the pcg solver. We haven’t mentioned the safeguardstrategy yet (see [Herzog and Sachs, 2010, Algorithm 2]) which detects unsuitablevalues of the scaling parameters and simply tries again with corrected values if theywere wrong.

% Prepare to call the Bramble -Pasciak cg solver% with a safeguard strategy , should inappropriate% scaling be detected.scaling_rejected = 1;

% Call the solver% ---------------------------------------------------while (scaling_rejected)

[x,q,flag ,pcgiter] = ...bpcg(A,B,[],bx,bq, ...

problem.atolpcg ,problem.rtolpcg ,problem.maxpcg ,...

@Khatm1 ,[y;u],[p;PA*mu], ...problem ,PA,nA,B,Aplus ,Aminus);

if (flag >= 0) % scaling was okscaling_rejected = 0;

Page 47: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

§ 7. Source Code with Comments 47

elseproblem.sigma = problem.sigma / sqrt (2);problem.tau = problem.tau * sqrt (2);

end

end %while (scaling_rejected)

Once the Newton system was successfully solved, we decompose (x, q) = (y, u, p, µ),check for convergence and finish the semismooth Newton loop.

% Split the solutiony = x(1:ny); u = x(ny+1:end); % state and controlp = q(1:ny); mu = zeros(ny ,1); % adjoint state and

multipliermu(union(Aplus ,Aminus)) = q(ny+1:end);

% Check for convergence% ---------------------------------------------------res = mu - max(0,mu + problem.cSSN * (u - problem.ub)) - min

(0,mu - problem.cSSN * (problem.ua - u));res = sqrt(res ’*M*res);

% Check residual tolerance reachedif ((res <= problem.atolSSN) & (iter >= 0))

done = 1;end

% Finish this semismooth Newton iterationiter = iter + 1;

end %while (iter < problem.maxSSN & ~done)

Our work horse within the semismooth Newton loop is the preconditioned conjugategradient solver (here called bpcg after Bramble and Pasciak), see Algorithm 10.The actual implementation of the preconditioner (Algorithm 11, the call to Khatm1,meaning K−1) is not shown, but it is contained in the available source code. Asdescribed in § 6.4.3, it consists of standard components such as a multigrid V-cycle.function [x,q,flag ,iter] = bpcg(A,B,Bt,bx,bq ,atol ,rtol ,maxit ,

Khatm1 ,x0,q0,varargin)% Initialize iteration counter and variablesiter = 0;x = x0; q = q0;

% Compute the initial residualsx = bx - A*x - B’*q;sq = bq - B*x;

% Compute the preconditioned initial residual[rx,rq] = Khatm1(sx,sq,varargin);

% This is also the initial search directiondx = rx; dq = rq;

Page 48: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

48 Appendix A. Software

% Compute delta = residual norm squared w.r.t. the scalarproduct D

delta0 = scalar_product(A,B,rx,rq,rx,rq,sx,sq);delta = delta0;

% Set convergence flagconverged = 0;

% Enter loopwhile (iter < maxit & ~converged & delta > 0)

% Evaluate e = K * dex = A * dx + B’ * dq;eq = B * dx;

% Compute q = Khat ^-1 * e[qx,qq] = Khatm1(ex,eq,varargin);

% Compute the step length (should be non -negative)alpha = delta / scalar_product(A,B,dx,dq,qx ,qq,ex,eq);

% Update the solutionx = x + alpha * dx;q = q + alpha * dq;

% Update the residual and s to maintain r = Khat ^-1 * s% Re-evaluate the residual in every 50th iterationif (iter > 0 & mod(iter ,50) == 0)

sx = bx - A * x - B’ * q;sq = bq - B * x;[rx,rq] = Khatm1(sx,sq,varargin);

elserx = rx - alpha * qx;rq = rq - alpha * qq;sx = sx - alpha * ex;sq = sq - alpha * eq;

end

% Remember old value of deltadeltaold = delta;

% Compute new value of deltadelta = scalar_product(A,B,rx ,rq,rx,rq,sx,sq);

% Compute new value of betabeta = delta / deltaold;

% Update search direction ddx = rx + beta * dx;dq = rq + beta * dq;

Page 49: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

§ 7. Source Code with Comments 49

% Increase iteration counteriter = iter + 1;

% Check for convergenceconverged = (delta <= atol ^2) | (delta <= rtol^2 * delta0)

;

end %while (iter < maxit & ~converged & delta > 0)

% Set return flagif (delta <= 0)

flag = -1; % incorrect scaling of preconditionerelseif converged

flag = 0; % converged to desired toleranceelse

flag = 1; % max # of iterations reached withoutconvergence

end

end % function bpcg

Finally, we show the implementation of the scalar product, which was described in§ 6.4.2.function val = scalar_product(A,B,tx,tq ,rx,rq,sx,sq)% Evaluate the D scalar product of [tx;tq] and [rx;rq]% where [rx;rq] = Khat ^-1 * [sx;sq] is assumedval = tx’ * (sx - B’*rq - A*rx) + tq ’ * (sq - B*rx);

end %function scalar_product

Page 50: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010
Page 51: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

Bibliography

W. Alt. Local convergence of the Lagrange-Newton method with applications tooptimal control. Control and Cybernetics, 23(1–2):87–105, 1994.

R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout,R. Pozo, C. Romine, and H. Van der Vorst. Templates for the Solution of LinearSystems: Building Blocks for Iterative Methods. SIAM, Philadelphia, 1994.

A. Battermann and M. Heinkenschloss. Preconditioners for Karush-Kuhn-Tuckermatrices arising in the optimal control of distributed systems. In W. Desch,F. Kappel, and K. Kunisch, editors, Optimal Control of Partial Differential Equa-tions, volume 126 of International Series of Numerical Mathematics, pages 15–32.Birkhäuser, Basel, 1998.

A. Battermann and E. Sachs. Block preconditioners for KKT systems in PDE-governed optimal control problems. In K.-H. Hoffmann, R.H.W. Hoppe, andV. Schulz, editors, Fast solution of discretized optimization problems (Berlin,2000), volume 138 of International Series of Numerical Mathematics, pages 1–18. Birkhäuser, Basel, 2001.

M. Benzi, G. Golub, and J. Liesen. Numerical solution of saddle point problems.Acta Numerica, 14:1–137, 2005.

M. Bergounioux, K. Ito, and K. Kunisch. Primal-dual strategy for constrainedoptimal control problems. SIAM Journal on Control and Optimization, 37(4):1176–1194, 1999.

D. P. Bertsekas. Constrained Optimization and Lagrange Multiplier Methods. AthenaScientific, Belmont, 1996.

G. Biros and O. Ghattas. Parallel Lagrange-Newton-Krylov-Schur methods for PDE-constrained optimization. Part I: The Krylov-Schur solver. SIAM Journal onScientific Computing, 27(2):687–713, 2005.

J. Bramble and J. Pasciak. A preconditioning technique for indefinite systems result-ing from mixed approximations of elliptic problems. Mathematics of Computation,50(181):1–17, 1988.

E. Casas. Control of an elliptic problem with pointwise state constraints. SIAM Jour-nal on Control and Optimization, 24(6):1309–1318, 1986. doi: 10.1137/0324078.

E. Casas. Boundary control of semilinear elliptic equations with pointwise stateconstraints. SIAM Journal on Control and Optimization, 31(4):993–1006, 1993.

Yves Chabrillac and J.-P. Crouzeix. Definiteness and semidefiniteness of quadraticforms revisited. Linear Algebra and its Applications, 63:283–292, 1984. doi: http://dx.doi.org/10.1016/0024-3795(84)90150-2.

S. Dollar and A. Wathen. Approximate factorization constraint preconditioners forsaddle-point matrics. SIAM Journal on Scientific Computing, 27(5):1555–1572,2006.

W. Gansterer, J. Schneid, and C Ueberhuber. Mathematical properties of equi-librium systems. Technical Report AURORA TR2003–13, University of Vienna,

51

Page 52: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

52 Chapter 1. Bibliography

2003.V. Girault and P.-A. Raviart. Finite Element Methods for Navier-Stokes Equations.

Springer, 1986.N. Gould and V. Simoncini. Spectral analysis of saddle point matrices with indefinite

leading blocks. SIAM Journal on Matrix Analysis and Applications, 31(3):1152–1171, 2009.

N. Gould, M. Hribar, and J. Nocedal. On the solution of equality constrained qua-dratic problems arising in optimization. SIAM Journal on Scientific Computing,23(4):1375–1394, 2001.

R. Griesse. Stability and sensitivity analysis in optimal control of partial differen-tial equations. Habilitation Thesis, Faculty of Natural Sciences, Karl-FranzensUniversity Graz, 2007.

R. Griesse, N. Metla, and A. Rösch. Convergence analysis of the SQP methodfor nonlinear mixed-constrained elliptic optimal control problems. Journal ofApplied Mathematics and Mechanics, 88(10):776–792, 2008. doi: 10.1002/zamm.200800036.

E. Haber and U. Ascher. Preconditioned all-at-once methods for large sparse pa-rameter estimation problems. Inverse Problems, 17:1847–1864, 2000.

W.W. Hager and H. Zhang. A survey of nonlinear conjugate gradient methods.Pacific journal of Optimization, 2(1):35–58, 2006.

R. Herzog and E. Sachs. Preconditioned conjugate gradient method for optimalcontrol problems with control and state constraints. SIAM Journal on MatrixAnalysis and Applications, 31(5):2291–2317, 2010. doi: 10.1137/090779127.

M. Hintermüller and M. Hinze. Moreau-Yosida regularization in state constrainedelliptic control problems: Error estimates and parameter adjustment. Tech-nical Report SPP1253-08-04, Priority Program 1253, German Research Foun-dation, 2008. URL http://www.am.uni-erlangen.de/home/spp1253/wiki/index.php/Preprints.

M. Hintermüller and K. Kunisch. Feasible and non-interior path-following in con-strained minimization with low multiplier regularity. SIAM Journal on Controland Optimization, 45:1198–1221, 2006. doi: 10.1137/050637480.

M. Hintermüller, K. Ito, and K. Kunisch. The primal-dual active set strategy asa semismooth Newton method. SIAM Journal on Optimization, 13(3):865–888,2002.

M. Hinze and K. Kunisch. Second order methods for optimal control of time-dependent fluid flow. SIAM Journal on Control and Optimization, 40(3):925–946,2001. doi: 10.1137/S0363012999361810.

K. Ito and K. Kunisch. Augmented Lagrangian methods for nonsmooth, convex opti-mization in Hilbert spaces. Nonlinear Analysis: Theory, Methods & Applications,41(5-6):591–616, 2000. doi: 10.1016/S0362-546X(98)00299-5.

K. Ito and K. Kunisch. Semi-smooth Newton methods for state-constrained optimalcontrol problems. Systems and Control Letters, 50:221–228, 2003. doi: 10.1016/S0167-6911(03)00156-7.

K. Ito and K. Kunisch. Lagrange multiplier approach to variational problems andapplications, volume 15 of Advances in Design and Control. Society for Industrialand Applied Mathematics (SIAM), Philadelphia, PA, 2008.

C. Keller, N. Gould, and A. Wathen. Constrained preconditioning for indefinitelinear systems. SIAM Journal on Matrix Analysis and Applications, 21:1300–1317, 2000.

Page 53: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

§ 7. Source Code with Comments 53

C. T. Kelley. Iterative Methods for Optimization, volume 18 of Frontiers in AppliedMathematics. Society for Industrial and Applied Mathematics (SIAM), Philadel-phia, PA, 1999.

K. Krumbiegel and A. Rösch. A virtual control concept for state constrained optimalcontrol problems. Computational Optimization and Applications, 43(2):213–233,2009. doi: 10.1007/s10589-007-9130-0.

C. Meyer, A. Rösch, and F. Tröltzsch. Optimal control of PDEs with regularizedpointwise state constraints. Computational Optimization and Applications, 33(2–3):209–228, 2005. doi: 10.1007/s10589-005-3056-1.

C. Meyer, U. Prüfert, and F. Tröltzsch. On two numerical methods for state-constrained elliptic control problems. Optimization Methods and Software, 22(6):871–899, 2007. doi: 10.1080/10556780701337929.

J. Nocedal and S. Wright. Numerical Optimization. Springer, New York, 1999.J. Nocedal and S. Wright. Numerical Optimization. Springer, New York, second

edition, 2006.A. Quarteroni and A. Valli. Numerical Approximation of Partial Differential Equa-tions. Springer, Berlin, 1994.

S. Robinson. Strongly regular generalized equations. Mathematics of OperationsResearch, 5(1):43–62, 1980.

A. Rösch and K. Kunisch. A primal-dual active set strategy for a general classof constrained optimal control problems. SIAM Journal on Optimization, 13(2):321–334, 2002.

A. Rösch and D. Wachsmuth. Semi-smooth Newton’s method for an optimal controlproblem with control and mixed control-state constraints. Optimization Methodsand Software, 2010. doi: 10.1080/10556780903548257.

T. Rusten and R. Winther. A preconditioned iterative method for saddlepointproblems. SIAM Journal on Matrix Analysis and Applications, 13(3):887–904,1992.

Y. Saad. Iterative Methods for Sparse Linear Systems. SIAM, Philadelphia, secondedition, 2003.

J. Schöberl and W. Zulehner. Symmetric indefinite preconditioners for saddle pointproblems with applications to PDE-constrained optimization. SIAM Journal onMatrix Analysis and Applications, 29(3):752–773, 2007.

J. Shewchuk. An introduction to the conjugate gradient method with-out the agonizing pain. Technical report, School of Computer Science,Carnegie Mellon University, 1994. http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf.

V. Simoncini and D.B. Szyld. Recent computational developments in Krylov sub-space methods for linear systems. Numerical Linear Algebra with Applications,14:1–59, 2007.

Martin Stoll and Andy Wathen. Combination preconditioning and the Bramble-Pasciak+ preconditioner. SIAM Journal on Matrix Analysis and Applications, 30(2):582–608, 2008. ISSN 0895-4798. doi: 10.1137/070688961.

F. Tröltzsch. On the Lagrange-Newton-SQP method for the optimal control ofsemilinear parabolic equations. SIAM Journal on Control and Optimization, 38(1):294–312, 1999.

F. Tröltzsch. Optimal Control of Partial Differential Equations, volume 112 of Grad-uate Studies in Mathematics. American Mathematical Society, Providence, 2010.Theory, methods and applications, Translated from the 2005 German original by

Page 54: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

54 Chapter 1. Bibliography

Jürgen Sprekels.M. Ulbrich. Semismooth Newton methods for operator equations in function spaces.SIAM Journal on Control and Optimization, 13(3):805–842, 2003.

S. Volkwein. Mesh-Independence of an Augmented Lagrangian-SQP Method inHilbert Spaces and Control Problems for the Burgers Equation. PhD thesis, Fach-bereich Mathematik, Technische Universtät Berlin, 1997.

S. Volkwein. Mesh-independence for an Augmented Lagrangian-SQP method inHilbert spaces. SIAM Journal on Control and Optimization, 38:767–785, 2000.

S. Volkwein. Optimal control of laser surface hardening by utilizing a nonlinearprimal-dual active set strategy. Report No. 277, Special Research Center F003,Project Area II: Continuous Optimization and Control, University of Graz &Technical University of Graz, Austria, 2003.

S. Volkwein. Nonlinear conjugate gradient methods for the optimal control of lasersurface hardening. Optimization Methods and Software, 18:179–199, 2004.

A. Wathen and T. Rees. Chebyshev semi-iteration in preconditioning. TechnicalReport NA–08/14, Oxford Universify Computing Laboratory, 2008.

Page 55: Lectures Notes Algorithms and Preconditioning in PDE ...€¦ · Lectures Notes Algorithms and Preconditioning in PDE-Constrained Optimization Prof. Dr. R. Herzog held in July 2010

§ 7. Source Code with Comments 55


Recommended