Efficient Direct Multiple Shooting for Nonlinear Model
Predictive Control on Long Horizons
C. Kirches^{∗,a}, L. Wirsching^a, H. G. Bock^a, J. P. Schloder^a
^a Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Im Neuenheimer Feld 368, 69120 Heidelberg, GERMANY
Abstract
We address direct multiple shooting based algorithms for nonlinear model predictive control, with a focus on problems with long prediction horizons. We describe different efficient multiple shooting variants with a computational effort that is only linear in the horizon length. Proposed techniques comprise structure exploiting linear algebra on the one hand, and approximation of derivative information in an adjoint Sequential Quadratic Programming method on the other hand. For explicit one-step methods for ordinary differential equations we address the issue of consistent and fast generation of both forward and adjoint derivatives of dynamic process models according to the principle of Internal Numerical Differentiation. We discuss the applicability of the proposed methods using three benchmark problems. These have recently been addressed in the literature and serve to evaluate the relative performance of each of the proposed methods for both off-line optimal control and on-line nonlinear model predictive control. Throughout, we compare against results published for a recently proposed collocation approach based on finite elements.
Key words: nonlinear model predictive control, direct and simultaneous
methods for optimal control, sensitivity generation, benchmarks
∗Corresponding author: Tel.: +49 (6221) 54–8895; Fax: +49 (6221) 54–5444
Email address: [email protected] (C. Kirches)
Preprint submitted to Journal of Process Control, July 4, 2011
1. Introduction
In Nonlinear Model Predictive Control (NMPC), one repeatedly computes solutions to optimal control problems (OCPs) on a finite prediction horizon in order to generate feedback controls for dynamical processes. Since these optimal control problems are approximations to an infinite-horizon counterpart, the choice of the prediction horizon length is crucial. For closed-loop performance and stability, long horizons are preferable. Moreover, fast systems often require a fine control discretization of the prediction horizon for sufficient controllability. Classic numerical methods, however, frequently show a cubic runtime complexity in the horizon length or in the discretization granularity. This effectively limits the applicability of such methods to short horizons and coarse discretizations.
1.1. Contributions
In this paper we reconsider three case studies for NMPC on long horizons that have been presented in [1, 2]. For each of the case studies we compare the performance of a full-space and an adjoint Sequential Quadratic Programming (SQP) method, as well as of three different approaches to structure exploitation for the multiple shooting parameterization. For the adjoint SQP method we propose a matrix-free condensing step with reduced runtime complexity. We extend [1], wherein model predictive control schemes are not addressed, by investigating the performance and behavior of all presented algorithms in an NMPC context for typical set point change scenarios. Here we conduct a detailed runtime analysis and give recommendations for the choice of algorithms depending on the characteristics of the dynamic process under consideration. In [1] a collocation scheme based on finite elements (CFE) is proposed for the computation of sensitivities. Addressing this, we review the principle of Internal Numerical Differentiation (IND) due to Bock [3] and apply it to automatically, precisely, and efficiently compute derivatives of a discretization scheme for the process dynamics. Using the test cases we compare the precision and computational effort of IND to that reported for the CFE scheme.
2. The Direct Multiple Shooting Method for Optimal Control
In this section we give a brief presentation of the direct multiple shooting method for optimal control [4] and of the solution of the arising structured nonlinear programming problems (NLPs) in an SQP context.
2.1. Optimal control problem formulation
We consider the following class of OCPs, which typically arise in NMPC on the fixed and finite prediction horizon $[t_0, t_0 + T]$,
$$\begin{aligned}
\min_{x(\cdot),\,u(\cdot)} \quad & \int_{t_0}^{t_0+T} L(x(t), u(t))\,\mathrm{d}t + E(x(t_0 + T)) && \text{(1a)}\\
\text{s.t.} \quad & \dot{x}(t) = f(x(t), u(t)), \quad t \in [t_0, t_0 + T], && \text{(1b)}\\
& x(t_0) = x_0, && \text{(1c)}\\
& 0 \le r(x(t), u(t)), \quad t \in [t_0, t_0 + T], && \text{(1d)}\\
& 0 \le h(x(t_0 + T)). && \text{(1e)}
\end{aligned}$$
We denote by $x(t) \in \mathbb{R}^{n^x}$ the state vector and by $u(t) \in \mathbb{R}^{n^u}$ the vector of continuous controls of the dynamic process.
The state trajectory is determined from the initial value problem (IVP) (1b, 1c), where $x_0$ is the current state of the process and $f(x, u)$ describes a model of the dynamic process. For clarity of exposition we concentrate on ODE process models in the following, and refer to [5] for the treatment of models described by differential-algebraic equations (DAEs). States and controls may be subject to constraints (1d) and the final state may be restricted by an end-point constraint (1e).
The objective function is of Bolza type with a Lagrange term $L(x, u)$ and a Mayer term $E(x(t_0 + T))$. This formulation also includes least-squares objective functions of the form $L(x, u) = \|l(x, u)\|_2^2$, where $l$ is the least-squares residual vector. A typical example is the tracking-type objective
$$L(x, u) = (x - \bar{x})^T Q(t)\,(x - \bar{x}) + (u - \bar{u})^T R(t)\,(u - \bar{u}), \tag{2}$$
where $\bar{x}$ and $\bar{u}$ are reference trajectories for $x$ and $u$, and $Q(t)$ and $R(t)$ are suitable positive definite weighting matrices. A typical choice for the Mayer term is the quadratic cost
$$E(x(t_0 + T)) = \big(x(t_0 + T) - \bar{x}(t_0 + T)\big)^T P\, \big(x(t_0 + T) - \bar{x}(t_0 + T)\big), \tag{3}$$
with a suitable weighting matrix $P$. The Mayer term can be used, typically in conjunction with the end-point constraint $h(x(t_0 + T))$, to design feedback control schemes that guarantee stability of the closed-loop system, see [6, 7].
The problem may also depend on time-independent model parameters which are not considered as degrees of freedom for the optimization. These parameters may e.g. define set-points for tracking-type objectives, or may be fixed design parameters of the process model. In practice, it may happen that some of the parameters change during the runtime of the process, and this gives rise to the important area of on-line state and parameter estimation (see e.g. [8, 9, 10]).
2.2. Discretization of Controls and Parameterization of States
In this work, algorithms for the efficient numerical solution of problem (1) are based on the direct multiple shooting method for optimal control, first described by [11, 4] and extended in a series of subsequent works, cf. [12]. For a suitable partition of the prediction horizon $[t_0, t_0 + T] \subset \mathbb{R}$ into $N$ shooting intervals $[t_i, t_{i+1}]$, $0 \le i < N$, we choose the control discretization
$$u(t) = \varphi_i(t, q_i), \quad t \in [t_i, t_{i+1}]. \tag{4}$$
In NMPC, the usual choice for the basis functions $\varphi_i$ is piecewise constant controls $\varphi_i(t, q_i) = q_i \in \mathbb{R}^{n^q}$ for $t \in [t_i, t_{i+1}]$. In contrast to single shooting we also apply a state parameterization by introducing additional initial values $s_i$ for computing the state trajectories on the shooting intervals,
$$\dot{x}_i(t) = f(x_i(t), \varphi_i(t, q_i)), \quad x_i(t_i) = s_i, \quad t \in [t_i, t_{i+1}], \quad 0 \le i < N. \tag{5}$$
Continuity of the optimal trajectory on the whole horizon $[t_0, t_0 + T]$ is ensured by additional matching conditions
$$s_{i+1} = x_i(t_{i+1}; t_i, s_i, q_i), \quad 0 \le i < N, \tag{6}$$
wherein $x_i(t; t_i, s_i, q_i)$ denotes the solution of the IVP (5) depending on $s_i$ and $q_i$. One particular advantage of this approach is that it allows for the use of adaptive integrators for function and sensitivity evaluation, cf. Section 3. Path constraints (1d) are usually imposed on the shooting grid $\{t_i\}_{0 \le i \le N}$ only, but strict feasibility in the interior could be ensured if needed, cf. [13].
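The role of the matching conditions (6) can be illustrated by a minimal Python sketch. The scalar test model x' = -x + u, the fixed-step classical Runge-Kutta integrator, and all function names below are assumptions made for illustration only and are not taken from the paper. The sketch evaluates the residuals of (6) for a discontinuous initial guess of the shooting node values and for node values sampled from the closed-form solution.

```python
import math

def rk4_integrate(f, x, u, t0, t1, steps=20):
    """Integrate the scalar ODE x' = f(x, u) from t0 to t1 with fixed-step classical RK4."""
    h = (t1 - t0) / steps
    for _ in range(steps):
        k1 = f(x, u)
        k2 = f(x + 0.5 * h * k1, u)
        k3 = f(x + 0.5 * h * k2, u)
        k4 = f(x + h * k3, u)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x

def matching_residuals(f, grid, s, q):
    """Residuals x_i(t_{i+1}; t_i, s_i, q_i) - s_{i+1} of the matching conditions (6)."""
    return [rk4_integrate(f, s[i], q[i], grid[i], grid[i + 1]) - s[i + 1]
            for i in range(len(q))]

# Hypothetical test model (not from the paper): x' = -x + u.
f = lambda x, u: -x + u

N = 4
grid = [i / N for i in range(N + 1)]
q = [1.0] * N                      # piecewise constant controls q_i

# Discontinuous initial guess: all node values s_i = 0.
s_guess = [0.0] * (N + 1)
res_guess = matching_residuals(f, grid, s_guess, q)

# Node values on the continuous trajectory x(t) = 1 - exp(-t) for x(0) = 0, u = 1.
s_exact = [1.0 - math.exp(-t) for t in grid]
res_exact = matching_residuals(f, grid, s_exact, q)
```

For the initial guess the residuals are nonzero, which is admissible during the iterations of a simultaneous method; they vanish (up to integration error) only for node values on a continuous trajectory.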
2.3. Nonlinear Program
From the multiple shooting discretization we obtain the NLP
$$\begin{aligned}
\min_{s,\,q} \quad & \sum_{i=0}^{N-1} L_i(s_i, q_i) + E(s_N) && \text{(7a)}\\
\text{s.t.} \quad & 0 = x_0 - s_0, && \text{(7b)}\\
& 0 = x_i(t_{i+1}; t_i, s_i, q_i) - s_{i+1}, \quad 0 \le i < N, && \text{(7c)}\\
& 0 \le r(s_i, \varphi_i(t_i, q_i)), \quad 0 \le i < N, && \text{(7d)}\\
& 0 \le h(s_N), && \text{(7e)}
\end{aligned}$$
wherein $L_i(s_i, q_i) = \int_{t_i}^{t_{i+1}} L(x(t), \varphi_i(t, q_i))\,\mathrm{d}t$. This NLP shows a parametric dependence on the initial value $x_0$ and, with $\Lambda = (I, 0, 0, \ldots)^T$ and $w = (q, s)$ where $q = (q_0, \ldots, q_{N-1})$ and $s = (s_0, \ldots, s_N)$, can be written in the more generic form
$$\min_{w} \ \phi(w) \quad \text{s.t.} \quad c(w) + \Lambda x_0 = 0, \quad d(w) \ge 0. \tag{8}$$
2.4. Sequential Quadratic Programming
We solve NLP (8) with a Newton-type method. Starting with an initial guess $(w^0, \lambda^0, \mu^0)$, a full step SQP iteration (see e.g. [14]) is performed by solving the quadratic programming problem (QP)
$$\begin{aligned}
\min_{\Delta w} \quad & \tfrac{1}{2} \Delta w^T B^k \Delta w + \Delta w^T b^k && \text{(9a)}\\
\text{s.t.} \quad & 0 = C^k \Delta w + c^k + \Lambda x_0^k, && \text{(9b)}\\
& 0 \le D^k \Delta w + d^k && \text{(9c)}
\end{aligned}$$
for $(\Delta w, \lambda_{\mathrm{QP}}, \mu_{\mathrm{QP}})$ and iterating according to
$$w^{k+1} = w^k + \Delta w, \quad \lambda^{k+1} = \lambda_{\mathrm{QP}}, \quad \mu^{k+1} = \mu_{\mathrm{QP}}. \tag{10}$$
In (9), $B^k := B(w^k)$ denotes an approximation of the Hessian of the Lagrangian of (7), $b^k := \nabla \phi(w^k)$ is the objective gradient, and (9b, 9c) are linearizations of the equality constraints $c(\cdot)$ and inequality constraints $d(\cdot)$ at the current iterate $w^k$, respectively. Various structural features of QP (9), such as a block diagonal Hessian and block bidiagonal Jacobians, can be exploited, see Section 5.
2.5. Adjoint SQP
In subproblem (9) it may be desirable to use approximations $\hat{C}^k$, $\hat{D}^k$ instead of the exact Jacobians $C^k$, $D^k$. In this case, one has to ensure correct identification of the true active set of the solution of (9). In [48, 2] it has been shown that this is possible by using the so-called modified gradient
$$\hat{b}^k := b^k + (\hat{C}^k - C^k)^T \lambda^k + (\hat{D}^k - D^k)^T \mu^k \tag{11}$$
in place of $b^k$. The key motivation here is that the product ${C^k}^T \lambda^k$ can be computed as a cheap adjoint derivative of the right-hand side of (7c), without having to compute the expensive full Jacobian $C^k$. The same of course holds true for the right-hand sides of (7d, 7e) and ${D^k}^T \mu^k$. For (7c) this involves the computation of an adjoint sensitivity of the discretized IVP (1b, 1c). Applicable principles are addressed in Section 3.
3. Solutions and Sensitivities of the Dynamic Process
The formulation of the QP (9) crucially relies on the availability of first order derivative information. In this section we review the fundamental principle of Internal Numerical Differentiation, pioneered by [3]. We discuss its use to automatically, efficiently, and precisely compute derivatives of a discretization scheme for (1b), as shown in [3, 16], for the class of explicit Runge-Kutta methods [17, 18, 19]. Discussions of the more involved case of linear multi-step methods can be found e.g. in [20, 21] and in [22, 23] for the adjoint case.
3.1. The Principle of Internal Numerical Differentiation
In optimization of dynamic processes by direct and simultaneous methods like collocation or direct multiple shooting, also referred to as "first discretize, then optimize" approaches, it is paramount that all employed derivatives are consistent with the discretization, i.e., one is interested in an exact derivative of the discretized problem.
In order to achieve this goal in direct methods for optimal control, it is vitally important to use the same discretization scheme for both the computation of the nominal solution trajectory and the computation of sensitivities, i.e., to ensure identical choices for all adaptively chosen components of the discretization scheme during both computations. These components may comprise e.g. the choice of step sizes in error-controlled methods, of orders in variable-order methods, of Jacobians and iteration counts in implicit methods, or of pivoting decisions in factorizations. This principle, referred to as Internal Numerical Differentiation [3], extends also to the process model, i.e., the ODE's right hand side function $f$ in (1b).
3.2. An Exemplary Adaptive Discretization Scheme
To illustrate this principle, we consider an explicit Runge-Kutta (RK) scheme $\Phi$ with $s \ge 1$ stages and Butcher tableau $\alpha, \gamma \in \mathbb{R}^s$, $\beta \in \mathbb{R}^{s \times s}$. This scheme computes an approximation $\eta(\tau + h) = \eta + h\,\Phi(\tau, \eta; h)$ to the solution $x(\tau + h; \tau, \eta)$ of the IVP
$$\dot{x}(t) = f(t, x(t)), \quad x(\tau) = \eta, \quad t \in [\tau, \tau + h], \tag{12}$$
using the step function
$$\Phi(\tau, \eta; h) := \sum_{i=1}^{s} \gamma_i k_i, \qquad k_i := f\Big(\tau + h \alpha_i,\ \eta + h \sum_{j=1}^{i-1} \beta_{ij} k_j\Big), \tag{13}$$
wherein we dropped the control argument from $f$. We assume that some error control mechanism, given an initial value $\eta_0$, in iterations $k = 0, \ldots, K - 1$ adaptively chooses a step size $h_k > 0$ to compute the approximation $\eta_{k+1} = \eta(\tau_{k+1})$ to $x(\tau_{k+1})$ at time $\tau_{k+1} = \tau_k + h_k$. Details can be found e.g. in [19, 24] and many other related works.
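The step function (13) translates almost literally into code. The following Python sketch is an illustration under stated assumptions: the state is scalar, the tableau is that of the classical four-stage RK scheme (not the Fehlberg scheme used in the case studies), the test equation x' = -x is hypothetical, and the list `step_sizes` merely stands in for the output of an error controller, which is not implemented here.

```python
import math

def rk_step_function(f, tau, eta, h, alpha, beta, gamma):
    """Step function Phi(tau, eta; h) = sum_i gamma_i k_i of an explicit RK scheme, cf. (13)."""
    k = []
    for i in range(len(gamma)):
        # Stage value eta + h * sum_{j<i} beta_ij k_j uses only earlier stages (explicit scheme).
        y = eta + h * sum(beta[i][j] * k[j] for j in range(i))
        k.append(f(tau + h * alpha[i], y))
    return sum(g * ki for g, ki in zip(gamma, k))

# Butcher tableau of the classical RK4 scheme (assumption: chosen for brevity).
alpha = [0.0, 0.5, 0.5, 1.0]
beta = [[0.0, 0.0, 0.0, 0.0],
        [0.5, 0.0, 0.0, 0.0],
        [0.0, 0.5, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0]]
gamma = [1.0 / 6.0, 1.0 / 3.0, 1.0 / 3.0, 1.0 / 6.0]

# Hypothetical test equation x' = -x, x(0) = 1.
f = lambda t, x: -x

tau, eta = 0.0, 1.0
step_sizes = [0.1, 0.2, 0.2, 0.5]   # stand-in for adaptively chosen h_k
for h in step_sizes:
    eta = eta + h * rk_step_function(f, tau, eta, h, alpha, beta, gamma)
    tau += h
```

After the loop, `eta` approximates x(1) = exp(-1) on the non-uniform grid defined by `step_sizes`.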
3.3. A Consistent Forward Mode Discretization Scheme
A discretization scheme to compute a forward directional sensitivity
$$x^d(t_b; t_a) = \frac{\mathrm{d}x(t_b)}{\mathrm{d}x(t_a)} \cdot d, \qquad [t_a, t_b] \subseteq [t_0, t_0 + T], \quad d \in \mathbb{R}^{n^x},\ d \neq 0 \tag{14}$$
that is consistent with (13) can be obtained by forward mode differentiation of (13) with respect to $\eta$ and reads
$$\Phi^d(\tau, \eta, \eta^d; h) := \sum_{i=1}^{s} \gamma_i k_i^d, \qquad k_i^d := \frac{\partial f}{\partial x}(\cdot) \cdot \Big(\eta^d + h \sum_{j=1}^{i-1} \beta_{ij} k_j^d\Big), \tag{15}$$
where $\eta^d(\tau)$ is the approximation of $x^d(\tau; t_a)$, and $\eta^d(t_a) = d$. Arguments of $\partial f / \partial x$ have been dropped for brevity of notation and are the same as in (13). Hence, the forward sensitivity scheme (15) is best evaluated simultaneously with the forward simulation scheme (13). Clearly, both schemes are consistent: step sizes and evaluation points coincide, and forward directions are propagated by $\Phi^d$ just like state approximations by $\Phi$. Hence the IND principle is satisfied here. We remark that solving the variational system
$$\dot{x}^d(t) = \frac{\partial f}{\partial x}(t, x)\, x^d(t), \quad x^d(t_a) = d \tag{16}$$
on $[t_a, t_b]$ conforms to the IND principle only if the same RK scheme and the same sequence $\{h_k\}$ of step sizes as for the nominal solution is chosen, and if the approximations $\eta_k$ are reused to evaluate $\partial f / \partial x$.
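The consistency property can be checked numerically on a small example. The sketch below is an illustration under assumptions not taken from the paper: Heun's two-stage method stands in for the RK scheme, the scalar right hand side f(x) = -x^2 is hypothetical, and the step size sequence is a frozen stand-in for the output of an error controller. The direction is propagated according to (15) alongside the nominal solution, and the result is compared to a central finite difference of the discretization scheme evaluated on the same frozen step sizes.

```python
def f(x):
    """Hypothetical scalar right hand side."""
    return -x * x

def dfdx(x):
    """Derivative of f with respect to x."""
    return -2.0 * x

def heun(eta, steps, eta_d=None):
    """Heun's method on a fixed step sequence; optionally propagates a direction as in (15)."""
    for h in steps:
        k1 = f(eta)
        y2 = eta + h * k1                          # second stage value (beta_21 = 1)
        k2 = f(y2)
        if eta_d is not None:
            # Stage derivatives use the same evaluation points as the nominal stages.
            k1d = dfdx(eta) * eta_d
            k2d = dfdx(y2) * (eta_d + h * k1d)
            eta_d = eta_d + 0.5 * h * (k1d + k2d)
        eta = eta + 0.5 * h * (k1 + k2)
    return eta, eta_d

steps = [0.05, 0.1, 0.1, 0.25]     # frozen step sizes h_k
eta0 = 1.0

# Nominal solution and consistent forward sensitivity d eta_K / d eta_0.
eta_K, sens = heun(eta0, steps, eta_d=1.0)

# Finite difference of the discretization scheme on the same step sizes.
eps = 1e-6
fd = (heun(eta0 + eps, steps)[0] - heun(eta0 - eps, steps)[0]) / (2.0 * eps)
```

Both values agree up to the finite difference error, because `sens` is the derivative of the discrete map and not merely an approximation of the continuous sensitivity.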
3.4. A Consistent Adjoint Mode Discretization Scheme
As discussed in Section 2.5, the availability of adjoint derivative information may bring considerable benefits to Newton-type methods. A discretization scheme to compute an adjoint directional ODE sensitivity
$$\lambda^x(t_b; t_a) = \lambda^T \cdot \frac{\mathrm{d}x(t_b)}{\mathrm{d}x(t_a)}, \qquad [t_a, t_b] \subseteq [t_0, t_0 + T], \quad \lambda \in \mathbb{R}^{n^x},\ \lambda \neq 0 \tag{17}$$
that is consistent with (13) can be obtained by applying the reverse mode of automatic differentiation (see e.g. [25]) with respect to $\eta$ to (13),
$$\lambda^{\Phi}(\tau, \eta, \lambda^{\eta}; h) := -\sum_{i=1}^{s} (\lambda^{k}_i)^T \frac{\partial f}{\partial x}(\cdot), \qquad \lambda^{k}_i := \gamma_i \lambda^{\eta} + h \sum_{j=i+1}^{s} \beta_{ji}\, (\lambda^{k}_j)^T \frac{\partial f}{\partial x}(\cdot).$$
This adjoint scheme starts with $\lambda^{\eta}_K := \lambda$ and proceeds for $k = K, \ldots, 1$ backwards in time with steps $\lambda^{\eta}_{k-1} := \lambda^{\eta}_k - h_{k-1} \cdot \lambda^{\Phi}(\tau_{k-1}, \eta_{k-1}, \lambda^{\eta}_k; h_{k-1})$. Afterwards, $\lambda^{\eta}_0$ is the approximation of (17) and the adjoint directional derivative of the discretization scheme. We remark that, in some similarity to the forward case, solving the adjoint system
$$\dot{\lambda}^x(t)^T = -\lambda^x(t)^T \frac{\partial f}{\partial x}(t, x), \quad \lambda^x(t_b) = \lambda \tag{18}$$
on $[t_a, t_b]$ backwards in time using the RK scheme that is adjoint to the one used for the forward simulation, i.e., the RK scheme with transposed Butcher matrix $\beta$, creates a consistent scheme conforming to IND if again all adaptive components remain unaltered, cf. [16, 2].
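A reverse sweep of this kind can be sketched for a small system. The following Python example is a hand-written reverse mode differentiation of Heun's two-stage method; the pendulum-like right hand side, the frozen step sizes, and all names are assumptions for illustration, and the bookkeeping of signs and intermediate adjoints is organized per stage and need not coincide with the notation of the scheme above. The forward pass records the evaluation points on a tape, the adjoint sweep reuses them backwards in time, and the result is checked against the product of the adjoint direction with the forward sensitivity matrix.

```python
import math

def f(x):
    """Hypothetical two-state right hand side (pendulum-like)."""
    return [x[1], -math.sin(x[0])]

def jac(x):
    """Jacobian of f with respect to x."""
    return [[0.0, 1.0], [-math.cos(x[0]), 0.0]]

def jac_vec(J, v):
    return [J[0][0] * v[0] + J[0][1] * v[1], J[1][0] * v[0] + J[1][1] * v[1]]

def jacT_vec(J, v):
    return [J[0][0] * v[0] + J[1][0] * v[1], J[0][1] * v[0] + J[1][1] * v[1]]

def axpy(a, x, y):
    """Return a*x + y for small lists."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def forward(eta, steps, direction=None):
    """Heun's method; records (h, eta, y2) per step and optionally propagates a direction."""
    tape, d = [], direction
    for h in steps:
        k1 = f(eta)
        y2 = axpy(h, k1, eta)
        k2 = f(y2)
        tape.append((h, eta, y2))
        if d is not None:
            k1d = jac_vec(jac(eta), d)
            k2d = jac_vec(jac(y2), axpy(h, k1d, d))
            d = axpy(0.5 * h, axpy(1.0, k1d, k2d), d)
        eta = axpy(0.5 * h, axpy(1.0, k1, k2), eta)
    return eta, d, tape

def adjoint_sweep(lam, tape):
    """Reverse sweep over the recorded steps, reusing step sizes and evaluation points."""
    for h, eta, y2 in reversed(tape):
        k2_bar = [0.5 * h * l for l in lam]             # adjoint of stage k2
        y2_bar = jacT_vec(jac(y2), k2_bar)              # adjoint of stage value y2
        k1_bar = axpy(h, y2_bar, [0.5 * h * l for l in lam])
        lam = axpy(1.0, jacT_vec(jac(eta), k1_bar), axpy(1.0, y2_bar, lam))
    return lam

steps = [0.05, 0.1, 0.2, 0.15]     # frozen step sizes h_k
eta0 = [1.0, 0.0]
lam = [0.3, -0.7]                  # adjoint direction

_, col1, tape = forward(eta0, steps, [1.0, 0.0])
_, col2, _ = forward(eta0, steps, [0.0, 1.0])
adj = adjoint_sweep(lam, tape)
fwd = [lam[0] * col1[0] + lam[1] * col1[1], lam[0] * col2[0] + lam[1] * col2[1]]
```

The adjoint sweep needs one backward pass regardless of the state dimension, whereas the forward check above needs one pass per unit direction. This is the cost advantage exploited by the adjoint SQP method.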
3.5. Remarks on Efficiency and Accuracy
The IND principle is specifically designed to allow for larger integrator steps $h_k$ to be made, hence allowing for faster computation of the ODE system's solution and sensitivities, incurring no compromise in accuracy (see Section 6). By invoking automatic differentiation, a sensitivity computation scheme consistent in the sense of the IND principle always guarantees that the obtained derivatives are, up to rounding errors, those of the chosen discretization scheme. The actual choice of the discretization scheme is liberated of any derivative accuracy concerns and should rather be made with regard to e.g. the nonlinearity, stiffness, and numerical stability. For stiff systems, implicit methods such as [23, 21] are required. Applicable IND techniques can be found e.g. in [23].
4. Techniques for Nonlinear Model Predictive Control
We now address the issue of applying either of the two SQP methods of Sections 2.4, 2.5 in an on-line NMPC setting.
4.1. Initial-Value Embedding
The key to an efficient numerical algorithm for NMPC is to reuse information from the last problem to initialize the new problem. This is due to the fact that subsequent problems differ only in the parameter $x_0$ of the linear embedding $\Lambda$ in (8). If model predictions are sufficiently close to real process behavior, it is reasonable to expect that the information contained in the previous problem's solution is a good initial guess very close to the solution of the new subproblem.
In [26] and related works it has been proposed to initialize the current problem with the full solution of the previous optimization run, in particular control and state variables as well as multipliers. In doing so, the value of $s_0$ will in general not be the value of the current state $x_0$. We can, however, guarantee that $s_0$ attains the value of $x_0$ already after the first full Newton-type step by explicitly including the linear initial value constraint (7b), as done in the QP formulation (9). We refer to this way of initializing the current problem as Initial Value Embedding (IVE).
4.2. Real-Time Iterations
Applying the Newton-type step to the new problem initialized by the IVE yields a tangential predictor of the solution, i.e., a first order Taylor approximation, even in the presence of an active set change. This motivates the idea of real-time iterations, which perform only one Newton-type iteration per NMPC sample, and it is at the same time the main reason for preferring active set methods over interior-point techniques. We refer to [27] for a detailed survey on the topic of initial value embeddings and the resulting first order tangential predictors.
4.3. Three Phases of a Real-Time Iteration
Using the IVE also has an important algorithmic advantage. We can evaluate all derivatives and all function values except the initial value constraint without knowledge of the current state $x_0$. Consequently, we can pre-solve a major part of QP (9). This allows us to separate each real-time iteration into the following three phases.
Preparation. All functions and derivatives that do not require knowledge of $x_0$ are evaluated using the iterate $(w^k, \lambda^k, \mu^k)$ of the previous step. ODE solution and sensitivity computation according to the IND principle take place in this phase. In addition, sparsity analysis, structure exploitation, and matrix factorizations happen here, see Section 5. Note that the preparation phase of the new problem always takes place one sampling period ahead. The preparation phase can be interpreted as setting up the tangential predictor as a piecewise linear mapping $x_0 \mapsto \Delta q_0^k$.
Feedback. As soon as $x_0$ is available, the QP (9) is solved for $\Delta q_0^k$ and the feedback control $\varphi_0(t_k, q_0^k + \Delta q_0^k)$ is given to the process. Hence, the feedback delay reduces to the remaining solution time of the QP after preparation. The affine-linear dependence of this QP on $x_0$ via $\Lambda$ can further be exploited as described in Section 4.4.
Transition. The full variables vector $\Delta w^k = (\Delta q^k, \Delta s^k)$ is computed. The SQP step (10) is performed to obtain the new set $(w^{k+1}, \lambda^{k+1}, \mu^{k+1})$ of NLP variables.
4.4. Parametric Quadratic Programming
Both the structured NLP (7) and the QP subproblems (9) show a linear parametric dependence on $x_0$. This is favorably exploited by parametric active set methods, cf. [28, 29, 30]. The idea here is to introduce an affine-linear homotopy in a scalar parameter $\tau \in [0, 1] \subset \mathbb{R}$ from the QP that was solved in iteration $k - 1$ to the QP to be solved in iteration $k$:
$$\begin{aligned}
\min_{\Delta w} \quad & \tfrac{1}{2} \Delta w^T B \Delta w + b^T(\tau) \Delta w && \text{(19a)}\\
\text{s.t.} \quad & 0 = C \Delta w + c(\tau) + \Lambda x_0(\tau), && \text{(19b)}\\
& 0 \le D \Delta w + d(\tau), && \text{(19c)}
\end{aligned}$$
with initial values $x_0(0) = x_0^{k-1}$, $x_0(1) = x_0^k$. The transition from the old to the new QP data is realized by gradient and constraint right hand sides $b \in H^{n^w}$, $c \in H^{n^c}$, $d \in H^{n^d}$ that are affine in $\tau$,
$$H^n = \{\phi : \mathbb{R} \to \mathbb{R}^n \mid \phi(\tau) = (1 - \tau)\phi^{k-1} + \tau \phi^k,\ \tau \in [0, 1]\}. \tag{20a}$$
Moreover, from the optimality conditions of QP (19) in $\tau = 0$ and $\tau = 1$ it can be seen that an update of the QP's matrices $B$, $C$, $D$ is also possible without having to introduce a matrix-valued homotopy, see e.g. [38].
Using this approach to compute the SQP algorithm's steps has multiple advantages. First, a phase one for finding a feasible point of the QP is unnecessary, as we can start the homotopy in a trivial QP with zero vectors and known optimal solution. Second, we can monitor the progress of solving the QP using the distance to the homotopy end. Intermediate iterates are physically meaningful, and are optimal for a known QP of the homotopy. Thus, intermediate control feedback can be given during the ongoing solution process. Furthermore, this property of the intermediate iterates also gives rise to on-line algorithms which fix the maximum number of active set changes to meet hard real-time constraints, cf. [29, 30].
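The homotopy idea can be made concrete on a deliberately tiny example. The sketch below is a toy illustration and not the parametric active set algorithm of [28, 29, 30]: it considers a scalar QP with one inequality constraint, interpolates the vectors b and d affinely in tau as in (20a), and computes the value of tau at which the active set changes. All names and data are hypothetical.

```python
def affine(p_old, p_new, tau):
    """Affine homotopy (1 - tau) * p_old + tau * p_new, cf. (20a)."""
    return (1.0 - tau) * p_old + tau * p_new

def solve_scalar_qp(B, b, d):
    """Solve min 0.5*B*x^2 + b*x  s.t. 0 <= x + d for scalar data with B > 0.

    Returns the solution and a flag telling whether the constraint is active."""
    x_free = -b / B
    if x_free + d >= 0.0:
        return x_free, False
    return -d, True

def active_set_change(B, b_old, b_new, d_old, d_new):
    """Return tau in (0, 1) where the unconstrained minimizer hits the bound, or None."""
    g0 = -b_old / B + d_old        # constraint value of the free minimizer at tau = 0
    g1 = -b_new / B + d_new        # constraint value of the free minimizer at tau = 1
    if g0 * g1 >= 0.0:
        return None                # no sign change along the homotopy
    return g0 / (g0 - g1)          # root of the affine function (1 - tau)*g0 + tau*g1

# Hypothetical data of the old (tau = 0) and the new (tau = 1) QP.
B = 2.0
b_old, b_new = -2.0, 4.0
d_old, d_new = 0.0, 0.0

tau_star = active_set_change(B, b_old, b_new, d_old, d_new)
x_start, active_start = solve_scalar_qp(B, affine(b_old, b_new, 0.0), affine(d_old, d_new, 0.0))
x_star, _ = solve_scalar_qp(B, affine(b_old, b_new, tau_star), affine(d_old, d_new, tau_star))
x_end, active_end = solve_scalar_qp(B, affine(b_old, b_new, 1.0), affine(d_old, d_new, 1.0))
```

Between tau = 0 and `tau_star` the solution moves affinely with an inactive constraint, and from `tau_star` to tau = 1 the constraint is active. Every intermediate point is optimal for the QP at that value of tau, which is the property that makes early termination meaningful.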
5. Variants of Structure Exploitation
The QPs (9, 19) exhibit a highly ordered block structure that is due to the direct multiple shooting discretization of problem (1). As efficient numerical algorithms will have to exploit this structure, we describe in this section three algorithmic variants that serve this purpose given different characteristics of the problem.
5.1. Generic Sparse Solvers
If QP (9) or (19) is to be solved directly in an active set method, factors of the structured symmetric indefinite KKT system must be obtained after every active set change. A straightforward approach is to use a sparse $LBL^T$ decomposition as available e.g. with the highly efficient implementation MA57 [31] from HSL. An alternative that ignores symmetry is an LU decomposition (e.g. through UMFPACK [32]) as proposed in [33]. Sufficient sparsity of the QP blocks is required for this approach to be efficient, though, and the observable runtime complexity crucially depends on this. In addition, matrix update techniques that provide for cheap recovery of the factorization after an active set change are not available in general.
5.2. Block Structured Factorization
An alternative approach much more tailored to the particular direct multiple shooting structures is a symmetric indefinite block factorization as proposed in [34, 35]. An extensive discussion of the family of such factorizations is available in [36]. In [37] a similar idea has been used in a dual active set method. In [38] the application of such approaches to mixed-integer optimal control problems is discussed, and suitable matrix update techniques are derived. Achievable runtime complexities are in general linear in the number $N$ of shooting intervals, and cubic in $n^x$ and $n^q$, respectively. Update techniques reduce the latter effort to quadratic complexity [37, 39].
5.3. Condensing
The fundamental idea of all so-called condensing algorithms is to pre-process the block structured QP (9) into a smaller but densely populated one before solving it by a dense active set method. Indeed, this is the only structure exploiting approach considered for comparison of run times in [1], and we briefly present it in the following.
The direct multiple shooting discretization creates a particular block structure of the constraint matrix $C^k$ in (9) which is deduced from (7),
$$C = \begin{bmatrix} C_1 & C_2 \end{bmatrix}, \qquad
C_1 := \begin{bmatrix} 0 & & \\ G^q_0 & & \\ & \ddots & \\ & & G^q_{N-1} \end{bmatrix}, \qquad
C_2 := \begin{bmatrix} -I & & & \\ G^s_0 & -I & & \\ & \ddots & \ddots & \\ & & G^s_{N-1} & -I \end{bmatrix}. \tag{21}$$
Herein $G^s_i$ denotes the state sensitivity $\mathrm{d}x_i(t_{i+1}; t_i, s_i, q_i)/\mathrm{d}s_i$ and $G^q_i$ denotes the parametric sensitivity $\mathrm{d}x_i(t_{i+1}; t_i, s_i, q_i)/\mathrm{d}q_i$, which both can be computed according to the IND principle of Section 3. The first block row of $C_1$ is zero because the initial value constraint (7b) does not depend on $q$. The matrices $B^k$, $D^k$ and the vector $b^k$ are partitioned accordingly. Condensing projects problem (9) onto the null space of the linearization of the matching conditions (6) by applying a block Gaussian elimination $M$ to (21) that yields $C' = M C^k$ with
$$C'_1 = \begin{bmatrix}
0 & & & \\
G^q_0 & & & \\
G^s_1 G^q_0 & G^q_1 & & \\
\vdots & \vdots & \ddots & \\
G_1^{N-1} G^q_0 & G_2^{N-1} G^q_1 & \cdots & G^q_{N-1}
\end{bmatrix}, \qquad C'_2 = -I. \tag{22}$$
Therein, $G_i^j := G^s_j \cdot \ldots \cdot G^s_i = \prod_{k=i}^{j} G^s_k$ for $j \ge i$, where the factors are ordered with decreasing index from left to right. In the same way we obtain $c' = M c^k$. We have from (22) that $0 = C'_1 \Delta q - \Delta s + c'$. This shape of the constraints lends itself to elimination of $\Delta s$ from the problem by substitution of $\Delta s$ in both the objective and the constraints. This results in the condensed problem
$$\begin{aligned}
\min_{\Delta q} \quad & \tfrac{1}{2} \Delta q^T B'' \Delta q + \Delta q^T b'' && \text{(23a)}\\
\text{s.t.} \quad & 0 \le D'' \Delta q + d'', && \text{(23b)}
\end{aligned}$$
wherein
$$B'' = B_{11} + B_{12} C'_1 + {C'_1}^T B_{12}^T + {C'_1}^T B_{22} C'_1, \qquad D'' = D_1 + D_2 C'_1, \tag{24a}$$
$$b'' = b_1 + {C'_1}^T b_2 + B_{12} c' + {C'_1}^T B_{22} c', \qquad d'' = D_2 c' + d, \tag{24b}$$
as follows from substituting $\Delta s = C'_1 \Delta q + c'$ into (9a) and (9c). We refer to e.g. [12] for an extensive derivation of condensing methods also addressing effective reductions for the DAE constrained case.
In a straightforward implementation, the computational effort for condensing is $O(N^3 \cdot (n^x)^3 \cdot (n^q)^2)$. Condensing methods favourably exploit the structural properties of moderately-sized problems that have more system states than controls, i.e. $n^x \gg n^q$. Instead of $n^w = N n^x + (N - 1) n^q$ unknowns, the condensed QP (23) only holds the $n^w_1 = (N - 1) n^q$ controls that would also appear in a single-shooting approach, and eliminates the $n^w_2 = N n^x$ states. Hence, this QP can typically be solved efficiently using a dense active set method, e.g. [41, 29, 30]. The characteristic runtime complexity here is $O((n^w_1)^3)$ in the first active set QP iteration, and $O((n^w_1)^2)$ for all subsequent ones.
Block structured factorizations as mentioned above are more effective for problems with very large control dimensions $n^q$ or very large numbers $N$ of shooting intervals on the prediction horizon, see [36, 40, 38] and Section 6.
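The block elimination leading to (22) amounts to unrolling the linearized matching conditions. The following Python sketch illustrates this for scalar blocks ($n^x = n^q = 1$); the function name, the sign convention $\Delta s_{i+1} = G^s_i \Delta s_i + G^q_i \Delta q_i + c_{i+1}$ with $\Delta s_0 = c_0$, and the numerical data are assumptions made for illustration. The sketch builds $C'_1$ and $c'$ row by row and checks the condensed relation $\Delta s = C'_1 \Delta q + c'$ against the direct recursion.

```python
def condense(Gs, Gq, c):
    """Build C1' and c' for scalar blocks such that ds = C1' * dq + c'.

    Assumed linearized constraints: ds_0 = c_0 and
    ds_{i+1} = Gs_i * ds_i + Gq_i * dq_i + c_{i+1} for 0 <= i < N."""
    N = len(Gq)
    C1 = [[0.0] * N for _ in range(N + 1)]   # first row stays zero, cf. (22)
    cp = [0.0] * (N + 1)
    cp[0] = c[0]
    for i in range(N):
        for j in range(i):
            # Propagate earlier columns through the state sensitivity Gs_i.
            C1[i + 1][j] = Gs[i] * C1[i][j]
        C1[i + 1][i] = Gq[i]
        cp[i + 1] = Gs[i] * cp[i] + c[i + 1]
    return C1, cp

# Hypothetical sensitivities and residuals for N = 3 shooting intervals.
Gs = [0.9, 1.1, 0.8]
Gq = [0.5, 0.4, 0.3]
c = [0.2, -0.1, 0.05, 0.0]
dq = [1.0, -2.0, 0.5]

C1, cp = condense(Gs, Gq, c)
ds_condensed = [sum(C1[i][j] * dq[j] for j in range(3)) + cp[i] for i in range(4)]

# Reference: direct recursion through the linearized matching conditions.
ds_recursive = [c[0]]
for i in range(3):
    ds_recursive.append(Gs[i] * ds_recursive[i] + Gq[i] * dq[i] + c[i + 1])
```

The double loop that fills `C1` is what makes matrix condensing expensive for large N, whereas the recursion for `cp` alone is linear in N.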
5.4. Vector Condensing for Adjoint SQP
In Section 2.5 we have sketched a variant of the SQP method that works on approximations of both the constraint linearizations and the Hessian of the Lagrangian. This fact can be exploited to speed up the condensing algorithm considerably.
Prior to the first iteration, we initialize the adjoint SQP method with matrices $B$, $C$, $D$ of our choice, e.g. the exact Hessian and Jacobians of the steady state solution, computed off-line. These matrices are henceforth kept fixed and serve as (freely available) approximations for all subsequent iterations. On-line, only a single cheap adjoint derivative is then required to compute the modified gradient compensating for this approximation.
Moreover, in (24) only the computation of the vector values $b''$ and $d''$ from the modified gradient $b$ and the constraint residuals $c$, $d$ is required. The effort here is only $O(N \cdot (n^x)^2 \cdot n^q)$ and hence this method promises to be drastically faster than the full matrix condensing of Section 5.3.
6. Case Studies
In this section we consider three case studies, two of which were also addressed in [1]. These comprise a nonlinear batch process, a continuously stirred tank reactor (CSTR) originally due to [42] and treated by e.g. [43, 44, 45] and [1], and a motion control problem for a chain of masses connected by springs, see e.g. [2]. To each of the problems we apply
• the full-space SQP method with computation of exact Jacobians in each NMPC iteration as described in Section 2.4 and [4, 26, 12]. Here, the Hessian is computed by either an L-BFGS (economic NMPC) or a Gauß-Newton (tracking NMPC) approximation.
• the adjoint SQP method with cheap adjoint-mode computation of the modified gradient as described in Section 2.5 and [48, 2, 46, 47]. Here, the Hessian is computed in the off-line optimal solution in advance, and is kept fixed for all adjoint SQP iterations.
We use Fehlberg's 4th/5th order RK scheme with six stages, and compute sensitivities according to the IND principle. We set up and solve the structured QP subproblem (9)
• using a parametric active set method with a sparse $LBL^T$ factorization of the KKT system in the above methods, computed by the generic sparse linear algebra package MA57 described in [31], see Section 5.1;
• alternatively using a parametric active set method with a tailored block factorization [36, 38, 40] for the arising KKT systems, see Section 5.2;
• or alternatively by condensing of QP (9), see [16, 12], and solution of the condensed QP (23) by the parametric active set method qpOASES 3.0 [29, 30], see Section 5.3.
For full-space SQP this requires matrix condensing. This is the only approach considered in [1] for comparison to the CFE approach presented there.
For adjoint SQP, after initialization with set-point matrices, we only need the cheap vector condensing step (24b), cf. Section 5.4.
Our computing platform is one core of an Intel Core i7 940 machine running at 2.67 GHz. Run times quoted from [1] are for an Intel P4 at 3.00 GHz.
6.1. Nonlinear Batch
The first problem considered as a case study is a simplified chemical batch reactor with nonlinear dynamics. The yield $x_2$ after one hour of operation is to be maximized by suitable control of a reactor temperature profile $u$. The problem formulation on a time horizon $t \in [0, 1]$ is given as follows:
$$\begin{aligned}
\min_{x(\cdot),\,u(\cdot)} \quad & -x_2(1) && \text{(25a)}\\
\text{s.t.} \quad & \dot{x}_1(t) = -\big(u(t) + p\,u^2(t)\big)\, x_1(t), && \text{(25b)}\\
& \dot{x}_2(t) = u(t)\, x_1(t), && \text{(25c)}\\
& x_1(0) = 1, \quad x_2(0) = 0, && \text{(25d)}\\
& 0 \le x_1(t), \quad 0 \le 1 - x_2(t), \quad 0 \le u(t) \le 5, && \text{(25e)}
\end{aligned}$$
where $p = \tfrac{1}{2}$. Figure 1 depicts state trajectories and piecewise constant optimal control trajectories for a discretization of $N = 160$ shooting intervals.
[Figure 1, panel (a): control trajectory u(·) over t [h]; panel (b): state trajectories x1(·), x2(·) over t [h].]
Figure 1: Optimal control and state trajectories of problem (25) for N = 160.
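The dynamics (25b)-(25d) are simple enough to simulate in a few lines. The following Python sketch is an illustration under assumptions: it uses the parameter value p = 1/2, a fixed-step classical RK4 integrator instead of the adaptive Fehlberg scheme of the case studies, and a constant control u = 1 that is not the optimal control of Figure 1. For constant u the model has the closed-form solution $x_1(t) = e^{-a t}$ with $a = u + p u^2$, which is used as a reference.

```python
import math

P = 0.5  # assumption: parameter value p = 1/2 in (25b)

def rhs(x, u):
    """Right hand side of (25b), (25c)."""
    return [-(u + P * u * u) * x[0], u * x[0]]

def rk4_step(x, u, h):
    """One classical RK4 step for the two-state system with constant control u."""
    def shifted(a, b, scale):
        return [ai + scale * bi for ai, bi in zip(a, b)]
    k1 = rhs(x, u)
    k2 = rhs(shifted(x, k1, 0.5 * h), u)
    k3 = rhs(shifted(x, k2, 0.5 * h), u)
    k4 = rhs(shifted(x, k3, h), u)
    return [xi + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

def simulate(controls, steps_per_interval=10):
    """Simulate (25b)-(25d) on [0, 1] for piecewise constant controls on a uniform grid."""
    x = [1.0, 0.0]                                  # initial values (25d)
    h = 1.0 / (len(controls) * steps_per_interval)
    for u in controls:
        for _ in range(steps_per_interval):
            x = rk4_step(x, u, h)
    return x

# Constant control u = 1 on N = 20 intervals (not the optimal control).
u_const = 1.0
x_end = simulate([u_const] * 20)

# Closed-form reference for constant u.
a = u_const + P * u_const ** 2
x1_exact = math.exp(-a)
x2_exact = u_const * (1.0 - math.exp(-a)) / a
```

The resulting yield `x_end[1]` is roughly 0.518, which lies below the optimal objective values reported in Table 1, as expected for a non-optimized control.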
Off-line Setting. Table 1 lists problem dimensions and objective function values for increasingly fine discretizations $N$ of the time horizon, and compares run times of the three proposed structure exploiting algorithms for the full-space SQP. For each choice of $N$ the number of RK steps was chosen such that a relative accuracy of $10^{-6}$ of the optimal objective is ensured. SQP iterations with an L-BFGS Hessian were performed until a KKT tolerance (cf. [12]) of $10^{-7}$ was satisfied. We initialized all computations with $s_i = (1, 0)$ for $0 \le i \le N$, $q_i = 1$ for $0 \le i \le N - 1$.
        Dimensions      RK   Objective     Uncondensed SQP             Condensed SQP
    N   nvar   ncon      K   [×10^−1]      # It.  Bl. [ms]  Sp. [ms]   # It.  Co. [ms]
    5     18     10      5   −5.68388         7         3         4       7         3
   10     33     20      4   −5.72243         9         6         6       9         5
   20     63     40      3   −5.73298         9         9         9       9         9
   40    123     80      2   −5.73479        10        13        12      10        17
   80    243    160      1   −5.73528        10        22        24      11        57
  160    483    320      1   −5.73541        12        54        58      12       341
  320    963    640      1   −5.73544        12       152       159      14      2717

Table 1: Off-line optimal control: Objective function values, total number of full SQP iterations, and total run times until convergence for problem (25), nx = 2, nu = 1. Sp.: Block structured QP solver using MA57. Bl.: Block structured QP solver using block structured linear algebra. Co.: Matrix condensing and dense QP solver qpOASES.

Discussion. The number of SQP iterations required is nearly independent of N; only very slow growth is observed. For the block QP solvers, overall runtime for small N is dominated by IND sensitivity computations, and grows nearly linearly for N ≥ 40. Matrix condensing (Co.) runtime shows cubic growth, as derived from our presentation in Section 5, and falls behind for N ≥ 80. There is no significant difference between block structured factorization (Bl.) and generic sparse factorization (Sp.). This can be explained by observing that the number of active set changes stays below 3 in all SQP iterations, such that matrix update techniques available for the block structure factorization cannot play out their full potential.
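The growth rates discussed here can be checked against the run times in Table 1. The following Python sketch is an illustration added for the reader, not part of the reported computations: it estimates the local growth exponent between consecutive rows from the tabulated timings. The helper name `growth_exponents` is hypothetical.

```python
import math

# Run times in milliseconds, copied from Table 1.
N_VALUES = [5, 10, 20, 40, 80, 160, 320]
RUNTIME_MS = {
    "Bl.": [3, 6, 9, 13, 22, 54, 152],
    "Sp.": [4, 6, 9, 12, 24, 58, 159],
    "Co.": [3, 5, 9, 17, 57, 341, 2717],
}


def growth_exponents(n_values, times):
    """Return log(t_next / t_prev) / log(n_next / n_prev) for consecutive rows."""
    return [
        math.log(t1 / t0) / math.log(n1 / n0)
        for n0, n1, t0, t1 in zip(n_values, n_values[1:], times, times[1:])
    ]


for name, times in RUNTIME_MS.items():
    exponents = growth_exponents(N_VALUES, times)
    print(name, [round(e, 2) for e in exponents])
```

An exponent close to 3 for the last Co. entry corresponds to the cubic growth of matrix condensing. The timings for small N are given in whole milliseconds only, so the exponents there allow no more than a rough reading.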
Even taking slightly differing computational platforms into account, all approaches using our IND principle are clearly at least as fast as the CFE approach, for which run times from 188 ms (N = 5) to 735 ms (N = 160) are reported in [1]. The only exception is the matrix condensing variant, the only one investigated in [1]. Error control is not addressed therein, but errors of the solutions in the range of 10^−5 are reported, such that the quality of the obtained solutions can be assumed to be comparable.
Shrinking Horizon NMPC. In [1] problem (25) has been mentioned as a benchmark for NMPC on long horizons, but no scenario is proposed. The real–world process is simulated by an IVP, starting at t = 0 in x = (1, 0). The controller is initialized in the off–line optimal solution. We consider here a shrinking horizon scenario with a disturbance ∆p = +0.7 at t = 0.5 h for ∆t = 0.05 h. This disturbance is applied to the real–world simulation, and is assumed to be known to the optimizer with a delay of one sampling period.

To realize a shrinking horizon, we fix the controls q_{N−k}, ..., q_{N−1} to zero in the k-th RTI, thus keeping the state s_{N−k} unaltered for the remainder of the horizon. The problem's dimensions and block structure remain the same. The advantage here is that for the block structure KKT factorization, the new factorization can be computed from the old one by means of a single matrix update. For vector condensing, we avoid having to recondense when the number of matrix blocks decreases.
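One way to picture the control fixing described above is through the control bounds handed to the QP: in the k-th iteration the last k controls receive zero lower and upper bounds, while the remaining ones keep their original bounds. The following Python sketch assumes a scalar control with hypothetical bounds `q_lo` and `q_hi`; the function name is illustrative and not taken from the paper.

```python
def shrinking_horizon_bounds(n_intervals, k, q_lo, q_hi):
    """Return (lower, upper) bounds for q_0, ..., q_{N-1} in the k-th iteration.

    The controls q_{N-k}, ..., q_{N-1} are fixed to zero; the number of
    intervals, and hence the problem dimension, stays unchanged.
    """
    bounds = []
    for i in range(n_intervals):
        if i >= n_intervals - k:
            bounds.append((0.0, 0.0))    # fixed control on the cut-off tail
        else:
            bounds.append((q_lo, q_hi))  # free control, original bounds
    return bounds


# Example with assumed bounds 0 <= q <= 5 on N = 5 intervals, iteration k = 2.
print(shrinking_horizon_bounds(5, 2, 0.0, 5.0))
```

Because only bounds change from one iteration to the next, the block structure of the QP is the same in every iteration, which is the property the matrix update argument above relies on.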
Figure 2 shows the feedback controls and the resulting state trajectories generated by 160 real–time iterations on a shrinking horizon for both the full SQP and the adjoint SQP controller. Towards the end of the horizon, the full SQP controller shows better reaction to the nonlinearity of the process. It achieves a slightly improved yield of x2(1) = 5.64731 · 10^−1, compared to x2(1) = 5.63029 · 10^−1 for the adjoint SQP controller.
(a) Feedback control trajectories u(·), plotted over t [h].
(b) State trajectories x1(·), x2(·), plotted over t [h].
Figure 2: Shrinking horizon feedback control and state trajectories of problem (25) for N = 160. Red: RTI using full SQP. Blue: RTI using adjoint SQP.
Discussion. For all algorithmic variants proposed, Table 2 shows a detailed analysis of the computational effort of the preparation and feedback phases of the RTI of Section 4. The time spent in the transition phase is negligible in all cases.
          Full SQP controller               Adjoint SQP controller
          Prep. [ms]     Feedback [ms]      Prep. [ms]     Feedback [ms]
    N     Sp.     Co.    Sp.     Co.        Sp.     Co.    Sp.     Co.
  160      10      85      2     < 1          9       9      2     < 1
  320      31     607      4       3         29      28      4       1
  640     105    5026      7      17         99      98     12      13

Table 2: NMPC: Average per–iteration run times of preparation and feedback phases in milliseconds for the full SQP controller (left) and the adjoint SQP controller (right) on problem (25). Sp.: Sparse QP solver using MA57. Co.: Matrix condensing (full SQP) or vector condensing (adjoint SQP), including runtime of the dense QP solver qpOASES.
From Table 2 we can again see the cubic complexity of matrix condensing necessary for full SQP. For the adjoint SQP controller, however, the proposed vector condensing drastically reduces the runtime spent in the preparation phase, and is competitive when compared to the sparse QP solver. In addition, the adjoint SQP preparation phase is slightly faster than that of the full SQP. This is due to cheaper sensitivity computation by adjoint IND. This effect will become more evident in the next case studies.
6.2. Continuous Stirred Tank Reactor
The second case study addresses a continuous stirred tank reactor (CSTR) originally due to [42], here in a variant described by [45] and later considered by e.g. [43, 44]. In this setup, an exothermic reaction of x2(·) takes place in a liquid of varying level x1(·) with feed u1(·), and is controlled by external regulation u2(·) of the temperature x3(·). For t ∈ [0, 50] we strive to minimize a weighted deviation from a given set–point. The parameters in Table 3 are taken from [45, 44], with time in minutes. The set–point is x1^s = 0.659 m, x2^s = 877 mol/m^3, u1^s = 0.1 m^3/min, u2^s = 300 K and the least–squares weights are w1 = 1, w2 = 10^−4, w3 = 10^5, w4 = 10^−1.
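\begin{align*}
\min_{x(\cdot),\,u(\cdot)} \quad & \int_0^{50} w_1 \bigl(x_1(t) - x_1^{s}\bigr)^2 + w_2 \bigl(x_2(t) - x_2^{s}\bigr)^2 \tag{26a} \\
& \qquad + w_3 \bigl(u_1(t) - u_1^{s}\bigr)^2 + w_4 \bigl(u_2(t) - u_2^{s}\bigr)^2 \, dt \\
\text{s.t.} \quad & \dot{x}_1(t) = \frac{1}{\pi r^2} \bigl(F_0 - u_1(t)\bigr), \tag{26b} \\
& \dot{x}_2(t) = \frac{1}{\pi r^2} \, \frac{F_0 \bigl(c_0 - x_2(t)\bigr)}{x_1(t)} - k_0 \, x_2(t) \, e^{-\frac{E}{R} \frac{1}{x_3(t)}}, \tag{26c} \\
& \dot{x}_3(t) = \frac{1}{\pi r^2} \, \frac{F_0 \bigl(T_0 - x_3(t)\bigr)}{x_1(t)} - \frac{\Delta H}{\rho \, C_p} \, k_0 \, x_2(t) \, e^{-\frac{E}{R} \frac{1}{x_3(t)}} \tag{26d} \\
& \qquad\qquad + \frac{2U}{r \rho \, C_p} \bigl(u_2(t) - x_3(t)\bigr), \\
& x_1(0) = x_1^{s}, \quad x_2(0) = x_2^{s}, \quad x_3(0) = 324.5~\mathrm{K}, \tag{26e} \\
& 0.5~\mathrm{m} \le x_1(t) \le 2.5~\mathrm{m}, \quad 800~\mathrm{mol/m^3} \le x_2(t) \le 1000~\mathrm{mol/m^3}, \\
& 0.085~\mathrm{m^3/min} \le u_1(t) \le 0.115~\mathrm{m^3/min}, \quad 299~\mathrm{K} \le u_2(t) \le 301~\mathrm{K}.
\end{align*}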
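To make the model (26b)–(26d) concrete, the following Python sketch evaluates its right-hand side with the parameter values of Table 3. It is an illustration under stated assumptions, not the implementation used for the reported results: the function name `cstr_rhs` and the constant names are hypothetical, and no integrator or sensitivity generation is included.

```python
import math

# Parameter values from Table 3 (time measured in minutes).
F0 = 0.1            # m^3/min
T0 = 350.0          # K
C0 = 1000.0         # mol/m^3
R_TANK = 0.219      # m
K0 = 7.2e10         # 1/min
E_OVER_R = 8750.0   # K
U_HEAT = 54936.0    # J/(min m^2 K)
RHO = 1000.0        # kg/m^3
CP = 239.0          # J/(kg K)
DELTA_H = -50000.0  # J/mol


def cstr_rhs(x, u, c0=C0):
    """Right-hand side of (26b)-(26d) for x = (x1, x2, x3) and u = (u1, u2)."""
    x1, x2, x3 = x
    u1, u2 = u
    area = math.pi * R_TANK ** 2
    rate = K0 * x2 * math.exp(-E_OVER_R / x3)  # reaction term k0 x2 exp(-E/(R x3))
    dx1 = (F0 - u1) / area
    dx2 = F0 * (c0 - x2) / (area * x1) - rate
    dx3 = (F0 * (T0 - x3) / (area * x1)
           - DELTA_H / (RHO * CP) * rate
           + 2.0 * U_HEAT / (R_TANK * RHO * CP) * (u2 - x3))
    return (dx1, dx2, dx3)


# Initial values (26e) together with the set-point controls.
x_init = (0.659, 877.0, 324.5)
u_set = (0.1, 300.0)
print(cstr_rhs(x_init, u_set))
```

Evaluated at the initial values of (26e) with the set–point controls, the derivatives should come out small compared with the individual terms of each equation, which is consistent with a start near steady–state. They are not exactly zero, presumably because the tabulated set–point values are rounded. The optional argument `c0` allows evaluating the set–point change scenario with a different inflow concentration.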
  Sym.   Unit
  x1     m
  x2     mol/m^3
  x3     K
  u1     m^3/min
  u2     K

  Sym.   Value         Unit
  F0     0.1           m^3/min
  T0     350           K
  c0     1000          mol/m^3
  r      0.219         m
  k0     7.2 × 10^10   1/min
  E/R    8750          K
  U      54936         J/(min m^2 K)
  ρ      1000          kg/m^3
  Cp     239           J/(kg K)
  ∆H     −50000        J/mol

Table 3: State and control units and parameter values and units for the CSTR model (26).

Off–Line Run Time Comparison. The off–line scenario investigated in [1] involves a set–point change of the molar concentration for t ≥ 9.0 min to c0 = 1050 mol/m^3. We initialized all NLP variables in the steady–state for c0 = 1000 mol/m^3. The total run time for computation of the off–line optimal solution (N = 50 intervals, K = 20 RK steps) using the sparse QP solver is 89 ms.
Figures 3 and 4 show the corresponding optimal control and state profiles for N = 50 shooting intervals. It is important to note that in this off–line computation, the optimizer anticipates this set–point change.
(a) Control trajectory u1(·) [m^3/min], plotted over t [min].
(b) Control trajectory u2(·) [K], plotted over t [min].
Figure 3: Optimal control trajectories of problem (26) for N = 50.
(a) State trajectory x1(·) [m], plotted over t [min].
(b) State trajectory x2(·) [mol/m^3], plotted over t [min].
(c) State trajectory x3(·) [K], plotted over t [min].
Figure 4: Optimal state trajectories of problem (26) for N = 50.
The problem treated in [1] is very similar, although parameter values and units appear to have been mixed up in writing, and the initialization used there is unknown. For comparison, [1] reports a run time of 984 ms for the CFE approach computing the off–line solution, roughly 10 times slower than our approach. In addition, conversion of the values of [45] yields U = 54,750 J/(min m^2 K) whereas we use the value U = 54,936 J/(min m^2 K) as done in [1, 44].
Adaptivity in IND. The CSTR dynamics show more nonlinear features in the optimal solution than the batch reactor does. Therefore, we use this example to demonstrate one feature of the IND approach to sensitivity generation, namely the ability to easily choose an adaptive discretization by using a local error detection and control facility in the RK discretization scheme. This feature bears potential for both computational speedups and increased precision of the obtained optimal solutions, and sets the IND approach apart from the CFE approach presented in [1].
          Objective for fixed step RK                          Adaptive RK
    N     K = 2     K = 4     K = 8     K = 16    K = 32       øK     Objective
   10       —         —       0.91899   0.92882   0.92651      28.7   0.92691
   20       —       0.90168   0.90908   0.90738   0.90781      15.1   0.90766
   40     0.89613   0.90382   0.90198   0.90247   0.90194       8.0   0.90230
   80     0.90276   0.90093   0.90141   0.90089   0.90133       4.6   0.90124
  160     0.90064   0.90112   0.90060   0.90105   0.90093       2.8   0.90095

Table 4: Optimal objective function values for adaptive choice of variable–size RK steps to satisfy a relative local error tolerance of 10^−8 (IND only, columns 7 and 8). The fixed–step RK schemes (CFE and IND, columns 2 to 6) do not lead to an accuracy of five significant digits in the objective for any of the listed choices of the number K of equidistant steps.
Discussion. Table 4 shows optimal objective function values computed for the CSTR example (26) for increasingly fine choices of the number of shooting intervals N and the number of equidistant RK steps K per shooting interval. The results are compared to those obtained for an adaptive step size choice based on a Fehlberg type estimator [19] for the local truncation error of the RK scheme, which is required to stay below 10^−8. Here, Table 4 lists the average number øK of RK steps per shooting interval. The corresponding optimal objective function values are correct up to five decimal digits. The fully automatic choice of steps leads to small step sizes where necessary and big steps where possible. As is evident, a significantly lower total number of steps is taken, and this saves computation time. In addition, the precision is unrivaled by any of the optimization runs carried out with comparable fixed step counts.
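The mechanism behind such an adaptive step size choice can be sketched in a few lines. The Python example below is a simplified stand-in and not the scheme used for Table 4: instead of the Fehlberg pair of [19] it uses the low-order embedded Heun–Euler pair, it handles scalar ODEs only, and the controller constants (safety factor 0.9, step change limited to the range 0.2 to 2) are common textbook choices assumed here. The function name `integrate_adaptive` is hypothetical.

```python
import math


def integrate_adaptive(f, t0, y0, t_end, rtol=1e-6, h0=0.1):
    """Integrate the scalar ODE y' = f(t, y) with an embedded Heun-Euler pair.

    The difference between the second-order (Heun) and first-order (Euler)
    results serves as local error estimate and drives the step size.
    Returns the final value and the number of accepted steps.
    """
    t, y, h = t0, y0, h0
    accepted = 0
    while t < t_end:
        h = min(h, t_end - t)             # do not step past the end point
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        y_euler = y + h * k1              # first-order result
        y_heun = y + 0.5 * h * (k1 + k2)  # second-order result
        err = abs(y_heun - y_euler)       # local error estimate
        tol = rtol * max(abs(y), abs(y_heun), 1e-12)
        if err <= tol:                    # accept the step
            t += h
            y = y_heun
            accepted += 1
        # The estimate scales like h^2, hence the square root in the update.
        factor = 0.9 * math.sqrt(tol / max(err, 1e-300))
        h *= min(2.0, max(0.2, factor))
    return y, accepted


y_end, n_steps = integrate_adaptive(lambda t, y: -y, 0.0, 1.0, 1.0)
print(y_end, math.exp(-1.0), n_steps)
```

Rejected steps are repeated with a smaller step size, and accepted steps may enlarge it, which yields small steps where the error estimate is large and big steps elsewhere. In an IND setting the step sequence chosen this way would also be used for the derivative computation; that part is not shown here.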
Moving Horizon NMPC. We consider the set–point change of [44] to F0 = 0.11 m^3/min at t = 5 min, and use a prediction horizon of 5 min length. Note that again the set–point change is not anticipated by the optimizer, but is only assumed to be known one sample time after it happened. On startup, the process is assumed to be in steady–state. Least–squares objective weights proposed in [1] are not suitable for steady–state tracking by NMPC, as the control regularization is too strong. We instead propose to choose w1 = 1, w2 = 10^−4, w3 = 10^−8, w4 = 10^−4, realizing a tracking objective for x1^s, x2^s with a reasonably small control regularization.

Feedback control and state trajectories around the set–point change are shown in Figures 5 and 6 for both controllers.
(a) Control trajectory u1(·), plotted over t [min].
(b) Control trajectory u2(·), plotted over t [min].
Figure 5: Moving horizon NMPC feedback control trajectories of problem (26) for N = 160. Red: RTI using full SQP. Blue: RTI using adjoint SQP.
(a) State trajectory x1(·), plotted over t [min].
(b) State trajectory x2(·), plotted over t [min].
(c) State trajectory x3(·), plotted over t [min].
Figure 6: Moving horizon NMPC state trajectories of problem (26) for N = 160. Red: RTI using full SQP. Blue: RTI using adjoint SQP.
Discussion. Table 5 presents run times for preparation and feedback phases of the RTI scheme for both the full SQP and the adjoint SQP controller. As before, matrix condensing falls behind as the number N of shooting intervals increases, but the performance of the vector condensing proposed for the adjoint SQP controller matches that of the sparse QP solver. The feedback phase, however, is considerably slower also for vector condensing. This is explained by the small number of nx = 3 differential states that can be eliminated in the condensed QP. The advantages of condensing and adjoint IND will become more evident for problems with larger state space dimension, such as in the third case study to be presented next.
          Full SQP controller               Adjoint SQP controller
          Prepar. [ms]   Feedback [ms]      Prepar. [ms]   Feedback [ms]
    N     Sp.     Co.    Sp.     Co.        Sp.     Co.    Sp.     Co.
   80      10      25      1       1          8       8      1       1
  160      24     297      2       4         20      20      2       2
  320      66    2162      4      31         56      55      4      12
  640     192   17744      9     193        172     169      9      54

Table 5: NMPC: Average per–iteration run times in milliseconds for the full SQP controller (left) and the adjoint SQP controller (right) on problem (26), nx = 3, nu = 2. Sp.: Sparse QP solver using MA57. Co.: Dense QP solver qpOASES, including runtime of matrix condensing (full SQP) or vector condensing (adjoint SQP).
6.3. Chain Problem
The third case study involves a motion control problem for a chain of n + 1 point masses connected by springs and subject to gravity, see [2]. The point mass positions are denoted by x_i(t) ∈ R^3 and velocities by v_i(t) ∈ R^3. The first point mass is fixed at the origin. Starting in x_i(0) = (7.5 i/n, 0, 0), v_i(0) = (0, 0, 0), 0 ≤ i ≤ n, the velocity v_n(t) of the final point mass is to be controlled by u(t) ∈ R^3 such that this energy–conserving system returns to rest in 40 s.
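\begin{align*}
\min_{x(\cdot),\,u(\cdot)} \quad & \int_0^{40} w_v \sum_{i=1}^{n} \|v_i(t)\|_2^2 + w_x \|x_n(t) - x_e\|_2^2 + w_u \|u(t)\|_2^2 \, dt \tag{27a} \\
\text{s.t.} \quad & \dot{x}_i(t) = v_i(t), \quad 1 \le i \le n-1, \tag{27b} \\
& \dot{v}_i(t) = \bigl(F_{i+1}(t) - F_i(t)\bigr) \cdot n/m - g, \quad 1 \le i \le n-1, \tag{27c} \\
& \dot{x}_n(t) = u(t), \tag{27d} \\
& u(t) \in [-1, 1]^3. \tag{27e}
\end{align*}

Therein for 1 ≤ i ≤ n

\begin{equation*}
F_i(t) := \bigl(x_i(t) - x_{i-1}(t)\bigr) \cdot k \left( n - \frac{l_r}{\|x_i(t) - x_{i-1}(t)\|_2} \right), \tag{28}
\end{equation*}

with x_0(t) := (0, 0, 0) for all t ∈ [0, 40].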
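As a small worked instance of (27c) and (28), the following Python sketch evaluates the spring forces and the resulting accelerations of the interior point masses, using the values of Table 6. The function names are hypothetical and the code is meant as an illustration of the formulas only, not as the model implementation used for the reported run times.

```python
import math

# Values from Table 6.
K_SPRING = 0.1             # k in N/m
L_REST = 0.55              # l_r in m
MASS = 0.45                # m in kg
N_LINKS = 15               # n
GRAVITY = (0.0, 0.0, 9.81) # g in m/s^2


def spring_force(x_i, x_prev, n=N_LINKS):
    """F_i from (28) for the positions x_i and x_{i-1}, given as 3-tuples."""
    d = tuple(a - b for a, b in zip(x_i, x_prev))
    dist = math.sqrt(sum(c * c for c in d))
    factor = K_SPRING * (n - L_REST / dist)
    return tuple(factor * c for c in d)


def accelerations(positions, n=N_LINKS):
    """dv_i/dt from (27c) for 1 <= i <= n-1.

    positions[0], ..., positions[n] are the point mass positions, where
    positions[0] is the fixed first mass x_0 at the origin.
    """
    forces = [None] + [spring_force(positions[i], positions[i - 1], n)
                       for i in range(1, n + 1)]
    result = []
    for i in range(1, n):
        result.append(tuple(
            (forces[i + 1][c] - forces[i][c]) * n / MASS - GRAVITY[c]
            for c in range(3)
        ))
    return result


# Initial configuration x_i(0) = (7.5 i / n, 0, 0).
x_start = [(7.5 * i / N_LINKS, 0.0, 0.0) for i in range(N_LINKS + 1)]
print(accelerations(x_start)[0])
```

In the equidistant initial configuration all spring forces coincide, so the force differences in (27c) cancel and each interior mass is accelerated by −g alone, up to rounding.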
Characteristics and weights are given in Table 6. For n + 1 = 16 point masses this system has nx = 87 states and nu = 3 controls, and hence is considerably larger than the case studies considered in [1].

  Sym.   Value          Unit
  g      (0, 0, 9.81)   m/s^2
  k      0.1            N/m
  lr     0.55           m
  m      0.45           kg
  n      15             –
  xe     (7.5, 0, 0)    m
  wv     0.25           –
  wx     25             –
  wu     0.01           –

Table 6: Parameter values for the chain model (27).
Off–line Optimal Control. For N = 640 the NLP after discretization has nvar = 57690 unknowns and neq = 55680 equality constraints. The off–line solution is obtained after a run time of 2 minutes using the full SQP algorithm with Gauss–Newton approximation of the Hessian, and using the block structured QP solver. Off–line optimal control trajectories that bring the chain to rest are shown in Figure 7 for granularities N = 80, 160, 320, 640 of the control. Clearly, a sufficiently fine control discretization is required to this end: For N = 80 the chain could not be brought to rest by t = 40 s.
(a) Control trajectories ux(·), uz(·) [m], plotted over t [s].
(b) Point mass positions (xx [m], xz [m]) for N = 80 (red) and N = 160 (blue) at t = 40.
Figure 7: Off–line optimal controls for the chain problem (27) (left) for N = 80 (red), 160 (blue), 320 (green), 640 (black), and ultimate point mass positions (x_{i,x}(40), x_{i,z}(40)).
NMPC Scenario. We now consider the above problem as an NMPC problem on a prediction horizon of 8 s discretized with N = 40, 80, or 160 shooting intervals, and running for 50 s. Table 7 presents run times for both proposed controllers and all three approaches to structure exploitation. Clearly, the adjoint SQP controller relying on adjoint IND sensitivity generation now delivers the better performance by a significant margin. Coupled with the proposed vector condensing approach, feedback delays in the low millisecond range are possible even for this larger system with 87 states. Second to vector condensing comes the dedicated block structured QP solver. For N = 40, the adjoint SQP controller is real–time feasible (8 s/40 = 200 ms sampling time) using both approaches to structure exploitation, and is almost real–time feasible for N = 80 (100 ms sampling time). The sparse solver relying on MA57 now falls behind in the feedback phase due to lack of sufficient sparsity in the systems to be solved.
          Full SQP controller                          Adjoint SQP controller
          Prep. [ms]           Feedback [ms]           Prep. [ms]           Feedback [ms]
    N     Sp.    Bl.    Co.    Sp.    Bl.    Co.       Sp.    Bl.    Co.    Sp.    Bl.    Co.
   40     229    228   1650    218     15      9        51     56     59    209     12      1
   80     455    454   7456    807     39     29       108    116    123    531     28      3
  160     916    911  38870   1767     85    161       220    221    249   1822     89     14

Table 7: NMPC: Average per–iteration run times in milliseconds for the full SQP controller (left) and the adjoint SQP controller (right) on problem (27), nx = 87, nu = 3. Sp.: Sparse QP solver using MA57. Bl.: Block structured QP solver. Co.: Dense QP solver qpOASES, including runtime of matrix condensing (full SQP) or vector condensing (adjoint SQP).
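The real–time feasibility statements for the adjoint SQP controller can be retraced from Table 7 with a few lines of arithmetic. The Python sketch below assumes, as a simplification made here and not stated in this form in the text, that a variant counts as real–time feasible when the sum of preparation and feedback time does not exceed the sampling time of 8 s/N. The function and constant names are hypothetical.

```python
HORIZON_S = 8.0  # prediction horizon in seconds

# Adjoint SQP controller, Table 7: (preparation ms, feedback ms) per variant.
ADJOINT_MS = {
    40: {"Sp.": (51, 209), "Bl.": (56, 12), "Co.": (59, 1)},
    80: {"Sp.": (108, 531), "Bl.": (116, 28), "Co.": (123, 3)},
    160: {"Sp.": (220, 1822), "Bl.": (221, 89), "Co.": (249, 14)},
}


def sampling_time_ms(n_intervals, horizon_s=HORIZON_S):
    """Sampling time in milliseconds for N equidistant shooting intervals."""
    return 1000.0 * horizon_s / n_intervals


def is_real_time_feasible(n_intervals, prep_ms, feedback_ms):
    """Assumed criterion: preparation plus feedback fits into one sample."""
    return prep_ms + feedback_ms <= sampling_time_ms(n_intervals)


for n, variants in ADJOINT_MS.items():
    for name, (prep, feedback) in variants.items():
        print(n, name, prep + feedback, sampling_time_ms(n),
              is_real_time_feasible(n, prep, feedback))
```

Under this criterion the block structured and condensing variants fit into the 200 ms sample for N = 40, while for N = 80 both exceed the 100 ms sample by a moderate amount, in line with the statements made in the NMPC scenario discussion.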
7. Summary and Conclusions
In this paper we addressed fast numerical methods for direct multiple shooting based NMPC of dynamic process control problems with long prediction horizons. We presented two SQP methods, one using full Jacobian information and one based on approximate Jacobians and compensation through a modified gradient. Both methods crucially rely on efficient and accurate computation of forward or adjoint derivative information.

To this end, we reviewed the fundamental principle of IND at the example of explicit one–step methods for ODE process models. The direct multiple shooting discretization induces a block structured QP subproblem from which approximate feedback controls are computed. For use in active set QP solvers we presented three variants of problem structure exploitation. These comprise sparse linear algebra, block structured linear algebra, and a condensing preprocessing step. For the adjoint SQP method we proposed a matrix free condensing step that has a significant runtime complexity advantage.

To evaluate the relative merits of each of the proposed direct multiple shooting frameworks, we considered three benchmark problems recently addressed in literature. We computed off–line optimal solutions for set–point change or disturbance scenarios, and also treated these scenarios in a simulated NMPC setting. We provided detailed insight into the achieved run times, possible sampling rates, and feedback delays. Here, we found the proposed adjoint SQP method combined with vector condensing to perform best by a wide margin for systems with larger state space dimensions.

We also carried out a comparison of the IND principle to a recently proposed CFE based sensitivity generation method. Including adaptivity into the discretization scheme is easily possible in the IND approach. We showed that throughout all presented computations, the IND approach is considerably faster than CFE.

Both the full SQP and the adjoint SQP algorithm considered in this paper treat optimality conditions of the underlying NLP. Further run time speedups are possible, e.g. by using the concept of multi–level iteration schemes as first described in [48].
Acknowledgements
The research leading to these results has received funding from the European Union Seventh Framework Programme FP7/2007-2013 under grant agreement no FP7-ICT-2009-4 248940. We gratefully acknowledge support by the Heidelberg Graduate School of Mathematical and Computational Methods for the Sciences (HGS MathComp) funded by Deutsche Forschungsgemeinschaft (DFG). The financial support of the DFG in the context of the research cluster “Optimization–based control of chemical processes” is gratefully acknowledged.
References
[1] J. Tamimi, P. Li, A combined approach to nonlinear model predictive control of fast systems, Journal of Process Control 20 (2010) 1092–1102.
[2] L. Wirsching, An SQP Algorithm with Inexact Derivatives for a Direct Multiple Shooting Method for Optimal Control Problems, Diploma thesis, Heidelberg University, 2006.
[3] H. G. Bock, Numerical treatment of inverse problems in chemical reaction kinetics, in: K. Ebert, P. Deuflhard, W. Jäger (Eds.), Modelling of Chemical Reaction Systems, volume 18 of Springer Series in Chemical Physics, Springer, Heidelberg, 1981, pp. 102–125.
[4] H. G. Bock, K. J. Plitt, A Multiple Shooting algorithm for direct solution of optimal control problems, in: Proceedings of the 9th IFAC World Congress, Pergamon Press, Budapest, 1984, pp. 242–247.
[5] D. Leineweber, Efficient reduced SQP methods for the optimization of chemical processes described by large sparse DAE models, volume 613 of Fortschritt-Berichte VDI Reihe 3, Verfahrenstechnik, VDI Verlag, Düsseldorf, 1999.
[6] D. Q. Mayne, J. B. Rawlings, C. V. Rao, P. O. M. Scokaert, Constrained model predictive control: stability and optimality, Automatica 36 (2000) 789–814.
[7] J. B. Rawlings, D. Mayne, Model Predictive Control: Theory and Design, Nob Hill Publishing, LLC, 2009.
[8] C. V. Rao, J. B. Rawlings, D. Q. Mayne, Constrained state estimation for nonlinear discrete-time systems: Stability and moving horizon approximations, IEEE Transactions on Automatic Control 48 (2003) 246–258.
[9] M. Diehl, P. Kühl, H. G. Bock, J. P. Schlöder, Schnelle Algorithmen für die Zustands- und Parameterschätzung auf bewegten Horizonten, Automatisierungstechnik 54 (2006) 602–613.
[10] P. Kühl, M. Diehl, T. Kraus, J. P. Schlöder, H. G. Bock, A real-time algorithm for moving horizon state and parameter estimation, Computers and Chemical Engineering 35 (2011) 71–83.
[11] K. J. Plitt, Ein superlinear konvergentes Mehrzielverfahren zur direkten Berechnung beschränkter optimaler Steuerungen, Diploma thesis, Rheinische Friedrich–Wilhelms–Universität Bonn, 1981.
[12] D. Leineweber, I. Bauer, A. Schäfer, H. G. Bock, J. P. Schlöder, An efficient multiple shooting based reduced SQP strategy for large-scale dynamic process optimization (Parts I and II), Computers and Chemical Engineering 27 (2003) 157–174.
[13] A. Potschka, H. G. Bock, J. P. Schlöder, A minima tracking variant of semi-infinite programming for the treatment of path constraints within direct solution of optimal control problems, Optimization Methods and Software 24 (2009) 237–252.
[14] J. Nocedal, S. Wright, Numerical Optimization, Springer Verlag, Berlin Heidelberg New York, 2nd edition, 2006.
[15] H. G. Bock, M. Diehl, E. A. Kostina, J. P. Schlöder, Constrained Optimal Feedback Control for DAE, in: L. Biegler, O. Ghattas, M. Heinkenschloss, D. Keyes, B. van Bloemen Waanders (Eds.), Real-Time PDE-Constrained Optimization, SIAM, 2007, pp. 3–24.
[16] H. G. Bock, Randwertproblemmethoden zur Parameteridentifizierung in Systemen nichtlinearer Differentialgleichungen, volume 183 of Bonner Mathematische Schriften, Universität Bonn, Bonn, 1987.
[17] C. D. T. Runge, Über die numerische Auflösung von Differentialgleichungen, Mathematische Annalen 46 (1895) 167–178.
[18] M. Kutta, Beitrag zur näherungsweisen Integration totaler Differentialgleichungen, Zeitschrift für Mathematik und Physik 46 (1901) 435–453.
[19] E. Fehlberg, Klassische Runge-Kutta-Formeln fünfter und siebenter Ordnung mit Schrittweiten-Kontrolle, Computing 4 (1969) 93–106.
[20] I. Bauer, H. G. Bock, S. Körkel, J. P. Schlöder, Numerical methods for initial value problems and derivative generation for DAE models with application to optimum experimental design of chemical processes, in: Scientific Computing in Chemical Engineering II, Springer, 1999, pp. 282–289.
[21] L. Petzold, S. Li, Y. Cao, R. Serban, Sensitivity analysis of differential-algebraic equations and partial differential equations, Computers and Chemical Engineering 30 (2006) 1553–1559.
[22] A. Sandu, Reverse automatic differentiation of linear multistep methods, in: T. J. Barth, M. Griebel, D. E. Keyes, R. M. Nieminen, D. Roose, T. Schlick, C. H. Bischof, H. M. Bücker, P. Hovland, U. Naumann, J. Utke (Eds.), Advances in Automatic Differentiation, volume 64 of Lecture Notes in Computational Science and Engineering, Springer Berlin Heidelberg, 2008, pp. 1–12.
[23] J. Albersmeyer, Adjoint based algorithms and numerical methods for sensitivity generation and optimization of large scale dynamic systems, Ph.D. thesis, Heidelberg University, 2010.
[24] W. Enright, D. Higham, B. Owren, W. Sharp, A Survey of the Explicit Runge–Kutta Method, Technical Report 291/94, Department of Computer Science, University of Toronto, Toronto, M5S 1A4, Canada, 1995.
[25] A. Griewank, Evaluating Derivatives, Principles and Techniques of Algorithmic Differentiation, number 19 in Frontiers in Applied Mathematics, SIAM, Philadelphia, 2000.
[26] M. Diehl, H. G. Bock, J. P. Schlöder, R. Findeisen, Z. Nagy, F. Allgöwer, Real-time optimization and nonlinear model predictive control of processes governed by differential-algebraic equations, J. Proc. Contr. 12 (2002) 577–585.
[27] M. Diehl, H. J. Ferreau, N. Haverbeke, Efficient numerical methods for nonlinear MPC and moving horizon estimation, in: L. Magni, D. Raimondo, F. Allgöwer (Eds.), Nonlinear Model Predictive Control, volume 384 of Springer Lecture Notes in Control and Information Sciences, Springer-Verlag, Berlin, Heidelberg, New York, 2009, pp. 391–417.
[28] M. Best, An Algorithm for the Solution of the Parametric Quadratic Programming Problem, Applied Mathematics and Parallel Computing, Physica-Verlag, Heidelberg, pp. 57–76.
[29] H. J. Ferreau, H. G. Bock, M. Diehl, An online active set strategy to overcome the limitations of explicit MPC, International Journal of Robust and Nonlinear Control 18 (2008) 816–830.
[30] H. J. Ferreau, A. Potschka, C. Kirches, The qpOASES website, 2011. http://www.kuleuven.be/optec/software/qpOASES.
[31] I. Duff, MA57 — a code for the solution of sparse symmetric definite and indefinite systems, ACM Transactions on Mathematical Software 30 (2004) 118–144.
[32] T. Davis, Algorithm 832: UMFPACK - an unsymmetric-pattern multifrontal method with a column pre-ordering strategy, ACM Trans. Math. Software 30 (2004) 196–199.
[33] H. Huynh, A Large-Scale Quadratic Programming Solver Based On Block-LU Updates of the KKT System, Ph.D. thesis, Stanford University, 2008.
[34] M. Steinbach, A structured interior point SQP method for nonlinear optimal control problems, in: R. Bulirsch, D. Kraft (Eds.), Computational Optimal Control, volume 115 of International Series of Numerical Mathematics, Birkhäuser, Basel Boston Berlin, 1994, pp. 213–222.
[35] M. Steinbach, Structured interior point SQP methods in optimal control, Zeitschrift für Angewandte Mathematik und Mechanik 76 (1996) 59–62.
[36] M. Steinbach, Fast recursive SQP methods for large-scale optimal control problems, Ph.D. thesis, Heidelberg University, 1995.
[37] R. Bartlett, L. Biegler, QPSchur: A dual, active set, Schur complement method for large-scale and structured convex quadratic programming algorithm, Optimization and Engineering 7 (2006) 5–32.
[38] C. Kirches, Fast numerical methods for mixed–integer nonlinear model–predictive control, Ph.D. thesis, Heidelberg University, 2010.
[39] C. Kirches, H. G. Bock, J. P. Schlöder, S. Sager, A factorization with update procedures for a KKT matrix arising in direct optimal control, Mathematical Programming Computation (2010). (submitted). Available Online: http://www.optimization-online.org/DB_HTML/2009/11/2456.html.
[40] C. Kirches, H. G. Bock, J. P. Schlöder, S. Sager, Block structured quadratic programming for the direct multiple shooting method for optimal control, Optimization Methods and Software 26 (2011) 239–257.
[41] P. Gill, W. Murray, M. Saunders, User’s Guide For QPOPT 1.0: A Fortran Package For Quadratic Programming, 1995.
[42] K.-U. Klatt, S. Engell, Rührkesselreaktor mit Parallel- und Folgereaktion, in: S. Engell (Ed.), Nichtlineare Regelung – Methoden, Werkzeuge, Anwendungen. VDI-Berichte Nr. 1026, VDI-Verlag, Düsseldorf, 1993, pp. 101–108.
[43] M. Diehl, Real-Time Optimization for Large Scale Nonlinear Processes, Ph.D. thesis, Heidelberg University, 2001.
[44] G. Pannocchia, J. B. Rawlings, Disturbance models for offset–free model-predictive control, AIChE Journal 49 (2003) 426–437.
[45] M. Henson, D. Seborg, Nonlinear Process Control, Prentice Hall, Upper Saddle River, NJ, 1st edition, 1997.
[46] L. Wirsching, J. Albersmeyer, P. Kühl, M. Diehl, H. G. Bock, An adjoint-based numerical method for fast nonlinear model predictive control, in: M. Chung, P. Misra (Eds.), Proceedings of the 17th IFAC World Congress, Seoul, Korea, July 6–11, 2008, volume 17, IFAC-PapersOnLine, 2008, pp. 1934–1939.
[47] C. Kirches, L. Wirsching, S. Sager, H. G. Bock, Efficient numerics for nonlinear model predictive control, in: M. Diehl, F. Glineur, E. Jarlebring, W. Michiels (Eds.), Recent Advances in Optimization and its Applications in Engineering, Springer, 2010, pp. 339–359.
[48] H. G. Bock, M. Diehl, E. A. Kostina, J. P. Schlöder, Constrained Optimal Feedback Control for DAE, in: L. Biegler, O. Ghattas, M. Heinkenschloss, D. Keyes, B. van Bloemen Waanders (Eds.), Real-Time PDE-Constrained Optimization, ch. 1, pp. 3–24, SIAM, 2007.