
Dissertation for the attainment of the doctoral degree Dr. rer. nat.

of the Faculty of Mathematics and Economics of Universität Ulm

Global Nonlinear Optimization and Optimal Control for Applications in Pulp and Paper Industry

submitted by

Guntram Seitz,

born in Mutlangen

September 2009


Acting Dean: Prof. Dr. Werner Kratz

First reviewer: Prof. Dr. Karsten Urban

Second reviewer: Prof. Dr. Stefan Funken

Date of the oral examination: 02.02.2010


Acknowledgments

This thesis was written during my time as a doctoral student at the Institute of Numerical Mathematics of Universität Ulm, in cooperation with the company Voith Paper in Heidenheim. I would like to thank the people who supported me during the last three years.

From Universität Ulm and the Ulm Center for Scientific Computing, I thank

• Prof. Dr. Karsten Urban for supervising my thesis,

• Prof. Dr. Stefan Funken for the many helpful suggestions, and

• all doctoral and diploma students who shared office and institute with me between 2006 and 2009 and who were always friendly and extremely helpful.

From the company Voith Paper, I thank

• Hermann-Josef Post for the regular suggestions concerning my work,

• Dr. Rainer Schmachtel for the interesting mathematical discussions,

• Dr. Michael Weiß and Roland Mayer for their support in the model development, and not least

• Dr. Ulrich Begemann, who made the industry cooperation possible.

I would also like to thank all other staff and colleagues at both locations for the always pleasant working atmosphere and their friendly support.


Abstract

This work covers current problems of process simulation concerning energy efficiency in the pulp and paper industry, discusses the numerical problems that arise from modeling with state-of-the-art methods, and embeds modeling, simulation, sensitivity analysis and optimization into a new framework with extended capabilities. Mathematical models for the drying process in a paper machine and for the dynamic process in the wet-end of a paper machine are developed, and simulations as well as optimizations are carried out.

A new tool for the complete parametric sensitivity analysis of stationary process models developed in gPROMS is presented and used with the drying process model. The results are used to derive an optimization problem whose solutions lead to a new conceptual layout of paper machine drying sections, for which a patent application has been filed by Voith Paper, Heidenheim.

A new method of solving time-optimal control problems for grade changes is derived, implemented for use with gPROMS, and successfully applied to exemplary problems of optimal wet-end control; a refinement procedure for optimal control structures is tested.

To confirm the results of the optimizations, a global optimization framework is developed. It is based on a generalization and an adaptive extension of the so-called tunneling algorithm for the global minimization of smooth functions. The use of numerical methods for box-constrained optimization problems based on the solution of initial value problems is discussed with regard to their use within a tunneling-type algorithm.

Numerical benchmarks of the presented global optimization algorithm on multi-dimensional non-convex test functions are performed and compared by means of statistics to analyze the effect of the suggested extensions.


Contents

1. Introduction
   1.1. Outline of the Thesis
      1.1.1. Publications
   1.2. Paper and Paper Industry
      1.2.1. The Process of Paper Making
      1.2.2. About Paper Industry
   1.3. Numerical Simulation in Engineering Application

2. Process Simulation in Pulp and Paper Industry
   2.1. Applications and Software
      2.1.1. gPROMS - An Overview
   2.2. The Drying Process
      2.2.1. A Dryer Model
      2.2.2. Drying Section Geometry
      2.2.3. Steady-State Simulations
   2.3. The Wet-End Process
      2.3.1. About the Transport Problem
      2.3.2. Model Structures for Pressure-Driven Balancing
      2.3.3. Library Components
      2.3.4. Process Dynamics at Exemplary Plants

3. Solution Methods
   3.1. Solution of Nonlinear Systems of Equations
      3.1.1. Newton's Method
   3.2. Solution of Differential-Algebraic Systems of Equations
      3.2.1. Index 1
      3.2.2. Higher Differentiation Index
      3.2.3. Consistent Initialization
      3.2.4. ε-Embedding for Index 1 Problems
      3.2.5. Linear Multistep Methods
      3.2.6. Predictor-Corrector Idea
      3.2.7. Variable Step Size and Order
   3.3. Parametric Sensitivity Analysis
      3.3.1. Steady-State Sensitivity Problems
      3.3.2. A C++ Foreign Process for Sensitivity Analysis with gPROMS
      3.3.3. Dynamic Sensitivity Problems

4. Nonlinear Programming and Optimal Control
   4.1. Basics in Unconstrained and Constrained Optimization
      4.1.1. Line-Search and Trust-Region Methods
      4.1.2. Quasi-Newton Update Formulas
   4.2. Constrained Optimization
      4.2.1. Quadratic Programming and Active-Set
      4.2.2. SQP Methods
      4.2.3. Interior-Point Methods
   4.3. Optimal Control
      4.3.1. Optimal Control for Grade Changes - Trajectory Boundaries
      4.3.2. Solving the Feasibility Problem
      4.3.3. Integrating the Specification Error

5. Overview on Global Optimization
   5.1. Methods
      5.1.1. Controlled Random Search
      5.1.2. Simulated Annealing
      5.1.3. Evolution Strategies
   5.2. Discussion and the Tunneling Idea

6. Adaptive Tunneling Concepts for Global Optimization
   6.1. The Algorithmic Framework
      6.1.1. Scaling
   6.2. A General Class of Tunneling Algorithms
      6.2.1. Definitions and Properties
      6.2.2. Pole Functions
      6.2.3. Handling Removability by Choosing µ
      6.2.4. Empirical Analysis of the Tunneling Concept
      6.2.5. An Adaptive Shaping Strategy
      6.2.6. Shape-Identification
      6.2.7. n-D Volume Control of Pole Regions
   6.3. Stochastic Performance Analysis
      6.3.1. Maximum Likelihood Estimation of the Model Parameters
      6.3.2. Benchmark Functions
      6.3.3. Pseudo-Random Numbers
      6.3.4. Example and Fitting of the Weibull Parameters
   6.4. A Concrete Implementation
      6.4.1. Start Values by Halton Sequences
      6.4.2. Discussion on Benchmarking
   6.5. Analytic Basins of Attraction
      6.5.1. Quasi-Monte-Carlo Approximation
      6.5.2. Solvability of the r-(GOP)
      6.5.3. Perspectives - Gradient Paths with Tunneling Functions

7. Numerical Results
   7.1. A Projected Gradient Path Method for Linearly Constrained Problems
      7.1.1. Projected Gradient
      7.1.2. The BDF Method for the Projected Gradient Path
      7.1.3. Step Size Control
      7.1.4. Results on Rosenbrock's Function
      7.1.5. Specification-Error-Optimal Control
      7.1.6. Remarks on the Algorithm
   7.2. Optimal Control of the Wet-End Process
      7.2.1. A Grade Change Example
      7.2.2. Refining Control Intervals
      7.2.3. Sequential Time-Optimal Control with Trajectory Boundaries
   7.3. Drying Section Analysis and Optimization
      7.3.1. Defining Geometries
      7.3.2. Choosing the Objective Function
      7.3.3. Sensitivity Analysis
      7.3.4. Optimization Results
   7.4. Tunneling Benchmarks
      7.4.1. Tunneling in n Dimensions
      7.4.2. Remarks on Nonlinear Constrained Problems
      7.4.3. Direct Comparison by Weibull Analysis
      7.4.4. Conclusions on Tunneling
   7.5. Global Optimization Results
      7.5.1. Wet-End Global Optimal Control
      7.5.2. Global Optimization of the Drying Section Geometry
   7.6. Conclusions and Outlook
      7.6.1. Outlook on Tunneling-Type Algorithms

A. Projected Gradient Path Algorithm in MATLAB

B. Distance of two Piecewise-Constant Controls

Symbols and Abbreviations

Bibliography


Chapter 1.

Introduction

1.1. Outline of the Thesis

The development of this thesis was part of a cooperation between Voith Paper GmbH (Heidenheim) and the Institute of Numerical Mathematics of the University of Ulm. Voith Paper GmbH is one of the world's largest paper machine manufacturers, with a sales volume of more than 1.9 billion EUR in 2008 according to the annual report available online.
The objective of the work was to establish a model library and a software environment with tools for the numerical treatment of the paper making processes described here. The use of powerful commercial software packages was to be embedded into the process of modeling, simulation, analysis and global optimization.
The topics mainly covered are the modeling and simulation of paper machine processes, global nonlinear optimization by tunneling methods, and the application of advanced process simulation using commercial software packages. The thesis therefore relates to the work of [Wil95, Ekv04, AS06, EH08] for the modeling part. The new results in the area of global nonlinear optimization extend the ideas of [LM85, BG91, CG00]. For basic and recent work concerning applied process simulation and commercial software, we refer to [Cel79, CE93, BP94, OC96, Dah08].
First, we give an outline of the thesis by describing the contents of its seven chapters.

Chapter 1

In the introductory chapter, we give an overview of the process of paper making and the recent challenges for the industry in a faltering market. From the point of view of the manufacturer, it is important to steadily increase the understanding of the processes involved in paper making and to develop integrated solutions, especially concerning the energy consumption of the production process. Here, physical and mathematical modeling, together with numerical simulation of the processes, can help to improve this understanding and give hints on how to improve the processes themselves. In the next section, we discuss the role of numerical mathematics in engineering applications. Describing the considered processes by mathematical equations directly leads to analytic problems. Nowadays, as problems become large and complex, analytic solutions are hard or impossible to compute, and numerical methods are needed for a large variety of mathematical problems.

Chapter 2

In Chapter 2, we present some software packages that are currently used for modeling and simulation in the pulp and paper industry, with a closer look at the commercial software gPROMS by Process Systems Enterprise, Ltd. In the producing industries, such work is related to the keyword Process Simulation. In this work, gPROMS is used for the modeling and later on treated as a black-box solver for nonlinear dynamic and stationary problems, as well as a calculator for the generation of derivatives. We present two different models for two selected paper making processes. The drying process is one of the most interesting processes because of its impact on the overall energy consumption in paper production. We give small code examples to show how mathematical modeling is done within gPROMS's modeling language. Then we outline the structure of a model library for describing dynamic wet-end processes, which are dominated by transport and separation phenomena. We discuss the transport problem and show how it relates to wet-end modeling. For both models we give simulation results that were produced using gPROMS 3.1.0 and 3.2.0.
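To give a flavor of the kind of transport-dominated balance such a wet-end library describes, here is a hedged toy sketch in Python (not the gPROMS library itself; the tank volume, flow rate and consistency values are made-up illustration numbers): a single perfectly mixed stock tank whose fiber consistency follows a first-order mass balance.

```python
import numpy as np

# Hypothetical single stirred tank in a wet-end line: a perfectly mixed
# volume V [m^3] with throughflow q [m^3/s]; c is the fiber consistency
# (mass fraction) inside the tank and in the outflow.
V, q = 20.0, 0.5
c_in = 0.012      # consistency of the incoming stock

def step(c, dt):
    """Explicit-Euler step of the mass balance V * dc/dt = q * (c_in - c)."""
    return c + dt * q * (c_in - c) / V

# Simulate the response to a step in inlet consistency, starting lower.
c, dt, t_end = 0.008, 1.0, 300.0
for _ in range(int(t_end / dt)):
    c = step(c, dt)
print(c)   # relaxes toward c_in with time constant V/q = 40 s
```

The point of the sketch is only the structure: each library component contributes such balance equations, and connecting components chains the balances into a large differential-algebraic system.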

Chapter 3

Chapter 3 gives background on the numerical solution methods used to solve the problems that arise from mathematical modeling. The solution of nonlinear systems of equations as well as the solution of differential-algebraic systems of equations plays a central role for both dynamic and stationary problems. Although the methods used within the software gPROMS are not exactly known to us, there exist well-known methods that are widely used in various software packages. A good example is the backward differentiation formula (BDF), which is used to transform differential-algebraic problems into a sequence of nonlinear problems that can be solved by Newton-type methods. Calculating derivative information, so-called sensitivities, along with the solution of the systems themselves is an important task. We briefly discuss common methods to perform sensitivity analysis for both kinds of problems and present a new external software process for gPROMS that augments it with the capability to perform a full large-scale sensitivity analysis of steady-state processes.
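The BDF-plus-Newton idea mentioned above can be sketched in a few lines (a generic illustration, not the actual gPROMS internals): the first-order BDF method, i.e. implicit Euler, turns each step of a small semi-explicit index-1 DAE into a nonlinear system that is solved by Newton's method.

```python
import numpy as np

def newton(F, J, z0, tol=1e-12, maxit=50):
    """Solve F(z) = 0 by Newton's method."""
    z = np.array(z0, dtype=float)
    for _ in range(maxit):
        dz = np.linalg.solve(J(z), -F(z))
        z += dz
        if np.linalg.norm(dz) < tol:
            break
    return z

def bdf1_step(x, y, h):
    """One BDF1 (implicit Euler) step for the semi-explicit index-1 DAE
         x' = -x + y      (differential equation)
         0  = 2*y - x     (algebraic constraint).
    Each step is a nonlinear system in the new values (xn, yn)."""
    def F(z):
        xn, yn = z
        return np.array([xn - x - h * (-xn + yn),   # discretized ODE
                         2.0 * yn - xn])            # algebraic equation
    def J(z):
        return np.array([[1.0 + h, -h],
                         [-1.0, 2.0]])
    return newton(F, J, [x, y])

# Integrate from consistent initial values (x, y) = (1, 0.5) to t = 1.
x, y, h = 1.0, 0.5, 0.01
for _ in range(100):
    x, y = bdf1_step(x, y, h)
print(x, y)   # x ≈ exp(-0.5) ≈ 0.6065; the constraint 2*y = x stays satisfied
```

Production codes use variable-order, variable-step BDF with error control, but every step has this same shape: a nonlinear residual handed to a Newton-type solver.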

Chapter 4

In this work, numerical nonlinear optimization methods are widely used. Besides recently developed algorithms, these include the standard implementations in MATLAB (by MathWorks) and gPROMS (by Process Systems Enterprise). In Chapter 4, we introduce the main ideas of unconstrained and constrained optimization, leading to the concepts of line-search, sequential quadratic programming (SQP) and interior-point methods, which are used within the software packages mentioned in this work. Optimal control is a separate field within optimization but, by applying control-vector parameterization, an optimal control problem can be treated as a nonlinear optimization problem; this is the approach taken in this work. We present the open-loop optimal control problem for time-minimal paper grade changes in wet-end processes. This can be described as a time-optimal control problem under certain specification constraints. We transform it into a feasibility problem and show that it can be solved by a sequence of unconstrained nonlinear programs, which leads to a concrete algorithm.
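To make the control-vector parameterization idea concrete, here is a hedged toy sketch (a generic scalar example, not the thesis's wet-end model): the control u(t) is restricted to be piecewise constant on a few intervals, the dynamics x' = -x + u are integrated per interval, and the resulting finite-dimensional problem is handed to a standard NLP solver with simple bounds.

```python
import numpy as np
from scipy.optimize import minimize

def simulate(u_pieces, x0=0.0, dt=0.5):
    """Integrate x' = -x + u exactly over intervals where u is constant."""
    x = x0
    for u in u_pieces:
        x = u + (x - u) * np.exp(-dt)   # closed-form solution on one interval
    return x

def objective(u):
    # Track a target end state and mildly penalize control effort.
    return (simulate(u) - 0.8) ** 2 + 1e-3 * np.sum(np.asarray(u) ** 2)

# Three control intervals -> a 3-dimensional NLP with box constraints.
res = minimize(objective, x0=np.zeros(3), bounds=[(0.0, 2.0)] * 3)
print(res.x, simulate(res.x))
```

The infinite-dimensional control problem has become a small bound-constrained NLP; the thesis applies the same reduction with the full process model in place of `simulate`.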

Chapter 5

In this chapter, we give a short discussion of algorithms for the global optimization of functions, outlining three widely used methods. We do this to find a suitable classification of the new algorithm presented in Chapter 6. Most of the global optimization methods that are frequently cited in the literature are designed for the minimization of real-valued, possibly non-smooth functions of multi-dimensional continuous and discrete variables, and nearly all of them use heuristics and stochastic methods. Our approach does not fit into the classes of global optimization methods presented here, and references to similar approaches in the literature are rather rare. We motivate why the idea of performing sequences of local minimizations, instead of direct searches or evolution strategies, fits best for the optimization problems that arise from the paper machine process models presented in Chapter 2.

Chapter 6

A new concept of the function modification approach for global optimization is presented in Chapter 6. The general function modification approach for the global optimization of smooth functions is to modify the objective function whenever a local solution is found. The idea developed in this thesis is based on the concept of tunneling functions presented in the 1980s, which is generalized in this chapter. The tunneling idea is to eliminate a local minimum by multiplying the function locally with a function that creates a singularity at that point. We extend this idea to smooth modification functions (so-called pole functions) and ellipsoidal regions of effect. We discuss some general properties of pole and tunneling functions for smooth objective functions and perform an empirical analysis of the concept. We derive a concrete algorithm for choosing start values for local optimization methods and for adaptive function modifications. Stochastic methods are used to analyze the performance of global optimization algorithms empirically by applying them to test functions. Adapting the idea of low-discrepancy nets for producing initial guesses that quickly cover the feasible region leads to a semi-deterministic global optimization approach. We define basins of attraction of local minima by means of gradient paths (or gradient flows) and show that a global optimization multi-start method using dense nets and exact gradient paths solves the problem of finding a finite number of discrete minima in finite time. Finally, we give an outlook on how such a method can be realized using adaptive tunneling concepts.
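The classical tunneling transformation from the 1980s literature, the starting point that the chapter generalizes (the pole-function and ellipsoidal extensions are not reproduced here), can be sketched as follows: after a local minimizer x* with value f* is found, the found minimum is "destroyed" by dividing out a pole at x*, so any point where the tunneling function becomes negative lies in a lower basin and serves as the next start value.

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    x = float(np.atleast_1d(x)[0])
    return x**4 - 4 * x**2 + x   # two minima; the global one is near x = -1.47

def tunnel(x, x_star, f_star, lam=1.0):
    """Classical tunneling function: a pole at the last minimizer x_star
    removes that minimum; points with f(x) < f_star give negative values."""
    x = float(np.atleast_1d(x)[0])
    return (f(x) - f_star) / (abs(x - x_star) ** (2 * lam) + 1e-12)

# Phase 1: local minimization from a start value in the wrong basin.
x_star = minimize(f, x0=1.0).x[0]
f_star = f(x_star)

# Phase 2 (tunneling): search for a point where the tunneling function
# is negative, i.e. where f drops below the best known value.
best = x_star
for x0 in np.linspace(-3.0, 3.0, 13):          # simple deterministic restarts
    cand = minimize(lambda z: tunnel(z, x_star, f_star), x0=x0).x[0]
    if f(cand) < f_star - 1e-8:
        best = cand
        break

# Phase 3: local minimization from the tunneled point.
x_glob = minimize(f, x0=best).x[0]
print(x_glob, f(x_glob))
```

The restart loop here is a naive stand-in; choosing those start values adaptively, and shaping the pole regions, is exactly what Chapter 6 is about.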

Chapter 7

Chapter 7 presents the numerical results obtained for numerical optimization applications to the presented models, as well as academic examples for the benchmarking of global optimization methods and for a new numerical algorithm for linearly constrained optimization problems. Optimal control problems are stated and solved using both a standard approach and the new approach discussed in Chapter 4. The tunneling concepts are discussed by means of a variety of benchmarks on a set of academic functions. The sensitivity analysis tool presented in Chapter 3 is applied to the drying section process from Chapter 2; there, a geometry optimization problem is derived and solved, and the results generated new insight into the conceptual layout of paper machine drying sections.
Global optimization is applied to the presented problems from the pulp and paper industry using the tunneling-type method of Chapter 6 to analyze the processes and discuss the results of the former optimizations.

1.1.1. Publications

Within this work, major contributions were made to the following publications and publications in preparation.

Conference proceedings: H. J. Post, G. Seitz and M. Weiss: Analysing process stability using advanced simulation techniques. UCM International Conference, 12–14 May 2008, Modelling and Simulation in the Pulp and Paper Industry – Proceedings, 2008, ISBN 978-84-691-3029-2, HUELLA DIGITAL, S.L. Edited by Universidad Complutense de Madrid.

Figure 1.1: This picture shows a modern paper machine; below it, a schematic of a paper machine can be seen. The production direction goes from left to right in both cases. Note that the drying section is usually hidden under a cover (see the picture and the multiple circles in the schematic drawing).

Patent applied for: Vorrichtung zur Herstellung einer Materialbahn (apparatus for producing a material web), Voith Patent GmbH, Heidenheim. DE–102009027609.2, July 2009.

To be published: G. Seitz and K. Urban: An adaptive tunneling algorithm for global optimization (working title)

To be published: G. Seitz and K. Urban: A projected gradient flow method for linearly constrained optimal control problems (working title)

1.2. Paper and Paper Industry

Holik writes in his Handbook of Paper and Board [Hol06]: Nobody can imagine a world without paper. Indeed, paper is of elementary relevance for human culture and civilization, and the history of both is closely linked; just imagine a world in which books could never be printed, preserving and spreading knowledge as fuel for science and culture.
Fig. 1.1 shows a picture of a (graphical) paper machine as well as a schematic drawing.
Not least because of its importance for human civilization and the economy, the paper industry needs to invest in research and development concerning timeless topics such as

• consumption of raw material and primary products,

• energy and water consumption,

• reliability (often called runnability) of whole plants,


• quality of paper and board.

The Research and Development (R&D) work is supported by sciences and tools like

• measuring technology,

• process simulation,

• material sciences and

• process chemistry.

Actually, process simulation can profit from the other disciplines and can be used to support research on all the topics above. Therefore, one can expect process simulation to play a central role in future R&D work, which definitely encourages the objectives of this work. We borrow from [Hol06] to outline the concept of the paper making process.

1.2.1. The Process of Paper Making

Historically, paper was once a rare material made by artisans. Today it is a mass product, but the steps of the production process are still the same in principle. The life cycle of paper begins with the supply of raw material. The raw materials are basically

• fibers,

• fillers,

• chemical additives and

• coating colors.

Fibers can be distinguished as mechanical or chemical pulp. The pulp is called mechanical if the material is natural, like groundwood, and chemical if it consists of wood fibers that have been chemically modified.
The paper making process in a plant begins with the delivery of raw materials, which are brought into suspension in water in the so-called stock preparation. The stock flows to the constant part, whose task is to produce a uniform mix of the desired components in certain ratios and supply it to the paper machine itself; that is why this part of the machine is called the constant part. Together with the stock preparation, these parts are called the wet-end, because the stock mainly consists of water there.
The paper machine starts with the headbox, which distributes the suspension of fibers and fillers evenly over the whole width of the machine so that the paper is uniform in both directions, machine direction (MD) and cross machine direction (CD). In the former (or wire) section, the paper web is formed by draining large amounts of water. After this process, the material in the machine can first be called paper, but it is still much too wet. Next, water is removed mechanically by pressing the paper in the press section. Since line loads are practically limited, the dewatering potential of pressing is limited, and the further drying is done thermally. In the drying section, the paper is heated by bringing it into contact with hot steel cylinders, which causes most of the remaining water to evaporate. The paper may be dried up to its final desired dry content, or optional process steps can be performed, such as coating, which disperses coating color onto the paper web, or the so-called calender, which is designed to smooth the paper surface.


1.2.2. About Paper Industry

The paper industry in Germany is the largest in Europe, with a sales volume of about 14.9 billion EUR in 2008 and a production of 22.8 million tons per year. At about 180 production sites, more than 3000 different sorts of paper are produced. Most of the paper is for graphical use, as in print media, and board, mainly for packaging. Roughly half of the German paper production is exported. For current economic data and facts, we refer to the online sources [VDP09, PTS09] as well as to the current edition of [Hol06].
Paper is mainly made of cellulose, groundwood and recovered paper, and the use of recycled paper has become important in the last decades, so that Germany has a usage ratio of about 68 % compared to an international average of 40 %. The paper producing industry faces the problem of water and paper circulation: the better this circulation can be realized, the higher the cost efficiency of paper making and the lower the environmental impact. So one seeks to reduce the usage of fresh water, cellulose and groundwood to a minimum while still satisfying the specifications of a desired sort of paper. Using raw materials and energy efficiently gains even more importance given the economic trend that many markets have not been growing for years and probably will not in the future. So companies of the paper producing industry, and also paper machine vendors like Voith Paper, have to find ways to use existing capacities more efficiently, or to design and operate new machines, to persist in a highly competitive market.
Nowadays, computer simulation can assist the planning and operation of a paper production plant. The better the engineers understand the characteristics of paper, the better the mathematical descriptions of the paper making processes can be. Describing the problems mathematically yields problems of large scale and high complexity that may not be solvable manually, either because an analytic solution is not available or because the manual solution consumes too much time. Numerical methods, implemented in commercial software packages that have gained popularity in recent decades, are needed to predict the behavior of the processes under defined conditions. It is essential that the methods used are well understood and that the problem formulation is carried out in parallel with a mathematical analysis of the model.
A goal for the future is to have a highly complex model of the whole paper making process, enabling precise predictions and therefore allowing detailed optimizations of desired output characteristics with respect to chosen inputs. Even more, the benefits and drawbacks of whole configurations and machine concepts, starting with a power plant delivering the energy and the steam, and a scheduled raw material transport system, up to the finishing of the paper and the delivery of complete reels, could then be compared. Such a system should be capable of predicting the behavior of paper machines online, that is, during operation, to assist the operator and give instructions on how to control the machine, or even to realize a fully automatic paper machine that runs at maximum efficiency.
In this work, we made a further step toward that goal by implementing models for paper making processes such as the dynamic wet-end and the drying section, defining optimization problems, and deriving optimization methods to solve these kinds of problems. The methods and models presented in this work can be used to optimize and analyze the processes and thus assist research and development in the paper industry.


1.3. Numerical Simulation in Engineering Application

We first discuss the general scientific approach of solving problems in engineering applicationwhich steadily attains more and more meaning for research and development.Every day problems in engineering application are describing, predicting and improving ofthe behavior of complex physical systems and devices, both dynamic and steady state. First,one must be able to understand the causal mechanisms that work in the system. Then, if itcan be described by mathematical formulas, it is up to mathematics to seek solvability of theresulting problem. This first part of the general approach of solving application problemscan be called modeling. Eck, Garcke and Knabner [EGK08] defined mathematical modelingas the description of phenomena in nature, engineering and economical sciences by meansof mathematical structures. Since there is no unitary definition of a general model, we willuse the term for the collectivity of all equations, conditions and transitions that are neededto describe our application problem. We will also imply that such a model always yields auniquely solvable mathematical problem, say that it is well posed.Once a mathematical model is set up, it consists of a certain number of equations, thathave to be solved. Solving equations analytically is the most desireable choice since theaccuracy of the solution would be very high and computation is fast. In practice, ordinarydifferential equations and partial differential equations may be used to describe the systemsof interest. As dimension gets higher or equations get more complicated, an analyticalsolution of the equations is usually getting harder to find, or even up to the impossibility todo so. Solving the equations numerically is the choice if stable solutions can be calculatedefficiently within a given tolerance. Numerical solutions go along with the discretizationof continuous systems and therefore produce approximations to the smooth and unknownsolution. 
This second part, which comprises assigning all necessary conditions of a system and solving the associated model equations, can be called simulation. Simulation can be seen as a theoretical experiment run on a computer. Exactly there lies the big chance in the ability to simulate, since these experiments are often performed faster and are much less expensive than real experiments. Therefore, simulations can be used to predict the system's behavior. The result of a numerical simulation shall be called a state.

A very natural next step is to make use of the ability to predict the state of the system for a given set of conditions. If the described system is large or complex, causal relations between conditions and states are not trivial and not always known, even to experts. Sensitivity analysis, essentially the calculation of partial derivatives, exposes the dependencies between given inputs and calculated outputs. Knowing these is already a benefit and yields the chance to gain a better understanding of the system, but furthermore they can be used as gradient information for numerical optimization. The optimization objective has to be defined in such a way that it is part of the output of a simulation run. Minimizing or maximizing the objective, respectively, has to be achieved by finding appropriate decision variables. To keep the notation from becoming confusing, some synonyms shall be allowed: depending on the context, the decision variables of an optimization problem are called

• controls or control variables to emphasize their physical function in the system,

• inputs, because they are such for the simulation as a black-box,

• variables, since they can be varied by the user or the optimizer,

• or parameters to emphasize their time invariant character.


[Figure 1.2 shows a cascade of problem classes: an optimal control problem is reduced to a nonlinear program, which is reduced to quadratic programs and finally to linear systems; on the simulation side, differential-algebraic equations lead to nonlinear systems and these in turn to linear systems.]

Figure 1.2.: An exemplary embedding concept of solving optimal control problems by sequential quadratic programming.

All of these names shall be valid in this work. Control variables can be seen as part of the totality of all conditions that have to be given in order to run a simulation. When we set up a simulation task or optimization problem, consequently all controls have to be assigned. In the end, it is part of the problem setup which conditions of the system are to be interpreted as control variables and which are not. This part of the scientific approach can shortly be called optimization.

So describing, predicting and improving yields modeling, simulation and optimization. While modeling needs a deep understanding of the underlying system, such as the physics in engineering, simulation and optimization are disciplines of applied mathematics, and well-known numerical methods can be applied here.

In fact, separating modeling, simulation and optimization is a theoretical exercise. In practice, a possible sequence is: modeling, parameter estimation, simulation, optimization. Here, parameter estimation or identification denotes the task of fitting unknown model parameters so that the simulation results match some observed data as well as possible. The data fitting problem itself is a special case of an optimization problem and has to be solved before reliable simulations or optimizations can be performed.

As problems get larger and harder to solve, users require more and more sophisticated solution methods in order to achieve their goals. The simplest model of an engineering process is a scalar linear relation between a state and a control. This can be solved with pen and paper or by simple basic computations on a computer. If more variables are involved and there are interdependencies between them, the problem to be solved is a linear system of equations, which can be solved by direct and iterative methods (provided solvability).
And if at least one of the equations in the system is not linear, we have a nonlinear system of equations. As usual, numerical methods seek to solve a complex problem by a sequence of simpler problems for which a solution method is already available. In the case of nonlinear systems of equations, this means that they can be solved numerically by solving several linear systems.
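To fix ideas, this reduction can be sketched for a small example. The following snippet (an illustrative sketch, not part of any thesis code; the example system and all values are invented) applies Newton's method to a two-dimensional nonlinear system, so that each iteration indeed amounts to the solution of one linear system:

```python
def newton2(F, J, x0, tol=1e-12, max_iter=50):
    """Newton's method for F(x, y) = (0, 0): every iteration solves one
    2x2 linear system J * delta = -F, here by Cramer's rule."""
    x, y = x0
    for _ in range(max_iter):
        f1, f2 = F(x, y)
        if max(abs(f1), abs(f2)) < tol:
            break
        (a, b), (c, d) = J(x, y)       # Jacobian entries, row-wise
        det = a * d - b * c            # assumed nonzero near the solution
        x += (-f1 * d + f2 * b) / det  # Newton update from Cramer's rule
        y += (-a * f2 + c * f1) / det
    return x, y

# Illustrative system: x^2 + y^2 = 4 and x*y = 1
F = lambda x, y: (x * x + y * y - 4.0, x * y - 1.0)
J = lambda x, y: ((2 * x, 2 * y), (y, x))
x, y = newton2(F, J, (2.0, 0.5))
```

For larger systems, the 2x2 solve is replaced by a general (sparse) linear solver, but the structure, one linear system per Newton iteration, stays the same.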


A similar approach is common for problems stated as differential equations. In general, differential equations cannot be solved analytically, and numerical methods are designed to transform the differential equation into a set of subproblems of a well-known type. For ordinary differential equations this can be the nonlinear system that arises when using an implicit discretization scheme, and for partial differential equations we usually get large linear systems of equations, generated for example by finite-difference approximations.

If one states optimization problems based on simulation models, the process of downward simplification has to be continued. An example is the sequential quadratic programming (SQP) idea of solving a nonlinear optimization problem based on a nonlinear simulation model. The SQP approach solves the nonlinear program by a sequence of quadratic programs, each of which is solved by a sequence of linear programs. And for each evaluation of the objective function and its gradients, the simulation model, which is nonlinear, has to be solved, and so on. See Fig. 1.2 for an overview of the general concept of solving optimal control problems by solving nonlinear programs using SQP. We will discuss the details in Chapters 3 and 4.
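The first of these reduction steps, from an ODE to a sequence of nonlinear equations via an implicit discretization, can be sketched as follows (a scalar example with an assumed right-hand side; the step size and iteration limits are illustrative choices):

```python
def implicit_euler(f, dfdx, x0, h, n_steps):
    """Implicit Euler for the scalar ODE x' = f(x): each time step leads to
    the nonlinear equation g(z) = z - x_n - h*f(z) = 0 for z = x_{n+1},
    which is in turn solved by a few Newton iterations."""
    x = x0
    for _ in range(n_steps):
        z = x                                   # predictor: previous state
        for _ in range(25):                     # Newton corrector
            g = z - x - h * f(z)
            z -= g / (1.0 - h * dfdx(z))
            if abs(g) < 1e-13:
                break
        x = z
    return x

# Example: x' = -x**3 with x(0) = 1; exact solution x(t) = (1 + 2t)**(-1/2)
x_end = implicit_euler(lambda x: -x ** 3, lambda x: -3.0 * x * x, 1.0, 0.01, 100)
```

So a single ODE simulation already works through the bottom two layers of the cascade of Fig. 1.2; the optimization layers above it call this machinery many times.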


Chapter 2.

Process Simulation in Pulp and Paper Industry

In this chapter, we describe the basics of process simulation and detail two applications from pulp and paper industry, namely the wet-end process and the drying section, as examples of a dynamic and a steady-state process. Before describing the mathematical problems that arise in the simulation of dynamic and steady-state processes in the next chapter, we give a brief overview of the software packages available to solve these problems.

Process simulation in pulp and paper industry requires the mathematical modeling of processes in different parts of a paper machine or preceding plants, which can differ in their nature. Steady-state processes describe the general dependency of variables and parameters of a system without a specified time horizon. This can be used to describe relations between inputs and outputs and to analyze the behavior, which is relevant for production processes that are running in a constant state. Dynamic systems are used to model time-dependent relationships between inputs and outputs and to gain an understanding of the inertia and complexity of a process.

In this work, we describe dynamic systems governed by transport phenomena and material balances for the wet-end process of a paper machine. The process of paper drying can be described as a steady-state system of nonlinear equations, using discretized PDEs for the transport of water and energy along a paper web and through heated cylinders.

Most of the modeling and simulation tasks in this work are done with the software package gPROMS (general process modeling system) by Process Systems Enterprise (PSE). This is a software environment with a graphical user interface for equation- and event-based modeling of dynamic or steady-state process systems, with the capability of numerically solving sets of differential-algebraic equations and performing nonlinear programming tasks by a sequential quadratic programming method.
Both application examples from paper making, the wet-end process and the drying process, have been implemented in gPROMS 3.1.0 and were also run with gPROMS 3.2.0 when it was released.

Before we describe gPROMS in more detail, we give a very short introduction to the problems arising in process engineering, which encouraged the development of such software packages. The history of modeling and simulation in pulp and paper industry started when computers became powerful enough to solve the typical problems that arise in process simulation in reasonable time. Process simulation in pulp and paper industry requires the mathematical modeling of steady-state and dynamic processes in the different parts of a paper machine or preceding plants. In principle, a process model can consist of physical, chemical or at least purposive descriptions of the contained objects, which are arranged within flowsheets.


The Process Simulation Problem

A very helpful way to describe the general mathematical problem of process simulation is to formulate a set of nonlinear differential-algebraic equations (DAEs) in an implicit form

F(ẋ, x, y, u, t) = 0,    (2.0.1)

where ẋ ∈ R^nx, x ∈ R^nx, y ∈ R^ny, u ∈ R^m and t ∈ [0, T]. The variable x is called the differential state variable, ẋ is its derivative with respect to time and y is the so-called algebraic state variable. The vector u can consist of either time-varying controls or time-invariant parameters and denotes the input of the system. Note that all states can be explicitly and implicitly time-dependent.

DAEs combine the methods of nonlinear steady-state modeling with time-dependencies described by ordinary differential equations (ODEs). Therefore, they are suitable to model dynamic material balances along with reactions and interactions of system variables. This encourages the use of commercial software to numerically solve these kinds of problems for applications in pulp and paper industry. If partial differential equations occur while describing the physical systems and if these can be discretized (e.g. by finite differencing), the discretized versions in the form of linear systems of equations can become part of the DAE system to be approximated simultaneously.

The problem of solving DAEs is related to initial value problems of ODEs (or the Cauchy problem), but in contrast to them, the presence of the algebraic states has to be taken into account, and the initialization of the system requires the solution of a system of nonlinear equations. It is also possible to use steady-state initial conditions by setting ẋ(0) = 0 for some or all of the derivatives. In any case, initial conditions have to be consistent in such a way that (2.0.1) is solved at t = 0.

A special case of DAEs is a classical set of nonlinear equations, obtained by removing all time derivatives of the differential state variables or setting them to 0.
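The interplay of differential and algebraic states can be sketched on a tiny semi-explicit special case of (2.0.1). The following snippet (purely illustrative; the equations, step size and tolerances are invented for the sketch and do not come from the process models of this work) advances one differential and one algebraic state together with an implicit Euler scheme, solving the coupled residual in each step:

```python
def dae_implicit_euler(x0, u, h, n_steps):
    """Implicit Euler for a small semi-explicit DAE in the spirit of (2.0.1):
        x' = -y + u    (differential equation)
        0  =  y - x    (algebraic constraint)
    Each step solves the coupled residual for (x_{n+1}, y_{n+1}) by Newton's
    method with a forward-difference Jacobian."""
    def residual(z, x_n):
        x, y = z
        return (x - x_n - h * (-y + u),   # discretized differential part
                y - x)                    # algebraic part, enforced at t_{n+1}
    x, y = x0, x0                         # consistent initialization: y(0) = x(0)
    for _ in range(n_steps):
        z = [x, y]
        for _ in range(20):               # Newton on the 2x2 residual
            r = residual(z, x)
            if max(abs(r[0]), abs(r[1])) < 1e-12:
                break
            eps, cols = 1e-7, []
            for j in range(2):            # finite-difference Jacobian, column j
                zp = list(z)
                zp[j] += eps
                rp = residual(zp, x)
                cols.append([(rp[i] - r[i]) / eps for i in range(2)])
            a, c = cols[0]
            b, d = cols[1]
            det = a * d - b * c
            z[0] += (-r[0] * d + r[1] * b) / det   # Cramer's rule update
            z[1] += (-a * r[1] + c * r[0]) / det
        x, y = z
    return x, y

# With u = 0 and x(0) = 1, eliminating y gives x' = -x, so x(t) = exp(-t)
x_end, y_end = dae_implicit_euler(1.0, 0.0, 0.01, 100)
```

Note how the algebraic constraint is enforced at every time level, and how a consistent initialization is needed before the first step, exactly the two points in which DAEs differ from pure ODE initial value problems.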
In the following chapter, we outline solution methods for DAEs that are now widely available and also capable of performing sensitivity analysis in order to compute gradients of state variables with respect to inputs.

It was noticed that dynamic processes are mainly governed by continuous dependencies, but the use of digital regulators and the simulation of user controls gives rise to the need for discrete events. These discrete events might change the value of control variables, re-assign algebraic state variables, re-initialize differential state variables or even replace whole equations of the model. Clearly, a discrete event requires the simulation to stop, locate the event and re-initialize the whole system. An event can be triggered explicitly by reaching an a priori known event time. Such an event time can also depend on earlier events; in [Cel79] a distinction is made between exogenous and endogenous time events and state events. This is needed to discuss methods to identify, locate and handle discrete events in order to have a systematic way of solving dynamic systems with discontinuities in the form of triggered events. State events are always triggered implicitly by hitting some logical event condition that can depend on all state variables. As shown in [BP94], this leads to a decomposition of the time horizon into intervals with smooth system behavior, divided by discrete events. For each of the intervals, a different set of equations holds. This can be written as

F^k(ẋ^(k), x^(k), y^(k), u^(k), t) = 0,    t ∈ [t_k, t_{k+1}),    (2.0.2)

in the k-th interval. The number of events is not necessarily known a priori, which is why no range is given. By monitoring all (in fact boolean) event conditions during simulation, the end of the k-th interval can be located within a certain event tolerance. In each interval, the system functions F^k are assumed to be smooth. Refer to the next chapter for numerical details.
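The location of a state event within an event tolerance can be sketched as follows (an illustrative toy, not the method used by any particular solver: a fixed-step explicit Euler integrator that, once an event function changes sign inside a step, bisects on the step length until the event time is bracketed to the requested tolerance):

```python
def integrate_until_event(f, event, x0, h, t_end, tol=1e-8):
    """Explicit Euler integration of x' = f(x) that monitors a scalar event
    function. When the event function changes sign within a step, the event
    time is located by bisection on the step length, i.e. within a given
    event tolerance, before the smooth interval is ended."""
    t, x = 0.0, x0
    while t < t_end:
        slope = f(x)
        x_new = x + h * slope
        if event(x) * event(x_new) < 0.0:     # event bracketed in [t, t + h]
            lo, hi = 0.0, h
            while hi - lo > tol:              # bisection on the partial step
                mid = 0.5 * (lo + hi)
                if event(x) * event(x + mid * slope) < 0.0:
                    hi = mid
                else:
                    lo = mid
            s = 0.5 * (lo + hi)
            return t + s, x + s * slope       # state at the located event
        t, x = t + h, x_new
    return t, x

# Example: x' = -x + 5 from x(0) = 0; the condition x >= 4 is hit at t = ln 5
t_ev, x_ev = integrate_until_event(lambda x: -x + 5.0,
                                   lambda x: x - 4.0, 0.0, 0.001, 10.0)
```

The returned pair (event time, event state) is precisely what is needed to end the current smooth interval of (2.0.2) and to re-initialize the next one.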

2.1. Applications and Software

In [BP94], a process modeling environment for combined discrete/continuous processes is described, and a modeling language is presented to write equations for the differential and algebraic states as well as initial and event conditions. Classical examples from chemical engineering are given to show the problems that arise even for very simple examples, such as pressure vessels that open when the pressure exceeds a certain threshold. The opening and closing of the valves is modeled by discrete events that are monitored by simple inequality conditions on the state variable pressure. Another example is a liquid-filled tank with a weir. If the level rises above the level of the weir, the overflow is activated. Such a simple system needs some kind of switching condition that detects when to change the equation for the overflow through the weir. This shows that a fairly simple model of an engineering process already leads to a problem that needs special solution techniques, defining the requirements for commercial software solutions.
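The tank-with-weir example can be sketched in a few lines. All numbers and the weir discharge law below are hypothetical placeholders chosen only to make the switching behavior visible; they are not taken from [BP94]:

```python
def tank_with_weir(q_in, area, h_weir, c_weir, h0, dt, n_steps):
    """Liquid level in a tank with an overflow weir: below the weir crest the
    overflow is zero; above it, an (assumed) weir law is switched on. The sign
    change of (h - h_weir) is exactly the switching condition discussed here."""
    h = h0
    for _ in range(n_steps):
        if h > h_weir:                             # event condition active
            q_over = c_weir * (h - h_weir) ** 1.5  # hypothetical weir law
        else:
            q_over = 0.0
        h += dt * (q_in - q_over) / area           # material balance, explicit Euler
    return h

# The level settles where overflow balances inflow: h* = h_weir + (q_in/c_weir)**(2/3)
h_end = tank_with_weir(q_in=0.2, area=1.0, h_weir=1.0, c_weir=1.0,
                       h0=0.0, dt=0.01, n_steps=5000)
```

In an event-based solver, the `if` branch would be replaced by a monitored state event that stops the integration, switches the overflow equation and re-initializes, rather than being re-evaluated in every step.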

Applications

We refer to the work of [Dah08] for a recent overview of the software used in pulp and paper industry and their practical applications, with detailed references. The main applications are:

• Operator training: The simulation model is supposed to be used to help the operators of the plant understand the complexity of the process system and to predict the behavior under user-defined conditions. Until now, there are still practical problems in getting a reliable simulation of the process online in such a way that it is updated automatically and always synchronized with the real process.

• Data reconciliation: This means identifying the reliability of sensors in the real machine by comparing the measurements with a material- and energy-balance-based simulation.

• Decision support: A probabilistic approach based on a simulation model uses Bayesian networks to compute chances of system failures in [Wei02]. This means that the processes are modeled by complex networks of decisions and possible observations in the form of random variables.

• Root cause analysis: Data reconciliation can be used to identify biased sensors. Also, methods from decision support can be used to identify causes of problems in the process.

• Optimization and model predictive control (MPC): This is a very wide field in process simulation. The ability to predict the system's behavior for a chosen set of inputs by simulating the process system model gives the possibility of finding a certain input that results in a more desirable behavior. Steady-state optimization problems refer to classical nonlinear optimization problems, while the optimization of a dynamic process in principle means finding an optimal control strategy to achieve certain goals [FA00]. The intention of model predictive control is to set up regulator controls in the real plant that are optimized for a model of the underlying process, often a linear one. A regulator receives measurements online from a sensor in the process and, based on these measurements, computes the best control in terms of a given process model.

Nevertheless, process simulation might help to understand the complex couplings within the modeled systems. In order to do that, one can analyze responses of the simulation model such as the step or impulse responses to given input variables. In case a highly complex model of the complete process exists, a singular value decomposition (SVD) could be the method of choice to identify the main components influencing the process.

It is often desirable to work with linear time-invariant systems in order to be able to use the methods of system theory, such as frequency analysis [Por96]. However, such models can be constructed indirectly by building a nonlinear model of the process and using Taylor's expansion to linearize the model at a certain point, which yields a snapshot of the sensitivity system, a linear time-invariant (LTI) system.
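The Taylor linearization at an operating point can be sketched numerically with finite differences (the model f, the operating point and the perturbation size below are illustrative assumptions, not a model from this work):

```python
def linearize(f, x_op, u_op, eps=1e-6):
    """Finite-difference Taylor linearization of x' = f(x, u) at an operating
    point, yielding the LTI snapshot  x_d' = A x_d + B u_d  with A = df/dx
    and B = df/du (single scalar input assumed for brevity)."""
    n = len(x_op)
    f0 = f(x_op, u_op)
    A = [[0.0] * n for _ in range(n)]
    for j in range(n):                    # column j: perturb state j
        xp = list(x_op)
        xp[j] += eps
        fj = f(xp, u_op)
        for i in range(n):
            A[i][j] = (fj[i] - f0[i]) / eps
    fu = f(x_op, u_op + eps)              # perturb the input
    B = [(fu[i] - f0[i]) / eps for i in range(n)]
    return A, B

# Hypothetical nonlinear model: x1' = -x1 + x2^2,  x2' = -x2 + u
f = lambda x, u: [-x[0] + x[1] ** 2, -x[1] + u]
A, B = linearize(f, [1.0, 1.0], 1.0)
```

At the operating point (1, 1) with u = 1, the analytical Jacobians are A = [[-1, 2], [0, -1]] and B = [0, 1], which the finite differences reproduce up to the perturbation error. Tools that supply exact model derivatives can of course skip the finite differencing.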

Software

Currently, there are many software packages available; in particular, there are some highly developed ones in use in chemical engineering as well as in pulp and paper industries. One can distinguish between discrete-time and continuous-time software. While in the discrete-time case the solver for the DAEs uses fixed time steps, continuous-time solvers use adaptive methods to compute the solution of the system within a given tolerance. Advantages and drawbacks quickly become clear when facing the decision between performance, accuracy and interactivity. Until now, both types of software have their justification in different applications.

We give a short list of recently used software packages which we consider most powerful and promising for the future of process simulation in the pulp and paper industries. However, this list is far from being a complete overview of all the software available.

• Extend – FlowMac: FlowMac is an extension to the software environment Extend by Imagine That Inc., see

http://www.extendsim.com/.

It offers a library of models for use in pulp and paper industry, mainly based on material balances. The software Extend solves the dynamic systems either with a fixed time step or with a time stepping that depends on the occurrence of discrete events,

http://www.papermac.se/FlowMac

• IDEAS: This is based on the solution techniques of the software Extend and is nowadays part of the service of Andritz Automation, a paper machine vendor,

http://www.ideas-simulation.com/.

• Modelica – Dymola, OpenModelica: Modelica is the name of a modeling language for equation-based process modeling and simulation. It is based on the work of [CE93] and [OC96]. Dymola is the name of a commercial software product by Dynasim, which offers a graphical user interface and numerical solvers for the Modelica standard,

http://www.dynasim.se/.

With OpenModelica, there is an ambitious open source project with the goal of creating a complete Modelica modeling, compilation and simulation environment, based on free software distributed in binary and source code form,

http://www.ida.liu.se/pelab/modelica/OpenModelica.html.

• MATLAB Simulink: Simulink is a toolbox for MathWorks' software MATLAB for model-based simulations using block-type models in graphical flowsheets, with a seamless interface to other MATLAB routines and the capability of generating C code from the model and of post- or pre-processing the simulation data. There are also several extensions available for different applications such as physical modeling or control design,

http://www.mathworks.de/products/simulink/.

• gPROMS: We give an overview in the next section, since this is the software we mainly use in this work. For online information, refer to

http://www.psenterprise.com/gproms/index.html.

It is noteworthy that some work has already been done with this commercial software. In [AHM07], the authors describe a software tool called DyOS for control vector parameterization using gPROMS.

Except for FlowMac and IDEAS – which are designed for paper industry – all software packages are designed to be general and powerful development environments for modeling, simulation and optimization of process systems of any type and are not limited to pulp and paper industry.

2.1.1. gPROMS - An Overview

gPROMS is in fact a bundle of software tools with a common solver kernel to compute numerical solutions of given simulation and optimization problems. These are created in a central graphical user interface (GUI) called ModelBuilder. This tool can also be used to start certain tasks and analyze the results, but there are several ways to use gPROMS as a black-box solver engine and control the tasks from third-party software.

Graphical User Interface

The ModelBuilder can be used to do exactly what the name implies and to control further tasks like

• Scheduled Simulation,

• (Dynamic) Optimization and

• Parameter Estimation,

as well as exporting the model as an encrypted file object for use in external software with

• gO: MATLAB, to be used for input-output communication,


• gO: Run, which allows running gPROMS from the command line without using ModelBuilder,

• gO: Simulink for MATLAB Simulink,

• gO: CAPE-OPEN (CAPE means Computer-Aided Process Engineering), which is related to a standardized interface for applications, and

• gO: CFD, an interface to standard software packages for computational fluid dynamics(CFD).

These interface names starting with gO indicate that they are officially part of the gPROMS software package. These external software tools allow models built in gPROMS, along with the native solver routines, to be used within external software environments such as MATLAB (gO: MATLAB) or the shell of the operating system (gO: Run). In this way, gPROMS can be seen as a model development tool with flexible solver interfaces to be used as a black box.

In the GUI, model development and problem setup are done in an object-oriented way. Modeling and simulating in gPROMS are done by setting up different objects, so-called entities, with dedicated purposes. These are amongst others:

• Variable Type: This defines a class that a variable can be part of, with upper and lower bounds for its value. It can also be used to define an initial guess for variables of that type, which is taken whenever an initial guess is needed in numerical algorithms. Thus, it is a very important entity type for a successful initialization of the process model. The variable bounds are monitored during simulation and – if set to physical limitations – indicate modeling errors and can help to identify them.

• Connection Type: Connection types are entities for the interface between different units (models) in the flowsheet. A connection type defines which variables are 'submitted' via a so-called port. By connecting two models in the flowsheet, identity equations are set up automatically according to the variables in the connection type. This is basically a comfort feature for flowsheeting and hierarchical models, since all these dependencies can also be set up manually.

• Model: A model itself stands for a set of equations, assignments, parameters and initial conditions together with event conditions and user-defined ports of certain connection types. Every gPROMS simulation model needs at least one model entity. By drag-and-drop, models can be instantiated as units in the flowsheet. In this manner, arbitrarily complex models can be built by including other models. This automatically creates a top-down hierarchy of models, since all models can include other models.

• Process: A process is the main entity to control simulation activities and includes further assignments, initial conditions and solver parameters. It collects all models that shall be part of the simulation by including them as units. In a chronological schedule, the user can program explicit or implicit events with simple logical statements and commands such as 'CONTINUE UNTIL some event' or the resetting of assigned variables and re-initialization.


Numerical Solvers

In process simulation, a wide range of different mathematical and numerical problems arises from the fact that numerical algorithms usually decompose the original problem into subproblems that are easier to solve. The following table shows the hierarchical structure of the state-of-the-art solvers available in gPROMS, which in fact are implemented using the CAPE-OPEN standard ([Net09]) to make them replaceable. It shall be read top-down to illustrate the dependencies.

Control Vector Parameterization (CVP)              CVP SS, CVP MS
(Mixed-Integer) Nonlinear Programming ((MI)NLP)    SRQPD, OAERAP
Differential-Algebraic Equations (DAE)             DASOLV
Nonlinear Equations (NLS)                          BDNLSOL
Linear Equations                                   MA28, MA48

The table indicates a hierarchy from the top to the bottom in the case of a dynamic optimization problem; for pure simulation problems, the hierarchy starts with the solution of DAEs by DASOLV. First, a solver for the control vector parameterization (CVP) is called to build a nonlinear program. The suffix SS stands for single shooting and MS for multiple shooting, respectively, which are common ways of transforming optimal control problems into nonlinear programs.

The nonlinear program is then handled by an optimization solver depending on the type of the problem. If the decision variables include integer variables, the solver for the mixed-integer problem, OAERAP, is called. Otherwise, the sequential quadratic programming (SQP) solver SRQPD is called. In Chapter 4, we explain the concept of SQP and optimal control via nonlinear programming.

The evaluation of the objective function and the constraints, as well as of the gradients and Jacobians of the constraints of the nonlinear program, is performed by calling a solver for DAEs, namely DASOLV, and instructing it to perform the sensitivity integration along with the state integration. The solver is based on a predictor-corrector backward-differentiation formula (BDF) method and is explained in more detail in Chapter 3. Both state and sensitivity integration need Jacobian information of the DAE system, that is, the partial derivatives of the system equations with respect to all variables and inputs.

However, integrating DAEs comes with the need for the solution of systems of nonlinear equations, which is done by Newton-type methods and block decomposition in the solver BDNLSOL.

Finally, the need for sparse linear algebra operations occurs in many situations, such as the iterative solution of nonlinear equations with BDNLSOL.
MA28 and MA48 are direct solvers for sparse linear systems of equations based on BLAS routines and perform sparse LU decompositions.

The numerical methods used for DAEs and sensitivity systems, as well as for linear and nonlinear systems, are well known and successfully implemented in many other software applications. The methods will be described and further references given in Chapter 3.
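The idea of integrating sensitivities along with the states can be sketched on a scalar ODE (a minimal illustration with explicit Euler and invented values; DASOLV itself uses an adaptive BDF method, which is not reproduced here):

```python
def state_and_sensitivity(u, x0, h, n_steps):
    """Forward sensitivity integration for the scalar ODE x' = -x + u:
    the sensitivity s = dx/du obeys the variational equation s' = -s + 1,
    which is integrated alongside the state. This s is the kind of gradient
    information an NLP solver consumes."""
    x, s = x0, 0.0                  # s(0) = 0: the initial state ignores u
    for _ in range(n_steps):
        x, s = x + h * (-x + u), s + h * (-s + 1.0)
    return x, s

# At t = 1 the exact values are x = u*(1 - e^-1) + x0*e^-1 and s = 1 - e^-1
x1, s1 = state_and_sensitivity(u=5.0, x0=0.0, h=0.001, n_steps=1000)
```

In the general DAE setting, the variational equation is itself a (linear) DAE of the same size per parameter, which is why sensitivity integration shares the Jacobian information with the state integration.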

Modeling in gPROMS

First of all, it is known that a problem is well posed if there exists a unique and stable solution. For our discussions and the implementation of our process models we assume that


Figure 2.1.: This screenshot shows the gPROMS ModelBuilder interface with an open example project. The left embedded window includes the project navigator and gives access to all entities and the simulation or optimization results. In this example, a simple model with two variables called 'first order' is opened in language mode, and a 'Process' with a schedule can be seen in the middle of the picture.

the problems are well posed. However, it is hard to ensure well-posedness for general models, because even very simple nonlinear equations might have multiple solutions. So there is a chance to preserve well-posedness by careful modeling, i.e. by restricting the domains in suitable ways and using (at least) piecewise-continuous functions. To illustrate the modeling task within gPROMS, we show some screenshots from the graphical user interface in Fig. 2.1 and Fig. 2.3 and demonstrate the software with a very simple example modeling a first order differential equation with a discontinuous right-hand side. The right-hand side can be discontinuous because we allow the control u to be defined piecewise. More detailed examples with code in gPROMS language are given in Sections 2.2 and 2.3, where we present the models used in this work.

A typical workflow of modeling and simulating is:

(i) Open ModelBuilder and begin a new ’Project’.

(ii) Create a ’Variable Type’.

(iii) Create a ’Model’:


Figure 2.2.: This picture shows the visualization tool gRMS that comes with gPROMS. It is supplied with simulation results during the simulation calculation. The graphs can be updated during simulation or plotted at once after the simulation run. The graphs shown illustrate the results of the presented example from Fig. 2.1.

a) Define variables of the created type.

b) Write equations (one for each unassigned variable), assignments and initial conditions.

(iv) Create a new ’Process’:

a) Add a unit of the created model to the process.

b) Set up a simulation schedule.

(v) Press ’Run’.

(vi) Check the execution output and results files.

If no schedule is given, only the initialization of the system is calculated. This is equivalent to a steady-state simulation if there are no time derivatives given or all are set to zero at time 0.

The following example is illustrated by Fig. 2.1 and refers to the solution of the problem

dx(t)/dt = −x(t) + u(t),    t ∈ [0, 20],    (2.1.1)

with x(0) = 0 and u(t) ≡ 5 for t ∈ [0, 20]. Variables are declared in the following way, by writing them in the VARIABLE section of the project.

VARIABLE

x AS type
u AS type


Time-invariant parameters and constants are given in the PARAMETER-section,

PARAMETER

a AS REAL
b AS REAL

and they are defined in the SET-section:

SET

a := -1;
b := 1;

Now there are two variables declared. This means that the system needs exactly two equations and/or assignments in order to have a chance of being solvable. The EQUATION block may contain an ordinary differential equation like here:

EQUATION

$x = a * x + b * u;

The $-operator stands for the derivative with respect to time. All variables are implicitly time-dependent. Since this is only a single equation for the variable x, the system cannot be solved unless u is given somehow, for example in the ASSIGN block.

ASSIGN

u := 5;

Still we need an initial condition for the initial value problem to be solvable.

INITIAL

x = 0;

Finally, this is a model for a linear first order differential equation. By creating a new 'Process', a schedule for the simulation can be given.

SCHEDULE

CONTINUE FOR 100

In the 'Process', solution parameters such as solver tolerances for the numerical approximation of the solution or the accuracy of the detection of events can be defined. The schedule can have a form like:

SCHEDULE

SEQUENCE
  CONTINUE FOR 10
  RESET
    u := -5;
  END
  CONTINUE FOR 10
END


Figure 2.3.: In this screenshot, we can see an extension of the example project of Fig. 2.1. A second model is implemented and a third one, called system, connects the others to build a combined model achieving second order behavior. The two connected boxes are produced by dragging and dropping instances of the models into the model topology of system. Linking the model ports automatically creates model equations to submit the port information and visualizes this connection.

In that case, the simulation is paused after 10 seconds of simulated time, the variable u is reset to −5, and the simulation then continues for another 10 seconds. This models the problem of solving the above linear differential equation (2.1.1) with

u(t) := { 5, t < 10;  −5, t ≥ 10 }.  (2.1.2)

Analytically, equation (2.1.1) is not solvable over the whole time horizon, since we need the continuity of the right-hand side. However, it can be solved piecewise by re-initializing the system at the event time. This is what is called combined discrete/continuous modeling, as in [BP94], and it is handled in gPROMS. This demonstrates the way of implementing a first-order differential equation in gPROMS language. Ordinary differential equations of higher order first have to be transformed into a system of first-order differential equations.

An example of the hierarchical implementation of a second-order ordinary differential equation is given in the screenshot of Fig. 2.3. There, a copy of the model 'first order' is made and both models are extended by the introduction of a Port of a defined connection type. For each instance, this port carries information via a variable that is implicitly declared when the port is used in a model. A simple flowsheet is built by connecting instances of


the two models via this port, which automatically creates the identity equations for the port information variable. While the port can be seen as the output of one model by including an equation that identifies the port information with the state variable, it is used as an input for the other model by introducing an equation which identifies the input of the differential equation with the port information.

In this way, the two linear first-order system equations are connected by defining the input of one equation as the output of the other. The expression 'define' is mathematically not precise, since this is numerically realized by introducing an additional equation which is generally solved by finding a set of variables that produces a considerably small residual. How to handle partial differential equations (PDEs) on rectangular domains is discussed with the implementation of the drying process model in Section 2.2.1.

The visualization of the solution can be done in a tool called gRMS, which is supplied with simulation output 'on-the-fly' during simulation, see Fig. 2.2. The user can set up 2D and 3D plots containing several graphs, export them, or create templates for later use with other simulations of the same model.
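The effect of such a port connection can be mimicked numerically: the state of the first first-order system becomes the input of the second, and the combined system shows second-order behavior. A small Python sketch (our own naming and parameter values, not gPROMS syntax):

```python
def cascade_step(a=-1.0, b=1.0, u=5.0, t_end=20.0, dt=1e-4):
    """Two first-order systems x1' = a*x1 + b*u and x2' = a*x2 + b*x1.
    The 'port' identity simply makes x1 the input of the second model,
    so the combined system is of second order."""
    x1 = x2 = 0.0
    for _ in range(int(round(t_end / dt))):
        dx1 = a * x1 + b * u
        dx2 = a * x2 + b * x1
        x1 += dt * dx1
        x2 += dt * dx2
    return x1, x2
```

For a = −1, b = 1 and a constant input u = 5, both states approach 5, but x2 lags behind x1 in the transient, which is the characteristic second-order step response.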

2.2. The Drying Process

In this section, we describe a model for a thermal paper drying process. When the pulp enters the paper machine and is formed to paper, it has a very low dry content and mainly consists of water. This is necessary in order to have desirable flow characteristics of the pulp in the wet-end. About half of the water is drained out mechanically by pressing the paper onto wires. In order to reach the final dry content required by the specification of a salable paper, it has to be heated so that the water evaporates.

This is classically done by leading the paper onto hot rotating rolls, which causes the paper to heat up. The water can evaporate through the paper into the surrounding air if the dew point of the air is high enough. Thus, there are two main engineering tasks. The first one is to heat up the paper; the second one is to supply it with enough air and to take care of the humid air that is produced by the evaporation of water.

Paper is a porous medium. It consists of fibers, fillers and air bubbles with a complex three-dimensional structure. The description of the evaporation of water through porous media, i.e. paper, is a very complex task, since it includes capillary flow through this system of fibers, fillers and bubbles.

Related Work

In the literature, drying section models based on mass and energy balances have already been presented. One such model, a hierarchical model library called DryLib, is developed in Dymola using the Modelica language and published in [AS06, EH08, Ekv04, Wil95, SA05]. It can be used to model paper moisture and temperatures in paper machines. An important part of the library is the dynamic model of a steam-heated rotating cylinder. It is based on mass balances for water and steam within the cylinder and on energy balances for water, steam and the metal of the cylinder hull. Heat transfer between condensate and metal, as well as between metal and paper, is modeled proportionally to the temperature differences. The heat transfer coefficients remain as model parameters. It is assumed that the steam is saturated; therefore steam temperatures as well as densities and enthalpies can be described as functions of the steam pressure. This leads to nonlinear sets of equations. In [SA05], these equations are linearized to obtain a linear state space


model that describes the response of pressure and temperatures dependent on steam flows. The unknown parameters are estimated by using measured data from a calibration run of a real paper machine. This model for a steam-heated cylinder is used in the hierarchical model to describe the heat transfer from the steam to the paper. Finally, this model library is used to automatically control the moisture of the paper by adjusting the steam pressure and flows. In [AS06], an open-loop optimal control problem for the nonlinear model-predictive control of moisture is also formulated and solved. For this purpose, the DAE solver DASPK from [MP96] is used to calculate the necessary sensitivity analysis.

2.2.1. A Dryer Model

Basically, drying section models can be formulated as dynamic models, since there are dynamics at different scales such as the transport of the paper in a certain direction or the change of paper characteristics over time. This is convenient when the user would like to predict the dynamic behavior of the machine under certain changes in the controls such as steam pressure. However, it is not necessary to solve dynamic simulation problems for the dimensioning of drying sections and the analysis of steady-state cases.

In the following, we outline the model that was developed along with this work during the last two years of the author's studies at Voith Paper GmbH. While I was implementing it, I was supported by the engineers Hermann-Josef Post and Roland Mayer concerning process engineering details. Implementation details are widely based on the company's internal knowledge and must not be published here.

We state it as a dynamic model and transform it to a steady-state model by replacing all time derivatives by zero. To illustrate the implementation in gPROMS, we give code examples in gPROMS language comparable to the ones in the previous subsection.

The model is a one-dimensional description of the drying section in the direction of the production, the so-called machine direction. For a specific drying section of a paper machine, the length of the paper from the entrance to the exit of the dryer is fixed. The paper in a dryer is indeed three-dimensional, but we omit the width of the machine and the thickness of the paper by neglecting boundary effects. Then, the length of the paper in the dryer can be decomposed into subsequent zones with similar environmental conditions. These zones are basically:

(i) Paper on a heated cylinder.

(ii) Paper surrounded by air between a heated and an unheated cylinder.

(iii) Paper on an unheated cylinder.

(iv) Paper surrounded by air between an unheated and a heated cylinder.

This sequence is repeated until the end of the drying section. In fact, the zones themselves can be separated again depending on the presence of a drying wire. We first need to declare some variables with concrete physical meaning. In the model, these variables can be used as time-invariant, explicitly time-dependent, implicitly time-dependent or as dependent on the position.

• u – moisture content (or water content) in [kg water/kg solid material]

• v – machine speed in [m · s−1]


• m – water evaporation rate in [kg ·m−2 · s−1]

• G – specific mass of the solids in the paper in [kg ·m−2]

• T – temperature of the paper in [K]

• cpP – heat capacity of the paper in [J · kg−1 ·K−1]

• Gb – total specific mass of the paper in [kg ·m−2]

• ∆VH – evaporation enthalpy in [J · kg−1]

• R – universal gas constant in [J ·K−1 ·mol−1]

• Mw – molar mass of water [kg ·mol−1]

Let t be the time in seconds and p the position in meters. The following balance equations describe the transport of moisture along a given zone of length ℓ in machine direction. The main transport equation for the moisture at time t is given by

∂/∂t u(t, p) + v(t) · ∂/∂p u(t, p) = −m(t, p)/G(t, p),  p ∈ (0, ℓ],  (2.2.1)

or in steady-state by

v · ∂/∂p u(p) = −m(p)/G(p),  p ∈ (0, ℓ],  (2.2.2)

where ∂/∂t u(t, p) = 0 and t is omitted in the notation. This is a simplification of the two-dimensional model presented in [Ekv04]. The two-dimensional approach has an additional term for diffusion in direction of the paper thickness. Since paper is an inhomogeneous medium, it is hard to describe diffusion coefficients, and measurements from within the paper are not available. Furthermore, diffusion in machine direction can be neglected due to high machine speeds; the transport equation term is dominant. Our simplification makes u(t, p) an average moisture content with respect to paper thickness at time t and position p.

Equation (2.2.1) or (2.2.2), respectively, needs initial values for the differential variable u at time t = 0 in the dynamic case and a boundary condition for p = 0. This equation is used to model the water transport in a certain zone, and all zones are connected subsequently. If we enumerate the zones by i = 1, 2, . . . , we can introduce

u(i)(t, ℓ(i)) = u(i+1)(t, 0),  t ≥ 0  (2.2.3)

and we can give

u(1)(t, 0) = u(0)(t)  (2.2.4)

by an input moisture content u(0)(t). When modeling partial differential equations in gPROMS, the user needs to define a discretization scheme. The transport direction of moisture depends on the sign of v(t), which is always positive during production. Thus, we choose an equidistant backward-difference scheme to discretize the variable u in machine direction, see Fig. 2.4. This means that we choose a number n(i) of nodes for each zone and generate n(i) equations, one for each node. From now on, the symbols for the variables


Figure 2.4.: This figure shows a typical discretization scheme for the approximation of a variable like the moisture content in machine direction, with nodes p1, . . . , p5 and values u(t, p1), . . . , u(t, p5). In our sequential model, we have p0 = 0 and u(t, 0) given by the final discretization node of the previous zone.

shall denote their numerical approximations, namely the newly introduced discretized variables. By replacing the partial derivative with respect to p by its backward finite difference approximation, we obtain the approximative system

∂/∂t u(t, p1) + v(t) · (u(t, p1) − u(t, p0))/∆p(i) = −m(t, p1)/G(t, p1)

∂/∂t u(t, p2) + v(t) · (u(t, p2) − u(t, p1))/∆p(i) = −m(t, p2)/G(t, p2)

...

∂/∂t u(t, pn(i)) + v(t) · (u(t, pn(i)) − u(t, pn(i)−1))/∆p(i) = −m(t, pn(i))/G(t, pn(i))

for

pj = (j/n(i)) · ℓ(i),  j = 1, . . . , n(i)  (2.2.5)

and

∆p(i) := ℓ(i)/n(i).  (2.2.6)
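In the steady-state case (2.2.2), the backward-difference scheme reduces to an explicit recursion u_j = u_{j−1} − ∆p · m_j/(v · G_j), which marches through the zone node by node. A Python sketch (m and G are taken constant purely for illustration; in the real model they depend on u and T):

```python
def steady_state_moisture(u_in, length, n, v, m, G):
    """March the backward-difference form of (2.2.2) through one zone:
    u_j = u_{j-1} - dp * m / (v * G).  With m and G constant the exact
    profile is linear and the scheme reproduces it exactly."""
    dp = length / n
    u = [u_in]
    for _ in range(n):
        u.append(u[-1] - dp * m / (v * G))
    return u
```

For example, with u_in = 1.2, a zone of 2 m, 5 nodes, v = 20 m/s, m = 0.05 and G = 0.08, the last node equals (up to rounding) the exact value u_in − (m/(v·G)) · length = 1.1375.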

For the implementation of such a model, this means that a finite number of variables is declared to approximate the smooth solution of u in machine direction. In gPROMS code, for the moisture content to be discretized in machine direction, we need to define a so-called 'Distribution Domain'.

DISTRIBUTION_DOMAIN
    MachineDirection AS (0 : ZoneLength)

Then, a discretization of equation (2.2.1) like the one above has the following short form.

FOR p = 0|+ TO ZoneLength DO


    $u(p) + PARTIAL(u(p), MachineDirection) = - m(p) / G(p)
END

The PARTIAL-operator in the FOR-loop has the effect that gPROMS generates approximations of the partial derivatives by automatically creating additional variables and equations for the discretization nodes. Note that all variables are implicitly time-dependent in gPROMS language. Each cycle of the loop creates an equation which is filled with the new node variables.

The variables m and G are implicitly time-dependent, since they depend on u and the temperature of the paper T, and therefore need to be discretized at the same nodes as the differential variable. The mass of the solids in the paper G is only transported by the moving paper itself,

∂/∂t G(t, p) + v(t) · ∂/∂p G(t, p) = 0,  p ∈ (0, ℓ],  (2.2.7)

while in steady-state this becomes

v · ∂/∂p G(p) = 0  ⇔  G(p) ≡ const.  (2.2.8)

The transport of energy in terms of paper temperature is modeled analogously by a one-dimensional transport equation.

∂/∂t T(t, p) + v(t) · ∂/∂p T(t, p) = (Gb(t, p) · cpP(t, p))⁻¹ · S(t, p)  (2.2.9)

Here, the variable S stands for a sum of flows into sinks and from sources of enthalpy. Which summands in S are active depends on the type of the current zone. We write

S(t, p) = Scyl(t, p) + Sw(t, p) + Sair(t, p) + Sev(t, p). (2.2.10)

It is an accumulative term, and the potential summands are given in the following. The α-values are heat transfer coefficients, and the subscripts and superscripts denote the involved materials.

• Heat exchange of paper with the cylinder surface:

Scyl(t, p) = αPcyl · (Tcyl(t, dcyl)−T(t, p)) (2.2.11)

Note that the temperature of the cylinder surface is assumed to be constant along the discretization zone. The temperature Tcyl(t, dcyl) itself is described by the heat equation model for a steam-heated cylinder outlined below.

• Heat exchange with the wire:

Sw(t, p) = αPw · (Tw(t, p)−T(t, p)) (2.2.12)

The wire also changes its temperature while it is in contact with the paper. It can be modeled analogously.


Figure 2.5.: A two-dimensional cross-section of a drying section with zones (i)–(iv). It is a sequence of circles and lines. The lines represent the paper and have to be tangents of the circles. Note that the wire runs in direct contact with the paper; this means that the paper web runs between cylinder and wire in zone (i), while in zone (iii) the wire runs between cylinder and paper.

• Heat exchange with the air:

Sair(t, p) = γair ·αPair · (Tair(t)−T(t, p)) (2.2.13)

Here, γair > 0 is a factor to model the fraction of the free surface. It depends on the type of the contact between paper and air.

(air – paper – air): γair := 2
(air – paper – wire – air): γair := 1 + γw
(cylinder – paper – wire – air): γair := γw
(cylinder – paper – air): γair := 1

An empirical porosity ratio γw for the wire has to be chosen in (0, 1).

• Evaporative heat loss:

Sev(t, p) = −∆VH(t, p) · m(t, p) (2.2.14)

The enthalpy of evaporation of water is itself a function of the paper temperature and thus depends on the position p.

Independent of the type of zone to be modeled, S always has to include the evaporative heat loss term. When thinking in terms of modeling in gPROMS, we have to note that every functional dependency of a variable, like the enthalpy of evaporation, on a discretized variable raises the need to discretize this variable as well. So increasing the number of discretization nodes can quickly increase the total number of variables in the system.

A central equation to describe the evaporation of water from paper to air is given by the so-called Stefan equation, see [Kri97].

m(t, p) = Mw · (η/R) · (pa/T(t, p)) · log((pa − pwa(t))/(pa − pwP(t, p))).  (2.2.15)


Figure 2.6.: This figure illustrates the general structure of the system to be modeled (from bottom to top: steam, condensate, cylinder hull, paper, wire, air). The paper is covered by a wire and lies on a cylinder which is heated by condensed steam.

Here, the ambient pressure pa is given and assumed to be constant, and η is an unknown diffusion coefficient with a suitable physical unit. The logarithmic term in this equation includes a partial pressure relation. The logarithm changes its sign when the difference between the partial pressure of water vapor in the air and the partial pressure of water vapor in the paper is zero. If pwP(t, p) > pwa(t), we have evaporation, and if pwP(t, p) < pwa(t), we have condensation. Now we have the evaporation rate as a function of the paper temperature and partial pressures. If we assume saturated water vapor, the equation of Clausius-Clapeyron [Atk93] can be used to derive the partial pressure as a function of the temperature.

pwP(t, p) = exp(11.82 − 3891/(T(t, p) − 43.15))  (2.2.16)

pwa(t) = exp(11.82 − 3891/(Tair(t) − 43.15))  (2.2.17)

The constants in these equations are taken from [Ekv04]. Note that these equations hold for the phase equilibrium. In Fig. 2.6 we can see the order of steam, condensate, cylinder, paper, wire and air for a heated cylinder.
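Equations (2.2.15)–(2.2.17) combine into a small numerical sketch. Note the assumptions: the diffusion coefficient η is unknown in the model and is set to 1 here purely for illustration; the constants in (2.2.16) appear to yield pressure in bar (the value at 373.15 K is close to 1); Mw and R take their usual SI values.

```python
import math

def p_sat(T):
    """Saturation partial pressure of water vapour, eqs. (2.2.16)/(2.2.17);
    with the constants from [Ekv04] the result is of order 1 bar."""
    return math.exp(11.82 - 3891.0 / (T - 43.15))

def evaporation_rate(T_paper, T_air, p_a=1.0, eta=1.0, M_w=0.018, R=8.314):
    """Stefan equation (2.2.15).  eta = 1 is an illustrative placeholder;
    the sign of the result follows the sign of pwP - pwa."""
    return (M_w * eta / R) * (p_a / T_paper) * math.log(
        (p_a - p_sat(T_air)) / (p_a - p_sat(T_paper)))
```

For warm paper (70 °C) in cooler saturated air (50 °C) the rate is positive (evaporation); with the temperatures reversed it becomes negative (condensation); for equal temperatures it vanishes.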

Modeling of a Steam-Heated Cylinder

We present a model for a steam-heated rotating cylinder to compute the surface temperature in a hierarchical drying section model. In principle, the cross-section of the hull of such a cylinder is a ring. This could be approximated by a rectangular domain, see Fig. 2.6 for the system's layout. But first, we assume that the rotation of the cylinder is fast enough so that we can ignore the length of the domain, which would be dominated by the rotation speed. So there only remains a temperature profile in direction of the wall thickness to be described. On the lower side of the domain, there is condensed steam, and on the upper side there is either paper or air. We would like to model the temperature of the cylinder hull, and this leads to a boundary value


Figure 2.7.: Here, we can see the main streams in the drying model. In vertical direction we have one-dimensional flow in the cylinder hull (heat equation) and in the horizontal direction we describe transport along the paper and wire (transport equation). The arrows indicate that there is an exchange of heat. Note that water is only exchanged between paper and air.

problem using the heat equation of the form T′ = ∆T,

∂/∂t Tcyl(t, p) = λ · ∂²/∂p² Tcyl(t, p),  p ∈ [0, dcyl]  (2.2.18)

with

λ = ωcyl/(ρcyl · cpcyl).  (2.2.19)

The parameter dcyl denotes the thickness of the cylinder hull. The diffusion coefficient λ depends on the heat conductivity ωcyl, the density ρcyl and the heat capacity cpcyl of the hull material, steel. We assume that the steel is a homogeneous material and therefore λ is constant. In the steady-state case, the solution of the heat equation is linear if the diffusion coefficient λ is constant. This strongly simplifies the problem to be solved. The boundary conditions for the heat equation are given as Neumann boundaries by the heat transfer from the condensate and the paper or the air, respectively.

−ωcyl · ∂/∂p Tcyl(t, 0) = αcylcond · (Tcond(t) − Tcyl(t, 0))  (2.2.20)

The temperature of the condensate is calculated analogously as before by the Clausius-Clapeyron relation and a given steam pressure, see again [Atk93]. The surface of the cylinder either has contact with paper or with air. The associated fraction can be computed by calculating wrapping angles as described in the next subsection. The heat flow on the surface is the sum of the heat flow from the cylinder to the paper QP and the heat flow from the cylinder to the air Qair.

−ωcyl · ∂/∂p Tcyl(t, dcyl) = Qair(t)/Aair + QP(t)/ℓc  (2.2.21)

Anticipating the geometry calculations from the next subsection, we assume that ℓc is the length of the zone in which the paper has direct contact with the cylinder surface. The


Figure 2.8.: This figure illustrates the control volumes for the water and enthalpy balances which are needed to determine the temperature and the moisture content of the air that surrounds the paper. A control volume consists of a circle segment zone on a cylinder followed by a tangent zone. This means that we assume constant environment conditions throughout a whole zone.

free surface of the cylinder hull remains, and its area is given by Aair. Then, the total heat conduction to the paper results from integrating the local heat transfer over the whole length of the discretized zone. The heat flows are given as

QcylP(t) = ∫₀^ℓc αPcyl · (Tcyl(t, dcyl) − T(t, p)) dp  (2.2.22)

and

Qcylair(t) = αaircyl · Aair · (Tcyl(t, dcyl) − Tair(t))  (2.2.23)

for the heat flow to the surrounding air. See Fig. 2.7 for a schematic drawing of the streams.
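Because the steady-state temperature profile in the hull is linear, T(p) = T0 + s·p with constant conductive flux −ωcyl·s, the two Neumann conditions (2.2.20) and (2.2.21) determine T0 and s completely. The following sketch solves the resulting 2×2 linear system under the simplifying assumption that the entire outer surface is in contact with paper (the Qair term is dropped); all parameter values are illustrative only:

```python
def hull_profile(T_cond, T_paper, d, omega, alpha_cond, alpha_paper):
    """Steady state of (2.2.18): T(p) = T0 + s*p with constant flux
    -omega*s.  Condition (2.2.20) at p = 0 and a paper-only version of
    (2.2.21) at p = d give a linear 2x2 system for T0 and s."""
    # alpha_cond*T0  - omega*s                  = alpha_cond*T_cond
    # alpha_paper*T0 + (alpha_paper*d + omega)*s = alpha_paper*T_paper
    a11, a12, b1 = alpha_cond, -omega, alpha_cond * T_cond
    a21, a22, b2 = alpha_paper, alpha_paper * d + omega, alpha_paper * T_paper
    det = a11 * a22 - a12 * a21
    T0 = (b1 * a22 - a12 * b2) / det     # hull temperature at p = 0
    s = (a11 * b2 - b1 * a21) / det      # slope, negative (heat flows outward)
    return T0, s
```

With, say, T_cond = 420 K, T_paper = 350 K, d = 0.025 m, ωcyl = 50 W/(m·K), and heat transfer coefficients 2000 and 1000 W/(m²·K), the hull temperatures lie between paper and condensate temperature, and both boundary conditions reproduce the same flux −ωcyl·s.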

A Simplified Steady-State Air Control Volume

In the equations from above, the variables Tair for the temperature of the surrounding air and pwa are still unknown. These depend on assumptions on the environment of the paper and the cylinder. We need a control volume of air to balance enthalpy and water. In the following, we outline an equilibrium model for a control volume. There, we assume that energy and moisture are in equilibrium states; thus, at every time t, we will get a balanced value for temperature and humidity of the air. The water load uair then determines the partial pressure of the


water vapor by the known formula

pwa(t) = pa · uair(t)/(0.622 + uair(t)),  (2.2.24)

see [Hol06]. The enthalpy balance in steady state is

Qinair(t) = Qoutair(t)  (2.2.25)

and the water balance is

Finair(t) = Foutair(t).  (2.2.26)

To set up the detailed balances, we need to collect all inflows and outflows of enthalpy and water. In Fig. 2.8, we see the components of an air control volume. It combines the air surrounding a zone on a cylinder with the following tangent zone. The volume of this domain has no effect for a steady-state model. We assume that the air surrounding these two subsequent zones is balanced with respect to all inputs and outputs.

It is assumed that each control volume is supplied with a fixed amount q0air of fresh air of certain humidity u0air and a temperature T0air. The total water flow M(t) is the sum of the water flows from each connected paper zone (integral over the evaporation rate). Let Mdry be the mass flow of dry air, which is equal for input and output of the steady-state system. The water mass balance for steady state is

u0air · Mdry + M(t) = uair(t) · Mdry,  (2.2.27)

where the left-hand side is Finair(t) and the right-hand side is Foutair(t),

and this is equivalent to

uair(t) = u0air + M(t)/Mdry  (2.2.28)

because Mdry > 0 can be assumed. The mass flow of dry air Mdry is given by

Mdry = ρ0air · q0air/(1 + u0air),  (2.2.29)

where ρ0air is the density of the humid air which is blown into the control volume. It follows

uair(t) = u0air + ((1 + u0air)/(ρ0air · q0air)) · M(t),  (2.2.30)

thus, the steady-state moisture content of the air is proportional to the total evaporation. For the enthalpy balance, we have

Qoutair(t) = Tair(t) · cpair · Mdry + Tair(t) · cpw · (u0air · Mdry + M(t))  (2.2.31)

for the outflow of enthalpy and

Qinair(t) = T0air · cpair · Mdry + T0air · cpw · u0air · Mdry + QPair(t) + Qcylair(t)  (2.2.32)

for the inflow. The flow from paper to air QPair consists of direct heat transfer by contact and indirect heat transfer by evaporation, which has the form

QPair(t) = Σzones ( ∫ Sair(t, p) dp + cpw · ∫ T(t, p) · m(t, p) dp ).  (2.2.33)

In each zone, we have to integrate the heat exchange Sair and the enthalpy flow in terms of water vapor m. The heat exchange with the hot cylinder is given by Qcylair as previously described.
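The water side of the control volume balance, eqs. (2.2.24), (2.2.29) and (2.2.30), can be sketched directly (the enthalpy balance is omitted here, and all numbers in the usage note are illustrative):

```python
def air_state(u0, q0, rho0, M_evap, p_a=1.0):
    """Steady-state humidity of one air control volume:
    dry-air mass flow (2.2.29), water balance (2.2.30) and the
    resulting water-vapour partial pressure (2.2.24)."""
    M_dry = rho0 * q0 / (1.0 + u0)              # (2.2.29)
    u_air = u0 + M_evap / M_dry                 # (2.2.30)
    p_w = p_a * u_air / (0.622 + u_air)         # (2.2.24)
    return M_dry, u_air, p_w
```

For example, supply air with u0air = 0.01 kg/kg at q0air = 10 m³/s and ρ0air = 1.1 kg/m³, together with a total evaporation of 0.5 kg/s, raises the humidity to about 0.056 kg/kg; with zero evaporation the humidity stays at its supply value, as (2.2.30) demands.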


Figure 2.9.: We are interested in the length of the tangent ℓtan and the arc ℓc. Being able to compute these, we can determine the lengths throughout the whole machine.

2.2.2. Drying Section Geometry

Now we will show how to specify the length ℓ of the zones for the drying section model. We can determine the values of ℓ for the zones by analyzing the two-dimensional cross-section of the drying section, see Fig. 2.5. It is a sequence of circles, and the paper can be seen as a sequence of circle segments and line segments. If we assume that the run of the paper is smooth, the circle segments and the line segments have to be connected continuously differentiable with respect to the arc length. Thus, the line segments are tangents of the circles. To have the zone lengths needed for the model presented above, the lengths of the circle and line segments have to be calculated, see Fig. 2.9.

We explain how to calculate the length of the zones wrapped by paper and the free tangent. The calculation of the tangent length is quite straightforward. For the calculation of the wrapping length, the idea is to decompose one half of the circle. To derive the desired result, we will define a displacement angle α and an additional angle β used in the law of cosines.

A construction layout of a drying section contains the absolute positions of the cylinders and their radii. This is enough information for the lengths of the circle segments and the line segments to be computed. Let ∆x and ∆y be the horizontal and vertical distances between the centers of two circles like in Fig. 2.11. The radius of the upper circle is r1 and the radius of the lower circle is r2. We illustrate how to calculate the lengths of the different zones with this example. The distance of the two centers is given by

d := √(∆x² + ∆y²).  (2.2.34)


Figure 2.10.: This drawing shows how to calculate the length of the tangent ℓtan and the displacement angle α from the radii r1, r2, the distances ∆x, ∆y and the center distance d by right-triangle rules.

The paper wraps the lower cylinder. We calculate the length of the wrapping by looking at the left half of the circle representation. First, we note that the angle characterizing the displacement of the upper and the lower circle can be calculated by

cos α = (r1 + r2)/d,  (2.2.35)

see Fig. 2.10. As illustrated in Fig. 2.11, we can use the law of cosines for a general triangle

c² = a² + b² − 2ab · cos β  (2.2.36)

with

a = r2,  (2.2.37)
b = d,  (2.2.38)
c² = (∆y + r2)² + ∆x².  (2.2.39)

This gives

β = arccos((r2² + d² − (∆y + r2)² − ∆x²)/(2 · r2 · d)).  (2.2.40)

The lines from the tangent points on the circles to their centers are orthogonal to the tangent, as shown in Fig. 2.10. Thus, we can calculate the length of the tangent simply by

ℓtan = √(d² − (r1 + r2)²).  (2.2.41)

With the law of cosines, we get the angle β. Now we need to subtract the displacement angle α and convert the resulting angle to radian measure. This gives the wrapping length on the lower circle as

ℓc = (β − α) · (π/180) · r2.  (2.2.42)


Figure 2.11.: In this drawing, we can see the application of the law of cosines with the sides a := r2, b := d and c to determine the maximal wrapping angle β. By subtracting the displacement angle α from Fig. 2.10, we get the effective wrapping angle which is needed to calculate the wrapping length ℓc.

This approach now has to be followed for both halves of each circle to calculate the overall wrapping lengths.
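The chain (2.2.34)–(2.2.42) translates directly into code. A sketch for one half of the lower cylinder, with illustrative radii and center distances:

```python
import math

def zone_lengths(dx, dy, r1, r2):
    """Tangent length (2.2.41) and wrapping length (2.2.42) for the
    left half of the lower circle; the angles alpha and beta are in
    degrees, as in Figs. 2.10 and 2.11."""
    d = math.hypot(dx, dy)                                    # (2.2.34)
    alpha = math.degrees(math.acos((r1 + r2) / d))            # (2.2.35)
    c2 = (dy + r2) ** 2 + dx ** 2                             # (2.2.39)
    beta = math.degrees(math.acos(
        (r2 ** 2 + d ** 2 - c2) / (2.0 * r2 * d)))            # (2.2.40)
    l_tan = math.sqrt(d ** 2 - (r1 + r2) ** 2)                # (2.2.41)
    l_c = (beta - alpha) * math.pi / 180.0 * r2               # (2.2.42)
    return l_tan, l_c
```

For ∆x = 1 m, ∆y = 2 m and r1 = r2 = 0.75 m, the wrapping length stays below half the circumference π·r2, and the tangent length agrees with d·sin α, as the right-triangle construction of Fig. 2.10 requires.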

Calculating Zone Lengths for Single- or Double-Tiered Drying Sections

With the geometric approach described above, we have all we need to determine the zone lengths of more complex geometries as well. Drying sections use paper wires to stabilize the run of the paper. These wires run along with the paper in machine direction, but since the paper in the machine can be very large in length and width, it is convenient to use several wires subsequently to limit the wire lengths. There are two different ways how such wires are installed, see Fig. 2.12.

(i) Single-Tier: The wire runs along with the paper from upper rolls to lower rolls in groups and is changed at the end of each group.

(ii) Double-Tier: Upper and lower rolls have their own wire, and it is never transferred top-down or bottom-up.

The single-tier case can be handled by the geometry approach directly. But in case of a double-tier machine, we have to note that there are more distinct zones, since the wire does not cover the paper the whole time that the paper lies on a cylinder. The sequence is 'free surface – covered by the wire – free surface'. So we have to take care of the wrapping of the cylinder by the wire. Fortunately, the wire run is led by its own additional rolls, which allows the calculation of the wire wrapping in the same manner. For each half of a cylinder, we can


Figure 2.12.: This figure shows the difference between a single- and a double-tier drying section. The double-tier version has additional rolls to lead the paper wire (red). This generates a different decomposition of the wrapping segments.

calculate the length of the total paper wrapping and use the length of the wire wrapping to divide the total length into a zone with and without wire. Doing this on each side and merging the wire-covered zones into one gives the desired decomposition.

2.2.3. Steady-State Simulations

We show typical results of the presented drying section model. Every discrete node, representing a checkpoint for the paper to pass, has a unique relative position compared to the start of the drying section. The first position can be fixed at position 0, and we can give the position of a node with respect to the paper length by summing up the partial lengths of the zones involved. It is clear that the trend of discretized variables over the paper position is continuous, because the input of a zone is set equal to the output of the preceding zone.

In Fig. 2.13 we show the trend of the paper temperature. The paper is heated only occasionally and therefore cools down in the zones between two hot rolls. This leads to the typical sawtooth structure. For this result we used a fixed number of 5 nodes per zone, independently of the real length, which differs between 1 meter for contact zones and 3.6 meters for free zones. It can be seen that there is something like an upper trend for the temperatures at the end of each zone on a hot cylinder, where the paper temperature has a local maximum. Analogously, there is a lower trend for the minimal paper temperature, reached at the beginning of a heated roll.

In Fig. 2.14 a central result of drying section simulations is shown, namely the trend of the dry content of the paper. It starts with a fixed input dry content and begins to dry monotonically and slowly. One can distinguish three different phases of drying: the slow startup, the fast main drying phase and the slow end drying phase. Physically, this can be explained by the fact that the paper is quite cool when it enters the drying section and water hardly evaporates from it. A fast heating-up is impeded by limitations on the maximal temperature differences between hot cylinders and the paper. So the first phase


Figure 2.13.: Typical sawtooth simulation result of the paper temperature over its position.

Figure 2.14.: Typical S-shaped simulation result of the dry content of the paper over itsposition.


[Figure 2.15 diagram: a model library containing tank, pipe, pump and former models, whose instances are combined into a flowsheet.]

Figure 2.15.: This figure shows the concept of library-based modeling within flowsheets of wet-end processes. Since nearly every plant has a different layout, it is convenient to create reusable standard models within a library.

heats up the paper quite slowly. The paper reaches its maximal drying rate approximately in the middle of the drying section. Finally, the paper can never reach the theoretical dry content of 1, because this would require the complete absence of water, so the drying process is forced to slow down at the end of the machine. Consequently, the trend of the dry content must resemble an S-shaped curve.

2.3. The Wet-End Process

Different paper machines can have very different concepts, depending on the type of the produced paper, the raw material processed or the capacity of the machine. The consequence is that the wet-end process does not have a unique layout; it depends on the concept of the whole plant. Therefore, we cannot expect to build a single model describing a general wet-end process. Instead, we have to create a whole model library consisting of single models for certain common parts of the wet-end process which can be used to build flowsheets of real wet-end plants. The library and 'flowsheeting' concept of modeling is illustrated in Fig. 2.15.

Such a model library was developed and implemented in gPROMS. As for the drying section model, some of the wet-end models are indirectly based on the solution of partial differential equations. At the current stage of development, gPROMS 3.1.0 is capable of discretizing PDEs on rectangular domains by using different discretization schemes such as finite differences or orthogonal collocation. This replaces the PDEs by sets of linear and nonlinear algebraic or ordinary differential equations, respectively. Before we go into more detail about wet-end modeling, we discuss the effects of the spatial discretization by means of the transport problem, which is elementary for wet-end models.


We show that a common way to describe transport phenomena in chemical engineering can be seen as a special case resulting from the numerical discretization of the one-dimensional transport equation.

2.3.1. About the Transport Problem

The dynamics of a wet-end system has two main components that determine its behavior. The first is the transport phenomenon through pipes, the other is the separation of pulp streams. When building a whole model library for wet-end flowsheeting, it is not practical to simulate the fluid dynamics in pipes and tanks because of the computational effort required. However, it can be argued that this is not necessary anyway. A pipe is mainly a component which causes a delay in the system, but diffusion may also take place inside it. In reality, diffusion in pipe flows depends on flow velocities, pulp properties and pipe geometries.

In the following we discuss a general model for the transport of mass or similar information, for example through a pipe, and the effect of a finite difference approximation on the behavior of the resulting system. A quite simple and straightforward way of modeling the transport through a pipe is using the transport equation

∂tc(x, t) + v · ∂xc(x, t) = 0, (2.3.1)

where we assume that the velocity v is constant and c is some kind of concentration. In that case, it is known that the solution of the initial value problem consisting of (2.3.1) and

c(x, 0) = g(x) (2.3.2)

is given by

c(x, t) = g(x − tv), (2.3.3)

see [Eva98]. But for this to hold we need v to be constant for all t, which does not hold true for real problems.

We can interpret a pipe as the interval [0, 1]. When giving some information g(x) for x < 0, the transport equation causes a delay of 1/v seconds until the information is transported from 0 to 1 without any loss. It becomes clear that the exact solution of the transport problem is not suitable for application in simulation: such a system has no memory. If we change the velocity v at a certain time, the solution (2.3.3) is no longer correct; it would lose information in the case of increasing v, or it would not have the information needed in the case of decreasing v. We need interior points to store history data of the flow through a pipe to capture the dynamics of such a system.

We now investigate the effect of numerically solving the transport problem and discuss its relevance to wet-end process simulation. A numerical way to approximate (2.3.3) is using finite difference formulas to replace the partial derivatives. An effect known as numerical diffusion arises and can be observed clearly if the initial condition g(x) is discontinuous. Then, every approximation of the partial derivative will be inaccurate close to discontinuities. This has the effect of giving nonzero partial derivative approximations where they should be zero, which makes the solution artificially smooth and can be seen as unintended diffusion.

The reason why we discuss this is the inertia of the system to be modeled. Assume that the function g has the following form.

g(x) :=
    0,  x < −1,
    1,  −1 ≤ x ≤ 0,     (2.3.4)
    0,  x > 0.
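The smoothing effect of the discretization can be reproduced with a few lines of code. The following sketch (an illustration, not the gPROMS implementation used in the thesis) discretizes (2.3.1) with k backward-difference (upwind) nodes and an explicit Euler time step, feeds in the impulse (2.3.4) as inflow at x = 0, and records the outflow c(1, t); the step size `dt` and the time horizon are illustrative choices.

```python
import numpy as np

def simulate_pipe(k, v=0.1, t_end=30.0, dt=0.001):
    """Backward (upwind) finite differences for c_t + v*c_x = 0 on [0, 1]
    with k equidistant nodes; explicit Euler in time. The inflow
    u(t) = c(0, t) is the impulse g of (2.3.4) entering at x = 0,
    which lasts 1/v seconds at the inlet."""
    h = 1.0 / k
    X = np.zeros(k)                       # X[i] ~ c((i+1)*h, t)
    out = []
    for n in range(int(t_end / dt)):
        t = n * dt
        u = 1.0 if t <= 1.0 / v else 0.0  # inflow impulse
        Xl = np.concatenate(([u], X[:-1]))
        X = X + dt * v / h * (Xl - X)     # dX_i/dt = v/h * (X_{i-1} - X_i)
        out.append(X[-1])                 # outflow concentration c(1, t)
    return np.array(out)

coarse = simulate_pipe(5)     # strong numerical diffusion
fine = simulate_pipe(200)     # close to the sharp exact impulse
```

For k = 5 some information already reaches the outlet long before the exact arrival time t = 10 s, while for k = 200 the outflow stays close to the hard impulse, mirroring Fig. 2.16.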


[Figure 2.16 plot: outflow concentration c(1, t) over time t for discretizations with k = 2, 5, 10, 20, 50, 100, 200, 500 and 1000 nodes.]

Figure 2.16.: The colored lines result from the numerical solution of the transport equation for v = 0.1 and an impulse of magnitude and duration 1 using a backward finite difference scheme. The lines represent the solutions of discretization schemes with 2, 5, 10, 20, 50, 100, 200, 500 and 1000 equidistant nodes. The time-integration of the resulting linear systems is done by gPROMS.

This is an impulse of magnitude 1 and duration 1. The exact solution of the transport equation for v = 0.1 will produce a hard impulse moving through the interval within 10 seconds of simulation time. This means that all information coming from g passes through the transport system in this time. See Fig. 2.16 for a comparison of different numbers of discretization nodes. Let us model

∂tC(t) = c(0, t) − c(1, t) (2.3.5)

with

C(0) = 0. (2.3.6)

The solution just becomes

C(t) = ∫_0^t (c(0, s) − c(1, s)) ds. (2.3.7)

The variable C results from measuring the difference between information inflow and outflow at every time, so it balances the information content. For the exact solution of the transport equation with initial condition (2.3.4), the right-hand side of equation (2.3.5) will switch from 1 to 0, then to −1 and back to 0:

∂tC(t) = g(0 − tv) − g(1 − tv) ∈ {0, 1, −1}. (2.3.8)


[Figure 2.17 plot: t0.9 over the number of nodes (logarithmic axis from 10^0 to 10^3); t0.9 ranges from about 11 to 20.]

Figure 2.17.: The computations were done using gPROMS, solving the transport equation with v = 0.1 and an impulse of magnitude 1 in the first second. The total length of the discretization zone is 1 and the time t0.9 was chosen by checking the condition C(t) ≤ 0.1. Although in total average the information content is the same for every discretization, the main part of the information is passed in very different times.

Then C is piecewise linear, starting at 0, moving to 1, staying there and moving to 0 again. It will reach zero at time t = 10 seconds. For 0 < p < 1, we define

t_p := min { t > 0 : C(t) ≤ (1 − p) · v }. (2.3.9)

For a given discretization of the transport equation and an impulse g(x), this is the time needed to transport a fraction p of the total information carried by the impulse. We choose t0.9 as reference value to compare the influence of different numbers of discretization points. It tells us how fast 90 % of the total impulse information is transmitted. It is clear that choosing p = 1 would give t1 = ∞ for an exact solution of the discretized transport equation. See Fig. 2.17 for a comparison of different numbers of discretization nodes with respect to their value of t0.9. The backward finite difference approximation of the transport equation with k discretization nodes gives a linear system of the form

∂tX = AX + Bu (2.3.10)

with

A = (v/h) ·
    [ −1              ]
    [  1  −1          ]
    [      ⋱    ⋱     ]
    [         1   −1  ]  ∈ R^(k×k) (2.3.11)


[Figure 2.18 diagram: a cascade of five ideal tank reactors with concentrations X1, …, X5; the stream vh · c(0) enters the first tank and the streams vh · X1, …, vh · X5 connect and leave the tanks.]

Figure 2.18.: This is a cascade of ideal tank reactors to model transport through a pipe. Now, the velocity of the flow may change over time since it is always taken the same for every CSTR in the cascade. The volume in each CSTR is therefore constant. Only the concentration variables Xi will change.

and

B = (v/h) · (1, 0, …, 0)^T ∈ R^k. (2.3.12)

Here h is the discretization step length. The vector X contains the discretized nodes to approximate

Xi(t) ≈ c(i · h, t), i = 1, …, k, (2.3.13)

when choosing the single input

u(t) := c(0, t). (2.3.14)

This is just a sequence of linear equations and each of them can be seen as the model of a so-called continuous stirred tank reactor (CSTR) as it is known in chemical engineering [Sch04]. This means that the numerical solution of the transport equation leads to the model of a cascade of ideal tanks, see Fig. 2.18. It is known that the average hold-up time in such models is constant and therefore independent of the number of discretization points. We assumed an equidistant discretization, but if we allow non-equidistant nodes, the number of nodes and their distances can be used to adjust the tank cascade to fit real measured data. Flow phenomena usually also include real diffusion (or dispersion), which can be modeled using the numerical diffusion that results from the discretization of the


transport equation. Therefore, take the dispersion model

∂tc(x, t) = −v · ∂xc(x, t) + Dax · ∂xxc(x, t), x ∈ (0, L], (2.3.15)

that differs from the transport equation in the diffusion term. Here, we assume v = const. The constant Dax is the coefficient describing the axial dispersion in the flow. The so-called Bodenstein number is defined by

Bo := (v · L)/Dax. (2.3.16)

In [Sch04], it is shown that if the Bodenstein number is large (> 50), it is close to 2k, where k is the number of discretization points. Hence, for a suitable choice of k, the dispersion equation can be approximated by the discretized transport equation. This is convenient since solving system (2.3.10) is numerically easier than solving the dispersion problem (2.3.15).
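The cascade structure of (2.3.10)–(2.3.12) is easy to set up explicitly. The sketch below (an illustration, not the thesis implementation) builds A and B for k tanks and checks that, for a constant inflow concentration, every tank settles at the inlet value.

```python
import numpy as np

def cascade_system(k, v, length=1.0):
    """System matrices of the backward-difference discretization
    (2.3.10)-(2.3.12): a cascade of k identical CSTRs, each with
    residence time h/v."""
    h = length / k
    A = (v / h) * (np.diag(-np.ones(k)) + np.diag(np.ones(k - 1), -1))
    B = np.zeros(k)
    B[0] = v / h
    return A, B

A, B = cascade_system(k=4, v=0.1)
# For a constant inflow u = 1, the steady state solves A x + B u = 0:
# every tank reaches the inlet concentration.
x_ss = np.linalg.solve(A, -B * 1.0)
```

The total hold-up time k · h/v = length/v is independent of k, which matches the statement above that only the spread of the response (the numerical dispersion) depends on the number of nodes.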

Remark 2.3.1. We illustrated that the first order backward difference approximation of the spatial derivative in the transport equation leads to a model of a cascade of CSTRs. This type of discretization is just the application of the implicit Euler method in the spatial direction. If we ignore the time-dependency of the transport equation, it remains an initial value problem in the spatial direction. For a flow through a pipe, we can assume that the inflow is given at any time. A higher-order formula to solve this initial value problem with a fixed grid needs more information. If we apply higher order finite difference methods to create a linear system of the form

∂tX = AX + Bu, (2.3.17)

there have to be further assumptions on the inflow. Assume

A = v/(12h) ·
    [ ⋱   ⋱   ⋱   ⋱   ⋱          ]
    [ 1  −8   0   8  −1          ]
    [     1  −8   0   8  −1      ]
    [         1  −8   0   8  −1  ]
    [             ⋱   ⋱   ⋱   ⋱  ]  (2.3.18)

for a 5-point stencil. What does this mean physically? The application of a finite difference scheme to replace the partial derivative in the transport equation creates a balance equation for each node of the form

rate of change = inflow − outflow. (2.3.19)

Inflow and outflow result from summing up all partial flows to other nodes. In the case of a higher-order formula that uses elements which are not direct neighbors of the node, we cannot expect that the flow is unidirectional. Instead, a reflux is assumed that is even 'jumping' over elements. Physically, in a cascade of tank reactors like the CSTRs, a single tank in the system can only exchange with its direct neighbors. This excludes discretization stencils with more than 3 nodes.


[Figure 2.19 diagram: a storage or reaction element with ports inflow1, inflow2 and outflow2; the port structure carries pressure, mass fibers, mass fillers and temperature.]

Figure 2.19.: The ports to connect different elements have a certain structure containing all necessary information.

2.3.2. Model Structures for Pressure-Driven Balancing

We can distinguish three different types of elements. These are:

• storage elements: such elements cause a delay in the system's dynamics by buffering balanced properties.

• reaction elements: these elements change the balanced variables (e.g. by chemical reactions).

• hybrid elements: a combination of both types to describe a reaction with a buffering character.

For example, a pipe is a pure storage component since there are no sinks or sources of mass other than the inlet and the outlet. Thus, mass fractions change only by transport through the pipe.

In Fig. 2.19, the structures for connecting several components are shown. First, we have to decide which quantities are to be modeled. Pulp has several quantifiable properties such as the mass fractions of each solid component. Every storage or hybrid element has a certain capacity and the liquid is filled up to a current level. Assuming ideal tank reactors, this leads to a single variable representing each quantity of the pulp. Balancing volume, fibers, fillers and enthalpy of the pulp, we need to introduce variables of the following types for every element which has storage character.

• V – volume in [m³]

• m_fib – mass of fibers in [kg · m⁻³]

• m_fil – mass of fillers in [kg · m⁻³]


[Figure 2.20 diagram: two storage elements connected by a bidirectional port, communicating the pressures p1 and p2 and related to the flow by p1(t) − p2(t) = sgn(v(t)) · (1/2) · ρ · v(t)².]

Figure 2.20.: Each storage component communicates its pressure via a bidirectional port. The direction of the flow depends on the sign of the pressure difference of two connected components.

• T – temperature of the pulp in [K]

In order to describe the transport of pulp with the chosen quantities, we have to balance the quantities in each storage element by taking care of all inflows and outflows, sinks and sources. By a reaction element, we do not necessarily mean a chemical reaction. If a component has just a single inflow but several outflows, it must be decided by using process know-how how the inlet flow is separated into the several outflows. One condition clearly has to be the conservation of mass, at least for pure reaction elements. So if, for instance, the element to be modeled has a filtering character, this can mean that one of the outgoing streams changes its properties depending on the inflow properties.

The connection of different components is done by ports. Each inlet or outlet of an element is represented by a port which communicates the pulp quantities mass of fibers, mass of fillers and temperature. For a pressure-based transport system, each port has to include a quantity for the pressure. To decide the amount and the direction of the liquid flow, the difference between the pressures p1 and p2 of two connected ports is used. This relation is based on the formula

∆p := p1 − p2 = (1/2) · ρ · v² (2.3.20)

for frictionless flow, which is a special case of Bernoulli's equation for incompressible fluids. Here, the velocity v is a volume flow rate in [m³ · s⁻¹] and ρ is the density of the fluid.

Remark 2.3.2. Here, we can see some of the problems in modeling a pressure-driven flow that are typical for nonlinear equations. Assume that we have given ∆p > 0; then the solution of (2.3.20) is not unique, but ±(2∆p/ρ)^0.5. If one tries to solve the nonlinear equation numerically by a Newton-type method, the solution will depend on the choice of the initial guess; furthermore, initializing with v = 0 will cause the numerical method to fail because no search direction can be found.


To overcome the problem of requiring ∆p ≥ 0, we extend equation (2.3.20) to

∆p = sgn(v) · (1/2) · ρ · v², (2.3.21)

which is the same as (2.3.20) for positive pressure differences but also yields a solution for negative differences. The sign of v is a discontinuous function and therefore we have to distinguish two different cases. In gPROMS notation, this means:

IF v >= 0 THEN
    Delta_p = 1/2 * rho * v^2
ELSE
    Delta_p = -1/2 * rho * v^2
END

This can also be written in an implicit form by

Delta_p = 1/2 * rho * v * ABS(v)

where ABS(v) is the absolute value of v, which causes gPROMS to automatically distinguish the two cases.

2.3.3. Library Components

Having discussed the model structures and the main phenomenon of wet-end processes, namely the transport problem, we are ready to present the specific models that are needed for wet-end flowsheeting.

Storage Chests

A storage chest can be modeled as a simple CSTR. Assume that the chest is cylindrical; then we need the radius r and the height h to define its capacity. The variables V, m_fib, m_fil and T have to be declared in any case to balance the pulp's properties. Let n_in be the number of inflows and n_out be the number of outflows with flow rates and pulp properties. The variable v_of denotes the volume flow rate at the overflow of the chest. Then the mass balances are:

∂tV = Σ_{i=1}^{n_in} v_in^i − Σ_{j=1}^{n_out} v_out^j − v_of, (2.3.22)

∂t(V · m_fib) = Σ_{i=1}^{n_in} v_in^i · m_fib^i − m_fib · Σ_{j=1}^{n_out} v_out^j − m_fib · v_of, (2.3.23)

∂t(V · m_fil) = Σ_{i=1}^{n_in} v_in^i · m_fil^i − m_fil · Σ_{j=1}^{n_out} v_out^j − m_fil · v_of, (2.3.24)

∂t(V · T) = Σ_{i=1}^{n_in} v_in^i · T^i − T · Σ_{j=1}^{n_out} v_out^j − T · v_of − loss + input, (2.3.25)

where (2.3.23) balances the mass of fibers and (2.3.24) the mass of fillers.

Note that the outflows always have the properties of the CSTR itself. The last equation includes additional terms for heat loss and input. Instead of using (ρ · cpw · V · T) for the


energy, we assume that the density and the heat capacity are constant and write only (V · T). Using the product rule for equations (2.3.23)–(2.3.25) gives

∂t m_fib = (1/V) · Σ_{i=1}^{n_in} v_in^i · (m_fib^i − m_fib), (2.3.26)

∂t m_fil = (1/V) · Σ_{i=1}^{n_in} v_in^i · (m_fil^i − m_fil), (2.3.27)

∂t T = (1/V) · ( Σ_{i=1}^{n_in} v_in^i · (T^i − T) − loss + input ). (2.3.28)

The liquid level ℓ depends on the volume and is given by

ℓ = V/(πr²) (2.3.29)

and the geodetic pressure at the bottom of the chest is

p = ρgℓ, (2.3.30)

where g is the gravitational acceleration. If we assume that the outlets are at the bottom of the chest, this is the pressure to be communicated via the port. Two special cases have to be handled.

• Overflow: when V ≥ V_max for a maximum volume V_max ≤ hπr².

• Running empty: when V ≤ V_min for a small minimum volume V_min > 0.

By modeling the overflow as an additional outlet and describing its flow rate by the same rate at which the volume in the chest would increase without the overflow, we can handle the overflow case:

If V ≥ V_max :  v_of = Σ_{i=1}^{n_in} v_in^i − Σ_{j=1}^{n_out} v_out^j, (2.3.31)

otherwise :  v_of = 0. (2.3.32)

This keeps V at its maximum value as long as the inflow is higher than the outflow. Depending on the layout of the process, the overflow is collected in sinks. Overflows can also be used in the model to sustain the existence of a solution in cases where no overflow is desired or possible. Then the sink does not have to include any limitations on the overflow rate or amount. To understand how to model a CSTR running empty, we first have to explain the following bidirectional pipe model.
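The chest balances (2.3.22) and (2.3.26)–(2.3.28) together with the overflow switch (2.3.31)/(2.3.32) can be collected into a single right-hand side function. This is a simplified sketch: the function name, the tuple layout of `inflows`, and the extra guard activating the overflow only for a net inflow are my additions, not part of the thesis model.

```python
def chest_rhs(V, m_fib, m_fil, T, inflows, v_out_total,
              V_max, loss=0.0, heat_in=0.0):
    """Right-hand sides of (2.3.22) and (2.3.26)-(2.3.28) for one storage
    chest, including the overflow outlet (2.3.31)/(2.3.32).
    `inflows` is a list of (v_in, m_fib_in, m_fil_in, T_in) tuples."""
    v_in_total = sum(f[0] for f in inflows)
    net = v_in_total - v_out_total
    # overflow outlet; the net > 0 guard is an added practical safeguard
    v_of = net if V >= V_max and net > 0 else 0.0
    dV = net - v_of                                       # (2.3.22)
    dm_fib = sum(f[0] * (f[1] - m_fib) for f in inflows) / V   # (2.3.26)
    dm_fil = sum(f[0] * (f[2] - m_fil) for f in inflows) / V   # (2.3.27)
    dT = (sum(f[0] * (f[3] - T) for f in inflows)
          - loss + heat_in) / V                           # (2.3.28)
    return dV, dm_fib, dm_fil, dT, v_of
```

Note that the outflow rates only enter the volume balance; the concentration dynamics depend solely on the inflows, exactly as the product rule form (2.3.26)–(2.3.28) states.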

A Bidirectional Pipe Model

If one took the transport equation and used a fixed backward difference scheme to discretize the pipe along its length, flow could only be unidirectional. The boundary of the domain must always be at the same side of the pipe; thus, the inflow must always be at the same side. Real pipes, however, allow bidirectional flows. Each time the flow direction


changes, the discretization of the transport equation must be reversed in order to describe the CSTR cascade behavior. However, this is not practical.

The pipe is seen as a cascade of k control volumes of fixed size. Let a cylindrical pipe have the length l and the radius r. Then the control volumes satisfy V1 = · · · = Vk = lπr²/k. The control volumes are modeled as storage chests as described above. Each chest has a single inflow and a single outflow and the flow rate v is spatially constant, so V is constant. Which side is regarded as inflow and which as outflow depends on the flow direction. When the flow direction is from left to right, the first CSTR has unknown inflow (left-hand side) pulp properties since these come from the connected storage. And when the direction is from right to left, the k-th CSTR has unknown inflow (right-hand side) properties. In the case of flow from left to right, this means that we have a total of k equations of the form

∂t m_fib^j = (v/V) · (m_fib^{j−1} − m_fib^j), (2.3.33)

m_fib^{k+1} = m_fib^k, (2.3.34)

for j = 1, …, k, while the inlet property m_fib^0 is unknown and therefore must be given by another connected model. When reversing the flow direction, we change the equations to

∂t m_fib^j = (v/V) · (m_fib^{j+1} − m_fib^j), (2.3.35)

m_fib^0 = m_fib^1, (2.3.36)

while m_fib^{k+1} now is the inlet property that has to be communicated from the right-hand side of the pipe. If the flow direction changes, all information is preserved in the nodes m_fib^1, …, m_fib^k. For fillers and temperature, the equations are set up analogously.

To decide which case to activate, the sign of the flow rate v is checked. Using the equation

p1 − p2 = sgn(v) · (1/2) · ρ · v² + ∆p_loss (2.3.37)

with a pressure loss term in the pipe, two scenarios can occur.

(i) p1 and p2 are given: then we can determine v by solving the equation.

(ii) v and p1 or p2 are given: then we can determine p1 or p2 by solving the equation.

This concept has an immediate consequence for connected storage elements. If both storage elements communicate a pressure to the pipe, then the pipe determines the flow direction and the flow rate. If, however, one of the storages communicates v = 0 for some reason and leaves no equation for the pressure, then the missing pressure is determined by the pipe as a dummy variable.

A connected CSTR that runs empty changes its equations. Instead of including the equation for the geodetic pressure (which yields 0 in that case), it directly includes v = 0 for the connected pipes and each outlet. Switching the equations is triggered by events that depend on the liquid level in the storage chest. If it sinks below a minimum value, the outlets are closed by activating v = 0, and if it rises over a certain level, the equations are switched back again. By using a higher level for the reactivation of the outlets, a hysteresis is applied. The value of k can be chosen dependent on the length of the pipe, to be interpreted as nodes per meter, k(l).
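A minimal sketch of the direction-switching update (2.3.33)–(2.3.36): depending on the sign of v, either boundary value acts as the inlet, and the interior states are kept when the direction reverses, so no history is lost. Using |v| in the reversed case is my reading of the equations; the thesis implementation in gPROMS is event-based.

```python
def pipe_step(X, v, V_cell, c_left, c_right, dt):
    """One explicit Euler step of the bidirectional pipe model
    (2.3.33)-(2.3.36). X holds the k cell concentrations; c_left and
    c_right are the boundary concentrations of the connected elements."""
    k = len(X)
    if v >= 0:                        # flow from left to right (2.3.33)
        up = [c_left] + X[:-1]        # m^{j-1} neighbours
    else:                             # flow from right to left (2.3.35)
        up = X[1:] + [c_right]        # m^{j+1} neighbours
    return [X[j] + dt * abs(v) / V_cell * (up[j] - X[j]) for j in range(k)]
```

Reversing the sign of `v` between calls simply swaps which boundary feeds the cascade, while the list `X` carries the stored history unchanged.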


[Figure 2.21 diagram: chest A (pressure pA) – pipe A – pump (p_in, p_out, rotating at R rpm) – pipe B – chest B (pressure pB).]

Figure 2.21.: This illustrates a system of two chests connected by two pipes and a pump. This system is represented by the equations (2.3.39)–(2.3.41).

Pumps

We outlined how to model pressure-driven flows through pipes by a quadratic relation between pressure difference and flow rate. Pumps generate a certain pressure head to make the pulp flow in a desired direction. The pressure head depends on the geodetic height of the connected pipes in the vertical direction and the pressure loss in the pipes. Centrifugal pumps have a special characteristic of the pressure head that depends on the flow rate and the rotation speed of the pump. We model the pressure head ∆h_p as a nonlinear function f_pc of the flow rate v and the rotation speed R:

∆h_p = p_in − p_out = f_pc(v, R). (2.3.38)

To explain how this works in a system of two chests, two pipes and a pump, we discuss the following example, see Fig. 2.21. Two chests A and B generate pressures pA and pB, each communicated to a pipe. The following three equations determine the flow system between both chests with a pump present and rotating at R revolutions per minute.

pA − p_in = sgn(v) · (1/2) · ρ · v² + ∆p_loss^A, (2.3.39)

p_out − pB = sgn(v) · (1/2) · ρ · v² + ∆p_loss^B, (2.3.40)

p_in − p_out = f_pc(v, R). (2.3.41)

Here, pA, pB and R are the known variables. The unknowns p_in, p_out, v result from solving this nonlinear system of equations. This leads to case (ii) of equation (2.3.37), where only a single pressure and the flow rate for the pipe are given. In this system, the pipes do not determine the flow rate directly but generate their specific pressure loss, which is needed to determine the flow rate. Note that the pressure-driven flow of the pulp plays an important role when pumps are simulated. If a correct pump characteristic f_pc is given, we can simulate the flow rate and the rotation speed of the pump. These values are needed to calculate the energy consumption and the efficiency of the device according to measured pump characteristics available from the pump manufacturer.
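To make the solution procedure concrete, the following sketch reduces (2.3.39)–(2.3.41) to a single scalar equation in v and solves it by bisection, assuming forward flow. The quadratic pump characteristic f_pc(v, R) = a·v² − b·R² and all numerical constants are hypothetical stand-ins for real manufacturer data.

```python
def solve_pump_system(pA, pB, R, rho=1000.0, dA=0.0, dB=0.0):
    """Solve (2.3.39)-(2.3.41) for (p_in, p_out, v). Adding (2.3.39) and
    (2.3.40) and substituting (2.3.41) leaves one scalar equation in v."""
    a, b = 5.0e4, 1.0e-2              # hypothetical pump-curve constants

    def f_pc(v, R):
        return a * v * v - b * R * R  # head rises with speed R, drops with v

    def residual(v):
        s = (1.0 if v >= 0 else -1.0) * rho * v * v   # two half-terms summed
        return pA - pB - f_pc(v, R) - s - dA - dB

    lo, hi = 0.0, 10.0                # bracket for forward flow
    for _ in range(200):              # plain bisection
        mid = 0.5 * (lo + hi)
        if residual(lo) * residual(mid) <= 0:
            hi = mid
        else:
            lo = mid
    v = 0.5 * (lo + hi)
    p_in = pA - 0.5 * rho * v * v - dA     # back-substitute (2.3.39)
    p_out = p_in - f_pc(v, R)              # back-substitute (2.3.41)
    return p_in, p_out, v
```

A Newton-type solver as used inside gPROMS would converge faster, but the bracketing bisection avoids the initialization problem of Remark 2.3.2.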


[Figure 2.22 diagram: a junction with inflow v_in splitting into v_out^(1) and v_out^(2) at p_in = p_out^(1) = p_out^(2), and a confluence with inflows v_in^(1), v_in^(2) merging into v_out at p_in^(1) = p_out = p_in^(2).]

Figure 2.22.: In principle, there is no difference between junctions and confluences when connecting bidirectional pipes that are themselves connected to storage chests. But if pumps are used, flow has to be unidirectional and each port of a junction or confluence has a specific unique function.

Junctions and Confluences

Several flows can be combined into a single flow by introducing a model for a confluence element. A confluence element has at least two inlets and a single outlet. All we need is to set up conservation equations for fibers, fillers and temperature and to sum up the flow rates of the inlets.

v_out = v_in^(1) + v_in^(2), (2.3.42)

m_fib · v_out = m_fib^(1) · v_in^(1) + m_fib^(2) · v_in^(2), (2.3.43)

m_fil · v_out = m_fil^(1) · v_in^(1) + m_fil^(2) · v_in^(2), (2.3.44)

T · v_out = T_in^(1) · v_in^(1) + T_in^(2) · v_in^(2), (2.3.45)

p_out = p_in^(1) = p_in^(2). (2.3.46)

The flow can be separated by junctions. These basically work opposite to confluences. The pressure is assumed to be the same at every port of the element. It is important that suitable elements are connected to make sure the system works in the intended way. If, for example, a pump is connected at every port of a confluence element, then the system will be over-specified and remain unsolvable. It is also possible to give a certain control volume to confluences and junctions to describe the mixing dynamics and inertia.
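The confluence balances (2.3.42)–(2.3.45) reduce to flow-weighted averages, as the small sketch below illustrates (the function name and tuple layout are illustrative).

```python
def confluence(in1, in2):
    """Steady-state mixing rules (2.3.42)-(2.3.45) for a confluence with
    two inlets; each inlet is a tuple (v, m_fib, m_fil, T)."""
    v1, f1, l1, T1 = in1
    v2, f2, l2, T2 = in2
    v = v1 + v2                          # (2.3.42)
    m_fib = (v1 * f1 + v2 * f2) / v      # (2.3.43)
    m_fil = (v1 * l1 + v2 * l2) / v      # (2.3.44)
    T = (v1 * T1 + v2 * T2) / v          # (2.3.45)
    return v, m_fib, m_fil, T
```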

Headbox

A headbox model is also important for the description of a pressure-driven flow. Pulp is pumped at high flow rates through the headbox, which compresses it to a thin and wide jet of very high velocity. While the pulp is accelerated, a pressure loss takes place. But the headbox does not play an important role for the wet-end dynamics since velocities are so high and the modeled pulp properties do not change while being transported through the


"!# "!#

j

vin

vpap

?vww

Figure 2.23.: This is a schematic illustration of the so-called ’DuoFormer TQv’ from VoithPaper. The blue lines represent the path of the paper running through theformer. When balancing the wet-end dynamics, it is necessary to describe theseparation behavior of this device.

headbox. The headbox produces a jet of pulp of a certain velocity v_jet by directing the pulp through a nozzle with a certain slice opening d_slice. If W_M is the width of the nozzle, then the volume flow to the former satisfies the relation

v_in = γ_contr · v_jet · W_M · d_slice, (2.3.47)

with a contraction factor γ_contr ∈ (0, 1). This equation is needed to determine the actual volume flow to the headbox, which must be provided by the headbox pump.

Former – Separating Pulp Streams

In the former, the pulp jet is filtered and dewatered by a wire, see Fig. 2.23. This means that certain changes in the mass fractions of water, fibers and fillers take place while the pulp stream is separated. One part of the stream continues its way through the machine as paper and another part returns to the wet-end as so-called white water. To model the behavior of a former, it is essential to describe the separation of the pulp stream. As for the headbox, we can assume that the former does not cause a significant hold-up time. Then the mass of water and solids has to be conserved at every time. The total stream balance

v_in = v_pap + v_ww (2.3.48)

has to be satisfied if we assume constant density ρ.

v_in · (m_fib^in + m_fil^in) = v_pap · (m_fib^pap + m_fil^pap) + v_ww · (m_fib^ww + m_fil^ww), (2.3.49)

where the three products are the total mass in, the total mass in the paper and the total mass in the white water, respectively.

Equations (2.3.48) and (2.3.49) contain a total of 6 unknowns:

• m_fib^pap – solid content of fibers in the paper


• m_fil^pap – solid content of fillers in the paper

• m_fib^ww – solid content of fibers in the white water

• m_fil^ww – solid content of fillers in the white water

• v_pap – flow rate of paper

• v_ww – flow rate of white water

Equation (2.3.49) balances the total solids of the separating stream and

v_in · m_fil^in = v_pap · m_fil^pap + v_ww · m_fil^ww (2.3.50)

balances the fillers. The total solids are separated by a dewatering fraction γ_dew. This gives

(m_fib^pap + m_fil^pap)/ρ = γ_dew. (2.3.51)

The solids are fractionated by the so-called retention. For fibers, the retention is defined as

R_fib = (v_pap · m_fib^pap)/(v_in · m_fib^in)  (= fibers in paper / fibers in jet), (2.3.52)

and for fillers it is

R_fil = (v_pap · m_fil^pap)/(v_in · m_fil^in)  (= fillers in paper / fillers in jet). (2.3.53)

Instead of equation (2.3.52), the total retention

R_tot = (v_pap · (m_fib^pap + m_fil^pap))/(v_in · (m_fib^in + m_fil^in))  (= total solids in paper / total solids in jet) (2.3.54)

can be used. It gives the fraction of the solids that remains in the paper. Collecting the equations (2.3.49), (2.3.48), (2.3.50), (2.3.51), (2.3.53) and (2.3.54) together with assumptions on γ_dew, R_fil and R_tot yields a system to determine the stream separation in the former. If γ_dew, R_fil and R_tot are constants, this is just a system of linear equations. Now this is the place to apply process know-how about dewatering and retention. Practically, the dewatering fraction γ_dew as well as the retention values are not process parameters that can be chosen. In fact, they are a result of the process itself and depend on many other process parameters as well as pulp properties.

Chemical additives affect the retention in a complex way. They are added to the system and pass through it analogously to the pulp. This means that some of them remain in the paper while others are led back into the cycles of the system. All this time, the additives react with the solids of the pulp in a way that they build flocs of fibers and fillers, which affects the retention value the next time the pulp enters the former. One way to describe this is to balance chemical additives all over the plant along with these reactions in every storage element. However, it is more practical to interpret the retention as a differential variable and describe its dynamics by an ODE of a suitable order.
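For constant γ_dew, R_fil and R_tot the separation system can even be solved sequentially, as this sketch shows. It assumes the reading of (2.3.51) as fixing the paper consistency via (m_fib^pap + m_fil^pap)/ρ = γ_dew; the function name and the sample values used in the test are illustrative.

```python
def former_separation(v_in, m_fib_in, m_fil_in,
                      gamma_dew, R_fil, R_tot, rho=1000.0):
    """Sequential solution of the former separation system
    (2.3.48)-(2.3.54) for constant gamma_dew, R_fil, R_tot."""
    S_in = v_in * (m_fib_in + m_fil_in)          # total solids flow in jet
    S_pap = R_tot * S_in                          # solids kept in the paper
    c_pap = gamma_dew * rho                       # assumed reading of (2.3.51)
    v_pap = S_pap / c_pap                         # from definition of R_tot
    v_ww = v_in - v_pap                           # (2.3.48)
    m_fil_pap = R_fil * v_in * m_fil_in / v_pap   # (2.3.53)
    m_fib_pap = c_pap - m_fil_pap
    m_fil_ww = (v_in * m_fil_in - v_pap * m_fil_pap) / v_ww   # (2.3.50)
    m_fib_ww = (S_in - S_pap) / v_ww - m_fil_ww               # (2.3.49)
    return v_pap, v_ww, m_fib_pap, m_fil_pap, m_fib_ww, m_fil_ww
```

Because each line only uses quantities computed before it, the constant-parameter case indeed reduces to linear algebra, as claimed above.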

Remark 2.3.3. Except for pressures and pump efficiencies, pulp transport throughout the wet-end system is a mainly linear process. The former – as well as screens and fiber recovery components – can cause nonlinearities by separating pulp streams. The separation parameters themselves can be modeled by nonlinear differential equations. The estimation of model parameters from measured data is important here.

A retention model that was developed and used in this work has the basic form

T_1^2 \, \ddot{R}_{tot} + T_2 \, \dot{R}_{tot} + R_{tot} = \mathcal{N}(v^{in}, m^{in}_{fib}, m^{in}_{fil}),   (2.3.55)

R_{fil} = k \cdot R_{tot},   (2.3.56)

which describes a nonlinear dynamic dependency of the total retention on the solid streams and chemical additives. This is a generalization of the so-called PT2 element from system theory. The right-hand side is given by a nonlinear function N(·) whose concrete representation is based on the process know-how of the Voith Paper GmbH and is therefore not shown here. The unknown parameters are fitted using real measurements. The filler retention is assumed to be proportional to the total retention. To solve the second-order differential equation, an additional variable is introduced and the equation is reformulated as a system of first order.
The dewatering coefficient is assumed to be constant for our simulations.
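To illustrate the reformulation, the following sketch rewrites a PT2-type equation of the form (2.3.55) as a first-order system and integrates it in time. Since the actual right-hand side N is proprietary, the function N_hat as well as the time constants T1, T2 and the proportionality factor k are purely hypothetical placeholders.

```python
# Sketch: the second-order retention equation (2.3.55) rewritten as a
# first-order system and integrated with the classical Runge-Kutta method.
# N_hat is a purely hypothetical stand-in for the proprietary function N.

def N_hat(v_in, m_fib, m_fil):
    """Hypothetical placeholder for the proprietary retention function N."""
    return 0.8 - 0.1 * m_fil / (m_fib + m_fil)

def rhs(u, T1, T2, v_in, m_fib, m_fil):
    # u = (R_tot, dR_tot/dt):  u1' = u2,  u2' = (N - T2*u2 - u1) / T1^2
    R, dR = u
    return (dR, (N_hat(v_in, m_fib, m_fil) - T2 * dR - R) / T1 ** 2)

def rk4_step(u, h, *args):
    k1 = rhs(u, *args)
    k2 = rhs([u[i] + 0.5 * h * k1[i] for i in range(2)], *args)
    k3 = rhs([u[i] + 0.5 * h * k2[i] for i in range(2)], *args)
    k4 = rhs([u[i] + h * k3[i] for i in range(2)], *args)
    return [u[i] + h / 6.0 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
            for i in range(2)]

# constant inputs (assumed values): T1 = 1, T2 = 0.5, v_in, m_fib, m_fil
u, h = [0.0, 0.0], 0.05
for _ in range(4000):                       # integrate to t = 200
    u = rk4_step(u, h, 1.0, 0.5, 0.282, 37.0, 16.0)
R_tot = u[0]
R_fil = 0.7 * R_tot                         # proportionality (2.3.56), k = 0.7 assumed
```

For constant inputs the simulated R_tot settles at the value of the (placeholder) right-hand side, as expected for a PT2 element with unit static gain.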

Fiber Recovery and Screens

Screens exist for different purposes and are used to filter impurities from the pulp. Fiber recovery is essential in paper making. Edge trimmings are dissolved in pulpers, and white water is led back to the short and long cycles, in which the fibers are to be recovered. The reason is simply that fibers are valuable primary products in paper making, so losses should be minimal.
There are different techniques to recover fibers from a pulp stream, just as there are different screening technologies. The modeling, however, can be similar. The basic construction of such a model can simply be a control volume that has a single inlet and several outlets which deliver different pulp and water qualities. In practice, all streams are separated into some streams called accept and a stream called reject. Again, the separation of the input stream into several output streams can be described in the same way as for the former model, while replacing the separation parameters by suitable (empirical) models.

2.3.4. Process Dynamics at Exemplary Plants

In this section, we describe a basic wet-end concept using only a short white water circulation to outline the concept of wet-end modeling and process behavior. The process dynamics depend on the transport phenomena of the pulp and the separation in the former. The short white water circulation is illustrated in Fig. 2.24.
We use the presented models for the components involved, such as a storage chest, pipes, former and pump. The models are implemented in gPROMS ModelBuilder, which implicitly creates a system of differential-algebraic equations. The pipes are parameterized using a fixed ratio of 'nodes per meter', k(l) = 10.
We numerically analyze the process dynamics by sending impulses via pulp in, see Fig. 2.24, and solve the dynamic system using gPROMS solvers. We can expect the system to provide a characteristic response depending on the white water circulation and the assumptions on the retention and dewatering parameters.


[Schematic of the short white water circulation: pulp in and white water are mixed and fed to the former; the outputs are paper out and overflow. Labels in the sketch: v_in, c_fil ≈ 0.3, w_s ≈ 80, pipe lengths 5 m and 10 m, machine speed v_M = 30 m·s⁻¹, web width W_M = 6 m.]

Figure 2.24.: This is a simple wet-end process with only a short white water circulation. Pulp streams in, is mixed with white water and is transported to the former, where it is dewatered and separated into paper and white water.

We assume fixed properties of the inflow. The produced paper in such a model is charac-terized by the following two quantities.

• w_s – substance, specific mass of solids per square meter [g·m⁻²],

w_s = 10^3 \cdot \frac{v^{pap} \cdot (m^{pap}_{fib} + m^{pap}_{fil})}{v_M \cdot W_M},   (2.3.57)

where v_M is the machine speed in [m·s⁻¹] and W_M is the width of the paper web in [m].

• c_fil – ash content, dimensionless fraction of fillers of the total solids in the paper,

c_{fil} = \frac{m^{pap}_{fil}}{m^{pap}_{fib} + m^{pap}_{fil}}.   (2.3.58)

Substance and ash content usually have to lie within certain tolerances in order to belong to a desired product specification. Standard copy paper often has a substance of 80 g/m². Since these values depend on the wet-end balancing and therefore on the dynamics of the properties being balanced, they are implicitly time-dependent.
The machine speed v_M usually has an offset ∆v to the velocity of the pulp jet from the headbox,

vjet = vM + ∆v, (2.3.59)

but here, we choose ∆v := 0 for simplification.
Now we analyze the effect of a change in the inflow v_in on the outputs. The wet-end example model is set up to reach values of w_s ≈ 80 and c_fil ≈ 0.28. This means that the inflow


properties and v_in = v^{fixed}_{in} are chosen in a suitable way. This leads to a stable system. Then, we define an impulse on the inflow of primary products through pulp in to describe a disturbance in the system,

v_{in}(t) = \begin{cases} v^{fixed}_{in}, & \text{for } t < 50, \\ v^{fixed}_{in} + \Lambda, & \text{for } 50 \le t < 51, \\ v^{fixed}_{in}, & \text{for } t \ge 51, \end{cases}   (2.3.60)

with the impulse magnitude |Λ| > 0.
The separation parameters were chosen as constants

γdew = 0.2, Rtot = 0.8, and Rfil = 0.5. (2.3.61)

Further parameters are given by:

description               variable         value   unit
filler content pulp in    m^{in}_{fil}     16      [g/l]
fiber content pulp in     m^{in}_{fib}     37      [g/l]
volume flow pulp in       v^{fixed}_{in}   0.282   [m³/s]
slice opening headbox     d_{slice}        8       [mm]
impulse magnitude         Λ                0.05    [m³/s]
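The disturbance (2.3.60) itself is easy to state in code. The sketch below defines the impulse with the parameter values from the table and checks that it injects an extra pulp volume of Λ · 1 s; it is an illustration of the input signal only, not of the gPROMS wet-end model.

```python
# Illustration of the disturbance signal (2.3.60) only -- not the gPROMS
# wet-end model.  V_FIXED and LAMBDA are the values from the table above.

V_FIXED, LAMBDA = 0.282, 0.05      # [m^3/s]

def v_in(t):
    """Inflow with a unit-length impulse of magnitude LAMBDA at t = 50 s."""
    if 50.0 <= t < 51.0:
        return V_FIXED + LAMBDA
    return V_FIXED

# Riemann sum over [0, 100] s: the impulse injects LAMBDA * 1 s = 0.05 m^3
# of additional pulp volume into the system.
h = 0.001
extra_volume = sum((v_in(k * h) - V_FIXED) * h for k in range(100_000))
```

A piecewise-constant control, as used later for optimal control, is just a sequence of such steps of different lengths and magnitudes.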

The main results are shown in Fig. 2.25. The impulse in the inflow causes a periodic response of the system with vanishing magnitude, which depends on the short white water circulation.
An increase of the volume flow in the pulp inflow has a quite direct influence on the substance of the solids in the paper. But the white water also gets a higher consistency, which causes a second influence, and so on. The effect on the ash content is a little more complicated, since it depends on the equilibrium in the short water circulation. Obviously, the fraction of fillers in the white water circulation is higher here than in the inflow. This is caused by the low filler retention R_fil. By using more of the pulp inflow for a short time, the filler content in the pulp decreases in the first response.
Note that the separation parameters are chosen as constants. If, however, the retention and dewatering parameters depend on the consistency, the filler content or the volume flow, it gets harder to predict and explain the system's behavior by intuition.
The reason why we analyze the impulse responses of the wet-end system is that we limit ourselves to piecewise-constant controls later on for the optimal control of wet-end processes. The control is done by steering variables like the pulp inflow, as done in this example, and a piecewise-constant control is just a sequence of impulses of different length and magnitude.
Modeling the wet-end of a paper machine is in wide areas straightforward and well understood. The main focus of such modeling is to describe the time-dependent phenomena in complex wet-end processes, which can hardly be fully understood without the help of simulation. The difficulty is mainly to deliver a fit-for-purpose model that can be used for different kinds of applications. Therefore, I developed a gPROMS model library to describe different types of paper machine wet-ends as well as single- and double-tiered drying sections.
A model was built to fit the pilot paper machine in Heidenheim using an own approach to retention modeling, resulting in a complex dynamic model with more than 5000 unknown variables. It can be used to predict the machine's behavior when applying changes to the process. The comparison with actual measurements of the pilot plant shows the theoretical potential of optimal process control using simulation.


Figure 2.25.: These two figures show the results of a dynamic simulation of the simple wet-end model using gPROMS.


Chapter 3.

Solution Methods

In Chapter 2, two different models for applications in pulp and paper industry were presented. The drying section model is intended to be used as a steady-state model, while the wet-end model is dedicated to the description of the system's dynamics over time. However, both problems are in fact linked, since the numerical solution of the resulting systems leads to nonlinear systems of equations. While the drying section model – with time derivatives replaced by zero – directly yields a system of nonlinear equations, the wet-end model is primarily a system of differential-algebraic equations. Solving DAE systems usually leads to a problem of sequentially solving systems of nonlinear equations.
Both models use discretization schemes to describe transport through a system. In the drying section, a backward finite difference discretization is used to model the transport of water and energy in the paper web along the machine direction. In the wet-end model, a special case of the finite difference method is used to generate a sequence of CSTRs. In any case, the number of discretization points can be large, depending on the number of pipes, the number of nodes used and the complexity of the whole system. As described previously, the discretization of the one-dimensional transport equation basically leads to a linear system of ordinary differential equations whose system matrix has nonzero entries only on the diagonal and the lower sub-diagonal. Thus, we can assume that the numerical problems arising from the presented pulp and paper simulation problems are large and sparse.
In the software gPROMS, the solver BDNLSOL is used to solve systems of nonlinear equations whenever they occur; it uses a Newton-type method with block-decomposition techniques to handle sparsity. The numerical analysis of systems of nonlinear equations and DAEs is well understood.
We borrow freely from [Deu04, HW04, AQS02, DB95, Dav06, BS96] while outlining the numerical concepts for solving these types of problems in Sections 3.1 and 3.2. We will not go into detail concerning convergence and analysis results of the described methods but refer to the literature mentioned above and in the sections themselves.
In Section 3.3, we discuss the methods used for sensitivity analysis and gradient calculation of such systems. We present a new foreign object implementation as an external tool for the software gPROMS which augments it by the capability of performing large-scale parametric sensitivity analysis of nonlinear systems.

3.1. Solution of Nonlinear Systems of Equations

Nonlinear systems require the simultaneous solution of n nonlinear equations in n unknown variables. The standard method to address such problems is the multidimensional version of the scalar Newton-Raphson method. In the literature, numerous variants of the method are known, and their names usually refer to Newton's method, which is briefly described in the following.


[Sketch: one Newton-Raphson step for a scalar function f(x), showing the iterates x_k and x_{k+1}, the function values f(x_k) and f(x_{k+1}), and the solution x*.]

Figure 3.1.: Newton-Raphson method for finding the zero of a scalar function f(x).

3.1.1. Newton’s Method

A system of n nonlinear equations is a special case of the differential-algebraic system (2.0.1) with only algebraic equations, which we now denote by x instead of y. Let f : R^n → R^n be an n-dimensional operator. A general nonlinear algebraic system for n algebraic variables is a system of n equations of the form

f(x) = \begin{pmatrix} f_1(x_1, \dots, x_n) \\ \vdots \\ f_n(x_1, \dots, x_n) \end{pmatrix} = 0.   (3.1.1)

The system involves n equations which have to be satisfied simultaneously by the variable vector x = (x_1, . . . , x_n)^T. The classical Newton method uses a linearization of the system to guess the position of the zero of f. Let x* be a solution of (3.1.1). If f is smooth enough, by Taylor's theorem, starting at a guess x_0, we can use the first-order approximation

0 = f(x∗) ≈ f(x0) + Jf (x0)(x∗ − x0) (3.1.2)

with the Jacobian of f

J_f(x_0) := Df(x_0) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(x_0) & \dots & \frac{\partial f_1}{\partial x_n}(x_0) \\ \vdots & & \vdots \\ \frac{\partial f_n}{\partial x_1}(x_0) & \dots & \frac{\partial f_n}{\partial x_n}(x_0) \end{pmatrix}.   (3.1.3)

Writing this as an iteration gives for the (k + 1)-th step the linear system

f(xk) + Jf (xk)(xk+1 − xk) = 0 (3.1.4)


for the unknown xk+1. If Jf (xk) is regular, we can write the explicit iteration

xk+1 = xk − Jf (xk)−1f(xk). (3.1.5)

More generally, this is a special case of the fixed-point iteration

xk+1 = Φ(xk) (3.1.6)

with Φ(x) = x − J_f(x)^{-1} f(x), and a solution x* is a fixed point with Φ(x*) = x*. For local convergence to a solution x*, we need the iteration to be a contraction,

‖Φ(x)− x∗‖ ≤ C‖x− x∗‖p (3.1.7)

for 0 ≤ C and an order p ≥ 1 within a domain U(x*). For p = 1, we need 0 < C < 1, and for p > 1 only C < ∞. The classical result is the theorem of Newton-Kantorovich, which ensures local quadratic convergence of the Newton method if, roughly speaking,

(i) the domain of f : D ⊂ Rn → Rn is convex, f is continuously differentiable,

(ii) Jf (x0) is regular and its inverse is bounded,

(iii) Jf satisfies a Lipschitz condition and

(iv) the Newton step is bounded.

For detailed convergence results as well as a proof of the Newton-Kantorovich theorem, see [OR87].
The geometric interpretation of the Newton-Raphson method for a scalar equation is that, at the current iterate, we always follow the tangent to its unique zero, as long as the tangent itself is not parallel to the x-axis, see Fig. 3.1.
Starting at an initial guess x_0, the basic numerical algorithm has the form:

for k = 0, . . . , k_max do
    if ‖f(x_k)‖ < ε then
        exit with solution x_k
    end if
    Solve J_f(x_k) ∆x = −f(x_k)
    x_{k+1} ← x_k + ∆x
end for

This means that we have to solve linear systems of the form Ax = b until we reach a point whose function value vector has a norm smaller than a specified tolerance ε, or until we reach a non-convergence bound k_max on the number of iterations.
If n is large, we face two different problems. The first one is that calculating the Jacobian J_f(x_k) gets expensive. For example, approximating the entries of the matrix by central differences requires a total of 2n² evaluations of a component function of f. The second problem is solving the linear system, where the Jacobian has to be factorized. The direct solution of the system by Gaussian elimination has a complexity of O(n³), which becomes impractical already for moderate n.
The partial derivatives building the Jacobian can be computed directly by using either numerical differentiation techniques or automatic differentiation, see [Gri02].
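To make the iteration concrete, here is a minimal sketch of the algorithm for a small hypothetical 2×2 system, with the Jacobian approximated column-wise by central differences; this is an illustration, not the solver used in gPROMS.

```python
# A minimal sketch of the Newton algorithm above for a hypothetical 2x2
# system; the central-difference Jacobian costs the 2*n^2 extra function
# evaluations mentioned in the text.

def solve2x2(A, b):
    """Direct solution of a 2x2 linear system by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

def newton(f, x0, tol=1e-10, kmax=50, eps=1e-6):
    n, x = len(x0), list(x0)
    for _ in range(kmax):
        fx = f(x)
        if max(abs(v) for v in fx) < tol:
            return x
        J = [[0.0] * n for _ in range(n)]
        for j in range(n):                      # central-difference Jacobian
            xp, xm = list(x), list(x)
            xp[j] += eps
            xm[j] -= eps
            fp, fm = f(xp), f(xm)
            for i in range(n):
                J[i][j] = (fp[i] - fm[i]) / (2.0 * eps)
        dx = solve2x2(J, [-v for v in fx])      # solve J dx = -f(x)
        x = [x[i] + dx[i] for i in range(n)]
    raise RuntimeError("no convergence within kmax iterations")

# hypothetical test system: circle x^2 + y^2 = 2 intersected with y = x^2
f = lambda x: [x[0] ** 2 + x[1] ** 2 - 2.0, x[1] - x[0] ** 2]
root = newton(f, [1.2, 0.8])                    # converges to (1, 1)
```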


Newton-Type Methods

• Standard Newton:

Jf (xk)∆x = −f(xk), xk+1 = xk + ∆x, k = 0, 1, . . . (3.1.8)

• Simplified Newton:

Jf (x0)∆x = −f(xk), xk+1 = xk + ∆x, k = 0, 1, . . . (3.1.9)

Here, the initial Jacobian J_f(x_0) is not updated throughout the iteration, or only occasionally.

• Damped Newton:

Jf (xk)∆x = −f(xk), xk+1 = xk + αk∆x, k = 0, 1, . . . (3.1.10)

for a damping parameter αk ∈ (0, 1).

• Relaxed Newton: For scalar equations f(x) = 0 with a root of multiplicity p, apply Newton's method to f(x)^{1/p} = 0 with relaxation parameter p.

• Broyden/Quasi-Newton: The secant equation raises the possibility of finding approximations to the Jacobian of f,

Bk+1(xk+1 − xk) = f(xk+1)− f(xk). (3.1.11)

In one dimension, the scalars B_k are just the slope of the secant through two subsequent points, and the Quasi-Newton method is equivalent to the well-known secant method. In n dimensions, however, the n² unknowns in B_k are not identified uniquely by the secant equation. Whole classes of so-called Broyden-class methods and Quasi-Newton methods can be derived. Additional requirements on the approximation, such as symmetry, sparsity or positive definiteness, are used to construct Quasi-Newton update formulas. In Chapter 4, the formulas are revisited when Quasi-Newton methods are explained for their use in nonlinear optimization. The iteration is

Bk∆x = −f(xk), xk+1 = xk + ∆x, k = 0, 1, . . . (3.1.12)

• Gauss-Newton: A variant for the solution of nonlinear least-squares problems involving the normal equations.

• Exact Methods: Newton-type methods are called exact whenever direct methods are used to solve the linear systems.

• Inexact Methods: Newton-type methods are called inexact whenever iterative methods are used to solve the linear systems.
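A minimal sketch of a Quasi-Newton iteration with Broyden's rank-one ("good") update, which enforces the secant equation (3.1.11); the initial B_0 is taken as a forward-difference Jacobian, a common practical choice, and the small 2×2 test system is hypothetical.

```python
import numpy as np

# Sketch: Broyden's rank-one update enforcing the secant equation (3.1.11).
# B_0 is initialized by a forward-difference Jacobian (assumed choice).

def fd_jacobian(f, x, eps=1e-7):
    n, fx = len(x), f(x)
    J = np.empty((n, n))
    for j in range(n):
        xp = x.copy()
        xp[j] += eps
        J[:, j] = (f(xp) - fx) / eps
    return J

def broyden(f, x0, tol=1e-10, kmax=100):
    x = np.asarray(x0, dtype=float)
    B, fx = fd_jacobian(f, x), f(x)
    for _ in range(kmax):
        if np.linalg.norm(fx) < tol:
            return x
        dx = np.linalg.solve(B, -fx)              # B_k dx = -f(x_k)
        x_new = x + dx
        fx_new = f(x_new)
        # rank-one correction so that B_{k+1} dx = f(x_new) - f(x)
        B += np.outer(fx_new - fx - B @ dx, dx) / (dx @ dx)
        x, fx = x_new, fx_new
    raise RuntimeError("no convergence")

f = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 2.0, x[1] - x[0] ** 2])
root = broyden(f, [1.2, 0.8])                     # approaches (1, 1)
```

The update costs only a matrix-vector product and an outer product per step, instead of a full Jacobian evaluation.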


Globalization

Since ordinary Newton methods are locally but not necessarily globally convergent, certain globalization strategies might be applied.

• Continuation methods: These are also known as homotopy methods and extend the problem of finding the zero of f(x) to

f(x, τ) = 0, (3.1.13)

where the initial guess x_0 is a solution for τ = 0 and the desired solution x* is a solution for τ = 1. This refers to parameter continuation methods. Continuation methods can also be interpreted as the solution of a system of ordinary differential equations,

x′(τ) = f(x(τ)), x(0) = x0, τ ≥ 0 (3.1.14)

Then we have to find the value of x for τ → ∞, because it is known that every limit point of x solves the nonlinear equation. This relates to Newton-path methods and is similar to gradient path methods for unconstrained optimization as discussed in Section 6.5.

• Steepest descent: The so-called residual level function is defined by

g(x) := \frac{1}{2} \|f(x)\|_2^2 = \frac{1}{2} f(x)^T f(x)   (3.1.15)

and has its zeros at solutions of f(x) = 0. In contrast to ordinary Newton methods, the iteration

∆x = −∇g(xk) = −Jf (xk)T f(xk), xk+1 = xk + αk∆x, k = 0, 1, . . . (3.1.16)

for suitable choices of step size parameters αk > 0 satisfies the strict downhill property

g(x_{k+1}) < g(x_k) \quad \text{if } g(x_k) \ne 0.   (3.1.17)

This globalizes the approach: it follows the steepest descent path of g to find a zero of f.

• Damped Newton: Choose the damping parameter in (3.1.10) in such a way that

‖f(xk+1)‖ < ‖f(xk)‖ (3.1.18)

• Other types: convex mappings, trust-region methods, Levenberg-Marquardt, general level functions.

In every Newton-type method, linear systems of equations have to be solved in each iter-ation. In principle, there are direct and iterative methods for solving the linear problemswhich define the Newton-type methods as exact or inexact, respectively. The subproblemsarising strongly relate to the problems in sparse linear algebra.
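The damping strategy can be sketched for a scalar example where the undamped iteration fails, f(x) = arctan(x) with x_0 = 3: the step is halved until the residual norm decreases, as required by (3.1.18). The example is illustrative only.

```python
import math

# Sketch: damped Newton for the scalar equation arctan(x) = 0.  From x0 = 3
# the undamped iteration diverges; halving alpha until the residual norm
# decreases (condition (3.1.18)) restores convergence to the root x* = 0.

def damped_newton(f, df, x0, tol=1e-12, kmax=100):
    x = x0
    for _ in range(kmax):
        fx = f(x)
        if abs(fx) < tol:
            return x
        dx = -fx / df(x)                           # full Newton step
        alpha = 1.0
        while abs(f(x + alpha * dx)) >= abs(fx):   # enforce monotone residual
            alpha /= 2.0
            if alpha < 1e-12:
                raise RuntimeError("line search failed")
        x += alpha * dx
    raise RuntimeError("no convergence")

root = damped_newton(math.atan, lambda x: 1.0 / (1.0 + x * x), 3.0)
```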


Sparse Linear Algebra

It is not within the scope of this thesis to give a detailed review of the methods used for solving large systems of linear equations. However, we mention some of the standard methods according to [Dav06, Ede04].
The number of variables involved in the nonlinear system is equal to the number of unknowns in the linear subproblem. Direct methods for dense systems quickly become inefficient and practically impossible to use because of the computational effort and the memory required for storing full matrices. Fortunately, large systems usually arise from the discretization of partial or ordinary differential equations, which makes them sparse, since derivative approximations usually are local. It is imperative that the sparsity pattern of the system be exploited.
Sparse matrices occur as structured or unstructured, characterized by properties such as symmetry, definiteness, or block or diagonal structure. General nonlinear problems generate linear systems that have no special structure except their sparsity.

• Sparse Direct Methods: These are based on the idea of directly factorizing the linear system Ax = b into LUx = b, where special strategies are applied to preserve the sparsity of the factors L and U. Full factorization usually generates dense nonzero patterns in these matrices. For an overview of direct methods we refer to [Dav06] as well as [Ede04].
An important property is that factorization methods easily work on linear systems with multiple right-hand sides of the form AX = B, where the unknown itself is a matrix. The same factorization can be used to solve for every right-hand side at once. This plays a role in the solution of sensitivity systems, as discussed in Section 3.3. Depending on the structure of the system matrix, there are different ways to attack the problem.

– Symmetric positive definite: sparse LU decomposition by Cholesky decomposition. The Cholesky decomposition might not preserve the sparsity of the system matrix; thus, the so-called fill-in is reduced by representing the nonzero pattern of the triangular matrices as a graph and using elimination trees.

– Unsymmetric: sparse LU decomposition by block triangular decomposition, as used in MA48 [DR96], PARDISO [SG04] or the NAG library [SS96]. Three main phases are used, namely analyze, factorize and solve. In the first one, the methods seek to find the block triangular form

PAQ = \begin{pmatrix} A_{11} & A_{12} & \cdots & & \\ & A_{22} & A_{23} & \cdots & \\ & & A_{33} & \cdots & \\ & & & \ddots & \\ & & & & A_{rr} \end{pmatrix} = LU   (3.1.19)

with square blocks A_ii, and choose pivots in order to preserve the sparsity of A during the factorization. Then the factorization is computed and the system is solved by back substitution.

• Sparse Iterative Methods: Iterative methods for approximately solving a linear systemof the form Ax = b try to minimize the residual

rk = b−Axk = Ax−Axk = A(x− xk) (3.1.20)


for increasing k in a suitable manner, namely by minimizing the

– residual norm ‖r_k‖,
– error norm ‖x − x_k‖ or
– energy norm ‖A^{1/2}(x − x_k)‖.

For instance, there exists an error estimate in the energy norm for conjugate gradient methods depending on the condition of the matrix A. Again, it depends on the structure of A which methods are efficient and applicable.

– Symmetric positive definite: Conjugate gradient methods apply to symmetric positive definite matrices A, and solving Ax = b is equivalent to solving the minimization problem

\frac{1}{2} x^T A x - b^T x \to \min.   (3.1.21)

A matrix P can be called a preconditioner if its inverse can be applied easily and if the system

P^{-1} A x = P^{-1} b   (3.1.22)

can be solved accurately within fewer iterations than the unpreconditioned system, that is, roughly speaking, if the condition of the matrix is improved. This is motivated by the upper bound on the energy norm of the error for conjugate gradient methods, which says that the error is small if the condition of the actual matrix is small. In fact, preconditioning is essential for the efficiency of the iterative solution of linear systems.

– Unsymmetric: SuperLU-dist [LD02], Generalized Minimum Residual Method(GMRES)
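As an illustration of the iterative approach, here is a bare-bones, unpreconditioned conjugate gradient iteration; the hypothetical test matrix is the tridiagonal 1-D Laplacian, applied matrix-free, which is all the method requires.

```python
import numpy as np

# A bare-bones (unpreconditioned) conjugate gradient iteration.  The system
# matrix enters only through a matrix-vector product; the tridiagonal 1-D
# Laplacian below is a hypothetical sparse test matrix.

def cg(matvec, b, tol=1e-10, kmax=1000):
    x = np.zeros_like(b)
    r = b - matvec(x)                 # residual r_k = b - A x_k
    p = r.copy()                      # initial search direction
    rs = r @ r
    for _ in range(kmax):
        if np.sqrt(rs) < tol:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)         # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p     # next A-conjugate direction
        rs = rs_new
    return x

n = 50
def laplacian(v):                     # (A v)_i = 2 v_i - v_{i-1} - v_{i+1}
    w = 2.0 * v
    w[:-1] -= v[1:]
    w[1:] -= v[:-1]
    return w

x = cg(laplacian, np.ones(n))
```

In exact arithmetic the method terminates after at most n steps; in practice the convergence speed is governed by the condition number, which is where preconditioning enters.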

3.2. Solution of Differential-Algebraic Systems of Equations

Consider the system (2.0.1),

F(x, x′, y, u, t) = 0,

in full implicit form, where the differential variables x and the algebraic variables y are treated separately. The applications in pulp and paper industry presented before can be written in the special form

x′ = F(x, y),   x(0) = x_0,   (3.2.1)
0 = G(x, y),   (3.2.2)

where differential and algebraic equations can be separated into explicit ODEs and algebraic equations. The latter do not depend on any time derivatives; thus this form is called semi-explicit. We assume that the control u is the only variable which is explicitly time-dependent, and so we omit u and t in our notation. In the following, we assume that the system does not directly depend on the time t.

3.2.1. Index 1

Using the implicit function theorem, we can differentiate equation (3.2.2) if G is sufficientlysmooth and get

y′ = −G_y^{-1} G_x F   (3.2.3)


with

G_y(x, y) := G_y := \frac{\partial G}{\partial y}(x, y),   (3.2.4)

G_x(x, y) := G_x := \frac{\partial G}{\partial x}(x, y),   (3.2.5)

if the matrix G_y is invertible in a neighborhood of the solution. Here, we have to differentiate equation (3.2.2) only once to obtain a fully explicit system of ordinary differential equations; that is why the DAE system (3.2.1)/(3.2.2) is said to be of index 1. This means that modeling in semi-explicit form directly leads to DAE systems of differentiation index 1.

3.2.2. Higher differentiation index

Roughly speaking, the differentiation index of a DAE tells us how far it is from being an ordinary differential equation. The following definition is borrowed from [HW04].

Definition 3.2.1. The differentiation index of a DAE system (2.0.1) is the minimum number m of analytical differentiations,

F = 0, \quad \frac{dF}{dt} = 0, \quad \dots, \quad \frac{d^m F}{dt^m} = 0,   (3.2.6)

which is needed to analytically find the so-called underlying ODE system of the form

x′ = Ψ(x). (3.2.7)

3.2.3. Consistent Initialization

The initial value x_0 of the semi-explicit system (3.2.1)/(3.2.2) is called consistent if there exists a solution y_0 of

G(x0, y0) = 0. (3.2.8)

This leads to the problem of solving a nonlinear system of equations at time 0.
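For the semi-explicit system, consistent initialization is thus just a root-finding problem in y_0. A toy sketch with the hypothetical constraint G(x, y) = y³ + y − x, whose derivative G_y = 3y² + 1 is invertible everywhere, so y_0 is unique:

```python
# Sketch: consistent initialization reduces to root finding in y_0.  The
# constraint G(x, y) = y^3 + y - x is a hypothetical example; since
# G_y = 3y^2 + 1 never vanishes, plain Newton suffices.

def consistent_y0(x0, y_guess=0.0, tol=1e-12, kmax=50):
    G = lambda y: y ** 3 + y - x0
    dG = lambda y: 3.0 * y ** 2 + 1.0     # = G_y, never zero
    y = y_guess
    for _ in range(kmax):
        if abs(G(y)) < tol:
            return y
        y -= G(y) / dG(y)
    raise RuntimeError("no convergence")

y0 = consistent_y0(2.0)    # G(2, 1) = 0, so (x0, y0) = (2, 1) is consistent
```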

3.2.4. ε-embedding for index 1 problems

The semi-explicit form of DAEs is a special case, for ε = 0, of the so-called singular perturbation problem (SPP),

x′ = F(x, y),   (3.2.9)
εy′ = G(x, y).   (3.2.10)

Using this form, it is possible to apply standard methods for solving ODE systems and to set ε to zero in the resulting iteration formulas. This is a useful strategy, since methods can be developed and their convergence can be analyzed for ε → 0.


In [HW04], the general single-step Runge-Kutta method is applied to the SPP and gives

X_{ni} = x_n + h \sum_{j=1}^{s} a_{ij} F(X_{nj}, Y_{nj}),   (3.2.11)

0 = G(X_{nj}, Y_{nj}),   (3.2.12)

x_{n+1} = x_n + h \sum_{i=1}^{s} b_i F(X_{ni}, Y_{ni}),   (3.2.13)

y_{n+1} = \Bigl(1 - \sum_{i,j=1}^{s} b_i \omega_{ij}\Bigr) y_n + \sum_{i,j=1}^{s} b_i \, \omega_{ij} Y_{nj},   (3.2.14)

for the Butcher array [But87]

\begin{array}{c|c} c & M \\ \hline & b^T \end{array}

with M invertible, so that

(\omega_{ij})_{i,j=1,\dots,s} = M^{-1}.   (3.2.15)

Note that the semi-explicit DAE system is only implicitly time-dependent; this means that we have no need for the parameter vector c in this notation.

3.2.5. Linear Multistep Methods

As for the application of the single-step Runge-Kutta method, a general multistep method can be applied to the SPP (3.2.9)/(3.2.10). For k steps this gives

\sum_{i=0}^{k} \alpha_i x_{n+i} = h \sum_{i=0}^{k} \beta_i F(x_{n+i}, y_{n+i}),   (3.2.16)

\varepsilon \sum_{i=0}^{k} \alpha_i y_{n+i} = h \sum_{i=0}^{k} \beta_i G(x_{n+i}, y_{n+i}),   (3.2.17)

with weights αi, βi. With ε = 0 we get

\sum_{i=0}^{k} \alpha_i x_{n+i} = h \sum_{i=0}^{k} \beta_i F(x_{n+i}, y_{n+i}),   (3.2.18)

0 = \sum_{i=0}^{k} \beta_i G(x_{n+i}, y_{n+i}),   (3.2.19)

which is the multistep method applied to the semi-explicit DAE (3.2.1)/(3.2.2). For the method to be implicit we need β_k ≠ 0; otherwise the method is explicit.
Implicit methods lead to nonlinear problems in each step. Here, we get a combined system consisting of (3.2.18) for the ODE part and (3.2.19) for the algebraic part of the DAE.
Adams-Bashforth (AB) and Adams-Moulton (AM) methods are popular examples of explicit and implicit multistep methods, respectively [AQS02]. Widely used implicit methods are the so-called BDF methods, which are sketched briefly in the following. These are also part of the solver DASOLV in gPROMS [Pro04a].


BDF

Backward-difference formulas (BDF) are implicit multistep methods whose parameters are chosen in such a way that a polynomial satisfies the ODE at the current point and interpolates it at the k previous points:

\sum_{i=0}^{k} \alpha_i x_{n+i} = h \beta_k F(x_{n+k}, y_{n+k}),   (3.2.20)

0 = G(x_{n+k}, y_{n+k}).   (3.2.21)

Clearly, βk = 1 can be chosen without loss of generality. The simplest case k = 1 withα0 = −1 and α1 = 1 leads to the classical implicit Euler method

xn+1 = xn + hF (xn+1, yn+1) (3.2.22)

For k = 2, we have

x_{n+2} = \frac{4}{3} x_{n+1} - \frac{1}{3} x_n + h \, \frac{2}{3} F(x_{n+2}, y_{n+2}),   (3.2.23)

and this can be collected in an array

k     α_0     α_1     α_2     α_3
1     −1      1       –       –
2     1/2     −2      3/2     –
3     −1/3    3/2     −3      11/6
⋮     ⋮       ⋮       ⋮       ⋮

for k ≥ 1. This scheme has β_k = 1, but in the literature there are different equivalent ways to write the BDF method. It is also possible to take α_k = 1, which modifies all parameters in this scheme by dividing them by α_k, thus resulting in β_k = 1/α_k.

Write F_n := F(x_n, y_n). Shifting the indices gives for k = 1

x_n = x_{n-1} + h F_n \quad \Leftrightarrow \quad \frac{x_n - x_{n-1}}{h} = F_n   (3.2.24)

and for k = 2

x_n = \frac{4}{3} x_{n-1} - \frac{1}{3} x_{n-2} + h \, \frac{2}{3} F_n \quad \Leftrightarrow \quad \frac{x_{n-2} - 4 x_{n-1} + 3 x_n}{2h} = F_n.   (3.2.25)

Let us assume a constant step size h; this yields a time discretization

t_n = n h, \quad n = 0, 1, \dots,   (3.2.26)

so that in these terms BDF methods replace the time derivative,

x′(t_n) ≈ \Pi′(\underbrace{x_n, \dots, x_{n-k}}_{≈ (x(t_n), \dots, x(t_{n-k}))}) := \Pi′(t_n),   (3.2.27)

with an approximation formula Π′ depending on the choice of k. It approximates the time derivative at time t_n using the k previous iterates together with the current one, which makes the method implicit.
It is well known that BDF methods are stable up to order k = 6, see [Cry73].
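As an illustration, the following sketch applies BDF2, i.e. (3.2.20)/(3.2.21) with k = 2, to a hypothetical semi-explicit index-1 DAE whose x-component has the closed-form solution x(t) = 1/(1 + eᵗ). Each implicit step is solved by a scalar Newton iteration; this is a stand-in for a general Newton-type corrector, not the gPROMS implementation.

```python
# Sketch: BDF2 applied to the hypothetical semi-explicit index-1 DAE
#     x' = F(x, y) = -x + y,   0 = G(x, y) = y - x^2,   x(0) = 0.5,
# whose x-component solves x' = x^2 - x, hence x(t) = 1/(1 + e^t).

def F(x, y):
    return -x + y

def newton_scalar(res, dres, X, tol=1e-14, kmax=50):
    for _ in range(kmax):
        r = res(X)
        if abs(r) < tol:
            break
        X -= r / dres(X)
    return X

def step_bdf2(x1, x0, h):
    """Solve 1.5*X - 2*x1 + 0.5*x0 = h*F(X, X^2) for X = x_{n+2}."""
    res = lambda X: 1.5 * X - 2.0 * x1 + 0.5 * x0 - h * F(X, X * X)
    dres = lambda X: 1.5 - h * (2.0 * X - 1.0)
    return newton_scalar(res, dres, x1)          # start from previous value

h, x_prev = 1e-3, 0.5
# one implicit-Euler (BDF1) step to obtain the second starting value
res = lambda X: X - x_prev - h * F(X, X * X)
dres = lambda X: 1.0 - h * (2.0 * X - 1.0)
x_curr = newton_scalar(res, dres, x_prev)

for _ in range(999):                             # 1 + 999 steps -> t = 1
    x_curr, x_prev = step_bdf2(x_curr, x_prev, h), x_curr

y_curr = x_curr * x_curr                         # algebraic constraint (3.2.21)
```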


Construction of BDF methods

Consider the Cauchy problem

x′ = f(t, x), x(0) = x0 (3.2.28)

for a scalar function x.
BDF methods replace the time derivative by an approximation (3.2.27). For the construction of BDF methods, the function Π is a polynomial with the properties

\Pi′(t_n) = f_n := f(t_n, x_n) \quad \text{and}   (3.2.29)

\Pi(t_{n-i}) = x_{n-i} \quad \text{for } i = 0, \dots, k.   (3.2.30)

This means it interpolates the solution of the Cauchy problem at the recent iterates, and they share the same time derivative at the current point.
Consider an equidistant grid given by a constant step length h. The Lagrange polynomial representation is

\Pi(t_n - \theta h) = \sum_{i=0}^{k} x_{n-i} \, \ell_i(\theta)   (3.2.31)

with

\ell_i(\theta) = \prod_{j=0, j \ne i}^{k} \frac{\theta - j}{i - j}   (3.2.32)

for θ = 0, 1, . . . , k.
Forming the derivative with respect to the shift θ on both sides gives

\frac{d}{d\theta} \Pi(t_n - \theta h) = \frac{d}{d\theta} \sum_{i=0}^{k} x_{n-i} \, \ell_i(\theta),   (3.2.33)

-h \, \Pi′(t_n - \theta h) = \sum_{i=0}^{k} x_{n-i} \, \ell_i′(\theta),   (3.2.34)

(\text{with } \theta = 0) \qquad -h f_n = \sum_{i=0}^{k} x_{n-i} \, \alpha_i,   (3.2.35)

with

\alpha_i = \ell_i′(0), \quad i = 0, \dots, k.   (3.2.36)

Thus, the coefficients α_i result from requirement (3.2.29), while the interpolation requirement generates the special form of the BDF methods. Reordering the coefficients leads back to the form (3.2.20).
It is important to note that the interpolation by a Lagrange polynomial does not require the time-stepping h to be uniform. This property is essential for the construction of variable grid methods, see [DB95]. Alternatively, it is possible to write the interpolating polynomial in Newton's representation instead of Lagrange's form, which has an important benefit when constructing multistep methods of variable order.
Note that we outlined the BDF construction scheme for a scalar ODE, but this is not restrictive, since it automatically applies to systems of equations component-wise.
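The coefficients α_i = ℓ_i′(0) can be generated directly from the Lagrange basis (3.2.32). A sketch in exact rational arithmetic, using the product rule ℓ_i′(0) = Σ_{m≠i} (i−m)⁻¹ Π_{j≠i,m} (−j)/(i−j) and the sign convention −h f_n = Σ_i α_i x_{n−i}:

```python
from fractions import Fraction

# Sketch: generating the BDF coefficients alpha_i = l_i'(0) from the Lagrange
# basis in exact rational arithmetic.  With the sign convention
#   -h*f_n = sum_i alpha_i * x_{n-i},
# the product rule gives
#   l_i'(0) = sum_{m != i} 1/(i - m) * prod_{j != i, m} (0 - j)/(i - j).

def bdf_alphas(k):
    alphas = []
    for i in range(k + 1):
        s = Fraction(0)
        for m in range(k + 1):
            if m == i:
                continue
            p = Fraction(1, i - m)
            for j in range(k + 1):
                if j not in (i, m):
                    p *= Fraction(-j, i - j)
            s += p
        alphas.append(s)
    return alphas

a1 = bdf_alphas(1)   # [-1, 1]:          h f_n = x_n - x_{n-1}
a2 = bdf_alphas(2)   # [-3/2, 2, -1/2]:  h f_n = (3 x_n - 4 x_{n-1} + x_{n-2}) / 2
```

Reordering these coefficients recovers the tabulated BDF1 and BDF2 schemes above.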


3.2.6. Predictor-Corrector Idea

The largest effort in computing a single iteration of a BDF method is solving the nonlinear equation, which is typically done by some Newton-type method as described before. Explicit methods have no need for solving nonlinear equations but only need function evaluations of F and G at every time step. The idea of predicting a new iterate by an explicit method, evaluating the function at the prediction and correcting it by an implicit method is called a predictor-corrector (PC) method. The corrector is a fixed-point iteration, and it is actually an explicit iteration since the prediction is inserted.
A simple case is Heun's method, which can be interpreted as predicting the new iterate by the explicit Euler method and correcting it by the implicit trapezoid rule (also called Crank-Nicolson). PC methods thus transform implicit methods into explicit ones by replacing the current iterate on the right-hand side of the iteration instruction by an explicit approximation.
Alternatively, instead of replacing, the predicted point can be used as an initial guess for the solution of the nonlinear equation in the implicit corrector step.
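The Euler-predictor/trapezoid-corrector combination mentioned above can be sketched in a few lines; applied to the hypothetical test equation x′ = −x it reproduces Heun's method:

```python
# Sketch of the predictor-corrector idea: an explicit Euler predictor followed
# by one trapezoid-rule (Crank-Nicolson) correction -- together this is Heun's
# method.  Hypothetical test equation: x' = -x, x(0) = 1.

def heun_pc(f, x0, h, n):
    x = x0
    for _ in range(n):
        pred = x + h * f(x)                   # predictor: explicit Euler
        x = x + 0.5 * h * (f(x) + f(pred))    # corrector: trapezoid rule
    return x

x_end = heun_pc(lambda x: -x, 1.0, 0.01, 100)  # approximates e^{-1} at t = 1
```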

3.2.7. Variable Step Size and Order

It is widely accepted that efficient numerical methods need some kind of adaptivity with respect to the problem to be solved. Basically, there are two different representations of BDF methods which allow changing the order and the step size. To decide which step size and order to choose, one usually needs error estimators for the local discretization error. Then we require the method to produce an error lower than a specified tolerance. Generally, this leads to the problem of weighting effort against accuracy.
The way the method is constructed is important for the ability to change order or step size during the computation. Adams and BDF methods result from polynomial interpolation formulas; thus, their variability depends on the type of representation.
As mentioned before, Newton's representation of the interpolating polynomial can be used. A great benefit of Newton's representation is that the order k of the BDF method can be changed easily: the parameters result directly while being independent of k, since we only have to add further terms to Newton's representation. However, this is only easy to compute if the grid points are equidistant, because the special form of the Newton polynomials for equidistant grids is needed.
The other approach uses a virtual equidistant grid. The idea is to always work with an equidistant grid of the actual step size. There is generally no information at the k points in history, since we might have worked with a different step size there, but the needed information lies at intermediate points of known iterates. The virtual grid is then filled by interpolating between the iterates in history. This approach is called the Nordsieck representation and refers to [Nor62].
In [DB95] there is a good overview of variable-grid and variable-order methods for multistep methods realized by predictor-corrector schemes. Roughly speaking, in Newton's approach changing the grid causes computational effort to determine the multistep parameters, and in Nordsieck's approach changing the order causes computational effort. Finally, step size h and order k are adaptively chosen similar to single-step methods by using estimators of the local discretization error. The benefit of multistep methods compared to single-step methods is that the computational effort for the evaluation of the right-hand side is the same for every step. This means that there might be computational overhead compared to


single-step methods, which loses its relevance for large systems and computationally expensive right-hand sides.
Although a general convergence theory for adaptive multistep methods is still missing, implicit BDF methods are attractive for solving stiff ODEs and large systems of differential-algebraic equations [Gea71, Pet92].
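The order-raising property of Newton's representation discussed above can be sketched with a divided-difference table (a generic illustration, not code from this work): appending one more interpolation node appends exactly one coefficient and leaves the previous ones untouched.

```c
#include <assert.h>
#include <math.h>

#define NMAX 16

/* Divided-difference coefficients of Newton's interpolating
   polynomial through (t[0],x[0]), ..., (t[n-1],x[n-1]).
   Adding a further node only appends one new coefficient. */
void newton_coeffs(const double t[], const double x[], int n, double c[])
{
    double d[NMAX];
    for (int i = 0; i < n; ++i) d[i] = x[i];
    for (int j = 1; j < n; ++j)                 /* column of the table */
        for (int i = n - 1; i >= j; --i)
            d[i] = (d[i] - d[i - 1]) / (t[i] - t[i - j]);
    for (int i = 0; i < n; ++i) c[i] = d[i];
}

/* Horner-like evaluation of the Newton form at s. */
double newton_eval(const double t[], const double c[], int n, double s)
{
    double p = c[n - 1];
    for (int i = n - 2; i >= 0; --i)
        p = p * (s - t[i]) + c[i];
    return p;
}
```

This is exactly the mechanism a variable-order multistep code exploits: raising the order from k to k + 1 reuses all coefficients computed so far.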

3.3. Parametric Sensitivity Analysis

Parametric sensitivity means identifying the dependency of the solution of a simulation problem on the time-invariant parameters involved. The simplest form of an explicit scalar model is

x = f(u), (3.3.1)

with a scalar state variable x and a parameter u. The sensitivity is just

s := ∂x/∂u = ∂f/∂u = f′(u),    (3.3.2)

the derivative of the variable x with respect to u. It is the slope of the tangent at the current point u.
The problems arising from simulation tasks in the pulp and paper industry, as well as in many other engineering applications, are of a more general form. Modeling often leads to systems of nonlinear or differential-algebraic equations, where DAEs generalize nonlinear systems.
Sensitivity analysis is a very helpful tool for optimization, parameter estimation and process analysis. In this work, it is used to identify parameters that theoretically improve the drying efficiency of a paper machine drying section according to the model presented in the previous chapter.
In this section, we consider the general system of nonlinear equations, outline the principles of sensitivity analysis and present a way to implement an external sensitivity analysis tool for the commercial software gPROMS.
Sensitivity analysis problems for DAEs are treated similarly to ODEs. They yield linear time-varying systems that can be solved in numerous ways. We briefly explain some of them.
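For the scalar model (3.3.1), the sensitivity (3.3.2) can be approximated by a central difference when the derivative f′ is unavailable; the cubic test model below is an arbitrary illustrative choice.

```c
#include <assert.h>
#include <math.h>

/* Central-difference approximation of the scalar sensitivity
   s = dx/du = f'(u) for the explicit model x = f(u). */
double sensitivity_fd(double (*f)(double), double u, double h)
{
    return (f(u + h) - f(u - h)) / (2.0 * h);
}

/* illustrative model: x = u^3, exact sensitivity s = 3 u^2 */
double model(double u) { return u * u * u; }
```

The central difference is second-order accurate in the perturbation h, which is the same scheme used below to approximate Jacobian entries in (3.3.9).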

3.3.1. Steady-State Sensitivity Problems

Extend the standard nonlinear problem (3.1.1) involving a variable y ∈ R^n to

f(y, u) = 0 (3.3.3)

with f : R^{m+n} → R^n and a parameter vector u ∈ R^m, and call it a parameter-dependent nonlinear system. The sensitivity of the n state variables with respect to the m parameters is consequently an n×m matrix

S := ∂y/∂u ∈ R^{n×m}.    (3.3.4)

The implicit function theorem tells us that

S = −(∂f/∂y)^{−1} (∂f/∂u)    (3.3.5)


if the matrix A := ∂f/∂y ∈ R^{n×n} is invertible. Define B := ∂f/∂u ∈ R^{n×m}. The sensitivity matrix S solves

AS = −B, (3.3.6)

which is a set of m linear systems

A s_i = −b_i,   i = 1, . . . , m,    (3.3.7)

with

s_i := ∂y/∂u_i,   b_i := ∂f/∂u_i,   i = 1, . . . , m,    (3.3.8)

and (3.3.6) is also called a linear system with multiple right-hand sides (mrhs).
Since the system matrix A does not change for any right-hand side, the problem can be solved by direct methods. Then the same factorization of A can be used for every right-hand side.
Provided that A and B can be computed, solving the full sensitivity system (3.3.6) is straightforward and sometimes called the forward mode. However, there is a different way to solve the problem making use of the adjoint system, also called the backward mode, which can have significant benefits, see [MP96]. We will give some more details on that in Section 3.3.3.
The entries of the matrices A and B generally depend on y and u, the solution of (3.3.3). Thus, the partial derivatives involved are actually functions of y and u. Unless analytic derivatives are available, one can use finite differences to approximate the Jacobian information needed. First pick a parameter vector u∗ and solve f(y∗, u∗) = 0 to obtain y∗ (we assume that the solution exists). Then

a_ij ≈ [f_i(y∗ + h e_j, u∗) − f_i(y∗ − h e_j, u∗)] / (2h),    (3.3.9)

with

e_k = (0, . . . , 0, 1, 0, . . . , 0)^T  (the 1 in the k-th position),   k = 1, . . . , n,    (3.3.10)

approximates the partial derivative of the i-th equation in f with respect to the j-th variable in y at the solution (y∗, u∗) by a central difference with perturbation h. This works analogously for B. However, inaccuracies in the finite difference approximations can lead to a singular matrix A. Then direct methods for the solution of the linear system will fail, but this problem is related to the solvability of the nonlinear system itself because it uses the same kind of Jacobian information. The matrices A and B that we need for the sensitivity system are indeed

A = A(y, u),   B = B(y, u).    (3.3.11)

Assume that we use Newton's method to solve (3.3.3) for a fixed u∗; then the iteration is

y_{n+1} = y_n − A(y_n, u∗)^{−1} f(y_n, u∗),    (3.3.12)

and it holds that

lim_{n→∞} A(y_n, u∗) = A(y∗, u∗)    (3.3.13)

if the Newton iteration converges. This means that the Jacobian information needed to solve the nonlinear system is asymptotically the same as that needed for the sensitivity system.
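The finite-difference assembly (3.3.9) and the solves (3.3.7) can be combined in a small sketch; the 2×2 model system below is an assumption for illustration, chosen so that its exact sensitivities are dy₁/du = (2u₁, 0) and dy₂/du = (−2u₁, 1).

```c
#include <assert.h>
#include <math.h>

/* Illustrative parameter-dependent system (not from this work):
     f1(y,u) = y1 - u1^2     = 0
     f2(y,u) = y2 + y1 - u2  = 0  */
static void f(const double y[2], const double u[2], double r[2])
{
    r[0] = y[0] - u[0] * u[0];
    r[1] = y[1] + y[0] - u[1];
}

/* Assemble A = df/dy and B = df/du by central differences around a
   solution (y*, u*) and solve the 2x2 systems A s_i = -b_i by
   Cramer's rule. The result S is stored row-major. */
void sensitivity_2x2(const double y[2], const double u[2], double h, double S[2][2])
{
    double A[2][2], B[2][2], rp[2], rm[2], yp[2], ym[2], up[2], um[2];
    for (int j = 0; j < 2; ++j) {
        for (int k = 0; k < 2; ++k) { yp[k] = ym[k] = y[k]; up[k] = um[k] = u[k]; }
        yp[j] += h; ym[j] -= h;                       /* column j of A */
        f(yp, u, rp); f(ym, u, rm);
        for (int i = 0; i < 2; ++i) A[i][j] = (rp[i] - rm[i]) / (2.0 * h);
        up[j] += h; um[j] -= h;                       /* column j of B */
        f(y, up, rp); f(y, um, rm);
        for (int i = 0; i < 2; ++i) B[i][j] = (rp[i] - rm[i]) / (2.0 * h);
    }
    double det = A[0][0] * A[1][1] - A[0][1] * A[1][0];
    for (int j = 0; j < 2; ++j) {                     /* A s_j = -B(:,j) */
        S[0][j] = (-B[0][j] * A[1][1] + B[1][j] * A[0][1]) / det;
        S[1][j] = (-A[0][0] * B[1][j] + A[1][0] * B[0][j]) / det;
    }
}
```

Since both residual functions are at most quadratic, the central differences here are exact up to roundoff, and the computed S matches the analytic sensitivities.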


From this point of view, external differentiation schemes to approximate the entries of the sensitivity matrix S become very unattractive. Assume that we are given a solution (y∗, u∗) of the nonlinear system. Set up

f(y^{+,j}, u∗ + h e_j) = 0,   f(y^{−,j}, u∗ − h e_j) = 0,   j = 1, . . . , m.    (3.3.14)

Then we need 2m solutions of the n-dimensional nonlinear system for approximating

s_ij := ∂y_i/∂u_j ≈ (y_i^{+,j} − y_i^{−,j}) / (2h),   i = 1, . . . , n,   j = 1, . . . , m.    (3.3.15)

This way of solving the sensitivity problem only seems attractive if a black-box solver for the nonlinear system is used and there is no possibility of obtaining Jacobian information.
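The benefit of one factorization for all m right-hand sides in (3.3.6)–(3.3.7) can be sketched with a small dense LU solver; this is a toy stand-in for the sparse direct solvers discussed in Section 3.3.2, and the matrix and right-hand sides are made up for illustration.

```c
#include <assert.h>
#include <math.h>

#define N 3

/* Factor A = LU in place (Doolittle, no pivoting -- adequate for the
   well-conditioned toy matrix in the example), then reuse the factors
   for every column of a multiple right-hand-side system. */
void lu_factor(double A[N][N])
{
    for (int k = 0; k < N; ++k)
        for (int i = k + 1; i < N; ++i) {
            A[i][k] /= A[k][k];              /* L is stored below the diagonal */
            for (int j = k + 1; j < N; ++j)
                A[i][j] -= A[i][k] * A[k][j];
        }
}

/* Solve L U x = b in place using the stored factors. */
void lu_solve(double A[N][N], double b[N])
{
    for (int i = 1; i < N; ++i)              /* forward substitution */
        for (int j = 0; j < i; ++j)
            b[i] -= A[i][j] * b[j];
    for (int i = N - 1; i >= 0; --i) {       /* back substitution */
        for (int j = i + 1; j < N; ++j)
            b[i] -= A[i][j] * b[j];
        b[i] /= A[i][i];
    }
}
```

Factoring costs O(N³) once, while each additional right-hand side costs only the O(N²) substitutions; this is exactly why direct methods suit the mrhs structure of the sensitivity system.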

3.3.2. A C++ Foreign Process for Sensitivity Analysis with gPROMS

gPROMS supports external interfaces for input and output of system variables [Pro04b]. There are three types of interfaces:

• Foreign Process Interface (FPI): The user can implement input and output functions to interact with the simulation run. gPROMS supplies the interface with mathematical information on the system, which allows us to implement a sensitivity analysis tool as presented below. We use the interface for C++, but other templates are also available.

• Foreign Object (FO): Foreign objects can be used to provide user-defined functions which can be integrated into the nonlinear and differential-algebraic systems while modeling with gPROMS. This can be useful to implement data-driven models or piecewise continuously differentiable functions externally. Modeling a piecewise-defined variable directly within gPROMS causes the solver DASOLV to detect the state transitions, which results in computationally expensive re-initializations of the system. If the user function supplies gPROMS with the function value and the partial derivative whenever it is called, there is no need for re-initialization, which can speed up the simulation.

• Output Channel (OC): The output channel is a unidirectional interface which provides simulation output. These data can be visualized, or stored and converted externally, for example by writing them to binary files and reading them in MATLAB, which makes them post-processable.

gPROMS is capable of calculating the Jacobian of the system to be solved, that is

J_f = [∂f/∂y, ∂f/∂u] ∈ R^{n×(n+m)}.    (3.3.16)

Once the system is solved, the Jacobian information for the current point is transmitted when a task named SENDMATHINFO is called within the gPROMS schedule. We can separate information concerning state variables and parameters to set up the system matrices A and B for the sensitivity system (3.3.6).
A foreign process implementation is actually a dynamic link library (Windows DLL) whose methods can be called by the gPROMS kernel. An FPI library must contain several methods of a fixed structure. The one we are interested in is the following.


void gfpsendm_( const GSTRING  ForeignProcessID,
                const INTEGER *ForeignProcessHandle,
                const GSTRING  PName,
                const GSTRING  Signal,
                const DOUBLE  *Time,
                const INTEGER *NumVars,
                const INTEGER *NumEqs,
                const INTEGER *NumDVars,
                const INTEGER *NumNonZeroes,
                const DOUBLE  *VarValues,
                const DOUBLE  *VarDotValues,
                const INTEGER *VarTypes,
                const GSTRING *VarNames,
                const INTEGER *JacRows,
                const INTEGER *JacCols,
                const DOUBLE  *JacValues,
                INTEGER       *Status );

Once SENDMATHINFO is processed, this method is called. The variable types in upper case are given by typedefs within the header gFPInterface.h, which is provided by the software package, and do not differ significantly from standard C types.
The variable names are largely self-explanatory. The pointers *JacRows, *JacCols and *JacValues contain the nonzero Jacobian information in coordinate storage. Two popular ways of storing sparse matrices with low memory usage (see [RSC05]) are shown in Fig. 3.2. As discussed before, solving the sensitivity system means solving a (hopefully) sparse linear system with multiple right-hand sides. Iterative solvers for multiple right-hand-side problems are rare, and it seems self-evident to use sparse direct methods since these attack the problem directly. We argued that we can assume regularity of the sensitivity system matrix; otherwise we would not be able to obtain the solution of the nonlinear system anyway. Therefore we can expect direct methods to solve the problem efficiently. For academic use, there are some libraries available for inclusion in C projects.

• NAG Numerical Libraries (NAG = Numerical Algorithms Group, Ltd.):

http://www.nag.com/numeric/numerical_libraries.asp

• PARDISO solver project:

http://www.pardiso-project.org/

Both solver libraries include sparse direct methods for unsymmetric linear systems of equations; NAG also includes iterative solvers. We chose to use the NAG library chapter f11 for large-scale linear systems. The foreign process to be built runs through several phases.

(i) Sorting Jacobian information: gPROMS provides the full Jacobian, which includes state variables and parameters together in a single matrix. By the pointer *VarTypes we can distinguish between states and parameters.

(ii) Converting the data format: gPROMS provides coordinate storage, but the NAG library requires its inputs in compressed column storage. We use the general compressed row storage format and solve A^T S = −B.


[Diagram: coordinate storage (left, 2·nnz integers and nnz doubles) vs. compressed row storage (right, nnz + n + 1 integers and nnz doubles) for the example matrix

A = ( −3    0     0
       0   15     8
       0    1  −0.1 ). ]

Figure 3.2.: Coordinate storage (left) vs. compressed row storage (right) for an exemplary matrix: while coordinate storage uses row and column indices for every nonzero element in the matrix, compressed row storage makes use of the fact that it is only necessary to indicate where each row starts, in terms of a position in the value vector. The number of nonzero entries of A is denoted by nnz, and n is the number of rows of A.


[Diagram: the unsorted nonzero vector from the coordinate storage input, partitioned into per-row stacks for rows 1 to 5.]

Figure 3.3.: The lower bar illustrates the unsorted nonzero vector from the coordinate storage input. The elements are collected on stacks, one for each row. This has to be done analogously for the column index vector.

(iii) Solution of the system: The three main phases of the solution process must be run by calling the associated methods from the NAG library. These are:

a) nag_superlu_column_permutation(): Pivoting.

b) nag_superlu_lu_factorize(): Factorizing.

c) nag_superlu_solve_lu(): Solving.

Using these three phases is well documented by example files, see

http://www.nag.co.uk/numeric/CL/nagdoc_cl08/html/F11/f11_conts.html.

(iv) Writing the sensitivity data to a file.

For each nonzero element, its type is stored via the pointer *VarTypes supplied by gPROMS. There is an index to distinguish algebraic variables, assigned variables, differential variables and time derivatives of the differential variables. So the total Jacobian information is carried and can be separated into

• ∂f/∂y – the partial derivatives of the system functions with respect to the algebraic variables,

• ∂f/∂x – the partial derivatives with respect to the differential state variables,

• ∂f/∂ẋ – the partial derivatives with respect to the time derivatives of the differential state variables,

• ∂f/∂u – the derivatives with respect to the assigned controls.

Note that for the steady-state problem there are no partial derivatives with respect to differential variables or their derivatives.
After the Jacobian entries have been identified and related to the matrices A := ∂f/∂y and B := ∂f/∂u, we have to convert the data format. Note that only A is sparse, while B is usually dense.
The code in Fig. 3.4 shows an implementation of the conversion from coordinate storage to compressed row storage. First, the number of elements in each row is counted. This can


/* INPUT DATA IN COORDINATE STORAGE */

/* n: dimension of the matrix */

/* nnz: number of nonzero entries */

/* v[k]: value of the k-th Jacobian element */

/* ia[k]: row of the k-th element */

/* ja[k]: column of the k-th element */

/* first run: count row entries
   (ia[], ja[] carry 1-based indices, the arrays are scanned with
   C indices 0..nnz-1) */

for (k = 0; k < nnz; ++k)
    row_counter[ia[k]] += 1;

/* build compressed row vector by partial sums */

ia_crs[1] = 1;
for (k = 2; k <= n + 1; ++k)
    ia_crs[k] = ia_crs[k-1] + row_counter[k-1];

/* reordering nonzero entries */

for (k = 0; k < nnz; ++k)
{
    pos = ia_crs[ia[k] + 1] - (row_counter[ia[k]]--);
    v_crs[pos] = v[k];
    ja_crs[pos] = ja[k];
}

/* OUTPUT DATA IN COMPRESSED ROW STORAGE */

/* v_crs[k]: value of the k-th Jacobian element */

/* ia_crs[k]: row start of the k-th row */

/* ja_crs[k]: column of the k-th element */

Figure 3.4.: C code to transform coordinate storage to compressed row storage. Note that the CRS format usually has increasing column indices within each row. For the output data to be ordered column-wise, it might be necessary to sort the input data by column first.


be used to uniquely determine the compressed row vector by partial sums. Once the element counter and the compressed row vector are set up, the nonzero entries in the value vector and the associated column indices have to be reordered, see Fig. 3.3. For this, we find the new position of each element by decrementing the element counter and subtracting the counter from the next row start index. In this way we fill the new nonzero vector in groups of elements belonging to the same row of the matrix.
Phase (iii) solves the sensitivity problem. Pivoting and factorizing have to be performed only once. After this is done, we run a loop over the right-hand sides of the linear system and compute the sensitivities one right-hand side at a time. Each time, the same factorization can be used. This is a straightforward way to solve the problem and shows the benefits of direct methods compared to iterative methods. Standard iterative solvers must be restarted for every right-hand side, which multiplies the computational effort.
The sensitivity matrix contains n·m entries and is expected to be dense. If one wants to store the information and make it visible, writing text files is not convenient, since the files may become very large. It may hardly be possible to inspect the information manually if there are thousands of variables and controls in the system. If, however, all we are interested in is a gradient for the optimization of an objective function represented by a single variable in the system, we simply need some entries in a single row of the sensitivity matrix, and reading the data becomes a lot clearer. If it is not clear which parameters cause a change in the system at all, the full matrix has to be taken into account. There might be a few variables of special interest, so to speak the main results of the simulation. One may ask for the parameters with the largest influence on these variables. This reduces the size of the matrix of interest significantly.
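A zero-based, stand-alone variant of the conversion in Fig. 3.4 can be checked directly on the example matrix of Fig. 3.2; the array sizes and the helper name are illustrative choices.

```c
#include <assert.h>
#include <math.h>

#define NNZ_MAX 16
#define N_MAX 8

/* Zero-based COO -> CRS conversion: ia_crs[r] is the index in v_crs
   where row r starts, and ia_crs[n] == nnz. The original order of
   the elements within each row is preserved. */
void coo_to_crs(int n, int nnz, const int ia[], const int ja[], const double v[],
                int ia_crs[], int ja_crs[], double v_crs[])
{
    int count[N_MAX] = {0};
    for (int k = 0; k < nnz; ++k) count[ia[k]]++;   /* entries per row */
    ia_crs[0] = 0;
    for (int r = 0; r < n; ++r)                     /* partial sums */
        ia_crs[r + 1] = ia_crs[r] + count[r];
    for (int k = 0; k < nnz; ++k) {                 /* scatter into place */
        int pos = ia_crs[ia[k] + 1] - (count[ia[k]]--);
        v_crs[pos] = v[k];
        ja_crs[pos] = ja[k];
    }
}
```

The counting pass and the scatter pass mirror the two loops explained above; only the index base differs from the NAG-oriented figure.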

Usage in gPROMS

The library including the foreign process can be included to override the standard foreign process interface within gPROMS. In the SOLUTIONPARAMETERS section of a gPROMS project, the FPI is defined by

SOLUTIONPARAMETERS
  FPI := "sensitivity_analysis::file";

and each time the command SENDMATHINFO is called in the SCHEDULE section, the sensitivity analysis is performed and the matrix is stored to 'file'.

Practical Use

A practical use of the sensitivity matrix is to estimate the sensitivity of the solution of the system to disturbances. Let (y0, u0) be a solution of f(y, u) = 0. Define a disturbance vector

z ∈ R^m  with  ‖z‖2 / ‖u0‖2 = ε ≪ 1.    (3.3.17)

What can be said about the solution of the disturbed system for u_z := u0 + z? If the system f(y, u) is linear, we write

y_z = y0 + S(u_z − u0).    (3.3.18)


Then we can expect that f(y_z, u_z) ≈ 0 for small ε. The disturbance of the system can be defined as

ζ_z(y0) := ‖y_z − y0‖2 = ‖S(u_z − u0)‖2 = ‖Sz‖2 ≤ ‖S‖2 ‖z‖2 = ε ‖u0‖2 ‖S‖2,    (3.3.19)

where ‖S‖2 is the spectral norm of the sensitivity matrix. This means that for linear systems, the change of the whole system caused by a disturbance can be bounded from above by the spectral norm of the sensitivity matrix. This is interesting for estimating the influence of inaccuracies in the model parameters. For sufficiently small choices of ε, only a weak form of regularity of the system's solution is needed for the inequality to hold for nonlinear systems. However, if the uncertainties and inaccuracies are crucial, ε has to be chosen large and the error estimate becomes inaccurate.

3.3.3. Dynamic Sensitivity Problems

Parametric sensitivity analysis of nonlinear systems is actually a special case of the analysis of differential-algebraic systems of equations. To be consistent with the notation from above, we write

f(ẋ, x, y, u) = 0,   x(0) = x0    (3.3.20)

for the combined implicit system. Clearly, such a system has the same number of time derivatives ẋ as differential variables x. If we interpret the algebraic variables y as differential variables whose associated time derivatives do not occur, we can omit y from the notation. The sensitivity system then becomes

(∂f/∂ẋ)(∂ẋ/∂u) + (∂f/∂x)(∂x/∂u) + ∂f/∂u = 0.    (3.3.21)

For explicit and semi-explicit DAE systems we have ∂f/∂ẋ = I, and by writing s_i := ∂x/∂u_i for i = 1, . . . , m, the systems

ṡ_i = −(∂f/∂x) s_i − ∂f/∂u_i,   i = 1, . . . , m,    (3.3.22)

are obtained. Set

A = [ I, ∂f/∂x ],    (3.3.23)

which allows us to write

A [ṡ_i; s_i] = −∂f/∂u_i    (3.3.24)

and emphasizes the linear character of the sensitivity system.
These are linear time-varying systems, each of dimension n. They cannot be solved independently of the solution of the underlying system (3.3.20), because the information needed may change as the variables change over time. In the special case of linear systems, the sensitivity systems are in fact identical to them.
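For a scalar test problem the linear time-varying structure of (3.3.22) can be made concrete; the test ODE, step size and Euler discretization below are illustrative assumptions. For ẋ = −ux with x(0) = 1, the sensitivity s = ∂x/∂u obeys ṡ = −us − x, s(0) = 0, with exact solution s(t) = −t·e^{−ut}.

```c
#include <assert.h>
#include <math.h>

/* Integrate state and sensitivity together with explicit Euler,
   fixed step h, for the scalar test problem x' = -u*x, x(0) = 1.
   The sensitivity equation s' = -u*s - x is linear in s but its
   coefficient -x(t) changes along the trajectory. */
double sensitivity_ode(double u, double T, double h)
{
    double x = 1.0, s = 0.0;
    long n = (long)(T / h + 0.5);
    for (long k = 0; k < n; ++k) {
        double xn = x + h * (-u * x);     /* state step */
        s = s + h * (-u * s - x);         /* linear time-varying system */
        x = xn;
    }
    return s;
}
```

The sensitivity step uses the state value of the same time level, which is the simplest "staggered" coupling between the two integrations.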


It is straightforward to solve the dynamic sensitivity system along with the solution of the DAE system. There are different ways to do this, and in any case the method for solving the sensitivity system depends on the method used to integrate the DAE system. The methods described in the following are all suited for DAE integrators based on BDF formulas with fixed leading coefficient, such as DASSL [Pet82] or DASPK [LP99]. The concept of fixed leading coefficients is given in [KEBP89] and reformulates BDF formulas to be suitable for variable step sizes. Note that Jacobian information within such codes is usually obtained by applying automatic differentiation techniques.
We give an overview of some popular methods for sensitivity analysis of large-scale DAE systems and will see why an external tool for gPROMS based on the foreign process interface cannot provide this kind of functionality in a reliable way. For detailed articles that discuss the most popular methods see [FTB97, SLZ00].

Staggered Direct Method

The staggered direct method is a straightforward way to set up exact sensitivity equations, see [CS85]. First, the current time step of the DAE integration is computed. After the corrector iteration for the DAE has converged, the current Jacobian information of the system is needed to set up the sensitivity system. The sensitivity system is linear, and the time discretization is performed in the same way as for the differential variables in the DAE system.
After the DAE step is computed and the Jacobian is updated, the discretized sensitivity system becomes a linear system of equations which can be solved directly. However, updating the Jacobian at every time step is not inevitable for the integration of DAE systems, as in DASSL. Instead of updating and factoring the Jacobian at every time step, multiple time steps may be performed using an approximation to the Jacobian which is only updated occasionally. The staggered direct method needs an updated Jacobian at every time step, which usually causes more computational effort if the underlying system has many variables.

Simultaneous Corrector Method

In contrast to staggered methods, which solve a time step for the sensitivity system after the time step for the DAE system, the simultaneous corrector method solves the combined system and treats it as a single differential-algebraic system of equations, see [MP96]. Thus it ignores the linearity of the sensitivity equations and the fact that the DAE system can be solved without any sensitivity information.
To make this method attractive, one has to exploit the special structure of the augmented system. In [MP96] it is shown that the Jacobian of the augmented system can be approximated by its block-diagonal part and used multiple times while still achieving good convergence properties for the nonlinear system in each time step. Although the Jacobian of the full system is much larger, it does not have to be stored directly, and without the need for frequent updates this method gains computational benefits compared to the staggered direct method.
Its authors later called this method the forward mode of sensitivity analysis to emphasize the contrast to the adjoint method, which solves the adjoint sensitivity system backwards in time. This is discussed below.


Staggered Corrector Method

The staggered corrector method presented in [FTB97] is a modification of the staggered direct method. The time derivative in the sensitivity system (3.3.24) is again replaced by a BDF formula, but instead of solving the linear equation directly using matrix factorizations of A, it is solved by a quasi-Newton iteration. There, the approximation of the Jacobian that is used within the DAE integrator can be reused, and this is usually already available in the form of LU factors. Compared to the direct method, the staggered corrector method thus avoids the additional factorization of the Jacobian at every time step of the integrator, which is said to be the most expensive part of numerical sensitivity analysis.
The authors showed that this method performs similarly to the simultaneous corrector method and has some benefits if finite-difference approximations are used.

Staggered Hybrid Method

More recently, a staggered method was suggested in [DCH01] that solves the linear sensitivity system at each time step using the generalized minimum residual (GMRES) method. The intention is again to avoid the factorization of the Jacobian at each time step. The approximation of the Jacobian available as LU factors is used as a preconditioner for the iterative solver. It is shown that this method only performs better than the methods explained before if the number of parameters in the system to be solved for is 1.

Adjoint Method

Possibly the most recent progress in methods for sensitivity analysis is given in [YCS02, LP04, LPS06]. A new method for solving the dynamic sensitivity system based on the adjoint system is presented. It allows solving the sensitivity equations with a computational effort that is nearly independent of the number of parameters involved. It clearly becomes attractive for systems with few variables and many parameters. First consider the nonlinear system

f(y, u) = 0

and its sensitivity system

(∂f/∂y)(∂y/∂u) + ∂f/∂u = 0.    (3.3.25)

Assume that we are interested in a so-called derived function g(y, u) : R^n × R^m → R^k, where k is supposed to be significantly smaller than n and m. This means that we are only interested in some parts of the full sensitivities

s := dg/du = g_y ∂y/∂u + g_u,    (3.3.26)

where g_y and g_u are the partial derivatives of the function g with respect to the variables y and the parameters u, respectively. So we seek the sensitivity of the result of g with respect to u.
Writing S := ∂y/∂u and multiplying (3.3.25) from the left by λ^T with λ ∈ R^{n×k} gives

λ^T (∂f/∂y) S + λ^T (∂f/∂u) = 0.    (3.3.27)


Now if λ solves

λ^T (∂f/∂y) = g_y,    (3.3.28)

then the sensitivity is given from (3.3.27) by

s = −λ^T (∂f/∂u) + g_u.    (3.3.29)

This is just a reformulation of the direct method of solving the standard sensitivity system. However, most of the work to solve the adjoint system has to be done in (3.3.28), and this is independent of m. If g is the full system, then g_y = I, and solving (3.3.28) is equivalent to computing the inverse of ∂f/∂y. Once this is done, the desired sensitivities can be computed directly by a matrix-matrix multiplication and adding g_u, which may often be identically zero.
In the dynamic case we have differential variables x = x(t), and the derived function can be stated by an integral formulation

G(x, u) = ∫_0^T g(x, u) dt.    (3.3.30)

In [YCS02] the adjoint DAE system is derived by interpreting the problem as an optimization problem with objective function G and the system equation as constraint. The Lagrange formulation is used to write

I(x, u) = G(x, u) − ∫_0^T λ^T f(ẋ, x, u) dt.    (3.3.31)

It holds that

S := dG/du = dI/du = ∫_0^T ( g_u + g_x ∂x/∂u ) dt − ∫_0^T λ^T ( (∂f/∂ẋ)(∂ẋ/∂u) + (∂f/∂x)(∂x/∂u) + ∂f/∂u ) dt.    (3.3.32)

Integration by parts is used to derive that, with

(λ^T ∂f/∂ẋ)′ − λ^T ∂f/∂x = −g_x,    (3.3.33)

the sensitivities of G with respect to u are given by

S = ∫_0^T ( g_u − λ^T ∂f/∂u ) dt − ( λ^T (∂f/∂ẋ)(∂x/∂u) ) |_0^T.    (3.3.34)

Now the system (3.3.33) has to be solved backwards in time, while the end-point initial conditions depend on the index of the DAE. Index-0 and index-1 systems have

λ^T (∂f/∂ẋ) |_{t=T} = 0.    (3.3.35)

For further details we refer to the literature mentioned above.
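The steady-state adjoint recipe (3.3.28)–(3.3.29) can be sketched on a 2×2 toy problem; the system, the derived function and the hand-computed adjoint solution below are assumptions for illustration. Take f1 = y1 − u1² = 0, f2 = y2 + y1 − u2 = 0 with g(y) = y2, hence g_y = (0, 1) and g_u = 0; then A = ∂f/∂y = [[1, 0], [1, 1]] and B = ∂f/∂u = [[−2u1, 0], [0, −1]].

```c
#include <assert.h>
#include <math.h>

/* Adjoint (backward) evaluation of s = dg/du for the toy system:
   solve A^T lambda = g_y^T once (here by hand: A^T = [[1,1],[0,1]]
   gives lambda = (-1, 1)^T), then s = -lambda^T B + g_u. */
void adjoint_sensitivity(double u1, double s[2])
{
    double l1 = -1.0, l2 = 1.0;                        /* adjoint solution */
    double B[2][2] = {{-2.0 * u1, 0.0}, {0.0, -1.0}};
    s[0] = -(l1 * B[0][0] + l2 * B[1][0]);
    s[1] = -(l1 * B[0][1] + l2 * B[1][1]);
}
```

One adjoint solve yields the whole gradient of the scalar derived function, regardless of the number of parameters m; the result reproduces the forward answer dy2/du = (−2u1, 1).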


gPROMS Foreign Process

Investigating the field of sensitivity analysis and the numerical methods applied, we see that it seems impossible to develop a reliable FPI-based dynamic sensitivity analysis tool for the software gPROMS. Although gPROMS is capable of integrating the sensitivity equations simultaneously, with optional error control, when performing optimization tasks, it is not the software's intention to provide the user with full sensitivity information. The idea of using the method SENDMATHINFO to develop a tool analogous to the steady-state case is appealing, but there are severe reasons why this seems impossible.

• Jacobian updates: The command SENDMATHINFO makes gPROMS send current Jacobian information to the foreign process interface. However, this is not necessarily an updated version of the Jacobian but the current estimate.

• Step and order control: For the DAE solution, DASOLV uses variable-step-size and variable-order BDF formulas. There is no way to trigger a SENDMATHINFO automatically after each time step. The information has to be submitted according to a predefined schedule, for example equidistantly after some time has passed in the integration. Hence there is no information on how to integrate the sensitivity system accurately.

• Event location: gPROMS detects discontinuities and structural changes in the DAE system, locates them up to a certain precision and re-initializes the DAE integration. There is no way to trigger a SENDMATHINFO at every re-initialization of the DAE solver. So the external integration of the sensitivity system cannot detect and locate discrete events and therefore cannot re-initialize the sensitivity integration.

One of the best methods still imaginable is an implicit BDF method with fixed order and step size. Consider an explicit ODE. For the implicit Euler method with step size h and the steps

t_n = nh,   n = 0, . . . , T/h,    (3.3.36)

this would yield

(I − h A_n) s_n = s_{n−1} + h B_n,   n = 1, . . . , T/h,    (3.3.37)

with A_n = ∂f/∂x (t_n) and B_n = ∂f/∂u (t_n). This might give reasonable results for smooth systems and a very small step size h. However, gPROMS has to be forced to update the Jacobian in order for A_n and B_n to be up to date and the sensitivity system to be exact, and there is no error control applicable nor information about the accuracy of the integration available. So this might only be suitable for a rough analysis of the process. For optimization or parameter estimation, this approach does not seem promising.
To classify such an approach, it can be seen as a staggered direct method that is decoupled from the time discretization of the DAE system.


Chapter 4.

Nonlinear Programming and Optimal Control

In this work we use different optimization methods, including well-known ones. Before we present a new algorithm for solving time-optimal control problems, we outline the basic concepts of the theory and algorithms for nonlinear programming to give a view on the solution methods that are widely used in practical applications. There are efficient commercial software solutions available for solving nonlinear programming problems. Here we focus on the methods of sequential quadratic programming as they are used in gPROMS and on the interior-point methods that are included in MATLAB's Optimization Toolbox. In Section 4.3 we discuss practical optimal control problems for grade changes in paper production by solving nonlinear programs instead of discussing the calculus of variations and dynamic programming. We derive a new algorithm for the solution of time-optimal control problems with strict and smooth path constraints that is based on the sequential solution of easier subproblems.

4.1. Basics in Unconstrained and Constrained Optimization

This chapter is strongly based on the established standard literature on optimization [JS03, NW06, BV04, GK99].

Optimization can have many meanings depending on the type of the problem. In the following we will assume that all functions are smooth and real-valued. A quite general definition can be the following.

Definition 4.1.1. Let a function f : Rn → R be given. The problem

(P)    min_{x∈Rn} f(x)
       s.t. hi(x) ≥ 0,  i = 1, . . . , nineq,
            gi(x) = 0,  i = 1, . . . , neq,    (4.1.1)

is called optimization problem of an objective function f with neq equality constraints and nineq inequality constraints.

A point x∗ is called optimal solution of (P) if it achieves the lowest possible function value while satisfying the constraints. If the functions f, gi and hi are linear, the problem is known as a linear program, otherwise it is a nonlinear program (NLP). A special case arises when the objective function is quadratic and the constraints are linear; then the problem is called a quadratic program. The problem is smooth if the functions are differentiable, otherwise non-smooth. The problems of type (4.1.1) can further be distinguished by whether the functions involved are convex or concave. Convex programs have the very important property that every local solution is also a global one if the constraints are consistent. The numerical solution of convex optimization


problems yields a global solution, while the solution of a general nonlinear program might depend on the choice of the initial guess and is, in general, only a local solution. Nevertheless, many algorithms that are designed for the solution of convex problems can also be applied to non-convex programs. Local solutions of smooth constrained optimization problems can be identified by a set of necessary conditions, the so-called Karush-Kuhn-Tucker conditions. In the special case of unconstrained problems this is the well-known first-order necessary condition

∇f(x) = 0.

In this unconstrained case, locality and globality of points can be defined as follows.

Definition 4.1.2. A point x∗ is called local minimum of f if

f(x) ≥ f(x∗) for all x ∈ Uε(x∗), where Uε(x∗) := {x ∈ Rn : ‖x − x∗‖ ≤ ε}. If

f(x) ≥ f(x∗) for all x ∈ Rn,

the point x∗ is called global minimum of f. If the inequalities hold strictly, the minima are called strict or isolated.

When the first-order necessary condition is solved by a descent method, the solution obtained is in general only a local one; it is global if f is convex. For constrained optimization, the Lagrangian of the function f is needed. The idea is to augment the objective function by a weighted sum of the constraint functions.

Definition 4.1.3. The Lagrangian of a function f is given by

L(x, λ, µ) := f(x) − ∑_{i=1}^{nineq} λi hi(x) − ∑_{i=1}^{neq} µi gi(x).    (4.1.2)

The vectors λ = (λ1, . . . , λnineq) and µ = (µ1, . . . , µneq) are called Lagrange multipliers for the associated constraints. (The minus signs match the sign conditions λ ≥ 0 for the constraints hi(x) ≥ 0 below.) The gradient of the Lagrangian with respect to the variable x is consequently given by

∇xL(x, λ, µ) = ∇f(x) − ∑_{i=1}^{nineq} λi ∇hi(x) − ∑_{i=1}^{neq} µi ∇gi(x).    (4.1.3)

The KKT Optimality Conditions

It has to be assumed that the constraints of the nonlinear program are consistent in the sense that they satisfy some sort of constraint qualification, such as the linear independence constraint qualification (LICQ), which requires the Jacobian of the active constraints to have full rank. Let a point (x∗, λ∗, µ∗) be given. If x∗ solves the smooth problem (4.1.1), then the following conditions are satisfied.

∇xL(x∗, λ∗, µ∗) = 0,                        (4.1.4)
gi(x∗) = 0,        i = 1, . . . , neq,       (4.1.5)
hi(x∗) ≥ 0,        i = 1, . . . , nineq,     (4.1.6)
λ∗i ≥ 0,           i = 1, . . . , nineq,     (4.1.7)
λ∗i hi(x∗) = 0,    i = 1, . . . , nineq,     (4.1.8)
µ∗i gi(x∗) = 0,    i = 1, . . . , neq.       (4.1.9)


These conditions are called the Karush-Kuhn-Tucker (KKT) conditions. Note that (4.1.8) and (4.1.9) are called complementarity conditions, while a feasible point automatically satisfies (4.1.9); it is only stated for completeness. Instead of writing the conditions component-wise, one can use vectors for a shorter notation,

g(x) = (g1(x), . . . , gneq(x))T,    (4.1.10)
h(x) = (h1(x), . . . , hnineq(x))T,    (4.1.11)

which leads to the complementarity conditions in vector notation

λTh(x) = 0,    (4.1.12)
µTg(x) = 0.    (4.1.13)

For linear or quadratic programs with pure equality constraints, the KKT conditions yield a system of linear equations that can be solved directly by appropriate methods. Usually, algorithms for nonlinear programming are designed to find a point that satisfies these conditions. For programs with inequality constraints one can use active-set or interior-point methods to successively figure out the constraints that are satisfied with equality at the solution, the so-called active constraints. However, the KKT conditions are local necessary conditions and not appropriate for deciding whether a point is a global minimum of problem (4.1.1) or not. In fact, there are no easy-to-check sufficient conditions to detect the globality of a solution in cases where the objective and the constraints are not known to be convex.
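Since the KKT conditions are easy to evaluate, they can be checked numerically at a candidate point. The following sketch uses a hypothetical toy problem (min x1² + x2² subject to x1 + x2 − 1 ≥ 0, with solution x∗ = (0.5, 0.5) and multiplier λ∗ = 1), with the convention that the inequality multiplier enters the stationarity condition with a minus sign so that λ ≥ 0.

```python
import numpy as np

# Toy problem: min x1^2 + x2^2  s.t.  h(x) = x1 + x2 - 1 >= 0.
# Candidate solution x* = (0.5, 0.5) with multiplier lam = 1.
x = np.array([0.5, 0.5])
lam = 1.0

grad_f = 2 * x                    # gradient of the objective
grad_h = np.array([1.0, 1.0])     # gradient of the constraint
h = x.sum() - 1.0

stationarity = grad_f - lam * grad_h       # (4.1.4): must vanish
feasible     = h >= -1e-12                 # (4.1.6)
dual_feas    = lam >= 0                    # (4.1.7)
complement   = abs(lam * h) < 1e-12        # (4.1.8)

kkt_ok = np.allclose(stationarity, 0) and feasible and dual_feas and complement
```

Such a check only verifies the necessary conditions; as noted above, it says nothing about globality.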

4.1.1. Line-Search and Trust-Region Methods

We briefly outline two standard methods for solving an unconstrained optimization problem since they are essential for the solution of constrained nonlinear programs. Basically, there are two widely used approaches to solve the unconstrained optimization problem

(U)    min_{x∈Rn} f(x)    (4.1.14)

for a nonlinear smooth function f : Rn → R, namely line-search and trust-region methods. Usually, convexity is necessary for the methods to converge to a global minimum of f, but applied to a non-convex minimization problem, the methods still yield a local minimum.

Line-Search

In line-search methods we search for a lower function value of f along a certain line from the current point, given by a descent direction. The k-th iteration is given by

xk+1 := xk + αk ∆k (4.1.15)

where x0 is some initial guess for the optimal solution x∗ and ∆k is a descent direction. A very straightforward choice is ∆k := −∇f(xk), which is the steepest descent direction at the point xk. In each iteration the step size αk has to be chosen in such a way that the function value at xk+1 is sufficiently lower than the function value at xk. This can be written as a one-dimensional minimization problem

αk := ᾱ := arg min_{α>0} f(xk + α∆k)    (4.1.16)


and the solution ᾱ is called the exact line-search step length. However, solving this subproblem exactly is not practical. Performing a simple backtracking on the step size αk by choosing the smallest integer βk ∈ {0, 1, . . .} with

f(xk + (1/2)^βk ∆k) < f(xk)    (4.1.17)

gives αk := (1/2)^βk. Better conditions for a sufficient decrease in the objective function than just f(xk+1) < f(xk) can be gained by the Armijo rule and the Wolfe conditions. These ensure that the step size is not taken too small, which would lead to very slow progress. Instead of backtracking, one can use bisection, golden section search, interpolation or an interval intersection method like the Wolfe-Powell step sizes. Newton's method for the solution of systems of nonlinear equations can be applied to the first-order necessary condition ∇f(x) = 0. Then the search direction can alternatively be given by the Newton direction

Hf (xk) ∆k = −∇f(xk), (4.1.18)

where Hf (xk) = ∇xxf(xk) is the Hessian of the function f at xk. If the Hessian is regular, the search direction is given by

∆k = −Hf (xk)−1 ∇f(xk)

and with αk = 1 the full Newton step is given by

xk+1 := xk − Hf (xk)−1 ∇f(xk).

The Hessian is by definition symmetric, and it is positive definite if the function f is convex or if xk is close to the solution x∗. If the Hessian matrix itself is not known or hard to compute, an approximation of its inverse Bk ≈ Hf (xk)−1 can be used, which yields a Quasi-Newton iteration

xk+1 := xk − αk Bk ∇f(xk),    (4.1.19)

where a line-search is needed again. Very popular methods to approximate the Hessian and its inverse are given by the BFGS, limited memory BFGS (L-BFGS), SR-1 or the Broyden formula. The BFGS formulas are designed to always yield positive definite approximations, while the SR-1 formula only preserves symmetry by a rank-1 modification. We give the formulas in the next subsection. If the function f is not globally convex, the Newton iteration does not necessarily converge to a local minimum. The intention of Quasi-Newton search directions is to always give a descent direction to ensure global convergence of the line-search method; they can be understood as a globalization strategy for Newton methods.

Another alternative is given by nonlinear conjugate gradient (cg) methods, which are a generalization of the cg-methods for systems of linear equations. The idea is to calculate search directions that are conjugate to the directions used in the previous steps. Popular formulas for conjugate gradient directions are given by Fletcher-Reeves, Polak-Ribiere and Hestenes-Stiefel, which are, in fact, closely related to Quasi-Newton formulas. The conjugate gradient direction in step k is given by a modification of the steepest descent direction by historical data

∆k := −∇f(xk) + γk∆k−1, (4.1.20)


where the factor γk is chosen as one of the following options.

(Fletcher-Reeves):    γk := ∇f(xk)T∇f(xk) / (∇f(xk−1)T∇f(xk−1))    (4.1.21)

(Polak-Ribiere):      γk := ∇f(xk)T(∇f(xk) − ∇f(xk−1)) / ‖∇f(xk−1)‖²    (4.1.22)

(Hestenes-Stiefel):   γk := ∇f(xk)T(∇f(xk) − ∇f(xk−1)) / ((∇f(xk) − ∇f(xk−1))T∆k−1)    (4.1.23)
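The three factors are computed from the current and previous gradients and the previous search direction; the helper below is a hypothetical sketch, with ∆k written as d.

```python
import numpy as np

def cg_gamma(g_new, g_old, d_old, variant="PR"):
    """Conjugate gradient factors (4.1.21)-(4.1.23) from the current
    gradient, the previous gradient and the previous direction."""
    y = g_new - g_old
    if variant == "FR":    # Fletcher-Reeves
        return g_new @ g_new / (g_old @ g_old)
    if variant == "PR":    # Polak-Ribiere
        return g_new @ y / (g_old @ g_old)
    if variant == "HS":    # Hestenes-Stiefel
        return g_new @ y / (y @ d_old)
    raise ValueError(variant)

# Illustrative data only: the next direction via (4.1.20).
g_old = np.array([1.0, 0.0])
g_new = np.array([0.5, 0.5])
d_old = -g_old
gamma = cg_gamma(g_new, g_old, d_old, "FR")
d_new = -g_new + gamma * d_old
```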

Convergence of Line-Search Methods

Convergence of an unconstrained optimization method means that

lim_{k→∞} ∇f(xk) = 0.    (4.1.24)

The convergence of line-search methods can be shown by Zoutendijk's theorem [NW06] for descent directions ∆k with step sizes generated in a way that they satisfy the Wolfe conditions

f(xk + αk∆k) ≤ f(xk) + c1 αk ∇f(xk)T∆k,    (4.1.25)
∇f(xk + αk∆k)T∆k ≥ c2 ∇f(xk)T∆k,    (4.1.26)

with constants 0 < c1 < c2 < 1. Condition (4.1.25) is called the Armijo rule and means that the step size has to be chosen in such a way that the function value at the next iterate lies below a weakened tangent at the current point. The second condition (4.1.26) is called curvature condition; it ensures that the slope of the one-dimensional line-search function increases by a factor c2 compared to the initial slope at the current iterate. Recall that the one-dimensional slope is just the directional derivative of the function along the current search direction. Clearly, this is negative if the search direction is a descent direction. However, a step size that satisfies these conditions might still be far away from the exact line-search step length. Zoutendijk's theorem ensures the convergence of line-search methods even for highly inexact step sizes and is therefore a strong result.
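A minimal backtracking line search enforcing the Armijo condition (4.1.25) can be sketched as follows; a full Wolfe line search would additionally check the curvature condition (4.1.26) and bracket the step, which is omitted here. The constants (c1 = 10⁻⁴, halving factor 0.5) and the test problem are illustrative choices only.

```python
import numpy as np

def armijo_backtracking(f, grad, x, d, c1=1e-4, rho=0.5, max_iter=50):
    """Backtracking line search enforcing the Armijo condition (4.1.25).
    d must be a descent direction, i.e. grad(x) @ d < 0."""
    fx, slope = f(x), grad(x) @ d
    alpha = 1.0
    for _ in range(max_iter):
        if f(x + alpha * d) <= fx + c1 * alpha * slope:
            return alpha
        alpha *= rho
    return alpha

# Steepest descent on the convex quadratic f(x) = x1^2 + 10*x2^2.
f = lambda x: x[0]**2 + 10 * x[1]**2
grad = lambda x: np.array([2 * x[0], 20 * x[1]])

x = np.array([1.0, 1.0])
for _ in range(100):
    d = -grad(x)
    x = x + armijo_backtracking(f, grad, x, d) * d
```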

Trust-Region

The idea of line-search methods is to move along a given direction to find a point with a lower function value. In this way, very large steps can be accepted due to the length of the search direction. In contrast, the idea of trust-region methods is to find a lower function value of f within a certain radius ∆k around the current iterate. This controls the actual step length rather than the step size factor αk in line-searches. Here, ∆k is called trust-region radius, and trust-region methods consist of the following three tasks.

(i) Determine ∆k.

(ii) Find a step pk with ‖pk‖ ≤ ∆k that yields sufficient decrease on f.

(iii) Update the iteration xk+1 := xk + pk.

The solution of the trust-region subproblem

(TR)    min_{p∈Rn} mk(p)
        s.t. ‖p‖ ≤ ∆k    (4.1.27)


with mk(p) := ½ pTBk p + ∇f(xk)Tp + f(xk) gives a point pk which minimizes the quadratic approximation of f at xk. The matrix Bk can be the Hessian of f, if available, or a symmetric (and positive definite) approximation. When using the Euclidean norm, the constraint of the quadratic subproblem becomes pTp ≤ ∆k², which is nonlinear. This means that the subproblem itself is not a quadratic program by definition. Possible ways to calculate approximate solutions to this subproblem are the so-called Cauchy point method, the dogleg method, two-dimensional subspace minimization and a conjugate gradient method. We briefly give the basic ideas of the dogleg and the two-dimensional subspace minimization method since they are used in the trust-region implementation of the MATLAB Optimization Toolbox.

The dogleg method requires the matrix Bk to be positive definite. At the current iterate we are in the center of the spherical trust region. When Bk is positive definite, the unconstrained minimum of the function mk(p) is given by p∗ = −Bk−1∇f(xk), which is the Newton or Quasi-Newton point, respectively. If ‖p∗‖ ≤ ∆k, we have already found the solution of the trust-region subproblem. Otherwise, the idea of the dogleg method is to approximate the path from the center of the trust region to p∗ by line segments which cross the trust-region boundary exactly once. Two line segments within a path p(t) can be used, where the start for t = 0 is the current point. The second vertex is given by the minimum pS of mk along its steepest descent direction. If this minimum lies within the trust region, it is also called the Cauchy point of the subproblem. The last point of the path is p∗. This can be written as

p(t) = { t pS,                       0 ≤ t ≤ 1,
       { pS + (t − 1)(p∗ − pS),      1 < t ≤ 2,

which is called the dogleg. The intersection of the dogleg and the trust-region boundary is found by solving the equation

‖pS + (t − 1)(p∗ − pS)‖² = ∆k²

to find the appropriate value t∗ and to set pk := p(t∗).

The two-dimensional subspace minimization is a generalization of the dogleg method. While the dogleg method tries to find a sufficient decrease of the objective function along the path p(t) for 0 ≤ t ≤ 2, the idea of the subspace method is to search the whole two-dimensional span of pS and p∗. The problem can be written as

min_{p∈Rn} mk(p)
s.t. ‖p‖ ≤ ∆k,
     p ∈ span{∇f(xk), −Bk−1∇f(xk)}.

It is also possible to modify this problem in such a way that a trust-region iterate can be computed even if Bk is not positive definite.

To determine whether a given trust-region radius ∆k is chosen too large or too small, one can compare the actual improvement of the function value with the improvement predicted by the quadratic approximation,

τk := (f(xk) − f(xk + pk)) / (mk(0) − mk(pk)).    (4.1.28)

If the ratio τk is small, the trust-region radius is shrunk; if it is large, the radius can be increased.
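Assuming a symmetric positive definite model matrix Bk, the dogleg step described above can be sketched as follows; the intersection with the boundary is found by solving the quadratic equation in (t − 1) directly.

```python
import numpy as np

def dogleg_step(g, B, delta):
    """Dogleg approximation of the trust-region subproblem (4.1.27)
    for gradient g and a symmetric positive definite model matrix B."""
    p_newton = -np.linalg.solve(B, g)
    if np.linalg.norm(p_newton) <= delta:
        return p_newton                        # interior (Quasi-)Newton step
    # Cauchy point: minimizer of the model along -g.
    p_cauchy = -(g @ g) / (g @ B @ g) * g
    if np.linalg.norm(p_cauchy) >= delta:
        return -delta / np.linalg.norm(g) * g  # scaled steepest descent
    # Intersection of the second dogleg segment with the boundary:
    # solve ||p_cauchy + tau*(p_newton - p_cauchy)||^2 = delta^2,
    # where tau plays the role of (t - 1) in the text.
    d = p_newton - p_cauchy
    a, b, c = d @ d, 2 * p_cauchy @ d, p_cauchy @ p_cauchy - delta**2
    tau = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
    return p_cauchy + tau * d

# Example: with B = I and g = (1, 0), a radius of 0.5 clips the
# Newton step (-1, 0) to the boundary point (-0.5, 0).
p = dogleg_step(np.array([1.0, 0.0]), np.eye(2), 0.5)
```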


Convergence of Trust-Region methods

In both cases, line-search and trust-region, convergence results state that the norm of the gradient tends to zero as the iteration counter tends to infinity. If the condition τk > η > 0 for a positive constant η is enforced, the convergence of the trust-region method can be shown even if the subproblem is solved inexactly. This results in conditions on the decrease of the model function in each step similar to the Wolfe conditions for line-search methods. For convergence to stationary points of the objective function, it is only necessary that the model matrix Bk is symmetric and uniformly bounded in norm. Using the exact Hessian yields trust-region Newton methods with similar local convergence properties with respect to local minima of the objective function.

4.1.2. Quasi-Newton Update Formulas

The most popular update formula for approximating the Hessian of a function is the BFGS formula (Broyden, Fletcher, Goldfarb, Shanno). It is designed for subsequent iterations starting with a positive definite and symmetric initial approximation. For two iterates xk+1 and xk we can define

sk := xk+1 − xk,
yk := ∇f(xk+1) − ∇f(xk).

The idea is to find a matrix Bk+1 for use in a quadratic model of the function at the current point. One can require that the quadratic model interpolates the gradients of the function at the last two iterates. This leads to the secant equation

Bk+1sk = yk.

It can be seen that this matrix is positive definite if (sk)Tyk > 0, and it can be shown that this condition is satisfied if a line-search procedure with Wolfe conditions is applied. Since the secant equation itself does not have a unique solution because of the number of degrees of freedom in Bk+1, further conditions have to be stated to uniquely define a solution. One condition is symmetry, and Bk+1 has to be chosen in a way that it minimizes the distance to the last matrix Bk in a certain norm. This yields an approximation to the Hessian of the objective function. For practical algorithms, it is useful to approximate the inverse of the Hessian instead of the Hessian itself. This can be done by switching sk and yk in the secant equation and leads to the BFGS formula

(BFGS)    Hk+1 := (I − sk(yk)T/((yk)Tsk)) Hk (I − yk(sk)T/((yk)Tsk)) + sk(sk)T/((yk)Tsk)    (4.1.29)

with Hk = Bk−1.
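As an illustration, the inverse-Hessian update (4.1.29) is only a few lines of code; the sketch below assumes the curvature condition (sk)Tyk > 0 holds, e.g. because the steps come from a Wolfe line search.

```python
import numpy as np

def bfgs_update(H, s, y):
    """Inverse-Hessian BFGS update (4.1.29); assumes s @ y > 0."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

# For a quadratic objective with Hessian Q one has y = Q s, and the
# updated matrix satisfies the secant equation H1 @ y = s.
Q = np.diag([2.0, 4.0])
s = np.array([1.0, 1.0])
y = Q @ s
H1 = bfgs_update(np.eye(2), s, y)
```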

Another well-known formula is the SR-1 (symmetric-rank-1) update. The basic idea is to define the symmetric rank-1 update by

Bk+1 = Bk + γvvT

for a scalar γ and a vector v. If we require again that Bk+1 satisfies the secant equation, a unique pair (γ, v) can be derived. The update formula for the inverse Hessian is

(SR-1)    Hk+1 := Hk + (sk − Hkyk)(sk − Hkyk)T / ((sk − Hkyk)Tyk).    (4.1.30)


The Broyden approximation formula is given by

(Broyden)    Hk+1 = Hk + (sk − Hkyk) ykT Hk / (ykT Hk sk).    (4.1.31)

In all cases, formulas for both Bk+1 and Hk+1 are available. The main difference between the formulas is that BFGS is designed for line-search procedures that satisfy the Wolfe conditions to produce symmetric positive definite approximations of the Hessian, while the SR-1 formula only guarantees symmetry. BFGS updating might be ineffective if a line-search is performed which does not satisfy the Wolfe conditions, and it is not suitable for the identification of saddle points. The SR-1 formula is suitable for trust-region methods since these methods can handle indefiniteness of the Hessian approximation.

Limited Memory Variants

The BFGS iteration for the inverse Hessian approximation can be written as

Hk+1 = VkT Hk Vk + sk(sk)T/((yk)Tsk)

with Vk := I − yk(sk)T/((yk)Tsk), which might be computationally expensive when the dimension of the problem is large. Furthermore, when iterating from H0 to Hk, all curvature information along the iteration path is used implicitly to calculate the latest approximation, although it can be assumed that only the most recent curvature information is relevant. The idea of the limited memory variant of BFGS updating, called L-BFGS, is therefore to define the update by a recursion of depth m:

Hk = (VTk−1 · · · VTk−m) H0k (Vk−m · · · Vk−1)
   + (VTk−1 · · · VTk−m+1) sk−m sTk−m/((yk−m)Tsk−m) (Vk−m+1 · · · Vk−1)
   + . . .
   + sk−1 sTk−1/((yk−1)Tsk−1).

This recursion can be computed by the so-called two-loop recursion [Noc80]. The matrix H0k does not need to be the same matrix at every iteration. The update takes the last m iterations into account and can be implemented in a way that it requires less memory than the full update. Choosing a new H0k every m iterations is effectively equivalent to resetting the matrix Hkm ← H0k to a certain initial guess for k ∈ N while performing the standard BFGS iterations in between.

Trust-region methods require an approximation to the Hessian itself instead of its inverse. In [NW06], implementations of these update formulas, including the SR-1 formula, are derived similarly.

4.2. Constrained Optimization

One of the oldest ideas (see [Zan67]) for solving a problem of type (P) is the use of penalty functions. The violation of the constraints is quantified, multiplied by a penalty parameter and added to the objective function. Clearly, the penalty term has to be zero if the


constraints are satisfied at the current point. Penalty methods are iterative and result in an unconstrained minimization problem in each iteration, where the penalty function is minimized. If the solution satisfies the constraints, a solution to the constrained problem has been found. If not, the penalty parameter is modified and the penalty function is minimized again.

A similar way of solving (P) is the use of an augmented Lagrangian function. In this case, we have a Lagrange multiplier for each constraint. This is not used as a penalty parameter but as a decision variable and is handled specially.

However, sequential quadratic programming and interior-point methods for nonlinear programs are the most widely used. We now outline these popular methods for solving problem (P) from (4.1.1). In sequential quadratic programming algorithms, a constrained convex quadratic program has to be solved in each iteration. Therefore we first give the basics of a method for solving convex quadratic programs known as the active-set method.

4.2.1. Quadratic Programming and Active-Set

A quadratic program is a special case of problem (P) and has the form

(QP)    min_{x∈Rn} q(x) = ½ xTQx + xTc + d
        s.t. aiTx ≥ bi,  i = 1, . . . , nineq,
             aiTx = bi,  i = 1, . . . , neq.    (4.2.1)

The matrix Q ∈ Rn×n is symmetric and c ∈ Rn. For each constraint, we have a vector ai of coefficients for the linear combination of variables, and the bi are constants. Obviously, the constant d ∈ R is irrelevant for the solution of (QP) and can be neglected. If Q is positive definite, the quadratic program is called convex. If there are only equality constraints, the problem has the form

(QPeq)    min_{x∈Rn} q(x) = ½ xTQx + xTc
          s.t. Ax = b    (4.2.2)

with A = (a1, . . . , aneq)T.

The Lagrangian of this problem has the form

L(x, µ) = ½ xTQx + xTc − ∑_{i=1}^{neq} µi aiTx    (4.2.3)

and the gradient of the Lagrangian with respect to x is given by

∇xL(x, µ) = Qx+ c−ATµ. (4.2.4)

The KKT conditions at a solution x∗ of this problem can then be written as

( Q   −AT ) ( x∗ )     ( −c )
( A    0  ) ( µ∗ )  =  (  b ),    (4.2.5)

where the coefficient matrix on the left is denoted by K.


The matrix K is called the KKT matrix. If the constraints are consistent, in the sense that they are linearly independent, and the matrix Q is positive definite, then there exists a solution (x∗, µ∗) of this system of linear equations; the positive definiteness even guarantees its uniqueness. There are direct and iterative ways to solve the system (4.2.5). It is possible to solve it with any method that is capable of solving systems of linear equations, but one can also exploit the special structure of K to obtain a more efficient solution. This is important especially when the numbers of variables and constraints are large. For further details we refer to the literature.
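For a small equality-constrained QP, assembling and solving the KKT system (4.2.5) with a dense linear solver is a one-liner. The data below form a hypothetical toy example (min x1² + x2² subject to x1 + x2 = 1, with solution x∗ = (0.5, 0.5), µ∗ = 1); a serious implementation would exploit the structure of K as noted above.

```python
import numpy as np

# Equality-constrained QP: min 1/2 x^T Q x + c^T x  s.t.  A x = b.
Q = 2 * np.eye(2)
c = np.zeros(2)
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# Assemble and solve the KKT system (4.2.5).
K = np.block([[Q, -A.T], [A, np.zeros((1, 1))]])
rhs = np.concatenate([-c, b])
sol = np.linalg.solve(K, rhs)
x_opt, mu_opt = sol[:2], sol[2:]
```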

Active-Set

Assume that we have a quadratic program with equality and inequality constraints. At an optimal solution, all equality constraints are satisfied, while some of the inequality constraints might be satisfied with equality and the others with strict inequality. When strict inequality holds at the solution, the constraint is not a hard one and is irrelevant for the solution. An inequality constraint is called active if it holds with equality. The set of all active constraints at the current point is called the active set. Clearly, if one knew which of the inequality constraints are active at the solution, the quadratic problem could be solved by a single solution of the KKT system (4.2.5) with the constraints from the optimal active set, which includes all equality constraints and the inequality constraints that are active at the solution. Since the optimal active set is not known a priori, a combinatorial strategy to find it is to iteratively activate or deactivate some of the inequality constraints and to solve an equality-constrained quadratic program each time until the solution satisfies all inequality constraints. In each iteration, the set of constraints treated as equalities is called the working set.

For this approach, it is useful to use a notation that writes all constraints as elements of a common set. Let the indices i ∈ I := {1, . . . , nineq + neq} refer to all constraints. Then the inequality-constrained quadratic program can be written as

(QPineq)    min_{x∈Rn} q(x) = ½ xTQx + xTc
            s.t. aiTx ≥ bi,  i ∈ Iineq,
                 aiTx = bi,  i ∈ Ieq,    (4.2.6)

where Iineq and Ieq with Iineq ∪ Ieq = I denote the sets of all indices which refer to inequality constraints or equality constraints, respectively.

Then, the active set at a point x can be written as

A(x) = {i ∈ I : aiTx = bi}.    (4.2.7)

In the k-th iteration of the active-set method, the optimal active set A(x∗) is approximated by a working set Wk ⊂ I. The active-set subproblem can be written as an update to the current point xk in each iteration. If one writes x = xk + p and gk := Qxk + c, this gives

q(x) = q(xk + p) = ½ pTQp + gkTp + zk,    zk := ½ xkTQxk + cTxk,


where zk does not depend on the step p. Now, if one finds a solution pk to

(AS)    min_{p∈Rn} ½ pTQp + gkTp
        s.t. aiTp = 0,  i ∈ Wk,    (4.2.8)

then xk+1 := xk + αkpk satisfies the equality constraints from the working set Wk for every value of αk > 0 if xk satisfies them. Choosing αk = 1 might, however, result in the violation of some inequality constraints that are not taken into account by the working set. In that case, the step size parameter αk has to be chosen as large as possible such that at least one of the otherwise violated inequality constraints is just satisfied with equality while all others remain satisfied. This motivates the modification of the working set: it can be augmented by the newly found active inequality constraint. On the other hand, there are decision rules based on Lagrange multipliers to decide whether an element can be removed from the working set. The convergence of active-set methods for convex quadratic programs results from the fact that the solution of (AS) yields a global minimum for the given working set and that q(xk+1) < q(xk), which means the method cannot cycle between working sets. The optimal active set is therefore found within a finite number of iterations.
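The step-length computation described above, i.e. finding the largest αk ∈ (0, 1] before an inequality constraint outside the working set is violated, can be sketched as follows; the helper name and interface are hypothetical.

```python
import numpy as np

def max_feasible_step(x, p, A_ineq, b_ineq, working):
    """Largest alpha in (0, 1] with a_i^T (x + alpha*p) >= b_i for all
    inequality constraints not in the working set. Returns the step
    and the index of a blocking constraint (or None if the full step
    alpha = 1 is feasible)."""
    alpha, blocking = 1.0, None
    for i, (a, bi) in enumerate(zip(A_ineq, b_ineq)):
        if i in working:
            continue
        ap = a @ p
        if ap < 0:                         # moving towards the boundary
            limit = (bi - a @ x) / ap      # >= 0 since x is feasible
            if limit < alpha:
                alpha, blocking = limit, i
    return alpha, blocking
```

The returned blocking index is exactly the constraint that would be added to the working set in the next iteration.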

4.2.2. SQP Methods

Sequential quadratic programming methods solve a nonlinear program by solving a sequence of quadratic programs. In nonlinear programs, both the objective function and the constraint functions are nonlinear. The SQP idea is to model the Lagrangian of the problem by a quadratic approximation with linearized constraints. First, consider an equality-constrained nonlinear program and write the Jacobian of the constraints as

A(x) = (∇g1(x),∇g2(x), . . . ,∇gneq(x))T . (4.2.9)

One can solve the equality-constrained nonlinear program by solving the associated KKT system

F(x, µ) = ( ∇f(x) − A(x)Tµ, g1(x), . . . , gneq(x) )T = 0.    (4.2.10)

Solving this system with Newton's method yields the iteration

(xk+1, µk+1)T = (xk, µk)T − (pk, pλ)T,    (pk, pλ)T := F′(xk, µk)−1 F(xk, µk),    (4.2.11)

with the Jacobian given by

F′(x, µ) = ( ∇²xxL(x, µ)   −A(x)T )
           ( A(x)              0  ).    (4.2.12)


The update is then given by the system

F′(x, µ) (pk, pλ)T = F(x, µ).    (4.2.13)

Now this is the KKT system for the quadratic function

qk(p) := ½ pT ∇²xxL(xk, µk) p + ∇f(xk)Tp    (4.2.14)

subject to the linear equality constraints

A(xk) p + g(xk) = 0.    (4.2.15)

This gives a local Newton SQP method for nonlinear programs with nonlinear equality constraints in cases where the Hessian of the Lagrangian can be computed and is positive definite. Nonlinear programs with inequality constraints can be handled by an active-set strategy. The inequality-constrained quadratic program has the form

(SQP)    min_p qk(p)
         s.t. ∇hi(xk)Tp + hi(xk) ≥ 0,  i ∈ Iineq,
              ∇gi(xk)Tp + gi(xk) = 0,  i ∈ Ieq,    (4.2.16)

with the notation from (4.2.6). Here, the gradients of the constraints ∇gi(xk) and ∇hi(xk) are needed in each iteration. We cannot expect that the Hessian of the Lagrangian can be computed analytically. Quasi-Newton approximation formulas can also be applied here by setting

sk := xk+1 − xk and yk := ∇xL(xk+1, µk+1)−∇xL(xk, µk+1). (4.2.17)

If the curvature condition skTyk > 0 is not satisfied, the BFGS update might not yield a positive definite approximation to the Hessian of the Lagrangian. Then, the BFGS update must be skipped or damped. Now, by solving problem (4.2.16), a search direction pk for the nonlinear program can be computed. Unless the objective function is quadratic, the full step along this direction might not produce sufficient decrease on the objective function or might even violate the constraints. We have to keep in mind that the (SQP) subproblem is solved with linearized constraints, which only guarantees that linear constraints are satisfied by the full step xk + αkpk.
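One common damping strategy, due to Powell (only one of several possibilities and not spelled out in the text above), modifies yk before the update of the Hessian approximation Bk so that the effective curvature stays positive:

```python
import numpy as np

def damped_bfgs(B, s, y):
    """Powell-damped BFGS update of the Hessian approximation B:
    y is replaced by r = theta*y + (1-theta)*B@s so that s @ r > 0
    even when the curvature condition s @ y > 0 fails."""
    Bs = B @ s
    sBs = s @ Bs
    sy = s @ y
    theta = 1.0 if sy >= 0.2 * sBs else 0.8 * sBs / (sBs - sy)
    r = theta * y + (1.0 - theta) * Bs
    # Standard (direct) BFGS update of B with r in place of y.
    return B - np.outer(Bs, Bs) / sBs + np.outer(r, r) / (s @ r)
```

With this modification the updated matrix remains positive definite even for steps with skTyk < 0, so the update never has to be skipped.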

Merit Functions

A well-known function to decide whether a computed step is accepted or rejected is the non-smooth ℓ1 merit function

Φ(x; κ) := f(x) + κ ∑_{i∈Iineq} [hi(x)]− + κ ∑_{i∈Ieq} |gi(x)|    (4.2.18)

with a penalty parameter κ > 0 and [hi(x)]− := max(0, −hi(x)). This merit function is called exact since, for a sufficiently large penalty parameter κ, any local minimum of the nonlinear program (4.1.1) is a local minimum of Φ(x; κ).


The inequality constraints can be transformed to equality constraints by the introduction of slack variables. Then, the ℓ1 merit function has the form

Φ1(x;κ) := f(x) + κ · ‖g(x)‖1 (4.2.19)

with

g(x) = (g1(x), g2(x), . . . , h1(x) − s1, h2(x) − s2, . . . )T    (4.2.20)

and slack variables (s1, s2, . . . ) ≥ 0. In line-search SQP methods, the merit function is used to determine whether the step computed by the solution of the quadratic subproblem is accepted: a line-search is performed on the merit function instead of the original objective function or the Lagrangian. There are also trust-region SQP methods that make use of a merit function to decide whether the trust-region radius has to be shrunk or enlarged.
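Evaluating the ℓ1 merit function (4.2.18) is straightforward. The sketch below uses [h]− = max(0, −h) and a hypothetical toy problem; the choice κ = 10 is arbitrary.

```python
import numpy as np

def l1_merit(f, g_list, h_list, x, kappa):
    """Non-smooth l1 merit function (4.2.18): the objective plus kappa
    times the constraint violation, with [h]^- = max(0, -h)."""
    viol = sum(max(0.0, -h(x)) for h in h_list)   # inequality violation
    viol += sum(abs(g(x)) for g in g_list)        # equality violation
    return f(x) + kappa * viol

# Toy problem: min x1^2 + x2^2  s.t.  x1 + x2 - 1 >= 0.
f = lambda x: x @ x
h = lambda x: x[0] + x[1] - 1.0
phi = lambda x, kappa=10.0: l1_merit(f, [], [h], x, kappa)
```

At a feasible point the penalty term vanishes and Φ reduces to the objective value, which is what makes the acceptance test meaningful.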

The Maratos Effect

A disadvantageous effect in SQP methods based on merit functions is that the solution of the quadratic subproblem may lead to a step that does not produce a descent on the merit function or that produces an increase in the norm of the constraints; this is known as the Maratos effect. It might be due to highly nonlinear constraints and therefore poor linearizations. A possible way to overcome this problem is to make use of so-called second-order correction terms.

Certainly, the nonlinear constraints could be approximated by a quadratic model, which would improve the approximation of the constraints in the subproblem. But even if the Hessians of the constraints can be computed or approximated, this leads to a quadratically constrained quadratic minimization problem that can be difficult to solve. The problem is the second-order term in the quadratic approximation, which also depends on the step that is to be computed.

Instead of using quadratic constraints, one can evaluate the second-order term at the next candidate point and keep it fixed. This approximates the quadratic model of the constraints by a (still) linear model. So, all that has to be done is to introduce a correction term into the linear constraints. This correction term requires an evaluation of the constraints at the next point, but these have to be evaluated anyway for the merit function.

4.2.3. Interior-Point Methods

Interior-point methods come in different variants, which are classically characterized by the property of following a discrete path of strictly feasible points in a constrained optimization framework. An approach related to penalty methods is the so-called barrier method, in which the violation of inequality constraints is weighted by a logarithmic penalty function. The nonlinear program is then solved by a sequence of equality-constrained problems with varying penalty parameters. A different approach leads to the so-called primal-dual methods, which address the barrier problem indirectly.
In any case, the nonlinear programming problem is modified to eliminate the inequality constraints, and the methods then exploit the fact that the KKT conditions can be solved by Newton's method. In contrast to active-set SQP methods, interior-point methods do not need to guess the active set at the solution. This difficulty is overcome by the strict feasibility of the iterates, where only equality constraints are active. That is why these methods are called interior-point methods, and they are attractive for problems where the objective function is not defined at infeasible points.

Barrier Methods

In barrier methods, a nonlinear program is solved by a two-loop iteration. In the inner iteration, an equality-constrained subproblem is solved by applying Newton's method to the KKT conditions to obtain search directions. The outer iteration modifies the parameters of the subproblem and computes the steps.
Consider a problem

(P_0)    min_{x ∈ R^n} f_0(x)
         s.t. h_i(x) ≥ 0, i = 1, ..., n_ineq,
              g_i(x) = 0, i = 1, ..., n_eq,    (4.2.21)

with nonlinear inequality constraints and nonlinear equality constraints, which is assumed to be solvable.
The idea is to transform it into a problem with only equality constraints. First, n_ineq slack variables are introduced to replace the inequality constraints by

h_i(x) − s_i = 0, i = 1, ..., n_ineq,    (4.2.22)
s_i ≥ 0, i = 1, ..., n_ineq.    (4.2.23)

The objective function is rewritten as

f_1(x) := f_0(x) + ∑_{i=1}^{n_ineq} I_+(h_i(x)),    (4.2.24)

with the indicator function

I_+(z) = { 0, z ≥ 0;  ∞, z < 0 }.    (4.2.25)

Minimizing f_1(x) subject to the equality constraints g_i(x) = 0 solves problem (4.2.21) implicitly. However, this is not practical, since f_1 is not a continuously differentiable function. Thus, the indicator function I_+ is approximated by the so-called logarithmic barrier

I_+(z; t) := −(1/t) ln(z),  z > 0,    (4.2.26)

with a parameter t > 0. Then, I_+(z; t) tends to infinity as z tends to zero from above. It is only defined for strictly feasible points and can be applied to the slack variables.
Define

f_2(x, s) := f_0(x) + ∑_{i=1}^{n_ineq} −(1/t) ln(s_i)    (4.2.27)
           = f_0(x) + (1/t) ∑_{i=1}^{n_ineq} (− ln(s_i)) =: f_0(x) + (1/t) Ψ(s).    (4.2.28)


This leads to the optimization problem

(P_2)    min_{x,s} f_2(x, s)
         s.t. g_i(x) = 0, i = 1, ..., n_eq,
              h_i(x) − s_i = 0, i = 1, ..., n_ineq.    (4.2.29)

For a chosen t > 0, the solution of (4.2.29) is denoted by x*(t) and called a central point. The set {x*(t) : t > 0} is called the central path, and it is shown in [BV04] that x*(t) converges to a solution of (4.2.21) as t tends to infinity. This leads to a straightforward method for producing a sequence of strictly feasible points that converges to a solution of the original nonlinear program by directly solving the KKT conditions of the subproblem.
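The central-path idea can be illustrated on the toy problem min x subject to x ≥ 0, whose barrier problem min_x x − (1/t) ln(x) has the explicit central point x*(t) = 1/t; the sketch below follows the path with a damped Newton iteration (an illustrative construction, not taken from the text):

```python
def barrier_minimize(t, x0=1.0, iters=50):
    """Damped Newton iteration for the barrier problem
        min_x  phi_t(x) = x - (1/t)*ln(x)
    arising from  min x  s.t.  x >= 0.  Its exact minimizer is the
    central point x*(t) = 1/t."""
    x = x0
    for _ in range(iters):
        grad = 1.0 - 1.0 / (t * x)
        hess = 1.0 / (t * x * x)
        step = grad / hess
        alpha = 1.0
        while x - alpha * step <= 0.0:   # damping keeps x strictly feasible
            alpha *= 0.5
        x -= alpha * step
    return x

# following the central path: x*(t) -> 0, the solution of the original problem
path = [barrier_minimize(t) for t in (1.0, 10.0, 100.0)]
```

Increasing t drives the central points toward the boundary solution x = 0, exactly as the convergence statement above describes.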

Primal-Dual Methods

A quite common approach to constructing interior-point methods leads to the so-called primal-dual methods, named to emphasize that these methods also operate on the dual variables of the Lagrangian.

The KKT conditions for problem (4.2.29) are

∇f_0(x) − A_g(x)^T μ − A_h(x)^T λ = 0,    (4.2.30)
−(1/t) S^{−1} e + λ = 0,    (4.2.31)
g_i(x) = 0,  i = 1, ..., n_eq,    (4.2.32)
h_i(x) − s_i = 0,  i = 1, ..., n_ineq,    (4.2.33)

with

S = diag(s_1, ..., s_{n_ineq})  and  e = (1, ..., 1)^T.    (4.2.34)

The matrices A_g(x) and A_h(x) denote the Jacobians of the constraint functions g = (g_1, ..., g_{n_eq}) and h = (h_1, ..., h_{n_ineq}) at the point x. The condition s ≥ 0 is not necessary because it is automatically satisfied: otherwise f_2(x*, s*) = ∞ for a KKT point (x*, s*). For any t > 0, the slack variables s have to be strictly positive. Thus, S is regular, and by multiplying condition (4.2.31) by S from the left-hand side and writing τ := 1/t, we obtain

Sλ − τe = 0.    (4.2.35)

Now, together with s ≥ 0 and τ = 0, these are the KKT conditions for the problem

(P_3)    min_{x,s} f_0(x)
         s.t. g_i(x) = 0, i = 1, ..., n_eq,
              h_i(x) − s_i = 0, i = 1, ..., n_ineq,    (4.2.36)

which does not use explicit barrier functions.
Active-set methods are designed to determine the active set at the solution of this problem. Here, one would have to decide which of the slack variables are chosen to be zero and which are not, which is essentially the same difficulty as in the active-set approach. Driving the parameter τ → 0^+ is also referred to as the homotopy or continuation approach (see [NW06]); it forces the slack variables to remain strictly positive and thereby eliminates this difficulty. The solutions for decreasing τ then approach the optimal solution from the interior of the feasible set, which explains the name interior-point method.


The solution for a given τ is also called a primal-dual central point, and the primal-dual central path is defined analogously to the barrier approach.
The Lagrangian of problem (4.2.36) is given by

L(x, s, μ, λ) := f_0(x) − ∑_{i=1}^{n_eq} μ_i g_i(x) − ∑_{i=1}^{n_ineq} λ_i (h_i(x) − s_i),    (4.2.37)

which is needed for the application of a damped Newton's method to the KKT conditions from above. The k-th Newton iteration gives the linear system

[ ∇²_xx L     0          −A_g(x_k)^T  −A_h(x_k)^T ] [ Δ_k x ]     [ ∇f_0(x_k) − A_g(x_k)^T μ_k − A_h(x_k)^T λ_k ]
[ 0           diag(λ_k)   0            S_k        ] [ Δ_k s ]  = −[ S_k λ_k − τ_k e                             ]    (4.2.38)
[ A_g(x_k)    0           0            0          ] [ Δ_k μ ]     [ g(x_k)                                      ]
[ A_h(x_k)   −I           0            0          ] [ Δ_k λ ]     [ h(x_k) − s_k                                ]

The subscripts k denote the dependency on the current point (x_k, s_k, μ_k, λ_k). The next iterate is given by the so-called fraction-to-the-boundary rule:

x_{k+1} = x_k + α_s Δ_k x,    (4.2.39)
s_{k+1} = s_k + α_s Δ_k s,    (4.2.40)
μ_{k+1} = μ_k + α_λ Δ_k μ,    (4.2.41)
λ_{k+1} = λ_k + α_λ Δ_k λ,    (4.2.42)

with step sizes

α_s := max{α ∈ (0, 1] : s + α Δ_k s ≥ (1 − ρ)s},    (4.2.43)
α_λ := max{α ∈ (0, 1] : λ + α Δ_k λ ≥ (1 − ρ)λ}.    (4.2.44)

The damping parameter ρ ∈ (0, 1) – if chosen close to 1 – causes the slack variables and the associated multipliers to approach zero slowly. An interior-point method consists of this basic iteration together with strategies to modify the barrier parameter τ, to choose the damping parameter ρ, and to handle non-convergence in the non-convex case as well. Recent work concerning the choice of barrier parameters can be found in [JNW09], which also gives a good overview of the state of the art in nonlinear primal-dual interior-point methods.
The iterative solution of the KKT conditions can exploit the special structure of the KKT matrix. The system (4.2.38) can be rewritten in such a way that the KKT matrix is symmetric, which makes the solution of the linear system in each step of the damped Newton method easier.
In practical implementations, an analytic representation of the Hessian of the Lagrangian is usually not available, which again leads to the use of quasi-Newton update formulas to approximate the Hessian in each iteration. The method converges to a point that satisfies the KKT conditions, which is necessary for a constrained local minimum but not sufficient. Depending on the definiteness of the Hessian, the method might converge to a local maximum or a saddle point of the original problem. The BFGS formula can also be used to promote global convergence of the interior-point method.
After computing the damped step in the k-th Newton iteration, one can perform a line-search on the intervals (0, α_s) and (0, α_λ) and use a merit function to decide whether a computed step yields sufficient decrease in the objective function and/or a barrier function.
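The fraction-to-the-boundary rule (4.2.43)/(4.2.44) reduces to a componentwise minimum over the shrinking components; a minimal NumPy sketch, with the common illustrative choice ρ = 0.995:

```python
import numpy as np

def fraction_to_boundary(z, dz, rho=0.995):
    """Largest alpha in (0, 1] with z + alpha*dz >= (1 - rho)*z,
    applied componentwise to a strictly positive vector z (the slacks
    s or the multipliers lambda) and its Newton step dz."""
    shrink = dz < 0.0                # only decreasing components matter
    if not np.any(shrink):
        return 1.0
    return float(min(1.0, np.min(-rho * z[shrink] / dz[shrink])))
```

One would call it as α_s = fraction_to_boundary(s, Δs) and α_λ = fraction_to_boundary(λ, Δλ) before forming the next iterate.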

4.3. Optimal Control

The expression optimal control arises in different problem setups and, roughly speaking, implies the following goal: one wants to minimize a cost functional that consists of time-dependent and time-independent costs, subject to the satisfaction of a dynamic system. A dynamic system can have a finite or infinite, continuous or discrete time horizon.

In contrast to classical optimization problems, optimal control requires a time-dependent system and seeks a time-dependent (explicitly or implicitly) vector-valued control function in a given function space. This leads to open-loop controls (explicitly time-dependent) and closed-loop controls (implicitly time-dependent), the latter also known as feedback control, which has wide applicability, especially in control systems for optimizing regulators [MA71, HAKJ03].
Early ways to describe optimal control problems are given by the so-called calculus of variations [KS91] and dynamic programming [Ber07], which are strongly related through Hamiltonian dynamics and Bellman's principle of optimality [Bel03].
A quite common and very practical way to address optimal control problems is to formulate a nonlinear program that is solved in place of the original optimal control problem. Two of the established methods are known as single and multiple shooting and can be compared with the shooting methods for the solution of boundary value problems for differential equations. In multiple shooting methods, the time horizon is decomposed into several intervals, introducing continuity conditions at the interval boundaries that form two-point boundary value problems, see [MG75, FA99, HBS00]. In classical single shooting methods, such a decomposition is not applied.
A central role is played by the so-called control vector parameterization, which is the main concept in solving optimal control problems via nonlinear programming. In order to obtain a finite number of controls, states and constraints, there are methods that fully discretize the problem with direct collocation methods and solve the resulting large nonlinear program with SQP methods, see [KS93, Bet93, Str93]. In contrast to shooting methods, these methods also discretize the states of the dynamical system. In each case, a type of control vector parameterization is applied, either directly or indirectly. In shooting methods, the parameters can in principle describe an arbitrary control, for example as the coefficients of piecewise defined polynomials. We briefly outline the idea of control vector parameterization in shooting methods.

Control Vector Parameterization

In classical optimal control problems, the control vector is understood as a vector of functions of time, and no further assumptions are made. Control functions do not have to be smooth, although some problems possess smooth solutions.
When thinking of a control at a real plant that has to be steered manually, a smooth solution need not necessarily be of interest, since it cannot be applied exactly anyway.


So we face two different questions in control vector parameterization. The first is what kind of regularity must be expected of a practical control. The second question is more theoretical and asks how well the smooth solution is approximated by a practical control. The first question is the more relevant one, since it addresses the real application problem, and the best theoretical solution is worthless if it cannot be applied. What kind of control can be applied?
In this work, we describe ways to determine the structure of time-optimal controls for grade change problems. This can be seen as the creation of process know-how rather than the computation of practical optimal control strategies.
The assumption is that, for certain types of models and under given conditions, a smooth solution of the optimal control problem exists, whose structure can be analyzed to gain knowledge about the underlying process dynamics. In the following, we work with fixed finite time horizons. In principle, we could use any basis functions to define piecewise controls. Piecewise-constant controls have the benefit that lower and upper bounds on the control values can be formulated as linear bound constraints. If we used, for example, piecewise quadratic functions to define a control, we could parameterize the curvature by time-invariant parameters, but we would then need nonlinear constraints on the resulting control function in order to satisfy the lower and upper bounds. Thus, we use piecewise-constant functions to describe a time-dependent control vector

u : R → R^m,  u(t) = (u_1(t), ..., u_m(t))^T.    (4.3.1)

For i = 1, ..., m, let the time horizon [0, T] be decomposed by n_T^i points to be chosen,

0 =: ζ_i^0 < ζ_i^1 < ... < ζ_i^{n_T^i} < ζ_i^{n_T^i + 1} := T.    (4.3.2)

Each control in the vector can be defined by

u_i(t) := ν_i^k,  t ∈ [ζ_i^{k−1}, ζ_i^k),  k ∈ {1, ..., n_T^i + 1},    (4.3.3)
       = ∑_{k=1}^{n_T^i + 1} δ_{t,k} ν_i^k,    (4.3.4)

with

δ_{t,k} := { 1, t ∈ [ζ_i^{k−1}, ζ_i^k);  0, else }.    (4.3.5)

Here, n_T^i, ζ_i^k and ν_i^k must be given to represent the controls over the full time horizon. The value ν_i^k is the value of the i-th control in interval k.
For the piecewise-constant setup to be correct, we need the inequalities from (4.3.2). To avoid them, we work with interval lengths instead of breakpoints, reusing the symbol,

ζ_i^k := ζ_i^k − ζ_i^{k−1},  k = 1, ..., n_T^i + 1.    (4.3.6)

By requiring

∑_{k=1}^{n_T^i + 1} ζ_i^k = T,    (4.3.7)

we can write δ_{t,k} as

δ_{t,k} = { 1, t ∈ [∑_{j=1}^{k−1} ζ_i^j, ∑_{j=1}^{k} ζ_i^j);  0, else }    (4.3.8)

for k = 1, ..., n_T^i. We write this information as an input matrix

U_i = ( ζ_i^1 ... ζ_i^{n_T^i} ;  ν_i^1 ... ν_i^{n_T^i} ) ∈ R^{2 × n_T^i}    (4.3.9)

for each index i = 1, ..., m. This gives 2 · ∑_{i=1}^{m} n_T^i degrees of freedom to describe the time-dependency of the control vector u(t). For our practical optimization calculations, we assume a common time discretization scheme

n_T^1 = n_T^2 = ... = n_T^m =: n_T    (4.3.10)

and

ζ_1^k = ζ_2^k = ... = ζ_m^k =: ζ^k,  k ∈ {1, ..., n_T}.    (4.3.11)

Then we can write the inputs in a single matrix

         [ ζ^1    ...  ζ^{n_T}   ]
U_c :=   [ ν_1^1  ...  ν_1^{n_T} ]  ∈ R^{(m+1) × n_T}.    (4.3.12)
         [  ⋮            ⋮       ]
         [ ν_m^1  ...  ν_m^{n_T} ]

This can reduce the number of degrees of freedom drastically when m is large but withthe drawback of fixing the controls to the same time intervals. This might not be a goodchoice if controls are used with significantly different control structures. This problem isalso related to the scaling of variables.

In the following subsections, we refer to the parameterized control by a single symbol u. In the case of the piecewise-constant discretization, this is understood as the vector form of U_c,

u := (ζ^1, ..., ζ^{n_T}, ν_1^1, ..., ν_1^{n_T}, ..., ν_m^1, ..., ν_m^{n_T}) ∈ R^{(m+1) · n_T}.    (4.3.13)

The parameter space is denoted by U and results from requiring the controls in u to lie between lower and upper bounds,

U_{m,n_T} := {u ∈ R^{(m+1) · n_T} : b_ℓ ≤ u ≤ b_u},    (4.3.14)

where b_ℓ and b_u are vectors containing the lower and upper bounds on each control.
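A sketch of how the matrix U_c is turned back into a piecewise-constant control function: row 0 holds the interval lengths ζ^k (summing to T) and the remaining rows hold the levels ν_i^k (the function name is illustrative):

```python
import numpy as np

def control_from_matrix(Uc, T):
    """Rebuild the piecewise-constant control u(t) from the stacked
    parameter matrix Uc: row 0 holds the interval lengths zeta^k
    (summing to T), rows 1..m hold the levels nu_i^k."""
    lengths, levels = Uc[0], Uc[1:]
    edges = np.concatenate(([0.0], np.cumsum(lengths)))
    assert np.isclose(edges[-1], T), "interval lengths must sum to T"

    def u(t):
        k = np.searchsorted(edges, t, side="right") - 1
        k = min(k, levels.shape[1] - 1)   # map t = T into the last interval
        return levels[:, k]

    return u
```

Working with interval lengths rather than breakpoints makes the constraint (4.3.7) a single linear equality on u, consistent with the bound-constrained parameter space (4.3.14).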

4.3.1. Optimal Control for Grade Changes - Trajectory Boundaries

In the following, we outline the idea of formulating a strict feasibility problem for grade changes in dynamic systems, where a product is defined by keeping specific state variables within fixed bounds over a certain time horizon. The formal definition of a minimal-time grade change problem leads to a numerical problem that is hard to solve. We present more practical methods to address this problem and compare them in the results chapter on the optimization of grade change processes in the presented wet-end model.


First, we have to mention the different approaches for the satisfaction of state constraints that are widely used in applications. State constraints, in general, limit the trajectory of dynamical states to certain bounds or nonlinear constraints at almost every point within a certain interval. Formally, this leads to an infinite number of constraints. A straightforward way to circumvent this problem is to discretize the constraints in time, namely to pick a finite number of checkpoints at which the constraints have to be satisfied; compare with the literature on multiple shooting methods [MG75, FA99, HBS00]. Together with assumptions on the regularity of the state trajectories, one can find a mesh of checkpoints that is fine enough to ensure strict feasibility.
The use of interior-point methods for optimal control is suggested in [Sch07, DVS09, WB08].
Let x : R × R^m → R^n be the vector of states at time t, which implicitly depends on the control u : R → R^m. The dynamic system equations are given by

x′(t) = F(x(t, u(t)), u(t)),  t ≥ 0,
x(0) = x_0.    (4.3.15)

Now we define a set of indices in {1, ..., n} of states that are used to specify a certain product,

I := {i ∈ {1, ..., n} : x_i belongs to the product specification}.    (4.3.16)

Formally, a product at a certain time t is accepted if all values of the associated states are within certain tolerances of a specified destination z. A specification pair (z, ε) is given with a tolerance ε > 0. We define an indicator function

σ(x; z, ε) := { 0, |x − z| < ε;  ∞, else },    (4.3.17)

which detects the violation of the specification tolerances by returning ∞. Such an indicator function can be used for any x_i with i ∈ I. This leads to the requirement that we need

σ(x_i(t); z_i, ε_i) = 0  for i ∈ I    (4.3.18)

at time t to have a product that fits the specifications. Clearly, we have σ(x; z, ε) ≥ 0 by definition, and thus we can write

∑_{i ∈ I} σ(x_i(t); z_i, ε_i) = 0  at time t.    (4.3.19)

Assume we are given T_1 < T_2 ∈ R_+. If

∫_{T_1}^{T_2} ∑_{i ∈ I} σ(x_i(t); z_i, ε_i) dt = 0,    (4.3.20)

then the product specification tolerances are satisfied at every t ∈ [T_1, T_2], since the state trajectories are assumed to be continuous in time. Thus, (4.3.20) gives a sufficient formal criterion for measuring the satisfaction of the product specification within a certain time interval. It is, however, not unique: there are whole classes of possible indicator functions that lead to the same result. All we need is that the indicator function is zero whenever a point lies within the tolerance band and positive for points that violate it.
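Numerically, the criterion (4.3.20) is checked on a finite grid of checkpoints, as discussed above for state constraints; a minimal sketch (names illustrative):

```python
import numpy as np

def spec_satisfied(t_grid, x_traj, z, eps, T1, T2):
    """Discrete check of criterion (4.3.20): the specification pairs
    (z_i, eps_i) hold on [T1, T2] iff |x_i(t) - z_i| < eps_i at every
    checkpoint of the grid.  x_traj has one row per checkpoint and one
    column per monitored state i in I."""
    window = (t_grid >= T1) & (t_grid <= T2)
    return bool(np.all(np.abs(x_traj[window] - z) < eps))
```

The grid must be fine enough, relative to the regularity of the trajectories, for the discrete check to imply strict feasibility between checkpoints.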


We measure the satisfaction of all constraints by the integral over the sum of indicator functions, which is zero exactly when the constraints are strictly satisfied.
We can define the time-optimal control problem for a grade change from state (x_i^0)_{i ∈ I} to a certain product (z_i, ε_i)_{i ∈ I} by the following formal optimization problem. Let T_2 be a given final time; (TOC) stands for time-optimal control problem.

(TOC)    min_{T_1, u(t)} T_1
         s.t. g(T_1, u(t)) := ∑_{i ∈ I} ∫_{T_1}^{T_2} σ(x_i(t, u(t)); z_i, ε_i) dt = 0,
              u : R → R^m,
              T_1 < T_2,
              x′(t) = F(x(t, u(t)), u(t)),  t ≥ 0,
              x(0) = x_0.    (4.3.21)

This means that one tries to find the largest possible interval [T_1, T_2] in which the product specification is strictly satisfied. In other words, one wants to minimize the time that is needed to steer the system to its dedicated state.
Since the constraint function g is zero whenever a point is feasible and ∞ otherwise, the problem can also be written without this explicit constraint as

(TOC)  ⇔  min_{T_1, u(t)} J(T_1, u(t)) := T_1 + g(T_1, u(t)),    (4.3.22)

with the remaining constraints as in (TOC). This equivalence holds true for the special choice (4.3.17) of the indicator function.

4.3.2. Solving the Feasibility Problem

We have to keep in mind that a problem of type (TOC) is in general not uniquely solvable, because the indicator constraint function is not sensitive to small changes in the controls at an optimal solution. We cannot expect to find a unique solution, and therefore we have to be satisfied with a method that generates points close to a specific solution.

One way to do this is to use a smooth indicator function for which the integral over time is zero if and only if the point is strictly feasible. Such a function is

σ_{C¹}(x; z, ε) := { 0, |x − z| < ε;  (|x − z| − ε)², |x − z| ≥ ε },    (4.3.23)

which is continuously differentiable, even at points x with |x − z| = ε. However, an implicit formulation of the associated constraint as in (4.3.22) is not possible here, since it cannot be guaranteed that a solution of the combined functional is strictly feasible for the original problem: the smooth indicator function can take any value from [0, ∞). Thus, this smooth constraint has to be handled explicitly, which is possible. It is noteworthy that a solution of a problem of type (4.3.22) also minimizes T_1 under the smooth constraint, and in any case, the trajectory of x within the interval [T_1, T_2] is not monitored by the indicator functions as long as the states are strictly within their bounds. Neither formulation can distinguish between a solution that leads x close to the boundary and a solution that runs close to the center of the specification.
Now let u denote the time-invariant parameter vector of the control. Whenever a feasible point is computed, the gradient of the constraint function

g_{C¹}(T_1, u) := ∫_{T_1}^{T_2} ∑_{i ∈ I} σ_{C¹}(x_i(u, t); z_i, ε_i) dt    (4.3.24)

is zero, since the indicator function is constantly zero. This can lead to practical problems in numerical algorithms: if a point is feasible, the algorithm has no indication of how much T_1 can be reduced without violating the constraints, which turns the task into a trial-and-error problem. Furthermore, many practical algorithms are designed to produce a sequence of points that is strictly decreasing with respect to the value of the objective function. In that case, the problem becomes unsolvable once T_1 is chosen so small that no control u achieves feasibility.
The time-optimal grade change problem can be understood as the problem of finding the smallest time point T_1* for which a control u* exists such that (T_1*, u*) is (still) feasible. For any t < T_1* we will not be able to find a control that achieves feasibility, but on the other hand, for any t ≥ T_1* the point (t, u*) is feasible. For a given t ∈ [0, T_2), we try to solve the feasibility problem

(FP)    g_fp(u) := g_fp(T_1, u) := ∫_{T_1}^{T_2} (t − T_1)² ∑_{i ∈ I} σ_{C¹}(x_i(u, t); z_i, ε_i) dt = 0    (4.3.25)

by choosing a suitable u. The term (t − T_1)² is introduced as a time-dependent weighting and does not change the solution set of the equation: it penalizes deviations from the specification more strongly the more time has passed. The reasoning behind this choice follows below. Although the zero of this equation is not unique, we know that a control u is a global minimum of g_fp(u) if and only if it is a zero of it. To identify T_1*, we can use a classical binary search, bisection or golden section scheme to decompose the interval [0, T_2) iteratively. For example, choose θ = 0.5. Start at t_1 = θ · T_2 and try to find a feasible control. If this succeeds, search the left subinterval by dividing it; otherwise search the right subinterval. Under the assumption that a global minimum of (4.3.25) can be found for any choice of t, such a binary search strategy must converge to the unique optimal time within any given tolerance. A simple algorithm describing this procedure is given in Algorithm 4.1. It solves a sequence of feasibility problems and guarantees the feasibility of the control if it converges. The basic idea is that the interval can be divided into two disjoint intervals with a common boundary t* := T_1*, the optimal time.

Algorithm 4.1 Binary search for time-optimal control
Require: t_ℓ ← 0, t_u ← T_2, TOL > 0, u_1, θ ∈ (0, 1)
  t_1 ← θ · (t_u − t_ℓ)
  for k = 1, 2, ... do
    γ ← min_u g_fp(t_k, u), started from u_k
    u_{k+1} ← argmin_u g_fp(t_k, u)
    if γ = 0 then
      t_u ← t_k
      ε ← t_u − t_ℓ
      if ε < TOL then
        t* ← (t_ℓ + t_u)/2
        return
      end if
    else
      t_ℓ ← t_k
    end if
    t_{k+1} ← θ · (t_u + t_ℓ)
  end for

D_1 := {t ∈ (0, T_2) : ∃ u ∈ U with g_fp(t, u) = 0},    (4.3.26)
D_0 := {t ∈ (0, T_2) : g_fp(t, u) > 0 for all u ∈ U}.    (4.3.27)

And it holds that

D_0 ∪ D_1 = (0, T_2),  D_0 ∩ D_1 = ∅.    (4.3.29)

The left interval D_0 can be called the infeasible region, and the right one, D_1, the feasible region. Define j(t) := min_u g_{C¹}(t, u). Finding t* is a one-dimensional problem,

t* := min D_1,  with j(t_ℓ) > j(t*) = j(t_u) = 0 for t_ℓ < t* < t_u.    (4.3.30)

We have to keep in mind that Algorithm 4.1 is only guaranteed to converge if a global minimum of g_fp(t_k, u) is attained for each k: the solution of (FP) must be a global minimum of the function. This subproblem might be non-convex or highly nonlinear, and the zero level-set can be empty or consist of whole connected regions. If the feasibility problem is not solved, this might be due to actual non-solvability (which is fine) or because the minimization method accepted a local minimum at a higher level. Unfortunately, there is no sufficient criterion to check whether the feasibility problem has a zero if only a local minimum at a higher level is known.
It is important to use a good initial guess for u in each step of the binary search iteration. As stated in Algorithm 4.1, one can use the previously computed optimal point of the feasibility problem from the last step. Indeed, there is a chance that u_k also solves subproblem k + 1.
Theoretically, the presented method solves the time-optimal control problem of type (TOC). Performance strongly depends on the choice of the initial controls and the initial interval decomposition. In the next subsection, we describe minimizing the overall specification error as a convenient method for solving (TOC) approximately, or at least for providing a good initial guess for Algorithm 4.1, which then solves (TOC) strictly.
The time penalization by (t − t_k)² in the algorithm is introduced to favor sub-optimal solutions if the feasibility equation cannot be solved. In that case, the resulting control just minimizes some kind of specification error and is not necessarily feasible for any time t < t_k. If no time penalization were used, a small deviation from the specification might occur late, for t > t_k; the penalization term avoids favoring such points.
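The interval decomposition of Algorithm 4.1 can be sketched as follows; the exact zero test for feasibility is idealized (in practice one tests γ against a tolerance), and the warm-starting of u is abstracted into the supplied solver (names illustrative):

```python
def binary_search_T1(solve_fp, t_lower, t_upper, tol=1e-4, theta=0.5):
    """Interval decomposition of Algorithm 4.1.  solve_fp(t) is assumed
    to return the global minimum value of g_fp(t, .); the value 0 marks
    feasibility.  Maintains t_lower infeasible and t_upper feasible and
    returns the midpoint of the final bracket around T1*."""
    while t_upper - t_lower >= tol:
        t = t_lower + theta * (t_upper - t_lower)
        if solve_fp(t) == 0.0:      # feasible: the optimum lies left
            t_upper = t
        else:                       # infeasible: the optimum lies right
            t_lower = t
    return 0.5 * (t_lower + t_upper)
```

With θ = 0.5, the bracket halves in every iteration, so the number of feasibility subproblems grows only logarithmically in 1/TOL.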

4.3.3. Integrating the Specification Error

In each of the cases discussed before, we investigated ways to clearly identify whether a pair of grade change time and control leads to strictly feasible state trajectories. This is possible by using indicator functions that are identically zero over the whole range of the product specification. But this method keeps the drawback of ignoring the behavior of the state trajectories within the tolerance band, which is a direct consequence of the criterion being merely sufficient for feasibility.
We define a measure for the overall squared relative specification error as a function of the control u and a time 0 ≤ T < T_2,

E_T(u) := ∫_{T}^{T_2} ∑_{i ∈ I} ω_i ((x_i(t, u(t)) − z_i)/z_i)² dt,    (4.3.31)

with weights ω_i. The weight parameters account for the different choices of specification tolerances. Choosing

ω_i := (z_i/ε_i)²    (4.3.32)

leads to

E_T(u) = ∫_{T}^{T_2} ∑_{i ∈ I} ((x_i(t, u(t)) − z_i)/ε_i)² dt.    (4.3.33)

Then we have the following necessary inequality that holds for feasible points (T_1, u(t)) of problem (TOC),

E_{T_1}(u) ≤ (T_2 − T_1) · n_I,    (4.3.34)

with n_I the number of elements of the set I. This can easily be derived by noting that each term of the sum equals one exactly when the state is at its specification boundary, so for feasible points every term stays below one. This holds in particular for an optimal solution of (TOC). This suggests an approximate approach to (TOC) based on the necessary inequality (4.3.34). Clearly, choosing T_1 = T_2 satisfies it for any control u. That is why we fix T_1 := 0, since T_1 is not addressed by this approach anyway. This is in some sense a very natural choice, since the time needed for the grade change is not a degree of freedom in practical problems but simply a result of the chosen control, which can only be evaluated after the grade change has been performed.
Now we try to solve the problem by maximizing the distance

max_u (T_2 − 0) · n_I − E_0(u)  ⇔  min_u E_0(u).    (4.3.35)
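Evaluating E_0(u) for a simulated trajectory amounts to a quadrature of the summed squared relative deviations in (4.3.33); a minimal trapezoidal sketch (names illustrative):

```python
import numpy as np

def spec_error(t_grid, x_traj, z, eps):
    """Trapezoidal approximation of E_0(u) from (4.3.33): the time
    integral of the summed squared relative deviations
    ((x_i(t) - z_i)/eps_i)^2 over the monitored states i in I."""
    integrand = np.sum(((x_traj - z) / eps) ** 2, axis=1)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1])
                        * np.diff(t_grid)))
```

Minimizing E_0 then means composing this quadrature with the system simulation, u ↦ spec_error of the trajectory for u, and handing that map to an NLP solver.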

In words, this means that we try to find a control u that leads to the lowest possibleaccumulated relative error from time zero. This problem is well-defined. If the maximumdistance between the left-hand and the right-hand side of inequality (4.3.34) does solve theinequality for all choices of T1 > 0, then the problem (TOC) has no solution. Thus, we canassume that the solution of problem (4.3.35) leads to a point satisfying the (weak) necessarycondition for an optimum of (TOC) for a suitable choice of T1.A more or less trivial result is that if there is a T1 < T2 in such a way that we can find acontrol u with

ET1(u) = 0, (4.3.36)

we have (T1, u) strictly feasible and

E0(u) =∫ T1

0

∑i∈I

(xi(t, u(t))− zi

εi

)2

dt. (4.3.37)

This means that the only remaining error lies in the interval [0, T_1]. The practical relevance of this observation is low, but it tells us something about the expected behavior of minimizing the relative error: in that case, minimizing E_0(u) reduces to minimizing the error in the phase of the grade change itself, where no valid product is produced. In any case, further assumptions on the regularity of the state trajectories must be made in order to find an upper bound on the relative error that ensures strict feasibility, and minimizing the error is always an appropriate method to find a control satisfying such an inequality. This means that, no matter how much we know about the system, the way to solve the problem will always be the same.
Thus, we suggest computing u* := argmin_u E_0(u) and verifying that there is a time T_1*(u*) such that (u*, T_1*(u*)) is strictly feasible for (TOC). Then we can check the KKT conditions to see whether this point is close to a critical point of (TOC).

Remark 4.3.1. In general, solving (4.3.35) does not solve (TOC), since there are ways to construct cases where the time-optimal control does not have a minimal error integral. However, in our case studies, this method proves practical and has benefits compared to solving (TOC) directly; see the results chapter.

If the minimum of E_0 is not an optimal solution of (TOC), we suggest using it as the initial guess for the binary search in Algorithm 4.1 by computing the lowest value of T_1* for which the control is feasible. If we assume that T_1* is not optimal for (TOC), there must be a lower value for which a feasible control can be found, and we try to find it by searching the interval with the binary search algorithm.
This newly suggested method for the solution of time-optimal control problems is applied to a test problem in Chapter 7.


Chapter 5.

Overview on Global Optimization

5.1. Methods

Nonlinear optimization problems do not necessarily have unique solutions. For non-convex problems, sufficient conditions for global optimality are hard to find. Algorithms that are designed to solve the first-order necessary conditions might then lead to a local solution whose objective function value is significantly worse than that of the global solution. In the previous chapter, we introduced a new method for the time-optimal control of grade change problems which requires the solution method for the sub-problems to yield a global minimum. This also motivates us to discuss the existing methods for global optimization before we describe a new adaptive tunneling approach in Chapter 6. Global optimization has not found its way into the standard literature and often is not part of an academic education in numerical mathematics. This might be because the general problem of finding the global minimum or all minima of an arbitrary n-dimensional function on an arbitrary domain remains an unsolved problem in mathematics, and solution attempts are mainly based on heuristics. Although numerous methods have been suggested in scientific journals and also in some books, theoretical results are rare and none of them can guarantee to solve the global optimization problem for a wide class of problems. Note that for convex optimization problems, global and local optimization are actually equivalent. Nearly all attempts to solve the global optimization problem make use of stochastic methods by means of (pseudo-)random variables and distribution modeling, while most of them are designed for unconstrained problems. Usually, it is shown for stochastic approaches that they are asymptotically successful, which means that they find the global minimum with probability 1 as the iteration count tends to infinity.
Deterministic approaches do not have that property and are less promising. Although the domain might be bounded by upper and lower bounds on the variables involved, it is often assumed that the global minimum lies in the interior of the domain, which actually makes the problem unconstrained; the bounds on the variables are then just hints on where to find it. Sometimes a distinction is made between algorithms that consist of two phases and those that do not. In two-phase methods, first a certain point or sample is chosen and then a local search strategy is applied. In [Kir84] such procedures are called strategies of iterative improvement, where we search for an improvement of the objective function in every step of the algorithm. In principle, methods like the controlled random search method by [Pri77], which is described in the next section, are of that kind: they consist of a procedure for rearranging the current state and a procedure for finding an improvement of the objective function. A procedure of that type can also be called a greedy algorithm, and seeking improvement in every step of an optimization algorithm might look tempting, because finding a point with a low objective function value is basically what we are looking for, but it implies


the risk of being trapped in local optima quite early. In 1953, Metropolis et al. [Met53] used a computer algorithm to find a solution to the physical modeling of molecular states, where a state of minimal energy, the so-called equilibrium, is sought. They presented the idea of accepting states with higher energies with a certain probability instead of strictly rejecting them. This method later became known as the Metropolis algorithm. In the early 1980s, this idea was brought to mathematical optimization problems, first to discrete integer problems from Operations Research like the traveling salesman problem by Kirkpatrick [Kir84], and later to continuous, real-valued optimization problems by numerous authors, [DA91] amongst others. The algorithm became known as simulated annealing, after the physical process of melting and annealing of matter, and is outlined in Section 5.1.2. When reviewing scientific works on global optimization, one sees that it is something like a philosophical field of research: none of the methods is clearly superior to the others, and works which give a good overview of the different kinds of methods are rare. This might be due to the fact that a general theoretical foundation is missing and many attempts to solve global optimization problems are related to practical applications in physics or economics rather than to academic research topics.
A different approach, developed at roughly the same time as direct methods like the controlled random search methods became popular, is the use of evolution and genetic strategies, which are based on the principles of biological evolution by means of population, mutation and selection: by Rechenberg in 1973, [Rec73] (evolution strategies), and Holland in 1975, [Hol98] (genetic algorithms). In this chapter, we give a brief overview of three different ideas to solve global optimization problems; all of them are derivative-free and designed for nonlinear, non-smooth, real-valued and box-constrained optimization problems. The tunneling algorithm presented by Levy and Montalvo in the early 1980s, [LM85], does not fit into these schemes of methods. In principle, it is a two-phase method of selecting start points and performing local searches, although the authors interpret the phases as minimization and tunneling phases rather than selecting and searching. Actually, two-phase methods can also be called multi-start methods; see [BRK87] for a statistical analysis of such methods. Generally, how to perform the local searches is not a fixed choice. In the case of tunneling, the local searches are performed by methods for nonlinear smooth optimization, such as gradient and conjugate gradient methods, which restricts the tunneling algorithm and its extensions to smooth problems where derivative information can be computed efficiently. If the problem at hand is to minimize a smooth objective function with continuous and continuously constrained variables, if function evaluations are computationally expensive and if gradient information is available, the use of gradient-based methods to find local minima of the objective function is self-evident and worth a try. In this work we have to deal with such problems, and that is the reason why we generalize and extend the concept of the tunneling algorithm in Chapter 6.

5.1.1. Controlled Random Search

The controlled random search (CRS) algorithm presented by Price in 1976, [Pri77], is a direct method similar to the ones by Hooke and Jeeves [HJ61] as well as the simplex method for nonlinear optimization by Nelder and Mead [NM65]. Characteristic of direct or random search methods is that the algorithms need neither the function to be differentiable nor the derivatives to be available, and this is the main difference between direct methods and gradient-based methods. However, gradient-based methods usually outperform direct


methods, because they make use of additional information about the objective function such as the gradient and, as in quasi-Newton methods, curvature information. The reason why direct methods are still used is that real application problems are not necessarily smooth and derivatives are often not available. Over 30 years after direct methods for unconstrained optimization problems became attractive, the methods were reviewed in [RMLT00]. The CRS algorithm has been extended several times and is presented in its 6th revision by [MMAV97]. In its basic form it is a direct random search method which makes use of simplexes to create new trial points. We outline the algorithm as it was first suggested. The algorithm can be applied to unconstrained and constrained optimization problems, provided that the constraints can be evaluated separately from the function itself. So it attacks the problem of finding x^* ∈ R^n with

x^* = arg min_{x ∈ Ω} f(x), (5.1.1)

where f : R^n → R can be a nonlinear function which needs to be neither convex nor differentiable, and Ω ⊆ R^n. First, the algorithm picks a number N ≫ n of points randomly from Ω,

P = {p_i ∈ Ω : i = 1, . . . , N}, (5.1.2)

and evaluates f(p_i) for i = 1, . . . , N. Let f_M := max f(P) be the greatest function value of the sample. Then we have to choose n + 1 points randomly from P,

r_1, . . . , r_n, r_{n+1} ∈ P (5.1.3)

and determine the centroid of the first n points,

g = (1/n) ∑_{i=1}^{n} r_i. (5.1.4)

A trial point p is computed as

p = 2g − r_{n+1} = g + (g − r_{n+1}), (5.1.5)

which is the point reflection of the (n + 1)-th point at the centroid of the simplex. Now if p ∈ Ω, compute f(p); otherwise, generate a new set of n + 1 random points in P and try again. If

f(p) < f_M, (5.1.6)

then the trial point p is accepted and replaces p_M, the point with the highest function value. If the trial point is not accepted, the procedure of choosing n + 1 points in P is repeated. So the main idea is to generate feasible trial points and to replace bad points in a sample of fixed size. The number of possible trial points is (n + 1) \binom{N}{n+1}. If N is chosen significantly larger than n, the number of possible trial points is large, which improves the chance of finding a point with a lower function value. As the algorithm makes progress, the points in the sample P contract and cluster around points with low function values. It does not necessarily lead to a global minimum of f, because it can be trapped by local minima, and direct methods can cluster around local


Figure 5.1.: For n = 2 and N = 3 we have 3 possible trial points by reflecting one of the 3 points at the centroid of the remaining two.

minima too early, before they have had the chance of coming close to a global minimum. But it can be seen as a method for global optimization, because the reflection of the (n + 1)-th point does not necessarily lie within a direct neighborhood of the simplex. The reflection always points out of the simplex; the author called it an out-going exploration of the domain. This avoids rapid convergence to a local minimum, keeping the possibility of finding a global solution, but it also slows down the convergence to global minima. In later works such as [Pri83], the algorithm was extended by the introduction of secondary trial points which lie within the simplex, as a kind of compromise.
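The iteration described above can be sketched compactly. The following is a minimal rendition of the basic CRS loop with the replace-the-worst acceptance rule; all names and parameter defaults are illustrative choices, not Price's original settings:

```python
import random

def crs_minimize(f, lower, upper, n_pop=50, max_trials=20000):
    """Minimal sketch of the basic controlled random search (CRS):
    keep a sample of n_pop points, reflect a randomly chosen point at
    the centroid of n others, and replace the worst sample point
    whenever the trial improves on it."""
    n = len(lower)
    rand_point = lambda: [random.uniform(l, u) for l, u in zip(lower, upper)]
    pop = [rand_point() for _ in range(n_pop)]
    vals = [f(p) for p in pop]
    for _ in range(max_trials):
        idx = random.sample(range(n_pop), n + 1)      # n+1 distinct points
        g = [sum(pop[i][j] for i in idx[:n]) / n for j in range(n)]
        trial = [2 * gj - rj for gj, rj in zip(g, pop[idx[n]])]
        if not all(l <= t <= u for l, t, u in zip(lower, trial, upper)):
            continue                                  # outside the box: retry
        ft = f(trial)
        worst = max(range(n_pop), key=vals.__getitem__)
        if ft < vals[worst]:                          # accept: replace worst
            pop[worst], vals[worst] = trial, ft
    best = min(range(n_pop), key=vals.__getitem__)
    return pop[best], vals[best]
```

On a smooth function this clusters the sample around low-lying points; choosing n_pop much larger than n follows the recommendation in the text.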

5.1.2. Simulated Annealing

In 1953, Metropolis et al. [Met53] published an article on the simulation of a thermal equilibrium in solid matter. The problem is to find the equilibrium states where molecules have the lowest energy. That this is actually a kind of combinatorial global optimization problem, because a state of globally, not just locally, minimal energy is sought, was discovered about 30 years later. The main idea is to simulate a process of annealing a system. Speaking in terms of solid matter, one first has to melt the matter by heating it sufficiently and then let it cool down while the molecules in the matter try to find their states of minimal energy. We borrow from [vLA87], one of the first books reviewing the algorithms based on simulated annealing, to outline the basics of the algorithm. A system is assumed to consist of certain states, also called configurations, C, and each configuration has a certain energy E(C). The very simple idea now is to perturb the current configuration C to get C' and define ΔE = E(C') − E(C). If ΔE ≤ 0, the new configuration C' is accepted; otherwise it is accepted according to the Metropolis criterion

exp(−ΔE/T) > p, (5.1.7)

where p ∼ U(0, 1) is a random number. This decision criterion is based on the Boltzmann distribution of condensed matter physics, which gives the probability of being in a state of a certain energy. The chance of a configuration with higher energy being accepted is the lower the higher the energy difference is, and it decreases together with the temperature T of the system. Now a process of annealing is set up, lowering the temperature T by a certain amount at each iteration. First, the system has to be melted by choosing an initial temperature high enough so that accepted perturbations are easy to find. In [Kir84] it is suggested to perform many trial perturbations, count the ratio of the ones being accepted, and double the initial temperature if the fraction of accepted perturbations is below 80 %. Once such a temperature is found, the annealing process can begin. It is known that this process has to be slow enough so that the configurations have the chance of reaching a state of equilibrium. One of the important questions is how to perturb the current state. In principle, the trial points can be chosen randomly from the domain or from a neighborhood of the current state. For general optimization problems, a configuration can be seen as a point, and the energy of a configuration is obtained by evaluating the objective function at that point. The temperature is actually a parameter to control the algorithm, and there are different ways to describe a process of lowering it, a cooling schedule.
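A minimal sketch of the resulting algorithm, assuming uniform perturbations and a geometric cooling schedule (both are just one possible choice among the perturbation rules and cooling schedules discussed above; all constants are illustrative):

```python
import math, random

def simulated_annealing(f, x0, step=0.5, t0=1.0, cooling=0.95,
                        n_outer=100, n_inner=50):
    """Sketch of simulated annealing with a geometric cooling schedule:
    t is multiplied by `cooling` after each block of n_inner moves."""
    x, fx = x0[:], f(x0)
    best, fbest = x[:], fx
    t = t0
    for _ in range(n_outer):
        for _ in range(n_inner):
            # perturb the current configuration
            y = [xi + step * random.uniform(-1, 1) for xi in x]
            fy = f(y)
            delta = fy - fx
            # Metropolis criterion: always accept improvements, accept
            # deteriorations with probability exp(-delta/t)
            if delta <= 0 or math.exp(-delta / t) > random.random():
                x, fx = y, fy
                if fx < fbest:
                    best, fbest = x[:], fx
        t *= cooling  # lower the temperature
    return best, fbest
```

At high temperatures nearly every move is accepted and the walk explores the domain; as t falls, the acceptance of uphill moves becomes rare and the walk settles into a low-lying region.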

5.1.3. Evolution Strategies

The principles of biological evolution are the motivation for evolution and genetic strategies in algorithms for the numerical optimization of functions. In 1973, Rechenberg [Rec73] was the first to suggest the idea of using evolutionary concepts for numerical optimization. Further development of this kind of algorithm was done by Schwefel et al. [BS02, BS93, TBS97]. In these works, structured overviews of the concepts of evolution strategies and some computational examples are available, so we refer to these sources for further details. In principle, evolution strategies (ES) share the following basic concept. Given an initial parent population of individuals, the properties of the individuals are recombined and mutated in a certain way to create a children generation. Then, individuals are selected according to their fitness to be the new parent generation. In this way, ES are designed to find the maximum of a given parameter-dependent function, where an individual is just a point in space and the fitness is the value of the objective function considered. These main steps allow a large variety and practically infinitely many variants of evolution algorithms that differ in the following factors:

• Population: One can determine the size of a population or have several isolated populations. One can distinguish between male and female individuals, and so on.

• Recombination: Children can inherit their properties from one or multiple parents.

• Mutation: The properties of the children generation can be perturbed by stochastic concepts, adding random numbers to their properties. An important parameter known as the mutation step length might be the variance of the random number added to the property.


• Selection: Basically, there are two variants: either individuals of the parent generation can be part of the new parent generation, or not.

• Fitness: Usually, one can take the objective function value as the value for the fitnessof an individual.

Some of the standard concepts for designing ES are given in the following, and many more (nested) variants are possible.

(1+1)–ES

The simplest form of an evolution strategy is denoted by (1+1), where the first 1 means that the parent generation has one individual and the second 1 means that the children generation also has one individual. Actually, this is just a trial-and-error method: an initial point is chosen, the point is mutated, and the better of the two is chosen to be the new parent.

(µ + λ)–ES

This is a generalization of the (1+1)-ES, where the parent population has µ ≥ 1 individuals and the children population consists of λ ≥ 1 individuals. The '+' has the special meaning that the sets of both generations are united and selection is done over all elements of the union. This is in contrast to the ','-selection.

(µ, λ)–ES

Here, the parent and children generations again have µ and λ individuals, respectively. But instead of selecting the µ fittest individuals from the union of both sets, the ','-selection means that only the children are eligible to form the new parent generation, so we need λ ≥ µ.

Simple Mutation

Given a current point x^p ∈ R^n, namely a parent individual, the individual in the children generation created from this parent can simply be a perturbation of the current point according to a normally distributed random variable. Then a child individual x^c is given by

x_i^c = x_i^p + σ_i z_i, i = 1, . . . , n, (5.1.8)

where z_i ∼ N(0, 1). The choices of the σ_i are crucial; they are called mutation step lengths.

This fact was known from the start of the development of evolution strategies, and choosing the optimal mutation parameters is actually an additional task to perform while running an algorithm. First attempts led to the following rule.

Rechenberg’s 1/5 Success Rule

This rule is intended to control the rate of success while performing mutations. For simple exemplary univariate functions, Rechenberg calculated the optimal expected convergence rates for mutations according to a normal distribution with expectation 0 and standard deviation σ by varying the standard deviation. The relative frequency of successful mutations could then be calculated, which was about 0.2 and encouraged the heuristic rule: 'If the ratio of successful mutations is below 0.2, decrease σ. If it is above, increase σ.'


Schwefel suggested in [Sch81] to measure the ratio of success p and update the standard deviation according to

σ_new := 0.85^{−1/n} σ_old, if p > 1/5,
σ_new := 0.85^{1/n} σ_old, if p < 1/5,
σ_new := σ_old, if p = 1/5.
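The (1+1)-ES together with this success-rule update can be sketched as follows; the block size of 50 trial mutations per adaptation and all other constants are illustrative assumptions:

```python
import random

def one_plus_one_es(f, x0, sigma=1.0, n_epochs=100, n_trials=50):
    """Sketch of a (1+1)-ES with the 1/5 success rule: the step length
    sigma is adapted after every block of n_trials mutations."""
    n = len(x0)
    x, fx = x0[:], f(x0)
    for _ in range(n_epochs):
        successes = 0
        for _ in range(n_trials):
            child = [xi + sigma * random.gauss(0, 1) for xi in x]
            fc = f(child)
            if fc < fx:               # selection: keep the better of the two
                x, fx = child, fc
                successes += 1
        rate = successes / n_trials
        if rate > 0.2:
            sigma /= 0.85 ** (1 / n)  # too many successes: enlarge the step
        elif rate < 0.2:
            sigma *= 0.85 ** (1 / n)  # too few successes: shrink the step
    return x, fx
```

The update direction matches the rule above: a high success rate signals that the step length is too cautious, a low one that most mutations overshoot.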

Actually, in n dimensions, the density function of a normally distributed random variable has the form

p(z) = ((2π)^n det C)^{−1/2} exp(−(1/2) z^T C^{−1} z),

where C is a covariance matrix with a total of n(n + 1)/2 parameters that can be varied according to a mutation strategy. This leads to the question of which covariance matrix fits best, in the sense that it maximizes the performance of the evolution strategy.

Self-Adaptation

The concept of self-adaptation for numerical algorithms requires strategies that ensure that the algorithm chooses its parameters automatically in such a way that they are, or tend to be, optimal for the current problem. This can be compared to quasi-Newton updates, which improve the approximation of the true Hessian until the optimal convergence rate of the quasi-Newton method for quadratic functions is achieved. So the strategy parameters are interpreted as degrees of freedom in the sense that they exist for each individual and evolve together with them. It is again [Sch81] where a mechanism was suggested to control the coordinates of the current point simultaneously with the strategy parameters. It leads to ellipsoidally shaped density functions by modifying n variances together with the n coordinates; however, these are aligned with the Cartesian coordinate axes, since only the n variances are chosen as strategy parameters. For i = 1, . . . , n,

σ_i^{new} := σ_i^{old} exp(a N(0, 1) + b N_i(0, 1)),
x_i^{new} := x_i^{old} + σ_i^{new} N_i(0, 1),

where the N_i(0, 1) are n random variables from a standard normal distribution. There are now effectively two parameters, a and b, to determine the strategy on how to modify the strategy parameters. For further details on self-adaptation we refer to [Bey95] and mention the rather new concept of covariance matrix adaptation (CMA) by Hansen et al. [HO01, Han98], which leads to a method for self-adaptation that does not make use of random mutations in the evolution of the strategy parameters.
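A single mutation step with this log-normal self-adaptation of the step lengths might look as follows; the default learning rates for a and b follow common heuristic recommendations and are our assumptions, not values prescribed here:

```python
import math, random

def self_adaptive_mutation(x, sigmas, a=None, b=None):
    """One self-adaptive mutation step: each step length sigma_i is
    perturbed log-normally before it is used to mutate coordinate i."""
    n = len(x)
    a = a if a is not None else 1.0 / math.sqrt(2.0 * n)             # global rate
    b = b if b is not None else 1.0 / math.sqrt(2.0 * math.sqrt(n))  # local rate
    z_global = random.gauss(0, 1)   # one draw shared by all coordinates
    new_sigmas = [s * math.exp(a * z_global + b * random.gauss(0, 1))
                  for s in sigmas]
    new_x = [xi + s * random.gauss(0, 1) for xi, s in zip(x, new_sigmas)]
    return new_x, new_sigmas
```

The shared draw couples the overall scale of all step lengths, while the per-coordinate draws let individual variances drift apart; selection then implicitly favors individuals carrying well-suited step lengths.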

5.2. Discussion and the Tunneling Idea

Until now we have avoided formulating a unique global optimization problem. The nature of practical problems can differ significantly depending on the kind of application. There are smooth problems, non-smooth problems, problems involving continuous or discrete variables, problems imposing bound or linear constraints or nonlinear equality or inequality constraints, high-dimensional or low-dimensional problems, objective functions that are fast to evaluate with high precision, some which are computationally expensive, multimodal or not, and so on.


Problems of every kind exist, so we cannot expect a single algorithm to be suitable for the whole variety of possible global optimization problems. What makes these problems even worse is that there is no general sufficient indication of when a global optimum has been found. Thus, one usually needs stopping rules that do not guarantee that a solution is found but estimate it to be likely, as done in [BRK87] by Bayesian estimators for the number of local minima still left to find. However, rather pragmatic stopping rules will be needed, as the computational time to attempt the solution of the problem might be limited. Especially when the amount of time needed for a function evaluation is large, one might limit the number of function evaluations directly rather than looking at any other stopping rules. If the problem is actually to find a good solution instead of the solution, the results of an algorithm can be satisfying even if the current point is far away from the real optimum. In the next chapter we mention some benchmark functions that are frequently used to estimate the performance of global optimization methods. These benchmark functions have known properties, such as the number of global and local minima, the minimum level and the positions of the extremal points. In the global optimization literature there are numerous other test functions, see [Dix78], and most of them are smooth with continuous variables, even if the algorithms that are benchmarked with them are designed for functions that need not be differentiable. It is well known that in most cases methods which make use of gradients and Hessian approximations perform much better on smooth functions. This is not surprising: at a current point where directions that allow improvement of the objective function are rather hard to find, it is very useful to have the steepest descent direction in the form of the negative gradient direction.
If the function that is to be globally minimized is smooth, the global optimization problem can be seen as the problem of finding the correct start value for a gradient-based method. Once this start value is found, the global optimum is usually found within a few iterations of a gradient-based method, while the performance only depends on the choice of the method. So, in our opinion, it is not self-evident to attack the problems in this work, which are actually smooth, with direct search methods or evolution strategies, because of the expected computational effort. That is the reason why we choose to extend a concept of function modifications in such a way that it assists the search for a good initial guess for a local minimization method for nonlinear programming, namely the tunneling algorithm concept of [LM85] from the early 1980s, which has gained rather little attention in the global optimization communities. This is a concept of two-phase multi-start methods that performs sequential unconstrained nonlinear and smooth minimizations of a multimodal objective function by working on an automatically created modification of it. Using smooth optimization techniques has the benefit of identifying local minima by checking necessary and sufficient criteria, and so previously found minima shall be eliminated by modifying the function in such a way that it has a singularity at the point of the local minimum. This main idea is the background of the tunneling concepts presented in the following chapter.


Chapter 6.

Adaptive Tunneling Concepts for Global Optimization

6.1. The Algorithmic Framework

In this section, we give an overview of how tunneling algorithms work and how they can be implemented for practical optimization problems. We focus on the problem of finding the global minimum of a function

f : Ω ⊂ Rn → R. (6.1.1)


This function might have multiple global minima in Ω; the global optimization problem of finding the r global minima of f can then be stated as

(r −GOP): Find z1, . . . , zr ∈ Ω with f(x) ≥ f(zi) ∀ i, ∀ x ∈ Ω (6.1.2)

In general, the integer value of r is not known a priori. Using a local minimization method does not necessarily find one of the z_i, since it is only guaranteed to find a local minimum, which f might have besides the global minimum. The basic idea of tunneling consists of performing a sequence of local optimization runs while modifying the objective function in such a way that the chance of finding a global minimum increases. The first tunneling algorithm in the literature, from [LM85], separates the algorithm into two phases that follow each other repeatedly. In the first phase, the so-called minimization phase, the function f is minimized by a local method starting from a given initial guess. Once the minimization has finished with a local minimum z ∈ Ω, the second phase, called the tunneling phase, starts; its purpose is to generate another initial guess for the next minimization phase. The tunneling phase therefore tries to find a point x with f(x) ≤ f(z), so that exactly one of the two following options must be true.

(i) x is a local minimum of f .

(ii) ∇f(x) ≠ 0 and thus, there is further descent on f.

In the first case, a further local minimum of f is found. In the second case, x can be used as an initial guess for the next minimization phase. Given a local minimum z of f, a modified objective function is generated by first shifting f to T(x) = f(x) − f(z). The resulting function has zeros at points with f(x) = f(z), and T(x) < 0 if f(x) < f(z). Solving the inequality T(x) ≤ 0 is equivalent to finding a new point with f(x) ≤ f(z), which is a suitable initial guess for a local minimization method, since it will lead to a local minimum whose value is at most the value at z. To make this search more successful, the difference between the value of f at x and the value of f at z is multiplied by a function that amplifies it and generates a non-removable singularity at z. This leads to T(x) = (f(x) − f(z)) · P(x) with a suitable function P that will be called a pole function. This has the effect that the point z is no longer in the domain of T; additionally, points that lie within a region around z will generate search directions for the local minimization leading away from z. This makes the region unattractive and is supposed to increase the chance of finding a point z̃ ≠ z with f(z̃) ≤ f(z), if there is one. However, it has not been proven that tunneling always works this way, since it strongly depends on the objective function and the global strategy. But we can analyze, for test functions, what kind of effect tunneling has on the chance of finding further local minima. This is done in Section 6.2.4. We state a basic algorithm in Algorithm 6.1, as it is suggested in [LM85] and [CG00]. For an unknown function, failure or success is hard to measure, since neither the number of global minima nor the value of f at them is known. So whenever such an algorithm stops, we cannot know for sure whether we have indeed found all global minima, and stopping itself has to be triggered by less concrete criteria than in local minimization.
Anyway, if the alternative is just a single local minimization, then the results of the tunneling algorithm are definitely at least as good as that. Therefore, applying a global optimization method such as tunneling is a pragmatic and justifiable approach in optimization. Moreover, for practical optimization problems, sufficient criteria for a local minimum do not necessarily have to be met, since the goal is just finding a function value that is as small as possible.


Algorithm 6.1 Basic Tunneling Algorithm

(i) Minimization Phase: Given an initial guess x_0, use a local minimization method to obtain a local minimum z of f. If this is the first time the minimization phase is called, or if f(z) < f^*, set f^* := f(z) and k := 1. If f(z) = f^*, increase k by 1 and store z as z_k, where k is the number of (successfully) performed local minimizations that lead to an equal function value. This counts the number of local minima at the same level.

(ii) Tunneling Phase: Form the tunneling function T(x) = (f(x) − f^*) · P(x), so that z is a singularity of T. Generate a point within a certain area around z and start a local minimization to find a point x with T(x) ≤ 0. Use loops to generate further start values, if necessary. Once such a point x is found, set x_0 ← x and return to phase (i). Otherwise, repeating the loops will hit certain stopping criteria, such as function evaluation limits or computational time limits, and the algorithm has to stop.
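The two phases can be sketched as follows. Here `local_min` is a placeholder for an arbitrary local minimizer, the pole term 1/d(x, z)^{2λ} follows the classical tunneling function of [LM85], and the restart strategy and all constants are illustrative simplifications (in particular, poles for previously found minima are omitted):

```python
import random

def tunnel(f, local_min, lower, upper, n_restarts=20, lam=1.0, eps=1e-8):
    """Sketch of the basic two-phase tunneling loop.  local_min(g, x0)
    stands for any local minimization routine returning a minimizer of
    g started from x0."""
    rand_point = lambda: [random.uniform(l, u) for l, u in zip(lower, upper)]
    z = local_min(f, rand_point())          # first minimization phase
    f_star, minima = f(z), [z[:]]
    for _ in range(n_restarts):
        def T(y):                           # tunneling function, pole at z
            d2 = sum((yi - zi) ** 2 for yi, zi in zip(y, z)) + eps
            return (f(y) - f_star) / d2 ** lam
        x = local_min(T, rand_point())      # tunneling phase
        if f(x) > f_star:
            continue                        # no point on a lower level found
        z = local_min(f, x)                 # next minimization phase
        if f(z) < f_star - 1e-12:
            f_star, minima = f(z), [z[:]]   # strictly better level found
        else:
            minima.append(z[:])             # another minimum on the same level
    return f_star, minima
```

Each successful tunneling phase hands a point at or below the current level to the next minimization phase, so f^* can only decrease over the course of the run.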

6.1.1. Scaling

Real applications can have decision variables on different scales. Some local optimization methods are sensitive to that kind of scaling when changing the decision variables simultaneously. For tunneling algorithms, scaling is also necessary. In the following sections, we will show concepts for creating suitable tunneling functions by special metrics and pole functions. In order to be able to measure the distance between two points reliably by these metrics, it is important that all components of the variable are equally scaled. Otherwise, the tunneling functions would not modify the objective function evenly in every direction. We will focus on problems of type (6.1.2), where we assume that the subset Ω ⊂ R^n has the form

Ω := {x ∈ R^n : ℓ_i ≤ x_i ≤ u_i} (6.1.3)

with lower and upper bounds ℓ and u for each variable in x. So Ω is an n-dimensional cuboid. Solving the local optimization sub-problem

(LOP): min_{x ∈ Ω} f(x) (6.1.4)

is equivalent to solving the scaled optimization problem

(SLOP): min_{x ∈ [0,1]^n} f(s(x)) (6.1.5)

with the linear scaling function s : R^n → R^n,

s(x) = Ax + ℓ. (6.1.6)

The diagonal scaling matrix A has the form

A = diag(u_1 − ℓ_1, . . . , u_n − ℓ_n). (6.1.7)

It follows that s((0, . . . , 0)^T) = ℓ and s((1, . . . , 1)^T) = u. We can write g(x) := f(s(x)); when minimizing g in x with a gradient-based


method, the gradient is scaled as well, by applying the chain rule:

∇g(x) = ∇f(s(x)) J_s(x). (6.1.8)

Since s is a linear function, the Jacobian satisfies J_s(x) = A for all x. For a single component i, this just means scaling the partial derivative of f by the length of the associated interval:

g_{x_i} = (u_i − ℓ_i) f_{x_i}. (6.1.9)
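The transformation (6.1.6) and the gradient scaling (6.1.9) can be written down directly; `make_scaled` is our own helper name, not part of any library:

```python
def make_scaled(f, grad_f, lower, upper):
    """Wrap an objective defined on the box [l, u] so that it can be
    minimized over the unit cube [0, 1]^n; by the chain rule the
    gradient picks up the factor (u_i - l_i) per component."""
    def s(x):                                   # s(x) = A x + l
        return [li + (ui - li) * xi for xi, li, ui in zip(x, lower, upper)]
    def g(x):
        return f(s(x))
    def grad_g(x):
        df = grad_f(s(x))
        return [(ui - li) * d for d, li, ui in zip(df, lower, upper)]
    return g, grad_g

# example: f(y) = y1^2 + y2^2 on the box [-2, 2] x [0, 4]
f = lambda y: y[0] ** 2 + y[1] ** 2
grad_f = lambda y: [2 * y[0], 2 * y[1]]
g, grad_g = make_scaled(f, grad_f, [-2, 0], [2, 4])
print(g([0.5, 0.0]))  # s maps (0.5, 0) to (0, 0), so this prints 0.0
```

Any local minimizer can then be run on g over [0, 1]^n, and a minimizer x is mapped back to the original variables via s(x).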

6.2. A General Class of Tunneling Algorithms

We begin with some elementary definitions of special classes of functions that are used in general tunneling algorithms. There are some important properties of these functions that can be derived and that encourage the tunneling approach and the strategies that are applied. In this way, we generalize the tunneling concepts of [LM85], [BG91] and [CG00] and show possible modifications.

6.2.1. Definitions and Properties

Definition 6.2.1. Recall that a function D : ℝⁿ × ℝⁿ → ℝ is a metric if ∀ x, y, z ∈ ℝⁿ

(i) D(x, y) = 0 ⇔ x = y (Definiteness),

(ii) D(x, y) = D(y, x) (Symmetry),

(iii) D(x, z) ≤ D(x, y) +D(y, z) (Triangle Inequality).

If the triangle inequality is dropped, we call such a function a semi-metric. If for a fixed z ∈ ℝⁿ the function dz : ℝⁿ → ℝ with

dz(x) := D(x, z) (6.2.1)

is a semi-metric and at least twice continuously differentiable in ℝⁿ \ {z}, then we call dz(x) a distance function for x relative to z.

For example, the function dz(x) = ‖x − z‖ is a distance function, known as the Euclidean metric. The points x ∈ ℝⁿ that satisfy ‖x − z‖ = 1 form the unit sphere, but the more general definition of distance functions gives us the possibility of creating more complex and flexible shapes by dz(x) = 1. We can use any p-norm or an elliptic quadratic form, as we show in the following section.
We now define a pole function that uses a distance function to measure the distance between a point and a center and to create a singularity there.

Definition 6.2.2. Let dz : ℝⁿ → ℝ be a distance function as in Definition 6.2.1 relative to a point z ∈ S, where S ⊂ ℝⁿ is bounded. Then a function P : ℝⁿ → ℝ is called a (multiplicative) pole function for z if the following properties hold:

(i) lim_{dz(x)→0} P(x) = ∞ (Singularity),

(ii) P(x) ≥ 1 ∀ x ∈ S \ {z} (Positiveness),

(iii) P(x) = 1 ∀ x /∈ S (Multiplicative Neutrality),


(iv) For every x, y ∈ S with dz(x) > dz(y), it holds P(x) < P(y) (Strict Monotonicity).

The point z is called the singularity of P, or the center of the pole region S.

We see that a pole function is given by the following construction.

Remark 6.2.1. For a pole function in n dimensions, it is enough to apply a univariate pole function to a suitable distance function. So for a distance function dz, a pole function P on ℝⁿ can be obtained by using a univariate function ρ : ℝ₊ → ℝ,

P = ρ ∘ dz,  i.e.  P(x) = ρ(dz(x)),    (6.2.2)

if ρ is a pole function in one dimension with respect to the Euclidean distance function for the scalar input dz(x). This transforms the monotonicity condition into

ρ′(x) < 0 ∀x > 0.

The chain rule of differentiation applied to P shows that a pole function has no stationary points if the only stationary point of the distance function is at its center z. This is a weak condition, since it is already satisfied for common distance functions such as ‖ · ‖. It can be seen from the metric axioms that a distance function dz(x) has its global minimum at x = z.

A quite simple pole function is the piecewise defined hyperbola with ρ(x) = 1/x,

P(x) = { 1/dz(x),  x ∈ S \ {z};  1,  x ∉ S. }

For example, choosing dz(x) := ‖x − z‖ and S = {x ∈ ℝⁿ : dz(x) < 1}, as is done in a similar way in [LM85], would give us P(x) ∈ C⁰(ℝⁿ).

Definition 6.2.3. Let np points Z = {z1, . . . , znp} ⊂ ℝⁿ be given. For each point zi, we have a pole function Pi with an appropriate set Si, i = 1, . . . , np. A function T : ℝⁿ \ Z → ℝ is called a tunneling function for an arbitrary continuous function f : ℝⁿ → ℝ if it has the form

T(x) := (f(x) − f*) ∏_{i=1}^{np} Pi(x),    (6.2.3)

where f* := min_{i=1,...,np} f(zi), or f* := 0 if Z = ∅.

The gradient of a general tunneling function is given by the following lemma.

Lemma 6.1. Given a tunneling function from Definition 6.2.3, its gradient is given by

∇T(x) = ∇f(x) ∏_{i=1}^{np} Pi(x) + (f(x) − f*) ∑_{i=1}^{np} ( ∇Pi(x) ∏_{j≠i} Pj(x) ).    (6.2.4)

Proof. The gradient is obtained by repeatedly applying the product rule to (6.2.3); the general formula can be shown by induction.
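A minimal numerical sketch of (6.2.3) and (6.2.4), using the classical pole functions Pi(x) = di(x)^(−µ) with di(x) = ‖x − zi‖², evaluated inside the pole regions (all names are ours); the analytic gradient is checked against central finite differences.

```python
import numpy as np

def tunneling(f, grad_f, poles, mu, x, f_star):
    """T(x) and its gradient, eqs. (6.2.3)/(6.2.4), with classical poles
    P_i(x) = d_i(x)^(-mu), d_i(x) = ||x - z_i||^2, inside the pole regions."""
    P, dP = [], []
    for z in poles:
        d = float(np.dot(x - z, x - z))
        P.append(d ** -mu)
        dP.append(-mu * d ** (-mu - 1) * 2.0 * (x - z))     # chain rule
    prod = float(np.prod(P))
    T = (f(x) - f_star) * prod
    gT = grad_f(x) * prod
    for i in range(len(poles)):                             # product rule, eq. (6.2.4)
        gT = gT + (f(x) - f_star) * dP[i] * float(np.prod(P[:i] + P[i + 1:]))
    return T, gT

f = lambda x: float(np.sum(x ** 2))
grad_f = lambda x: 2.0 * x
poles = [np.array([0.3, 0.0]), np.array([0.0, 0.4])]
x0 = np.array([0.1, 0.2])
T, gT = tunneling(f, grad_f, poles, 2.0, x0, 0.0)

# Central finite differences confirm the analytic gradient
h, fd = 1e-6, np.zeros(2)
for k in range(2):
    e = np.zeros(2); e[k] = h
    fd[k] = (tunneling(f, grad_f, poles, 2.0, x0 + e, 0.0)[0]
             - tunneling(f, grad_f, poles, 2.0, x0 - e, 0.0)[0]) / (2.0 * h)
print(gT, fd)
```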


Figure 6.1.: The basic mechanism of pole functions. The dashed line shows a quadratic function with a minimum at 0. The solid line in the left figure marks a pole function which smoothly changes to the identity at distance 1 relative to 0. The right figure shows the result of multiplying both functions: the minimum at 0 is destroyed, while two artificial minima are created within radius 1 of the former minimum.

What we see here is that calculating gradient information of a tunneling function only requires the gradient of f, its function value and the gradients of the pole functions. Evaluating a pole function and its gradient can be done analytically, because its formula is usually known. It is therefore convenient to calculate the tunneling function and its gradient separately from the function f, as has to be done in black-box environments.
We now state some properties of tunneling functions.

Lemma 6.2. Let a function f : ℝⁿ → ℝ and a tunneling function T : ℝⁿ \ Z → ℝ as in Definition 6.2.3 be given, with a set Z of singularities, appropriate pole functions Pi and subsets Si. Then:

(i) T(x) ≤ 0 ⇔ f(x) ≤ f*,

(ii) Z = ∅ ⇒ T (x) = f(x),

(iii) T(x) = f(x) − f* for all x ∉ ⋃_{i=1}^{np} Si.

Proof.

(i) It follows directly from the pole function property P(x) ≥ 1 > 0.

(ii) The product is 1 and f∗ = 0.

(iii) If x lies in none of the pole regions, then Pi(x) = 1 for each i = 1, . . . , np.


An illustration of a tunneling function applied to a simple objective function in one dimension is shown in Fig. 6.1. It is important to note that T is created by multiplying a shifted version of f by a product of pole functions which, outside of their pole regions, are neutral elements of multiplication; there, T is just the shifted version of f. Inside the pole regions, Pi(x) ≥ 1 holds for all i = 1, . . . , np, so T merely amplifies the difference between f(x) and f* and keeps its sign. Defining a tunneling function with additive terms, for example with the shifted pole functions (Pi(x) − 1), would destroy property (i) of Lemma 6.2.
We assumed that the distance functions we use are at least twice continuously differentiable, and the function f is also assumed to be twice continuously differentiable. It then depends on the type of the pole functions which order of differentiability of T can be achieved.
The following disadvantageous characteristic of tunneling functions has to be kept in mind. We state it for dimension n = 1 to illustrate the problem.

Lemma 6.3. Given a function f : ℝ → ℝ with a local minimum z and a tunneling function T : ℝ \ {z} → ℝ with a single pole function P for a pole interval S := (a, b) with a < z < b. If T is twice continuously differentiable and f is strictly monotone decreasing on (a, z) and increasing on (z, b), then it holds:

∃ x̄ ∈ {a, b} : f′(x̄)(x̄ − z) > 0, f(x̄) > f(z)  ⇒  ∃ x̂ ∈ (a, b) \ {z} : T′(x̂) = 0, f′(x̂) ≠ 0.    (6.2.5)

In words: if there is a point at the boundary of the interval whose descent direction points away from z, then there exists a local minimum of T in S that is not a local minimum of f. Such a point shall be called an artificial minimum of T.

Proof. Since P(x) = 1 and P′(x) = 0 for x ∉ S, because the pole function smoothly transforms into the identity outside of S, it holds that T′(x) = f′(x) for x ∉ S. But T has a singularity at z, and therefore there must be a point y close to z with f′(x̄)(y − z) < 0 and T(y) > T(x̄) = f(x̄). If y and x̄ lie on the same side of z, that is, if sgn(x̄ − z) = sgn(y − z), it follows from the intermediate value theorem, applied to the derivative of T, that T′ has a zero within S; this is applicable because T is continuously differentiable. We call that zero x̂.
To show that x̂ is not a zero of f′, we form the derivative of the tunneling function by the product rule:

T′(x) = f′(x)P(x) + (f(x) − f(z))P′(x).

At x̂, we have T′(x̂) = 0. f is monotone, so f(x̂) > f(z). Now if f′(x̂) = 0, then P′(x̂) = 0 must hold, which is only possible on the boundary ∂S, since pole functions are strictly monotone inside S. But this is a contradiction to x̂ ∈ S \ ∂S.

An example of this phenomenon is shown in Fig. 6.1.

Remark 6.2.2. Given a single pole function P for z ∈ S with distance function dz and resulting tunneling function T, assume that there is a stationary point x ∈ S of f with ∇f(x) = 0, f(x) ≠ f* and x ≠ z. If z, with dz(z) = 0, is the only stationary point of dz, then we have ∇P(y) ≠ 0 ∀ y ∈ S. This means that the gradient of T cannot be zero at x; in words, a stationary point of f is not a stationary point of T.
This seems disadvantageous, but there are two scenarios:

(i) f(x) > f∗


(ii) f(x) < f∗

In both cases, x cannot be detected by a zero of the gradient of T. In case (i), the value of f is larger than the best known value at z, which means that such a point can be ignored. In case (ii), we have f(x) < f* and so T(x) < 0; this is a sufficient stopping criterion for the search for a lower function value of f.

We have seen that T might have artificial local minima, while on the other hand f can have local minima that are not local minima of T. Artificial local minima are worth dealing with, since they can be the limit point of local optimization methods; the converse fact is not a limitation, as argued in Remark 6.2.2.
We now derive a more explicit distance function with some useful properties.

Lemma 6.4. Assume that we are given a point z ∈ ℝⁿ and a parameter δ > 0. If the matrix Q ∈ ℝⁿˣⁿ is symmetric and positive definite, then the set of all x ∈ ℝⁿ that satisfy

Dz(x) := D(x, z) := (1/(δ²λ*)) (x − z)ᵀQ(x − z) ≤ 1    (6.2.6)

is an ellipsoid with center z, and it holds that

max{ ‖x − z‖ : x ∈ ℝⁿ, Dz(x) = 1 } = δ,    (6.2.7)

where λ* denotes the smallest (positive) eigenvalue of Q. Dz(x) is a distance function and shall be called the ellipsoidal distance function (EDF) for x and z.

Proof. To show that the root of the quadratic form (6.2.6) is a distance function, we first have to check that D(x, z) is a semi-metric. Definiteness follows directly from the positive definiteness of Q. We show that D(x, z) is symmetric:

D(x, z) = (1/(δ²λ*)) (x − z)ᵀQ(x − z)
        = (1/(δ²λ*)) (−1)(z − x)ᵀQ(−1)(z − x)
        = (−1)(−1) (1/(δ²λ*)) (z − x)ᵀQ(z − x)
        = D(z, x).

For completeness, we can prove the triangle inequality for √D(x, z). For simplicity we write

D̃(x, y) := √( (x − y)ᵀQ(x − y) )

without loss of generality. Because Q is positive definite, the root Q^(1/2) exists, and we get

D̃(x, y) = ‖Q^(1/2)(x − y)‖.

Using x̃ := Q^(1/2)x, ỹ := Q^(1/2)y and z̃ := Q^(1/2)z for the invertible matrix Q^(1/2), we get

D̃(x, y) = ‖x̃ − ỹ‖ ≤ ‖x̃ − z̃‖ + ‖z̃ − ỹ‖ = D̃(x, z) + D̃(z, y).


Since Dz is basically a quadratic form, it is infinitely differentiable, and thus it is a distance function for the specified point z.
To prove that the maximum Euclidean distance of points with Dz(x) = 1 is exactly δ, we use that the matrix Q is symmetric and therefore diagonalizable in the form

Q = PᵀDP

with a diagonal matrix D consisting of the eigenvalues λᵢ of Q and an orthogonal matrix P. Without loss of generality, we can assume z = 0. Then it follows for the quadratic form that

q(x) := xᵀQx = xᵀPᵀDPx = (Px)ᵀD(Px).

We set y := Px as an orthogonal transformation of x. It can easily be seen that the c-level curves of q̃(y) := yᵀDy with q̃(y) = c are ellipsoids. The level curve equation is

λ₁y₁² + · · · + λₙyₙ² = c.

Setting rᵢ := √(c/λᵢ), we obtain

y₁²/r₁² + · · · + yₙ²/rₙ² = 1,

the basic coordinate form for ellipsoids with radii rᵢ in the coordinate directions. Now introduce a parameter γ > 0 and the function

D₀(x) := γ xᵀQx.

The 1-level curve of the quadratic function D₀(x) is an ellipsoid with radii pᵢ = √(1/(γλᵢ)). To limit the maximum distance between the center of the ellipsoid and the points on the level curve, take

‖p‖∞ = maxᵢ |pᵢ| = maxᵢ pᵢ = √(1/(γλ*))

with the smallest eigenvalue λ*, since pᵢ gets larger as λᵢ gets smaller. With γ = 1/(δ²λ*) we have ‖p‖∞ = δ, and D₀(x) is just the ellipsoidal distance function we were looking for, with z = 0.

Given a positive definite matrix Q and defining δ as above thus yields an ellipsoid with controlled maximal expansion. If Q is diagonal, the resulting ellipsoid is parallel to the coordinate axes, called isotropic. If Q is the identity matrix I, we get Dz(x) = (1/δ²)‖x − z‖₂², which describes a sphere with radius δ. Note that the gradient of the distance function is given by

∇Dz(x) = (2/(δ²λ*)) Q(x − z)    (6.2.8)

after differentiating the quadratic form. Defining a unit sphere by Dz(x) = 1 yields an ellipsoidally shaped set with maximal Euclidean distance δ > 0 to the center. Given a certain point x, increasing δ produces a lower value of Dz(x) while simultaneously decreasing the length of the gradient. So, increasing δ moves the current point closer to the center, and


Figure 6.2.: The left figure shows two distance functions in ℝ. The solid line belongs to d₁(x) = |x − z| and the dashed line to d₂(x) = |x − z|². In the intervals (0, 1) and (−1, 0), it holds that d₂(x) < d₁(x). We can see that the distance functions actually differ, although the boundary {−1, 1} is the same for both. This means that pole functions will work differently depending on the choice of the distance function, even if the boundary of the defined region is the same. This can also be seen in the right figure, where the dashed line belongs to the pole function P(x) = 1/d₂(x) at points with d₂(x) < 1.

the strict monotonicity of pole functions thus implies that the value of an associated pole function would also increase.
We need a flexible distance function that is easy to evaluate; this is given by (6.2.6), and its gradient by (6.2.8). If a point x lies outside of a defined ellipsoid, Dz(x) takes values larger than 1; inside the ellipsoid, it continuously takes values between 0 and 1, which is a desired property of tunneling distance functions. So, a general ellipsoidal pole region is given by the open set

S := {x ∈ ℝⁿ : Dz(x) < 1}.    (6.2.9)
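The EDF and its gradient are easy to sketch numerically; the following snippet (names ours) samples the level set Dz(x) = 1 and confirms property (6.2.7), i.e. that the largest Euclidean distance to the center equals δ.

```python
import numpy as np

def edf(x, z, Q, delta):
    """Ellipsoidal distance function D_z(x), eq. (6.2.6)."""
    lam_min = np.linalg.eigvalsh(Q)[0]          # smallest eigenvalue (ascending order)
    return float((x - z) @ Q @ (x - z)) / (delta ** 2 * lam_min)

def edf_grad(x, z, Q, delta):
    """Gradient of D_z, eq. (6.2.8)."""
    lam_min = np.linalg.eigvalsh(Q)[0]
    return 2.0 * (Q @ (x - z)) / (delta ** 2 * lam_min)

z = np.zeros(2)
Q = np.array([[2.0, 0.0], [0.0, 8.0]])          # symmetric positive definite
delta = 0.5

# Sample the level set D_z(x) = 1: scaling a unit direction v by 1/sqrt(D_z(v))
# lands on it, since D_z is a quadratic form. By eq. (6.2.7), the largest
# Euclidean distance to the center on this set is delta.
dists = []
for t in np.linspace(0.0, 2.0 * np.pi, 2000):
    v = np.array([np.cos(t), np.sin(t)])
    dists.append(np.linalg.norm(v) / np.sqrt(edf(v, z, Q, delta)))
print(max(dists))
```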

Now we need to form some pole functions with an outer function ρ : (0, 1) → ℝ, as in (6.2.2).

6.2.2. Pole Functions

Before we suggest a new type of pole function, we investigate the ones already given in the literature. The earlier work [LM85] suggested the following choice of distance and pole functions, which in our notation becomes, basically,

dz(x) := ‖x − z‖₂²,    (6.2.10)
S := {x ∈ ℝⁿ : dz(x) < 1},    (6.2.11)
P(x) := { dz(x)^(−µ),  x ∈ S \ {z};  1,  x ∉ S. }    (6.2.12)


Figure 6.3.: The left figure shows how the ramp function acts on the classical tunneling function, the right one on the exponential tunneling function, both with µ₀ = 1. For the plot, a switching parameter ε = 0.05 was chosen. Both functions are non-differentiable at points with d(x) = 1 + ε or d(x) = 1 − ε.

The resulting tunneling function is known as the classical one, since it was the first suggested in the literature. Here, a parameter µ > 0 is introduced as an exponent of the distance in order to control the removability of the singularity in the resulting tunneling function. In the associated work, this parameter is chosen large enough for the tunneling function to indeed have a pole at z.
Later work [BG91, CG00] suggested a different choice of pole functions:

dz(x) := ‖x − z‖₂²,    (6.2.13)
S := {x ∈ ℝⁿ : dz(x) < 1},    (6.2.14)
P(x) := { exp(µ/dz(x)),  x ∈ S \ {z};  1,  x ∉ S. }    (6.2.15)

The resulting tunneling function is consequently called the exponential tunneling function; it also uses a parameter µ to control the characteristics of the pole function.
Continuity and differentiability of the pole functions, and hence of the tunneling functions, have to be discussed. In the case of the classical function (6.2.12), continuity is automatic, since dz(x)^(−µ) = 1 at points x with dz(x) = 1, so the function continuously changes to the identity for dz(x) ≥ 1. In [LM85], a smoothing ramp function on the parameter µ is applied in a small region around dz(x) = 1; it switches µ, as a function of dz(x), continuously from its actual value to zero. It has the form

µ = µ(x) = { µ₀,  dz(x) ≤ 1 − ε;
             0,  dz(x) ≥ 1 + ε;
             (µ₀/2)(1 + (1 − dz(x))/ε),  else, }


Figure 6.4.: The left figure shows the two functions tan((π/2)x)^µ with µ = 1 (solid line) and µ = 2 (dashed line). The right figure shows the same functions with −µ as exponent; both are pole functions for the interval (0, 1) with a pole at 0, tending to zero as x tends to 1.

for a small ε > 0 and the desired parameter µ₀. The notation with a pole region S from above is then not valid, since the ramp function switches to the constant 1 implicitly and an explicit case distinction is not needed; one simply takes P(x) = d(x)^(−µ(x)). The function (6.2.15) is not even continuous at points with d(x) = 1; in [CG00], continuity is achieved by using the same ramp function on the parameter µ. In both (6.2.12) and (6.2.15), applying the ramp function still leaves points of non-differentiability.
We now suggest a pole function that yields continuity and a variable order of differentiability in a natural way, without the use of ramp functions, and we define it on a more general region than the unit sphere. We define the new pole function as

P(x) := P(µ)(x) := { tan( (π/2) Dz(x) )^(−µ) + 1,  x ∈ S \ {z};  1,  x ∉ S,    (6.2.16)

with the ellipsoidal distance function (EDF) from (6.2.6) and the pole region (6.2.9). We show that, depending on the choice of µ, (6.2.16) reaches different orders of differentiability, P(µ) ∈ C^s(µ)(ℝⁿ), where s(µ) is a step function.

Lemma 6.5. Given the pole function (6.2.16) and a parameter µ, it holds that:

(i) µ > 0 ⇒ P(µ)(x) ∈ C⁰(ℝⁿ),

(ii) µ > 1 ⇒ P(µ)(x) ∈ C¹(ℝⁿ),

(iii) µ > 2 ⇒ P(µ)(x) ∈ C²(ℝⁿ).

This refers in particular to points with Dz(x) = 1.


Proof. The distance function is by definition at least in C²(ℝⁿ), so it remains to show that the function

ρ(d) = tan(d)^(−µ) + 1

for a scalar d yields those three properties. We have to show that lim_{d→π/2} ρ(d) = 1, lim_{d→π/2} ρ′(d) = 0 and lim_{d→π/2} ρ″(d) = 0, since these are the properties of the identity. We let d tend to π/2 and investigate the effect on ρ(d):

lim_{d→π/2} ρ(d) = 1 + lim_{d→π/2} tan(d)^(−µ)
               = 1 + lim_{d→π/2} sin(d)^(−µ)/cos(d)^(−µ)
               = 1 + lim_{d→π/2} cos(d)^µ/sin(d)^µ
               = 1 + 0^µ/1^µ    (µ > 0)
               = 1.

Thus, ρ(d) can be continuously merged with the identity at d = π/2 if µ > 0. Forming the first derivative gives

ρ′(d) = −µ tan(d)^(−µ−1) (1 + tan(d)²).

Now for the limit of the first derivative as d → π/2, it holds that:

lim_{d→π/2} ρ′(d) = −µ lim_{d→π/2} (1 + tan(d)²)/tan(d)^(µ+1)
                = −µ lim_{d→π/2} ( 1/tan(d)^(µ+1) + tan(d)^(1−µ) )
                = 0,

since 1/tan(d)^(µ+1) → 0 and, for µ > 1, also tan(d)^(1−µ) → 0.

This means that for µ > 1, ρ has slope 0 at d = π/2, thus smoothing P(x) at points with Dz(x) = 1. We still have to check the second derivative:

ρ″(d) = µ(µ + 1) tan(d)^(−µ−2) (1 + tan(d)²)² − 2µ tan(d)^(−µ) (1 + tan(d)²).

Forming the limit, in a shorter notation replacing tan(d) by tan, gives:

lim_{d→π/2} ρ″(d) = lim_{d→π/2} ( µ(µ+1)(1 + tan²)²/tan^(µ+2) − 2µ(1 + tan²)/tan^µ )
                = lim_{d→π/2} ( µ(µ+1)(tan⁴ + 2 tan² + 1)/tan^(µ+2) − 2µ(1 + tan²)/tan^µ )
                = lim_{d→π/2} ( µ(µ+1)( 1/tan^(µ+2) + 2/tan^µ + 1/tan^(µ−2) ) − 2µ( 1/tan^µ + 1/tan^(µ−2) ) )
                = 0,

since each of these terms tends to zero provided µ > 2.


Figure 6.5.: The function ln( cos(d)/sin(d) ) on the interval (0, π/2). It is symmetric about the point (π/4, 0); for points with d < π/4, the logarithm takes positive values.

The terms 1/tan(d)^(µ−2) tend to zero for d → π/2 if µ − 2 > 0, so the resulting pole function that uses ρ(d) with d := (π/2)Dz(x) is twice continuously differentiable at points with Dz(x) = 1.

Higher derivatives have not been investigated, but this is not necessary, because the underlying objective function f is only supposed to be in C²(ℝⁿ).
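A one-dimensional sketch of (6.2.16), assuming the spherical special case Dz(x) = (x − z)²/δ² (function names are ours): near the boundary Dz(x) = 1, the function merges smoothly with the constant 1, while it blows up towards the center.

```python
import numpy as np

def pole(x, z=0.0, delta=1.0, mu=2.5):
    """Tan-based pole function (6.2.16) in 1-D, with the spherical EDF
    D_z(x) = (x - z)^2 / delta^2 (the simple case Q = I)."""
    D = (x - z) ** 2 / delta ** 2
    if D >= 1.0:
        return 1.0                                # neutral outside the pole region
    return np.tan(0.5 * np.pi * D) ** (-mu) + 1.0

# Near the boundary D_z(x) = 1 the function merges with the constant 1 ...
print(pole(1.0 - 1e-4), pole(1.0 + 1e-4))
# ... with vanishing slope (since mu > 1), while it blows up towards the center:
slope = (pole(1.0 - 1e-4) - pole(1.0 - 2e-4)) / 1e-4
print(slope, pole(1e-3))
```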

6.2.3. Handling Removability by Choosing µ

We have introduced the concept of pole functions, and it is clear that a pole function multiplied with an arbitrary function can yield a removable singularity; then the pole is actually none, and the tunneling function will hardly have a positive effect. We analyze the removability of the singularity of pole functions and derive a simple heuristic to choose the parameter µ in such a way that we practically obtain a non-removable singularity.
To check that the construction from the preceding section is indeed a pole function, we have to investigate the removability of the singularity at distance 0. We refer to the results from complex analysis in [FB07].

Lemma 6.6. The function ρ : (0, 1) → ℝ₊ given by

ρ(d) = tan(d)^(−µ) + 1

has a pole of order k = ⌈µ⌉ at d = 0 for µ > 0.

Proof. If µ ≤ 0, we have no pole at all since limd→0 ρ(d) = 1, thus we have to assume µ > 0.


It holds, by de l'Hospital's rule applied to the quotient d/tan(d)^µ, that

lim_{d→0} d ρ(d) = lim_{d→0} ( d/tan(d)^µ + d ) = lim_{d→0} 1/( µ tan(d)^(µ−1)(1 + tan(d)²) ) = (1/µ) lim_{d→0} tan(d)^(1−µ),

which is finite if (1 − µ) ≥ 0, since the factor (1 + tan(d)²) tends to 1. So for µ ≤ 1, the function d ρ(d) is bounded near 0, and ρ has a pole of order 1 = ⌈µ⌉ at d = 0.
For general µ > 0, we have to show for which k ∈ ℕ the function ρ̃(d) := d^k ρ(d) has a removable singularity at d = 0. For this, ρ̃(d) has to be bounded in a neighbourhood of 0. Form the limit

lim_{d→0} ρ̃(d) = lim_{d→0} d^k ( tan(d)^(−µ) + 1 ) = lim_{d→0} d^k/tan(d)^µ + lim_{d→0} d^k = lim_{d→0} d^k/tan(d)^µ.

Both numerator and denominator tend to zero as d tends to zero. Again, we apply de l'Hospital's rule:

lim_{d→0} ρ̃(d) = lim_{d→0} k d^(k−1) / ( µ tan(d)^(µ−1)(1 + tan(d)²) ) = (k/µ) lim_{d→0} d^(k−1)/tan(d)^(µ−1).

Applying this rule m times yields

lim_{d→0} ρ̃(d) = c_m lim_{d→0} d^(k−m)/tan(d)^(µ−m)

with a constant c_m. It holds that lim_{d→0} d^(k−m) < ∞ if k ≥ m, and for the denominator, lim_{d→0} tan(d)^(µ−m) = ∞ if µ < m. Then, if k ≥ m > µ, we have lim_{d→0} ρ̃(d) = 0 < ∞, with m ∈ ℤ. The smallest possible k ∈ ℤ for which ρ̃(d) is bounded is therefore given by k := m = ⌈µ⌉.


We have seen that this function creates a pole of a certain order that depends on the choice of µ. Multiplying this pole function with an arbitrary continuous function might result in a tunneling function with a removable pole when µ is too small. When such a function is used as a pole function in a tunneling function, the removability depends on the term (f(x) − f(z)) for the local minimum z: as the distance d of a point to z tends to zero, the univariate pole function ρ tends to infinity, but the term (f(x) − f(z)) tends to zero.
Encouraged by Riemann's Removable Singularity Theorem [FB07], we can expect a removable singularity if the function is bounded in a region around the singularity. If, on the other hand, the function can take arbitrarily large values, it is unbounded and the singularity cannot be removable. The heuristic is thus to choose µ in such a way that the tunneling function takes a specified large value close to the singularity. We proceed as illustrated by the following example.
Given a local minimum z of f, a tunneling function has the form

T(x; µ) = (f(x) − f(z)) P(µ)(x)

with a pole function P(µ)(x) that depends on the choice of the parameter µ; the distance function Dz(x) with a fixed δ is used. Generate a point x with Dz(x) = ε for a small 0 < ε ≪ 1, and choose C > 0 as the maximum of all function values of (f(x) − f(z)) that are known so far. Now demand

T(x; µ) = C

and calculate the µ that satisfies this equation. This leads to

C = (f(x) − f(z)) P(µ)(x)    (6.2.17)

⇔ C = (f(x) − f(z)) ( tan( (π/2) Dz(x) )^(−µ) + 1 )    (6.2.18)

⇔ µ = ln( C/(f(x) − f(z)) ) / ln( cos((π/2)ε)/sin((π/2)ε) )  if f(x) > f(z).    (6.2.19)

The numerator is always positive, and the denominator is positive if Dz(x) = ε < 1/2. Still, ε has to be chosen considerably smaller, since a denominator close to zero would produce a very large µ.
Determining the parameter µ in this way needs one evaluation of the function f at a point x that has distance ε to the current pole. It is convenient to use this point as the initial point of the next tunneling steps; in this way, no function evaluation is lost, while we ensure that we start from the highest known level of the shifted function (f(x) − f(z)).
It is also possible to replace the term (f(x) − f(z)) by its quadratic approximation

f(x) − f(z) ≈ Gz(x) := (1/2)(x − z)ᵀH(x − z),    (6.2.20)

with the Hessian H of f at z.

Assume that we use the elliptic distance function with a parameter δ > 0. Then, for points with Dz(x) = ε, it holds that

(x − z)ᵀH(x − z) = ε λ* δ².    (6.2.21)

This means that the quadratic approximation takes the value

Gz(x) = (1/2) ε λ* δ²    (6.2.22)


at points with elliptic distance ε and matrix Q = H. Thus, we can approximate µ by

µ ≈ ln( 2C/(ε λ* δ²) ) / ln( cos((π/2)ε)/sin((π/2)ε) ).    (6.2.23)

We will call this type of EDF shape-identified, since it uses the same curvature information as the function f; we discuss this in Section 6.2.6. Generating start values for a tunneling algorithm is treated in Section 6.4.1.
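The heuristic (6.2.19) can be sketched as follows (function and variable names are ours); by construction, the tunneling value at the probe point with Dz(x) = ε is restored to roughly C:

```python
import math

def choose_mu(C, f_shift, eps):
    """mu from eq. (6.2.19); C is the largest known value of f(x) - f(z),
    f_shift the value of f(x) - f(z) at the probe point with D_z(x) = eps."""
    num = math.log(C / f_shift)
    den = math.log(math.cos(0.5 * math.pi * eps) / math.sin(0.5 * math.pi * eps))
    return num / den

C, f_shift, eps = 100.0, 1e-3, 0.01
mu = choose_mu(C, f_shift, eps)
# At the probe point, the pole function blows the shifted value back up to ~C:
recovered = f_shift * (math.tan(0.5 * math.pi * eps) ** (-mu) + 1.0)
print(mu, recovered)
```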

6.2.4. Empirical Analysis of the Tunneling Concept

The concept of tunneling exploits some properties of the local minimization method used. If a line-search is performed, tunneling benefits from the line-search not being exact: in this way, there might be points within the pole region of a tunneling function that lead the local method to a new local minimum by jumping over any artificial minima that might also be there. For our studies, we used the MATLAB routine fmincon, configured by

options = optimset('LargeScale', 'off');
options.RelLineSrchBndDuration = 2;
options.RelLineSrchBnd = 0.1;
options.DiffMaxChange = 0.0001;
options.DiffMinChange = 1e-8;
options.TolFun = 1e-5;
z = fmincon(@fun, x, ..., lb, ub, ..., options);

with lower and upper bounds on x, while setting the LargeScale flag to 'off', which causes MATLAB to use a line-search SQP method. This empirical analysis consists of numerical experiments with test functions using this minimization method. Different methods might lead to slightly different results, but we use the experiments to support our heuristic strategies.

To get an idea of how tunneling actually works, we carry out some studies on test functions. In the first case, we define a function f : ℝ → ℝ as

f(x) = sin(x) + 1 for x ∈ [−π, 4π],    (6.2.24)

see Fig. 6.6.The function f has local minima at −π

2 ,3π2 and 7π

2 which are global as well with functionvalue 0. We will assume that the middle one of the local minima z = 3π

2 has been foundby a local minimization algorithm as chosen above. Then the question arises, what kind ofeffect the presence of a pole in z has on the further process of finding all minima of f . Theanswer depends on f , the choice of the local minimization method, the global approach,the type of the pole function and (in one dimension) the radius of the pole region. Thechoice of the local minimization method shall be fixed, because in the following empiricalinvestigation, we like to analyze the mechanism of the tunneling idea itself, not regardingfunction evaluation counts. We define a very simple algorithm to find all local minimaof f . There is no stopping criterion in this algorithm, but we do not need it for ourinvestigation. In each loop of the algorithm, we can interpret the deterministic result ofthe local minimization as a function of the random variable x ∼ U(a, b)

Mf : [a, b]→ Z, (6.2.25)


Figure 6.6.: The left figure shows the function f(x) = sin(x) + 1, with the regions of attraction of the three local minima marked by colored bars. The blue bars mark the regions whose points lead to z₂ = −π/2, the green regions lead to z₁ = 3π/2, and the red ones lead to z₃ = 7π/2. The black bars in the right figure mark the regions that lead to the artificial minima created by the pole function. Here, the parameters (δ, µ) = (2, 3) are used.

Algorithm 6.2 Local Search with Random Starts in [a, b]

(i) Draw x ∼ U(a, b) from a uniform distribution by some random number generator.

(ii) Start a local minimization of f with initial guess x.

(iii) Once the minimization finishes with a local minimum z of f , return to (i).
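Algorithm 6.2 can be sketched as follows; since fmincon is not at hand here, a plain projected gradient descent serves as a stand-in local method (all names are ours), applied to the test function (6.2.24):

```python
import math, random

f  = lambda x: math.sin(x) + 1.0       # test function (6.2.24)
df = lambda x: math.cos(x)
a, b = -math.pi, 4.0 * math.pi

def local_min(x, step=0.1, iters=2000):
    """Stand-in local method: projected gradient descent on [a, b]."""
    for _ in range(iters):
        x = min(max(x - step * df(x), a), b)
    return x

def random_start_search(loops, seed=0):
    """Algorithm 6.2: repeated local minimization from uniform random starts."""
    rng = random.Random(seed)
    found = set()
    for _ in range(loops):
        found.add(round(local_min(rng.uniform(a, b)), 4))
    return sorted(found)

minima = random_start_search(loops=60)
print(minima)          # should collect -pi/2, 3*pi/2 and 7*pi/2
```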


where Z = {z1, . . . , zm} is the set of all local minima of f in [a, b], and the subscript f denotes that Mf is the result of the minimization method for the function f. We therefore assume that for every point in the interval [a, b], there exists a limit point of the local minimization method.

Definition 6.2.4. The inverse images of the zi,

mi := Mf⁻¹(zi) = {x ∈ [a, b] : Mf(x) = zi},    (6.2.26)

are the sets of those starting points in [a, b] from which the local minimization method converges to the local minimum zi. We will call them regions of attraction for zi and a given local minimization method.

If we assume that those sets are Borel-measurable, it holds that

|mi| = (b − a) P(Mf(x) = zi) with x ∼ U(a, b),    (6.2.27)

and

∑_{i=1}^{m} |mi| = b − a.    (6.2.28)

The function Mf only takes discrete values from Z, and by measuring the inverse images of these values we get the probability of the local minimization method finishing with that value when starting from a uniformly distributed random value.
We have to approximate these probabilities. A convenient way to do this is by applying a Quasi-Monte-Carlo scheme. We generate a set of r equidistant points

a < x1 < · · · < xr < b

and calculate Mf(xi) for i ∈ Ir := {1, . . . , r}. For each local minimum zj, we count the results in a set Nj := {i ∈ Ir : Mf(xi) = zj}. We approximate

|mj| ≈ ((b − a)/r) #Nj,  j = 1, . . . , m.    (6.2.29)

Here, #Nj denotes the number of indices in Ir that lead to zj; dividing this number by r approximates the corresponding ratio of the interval. The larger r, the better the approximation becomes, which follows from the central limit theorem.
If a single local minimum z1 of f has already been found, then we are interested in the probability of finding a local minimum in the next step that differs from z1. This is consequently approximated by

P(Mf(x) ≠ z1) = 1 − P(Mf(x) = z1) = (1/(b − a)) ∑_{j=2}^{m} |mj| ≈ (1/r) ∑_{j=2}^{m} #Nj =: pf,  with x ∼ U(a, b).    (6.2.30)

This is a quite general concept for analyzing the regions of attraction for a particular function using a fixed minimization method. It can also be applied when a pole exists.
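The Quasi-Monte-Carlo estimate (6.2.29)/(6.2.30) can be sketched for the test function (6.2.24) (all names are ours). Note that a strictly monotone descent method assigns each start to the basin bounded by the adjacent maxima, so the estimate below may differ from the value obtained with the fmincon line-search used in the thesis:

```python
import math

f  = lambda x: math.sin(x) + 1.0       # test function (6.2.24) on [-pi, 4pi]
df = lambda x: math.cos(x)
a, b = -math.pi, 4.0 * math.pi

def local_min(x, step=0.1, iters=800):
    """Stand-in local method: projected gradient descent on [a, b]."""
    for _ in range(iters):
        x = min(max(x - step * df(x), a), b)
    return round(x, 4)

r = 1500
# midpoint grid, so that no start coincides with a local maximum of f
starts = [a + (i - 0.5) * (b - a) / r for i in range(1, r + 1)]
counts = {}
for x in starts:
    z = local_min(x)
    counts[z] = counts.get(z, 0) + 1

z1 = round(3.0 * math.pi / 2.0, 4)     # the minimum assumed already found
p_f = sum(n for z, n in counts.items() if z != z1) / r
print(len(counts), p_f)
```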


Again, we will assume that the first local minimum of f is found at z1. Using the EDF Dz(x)from (6.2.6) and the pole function P(x) from (6.2.16) with a given µ > 1, the tunnelingfunction

T (x) = (f(x)− f(z1))P(x)

destroys the minimum of f in z_1. In the one-dimensional case, the simple EDF for Q = I = 1 simplifies to D_z(x) = (1/δ²)(x − z)² for a given δ > 0. The tunneling function only has m − 1 of the original local minima of f left, but may exhibit artificial local minima as shown in Lemma 6.3. Thus, the domain and codomain of M_f change to

M_T : [a, b] \ {z_1} → (Z \ {z_1}) ∪ Z_A =: Z_T

with Z_A the set of all artificial minima of T. Since z_1 is not in the codomain of M_T, the minimization method cannot converge to it. Now we are interested in the approximation of the probability that the next step results in a new local minimum,

p_T ≈ P(M_T(x) ∉ Z_A), (6.2.31)

that is not an artificial one. We define again

m_i := M_T^{-1}(z_i), i = 2, . . . , m,

then the probability to find a new local minimum of f by minimizing T from a random start value is

(b − a) P(M_T(x) ∉ Z_A) = Σ_{i=2}^{m} |m_i|,

which can be approximated in the same manner as for the unmodified objective function f; we denote the approximation by p_T. Tunneling indeed works in this case if p_T > p_f, or equivalently, if |M_T^{-1}(Z_A)| < |M_f^{-1}(z_1)|.
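As a rough illustration, the following sketch assembles a one-dimensional tunneling function from the simple EDF above and an assumed pole of the form P(x) = D_z(x)^{-µ} inside the pole region (and 1 outside). The actual pole function (6.2.16) is defined earlier in the chapter, so this form is only a stand-in.

```python
import math

def make_tunneling_function(f, z1, delta, mu):
    """Illustrative 1D tunneling function T(x) = (f(x) - f(z1)) * P(x).

    D_z is the simple 1D EDF D_z(x) = (x - z1)^2 / delta^2; the pole used here,
    D_z^(-mu) inside the pole region and 1 outside, is a stand-in for (6.2.16)."""
    fz1 = f(z1)

    def T(x):
        D = (x - z1) ** 2 / delta ** 2
        P = D ** (-mu) if 0.0 < D < 1.0 else 1.0
        return (f(x) - fz1) * P

    return T

# Destroy the minimum of sin at z1 = 3*pi/2: for mu > 1, T grows without bound
# near z1 and coincides with f(x) - f(z1) outside the pole region.
z1 = 3 * math.pi / 2
T = make_tunneling_function(math.sin, z1, delta=2.0, mu=1.2)
```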

In the following, we compare p_f and p_T for f and show that p_T strongly depends on δ. Recall the test function (6.2.24). We choose a local minimization method and discretize the interval [−π, 4π] equidistantly with r = 1500 points

x_i = −π + (i/r) · 5π, i ∈ I_r := {1, . . . , r}.

We have

Z = {−π/2 (=: z_2), 3π/2 (=: z_1), 7π/2 (=: z_3)}.

For each of the x_i, the solution of the minimization method is calculated, while the sets N_j := {i ∈ I_r : M_f(x_i) = z_j} for j = 2, 3 are filled sequentially. For the particular choice of the minimization method and with the given resolution r, p_f is approximated by

pf = 0.667555,

which is quite close to 2/3, as can be expected, since f is chosen to be symmetric at z_1. The regions of attraction of z_1, z_2 and z_3 have approximately the same measure, 1/3. This result can be interpreted as the special case of a pole function with δ = 0. The following table


6.2. A General Class of Tunneling Algorithms


Figure 6.7.: On the left-hand side, the figure shows the probability of finding a new local minimum by choosing the next random start point for µ = 0.5. The right figure is produced with µ = 1.2. The range for δ is in both cases 0.1 ≤ δ ≤ 4 and was sampled with a step of 0.05 for the plot.


Figure 6.8.: On the left-hand side, the figure shows the probability p_T for µ = 2. The right figure is produced with µ = 3. The range for δ is in both cases 0.1 ≤ δ ≤ 4.


is calculated by approximating p_T for different choices of δ and of the pole-function exponent µ, the regularity parameter.

         δ = 0    δ = 0.5  δ = 1    δ = 1.5  δ = 2    δ = 2.5  δ = 3
µ = 1.2  0.6676   0.6729   0.6942   0.7995   0.8534   0.9354   0.9900
µ = 2.0  0.6676   0.6875   0.7342   0.7981   0.8574   0.8954   0.9674
µ = 3.0  0.6676   0.6969   0.7542   0.8228   0.8601   0.8847   0.9221

Fig. 6.7 and 6.8 show the dependency of p_T on δ over a wider range and for µ = 0.5, 1.2, 2, 3. In Fig. 6.7, the parameter µ is chosen small, which can have a disadvantageous effect. Although the pole function has a singularity for any µ > 0, it is not guaranteed that the resulting tunneling function has one as well. This results from the fact that in

T(x) = (f(x) − f(z_1)) · P(x)

the first factor tends to 0 and the second to ∞ as x → z_1, so the limit depends on the strength of the pole function. If µ is small, the pole function might not be capable of creating a singularity in z_1, and even more: the region in which T is monotone with respect to the distance function can also be small and does not necessarily grow as the radius of the pole region increases. This explains why the probability p_T is not monotonically increasing in the radius δ.

A further phenomenon on the right-hand side of Fig. 6.8 can be explained by the difficulty of evaluating a pole function close to the singularity. As δ increases, the scaled distances of all other points decrease; that is, D_z(x) tends to zero as δ tends to infinity. Since a pole function produces very high values for distances close to zero, these values might be too high to be computed. This causes the local optimization method to fail when it starts close to z_1, and explains why p_T starts to decrease slightly for δ > 3.

For this function, T does not exhibit any artificial local minima if π < δ < (3/2)π. Consequently, p_T must be close to 1. The reason why the approximation does not give exactly 1 is that the local minimization might fail if the initial guess is close to a local maximum of the function, where T′ is small. Actually, there are gaps within the domain of M_T that have not been taken into account yet, so our assumption that M_T can be evaluated anywhere in the interval is not fully satisfied. But their measures appear to be practically very small, and since analytically the gaps have to be isolated points of measure 0, they can be neglected. If µ is large enough, the probability of finding a different local minimum of f in the next random minimization step increases up to 1 as δ increases to the size of the valley that z_1 lies in. For this example, this trend is almost linear for µ ≥ 2.

This way of empirically analyzing the tunneling idea can easily be extended to a higher dimension n. This immediately increases the effort to r^n local minimizations for a single pair (δ, µ) of parameters; the resolution parameter r cannot be taken too small if the approximation of p_T is to remain good enough. Let the dimension be n = 2, and let a test function be given by

f(x) = Σ_{i=1}^{2} sin(x^{(i)}), x = (x^{(1)}, x^{(2)})^T ∈ D ⊂ R², (6.2.32)

with the domain D = [−(7/2)π, (5/2)π] × [−(7/2)π, (5/2)π]. This function has 9 local minima within D.

Now assume that z = (−π/2, −π/2)^T, the local minimum in the middle of D, is found. Using a similar approach as in the one-dimensional case, we are interested in the probability of finding a


local minimum of f that differs from z by starting randomly in D. When no pole is present at z, we can guess that this probability has to be about 8/9, since D is a square, z is its center, and f is periodic and therefore symmetric along the axes through z. We choose a resolution parameter r = 100 to discretize D in each dimension, so there are r² = 10000 points to be used as initial guesses for the local minimization of f or of a tunneling function T, respectively. Again, different parameter tuples (δ, µ) for the same type of pole function are used to analyze the tunneling concept for this function; δ = 0 means using the pure objective function f. We approximate p_T, the chance of ending in a local minimum of f that differs from z, as:

         δ = 0    δ = 0.5  δ = 1    δ = 1.5  δ = 2    δ = 2.5  δ = 3    δ = 3.5
µ = 0.5  0.8872   0.8876   0.8872   0.8910   0.9056   0.9268   0.9468   0.9496
µ = 1.0  0.8872   0.8905   0.9062   0.9188   0.9504   0.9760   0.9863   0.9908
µ = 1.5  0.8872   0.8864   0.8878   0.8966   0.9125   0.9243   0.9783   0.9955
µ = 2.0  0.8872   0.8864   0.8892   0.8984   0.9185   0.9379   0.9695   0.9975
µ = 2.5  0.8872   0.8872   0.8911   0.9017   0.9187   0.9383   0.9627   0.9983
µ = 3.0  0.8872   0.8872   0.8919   0.9044   0.9200   0.9383   0.9591   0.9979
µ = 3.5  0.8872   0.8872   0.8919   0.9064   0.9211   0.9407   0.9571   0.9979

In this two-dimensional case, the size of the pole region plays an even more important role. For small δ, only a high resolution r helps to distinguish the results for p_T from those for p_f; otherwise, very few or even none of the discretization points lie within the part of the pole region that belongs to the region of attraction of a different minimum. Choosing δ too small practically means working without any pole: the local minimization method is then likely to finish in an artificial minimum with nearly the same probability with which the minimization of f would finish in z.

For plotting, we choose interpolation instead of higher resolutions, since the effort to estimate p_T depends quadratically on the resolution. Fig. 6.9 illustrates the dependencies of p_T on the parameters µ and δ. On the other hand, higher exponents µ do not automatically lead to higher probabilities. Generally speaking, higher values of µ make the pole function smoother in points x with D_z(x) = 1 and flatter in an area around those, which can cause larger regions of attraction for a local minimization method. In practical use, a tunneling function in C¹(R^n) can be enough for successful tunneling, which is already achieved by choosing µ > 1, as shown in Lemma 6.5. In Fig. 6.7 and 6.8 of the one-dimensional case, the choice µ = 1.2 partially leads to higher probabilities of finding a new local minimum than higher values of µ, especially when δ approaches π.

For the one-dimensional test function, the best strategy is to choose µ slightly larger than 1 to ensure differentiability, and δ such that the valley of f surrounding z is eliminated in T. For π < δ < 2π, no artificial minima are left, while no other local minimum lies within the pole region.

In the following section, heuristic strategies are shown that help fit the pole region to the shape of the valley around a local minimum.

6.2.5. An Adaptive Shaping Strategy

In the following, we illustrate a strategy to adaptively stretch the region that is affected by a pole function. Stretching is the choice since there can be an indication for the pole



Figure 6.9.: The three curves of the left figure are interpolated from the sampled data of the two-dimensional example. The solid line shows the dependency of p_T on µ for δ = 1. The dotted line stands for δ = 2 and the dashed line for δ = 3. Independently of the choice of δ, a value of µ close to 1 is preferable. The right figure shows the interpolated dependency of p_T on δ for µ = 1 (solid) and µ = 2 (dashed).

region being too small, at least along a certain direction. On the other hand, there is no direct indication for a pole region being too large, so we would never know when to perform a shrinking operation.

The shape and size of a pole region shall be used to reduce the probability that the algorithm returns to this region. The best shape of a pole region would be the one that removes all attraction for any local search algorithm. In general, the perfect shape cannot be known a priori, but we try to develop several strategies to better adapt the pole region to the underlying function. One could use local information such as the curvature of the function to generate a suitable pole region each time a pole is created. A different strategy is to apply a modification to the pole region only when necessary.

We first discuss the approach of adaptive modification. In the following, we assume that the pole region is centered around z = 0. If the algorithm yields a point x with D_z(x) < 1, then we know that this has to be an artificial minimum of T(x). The idea is to modify T in such a way that x is no longer a stationary point. Knowing the position of x gives a direction p = (x − z) = x, see Fig. 6.10. If we like to stretch the pole region along this direction, we can do so by the following concept.

Lemma 6.7. Given the set M_S := {x ∈ R^n : x^T x = 1}, the surface of the unit sphere, applying a linear transformation T := PC with a positive diagonal scaling matrix

C = diag(c_1, . . . , c_n), c_i > 0 for all i = 1, . . . , n,

and an orthonormal transformation P yields

y := PCx ∈ M_E := {y ∈ R^n : y^T A y = 1}


Figure 6.10.: This figure shows the unit sphere in R³ and a point p = (0.5, 0.1, 0.3)^T. The solid line is the main direction p of the transformed coordinate system, while the dashed lines are orthogonal axes.

Figure 6.11.: The sphere from Fig. 6.10 is diagonally scaled and transformed to the new coordinate axes. Now the point p, marked with the triangle, is relatively closer to the origin than it is in terms of the spherical distance.


for x ∈ M_S and

A := (T^{-1})^T T^{-1} = P C^{-2} P^T.

Proof. T = PC is regular since P and C are regular, so we can write x = T^{-1}y. Since x ∈ M_S, we have

1 = x^T x = (T^{-1}y)^T (T^{-1}y) = y^T (T^{-1})^T T^{-1} y = y^T A y.

To show that this is an ellipsoid, we have to show that A is symmetric and positive definite. Symmetry follows immediately from its definition. P is orthogonal and C is diagonal, so we have

T^{-1} = C^{-1} P^T.

Then it follows that

A = (T^{-1})^T (T^{-1}) = (C^{-1}P^T)^T C^{-1}P^T = P C^{-1} C^{-1} P^T = P C^{-2} P^T with C^{-2} = diag(c_1^{-2}, . . . , c_n^{-2}).

The matrix C^{-2} has only positive elements, which are the eigenvalues of A; therefore A is positive definite.
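Lemma 6.7 can be checked numerically. The following sketch (with an arbitrary, illustrative choice of P and C) verifies that the image of a unit-sphere point lands on the ellipsoid defined by A, and that A is symmetric positive definite with eigenvalues c_i^{-2}:

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.linalg.qr(rng.standard_normal((3, 3)))[0]   # random orthogonal transformation
C = np.diag([2.0, 0.5, 1.5])                       # positive diagonal scaling
T = P @ C
Tinv = np.linalg.inv(T)
A = Tinv.T @ Tinv                                  # A = (T^{-1})^T T^{-1} = P C^{-2} P^T

x = rng.standard_normal(3)
x /= np.linalg.norm(x)                             # x on the unit sphere M_S
y = T @ x                                          # transformed point
# y satisfies y^T A y = 1, i.e. y lies on the ellipsoid M_E.
```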

The idea is to transform the unit sphere along a given direction into an ellipsoid that is parallel to the transformed coordinate axes. We can easily derive a concrete algorithm from that result. But before we do that, let us recall Gram-Schmidt's method of orthonormalization, with which we would like to generate an orthonormal basis of R^n containing a fixed vector p. Algorithm 6.3 is easy to implement, but rounding errors might destroy orthogonality. Alternatively, Givens rotations [DB95] can be used to perform a QR-decomposition. In each case we need a set of linearly independent vectors including p, forming the columns of a matrix A; we can fill up the matrix to be decomposed with suitable elements of the canonical basis of R^n. By decomposing A = QR with an orthogonal matrix Q and an upper triangular matrix R, we get the transformation matrix as P = Q, which transforms the coordinate axes such that the first main direction is given by p.

Let z be the center of the associated pole region S := {x ∈ R^n : (x − z)^T Q(x − z) < 1}. The k-th step of the shaping algorithm has the form of Algorithm 6.4. The diagonal scaling is always applied in the first main direction of the transformed coordinate system, since this one is given by p_k. When we apply this algorithm to the matrix Q in (6.2.6), we have to keep in mind that the function D_z(x) automatically limits the maximal Euclidean distance from the center to a given δ > 0. In the proof of Lemma 6.7, we have seen that the matrices Q given by Algorithm 6.4 are diagonalizable with

Q = P_k C^{-2} P_k^T with D := C^{-2}. (6.2.33)


Algorithm 6.3 Generating an orthonormal basis with Gram-Schmidt's method

(i) Given a vector p ≠ 0, set p⊥_0 = p/‖p‖.

(ii) Step k: find a unit vector e_i, i ∈ {1, . . . , n}, with ⟨p⊥_j, e_i⟩ ≠ 0 for all j = 0, . . . , k − 1. This is the case when the column rank of

P = (p⊥_0 p⊥_1 . . . p⊥_{k−1} e_i)

is k + 1.

(iii) Set

p⊥_k := e_i − Σ_{j=0}^{k−1} (⟨e_i, p⊥_j⟩ / ⟨p⊥_j, p⊥_j⟩) p⊥_j.

(iv) Normalize p⊥_k ← p⊥_k/‖p⊥_k‖ and set k ← k + 1. If k ≥ n, stop; else return to step (ii).
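A minimal Python version of Algorithm 6.3, using the canonical basis as candidate vectors as described above, might look as follows:

```python
import numpy as np

def orthonormal_basis(p):
    """Orthonormal basis of R^n whose first vector is p/||p|| (Algorithm 6.3)."""
    p = np.asarray(p, dtype=float)
    basis = [p / np.linalg.norm(p)]
    n = p.size
    for e in np.eye(n):                           # candidate canonical vectors e_i
        v = e - sum((e @ q) * q for q in basis)   # subtract projections (step iii)
        if np.linalg.norm(v) > 1e-12:             # e_i independent of current span
            basis.append(v / np.linalg.norm(v))   # normalize (step iv)
        if len(basis) == n:
            break
    return np.column_stack(basis)

P = orthonormal_basis([0.5, 0.1, 0.3])            # direction p from Fig. 6.10
```

The returned matrix is orthogonal with its first column proportional to p, which is exactly what step (ii) of Algorithm 6.4 below requires.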

Algorithm 6.4 Adaptive pole shaping

(i) Given a point x_k ∈ S, calculate p_k = (1/k) Σ_{i=1}^{k} (x_i − z), the average direction from z.

(ii) Use Gram-Schmidt's method (Algorithm 6.3) or Givens rotations, respectively, to get n orthonormal vectors including p_k/‖p_k‖ in the first column, thus yielding an orthogonal n×n matrix

P_k = (p_k/‖p_k‖ p⊥_{(1)} . . . p⊥_{(n−1)}).

(iii) Build a diagonal scaling matrix C by replacing the first element of the identity I_n by a scaling parameter σ > 1:

C := diag(σ, 1, . . . , 1).

(iv) Set T := P_k C.

(v) Reset Q ← (T^{-1})^T (T^{-1}). Continue with tunneling, and if a point x_{k+1} ∈ S is reached, return to step (i).
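A compact sketch of one pass of this scheme, using a QR decomposition for step (ii) instead of explicit Gram-Schmidt (both options are mentioned above):

```python
import numpy as np

def shaping_step(points, z, sigma=2.0):
    """One pass of Algorithm 6.4: stretch the pole region by sigma along the
    average direction p_k of the points x_1, ..., x_k found inside S."""
    p = np.mean([np.asarray(x, float) - z for x in points], axis=0)  # step (i)
    n = p.size
    M = np.column_stack([p, *np.eye(n)])      # p plus canonical basis vectors
    Pk = np.linalg.qr(M)[0]                   # step (ii): orthogonal, Pk[:,0] ∝ p
    C = np.diag([sigma] + [1.0] * (n - 1))    # step (iii)
    Tinv = np.linalg.inv(Pk @ C)              # step (iv): T = Pk C
    return Tinv.T @ Tinv                      # step (v): new shape matrix Q

Q = shaping_step([[0.5, 0.1, 0.3]], np.zeros(3), sigma=2.0)
# Q has eigenvalues sigma^{-2} = 0.25 (along p) and 1 elsewhere, cf. (6.2.33).
```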


Now the matrix D contains the eigenvalues of Q, and those are given by

D = C^{-2} = diag(σ^{-2}, 1, . . . , 1).

Since σ > 1, the smallest eigenvalue of Q is 1/σ². Then, the distance function becomes

D_z(x) = (σ/δ)² (x − z)^T Q (x − z), (6.2.34)

which is our desired ellipsoid if δ = σ. The algorithm starts with a spherical pole region of maximal expansion δ, which is achieved by the identity matrix Q = I_n; only if a point in S is reached, Q is modified, and δ has to be reset to σ only once. In Section 6.2.7, we will have a closer look at how to control the effective volume of the area that is affected by the sphere or the ellipsoid, respectively. Stretching a pole region along a certain direction p_k has the following effect. Let P_I(p_k) be the value of the pole function with a spherical distance function at p_k, and P_Q(p_k) the value of the pole function for the elliptic distance function. Then

P_Q(p_k) > P_I(p_k),

because the corresponding distance functions D_I and D_Q satisfy

D_I(p_k) > D_Q(p_k),

which means that p_k is closer to the center when using the derived elliptic distance function. The gradient ∇P_Q(p_k) also becomes steeper, which makes p_k more unattractive to local minimization methods.

6.2.6. Shape-Identification

A quite straightforward way to fit the shape of a pole region to the shape of the region of attraction of the underlying function is to use curvature information of the function. If z ∈ R^n is a strict local minimum of f, we have ∇f(z) = 0 and H_f(z) positive definite. Without loss of generality, we can assume that f(z) = 0. The Hessian is always symmetric; thus, the quadratic Taylor approximation of f in z,

q(z + x) = (1/2) x^T H_f(z) x + ∇f(z) x + f(z) = (1/2) x^T H_f(z) x, (6.2.35)

is just an elliptic quadratic form. The level curves of q are ellipsoids in R^n. So we could use Q = H_f for the elliptic distance function from Lemma 6.2.6, but exact Hessian information is often not available. Still, we can use any approximation of the Hessian of f, for example the BFGS updates for the Hessian itself along with the approximation of its inverse, as can be done in quasi-Newton or SQP methods. A pole function that uses a distance function defined by the approximation of the Hessian in its center z shall be called a shape-identified pole. A comparison between standard spherical poles and shape-identified poles in the sense of the empirical analysis is given in Fig. 6.13.
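As an illustration, the following sketch builds the shape matrix Q from a central finite-difference Hessian — a stand-in for the BFGS updates mentioned above — at a minimum of the anisotropic test function from Fig. 6.12:

```python
import numpy as np

def hessian_fd(f, z, h=1e-5):
    """Central finite-difference Hessian of f at z (a stand-in for BFGS updates)."""
    n = z.size
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            def fs(si, sj):
                x = z.copy()
                x[i] += si * h
                x[j] += sj * h
                return f(x)
            H[i, j] = (fs(1, 1) - fs(1, -1) - fs(-1, 1) + fs(-1, -1)) / (4 * h * h)
    return H

# Shape matrix Q of a shape-identified pole at the minimum z = (-pi/2, -pi/2)
# of f(x) = (1/2)(sin x1 + 4 sin x2) from Fig. 6.12:
f = lambda x: 0.5 * (np.sin(x[0]) + 4.0 * np.sin(x[1]))
z = np.array([-np.pi / 2, -np.pi / 2])
Q = hessian_fd(f, z)        # approximately diag(1/2, 2)
```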



Figure 6.12.: The contour curves of f(x_1, x_2) = (1/2)(sin x_1 + 4 sin x_2) are shown here. The function has 9 global minima in the square [−(7/2)π, (5/2)π] × [−(7/2)π, (5/2)π]. In each minimum, the Hessian of f is H_f = diag(1/2, 2). This function is used as the test function for the comparison of circle-shaped and elliptic pole functions. We assume z = (−(1/2)π, −(1/2)π)^T is the first local minimum found for our comparisons.


Figure 6.13.: These three plots compare circle-shaped poles (dotted lines) with elliptic shape-identified poles (solid lines). From left to right, we have µ = 1.2, 2, 3. Since ellipses and circles differ in area for the same parameter δ, as shown in Section 6.2.7, the resulting probabilities for the test function (6.2.32) are plotted over the general volume, which in two dimensions is the area affected by the pole. For small areas, the results hardly differ, but in the middle section, ellipses are superior to circles. On the other hand, as ellipses get larger, the probabilities increase only slightly and spheres perform better.



Figure 6.14.: The figure shows the contour curves of Himmelblau's function (6.2.36). It has 4 global minima in R². The one at z = (3, 2)^T is supposed to be found for our comparisons.


Figure 6.15.: The dotted lines represent the probabilities for spherically shaped poles and the solid lines for ellipsoidally shaped poles. The function used is (6.2.36) with a pole in z = (3, 2)^T and µ = 1.2 for the left figure and µ = 2 for the right figure.


In this case, the two-dimensional test function shown in Fig. 6.12 is used. As a further test, we use the function of Himmelblau [Him72]:

f(x) = (x_1² + x_2 − 11)² + (x_1 + x_2² − 7)², x = (x_1, x_2)^T ∈ R². (6.2.36)

This function has 4 global minima, as shown in Fig. 6.14. One of them is z = (3, 2)^T, in which we create circle-shaped and elliptic poles to compare their effect for the same covered area. The results can be seen in Fig. 6.15 and show that here, elliptic poles perform better than circle-shaped ones only for small covered areas. The results strongly depend on the shape of the region of attraction of z. Obviously, shape-identified poles do not fit the regions of attraction of function (6.2.36) very well, because these regions are large and flat, see Fig. 6.14.

Generally speaking, by using ellipsoidal pole regions we try to fit the affected area better to the region of attraction of z in order to eliminate it. The choice is justifiable, since ellipsoids generalize spheres. However, there is no guarantee that shape-identified poles lead to higher probabilities of success in global optimization. In the end, the shape of the pole region should be chosen such that it best fits the shape of the region of attraction. But if this region is, for example, rectangular, or cuboid in n dimensions, it might be harder for an ellipsoidal area to cover it than for a spherical one. Since a sphere expands identically in every direction, it is also more likely to cover further local minima or even global minima of the underlying function. Although covering a global minimum does not mean that it can no longer be found, it should be prevented, because it makes the covered minimum at least harder to find. Recalling the adaptive pole-shaping strategy, the shape is only modified along a certain direction and only when necessary. In this way, there is a chance that ellipsoidal poles do not cover further local minima of f.
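A quick sanity check on (6.2.36): all four global minima of Himmelblau's function attain the value 0. The rounded minimizer coordinates below are the standard published values, not taken from this text.

```python
# Sanity check on Himmelblau's function (6.2.36): all four global minima have value 0.
def himmelblau(x1, x2):
    return (x1**2 + x2 - 11)**2 + (x1 + x2**2 - 7)**2

# Standard (rounded) minimizers of Himmelblau's function:
minima = [(3.0, 2.0), (-2.805118, 3.131312),
          (-3.779310, -3.283186), (3.584428, -1.848126)]
values = [himmelblau(a, b) for a, b in minima]
```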

6.2.7. n-D Volume Control of Pole Regions

We discussed the effect of poles in one and two dimensions. The elliptic distance function used before is parameterized by δ > 0 to limit the maximal expansion. Indeed, the overall volume of the affected area plays an important role. In one dimension, the pole region is always an interval, and its volume (length) is equivalent to the radius δ. For circle-shaped areas in two dimensions, the affected area has volume (area) πδ², or πp_1p_2 for ellipsoidal areas. In general, the radii of the n-dimensional ellipsoid that is defined by a matrix Q ∈ R^{n×n} via the function from Lemma 6.2.6 are given by

p_i = δ √(λ*/λ_i), i = 1, . . . , n, (6.2.37)

where λ_i is the i-th eigenvalue of Q and λ* := min_i λ_i is their minimum. The volume of an n-dimensional unit sphere is given in [Bro91] by the formula

V_u := π^{n/2} / Γ(n/2 + 1), (6.2.38)

where Γ is the well-known Gamma function

Γ(x) = ∫_0^∞ z^{x−1} exp(−z) dz, x > 0.



Figure 6.16.: The left figure shows the dependency of the absolute volume of the unit sphere on the dimension. In the case of the scaled problem, this is also the relative volume. On the right-hand side, we can see the necessary radius for a sphere of a given dimension to have volume 0.05.

The volume of a sphere with radius δ is

V_s := V_s(δ) := δ^n V_u. (6.2.39)

We see from Fig. 6.16 that the ratio of the unit-sphere volume to the whole volume vanishes with increasing dimension. Therefore, we have to take care of the parameter δ and determine it depending on the dimension. The general n-dimensional ellipsoid has the volume

V_e := V_e(δ) := (Π_{i=1}^{n} p_i) V_u = δ^n (Π_{i=1}^{n} √(λ*/λ_i)) V_u = δ^n γ V_u with γ := (λ*)^{n/2} (Π_{i=1}^{n} λ_i)^{−1/2}. (6.2.40)

It holds that

γ ≤ 1 and V_e(δ) ≤ V_s(δ). (6.2.41)

Since λ* is the minimum of the eigenvalues of Q, the factor γ that distinguishes the volumes of spheres and ellipsoids cannot be larger than 1. This means that for the same parameter δ, the volume of a sphere is always at least as large as the volume of an arbitrary ellipsoid.

For a given δ_1, we would like to generate an ellipsoid (shaped by Q) with V_e(δ) = V_s(δ_1). This is done by choosing

δ = δ_1 / γ^{1/n}. (6.2.42)


A volume control strategy is then to apply Algorithm 6.4, starting with a sphere (Q = I) of radius δ = δ_1. Each time the matrix Q is changed, recalculate γ and reset δ by (6.2.42). This keeps the volume of the ellipsoid in touch with δ_1. The reset of δ by the stretching parameter σ, as suggested in Section 6.2.5, always refers to the radius δ_1 of the initial sphere. To generate an initial radius δ_1 for the sphere, we recall the scaled problem (6.1.5). Its domain is Ω = [0, 1]^n for dimension n, with volume 1 for every n; the absolute volume of a subdomain is then automatically also its ratio of the total volume. So we can define a volume control in terms of a ratio 0 < ν ≪ 1. Demanding that the volume of the sphere take the ratio ν of the domain's volume means

V_s = ν,
δ_1^n V_u = ν,
δ_1 = (ν/V_u)^{1/n}. (6.2.43)

So there is only the need to choose a ratio ν. The parameter δ for a Q-shaped ellipsoid of the same relative volume is then given by

δ = (ν/(γ V_u))^{1/n}. (6.2.44)
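The formulas (6.2.38), (6.2.43) and (6.2.44) translate directly into code; a small sketch:

```python
import math

def unit_sphere_volume(n):
    """V_u = pi^(n/2) / Gamma(n/2 + 1), Eq. (6.2.38)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

def initial_radius(n, nu):
    """Sphere radius delta_1 with relative volume nu on [0, 1]^n, Eq. (6.2.43)."""
    return (nu / unit_sphere_volume(n)) ** (1.0 / n)

def ellipsoid_delta(n, nu, eigenvalues):
    """delta for a Q-shaped ellipsoid of the same relative volume, Eq. (6.2.44)."""
    lam_min = min(eigenvalues)
    prod = 1.0
    for lam in eigenvalues:
        prod *= lam
    gamma = lam_min ** (n / 2) / math.sqrt(prod)    # gamma from (6.2.40)
    return (nu / (gamma * unit_sphere_volume(n))) ** (1.0 / n)
```

For example, in n = 2 with ν = 0.05 and Q-eigenvalues (1, 4), we get γ = 1/2, and the stretched δ keeps the covered area at exactly 0.05.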

However, this is rather a theoretical result than a practical one. It says that we have to increase the maximal radius of an ellipsoid in such a way that it covers a volume of fixed relative size of the whole domain. As the dimension gets high, we would have to choose large radii, which will cause the pole region to cross the boundaries and possibly overlap other or unknown local and global minima of the function. In order to avoid that, we have to limit the maximal radius by taking the minimal known distance of two disjoint local minima into account; on the other hand, there might be a natural limit on the radius, which can be seen as a parameter of the applied algorithm. When we benchmark the tunneling concepts in Chapter 7, we try to circumvent this negative property by avoiding random starts in higher dimensions.

6.3. Stochastic Performance Analysis

When constructing heuristic strategies that shall improve the performance of a global optimization method, it is important to have a way to measure this performance in order to compare the different strategies. We suggest a stochastic benchmark to characterize the effect of different algorithm parameters. The benchmark has to be performed using a certain global optimization method on a well-defined test problem.

To analyze the performance of a global optimization method, we recall that these methods usually use random numbers to generate points or directions. Assume that we are given a global optimization problem of type (6.1.2) with the value of r known. Using an arbitrary global optimization method to solve the problem until all r global minima are found, and recording the total number of function evaluations needed to do so, can be interpreted as the realization of a random experiment. The number of function evaluations follows a discrete probability distribution. The type of distribution is a priori unknown and might be very hard to determine, since it depends on the algorithmic details of the global optimizer. But it is a stochastic global optimization method, and therefore there is always a chance


for the algorithm still to take longer than it already did. Theoretically, solving (6.1.2) can take arbitrarily long, which means that the possible values of the random experiment have no upper bound. On the other hand, there is a lower bound: at least r function evaluations are needed to find r distinct global minima, but this seems to be a very weak lower bound in practice.

A multi-start experiment can be described by a discrete multinomial distribution, because it can be interpreted as the experiment of drawing r different initial values that lead to r different global minima from the whole set of local minima. But we do not know the probabilities for each global minimum to be found from a certain initial guess, and the number of function evaluations needed for each try is not necessarily the same.

So we would like to model the distribution of the random experiment. Since the discrete random variable can take a large variety of values, it can also be approximated by a continuous probability distribution. We use the Weibull distribution [Wei39], which is usually used to model the fatigue of materials and the breakdown of technical devices [Wei51], such as electric bulbs, or to model the amount of loss in risk theory [Hei88]. In the comparison with electric bulbs, the light goes out when the last global minimum has been found; the time is measured by the number of function evaluations. In terms of the amount of loss in risk theory, the number of function evaluations can be interpreted as the total cost of the procedure, which is quite close to what it really is.

In the literature, there are different definitions of the density of the Weibull distribution. Any algorithm will need a minimum number of function evaluations that is larger than zero; therefore, we choose the shifted version given by [Dod06]. The shifted Weibull distribution is given by its density function

f_Wei(x; α, β, δ) = β α^{−β} (x − δ)^{β−1} exp[−((x − δ)/α)^β] (6.3.1)

for random variables x taking values larger than δ. Here, α is called the scaling parameter, β the shape parameter, and δ the location parameter. In the MATLAB Statistics Toolbox, the same formula is used with δ = 0, but this is not a problem for our use, since the data set can be replaced by its shifted version, which has the same effect on the fitted parameters α and β. Given the Gamma function (6.2.7), the Weibull distribution has the following properties, taken from [Dod06]. The expectation of a Weibull-distributed random variable is given by

E = δ + α Γ(1 + 1/β). (6.3.2)

Since the density function is shifted by δ, the expectation contains this parameter as an additive constant. The variance is given by

V = α² [Γ(1 + 2/β) − (Γ(1 + 1/β))²], (6.3.3)

and it is independent of the location parameter δ. Finally, the skewness is given by

S = α³ [Γ(1 + 3/β) + 2 (Γ(1 + 1/β))³ − 3 Γ(1 + 2/β) Γ(1 + 1/β)], (6.3.4)

which also does not depend on δ. The skewness characterizes the symmetry of the density function with respect to the expectation. A symmetric distribution like the normal distribution has skewness 0, while positive values indicate that values smaller than the mean are more likely to be observed than larger values. The larger S is, the more very high values of x can be observed; it therefore characterizes the tail of a distribution. Here, for a global optimization method, smaller values are desirable, since they reduce the probability of the method taking very long. A smaller value shows that the method is more reliable. Investigating this property can be part of constructing global optimization methods with higher reliability.
Now define a Weibull-distributed random variable

X_O \sim \mathrm{Wei}(\alpha, \beta, \delta) \qquad (6.3.5)

as the number of function evaluations needed to solve (6.1.2). In the next section, estimators for the parameters are derived by the maximum likelihood method [HEK02].
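The moment formulas (6.3.2)-(6.3.4) are straightforward to evaluate numerically. A small sketch (in Python rather than the MATLAB used later in this chapter; the function name is ours):

```python
from math import gamma

def weibull_moments(alpha, beta, delta=0.0):
    """Expectation, squared deviation and third-moment term of a shifted
    Weibull distribution, following (6.3.2)-(6.3.4)."""
    g1 = gamma(1.0 + 1.0 / beta)
    g2 = gamma(1.0 + 2.0 / beta)
    g3 = gamma(1.0 + 3.0 / beta)
    E = delta + alpha * g1                       # expectation, (6.3.2)
    V2 = alpha**2 * (g2 - g1**2)                 # (6.3.3)
    S = alpha**3 * (g3 + 2.0 * g1**3 - 3.0 * g2 * g1)  # (6.3.4)
    return E, V2, S
```

For β = 1 the Weibull distribution reduces to the shifted exponential distribution, so E = δ + α and V² = α², which gives a quick sanity check of the formulas.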

6.3.1. Maximum Likelihood Estimation of the Model Parameters

We are given a set of k realizations of a random variable X_O from (6.3.5),

D_x = \{x_1, \ldots, x_k\}. \qquad (6.3.6)

In [Sab95] it is suggested to use the maximum likelihood method for the 2-parameter Weibull distribution (with δ = 0) instead of the 3-parameter one, by numerically solving the set of maximum likelihood equations. The location parameter is then approximated by a value slightly below the observed minimum of D_x,

\delta := \min D_x - \varepsilon, \qquad 0 < \varepsilon \ll \min D_x. \qquad (6.3.7)

Then, the data set is shifted by

\tilde{D}_x := D_x - \delta = \{x_1 - \delta, \ldots, x_k - \delta\} =: \{\tilde{x}_1, \ldots, \tilde{x}_k\}; \qquad (6.3.8)

now all elements of D̃_x are larger than 0 and can be seen as realizations of a random variable X̃ := X_O − δ with

\tilde{X} \sim \mathrm{Wei}(\alpha, \beta, 0). \qquad (6.3.9)

The Weibull distribution can therefore be fitted using only the two parameters α and β. The idea of the maximum likelihood method is to find the parameters α and β such that the parameterized distribution has the biggest chance to generate the observed data set. The logarithmic likelihood function can usually be computed more easily; it is given by

L(\alpha, \beta) = \ln\left( \prod_{i=1}^{k} f_{\mathrm{Wei}}(\tilde{x}_i; \alpha, \beta, 0) \right) = \sum_{i=1}^{k} \ln\left[ \beta \, \alpha^{-\beta} \, \tilde{x}_i^{\beta-1} \exp\left( -\left( \frac{\tilde{x}_i}{\alpha} \right)^{\beta} \right) \right]. \qquad (6.3.10)

We need to find the maximum of L with respect to the parameters. A necessary condition for such a point is

∇L(α, β) = 0, (6.3.11)


which is a system of two nonlinear equations. Using Maple and Mathematica for verification, the analytic partial derivatives of the logarithmic likelihood function are

\frac{\partial L}{\partial \alpha}(\alpha, \beta) = \frac{\beta}{\alpha} \sum_{i=1}^{k} \left[ \left( \frac{\tilde{x}_i}{\alpha} \right)^{\beta} - 1 \right], \qquad (6.3.12)

\frac{\partial L}{\partial \beta}(\alpha, \beta) = k \left( \frac{1}{\beta} - \ln(\alpha) \right) + \sum_{i=1}^{k} \left[ \ln(\tilde{x}_i) - \left( \frac{\tilde{x}_i}{\alpha} \right)^{\beta} \ln\left( \frac{\tilde{x}_i}{\alpha} \right) \right]. \qquad (6.3.13)

We solve the nonlinear system (6.3.11) by applying Newton's method, given by the iteration

\begin{pmatrix} \alpha_{j+1} \\ \beta_{j+1} \end{pmatrix} = \begin{pmatrix} \alpha_j \\ \beta_j \end{pmatrix} - H_L^{-1}(\alpha_j, \beta_j) \, \nabla L(\alpha_j, \beta_j)

\Leftrightarrow \quad H_L(\alpha_j, \beta_j) \left[ \begin{pmatrix} \alpha_j \\ \beta_j \end{pmatrix} - \begin{pmatrix} \alpha_{j+1} \\ \beta_{j+1} \end{pmatrix} \right] = \nabla L(\alpha_j, \beta_j), \qquad (6.3.14)

where H_L, the Jacobian of the system (6.3.11) (i.e. the Hessian of L), is given by

H_{L,1,1} = \frac{\partial^2 L}{\partial \alpha^2} = -\frac{\beta}{\alpha^2} \sum_{i=1}^{k} \left[ (1 + \beta) \left( \frac{\tilde{x}_i}{\alpha} \right)^{\beta} - 1 \right], \qquad (6.3.15)

H_{L,1,2} = H_{L,2,1} = \frac{\partial^2 L}{\partial \alpha \, \partial \beta} = \frac{1}{\alpha} \sum_{i=1}^{k} \left[ \left( \beta \ln\left( \frac{\tilde{x}_i}{\alpha} \right) + 1 \right) \left( \frac{\tilde{x}_i}{\alpha} \right)^{\beta} - 1 \right], \qquad (6.3.16)

H_{L,2,2} = \frac{\partial^2 L}{\partial \beta^2} = -\frac{1}{\beta^2} \sum_{i=1}^{k} \left[ 1 + \beta^2 \left( \frac{\tilde{x}_i}{\alpha} \right)^{\beta} \ln^2\left( \frac{\tilde{x}_i}{\alpha} \right) \right]. \qquad (6.3.17)

For a good initial guess, (α_j, β_j)^T converges quadratically to (α, β). In each step, the implicit iteration in (6.3.14) requires the solution of a linear system of dimension 2. A Newton algorithm for the calculation of a maximum likelihood estimator for the Weibull distribution, implemented in MATLAB, is given by:

function [param] = MLE_Weibull(X, alpha, beta)
k = length(X);            % the size of the sample
maxiter = 100;            % maximum number of Newton iterations
p = zeros(2,maxiter+1);   % allocate iterate history
p(:,1) = [alpha; beta];   % initial guess
tol = 1e-8;               % break tolerance on F(alpha, beta)
for j=1:maxiter
    [F, J] = MLS(X, p(1,j), p(2,j));
    dp = J\F;
    p(:,j+1) = p(:,j) - dp;
    if (norm(dp) < tol) || (norm(F) < tol), break, end;
end
param = p(:,j+1);
end

function [Fout, Jout] = MLS(X, alpha, beta)
persistent k Ftmp1 Ftmp2 Jtmp11 Jtmp12 Jtmp22;
if isempty(k)
    k = length(X);
    Ftmp1 = zeros(1,k);
    Ftmp2 = zeros(1,k);
    Jtmp11 = zeros(1,k);
    Jtmp12 = zeros(1,k);
    Jtmp22 = zeros(1,k);
end
b2 = beta^2;   % recomputed in every call since beta changes
for i=1:k
    xia = X(i)/alpha;
    xiab = xia^beta;
    lxia = log(xia);
    Ftmp1(i) = xiab - 1;
    Ftmp2(i) = log(X(i)) - xiab * lxia;
    Jtmp11(i) = (beta+1) * xiab - 1;
    Jtmp12(i) = -1 + xiab * (beta * lxia + 1);
    Jtmp22(i) = 1 + b2 * xiab * lxia^2;
end
Fout = zeros(2,1);
Fout(1) = beta/alpha * sum(Ftmp1);
Fout(2) = k*(1/beta - log(alpha)) + sum(Ftmp2);
if nargout > 1
    Jout = zeros(2,2);
    Jout(1,1) = -beta/(alpha^2) * sum(Jtmp11);
    Jout(1,2) = 1/alpha * sum(Jtmp12);
    Jout(2,1) = Jout(1,2);
    Jout(2,2) = -1/(beta^2) * sum(Jtmp22);
end
end

This function can be called similarly to the function wblfit from the Statistics Toolbox. We use the backslash operator in MATLAB, dp = J\F, for the solution of (6.3.14) to produce the next step; this applies a direct solution method for linear systems.
Here, we have to give an initial guess for α and β. The scaling parameter of a Weibull distribution is also called the characteristic life, which is in general not the same as the mean life, but usually quite close. So we can use the mean of the sample as an initial guess for α,

\alpha_0 := \frac{1}{k} \sum_{i=1}^{k} \tilde{x}_i. \qquad (6.3.18)

We expect well-constructed global optimization methods to produce a right-skewed distribution with S > 0. From [Dod06] we know that the zero of the skewness function is at β ≈ 3.6 for any value of α. If β < 3.6, we have S > 0, and for β > 3.6 it holds that S < 0. We can expect that 1 < β < 3.6, since β = 1 yields the exponential distribution as a special case of the Weibull distribution. So we simply choose

β0 := 2 (6.3.19)

as an initial guess for β, and the MATLAB® routine can then be called by:

p = MLE_Weibull(x, mean(x), 2);
alpha = p(1);
beta = p(2);

Given δ, the expectation, standard deviation and skewness depend only on these two parameters and can now be easily computed.

6.3.2. Benchmark Functions

For the comparison of different global optimization methods, or of different parameterizations of the same method, by the stochastic analysis described above, we need a set of well-defined test problems. For each problem, we have to know the number and the position or the level of the global minima, so that we know exactly when to stop the global optimization method. We choose some from [LM85], where 16 examples are given, from [Sch81] and from [Dix78].

(i) Ω := {(x_1, x_2)^T ∈ ℝ² : −6 ≤ x_i ≤ 6, i = 1, 2}:

f(x_1, x_2) = \left( \sum_{i=1}^{5} i \cos((i+1)x_1 + i) \right) \left( \sum_{i=1}^{5} i \cos((i+1)x_2 + i) \right)

with 8 global minima in Ω.

(ii) Ω := {(x_1, x_2)^T ∈ ℝ² : −6 ≤ x_i ≤ 6, i = 1, 2}:

f(x_1, x_2) = \left( \sum_{i=1}^{5} i \cos((i+1)x_1 + i) \right) \left( \sum_{i=1}^{5} i \cos((i+1)x_2 + i) \right) + \frac{1}{2} \left( (x_1 + 1.42513)^2 + (x_2 + 0.80032)^2 \right)

with only 1 global minimum left in Ω compared with function (i).

(iii) Ω := {(x_1, x_2)^T ∈ ℝ² : −3 ≤ x_1 ≤ 3, −2 ≤ x_2 ≤ 2}:

f(x_1, x_2) = \left( 4 - 2.1 x_1^2 + \frac{1}{3} x_1^4 \right) x_1^2 + x_1 x_2 + \left( -4 + 4 x_2^2 \right) x_2^2

with 6 local minima, 2 of which are also global. This function is also known as the six-hump camel back function and is suggested in [Dix78].

(iv) Ω := {(x_1, x_2, x_3)^T ∈ ℝ³ : 0 ≤ x_i ≤ 500, i = 1, 2, 3}:

f(x_1, x_2, x_3) = \sum_{i=1}^{3} -x_i \sin\left( \sqrt{x_i} \right),

which is based on a function known as Schwefel's function 7, see [Sch81]. It has multiple local minima and a single global minimum at x_i ≈ 420.9687 for i = 1, 2, 3. It is chosen as a special case for dimension n = 3, and the search space is limited to nonnegative values so that the function is defined everywhere. In its original version, it uses |x_i| under the root, and the search space can then be extended to reach from −500 to 500. But this version is not differentiable along the coordinate axes, so we restrict ourselves to the positive half spaces.

(v) Ω := {(x_1, \ldots, x_5)^T ∈ ℝ⁵ : −5 ≤ x_i ≤ 5, i = 1, \ldots, 5}:

f(x_1, \ldots, x_5) = 0.1 \sin^2(3\pi x_1) + 0.1 \sum_{i=1}^{4} (x_i - 1)^2 \left( 1 + \sin^2(3\pi x_{i+1}) \right) + 0.1 (x_5 - 1)^2 \left( 1 + \sin^2(3\pi x_5) \right)

with only one global minimum, at x_i = 1 for i = 1, \ldots, 5.

These functions have several local minima; some of them have multiple global minima. When trying to solve the problems with stochastic global methods, the time for the solution might vary significantly. In the following example, we show that a straightforward solution of one of the problems yields a sample of function evaluation counts with a sample standard deviation of the same magnitude as the mean. So it seems to be very hard to make a statement about the performance of a particular method with a small sample size. We will therefore suggest a large sample size when constructing stochastic global optimization methods.
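As an illustration, benchmark function (iii), the six-hump camel back function, is easy to code and to check against its known global minimum value of roughly −1.0316; a Python sketch (the minimizer coordinates (±0.0898, ∓0.7126) are the commonly cited values, not taken from this thesis):

```python
def six_hump_camel(x1, x2):
    """Benchmark function (iii): six-hump camel back function."""
    return ((4.0 - 2.1 * x1**2 + x1**4 / 3.0) * x1**2
            + x1 * x2
            + (-4.0 + 4.0 * x2**2) * x2**2)
```

Note that f(−x_1, −x_2) = f(x_1, x_2), which is why the two global minima come as a symmetric pair.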

6.3.3. Pseudo-Random Numbers

In order to perform a stochastic performance analysis, a reliable source of random numbers is needed. We need to simulate a uniform distribution for the calculation of points or directions.
We use the Mersenne-Twister algorithm from [MT98], which is implemented as the standard method in MATLAB's routine rand since revision 7.4. For example, generating a pseudo-random number from U(0, 1) can be done by:

p = rand;

But we have to take care that, each time we start a new benchmark, a different seed for the pseudo-random number generator is chosen; otherwise we could get the same sequence of numbers. The algorithm can be configured by using:

rand(’twister ’, seed);

where the variable seed has to be chosen anew each time a benchmark is going to be performed. This can be done by linking it to the system time, as MATLAB's documentation suggests.

seed = 100 * sum(clock );

Here, clock gives an array containing year, month, day, hour, etc., which automatically takes different values each time it is called. Summing up and scaling gives a large scalar that can be used as a seed for the pseudo-random number generator.
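The same reseeding idea carries over to other environments. In Python, for instance, one would seed a dedicated generator per benchmark run (a sketch, not part of the thesis code; the function name is ours):

```python
import random
import time

def new_benchmark_rng(seed=None):
    """Return a freshly seeded generator; by default the seed is derived
    from the system clock, mirroring MATLAB's 100*sum(clock) trick."""
    if seed is None:
        seed = time.time_ns()  # changes on every call
    return random.Random(seed)

# identical seeds reproduce an experiment exactly,
# different seeds give independent pseudo-random sequences
```

Resetting to a fixed seed makes a run reproducible, while a clock-derived seed makes successive benchmark runs independent, exactly as discussed above.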


[Figure: left panel "Finding 8 global minima", relative frequency vs. function evaluation count; right panel, F(x) vs. x.]

Figure 6.17.: The left figure shows the relative histogram plot of the example problem from Section 6.3.4. The red line represents the density of the Weibull distribution that uses the maximum likelihood estimators α = 3952, β = 1.7. On the right-hand side, we can see the cumulative empirical distribution function plot (dots) and the cumulative distribution function of the fitted Weibull distribution (line).

6.3.4. Example and fitting of the Weibull parameters

We illustrate the described method with the following example. Recall the simple random start algorithm (6.2). We take the benchmark function (i) from Section 6.3.2, which exhibits 8 global minima in its domain, and solve it with the random start algorithm, stopping when all 8 global minima have been found. We count the total number of function evaluations until the stop and store the value in an array. This is repeated until a sample of size k = 1805 is generated.
The way that pseudo-random numbers are treated plays an important role in this stochastic analysis. A priori, it is not known how many random numbers are needed within one realization of the experiment. The whole experiment can be reproduced by resetting the seed of the random number generator at the beginning. If one were to reset the seed to the same value before every realization of a single experiment, the method would be deterministic and finish with the same result each time. The performance of the method depends strongly on the choice of the seed, since it determines a sequence of pseudo-random numbers. To make any conclusions, it is important to have each experiment independent of the previously performed ones, which is only the case if the seed of the generator is reset to a different value each time. In Section 6.3.3, it is described how to do this.
We get the sample (x_1, \ldots, x_{1805}) and generate the shifted sample x̃_i := x_i − min_i x_i + ε with ε = 0.01 for i = 1, \ldots, 1805.
First, the sample can be analyzed by looking at the histogram plot. For further comparisons with a continuous density function, it is useful to use a relative histogram. We have to choose a number of containers n_b with width b, divide the interval [ε, max_i x̃_i] into n_b disjoint intervals and count the numbers η_i of elements that lie within each interval. We


[Figure: two panels, sample mean vs. sample size.]

Figure 6.18.: The example problem is solved twice, each time with 1805 runs. Both experiments use different seeds for the random number generator. For i = 1, \ldots, 1805, the sample means are given by (1/i) \sum_{j=1}^{i} x_j. Using a sample size smaller than 1000 would produce estimators for the expectation between 3800 and 4300, which is a very rough approximation of the unknown expectation. Any short-term drifts or trends are coincidences, since the experiments are independent.

have to scale these numbers by

\tilde{\eta}_i := \frac{1}{b k} \eta_i, \qquad i = 1, \ldots, n_b. \qquad (6.3.20)

Then we have b \sum_{i=1}^{n_b} \tilde{\eta}_i = 1, which corresponds to the necessary property \int_{-\infty}^{\infty} f(t) \, dt = 1 of continuous density functions f. This makes the histogram generated in this way comparable with a density function. The cumulative distribution functions can be compared as well. The continuous one for the fitted distribution is just

F_c(x) := \int_{-\infty}^{x} f_{\mathrm{Wei}}(z; \alpha, \beta, 0) \, dz = 1 - \exp\left[ -\left( \frac{x}{\alpha} \right)^{\beta} \right], \qquad (6.3.21)

the distribution function of the Weibull distribution. The empirical cumulative distribution function can be obtained by summing up the relative frequencies,

\tilde{F}_c(x) = b \sum_{i \le x} \tilde{\eta}_i, \qquad (6.3.22)

where the sum runs over all bins lying left of x.
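The scaling step can be sketched in Python; assuming equal bin width b and sample size k, dividing each count by b·k makes the histogram integrate to one (function name is ours):

```python
def relative_histogram(data, nb):
    """Scaled histogram: counts eta_i divided by b*k so that
    b * sum(eta_tilde) = 1, making it comparable to a density."""
    lo, hi = min(data), max(data)
    b = (hi - lo) / nb                      # common bin width
    counts = [0] * nb
    for x in data:
        i = min(int((x - lo) / b), nb - 1)  # clamp x == hi into the last bin
        counts[i] += 1
    k = len(data)
    return [c / (b * k) for c in counts], b
```

Summing b times the scaled bin heights then recovers exactly 1, the defining property of a density.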

This is a piecewise-constant function that can be compared with the continuous one, as shown in Figure 6.17. Calculating the maximum likelihood estimators with the algorithm described in Section 6.3.1 yields

δ ≈ 698.99,
α ≈ 3952,
β ≈ 1.7.


The results can be seen in Fig. 6.17. Here, the original data histogram and the shifted density function are plotted. The moments of the distribution are estimated by

E ≈ 4205,
V ≈ 2155,
S ≈ 231.

The number of function evaluations needed to solve the problem varied between 700 and 15000. To make any conclusion about the expected performance, we need a large sample size, as illustrated in Fig. 6.18. The reliability and the comparability of these properties depend on the sample size. As can be seen in the figure, we will need larger samples for this kind of problem to get comparable results.
Asymptotic confidence intervals for the estimation of the expectation at level (1 − a) are given by

I_j := \left[ \bar{X}_j - z_{1-\frac{a}{2}} \frac{S_j}{\sqrt{j}}, \; \bar{X}_j + z_{1-\frac{a}{2}} \frac{S_j}{\sqrt{j}} \right], \qquad j = 2, \ldots, k, \qquad (6.3.23)

with the j-th mean

\bar{X}_j = \frac{1}{j} \sum_{i=1}^{j} x_i, \qquad j = 2, \ldots, k, \qquad (6.3.24)

and the j-th sample variance

S_j^2 = \frac{1}{j-1} \sum_{i=1}^{j} \left( x_i - \bar{X}_j \right)^2, \qquad j = 2, \ldots, k. \qquad (6.3.25)

Here, z_{1-a/2} is the (1 − a/2)-quantile of the standard normal distribution. Choosing a = 0.05, we get z_{1-0.05/2} = z_{0.975} ≈ 0.8352. Since the variance of the distribution is estimated to be quite high, we will need a large sample to get a narrow asymptotic confidence interval. But this is needed for different methods to be comparable.
Now we will assume that the unknown distribution satisfies the properties

\sigma = \frac{\mu}{2}, \qquad \mu \gg 0, \qquad (6.3.26)

for a certain expectation µ and variance σ², which means a high variance relative to the expectation. The length of the asymptotic (1 − a)-confidence interval can be approximated for a given sample size, because the mean and the sample variance are estimators for the expectation and the variance, respectively.

I_j = \left[ \mu - z_{1-\frac{a}{2}} \frac{\sigma}{\sqrt{j}}, \; \mu + z_{1-\frac{a}{2}} \frac{\sigma}{\sqrt{j}} \right], \qquad j = 2, \ldots, k. \qquad (6.3.27)

The length of each of these intervals is given by

l_j = 2 z_{1-\frac{a}{2}} \frac{\sigma}{\sqrt{j}} = z_{1-\frac{a}{2}} \frac{\mu}{\sqrt{j}}. \qquad (6.3.28)


If the length of the asymptotic confidence interval is supposed to be smaller than a given tolerance ratio 0 < ν_µ < 1 of µ,

l_k < \nu_\mu \, \mu, \qquad (6.3.29)

then the sample size k has to satisfy

k > \left( \frac{z_{1-\frac{a}{2}}}{\nu_\mu} \right)^{2}. \qquad (6.3.30)

For a = 0.05 and ν_µ = 0.02, we obtain k > 1744. Generally speaking, to have a 95% chance of being able to compare two samples of type (6.3.26) that differ by at least 2% in their mean, we need a sample size of at least 1744 each. For ν_µ = 0.01, the minimum of k increases to 6976, and it grows quadratically as the tolerance ratio gets smaller.
The Weibull distribution was a good choice to model the given experiment and gives visually good comparisons. Different distributions have been tried, such as the normal distribution or the beta distribution. But the failure of those is not surprising, since the Weibull distribution is the only one which has the properties that can be expected from the unknown distribution of the global optimization experiment, and it includes the exponential distribution as a special case. It can be shifted by a constant, which definitely makes sense for the global optimization problem, because it must take at least a certain positive number of function evaluations to succeed. Then we can expect the results to accumulate at a certain value, but this cannot be the expectation itself, since very high numbers will always have a positive probability, which causes the expectation to move away from the accumulation point. Therefore, the distribution has to be skewed. Although there is no proof that the experiments we are investigating indeed result from a known statistical distribution, the results encourage this approach.
Concerning the minimum sample size needed to be able to make any conclusions, there cannot be any definite minimum value, because this depends on each objective function. But if we expect a given global optimization method to perform well at least on a certain set of functions, then we can check the random experiments described above and subsequently give lower bounds for the sample size.
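The bound (6.3.30) is easy to evaluate; using the quantile value z ≈ 0.8352 as it appears in the text above, a short Python check reproduces the quoted thresholds (function name is ours):

```python
from math import ceil

def min_sample_size(z, nu):
    """Smallest integer k with k > (z / nu)^2, cf. (6.3.30)."""
    bound = (z / nu) ** 2
    k = ceil(bound)
    return k + 1 if k == bound else k  # enforce the strict inequality
```

With z = 0.8352 this gives k = 1744 for a tolerance ratio of 0.02 and k = 6976 for 0.01, matching the values stated in the text and showing the quadratic growth as the tolerance ratio shrinks.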

6.4. A concrete Implementation

An algorithm that uses tunneling functions needs a global framework, since tunneling functions themselves are just instruments. At the beginning, we always start with a single local minimization of f from a certain initial guess. After converging to a local minimum of that function, the tunneling function can be constructed. When we minimize the tunneling function in order to find any point with T(x) ≤ 0, we have to handle different scenarios and modify the tunneling function when necessary. We denote the limit point of the chosen local minimization method by z := M_T(x), where x is the initial guess. Let f* be the lowest known function value of f, and let S_i be the pole regions for the global minima already known. The possible scenarios are:

(i) T(z) ≤ 0 ⇔ f(z) ≤ f*: Use z as new initial guess.

(ii) ‖∇T(z)‖ < TOL, T(z) > 0 and z ∉ S_i for all i = 1, \ldots: Generate a pole at z.

(iii) ‖∇T(z)‖ < TOL, T(z) > 0 and z ∈ S_i for at least one i = 1, \ldots: Modify pole i and generate a pole at z.
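A hedged sketch of how these three scenarios might be told apart in code (all names are illustrative; `in_pole_region` stands for the membership test z ∈ S_i, and the labels are ours, not the thesis's):

```python
def classify_tunneling_result(T_z, grad_norm, in_pole_region, tol=1e-6):
    """Map the outcome of a tunneling minimization to scenarios (i)-(iii).

    T_z            : tunneling function value T(z) at the limit point z
    grad_norm      : ||grad T(z)||
    in_pole_region : True if z lies in some existing pole region S_i
    """
    if T_z <= 0.0:
        return "restart"            # (i): f(z) <= f*, use z as new initial guess
    if grad_norm < tol:
        if in_pole_region:
            return "reshape_pole"   # (iii): modify pole i, add a pole at z
        return "new_pole"           # (ii): generate a pole at z
    return "undecided"              # local search has not converged yet
```

Such a dispatch would sit at the end of each tunneling minimization, deciding whether to restart, to add a pole, or to reshape an existing one.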


[Figure: flow chart of Algorithm 6.5.]

Figure 6.19.: This flow chart represents Algorithm 6.5. It also outlines the sub-Algorithm 6.6 without any stopping criteria. Here, the method U(p, ε) returns a random point within an ε-region around p.


It depends on the objective function which of the cases are most likely to occur. While case (i) is the only desirable one, the others are not. For functions that have many local minima but few global minima, the algorithm might lead to case (ii) quite often. Here, we have no pole in range, which means that the tunneling minimization is just working on a shifted version of f. Case (iii) is also undesirable, but it tells us something about the tunneling function: here, we can apply the shaping strategies. If, however, case (iii) occurs infrequently, this can mean that the pole region already fits or that the function has many local minima.

Algorithm 6.5 Semi-Deterministic Tunneling Algorithm

Require: N_p, N_m, N_n, N_c ∈ ℕ, N_n ≤ N_p, X ← {x_0^0, \ldots, x_0^{N_c-1}} ⊂ (0, 1)^n;
  c ← 0; n_z ← 0; Z ← ∅; T ← f; f* ← ∞;
  Choose x_0 ← x_0^0; c ← c + 1;
  repeat
    z_0 ← M_T(x_0);
    if f(z_0) < f* then
      Z ← {z_0}; n_z ← 1; f* ← f(z_0);
    else
      Z ← Z ∪ {z_0}; n_z ← n_z + 1;
    end if
    Update T ← T_Z;
    Solve subproblem in Algorithm 6.6;
  until x_0 has not been updated

Algorithm 6.5 is designed for approximately solving the global optimization problem (6.1.2) by a sequence of local optimization problems. We state it separated into the main Algorithm 6.5 and the tunneling phase in Algorithm 6.6 to give a better overview. A simplified flow chart is given in Figure 6.19.
For an arbitrary continuous function, neither the number of global minima nor their function value is known a priori. Thus, we need a suitable stopping criterion, since we will never know when the problem is solved and we do not want the algorithm to run infinitely. Now, the algorithm is finite, since it exits when all N_c points in X have been used as an initial guess for the local minimization problem. It will take a new point from X whenever (N_p · N_m) tunneling minimizations do not lead to a new global minimum. In each case, it produces a nonempty set Z ≠ ∅ which approximates the set of all global minima of f. The number #Z = n_z is the approximation to the number of global minima of f.
We can note that the single start local minimization method is the special case of the algorithm obtained by choosing N_p := N_n := 0. Then, the algorithm exits with Z = {M(x_0^0)}.
By fixing the set X at the beginning of the algorithm, we define a deterministic background strategy that is performed sequentially. The next point from the deterministic sequence is chosen after (N_p − N_n) · N_m local minimizations were started close to points produced by the


Algorithm 6.6 Tunneling phase
for i = 1 to N_p do
  if i < (N_p − N_n) then
    Choose x_1 ∈ U(z_0);
  else
    Choose x_1 ← x_0^c; c ← c + 1;
    if c > N_c then
      return exit algorithm with current Z
    end if
  end if
  z_1 ← M_T(x_1);
  if T(z_1) ≤ 0 then
    x_0 ← z_1; return to loop
  else
    if z_1 ∈ S_j for one j = 1, \ldots, n_z then
      Modify D_j;
    end if
    Update T ← T_{z_1};
    z_m ← z_1;
    for j = 1 to N_m do
      Choose x_m ∈ U(z_m);
      z_2 ← M_T(x_m);
      if T(z_2) ≤ 0 then
        x_0 ← z_2; return to loop
      else
        z_m ← z_2;
      end if
    end for
  end if
end for


tunneling algorithm and these did not lead to any new best known point. Then, if N_n initial guesses from X do not produce any new point with T ≤ 0, the algorithm stops. If X is chosen in a way such that the domain Ω = [0, 1]^n is covered well, the background strategy ensures that in cases where the tunneling steps do not produce any valid output, the global investigation of f still proceeds. We call this kind of algorithm semi-deterministic, since it uses a deterministic pattern with stochastic elements within.
In the following section we motivate the choice of a deterministic background strategy. We will choose a sequence that quickly covers the unit cube, independently of the number of points needed.

6.4.1. Start Values by Halton Sequences

In the algorithm described before, we need two types of start values.

(i) A point x_1 in an ε-neighborhood U_ε(p) of a given point p.

(ii) A point x2 in (0, 1)n.

The first one is needed whenever the algorithm starts close to any pole of T. The second one is needed to generate the first initial guess and to apply a random or deterministic background strategy.
A straightforward solution to this task is to choose these points randomly from a uniform distribution on (0, 1) × (0, 1) × \cdots. In case (ii), we can directly choose a point x_2 := x̄ ∼ U((0, 1)^n). In case (i), we can take a realization x̄ of a variable from the same distribution and scale the vector to have length ε, setting x_1 := x + ε (x̄ − 0.5)/‖x̄ − 0.5‖, where 0.5 denotes the vector (0.5, \ldots, 0.5)^T.
However, there are other methods to generate points for these cases. In case (i), we have to ensure that the point generated still lies within [0, 1]^n. If x ∈ [0, 1]^n, there exists a direction x̄ such that x_1 := x + ε (x̄ − 0.5)/‖x̄ − 0.5‖ ∈ (0, 1)^n. We repeat the random procedure until such a point is found.
In case (ii), we can replace the random variable by a deterministic sequence. Instead of choosing randomly, we pick up the idea of Quasi-Monte-Carlo simulation and determine a set of points to be chosen from. A similar idea has already been applied in direct search methods for global optimization [SY05, GJ08]. Here, we would like to apply a deterministic background strategy and choose it in such a way that the domain of the objective function is covered quite well and fast by only few points.
The root-mean-square discrepancy [BW79, PFH92] is a measure of the non-uniformity of a sequence. There are several methods to generate low-discrepancy sequences for use in Quasi-Monte-Carlo simulation, such as Halton [Hal60], Niederreiter [PFH92, PBN94], Sobol [Sob67] and Faure [Fau82]. We choose Halton sequences, because they can be computed very efficiently in low dimension. An upper bound for the error of integration using Quasi-Monte-Carlo methods was derived by [Hal72] using Halton sequences, since these sequences have an upper bound on their discrepancy [Sch08]. This characterizes how evenly the points are distributed in the interval (0, 1).
Halton sequences are a multi-dimensional extension of the so-called van der Corput (vdC) sequences [vdC35]. The i-th element of such a sequence is given by representing i in the base b ∈ ℕ \ {0, 1} and reversing the digits. Choosing b = 2 or b = 10, respectively, yields the sequences:
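The resampling for case (i), drawing a start value at exact distance ε from x while staying inside the unit cube, can be sketched as follows (Python; the rejection loop mirrors the repetition described above, and the function name is ours):

```python
import random
from math import sqrt

def start_value_near(x, eps, rng=random):
    """Return x + eps*(u - 0.5)/||u - 0.5|| with u ~ U((0,1)^n),
    resampled until the result lies in the open unit cube."""
    n = len(x)
    while True:
        u = [rng.random() - 0.5 for _ in range(n)]
        norm = sqrt(sum(c * c for c in u))
        if norm == 0.0:
            continue  # degenerate direction, draw again
        cand = [xi + eps * ci / norm for xi, ci in zip(x, u)]
        if all(0.0 < c < 1.0 for c in cand):
            return cand
```

For interior points with ε smaller than the distance to the boundary, the loop accepts on the first draw; near the boundary it simply rejects directions that leave the cube.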


[Figure: two scatter plots of points in the unit square, axes x_1 and x_2.]

Figure 6.20.: Both figures show 1000 points in the x_1-x_2-plane. The points in the left figure result from a pseudo-random number generator of a uniform distribution. The right figure results from the Halton sequences for base 2 (x_1) and base 3 (x_2). As can be seen, the Halton sequence generates a much more uniform coverage of the unit square.

i | base 2 | vdC_2
1 | 1   | 0.1_2 = 1/2
2 | 10  | 0.01_2 = 1/4
3 | 11  | 0.11_2 = 3/4
4 | 100 | 0.001_2 = 1/8
5 | 101 | 0.101_2 = 5/8
6 | 110 | 0.011_2 = 3/8

i | base 10 | vdC_10
1 | 1 | 0.1_10
2 | 2 | 0.2_10
3 | 3 | 0.3_10
4 | 4 | 0.4_10
5 | 5 | 0.5_10
6 | 6 | 0.6_10

Halton sequences are based on vdC sequences with different primes as bases. A sequence x_1, x_2, \ldots, x_N ∈ (0, 1)^n is to be generated. For j = 1, \ldots, N and i = 1, \ldots, n, let p_i be the i-th prime; then the i-th component of the j-th point is given by the j-th van der Corput element with base p_i.
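The construction fits in a few lines of Python; `van_der_corput` reverses the base-b digits of i, and `halton` uses one prime base per dimension (both names are ours):

```python
def van_der_corput(i, b):
    """i-th element of the van der Corput sequence in base b."""
    x, denom = 0.0, 1.0
    while i > 0:
        denom *= b
        i, digit = divmod(i, b)
        x += digit / denom          # digit-reversal of i in base b
    return x

def halton(j, primes=(2, 3, 5, 7, 11)):
    """j-th point of the Halton sequence, one prime base per dimension."""
    return tuple(van_der_corput(j, p) for p in primes)
```

The first elements of vdC_2 are 1/2, 1/4, 3/4, 1/8, 5/8, 3/8, matching the table above, and `halton(j, (2, 3))` produces the two-dimensional points shown in Figure 6.20.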

6.4.2. Discussion on Benchmarking

Benchmarking the performance of a global minimization method is a non-trivial task. Local minimization methods are deterministic and the minimization runs are reproducible. Even more important is that gradient-based methods applied to an arbitrary continuous function have a sufficient stopping criterion that can be evaluated at every iteration of the algorithm. Stochastic global minimization methods, on the other hand, do not have any necessary or even sufficient criteria.
Usually, test functions with certain properties are constructed, such as a given number of local and global minima or a certain distance between those. Generating test functions with desirable properties is a scientific task in itself, see [AL07]. Once we know where to find


the global minima of a function, and when we know the associated function value, we have a sufficient stopping criterion for the global optimization method: simply when all the global minima that we already know have been found. That is the difference between local and global minimization concerning benchmarking. For local methods, we know when we are done, even if we do not know a priori where the minimum lies; for a global method, we do not know when to stop. Thus, the algorithms have to run until some pragmatic stopping criterion is met. We can limit the number of function evaluations or the CPU time spent, or require some measure of progress to stay small for a certain time period.
It is nearly a philosophical question which type of benchmarking to choose. In the author's opinion, global optimization via a sequence of local optimizations cannot be compared to direct search methods by means of function evaluation counts. The reason for this might be that the choice of the local method is essential, as already shown in early results in [LM85]. Tunneling algorithms are hybrid methods that use global and local strategies and cannot be decoupled from the local part.
One can try benchmarking by stopping the algorithms by one of the pragmatic criteria and measuring the average number of global minima found over many runs. This might give a measurement of success which can be interpreted as a probability of success. But the results will strongly depend on the stopping criteria. If, for example, the maximum CPU time allowed is chosen too short, the algorithms do not have a chance to locate the global minima, while choosing the value too large will give all algorithms enough time to find all minima, causing the average minima count to be 1 or close to it. In both cases, the results cannot be compared.
What we do in Chapter 7 is to compare the results of the suggested tunneling concepts in their different variants with a fixed reference algorithm.
This reference algorithm can be seen as an abstract special case of the tunneling algorithm in which no poles and only completely random starts are used. In this way we can figure out which of the new features have the potential of improving tunneling-type methods.

6.5. Analytic Basins of Attraction

In the literature, there are some ideas on how to classify and use regions (or basins) of attraction of local minima. In [Cor75], an approach similar to ours is followed. In each case, the idea is to define regions of attraction of a local minimum by means of paths leading to certain attractors. Optimization methods based on the solution of an ordinary differential equation date back to [KJAU58] and are based on the idea of Courant's method of gradients from 1943, see [Cou43].
Although the methods based on gradient paths (often called gradient flows) never made it into the standard literature, numerous works were published in the 1970s. Recent results and algorithms are presented in [Beh98, Bot78, Ram70, BBB89, Zan78, Bed09]. Basins of attraction play an important role in certain concrete optimization problems, as appearing in optical systems [vTB09], where the basins have a fractal structure [NY96].
As an alternative to classical line-search or trust-region methods, one can solve the unconstrained local minimization problem by following the gradient path to the minimum.

Definition 6.5.1. The gradient path for a real-valued function f : Ω ⊂ ℝ^n → ℝ and a point x_0 is a path

ϕx0 : [0,∞)→ Rn, t→

ϕ1x0(t)

...ϕnx0(t)

(6.5.1)

that solves the initial value problem

ϕ′x0(t) = −∇f(ϕx0(t))),ϕx0(0) = x0. (6.5.2)

This is a system of ODEs whose right-hand side does not explicitly depend on t, a so-called autonomous system. Existence and uniqueness of a solution follow from the Picard–Lindelöf theorem, which requires assumptions on the right-hand side and therefore, indirectly, on f.
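The gradient path can be followed numerically with any ODE solver. As an illustration, the following sketch (in Python with SciPy, standing in for the MATLAB environment used later in the thesis; the final time T and tolerances are ad-hoc choices) integrates (6.5.2) for Himmelblau's function, the running example of this chapter. The endpoint lies near a stationary point and the function value has decreased along the path.

```python
import numpy as np
from scipy.integrate import solve_ivp

def f(x):
    # Himmelblau's function, the two-dimensional example used in this chapter.
    return (x[0]**2 + x[1] - 11)**2 + (x[0] + x[1]**2 - 7)**2

def grad_f(x):
    g1 = 4*x[0]*(x[0]**2 + x[1] - 11) + 2*(x[0] + x[1]**2 - 7)
    g2 = 2*(x[0]**2 + x[1] - 11) + 4*x[1]*(x[0] + x[1]**2 - 7)
    return np.array([g1, g2])

def gradient_path_endpoint(x0, T=50.0):
    # Approximate lim_{t->oo} phi_x0(t) by integrating (6.5.2) up to a fixed time T.
    sol = solve_ivp(lambda t, x: -grad_f(x), (0.0, T), x0, rtol=1e-9, atol=1e-9)
    return sol.y[:, -1]

x_end = gradient_path_endpoint(np.array([0.0, 0.0]))
# x_end lies near one of the four minima of Himmelblau's function (all with f = 0),
# and the gradient is numerically zero there.
```

This also illustrates the Lyapunov property below: f itself decreases monotonically along the path.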

The right-hand side of the ordinary differential equation can also be replaced by other directions such as the Newton direction [Deu04, Beh98], but we use the negative gradient to obtain global convergence. As is well known, this system of ordinary differential equations has a unique solution for every initial value in Ω if the function f is in C2(Ω). The gradient path is therefore well-defined. The tangent of the path is always orthogonal to the level curves of the function f since it is simply given by the gradient of f.

In the following, ‖ · ‖ denotes an arbitrary norm in Rn. We now recall Lyapunov's stability theorem for ordinary differential equations as in [Wal00]. First we need the definition of the stability of a solution of system (6.5.2).

Definition 6.5.2. A solution ϕx0(t) of (6.5.2) is called 'stable' if for every ε > 0 there exists a δ > 0 such that for all initial values y0 with

‖x0 − y0‖ < δ,

it holds that

‖ϕx0(t) − ϕy0(t)‖ < ε,  0 ≤ t < ∞.

If additionally

lim_{t→∞} ‖ϕx0(t) − ϕy0(t)‖ = 0,

it is called 'asymptotically stable'.

For asymptotically stable systems, small changes in the initial value do not change the limit point of the system. In [Wal00], it is shown that the gradient path system is asymptotically stable. The argumentation follows from Lyapunov's asymptotic analysis of autonomous systems of the form

y′ = g(y). (6.5.3)

In the following we can assume, without loss of generality, that the null-solution y ≡ 0 with g(0) = 0 solves the system instead of some constant y ≡ a with g(a) = 0. A Lyapunov function for g is defined as follows.

Definition 6.5.3. A function V ∈ C1(Ω) is called Lyapunov function if

V(0) = 0,
V(x) ≥ 0  for x ≠ 0,
V̇(x) ≤ 0  for x ∈ Ω,

where

V̇(x) := ⟨∇V(x), g(x)⟩.

The directional derivative V̇ is the derivative of the Lyapunov function in the direction of g. This means that, for a Lyapunov function so defined, the composition

V(ϕx0(t))

is monotonically decreasing as t increases. We only state the part of Lyapunov's stability theorem that we are interested in.

Theorem 6.1. Assume that g ∈ C(Ω) and g(0) = 0. If there exists a Lyapunov function V for g, then

V̇ < 0 in Ω \ {0}  ⇒  the null-solution of (6.5.3) is asymptotically stable. (6.5.4)

Proof. This follows from the monotonicity of the Lyapunov function with respect to t. For a detailed proof we refer to [Wal00].

For the system (6.5.2), if we assume that the constant solution ϕ0(t) ≡ 0 solves the system (after shifting f so that f(0) = 0), a Lyapunov function is simply given by

V(x) := f(x) (6.5.5)

and

V̇(x) = −‖∇f(x)‖2. (6.5.6)

Now if z is a strict local minimum of f, there exists a set Uε(z) with f(x) > f(z) and ∇f(x) ≠ 0 for x ∈ Uε(z) \ {z}. From Theorem 6.1 it follows that the solution ϕz(t) ≡ z is asymptotically stable.

An attractor of a system can be defined as a point z ∈ Ω for which there is a set Uε(z) with

lim_{t→∞} ϕx(t) = z,  x ∈ Uε(z). (6.5.7)

In [Wal00] it is shown that a point z is an attractor of a system if the constant solution ϕ(t) ≡ z is asymptotically stable.

We still have to argue that the limit of the system exists for every initial value. For an initial value x0 we define the sequence

fn := f(ϕx0(tn)) (6.5.8)

for a sequence (tn) of increasing values. If the function f is bounded from below, the sequence (fn) is also bounded, and it is monotonically decreasing according to the definition of the gradient path. Thus, the limit

lim_{n→∞} fn = lim_{n→∞} f(ϕx0(tn)) (6.5.9)
             = lim_{t→∞} f(ϕx0(t)) (6.5.10)

exists. This means that the gradient path leads to a certain function value if f is bounded from below. Roughly speaking, it converges in terms of the associated function value of f.


Unfortunately, we cannot guarantee that this implies the convergence of ϕx0(t), since the gradient path might cycle infinitely; a counterexample is given in [JS03]. But we can still derive the following result.

Lemma 6.8. Let g(t) := ‖∇f(ϕx0(t))‖ be Lipschitz-continuous for a gradient path ϕx0 of a function f that is bounded from below. Then

lim_{t→∞} g(t) = 0, (6.5.11)

that is, the gradient tends to zero length along the gradient path.

Proof. To show this result, recall that a function g : R+ → R+ is called Lipschitz-continuous if for every a, b it holds that

|g(a) − g(b)| ≤ L|a − b|

for a constant L > 0. This is particularly true for b := a + δ with any small δ > 0.

We show that the assumption that g(t) does not converge to 0 as t runs to infinity contradicts the assumption that f is bounded from below. So let g(t) not converge to zero. Then

∃ ε > 0 : there is no s > 0 with g(t) < ε for all t > s.

It follows that there is a countable set t1, t2, . . . with g(ti) ≥ ε > 0 for i = 1, 2, . . . . We show that for every point ti there is an interval Uδ(ti) := (ti, ti + δ) with g(t) ≥ ε0 > 0 for a positive constant ε0. If g(ti + δ) ≥ g(ti) there is nothing to show, so assume g(ti + δ) < g(ti). From the Lipschitz-continuity it follows that

g(ti + δ) ≥ g(ti) − δL ≥ ε − δL ≥ ε0 > 0,

if δ < (ε − ε0)/L. So, depending on the Lipschitz constant L, there is a δ > 0 such that g stays above the positive constant ε0 on the interval following each of the points t1, t2, . . . .

We define the path value function fϕ(t) := f(ϕx0(t)). For its derivative it holds that

f′ϕ(t) = ⟨∇f(ϕ(t)), ϕ′(t)⟩ = −g(t)2.

Thus, the path value function satisfies

fϕ(ti + δ) = fϕ(ti) + ∫_{ti}^{ti+δ} f′ϕ(s) ds = fϕ(ti) − ∫_0^δ g(ti + s)2 ds

at each position ti for i = 1, 2, . . . and δ > 0. For the integral it holds that

∫_0^δ g(ti + s)2 ds ≥ δ(ε − δL)2 ≥ δε02 =: c,

which means that the integral is larger than a positive constant c for every ti and does not tend to zero even for large ti. We write the descent on f as a sequence

Dj := fϕ(tj) − fϕ(t1),  j = 1, 2, . . . .


Clearly Dj < 0 for every j. Choose

δ < min{ (ε − ε0)/L, inf_j (tj+1 − tj) }.

We can bound the descent from above by summing up the local descents at t1, t2, . . . :

Dj = fϕ(tj) − fϕ(t1)
   = fϕ(t1) + ∫_{t1}^{tj} −g(s)2 ds − fϕ(t1)
   = −∫_{t1}^{tj} g(s)2 ds
   = −Σ_{k=1}^{j−1} ∫_{tk}^{tk+1} g(s)2 ds
   ≤ −Σ_{k=1}^{j−1} ∫_0^δ g(tk + s)2 ds    (each integral is at least c)
   ≤ (1 − j) c.

Obviously, for j → ∞, the upper bound (1 − j)c tends to −∞. Then lim_{t→∞} fϕ(t) = −∞, but this contradicts the assumption that f is bounded from below. Hence, the norm of the gradient of f tends to zero as t tends to infinity.

We have shown that, along the gradient path, the function value of f converges to a certain lower bound and the norm of the gradient tends to zero. Under the same assumptions, the convergence of the path itself to a certain point in Ω cannot be shown: if the function f has a connected set of local minima, the gradient path could approach this set while never converging to a single point. We therefore state the following result for functions with isolated local minima. Note that in [Beh98] a similar result with slightly different conditions is given.

Lemma 6.9. Let x∗ ∈ Ω be an isolated local minimum of a function f and y∗ ∈ Ω with x∗ ≠ y∗. Then there is no gradient path that has both x∗ and y∗ as limit points.

Proof. Since x∗ and y∗ are distinct, it holds that ‖x∗ − y∗‖ = δ > 0 for a positive constant δ. Assume that x∗ is a limit point of a path ϕ(t). This means that

‖ϕ(tε) − x∗‖ < ε

for every ε < δ/2 and a tε chosen large enough.

The point x∗ is an isolated local minimum of the function f, thus we have strict convexity in a region around the point. For every y ∈ Uσ(x∗) := {x ∈ Ω : ‖x − x∗‖ ≤ σ} it holds for a ∈ (0, 1) that

f(ax∗ + (1 − a)y) < af(x∗) + (1 − a)f(y) < af(y) + (1 − a)f(y) = f(y).


This is particularly true for points on the boundary of the convexity region, y ∈ ∂Uσ(x∗). Hence, we can bound the function values on the boundary of the convexity region from below by

mσ := inf{f(y) : y ∈ ∂Uσ(x∗)} > f(x∗).

We have f(lim_{t→∞} ϕ(t)) = f(x∗); thus we can choose points on the path with a function value arbitrarily close to f(x∗). Then we can find ε < min{δ/2, σ} such that there is a tε with

f(ϕ(tε)) < mσ  and  ‖ϕ(tε) − x∗‖ < ε. (6.5.12)

To see that there is a common tε for both properties, we can find a sequence (tn) with t1 < t2 < . . . such that

lim_{n→∞} ϕ(tn) = x∗,
lim_{n→∞} f(ϕ(tn)) = f(x∗).

This sequence discretizes the continuous gradient path and produces convergent sequences (ϕ(tn)) and (f(ϕ(tn))). The argumentation holds for every sequence (tn) so defined, and we can choose tε from the sequence. Now the function value of the point on the path at tε is smaller than any value on the boundary of the convexity region. The gradient path satisfies

t1 ≤ t2  ⇒  f(ϕ(t2)) ≤ f(ϕ(t1)),

which we called the downhill property. Once a point with (6.5.12) is reached, the following argumentation holds. Since the gradient path is a continuous function of t, it can never leave the convexity region Uσ(x∗), because it cannot pass the boundary while satisfying the downhill property. Hence, there is a t∗ > 0 with

‖ϕ(t) − y∗‖ > δ/2  for t ≥ t∗,

which excludes y∗ as a limit point of the gradient path.

A consequence of this result is that any gradient path that has an isolated minimum as a limit point cannot have any other limit points and therefore converges to that point. If the gradient path converges, it can only lead to a local minimum or a saddle point of f, since the direction is downhill with respect to f. However, when starting from any critical point, the solution of the system is just constantly this critical point; this is particularly true for local maxima. We can now define a general basin of attraction in an analytic way, see also Fig. 6.21.

Definition 6.5.4. Assume that we have a function f ∈ C2(Ω), bounded from below, with a finite number of isolated local minima. Let Z− be the set of all local minima of f and Z0 be the set of all saddle points. We assume z ∈ Ω for all z ∈ Z− ∪ Z0 and define

B(z) := {x ∈ Ω : lim_{t→∞} ϕx(t) = z},  z ∈ Z− ∪ Z0, (6.5.13)

where ϕx(t) is the gradient path (6.5.2) of the function f starting at x. This set is called 'analytic basin of attraction' (ABA) of the point z for the function f.


Figure 6.21.: This figure shows the basic concept of basins of attraction. For every point in the domain we are looking for an associated attractor that has to be uniquely defined. The circles represent local minima of a function while the arrows illustrate the mapping from a certain initial value to the local minimum; they should not be confused with the gradient path itself.

Choosing a local maximum of f as initial value of (6.5.2) yields a constant solution, since every local maximum is clearly a stationary point of the system. Thus, the only points that are attracted to a local maximum are the maxima themselves. Let Z+ be the set of all maxima of f; then it follows that

⋃_{z∈Z−} B(z) ∪ ⋃_{z∈Z0} B(z) = Ω \ Z+. (6.5.14)

Property (6.5.14) means that the interior of Ω can be decomposed into ABAs up to a certain set of local maxima.

Lemma 6.10. The sets B(z), z ∈ Z−, are connected and open subsets of Ω.

Proof. Since we are only concerned with points x ∈ Ω, the analytic basins are subsets of Ω. From the asymptotic stability of the attractors it follows that for every point on the gradient path to z there is a neighborhood whose points lead to the same limit point. These neighborhoods belong to B(z), so B(z) is open.

The analytic basin of z is the union of all paths leading to z. So, for all points x, y ∈ B(z), there exists a path connecting x and y, e.g. the union of the path from x to z and the path from y to z. The point z itself is in B(z) and all paths come arbitrarily close to it. Thus, B(z) cannot be decomposed into two disjoint non-empty open sets, and B(z) is connected. It is even path-connected, since z lies in the closure of all paths that lead to z and thereby connects all paths.

Lemma 6.11. Let two initial values x0, y0 ∈ Ω be given. If there exist t1, t2 ≥ 0 with

ϕx0(t1) = ϕy0(t2),

then x0, y0 ∈ B(z) with z = lim_{t→∞} ϕx0(t) = lim_{t→∞} ϕy0(t).


Figure 6.22.: This figure shows the reverse gradient path including the points x0 and y0 whenever the corresponding gradient paths are not disjoint.

Proof. It follows directly from the initial value problem (6.5.2) being uniquely solvable.

From Lemma 6.11 we can derive that every pair of gradient paths satisfies exactly one of the following properties.

(i) They are disjoint.

(ii) They are subsets of a common path.

To see this, assume that we are given a path ϕx. We take a position τ > 0 and define the reverse path as

ψ(t) := ϕx(τ − t),  0 ≤ t ≤ τ. (6.5.15)

This can be written as an ODE by

ψ′(t) = d/dt ϕx(τ − t) = (−1) ϕ′x(τ − t) = (−1)(−1) ∇f(ϕx(τ − t)) = ∇f(ψ(t))

with ψ(0) = ϕx(τ). This path is uniquely defined and it includes the initial value of the original path, ψ(τ) = x; compare with Fig. 6.22.

Now if two paths ϕx0, ϕy0 are not disjoint, there exist t1, t2 ≥ 0 with

ϕx0(t1) = ϕy0(t2) =: p,

thus yielding the same limit point. On the other hand, the reverse path starting at ψ(0) = p leads uphill and contains x0 and y0, as argued before. Thus, the reverse path contains all points of ϕx0 and ϕy0 up to p. Taking the union with the downhill path starting from p gives a set that includes ϕx0 and ϕy0 completely. This is what we call a common path.
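The reversibility argument can be checked numerically: integrating the gradient path forward for a time τ and then integrating the reverse system ψ′ = ∇f(ψ) for the same time should recover the initial value ψ(τ) = x0 up to integration error. A sketch in Python/SciPy (the horizon τ and the tolerances are ad-hoc choices; Himmelblau's function serves as the example):

```python
import numpy as np
from scipy.integrate import solve_ivp

def grad_f(x):
    # Gradient of Himmelblau's function.
    g1 = 4*x[0]*(x[0]**2 + x[1] - 11) + 2*(x[0] + x[1]**2 - 7)
    g2 = 2*(x[0]**2 + x[1] - 11) + 4*x[1]*(x[0] + x[1]**2 - 7)
    return np.array([g1, g2])

tau = 0.05
x0 = np.array([0.0, 0.0])

# Follow the gradient path downhill for a time tau ...
fwd = solve_ivp(lambda t, x: -grad_f(x), (0.0, tau), x0, rtol=1e-12, atol=1e-12)
p = fwd.y[:, -1]

# ... then follow the reverse path psi(t) = phi_x0(tau - t), which solves
# psi' = +grad f(psi), psi(0) = p, and leads uphill back to x0 = psi(tau).
bwd = solve_ivp(lambda t, x: grad_f(x), (0.0, tau), p, rtol=1e-12, atol=1e-12)
x_back = bwd.y[:, -1]
```

Note that the reverse integration is only well-behaved over short horizons, since errors are amplified where the uphill dynamics are expanding.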

Remark 6.5.1. Compared with the definition of a region of attraction from the previous sections, this analytic definition uniquely maps every point in the interior of the domain to a local minimum or saddle point of f while being independent of any algorithmic details such as the type of line-search or the trust-region radius. Recalling the one-dimensional function from (6.2.24) with its three local minima, we observed that the previously defined regions of attraction were not necessarily connected due to numerical phenomena; even exact line-searches cannot guarantee this. Following the gradient path is more reliable, and if we assume that we can solve the ODE arbitrarily accurately, the resulting ABA is independent of any algorithm that might be used. It is thus a property of the function itself.

Figure 6.23.: We sampled 1000 points from a Halton sequence in two dimensions and approximated the gradient paths. The crosses mark the points while the lines represent the paths leading to the associated local minimum of Himmelblau's function. For each minimum (marked with circles) a different color is chosen to illustrate the paths that belong to the ABA of that minimum. Recall that these sets are open; thus, the boundaries have to include non-isolated local maxima of the function. In the middle of the figure there is an isolated local maximum.

In Fig. 6.23 we can see again Himmelblau’s function [Him72] with 1000 gradient pathsstarting from points in a Halton sample.

6.5.1. Quasi-Monte-Carlo Approximation

The same reasoning as in Section 6.4.1 can also be used here. If we want to approximate the volume of an ABA, we can do so by sampling points in the domain and checking in which basin they lie. Repeating this procedure many times gives us an idea of how large the particular basins are.

One can easily see that the open basins from Definition 6.5.4 are measurable sets. Let λ be a measure; then the volume can be written as the integral

λ(B(z)) = ∫_{B(z)} 1 ds = ∫_Ω gz(s) ds (6.5.16)

with

gz(s) := 1 if s ∈ B(z), and gz(s) := 0 else. (6.5.17)

If x1, . . . , xN is a low-discrepancy sequence of size N in Ω, then this integral is approximated by the Quasi-Monte-Carlo formula

λ(B(z)) ≈ (1/N) Σ_{i=1}^N gz(xi), (6.5.18)

which is the mean of all function values evaluated on the sample.

which is the mean of all function values evaluated for the sample.This is just counting the average number of points that lie in B(z) while the sequenceensures that the domain Ω is covered well. Since the approximation of volume of B(z) canbe written as an integral over the whole domain, Quasi-Monte-Carlo sampling is supposedto be a better choice than sampling randomly in Ω.Now it holds ∑

z∈Z−λ(B(z)) +

∑z∈Z0

λ(B(z)) + λ(Z+) = λ(Ω). (6.5.19)

If the function f has no flat plateaus Uε(z) for z ∈ Z+ with f(x) = f(z), x ∈ Uε(z), wehave

λ(Z+) = 0 (6.5.20)

and ∑z

λ(B(z)) = λ(Ω).

Although λ(Z+) = 0, the approximation to it has a positive value if at least one point of the sample lies in Z+.

Let x ∼ U(Ω) be a random point from a uniform distribution in Ω and ϕx the path with the initial value x. Then the limit point of the path is a random variable, and

P( lim_{t→∞} ϕx(t) = z ) = λ(B(z)) / λ(Ω) (6.5.21)

is the probability of finding a gradient path that leads to z when starting at a random point in Ω.
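The volume estimate can be sketched numerically as follows (Python with SciPy's qmc module, standing in for the thesis's MATLAB code; the sample size, final time T and clustering tolerance are ad-hoc choices). Since Ω = [−5, 5]² is not the unit square, the sample mean of gz is scaled by λ(Ω) = 100.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.stats import qmc

def grad_f(x):
    # Gradient of Himmelblau's function, the running example of this section.
    g1 = 4*x[0]*(x[0]**2 + x[1] - 11) + 2*(x[0] + x[1]**2 - 7)
    g2 = 2*(x[0]**2 + x[1] - 11) + 4*x[1]*(x[0] + x[1]**2 - 7)
    return np.array([g1, g2])

def limit_point(x0, T=20.0):
    # Rough approximation of lim_{t->oo} phi_x0(t) by integrating up to a fixed time.
    sol = solve_ivp(lambda t, x: -grad_f(x), (0.0, T), x0, rtol=1e-8, atol=1e-10)
    return sol.y[:, -1]

# Halton sample in Omega = [-5, 5]^2 (a low-discrepancy sequence, as in the text).
sampler = qmc.Halton(d=2, seed=0)
points = qmc.scale(sampler.random(100), [-5.0, -5.0], [5.0, 5.0])

attractors, counts = [], []
for p in points:
    z = limit_point(p)
    if np.linalg.norm(grad_f(z)) > 1e-4:
        continue  # path not yet converged (e.g. started near a separatrix); skip
    for i, m in enumerate(attractors):
        if np.linalg.norm(z - m) < 1e-2:
            counts[i] += 1
            break
    else:
        attractors.append(z)
        counts.append(1)

lambda_omega = 10.0 * 10.0
volumes = [c / len(points) * lambda_omega for c in counts]  # estimates of lambda(B(z))
```

The four clusters found correspond to the four minima of Himmelblau's function, and the counts approximate the relative basin volumes, i.e. the probabilities in (6.5.21).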

6.5.2. Solvability of the r-(GOP)

The analytic definition of an ABA via the gradient path ODE makes it possible to analyze the solvability of the global optimization problem. Recall problem (6.1.2). We derive a simple result for the theoretical solvability of the r-(GOP) of finding r global minima of a function f within a given closed hyper-cuboid Ω ⊂ Rn.


In the following result, an iteration is defined that is guaranteed to solve this problem within a finite number of steps. The argumentation is based on the resolution of a set of vectors XN := {x1, . . . , xN} ⊂ Ω consisting of the first N elements of a sequence (X). We define the following properties of X.

Definition 6.5.5. Let ‖ · ‖ be a norm. We call the terms

rMAX(XN) := max_j min_{i≠j} ‖xi − xj‖ (6.5.22)

and

rMIN(XN) := min_j min_{i≠j} ‖xi − xj‖ (6.5.23)

the upper resolution and the lower resolution of XN, respectively.
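For a concrete point set, both terms are a nearest-neighbour computation. A minimal sketch in Python/NumPy (the helper name is ours), using the Euclidean norm:

```python
import numpy as np

def resolutions(X):
    """Upper resolution (6.5.22) and lower resolution (6.5.23) of a point set X (N x n)."""
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    np.fill_diagonal(D, np.inf)          # exclude i == j
    nearest = D.min(axis=1)              # min_{i != j} ||x_i - x_j|| for each j
    return nearest.max(), nearest.min()  # (r_MAX, r_MIN)

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
r_max, r_min = resolutions(X)
# The outlier (5, 5) determines r_MAX; the unit-distance pairs determine r_MIN.
```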

Lemma 6.12. Consider a global optimization problem r-(GOP) from (6.1.2) for a function f : Ω → R. Assume that f ∈ C2(Ω) and that all r global minima are isolated. Suppose the sequence (X) is dense in Ω, i.e.

X̄ = Ω,

and that

lim_{N→∞} rMAX(XN) = 0.

Then for every r > 0 there exists an N ≥ r such that problem (6.1.2) is solved by the iteration

for k = 1 to N do
  zk := lim_{t→∞} ϕxk(t)
end for

in a finite number of steps. Here, ϕxk(t) is the gradient path (6.5.2) starting at xk, evaluated at position t.

Proof. From the analytic definition of the basins of attraction we know that these sets are open. In particular, for every global minimum z ∈ {z1, . . . , zr} of the function f there exists a small ε > 0 with Uε(z) ⊂ B(z). Since X is dense in Ω, the first condition is satisfied, and if the elements of the sequence satisfy lim_{N→∞} rMAX(XN) = 0, there must be an N such that rMAX(XN) < ε. Consequently, the set XN includes a point in B(z) for every z. Thus, the iterates zk contain all global minima.

We have thus shown that r global minima of a sufficiently smooth function f can be found within a finite number of steps by sampling the start points of the gradient paths from a suitable deterministic sequence. Again, this is related to Quasi-Monte-Carlo methods. In [AM98], the performance of Quasi-Monte-Carlo methods for global optimization is analyzed in one dimension, using the fact that the van der Corput sequences are dense in (0, 1). This means that the upper resolution tends to zero as the sequence grows, which yields solvability in the sense of Lemma 6.12.


6.5.3. Perspectives – Gradient Paths with Tunneling Functions

Tunneling functions are modifications of an original objective function f. Using distance functions of the form (6.2.6) yields an ellipsoidal area around a known local minimum in which the pole function has an effect on the tunneling function. Outside of this region, the tunneling function is just the objective function shifted by the lowest known function value.

Gradient paths can only be manipulated by the pole function if the pole region has a nonempty intersection with the associated ABA. This means that all points that lie in an ABA of a certain local minimum of the original objective function also lie in the analogous ABA of the tunneling function if the basin has an empty intersection with all pole regions present.

On the other hand, the ABA of the local minimum that is destroyed by a pole is decomposed while the pole function is applied. If the pole region lies completely within the ABA of the original objective function, then there must be at least one artificial minimum of the tunneling function in its interior, and all gradient paths starting in this region will lead to one of them. This holds because the objective function and the pole functions used are supposed to be in C2(Ω). When starting line-search or trust-region methods close to a pole, we will get gradients with large norm, which can lead to unpredictable and effectively random behavior of the local solver. Gradient paths are continuous, so can we find a way out of the pole region by following a gradient path?

We know that this can only be possible if the pole region is large enough to have a nonempty intersection with a neighboring ABA. If we assume that we can follow gradient paths analytically, then we can construct an iteration that merges a gradient path with one of the neighboring basins.

Suppose we are given a local minimum z of the original objective function and a spherically shaped pole function centered at it, defining a tunneling function. Let us take a certain point within the pole region S. When we follow the gradient path of the tunneling function from this point to a local minimum x, only two scenarios are possible.

(i) x ∈ S. It is an artificial local minimum generated by the pole function.

(ii) x ∉ S. It is a local minimum of the original objective function.

Case (ii) is the desired case, since the presence of the pole at z generated a gradient path that led us to a different local minimum of the objective function. Case (i) is undesired but cannot be avoided a priori. However, one can apply a modification strategy that allows us to continue following a gradient path starting from x and to repeat this procedure until x ∉ S.

This can be done by stretching the sphere into an ellipsoid and transforming the axes in such a way that x lies closer to z and surely still within a modified pole region S̃. Note that the pole region itself is just a result of the parameterization of the distance function; the effective distance always depends on the choice of the distance function and must not be confused with the Euclidean distance. Such a method is described in Section 6.2.5 via simple matrix operations on the shape matrix Q of the distance function. Once such a modification is made, we have to ensure that x is no longer an artificial local minimum of the tunneling function. Then we can use this point as another initial value for the gradient path ODE. We summarize this as Algorithm 6.7: if it terminates, it yields a local minimum of the objective function that differs from z.

The modification of the pole region has two components. One is the alignment of the


Algorithm 6.7 Escape Algorithm
  Define S as a sphere centered at z. Choose x(0) ∈ S. k ← 0.
  while x(k) ∈ S do
    Calculate x(k+1) ← lim_{t→∞} ϕ_{x(k)}(t).
    Average direction p ← (1/k) Σ_{i=2}^{k+1} (x(i) − z).
    Modify the shape matrix Q to obtain S̃ such that ‖∇f(x(k+1))‖ > 0.
    S ← S̃. k ← k + 1.
  end while

ellipsoid to the average direction. The other component is the stretching factor that expands the volume of the pole region. If the modification is chosen such that it yields growing pole regions in every step, even if the (k + 1)-th iterate x(k+1) is closer to z than the previous iterate x(k), the size of the pole region will eventually reach the boundary of the original ABA. Only then might we be able to find a gradient path out of the pole region. A special case in this iteration occurs when the pole region S is a sphere or close to a sphere; this holds for the initial iteration and if subsequent iterates yield an average direction p ≈ 0.

We show that this strategy works in test cases. For a practical implementation of such an algorithm we need to solve the gradient path subproblems in each step. Obviously, close to a singularity, the norm of the gradient of the tunneling function can get arbitrarily large, resulting in a very stiff and numerically hard to solve initial value problem. Instead of solving the original problem, we apply a damping strategy and solve the damped gradient path problem

ϕ′x(t) = −∇̃f(ϕx(t)),  ϕx(0) = x. (6.5.24)

We use the strictly monotone logarithmic damping

∇̃f(x) := ( ln(‖∇f(x)‖ + 1) / ‖∇f(x)‖ ) ∇f(x), (6.5.25)

that preserves the relevant properties of the gradient path. For very small gradients the damping is close to the identity mapping, while very large gradients become relatively small. This yields a more homogeneous gradient field and appears to be necessary for our experiments, since otherwise the ODE would become very stiff and unsolvable by standard methods.

In Figs. 6.24 and 6.25 we see exemplary runs of an escape algorithm. The objective function is Himmelblau's function in two dimensions. The figures also show the effect of a tunneling function on the gradient field. Here, we use the gradient field to visualize the basins of attraction instead of a sample of gradient paths; for the plots, the gradients were all scaled to have norm 1.

In each iteration we solve the subproblem by MATLAB's routine ode15s for stiff problems with standard parameters. This routine uses implicit multistep methods (numerical differentiation formulas or, alternatively, BDF formulas) with error control. For these examples, we only perform a very rough approximation of the gradient path's limit points: we choose a fixed final time for the ordinary differential equation and do not check convergence. We choose the final time
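The damping (6.5.25) is a one-line rescaling of the gradient field. The sketch below (Python/NumPy; the helper name is ours) shows that tiny gradients are left essentially unchanged while huge gradients are compressed to length ln(‖∇f‖ + 1):

```python
import numpy as np

def damped(g):
    # Logarithmic damping (6.5.25): keep the direction of g,
    # but rescale its length from ||g|| to ln(||g|| + 1).
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return g
    return (np.log(norm + 1.0) / norm) * g

tiny = damped(np.array([1e-8, 0.0]))  # ~ unchanged, since ln(1 + s)/s -> 1 as s -> 0
huge = damped(np.array([1e6, 0.0]))   # length compressed to ln(1e6 + 1), about 13.8
```

Because the damped field has the same direction as the original gradient everywhere, the paths (and hence the basins) are unchanged; only the parameterization in t differs.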


Figure 6.24.: An exemplary run (escape iterations 1–6) of an escape algorithm on a tunneled Himmelblau function, visualized by its gradient field with gradients scaled to norm 1. The ellipticity parameter is σ = 2, compare with Section 6.2.5. In each step, the sizing parameter δ grows by the factor 1.2. The red line shows the gradient path that is followed piecewise in each iteration of the Escape Algorithm 6.7.


Figure 6.25.: Another exemplary run (escape iterations 1–5); here, we used the ellipticity parameter σ = 1.2, and δ grows by the factor 1.4.


T = 1 and recall that the length of the gradient path is just

L(ϕx; t) = ∫_0^t ‖ ( ln(‖∇f(ϕx(s))‖ + 1) / ‖∇f(ϕx(s))‖ ) ∇f(ϕx(s)) ‖ ds
         ≤ ∫_0^t ln(‖∇f(ϕx(s))‖ + 1) ds
         ≤ t ln( max_{s∈[0,1]} ‖∇f(ϕx(s))‖ + 1 ).

Thus, moving by a time of 1 along the damped gradient path yields a path length that is bounded from above by a (usually) small positive value. In our tests, T = 1 turned out to be a reliable choice for the damped gradient path to reach a point close to a local minimum of the tunneling function. This might be due to the fact that the norm of the damped gradient does not vary much in any case. Although the initial value problem is not solved efficiently by the chosen method, the results demonstrate the potential of the escape algorithm idea. As for all global optimization techniques that are based on sequential local minimization, the choice of an efficient local solver is essential and is significantly responsible for the method's performance.


Chapter 7.

Numerical Results

In this chapter, we present numerical results including simulation, sensitivity analysis, optimization and optimal control of the simulation models described in Chapter 2, using the methods presented in Chapters 4 and 6. This includes the software extension to gPROMS enabling the full parametric sensitivity analysis of large-scale steady-state models, the use of the newly suggested method for time-optimal control, and the modified global optimization techniques based on the tunneling algorithm idea.

First we discuss the idea of local minimization using methods for stiff ordinary differential equations and present an approach to make this method capable of handling bound constraints and general linear constraints by projected gradients. We show that the algorithm solves standard problems and apply it to an error-optimal control problem to analyze the structure of the problem.

7.1. A Projected Gradient Path Method for Linearly Constrained Problems

We present a new implementation of a method for solving the gradient path initial value problem for box-constrained minimization problems involving additional linear equality constraints. The main idea is the combination of an implicit multistep method with a projected gradient approach. For the analysis of gradient projection methods we refer to [CM87]. We briefly discuss the main ideas and give the full MATLAB® code in Appendix A.

We seek to solve the gradient path system

x′ = −∇f(x), (7.1.1)
x(0) = x0, (7.1.2)

using linear multistep methods, more precisely the BDF methods, in such a way that we find the minimum of the function f. This is in principle a way to attack the unconstrained smooth minimization problem. But now let us assume that we are given a feasible set Ω ⊂ Rn of the form

Ω := {x ∈ Rn : Ax ≥ b, Cx = d}. (7.1.3)

Consider the problem

min_{x∈Ω} f(x) (7.1.4)

for a certain initial guess x0. The set Ω is the intersection of a convex polyhedron and affine hyperplanes and is therefore convex; we will need this property later. In box-constrained minimization, the matrix A and the vector b have the special structure

A = [ I ; −I ],  b = [ l ; −u ], (7.1.5)

where the blocks are stacked vertically, I is the identity matrix of dimension n, and l, u ∈ Rn are vectors of lower and upper bounds on the variables.

Clearly, calculating the gradient path to its limit point does not solve this problem, since ∇f(x) = 0 is in general not a necessary condition for the constrained problem.

L(x, λ, µ) = f(x) − λT(Ax − b) − µT(Cx − d), (7.1.6)

and this leads to the KKT conditions

∇xL(x∗, λ, µ) = ∇f(x∗) − ATλ − CTµ = 0, (7.1.7)
λT(Ax∗ − b) = 0, (7.1.8)
λ ≥ 0. (7.1.9)

The gradient path itself does not necessarily lie within Ω, and in general it does not lead to a KKT point. We use the idea of the projected gradient to modify the gradient path in a way that it satisfies the linear constraints.
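As a small illustration of the feasible set (7.1.3) and the box-constraint structure (7.1.5), the following Python/NumPy sketch (function names chosen here, not taken from the thesis code) builds A and b from bound vectors and tests membership in Ω:

```python
import numpy as np

def box_to_linear(l, u):
    """Encode l <= x <= u in the form A x >= b with A = (I; -I), b = (l; -u)."""
    n = len(l)
    I = np.eye(n)
    A = np.vstack([I, -I])
    b = np.concatenate([l, -u])
    return A, b

def in_omega(x, A, b, C=None, d=None, tol=1e-12):
    """Check membership in Omega = {x : A x >= b, C x = d}."""
    ok = bool(np.all(A @ x >= b - tol))
    if C is not None:
        ok = ok and np.allclose(C @ x, d, atol=tol)
    return ok

l = np.array([0.0, 0.0])
u = np.array([1.0, 2.0])
A, b = box_to_linear(l, u)
print(in_omega(np.array([0.5, 1.5]), A, b))  # True: inside the box
print(in_omega(np.array([1.5, 1.5]), A, b))  # False: x1 exceeds its upper bound
```

The same helper accepts an optional equality block (C, d), matching the structure of Ω used throughout this section.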

7.1.1. Projected Gradient

Assume that we have a current working subspace

Ω= = {x ∈ Ω : Wx = w} (7.1.10)

with

W = ( C ; aTi ),   w = ( d ; bi ),   i ∈ W(x) ⊂ {1, . . . , nineq}, (7.1.11)

and W(x) is a current guess for the active set at the solution. Each index in this set reduces the dimension of the problem from Ω to Ω= by one. Assume that we are given a point x ∈ Ω=; then the gradient of f at this point might point out of the feasible set, that is, when

x − h∇f(x) ∉ Ω= for all h > 0. (7.1.12)

At such a point the gradient path is not suitable to produce further progress in minimizing f. Taking the orthogonal projection from a point in Rn onto a lower-dimensional linear subspace, we obtain the projected (or reduced) gradient

∇Ω= f(x) = P ∇f(x), (7.1.13)

where

P := Z(ZTZ)−1ZT (7.1.14)

and Z is an orthonormal basis of the null space (kernel) of W, that is, of {x : Wx = 0}. By applying this projection to the gradient of f at x we get a direction which leads us strictly along the boundary. Thus we restrict the search directions to a linear subspace of lower dimension. In the case of bound constraints this simply means that the entry of the projected gradient belonging to the active bound is set to zero.

The question is how to determine the components of W so that the projected gradient can be calculated. We give an idea of how to find the blocking constraints in the case of simple bound constraints as used in our application problem. If equality constraints are present, we will always have a nontrivial projection onto a linear subspace. Assume that x satisfies Cx = d and also aTi x = bi for a certain constraint i, that is, the i-th variable is at one of its bounds. Then it is not clear whether this bound is blocking and projection is needed. If the projected gradient of f with respect to the subspace of the equality constraints does not point out of the feasible region, then the bound is not blocking and we do not need to include this constraint in the matrix W for the projection. Generally, if d is a direction and x is a point with aTi x = bi, then it holds for α > 0,

xα := x+ αd,

that

aTi xα − bi = aTi (x + αd) − bi
            = aTi x + α aTi d − bi
            = α aTi d.

This means that d points out of the feasible set if aTi d < 0 and into the feasible set if aTi d ≥ 0. An immediate consequence is that the projected gradient is not continuous when reaching the boundary of the feasible set. This needs special handling when solving the initial value problem: the projected gradient path is treated as piecewise-smooth, the change in the gradient is located, and the iteration is re-initialized at that point.

Consider again the problem of moving along a line and assume that aTi x − bi > 0 and aTi d < 0. Then we can limit the step size by setting

α ≤ (bi − aTi x) / (aTi d) (7.1.15)

to ensure that

aTi xα − bi ≥ 0. (7.1.16)

Limiting the step so that moving along the line of d does not violate the constraints can be applied for a linear predictor step as described in the next section.
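The two ingredients of this subsection, the null-space projection (7.1.13)-(7.1.14) and the step length limit (7.1.15), can be sketched in a few lines of Python/NumPy (an illustration under our own naming, not the thesis implementation):

```python
import numpy as np

def projected_gradient(grad, W, tol=1e-12):
    """Project grad onto ker(W) = {x : W x = 0} via an SVD null-space basis Z."""
    U, s, Vt = np.linalg.svd(W)
    rank = int(np.sum(s > tol))
    Z = Vt[rank:].T              # orthonormal basis of the kernel of W
    return Z @ (Z.T @ grad)      # P grad with P = Z Z^T

def max_step_to_boundary(x, d, A, b):
    """Largest alpha with A(x + alpha d) >= b, assuming A x >= b holds at x."""
    Ad = A @ d
    slack = A @ x - b            # nonnegative at a feasible point
    blocking = Ad < 0            # constraints the direction moves toward
    if not np.any(blocking):
        return np.inf
    return np.min(slack[blocking] / (-Ad[blocking]))

# one equality constraint x1 + x2 = const: the projected gradient stays in its plane
W = np.array([[1.0, 1.0, 0.0]])
g = np.array([3.0, 1.0, 2.0])
pg = projected_gradient(g, W)
print(pg)                        # approx (1, -1, 2); W @ pg is ~0

# ratio test for the bound x1 >= 1.2 when moving in direction d = (-1, 0)
A = np.array([[1.0, 0.0]])
b = np.array([1.2])
print(max_step_to_boundary(np.array([2.0, 1.0]), np.array([-1.0, 0.0]), A, b))
```

For the last call the slack is 0.8 and the directional rate is −1, so the maximal step lands exactly on the boundary x1 = 1.2.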

7.1.2. The BDF method for the projected gradient path

Assume that the current working set has been identified, that is, we know which of the inequality constraints are active. Let g(x) := ∇Ω= f(x) be the gradient projected onto the associated linear subspace. The ODE to solve is then

x′ = −g(x), (7.1.17)
x(0) = x0. (7.1.18)


The backward-difference formula of order k applied to this equation gives

∑_{i=0}^{k} αi xn−i = −h g(xn), (7.1.19)

which can be written as the nonlinear system

F (x) := α0x+ hg(x) +Hk = 0, (7.1.20)

where Hk = ∑_{i=1}^{k} αi xn−i is the history of the last k points. In the sense of a predictor-corrector method, we can guess the solution of F(x) = 0 by an explicit method such as the (s + 1)-step Adams-Bashforth (AB) method (we choose s := k − 1),

x(AB)n = xn−1 + h ∑_{i=0}^{s} βi g(xn−1−i), (7.1.21)

for the AB coefficients βi. The nonlinear system can be solved using the AB predictor as an initial guess for Newton's method. We do this simply by applying MATLAB's method fsolve, which uses a trust-region dogleg method and approximates the Jacobian by finite differences. Alternatively, Levenberg-Marquardt or Gauss-Newton methods can be chosen. Here might be a good opportunity to improve the solver's performance: we could expect a simplified Newton method to perform well, whereas approximating the Jacobian by finite differences may be unacceptably slow. The Jacobian of the function F : Rn → Rn from (7.1.20) has the form

∇F(x) = α0 I + h∇g(x) (7.1.22)
      = α0 I + h∇2p f(x), (7.1.23)

where ∇2p f(x) can be understood as the projected Hessian of the function f. One could think of quasi-Newton update formulas to approximate Hp ≈ ∇2p f(x(0)) and try a simplified Newton method with the resulting approximation to the Jacobian of F. This would mean solving the linear iteration

(α0 I + hHp)(x(n+1) − x(n)) = −F(x(n)) for n = 1, 2, . . . , (7.1.24)

until ‖F(x(n))‖ is smaller than a specified tolerance.

Without giving a proof, we assume that the predictor as well as the BDF iterate lie in Ω= if the history data lies completely in Ω=. For the explicit predictor this is easy to see: the s projected gradients from the history are assumed to be projected, which means that moving along their directions keeps the constraints satisfied, and a linear combination of the projected gradients does the same. However, this only holds for the equality constraints defining the linear subspace. Inequality constraints might be violated in the next step to be computed. There are different scenarios:

• predictor feasible & implicit step feasible: do nothing

• predictor feasible & implicit step infeasible: see treatment below

• predictor infeasible & implicit step feasible: do nothing


• predictor infeasible & implicit step infeasible: limit predictor step to the boundary

In the last case the explicit predictor formula can be used to find a suitable step length reduction that leads the iterate onto the boundary. Alternatively, one could use the BDF formula and a bisection on the step length to identify the position where the boundary of the feasible set is met. However, this seems inefficiently expensive because several gradient evaluations would be necessary to locate the intersection of the BDF path with the boundary. We therefore implemented a hybrid guess for the intersection of the gradient path and the boundary. Let h be the current step size; then

x(1)n := xn−1 + h · (aTmin xn−1 − bmin) / (aTmin xn−1 − aTmin x(AB)n) · ∑_{i=0}^{s} βi g(xn−1−i) (7.1.25)

reduces the explicit AB step to the boundary, where amin and bmin refer to the constraint with the lowest value. By

x(2)n := xn−1 + (aTmin xn−1 − bmin) / (aTmin xn−1 − aTmin x(BDF)n) · (x(BDF)n − xn−1), (7.1.26)

we get a linear reduction of the suggested BDF step onto the boundary. Since both x(1)n and x(2)n satisfy aTmin x(i)n − bmin = 0 for i = 1, 2, we can guess the intersection with the boundary and set

xn := (x(1)n + x(2)n) / 2, (7.1.27)

which is also a feasible point. Alternatively, we could choose the one with the lower function value. When the boundary is located, the method has to be restarted if a discontinuous change in the projected gradient occurred. This is done analogously to the general startup procedure.

Now we have a point on the boundary, obtained by activating the associated inequality. We have to identify the active constraints in order to have the information needed for the projection. The following situation might occur: the unprojected gradient satisfies

aTi (−∇f(x)) ≥ 0 (7.1.28)

at a boundary point x. This means that the explicit Euler step will produce a feasible point if the step size is chosen sufficiently small. However, the implicit Euler step does not necessarily satisfy this condition: the explicit step points into, the implicit step out of the feasible set. In that case, we could follow the predictor into the feasible set or – and we will prefer that – recalculate the implicit step for a fixed projection onto that boundary i.

Startup

The implicit k-step method for k > 1 has to be initialized by appropriate methods in order to have the history information needed. It might be possible to perform a simple backtracking line search, which can be seen as explicit Euler steps with a step size that ensures good progress. We choose to increase the 'working order' stepwise from 1 to k if k > 1; in that way we perform implicit steps of the maximal possible order. Alternatively, one could think of applying a single-step method with error control to get a good start by identifying the shape of the gradient path to be computed.


7.1.3. Step Size Control

We have to keep the following in mind.

We do not want to solve the initial value problem!

We would like to solve the linearly constrained minimization problem, that is, to find a KKT point. But we cannot work with fixed step sizes h: solving the gradient path problem with a fixed-step explicit Euler method is equivalent to trying to solve the minimization problem by a gradient method without any line search, and we cannot expect such a method to converge to a local minimum.

Recall the downhill property of gradient paths: the function value of f is strictly decreasing while following the gradient path. Now, instead of applying error control to the implicit multistep method, we perform a simple backtracking search on the step size h until the downhill property is met. We formulate this more generally as

find h : f(x(BDF)n) < f(xn−1) − σ h g(xn−1)T ∇f(xn−1), (7.1.29)

where σ is an Armijo parameter requiring the BDF step to produce a certain minimum decrease of f. The minimum decrease is formulated by comparing the actual decrease of the BDF step with a pure projected gradient step along a line. Note that x(BDF)n depends on the choice of h. The step size is chosen within bounds that are given by the grid history; see below for information about the virtual grid. The maximal step is consequently limited by the sum of all step sizes performed since the last re-initialization. In general it does not hold that an implicit step produces descent on the function for all h smaller than a certain bound. A counterexample is found by trying to solve a one-dimensional quadratic minimization problem with the implicit Euler method, which relates to a line search using the slope at the next iterate instead of the current one. So it might be worth experimenting with values σ < 0 if h is already chosen very small.
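To illustrate the downhill backtracking on h, here is a small Python/NumPy sketch of the implicit Euler (BDF1) step combined with an Armijo-type acceptance test in the spirit of (7.1.29), using the descent direction −g; the stiff quadratic model problem and all names are our own illustrative choices:

```python
import numpy as np

Q = np.diag([1.0, 100.0])      # stiff quadratic model: f(x) = 0.5 x^T Q x

def f(x):
    return 0.5 * x @ Q @ x

def implicit_euler_step(x_prev, h):
    """BDF1 for x' = -grad f(x) = -Q x: solve (I + h Q) x_new = x_prev."""
    return np.linalg.solve(np.eye(2) + h * Q, x_prev)

sigma = 0.1                    # Armijo-type parameter
x = np.array([1.0, 1.0])
for _ in range(30):
    h = 10.0                   # start optimistic, then backtrack on h
    while True:
        x_new = implicit_euler_step(x, h)
        g = Q @ x              # gradient at the previous iterate
        if f(x_new) < f(x) - sigma * h * (g @ g):   # downhill requirement
            break
        h *= 0.5
    x = x_new
print(f(x))   # driven towards 0; the accepted h exceed the explicit stability limit
```

Note that the accepted steps are much larger than the stability bound h < 2/100 of explicit Euler on this problem, which is the point of using the implicit method on stiff gradient paths.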

Reducing the Order

We do not discuss order control intensively. The only type of order control applied here is that the order k > 1 is reduced to 1 if the above inequality cannot be satisfied even for small step sizes h. We use this as an indication of a hard change in the path direction; higher order formulas might then not be appropriate for producing valid iterates. After the successful reduction, the order can be restored stepwise to its former value as in the startup phase.

Virtual Grid

The BDF and the AB method need the last k equidistant grid points. Since we do not apply a constant step size, the spacings in the iteration grid are not equal. We interpret the iteration history as a set of piecewise-linear grid functions, for the gradient information and for the iterates themselves. These functions can be evaluated at every point in the history. Note that the history only dates back to the most recent re-initialization, which might have been caused by a boundary violation.

Here, we need convexity of the feasible set: every point in the history must be feasible for the multistep methods to produce a feasible iterate. For equality constraints it is obvious that the line between two feasible points is also feasible. The satisfaction


of the inequality constraints is only guaranteed if the feasible set is convex; then the line connecting two interior points lies completely in the interior of the feasible set. Convexity is also needed for the gradient history to be feasible in the sense that the combined direction does not violate the constraints at the current iterate. We calculate such a virtual grid by the following MATLAB function.

function [history] = virtual_grid(x, k, h, step)
% returns the virtual grid of k equidistant points
% raw data is given in x
% the step size history is given in step
n = length(step);
for i = 1:n
    stepping(i) = sum(step(n:-1:n-i+1));
end
l = size(x,2);
% last node is always last
history(:,1) = x(:,l);
for i = 2:k
    back = (i-1)*h;
    % find position of last upper neighbor
    p = min(find(stepping >= back)) - 1;
    if (p == 0)
        dp = back/step(n-p);
    else
        dp = (back - stepping(p))/step(n-p);
    end
    % interpolate linearly
    displacement = x(:,l-p-1) - x(:,l-p);
    history(:,i) = x(:,l-p) + dp*displacement;
end
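For readers less familiar with MATLAB indexing, a direct Python/NumPy transcription of this routine may be helpful (a sketch under the assumption that k·h does not reach further back than the recorded history, as guaranteed by the step size bounds above):

```python
import numpy as np

def virtual_grid(x, k, h, step):
    """Python sketch of the MATLAB virtual_grid routine.

    x    : (dim, l) array of iterates, most recent point in the last column
    k    : number of equidistant virtual grid points to return
    h    : spacing of the virtual grid
    step : step-size history, step[-1] is the most recent step
    """
    step = np.asarray(step, dtype=float)
    # stepping[i-1] = distance from the last node back over the last i steps
    stepping = np.cumsum(step[::-1])
    l = x.shape[1]
    history = np.empty((x.shape[0], k))
    history[:, 0] = x[:, -1]            # last node is always included
    for i in range(1, k):
        back = i * h                    # distance to walk backwards
        p = int(np.argmax(stepping >= back))  # first segment reaching 'back'
        if p == 0:
            dp = back / step[-1]
        else:
            dp = (back - stepping[p - 1]) / step[-1 - p]
        # interpolate linearly between the two enclosing raw nodes
        displacement = x[:, l - p - 2] - x[:, l - p - 1]
        history[:, i] = x[:, l - p - 1] + dp * displacement
    return history

# one-dimensional example: nodes 2, 8, 10 with step sizes 2 and 1
x = np.array([[2.0, 8.0, 10.0]])
print(virtual_grid(x, 3, 1.0, [2.0, 1.0]))  # columns 10, 8, 5
```

In the example the virtual point at distance 2 falls halfway into the older interval of length 2, giving the interpolated value 5.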


7.1.4. Results on Rosenbrock’s function

Figure 7.1.: The norm of the gradient as a function of the number of iterations performed, for k = 1, 2, 3, 4. It shows that k = 1 yields the fastest convergence in this case.

We show the results of the algorithm given in Appendix A applied to a standard test function, namely Rosenbrock's banana-shaped function from [Ros60],

f(x1, x2) = (1 − x1)² + 100(x2 − x1²)². (7.1.30)

Denote the variable x = (x1, x2)T. We give the results for the unconstrained problem of minimizing f over x ∈ Rn when starting at x0 = (2, 1.2)T. The global minimum of f is at (1, 1)T with f(1, 1) = 0. Figures 7.2, 7.3, 7.4 and 7.5 show the result graphs for the numerical solution of the unprojected gradient path of Rosenbrock's function. The results show that we get convergence to the minimum, see Fig. 7.1, with a measured rate of convergence that is at least locally linear (with order p > 1). However, the performance depends on the order k of the chosen BDF method; the best results are obtained for k = 1. We analyzed the behavior of the method for orders up to k = 4 and give the results in the following table.

order  iterations needed  function evaluations  order reduced  tolerance
1      21                 322                   0              10^-10
2      36                 532                   1              10^-10
3      40                 616                   1              10^-10
4      49                 685                   1              10^-10

In each case with k > 1, there was a position where no step size h could be found that produced a descent on the function, so the order was reduced to 1 for that iteration.

We interpret the results as positive. Higher orders generate better approximations to the smooth gradient path and are therefore more expensive. Working with the implicit Euler method seems to be promising; higher order BDF formulas are not necessarily suitable to produce a descent on the function in every case. The reason why the algorithm takes 10-20 function evaluations per iteration is that the implicit BDF equation is solved by MATLAB's standard solver fsolve for a given tolerance, where the Jacobian has to be approximated. This refers to the problem of approximating the Hessian of the function. We would expect a simplified Newton method to perform well and save function evaluations.

Figure 7.2.: Trajectories of the solution of the gradient path for Rosenbrock's function with the implicit BDF method of order k = 1. Note that the logarithmic scale shows the stiffness of the problem.

Showing high performance of the gradient path method is not within our scope; rather, we want to show that it generally works with this kind of problem. We extend the problem of minimizing Rosenbrock's function with linear constraints. Let

x1 ≥ 1.2 (7.1.31)

be a lower bound on the first variable. Thus

a = (1, 0)T, b = 1.2. (7.1.32)

Recall the algorithm's idea for linearly constrained problems: 'Follow the gradient path to the boundary of the feasible region. Stay on the boundary as long as the BDF iterates would violate the constraints, and return to the interior if they do not.' So we expect the algorithm to produce a sequence of points that leads to x1 = 1.2 and continues on the boundary to the constrained minimum. A priori, we do not know whether the minimum lies within the feasible region or on the boundary. We give the results for k = 1.


Figure 7.3.: The solution path with respect to the coordinates for k = 1.

iter  fval  x1       x2       f(x)      optimality
0     1     2.0      1.2      785       -
1     24    1.49084  1.37043  72.86009  8.33680e+02
2     40    1.32285  1.43368  10.10580  3.30935e+02
3     56    1.24292  1.46545  0.68957   8.52932e+01
4     72    1.21898  1.47455  0.06089   1.25786e+01
5     85    1.21458  1.47495  0.04605   6.06710e-01
6     98    1.21325  1.47268  0.04553   4.26341e-01
7     108   1.21127  1.46792  0.04469   4.67494e-01
8     118   1.20739  1.45853  0.04307   4.59298e-01
9     128   1.20000  1.44067  0.04004   4.61323e-01
10    147   1.20000  1.44003  0.04000   1.73398e-02
11    166   1.20000  1.44000  0.04000   3.32175e-04
12    188   1.20000  1.44000  0.04000   3.21242e-06
13    229   1.20000  1.44000  0.04000   1.56115e-08

The value in the column called optimality is a measure in the KKT sense. In iteration 9, the reduction to the boundary was performed. Here, the minimum lies on the boundary, since a KKT point is found there. We could also impose equality constraints, but this would make the solution of the constrained problem trivial because of the low dimension of Rosenbrock's function.
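The reported solution can be checked directly against the KKT conditions (7.1.7)-(7.1.9): at x* = (1.2, 1.44)T the gradient of Rosenbrock's function is a nonnegative multiple of the active constraint normal a = (1, 0)T. A short Python sketch (our own verification code, not from the thesis):

```python
import numpy as np

def rosen(x):
    """Rosenbrock's function (7.1.30)."""
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def grad_rosen(x):
    x1, x2 = x
    return np.array([-2 * (1 - x1) - 400 * x1 * (x2 - x1**2),
                     200 * (x2 - x1**2)])

x_star = np.array([1.2, 1.44])
a = np.array([1.0, 0.0])        # normal of the active bound x1 >= 1.2
g = grad_rosen(x_star)
lam = g[0]                      # multiplier from the stationarity condition
print(g, lam, rosen(x_star))    # g ~ (0.4, 0), lam = 0.4 >= 0, f = 0.04
print(np.allclose(g - lam * a, 0.0, atol=1e-9))  # stationarity holds
```

Since x2 = x1² at this point, the second gradient component vanishes, the multiplier λ = 0.4 is nonnegative, and f(x*) = 0.04 matches the last column block of the table above.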

About Simplified Newton’s Method

The idea of solving the simplified Newton iteration (7.1.24) is applied, where the Jacobian at the current iterate is approximated by using Broyden's update (4.1.31) to estimate the Hessian of the objective function and transform it into the current Jacobian. The code is augmented to test this idea: it first tries to solve the simplified Newton iteration up to a given tolerance and accepts the solution in case of success; otherwise it uses MATLAB's routines to solve the nonlinear equation, which is usually more costly since fsolve approximates the Jacobian using finite differences. We solved the unconstrained optimization problem up to the tolerance of 10^-10 several times. In the table below, the columns 'fsolve stats' and 'simple stats' give the usage statistics of the fsolve method and the simplified Newton iteration, respectively, in the format 'number of iterations / evaluation counter (numbers in failed tries)'. We limit the maximal number of iterations in the simplified Newton method to 12 and choose a tolerance of 10^-4 on the residual of the implicit BDF equation. We expect the iteration to diverge when either the approximation to the true Hessian of the objective function is bad or the initial guess is too far from the solution. Thus, computational overhead might be due to failed tries of the simplified Newton method. In the following table the number of iterations of the BDF method is called 'accepted iterations' to emphasize that a BDF step is only accepted when it produces a decrease of the objective function. The numbers in brackets are those obtained with pure fsolve; compare with the table above.

Figure 7.4.: Trajectories of the solution of the gradient path for Rosenbrock's function with the implicit BDF method of order k = 2.

Figure 7.5.: Trajectories of the solution of the gradient path for Rosenbrock's function with the implicit BDF method of order k = 3.

order  accepted iter  fval       fsolve stats  simple stats   tolerance / max. iter newton
1      22 (21)        367 (322)  8/165         14/83 (96)     10^-4 / 12
2      38 (36)        549 (532)  11/190        35/180 (132)   10^-4 / 12
3      43 (40)        634 (616)  15/237        34/167 (180)   10^-4 / 12
4      58 (49)        799 (685)  13/178        105/346 (156)  10^-4 / 12

Successful simple iterations need significantly fewer function evaluations than the trust-region dogleg method because they work without finite difference approximations. However, diverging iterations destroy any benefit. It can be seen that the algorithm using the simplified Newton iteration needs slightly more iterations of the BDF method, which means that the progress is slower due to inaccurate iterates. This could be eliminated by using a finer tolerance on the residual of the Newton iteration, but that can be costly because of the slow convergence of the simplified Newton method.

Figure 7.6.: Left: the length of interval 2, that is x1, along the computed gradient path. Right: x3 and x4, the thick stock flow rates for intervals 2 and 3. The length of interval 3 is not shown because it is given by 100 − x1.

Although the results are not convincing, they show that the choice of the method for solving the corrector equation must be discussed when revisiting such kinds of algorithms. It is known that the predictor must be chosen in such a way that it fits the choice of the corrector equation. In the literature discussed before, there are several strategies based on predictor polynomials which might generate better initial guesses for the corrector equation, which is indispensable for a simplified Newton method to perform well.
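A minimal sketch of the fixed-Jacobian part of iteration (7.1.24) (without the Broyden update; the model function and all numbers are our own illustrative choices) shows why a merely approximate Jacobian can suffice:

```python
import numpy as np

alpha0, h = 1.0, 0.1
Q = np.diag([1.0, 50.0])        # stand-in for the projected Hessian of f
H = np.array([-1.0, -2.0])      # stand-in for the BDF history term H_k

def F(x):
    """Implicit BDF equation F(x) = alpha0*x + h*g(x) + H_k with g(x) = Q x."""
    return alpha0 * x + h * (Q @ x) + H

# simplified Newton: one fixed, deliberately inexact Jacobian approximation
J = alpha0 * np.eye(2) + h * np.diag([1.0, 40.0])
x = np.zeros(2)
for _ in range(50):
    x = x - np.linalg.solve(J, F(x))
print(x, np.linalg.norm(F(x)))  # converges: the error contracts in each sweep
```

Here the stiff component of the Hessian is underestimated (40 instead of 50), yet the iteration still contracts with factor |1 − 6/5| = 0.2 per sweep; when the approximation is too far off, the iteration diverges, which is exactly the failure mode recorded in the 'failed tries' column above.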

7.1.5. Specification-Error-Optimal Control

We use the presented algorithm to solve the error-optimal control problem (4.3.35), which addresses the time-optimal control problem inexactly. The model used is the simple wet-end presented in Chapter 2. The control variable responsible for the thick stock flow at the input is discretized over three intervals. The first interval is fixed to a certain length, and on it the variable is also fixed to a defined value. For each remaining interval we have to choose its length and the control over the interval. This yields a problem dimension of n = 4. All controls have lower and upper bounds. Furthermore, we assume that the overall time horizon is fixed. The information is stored within a single control variable x of the following structure:

x = (time interval 2, time interval 3, control in interval 2, control in interval 3)T. (7.1.33)

So we get a total of 9 constraints. These include 8 inequality constraints given by


Figure 7.7.: Left: the objective function along the gradient path for the first 10 seconds of the gradient path. Right: the path of the norm of the gradient. Both figures have a logarithmic x-axis because of the fast changes in the first milliseconds.

A =
( 1  0  0  0 )
( 0  1  0  0 )
( 0  0  1  0 )
( 0  0  0  1 )
(−1  0  0  0 )
( 0 −1  0  0 )
( 0  0 −1  0 )
( 0  0  0 −1 ),   b = (0, 0, 0, 0, −100, −100, −0.5, −0.5)T. (7.1.34)

The time interval lengths have to be chosen between 0 and 100 and the controls between 0and 0.5. The fixed time horizon is described by

C = (1 1 0 0), d = 100, (7.1.35)

to ensure that the sum of the interval lengths is always 100. Recall that the length of the time horizon is crucial for the error-optimal control, since its value is an integral over the whole length. We start at x = (10, 90, 0.3, 0.28)T. We cannot plot the gradient path approximation in 4 dimensions, but we can restrict ourselves to subspaces. Clearly, the variables x1 and x2 will always form a straight line when plotted against each other, because of the equality constraint and the projection onto the associated subspace.

The first 76 iterations of the gradient path algorithm for k = 1 give the plots in Figs. 7.6 and 7.7. The use of the projected gradient guarantees that the equality constraint always holds. We have to plot the graphs with a logarithmic x-axis in order to be able to see the shape of the gradient path, which acts somewhat like a characteristic of the considered problem. The plots do not show a complete path but only the first 10 seconds of the iteration. What we can see so far is an immediate and rapid change of the variables that represent the control values and almost no change in the interval length variables. This change stops after about 10^-5 seconds of iteration time, and slowly we can


see that a change in the interval lengths is about to take place. So we only observe either very rapid or very slow change of the variables along the gradient path. This can be explained by the simple fact that the objective function is very sensitive to small changes in the variables: generally, there are many more bad ways to control a system than good ones. A random control is very likely to be a very bad choice, and this explains the rapid decrease of the objective function value in the first milliseconds. Furthermore, the choices of a control interval length and of the control value for that interval are strongly related; choosing the time intervals has a very strong effect on the possible choices of the control values. From the shape of the gradient path in the first milliseconds we can see that the initial choice of control values was very bad for the given choice of interval lengths. We can roughly interpret the gradient path as the procedure of finding a sufficiently good choice of control values for the current set of interval lengths in order to have a chance to find a different set of interval lengths. But it seems that, no matter which point we choose, one of the variables is always blocking in the sense that the objective function's sensitivity changes rapidly and we cannot change it without making the others a bad choice.
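As a sanity check on this setup, the constraint data (7.1.34)-(7.1.35), with the upper bounds written as −xi ≥ −ui as in (7.1.5), and the starting point can be verified in a few lines (a Python sketch, not part of the thesis code):

```python
import numpy as np

# inequality constraints A x >= b: bounds 0 <= t2, t3 <= 100 and 0 <= u2, u3 <= 0.5
A = np.vstack([np.eye(4), -np.eye(4)])
b = np.array([0.0, 0.0, 0.0, 0.0, -100.0, -100.0, -0.5, -0.5])

# equality constraint C x = d: the interval lengths sum to the fixed horizon
C = np.array([[1.0, 1.0, 0.0, 0.0]])
d = np.array([100.0])

x0 = np.array([10.0, 90.0, 0.3, 0.28])   # starting point of the iteration
print(np.all(A @ x0 >= b), np.allclose(C @ x0, d))  # both True: x0 is feasible
```

The starting point lies strictly inside all bound constraints, so initially only the equality constraint is projected onto.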

7.1.6. Remarks on the Algorithm

The presented algorithm is far from perfect. There are many promising ways in which the code can be improved, including the choice of step lengths, automatic order selection, a suitable startup and, primarily, an efficient solution technique for the implicit equation. One could even think of predictor-corrector schemes that effectively solve the implicit equation by Newton's method. In the end it might be useful to combine a strategy that weights integration error and descent on the function along the solution in order to choose a suitable step size and order.

As explained, we chose the order of the predictor according to the order of the BDF formula. Higher order predictors did not yield a reduction in function evaluations despite the better initial guesses. This is mainly because higher order formulas restrict us to smaller maximal step lengths, since more history information is needed when increasing the order while keeping the step size. It turned out that the choice of the initial step length for starting the backtracking with the BDF equation is crucial for the performance of the algorithm with respect to iterations and function evaluations. One might think of applying the fixed-leading-coefficient form of the BDF formulas, as explained in [MP96], in order to have good step-size and order control.

The algorithm solved the unconstrained Rosenbrock problem as well as the linearly constrained problem by following the gradient path to its minimum or to a KKT point, respectively. Compared to a real application problem, the academic problem is considerably easier. The presented wet-end optimal control problem yields an objective function that is not defined everywhere, so we have to take care of this fact by restricting ourselves to feasible points. This means that the BDF iterate cannot lie outside of the feasible region; instead, the implicit corrector equation simply has no solution there.


7.2. Optimal Control of the Wet-End Process

We discuss grade change problems arising in the simulation of wet-end processes. The grade change problem can be seen as a time-optimal control problem where a parameterized control function has to be found in such a way that the system moves from a current state to a desired stable end state. In paper production this can simply mean changing the sort of the product as fast as possible in order to minimize the waste of material, time and energy. In the end, the costs are dominated by the factor time, because variable production costs can be formulated as integrals over time or are just constants multiplied by time.

The formulation of the grade change problem as a time-optimal control problem with a fixed finite time is now used to solve optimal control problems for exemplary grade changes. We showed that the discretization of the transport problem is significant for the impulse response of the dynamic wet-end system. Since a piecewise-constant control is just a sequence of impulses, we can expect that the optimal control solution depends on the discretization of the transport problem: with a coarse discretization, the system has a high inertia and the responses are smooth. Taking this phenomenon into account, we are interested in general structures of optimal controls more than in absolute values of interval lengths and control magnitudes. We use a fine discretization in order to have sharp impulse responses and expect that this allows us to identify optimal control structures. We take 100 discretization nodes per meter for the pipes in the simple wet-end model. This leads to a DAE model with 4508 differential and 91 algebraic variables.

7.2.1. A Grade Change Example

We seek to analyze the dependency of the error-optimal solution on the number of control intervals that discretize the time horizon, and compare the results with time-optimal solutions found by binary search. The problem we discuss is to find a piecewise-constant control of the thick stock flow rate in order to change the substance of the paper from ws = 70 to ws = 90 in minimal time (for time-optimal control) or with a maximum of specification stability (for error-optimal control).

In Fig. 7.8 it is shown that an error-optimal control is not necessarily a time-optimal control. But we have to notice that the time-optimal solution trajectory shown is very sensitive to model inaccuracies or small changes in the control parameters, especially the length of the first control interval. The larger the final time T2 is chosen, the more the error-optimal control needs to converge to the center of the specification window, while the time-optimal control does not need to converge to that value. Another drawback of time-optimal controls is that they are mostly similar but not unique. This makes it harder to determine the structure of optimal controls when increasing the number of control intervals. By increasing this number, the problem of minimizing the overall error always leads to a potentially smaller or at least the same error, although the minimal grade change time does not need to decrease. Assume that we are given an optimal control for k control intervals. This leads to an effective control in the system that is a special case of a control with k + 1 intervals. Obviously, the k-control and a suitable (k + 1)-control have the same objective function value. Now one of two possible scenarios must be true: either the (k + 1)-control is optimal with respect to k + 1 control intervals, and then we have found the optimal control structure, or it is not optimal and therefore can be improved. So we can say that the sequence of objective function values for increasing


Figure 7.8.: Comparison of the optimal control strategies. The red line belongs to the control minimizing the overall error and the black line results from the solution of the feasibility problem with Algorithm 4.1. In order to solve the time-optimal control problem, the state trajectories are supposed to lie within the light red area as soon as possible. Obviously, the error-optimal solution has a much later entrance time (about 116 seconds) compared to the time-optimal control (about 12.8 seconds). This can be explained by the fact that error-optimal controls tend to be asymptotically unbiased and therefore exclude feasible but biased controls such as, in this case, the time-optimal one.

numbers of control intervals is monotonically decreasing if the underlying optimal solution is smooth. So it is convenient to analyze the structure of error-optimal controls depending on the number of control intervals, and we expect a decreasing overall error of the error-optimal controls for increasing numbers of control intervals. The sequence of error-optimal values of the error is clearly bounded from below by zero. The Bolzano-Weierstrass theorem tells us that there is a limit point of this sequence, the smallest possible error for piecewise-constant controls. This value can either be reached for a certain number of control intervals or be approximated asymptotically. In principle, if there is a smooth optimal control, we can expect that its approximation by a piecewise-constant function is potentially better if more control intervals are used.

We assume that the system is at a certain state at time 0. The time-optimal solution is limited by the system's inertia, even if the control can be chosen arbitrarily. The best possible control, if it exists, would steer the states into their specification intervals as fast as the control value boundaries and the system's inertia allow, and keep them in the interior until the time horizon is reached. This behavior is necessary for the best possible control, but such a control will not be unique, and it might be possible to find a control that results in this behavior for a finite number of control intervals. If that is the case, introducing further control intervals for the computation of time-optimal controls will not have any effect on the optimal entrance time, and the binary search algorithm will fail to give feasible points.


Chapter 7. Numerical Results

[Diagram: the optimal interval lengths ζ_1^∗, . . . , ζ_5^∗ are refined into the initial guess ζ_1^0, ζ_{2,1}^0, ζ_{2,2}^0, ζ_3^0, ζ_4^0, ζ_5^0 by splitting at the maximum of

d_E(t) := | d/dt ∑_{i∈I} ( (x_i(t, u(t)) − z_i) / ε_i )² |. ]

Figure 7.9.: By taking the point where the time derivative of the objective function takesits maximum we can guess how to divide the control intervals to find a goodstart value for the problem with one more control interval.

7.2.2. Refining Control Intervals

Assume that we have found an optimal solution of the error-optimal control problem

u_nT^∗ := arg min_{u ∈ U_nT^m} E0(u)  (7.2.1)

with

E0(u) := (1/T2) ∫₀^{T2} ∑_{i∈I} ( (x_i(t, u(t)) − z_i) / ε_i )² dt  (7.2.2)

from Chapter 4 with the parameter space U_nT^m, together with the DAE system constraining the problem implicitly. Recall that u(t) results from building the piecewise-constant function using the parameters in u. Clearly we have that

E0(U_nT^m) ⊂ E0(U_{nT+1}^m)  (7.2.3)

because every piecewise-constant control can be given by a parameter vector of higher dimension by dividing one of the intervals and duplicating the control magnitude for this interval. It is a true subset since there are controls that are impossible to describe in the lower dimension. Let m = 1. Now we try to find an initial guess for the numerical optimization of the problem

min_{u ∈ U_{nT+1}} E0(u),  (7.2.4)

that is, for a control with an additional control interval. The optimal control u_nT^∗ has the form (4.3.13),

u_nT^∗ := (ζ_1^∗, . . . , ζ_nT^∗, ν_1^∗, . . . , ν_nT^∗).  (7.2.5)



Figure 7.10.: Refining scheme. We start with a single control interval, optimize, refine thetime discretization by dividing one of the intervals, optimize with two controlintervals and so on.

According to Fig. 7.9, we refine the control vector parameterization by splitting at the position where the slope of the objective function integrand takes its maximum value. The interval is separated, and the control magnitude for the two new intervals is taken to be the same as for the old interval. The resulting parameter vector clearly leads to the same objective function value, and if the point is not yet optimal, we can find an improvement by moving along the negative gradient direction.
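The splitting step can be sketched as follows; a minimal Python sketch (the tuple representation u = (ζ_1, . . . , ζ_nT, ν_1, . . . , ν_nT) follows (7.2.5), while the function and variable names are hypothetical). Splitting an interval and duplicating its magnitude realizes the same effective control, so the objective value is unchanged and the refined vector is a valid initial guess:

```python
import numpy as np

def refine_control(zeta, nu, t_split):
    """Split the control interval containing t_split into two intervals,
    duplicating the control magnitude, so that the realized
    piecewise-constant function (and hence E0) is unchanged."""
    edges = np.concatenate(([0.0], np.cumsum(zeta)))   # interval boundaries
    k = int(np.searchsorted(edges, t_split, side="right")) - 1
    left, right = t_split - edges[k], edges[k + 1] - t_split
    zeta_new = np.concatenate((zeta[:k], [left, right], zeta[k + 1:]))
    nu_new = np.concatenate((nu[:k + 1], [nu[k]], nu[k + 1:]))
    return zeta_new, nu_new

# split the first interval of a two-interval control at t = 10
zeta, nu = refine_control(np.array([18.308, 481.692]),
                          np.array([0.4, 0.27]), 10.0)
```

The total horizon ∑ζ_i and the realized function are preserved; only the parameterization gains one degree of freedom.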

Remark 7.2.1. The gradient of the objective function in the dynamic case is built from the values of several integrals at the final time T2, which can be called integral sensitivities. By looking at the sensitivity trajectory we can see which time is most sensitive to changes in the control parameters. Although this is a genuine indication of which control interval to refine, we cannot make use of it because gPROMS does not provide the trajectories over time of the sensitivities building the gradient.

We use a parameter vector u_nT of length 2nT to describe a unique piecewise-constant function defined for t ∈ [0, ∑_{i=1}^{nT} ζ_i]. To compare two different control vectors it is not useful to take the distance between the controls in vector form, which is even impossible for vectors of different length. So we work on the time domain where all piecewise-constant functions are defined. Let u_i(t) and u_j(t) be the realizations of two control vectors of different lengths i and j. We define the difference between both functions as

g_{i,j}(t) := u_i(t) − u_j(t)  (7.2.6)

and take the L1-norm

‖g_{i,j}‖_{L1} = ∫₀^{T2} |g_{i,j}(t)| dt,  (7.2.7)

which is actually given by a finite sum since g_{i,j} is also piecewise-constant with at least min{i, j} intervals, see the MATLAB® code in Appendix B. This is used to measure the distance between two effective controls. It has the property that ‖g_{i,j}‖_{L1} = 0 is possible although u_i ≠ u_j. The procedure of introducing further control intervals and computing the


optimal control for the current number of intervals can be monitored by an improvement rate relative to the change in the control,

r_nT := ( E0(u_{nT−1}^∗) − E0(u_nT^∗) ) / ‖g_{nT,nT−1}‖_{L1},  (7.2.8)

where u_nT^∗ is the optimal solution for nT control intervals and u_0^∗ is chosen as a standard control. The larger this value is, the more improvement could be achieved by introducing a further interval.
Now we work on an extended wet-end model which consists of the simple wet-end including the short white water circulation and an additional circulation with a longer residence time. The system starts in a steady state and produces paper with a substance of about ws = 70, and we are looking for a control of the thick stock flow rate that minimizes the deviation of the substance from the specified value 90. Integrating the error starts at the same time that the control can be used to steer the system. Since there is an offset at t = 0 and the steady-state startup does not give the value 90, we have a positive lower bound on the specification error. The system is configured in such a way that the state variables converge to constant values whenever a control is changed and held, that is, there is a unique steady state for each control. So we expect that, independent of the number of control intervals used, the last control interval is needed to set the thick stock flow rate to the value needed for the steady state with substance ws = 90. If we allow only one control interval, we will find this value for the flow rate. We choose T2 = 500 as the fixed time horizon, which always constrains the lengths of the control intervals by

∑_{i=1}^{nT} ζ_i = T2,  (7.2.9)

and this means that the interval length is effectively not a degree of freedom for nT = 1. We use the refining scheme shown in Fig. 7.10 to start with a single control interval and successively find initial guesses for higher-dimensional piecewise-constant controls.
The following table shows the time discretization of a total interval of length 500; the column 'refine' indicates where an interval is divided to generate an initial guess for the next dimension.

nT  ζ_1^∗    ζ_2^∗    ζ_3^∗    ζ_4^∗    ζ_5^∗    ζ_6^∗    refine  E0(u_nT^∗)
1   500      –        –        –        –        –        8.92    8.562
2   18.308   481.692  –        –        –        –        8.54    6.277
3   1.424    17.017   481.559  –        –        –        7.56    5.377
4   2.395    0.895    14.35    482.361  –        –        7.5     5.276
5   2.6      1.719    1.208    12.771   481.702  –        7.5     5.245
6   2.623    1.803    1.621    1.964    10.007   481.982  7.5     5.243

The optimal objective function value decreases with an increasing number of control intervals, and in this case it seems to converge rapidly. It took 4 NLP iterations to find a solution for nT = 1. Using the refining scheme to guess a discretization for nT = 2 helped to find a solution within 9 NLP iterations, compared to 27 when starting at a point similar to the start control of the one-dimensional problem. So here, it is faster to solve the one-dimensional problem, use it to guess a solution for the two-dimensional problem, and then solve that, than to solve the two-dimensional problem directly.



Figure 7.11.: nT = 1, a single control interval. This relates to a simple step in the thickstock flow rate.

In the following table the progress of improving the solution by introducing further controlintervals is monitored.

nT                  1       2      3      4       5      6
‖g_{nT,nT−1}‖_{L1}  33.853  1.383  0.498  0.3684  0.077  0.067
r_nT                10.322  1.652  1.804  0.2742  0.4    0.03
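Both monitoring quantities are cheap to compute. The sketch below (Python rather than the MATLAB code of Appendix B; all names are hypothetical) evaluates ‖g_{i,j}‖_{L1} from (7.2.7) on the merged breakpoint grid, and reproduces the tabulated rates r_nT from (7.2.8) up to the rounding of the tabulated values (r_1 additionally needs E0(u_0^∗) of the standard control, which is not tabulated):

```python
import numpy as np

def l1_distance(zeta_i, nu_i, zeta_j, nu_j):
    """||g_{i,j}||_L1 of eq. (7.2.7): integrate |u_i(t) - u_j(t)|.
    The difference is piecewise-constant on the merged grid, so the
    integral reduces to a finite sum over the merged cells."""
    edges = np.unique(np.concatenate(
        ([0.0], np.cumsum(zeta_i), np.cumsum(zeta_j))))
    total = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        t = 0.5 * (a + b)                    # sample inside the cell
        ui = nu_i[np.searchsorted(np.cumsum(zeta_i), t, side="right")]
        uj = nu_j[np.searchsorted(np.cumsum(zeta_j), t, side="right")]
        total += abs(ui - uj) * (b - a)
    return total

# distinct parameter vectors can realize the same function: distance 0
d0 = l1_distance(np.array([500.0]), np.array([0.3]),
                 np.array([250.0, 250.0]), np.array([0.3, 0.3]))

# improvement rates (7.2.8) from the tabulated E0(u*_nT) and distances
E0 = [8.562, 6.277, 5.377, 5.276, 5.245, 5.243]      # nT = 1..6
g = [1.383, 0.498, 0.3684, 0.077, 0.067]             # nT = 2..6
rates = [(E0[k] - E0[k + 1]) / g[k] for k in range(5)]
```

The zero distance for distinct parameter vectors illustrates the property ‖g_{i,j}‖_{L1} = 0 with u_i ≠ u_j noted above.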

The distance from the current solution to the previous one decreases as the number of control intervals is increased. Roughly speaking, this can be seen as some sort of measure of convergence, because if it gets close to zero, the current solution does not significantly differ from the previous one.
In Figs. 7.11 – 7.16 we can see the results of the 6 optimizations for nT = 1, . . . , 6. These consist of the effective control function over time and the response of the system in terms of the variable ws. For nT = 1 we have basically a step response, which shows us something of the system's dynamics. The more intervals are available, the more the response can be balanced out to stay close to the specification. We see that the control boundaries are limiting for nT > 2; the controls seek to steer the system to its dedicated state as fast as possible by using the maximal allowed value for the control magnitude in the first interval. This relates to a short and strong impulse as discussed in Chapter 2, and the system would produce significant responses if the control were reset to its previous value. So the following intervals are used to balance out exactly this potential response.

7.2.3. Sequential Time-Optimal Control with Trajectory Boundaries

Now we use the error-optimal control to guess a solution for the time-optimal control problem with trajectory boundaries as presented in Section 4.3.1. The specification is defined by a tolerance of ±1 for the state variable ws to differ from 90. We use the binary search Algorithm 4.1 to find the solution within a specified tolerance.



Figure 7.12.: nT = 2. The first interval is used to speed up the response of the system andthe flow rate is set to the asymptotic value in the second interval.


Figure 7.13.: nT = 3. This extends the control for nT = 2 by introducing a short intervalat the beginning where the thick stock flow rate takes its maximum value.



Figure 7.14.: nT = 4. The control begins to stutter, gaining similarity to bang-bang controlsbecause it switches from its maximum to its minimum value.


Figure 7.15.: nT = 5. An additional interval with maximum flow rate is introduced. Theeffect further stabilizes the response of the system and tries to cancel out anydeviation from the specified set point for ws.



Figure 7.16.: nT = 6.

We try to find the largest interval [T1, T2] for which we can find a feasible control. Assume that we start our search within [0, T2]; then the bisection used leads to intervals of length

L_k = (1/2)^k · T2,  (7.2.10)

thus, the length of the final interval can be chosen at the beginning of the algorithm. For T2 = 500 we choose a total of 12 bisection steps; this means that we have to solve 12 error-optimal control problems.
For nT = 2 and the problems from above we can guess the initial point from the error-optimal solution. The minimal time for which the computed solution is feasible is used to guess t0 = 33, which can be taken from the graph. It is more important to have a good initial guess for the control variables than for the optimal time, because the bisection step would stop after one NLP iteration whenever the guess for the minimal time is higher; it would then quickly find that we have to search below 33 in that case.
Starting with 33 yields the sequence:

k   t_k      lower b.  upper b.  success
0   33       0         500       yes
1   16.5     0         33        yes
2   8.25     0         16.5      no
3   12.375   8.25      16.5      no
4   14.4375  12.375    16.5      no
5   15.469   14.4375   16.5      yes
6   14.953   14.4375   15.469    yes
7   14.695   14.4375   14.953    yes
8   14.566   14.4375   14.695    yes
9   14.502   14.4375   14.566    yes
10  14.470   14.4375   14.502    yes
11  14.454   14.4375   14.470    yes

So the controls are modified to obtain a feasible control for t11 = 14.454 ∈ (14.4375, 14.470), which is significantly lower compared to the error-optimal initial guess, which is feasible for



Figure 7.17.: This figure compares the error-optimal solution for nT = 2 with the time-optimal solution found by the binary search Algorithm 4.1.

t0 = 33, see Fig. 7.17. The actual control is slightly changed to ensure that the specification is strictly satisfied. In the graph it can be seen that the time-optimal solution enters the specification tolerances significantly earlier, but there are points at later times where the trajectory of the state touches the boundary again. This means that the solution is very sensitive to disturbances in the control variables, and small changes might cause the point to become infeasible. A remedy might be to work with a tolerance on the specification which is stricter than the one that is actually needed.

Remark 7.2.2. In our experiments we found that the binary search algorithm succeeds in finding the time-optimal control starting with an error-optimal control, provided that the zero of the feasibility equation is found whenever the equation is solvable. Otherwise it will wrongly reject an optimal time guess and search the wrong interval. This issue relates to the problem of finding the global minimum of the feasibility residual.

7.3. Drying Section Analysis and Optimization

In this section, we present ways to describe the geometry of paper machine drying sections, perform a steady-state sensitivity analysis, and present results from the maximization of the production capacity of a drying section by varying geometry parameters.

7.3.1. Defining Geometries

The geometry of drying sections of paper machines can be described in two dimensions. Each cross-section of a roll is a circle which is defined by its center and its radius, while the center has two coordinates with respect to a defined origin in the two-dimensional Cartesian system. Then for the i-th roll we have a tuple (p_x^i, p_y^i, r^i), where

• p_x^i : absolute position in x-direction of the i-th roll.


• p_y^i : absolute position in y-direction of the i-th roll.

• r^i : radius of the i-th roll.

Let us assume that a drying section is built from left to right. The setup of all rolls can also be described by a set of relative positions. Having the first position fixed, we can find the position of the next one by moving d_x^1 := p_x^2 − p_x^1 in x-direction and d_y^1 := p_y^2 − p_y^1 in y-direction. As we have seen before, the relative positions are enough for a roll to 'locate' its neighbors and calculate all necessary wrapping and tangent lengths of paper and wire. Clearly, every possible drying section based on a sequence of rolls can be described by 3 · N_r variables, if N_r := n_ℓ + n_u is the total number of rolls and n_ℓ, n_u are the numbers of lower and upper rolls, respectively. For machines with about 80 rolls this gives a set of 240 parameters to define the geometry. On the other hand, not every set of tuples yields a valid geometry, since there are several conditions which have to be satisfied.

• Subsequent rolls must have a positive horizontal distance.

• The circles describing the cross-section of the rolls must not intersect.

• Certain minimal distance constraints for the roll surfaces must be satisfied.

We can expect that these conditions somehow reduce the number of parameters needed to describe the machine, or at least transform into simple constraints on the variables. We can ask whether the radii are to be variable, or whether we can assume that the machine is periodic in the sense that each triple of subsequent rolls is congruent. This can reduce the number of parameters needed to describe the machine drastically. Still, we can assume that all upper rolls and all lower rolls are each fixed on a horizontal line. This reduces the number of parameters down to 2 without losing too much flexibility.
In the following we keep all radii of the rolls fixed at r1 for upper and r2 for lower rolls. And we assume that all y-coordinates of the upper rolls are equal, as well as the y-coordinates of the lower rolls, so that there is a unique distance in y-direction between upper and lower rolls.

The (2n_ℓ + 1)-Parameter Model

If we assume that the drying section starts with an upper roll and ends with an upper roll, we can describe it by relative displacements in x-direction with respect to the centers of the lower rolls. We have n_ℓ lower rolls that have a left and a right neighbor each. All we need to know is the displacement to the left and to the right for each of the n_ℓ triples. In Fig. 7.18 we can see that this description leads to (2n_ℓ + 1) variables needed for the whole drying section layout and yields global flexibility in y-direction with local flexibility in x-direction. For each triple, we have to take care that the geometric constraints are satisfied to guarantee a valid geometry. To ensure the logical order of subsequent rolls we have to assume that at least

x_1^(i), x_2^(i), y > 0  for i = 1, . . . , n_ℓ,  (7.3.1)

but to be more restrictive, we need lower and upper bounds on the parameters,

(lb, lb, 0)^T ≤ (x_1, x_2, y)^T ≤ (ub, ub, ub_y)^T,  (7.3.2)


Figure 7.18.: For n_ℓ lower rolls, we have the horizontal displacements x_1^(i) and x_2^(i) for i = 1, . . . , n_ℓ and a single variable y for the vertical distance between any pair of lower and upper rolls. This results in (2n_ℓ + 1) variables to describe a drying section.

where lb and ub are vectors of length n_ℓ. Constructional circumstances surely give reasonable lower and upper bounds on the parameters. These are a total of (4n_ℓ + 2) box constraints. We need some positive minimum surface distances that have to be respected by any pair of rolls so that there is no intersection. We demand a smallest allowed distance ∆_surf^(u) > 0 for subsequent upper rolls and ∆_surf^(ℓ) > 0 for subsequent lower rolls. Then, for all neighbor pairs of upper rolls we need

x_1^(i) + x_2^(i) − 2r_1 ≥ ∆_surf^(u)  for i = 1, . . . , n_ℓ  (7.3.3)

and for the lower rolls

x_1^(i) + x_2^(i) − 2r_2 ≥ ∆_surf^(ℓ)  for i = 1, . . . , n_ℓ.  (7.3.4)

Here, i is the index of the lower roll that lies horizontally between the upper rolls. Obviously, one of these two constraints is redundant, since it is automatically satisfied if the other is satisfied. We write

x_1^(i) + x_2^(i) ≥ max{∆_surf^(u) + 2r_1, ∆_surf^(ℓ) + 2r_2}  for i = 1, . . . , n_ℓ,  (7.3.5)

which requires both constraints to be satisfied. The distance between the center of a lower roll and its two neighboring upper rolls is simply given by

d_1^(i) = √( (x_1^(i))² + y² )  (7.3.6)

and

d_2^(i) = √( (x_2^(i))² + y² ),  (7.3.7)

respectively. For ∆_min > 0, we need

d_1^(i) − r_1 − r_2 ≥ ∆_min  (7.3.8)
d_2^(i) − r_1 − r_2 ≥ ∆_min  (7.3.9)



Figure 7.19.: This figure shows the geometric parameters of a simplified two-dimensional cross-section of a drying section of a paper machine. The variable x1 denotes the horizontal distance from the center of an upper roll to the lower roll on the right-hand side. The variable y stands for the vertical distance between the centers, which is chosen to be globally equal.

for i = 1, . . . , n_ℓ, which is equivalent to

(x_1^(i))² + y² ≥ (r_1 + r_2 + ∆_min)²  (7.3.11)
(x_2^(i))² + y² ≥ (r_1 + r_2 + ∆_min)²  (7.3.12)

for i = 1, . . . , n_ℓ. Then the surface distance between any lower roll and its neighboring upper rolls is at least ∆_min. We collect all constraints (7.3.2), (7.3.5), (7.3.11) and (7.3.12). Including the box constraints we get (7n_ℓ + 2) constraints on the variables, of which 2n_ℓ are nonlinear.
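As a sanity check on this bookkeeping, the constraints can be assembled as nonnegativity residuals; a minimal sketch (function and argument names are hypothetical, the ∆-quantities are passed as scalars, and the numerical values are illustrative only):

```python
import numpy as np

def geometry_constraints(x1, x2, y, r1, r2, d_surf_u, d_surf_l, d_min,
                         lb, ub, ub_y):
    """Residuals of the (2*n_l + 1)-parameter model; a geometry is
    valid iff all entries are >= 0.  x1, x2 are arrays of length n_l."""
    cons = []
    # box constraints (7.3.2): 4*n_l + 2 inequalities
    cons += list(x1 - lb) + list(ub - x1)
    cons += list(x2 - lb) + list(ub - x2)
    cons += [y, ub_y - y]
    # combined surface-distance constraint (7.3.5): n_l inequalities
    m = max(d_surf_u + 2 * r1, d_surf_l + 2 * r2)
    cons += list(x1 + x2 - m)
    # no intersection with neighboring upper rolls (7.3.11)-(7.3.12): 2*n_l
    cons += list(x1**2 + y**2 - (r1 + r2 + d_min) ** 2)
    cons += list(x2**2 + y**2 - (r1 + r2 + d_min) ** 2)
    return np.array(cons)

n_l = 42
c = geometry_constraints(np.full(n_l, 1.0), np.full(n_l, 1.0), 1.2,
                         0.9, 0.25, 0.05, 0.05, 0.05,
                         np.full(n_l, 0.1), np.full(n_l, 2.5), 3.0)
```

The residual vector has exactly 7·n_ℓ + 2 entries, matching the count derived above.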

The 3-Parameter Model

Now we reduce the local flexibility in x-direction to global flexibility by defining the drying section to be periodic. Set

x_1 := x_1^(1) = x_1^(2) = . . . = x_1^(n_ℓ)  (7.3.13)

and

x_2 := x_2^(1) = x_2^(2) = . . . = x_2^(n_ℓ).  (7.3.14)

Together with the y-displacement, only three parameters are left to describe the machine, and each triple is congruent, see Fig. 7.19.



Figure 7.20.: This picture shows the feasible region of the 2-parameter model. The variables x1 and y must not lie within the two circles. The constant r depends on the radii of the rolls and on the desired minimum surface distance. The variable x1 is naturally bounded from above by ∆_fixed, since otherwise x2 ≤ 0 and we assumed x1, x2 > 0. Geometrically, the variable y is not bounded from above.

The 2-Parameter Model

Assume that x1, x2, y > 0 as in the 3-parameter model. We now reduce the number of variables by activating the constraints

x_1 + x_2 − 2r_1 ≥ ∆_surf^(u)  (7.3.15)
x_1 + x_2 − 2r_2 ≥ ∆_surf^(ℓ).  (7.3.16)

This means that we fix the distance between the centers of two neighboring rolls by setting

x_2 := ∆_fixed − x_1  (7.3.17)

for some ∆_fixed > max{2r_1, 2r_2}. Then we have the horizontal surface distance between two heated rolls

∆_1 = ∆_fixed − 2r_1 > 0  (7.3.18)

and

∆_2 = ∆_fixed − 2r_2 > 0  (7.3.19)

for the unheated rolls, see Fig. 7.19.
The following conditions ensure that the distances between the roll surfaces are at least ∆_min:

x_1² + y² ≥ (∆_min + r_1 + r_2)²  (7.3.20)
(∆_fixed − x_1)² + y² ≥ (∆_min + r_1 + r_2)² =: r².  (7.3.21)


Geometrically, this means that pairs (x1, y) must not lie within the two circles of radius r centered at (0, 0) and (∆_fixed, 0), compare with Fig. 7.20.

The (3n_ℓ)-Parameter Model

Finally, we formulate the full model by introducing n_ℓ − 1 additional parameters into the (2n_ℓ + 1)-parameter model, representing independent vertical distances. This means that we have full variability in cylinder positioning by adding the parameters y_1, . . . , y_{n_ℓ}.

7.3.2. Choosing the Objective Function

We briefly discuss the approach of optimization using systems of nonlinear equations. The DAE-constrained optimization problem can be treated analogously, so we do not repeat the argument later.
Independent of the choice of parameter model for the geometry, the geometry only has direct influence on the wrapping and tangent lengths in our model of the drying section. Effectively, we try to define a set of lengths for the discretization zones of the convective transport equations. One could surely assign these lengths directly to the model, but by using the geometry models we restrict ourselves to drying section layouts that have a well-known representation. We can expect that the drying behavior of the paper depends on the wrapping and tangent lengths, maybe even more on the wrapping of the heated cylinders, since these are the strongest sources of energy.
Let the unknowns and the parameters of the drying section model be denoted by

• D, the final dry content,

• v, the machine speed,

• X, the intermediate variables needed,

• θ, the geometry parameters (2, 3 or (2n_ℓ + 1)),

• Θ, the remaining parameters.

Then, the steady-state drying section model can be written as

F(D(θ, Θ), v, X(θ, Θ), θ, Θ),  (7.3.22)

where

F : R × R × R^{N_X} × R^{n_θ} × R^{N_Θ} → R^{N_X + 1},  (7.3.23)
D : R^{n_θ} × R^{N_Θ} → R,  (7.3.24)
X : R^{n_θ} × R^{N_Θ} → R^{N_X}.  (7.3.25)

We have (N_X + 1) variables in (N_X + 1) equations and (N_Θ + n_θ + 1) parameters. The variable Θ is the vector of all fixed parameters, also including the geometry parameters. One of the variables v and D has to be assigned in order to have a well-defined system. Now we keep Θ fixed. If we assign v, we can interpret the resulting final dry content



Figure 7.21.: Instead of solving the equality-constrained problem (P_v), we solve problem (P_v^∗), which turns out to be equivalent to (P_v).

as a function of the speed v. If, on the other hand, we fix D, we have v as a function of D. This can be written as

F(D(θ, v), v, X(θ, v), θ)  (7.3.27)

or

F(D, v(θ, D), X(θ, D), θ).  (7.3.28)

Now fix the final dry content of the paper, D := D_f ; then we can omit it from the notation and modify the resulting machine speed by the choice of the geometry parameter set θ. By defining

X(θ) := ( X_1(θ), X_2(θ), . . . , X_{N_X}(θ), v(θ) )^T  (7.3.29)

we get the standard form of nonlinear systems of equations

F (X(θ), θ) = 0. (7.3.30)

It is not reasonable to choose the dry content of the paper as the objective function, since it will usually be close to 1, which makes objective function values hard to compare and might result in numerical inefficiency. That is why we fix the final dry content to a value that is given by the paper product specification.
First, we write the problem in a general form. We would like to maximize the machine speed v by choosing θ:

(P_v)   max_θ  v(θ)
        s.t.   F(X(θ), θ) = 0,
               g(θ) ≥ 0.  (7.3.31)

Here, the function g is the constraint function that includes the geometric constraints depending on the choice of the parameter model, as discussed above. In words, we are interested in finding the geometry that is capable of producing the most paper of a certain final dry content.


In this notation, the optimization problem is constrained by the underlying system of equations. However, in practice we can solve

(P_v^∗)   max_θ  v^∗(θ)
          s.t.   g(θ) ≥ 0,  (7.3.32)

where v^∗(θ) := X_{N_X+1}(θ) results from the solution of (7.3.30), see Fig. 7.21. This means that θ^∗ solving (P_v^∗) also solves (P_v). This becomes clear when we write the KKT conditions for (7.3.31). The Lagrange function of the original problem is

L(θ, λ, µ) = v(θ) − µ^T F(X(θ), θ) − λ^T g(θ),  (7.3.33)

and it is trivial to see that θ^∗ solving (7.3.32) and satisfying (7.3.30) is a stationary point of it.
Although we do not know the function v^∗(θ) explicitly, we calculate its value by solving the system (7.3.30) and get its derivative by solving the sensitivity equations

( ∂F(X(θ), θ) / ∂X(θ) ) · ∂X(θ)/∂θ = − ∂F(X(θ), θ) / ∂θ,  with S := ∂X(θ)/∂θ.  (7.3.34)

Now S is the sensitivity matrix of the system and includes the gradient we are interested in:

S = ( ∂X_1/∂θ_1      ∂X_1/∂θ_2      . . .  ∂X_1/∂θ_{n_θ}
      ∂X_2/∂θ_1      ∂X_2/∂θ_2      . . .  ∂X_2/∂θ_{n_θ}
         ⋮               ⋮                     ⋮
      ∂X_{N_X}/∂θ_1  ∂X_{N_X}/∂θ_2  . . .  ∂X_{N_X}/∂θ_{n_θ}
      ∂v/∂θ_1        ∂v/∂θ_2        . . .  ∂v/∂θ_{n_θ} ).  (7.3.35)

The information we need to solve (7.3.32) by a gradient-based method is given by the last row of S. When we choose the 2-parameter model for the geometry, we have

θ = (θ_1, θ_2)^T := (x_1, y)^T.  (7.3.36)

For the constraint function g this means that

g(θ) = ( g_1(θ_1, θ_2), g_2(θ_1, θ_2), g_3(θ_1), g_4(θ_2), g_5(θ_1) )^T
     := ( θ_1² + θ_2² − r², (∆_fixed − θ_1)² + θ_2² − r², θ_1, θ_2, ∆_fixed − θ_1 )^T.  (7.3.37)

Note the signs of g_1 and g_2: with the convention g(θ) ≥ 0 from (7.3.31), the circle conditions (7.3.20) and (7.3.21) enter with the squared terms on the positive side, so that feasible points lie outside the two circles of Fig. 7.20.

Then, the solution of (7.3.32) yields a valid geometry of a drying section that allows thehighest production speed while all other process parameters are kept constant.
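A minimal sketch of evaluating the constraint function of the 2-parameter model (function and variable names are hypothetical); recall from Fig. 7.20 that feasible points (x_1, y) lie outside the two circles, so the circle conditions contribute residuals with the squared terms on the positive side:

```python
import numpy as np

def g(theta, r, delta_fixed):
    """Constraint residuals of the 2-parameter model; theta = (x1, y)
    is feasible iff all entries are >= 0."""
    t1, t2 = theta
    return np.array([
        t1**2 + t2**2 - r**2,                  # outside the left circle
        (delta_fixed - t1)**2 + t2**2 - r**2,  # outside the right circle
        t1,                                    # x1 > 0
        t2,                                    # y > 0
        delta_fixed - t1,                      # x2 = delta_fixed - x1 > 0
    ])

# with r = 1.5 and delta_fixed = 2, the point (1, 2) is feasible ...
ok = bool(np.all(g(np.array([1.0, 2.0]), 1.5, 2.0) >= 0))
# ... while (0.5, 0.5) lies inside the left circle and is not
bad = bool(np.all(g(np.array([0.5, 0.5]), 1.5, 2.0) >= 0))
```

Such a residual vector is exactly what a gradient-based NLP solver consumes as inequality constraints when solving (7.3.32).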

7.3.3. Sensitivity Analysis

We use the drying section process as a steady-state model, solve it using gPROMS, and receive the Jacobian information at the solution via the foreign process interface. We analyze the results and visualize them graphically to see which parts of the machine are most sensitive.



Figure 7.22.: The sparsity pattern is shown in a windowed view to give a better impression of it, because the matrix is dense.

First Example

In Fig. 7.22 we can see the sparsity pattern of the sensitivity matrix computed for a drying section model of the following setup. It includes 43 heated rolls and 42 unheated rolls. The geometry is defined by the 3-parameter model. For each zone, we used a fixed discretization of 4 nodes. This leads to a system of the following dimension:

Number of variables: 34715

Number of parameters: 827

Number of nonzero Jacobian elements: 136472

The Jacobian has approximately 4 nonzero entries per row. The sensitivity matrix has the full dimension and is not expected to be sparse. Here, we need a total of (8 bytes) · 34715 · 827 = 229,674,440 bytes to store the whole information. We imposed a threshold of 10⁻⁸ on the sensitivity matrix and interpreted values with a magnitude below it as zero. This leads to a sensitivity matrix with 5,074,826 nonzero entries.
Solving the sensitivity system is actually decoupled from the solution of the nonlinear system, which is solved first by gPROMS. So we can simply add the computational times to get the total time needed to solve the combined problem.
The matrix is stored in binary format, so it can be read using other software. We use the binary file access methods of MATLAB® to load the sensitivity matrix into the workspace. Additionally, we need indexing information so that we know which information is stored; this is given by arrays of variable and parameter names. Choosing a set of variables and parameters yields a sub-matrix. In the following we pick a single row of the matrix and focus on specific columns that represent the parameters of interest.
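The storage figure and the thresholding step can be illustrated directly; a sketch with a toy matrix (the 10⁻⁸ cutoff is the one used above, everything else is illustrative):

```python
import numpy as np

# dense storage of the full sensitivity matrix with 8-byte doubles
n_vars, n_params = 34715, 827
dense_bytes = 8 * n_vars * n_params        # 229,674,440 bytes (about 219 MB)

# threshold: magnitudes below 1e-8 are treated as structural zeros
# before moving to a sparse representation (toy matrix for illustration)
S = np.array([[1e-12, 0.3],
              [-2.0, 5e-9]])
S_thr = np.where(np.abs(S) < 1e-8, 0.0, S)
nnz = int(np.count_nonzero(S_thr))         # two entries survive the cutoff
```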

Analyzing the Drying Section

The structure and sparsity pattern of the sensitivity matrix by themselves hardly yield useful information. In order to obtain concrete results we have to focus on fewer variables. Recall that the

Chapter 7. Numerical Results

Figure 7.23.: The scaled magnitude of the sensitivity of the final dry content D with respect to the steam pressures of the heated cylinders 1 to 43.

Figure 7.24.: The left figure shows the scaled magnitude of the sensitivity of the final dry content with respect to the distances x_1^{(i)}, for i = 1, . . . , 43. Note that all sensitivities ∂D/∂x_1^{(i)} < 0. The right figure shows the sensitivities with respect to the distances x_2^{(i)}, where all have positive sign.

full system includes many variables that come from the finite difference discretization of the zone variables. Actually, most of the variables in the system come from the discretization. As mentioned before, we choose the machine speed as the objective function to be maximized. For the sensitivity analysis, however, we focus on the final dry content of the paper and discuss the influencing parameters. We analyze three kinds of parameters of the drying section, namely steam pressure, geometry and air. First we have a look at the steam pressures in the cylinders. In reality, steam is supplied by steam pressure cascades and cannot be chosen arbitrarily for each heated cylinder. However, in our simulation we can interpret it as a degree of freedom for each of the 43 cylinders in our model. Thus we are interested in the gradient of the final dry content with respect to the 43 parameters representing the assigned steam pressures. We use the (2n_ℓ + 1) geometry model with n_ℓ = 42 (there is one lower roll less than upper rolls) to get a system of

Number of variables: 34714,

Number of parameters: 911,

Number of nonzero Jacobian elements: 136470.

We have 43 of the 911 parameters representing the steam pressure assignments. The results can be shown for all cylinders in increasing order, see Fig. 7.23, and it is possible to visualize the results graphically along with the two-dimensional layout view of the drying section. The graphical representation visualizes the sensitivity using a heat color scheme from white (no sensitivity) through shades of yellow and red to black (high sensitivity).
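The scaling used for the plots (magnitudes normalized by the largest absolute sensitivity, so the values lie in [0, 1]) can be sketched as follows; the sample numbers are hypothetical:

```python
import numpy as np

def scaled_sensitivity(sens_row):
    """Normalize the magnitudes of one sensitivity row to [0, 1] by
    dividing by the largest absolute entry (assumed nonzero)."""
    mag = np.abs(np.asarray(sens_row, dtype=float))
    return mag / mag.max()

# Hypothetical sensitivities of the final dry content w.r.t. four steam pressures:
s = scaled_sensitivity([0.02, -0.05, 0.01, 0.04])
```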

It can be seen that the sensitivity trend with respect to the cylinder numbers reaches its maximum magnitude quite early and loses its effect as the paper continues its way through the machine. The stages can be explained by the layout of the drying section, which consists of groups, each with its own paper wire. One would expect that increasing the steam pressures increases the final dry content. This is confirmed by the fact that all sensitivities with respect to the 43 steam pressures have positive sign. It can also be explained why the influence breaks down at the end of the machine and decreases in magnitude: the paper is drying and is getting harder to dry, so increasing the steam pressures can hardly make the paper even drier in the end. One might think of reducing the steam pressure at the end of the drying section and increasing it at the beginning if that leads to the same drying result. Drying would be fastest if one could heat the paper as fast as possible; however, physical limitations forbid this approach because very high temperature differences between the paper and the cylinder surface might cause the paper web to break. A consequence of these results is that optimal steam pressure controls come into the focus of the research. The objective does not necessarily have to be the maximization of the final dry content or of the production capacity, but the reduction of the steam needed for drying. Therefore the steam pressure cascade has to be taken into account when modeling the drying section process, so that the steam pressure control can be described realistically.

We introduced the geometry of the drying section and showed ways to model it by 3n_ℓ, (2n_ℓ + 1), 3 or 2 parameters. Analyzing the standard configuration shows that the sensitivities with respect to the geometry parameters x_1^{(i)} and x_2^{(i)}, respectively, all have the same sign. In any case it can be seen that the standard configuration is not optimal and all geometry parameters could be varied in the same direction. This means moving all lower unheated rolls in the same direction, and it motivates the use of descriptions of lower dimension. The full geometry model yields the same system as before, where 87 of the 911 parameters represent the geometry. In Fig. 7.24 we see a total of 86 sensitivities in the logical order of the drying cylinders. The magnitudes of the sensitivities ∂D/∂x_1^{(i)} and ∂D/∂x_2^{(i)} strongly correlate while their signs are different. An interesting result is that the influence of the geometry increases as the paper runs through the machine. However, we have to keep in mind that the information carried by the sensitivity matrix is only valid at a certain operating point of the whole system. Before performing the optimization, we should understand why the geometry has an influence on the drying process at all. The geometry parameters are used to determine the lengths of the zones as described in Chapter 2. The length of a zone tells the model how long the paper is in contact with the cylinder, the wire and the air. The drying process happens during all these phases, so it is clear that there is indeed an influence that must not be underestimated. But if we remember the way we attacked the transport problem in Chapter 2, we see that the number of nodes used to discretize the transport equation cannot depend smoothly on the length of the zone. If a specific geometry is chosen by its parameters, a discretization scheme must be fixed and cannot be changed during the simulation in gPROMS because that would change the number of variables in the system; the software always works with a system of the same size. So we are forced to use the same discretization scheme and the same number of nodes for every geometry chosen, and hope that the discretization error does not invert the effect of the geometry. Finally, we have a look at the sensitivities of the final dry content with respect to the flow rates of dry air assumed for each control volume. Since there are control volumes for upper and lower rolls, respectively, we have a total of 2n_ℓ = 84 parameters of interest. In Fig. 7.25 we show

Figure 7.25.: The influence of the flow rate of dry air into the control volumes of each upper and lower roll.

again the scaled sensitivity magnitudes to illustrate the trend of the sensitivity in machine direction. The results show that there is something like an upper and a lower trend, which seem to merge and generate a synergy effect at the end of the machine. It is clear that the sign of these sensitivities is the same everywhere, because increasing the flow rate of dry air into the control volumes directly improves the evacuation of humid air. Unfortunately, we have not found a meaningful explanation of the air sensitivity phenomenon at the end of the machine, but it shows that improved modeling of the air system and optimized use of air in drying operation might be worth a closer look.

Computational Costs

Computational costs play a minor role in sensitivity investigations like the one applied in this work. Our implementation of the sensitivity analysis tool for gPROMS is quite costly for large problems, since it has to order the Jacobian elements provided by the gPROMS foreign process interface, convert the data to the format required by NAG and export the results to a file. The examples above produced sensitivity matrices of 200 MB and more, and storing them to the hard disk already takes some time. Computational time is not critical because we are only interested in an offline investigation of the results.

[Figure 7.26 shows the propagation chain: std (1.0) --4 NLP iterations--> 2 param. (1.04102) --2 NLP iterations--> 3 param. (1.04117) --1 NLP iteration--> 85 param. (1.04117) --3 NLP iterations--> 126 param. (1.04145).]
Figure 7.26.: Initial guess propagation for the geometry optimization.

7.3.4. Optimization Results

The following results contribute to the application for a patent concerning constructional concepts of paper machine drying sections, as cited in Section 1.1.1. We started from a reference geometry which can be described exactly by any of the presented geometry models, which means that

x_1^{(i)} + x_2^{(i)} = ∆_fixed,  i = 1, . . . , n_ℓ,   (7.3.38)

together with the periodicity conditions (7.3.13) and (7.3.14). So we can start an optimization from the same effective geometry independently of the geometry model used. Recall that the geometry influences the drying process by determining the zone lengths. Clearly, all zone lengths attainable with the 2-parameter model are within the set of all zone lengths attainable with the 3-parameter model, and these are again contained in the set of possible zone lengths of the (2n_ℓ + 1)-parameter model, and so on. Let θ*_{(2)}, θ*_{(3)}, θ*_{(2n_ℓ+1)} and θ*_{(3n_ℓ)} be the optimal geometries of the 2, 3, (2n_ℓ + 1) and (3n_ℓ)-parameter models. Then it holds that

v(θ*_{(3n_ℓ)}) ≥ v(θ*_{(2n_ℓ+1)}) ≥ v(θ*_{(3)}) ≥ v(θ*_{(2)})   (7.3.39)

for n_ℓ > 1. From a geometric point of view, the standard geometry is an extreme case because it satisfies the symmetry conditions

x_1^{(i)} = x_2^{(i)},  i = 1, . . . , n_ℓ.   (7.3.40)
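The nesting argument behind (7.3.39) — a richer parameterization can never yield a worse optimum — can be illustrated with a toy maximization over nested parameter sets (the objective and grids below are hypothetical):

```python
def maximize(f, candidates):
    """Brute-force maximum of f over a finite candidate set."""
    return max(f(c) for c in candidates)

# Hypothetical objective over (x1, x2); the "2-parameter" model forces
# x1 == x2, the "3-parameter" model lets them differ (a strict superset):
f = lambda p: -(p[0] - 1.0) ** 2 - (p[1] - 2.0) ** 2
grid = [i / 10 for i in range(31)]
two_param = [(t, t) for t in grid]                     # nested subset
three_param = [(a, b) for a in grid for b in grid]     # superset
v2 = maximize(f, two_param)
v3 = maximize(f, three_param)   # enlarging the feasible set cannot hurt
```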

As seen before in the sensitivity analysis of the drying section process, where the standard geometry is analyzed, it is not optimal: the geometry parameters lie in the interior of the feasible region and the sensitivities of the final dry content (and also of the machine speed) are not zero. It follows that there exist geometries which allow higher machine speeds, and all of them are at least partially asymmetric. This is a simple result, but it gives a concrete and important impression of what can be expected when we declare geometry parameters as degrees of freedom. First we give the results of numerical optimizations using the gPROMS built-in SQP solver. We start with the 2-parameter model, compute the numerical approximation to a local minimum and use the result to determine a start value for the 3-parameter model, and so on, see Fig. 7.26. The relative objective function is given as the ratio to the machine speed reached for the standard geometry

θ0 := (x0, x0, y0)^T,   (7.3.41)

thus starting at 1. Recall that the standard geometry analyzed before has 43 upper and 42 lower rolls, so n_ℓ = 42 and we have a total of 126 parameters in the full model. We define the total length of the standard machine as l_0 := 2n_ℓ x_0 = 84 x_0.

                                         standard   2         3         (2n_ℓ+1)   (3n_ℓ)
v/v(θ0)                                  1          1.04102   1.04117   1.04117    1.04145
x1/x0                                    1          1.8571    1.8533    1.8533     1.8533
x2/x0                                    1          0.1429    0.1429    0.1429     0.1429
y1/y0                                    1          1.0477    1.0477    1.0477     1.25
y2/y0                                    1          1.0477    1.0477    1.0477     1.25
yi/y0, i = 3, . . . , 42                 1          1.0477    1.0477    1.0477     1.0477
Σ_{i=1}^{2} Σ_{j=1}^{42} x_i^{(j)}/l_total   1      1         0.998     0.998      0.998
SQP iter                                 –          4         2         1          3
optimality                               –          2.7e-6    1.7e-6    6.6e-6     9.4e-14

This confirms property (7.3.39), with near equality when comparing the 3-parameter result with the 85-parameter result. The ratios of the optimal parameters to the standard parameters are given in the table above. The initial guess propagation helped to find the optimal points because the effectively described geometries do not differ very much. Numerical experiments with different initial guesses showed that there seem to be several local maxima of the objective function. Using the 2-dimensional optimum was a good initial guess to find comparable solutions of the higher dimensional models. We take a closer look at the multiple local solutions when applying the tunneling algorithm to the problem. These results seem unambiguous, and we can see that the full model does not have the power to generate further improvements. All parameter models lead to practically the same asymmetric drying section geometry. The meaning of these numerical results can be supported by geometric facts. All optimizations presented effectively seek a parameterization of a drying section that maximizes the contact length of the paper with the steam-heated cylinders. Actually, the symmetric case, that is x1 = x2 for a lower roll, is the worst case in the sense of wrapping the neighboring upper cylinders when the vertical distance and the sum x1 + x2 are fixed. We can see this from the formulas of Chapter 2. There, we calculate the wrapping lengths for each half of the rolls by using the horizontal distance to the neighbor on that side. By leaving the equilibrium where both sides generate the same wrapping lengths, we increase the contact length on one side while decreasing it on the other, but by different amounts. It can easily be checked that the total wrapping length has a local minimum at the symmetry point. It is clear that this length depends on the vertical distance to the neighbor rolls. If the vertical distance tends to infinity, the wrapping angle tends to 180 degrees, which is the lower bound on the wrapping angle as long as the vertical distance is positive. From an engineering point of view, the contact length is needed to accomplish the heat transport from the heated cylinders to the paper. Improving the heat transfer is most important for effective drying because this, together with the evacuation of the humid air, is what thermal drying of a paper web is all about. Finally, it seems straightforward to maximize the contact length to achieve the highest possible heat transfer. However, we have to keep in mind that the paper is covered by a wire while it has contact with the hot cylinder. This means that it cannot be enough to heat the paper as much as possible; there must also be contact with the air in order for the water to evaporate. This argumentation leads to the question of the optimal configuration when heating and evaporating are alternating processes, as it is

Figure 7.27.: Asymmetric lower rolls.

the case for the single-tier drying section. The results for the full (3n_ℓ)-parameter model show this difficulty. The first two lower rolls were pulled down vertically compared to the standard geometry, although this leads to a shorter contact length for the first 3 heated rolls. Obviously, in this model it is more effective to have more contact with the surrounding air than with the heated cylinder at the beginning of the drying section. This is explained by the fact that the first heated rolls are relatively cool compared to the following ones; a high temperature difference between paper and cylinder surface might cause the paper to stick and break. Comparing the optimal values for the parameter models with increasing numbers of degrees of freedom shows that the periodic and size-limited case with 2 parameters leads to marginally worse results than the others. Thus it suggests a drying section where all lower rolls are positioned asymmetrically between two equidistant neighbor rolls. It has to be checked whether the construction of such a machine is possible, and the construction costs must be estimated. If it turns out that it is only realistic to build a drying section which is partially asymmetric, one might ask which of the lower rolls have to be asymmetric. This leads to a combinatorial problem of choosing an optimal subset of lower rolls, maybe in a block, to be positioned asymmetrically. Here, sensitivity analysis can help to identify the optimal block. As shown in the sensitivity analysis section, the influence of the geometry is highest in the first part of the drying section, at least for the standard geometry and the current point of operation.
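The claim that the symmetric roll position is the worst case for the total wrapping length can be checked numerically with a simplified tangent model; the per-side wrap contribution 90° − arctan(x/y) below is an illustrative assumption, not the exact geometry of Chapter 2:

```python
import math

def total_wrap(x1, y, delta=2.0):
    """Total wrap angle (degrees) of the two neighboring upper cylinders,
    in a simplified model where each side contributes 90 - arctan(x/y)
    for horizontal offset x and fixed vertical distance y; x2 = delta - x1."""
    x2 = delta - x1
    side = lambda x: 90.0 - math.degrees(math.atan2(x, y))
    return side(x1) + side(x2)

y = 1.0
symmetric = total_wrap(1.0, y)     # x1 = x2 = 1: the symmetric split
asymmetric = total_wrap(1.8, y)    # a strongly asymmetric split wraps more
```

In this model any asymmetric split of the fixed sum x1 + x2 increases the total wrap, so the symmetric point is indeed the worst case.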

7.4. Tunneling Benchmarks

In this section, we show the results of the modified tunneling algorithm for global optimization problems as presented in Chapter 6. We test the algorithm on the 5 academic test functions given in Section 6.3.2 and try several versions of the tunneling algorithm by activating or deactivating relevant algorithmic features. All other significant parameters and features are kept constant, such as the choice and the tolerances of the local minimization solver, the size of the initial pole region (ν = 0.02) and the choice of the pole function itself. We choose the tangent pole function with a fixed choice of µ = 2. Here, we do not discuss the influence of the strength parameter and, in the following, reuse the symbol µ for the estimator of the expected number of function evaluations. The toggled features are

• Shape-identification: on/off,


• Pole-reshaping: on/off,

• Deterministic background strategy: on/off,

which gives a total of 8 combinations to try. We use standard Halton sequences to buildthe set of potential start points up to dimension n = 6.The stopping parameters are chosen as

• Nc = 1000, the total number of potential start points, either random or quasi-random,

• Nn = Nc, the total number of global restarts allowed,

• Np = 8, the number of restarts at the last pole before doing a global restart,

• Nm = 4, the number of mobile poles used before restarting at the last pole,

where n is the dimension of the problem. These parameters are chosen high enough to make it unlikely that not all global minima are found before the stopping criterion is reached. The gradients of the test functions can all be calculated analytically, so no finite differencing is used. For each of the 5 test functions we choose a number of total runs in order to perform the statistical analysis and make the configurations comparable. As references, we perform random runs, picking points from a uniform distribution and running local minimizations until all global minima are found, and we do the same with points from the Halton sequence of the considered dimension.
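The quasi-random start points can be generated with the standard radical-inverse construction of the Halton sequence; this is a minimal sketch (using the first primes as bases is the usual convention):

```python
def radical_inverse(i, base):
    """Van der Corput radical inverse of index i in the given base."""
    result, f = 0.0, 1.0 / base
    while i > 0:
        result += (i % base) * f
        i //= base
        f /= base
    return result

def halton(i, primes=(2, 3, 5, 7, 11, 13)):
    """i-th Halton point (1-based) in [0, 1]^n with n = len(primes)."""
    return tuple(radical_inverse(i, b) for b in primes)

# First few 2-dimensional points; base-2 coordinates run 1/2, 1/4, 3/4, 1/8, ...
pts = [halton(i, primes=(2, 3)) for i in range(1, 5)]
```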

test function   n   global minima   total runs to perform   µ0       σ0       µH
(i)             2   8               1805                    4205     2155     2568
(ii)            2   1               2000                    2369.2   2330.4   431
(iiia)          2   2               1744                    61.8     42       35
(iiib)          2   2               1744                    284.1    207.3    492
(iv)            3   1               2000                    115.7    110.1    68
(v)             5   1               2000                    2475.5   2344.8   508
The value µ0 stands for the average function evaluation count of the random method and µH for the fixed number of function evaluations needed in the case of the deterministic Halton sequence. What we already see is that the deterministic approach hits the global minima much earlier in most of our test cases. Test function (iiia) is the same as in Section 6.3.2, and function (iiib) is modified by stretching the search space from [−3, 3] × [−2, 2] to [−9, 9] × [−4, 4] to decrease the size of the basins of attraction of the global minima. In the tables on the next pages we give the results. Here, µ is the mean of the function evaluations needed to find the global minima and σ is the standard deviation. The computations were done on Pacioli, the Linux high-performance computing cluster of the UZWR (Ulmer Zentrum für wissenschaftliches Rechnen), which has 35 AMD Opteron nodes. For some test functions, a total of about 32 million function evaluations were performed. The presented algorithm and its variants are compared to a simple multi-start method to benchmark the performance. As argued before, the number of function evaluations needed to find the global minima strongly depends on the choice of the local solver. Therefore we can hardly compare our results to function evaluation counts of derivative-free global optimization methods. For each of the test functions we give the reference value of a random start method using the same local solver as µ0 in the table above.


[Table: results for all 8 on/off combinations of shape-identification, pole-reshaping and deterministic background strategy on the test functions (i), (ii) and (iiia), with columns µ, σ, average number of start points and average number of reshapings.]


[Table: results for all 8 on/off combinations of shape-identification, pole-reshaping and deterministic background strategy on the test functions (iiib), (iv) and (v), with columns µ, σ, average number of start points, average number of reshapings and skewness.]


7.4.1. Tunneling in n Dimensions

To analyze the behavior of the tunneling algorithm for dimensions n = 5, . . . , 20, we use the benchmark function (v), which can be formulated for any dimension n > 2 as

f(x) = 0.1 ( sin(3πx_1)^2 + Σ_{i=1}^{n−1} (x_i − 1)^2 (1 + sin(3πx_{i+1})^2) + (x_n − 1)^2 (1 + sin(3πx_n)^2) ),

which has a single global minimum at z = (1, . . . , 1)^T and numerous local minima in

Ω = {x ∈ R^n : −5 ≤ x_i ≤ 5, i = 1, . . . , n}.   (7.4.1)
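A direct implementation of this benchmark function reads as follows (a sketch; the value at the global minimizer z vanishes up to floating-point error):

```python
import math

def f(x):
    """n-dimensional benchmark function (v); global minimum f(z) = 0
    at z = (1, ..., 1)."""
    s = lambda t: math.sin(3 * math.pi * t) ** 2
    n = len(x)
    total = s(x[0])
    total += sum((x[i] - 1) ** 2 * (1 + s(x[i + 1])) for i in range(n - 1))
    total += (x[n - 1] - 1) ** 2 * (1 + s(x[n - 1]))
    return 0.1 * total

value_at_minimum = f([1.0] * 10)   # all squared terms vanish at z
```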

Searching for z by picking a random point in Ω and starting a local unconstrained minimization with MATLAB's function fminunc, which uses a trust-region method as long as analytic derivatives are provided, gives the following reference data.

n    µ0        average number of starts
5    2587.2    183.1
6    3742.4    228
7    4411.8    235.7
8    3687.8    176.3
9    4398.3    186.6
10   6011.3    222.1
11   6076.8    208.9
12   6964.1    217.8
13   8267.7    233.9
14   7892.1    207.5
15   9147      226.1
16   9757.4    225.9
17   6639.7    143.6
18   11939.4   244.4
19   14059.1   269.4
20   10619.9   192.6

Here, the values for µ0 are rough estimates obtained by performing 20 complete searches for the global minimum; µ0 is the mean of the numbers of function evaluations needed each time. What we see is that the average number of points needed to start a local optimization does not increase significantly with the dimension. This means that the ABA of the global minimum has a relative volume that is approximately independent of the dimension, thus almost constant relative to the volume of Ω. The chance to hit the basin must be similar in every case, otherwise there would be significant differences in the average numbers of starts needed. For the unscaled problem, we can say that the ABA grows with the dimension at nearly the same rate as the domain Ω does. The scaled problem is to find the solution in the unit cube of dimension n, which has constant volume; then the volume of the ABA is nearly constant. Not surprisingly, we observe the trend that the higher the dimension of the problem, the more function evaluations are needed for each point we start from.
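The observation that roughly 200 starts suffice in every dimension is consistent with a hit probability of about 1/200 per random start: the number of starts until a hit is geometrically distributed with mean 1/p for a basin of relative volume p. A quick simulation confirms this (p = 0.005 is a hypothetical value matching the table):

```python
import random

def starts_until_hit(p, rng):
    """Number of uniform random starts until one lands in a basin of
    relative volume p (geometrically distributed with mean 1/p)."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n

rng = random.Random(42)   # fixed seed for reproducibility
p = 0.005                 # hypothetical relative ABA volume, about 1/200
runs = [starts_until_hit(p, rng) for _ in range(2000)]
mean_starts = sum(runs) / len(runs)   # should be close to 1/p = 200
```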


Figure 7.28.: These figures compare the average performance of a shape-identified version and an unmodified version of the tunneling algorithm for n-dimensional problems with n = 5, . . . , 20.

We pick Np = 20 and Nm = 10, which means that we hardly ever need new global start points. We have to keep in mind that tunneling can only work through the presence of poles, which cause the local minimization method to stay away from previously found minima. As discussed in Chapter 6, the chance to hit a region of the domain which is affected by a pole tends to zero as the dimension gets large. In that case, tunneling loses its power if used only as an addition to a simple multi-start method. To overcome this problem we would need to adjust the size of the pole regions, but this will cause them to overlap vast areas of the domain as the maximal radii of the ellipsoids increase. We use two kinds of limitations for the maximal radius of the ellipsoids, namely half of the minimal distance of pairwise disjoint active poles and a fixed upper bound of 0.08. Recall that we work on the scaled problem and the domain is always [0, 1]^n. By choosing large numbers Np and Nm we run many cycles starting close to active poles or mobile poles, which guarantees that we do not miss the effect of the pole function. In Fig. 7.28 we compare the averages of 16 different problems solved by using the tunneling algorithm either with spherical poles or with shape-identified poles. In 12 of 16 cases, the shape-identified poles had advantages compared to the spherical poles. The shape-identified version needs a total of 6327 function evaluations to solve all 16 problems, while the spherical version needs 7034. It can also be seen that the effect of tunneling does not decrease significantly as the dimension grows; it even gains relative performance compared to the simple multi-start solution in both cases. This can be explained by the fact that the algorithm always performs the search for a zero of the tunneling function guided by a pole function. Note, however, that only 20 total optimization runs have been performed for each of the problems, which gives only a rough impression of the relations in higher dimensions.
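The two radius limitations just described can be sketched as follows (the function and variable names are hypothetical):

```python
import math

def max_pole_radius(centers, cap=0.08):
    """Upper bound on the pole-region radius in the scaled domain [0, 1]^n:
    half the minimal pairwise distance of the active pole centers,
    additionally capped by a fixed bound (0.08 here)."""
    if len(centers) < 2:
        return cap
    dmin = min(
        math.dist(a, b)
        for i, a in enumerate(centers)
        for b in centers[i + 1:]
    )
    return min(cap, 0.5 * dmin)

poles = [(0.2, 0.2), (0.2, 0.3), (0.9, 0.9)]
r = max_pole_radius(poles)   # closest pair is 0.1 apart -> radius 0.05
```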


7.4.2. Remarks on Nonlinear Constrained Problems

Many global optimization algorithms are designed for effectively unconstrained problems, where it is known that the global minimum lies in the interior of an n-dimensional box. Without large modifications, the presented algorithm works on non-convex nonlinear programs with multiple KKT points. We use standard nonlinear programming methods to solve the local minimization problems arising in the tunneling algorithm. That is why it is possible to apply it to problems with nonlinear constraints. Then we know that the NLP solver finishes with a KKT point, which is not necessarily a stationary point of the unconstrained function. If a local minimum lies in the interior of the feasible region, it is also a KKT point. Additionally, however, we can expect the NLP solver to finish with points on the boundary. This might also be the case for box-constrained problems where it is not known whether the minima lie on the boundary or not. Whether the problem is actually constrained depends on the function and the type of constraints, because it can only be called a constrained problem if constraints are active at a KKT point. This means that the tunneling function now has poles on previously found KKT points and not necessarily on local minima with zero-length gradient. The way the poles are designed has the same effect as for the unconstrained problem: it indicates whether a point with a lower function value than the lowest KKT point has been found or not. Tunneling works only on the objective function and has no effect on the constraints. It destroys KKT points by using pole functions. We ensure that every time the algorithm produces a perturbation of a current point, that is, when we restart near a mobile pole or an active pole, the new initial guess is feasible with respect to all constraints. It can be verified that the algorithm still works for the benchmark functions if additional nonlinear constraints are imposed that 'cut away' the former global minimum from the feasible region, so that the new best KKT point lies on the boundary. For detailed academic tests, it would be necessary to set up problems with multiple KKT points with active constraints and a known number of them being globally optimal. The easiest way could be to define an axially symmetric test function and require feasible points to lie in the interior of a sphere centered at the origin. In our implementation we use MATLAB's function fmincon, which offers an SQP line-search with BFGS approximations to the Hessian of the Lagrangian for problems with nonlinear constraints, a so-called trust-region reflective method for problems with pure bound constraints, and an interior-point method. Using feasible start values does not guarantee that all iterates are feasible in the SQP solver; especially line-search steps might violate the constraints severely. This is unimportant when the function to be minimized can be evaluated everywhere, but that is not necessarily the case in our applications. Constraints are not chosen arbitrarily but are strict in the sense that the function is not defined outside the feasible region. However, if we start at feasible points and return infinity each time an infeasible point is tried during a line-search iteration, then the step size is reduced until the line search stays feasible. Interior-point methods are also a promising choice.
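The return-infinity safeguard for the line search can be sketched with a toy backtracking step (this is an illustration of the idea, not the fmincon internals; all names are hypothetical):

```python
import math

def backtracking_step(f, feasible, x, d, t0=1.0, beta=0.5, max_halvings=50):
    """Shrink the step until the trial point is feasible and decreases f.
    Infeasible trial points are treated as having value +inf, so the
    step size is simply halved until the line search stays feasible."""
    fx = f(x)
    t = t0
    for _ in range(max_halvings):
        trial = [xi + t * di for xi, di in zip(x, d)]
        val = f(trial) if feasible(trial) else math.inf
        if val < fx:
            return trial, t
        t *= beta
    return x, 0.0

# Toy problem: minimize a sum of squares over the unit ball.
f = lambda p: sum(v * v for v in p)
feasible = lambda p: sum(v * v for v in p) <= 1.0
x = [0.9, 0.0]
d = [-2.0, 0.5]   # the full step leaves the unit ball, so it is rejected
new_x, t = backtracking_step(f, feasible, x, d)
```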

7.4.3. Direct Comparison by Weibull Analysis

We solve the benchmark problem (v) for dimension n = 10 to generate a direct comparison of the tunneling algorithm with spherical poles and with ellipsoidal poles. We use Weibull fits to


Figure 7.29.: Comparison of the spherical and the ellipsoidal approach for benchmark function (v) with n = 10 and 10000 test runs each.

visualize and characterize the probability density of the global optimization experiment. In both cases we made a total of 10000 runs, each searching the domain until the single global optimum was found. The spherical approach corresponds to a (0, 0, 0)-strategy, that is, it uses spherical poles, no reshaping strategy and random start points. The ellipsoidal approach has the configuration (1, 1, 0). Neither uses the deterministic background strategy, in order to avoid biasing the results, because we have already seen that the effect of deterministic strategies can be significant for certain test functions. We obtain the following estimated moments:

              expectation µ   standard deviation σ   skewness v
  spherical   407.8           986.6                  33.5
  ellipsoidal 370.0           260.8                  5.2

What we see is that the ellipsoidal approach reduces all of the relevant characteristics of the algorithm applied to this particular function. The results are compared in Fig. 7.29, where the approximated density functions are plotted. The algorithm that uses spherical poles has a much larger variance and thus produces more outliers, both with low and very high function evaluation counts. In the figure, this can be seen by looking at the tail of the distribution densities, which has significantly more mass in the case of the spherical approach. These results further encourage the use of ellipsoidal strategies in higher dimensions and therefore confirm the interpretation of the experimental results for the tunneling in n dimensions above.
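The moment estimates in the table above can be reproduced from the raw run-length samples in a few lines of NumPy. A sketch with synthetic data; the Weibull-distributed sample below is illustrative and not the thesis data, which consists of the recorded function evaluation counts of the 10000 tunneling runs:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic run lengths ("function evaluations until the global optimum is
# found"); the thesis collects 10000 tunneling runs per configuration.
runs = rng.weibull(1.2, size=10000) * 400.0

mu = runs.mean()                                      # expectation estimate
sigma = runs.std(ddof=1)                              # standard deviation estimate
skew = ((runs - mu) ** 3).mean() / runs.std() ** 3    # skewness estimate

print(f"mu={mu:.1f}  sigma={sigma:.1f}  skew={skew:.2f}")
```

The Weibull density curves of Fig. 7.29 are then obtained by fitting a Weibull distribution to the same samples, which the sample moments computed here characterize.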


Chapter 7. Numerical Results


Figure 7.30.: Exemplary decomposition of a domain into ABAs. A naive multi-start will find the global minimum after an expected 2 starts, while a tunneling-type method will spend some time on the right side of the domain.
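The "expected 2 starts" claim in the caption follows from a geometric distribution: if the basins of attraction draining into the global minimum cover a fraction p of the domain, a naive multi-start needs 1/p uniform restarts on average. A small Python sketch, with p = 0.5 mimicking the decomposition sketched in the figure:

```python
import numpy as np

def expected_starts(p, n_trials=100000, rng=None):
    """Empirical mean number of uniform restarts until a point lands in the
    basins of attraction of the global minimum (total relative volume p)."""
    rng = rng or np.random.default_rng(0)
    # Each restart succeeds independently with probability p, so the number
    # of starts until the first success is geometrically distributed.
    return rng.geometric(p, size=n_trials).mean()

p = 0.5  # half the domain drains into the global minimum, as in Fig. 7.30
print(expected_starts(p))  # close to 1/p = 2
```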

7.4.4. Conclusions on Tunneling

Implementing a tunneling algorithm as presented in Chapter 6 leaves a large variety of parameters to be determined and several choices to be made. Among others, these are:

• Choose the type of the pole function.

• Choose or implement a local minimization solver and be sure about termination tolerances. Possibly use estimates of second-order derivatives for shape-identification.

• Give a strategy to find a suitable pole strength so that there is indeed a singularity with high probability.

• Give initial sizes for pole regions and the parameters for a shaping strategy.

• Decide which distance from the current point to use for a new start point. Make sure that you start in the interior of the pole region.

• Determine the length of the cycles for starting at poles, mobile poles or performing global restarts.

• Choose whether to take a deterministic set of start points given by a low-discrepancy sequence or random points from a uniform distribution.
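To make the first two choices concrete, here is a minimal Python sketch of a pole-modified objective in the spirit of Chapter 6: a spherical pole of given strength placed at the last minimizer, active only inside a pole region of given radius. The functional form of the pole term is a schematic stand-in, not the exact pole function of the thesis:

```python
import numpy as np

def with_pole(f, x_star, strength=1.0, radius=0.5):
    """Add a repelling singularity at a former minimizer x_star.

    Inside the pole region (||x - x_star|| < radius) the objective is blown
    up so that descent methods are pushed away from the known attractor;
    outside the region, the original landscape is untouched.
    """
    def tunneled(x):
        d = np.linalg.norm(np.asarray(x, float) - x_star)
        if d >= radius:
            return f(x)
        if d == 0.0:
            return np.inf
        # Singular as d -> 0, continuous at the region boundary d = radius.
        return f(x) + strength * (1.0 / d - 1.0 / radius)
    return tunneled

f = lambda x: float(np.sum(np.asarray(x, float) ** 2))  # toy function, minimum at 0
g = with_pole(f, x_star=np.zeros(2), strength=1.0, radius=0.5)
print(f([0.01, 0.0]), g([0.01, 0.0]))  # pole destroys the old minimum
print(f([1.0, 0.0]), g([1.0, 0.0]))    # unchanged outside the pole region
```

The remaining items in the list above (pole strength, region size, reshaping, cycle lengths) then parameterize how `strength`, `radius` and the placement of `x_star` evolve over the course of the algorithm.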


The optimal choice, if it exists, surely depends on the function to be minimized. In fact, many parameters have a strong effect on the performance for a specific problem. There are also ways to choose the parameters such that the algorithm fails to find the global minima within reasonable time, for instance when very bad values for the initial size of the pole region or the perturbation step length for new start points are chosen, or when the pole strength is chosen too weak.
We see that there are cases in which a random background strategy performs better than a deterministic one, see function (i). This seems to be because the deterministic set is a special case of all possible random sets and, by coincidence, a bad choice for the considered function; however, there are cases in which the deterministic strategy performed significantly better. So it is clear that there is no deterministic strategy that is superior to a random strategy for all possible functions. Still, it depends on the choice of the cycle parameters Np and Nm

how many global restarts are used, and this decides how many points from the background strategy set are used. The more points are used, the larger the difference between both strategies can be.
We could see that the presented tunneling algorithm has strengths in finding a single global minimum that is surrounded by many local minima, especially in high dimensions. This might be explained by the fact that the unscaled search space grows rapidly with the dimension of the function, which makes it hard to cover the domain by picking random numbers again and again. Our algorithm stays in the region of the last found minimum and moves away according to the cycle parameters. The numerical results show that in the case of the test function (v) with variable dimensions, the shape-identification helped to find the global minimum. We have to keep in mind that the Hessian describing the local shape of the function is of dimension n × n and can yield stronger differences in the rotated axial directions of the pole region the higher the dimension of the function is. So we conclude that the cycle parameters Np and Nm should grow slightly with the dimension of the problem instead of increasing the pole size. The chance to hit a pole region by throwing a random point into the domain tends to zero as the dimension gets large, but recall that the center of the pole region is not an arbitrary point but a former attractor of the function, which definitely increases the chance to find a way into the pole region by following a descent path. This is encouraging because tunneling can only work if the pole regions are visited by the local minimization method.
When we compare the results for the test functions (iiia) and (iiib), we see that the tunneling algorithm performs slightly worse than the simple multi-start minimization in case (a) and significantly better in case (b).
We increased the size of the unscaled search domain by a factor of 9, which causes a reduction of the relative size of the basins of attraction that lead to the two global minima. It is clear that picking random points is a bad choice when the regions we are looking for are small and unlikely to be hit. In that case, the tunneling algorithm performs far better than the simple multi-start. This simple result shows the typical problem that most of the points thrown randomly into the domain will have a bad objective function value and only few of them will be good.
Simple functions with few local and global minima are efficiently solved by picking points randomly or from a quasi-random set and performing local optimizations. This can be even more efficient than using a tunneling-type method because of the faster local convergence rates when there is no pole on the landscape of the function. If the decomposition of the domain of a function has a form like the one shown in Fig. 7.30, the naive multi-start is one of the best choices imaginable and tunneling will surely be less efficient. We see such a behavior for the functions (iiia) and (iv), which only have few local minima. As shown for (iiib), it


is important to note that picking random points without any special global optimization strategy needs a narrow field where the optimum that we are looking for lies. If we know little about where to search, a method without strategy can quickly become highly inefficient.
We see high potential of such an approach for the global optimization of smooth functions and would expect that research can lead to a concept of self-adaptation as it exists for evolution strategies, see [Bey95]. At the end of Chapter 6 we gave an example of how to realize a form of self-adaptation in such a way that the shape of the pole region is modified so that a gradient path method can find a way out of it. If there are efficient gradient path methods, it might be quite promising to use them together with tunneling functions, because in this way the algorithm can learn more about the surface of the function to be minimized and use this information to find good strategy parameters simultaneously with the solution of the global optimization problem.
In the end we see the general problem that a black-box method applied to an arbitrary problem yields unpredictable behavior. If, however, problems of the same type have to be solved often, then we can find a configuration that fits the problem and generates very good performance compared to pure random strategies.
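The inefficiency of unguided random search in higher dimensions can be quantified: the probability that a uniform point in the unit cube lands in a ball of fixed radius collapses with the dimension. A short Python check with illustrative numbers:

```python
import math

def hit_probability(r, n):
    """Volume of an n-ball of radius r relative to the unit cube [0, 1]^n,
    i.e. the chance that one uniform random point lands in the target
    region (ignoring clipping at the cube boundary)."""
    return math.pi ** (n / 2) * r ** n / math.gamma(n / 2 + 1)

for n in (2, 5, 10, 20):
    print(n, hit_probability(0.1, n))
```

Already for n = 10 the hit probability of a radius-0.1 region is of the order 1e-10, which is why a strategy that follows descent paths toward former attractors, rather than relying on random hits, pays off in high dimensions.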

7.5. Global Optimization Results

Global optimization shall be used on the presented optimization problems to increase the chance that the solutions found are not just local solutions. We embedded this into a framework that uses the software gPROMS as a black-box solver. Keeping the MATLAB® notation, we use it like this:

[f, grad] = evaluate_gproms(x);

This means that we pass a vector x of our problem dimension. Internally we have to translate the variable vector into the gPROMS control variables, make it evaluate the point and read the results to get the function value and the gradient of the objective function evaluated at the current point. If x is interpreted as a feasible control for the underlying DAE system, this constraint is implicitly satisfied.
Once we assume that this is possible, we can apply practically any optimization method to the problem, and we choose the modified tunneling algorithm presented in this work. The local optimizations are carried out by the medium-scale algorithm of MATLAB's fmincon, which uses an active-set SQP line-search method and approximates the Hessian of the objective function with the BFGS update formula.
In the following two sections we do not require any information about the solution we are looking for. Instead of giving reasonable initial guesses that are probably close to the solution, we start randomly to see what the tunneling algorithm can do with a complete black-box problem within a specified time. As a stopping rule we only impose an upper bound on the number of function evaluations, since this definitely limits the time spent on the problem.
In our application, using the global optimization can be seen as an experimental analysis of the problem to be solved and can produce useful information for future applications that relate to these examples.
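The evaluate_gproms interface described above can be mimicked for testing without gPROMS by any callable that returns a value-gradient pair. A hypothetical Python stand-in; the names, the toy model and the linear scaling are illustrative assumptions, not the actual gPROMS coupling:

```python
import numpy as np

def evaluate_blackbox(x, lb, ub, model):
    """Stand-in for evaluate_gproms: translate the scaled optimizer vector x
    in [0, 1]^n into model controls, run the model, and return (f, grad)."""
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    u = lb + np.asarray(x, float) * (ub - lb)  # unscale to control variables
    f, grad_u = model(u)                       # "simulate" and differentiate
    grad = grad_u * (ub - lb)                  # chain rule back to scaled space
    return f, grad

# Toy "model": quadratic with known optimum, gradient returned analytically.
model = lambda u: (float(np.sum((u - 3.0) ** 2)), 2.0 * (u - 3.0))
lb, ub = np.zeros(2), np.full(2, 10.0)
f, grad = evaluate_blackbox([0.3, 0.3], lb, ub, model)
print(f, grad)  # f = 0 at the scaled point x = 0.3 (i.e. u = 3)
```

Any optimizer that consumes such a value-gradient pair can then be run against the wrapper without caring whether a simulator or an analytic function sits behind it.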


7.5.1. Wet-End Global Optimal Control

We have already seen that there is an unlimited number of controls which are effectively equivalent, which encouraged an adaptive refinement strategy on the number of control intervals. The number of degrees of freedom grows rapidly with the number of control intervals, and neither good initial controls nor knowledge about the characteristics of the objective function might be available. We strongly recommend analyzing such optimization problems by means of global optimization to generate useful information for future applications.
It is assumed that the dynamic system is in a steady state at time 0, that is, ws = const and w′s = 0, and likewise for all other internal variables of the system. So the objective function of the error-optimal control problem can be written as

f(x) := \frac{1}{T} \int_0^T \left( w_s(x,t) - 90 \right)^2 \, dt, \qquad (7.5.1)

where x ∈ Ω is now the variable comprising the parameterization of the piecewise-constant thick stock control, and the substance of the paper w_s implicitly depends on that control. The domain Ω ⊂ R^n results from the following constraints. The DAE system that implicitly constrains the problem is omitted here, since it is automatically solved each time the objective function is evaluated; compare with the discussion for the drying section optimization problem, which we do not repeat here.
Recall that the way we formulated the error-optimal control problem for the wet-end process leads to a linearly constrained problem. In our wet-end example, the variables consist of nT variables for the interval lengths and nT variables for the value of the thick stock flow rate. We have somewhat natural lower and upper bounds on the total of n = 2nT variables due to technical restrictions, and the time horizon equality constraint

\sum_{i=1}^{n_T} x_i = T, \qquad (7.5.2)

so we are actually looking for a solution in an (n − 1)-dimensional subspace. We choose T as a natural upper bound for each variable x_i for i = 1, . . . , n_T. Our tunneling algorithm works on the scaled problem with domain Ω := [0, 1]^{2n_T}, where the equality constraint (7.5.2) simply becomes

\sum_{i=1}^{n_T} x_i = 1. \qquad (7.5.3)

Here the x_i are the scaled variables and each of them lies in [0, 1]. An arbitrary point x ∈ Ω does not necessarily satisfy this constraint. In order to find feasible initial guesses for the active-set method, we generate random or quasi-random points in the unit cube and project them onto the (n − 1)-dimensional subspace by scaling the first n_T coordinates by the factor

\frac{1}{\sum_{i=1}^{n_T} x_i},

where x is the current point in the cube. This scaling amounts to moving the current point along the straight line through the point and the origin until it hits the (n − 1)-dimensional subspace, which is, for example, a plane triangle with vertices (0, 0, 1), (0, 1, 0) and (1, 0, 0) for n = 3. Alternatively, one could choose an orthogonal projection. Note that the rest of the variables are not affected by the projection.
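The radial projection described above is a one-line operation. A Python sketch with the variable layout from the text, where the first n_T coordinates are the interval lengths:

```python
import numpy as np

def project_to_simplex_radially(x, n_T):
    """Scale the first n_T coordinates of a point in the unit cube so they
    sum to one, i.e. move the point along the ray through the origin onto
    the hyperplane sum_i x_i = 1. The remaining coordinates (the control
    values) are left untouched."""
    x = np.asarray(x, dtype=float).copy()
    s = x[:n_T].sum()
    if s > 0:
        x[:n_T] /= s
    return x

rng = np.random.default_rng(1)
x = rng.random(8)                      # n = 2 * n_T with n_T = 4
y = project_to_simplex_radially(x, 4)
print(y[:4].sum())                     # 1.0 up to rounding
```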


Problem: nT = 4 and Time Horizon T = 500

Effectively, there is a total of 8 variables to be varied in order to find a solution of the global optimal control problem. We stopped the algorithm after 2000 function evaluations and collected all local and global minima that were identified during the tunneling algorithm, because now we are not only interested in the best possible point but also in the complexity of the considered problem. A local solver tolerance of 10^{-8} is used, which refers to the MATLAB® parameters 'TolFun' and 'TolX'. We have found that identifying some candidates is hardly possible with rougher tolerances because the objective function seems to be quite 'flat' on the way to the minima, and a critical point might be wrongly identified. We also observed this phenomenon when we applied the experimental gradient path algorithm to the problem.
The values of the variables in the following table are linearly scaled to be in [0, 1], where the unscaled variables lie in

0.1 ≤ xi ≤ 500, i = 1, . . . , 4, (7.5.4)

and

0.01 ≤ xi ≤ 0.5, i = 5, . . . , 8. (7.5.5)

We get:

           x1      x2      x3      x4      x5      x6      x7      x8      f(x)
  cand. 1  0.1740  0.1020  0.2679  0.4554  0.3537  0.1671  0.3386  0.2613  7866.2
  cand. 2  0.0026  0.0338  0.9629  0.0000  1.0000  0.5478  0.4810  0.4685  13.4414
  cand. 3  0.9994  0.0000  0.0000  0.0000  0.4838  0.5610  0.2282  0.2509  21.4044
  cand. 4  0.0046  0.0016  0.0285  0.9647  1.0000  0       0.5963  0.5233  5.2761

Candidate 4 is equivalent to the solution found before and confirms that we likely found the global minimum while performing the time discretization refinement.
The L1 distances of the effective (scaled) controls are given by the following symmetric distance matrix.

           cand. 1  cand. 2  cand. 3  cand. 4
  cand. 1  0        0.4427   0.4418   0.4891
  cand. 2  0.4427   0        0.0789   0.2095
  cand. 3  0.4418   0.0789   0        0.2108
  cand. 4  0.4891   0.2095   0.2108   0

So we have found several candidates for a global minimum, which proves that a single local minimization does not necessarily give the desired solution. As demonstrated before by the refinement strategy with increasing numbers of control intervals, a good initial guess can overcome this problem because of the 'similarity' of the solutions in the L1 distance sense. If, however, good initial guesses are not available and cannot be computed the way we suggested, the black-box global optimization is the method of choice. In fact, we use the global optimization not least to detect whether there is a need for global optimization at all.
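A distance matrix like the one above can be computed from the candidate vectors in a few lines. A Python sketch; the vectors below are arbitrary placeholders, since the thesis measures the L1 distance of the effective scaled controls rather than of the raw variable vectors:

```python
import numpy as np

def l1_distance_matrix(cands):
    """Symmetric matrix of pairwise L1 distances between candidate vectors,
    with a zero diagonal."""
    C = np.asarray(cands, dtype=float)
    # Broadcast to all pairs (i, j) and sum absolute coordinate differences.
    return np.abs(C[:, None, :] - C[None, :, :]).sum(axis=2)

cands = [[0.2, 0.1, 0.7],
         [0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0]]
D = l1_distance_matrix(cands)
print(D)
```

Small off-diagonal entries then indicate clusters of effectively equivalent minima, as observed for candidates 2 and 3 above.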


Figure 7.31.: Wrapping (radian measure) over the scaled horizontal distance x.

7.5.2. Global Optimization of the Drying Section Geometry

The global optimization algorithm presented in this work is applied to the problem of finding the drying section geometry which allows the highest production rate. As for the wet-end problem, gPROMS is used as a black-box solver for function and gradient evaluation. Again we face the problem that the objective function, that is, the machine speed, cannot be evaluated for every choice of the geometry parameters. Only if these are strictly feasible can we solve the geometry equations for the wrapping lengths.
The analytical problem of finding a geometry which causes maximal wrapping of the cylinder can be stated for the 3-parameter model as

\min_{x_1, x_2, y} \; -\ell_{\mathrm{total}}(x_1, x_2, y), \qquad (7.5.6)

where we can get the total wrapping length (of a lower roll) from Chapter 2 as

\ell_{\mathrm{total}}(x_1, x_2, y) = \sum_{i=1}^{2} \left[ \arccos\left( \frac{r_2^2 + d_i^2 - (y + r_2)^2 - x_i^2}{2 r_2 d_i} \right) - \arccos\left( \frac{r_1 + r_2}{d_i} \right) \right] \qquad (7.5.7)

in radian measure. The distances are given by

d_i = \sqrt{x_i^2 + y^2}, \qquad i = 1, 2.

The constraints have to hold as already discussed in order for the length to be real-valued. In Fig. 7.31 we see the problem reduced to a single dimension by fixing the sum of x1 and x2 as well as y, evaluated for standard choices of r1 and r2. We see that the maximum lies at the boundary of the feasible set and that it is not unique. Real-valued wrapping angles cannot be computed outside. It can also easily be verified for the 2-parameter problem that the maximum lies on the boundary and is not unique.
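Transcribing (7.5.7) directly gives a checkable implementation. A Python sketch, where the radii and coordinates below are arbitrary test values and the geometry is assumed to be strictly feasible so that both arccos arguments lie in [-1, 1]:

```python
import math

def total_wrapping(x1, x2, y, r1, r2):
    """Total wrapping length (radian measure) of a lower roll, following
    eq. (7.5.7). math.acos raises ValueError outside the feasible region,
    mirroring the fact that the wrapping is not real-valued there."""
    total = 0.0
    for xi in (x1, x2):
        d = math.sqrt(xi ** 2 + y ** 2)
        total += math.acos(
            (r2 ** 2 + d ** 2 - (y + r2) ** 2 - xi ** 2) / (2.0 * r2 * d)
        )
        total -= math.acos((r1 + r2) / d)
    return total

# Symmetric test geometry: both terms of the sum coincide.
print(total_wrapping(1.5, 1.5, 1.0, r1=0.9, r2=0.75))
```

Sweeping x1 at fixed x1 + x2 and y with such a function reproduces the qualitative picture of Fig. 7.31, including the undefined region where acos fails.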


Figure 7.32.: Standard, local and global optimal geometries with bound constraints (panels: standard, global, local 1, local 2).

Under the assumption that the problem of maximizing the possible paper production is solved by maximizing the wrapping angle with the heated cylinders, we can derive the constructional concepts directly from the wrapping formulas.
However, in the nonlinear drying section model, the dependency of the machine speed on the geometry is not necessarily symmetric. In fact, it is not, and we can explain this by the control volume modeling used in our application. Moving the lower roll out of its symmetry point changes the lengths of zones that are balanced in the neighboring control volumes. We then no longer necessarily have equal conditions in both control volumes, so it cannot be expected that moving the lower roll has the same effect in both directions. This encourages the use of global optimization, but it also tells us that the model was originally made for symmetric layouts and that the conditions for the control volumes, including the flow rates of dry air into them, have to be revisited. By using the tunneling algorithm on the problem of maximizing the machine speed by choosing a suitable geometry representation, we found that the analytic results cannot be directly transferred to the model problem. The parameters of the 2-parameter models are scaled so that the symmetric standard geometry has the coordinates (0.3, 0.5). Let us say that the standard geometry causes a machine speed of 1. Then we identified 3 critical points as

  y      x1   f
  1      0    1.017
  1      1    1.017
  0.338  1    1.041


We need to explain the existence of the local solutions of the maximization problem. In our model we set fixed rates of air flow to the control volumes that surround each roll. The properties of this air flow are chosen so that the humid air is evacuated. Now if we 'pull down' the lower rolls, we create long tangents and thus more contact length of the paper with the air control volume. If we could change the length of these contact zones independently of the wrapping of the hot cylinders, it is clear that longer contact zones would cause the paper to dry faster, thus allowing higher machine speeds. In fact, we have to make some compromise, and it seems that there are geometries which profit more if the vertical distance is increased, causing a decrease of wrapping and an increase in air usage. This becomes evident when we consider that the wrapping length tends to half of the cylinder circumference as the vertical distance grows, and it can be verified that the wrapping length loses sensitivity with respect to horizontal movement the lower the position actually is. So we found that the assumed global minimum is indeed consistent with one of the theoretical global minima for the analytical wrapping problem and that its basin of attraction is relatively small. We have to find a good initial guess in order not to get stuck on the lower bounds of the vertical distance. Such a guess is not necessarily given by the standard geometry, since it depends on the choice of the local solver; in particular, line-search methods might find the solutions on the boundary when initial step sizes are too large.
Finally, we confirmed the results of the geometry optimization presented before and explained the results and limitations of this optimization approach. As already shown in Fig. 7.31, the standard geometry has the lowest wrapping angle with respect to a fixed vertical distance. So we recommend analyzing asymmetric concepts, not only concerning constructional issues but also with focus on the air system, which is strongly related to the drying process. It might be worthwhile to model the complex three-dimensional geometry of the drying section and simulate air flows to compare symmetric and asymmetric geometries.

7.6. Conclusions and Outlook

The idea of a local minimization method for linearly constrained problems using projected gradients and methods for initial value problems was presented, and its perspectives in optimal control were discussed. The algorithm is a quick implementation to illustrate the capabilities of such a method, which was done successfully. It can be used to analyze the error-optimal control problems, which are in fact only linearly constrained; it gives a good impression of how the objective function surface looks and explains why line-search methods can be quite inefficient on these types of problems.
We were concerned with the time-optimal control problem with trajectory boundaries introduced in Chapter 4 and solved it for exemplary applications with Algorithm 4.1. We could demonstrate that the time-optimal control problem can be solved up to a specified tolerance by a finite sequence of linearly constrained error-optimal control problems. The optimal control structures for error-optimal control problems were computed by applying a refinement scheme to the control interval discretization, and the results were confirmed by using the global optimization method presented in this work.
The C++ library in which we implemented a foreign process routine for gPROMS was successfully used to perform the full parametric sensitivity analysis for a standard case of the paper machine drying section model presented in Chapter 2. The results show that this tool is of very practical use for the design and optimization of the drying process. We can now analyze which parts of the drying section are influenced the most by any parameters


and especially by the parameters of interest. We saw that the output of the drying section, namely dry paper, does not only depend on the drying section input and steam pressure controls but on the very geometry of the machine. We used this information to motivate an optimization problem considering the production capacity by means of the machine speed as a function of geometry parameters. To describe a valid geometry, we use different variants resulting in parameter vectors of different length, thus leading to different problem dimensions. Finally, the results of the optimizations could hardly be misunderstood: a significant increase of the possible production capacity can theoretically be achieved by arranging the rolls of the drying section asymmetrically, in the sense that the distances from a lower roll to its neighboring upper rolls are not equal.
The global optimization method based on the tunneling idea works. It was used in different configurations on a set of academic test functions. We tried 8 configurations on 6 test problems; each test included about 2000 complete global optimization runs, which means that almost 100000 global optimization problems were solved using, effectively, millions of function evaluations. We found that it strongly depends on the problem which configuration outperforms the others. In most cases, the algorithm based on tunneling was significantly better than performing random searches for initial guesses. Our results show that the use of ellipsoidal supports of pole functions performs better than the spherical approach for higher problem dimensions. The effect of a deterministic or random background strategy also depends on the function to be minimized. One can be lucky that the first point in the deterministic sequence leads to the global minimum; then it is very hard to beat and all further strategies have no influence. Additionally, there are algorithm parameters to control the use of background strategy points. By increasing these cycle parameters sufficiently, the background strategy can be deactivated, while setting them to zero results in a very basic version of a tunneling method that becomes inefficient for high-dimensional problems, as we argued. Besides the three configuration parameters shape-identification, reshaping and background strategy, there are several other parameters that are highly sensitive, such as the strength of the poles, the maximal size of pole regions, the perturbation step length for restarting close to poles, and some more. Since we fixed these for all of our test problems, we cannot expect the algorithm to be best suited for all problems, and we suggest that the user of such an algorithm spends some time adjusting the algorithm to the current problem. Finally, the algorithm proves to be robust in the sense that it solves black-box optimization problems reliably. This is confirmed by applying the algorithm to the optimization problems for the wet-end optimal control and the drying section geometry. We use it to gain knowledge about the problem to be solved and confirm that the idea of adaptive control interval refinements is suitable to circumvent the problem of having multiple local minima; otherwise, the global optimization helps to identify the best candidates for global minima.
In this work we presented extended concepts for process simulation and optimization in pulp and paper industry and successfully applied the methods to exemplary problems that were designed to have a strong relation to the real processes. Now we have a powerful framework with embedded commercial software packages set up to assist research and development. First applications with the geometrical description of drying section layouts and the wet-end process are very promising, and the methods presented here recommend themselves for further use in paper industry.


7.6.1. Outlook on Tunneling-Type Algorithms

We discussed several topics which are interesting for further investigation. The multi-start approach and the tunneling concept for global optimization are still comparatively undeveloped. In this work we made a step forward and showed ways to extend the methods. We think that for problems where gradient information is available, these methods are very attractive. If these methods are developed towards self-adaptation, as has been done for other global optimization techniques such as evolution strategies, tunneling-type multi-start methods can become very efficient for high-dimensional problems involving many local and global minima.
We see this topic as scientifically related to the use of ODE solvers for unconstrained smooth optimization problems. We showed how to realize a gradient path method based on projected gradients for linearly constrained problems and also presented an idea of how to integrate the tunneling concept in an adaptive way. It seems that gradient path methods are a very natural choice for the local solver used in tunneling algorithms. It might be worthwhile to develop a gradient path method with special modifications that make it best suited for use with tunneling functions. This would then be a whole new global optimization method, because the choice of the local optimization solver is no longer arbitrary but dedicated.
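The core of the gradient path idea is to integrate the ODE x'(t) = -grad f(x(t)) until a stationary point is approached. A few-line Python sketch on an unconstrained toy quadratic, using explicit Euler as a crude stand-in for the BDF-based projected scheme of Appendix A:

```python
import numpy as np

def gradient_path(grad, x0, h=0.05, tol=1e-8, max_steps=10000):
    """Integrate x' = -grad(x) with explicit Euler steps of size h until
    ||grad(x)|| < tol, i.e. follow the gradient flow to a stationary point."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_steps):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - h * g
    return x

# Toy quadratic f(x) = 0.5 x^T A x with minimizer at the origin.
A = np.array([[3.0, 0.0], [0.0, 1.0]])
grad = lambda x: A @ x
print(gradient_path(grad, [2.0, -2.0]))  # approaches [0, 0]
```

In the tunneling setting, the same flow applied to a pole-modified objective follows the repelled gradient field out of the pole region, which is the adaptive interaction sketched at the end of Chapter 6.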


Appendix A.

Projected Gradient Path Algorithm inMATLAB

This is the full MATLAB® code for an implementation of the gradient path method presented and used in Chapter 7. This is the main function that can be called to minimize a provided function with linear constraints.

function [stepping, x, f, ncheck] = gradpath_2(objective_fun, x0, A0, b0, ind_eq0)

% start value

% ------------

x(:,1) = x0;

% start order

% ------------

start_order = 1;

k = 1;

start_s = start_order -1;

s = 0;

In = eye (2);

usage.fsolve = 0;

usage.simple = 0;

usage.fsolve_func = 0;

usage.simple_func = 0;

usage.failed = 0;

% constraints

% ------------

global A b ind_eq working_eq;

A = A0;

b = b0;

ind_eq = ind_eq0;

ind_ineq = setdiff (1:1: size(A,1), ind_eq );

working_eq = ind_eq;

% BDF coefficients

% ------------------

alpha = zeros (5,1);

alpha (:,1) = [1; -1; 0; 0; 0];

alpha (:,2) = [3/2; -2; 1/2; 0; 0];

alpha (:,3) = [11/6; -3; 3/2; -1/3; 0];

alpha (:,4) = [25/12; -4; 3; -4/3; 1/4];

% Adams -Bashforth coefficients

% ---------------------------------

beta = zeros (4 ,3);

beta (:,1) = [-1; 3; 0; 0]/2;

beta (:,2) = [5; -16; 23; 0]/12;

beta (:,3) = [-9; 37; -59; 55]/24;

% METHOD PARAMETERS

% --------------------------

global tol_constr func;

tol = 1e-10;

tol_constr = 1e-8;

max_step = inf;

min_step = 1e-3;

armijo_constant = 0.0;

step_reduce = 1/4;

reduce_order = 1e-10;

tol_newton = 1e-10;

maxiter = 15;

options = optimset('Display', 'off', 'TolFun', tol_newton, 'NonlEqnAlgorithm', 'dogleg');

func = @(x) objective_fun(x);

stop = false;

n = 1;

n_after_init = 1;

initial_h = 1;

reduced = false;

T(1) = 0;

T_after_init = 0;

Bk = In;


% Iteration 0

% -------------------------------------------------------------------------

[f(1), grad (:,1)] = func(x(: ,1));

str = sprintf('It.: %d: x = [%.5f, %.5f], f(x) = %.5f', 0, x(1,1), x(2,1), f(1));

disp(str);

step (1) = 0;

I = eye (2);

constr = constraints(x(: ,1));

[proj_grad (:,n), check] = gradient_projection(x(:,n), grad(:,n), ind_eq , ind_ineq );

ncheck (1) = norm(check);

if norm(check) < tol

disp('nothing to be done');

return;

end

% -------------------------------------------------------------------------

while stop == false

n = n + 1;

n_after_init = n_after_init + 1;

nsteps = length(step);

total = sum(step(nsteps :-1:nsteps -n_after_init +2));

if total == 0

initial_h = min_step;

else

initial_h = total/(max(k, s+1));

end

h = initial_h;

while 1

history = virtual_grid(x,k,h,step);

history_grad = virtual_grid(proj_grad , s+1, h, step);

constr_old = constraints(x(:,n-1));

% Adams -Bashforth predictor

if s == 0

predhist = -history_grad (:,1);

else

predhist = 0;

for i=0:s

predhist = predhist + beta(s+1-i,s)* history_grad (:,i+1);

end

end

predictor = x(:,n-1) + h*predhist;

guess = predictor;

% check predictor

c_pred = constraints(predictor );

% make initial guess feasible

if ~isempty(find(c_pred(ind_ineq )<0))

[v, violated] = min(c_pred);

hpred = -0.5 * A(violated ,:)*x(:,n-1)/(A(violated ,:)* predhist );

guess = x(:,n-1) + hpred * predhist;

h = hpred; % reduce step

end

hist = 0;

for i=2:k+1

hist = hist + alpha(i,k)* history(:,i-1);

end

nonlin = @(x) h*nlfun(x) + alpha(1,k)*x + hist;

% active constraints

working_eq = ind_eq;

currently_active = find(abs(constr(ind_ineq))< tol_constr );

% --------------------------------------------------------------------------

% Simplified Newton

converged = false;

Jacobian = alpha(1,k)*In + h*Bk; % Jacobian approximation

xn_simple = [];

xn_simple (:,1) = guess;

res1 = inf;

for i=1: maxiter

res1 = nonlin(xn_simple(:,i));

dx = -Jacobian\res1;

xn_simple(:,i+1) = xn_simple(:,i) + dx;

if norm(res1) < tol_newton

converged = true;

break;

end

end


if ~converged

% Solve with MATLAB's method

[xn, fval, exitflag, output, jacobian] = fsolve(nonlin, guess, options);

used = 'fsolve';

usage.fsolve = usage.fsolve + 1;

usage.fsolve_func = usage.fsolve_func + output.funcCount;

usage.failed = usage.failed + i;

else

xn = xn_simple(:,end);

used = 'simplified';

usage.simple = usage.simple + 1;

usage.simple_func = usage.simple_func + i;

end

% --------------------------------------------------------------------------

c_xn = constraints(xn);

% if predictor points in and corrector points out

if any(c_xn < 0) && ~any(c_pred < 0)

% trust the corrector and walk on the boundary !

working_eq = currently_active;

xn = fsolve(nonlin, predictor, options);

end

% check constraints

[constr , A] = constraints(xn);

[ci , i] = min(constr );

if isempty(ind_eq)

c_eq = 0;

else

[c_eq, i_eq] = max(abs(constr(ind_eq)));

end

if ci < -tol_constr || (c_eq > tol_constr)

% linear extrapolation

% BDF infeasible

disp('BDF step reached boundary ... reducing last step.');

d1 = predhist;

d2 = xn - x(:,n-1);

h1 = h;

h2 = 1;

if abs(ci) > c_eq

violated = find(constr < 0, 1);

if ~isempty(violated)

% now put xn onto the boundary by reducing step

h1 = h1 * constr_old(violated)/(constr_old(violated) - c_pred(violated));

h2 = h2 * constr_old(violated)/(constr_old(violated) - constr(violated));

xn1 = x(:,n-1) + h1*d1;

xn2 = x(:,n-1) + h2*d2;

xn = (xn1 + xn2 )/2;

end

h = h1;

end

[constr , A] = constraints(xn);

currently_active = find(abs(constr(ind_ineq)) < tol_constr);

working_eq = currently_active;

% re - initialize solver

n_after_init = 1;

T_after_init = 0;

k = 1;

s = 0;

end

% current point evaluation (not necessary if this information is

% stored in the implicit equation solver)

[fn , gradn] = func(xn);

ddirection = proj_grad(:,n-1)' * grad(:,n-1);

armijo_bound = f(n-1) + h * armijo_constant * ddirection;

% armijo condition

if n_after_init == 1 || (fn < armijo_bound)

x(:,n) = xn;

f(n) = fn;

step(n) = h;

T(n) = T(n-1) + h;

T_after_init = T_after_init + h;

grad(:,n) = gradn;

[proj_grad(:,n), check] = gradient_projection(x(:,n), grad(:,n), ind_eq, ind_ineq);

ncheck(n) = norm(check);

% update of Hessian

yk = proj_grad(:,n) - proj_grad(:,n-1);

xk = x(:,n) - x(:,n-1);

Bk = Bk + (yk - Bk*xk)*xk'/(xk'*xk); % Broyden update


% increase order to max

if k < start_order

k = k + 1;

end

if s < start_s

s = s + 1;

end

break;

else

h = step_reduce * h;

if h < reduce_order

if k == 1

disp('failed');

return;

else

if reduced == true

disp('failed');

return;

end

disp('Order reduced to 1');

k = 1;

s = 0;

h = initial_h;

end

end

end

end

if norm(gradn) < tol || ncheck(n) < tol

stop = true;

end

end

disp('success');

for i=1:n

stepping(i) = sum(step(1:i));

end

end

function [F] = nlfun(x)

global working_eq;

global func;

[f, grad] = func(x);

F = gradient_projection(x, grad, working_eq, []);

end

function [c, A] = constraints(x)

global A b;

c = A*x - b;

end

function [pg, check] = gradient_projection(x, grad , ind_eq , ind_ineq)

global tol_constr;

[constr , A] = constraints(x);

activeset = ind_eq;

if ~isempty(ind_ineq)

% check constraints

W_eq = find(abs(constr(ind_ineq)) < tol_constr);

% blocking constraints -> build guess for active set

blocking = find(A(W_eq,:)*(-grad) < 0);

if ~isempty(blocking)

activeset = [ind_eq; W_eq(blocking)];

end

else

W_eq = activeset;

end

% projected gradient

Z = null(A(activeset,:));

pg = Z*((Z'*Z)\(Z'*grad)); % projection onto the null space, without explicit inv

%Lagrange multipliers for projected gradient problem

ind_ineq = setdiff(1:size(A,1), W_eq);

lambda = A(ind_ineq,:)'\pg;


check = lambda'*constr(ind_ineq);

end

The following code implements a call of the gradient path method for Rosenbrock's function and plots the resulting trajectories.

function run_gradpath

% start value

x0 = [2; 1.2];

plots = true;

% constraints

% lb1 <= x1 <= ub1

% lb2 <= x2 <= ub2

lb1 = 0;

ub1 = 3;

lb2 = 0;

ub2 = 3;

A = [eye(2);

-eye(2);

1 1];

b = [lb1;

lb2;

-ub1;

-ub2;

0];

ind_eq = [];

[T, X, f, opt] = gradpath_2(@func, x0, A, b, ind_eq);

n = length(T);

if plots

figure

h = plot(T, X(1,:), 'o-');

set(h, 'Color', 'black', 'DisplayName', 'x_1');

hold on

h = plot(T, X(2,:), 'x-');

set(h, 'Color', 'black', 'DisplayName', 'x_2');

legend('show');

figure

hold on

h = plot(1:n, opt, '.');

set(h, 'Color', 'black');

figure

h = plot(X(1,:), X(2,:), 'x-');

set(h, 'Color', 'black');

hold on;

[x,y] = meshgrid(0:0.01:2, 0:0.01:2);

z = (1-x).^2 + 100*(y - x.^2).^2; % Rosenbrock's function on the grid

v = [0 1 2 3 5 10 100 1000 10000];

contour(x,y,z,v);

set(gca, 'FontName', 'Times', 'FontSize', 14);

end

end

function [f, grad] = func(x)

f = (1-x(1))^2 + 100*(x(2)-x(1)^2)^2;

grad = zeros(2,1);

grad(1) = -2*(1-x(1)) - 400*(x(2)-x(1)^2)*x(1);

grad(2) = 200*(x(2)-x(1)^2);

end
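As a quick sanity check of the analytic gradient above, it can be compared against central finite differences. The following sketch is in Python rather than MATLAB; the helper `fd_grad` is illustrative and not part of the thesis code.

```python
def func(x1, x2):
    # Rosenbrock's function and its analytic gradient, as in the listing above
    f = (1 - x1)**2 + 100*(x2 - x1**2)**2
    g = (-2*(1 - x1) - 400*(x2 - x1**2)*x1, 200*(x2 - x1**2))
    return f, g

def fd_grad(x1, x2, h=1e-6):
    # central finite differences in each coordinate
    d1 = (func(x1 + h, x2)[0] - func(x1 - h, x2)[0]) / (2*h)
    d2 = (func(x1, x2 + h)[0] - func(x1, x2 - h)[0]) / (2*h)
    return d1, d2
```

At the start value x0 = (2, 1.2) used above, both variants agree to finite-difference accuracy.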


Appendix B.

Distance of two Piecewise-Constant Controls

Given two controls of the form

\[ u_i(t) = \begin{cases} v_1^i & 0 \le t < z_1^i, \\ v_2^i & z_1^i \le t < z_1^i + z_2^i, \\ v_3^i & z_1^i + z_2^i \le t < z_1^i + z_2^i + z_3^i, \\ \;\vdots & \end{cases} \]

for i = 1, 2 and 0 ≤ t ≤ T. Then the L¹-distance

\[ \| u_1 - u_2 \|_{L^1} = \int_0^T | u_1(t) - u_2(t) | \, dt \]

is given by the following program when the inputs are of the specified form

\[ u_i = (z_1^i, z_2^i, \dots, v_1^i, v_2^i, \dots), \qquad i = 1, 2. \]

The program determines the superset of all nodes, which defines a new piecewise-constant function of absolute differences on the common grid. The areas of the resulting blocks are then calculated and summed up.
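For illustration (the numbers here are an example, not from the thesis), take T = 2, u_1 ≡ 1 on [0, 2], and u_2 equal to 0.5 on [0, 1) and 1.5 on [1, 2). The union of the nodes is {0, 1, 2}, and summing the block areas gives

\[ \| u_1 - u_2 \|_{L^1} = \int_0^1 |1 - 0.5| \, dt + \int_1^2 |1 - 1.5| \, dt = 0.5 + 0.5 = 1. \]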

function d = distance(u1, u2)

% usage:

% ------

% d = distance(u1, u2)

%

% u1 and u2 are controls with an even number of elements each

% of the form:

%

% u1 = [z1 z2 z3 ... v1 v2 v3 ...]

% first set

n1 = length(u1);

z1 = [0, u1(1:n1/2)];

v1 = u1(n1/2+1:n1);

for i=1:length(z1)-2

nodes1(i) = sum(z1(1:i+1));

end

% second set

n2 = length(u2);

z2 = [0, u2(1:n2/2)];

v2 = u2(n2/2+1:n2);


for i=1:length(z2)-2

nodes2(i) = sum(z2(1:i+1));

end

T = min(sum(z1), sum(z2));

nodes1(length(z1)-1) = T;

nodes2(length(z2)-1) = T;

total = [0, union(nodes1, nodes2)];

% remove irrelevant intervals

nt = length(total);

while abs(total(nt)-total(nt-1)) < 1e-10

total = total(1:nt-1);

nt = length(total);

end

error = @(t) abs(pcfun(t, z1, v1) - pcfun(t, z2, v2));

TOTAL_ERR = 0;

for i=2:length(total)

p = (total(i-1) + total(i))/2;

h = total(i) - total(i-1);

TOTAL_ERR = TOTAL_ERR + h*error(p);

end

d = TOTAL_ERR;

end

function f = pcfun(t, z, v)

for k=1:length(t)

if t(k) == 0

f(k) = v(1);

end

for i=2:length(z)

if (sum(z(1:i-1)) < t(k) && t(k) <= sum(z(1:i)))

f(k) = v(i-1);

break;

end

end

end

end
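For cross-checking the node-union construction outside MATLAB, the same computation can be sketched in Python. All names here are illustrative and not part of the thesis code; the midpoint evaluation is exact because the integrand is constant on each block of the common grid.

```python
def l1_distance(z1, v1, z2, v2):
    """L1 distance of two piecewise-constant controls.

    z1, z2 are interval lengths; v1, v2 the values on those intervals.
    """
    def breakpoints(z):
        # cumulative nodes 0, z_1, z_1 + z_2, ...
        nodes = [0.0]
        for length in z:
            nodes.append(nodes[-1] + length)
        return nodes

    n1, n2 = breakpoints(z1), breakpoints(z2)
    T = min(n1[-1], n2[-1])                        # common horizon
    grid = sorted({t for t in n1 + n2 if t <= T} | {T})

    def value(nodes, vals, t):
        # value of the control at time t (intervals are right-open)
        for i in range(len(nodes) - 1):
            if nodes[i] <= t < nodes[i + 1]:
                return vals[i]
        return vals[-1]

    d = 0.0
    for a, b in zip(grid, grid[1:]):
        mid = 0.5 * (a + b)                        # exact: integrand constant per block
        d += (b - a) * abs(value(n1, v1, mid) - value(n2, v2, mid))
    return d
```

For the example controls u_1 ≡ 1 on [0, 2] and u_2 = 0.5 on [0, 1), 1.5 on [1, 2), this returns 1.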


Symbols and Abbreviations

Abbreviations

r-GOP : Global Optimization Problem of finding r global optima (p. 130)
ABA : Analytic Basin of Attraction (p. 170)
BDF : Backward-Difference Formula (p. 65)
CSTR : Continuously-Stirred Tank Reactor (p. 33)
DAE : Differential-Algebraic set of Equations (p. 12)
EDF : Ellipsoidal Distance Function (p. 124)
GOP : Global Optimization Problem (p. 118)
gPROMS : general PROcess Modeling System (p. 11)
KKT point : a point that satisfies the conditions of Karush-Kuhn-Tucker (p. 84)
NLP : NonLinear Program (p. 83)
ODE : Ordinary Differential Equation (p. 12)
PDE : Partial Differential Equation (p. 34)
SQP : Sequential Quadratic Programming (p. 93)
TOC : Time-Optimal Control problem (p. 103)

Drying Section Symbols

α_Pw : heat transfer coefficient wire/paper [W/(K·m²)]
α_Pair : heat transfer coefficient air/paper [W/(K·m²)]
α_Pcyl : heat transfer coefficient paper/cylinder [W/(K·m²)]
α_cond,cyl : heat transfer coefficient cylinder/condensate [W/(K·m²)]
Δp(i) : discretization step length of zone i [m]
Δx : horizontal distance between the centers of two neighboring rolls [m]
Δy : vertical distance between the centers of two neighboring rolls [m]
Δ_V H : evaporation enthalpy of water [J/kg]
ṁ : water evaporation rate [kg/(m²·s)]
Ṁ_dry : mass flow of dry air to the control volume [kg/s]
ℓ_c : length of arc on the lower roll, contact length with the cylinder on the left half [m]
ℓ_tan : length of the tangent between lower and upper roll [m]
η : diffusion coefficient for the Stefan equation [–]
γ_w : ratio of heat transfer through a wire [–]
γ_air : multiplier for heat transfer [–]
S : sinks and sources of enthalpy [W]
ω_cyl : heat conductivity of the cylinder wall (steel) [W/(m·K)]
ρ0_air : density of the humid air to the control volume [kg/m³]
ρ_cyl : density of the cylinder wall (steel) [kg/m³]
cp_P : heat capacity of the paper [J/(kg·K)]
G : specific mass of the solids in the paper [kg/m²]
G_b : total specific mass of the paper [kg/m²]
pw_a : partial pressure of water in air [Pa]
pw_P : partial pressure of water on the surface of the paper [Pa]
Q_cyl,air : heat flow from the cylinder to the air [W]
Q_cyl,P : heat flow from the cylinder to the paper [W]
T : temperature of the paper [K]
T_air : temperature of the air [K]
T_cyl : temperature of the cylinder surface [K]
u : moisture content of the paper [kg/kg]
u_air : air moisture content [kg/kg]
v : machine speed [m/s]
A_air : contact area between cylinder surface and air [m²]
cp_cyl : heat capacity of the cylinder wall (steel) [J/(kg·K)]
d_cyl : thickness of the cylinder wall [mm]
M_w : molar mass of water [kg/mol]
p_a : ambient pressure [Pa]
p_j : discretization node [m]
q0_air : mass flow of fresh air [kg/s]
R : universal gas constant [J/(mol·K)]
r_1 : radius of the heated (upper) rolls [m]
r_2 : radius of the unheated (lower) rolls [m]
u0_air : moisture content of the fresh air to the control volume [kg/kg]

Wet-End Model Symbols

Δp : pressure difference [Pa]
Δh_p : pressure head [Pa]
ℓ : liquid level [m]
γ_dew : dewatering fraction [–]
ρ : density of pulp [kg/m³]
d_slice : slice opening of the headbox [mm]
f_pc : pump characteristic function [m]
g : gravitational acceleration [m/s²]
m_fib : specific mass of fibers [g/l]
m_fil : specific mass of fillers [g/l]
R : rotation speed of the pump [1/s]
R_fib : fiber retention [kg/kg]
R_fil : filler retention [kg/kg]
R_tot : total retention [kg/kg]
T : temperature of pulp [K]
V : fill volume of liquid in a storage element [m³]
v : flow velocity of pulp [m²/s]
v_M : machine speed [m/s]
v_in : inflow rate [m³/s]
v_of : overflow rate [m³/s]
v_out : outflow rate [m³/s]
W_M : width of the machine [m]
w_s : substance of the paper [g/m²]

248

Page 260: Global Nonlinear Optimization and Optimal Control for ...

Bibliography

[AHM07] K. Stockmann A. Hartwich and W. Marquardt. Dyos: A software solution foradaptive control vector paramterizatuin in nonlinear model-predicitve control.NMPC-SOFAP Workshop, Loughborough, England, 2007.

[AL07] B. Addis and M. Locatelli. A new class of test functions for global optimization.J. of Global Optimization, 38(3):479–501, 2007.

[AM98] H. A. Al-Mharmah. Average performance of quasi monte carlo methods forglobal optimization. In WSC ’98: Proceedings of the 30th conference on Wintersimulation, pages 623–628, Los Alamitos, CA, USA, 1998. IEEE ComputerSociety Press.

[AQS02] R. Sacco A. Quarteroni and F. Saleri. Numerische Mathematik 2 (Springer-Lehrbuch) (German Edition). Springer, 1 edition, 9 2002.

[AS06] J. Akesson and O. Slatteke. Modeling, calibration and control of a paper ma-chine dryer section. Modelica Conference 2006 at arsenal research in Vienna,Austria - Proceedings, 2006.

[Atk93] P. W. Atkins. Einfuehrung in Die Physikalische Chemie. Wiley-VCH VerlagGmbH, 3 1993.

[BBB89] A. A. Brown and M. C. Bartholomew-Biggs. Some effective methods for uncon-strained optimization based on the solution of systems of ordinary differentialequations. J. Optim. Theory Appl., 62(2):211–224, 1989.

[Bed09] S. Beddiaf. Continuous steepest descent path for traversing non-convex regions.Advanced Modeling and Optimization, 11(1):3–24, 2009.

[Beh98] W. Behrmann. An efficient gradient flow method for unconstrained optimiza-tion. PhD thesis, Stanford University, 1998.

[Bel03] R. Bellman. Dynamic Programming. Dover Publications, 3 2003.

[Ber07] D. P. Bertsekas. Dynamic Programming and Optimal Control (2 Vol Set).Athena Scientific, 3rd edition, 1 2007.

[Bet93] J. T. Betts. Trajectory optimization using sparse sequential quadratic program-ming. International Series of Numerical Mathematics, 111:115 – 128, 1993.

[Bey95] H.-G. Beyer. Toward a theory of evolution strategies: Self-adaptation. Evolu-tionary Computation, 3(3):311–347, 1995.

[BG91] C. Barron and S. Gomez. The exponential tunneling method. Technical ReportIIMAS, 1(3), 1991.

249

Page 261: Global Nonlinear Optimization and Optimal Control for ...

Bibliography

[Bot78] C. A. Botsaris. A class of methods for unconstrained minimization based onstable numerical integration techniques. J. Math. Anal. Appl., 63(3):729–749,1978.

[BP94] P. I. Barton and C. C. Pantelides. Modeling of combined discrete/continuousprocesses. AIChE Journal, 40(6):966 – 979, 1994.

[BRK87] C. G. E. Boender and A. H .G. Rinnooy-Kan. Bayesian stopping rules formultistart global optimization methods. Mathematical Programming, 37(1):59–80, 1987.

[Bro91] I. N. Bronstein. Taschenbuch Der Mathematik 25ed. B. G. Teubner Verlagsge-sellschaft, 1991.

[BS93] T. Baeck and H.-P. Schwefel. An overview of evolutionary algorithms for pa-rameter optimization. Evolutionary Computation, 1(1):1–23, 1993.

[BS96] R. Bulirsch and J. Stoer. Introduction to Numerical Analysis (Texts in AppliedMathematics, No 12). Springer, 2nd edition, 4 1996.

[BS02] H.-G. Beyer and H.-P. Schwefel. Evolutions strategies: A comprehensive Intro-duction. Natural Computing, 1:3–52, 2002.

[But87] J. C. Butcher. The Numerical Analysis of Ordinary Differential Equations:Runge-Kutta and General Linear Methods. John Wiley & Sons Inc, 2 1987.

[BV04] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge UniversityPress, 3 2004.

[BW79] E. Braaten and G. Weller. An Improved Low-Discrepancy Sequence for Multi-dimensional Quasi-Monte Carlo Integration. Journal of Computational Physics,33:249–+, November 1979.

[CE93] F. E. Cellier and H. Elmqvist. Automated formula manipulation supportsobject-oriented continuous-system modeling. Control Systems Magazine, IEEE,13(2):28–38, Apr 1993.

[Cel79] F. E. Cellier. Combined continuous/discrete system simulation by use of digitalcomputers. Phd thesis, Diss. Techn.Wiss. ETH Zuerich, 1979.

[CG00] L. Castellanos and S. Gomez. A new implementation of the tunneling methodsfor bound constrained global optimization. Reporte de Investigacin IIMAS,10(59):1–18, 2000.

[CM87] P. H. Calamai and J. J. More. Projected gradient methods for linearly con-strained problems. Mathematical Programming, North Holland, 39:93–116,1987.

[Cor75] C. R. Corles. The use of regions of attraction to identify global minima, in:L.C.W. Dixon, G.P. Szego (Eds.), Towards Global Optimization. North Hol-land, Amsterdam, 1975.

[Cou43] R. Courant. Variational methods for the solution of problems of equilibriumand vibrations. Bull. Amer. Math. Soc., 49:1, 1943.

250

Page 262: Global Nonlinear Optimization and Optimal Control for ...

Bibliography

[Cry73] C. W. Cryer. On the instability of high order backward-difference multistepmethods. BIT, 13:153–159, 1973.

[CS85] M. Caracotsios and W. E. Stewart. Sensitivity analysis of initial value problemswith mixed ode and algebraic constraints. Comput. Chem. Engrg., 9:359–365,1985.

[DA91] A. Dekkers and E. Aarts. Global optimization and simulated annealing. Math-ematics and Statistics, 50(1-3):367–393, 1991.

[Dah08] E. Dahlquist. Process simulation for pulp and paper industries: Current prac-tice and future trend. Chemical Product and Process Modeling, 3(1), 2008.

[Dav06] T. A. Davis. Direct Methods for Sparse Linear Systems (Fundamentals of Al-gorithms). Society for Industrial and Applied Mathematic, illustrated edition,9 2006.

[DB95] P. Deuflhard and F. Bornemann. Numerische Mathematik II. Walter DeGruyter Inc, 9 1995.

[DCH01] M. A. Pai D. Chanitotis and I. Hiskens. Sensitivity analysis of differential-algebraic systems using the gmres method – application to power systems. InProceedings of the IEEE International Symposium on Circuits and Systems,Sydney, Australia, May 2001.

[Deu04] P. Deuflhard. Newton Methods for Nonlinear Problems: Affine Invarianceand Adaptive Algorithms (Springer Series in Computational Mathematics).Springer, Berlin, 6 2004.

[Dix78] L. C. W. Dixon. The optimization problem: An introduction, Towards GlobalOptimization II. North Holland, Amsterdam, 1978.

[Dod06] B. Dodson. The Weibull Analysis Handbook. ASQ Quality Press, 2nd edition,4 2006.

[DR96] I. S. Duff. and J. K. Reid. The design of ma48: a code for the direct solutionof sparse unsymmetric linear systems of equations. ACM Trans. Math. Softw.,22(2):187–226, 1996.

[DVS09] J. De Schutter D. Verscheure, M. Diehl and J. Swevers. Recursive log-barriermethod for on-line time-optimal robot path tracking. Accepted for Ameri-can Control Conference, http://people.mech.kuleuven.be/ dversche/cv/cv.html,2009.

[Ede04] A. Edelman. Mathematics 18.337, computer science 6.338, sma 5505, Spring2004.

[EGK08] C. Eck, H. Garcke, and P. Knabner. Mathematische Modellierung (Springer-Lehrbuch) (German Edition). Springer, 1 edition, 5 2008.

[EH08] J. Ekvall and T. Hagglund. Improved web break strategy using a new approachfor steam pressure control in paper machines. Control Engineering Practice,16(10):1151 – 1160, 2008.

251

Page 263: Global Nonlinear Optimization and Optimal Control for ...

Bibliography

[Ekv04] J. Ekvall. Dryer Section Control in paper machines during web breaks. Licen-tiate thesis, Departement of Automatic Control, Lund Institute of Technology,Sweden, 2004.

[Eva98] L. C. Evans. Partial Differential Equations (Graduate Studies in Mathematics,V. 19) GSM/19. American Mathematical Society, 6 1998.

[FA99] G. Fraser-Andrews. A multiple-shooting technique for optimal control. SpringerJournal of Optimization Theory and Applications, 102(2):299–313, 1999.

[FA00] R. Findeisen and F. Allgoewer. Nonlinear model predictive control for index-one dae systems. Progress in Systems and Control Theory, 26:145–161, 2000.

[Fau82] H. Faure. Discrepance de suites associees a un systeme de numeration (endimension s). Acta. Arith., 41:337–351, 1982.

[FB07] E. Freitag and R. Busam. Funktionentheorie 1. Springer, Berlin, 2007.

[FTB97] W. F. Feehery, J. E. Tolsma, and P. I. Barton. Efficient sensitivity analysisof large-scale differential-algebraic systems. Appl. Numer. Math., 25(1):41–54,1997.

[Gea71] C. W. Gear. The automatic integration of ordinary differential equations. Com-mun. ACM, 14(3):176–179, 1971.

[GJ08] A. Georgieva and I. Jordanov. Hybrid metaheuristics for global optimization: Acomparative study. In HAIS ’08: Proceedings of the 3rd international workshopon Hybrid Artificial Intelligence Systems, pages 298–305, Berlin, Heidelberg,2008. Springer-Verlag.

[GK99] C. Geiger and C. Kanzow. Numerische Verfahren zur Loesung unrestringierterOptimierungsaufgaben (Springer-Lehrbuch) (German Edition). Springer, 1 edi-tion, 9 1999.

[Gri02] G. Corliss; C. Faure; A. Griewank. Automatic Differentiation of Algorithms.Springer, 1 2002.

[HAKJ03] V. Ionescu H. Abou-Kandil, G. Freiling and G. Jank. Matrix Riccati Equationsin Control and Systems Theory (Systems & Control: Foundations & Applica-tions). Birkhaeuser Basel, 1 edition, 9 2003.

[Hal60] J. H. Halton. On the efficiency of certain quasi-random sequences of points inevaluating multi-dimensional integrals. Numer. Math. 2, pages 84–90, 1960.

[Hal72] J. H. Halton. On the efficiency of certain quasi-random sequences of pointsin evaluating multi-dimensional integrals. Application of Number Theory toNumerical Analysis, Academic Press, New York, 1972.

[Han98] N. Hansen. Verallgemeinerte individuelle Schrittweitenregelung in der Evolu-tionsstrategie (German Edition). Mensch und Buch Verlag, 1998.

[HBS00] D. B. Leineweber H.G. Bock, M. M. Diehl and J. P. Schloeder. A direct multipleshooting method for real-time optimization of nonlinear dae processes. Progressin Systems and Control Theory, 26, 2000.

252

Page 264: Global Nonlinear Optimization and Optimal Control for ...

Bibliography

[Hei88] W. R. Heilmann. Fundamentals of risk theory. VVW, 1988.

[HEK02] J. Hartung, B. Elpelt, and K.-H. Klosener. Statistik. Lehr- und Handbuch derangewandten Statistik. Oldenbourg, 3 2002.

[Him72] D. M. Himmelblau. Applied Nonlinear Programming. Mcgraw-Hill (Tx), 61972.

[HJ61] R. Hooke and T. A. Jeeves. Direct search solution of numerical and statisticalproblems. J. ACM, 8(2):212–229, 1961.

[HO01] N. Hansen and A. Ostermeier. Completely derandomized self-adaptation inevolution strategies. Evolutionary Computation, 9(2):159–195, 2001.

[Hol98] J. H. Holland. Adaptation in Natural and Artificial Systems. MIT, 1998.

[Hol06] H. Holik, editor. Handbook of Paper and Board. Wiley-VCH, 4 2006.

[HW04] E. Hairer and G. Wanner. Solving Ordinary Differential Equations II: Stiff andDifferential-Algebraic Problems (Springer Series in Computational Mathemat-ics) (v. 2). Springer, 2nd edition, 3 2004.

[JNW09] A. Waechter J. Nocedal and R. A. Waltz. Adaptive barrier update strategiesfor nonlinear interior methods. SIAM J. Optim., 19(4):1674–1693, 2009.

[JS03] F. Jarre and J. Stoer. Optimierung. Springer, Berlin, 1 edition, 9 2003.

[KEBP89] S. L. Campbell K. E. Brenan and L. R. Petzold. Numerical Solution of InitialValue Problems in Differential Algebraic Equations. Elsevier Science Ltd, 91989.

[Kir84] S. Kirkpatrick. Optimization by simulated annealing: Quantitative studies.Journal of Statistical Physics, 34(5), 1984.

[KJAU58] L. Hurwicz K. J. Arrow and H. Uzawa. Studies in linear and non-linear pro-gramming. Stanford, Calif., Stanford University Press, 1958.

[Kri97] O. Krischer. Trocknungstechnik: Band 1: Die wissenschaftlichen Grundlagender Trocknungstechnik (German Edition). Springer, 7 1997.

[KS91] M. I. Kamien and N. L. Schwartz. Dynamic Optimization: The Calculus ofVariations and Optimal Control in Economics and Management (AdvancedTextbooks in Economics). Elsevier Science, 2nd edition, 10 1991.

[KS93] F.-S. Kupfer and E.W. Sachs. Reduced sqp methods for nonlinear heat conduc-tion control problems. International Series of Numerical Mathematics, 111:145– 160, 1993.

[LD02] X. S. Li and J. W. Demmel. Superlu dist: A scalable distributed-memorysparse direct solver for unsymmetric linear systems. ACM Trans. MathematicalSoftware, 29:110–140, 2002.

[LM85] A. V. Levy and A. Montalvo. The tunneling algorithm for the global minimiza-tion of functions. SIAM J. Sci. and Stat. Comput., 6(1):15–29, 1985.

253

Page 265: Global Nonlinear Optimization and Optimal Control for ...

Bibliography

[LP99] S. Li and L. Petzold. Design of new daspk for sensitivity analysis. Technicalreport, 1999.

[LP04] S. Li and L. Petzold. Adjoint sensitivity analysis for time-dependent par-tial differential equations with adaptive mesh refinement. J. Comput. Phys.,198(1):310–325, 2004.

[LPS06] Y. Cao L. Petzold, S. Li and R. Serban. Sensitivity analysis of differential-algebraic equations and partial differential equations. Computers & ChemicalEngineering, 30(10-12):1553 – 1559, 2006. Papers form Chemical Process Con-trol VII - CPC VII.

[MA71] B. Moore and J. Anderson. Linear Optimal Control. Prentice Hall PTR, 1971.

[Met53] N. Metropolis. Equation of state calculations by fast computing machines. TheJournal of Chemical Physics, 21(6):1087–1092, 1953.

[MG75] H. Maurer and W. Gillessen. Application of multiple shooting to the numericalsolution of optimal control problems with bounded state variables. SpringerComputing, 15(2):105–126, 1975.

[MMAV97] A. Toern M. M. Ali and S. Viitanen. A Numerical Comparison of Some Mod-ified Controlled Random Search Algorithms. Journal of Global Optimization,11(4):377–385, 1997.

[MP96] T. Maly and L. R. Petzold. Numerical methods and software for sensitivity analysis of differential-algebraic systems. Appl. Numer. Math., 20(1–2):57–79, 1996.

[MT98] M. Matsumoto and T. Nishimura. Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Trans. Model. Comput. Simul., 8(1):3–30, 1998.

[Net09] The CAPE-OPEN Laboratories Network. CAPE-OPEN documentation, 2009. Available online at http://www.colan.org/, visited on June 13th, 2009.

[NM65] J. A. Nelder and R. Mead. A simplex method for function minimization. The Computer Journal, 7(4):308–313, 1965.

[Noc80] J. Nocedal. Updating quasi-Newton matrices with limited storage. Mathematics of Computation, 35(151):773–782, 1980.

[Nor62] A. Nordsieck. On numerical integration of ordinary differential equations. Mathematics of Computation, 16(77):22–49, 1962.

[NW06] J. Nocedal and S. Wright. Numerical Optimization (Springer Series in Operations Research and Financial Engineering). Springer, 2nd edition, 2006.

[NY96] H. Nusse and J. A. Yorke. Basins of attraction. Science, 271(5254):1376–1380, 1996.

[OC96] M. Otter and F. E. Cellier. Software for Modeling and Simulating Control Systems, pages 415–428. 1996.


[OR87] J. M. Ortega and W. C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Variables (Classics in Applied Mathematics, 30). Society for Industrial and Applied Mathematics, 1987.

[PBN94] P. Bratley, B. L. Fox, and H. Niederreiter. Programs to generate Niederreiter's low-discrepancy sequences. ACM Trans. Math. Softw., 20(4):494–495, 1994.

[Pet82] L. R. Petzold. A description of DASSL: a differential/algebraic system solver. Presented at the IMACS World Congress, Montreal, 1982.

[Pet92] L. R. Petzold. Numerical solution of differential-algebraic equations in mechanical systems simulation. In Proceedings of the Eleventh Annual International Conference of the Center for Nonlinear Studies on Experimental Mathematics: Computational Issues in Nonlinear Science, pages 269–279, Amsterdam, The Netherlands, 1992. Elsevier North-Holland.

[PFH92] P. Bratley, B. L. Fox, and H. Niederreiter. Implementation and tests of low-discrepancy sequences. ACM Trans. Model. Comput. Simul., 2(3):195–213, 1992.

[Por96] B. Porat. A Course in Digital Signal Processing. Wiley, 1st edition, 1996.

[Pri77] W. L. Price. A controlled random search procedure for global optimisation. The Computer Journal, 20(4):367–370, 1977.

[Pri83] W. L. Price. Global optimization by controlled random search. Journal of Optimization Theory and Applications, 40(3):333–348, 1983.

[Pro04a] Process Systems Enterprise, Ltd., London, United Kingdom. gPROMS Introductory User Guide, release 2.3.1 edition, 2004.

[Pro04b] Process Systems Enterprise, Ltd., London, United Kingdom. gPROMS System Programmer Guide, release 2.3 edition, 2004.

[PTS09] Papiertechnische Stiftung (PTS). http://www.ptspaper.de/, September 2009.

[Ram70] J. O. Ramsay. A family of gradient methods for optimization. The Computer Journal, 13(4):413–417, 1970.

[Rec73] I. Rechenberg. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution (Problemata, 15). Frommann-Holzboog, 1973.

[RMLT00] R. M. Lewis, V. Torczon, and M. W. Trosset. Direct search methods: then and now. Journal of Computational and Applied Mathematics, 124(1–2):191–207, 2000.

[Ros60] H. H. Rosenbrock. An automatic method for finding the greatest or least value of a function. The Computer Journal, 3(3):175–184, 1960.

[RSC05] R. Shahnaz, A. Usman, and I. R. Chughtai. Review of storage techniques for sparse matrices. In 9th International Multitopic Conference (IEEE INMIC 2005), pages 1–7, December 2005.


[SA05] O. Slätteke and K. J. Åström. Modeling of a steam heated rotating cylinder - a grey-box approach. In Proceedings of the American Control Conference, pages 1449–1454, vol. 2, June 2005.

[Sab95] J. Saborowski. Schätzung von Durchmesserverteilungen. Deutscher Verband Forstlicher Forschungsanstalten, 7. Tagung:153–163, 1995.

[Sch81] H.-P. Schwefel. Numerical Optimization of Computer Models. John Wiley & Sons, 1981.

[Sch04] L. D. Schmidt. The Engineering of Chemical Reactions (Topics in Chemical Engineering). Oxford University Press, 2nd edition, 2004.

[Sch07] A. Schiela. Barrier methods for optimal control problems with state constraints. Preprint for SIAM J. on Optimization, http://opus.kobv.de/zib/volltexte/2007/950/, 2007.

[Sch08] C. Schlier. On scrambled Halton sequences. Appl. Numer. Math., 58(10):1467–1478, 2008.

[SG04] O. Schenk and K. Gaertner. Solving unsymmetric sparse systems of linear equations with PARDISO. Future Generation Computer Systems, 20(3):475–487, 2004. Selected numerical algorithms.

[SLZ00] S. Li, L. Petzold, and W. Zhu. Sensitivity analysis of differential-algebraic equations: A comparison of methods on a special problem. Appl. Numer. Math., 32:161–174, 2000.

[Sob67] I. M. Sobol. The distribution of points in a cube and the approximate evaluation of integrals. USSR Comput. Math. Math. Phys., 7(4):86–112, 1967.

[SS96] S. Salvini and G. Shaw. An evaluation of new NAG Library solvers for large sparse symmetric linear systems, 1996.

[Str93] O. von Stryk. Numerical solution of optimal control problems by direct collocation. International Series of Numerical Mathematics, 111:129–143, 1993.

[SY05] S. Kucherenko and Y. Sytsko. Application of deterministic low-discrepancy sequences in global optimization. Comput. Optim. Appl., 30(3):297–318, 2005.

[TBS97] T. Bäck, U. Hammel, and H.-P. Schwefel. Evolutionary computation: Comments on the history and current state. IEEE Transactions on Evolutionary Computation, 1(1):3–17, 1997.

[vdC35] J. G. van der Corput. Verteilungsfunktionen. Proc. Ned. Akad. v. Wet., 38:813–821, 1935.

[VDP09] Verband Deutscher Papierfabriken (VDP). http://www.vdp-online.de/publikationen uebersicht.html, September 2009.

[vLA87] P. J. van Laarhoven and E. H. Aarts. Simulated Annealing: Theory and Applications (Mathematics and Its Applications). Springer, 1st edition, 1987.

[vTB09] M. van Turnhout and F. Bociort. Instabilities and fractal basins of attraction in optical system optimization. Optics Express, 17(1):314–328, 2009.


[Wal00] W. Walter. Gewöhnliche Differentialgleichungen. Springer, Berlin, 2000.

[WB08] Y. Wang and S. Boyd. Fast model predictive control using online optimization. In Proceedings of the IFAC World Congress, Seoul, pages 6974–6997, 2008.

[Wei39] W. Weibull. A statistical theory of the strength of materials. Ing. Vetenskaps Akad. Handl., 151:1–45, 1939.

[Wei51] W. Weibull. A statistical distribution function of wide applicability. ASME Journal of Applied Mechanics, 18:293–297, 1951.

[Wei02] G. Weidl. Bayesian networks for root cause analysis in process operation. European Journal of Operational Research, Special Issue on Advances in Complex Systems Modeling, 2002.

[Wil95] B. Wilhelmsson. An experimental and theoretical study of multi-cylinder paper drying. PhD thesis, Department of Chemical Engineering, Lund Institute of Technology, Sweden, 1995.

[YCS02] Y. Cao, S. Li, L. Petzold, and R. Serban. Adjoint sensitivity analysis for differential-algebraic equations: The adjoint DAE system and its numerical solution. SIAM J. Sci. Comput., 24(3):1076–1089, 2002.

[Zan67] W. I. Zangwill. Non-linear programming via penalty functions. Management Science, 13(5):344–358, 1967.

[Zan78] I. Zang. A new arc algorithm for unconstrained optimization. Math. Programming, 15(1):36–52, 1978.
