Structural Optimization
Univ.Prof. Dr. Christian Bucher
Vienna University of Technology, Austria
WS 2009/10
Last Update: October 22, 2009
Contents

1 Introduction
  1.1 Mathematical Optimization
  1.2 Nonlinear Optimization
2 Unconstrained Optimization
  2.1 Basic Concepts
  2.2 Search methods
    2.2.1 Newton-Raphson method
    2.2.2 Steepest descent method
    2.2.3 Quasi-Newton methods
  2.3 Applications to shape optimization
    2.3.1 Minimal Surfaces
    2.3.2 Shape optimization by energy minimization
3 Constrained Optimization
  3.1 Optimality Conditions
  3.2 Quadratic problem with linear constraints
  3.3 Sequential quadratic programming (SQP)
  3.4 Penalty methods
4 Genetic algorithms
  4.1 Basic Principles
  4.2 Choice of encoding
  4.3 Selection Process
  4.4 Recombination and mutation
  4.5 Elitism
5 Robustness in optimization
  5.1 Stochastic modelling
  5.2 Application example
© 2007-2009 Christian Bucher, October 22, 2009
1 Introduction
1.1 Mathematical Optimization

A mathematical optimization problem has the form
    Minimize   f0(x)
    subject to fi(x) ≤ 0,  i = 1…m        (1.1)
The vector x = [x1, x2, …, xn]ᵀ ∈ ℝⁿ is the optimization (design) variable, the function f0 : ℝⁿ → ℝ is the objective function, and the functions fi : ℝⁿ → ℝ, i = 1…m are the constraint functions.

A vector x* is called optimal (i.e. a solution of problem (1.1)) if it has the smallest objective among all vectors that satisfy the constraints. So for any z with f1(z) ≤ 0, …, fm(z) ≤ 0 we have f0(z) ≥ f0(x*).
Equality constraints can be realized by using pairs of inequality constraints, e.g.

    fi(x) ≤ 0
    −fi(x) ≤ 0        (1.2)
The set F containing all vectors z satisfying the constraints is called the feasible domain:

    F = {z | f1(z) ≤ 0, …, fm(z) ≤ 0}        (1.3)
Example: Maximize the area of a rectangle for given circumference 4L.
We want to maximize A = x1x2 subject to 2x1 + 2x2 = 4L. In terms of Eq. (1.1) we write

    f0 = −x1x2
    f1 = 2x1 + 2x2 − 4L
    f2 = −2x1 − 2x2 + 4L

We can easily find a direct solution by eliminating one variable using the constraints, so that x2 = 2L − x1, which gives

    f0 = −A = −x1x2 = −x1(2L − x1) = x1² − 2Lx1

Elementary calculus gives

    df0/dx1 = 2x1 − 2L = 0

from which we find that x1 = L and x2 = L.
Exercise: Minimize the circumference 2x1 + 2x2 of a rectangle subject to a given area A = x1x2.
Convex Optimization Problems: A special class of optimization problems is called convex. In these problems, both objective and constraint functions are convex functions. This means that

    fi[λx + (1−λ)y] ≤ λ fi(x) + (1−λ) fi(y),  0 ≤ λ ≤ 1,  i = 0, …, m        (1.4)

Geometrically, this means that between any two points the graph of the function lies below the straight line connecting the function values at these points.
Example: Given the optimization problem
    f0(x) = x1 + x2
    f1(x) = x1² + x2² − R²

Show that this is a convex optimization problem and determine the feasible domain F.

For the solution we need to discuss the properties of f0 and f1. f0 is a linear function, and we can easily see that

    f0[λx + (1−λ)y] = λx1 + (1−λ)y1 + λx2 + (1−λ)y2
                    = λ(x1 + x2) + (1−λ)(y1 + y2) = λ f0(x) + (1−λ) f0(y)

which satisfies the requirement. For f1 the process is a bit more lengthy.
Note: A twice differentiable function g : ℝⁿ → ℝ is convex if its Hessian matrix Hg is positive semi-definite for all x. The Hessian is defined by

    Hg = [ ∂²g/∂x1²    …  ∂²g/∂x1∂xn ]
         [    ⋮        ⋱      ⋮      ]        (1.5)
         [ ∂²g/∂xn∂x1  …  ∂²g/∂xn²  ]

Exercise: Given the optimization problem

    f0(x) = e^(x1) e^(x2)
    f1(x) = x1 − x2 − R

Show that this is a convex optimization problem and determine the feasible domain F.
1.2 Nonlinear Optimization

In typical practical problems, both the objective function f0 and the constraint functions f1, …, fm depend nonlinearly on the design variables xk, k = 1…n. In such a case, there may be several local minima, so that in a subset Fℓ ⊂ F of the feasible domain we have a local minimum point xℓ*. This means that for all z ∈ Fℓ we have f0(z) ≥ f0(xℓ*). In general it is very difficult to decide if a local minimum point xℓ* is actually the global minimum point x*.
Kuhn-Tucker condition: A necessary condition for the existence of a local minimum point xℓ* in the interior of F is

    ∂f0/∂xk = 0,  k = 1…n        (1.6)
Note: This condition need not be fulfilled for a local minimum point on the boundary ∂F of the feasible domain.
Figure 1.1: Local minima in a nonlinear optimization problem
Example: Consider the optimization problem
f0(x) = x2
f1(x) = R xfor different values of R. The feasible domain is the interval from R to 1, F = [R;1). The KTcondition for f0 states that a local minimum point should satisfy df0dx = 2x = 0. We can immediatelysee that for anyR < the point x = 0 belongs to the feasible domain, and hence x = 0. ForR > 0,however, the point x = 0 does not belong to the feasible set, and we have x = R.
Exercise: For the optimization problem
    f0(x) = 1 − x² + x⁴
    f1(x) = R − x

determine the number and location of local minimum points depending on the value of R.
Note: Convex optimization problems have only one local minimum point, which is the global minimum point x*.
Example: Consider a simple two-bar truss system.
We want to choose the height x of the truss system such that the deformation energy under astatic load F becomes minimal.
The deformation energy is equal to the work done by the applied load, which is (assuming linear elasticity) W = (1/2)Fu. Since we keep the load F constant in the optimization, this is equivalent to minimizing the deflection u. For convenience, we introduce a new variable α defined by tan α = x/L. From equilibrium conditions, we find that the force Fs in one bar is given by

    Fs = F / (2 sin α)

From that we compute the compression us of one bar as

    us = Fs Ls / (EA) = FL / (2EA sin α cos α)

and finally the vertical displacement u becomes

    u = us / sin α = FL / (2EA) · 1 / (sin²α cos α)
Minimizing u is equivalent to maximizing the function f = sin²α cos α. The KT condition for this function is

    df/dα = 2 sin α cos²α − sin³α = 0

One solution is sin α = 0, which does not give a useful result. The other solution is given by

    2 cos²α − sin²α = 2 − 2 sin²α − sin²α = 2 − 3 sin²α = 0

The first value of α satisfying this relation is α = arcsin √(2/3) = 54.7°. For this value, we have x = L tan 54.7° = √2 L.
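The optimal angle can be cross-checked numerically. The sketch below (plain Python; the golden-section search and all names are our own illustration, not from the lecture) minimizes the angle-dependent factor 1/(sin²α cos α):

```python
import math

def u_factor(theta):
    # u = F*L/(2*E*A) * 1/(sin^2 * cos); only this factor depends on the angle
    return 1.0 / (math.sin(theta)**2 * math.cos(theta))

def golden_min(f, a, b, tol=1e-10):
    # golden-section search for the minimum of a unimodal function on [a, b]
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g*(b - a), a + g*(b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - g*(b - a)
        else:
            a, c = c, d
            d = a + g*(b - a)
    return (a + b) / 2

theta_opt = golden_min(u_factor, 0.01, math.pi/2 - 0.01)
print(math.degrees(theta_opt))   # close to 54.7 degrees
print(math.tan(theta_opt))       # x/L close to sqrt(2)
```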
Exercise: Solve the same problem under a horizontal load F .
2 Unconstrained Optimization
2.1 Basic Concepts

Nonlinear optimization problems are frequently solved by search techniques. Such methods generate a sequence of points xk, k = 1, 2, …, whose limit is a local minimum point xℓ*. In this context, several properties of the objective function f(x) are of interest.
Local properties: Here the most important quantity is the gradient of the objective function

    ∇f = [∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn]ᵀ = g(x)        (2.1)

The gradient can be used to expand f(x) into a first-order Taylor series about an arbitrary point xa:

    f(x) = f(xa) + (x − xa)ᵀ g(xa)        (2.2)

Note that a tangent plane on f(x) in x = xa is given by

    (x − xa)ᵀ g(xa) = 0        (2.3)
Obviously, this requires local differentiability of the objective function.
Regional properties: This means looking at the topography of the objective function.

Figure 2.1: Regional properties of objective function

A ridge is loosely defined as a region with a pronounced change of the objective function in one specific direction including at least one local optimum. A saddle is a region in which the objective appears to have a minimum along certain directions while it appears to possess a maximum in other specific directions.
Global properties: This deals with properties affecting the convergence of search methods to the global minimum. The properties of interest are

- continuity and differentiability
- convexity
- separability
Remark: Small errors in the objective function (numerical noise) may actually lead to large errors in the gradients, which may effectively destroy differentiability. As an example, consider the function

    f(x) = 1 + ε sin(x/ε)        (2.4)

which is almost constant for small values of ε. However, its derivative is

    f′(x) = cos(x/ε)        (2.5)

which is not small and oscillates very rapidly.
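This effect is easy to reproduce. In the short sketch below (the values ε = 10⁻⁶ and h = 10⁻⁸ are illustrative assumptions), the sampled function values stay within 1 ± ε while a central-difference quotient still recovers the O(1) derivative:

```python
import math

eps = 1.0e-6
def f(x):
    return 1.0 + eps * math.sin(x / eps)

# the function is almost constant: all values lie within 1 +/- eps ...
vals = [f(0.1 * k) for k in range(11)]
spread = max(vals) - min(vals)

# ... but a finite-difference gradient sees the full oscillation of cos(x/eps)
h = 1.0e-8
fd = (f(h) - f(-h)) / (2 * h)   # approximates f'(0) = cos(0) = 1
print(spread, fd)
```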
Definition: A function f(x) is called separable (non-interacting) if it can be expressed as

    f(x) = Σ_{k=1}^{n} qk(xk)        (2.6)

Such an objective function can be minimized by minimizing the partial functions qk separately. Sometimes a function can be made separable by an appropriate change of variables.
Example: Consider the function
    f(x1, x2) = x1² + 10 x1x2 + 100 x2²

If we introduce a new set of variables z1, z2 by

    x1 = z1 − (5/√75) z2,   x2 = (1/√75) z2

we obtain

    f = z1² − (10/√75) z1z2 + (25/75) z2² + (10/√75) z1z2 − (50/75) z2² + (100/75) z2² = z1² + z2²
which is separable in the new variables (cf. Fig. 2.2).
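The change of variables can be verified numerically; the following sketch (names chosen here for illustration) checks that f equals z1² + z2² at a few sample points:

```python
import math

def f(x1, x2):
    return x1**2 + 10*x1*x2 + 100*x2**2

def f_in_z(z1, z2):
    # substitute x1 = z1 - 5*z2/sqrt(75), x2 = z2/sqrt(75)
    s = math.sqrt(75)
    return f(z1 - 5*z2/s, z2/s)

# in the new variables, f should reduce to the separable form z1^2 + z2^2
for z1, z2 in [(1.0, 2.0), (-3.5, 0.25), (0.0, 7.0)]:
    assert abs(f_in_z(z1, z2) - (z1**2 + z2**2)) < 1e-9
```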
2.2 Search methods

2.2.1 Newton-Raphson method
Within this method, the sequence xk is constructed using the Hessian matrix H at each iteration. Given a start vector x0, the iteration proceeds as

    xk+1 = xk + Δxk = xk − H⁻¹(xk) g(xk)        (2.7)
Figure 2.2: Objective function in non-separable and separable form
This, of course, requires that f is twice differentiable. Since we assumed convexity, the Hessian matrix is positive definite and hence

    gᵀ(xk) Δxk = −gᵀ(xk) H⁻¹(xk) g(xk) ≤ 0        (2.8)

The Newton step is a descent step (but not the steepest). The choice of the Newton method can be motivated by studying a second-order Taylor expansion f̂(x) of the objective function f(x):

    f̂(x + v) = f(x) + gᵀ(x) v + (1/2) vᵀ H(x) v        (2.9)

This is a convex function of v which is minimized by v = −H⁻¹(x) g(x).
Figure 2.3: Quadratic approximation of objective function
Example: Consider the objective function
    f(x1, x2) = (x1 + 1)² + x1² x2² + exp(x1 − x2)        (2.10)

A plot of this function is shown in Fig. 2.4.
Figure 2.4: Plot of objective function and Newton iteration sequence
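A minimal sketch of the Newton iteration (2.7) for this example, with the gradient and Hessian written out analytically and the 2×2 inverse expanded by hand (our own illustrative code, not from the lecture):

```python
import math

def grad_hess(x1, x2):
    # f(x1,x2) = (x1+1)^2 + x1^2*x2^2 + exp(x1-x2)
    e = math.exp(x1 - x2)
    g = (2*(x1 + 1) + 2*x1*x2**2 + e, 2*x1**2*x2 - e)
    H = ((2 + 2*x2**2 + e, 4*x1*x2 - e),
         (4*x1*x2 - e, 2*x1**2 + e))
    return g, H

x1, x2 = 0.0, 0.0                  # start vector
for _ in range(20):
    g, H = grad_hess(x1, x2)
    det = H[0][0]*H[1][1] - H[0][1]*H[1][0]
    # Newton step x <- x - H^{-1} g, with the 2x2 inverse written out
    dx1 = -( H[1][1]*g[0] - H[0][1]*g[1]) / det
    dx2 = -(-H[1][0]*g[0] + H[0][0]*g[1]) / det
    x1, x2 = x1 + dx1, x2 + dx2

print(round(x1, 3), round(x2, 3))  # approximately -1.13 0.113
```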
2.2.2 Steepest descent method
The first-order Taylor approximation f̂ of f around a point x is

    f̂(x + v) = f(x) + gᵀ(x) v        (2.11)

We choose the direction of v such that the decrease in f̂ becomes as large as possible. Let ‖·‖ be any norm on ℝⁿ. We define a normalized steepest descent direction as

    p = argmin_{‖v‖=1} gᵀ(x) v        (2.12)

If we choose ‖·‖ to be the Euclidean norm, i.e. ‖x‖ = √(xᵀx), then we obtain the direction

    p = −g(x) / ‖g(x)‖        (2.13)

The steepest descent method then performs a line search along the direction defined by p, so that xk+1 = xk + t p. There are several possibilities of searching.
Exact line search. Determine

    t = argmin_{s ≥ 0} f(x + s p)        (2.14)

This may be very expensive.
Backtracking line search. Given a descent direction p, we choose α ∈ (0, 0.5) and β ∈ (0, 1). Then we apply this algorithm:

- t := t0
- while Δ = f(x + t p) − f(x) − α t gᵀ(x) p > 0 do t := βt

Figure 2.5: Sufficient descent condition

This algorithm ensures that we obtain a descent which is at least as large as the α-fold of the descent predicted by the gradient. Typical values for applications are 0.01 ≤ α ≤ 0.30 and 0.1 ≤ β ≤ 0.8.
Example: Minimize the objective function f(x1, x2) = (x1 + 1)² + x1²x2² + exp(x1 − x2) using the steepest descent method with backtracking line search. Start at x0 = [0, 0]ᵀ and use t0 = ‖g‖.

We get g = [3, −1]ᵀ and p = [−0.949, 0.316]ᵀ. In the line search we start with t0 = 3.162, giving Δ = 13.018. Then we get t1 = 1.581 with Δ = −0.052. This is acceptable. Hence we get x1 = [−1.5, 0.5]ᵀ. The further steps are shown in the table below.

    i      1       2       3       4    …    10
    t      1.581   0.665   0.219   0.069     0.0005
    x1    −1.500  −1.096  −1.171  −1.132    −1.130
    x2     0.500  −0.029   0.178   0.121     0.113
    f      0.948   0.354   0.322   0.322     0.322
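The iteration can be reproduced with a few lines of Python. The sketch below assumes the line-search parameters α = 0.2 and β = 0.5, which are consistent with the Δ values quoted in the example:

```python
import math

def f(x1, x2):
    return (x1 + 1)**2 + x1**2 * x2**2 + math.exp(x1 - x2)

def grad(x1, x2):
    e = math.exp(x1 - x2)
    return 2*(x1 + 1) + 2*x1*x2**2 + e, 2*x1**2*x2 - e

alpha, beta = 0.2, 0.5                # assumed backtracking parameters
x1, x2 = 0.0, 0.0
for _ in range(100):
    g1, g2 = grad(x1, x2)
    n = math.hypot(g1, g2)
    p1, p2 = -g1/n, -g2/n             # normalized steepest descent direction (2.13)
    t = n                             # start the backtracking at t0 = ||g||
    # shrink t until the sufficient-decrease condition holds
    while f(x1 + t*p1, x2 + t*p2) > f(x1, x2) + alpha*t*(g1*p1 + g2*p2):
        t *= beta
    x1, x2 = x1 + t*p1, x2 + t*p2

print(round(f(x1, x2), 3))            # approximately 0.322
```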
2.2.3 Quasi-Newton methods
The basic idea of quasi-Newton methods is to utilize successive approximations of the Hessianmatrix H(x) or its inverse B(x) = H1(x). One specific popular method is the BFGS approach(named after Broyden, Fletcher, Goldfarb and Shanno). The procedure uses a quadratic approxi-mation of the objective function f in terms of
    f̂k(v) = gkᵀ v + (1/2) vᵀ Hk v        (2.15)
Figure 2.6: Plot of objective function and steepest descent iteration sequence
Here Hk is a symmetric, positive definite matrix which is updated during the iteration process. The minimizer pk of f̂k(v) is

    pk = −Hk⁻¹ gk = −Bk gk        (2.16)

In most implementations, this vector is used as a search direction and the new iterate for the design vector is formed from

    xk+1 = xk + t pk        (2.17)
Here the value of t is computed from a line search (typically backtracking, starting from t = 1). Then a new approximation f̂k+1(v) is constructed from

    f̂k+1(v) = gk+1ᵀ v + (1/2) vᵀ Hk+1 v        (2.18)

For this purpose we compute

    sk = xk+1 − xk,   yk = gk+1 − gk        (2.19)

We then check the so-called curvature condition

    τk = skᵀ yk > 0        (2.20)

If τk ≤ 0, we set Hk+1 = Hk. Otherwise, we compute the next approximation to the inverse Hessian Bk+1 from

    Bk+1 = (I − sk ykᵀ/τk) Bk (I − yk skᵀ/τk) + sk skᵀ/τk        (2.21)
Usually the procedure is started with B0 = I. In large problems it is not helpful to keep all updatevectors in the analysis. Therefore, limited-memory BFGS (L-BFGS) has been developed. In thisapproach, only a small number m of most recent vectors sk and yk is stored and Bk is re-computedfrom these vectors in each step. Due to round-off it may happen that the updated matrix becomesvery ill-conditioned. In this case, the update process is completely restarted from B = I.
Example: Minimize the objective function f(x1, x2) = (x1 + 1)² + x1²x2² + exp(x1 − x2) using the BFGS method with backtracking line search. Start at x0 = [0, 0]ᵀ and use t0 = 1.

We get g0 = [3, −1]ᵀ and p0 = [−3, 1]ᵀ. In the line search we start with t0 = 1, giving Δ = 13.018. Then we get t1 = 0.5 with Δ = −0.052. This is acceptable. Hence we get x1 = [−1.5, 0.5]ᵀ and g1 = [−1.615, 2.115]ᵀ. From that, we have s1 = [−1.5, 0.5]ᵀ, y1 = [−4.615, 3.115]ᵀ and τ1 = 8.479. This leads to an updated inverse Hessian

    B1 = [0.603  0.411]
         [0.411  0.770]

and a new search direction p1 = [0.103, −0.964]ᵀ.

The further steps are shown in the table below as well as in Fig. 2.7.

    i      1       2       3       4    …    10
    t      0.5     0.5     1       1         1
    x1    −1.500  −1.448  −0.981  −1.111    −1.130
    x2     0.500   0.018   0.231   0.158     0.113
    f      0.948   0.432   0.322   0.322     0.322
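A compact sketch of the BFGS recursion (2.16)-(2.21) for this example, using plain Python lists for the 2×2 matrices (illustrative code; α = 0.2 and β = 0.5 are assumed line-search parameters):

```python
import math

def f(x):
    return (x[0] + 1)**2 + x[0]**2 * x[1]**2 + math.exp(x[0] - x[1])

def grad(x):
    e = math.exp(x[0] - x[1])
    return [2*(x[0] + 1) + 2*x[0]*x[1]**2 + e, 2*x[0]**2*x[1] - e]

x = [0.0, 0.0]
g = grad(x)
B = [[1.0, 0.0], [0.0, 1.0]]        # start from B0 = I
alpha, beta = 0.2, 0.5              # assumed backtracking parameters
for _ in range(50):
    # search direction p = -B g, Eq. (2.16)
    p = [-(B[0][0]*g[0] + B[0][1]*g[1]), -(B[1][0]*g[0] + B[1][1]*g[1])]
    t = 1.0                         # backtracking starts from t = 1
    while f([x[0] + t*p[0], x[1] + t*p[1]]) > f(x) + alpha*t*(g[0]*p[0] + g[1]*p[1]):
        t *= beta
    x_new = [x[0] + t*p[0], x[1] + t*p[1]]
    g_new = grad(x_new)
    s = [x_new[0] - x[0], x_new[1] - x[1]]
    y = [g_new[0] - g[0], g_new[1] - g[1]]
    tau = s[0]*y[0] + s[1]*y[1]
    if tau > 0:                     # curvature condition, Eq. (2.20)
        # B <- (I - s y^T/tau) B (I - y s^T/tau) + s s^T/tau, Eq. (2.21)
        A = [[(1.0 if i == j else 0.0) - s[i]*y[j]/tau for j in range(2)] for i in range(2)]
        AB = [[A[i][0]*B[0][j] + A[i][1]*B[1][j] for j in range(2)] for i in range(2)]
        B = [[AB[i][0]*A[j][0] + AB[i][1]*A[j][1] + s[i]*s[j]/tau for j in range(2)] for i in range(2)]
    x, g = x_new, g_new

print([round(v, 3) for v in x])     # approximately [-1.13, 0.113]
```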
2.3 Applications to shape optimization

2.3.1 Minimal Surfaces

Two circles with radius R and distance H are to be connected by a membrane with minimal surface area A. We discretize the problem by replacing the meridian curve by a polygon as sketched. Then the membrane surface area is given by

    A = 2π (R + r) √((R − r)² + H²/9) + 2πr H/3        (2.22)
Here r is to be determined by minimizing A. Taking the derivative w.r.t. r we have

    dA/dr = 2π √((R − r)² + H²/9) − 2π (R − r)(R + r) / √((R − r)² + H²/9) + 2πH/3 = 0        (2.23)
For a ratio of H/R = 1, the solution is r = 0.867R; for H/R = 1.3 it is r = 0.707R. The analytical solution for the meridian curve of this problem can be obtained as r(z) = a cosh(z/a), in which a has to be chosen such that r(H/2) = R. For H/R = 1, this leads to a = r(0) = 0.843R; for H/R = 1.3, we obtain a = r(0) = 0.642R. So there is some level of agreement even with this very simple discretization.
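The discretized area (2.22) can also be minimized by a brute-force scan; the sketch below (with the assumed normalization R = H = 1) reproduces the quoted value r ≈ 0.867R:

```python
import math

def area(r, R=1.0, H=1.0):
    # three-segment polygon approximation of the meridian, Eq. (2.22)
    return 2*math.pi*(R + r)*math.sqrt((R - r)**2 + H**2/9) + 2*math.pi*r*H/3

# scan the middle radius r on a fine grid and pick the minimizer
rs = [k / 10000 for k in range(1, 10000)]
r_best = min(rs, key=area)
print(round(r_best, 3))   # close to 0.867 for H/R = 1
```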
Figure 2.7: Plot of objective function and BFGS iteration sequence
Figure 2.8: Connecting two circles with a membrane
Exercise: Connect two squares (L × L) by trapezoidal elements minimizing the total surface area. Consider the cases

a) H/L = 1/2
b) H/L = 1
2.3.2 Shape optimization by energy minimization
Here we try to find a structural geometry such that the work done by the applied loads becomes minimal. For a structure with concentrated applied loads F and corresponding displacements u this means Fᵀu → Min.
Figure 2.9: Connecting two squares with trapezoidal elements
Example: The geometry of the statically determinate system shown in Fig. 2.10 is to be configured such that the total external work W = F1u1 + F2u2 becomes minimal. The design variables are the vertical locations of the load application points, i.e. x1 and x2.

Figure 2.10: Minimize external work

We assume identical rectangular cross sections d × d throughout, with the following geometrical relations: d = 0.05L, A = d², I = d⁴/12. Furthermore, we solve the problem for the fixed load relation F1 = F, F2 = 2F.

Computing W for the range 0 ≤ x1, x2 ≤ 5 results in the values shown in logarithmic scale in Fig. 2.11. Even in log-scale it can be seen that there is a deep and narrow ravine along the line x2 = (5/4) x1. This line defines a moment-free geometric configuration of the system. Tracing this line in (x1, x2)-space easily allows the location of a global minimum at x1 = 1.375L, as shown in Fig. 2.12.
Example: The geometry of the statically determinate system shown in Fig. 2.13 is to be configured such that the total external work W becomes minimal. Assuming symmetry, the design variables are the vertical locations of the load application points, i.e. z1, z2, z3. Again we assume identical rectangular cross sections d × d throughout. We now start by solving for possible moment-free configurations. The moments in the points e and d are easily found. From the condition Me = 0 we get z1 = (5/9) z3. From the condition Md = 0 we get z2 = (8/9) z3, so that the energy can be minimized using z3 only. Using these relations, we locate a global minimum at z3 = 2.375L, as shown in Fig. 2.14.
Figure 2.11: External work as a function of x1 and x2
Figure 2.12: External work as a function of x1 along the line x2 = (5/4) x1
Figure 2.13: Minimize external work
Figure 2.14: External work as a function of z3 in the plane z1 = (5/9) z3, z2 = (8/9) z3
Figure 2.15: Initial configuration and load distribution
Figure 2.16: Final configuration
Application to Finite Element models. This shape optimization by energy minimization canalso be used in the context of the finite element method. Here the nodal coordinates of the mesh arethe optimization variables. Of course, this implies that the element matrices and the global matriceshave to be re-assembled in each step of the optimization. The shape of the structure with the loadsas indicated in Fig. 2.15 should be optimized with respect to minimal external work. The optimizedshape is shown in Fig. 2.16.
3 Constrained Optimization
3.1 Optimality Conditions

We now return to the problem of optimization with inequality constraints. Without loss of generality, this can be written in the form

    Minimize   f0(x)
    subject to f1(x) ≤ 0        (3.1)

The function f1 : ℝⁿ → ℝ may actually involve several constraint conditions put together, e.g. in terms of a max-operator. The standard approach to the solution of this problem involves the construction of a Lagrange function L combining the objective and the constraint:

    L(x, λ) = f0(x) + λ f1(x),   λ ≥ 0        (3.2)

The parameter λ ∈ ℝ is called the Lagrange multiplier. It is an additional optimization variable.

The so-called Karush-Kuhn-Tucker (KKT) conditions for this optimization problem are the usual necessary conditions for the existence of a local minimum:

    ∇f0(x) + λ ∇f1(x) = 0
    λ f1(x) = 0
    λ ≥ 0,  f1(x) ≤ 0        (3.3)
Example: Consider a one-dimensional problem as previously discussed:

    f0(x) = x² → Min.
    f1(x) = −x + 1 ≤ 0        (3.4)

The Lagrange function for this problem is

    L(x, λ) = x² + λ(−x + 1)        (3.5)

This function is shown in Fig. 3.1 for the range −3 ≤ x, λ ≤ 3. The Lagrange function has a stationary point defined by 2x − λ = 0 and −x + 1 = 0, so that x* = 1 and λ* = 2. Again, this is shown in Fig. 3.1. It is easily seen that this point is a saddle point in (x, λ)-space.
Example: Consider the optimization problem (as previously discussed in similar form): We want to maximize A = x1x2 subject to 2x1 + 2x2 ≤ 4L. In terms of Eq. (3.1) we write

    f0 = −x1x2
    f1 = 2x1 + 2x2 − 4L
Figure 3.1: Lagrange function
and the KKT conditions become

    −x2 + 2λ = 0
    −x1 + 2λ = 0
    λ(2x1 + 2x2 − 4L) = 0
    λ ≥ 0,  2x1 + 2x2 − 4L ≤ 0        (3.6)

The first three equations have the solution x1 = 0, x2 = 0, λ = 0. This solution obviously defines a maximum of f0. The second solution is x1 = L, x2 = L, λ = L/2. This satisfies all conditions and therefore describes a local minimum of f0 (and therefore a maximum of A).
If the functions f0 and f1 are both convex and differentiable, then the KKT conditions for (x*, λ*) are necessary and sufficient for a local optimum. If, moreover, f0 is strictly convex, then the solution x* is unique (i.e. a global minimum).
Note that in the previous example f0 is not convex!
3.2 Quadratic problem with linear constraints

Consider the optimization problem

    f0(x) = (1/2) xᵀ H x + gᵀ x → Min.
    f1(x) = aᵀ x + b ≤ 0        (3.7)

with a positive definite matrix H. Since the objective function is strictly convex and the constraint equation is convex, the solution of the KKT conditions (if it exists) defines the unique minimum. The KKT conditions are

    H x + g + λ a = 0
    λ (aᵀ x + b) = 0        (3.8)
together with λ ≥ 0 and aᵀx + b ≤ 0. One possibility is λ = 0 and from that x = −H⁻¹g. If this point is feasible, then it is the solution. The alternative with λ ≠ 0 requires that

    aᵀx = −b,   aᵀx = −λ aᵀH⁻¹a − aᵀH⁻¹g        (3.9)

from which we immediately get

    λ = (b − aᵀH⁻¹g) / (aᵀH⁻¹a)        (3.10)

and furthermore

    x = −λ H⁻¹a − H⁻¹g        (3.11)
This can be used as a starting point for sequential numerical procedures (SQP methods such asNLPQL) utilizing a second order approximation for the objective function and a first order ap-proximation for the constraints.
Example: Find the minimum of f0 = x1² + x1x2 + x2² subject to the constraint x1 − x2 ≤ R, i.e. f1 = x1 − x2 − R. In order to rewrite this in the previous notation, we introduce the matrix H and the vector a as well as the scalar b:

    H = [2  1]     a = [ 1]     b = −R        (3.12)
        [1  2]         [−1]

The solution λ = 0 and, correspondingly, x = 0 exists only for R ≥ 0 (see Fig. 3.2). The second
Figure 3.2: Objective function and feasible domain for R = 2 (lower right) and R = −2 (upper left)
possible solution is obtained from

    H⁻¹ = (1/3) [ 2  −1]     aᵀH⁻¹a = 2,   λ = −R/2,   x = (1/2) [ R]        (3.13)
                [−1   2]                                         [−R]
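Eqs. (3.10) and (3.11) translate directly into code; the sketch below (our own illustrative implementation of the 2×2 case) solves the example for R = −2:

```python
# Closed-form solution of the quadratic problem (3.7) via Eqs. (3.10)-(3.11),
# written out for the example with H = [[2,1],[1,2]], g = 0, a = [1,-1], b = -R.
H = [[2.0, 1.0], [1.0, 2.0]]
g = [0.0, 0.0]
R = -2.0
a = [1.0, -1.0]
b = -R

det = H[0][0]*H[1][1] - H[0][1]*H[1][0]
Hinv = [[H[1][1]/det, -H[0][1]/det], [-H[1][0]/det, H[0][0]/det]]

def mv(M, v):                          # 2x2 matrix-vector product
    return [M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1]]

Ha, Hg = mv(Hinv, a), mv(Hinv, g)
x = [-Hg[0], -Hg[1]]                   # candidate with lambda = 0
if a[0]*x[0] + a[1]*x[1] + b > 0:      # infeasible -> constraint is active
    lam = (b - (a[0]*Hg[0] + a[1]*Hg[1])) / (a[0]*Ha[0] + a[1]*Ha[1])   # (3.10)
    x = [-lam*Ha[0] - Hg[0], -lam*Ha[1] - Hg[1]]                        # (3.11)

print([round(v, 6) for v in x])        # [-1.0, 1.0], i.e. (R/2, -R/2)
```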
3.3 Sequential quadratic programming (SQP)

Essentially, this is a repeated application of the minimization of a quadratic function with linear constraints. In the process, most implementations do not use the exact Hessian matrix of the objective function; rather, an approximation based on gradient information during the iteration (such as the BFGS update) is used. In this case, it may be helpful to include a line search procedure using the solution of Eq. (3.11) as search direction. Also, scaling of the constraints can significantly influence the convergence!
Example Minimize the objective function
    f(x1, x2) = (x1 + 1)² + x1² x2² + exp(x1 − x2)        (3.14)

subject to the constraint condition

    −x1²/2 − x2 + 1.5 ≤ 0        (3.15)
A plot of this function and the feasible domain is shown in Fig. 3.3
Figure 3.3: Plot of objective function and iteration sequence
Repeated application of the quadratic constrained minimization leads to a local minimum. Starting the procedure at x0 = [−1, 0]ᵀ we get fast convergence and end up with the global minimum x* = [−1.632, 0.168]ᵀ. With a slightly modified starting vector of x0 = [−0.5, 0]ᵀ we converge (slowly) to the local minimum x* = [−1.000, 1.000]ᵀ (see Fig. 3.3). Interestingly, when starting at the origin, we converge (very slowly) to the global minimum x* = [−1.632, 0.168]ᵀ.
3.4 Penalty methods

An alternative approach to explicit handling of constraints is the application of modifications to the objective function in such a way as to prevent the optimization algorithm from reaching the infeasible domain. A simple way to achieve this is adding a penalty term p(x) to the objective function f0(x) which is large enough to shift the minimum of the augmented objective function fp(x) = f0(x) + p(x) to the feasible domain. For computational purposes it is useful to construct p(x) in such a way that the objective function remains differentiable (or at least continuous).

Figure 3.4: Interior and exterior penalty functions

Usually it will not be possible to adjust p(x) in such a way that the minimum of the augmented objective will be located exactly at the boundary of the feasible domain (cf. Fig. 3.4). Interior penalty functions attempt to keep the optimization process away from the boundary of the feasible domain by adding a term which increases sharply when approaching the boundary from the interior. So the solution will be feasible. Exterior penalties lead to an optimum which is not in the feasible domain. However, it is usually easier to construct suitable exterior penalty functions, e.g.

    p(x) = Σ_{i=1}^{N} ai H[fi(x)] fi(x)^(ℓi)        (3.16)

Here H(·) denotes the Heaviside (unit step) function, and the coefficients ai > 0 and ℓi ≥ 0 are chosen according to the specific problem. The choice ℓi = 2 leads to a differentiable augmented objective function and is usually quite acceptable. By increasing the values of ai, the solution approaches the boundary of the feasible domain.
Example Minimize the objective function
    f0(x1, x2) = (x1 + 1)² + x1² x2² + exp(x1 − x2)        (3.17)

subject to the constraint condition

    f1(x1, x2) = −x1²/2 − x2 + 1.5 ≤ 0        (3.18)

We choose the exterior penalty function

    p(x1, x2) = a H[f1] f1²        (3.19)
Figure 3.5: Plot of augmented objective function and iteration sequence, a = 1
A plot of this function for a = 1 is shown in Fig. 3.5. Application of the BFGS method to the augmented objective function leads to the iteration sequence shown in Fig. 3.5. The convergence to the point x = [−1.449, 0.180]ᵀ is quite fast, but the final result is clearly infeasible. Changing the value to a = 10 leads to an augmented objective as shown in Fig. 3.6. Here a second minimum becomes visible, which is actually found when starting from the origin. Starting at the point (−1, 0) we converge to the point x = [−1.609, 0.170]ᵀ, which is reasonably close to the solution of the constrained optimization problem.
Example: The cross-sectional areas of the truss structure shown in Fig. 3.7 should be chosen such that the total structural mass becomes minimal. As a constraint, the maximum stress (absolute value) in any truss member should not exceed a value σ̄. We assume numerical values of L = 3 and H = 4. The objective function is then

    f0 = 5(A1 + A3 + A5) + 6(A2 + A4)        (3.20)
Since this is a statically determinate system, the member forces Fk and stresses σk are easily computed as

    F1 = 5F/8,  F2 = −3F/8,  F3 = −3F/8,  F4 = 3F/4,  F5 = −5F/4

    σ1 = 5F/(8A1),  σ2 = −3F/(8A2),  σ3 = −3F/(8A3),  σ4 = 3F/(4A4),  σ5 = −5F/(4A5)        (3.21)
In this case, the objective function is linear, but the constraints are not.¹

¹ By introducing the inverses of the cross-sectional areas as new design variables, the problem could be changed to a nonlinear objective with linear constraints.
Figure 3.6: Plot of augmented objective function and iteration sequence, a = 10
Figure 3.7: Simple truss structure
We solve the problem by introducing an exterior penalty function in the form of

    p = a H(s) s⁴,   s = max_{k=1…5} |σk| − σ̄        (3.22)

In the following numerical evaluation we fix the values F = 1, σ̄ = 1 and vary a. Using a BFGS iteration with numerical gradient evaluation (central differences with ΔAk = 10⁻⁶) starting at Ak = 1, we get the results shown in Table 1. It can be seen that as a increases, the results approach the fully stressed design, in which each truss member reaches the stress limit. The convergence of the
Table 1: Truss example: convergence of penalty method

    a      A1     A2     A3     A4     A5     N
    10⁵    0.603  0.362  0.362  0.724  1.207  540
    10⁷    0.620  0.372  0.372  0.744  1.240  706
    10⁹    0.624  0.374  0.374  0.749  1.248  736
objective and the constraint is shown in Fig. 3.8 for the case a = 10⁹. The number of iterations N required for convergence is given in Table 1.
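The fully stressed design that the penalty iteration approaches can be written down directly from the member forces in Eq. (3.21); the sketch below (with F = 1 and σ̄ = 1 as in the table) shows the limiting areas and the corresponding mass:

```python
# Fully stressed design: each member area sized so that |sigma_k| = sigma_bar,
# using the member force magnitudes from Eq. (3.21) with F = 1, sigma_bar = 1.
F, sigma_bar = 1.0, 1.0
forces = [5*F/8, 3*F/8, 3*F/8, 3*F/4, 5*F/4]    # |F_k| for members 1..5
areas = [abs(Fk) / sigma_bar for Fk in forces]
mass = 5*(areas[0] + areas[2] + areas[4]) + 6*(areas[1] + areas[3])
print(areas)   # [0.625, 0.375, 0.375, 0.75, 1.25] -- the limit of Table 1
print(mass)    # 18.0
```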
Figure 3.8: Convergence for simple truss structure
Example: The cross-sectional areas of the frame structure shown in Fig. 3.9 should be chosen such that the total mass becomes minimal. The structure is modeled by 4 beam elements having square cross sections with widths b1, b2, b3 and b4, respectively. As constraints we utilize displacement conditions, i.e. |u| < δ̄ and |w| < δ̄ (cf. Fig. 3.9).

Figure 3.9: Simple frame structure

For steel as material with E = 210 GPa and ρ = 7.85 t/m³, a deformation limit δ̄ = 50 mm, a penalty parameter a = 10⁵ and starting values bi = B, we obtain the optimal section widths as shown in Table 2. For B = 0.5 m we obtain the optimum within 171 iterations. This solution has a total mass of 2173 kg, a horizontal displacement of 50.02 mm and a vertical displacement of 49.96 mm. Convergence is shown in Fig. 3.10. It should be noted that for different starting values B the procedure converges to a different solution.
Table 2: Frame example: optimal section widths

    B [mm]   b1 [mm]  b2 [mm]  b3 [mm]  b4 [mm]  f0 [kg]  u [mm]  w [mm]
    500      201      105      109      109      2173     50.02   49.96
    100      178      136      149      141      2549     49.75   19.47
Figure 3.10: Convergence for simple frame structure
4 Genetic algorithms
4.1 Basic Principles

The general idea of genetic algorithms for optimization utilizes a string representation of the design variables (the chromosome). With a set of different designs (the population) we can try to find better designs (individuals) through the process of reproduction, which involves recombination and mutation. The recombination process is usually carried out by cross-over, in which parts of the strings are swapped between individuals. The simplest string representation is a bit string representing states or discrete numerical values. As a matter of fact, any digital representation of real numbers is such a bit string.
As an example, consider maximizing the function f(x) = x² for integer x in the interval [0, 31]. Within that range, any integer can be represented by 5 bits. Let us assume that an initial population with four strings has been generated:

    01101
    11000
    01000
    10011        (4.1)
For the reproduction process it is a good idea to consider primarily those individuals which have a high value of the objective function (the fitness). According to this concept, the individuals with a higher fitness have a larger probability of being selected for reproduction. In Table 3 the selection probability PS is shown proportional to a fitness value which is equal to the objective function. From this table it becomes obvious that it is beneficial for the fitness to have the high-order bits
Table 3: Sample strings and fitness values
No.  String  Fitness  PS
1    01101   169      0.144
2    11000   576      0.492
3    01000   64       0.055
4    10011   361      0.309
set in the strings. This means that in the reproduction process individuals whose chromosomes contain sub-strings with the high-order bits set should be preferred, as they are more likely to achieve better fitness values.
The cross-over process, when applied to two individuals, cuts the strings of both individuals at the same randomly chosen location and swaps the pieces. As an example, consider the first two strings in Table 3. We choose to cut the chromosomes after the fourth bit:
A1 = 0110|1
A2 = 1100|0    (4.2)
Swapping the pieces results in
A'1 = 01100
A'2 = 11001    (4.3)
It is easy to see that we now have a new individual (A'2) with a better fitness than any other in the population so far. This individual decodes to the numerical value x = 25 with an objective function value of f(x) = 625.
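The cut-and-swap step can be sketched as follows (a minimal illustration; `crossover` is a made-up helper name):

```python
def crossover(a: str, b: str, cut: int) -> tuple[str, str]:
    """One-point cross-over: swap the tails of two bit strings after `cut`."""
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

# Reproduces the example above, cutting after the fourth bit:
a1_new, a2_new = crossover("01101", "11000", 4)
# a1_new = '01100' (x = 12), a2_new = '11001' (x = 25, f = 625)
```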
Mutation can be introduced by randomly flipping the state of a single bit. Usually the probability of occurrence is kept rather small in order not to destroy the selection process. However, mutation can help the optimization process escape the trap of a local extremum.
A very interesting property of genetic algorithms is that they are essentially blind to the mathematical characteristics of the objective function. In particular, there are no requirements of differentiability or even continuity.
4.2 Choice of encoding

In the introductory section we discussed an example of an integer-valued design variable x in the range [0, 2^ℓ − 1], with ℓ being the number of bits (5 in this case). This is certainly not a typical situation. We may have continuous variables y varying within an interval [ymin, ymax]. A straightforward coding would be a linear mapping from the interval [0, 2^ℓ − 1] to the interval [ymin, ymax]:
y = (ymax − ymin)/(2^ℓ − 1) · x    (4.4)
and x is represented by ℓ bits. Here the choice of ℓ affects the resolution of the encoded variable but not its range. Multi-parameter encodings can be achieved by concatenating single-parameter encodings.
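The mapping of eq. (4.4) can be sketched in Python. Note that an offset ymin is added here, an assumption on my part, so that the all-zero string decodes to the lower bound of the interval [ymin, ymax] described in the text:

```python
def decode_real(chromosome: str, y_min: float, y_max: float) -> float:
    """Linear mapping of an l-bit string onto [y_min, y_max], cf. eq. (4.4).
    The offset y_min (added here) makes '00...0' decode to y_min."""
    l = len(chromosome)
    x = int(chromosome, 2)
    return y_min + (y_max - y_min) * x / (2**l - 1)

decode_real("00000", -1.0, 1.0)  # -1.0  (lowest code -> interval lower bound)
decode_real("11111", -1.0, 1.0)  #  1.0  (highest code -> interval upper bound)
```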
A problem can arise from the fact that adjacent values of x can differ in a large number of bits. As an example, consider the 4-bit representations of the numbers 7 (0111) and 8 (1000): all bits are different. Therefore the so-called Gray code (reflected binary code) is sometimes used. Gray coding reduces the bitwise difference between neighboring numbers to one single bit. Gray codes are constructed by arranging the binary strings into sequences in which neighbors differ in only one bit. For ℓ = 2 one possible sequence is easily found as
00, 01, 11, 10    (4.5)
and for ℓ = 3 we have for instance
000, 001, 011, 010, 110, 111, 101, 100    (4.6)
For a 5-bit encoding, the natural and Gray codes are shown in Table 4.
Table 4: Natural and Gray codes for 5-bit encodings
x   Natural  Gray        x   Natural  Gray
0   00000    00000       16  10000    11000
1   00001    00001       17  10001    11001
2   00010    00011       18  10010    11011
3   00011    00010       19  10011    11010
4   00100    00110       20  10100    11110
5   00101    00111       21  10101    11111
6   00110    00101       22  10110    11101
7   00111    00100       23  10111    11100
8   01000    01100       24  11000    10100
9   01001    01101       25  11001    10101
10  01010    01111       26  11010    10111
11  01011    01110       27  11011    10110
12  01100    01010       28  11100    10010
13  01101    01011       29  11101    10011
14  01110    01001       30  11110    10001
15  01111    01000       31  11111    10000
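Gray codes of this kind can be generated with the standard bit manipulation g = x XOR (x >> 1); the following sketch (not part of the lecture notes) reproduces Table 4:

```python
def gray_encode(x: int) -> int:
    """Natural binary to reflected binary (Gray) code."""
    return x ^ (x >> 1)

def gray_decode(g: int) -> int:
    """Invert the Gray code by a cumulative XOR over shifted copies."""
    x = 0
    while g:
        x ^= g
        g >>= 1
    return x

# Neighbouring integers differ in exactly one Gray-code bit, cf. 7 and 8:
format(gray_encode(7), "05b")  # '00100'
format(gray_encode(8), "05b")  # '01100'
```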
4.3 Selection Process

During the course of a genetic optimization we want to keep the population size constant. If we initially have a few individuals with a significantly higher fitness than the others, it is very likely that the population will become dominated by these individuals and their offspring, which can lead to a trap in a local maximum. One way to avoid this involves scaling the fitness function such that the best value is only moderately larger than the average. At the same time we want the scaled function to preserve the average fitness, so that average individuals keep their chance of survival. Linear scaling introduces a scaled fitness f' in terms of the raw fitness f as (cf. Fig. 4.1):
f' = a·f + b    (4.7)

Here f'max is chosen as a multiple of the average fitness, i.e. f'max = Cmult · favg. For typical small
Figure 4.1: Linear fitness scaling
population sizes of 50 to 100, a choice of Cmult between 1.2 and 2.0 has been used successfully. As
the optimization approaches the end, fitness values within the population typically show very little variation, with the exception of a few very bad cases. Linear scaling might then assign negative fitness values, which must be suppressed by adjusting the factor Cmult accordingly. The actual selection
Figure 4.2: Biased roulette wheel for selection
process picks individuals at random with a selection probability PS proportional to the fitness. This can be viewed as a roulette wheel with non-uniform slot sizes. For the sample population given in Table 3, this is shown in Fig. 4.2.
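Linear scaling (4.7) and roulette-wheel selection can be sketched as follows; the coefficients a and b follow from the two conditions f'avg = favg and f'max = Cmult · favg stated above (helper names are illustrative, and clipping negatives to zero is a simplification of the Cmult adjustment described in the text):

```python
import random

def linear_scaling(fitness, c_mult=1.5):
    """Linear scaling f' = a*f + b with f'_avg = f_avg and
    f'_max = c_mult * f_avg; negative scaled values are clipped to zero."""
    f_avg = sum(fitness) / len(fitness)
    f_max = max(fitness)
    if f_max == f_avg:                        # degenerate case: all fitness equal
        return list(fitness)
    a = (c_mult - 1.0) * f_avg / (f_max - f_avg)
    b = f_avg * (1.0 - a)
    return [max(a * f + b, 0.0) for f in fitness]

def roulette_select(population, fitness):
    """Pick one individual with probability proportional to its fitness."""
    return random.choices(population, weights=fitness, k=1)[0]
```

For the raw fitness values of Table 3 (average 292.5), the scaled maximum becomes 1.5 × 292.5 = 438.75 while the average stays at 292.5.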
4.4 Recombination and mutation

Once the surviving individuals have been selected, they are paired and their chromosomes are cut at random locations. The pieces are then swapped, forming two new individuals. In order to create previously unavailable bit patterns, individual bits may be flipped randomly, simulating spontaneous mutations.
4.5 Elitism

Due to the random reproduction process it may happen that genetic material related to the best individuals gets lost. This can be avoided by granting survival to a subset of the population with the highest fitness (the elite), usually a single individual.
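The pieces discussed in Sections 4.1 to 4.5 can be combined into a toy genetic algorithm for the introductory example f(x) = x² on [0, 31]. This is only an illustrative sketch with freely chosen parameter values, not the lecture's reference implementation:

```python
import random

def run_ga(generations=50, pop_size=6, n_bits=5, p_mut=0.02, seed=1):
    """Toy GA maximizing f(x) = x^2 on [0, 31]: fitness-proportional
    selection, one-point cross-over, bit-flip mutation and one elite
    survivor. Returns the best chromosome and best fitness per generation."""
    random.seed(seed)
    fit = lambda c: int(c, 2) ** 2
    pop = ["".join(random.choice("01") for _ in range(n_bits))
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(pop, key=fit)                  # elitism: best always survives
        history.append(fit(elite))
        parents = random.choices(pop, weights=[fit(c) or 1 for c in pop],
                                 k=pop_size)       # roulette-wheel selection
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            cut = random.randrange(1, n_bits)      # one-point cross-over
            children += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
        children = ["".join(bit if random.random() > p_mut else "10"[int(bit)]
                            for bit in c) for c in children]  # bit-flip mutation
        pop = [elite] + children[:pop_size - 1]
    return max(pop, key=fit), history
```

Because the elite individual is carried over unchanged, the best fitness in the population can never decrease from one generation to the next.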
5 Robustness in optimization
5.1 Stochastic modelling

In many engineering applications of optimization there is some uncertainty about the exact values of design variables and/or other parameters affecting the objective function and constraints. This uncertainty is due to e.g. manufacturing tolerances or environmental conditions and can frequently be described in terms of probabilities. As a consequence, the objective function and the constraints become random, i.e. they have a probability distribution. This implies that the objective function may on average be significantly larger than in the deterministic situation, and that constraints may be violated with large probability. Such a case would be called non-robust. Robust optimization aims at mitigating the effect of random uncertainties by taking them into account during the optimization process.
Uncertainties in the optimization process can be attributed to three major sources, as shown in Fig. 5.1. These sources of uncertainty or stochastic scatter are:
Figure 5.1: Sources of uncertainty in optimization
Uncertainty of design variables. This means that the manufacturing process is unable to achieve the design precisely. The magnitude of such uncertainty depends to a large extent on the quality control of the manufacturing process.
Uncertainty in the objective function. This means that some parameters affecting the structural performance are beyond the control of the designer. These uncertainties may be reduced by a stringent specification of operating conditions. This may be possible for mechanical structures, but is typically not feasible for civil structures subjected to environmental loading such as earthquakes or severe storms, which cannot be controlled.
Uncertainty of the feasible domain. This means that the admissibility of a particular design (such as its safety or serviceability) cannot be determined deterministically. Such problems are at the core of probability-based design of structures.
Monte Carlo Simulation. This is a frequently used method to deal with the effect of random uncertainties. Typically its application aims at integrations such as the computation of expected values (e.g. mean or standard deviation). As an example, consider the determination of the area of a quarter circle of unit radius. As we know, this area is π/4. Using the so-called Monte Carlo method we can obtain approximations to this result based on elementary function evaluations. Using 1000 uniformly distributed random numbers x and y (cf. Fig. 5.2), and counting the number Nc of pairs (x, y) for which x² + y² < 1, we get an estimate π/4 ≈ Nc/1000 = 776/1000 = 0.776.
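The quarter-circle estimate can be reproduced with a few lines of Python (seed and sample count are arbitrary, so the result will differ from the particular realization 0.776 quoted above):

```python
import random

def estimate_quarter_circle(n=1000, seed=42):
    """Count uniform points (x, y) in the unit square falling inside the
    quarter circle x^2 + y^2 < 1; the ratio N_c / n estimates pi / 4."""
    random.seed(seed)
    n_c = sum(1 for _ in range(n)
              if random.random() ** 2 + random.random() ** 2 < 1.0)
    return n_c / n
```

For large n the estimate scatters around π/4 ≈ 0.785 with a standard error of roughly 0.41/√n.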
Simple example. Consider the objective function f0(x) = a(x − 1)² − 0.5 with different numerical values for a. This function has its minimum at x = 1 with an objective function value of f(x) = −0.5 for all values of a > 0. We assume that the actual design value of x is not located exactly at the deterministic minimum but has a random offset y (representing e.g. manufacturing tolerances), so that the objective is evaluated at xD = 1 + y. This introduces randomness into the objective function as evaluated at the actual design xD. Here y is assumed to be a zero-mean Gaussian variable with a coefficient of variation of 0.1.
Carrying out a Monte Carlo simulation with 1000 random samples drawn for x with a fixed value of a = 0.5, we obtain random samples of the objective function and of the location of the minimum.
Figure 5.2: Estimating π/4 by Monte Carlo simulation
Figure 5.3: Simple objective function and location of the minimum
The samples are shown in Fig. 5.4. It is easily seen that the objective function value is always larger than the deterministic value. The mean value of the objective function is −0.495 with a COV of 0.014. For a numerical value of a = 10, the same analysis yields a mean objective of −0.40 with a COV of 0.35. This demonstrates the effect of the local curvature of the objective function at the deterministic optimum. Assume now that a is also a Gaussian random variable, with a mean value of 0.5 and a coefficient of variation of 0.1. This introduces additional randomness into the objective function. Carrying out a Monte Carlo simulation with 1000 random samples drawn for a and x, we obtain samples as shown in Fig. 5.5.
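The first part of this experiment can be sketched as follows. Since a coefficient of variation is not strictly defined for the zero-mean offset y, a standard deviation of 0.1 (interpreted relative to the nominal design x = 1) is assumed here; this assumption reproduces the quoted mean values of about −0.495 (a = 0.5) and −0.40 (a = 10):

```python
import random
import statistics

def sample_objective(a, n=1000, sigma=0.1, seed=0):
    """Sample f0(x_D) = a*(x_D - 1)^2 - 0.5 at x_D = 1 + y with
    y ~ N(0, sigma); sigma = 0.1 is an assumption standing in for the
    'coefficient of variation of 0.1' relative to the nominal design x = 1."""
    random.seed(seed)
    samples = [a * random.gauss(0.0, sigma) ** 2 - 0.5 for _ in range(n)]
    return statistics.mean(samples), statistics.stdev(samples)
```

Since E[a·y²] = a·σ², the mean objective is −0.5 + a·0.01, i.e. −0.495 for a = 0.5 and −0.40 for a = 10, matching the values above.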
5.2 Application example

As an example, consider a simple beam under dynamic loading (cf. Fig. 5.6). For this beam with length L = 1 m and a rectangular steel cross section (w, h), subjected to vertical harmonic loading FV(t) = AV sin(ωV t) and horizontal harmonic loading FH(t) = AH sin(ωH t), the mass should be minimized under the constraint that the center deflection due to the loading remains smaller than 10 mm. Larger deflections are considered serviceability failures. The design variables are
Figure 5.4: Random samples of design values and objective function, deterministic objective
Figure 5.5: Random samples of design values and objective function, random objective
bounded in the range 0 < w, h < 0.1 m. Force values are AV = AH = 300 N, ωV = 1000 rad/s, ωH = 700 rad/s. Material data are E = 210 GPa and ρ = 7850 kg/m³.
Using a modal representation of the beam response and taking into account the fundamental vertical and horizontal modes only, the stationary response amplitudes uV and uH are readily computed. Fig. 5.7 shows the dynamic response u = √(uV² + uH²) as a function of the beam geometry. The contour line shown indicates a response value of 0.01 m. Defining this value as the acceptable limit of deformation, it is seen that the feasible domain is not simply connected. There is an island of feasibility around w = 0.03 m and h = 0.05 m. The deterministic optimum is located on the
Figure 5.6: Beam with rectangular cross section
Figure 5.7: Dynamic response of beam and feasible domain
boundary of this island, i.e. at the values w = 0.022 m and h = 0.045 m.

In the next step, the loading amplitudes are assumed to be log-normally distributed and the
excitation frequencies are assumed to be Gaussian random variables. The mean values are taken as the nominal values given above, and the coefficients of variation are assumed to be 5% for both the load amplitudes and the excitation frequencies (Case 1). This implies that the constraints can be satisfied only with a certain probability < 1. Fig. 5.8 shows the probability P(F|w, h) of violating the constraint as a function of the design variables w and h.
Accepting a possible violation of the constraint condition with a probability of 20%, it is seen that the neighborhood of the deterministic optimum still contains a probabilistically feasible region. In that sense, the deterministic optimum may be considered robust.
In a comparative analysis, the coefficients of variation are assumed to be 10% for the loadamplitudes and for the frequencies (Case 2). The resulting conditional failure probabilities are shown
Figure 5.8: Conditional failure probability P(F|w, h) depending on w and h, Case 1
in Fig. 5.9. Due to the increased random variability, the feasible region around the deterministic
Figure 5.9: Conditional failure probability P(F|w, h) depending on w and h, Case 2
optimum diminishes. This indicates the limited robustness of the deterministic optimum. It istherefore quite important to quantify the uncertainties involved appropriately in order to obtainuseful robustness measures.
Books

R. T. Haftka & M. P. Kamat, Elements of Structural Optimization, Martinus Nijhoff Publ., 1985.
U. Kirsch, Fundamentals and Applications of Structural Optimization, Springer, 1993.
P. Pedregal, Introduction to Optimization, Springer, 2004.
J. Nocedal & S. J. Wright, Numerical Optimization, Springer, 2006.