Department of Mechanical Engineering

Indian Institute of Technology Kanpur

ME 752: Optimization Methods in Engineering Design (2008-2009 II)

Assignment 0: Mathematical Background

1. Given that

[ a 0 0 ] [ a b c ]   [ 4 2 4 ]
[ b d 0 ] [ 0 d e ] = [ 2 2 2 ] ,
[ c e f ] [ 0 0 f ]   [ 4 2 3 ]

find out the values of a, b, c, d, e and f. (For a square root, select the positive value.)

[What you just attempted is called Cholesky decomposition and works as desired for

symmetric positive definite matrices.]
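
For cross-checking hand calculations of this element-by-element factorization, a minimal Python/NumPy sketch may help; it is applied below to an illustrative symmetric positive definite matrix (not necessarily the data above), and the helper name cholesky_lower is arbitrary.

    import numpy as np

    def cholesky_lower(M):
        # Element-by-element factorization M = L L^T, taking positive square roots.
        n = M.shape[0]
        L = np.zeros_like(M, dtype=float)
        for i in range(n):
            for j in range(i + 1):
                s = M[i, j] - L[i, :j] @ L[j, :j]
                if i == j:
                    L[i, i] = np.sqrt(s)      # s < 0 would signal a non-PD matrix
                else:
                    L[i, j] = s / L[j, j]
        return L

    M = np.array([[4., 2., 4.], [2., 2., 2.], [4., 2., 6.]])   # illustrative PD matrix
    L = cholesky_lower(M)
    print(L)
    print(L @ L.T)                                             # reproduces M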

2. Consider three vectors u1 = [2 0 −1 1]^T, u2 = [1 2 0 3]^T and u3 = [3 0 −1 2]^T.

(a) Find the unit vector v1 along u1.

(b) From u2, subtract its component along v1 (which will have magnitude v1^T u2) and hence

find the unit vector v2 such that vectors v1, v2 form an orthonormal basis for the

subspace spanned by u1, u2.

(c) Similarly, find a vector v3 which, together with v1 and v2, forms an orthonormal basis

for the subspace spanned by all the three vectors u1, u2, u3.

(d) Find a vector v4 to complete this basis for the entire space R^4.

(e) Write a generalized algorithm for building up the vectors v1, v2, · · · , vl, l ≤ m, when the given m vectors u1, u2, · · · , um are in R^n, m < n.

[This process is called Gram-Schmidt orthogonalization and is used for building orthonor-

mal bases for prescribed subspaces.]
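
A compact sketch of the generalized algorithm of part (e), in Python/NumPy, applied to the three vectors above (the function name gram_schmidt is, of course, arbitrary):

    import numpy as np

    def gram_schmidt(U, tol=1e-12):
        # Columns of U -> orthonormal columns spanning the same subspace;
        # components along earlier v's are subtracted before normalizing.
        V = []
        for u in U.T:
            w = u.astype(float)
            for v in V:
                w = w - (v @ u) * v
            norm = np.linalg.norm(w)
            if norm > tol:                 # drop (near-)dependent vectors
                V.append(w / norm)
        return np.column_stack(V)

    U = np.array([[2., 1., 3.],
                  [0., 2., 0.],
                  [-1., 0., -1.],
                  [1., 3., 2.]])           # u1, u2, u3 as columns
    V = gram_schmidt(U)
    print(np.round(V.T @ V, 10))           # identity => orthonormal columns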

3. Find an orthonormal basis for the range space of the linear transformation defined by the

matrix

A = [ 2 1  3 4 ]
    [ 3 0 −2 2 ]
    [ 5 1  1 6 ] .

4. A surveyor reaches a remote valley to prepare records of land holdings. The valley is a

narrow strip of plain land between a mountain ridge and sea, and local people use a local

and antiquated system of measures. They have two distant landmarks: the lighthouse and

the high peak. To mention the location of any place, they typically instruct: so many bans

Page 2: a b c b d d e c e f a b c d e f attempted Cholesky ...

towards the lighthouse and so many kos towards the high peak. Upon careful measurement,

the surveyor and his assistants found that (a) one bans is roughly 200 m, (b) one kos is around

15 km, (c) the lighthouse is 10 degrees south of east, and (d) the high peak is 5 degrees west

of north. The surveyor’s team, obviously, uses the standard system, with unit distances of 1

km along east and along north. Now, to convert the local documents into standard system

and to make sense to the locals about their intended locations, work out

(a) a conversion formula from valley system to standard system, and

(b) another conversion formula from standard system to valley system.

5. Given that

[ 1 0 0 ] [ a d g ]   [  5  2 1 ]
[ b 1 0 ] [ 0 e h ] = [ 10  6 1 ] ,
[ c f 1 ] [ 0 0 i ]   [  5 −4 7 ]

find out the values of a, b, c, d, e, f, g, h and i.

[This is the celebrated Crout-Doolittle algorithm: LU decomposition without pivoting.]

6. For n × n matrices Q, R and A, consider the matrix multiplication QR = A columnwise

and observe that r1,kq1 + r2,kq2 + r3,kq3 + · · · + rn,kqn = ak. For the matrix

A = [ 6 5 −1 0 ]
    [ 6 5 −1 6 ]
    [ 6 1  1 0 ]
    [ 6 1  1 2 ] ,

write out the column equations one by one and determine the corresponding columns of an

orthogonal Q and an upper triangular R. (Note: There is no trick in this problem. Never

stop in between. The process of QR decomposition always works — till the end!)
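
As a cross-check on the column-by-column construction, the result can be compared against a library factorization; a short NumPy sketch:

    import numpy as np

    A = np.array([[6., 5., -1., 0.],
                  [6., 5., -1., 6.],
                  [6., 1., 1., 0.],
                  [6., 1., 1., 2.]])
    Q, R = np.linalg.qr(A)            # Q orthogonal, R upper triangular
    print(np.allclose(Q @ R, A))      # True
    print(np.round(R, 6))             # a (near-)zero diagonal entry flags dependent columns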

7. For a bilinear form p(x, y) = x^T A y of two vector variables x ∈ R^m and y ∈ R^n, find out ∂p/∂xi and ∂p/∂yi, and hence the vector gradients ∂p/∂x and ∂p/∂y. As a corollary, derive the partial derivative ∂q/∂xi and the vector gradient of a quadratic form q(x) = x^T A x.

8. Check whether the matrix

4 11 5 1

1 2

is a positive definite matrix.

9. Consider the matrix

P = [   2       3       1    ]
    [ a + b   b − a   3a + b ] .

Page 3: a b c b d d e c e f a b c d e f attempted Cholesky ...

(a) For which values of a and b is PP^T positive definite?

(b) For which values of a and b is P^T P positive definite?

10. Consider the matrix

A = [  80  −60 ]
    [  36  −27 ]
    [ −48   36 ] .

(a) Construct A^T A and determine its eigenvalues λ1, λ2 (number them in descending order, for convenience) and corresponding eigenvectors v1, v2, as an orthonormal basis of R^2.

(b) Define σk = √λk, form a diagonal matrix with σ1 and σ2 as the diagonal elements and extend it (with additional zeros) to a matrix Σ of the same size as A.

(c) Assemble the eigenvectors into an orthogonal matrix as V = [v1 v2] and find any

orthogonal matrix U satisfying A = UΣV^T.

(d) Identify the null space of A in terms of columns of V.

(e) Identify the range space of A in terms of columns of U.

(f) How does a system of equations Ax = b transform if the bases for the domain and the

co-domain of A change to V and U, respectively?

[This powerful decomposition of matrices for solution, optimization and diagnostics of linear

systems is called singular value decomposition (SVD).]
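
A numerical cross-check of steps (a)-(c), using NumPy's eigendecomposition and built-in SVD, might run as follows (only a verification aid, not a substitute for the hand construction):

    import numpy as np

    A = np.array([[80., -60.],
                  [36., -27.],
                  [-48., 36.]])
    lam, V = np.linalg.eigh(A.T @ A)        # eigenvalues in ascending order
    lam, V = lam[::-1], V[:, ::-1]          # renumber in descending order
    print(lam)                              # sigma_k = sqrt(lambda_k)
    U, s, Vt = np.linalg.svd(A)             # compare with the built-in SVD
    print(s)
    print(np.round(Vt, 6))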

11. Find the characteristic polynomial of the following matrix and mention its significance.

[ 0   0   0   · · ·   0   −a_n     ]
[ 1   0   0   · · ·   0   −a_{n−1} ]
[ 0   1   0   · · ·   0   −a_{n−2} ]
[ .   .   .   . . .   .      .     ]
[ 0   0   0   · · ·   0   −a_2     ]
[ 0   0   0   · · ·   1   −a_1     ]

12. Eigenvalues of matrix A are 1.1, 1, 0.9 and the corresponding eigenvectors are [1 0 1]^T, [1 2 −1]^T, [1 1 1]^T. Compute A and A^6.
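
Since A = V Λ V^(−1) and A^6 = V Λ^6 V^(−1) when the eigenvectors are assembled as the columns of V, a short NumPy sketch can verify the hand computation:

    import numpy as np

    Lam = np.diag([1.1, 1.0, 0.9])
    V = np.array([[1., 1., 1.],
                  [0., 2., 1.],
                  [1., -1., 1.]])           # eigenvectors as columns
    A = V @ Lam @ np.linalg.inv(V)
    A6 = V @ np.linalg.matrix_power(Lam, 6) @ np.linalg.inv(V)
    print(np.round(A, 4))
    print(np.round(A6, 4))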

13. Let f(x) be a scalar function of a vector variable x ∈ R^n. Let Q ∈ R^(n×n) be an orthogonal matrix, such that its columns q1, q2, · · · , qn form an orthonormal basis of R^n.

(a) For small α, find out f(x + αqj) − f(x).

(b) Hence, show that the directional derivative ∂f/∂qj = qj^T ∇f(x).

Page 4: a b c b d d e c e f a b c d e f attempted Cholesky ...

(c) Now, compose the vector resultant Σ_{j=1}^{n} (∂f/∂qj) qj and show that it equals ∇f(x).

14. A function f(x) of two variables has been evaluated at the following points.

f(1.999, 0.999) = 7.352232, f(1.999, 1) = 7.381671, f(1.999, 1.001) = 7.411257;

f(2, 0.999) = 7.359574, f(2, 1) = 7.389056, f(2, 1.001) = 7.418686;

f(2.001, 0.999) = 7.366922, f(2.001, 1) = 7.396449, f(2.001, 1.001) = 7.426124.

Find out the gradient and Hessian of the function at the point (2,1). How many function

values did you have to use for each of them?
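
A sketch of the standard central-difference formulas on this 3 × 3 stencil (spacing h = 0.001 around the point (2, 1)); counting the values each formula touches answers the last part:

    import numpy as np

    h = 0.001
    # f[(i, j)] = f(2 + i*h, 1 + j*h) for i, j in {-1, 0, +1}
    f = {(-1, -1): 7.352232, (-1, 0): 7.381671, (-1, 1): 7.411257,
         ( 0, -1): 7.359574, ( 0, 0): 7.389056, ( 0, 1): 7.418686,
         ( 1, -1): 7.366922, ( 1, 0): 7.396449, ( 1, 1): 7.426124}

    grad = np.array([(f[1, 0] - f[-1, 0]) / (2*h),                  # df/dx1
                     (f[0, 1] - f[0, -1]) / (2*h)])                 # df/dx2
    fxx = (f[1, 0] - 2*f[0, 0] + f[-1, 0]) / h**2
    fyy = (f[0, 1] - 2*f[0, 0] + f[0, -1]) / h**2
    fxy = (f[1, 1] - f[1, -1] - f[-1, 1] + f[-1, -1]) / (4*h**2)
    hess = np.array([[fxx, fxy], [fxy, fyy]])
    print(grad)
    print(hess)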

15. Let P(x) be a polynomial on which successive synthetic division by a chosen quadratic polynomial x^2 + px + q produces the successive quotients P1(x), P2(x) and remainders rx + s, ux + v, such that

P(x) = (x^2 + px + q) P1(x) + rx + s,    P1(x) = (x^2 + px + q) P2(x) + ux + v.

(a) Observing that the expressions P1(x), P2(x) and the numbers r, s, u, v all depend upon

p and q in the chosen expression x^2 + px + q, differentiate the expression for P(x) above partially with respect to p and q, and simplify to obtain expressions for ∂r/∂p, ∂r/∂q, ∂s/∂p, ∂s/∂q.

[Hint: At its roots, a polynomial evaluates to zero.]

(b) Frame the Jacobian J of [r s]^T with respect to [p q]^T and work out an iterative

algorithm based on the first order approximation to iterate over the parameters p and q

for obtaining r = s = 0. In brief, work out an iterative procedure to isolate a quadratic

factor from a polynomial.

[This is Bairstow's method, often found to be an effective way to solve polynomial

equations.]

(c) Implement the procedure to find all roots of the polynomial

P(x) = x^6 − 19x^5 + 125x^4 − 329x^3 + 66x^2 + 948x − 216

up to two places of decimal, starting with p = 0 and q = 0, i.e. x^2 as the initial divisor

expression.
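
A hedged sketch of one iteration of the procedure in part (b), using NumPy's polynomial division for the synthetic divisions and a finite-difference Jacobian in place of the analytic one derived in part (a):

    import numpy as np

    P = np.array([1., -19., 125., -329., 66., 948., -216.])   # coefficients, highest power first

    def remainder_rs(p, q):
        # Remainder r*x + s of P(x) divided by x^2 + p x + q.
        _, rem = np.polydiv(P, [1.0, p, q])
        rem = np.atleast_1d(rem)
        return np.array([rem[-2] if rem.size > 1 else 0.0, rem[-1]])

    def bairstow_step(p, q, h=1e-7):
        # One Newton-type update of (p, q) driving (r, s) towards zero;
        # the exercise's analytic Jacobian is replaced here by finite differences.
        F = remainder_rs(p, q)
        J = np.column_stack([(remainder_rs(p + h, q) - F) / h,
                             (remainder_rs(p, q + h) - F) / h])
        dp, dq = np.linalg.solve(J, -F)
        return p + dp, q + dq

    print(bairstow_step(0.0, 0.0))   # repeat until r, s vanish, then deflate by the quadratic factor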

16. Function f(t) is being approximated in the interval [0, 1] by a cubic interpolation formula in

terms of the boundary conditions as

f(t) = [f(0) f(1) f′(0) f′(1)] W [1 t t^2 t^3]^T.

Determine the matrix W.
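
One way to obtain W numerically (deriving it symbolically is the exercise): the boundary data of a cubic c0 + c1 t + c2 t^2 + c3 t^3 equal M [c0 c1 c2 c3]^T for the M below, so the stated form holds with W = (M^(−1))^T.

    import numpy as np

    M = np.array([[1., 0., 0., 0.],    # f(0)
                  [1., 1., 1., 1.],    # f(1)
                  [0., 1., 0., 0.],    # f'(0)
                  [0., 1., 2., 3.]])   # f'(1)
    W = np.linalg.inv(M).T
    print(W)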


Department of Mechanical Engineering

Indian Institute of Technology Kanpur

ME 752: Optimization Methods in Engineering Design (2008-2009 II)

Assignment 1: Important Identities and Results

1. For an m × n matrix A, m < n, P = (AA^T)^(−1) has already been computed. Then, an additional row a^T is appended to A, so that the matrix A gets updated to A′. In terms of P, develop an update formula in the form

P′ = (A′A′^T)^(−1) = [ Q    b ]
                     [ b^T  ε ] ,   where   A′ = [ A   ]
                                                 [ a^T ] .

Similarly, develop a working rule to update P from the available P′, if A is obtained by dropping the last row of A′.

[In the active set strategy of nonlinear optimization, such updates are routinely involved while

including and excluding inequality constraints in and from the active set of constraints.]

2. Let an (n − m)-dimensional subspace be defined in R^n as M = {d : Ad = 0}, where A ∈ R^(m×n) is full-rank. Show that the orthogonal projection of a vector to this subspace is accomplished by the transformation

P = In − A^T (AA^T)^(−1) A.

[Try to derive the transformation, apart from simply verifying.]
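
A quick numerical check of the projection property, with a randomly generated full-rank A (only a verification sketch, not the requested derivation):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((2, 5))                      # full-rank m x n with m < n
    P = np.eye(5) - A.T @ np.linalg.inv(A @ A.T) @ A     # projector onto {d : A d = 0}
    v = rng.standard_normal(5)
    print(np.allclose(A @ (P @ v), 0))                   # projected vector satisfies A d = 0
    print(np.allclose(P @ P, P), np.allclose(P, P.T))    # idempotent and symmetric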

3. For an invertible matrix A ∈ R^(n×n) and non-zero column vectors u, v ∈ R^n,

(a) find out the rank of the matrix uv^T, and

(b) prove the following identity (Sherman-Morrison formula) and comment on its utility.

(A − uv^T)^(−1) = A^(−1) + A^(−1) u (1 − v^T A^(−1) u)^(−1) v^T A^(−1).

4. In a certain application, the inverse of the matrix In + cA^T A is required, where A is a full-rank m × n matrix (m < n) and c is a large number. For two reasons, a direct inversion of this matrix is not very advisable. For indirectly obtaining the inverse, prove the identity

(In + cA^T A)^(−1) = In − cA^T (Im + cAA^T)^(−1) A

and verify it for c = 10, n = 3, m = 2 and A = [2e1 3e2 0].

Can you figure out what are the two reasons?
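
The requested numerical verification for c = 10, n = 3, m = 2, A = [2e1 3e2 0] can be scripted directly (note that the right-hand side only requires the inverse of an m × m matrix); a NumPy sketch:

    import numpy as np

    c, n, m = 10.0, 3, 2
    A = np.array([[2., 0., 0.],
                  [0., 3., 0.]])        # columns 2*e1, 3*e2, 0
    lhs = np.linalg.inv(np.eye(n) + c * A.T @ A)
    rhs = np.eye(n) - c * A.T @ np.linalg.inv(np.eye(m) + c * A @ A.T) @ A
    print(np.allclose(lhs, rhs))        # True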

5. For a real invertible matrix A, show that A^T A is positive definite. Further, show that, for ν > 0, A^T A + ν^2 I is positive definite for any real matrix A.


6. For solving Ax = b with symmetric positive definite matrix A, we formulate the error vector

e = Ax − b,

start from an arbitrary point x0 and iterate along selected directions d0, d1, d2 etc, as

xk+1 = xk + αkdk, for k = 0, 1, 2, · · · .

(a) Denoting ek = Axk − b, determine αk such that dk^T ek+1 = 0, i.e. the step along a chosen direction eliminates any error along that direction in the next iterate.

(b) Then, find out d0^T e2, d0^T e3 and d1^T e3, i.e. errors along old directions.

(c) Generalize the observation for the k-th step.

(d) Work out the conditions that the chosen directions must satisfy such that errors along

the old directions also vanish.

[These conditions characterize these directions as conjugate directions for the matrix A.]

7. For n × n symmetric positive definite L and m × n full-rank A (m < n), prove the identity

A(L + cA^T A)^(−1) A^T (Im + cAL^(−1)A^T) = AL^(−1)A^T

and use the result to prove that an eigenvector v of AL^(−1)A^T with eigenvalue σ is also an eigenvector of A(L + cA^T A)^(−1) A^T with corresponding eigenvalue 1/(c + 1/σ).

[This result has valuable implication in the theory behind a powerful duality-based algorithm

(Augmented Lagrangian method) of nonlinear optimization.]

8. Plot contours of the function f(x) = x1^2 x2 − x1x2 + 8, in the region 0 < x1 < 3, 0 < x2 < 10.

Develop a quadratic approximation of the function around (2,5) and superimpose its contours

with those of f(x). Are the contour curves of this quadratic approximation elliptic, parabolic

or hyperbolic?

9. Find a solution of the equation e^(−x) = x up to two places of decimal. Is the solution unique?

10. Let A be an m × n matrix (m < n) of rank m and let L be an n × n symmetric positive

definite matrix. Then, show that the (n + m) × (n + m) matrix

H = [ L   A^T ]
    [ A    0  ] ,

is non-singular, but indefinite.


Department of Mechanical Engineering

Indian Institute of Technology Kanpur

ME 752: Optimization Methods in Engineering Design (2008-2009 II)

Assignment 2: Optimization Problems and Algorithms

1. Mrs. Anna D’Souza (height 170 cm) called our office requesting an economical size and best positioning for a mirror that she would get fixed on a door of her bedroom almirah.

Considering that the cost of the mirror is proportional to its area and taking appropriate es-

timates of dimensions, margins etc, formulate an optimization problem and work out suitable

height, width and positioning of the mirror in which she can view her full image.

2. Design a minimum cost cylindrical tank closed at both ends to contain a fixed volume V of

fluid. Assume that the cost depends directly on the area A of sheet metal.

3. Find out the maximum volume of a tank of the above kind that has a given surface area A.

Show that the relationship between surface area and volume for the optimal design in this case is equivalent to that in the earlier problem.

4. Design a tank of the above kind for a volume of 250 m^3, incorporating the assembly constraint

H ≤ 10 − D/2 appearing from the plan of locating it at a particular place in a shed.

5. Using a graphical method, solve the problem

minimize f(x) = x1^2 + x2^2 − 4x1 + 4

subject to x1 − 2x2 + 6 ≥ 0,

x1^2 − x2 + 1 ≤ 0,

x1, x2 ≥ 0.

6. Show that the algorithm

xk+1 = A(xk) = (1/2)(xk + 2)   for xk > 1,
               (1/4) xk        for xk ≤ 1

for the problem

minimize φ(x) = |x|

is not globally convergent. Explain the reason.

7. Find the order of convergence and convergence ratio of the sequence {xk}, k = 0, 1, 2, . . ., if

(a) xk = α^k    (b) xk = α^(2^k)

for 0 < α < 1.


Department of Mechanical Engineering

Indian Institute of Technology Kanpur

ME 752: Optimization Methods in Engineering Design (2008-2009 II)

Assignment 3: Univariate Optimization

1. Identify the stationary points of the following function using the exact method:

f(x) = (x − 1)^2 − 0.01x^4.

Check if they are minimum or maximum points.

2. Find out the stationary points of the function

f(x) = 5x^6 − 36x^5 + (165/2)x^4 − 60x^3 + 36

and analyze their nature.

3. Maximize f(x) = −x^3 + 3x^2 + 9x + 10 in the interval −2 ≤ x ≤ 4.

4. Implement the bounding phase method (refer to the book by Deb for the algorithm) to bracket a minimum point of the function f(x) = e^x − x^3. Next, use the program to bracket minima of the functions in the first three exercises above.
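
A minimal Python sketch of the bounding-phase idea (a simplified version of the algorithm in Deb's book: pick the downhill direction, then expand the step by powers of 2 until the function value rises again); the starting point and step size below are illustrative choices.

    import math

    def bounding_phase(f, x0, delta=0.5):
        if f(x0 - delta) >= f(x0) <= f(x0 + delta):
            return x0 - delta, x0 + delta              # x0 is already bracketed
        if f(x0 - delta) < f(x0 + delta):
            delta = -delta                             # move towards decreasing f
        k, x_prev, x_curr = 0, x0, x0 + delta
        while True:                                    # assumes f rises again in this direction
            x_next = x_curr + 2**(k + 1) * delta
            if f(x_next) >= f(x_curr):
                return min(x_prev, x_next), max(x_prev, x_next)
            k, x_prev, x_curr = k + 1, x_curr, x_next

    f = lambda x: math.exp(x) - x**3
    print(bounding_phase(f, x0=1.0, delta=0.5))        # a bracket around the minimum near x = 3.7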

5. Over the bracket arrived at for f(x) = e^x − x^3, use two iterations of the Fibonacci search and golden section search methods. Implement one of these (whichever you like through this experience) in a function and use it for all the four functions above up to an accuracy of 10^(−3).
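
A sketch of golden-section search that can be driven to the stated accuracy, shown over a bracket such as the one produced above (the end-points used here are illustrative):

    import math

    def golden_section(f, a, b, tol=1e-3):
        g = (math.sqrt(5) - 1) / 2                 # golden ratio factor, about 0.618
        x1, x2 = b - g*(b - a), a + g*(b - a)
        f1, f2 = f(x1), f(x2)
        while b - a > tol:
            if f1 > f2:                            # minimum lies in [x1, b]
                a, x1, f1 = x1, x2, f2
                x2 = a + g*(b - a)
                f2 = f(x2)
            else:                                  # minimum lies in [a, x2]
                b, x2, f2 = x2, x1, f1
                x1 = b - g*(b - a)
                f1 = f(x1)
        return 0.5 * (a + b)

    f = lambda x: math.exp(x) - x**3
    print(golden_section(f, 1.5, 4.5))             # minimum point to within the tolerance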

6. Over the bracket arrived at for f(x) = e^x − x^3, use two iterations of the regula falsi and Newton's methods. Implement one of these (whichever you like through this experience) in a function and use it for all the four functions above up to an accuracy of 10^(−3).

7. Compare the above two favourite algorithms from two classes in terms of their performance

on the set of these four functions.

8. Identify the regions over which the function e^(−x^2) is convex and where it is concave. Determine its global minima and maxima.

9. Find all the maxima and minima of the function φ(x) = 25(x − 1/2)^4 − 2(x − 1/2)^2, identify its other salient features and sketch its graph.


Department of Mechanical Engineering

Indian Institute of Technology Kanpur

ME 752: Optimization Methods in Engineering Design (2008-2009 II)

Assignment 4: Fundamentals of Multivariate Optimization

1. Consider the function f : R2 → R defined by

f(x) = 2x1^2 − x1^4 + x1^6/6 + x1x2 + x2^2/2 .

Find out all the stationary points and classify them as local minimum, local maximum and

saddle points.

2. The Hessian H(x) of a function f(x) is positive semi-definite everywhere.

(a) Using Taylor's theorem (in the remainder form) around the point x2 as

f(x1) = f(x2) + [∇f(x2)]^T (x1 − x2) + (1/2)(x1 − x2)^T H[x2 + α(x1 − x2)] (x1 − x2)

for some α ∈ [0, 1], show that

f(x1) ≥ f(x2) + [∇f(x2)]^T (x1 − x2).

(b) Does the argument remain valid if we know only that the Hessian is positive semi-definite

(or positive definite) at the point x2, and not everywhere? Why?

(c) Now, consider an arbitrary line through x2 and select two points y and z on opposite

sides of it. Writing the result of part (a) with y and z as x1 in turn, show that

βf(y) + (1 − β)f(z) ≥ f [βy + (1 − β)z]

for some β ∈ [0, 1].

(d) Does the above inequality hold for all β ∈ [0, 1]? Why?

(e) Using the definition of a convex function, summarize the entire result in a single sentence.

3. Find the domain in which the function 9(x1^2 − x2)^2 + (x1 − 1)^2 is convex.

4. (a) Develop a quadratic model of the function 9(x1^2 − x2)^2 + (x1 − 1)^2 around the origin.

(b) Superimpose the contours of the original function and the quadratic model. (Use a

software, e.g. MATLAB, to develop the contours.)

(c) With a circular trust region of radius 0.2 unit, mark the point where a step from the

origin should reach.

(d) Obtain the coordinates of this point from the plot and repeat the entire process for one

more step.


Department of Mechanical Engineering

Indian Institute of Technology Kanpur

ME 752: Optimization Methods in Engineering Design (2008-2009 II)

Assignment 5: Basic Methods of Multivariate Optimization

1. Use Nelder and Mead’s simplex search method, starting from the origin, to find the minimum

point of the function

f(x, y, z) = 2x^2 + xy + y^2 + yz + z^2 − 6x − 7y − 8z + 9.

2. For minimizing the function f(x) = (x1 − x2)^2 + (1 − x1)^2 + (2x1 − x2 − x3)^2, consider the origin as the starting solution, i.e. x0 = [0 0 0]^T.

(a) Evaluate the function f(x0) and gradient g(x0) at this point. Using the negative gradient

as the search direction, define a new function φ(α) = f(x0 − αg(x0)).

(b) Find out the minimizer α0 of φ(α) and update x1 = x0 − α0g(x0).

(c) Similarly, carry out two more such iterations, i.e. find out f(xk), g(xk), αk and xk+1

for k = 1, 2.

(d) Tabulate the results and analyze them in terms of function value as well as distance from

the actual minimum point.
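
Since f here is a sum of squares of linear expressions, it can be written as ‖Ax − b‖^2, for which the exact line-search step along the negative gradient has the closed form α = (gᵀg)/(gᵀHg). A short NumPy sketch of iterations (a)-(c):

    import numpy as np

    # f(x) = (x1 - x2)^2 + (1 - x1)^2 + (2x1 - x2 - x3)^2 = ||A x - b||^2
    A = np.array([[1., -1., 0.],
                  [1.,  0., 0.],
                  [2., -1., -1.]])
    b = np.array([0., 1., 0.])
    H = 2 * A.T @ A                          # constant Hessian of this quadratic

    x = np.zeros(3)                          # x0 = origin
    for k in range(3):
        g = 2 * A.T @ (A @ x - b)            # gradient at x_k
        alpha = (g @ g) / (g @ H @ g)        # exact minimizer of phi(alpha)
        x = x - alpha * g
        print(k, alpha, x, np.sum((A @ x - b)**2))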

3. Starting from the point x0 = [2 −1 0 1]^T, minimize the function

f(x) = (x1 + 2x2 − 1)^2 + 5(x3 − x4)^2 + (x2 − 3x3)^4 + 10(x1 − x4)^4

by (a) Hooke-Jeeves method with unit initial step size and (b) steepest descent method.

4. Show that, with x0 = [c 1]^T, steepest descent iterations for the function f(x) = x1^2 + c x2^2, c > 0, are given by xm = am [c (−1)^m]^T, where am = (c − 1)^m/(c + 1)^m. Comment on the behaviour of the method for large values of c.
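
A quick numerical check of the stated formula for one value of c (steepest descent with the exact line-search step for a quadratic), which may help before attempting the general proof; c = 5 below is an arbitrary choice.

    import numpy as np

    c = 5.0
    H = np.diag([2.0, 2.0 * c])               # Hessian of f(x) = x1^2 + c*x2^2
    x = np.array([c, 1.0])                    # x0 = [c 1]^T
    for m in range(1, 6):
        g = H @ x                             # gradient
        x = x - ((g @ g) / (g @ H @ g)) * g   # exact line-search step
        predicted = ((c - 1)/(c + 1))**m * np.array([c, (-1.0)**m])
        print(m, x, predicted)                # the two columns should agree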

5. What are the starting points from which a single iteration of the steepest descent algorithm would converge to the minimum point of the function 5x1^2 + 4x1x2 + 3x2^2?

6. While minimizing the function f(x) = 4x1^2 − 5x1x2 + 3x2^2 + 6x1 − 2x2, a transformation of

the form x1 = ay1 + by2, x2 = cy1 + dy2 was used. For the reformulated problem in terms

of the variables y1, y2, steepest descent method was found to converge to the minimum in a

single iteration from arbitrary starting solution. Find out the conditions that the coefficients

a, b, c, d must satisfy for this to happen.


7. (a) Identify the stationary points of the function

f(x) = [x1^2 + (x2 + 1)^2] [x1^2 + (x2 − 1)^2] .

(b) Classify the stationary points and find out the minimum value(s).

(c) Starting from the point (1, 1), execute a Newton’s step and evaluate the step as accept-

able or otherwise, in terms of reduction in function value.

8. Consider the function

f(x) = (x1^2 − x2)^2 + (x1 − 1)^2

and the origin as the starting point.

(a) Determine a step of the pure Newton’s method.

(b) Is it a descent step?

(c) Is the associated direction a descent direction?

9. For minimizing the function f(x) = (x1^2 − x2)^2 + (1 − x1)^2, perform one iteration of Newton's method from the starting point [2 2]^T and compare this step with the direction of the steepest

descent method, regarding approach towards the optimum.

10. (a) Develop the first order necessary condition for a minimum point of the function

E(x) = (1/2) ‖Ax − b‖^2.

(b) Is the resulting system of equations necessarily consistent? Why?

(c) At a solution of this system, does the function necessarily have a minimum value? Why?

(d) Discuss the distribution of minima when A^T A is singular.

11. Solve the following systems of equations by formulating them as optimization problems:

(a) x^2 − 5xy + y^3 = 2, x + 3y = 6 and (b) z e^x − x^2 = 10y, x^2 z = 0.5, x + z = 1.

12. Starting from x = [1 1 1]^T, solve the system of equations

16x1^4 + 16x2^4 + x3^4 = 16,    x1^2 + x2^2 + x3^2 = 3,    x1^3 − x2 = 0

by Newton’s method and Levenberg-Marquardt method.

13. Find constants a1, a2, a3, a4 and λ for least square fit of the following tabulated data in the form a1 + a2 x + a3 x^2 + a4 e^(λx).

x :  0   1   2   3   4   5   6   7   8
y : 20  52  69  76  74  67  55  38  17

[Hint: You may attempt it as a five-variable least square problem or as a single-variable

optimization problem with a linear least square problem involved in the function evaluation.]


Department of Mechanical Engineering

Indian Institute of Technology Kanpur

ME 752: Optimization Methods in Engineering Design (2008-2009 II)

Assignment 6: Multivariate Optimization Methods

1. For the function f(x, y, z) = 2x^2 + xy + y^2 + yz + z^2 − 6x − 7y − 8z + 9, develop expressions for

the gradient and Hessian, and work out three conjugate directions through a Gram-Schmidt

procedure. Now, starting from origin, conduct four sets of line searches along (a) e1, e2, e3;

(b) three successive steepest descent directions; (c) the three conjugate directions developed;

and (d) three directions recommended by conjugate gradient method. In each case, trace the

function values and gradient norms through the steps.

2. Starting from the point x0 = [2 −1 0 1]^T, minimize the function

f(x) = (x1 + 2x2 − 1)^2 + 5(x3 − x4)^2 + (x2 − 3x3)^4 + 10(x1 − x4)^4

by (a) Polak-Ribiere/Fletcher-Reeves method and (b) Powell’s conjugate direction method,

and compare their performance in terms of number of function evaluations.

3. Following is an excerpt from the record of line searches in a run of Powell’s conjugate directions

method applied in a two-variable problem.

· · · → (2, 5) → (2.9, 6.2) → (4.2, 6.2) → (4.5, 6.6) → (4.9, p) → (5.05, q) → (5.09, r) → · · ·

What are the values of p, q and r?

4. Starting from the origin and taking the identity matrix as the initial estimate of Hessian

inverse, apply a few steps of the DFP method on the Himmelblau function

f(x1, x2) = (x1^2 + x2 − 11)^2 + (x1 + x2^2 − 7)^2.

Show the progress of the iterations superimposed with a contour plot, and record the devel-

opment of the inverse Hessian estimate.
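
A sketch of the DFP update itself, illustrated on the quadratic of Problem 1 (where the exact line-search step has a closed form); for the Himmelblau function the same loop applies, with a numerical line search in place of the closed-form α.

    import numpy as np

    H = np.array([[4., 1., 0.],
                  [1., 2., 1.],
                  [0., 1., 2.]])              # Hessian of the Problem 1 quadratic
    b = np.array([-6., -7., -8.])
    grad = lambda x: H @ x + b

    x = np.zeros(3)
    B = np.eye(3)                             # initial inverse-Hessian estimate
    for k in range(3):
        g = grad(x)
        d = -B @ g
        alpha = -(g @ d) / (d @ H @ d)        # exact line search for a quadratic
        p = alpha * d
        q = grad(x + p) - g
        B = B + np.outer(p, p)/(p @ q) - (B @ np.outer(q, q) @ B)/(q @ B @ q)
        x = x + p
        print(k, x, np.linalg.norm(grad(x)))
    print(B)                                  # approaches inv(H) as the iterations proceed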

5. Using the same starting point, apply conjugate gradient (Polak-Ribiere or Fletcher-Reeves)

method and quasi-Newton (DFP or BFGS) method to minimize the function

f(x) = [x1^2 + (x2 + 1)^2] [x1^2 + (x2 − 1)^2]

to an accuracy of 10^(−6). In each case, apart from the function value and gradient, also evaluate

the exact Hessian (which is not needed and not to be used for the iteration process) after

every line search and examine the relation of the two current search directions (previous and

next) with the local Hessian. Conduct such experiments with at least two starting points.


Department of Mechanical Engineering

Indian Institute of Technology Kanpur

ME 752: Optimization Methods in Engineering Design (2008-2009 II)

Assignment 7: Theory of Optimization Methods (optional)

1. For minimizing f(x) = (1/2) x^T A x + b^T x, with positive definite Hessian matrix A, the starting point has been taken as x0 and the starting direction as d0 = −g0, the negative gradient.

(a) Determine α0 such that x1 = x0 + α0d0 minimizes the function over the line from x0

along the direction d0.

(b) If at x1, the gradient g1 ≠ 0, then determine β0 that will make the new direction d1 = −g1 + β0 d0 conjugate to d0.

(c) Show that the subspace spanned by g0 and g1 is the same as that spanned by g0 and

Ag0, which is also the same as spanned by d0 and d1, i.e.

< g0,g1 > = < g0,Ag0 > = < d0,d1 > .

(d) Next, determine α1 such that x2 = x1 + α1d1 minimizes the function over the line from

x1 along the direction d1.

(e) If g2 ≠ 0, then determine β1 that will make d2 = −g2 + β1 d1 conjugate to d1. Show

that the resulting direction d2 is conjugate to d0 as well.

(f) Show the following identity concerning the subspace traversed so far.

< g0, g1, g2 > = < g0, Ag0, A^2 g0 > = < d0, d1, d2 > .

(g) In the same manner, determine α2, x3, β2 and d3. Establish the conjugacy of d3 to

the old directions and the identity similar to the above, over the expanded subspace

< d0,d1,d2,d3 >.

(h) Finally, write the general result in terms of the step index k and prove it.

[In your arguments, you may use the expanding subspace theorem, which has been proved

earlier independently.]

2. Noting that the rank-one update on the inverse Hessian

Bk+1 = Bk + ak (pk − Bk qk)(pk − Bk qk)^T = Bk + ak [ pk pk^T − Bk qk pk^T − pk qk^T Bk + Bk qk qk^T Bk ],

with ak = 1/(pk^T qk − qk^T Bk qk), fulfils the key requirement of ensuring the equality Bk+1 qk = pk but fails to guarantee continued positive definiteness, we propose to generalize the update formula as

Bk+1 = Bk + α pk pk^T − β Bk qk pk^T − γ pk qk^T Bk + δ Bk qk qk^T Bk.


(a) Determine the conditions on the coefficients α, β, γ and δ so as to fulfil the basic

requirement, which is to ensure Bk+1qk = pk.

(b) Further, taking γ = β for symmetry of Bk+1, solve the above equations for α and δ in

terms of β.

(c) Show that the resulting update formula for Bk+1 in terms of the only remaining free

parameter β is equivalent to the Broyden family of update formulae.

(d) Imposing one more condition on the coefficients, special cases can be derived. Verify

that

i. condition α = δ leads to the degenerate rank-one update,

ii. condition β = 0 gives the DFP update formula, and

iii. condition δ = 0 results in the BFGS formula.

(e) Assuming Bk to be positive definite, denote √Bk x = u and √Bk qk = v for an arbitrary vector x and develop a simplified expression for x^T Bk+1 x.

(f) Show that pk^T qk = αk gk^T Bk gk and hence determine a bound on the value of β that will ensure Bk+1 to be positive definite.

(g) Show that, with any β satisfying the above bound for the update of Bk+1, any starting point x0 and any positive definite matrix B0, the quasi-Newton iterations

dk = −Bk gk;   line search for αk;   pk = αk dk;   xk+1 = xk + pk;   qk = gk+1 − gk;

applied on a convex quadratic problem, with constant positive definite Hessian H, lead to the following additional properties:

pi^T H pk = 0,   qk^T Bk qi = 0   and   Bk+1 qi = pi   for 0 ≤ i < k.

3. The BFGS update on the inverse Hessian is not a rank-two update, but the equivalent of a rank-two update on the corresponding Hessian. To derive it, consider a DFP-like update of the Hessian as

Hk+1 = Hk + (qk qk^T)/(qk^T pk) − (Hk pk pk^T Hk)/(pk^T Hk pk)

and apply the following two steps of Sherman-Morrison formula to modify its inverse.

(a) First, put A = Hk, A^(−1) = Bk, u = μ qk, v = μ qk, where μ^2 = 1/(qk^T pk), and develop the intermediate update B′ = (A + uv^T)^(−1), which is the inverse of H′ = Hk + (qk qk^T)/(qk^T pk).

(b) Next, put A = H′, A^(−1) = B′, u = −ν Hk pk, v = ν Hk pk, where ν^2 = 1/(pk^T Hk pk), and work out the final update as Bk+1 = (A + uv^T)^(−1), which is now the inverse of

Hk+1 = H′ − (Hk pk pk^T Hk)/(pk^T Hk pk) = Hk + (qk qk^T)/(qk^T pk) − (Hk pk pk^T Hk)/(pk^T Hk pk).


Department of Mechanical Engineering

Indian Institute of Technology Kanpur

ME 752: Optimization Methods in Engineering Design (2008-2009 II)

Assignment 8: Framework of Constrained Optimization

1. Sketch the feasible region described by the constraints x1 − x3 + 1 = 0 = x1^2 + x2^2 − 2x1.

Identify irregular points of the constraints, if any. Can you reduce an optimization problem

over this domain to a 2-variable problem? To a single-variable problem? How?

2. Reduce the problem

minimize f(x)

subject to Ax = b,

gi(x) ≤ 0 for 1 ≤ i ≤ q

to a k-variable problem (where k = n − p), given that A ∈ R^(p×n) has full row rank.

3. We want to determine the minimum sheet metal needed to construct a right cylindrical can

(including bottom and cover) of capacity at least 1.5 litre with diameter between 5 cm and

12 cm, and height between 10 cm and 18 cm. Taking sensible assumptions, write down the

KKT conditions, identify the salient points of the domain, where constraint boundaries meet,

as KKT candidate points and then test the conditions on those points.

4. For the problem

minimize f(x) = 0.01 x1^2 + x2^2

subject to g1(x) = 25 − x1x2 ≤ 0,   g2(x) = 2 − x1 ≤ 0 ;

obtain the solution using KKT conditions, sketch the domain with the solution and verify

the second order sufficient condition for optimality. Estimate the new optimal value of the

function if (a) the first constraint is changed to g1 = 26 − x1x2 ≤ 0, or (b) the second

constraint is changed to g2 = 3 − x1 ≤ 0.

5. Verify that (1, 0, 3) is a KKT point of the NLP problem

minimize f(x) = −x1^3 + x2^3 − 2x1 x3^2

subject to 2x1 + x2^2 + x3 = 5,

5x1^2 − x2^2 − x3 ≥ 2,

x1, x2, x3 ≥ 0.

Examine the second order conditions. Is this a convex programming problem?


6. Identify the KKT points (i.e. points satisfying KKT conditions) of the problem

minimize f(x) = x1^2 − x2^2

subject to x1^2 + 2x2^2 = 4,

and examine them through the second order conditions for optimality.

7. Locate the KKT point(s) of the NLP problem

minimize f(x) = x1^2 + x2^2 − 2x2 − 1

subject to g1(x) = 4(x1 − 4)^2 + 9(x2 − 3)^2 − 36 ≤ 0,

g2(x) = 9(x1 − 4)^2 + 4(x2 − 3)^2 − 36 ≤ 0

over a sketch of the domain.

8. Write down the formal KKT conditions of the NLP problem

minimize f(x) = x^2 − 8x + 10   subject to x ≥ 6 .

Develop the Lagrangian L(x, µ) of this problem, evaluate its derivatives up to the second

order and construct contours of the Lagrangian function on the x-µ plane.

9. For the problem

minimize f(x) = (x1 − 3)^2 + (x2 − 3)^2   subject to 2x1 + x2 ≤ 2;

develop the dual function, maximize it and find the corresponding point in x-space.

Compare the optimal values of the primal and dual functions.

10. Show that convex functions gi(x) for all i in g(x) ≤ 0 with linear equality constraints in

h(x) = 0 define a convex domain.

11. Suppose that a regular point x∗ of a convex programming problem satisfies the KKT condi-

tions.

(a) If it is not a local minimum, then show that assumption of an arbitrarily close feasible

point y, such that f(y) < f(x∗), leads to a contradiction.

(b) Now that we are forced to admit x∗ as a local minimum point, let us suppose that it is

not a global minimum point. Then, taking a point z somewhere in the domain such that

f(z) < f(x∗), show that you can always find another point y satisfying all the premises

of part (a), and hence leading to contradiction.

(c) Finally, suppose that x∗ is a global minimum point, but it is not unique. Then, con-

sidering another global minimum point w, show that every point in the line segment

joining x∗ and w is also a global minimum point.

(d) Summarize the complete result in the form of a statement on KKT conditions in the

context of a convex programming problem.


Department of Mechanical Engineering

Indian Institute of Technology Kanpur

ME 752: Optimization Methods in Engineering Design (2008-2009 II)

Assignment 9: Linear and Quadratic Problems

1. Formulate the problem

maximize x1^α1 x2^α2 · · · xn^αn

subject to x1^βi1 x2^βi2 · · · xn^βin ≤ bi   for i = 1 to m,

xj ≥ 1   for j = 1 to n

as a linear programming problem and justify your formulation.

2. Using the simplex method, solve the following LP problems.

(a) Minimize x1 subject to

2x1 + x2 ≤ 2, x1 + 5x2 + 10 ≥ 0, x2 ≤ 1.

(b) Minimize 3x1 + x2 subject to

4x1 + x2 ≥ 3, 4x1 + 3x2 ≤ 6, x1 + 2x2 ≤ 3, x1, x2 ≥ 0.
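
After working the tableaux by hand, the answers can be cross-checked with a library LP solver (not the simplex implementation asked for); for instance, problem (b) with SciPy:

    from scipy.optimize import linprog

    # minimize 3x1 + x2  s.t.  4x1 + x2 >= 3,  4x1 + 3x2 <= 6,  x1 + 2x2 <= 3,  x1, x2 >= 0
    c = [3, 1]
    A_ub = [[-4, -1],        # 4x1 + x2 >= 3  rewritten as  -4x1 - x2 <= -3
            [ 4,  3],
            [ 1,  2]]
    b_ub = [-3, 6, 3]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
    print(res.x, res.fun)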

3. Implement the simplex method in a general program, use it on the following LP problems,

and report your results and experience.

(a) Minimize 2x1 + 3x2 subject to

4x1 − 5x2 ≤ 17, −3x1 + 2x2 + 10 ≥ 7, x1, x2 ≤ 0.

(b) Minimize 3x1 + 4x2 subject to

3x1 + 2x2 ≤ 12, x1 + 2x2 ≤ 6, 2x1 − 7x2 ≥ 10, x1, x2 ≥ 0.

4. Maximize f(x) = 2x1 + 9x2 + 3x3 subject to

x1 + x2 + x3 ≤ 1, x1 + 4x2 + 2x3 ≤ 2, x1, x2, x3 ≥ 0;

and find out the Lagrange multipliers corresponding to the constraints at the optimal point.

5. Consider the two-variable optimization problem

minimize f(x, y) = c1x + c2y

subject to a11x + a12y = b1, a21x + a22y ≤ b2, y ≥ 0.


(a) Develop the Lagrangian for the problem and work out the KKT conditions.

(b) Use these conditions to express the optimal function value in terms of the Lagrange

multipliers and determine its sensitivity to b1 and b2.

(c) Develop the dual problem.

(d) Work out the KKT conditions for this dual problem.

6. In three-dimensional space, we have a line segment with known end-points A (a1, a2, a3) and

B (b1, b2, b3). Similarly, we have a triangle with known vertices P (p1, p2, p3), Q (q1, q2, q3) and

R (r1, r2, r3). Formulate the problem of finding the closest distance between the line segment

and the triangle as an optimization problem. Develop the KKT conditions for the problem.

If a given pair of points (on the line segment and on the triangle) together satisfies the KKT

conditions, can we say that this pair gives a local minimum for the distance?

7. Using quadratic programming approach, solve the problem formulated in the previous exer-

cise, for the triangle PQR with P (10, 0, 0), Q(0, 8, 0) and R(0, 0, 6) for the following cases of

line segment AB:

(i) A(1,−1, 1), B(6, 9, 6); (ii) A(1, 3, 8), B(3, 9, 12) and (iii) A(8, 5, 0), B(3, 1, 6).

Attempt both active set and slack variable strategies with several starting solutions.

8. Starting from the origin and using square trust regions by imposing artificial bounds on the

variables (take initial size as 0.4 units), use quadratic programming as the iterative step for

the unconstrained minimization problem of the function 9(x1^2 − x2)^2 + (x1 − 1)^2. Use exact

gradient and Hessian for defining the quadratic model function at every iteration.

9. Consider a QP problem with two variables and a single inequality constraint (x, p ∈ R^2) as

minimize f(x) = (1/2) x^T Q x − b^T x + c   subject to p^T x ≤ d.

(a) Write down the complete KKT conditions. Identify the number of unknowns to be solved

and number and type (linear/nonlinear) of equations and inequalities to be satisfied.

(b) Develop formulas (in terms of Q, b, c, p, d, which are data for the problem) for these

unknowns if the constraint is inactive.

(c) Develop formulas for these unknowns if the constraint is active.

(d) Develop algorithmic steps to take care of both these cases for any given set of data, with

positive definite Q.


Department of Mechanical Engineering

Indian Institute of Technology Kanpur

ME 752: Optimization Methods in Engineering Design (2008-2009 II)

Assignment 10: A Case Study

Consider the NLP problem

minimize f(x) = 2(x1^2 + x2^2 − 1) − x1

subject to g1(x) = 4(x1 − 4)^2 + 9(x2 − 3)^2 − 36 ≤ 0,
           g2(x) = 9(x1 − 4)^2 + 4(x2 − 3)^2 − 36 ≤ 0.        (1)

1. Linearizing the objective function and constraint functions around the selected point (3, 2),

set up the LP problem according to the Frank-Wolfe formulation.

2. Sketch and describe the domain of this LP problem.

3. Find out the solution of this LP problem. Rather than solving the LP problem formally, you

can identify the solution from the sketch and establish it convincingly.

4. Is the resulting point a feasible solution of the original NLP problem (1)? If not, then what

can be done to obtain a feasible solution to proceed to the next iteration?

5. Conduct one more iteration of the above procedure.

6. In a fresh attempt, starting from the original point (3, 2), linearize only the constraint func-

tions, leaving the objective function as it is.

7. Solve the resulting quadratic programming problem.

8. Is the resulting point feasible for the original NLP problem? If not, then what would you do

to obtain a feasible solution to proceed to the next iteration?

9. Your friend Reeta wants to solve the NLP problem (1) approximately, but refuses to learn

any optimization algorithm. However, she is good at geometry and would not mind drawing

a few ellipses and circles. Chalk out a clear and economical action plan for her to capture

the optimal solution roughly through geometric construction.

10. Develop the diagram(s) Reeta would produce following your advice.

11. How many KKT points do you expect for the NLP problem (1): none, unique, multiple or

infinite? Support your answer with clear arguments.

12. Identify one KKT point of the problem. Use (and spell out) discretion in abandoning branches

of fruitless calculation.


Department of Mechanical Engineering

Indian Institute of Technology Kanpur

ME 752: Optimization Methods in Engineering Design (2008-2009 II)

Assignment 11: Methods of Constrained Optimization

1. We want to minimize f(x) = x^2 − 8x + 10 subject to x ≥ 6 by using the penalty function (1/2) max[0, g(x)]^2, where g(x) = 6 − x. Minimize a sequence of penalized functions, with the penalty parameter values c = 0, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000, 10000.
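
A minimal sketch of the penalty loop; the one-dimensional minimization is done here by golden-section search over [0, 10], which brackets the penalized minimizers for this problem (any line-search routine from Assignment 3 would serve equally well).

    import math

    def penalized(x, c):
        g = 6.0 - x                                   # constraint g(x) = 6 - x <= 0
        return x**2 - 8*x + 10 + c * 0.5 * max(0.0, g)**2

    def golden_min(phi, a=0.0, b=10.0, tol=1e-8):
        r = (math.sqrt(5) - 1) / 2
        while b - a > tol:
            x1, x2 = b - r*(b - a), a + r*(b - a)
            if phi(x1) < phi(x2):
                b = x2
            else:
                a = x1
        return 0.5 * (a + b)

    for c in [0, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000, 10000]:
        x_star = golden_min(lambda x: penalized(x, c))
        print(c, round(x_star, 4))                    # the minimizer approaches x = 6 as c grows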

2. Starting from the origin and using square trust regions by imposing artificial bounds on the

variables (take initial size as 0.4 units), use quadratic programming as the iterative step for

the unconstrained minimization problem of the function 9(x1^2 − x2)^2 + (x1 − 1)^2. Use exact

gradient and Hessian for defining the quadratic model function at every iteration.

3. Consider the problem

Minimize f(x) = 2(x1^2 + x2^2 − 1) − x1

subject to x1^2 + x2^2 − 1 = 0 .

(a) Show that x∗ = [1 0]^T is the minimizer and find the associated Lagrange multiplier.

(b) Suppose that xk = [cos θ  sin θ]^T where θ ≈ 0. Verify feasibility and closeness to optimality.

(c) Set up and solve the corresponding quadratic program.

(d) With a full Newton step xk+1 = xk + dk, examine feasibility at xk+1 and compare the

function values at xk and xk+1.

(e) From this exercise, can you draw any significant conclusion about an active set method?

4. Solve the NLP problem

Minimize f(x) = −3x − 4y − 5z

subject to x^2 + y^2 + (z − 1)^2 ≤ 4,
           x^2 + y^2 + (z + 1)^2 ≤ 4,
           (x − 1)^2 + y^2 + z^2 ≤ 4;

by the cutting plane method.

5. A chain is suspended from two thin hooks that are 160 cm apart on a horizontal line. The

chain consists of 20 links of steel, each 10 cm in length. The equilibrium shape of the chain

is found by formulating the problem as

minimize Σ_{i=1}^{n} ci yi   subject to   Σ_{i=1}^{n} yi = 0   and   L − Σ_{i=1}^{n} √(l^2 − yi^2) = 0,


where ci = n − i + 1/2, n = 20, l = 10, L = 160. Derive the dual function for this problem

and work out a complete steepest ascent formulation for maximizing the dual function, and

hence solving the original problem.

Implement this formulation in a steepest ascent loop and obtain optimal values of Lagrange

multipliers, equilibrium configuration and the corresponding (minimum) potential energy, i.e.

(Σ_{i=1}^{n} ci yi).

6. Starting from the origin (which is the unconstrained minimum point of the objective function),

use augmented Lagrangian method to minimize the function 5x1^2 + 4x1x2 + 3x2^2 over the domain defined by constraints 2 sin x1 ≤ x2 ≤ 2 cos x1 and x1 + x2^2 = 15. (Try penalty parameter

values 2, 10 and 20.)

7. Use an alternative method to solve the above NLP problem. Now, rather than using any

formal method of constrained optimization, use variable elimination, study of the domain,

function plots and common sense to crack the problem. In brief, solve the problem the way

you would do if solving this were utterly necessary for your survival and if you had not taken

this optimization course. (You do not need to go all the way plotting contours like Reeta!)

8. Starting from the point (1, 1), perform two iterations of the feasible directions (Zoutendijk)

method to find the point, farthest from the point C(1.5, 4), in the domain defined by

4.5x1 + x2^2 ≤ 18,   2x1 − x2 ≥ 1,   x1, x2 ≥ 0 .

9. Starting from the feasible solution (1,−2), and with initial line-search bound αU = 0.25, solve

the problem

Minimize x1^2 + 2x2^2   subject to   x1^2 + x2^2 ≥ 5

by the generalized reduced gradient method using

(a) slack variable strategy,

(b) active set strategy.

10. Use the active set formulation of the gradient projection method to solve the NLP problem

Minimize (x1 − 1)^2 + 4(x2 − 3)^2

subject to x1^2 + x2^2 ≤ 5,

(x1 − 1)^2 + x2^2 ≥ 1

with initial line-search bound αU = 0.25, and starting point (a) (0, 0) and (b) (1, 1.5).


Department of Mechanical Engineering

Indian Institute of Technology Kanpur

ME 752: Optimization Methods in Engineering Design (2008-2009 II)

Assignment 12: Miscellaneous Topics

1. Minimize f(x, y, z) = 2x^2 + xy + y^2 + yz + z^2 − 6x − 7y − 8z + 9 if z can take only integer values.

Repeat the exercise, considering y and z both taking values from the set {0, 1.2, 2.8, 3.9, 6.2}.

2. For a particle of mass m, define the (Lagrangian) function

L(t, x, y, ẋ, ẏ) = (m/2)(ẋ^2 + ẏ^2) − mgy

and develop the integral (action)

s = ∫ L dt.

The problem is to determine the trajectory x(t), y(t) of the particle from (0, 0) at time t = 0

to (a, b) at time t = T , along which s is minimum, or at least stationary.

(a) Verify that x(t) = αt(t−T ) + at/T, y(t) = βt(t− T ) + bt/T is a feasible trajectory, and

develop the function s(α, β).

(b) Formally find out values of α and β to minimize s, and hence determine the required trajectory x(t), y(t). Find out ẋ, ẏ, ẍ, ÿ. Which law of physics did you derive just now,

in a way?

(c) Now, bypass the work of the two previous steps and work on a more direct theme. Work

out the variation δs as a result of arbitrary variations δx(t), δy(t), that respect the given

boundary conditions, and consistent variations in their rates as well. Insist on δs = 0 to

derive the same result as above. [Hint: To get rid of the δẋ and δẏ terms, integrate the corresponding terms by parts.]

3. We want to solve the Blasius problem in the form

f ′′′(x) + f(x)f ′′(x) = 0, f(0) = f ′(0) = 0, f ′(5) = 1

by Galerkin method. Let us choose x2, x3, · · · , x8 as the basis functions, which already satisfy

the first two conditions. Taking 1, x, x2, · · · , x5 as trial functions and using the boundary

condition at x = 5, determine the solution.

