Optimization: Theory, Algorithms, Applications
MSRI - Berkeley SAC, Nov/06
Henry Wolkowicz
Department of Combinatorics & Optimization
University of Waterloo
Optimization: Theory, Algorithms, Applications – p.1/37
Outline
Why are we here? (What is Optimization?)
History of Optimization
Main Players
Most important Open Problems
Different Areas for connections
Resources/References
What is Optimization?
Two quotes from Tjalling C. Koopmans, Nobel Memorial Lecture [22]:
“best use of scarce resources”
“Mathematical Methods of Organizing and Planning of Production” [18]
————————-
(Kantorovich and Koopmans: joint winners, Nobel Prize in
Economics 1975, “for their contributions to the
theory of optimum allocation of resources”)
History
Virgil’s Aeneid, 19 BCE, Legend of Carthage, Queen Dido’s Problem: the Queen fled to the African coast after her husband was killed; she begged King Iarbas (the local ruler) for land; he granted only as much as she could enclose within a bull’s hide; she sliced the hide into strips and used the strips to surround a large area. The optimal shape was ?
—————-
In 3 dimensions: soap bubbles and films are examples of minimal surface areas.
The Brachistochrone Problem
The cycloid, or curve of fastest descent: a stationary body starts at a first point and passes down along the curve to a second point, under the action of constant gravity, ignoring friction. Bernoulli (1696) / the Calculus of Variations.
Figure 1: Cycloid
History of Math. Progr., 1991 [26]
• remarkably short - rooted in applications
• 1940’s - driven by applications (war time - moving men and machinery)
• Dantzig (Pentagon - Stanford) and Kantorovich (Leningrad)
• Others: Hitchcock, Koopmans, Arrow, Charnes, Gale, Goldman, Hoffman, Kuhn, von Neumann (game theory, duality, computers), etc.
Dantzig / Linear Programming, LP
• Planning problems:
Assign 70 men to 70 jobs; vij = benefit of man i assigned to job j (Linear Assignment Problem, LAP);
but 70! > 10^100 (a googol)
• Dantzig visited von Neumann, Oct 3, 1947 - learned about Farkas’ Lemma, Duality (game theory) - the SIMPLEX METHOD for LP
—————-
• Hotelling: But we all know the world is nonlinear ... von Neumann: ... if linear application ... use it
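Today an LAP of that size is routine; a minimal sketch using SciPy's Hungarian-method implementation (the benefit matrix is random, purely for illustration):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Benefit matrix v[i, j]: value of assigning man i to job j (random, illustrative).
rng = np.random.default_rng(0)
n = 70
v = rng.random((n, n))

# linear_sum_assignment minimizes cost; maximize=True flips it to benefits.
rows, cols = linear_sum_assignment(v, maximize=True)
best_value = v[rows, cols].sum()

# Every man gets exactly one job and vice versa - a feasible assignment.
assert len(set(cols)) == n
print(f"optimal total benefit: {best_value:.3f}")
```

Enumeration over 70! assignments is hopeless, but the Hungarian method finds the optimum in polynomial time.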
Unreasonable Success of Simplex
LP: min cᵀx s.t. Ax = b, x ≥ 0.
• Klee-Minty 1970: exponential time example for the simplex method. But: linear time in practice.
• SIAM 70s computer survey: 70% of (world) computer time spent on LP/simplex
• Is LP in class P (easy) or class NP (hard)?
• Russian mathematician Khachiyan 1978: an LP algorithm based on ellipsoids/duality/inequalities showed LP is in P. (NYT frontpage stories/fables)
• Hungarian method for LAP in O(n³) time [23, 31]; BUT - still no known strongly polynomial method for general LP.
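The standard-form LP above can be solved in a few lines; a minimal sketch with SciPy (the data c, A, b is invented for illustration):

```python
import numpy as np
from scipy.optimize import linprog

# min c^T x  s.t.  A x = b, x >= 0  -- the standard form on this slide.
c = np.array([1.0, 2.0, 0.0])
A = np.array([[1.0, 1.0, 1.0]])   # single equality constraint: x1 + x2 + x3 = 1
b = np.array([1.0])

res = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3, method="highs")
assert res.success
print(res.x, res.fun)   # the optimum puts all weight on the zero-cost variable
```

The "highs" method is a modern simplex/interior-point hybrid solver.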
A First Meeting
Figure 2: George B. Dantzig and Leonid Khachiyan, meeting for the first time, February 1990, Asilomar, California, at the SIAM-organized workshop Progress in Mathematical Programming.
2005: Khachiyan died Apr 29 (age 52); Dantzig died May 13 (age 90)
Lagrange Multiplier Extensions
NLP: MOTIVATED by LP success, e.g. [24]
• [25] 1951: Kuhn-Tucker optimality conditions for nonlinear programming (NLP)
• [20] 1939: Karush, Master’s Thesis, Math., Univ. Chicago (same constraint qualification)
• [17] 1948: Fritz John, Extremum problems with inequalities ...
K-K-T Conditions
NLP: min f(x) s.t. g(x) ≤ 0, h(x) = 0
CQ: Geometry (cone of tangents) coincides with algebra (linearization) (modern opt. cond.)

∇f(x∗) + g′(x∗)ᵀλ∗ + h′(x∗)ᵀµ∗ = 0, λ∗ ≥ 0   (dual feasibility)
h(x∗) = 0, g(x∗) ≤ 0   (primal feasibility)
g(x∗)ᵀλ∗ = 0   (complementary slackness)
Proof: Apply Farkas’ Lemma (1902) to the local linearization. (Modern: use the hyperplane separation theorem / S. Mazur’s geometric Hahn-Banach Theorem.)
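A hypothetical toy problem makes the conditions concrete: minimize x1² + x2² subject to x1 + x2 ≥ 1. The minimizer (1/2, 1/2) with multiplier λ∗ = 1 satisfies all four conditions:

```python
import numpy as np

# Toy NLP (invented for illustration): min x1^2 + x2^2  s.t.  g(x) = 1 - x1 - x2 <= 0
x_star = np.array([0.5, 0.5])
lam = 1.0

grad_f = 2 * x_star
grad_g = np.array([-1.0, -1.0])

# stationarity, primal feasibility, dual feasibility, complementary slackness
assert np.allclose(grad_f + lam * grad_g, 0)     # ∇f(x*) + g'(x*)^T λ* = 0
assert 1 - x_star.sum() <= 1e-12                 # g(x*) <= 0
assert lam >= 0                                  # λ* >= 0
assert abs(lam * (1 - x_star.sum())) <= 1e-12    # g(x*)^T λ* = 0
print("KKT conditions verified at x* =", x_star)
```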
Further Extensions
• Infinite (Cone) Programs, Duffin, 1956 [6].
• optimization with respect to partial orders, [15, 36, 16, 28, 14].
• Optimal Control (Pontryagin Maximum Principle)
• Discrete/Combinatorial Optimization
FIRST CHANCES - QUESTIONS? DISCUSSION?
NEOS/Argonne/Solvers
Figure 3: Optimization Tree, neos.mcs.anl.gov
Quasi-Newton Methods
For Unconstrained Optimization:
• Least Change Secant Methods / Variable Metric Methods: Davidon ’59 [5] / Fletcher-Powell ’63 [10] (DFP), and Broyden [4] / Fletcher [9] / Goldfarb [13] / Shanno [34] ’70 (BFGS).
• rank-two updates of the Hessian; maintains positive definite Hessian approximations
• But: automatic differentiation, Griewank-Corliss ’91, differentiates the code efficiently.
Power of Duality
Find the optimal trajectory/control (for a rocket):

µ0 = min J(u) = (1/2)‖u(t)‖² = (1/2) ∫_{t0}^{t1} u²(t) dt
s.t. ẋ(t) = A(t)x(t) + b(t)u(t)
x(t0) = x0, x(t1) ≥ c.

Using the fundamental solution matrix Φ:

x(t1) = Φ(t1, t0)x(t0) + ∫_{t0}^{t1} Φ(t1, t)b(t)u(t) dt   (the integral operator Ku)
Duality cont...
Convex Pgm:  min J(u) = (1/2)‖u(t)‖²  s.t.  Ku ≥ d

The Lagrangian dual (best lower bound) is

µ0 = max_{λ≥0} min_u { J(u) + λᵀ(d − Ku) }
   = max_{λ≥0} λᵀQλ + λᵀd     (a simple FINITE-dimensional QP)

where Q = −(1/2) ∫_{t0}^{t1} Φ(t1, t)b(t)b(t)ᵀΦ(t1, t)ᵀ dt

and u∗(t) = λ∗ᵀΦ(t1, t)b(t).
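Strong duality here can be checked numerically on a hypothetical scalar instance (the data a, b, x0, c below is invented): the dual QP value matches the cost of the recovered control u∗.

```python
import numpy as np

def integral(f, t):
    # trapezoid rule, kept explicit so primal and dual use identical quadrature
    return float(np.sum(0.5 * (f[:-1] + f[1:]) * np.diff(t)))

# Hypothetical scalar instance of the slide's problem:
#   x'(t) = a x(t) + b u(t),  x(0) = x0,  x(1) >= c,  min (1/2) ∫ u(t)^2 dt
a, b, x0, c = -1.0, 1.0, 0.0, 1.0
t = np.linspace(0.0, 1.0, 2001)
Phi = np.exp(a * (1.0 - t))            # fundamental solution Φ(t1, t)

d = c - np.exp(a * 1.0) * x0           # scalar constraint  Ku >= d
S = integral((Phi * b) ** 2, t)        # ∫ Φ(t1,t)^2 b^2 dt, so Q = -S/2

# Dual QP: max_{λ>=0} -(S/2) λ^2 + d λ  ->  λ* = d/S (here d > 0)
lam = d / S
dual_val = -0.5 * S * lam**2 + d * lam

# Recovered control u*(t) = λ* Φ(t1,t) b(t); its cost equals the dual value
u = lam * Phi * b
primal_val = 0.5 * integral(u**2, t)
assert abs(primal_val - dual_val) < 1e-10
print(primal_val)
```

The infinite-dimensional control problem collapses to a one-variable QP, exactly as the slide promises.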
Convex Analysis
Lies behind results in Optimization
• Classic ’70 text by Rockafellar (UofW, Seattle) [33];
• Nonsmooth Analysis: Clarke, Borwein (Smooth Variational Principle), Mordukhovich, Lewis
• Variational Principles (powerful optimality conditions, extensions to the nonconvex case)
Proving/Generating Theorems using Optimization
Spectral Decomposition Theorem, A = Aᵀ:
• min xᵀAx s.t. xᵀx = 1
Lagrangian: L(x, λ) = xᵀAx + λ(1 − xᵀx)
stationarity: ∇L(x1, λ) = 2Ax1 − 2λx1 = 0
min eigenvalue, since the objective is x1ᵀAx1 = λ x1ᵀx1 = λ → min
Now add the constraint xᵀx1 = 0 to get the second eigen-pair, etc.
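The argument can be checked numerically; a small sketch (random symmetric A, purely illustrative):

```python
import numpy as np

# min x^T A x s.t. x^T x = 1 attains the smallest eigenvalue (slide's argument),
# checked on a random symmetric matrix.
rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2

evals, evecs = np.linalg.eigh(A)           # eigenvalues in ascending order
x1 = evecs[:, 0]                           # minimizer of the Rayleigh quotient
assert np.isclose(x1 @ A @ x1, evals[0])   # objective value = λ_min

# deflation step from the slide: minimize again subject to x^T x1 = 0
x2 = evecs[:, 1]
assert abs(x1 @ x2) < 1e-10                # orthogonality constraint holds
assert np.isclose(x2 @ A @ x2, evals[1])   # second eigen-pair recovered
print("smallest two eigenvalues:", evals[:2])
```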
Proving/Generating Theorems using Optimization cont...
Eigenvalue Bounds, A = Aᵀ:

min λ1(A)
s.t. Σi λi(A) = trace(A)
     Σi λi²(A) = trace(A²)

Lagrangian: L(...); stationarity: ∇L(...) = 0

Explicit solution: m := trace(A)/n;  s² = trace(A²)/n − m²

m − s√(n − 1) ≤ λmin(A) ≤ m − s/√(n − 1)

Similarly, get upper/lower bounds for λ2(A) and other functions of the eigenvalues, e.g. [35].
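A numerical check of the trace bounds (here the two-sided form λmin ≥ m − s√(n−1) and λmax ≤ m + s√(n−1) from [35]; random A is illustrative):

```python
import numpy as np

# Trace bounds on the extreme eigenvalues of a symmetric matrix:
#   m - s*sqrt(n-1) <= λ_min(A)   and   λ_max(A) <= m + s*sqrt(n-1)
rng = np.random.default_rng(2)
n = 6
B = rng.standard_normal((n, n))
A = (B + B.T) / 2

m = np.trace(A) / n                          # mean of the eigenvalues
s = np.sqrt(np.trace(A @ A) / n - m**2)      # their standard deviation
lam = np.linalg.eigvalsh(A)                  # ascending order

assert lam[0] >= m - s * np.sqrt(n - 1) - 1e-10
assert lam[-1] <= m + s * np.sqrt(n - 1) + 1e-10
print(lam[0], m - s * np.sqrt(n - 1))
```

Only two traces of A are needed, no eigen-decomposition, which is the point of such bounds.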
SUMT
• Penalty and Barrier Methods (lost favour)
• Frisch ’55 [11]; Sequential Unconstrained Minimization Techniques, Fiacco-McCormick ’68 [8]
• Penalize (1/µ)‖equality constraints‖², with 1/µ → ∞;
replace inequality constraints by a smooth barrier, −µ Σk log(−gk(x)), with µ ↓ 0
• Solve a sequence of simpler unconstrained problems:

min_x Bµ(x) = f(x) + (1/µ)‖h(x)‖² − µ Σk log(−gk(x))
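A minimal SUMT sketch on a hypothetical one-variable problem (min x s.t. x ≥ 0, optimum x∗ = 0), where the barrier minimizers x(µ) = µ trace the central path:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy problem (invented for illustration): min x  s.t.  x >= 0.
# Barrier subproblem: B_mu(x) = x - mu*log(x); analytic minimizer is x(mu) = mu.
def barrier_min(mu):
    res = minimize_scalar(lambda x: x - mu * np.log(x),
                          bounds=(1e-12, 10.0), method="bounded",
                          options={"xatol": 1e-8})
    return res.x

mus = [1.0, 0.1, 0.01, 0.001]
path = [barrier_min(mu) for mu in mus]
for mu, x in zip(mus, path):
    assert abs(x - mu) < 1e-6      # central path x(mu) = mu
print(path)   # approaches the constrained optimum 0 as mu decreases
```

Each unconstrained subproblem is easy; driving µ ↓ 0 recovers the constrained optimum.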
Methods using Lagrange Multipliers
• Hestenes, Rockafellar, Fletcher, Powell, Conn-Gould-Toint (augmented Lagrangians: a combination of Lagrange and penalty methods), Gill-Murray-Wright (’81 [12], Stanford).
• Sequential Quadratic Programming (SQP): solve for the Newton direction for the optimality conditions using quadratic approximations involving the Lagrangian function.
SECOND CHANCES - QUESTIONS? DISCUSSION?
Interior Point Methods, LP
• 1984: Karmarkar ’84 [19] (Berkeley), an interior point method to improve complexity (polynomial time) results. BUT: high efficiency claimed for practical problems! (NYT frontpage stories/fables)
• Stanford gang of four (Gill-Murray-Wright-Saunders): connection to log-barrier methods (they came back).
Interior Point Revolution
• Kojima-Mizuno-Yoshise ’89 [21]: elegant primal-dual path-following framework. Mehrotra ’92 [30]: predictor-corrector speedup/stability.
• OB1, Lustig-Marsten-Shanno ’92 [29]; legal battle with Bell Labs
• CPLEX tool for LP - large scale - 15 million variable problems solved on a desktop.
• Nesterov-Nemirovski ’89 [32]: extensions to convex problems, e.g. cone optimization problems, e.g. Semidefinite Programming
Semidefinite Programming
• Elegant Theory, Efficient Algorithms, Many Applications
• MAX-CUT: Undirected, weighted graph G = (N, E), weights W = (wij). Cut (divide) the set of nodes N into two sets so that the sum of the weights that are cut is maximized.
p∗ := max (1/4) Σij wij(1 − xi xj)
s.t. xi ∈ {±1}, i = 1, . . . , n
(equivalently, xi² = 1, i = 1, . . . , n)
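On a toy 3-node weighted graph (weights invented for illustration), the ±1 formulation can be brute-forced:

```python
import itertools
import numpy as np

# Toy weighted graph: weights w12 = 1, w13 = 2, w23 = 3.
W = np.array([[0, 1, 2],
              [1, 0, 3],
              [2, 3, 0]], dtype=float)
n = W.shape[0]

def cut_value(x):
    # (1/4) Σ_ij w_ij (1 - x_i x_j): each cut edge's weight is counted once.
    return 0.25 * sum(W[i, j] * (1 - x[i] * x[j])
                      for i in range(n) for j in range(n))

best = max(itertools.product([-1, 1], repeat=n), key=cut_value)
print(best, cut_value(best))   # best cut separates node 3 from nodes 1,2
```

Brute force works for n = 3; for large n the SDP relaxation on the next slide takes over.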
SDP via Dual of the Dual
Lagrangian dual:

d∗ := min_λ max_x xᵀQx + Σi λi(1 − xi²)
    = min_{Diag(λ) ⪰ Q} eᵀλ

dual of MC:                 dual of the dual of MC:
min eᵀλ                     max trace(QX)
s.t. Diag(λ) − Z = Q        s.t. diag(X) = e
     Z ⪰ 0                       X ⪰ 0

0.878 performance guarantee, Goemans and Williamson (IBM Bay Area)
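Any dual-feasible λ certifies an upper bound on p∗; a sketch using λ = λmax(Q)·e (so Diag(λ) − Q ⪰ 0), on the same toy weights invented above:

```python
import itertools
import numpy as np

# Toy weights; x^T Q x with Q = (1/4)(Diag(We) - W) equals the max-cut objective.
W = np.array([[0, 1, 2],
              [1, 0, 3],
              [2, 3, 0]], dtype=float)
n = W.shape[0]
Q = 0.25 * (np.diag(W.sum(axis=1)) - W)      # quarter of the graph Laplacian

# Brute-force p* over x in {±1}^n (feasible for n = 3).
p_star = max(x @ Q @ x
             for x in (np.array(s) for s in itertools.product([-1, 1], repeat=n)))

# λ = λ_max(Q) e is dual feasible, so e^T λ = n λ_max(Q) >= d* >= p*.
bound = n * np.linalg.eigvalsh(Q)[-1]
assert p_star <= bound + 1e-10
print(p_star, bound)
```

Solving the SDP exactly would tighten this eigenvalue bound all the way down to d∗.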
SDP with CONDOR Solves QAP
• QAP: Quadratic Assignment Problem: size n > 20 considered hard (compared to fast solutions for n = 10^6 for LAP)
• important applications, e.g. to VLSI design, massive parallelism (Blue Gene/IBM)
• SDP bound in a branch and bound framework, using CONDOR (High Throughput Computing, using free cycles worldwide) www.cs.wisc.edu/condor.
• Solves the Nugent problem n = 30 (and others) for the first time [1].
SDP and Robust!! Optimization
Robust optimization: problem data known only within certain bounds.

conic: max bᵀy s.t. ∀A ∈ U, c − Aᵀy ∈ K

goal: find a feasible solution acceptably close to optimal for data within the bounds.
Applications, e.g.: control theory; engineering design and finance; aircraft path planning; machine learning (robust classification, support vector machines, and kernel optimization); e.g. Ben-Tal [3]; El Ghaoui (Berkeley) [7].
SDP and Hilbert’s 17th Problem, SOS
Hilbert, 1900: Given a multivariate polynomial that takes only non-negative values over the reals, can it be represented as a sum of squares of rational functions?

Artin, 1927: YES. (Gondard & Ribenboim, extension to symmetric matrices, 1974.)
But: SOS polys ⊊ nonneg polys
(ml(z) = vector of monomials)
p(z) is an SOS of polynomials iff p(z) ≡ ml(z)ᵀ W ml(z), W ⪰ 0
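A small concrete instance: p(z) = z⁴ + 2z² + 1 = (z² + 1)² has the PSD Gram matrix below (chosen by hand for this example):

```python
import numpy as np

# Check p(z) = z^4 + 2z^2 + 1 is SOS: p(z) = m(z)^T W m(z) with
# monomial vector m(z) = [1, z, z^2] and a PSD Gram matrix W.
W = np.array([[1.0, 0.0, 1.0],
              [0.0, 0.0, 0.0],
              [1.0, 0.0, 1.0]])
assert np.all(np.linalg.eigvalsh(W) >= -1e-12)        # W ⪰ 0

for z in np.linspace(-2, 2, 41):
    m = np.array([1.0, z, z**2])
    assert np.isclose(m @ W @ m, z**4 + 2 * z**2 + 1)  # identity in z
print("p(z) = m(z)^T W m(z) with W ⪰ 0 verified")
```

Finding such a W in general is exactly a semidefinite feasibility problem, which is the SOS/SDP link.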
Lax ’58 Conjecture is True
Lewis, Parrilo, Ramana ’03 [27]: Hyperbolic polynomials in three variables are determinants of three symmetric matrices.
and this is equivalent to Helton-Vinnikovobservation:
A polynomial q on ℜ² is a real zero polynomial of degree d satisfying q(0, 0) = 1 if and only if there exist matrices B, C ∈ S^d such that q(y, z) = det(I + yB + zC).
SDP Open Problems; Exciting Area
• Which problems can be formulated as SDPs? (Algebraic connections; BIRS/MSRI workshop on Positive Polynomials and Optimization, e.g. Parrilo, ex-postdoc Berkeley)
• efficient/stable solutions, large scale problems
• extensions of the SDP methodology: e.g. symmetric cones and the relation to Jordan algebras; bilinear matrix inequalities.
• SDP at UC Berkeley: leader is Laurent El Ghaoui; recent grad. Jiawang Nie (working on sensor localization) with co-advisors Jim Demmel and Bernd Sturmfels.
Outstanding Problems/Questions
• Kepler (1611) Conjecture: close packing (cubic or hexagonal close packing, with maximum density π/(3√2) ≈ 0.74048 = 74.048%) is the densest possible sphere packing. Find the densest (not necessarily periodic) packing of spheres - the Kepler problem.
Effective Certificates of Optimality?
• Hales’ (1997) detailed plan; extensive use of computer calculations. Hales’ full proof appears in a series of papers totaling more than 250 pages (Cipra 1998). The proof relies extensively on: global optimization; linear programming; interval arithmetic. The computer files require more than 3 gigabytes of storage, e.g. [2].
Hard Nonconvex Problems
• e.g. protein folding - how do natural phenomena optimize?
• Hubble telescope SDP - projection algorithm.
• strongly polynomial LP algorithms; Hirsch conjecture (the combinatorial diameter of a d-polytope with n facets is bounded by n − d)
• weather prediction uses a least squares min/opt problem for initial conditions
• Massive Parallel Computing: optimize the network - minimize heat - VLSI design; metrics for performance?
Discrete Optimization
• Important applications: e.g. ministry of health - all discrete opt problems
• using continuous optimization relaxations within branch and bound methods
• Gomory cutting planes came back and lie behind the current success in solving large scale discrete problems (e.g. in CPLEX).
• The QAP is NP-hard, but still needs to be solved, i.e. worst case complexity and expected performance can differ drastically.
Connections with Optimization
• Discrete and Continuous Optimization
• Optimal Control Theory (space program, environment)
• Medicine (molecular conformation, scheduling)
• Politics (game theory)
• Computer Science (massive parallelism, VLSI design, computer design, quantum computing)
• Management Science and Engineering in general
• Economics (government planning)
• Statistics, e.g. machine learning
THIRD CHANCES - QUESTIONS? DISCUSSION?
Resources/References
• Optimization Frequently Asked Questions:
Linear Programming FAQ
Nonlinear Programming FAQ
www-unix.mcs.anl.gov/otc/Guide/faq/
• NEOS: neos.mcs.anl.gov/neos/index.html
• e-optimization community: www.e-optimization.com/
• Optimization Online: www.optimization-online.org/
References
[1] K.M. Anstreicher, N.W. Brixius, J.-P. Goux, and J. Linderoth. Solving large quadratic assignment problems on computational grids. Math. Program., 91(3, Ser. A):563–588, 2002.
[2] D.H. Bailey and J.M. Borwein. Experimental mathematics: recent developments and future outlook. In Mathematics unlimited—2001 and beyond, pages 51–66. Springer, Berlin, 2001. URL: users.cs.dal.ca/˜jborwein/math-future.pdf.
[3] A. Ben-Tal and A.S. Nemirovski. Robust convex optimization. Math. Oper. Res., 23(4):769–805, 1998.
[4] C.G. Broyden. The convergence of a class of double-rank minimization algorithms, Part I. IMA J. Appl. Math., 6:76–90, 1970.
[5] W.C. Davidon. Variable metric methods for minimization. Technical Report ANL-5990, Argonne National Labs, Argonne, IL, 1959.
[6] R.J. Duffin. Infinite programs. In A.W. Tucker, editor, Linear Equalities and Related Systems, pages 157–170. Princeton University Press, Princeton, NJ, 1956.
[7] L. El Ghaoui and G. Calafiore. Worst-case simulation of uncertain systems. In A. Garulli, A. Tesi, and A. Vicino, editors, Robustness in Identification and Control, Lecture Notes in Control and Information Sciences. Springer, 1999.
[8] A.V. Fiacco and G.P. McCormick. Nonlinear Programming: Sequential Unconstrained Minimization Techniques. Classics in Applied Mathematics. SIAM, Philadelphia, PA, USA, 1990.
[9] R. Fletcher. A new approach to variable metric algorithms. Comput. J., 13:317–322, 1970.
[10] R. Fletcher and M.J.D. Powell. A rapidly convergent descent method for minimization. Comput. J., 6:163–168, 1963.
[11] K.R. Frisch. The logarithmic potential method of convex programming. Technical report, Institute of Economics, Oslo University, Oslo, Norway, 1955.
[12] P.E. Gill, W. Murray, and M.H. Wright. Practical Optimization. Academic Press, New York, 1981.
[13] D. Goldfarb. A family of variable-metric methods derived by variational means. Math. Comp., 24:23–26, 1970.
[14] R.B. Holmes. Geometric Functional Analysis and its Applications. Springer-Verlag, Berlin, 1975.
[15] J. Jahn. Mathematical Vector Optimization in Partially Ordered Linear Spaces. Peter Lang, Frankfurt am Main, 1986.
[16] G. Jameson. Ordered Linear Spaces. Springer-Verlag, New York, 1970.
[17] F. John. Extremum problems with inequalities as subsidiary conditions. In Studies and Essays, Courant Anniversary Volume, pages 187–204. Interscience, New York, 1948.
[18] L.V. Kantorovich. Mathematical methods of organizing and planning production. Management Sci., 6:366–422, 1959/1960.
[19] N.K. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4:373–395, 1984.
[20] W. Karush. Minima of functions of several variables with inequalities as side constraints. Master’s thesis, University of Chicago, Illinois, 1939.
[21] M. Kojima, S. Mizuno, and A. Yoshise. A primal–dual interior point algorithm for linear programming. In N. Megiddo, editor, Progress in Mathematical Programming: Interior Point and Related Methods, pages 29–47. Springer-Verlag, New York, 1989.
[22] T.C. Koopmans. Concepts of optimality and their uses. Nobel Memorial Lecture, Yale University, 1975.
[23] H.W. Kuhn. The Hungarian method for the assignment problem. Naval Res. Logist. Quart., 2:83–97, 1955.
[24] H.W. Kuhn. Nonlinear programming: a historical view. In R.W. Cottle and C.E. Lemke, editors, Nonlinear Programming, pages 1–26, Providence, RI, 1976. AMS.
[25] H.W. Kuhn and A.W. Tucker. Nonlinear programming. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 1950, pages 481–492, Berkeley and Los Angeles, 1951. University of California Press.
[26] J.K. Lenstra, A.H.G. Rinnooy Kan, and A. Schrijver. History of Mathematical Programming: A Collection of Personal Reminiscences. CWI North-Holland, Amsterdam, 1991.
[27] A.S. Lewis, P.A. Parrilo, and M.V. Ramana. The Lax conjecture is true. Proc. Amer. Math. Soc., 133(9):2495–2499, 2005.
[28] D.G. Luenberger. Optimization by Vector Space Methods. John Wiley, 1969.
[29] I.J. Lustig, R.E. Marsten, and D.F. Shanno. On implementing Mehrotra’s predictor–corrector interior point method for linear programming. SIAM J. Optim., 2(3):435–449, 1992.
[30] S. Mehrotra. On the implementation of a primal-dual interior point method. SIAM J. Optim., 2(4):575–601, 1992.
[31] J. Munkres. Algorithms for the assignment and transportation problems. J. Soc. Indust. Appl. Math., 5:32–38, 1957.
[32] Y.E. Nesterov and A.S. Nemirovski. Interior Point Polynomial Algorithms in Convex Programming. SIAM, Philadelphia, PA, USA, 1994.
[33] R.T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, NJ, 1970.
[34] D.F. Shanno. Conditioning of quasi-Newton methods for function minimization. Math. Comp., 24:647–657, 1970.
[35] H. Wolkowicz and G.P.H. Styan. More bounds for eigenvalues using traces. Linear Algebra Appl., 31:1–17, 1980.
[36] Yau Chuen Wong and Kung Fu Ng. Partially Ordered Topological Vector Spaces. Clarendon Press, Oxford, 1973. Oxford Mathematical Monographs.