Optimization: Theory, Algorithms, Applications
MSRI - Berkeley SAC, Nov/06
Henry Wolkowicz
Department of Combinatorics & Optimization
University of Waterloo
Optimization: Theory, Algorithms, Applications – p.1/37
Outline
Why are we here? (What is Optimization?)
History of Optimization
Main Players
Most important Open Problems
Different Areas for connections
Resources/References
What is Optimization?
Two quotes from Tjalling C. Koopmans, Nobel Memorial Lecture [22]:
“best use of scarce resources”
“Mathematical Methods of Organizing and Planning of Production” [18]
————————-
(Kantorovich and Koopmans: joint winners, Nobel Prize in
Economics 1975, “for their contributions to the
theory of optimum allocation of resources”)
History
Virgil’s Aeneid, 19 BCE, Legend of Carthage, Queen Dido’s Problem: the Queen fled to the African coast after her husband was killed; she begged King Iarbas (the local ruler) for land; he granted only as much as she could enclose within a bull’s hide; she sliced the hide into strips and used the strips to surround a large area. The optimal shape was ?
—————-
In 3 dimensions: soap bubbles and films are examples of minimal surface areas.
The Brachistochrone Problem
The cycloid, or curve of fastest descent: a stationary body starts at a first point and passes down along the curve to a second point, under the action of constant gravity, ignoring friction. Bernoulli (1696) / the Calculus of Variations.
Figure 1: Cycloid
History of Math. Progr., 1991 [26]
• remarkably short - rooted in applications
• 1940’s - driven by applications (war time - moving men and machinery)
• Dantzig (Pentagon - Stanford) and Kantorovich (Leningrad)
• Others: Hitchcock, Koopmans, Arrow, Charnes, Gale, Goldman, Hoffman, Kuhn, von Neumann (game theory, duality, computers), etc.
Dantzig / Linear Programming, LP
• Planning problems:
Assign 70 men to 70 jobs; vij = benefit of man i assigned to job j (Linear Assignment Problem, LAP);
but 70! > 10^100 (a googol)
• Dantzig visited von Neumann, Oct 3, 1947 - learned about Farkas’ Lemma, Duality (game theory) - the SIMPLEX METHOD for LP
—————-
• Hotelling: But we all know the world is nonlinear ... von Neumann: ... if linear application ... use it
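Today an LAP of that size is routine; a minimal sketch using SciPy's Hungarian-method implementation (the benefit matrix is random, purely for illustration):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Benefit matrix v[i, j]: value of assigning man i to job j (random, illustrative).
rng = np.random.default_rng(0)
n = 70
v = rng.random((n, n))

# linear_sum_assignment minimizes cost; maximize=True flips it to benefits.
rows, cols = linear_sum_assignment(v, maximize=True)
best_value = v[rows, cols].sum()

# Every man gets exactly one job and vice versa - a feasible assignment.
assert len(set(cols)) == n
print(f"optimal total benefit: {best_value:.3f}")
```

Enumeration over 70! assignments is hopeless, but the Hungarian method finds the optimum in polynomial time.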
Unreasonable Success of Simplex
LP: min cᵀx s.t. Ax = b, x ≥ 0.
• Klee-Minty 1970: exponential time example for the simplex method. But: linear time in practice.
• SIAM 70s computer survey: 70% of (world) computer time spent on LP/simplex
• Is LP in class P (easy) or class NP (hard)?
• Russian mathematician Khachiyan 1978: an LP algorithm based on ellipsoids/duality/inequalities showed LP is in P. (NYT frontpage stories/fables)
• Hungarian method for LAP in O(n³) time [23, 31]; BUT - still no known strongly polynomial method for general LP.
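The standard-form LP above can be solved in a few lines; a minimal sketch with SciPy (the data c, A, b is invented for illustration):

```python
import numpy as np
from scipy.optimize import linprog

# min c^T x  s.t.  A x = b, x >= 0  -- the standard form on this slide.
c = np.array([1.0, 2.0, 0.0])
A = np.array([[1.0, 1.0, 1.0]])   # single equality constraint: x1 + x2 + x3 = 1
b = np.array([1.0])

res = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3, method="highs")
assert res.success
print(res.x, res.fun)   # the optimum puts all weight on the zero-cost variable
```

The "highs" method is a modern simplex/interior-point hybrid solver.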
A First Meeting
Figure 2: George B. Dantzig and Leonid Khachiyan, meeting for the first time, February 1990, Asilomar, California, at the SIAM-organized workshop Progress in Mathematical Programming.
2005: Khachiyan died Apr 29 (age 52); Dantzig died May 13 (age 90)
Lagrange Multiplier Extensions
NLP: MOTIVATED by LP success, e.g. [24]
• [25] 1951: Kuhn-Tucker optimality conditions for nonlinear programming (NLP)
• [20] 1939: Karush, Master’s Thesis, Math., Univ. Chicago (same constraint qualification)
• [17] 1948: Fritz John, Extremum problems with inequalities ...
K-K-T Conditions
NLP: min f(x) s.t. g(x) ≤ 0, h(x) = 0
CQ: Geometry (cone of tangents) coincides with algebra (linearization) (modern opt. cond.)

∇f(x∗) + g′(x∗)ᵀλ∗ + h′(x∗)ᵀµ∗ = 0, λ∗ ≥ 0   (dual feasibility)
h(x∗) = 0, g(x∗) ≤ 0   (primal feasibility)
g(x∗)ᵀλ∗ = 0   (complementary slackness)
Proof: Apply Farkas’ Lemma (1902) to the local linearization. (Modern: use the hyperplane separation theorem / S. Mazur’s geometric Hahn-Banach Theorem.)
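A hypothetical toy problem makes the conditions concrete: minimize x1² + x2² subject to x1 + x2 ≥ 1. The minimizer (1/2, 1/2) with multiplier λ∗ = 1 satisfies all four conditions:

```python
import numpy as np

# Toy NLP (invented for illustration): min x1^2 + x2^2  s.t.  g(x) = 1 - x1 - x2 <= 0
x_star = np.array([0.5, 0.5])
lam = 1.0

grad_f = 2 * x_star
grad_g = np.array([-1.0, -1.0])

# stationarity, primal feasibility, dual feasibility, complementary slackness
assert np.allclose(grad_f + lam * grad_g, 0)     # ∇f(x*) + g'(x*)^T λ* = 0
assert 1 - x_star.sum() <= 1e-12                 # g(x*) <= 0
assert lam >= 0                                  # λ* >= 0
assert abs(lam * (1 - x_star.sum())) <= 1e-12    # g(x*)^T λ* = 0
print("KKT conditions verified at x* =", x_star)
```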
Further Extensions
• Infinite (Cone) Programs, Duffin, 1956 [6].
• optimization with respect to partial orders, [15, 36, 16, 28, 14].
• Optimal Control (Pontryagin Maximum Principle)
• Discrete/Combinatorial Optimization
FIRST CHANCES - QUESTIONS? DISCUSSION?
NEOS/Argonne/Solvers
Figure 3: Optimization Tree, neos.mcs.anl.gov
Quasi-Newton Methods
For Unconstrained Optimization:
• Least Change Secant Methods / Variable Metric Methods: Davidon ’59 [5] / Fletcher-Powell ’63 [10] (DFP), and Broyden [4] / Fletcher [9] / Goldfarb [13] / Shanno [34] ’70 (BFGS).
• rank-two updates of the Hessian; maintains positive definite Hessian approximations
• But: automatic differentiation, Griewank-Corliss ’91, differentiates the code efficiently.
Power of Duality
Find the optimal trajectory/control (for a rocket):

µ0 = min J(u) = (1/2)‖u(t)‖² = (1/2) ∫_{t0}^{t1} u²(t) dt
s.t. ẋ(t) = A(t)x(t) + b(t)u(t)
x(t0) = x0, x(t1) ≥ c.

Using the fundamental solution matrix Φ:

x(t1) = Φ(t1, t0)x(t0) + ∫_{t0}^{t1} Φ(t1, t)b(t)u(t) dt   (the integral operator Ku)
Duality cont...
Convex Pgm:  min J(u) = (1/2)‖u(t)‖²  s.t.  Ku ≥ d

The Lagrangian dual (best lower bound) is

µ0 = max_{λ≥0} min_u { J(u) + λᵀ(d − Ku) }
   = max_{λ≥0} λᵀQλ + λᵀd     (a simple FINITE-dimensional QP)

where Q = −(1/2) ∫_{t0}^{t1} Φ(t1, t)b(t)b(t)ᵀΦ(t1, t)ᵀ dt

and u∗(t) = λ∗ᵀΦ(t1, t)b(t).
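Strong duality here can be checked numerically on a hypothetical scalar instance (the data a, b, x0, c below is invented): the dual QP value matches the cost of the recovered control u∗.

```python
import numpy as np

def integral(f, t):
    # trapezoid rule, kept explicit so primal and dual use identical quadrature
    return float(np.sum(0.5 * (f[:-1] + f[1:]) * np.diff(t)))

# Hypothetical scalar instance of the slide's problem:
#   x'(t) = a x(t) + b u(t),  x(0) = x0,  x(1) >= c,  min (1/2) ∫ u(t)^2 dt
a, b, x0, c = -1.0, 1.0, 0.0, 1.0
t = np.linspace(0.0, 1.0, 2001)
Phi = np.exp(a * (1.0 - t))            # fundamental solution Φ(t1, t)

d = c - np.exp(a * 1.0) * x0           # scalar constraint  Ku >= d
S = integral((Phi * b) ** 2, t)        # ∫ Φ(t1,t)^2 b^2 dt, so Q = -S/2

# Dual QP: max_{λ>=0} -(S/2) λ^2 + d λ  ->  λ* = d/S (here d > 0)
lam = d / S
dual_val = -0.5 * S * lam**2 + d * lam

# Recovered control u*(t) = λ* Φ(t1,t) b(t); its cost equals the dual value
u = lam * Phi * b
primal_val = 0.5 * integral(u**2, t)
assert abs(primal_val - dual_val) < 1e-10
print(primal_val)
```

The infinite-dimensional control problem collapses to a one-variable QP, exactly as the slide promises.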
Convex Analysis
Lies behind results in Optimization
• Classic ’70 text by Rockafellar (UofW, Seattle) [33];
• Nonsmooth Analysis: Clarke, Borwein (Smooth Variational Principle), Mordukhovich, Lewis
• Variational Principles (powerful optimality conditions, extensions to the nonconvex case)
Proving/Generating Theorems using Optimization
Spectral Decomposition Theorem, A = Aᵀ:
• min xᵀAx s.t. xᵀx = 1
Lagrangian: L(x, λ) = xᵀAx + λ(1 − xᵀx)
stationarity: ∇L(x1, λ) = 2Ax1 − 2λx1 = 0
min eigenvalue, since the objective is x1ᵀAx1 = λ x1ᵀx1 = λ → min
Now add the constraint xᵀx1 = 0 to get the second eigen-pair, etc.
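The argument can be checked numerically; a small sketch (random symmetric A, purely illustrative):

```python
import numpy as np

# min x^T A x s.t. x^T x = 1 attains the smallest eigenvalue (slide's argument),
# checked on a random symmetric matrix.
rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2

evals, evecs = np.linalg.eigh(A)           # eigenvalues in ascending order
x1 = evecs[:, 0]                           # minimizer of the Rayleigh quotient
assert np.isclose(x1 @ A @ x1, evals[0])   # objective value = λ_min

# deflation step from the slide: minimize again subject to x^T x1 = 0
x2 = evecs[:, 1]
assert abs(x1 @ x2) < 1e-10                # orthogonality constraint holds
assert np.isclose(x2 @ A @ x2, evals[1])   # second eigen-pair recovered
print("smallest two eigenvalues:", evals[:2])
```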
Proving/Generating Theorems using Optimization cont...
Eigenvalue Bounds, A = Aᵀ:

min λ1(A)
s.t. Σi λi(A) = trace(A)
     Σi λi²(A) = trace(A²)

Lagrangian: L(...); stationarity: ∇L(...) = 0

Explicit solution: m := trace(A)/n;  s² = trace(A²)/n − m²

m − s√(n − 1) ≤ λmin(A) ≤ m − s/√(n − 1)

Similarly, get upper/lower bounds for λ2(A) and other functions of the eigenvalues, e.g. [35].
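A numerical check of the trace bounds (here the two-sided form λmin ≥ m − s√(n−1) and λmax ≤ m + s√(n−1) from [35]; random A is illustrative):

```python
import numpy as np

# Trace bounds on the extreme eigenvalues of a symmetric matrix:
#   m - s*sqrt(n-1) <= λ_min(A)   and   λ_max(A) <= m + s*sqrt(n-1)
rng = np.random.default_rng(2)
n = 6
B = rng.standard_normal((n, n))
A = (B + B.T) / 2

m = np.trace(A) / n                          # mean of the eigenvalues
s = np.sqrt(np.trace(A @ A) / n - m**2)      # their standard deviation
lam = np.linalg.eigvalsh(A)                  # ascending order

assert lam[0] >= m - s * np.sqrt(n - 1) - 1e-10
assert lam[-1] <= m + s * np.sqrt(n - 1) + 1e-10
print(lam[0], m - s * np.sqrt(n - 1))
```

Only two traces of A are needed, no eigen-decomposition, which is the point of such bounds.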
SUMT
• Penalty and Barrier Methods (lost favour)
• Frisch ’55 [11]; Sequential Unconstrained Minimization Techniques, Fiacco-McCormick ’68 [8]
• Penalize (1/µ)‖equality constraints‖², with 1/µ → ∞;
replace inequality constraints by a smooth barrier, −µ Σk log(−gk(x)), with µ ↓ 0
• Solve a sequence of simpler unconstrained problems:

min_x Bµ(x) = f(x) + (1/µ)‖h(x)‖² − µ Σk log(−gk(x))
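A minimal SUMT sketch on a hypothetical one-variable problem (min x s.t. x ≥ 0, optimum x∗ = 0), where the barrier minimizers x(µ) = µ trace the central path:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy problem (invented for illustration): min x  s.t.  x >= 0.
# Barrier subproblem: B_mu(x) = x - mu*log(x); analytic minimizer is x(mu) = mu.
def barrier_min(mu):
    res = minimize_scalar(lambda x: x - mu * np.log(x),
                          bounds=(1e-12, 10.0), method="bounded",
                          options={"xatol": 1e-8})
    return res.x

mus = [1.0, 0.1, 0.01, 0.001]
path = [barrier_min(mu) for mu in mus]
for mu, x in zip(mus, path):
    assert abs(x - mu) < 1e-6      # central path x(mu) = mu
print(path)   # approaches the constrained optimum 0 as mu decreases
```

Each unconstrained subproblem is easy; driving µ ↓ 0 recovers the constrained optimum.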
Methods using Lagrange Multipliers
• Hestenes, Rockafellar, Fletcher, Powell, Conn-Gould-Toint (augmented Lagrangians: a combination of Lagrange and penalty methods), Gill-Murray-Wright (’81 [12], Stanford).
• Sequential Quadratic Programming (SQP): solve for the Newton direction for the optimality conditions using quadratic approximations involving the Lagrangian function.
SECOND CHANCES - QUESTIONS? DISCUSSION?
Interior Point Methods, LP
• 1984: Karmarkar ’84 [19] (Berkeley), an interior point method to improve complexity (polynomial time) results. BUT: high efficiency claimed for practical problems! (NYT frontpage stories/fables)
• Stanford gang of four (Gill-Murray-Wright-Saunders): connection to log-barrier methods (they came back).
Interior Point Revolution
• Kojima-Mizuno-Yoshise ’89 [21]: elegant primal-dual path-following framework. Mehrotra ’92 [30]: predictor-corrector speedup/stability.
• OB1, Lustig-Marsten-Shanno ’92 [29]; legal battle with Bell Labs
• CPLEX tool for LP - large scale - 15 million variable problems solved on a desktop.
• Nesterov-Nemirovski ’89 [32]: extensions to convex problems, e.g. cone optimization problems, e.g. Semidefinite Programming
Semidefinite Programming
• Elegant Theory, Efficient Algorithms, Many Applications
• MAX-CUT: Undirected, weighted graph G = (N, E), weights W = (wij). Cut (divide) the set of nodes N into two sets so that the sum of the weights that are cut is maximized.
p∗ := max (1/4) Σij wij(1 − xi xj)
s.t. xi ∈ {±1}, i = 1, . . . , n
(equivalently, xi² = 1, i = 1, . . . , n)
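On a toy 3-node weighted graph (weights invented for illustration), the ±1 formulation can be brute-forced:

```python
import itertools
import numpy as np

# Toy weighted graph: weights w12 = 1, w13 = 2, w23 = 3.
W = np.array([[0, 1, 2],
              [1, 0, 3],
              [2, 3, 0]], dtype=float)
n = W.shape[0]

def cut_value(x):
    # (1/4) Σ_ij w_ij (1 - x_i x_j): each cut edge's weight is counted once.
    return 0.25 * sum(W[i, j] * (1 - x[i] * x[j])
                      for i in range(n) for j in range(n))

best = max(itertools.product([-1, 1], repeat=n), key=cut_value)
print(best, cut_value(best))   # best cut separates node 3 from nodes 1,2
```

Brute force works for n = 3; for large n the SDP relaxation on the next slide takes over.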
SDP via Dual of the Dual
Lagrangian dual:

d∗ := min_λ max_x xᵀQx + Σi λi(1 − xi²)
    = min_{Diag(λ) ⪰ Q} eᵀλ

dual of MC:                 dual of the dual of MC:
min eᵀλ                     max trace(QX)
s.t. Diag(λ) − Z = Q        s.t. diag(X) = e
     Z ⪰ 0                       X ⪰ 0

0.878 performance guarantee, Goemans and Williamson (IBM Bay Area)
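Any dual-feasible λ certifies an upper bound on p∗; a sketch using λ = λmax(Q)·e (so Diag(λ) − Q ⪰ 0), on the same toy weights invented above:

```python
import itertools
import numpy as np

# Toy weights; x^T Q x with Q = (1/4)(Diag(We) - W) equals the max-cut objective.
W = np.array([[0, 1, 2],
              [1, 0, 3],
              [2, 3, 0]], dtype=float)
n = W.shape[0]
Q = 0.25 * (np.diag(W.sum(axis=1)) - W)      # quarter of the graph Laplacian

# Brute-force p* over x in {±1}^n (feasible for n = 3).
p_star = max(x @ Q @ x
             for x in (np.array(s) for s in itertools.product([-1, 1], repeat=n)))

# λ = λ_max(Q) e is dual feasible, so e^T λ = n λ_max(Q) >= d* >= p*.
bound = n * np.linalg.eigvalsh(Q)[-1]
assert p_star <= bound + 1e-10
print(p_star, bound)
```

Solving the SDP exactly would tighten this eigenvalue bound all the way down to d∗.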
SDP with CONDOR Solves QAP
• QAP: Quadratic Assignment Problem: size n > 20 considered hard (compared to fast solutions for n = 10^6 for LAP)
• important applications, e.g. to VLSI design, massive parallelism (Blue Gene/IBM)
• SDP bound in a branch and bound framework, using CONDOR (High Throughput Computing, using free cycles worldwide) www.cs.wisc.edu/condor.
• Solves the Nugent problem n = 30 (and others) for the first time [1].
SDP and Robust!! Optimization
Robust optimization: problem data known only within certain bounds.

conic: max bᵀy s.t. ∀A ∈ U, c − Aᵀy ∈ K

goal: find a feasible solution acceptably close to optimal for data within the bounds.
Applications, e.g.: control theory; engineering design and finance; aircraft path planning; machine learning (robust classification, support vector machines, and kernel optimization); e.g. Ben-Tal [3]; El Ghaoui (Berkeley) [7].
SDP and Hilbert’s 17th Problem, SOS
Hilbert, 1900: Given a multivariate polynomial that takes only non-negative values over the reals, can it be represented as a sum of squares of rational functions?

Artin, 1927: YES. (Gondard & Ribenboim, extension to symmetric matrices, 1974.)
But: SOS polys ⊊ nonneg polys
(ml(z) = vector of monomials)
p(z) is an SOS of polynomials iff p(z) ≡ ml(z)ᵀ W ml(z), W ⪰ 0
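A small concrete instance: p(z) = z⁴ + 2z² + 1 = (z² + 1)² has the PSD Gram matrix below (chosen by hand for this example):

```python
import numpy as np

# Check p(z) = z^4 + 2z^2 + 1 is SOS: p(z) = m(z)^T W m(z) with
# monomial vector m(z) = [1, z, z^2] and a PSD Gram matrix W.
W = np.array([[1.0, 0.0, 1.0],
              [0.0, 0.0, 0.0],
              [1.0, 0.0, 1.0]])
assert np.all(np.linalg.eigvalsh(W) >= -1e-12)        # W ⪰ 0

for z in np.linspace(-2, 2, 41):
    m = np.array([1.0, z, z**2])
    assert np.isclose(m @ W @ m, z**4 + 2 * z**2 + 1)  # identity in z
print("p(z) = m(z)^T W m(z) with W ⪰ 0 verified")
```

Finding such a W in general is exactly a semidefinite feasibility problem, which is the SOS/SDP link.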
Lax ’58 Conjecture is True
Lewis, Parrilo, Ramana ’03 [27]: Hyperbolic polynomials in three variables are determinants of three symmetric matrices.
and this is equivalent to Helton-Vinnikovobservation:
A polynomial q on ℜ² is a real zero polynomial of degree d satisfying q(0, 0) = 1 if and only if there exist matrices B, C ∈ S^d such that q(y, z) = det(I + yB + zC).
SDP Open Problems; Exciting Area
• Which problems can be formulated as SDPs? (Algebraic connections; BIRS/MSRI workshop on Positive Polynomials and Optimization, e.g. Parrilo, ex-postdoc Berkeley)
• efficient/stable solutions, large scale problems
• extensions of the SDP methodology: e.g. symmetric cones and the relation to Jordan algebras; bilinear matrix inequalities.
• SDP at UC Berkeley: leader is Laurent El Ghaoui; recent grad. Jiawang Nie (working on sensor localization) with co-advisors Jim Demmel and Bernd Sturmfels.
Outstanding Problems/Questions
• Kepler (1611) Conjecture: close packing (cubic or hexagonal close packing, with maximum density π/(3√2) ≈ 0.74048 = 74.048%) is the densest possible sphere packing. Find the densest (not necessarily periodic) packing of spheres - the Kepler problem.
Effective Certificates of Optimality?
• Hales’ (1997) detailed plan; extensive use of computer calculations. Hales’ full proof appears in a series of papers totaling more than 250 pages (Cipra 1998). The proof relies extensively on: global optimization; linear programming; interval arithmetic. The computer files require more than 3 gigabytes of storage, e.g. [2].
Hard Nonconvex Problems
• e.g. protein folding - how do natural phenomena optimize?
• Hubble telescope SDP - projection algorithm.
• strongly polynomial LP algorithms; Hirsch conjecture (the combinatorial diameter of a d-polytope with n facets is bounded by n − d)
• weather prediction uses a least squares min/opt problem for initial conditions
• Massive Parallel Computing: optimize the network - minimize heat - VLSI design; metrics for performance?
Discrete Optimization
• Important applications: e.g. ministry of health - all discrete opt problems
• using continuous optimization relaxations within branch and bound methods
• Gomory cutting planes came back and lie behind the current success in solving large scale discrete problems (e.g. in CPLEX).
• The QAP is NP-hard, but still needs to be solved, i.e. worst case complexity and expected performance can differ drastically.
Connections with Optimization
• Discrete and Continuous Optimization
• Optimal Control Theory (space program, environment)
• Medicine (molecular conformation, scheduling)
• Politics (game theory)
• Computer Science (massive parallelism, VLSI design, computer design, quantum computing)
• Management Science and Engineering in general
• Economics (government planning)
• Statistics, e.g. machine learning
THIRD CHANCES - QUESTIONS? DISCUSSION?
Resources/References
• Optimization Frequently Asked Questions:
Linear Programming FAQ
Nonlinear Programming FAQ
www-unix.mcs.anl.gov/otc/Guide/faq/
• NEOS: neos.mcs.anl.gov/neos/index.html
• e-optimization community: www.e-optimization.com/
• Optimization Online: www.optimization-online.org/
References
[1] K.M. Anstreicher, N.W. Brixius, J.-P. Goux, and J. Linderoth. Solving large quadratic assignment problems on computational grids. Math. Program., 91(3, Ser. A):563–588, 2002.
[2] D.H. Bailey and J.M. Borwein. Experimental mathematics: recent developments and future outlook. In Mathematics unlimited—2001 and beyond, pages 51–66. Springer, Berlin, 2001. URL: users.cs.dal.ca/˜jborwein/math-future.pdf.
[3] A. Ben-Tal and A.S. Nemirovski. Robust convex optimization. Math. Oper. Res., 23(4):769–805, 1998.
[4] C.G. Broyden. The convergence of a class of double-rank minimization algorithms, Part I. IMA J. Appl. Math., 6:76–90, 1970.
[5] W.C. Davidon. Variable metric methods for minimization. Technical Report ANL-5990, Argonne National Labs, Argonne, IL, 1959.
[6] R.J. Duffin. Infinite programs. In A.W. Tucker, editor, Linear Equalities and Related Systems, pages 157–170. Princeton University Press, Princeton, NJ, 1956.
[7] L. El Ghaoui and G. Calafiore. Worst-case simulation of uncertain systems. In A. Garulli, A. Tesi, and A. Vicino, editors, Robustness in Identification and Control, Lecture Notes in Control and Information Sciences. Springer, 1999.
[8] A.V. Fiacco and G.P. McCormick. Nonlinear Programming: Sequential Unconstrained Minimization Techniques. Classics in Applied Mathematics. SIAM, Philadelphia, PA, USA, 1990.
[9] R. Fletcher. A new approach to variable metric algorithms. Comput. J., 13:317–322, 1970.
[10] R. Fletcher and M.J.D. Powell. A rapidly convergent descent method for minimization. Comput. J., 6:163–168, 1963.
[11] K.R. Frisch. The logarithmic potential method of convex programming. Technical report, Institute of Economics, Oslo University, Oslo, Norway, 1955.
[12] P.E. Gill, W. Murray, and M.H. Wright. Practical Optimization. Academic Press, New York, 1981.
[13] D. Goldfarb. A family of variable-metric methods derived by variational means. Math. Comp., 24:23–26, 1970.
[14] R.B. Holmes. Geometric Functional Analysis and its Applications. Springer-Verlag, Berlin, 1975.
[15] J. Jahn. Mathematical Vector Optimization in Partially Ordered Linear Spaces. Peter Lang, Frankfurt am Main, 1986.
[16] G. Jameson. Ordered Linear Spaces. Springer-Verlag, New York, 1970.
[17] F. John. Extremum problems with inequalities as subsidiary conditions. In Studies and Essays, Courant Anniversary Volume, pages 187–204. Interscience, New York, 1948.
[18] L.V. Kantorovich. Mathematical methods of organizing and planning production. Management Sci., 6:366–422, 1959/1960.
[19] N.K. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4:373–395, 1984.
[20] W. Karush. Minima of functions of several variables with inequalities as side constraints. Master’s thesis, University of Chicago, Illinois, 1939.
[21] M. Kojima, S. Mizuno, and A. Yoshise. A primal–dual interior point algorithm for linear programming. In N. Megiddo, editor, Progress in Mathematical Programming: Interior Point and Related Methods, pages 29–47. Springer-Verlag, New York, 1989.
[22] T.C. Koopmans. Concepts of optimality and their uses. Nobel Memorial Lecture, Yale University, 1975.
[23] H.W. Kuhn. The Hungarian method for the assignment problem. Naval Res. Logist. Quart., 2:83–97, 1955.
[24] H.W. Kuhn. Nonlinear programming: a historical view. In R.W. Cottle and C.E. Lemke, editors, Nonlinear Programming, pages 1–26, Providence, RI, 1976. AMS.
[25] H.W. Kuhn and A.W. Tucker. Nonlinear programming. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 1950, pages 481–492, Berkeley and Los Angeles, 1951. University of California Press.
[26] J.K. Lenstra, A.H.G. Rinnooy Kan, and A. Schrijver. History of Mathematical Programming: A Collection of Personal Reminiscences. CWI North-Holland, Amsterdam, 1991.
[27] A.S. Lewis, P.A. Parrilo, and M.V. Ramana. The Lax conjecture is true. Proc. Amer. Math. Soc., 133(9):2495–2499, 2005.
[28] D.G. Luenberger. Optimization by Vector Space Methods. John Wiley, 1969.
[29] I.J. Lustig, R.E. Marsten, and D.F. Shanno. On implementing Mehrotra’s predictor–corrector interior point method for linear programming. SIAM J. Optim., 2(3):435–449, 1992.
[30] S. Mehrotra. On the implementation of a primal-dual interior point method. SIAM J. Optim., 2(4):575–601, 1992.
[31] J. Munkres. Algorithms for the assignment and transportation problems. J. Soc. Indust. Appl. Math., 5:32–38, 1957.
[32] Y.E. Nesterov and A.S. Nemirovski. Interior Point Polynomial Algorithms in Convex Programming. SIAM, Philadelphia, PA, USA, 1994.
[33] R.T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, NJ, 1970.
[34] D.F. Shanno. Conditioning of quasi-Newton methods for function minimization. Math. Comp., 24:647–657, 1970.
[35] H. Wolkowicz and G.P.H. Styan. More bounds for eigenvalues using traces. Linear Algebra Appl., 31:1–17, 1980.
[36] Yau Chuen Wong and Kung Fu Ng. Partially Ordered Topological Vector Spaces. Clarendon Press, Oxford, 1973. Oxford Mathematical Monographs.