Essential Mathematics for Economists

Alexis Akira Toda1

1Department of Economics, University of California San Diego. Email:[email protected]


Contents

I Basics

1 Linear Algebra
  1.1 Linearity
  1.2 Inner product and norm
  1.3 Matrix
  1.4 Identity matrix, inverse, determinant
  1.5 Transpose, symmetric matrices
  1.6 Eigenvector, diagonalization
  1.7 Jordan canonical form
  1.8 Matrix norm, spectral radius
  1.9 Nonnegative matrices

2 Topology in Euclidean Spaces
  2.1 Convergence of sequences
  2.2 Topological properties
  2.3 Continuous functions

3 One-Variable Optimization
  3.1 A motivating example
  3.2 One-variable calculus
    3.2.1 Differentiation
    3.2.2 Mean value theorem and Taylor's theorem
  3.3 Convex functions

4 Multi-Variable Calculus
  4.1 A motivating example
  4.2 Differentiation
  4.3 Vector notation and gradient
  4.4 Mean value theorem and Taylor's theorem
  4.5 Chain rule

5 Multi-Variable Unconstrained Optimization
  5.1 First and second-order conditions
  5.2 Convex optimization
    5.2.1 General case
    5.2.2 Quadratic case

6 Multi-Variable Constrained Optimization
  6.1 A motivating example
    6.1.1 The problem
    6.1.2 A solution
    6.1.3 Why study the general theory?
  6.2 Optimization with linear constraints
    6.2.1 One linear constraint
    6.2.2 Multiple linear constraints
    6.2.3 Linear inequality and equality constraints
  6.3 Optimization with nonlinear constraints
    6.3.1 Karush-Kuhn-Tucker theorem
    6.3.2 Convex optimization
    6.3.3 Constrained maximization

7 Introduction to Dynamic Programming
  7.1 Introduction
  7.2 Examples
    7.2.1 Knapsack problem
    7.2.2 Shortest path problem
    7.2.3 Optimal saving problem
    7.2.4 Drawing cards
    7.2.5 Optimal proposal
  7.3 General formulation
  7.4 Solving dynamic programming problems
    7.4.1 Value function iteration
    7.4.2 Guess and verify

II Advanced Topics

8 Contraction Mapping Theorem and Applications
  8.1 Contraction Mapping Theorem
  8.2 Blackwell's condition for contraction
  8.3 Markov chain and Perron's theorem
  8.4 Implicit function theorem

9 Convex Sets
  9.1 Convex sets
  9.2 Hyperplanes and half spaces
  9.3 Separation of convex sets

10 Convex Functions
  10.1 Convex and quasi-convex functions
  10.2 Continuity of convex functions
  10.3 Characterization of convex functions
  10.4 Characterization of quasi-convex functions
  10.5 Subgradient of convex functions

11 Convex Programming
  11.1 Convex programming
    11.1.1 Sufficiency without constraints
    11.1.2 Saddle Point Theorem
    11.1.3 Necessity and sufficiency of KKT conditions
    11.1.4 Quasi-convex programming
  11.2 Portfolio selection
    11.2.1 The problem
    11.2.2 Mathematical formulation
    11.2.3 Solution
  11.3 Capital asset pricing model (CAPM)
    11.3.1 The model
    11.3.2 Equilibrium
    11.3.3 Asset pricing

12 Nonlinear Programming
  12.1 The problem and the solution concept
  12.2 Cone and dual cone
  12.3 Necessary condition
  12.4 Karush-Kuhn-Tucker theorem
  12.5 Constraint qualifications
  12.6 Sufficient condition

13 Maximum and Envelope Theorems
  13.1 A motivating example
  13.2 Maximum Theorem
  13.3 Envelope Theorem

14 Duality Theory
  14.1 Motivation
  14.2 Example
    14.2.1 Linear programming
    14.2.2 Entropy maximization
  14.3 Convex conjugate function
  14.4 Duality theory

15 Dynamic Programming in Infinite Horizon
  15.1 A motivating example
  15.2 General formulation
  15.3 Verification theorem
  15.4 Contraction argument
  15.5 Non-contraction argument

III Introduction to Numerical Analysis

16 Solving Nonlinear Equations
  16.1 Bisection method
  16.2 Order of convergence
  16.3 Newton method
  16.4 Linear interpolation
  16.5 Quadratic interpolation
  16.6 Robustifying the algorithms

17 Polynomial approximation
  17.1 Lagrange interpolation
  17.2 Chebyshev polynomials
  17.3 Projection

18 Quadrature
  18.1 Newton-Cotes quadrature
    18.1.1 Trapezoidal rule (N = 2)
    18.1.2 Simpson's rule (N = 3)
    18.1.3 Compound rule
  18.2 Gaussian quadrature

19 Discretization
  19.1 Earlier methods
  19.2 Maximum entropy method
    19.2.1 Discretizing probability distributions
    19.2.2 Discretizing general Markov processes
    19.2.3 Examples and applications


Notations

Symbol                  Meaning
∀x . . .                for all x . . .
∃x . . .                there exists x such that . . .
∅                       empty set
x ∈ A or A ∋ x          x is an element of the set A
A ⊂ B or B ⊃ A          A is a subset of B; B contains A
A ∩ B                   intersection of sets A and B
A ∪ B                   union of sets A and B
A\B                     elements of A but not in B
RN                      set of vectors x = (x1, . . . , xN) with xn ∈ R
RN+                     set of x = (x1, . . . , xN) with xn ≥ 0 for all n
RN++                    set of x = (x1, . . . , xN) with xn > 0 for all n
x ≥ y or y ≤ x          xn ≥ yn for all n; same as x − y ∈ RN+
x > y or y < x          xn ≥ yn for all n and xn > yn for some n; same as x − y ∈ RN+ \ {0}
x ≫ y or y ≪ x          xn > yn for all n; same as x − y ∈ RN++
〈x, y〉                 inner product of x and y, 〈x, y〉 = x1y1 + · · · + xNyN
‖x‖                     norm of x, usually the Euclidean norm ‖x‖ = √(x1² + · · · + xN²)
cl A                    closure of A
int A                   interior of A
co A                    convex hull of A
[a, b]                  closed interval {x | a ≤ x ≤ b}
(a, b)                  open interval {x | a < x < b}
(a, b]; [a, b)          half open intervals
f : A → B               f is a function defined on A taking values in B
dom f                   effective domain of f, {x ∈ RN | f(x) < ∞}
epi f                   epigraph of f, {(x, y) ∈ RN × R | y ≥ f(x)}
f ∈ C(Ω)                function f is continuous on Ω
f ∈ C^r(Ω)              function f is r times continuously differentiable on Ω
f ≤ g or g ≥ f          f(x) ≤ g(x) for all x
∇f(x)                   gradient (vector of partial derivatives) of f at x
∇²f(x)                  Hessian (matrix of second derivatives) of f at x
Df(x)                   Jacobian (matrix of partial derivatives) of f at x


Part I

Basics


Chapter 1

Linear Algebra

This chapter covers the most basic topics in linear algebra. This note is too short to cover all the details. Good references are Lax (2007) and Horn and Johnson (2013).

1.1 Linearity

In mathematics, “linear” means that a property is preserved by addition and multiplication by a constant. A linear space (more commonly vector space) is a set X for which x + y (addition) and αx (multiplication by α) are defined, where x, y ∈ X and α ∈ R. In this course we only encounter the Euclidean space RN, which consists of N-tuples of real numbers (called N-vectors)

x = (x1, . . . , xN ).

Here addition and multiplication by a constant are defined entry-wise:

(x1, . . . , xN ) + (y1, . . . , yN ) := (x1 + y1, . . . , xN + yN )

α(x1, . . . , xN ) := (αx1, . . . , αxN ).

(The symbol “:=” means that we define the left-hand side by the right-hand side.)

A linear function is a function that preserves linearity. Thus f : RN → R is linear if

f(x+ y) = f(x) + f(y),

f(αx) = αf(x).

An obvious example of a linear function f : RN → R is

    f(x) = a1x1 + · · · + aNxN = ∑_{n=1}^N an xn,

where a1, . . . , aN are numbers. In fact we can show that all linear functions are of this form.

7

Page 9: Essential Mathematics for Economists

Proposition 1.1. If f : RN → R is linear, then f(x) = a1x1 + · · · + aNxN for some a1, . . . , aN.

Proof. Let en = (0, . . . , 0, 1, 0, . . . , 0) be the vector whose n-th entry is 1 and all other entries are 0. (These vectors are called unit vectors.) By the definition of RN, we have

x = (x1, . . . , xN ) = x1e1 + · · ·+ xNeN .

Hence by the linearity of f , we get

f(x) = x1f(e1) + · · ·+ xNf(eN ),

so f(x) has the desired form by setting an = f(en).

A set of vectors {xj}_{j=1}^J ⊂ RN is called linearly independent if ∑_{j=1}^J αj xj = 0 implies α1 = · · · = αJ = 0. Otherwise (if there is a combination (α1, . . . , αJ) ≠ (0, . . . , 0) such that ∑_{j=1}^J αj xj = 0), the set of vectors {xj}_{j=1}^J ⊂ RN is called linearly dependent.
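As a numerical aside (not part of the original text), linear independence can be checked in Matlab by comparing the rank of the matrix whose columns are the given vectors with the number of vectors. The vectors below are arbitrary illustrative data; a minimal sketch:

    % Columns of X are the vectors x_1, ..., x_J (arbitrary example data)
    X = [1 0 1;
         0 1 1;
         0 0 0];
    J = size(X, 2);
    if rank(X) == J
        disp('The columns are linearly independent.')
    else
        disp('The columns are linearly dependent.')
    end

Here the third column is the sum of the first two, so the script reports linear dependence.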

1.2 Inner product and norm

An expression of the form a1x1 + · · · + aNxN appears so often that it deserves a special name and notation. Let x = (x1, . . . , xN) and y = (y1, . . . , yN) be two vectors. Then

    〈x, y〉 := x1y1 + · · · + xNyN = ∑_{n=1}^N xn yn

is called the inner product (also called the dot product) of x and y.¹ The (Euclidean) norm of x is defined by

    ‖x‖ := √〈x, x〉 = √(x1² + · · · + xN²).

The Euclidean norm is also called the L2 norm for a reason that will be clear later.

Fixing x, the inner product 〈x, y〉 is linear in y, so we have

    〈x, y1 + y2〉 = 〈x, y1〉 + 〈x, y2〉,
    〈x, αy〉 = α 〈x, y〉.

The same holds for x as well, fixing y. So the inner product is a bilinear function of x and y.

You might remember from high school algebra/geometry that the inner product in a two-dimensional space satisfies

    〈x, y〉 = x1y1 + x2y2 = ‖x‖ ‖y‖ cos θ,

where θ is the angle between the vectors x = (x1, x2) and y = (y1, y2). Since

    cos θ  > 0 if θ is an acute angle,
           = 0 if θ is a right angle,
           < 0 if θ is an obtuse angle,

¹ The term “inner” is weird but this is because there is a notion of “outer product”. The inner product of x, y is sometimes denoted by (x, y), x · y, 〈x | y〉, etc.

8

Page 10: Essential Mathematics for Economists

the vectors x, y are orthogonal if 〈x, y〉 = 0 and form an acute (obtuse) angle if 〈x, y〉 > 0 (< 0). Most of us cannot “see” higher dimensional spaces, but geometric intuition is very useful. For any x, y ∈ RN, we say that x, y are orthogonal if 〈x, y〉 = 0.

The inner product and norms of vectors x, y satisfy the following Cauchy-Schwarz inequality: |〈x, y〉| ≤ ‖x‖ ‖y‖. The proof is in Problem 1.1. The norm ‖·‖ : RN → R satisfies

1. ‖x‖ ≥ 0, with equality if and only if x = 0,

2. ‖αx‖ = |α| ‖x‖ for all α ∈ R,

3. ‖x + y‖ ≤ ‖x‖ + ‖y‖.

The last inequality is called the triangle inequality because it says that the length of any edge of any triangle is less than or equal to the sum of the lengths of the remaining two edges. (Draw a picture of a triangle with vertices at points 0, x, and x + y.) Proving the triangle inequality is an exercise.

In general, any function from RN to R that satisfies the above three properties is called a norm. Examples other than the Euclidean norm are

    ‖x‖1 := ∑_{n=1}^N |xn|,                     (l1 norm)

    ‖x‖∞ := max_n |xn|,                         (l∞ or sup norm)

    ‖x‖p := (∑_{n=1}^N |xn|^p)^{1/p}.           (lp norm for p ≥ 1)

The proof that ‖·‖1 and ‖·‖∞ are norms is straightforward. The l2 norm is the same as the Euclidean norm. The proof that ‖·‖p is a norm uses Minkowski’s inequality, proved in Problem 5.9.
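A minimal Matlab sketch, using arbitrary example vectors, that computes these norms with the built-in norm function and checks the Cauchy-Schwarz inequality numerically:

    x = [3; -1; 2];          % arbitrary example vectors
    y = [1;  4; -2];
    n1   = norm(x, 1);       % l1 norm: sum of absolute values
    n2   = norm(x, 2);       % Euclidean (l2) norm
    ninf = norm(x, Inf);     % sup norm: largest absolute entry
    ip   = dot(x, y);        % inner product <x, y>
    fprintf('l1 = %.4f, l2 = %.4f, sup = %.4f\n', n1, n2, ninf);
    fprintf('|<x,y>| = %.4f <= ||x||*||y|| = %.4f\n', abs(ip), n2 * norm(y));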

1.3 Matrix

Instead of a linear function f : RN → R, consider a linear map f : RN → RM. This means that for each x ∈ RN, f associates a vector f(x) ∈ RM, and f is linear (preserves addition and multiplication by a constant): f(x + y) = f(x) + f(y) and f(αx) = αf(x). Let fm(x) be the m-th element of f, so f(x) = (f1(x), . . . , fM(x)). It’s easy to see that each fm(x) is a linear function of x. Hence by Proposition 1.1, we have

fm(x) = am1x1 + · · ·+ amNxN

for some numbers am1, . . . , amN. Since this is true for any m, a linear map corresponds to numbers (amn), where 1 ≤ m ≤ M and 1 ≤ n ≤ N. Conversely, any such array of numbers corresponds to a linear map. We write

    A = (amn) = [ a11  · · ·  a1n  · · ·  a1N ]
                [ ...         ...         ... ]
                [ am1  · · ·  amn  · · ·  amN ]
                [ ...         ...         ... ]
                [ aM1  · · ·  aMn  · · ·  aMN ]

9

Page 11: Essential Mathematics for Economists

and call it a matrix. For an M × N matrix A and an N-vector x, we define the M-vector Ax by the vector whose m-th element is

am1x1 + · · ·+ amNxN .

So f : RN → RM defined by f(x) = Ax is a linear map. By defining addition and multiplication by a constant entry-wise, the set of all M × N matrices can be identified with RMN, the MN-dimensional Euclidean space.

Now consider the linear maps f : RN → RM and g : RM → RL. Since f, g are linear, we can find an M × N matrix A = (amn) and an L × M matrix B = (blm) such that f(x) = Ax and g(y) = By. We can also consider the composition of these two maps, h = g ∘ f, where h(x) := g(f(x)). It is easy to see that h is a linear map from RN to RL, and therefore it can be written as h(x) = Cx with an L × N matrix C = (cln). Using the definition h(x) = g(f(x)) = B(Ax), it is not hard to see (exercise) that

    cln = ∑_{m=1}^M blm amn.

So it makes sense to define the matrix product C = BA by this rule. You can use all standard rules of algebra such as B(A1 + A2) = BA1 + BA2, A(BC) = (AB)C, etc. The proof is immediate by carrying out the algebra or thinking about linear maps. In Matlab, A+B and A*B return the sum and the product of matrices A, B (if they are well-defined). If A, B have the same size, then A.*B returns the entry-wise product (Hadamard product).
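A quick Matlab illustration (with arbitrary example matrices) that the product B*A represents the composition g ∘ f: applying A and then B to a vector gives the same result as applying B*A directly.

    A = [1 2; 3 4; 5 6];     % 3 x 2 matrix: f(x) = A*x maps R^2 to R^3
    B = [1 0 1; 0 1 -1];     % 2 x 3 matrix: g(y) = B*y maps R^3 to R^2
    x = [1; -2];
    h1 = B*(A*x);            % compose: first f, then g
    h2 = (B*A)*x;            % single linear map with matrix C = B*A
    disp(norm(h1 - h2))      % zero (up to rounding), so the two agree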

1.4 Identity matrix, inverse, determinant

An M × N matrix is square when M = N. The identity map id : RN → RN defined by id(x) = x is clearly linear and has a corresponding matrix I. A simple calculation shows that I is square, with diagonal entries all equal to 1 and off-diagonal entries all equal to 0. Clearly AI = IA = A when A is a square matrix (think of maps, or alternatively, do the entry-wise calculation). In Matlab, eye(N) returns the N-dimensional identity matrix.

A map f : RN → RN is said to be one-to-one (or injective) if f(x) ≠ f(y) whenever x ≠ y. f is onto (or surjective) if for each y ∈ RN, there exists x ∈ RN such that y = f(x). f is bijective if f is both injective and surjective. If f is bijective, for each y ∈ RN there exists a unique x ∈ RN such that y = f(x). Since this x depends only on y, we write x = f−1(y) and say that f−1 is the inverse of f. Now if f : RN → RN is a bijective linear map with a corresponding square matrix A (so f(x) = Ax), its inverse f−1 is also linear and hence has a matrix representation. We write this matrix A−1 and call it the inverse of A. Clearly AA−1 = A−1A = I. The inverse of A, if it exists, is unique. To see this, suppose that B, C are both inverses of A. Then AB = BA = I and AC = CA = I, so

B = BI = B(AC) = (BA)C = IC = C.

A matrix that has an inverse is called regular, nonsingular, invertible, etc. In Matlab, inv(A) returns the inverse of A.

10

Page 12: Essential Mathematics for Economists

If A = [ a b ; c d ], then the determinant of A is detA = ad − bc. In general, we can define the determinant of a square matrix inductively as follows. For a 1 × 1 matrix A = (a), we have detA = a. Suppose that the determinant has been defined up to (N − 1) × (N − 1) matrices. If A = (amn) is N × N, then

    detA = ∑_{n=1}^N (−1)^{m+n} amn Mmn = ∑_{m=1}^N (−1)^{m+n} amn Mmn,

where Mmn is the determinant of the matrix obtained by removing row m and column n of A. It is well known that this definition is consistent (i.e., does not depend on m, n). The following are useful properties of the determinant (see textbooks for proofs).

1. A is regular if and only if detA ≠ 0. In that case, we have A−1 = (1/detA) Ã, where Ã = (ãmn) satisfies ãmn = (−1)^{m+n} Mnm. For example, if A = [ a b ; c d ], then

       A−1 = (1/(ad − bc)) [ d  −b ; −c  a ].

2. If A, B are square matrices of the same order, then det(AB) = (detA)(detB).

3. If A is partitioned as A = [ A11  A12 ; O  A22 ], where A11 and A22 are square matrices, then detA = (detA11)(detA22).

In Matlab, det(A) returns the determinant of A.
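These properties can be spot-checked numerically; the sketch below uses random matrices and an arbitrary 2 × 2 example, so it is an illustration rather than a proof.

    A = randn(3);  B = randn(3);              % random 3 x 3 matrices
    fprintf('det(AB) = %.4f, det(A)det(B) = %.4f\n', det(A*B), det(A)*det(B));
    % inverse of a 2 x 2 matrix via the explicit formula [d -b; -c a]/(ad - bc)
    M = [2 1; 5 3];
    Minv = (1/det(M)) * [3 -1; -5 2];
    disp(norm(Minv - inv(M)))                 % essentially zero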

1.5 Transpose, symmetric matrices

When numbers are stacked horizontally like x = (x1, . . . , xN), it is called a row vector. When stacked vertically like

    [ x1 ]
    [ ... ]
    [ xN ],

it is a column vector. An N-column vector is the same as an N × 1 matrix. An N-row vector is the same as a 1 × N matrix. The notation f(x) = Ax is compatible with the definition of the product of an M × N matrix A and an N × 1 matrix x. To see this, writing down the entries, we get

    Ax = [ a11x1 + · · · + a1NxN ]   [ a11 · · · a1n · · · a1N ] [ x1 ]
         [          ...          ]   [ ...       ...       ... ] [ ... ]
         [ am1x1 + · · · + amNxN ] = [ am1 · · · amn · · · amN ] [ xn  ] = Ax.
         [          ...          ]   [ ...       ...       ... ] [ ... ]
         [ aM1x1 + · · · + aMNxN ]   [ aM1 · · · aMn · · · aMN ] [ xN ]

(The left-most Ax is the linear map; the right-most Ax is the product of the M × N matrix A and the N × 1 matrix x.)

11

Page 13: Essential Mathematics for Economists

Unless otherwise specified, vectors are always assumed to be column vectors. However, it is awkward to write down column vectors every time because they take up a lot of space, so we use the notation (x1, . . . , xN)′ (with a prime) to denote a column vector. The vector (x1, . . . , xN)′ is called the transpose of the row vector x = (x1, . . . , xN). Sometimes, to distinguish from derivatives, we also use the symbol ⊤ instead of ′ to denote the transpose. Oftentimes, we are even more sloppy and do not distinguish between a row and column vector when there is no risk of confusion. (After all, what does it mean mathematically to stack numbers horizontally or vertically?)

Transpose can be defined for matrices, too. For an M × N matrix A = (amn), we define its transpose by the N × M matrix B = (bnm), where bnm = amn, and we write B = A′ (or B = A⊤). Thus A′ is the matrix obtained by flipping A “diagonally”. In Matlab, A' returns the transpose of A.

A square matrix P such that P′P = PP′ = I is called orthogonal, because by definition the column vectors of P are orthogonal and have Euclidean norm 1 (just write down the entries of P′P). If P is orthogonal, then clearly P−1 = P′.

A square matrix A such that A = A′ is called symmetric, because its entries are symmetric about the diagonal. A is positive semidefinite if 〈x, Ax〉 ≥ 0 for all x, and positive definite if in addition 〈x, Ax〉 = 0 only if x = 0. Symmetric matrices have a natural (partial) order (exercise): we write A ⪰ B if and only if A − B is positive semidefinite.

There is a simple test for positive definiteness. Let A be square. The determinant of the matrix obtained by keeping the first k rows and columns of A is called the k-th principal minor of A. For example, if A = (amn) is N × N, then the first principal minor is a11, the second principal minor is a11a22 − a12a21, and the N-th principal minor is detA, etc.

Proposition 1.2. Let A be real symmetric. Then A is positive definite if and only if its principal minors are all positive.

Proof. We prove by mathematical induction on the dimension N of the matrix A. If N = 1, the claim is trivial.

Suppose the claim is true up to dimension N − 1, and let A be N-dimensional. Partition A as

    A = [ A1  b ]
        [ b′  c ],

where A1 is an (N − 1)-dimensional symmetric matrix, b is an (N − 1)-dimensional vector, and c is a scalar. Let

    P = [ I  −A1^{−1}b ]
        [ 0      1     ].

Then by simple algebra we get

    P′AP = [ A1        0        ]
           [ 0   c − b′A1^{−1}b ].

Clearly detP = 1, so P is regular. Since

    〈x, Ax〉 = x′Ax = (P−1x)′(P′AP)(P−1x),

A is positive definite if and only if P′AP is. But since P′AP is block diagonal, P′AP is positive definite if and only if A1 is positive definite and c − b′A1^{−1}b > 0.

By assumption, A1 is positive definite if and only if its principal minors are all positive. Furthermore, since detP = 1, we get

    detA = det(P′AP) = (detA1)(c − b′A1^{−1}b).

Therefore

    A ≻ O ⇐⇒ all principal minors of A1 are positive and c − b′A1^{−1}b > 0
          ⇐⇒ all principal minors of A1 are positive and detA > 0
          ⇐⇒ all principal minors of A are positive,

so the claim is true for N as well.
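Proposition 1.2 is easy to illustrate numerically. The sketch below, with an arbitrary symmetric example matrix, computes the leading principal minors det(A(1:k,1:k)) and compares the result with the eigenvalue criterion of Problem 1.8.

    A = [2 -1 0; -1 2 -1; 0 -1 2];        % a symmetric example matrix
    N = size(A, 1);
    minors = zeros(N, 1);
    for k = 1:N
        minors(k) = det(A(1:k, 1:k));     % k-th principal minor
    end
    disp(minors')                         % 2, 3, 4: all positive here
    disp(all(minors > 0))                 % test of Proposition 1.2
    disp(all(eig(A) > 0))                 % equivalent eigenvalue test (Problem 1.8)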

1.6 Eigenvector, diagonalization

If A is a square matrix and there exist a number α and a nonzero vector v such that Av = αv, then we say that v is an eigenvector of A associated with eigenvalue α. Since

Av = αv ⇐⇒ (αI −A)v = 0,

α is an eigenvalue of A if and only if det(αI − A) = 0 (for otherwise αI − A is invertible, which would imply v = 0, a contradiction). The polynomial ΦA(x) := det(xI − A) is called the characteristic polynomial of A. In Matlab, eig(A) returns the eigenvalues of A.

Even if A is a real matrix, eigenvalues and eigenvectors need not be real.

For complex vectors x, y, the inner product is defined by

    〈x, y〉 = x∗y = x̄′y = ∑_{n=1}^N x̄n yn,

where x̄ is the complex conjugate of x and x∗ = x̄′ is the transpose of the complex conjugate of x (called the adjoint). By definition, 〈x, y〉 is the complex conjugate of 〈y, x〉. Similarly, for a complex matrix A, its adjoint A∗ is defined by the complex conjugate of the transpose. It is easy to see that 〈x, Ay〉 = 〈A∗x, y〉, because

〈A∗x, y〉 = (A∗x)∗y = x∗(A∗)∗y = x∗Ay = 〈x,Ay〉 .

Matrices satisfying A∗ = A are called Hermite. If A is real, then an Hermite matrix is the same as a symmetric matrix. For an Hermite matrix A, the quadratic form 〈x, Ax〉 is real, because its complex conjugate is

    〈Ax, x〉 = 〈A∗x, x〉 = 〈x, Ax〉,

that is, 〈x, Ax〉 equals its own complex conjugate.

Proposition 1.3. The eigenvalues of an Hermite matrix are real.

Proof. Suppose that Av = αv with v ≠ 0. Then

    〈v, Av〉 = 〈v, αv〉 = α 〈v, v〉 = α ‖v‖²

is real, so α = 〈v, Av〉 / ‖v‖² is also real.

13

Page 15: Essential Mathematics for Economists

Since real symmetric matrices are Hermite, the eigenvalues of real symmetric matrices are all real (and so are eigenvectors).

If U is a square matrix such that U∗U = UU∗ = I, then U is called unitary. Real unitary matrices are orthogonal, by definition.

We usually take the standard basis e1, . . . , eN in RN, but that is not necessary. Suppose we take vectors {p1, . . . , pN}, where the matrix P = [p1, . . . , pN] is regular. Let x be any vector and y = P−1x. Then

    x = PP−1x = Py = y1p1 + · · · + yNpN,

so the entries of y can be interpreted as the coordinates of x when we use the basis P. What does a matrix A look like when we use the basis P? Consider the linear map x ↦ Ax. Using the basis P, this map looks like P−1x ↦ P−1Ax = (P−1AP)(P−1x), so the linear map x ↦ Ax has the matrix representation B = P−1AP. Oftentimes, it is useful to find a matrix P such that P−1AP is a simple matrix. The simplest matrices of all are diagonal ones. If we can find P such that P−1AP is diagonal, we say that A is diagonalizable. A remarkable property of real symmetric matrices is that they are diagonalizable with some orthogonal matrix.

Theorem 1.4. Let A be real symmetric. Then there exists a real orthogonal matrix P such that P−1AP = P′AP = diag[α1, . . . , αN], where α1, . . . , αN are eigenvalues of A. (diag is the symbol for a diagonal matrix with specified diagonal entries.)

Proof. We prove by mathematical induction on the dimension of A. If A is one-dimensional (scalar), then the claim is trivial. (Just take P = 1.)

Suppose that the claim is true up to dimension N − 1. Let α1 be an eigenvalue of A and p1 be an associated eigenvector, normalized so that ‖p1‖ = 1, so Ap1 = α1p1. Let W1 be the set of vectors orthogonal to p1, so W1 = {x | 〈p1, x〉 = 0}. If x ∈ W1, then

    〈p1, Ax〉 = 〈A′p1, x〉 = 〈Ap1, x〉 = 〈α1p1, x〉 = α1 〈p1, x〉 = 0,

so Ax ∈ W1. Pick an orthonormal basis {q2, . . . , qN} in W1. Letting P1 = [p1, q2, . . . , qN], we then have

    AP1 = P1 [ α1  0  ]    ⇐⇒    P1′AP1 = [ α1  0  ]
             [ 0   A1 ]                    [ 0   A1 ],

where A1 is some (N − 1)-dimensional matrix. To see this, note that

    AP1 = [Ap1, Aq2, . . . , AqN] = [α1p1, Aq2, . . . , AqN].

Since each qn belongs to the space W1, Aqn is a linear combination of q2, . . . , qN, so there exists a matrix A1 as above. Since A is symmetric, (P1′AP1)′ = P1′A′P1 = P1′AP1, so P1′AP1 is also symmetric. Therefore A1 is symmetric. Since A1 is (N − 1)-dimensional, by assumption we can take an orthogonal matrix P2 such that P2′A1P2 = D1, where D1 = diag[α2, . . . , αN] is diagonal. Then A1 = P2D1P2′, so

    P1′AP1 = [ α1     0     ] = [ 1  0  ] [ α1  0  ] [ 1  0  ]′
             [ 0   P2D1P2′  ]   [ 0  P2 ] [ 0   D1 ] [ 0  P2 ].

Letting P = P1 [ 1 0 ; 0 P2 ], which is orthogonal, we obtain

    P′AP = [ α1  0  ] = diag[α1, . . . , αN],
           [ 0   D1 ]

so P′AP is diagonal.


Similarly, Hermite matrices can be diagonalized by unitary matrices. Diagonalization is often useful for proving theorems; see for example Problems 1.8, 1.9, and 1.10.
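In Matlab, [P, D] = eig(A) returns eigenvectors and eigenvalues with A*P = P*D; for a real symmetric input the returned eigenvectors are orthonormal (up to rounding), so Theorem 1.4 can be checked directly. A minimal sketch with an arbitrary symmetric matrix:

    A = [4 1 0; 1 3 1; 0 1 2];      % symmetric example matrix
    [P, D] = eig(A);                % columns of P are eigenvectors, D is diagonal
    disp(norm(P'*P - eye(3)))       % P is orthogonal (up to rounding)
    disp(norm(P'*A*P - D))          % P'AP is the diagonal matrix of eigenvalues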

1.7 Jordan canonical form

Two matrices A, B are said to be similar if there exists a regular matrix S such that B = S−1AS. Sometimes we want to find a simple matrix that is similar to a given matrix. We know from Theorem 1.4 that if A is Hermite, then we can find a unitary matrix U and a diagonal matrix D such that D = U∗AU = U−1AU, so we can take B = D and S = U. However, in general not all matrices are diagonalizable (Problem 1.12). Jordan’s theorem allows us to reduce any matrix to a simple one, called the Jordan canonical form. I omit the proof since it is tedious but refer to textbooks (Lax, 2007, Appendix 15).

An n× n Jordan matrix with diagonal element λ is defined by

    Jn(λ) = [ λ  1          ]
            [    λ   ⋱      ]
            [        ⋱   1  ]
            [            λ  ],

so the diagonal entries are λ, the superdiagonal entries are 1, and all other entries are 0.

Jordan’s theorem. For any matrix A, there exists a regular matrix S such that

    S−1AS = [ Jn1(λ1)              ]
            [           ⋱          ] = D + N,
            [              Jnk(λk) ]

where D is a diagonal matrix and N is a matrix whose superdiagonal entries are either 0 or 1 and all other entries are 0, and DN = ND.

Jordan’s theorem is useful for computing matrix powers analytically. For example, using DN = ND and the binomial theorem, we obtain

    S−1A^nS = (S−1AS)^n = ∑_{k=0}^n (n choose k) D^{n−k} N^k.

It is straightforward to show that N^k = O for large enough k.

1.8 Matrix norm, spectral radius

A matrix norm is a function ‖·‖ on square matrices that satisfies

1. (positivity) ‖A‖ ≥ 0, with equality if and only if A = O,

2. (scalar multiplicativity) ‖αA‖ = |α| ‖A‖,

3. (triangle inequality) ‖A+B‖ ≤ ‖A‖+ ‖B‖,


4. (submultiplicativity) ‖AB‖ ≤ ‖A‖ ‖B‖.

The following observation shows that there is a matrix norm associated with any norm on RN. Let ‖·‖ be any norm on RN. For any N × N matrix A, define

    ‖A‖ = sup_{x ≠ 0} ‖Ax‖ / ‖x‖.

Then it is easy to show (Problem 1.13) that ‖·‖ is a matrix norm, called the operator norm.

Let {αn}_{n=1}^N be the eigenvalues of a matrix A. The quantity

    ρ(A) = max_n |αn|,

the largest modulus of all eigenvalues, is called the spectral radius of A. The spectral radius and the matrix norm are related as follows.

Proposition 1.5 (Gelfand spectral radius formula). Let ‖·‖ be any matrix norm. Then ρ(A) ≤ ‖A^n‖^{1/n} and ρ(A) = lim_{n→∞} ‖A^n‖^{1/n}.

Proof. Let α be an eigenvalue of A and x ≠ 0 be a corresponding eigenvector. Then A^n x = α^n x for all n. Let X = (x, . . . , x) be the matrix obtained by replicating x. Then A^n X = α^n X. Taking the norm of both sides, we obtain

    |α|^n ‖X‖ = ‖A^n X‖ ≤ ‖A^n‖ ‖X‖ =⇒ |α|^n ≤ ‖A^n‖.

Since α is any eigenvalue, it follows that ρ(A) ≤ ‖A^n‖^{1/n}.

Take any ε > 0 and define Ã = A/(ρ(A) + ε). Then ρ(Ã) = ρ(A)/(ρ(A) + ε) < 1, so considering the Jordan canonical form, it follows that lim_{n→∞} Ã^n = O. Therefore ‖Ã^n‖ < 1 for large enough n, and hence ‖A^n‖ ≤ (ρ(A) + ε)^n. Taking the n-th root, letting n → ∞, and ε ↓ 0, we obtain lim sup_{n→∞} ‖A^n‖^{1/n} ≤ ρ(A). Since ρ(A) ≤ ‖A^n‖^{1/n}, it follows that ρ(A) = lim_{n→∞} ‖A^n‖^{1/n}.

Remark 1.1. Although the above proof of Proposition 1.5 uses the submultiplicativity of the matrix norm, this condition is actually not necessary. For a proof of the Gelfand formula that does not require submultiplicativity, see Theorem 5.7.10 of Horn and Johnson (2013).
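The Gelfand formula can be observed numerically: ‖A^n‖^{1/n} approaches ρ(A) as n grows, for any matrix norm. A rough sketch with an arbitrary example matrix and the operator 2-norm returned by Matlab's norm:

    A = [0.5 0.8; 0.1 0.4];              % arbitrary example matrix
    rho = max(abs(eig(A)));              % spectral radius
    for n = [1 5 10 50 100]
        fprintf('n = %3d: ||A^n||^(1/n) = %.6f  (rho = %.6f)\n', ...
                n, norm(A^n)^(1/n), rho);
    end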

1.9 Nonnegative matrices

Nonnegative matrices, though not usually treated in introductory textbooks of linear algebra, play an important role in economics. This section provides a brief introduction. For a more complete treatment, see Chapter 8 of Horn and Johnson (2013), Berman and Plemmons (1994), or Bapat and Raghavan (1997).

I first discuss a motivating example. Suppose a worker can be either employed or unemployed. If employed, he will be unemployed with probability p next period. If unemployed, he will be employed with probability q next period. Let xt = (et, ut)′ be the probability vector of being employed and unemployed at time t, where ut = 1 − et. Then by assumption we have

    e_{t+1} = (1 − p)et + qut,
    u_{t+1} = pet + (1 − q)ut.


Putting these equations into vector form, we obtain x_{t+1} = P′xt, where

    P = [ 1 − p     p    ]
        [   q     1 − q  ].

Given the initial probability x0, one might be interested in the probability vector xt at time t and its behavior as t → ∞. For this example we can easily calculate these as follows. First, note that xt = (P′)^t x0, so it suffices to compute P^t. The characteristic polynomial of P is

    ΦP(x) = |xI − P| = | x − 1 + p      −p      |
                       |    −q       x − 1 + q  |
          = x² + (p + q − 2)x + 1 − p − q = (x − 1)(x + p + q − 1).

Assuming 0 < p, q < 1, P has two eigenvalues 1 and 1 − p − q ∈ (−1, 1). Therefore the spectral radius of P is 1. We can easily show that

    P [ 1 ] = [ 1 ]   and   P [  p ] = (1 − p − q) [  p ].
      [ 1 ]   [ 1 ]           [ −q ]               [ −q ]

Therefore

    S−1PS = D = [ 1       0       ],   where   S = [ 1   p  ]
                [ 0   1 − p − q   ]                [ 1  −q  ].

Therefore

    P^t = SD^tS−1 = (1/(p + q)) [ 1   p  ] [ 1        0        ] [ q   p  ]
                                [ 1  −q  ] [ 0  (1 − p − q)^t  ] [ 1  −1  ]

        = (1/(p + q)) [ q + p(1 − p − q)^t      p(1 − (1 − p − q)^t) ]
                      [ q(1 − (1 − p − q)^t)    p + q(1 − p − q)^t   ].

Since |1 − p − q| < 1, letting t → ∞, we obtain

    P^t → (1/(p + q)) [ q  p ]
                      [ q  p ].

Thus regardless of x0 = (e0, u0), we obtain

    xt = (P′)^t x0 → (1/(p + q)) [ q  q ] [ e0 ] = (1/(p + q)) [ q ],
                                 [ p  p ] [ u0 ]               [ p ]

so the worker eventually becomes unemployed with probability p/(p + q).
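The limit can be verified numerically in Matlab; the transition probabilities below are arbitrary illustrative values.

    p = 0.1;  q = 0.3;                    % illustrative transition probabilities
    P = [1-p, p; q, 1-q];
    x0 = [1; 0];                          % start from employment
    xt = (P')^100 * x0;                   % distribution after many periods
    disp(xt')                             % approximately [q p]/(p+q)
    disp([q p]/(p+q))                     % theoretical limit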

The above example can be generalized as follows. We say that a square matrix A = (amn) is positive if amn > 0 for all m, n, and we write A ≫ 0.

Theorem 1.6 (Perron). Let A be a square positive matrix. Then

1. ρ(A) > 0, which is an eigenvalue of A (called the Perron root),

2. there exist positive vectors x, y (called the right and left Perron vectors) such that Ax = ρ(A)x and y′A = ρ(A)y′,

3. ρ(A) is geometrically simple (hence x, y are unique up to a multiplicative constant), and

4. if x, y are chosen such that y′x = 1, then lim_{k→∞} [A/ρ(A)]^k = xy′.
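The conclusions of Perron's theorem are easy to check numerically for a specific positive matrix (an arbitrary example below); the eigenvectors are scaled so that x is positive and y′x = 1.

    A = [1 2; 3 4];                        % an arbitrary positive matrix
    [V, D] = eig(A);
    [rho, idx] = max(abs(diag(D)));        % Perron root = spectral radius
    x = V(:, idx);
    x = x / sum(x);                        % right Perron vector, scaled positive
    [W, E] = eig(A');                      % eigenvectors of A' are left eigenvectors of A
    [~, jdx] = max(abs(diag(E)));
    y = W(:, jdx);
    y = y / (y' * x);                      % normalize so that y'x = 1
    disp(norm((A / rho)^50 - x * y'))      % close to zero, illustrating part 4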

The proof of Theorem 1.6 is deferred to Chapter 8. When A is only nonnegative, some of the conclusions of Theorem 1.6 hold by taking limits. (See Chapter 2 for the definition of limits.)

Corollary 1.7. Let A be a square nonnegative matrix. Then ρ(A) is an eigenvalue of A, and there exist nonnegative vectors x, y such that Ax = ρ(A)x and y′A = ρ(A)y′.

Proof. Let A = (ann′) and define A(ε) by A(ε)nn′ = ann′ + ε, where ε > 0. Applying Perron’s theorem to A(ε), there exists a positive vector x(ε) with ‖x(ε)‖ = 1 such that A(ε)x(ε) = ρ(A(ε))x(ε). Since roots of polynomials are continuous in the coefficients (Harris and Martin, 1987), we obtain ρ(A(ε)) → ρ(A) as ε ↓ 0. By taking a subsequence, there exists a nonnegative vector x such that x(ε) → x as ε ↓ 0. Therefore Ax = ρ(A)x. The same is true for the left eigenvector y.

With an additional assumption called irreducibility, Perron’s theorem for nonnegative matrices (Corollary 1.7) can be further strengthened, which is known as the Perron-Frobenius theorem. See Theorem 8.4.4 of Horn and Johnson (2013) for details.

In some applications, we need to deal with square matrices A = (ann′) with nonnegative off-diagonal entries (ann′ ≥ 0 if n ≠ n′), although A may have negative diagonal entries. We call such matrices Metzler. If A is Metzler, since by definition its off-diagonal entries are nonnegative, the matrix A + dI becomes nonnegative if d ≥ 0 is large enough. This observation enables us to establish properties of Metzler matrices using the Perron-Frobenius theorem. For Metzler matrices, the role of the spectral radius ρ(A) is replaced by the spectral abscissa

    ζ(A) = max {Re α | α is an eigenvalue of A},

which is the maximum real part of all eigenvalues.

The following theorem is the analogue of the Perron-Frobenius theorem for Metzler matrices.

Theorem 1.8. Let A be a Metzler matrix. Then the spectral abscissa ζ(A) is an eigenvalue of A, and there exist nonnegative vectors x, y such that Ax = ζ(A)x and y′A = ζ(A)y′. If in addition A has positive off-diagonal entries (more generally, if A is irreducible), then x, y are positive vectors and unique up to a multiplicative constant.

Proof. Immediate by applying Corollary 1.7 or the Perron-Frobenius theorem to the matrix A + dI, where d ≥ 0 is large enough such that A + dI ≥ 0.

Problems

1.1. Let x = (x1, . . . , xN ) and y = (y1, . . . , yN ) be vectors in RN . Define

f(t) = ‖tx − y‖², where t ∈ R.

18

Page 20: Essential Mathematics for Economists

1. Expand f and express it as a quadratic function of t.

2. Prove the Cauchy-Schwarz inequality |〈x, y〉| ≤ ‖x‖ ‖y‖. (Hint: how many solutions does the quadratic equation f(t) = 0 have? Make sure to treat the cases x = 0 and x ≠ 0 separately.)

1.2. Prove the triangle inequality ‖x + y‖ ≤ ‖x‖ + ‖y‖. (Hint: Cauchy-Schwarz inequality.)

1.3. Let A, B, C be matrices with appropriate dimensions so that the following expressions are well defined. Prove that A(B + C) = AB + AC, A(BC) = (AB)C, (AB)−1 = B−1A−1, and (AB)′ = B′A′.

1.4.

1. Let A be a 2 × 2 block upper triangular matrix

       A = [ A11  A12 ]
           [  0   A22 ].

   If A is invertible, explicitly compute A−1.

2. Repeat the above problem if A is 3 × 3 block upper triangular. What if A is N × N block upper triangular?

1.5. Let A be an M × N matrix and write A = [a1, . . . , aN], where an is the n-th column vector of A. Show that the set of vectors {an}_{n=1}^N is linearly independent if and only if the linear map x ↦ Ax is injective.

1.6. Let A,B,C be real symmetric matrices of the same size.

1. Prove that A ⪰ A (reflexivity).

2. Prove that A ⪰ B and B ⪰ A imply A = B (antisymmetry).

3. Prove that A ⪰ B and B ⪰ C imply A ⪰ C (transitivity).

Hence ⪰ is a partial order for real symmetric matrices.

1.7. Let P be a matrix such that P² = P. Show that the eigenvalues of P are either 0 or 1.

1.8. Let A be real symmetric. Show that A is positive definite if and only if all eigenvalues of A are positive.

1.9. Let A be real symmetric and positive semidefinite. Show that there exists a real symmetric and positive semidefinite matrix B such that A = B².

1.10. Let A be real symmetric with eigenvalues α1, . . . , αN, where |α1| ≤ · · · ≤ |αN|. Let ‖·‖ be the Euclidean norm. Show that for any nonzero vector x ∈ RN, we have |α1| ≤ ‖Ax‖ / ‖x‖ ≤ |αN|.

1.11. Let A be an M × N matrix. Let an be the n-th column vector of A, so A = [a1, . . . , aN]. Show that the matrix A′A is positive definite if and only if the set of vectors {a1, . . . , aN} is linearly independent.

1.12.


1. Let A = (amn) be an upper triangular matrix, i.e., amn = 0 whenever m > n. Prove that the eigenvalues of A are the diagonal entries {ann}_{n=1}^N.

2. Prove that the matrix A = [ 1 1 ; 0 1 ] is not diagonalizable.

1.13. Let ‖·‖ be any norm on RN. For any N × N matrix A, define

    ‖A‖ = sup_{x ≠ 0} ‖Ax‖ / ‖x‖.

Show that ‖A‖ is a matrix norm.

1.14. Let A be a square nonnegative matrix. Show that if z > ρ(A), then the matrix zI − A is regular and (zI − A)−1 is nonnegative. (Hint: let B = (1/z)A and consider the identity (I − B)(I + B + · · · + B^{k−1}) = I − B^k.)

1.15.

1. Let A, B be square nonnegative matrices such that 0 ≤ A ≤ B entry-wise. Show that ρ(A) ≤ ρ(B).

2. Show that if A is a square positive matrix, then ρ(A) > 0.

1.16. Let A be a square nonnegative matrix. If there exists a positive eigenvector x ≫ 0 with eigenvalue α, so Ax = αx, show that α = ρ(A). (Hint: let y > 0 be a left eigenvector corresponding to ρ(A), and multiply y′ from the left to Ax = αx.)

1.17. If α1, . . . , αN are eigenvalues of a square matrix A, for any scalar z, show that the eigenvalues of A + zI are α1 + z, . . . , αN + z. Use this property to fill in the details of the proof of Theorem 1.8.

1.18. Let A = (ann′) be a Metzler matrix such that

    ann = −(1/dn) ∑_{n′≠n} ann′ dn′,

where dn > 0 for all n.

1. Define the vector d = (d1, . . . , dN )′. Show that Ad = 0.

2. Show that the spectral abscissa of A is ζ(A) = 0. (Hint: let y > 0 be a left eigenvector corresponding to ζ(A), and multiply y′ from the left to the identity (A − ζ(A)I)d = −ζ(A)d.)


Chapter 2

Topology in Euclidean Spaces

2.1 Convergence of sequences

By the triangle inequality, the (Euclidean) norm ‖·‖ on RN can be used to define a distance. For x, y ∈ RN, we define the distance between these two points by

dist(x, y) = ‖x− y‖ .

Let {xk}_{k=1}^∞ be a sequence in RN. (Here each xk = (x1k, . . . , xNk) is a vector in RN.) We say that {xk}_{k=1}^∞ converges to x ∈ RN if

    (∀ε > 0)(∃K > 0) k > K =⇒ ‖xk − x‖ < ε,

that is, for any small error tolerance ε > 0, we can find a large enough number K such that the distance between xk and x can be made smaller than the error tolerance ε, provided that the index satisfies k > K. When {xk}_{k=1}^∞ converges to x, we write lim_{k→∞} xk = x or xk → x (k → ∞). Sometimes we are sloppy and write lim xk = x or xk → x. A sequence {xk}_{k=1}^∞ is convergent if it converges to some point.

An acute reader may notice that we have defined the convergence of a sequence using the Euclidean norm, and thus may be worried that a sequence {xk}_{k=1}^∞ may converge to x with respect to the Euclidean norm but not with respect to another norm. A remarkable property of finite dimensional spaces (RN) is that it does not matter which norm we use to define convergence.

Theorem 2.1 (Equivalence of norms in RN). Let ‖·‖1, ‖·‖2 be two norms on RN. Then there exist constants 0 < c ≤ C such that

    c ‖x‖1 ≤ ‖x‖2 ≤ C ‖x‖1                                   (2.1)

for all x ∈ RN. Consequently, a sequence that is convergent with respect to one norm is convergent with respect to any other norm.

The proof of Theorem 2.1 is in Problem 2.5. In general, two norms ‖·‖1 and ‖·‖2 are said to be equivalent if (2.1) holds. Here let us show that the Euclidean


and sup norms

    ‖x‖2 := √(∑_{n=1}^N xn²)   and   ‖x‖∞ := max_n |xn|

are equivalent. Clearly

    ‖x‖2 = √(∑_{n=1}^N xn²) ≥ |xn|

for any n, so taking the maximum over n, we get ‖x‖2 ≥ ‖x‖∞. Furthermore, since by definition |xn| ≤ ‖x‖∞ for all n, we get

    ‖x‖2 = √(∑_{n=1}^N xn²) ≤ √(N ‖x‖∞²) = √N ‖x‖∞.

Therefore

    ‖x‖∞ ≤ ‖x‖2 ≤ √N ‖x‖∞,

so we can take c = 1 and C = √N in (2.1).
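A quick numerical check of these bounds with a random vector (purely illustrative):

    N = 10;
    x = randn(N, 1);                      % arbitrary random vector
    fprintf('%.4f <= %.4f <= %.4f\n', norm(x, Inf), norm(x, 2), sqrt(N) * norm(x, Inf));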

A sequence {xk}_{k=1}^∞ is bounded if there exists b > 0 such that ‖xk‖ ≤ b for all k. More generally, a set A ⊂ RN is bounded if there exists b > 0 such that ‖x‖ ≤ b for all x ∈ A. By Theorem 2.1, it does not matter which norm we use to define bounded sequences or sets.

Proposition 2.2. A convergent sequence is bounded.

Proof. Suppose that xk → x. Setting ε = 1 in the definition of convergence, we can take K > 0 such that ‖xk − x‖ < 1 for all k > K. By the triangle inequality, we have ‖xk‖ ≤ ‖x‖ + 1 for k > K. Therefore

    ‖xk‖ ≤ b := max {‖x1‖, . . . , ‖xK‖, ‖x‖ + 1}.

The sequence {x_{kl}}_{l=1}^∞ is called a subsequence of {xk}_{k=1}^∞ if k1 < k2 < · · · < kl < · · · . The following proposition shows that any subsequence of a convergent sequence converges to the same limit.

Proposition 2.3. If xk → x and {x_{kl}}_{l=1}^∞ is a subsequence of {xk}_{k=1}^∞, then x_{kl} → x.

Proof. By the definition of convergence, for any ε > 0 we can take K > 0 such that ‖xk − x‖ < ε whenever k > K. Since kl ≥ l, it follows that ‖x_{kl} − x‖ < ε whenever l > K, so x_{kl} → x.

Let {xk} ⊂ R be a real sequence. Define αl = sup_{k≥l} xk and βl = inf_{k≥l} xk, possibly ±∞. Clearly {αl} is decreasing and {βl} is increasing, so they have limits α, β in [−∞, ∞] by the continuity property of the real numbers. We write

    lim sup_{k→∞} xk := α = lim_{l→∞} sup_{k≥l} xk,
    lim inf_{k→∞} xk := β = lim_{l→∞} inf_{k≥l} xk,

and call them the limit superior and limit inferior of {xk}, respectively.


2.2 Topological properties

Let F ⊂ RN be a set. F is closed if for any convergent sequence {xk}_{k=1}^∞ in F (meaning that xk ∈ F for all k and xk → x for some x ∈ RN), the limit point belongs to F (meaning that x ∈ F).¹ Intuitively, a closed set is one that includes its own boundary. Thus the set [0, 1] = {x | 0 ≤ x ≤ 1} is closed but (0, 1) = {x | 0 < x < 1} is not.

Let U ⊂ RN be a set. The complement of U, denoted by U^c, is defined by U^c = {x ∈ RN | x ∉ U}. That is, the complement of a set consists of those points that do not belong to the original set. U is said to be open if U^c is closed.² Thus (0, 1) = {x | 0 < x < 1} is open, because its complement (0, 1)^c = (−∞, 0] ∪ [1, ∞) is closed. Intuitively, an open set is one that does not include its own boundary.

A set K ⊂ RN is said to be compact³ if any sequence in K has a convergent subsequence with a limit in K.⁴ That is, K is compact if for any sequence {xk}_{k=1}^∞ ⊂ K, we can find a subsequence {x_{kl}}_{l=1}^∞ and a point x ∈ K such that x_{kl} → x as l → ∞. Compact sets are important because as we see in Theorem 2.5 below, any continuous function attains a maximum and a minimum on a compact set. The following Heine-Borel theorem characterizes compact sets in RN.

Theorem 2.4 (Heine-Borel). K ⊂ RN is compact if and only if it is closed and bounded.

Proof. Suppose that K is compact. Take any convergent sequence {xk} ⊂ K with lim xk = x. Since K is compact, we can take a subsequence {x_{kl}} such that x_{kl} → y for some y ∈ K. But by Proposition 2.3 we get x = y ∈ K, so K is closed. To prove that K is bounded, suppose that it is not. Then for any k we can find xk ∈ K such that ‖xk‖ > k. For any subsequence {x_{kl}}, since ‖x_{kl}‖ > kl → ∞ as l → ∞, {x_{kl}} is not bounded. Hence by Proposition 2.2 {x_{kl}} is not convergent. Since {xk} has no convergent subsequence, K is not compact, which is a contradiction. Hence K is bounded.

Conversely, suppose that K is closed and bounded. Let us show by induction on the dimension N that any bounded sequence in RN has a convergent subsequence. By the remark after Theorem 2.1, we may use the sup norm ‖x‖ = max_n |xn| instead of the Euclidean norm to define convergence. For N = 1, let {xk}_{k=1}^∞ ⊂ [−b, b] be a bounded sequence, where b > 0. Define αl = sup_{k≥l} xk. Since xk ∈ [−b, b], it follows that αl ∈ [−b, b]. Clearly {αl} is a decreasing sequence, so it has a limit α ∈ [−b, b]. For each l, choose kl ≥ l such that |x_{kl} − αl| < 1/l, which is possible by the definition of αl. Then

    |x_{kl} − α| ≤ |x_{kl} − αl| + |αl − α| < 1/l + |αl − α| → 0

as l → ∞, so x_{kl} → α. Therefore {xk}_{k=1}^∞ has a convergent subsequence {x_{kl}}_{l=1}^∞.

¹ The letter F is often used for a closed set since the French word for “closed” is fermé.
² The letters U, V are often used for an open set since the French word for “open” is ouvert, but the letter O is confusing due to the resemblance to 0.
³ The letter K is often used for a compact set since the German word for “compact” is kompakt.
⁴ Strictly speaking, this is the definition of a sequentially compact set, but in RN (or more generally metric spaces) the two concepts are equivalent.


Suppose that the claim is true up to dimension N − 1. Let {xk}_{k=1}^∞ ⊂ [−b, b]^N be a bounded sequence, where xk = (x1k, . . . , xNk). Since {x1k}_{k=1}^∞ ⊂ [−b, b], it has a convergent subsequence {x1k′}. By the induction hypothesis, the sequence of (N − 1)-vectors {(x2k′, . . . , xNk′)} ⊂ [−b, b]^{N−1} has a convergent subsequence {(x2k′′, . . . , xNk′′)}. Since {x1k′′} is a subsequence of {x1k′}, by Proposition 2.3 it is also convergent. Therefore {xk′′} = {(x1k′′, . . . , xNk′′)} ⊂ [−b, b]^N also converges, so {xk}_{k=1}^∞ has a convergent subsequence.

We have shown that any {xk} ⊂ K ⊂ [−b, b]^N has a convergent subsequence {x_{kl}}. Since K is closed, the limit belongs to K. Therefore K is compact.

2.3 Continuous functions

Let U ⊂ RN be a set and f : U → R be a function. We say that f is continuous at x ∈ U if f(xk) → f(x) for any sequence {xk} ⊂ U such that xk → x. We say that f is continuous on U if it is continuous at every point of U. Intuitively, f is continuous if its graph has no gaps. The following theorem is important because it gives a sufficient condition for the existence of a solution to an optimization problem.

Theorem 2.5 (Extreme Value Theorem). If K ⊂ RN is nonempty and compact and f : K → R is continuous, then f(K) is compact. In particular, f attains its maximum and minimum over K.

Proof. Let {yk} ⊂ f(K). Then we can take {xk} ⊂ K such that yk = f(xk) for all k. Since K is compact, we can take a subsequence {x_{kl}}_{l=1}^∞ such that x_{kl} → x ∈ K. Then by the continuity of f, we have y_{kl} = f(x_{kl}) → f(x) ∈ f(K), so f(K) is compact.

Since f(K) is compact, by Theorem 2.4 it is bounded. Hence M := sup f(K) < ∞. Repeating the above argument with {yk} such that yk → M, it follows that M = lim y_{kl} = lim f(x_{kl}) = f(x), so f attains its maximum. The case for the minimum is similar.

With applications in mind, it is useful to allow some discontinuous functions and functions that take values ±∞. We say that f : RN → [−∞, ∞] is lower semi-continuous at x if for any xk → x we have f(x) ≤ lim inf_{k→∞} f(xk). We say that f is upper semi-continuous if f(x) ≥ lim sup_{k→∞} f(xk). Clearly f is upper semi-continuous if and only if −f is lower semi-continuous.

The following theorem generalizes Theorem 2.5.

Theorem 2.6 (Extreme Value Theorem for semi-continuous functions). Let K be compact and f : K → [−∞, ∞] be lower (upper) semi-continuous. Then f attains a minimum (maximum) over K.

Proof. We show only the case where f is lower semi-continuous. If f(x) = −∞ for some x ∈ K or f(x) = ∞ for all x ∈ K, there is nothing to prove. Hence assume that f(x) > −∞ for all x ∈ K and f(x) < ∞ for some x ∈ K. Let m = inf_{x∈K} f(x). Take a sequence {xk} ⊂ K such that f(xk) → m. Since K is compact, there is a subsequence such that x_{kl} → x for some x ∈ K. Since f is lower semi-continuous, we get

    m ≤ f(x) ≤ lim inf_{l→∞} f(x_{kl}) = m,

so −∞ < f(x) = m.


Problems

2.1.

1. Let {Fi}_{i∈I} ⊂ RN be a collection of closed sets. Prove that ⋂_{i∈I} Fi is closed.

2. Let A ⊂ RN be any set. Prove that there exists a smallest closed set that includes A. (We denote this set by cl A and call it the closure of A.)

3. Prove that there exists a largest open subset of A. (We denote this set by int A and call it the interior of A.)

2.2. Let B(x, ε) = {y ∈ RN | ‖y − x‖ < ε} be the open ball with center x and radius ε.

1. Prove that U is open if and only if for any x ∈ U, there exists ε > 0 such that B(x, ε) ⊂ U.

2. Prove that if U1, U2 are open, so is U1 ∩ U2. Prove that if F1, F2 are closed, so is F1 ∪ F2.

3. Let A, B be any sets. Prove that int(A ∩ B) = int A ∩ int B and cl(A ∪ B) = cl A ∪ cl B.

2.3. In the proof of the Heine-Borel theorem (Theorem 2.4), we used the fact that any bounded set A ⊂ R has a supremum α = sup A, known as the axiom of continuity. Using this axiom, prove that any bounded monotone sequence is convergent (which we also used in the proof).

2.4. A sequence {xk}_{k=1}^∞ ⊂ RN is said to be Cauchy if

    ∀ε > 0, ∃K > 0, k, l > K =⇒ ‖xk − xl‖ < ε,

that is, the terms with sufficiently large indices are arbitrarily close to each other.

1. Prove that a Cauchy sequence is bounded.

2. Prove that a Cauchy sequence converges. (Hint: Heine-Borel theorem. This property is called the completeness of RN.)

2.5. This problem asks you to prove Theorem 2.1.

1. For any norm ‖·‖ on RN, define f : RN → R by f(x) = ‖x‖. Show that f is continuous, where we define convergence of sequences using the sup norm ‖·‖∞. (Hint: express x = (x1, . . . , xN)′ as x = ∑_{n=1}^N xn en, where en is a unit vector, and use the triangle inequality.)

2. Define the set K = {x ∈ RN | ‖x‖∞ = 1}. Show that K is nonempty and compact.

3. Define g : K → R by g(x) = f(x)/ ‖x‖∞. Show that g is continuous.


4. Show that there exist constants 0 < c ≤ C such that

c ‖x‖∞ ≤ ‖x‖ ≤ C ‖x‖∞

for all x ∈ RN .

5. Prove Theorem 2.1.

2.6. If X is a vector space, a set Y ⊂ X is called a subspace of X if Y is itself a vector space. This problem asks you to prove that any subspace Y of X = RN (or more generally, a subspace of any finite-dimensional space) is closed.

Since Y is a subspace of RN, we can take a basis {an}_{n=1}^N of RN such that {an}_{n=1}^M is a basis of Y, where M = dim Y ≤ N. For any x ∈ RN, there exist unique numbers x1, . . . , xN such that x = ∑_{n=1}^N xn an. Define

    ‖x‖ = max_n |xn|.

1. Prove that ‖·‖ is a norm on RN .

2. Show that Y is closed. (Hint: use Theorem 2.1.)

3. Let X be the space of bounded continuous functions defined on [−1, 1] and Y ⊂ X be the space of polynomials. (Convergence is defined by the sup norm ‖f‖ = sup_{x∈[−1,1]} |f(x)|.) Show that Y is not closed. Hence the assumption of finite dimension is essential.

2.7. Let f : (a, b)→ R be increasing, so a < x1 ≤ x2 < b implies f(x1) ≤ f(x2).

1. Show that for each x ∈ (a, b),

       g±(x) := lim_{h→±0} f(x + h)

   exist, and that g−(x) ≤ g+(x) for all x ∈ (a, b).

2. Show that f is continuous on (a, b) except at at most countably many points. (Hint: if g−(x) < g+(x), then there is a rational number in between.)


Chapter 3

One-Variable Optimization

3.1 A motivating example

Suppose you are the owner of a firm that produces a good. It costs c ≥ 0 dollars per unit of good produced. If you produce more, in general you will need to charge a lower price in order to sell everything, so assume the price at which you can sell x units of the good is p(x) = a − bx, with a, b > 0. Then what is the optimal production plan? This is the type of problem you will learn to solve in this course.

This particular problem can be solved using only high school mathematics. If you produce x, by assumption the revenue is price times quantity, so

p(x)x = (a− bx)x.

Also, the cost is cx. Therefore the profit is

f(x) = p(x)x− cx = (a− bx)x− cx = −bx2 + (a− c)x.

One way to maximize this objective function is to complete the square:

    f(x) = −bx² + (a − c)x = −b(x − (a − c)/(2b))² + (a − c)²/(4b).

Since the first term is nonpositive and the second term does not depend on x, assuming a > c the optimal production level is x = (a − c)/(2b) (which makes the first term exactly zero) and the maximum profit is f(x) = (a − c)²/(4b).
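For concreteness, the closed-form solution can be compared with a numerical maximization in Matlab; the parameter values below are arbitrary illustrations chosen so that a > c.

    a = 10; b = 2; c = 4;                         % illustrative parameters with a > c
    f = @(x) -b*x.^2 + (a - c)*x;                 % profit function
    x_star = (a - c) / (2*b);                     % closed-form maximizer
    % fminbnd minimizes, so maximize f by minimizing -f on [0, a/b]
    x_num = fminbnd(@(x) -f(x), 0, a/b);
    fprintf('closed form: %.4f, numerical: %.4f, profit: %.4f\n', ...
            x_star, x_num, f(x_star));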

This profit maximization problem has a few typical features. First, the objective function

    f(x) = −bx² + (a − c)x

is a nonlinear function of the variable x. (In this case, it is a quadratic function.) Second, since you cannot produce a negative amount of the good, implicitly there is the constraint x ≥ 0.

In this course we will learn how to solve these types of problems: nonlinear constrained optimization problems. Since the objective function is nonlinear, the technique of linear programming does not apply. We will go step-by-step. First we will consider unconstrained optimization problems with a single variable and proceed to constrained optimization problems with a single or multiple variables.


3.2 One-variable calculus

A powerful tool for solving nonlinear optimization problems is calculus. In the motivating example above, we were able to solve the problem without using calculus because the objective function was quadratic and we could complete the square. Such a trick does not apply in general.

3.2.1 Differentiation

Differentiation (taking derivatives) is basically approximating a nonlinear function by a linear one. Suppose we want to approximate a function f(x) by a linear function around the point x = a, so

f(x) ≈ p(x− a) + q

for some numbers p, q. The approximation should be exact at x = a, so substituting x = a we must have q = f(a). Subtracting q and dividing by x − a (when x ≠ a), we get

    p ≈ (f(x) − f(a))/(x − a).

Since the approximation is for x close to a, it makes sense to define p by the limit of (f(x) − f(a))/(x − a) as x approaches a (we write this as x → a). The quantity

    p = f′(a) := lim_{x→a} (f(x) − f(a))/(x − a)

is called the derivative of f(x) at x = a. Letting x = a + h with h ≠ 0, we can also write

    f′(a) = lim_{h→0} (f(a + h) − f(a))/h.

Example 3.1. Let f(x) = x. Then

    f′(a) = lim_{h→0} ((a + h) − a)/h = lim_{h→0} h/h = lim_{h→0} 1 = 1.

Example 3.2. Let f(x) = x². Then

    f′(a) = lim_{h→0} ((a + h)² − a²)/h = lim_{h→0} (2ah + h²)/h = lim_{h→0} (2a + h) = 2a.
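These limits can be approximated numerically by taking a small h in the difference quotient; the evaluation point and step size below are arbitrary.

    f = @(x) x.^2;                         % the function from Example 3.2
    a = 3;  h = 1e-6;
    approx = (f(a + h) - f(a)) / h;        % difference quotient
    fprintf('approximate derivative %.6f vs exact 2a = %.6f\n', approx, 2*a);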

If the derivative of f exists at every point, then the function f is called differentiable. The derivative of f at x is denoted by f′(x). The derivative f′(x) is itself another function. If f′ is continuous, then f is called continuously differentiable, or simply a C1 function. If f′(x) is again differentiable, then its derivative is denoted by f′′(x) and is called the second derivative of f. You can define f′′′(x), f′′′′(x), etc. in the same way. The n-th derivative of f is usually denoted by f^(n)(x). If f is n times differentiable and f^(n)(x) is continuous (“n times continuously differentiable”), then f is called a Cn function.

The following proposition shows why calculus is a powerful tool.

Proposition 3.1. Consider the optimization problem

maximize f(x),

where f is differentiable. If x is a solution, then f ′(x) = 0.


Proof. Take any h > 0. Since x attains the maximum of f , we have

f(x+ h) ≤ f(x).

Subtracting f(x) and dividing by h > 0, we get

(f(x + h) − f(x))/h ≤ 0.

Letting h → 0 and using the definition of the derivative, we get f′(x) ≤ 0. By considering the case h < 0, we can show f′(x) ≥ 0. Therefore f′(x) = 0.

Proposition 3.1 says that in order to maximize a differentiable function (withno constraints), it is necessary that the derivative is zero.

Example 3.3. Consider the motivating example above. The objective function is f(x) = −bx^2 + (a − c)x. The derivative is

f ′(x) = −2bx+ a− c.

By Proposition 3.1, the solution x must satisfy

f′(x) = −2bx + a − c = 0 ⇐⇒ x = (a − c)/(2b).

However, setting the derivative to zero is not sufficient in general, as thefollowing example shows.

Example 3.4. Let f(x) = x3 − 12x. Then

f ′(x) = 3x2 − 12 = 3(x− 2)(x+ 2),

so f′(x) = 0 ⇐⇒ x = ±2. Now f(±2) = ∓16. But f(±5) = ±65, so x = ±2 are neither the minimum nor the maximum of f.

3.2.2 Mean value theorem and Taylor’s theorem

Let f be a differentiable function. By definition, f′(a) is the limit of (f(b) − f(a))/(b − a)—the slope between the points (a, f(a)) and (b, f(b))—as b approaches a. Is there an exact relationship between f′ and arbitrary b? The mean value theorem gives an answer.

Proposition 3.2 (Mean value theorem). Let f be continuous on [a, b] and differentiable on (a, b). Then there exists c ∈ (a, b) such that

(f(b) − f(a))/(b − a) = f′(c).

Proof. Let

φ(x) = f(x) − f(a) − ((f(b) − f(a))/(b − a))(x − a).

By direct substitution, we can show φ(a) = φ(b) = 0. If φ ≡ 0 on [a, b], then

0 = φ′(x) = f′(x) − (f(b) − f(a))/(b − a)


on (a, b), so we can take any c ∈ (a, b). Suppose there exists x ∈ [a, b] such that φ(x) > 0. Since φ is continuous, by the extreme value theorem it attains the maximum at some point c ∈ [a, b]. Since φ(a) = φ(b) = 0 and φ takes a positive value, it must be that c ∈ (a, b). By Proposition 3.1, we have

0 = φ′(c) = f′(c) − (f(b) − f(a))/(b − a) ⇐⇒ (f(b) − f(a))/(b − a) = f′(c).

The proof is similar if φ takes a negative value.

Remember that differentiation is basically a linear approximation:

f(x) ≈ f(a) + f ′(a)(x− a).

Changing the notation in the mean value theorem such that b = x and c = ξ,we obtain

f′(ξ) = (f(x) − f(a))/(x − a) ⇐⇒ f(x) = f(a) + f′(ξ)(x − a).

There is no reason to stop at a linear (first-order) approximation. If, for example,we continue to a quadratic (second-order) approximation, we can show that foreach x, there exists a number ξ between a and x such that

f(x) = f(a) + f′(a)(x − a) + (1/2)f′′(ξ)(x − a)^2.

In general, by increasing the order of the polynomial approximation, we canprove the following Taylor’s theorem (proof in Problem 3.7):

Proposition 3.3 (Taylor’s theorem). Let f be n times differentiable. Then foreach x, there exists a number ξ between a and x such that

f(x) = f(a) + f′(a)(x − a) + · · · + (1/(n − 1)!) f^(n−1)(a)(x − a)^(n−1) + (1/n!) f^(n)(ξ)(x − a)^n.

Here n! = n× (n− 1)× · · · × 2× 1 is the n factorial.

The proof is in Problem 3.7.
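To see Taylor polynomials at work, the short sketch below (mine, not from the text) compares the polynomial part of the expansion for f(x) = e^x around a = 0 with the true value as the order grows; the choice of f and the evaluation point are illustrative assumptions.

import math

def taylor_exp(x, n):
    # Sum of the first n terms x^k / k!, i.e. the polynomial part of
    # Proposition 3.3 applied to f = exp and a = 0.
    return sum(x**k / math.factorial(k) for k in range(n))

x = 0.5
for n in (1, 2, 3, 5):
    print(n, taylor_exp(x, n), math.exp(x))   # approximation improves with n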

3.3 Convex functions

Proposition 3.1 tells us that if a function is differentiable, the derivative is zero at the optimum (maximum or minimum). Therefore, setting the derivative to zero (first-order condition) is a necessary condition for optimality. Is there a sufficient condition for optimality? The answer is yes: there is a special but large enough class of functions such that the first-order condition is also sufficient.

A function f is said to be convex if for any x1, x2 and 0 ≤ α ≤ 1 we have

f((1− α)x1 + αx2) ≤ (1− α)f(x1) + αf(x2).

Graphically, a function is convex if the segment joining the points (x1, f(x1)) and (x2, f(x2)) lies above the graph of f (Figure 3.1). f is strictly convex if the inequality is strict for 0 < α < 1. f is concave if −f is convex.


[Figure 3.1: Convex function.]

As shown in Problems 3.10 and 3.11, a twice continuously differentiable function f is convex if and only if the second derivative is nonnegative, so f′′(x) ≥ 0. The intuitive explanation is as follows. When f′′(x) ≥ 0, then f′(x)—the derivative or the slope of f—is increasing. Therefore if you imagine flying along the graph of f, you will be constantly turning upwards. Therefore the segment that joins any two points on the trajectory must lie above the actual trajectory.
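The defining inequality can also be probed directly by random sampling. The rough sketch below (my own heuristic, not from the text) tests the convexity inequality at many sampled triples (x1, x2, α); passing the test does not prove convexity, but a single violation disproves it.

import numpy as np

def looks_convex(f, lo=-5.0, hi=5.0, n=200):
    # Check f((1-alpha) x1 + alpha x2) <= (1-alpha) f(x1) + alpha f(x2)
    # at randomly sampled points; return False on the first violation.
    rng = np.random.default_rng(0)
    for _ in range(n):
        x1, x2 = rng.uniform(lo, hi, size=2)
        alpha = rng.uniform()
        lhs = f((1 - alpha) * x1 + alpha * x2)
        rhs = (1 - alpha) * f(x1) + alpha * f(x2)
        if lhs > rhs + 1e-9:
            return False
    return True

print(looks_convex(lambda x: x**2))    # True  (convex)
print(looks_convex(lambda x: -x**2))   # False (concave, not convex)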

The following proposition shows that setting the derivative to zero is suffi-cient for optimization when the objective function is convex or concave.

Proposition 3.4. Let f be twice differentiable and convex (concave). If f ′(x) =0, then x is the minimum (maximum) of f .

Proof. Suppose that f is convex, so f′′ ≥ 0, and let x̄ denote the point with f′(x̄) = 0. Applying Taylor's theorem (Proposition 3.3) with a = x̄ and n = 2, for any x there exists ξ such that

f(x) = f(x̄) + f′(x̄)(x − x̄) + (1/2)f′′(ξ)(x − x̄)^2.

Since by assumption f′(x̄) = 0 and f′′(ξ) ≥ 0, we obtain f(x) ≥ f(x̄). Therefore x̄ is the minimum of f. A similar argument holds when f is concave.

Example 3.5. Consider the motivating example above. The objective function is f(x) = −bx^2 + (a − c)x and the first derivative is f′(x) = −2bx + a − c. Since the second derivative is

f ′′(x) = −2b < 0,

f is concave. Therefore

f′(x) = −2bx + a − c = 0 ⇐⇒ x = (a − c)/(2b)

is the maximum of f .

Problems

3.1. Using the definition, compute the derivative of the following functions.


1. f(x) = x^3.

2. f(x) = x^4.

3. f(x) = x^n, where n is a natural number. (Hint: binomial theorem.)

4. f(x) = 1/x.

5. f(x) = √x.

3.2. Let f, g be differentiable and α be a real number. Show that

1. (f(x) + g(x))′ = f ′(x) + g′(x),

2. (αf(x))′ = αf ′(x),

3. (f(x)g(x))′ = f ′(x)g(x) + f(x)g′(x),

4. (g(f(x)))′ = g′(f(x))f ′(x) (chain rule).

3.3. The exponential function is defined by

e^x = 1 + x + (1/2)x^2 + (1/6)x^3 + · · · = ∑_{n=0}^∞ (1/n!)x^n,

where e = 2.718281828 . . . . It satisfies e^(x+y) = e^x e^y and (e^x)′ = e^x. The logarithmic function is the inverse function of the exponential, so e^(log x) = x and log e^x = x. Using the chain rule, show that

1. (log x)′ = 1/x,

2. (x^α)′ = αx^(α−1).

3.4. Define f(x) by f(x) = x^2 sin(1/x) if x ≠ 0 and f(0) = 0.

1. Compute f′(x) when x ≠ 0.

2. Using the definition, compute f ′(0).

3. Show that f is differentiable but not continuously differentiable.

3.5.

1. Fill in the details of the proof of Proposition 3.1.

2. Show that Proposition 3.1 also holds for minimization.

3.6.

1. Let f : (a, b) → R be differentiable and f′ > 0. Show that f is strictly increasing, i.e., x1 < x2 implies f(x1) < f(x2).

2. Let f : [a, b] → R be continuous and differentiable on (a, b). Let x̄ ∈ [a, b] be a maximum of f, which exists by Theorem 2.5. If f′(x) > 0 for x sufficiently close to a, show that x̄ ≠ a.

3.7. This problem asks you to prove Taylor's theorem. Let a ≠ b and f : [a, b] → R. Suppose that f is n times differentiable and f^(k)(x) is continuous on [a, b] for k = 1, . . . , n − 1.


1. Define the polynomial P(x) = ∑_{k=0}^{n−1} (f^(k)(a)/k!)(x − a)^k, let M = (f(b) − P(b))/(b − a)^n, and

φ(x) = f(x) − P(x) − M(x − a)^n.

Show that φ(a) = φ′(a) = · · · = φ^(n−1)(a) = 0 and φ^(n)(x) = f^(n)(x) − n!M.

2. Prove Taylor’s theorem.

3.8. For each of the following functions, show whether it is convex, concave, or neither.

1. f(x) = 10x − x^2,

2. f(x) = x^4 + 6x^2 + 12x,

3. f(x) = 2x^3 − 3x^2,

4. f(x) = x^4 + x^2,

5. f(x) = x^3 + x^4,

6. f(x) = e^x,

7. f(x) = log x (x > 0),

8. f(x) = x log x (x > 0),

9. f(x) = x^α, where α ≠ 0 and x > 0. (Hint: there are a few cases to consider.)

3.9. Suppose that you are running a firm that produces an output good using an input good. When the input is x, the output is 2√x. Suppose that the price of the input is c and the price of the output is p. Compute the input level that maximizes the profit.

The following two exercises ask you to show that a twice differentiable function is convex if and only if the second derivative is nonnegative.

3.10. Let f be differentiable.

1. Fix x ≠ y and let g(t) = (f((1 − t)x + ty) − f(x))/t, where t > 0. For 0 < s < t, show that

g(s) ≤ g(t) ⇐⇒ f((1 − s)x + sy) ≤ (1 − s/t) f(x) + (s/t) f((1 − t)x + ty).

2. Show that the function g is increasing if and only if f is convex.

3. Compute g(1) and limt→0 g(t).

4. Show that f is convex if and only if

f(y)− f(x) ≥ f ′(x)(y − x)

for all x, y.


3.11. Using Taylor’s theorem and the previous exercise, show that a twicecontinuously differentiable function f is convex if and only if f ′′(x) ≥ 0 for allx.

3.12. Prove Proposition 3.4 assuming only that f is differentiable (but not necessarily twice differentiable). (Hint: Problem 3.10.)

3.13. Let f be strictly convex. If f has a minimum, show that it is unique. (Hint: assume there are two minima x1, x2 and derive a contradiction using the definition of convexity.)


Chapter 4

Multi-Variable Calculus

4.1 A motivating example

Suppose you are the owner of a firm that produces two goods. The unit prices of goods 1 and 2 are p1 and p2, respectively. To produce x1 units of good 1 and x2 units of good 2, it costs

c(x1, x2) = (1/2)(x1^2 + x2^2).

What is the optimal production plan?

This problem can be solved using only high school algebra. If you produce

(x1, x2), the profit is

f(x1, x2) = p1x1 + p2x2 − c(x1, x2) = p1x1 + p2x2 − (1/2)(x1^2 + x2^2).

Since f is a quadratic function, you can complete the squares:

f(x1, x2) = −(1/2)(x1 − p1)^2 − (1/2)(x2 − p2)^2 + (1/2)(p1^2 + p2^2),

so the optimal plan is (x1, x2) = (p1, p2), with maximum profit (1/2)(p1^2 + p2^2).
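The same answer can be recovered by a general-purpose numerical optimizer. The sketch below (mine, with assumed prices p1 = 3 and p2 = 5) minimizes the negative profit and should return approximately (p1, p2).

import numpy as np
from scipy.optimize import minimize

p = np.array([3.0, 5.0])                    # illustrative prices, not from the text

def negative_profit(x):
    # Minimizing the negative profit is equivalent to maximizing the profit.
    return -(p @ x - 0.5 * (x @ x))

res = minimize(negative_profit, x0=np.zeros(2))
print(res.x)        # approximately (3, 5) = (p1, p2)
print(-res.fun)     # approximately (p1^2 + p2^2)/2 = 17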

Many practical problems are optimization problems that involve two or more variables, as in this example. In the previous chapter, we saw that calculus is a powerful tool for solving one-variable optimization problems. The same is true for the multi-variable case. This chapter introduces the basics of multi-variable calculus.

4.2 Differentiation

Consider a function of two variables, f(x1, x2). In Chapter 3, we motivated differentiation by a linear approximation. The same is true for functions of two or more variables. Suppose we want to approximate f(x1, x2) by a linear function around the point (x1, x2) = (a1, a2), so

f(x1, x2) ≈ p1(x1 − a1) + p2(x2 − a2) + q (4.1)


for some numbers p1, p2, q. The approximation should be exact at (x1, x2) = (a1, a2), so substituting (x1, x2) = (a1, a2) we must have q = f(a1, a2). The values of p1, p2 should be such that as (x1, x2) approaches (a1, a2), the approximation should get better and better. Therefore subtracting f(a1, a2) from both sides of (4.1) and letting x2 = a2 and x1 → a1, it must be

p1 = ∂f/∂x1 (a1, a2) := lim_{x1→a1} (f(x1, a2) − f(a1, a2))/(x1 − a1).

This quantity is called the partial derivative of f with respect to x1 (evaluated at (a1, a2)). A partial derivative, as the name suggests, is just a derivative of a function with respect to one variable, fixing all other variables. Intuitively, the partial derivative is the rate of change (slope) of the function in the direction of one particular coordinate. By a similar argument, we obtain

p2 = ∂f/∂x2 (a1, a2) := lim_{x2→a2} (f(a1, x2) − f(a1, a2))/(x2 − a2),

the partial derivative of f with respect to x2.

If you know how to take the derivative of a one-variable function, computing partial derivatives of a multi-variable function is straightforward: you just pretend that all variables except one are constants.

Example 4.1. Let f(x1, x2) = x1 + 2x2 + 3x1^2 + 4x1x2 + 5x2^2. Then

∂f/∂x1 (x1, x2) = 1 + 6x1 + 4x2,    ∂f/∂x2 (x1, x2) = 2 + 4x1 + 10x2.
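Partial derivatives can be checked numerically by differencing in one coordinate while holding the other fixed. The following sketch (my own, evaluated at the assumed point (1, 2)) verifies Example 4.1 with central differences.

def f(x1, x2):
    return x1 + 2*x2 + 3*x1**2 + 4*x1*x2 + 5*x2**2

def partial(f, i, x, h=1e-6):
    # Central difference in coordinate i, all other coordinates held fixed.
    x_up, x_dn = list(x), list(x)
    x_up[i] += h
    x_dn[i] -= h
    return (f(*x_up) - f(*x_dn)) / (2 * h)

point = (1.0, 2.0)
print(partial(f, 0, point))   # about 1 + 6*1 + 4*2 = 15
print(partial(f, 1, point))   # about 2 + 4*1 + 10*2 = 26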

A function is said to be partially differentiable if the partial derivatives exist. If a function is partially differentiable and the partial derivatives are continuous, we call it a C^1 function. In general, a C^r function means that you can partially differentiate r times (with an arbitrary choice of variables) and the resulting function is continuous. A function is said to be differentiable if the linear approximation (4.1) becomes exact as the point (x1, x2) gets closer to (a1, a2), so formally

f(a1 + h1, a2 + h2)− f(a1, a2) = p1h1 + p2h2 + ε(h1, h2) (4.2)

with ε(h1, h2)/√(h1^2 + h2^2) → 0 as (h1, h2) → (0, 0), where p1, p2 are the partial derivatives. It is known that a C^1 function is differentiable (Problem 4.4).

In the one-variable case, the derivative of a function at a minimum or a maximum is zero. The same is true for partial derivatives of a multi-variable function. We omit the proof because it is essentially the same as the one-variable case.

Proposition 4.1. Consider the optimization problem

maximize f(x1, x2),

where f is partially differentiable. If (x1, x2) is a solution, then ∂f/∂x1 (x1, x2) = ∂f/∂x2 (x1, x2) = 0.


Example 4.2. Consider the motivating example. Then

∂f/∂x1 = p1 − x1,    ∂f/∂x2 = p2 − x2,

so ∂f/∂x1 = ∂f/∂x2 = 0 implies (x1, x2) = (p1, p2), which maximizes the profit.

4.3 Vector notation and gradient

Equation (4.2) shows that the difference in f is approximately a linear function of the differences in the coordinates, p1h1 + p2h2. Define the vectors a, p, h by

a = (a1, a2)′, p = (p1, p2)′, and h = (h1, h2)′. If you remember the definition of the inner product,1 (4.2) can be compactly written as

f(a+ h)− f(a) = p · h+ ε(h)

with ε(h)/‖h‖ → 0 as h → 0, where ‖h‖ = √(h · h) = √(h1^2 + h2^2) is the (Euclidean) norm of the vector h. The vector of partial derivatives,

∇f(a) := (p1, p2)′ = (∂f/∂x1 (a1, a2), ∂f/∂x2 (a1, a2))′,

is called the gradient of f at (a1, a2). (You read the symbol ∇ “nabla”.) The above equation then becomes

f(a+ h)− f(a) = ∇f(a) · h+ ε(h).

Example 4.3. Let f(x1, x2) = x1^2 x2^3. Then

∇f(x1, x2) = (∂f/∂x1, ∂f/∂x2)′ = (2x1x2^3, 3x1^2x2^2)′.

Using the gradient, Proposition 4.1 simplifies as follows: if x is a solution of the optimization problem

maximize f(x),

where f is partially differentiable, then ∇f(x) = 0.2 The same is true for minimization.

My experience tells me that when students are first introduced to the vector notation, they are overwhelmed by the “abstract” nature. It is true that imagining a vector requires more mental effort than imagining a real number. However, the vector notation has two important advantages over the component-wise notation. First, since you don't need to write down all the components, it saves space and you can focus on the substantive content. Second, since the vector notation applies to any dimension (1, 2, . . . ), you can develop a single theory

1The inner product is also called the dot product, although inner product is more common.

2We use the letter 0 to denote the zero vector (0, 0)′.


that applies to all cases. For these reasons, you should get used to the vector notation. Whenever you think it is too abstract, consider the two-dimensional case for concreteness.

Intuitively, the gradient ∇f(a) is the direction in which the function increases fastest at the point a. To see this, take any vector d and evaluate the value of f along the straight line x = a + td that passes through the point a and points in the direction d, where t is a parameter. The value is then f(a + td). The slope of f along this line is

lim_{t→0} (f(a + td) − f(a))/t = ∇f(a) · d,

which can be shown using the chain rule (Proposition 4.3). This quantity is known as the directional derivative of f (with direction d). In particular, if d is a unit vector (say d = (1, 0)′ or (0, 1)′ in the two-dimensional case), then the directional derivative is a partial derivative. Assuming d has length 1 (so ‖d‖ = 1) and applying the Cauchy-Schwarz inequality x · y ≤ ‖x‖‖y‖ to x = ∇f(a) and y = d, it follows that

∇f(a) · d ≤ ‖∇f(a)‖ ‖d‖ = ‖∇f(a)‖ , (4.3)

with equality if d is parallel to ∇f(a), so d = ∇f(a)/‖∇f(a)‖. This inequality shows that the directional derivative (the rate of change of the function) is maximum when the direction is that of the gradient. In other words, an interpretation of the gradient ∇f(a) is the direction of steepest ascent of f at x = a. Similarly, −∇f(a) is the direction of steepest descent of f at x = a.

Example 4.4. Let f(x1, x2) = √(x1^2 + x2^2). Letting x1 = r cos θ and x2 = r sin θ, we have f(x1, x2) = r, so f is constant along a circle and increases away from the circle. Therefore at any point x = (x1, x2), f increases fastest along the radius joining the origin and x. In fact, the gradient is

∇f(x1, x2) = (x1/√(x1^2 + x2^2), x2/√(x1^2 + x2^2))′ = (cos θ, sin θ)′,

which points in the direction of the radius.
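The steepest-ascent interpretation can be illustrated numerically: among many unit directions d, the directional derivative ∇f(a) · d is largest when d points along the gradient. The sketch below (mine, using the function of Example 4.3 at an assumed point) does exactly that.

import numpy as np

def grad_f(x):
    # Gradient of f(x1, x2) = x1^2 x2^3 (Example 4.3).
    return np.array([2*x[0]*x[1]**3, 3*x[0]**2*x[1]**2])

a = np.array([1.0, 2.0])
g = grad_f(a)
angles = np.linspace(0, 2*np.pi, 361)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])   # unit directions
best = dirs[np.argmax(dirs @ g)]                           # direction with largest slope
print(best, g / np.linalg.norm(g))   # nearly identical: steepest ascent is grad/||grad||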

4.4 Mean value theorem and Taylor’s theorem

In one-variable optimization problems, the mean value theorem (Proposition3.2) and Taylor’s theorem (Proposition 3.3) are useful to characterize the solu-tion. The same is true with multiple variables.

Proposition 4.2 (Mean value theorem). Let f be differentiable. For any vec-tors a, b, there exists a number 0 < θ < 1 such that

f(b)− f(a) = 〈∇f((1− θ)a+ θb), b− a〉 .

Here 〈x, y〉 = x · y = x1y1 + · · · + xNyN is another notation for the inner product. The proof is in Problem 4.3. The mean value theorem for the one-variable case says that there exists a number c between a and b such that

(f(b) − f(a))/(b − a) = f′(c).


Multiplying both sides by b − a and choosing 0 < θ < 1 such that c = (1 − θ)a + θb (which is possible because c is between a and b), we get

f(b)− f(a) = f ′((1− θ)a+ θb)(b− a).

Therefore the multi-variable version of the mean value theorem is a generaliza-tion of the one-variable case.

Taylor’s theorem also generalizes to the multi-variable case. Suppose youwant to approximate f(x) around x = a. Let h = x − a, and consider theone-variable function g(t) = f(a+ th). Then g(0) = f(a) and g(1) = f(x). Nowapply Taylor’s theorem to the one-variable function g(t) and set t = 1. The re-sult is Taylor’s theorem for the multi-variable function f(x). The multi-variableversion of Taylor’s theorem is most useful in the second-order approximation.The result is

f(x) = f(a) + 〈∇f(a), x − a〉 + (1/2)〈x − a, ∇2f(ξ)(x − a)〉,

where ξ = (1 − θ)a + θx for some 0 < θ < 1. Here ∇2f is the matrix of second partial derivatives of f, which is known as the Hessian:

∇2f = [∂2f/∂x1^2  ∂2f/∂x1∂x2; ∂2f/∂x2∂x1  ∂2f/∂x2^2].

In general, the (m, n) element of the Hessian ∇2f is ∂2f/∂xm∂xn. Although we do not prove it, for C^2 functions, the order of the partial derivatives can be exchanged: ∂2f/∂x2∂x1 = ∂2f/∂x1∂x2.

Example 4.5. Consider the motivating example. Then

∂2f/∂x1^2 = −1,    ∂2f/∂x1∂x2 = 0,    ∂2f/∂x2^2 = −1,

so the Hessian is

∇2f(x) = [−1 0; 0 −1].
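When no closed form is at hand, the Hessian can be approximated by second differences. The sketch below (my own; the prices p1 = 3, p2 = 5 and evaluation point are assumptions) recovers the constant Hessian of the two-good profit function from Section 4.1.

import numpy as np

def hessian(f, x, h=1e-4):
    # Second-order central differences; entry (i, j) approximates d^2 f / dx_i dx_j.
    n = len(x)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.eye(n)[i], np.eye(n)[j]
            H[i, j] = (f(x + h*e_i + h*e_j) - f(x + h*e_i - h*e_j)
                       - f(x - h*e_i + h*e_j) + f(x - h*e_i - h*e_j)) / (4*h*h)
    return H

p1, p2 = 3.0, 5.0
f = lambda x: p1*x[0] + p2*x[1] - 0.5*(x[0]**2 + x[1]**2)
print(hessian(f, np.array([1.0, 1.0])))   # approximately [[-1, 0], [0, -1]]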

4.5 Chain rule

Let me convince you that the vector and matrix notation is quite useful by proving the chain rule. Instead of a real valued function of several variables, consider a vector valued function, for example

f(x) = (f1(x1, x2), f2(x1, x2))′.

Here the variables are x1, x2. f(x) is a two-dimensional vector with first component f1(x1, x2) and second component f2(x1, x2). More generally, you can consider f : RN → RM, where N is the dimension of the domain (variables) and M is the dimension of the range (value). In the example above, we have


M = N = 2. Such a function f is differentiable at the point a = (a1, . . . , aN )′

if there exists an M ×N matrix A and a function ε(h) such that

f(a+ h)− f(a) = Ah+ ε(h)

with ε(h)/‖h‖ → 0 as h → 0, where h = (h1, . . . , hN)′. Setting hn = 0 for all but one n, dividing by hn ≠ 0, taking the limit as hn → 0, and comparing the m-th component of both sides, you can show that the (m, n) component of the matrix A is the partial derivative ∂fm/∂xn (a). The matrix A is called the Jacobian

of f at a, and is often denoted by Df(a). In particular, if the dimension of the range is M = 1 (so f is a real valued function), then

Df(a) = [∂f/∂x1 (a) · · · ∂f/∂xN (a)],

the 1 × N matrix obtained by transposing the gradient ∇f(a).

With these notations, we can now state and prove the chain rule.

Proposition 4.3. Let f : RN → RM be differentiable at a and g : RM → RL be differentiable at b = f(a). Then g ∘ f : RN → RL defined by (g ∘ f)(x) = g(f(x)) is differentiable at a and

D(g ∘ f)(a) = Dg(b)Df(a). (4.4)

Proof. By definition, there exists an M ×N matrix A and a function ε(h) suchthat

f(a+ h)− f(a) = Ah+ ε(h)

with ε(h)/ ‖h‖ → 0 as h→ 0. Similarly, there exists an L×M matrix B and afunction δ(k) such that

g(b+ k)− g(b) = Bk + δ(k)

with δ(k)/ ‖k‖ → 0 as k → 0. Consider the function obtained by composing gand f . Letting k = f(a+ h)− f(a), we obtain

g(f(a+ h))− g(f(a)) = g(b+ k)− g(b) = Bk + δ(k)

= B(f(a+ h)− f(a)) + δ(f(a+ h)− f(a))

= B(Ah+ ε(h)) + δ(Ah+ ε(h))

= BAh+Bε(h) + δ(Ah+ ε(h)).

Since ε and δ are negligible compared with their arguments, it follows thatg(f(x)) is differentiable at x = a and

D(g ∘ f)(a) = BA = Dg(b)Df(a),

which is (4.4).

Proposition 4.3 generalizes the familiar formula (g(f(x)))′ = g′(f(x))f ′(x)to the multi-dimensional case.

What does (4.4) mean? For example, let g be a real valued function of two variables, say g(x1, x2). Let f be a vector valued function of one variable, say

f(t) = (f1(t), f2(t))′. Since

Dg = [∂g/∂x1  ∂g/∂x2]  and  Df = (f1′(t), f2′(t))′,


it follows that

d/dt g(f1(t), f2(t)) = D(g ∘ f) = Dg Df = [∂g/∂x1  ∂g/∂x2] (f1′(t), f2′(t))′ = (∂g/∂x1) f1′(t) + (∂g/∂x2) f2′(t).

In general, if f is an M-dimensional function of x1, . . . , xN and g is a real valued function of y1, . . . , yM, we have

∂(g ∘ f)/∂xn = ∑_{m=1}^{M} (∂g/∂ym)(∂fm/∂xn).
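The chain rule is easy to verify numerically for a concrete composition. The sketch below (my own choice of g, f, and evaluation point) compares the chain-rule value Dg Df with a finite difference of the composed function.

import numpy as np

def f(t):
    return np.array([t**2, np.sin(t)])   # f : R -> R^2

def g(y):
    return y[0] * y[1]                   # g : R^2 -> R

t = 0.7
Dg = np.array([np.sin(t), t**2])         # (dg/dy1, dg/dy2) evaluated at y = f(t)
Df = np.array([2*t, np.cos(t)])          # (f1'(t), f2'(t))
print(Dg @ Df)                           # chain-rule value of d/dt g(f(t))

h = 1e-6
print((g(f(t + h)) - g(f(t - h))) / (2*h))   # finite-difference check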

Problems

4.1. Compute the partial derivatives, the gradient, and the Hessian of the fol-lowing functions.

1. f(x1, x2) = a1x1 + a2x2, where a1, a2 are constants.

2. f(x1, x2) = ax1^2 + 2bx1x2 + cx2^2, where a, b, c are constants.

3. f(x1, x2) = x1x2.

4. f(x1, x2) = x1 log x2, where x2 > 0.

4.2. Compute the gradient and the Hessian of the following functions.

1. f(x) = 〈a, x〉, where a, x are vectors of the same dimensions and 〈a, x〉 =a · x is the inner product of a and x.

2. f(x) = 〈x,Ax〉, where A is a square matrix of the same dimension as thevector x.

4.3. This problem asks you to prove the multi-variable mean value theorem(Proposition 4.2). Let f be differentiable.

1. Let g(t) = f(a+ t(b− a)). Using the chain rule, compute g′(t).

2. Using the one-variable mean value theorem, prove the multi-variable meanvalue theorem.

4.4. This problem asks you to prove that a C^1 function is differentiable. Let f(x1, x2) be a C^1 function (i.e., partially differentiable and the partial derivatives are continuous). Fix (a1, a2).

1. Using the one-variable mean value theorem, show that there exist numbers0 < θ1, θ2 < 1 such that

f(a1 + h1, a2 + h2) − f(a1, a2) = ∂f/∂x1 (a1 + θ1h1, a2 + h2) h1 + ∂f/∂x2 (a1, a2 + θ2h2) h2.

(Hint: subtract and add f(a1, a2 + h2) to the left-hand side.)

2. Let

ε(h) = f(a + h) − f(a) − ∇f(a) · h,

where a = (a1, a2) and h = (h1, h2). Prove that lim_{h→0} ε(h)/‖h‖ = 0.


Chapter 5

Multi-Variable Unconstrained Optimization

5.1 First and second-order conditions

Consider the unconstrained optimization problem

minimize f(x), (5.1)

where f is a (one- or multi-variable) differentiable function. Recall from Propo-sition 4.1 that if x is a solution, then ∇f(x) = 0, where

∇f(x) = (∂f/∂x1 (x), ∂f/∂x2 (x))′

is the gradient (the vector of partial derivatives) in the two-dimensional case, but the general case is similar. The condition ∇f(x) = 0 is called the first-order condition for optimality. It is necessary, but not sufficient, as the following example shows.

Example 5.1. Let f(x) = x^3 − 3x. Since

f′(x) = 3x^2 − 3 = 3(x − 1)(x + 1)
  > 0  (x < −1 or x > 1),
  = 0  (x = ±1),
  < 0  (−1 < x < 1),

x = ±1 are stationary points. x = 1 is a local minimum and x = −1 is a local maximum. However, since f(x) → ±∞ as x → ±∞, they are neither global minimum nor global maximum.

Can we derive a sufficient condition for optimality? The answer is yes. To this end we need to introduce a few notations. We say that x̄ is a (global) solution to the unconstrained minimization problem (5.1) if f(x) ≥ f(x̄) for all x. We say that x̄ is a local solution if f(x) ≥ f(x̄) for all x close enough to x̄. Finally, x̄ is called a stationary point if ∇f(x̄) = 0.

Example 5.1 shows that in general, we can only expect that a stationary point is a local optimum, not a global optimum. We can use Taylor's theorem to derive conditions under which this is indeed true. Suppose that f is a C^2 (twice continuously differentiable) function, and x̄ is a stationary point. Take any x = x̄ + h. By Taylor's theorem, there exists 0 < α < 1 such that

f(x̄ + h) = f(x̄) + 〈∇f(x̄), h〉 + (1/2)〈h, ∇2f(x̄ + αh)h〉, (5.2)

where 〈a, b〉 is the inner product between vectors a, b and ∇2f is the Hessian (matrix of second derivatives) of f:

∇2f = [∂2f/∂x1^2  ∂2f/∂x1∂x2; ∂2f/∂x2∂x1  ∂2f/∂x2^2]

in the two-variable case. Since x̄ is a stationary point, we have ∇f(x̄) = 0, so (5.2) implies

f(x̄ + h) = f(x̄) + (1/2)〈h, ∇2f(x̄ + αh)h〉.

Therefore whether f(x) = f(x̄ + h) is greater than or less than f(x̄) depends on whether the quantity 〈h, ∇2f(x̄ + αh)h〉 is positive or negative. Letting Q = ∇2f(x̄) be the Hessian of f evaluated at the stationary point x̄, if h = x − x̄ is small, then ∇2f(x̄ + αh) is close to Q. Therefore

f(x) ≷ f(x̄) ⇐⇒ 〈h, Qh〉 ≷ 0. (5.3)

Recall from Chapter 1 that a symmetric matrix A is positive (negative) definite if 〈h, Ah〉 > 0 (< 0) for all vectors h ≠ 0, and that A is positive (negative) semidefinite if 〈h, Ah〉 ≥ 0 (≤ 0) for all h. The equivalence (5.3) says that x̄ is a local minimum (maximum) if Q = ∇2f(x̄) is positive (negative) definite. Thus we obtain the following sufficient condition for local optimality, called the second-order condition.

Proposition 5.1. Let f be a twice continuously differentiable function and ∇f(x) = 0. If x is a local minimum (maximum), then ∇2f(x) is positive (negative) semidefinite. Conversely, if ∇2f(x) is positive (negative) definite, then x is a local minimum (maximum).

The proof of Proposition 5.1 is straightforward using the second-order Taylorapproximation (5.2), so it is left as an exercise (Problem 5.3).

Example 5.2. Let f(x) = x^3 − 3x. Since f′(x) = 3x^2 − 3 and f′′(x) = 6x, we have f′(±1) = 0 and f′′(±1) = ±6. Therefore x = 1 is a local minimum and x = −1 is a local maximum.

Example 5.3. Let f(x1, x2) = x1^2 + x1x2 + x2^2. Then the gradient is ∇f(x) = (2x1 + x2, x1 + 2x2)′ and the Hessian is

∇2f(x) = [2 1; 1 2].


Since ∇f(0) = 0, (x1, x2) = (0, 0) is a stationary point. Now

〈h, ∇2f(0)h〉 = [h1 h2] [2 1; 1 2] (h1, h2)′ = 2h1^2 + 2h1h2 + 2h2^2 = 2(h1 + (1/2)h2)^2 + (3/2)h2^2 ≥ 0,

with strict inequality if (h1, h2) ≠ (0, 0), so ∇2f(0) is positive definite. Therefore (x1, x2) = (0, 0) is a local minimum (indeed, a global minimum).

Example 5.4. Let f(x1, x2) = x1^2 − x2^2. Since ∇f(x) = (2x1, −2x2)′, (x1, x2) = (0, 0)

is a stationary point. However, since f(x1, 0) = x1^2 attains the minimum at x1 = 0 and f(0, x2) = −x2^2 attains the maximum at x2 = 0, (x1, x2) = (0, 0) is neither a local minimum nor a local maximum (it is a saddle point). Indeed, the Hessian

∇2f(x) = [2 0; 0 −2]

is neither positive nor negative definite.

In order to determine whether a stationary point is a local minimum, maximum, or a saddle point, we need to determine whether the Hessian is positive definite, negative definite, or neither. Although there are a few ways to do so (see Proposition 1.2), usually the easiest way is to complete the squares. If h = (h1, h2) and Q is a symmetric matrix, 〈h, Qh〉 is a quadratic function of h1, h2, so you can complete the squares as in the example above. If the result is the sum of two positive terms (N positive terms if there are N variables), then Q is positive (semi)definite. If the result is the sum of negative terms, then Q is negative (semi)definite. If the result is the sum of positive and negative terms, then Q is neither positive nor negative definite. The order in which you complete the squares doesn't matter—a property known as Sylvester's law of inertia.1
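A convenient numerical alternative (my own note, not part of the text) is to check the signs of the eigenvalues: a symmetric matrix is positive definite when all eigenvalues are positive, negative definite when all are negative, and indefinite when the signs are mixed.

import numpy as np

def classify(Q):
    # Classify a symmetric matrix by the signs of its eigenvalues.
    eig = np.linalg.eigvalsh(Q)
    if np.all(eig > 0):
        return "positive definite"
    if np.all(eig < 0):
        return "negative definite"
    return "neither"

print(classify(np.array([[2.0, 1.0], [1.0, 2.0]])))    # Example 5.3: positive definite
print(classify(np.array([[2.0, 0.0], [0.0, -2.0]])))   # Example 5.4: neither (saddle)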

5.2 Convex optimization

Recall that in the one-variable case, a twice differentiable function f is convex if f′′(x) ≥ 0 for all x, and if f is convex, f′(x) = 0 implies that x is the (global) minimum (Proposition 3.4). That is, the first-order condition is necessary and sufficient. The same holds for the multi-variable case.

5.2.1 General case

As in the one-variable case, a function f is said to be convex if for any x1, x2

and 0 ≤ α ≤ 1 we have

f((1− α)x1 + αx2) ≤ (1− α)f(x1) + αf(x2).

A function is called concave if the reverse inequality holds (so −f is convex). Proposition 10.6 shows that a twice continuously differentiable function is convex (concave) if and only if the Hessian is positive (negative) semidefinite.

1http://en.wikipedia.org/wiki/Sylvester’s_law_of_inertia


Let f be a twice continuously differentiable convex function, and x̄ be a stationary point (so ∇f(x̄) = 0). By (5.2) and using the definition of positive semidefiniteness, we obtain

f(x̄ + h) = f(x̄) + 〈∇f(x̄), h〉 + (1/2)〈h, ∇2f(x̄ + αh)h〉 = f(x̄) + (1/2)〈h, ∇2f(x̄ + αh)h〉 ≥ f(x̄)

for all h, so x̄ is a global minimum. This important result is summarized in the following theorem.

Theorem 5.2. Let f be a differentiable convex (concave) function. Then x is a minimum (maximum) of f if and only if ∇f(x) = 0.

Although the above discussion requires that f is twice continuously differ-entiable, actually this is not necessary, as we see in Proposition 11.1.

5.2.2 Quadratic case

A special but important class of convex and concave functions is the class of quadratic functions, because we can solve for the optimum in closed form. A general quadratic function with two variables has the following form:

f(x1, x2) = a + b1x1 + b2x2 + c1x1^2 + c2x1x2 + c3x2^2,

where a, b, c's are constants. It turns out that it is useful to change the notation such that c1 = (1/2)q11, c2 = q12, and c3 = (1/2)q22. Then

f(x1, x2) = a + b1x1 + b2x2 + (1/2)q11x1^2 + q12x1x2 + (1/2)q22x2^2 = a + 〈b, x〉 + (1/2)〈x, Qx〉,

where b = (b1, b2)′ and Q = [q11 q12; q12 q22]. The gradient is

∇f(x) = (b1 + q11x1 + q12x2, b2 + q12x1 + q22x2)′ = b + Qx,

and the Hessian is

∇2f(x) = [q11 q12; q12 q22] = Q.

The vector and matrix notation is valid with an arbitrary number of variables. Since the Hessian of a quadratic function is constant, f is convex (concave) if Q is positive (negative) semidefinite. Since

0 = ∇f(x) = b + Qx ⇐⇒ x = −Q⁻¹b

(Q⁻¹ is the inverse matrix of Q), if Q is positive (negative) definite, then x = −Q⁻¹b is the minimum (maximum) of f(x) = a + 〈b, x〉 + (1/2)〈x, Qx〉.
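Computationally, x = −Q⁻¹b is obtained by solving the linear system Qx = −b rather than by inverting Q. The following sketch (mine, with assumed coefficients) illustrates this and checks the first-order condition.

import numpy as np

a = 1.0
b = np.array([1.0, -2.0])                   # illustrative coefficients (assumed)
Q = np.array([[2.0, 1.0], [1.0, 2.0]])      # positive definite (as in Example 5.3)

x_star = np.linalg.solve(Q, -b)             # x = -Q^{-1} b without forming Q^{-1}
print(x_star)
print(np.allclose(b + Q @ x_star, 0))       # the first-order condition holds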


Problems

5.1. Let f(x) = 10x^3 − 15x^2 − 60x. Find the local maxima and minima of f. Does f have a global maximum and minimum?

5.2. Let f(x) = 180x − 15x^2 − 10x^3. Solve

maximize f(x) subject to x ≥ 0.

5.3. Prove Proposition 5.1.

5.4. Let f(x1, x2) = x1^2 − x1x2 + 2x2^2 − x1 − 3x2.

1. Compute the gradient and the Hessian of f .

2. Determine whether f is convex, concave, or neither.

3. Find the stationary point(s) of f .

4. Determine whether each stationary point is a maximum, minimum, orneither.

5.5. Let f(x1, x2) = x1^2 − x1x2 − 6x1 + x2^3 − 3x2.

1. Find the stationary point(s) of f .

2. Determine whether each stationary point is a local maximum, local minimum, or a saddle point.

5.6. Let A = [a b; b c] be a 2 × 2 symmetric matrix. Show that A is positive definite if and only if a > 0 and ac − b^2 > 0.

5.7. For each of the following symmetric matrices, show whether it is positive (semi)definite, negative (semi)definite, or neither.

1. A = [1 0; 0 1],

2. A = [0 1; 1 0],

3. A = [2 1; 1 2],

4. A = [2 1; 1 −2],

5. A = [1 2; 2 1],

6. A = [1 1; 1 4],

7. A = [−3 1; 1 −4].


5.8. For each of the following functions, show whether it is convex, concave, or neither.

1. f(x1, x2) = x1x2 − x1^2 − x2^2,

2. f(x1, x2) = 3x1 + 2x1^2 + 4x2 + x2^2 − 2x1x2,

3. f(x1, x2) = x1^2 + 3x1x2 + 2x2^2,

4. f(x1, x2) = 20x1 + 10x2,

5. f(x1, x2) = x1x2,

6. f(x1, x2) = e^x1 + e^x2,

7. f(x1, x2) = log x1 + log x2, where x1, x2 > 0,

8. f(x1, x2) = log(e^x1 + e^x2),

9. f(x1, x2) = (x1^p + x2^p)^(1/p), where x1, x2 > 0 and p ≠ 0 is a constant.

5.9. Let p ≥ 1. For x ∈ RN, let ‖x‖p = (∑_{n=1}^N |xn|^p)^(1/p). Show Minkowski's inequality

‖x + y‖p ≤ ‖x‖p + ‖y‖p

and that ‖·‖p is a norm on RN.

5.10. Let K ≥ 2. Prove that f is convex if and only if

f(∑_{k=1}^K αk xk) ≤ ∑_{k=1}^K αk f(xk)

for all {xk}_{k=1}^K ⊂ RN and αk ≥ 0 such that ∑_{k=1}^K αk = 1.

5.11.

1. Prove that f(x) = log x (x > 0) is strictly concave.

2. Prove the following inequality of arithmetic and geometric means: for any x1, . . . , xK > 0 and α1, . . . , αK > 0 such that ∑ αk = 1, we have

∑_{k=1}^K αk xk ≥ ∏_{k=1}^K xk^αk,

with equality if and only if x1 = · · · = xK.

5.12. Let p, q > 0 be numbers such that 1/p+ 1/q = 1.

1. Fixing b ≥ 0, define f(x) = (1/p)x^p − bx + (1/q)b^q for x ≥ 0. Show that f is convex.

2. Show Young’s inequality

ab ≤ (1/p)a^p + (1/q)b^q

for all a, b ≥ 0.


3. Let x, y ∈ RN. Define ‖x‖p = (∑_{n=1}^N |xn|^p)^(1/p). Show Hölder's inequality

∑_{n=1}^N |xn yn| ≤ ‖x‖p ‖y‖q.

(Hint: set a = |xn| / ‖x‖p, b = |yn| / ‖y‖q and use Young.)

5.13.

1. Show that if {fi(x)}_{i=1}^I are convex, so is f(x) = ∑_{i=1}^I αi fi(x) for any α1, . . . , αI ≥ 0.

2. Show that if {fi(x)}_{i=1}^I are convex, so is f(x) = max_{1≤i≤I} fi(x).

3. Suppose that h : RM → R is increasing (meaning that h is increasing in each coordinate x1, . . . , xM) and convex and gm : RN → R is convex for m = 1, . . . , M. Prove that f(x) = h(g1(x), . . . , gM(x)) is convex.

5.14. Let f : RN → (−∞,∞] be convex.

1. Show that the set of solutions to minx∈RN f(x) is a convex set.

2. If f is strictly convex, show that the solution (if it exists) is unique.

5.15. Suppose you collect some two-dimensional data {(xn, yn)}_{n=1}^N, where N is the sample size. You wish to fit a straight line y = a + bx to the data. Suppose you do so by making the observed value yn as close as possible to the theoretical value a + bxn by minimizing the sum of squares

f(a, b) = ∑_{n=1}^N (yn − a − bxn)^2.

1. Is f convex, concave, or neither?

2. Compute the gradient of f .

3. Express a, b that minimize f using the following quantities:

E[X] = (1/N) ∑_{n=1}^N xn,    Var[X] = (1/N) ∑_{n=1}^N (xn − E[X])^2,
E[Y] = (1/N) ∑_{n=1}^N yn,    Cov[X, Y] = (1/N) ∑_{n=1}^N (xn − E[X])(yn − E[Y]).

5.16. In the previous problem, the variable y is explained by two variables, 1, x. Generalize the problem when y is explained by K variables, x = (x1, . . . , xK). It will be useful to define the N × K matrix X = (xnk) and the N-vector y = (y1, . . . , yN), where xnk is the n-th observation of the k-th variable. The equation you want to fit is

yn = β1xn1 + · · ·+ βKxnK + error term,

and β = (β1, . . . , βK) is the vector of coefficients.


5.17. Let A be a symmetric positive definite matrix.

1. Let f(x) = 〈y, x〉 − (1/2)〈x, Ax〉, where y is a fixed vector. Compute the gradient and Hessian of f and show that f is concave.

2. Find the maximum of f and its value.

3. Let A, B be symmetric positive definite matrices. We write A ≽ B if the matrix C = A − B is positive semidefinite. Show that A ≽ B if and only if B⁻¹ ≽ A⁻¹.

This problem is motivated by Toda (2011).


Chapter 6

Multi-Variable Constrained Optimization

In the real world, optimization problems come with constraints. Most of us have a budget and cannot spend more money than we have, so we have to choose what to buy or not. Our stomach has a finite capacity and we cannot eat more than a certain amount, so we must choose what to eat or not. This chapter provides an intuitive introduction to the optimization of a multi-variable function subject to constraints. The rigorous theory is developed in Chapter 12.

6.1 A motivating example

6.1.1 The problem

Suppose there are two goods (say apples and bananas), and your satisfaction is represented by the function (called a utility function)

u(c1, c2) = log c1 + log c2,

where c1, c2 are the amounts of goods 1 and 2 that you consume. Suppose that the unit prices of the goods are p1 and p2, and your budget is w. If you buy c1 and c2 units of each good, your expenditure is

p1c1 + p2c2.

Since your budget is w, your budget constraint is

p1c1 + p2c2 ≤ w.

So the problem of attaining maximum satisfaction within your budget can be mathematically expressed as:

maximize log c1 + log c2

subject to p1c1 + p2c2 ≤ w.

Here u(c1, c2) = log c1 + log c2 is called the objective function, and p1c1 + p2c2 ≤ w is the constraint.1

1Strictly speaking, there are other constraints c1 ≥ 0 and c2 ≥ 0, since you cannot consume a negative amount.


6.1.2 A solution

How can we solve this problem? Some of you might find a trick that turns this constrained optimization problem into an unconstrained one, as follows. First, since the objective function log c1 + log c2 is increasing in both c1 and c2, you will always exhaust your budget. That is, you will always want to consume in a way such that the budget constraint holds with equality, i.e.,

p1c1 + p2c2 = w.

Solving this for c2, we get

c2 = (w − p1c1)/p2.

Substituting this into the objective function, the problem is equivalent to findingthe maximum of

f(c1) = log c1 + log((w − p1c1)/p2) = log c1 + log(w − p1c1) − log p2.

Setting the derivative equal to zero, we get

f′(c1) = 1/c1 − p1/(w − p1c1) = 0 ⇐⇒ c1 = w/(2p1).

In this case, f tends to −∞ when c1 approaches the boundaries c1 = 0 and c1 = w/p1, so we need not worry about the boundaries. The value of c2 corresponding to c1 = w/(2p1) is

c2 = (w − p1 · w/(2p1))/p2 = w/(2p2).

Therefore the solution is

(c1, c2) = (w/(2p1), w/(2p2)).
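The substitution trick is easy to verify numerically. The sketch below (mine; the prices and budget are illustrative assumptions) maximizes the one-variable function f(c1) on a fine grid and recovers c1 = w/(2p1) and c2 = w/(2p2).

import numpy as np

p1, p2, w = 2.0, 4.0, 20.0                      # assumed prices and budget

c1_grid = np.linspace(1e-6, w/p1 - 1e-6, 100001)
f = np.log(c1_grid) + np.log(w - p1*c1_grid) - np.log(p2)
c1_best = c1_grid[np.argmax(f)]

print(c1_best, w/(2*p1))                        # both approximately 5.0
print((w - p1*c1_best)/p2, w/(2*p2))            # c2: both approximately 2.5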

6.1.3 Why study the general theory?

The above solution is mathematically correct, but too special to be useful. The reasons it worked are

1. the inequality constraint p1c1 + p2c2 ≤ w could be turned into an equality constraint p1c1 + p2c2 = w,

2. the equality constraint p1c1 + p2c2 = w could be solved for one variable c2, and

3. after substitution the optimization problem became unconstrained, which is why we could apply calculus.

But in general we cannot hope that any of these steps work. As an exercise, try to solve the following problem:

maximize x1 + 2x2 + 3x3

subject to x1^2 + x2^2 + x3^2 ≤ 1.

Some of you might be able to solve this problem using an ingenious trick, but such tricks are generally inapplicable. That is why we need a general theory for solving constrained optimization problems.


6.2 Optimization with linear constraints

To build intuition, we start the discussion of constrained optimization from the simplest cases, namely when the constraints are linear.

6.2.1 One linear constraint

Consider the two-variable optimization problem with one linear constraint,

minimize f(x1, x2)

subject to a1x1 + a2x2 ≤ c,

where f is differentiable and a1, a2, c are constants. Suppose a solution x̄ = (x̄1, x̄2) exists. The goal is to derive a necessary condition for x̄.

If a1x̄1 + a2x̄2 < c, so the constraint does not bind or is inactive, then since x can move freely around x̄, the point x̄ must be a local minimum of f. Therefore ∇f(x̄) = 0. If a1x̄1 + a2x̄2 = c, so the constraint binds or is active, then the situation is more complicated. Let

Ω = {x = (x1, x2) | a1x1 + a2x2 ≤ c}

be the constraint set. The boundary of Ω is the straight line a1x1 + a2x2 = c, which has the normal vector a = (a1, a2)′. Figure 6.1 shows the constraint set Ω (with the boundary and the normal vector), the solution x̄, and the negative of the gradient −∇f(x̄). (Here we draw the negative of the gradient because that is the direction in which the function f decreases fastest, and we are solving a minimization problem.)

[Figure 6.1: Gradient and feasible direction.]

Consider moving in the direction d from the solution x̄. Since x̄ is on the boundary, we have 〈a, x̄〉 = c. The point x = x̄ + td (where t > 0 is small) is feasible if and only if

〈a, x̄ + td〉 ≤ c = 〈a, x̄〉 ⇐⇒ 〈a, d〉 ≤ 0,

which is the case when the vectors a, d make an obtuse angle, as in the picture. Since x̄ is a solution, we have f(x̄ + td) ≥ f(x̄) for small enough t > 0. Therefore

0 ≤ lim_{t↓0} (f(x̄ + td) − f(x̄))/t = 〈∇f(x̄), d〉 ⇐⇒ 〈−∇f(x̄), d〉 ≤ 0,


so the vectors −∇f(x̄) and d make an obtuse angle. Therefore we obtain the following necessary condition for optimality:

If a and d make an obtuse angle, then so do −∇f(x̄) and d.

In Figure 6.1, the angle between −∇f(x̄) and d is acute, so f decreases in the direction d. But this is a contradiction because x̄ is by assumption a solution to the constrained minimization problem, so f cannot decrease in any feasible direction. Thus the situation drawn in Figure 6.1 cannot occur at a solution.

The only case in which −∇f(x̄) and d make an obtuse angle whenever a and d do so is when −∇f(x̄) and a point in the same direction, as in Figure 6.2. Therefore if x̄ is a solution, there must be a number λ ≥ 0 such that

∇f(x̄) = −λa ⇐⇒ ∇f(x̄) + λa = 0.

[Figure 6.2: Necessary condition for optimality.]

In the discussion above, we considered two cases depending on whether the constraint is binding (active) or not binding (inactive), but the inactive case (∇f(x̄) = 0) is a special case of the active case (∇f(x̄) + λa = 0) obtained by setting λ = 0. Furthermore, although we explained the intuition in two dimensions, the result clearly holds in arbitrary dimensions. Therefore we can summarize the necessary condition for optimality as in the following proposition.

Proposition 6.1. Consider the optimization problem

minimize f(x)

subject to 〈a, x〉 ≤ c,

where f is differentiable, a is a nonzero vector, and c is a constant. If x is a solution, then there exists a number λ ≥ 0 such that

∇f(x) + λa = 0.

Example 6.1. Consider the motivating example

maximize log x1 + log x2

subject to p1x1 + p2x2 ≤ w.


Maximizing log x1 + log x2 is the same as minimizing f(x1, x2) = −log x1 − log x2. The gradient is ∇f(x) = −(1/x1, 1/x2)′. Let p = (p1, p2)′. By Proposition 6.1,

there is a number λ ≥ 0 such that

∇f(x) + λp = 0 ⇐⇒ −(1/x1, 1/x2)′ + λ(p1, p2)′ = (0, 0)′.

Therefore it must be that x1 = 1/(λp1) and x2 = 1/(λp2).

Since the objective function is increasing, at the solution the constraint

.Since the objective function is increasing, at the solution the constraint

p1x1 + p2x2 ≤ w must bind. Therefore

p1x1 + p2x2 = w ⇐⇒ p1 · 1/(λp1) + p2 · 1/(λp2) = w ⇐⇒ λ = 2/w.

Substituting again, we get the solution (x1, x2) = (w/(2p1), w/(2p2)).

6.2.2 Multiple linear constraints

Now consider the optimization problem

minimize f(x)

subject to 〈a1, x〉 ≤ c1, 〈a2, x〉 ≤ c2,

where f is differentiable, a1, a2 are nonzero vectors, and c1, c2 are constants. Let x be a solution and Ω be the constraint set, i.e., the set

Ω = {x | g1(x) ≤ 0, g2(x) ≤ 0},

where gi(x) = 〈ai, x〉 − ci for i = 1, 2. Assume that both constraints are active at the solution. Figure 6.3 shows the situation.

[Figure 6.3: Gradient and feasible direction.]

In general, the vector d is called a feasible direction if you can move a little bit in the direction d from the point x, so x + td ∈ Ω for small enough t > 0. If d is a feasible direction and x is the minimum of f, then f cannot decrease in the direction d, so we must have

0 ≤ lim_{t↓0} (f(x + td) − f(x))/t = 〈∇f(x), d〉 ⇐⇒ 〈−∇f(x), d〉 ≤ 0.


Therefore the negative of the gradient −∇f(x) and any feasible direction d must make an obtuse angle. Recall that d is a feasible direction if d and ai = ∇gi make an obtuse angle for i = 1, 2. By looking at Figure 6.3, in order for x to be the minimum, it is necessary that −∇f(x) lies between the vectors a1 and a2. This is true if and only if there are numbers λ1, λ2 ≥ 0 such that

−∇f(x) = λ1a1 + λ2a2 ⇐⇒ ∇f(x) + λ1∇g1(x) + λ2∇g2(x) = 0.

Although we have assumed that both constraints bind, this equation is true even if one (or both) of them does not bind by setting λ1 = 0 and/or λ2 = 0. Also, it is clear that this argument holds for an arbitrary number of linear constraints. Therefore we obtain the following general theorem.

Theorem 6.2 (Karush-Kuhn-Tucker theorem with linear constraints). Consider the optimization problem

minimize f(x)

subject to gi(x) ≤ 0 (i = 1, . . . , I),

where f is differentiable and gi(x) = 〈ai, x〉 − ci is linear with ai ≠ 0. If x is a solution, then there exist numbers (called Lagrange multipliers) λ1, . . . , λI such that

∇f(x) + ∑_{i=1}^I λi∇gi(x) = 0, (6.1a)

(∀i) λi ≥ 0, gi(x) ≤ 0, λigi(x) = 0. (6.1b)

Condition (6.1a) is called the first-order condition. Its interpretation is that at the minimum x, the negative of the gradient −∇f(x) must lie between all the normal vectors ai = ∇gi(x) corresponding to the active constraints. Condition (6.1b) is called the complementary slackness condition. The first-order condition and the complementary slackness condition are jointly called the Karush-Kuhn-Tucker (KKT) conditions.2 The condition λi ≥ 0 says that the Lagrange multiplier is nonnegative, and gi(x) ≤ 0 says that the constraint is satisfied, which are not new. The condition λigi(x) = 0 takes care of both the active (binding) and inactive (non-binding) cases. If the constraint i is active, then gi(x) = 0, so we have λigi(x) = 0 automatically. If the constraint i is inactive, we have λi = 0, so again λigi(x) = 0 automatically.

An easy way to remember (6.1a) is as follows. Given the objective function f(x) and the constraints gi(x) ≤ 0, define the Lagrangian

L(x, λ) = f(x) + ∑_{i=1}^I λi gi(x),

where λ = (λ1, . . . , λI). The Lagrangian is the sum of the objective function f(x) and the weighted sum of the constraint functions gi(x) weighted by the

2A version of this theorem appeared in the 1939 Master's thesis of William Karush (1917-1997) but did not get much attention. (Applied Mathematics gained respect only after proving its usefulness during World War II.) The theorem became widely known after the rediscovery by Harold Kuhn (1925-2014) and Albert Tucker (1905-1995) in a conference paper in 1951. However, the paper (http://projecteuclid.org/download/pdf_1/euclid.bsmsp/1200500249) does not cite Karush.


Lagrange multipliers λi. Pretend that λ is constant and you want to minimize L(x, λ) with respect to x. Then the first-order condition is

0 = ∇L(x, λ) = ∇f(x) + ∑_{i=1}^I λi∇gi(x),

which is the same as (6.1a).

6.2.3 Linear inequality and equality constraints

So far we have considered the case when all constraints are inequalities, but what if there are also equalities? For example, consider the optimization problem

minimize f(x)

subject to 〈a, x〉 ≤ c, 〈b, x〉 = d,

where f is differentiable, a, b are nonzero vectors, and c, d are constants. We derive a necessary condition for optimality by turning this problem into one with only inequality constraints. Note that 〈b, x〉 = d is equivalent to 〈b, x〉 ≤ d and 〈b, x〉 ≥ d. Furthermore, 〈b, x〉 ≥ d is equivalent to 〈−b, x〉 ≤ −d. Therefore the problem is equivalent to:

minimize f(x)

subject to 〈a, x〉 − c ≤ 0,

〈b, x〉 − d ≤ 0,

〈−b, x〉+ d ≤ 0.

Setting g1(x) = 〈a, x〉 − c, g2(x) = 〈b, x〉 − d, and g3(x) = 〈−b, x〉 + d, this problem is exactly of the form in Theorem 6.2. Therefore there exist Lagrange multipliers λ1, λ2, λ3 ≥ 0 such that

∇f(x) + λ1∇g1(x) + λ2∇g2(x) + λ3∇g3(x) = 0.

Substituting ∇gi(x)’s, we get

∇f(x) + λ1a+ λ2b+ λ3(−b) = 0.

Letting λ = λ1 and µ = λ2 − λ3, we get

∇f(x) + λa+ µb = 0.

This equation is similar to the KKT condition (6.1), except that µ can be positive or negative. In general, we obtain the following theorem.

Theorem 6.3 (Karush-Kuhn-Tucker theorem with linear constraints). Consider the optimization problem

minimize f(x)

subject to gi(x) ≤ 0 (i = 1, . . . , I),

hj(x) = 0 (j = 1, . . . , J),


where f is differentiable and gi(x) = 〈ai, x〉 − ci and hj(x) = 〈bj, x〉 − dj are linear with ai, bj ≠ 0. If x is a solution, then there exist Lagrange multipliers λ1, . . . , λI and µ1, . . . , µJ such that

∇f(x) + ∑_{i=1}^I λi∇gi(x) + ∑_{j=1}^J µj∇hj(x) = 0, (6.2a)

(∀i) λi ≥ 0, gi(x) ≤ 0, λigi(x) = 0, (6.2b)

(∀j) hj(x) = 0. (6.2c)

An easy way to remember the KKT conditions (6.2) is as follows. As in the case with only inequality constraints, define the Lagrangian

L(x, λ, µ) = f(x) + ∑_{i=1}^I λi gi(x) + ∑_{j=1}^J µj hj(x).

Pretend that λ, µ are constants and you want to minimize L with respect to x. The first-order condition is ∇xL(x, λ, µ) = 0, which is exactly (6.2a). The complementary slackness condition (6.2b) is the same as in the case with only inequality constraints. The new condition (6.2c) merely says that the solution x must satisfy the equality constraints hj(x) = 0.

6.3 Optimization with nonlinear constraints

We now consider the general case, the optimization of a nonlinear multi-variable function subject to nonlinear constraints.

6.3.1 Karush-Kuhn-Tucker theorem

Consider the optimization problem

minimize f(x) (6.3a)

subject to gi(x) ≤ 0 (i = 1, . . . , I). (6.3b)

Let x̄ be a solution. By Taylor's theorem, we have

gi(x) ≈ gi(x̄) + 〈∇gi(x̄), x − x̄〉.

The gradient of both sides at x = x̄ is ∇gi(x̄). A natural idea is to approximate the nonlinear constraints by linear ones around the solution this way, and derive a necessary condition corresponding to these linear constraints. This idea indeed works, subject to some caveats.

Theorem 6.4 (Karush-Kuhn-Tucker theorem with nonlinear constraints). Consider the optimization problem (6.3), where f and gi's are differentiable. If x is a solution and a regularity condition (called constraint qualification, CQ) holds, then there exist Lagrange multipliers λ1, . . . , λI such that

∇f(x) + ∑_{i=1}^I λi∇gi(x) = 0, (6.4a)

(∀i) λi ≥ 0, gi(x) ≤ 0, λigi(x) = 0. (6.4b)


Proving the Karush-Kuhn-Tucker theorem in full generality is beyond the scope of this chapter, and the rigorous discussion is deferred to Chapter 12. The conclusion of Theorem 6.4 is the same as that of Theorem 6.2. What is different is the assumption. While in the linear case the conclusion holds without any qualification, in the nonlinear case we need to verify certain “constraint qualifications”. The following trivial example shows the need for such conditions.

Example 6.2. Consider the minimization problem

minimize x

subject to −x^3 ≤ 0.

Since −x^3 ≤ 0 ⇐⇒ x ≥ 0, the solution is obviously x = 0. Now let f(x) = x and g(x) = −x^3. If the KKT theorem holds, there must be a Lagrange multiplier λ ≥ 0 such that

f ′(x) + λg′(x) = 0.

But since f′(x) = 1 and g′(x) = −3x^2, we have f′(0) = 1 and g′(0) = 0, so there is no number λ ≥ 0 such that f′(0) + λg′(0) = 0.

Below are a few examples of constraint qualifications. In general, let I(x̄) be the set of indices such that gi(x̄) = 0—that is, the i's for which the constraint gi(x) ≤ 0 binds at x = x̄.

Linear independence (LICQ) The gradients of the active constraints,

{∇gi(x̄) | i ∈ I(x̄)},

are linearly independent.

Slater (SCQ) gi's are convex, and there exists a point x0 such that the constraints are satisfied with strict inequalities: gi(x0) < 0 for all i.

There are other weaker constraint qualifications, which are deferred to Chapter12.

In practice, many optimization problems have only linear constraints, in which case there is no need to check any constraint qualifications. If there are equality constraints as well as inequality constraints, then the conclusion of Theorem 6.3 holds under certain constraint qualifications.

6.3.2 Convex optimization

We previously learned (Theorem 5.2) that for unconstrained optimization problems, the first-order condition is not just necessary but also sufficient for optimality when the objective function is convex or concave. The same holds for constrained optimization problems. Indeed, we can prove the following theorem.

Theorem 6.5 (Karush-Kuhn-Tucker theorem for convex optimization). Consider the optimization problem (6.3), where f and gi's are differentiable and convex.

1. If x is a solution and there exists a point x0 such that gi(x0) < 0 for all i, then there exist Lagrange multipliers λ1, . . . , λI such that (6.4) holds.


2. If there exist Lagrange multipliers λ1, . . . , λI such that (6.4) holds, then x is a solution.

The first part of Theorem 6.5 is just the KKT theorem with the Slater constraint qualification. The second part says that the first-order conditions are sufficient for optimality.

Proof. We only prove the second part. Suppose that there exist Lagrange multipliers λ1, . . . , λI such that (6.4) holds at x̄. Let

L(x, λ) = f(x) + ∑_{i=1}^I λi gi(x)

be the Lagrangian. Since λi ≥ 0 and f, gi's are convex, L is convex as a function of x. By the first-order condition (6.4a), we have

0 = ∇f(x̄) + ∑_{i=1}^I λi∇gi(x̄) = ∇xL(x̄, λ),

so L(·, λ) attains its minimum at x̄. Therefore, for any feasible x,

f(x̄) = f(x̄) + ∑_{i=1}^I λi gi(x̄)    (∵ λi gi(x̄) = 0 for all i by (6.4b))
     = L(x̄, λ) ≤ L(x, λ)             (∵ ∇xL(x̄, λ) = 0)
     = f(x) + ∑_{i=1}^I λi gi(x)      (∵ definition of L)
     ≤ f(x),                          (∵ λi ≥ 0 and gi(x) ≤ 0 for all i)

so x̄ is a solution.

Theorem 6.5 makes our life easy for solving nonlinear constrained optimization problems. A general approach is as follows.

Step 1. Verify that f and gi's are differentiable and convex, and that the Slater constraint qualification holds.

Step 2. Set up the Lagrangian L(x, λ) = f(x) + ∑_{i=1}^I λi gi(x) and derive the KKT conditions (6.4).

Step 3. Solve for x and λ. The solution of the original problem is x.

Example 6.3. Consider the problem

minimize 1/x1 + 1/x2

subject to x1 + x2 ≤ 2,

where x1, x2 > 0. Let us solve this problem step by step.


Step 1. Let f(x1, x2) = 1/x1 + 1/x2 be the objective function. Since

(1/x)′′ = (−x^(−2))′ = 2x^(−3) > 0,

the Hessian of f ,

∇2f(x1, x2) = [2x1^(−3) 0; 0 2x2^(−3)],

is positive definite. Therefore f is convex. Let g(x1, x2) = x1 + x2 − 2. Since g is linear, it is convex. For (x1, x2) = (1/2, 1/2) we have g(x1, x2) = −1 < 0, so the Slater condition holds.

Step 2. Let

L(x1, x2, λ) = 1/x1 + 1/x2 + λ(x1 + x2 − 2)

be the Lagrangian. The first-order condition is

0 = ∂L/∂x1 = −1/x1^2 + λ ⇐⇒ x1 = 1/√λ,
0 = ∂L/∂x2 = −1/x2^2 + λ ⇐⇒ x2 = 1/√λ.

The complementary slackness condition is

λ(x1 + x2 − 2) = 0.

Step 3. From these equations it must be that λ > 0 and

x1 + x2 − 2 = 0 ⇐⇒ 2/√λ − 2 = 0 ⇐⇒ λ = 1,

and x1 = x2 = 1/√λ = 1. Therefore (x1, x2) = (1, 1) is the (only) solution.
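The KKT solution can be cross-checked with a general-purpose constrained optimizer. The sketch below (my own, not part of the text) feeds the same problem to a numerical solver and should return approximately (1, 1).

from scipy.optimize import minimize

objective = lambda x: 1/x[0] + 1/x[1]
constraint = {"type": "ineq", "fun": lambda x: 2 - x[0] - x[1]}   # means 2 - x1 - x2 >= 0

res = minimize(objective, x0=[0.5, 1.0], constraints=[constraint],
               bounds=[(1e-6, None), (1e-6, None)])
print(res.x)    # approximately (1, 1), matching the KKT solution above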

In practice, in convex optimization problems (meaning that both the objective function and the constraint functions are convex) we often skip verifying the Slater constraint qualification. The reason is that if you set up the Lagrangian and find a point that satisfies the KKT conditions, then it is automatically a solution by the second part of Theorem 6.5. If you cannot find any point satisfying the KKT conditions, then either there is no solution or the Slater constraint qualification is violated.

If the problem is not a convex optimization problem, then the procedure is slightly more complicated.

Step 1. Show that a solution exists (e.g., by showing that the objective function is continuous and the constraint set is compact).

Step 2. Verify that f and gi's are differentiable, and that some constraint qualification holds.

Step 3. Set up the Lagrangian L(x, λ) = f(x) + ∑_{i=1}^I λi gi(x) and derive the KKT conditions (6.4).

Step 4. Solve for x and λ. If you get a unique x, it is the solution. If you get multiple x's, compute f(x) for each and pick the minimum.


6.3.3 Constrained maximization

Next, we briefly discuss maximization. Although maximization is equivalent to minimization by flipping the sign of the objective function, doing so every time is awkward. So consider the maximization problem

maximize f(x)

subject to gi(x) ≥ 0 (i = 1, . . . , I), (6.5)

where f, gi’s are differentiable. (6.5) is equivalent to the minimization problem

minimize − f(x)

subject to − gi(x) ≤ 0 (i = 1, . . . , I).

Assuming that Theorem 6.4 applies, the necessary condition for optimality is

−∇f(x) − ∑_{i=1}^I λi∇gi(x) = 0, (6.6a)

(∀i) λi(−gi(x)) = 0. (6.6b)

But (6.6) is equivalent to (6.4) by multiplying everything by (−1). For this reason, it is customary to formulate a maximization problem as in (6.5) so that the inequality constraints are always “greater than or equal to zero”.

Example 6.4. Consider a consumer with utility function

u(x) = α log x1 + (1− α) log x2,

where 0 < α < 1 and x1, x2 ≥ 0 are consumption of good 1 and 2 (such afunction is called Cobb-Douglas). Let p1, p2 > 0 the price of each good andw > 0 be the wealth.

Then the consumer’s utility maximization problem (UMP) is

maximize α log x1 + (1− α) log x2

subject to x1 ≥ 0, x2 ≥ 0, p1x1 + p2x2 ≤ w.

Step 1. Clearly the objective function is concave and the constraints are linear(hence concave). The point (x1, x2) = (ε, ε) strictly satisfies the inequal-ities for small enough ε > 0, so the Slater condition holds. Thereforewe can apply Theorem 6.5.

Step 2. Let

L(x, λ, µ) = α log x1 +(1−α) log x2 +λ(w−p1x1−p2x2)+µ1x1 +µ2x2,

where λ ≥ 0 is the Lagrange multiplier corresponding to the budgetconstraint

p1x1 + p2x2 ≤ w ⇐⇒ w − p1x1 − p2x2 ≥ 0

and µn is the Lagrange multiplier corresponding to xn ≥ 0 for n = 1, 2.By the first-order condition, we get

0 =∂L

∂x1=

α

x1− λp1 + µ1,

0 =∂L

∂x2=

1− αx2

− λp2 + µ2.

61

Page 63: Essential Mathematics for Economists

By the complementary slackness condition, we have λ(w−p1x1−p2x2) =0, µ1x1 = 0, µ2x2 = 0.

Step 3. Since log 0 = −∞, x1 = 0 or x2 = 0 cannot be an optimal solution.Hence x1, x2 > 0, so by complementary slackness we get µ1 = µ2 = 0.Then by the first order condition we get x1 = α

λp1, x2 = 1−α

λp2, so λ > 0.

Substituting these into the budget constraint p1x1 + p2x2 = w, we get

α

λ+

1− αλ

= w ⇐⇒ λ =1

w,

so the solution is

(x1, x2) =

(αw

p1,

(1− α)w

p2

).

Remark 6.1. In the above example, you notice that the constraints x1 ≥ 0and x2 ≥ 0 never bind because the objective function is increasing in botharguments. (We can use an argument similar to Problem 3.6 to be more precise.)Therefore a quicker way to solve the problem is to ignore these constraints andset the Lagrangian

L(x1, x2, λ) = α log x1 + (1− α) log x2 + λ(w − p1x1 − p2x2).

Problems

6.1. This problem asks you to show that in general you cannot omit the con-straint qualification. Consider the optimization problem

minimize x

subject to x2 ≤ 0.

1. Find the solution.

2. Show that both the objective function and the constraint function areconvex, but the Slater constraint qualification does not hold.

3. Show that the Karush-Kuhn-Tucker condition does not hold.

6.2. Consider the problem

minimize1

x1+

4

x2

subject to x1 + x2 ≤ 3,

where x1, x2 > 0.

1. Prove that the objective function is convex.

2. Write down the Lagrangian.

3. Explain why the Karush-Kuhn-Tucker conditions are both necessary andsufficient for a solution.

62

Page 64: Essential Mathematics for Economists

4. Compute the solution.

6.3. Consider the problem

maximize4

3x3

1 +1

3x3

2

subject to x1 + x2 ≤ 1,

x1 ≥ 0, x2 ≥ 0.

1. Are the Karush-Kuhn-Tucker conditions necessary for a solution? Answeryes or no, then explain why.

2. Are the Karush-Kuhn-Tucker conditions sufficient for a solution? Answeryes or no, then explain why.

3. Write down the Lagrangian.

4. Compute the solution.

6.4. Let f(x1, x2) = x21 + x1x2 + x2

2.

1. Compute the gradient and the Hessian of f .

2. Show that f is convex.

3. Solve

minimize x21 + x1x2 + x2

2

subject to x1 + x2 ≥ 2.

6.5. Consider the problem

maximize x1 + log x2 −1

2x23

subject to x1 + p2x2 + p3x3 ≤ w,

where x2, x3 > 0 but x1 is unconstrained and p2, p3, w > 0 are constants.

1. Show that the objective function is concave.

2. Write down the Lagrangian.

3. Find the solution.

6.6. Let A be an N×N symmetric positive definite matrix. Let B be an M×Nmatrix with M ≤ N such that the M row vectors of B are linearly independent.Let c ∈ RM be a vector. Solve

minimize1

2〈x,Ax〉

subject to Bx = c.

6.7. Solve

maximize x1 + x2

subject to x21 + x1x2 + x2

2 ≤ 1.

63

Page 65: Essential Mathematics for Economists

6.8. Solve

maximize 〈b, x〉subject to 〈x,Ax〉 ≤ r2,

where 0 6= b ∈ RN , A is an N × N symmetric positive definite matrix, andr > 0.

6.9. Consider a consumer with utility function

u(x) =

N∑n=1

αn log xn,

where αn > 0,∑Nn=1 αn = 1, and xn ≥ 0 is the consumption of good n. Let

p = (p1, . . . , pN ) be a price vector with pn > 0 and w > 0 be the wealth.

1. Formulate the consumer’s utility maximization problem.

2. Compute the solution.

6.10. Solve the same problem as above for the case

u(x) =

N∑n=1

αnx1−σn

1− σ,

where 1 6= σ > 0.

64

Page 66: Essential Mathematics for Economists

Chapter 7

Introduction to DynamicProgramming

7.1 Introduction

So far, we have only considered the maximization or minimization of a givenfunction, subject to some constraints. Such a problem is (sometimes) called astatic optimization problem because there is only one decision to make, namelychoosing the variables that optimize the objective function. In some cases,writing down or evaluating the objective function itself may be complicated.Furthermore, in many problems the decision maker makes multiple decisionsover time instead of a single decision.

Dynamic programming (DP) is a mathematical programming (optimization)technique that exploits the sequential structure of the problem. It is easier tounderstand the logic by examples instead of the abstract formulation. Supposethat you want to minimize the function

f(x1, x2) = 2x21 − 2x1x2 + x2

2 − 2x1 − 4x2.

One way to solve this is to compute the gradient and set it equal to zero, so

∇f(x1, x2) =

[4x1 − 2x2 − 2−2x1 + 2x2 − 4

]=

[00

]⇐⇒

[x1

x2

]=

[35

].

(This is only a necessary condition for optimality, but since the objective func-tion is convex because the Hessian is positive definite, so it is also sufficient.)

Another way to solve this problem is in two steps. First, assume that wehave already determined the value of x1, so treat x1 as a constant. Then theobjective function is a (convex) quadratic function in x2. Taking the partialderivative with respect to x2 and setting it equal to zero, we get

∂f

∂x2= −2x1 + 2x2 − 4 = 0 ⇐⇒ x2 = x1 + 2.

65

Page 67: Essential Mathematics for Economists

Then the function value becomes

g(x1) := f(x1, x1 + 2)

= 2x21 − 2x1(x1 + 2) + (x1 + 2)2 − 2x1 − 4(x1 + 2)

= x21 − 6x1 − 4.

Here g(x) is the minimum value that we can attain if we choose x2 optimally,given x1 = x. Clearly we can solve the original problem by choosing x1 so as tominimize g. Since g is a convex quadratic function, setting the derivative equalto zero, we get

g′(x1) = 2x1 − 6 = 0 ⇐⇒ x1 = 3.

Therefore the solution is (x1, x2) = (x1, x1 + 2) = (3, 5), as it should be.Essentially, dynamic programming amounts to breaking a single optimiza-

tion problem with many variables into multiple optimization problems withfewer variables. By doing so, the problem sometimes becomes easier to handle,especially when the problem is stochastic (probabilistic). In the above example,we have solved the single problem with two variables

minx1,x2

f(x1, x2)

by breaking it into two problems with one variable each,

g(x1) := minx2

f(x1, x2) and minx1

g(x1).

7.2 Examples

We now discuss several concrete examples.

7.2.1 Knapsack problem

Suppose you are a thief who has broken into a jewelry store. You have a knapsackof size S (an integer) to pack what you have stolen. There are I types of jewelriesindexed by i = 1, 2, . . . , I, and a type i jewelry has integer size si and value vi.You want to pack your knapsack so as to maximize the value of jewelries thatyou have stolen.

Formulating this problem as a constrained optimization problem is not par-ticularly hard. Letting ni be the number of type i jewelries that you pack, thetotal value is

∑Ii=1 nivi and the total size is

∑Ii=1 nisi. Therefore the problem

is equivalent to

maximize

I∑i=1

nivi

subject to

I∑i=1

nisi ≤ S,

ni: nonnegative integer.

One way to solve this problem is to use the theory on integer linear programming(which I do not discuss further).

66

Page 68: Essential Mathematics for Economists

Another way to solve is to use dynamic programming. Let V (S) be themaximum value of jewelries that can be packed in a size S knapsack. (Thisis called a value function.) Clearly V (S) = 0 if S < mini si since you cannotpack anything in this case. If you put anything at all in your knapsack (soS ≥ mini si), clearly you start packing with some type of jewelry. If you putobject i, then you get value vi and you are left with remaining size S − si. Bythe definition of the value function, if you continue packing optimally, you gettotal value V (S − si) from the remaining space. Therefore if you first packobject i, the maximum value that you can get is

vi + V (S − si).

Since you want to pick the first object optimally, you want to maximize thisvalue with respect to i, which will give you the total maximum value V (S) ofthe original problem. Therefore

V (S) = maxi

[vi + V (S − si)].

You can iterate this equation (called the Bellman equation) backward startingfrom V (S) = 0 for S < mini si to find out the maximum value.

For example, let I = 3 (three types), (s1, s2, s3) = (1, 2, 5), and (v1, v2, v3) =(1, 3, 8). Then

V (0) = 0,

V (1) = v1 + V (0) = 1,

V (2) = maxi

[vi + V (2− si)] = max 1 + V (1), 3 + V (0) = max 2, 3 = 3,

V (3) = maxi

[vi + V (3− si)] = max 1 + V (2), 3 + V (1) = max 4, 4 = 4,

V (4) = max 1 + V (3), 3 + V (2) = max 5, 6 = 6,

V (5) = max 1 + V (4), 3 + V (3), 8 + V (0) = max 7, 7, 8 = 8,

and so on.

7.2.2 Shortest path problem

Suppose that there are locations indexed by i = 1, . . . , I. Traveling directlyfrom i to j costs cij ≥ 0, with cii = 0. (If there is no direct route from i to j,simply define cij = ∞.) You want to find the cheapest way to travel from anypoint i to any other point j.

To solve this problem, let VN (i, j) be the minimum cost to travel from i toj in at most N steps. Let k be the first connection (including possibly k = i).Traveling from i to k costs cik, and now you need to travel from k to j in atmost N − 1 steps. If you continue optimally, the cost from k to j is (by thedefinition of the value function) VN−1(k, j). Therefore the Bellman equation is

VN (i, j) = minkcik + VN−1(k, j) .

Since 0 ≤ VN (i, j) ≤ VN−1(i, j) (because cii = 0), the limit limN→∞ VN (i, j)exists.1 Therefore the cheapest path can be found by iterating backwards fromV1(i, j) = cij .

1In fact, it converges in finite steps. This is because since you visit each point at mostonce, the number of connections is at most I − 1, so VN = VN−1 for N ≥ I.

67

Page 69: Essential Mathematics for Economists

7.2.3 Optimal saving problem

Suppose that you live for T+1 years indexed by t = 0, 1, . . . , T . You have initialwealth w0. At each point in time, you can either consume some of your wealthor save it at gross interest rate R > 0. That is, if you save 1 dollar this year, itwill grow to R dollars next year.

To solve this problem, let wt be your wealth at the beginning of year t. Ifyou consume ct in year t, the next year’s wealth will be wt+1 = R(wt− ct). Forconcreteness, assume that the utility function is

UT (c0, . . . , cT ) =

T∑t=0

βt log ct.

(The subscript T in UT means that there are T years to go in the future.)Clearly we have

UT (c0, . . . , cT ) = log c0 + βUT−1(c1, . . . , cT ).

Let VT (w) be the maximum utility you get when you start with capital wand there are T years to go. If T = 0, you have no choice but to consumeeverything, so V0(w) = logw. If T > 0 and you consume c this year, by thebudget equation you will have capital w′ = R(w − c) next year and there willbe T − 1 years to go. Therefore the Bellman equation is

VT (w) = max0≤c≤w

[log c+ βVT−1(R(w − c))].

In principle you can compute VT (w) by iterating backwards from T = 0using V0(w) = logw. Let us compute V1(w), for example. By the Bellmanequation and V0(w) = logw, we have

V1(w) = max0≤c≤w

[log c+ βV0(R(w − c))]

= max0≤c≤w

[log c+ β log(R(w − c))].

The right-hand side inside the brackets is concave in c, so we can maximize itby setting the derivative equal to zero. The first-order condition is

1

c+ β

−1

w − c= 0 ⇐⇒ w − c = βc ⇐⇒ c =

w

1 + β.

Therefore the value function is

V1(w) = logw

1 + β+ β log

(R

βw

1 + β

)= (1 + β) logw + constant,

Where “constant” is some constant that depends only on the given parametersβ and R.

7.2.4 Drawing cards

Suppose there are equal numbers of black and red cards (say N each), and youdraw one card at a time. You have the option to stop at any time. The scoreyou get when you stop is

“number of black cards drawn”− “number of red cards drawn”.

68

Page 70: Essential Mathematics for Economists

You want to maximize the expected score. What is the optimal strategy?Let b, r be the number of black and red cards that remain in the stack. Then

you have already drawn N − b black cards and N − r red cards, so your currentscore is (N − b)− (N − r) = r − b. If you stop, you get r − b. If you continue,on the next draw you draw a black card with probability b

b+r (and b decreasesby 1) and a red card with probability r

b+r (and r decreases by 1). Let V (b, r)be the expected score when b black cards and r red cards remain. Then theBellman equation is

V (b, r) = max

r − b︸ ︷︷ ︸stop

,b

b+ rV (b− 1, r) +

r

b+ rV (b, r − 1)︸ ︷︷ ︸

continue

.

You can find the optimal strategy by iterating backwards from V (0, 0) = 0.

7.2.5 Optimal proposal

Suppose you know you are going to meet N persons one at a time that youmay want to marry. You can propose only once (possibly because your proposalwill be accepted for sure and the cost of divorce is prohibitive). The value ofyour potential partner is independently distributed uniformly over the interval0 ≤ v ≤ 1. Having observed a candidate, you can either propose or wait tosee the next candidate (but you cannot go back once forgone). You want tomaximize the expected value of your marriage. What is the best strategy topropose?

Let Vn(v) be the maximum expected value when faced with a candidatewith value v and there are n candidates to go. Clearly V0(v) = v. The Bellmanequation is

Vn(v) = max v,E[Vn−1(v′)] = max

v,

∫ 1

0

Vn−1(v′) dv′,

where the expectation is taken with respect to v′, the value of the next candidate.In this case we can do more than writing down the Bellman equation. Since

E[Vn−1(v′)] is just a constant depending on n, say an, it follows that Vn(v) =max v, an. Therefore the optimal strategy is to propose if v ≥ an and waitotherwise. Using the definition of an and the Bellman equation, it follows that

an = E[Vn−1(v′)] =

∫ 1

0

Vn−1(v′) dv′

=

∫ an−1

0

an−1 dv′ +

∫ 1

an−1

v′ dv′

= a2n−1 +

1

2(1− a2

n−1) =1

2(1 + a2

n−1).

Starting from a0 = 0, we can compute the threshold for proposing an.

7.3 General formulation

In general, we can formulate a dynamic programming problem as follows. Ateach stage, there are variables that define your current situation, called state

69

Page 71: Essential Mathematics for Economists

variables. Let xn be the state variable when there are n stages to go. (The statevariable may be a number, a vector, or whatever is relevant for decision making.The dimension of xn may depend on n.) The state variable xn determines yourconstraint set, denoted by Γn(xn). A feasible action is an element of the setΓn(xn), which is called a control variable. By choosing a control yn, the nextstage’s state variable is determined by the law of motion xn−1 = gn(xn, yn).(Here I am indexing the state and control variables by the number of stages togo, so x’s are counted backwards.)

A sequence of state variables (xn, . . . , x0) and control variables (yn, . . . , y0)are said to be feasible if they satisfy the constraint and the law of motion.That is, yk ∈ Γk(xk) and xk−1 = gk(xk, yk) for all k = 0, . . . , n. Given afeasible sequence of state and control variables up to n stages from the last,there corresponds a value (real number) Un((xk, yk)nk=0). (Un is a functionthat takes as argument all present and future state and control variables.) Wewant to maximize or minimize Un depending on the context, but for concretenessassume that we want to maximize Un and let us call it the utility function.

In order to apply dynamic programming, the utility function Un must admita special recursive structure. That is, today’s utility must be a function oftoday’s state and control and tomorrow’s utility. Thus we require

Un = fn(xn, yn, Un−1), (7.1)

where the function fn is called the aggregator, assumed to be continuous andincreasing in the third argument. The supremum of the feasible utility,

Vn(xn) = sup Un((xk, yk)nk=0) | (∀k)yk ∈ Γk(xk), xk−1 = gk(xk, yk) , (7.2)

is called the value function. The following principle of optimality is extremelyimportant.

Theorem 7.1 (Principle of Optimality). Suppose that the aggregator fn(x, y, v)is continuous and increasing in v. Then

Vn(xn) = supyn∈Γn(xn)

fn(xn, yn, Vn−1(gn(xn, yn))). (7.3)

The relation (7.3) is called the Bellman equation.

Proof. For any feasible (xk, yk)nk=0, we have

Un((xk, yk)nk=0)= fn(xn, yn, Un−1(

(xk, yk)n−1

k=0

)) (∵ (7.1))

≤ fn(xn, yn, Vn−1(xn−1)) (∵ (7.2), fn monotone)

= fn(xn, yn, Vn−1(gn(xn, yn))) (∵ xn−1 feasible)

≤ supyn∈Γn(xn)

fn(xn, yn, Vn−1(gn(xn, yn))). (∵ yn feasible)

Taking the supremum of the left-hand side over all feasible paths, we get

Vn(xn) ≤ supyn∈Γn(xn)

fn(xn, yn, Vn−1(gn(xn, yn))).

70

Page 72: Essential Mathematics for Economists

To show the reverse inequality, pick any yn ∈ Γn(xn) and let xn−1 = gn(xn, yn).By the definition of Vn−1, for any v < Vn−1(xn−1) there exists a feasible sequence

(xk, yk)n−1k=0

such that v < Un−1(

(xk, yk)n−1

k=0

). Therefore

Vn(xn) ≥ Un((xk, yk)nk=0) (∵ (7.2))

= fn(xn, yn, Un−1(

(xk, yk)n−1k=0

)) (∵ (7.1))

≥ fn(xn, yn, v). (∵ fn monotone)

Since fn is continuous in v, letting v ↑ Vn−1(xn−1) we get

Vn(xn) ≥ fn(xn, yn, Vn−1(xn−1)) = fn(xn, yn, Vn−1(gn(xn, yn))).

Taking the supremum of the right-hand side with respect to yn ∈ Γn(xn), weget

Vn(xn) ≥ supyn∈Γn(xn)

fn(xn, yn, Vn−1(gn(xn, yn))).

Remark 7.1. The power of dynamic programming is to break a single optimiza-tion problem with many variables into multiple optimization problems withfewer variables. Without dynamic programming, the evaluation of the objec-tive function alone might be a nightmare. For example, in principle we get theutility function Un((xk, yk)nk=0) by iterating (7.1) backwards, but this functionmay be extremely complicated.

Remark 7.2. In the above formulation, we implicitly assumed that the opti-mization problem is deterministic, but the stochastic case is similar. In thestochastic case, the number of control variables increases exponentially with thenumber of stages. (For example, flipping a coin n times has 2n potential out-comes.) Then solving the optimization problem in one shot would be impossiblewhen the number of stages is large. Dynamic programming would be the onlypractical way to solve the problem.

7.4 Solving dynamic programming problems

There are a few ways to solve dynamic programming problems.

7.4.1 Value function iteration

The most basic way to solve a dynamic programming problem is by value func-tion iteration, also called backward induction. Under mild conditions, we knowthat the Bellman equation (7.3) holds. Starting from n = 0, which is merely

V0(x0) = supy0∈Γ0(x0)

U0(x0, y0),

in principle we can compute Vn(xn) by iterating the Bellman equation (7.3)from backwards.

The knapsack problem, shortest path problem, and drawing cards problemcan all be solved this way using a computer, which are left as an exercise.

71

Page 73: Essential Mathematics for Economists

7.4.2 Guess and verify

Sometimes we can guess the functional form of the value function from thestructure of the problem. For example, in the optimal proposal problem, weknow that the value function must be of the form

Vn(v) = max v, an

for some constant an, with a0 = 0. Then we derived a difference equation thatan satisfies,

an =1

2(1 + a2

n−1).

Thus the original problem of finding the value function Vn(v) reduced to findingthe number an.

The optimal saving problem can also be solved by guess and verify. We knowthat V0(w) = logw. We might guess in general that VT (w) = aT + bT logw,where aT and bT are some constants with bT > 0. Assuming that this is correct,substituting into the Bellman equation we get

aT + bT logw = max0≤c≤w

[log c+ β(aT−1 + bT−1 log(R(w − c)))].

Taking the derivative of the expression inside the brackets with respect to c andsetting it equal to zero, we get

1

c− βbT−1

w − c= 0 ⇐⇒ c =

w

1 + βbT−1.

Substituting this into the Bellman equation, we get

aT + bT logw = log c+ β(aT−1 + bT−1 log(R(w − c)))= (1 + βbT−1) logw + constant.

In order for this to be an identity, it must be bT = 1 + βbT−1, which is a firstorder linear difference equation (so can be solved). Since b0 = 1, the generalterm is

bT = 1 + β + · · ·+ βT =1− βT+1

1− β.

There will also be a difference equation for aT , which is unimportant becausethe value of aT does not affect the behavior (it only affects the utility level).Therefore the optimal consumption is

c =w

1 + βbT−1=

w

bT=

1− β1− βT+1

w

when there are T periods to go. This formula means that you should consumea fraction 1−β

1−βT+1 when there are T periods to go, independent of the interestrate.

Problems

7.1. What are the state variables of the knapsack problem, optimal savingproblem, drawing cards, and optimal proposal? What are the control variables?What are the aggregators?

72

Page 74: Essential Mathematics for Economists

7.2. There are N types of coins. A coin of type n has integer value vn. Youwant to find the minimum number of coins needed for the value of the coins tosum to S, where S ≥ 0 is an integer.

1. What is (are) the state variable(s)?

2. Write down the Bellman equation.

3. Solve the problem for S = 10 when N = 3 and (v1, v2, v3) = (1, 2, 4).

7.3. Suppose you live for 1 + T years and your utility function is

E

T∑t=0

βtu(ct),

where ct is consumption. At each time t you get a job offer (income) y drawnfrom some distribution. If you accept the job, you receive y each year for therest of your life. If you reject the job, you get unemployment benefit b today andyou can search for a job next period. Assume that you cannot save or borrow,so you spend all your income every period. Write down the Bellman equation.

7.4. You have a call option on a stock with strike price K and time to expirationT . This means that if you exercise the option at time t ≤ T when the stockprice is St, you will get St −K at t. If you don’t exercise the option, you willget nothing. You want to exercise the option so as to maximize the expecteddiscounted payoff

E

[1

(1 + r)t(St −K)

],

where t is the exercise date and r is the interest rate. Assume that the grossreturn of the stock is

St+1

St=

1 + µ+ σ, (with probability 1/2)

1 + µ− σ, (with probability 1/2)

where µ > 0 is the expected return and σ > µ is the volatility.

1. What is (are) the state variable(s)?

2. Write down the Bellman equation that the option value satisfies.

3. Compute the option value when T = 1 and the current stock price is S,where S < K < (1 + µ+ σ)S.

7.5. Suppose you currently have a T year mortgage at fixed interest rate r,which you can keep or refinance once. Assume that there are J types of mortgagein the market and S states of the world. The term of mortgage j is Tj yearsand the interest rate is rjs in state s. The transition probability from state s

to s′ is pss′ , so∑Ss′=1 pss′ = 1. Letting the mortgage payment at time t be mt,

your objective is to minimize the discounted expected payments

E∑t≥1

βtmt,

where β > 0 is your discount factor.

73

Page 75: Essential Mathematics for Economists

1. What are the state variables?

2. Write down the Bellman equation. (Hint: the objective function is linearin mortgage payments, so consider the value function per dollar borrowed.)

7.6. Set up concrete numbers for the knapsack problem, the shortest pathproblem, and the drawing cards problem. Solve the problems using your favoriteprogramming language (Matlab, Python, etc.).

7.7. You are a potato farmer. You start with some stock of potatoes. At eachtime, you can eat some of them and plant the rest. If you plant x potatoes, youwill harvest Axα potatoes at the beginning of the next period, where A,α > 0.You want to maximize your utility from consuming potatoes

T∑t=0

βt log ct,

where 0 < β < 1 is the discount factor, ct > 0 is consumption of potatoes attime t, and T is the number of periods you live.

1. If you have k potatoes now and consume c out of it, how many potatoescan you harvest next period?

2. Let VT (k) be the maximum utility you get when you start with k potatoes.Write down the Bellman equation.

3. Solve for the optimal consumption when T = 1.

4. Guess that VT (k) = aT + bT log k for some constants aT , bT . Assumingthat this guess is correct, derive a relation between bT and bT−1.

7.8. Consider the optimal saving problem with the utility function

T∑t=0

βtc1−γt

1− γ,

where γ > 0 and γ 6= 1.

1. Write down the Bellman equation.

2. Show that the value function must be of the form VT (w) = aTw1−γ

1−γ forsome aT > 0 with a0 = 1.

3. Take the first-order condition and express the optimal consumption as afunction of aT−1.

4. Substitute the optimal consumption into the Bellman equation and derivea relation between aT and aT−1.

5. Solve for aT and the optimal consumption rule.

7.9. Consider the optimal saving problem with stochastic interest rates. Let Rsbe the gross interest rate in state s ∈ 1, . . . , S, and let pss′ be the probabilityof moving from state s to s′.

74

Page 76: Essential Mathematics for Economists

1. Write down the Bellman equation.

2. Show that the value function must be of the form VT (w, s) = as,Tw1−γ

1−γfor some as,T > 0 with as,0 = 1.

3. By solving for the optimal consumption rule, derive a relation betweenas,T and as′,T−1Ss′=1.

75

Page 77: Essential Mathematics for Economists

Part II

Advanced Topics

76

Page 78: Essential Mathematics for Economists

Chapter 8

Contraction MappingTheorem and Applications

8.1 Contraction Mapping Theorem

In economics, we often apply fixed point theorems. Let X be a set and T : X →X a mapping that maps X into itself. (Such a mapping is called a self map.)Then x ∈ X is called a fixed point if T (x) = x, i.e., the point x stays fixed byapplying the mapping T .

One of the most useful (and easiest to prove) fixed point theorems is thecontraction mapping theorem, also known as the Banach fixed point theorem(named after the Polish mathematician who proved it first).

Before stating the theorem, we need to introduce some definitions. Let Xbe a set. Then the function d : X ×X → R is called a metric (or distance) if

1. (positivity) d(x, y) ≥ 0 for all x, y ∈ X, and d(x, y) = 0 if and only ifx = y,

2. (symmetry) d(x, y) = d(y, x) for all x, y ∈ X, and

3. (triangle inequality) d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z ∈ X.

The pair (X, d) is called a metric space if d is a metric on X. When the metricd is clear from the context, we often call X a metric space. A typical exampleis X = RN and

d(x, y) =

√√√√ N∑n=1

(xn − yn)2

(the Euclidean distance), or more generally

d(x, y) =

(N∑n=1

|xn − yn|p) 1p

for p ≥ 1 (lp distance). When p =∞, it becomes d(x, y) = maxn |xn − yn| (supnorm).

77

Page 79: Essential Mathematics for Economists

Example 8.1. If X is a normed space, meaning that X is a vector space witha norm ‖·‖, then (X, d) becomes a metric space by setting d(x, y) = ‖x− y‖.

Proof. Take any x, y, z ∈ X. We have d(x, y) = ‖x− y‖ ≥ 0, with equality ifand only if x = y. Furthermore, d(x, y) = ‖x− y‖ = ‖y − x‖ = d(y, x). Finally,by the triangle inequality for the norm, we obtain

d(x, z) = ‖x− z‖ = ‖x− y + y − z‖ ≤ ‖x− y‖+‖y − z‖ = d(x, y)+d(y, z).

A metric space (X, d) is called complete if any Cauchy sequence is convergent,that is, x = limn→∞ xn exists whenever

(∀ε > 0)(∃N > 0)(∀m,n ≥ N) d(xm, xn) < ε.

A complete normed space is called a Banach space. We discuss a few examples.

Example 8.2. Let Ω be a set and bΩ be the set of all bounded functions fromΩ to R. Then bΩ is a Banach space with the sup norm ‖f‖ = supx∈Ω |f(x)|.

Proof. Let us first show that ‖·‖ is a norm. Since f is bounded for f ∈ bΩ,‖f‖ = supx∈Ω |f(x)| < ∞ is well-defined. Clearly ‖f‖ = supx∈Ω |f(x)| ≥ 0,with equality if and only if f = 0. For any α ∈ R, we have

‖αf‖ = supx∈Ω|αf(x)| = |α| sup

x∈Ω|f(x)| = |α| ‖f‖ .

Let f, g ∈ bΩ. Then for any x ∈ Ω we have

|f(x) + g(x)| ≤ |f(x)|+ |g(x)| ≤ ‖f‖+ ‖g‖ .

Taking the supremum of the left-hand side over x ∈ Ω, we obtain ‖f + g‖ ≤‖f‖+ ‖g‖, so the triangle inequality holds. Hence ‖·‖ is a norm.

To show that bΩ is a Banach space, it suffices to show that bΩ is complete.Let fn∞n=1 be a Cauchy sequence in bΩ. Then for all ε > 0, there exists N suchthat m,n ≥ N implies ‖fm − fn‖ < ε. Hence |fm(x)− fn(x)| < ε for all m,n ≥N and x ∈ Ω. Since for each x ∈ Ω the real sequence fn(x)∞n=1 is Cauchyand R is complete, there exists f(x) = limn→∞ fn(x). Letting m → ∞ in|fm(x)− fn(x)| < ε, we obtain |f(x)− fn(x)| ≤ ε. Using the triangle inequalityand noting that fn ∈ bΩ, we obtain |f(x)| ≤ |fn(x)| + ε ≤ ‖fn‖ + ε < ∞, so fis bounded with ‖f‖ ≤ ‖fn‖ + ε. Therefore f ∈ bΩ. Taking the supremum of|f(x)− fn(x)| ≤ ε over x ∈ Ω, we obtain ‖f − fn‖ ≤ ε. Therefore fn → f , andbΩ is a Banach space.

Example 8.3. Let Ω be a topological space and bcΩ be the set of all boundedcontinuous functions from Ω to R. Then bcΩ is a Banach space with the supnorm ‖f‖ = supx∈Ω |f(x)|.

Proof. Since bcΩ ⊂ bΩ, by Example 8.2, ‖·‖ is a norm on bcΩ.To show completeness, let fn∞n=1 be a Cauchy sequence in bcΩ. Then by

Example 8.2, we have f = limn→∞ fn ∈ bΩ. Therefore to show that bcΩ is aBanach space, it suffices to show that f is continuous. Take any ε > 0. Sincefn → f in bΩ, we can take N such that ‖f − fn‖ < ε/3 for n > N . Fix such n

78

Page 80: Essential Mathematics for Economists

and take any x ∈ Ω. Since fn is continuous, we can take a neighborhood U ofx such that |fn(y)− fn(x)| < ε/3 for y ∈ U . Then

|f(y)− f(x)| = |f(y)− fn(y) + fn(y)− fn(x) + fn(x)− f(x)|≤ |f(y)− fn(y)|+ |fn(y)− fn(x)|+ |fn(x)− f(x)|

≤ ‖f − fn‖+ε

3+ ‖f − fn‖ < ε,

so f is continuous.

Let (X, d) be a metric space. A mapping T : X → X is called a contractionmapping (or simply a contraction) with modulus β if β ∈ [0, 1) and

d(T (x), T (y)) ≤ βd(x, y) (8.1)

for all x, y ∈ X. Intuitively, the condition (8.1) means that when we apply T ,the distance between two points shrinks by a factor at most β < 1. The followingcontraction mapping theorem (also called the Banach fixed point theorem) iselementary but has many important applications, as we shall see below.

Theorem 8.1 (Contraction Mapping Theorem). Let (X, d) be a complete met-ric space and T : X → X be a contraction with modulus β ∈ [0, 1). Then

1. T has a unique fixed point x ∈ X,

2. for any x0 ∈ X, we have x = limn→∞ Tn(x0), and

3. the approximation error d(Tn(x0), x) has order of magnitude βn.

Proof. First note that a contraction is continuous (indeed, uniformly continu-ous) because for any ε > 0, if d(x, y) < ε then d(T (x), T (y)) ≤ βd(x, y) ≤ βε ≤ ε.

Take any x0 ∈ X and define xn = T (xn−1) for n ≥ 1. Then xn = Tn(x0).Since T is a contraction, we have

d(xn, xn−1) = d(T (xn−1), T (xn−2)) ≤ βd(xn−1, xn−2) ≤ · · · ≤ βn−1d(x1, x0).

If m > n ≥ N , then by the triangle inequality we have

d(xm, xn) ≤ d(xm, xm−1) + · · ·+ d(xn+1, xn)

≤ (βm−1 + · · ·+ βn)d(x1, x0)

=βn − βm

1− βd(x1, x0) ≤ βn

1− βd(x1, x0) ≤ βN

1− βd(x1, x0).

Since 0 ≤ β < 1, βN gets smaller as N gets large, so xn is a Cauchy sequence.Since X is complete, x = limn→∞ xn exists. Since

d(T (xn), xn) = d(xn+1, xn) ≤ βnd(x1, x0),

letting n→∞ and using the continuity of T , we get d(T (x), x) = 0. Since d isa metric, we have T (x) = x, so x is a fixed point of T .

To show uniqueness, suppose that x, y are fixed points of T , so T (x) = xand T (y) = y. Since T is a contraction, we have

0 ≤ d(x, y) = d(T (x), T (y)) ≤ βd(x, y) =⇒ (β − 1)d(x, y) ≥ 0.

79

Page 81: Essential Mathematics for Economists

Since β < 1, it must be d(x, y) = 0 and hence x = y. Therefore the fixed pointis unique.

Finally, let x be the fixed point of T , x0 be any point, and xn = Tn(x0).Then

d(xn, x) = d(T (xn−1), T (x)) ≤ βd(xn−1, x) ≤ · · · ≤ βnd(x0, x),

so letting n→∞ we have xn → x, and the error has order of magnitude βn.

Sometimes, we need to work with mappings T such that T k is a contractionfor some k ∈ N, although T itself may not be a contraction. The followingtheorem extends Theorem 8.1 to such cases. (See Problem 8.2 for an example.)

Theorem 8.2. Let (X, d) be a complete metric space and T : X → X be suchthat T k is a contraction for some k ∈ N. Then T has a unique fixed point x ∈ Xand we have x = limn→∞ Tn(x0) for any x0 ∈ X.

Proof. By the contraction mapping theorem, T k has a unique fixed point x ∈ X,so T k(x) = x. Since

T (x) = T (T k(x)) = T k+1(x) = T k(T (x)),

T (x) is also a fixed point of T k. Since the fixed point of T k is unique, it mustbe T (x) = x, so x is a fixed point of T . If x, y are fixed points of T , thenx = T (x) = · · · = T k(x) and y = T (y) = · · · = T k(y), so x, y are also fixedpoints of T k. Since T k is a contraction, it must be x = y. Therefore the fixedpoint of T is unique.

To show Tn(x0) → x for any x0 ∈ X, express any n ∈ N uniquely asn = kmn + rn, where mn ∈ Z+ and rn ∈ 0, . . . , k − 1. Then for each fixedr ∈ 0, . . . , k − 1, applying Theorem 8.1 to the initial value T r(x0) we havex = limm→∞(T k)mT r(x0) = limm→∞ T km+r(x0). Since r ∈ 0, . . . , k − 1 isarbitrary, we obtain Tn(x0)→ x.

8.2 Blackwell’s condition for contraction

It would be convenient if there is a sufficient condition for contraction that iseasily verifiable. Blackwell (1965) provides one such example.

Let X be a set. We say that a binary relation ≤ on X is a partial order if

1. (reflexivity) x ≤ x for all x ∈ X,

2. (antisymmetry) if x ≤ y and y ≤ x, then x = y,

3. (transitivity) if x ≤ y and y ≤ z, then x ≤ z.

A set with a partial order is called a partially ordered set, or poset for short.X = RN is a partially ordered Banach space by letting x ≤ y whenever xn ≤ ynfor all n.

Example 8.4. Let Ω be a topological space and bcΩ be the set of all boundedcontinuous functions from Ω to R. Then bcΩ is a partially ordered Banach spaceif we define f ≤ g whenever f(x) ≤ g(x) for all x ∈ Ω and ‖f‖ = supx∈Ω |f(x)|.

The following theorem provides a sufficient condition for a contraction.

80

Page 82: Essential Mathematics for Economists

Theorem 8.3 (Blackwell, 1965). Let Ω be a set and X ⊂ bΩ be a subset ofbounded functions from Ω to R. Suppose that for all f ∈ X and c ∈ R+, wehave f + c ∈ X, and T : X → X satisfies

1. (monotonicity) f ≤ g implies Tf ≤ Tg,

2. (discounting) there exists β ∈ [0, 1) such that T (f + c) ≤ Tf + βc for allconstant c ≥ 0.

Then T is a contraction.

Proof. Let ‖·‖ be the sup norm. Take any f, g ∈ X and x ∈ Ω. Since

f(x) = f(x)− g(x) + g(x) ≤ g(x) + ‖f − g‖ ,

we have f ≤ g + ‖f − g‖. Hence

Tf ≤ T (g + ‖f − g‖) (∵ monotonicity)

≤ Tg + β ‖f − g‖ (∵ discounting for c = ‖f − g‖)=⇒ Tf − Tg ≤ β ‖f − g‖ .

Interchanging the role of f, g, we obtain Tg− Tf ≤ β ‖f − g‖. This shows that|(Tf)(x)− (Tg)(x)| ≤ β ‖f − g‖ for any x ∈ Ω. Taking the supremum over x,we obtain ‖Tf − Tg‖ ≤ β ‖f − g‖, so T is a contraction.

8.3 Markov chain and Perron’s theorem

When a random variable is indexed by time, it is called a stochastic process.Let Xt be a stochastic process, where t = 0, 1, 2 . . . . When the distributionof Xt conditional on the past information Xt−1, Xt−2, . . . depends only on themost recent past (i.e., Xt−1), Xt is called a Markov process. For example, anAR(1) process

Xt = aXt−1 + εt

(where a is a number and εt is independent and identically distributed over time)is a Markov process. When the Markov process Xt takes on finitely many val-ues, it is called a finite-state Markov chain. Let Xt be a (finite-state) Markovchain and n = 1, 2, . . . , N be an index of the values the process can take. (Wesay that Xt = xn when the state at t is n.) Since there are finitely many states,the distribution of Xt conditional on Xt−1 is just a multinomial distribution.Therefore the Markov chain is completely characterized by the transition prob-ability (stochastic) matrix P = (pnn′), where pnn′ is the probability of moving

from state n to n′. Clearly, we have pnn′ ≥ 0 and∑Nn′=1 pnn′ = 1.

Suppose that X0 is distributed according to the distribution µ = [µ1, . . . , µN ](µn is the probability of being in state n). Then what is the distribution of X1?Using transition probabilities, the probability of being in state n′ at t = 1 is∑Nn=1 µnpnn′ , because the process must be in some state (say n) at t = 0 (which

happens with probability µn) and conditional on being in state n at t = 0, theprobability of moving to state n′ at t = 1 is pnn′ . By the definition of matrixmultiplication,

N∑n=1

µnpnn′ = (µP )n′

81

Page 83: Essential Mathematics for Economists

is the n′-th element of the row vector µP . Therefore µP is the distributionof X1. Similarly, the distribution of X2 is (µP )P = µP 2, and in general thedistribution of Xt is µP t.

As we let the system run for a long time, does the distribution settle down tosome fixed distribution? That is, does limt→∞ µP t exist, and if so, is it unique?We can answer this question by using the contraction mapping theorem.

Theorem 8.4. Let P = (pnn′) be a stochastic matrix such that pnn′ > 0 forall n, n′. Then there exists a unique invariant distribution π such that π = πP ,and limt→∞ µP t = π for all initial distribution µ.

Proof. Let ∆ =x ∈ RN+

∣∣∣∑Nn=1 xn = 1

be the set of all multinomial distri-

butions. Since ∆ ⊂ RN is closed and RN is a complete metric space with theL1 norm (that is, d(x, y) = ‖x− y‖ for ‖x‖ =

∑Nn=1 |xn|), ∆ is also a complete

metric space.Define T : ∆ → ∆ by T (x) = xP . To show that T (x) ∈ ∆, note that if

x ∈ ∆, since pnn′ ≥ 0 for all n, n′, we have xP ≥ 0, and since∑Nn′=1 pnn′ = 1,

we have

N∑n′=1

(xP )n′ =

N∑n′=1

N∑n=1

xnpnn′ =

N∑n=1

xn

N∑n′=1

pnn′ =

N∑n=1

xn = 1.

Therefore T (x) = xP ∈ ∆.Next, let us show that T is a contraction mapping. Since pnn′ > 0 and the

number of states is finite, there exists ε > 0 such that pnn′ > ε for all n, n′.Without loss of generality, we may assume Nε < 1. Let qnn′ = pnn′−ε

1−Nε > 0and Q = (qnn′). Since

∑n′ pnn′ = 1, we obtain

∑n′ qnn′ = 1, so Q is also a

stochastic matrix. Letting J be the matrix with all entries equal to 1, we haveP = (1−Nε)Q+ εJ .

Now let µ, ν ∈ ∆. Then

µP − νP = (1−Nε)(µQ− νQ) + ε(µJ − νJ).

Since all entries of J are 1 and the vectors µ, ν sum to 1, we have µJ = νJ =1 = (1, . . . , 1). Therefore letting 0 < β = 1−Nε < 1, we get

‖T (µ)− T (ν)‖ = ‖µP − νP‖ = β ‖µQ− νQ‖

= β

N∑n′=1

|(µQ)n′ − (νQ)n′ | = β

N∑n′=1

∣∣∣∣∣N∑n=1

(µn − νn)qnn′

∣∣∣∣∣≤ β

N∑n′=1

N∑n=1

|µn − νn| qnn′ = β

N∑n=1

|µn − νn|N∑

n′=1

qnn′

= β

N∑n=1

|µn − νn| = β ‖µ− ν‖ .

Therefore T is a contraction. By the contraction mapping theorem, there existsa unique π ∈ ∆ such that πP = π, and limt→∞ µP t = π for all µ ∈ ∆.

Remark 8.1. The same conclusion holds if there exists a number k such thatP k is a positive matrix. Just apply Theorem 8.2.

82

Page 84: Essential Mathematics for Economists

We can prove Perron’s theorem (Theorem 1.6) using Theorem 8.4.

Proof of Theorem 1.6. Let α = ρ(A) be the spectral radius of A. Let us firstshow Parts 1 and 2. Since A is positive, we can take d > 0 such that A ≥ dI.Then α = ρ(A) ≥ ρ(dI) = d > 0 by Problem 1.15. Let λ be an eigenvalue of Awith |λ| = α > 0 and u = (u1, . . . , uN )′ 6= 0 be a corresponding eigenvector. Letv = (|u1| , . . . , |uN |)′ be the vector of absolute values. Since Au = λu, takingthe absolute value of each entry and noting that A is positive, we obtain

α |um| =

∣∣∣∣∣N∑n=1

amnun

∣∣∣∣∣ ≤N∑n=1

amn |un| ⇐⇒ αv ≤ Av.

Let us show that Av = αv. Suppose to the contrary that Av > αv. ThenAv − αv > 0, so multiplying A from the left and noting that A is positive, weobtain

A(Av − αv) 0 ⇐⇒ A2v αAv.

Since A is finite dimensional, we can take ε > 0 such that A2v ≥ (1 + ε)αAv.Multiplying both sides from left by Ak−1, we obtain

Ak+1v ≥ (1 + ε)αAkv ≥ · · · ≥ [(1 + ε)α]kAv.

Taking the norm of both sides, we obtain∥∥Ak∥∥ ‖Av‖ ≥ ∥∥Ak+1v∥∥ ≥ [(1 + ε)α]k ‖Av‖ =⇒

∥∥Ak∥∥1/k ≥ (1 + ε)α.

Letting k → ∞, by the Gelfand spectral radius formula (Proposition 1.5), weobtain α ≥ (1 + ε)α, which is a contradiction since α > 0. Therefore Av = αv.Since v > 0, we have Av 0, so v = 1

αAv 0.To show Part 3, let x be a right Perron vector of A, so Ax = αx. Suppose

there exists a (complex) vector u such that Au = αu. Since A,α are both real,by taking the real and imaginary parts, v = Reu, Imu both satisfy Av = αv. Ifv is not collinear with x, then v 6= 0. Without loss of generality, we may assumev has a positive entry, so (since x 0) we can take c > 0 such that x− cv > 0and at least one entry of x− cv is zero. But then

0 A(x− cv) = α(x− cv),

a contradiction. Therefore v = Reu, Imu are both collinear with x, and so is u.Hence the eigenvalue α = ρ(A) is geometrically simple.

To show Part 4, let x, y 0 be the (unique) right and left Perron vectors of

A. Then for each m we have∑Nn=1 amnxn = αxm. Define the diagonal matrix

D = diag(x1, . . . , xN ), which is regular. Let P = 1αD−1AD. Comparing the

(m,n) entry, we obtain pmn = amnxnαxm

, so P is positive and

N∑n=1

pmn =

N∑n=1

amnxnαxm

= 1.

Thus P is a positive stochastic matrix. By Theorem 8.4, there exists a uniquevector θ 0 with

∑Nn=1 θn = 1 such that

θ′P = θ′ ⇐⇒ θ′1

αD−1AD = θ′ ⇐⇒ θ′D−1A = αθ′D−1.

83

Page 85: Essential Mathematics for Economists

This shows that θ′D−1 is a positive left eigenvector of A corresponding tothe eigenvalue α = ρ(A), so it must be θ′D−1 = y′ (up to a multiplicative

constant). Multiplying x from the right and noting that∑Nn=1 θn = 1 and

D = diag(x1, . . . , xN ), it must be y′x = 1. Setting µ to be the unit vectorse1, . . . , eN in Theorem 8.4 and letting 1 be the vector of ones, we obtain

1θ′ = limk→∞

P k = limk→∞

D−1

[1

αA

]kD

⇐⇒ limk→∞

[1

ρ(A)A

]k= D1θ′D−1 = xy′.

8.4 Implicit function theorem

When solving economic models, we often encounter equations like f(x, y) = 0,where y is an endogenous variable and x is an exogenous variable. Oftentimesy does not have an explicit expression, but nevertheless we might be interestedin dy/dx. The implicit function theorem lets you compute this derivative asfollows. Let y = g(x). Then f(x, g(x)) = 0. Differentiating both sides withrespect to x and using the chain rule, we get fx+fyg

′(x) = 0, so g′(x) = −fx/fy.(fx is the shorthand for ∂f/∂x.)

The same argument holds for multi dimensions. If x is M -dimensional, yis N -dimensional, and f : RM × RN → RN , then differentiating both sides off(x, g(x)) = 0 we get Dxf + DyfDxg = 0, so Dxg = −[Dyf ]−1Dxf . (Dxf isthe Jacobian (matrix of partial derivatives) of f with respect to x: the (m,n)element of Dxf is ∂fm/∂xn.) The goal of this section is to prove the followingimplicit function theorem.

Theorem 8.5 (Implicit Function Theorem). Let f : RM × RN → RN be acontinuously differentiable function. If f(x0, y0) = 0 and Dyf(x0, y0) is regular,then there exist neighborhoods U of x0 and V of y0 and a function g : U → Vsuch that

1. for all x ∈ U , f(x, y) = 0 ⇐⇒ y = g(x),

2. g is continuously differentiable, and

3. Dxg(x) = −[Dyf(x, y)]−1Dxf(x, y), where y = g(x).

The last claim of the implicit function theorem follows from the first two andthe chain rule. The first two claims follow from the inverse function theorem:

Theorem 8.6 (Inverse Function Theorem). Let f : RN → RN be a continuouslydifferentiable function. If Df(x0) is regular, then there exists a neighborhood Vof y0 = f(x0) such that

1. f : U = f−1(V )→ V is bijective (one-to-one and onto),

2. g = f−1 is continuously differentiable, and

3. Dg(y) = [Df(g(y))]−1 on V .

84

Page 86: Essential Mathematics for Economists

Proof of Implicit Function Theorem. Define F : RM+N → RM+N by

F (x, y) =

[x

f(x, y)

].

Then F is continuously differentiable. Furthermore, since

DF (x, y) =

[IM OM,N

Dxf(x, y) Dyf(x, y)

],

we have detDF (x0, y0) = detDyf(x0, y0) 6= 0, so DF (x0, y0) is regular. SinceF (x0, y0) = (x0, 0), by the inverse function theorem there exists a neighborhoodV of (x0, 0) such that F : F−1(V ) → V is bijective. Let G be the inversefunction of F . Then for any (z, w) ∈ V , we have F (x, y) = (z, w) ⇐⇒ (x, y) =G(z, w) = (G1(z, w), G2(z, w)). Since by definition F (x, y) = (x, f(x, y)), wehave x = z, so f(x, y) = w ⇐⇒ y = G2(x,w). Letting w = 0, we havef(x, y) = 0 ⇐⇒ y = g(x) := G2(x, 0). Since G is continuously differentiable,so is g. Therefore the implicit function theorem is proved.

Next, we prove the inverse function theorem. If f is linear, so f(x) =y0 + A(x − x0) for some matrix A, then we can find the inverse function bysolving y = y0 + A(x − x0) ⇐⇒ x = x0 + A−1(y − y0) (provided that A isregular). The idea to prove the nonlinear case is to linearize f around x0.

Since f is differentiable, we have y = f(x) ≈ f(x0)+Df(x0)(x−x0). Solvingthis equation for x, we obtain

x ≈ x0 +Df(x0)−1(y − f(x0)).

This equation shows that given an approximate solution x0 of f(x) = y, thesolution can be approximated further by x0 + Df(x0)−1(y − f(x0)) (which isessentially the Newton-Raphson algorithm for solving the nonlinear equationf(x) = y). This intuition is helpful for understanding the proof below.

To prove the inverse function theorem, we need the following result.

Proposition 8.7 (Mean value inequality). Let f : RN → RM be differentiableand ‖·‖ be the Euclidean norm (as well as the operator norm induced by ‖·‖).Then

‖f(x2)− f(x1)‖ ≤ supt∈[0,1]

‖Df(x1 + t(x2 − x1))‖ ‖x2 − x1‖ .

Proof. The claim is trivial if f(x1) = f(x2), so assume f(x1) 6= f(x2). Takeany d ∈ RM with ‖d‖ = 1. Define φ : [0, 1]→ R by

φ(t) = d′(f(x1 + t(x2 − x1))− f(x1)).

Then φ(0) = 0 and φ(1) = d′(f(x2)−f(x1)). By the mean value theorem, thereexists t ∈ [0, 1] such that

d′(f(x2)− f(x1)) = φ(1)− φ(0) = φ′(t) = d′Df(x1 + t(x2 − x1))(x2 − x1),

where the last equality follows from the chain rule. Taking the norm of bothsides, we obtain

|d′(f(x2)− f(x1))| ≤ ‖d′Df(x1 + t(x2 − x1))(x2 − x1)‖≤ ‖Df(x1 + t(x2 − x1))‖ ‖x2 − x1‖

because ‖d‖ = 1. Taking the supremum over t ∈ [0, 1] and setting d = (f(x2)−f(x1))/ ‖f(x2)− f(x1)‖, we obtain the desired inequality.

85

Page 87: Essential Mathematics for Economists

Proof of Inverse Function Theorem. Fix y ∈ RN and define T : RN → RNby

T (x) = x+Df(x0)−1(y − f(x)),

which is well-defined because Df(x0) is regular. For notational simplicity, letA = Df(x0)−1. Applying the mean value inequality to T (x), for any x1, x2 wehave

‖T (x2)− T (x1)‖ ≤ supt∈[0,1]

‖I −ADf(x(t))‖ ‖x2 − x1‖ ,

where x(t) = x1 + t(x2 − x1). Since f is continuously differentiable and A =Df(x0)−1, we can take ε > 0 such that ‖I −ADf(x)‖ ≤ 1/2 whenever ‖x− x0‖ ≤ε. Note that since y cancels out in T (x2) − T (x1), so ε does not depend on y.Let B = x | ‖x− x0‖ ≤ ε be the closed ball with center x0 and radius ε. Ifx1, x2 ∈ B, then

‖x(t)− x0‖ ≤ (1− t) ‖x1 − x0‖+ t ‖x2 − x0‖ ≤ ε,

so x(t) ∈ B. Then by assumption ‖T (x2)− T (x1)‖ ≤ 12 ‖x2 − x1‖ on B. Let us

show that T (B) ⊂ B if y is sufficiently close to y0 = f(x0). To see this, notethat

T (x)− x0 = x− x0 +A(f(x0)− f(x)) +A(y − y0).

Using the mean value inequality to h(x) = x−Af(x), we obtain

‖T (x)− x0‖ ≤ supt∈[0,1]

‖I −ADf(x0 + t(x− x0))‖ ‖x− x0‖+ ‖A(y − y0)‖ .

Take a neighborhood V of y0 such that ‖A(y − y0)‖ ≤ 12ε for all y ∈ V . If y ∈ V

and x ∈ B, then we have

‖T (x)− x0‖ ≤1

2‖x− x0‖+

1

2ε ≤ ε,

so T (x) ∈ B. This shows that T (B) ⊂ B.Since T : B → B, B is closed, and T is a contraction on B, there exists a

unique fixed point x of T . Since

x = T (x) = x+A(y − f(x)),

we have y = f(x).Let the unique x ∈ B such that y = f(x) for any y ∈ V be denoted by

x = g(y). Since f(x) = y, we have f(g(y)) = y. First let us show that g iscontinuous on V . Suppose yn → y and xn = g(yn) but xn 6→ x. Since B iscompact, by taking a subsequence if necessary, we may assume that xn → x′,where x 6= x′ ∈ B. Since f(xn) = f(g(yn)) = yn, letting n → ∞, since f iscontinuous we get f(x′) = y, which is a contradiction because f(x) = y has aunique solution on B. Therefore g is continuous.

To show the differentiability of g, for small enough h ∈ RN , let k(h) := g(y+h)−g(y). Since g is continuous, so is k. Noting that g(y+h) = g(y)+k(h) = x+kand f is differentiable, we obtain

y + h = f(g(y + h)) = f(x+ k) = f(x) +Dfk + o(k).

86

Page 88: Essential Mathematics for Economists

Since y = f(x), we get h = Dfk + o(k), so h and k have the same order ofmagnitude. Finally,

g(y + h) = g(y) + k = g(y) + [Df ]−1(h− o(k)) = g(y) + [Df ]−1h+ o(h),

so g is differentiable and Dg(y) = [Df(g(y))]−1.

As an application of the implicit function theorem, let us study how aninvestor’s asset allocation is related to wealth. Consider an agent with vonNeumann-Morgenstern utility function u and initial wealth w > 0. Supposethat there are two assets, one risky (stock) with gross return R > 0 and theother risk-free (bond) with gross risk-free rate Rf > 0. If the investor invests xin the risky asset, the total wealth after investment is

R(x) := Rx+Rf (w − x).

Therefore the utility maximization problem is

maximizex

E[u(R(x))],

where E denotes the expectation. Suppose that u′ > 0 and u′′ < 0, so utility isstrictly increasing and concave.

The following lemma shows that if the expected excess return of the riskyasset is positive, then the investor always holds a positive amount of stocks.

Lemma 8.8. Suppose that E[R] > Rf and a solution x to the utility maximiza-tion problem exists. Then x > 0.

Proof. Let f(x) = E[u(R(x))] be the expected utility. Then by the chain rulewe have

f ′(x) = E[u′(R(x))(R−Rf )],

f ′′(x) = E[u′′(R(x))(R−Rf )2].

Since u′′ < 0, and R 6= Rf with positive probability because E[R] > Rf , itfollows that f ′′(x) < 0. Therefore f is strictly concave.

If x solves the utility maximization problem, by the first-order condition wehave f ′(x) = 0. Since

f ′(0) = E[u′(Rfw)(R−Rf )] = u′(Rfw)(E[R]−Rf ) > 0

because u′ > 0 and E[R] > Rf and f ′ is strictly decreasing because f ′′ < 0,f ′(x) = 0 < f ′(0) implies x > 0.

A measure of risk aversion known as the absolute risk aversion coefficientat wealth w is defined by the quantity α(w) = −u′′(w)/u′(w). The followingproposition shows that if an investor’s absolute risk aversion is decreasing, thenthe investor holds more stocks as he gets richer. The proof is an application ofthe implicit function theorem.

Proposition 8.9. Suppose that α(w) = −u′′(w)/u′(w) is decreasing, E[R] >Rf , and let x > 0 be the optimal stock holdings. Then ∂x/∂w ≥ 0.

87

Page 89: Essential Mathematics for Economists

Proof. By the first-order condition, we have

F (w, x) := E[u′(R(x))(R−Rf )] = 0.

Assuming that the implicit function theorem is applicable, we have ∂x/∂w =−(∂F/∂w)/(∂F/∂x). The denominator is

∂F

∂x= E[u′′(R(x))(R−Rf )2] < 0,

so we can apply the implicit function theorem. Using the definition of α, thenumerator is

∂F

∂w= E[u′′(R(x))(R−Rf )Rf ] = −E[α(R(x))u′(R(x))(R−Rf )Rf ].

If R ≥ Rf , since x > 0 we get R(x) = Rx+Rf (w−x) = Rfw+(R−Rf )x ≥ Rfw.Since α is decreasing, we get α(R(x)) ≤ α(Rfw). Multiplying both sides byR−Rf ≥ 0, we obtain

α(R(x))(R−Rf ) ≤ α(Rfw)(R−Rf ).

If R ≤ Rf , by a similar argument R(x) ≤ Rf and α(R(x)) ≥ α(Rfw), so again

α(R(x))(R−Rf ) ≤ α(Rfw)(R−Rf ).

Since u′ > 0, we obtain

E[α(R(x))u′(R(x))(R−Rf )Rf ] ≤ E[α(Rfw)u′(R(x))(R−Rf )Rf ]

= α(Rfw)Rf E[u′(R(x))(R−Rf )] = 0

by the first order condition, so ∂F/∂w ≥ 0. Hence by the implicit functiontheorem ∂x/∂w = −(∂F/∂w)/(∂F/∂x) ≥ 0.

For similar economic applications of the implicit function theorem, see forexample Theorem 3 of Phelan and Toda (2019) and Lemma 1 of Toda and Walsh(2020).

Problems

8.1. Let (X, d) be a complete metric space, T : X → X a contraction with aunique fixed point x ∈ X, and let X1 ⊂ X be a nonempty closed set such thatTX1 ⊂ X1. Show that x ∈ X1.

8.2. Consider the integral equation

f(x) = λ

∫ x

a

K(x, y)f(y) dy + φ(x), (8.2)

where φ : [a, b] → R and K : [a, b]2 → R are given continuous functions andλ ∈ R.

88

Page 90: Essential Mathematics for Economists

1. Let X be the space of continuous functions f : [a, b] → R equipped withthe supremum norm ‖·‖. For each x ∈ [a, b], define

(Tf)(x) = λ

∫ x

a

K(x, y)f(y) dy + φ(x).

Show that T : X → X.

2. Let M := sup(x,y)∈[a,b]2 |K(x, y)|. For any f, g ∈ X, show that

|(Tf)(x)− (Tg)(x)| ≤ |λ|M(x− a) ‖f − g‖ .

3. Show that there exists a unique solution f to (8.2) by showing that T k isa contraction for some k ∈ N.

8.3. Let A be a square nonnegative matrix and suppose that there exists k ∈ Nsuch that Ak is positive. Show that the conclusions of Theorem 1.6 remain true.(Hint: use Theorem 8.2.)

8.4. Consider the function F : R2 × R2 → R2 defined by

F (x1, x2, y1, y2) =

[x2

1 − x22 − y3

1 + y22 + 4

2x1x2 + x22 − 2y2

1 + 3y42 + 8

].

1. Compute the Jacobian DF (x1, x2, y1, y2).

2. Show F (2,−1, 2, 1) = (0, 0).

3. Let x = (x1, x2) and y = (y1, y2). Show that if x is sufficiently close to(2,−1), then there exists a function G(x) such that F (x, y) = 0 ⇐⇒ y =G(x).

8.5. For a utility function u satisfying u′ > 0 and u′′ < 0, the quantity

γ(w) = −wu′′(w)

u′(w)

is called the relative risk aversion at wealth w. Consider an optimal portfolioproblem, where an investor chooses the optimal fraction of wealth invested instocks. If an investor has initial wealth w and invests fraction θ in stocks, thenthe final wealth becomes R(θ)w, where R(θ) := Rθ + Rf (1 − θ) and R,Rf arethe gross returns on stocks and bonds satisfying E[R] > Rf .

Prove that if an investor has decreasing relative risk aversion (so γ(w) isdecreasing in w), then a rational investor that solves

maximizeθ

E[u(R(θ)w)]

invests a higher fraction of wealth θ in stocks as he gets richer.

89

Page 91: Essential Mathematics for Economists

Chapter 9

Convex Sets

9.1 Convex sets

A set C ⊂ RN is said to be convex if the line segment generated by any twopoints in C is entirely contained in C. Formally, C is convex if x, y ∈ C implies(1−α)x+αy ∈ C for all α ∈ [0, 1] (Figure 9.1). So a circle, triangle, and squareare convex but a star-shape is not (Figure 9.2). One of my favorite jokes is thatthe Chinese character for “convex” is not convex (Figure 9.3).

x

y

(1− α)x+ αy

Figure 9.1: Definition of a convex set.

Rectangle Circle Ellipse Convex

Convex

Non-convex

Figure 9.2: Examples of convex and non-convex sets.

90

Page 92: Essential Mathematics for Economists

Figure 9.3: Chinese character for “convex” is not convex.

Let A ⊂ RN be any set. The smallest convex set that includes A is calledthe convex hull of A and is denoted by coA. (Its existence is proved in Problem9.1.) For example, in Figure 9.4, the convex hull of the set A consisting of twocircles is the entire region in between.

AA coA

Figure 9.4: Convex hull.

Let xk ∈ RN for k = 1, . . . ,K. A point of the form

x =

K∑k=1

αkxk,

where αk ≥ 0 and∑Kk=1 αk = 1, is called a convex combination of the points

xkKk=1. The following lemma provides a constructive way to obtain the convexhull of a set. Its proof is in Problem 9.2.

Lemma 9.1. Let A ⊂ RN be any set. Then coA consists of all convex combi-nations of points of A.

9.2 Hyperplanes and half spaces

You should know from high school that the equation of a line in R2 is

a1x1 + a2x2 = c

for some real numbers a1, a2, c, and that the equation of a plane in R3 is

a1x1 + a2x2 + a3x3 = c.

Letting a = (a1, . . . , aN ) and x = (x1, . . . , xN ) be vectors in RN , the equation〈a, x〉 = c is a line if N = 2 and a plane if N = 3, where

〈a, x〉 = a1x1 + · · ·+ aNxN

is the inner product of the vectors a and x.1 In general, we say that the setx ∈ RN

∣∣ 〈a, x〉 = c

1The inner product is sometimes called the vector product or the dot product. Commonnotations for the inner product are 〈a, x〉, (a, x), a · x, etc.

91

Page 93: Essential Mathematics for Economists

is a hyperplane if a 6= 0. The vector a is orthogonal to this hyperplane (isa normal vector). To see this, let x0 be a point in the hyperplane. Since〈a, x0〉 = c, by subtraction and linearity of inner product we get 〈a, x− x0〉 = 0.This means that the vector a is orthogonal to the vector x−x0, which can pointto any direction in the plane by moving x. So it makes sense to say that a isorthogonal to the hyperplane 〈a, x〉 = c. The sets

H+ =x ∈ RN

∣∣ 〈a, x〉 ≥ c ,H− =

x ∈ RN

∣∣ 〈a, x〉 ≤ care called half spaces, since H+ (H−) is the portion of RN separated by thehyperplane 〈a, x〉 = c towards the direction of a (−a). Hyperplanes and halfspaces are convex sets (Problem 9.3).

9.3 Separation of convex sets

Let A,B be two sets. We say that the hyperplane 〈a, x〉 = c separates A,B ifA ⊂ H− and B ⊂ H+ (Figure 9.5), that is,

x ∈ A =⇒ 〈a, x〉 ≤ c,x ∈ B =⇒ 〈a, x〉 ≥ c.

(The inequalities may be reversed.)

A

B

H−

H+

〈a, x〉 = c

a

Figure 9.5: Separation of convex sets.

Clearly A,B can be separated if and only if

supx∈A〈a, x〉 ≤ inf

x∈B〈a, x〉 ,

since we can take c between these two numbers. We say that A,B can be strictlyseparated if the inequality is strict, so

supx∈A〈a, x〉 < inf

x∈B〈a, x〉 .

The remarkable property of convex sets is the following separation property.

92

Page 94: Essential Mathematics for Economists

Theorem 9.2 (Separating Hyperplane Theorem). Let C,D ⊂ RN be nonemptyand convex. If C ∩D = ∅, then there exists a hyperplane that separates C,D. IfC,D are closed and one of them is compact, then they can be strictly separated.

We need the following lemma to prove Theorem 9.2.

Lemma 9.3. Let C be nonempty and convex. Then any x ∈ RN has a uniqueclosest point PC(x) ∈ clC, called the projection of x on clC. Furthermore, forany z ∈ C we have

〈x− PC(x), z − PC(x)〉 ≤ 0.

Proof. Let δ = inf ‖x− y‖ | y ∈ C ≥ 0 be the distance from x to C (Figure9.6).

δ

x

y = PC(x)

z

C

Figure 9.6: Projection on a convex set.

Take a sequence yk ⊂ C such that ‖x− yk‖ → δ. Then by simple algebrawe get

‖yk − yl‖2 = 2 ‖x− yk‖2 + 2 ‖x− yl‖2 − 4

∥∥∥∥x− 1

2(yk + yl)

∥∥∥∥2

. (9.1)

Since C is convex, we have 12 (yk + yl) ∈ C, so by the definition of δ we get

‖yk − yl‖2 ≤ 2 ‖x− yk‖2 + 2 ‖x− yl‖2 − 4δ2 → 2δ2 + 2δ2 − 4δ2 = 0

as k, l → ∞. Since yk ⊂ C is Cauchy, it converges to some point y ∈ clC.Then

‖x− y‖ ≤ ‖x− yk‖+ ‖yk − y‖ → δ + 0 = δ,

so y is the closest point to x in clC. If y1, y2 are two closest points, then by thesame argument we get

0 ≤ ‖y1 − y2‖2 ≤ 2 ‖x− y1‖2 + 2 ‖x− y2‖2 − 4δ2 ≤ 0,

so y1 = y2. Thus y = PC(x) is unique.Finally, let z ∈ C be any point. Take yk ⊂ C such that yk → y = PC(x).

Since C is convex, for any 0 < α ≤ 1 we have (1− α)yk + αz ∈ C. Therefore

δ2 = ‖x− y‖2 ≤ ‖x− (1− α)yk − αz‖2 .

93

Page 95: Essential Mathematics for Economists

Letting k →∞ we get ‖x− y‖2 ≤ ‖x− y − α(z − y)‖2. Expanding both sides,dividing by α > 0, and letting α → 0, we get 〈x− y, z − y〉 ≤ 0, which is thedesired inequality.

The following proposition shows that a point that is not an interior point ofa convex C can be separated from C.

Proposition 9.4. Let C ⊂ RN be nonempty and convex and x /∈ intC. Thenthere exists a hyperplane 〈a, x〉 = c that separates x and C, i.e.,

〈a, x〉 ≥ c ≥ 〈a, z〉

for any z ∈ C. If x /∈ clC, then the above inequalities can be made strict.

Proof. Suppose that x /∈ clC. Let y = PC(x) be the projection of x on clC.Then x 6= y because y ∈ clC and x /∈ clC. Let a = x − y 6= 0 and c =〈a, y〉+ 1

2 ‖a‖2. Then for any z ∈ C we have

〈x− y, z − y〉 ≤ 0 =⇒ 〈a, z〉 ≤ 〈a, y〉 < 〈a, y〉+1

2‖a‖2 = c,

〈a, x〉 − c = 〈x− y, x− y〉 − 1

2‖a‖2 =

1

2‖a‖2 > 0 ⇐⇒ 〈a, x〉 > c.

Therefore the hyperplane 〈a, x〉 = c strictly separates x and C.If x ∈ clC, since by assumption x /∈ intC, we can take a sequence xk such

that xk /∈ clC and xk → x. Then for each k we can find a vector ak 6= 0 and anumber ck ∈ R such that

〈ak, xk〉 ≥ ck ≥ 〈ak, z〉

for all z ∈ C. By dividing both sides by ‖ak‖ 6= 0, without loss of generality wemay assume ‖ak‖ = 1. Since xk → x, the sequence ck is bounded. Thereforewe can find a convergent subsequence (akl , ckl)→ (a, c). Letting l→∞, we get

〈a, x〉 ≥ c ≥ 〈a, z〉

for any z ∈ C. Therefore the hyperplane 〈a, x〉 = c separates x and C.

Proof of Theorem 9.2. Let E = C−D := x− y |x ∈ C, y ∈ D. Since C,Dare nonempty and convex, so is E. Since C ∩ D = ∅, we have 0 /∈ E. Inparticular, 0 /∈ intE. By Proposition 9.4, there exists a 6= 0 such that 〈a, 0〉 =0 ≥ 〈a, z〉 for all z ∈ E. By the definition of E, we have

〈a, x− y〉 ≤ 0 ⇐⇒ 〈a, x〉 ≤ 〈a, y〉

for any x ∈ C and y ∈ D. Letting supx∈C 〈a, x〉 ≤ c ≤ infy∈D 〈a, y〉, it followsthat the hyperplane 〈a, x〉 = c separates C and D.

Suppose that C is closed and D is compact. Let us show that E = C −Dis closed. For this purpose, suppose that zk ⊂ E and zk → z. Then we cantake xk ⊂ C, yk ⊂ D such that zk = xk − yk. Since D is compact, there isa subsequence such that ykl → y ∈ D. Then xkl = ykl + zkl → y + z, but sinceC is closed, x = y + z ∈ C. Therefore z = x− y ∈ E, so E is closed.

Since E = C −D is closed and 0 /∈ E, by Proposition 9.4 there exists a 6= 0such that 〈a, 0〉 = 0 > 〈a, z〉 for all z ∈ E. The rest of the proof is similar.

94

Page 96: Essential Mathematics for Economists

Notes

Rockafellar (1970) is a classic reference for convex analysis. Much of the theoryof separation of convex sets can be generalized to infinite-dimensional spaces.The proof in this chapter using the projection generalizes to Hilbert spaces, butfor more general spaces (topological vector spaces) we need the Hahn-Banachtheorem. See Berge (1959) and Luenberger (1969).

Problems

9.1.

1. Let Cii∈I ⊂ RN be a collection of convex sets. Prove that⋂i∈I Ci is

convex.

2. Let A ⊂ RN be any set. Prove that there exists a smallest convex set thatincludes A (convex hull of A).

9.2. Let A ⊂ RN be any set. Prove that coA consists of all convex combinationsof points of A.

9.3.

1. Let 0 6= a ∈ RN and c ∈ R. Show that the hyperplaneH =x ∈ RN

∣∣ 〈a, x〉 = c

and the half space H+ =x ∈ RN

∣∣ 〈a, x〉 ≥ c are convex sets.

2. Let A be an M ×N matrix and b ∈ RM . The set of the form

P =x ∈ RN

∣∣Ax ≤ bis called a polytope. Show that a polytope is convex.

9.4. Let A ⊂ RN be any nonempty set.

1. Show that cl coA (the closure of the convex hull of A) is a closed convexset.

2. Show by example that co clA (the convex hull of the closure of A) neednot be closed.

9.5.

1. Let a, b ∈ RN . Prove the following parallelogram law :

‖a+ b‖2 + ‖a− b‖2 = 2 ‖a‖2 + 2 ‖b‖2 .

2. Using the parallelogram law, prove (9.1).

9.6. Let A =

(x, y) ∈ R2∣∣ y > x3

and B =

(x, y) ∈ R2

∣∣x ≥ 1, y ≤ 1

.

1. Draw a picture of the sets A,B on the xy plane.

2. Can A,B be separated? If so, provide an equation of a straight line thatseparates them. If not, explain why.

95

Page 97: Essential Mathematics for Economists

9.7. Let C =

(x, y) ∈ R2∣∣ y > ex

and D =

(x, y) ∈ R2

∣∣ y ≤ 0

.

1. Draw a picture of the sets C,D on the xy plane.

2. Provide an equation of a straight line that separates C,D.

3. Can C,D be strictly separated? Answer yes or no, then explain why.

9.8. This problem asks you to prove Stiemke’s theorem: if A is an M × Nmatrix, then exactly one of the following statements is true:

(a) There exists x ∈ RN++ such that Ax = 0.

(b) There exists y ∈ RM such that A′y > 0.

Prove Stiemke’s theorem using the following hints.

1. Show that statements (a) and (b) cannot both be true.

2. Define the sets C,D ⊂ RN by

C =A′y

∣∣ y ∈ RM,

D =

x ∈ RN

∣∣∣∣∣x ≥ 0,

N∑n=1

xn = 1

.

Show that C,D are nonempty, closed, convex, and D is compact.

3. Show that if statement (b) does not hold, then statement (a) holds.

9.9. A typical linear programming problem is

minimize 〈c, x〉subject to Ax ≥ b,

where x ∈ RN , 0 6= c ∈ RN , b ∈ RM , and A is an M ×N matrix with M ≥ N .A standard algorithm for solving a linear programming problem is the simplexmethod. The idea is that you keep moving from one vertex of the polytope

P =x ∈ RN

∣∣Ax ≥ bto a neighboring vertex as long as the function value decreases, and if there areno neighboring vertex with smaller function values, you stop.

1. Prove that the simplex method terminates in finite steps.

2. Prove that when the algorithm stops, you are at a solution of the originalproblem.

96

Page 98: Essential Mathematics for Economists

Chapter 10

Convex Functions

10.1 Convex and quasi-convex functions

Let f : RN → (−∞,∞] be a function. The set

epi f :=

(x, y) ∈ RN × R∣∣ f(x) ≤ y

is called the epigraph of f (Figure 10.1), for the obvious reason that epi f isthe set of points that lie on or above the graph of f . A function f is said tobe convex if epi f is a convex set. It is straightforward to show (Problem 10.1)that a function f is convex if and only if for any x1, x2 ∈ RN and α ∈ [0, 1], wehave

f((1− α)x1 + αx2) ≤ (1− α)f(x1) + αf(x2). (10.1)

This inequality is often used as the definition of a convex function.

x

y = f(x)epi f

x1 x2

f(x1)

f(x2)

Figure 10.1: Convex function and its epigraph.

When the inequality (10.1) is strict whenever x1 6= x2 and α ∈ (0, 1), we saythat f is strictly convex. A convex function is proper if f(x) > −∞ for all xand f(x) <∞ for some x. If f is a convex function, then the set

dom f :=x ∈ RN

∣∣ f(x) <∞

97

Page 99: Essential Mathematics for Economists

is a convex set, since x1, x2 ∈ dom f implies

f((1− α)x1 + αx2) ≤ (1− α)f(x1) + αf(x2) <∞.

dom f is called the effective domain of f .Another useful but weaker concept is quasi-convexity. The set

Lf (y) =x ∈ RN

∣∣ f(x) ≤ y

(10.2)

is called the lower contour set of f at level y. f is said to be quasi-convex ifall lower contour sets are convex. It is straightforward to show (Problem 10.2)that f is quasi-convex if and only if for any x1, x2 ∈ RN and α ∈ [0, 1], we have

f((1− α)x1 + αx2) ≤ max f(x1), f(x2) . (10.3)

Again if the inequality is strict whenever x1 6= x2 and 0 < α < 1, then f is saidto be strictly quasi-convex.

A function f is said to be concave if −f is convex, that is, f is a convex func-tion flipped upside down. The definition for strict concavity or quasi-concavityis similar.

Note that all convex functions are quasi-convex, but not vice versa. To seethat all convex functions are quasi-convex, let f : RN → (−∞,∞] be convex.Take any y ∈ (−∞,∞] consider the lower contour set Lf (y) in (10.2). Then ifx1, x2 ∈ Lf (y) and α ∈ [0, 1], we have

f((1− α)x1 + αx2) ≤ (1− α)f(x1) + αf(x2)

≤ (1− α)y + αy = y,

so by definition (1−α)x1+αx2 ∈ Lf (y). Therefore Lf (y) is convex, and hence fis quasi-convex. To see that not all quasi-convex functions are convex, considerthe function f : R → R defined by f(x) =

√|x|. Then it is easy to see by

drawing a graph that f is quasi-convex but not convex (Figure 10.2).

x

Figure 10.2: f(x) =√|x| is quasi-convex but not convex.

A convenient property of quasi-convexity (concavity) is that it is preservedby monotonic transformations, as in the following proposition (proof in Problem10.4).

Proposition 10.1. Let f : RN → (−∞,∞] be quasi-convex (concave) andφ : (−∞,∞]→ (−∞,∞] be increasing, meaning φ(y1) ≤ φ(y2) whenever −∞ <y1 ≤ y2 ≤ ∞. Then g = φ f : RN → (−∞,∞] defined by g(x) = φ(f(x)) isquasi-convex (concave).

In contrast, convexity is not necessarily preserved by monotonic transforma-tions. For instance, define f : R → R by f(x) = |x| and φ : [0,∞) → R by

98

Page 100: Essential Mathematics for Economists

φ(y) =√y. Then f is convex and φ is increasing, but g(x) = φ(f(x)) =

√|x|

is not convex, as we see in Figure 10.2.So far we have seen that all convex functions are quasi-convex but not all

quasi-convex functions are convex. The following theorem, which is slightlystronger than Berge (1959, p. 208, Theorem 3), shows that quasi-convex (con-cave) functions that are homogeneous and have constant signs are always convex(concave). This result is sometimes useful because checking quasi-convexity iseasier than convexity.

Theorem 10.2. Let C ⊂ RN be a nonempty convex set such that λx ∈ C for allx ∈ C and λ > 0. Let f : C → (−∞,∞] be (i) quasi-convex, (ii) homogeneousof degree 1, that is, f(λx) = λf(x) for all x ∈ C and λ > 0, and (iii) eitherf(x) > 0 for all x ∈ C\ 0 or f(x) < 0 for all x ∈ C\ 0. Then f is convex.

Proof. Take any x1, x2 ∈ C and α ∈ [0, 1]. Let us show (10.1). The claim istrivial if α = 0 or α = 1, so assume 0 < α < 1. Similarly, (10.1) is trivial iff(x1) =∞ or f(x2) =∞, so assume f(x1) <∞ and f(x2) <∞.

If x1 = 0, using homogeneity for λ = 2 and f(x1) < ∞, we obtain f(0) =f(2 · 0) = 2f(0), implying f(0) = 0. Again using homogeneity for λ = α andnoting that x1 = 0 and f(x1) = f(0) = 0, we obtain

f((1− α)x1 + αx2) = f(αx2) = αf(x2) = (1− α)f(x1) + αf(x2),

so (10.1) holds. The case for x2 = 0 is similar.Therefore we may assume x1, x2 6= 0. Since by assumption f has constant

sign on C\ 0, it follows that (1 − α)f(x1) and αf(x2) are both nonzero real

numbers with the same sign. Define k = αf(x2)(1−α)f(x1) > 0 and x = (1−α)x1+αx2.

Using the homogeneity and quasi-convexity of f , we obtain

k

1 + kf(x) = f

(k

1 + kx

)= f

(1

1 + kk(1− α)x1 +

k

1 + kαx2

)≤ max f(k(1− α)x1), f(αx2) = max k(1− α)f(x1), αf(x2) .

Since by construction k(1− α)f(x1) = αf(x2), the last expression is also equalto

1

1 + kk(1− α)f(x1) +

k

1 + kαf(x2) =

k

1 + k((1− α)f(x1) + αf(x2)).

Thereforek

1 + kf(x) ≤ k

1 + k((1− α)f(x1) + αf(x2)),

and dividing both sides by k1+k > 0, we obtain (10.1).

Remark 10.1. By replacing f by −f and “convex” by “concave”, etc., the state-ment in Theorem 10.2 remains true.

Example 10.1. Let 1 ≤ p <∞ and define f : RN → R by

f(x) = ‖x‖p :=

(N∑n=1

|xn|p)1/p

.

99

Page 101: Essential Mathematics for Economists

Then f is convex. To see this, note that f is nonnegative and homogeneous ofdegree 1, with f(x) = 0 if and only if x = 0. Let φ(y) = 1

pyp for y ≥ 0. Then

φ′(y) = yp−1 ≥ 0 and φ′′(y) = (p − 1)yp−2 ≥ 0, so φ is increasing and convex.Clearly the function x 7→ |xn| is convex. Hence by Problem 10.5,

g(x) := φ(f(x)) =1

p

N∑n=1

|xn|p

is convex, so f = φ−1 g is quasi-convex. Hence by Theorem 10.2, f is convex.Setting α = 1/2 in (10.1), for all x, y ∈ RN we obtain∥∥∥∥x+ y

2

∥∥∥∥p

≤ 1

2‖x‖p +

1

2‖y‖p ⇐⇒ ‖x+ y‖p ≤ ‖x‖p + ‖y‖p . (10.4)

The inequality (10.4) is called the Minkowski inequality, which is a generalizationof the Cauchy-Schwarz inequality (corresponding to p = 2) and establishes thatthe lp norm ‖·‖p is indeed a norm.

Example 10.2. Define f : RN++ → R by f(x) = xα11 · · ·x

αNN , where αn > 0

and∑Nn=1 αn = 1. Then f is concave. To see this, note that f is positive and

homogeneous of degree 1. Furthermore,

log f(x) =

N∑n=1

αn log xn

is concave, so its monotonic transformation f(x) = exp(log f(x)) is quasi-concave. Hence by Theorem 10.2, f is concave.

Example 10.3. Define f : RN++ → R by

f(x) =

(N∑n=1

αnx1−γn

) 11−γ

,

where αn > 0 for all n and 0 < γ 6= 1. Then f is concave. To see this, note that

f is positive and homogeneous of degree 1. Let φ(y) = y1−γ

1−γ for y > 0. Then

φ′(y) = y−γ > 0, so φ is increasing. Furthermore,

g(x) := φ(f(x)) =

N∑n=1

αnx1−γn

1− γ

is concave (compute the Hessian and apply Proposition 10.6 below), so f =φ−1 g is quasi-concave. Hence by Theorem 10.2, f is concave. For an economicapplication of this example, see Theorems 1 and 2 of Toda and Walsh (2020).

10.2 Continuity of convex functions

A nice property of convex functions is that they are continuous except at bound-ary points of the domain.

100

Page 102: Essential Mathematics for Economists

Theorem 10.3. Let U ⊂ RN be an open convex set and f : U → R be convex.Then f is continuous.

Proof. Equip RN with the supremum norm defined by ‖x‖ = maxn |xn| for avector x = (x1, . . . , xN ) ∈ RN . For any x ∈ RN and r > 0, define the closedball with center x and radius r by

B(x, r) :=y ∈ RN

∣∣ ‖y − x‖ ≤ r .By the definition of the supremum norm, B(x, r) is actually the hypercube

[x1 − r, x1 + r]× · · · × [xN − r, xN + r]

with 2N vertices (x1 ± r, . . . , xN ± r).Take any x ∈ U . Since U is open, we can take r > 0 such that B(x, r) ⊂ U .

Let the vertices of B(x, r) be denoted by xkKk=1, where K = 2N . DefineM := maxk f(xk) < ∞. Since clearly any point of B(x, r) can be expressed as

a convex combination of xkKk=1 (the proof is by induction on N), we have

f(z) ≤M for all z ∈ B(x, r). (10.5)

Now take any y ∈ B(x, r)\ x, let 0 6= d = y − x, ε = ‖d‖ /r ∈ (0, 1], anddefine the points z1, z2 by z1 = x + d/ε and z2 = x − d/ε (Figure 10.3). Thenclearly ‖zj − x‖ = ‖d‖ /ε = r for j = 1, 2, so zj ∈ B(x, r).

z2 x y z1

r r

εr

d

Figure 10.3: Definition of z1 and z2.

By the definition of zj , we have

y − x = d = ε(z1 − x) ⇐⇒ y = (1− ε)x+ εz1,

y − x = d = −ε(z2 − x) ⇐⇒ x =1

1 + εy +

ε

1 + εz2.

Hence by the convex inequality (10.1) and the upper bound (10.5), we obtain

f(y) ≤ (1− ε)f(x) + εf(z1) =⇒ f(y)− f(x) ≤ ε(M − f(x)),

f(x) ≤ 1

1 + εf(y) +

ε

1 + εf(z2) =⇒ f(x)− f(y) ≤ ε(M − f(x)).

Combining these two inequalities, we obtain

|f(y)− f(x)| ≤ ε(M − f(x)) =M − f(x)

r‖y − x‖ . (10.6)

Therefore f(y)→ f(x) as y → x, so f is continuous.

A convex function need not be continuous at boundary points of the domain.For example, define f : [0, 1] → R by f(x) = 0 if x < 1 and f(1) = 1. Thenclearly f is convex but not continuous at x = 1.

A corollary of the proof of Theorem 10.3 is that convex functions are actuallylocally Lipschitz continuous. Recall that f : U → R is Lipschitz continuous withLipschitz constant L ≥ 0 if for all x, y ∈ U , we have |f(x)− f(y)| ≤ L ‖x− y‖.

101

Page 103: Essential Mathematics for Economists

Corollary 10.4. Let U ⊂ RN be a nonempty open convex set and f : U → Rbe convex. Then f is locally Lipschitz.

Proof. Take any x ∈ U and r > 0 such that B(x, r) ⊂ U , and define V =B(x, r/3). Let us show that f is Lipschitz on V . Since by Theorem 10.3 f iscontinuous on the compact set B(x, r), it attains a minimum m and a maximumM . Take any x1, x2 ∈ V . Then

‖x1 − x2‖ ≤ ‖x1 − x‖+ ‖x− x2‖ ≤2r

3,

so x1 ∈ B(x2, 2r/3). If y ∈ B(x2, 2r/3), then

‖y − x‖ ≤ ‖y − x2‖+ ‖x2 − x‖ ≤2r

3+r

3= r,

so B(x2, 2r/3) ⊂ B(x, r). Applying (10.6) to y = x1 and x = x2, we obtain

|f(x1)− f(x2)| ≤ M −m2r/3

‖x1 − x2‖ ,

which shows that f is Lipschitz on V with Lipschitz constant L := 3(M−m)2r .

Unlike convex functions, quasi-convex functions need not be continuous. Forexample, any strictly increasing function f : R → R is quasi-convex, but thereare many of them that are discontinuous.

10.3 Characterization of convex functions

When f is differentiable, there are simple ways to establish convexity.

Proposition 10.5. Let U ⊂ RN be an open convex set and f : U → R bedifferentiable. Then f is (strictly) convex if and only if

f(y)− f(x) ≥ (>) 〈∇f(x), y − x〉

for all x 6= y.

Proof. Suppose that f is (strictly) convex. Let x 6= y ∈ U and define g : (0, 1]→R by

g(t) =f((1− t)x+ ty)− f(x)

t.

Then g is (strictly) increasing, for if 0 < s < t ≤ 1 we have

g(s) ≤ (<)g(t)

⇐⇒ f((1− s)x+ sy)− f(x)

s≤ (<)

f((1− t)x+ ty)− f(x)

t

⇐⇒ f((1− s)x+ sy) ≤ (<)(

1− s

t

)f(x) +

s

tf((1− t)x+ ty),

but the last inequality holds by letting α = s/t, x1 = x, x2 = (1− t)x+ ty, andusing the definition of convexity. Therefore

f(y)− f(x) = g(1) ≥ (>) limt→0

g(t) = 〈∇f(x), y − x〉 .

102

Page 104: Essential Mathematics for Economists

Conversely, suppose that

f(y)− f(x) ≥ (>) 〈∇f(x), y − x〉

for all x 6= y. Take any x1 6= x2 and α ∈ (0, 1). Setting y = x1, x2 andx = (1− α)x1 + αx2, we get

f(x1)− f((1− α)x1 + αx2) ≥ (>) 〈∇f(x), x1 − x〉f(x2)− f((1− α)x1 + αx2) ≥ (>) 〈∇f(x), x2 − x〉 .

Multiplying each by 1 − α and α respectively and adding the two inequalities,we get

(1− α)f(x1) + αf(x2)− f((1− α)x1 + αx2) ≥ (>)0,

so f is (strictly) convex.

Figure 10.4 shows the geometric intuition of Proposition 10.5. Since QR =f(y)− f(x) and SR = 〈∇f(x), y − x〉, we have f(y)− f(x) ≥ 〈∇f(x), y − x〉.

x y

P

Q

R

S

y − x

〈∇f(x), y − x〉

Slope = ∇f(x)

Figure 10.4: Characterization of a convex function.

A twice differentiable function f : R→ R is convex if and only if f ′′(x) ≥ 0for all x. The following proposition is the generalization for RN .

Proposition 10.6. Let U ⊂ RN be an open convex set and f : U → R be C2.Then f is convex if and only if the Hessian (matrix of second derivatives)

∇2f(x) =

[∂2f(x)

∂xm∂xn

]is positive semidefinite.

Proof. Suppose that f is a C2 function on U . Take any x 6= y ∈ U . ApplyingTaylor’s theorem to g(t) = f((1− t)x+ ty) for t ∈ [0, 1], there exists α ∈ (0, 1)such that

f(y)− f(x) = g(1)− g(0) = g′(0) +1

2g′′(α)

= 〈∇f(x), y − x〉+1

2

⟨y − x,∇2f(x+ α(y − x))(y − x)

⟩.

103

Page 105: Essential Mathematics for Economists

If f is convex, by Proposition 10.5

1

2

⟨y − x,∇2f(x+ α(y − x))(y − x)

⟩= f(y)− f(x)− 〈∇f(x), y − x〉 ≥ 0.

Since y is arbitrary, take any d ∈ RN and let y = x+ εd for small enough ε > 0such that y ∈ U . Dividing the above inequality by 1

2ε2 > 0 and letting ε → 0,

we get0 ≤

⟨d,∇2f(x+ εαd)d

⟩→⟨d,∇2f(x)d

⟩,

so∇2f(x) is positive semidefinite. Conversely, if∇2f(x) is positive semidefinite,then

f(y)− f(x)− 〈∇f(x), y − x〉 =1

2

⟨y − x,∇2f(x+ α(y − x))(y − x)

⟩≥ 0,

so by Proposition 10.5 f is convex.

10.4 Characterization of quasi-convex functions

As in the case with convex functions, there are simple ways to establish quasi-convexity if f is differentiable or C2.

Proposition 10.7. Let f be differentiable. Then f is quasi-convex if and onlyif for all x, y we have

f(y) ≤ f(x) =⇒ 〈∇f(x), y − x〉 ≤ 0. (10.7)

Proof. Suppose that f is quasi-convex and f(y) ≤ f(x). Then for any 0 < t ≤ 1we have

f((1− t)x+ ty) ≤ max f(x), f(y) = f(x) =⇒ 1

t(f(x+ t(y− x))− f(x)) ≤ 0.

Letting t→ 0, we obtain 〈∇f(x), y − x〉 ≤ 0, so (10.7) holds.Conversely, suppose that (10.7) holds. If f is not quasi-convex, there exist

x, y, and 0 ≤ t ≤ 1 such that

f((1− t)x+ ty) > max f(x), f(y) . (10.8)

Without loss of generality, we may assume f(x) ≥ f(y). Define g : [0, 1] → Rby g(t) = f((1− t)x+ ty) and T = t ∈ [0, 1] | g(t) > g(0). Since g(0) = f(x) ≥f(y) = g(1), (10.8) implies T 6= ∅ and T ⊂ (0, 1).

Let us show that t ∈ T implies g′(t) ≥ 0. To see this, take any t ∈ T and letx = (1− t)x+ ty and y = x. Since f(x) = g(t) > g(0) = f(x) = f(y), it followsfrom (10.7) that

〈∇f(x), y − x〉 ≤ 0 ⇐⇒ 0 ≤ 〈∇f((1− t)x+ ty), y − x〉 = g′(t).

Since g is continuous, T is open. Let (t1, t2) ⊂ T be a connected componentof T . By continuity, we have g(t1) = g(t2) = g(0). Since g(t) > g(0) on T ,we can take t3 ∈ (t1, t2) such that g(t3) > g(0) = g(t2). By the mean valuetheorem, there exists t4 ∈ (t3, t2) such that

g′(t4) =g(t2)− g(t3)

t2 − t3< 0,

which contradicts g′(t4) ≥ 0.

104

Page 106: Essential Mathematics for Economists

Proposition 10.8. Let f be C2. If f is quasi-convex, then

〈∇f(x), d〉 = 0 =⇒⟨d,∇2f(x)d

⟩≥ 0

for all x and d 6= 0. Conversely, if

〈∇f(x), d〉 = 0 =⇒⟨d,∇2f(x)d

⟩> 0

for all x and d 6= 0, then f is quasi-convex.

Proof. Let g(t) = f(x+td). If f is quasi-convex, so is g. Suppose 〈∇f(x), d〉 = 0but

⟨d,∇2f(x)d

⟩< 0. Since g′(0) = 〈∇f(x), d〉 = 0 and g′′(0) =

⟨d,∇2f(x)d

⟩<

0, t = 0 is a strict local maximum of g, which is a contradiction.Conversely, suppose

〈∇f(x), d〉 = 0 =⇒⟨d,∇2f(x)d

⟩> 0

for all x and d 6= 0. By Proposition 10.7, it suffices to show

f(y) ≤ f(x) =⇒ 〈∇f(x), y − x〉 ≤ 0.

If x = y, the claim is trivial. Suppose that x 6= y, f(y) ≤ f(x), and letd = y − x 6= 0. Suppose 〈∇f(x), y − x〉 > 0. Define g : [0, 1] → R by g(t) =f((1 − t)x + ty). Since g is continuous, it attains a maximum t ∈ [0, 1]. Sinceg(0) = f(x) ≥ f(y) = g(1) and g′(0) = 〈∇f(x), y − x〉 > 0, we have 0 < t < 1.Since t is an interior maximum, we have

0 = g′(t) = 〈∇f((1− t)x+ ty), d〉 = 0,

so by assumption

0 <⟨d,∇2f((1− t)x+ ty)d

⟩= g′′(t).

However, this shows that t is a strict local minimum, which is a contradiction.

10.5 Subgradient of convex functions

Suppose that f is a proper convex function and x ∈ int dom f . Since (x, f(x)) ∈epi f but (x, f(x)− ε) /∈ epi f for all ε > 0, we have (x, f(x)) /∈ int epi f . Henceby the separating hyperplane theorem, there exists a vector 0 6= (η, β) ∈ RN×Rsuch that

〈η, x〉+ βf(x) ≤ 〈η, y〉+ βz

for any (y, z) ∈ epi f . Letting z →∞, we get β ≥ 0. Letting z = f(y) we get

β(f(y)− f(x)) ≥ 〈−η, y − x〉

for all y. If β = 0, since x ∈ int dom f , the vector y − x can point to anydirection. Then η = 0, which contradicts (η, β) 6= 0. Therefore β > 0. Lettingξ = −η/β, we get

f(y)− f(x) ≥ 〈ξ, y − x〉 (10.9)

for any y.

105

Page 107: Essential Mathematics for Economists

A vector ξ that satisfies (10.9) is called a subgradient (subdifferential) of fat x ∈ int dom f . The set of all subgradients at x is denoted by

∂f(x) =ξ ∈ RN

∣∣ (∀y)f(y)− f(x) ≥ 〈ξ, y − x〉.

∂f(x) is a closed convex set (exercise).If f is partially differentiable, letting y = x+ td and letting t→ 0, we get

〈ξ, d〉 ≤ f(x+ td)− f(x)

t→ f ′(x; d) = 〈∇f(x), d〉

for all d, so ξ = ∇f(x) and ∂f(x) = ∇f(x).

Problems

10.1. Prove that epi f is a convex set if and only if

f((1− α)x1 + αx2) ≤ (1− α)f(x1) + αf(x2)

for all x1, x2 ∈ RN and α ∈ [0, 1].

10.2. Prove that f is quasi-convex if and only if

f((1− α)x1 + αx2) ≤ max f(x1), f(x2)

for all x1, x2 ∈ RN and α ∈ [0, 1].

10.3. This problem explains why it is convenient to allow the value ∞ forquasi-convex (in particular, convex) functions.

1. Let C ⊂ RN be convex and f : C → R quasi-convex. Define f : RN →(−∞,∞] by

f(x) =

f(x), (x ∈ C)

∞. (x /∈ C)

Show that f is quasi-convex.

2. Let f : RN → (−∞,∞] be quasi-convex. Show that C :=x ∈ RN

∣∣ f(x) <∞

is convex.

10.4. Prove Proposition 10.1.

10.5. This problem lists a few operations that preserve convexity.

1. Show that if fiIi=1 are convex, so is f =∑Ii=1 βifi for any β1, . . . , βI ≥ 0.

2. Show that if fii∈I are (quasi-)convex, so is f = supi∈I fi.

3. Suppose that h : RM → R is increasing (meaning x ≤ y implies h(x) ≤h(y)) and (quasi-)convex and gm : RN → R is convex for m = 1, . . . ,M .Prove that f(x) = h(g1(x), . . . , gM (x)) is (quasi-)convex.

10.6. Let X,Y be vector spaces, f : X ×Y → (−∞,∞] be (quasi-)convex, anddefine g : Y → [−∞,∞] by g(y) = infx∈X f(x, y). Show that g is (quasi-)convex.

106

Page 108: Essential Mathematics for Economists

10.7. Let X be a vector space, Y a set, Γ : X Y a correspondence (so foreach x ∈ X, Γ(x) is a subset of Y ), and f : Y → [−∞,∞]. Suppose that Γsatisfies

Γ((1− α)x1 + αx2) ⊂ Γ(x1) ∪ Γ(x2)

for all x1, x2 ∈ X and α ∈ [0, 1]. Define

g(x) = supy∈Γ(x)

f(y),

g¯(x) = inf

y∈Γ(x)f(y).

Prove that g is quasi-convex and g¯

is quasi-concave.

10.8. Let C be a convex set of a vector spaceX. We say that x ∈ C is an extremepoint if there exist no x1 6= x2 ∈ C and α ∈ (0, 1) such that x = (1−α)x1 +αx2.If f : C → R is strictly quasi-convex and x ∈ C achieves the maximum of fover C, prove that x is an extreme point of C.

10.9. Let f : [a, b]→ R be convex, continuous, and f(a) < 0 < f(b). Show thatthere exists a unique x ∈ (a, b) such that f(x) = 0.

10.10. Let f : (a, b)→ R be convex.

1. Show that for each x ∈ (a, b),

g±(x) := limh→±0

f(x+ h)− f(x)

h

exist.

2. Show that g−(x) ≤ g+(x) for each x ∈ (a, b).

3. Show that f is differentiable on (a, b) except at at most countably manypoints.

10.11. Let ∅ 6= X ⊂ RN and u : X → R. Define e : RN × R→ [−∞,∞] by

e(p, u) = inf p · x |x ∈ X,u(x) ≥ u ,

where by convention we define inf ∅ = ∞. (Economically, X is a consumptionset, u is a utility function, p is a price vector, and e is the minimum expenditureto achieve utility level u given the price vector p, which is called the expenditurefunction.) Prove that e(p, u) is concave in p.

10.12. Let ∅ 6= X ⊂ RN and u : X → R. Define v : RN × R→ [−∞,∞] by

v(p, w) = sup u(x) |x ∈ X, p · x ≤ w ,

where by convention we define sup ∅ = −∞. (Economically, X is a consumptionset, u is a utility function, p is a price vector, w is wealth, and v is the maximumutility given the price vector p and wealth w, which is called the indirect utilityfunction.)

1. Take any (pj , wj) ∈ RN × R and define p = (1 − α)p1 + αp2, w = (1 −α)w1 + αw2, where α ∈ [0, 1]. Show that if x ∈ X satisfies p · x ≤ w, thenpj · x ≤ wj for at least one j.

107

Page 109: Essential Mathematics for Economists

2. Prove that v(p, w) is quasi-convex in (p, w).

10.13. Let f : R → R be defined by f(x) = |x|. Compute the subdifferential∂f(x).

10.14. Let f be a proper convex function and x ∈ int dom f . Prove that ∂f(x)is a closed convex set.

10.15. Define f : R→ (−∞,∞] by

f(x) =

∞, (x < 0)

−√x. (x ≥ 0)

1. Show that f is a proper convex function.

2. Compute the subdifferential ∂f(x) for x > 0. Does ∂f(0) exist?

108

Page 110: Essential Mathematics for Economists

Chapter 11

Convex Programming

11.1 Convex programming

Consider the minimization problem

minimize f(x) subject to x ∈ C. (11.1)

The minimization problem (11.1) is called a convex programming problem iff(x) is a convex function and C is a convex set. By redefining f such thatf(x) =∞ for x /∈ C, the constrained optimization problem (11.1) becomes theunconstrained optimization problem

minimize f(x).

11.1.1 Sufficiency without constraints

By Proposition 4.1, if f is differentiable and x minimizes f , then ∇f(x) = 0,which is called the first-order necessary condition. In general, the first-ordercondition is not sufficient. For instance, for f(x) = x3 we have f ′(0) = 0, butx = 0 does not minimize f since f(−1) = −1 < 0 = f(0).

For a convex programming problem, however, the first-order necessary con-dition is also sufficient, as the following proposition shows.

Proposition 11.1. Let f : RN → (−∞,∞] be a proper convex function. Thenx ∈ int dom f is a solution to (11.1) if and only if 0 ∈ ∂f(x). In particular, iff is differentiable, f(x) = min f(x) if and only if ∇f(x) = 0.

Proof. If f(x) = min f(x), then for any x

f(x)− f(x) ≥ 0 = 〈0, x− x〉 = 0,

so 0 ∈ ∂f(x). If 0 ∈ ∂f(x), then

f(x)− f(x) ≥ 〈0, x− x〉 = 0,

so f(x) = min f(x).

109

Page 111: Essential Mathematics for Economists

11.1.2 Saddle Point Theorem

In applications, the constraint set is often given by inequalities and equations.Consider the constrained minimization problem

minimize f(x)

subject to gi(x) ≤ 0 (i = 1, . . . , I),

hj(x) = 0 (j = 1, . . . , J),

x ∈ Ω, (11.2)

where f , gi’s, and hj ’s are functions and Ω ⊂ RN is a set. The constraintsgi(x) ≤ 0 and hj(x) = 0 are called inequality and equality constraints, respec-tively. The condition x ∈ Ω is called a side constraint.

When f, gi’s are convex, hj ’s are affine (so hj(x) = 〈aj , x〉 − bj for someaj ∈ RN\ 0, bj ∈ R for all j), and Ω is convex, then (11.2) is a convexprogramming problem. Without loss of generality, we may assume that aj islinearly independent, for otherwise either the constraint set is empty or someconstraints are redundant (Problem 11.2). Letting A = (a1, . . . , aJ)′ (an J ×Nmatrix) and b = (b1, . . . , bJ)′, the equality constraints can be compactly writtenas Ax− b = 0.

To characterize the solution of (11.2), define the Lagrangian by

L(x, λ, µ) =

f(x) +

∑Ii=1 λigi(x) +

∑Jj=1 µjhj(x), (λ ∈ RI+)

−∞, (λ /∈ RI+)

where λ = (λ1, . . . , λI) ∈ RI and µ = (µ1, . . . , µJ) ∈ RJ are called Lagrangemultipliers. (Defining L to be −∞ when λ /∈ RI+ is useful later when explainingduality.) A point (x, λ, µ) ∈ Ω × RI × RJ is called a saddle point if it achievesthe minimum with respect to x and maximum with respect to (λ, µ). Formally,(x, λ, µ) is a saddle point if

L(x, λ, µ) ≤ L(x, λ, µ) ≤ L(x, λ, µ) (11.3)

for all (x, λ, µ) ∈ Ω× RI+J .The following theorem gives necessary and sufficient conditions for optimal-

ity.

Theorem 11.2 (Saddle Point Theorem). Let Ω ⊂ RN be a convex set. Supposethat f, gi : Ω → (−∞,∞]’s are convex and hj’s are affine in the minimizationproblem (11.2). Let

(h1(x), . . . , hJ(x))′ = Ax− b,

where A is an J ×N matrix and b ∈ RJ .

1. If (i) x is a solution to the minimization problem (11.2), (ii) there existsx0 ∈ RN such that gi(x0) < 0 for all i and Ax0 − b = 0, and (iii) 0 ∈int(AΩ − b), then there exist Lagrange multipliers λ ∈ RI+ and µ ∈ RJsuch that (x, λ, µ) is a saddle point of L.

2. If there exist Lagrange multipliers λ ∈ RI+ and µ ∈ RJ such that (x, λ, µ)is a saddle point of L, then x is a solution to the minimization problem(11.2).

110

Page 112: Essential Mathematics for Economists

Remark 11.1. Condition (1ii) is called the Slater constraint qualification andwill be discussed in more detail in Chapter 12. In applications, we often haveΩ = RN . Then condition (1iii) holds automatically since ARN − b = RJ whenthe row vectors of A are linearly independent, which we may assume without lossof generality. Condition (1iii) also holds when there are no equality constraints(J = 0).

Proof of Theorem 11.2.

Necessity (Claim 1). Assume that x ∈ Ω is a solution to (11.2). Define thesets C,D ⊂ R1+I+J by

C =

(u, v, w) ∈ R1+I+J∣∣ (∃x ∈ Ω)u ≥ f(x), (∀i)vi ≥ gi(x), w = Ax− b

,

D =

(u, v, w) ∈ R1+I+J∣∣u < f(x), (∀i)vi < 0, (∀j)wj = 0

.

(See Figure 11.1.)

v

u

f(x)

C

D

O

Figure 11.1: Saddle point theorem.

Clearly C,D are convex since f, gi’s are convex and Ω is convex. Since

(f(x), v, Ax− b) ∈ C

for vi = gi(x), C is nonempty. Letting v0i = gi(x0) < 0, v0 = (v01, . . . , v0I), andu0 < f(x), we have (u0, v0, 0) ∈ D, so D is nonempty. If (u, v, w) ∈ C∩D, since(u, v, w) ∈ D we have u < f(x), v 0, and w = 0. Then since (u, v, 0) ∈ C thereexists x ∈ Ω such that f(x) ≤ u < f(x), gi(x) < 0 for all i, and Ax − b = 0,contradicting the optimality of x. Therefore C ∩ D = ∅. By the separatinghyperplane theorem, there exists 0 6= (α, β, γ) ∈ R× RI × RJ such that

sup(u,v,w)∈D

αu+ 〈β, v〉+ 〈γ,w〉 ≤ inf(u,v,w)∈C

αu+ 〈β, v〉+ 〈γ,w〉 .

Taking (u, v, w) ∈ D and letting u → −∞, it must be α ≥ 0. Similarly, lettingvi → −∞, it must be βi ≥ 0 for all i.

Let us show that α > 0. For any ε > 0, we have (f(x)− ε,−ε1, 0) ∈ D, so

α(f(x)− ε)− ε 〈β,1〉 ≤ αf(x) +

I∑i=1

βigi(x) +

J∑j=1

γjhj(x) (11.4)

111

Page 113: Essential Mathematics for Economists

for any x. Letting x = x0 and ε→ 0, we get

αf(x) ≤ αf(x0) +

I∑i=1

βigi(x0).

If α = 0, since by assumption gi(x0) < 0, it must be βi = 0 for all i. If J = 0(no equality constraints), then (α, β) = 0, a contradiction. Hence α > 0. IfJ > 0, then

sup(u,v,w)∈D

〈γ,w〉 ≤ inf(u,v,w)∈C

〈γ,w〉 ⇐⇒ (∀x ∈ Ω)γ′(Ax− b) ≥ 0.

Since by assumption 0 is an interior point of AΩ − b, for small enough δ > 0there exists x ∈ Ω with −δγ = Ax − b. Therefore −δ ‖γ‖2 ≥ 0 implies γ = 0.Then (α, β, γ) = 0, a contradiction. Hence α > 0.

Since α > 0, define λ = β/α and µ = γ/α. Then letting ε→ 0 in (11.4), weget

f(x) ≤ f(x) +

I∑i=1

λigi(x) +

J∑j=1

µjhj(x) =: L(x, λ, µ)

for all x. Since λi ≥ 0 and gi(x) ≤ 0 for all i and hj(x) = 0 for all j, it followsthat

L(x, λ, µ) = f(x) +

I∑i=1

λigi(x) +

J∑j=1

µjhj(x)

≤ f(x) ≤ f(x) +

I∑i=1

λigi(x) +

J∑j=1

µjhj(x) = L(x, λ, µ),

which is the right inequality of (11.3) and it must be λigi(x) = 0 for all i. Itremains to show the left inequality of (11.3). If λ /∈ RI+, by the definition ofthe Lagrangian we have L(x, λ, µ) = −∞, so it is trivial. If λ ∈ RI+, then sincegi(x) ≤ 0 and hj(x) = 0, we obtain

L(x, λ, µ) = f(x) +

I∑i=1

λigi(x) +

J∑j=1

µjhj(x)

≤ f(x) = L(x, λ, µ),

which is the left inequality of (11.3).

Sufficiency (Claim 2). Assume that (x, λ, µ) ∈ Ω×RI+×RJ is a saddle pointof L. By the left inequality of (11.3), for any λ ∈ RI+ and µ ∈ RJ we obtain

f(x) +

I∑i=1

λigi(x) +

J∑j=1

µjhj(x) ≤ f(x) +

I∑i=1

λigi(x) +

J∑j=1

µjhj(x)

=⇒I∑i=1

λigi(x) +

J∑j=1

µjhj(x) ≤I∑i=1

λigi(x) +

J∑j=1

µjhj(x).

112

Page 114: Essential Mathematics for Economists

Letting µj → ±∞, it must be hj(x) = 0 for all j. Letting λi → ∞, we get

gi(x) ≤ 0 for all i. Letting λ = 0, we get 0 ≤∑Ii=1 λigi(x), so it must be

λigi(x) = 0 for all i. Then by the right inequality of (11.3), for any x ∈ Ω weobtain

f(x) +

I∑i=1

λigi(x) +

J∑j=1

µjhj(x) ≤ f(x) +

I∑i=1

λigi(x) +

J∑j=1

µjhj(x)

=⇒ f(x) ≤ f(x) +

I∑i=1

λigi(x) +

J∑j=1

µjhj(x).

Since λi ≥ 0, if gi(x) ≤ 0 and hj(x) = 0 it follows that f(x) ≤ f(x), so x is asolution to the constrained minimization problem (11.2).

11.1.3 Necessity and sufficiency of KKT conditions

The following corollary is useful for computing the solution of a convex pro-gramming problem.

Corollary 11.3. Let Ω ⊂ RN be a convex set, f, gi : Ω → (−∞,∞] be convexand differentiable on Ω, and hj be affine.

1. If (i) x is a solution to the minimization problem (11.2), (ii) there existsx0 ∈ RN such that gi(x0) < 0 for all i and Ax0 − b = 0, and (iii) 0 ∈int(AΩ − b), then there exist Lagrange multipliers λ ∈ RI+ and µ ∈ RJsuch that

∇f(x) +

I∑i=1

λi∇gi(x) +

J∑j=1

µj∇hj(x) = 0, (11.5a)

(∀i)λi ≥ 0, gi(x) ≤ 0, λigi(x) = 0, (11.5b)

(∀j)hj(x) = 0. (11.5c)

2. If there exist Lagrange multipliers λ ∈ RI+ and µ ∈ RJ such that (11.5)holds, then x is a solution to the minimization problem (11.2).

Proof. If x is a solution, by the saddle point theorem and its proof, there ex-ists Lagrange multipliers such that (11.5b) and (11.5c) hold and x minimizesL(x, λ, µ). Then by Proposition 11.1, (11.5a) holds.

If (11.5) holds, then by condition (11.5a) and Proposition 11.1, x minimizesL(x, λ, µ). By conditions (11.5b) and (11.5c), we can show that x solves (11.4)by imitating the sufficiency proof of Theorem 11.2.

Condition (11.5a) is called the first-order condition. Condition (11.5b) iscalled the complementary slackness condition.

By Corollary 11.3, we can solve a constrained minimization problem (11.2)as follows.

Step 1. Verify that the functions f, gi’s are convex, hj ’s are affine, and theconstraint set Ω is convex.

113

Page 115: Essential Mathematics for Economists

Step 2. Verify the Slater condition (there exists x0 ∈ Ω such that gi(x0) < 0 forall i and hj(x0) = 0 for all j) and 0 ∈ intAΩ− b.

Step 3. Form the Lagrangian

L(x, λ, µ) = f(x) +

I∑i=1

λigi(x) +

J∑j=1

µjhj(x).

Derive the first-order condition and complementary slackness condition(11.5).

Step 4. Solve (11.5). If there is a solution x, it is a solution to the minimizationproblem 11.2. Otherwise, there are no solutions.

As an example, consider the problem

minimize1

x1+

1

x2

subject to x1 + x2 ≤ 2,

x1, x2 > 0.

Let us solve this problem step by step.

Step 1. Let f(x1, x2) = 1x1

+ 1x2

be the objective function. Since

(1/x)′′ = (−x−2)′ = 2x−3 > 0,

the Hessian of f ,

∇2f(x1, x2) =

[2x−3

1 00 2x−3

2

],

is positive definite. Therefore f is convex. Let g(x1, x2) = x1 + x2 − 2.Since g is affine, it is convex. Clearly the set

Ω = R2++ = (x1, x2) |x1, x2 > 0

is convex.

Step 2. For (x1, x2) = (12 ,

12 ) we have g(x1, x2) = −1 < 0, so the Slater condition

holds.

Step 3. Let

L(x1, x2, λ) =1

x1+

1

x2+ λ(x1 + x2 − 2)

be the Lagrangian. The first-order condition is

0 =∂L

∂x1= − 1

x21

+ λ ⇐⇒ x1 =1√λ,

0 =∂L

∂x2= − 1

x22

+ λ ⇐⇒ x2 =1√λ.

The complementary slackness condition is

λ(x1 + x2 − 2) = 0.

114

Page 116: Essential Mathematics for Economists

Step 4. From these equations it must be λ > 0 and

x1 + x2 − 2 = 0 ⇐⇒ 2√λ− 2 = 0 ⇐⇒ λ = 1

and x1 = x2 = 1/√λ = 1. Therefore (x1, x2) = (1, 1) is the (only)

solution.

11.1.4 Quasi-convex programming

When f, gi are not convex, but only quasi-convex, we can still give a sufficientcondition for optimality.

Theorem 11.4. Consider the constrained optimization problem

minimize f(x)

subject to gi(x) ≤ 0 (i = 1, . . . , I),

where f, gi’s are differentiable and quasi-convex. Suppose that the Slater con-straint qualification holds and that x and λ satisfy the KKT conditions. If∇f(x) 6= 0 then x is a solution.

Proof. First, let us show that 〈∇f(x), x− x〉 ≥ 0 for all feasible x. Multiplyingx− x as an inner product to the first-order condition

∇f(x) +

I∑i=1

λi∇gi(x) = 0,

we obtain

〈∇f(x), x− x〉 = −I∑i=1

λi 〈∇gi(x), x− x〉 .

Therefore it suffices to show λi 〈∇gi(x), x− x〉 ≤ 0 for all i. If gi(x) < 0,by complementary slackness we have λi = 0, so there is nothing to prove. Ifgi(x) = 0, since x is feasible, we have gi(x) ≤ 0 = gi(x). Hence by Proposition10.7, we have 〈∇gi(x), x− x〉 ≤ 0. Since λi ≥ 0, we have λi 〈∇gi(x), x− x〉 ≤ 0.

Next, let us show that there exists a feasible point x1 such that

〈∇f(x), x1 − x〉 > 0.

Consider the point x0 in the Slater condition. By the previous result we have〈∇f(x), x0 − x〉 ≥ 0. Since ∇f(x) 6= 0, letting x1 = x0 + ε∇f(x) with ε > 0sufficiently small, then by the Slater condition x1 is feasible and

〈∇f(x), x1 − x〉 ≥ ε ‖∇f(x)‖2 > 0.

Finally, take any feasible x and x1 as above. Since 〈∇f(x), x− x〉 ≥ 0, forany 0 < t < 1 we have

〈∇f(x), (1− t)x+ tx1 − x〉 = 〈∇f(x), (1− t)(x− x) + t(x1 − x)〉= (1− t) 〈∇f(x), x− x〉+ t 〈∇f(x), x1 − x〉 > 0.

Since f is quasi-convex, by Proposition 10.7 this inequality implies that

f((1− t)x+ tx1) > f(x).

Letting t→ 0, we get f(x) ≥ f(x), so x is a solution.

115

Page 117: Essential Mathematics for Economists

The rest of this chapter contains applications to portfolio selection and cap-ital asset pricing model, which can be skipped by uninterested readers.

11.2 Portfolio selection

11.2.1 The problem

Suppose you live in a world with two periods, t = 0, 1. There are J assetsindexed by j = 1, . . . , J . Asset j trades at price Pj per share and it pays off Xj

per share at t = 1, where Xj is a random variable. You have some wealth w0

to invest at t = 0 and let w1 be your wealth at t = 1. Assume that (as most ofyou do) you like money but dislike risk. So suppose you want to minimize thevariance Var[w1] while keeping the expected wealth E[w1] at some value w.

This is the classic portfolio selection problem studied by Harry Markowitz1

in his dissertation at University of Chicago, which was eventually published inJournal of Finance Markowitz (1952) and made him win the Nobel Prize in1990.

11.2.2 Mathematical formulation

To mathematically formulate the problem, let nj be the number of shares youbuy. (If nj > 0, you buy asset j; if nj < 0, you shortsell asset j. I assume thatshortselling is allowed.) Then the problem is

minimize Var

J∑j=1

Xjnj

subject to E

J∑j=1

Xjnj

= w,

J∑j=1

Pjnj = w0.

Rj = Xj/Pj be the gross return of asset j and let θj = Pjnj/w0 be the

fraction of wealth invested in asset j. Since by definition∑Jj=1 θj = 1 and

Xjnj =Xj

Pj

Pjnjw0

w0 = w0Rjθj ,

the problem is equivalent to

minimize Var[R(θ)]

subject to E[R(θ)] = µ,∑j

θj = 1,

where θ = (θ1, . . . , θJ), R(θ) =∑j Rjθj , and µ = w/w0.

11.2.3 Solution

Let µj = E[Rj ] be the expected return of asset j and Σ = (σij) be the variance-covariance matrix of (R1, . . . , RJ), where

σij = Cov[Ri, Rj ] = E[(Ri − µi)(Rj − µj)].1Harry Markowitz is currently Professor of Finance at Rady School of Management, UCSD.

116

Page 118: Essential Mathematics for Economists

Then the problem can be stated as

minimize1

2〈θ,Σθ〉

subject to 〈µ, θ〉 = µ, 〈1, θ〉 = 1,

where µ = (µ1, . . . , µJ) is the vector of expected returns and 1 = (1, . . . , 1) isthe vector of ones. (The 1

2 is there for making the first derivative nice.) Assumethat Σ is positive definite. Then the objective function is strictly convex. TheLagrangian is

L(θ, λ1, λ2) =1

2〈θ,Σθ〉+ λ1(µ− 〈µ, θ〉) + λ2(1− 〈1, θ〉).

The first-order condition is

Σθ − λ1µ− λ21 = 0 ⇐⇒ θ = λ1Σ−1µ+ λ2Σ−11.

The first interesting observation is that the optimal portfolio θ is spanned bytwo vectors, Σ−1µ and Σ−11 (mutual fund theorem). Substituting θ into theconstraints, we obtain

λ1

⟨µ,Σ−1µ

⟩+ λ2

⟨µ,Σ−11

⟩= µ,

λ1

⟨1,Σ−1µ

⟩+ λ2

⟨1,Σ−11

⟩= 1.

Noting that⟨µ,Σ−11

⟩=⟨1,Σ−1µ

⟩because Σ is symmetric and letting[

a bb c

]=

[⟨µ,Σ−1µ

⟩ ⟨µ,Σ−11

⟩⟨1,Σ−1µ

⟩ ⟨1,Σ−11

⟩]−1

,

we get [λ1

λ2

]=

[a bb c

] [µ1

].

Using the first-order condition and the constraints, the minimum variance is

σ2 = 〈θ,Σθ〉 = 〈θ, λ1µ+ λ21〉 = λ1µ+ λ21

= [µ, 1]

[a bb c

] [µ1

]= aµ2 + 2bµ+ c

= a

(µ+

b

a

)2

+ c− b2

a= a(µ−m)2 + s2.

Using1

ac− b2

[c −b−b a

]=

[⟨µ,Σ−1µ

⟩ ⟨µ,Σ−11

⟩⟨1,Σ−1µ

⟩ ⟨1,Σ−11

⟩] ,we can compute

m = − ba

=

⟨1,Σ−1µ

⟩〈1,Σ−11〉

,

s2 =ac− b2

a=

1

〈1,Σ−11〉.

117

Page 119: Essential Mathematics for Economists

The relationship between the target return µ and standard deviation σ,

σ2 = a(µ−m)2 + s2 ⇐⇒ µ = m± 1√a

√σ2 − s2,

is a hyperbola (Figure 11.2). A portfolio satisfying this relation is called anmean-variance efficient portfolio.

σ

µ

m

s

Slope = 1√a

O

Figure 11.2: Mean-variance efficient frontier.

11.3 Capital asset pricing model (CAPM)

11.3.1 The model

Now consider an economy consisting of I agents indexed by i = 1, . . . , I. Letwi > 0 be the initial wealth of agent i. Instead of minimizing the variance ofportfolio returns subject to a target expected portfolio returns, suppose thatagent i wants to maximize

vi(θ) = E[R(θ)]− 1

2τiVar[R(θ)],

where R(θ) is the portfolio return and τi > 0 is the “risk tolerance”. Assumethat in addition to the risky J assets, agents can trade a risk-free asset in zeronet supply, where the risk-free rate Rf is determined in equilibrium. Letting θjthe fraction of wealth invested in asset j, the fraction of wealth invested in therisk-free asset is 1−

∑j θj . Therefore the portfolio return is

R(θ) =∑j

Rjθj +Rf

1−∑j

θj

= Rf +∑j

(Rj −Rf )θj .

The expected return and variance of the portfolio are

E[R(θ)] = Rf + 〈µ−Rf1, θ〉 , Var[R(θ)] = 〈θ,Σθ〉 ,

118

Page 120: Essential Mathematics for Economists

respectively. Thus the optimal portfolio problem of agent i reduces to

maximize Rf + 〈µ−Rf1, θ〉 − 1

2τi〈θ,Σθ〉 ,

where θ ∈ RJ is unconstrained. The first-order condition is

µ−Rf1− 1

τiΣθ = 0 ⇐⇒ θi = τiΣ

−1(µ−Rf1), (11.6)

where the subscript i indicates that it refers to agent i.

11.3.2 Equilibrium

Mathematics ends and economics starts here. Since every asset must be heldby someone and the risk-free asset is in zero net supply by definition, theaverage portfolio weighted by individual wealth,

∑wiθi/

∑i wi, must be the

market portfolio (value-weighted average portfolio), denoted by θm. Letting

τ =∑Ii=1 wiτi/

∑Ii=1 wi be the “average risk tolerance”, it follows from the

first-order condition (11.6) that

θm = τΣ−1(µ−Rf1). (11.7)

Comparing (11.6) and (11.7), we obtain

θi =τiτθm. (11.8)

This means that you should hold risky assets in the same proportion as in themarket portfolio, where the fraction of the market portfolio is τi/τ and thefraction of the risk-free asset is 1− τi/τ . Thus the mutual fund theorem holdswith the market portfolio and the risk-free asset. This strong implication hadan enormous impact on investment practice, and led to the creation of the firstindex fund in 1975 by Vanguard.2

11.3.3 Asset pricing

Since by definition the market portfolio invests only in risky assets, we have

1 = 〈1, θm〉 = τ⟨1,Σ−1(µ−Rf1)

⟩⇐⇒ Rf =

⟨1,Σ−1µ

⟩− 1

τ

〈1,Σ−11〉,

which determines the risk-free rate. Since the market portfolio does not investin the risk-free asset, it must lie on the hyperbola corresponding to no risk-freeasset (Figure 11.4).

Letting Rm =∑j Rjθm,j be the market return, we have

Cov[Rm, Ri] = Cov

Ri,∑j

Rjθm,j

= (Σθm)i.

Hence multiplying both sides of (11.7) by Σ and taking the i-th element, weobtain

Cov[Rm, Ri] = τ(E[Ri]−Rf ). (11.9)

2See http://en.wikipedia.org/wiki/Index_fund.

119

Page 121: Essential Mathematics for Economists

Multiplying both sides of (11.9) by θm,i and summing of i, we obtain

Var[Rm] = Cov[Rm, Rm] = τ(E[Rm]−Rf ). (11.10)

Dividing (11.9) by (11.10), we obtain the covariance pricing formula

E[Ri]−Rf =Cov[Rm, Ri]

Var[Rm](E[Rm]−Rf ). (11.11)

Thus, the excess return of an asset E[Ri]−Rf is proportional to the covarianceof the asset with the market, Cov[Rm, Ri]. The quantity

βi =Cov[Rm, Ri]

Var[Rm]

is called the beta of the asset. By definition, the market beta is βm = 1. Betameasures the market risk of an asset.

Rewriting (11.11), we obtain

E[Ri] = Rf + βi(E[Rm]−Rf ).

The theoretical linear relationship between βi and E[Ri] is called the securitymarket line (SML) (Figure 11.3). An asset above (below) the security marketline, that is,

E[Ri] > (<)Rf + βi(E[Rm]−Rf )

is undervalued (overvalued) because the expected return is higher (lower) thanpredicted.

β

E[R]

Rf

Market portfolio

SML

βi

E[Ri]

1

E[Rm]

O

Undervalued

Overvalued

Figure 11.3: Security market line.

Since both sides of (11.11) are linear in Ri, (11.11) also holds for any linearcombination (hence portfolio) of assets. In particular, letting Ri be the optimalportfolio return of agent i, R(θi), it follows from (11.8) that

σθi =√

Var[R(θi)] =τiτ

√Var[Rm],

Cov[Rm, R(θi)] =τiτ

Var[Rm].

120

Page 122: Essential Mathematics for Economists

Substituting these equations into (11.11), eliminating τi/τ , and letting σm =√Var[Rm], we get

E[R(θi)]−Rf =E[Rm]−Rf

σmσθi .

This linear relationship between the standard deviation of the optimal portfolioσθi and the excess return E[R(θi)]−Rf is called the capital market line, denotedby CML in Figure 11.4. Agents that are relatively risk tolerant (τi > τ) borrowand choose a portfolio on the capital market line to the right of the marketportfolio. Agents that are relatively risk averse (τi < τ) lend and choose aportfolio on the capital market line to the left of the market portfolio. Theslope,

E[Rm]−Rfσm

,

is called the Sharpe ratio, named after William Sharpe who invented it (Sharpe,1964). The capital market line is tangent to the efficient frontier with no risk-free asset at the market portfolio. Clearly the market portfolio (and any optimalportfolio) attains the maximum Sharpe ratio.

σ

µ

Market Portfolio

CML

σm

µm

Rf

O

Figure 11.4: Capital market line.

Problems

11.1. Let Ω be a convex set and f : Ω→ (−∞,∞] be quasi-convex.

1. Show that the set of solutions to minx∈Ω f(x) is a convex set.

2. If f is strictly quasi-convex, show that the solution (if it exists) is unique.

11.2. Let ajJj=1 be vectors in RN and bjJj=1 be scalars. Define the set

C =x ∈ RN

∣∣ (∀j) 〈aj , x〉 = bj.

Show that if ajJj=1 are linearly dependent, then either C = ∅ or some con-

straints are redundant (i.e., we may drop some j without affecting the set C).

121

Page 123: Essential Mathematics for Economists

Chapter 12

Nonlinear Programming

12.1 The problem and the solution concept

We are interested in solving a general constrained optimization problem

minimize f(x) subject to x ∈ C, (12.1)

where f is the objective function and C ⊂ RN is the constraint set. Such anoptimization problem is called a linear programming problem when both theobjective function and the constraints are linear. Otherwise, it is called a non-linear programming problem. If both the objective function and the constraintshappen to be convex, it is called a convex programming problem. Thus

Linear Programming ⊂ Convex Programming ⊂ Nonlinear Programming.

We focus on minimization because maximizing f is the same as minimizing−f . If f is continuous and C is compact, by the extreme value theorem (The-orem 2.5) we know there is a solution, but the theorem does not tell you howto compute it. In this chapter you will learn how to derive necessary conditionsfor optimality for general nonlinear programming problems. Oftentimes, thenecessary conditions alone will pin down the solution to a few candidates, soyou only need to compare these candidates.

We call the point x ∈ C a global solution if f(x) ≥ f(x) for all x ∈ C and alocal solution if there exists ε > 0 such that f(x) ≥ f(x) whenever x ∈ C and‖x− x‖ < ε. If x is a global solution, clearly it is also a local solution.

12.2 Cone and dual cone

We first introduce some mathematical concepts, cones and dual cones.A set C ⊂ RN is said to be a cone if it contains a ray originating from 0

and passing through any point of C. Formally, C is a cone if x ∈ C and α ≥ 0implies αx ∈ C. An example of a cone is the nonnegative orthant

RN+ =x = (x1, . . . , xN ) ∈ RN

∣∣ (∀n)xn ≥ 0.

Another example is the setx =

K∑k=1

αkak

∣∣∣∣∣ (∀k)αk ≥ 0

, (12.2)

122

Page 124: Essential Mathematics for Economists

where a1, . . . , aK are fixed vectors. The set (12.2) is called the polyhedral conegenerated by vectors a1, . . . , aK , and is denoted by cone[a1, . . . , aK ] (Figure12.1). Clearly RN+ = cone[e1, . . . , eN ], where e1, . . . , eN are unit vectors of RN .A polyhedral cone is a closed convex cone (Problem 12.1).

a1

a2

O

cone[a1, a2]

Figure 12.1: Cone generated by vectors.

Let C ⊂ RN be any nonempty set. The set

C∗ =y ∈ RN

∣∣ (∀x ∈ C) 〈x, y〉 ≤ 0

(12.3)

is called the dual cone of C. Thus the dual cone C∗ consists of all vectors thatmake an obtuse angle with any vector in C (Figure 12.2).

O

C

C∗

Figure 12.2: Cone and its dual.

Note that in the definition of the dual cone (12.3), the set C is arbitrary(not necessarily a cone). Yet, C∗ is called the dual cone, which suggests thatC∗ is always a cone. In fact this is the case, and the following proposition provessome basic properties of the dual cone.

Proposition 12.1. Let ∅ 6= C ⊂ D. Then (i) the dual cone C∗ is a nonempty,closed, convex cone, (ii) C∗ = (coC)∗, and (iii) C∗ ⊃ D∗.

Proof. C∗ is nonempty since 0 ∈ C∗. If y ∈ C∗, then by definition 〈x, y〉 ≤ 0for all x ∈ C. Then for any α ≥ 0 and x ∈ C, we have 〈x, αy〉 = α 〈x, y〉 ≤ 0, so

123

Page 125: Essential Mathematics for Economists

αy ∈ C∗. Therefore C∗ is a cone. To show that C∗ is closed, take any sequenceyk∞k=1 ⊂ C∗ and yk → y. Since yk ∈ C∗, by definition 〈x, yk〉 ≤ 0 for allx ∈ C, so letting k → ∞ we obtain 〈x, y〉 ≤ 0 for all x ∈ C. Therefore y ∈ C∗and hence C∗ is closed.

If y ∈ D∗, then 〈x, y〉 ≤ 0 for all x ∈ D. Since C ⊂ D, we have 〈x, y〉 ≤ 0 forall x ∈ C. Therefore y ∈ C∗, which proves D∗ ⊂ C∗. Finally, since C ⊂ coC,we have C∗ ⊃ (coC)∗ by letting D = coC. To prove the reverse inclusion, take

any x ∈ coC. By Lemma 9.1, there exists a convex combination x =∑Kk=1 αkxk

such that xk ∈ C for all k. If y ∈ C∗, it follows that

〈x, y〉 =⟨∑

αkxk, y⟩

=∑

αk 〈xk, y〉 ≤ 0,

so y ∈ (coC)∗. Therefore C∗ ⊂ (coC)∗.

Proposition 12.2. Let C ⊂ RN be a nonempty cone. Then C∗∗ = cl coC.(C∗∗ is the dual cone of C∗, so it is the dual cone of the dual cone of C.)

Proof. Let x ∈ C. For any y ∈ C∗ we have 〈x, y〉 ≤ 0. This implies x ∈ C∗∗.Hence C ⊂ C∗∗. Since by Proposition 12.1 the dual cone is closed and convex,we have cl coC ⊂ C∗∗.

To show the reverse inclusion, suppose that x /∈ cl coC. Then by Proposition9.4 there exists a nonzero vector a such that

supz∈cl coC

〈a, z〉 < c < 〈a, x〉 .

In particular, 〈a, z〉 < c for all z ∈ C. Since C is a cone, it must be 〈a, z〉 ≤ 0for all z. To see this, suppose there exists z0 ∈ C with 〈a, z0〉 > 0. Then for anyβ > 0 we have βz0 ∈ C, so letting β large enough we have 〈a, βz0〉 = β 〈a, z0〉 >c, a contradiction. Since 〈a, z〉 ≤ 0 for all z ∈ C, we have a ∈ C∗. Again sinceC is a cone, we have 0 ∈ C, so c > 〈a, 0〉 = 0. Since 〈a, x〉 > c > 0 and a ∈ C∗,it follows that x /∈ C∗∗. Therefore C∗∗ ⊂ cl coC.

The following corollary plays an important role in optimization theory.

Corollary 12.3 (Farkas). Let C = cone[a1, . . . , aK ] be the polyhedral cone gen-erated by the vectors a1, . . . , aK . Let D =

y ∈ RN

∣∣ (∀k) 〈ak, y〉 ≤ 0

. ThenD = C∗ and C = D∗ (Figure 12.3).

Proof. Let y ∈ D. For any x ∈ C, we can take αkKk=1 ⊂ R+ such thatx =

∑k αkak. Then

〈x, y〉 =∑k

αk 〈ak, y〉 ≤ 0,

so y ∈ C∗. Conversely, let y ∈ C∗. Since ak ∈ C, we get 〈ak, y〉 ≤ 0 for all k, soy ∈ D. Therefore D = C∗.

Since C is a closed convex cone, by Propositions 12.2 and 12.1 (iii) we get

C = cl coC = C∗∗ = (C∗)∗ = D∗.

124

Page 126: Essential Mathematics for Economists

a1

a2

O

C = D∗

D = C∗

Figure 12.3: Farkas’ lemma.

12.3 Necessary condition

In this section we derive the first-order necessary condition for optimality usingthe tangent cone of the constraint set.

Let C ⊂ RN be any nonempty set and x ∈ C be any point. The tangentcone of C at x is defined by

TC(x) =

y ∈ RN

∣∣∣∣ (∃) αk ≥ 0, xk ⊂ C, limk→∞

xk = x, y = limk→∞

αk(xk − x)

.

That is, y ∈ TC(x) if y points to the same direction as the limiting direction ofxk − x as xk approaches to x. Intuitively, the tangent cone of C at x consistsof all directions that can be approximated by that from x to another point in C.Figure 12.4 shows an example. Here C is the region in between the two curves,and the tangent cone is the shaded area.

x

TC(x)

C

Figure 12.4: Tangent cone.

Lemma 12.4. TC(x) is a nonempty closed cone.

Proof. Setting αk = 0 for all k we get 0 ∈ TC(x), so TC(x) 6= ∅. If y ∈ TC(x),then y = limαk(xk − x) for some αk ≥ 0 and xk ⊂ C such that limxk = x.Then for β ≥ 0 we have βy = limβαk(xk − x) ∈ TC(x), so TC(x) is a cone. Toshow that TC(x) is closed, let yl ⊂ TC(x) and yl → y. For each l we can takea sequence such that αk,l ≥ 0, limk→∞ xk,l = x, and yl = limk→∞ αk,l(xk,l− x).

125

Page 127: Essential Mathematics for Economists

Hence we can take kl such that ‖xkl,l − x‖ < 1/l and ‖yl − αkl,l(xkl,l − x)‖ <1/l. Then xkl,l → x and

‖y − αkl,l(xkl,l − x)‖ ≤ ‖y − yl‖+ ‖yl − αkl,l(xkl,l − x)‖ → 0,

so y ∈ TC(x).

The dual cone of TC(x) is called the normal cone at x and is denoted byNC(x) (Figure 12.5). By the definition of the dual cone, we have

NC(x) = (TC(x))∗ =z ∈ RN

∣∣ (∀y ∈ TC(x)) 〈y, z〉 ≤ 0.

x

NC(x)

C

Figure 12.5: Normal cone.

The following theorem is fundamental for constrained optimization.

Theorem 12.5. If f is differentiable and x is a local solution to the problem

minimize f(x) subject to x ∈ C,

then −∇f(x) ∈ NC(x).

Proof. By the definition of the normal cone, it suffices to show that

〈−∇f(x), y〉 ≤ 0 ⇐⇒ 〈∇f(x), y〉 ≥ 0

for all y ∈ TC(x). Let y ∈ TC(x) and take a sequence such that αk ≥ 0, xk → x,and αk(xk − x)→ y. Since x is a local solution, for sufficiently large k we havef(xk) ≥ f(x). Since f is differentiable, we have

0 ≤ f(xk)− f(x) = 〈∇f(x), xk − x〉+ o(‖xk − x‖).1

Multiplying both sides by αk ≥ 0 and letting k →∞, we get

0 ≤ 〈∇f(x), αk(xk − x)〉+ ‖αk(xk − x)‖ · o(‖xk − x‖)‖xk − x‖

→ 〈∇f(x), y〉+ ‖y‖ · 0 = 〈∇f(x), y〉 .1o(h) represents any quantity q(h) such that q(h)/h→ 0 as h→ 0.

126

Page 128: Essential Mathematics for Economists

The geometric interpretation of Theorem 12.5 is the following. By the dis-cussion around (4.3), −∇f(x) is the direction towards which f decreases fastestaround the point x. The tangent cone TC(x) consists of directions towardswhich x can move around x without violating the constraint x ∈ C. Hence inorder for x to be a local minimum, −∇f(x) must make an obtuse angle withany vector in the tangent cone, for otherwise f can be decreased further. Thisis the same as −∇f(x) belonging to the normal cone.

12.4 Karush-Kuhn-Tucker theorem

Theorem 12.5 is very general. Usually, we are interested in the cases where theconstraint set C is given parametrically. Consider the minimization problem

minimize f(x)

subject to gi(x) ≤ 0 (i = 1, . . . , I),

hj(x) = 0 (j = 1, . . . , J). (12.4)

This problem is a special case of problem (12.1) by setting

C =x ∈ RN

∣∣ (∀i)gi(x) ≤ 0, (∀j)hj(x) = 0.

gi(x) ≤ 0 is called an inequality constraint. hj(x) = 0 is an equality constraint.Let x ∈ C be a local solution. To study the shape of C around x, we define asfollows. The set of indices for which the inequality constraints are binding,

I(x) = i | gi(x) = 0 ,

is called the active set. Assume that gi’s and hj ’s are differentiable. The set

LC(x) =y ∈ RN

∣∣ (∀i ∈ I(x)) 〈∇gi(x), y〉 ≤ 0, (∀j) 〈∇hj(x), y〉 = 0

(12.5)

is called the linearizing cone of the constraints gi’s and hj ’s. The reason whyLC(x) is called the linearizing cone is the following. Since

gi(x+ ty)− gi(x) = t 〈∇gi(x), y〉+ o(t),

the point x = x + ty almost satisfies the constraint gi(x) ≤ 0 if gi(x) = 0 (iis an active constraint) and 〈∇gi(x), y〉 ≤ 0. The same holds for hj ’s. Thusy ∈ LC(x) implies that from x we can move slightly towards the direction ofy and still (approximately) satisfy the constraints. Thus we can expect thatthe linearizing cone is approximately equal to the tangent cone. The followingproposition make this statement precise.

Proposition 12.6. Suppose that x ∈ C. Then coTC(x) ⊂ LC(x).

Proof. Clearly the linearizing cone (12.5) is a closed convex cone, so it sufficesto prove TC(x) ⊂ LC(x). Let y ∈ TC(x). Take xk ⊂ C and αk ⊂ R+

such that xk → x and αk(xk − x) → y. Since gi(x) = 0 for i ∈ I(x) and gi isdifferentiable, we get

0 ≥ gi(xk) = gi(xk)− gi(x) = 〈∇gi(x), xk − x〉+ o(‖xk − x‖).

127

Page 129: Essential Mathematics for Economists

Multiplying both sides by αk ≥ 0 and letting k →∞, we get

0 ≥ 〈∇gi(x), αk(xk − x)〉+ ‖αk(xk − x)‖ · o(‖xk − x‖)‖xk − x‖

→ 〈∇gi(x), y〉+ ‖y‖ · 0 = 〈∇gi(x), y〉 .A similar argument applies to hj . Hence y ∈ LC(x).

Note that while the tangent cone is directly defined by the constraint setC, the linearizing cone is defined through the functions that define the set C.Therefore different parametrizations of the same set C may lead to differentlinearizing cones (Problem 12.3).

The main result in static optimization is the following.

Theorem 12.7 (Karush-Kuhn-Tucker). Suppose that f, gi, hj are differentiableand x is a local solution to the minimization problem (12.4). If LC(x) ⊂coTC(x), then there exist vectors (called Lagrange multipliers) λ ∈ RI+ andµ ∈ RJ such that

∇f(x) +I∑i=1

λi∇gi(x) +

J∑j=1

µj∇hj(x) = 0, (12.6a)

(∀i) λigi(x) = 0. (12.6b)

Proof. By Theorem 12.5, −∇f(x) ∈ NC(x) = (TC(x))∗. By Proposition 12.6and the assumption LC(x) ⊂ coTC(x), we get LC(x) = coTC(x). Hence by theproperty of dual cones, we get (TC(x))∗ = (coTC(x))∗ = (LC(x))∗. Now let

K be the polyhedral cone generated by ∇gi(x)i∈I(x) and ±∇hj(x)Jj=1. By

Farkas’s lemma (Corollary 12.3), it follows that

K∗ =y ∈ RN

∣∣ (∀i ∈ I(x)) 〈∇gi(x), y〉 ≤ 0, (∀j) 〈±∇hj(x), y〉 ≤ 0

=y ∈ RN

∣∣ (∀i ∈ I(x)) 〈∇gi(x), y〉 ≤ 0, (∀j) 〈∇hj(x), y〉 = 0,

which is precisely the linearizing cone LC(x) in (12.5). Again by Farkas’s lemma,we have (LC(x))∗ = K. Therefore −∇f(x) ∈ K, so there exist numbers λi ≥ 0(i ∈ I(x)) and αj , βj ≥ 0 such that

−∇f(x) =∑i∈I(x)

λi∇gi(x) +

J∑j=1

(αj − βj)∇hj(x).

Letting λi = 0 for i /∈ I(x) and µj = αj − βj , we get (12.6a). Finally, (12.6b)holds for i ∈ I(x) since gi(x) = 0. It also holds for i ∈ I(x) since we definedλi = 0 for such i.

Here is an easy way to remember the conditions in (12.6). Define the La-grangian of the minimization problem 12.4 by

L(x, λ, µ) = f(x) +

I∑i=1

λigi(x) +

J∑j=1

µjhj(x),

which is the sum of the objective function f(x) and the constraint functionsgi(x), hj(x) weighted by the Lagrange multipliers λi, µj . Then (12.6a) impliesthat the derivative of L(·, λ, µ) at x is zero. (12.6a) is called the first-ordercondition. (12.6b) is called the complementary slackness condition. Together,(12.6a) and (12.6b) are called Karush-Kuhn-Tucker (KKT) conditions.

128

Page 130: Essential Mathematics for Economists

12.5 Constraint qualifications

Conditions of the form LC(x) ⊂ coTC(x) in Theorem 12.7 are called constraintqualifications (CQ). These are necessary conditions in order for the KKT con-ditions to hold. There are many constraint qualifications in the literature:

Guignard (GCQ) LC(x) ⊂ coTC(x).

Abadie (ACQ) LC(x) ⊂ TC(x).

Mangasarian-Fromovitz (MFCQ) ∇hj(x)Jj=1 are linearly independent,

and there exists y ∈ RN such that 〈∇gi(x), y〉 < 0 for all i ∈ I(x) and〈∇hj(x), y〉 = 0 for all j.

Slater (SCQ) gi’s are convex, hj(x) = 〈aj , x〉 − cj where ajJj=1 are linearly

independent, and there exists x0 ∈ RN such that gi(x0) < 0 for all i andhj(x0) = 0 for all j.

Linear independence (LICQ) ∇gi(x)i∈I(x) and ∇hj(x)Jj=1 are linearlyindependent.

The point of listing these constraint qualifications is that some of themare general but hard to verify (GCQ and ACQ), while others are special buteasy to verify (SCQ and LICQ). Users of the KKT theorem need to pick theappropriate constraint qualification for the problem under consideration. Thefollowing theorem shows the relation between these constraint qualifications.

Theorem 12.8. The following is true for constraint qualifications.

LICQ or SCQ =⇒ MFCQ =⇒ ACQ =⇒ GCQ.

Proof.

ACQ =⇒ GCQ. Trivial because TC(x) ⊂ coTC(x).

MFCQ =⇒ ACQ. By dropping non-binding constraints, without loss ofgenerality we may assume all the constraints bind, so I(x) = 1, . . . , I. De-fine G : RN → RI by G(x) = (g1(x), . . . , gI(x))′ and H : RN → RJ byH(x) = (h1(x), . . . , hJ(x))′. Then MFCQ holds if and only if the J ×N Jaco-bian DH(x) has full row rank and there exists y ∈ RN such that [DG(x)]y 0and [DH(x)]y = 0, where v 0 means that all entries of v are strictly negative.

Define the set

LC(x) =y ∈ RN

∣∣ [DG(x)]y 0, [DH(x)]y = 0.

Since MFCQ holds, by definition we have LC(x) 6= ∅. Since cl LC(x) = LC(x)by the definition of the linearizing cone, and since TC(x) is closed, it suffices toshow LC(x) ⊂ TC(x).

Since DH(x) has full row rank, by relabeling the variables if necessary, wemay assume that we can split the variables as x = (x1, x2) ∈ RN−J × RJ andwrite DH(x) = [Dx1

H,Dx2H], where Dx2

H = Dx2H(x) is regular. By the

implicit function theorem, for x close enough to x, we can write

0 = H(x) = H(x1, x2) ⇐⇒ x2 = φ(x1),

129

Page 131: Essential Mathematics for Economists

where φ is C1 and Dφ = −[Dx2H]−1Dx1

H.Take any y = (y1, y2) ∈ LC(x), where y1 ∈ RN−J and y2 ∈ RJ . For small

t > 0, define

x(t) = (x1(t), x2(t)) = (x1 + ty1, φ(x1 + ty1)).

Let us show that x(0) = x, x(t) ∈ C for small t > 0, and x′(0) = y, whichimplies that y ∈ TC(x).

SinceH(x) = 0, by the implicit function theorem we have x(0) = (x1, φ(x1)) =(x1, x2) = x.

Using the chain rule, we obtain x′(0) = (y1, [Dφ]y1). Since y ∈ LC(x), itfollows that

0 = [DH(x)]y = [DHx1]y1+[DHx2

]y2 ⇐⇒ y2 = −[DHx2]−1[DHx1

]y1 = [Dφ]y1

by the implicit function theorem. Therefore x′(0) = (y1, y2) = y.Finally, by the chain rule and the definition of LC(x), at t = 0 we have

d

dtG(x(0)) = [DG(x)]x′(0) = [DG(x)]y 0.

Therefore for small enough t > 0, we have

G(x(t))

t=G(x(t))−G(x)

t 0

because G(x) = 0, so G(x(t)) 0. Since H(x(t)) = H(x1(t), φ(x1(t))) = 0, itfollows that x(t) ∈ C for small enough t > 0.

SCQ =⇒ MFCQ. Suppose that gi’s are convex, hj(x) = 〈aj , x〉 − cj where

ajJj=1 are linearly independent, and there exists x0 ∈ RN such that gi(x0) < 0

for all i and hj(x0) = 0 for all j.

Since ∇hj = aj and ajJj=1 are linearly independent, ∇hj(x)Jj=1 are

linearly independent. If i ∈ I(x), since gi is convex, by Proposition 10.5 wehave

0 > gi(x0) = gi(x0)− gi(x) ≥ 〈∇gi(x), x0 − x〉 .Setting y = x0 − x, we have 〈∇gi(x), y〉 < 0 for all i ∈ I(x). Since x, x0 arefeasible, we have 〈aj , x〉 − cj = 0 and 〈aj , x0〉 − cj = 0, so taking the difference〈∇hj(x), y〉 = 〈aj , x0 − x〉 = 0. Therefore MFCQ holds.

LICQ =⇒ MFCQ. As in the previous case we may assume I(x) = 1, . . . , I.Suppose to the contrary that MFCQ does not hold. Then there exist no ysuch that 〈∇gi(x), y〉 < 0 for all i and 〈∇hj(x), y〉 = 0 for all j. Let G(x) =(g1(x), . . . , gI(x))′, H(x) = (h1(x), . . . , hJ(x))′, and define the (I + J)×N ma-

trix M by M =

[DG(x)DH(x)

]. Define the sets A,B ⊂ RI+J by

A = −RI++ × 0 ⊂ RI × RJ ,B =

z ∈ RI+J

∣∣ (∃y ∈ RN )z = My.

Since MFCQ does not hold, we have A ∩ B = ∅. Clearly A,B are nonemptyand convex. By the separating hyperplane theorem, there exists 0 6= a ∈ RI+Jsuch that

supz∈A〈a, z〉 ≤ inf

z∈B〈a, z〉 = inf

y∈RNa′My.

130

Page 132: Essential Mathematics for Economists

Since y 7→ a′My is linear and supz∈A 〈a, z〉 > −∞ because A 6= ∅, in order forthe above inequality to hold, it is necessary that a′M = 0. Letting a = (λ, µ) ∈RI × RJ , then

0 = M ′a =

I∑i=1

λi∇gj(x) +

J∑j=1

µj∇hj(x).

Since a = (λ, µ) 6= 0, ∇gi(x)Ii=1 and ∇hj(x)Jj=1 are not linearly indepen-dent. Hence LICQ does not hold.

In applications, oftentimes constraints are linear. In that case GCQ is auto-matically satisfied, so there is no need to check it (Problem 12.4). It is knownthat the GCQ is the weakest possible condition (Gould and Tolle, 1971).

12.6 Sufficient condition

The Karush-Kuhn-Tucker theorem provides necessary conditions for optimal-ity: if the constraint qualification holds, then a local solution must satisfythe Karush-Kuhn-Tucker conditions (first-order conditions and complementaryslackness conditions). Note that the first-order conditions are equivalent to

∇xL(x, λ, µ) = 0, (12.7)

where L(x, λ, µ) is the Lagrangian. The condition (12.7) can be interpreted asthe first-order necessary condition for the unconstrained minimization problem

minx∈RN

L(x, λ, µ). (12.8)

Below I give a sufficient condition for optimality.

Proposition 12.9. Suppose that x is a solution to the unconstrained minimiza-tion problem (12.8) for some λ ∈ RI+ and µ ∈ RJ . If gi(x) ≤ 0 and λigi(x) = 0for all i and hj(x) = 0 for all j, then x is a solution to the constrained mini-mization problem (12.1).

Proof. Take any x such that gi(x) ≤ 0 for all i and hj(x) = 0 for all j. Then

f(x) = f(x) +

I∑i=1

λigi(x) +

J∑j=1

µjhj(x)

= L(x, λ, µ) ≤ L(x, λ, µ)

= f(x) +

I∑i=1

λigi(x) +

J∑j=1

µjhj(x) ≤ f(x).

The first line is due to λigi(x) = 0 for all i and hj(x) = 0 for all j. The secondline is the assumption that x minimizes L(·, λ, µ). The third line is due to λi ≥ 0and gi(x) ≤ 0 for all i and hj(x) = 0 for all j.

Corollary 12.10. If f, gi are all convex, then the KKT conditions are sufficientfor optimality.

131

Page 133: Essential Mathematics for Economists

Next I give a second order sufficient condition that will be useful later. Con-sider the minimization problem (12.4). Assume that the KKT conditions (12.6)hold at x with corresponding Lagrange multipliers λ ∈ RI+ and µ ∈ RJ . Re-member that the active set of the inequality constraints is I(x) = i | gi(x) = 0.Let I(x) = i |λi > 0 be the set of constraints such that the Lagrange multi-plier is positive. Since λigi(x) = 0 by complementary slackness, λi > 0 impliesgi(x) = 0, so necessarily I(x) ⊂ I(x). Define the cone

LC(x) =y ∈ RN

∣∣∣(∀i ∈ I(x)\I(x)) 〈∇gi(x), y〉 ≤ 0,

(∀i ∈ I(x)) 〈∇gi(x), y〉 = 0, (∀j) 〈∇hj(x), y〉 = 0.

Clearly LC(x) ⊂ LC(x). The following theorem gives a second order sufficientcondition for local optimality.

Theorem 12.11. Suppose that f , gi’s, and hj’s are twice differentiable, theKKT conditions (12.6) hold at x = x, and⟨

y,∇2xL(x, λ, µ)y

⟩> 0 (12.9)

for all 0 6= y ∈ LC(x). Then x is a strict local solution to the minimizationproblem (12.4), i.e., there exists a neighborhood Ω of x such that f(x) < f(x)whenever x ∈ Ω satisfies the constraints in (12.4).

Proof. Suppose that x is not a strict local solution. Then we can take a sequenceC 3 xk → x such that f(xk) ≤ f(x). Let αk = 1/

∥∥xk − x∥∥ > 0. Then∥∥αk(xk − x)∥∥ = 1, so by taking a subsequence if necessary we may assume

αk(xk − x)→ y with ‖y‖ = 1. Let us show that y ∈ LC(x).Multiplying both sides of

f(xk)− f(x) ≤ 0, gi(xk)− gi(x) ≤ 0 (i ∈ I(x)), hj(x

k)− hj(x) = 0

by αk and letting k →∞, we get

〈∇f(x), y〉 ≤ 0, 〈∇gi(x), y〉 ≤ 0 (i ∈ I(x)), 〈∇hj(x), y〉 = 0. (12.10)

Multiplying both sides of the first-order condition (12.6a) by y as an innerproduct, noting that λi = 0 if i /∈ I(x) by complementary slackness, and using(12.10), we get

〈∇f(x), y〉+∑i∈I(x)

λi 〈∇gi(x), y〉 = 0.

Again by (12.10) it must be 〈∇f(x), y〉 = 0 and λi 〈∇gi(x), y〉 = 0 for alli ∈ I(x). Therefore if i ∈ I(x), so λi > 0, it must be 〈∇gi(x), y〉 = 0. Hence bydefinition we have y ∈ LC(x).

Since f(xk) ≤ f(x), λi ≥ 0, gi(xk) ≤ 0, and λigi(x) = 0, it follows that

L(xk, λ, µ) = f(xk) +∑i∈I(x)

λigi(xk) ≤ f(x) = L(x, λ, µ).

By Taylor’s theorem

0 ≥ L(xk, λ, µ)− L(x, λ, µ)

=⟨∇xL(x, λ, µ), xk − x

⟩+

1

2

⟨xk − x,∇2

xL(x, λ, µ)(xk − x)⟩

+ o(∥∥xk − x∥∥2

).

132

Page 134: Essential Mathematics for Economists

By the KKT conditions, the first term in the right-hand side is zero. Multiplyingboth sides by α2

k and letting k →∞, we get

0 ≥ 1

2

⟨y,∇2

xL(x, λ, µ)y⟩,

which contradicts (12.9). Therefore x is a strict local solution.

Problems

12.1. Let a1, . . . , aK be vectors. This problem asks you to prove that thepolyhedral cone C = cone[a1, . . . , aK ] is a closed convex cone.

1. Prove that C is a nonempty convex cone.

2. Prove that if x ∈ C, then x can be expressed as x =∑Jj=1 αjakj , where

αj ≥ 0 and ak1 , . . . , akJ are linearly independent.

3. Prove that C is closed. (Hint: use Problem 2.6.)

12.2. Let C be any set and suppose x ∈ intC (interior point of C).

1. Compute the tangent cone TC(x) and the normal cone NC(x).

2. Interpret Theorem 12.5.

12.3. Let x ∈ R and consider the constraints (i) x ≤ 0 and (ii) x3 ≤ 0.

1. Show that the constraints (i) and (ii) are equivalent, and compute thetangent cone at x = 0.

2. Compute the linearizing cones corresponding to constraints (i) and (ii) atx = 0, respectively. Are they the same?

3. Construct an example such that the Slater condition gi(x0) < 0 holds andgi is quasi-convex (but not convex) but the KKT conditions do not hold.

12.4. Suppose that gi(x) = 〈ai, x〉 − ci and hj(x) = 〈bj , x〉 − dj in the min-imization problem (12.4). Show that the Guignard constraint qualification issatisfied.

133

Page 135: Essential Mathematics for Economists

Chapter 13

Maximum and EnvelopeTheorems

13.1 A motivating example

Suppose you are managing a firm that produces a final product by using laborand raw materials. The production function is

y = Alαx1−α,

where y is the quantity of output, A > 0 is a productivity parameter, l > 0 islabor input, x > 0 is the input of raw materials, and 0 < α < 1 is a parameter.

Assume that you cannot hire or fire workers in the short run and thereforeyou see labor input l as constant, but can choose the input of raw materials xfreely. The wage rate is w > 0 and the unit price of the raw material is p > 0.The unit price of the final product is normalized to 1. You are interested in twoquestions:

1. What is the optimal amount of input of raw materials?

2. What would happen to the firm’s profit if parameter values change?

We can answer the first question by solving the optimization problem. We canalso answer the second question once we have solved the optimization problem,but the topic of this chapter is how to (partly) answer the second questionwithout solving the optimization problem.

Mathematically, the problem is

maximize Alαx1−α − wl − pxsubject to l fixed, x ≥ 0.

It is not hard to see that the objective function is concave in x. Clearly theconstraint is linear. Therefore the KKT conditions are necessary and sufficientfor optimality. The Lagrangian is

L(x, λ) = Alαx1−α − wl − px+ λx.

134

Page 136: Essential Mathematics for Economists

The KKT conditions are

Alα(1− α)x−α − p+ λ = 0, (13.1a)

λ ≥ 0, x ≥ 0, λx = 0. (13.1b)

(13.1a) is the first-order condition. (13.1b) is the complementary slackness con-dition.

If x = 0, then the first-order condition (13.1a) will be ∞ − p + λ = 0, acontradiction. Therefore it must be x > 0. By the complementary slacknesscondition (13.1b), we get λ = 0. Substituting this into (13.1a) and solving forx, we get

Alα(1− α)x−α − p = 0 ⇐⇒ x =

(A(1− α)

p

) 1α

l. (13.2)

Substituting this into the objective function, after some algebra the maximizedprofit is

π(α,A, p, w, l) := α(1− α)1α−1A

1α p1− 1

α l − wl.

Regarding the second question, suppose that we are interested in how themaximized profit π change when the price of the raw materials p changes. Thenwe compute

∂π

∂p= α

(1− 1

α

)(1− α)

1α−1A

1α p−

1α l = −

(A(1− α)

p

) 1α

l.

This is simply the negative of the optimal input of raw materials computed in(13.2). If we had partially differentiated the profit function

Alαx1−α − wl − px

with respect to p, we get the same answer −x (evaluated at the optimal solution,though)! Is this a coincidence? The answer is no. The Maximum Theorem tellsthat the optimal value and the solution are continuous in the parameter. TheEnvelope Theorem tells that the optimal value is differentiable in parametersand the derivatives are related to the Lagrange multipliers.

13.2 Maximum Theorem

Let X ⊂ RN and Y ⊂ RM be sets. Γ : X Y is a correspondence (or multi-valued function) if for each x ∈ X we have Γ(x) ⊂ Y , a subset of Y . Note thatwe use an arrow with two heads “” for a correspondence, while we use theusual arrow “→” for a function. Another common notation is Γ : X ⇒ Y . Γ issaid to be compact (convex) valued if for each x ∈ X, the set Γ(x) is compact(convex). Γ is said to be uniformly bounded if for each x ∈ X, there exists aneighborhood U of x such that

⋃x∈U Γ(x) is bounded. Of course, Γ(x) need not

be uniformly bounded just because Γ(x) is bounded. For instance, let X = Rand

Γ(x) =

[0, 1], (x ≤ 0)

[0, 1/x]. (x > 0)

Then Γ(x) is bounded but not uniformly bounded at x = 0 (draw a picture).

135

Page 137: Essential Mathematics for Economists

Remember that a function f : X → Y is continuous if xn → x impliesf(xn) → f(x). “xn → x” is a shorthand notation for “limn→∞ xn = x”. Wecan define continuity for correspondences.

Definition 13.1 (Upper hemicontinuity). Γ : X Y is upper hemicontinuousif it is uniformly bounded and xn → x, yn ∈ Γ(xn), and yn → y implies y ∈ Γ(x).

Upper hemicontinuity is also called upper semi-continuity (I often use semi-continuity). Perhaps hemicontinuity is less confusing since there is a separatesemi-continuity concept for functions, introduced below. When the requirementthat Γ is uniformly bounded is dropped, then Γ is called closed. When Y is itselfbounded, upper hemicontinuity is the same as closedness. Upper hemicontinuitysays that if a sequence in the image of a convergent sequence is convergent, thenthe limit belongs to the image of the limit. There is also a concept called lowerhemicontinuity, which is roughly the converse. If you take a point in the imageof the limit, then you can take a sequence in the image of the sequence thatconverges to that point.

Definition 13.2 (Lower hemicontinuity). Γ : X Y is lower hemicontinuousif for any xn → x and y ∈ Γ(x), there exists a number N and a sequence yn → ysuch that yn ∈ Γ(xn) for n > N .

A correspondence that is both upper and lower hemicontinuous is calledcontinuous.

The next Maximum Theorem guarantees that the maximum value of a para-metric maximization problem is continuous and the solution set is upper hemi-continuous.

Theorem 13.3 (Maximum Theorem). Let f : X × Y → R and Γ : X Y becontinuous. Assume

Γ∗(x) = arg maxy∈Γ(x)

f(x, y) 6= ∅

and let f∗(x) = maxy∈Γ(x) f(x, y). Then f∗ is continuous and Γ∗ : X Y isupper hemicontinuous.

The proof of the maximum theorem is not so difficult, but it is clearer toweaken the assumptions and prove several weaker statements. To do so I definesemi-continuity for functions.

Definition 13.4 (Semi-continuity of functions). f : X → [−∞,∞] is uppersemi-continuous if xn → x implies lim supn→∞ f(xn) ≤ f(x). f is lower semi-continuous if xn → x implies lim infn→∞ f(xn) ≥ f(x).

Clearly, f is upper semi-continuous if −f is lower semi-continuous, and f iscontinuous if it is both upper and lower semi-continuous. The extreme valuetheorem (Theorem 2.5) guarantees that a continuous function attains the max-imum on a compact set. Indeed all we need is upper semi-continuity, as thefollowing theorem shows.

Theorem 13.5. Let X be nonempty and compact and f : X → [−∞,∞) uppersemi-continuous. Then f attains the maximum on X.

136

Page 138: Essential Mathematics for Economists

Proof. Let M = supx∈X f(x). Take xn such that f(xn) → M . Since X iscompact, xn has a convergent subsequence. Assume xnk → x. Since f isupper semi-continuous, we have

M ≥ f(x) ≥ lim supk→∞

f(xnk) = M,

so f(x) = M = maxx∈X f(x).

By the same argument, a lower semi-continuous function attains the mini-mum on a compact set. Theorem 13.5 is useful. For example, the Cobb-Douglasutility function

u(x1, x2) = α1 log x1 + α2 log x2

is not continuous at x1 = 0 or x2 = 0 in the usual sense. But if we definelog 0 = −∞, u becomes upper semi-continuous. Therefore if the budget set iscompact, we know a priori that a solution to the utility maximization problemexists.

We prove two lemmas to prove the maximum theorem.

Lemma 13.6. Let f : X×Y → R be upper semi-continuous and Γ : X Y up-per hemicontinuous. Then f∗(x) = supy∈Γ(x) f(x, y) is upper semi-continuous.

Proof. Take any xn → x and ε > 0. Take a subsequence xnk such thatf∗(xnk) → lim supn→∞ f∗(xn). For each k, take ynk ∈ Γ(xnk) such thatf(xnk , ynk) > f∗(xnk) − ε. Since Γ is upper hemicontinuous, it is uniformlybounded. Therefore there exists a neighborhood U of x such that

⋃x′∈U Γ(x′)

is bounded. Since xnk → x, there exists K such that⋃k>K Γ(xnk) is bounded.

Hence ynk is bounded. By taking a subsequence if necessary, we may assumeynk → y. Since Γ is upper hemicontinuous, we have y ∈ Γ(x). Since f is uppersemi-continuous,

f∗(x) ≥ f(x, y) ≥ lim supk→∞

f(xnk , ynk) ≥ limk→∞

f∗(xnk)− ε = lim supn→∞

f∗(xn)− ε.

Letting ε→ 0, it follows that f∗ is upper semi-continuous.

Lemma 13.7. Let f : X × Y → R be lower semi-continuous and Γ : X Ylower hemicontinuous. Then f∗(x) = supy∈Γ(x) f(x, y) is lower semi-continuous.

Proof. Take any xn → x and ε > 0. Take y ∈ Γ(x) such that f(x, y) > f∗(x)−ε.Since Γ is lower hemicontinuous, there exist N and yn → y such that yn ∈ Γ(xn)for all n > N . Then f∗(xn) ≥ f(xn, yn). Since f is lower semi-continuous,

lim infn→∞

f∗(xn) ≥ lim infn→∞

f(xn, yn) ≥ f(x, y) > f∗(x)− ε.

Letting ε→ 0, it follows that f∗ is lower semi-continuous.

Proof of the maximum theorem. By Lemmas (13.6) and (13.7), f∗ is con-tinuous. Since Γ∗(x) ⊂ Γ(x) and Γ is uniformly bounded, so is Γ∗. Take anyxn → x, yn ∈ Γ∗(xn), and assume yn → y. Since f and f∗ are continuous, wehave

f(x, y) = limn→∞

f(xn, yn) = limn→∞

f∗(xn) = f∗(x),

so y ∈ Γ∗(x). Hence Γ∗ is upper hemicontinuous.

137

Page 139: Essential Mathematics for Economists

13.3 Envelope Theorem

Consider the parametric minimization problem

minimizex

f(x, u)

subject to gi(x, u) ≤ 0 (i = 1, . . . , I). (13.3)

Here x ∈ RN is the control variable and u ∈ Rp is a parameter. Let

φ(u) = infxf(x, u) | (∀i)gi(x, u) ≤ 0

be the minimum value function. We are interested in how φ(u) changes when udoes.

Recall the second-order sufficient condition for optimality (Theorem 12.11).The active set of the inequality constraints is

I(x) = i | gi(x) = 0 .

LetI(x) = i |λi > 0

be the set of constraints such that the Lagrange multiplier is positive. Sinceλigi(x) = 0 by complementary slackness, λi > 0 implies gi(x) = 0, so necessarilyI(x) ⊂ I(x). Define the cone

LC(x)

=y ∈ RN

∣∣∣ (∀i ∈ I(x)\I(x)) 〈∇gi(x), y〉 ≤ 0, (∀i ∈ I(x)) 〈∇gi(x), y〉 = 0.

Then the second-order sufficient condition for local optimality is⟨y,∇2

xL(x, λ, µ)y⟩> 0 (∀0 6= y ∈ LC(x)). (13.4)

Theorem 13.8 (Sensitivity Analysis). Suppose that x ∈ RN is a local solutionto the parametric optimization problem (13.3) corresponding to u ∈ Rp. Assumethat

1. The vectors ∇xgi(x, u)i∈I(x) are linearly independent, so the Karush-

Kuhn-Tucker theorem holds with Lagrange multiplier λ ∈ RI+,

2. Strict complementary slackness condition holds, so gi(x, u) = 0 impliesλi > 0 and therefore I(x) = I(x),

3. The second order condition (13.4) holds.

Then there exists a neighborhood U of u and C1 functions x(u), λ(u) such thatfor any u ∈ U , x(u) is the local solution to the parametric optimization problem(13.3) and λ(u) is the corresponding Lagrange multiplier.

In our case, since strict complementary slackness holds and there are noequality constraints (hj ’s), condition (13.4) reduces to

y 6= 0, (∀i ∈ I(x)) 〈∇xgi(x, u), y〉 = 0 =⇒⟨y,∇2

xL(x, λ, u)y⟩> 0. (13.5)

We need the following lemma in order to prove the theorem.

138

Page 140: Essential Mathematics for Economists

Lemma 13.9. Let everything be as in Theorem 13.8. Define the (N+I)×(N+I)matrix A by

A =

∇2xL ∇xg1 · · · ∇xgI

λ1∇xg′1 g1 · · · 0...

.... . .

...λI∇xg′I 0 · · · gI

,where all functions are evaluated at (x, λ, u). Then A is regular.

Proof. Suppose that A

[vw

]= 0, where v ∈ RN and w ∈ RI . It suffices to show

that v = 0 and w = 0. By the definition of A, we get

∇2xLv +

I∑i=1

wi∇xgi = 0, (13.6a)

(∀i) λi 〈∇xgi, v〉+ wigi = 0. (13.6b)

For i ∈ I(x) (hence gi(x, u) = 0), by (13.6b) and strict complementary slacknesswe have λi > 0 and therefore 〈∇xgi, v〉 = 0. For i /∈ I(x) (hence gi(x, u) <0), again by (13.6b) and strict complementary slackness we have λi = 0 andtherefore wi = 0. Therefore (13.6a) becomes

∇2xLv +

∑i∈I(x)

wi∇xgi = 0. (13.7)

Multiplying (13.7) by v as an inner product and using 〈∇xgi, v〉 = 0 for i ∈ I(x),we obtain ⟨

v,∇2xLv

⟩= 0.

By condition (13.5), it must be v = 0. Then by (13.7) we obtain∑i∈I(x)

wi∇xgi = 0,

but since ∇xgii∈I(x) are linearly independent, it must be wi = 0 for all i.Therefore v = 0 and w = 0.

Proof of Theorem 13.8. Define f : RN × RI × Rp → RN × RI by

f(x, λ, u) =

∇xL(x, λ, u)λ1gi(x, u)

...λIgI(x, u)

.Then the Jacobian of f with respect to (x, λ) evaluated at (x, λ, u) is A, whichis regular. Furthermore, since x is a local solution corresponding to u = u, bythe KKT Theorem we have f(x, λ, u) = 0. Therefore by the Implicit FunctionTheorem, there exists a neighborhood U of u and C1 functions x(u), λ(u) suchthat

f(x(u), λ(u), u) = 0

for u ∈ U . By the second-order sufficient condition (Theorem 12.11), x(u) isthe local solution to the parametric optimization problem (13.3) and λ(u) is thecorresponding Lagrange multiplier.

139

Page 141: Essential Mathematics for Economists

The following theorem is extremely important.

Theorem 13.10 (Envelope Theorem). Let everything be as in Theorem 13.8and

L(x, λ, u) = f(x, u) +

I∑i=1

λigi(x, u)

be the Lagrangian. Assume that the parametric optimization problem (13.3) hasa solution x(u) with Lagrange multiplier λ(u). Let

φ(u) = minxf(x, u) | (∀i)gi(x, u) ≤ 0

be the minimum value function. Then φ is differentiable and

∇φ(u) = ∇uL(x(u), λ(u), u),

i.e., the derivative of φ can be computed by differentiating the Lagrangian withrespect to the parameter u alone, treating x and λ as constants.

Proof. By the definition of φ and complementary slackness, we obtain

φ(u) = f(x(u), u) = L(x(u), λ(u), u).

Differentiating both sides with respect to u, we get

Duφ(u)︸ ︷︷ ︸1×p

= DxL︸︷︷︸1×N

Dux(u)︸ ︷︷ ︸N×p

+DλL︸︷︷︸1×I

Duλ(u)︸ ︷︷ ︸I×p

+DuL︸︷︷︸1×p

.

By the KKT theorem, we have DxL = 0. By the strict complementary slackness,we have λi(u) = 0 for i /∈ I(x) and

DλiL(x(u), λ(u), u) = gi(x(u), u) = 0

for i ∈ I(x), so DλLDuλ(u) = 0. Therefore ∇φ(u) = ∇uL(x(u), λ(u), u).

Corollary 13.11. Consider the special case

minimizex

f(x)

subject to gi(x) ≤ ui (i = 1, . . . , I).

Then ∇φ(u) = −λ(u).

Proof. The Lagrangian is

L(x, λ, u) = f(x) +

I∑i=1

λi[gi(x)− ui].

By the envelope theorem, ∇φ(u) = ∇uL(x(u), λ(u), u) = −λ(u).

140

Page 142: Essential Mathematics for Economists

Problems

13.1. Consider the utility maximization problem (UMP)

maximize α log x1 + (1− α) log x2

subject to p1x1 + p2x2 ≤ w,

where 0 < α < 1 is a parameter, p1, p2 > 0 are prices, and w > 0 is wealth.

1. Solve UMP.

2. Let v(p1, p2, w) be the value function. Compute the partial derivatives ofv with respect to each of p1, p2, and w.

3. Verify Roy’s identity xn = − ∂v∂pn

/ ∂v∂w for n = 1, 2, where xn is the optimaldemand of good n.

13.2. Consider the UMP

maximize u(x)

subject to x ∈ RL+, 〈p, x〉 ≤ w,

where w > 0 is the wealth of the consumer, p ∈ RL++ is the price vector, x ∈ RL+is the demand, and u : RL+ → R is differentiable and strictly quasi-concave. Letx(p, w) be the solution to UMP (called Marshallian demand) and v(p, w) thevalue function. If x(p, w) 0, prove Roy’s identity

x(p, w) = −∇pv(p, w)

∇wv(p, w).

(Hint: envelope theorem.)

13.3. Consider the expenditure minimization problem (EMP)

minimize p1x1 + p2x2

subject to α log x1 + (1− α) log x2 ≥ u

where 0 < α < 1 is a parameter, p1, p2 > 0 are prices, and u ∈ R is the desiredutility level.

1. Solve EMP.

2. Let e(p1, p2, u) be the value function. Compute the partial derivatives ofe with respect to p1 and p2.

3. Verify Shephard’s lemma xn = ∂e∂pn

for n = 1, 2, where xn is the optimaldemand of good n.

13.4. Consider the EMP

minimize 〈p, x〉subject to u(x) ≥ u,

where p ∈ RL++ is the price vector, x ∈ RL is the demand, u(x) is a strictlyquasi-concave differentiable utility function, and u ∈ R is the desired utility

141

Page 143: Essential Mathematics for Economists

level. Let h(p, u) be the solution to EMP (called Hicksian demand) and e(p, u)be the minimum expenditure (called expenditure function). Prove Shephard’slemma

h(p, u) = ∇pe(p, u).

13.5. Prove the following Slutsky equation:

Dpx(p, w)︸ ︷︷ ︸L×L

= D2pe(p, u)︸ ︷︷ ︸L×L

− [Dwx(p, w)]︸ ︷︷ ︸L×1

[x(p, w)]′︸ ︷︷ ︸1×L

,

where x(p, w) is the Marshallian demand, u = u(x(p, w)) is the utility levelevaluated at the demand, and e(p, u) is the expenditure function.

142

Page 144: Essential Mathematics for Economists

Chapter 14

Duality Theory

14.1 Motivation

Consider the following constrained minimization problem

minimize f(x)

subject to gi(x) ≤ 0 (i = 1, . . . , I) (14.1)

with Lagrangian

L(x, λ) =

f(x) +

∑Ii=1 λigi(x), (λ ∈ RI+)

−∞. (λ /∈ RI+)

(The following discussion can easily accommodate equality constraints as well.)Recall the saddle point theorem (Theorem 11.2): if f, gi’s are all convex andthere is a point x0 ∈ RN such that gi(x0) < 0 for all i (Slater constraintqualification), then x is a solution to (14.1) if and only if there is λ ∈ RI+ suchthat (x, λ) is a saddle point of L, that is,

L(x, λ) ≤ L(x, λ) ≤ L(x, λ) (14.2)

for all x ∈ RN and λ ∈ RI . By (14.2), we get

supλL(x, λ) ≤ L(x, λ) ≤ inf

xL(x, λ).

Thereforeinfx

supλL(x, λ) ≤ L(x, λ) ≤ sup

λinfxL(x, λ). (14.3)

On the other hand, L(x, λ) ≤ supλ L(x, λ) always, so taking the infimum withrespect to x, we get

infxL(x, λ) ≤ inf

xsupλL(x, λ).

Noting that the right-hand side is just a constant, taking the supremum of theleft-hand side with respect to λ, we get

supλ

infxL(x, λ) ≤ inf

xsupλL(x, λ). (14.4)

143

Page 145: Essential Mathematics for Economists

Combining (14.3) and (14.4), it follows that

L(x, λ) = infx

supλL(x, λ) = sup

λinfxL(x, λ). (14.5)

Define

θ(x) = supλL(x, λ),

ω(λ) = infxL(x, λ).

Then (14.5) is equivalent to

L(x, λ) = infxθ(x) = sup

λω(λ). (14.6)

Note that by the definition of the Lagrangian,

θ(x) = supλL(x, λ) =

f(x), (∀i, gi(x) ≤ 0)

∞. (∃i, gi(x) > 0)

Therefore the constrained minimization problem (14.1) is equivalent to the un-constrained minimization problem

minimize θ(x). (P)

For this reason the problem (P) is called the primal problem. In view of (14.6),define the dual problem by

maximize ω(λ). (D)

Then (14.6) implies that the primal and dual values coincide.The above discussion suggests that in order to solve the constrained mini-

mization problem (14.1) and hence the primal problem (P), it might be sufficientto solve the dual problem (D). Since L(x, λ) is linear in λ, ω(λ) = infx L(x, λ)is always a concave function of λ no matter what f or gi’s are. Therefore wecan expect that solving the dual problem is much easier than solving the primalproblem.

14.2 Example

14.2.1 Linear programming

A typical linear programming problem is

minimize 〈c, x〉subject to Ax ≥ b,

where x ∈ RN is the vector of decision variables, c ∈ RN is the vector of thecoefficients, A is an M ×N matrix, and b ∈ RM is a vector. The Lagrangian is

L(x, λ) = 〈c, x〉+ 〈λ, b−Ax〉 ,

144

Page 146: Essential Mathematics for Economists

where λ ∈ RM+ is the vector of Lagrange multipliers. Since

ω(λ) = infxL(x, λ) = inf

x[〈c−A′λ, x〉+ 〈b, λ〉]

=

〈b, λ〉 , (A′λ = c)

−∞. (A′λ 6= c)

The dual problem is

maximize 〈b, λ〉subject to A′λ = c, λ ≥ 0.

14.2.2 Entropy maximization

Let p = (p1, . . . , pN ) be a multinomial distribution, so pn ≥ 0 and∑Nn=1 pn = 1.

The quantity

H(p) = −N∑n=1

pn log pn

is called the entropy of p.In practice we often want to find the distribution p that has the maximum

entropy satisfying some moment constraints. Suppose the constraints are givenby

N∑n=1

ainpn = bi. (i = 1, . . . , I)

Since maximizing H is equivalent to minimizing −H, the problem is

minimize

N∑n=1

pn log pn

subject to

N∑n=1

ainpn = bi, (i = 0, . . . , I) (14.7)

where ain’s and bi’s are given and I define a0n = 1 and b0 = 1 to accommodatethe constraint

∑n pn = 1 (accounting of probability).

If the number of unknown variablesN is large (sayN ∼ 10000), then it wouldbe very hard to solve the problem even using a computer since the objectivefunction pn log pn is nonlinear. However, it turns out that the dual problem isvery simple.

Although p log p is not well-defined when p ≤ 0, define

p log p =

0, (p = 0)

∞. (p < 0)

Then the constraint pn ≥ 0 is built in the problem. The Lagrangian is

L(p, λ) =

N∑n=1

pn log pn +

I∑i=0

λi

(bi −

N∑n=1

ainpn

)

= 〈b, λ〉+

N∑n=1

(pn log pn − 〈an, λ〉 pn) ,

145

Page 147: Essential Mathematics for Economists

where b = (b0, . . . , bI) and an = (a0n, . . . , aIn). To derive the dual problem, weneed to compute infp L(p, λ), which reduces to computing

infp

[p log p− cp]

for c = 〈an, λ〉 above. But this problem is easy! Differentiating with respect top, the first-order condition is

log p+ 1− c = 0 ⇐⇒ p = ec−1,

with the minimum value

p log p− cp = ec−1(c− 1)− cec−1 = −ec−1.

Substituting pn = e〈an,λ〉−1 in the Lagrangian, after some algebra the objectivefunction of the dual problem becomes

ω(λ) = infpL(p, λ) = 〈b, λ〉 −

N∑n=1

e〈an,λ〉−1.

Hence the dual problem of (14.7) is

maximize 〈b, λ〉 −N∑n=1

e〈an,λ〉−1. (14.8)

Numerically solving (14.8) is much easier than (14.7) because the dual problem(14.8) is unconstrained and the number of unknown variables 1 + I is typicallymuch smaller than N .

14.3 Convex conjugate function

In the above example we needed to compute

infp

[p log p− cp] = − supp

[cp− p log p].

In general, for any function f : RN → [−∞,∞] we define

f∗(ξ) = supx∈RN

[〈ξ, x〉 − f(x)],

which is called the convex conjugate function of f . Fixing x, since 〈ξ, x〉−f(x) isan affine (hence closed and convex) function of ξ, f∗ is a closed convex function.(A function is closed if its epigraph is closed. Closedness of a function is thesame as lower semi-continuity.)

For any function f , the largest closed convex function that is less than orequal to f is called the closed convex hull of f , and is denoted by cl co f . For-mally, we define

cl co f(x) = sup g(x) | g is closed convex and f(x) ≥ g(x) .

146

Page 148: Essential Mathematics for Economists

(At least one such g exists—g(x) = −∞.) Clearly φ = cl co f satisfies epiφ =cl co epi f , so

epi cl co f = cl co epi f

is also the definition of cl co f .The convex conjugate function of the convex conjugate function of f(x) is

called the biconjugate function. Formally,

f∗∗(x) = supξ∈RN

[〈x, ξ〉 − f∗(ξ)].

In the case of cones, we had C∗∗ = cl coC for any cone C. The same holds forfunctions, under some conditions.

Theorem 14.1. Let f be a function. If cl co f is proper (so cl co f(x) > −∞for all x), then

f∗∗(x) = cl co f(x).

In particular, f∗∗(x) = f(x) if f is a proper closed convex function.

We need a lemma in order to prove Theorem 14.1.

Lemma 14.2. For any function f , let L(f) be the set of affine functions thatare less than or equal to f , so

L(f) = h |h is affine and f(x) ≥ h(x) for all x .

If f is a proper closed convex function, then L(f) 6= ∅.

Proof. Let f be a proper closed convex function. The claim is obvious if f =∞,so we may assume dom f 6= ∅. Take any vector x ∈ dom f and real number ysuch that y < f(x). Then (x, y) /∈ epi f . Since f is a closed convex function,epi f is a closed convex set. Therefore by the separating hyperplane theorem wecan take a vector (0, 0) 6= (η, β) ∈ RN × R and a constant γ ∈ R such that

〈η, x〉+ βy < γ < 〈η, x〉+ βy

for any (x, y) ∈ epi f . Letting x = x and y → ∞, it must be β > 0. Dividingboth sides by β > 0 and letting a = −η/β and c = γ/β, we get

−〈a, x〉+ y < c < −〈a, x〉+ y

for all (x, y) ∈ epi f . Let h(x) = 〈a, x〉 + c. If x /∈ dom f , then clearly ∞ =f(x) > h(x). If x ∈ dom f , letting y = f(x) in the right inequality we getf(x) > 〈a, x〉+ c = h(x). Therefore f(x) ≥ h(x) for all x, so h ∈ L(f) 6= ∅.

Lemma 14.3. Let f be a function. If cl co f is proper, then

cl co f(x) = sup h(x) |h ∈ L(f) .

Proof. Since φ(x) = cl co f is the largest closed convex function that is less thanequal to f , and any h ∈ L(f) is an affine (hence closed convex) function that isless than equal to f , clearly

φ(x) ≥ sup h(x) |h ∈ L(f) .

147

Page 149: Essential Mathematics for Economists

To prove that equality holds, suppose that

φ(x) > sup h(x) |h ∈ L(f)

for some x. Then we can take a real number y such that

φ(x) > y > sup h(x) |h ∈ L(f) . (14.9)

By the left inequality, (x, y) /∈ epiφ. Since epiφ = epi cl co f = cl co epi f is aclosed convex set, by the separating hyperplane theorem we can take a vector(0, 0) 6= (η, β) ∈ RN × R and a constant γ ∈ R such that

〈η, x〉+ βy < γ < 〈η, x〉+ βy (14.10)

for any (x, y) ∈ epiφ. Letting y → ∞, we get β ≥ 0. There are two cases toconsider.

Case 1: β > 0. If β > 0, as in the proof of Lemma 14.2 dividing both sides of(14.10) by β > 0 and letting a = −η/β and c = γ/β, we get

−〈a, x〉+ y < c < −〈a, x〉+ y

for all (x, y) ∈ epiφ. Since f(x) ≥ cl co f(x) = φ(x), letting y = f(x) weget f(x) > 〈a, x〉 + c and y < 〈a, x〉 + c. The first inequality implies thath1(x) := 〈a, x〉+ c satisfies h1 ∈ L(f). Hence by the second inequality we get

y < h1(x) ≤ sup h(x) |h ∈ L(f) ,

which contradicts (14.9)

Case 2: β = 0. If β = 0, let h1(x) = −〈η, x〉 + γ. Then by (14.10) we geth1(x) < 0 < h1(x) for any x ∈ domφ. Since by assumption φ is a proper closedconvex function, be Lemma 14.2 we can take an affine function h2(x) such thatφ(x) ≥ h2(x). Hence for any λ ≥ 0 we have

φ(x) > λh1(x) + h2(x)

for any x, so h(x) = λh1(x) + h2(x) satisfies h ∈ L(f). But since h1(x) > 0, forlarge enough λ we have h(x) = λh1(x)+h2(x) > y, which contradicts (14.9).

Proof of Theorem 14.1. For h(x) = 〈ξ, x〉 − β, by the definition of L(f) andf∗ we have

h ∈ L(f) ⇐⇒ (∀x) f(x) ≥ 〈ξ, x〉 − β⇐⇒ (∀x) β ≥ 〈ξ, x〉 − f(x)

⇐⇒ β ≥ supx

[〈ξ, x〉 − f(x)] = f∗(ξ)

⇐⇒ (ξ, β) ∈ epi f∗.

Hence by the definition of the biconjugate function and Lemma 14.3,

f∗∗(x) = sup〈x, ξ〉 − f∗(ξ)

∣∣ ξ ∈ RN

= sup 〈x, ξ〉 − β | (ξ, β) ∈ epi f∗= sup h(x) |h ∈ L(f) = cl co f(x).

148

Page 150: Essential Mathematics for Economists

14.4 Duality theory

Looking at the entropy maximization example, the key to simplifying the dualproblem is to reduce it to the calculation of the convex conjugate function.However, unless the functions gi in the constraints (14.1) are all affine, theLagrangian L(x, λ) will not contain an affine function and therefore we cannotreduce it to a convex conjugate function.

To circumvent this issue, instead of the original constrained minimizationproblem (14.1) consider the perturbed parametric minimization problem

minimize f(x)

subject to gi(x) ≤ ui, (i = 1, . . . , I) (14.11)

where u = (u1, . . . , uI) is a parameter. Using the Lagrangian of (14.1), theLagrangian of (14.11) is

L(x, λ)− 〈λ, u〉 .Let

F (x, u) =

f(x), (∀i, gi(x) ≤ ui)∞, (∃i, gi(x) > ui)

be the value of f restricted to the feasible set and

φ(u) = infxF (x, u)

be the value function of the minimization problem 14.11. By the definition ofθ(x), we obtain θ(x) = F (x, 0).

Lemma 14.4. Let everything be as above. Then

L(x, λ) = infu

[F (x, u) + 〈λ, u〉], (14.12a)

F (x, u) = supλ

[L(x, λ)− 〈λ, u〉]. (14.12b)

Proof. Since F (x, u) = ∞ if gi(x) > ui for some i, in taking the infimum of(14.12a) we may assume gi(x) ≤ ui for all i. If λi < 0 for some i, then lettingui →∞ we get

infu

[F (x, u) + 〈λ, u〉] = −∞ = L(x, λ).

If λi ≥ 0 for all i, since F (x, u) = f(x) the infimum is attained when ui = gi(x)for all i, so

infu

[F (x, u) + 〈λ, u〉] = f(x) +

I∑i=1

λigi(x) = L(x, λ).

Since L(x, λ) = −∞ if λi < 0 for some i, in taking the supremum of (14.12b)we may assume λi ≥ 0 for all i. Then

supλ

[L(x, λ)− 〈λ, u〉] = supλ≥0

[f(x) +

I∑i=1

λi(gi(x)− ui)

]

=

f(x), (∀i, gi(x) ≤ ui)∞, (∃i, gi(x) > ui)

= F (x, u).

149

Page 151: Essential Mathematics for Economists

We need one more lemma.

Lemma 14.5. Let everything be as above. Then

ω(λ) = −φ∗(−λ),

supλω(λ) = φ∗∗(0).

Proof. By the definition of ω, φ, and the convex conjugate function, we get

ω(λ) = infxL(x, λ)

= infx,u

[F (x, u) + 〈λ, u〉] (∵ (14.12a))

= infu

[φ(u) + 〈λ, u〉]

= − supu

[〈−λ, u〉 − φ(u)] = −φ∗(−λ).

Therefore

supλω(λ) = sup

λ[−φ∗(−λ)] = sup

λ[〈0,−λ〉 − φ∗(−λ)] = φ∗∗(0).

We immediately obtain the main result.

Theorem 14.6 (Duality theorem). The primal value infx θ(x) and the dualvalue supλ ω(λ) coincide if and only if φ(0) = φ∗∗(0). In particular, this is thecase if φ is a proper closed convex function.

Proof. Clearly infx θ(x) = infx F (x, 0) = φ(0). By the previous lemma we havesupλ ω(λ) = φ∗∗(0). If φ is a proper closed convex function, then φ(u) = φ∗∗(u)for all u, so the primal and dual values coincide.

Problems

14.1. Compute the convex conjugate functions of the following functions.

1. f(x) = 1p |x|

p, where p > 1. (Express the solution using q > 1 such that

1/p+ 1/q = 1.)

2. f(x) =

∞, (x < 0)

0, (x = 0)

x log xa , (x > 0)

where a > 0.

3. f(x) =

∞, (x ≤ 0)

− log x. (x > 0)

4. f(x) = 〈a, x〉, where a ∈ RN .

5. f(x) = δa(x) :=

0, (x = a)

∞, (x 6= a)where a ∈ RN .

6. f(x) = 12 〈x,Ax〉, where A is an N×N symmetric positive definite matrix.

150

Page 152: Essential Mathematics for Economists

14.2. Let

f(x) =

∞, (x < 0)

−x2. (x ≥ 0)

1. Compute f∗(ξ), f∗∗(x), cl co f(x).

2. Does f∗∗(x) = cl co f(x) hold? If not, is it a contradiction?

14.3. Derive the dual problem of

minimize 〈c, x〉subject to Ax ≥ b, x ≥ 0.

14.4. Derive the dual problem of

minimize 〈c, x〉+1

2〈x,Qx〉

subject to Ax ≥ b,

where Q is a symmetric positive definite matrix.

14.5. Consider the entropy maximization problem (14.7). Noting that a0n = 1and b0 = 1, let an = (1, Tn) ∈ R1+I , b = (1, T ) ∈ R1+I , and λ = (λ0, λ1) ∈R×RI . Carry out the maximization in (14.8) with respect to λ0 alone and showthat (14.8) is equivalent to

maximize⟨T , λ1

⟩− log

(N∑n=1

e〈Tn,λ1〉

),

and also to

minimize log

(N∑n=1

e〈Tn−T ,λ1〉)⇐⇒ minimize

N∑n=1

e〈Tn−T ,λ1〉.

14.6. Let p = (p1, . . . , pN ) and q = (q1, . . . , qN ) be multinomial distributions.A concept closely related to entropy is the Kullback-Leibler information of pwith respect to q, defined by

H(p; q) =

N∑n=1

pn logpnqn.

Derive the dual problem of the minimum information problem

minimize

N∑n=1

pn logpnqn

subject to

N∑n=1

ainpn = bi, (i = 0, . . . , I)

where the minimization is over p given q.

14.7. Derive the dual problem of

minimize f(x)

subject to Ax = b,

where f : RN → (−∞,∞].(Hint: define the Lagrangian by L(x, λ) = f(x) + 〈λ, b−Ax〉.)

151

Page 153: Essential Mathematics for Economists

Chapter 15

Dynamic Programming inInfinite Horizon

15.1 A motivating example

Suppose you live forever and need to finance your consumption from savings.Let x0 > 0 be your initial wealth, R > 0 be the gross risk-free rate, and the flow

utility from consumption y is u(y) = y1−γ

1−γ , where 0 ≤ γ 6= 1 is the relative riskaversion coefficient, and you discount future utility with discount factor β > 0.Then what is the optimal way to consume and save?

Mathematically, the problem is

maximize

∞∑t=0

βty1−γt

1− γ(15.1a)

subject to 0 ≤ yt ≤ xt, xt+1 = R(xt − yt), (15.1b)

where x0 > 0 is given, yt > 0 is consumption at time t, and xt is the financialwealth at the beginning of time t.

15.2 General formulation

The previous example can be generalized as follows. Let X,Y be nonemptysets. At each stage t = 0, 1, . . . (which we call “time” for concreteness), giventhe state variable xt ∈ X at time t, the decision maker can choose the controlvariable yt ∈ Γ(xt), where Γ : X Y is a correspondence with Γ(x) 6= ∅.Given the state xt and control yt, the decision maker receives a flow utilityu(xt, yt) and the next period’s state is determined by xt+1 = g(xt, yt), whereu : X × Y → [−∞,∞) and g : X × Y → X. The decision maker discountsfuture utility using discount factor β ∈ [0, 1).

152

Page 154: Essential Mathematics for Economists

Mathematically, the problem is

maximize

∞∑t=0

βtu(xt, yt) (15.2a)

subject to (∀t)yt ∈ Γ(xt), xt+1 = g(xt, yt), (15.2b)

x0 ∈ X given. (15.2c)

The motivating example (15.1) is a special case by setting X = R+, Y = R+,

u(x, y) = y1−γ

1−γ , Γ(x) = [0, x], and g(x, y) = R(x− y).Given x0 ∈ X, we say that the sequence of state and control variables

(xt, yt)∞t=0 is feasible if yt ∈ Γ(xt) and xt+1 = g(xt, yt) for all t. Thus the goalis to maximize the discounted sum of utility

∑∞t=0 β

tu(xt, yt) among all feasiblesequences.1

To solve the problem (15.2), define the (supremum) value function V ∗ : X →[−∞,∞] by

V ∗(x) = sup

∞∑t=0

βtu(xt, yt)

∣∣∣∣∣x0 = x, (xt, yt)∞t=0 feasible

. (15.3)

Then by the principle of optimality (Theorem 7.1), the Bellman equation

V ∗(x) = supy∈Γ(x)

u(x, y) + βV ∗(g(x, y)) (15.4)

holds. Therefore V ∗ is a fixed point of the operator T defined by

(TV )(x) = supy∈Γ(x)

u(x, y) + βV (g(x, y)) , (15.5)

where V : X → (−∞,∞] is any function.

15.3 Verification theorem

The above argument suggests that we may solve the problem (15.2) by com-puting a fixed point of T . The following lemma provides a simple sufficientcondition.

Lemma 15.1 (Verification Theorem). Let X,Y be nonempty sets, u : X×Y →[−∞,∞), Γ : X Y be such that Γ(x) 6= ∅ for all x ∈ X, and g : X × Y →X. Suppose that V is a fixed point of the Bellman operator (15.5). Then thefollowings are true.

1. If for any feasible (xt, yt)∞t=0 we have

lim supt→∞

βtV (xt) ≥ 0, (15.6)

then V (x0) ≥ V ∗(x0).

1For this purpose, the discounted sum of utility∑∞

t=0 βtu(xt, yt) needs to be well defined

in the first place. This needs to be verified in particular applications. Here we simply assumethat the objective function is well defined. If the existence of a limit is a priori unclear, wemay write (15.2a) as lim infT→∞

∑Tt=0 β

tu(xt, yt).

153

Page 155: Essential Mathematics for Economists

2. Suppose that for any x ∈ X, the set

Γ∗(x) := arg maxy∈Γ(x)

u(x, y) + βV (g(x, y)) (15.7)

is nonempty. If yt ∈ Γ∗(xt) for all t and

lim inft→∞

βtV (xt) ≤ 0, (15.8)

then V (x0) ≤ V ∗(x0).

In particular, if both (15.6) and (15.8) hold, then V = V ∗ and any feasible(xt, yt)∞t=0 with yt ∈ Γ∗(xt) is a solution to (15.2).

Proof. Suppose that (15.6) holds for any feasible (xt, yt)∞t=0. Since by as-sumption V satisfies the Bellman equation (15.4), for any xt ∈ X we have

V (xt) = supyt∈Γ(xt)

u(xt, yt) + βV (g(xt, yt)) ≥ u(xt, yt) + βV (xt+1),

where we have used xt+1 = g(xt, yt) by feasibility. Iterating this inequality, weobtain

V (x0) ≥T−1∑t=0

βtu(xt, yt) + βTV (xT ).

Taking the lim sup as T → ∞ and noting that∑∞t=0 β

tu(xt, yt) exists by as-sumption, it follows from (15.6) that

V (x0) ≥∞∑t=0

βtu(xt, yt) + lim supT→∞

βTV (xT ) ≥∞∑t=0

βtu(xt, yt).

Taking the supremum over all feasible sequences, we obtain V (x0) ≥ V ∗(x0).Next, suppose that Γ∗ in (15.7) is nonempty and a feasible sequence (xt, yt)∞t=0

satisfies yt ∈ Γ∗(xt) and (15.8). Then by (15.7), we obtain

V (xt) = u(xt, yt) + βV (xt+1).

Iterating this equation, we obtain

V (x0) =

T−1∑t=0

βtu(xt, yt) + βTV (xT ).

Taking the lim inf as T → ∞ and noting that∑∞t=0 β

tu(xt, yt) exists by as-sumption, it follows from (15.8) that

V (x0) =

∞∑t=0

βtu(xt, yt) + lim infT→∞

βTV (xT ) ≤∞∑t=0

βtu(xt, yt) ≤ V ∗(x0).

In particular, if both (15.6) and (15.8) hold, then V = V ∗ and any feasible(xt, yt)∞t=0 with yt ∈ Γ∗(xt) is a solution to (15.2).

The conditions (15.6) and (15.8) are called transversality conditions. Ingeneral, transversality conditions are boundary conditions at infinity that arenecessary or sufficient for optimality, which take various forms. In the caseof Lemma 15.1, these conditions are sufficient but not necessary. For morediscussion, see Kamihigashi (2002, 2014).

154

Page 156: Essential Mathematics for Economists

Example 15.1. Let us apply Lemma 15.1 to solve the optimal consumption-

saving problem (15.1) for the case 0 < γ < 1. Since u(y) = y1−γ

1−γ is homogeneous

of degree 1 − γ and g(x, y) = R(x − y) is homogeneous of degree 1, we canconjecture that the value function V is also homogeneous of degree 1−γ. Henceconjecture a solution to the Bellman equation of the form

V (x) = ax1−γ

1− γfor some a > 0. Then the Bellman equation (15.4) becomes

ax1−γ

1− γ= max

0≤y≤x

y1−γ

1− γ+ βa

[R(x− y)]1−γ

1− γ

.

Since R, a > 0, it is straightforward to show that the expression inside thebraces is a concave function of y. Therefore by Proposition 11.1, the first-ordercondition for optimality is sufficient for the maximum, which is

y−γ − βaR1−γ(x− y)−γ = 0 ⇐⇒ y =x

1 + (βR1−γa)1/γ.

Substituting this into the Bellman equation, after some algebra we obtain

ax1−γ

1− γ=[1 + (βR1−γa)1/γ

]γ x1−γ

1− γ.

Comparing the coefficients of x1−γ

1−γ , we obtain

a =[1 + (βR1−γa)1/γ

]γ⇐⇒ a1/γ = 1 + (βR1−γa)1/γ

⇐⇒ a1/γ = [1− (βR1−γ)1/γ ]−1,

provided that βR1−γ < 1. Suppose that this is the case. Then the consumptionfunction must be

y =x

1 + (βR1−γa)1/γ= a−1/γx = [1− (βR1−γ)1/γ ]x.

In the context of Lemma 15.1, we have verified that V (x) = ax1−γ

1−γ with a =

[1−(βR1−γ)1/γ ]−γ satisfies the Bellman equation and Γ∗(x) =

[1− (βR1−γ)1/γ ]x

.To show that this is the solution to the optimal consumption-saving problem(15.1), it remains to show the transversality conditions (15.6) and (15.8).

Since γ > 0, we have V (x) ≥ 0, so (15.6) is trivial. Under the consumptionpolicy y = [1− (βR1−γ)1/γ ]x, by the budget constraint the next period’s wealthis

x′ = R(x− y) = (βR)1/γx.

Therefore xt = (βR)t/γx0, and

βtV (xt) = βta[(βR)t/γx0]1−γ

1− γ= a

x1−γ0

1− γ(βR1−γ)t/γ → 0

as t→∞ because βR1−γ < 1 by assumption. Therefore (15.8) holds.

Remark 15.1. When γ > 1, the condition (15.6) does not hold, so we cannotapply Lemma 15.1 to show that the consumption rule y = [1 − (βR1−γ)1/γ ]xis optimal. It turns out that this is indeed optimal, but a different argument isrequired, as we shall see below. See, for example, Toda (2014, 2019).

155

Page 157: Essential Mathematics for Economists

15.4 Contraction argument

Returning to the dynamic optimization problem (15.2), since V ∗ is a fixed pointof the Bellman equation (15.4), one way to compute V ∗ is to show that theBellman operator T in (15.5) is a contraction and then apply the contractionmapping theorem (Theorem 8.1). For any set X, let bX denote the space ofbounded functions from X to R. Then by Example 8.2, bX is a Banach spacewith the supremum norm ‖f‖ = supx∈X |f(x)|. The following theorem providesan algorithm for computing the value function V ∗ in (15.3).

Theorem 15.2. Let X,Y be nonempty sets, u ∈ b(X ×Y ), Γ : X Y be suchthat Γ(x) 6= ∅ for all x ∈ X, and g : X × Y → X. Then the followings are true.

1. T : bX → bX is a contraction mapping.

2. The value function V ∗ in (15.3) is the unique fixed point of T .

3. For any V (0) ∈ bX, letting V (n) = TnV (0), we have∥∥V ∗ − V (n)

∥∥ → 0 asn→∞.

Proof. We know from the principle of optimality (Theorem 7.1) that V ∗ is afixed point of T . Therefore by the contraction mapping theorem (Theorem 8.1),it suffices to show that (i) V ∗ ∈ bX, and (ii) T : bX → bX is a contraction.Since by assumption u ∈ b(X × Y ), we have∣∣∣∣∣

∞∑t=0

βtu(xt, yt)

∣∣∣∣∣ ≤∞∑t=0

βt |u(xt, yt)| ≤∞∑t=0

βt ‖u‖ =1

1− β‖u‖ <∞.

Therefore the sum of discounted utility is always finite. In particular, the valuefunction V ∗ exists and ‖V ∗‖ ≤ 1

1−β ‖u‖, so V ∗ ∈ bX.To show that T : bX → bX is a contraction, we verify Blackwell’s sufficient

conditions (Theorem 8.3). If V1, V2 ∈ bX and V1 ≤ V2, then by the definition ofT in (15.5), we have

(TV1)(x) = supy∈Γ(x)

u(x, y) + βV1(g(x, y))

≤ supy∈Γ(x)

u(x, y) + βV2(g(x, y)) = (TV2)(x),

so TV1 ≤ TV2. Therefore monotonicity holds. If V ∈ bX and c ≥ 0, then

(T (V + c))(x) = supy∈Γ(x)

u(x, y) + β(V (g(x, y)) + c)

= supy∈Γ(x)

u(x, y) + βV (g(x, y))+ βc = (TV )(x) + βc,

so in particular T (V + c) ≤ TV + βc. Therefore discounting holds. Hence byTheorem 8.3, T : bX → bX is a contraction.

The last part of Theorem 15.2 shows that to compute the value functionV ∗, it suffices to start from any bounded function V (0) and iterate the Bellmanequation, which is called value function iteration.

Theorem 15.2 does not say anything about the existence of a solution. Bycombining Theorem 15.2 and Lemma 15.1 and putting more topological struc-ture, we can prove the existence of a solution to the dynamic programming

156

Page 158: Essential Mathematics for Economists

problem (15.2). Recall that for a topological space X, we denote the space ofbounded continuous functions f : X → R by bcX.

Theorem 15.3. Let X,Y be topological spaces, u ∈ bc(X × Y ), Γ : X Ycontinuous, and g : X × Y → X continuous. Then the followings are true.

1. T : bcX → bcX is a contraction mapping.

2. The value function V ∗ in (15.3) is the unique fixed point of T . In partic-ular, V ∗ ∈ bcX.

3. For any V (0) ∈ bcX, letting V (n) = TnV (0), we have∥∥V ∗ − V (n)

∥∥→ 0 asn→∞.

4. If in addition Γ∗(x) in (15.7) is nonempty for each x ∈ X, then anyfeasible sequence (xt, yt)∞t=0 with yt ∈ Γ∗(xt) is a solution to the problem(15.2).

Proof. We know from Theorem 15.2 that T : bX → bX is a contraction. SincebcX ⊂ bX, to show that T : bcX → bcX is a contraction, it suffices to showthat for any V ∈ bcX, the function TV : X → R is continuous. But since byassumption u : X ×Y → R, Γ : X Y , and g : X ×Y → X are all continuous,the claim immediately follows from the maximum theorem (Theorem 13.3).

By the contraction mapping theorem, T : bcX → bcX has a unique fixedpoint V ∈ bcX. Since bcX ⊂ bX, V is also a fixed point of T : bX → bX.Since by Theorem 15.2 V ∗ is the unique fixed point of T : bX → bX, it mustbe V ∗ = V . Therefore V ∗ is the unique fixed point of T : bcX → bcX, and inparticular V ∗ ∈ bcX.

By Theorem 15.2, for any V (0) ∈ bX (in particular, V (0) ∈ bcX), lettingV (n) = TV (0), we have

∥∥V ∗ − V (n)∥∥→ 0.

Since V ∗ ∈ bcX, for any feasible (xt, yt)∞t=0, we have∣∣βtV ∗(xt)∣∣ ≤ βt ‖V ∗‖ → 0

as t→∞, so the transversality conditions (15.6) and (15.8) hold. Therefore bythe verification theorem (Lemma 15.1), any feasible sequence (xt, yt)∞t=0 withyt ∈ Γ∗(xt) is a solution to the problem (15.2).

Corollary 15.4. If Γ(x) in Theorem 15.3 is nonempty and compact, thenΓ∗(x) 6= ∅. Consequently, a solution to the problem 15.2 exists.

Proof. By Theorem 15.3, the value function V ∗ is continuous. Since by assump-tion Γ(x) is nonempty and compact, by the extreme value theorem (Theorem2.5) the right-hand side of (15.7) attains a maximum and Γ∗(x) 6= ∅.

Almost all dynamic programming problems that appear in applications donot admit closed-form solutions and thus need to be solved numerically. The-orem 15.3 shows that under its assumptions, the solution can be computed byvalue function iteration.

Although the transversality conditions (15.6) and (15.8) do not explicitly appear in Theorem 15.3 (because they automatically hold due to the boundedness of the value function), in general they cannot be omitted, as the following example shows.


Example 15.2. Consider the optimal consumption-saving problem (15.1) with $\gamma = 0$. The Bellman equation is then
\[
V(x) = \max_{0 \le y \le x} \{y + \beta V(R(x - y))\}.
\]
Since the utility function is linear, we can conjecture that the value function is also linear, so $V(x) = ax$ for some $a > 0$. Then
\[
ax = \max_{0 \le y \le x} \{y + \beta a R(x - y)\}.
\]
If $\beta R = 1$, then $V(x) = x$ (so $a = 1$) clearly satisfies the Bellman equation, and any $y \in [0, x]$ is a maximizer. Take $y_t = 0$ for all $t$. Then $x_t = R^t x_0$, and
\[
\beta^t V(x_t) = \beta^t R^t x_0 = (\beta R)^t x_0 = x_0 > 0,
\]
so the transversality condition (15.8) does not hold. Indeed, choosing $(0, 0, \ldots)$ (consuming zero forever) gives lifetime utility $0$, whereas choosing $(x_0, 0, \ldots)$ (consuming everything now and zero in the future) gives lifetime utility $x_0 > 0$, so $(0, 0, \ldots)$ is not optimal.

15.5 Non-contraction argument

Theorem 15.3 is quite elegant in that it proves existence, uniqueness, and a computational algorithm for the solution to a dynamic programming problem. Unfortunately, Theorem 15.3 is quite unsatisfactory from an applied perspective because its assumptions are often not satisfied in applications. For example, in many applications it is common to use the constant relative risk aversion (CRRA) utility function
\[
u(y) =
\begin{cases}
\dfrac{y^{1-\gamma}}{1-\gamma}, & (0 < \gamma \ne 1) \\
\log y. & (\gamma = 1)
\end{cases}
\]
It is clear that $u(y)$ is unbounded above if $\gamma \le 1$ and unbounded below if $\gamma \ge 1$, so $u(y)$ is always unbounded. Therefore unless we know a priori that $y$ takes values in a bounded set (and bounded away from $0$), we cannot apply Theorem 15.3.

Sometimes, we can prove the existence of a solution using value function iteration starting from the zero function.

Theorem 15.5. Let $X, Y$ be nonempty sets, $u : X \times Y \to [-\infty, \infty)$, $\Gamma : X \rightrightarrows Y$ be such that $\Gamma(x) \ne \emptyset$ for all $x \in X$, and $g : X \times Y \to X$. Define $V^{(0)}(x) \equiv 0$, $V^{(n)} = T^n V^{(0)}$, and
\[
V(x) = \liminf_{n \to \infty} V^{(n)}(x). \tag{15.9}
\]
Then $V \ge V^*$ and $V \ge TV$. If in addition $V \le TV$, $\Gamma^*$ in (15.7) is nonempty, and the transversality condition (15.8) holds, then $V = V^*$ and any feasible $(x_t, y_t)_{t=0}^\infty$ with $y_t \in \Gamma^*(x_t)$ is a solution to (15.2).

Proof. Take any feasible sequence $(x_t, y_t)_{t=0}^\infty$. Then by definition
\[
V^{(n)}(x_t) = \sup_{y_t \in \Gamma(x_t)} \{u(x_t, y_t) + \beta V^{(n-1)}(g(x_t, y_t))\} \ge u(x_t, y_t) + \beta V^{(n-1)}(x_{t+1}).
\]

Iterating this for $n = T, T-1, \ldots, 1$, we obtain
\[
V^{(T)}(x_0) \ge \sum_{t=0}^{T-1} \beta^t u(x_t, y_t) + \beta^T V^{(0)}(x_T) = \sum_{t=0}^{T-1} \beta^t u(x_t, y_t),
\]
where we have used $V^{(0)} \equiv 0$. Taking the $\liminf$ as $T \to \infty$ and noting that $\sum_{t=0}^\infty \beta^t u(x_t, y_t)$ exists by assumption, we obtain
\[
V(x_0) \ge \sum_{t=0}^\infty \beta^t u(x_t, y_t).
\]

Taking the supremum over all feasible sequences, we obtain $V(x_0) \ge V^*(x_0)$.

Similarly, by definition
\[
V^{(n)}(x) = (TV^{(n-1)})(x) \ge u(x, y) + \beta V^{(n-1)}(g(x, y)).
\]
Taking the $\liminf$ as $n \to \infty$, we obtain
\[
V(x) \ge u(x, y) + \beta V(g(x, y)).
\]
Taking the supremum over $y \in \Gamma(x)$, we obtain $V \ge TV$.

If $V \le TV$, then (since $V \ge TV$) $V$ is a fixed point of $T$. Therefore if $\Gamma^*$ in (15.7) is nonempty and the transversality condition (15.8) holds, then by the verification theorem (Lemma 15.1) we have $V \le V^*$, so it must be that $V = V^*$.

Corollary 15.6 (Positive utility). Let everything be as in Theorem 15.5 and suppose $u \ge 0$. Then $TV = V$. Consequently, if $\Gamma^*$ in (15.7) is nonempty and the transversality condition (15.8) holds, then $V = V^*$ and any feasible $(x_t, y_t)_{t=0}^\infty$ with $y_t \in \Gamma^*(x_t)$ is a solution to (15.2).

Proof. Since $u(x, y) \ge 0$ and $V^{(0)} \equiv 0$, we have
\[
V^{(1)}(x) = \sup_{y \in \Gamma(x)} u(x, y) \ge 0 = V^{(0)}(x).
\]
Using the monotonicity of $T$, by induction we obtain
\[
V^{(n)} \ge V^{(n-1)} \ge \cdots \ge V^{(1)} \ge 0.
\]
Therefore $V^{(n)} \uparrow V$ as $n \to \infty$, so
\[
V^{(n)}(x) = \sup_{y \in \Gamma(x)} \{u(x, y) + \beta V^{(n-1)}(g(x, y))\} \le \sup_{y \in \Gamma(x)} \{u(x, y) + \beta V(g(x, y))\}.
\]
Letting $n \to \infty$, we obtain $V \le TV$. The conclusion follows from Theorem 15.5.

Problems


Part III

Introduction to Numerical Analysis


Chapter 16

Solving Nonlinear Equations

So far, we have studied optimization problems from a theoretical perspective. If the objective function happens to be convex or concave, then to minimize or maximize it, all we need to do is find a point at which the derivative is zero. This is easier said than done. In practice, almost all problems have no closed-form solutions, and therefore we need some kind of numerical algorithm to find an (approximate) solution. Note that if a (one-variable) function $f$ is differentiable, the first-order condition for optimality is $f'(x) = 0$. Letting $g(x) = f'(x)$, it thus suffices to solve the nonlinear equation $g(x) = 0$. This chapter discusses algorithms for solving such nonlinear equations.

16.1 Bisection method

Let $g : \mathbb{R} \to \mathbb{R}$ be a continuous function and suppose that
\[
g(x)
\begin{cases}
< 0, & (x < x^*) \\
= 0, & (x = x^*) \\
> 0. & (x > x^*)
\end{cases}
\]
These inequalities show that we know exactly whether the current approximate solution $x$ is greater or less than the true solution $x^*$ according as $g(x) \gtrless 0$. The idea of the bisection method is to decrease $x$ if $g(x) > 0$ and increase $x$ if $g(x) < 0$, until we find a value such that $g(x) \approx 0$.

To describe the algorithm, let
\begin{align*}
x &= \text{current approximate solution}, \\
\underline{x} &= \text{current lower bound of } x^*, \\
\overline{x} &= \text{current upper bound of } x^*, \\
\varepsilon &= \text{error tolerance for } x^*.
\end{align*}

The following is the bisection algorithm.


Initialization: Verify that $g(x) = 0$ has a unique solution and that $g$ crosses 0 from below (e.g., $g$ is increasing). Select the error tolerance $\varepsilon > 0$. Find $\underline{x}$ and $\overline{x}$ such that $g(\underline{x}) < 0$ and $g(\overline{x}) > 0$.

Iteration: 1. Compute $x = \frac{\underline{x} + \overline{x}}{2}$ and $g(x)$.

2. If $g(x) < 0$, set $\underline{x} = x$. If $g(x) > 0$, set $\overline{x} = x$.

Stopping rule: If $\overline{x} - \underline{x} < \varepsilon$, stop. The approximate solution is $x$. Otherwise, repeat the iteration step.

The bisection method also works when $g$ crosses 0 from above, but the updating rule of the lower and upper bounds must be interchanged in the obvious way.

The bisection method is a sure way to obtain a solution but is slow. Since the interval gets halved at each iteration, after $n$ iterations the length of the interval is of order $2^{-n}$. Therefore convergence is (only) exponentially fast.
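
A minimal Python sketch of the bisection algorithm above follows; the test equation and tolerance are illustrative assumptions.
\begin{verbatim}
def bisect(g, lo, hi, eps=1e-10):
    """Bisection method for g(x) = 0, assuming g(lo) < 0 < g(hi)."""
    assert g(lo) < 0 < g(hi), "need a sign change: g(lo) < 0 < g(hi)"
    while hi - lo >= eps:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid          # solution lies to the right of mid
        else:
            hi = mid          # solution lies to the left of mid
    return (lo + hi) / 2

# example: solve x^3 - 2 = 0 on [1, 2] (illustrative)
print(bisect(lambda x: x**3 - 2, 1.0, 2.0))  # approximately 1.259921
\end{verbatim}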

16.2 Order of convergence

At this point it is useful to define how fast an algorithm converges. Let $\{x_n\}_{n=0}^\infty$ be the sequence of approximate solutions generated by some algorithm and let $x^*$ be the true solution. We say that the order of convergence of the algorithm is $\alpha$ if there exist constants $\alpha \ge 1$ and $\beta > 0$ such that
\[
|x_{n+1} - x^*| \le \beta |x_n - x^*|^\alpha \tag{16.1}
\]
for sufficiently large $n$.

If $\alpha = 1$, we also require $\beta < 1$ to guarantee convergence. In that case, by iteration (and assuming that the inequality (16.1) holds for all $n$) we get
\[
|x_n - x^*| \le \beta^n |x_0 - x^*|,
\]
so $x_n$ converges to $x^*$ exponentially fast. Therefore the bisection method has order of convergence 1. If $\alpha > 1$, then $x_n$ converges to $x^*$ double exponentially. To see this, let us find a constant $C > 0$ such that
\[
C |x_{n+1} - x^*| \le (C |x_n - x^*|)^\alpha.
\]
Comparing with the definition of the order of convergence (16.1), it suffices to choose $C$ such that $\beta = C^{\alpha - 1} \iff C = \beta^{\frac{1}{\alpha - 1}}$. Iterating (16.1) over $n$, we obtain
\[
C |x_n - x^*| \le (C |x_0 - x^*|)^{\alpha^n},
\]
so provided that $|x_0 - x^*| < 1/C$, we get
\[
|x_n - x^*| \le C^{-1}(C |x_0 - x^*|)^{\alpha^n} \to 0,
\]
and the speed of convergence is double exponentially fast.

How many iterations are needed to compute the solution up to some decimal place, say $d$? When the order of convergence is 1 (exponential), the number of iterations required is approximately given by
\[
\beta^n = 10^{-d} \iff n = -\frac{d}{\log_{10} \beta} \implies n = \text{constant} \times d.
\]
On the other hand, when the order of convergence is $\alpha > 1$, the number of iterations required is approximately given by
\[
(C |x_0 - x^*|)^{\alpha^n} = 10^{-d} \iff \alpha^n = -\frac{d}{\log_{10} C|x_0 - x^*|} \implies n = \frac{1}{\log \alpha}(\log d + \text{constant}).
\]
Since $\log d$ is much smaller than $d$, in practice it is important to use algorithms that have order $\alpha > 1$.

16.3 Newton method

The bisection method is inefficient in the sense that the only information about $g(x)$ the algorithm uses is its sign. Unsurprisingly, the order of convergence is 1, which is slow. The Newton method, which is based on Taylor's theorem, uses both the function value and the derivative and achieves much faster convergence.

The idea of the Newton method is as follows. Let $g$ be continuously differentiable and suppose that you have an approximate solution at $x = a$. By Taylor's theorem, we have
\[
g(x) \approx g(a) + g'(a)(x - a).
\]
Since the right-hand side is linear in $x$, we can set it equal to zero and solve to obtain
\[
0 = g(a) + g'(a)(x - a) \iff x = a - \frac{g(a)}{g'(a)}.
\]

The formal algorithm of the Newton method is as follows.

1. Pick an initial value $x_0$ and error tolerance $\varepsilon > 0$.

2. For $n = 0, 1, 2, \ldots$, compute
\[
x_{n+1} = x_n - \frac{g(x_n)}{g'(x_n)}. \tag{16.2}
\]

3. Stop if $|x_{n+1} - x_n| < \varepsilon$. The approximate solution is $x_{n+1}$.
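
A minimal Python sketch of this algorithm follows; the test equation, initial value, and tolerance are illustrative assumptions.
\begin{verbatim}
def newton(g, dg, x0, eps=1e-10, maxit=100):
    """Newton method for g(x) = 0 given the derivative dg."""
    x = x0
    for _ in range(maxit):
        x_new = x - g(x) / dg(x)      # Newton update (16.2)
        if abs(x_new - x) < eps:
            return x_new
        x = x_new
    raise RuntimeError("Newton method did not converge")

# example: solve x^3 - 2 = 0 starting from x0 = 1.5 (illustrative)
print(newton(lambda x: x**3 - 2, lambda x: 3 * x**2, 1.5))
\end{verbatim}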

The following theorem shows that the Newton method has order of convergence 2.

Theorem 16.1. Let $g : \mathbb{R} \to \mathbb{R}$ be twice continuously differentiable. Suppose that $x^* \in \mathbb{R}$ satisfies $g(x^*) = 0$ and $g'(x^*) \ne 0$. Then there exist a constant $C > 0$ and a neighborhood $U$ of $x^*$ such that $x_n \in U$ implies
\[
|x_{n+1} - x^*| \le C |x_n - x^*|^2.
\]

Proof. Let $x_n$ be the current approximate solution. Subtracting $x^*$ from both sides of (16.2), we get
\[
x_{n+1} - x^* = x_n - x^* - \frac{g(x_n)}{g'(x_n)} = -\frac{g(x_n) + g'(x_n)(x^* - x_n)}{g'(x_n)}.
\]
Since $g$ is twice continuously differentiable, applying Taylor's theorem to $g(x^*)$ around $x_n$, there exists $t \in [0, 1]$ such that $\xi := (1 - t)x^* + t x_n$ satisfies
\[
0 = g(x^*) = g(x_n) + g'(x_n)(x^* - x_n) + \frac{1}{2} g''(\xi)(x^* - x_n)^2.
\]
Substituting into the expression for $x_{n+1} - x^*$ and assuming $g'(x_n) \ne 0$, we get
\[
x_{n+1} - x^* = \frac{g''(\xi)}{2 g'(x_n)} (x_n - x^*)^2.
\]
Since by assumption $g$ is twice continuously differentiable and $g'(x^*) \ne 0$, we can take a neighborhood $U$ of $x^*$ such that $g'(x) \ne 0$ for $x \in U$ and
\[
\beta := \sup_{t \in [0,1]} \sup_{x \in U} \left| \frac{g''((1-t)x + t x^*)}{2 g'(x)} \right| < \infty.
\]
Then $|x_{n+1} - x^*| \le \beta |x_n - x^*|^2$ whenever $x_n \in U$, so the order of convergence of the Newton method is (at least) 2.

The Newton method can also be applied to solve a system of nonlinear equations. For example, let $g : \mathbb{R}^N \to \mathbb{R}^N$ and suppose we would like to solve $g(x) = 0$. By Taylor's theorem, we have
\[
g(x) \approx g(a) + Dg(a)(x - a) \iff x \approx a - [Dg(a)]^{-1} g(a),
\]
where $Dg$ denotes the $N \times N$ Jacobian of $g$. Thus if $x_0$ is close to a true solution $x^*$ and $Dg(x^*)$ is regular, we can expect that iterating
\[
x_{n+1} = x_n - [Dg(x_n)]^{-1} g(x_n)
\]
converges to $x^*$ (Problem 16.4).
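
A minimal Python sketch of the multivariate Newton iteration follows; the example system and starting point are illustrative assumptions, and the linear system is solved directly rather than inverting the Jacobian.
\begin{verbatim}
import numpy as np

def newton_system(g, Dg, x0, eps=1e-10, maxit=50):
    """Newton method for a system g(x) = 0 with Jacobian Dg(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        step = np.linalg.solve(Dg(x), g(x))   # solve Dg(x) s = g(x)
        x_new = x - step
        if np.max(np.abs(x_new - x)) < eps:
            return x_new
        x = x_new
    raise RuntimeError("Newton method did not converge")

# example (illustrative): solve x^2 + y^2 = 1, x - y = 0
g = lambda v: np.array([v[0]**2 + v[1]**2 - 1, v[0] - v[1]])
Dg = lambda v: np.array([[2 * v[0], 2 * v[1]], [1.0, -1.0]])
print(newton_system(g, Dg, [1.0, 0.5]))   # approximately (0.7071, 0.7071)
\end{verbatim}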

16.4 Linear interpolation

The Newton method requires both the function value $g(x)$ and its derivative $g'(x)$ to implement. Oftentimes, the derivative $g'(x)$ has a complicated form. In some cases (e.g., when the objective function is defined only numerically, not analytically), it is impossible to compute the derivative. In such cases, we can use linear interpolation to solve for the solution.

Let $x_n$ and $x_{n-1}$ be the two most recent approximate solutions to $g(x) = 0$. Approximating $g$ by the linear function that agrees with $g$ at these two points, we obtain
\[
g(x) \approx \frac{g(x_n) - g(x_{n-1})}{x_n - x_{n-1}} (x - x_n) + g(x_n).
\]
Setting the right-hand side equal to 0, we obtain
\[
\frac{g(x_n) - g(x_{n-1})}{x_n - x_{n-1}} (x - x_n) + g(x_n) = 0 \iff x_{n+1} = x_n - g(x_n) \frac{x_n - x_{n-1}}{g(x_n) - g(x_{n-1})}. \tag{16.3}
\]
Problem 16.3 asks you to show that the order of convergence of the linear interpolation method is the golden ratio $\alpha = \frac{1 + \sqrt{5}}{2} = 1.618\ldots$.
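
A minimal Python sketch of the iteration (16.3) (also known as the secant method) follows; the test equation and initial guesses are illustrative assumptions.
\begin{verbatim}
def secant(g, x0, x1, eps=1e-10, maxit=100):
    """Linear interpolation (secant) method for g(x) = 0, following (16.3)."""
    for _ in range(maxit):
        x2 = x1 - g(x1) * (x1 - x0) / (g(x1) - g(x0))
        if abs(x2 - x1) < eps:
            return x2
        x0, x1 = x1, x2
    raise RuntimeError("secant method did not converge")

# example: solve x^3 - 2 = 0 from the two guesses 1 and 2 (illustrative)
print(secant(lambda x: x**3 - 2, 1.0, 2.0))
\end{verbatim}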


16.5 Quadratic interpolation

The linear interpolation method approximates a nonlinear function by a linear one by interpolating between two points. This way, we can solve for the new approximate solution explicitly by solving a linear equation. However, we can also solve quadratic equations explicitly. The quadratic interpolation method fits a quadratic function to three points.

Suppose that you have three approximate solutions $a < b < c$ to the nonlinear equation $g(x) = 0$, with $g(a) g(c) < 0$. The quadratic interpolation method constructs a quadratic function that agrees with $g$ at these three points, and then finds the root. By direct substitution, we can show that the quadratic function

\[
q(x) = g(a)\frac{(x-b)(x-c)}{(a-b)(a-c)} + g(b)\frac{(x-c)(x-a)}{(b-c)(b-a)} + g(c)\frac{(x-a)(x-b)}{(c-a)(c-b)} = Ax^2 + Bx + C
\]
satisfies $q(x) = g(x)$ for $x = a, b, c$. Comparing the coefficients, we obtain
\begin{align*}
A &= \frac{g(a)}{(a-b)(a-c)} + \frac{g(b)}{(b-c)(b-a)} + \frac{g(c)}{(c-a)(c-b)}, \\
B &= -\frac{g(a)(b+c)}{(a-b)(a-c)} - \frac{g(b)(c+a)}{(b-c)(b-a)} - \frac{g(c)(a+b)}{(c-a)(c-b)}, \\
C &= \frac{g(a)bc}{(a-b)(a-c)} + \frac{g(b)ca}{(b-c)(b-a)} + \frac{g(c)ab}{(c-a)(c-b)}.
\end{align*}
Using the formula for the solution to a quadratic equation, we obtain
\[
x = \frac{-B \pm \sqrt{B^2 - 4AC}}{2A},
\]

where we should pick the sign $\pm$ such that $a < x < c$.

The quadratic interpolation algorithm is defined as follows.

1. Pick initial values $a_0 < b_0 < c_0$ and error tolerance $\varepsilon > 0$.

2. For each $n$, compute $d = x_{n+1}$ given the current $a_n, b_n, c_n$. Stop if $|x_{n+1} - x_n| < \varepsilon$. Otherwise, set
\[
(a_{n+1}, b_{n+1}, c_{n+1}) =
\begin{cases}
(a_n, d, b_n), & (a_n < d < b_n) \\
(b_n, d, c_n). & (b_n < d < c_n)
\end{cases}
\]

The order of convergence of the quadratic interpolation method is the root $\alpha > 1$ of the equation
\[
x^3 - x^2 - x - 1 = 0,
\]
which is $\alpha = 1.8393\ldots$.
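
A minimal Python sketch of one quadratic interpolation step (fitting the quadratic through the three points and picking the root inside $(a, c)$) follows; the test function is an illustrative assumption, and degenerate cases (e.g., $A = 0$) are not handled.
\begin{verbatim}
import math

def quadratic_step(g, a, b, c):
    """One step of quadratic interpolation: fit a quadratic through
    (a, g(a)), (b, g(b)), (c, g(c)) and return its root inside (a, c)."""
    ga, gb, gc = g(a), g(b), g(c)
    A = ga / ((a - b) * (a - c)) + gb / ((b - c) * (b - a)) + gc / ((c - a) * (c - b))
    B = (-ga * (b + c) / ((a - b) * (a - c)) - gb * (c + a) / ((b - c) * (b - a))
         - gc * (a + b) / ((c - a) * (c - b)))
    C = (ga * b * c / ((a - b) * (a - c)) + gb * c * a / ((b - c) * (b - a))
         + gc * a * b / ((c - a) * (c - b)))
    disc = math.sqrt(B**2 - 4 * A * C)
    for root in ((-B + disc) / (2 * A), (-B - disc) / (2 * A)):
        if a < root < c:              # pick the sign so that a < x < c
            return root
    raise ValueError("no root of the fitted quadratic lies in (a, c)")

# example: one step toward the root of x^3 - 2 (illustrative)
print(quadratic_step(lambda x: x**3 - 2, 1.0, 1.5, 2.0))
\end{verbatim}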

16.6 Robustifying the algorithms

The linear interpolation, quadratic interpolation, and Newton methods are usually much faster than the bisection method because their order of convergence exceeds 1. However, this does not mean that we should always use such algorithms, for at least two reasons. First, algorithms other than the bisection method are only local, meaning that they are guaranteed to converge only if the initial value is close enough to the true solution. If the initial value is far away from the true solution, the algorithm may not converge at all. On the other hand, the bisection method is a global algorithm (if the function has a single solution), so convergence is guaranteed (although slow). One way to make the algorithm robust is to initially use a global algorithm such as the bisection method (or grid search), and then switch to a faster local algorithm.

Second, because the linear interpolation and Newton methods approximate the function by a linear function and the quadratic interpolation method by a quadratic function, the approximation may be poor when the function is highly nonlinear (or highly non-quadratic). For example, consider the equation
\[
g(x) = x^{100} - 2 = 0.
\]
The function value is of order 1 on $[0, 1]$ but is huge when $x > 1$, so the convergence of linear and quadratic interpolation can be slow. In that case it may be useful to ``robustify'' the algorithm by considering the equation $h(x) = 0$ instead, where
\[
h(x) = \max\{-1, \min\{g(x), 1\}\}.
\]

Problems

16.1. Let $f(x) = \sqrt{x^2 + 1}$.

1. Compute $f'(x)$ and $f''(x)$, and show that $f$ is convex.

2. Find the minimum of $f$.

3. Using your favorite programming language, implement the Newton method for finding the minimum (solving $f'(x) = 0$). Experiment with what happens when the initial values are $x_0 = 0.9$, $1$, and $1.1$.

16.2. Consider the nonlinear equation
\[
g(x) = x^3 - 2 = 0,
\]
where $x > 0$. Clearly the solution is $x = 2^{1/3} \in (1, 2)$. Using your favorite programming language, implement the bisection, linear interpolation, quadratic interpolation, and Newton methods and compare the speed of convergence. What if $g(x) = x^{100} - 2$?

16.3. This problem asks you to derive the order of convergence of the linear interpolation method. Let $g$ be a twice continuously differentiable function with $g(x^*) = 0$ and $g'(x^*) \ne 0$. Consider the linear interpolation algorithm (16.3).

1. Let $\phi(x; a) = \frac{g(x) - g(a)}{x - a}$ for $x \ne a$. Show that
\[
x_{n+1} - x^* = (x_n - x^*) \frac{\phi(x_{n-1}; x_n) - \phi(x^*; x_n)}{\phi(x_{n-1}; x_n)}.
\]

2. Using the mean value theorem, show that there exists $\xi_n$ between $x_n$ and $x_{n-1}$ such that $\phi(x_{n-1}; x_n) = g'(\xi_n)$.

3. Regard $\phi(x; a)$ as a function of $x$. Using the mean value theorem, show that there exists a number $\eta_n$ between $x_{n-1}$ and $x^*$ such that
\[
\phi(x_{n-1}; x_n) - \phi(x^*; x_n) = \phi'(\eta_n; x_n)(x_{n-1} - x^*).
\]

4. Compute $\phi'(x; a)$ explicitly.

5. Using Taylor's theorem, show that there exists a number $\zeta_n$ such that
\[
\phi'(\eta_n; x_n) = \frac{1}{2} g''(\zeta_n).
\]

6. Show that if $x_n, x_{n-1}$ are sufficiently close to $x^*$, there exists a constant $C > 0$ such that
\[
|x_{n+1} - x^*| \le C |x_n - x^*| \, |x_{n-1} - x^*|.
\]

7. Show that the order of convergence of the linear interpolation method is at least $\alpha = \frac{1 + \sqrt{5}}{2} = 1.618\ldots$.

16.4. Let $g : \mathbb{R}^N \to \mathbb{R}^N$ be twice continuously differentiable. Assume that $g(x^*) = 0$ and the Jacobian $Dg(x^*)$ is regular. Show that the Newton algorithm converges to $x^*$ double exponentially fast if the initial value is close enough to $x^*$. (Hint: use the mean value inequality (Proposition 8.7).)


Chapter 17

Polynomial approximation

Polynomials are useful for approximating smooth functions because they can be differentiated and integrated analytically. This chapter studies the polynomial approximation of a one-variable function.

17.1 Lagrange interpolation

Since a degree $n - 1$ polynomial is determined by $n$ coefficients, once we specify $n$ points on the $xy$ plane, there exists (at most) one polynomial that passes through these points. Lagrange interpolation gives an explicit formula for the interpolating polynomial.

Proposition 17.1. Let $x_1 < \cdots < x_n$ and define the $k$-th Lagrange polynomial
\[
L_k(x) = \frac{\prod_{l \ne k}(x - x_l)}{\prod_{l \ne k}(x_k - x_l)}
\]
for $k = 1, \ldots, n$. Then
\[
p(x) = \sum_{k=1}^n y_k L_k(x)
\]
is the unique polynomial of degree up to $n - 1$ satisfying $p(x_k) = y_k$ for $k = 1, \ldots, n$.

Proof. By the definition of $L_k(x)$, we have $L_k(x_l) = \delta_{kl}$, where $\delta_{kl} = 1$ if $k = l$ and $\delta_{kl} = 0$ if $k \ne l$, which is called Kronecker's delta (https://en.wikipedia.org/wiki/Kronecker_delta). Therefore for all $l$, we have
\[
p(x_l) = \sum_{k=1}^n y_k L_k(x_l) = \sum_{k=1}^n y_k \delta_{kl} = y_l.
\]
Clearly $L_k(x)$ is a polynomial of degree $n - 1$, so $p(x)$ is a polynomial of degree up to $n - 1$. Uniqueness follows because the difference of two such polynomials has degree at most $n - 1$ but vanishes at the $n$ points $x_1, \ldots, x_n$, and hence is identically zero.
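
A minimal Python sketch evaluating the Lagrange interpolating polynomial of Proposition 17.1 follows; the test function and nodes are illustrative assumptions.
\begin{verbatim}
import numpy as np

def lagrange_interp(x_nodes, y_nodes, x):
    """Evaluate the Lagrange interpolating polynomial p with p(x_k) = y_k at x."""
    x_nodes = np.asarray(x_nodes, dtype=float)
    total = 0.0
    for k, (xk, yk) in enumerate(zip(x_nodes, y_nodes)):
        others = np.delete(x_nodes, k)
        Lk = np.prod((x - others) / (xk - others))   # k-th Lagrange polynomial L_k(x)
        total += yk * Lk
    return total

# example (illustrative): interpolate f(x) = exp(x) at 4 nodes and evaluate at 0.3
nodes = np.array([0.0, 0.5, 1.0, 1.5])
values = np.exp(nodes)
print(lagrange_interp(nodes, values, 0.3), np.exp(0.3))
\end{verbatim}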

If we interpolate a function $f(x)$ at the points $x_1 < \cdots < x_n$ by a degree $n - 1$ polynomial, what is the approximation error? The following proposition gives an error bound if $f$ is sufficiently smooth.


Proposition 17.2. Let $f : \mathbb{R} \to \mathbb{R}$ be $C^n$ and let $p_{n-1}$ be the interpolating polynomial of $f$ at $x_1 < \cdots < x_n$. Then for any $x$, there exists $\xi$ in the convex hull of $\{x, x_1, \ldots, x_n\}$ such that
\[
f(x) - p_{n-1}(x) = \frac{f^{(n)}(\xi)}{n!} \prod_{k=1}^n (x - x_k). \tag{17.1}
\]

Proof. If $x = x_k$ for some $k$, then $f(x_k) - p_{n-1}(x_k) = 0$, so (17.1) is trivial. Suppose $x \ne x_k$ for all $k$ and let $I = \operatorname{co}\{x, x_1, \ldots, x_n\}$. For any $t \in I$, let $R(t) = f(t) - p_{n-1}(t)$ be the error term and define
\[
g(t) = R(t)S(x) - R(x)S(t),
\]
where $S(t) = \prod_{k=1}^n (t - x_k)$. Clearly $g(x) = 0$. Furthermore, since $R(x_k) = S(x_k) = 0$, we have $g(x_k) = 0$ for $k = 1, \ldots, n$. In general, if $g$ is differentiable and $g(a) = g(b) = 0$, by the mean value theorem (Proposition 3.2) there exists $c \in (a, b)$ such that $g'(c) = 0$. Applying this to the $n$ non-overlapping intervals with endpoints $x, x_1, \ldots, x_n$, there exist $n$ distinct points $y_1, \ldots, y_n$ between $x, x_1, \ldots, x_n$ such that $g'(y_k) = 0$ for $k = 1, \ldots, n$. Continuing this argument, there exists $\xi \in I$ such that $g^{(n)}(\xi) = 0$. But since $S$ is a degree $n$ polynomial with leading coefficient 1, we have $S^{(n)} = n!$, so
\[
0 = g^{(n)}(\xi) = R^{(n)}(\xi)S(x) - R(x)\, n!.
\]
Since $R(t) = f(t) - p_{n-1}(t)$ and $\deg p_{n-1} \le n - 1$, we obtain $R^{(n)}(\xi) = f^{(n)}(\xi)$. Therefore
\[
f(x) - p_{n-1}(x) = R(x) = \frac{1}{n!} f^{(n)}(\xi) S(x) = \frac{f^{(n)}(\xi)}{n!} \prod_{k=1}^n (x - x_k).
\]

17.2 Chebyshev polynomials

If we want to interpolate a function on an interval by a polynomial but we are free to choose the interpolation nodes $x_1, \ldots, x_n$, how should we choose them? By mapping the interval with an affine function, without loss of generality we may assume that the interval is $[-1, 1]$. Since $f^{(n)}(\xi)$ in (17.1) depends on the particular function $f$ but $\prod_{k=1}^n (x - x_k)$ does not, it is natural to find $x_1, \ldots, x_n$ so as to minimize
\[
\max_{x \in [-1,1]} \left| \prod_{k=1}^n (x - x_k) \right|.
\]
Chebyshev (https://en.wikipedia.org/wiki/Pafnuty_Chebyshev) solved this problem a long time ago.

The degree $n$ Chebyshev polynomial $T_n(x)$ is obtained by expanding $\cos n\theta$ as a degree $n$ polynomial of $\cos\theta$ and setting $x = \cos\theta$. For instance,
\begin{align*}
\cos 0\theta &= 1 &&\implies T_0(x) = 1, \\
\cos\theta &= \cos\theta &&\implies T_1(x) = x, \\
\cos 2\theta &= 2\cos^2\theta - 1 &&\implies T_2(x) = 2x^2 - 1,
\end{align*}


and so on. In general, adding
\begin{align*}
\cos(n+1)\theta &= \cos n\theta \cos\theta - \sin n\theta \sin\theta, \\
\cos(n-1)\theta &= \cos n\theta \cos\theta + \sin n\theta \sin\theta,
\end{align*}
and setting $x = \cos\theta$, we obtain
\[
T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x). \tag{17.2}
\]
The coefficients of Chebyshev polynomials can be easily computed by iterating (17.2), as in the sketch below.
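
The following Python sketch computes the monomial coefficients of $T_0, \ldots, T_n$ directly from the recurrence (17.2); the use of NumPy arrays for the coefficient vectors is an implementation assumption.
\begin{verbatim}
import numpy as np

def chebyshev_coefficients(n):
    """Monomial coefficients (constant term first) of T_0, ..., T_n
    computed from the recurrence T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)."""
    T = [np.array([1.0]), np.array([0.0, 1.0])]      # T_0 = 1, T_1 = x
    for _ in range(1, n):
        shifted = np.concatenate(([0.0], 2.0 * T[-1]))   # coefficients of 2x * T_k
        prev = np.concatenate((T[-2], np.zeros(len(shifted) - len(T[-2]))))
        T.append(shifted - prev)
    return T[: n + 1]

for k, c in enumerate(chebyshev_coefficients(4)):
    print(f"T_{k}:", c)    # e.g. T_2: [-1. 0. 2.], i.e. 2x^2 - 1
\end{verbatim}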

Theorem 17.3. The solution to
\[
\min_{x_1 \ge \cdots \ge x_n} \max_{x \in [-1,1]} \left| \prod_{k=1}^n (x - x_k) \right|
\]
is given by $x_k = \cos\frac{2k-1}{2n}\pi$, in which case $\prod_{k=1}^n (x - x_k) = 2^{1-n} T_n(x)$.

Proof. Let $p(x) = 2^{1-n} T_n(x)$. By the recursive formula (17.2), the leading coefficient of $T_n(x)$ is $2^{n-1}$. Therefore the leading coefficient of $p(x)$ is 1. Since $p(\cos\theta) = 2^{1-n} \cos n\theta$, clearly
\[
\sup_{x \in [-1,1]} |p(x)| = \sup_{\theta \in [-\pi, \pi]} 2^{1-n} |\cos n\theta| = 2^{1-n}.
\]
Suppose that there exists a degree $n$ polynomial $q(x)$ with leading coefficient 1 such that $\sup_{x \in [-1,1]} |q(x)| < 2^{1-n}$. Again since $p(\cos\theta) = 2^{1-n} \cos n\theta$, we have $p(x) = (-1)^k 2^{1-n}$ at $x = y_k = \cos(k\pi/n)$, where $k = 0, 1, \ldots, n$. Since $|q(x)| < 2^{1-n}$ for all $x \in [-1, 1]$, by the intermediate value theorem there exist $z_1, \ldots, z_n$ between $y_0, \ldots, y_n$ such that $p(z_k) - q(z_k) = 0$. But since $p, q$ are polynomials of degree $n$ with leading coefficient 1, $r(x) := p(x) - q(x)$ is a polynomial of degree up to $n - 1$. Since $r(z_k) = 0$ for $k = 1, \ldots, n$, it must be that $r(x) \equiv 0$, or $p \equiv q$, which is a contradiction. Therefore $\prod_{k=1}^n (x - x_k) = 2^{1-n} T_n(x)$, so $x_k = \cos\frac{2k-1}{2n}\pi$ for $k = 1, \ldots, n$.

17.3 Projection

In economics, we often need to solve functional equations. For instance, in a dynamic programming problem, we need to solve for either the value function or the policy function. The projection method (a standard reference is Judd, 1992) approximates the policy function (or whatever object you want to solve for) on some compact set by a polynomial.

By Theorem 17.3, if you want to approximate a smooth function $f$ on $[-1, 1]$ by a degree $N - 1$ polynomial, it is ``optimal'' to interpolate $f$ at the roots of the degree $N$ Chebyshev polynomial. The idea of the projection method is to approximate the policy function $f$ by a linear combination of Chebyshev polynomials,
\[
f(x) \approx \hat{f}(x) = \sum_{n=0}^{N-1} a_n T_n(x),
\]
and to determine the coefficients $\{a_n\}_{n=0}^{N-1}$ so as to make the functional equation (that you want to solve) hold at the Chebyshev nodes.

It is easier to see how things work by looking at a simple example. Suppose you want to solve the ordinary differential equation (ODE)
\[
y'(t) = y(t) \tag{17.3}
\]
with initial condition $y(0) = 1$. Of course the solution is $y(t) = e^t$, but let us pretend that we do not know the solution and solve it numerically. Suppose we want to compute a numerical solution for $t \in [0, T]$, where $T > 0$ is some upper bound. We can proceed as follows.

1. Map $[0, T]$ to $[-1, 1]$ by the affine transformation $t \mapsto \frac{2t - T}{T}$.

2. Approximate $y(t)$ by
\[
\hat{y}(t) = \sum_{n=0}^{N-1} a_n T_n\left(\frac{2t - T}{T}\right),
\]
where $\{a_n\}_{n=0}^{N-1}$ are unknown coefficients.

3. Determine $\{a_n\}_{n=0}^{N-1}$ by setting $\hat{y}'(t) = \hat{y}(t)$ at the points $t$ corresponding to the roots of the degree $N$ Chebyshev polynomial; more precisely, find $t_n$ by solving $\frac{2t_n - T}{T} = \cos\left(\frac{2n-1}{2N}\pi\right)$ for $n = 1, \ldots, N$.

4. In this example, we must also impose the initial condition $y(0) = 1$, so for example we can minimize the sum of squared residuals at the Chebyshev nodes:
\begin{align}
\underset{\{a_n\}_{n=0}^{N-1}}{\text{minimize}} \quad & \sum_{n=1}^{N} \left(\hat{y}'(t_n) - \hat{y}(t_n)\right)^2 \tag{17.4a} \\
\text{subject to} \quad & \hat{y}(0) = 1. \tag{17.4b}
\end{align}

In general, when we solve a minimization problem such as (17.4) numerically, for numerical stability it is a good idea to start from a low-order approximation (say $N - 1 = 2$) and compute the solution for progressively higher orders, using the previous solution as an initial guess. (Implementing numerical methods is an art and often requires a lot of trial and error.) Figure 17.1 shows the $\log_{10}$ relative errors when $T = 4$ and $N - 1 = 3, 4, \ldots$. We can see that the relative errors become smaller as we increase the degree of polynomial approximation.
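
A minimal Python sketch of steps 1--4 follows, using NumPy's Chebyshev utilities and SciPy's SLSQP solver for the constrained least squares problem (17.4). The number of coefficients and the horizon are illustrative assumptions, and this sketch does not attempt to reproduce Figure 17.1 exactly.
\begin{verbatim}
import numpy as np
from numpy.polynomial import chebyshev as C
from scipy.optimize import minimize

# Projection sketch for y'(t) = y(t), y(0) = 1 on [0, T] (illustrative parameters)
T, N = 4.0, 6                                   # N coefficients, degree N - 1
t_nodes = (T / 2) * (np.cos((2 * np.arange(1, N + 1) - 1) / (2 * N) * np.pi) + 1)

def y_hat(t, a):                                # y(t) ~ sum_n a_n T_n(2t/T - 1)
    return C.chebval(2 * t / T - 1, a)

def dy_hat(t, a):                               # chain rule: d/dt = (2/T) d/dx
    return (2 / T) * C.chebval(2 * t / T - 1, C.chebder(a))

def ssr(a):                                     # sum of squared residuals (17.4a)
    r = dy_hat(t_nodes, a) - y_hat(t_nodes, a)
    return np.sum(r**2)

a0 = np.zeros(N); a0[0] = 1.0                   # initial guess
res = minimize(ssr, a0, method="SLSQP",
               constraints=[{"type": "eq", "fun": lambda a: y_hat(0.0, a) - 1.0}])

t_check = np.linspace(0, T, 5)
print(np.log10(np.abs(y_hat(t_check, res.x) / np.exp(t_check) - 1)))
\end{verbatim}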

For more details on the projection method, see Pohl et al. (2018), which has a nice application to solving asset pricing models.

Problems

17.1. Using your favorite programming language, write a code that computes the coefficients of a Chebyshev polynomial of a given degree.

17.2. Using your favorite programming language, implement the projection method for solving the ordinary differential equation (17.3) and replicate Figure 17.1.


Figure 17.1: log10 relative errors for solving the ODE (17.3).


Chapter 18

Quadrature

Many economic problems involve maximizing an expected value. Unless the distribution is discrete, expectations become integrals, and we cannot compute them explicitly except in special cases. Therefore we need numerical methods to evaluate integrals, which are called quadrature (or numerical integration) rules.

A typical quadrature formula has the form
\[
\int_a^b f(x)\,dx \approx \sum_{n=1}^N w_n f(x_n), \tag{18.1}
\]
where $f$ is a general integrand, $\{x_n\}_{n=1}^N$ are nodes, and $\{w_n\}_{n=1}^N$ are weights of the quadrature rule. In this chapter we cover the most basic theory of quadrature. See Davis and Rabinowitz (1984) for a more complete textbook treatment.

18.1 Newton-Cotes quadrature

The simplest quadrature rule is to divide the interval $[a, b]$ into $N - 1$ evenly spaced subintervals (so $x_n = a + \frac{n-1}{N-1}(b - a)$ for $n = 1, \ldots, N$) and choose the weights $\{w_n\}_{n=1}^N$ so that one can integrate all polynomials of degree $N - 1$ or less exactly. This quadrature rule is known as the $N$-point Newton-Cotes rule. Since we can map the interval $[0, 1]$ to $[a, b]$ through the linear transformation $x \mapsto a + (b - a)x$, without loss of generality let us assume $a = 0$ and $b = 1$. We consider several examples.

18.1.1 Trapezoidal rule (N = 2)

The 2-point Newton-Cotes rule is known as the trapezoidal rule. In this case we have $x_n = 0, 1$, and we choose $w_1, w_2$ to integrate a linear function exactly. Therefore requiring that (18.1) holds exactly for $f(x) = 1, x$, we obtain
\begin{align*}
1 &= \int_0^1 1\,dx = w_1 + w_2, \\
\frac{1}{2} &= \int_0^1 x\,dx = w_2.
\end{align*}
Solving these equations, we obtain $w_1 = w_2 = \frac{1}{2}$. Changing the interval from $[0, 1]$ to $[a, b]$, the trapezoidal rule becomes
\[
\int_a^b f(x)\,dx \approx \frac{b - a}{2}(f(a) + f(b)). \tag{18.2}
\]

Let us estimate the error of this approximation. Let $p(x)$ be the degree 1 interpolating polynomial of $f$ at $x = a, b$. Since $p$ agrees with $f$ at $a, b$, clearly
\[
\int_a^b p(x)\,dx = \frac{b - a}{2}(f(a) + f(b)).
\]
Therefore by Proposition 17.2, we obtain
\[
\int_a^b f(x)\,dx - \frac{b - a}{2}(f(a) + f(b)) = \int_a^b (f(x) - p(x))\,dx = \int_a^b \frac{f''(\xi(x))}{2}(x - a)(x - b)\,dx,
\]
where $\xi(x) \in (a, b)$. Since $(x - a)(x - b) < 0$ on $(a, b)$, by the mean value theorem for Riemann-Stieltjes integrals, there exists $c \in (a, b)$ such that
\[
\int_a^b \frac{f''(\xi(x))}{2}(x - a)(x - b)\,dx = \frac{f''(c)}{2} \int_a^b (x - a)(x - b)\,dx = -\frac{f''(c)}{12}(b - a)^3.
\]
Therefore we can estimate the error in (18.2) as
\[
\left| \int_a^b f(x)\,dx - \frac{b - a}{2}(f(a) + f(b)) \right| \le \frac{\|f''\|}{12}(b - a)^3, \tag{18.3}
\]
where $\|\cdot\|$ denotes the sup norm on $[a, b]$.

18.1.2 Simpson’s rule (N = 3)

The 3-point Newton-Cotes rule is known as Simpson's rule. In this case the quadrature nodes are $x_n = 0, 1/2, 1$, and we choose the weights $w_1, w_2, w_3$ so as to integrate a quadratic function exactly. Therefore requiring that (18.1) holds exactly for $f(x) = 1, x, x^2$, we obtain
\begin{align*}
1 &= \int_0^1 1\,dx = w_1 + w_2 + w_3, \\
\frac{1}{2} &= \int_0^1 x\,dx = \frac{1}{2} w_2 + w_3, \\
\frac{1}{3} &= \int_0^1 x^2\,dx = \frac{1}{4} w_2 + w_3.
\end{align*}
Solving these equations, we obtain $w_1 = w_3 = \frac{1}{6}$ and $w_2 = \frac{2}{3}$. Changing the interval from $[0, 1]$ to $[a, b]$, Simpson's rule becomes
\[
\int_a^b f(x)\,dx \approx \frac{b - a}{6}\left(f(a) + 4f\left(\frac{a + b}{2}\right) + f(b)\right). \tag{18.4}
\]


Interestingly, since
\[
\frac{1}{4} = \int_0^1 x^3\,dx = \frac{1}{8} w_2 + w_3,
\]
Simpson's rule actually integrates polynomials of degree 3 exactly, even though it is not designed to do so.

To estimate the error of Simpson's rule (18.4), take any point $d \in (a, b)$ and let $p(x)$ be a degree 3 interpolating polynomial of $f$ at $x = a, \frac{a+b}{2}, b, d$. Since Simpson's rule integrates degree 3 polynomials exactly, by Proposition 17.2 we have
\begin{align*}
\int_a^b f(x)\,dx - \frac{b - a}{6}\left(f(a) + 4f\left(\frac{a + b}{2}\right) + f(b)\right) &= \int_a^b (f(x) - p(x))\,dx \\
&= \int_a^b \frac{f^{(4)}(\xi(x))}{4!}(x - a)\left(x - \frac{a + b}{2}\right)(x - b)(x - d)\,dx.
\end{align*}
Since $d \in (a, b)$ is arbitrary, we can take $d = \frac{a+b}{2}$. Since
\[
(x - a)\left(x - \frac{a + b}{2}\right)^2 (x - b) < 0
\]
on $(a, b)$ almost everywhere, as before we can apply the mean value theorem. Using the change of variable $x = \frac{a+b}{2} + \frac{b-a}{2}t$, we can compute
\[
\int_a^b (x - a)\left(x - \frac{a + b}{2}\right)^2 (x - b)\,dx = \left(\frac{b - a}{2}\right)^5 \int_{-1}^1 (t + 1)t^2(t - 1)\,dt = -\frac{1}{120}(b - a)^5.
\]
Since $4! = 24$ and $24 \times 120 = 2880$, the integration error of (18.4) satisfies
\[
\left| \int_a^b f(x)\,dx - \frac{b - a}{6}\left(f(a) + 4f\left(\frac{a + b}{2}\right) + f(b)\right) \right| \le \frac{\|f^{(4)}\|}{2880}(b - a)^5. \tag{18.5}
\]

18.1.3 Compound rule

Newton-Cotes rules with $N \ge 4$ are almost never used because beyond some order $N$, some of the weights $\{w_n\}_{n=1}^N$ become negative, which introduces rounding errors. One way to avoid this problem is to divide the interval $[a, b]$ into $N$ evenly spaced subintervals and apply the trapezoidal rule or Simpson's rule to each subinterval. This method is known as the compound (or composite) rule.

If you apply the trapezoidal rule to $N$ subintervals, then there are $N + 1$ endpoints. Letting $x_n = n/N$ for $n = 0, 1, \ldots, N$, the formula for $[0, 1]$ is
\begin{align*}
\int_0^1 f(x)\,dx &\approx \sum_{n=1}^N \frac{1}{2N}(f(x_{n-1}) + f(x_n)) \\
&= \frac{1}{2N}\left(f(x_0) + 2f(x_1) + \cdots + 2f(x_{N-1}) + f(x_N)\right).
\end{align*}


(Just remember that the relative weights are 1 at the endpoints and 2 in between.) Since each subinterval has length $1/N$ and there are $N$ subintervals, the error of the $(N+1)$-point trapezoidal rule is of order $\frac{\|f''\|}{12} N^{-2}$.

If you apply Simpson's rule, then there are 3 points on each subinterval, of which there are $N$, and the $N - 1$ interior endpoints are counted twice. Therefore the total number of points is $3N - (N - 1) = 2N + 1$. Letting $x_n = n/(2N)$ for $n = 0, 1, \ldots, 2N$, the formula for $[0, 1]$ is
\begin{align*}
\int_0^1 f(x)\,dx &\approx \sum_{n=1}^N \frac{1}{6N}(f(x_{2n-2}) + 4f(x_{2n-1}) + f(x_{2n})) \\
&= \frac{1}{6N}\left(f(x_0) + 4f(x_1) + 2f(x_2) + \cdots + 4f(x_{2N-1}) + f(x_{2N})\right).
\end{align*}
(Just remember that the relative weights are 1 at the endpoints, and they alternate like $4, 2, 4, 2, \ldots, 4, 2, 4$ in between.) Since each subinterval has length $1/N$ and there are $N$ subintervals, the error of the $(2N+1)$-point Simpson's rule is of order $\frac{\|f^{(4)}\|}{2880} N^{-4}$.

Since the quadrature weights are given explicitly for the trapezoidal and Simpson's rules, it is straightforward to write code for computing numerical integrals; a minimal sketch is given after the tables below. Tables 18.1 and 18.2 show the $\log_{10}$ relative errors of integrals over the interval $[0, 1]$ ($\log_{10}|\hat{I}/I - 1|$, where $I$ is the true integral and $\hat{I}$ is the numerical one) for several functions when we use the $N$-point compound trapezoidal and Simpson's rules. As the above error analysis suggests, errors tend to be smaller when the integrand is smoother (has higher order derivatives). Furthermore, Simpson's rule is more accurate than the trapezoidal rule.

Table 18.1: log10 relative errors of compound trapezoidal rule.

# points   x^{1/2}   x^{3/2}   x^{5/2}   x^{7/2}   x^{9/2}   e^x
3          -1.0238   -1.1743   -0.7343   -0.4896   -0.3041   -1.6830
5          -1.4550   -1.7558   -1.3394   -1.0875   -0.8937   -2.2838
9          -1.8926   -2.3438   -1.9427   -1.6885   -1.4928   -2.8855
17         -2.3346   -2.9361   -2.5452   -2.2902   -2.0941   -3.4874
33         -2.7795   -3.5314   -3.1474   -2.8922   -2.6960   -4.0895
65         -3.2264   -4.1287   -3.7495   -3.4943   -3.2980   -4.6915

Table 18.2: log10 relative errors of compound Simpson's rule.

# points   x^{1/2}   x^{3/2}   x^{5/2}   x^{7/2}   x^{9/2}   e^x
3          -1.3676   -2.2275   -2.3780   -1.8192   -1.1040   -3.4722
5          -1.8179   -2.9667   -3.3705   -2.9823   -2.3199   -4.6667
9          -2.2691   -3.7142   -4.3841   -4.1584   -3.5289   -5.8684
17         -2.7206   -4.4649   -5.4112   -5.3435   -4.7350   -7.0720
33         -3.1722   -5.2168   -6.4470   -6.5346   -5.9399   -8.2759
65         -3.6237   -5.9692   -7.4884   -7.7297   -7.1443   -9.4800
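
The following Python sketch implements the compound trapezoidal and Simpson's rules described above; the test integrand and number of subintervals are illustrative assumptions.
\begin{verbatim}
import numpy as np

def trapezoid(f, a, b, N):
    """Compound trapezoidal rule with N subintervals (N + 1 points)."""
    x = np.linspace(a, b, N + 1)
    w = np.full(N + 1, 1.0); w[0] = w[-1] = 0.5        # relative weights 1/2, 1, ..., 1, 1/2
    return (b - a) / N * np.sum(w * f(x))

def simpson(f, a, b, N):
    """Compound Simpson's rule with N subintervals (2N + 1 points)."""
    x = np.linspace(a, b, 2 * N + 1)
    w = np.ones(2 * N + 1); w[1:-1:2] = 4.0; w[2:-1:2] = 2.0   # 1, 4, 2, 4, ..., 4, 1
    return (b - a) / (6 * N) * np.sum(w * f(x))

# illustrative check against the exact integral of x^{3/2} on [0, 1], which is 2/5
exact = 2 / 5
for rule in (trapezoid, simpson):
    approx = rule(lambda x: x**1.5, 0.0, 1.0, 16)
    print(rule.__name__, np.log10(abs(approx / exact - 1)))
\end{verbatim}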


18.2 Gaussian quadrature

In the Newton-Cotes quadrature, we assume that the nodes are evenly spaced, but of course there is no particular reason to do so. Can we do better by choosing the quadrature nodes optimally? In general, consider the integral
\[
\int_a^b w(x) f(x)\,dx, \tag{18.6}
\]
where $-\infty \le a < b \le \infty$ are the endpoints of integration, $w(x) > 0$ is some (fixed) weighting function, and $f$ is a general integrand. A typical example is $a = -\infty$, $b = \infty$, and $w(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2/2\sigma^2}$, in which case we want to compute the expectation $\operatorname{E}[f(X)]$ when the random variable $X$ is normally distributed as $X \sim N(\mu, \sigma^2)$.

The main result in this section is that for any $N$, we can choose nodes $\{x_n\}_{n=1}^N$ and weights $\{w_n\}_{n=1}^N$ such that we can integrate all polynomials of degree up to $2N - 1$ exactly using a quadrature formula of the form (18.1); this is known as the Gaussian quadrature. In the discussion below, assume that $\int_a^b w(x) x^n\,dx$ exists for all $n \ge 0$, where $-\infty \le a < b \le \infty$ are fixed. For functions $f, g$, define the inner product $(f, g)$ by
\[
(f, g) = \int_a^b w(x) f(x) g(x)\,dx. \tag{18.7}
\]
As usual, define the norm of $f$ by $\|f\| = \sqrt{(f, f)}$. For notational simplicity, let us omit $a, b$, so $\int$ means $\int_a^b$.

The first step is to construct orthogonal polynomials $\{p_n(x)\}_{n=0}^N$ corresponding to the inner product (18.7).

Definition 18.1 (Orthogonal polynomial). The polynomials $\{p_n(x)\}_{n=0}^N$ are called orthogonal if (i) $\deg p_n = n$ and the leading coefficient of $p_n$ is 1, and (ii) for all $m \ne n$, we have $(p_m, p_n) = 0$.

Some authors require that the polynomials be orthonormal, so $(p_n, p_n) = 1$. Here we normalize the polynomials by requiring that the leading coefficient be 1, which is useful for computation. The following three-term recurrence relation (TTRR) shows the existence of orthogonal polynomials and provides an explicit algorithm for computing them.

Proposition 18.2 (Three-term recurrence relation, TTRR). Let $p_0(x) = 1$,
\[
p_1(x) = x - \frac{(x p_0, p_0)}{\|p_0\|^2},
\]
and for $n \ge 1$ define
\[
p_{n+1}(x) = \left(x - \frac{(x p_n, p_n)}{\|p_n\|^2}\right) p_n(x) - \frac{\|p_n\|^2}{\|p_{n-1}\|^2} p_{n-1}(x). \tag{18.8}
\]
Then $p_n(x)$ is the degree $n$ orthogonal polynomial.

Proof. Let us show by induction on $n$ that (i) $p_n$ is a degree $n$ polynomial with leading coefficient 1, and (ii) $(p_n, p_m) = 0$ for all $m < n$. The claim is trivial for $n = 0$. For $n = 1$, by construction $p_1$ is a degree 1 polynomial with leading coefficient 1, and since $p_0(x) = 1$, we obtain
\[
(p_1, p_0) = \left(\left(x - \frac{(x p_0, p_0)}{\|p_0\|^2}\right) p_0, p_0\right) = (x p_0, p_0) - (x p_0, p_0) = 0.
\]
Suppose the claim holds up to $n$. Then for $n + 1$, by (18.8) the leading coefficient of $p_{n+1}$ is the same as that of $x p_n$, which is 1. If $m = n$, then
\[
(p_{n+1}, p_n) = \left(\left(x - \frac{(x p_n, p_n)}{\|p_n\|^2}\right) p_n - \frac{\|p_n\|^2}{\|p_{n-1}\|^2} p_{n-1}, p_n\right) = (x p_n, p_n) - (x p_n, p_n) - \frac{\|p_n\|^2}{\|p_{n-1}\|^2} (p_{n-1}, p_n) = 0.
\]
If $m = n - 1$, then
\begin{align*}
(p_{n+1}, p_{n-1}) &= \left(\left(x - \frac{(x p_n, p_n)}{\|p_n\|^2}\right) p_n - \frac{\|p_n\|^2}{\|p_{n-1}\|^2} p_{n-1}, p_{n-1}\right) \\
&= (x p_n, p_{n-1}) - \frac{(x p_n, p_n)}{\|p_n\|^2} (p_n, p_{n-1}) - \|p_n\|^2 = (p_n, x p_{n-1}) - \|p_n\|^2.
\end{align*}
Since the leading coefficients of $p_n, p_{n-1}$ are 1, we can write $x p_{n-1}(x) = p_n(x) + q(x)$, where $q(x)$ is a polynomial of degree at most $n - 1$. Clearly $q$ can be expressed as a linear combination of $p_0, p_1, \ldots, p_{n-1}$, so $(p_n, q) = 0$. Therefore
\[
(p_{n+1}, p_{n-1}) = (p_n, p_n + q) - \|p_n\|^2 = \|p_n\|^2 + (p_n, q) - \|p_n\|^2 = 0.
\]
Finally, if $m < n - 1$, then
\begin{align*}
(p_{n+1}, p_m) &= \left(\left(x - \frac{(x p_n, p_n)}{\|p_n\|^2}\right) p_n - \frac{\|p_n\|^2}{\|p_{n-1}\|^2} p_{n-1}, p_m\right) \\
&= (x p_n, p_m) - \frac{(x p_n, p_n)}{\|p_n\|^2} (p_n, p_m) - \frac{\|p_n\|^2}{\|p_{n-1}\|^2} (p_{n-1}, p_m) = (p_n, x p_m) = 0
\end{align*}
because $x p_m$ is a polynomial of degree $1 + m < n$.

The following lemma shows that a degree $n$ orthogonal polynomial has exactly $n$ real roots (so they are all simple).

Lemma 18.3. $p_n(x)$ has exactly $n$ real roots on $(a, b)$.

Proof. By the fundamental theorem of algebra, $p_n(x)$ has exactly $n$ roots in $\mathbb{C}$. Suppose to the contrary that $p_n(x)$ has fewer than $n$ real roots on $(a, b)$. Let $x_1, \ldots, x_k$ ($k < n$) be those roots at which $p_n(x)$ changes its sign and let $q(x) = (x - x_1) \cdots (x - x_k)$. Since $p_n(x) q(x)$ has a constant sign but is not identically equal to zero, we have
\[
(p_n, q) = \int w(x) p_n(x) q(x)\,dx \ne 0
\]
because $w(x) > 0$. On the other hand, since $\deg q = k < n$, we have $(p_n, q) = 0$, which is a contradiction.

The following theorem shows that using the $N$ roots of the degree $N$ orthogonal polynomial $p_N(x)$ as quadrature nodes and choosing specific weights, we can integrate all polynomials of degree up to $2N - 1$ exactly. Thus the Gaussian quadrature always exists.

Theorem 18.4 (Gaussian quadrature). Let $a < x_1 < \cdots < x_N < b$ be the $N$ roots of the degree $N$ orthogonal polynomial $p_N$ and define
\[
w_n = \int w(x) L_n(x)\,dx \tag{18.9}
\]
for $n = 1, \ldots, N$, where
\[
L_n(x) = \prod_{m \ne n} \frac{x - x_m}{x_n - x_m}
\]
is the degree $N - 1$ polynomial that takes the value 1 at $x_n$ and 0 at $x_m$ ($m \in \{1, \ldots, N\} \setminus \{n\}$). Then
\[
\int w(x) p(x)\,dx = \sum_{n=1}^N w_n p(x_n) \tag{18.10}
\]
for all polynomials $p(x)$ of degree up to $2N - 1$.

Proof. Since $\deg p \le 2N - 1$ and $\deg p_N = N$, we can write
\[
p(x) = p_N(x) q(x) + r(x),
\]
where $\deg q, \deg r \le N - 1$. Since $q$ can be expressed as a linear combination of orthogonal polynomials of degree up to $N - 1$, we have $(p_N, q) = 0$. Hence
\[
\int w(x) p(x)\,dx = (p_N, q) + \int w(x) r(x)\,dx = \int w(x) r(x)\,dx.
\]
On the other hand, since $\{x_n\}_{n=1}^N$ are roots of $p_N$, we have
\[
p(x_n) = p_N(x_n) q(x_n) + r(x_n) = r(x_n)
\]
for all $n$, so in particular
\[
\sum_{n=1}^N w_n p(x_n) = \sum_{n=1}^N w_n r(x_n).
\]
Therefore it suffices to show (18.10) for polynomials $r$ of degree up to $N - 1$. Since $\deg r \le N - 1$ and $\deg L_n = N - 1$, by Proposition 17.1 we have
\[
r(x) = \sum_{n=1}^N r(x_n) L_n(x)
\]
identically. Since $r$ can be represented as a linear combination of the $L_n$'s, it suffices to show (18.10) for all $L_n$'s. But since by (18.9) we have
\[
\int w(x) L_n(x)\,dx = w_n = \sum_{m=1}^N w_m \delta_{mn} = \sum_{m=1}^N w_m L_n(x_m),
\]
the claim is true.


In practice, how can we compute the nodes $\{x_n\}_{n=1}^N$ and weights $\{w_n\}_{n=1}^N$ of the $N$-point Gaussian quadrature established in Theorem 18.4? The solution is given by the following Golub-Welsch algorithm.

Theorem 18.5 (Golub and Welsch, 1969). For each $n \ge 1$, define $\alpha_n, \beta_n$ by
\[
\alpha_n = \frac{(x p_{n-1}, p_{n-1})}{\|p_{n-1}\|^2}, \qquad \beta_n = \frac{\|p_n\|}{\|p_{n-1}\|} > 0.
\]
Define the $N \times N$ symmetric tridiagonal matrix
\[
T_N =
\begin{pmatrix}
\alpha_1 & \beta_1 & 0 & \cdots & 0 \\
\beta_1 & \alpha_2 & \beta_2 & \ddots & \vdots \\
0 & \beta_2 & \alpha_3 & \ddots & 0 \\
\vdots & \ddots & \ddots & \ddots & \beta_{N-1} \\
0 & \cdots & 0 & \beta_{N-1} & \alpha_N
\end{pmatrix}. \tag{18.11}
\]
Then the Gaussian quadrature nodes $\{x_n\}_{n=1}^N$ are the eigenvalues of $T_N$. Letting $v_n = (v_{n1}, \ldots, v_{nN})'$ be an eigenvector of $T_N$ corresponding to the eigenvalue $x_n$, the weights $\{w_n\}_{n=1}^N$ in (18.9) are equal to
\[
w_n = \frac{v_{n1}^2}{\|v_n\|^2} \int w(x)\,dx > 0. \tag{18.12}
\]

Proof. By (18.8) and the definition of $\alpha_n, \beta_n$, for all $n \ge 0$ we have
\[
p_{n+1}(x) = (x - \alpha_{n+1}) p_n(x) - \beta_n^2 p_{n-1}(x).
\]
Note that this is true for $n = 0$ by defining $p_{-1}(x) = 0$ and $\beta_0 = 0$. For each $n$, let $p_n^*(x) = p_n(x)/\|p_n\|$ be the normalized orthogonal polynomial. Then the above equation becomes
\[
\|p_{n+1}\| p_{n+1}^*(x) = \|p_n\| (x - \alpha_{n+1}) p_n^*(x) - \|p_{n-1}\| \beta_n^2 p_{n-1}^*(x).
\]
Dividing both sides by $\|p_n\| > 0$, using the definition of $\beta_n, \beta_{n+1}$, and rearranging terms, we obtain
\[
\beta_n p_{n-1}^*(x) + \alpha_{n+1} p_n^*(x) + \beta_{n+1} p_{n+1}^*(x) = x p_n^*(x).
\]
In particular, setting $x = x_k$ (where $x_k$ is a root of $p_N$), we obtain
\[
\beta_n p_{n-1}^*(x_k) + \alpha_{n+1} p_n^*(x_k) + \beta_{n+1} p_{n+1}^*(x_k) = x_k p_n^*(x_k)
\]
for all $n$ and $k = 1, \ldots, N$. Since $\beta_0 = 0$ by definition and $p_N^*(x_k) = 0$ (since $x_k$ is a root of $p_N$ and hence of $p_N^* = p_N/\|p_N\|$), letting $P(x) = (p_0^*(x), \ldots, p_{N-1}^*(x))'$ and collecting the above equations into a vector, we obtain
\[
T_N P(x_k) = x_k P(x_k)
\]
for $k = 1, \ldots, N$. Define the $N \times N$ matrix $P$ by $P = (P(x_1), \ldots, P(x_N))$. Then $T_N P = P \operatorname{diag}(x_1, \ldots, x_N)$, so $x_1, \ldots, x_N$ are eigenvalues of $T_N$ provided that $P$ is invertible. Now since $\{p_n^*\}_{n=0}^{N-1}$ are normalized and the Gaussian quadrature integrates all polynomials of degree up to $2N - 1$ exactly, we have
\[
\delta_{mn} = (p_m^*, p_n^*) = \int w(x) p_m^*(x) p_n^*(x)\,dx = \sum_{k=1}^N w_k p_m^*(x_k) p_n^*(x_k)
\]
for $m, n \le N - 1$. Letting $W = \operatorname{diag}(w_1, \ldots, w_N)$, this equation becomes $P W P' = I$. Therefore $P, W$ are invertible and $x_1, \ldots, x_N$ are eigenvalues of $T_N$. Solving for $W$ and taking the inverse, we obtain
\[
W^{-1} = P' P \iff \frac{1}{w_n} = \sum_{k=0}^{N-1} p_k^*(x_n)^2 > 0
\]
for all $n$. To show (18.12), let $v_n$ be an eigenvector of $T_N$ corresponding to the eigenvalue $x_n$. Then $v_n = c P(x_n)$ for some constant $c \ne 0$. Taking the norm, we obtain
\[
\|v_n\|^2 = c^2 \|P(x_n)\|^2 = c^2 \sum_{k=0}^{N-1} p_k^*(x_n)^2 = \frac{c^2}{w_n} \iff w_n = \frac{c^2}{\|v_n\|^2}.
\]
Comparing the first element of $v_n = c P(x_n)$, and noting that $p_0(x) = 1$ and hence $p_0^* = p_0/\|p_0\| = 1/\|p_0\|$, we obtain
\[
c^2 = v_{n1}^2 \|p_0\|^2 = v_{n1}^2 \int w(x) p_0(x)^2\,dx = v_{n1}^2 \int w(x)\,dx,
\]
which implies (18.12).
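
The following Python sketch implements the Golub-Welsch construction of Theorem 18.5: the recurrence coefficients are computed by numerical integration (scipy.integrate.quad), the Jacobi matrix (18.11) is assembled, and the nodes and weights are read off from its eigendecomposition via (18.12). The function name and the Gauss-Legendre check are illustrative assumptions.
\begin{verbatim}
import numpy as np
from numpy.polynomial import polynomial as P
from scipy.integrate import quad

def golub_welsch(weight, a, b, N):
    """Gaussian quadrature nodes/weights for a weight function on (a, b),
    built from the TTRR (18.8) and the Jacobi matrix (18.11)."""
    inner = lambda p, q: quad(lambda x: weight(x) * P.polyval(x, p) * P.polyval(x, q), a, b)[0]
    mu0 = quad(weight, a, b)[0]                        # integral of the weight
    p_prev, p_cur = np.array([0.0]), np.array([1.0])   # p_{-1} = 0, p_0 = 1
    alphas, betas = [], []
    norm2_prev = 1.0
    for n in range(N):
        norm2 = inner(p_cur, p_cur)
        alpha = inner(P.polymul([0.0, 1.0], p_cur), p_cur) / norm2
        alphas.append(alpha)
        if n > 0:
            betas.append(np.sqrt(norm2 / norm2_prev))
        # monic recurrence: p_{n+1} = (x - alpha) p_n - (norm2/norm2_prev) p_{n-1}
        correction = (norm2 / norm2_prev) * p_prev if n > 0 else np.zeros(1)
        p_next = P.polysub(P.polymul([-alpha, 1.0], p_cur), correction)
        p_prev, p_cur, norm2_prev = p_cur, p_next, norm2
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    eigval, eigvec = np.linalg.eigh(T)                 # nodes = eigenvalues of T_N
    return eigval, mu0 * eigvec[0, :] ** 2             # weights from (18.12)

# illustrative check: Gauss-Legendre (w = 1 on (-1, 1)) against NumPy
x, w = golub_welsch(lambda x: 1.0, -1.0, 1.0, 5)
x_np, w_np = np.polynomial.legendre.leggauss(5)
print(np.max(np.abs(x - x_np)), np.max(np.abs(w - w_np)))
\end{verbatim}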

Below are a few examples of Gaussian quadrature rules. By doing a Google search, you can find subroutines in Matlab or whatever programming language you use that compute the nodes and weights of these quadratures.

Example 18.1. The case $(a, b) = (-1, 1)$, $w(x) = 1$ is known as the Gauss-Legendre quadrature.

Example 18.2. The case $(a, b) = (-1, 1)$, $w(x) = 1/\sqrt{1 - x^2}$ is known as the Gauss-Chebyshev quadrature. It is useful for computing Fourier coefficients (through the change of variable $x = \cos\theta$).

Example 18.3. The case $(a, b) = (-\infty, \infty)$, $w(x) = e^{-x^2}$ is known as the Gauss-Hermite quadrature, which is useful for computing expectations with respect to the normal distribution.

Example 18.4. The case $(a, b) = (0, \infty)$, $w(x) = e^{-x}$ is known as the Gauss-Laguerre quadrature, which is useful for computing expectations with respect to the exponential distribution.
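
As a short illustration of Example 18.3, the following Python sketch uses NumPy's built-in Gauss-Hermite nodes and weights to compute a normal expectation via the change of variable $x = \mu + \sqrt{2}\sigma y$; the parameter values and the lognormal-mean check are illustrative assumptions.
\begin{verbatim}
import numpy as np

mu, sigma, N = 1.0, 0.5, 10                  # illustrative parameters
y, w = np.polynomial.hermite.hermgauss(N)    # nodes/weights for weight e^{-y^2}

def expectation(f):
    # E[f(X)] for X ~ N(mu, sigma^2) via x = mu + sqrt(2) * sigma * y
    return np.sum(w * f(mu + np.sqrt(2) * sigma * y)) / np.sqrt(np.pi)

# check against the exact lognormal mean E[e^X] = exp(mu + sigma^2 / 2)
print(expectation(np.exp), np.exp(mu + sigma**2 / 2))
\end{verbatim}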

Table 18.3 shows the $\log_{10}$ relative errors when using the $N$-point Gauss-Legendre quadrature. Comparing with Tables 18.1 and 18.2, we can see that Gaussian quadrature is overwhelmingly more accurate than Newton-Cotes.

Table 18.3: log10 relative errors of Gauss-Legendre quadrature.

# points   x^{1/2}   x^{3/2}   x^{5/2}    x^{7/2}    x^{9/2}    e^x
3          -2.4237   -3.3289   -3.8570    -4.0525    -3.8824    -6.3191
5          -3.0245   -4.3578   -5.3560    -6.0948    -6.6082    -12.4194
9          -3.7418   -5.5649   -7.0688    -8.3362    -9.4106    -15.9546
17         -4.5396   -6.8986   -8.9436    -10.7592   -12.3913   -15.9546
33         -5.3862   -8.3108   -10.9229   -13.3092   -15.3525   -∞

If $(a, b) = (-\infty, \infty)$ and $\int_{-\infty}^\infty w(x)\,dx = 1$ in (18.6), then $w(x)$ can be viewed as a probability density and (18.6) becomes an expectation. After a suitable transformation, the Gauss-Legendre, Gauss-Hermite, and Gauss-Laguerre quadratures can then be viewed as approximations to the uniform, normal, and exponential distributions. The same idea can be applied to a wider class of distributions. Since by Theorem 18.5 all we need for implementing the Gaussian quadrature are the polynomial moments $\int w(x) x^n\,dx$ of the weighting function $w$, Gaussian quadrature can be used for approximating any distribution that has explicit moments. Toda (2021) uses this idea to discretize nonparametric distributions from data.


Chapter 19

Discretization

If the goal is to solve a single optimization problem that involves expectations (e.g., a static optimal portfolio problem), a highly accurate Gaussian quadrature is a natural choice. However, many economic problems are dynamic, in which case one needs to compute conditional expectations. Furthermore, to reduce the computational complexity of the problem, it is desirable that the quadrature nodes are preassigned instead of being dependent on the particular state of the model. Discretization is a useful tool for solving such problems.

This chapter explains the Farmer and Toda (2017) method of discretizing Markov processes, which is based on the maximum entropy discretization method of distributions introduced in Tanaka and Toda (2013, 2015). Matlab codes are available at

https://github.com/alexisakira/discretization

19.1 Earlier methods

For concreteness, consider the Gaussian AR(1) process
\[
x_t = \rho x_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2).
\]
Then the conditional distribution of $x_t$ given $x_{t-1}$ is $N(\rho x_{t-1}, \sigma^2)$. How can we discretize (find a finite-state Markov chain approximation of) this stochastic process?

A classic method is Tauchen (1986), but it should not be used because it is inaccurate (so we will not explain it further). Similarly, the quantile method in Adda and Cooper (2003) is poor, as documented in the accuracy comparison in Farmer and Toda (2017). For Gaussian AR(1) processes, the Rouwenhorst (1995) method is good because the conditional moments are matched exactly up to order 2 and the method is constructive (it does not involve optimization). It is especially useful when $\rho \ge 0.99$.

The Tauchen and Hussey (1991) method is based on the Gauss-Hermite quadrature (Example 18.3). First consider discretizing $N(0, \sigma^2)$. Letting $\{x_n\}_{n=1}^N$ and $\{w_n\}_{n=1}^N$ be the nodes and weights of the $N$-point Gauss-Hermite quadrature, since for any integrand $g$ we have
\begin{align*}
\operatorname{E}[g(X)] &= \int_{-\infty}^\infty g(x) \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{x^2}{2\sigma^2}}\,dx \\
&= \int_{-\infty}^\infty g(\sqrt{2}\sigma y) \frac{1}{\sqrt{\pi}} e^{-y^2}\,dy \\
&\approx \sum_{n=1}^N \frac{w_n}{\sqrt{\pi}} g(\sqrt{2}\sigma x_n),
\end{align*}
we can use the nodes $x_n' = \sqrt{2}\sigma x_n$ and weights $w_n' = w_n/\sqrt{\pi}$ to discretize $N(0, \sigma^2)$.

The same idea can be used to discretize the Gaussian AR(1) process. Let us fix the nodes $\{x_n'\}_{n=1}^N$ as constructed above. Since for any integrand $g$, letting $\mu = \rho x_m'$ we have
\begin{align*}
\operatorname{E}[g(x_t) \mid x_{t-1} = x_m'] &= \int_{-\infty}^\infty g(x) \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx \\
&= \int_{-\infty}^\infty g(x)\, e^{-\frac{\mu^2 - 2x\mu}{2\sigma^2}} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{x^2}{2\sigma^2}}\,dx \\
&\approx \sum_{n=1}^N w_n' e^{-\frac{\mu^2 - 2x_n'\mu}{2\sigma^2}} g(x_n'),
\end{align*}
we can construct the transition probability matrix $P = (p_{mn})$ by
\[
p_{mn} \propto w_n' e^{-\frac{\mu^2 - 2x_n'\mu}{2\sigma^2}},
\]
where $\mu = \rho x_m'$ and the constant of proportionality is determined such that $\sum_{n=1}^N p_{mn} = 1$. The Tauchen-Hussey method is relatively accurate if $\rho \le 0.5$, although a drawback is that it assumes Gaussian shocks. Furthermore, its performance deteriorates quickly as $\rho$ becomes larger. A minimal implementation sketch follows.

19.2 Maximum entropy method

The maximum entropy discretization method of Farmer and Toda (2017) is generally applicable and accurate. Thus it should be the first choice for discretizing general Markov processes.

19.2.1 Discretizing probability distributions

We start the discussion from discretizing a single probability distribution. Suppose that we are given a continuous probability density function $f : \mathbb{R}^K \to \mathbb{R}$, which we want to discretize. Let $X$ be a random vector with density $f$, and let $g : \mathbb{R}^K \to \mathbb{R}$ be any bounded continuous function. The first step is to pick a quadrature formula
\[
\operatorname{E}[g(X)] = \int_{\mathbb{R}^K} g(x) f(x)\,dx \approx \sum_{n=1}^N w_n g(x_n), \tag{19.1}
\]
where $N$ is the number of quadrature nodes, $\{x_n\}_{n=1}^N$ are the nodes, and $\{w_n\}_{n=1}^N$ are weights such that $w_n > 0$.

For now, we do not take a stance on the choice of the initial quadrature formula, but take it as given. Given the quadrature formula (19.1), a coarse but valid discrete approximation of the density $f$ would be to assign probability $q_n$ to the point $x_n$ proportional to $w_n$, so
\[
q_n = \frac{w_n}{\sum_{n'=1}^N w_{n'}}. \tag{19.2}
\]
However, this is not necessarily a good approximation because the moments of the discrete distribution $\{q_n\}$ do not generally match those of $f$.

Tanaka and Toda (2015) propose exactly matching a finite set of moments by updating the probabilities $\{q_n\}$ in a particular way. Let $T : \mathbb{R}^K \to \mathbb{R}^L$ be a function that defines the moments that we wish to match and let $\bar{T} = \int_{\mathbb{R}^K} T(x) f(x)\,dx$ be the vector of exact moments. For example, if we want to match the first and second moments in the one-dimensional case ($K = 1$), then $T(x) = (x, x^2)'$. Tanaka and Toda (2015) update the probabilities $\{q_n\}$ by solving the optimization problem
\begin{align}
\underset{\{p_n\}}{\text{minimize}} \quad & \sum_{n=1}^N p_n \log\frac{p_n}{q_n} \notag \\
\text{subject to} \quad & \sum_{n=1}^N p_n T(x_n) = \bar{T}, \quad \sum_{n=1}^N p_n = 1, \quad p_n \ge 0. \tag{P}
\end{align}
The objective function in the primal problem (P) is the Kullback and Leibler (1951) information of $\{p_n\}$ relative to $\{q_n\}$, which is also known as the relative entropy. This method matches the given moments exactly while keeping the probabilities $\{p_n\}$ as close to the initial approximation $\{q_n\}$ in (19.2) as possible in the sense of the Kullback-Leibler information. Note that since (P) is a convex minimization problem, the solution (if one exists) is unique.

The optimization problem (P) is a constrained minimization problem with a large number ($N$) of unknowns $\{p_n\}$, with $L + 1$ equality constraints and $N$ inequality constraints, which is in general computationally intensive to solve. However, it is well known that entropy-like minimization problems are computationally tractable by using duality theory (Chapter 14). Tanaka and Toda (2015) convert the primal problem (P) to the dual problem
\[
\max_{\lambda \in \mathbb{R}^L} \left[ \lambda' \bar{T} - \log\left(\sum_{n=1}^N q_n e^{\lambda' T(x_n)}\right) \right], \tag{D}
\]
which is a low-dimensional ($L$ unknowns) unconstrained concave maximization problem and hence computationally tractable. The following theorem shows how the solutions to the two problems (P) and (D) are related. Below, the symbols ``int'' and ``co'' denote the interior and the convex hull of sets.

Theorem 19.1. 1. The primal problem (P) has a solution if and only if $\bar{T} \in \operatorname{co} T(D_N)$. If a solution exists, it is unique.

2. The dual problem (D) has a solution if and only if $\bar{T} \in \operatorname{int} \operatorname{co} T(D_N)$. If a solution exists, it is unique.

3. If the dual problem (D) has a (unique) solution $\lambda_N$, then the (unique) solution to the primal problem (P) is given by
\[
p_n = \frac{q_n e^{\lambda_N' T(x_n)}}{\sum_{n'=1}^N q_{n'} e^{\lambda_N' T(x_{n'})}} = \frac{q_n e^{\lambda_N' (T(x_n) - \bar{T})}}{\sum_{n'=1}^N q_{n'} e^{\lambda_N' (T(x_{n'}) - \bar{T})}}. \tag{19.3}
\]

Proof. See Farmer and Toda (2017, Theorem 2.1).

Theorem 19.1 provides a practical way to implement the Tanaka-Toda method. After choosing the initial discretization $Q = \{q_n\}$ and the moment defining function $T$, one can numerically solve the unconstrained optimization problem (D). To this end, we can instead solve
\[
\min_{\lambda \in \mathbb{R}^L} \sum_{n=1}^N q_n e^{\lambda'(T(x_n) - \bar{T})} \tag{D'}
\]
because the objective function in (D') equals the exponential of $-1$ times that in (D), so maximizing (D) is equivalent to minimizing (D'). Since (D') is an unconstrained convex minimization problem with a (relatively) small number ($L$) of unknowns ($\lambda$), solving it is computationally simple. Letting $J_N(\lambda)$ be the objective function in (D'), its gradient and Hessian can be computed analytically as
\begin{align}
\nabla J_N(\lambda) &= \sum_{n=1}^N q_n e^{\lambda'(T(x_n) - \bar{T})}(T(x_n) - \bar{T}), \tag{19.4a} \\
\nabla^2 J_N(\lambda) &= \sum_{n=1}^N q_n e^{\lambda'(T(x_n) - \bar{T})}(T(x_n) - \bar{T})(T(x_n) - \bar{T})', \tag{19.4b}
\end{align}
respectively. In practice, we can quickly solve (D') numerically using optimization routines by supplying the analytical gradient and Hessian.

If a solution to (D') exists, it is unique, and we can compute the updated discretization $P = \{p_n\}$ by (19.3). If a solution does not exist, it means that the regularity condition $\bar{T} \in \operatorname{int} \operatorname{co} T(D_N)$ does not hold and we cannot match the moments. Then one needs to select a smaller set of moments. Numerically checking whether the moments are matched is straightforward: by (19.3), (D'), and (19.4a), the error is
\[
\sum_{n=1}^N p_n T(x_n) - \bar{T} = \frac{\sum_{n=1}^N q_n e^{\lambda_N'(T(x_n) - \bar{T})}(T(x_n) - \bar{T})}{\sum_{n=1}^N q_n e^{\lambda_N'(T(x_n) - \bar{T})}} = \frac{\nabla J_N(\lambda_N)}{J_N(\lambda_N)}. \tag{19.5}
\]
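
A minimal Python sketch of this step, solving the dual problem (D') with the analytical gradient (19.4a) and checking the moment error via (19.5), follows. The function name, the use of SciPy's BFGS routine, and the grid, prior, and moment function in the example are illustrative assumptions.
\begin{verbatim}
import numpy as np
from scipy.optimize import minimize

def discretize_moments(x, q, T, Tbar):
    """Maximum entropy sketch: solve the dual problem (D') for lambda and return
    the updated probabilities (19.3) together with the moment error (19.5)."""
    Tx = np.array([np.atleast_1d(T(xi)) for xi in x])   # N x L matrix of T(x_n)
    d = Tx - Tbar                                       # T(x_n) - Tbar
    J = lambda lam: np.dot(q, np.exp(d @ lam))          # objective of (D')
    gradJ = lambda lam: (q * np.exp(d @ lam)) @ d       # gradient (19.4a)
    res = minimize(J, np.zeros(d.shape[1]), jac=gradJ, method="BFGS")
    lam = res.x
    p = q * np.exp(d @ lam)
    p /= p.sum()                                        # probabilities (19.3)
    error = gradJ(lam) / J(lam)                         # moment error (19.5)
    return p, error

# illustrative example: match mean 0 and variance 1 on an evenly spaced grid
x = np.linspace(-3, 3, 9)
q = np.full(len(x), 1 / len(x))                         # coarse initial probabilities
p, err = discretize_moments(x, q, lambda s: np.array([s, s**2]), np.array([0.0, 1.0]))
print(p.round(4), err)
\end{verbatim}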

19.2.2 Discretizing general Markov processes

Next we show how to extend the Tanaka-Toda method to the case of time-homogeneous Markov processes.

Consider the time-homogeneous first-order Markov process
\[
P(x_t \le x' \mid x_{t-1} = x) = F(x'; x),
\]
where $x_t$ is the vector of state variables and $F(\cdot; x)$ is a cumulative distribution function (CDF) that determines the distribution of $x_t = x'$ given $x_{t-1} = x$. The dynamics of any Markov process are completely characterized by its Markov transition kernel. In the case of a discrete state space, this transition kernel is simply a matrix of transition probabilities, where each row corresponds to a conditional distribution. We can discretize the continuous process $\{x_t\}$ by applying the Tanaka-Toda method to each conditional distribution separately.

More concretely, suppose that we have a set of grid points $D_N = \{x_n\}_{n=1}^N$ and an initial coarse approximation $Q = (q_{nn'})$, which is an $N \times N$ probability transition matrix. Suppose we want to match some conditional moments of $x$, represented by the moment defining function $T(x)$. The exact conditional moments when the current state is $x_{t-1} = x_n$ are
\[
\bar{T}_n = \operatorname{E}[T(x_t) \mid x_n] = \int T(x)\,dF(x; x_n),
\]
where the integral is over $x$, fixing $x_n$. By Theorem 19.1, we can match these moments exactly by solving the optimization problem
\begin{align}
\underset{\{p_{nn'}\}_{n'=1}^N}{\text{minimize}} \quad & \sum_{n'=1}^N p_{nn'} \log\frac{p_{nn'}}{q_{nn'}} \notag \\
\text{subject to} \quad & \sum_{n'=1}^N p_{nn'} T(x_{n'}) = \bar{T}_n, \quad \sum_{n'=1}^N p_{nn'} = 1, \quad p_{nn'} \ge 0 \tag{P$_n$}
\end{align}
for each $n = 1, 2, \ldots, N$, or equivalently the dual problem
\[
\min_{\lambda \in \mathbb{R}^L} \sum_{n'=1}^N q_{nn'} e^{\lambda'(T(x_{n'}) - \bar{T}_n)}. \tag{D'$_n$}
\]
(D'$_n$) has a unique solution if and only if the regularity condition
\[
\bar{T}_n \in \operatorname{int} \operatorname{co} T(D_N) \tag{19.6}
\]
holds. We summarize our procedure in Algorithm 1 below.

Algorithm 1 (Discretization of Markov processes).

1. Select a discrete set of points $D_N = \{x_n\}_{n=1}^N$ and an initial approximation $Q = (q_{nn'})$.

2. Select a moment defining function $T(x)$ and corresponding exact conditional moments $\{\bar{T}_n\}_{n=1}^N$. If necessary, approximate the exact conditional moments with a highly accurate numerical integral.

3. For each $n = 1, \ldots, N$, solve the minimization problem (D'$_n$) for $\lambda_n$. Check whether the moments are matched using formula (19.5), and if not, select a smaller set of moments. Compute the conditional probabilities corresponding to row $n$ of $P = (p_{nn'})$ using (19.3).

The resulting discretization of the process is given by the transition probability matrix $P = (p_{nn'})$. Since the dual problem (D'$_n$) is an unconstrained convex minimization problem with a typically small number of variables, standard Newton-type algorithms can be applied. Furthermore, since the probabilities (19.3) are strictly positive by construction, the transition probability matrix $P = (p_{nn'})$ is a strictly positive matrix, so the resulting Markov chain is stationary and ergodic.


19.2.3 Examples and applications

Farmer and Toda (2017) contains several applications to solving asset pricing models. The Matlab package at

https://github.com/alexisakira/discretization

provides codes for discretizing various stochastic processes.


Bibliography

Jerome Adda and Russel W. Cooper. Dynamic Economics: Quantitative Methods and Applications. MIT Press, Cambridge, MA, 2003.

R. B. Bapat and T. E. S. Raghavan. Nonnegative Matrices and Applications. Number 64 in Encyclopedia of Mathematics and Its Applications. Cambridge University Press, 1997.

Claude Berge. Espaces Topologiques: Fonctions Multivoques. Dunod, Paris, 1959. English translation: Translated by E. M. Patterson. Topological Spaces, New York: MacMillan, 1963. Reprinted: Mineola, NY: Dover, 1997.

Abraham Berman and Robert J. Plemmons. Nonnegative Matrices in the Mathematical Sciences. Number 9 in Classics in Applied Mathematics. Society for Industrial and Applied Mathematics, 1994. doi:10.1137/1.9781611971262.

David Blackwell. Discounted dynamic programming. Annals of Mathematical Statistics, 36(1):226-235, February 1965. doi:10.1214/aoms/1177700285.

Philip J. Davis and Philip Rabinowitz. Methods of Numerical Integration. Academic Press, Orlando, FL, second edition, 1984.

Leland E. Farmer and Alexis Akira Toda. Discretizing nonlinear, non-Gaussian Markov processes with exact conditional moments. Quantitative Economics, 8(2):651-683, July 2017. doi:10.3982/QE737.

Gene H. Golub and John H. Welsch. Calculation of Gauss quadrature rules. Mathematics of Computation, 23(106):221-230, May 1969. doi:10.1090/S0025-5718-69-99647-1.

F. J. Gould and Jon W. Tolle. A necessary and sufficient qualification for constrained optimization. SIAM Journal of Applied Mathematics, 20(2):164-172, March 1971. doi:10.1137/0120021.

Gary Harris and Clyde Martin. The roots of a polynomial vary continuously as a function of the coefficients. Proceedings of the American Mathematical Society, 100(2):390-392, June 1987. doi:10.2307/2045978.

Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, New York, second edition, 2013.

Kenneth L. Judd. Projection methods for solving aggregate growth models. Journal of Economic Theory, 58(2):410-452, December 1992. doi:10.1016/0022-0531(92)90061-L.

Takashi Kamihigashi. A simple proof of the necessity of the transversality condition. Economic Theory, 20(2):427-433, September 2002. doi:10.1007/s001990100198.

Takashi Kamihigashi. Elementary results on solutions to the Bellman equation of dynamic programming: Existence, uniqueness, and convergence. Economic Theory, 56(2):251-273, 2014. doi:10.1007/s00199-013-0789-4.

Solomon Kullback and Richard A. Leibler. On information and sufficiency. Annals of Mathematical Statistics, 22(1):79-86, March 1951. doi:10.1214/aoms/1177729694.

Peter D. Lax. Linear Algebra and Its Applications. John Wiley & Sons, Hoboken, NJ, second edition, 2007.

David G. Luenberger. Optimization by Vector Space Methods. John Wiley & Sons, New York, 1969.

Harry Markowitz. Portfolio selection. Journal of Finance, 7(1):77-91, March 1952. doi:10.1111/j.1540-6261.1952.tb01525.x.

Gregory Phelan and Alexis Akira Toda. Securitized markets, international capital flows, and global welfare. Journal of Financial Economics, 131(3):571-592, March 2019. doi:10.1016/j.jfineco.2018.08.011.

Walter Pohl, Karl Schmedders, and Ole Wilms. Higher-order effects in asset pricing models with long-run risks. Journal of Finance, 73(3):1061-1111, June 2018. doi:10.1111/jofi.12615.

R. Tyrrell Rockafellar. Convex Analysis. Princeton University Press, Princeton, NJ, 1970.

K. Geert Rouwenhorst. Asset pricing implications of equilibrium business cycle models. In Thomas F. Cooley, editor, Frontiers of Business Cycle Research, chapter 10, pages 294-330. Princeton University Press, 1995.

William F. Sharpe. Capital asset prices: A theory of market equilibrium under conditions of risk. Journal of Finance, 19(3):425-442, September 1964. doi:10.1111/j.1540-6261.1964.tb02865.x.

Ken'ichiro Tanaka and Alexis Akira Toda. Discrete approximations of continuous distributions by maximum entropy. Economics Letters, 118(3):445-450, March 2013. doi:10.1016/j.econlet.2012.12.020.

Ken'ichiro Tanaka and Alexis Akira Toda. Discretizing distributions with exact moments: Error estimate and convergence analysis. SIAM Journal on Numerical Analysis, 53(5):2158-2177, 2015. doi:10.1137/140971269.

George Tauchen. Finite state Markov-chain approximations to univariate and vector autoregressions. Economics Letters, 20(2):177-181, 1986. doi:10.1016/0165-1765(86)90168-0.

George Tauchen and Robert Hussey. Quadrature-based methods for obtaining approximate solutions to nonlinear asset pricing models. Econometrica, 59(2):371-396, March 1991. doi:10.2307/2938261.

Alexis Akira Toda. Operator reverse monotonicity of the inverse. American Mathematical Monthly, 118(1):82-83, January 2011. doi:10.4169/amer.math.monthly.118.01.082.

Alexis Akira Toda. Incomplete market dynamics and cross-sectional distributions. Journal of Economic Theory, 154:310-348, November 2014. doi:10.1016/j.jet.2014.09.015.

Alexis Akira Toda. Wealth distribution with random discount factors. Journal of Monetary Economics, 104:101-113, June 2019. doi:10.1016/j.jmoneco.2018.09.006.

Alexis Akira Toda. Data-based automatic discretization of nonparametric distributions. Computational Economics, 57:1217-1235, April 2021. doi:10.1007/s10614-020-10012-6.

Alexis Akira Toda and Kieran James Walsh. The equity premium and the one percent. Review of Financial Studies, 33(8):3583-3623, August 2020. doi:10.1093/rfs/hhz121.
