Part III

DYNAMIC PROGRAMMING

Overview

We begin this part with a chapter that reviews the theory of dynamic programming. In general, understanding these theoretical results serves two purposes. The first is to establish qualitative properties and prove theorems about the model we analyze, and this is the most common reason for learning these theoretical foundations. A second benefit, however, is that understanding the theory can greatly aid in applying the appropriate computational tools to solve a model numerically. The next chapter focuses on this second aspect, and I show there how it works.

Chapter 13

Dynamic Programming: Theory

Dynamic Programming is a set of powerful tools and methods for analyzing a broad set of sequential (or dynamic) decision problems that appear in virtually all fields of economics, as well as in applications in engineering, operations research, and so on. A sequential decision problem can be thought of as one where, at any given point in time, a decision maker takes an action that (i) yields an immediate reward or utility (or cost) and (ii) modifies the state of the problem starting from tomorrow. A broad range of frameworks developed to study substantively different questions—such as Markov decision problems, dynamic games, and stochastic optimal control problems—turn out to share a very similar mathematical structure, and dynamic programming provides a unifying approach to analyzing all such problems.

Mastery of the theory of dynamic programming at the PhD level is an essential prerequisite for this book, and I will assume that the reader has already studied these tools carefully.[1] For completeness, this chapter reviews the main results of dynamic programming without going into details.

The results of dynamic programming can be presented for environments with differing levels of generality, so before moving forward it is useful to provide a classification of these environments and how they compare to each other. To understand the taxonomy that follows, note that, broadly speaking, the results of dynamic programming theory fall into two categories: (i) substantive results, and (ii) technical (measurability) considerations.[2] By this I mean that there are a number of key ideas that give the theory its immense power—such as the equivalence between sequence and recursive problems, the contraction mapping, the characterization of the value function and policy rules, and so on—and some more technical issues that do not necessarily add new insights but make sure that the whole theory, in its most general form, is mathematically well specified.

[1] Sundaram (1996) is an excellent first reading that builds toward dynamic programming starting from the basics. Acemoglu (2009, Ch. 6 and 16) contains a concise yet precise exposition of the subject. The gold standard for a textbook treatment of this subject remains Stokey et al. (1989), although it can be a daunting read as a first text on the subject. Bertsekas (2001, 2005) provides a comprehensive treatment of the subject and presents applications from a very broad range of fields in science and engineering.

[2] Following Bertsekas and Shreve (1978)'s terminology.

With these considerations in mind, it is useful to consider the following taxonomy of dynamic programming problems. The first class of problems is deterministic—they contain no probabilistic elements of any kind. This class provides the simplest framework in which most of the basic ideas (the substantive considerations) of dynamic programming can be understood. Having said that, randomness is of course a key aspect of many real-life decision problems, and the question is: how do we introduce it? The second class is stochastic decision problems in which random outcomes are restricted to a domain that is either finite or countably infinite. It turns out that this case can be studied with relatively straightforward extensions of the results from the deterministic case, and it contains arguably all the substantive ideas of the theory. One drawback is that it does not allow for a continuous space for randomness, which can sometimes be a useful theoretical modeling tool. So the third case is stochastic decision problems with uncountably infinite (e.g., continuous) support for the stochastic elements. This latter case raises a whole host of technical issues involving measurability that require substantially more advanced probability theory.[3]

[3] There are various ways to address these measurability concerns, as discussed in detail in Bertsekas and Shreve (1978). Among these, the approach most closely followed by economists is the one pioneered by Blackwell (1965) and developed more fully by Bertsekas and Shreve (1978), which imposes Borel measurability on all relevant functions and objects from the outset. This is also the approach followed by Stokey et al. (1989).

For the purposes of this book, the sweet spot is case II, and below I focus on it. It contains all the essential ideas without burdening the reader with extra details that do not provide a deeper understanding of the method.

13.1 Monotone Mappings

It would not be an overstatement to say that the theory of dynamic programming was born out of the realization that the power of fixed point theory (for monotone mappings) can be applied to study sequential decision problems.[4] So what are monotone mappings and what properties do they possess?

[4] Similarly, fixed point theory also underlies the key ideas in game theory (used in the earliest work by John Nash) as well as in the general equilibrium theory of Kenneth Arrow and Gerard Debreu.

The main ideas can be most easily explained in a deterministic framework, so in this section I shall abstract from uncertainty. First let us formally define two closely related mappings. The first one is what I will refer to as the Bellman mapping, defined as:

$$ (TJ)(x) := \max_{y \in \Gamma(x)} \left[ U(x, y) + \beta J(y) \right]. \tag{13.1.1} $$

Notice that this is simply the right-hand side of the Bellman equation, where $J$ denotes the value function and $U$ denotes the period return function. This mapping can be split into two steps. The first stage is a mapping that evaluates the right-hand side of the Bellman equation for a fixed policy (let us denote it $y(x)$):

$$ T_{y(x)} J := U(x, y(x)) + \beta J(y(x)). $$

I refer to this second mapping as Howard's mapping, because of his key insight into how to use it to speed up the computation of a solution, discussed in more detail later (Howard (1960)). The Bellman mapping is simply the maximized value of the Howard mapping:

$$ TJ = \max_{y(x) \in \Gamma(x)} T_{y(x)} J. $$

If $J$ and $U$ are bounded continuous functions, it can be shown that both $T$ and $T_y$ map this space into itself.[5] And, as it turns out, both of these mappings are monotone. These features will play a key role in establishing the contraction mapping theorem.

[5] Unbounded returns can be dealt with; see, e.g., Alvarez and Stokey (1998) for a full treatment of the case with homogeneous return functions and constraints. The key to establishing results is to impose sufficient restrictions to bound the growth rate of the state variables from above or below (along some or all feasible paths), which in turn ensures that the objective is bounded along the optimal path(s). However, this additional generality comes at the expense of extra notation and technical arguments that do not add insights relevant for the purposes of this book. Therefore, I shall confine the analysis to the case of bounded returns throughout this chapter.

Definition 13.1. [Contraction Mapping] Let $(S, d)$ be a metric space and $T : S \to S$ be a mapping of $S$ into itself. $T$ is a contraction mapping with modulus $\beta$ if for some $\beta \in (0, 1)$ we have

$$ d(Tv_1, Tv_2) \le \beta\, d(v_1, v_2) $$

for all $v_1, v_2 \in S$.[6]

[6] The definition of a contraction given here is a bit stronger than it needs to be. Most of the following results would go through if we simply consider mappings that contract all elements of the space after $N$ repeated applications of $T$ for some $N > 0$. See, for example, Bertsekas and Shreve (1978), pages 52–54.

It turns out that both the Bellman and Howard mappings defined above are contraction mappings. This can be shown easily by appealing to a set of sufficient conditions established by Blackwell (1965).

Theorem 13.2. [Blackwell's Sufficiency Conditions] Let $X \subseteq \mathbb{R}^K$ and $B(X)$ be the space of bounded functions $f : X \to \mathbb{R}$ defined on $X$, equipped with the sup norm. Suppose that $B'(X) \subset B(X)$ is a subset and let $T : B'(X) \to B'(X)$ be a mapping satisfying the following two conditions:

A. Monotonicity: For any $f, g \in B'(X)$, $f(x) \le g(x)$ for all $x \in X$ implies $Tf(x) \le Tg(x)$ for all $x \in X$.

B. Discounting: There exists a $\beta \in (0, 1)$ such that

$$ T(f + c)(x) \le Tf(x) + \beta c $$

for all $f \in B(X)$, $x \in X$, and positive constants $c$.

Then $T$ is a contraction mapping on $B'(X)$ with modulus $\beta$.

These sufficiency conditions are especially useful for our purposes because we will be interested in spaces of functions that satisfy certain properties (such as boundedness or continuity), and checking directly whether a mapping of functions is a contraction can, in general, be very challenging. In contrast, Blackwell's conditions are often simple to verify. Notice also that the distinction between $B(X)$ and $B'(X)$ is there to emphasize that, depending on the application, we can take a subset of the full space—such as the space of bounded, continuous, and concave functions—and need to verify the sufficiency conditions only for functions in this subset to establish the contraction property, which is typically easier to do.
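To see how easy these conditions are to verify in practice, here is a minimal numerical sketch (my own construction; the model and all parameter values are illustrative assumptions, not from the text). It builds the Bellman operator of a simple deterministic growth model on a grid and checks monotonicity, discounting, and the implied contraction property directly:

```python
import numpy as np

# Bellman operator for a deterministic growth model on a capital grid.
# All parameter values below are illustrative assumptions.
beta, alpha, A = 0.95, 0.36, 1.0
k_grid = np.linspace(0.05, 0.5, 200)

# cons[i, j] = consumption when the state is k_i and the choice is k'_j
cons = A * k_grid[:, None] ** alpha - k_grid[None, :]
util = np.where(cons > 0, np.log(np.maximum(cons, 1e-12)), -np.inf)

def T(v):
    """Bellman operator: (Tv)(k_i) = max_j [u(c_ij) + beta * v(k'_j)]."""
    return np.max(util + beta * v[None, :], axis=1)

rng = np.random.default_rng(0)
f = rng.normal(size=k_grid.size)
g = f + np.abs(rng.normal(size=k_grid.size))  # g >= f pointwise

# Monotonicity: f <= g should imply Tf <= Tg everywhere.
print("monotonicity holds:", np.all(T(f) <= T(g)))

# Discounting: T(f + c) = Tf + beta*c exactly for a constant c > 0.
c = 2.0
print("discounting gap:", np.max(np.abs(T(f + c) - (T(f) + beta * c))))

# Blackwell's theorem then implies T is a contraction with modulus beta:
print("contraction check:",
      np.max(np.abs(T(f) - T(g))) <= beta * np.max(np.abs(f - g)))
```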

We next state the contraction mapping theorem, a simple but very useful fixed point theorem. It requires only that the space be a complete metric space and the mapping a contraction. For example, the space of continuous bounded functions endowed with the sup norm is a complete metric space, making the results of this theorem applicable to dynamic programming problems (with bounded returns; more on this later).

Theorem 13.3. [Contraction Mapping Theorem] Let $(S, d)$ be a complete metric space and suppose that $T : S \to S$ is a contraction mapping. Then $T$ has a unique fixed point $v^* \in S$ such that

$$ Tv^* = v^* = \lim_{N \to \infty} T^N v_0 $$

for all $v_0 \in S$.[7]

[7] The logic of the proof is very simple. Repeated application of the contraction mapping generates a sequence $\{v_0, v_1, \ldots\}$ that is a Cauchy sequence. Observing that all Cauchy sequences converge to a limit point in a complete metric space delivers the desired result.

In the next section, we present the main theoretical results of dynamic programming theory. In that context, $S$ will be thought of as a space of functions (that possess certain properties, such as boundedness and continuity), and the proofs will rely on showing that the recursive problem defines a contraction that maps this space of functions into itself. For example, the contraction mapping theorem is the key result that establishes the existence and uniqueness of the solution to the Bellman equation given in (13.2.2).

In addition to proving existence and uniqueness, another key application of the contraction mapping theorem is to characterize the properties of the fixed point $v^*$, such as strict concavity or strict monotonicity (Theorems 13.10 and 13.11 below). It might be tempting to think that we can take $S$ to be the set of strictly concave functions and apply the theorem, which would establish that the fixed point itself is strictly concave—since $v^* \in S$. Unfortunately, this is not possible, because the set of strictly concave functions is not a complete metric space, making the contraction mapping theorem inapplicable.[8] In such instances, the following corollary to Theorem 13.3 will be very useful.

[8] Notice that while the space of continuous, bounded, and (weakly) concave real-valued functions endowed with the sup norm is a complete metric space, it ceases to be so once we strengthen the requirement to strict concavity.

Corollary 13.4. Let $(S, d)$ be a complete metric space and $T : S \to S$ be a contraction mapping with $Tv^* = v^*$.

a. If $S'$ is a closed subset of $S$ and $T(S') \subset S'$, then $v^* \in S'$.

b. If, in addition, $T(S') \subset \tilde{S} \subset S'$, then $v^* \in \tilde{S}$.

To see how this result helps, take $S$ to be the space of continuous, bounded, concave real-valued functions (endowed with the sup norm). This space is a complete metric space. Then define $\tilde{S}$ to be the proper subset of $S$ obtained by adding the condition of strict concavity. This set is not a complete metric space, but that is fine. All we need to show is that $T$ maps elements of $S$, which are concave, into functions in $\tilde{S}$, which are strictly concave. This can be ensured in the dynamic programming applications we will study later by assuming that the period utility (or reward) function is strictly concave. Applying part (b) of the corollary (with $S' = S$) then establishes that the fixed point is also strictly concave.

For the purposes of computation, it is equally important that the contraction mapping theorem provides a convenient way of obtaining this solution under very general conditions. Specifically, starting from any initial guess for the value function that is bounded and continuous, repeatedly applying the mapping $T$ yields convergence to the unique solution. If this sounds familiar, it is because this is the classic value function iteration method for solving a dynamic program, which we shall cover in a moment. Furthermore, the modulus of the mapping is $\beta$ (the time discount factor), which determines the rate of convergence (convergence is linear, at rate $\beta$).[9] In particular, the contraction property can be used to derive a simple bound on the maximum deviation between the current iterate and the (yet unknown) true fixed point $v^*$:

[9] If there is no discounting, i.e., $\beta = 1$, we can still define a meaningful DP problem by focusing on the average value function rather than the total. See Bertsekas (2001) for how this is done. Furthermore, an alternative way to write a Bellman equation is by defining the value function as $V(x) = \mathbb{E}\left[\sum_{t=1}^{\infty} \beta^t u(x_t)\right]$, so that the Bellman equation becomes $V(x) = \max \mathbb{E}(u(x) + \beta V(x') \mid x)$. Bertsekas (2001) and Powell (2011) show how this can suggest a set of new computational techniques for solving DP problems more efficiently in high dimensions.

Corollary 13.5. Let $\|\cdot\|_{\infty}$ denote the sup norm. The maximum deviation between the fixed point and the $n$-th iterate is bounded as follows:

$$ \|v^* - v_n\|_{\infty} \le \frac{1}{1 - \beta}\, \|v_{n+1} - v_n\|_{\infty}. \tag{13.1.2} $$

This corollary provides a way to estimate the remaining distance to the fixed point using the right-hand side of this expression. It is often used in stopping rules in dynamic programming algorithms. As we shall see in the next chapter, tighter error bounds have been derived in subsequent work,[10] which we shall use in implementing some algorithms.

[10] Most notably by MacQueen (1966) and Porteus (1971).
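As an illustration of how the bound (13.1.2) is used as a stopping rule, here is a minimal value function iteration sketch (my own, with illustrative parameter values). It iterates $v_{n+1} = Tv_n$ and stops as soon as the corollary guarantees that the remaining distance to $v^*$ is below a tolerance:

```python
import numpy as np

beta, alpha, A = 0.95, 0.36, 1.0           # illustrative parameters
k_grid = np.linspace(0.05, 0.5, 200)
cons = A * k_grid[:, None] ** alpha - k_grid[None, :]
util = np.where(cons > 0, np.log(np.maximum(cons, 1e-12)), -np.inf)

def T(v):
    return np.max(util + beta * v[None, :], axis=1)

tol = 1e-6                                  # target bound on ||v* - v_n||
v = np.zeros_like(k_grid)                   # any bounded initial guess works
for n in range(10_000):
    v_new = T(v)
    step = np.max(np.abs(v_new - v))        # ||v_{n+1} - v_n||_inf
    # Corollary 13.5: ||v* - v_n||_inf <= step / (1 - beta)
    if step / (1.0 - beta) < tol:
        break
    v = v_new

print(f"stopped after {n} iterations; guaranteed error <= {step/(1-beta):.2e}")
```

Because each application of $T$ shrinks the step $\|v_{n+1} - v_n\|_{\infty}$ by roughly a factor $\beta$, the number of iterations needed scales like $\log(\text{tol}) / \log \beta$, which is why value function iteration slows down considerably as $\beta \to 1$.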

13.2 Main Theoretical Results

Understanding the basic theoretical results on dynamic programming is critical for numerically solving a dynamic programming problem. It is especially important to understand which assumptions are necessary for delivering concavity and differentiability of the value function. For completeness, I briefly review and state the main results here without proof.

Let $x_t \in X \subset \mathbb{R}^N$ represent the (continuous) endogenous state and $z_t \in Z$ represent the discrete exogenous state of a system that follows a first-order Markov process, with transition matrix denoted $Q(z, z')$. Every period, an individual can choose among a set of actions consistent with the constraint set given the current state: $y_t \in \Gamma(x_t, z_t) \subset X \subset \mathbb{R}^N$. For example, $\Gamma$ can be a budget constraint for an individual, or a resource constraint for an aggregate economy. A reward function $U(x_t, y_t, z_t)$ assigns a value to action $y_t$ in each state, so $U : X \times X \times Z \to \mathbb{R}$. Next period's state is determined as $x_{t+1} = G(x_t, y_t, z_t)$. Given this relationship, choosing $y_t$ for given $(x_t, z_t)$ is the same as choosing $x_{t+1}$, so we will alternatively use $y_t$ and $x_{t+1}$ as the choice variable when convenient. Finally, let $0 < \beta < 1$ be the discount factor that applies to future rewards.

To properly set up the sequence problem, define the history of shocks (the exogenous state) as $z^t \equiv (z_1, z_2, \ldots, z_t)$, vectors in the product space $Z^t$, and define a contingent plan as $x_{t+1} = x(z^t)$ for every history $z^t$.

Sequence Problem: Consider the lifetime optimization problem in sequence form:

$$ \tilde{V}(x_0, z_0) = \max_{\{x(z^t)\}_{t=0}^{\infty}} \; \mathbb{E}_0 \left[ \sum_{t=0}^{\infty} \beta^t\, U\big(x(z^{t-1}), x(z^t), z_t\big) \right] \tag{13.2.1} $$

$$ \text{s.t. } x(z^t) \in \Gamma\big(x(z^{t-1}), z_t\big) \;\; \forall t \ge 0, \quad \text{and } x_0, z_0 \text{ given.} $$

Recursive Problem: Now consider the recursive functional (Bellman) equation:

$$ V(x_t, z_t) = \max_{y_t \in \Gamma(x_t, z_t)} \big[ U(x_t, y_t, z_t) + \beta\, \mathbb{E}\big( V(y_t, z_{t+1}) \mid z_t \big) \big]. \tag{13.2.2} $$

It will be useful to define the set of all feasible plans (infinite sequences) starting from the current state $(x_t, z_t)$ that can be generated from the period constraint set. Specifically, let

$$ \pi(x_t, z_t) = \left\{ \{x(z^{\tau})\}_{\tau=t}^{\infty} : x(z^{\tau}) \in \Gamma\big(x(z^{\tau-1}), z_{\tau}\big), \text{ for } \tau = t, t+1, \ldots \right\} $$

be the set of all feasible plans starting from $(x_t, z_t)$.

Before moving ahead, it is useful to study a concrete example.

Example 13.6. An Income Fluctuation Problem. Consider an individual who derives utility from consumption according to a function $U(c_t)$, earns a stochastic income stream $y_t$, and trades a single riskless bond to smooth consumption. His lifetime utility maximization problem is thus:

$$ \tilde{V}(k_0, y_0) = \max_{\{c_t, k_t\}_{t=0}^{\infty}} \; \mathbb{E}_0\left[ \sum_{t=0}^{\infty} \beta^t\, U(c_t) \right] \tag{13.2.3} $$

$$ \text{s.t. } c_t + k_t = (1+r)\, k_{t-1} + y_t \quad \forall t \ge 0, \tag{13.2.4} $$

$$ k_t \ge -\underline{k}, \quad \text{and } k_0, y_0 \text{ given.} \tag{13.2.5} $$

Comparing this problem to the sequence problem in (13.2.1), it is clear that the latter does not have the same structure. For one thing, the sequence problem above is written entirely in terms of a single sequence whose values at $t$ and $t-1$ enter the return function, and the budget constraint for today's choice is in terms of yesterday's choice and today's shock, $(x(z^{t-1}), z_t)$. To put the income fluctuation problem into this structure, use the budget constraint (13.2.4) to substitute $c_t = (1+r)k_{t-1} - k_t + y_t$ into the objective to get $\mathcal{U}(k_t, k_{t-1}, y_t) \equiv U\big((1+r)k_{t-1} - k_t + y_t\big)$, where the choice variable in period $t$ is $k_t \in \Gamma(k_{t-1}, y_t) \equiv [-\underline{k},\, (1+r)k_{t-1} + y_t]$. Using the sequence notation for the history of shocks, we have:

$$ \tilde{V}(k_0, y_0) = \max_{\{\tilde{k}(y^t)\}_{t=0}^{\infty}} \; \mathbb{E}_0\left[ \sum_{t=0}^{\infty} \beta^t\, \mathcal{U}\big(\tilde{k}(y^t), \tilde{k}(y^{t-1}), y_t\big) \right] \tag{13.2.6} $$

$$ \text{s.t. } \tilde{k}(y^t) \in \Gamma\big(\tilde{k}(y^{t-1}), y_t\big) \equiv \big[-\underline{k},\; (1+r)\tilde{k}(y^{t-1}) + y_t\big], \tag{13.2.7} $$

which has exactly the same structure as the sequence problem (13.2.1).

One point of this exercise is to remind the reader that the reward function $\mathcal{U}$ has different arguments from the utility function $U$. In particular, it is obtained by substituting some of the constraints into the utility function. Therefore, it depends on the current state $k_{t-1}$, on today's choice variable (which becomes tomorrow's state) $k_t$, and on the stochastic stream $y_t$. Furthermore, the constraint set for $k_t$ is a closed and convex interval. These points are useful to remember when evaluating some of the assumptions that we will need to impose on $U$ and $\Gamma$ in a moment.
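To make the example concrete, here is a minimal sketch (my own; the two-state income process, the asset grid, and all parameter values are made-up assumptions) that solves the income fluctuation problem by iterating on the Bellman equation (13.2.2), with state $(k_{t-1}, y_t)$ and the choice $k_t$ restricted to the grid:

```python
import numpy as np

# Illustrative primitives (my own assumptions, not from the text)
beta, r, sigma = 0.95, 0.03, 2.0
y_states = np.array([0.7, 1.3])              # two-state Markov income
Q = np.array([[0.9, 0.1],                    # Q[i, j] = Pr(y' = y_j | y = y_i)
              [0.1, 0.9]])
k_grid = np.linspace(0.0, 20.0, 400)         # asset grid (borrowing limit 0)

def u(c):
    return c ** (1 - sigma) / (1 - sigma)

# cons[i, m, j]: income state y_i, incoming assets k_{t-1}=k_m, choice k_t=k_j
cons = ((1 + r) * k_grid[None, :, None] + y_states[:, None, None]
        - k_grid[None, None, :])
util = np.where(cons > 0, u(np.maximum(cons, 1e-12)), -np.inf)

V = np.zeros((2, k_grid.size))
while True:
    EV = Q @ V                               # EV[i, j] = E[V(k_j, y') | y_i]
    V_new = np.max(util + beta * EV[:, None, :], axis=2)
    if np.max(np.abs(V_new - V)) / (1 - beta) < 1e-6:  # Corollary 13.5 bound
        break
    V = V_new
V = V_new
g = k_grid[np.argmax(util + beta * (Q @ V)[:, None, :], axis=2)]  # policy k_t
```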

To obtain our first set of theoretical results, we make two assumptions.

Assumption 1. For all $(x, z) \in X \times Z$, the constraint correspondence $\Gamma$ is nonempty-valued, and for all initial conditions $(x_0, z_0)$ the value of the sequence problem $\tilde{V}(x_0, z_0)$ exists and is finite.

Now let $A$ be the graph of $\Gamma$, that is, $A = \{(x, y, z) \in X \times X \times Z : y \in \Gamma(x, z)\}$.

Assumption 2. $X$ is compact and $\Gamma$ is nonempty-valued, compact-valued, and continuous, with a closed graph $A$.

Theorem 13.7. [Equivalence of Problems] Under Assumption 1, any solution $\tilde{V}(x, z)$ to the sequence problem (13.2.1) is also a solution to the recursive problem (13.2.2), and vice versa for $V(x, z)$. Therefore, for all $(x, z) \in X \times Z$, we have $\tilde{V}(x, z) = V(x, z)$.

Notice that Theorem 13.7 does not establish the existence of a solution to eitherproblem. It merely states that if such solutions exist, they must coincide.

Theorem 13.8. [Principle of Optimality] Suppose that Assumption 1 holds. If a feasible plan $\{x(z^{\tau})\}_{\tau=0}^{\infty} \in \pi(x_0, z_0)$ attains the maximum in the sequence problem, then it also attains the maximum in the recursive problem. And the same is true in the opposite direction.

Theorem 13.9. [Existence of an Optimum] Under Assumptions 1 and 2, there exists a unique value function $V$, which is continuous and bounded in $x$ for each $z \in Z$. Further, for every initial state $(x_0, z_0) \in X \times Z$, an optimal plan $\{x^*(z^{\tau})\}_{\tau=0}^{\infty} \in \pi(x_0, z_0)$ exists.

To summarize these three theorems: under mild conditions on the constraint correspondence $\Gamma$ and finiteness of lifetime utility, the sequence problem and the recursive problem coincide—the value function and decision rules that solve the recursive problem generate sequences that solve the sequence problem for any initial conditions and deliver the same lifetime utility. The existence of a solution relies on Weierstrass' theorem and therefore requires stronger assumptions, namely the compactness and continuity provided by Assumption 2. These assumptions are rather mild and are satisfied by a very broad range of economic problems, making dynamic programming a very useful tool.

However, for computation, we want to know more about the properties of thesolution. For this, we need to make further assumptions.

Assumption 3. [Strict Concavity of Period Utility] For all $z \in Z$, $U(\cdot, \cdot, z)$ is strictly concave in $(x, y)$ over $A$. That is, for any $\theta \in (0, 1)$ and $(x, y), (x', y') \in A$, we have

$$ U\big(\theta(x, y) + (1-\theta)(x', y'), z\big) \ge \theta\, U(x, y, z) + (1-\theta)\, U(x', y', z), $$

with strict inequality when $x \ne x'$.

Assumption 4. [Convex Choice Set] For all $z \in Z$, $\Gamma(\cdot, z)$ is convex in $x$. That is, for any $\theta \in (0, 1)$ and $x, x' \in X$, $y \in \Gamma(x, z)$ and $y' \in \Gamma(x', z)$ imply that

$$ \theta y + (1-\theta) y' \in \Gamma\big(\theta x + (1-\theta) x', z\big). $$

The assumption of a convex choice set is critical for establishing the concavity of the value function, as we shall see in a moment. A convex choice set rules out increasing returns. For example, suppose that $y$ represents consumption and $f(x)$ is output produced with capital level $x$, so $\Gamma(x, z) = \{y \in X : 0 < y < z f(x)\}$. Since $\theta y + (1-\theta) y' < z\big(\theta f(x) + (1-\theta) f(x')\big)$, convexity of $\Gamma$ requires this bound not to exceed $z f(\theta x + (1-\theta)x')$. Therefore, $f$ needs to be concave for $\Gamma$ to be convex.

Another case ruled out by Assumption 4 is discretization of the choice space, a technique used in the early computational literature that is now all but irrelevant.[11] Discretization breaks the convexity of the choice set and can therefore give rise to value functions that are not strictly concave, as the sketch below illustrates.
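A small experiment (my own construction, with illustrative parameters) makes the point: solve a simple deterministic growth model by brute-force discretization of the choice space and inspect the second differences of the computed value function. The value function is the upper envelope of finitely many concave functions—one per grid choice—and such an envelope can have convex kinks:

```python
import numpy as np

beta, alpha, A = 0.95, 0.36, 1.0             # illustrative parameters
k = np.linspace(0.05, 0.5, 401)              # fine state grid
choice_idx = np.arange(0, 401, 80)           # coarse choice grid: 6 points

cons = A * k[:, None] ** alpha - k[choice_idx][None, :]
util = np.where(cons > 0, np.log(np.maximum(cons, 1e-12)), -np.inf)

V = np.zeros(k.size)
for _ in range(3000):
    V_new = np.max(util + beta * V[choice_idx][None, :], axis=1)
    if np.max(np.abs(V_new - V)) < 1e-12:
        break
    V = V_new

d2 = np.diff(V, 2)            # second differences of V along the state grid
# Concavity requires d2 <= 0 everywhere; positive entries are convex kinks
# created where the discrete argmax switches from one grid choice to the next.
print("max second difference:", d2.max())
print("number of convex kinks:", np.sum(d2 > 1e-10))
```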

Theorem 13.10. [Strict Concavity of V] Under Assumptions 1–4, the unique value function that satisfies (13.2.2) is strictly concave in $x$ for every $z \in Z$. Moreover, a continuous policy function $g : X \times Z \to X$ exists such that the optimal plan satisfies $x^*(z^t) = g\big(x^*(z^{t-1}), z_t\big)$.

[11] Having said that, one can find several recent papers that still use this approach in the name of convenience. As I argue later, there is no excuse for this approach given today's computational resources and the practicality of numerical techniques that allow one to avoid it.

Assumptions 1 and 2 are more technical in nature and are usually satisfied in a broad set of problems. Thus, the concavity of the return function $U$ in both of its arguments, ensured by Assumption 3, and the convexity of the choice set (Assumption 4) are very often the critically important ones. The strict concavity of the value function is very useful and important in a variety of contexts (e.g., it is required for the Euler equation to be sufficient), so it is important to verify whether Assumption 3 is satisfied in a problem before we attempt to solve it numerically.

Assumption 5. For all $(y, z) \in X \times Z$, $U(\cdot, y, z)$ is strictly increasing in $x$, and $\Gamma$ is a monotone correspondence: if $x \le x'$ then $\Gamma(x, z) \subset \Gamma(x', z)$.

Theorem 13.11. [Strict Monotonicity of V] Under Assumptions 1, 2, and 5, the optimal value function $V(x, z)$ is strictly increasing in $x$ for all $z \in Z$.

Assumption 6. $U(x, y, z)$ is continuously differentiable in $x$ in the interior of its domain.

Theorem 13.12. [Differentiability of V] Under Assumptions 1–3 and 6, the optimal value function $V(x, z)$ is differentiable with respect to $x$ in the interior of $x$'s domain.

Differentiability is another key property; when it is violated, a number of techniques become inapplicable or less powerful. Thus, it is essential to figure out whether or not its assumptions are satisfied. The only additional assumption needed, relative to strict concavity, is the continuous differentiability of the period utility function in the current state, $x$. This is violated more often than one might think, leading to kink(s) in the value function. One of the most common scenarios is when the problem has occasionally binding constraints, fixed costs, and the like.


13.3 Examples

Example 13.13. Analytical Solution of a Dynamic Programming Problem. In this example, we show how to apply the Bellman operator repeatedly (as dictated by the contraction mapping theorem) to solve a dynamic programming problem. This is a special example that has a closed-form solution, which is the exception rather than the rule.

Consider the following stylized version of the neoclassical growth model in a deterministic setting, with log utility over consumption, a Cobb-Douglas production function, $y = A k^{\alpha}$, and full depreciation of the capital stock:

$$ V(k) = \max_{c, k'} \{ \log c + \beta V(k') \} \quad \text{s.t. } c = A k^{\alpha} - k'. $$

Rewrite the Bellman equation as:

$$ V(k) = \max_{k'} \{ \log(A k^{\alpha} - k') + \beta V(k') \}. \tag{13.3.1} $$

Our goal is to find $V(k)$ and a decision rule $k' = g(k)$. As our initial guess, we take $V_0(k) \equiv 0$ (which would be the natural choice for the last period of life in a finite-horizon problem with no bequest motive). Therefore, we have:

$$ V_1(k) = TV_0(k) = \max_{k'} \{ \log(A k^{\alpha} - k') + \beta V_0(k') \} \;\Rightarrow\; k' = 0 \;\Rightarrow\; V_1(k) = \log A + \alpha \log k. $$

Now substitute $V_1$ into the right-hand side to obtain $V_2$:

$$ V_2(k) = TV_1(k) = \max_{k'} \{ \log(A k^{\alpha} - k') + \beta(\log A + \alpha \log k') \} $$

$$ \text{FOC: } \frac{1}{A k^{\alpha} - k'} = \frac{\beta\alpha}{k'} \;\Longrightarrow\; k' = \frac{\alpha\beta}{1 + \alpha\beta} \underbrace{A k^{\alpha}}_{y}. $$

Therefore, the optimal policy is to save a constant fraction of output, which we can substitute into the right-hand side to obtain $V_2$:

$$ V_2(k) = (1 + \beta + \alpha\beta)\log A + \log\frac{1}{1 + \alpha\beta} + \alpha\beta \log\left(\frac{\alpha\beta}{1 + \alpha\beta}\right) + \alpha(1 + \alpha\beta)\log k. $$

We can keep iterating to find the solution, but there is a faster way. Note that both $V_1$ and $V_2$ are of the form $a + b \log k$ for some constants $a$ and $b$. Thus, $T$ maps the family of functions of this form into itself, and it is straightforward to show that $T$ is also a contraction. Therefore, the contraction mapping theorem tells us that this Bellman equation has a unique solution, which is itself log-linear in $k$.[12] So let us denote the unique fixed point as $V^*(k) = a + b \log k$, where $a$ and $b$ are coefficients to be determined. Let us then use this conjecture in (13.3.1) to get:

$$ V^* = a + b \log k = \max_{k'} \{ \log(A k^{\alpha} - k') + \beta(a + b \log k') \} = TV^*. $$

[12] Of course, it needs to be shown that this class forms a complete metric space too. This is not straightforward, since the logarithmic function is unbounded. See Stokey et al. (1989) for details.

The first-order condition with respect to $k'$ of this problem is:

$$ \frac{1}{A k^{\alpha} - k'} = \frac{\beta b}{k'} \;\Rightarrow\; k' = \frac{\beta b}{1 + \beta b}\, A k^{\alpha}. $$

Let LHS $= a + b \log k$. Plugging the expression for $k'$ into the right-hand side:

$$ \text{RHS} = \log\left( A k^{\alpha} - \frac{\beta b}{1 + \beta b} A k^{\alpha} \right) + \beta\left( a + b \log\left( \frac{\beta b}{1 + \beta b} A k^{\alpha} \right) \right). $$

Imposing the condition that LHS $\equiv$ RHS for all $k$, we can solve for $a$ and $b$:

$$ a = \frac{1}{1-\beta}\, \frac{1}{1-\alpha\beta} \Big[ \log A + (1-\alpha\beta)\log(1-\alpha\beta) + \alpha\beta \log \alpha\beta \Big], \qquad b = \frac{\alpha}{1-\alpha\beta}. $$

We have solved the model! Although this was a very special example—analytical solutions are hard to come by—some aspects of the approach used here are informative for applying the contraction mapping theorem more generally with computational methods. As long as the true value function is "well-behaved" (smooth, etc.), we can choose a sufficiently flexible functional form that has a finite (ideally small) number of parameters. Then we can apply the same logic as above and solve for the unknown coefficients (sometimes called the "method of undetermined coefficients"), which then gives us the complete solution. Many solution methods rely on various versions of this general idea (perturbation methods, collocation methods, parameterized expectations).

Example 13.14. Non-Strictly Concave Value Functions. This can arise more often than one might initially think. One class of models that gives rise to this outcome is when the individual faces a discrete choice (so, a non-convex choice set), where each choice on its own delivers a strictly concave value function, but the individual is allowed to randomize between the two choices. Without randomization, the time-zero value function would be the upper envelope of the two concave value functions, which may well have a convex region, as shown in Figure 13.3.1. Randomization, or lotteries, can be used to convexify the choice set and convert the discrete choice set into a continuous one, leading to the disappearance of the non-concave portion of the value function. This idea was first pursued in a pioneering paper by Rogerson (1988) and has been employed as an effective "trick" to convexify problems (rather than as an economically meaningful mechanism) that would otherwise lead to non-concave value functions (among others, see Hansen (1985), Clementi and Hopenhayn (2006), and Paulson et al. (2006)).

A particularly interesting example of this has been studied by Hopenhayn and Vereshchagina (2009) in the context of risk taking by entrepreneurs. This paper goes beyond lotteries as a trick and considers a framework in which lotteries arise naturally as an economically plausible entrepreneurial project risk choice. In its simplest form, they consider a two-period model in which agents choose one of two occupations in the first period—being a worker or an entrepreneur. Conditional on each choice, the value function is strictly concave under the assumption that the period utility function is strictly concave.

[Figure 13.3.1 – Discrete Choice, Non-Convex Value Function, and Convexification via Lotteries. Panel A: value function in period t = 1; Panel B: value function in period t = 0.]
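The convexification-via-lotteries argument is easy to see numerically. The sketch below (my own construction; both value functions are made up) takes two strictly concave value functions, forms their upper envelope—which has a convex kink where they cross—and computes its concave envelope, i.e., the value attainable when the agent can randomize over points of the graph. The gap between the two is the value of access to lotteries:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 401)
v1 = np.log(0.5 + x)                        # made-up concave value, choice 1
v2 = 1.2 * np.log(0.2 + x) + 0.25           # made-up concave value, choice 2
v_env = np.maximum(v1, v2)                  # upper envelope: non-concave kink

def concavify(x, v):
    """Upper concave envelope of the graph (x, v): with a lottery over two
    points of the graph the agent attains any point on the chord between them,
    so the lottery value is the smallest concave function above v."""
    keep = [0]
    for i in range(1, len(x)):
        while len(keep) >= 2:
            i0, i1 = keep[-2], keep[-1]
            s01 = (v[i1] - v[i0]) / (x[i1] - x[i0])
            s1i = (v[i] - v[i1]) / (x[i] - x[i1])
            if s01 <= s1i:       # middle point lies below the chord: drop it
                keep.pop()
            else:
                break
        keep.append(i)
    return np.interp(x, x[keep], v[keep])

v_lot = concavify(x, v_env)
print("largest gain from lotteries:", np.max(v_lot - v_env))  # > 0 at the kink
```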

13.4 The Euler Equation

The solutions to sequential decision problems of the kind typically studied in economics can often be characterized as the solution to a second-order (stochastic) difference equation, called the Euler equation.[13]

[13] The class of problems studied here grew out of the famous brachistochrone curve problem, first posed by Johann Bernoulli and studied by many of the great mathematicians of his time. The Swiss mathematician Leonhard Euler made extensive contributions to this field, including refining the approach that uses the second-order difference (or differential) equation, and he gave the field its name, "calculus of variations," with his treatise of the same title. The equation is also sometimes referred to as the Euler-Lagrange equation.

To derive the Euler equation, start with the Bellman equation:

$$ V(x, z) = \max_{y \in \Gamma(x, z)} \{ U(y, x, z) + \beta\, \mathbb{E}\big( V(x', z') \mid z \big) \} \quad \text{s.t. } x' = H(y, x, z). $$

An optimal policy of the form $y = g(x, z)$ must satisfy the first-order optimality condition (FOC):

$$ U_y(y, x, z) + \beta\, \mathbb{E}\big( V_x(x', z') \mid z \big)\, H_y(y, x, z) \le 0, \tag{13.4.1} $$

with equality if the solution is interior.[14] The envelope condition (Benveniste and Scheinkman (1979)), with $y = g(x, z)$, is:

$$ V_x(x, z) = U_y(y, x, z)\, g_x(x, z) + U_x(y, x, z) + \beta\, \mathbb{E}\big( V_x(x', z') \mid z \big)\, H_x(y, x, z) + \beta\, \mathbb{E}\big( V_x(x', z') \mid z \big)\, H_y(y, x, z)\, g_x(x, z). \tag{13.4.2} $$

[14] $H_y$ would stay inside the expectations operator if the transition function depended on a variable that is not in the time-$t$ information set.

From the FOC, assuming an interior solution, it is easily seen that the first and last terms in (13.4.2) sum to zero, leaving:

$$ V_x(x, z) = U_x(y, x, z) + \beta\, \mathbb{E}\big( V_x(x', z') \mid z \big)\, H_x(y, x, z). \tag{13.4.3} $$

In some problems it is possible to define the state and choice variables such that the transition equation does not depend on the current period state: $x' = H(y, z)$.[15] We have seen an example of this in the income fluctuation problem presented above, where choosing current assets as the state variable and next period's assets as the choice variable delivered this result. In this case, the second term in (13.4.3) vanishes, since $H_x(y, z) \equiv 0$, leaving:

$$ V_x(x, z) = U_x(y, x, z)\big|_{y = g(x, z)}. \tag{13.4.4} $$

[15] In fact, Stokey et al. (1989) and Acemoglu (2009) adopt this notation throughout, whereas Bertsekas (2001) uses the more general notation defined above.

The intuition for the envelope condition is that, because the objective function has already been maximized with respect to the choice variable $y$, the indirect effect of $y$ on the objective at that optimal point is zero. The envelope condition typically requires convexity conditions and differentiability of the value function, although Milgrom and Segal (2002) show that it extends to arbitrary choice sets as long as $V$ is differentiable at the point $x$.[16]

[16] They have a very useful simple example in the introduction of their paper. Check it out.

Substituting (13.4.4) into the first-order condition (13.4.1), we obtain the Euler equation:

$$ 0 = U_y(y, x, z) + \beta\, \mathbb{E}\big( U_x(y', x', z') \mid z \big)\, H_y(y, z) = U_y(g(x, z), x, z) + \beta\, \mathbb{E}\big( U_x(g(x', z'), x', z') \mid z \big)\, H_y(g(x, z), z). $$
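Computationally, the most common use of the Euler equation is to evaluate its residuals at a candidate policy. As a sketch (my own), take the deterministic growth model of Example 13.13, where the Euler equation reduces to $\frac{1}{c_t} = \beta\, \frac{1}{c_{t+1}}\, \alpha A k_{t+1}^{\alpha-1}$: the closed-form policy $k' = \alpha\beta A k^{\alpha}$ should produce residuals of zero to machine precision, while any other policy should not:

```python
import numpy as np

alpha, beta, A = 0.3, 0.95, 1.0            # illustrative parameters

def euler_residual(policy, k):
    """Residual of 1 = beta * (c_t / c_{t+1}) * alpha*A*k'**(alpha-1)
    (log utility), at capital k under a candidate policy k' = policy(k)."""
    k1 = policy(k)
    k2 = policy(k1)
    c0 = A * k ** alpha - k1
    c1 = A * k1 ** alpha - k2
    return beta * (c0 / c1) * alpha * A * k1 ** (alpha - 1) - 1.0

k = np.linspace(0.05, 0.5, 9)
exact = lambda k: alpha * beta * A * k ** alpha
print(np.max(np.abs(euler_residual(exact, k))))                     # ~1e-16
print(np.max(np.abs(euler_residual(lambda k: 0.9 * exact(k), k))))  # nonzero
```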

13.4.1 Variational Approach to Deriving Euler Equations

There is an alternative way to derive the Euler equation that works on the sequence problem and relies on the definition of the optimal sequence that solves that problem.[17] Although working on the sequence problem might seem antithetical to the idea of recursive methods on which this book builds, this approach can often have important advantages, so it is essential to master it. I first show how the derivation works and then discuss when it is advantageous to use this approach.

[17] In fact, this is how the great mathematicians of the 17th century originally approached calculus of variations problems.

Consider the sequence problem (13.2.1), specialized here to a consumption-savings problem for simplicity. We have:

$$ \tilde{V}(a_0, z_0) = \max_{\{c(z^t),\, a(z^t)\}_{t=0}^{\infty}} \; \mathbb{E}_0\left[ \sum_{t=0}^{\infty} \beta^t\, U(c(z^t), z_t) \right] \tag{13.4.5} $$

$$ \text{s.t. } c(z^t) + a(z^t) = (1+r)\, a(z^{t-1}) + y(z^t) \quad \forall t \ge 0, $$

with $a_0$ and $z_0$ given. Let $c^*(z^t)$ denote the optimal consumption sequence, i.e., the one that attains the value function $\tilde{V}(a_0, z_0)$. Construct an alternative sequence that reduces consumption in period $t$ (after all possible histories up until that date) by an amount $\varepsilon > 0$ and invests this extra saving in the risk-free asset, which yields $(1+r)\varepsilon$ in $t+1$ and is consumed at that date. Therefore, the new sequence is

$$ c \equiv \{\, c^*(z^0),\; c^*(z^1),\; \ldots,\; c^*(z^t) - \varepsilon,\; c^*(z^{t+1}) + (1+r)\varepsilon,\; c^*(z^{t+2}),\; \ldots \,\}. $$

Notice that this alternative sequence is also budget-feasible, since it only affects the constraints in periods $t$ and $t+1$, and in a way that offsets each other. Let $V^{alt}$ denote the lifetime utility from this alternative policy. This perturbed sequence will change the lifetime value of the individual by

$$ V^{alt} - \tilde{V}(a_0, z_0) = \big[ U(c^*(z^t) - \varepsilon) - U(c^*(z^t)) \big] + \beta\big[ U\big(c^*(z^{t+1}) + (1+r)\varepsilon\big) - U(c^*(z^{t+1})) \big]. $$

Because $c^*$ is the optimal sequence by assumption, this change cannot be positive; therefore $V^{alt} - \tilde{V}(a_0, z_0) \le 0$. Since $\varepsilon > 0$, we can divide both sides of the inequality by it and take the limit as $\varepsilon \to 0$, which yields:

$$ -U'(c^*(z^t)) + (1+r)\, \beta\, U'(c^*(z^{t+1})) \le 0, $$

with equality when the sequence $\{c^*\}$ is in the interior of the choice set in $t$ and $t+1$. Rearranging for an interior solution yields:

$$ 1 = (1+r)\, \beta\, \frac{U'(c^*(z^{t+1}))}{U'(c^*(z^t))}. $$

Using the same steps, it can easily be shown that for an asset with a stochastic return, the usual Euler equation is obtained:[18]

$$ 1 = \mathbb{E}_t\left[ \beta\, \frac{U'(c^*(z^{t+1}))}{U'(c^*(z^t))}\, \big(1 + r(z^{t+1})\big) \right]. $$

[18] A more general derivation is the following. Consider the same problem but allow the choice set to be more general, so that $y \in \Gamma(x)$. Now take the optimal sequence $\{x^*_t\}$ and attempt to vary only one element to improve the objective value. If the period return function is time separable, then only utility in the current period and the next will be affected. So $\max_y \big[ U(x^*_t, y) + \beta\, U(y, x^*_{t+2}) \big]$ subject to $y \in \Gamma(x^*_t)$ and $x^*_{t+2} \in \Gamma(y)$. Assuming that the original sequence is in the interior of the choice set for all $t$, the first-order condition is $U_y(x^*_t, x^*_{t+1}) + \beta\, U_x(x^*_{t+1}, x^*_{t+2}) = 0$. Since this variation can be done for every $t$, we get a second-order difference equation of this kind for every $t$.

This approach makes several issues clear. First, we derived the Euler equation by arguing that $\{c^*\}$ cannot be the optimal sequence unless it satisfies the resulting Euler equation. This makes clear that the Euler equation is a necessary condition for optimality but may not be sufficient: even though the particular one-step deviation we introduced yields no utility improvement, perhaps others can; and even though small perturbations (a small $\varepsilon$) do not increase utility, there is no guarantee that large deviations would not. This hints that, for such deviations to be ruled out, the utility function and the choice set must possess certain properties. Indeed, this is the case: if the choice set is convex and the utility function is concave—and only then—the Euler equation is also sufficient for optimality.

Second, in complicated problems, this approach can often allow one to derive the Euler equation in a more intuitive and easier way. For example, Mirrleesian taxation problems, which usually have complex constraint sets, are typically more amenable to the variational approach. Thus, unsurprisingly, many of the proofs in this literature (among others, Rogerson (1985), Golosov et al. (2003), Werning (2014)) use variational arguments to derive what is called the inverse Euler equation, which captures the key trade-offs in that framework. Similarly, problems with time inconsistency (due to non-geometric discounting in preferences or government commitment problems) often yield what are called generalized Euler equations, which are often derived more easily using this approach. Third, this sequential format of the Euler equation will come in handy in one type of algorithm, where we shall track the history of shocks. See Chapter 20.7.
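The one-step-deviation logic above is straightforward to verify numerically. The following sketch (my own; deterministic setting with illustrative parameters) takes the exact optimal path of the growth model from Example 13.13, applies the $\varepsilon$-perturbation described above (save $\varepsilon$ more at date $t$, consume the proceeds at $t+1$), and confirms that lifetime utility falls—with the loss vanishing at rate $\varepsilon^2$, reflecting the fact that the first-order term is zero when the Euler equation holds:

```python
import numpy as np

alpha, beta, A = 0.3, 0.95, 1.0            # illustrative parameters
T, k0 = 200, 0.2

# Optimal path from Example 13.13: k_{t+1} = alpha*beta*A*k_t**alpha
k = np.empty(T + 1); k[0] = k0
for t in range(T):
    k[t + 1] = alpha * beta * A * k[t] ** alpha
c = A * k[:-1] ** alpha - k[1:]

def delta_V(eps, t=5):
    """Change in lifetime utility from saving eps more at date t and
    consuming the extra output it produces at date t+1."""
    dU_t = np.log(c[t] - eps) - np.log(c[t])
    c_next = A * (k[t + 1] + eps) ** alpha - k[t + 2]
    dU_t1 = np.log(c_next) - np.log(c[t + 1])
    return beta ** t * (dU_t + beta * dU_t1)

for eps in [1e-2, 1e-3, 1e-4]:
    print(eps, delta_V(eps))   # always <= 0, shrinking like eps**2
```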

13.5 Fancier Euler Equations

13.5.1 Ben-Porath/Irreversible Investment

Consider the problem of an individual who divides one unit of time each period between accumulating human capital, $h$, and working in the market:

$$ \max_{\{i_t\}_{t=1}^{T}} \; \sum_{t=1}^{T} \beta^{t-1} R\, h_t (1 - i_t) $$

$$ \text{s.t. } h_{t+1} = h_t + A (h_t i_t)^{\alpha}, \quad \text{with } h_0 > 0 \text{ given}, \quad i_t \in [0, 1]. $$

Defining newly produced human capital as $Q = A(hi)^{\alpha}$ and denoting the opportunity cost of investment as $C(Q_s) \equiv h_s i_s = (Q_s / A)^{1/\alpha}$, we can rewrite the dynamic programming problem in the following way (which turns out to be more convenient to work with):

$$ V_s(h_s) = \max_{Q_s \in [0,\, A h_s^{\alpha}]} \; \big[ R\,(h_s - C(Q_s)) + \beta\, V_{s+1}(h_s + Q_s) \big]. $$

Assuming an interior solution, the FOC is:

$$ R\, C'(Q_s) = \beta\, V'_{s+1}(h_s + Q_s). $$

The envelope condition is:

$$ V'_s(h_s) = R + \beta\, V'_{s+1}(h_s + Q_s). $$

Notice that the envelope condition did not allow us to eliminate the derivative of the value function as it did before. Instead, the derivative of today's value function turns out to depend on the derivative of tomorrow's value function. But this is still useful if we observe that such an equation holds in every period, so we can substitute out $V'_{s+1}$ on the right-hand side as a function of $V'_{s+2}$:

$$ V'_s(h_s) = R + \beta\big( R + \beta\, V'_{s+2}(h_{s+2}) \big), $$

and at the terminal date we have $V_{T+1} \equiv 0$, which allows us to obtain an expression without value functions in it. Here is the first-order condition:

$$ R\, C'(Q_s) = \beta\big( R + \beta R + \cdots + \beta^{T-s-1} R \big). \tag{13.5.1} $$

Notice an interesting feature of this problem: the optimal choice does not depend on any endogenous state variable; that is, $Q_s$ is not a function of $h_s$—it depends only on $s$. This suggests that the value function formulation above included a redundant state variable. This result—which is called "neutrality"—is due to the fact that the Ben-Porath production function adopted here has the same exponent $\alpha$ on both $i_t$ and $h_t$. Relaxing this assumption would make $h_s$ a proper state variable again. But the lesson to keep in mind is that determining what state variables a dynamic programming problem has takes some consideration and an understanding of the economic problem at a deep level. Later we shall see examples in which we can reduce the number of state variables in even more subtle ways.
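Under the functional forms above, (13.5.1) can be solved in closed form: $C(Q) = (Q/A)^{1/\alpha}$ gives $C'(Q) = \frac{1}{\alpha A}(Q/A)^{(1-\alpha)/\alpha}$, and the geometric sum on the right equals $\beta R\, \frac{1 - \beta^{T-s}}{1 - \beta}$, so $R$ cancels and $Q_s = A\big[\alpha A \beta\, \frac{1-\beta^{T-s}}{1-\beta}\big]^{\alpha/(1-\alpha)}$. The sketch below (my own, with illustrative parameter values) computes this profile and recovers the implied $h_t$ and $i_t$, confirming both neutrality and the declining investment profile near the terminal date:

```python
import numpy as np

beta, alpha, A, T = 0.95, 0.5, 0.05, 40     # illustrative parameter choices
h = np.empty(T + 2)
h[1] = 1.0                                   # initial human capital (assumption)

t = np.arange(1, T + 1)
# beta*(R + beta*R + ... + beta^{T-t-1}*R)/R = beta*(1 - beta**(T-t))/(1-beta)
gsum = beta * (1.0 - beta ** (T - t)) / (1.0 - beta)
# Invert C'(Q) = (1/(alpha*A)) * (Q/A)**((1-alpha)/alpha) in (13.5.1):
Q = A * (alpha * A * gsum) ** (alpha / (1.0 - alpha))

for s in t[:-1]:
    h[s + 1] = h[s] + Q[s - 1]               # h_{t+1} = h_t + Q_t
i = (Q / A) ** (1.0 / alpha) / h[1:T + 1]    # time investment i_t = C(Q_t)/h_t

print(f"Q_1={Q[0]:.4f}, Q_T={Q[-1]:.4f} (independent of h: 'neutrality')")
assert np.all((i >= 0) & (i <= 1)), "investment shares should lie in [0, 1]"
```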

13.5.2 Time Non-Separable Preferences

Here is an example of preferences that exhibit habit formation, whose Euler equation is straightforwardly obtained using this approach. Consider the consumption and saving decision of a household given by:

$$ \tilde{V}(a_0, z_0) = \max_{\{c_t,\, a_{t+1}\}_{t=0}^{\infty}} \; \mathbb{E}_0\left[ \sum_{t=0}^{\infty} \beta^t\, U\big(c(z^t) - \theta c(z^{t-1})\big) \right], \tag{13.5.2} $$

subject to

$$ c(z^t) + a(z^t) = \big(1 + r(z^t)\big)\, a(z^{t-1}) + y(z^t), \qquad c_{-1} > 0 \text{ given}, \tag{13.5.3} $$

in which $z^t$ is the history of random realizations of the income shock $y(z^t)$. Therefore, momentary utility in the current period depends on consumption spending both today and yesterday. Let $c^*(z^t)$ denote the optimal consumption sequence and consider, as before, reducing consumption at $t$ in all states by an amount $\epsilon > 0$ and investing this amount at the gross rate of return $R(z^{t+1})$. Hence, the new sequence is

$$ c = \{\, c^*(z^0),\; c^*(z^1),\; \ldots,\; c^*(z^t) - \epsilon,\; c^*(z^{t+1}) + \epsilon R(z^{t+1}),\; c^*(z^{t+2}),\; \ldots \,\}, \tag{13.5.4} $$

which in turn yields the following sequence of momentary utilities:

$$ \big\{\, U^*_0,\; \ldots,\; U\big((c^*(z^t) - \epsilon) - \theta c^*(z^{t-1})\big),\; U\big((c^*(z^{t+1}) + \epsilon R(z^{t+1})) - \theta (c^*(z^t) - \epsilon)\big),\; U\big(c^*(z^{t+2}) - \theta (c^*(z^{t+1}) + \epsilon R(z^{t+1}))\big),\; U^*_{t+3},\; \ldots \,\big\}, $$

where $U^*_t$ denotes momentary utility under the optimal sequence. This alternative sequence is feasible, which is easy to verify. Note that it also affects momentary utility in $t+2$, by modifying the habit stock in period $t+1$. This perturbed sequence changes lifetime utility by

$$ \begin{aligned} \Delta V = {} & \big[ U\big((c^*(z^t) - \epsilon) - \theta c^*(z^{t-1})\big) - U\big(c^*(z^t) - \theta c^*(z^{t-1})\big) \big] \\ & + \beta\, \mathbb{E}_t\big[ U\big((c^*(z^{t+1}) + \epsilon R(z^{t+1})) - \theta (c^*(z^t) - \epsilon)\big) - U\big(c^*(z^{t+1}) - \theta c^*(z^t)\big) \big] \\ & + \beta^2\, \mathbb{E}_t\big[ U\big(c^*(z^{t+2}) - \theta (c^*(z^{t+1}) + \epsilon R(z^{t+1}))\big) - U\big(c^*(z^{t+2}) - \theta c^*(z^{t+1})\big) \big] \;\le\; 0, \end{aligned} $$

which cannot be positive, since $c^*$ is the optimal sequence and the new sequence is budget feasible. Since $\epsilon > 0$, we can divide both sides by it and take the limit as $\epsilon \to 0$; assuming that $U$ is a power utility function, this yields

$$ 1 = \mathbb{E}_t\left[ \beta R(z^{t+1}) \times \frac{ \big(c(z^{t+1}) - \theta c(z^t)\big)^{-\sigma} - \beta\theta\, \big(c(z^{t+2}) - \theta c(z^{t+1})\big)^{-\sigma} }{ \big(c(z^t) - \theta c(z^{t-1})\big)^{-\sigma} - \beta\theta\, \big(c(z^{t+1}) - \theta c(z^t)\big)^{-\sigma} } \right] \tag{13.5.5} $$

at an interior optimum, which is the Euler equation.

The last two examples relied on a sequential formulation, leading to Euler equations where the choice function is expressed as a function of the entire history of shocks. In the next example, we use a recursive formulation with explicit state variables.

13.5.3 Time Inconsistency Problems and a Generalized Euler Equation

Consider the consumption-savings problem facing an individual with preferences that display quasi-geometric discounting, defined as follows.[19] Let $V_t$ denote lifetime utility starting in period $t$:

$$ V_0 = U_0 + \beta\big( \delta U_1 + \delta^2 U_2 + \delta^3 U_3 + \cdots \big) \tag{13.5.6} $$

$$ V_1 = U_1 + \beta\big( \delta U_2 + \delta^2 U_3 + \cdots \big) $$

$$ V_2 = U_2 + \beta\big( \delta U_3 + \cdots \big) $$

Notice that there are two parameters governing patience: $\beta$ determines the discounting between today and tomorrow, i.e., it is a measure of short-term impatience, whereas $\delta$ is what the individual thinks he will use to discount future dates starting tomorrow. When $\beta = 1$, these preferences reduce to the usual geometric discounting, leading to a recursive problem that is time consistent. However, when $\beta < 1$, the optimal plan made by the current self will not typically be viewed as optimal by the individual tomorrow. To capture this time inconsistency, we refer to the individual at each point in time as a separate self. Of course, an open question is whether the individual today will take into account this time inconsistency—in particular, the fact that he cannot control the choices that tomorrow's self makes.

A reasonable assumption is that the individual is rational, and then an interesting question is whether there is a policy that is time consistent in this environment—that is, whether tomorrow's self chooses the same policy as the one chosen by the self today. Before going further, it is useful to note that while we discuss here an application to preferences—largely motivated by experimental evidence showing that time discounting resembles a hyperbolic function (approximated here by the quasi-geometric form)—a very similar kind of time inconsistency arises in government policy problems with broad applications.[20]

[19] See Krusell et al. (2002).

[20] E.g., Klein et al. (2008).

Now let us solve the planner's problem in this economy. The problem can be written recursively once we properly distinguish between the current self's value function $J(k)$ and what he perceives as the value function of the self tomorrow, $V(k)$:

$$ J(k) = \max_{k'} \big[ U(f(k) - k') + \beta\delta\, V(k') \big], \tag{13.5.7} $$

where, denoting the optimal policy $k' = h(k)$, we have

$$ V(k) = U(f(k) - h(k)) + \delta\, V(h(k)). \tag{13.5.8} $$

Now the FOC is:

$$ U'(f(k) - h(k)) = \beta\delta\, V'(h(k)). \tag{13.5.9} $$

Differentiate the future self's value function using (13.5.8):

$$ V'(k) = U'(f(k) - h(k))\big(f'(k) - h'(k)\big) + \delta\, V'(h(k))\, h'(k). \tag{13.5.10} $$

Substitute $V'(h(k))$ from (13.5.9) into (13.5.10), then lead the resulting equation by one period, and substitute the new expression for $V'(h(k))$ back into the FOC (13.5.9), which yields

$$ 1 = \beta\delta\, \frac{U'\big(f(h(k)) - h(h(k))\big)}{U'\big(f(k) - h(k)\big)} \left[ f'(h(k)) + \left( \frac{1}{\beta} - 1 \right) h'(h(k)) \right]. $$

This is a functional equation in $h(k)$ and can be solved using one of the methods described in the coming chapters. Alternatively, substituting back $k_{t+1} = h(k_t)$, we get

$$ 1 = \beta\delta\, \frac{U'\big(f(k_{t+1}) - k_{t+2}\big)}{U'\big(f(k_t) - k_{t+1}\big)} \left[ f'(k_{t+1}) + \left( \frac{1}{\beta} - 1 \right) h'(k_{t+1}) \right], $$

for $t = 0, 1, 2, \ldots$. Notice that when $\beta = 1$, this reduces to the standard Euler equation derived above. When $\beta < 1$, the return to saving perceived by the current self is higher than the marginal product of capital by the (positive) amount $\big(\beta^{-1} - 1\big)\, h'(k_{t+1})$, which captures the fact that the current self, while being impatient about tomorrow, is more patient than tomorrow's self about the future starting tomorrow. Therefore, he wants to save an additional amount whose intensity increases with how much of that additional wealth tomorrow's self will save (given by $h'(k_{t+1})$).

Exercise 13.15. Consider a firm's optimal investment problem in the presence of convex capital adjustment costs. Specifically, the firm's managers choose investment policy $\{I_t\}$ and labor demand $\{L_t\}$ to maximize the value of the firm, which equals the value of the future dividend stream generated by the firm, $\{D_{t+j}\}_{j=1}^{\infty}$, discounted by the marginal rate of substitution process of the firm's owners, $\{\beta^j \Lambda_{t,t+j}\}_{j=1}^{\infty}$. The firm's dividend is $D_t = Z_t K_t^{\theta} L_t^{1-\theta} - W_t L_t - I_t$, where $\log Z_t$ follows an AR(1) process, $K_t$ and $L_t$ are capital and labor employed by the firm, and $W_t$ is the wage rate. The firm solves:

$$ P_t^s = \max_{\{I_{t+j},\, L_{t+j}\}} \; \mathbb{E}_t\left[ \sum_{j=1}^{\infty} \beta^j \Lambda_{t,t+j}\, D_{t+j} \right] \tag{13.5.11} $$

subject to the law of motion for capital, which features "adjustment costs" in investment:

$$ K_{t+1} = (1 - \delta)K_t + \Phi\!\left( \frac{I_t}{K_t} \right) K_t. \tag{13.5.12} $$

Let $\Phi = a_1 (I_t/K_t)^{1 - 1/\xi} + a_2$, and choose $a_1$ and $a_2$ such that the balanced growth path is invariant to $\xi$. Derive the Euler equation that characterizes the optimal policy of this firm.

13.6 Dynamic Programming: An Alternative Formulation

While the formulation adopted so far is the most common way to write down a dynamic programming problem in economic applications, it is not the only possible one. In particular, in the formulation of Section 13.2, we had $x_{t+1} = G(x_t, y_t, z_t)$, implying that today's state $(x_t, z_t)$ together with today's choice $y_t$ determined tomorrow's endogenous state variable, $x_{t+1}$, which was an argument of the value function on the right-hand side of the Bellman equation. Consequently, the first-order condition for $y_t$ (equation (13.4.1)) involved the derivative of tomorrow's value function—a function that is not known until the problem is solved.

An alternative formulation proceeds as follows. First, we assume that the state space is discrete: $x \in \{x_0, x_1, x_2, \ldots, x_N\}$. In general, this is not a palatable assumption, because in the original formulation it would have required the choice variable to also be discrete (so that we can move from today's grid to tomorrow's grid and not land off-grid). However, we will not be making that assumption. Instead, we assume that the choice variable determines the transition probabilities between today's and tomorrow's states: $\pi_{ij}(y_t) \equiv \Pr\{x_{t+1} = x_j \mid x_t = x_i, y_t\}$. This specification is more general than it may first appear. For example, if $y_t$ is savings, we can allow it to be a continuous choice and specify the function $\pi_{ij}(y_t)$ judiciously. For example, we could assume that higher savings today results in a probability distribution over tomorrow's wealth ($x_{t+1}$) that first-order stochastically dominates the one with lower savings.[21] Similarly, one can model deterministic transitions with a matrix of properly placed zeros and ones. We can allow for constraints by adding virtual states with large negative payoffs. And so on.

[21] $F\{x_{t+1} \le x_j \mid x_t = x_i, y_a\} \le F\{x_{t+1} \le x_j \mid x_t = x_i, y_b\}$ for all $y_a > y_b$ (where $F$ is the cumulative distribution function corresponding to $\pi_{ij}$), with strict inequality for some $x_j$.

With this formulation, we can now write the Bellman equation analogous to (13.2.2) as:

$$ V(x_i) = \max_{y \in \Gamma(x_i)} \left[ U(x_i, y) + \beta \sum_{j=1}^{N} \pi_{ij}(y)\, V(x_j) \right]. \tag{13.6.1} $$

Notice several differences between this formulation and the previous one. First, $z_t$ is no longer a separate state variable, because we assume that all the effects of Markov uncertainty are subsumed into the transition matrix $\pi$. Second, and more importantly, today's choice variable $y$ no longer enters the unknown value function $V$ as an argument; it, too, is subsumed into the transition matrix. To see why this matters, differentiate the right-hand side with respect to $y$ to obtain the first-order condition:

$$ U_2(x_i, y) + \beta \sum_{j=1}^{N} \pi'_{ij}(y)\, V(x_j) = 0. $$

We have quite a bit more to say about this formulation in the coming chapters. Some especially useful computational tricks have been developed in this framework, and they extend nicely to frameworks with a continuous state. More on this later.
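Here is a minimal sketch of this formulation (my own construction; the wealth grid, the transition function $\pi_{ij}(y)$, and the utility function are all made-up illustrations). A continuous savings choice $y \in [0, 1]$ shifts probability mass toward higher wealth states—so a higher $y$ generates a first-order stochastically dominating distribution—and the Bellman equation (13.6.1) is solved by value iteration with $y$ optimized over a fine grid:

```python
import numpy as np

beta = 0.95
x = np.array([0.5, 1.0, 1.5, 2.0])           # discrete wealth states (made up)
N = x.size
y_grid = np.linspace(0.0, 1.0, 101)          # continuous choice, on a fine grid

def P(y):
    """pi_ij(y): saving more shifts mass toward the next-higher state, so a
    larger y yields a first-order stochastically dominating distribution."""
    Pm = np.zeros((N, N))
    for i in range(N - 1):
        Pm[i, i], Pm[i, i + 1] = 1.0 - 0.8 * y, 0.8 * y
    Pm[N - 1, N - 1] = 1.0                   # top state absorbing (for brevity)
    return Pm

def U(xi, y):
    return np.log(xi * (1.0 - 0.5 * y))      # saving y foregoes consumption

V = np.zeros(N)
for _ in range(2000):
    V_new = np.array([max(U(x[i], y) + beta * P(y)[i] @ V for y in y_grid)
                      for i in range(N)])
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

policy = [y_grid[np.argmax([U(x[i], y) + beta * P(y)[i] @ V for y in y_grid])]
          for i in range(N)]
print("V:", V_new, "\noptimal y by state:", policy)
```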


