Experimentation and Persuasion in Political Organizations

Alexander V. Hirsch 1

February 28, 2015

1 Division of the Humanities and Social Sciences, California Institute of Technology, MC 228-77, Pasadena, CA 91125. Email: [email protected].

Abstract

Different beliefs about how to achieve shared goals are common in political organizations such as government agencies, campaigns, and NGOs. However, the consequences of such conflicts have not yet been explored. We develop a formal model in which a principal and an agent disagree about the right policy for achieving their shared goals. Disagreement creates a motivational problem, but we show how both observing policy outcomes and experimenting with policies can ameliorate it. We also show that the principal often defers to the agent in order to motivate him, thereby generating more informative policy outcomes and building future consensus. Most surprisingly, she sometimes allows the agent to implement his desired policy even when she is sure it is wrong, to persuade him through failure that he is mistaken. Using the model, we generate empirical implications about performance measurement and Presidential appointments in U.S. federal agencies.

1 Introduction

Within political organizations, strong disagreements about how to achieve shared goals are

common. For example, Presidents often appoint agency heads whose beliefs about how

to achieve agency goals differ sharply from those of career bureaucrats (Heclo 1974). Legislators often

disagree strongly about the likely efficacy of new programs that are proposed by federal

agency heads (Carpenter 2001, Ch. 4). And party leaders and activists often disagree about

the likely political consequences of different tactics, e.g., the recent battles between Speaker

Boehner and the Tea Party caucus over a government shutdown (Draper 2014).

Despite the frequency of “belief conflicts,” their consequences have received little

attention in the literatures on political agency and bureaucratic politics. The typical model

considers an agent who has different goals from, and better information than, his principal,

and analyzes how he can exploit his information to achieve his goals at the principal’s

expense (Moe 2012). Such models have produced important insights about when Congress will

delegate policymaking to the bureaucracy (Epstein and O’Halloran 1999; Huber and Shipan

2002) and how the civil service should be designed (Gailmard and Patty 2007). However,

they cannot be used to study open belief conflicts because, by construction, any belief

conflict cannot be open – if the principal knew the agent’s beliefs, she could infer the information

that led to them, use it to revise her own beliefs, and eventually eliminate their disagreement.

In this paper we develop a principal-agent model of policymaking in a political

organization to study belief conflicts. At the heart of our analysis is a simple idea: what makes belief

conflicts different from goal conflicts is that beliefs can change systematically with learning,

while goals do not. In particular, individuals might be persuaded to revise their beliefs in

light of their experiences with actually implementing policies. When strategic political actors

anticipate this, it can influence their policy decisions in subtle and surprising ways.

In the model a principal repeatedly chooses policy, but is uncertain about which one

will most effectively achieve her goals. Her choices must be implemented by an agent, whose

effort increases the chance that an effective policy will succeed, but is wasted on an ineffective

policy. As in the existing literature, we assume the agent cannot be directly monitored or

controlled by the principal. In contrast to the existing literature, we assume that he shares the

principal’s goals, and is no better informed about how to achieve them. Instead, he disagrees

about which policy is most likely to achieve those goals in the sense of heterogeneous prior

beliefs (Morris 1995). Because this assumption is an important point of departure from the

existing literature, we briefly defer an in-depth discussion of its meaning and import.

The following running example illustrates the idea. Suppose a city is trying to combat

a wave of violent crime but city officials disagree about its underlying cause, and thus the

best policy response. The mayor (the principal) thinks the main cause is a concentration of

violent criminals, and wants to magnify the department’s efforts to solve violent crimes and

get perpetrators off the streets. The police chief (the agent) thinks that petty crime and

general disorder have caused the community to disengage, thereby encouraging all crime, i.e.,

he believes in “broken windows policing” (Wilson and Kelling 1982). He wants to refocus

the department’s energies on maintaining order by punishing and preventing misdemeanors.

An immediate insight that emerges from the model is that the mayor faces an agency

problem with her police chief, even though they have exactly the same goal of reducing

crime. The reason is that the police chief disagrees that “more of the same” will do the

job. Consequently, he will likely be demoralized, and work with less intensity, if denied the

chance to implement a new policy that he believes is more likely to be successful.

How should the principal manage her agent in the face of a belief conflict? In a private

firm she could impose her desired policy, but also use tools from classical principal-agent

theory, such as closer monitoring, a bonus for a good outcome, and a threat of termination

for a bad one. But these tools are often very costly, unavailable, or ineffective in political

organizations because of civil service protections and political considerations (Brehm and

Gates 1999; Lewis 2008). Instead, principals mainly motivate through their policy choices.

It turns out that the effect of those choices on the agent depends crucially on whether or not

the organization can learn by observing policy outcomes.

When learning is impossible, the principal’s policy choices in the face of a belief conflict

exhibit a type of “coping” behavior (Wilson 1989). Because neither inputs (the agent’s

effort) nor outputs (success or failure) can be observed, the principal must make decisions

that cope with disagreement without trying to resolve it. In the model, the principal copes

by deferring to the agent and selecting his desired policy whenever his beliefs are stronger

than her own, calculating that the agent’s greater motivation will outweigh the cost of an

inferior policy. Importantly, the principal does not defer because she thinks that the agent’s

beliefs reflect superior information, as in classical political agency models.

If the principal and agent can observe policy outcomes and learn from them, however,

the principal can do more than just cope. She can experiment with policies – try one out

but discard it for an alternative if it fails. She can also try to persuade the agent to agree

with her through her policy choices. The model with learning produces three main results.

First, we show that introducing the ability to learn by observing policy outcomes is

enough to increase the agent’s effort – even if the principal completely ignores those outcomes

in her decisions. With learning, the agent values his own effort more because it also makes

policy outcomes more informative about the efficacy of the organization’s policy. This helps

the agent decide how hard to work in the future. To understand this effect, consider two

possible approaches the police chief could take if allowed to implement “broken windows

policing”; he could implement the policy haphazardly, or be sure to put in extra time, effort,

and manpower. If broken windows fails to produce results in the former case, it might be

because it was inadequately implemented. But if it fails in the latter case, this is stronger

evidence that it is probably not that effective.

Second, we show that experimenting isn’t just good for finding better policies (Callander

2011) – it’s also better for motivating the agent. The reason is that experimenting ties

the organization’s future policies to the results of the agent’s effort, which motivates him

to work harder. If the mayor in the example says, “we’ll try broken windows today, but

do something different tomorrow if it doesn’t work,” this raises the stakes over its initial

success, and motivates the police chief to work harder to ensure that success. Surprisingly,

experimenting better motivates the agent even when it is with the policy he opposes; he never

“sabotages” that policy by withdrawing his effort to ensure failure and a policy change.

Finally, we show that the potential for learning causes the principal to defer more to the

agent. This occurs because learning makes it doubly important to motivate the agent: more

effort produces both a higher chance of success if the chosen policy is effective, and speeds

learning about whether that policy is effective.

In fact, learning makes the agent’s effort triply valuable because it also reduces future

belief conflicts (Smith and Stam 2004). This insight produces our most striking result: the

principal sometimes defers to the agent even when she is already sure that the policy he

favors is ineffective. In other words, the mayor might allow the police chief to implement

broken windows even expecting it to fail. The reason is that she expects failure despite the

chief’s high effort to effectively persuade him that his pet policy doesn’t work. This strategy

of “deferring to persuade” can’t occur in classical political agency models because the only

reason the principal defers is to exploit the agent’s superior expertise.

Summarizing, our analysis makes several contributions. First, it begins to identify the

differences between conflict over goals and conflict over beliefs in political organizations.

Second, it provides new insights about a variety of managerial techniques that have been

extensively discussed in both the political science and public administration literatures;

we show motivational benefits to measuring performance, encouraging experimentation, and

deferring to subordinates. Third, it generates new empirical implications about management

patterns in political organizations; after presenting the model we discuss how to test them

using data from “performance measurement” initiatives in the U.S. federal bureaucracy. We

also discuss how the model can shed further light on how Presidential appointments impact

bureaucratic morale and performance (Lewis 2008). Finally, our analysis contributes to a

growing literature on policy experimentation in political environments. When concluding we

discuss several avenues for future work across political science enabled by the model.

Belief Conflicts and Learning The interaction between belief conflicts, learning, and

policymaking has long been of interest in the public policy literature. Sabatier (1988)

(building on seminal work by Heclo (1974)) placed elite “belief systems” about causal effects at

the heart of his influential theory about how “advocacy coalitions” generate policy change,

arguing that “much of the policy debate can be understood as disputes over the validity of ...

causal theories” (p. 157). This work spawned several decades of research on policy learning

and change (Weible, Sabatier, and McQueen 2009); Hall (1993), for example, studies how

high unemployment and stagnating growth in 1970s Britain discredited Keynesian policies

among politicians on both sides of the ideological spectrum who were “seeking solutions to

Britain’s economic problems.”

Only half of these ideas, however, have thus far been incorporated into formal political

science. Callander (2011) and Callander and Hummel (2014) study learning about the spatial

location of policies through experimentation, either by a single actor or two competing actors.

Volden, Ting, and Carpenter (2008) analyze how a group of actors with different ideologies –

such as states or municipalities – learn about the quality of policies through experimentation.

Neither considers how conflicting beliefs affect this process.

This gap in the literature is rooted in the methodological traditions of both political

science and economics. In political science, conflicts are customarily thought to result from

different economic interests (Bartels 2008) or political ideologies (Abramowitz and Saunders

2008). In economics and formal political science, there are many models with differences in

beliefs, but they are nearly always modeled as arising from differences in information. This

is referred to as the common priors assumption, because individuals are assumed to “begin

the game” in some hypothetical condition in which they share common beliefs about all the

relevant uncertainty in the world, and then each receive some “private information,” known

only to themselves, that causes those beliefs to diverge (Morris 1995). While there are many

arguments in the literature both for and against the common priors assumption,1 the key

1 Briefly, the two main arguments in favor are a philosophical belief that two rational individuals who share all the same information ought to agree, and a practical concern that

practical limitation of such models is that they cannot be easily used to study open belief

conflicts. The reason is that knowledge of a belief conflict in a fully rational model with

common priors causes individuals to infer each other’s information, revise their own beliefs

in light of that information, and eventually come to agreement.2

Rather than discard rationality and introduce a host of additional complications, in this

paper we study belief conflicts simply and transparently by discarding the common priors

assumption. That is, we assume outright that the principal and the agent disagree because

they have heterogeneous prior beliefs about which policy is most likely to be effective for

achieving their goals. They are fully aware of each other’s beliefs,3 and do not infer anything

from just the fact that those beliefs differ. While this approach is unorthodox, it nevertheless

has a long history in formal modeling, beginning with Arrow (1964) and continuing through

to the present (Van den Steen 2010b; Minozzi 2013).

As in the present study, a growing subset of this literature studies how belief conflicts

interact with learning (Yildiz 2004; Che and Kartik 2009; Smith and Stam 2004; Van den

Steen 2010a). An important common insight is that individuals with belief conflicts think

they can persuade each other by taking actions that will produce more information, each

expecting it to “prove” that they were right. An example is Smith and Stam’s (2004)

pathbreaking study of bargaining and fighting between two states that disagree about their

relative strength. Because “the act of waging war reveals information about the relative

strengths of each side,” the two parties “fight to resolve their difference of opinions” about

who is stronger (p. 783). They continue to fight until enough information has been revealed,

and “each side’s beliefs converge sufficiently, that they can agree to a settlement” (p. 787).

a modeler can “predict anything” if free to choose beliefs. See Morris (1995) and Smith and Stam (2004) for extensive and accessible discussions.

2 Specifically, Aumann (1976) showed that a group of individuals cannot have common knowledge of a belief disagreement if they have common priors. Relatedly, Geanakoplos and Polemarchakis (1982) show that two individuals with common priors who take turns truthfully exchanging beliefs must agree in finite time.

3 Formally, they have common knowledge of heterogeneous prior beliefs; i.e., they know each other’s different beliefs, know that they know each other’s beliefs, etc.

Public Administration and Bureaucratic Politics While our model is general to many

types of political organizations, an important application is to the study of bureaucracy.

Our assumption that the agent shares the principal’s goals is supported by a vast

literature in bureaucratic politics and public administration that studies the intrinsic motivation

of bureaucrats to carry out their tasks (Brehm and Gates 1999). Many sources for this

motivation have been documented. The agent may like to serve the “public good,” and think

that achieving the principal’s goals would do so; this “public service motivation” (PSM) has

been studied extensively in public administration (Perry and Hondeghem 2008). The agent

may have been actively socialized to share those goals, as is common in “mission driven

organizations” (Kaufman 1967; DiIulio 1987; Goodsell 2011), or he may view his “role” as

serving his political superiors (Wilson 1989; Golden 2000). Theory and data also support

the proposition that intrinsically motivated individuals will sort into public organizations

(Besley and Ghatak 2006; Perry, Hondeghem, and Wise 2010), and once hired, be assigned

to carry out tasks they care about (Brehm and Gates 1999; Prendergast 2008).

Two key takeaways from the literature on intrinsic motivation are that it matters in

explaining how hard bureaucrats work (Perry, Hondeghem, and Wise 2010), and that direct

mechanisms of control such as monitoring, rewards, and sanctions matter much less, either

because they are unavailable (Lewis 2008) or ineffective (Brehm and Gates 1999; Durant

et al. 2006). We contribute to this literature by separating the agent’s motivation to achieve

the principal’s goals from his potentially differing beliefs about how to do so (Boardman

and Sundquist 2008). Such disagreements are well documented by case studies of federal

management and surveys of political elites (Heclo 1977; Sabatier and Hunter 1989; Golden

2000). This separation immediately yields an underappreciated insight: intrinsic motivation

alone doesn’t “solve” agency problems, because it does a principal little good to have an agent

who shares her goals but disagrees strongly about how to achieve them. We can then derive

new results about how an intrinsically motivated agent is influenced by his environment –

specifically, the presence of “performance information” about policy outcomes (Moynihan

2008) and the propensity of managers and political principals to experiment and defer.

Given the need to explore our new assumptions about belief conflicts, we do not directly

consider how external political factors (such as elections or separation of powers) create or

influence those conflicts. Nevertheless, our study is motivated by the broader perspective that

“to understand issues of control and oversight over an organization ... one must understand

issues of compliance within an organization” (Brehm and Gates 1999, p. 2). This perspective

is shared by many recent works in bureaucratic politics; Gailmard and Patty (2007) and

Lewis (2008) both study how decisions by politicians affect bureaucratic performance,

while Huber and McCarty (2004) consider the reverse. While we mostly analyze a single

principal and agent, we also begin to move toward this larger goal by considering a variant

in which the principal is appointed by another player, and discuss how it sheds light on the

impact of Presidential appointments on bureaucratic performance (Lewis 2008).

2 Model

The model is a game of repeated policy choice and implementation played between a principal

and an agent. They seek to achieve a shared organizational goal in each of two periods

t ∈ {1, 2}. For simplicity, we assume that the outcome in each period yt is either success

(yt = 1) or failure (yt = 0).

Player 1 is the principal; in each period she publicly chooses a course of action or policy

xt from the set X = {a, b}. This can be thought of as her choice about how the organization

will pursue a goal. Player 2 is the agent; in each period he implements the principal’s policy

with unobservable effort et ∈ [0, 1]. (As in Bueno De Mesquita and Stephenson (2007), the

agent chooses how much effort to put into implementing the policy, rather than the policy

itself.) Effort is costly for the agent, and this cost takes the form $-\frac{(e_t)^2}{2\lambda}$, where $\lambda$ is bounded

between $\underline{\lambda}$ and $\bar{\lambda}$.4 The value of λ determines the agent’s willingness to expend costly effort

4 $\bar{\lambda} \approx .68466$ is necessary and sufficient to ensure that the agent’s initial effort is bounded in [0, 1] and is characterized in the proof of Proposition 1. Otherwise exerting effort is

to achieve the organization’s goals and thus captures the strength of his intrinsic motivation.

For simplicity we assume that the players place equal weight on payoffs in each period.

The players’ two period payoffs are thus:

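$$\underbrace{y_1 + y_2}_{\text{(Principal)}} \qquad\qquad \underbrace{\left(y_1 - \frac{(e_1)^2}{2\lambda}\right) + \left(y_2 - \frac{(e_2)^2}{2\lambda}\right)}_{\text{(Agent)}}$$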

These assumptions best capture environments where goals must be achieved repeatedly over

time, such as minimizing the level of a pollutant or reducing crime. However, our insights

about how disagreement and learning affect policy choices clearly extend to settings where

the future is more important than the present, such as a trial or pilot project.

How Successes Occur Intuitively, the success or failure of a policy depends both on

whether it is fundamentally well-suited to achieve the intended goal, and on how well it is

implemented. To capture these intuitions, we assume that “nature” initially draws a state

ω ∈ {a, b} determining which of the two policies is the “correct” one to achieve the goal.

In both periods, the probability that the correct policy (xt = ω) succeeds is equal to the

agent’s effort et, while the incorrect policy (xt ≠ ω) always fails. Thus, more effort produces

a higher chance of success, but only when exerted on the correct policy. Our assumptions

are standard in the economics literature on learning through experimentation (Bergemann

and Hege 2005; Keller, Rady, and Cripps 2005); their main property is that choosing well

and exerting effort are complementary to achieving success, i.e., both are necessary inputs.

A policy problem that might take this form is identifying the root cause of severe smog in

a municipality; city bureaucrats could be uncertain about whether the smog is predominantly

caused by emissions from refineries or emissions from automobiles. To improve air quality

so cheap that the agent chooses $e^* = 1$ in the first period, learns which policy is correct, and always achieves success in the second period. $\underline{\lambda} \approx .23505$ is only relevant for part of Proposition 3 – it ensures that the effect of the agent’s beliefs on effort is not trivial.

they must both correctly identify the root cause, and expend effort effectively regulating it.5

Modeling Disagreement To model disagreement we assume that the players have heterogeneous

prior beliefs about the probability distribution over ω. Specifically, each player

i has their own prior θi that ω = a, i.e., that a is the correct policy, and by implication

their own prior 1 − θi that ω = b. The principal initially believes that a is more

likely to be correct ($\theta_1 \geq \frac{1}{2}$), while the agent believes that b is more likely to be correct

($\theta_2 \leq \frac{1}{2} \iff 1 - \theta_2 \geq \frac{1}{2}$). They know each other’s prior beliefs (in the sense of common

knowledge) and fully understand the nature of their disagreement. In the smog example, the

principal could be a city manager who believes that existing evidence points to refineries as

the root cause, while the agent is a city bureaucrat charged with implementing the policy

who believes that automobiles are the more likely cause.

Observing Outcomes We solve and compare two variants of the model. In the first

variant policy outcomes are never observed, and disagreement persists unchanged throughout

the game. We call this the No Learning Benchmark because it captures policymaking

environments where outcomes cannot be reliably measured, and/or circumstances are changing

too rapidly, for learning to occur. The second variant is the Game with Learning. In it, both

players observe whether the first period policy succeeded or failed (i.e., the value of y1) prior

to making their decisions in the second period. This allows them to learn from the initial

outcome before making their subsequent policy decisions.

3 Coping with Disagreement

We first solve the No Learning Benchmark, in which policy outcomes can’t be observed.

Since choices and outcomes in the first period have no effect on beliefs in the second, the

principal must choose policies that cope with disagreement without trying to resolve it.

5 The city of Los Angeles faced this uncertainty in the 1960s (Krier 1977).

Each period is identical and so can be solved in isolation. We first consider the agent’s

choices. For the organization to succeed the agent must exert high effort, but that effort

only “makes a difference” if the principal has chosen the correct policy. Formally, the agent’s

expected payoff to exerting effort et to implement policy xt = a is equal to $\theta_2 e_t - \frac{(e_t)^2}{2\lambda}$, where

θ2 is his subjective prior belief that a is the correct policy.6 (The expression is identical for

xt = b, but substituting in the agent’s belief 1− θ2 that b is the correct policy.) His optimal

responses to the principal’s policy choices are thus as follows.

Observation 1. The agent’s effort e∗ (xt) and utility U∗2 (xt) for each policy xt ∈ {a, b} are:

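$$e^*(x_t) = \begin{cases} \lambda \theta_2 & \text{if } x_t = a \\ \lambda (1 - \theta_2) & \text{if } x_t = b \end{cases} \qquad\qquad U_2^*(x_t) = \begin{cases} \frac{\lambda}{2} (\theta_2)^2 & \text{if } x_t = a \\ \frac{\lambda}{2} (1 - \theta_2)^2 & \text{if } x_t = b \end{cases}$$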
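
As a quick numerical check on Observation 1, the following sketch maximizes the agent’s per-period objective (belief times effort, minus effort squared over 2λ) on a grid and compares the result with the closed forms e∗ = λ · belief and U∗ = (λ/2) · belief². The function names and the values λ = 0.5 and θ2 = 0.3 are illustrative assumptions, not taken from the paper.

```python
def agent_payoff(effort, belief, lam):
    """Agent's subjective expected per-period payoff: belief * e - e^2 / (2 * lam)."""
    return belief * effort - effort ** 2 / (2 * lam)


def optimal_effort(belief, lam):
    """Closed-form maximizer from the first-order condition belief - e / lam = 0."""
    return lam * belief


def optimal_utility(belief, lam):
    """Payoff at the optimum: (lam / 2) * belief^2."""
    return lam / 2 * belief ** 2


def grid_argmax(belief, lam, steps=10_000):
    """Brute-force maximizer over efforts in [0, 1], used only as a sanity check."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda e: agent_payoff(e, belief, lam))


# Illustrative (assumed) parameter values.
lam, theta2 = 0.5, 0.3
effort_on_a = optimal_effort(theta2, lam)      # under policy a the relevant belief is theta2
effort_on_b = optimal_effort(1 - theta2, lam)  # under policy b it is 1 - theta2
```

With these values the agent, who favors b, works more than twice as hard on b as on a, which is the motivational gap the principal must manage.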

The left panel of Figure 1 depicts how much effort the agent would exert on each policy as a

function of his beliefs. On all figures, the x-axis illustrates the strength of the agent’s belief

1 − θ2 that his desired policy b is correct; moving right along the x-axis causes the agent’s

beliefs to diverge from those of the principal, and θ2 to decrease.

Figure 1 illustrates that the agent sharing the principal’s goals is alone not enough to

ensure that he works hard to achieve them (Brehm and Gates 1999). While higher intrinsic

motivation λ does induce more effort in our model (Besley and Ghatak 2006), high effort

also requires that the agent believe the principal chose correctly. The reason is that he has

no interest in wasting his effort on an ineffective policy.

The agent’s disagreement about the right policy thus creates a managerial dilemma. If the

principal imposes the policy she believes in (a), the agent will be less motivated, thinking his

effort would be wasted. If she instead defers to the agent by choosing the policy he believes

in (b), then this will better motivate him – but from the principal’s perspective, there are

better than even odds that this extra motivation is useless. Either resolution entails a cost,

either in terms of choosing well or in terms of motivating the agent (Van den Steen 2009).

6 “Subjective” because the agent uses his own prior θ2 when evaluating expected utility.

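Figure 1: Effort and Policy Choice in the No Learning Benchmark

[Left panel, “Agent’s optimal effort”: effort on a and effort on b (vertical axis, with λ/2 and λ marked) plotted against the agent’s beliefs in favor of b (horizontal axis, from θ2 = 1/2 to θ2 = 0). Right panel, “No Learning Benchmark”: the principal’s policy choice plotted against the agent’s beliefs in favor of b (horizontal axis, from θ2 = 1/2 to θ2 = 0) and the principal’s beliefs (vertical axis, from θ1 = 1/2 to θ1 = 1); the line θ1 = 1 − θ2 separates the region where the principal defers and chooses b from the region where she imposes a.]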

The following observation characterizes how the principal copes with this disagreement.

Observation 2. In the unique equilibrium of the No Learning Benchmark, the principal

defers to the agent and chooses policy x∗ = b in both periods if and only if the agent’s belief

that b is correct is at least as strong as her own belief that a is correct, i.e., 1 − θ2 ≥ θ1. Otherwise,

she imposes policy x∗ = a in each period.
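
The deference rule in Observation 2 can be checked directly. In the sketch below the principal weights the agent’s effort on each policy (from Observation 1) by her own belief that the policy is correct, and defers when b does at least as well as a. Exact fractions are used so that ties on the boundary 1 − θ2 = θ1 are handled without rounding error. The function names, the grid, and the value of λ are illustrative assumptions.

```python
from fractions import Fraction


def principal_payoff(policy, theta1, theta2, lam):
    """Principal's expected per-period success probability for a policy.

    Her belief that the policy is correct, times the agent's optimal effort on it.
    """
    if policy == "a":
        return theta1 * (lam * theta2)
    return (1 - theta1) * (lam * (1 - theta2))


def defers(theta1, theta2, lam):
    """True if b is at least as good as a for the principal (ties resolved toward b)."""
    return principal_payoff("b", theta1, theta2, lam) >= principal_payoff("a", theta1, theta2, lam)


lam = Fraction(1, 2)  # assumed value; any positive lam gives the same comparison
grid = [Fraction(k, 20) for k in range(21)]

# Compare the payoff-based rule with the belief threshold 1 - theta2 >= theta1
# over all grid points with theta1 >= 1/2 and theta2 <= 1/2.
mismatches = [
    (t1, t2)
    for t1 in grid if t1 >= Fraction(1, 2)
    for t2 in grid if t2 <= Fraction(1, 2)
    if defers(t1, t2, lam) != (1 - t2 >= t1)
]
```

The list `mismatches` is empty because (1 − θ1)(1 − θ2) ≥ θ1θ2 simplifies to 1 − θ2 ≥ θ1, so λ scales both sides equally and drops out of the comparison.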

The right panel of Figure 1 shows the principal’s policy choice x∗ as a function of the players’

beliefs. The y-axis illustrates the strength of the principal’s beliefs θ1 that her desired policy

a is correct; moving up along it causes the principal’s beliefs to diverge from the agent’s.

The principal thus defers to the agent whenever the agent’s beliefs in favor of policy b

are sufficiently strong. But the rationale is very different than in classic political agency

models, in which deference is used to exploit the agent’s superior expertise and improves

policy outcomes despite his differing preferences (Moe 2012). In our model, the agent has

the same preferences as the principal but differing beliefs, and the principal does not ascribe

the difference to superior expertise. She defers simply because she has no other good option.

She needs the agent to work, can’t force him to do so, and concludes that more effort on

an inferior policy is a better bet than less effort on a superior policy. Disagreement and

deference are thus pure cost, in that the principal would prefer if she could just choose a,

and the agent agreed that it was right (i.e., θ1 · 1 > max {θ1θ2, (1− θ1) (1− θ2)} for all θ2).

4 The Game with Learning

We now consider the Game with Learning, in which the actors can observe and learn from

policy outcomes. We first analyze how this learning will occur.

Learning by Doing When the players can observe whether the initial policy succeeded

or failed before making decisions in the second period, they will update their beliefs about

which policy is the right one using Bayes’ rule. Exactly how this updating will occur depends

crucially on how hard the agent worked; failure is a more informative signal that the initial

policy was incorrect if it was implemented with high effort. To see this intuitively, recall

the pollution example. Suppose that regulations are imposed on refineries, but the effects

on smog are minimal. This failure provides indirect evidence that automobiles are actually

the root cause, but that indirect evidence is stronger if the bureaucrat vigorously enforced

regulations on refineries than if he weakly enforced them.

To see this mathematically, suppose the principal initially chooses policy x1 = a, and the

agent implements it with effort e1. If a succeeds (y1 = 1), then updating beliefs is simple.

Since only the correct policy can succeed under our assumptions, both the principal and

the agent agree after success that a is definitely the correct policy. If a fails, however,

it could either be because it was incorrect, or because it was inadequately implemented.

Each player i then computes a posterior belief that a failed despite being correct equal

to $h(e_1, \theta_i) = \frac{\theta_i (1 - e_1)}{\theta_i (1 - e_1) + (1 - \theta_i)} < \theta_i$. This is player i’s prior assessment θi (1 − e1) of the

probability a was correct but still failed, divided by the unconditional probability of failure.

This posterior is lower than the prior θi, so failure is always a negative signal. However, how

much lower depends on how much effort e1 the agent put in: the harder he worked, the more

both players ascribe failure to the policy being incorrect rather than poorly implemented.
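
A small numerical illustration of this updating rule follows. It evaluates the posterior h(e1, θi) after a failure at a low and a high effort level for a player with prior 0.6; the prior and the effort levels are assumed values chosen only for illustration.

```python
def posterior_after_failure(effort, prior):
    """Posterior that the chosen policy was correct, given that it failed.

    Bayes' rule: prior * (1 - effort) is the probability of "correct but failed",
    and (1 - prior) is the probability of "incorrect" (which always fails).
    """
    return prior * (1 - effort) / (prior * (1 - effort) + (1 - prior))


prior = 0.6  # assumed prior that the initial policy is correct
after_low_effort = posterior_after_failure(0.1, prior)
after_high_effort = posterior_after_failure(0.6, prior)
```

Here failure after low effort leaves the posterior close to the prior (about 0.57), while failure after high effort pulls it down to 0.375, so the same outcome is far more damaging to the policy when it was implemented vigorously.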

Principal’s Strategies When policy outcomes can be observed, the strategies available to

the principal also become more complicated, because she can base the second period policy

on the first period outcome. In theory this allows for even “pathological” strategies such as

switching from the initial policy only if it succeeds. However, it turns out that such strategies

will never be used in equilibrium; the reason is that success persuades both players that the

initial policy was correct. We can thus restrict attention to two types of strategies: ones

where the principal rigidly implements the same policy in both periods regardless of the

outcome, and ones where she experiments with the initial policy, i.e., sticks with the policy

if it succeeds, but abandons the policy if it fails.7

Observation 3. A pure strategy for the principal in the Game with Learning consists of a

first period policy x1, and a second period policy x2 (x1, y1) for each first period policy x1 and

outcome y1. In a pure strategy equilibrium, on each possible policy x1 the principal either

• rigidly implements x1, i.e., chooses it in both periods regardless of the first period

outcome (x2 (x1, y1 = 1) = x2 (x1, y1 = 0) = x1).

• experiments with x1, i.e., chooses it again in the second period if it succeeds (x2 (x1, y1 = 1) = x1)

but switches to the alternative if it fails (x2 (x1, y1 = 0) ≠ x1).

Note that the principal can’t directly base her decision to experiment on how hard the agent

worked because his effort is unobservable. Instead, in equilibrium the principal’s strategy

must be optimal given her equilibrium beliefs about how hard the agent worked.8

7 For some parameters the model exhibits both multiple and mixed strategy equilibria. Since our focus is on what the principal can achieve managing the agent, we consider only the equilibrium that is best for the principal, which is always in pure strategies.

8 Formally, the principal’s second period choices x2(x1, y1) after choosing policy x1 are not a function of the agent’s initial effort e1(x1) on that policy. Instead, experimenting with a is optimal in equilibrium if and only if h (e1 (a) , θ1) ≤ 1 − h (e1 (a) , θ2), i.e., if after failure the principal’s posterior that a was correct is below the agent’s posterior that b is correct.


5 Learning, Experimentation, and Deference

We now present results from the Game with Learning, focusing on three questions. First,

how does the presence of “performance information” – that is, the ability to observe policy

outcomes – directly affect the agent’s incentives? Second, how does the agent respond when

the principal also uses that information in her decisions by experimenting with policies?

Finally, when will the principal defer to the agent, and why? To clarify the exposition,

mathematical details are relegated to a main Appendix (in-print) and a supplemental online

Appendix. The main Appendix contains proofs of Propositions 1 and 2, while the online

Appendix contains the general equilibrium characterization and remaining proofs.

5.1 Learning

A principal who rigidly implements policies ignores “performance information” in her decisions.

Thus, by comparing the agent’s effort in this case to his effort in the No Learning

Benchmark, we can isolate how that information directly affects the agent. The reason there

is any effect is two-fold. First, the agent would like to learn whether the initial policy was

incorrect in order to avoid wasting his costly effort in the future. Second, he can influence

that learning through his initial choice of implementation effort. To see how the agent will

choose that effort we analyze his optimization problem, working backwards.

Suppose the principal’s strategy is to rigidly implement policy a (i.e., x1 = a, x2 (a, 1) =

x2 (a, 0) = a).9 Then the agent anticipates that his future effort on a will be based on his

posterior belief that a was correct after observing its outcome, i.e., e2 = λh (e1, θ2). His

two-period expected utility as a function of his first-period effort e1 is then:

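$$
-\frac{(e_1)^2}{2\lambda} + \theta_2\left(e_1\left(1+\frac{\lambda}{2}\right) + (1-e_1)\left(\lambda h(e_1,\theta_2) - \frac{\lambda h^2(e_1,\theta_2)}{2}\right)\right) + (1-\theta_2)\left(-\frac{\lambda h^2(e_1,\theta_2)}{2}\right). \tag{1}
$$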

The first term is his first-period cost of effort. The second term is his two-period expected

9The analysis is identical for x1 = b but substituting in 1− θ2 for θ2.


utility if a is correct, which he thinks is true with probability θ2. With probability e1

policy a succeeds, he gets 1 in the first period, the principal sticks with a, and he gets $\frac{\lambda}{2}$

in the second period. With probability 1 − e1, a fails despite being correct, he mistakenly

revises his posterior belief downward to h (e1, θ2), the principal sticks with a, and he gets

$\lambda h(e_1,\theta_2) - \frac{\lambda h^2(e_1,\theta_2)}{2}$ in the second period. The third and final term is his two-period

expected utility if a is incorrect, which he thinks is true with probability 1 − θ2; a will fail

for sure in the first period, the principal will stay with it, it will again fail for sure in the

second period, and he pays the cost of his wasted effort λh (e1, θ2).

Equation 1 is complicated, but a little algebraic manipulation yields a simpler expression:

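$$
\underbrace{\left(\theta_2 e_1 - \frac{(e_1)^2}{2\lambda}\right)}_{\text{first period payoff}} + \underbrace{\frac{\lambda}{2}\theta_2\Big(e_1 + (1-e_1)\,h(e_1,\theta_2)\Big)}_{\text{second period payoff}}. \tag{2}
$$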

The first term is the agent’s first-period expected payoff as a function of his initial effort e1,

and is identical to his per-period payoff in the No Learning Benchmark. The second term

is his expected future payoff as a function of his initial effort. The key insight is that this

term is increasing in effort e1. The reason is that more initial effort makes the first period

policy outcome more informative, resulting in more learning. This benefits the agent in the

second period by allowing him to better calibrate his effort, i.e., to work hard if the policy

is the correct one and avoid wasting effort if it is not. Formally, e1 + (1− e1)h (e1, θ2) is the

agent’s expected posterior belief that a is correct when it is actually correct. This expression

captures how accurate the agent’s future beliefs will be after observing the initial outcome.
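The simplification from Equation 1 to Equation 2, and the claim that the second-period term is increasing in effort, can be sketched as follows. Here $h$ abbreviates $h(e_1,\theta_2)$, and $F = \theta_2(1-e_1) + (1-\theta_2) = 1-\theta_2 e_1$ is shorthand introduced only for this sketch (it is not notation from the paper) for the probability of first-period failure as the agent sees it.

```latex
\begin{align*}
\theta_2(1-e_1) &= hF, \qquad 1-\theta_2 = (1-h)F, \\[4pt]
\theta_2(1-e_1)\left(\lambda h - \tfrac{\lambda h^2}{2}\right) - (1-\theta_2)\tfrac{\lambda h^2}{2}
  &= \lambda h^2 F - \tfrac{\lambda h^2}{2}\,hF - \tfrac{\lambda h^2}{2}\,(1-h)F \\
  &= \tfrac{\lambda}{2}\,h^2 F \;=\; \tfrac{\lambda}{2}\,\theta_2(1-e_1)\,h, \\[4pt]
\frac{\partial}{\partial e_1}\Big[e_1 + (1-e_1)\,h(e_1,\theta_2)\Big]
  &= \frac{(1-\theta_2)^2}{(1-\theta_2 e_1)^2} \;>\; 0 \quad \text{for } \theta_2 < 1 .
\end{align*}
```

Adding the success term $\theta_2 e_1 \frac{\lambda}{2}$ to the second line gives the second-period payoff in Equation 2, and the last line shows that this payoff rises with first-period effort.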

The presence of a “learning premium” to first-period effort in the Game with Learning

produces the following result, which is proved in the main Appendix.

Proposition 1. When the principal rigidly implements some policy x ∈ {a, b}, observing

outcomes strictly increases both the agent’s first period effort, and the probability of success

in both periods, relative to the No Learning Benchmark.
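A numerical illustration of the first part of Proposition 1 is sketched below. It is not the proof in the Appendix: it maximizes Equation 2 by grid search and compares the result with λθ2, the maximizer of the first-period term alone, taken here as the No Learning Benchmark effort. The function names, the grid resolution, and the belief values are assumptions made for illustration; λ = .653 is borrowed from footnote 17.

```python
def h(e1, theta):
    """Posterior that the policy is correct after a first-period failure."""
    return theta * (1 - e1) / (theta * (1 - e1) + (1 - theta))


def payoff_rigid(e1, theta, lam):
    """Equation 2: the agent's two-period payoff under rigid implementation."""
    first_period = theta * e1 - e1 ** 2 / (2 * lam)
    second_period = (lam / 2) * theta * (e1 + (1 - e1) * h(e1, theta))
    return first_period + second_period


def optimal_effort_rigid(theta, lam, steps=10_000):
    """Grid search over e1 in [0, 1]; a numerical stand-in for the formal argument."""
    grid = (i / steps for i in range(steps + 1))
    return max(grid, key=lambda e1: payoff_rigid(e1, theta, lam))


lam = 0.653  # illustrative value of intrinsic motivation
for theta2 in (0.2, 0.5, 0.8):
    benchmark = lam * theta2  # maximizer of the first-period term alone
    with_learning = optimal_effort_rigid(theta2, lam)
    print(f"theta2={theta2:.1f}  no learning={benchmark:.3f}  observed outcomes={with_learning:.3f}")
```

For these illustrative parameters the grid-search effort exceeds the benchmark at every belief, in line with the learning premium described above.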


The left panel of Figure 2 illustrates how introducing the ability to observe outcomes and

learn from them better motivates the agent in the first period.

Figure 2: Effect of Observing Outcomes and Experimenting on the Agent’s Effort

[Figure not reproduced. Both panels plot the agent’s optimal effort (vertical axis, marked at λ/2 and λ) against his belief θ2 (horizontal axis, starting at θ2 = 0). Left panel: the principal rigidly implements a or b, with observed outcomes versus no learning. Right panel, with observable outcomes: the principal experiments with versus rigidly implements a or b.]

Discussion One of the most important developments in public management over the last

several decades has been the push for “performance information” by governments across the

globe. Reform proponents have drawn on the simple logic of classical principal-agent theory

in economics: if we “better assess what government does and how well it does it ... it will

be easier to hold public administrators accountable for their performance” (Kettl 2005).

Research in bureaucracy and public administration, however, provides little evidence that

performance information can be effectively used this way. Public managers lack the freedom

to promote accountability with high-powered incentives, and politicians seem to choose not

to (Pollitt 2006). Bureaucrats appear to respond mainly to intrinsic rewards, which can

be “crowded out” by extrinsic ones (Georgellis, Iossa, and Tabvuma 2011). And manipulating

extrinsic rewards generates many unintended consequences. Bureaucrats inefficiently

reallocate effort from unmeasured goals to measured ones (Bevan and Hood 2006) – termed

the “multi-task problem” in principal-agent theory (Holmstrom and Milgrom 1991) – and

manipulate the performance information (Milgrom and Roberts 1988; Heinrich 2007).


Our model captures the more empirically supported idea that “the most likely users [of

performance measures] appear to be the agents themselves” (Moynihan, Pandey, and Wright

2012). In our model, an intrinsically motivated agent uses what he learns from outcomes to

allocate effort more efficiently, by working harder when he learns that the policy is likely to be

effective, and working less when he learns that it is not. The principal thus benefits from the

information even when she ignores it in her own decisionmaking; the agent’s greater efficiency

increases the chance of success in both periods relative to the No Learning Benchmark.

In capturing this rationale for performance information, the model also generates a new

and empirically testable implication – that introducing such information better motivates

the agent. The reason is that the harder the agent works to implement a policy, the more

its outcome reflects its underlying quality, and the more the agent learns from observing it.

Ergo, the more valuable is his effort, and the more effort he puts in.

5.2 Experimentation

A principal who experiments with policies is one whose decisions are directly informed by

policy outcomes – she sticks with the initial policy if it succeeds, but abandons it for the

alternative if it fails. The information contained in policy outcomes therefore directly aids

her “search” for the best policy (Callander 2011).

Experimentation, however, also has a more subtle and previously overlooked property:

it indirectly empowers the agent to influence the principal’s policy choices. The reason is

that by working hard he can increase the chance of a success, and therefore the chance that

the principal will stick with the initial policy. Alternatively, by withdrawing his effort from

the initial policy he can “sabotage” it – that is, intentionally increase its chance of failure –

which will cause the principal to abandon it (Brehm and Gates 1999).10

10 Note that these authors reserve the term “sabotage” for expending active effort to induce failure and call withdrawing effort “dissent shirking.” There is no need for active sabotage in our model due to the simplifying assumption that failure can be ensured with shirking; relaxing this would yield active sabotage when it now yields “dissent shirking.”


How will the agent respond when the principal experiments with a policy, rather than

rigidly implements it? Suppose the principal experiments with policy a (i.e. x1 = a,

x2 (a, 1) = a and x2 (a, 0) = b). Analyzing the agent’s two-period expected utility as a

function of his first period effort e1 yields an expression that is analogous to Equation 2

(details are in the main Appendix):

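$$
\underbrace{\left(-\frac{(e_1)^2}{2\lambda} + \theta_2 e_1\right)}_{\text{first period payoff}} + \underbrace{\underbrace{\frac{\lambda}{2}(1-\theta_2)\big(1-h(e_1,\theta_2)\big)}_{\text{learning term}} + \underbrace{\frac{\lambda}{2}\theta_2 e_1}_{\text{policy influence term}}}_{\text{second period payoff}}. \tag{3}
$$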

As in Equation 2, the expression is divided into first- and second-period terms, and the first-period

terms are identical. But because the principal experiments, the second period term

has two subterms: a learning term that represents how the agent’s initial effort affects his

future utility directly through his own learning, and a policy influence term that represents

how his initial effort affects his future utility indirectly through the principal’s policy choices.

The learning term has properties that are similar to the agent’s expected second-period

utility when the principal rigidly implements a.11 The policy influence term $\frac{\lambda}{2}\theta_2 e_1$, however,

only appears when the principal experiments. The key insight is that this term is always

increasing in effort. Intuitively, this means that from the agent’s perspective, working harder

initially always has a beneficial effect on the principal’s future choices. Experimenting thus

incentivizes the agent to work harder initially than does rigid implementation. This result is

perhaps not so surprising for the policy that the agent initially believes in (b), since failure

will cause a switch to the policy he opposes (a). But it also holds for the policy he opposes

(a), even though simply withdrawing his effort would induce certain failure and a switch to

the policy he believes in (b). We state the formal result in the following proposition.

Proposition 2. For any initial beliefs of the agent, the principal elicits more first period

11 The key expression is 1 − h (e1, θ2), which is the expected value of the agent’s posterior that a is wrong when it is actually wrong. This captures how accurate the agent’s future beliefs will be after observing the first period outcome, and it is increasing in e1.


effort by experimenting with either policy x1 ∈ {a, b} than by rigidly implementing it.

The right panel of Figure 2 illustrates the effect by comparing the agent’s first period effort

on each policy when the principal experiments with it vs. rigidly implements it.
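The comparison in Proposition 2 can be illustrated numerically with the sketch below, which maximizes Equations 2 and 3 by grid search. It assumes, by the symmetry noted in footnote 9, that the payoffs for policy b are obtained by substituting 1 − θ2 for θ2 in both equations; the function names, grid resolution, and parameter values are hypothetical.

```python
def h(e1, theta):
    """Posterior that the policy is correct after a first-period failure."""
    return theta * (1 - e1) / (theta * (1 - e1) + (1 - theta))


def payoff_rigid(e1, theta, lam):
    """Equation 2: the principal keeps the policy regardless of the outcome."""
    return theta * e1 - e1 ** 2 / (2 * lam) + (lam / 2) * theta * (e1 + (1 - e1) * h(e1, theta))


def payoff_experiment(e1, theta, lam):
    """Equation 3: the principal switches policies after a failure."""
    first_period = theta * e1 - e1 ** 2 / (2 * lam)
    learning = (lam / 2) * (1 - theta) * (1 - h(e1, theta))
    policy_influence = (lam / 2) * theta * e1
    return first_period + learning + policy_influence


def optimal_effort(payoff, theta, lam, steps=10_000):
    """Grid search for the effort in [0, 1] that maximizes the given payoff."""
    grid = (i / steps for i in range(steps + 1))
    return max(grid, key=lambda e1: payoff(e1, theta, lam))


lam = 0.653   # illustrative value of intrinsic motivation
theta2 = 0.3  # hypothetical agent belief that a is correct (so he favors b)
for policy, belief in (("a", theta2), ("b", 1 - theta2)):
    rigid = optimal_effort(payoff_rigid, belief, lam)
    experiment = optimal_effort(payoff_experiment, belief, lam)
    print(f"policy {policy}: rigid={rigid:.3f}  experiment={experiment:.3f}")
```

Under these assumptions effort is higher with experimentation on both policies, including the policy a that the agent opposes.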

Preferences versus Beliefs Why does experimenting better motivate the agent, even

when it is with a policy he opposes? To clarify, it is helpful to contrast with a version of the

model where the agent’s opposition to policy a is rooted in preferences rather than beliefs.12

Suppose the agent agreed about the probability θ1 that a was the best policy to achieve

the principal’s goals but simply had his own different goals; specifically, his own intrinsic

values πa < πb for exerting effort on each policy, with a preference for b. The agent’s

two-period expected utility for implementing policy x1 = a would then be:

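$$
\underbrace{\left(-\frac{(e_1)^2}{2\lambda} + \pi_a e_1\right)}_{\text{first period payoff}} +
\begin{cases}
\frac{\lambda}{2}\pi_a^2 & \text{if principal rigid} \\[6pt]
\frac{\lambda}{2}\pi_b^2 - \underbrace{\frac{\lambda}{2}\theta_1 e_1\cdot\left(\pi_b^2-\pi_a^2\right)}_{\text{policy influence term}} & \text{if principal experiments}
\end{cases} \tag{4}
$$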

As in the baseline model, experimenting introduces a “policy influence” term into the agent’s

first period objective function. But in contrast, that term would actually be decreasing in

effort. In other words, when the agent’s opposition to policy a is rooted in preferences rather

than beliefs, he would “sabotage” a policy experiment with a by withdrawing some of his

effort to increase the chance of failure and a policy change.
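The contrast can be made concrete with a short sketch based on Equation 4. Because that objective is quadratic in e1, its first-order condition gives the optimal effort in closed form; the closed forms below are derived here from Equation 4 rather than quoted from the paper, effort is truncated to [0, 1], and all parameter values are hypothetical.

```python
def effort_rigid(pi_a, lam):
    """Maximizer of -(e1^2)/(2*lam) + pi_a*e1; the second-period payoff is constant in e1."""
    return min(1.0, max(0.0, lam * pi_a))


def effort_experiment(pi_a, pi_b, theta1, lam):
    """Maximizer when the policy influence term -(lam/2)*theta1*e1*(pi_b^2 - pi_a^2) is present."""
    interior = lam * (pi_a - (lam / 2) * theta1 * (pi_b ** 2 - pi_a ** 2))
    return min(1.0, max(0.0, interior))


lam = 0.653            # illustrative value of intrinsic motivation
pi_a, pi_b = 0.3, 0.8  # hypothetical intrinsic values, with pi_a < pi_b
theta1 = 0.7           # hypothetical shared belief that a is correct

print(f"rigid implementation of a:  effort={effort_rigid(pi_a, lam):.3f}")
print(f"experimentation with a:     effort={effort_experiment(pi_a, pi_b, theta1, lam):.3f}")
```

With preference-based conflict the policy influence term lowers effort under experimentation, the opposite of the belief-based case.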

This sharp distinction between preferences and beliefs arises because an ineffective policy

doesn’t need to be sabotaged to fail; policy change will happen without the agent’s help.

Rather, sabotage only “works” on an effective policy that could have achieved the principal’s

goals with enough effort. When the agent shares those goals (as in the baseline model) this

means that sabotage would only make a difference exactly when the policy is correct, and

the agent wouldn’t want the principal to switch from it.13 When the agent doesn’t share those goals,

12 Details are in the online Appendix.

13 It doesn’t matter that he assigns lower probability to this than the principal (θ2 < θ1).


whether the policy is correct for the principal is irrelevant to him; his net benefit $\frac{\lambda}{2}\left(\pi_b^2 - \pi_a^2\right)$

from inducing failure and a policy change is always the same.

Discussion An influential alternative to the “reward-and-sanction” school of performance

information argues that such information aids public managers in their search for the most

effective policies (Simon 1947; Moynihan 2008). These arguments have been especially influential

in the United States: the idea of explicitly treating “policies as experiments” with

the intent to “learn from them” dates back at least to Great Society-era social engineering

(Campbell 1969), was a key element of the nearly decade-long Clinton administration reform

initiative to “reinvent government” (Osborne 1992; Aberbach 2000), and is now codified in

the Interior Department’s official “adaptive management” policies for resource management

(Lee 1993; Office of Environmental Policy and Compliance 2008).

None of these previous discussions, however, consider how experimentation might affect

the incentives of agents who are charged with actually implementing policy. This is surprising

because nearly any policy must be carried out by lower-level subordinates. Our analysis

demonstrates both why there is an effect – because tying future policy choices to previous

outcomes lets the agent influence policy choice through his effort – and that the direction of

the effect depends crucially on whether conflict is rooted in preferences or beliefs.

An additional implication of our model is that there can be motivational benefits to

arrangements that commit principals to experiment with policies, by forcing, encouraging,

or selecting them to be responsive to negative policy outcomes. In the online Appendix,

we formally show that such institutions can benefit the principal: this is the case when her

beliefs in favor of the optimal initial policy are too strong to credibly abandon it after failure,

but ex-ante she would like to “tie her hands” to better motivate the agent.14

What might some such arrangements be in practice? A blunt one is to directly legislate

trigger mechanisms that make a “pre-negotiated commitment ... specifying what actions

14 We also show the reverse is never true; the principal would never be better off being forced to rigidly implement a policy when she would choose to experiment with it in equilibrium.


will be taken if monitoring information shows x or y” (Nie and Schultz 2012). This was

recently proposed during policy debates on health care15 and national security (Gates 2014,

p. 375). A more flexible one is to impose costs on decision makers – not for failure, but for

sticking with policies after failure.16 The Compstat program developed in New York City

for analyzing high-frequency crime data furnishes an example: as originally conceived, the

program’s central component was a weekly meeting in which “trouble arose” for precinct

commanders not when “crime numbers on their watch went up,” but instead when they

“didn’t have a plan to address the problem” (Maple and Mitchell 2010).

A final possibility is to simply select individuals to make policy decisions whose intrinsic

beliefs are more amenable to experimentation. In the online Appendix we show that a

principal with strong beliefs is sometimes better off appointing a “moderate” to make policy

decisions for her, because they will be more willing to abandon the initial policy after

failure, and thereby better motivate the agent. This insight provides a new perspective on

some of the trade-offs facing U.S. Presidents in their attempts to influence federal agencies

through appointments (Lewis 2008). In particular, it suggests a novel reason why “stressing

loyalty and ideology above all else” (Moe 1985, p. 258) appears to demoralize rank-and-file

bureaucrats (Golden 2000). Because of their extreme beliefs, such appointees not only select

policies the bureaucrats disagree with, but also rigidly implement those policies even after

apparent failures. We return to this idea in Section 6.

5.3 Deference

Finally, we consider the principal’s decision to defer to the agent in the Game with Learning.

A principal who defers gives the agent’s desired policy b a chance to succeed, despite her

own skepticism that it is the right policy. What drives her decision to do so?

In the No Learning Benchmark, the principal defers as a second-best coping strategy

15 http://www.slate.com/articles/news_and_politics/prescriptions/2009/10/public_option_lite.html

16 In the online Appendix, we prove that when the principal is best off experimenting, there is a cost of maintaining policy after failure that would induce her to do so.


given her irresolvable disagreement; when the agent’s beliefs are stronger than her own, she

calculates that more effort on an inferior policy is a better bet than less effort on a superior

policy. In the Game with Learning, however, much more can happen. She could rigidly

implement policy a, given the strength of her beliefs in its favor, or experiment with a,

hoping that success will persuade the agent that it is right. Alternatively, she could defer to

the agent and rigidly implement b, given the strength of his beliefs, or experiment with b,

hoping that failure will persuade him that it is wrong.

These possibilities result in a complicated set of equilibrium policy choices by the principal.

Figure 3 depicts these choices as a function of the players’ initial beliefs.17 The fill

of each region indicates the principal’s first period policy choice – in the dotted regions the

principal imposes x1 = a, while in the solid regions she defers and chooses x1 = b. The darkness

of the shading indicates whether the principal is responsive to failure – in the darkly

shaded regions she rigidly implements the first period policy, while in the lightly shaded

regions she experiments. Despite the complexity of the principal’s choices, we can extract

two insights about her decision to defer.

Proposition 3. In the first period of the Game with Learning, the principal defers for a

larger set of initial beliefs than in the No Learning Benchmark.18 Moreover, when the agent is

both sufficiently motivated ($\lambda > \bar{\lambda}$) and believes strongly enough in b (1 − θ2 is sufficiently high),

the principal always defers and selects policy x1 = b regardless of her own beliefs.

Compared to the No Learning Benchmark, the principal thus defers more often (i.e., for a

larger set of initial beliefs). In addition, when the agent’s beliefs are sufficiently strong and

17 The agent’s intrinsic motivation is fixed at λ ≈ .653. Part of the complexity also arises from the principal’s problem committing to experiment (see Section 5.2). This accounts for the jagged region in the upper left quadrant. When the principal’s beliefs are strengthened (i.e., moving up the y-axis) while the agent’s beliefs are held fixed (e.g., at $\theta_2 = \frac{1}{3}$), policy can temporarily flip back to experimenting with b when the best strategy ex-ante is to experiment with a, but the principal’s prior beliefs are too strong to sustain commitment.

18 This is the only result requiring $\lambda > \bar{\lambda} \approx .23505$ and is a quirk of the principal’s commitment problem. With commitment it holds ∀λ. See the online Appendix for details.


Figure 3: Principal’s Equilibrium Policy Choices

[Figure not reproduced. Recoverable labels: Principal’s Strategy (x1 = b, experiments; x1 = a, experiments; x1 = b, rigid; x1 = a, rigid); Equilibrium Policies; always defers; additional deference; axis marks at θ2 = 0, θ1 = 1/2, and θ1 = 1.]

he is sufficiently motivated, then the principal always defers even when she is already sure

that b is the wrong policy.

Why does the principal defer more in the Game with Learning? As in the No Learning

Benchmark, the principal can always elicit more effort from the agent by simply deferring;

he will always work harder on policy b because he thinks that his effort is more likely to

produce success. In the Game with Learning, however, the agent’s effort is more valuable:

more effort makes the initial policy outcome more informative, which results in better future

decisions by both the principal and the agent (Sections 5.1 – 5.2). Combining these two

effects tilts the scales towards deference.

This explanation, however, presents a puzzle. If the additional deference in the Game

with Learning is due to the informational benefits of the agent’s effort, why does the principal

still defer when she already thinks she knows that a is the right policy? The reason is that

the target of that additional information is sometimes the agent himself. The principal

sometimes defers to persuade the agent through failure that his beliefs were wrong, so that


he will work harder on policy a in the future. This can be an effective way of managing

their disagreement because allowing the agent an opportunity to prove that b is the right

policy strongly motivates him, and makes failure all the more persuasive when it (from the

principal’s perspective) inevitably occurs. The alternative – forcing the agent to implement

policy a immediately – is less effective because he will exert low effort on a, failure will most

likely result, and it will (slightly) reinforce the agent’s initial belief that a is wrong.19
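The logic of persuasion through failure can be illustrated with a rough numerical sketch of the comparison in footnote 19. It fixes the true state at ω = a and computes the agent’s expected future belief that a is right under two strategies: forcing a rigidly, and deferring by experimenting with b. The sketch assumes that the agent’s effort maximizes Equation 2 in the first case and Equation 3 with 1 − θ2 substituted for θ2 in the second; the function names, grid search, and parameter values are hypothetical.

```python
def h(e1, theta):
    """Posterior that the policy is correct after a first-period failure."""
    return theta * (1 - e1) / (theta * (1 - e1) + (1 - theta))


def argmax_effort(payoff, steps=10_000):
    """Grid search for the effort in [0, 1] that maximizes payoff(e1)."""
    return max((i / steps for i in range(steps + 1)), key=payoff)


def belief_in_a_if_forced(theta2, lam):
    """Principal rigidly implements a; true state is a (Equation 2 effort)."""
    def payoff(e1):
        return theta2 * e1 - e1 ** 2 / (2 * lam) + (lam / 2) * theta2 * (e1 + (1 - e1) * h(e1, theta2))
    e1 = argmax_effort(payoff)
    # Success (probability e1) reveals a is right; failure leaves belief h.
    return e1 + (1 - e1) * h(e1, theta2)


def belief_in_a_if_deferred(theta2, lam):
    """Principal experiments with b; true state is a, so b fails for sure."""
    tb = 1 - theta2  # agent's prior that b is correct
    def payoff(e1):
        return (tb * e1 - e1 ** 2 / (2 * lam)
                + (lam / 2) * (1 - tb) * (1 - h(e1, tb))
                + (lam / 2) * tb * e1)
    e1 = argmax_effort(payoff)
    return 1 - h(e1, tb)


lam = 0.653  # illustrative value of intrinsic motivation
for theta2 in (0.1, 0.2, 0.3):
    forced = belief_in_a_if_forced(theta2, lam)
    deferred = belief_in_a_if_deferred(theta2, lam)
    print(f"theta2={theta2:.1f}  forced a: {forced:.3f}  deferred to b: {deferred:.3f}")
```

For these illustrative values the agent ends up more convinced that a is right when he is first allowed to try b and watch it fail.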

Discussion In an extensive formal literature in bureaucratic politics, principals defer, delegate,

or give discretion to agents with differing preferences to use their (or encourage them

to acquire) superior expertise.20 Across multiple models, two implications have emerged as

“the cornerstones of the modern field [of public bureaucracy]” (Moe 2012, p. 21): the ally

principle, which states that deference is less likely the further are the agent’s preferences

from the principal’s, and the uncertainty principle, which states that deference is more likely

“the more uncertain and complex [is] the policy area.” Both of these principles, in general,

fail to hold in our model.

The failure of the ally principle can be seen in Figure 1 (for the No Learning Benchmark)

and Figure 3 (for the Game with Learning). As the agent’s beliefs diverge from the principal’s

because he is increasingly sure that b is right, the principal defers more (i.e., for a larger

set of principal beliefs). This occurs because the principal in our model can’t simply walk

away from the agent; even if she ignores his beliefs, she still needs him to implement her

decisions. This captures an important aspect of real-world policymaking that is assumed

away in most classical models: politicians (and their appointees) will always need lower-level

19 When ω = a and the principal rigidly implements a, the agent’s expected future belief that a is right is e1 + (1 − e1)h (e1, θ2), which is low since θ2 and consequently e1 are low. If ω = a but the principal experiments with b, the agent’s expected future belief that a is right is 1 − h (e1, 1 − θ2), which is higher since 1 − θ2 and consequently e1 are higher.

20 See Gailmard and Patty (2012) for a review. Some ambiguity about terms is worth noting. Deference (following an agent’s recommendations), delegation (relinquishing authority to an agent) and discretion (placing fewer constraints on his authority) are distinct actions. But we discuss them collectively, because the factors that induce principals to employ them in informational models are similar.


subordinates to carry out their policies. As the model clearly shows, it actually matters

what these subordinates think even when they lack superior information, because what they

think affects how well they do their jobs.

The uncertainty principle actually holds in the No Learning Benchmark; the principal

defers for a larger set of agent beliefs when she is less sure that a is right (θ1 closer to 1/2).

However, it fails in the Game with Learning. When the agent’s beliefs are sufficiently strong,

the principal defers regardless of her own (un)certainty. This crucial difference arises because

the principal only defers in the canonical framework to benefit from the agent’s information.

When she is already certain about which policy is right, she has no reason to defer. But in

our model, generating more information can also help the agent learn about which policy

is right. This yields a surprising new reason to defer, even when doing so requires choosing

(what the principal thinks is) the wrong policy: allowing the agent to implement that policy

and watch it fail can more effectively “teach” him to share the principal’s beliefs.

In addition to these differences, our model produces a new testable implication that

has not been previously considered: introducing effective performance measurement (i.e.,

moving from the No Learning Benchmark to the Game with Learning) should also increase

deference. Our model speaks to the effects of performance information because learning

comes from implementing policies and observing outcomes, rather than “drawing a costly

signal” about the state (Gailmard and Patty 2007). Suggestively, the combination of performance

measurement and deference resembles the core principles of the National Performance

Review, a Clinton-administration management initiative that combined performance measurement

with the directive that “decision-making power should be decentralized, giving

lower-level employees more authority to make decisions” (Aberbach 2000).

Finally, the benefits of exercising broad deference to subordinates are often extolled

by practitioners. NYPD commissioner William Bratton describes a management style in

which he “pick[ed] good people,” “turned responsibility back onto the worker,” and “let

them do their jobs” (Bratton 1998, p. 127). But the canonical formal model for studying


principal-agent relationships has forced our understanding of these benefits into the narrow

confines of asymmetric expertise versus control, and forced out many factors that commanded

the attention of early bureaucracy and public administration scholars (and contemporary

“informal” organizational theorists): instilling culture, gaining trust, building consensus,

and teaching (Kaufman 1967; Heclo 1977; Kaufman 1981). Like most formal models, our

contribution to this list is incremental but nevertheless important: a principal sometimes

defers to teach, because teaching sometimes requires letting an agent make mistakes.21

6 Application: The U.S. Federal Bureaucracy

We now apply our model to the U.S. federal bureaucracy. We first discuss how repeated

efforts to enhance performance measurement have produced data that could be used to test

some of our predictions. We then discuss how both the model and extensions can shed light

on Presidential “politicization” and bureaucratic performance.

Performance Measurement in U.S. Government Since the early 1990s, there have

been three major initiatives to enhance performance measurement in federal agencies: the

Government Performance and Results Act of 1993 (Long and Franklin 2004), former Vice

President Gore’s “National Performance Review” (NPR) (Thompson and Riccucci 1998),

and the Bush-era Program Assessment Rating Tool (PART) (Lewis 2008). The federal

government has collected voluminous survey data tracking the consequences of these reforms.

Since 2002 the Office of Personnel Management (OPM) has administered frequent surveys of

federal employees across agencies and ranks that track key variables in our model: effort,

21 Brehm and Gates (2008) also study how bureaucratic supervisors teach subordinates in “environments of great uncertainty” (p. 42), but their formal exercise is akin to “attempting to change the preferences of the subordinate.” While this too certainly occurs in political organizations, we believe that our approach, which explicitly models beliefs and how they change with rational Bayesian learning, may also capture the spirit of their exercise.


intrinsic motivation, and managerial styles.22 The Government Accountability Office (GAO)

has also conducted three cross-agency surveys of federal managers since 2003, measuring

both whether federal agencies have introduced useful performance data, and whether it is

actually used to guide agency decisionmaking.23 This survey data can be used to paint a

picture across agencies and time about when performance measurement was introduced, and

the consequences of doing so for employee effort and managerial practices.

One set of predictions that can be tested is the collection of organizational changes that

occur when moving from the “No Learning Benchmark” to the “Game with Learning”: effort,

experimentation, and deference to subordinates should all increase. This transition can

be measured in the data using the perceptions of managers and subordinates as to when

effective performance measures were introduced. The second prediction that can be tested is

the effect of experimentation on bureaucratic morale: subordinates should be more motivated

in agencies where managers use performance data in decisionmaking, i.e., experiment. Some

evidence already exists from the Clinton-era NPR reforms, which explicitly combined performance

measurement with a directive to “give lower-level employees more authority to make

decisions” (Aberbach 2000). An important component of these reforms was “reinvention

labs” in which agencies could receive rule waivers for experimental projects spearheaded by

low level bureaucrats. A 1996 GAO study found that these labs had significantly improved

agency effort, morale, and performance (U.S. Government Accountability Office 1996).

Presidential Appointments and Bureaucratic Performance How Presidents choose

their appointees to the executive branch has been long debated in the literature. In a

provocative essay that set the terms of the debate, Moe (1985) argued that Presidents would

inevitably continue “politicizing” agencies by filling their ranks with loyal and ideologically

like-minded appointees, even at the cost of “neutral competence” (Heclo 1977). Recent

22Federal Employee Viewpoint Survey (FEVS) 2013 summary available at http://www.fedview.opm.gov/2013files/2013 Governmentwide Management Report.PDF.

23Summary of 2013 survey is available at http://www.gao.gov/products/GAO-13-519SP

empirical work has cast doubt on this contention. Lewis (2008) finds both that the number

of political appointees is not secularly increasing, and that the “complexity” of an agency’s

tasks reduces politicization. This suggests that Presidents do indeed perceive, and react to,

“trade-offs between policy influence and agency performance” (Lewis 2011, p. 50).

Whether and how such trade-offs also affect the ideology of appointees is less well un-

derstood. The appointments extension of our model discussed in Section 5.2 suggests that

Presidents sometimes prefer appointees with more moderate beliefs than their own, to better

motivate rank-and-file bureaucrats. Consistent with this argument, Bertelli and Grose (2011)

find that Presidents do not actually appoint ideological “clones”; executive department heads

are significantly more moderate than their appointing presidents. But many other factors

examined in the literature can account for this finding, e.g., the budgetary and advice and

consent powers of the Senate (Bertelli and Grose 2011; McCarty 2004) and external interest

group behavior (Bertelli and Feldmann 2007; Gailmard and Patty 2013).24 Here we discuss

some suggestive evidence that the sorts of managerial factors captured by our model also

play a role, by revisiting a case study examining President Reagan’s experience managing

the Environmental Protection Agency (EPA) (Golden 2000).

President Reagan’s first choice to run the EPA was an archetypal appointee of the “admin-

istrative presidency”: Anne Gorsuch, a conservative ideologue who “shared the president’s

environmental policy agenda” and “convictions about the negative impact that environmen-

tal regulations had on economic growth” (Golden 2000, p. 120). Unsurprisingly, her tenure

was marred by a crisis of morale; careerists reacted like the agent in our model, not with

“deliberate foot-dragging or sabotage,” but rather with a severe waning of “commitment,” “enthusiasm,”

and voluntary effort. Despite Gorsuch’s apparent success in shifting policy (Wood

and Waterman 1991), she resigned after 22 months amid controversy and Congressional

hearings, and was replaced by former EPA head and Nixon appointee William Ruckelshaus.

Golden (2000) describes the revolution in agency morale after Ruckelshaus’ return; he was

24See Lewis (2011) for a comprehensive review of the relevant literature.

“regarded as a savior” by EPA bureaucrats who “wanted to work for him.” Notably, rather

than returning to an aggressive politicization strategy once the crises subsided, President

Reagan continued to appoint moderate EPA heads for the remainder of his tenure.

While the episode suggests that internal managerial factors may influence even aggressively-

ideological Presidents, it remains unclear what about Ruckelshaus motivated EPA bureau-

crats. It was likely not “going native”; Ruckelshaus remained a committed conservative and

was opposed by environmental groups. He did, however, have a notably respectful, “open”

and “data driven” management style. Interpreted through the lens of the model, what may

have been motivating about Ruckelshaus was not his ideological moderation per se, but his

less rigid beliefs that allowed him to manage experimentally and responsively (Dobel 1992,

p. 251). As would be necessary for our logic to apply, Ruckelshaus was given considerable

discretion by the White House, and was “free to set the agency’s internal agenda” (Golden

2000, p. 138). Finally, the ostensibly liberal bureaucrats under him faithfully implemented

even his conservative policy initiatives like cost-benefit analysis; in their own words, this was

because “career staff recognize that we don’t always know the right thing to do,” and that

“long-standing agency policy may be wrong” (Golden 2000, p. 143).

More broadly, the idea that “management suffers ... when government is run by a tran-

sient group of strangers” is a perennial concern of public administration scholars (Heclo

1977). But why it suffers, and how that influences political decisions, has received little at-

tention (Lewis 2011, p. 60). Our model provides a new way of examining this (re)emerging

question. One reason performance suffers is that appointees and bureaucrats have different

beliefs about how to “get things done.” It can suffer more when appointees impose their

will and bureaucrats “go into hiding.” It can recover when they find ways to work together

despite their disagreements, by trying to “educate” each other and “bring [each other] along,

irrespective of their politics” (Starobin 1995).

Extending the model could therefore help answer many open questions about condi-

tions under which Presidential appointments enhance or degrade agency performance. How

does the tenure length of appointees, or their insulation from the executive and legislative

branches, enhance or impede the ability and willingness of appointees and bureaucrats to

“educate each other” and improve performance? Do the short tenures of political appointees

diminish their ability to do so, or actually enhance their incentives to effect an enduring

change in bureaucratic beliefs? Do electoral expectations influence Presidents’ decisions to

try and make appointments that “persuade,” rather than control? Answering these open

questions would shed further light on Presidential appointment decisions.

7 Conclusion

In the study of bureaucratic relationships, the idea that policymakers are uncertain about

how to achieve their goals has long been central. However, the literature has largely focused

on only one aspect of this uncertainty – that an agent may have superior information about

the consequences of different policies. As argued by Moe (2012), this consensus has been

“empowering,” but also “constraining”: by pushing the study of bureaucratic politics “away

from other avenues that, even if potentially productive, are not compatible with the accepted

way of thinking about things,” it has resulted in “a kind of path dependence ... that makes

certain kinds of progress more difficult to achieve.”

In this paper, we attempt to break this path dependence by considering a radically dif-

ferent aspect of bureaucratic relationships – that principals and agents might openly disagree

about how to achieve shared goals. To explore its implications, we develop two models of

policy choice and implementation – one in which policy outcomes cannot be observed, and

another in which they can. The progress made in doing so is three-fold. First, we find that

the ability to observe and learn from policy outcomes improves the efficiency and motivation

of agents, even when principals ignore the information in their choices. Second, we find

motivational benefits to experimenting, and show that they are specific to disagreements

rooted in beliefs rather than preferences. Finally, we show that belief conflicts can produce

a sometimes-extraordinary degree of deference driven by the desire to persuade.

Stepping back from the study of bureaucracy, our model illustrates some incentives spe-

cific to belief conflicts and learning that are relevant to many additional questions across

political science. We give three examples as suggestions for future research.

First, a burgeoning literature studies the organization and decision making of terrorist

groups using classical principal-agent models (e.g., Shapiro and Siegel (2007); Bueno de

Mesquita (2008)). However, for reasons articulated by Shapiro (2013), the assumptions of

our model are arguably an even better fit in this setting than in public bureaucracies. A

perverse sort of intrinsic motivation is necessary to recruit terrorist “agents” for poorly paid

and highly dangerous work, and it is risky for leaders to monitor these agents too closely.

Direct mechanisms of control like punishment can backfire and result in defections, or even

violent reprisals. Most importantly, terrorist organizations are riven by belief conflicts,

in particular, about the political consequences of violent versus non-violent tactics.25 Do

terrorists choose tactics with an eye toward what the observed political repercussions will

“teach” competing factions, difficult-to-control operatives, and potential sympathizers about

how to most effectively pursue their aims? Do shifting strategies help motivate rank and file

operatives even when they are otherwise counterproductive?

Second, a large literature studies how public opinion constrains the foreign policy deci-

sions of chief executives (Aldrich et al. 2006). If such constraints on the public’s “agent”

(Ashworth 2012) are due to differences in beliefs, then a model like ours may shed light

on how chief executives cope with them. For example, it may sometimes be rational for

executives to pursue “bad” policies so failure will persuade the public to agree with them.26

Documentary evidence suggests that the Johnson administration’s decision to halt bombing

and pursue a “peace offensive” during the Vietnam War in 1965 was driven by such motives.

Despite little confidence that the peace initiative would succeed, the administration gave

25Shapiro (2013) also analyzes the content of 108 terrorist memoirs and finds evidence of “induced” disagreements driven by “different beliefs about what to do” in 58% of them.

26In our model the agent doesn’t “teach” the principal but would if effort were observable.

it “the old college try.” Their primary motive was to persuade the “73% of the American

public eager for a cease-fire” that peace was futile, as well as “Fulbright ... the New York

Times, all these people thinking there could be peace if we were only willing to have peace.”

The amount of effort expended on peace was also understood to be a “crucial element” of

the persuasiveness of failure: the administration sought to show that “we have explored fully

every alternative” and “left no door to peace untried” (Dallek 1996, pp. 152-153).

Finally, pilot experiments to test new programs have become increasingly common and

influential with governments, the media, and the public (Rogers-Dillon 2004). However,

their political uses and consequences have received little attention. A “textbook” view of

pilot projects would embrace them as sincere efforts to learn about policies’ effectiveness,

and a raw political view would discount them as manipulation by the policies’ supporters

and beneficiaries (Rogers-Dillon 2004). Our model suggests that they may be understood as

a sort of biased and belief-driven political entrepreneurship. By proposing to “experiment

with new programs or ... innovations,” policy-motivated actors with genuinely strong be-

liefs may be seeking to produce persuasive evidence that “convince[s] diverse coalitions of

organized interests ... of the value of their ideas” (Carpenter 2001, p. 30). As Carpenter

(2001) demonstrates in his influential study of the 19th century U.S. postal service, this sort

of experimentation-as-entrepreneurship by early agency heads was important in the devel-

opment of bureaucratic autonomy. A model of pilot experimentation based on belief (and

preference) conflicts could generate useful insights about how and when such programs are

used, and their political role in policymaking and institutional development.

Appendix

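We first derive equations 2 and 3; for simplicity denote $\theta$ for $\theta_2$, $e$ for $e_1$, and $h(\cdot)$ for $h(e, \theta)$.

Rearranging the “long form” in equation 1 when the principal rigidly implements $a$ yields:

$$\left(-\frac{e^2}{2\lambda} + \theta e\right) + \frac{\lambda}{2}\theta e + \lambda\left(\theta(1-e)h(\cdot) - \frac{1}{2}\big(\theta(1-e) + (1-\theta)\big)h^2(\cdot)\right).$$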

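Now from the definition of $h(\cdot)$ we have $\big(\theta(1-e) + (1-\theta)\big)h^2(\cdot) = \theta(1-e)h(\cdot)$; substituting and simplifying yields

$$\left(-\frac{e^2}{2\lambda} + \theta e\right) + \frac{\lambda}{2}\theta e + \frac{\lambda}{2}\theta(1-e)h(\cdot),$$

and then equation 2.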

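To derive equation 3, first write the analog to equation 1 when the principal experiments:

$$-\frac{e^2}{2\lambda} + \theta\left(e\left(1 + \frac{\lambda}{2}\right) + (1-e)\left(-\frac{\lambda}{2}\big(1-h(\cdot)\big)^2\right)\right) + (1-\theta)\left(\lambda\big(1-h(\cdot)\big) - \frac{1}{2}\lambda\big(1-h(\cdot)\big)^2\right).$$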

To see this, recall that failure results in a switch to b, and that the agent’s posterior belief

that b is the best policy after a fails is 1− h (·). Rearranging then yields:

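$$\left(-\frac{e^2}{2\lambda} + \theta e\right) + \frac{\lambda}{2}\theta e + \lambda\left((1-\theta)\big(1-h(\cdot)\big) - \frac{1}{2}\big(\theta(1-e) + (1-\theta)\big)\big(1-h(\cdot)\big)^2\right).$$

Similar to above, observe that $\big(\theta(1-e) + (1-\theta)\big)\big(1-h(\cdot)\big)^2 = (1-\theta)\big(1-h(\cdot)\big)$. Substituting and simplifying then yields equation 3.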
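The two simplifications above can be checked numerically. The following is a minimal sketch rather than part of the original derivation. It assumes that h(e, θ) takes the Bayes' rule form θ(1−e)/(θ(1−e)+(1−θ)), which is the form implied by the identities used in the text, and all function names are hypothetical.

```python
def h(e, theta):
    # Assumed form of the agent's posterior that a is correct after a fails.
    # It is implied by the identity (θ(1-e) + (1-θ)) h² = θ(1-e) h in the text.
    return theta * (1 - e) / (theta * (1 - e) + (1 - theta))


def rigid_long(e, theta, lam):
    # "Long form" when the principal rigidly implements a.
    hv = h(e, theta)
    base = -e**2 / (2 * lam) + theta * e
    return base + lam / 2 * theta * e + lam * (
        theta * (1 - e) * hv
        - 0.5 * (theta * (1 - e) + (1 - theta)) * hv**2
    )


def rigid_short(e, theta, lam):
    # Simplified expression that the text says leads to equation 2.
    hv = h(e, theta)
    base = -e**2 / (2 * lam) + theta * e
    return base + lam / 2 * theta * e + lam / 2 * theta * (1 - e) * hv


def experiment_long(e, theta, lam):
    # Analog of the long form when the principal experiments.
    hv = h(e, theta)
    return (
        -e**2 / (2 * lam)
        + theta * (e * (1 + lam / 2) + (1 - e) * (-(lam / 2) * (1 - hv) ** 2))
        + (1 - theta) * (lam * (1 - hv) - 0.5 * lam * (1 - hv) ** 2)
    )


def experiment_short(e, theta, lam):
    # Result of substituting (θ(1-e) + (1-θ))(1-h)² = (1-θ)(1-h).
    hv = h(e, theta)
    base = -e**2 / (2 * lam) + theta * e
    return base + lam / 2 * theta * e + lam / 2 * (1 - theta) * (1 - hv)


def max_gap(lam=0.5, steps=9):
    # Largest absolute difference between long and short forms on an
    # interior grid of (e, θ) values.
    grid = [i / (steps + 1) for i in range(1, steps + 1)]
    gap = 0.0
    for e in grid:
        for theta in grid:
            gap = max(
                gap,
                abs(rigid_long(e, theta, lam) - rigid_short(e, theta, lam)),
                abs(experiment_long(e, theta, lam) - experiment_short(e, theta, lam)),
            )
    return gap
```

Calling `max_gap()` on an interior grid should return a value at the level of floating-point rounding if the algebra above is right.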

Proof of Propositions 1 and 2 Consider the function

$$U(e, \theta, \eta_1, \eta_2) = \left(\theta e - \frac{e^2}{2\lambda}\right) + \frac{\lambda}{2}\Big[(1-\eta_1)\,\theta^2 + \eta_1\Big((1-\eta_2)\cdot\theta\big(e + (1-e)h(e,\theta)\big) + \eta_2\cdot(1-\theta)\big(1-h(e,\theta)\big) + \eta_2\,\theta e\Big)\Big].$$

It is simple to verify from the definitions that $U(e_1, \theta_2, 0, 0)$ is the agent’s two-period expected utility when the principal rigidly implements $a$ in the No Learning Benchmark (NLB), $U(e_1, \theta_2, 1, 0)$ when she rigidly implements $a$ in the Game with Learning (GWL), and $U(e_1, \theta_2, 1, 1)$ when she experiments with $a$ in the GWL (analogous expressions for $b$ involve substituting $1-\theta_2$ for $\theta_2$). Now denote the agent’s optimal effort given each objective function $e^*(\theta, \eta_1, \eta_2) = \arg\max_{e \in [0,1]} U(e, \theta, \eta_1, \eta_2)$. In Lemma 1 in the online Appendix we prove that $e^*(\theta, \eta_1, \eta_2)$ is unique and $\in (0,1)$ $\forall\, (\theta \in (0,1), \eta_1, \eta_2)$ when $\lambda < \bar{\lambda} \approx .68466$.

Now the derivatives of both $\theta\big(e + (1-e)h(e,\theta)\big)$ and $(1-\theta)\big(1-h(e,\theta)\big)$ w.r.t. $e$ are equal to $\theta\big(1-h(e,\theta)\big)^2$, which is $> 0$ $\forall \theta \in (0,1)$. (To verify use that $\frac{\partial h}{\partial e} = -\frac{\theta(1-\theta)}{(1-\theta e)^2}$, $(1-e)\frac{\partial h}{\partial e} = -h(\cdot)\big(1-h(\cdot)\big)$, and $-(1-\theta)\frac{\partial h}{\partial e} = \theta\big(1-h(\cdot)\big)^2$.) Applying this yields that

$\frac{\partial^2 U(e,\theta,\eta_1,0)}{\partial\eta_1\,\partial e} = \frac{\lambda}{2}\theta\big(1-h(e,\theta)\big)^2$ and $\frac{\partial^2 U(e,\theta,1,\eta_2)}{\partial\eta_2\,\partial e} = \frac{\lambda}{2}\theta$. Intuitively, $\frac{\partial^2 U(e,\theta,\eta_1,0)}{\partial\eta_1\,\partial e}$ is the increase in the marginal benefit of effort going from the NLB to the GWL, and $\frac{\partial^2 U(e,\theta,1,\eta_2)}{\partial\eta_2\,\partial e}$ is the increase in the marginal benefit of effort when the principal goes from rigidly implementing to experimenting. Since both cross partials are strictly positive when $\theta \in (0,1)$, Theorem 1 of Edlin and Shannon (1998) (see Ashworth and Bueno de Mesquita (2006)) implies that $e^*(\theta_2, 1, 0) > e^*(\theta_2, 0, 0)$ $\forall \theta_2 \in (0,1)$ – the agent works harder on $a$ in the GWL (Prop. 1) – and $e^*(\theta_2, 1, 1) > e^*(\theta_2, 1, 0)$ $\forall \theta_2 \in (0,1)$ – the agent works harder on $a$ when the principal experiments (Prop. 2). Finally, by symmetry these results also apply to $x_1 = b$.
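The effort ranking in Propositions 1 and 2 can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the paper's proof. It codes the function U from this proof, assumes the Bayes' rule form h(e, θ) = θ(1−e)/(θ(1−e)+(1−θ)) that is consistent with the derivative of h quoted above, and finds the maximizer by a brute-force grid search. The helper names and the choice λ = 0.5 are hypothetical.

```python
def h(e, theta):
    # Assumed posterior that a is correct after a fails (Bayes' rule form).
    return theta * (1 - e) / (theta * (1 - e) + (1 - theta))


def U(e, theta, eta1, eta2, lam):
    # Agent's objective from the proof: eta1 switches learning on (GWL),
    # eta2 switches experimentation on.
    hv = h(e, theta)
    learning = (
        (1 - eta2) * theta * (e + (1 - e) * hv)
        + eta2 * (1 - theta) * (1 - hv)
        + eta2 * theta * e
    )
    first_period = theta * e - e**2 / (2 * lam)
    return first_period + lam / 2 * ((1 - eta1) * theta**2 + eta1 * learning)


def optimal_effort(theta, eta1, eta2, lam, n=20000):
    # Crude, dependency-free grid search for the maximizer on [0, 1].
    grid = (i / n for i in range(n + 1))
    return max(grid, key=lambda e: U(e, theta, eta1, eta2, lam))


def effort_ranking(theta, lam=0.5):
    # Returns optimal effort in the NLB, under rigid implementation in the
    # GWL, and under experimentation in the GWL, in that order.
    return (
        optimal_effort(theta, 0, 0, lam),
        optimal_effort(theta, 1, 0, lam),
        optimal_effort(theta, 1, 1, lam),
    )
```

For interior values of θ, the three numbers returned by `effort_ranking` should be strictly increasing, which is what the monotone comparative statics argument predicts. The first should be approximately λθ, since learning plays no role in the NLB.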

To see that going from the NLB to the GWL increases the probability of success in both

periods (Prop. 1), first observe that the first period increase follows immediately from the

agent’s greater effort. The expected second period probability of success in the GWL (from the principal’s perspective) is $\theta_1 \cdot \big(\lambda\,(e_1 + (1-e_1)h(e_1, \theta_2))\big)$, which is equal to the probability of success $\theta_1 \cdot \lambda\theta_2$ in the NLB when $e_1 = 0$, and is (already shown to be) increasing in $e_1$. $\blacksquare$
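The last step of the proof can also be checked directly. The sketch below is a hypothetical illustration with assumed function names. It again assumes the Bayes' rule form of h, and confirms that the second-period success probability coincides with the NLB value at zero effort and rises with first-period effort.

```python
def h(e, theta):
    # Assumed posterior that a is correct after a fails (Bayes' rule form).
    return theta * (1 - e) / (theta * (1 - e) + (1 - theta))


def gwl_second_period_success(e1, theta1, theta2, lam):
    # Expected second-period success probability in the GWL from the
    # principal's perspective, as stated in the proof.
    return theta1 * lam * (e1 + (1 - e1) * h(e1, theta2))


def nlb_second_period_success(theta1, theta2, lam):
    # Corresponding probability in the No Learning Benchmark.
    return theta1 * lam * theta2


def is_increasing_in_effort(theta1, theta2, lam, n=100):
    # Checks strict monotonicity in e1 on an evenly spaced grid of [0, 1].
    values = [
        gwl_second_period_success(i / n, theta1, theta2, lam)
        for i in range(n + 1)
    ]
    return all(b > a for a, b in zip(values, values[1:]))
```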

References

Aberbach, Joel D. 2000. In the Web of Politics. Washington, D.C: Brookings Institution

Press.

Abramowitz, Alan I., and Kyle L. Saunders. 2008. “Is Polarization a Myth?” The Journal

of Politics 70(02): 542–555.

Aldrich, John H., Christopher Gelpi, Peter Feaver, Jason Reifler, and Kristin Thompson

Sharp. 2006. “Foreign Policy and the Electoral Connection.” Annual Review of Political

Science 9(1): 477–502.

Arrow, K. J. 1964. “The Role of Securities in the Optimal Allocation of Risk-bearing.” The

Review of Economic Studies 31(April): 91–96.

Ashworth, Scott. 2012. “Electoral Accountability: Recent Theoretical and Empirical Work.”

Annual Review of Political Science 15(1): 183–201.

Ashworth, Scott, and Ethan Bueno de Mesquita. 2006. “Monotone Comparative Statics for

Models of Politics.” American Journal of Political Science 50(January): 214–231.

Aumann, Robert J. 1976. “Agreeing to Disagree.” The Annals of Statistics 4(November):

1236–1239.

Bartels, Larry M. 2008. Unequal Democracy. Princeton: Princeton University Press.

Bergemann, Dirk, and Ulrich Hege. 2005. “The Financing of Innovation: Learning and

Stopping.” The RAND Journal of Economics 36(December): 719–752.

Bertelli, Anthony, and Sven E. Feldmann. 2007. “Strategic Appointments.” Journal of Public

Administration Research and Theory 17(January): 19–38.

Bertelli, Anthony M., and Christian R. Grose. 2011. “The Lengthened Shadow of Another

Institution?” American Journal of Political Science 55(4): 767–781.

Besley, Timothy, and Maitreesh Ghatak. 2006. “Sorting with Motivated Agents.” Journal

of the European Economic Association 4(2-3): 404–414.

Bevan, Gwyn, and Christopher Hood. 2006. “What’s Measured Is What Matters: Targets

and Gaming in the English Public Health Care System.” Public Administration 84(3):

517–538.

Boardman, Craig, and Eric Sundquist. 2008. “Toward Understanding Work Motivation:

Worker Attitudes and the Perception of Effective Public Service.” The American Review

of Public Administration (September).

Bratton, William. 1998. Turnaround. New York: Random House.

Brehm, John, and Scott Gates. 1999. Working, Shirking, and Sabotage. Ann Arbor: Uni-

versity of Michigan Press.

Brehm, John, and Scott Gates. 2008. Teaching, Tasks, and Trust: Functions of the Public

Executive. New York: Russell Sage Foundation.

Bueno de Mesquita, Ethan. 2008. “Terrorist Factions.” Quarterly Journal of Political Science

3(December): 399–418.

Bueno De Mesquita, Ethan, and Matthew C. Stephenson. 2007. “Regulatory Quality Under

Imperfect Oversight.” American Political Science Review 101(03): 605–620.

Callander, Steven. 2011. “Searching for Good Policies.” American Political Science Review

105(4): 643–662.

Callander, Steven, and Patrick Hummel. 2014. “Preemptive Policy Experimentation.”

Econometrica 82(4): 1509–1528.

Campbell, Donald T. 1969. “Reforms as experiments.” American Psychologist 24(4): 409–

429.

Carpenter, Daniel P. 2001. The Forging of Bureaucratic Autonomy. Princeton, N.J.: Prince-

ton University Press.

Che, Yeon Koo, and Navin Kartik. 2009. “Opinions as Incentives.” Journal of Political

Economy 117(October): 815–860.

Dallek, Robert. 1996. “Lyndon Johnson and Vietnam: The Making of a Tragedy.” Diplo-

matic History 20(April): 147–162.

DiIulio, John J. 1987. Governing Prisons. London: Collier Macmillan.

Dobel, P. J. 1992. “William D. Ruckelshaus.” In Exemplary Public Administrators, ed.

Terry L. Cooper. San Francisco: Jossey-Bass Publishers pp. 241–269.

Draper, Robert. 2014. “The War Within.” POLITICO Magazine (November).

Durant, Robert F., Robert Kramer, James L. Perry, Debra Mesch, and Laurie Paarlberg.

2006. “Motivating Employees in a New Governance Era: The Performance Paradigm

Revisited.” Public Administration Review 66(4): 505–514.

Edlin, Aaron S., and Chris Shannon. 1998. “Strict Monotonicity in Comparative Statics.”

Journal of Economic Theory 81(July): 201–219.

Epstein, David, and Sharyn O’Halloran. 1999. Delegating Powers: A Transaction Cost

Politics Approach to Policy Making Under Separate Powers. Cambridge: Cambridge Uni-

versity Press.

Gailmard, Sean, and John W. Patty. 2007. “Slackers and Zealots.” American Journal of

Political Science 51(4): 873–889.

Gailmard, Sean, and John W. Patty. 2012. “Formal Models of Bureaucracy.” Annual Review

of Political Science 15(1): 353–377.

Gailmard, Sean, and John W. Patty. 2013. Learning While Governing. Chicago: The

University of Chicago Press.

Gates, Robert Michael. 2014. Duty: Memoirs of a Secretary at War. New York: Alfred A.

Knopf.

Geanakoplos, John D, and Heraklis M Polemarchakis. 1982. “We can’t disagree forever.”

Journal of Economic Theory 28(October): 192–200.

Georgellis, Yannis, Elisabetta Iossa, and Vurain Tabvuma. 2011. “Crowding Out Intrinsic

Motivation in the Public Sector.” Journal of Public Administration Research and Theory

21(July): 473–493.

Golden, Marissa. 2000. What Motivates Bureaucrats? New York: Columbia University

Press.

Goodsell, Charles T. 2011. Mission Mystique. Washington, D.C: CQ Press.

Heclo, Hugh. 1974. Modern Social Politics in Britain and Sweden. New Haven: Yale Uni-

versity Press.

Heclo, Hugh. 1977. A Government of Strangers. Washington: Brookings Institution.

Heinrich, Carolyn J. 2007. “Evidence-Based Policy and Performance Management Challenges

and Prospects in Two Parallel Movements.” The American Review of Public Administra-

tion 37(September): 255–277.

Holmstrom, Bengt, and Paul Milgrom. 1991. “Multitask Principal-Agent Analyses: Incentive

Contracts, Asset Ownership, and Job Design.” Journal of Law, Economics, & Organization

7(January): 24–52.

Huber, John D., and Charles R. Shipan. 2002. Deliberate Discretion? The Institutional

Foundations of Bureaucratic Autonomy. Cambridge: Cambridge University Press.

Huber, John D., and Nolan McCarty. 2004. “Bureaucratic Capacity, Delegation, and Political

Reform.” The American Political Science Review 98(August): 481–494.

Kaufman, Herbert. 1967. The Forest Ranger. Baltimore: Johns Hopkins Press.

Kaufman, Herbert. 1981. The Administrative Behavior of Federal Bureau Chiefs. Washing-

ton, D.C: Brookings Institution.

Keller, Godfrey, Sven Rady, and Martin Cripps. 2005. “Strategic Experimentation with

Exponential Bandits.” Econometrica 73(1): 39–68.

Kettl, Donald F. 2005. The Global Public Management Revolution. Washington, D.C: Brook-

ings Institution Press.

Krier, James E. 1977. Pollution and Policy. Berkeley: University of California Press.

Lee, Kai N. 1993. Compass and Gyroscope. Washington, D.C: Island Press.

Lewis, David E. 2008. The Politics of Presidential Appointments. Princeton: Princeton

University Press.

Lewis, David E. 2011. “Presidential Appointments and Personnel.” Annual Review of Polit-

ical Science 14(1): 47–66.

Long, Edward, and Aimee L. Franklin. 2004. “The Paradox of Implementing the Govern-

ment Performance and Results Act: Top-Down Direction for Bottom-Up Implementation.”

Public Administration Review 64(3): 309–319.

Maple, Jack, and Chris Mitchell. 2010. The Crime Fighter. Random House Digital, Inc.

McCarty, Nolan. 2004. “The Appointments Dilemma.” American Journal of Political Science

48(3): 413–428.

Milgrom, Paul, and Chris Shannon. 1994. “Monotone Comparative Statics.” Econometrica

62(January): 157–180.

Milgrom, Paul, and John Roberts. 1988. “An Economic Approach to Influence Activities in

Organizations.” American Journal of Sociology 94(January): S154–S179.

Minozzi, William. 2013. “Endogenous Beliefs in Models of Politics.” American Journal of

Political Science 57(3): 566–81.

Moe, Terry M. 1985. “The Politicized Presidency.” The New Direction in American Politics

235: 269–71.

Moe, Terry M. 2012. “Delegation, Control, and the Study of Public Bureaucracy.” 10(2).

Morris, Stephen. 1995. “The Common Prior Assumption in Economic Theory.” Economics

and Philosophy 11(02): 227–253.

Moynihan, Donald P. 2008. The Dynamics of Performance Management. Washington, D.C:

Georgetown University Press.

Moynihan, Donald P., Sanjay K. Pandey, and Bradley E. Wright. 2012. “Prosocial Values

and Performance Management Theory: Linking Perceived Social Impact and Performance

Information Use.” Governance 25(3): 463–483.

Nie, Martin A., and Courtney A. Schultz. 2012. “Decision-Making Triggers in Adaptive

Management.” Conservation Biology 26(6): 1137–1144.

Office of Environmental Policy and Compliance. 2008. “Adaptive Management Implemen-

tation Policy.” Department of the Interior Departmental Manual 522(February).

Osborne, David. 1992. Reinventing Government. Reading, MA: Addison-Wesley.

Perry, James L., and Annie Hondeghem. 2008. Motivation in Public Management: The Call

of Public Service. Oxford: Oxford University Press.

Perry, James L., Annie Hondeghem, and Lois Recascino Wise. 2010. “Revisiting the Motiva-

tional Bases of Public Service: Twenty Years of Research and an Agenda for the Future.”

Public Administration Review 70(5): 681–690.

Pollitt, Christopher. 2006. “Performance Information for Democracy.” Evaluation 12(Jan-

uary): 38–55.

Prendergast, Canice. 2008. “Intrinsic Motivation and Incentives.” The American Economic

Review 98(May): 201–205.

Rogers-Dillon, Robin. 2004. The Welfare Experiments. Stanford, CA: Stanford Law and

Politics.

Sabatier, Paul A. 1988. “An advocacy coalition framework of policy change and the role of

policy-oriented learning therein.” Policy Sciences 21(June): 129–168.

Sabatier, Paul A., and Susan Hunter. 1989. “The Incorporation of Causal Perceptions into

Models of Elite Belief Systems.” Political Research Quarterly 42(September): 229–261.

Shapiro, Jacob N. 2013. The Terrorist’s Dilemma: Managing Violent Covert Organizations.

Princeton University Press.

Shapiro, Jacob N., and David A. Siegel. 2007. “Underfunding in Terrorist Organizations.”

International Studies Quarterly 51(June): 405–429.

Simon, Herbert A. 1947. Administrative Behavior. New York: Macmillan.

Smith, Alastair, and Allan C. Stam. 2004. “Bargaining and the Nature of War.” Journal of

Conflict Resolution 48(December): 783–813.

Starobin, Paul. 1995. Surviving at the EPA: Gary Dietrich. (C16-84-592.0) Cambridge, MA:

Kennedy School of Government Case Program.

Thompson, Frank J., and Norma M. Riccucci. 1998. “Reinventing Government.” Annual

Review of Political Science 1(1): 231–257.

U.S. Government Accountability Office. 1996. Managing Reform: Status of Agency Rein-

vention Lab Efforts. (GAO/GGD-96-69) Washington, D.C.: U.S. Government Printing

Office.

Van den Steen, Eric. 2009. “Authority versus Persuasion.” The American Economic Review

99(May): 448–453.

Van den Steen, Eric. 2010a. “Disagreement and the Allocation of Control.” Journal of Law,

Economics, and Organization 26(August): 385–426.

Van den Steen, Eric. 2010b. “Interpersonal Authority in a Theory of the Firm.” American

Economic Review 100(1): 466–90.

Volden, Craig, Michael M. Ting, and Daniel P. Carpenter. 2008. “A Formal Model of

Learning and Policy Diffusion.” American Political Science Review 102(August): 319–332.

Weible, Christopher M., Paul A. Sabatier, and Kelly McQueen. 2009. “Themes and Varia-

tions: Taking Stock of the Advocacy Coalition Framework.” Policy Studies Journal 37(1):

121–140.

Wilson, James Q. 1989. Bureaucracy. New York: Basic Books.

Wilson, James Q., and George L. Kelling. 1982. “Broken Windows.” Atlantic Monthly 249(3):

29–38.

Wood, B. Dan, and Richard W. Waterman. 1991. “The Dynamics of Political Control of the

Bureaucracy.” American Political Science Review 85(September): 801–828.

Yildiz, Muhamet. 2004. “Waiting to Persuade.” The Quarterly Journal of Economics

119(February): 223–248.

Online Appendix – NOT FOR PUBLICATION “Experimentation and Persuasion in Political Organizations”

Alexander V. Hirsch, February 28 2015.

This Online Appendix is divided into five parts. Appendix A provides a general state-

ment of strategies and equilibria, and describes how we do equilibrium selection when there

are multiple equilibria. Appendix B proves Proposition 3. Appendix C proves verbal state-

ments in Section 5.2 about the principal’s commitment problem and institutional solutions.

Appendix D contains accessory lemmas used in the other proofs. Appendix E treats the

variant of the model with differing preferences rather than beliefs discussed in Section 5.2.

A Equilibrium Characterization and Selection

We begin by introducing additional notation and providing a general equilibrium charac-

terization for the baseline model. This requires allowing mixed strategies for the principal,

which are omitted from the main text for simplicity. As in the main text, a strategy for the

agent consists of two functions $e_1(x_1)$, $e_2(x_1, e_1, y_1, x_2)$ to $[0, 1]$ mapping histories to effort. To allow for mixed strategies for the principal, we now denote the principal’s strategy as a probability $p_1 \in [0, 1]$ of initially choosing policy $a$, and a set of probabilities $p^{x_1}_{y_1} \in [0, 1]$ of sticking with the initial policy $x_1$ after outcome $y_1$ for every $(x_1, y_1)$. Throughout we will also use $P_i(\cdot)$ to denote probabilities evaluated with respect to the prior of player $i$ – so $P_2(\omega = b) = 1 - \theta_2$ denotes the agent’s prior belief that $b$ is the correct policy.

The Agent’s Problem In the second period the agent exerts effort $e_2(x_1, e_1, y_1, x_2) = \lambda P_2(\omega = x_2 \mid x_1, e_1, y_1)$ and his expected utility is $\frac{\lambda}{2}\big[P_2(\omega = x_2 \mid x_1, e_1, y_1)\big]^2$. For simplicity, denote his prior that $\omega = a$ as $\theta$, his initial effort $e_1$ as $e$, and $p^a_s$ and $p^a_f$ as $p_s$ and $p_f$. Then it is straightforward to show that his expected two-period utility when $x_1 = a$ as a function of first period effort is

$$U_2(e, \theta, p_s, p_f) = \left(\theta e - \frac{e^2}{2\lambda}\right) + \frac{\lambda}{2}\Big(p_f \cdot \theta\big(e + (1-e)h(e,\theta)\big) + (1-p_f)\cdot(1-\theta)\big(1-h(e,\theta)\big)\Big) + (p_s - p_f)\frac{\lambda}{2}\theta e \tag{A.1}$$

By symmetry the agent’s expected utility from exerting effort $e_1$ on an arbitrary policy $x \in \{a, b\}$ is $U_2\big(e_1, P_2(\omega = x), p^x_s, p^x_f\big)$. Also note that $U_2(\cdot)$ is distinct from the expression $U(\cdot)$ for the agent’s objective function as defined in the main Appendix because the latter amalgamated utilities from the NLB and the GWL, and did not account for mixed strategies.

The Principal’s Problem The principal’s second period policy choices must be interim-

optimal given e2 (x1, e1, y1, x2) and her posteriors computed with the agent’s equilibrium

strategy. Thus, she must stay with the initial policy $x$ if it succeeds ($p^x_s = 1$), and can stay with the initial policy if it fails ($p^x_f > 0$) if and only if $h\big(e_1(x), P_1(\omega = x)\big) \geq 1 - h\big(e_1(x), P_2(\omega = x)\big)$. If the inequality is strict then $p^x_f = 1$.
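One way to see where this condition comes from is to compare the principal's expected second-period payoffs from staying and from switching after a failure. The sketch below is my own illustrative reading, not the paper's derivation. It assumes that the principal's payoff is the probability of success, that success requires the correct policy, that the agent's second-period effort is λ times his posterior (as stated in the agent's problem), and that h has the Bayes' rule form θ(1−e)/(θ(1−e)+(1−θ)). Function names are hypothetical.

```python
def h(e, theta):
    # Assumed posterior that the initial policy is correct after it fails.
    return theta * (1 - e) / (theta * (1 - e) + (1 - theta))


def stay_value(e, theta1, theta2, lam):
    # Principal thinks the initial policy is correct with probability
    # h(e, θ1); the agent would exert effort λ h(e, θ2) on it.
    return h(e, theta1) * lam * h(e, theta2)


def switch_value(e, theta1, theta2, lam):
    # Principal thinks the other policy is correct with probability
    # 1 - h(e, θ1); the agent would exert effort λ (1 - h(e, θ2)) on it.
    return (1 - h(e, theta1)) * lam * (1 - h(e, theta2))


def may_stay_after_failure(e, theta1, theta2):
    # The condition stated in the text.
    return h(e, theta1) >= 1 - h(e, theta2)
```

Under these assumptions, `stay_value` minus `switch_value` equals λ(h(e, θ1) + h(e, θ2) − 1), so the payoff comparison and the stated inequality agree.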

For simplicity denote $e_1$ as $e$ and $p^a_s$ and $p^a_f$ as $p_s$ and $p_f$. In period 1 the principal’s expected utility from selecting policy $a$ when she expects first period effort $e$ and future equilibrium behavior is

$$U_1(e, \theta_1, \theta_2, p_s, p_f) = \theta_1\Big(e + \lambda\big(e\,p_s + (1-e)\,p_f \cdot h(e, \theta_2)\big)\Big) + (1-\theta_1)(1-p_f)\,\lambda\,\big(1-h(e, \theta_2)\big) \tag{A.2}$$

By symmetry her expected utility from some $x$ is $U_1\big(e_1, P_1(\omega = x), P_2(\omega = x), p^x_s, p^x_f\big)$.

Equilibrium Conditions Strategies $\big(x_1, p^a_s, p^a_f, p^b_s, p^b_f\big)$ and $\big(e_1(x_1), e_2(x_1, e_1, y_1, x_2)\big)$ are an equilibrium if and only if they satisfy the following conditions.

(Agent Optimality)

1. $e_2(x_1, e_1, y_1, x_2) = \lambda P_2(\omega = x_2 \mid x_1, e_1, y_1)$ (the agent optimizes in the second period)

2. $e_1(x) \in \arg\max_{e_1 \in [0,1]} \big\{U_2\big(e_1, P_2(\omega = x), p^x_s, p^x_f\big)\big\}$ $\forall x \in \{a, b\}$ (the agent optimizes in the first period given the principal’s strategy and expectations about his own future effort)

(Principal Optimality)

1. $p^x_s = 1$ $\forall x \in \{a, b\}$ (the principal always stays after success)

2. $\forall x \in \{a, b\}$, $p^x_f > 0 \iff h\big(e_1(x), P_1(\omega = x)\big) \geq 1 - h\big(e_1(x), P_2(\omega = x)\big)$, and $p^x_f = 1$ if satisfied with strict inequality (the principal only stays after failure if it is interim-optimal given on-path posteriors)

3. $x_1 \in \arg\max_{x \in \{a,b\}} \big\{U_1\big(e_1(x), P_1(\omega = x), P_2(\omega = x), p^x_s, p^x_f\big)\big\}$ (the principal’s initial policy choice maximizes her expected continuation value).

Equilibrium Selection and Notation Lemma 2 in Appendix D proves the following two statements: (i) whenever experimenting is an equilibrium in a subgame $x_1$, it is the optimal strategy for the principal even if she could precommit to her future decisions, and (ii) if experimenting with $x_1 = a$ is not an equilibrium of the subgame following $x_1 = a$, then the unique equilibrium is rigid implementation. Together, these statements imply that we can select the equilibria that are best for the principal by considering only pure strategy equilibria, and choosing experimentation for sure in the subgame commencing with policy $x_1 \in \{a, b\}$ (i.e., $p^{x_1}_s = 1$ and $p^{x_1}_f = 0$) whenever it is an equilibrium.

With this selection and restriction to pure strategies, we now introduce simplified notation for the agent’s best responses and principal’s utility. First let $e^s(\theta_2)$ denote the agent’s first-period best response when the principal’s pure strategy is $(x_1 = a, s)$, where $s \in \{R, E\}$ denotes whether the principal (R)igidly implements or (E)xperiments with the initial policy. It is also helpful to state implicit characterizations of $e^R(\theta_2)$ and $e^E(\theta_2)$ so we can approximate their values in several proofs. The FOC from the proof of Lemma 1 yields that

\[
e^R(\theta_2) = \lambda\theta_2\left(1 + \frac{\lambda}{2}\,k(e, \theta_2)\right) \quad\text{and}\quad e^E(\theta_2) = \lambda\theta_2\left(1 + \frac{\lambda}{2}\bigl(k(e, \theta_2) + 1\bigr)\right),
\]

where $k(e, \theta) = (1 - h(e, \theta))^2$ and, in each equation, $e$ on the right-hand side is the effort level appearing on the left-hand side.
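The implicit characterizations above can be solved numerically. The following Python sketch is illustrative only. It assumes that the failure posterior takes the Bayesian form h(e, θ) = θ(1 − e)/(1 − θe), which is inferred from the closed-form expressions used in Appendix D rather than quoted from this section, and the helper names (`h`, `k`, `effort`) and parameter values are hypothetical; it is not the supplemental Mathematica code.

```python
def h(e, theta):
    """Posterior that the policy is correct after a failure (assumed Bayesian form)."""
    return theta * (1.0 - e) / (1.0 - theta * e)


def k(e, theta):
    """k(e, theta) = (1 - h(e, theta))^2, as defined in the text."""
    return (1.0 - h(e, theta)) ** 2


def effort(theta2, lam, experiment, tol=1e-12):
    """Solve e = lam*theta2*(1 + lam/2*(k(e, theta2) + eta)) by bisection on [0, 1].

    eta = 1 when the principal experiments (e^E) and eta = 0 under rigid
    implementation (e^R). Assumes 0 < theta2 < 1 and lam*(1 + lam) < 1, so the
    residual is positive at e = 0 and negative at e = 1.
    """
    eta = 1.0 if experiment else 0.0

    def residual(e):
        return lam * theta2 * (1.0 + 0.5 * lam * (k(e, theta2) + eta)) - e

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam, theta2 = 0.5, 0.4  # illustrative parameter values only
e_R = effort(theta2, lam, experiment=False)
e_E = effort(theta2, lam, experiment=True)
print("e^R =", e_R, " e^E =", e_E)
```

Under these assumptions the computed efforts satisfy λθ2 < e^R < e^E < λθ2(1 + λ), which are the bounds invoked in the proofs in Appendix D.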

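Second, let $V^s_1(\theta_1, \theta_2)$ denote the principal’s two-period expected utility when her pure strategy is $(x_1 = a, s)$ and the agent best-responds, so $V^R_1(\theta_1, \theta_2) = U_1(e^R(\theta_2), \theta_1, \theta_2, 1, 1)$ and $V^E_1(\theta_1, \theta_2) = U_1(e^E(\theta_2), \theta_1, \theta_2, 1, 0)$. Third, let $\bar\theta(\theta_2)$ be the unique solution to

\[
h\bigl(e^E(\theta_2), \bar\theta(\theta_2)\bigr) = 1 - h\bigl(e^E(\theta_2), \theta_2\bigr), \tag{A.3}
\]

where effort is evaluated at the experimentation level $e^E(\theta_2)$, consistent with the closed form for $\bar\theta(\cdot)$ used in the proof of Lemma 6. By the equilibrium characterization, experimenting with $x_1 = a$ is an equilibrium of that subgame if and only if $\theta_1 \leq \bar\theta(\theta_2)$, and it is easily verified that $\bar\theta(\theta_2) > 1 - \theta_2$. Finally, by symmetry the agent’s effort on $x_1 = b$ is $e^s(1 - \theta_2)$, the principal’s utility is $V^s_1(1 - \theta_1, 1 - \theta_2)$, and the threshold for experimentation with $b$ is $1 - \theta_1 < \bar\theta(1 - \theta_2) \iff \theta_1 > 1 - \bar\theta(1 - \theta_2)$.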
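As a rough numerical illustration of the experimentation threshold, the sketch below evaluates the closed form reported in the proof of Lemma 6, namely (1 − θ2)/((1 − θ2) + θ2(1 − e^E(θ2))²), and checks the indifference condition and the bound relative to 1 − θ2. It assumes the Bayesian failure posterior h(e, θ) = θ(1 − e)/(1 − θe); the function names and the grid are hypothetical choices made for this example.

```python
def h(e, theta):
    """Assumed Bayesian posterior after failure."""
    return theta * (1.0 - e) / (1.0 - theta * e)


def effort_E(theta2, lam, tol=1e-12):
    """Bisection for e = lam*theta2*(1 + lam/2*((1 - h(e, theta2))**2 + 1)).

    Assumes 0 < theta2 < 1 and lam*(1 + lam) < 1 so a sign change exists on [0, 1].
    """
    def residual(e):
        kk = (1.0 - h(e, theta2)) ** 2
        return lam * theta2 * (1.0 + 0.5 * lam * (kk + 1.0)) - e

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def theta_bar(theta2, lam):
    """Closed-form threshold belief below which experimenting with a is an equilibrium."""
    e = effort_E(theta2, lam)
    return (1.0 - theta2) / ((1.0 - theta2) + theta2 * (1.0 - e) ** 2)


# Illustrative grid: record (lam, theta2, threshold, indifference gap).
rows = []
for lam in (0.2, 0.4, 0.6):
    for theta2 in (0.1, 0.3, 0.5):
        e = effort_E(theta2, lam)
        tb = theta_bar(theta2, lam)
        gap = h(e, tb) - (1.0 - h(e, theta2))  # zero at the threshold
        rows.append((lam, theta2, tb, gap))
        print(lam, theta2, round(tb, 4))
```

In every grid cell the threshold exceeds 1 − θ2, so a principal who is exactly as confident in a as the agent is in b can still experiment with a.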


B Proof of Proposition 3

In Lemma 3 in the Supplemental Proofs, we prove the following handy property: fixing the principal’s experimentation decisions down each path of play, if she prefers $x_1 = a$ given beliefs $\theta_1$ (in the sense of ex-ante expected utility) then she also prefers it for all higher beliefs $\theta_1' > \theta_1$ (of course, by symmetry if she prefers $x_1 = b$ at beliefs $\theta_1$ then she also prefers it for all beliefs $\theta_1' < \theta_1$). We call this property “preference monotonicity” and employ it in this and the subsequent proofs.

Part 1 We prove that deference expands in the GWL as compared to the NLB both in the baseline model when $\lambda \geq \underline{\lambda} \approx .23505$, and when the principal can commit $\forall \lambda$. When $\lambda$ goes below $\underline{\lambda}$ and the principal can’t commit, there opens up a very small interval of beliefs where the principal would have deferred in the No Learning Benchmark, but does not in the Game with Learning. Using Mathematica we find that the size of this interval is maximized when ($\lambda = .123$, $\theta_2 = .545$), and at these values is $\theta_1 \in (.5, .508)$. In this interval the principal is actually best off deferring by experimenting with b, but can’t credibly commit to do so because the agent is working so (unrealistically) little that almost nothing is learned from failure. Rather than rigidly implement b, she experiments with a to better motivate the agent.

To prove that deference strictly expands when the principal can commit, we argue that the following property, proved analytically in Lemma 4 in the Supplemental Proofs, is sufficient: a principal with beliefs $\theta_1 = 1 - \theta_2$ strictly prefers experimenting with b to experimenting with a. To see this, observe that for $\theta_1 \leq 1 - \theta_2$ experimenting with a is strictly better than any other strategy with a, by $1 - \theta_2 < \bar\theta(\theta_2)$ and Lemma 2. If experimenting with b is also strictly better than experimenting with a, then it is strictly better than any other strategy with a, and $x_1 = b$ must be chosen (either experimenting or rigidly implementing). Thus within the entire deference region from the NLB the principal defers in the GWL. Because the preference for deference is strict there, the deference region must strictly expand.

To prove that the deference region strictly expands when the principal can’t commit, observe that the preceding argument proves the principal defers whenever her beliefs satisfy $\theta_1 \in [1 - \bar\theta(1 - \theta_2),\, 1 - \theta_2]$, because this condition implies that the principal experiments in the subgame following $x_1 = b$ without commitment. However, to show that the deference region in the GWL contains the entire deference region in the NLB, we must also show that the principal will still choose $x_1 = b$ even when $\theta_1 \in [\tfrac{1}{2},\, 1 - \bar\theta(1 - \theta_2)]$ and she rigidly implements in the subgame following $x_1 = b$. If $\theta_2$ is such that $\tfrac{1}{2} \geq 1 - \bar\theta(1 - \theta_2)$ then this region is empty. If $1 - \bar\theta(1 - \theta_2) > \tfrac{1}{2}$, then we require that a principal with beliefs $\theta_1 = 1 - \bar\theta(1 - \theta_2)$ weakly prefers rigidly implementing b to experimenting with a. If this holds then by preference monotonicity a principal with beliefs $\theta_1 \in [\tfrac{1}{2},\, 1 - \bar\theta(1 - \theta_2)]$ who would rigidly implement b if chosen also prefers that to experimenting with a, and so selects b initially. If it fails, then for some principal beliefs a little bit below $1 - \bar\theta(1 - \theta_2) < 1 - \theta_2$, the principal will experiment with a in the GWL when she would have deferred in the NLB. Finally, in Lemma 5 in the Supplemental Proofs we prove, with the aid of Mathematica, that a principal with beliefs $\theta_1 = 1 - \bar\theta(1 - \theta_2)$ weakly prefers rigidly implementing b to experimenting with a if and only if $\lambda > \underline{\lambda} \approx .23505$.

Part 2 We first argue that the following three conditions on $\theta_2$ are jointly sufficient for the principal to always defer in the first period regardless of her own beliefs: (1) $V^E_1(0, 1 - \theta_2) \geq V^R_1(1, \theta_2)$, (2) $V^R_1(1, \theta_2) > V^E_1(1, \theta_2)$, and (3) $V^R_1\bigl(\bar\theta(1 - \theta_2), 1 - \theta_2\bigr) > V^E_1\bigl(1 - \bar\theta(1 - \theta_2), \theta_2\bigr)$.

Conditions (1) and (2) jointly imply that experimenting with b is better than both ex-

perimenting with or rigidly implementing a when θ1 = 1; by preference monotonicity

(proved in Lemma 3 in the Supplemental Proofs) this also implies that experimenting with

b is better $\forall \theta_1 \in [0, 1]$. Thus, whenever experimenting with b is an equilibrium strategy ($\theta_1 \geq 1 - \bar\theta(1 - \theta_2)$) it is chosen. Now whenever experimenting with b is not an equilibrium strategy ($\theta_1 < 1 - \bar\theta(1 - \theta_2)$), the principal compares rigidly implementing b to experimenting with a; again applying preference monotonicity, condition (3) implies that she prefers the former $\forall \theta_1 \leq 1 - \bar\theta(1 - \theta_2)$. All possible principal beliefs are covered, which completes the argument. We next argue that condition (1) is necessary for the principal to always defer regardless of her own beliefs; if it fails then $V^R_1(1, \theta_2) > V^E_1(0, 1 - \theta_2)$. For $\theta_1 > \bar\theta(\theta_2)$

the principal would rigidly implement a and experiment with b, and by continuity she also

prefers rigidly implementing a to experimenting with b for θ1 sufficiently close to 1. Thus

for such θ1 she selects x1 = a in equilibrium and does not defer.

Finally, Lemma 6 in the Supplemental Proofs proves analytically that when $\lambda > \hat\lambda$ (where $\hat\lambda \approx .5214$ is the unique solution to $\lambda(1 + \lambda)(1 + \tfrac{\lambda}{2}) = 1$), each of the three conditions $k \in \{1, 2, 3\}$ holds for $\theta_2$ in a nonempty interval $(0, \varepsilon_k)$ (see footnote 27). Since they then all hold for $\theta_2 \in (0, \min_k \varepsilon_k)$, $\lambda > \hat\lambda$ is therefore sufficient for existence of a range of $\theta_2$ where the principal always defers. In addition, the supplemental Mathematica code verifies that condition 1 fails $\forall \theta_2 > 0$ when $\lambda \leq \hat\lambda$, which is equivalent to

\[
\frac{V^E_1(0, 1 - \theta_2) - V^R_1(1, \theta_2)}{\lambda\theta_2} < 0 \quad \forall \theta_2 \in \left[0, \tfrac{1}{2}\right] \text{ when } \lambda < \hat\lambda.
\]

$\lambda > \hat\lambda$ is thus also necessary for existence of a range of $\theta_2$ where the principal always defers.
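The cutoff defined by λ(1 + λ)(1 + λ/2) = 1 can be recovered with a few lines of root-finding. The Python sketch below uses plain bisection on [0, 1]; the function names are hypothetical, and the only inputs are the defining equation and the approximate value .5214 reported in the text.

```python
def cutoff_equation(lam):
    """Left-hand side minus right-hand side of lam*(1 + lam)*(1 + lam/2) = 1."""
    return lam * (1.0 + lam) * (1.0 + lam / 2.0) - 1.0


def bisect(func, lo, hi, tol=1e-12):
    """Plain bisection; assumes func(lo) < 0 < func(hi) and func is increasing."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if func(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam_hat = bisect(cutoff_equation, 0.0, 1.0)
print("cutoff:", lam_hat)
```

Because the left-hand side is strictly increasing in λ on [0, 1], the root is unique, which is what the text asserts.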

C Underexperimentation and Commitment

In this section we formally state, and then prove, several verbal statements in Section 5.2

about the principal sometimes “underexperimenting,” and institutional arrangements that

can help the principal solve this commitment problem.

The first result pertains to the possibility of “underexperimentation” in equilibrium. Formally, we say that the principal underexperiments if she rigidly implements a policy $x_1$ on the equilibrium path of play, but a strategy of experimenting with some policy $x_1$ or $\neg x_1$ would yield higher ex-ante expected utility. In other words, she underexperiments if she rigidly implements a policy in equilibrium, but would experiment with some policy if she could commit to her entire two-period strategy ex-ante.

27 Note that we have already shown that property (3) holds $\forall \theta_2$ when $\lambda > \underline{\lambda}$ in Lemma 5; however, the proof is computational. In Lemma 6 we prove analytically that property (3) holds for sufficiently low $\theta_2$ for any $\lambda \in [0, 1]$.

Underexperimentation happens in the model because it is possible that a principal with

relatively strong beliefs in favor of a policy would be better off ex-ante committing to exper-

iment with that policy in order to better motivate the agent, but after actually observing

failure she will want to renege on the experiment and persist with the initial policy. The agent will anticipate this rigidity and work accordingly, so experimentation will collapse in equilibrium.

Formally, we have the following result.

Proposition C.1. For some beliefs (θ1, θ2) the principal underexperiments. Conversely, the

principal never experiments with a policy in equilibrium when rigidly implementing some

policy would yield higher ex-ante expected utility.

The next result considers an institutional arrangement that can eliminate underexperi-

mentation, and in particular that will result in the principal’s “optimal policy experiment”

becoming the equilibrium outcome. The phrase “optimal policy experiment” refers to the

policy experiment that would yield the highest ex-ante expected utility if the principal could

commit to her strategy ex-ante. The result states that creating exogenous “costs to rigidity”

can induce the optimal policy experiment when she underexperiments.

Proposition C.2. For any beliefs (θ1, θ2) s.t. the principal underexperiments, there is a cost

c of maintaining policy after failure that makes her optimal policy experiment an equilibrium.

The final result pertains to a model variant in which the principal can first “appoint” a player with different beliefs $\hat\theta_1 \in [0, 1]$ to make decisions in her place, and that player and the agent will then play the equilibrium that is best for the “original” principal with beliefs $\theta_1$. In particular, we look for conditions under which the principal is strictly better off appointing somebody with beliefs that differ from her own. This yields the following result.

Proposition C.3. Suppose that a principal with beliefs $\theta_1$ could appoint a player with beliefs $\hat\theta_1$ to make policy decisions in her place. If appointing herself is not optimal, then any optimal appointee $\theta^*_1$ believes less strongly in the resulting policy $x^*_1$ than the principal does.


The intuition here is that the principal would appoint somebody with different beliefs

when she would like to commit ex-ante to experiment with some policy x∗1, but her beliefs

are such that she must rigidly implement x∗1 whenever she selects it, and so in equilibrium

she either rigidly implements x∗1 or experiments with ¬x∗1. The optimal appointee will

be anybody whose beliefs allow her to experiment with x∗1, and this must necessarily be

somebody who believes less strongly in it.

Proofs of underexperimentation and commitment

Proof of Proposition C.1 First, there is never overexperimentation in equilibrium, i.e.,

the principal never experiments with x1 when rigidly implementing either x1 or ¬x1 would

be better. The former is immediately ruled out by Lemma 2. The latter is also ruled out: if rigidly implementing ¬x1 were optimal with commitment then it must be better than experimenting with ¬x1, and by implication it would be the unique equilibrium of the subgame following ¬x1 without commitment; thus, the principal failing to choose it would be a contradiction.

Next, we show that there exist $(\theta_1, \theta_2)$ s.t. underexperimentation occurs, i.e. the principal rigidly implements $x_1$ when experimenting would be better. In part 3 of Lemma 6 in the Supplemental Proofs we show there exists a nonempty interval of $\theta_2$ s.t. the principal prefers rigidly implementing b to experimenting with a when $\theta_1 = 1 - \bar\theta(1 - \theta_2)$. By continuity, rigidly implementing b is thus the equilibrium outcome for $\theta_1 = 1 - \bar\theta(1 - \theta_2) - \varepsilon$ when $\varepsilon > 0$ is sufficiently close to 0, and it is also worse than experimenting with b since the agent’s effort drops discretely. Formally,

\[
\begin{aligned}
V^E_1\bigl(\bar\theta(1 - \theta_2), 1 - \theta_2\bigr) &= U_1\bigl(e^E(1 - \theta_2), \bar\theta(1 - \theta_2), 1 - \theta_2, 1, 0\bigr) \\
&= U_1\bigl(e^E(1 - \theta_2), \bar\theta(1 - \theta_2), 1 - \theta_2, 1, 1\bigr) \\
&> U_1\bigl(e^R(1 - \theta_2), \bar\theta(1 - \theta_2), 1 - \theta_2, 1, 1\bigr) \\
&= V^R_1\bigl(\bar\theta(1 - \theta_2), 1 - \theta_2\bigr).
\end{aligned}
\]

The first and last equalities follow from the definitions, the second follows from the definition of $\bar\theta(\cdot)$, and the inequality from $U_1(\cdot)$ increasing in $e_1$ and $e^E(1 - \theta_2) > e^R(1 - \theta_2)$. □

Proof of Proposition C.2 If there were an exogenous cost c > 0 of maintaining policy

after failure, then for experimentation to fail to be an equilibrium of the subgame following

policy x requires that the principal prefer to reselect x given initial effort eE (P2 (ω = x)),

the players’ resulting posterior beliefs, and the cost c. This condition is

\[
h\bigl(e^E(P_2(\omega = x)), P_1(\omega = x)\bigr) - \frac{c}{\lambda} > 1 - h\bigl(e^E(P_2(\omega = x)), P_2(\omega = x)\bigr).
\]

Now suppose the principal underexperiments by rigidly implementing policy $x^*_1 = a$

when c = 0. Experimenting must then be the unique equilibrium of the subgame following

b, it will remain so with any higher cost c > 0, and also rigidly implementing a is better

than experimenting with b. So experimenting with a must be the optimal policy experiment,

but not an equilibrium of the subgame following a without commitment. However, it will

become one when

\[
c \geq \lambda\Bigl(h\bigl(e^E(\theta_2), \theta_1\bigr) - \bigl(1 - h\bigl(e^E(\theta_2), \theta_2\bigr)\bigr)\Bigr),
\]

and so the principal will select it in equilibrium. A symmetric argument holds when the principal underexperiments by rigidly implementing b. □

Proof of Proposition C.3 Suppose appointing herself is not optimal; then for a principal with beliefs $\theta_1$, the resulting equilibrium $(x^*, s^*)$ is strictly worse than the equilibrium that would result if an optimal appointee with beliefs $\hat\theta_1$ were making policy decisions. Denote this equilibrium $(\hat x, \hat s)$, and write $\hat P_1$ for the appointee’s prior. First note that $\hat s$ must not be an equilibrium experimentation decision in the subgame following $\hat x$ for $\theta_1$ (since otherwise the principal would choose it). We next argue that $\hat s = E$. If $\hat s = R$ then by implication experimenting must be an equilibrium of the subgame following $x_1 = \hat x$ for $\theta_1$; but then by Lemma 2 it is also strictly better than rigidly implementing $\hat x$ and we have a contradiction. Finally, since $(\hat x, \hat s = E)$ is an equilibrium of the subgame following $\hat x$ for $\hat\theta_1$ but not $\theta_1$, it follows that

\[
h\bigl(e^E(P_2(\omega = \hat x)), \hat P_1(\omega = \hat x)\bigr) < 1 - h\bigl(e^E(P_2(\omega = \hat x)), P_2(\omega = \hat x)\bigr) < h\bigl(e^E(P_2(\omega = \hat x)), P_1(\omega = \hat x)\bigr),
\]

implying $\hat P_1(\omega = \hat x) < P_1(\omega = \hat x)$. □

D Supplemental Proofs

In this section we prove a sequence of lemmas that are employed in the previous proofs.

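Lemma 1. Say that the agent’s problem is “well-behaved” when $e^*(\theta, \eta_1, \eta_2) = \arg\max_{e \in [0,1]} U(e, \theta, \eta_1, \eta_2)$ is unique and $\in (0, 1)$. The set of $\lambda$ s.t. the agent’s problem is well-behaved $\forall (\theta \in (0, 1), \eta_1, \eta_2)$ is an interval $\lambda \in [0, \bar\lambda)$, where $\bar\lambda \in \left(\frac{\sqrt{5} - 1}{2}, \sqrt{3} - 1\right)$ and is $\approx .68466$.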

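Proof: First, it is simple to verify that the derivative of the agent’s objective function $U(e, \theta, \eta_1, \eta_2)$ from the proof of Propositions 1 and 2 in the main appendix is:

\[
\frac{\partial U(e, \theta, \eta_1, \eta_2)}{\partial e} = \frac{1}{\lambda}\left(-e + \lambda\theta\left(1 + \frac{\lambda}{2}\,\eta_1\,[k(e, \theta) + \eta_2]\right)\right),
\]

where $k(e, \theta) = (1 - h(e, \theta))^2$.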

Second, observe that $\frac{\partial U}{\partial e} > 0$ at $e = 0$, and that $\frac{\partial U}{\partial e}$ is convex in $e$, which follows from the convexity of $k(e, \theta)$. The set of maximizers is thus either a singleton in the interior (the first and possibly only point where the FOC is satisfied), a singleton on the boundary $e = 1$, or a pair where one of the two elements is $e = 1$. This further implies that whenever the problem is not “well behaved,” $e = 1$ is a maximizer.

Third, observe that $\frac{\partial^2 U}{\partial e\,\partial\lambda} = \frac{e}{\lambda^2} + \frac{\theta}{2}\,\eta_1\,(k(e, \theta) + \eta_2) > 0$. Thus by Milgrom and Shannon (1994) the set of maximizers of $U(\cdot)$ is weakly increasing in $\lambda$. This implies that the set of $\lambda$ s.t. the problem is well behaved for a given $(\eta_1, \eta_2, \theta)$ is an interval; if it were well behaved for $\lambda'$ but not for $\lambda'' < \lambda'$, then $e = 1$ would be a maximizer at $\lambda''$ but not at the larger $\lambda'$, contradicting weak set increasingness. Lastly, this implies that the set of $\lambda$ that are well behaved for all feasible parameters $(\theta \in (0, 1), \eta_1, \eta_2)$ is also an interval $[0, \bar\lambda)$; if it were not then it would also not be an interval for some specific profile of parameters $(\eta_1, \eta_2, \theta)$, a contradiction.

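We can bound $\bar\lambda$ below and above analytically, and compute an estimated value numerically. We must have $\bar\lambda > \frac{\sqrt{5} - 1}{2}$ since the problem is strictly concave for all feasible parameters (and by implication well behaved) when

\[
\left.\frac{\partial U}{\partial e}\right|_{e=1} = -\frac{1}{\lambda} + \theta\left(1 + \frac{\lambda}{2}\,\eta_1\,(1 + \eta_2)\right) \leq -\frac{1}{\lambda} + (1 + \lambda) < 0,
\]

where the final inequality holds if and only if $\lambda < \frac{\sqrt{5} - 1}{2}$. We also must have $\bar\lambda < \sqrt{3} - 1$ since best-response effort at $\theta = 1$ when the principal experiments is $e^*(1, 1, 1) = \lambda\left(1 + \frac{\lambda}{2}\right) < 1$ if and only if $\lambda < \sqrt{3} - 1$. In the supplemental Mathematica code to this document, we verify that $\bar\lambda \approx .68466$. Since the set of maximizers is weakly increasing in $\eta_2$ (by $\frac{\partial^2 U}{\partial e\,\partial\eta_2} = \lambda\theta > 0$), to find $\bar\lambda$ it suffices to check that the problem is well-behaved $\forall \theta$ when $\eta_2 = 1$. We thus identify $\bar\lambda$ by computing the highest $\lambda$ s.t. $\forall \theta \in [0, 1]$, the agent’s utility at the lowest solution to the first-order condition is greater than his utility from $e = 1$. □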
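The two analytical bounds are roots of simple quadratics, which can be confirmed directly. The short Python sketch below checks that (√5 − 1)/2 solves −1/λ + 1 + λ = 0, that √3 − 1 solves λ(1 + λ/2) = 1, and that the value .68466 reported in the text lies between them. The variable names are hypothetical, and the sketch does not reproduce the computation of .68466 itself.

```python
import math

# Lower bound: root of -1/lam + (1 + lam) = 0, i.e. lam**2 + lam - 1 = 0.
lower = (math.sqrt(5.0) - 1.0) / 2.0

# Upper bound: root of lam*(1 + lam/2) = 1, i.e. lam**2/2 + lam - 1 = 0.
upper = math.sqrt(3.0) - 1.0

# Value reported in the text (computed there with Mathematica, not here).
lam_bar_reported = 0.68466

lower_residual = -1.0 / lower + (1.0 + lower)
upper_residual = upper * (1.0 + upper / 2.0) - 1.0

print("lower bound:", lower, "residual:", lower_residual)
print("upper bound:", upper, "residual:", upper_residual)
print("reported value inside bounds:", lower < lam_bar_reported < upper)
```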

Lemma 2. The following two statements hold when the principal can play mixed strategies.

(i) Whenever experimenting is an equilibrium of the subgame following initial policy x1 ∈

{a, b}, then it is also the optimal (pure or mixed) strategy for the principal if she could

precommit to her responses to success and failure.

(ii) Whenever experimenting is not an equilibrium of the subgame following $x_1 \in \{a, b\}$, then the unique equilibrium of that subgame is rigid implementation.

(Part 1) Because of symmetry we can restrict attention to the subgame following $x_1 = a$. We first must characterize the agent’s best response effort $\arg\max_{e \in [0,1]} \{U_2(e, \theta_2, p_s, p_f)\}$ to a general mixed strategy by the principal in the Game with Learning as characterized in Appendix A. Taking the derivative of equation (A.1) w.r.t. $e$ yields:

\[
\frac{\partial U_2(e, \theta, p_s, p_f)}{\partial e} = \frac{1}{\lambda}\left(-e + \lambda\theta\left(1 + \frac{\lambda}{2}\,[k(e, \theta) + (p_s - p_f)]\right)\right).
\]

Now, it is easily verified that $\frac{\partial U_2(e, \theta, p_s, p_f)}{\partial e} = \frac{\partial U(e, \theta, 1, p_s - p_f)}{\partial e}$, where $U(\cdot)$ is the form of the agent’s objective function that we used in the Main Appendix, which amalgamated payoffs from the NLB and the GWL, but did not account for principal mixed strategies. This means we can “piggyback” off of the analysis of that problem in Lemma 1. First, it remains true that the agent’s problem is well behaved $\forall (\theta, p_s, p_f)$ when $\lambda < \bar\lambda$. Second, when $\lambda < \bar\lambda$ we have that (i) $\arg\max_{e \in [0,1]} \{U_2(e, \theta_2, p_s, p_f)\} = e^*(\theta, 1, p_s - p_f)$ where $e^*(\theta, \eta_1, \eta_2)$ is the maximizer of $U(e, \theta, \eta_1, \eta_2)$, and (ii) $e^*(\theta, 1, p_s - p_f)$ is strictly increasing in $p_s$ and strictly decreasing in $p_f$ by $\frac{\partial^2 U}{\partial e\,\partial\eta_2} = \lambda\theta > 0$ and Theorem 1 of Edlin and Shannon (1998). For notational simplicity, for the remainder of the proof we will write $e^*(\theta, 1, p_s - p_f)$ as $e(\theta, p_s - p_f)$ so as not to carry around unnecessary terms.

(Part 2) We now prove (i). Employing the characterization in Appendix A, the prin-

cipal’s utility from choosing x1 = a if she could precommit to her responses to success and

failure (ps, pf ) would be U1 (e (θ2, ps − pf ) , θ1, θ2, ps, pf ). It is easily verified that U1 (·) is

strictly increasing in ps holding first period effort e fixed, and increasing in e when ps = 1.

Also recall that e (θ2, ps − pf ) is increasing in ps. Hence,

U1 (e (θ2, ps − pf ) , θ1, θ2, ps, pf ) < U1 (e (θ2, ps − pf ) , θ1, θ2, 1, pf ) < U1 (e (θ2, 1− pf ) , θ1, θ2, 1, pf )

and the optimal choice after success is therefore to always stay, i.e. p∗s = 1. This feature

is shared with the baseline model without commitment. Intuitively, the reason is that a

higher probability of staying with the initial policy after success is both interim-better for

the principal, and also better motivates the agent ex-ante.

Given the above analysis, the principal’s optimal choice after policy failure satisfies $p^*_f \in \arg\max_{p_f \in [0,1]} \{U_1(e(\theta_2, 1 - p_f), \theta_1, \theta_2, 1, p_f)\}$. Now her utility $U_1(\cdot)$ can be rewritten as

\[
\theta_1 e \cdot (1 + \lambda p_s) + (1 - \theta_1 e) \cdot \lambda\,\bigl(p_f \cdot h(e, \theta_1)\,h(e, \theta_2) + (1 - p_f) \cdot (1 - h(e, \theta_1))(1 - h(e, \theta_2))\bigr),
\]

and it is easily verified that this is decreasing in pf whenever h (e, θ1) ≤ 1− h (e, θ2), i.e., if

posteriors after failure are s.t. it is better to switch. If experimenting is an equilibrium of

the subgame x1 = a then by definition this property holds for e = e (θ2, 1) (see equilibrium

conditions in Appendix A). In addition, recall that U1 (·) is increasing in e when ps = 1 and

that e (θ2, 1− pf ) is decreasing in pf . Combining these observations yields,

U1 (e (θ2, 1) , θ1, θ2, 1, 0) > U1 (e (θ2, 1) , θ1, θ2, 1, pf ) > U1 (e (θ2, 1− pf ) , θ1, θ2, 1, pf )

whenever experimenting is an equilibrium. Consequently $(p^*_s = 1, p^*_f = 0)$, i.e. experimenting, is the optimal strategy with commitment.
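The rewriting of the principal’s utility used in Part 2 can be checked numerically. The Python sketch below compares the expression in equation (A.2) with the rewritten form at randomly drawn parameter values, and also computes the effect of moving p_f from 0 to 1 with effort held fixed. It assumes the Bayesian failure posterior h(e, θ) = θ(1 − e)/(1 − θe); all function names and the sampling scheme are hypothetical choices for this illustration.

```python
import random


def h(e, theta):
    """Assumed Bayesian posterior after failure."""
    return theta * (1.0 - e) / (1.0 - theta * e)


def u1_a2(e, t1, t2, ps, pf, lam):
    """Principal's utility as written in equation (A.2)."""
    stay = e * ps + (1.0 - e) * pf * h(e, t2)
    switch = (1.0 - pf) * lam * (1.0 - h(e, t2))
    return t1 * (e + lam * stay) + (1.0 - t1) * switch


def u1_rewritten(e, t1, t2, ps, pf, lam):
    """Same utility, grouped by first-period success versus failure."""
    after_failure = (pf * h(e, t1) * h(e, t2)
                     + (1.0 - pf) * (1.0 - h(e, t1)) * (1.0 - h(e, t2)))
    return t1 * e * (1.0 + lam * ps) + (1.0 - t1 * e) * lam * after_failure


random.seed(0)
max_gap = 0.0
sign_mismatches = 0
for _ in range(1000):
    e, t1, t2, ps, pf, lam = (random.uniform(0.01, 0.99) for _ in range(6))
    gap = abs(u1_a2(e, t1, t2, ps, pf, lam) - u1_rewritten(e, t1, t2, ps, pf, lam))
    max_gap = max(max_gap, gap)
    # Effect of staying after failure (pf = 1) versus switching (pf = 0), effort fixed.
    effect = u1_a2(e, t1, t2, ps, 1.0, lam) - u1_a2(e, t1, t2, ps, 0.0, lam)
    prefers_switch = h(e, t1) <= 1.0 - h(e, t2)
    if (effect <= 1e-15) != prefers_switch:
        sign_mismatches += 1

print("largest discrepancy:", max_gap)
print("sign mismatches:", sign_mismatches)
```

Holding effort fixed, the utility is linear in p_f with slope (1 − θ1 e)·λ·(h(e, θ1) + h(e, θ2) − 1), which is why it is decreasing in p_f exactly when h(e, θ1) ≤ 1 − h(e, θ2).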

(Part 3) We now prove (ii). If experimentation is not an equilibrium of the subgame following $x_1 = a$, then by the equilibrium characterization in Appendix A, $h\bigl(e(\theta_2, 1 - p^a_f), \theta_1\bigr) > 1 - h\bigl(e(\theta_2, 1 - p^a_f), \theta_2\bigr)$ when $p^a_f = 0$. Another equilibrium, with $0 < p^a_f < 1$, would require that the l.h.s. ≤ r.h.s., but this cannot be since $e(\theta_2, 1 - p^a_f)$ is decreasing in $p^a_f$, so the l.h.s. is increasing and the r.h.s. is decreasing in $p^a_f$. □

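Lemma 3 (Preference Monotonicity). For all $(s, s') \in \{R, E\}^2$,

\[
V^s_1(\theta_1, \theta_2) > V^{s'}_1(1 - \theta_1, 1 - \theta_2) \;\Rightarrow\; V^s_1(\theta_1', \theta_2) > V^{s'}_1(1 - \theta_1', 1 - \theta_2) \quad \text{for all } \theta_1' > \theta_1.
\]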

In words, fixing the principal’s experimentation decisions down each path of play, if she

prefers x1 = a given beliefs θ1 then she also prefers it for any higher belief.

Proof: Because the principal’s expected utility for each first-period policy is linear in her prior beliefs (holding her future experimentation decisions fixed), a nonmonotonicity would imply that $x_1 = b$ is better when $\omega = a$ and $x_1 = a$ is better when $\omega = b$. The former could not be true if she is rigidly implementing b since it would always fail, and the latter could not be true if she is rigidly implementing a. Thus, a nonmonotonicity requires that she be experimenting down both paths of play. To rule it out, it therefore suffices to show that experimenting with b is better than experimenting with a when $\omega = b$ (given the agent is predisposed to b, i.e. $\theta_2 \leq \tfrac{1}{2}$). This is,

\[
(1 + \lambda)\,e^E(1 - \theta_2) > \lambda\bigl(1 - h\bigl(e^E(\theta_2), \theta_2\bigr)\bigr).
\]

Applying the definition from Appendix A, we know $e^E(1 - \theta) > \lambda(1 - \theta)$, so the l.h.s. is $> (1 + \lambda)\lambda(1 - \theta_2)$. Also, $e^E(\theta_2) < \lambda\theta_2(1 + \lambda)$, so the r.h.s. is $< \frac{\lambda(1 - \theta_2)}{1 - \theta_2^2\lambda(1 + \lambda)}$. The above inequality will thus hold when

\[
(1 + \lambda)\lambda(1 - \theta_2) > \frac{\lambda(1 - \theta_2)}{1 - \theta_2^2\lambda(1 + \lambda)} \iff (1 + \lambda)^2 < \frac{1}{\theta_2^2}.
\]

The l.h.s. is < the r.h.s. $\forall \theta_2 \leq \tfrac{1}{2}$ when $\lambda < 1$, which always holds by assumption since $\lambda < \bar\lambda < 1$. □
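The key inequality in this proof, (1 + λ)e^E(1 − θ2) > λ(1 − h(e^E(θ2), θ2)) for θ2 ≤ 1/2, can be spot-checked on a grid. The Python sketch below does so under two assumptions that are not part of the proof: the Bayesian failure posterior h(e, θ) = θ(1 − e)/(1 − θe), and a grid restricted to λ ≤ 0.6 so that the simple bisection solver for e^E is valid. Function names and grid points are hypothetical.

```python
def h(e, theta):
    """Assumed Bayesian posterior after failure."""
    return theta * (1.0 - e) / (1.0 - theta * e)


def effort_E(theta, lam, tol=1e-12):
    """Bisection for e = lam*theta*(1 + lam/2*((1 - h(e, theta))**2 + 1)).

    Valid when 0 < theta < 1 and lam*(1 + lam) < 1 (sign change on [0, 1]).
    """
    def residual(e):
        kk = (1.0 - h(e, theta)) ** 2
        return lam * theta * (1.0 + 0.5 * lam * (kk + 1.0)) - e

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def margin(theta2, lam):
    """Left-hand side minus right-hand side of the inequality in the proof."""
    lhs = (1.0 + lam) * effort_E(1.0 - theta2, lam)
    rhs = lam * (1.0 - h(effort_E(theta2, lam), theta2))
    return lhs - rhs


lams = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
theta2s = [0.05, 0.15, 0.25, 0.35, 0.45, 0.5]
min_margin = min(margin(t2, lam) for lam in lams for t2 in theta2s)
print("smallest margin on the grid:", min_margin)
```

A positive smallest margin is consistent with the lemma; the analytical argument above is what establishes the result for all admissible parameters.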

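Lemma 4. The principal strictly prefers experimenting with b to experimenting with a when $\theta_1 = 1 - \theta_2$ and $\theta_2 < \tfrac{1}{2}$.

Proof: Let

\[
\phi(\theta_2) = \lambda\theta_2\left(1 + \frac{\lambda}{2}\bigl(k\bigl(e^E(\theta_2), \theta_2\bigr) + 1\bigr)\right)(1 + \lambda) - \lambda\bigl(1 - h\bigl(e^E(1 - \theta_2), 1 - \theta_2\bigr)\bigr);
\]

this is the principal’s utility difference between experimenting with a and experimenting with b when $\omega = a$. Applying symmetry, her expected utility difference between experimenting with a and experimenting with b with prior $\theta_1$ is $\theta_1\phi(\theta_2) - (1 - \theta_1)\phi(1 - \theta_2)$. We now wish to show that this is $< 0$ when $\theta_1 = 1 - \theta_2$ given that $\theta_2 < \tfrac{1}{2}$, i.e.

\[
(1 - \theta_2)\phi(\theta_2) - \theta_2\phi(1 - \theta_2) < 0 \iff \frac{\phi(\theta_2)}{\lambda\theta_2} - \frac{\phi(1 - \theta_2)}{\lambda(1 - \theta_2)} < 0 \quad \text{when } \theta_2 < \tfrac{1}{2}.
\]

It is simple to verify that

\[
\frac{\phi(\theta_2)}{\lambda\theta_2} = \left(1 + \frac{\lambda}{2}\bigl(k\bigl(e^E(\theta_2), \theta_2\bigr) + 1\bigr)\right)(1 + \lambda) - \frac{1}{1 - e^E(1 - \theta_2) \cdot (1 - \theta_2)},
\]

and by symmetry it suffices to show that $\frac{\phi(\theta_2)}{\lambda\theta_2} - \frac{\phi(1 - \theta_2)}{\lambda(1 - \theta_2)} > 0$ when $\theta_2 > \tfrac{1}{2}$. Using substitution and rearranging, we then have that $\frac{\phi(\theta_2)}{\lambda\theta_2} - \frac{\phi(1 - \theta_2)}{\lambda(1 - \theta_2)} > 0 \iff$

\[
\frac{\theta_2 e^E(\theta_2) - (1 - \theta_2)e^E(1 - \theta_2)}{\bigl(1 - \theta_2 e^E(\theta_2)\bigr) \cdot \bigl(1 - (1 - \theta_2)e^E(1 - \theta_2)\bigr)} > \frac{\lambda}{2}(1 + \lambda)\Bigl(k\bigl(e^E(1 - \theta_2), 1 - \theta_2\bigr) - k\bigl(e^E(\theta_2), \theta_2\bigr)\Bigr). \tag{D.1}
\]

Since the denominator of the l.h.s. is $< 1$, and $1 + \lambda < 2$, the above inequality holds if the following yet stronger inequality holds,

\[
\theta_2 e^E(\theta_2) - (1 - \theta_2)e^E(1 - \theta_2) > \lambda\Bigl(k\bigl(e^E(1 - \theta_2), 1 - \theta_2\bigr) - k\bigl(e^E(\theta_2), \theta_2\bigr)\Bigr). \tag{D.2}
\]

Now again substituting in the definition of $e^E(\cdot)$, the l.h.s. can be rewritten

\[
\lambda\left((2\theta_2 - 1)\left(1 + \frac{\lambda}{2}\right) + \frac{\lambda}{2}\Bigl(\theta_2^2\,k\bigl(e^E(\theta_2), \theta_2\bigr) - (1 - \theta_2)^2\,k\bigl(e^E(1 - \theta_2), 1 - \theta_2\bigr)\Bigr)\right),
\]

implying that the desired inequality holds if and only if

\[
(2\theta_2 - 1)\left(1 + \frac{\lambda}{2}\right) > \left(1 + \frac{\lambda}{2}(1 - \theta_2)^2\right)k\bigl(e^E(1 - \theta_2), 1 - \theta_2\bigr) - \left(1 + \frac{\lambda}{2}\theta_2^2\right)k\bigl(e^E(\theta_2), \theta_2\bigr).
\]

It is easily verified that the r.h.s. is $< \left(1 + \frac{\lambda}{2}\right)\Bigl(k\bigl(e^E(1 - \theta_2), 1 - \theta_2\bigr) - k\bigl(e^E(\theta_2), \theta_2\bigr)\Bigr)$ when $\theta_2 > \tfrac{1}{2}$, implying that the above inequality holds if the following yet stronger inequality holds:

\[
2\theta_2 - 1 > k\bigl(e^E(1 - \theta_2), 1 - \theta_2\bigr) - k\bigl(e^E(\theta_2), \theta_2\bigr). \tag{D.3}
\]

We now show that eqn. (D.3) holds. Substituting in the definition of $k(\cdot)$, observing that $2\theta_2 - 1 = \theta_2^2 - (1 - \theta_2)^2$, and rearranging, we have that the inequality is equivalent to

\[
\theta_2^2\left(1 - \frac{1}{\bigl(1 - (1 - \theta_2)e^E(1 - \theta_2)\bigr)^2}\right) > (1 - \theta_2)^2\left(1 - \frac{1}{\bigl(1 - \theta_2 e^E(\theta_2)\bigr)^2}\right).
\]

Since $e^E(\theta_2)$ is increasing in $\theta_2$, this clearly holds because $(1 - \theta_2)e^E(1 - \theta_2) < \theta_2 e^E(\theta_2)$ when $\theta_2 > \tfrac{1}{2}$. The desired property is hence shown. □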

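Lemma 5. For all $\theta_2$ s.t. $1 - \bar\theta(1 - \theta_2) > \tfrac{1}{2}$, the principal weakly prefers rigidly implementing b to experimenting with a when $\theta_1 = 1 - \bar\theta(1 - \theta_2)$.

Proof: Observe that $1 - \bar\theta(1 - \theta_2)$ equals 1 at $\theta_2 = 0$, is less than $\tfrac{1}{2}$ at $\theta_2 = \tfrac{1}{2}$, and is strictly decreasing in $\theta_2$. Thus, there exists some unique $\tilde\theta_2$ satisfying $1 - \bar\theta(1 - \tilde\theta_2) = \tfrac{1}{2}$ s.t. the set of beliefs $[\tfrac{1}{2},\, 1 - \bar\theta(1 - \theta_2)]$ where the principal would rigidly implement b in that subgame is nonempty if and only if $\theta_2 < \tilde\theta_2$. Note that $\tilde\theta_2$ is a function of $\lambda$ so we henceforth write $\tilde\theta_2(\lambda)$ for clarity. Now applying the definitions and rearranging, we wish to show that $\forall \lambda \in [\underline{\lambda}, \bar\lambda]$,

\[
\bar\theta(1 - \theta_2) \cdot \bigl(V^R_1(1, 1 - \theta_2) - V^E_1(0, \theta_2)\bigr) > \bigl(1 - \bar\theta(1 - \theta_2)\bigr) \cdot V^E_1(1, \theta_2) \quad \forall \theta_2 \in [0, \tilde\theta_2(\lambda)],
\]

where $\underline{\lambda} \approx .23505$. We verify this step in the supplemental Mathematica code. □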

Lemma 6. The following three properties hold.

1. When $\lambda > \hat\lambda$, $V^E_1(0, 1 - \theta_2) - V^R_1(1, \theta_2) > 0$ for $\theta_2$ in a nonempty interval $(0, \varepsilon_1)$.

2. When $\lambda \in [0, 1]$, $V^R_1(1, \theta_2) - V^E_1(1, \theta_2) > 0$ for $\theta_2$ in a nonempty interval $(0, \varepsilon_2)$.

3. When $\lambda \in [0, 1]$, $V^R_1\bigl(\bar\theta(1 - \theta_2), 1 - \theta_2\bigr) - V^E_1\bigl(1 - \bar\theta(1 - \theta_2), \theta_2\bigr) > 0$ for $\theta_2$ in a nonempty interval $(0, \varepsilon_3)$.

To show that some function $f(\theta_2)$ satisfying $f(0) = 0$ is $> 0$ for $\theta_2 \in (0, \varepsilon)$ where $\varepsilon > 0$, it suffices to show by continuity that $\left.\frac{f(\theta_2)}{\theta_2}\right|_{\theta_2 = 0} > 0$ provided that this quantity is finite. We now show this for each of the desired expressions.

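Property 1: When $\omega = a$, the principal’s utility from experimenting with b is,

\[
\lambda\bigl(1 - h\bigl(e^E(1 - \theta_2), 1 - \theta_2\bigr)\bigr) = \frac{\lambda\theta_2}{1 - (1 - \theta_2)e^E(1 - \theta_2)},
\]

and from rigidly implementing a is,

\[
e^R(\theta_2)(1 + \lambda) + \lambda\bigl(1 - e^R(\theta_2)\bigr)h\bigl(e^R(\theta_2), \theta_2\bigr) = \lambda\theta_2\left((1 + \lambda)\left(1 + \frac{\lambda}{2}k\bigl(e^R(\theta_2), \theta_2\bigr)\right) + \frac{\bigl(1 - e^R(\theta_2)\bigr)^2}{1 - \theta_2 e^R(\theta_2)}\right).
\]

Since $e^R(0) = 0$, $e^E(1) = \lambda\left(1 + \frac{\lambda}{2}\right)$, and $k\bigl(e^R(0), 0\bigr) = 1$,

\[
\left.\frac{1}{\lambda\theta_2}\bigl(V^E_1(0, 1 - \theta_2) - V^R_1(1, \theta_2)\bigr)\right|_{\theta_2 = 0} = \frac{1}{1 - \lambda\left(1 + \frac{\lambda}{2}\right)} - \left((1 + \lambda)\left(1 + \frac{\lambda}{2}\right) + 1\right).
\]

Manipulating the above expression demonstrates that it is $> 0$ if and only if

\[
\lambda(1 + \lambda)\left(1 + \frac{\lambda}{2}\right) > 1, \tag{D.4}
\]

which holds if and only if $\lambda > \hat\lambda$ by definition.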
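The manipulation leading to (D.4) can be made explicit. Writing a = λ(1 + λ/2) and c = 1 + λ/2, the limit expression 1/(1 − a) − ((1 + λ)c + 1) equals c·(λ(1 + λ)c − 1)/(1 − a), so for a < 1 its sign is the sign of λ(1 + λ)(1 + λ/2) − 1. The Python sketch below checks this factorization numerically on a grid of λ values below √3 − 1; the function names and grid are hypothetical, and the factorization is a reconstruction of the omitted algebra rather than a quotation from the paper.

```python
def limit_expression(lam):
    """Limit of (V^E_1(0, 1 - t) - V^R_1(1, t)) / (lam * t) as t -> 0, as stated in the text."""
    a = lam * (1.0 + lam / 2.0)
    return 1.0 / (1.0 - a) - ((1.0 + lam) * (1.0 + lam / 2.0) + 1.0)


def factored_form(lam):
    """Reconstructed factorization: c * (lam*(1 + lam)*c - 1) / (1 - a)."""
    c = 1.0 + lam / 2.0
    a = lam * c
    return c * (lam * (1.0 + lam) * c - 1.0) / (1.0 - a)


grid = [0.05 * i for i in range(1, 14)]  # 0.05, 0.10, ..., 0.65
max_gap = max(abs(limit_expression(lam) - factored_form(lam)) for lam in grid)
signs_agree = all(
    (limit_expression(lam) > 0.0) == (lam * (1.0 + lam) * (1.0 + lam / 2.0) > 1.0)
    for lam in grid
)
print("largest discrepancy:", max_gap)
print("sign matches (D.4) on the grid:", signs_agree)
```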

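Property 2: When $\omega = a$, the principal’s utility from experimenting with a is,

\[
e^E(\theta_2)(1 + \lambda) = \lambda\theta_2\left(1 + \frac{\lambda}{2}\bigl(k\bigl(e^E(\theta_2), \theta_2\bigr) + 1\bigr)\right)(1 + \lambda).
\]

Since $k\bigl(e^E(0), 0\bigr) = 1$ and applying the expression for rigid implementation from Property 1, the desired condition is equivalent to

\[
(1 + \lambda)\left(1 + \frac{\lambda}{2}\right) + 1 > (1 + \lambda)^2,
\]

which holds if $\lambda(1 + \lambda) < 2 \iff \lambda < 1$.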

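Property 3: Using the definition in equation (A.3), the threshold function $\bar\theta(\theta_2)$ in closed form is $\bar\theta(\theta_2) = \frac{1 - \theta_2}{(1 - \theta_2) + \theta_2\bigl(1 - e^E(\theta_2)\bigr)^2}$. Thus, when $\theta_1 = 1 - \bar\theta(1 - \theta_2)$ the difference in the principal’s utility between rigidly implementing b and experimenting with a is,

\[
\bar\theta(1 - \theta_2) \cdot \bigl(V^R_1(1, 1 - \theta_2) - V^E_1(0, \theta_2)\bigr) - \bigl(1 - \bar\theta(1 - \theta_2)\bigr) \cdot V^E_1(1, \theta_2). \tag{D.5}
\]

The first term is the subjective probability that $\omega = b$ times the utility difference between rigidly implementing b and experimenting with a when $\omega = b$. The second term is the subjective probability that $\omega = a$ times the utility of experimenting with a when $\omega = a$ (the payoff from rigidly implementing b when the state is a is 0).

Now to get the desired expression we divide through by $\theta_2$ and then evaluate at $\theta_2 = 0$. We do so by dividing $\bar\theta(1 - \theta_2)$ by $\theta_2$ in the first term, then $V^E_1(1, \theta_2)$ by $\theta_2$ in the second term, and then evaluating all expressions at $\theta_2 = 0$. First,

\[
\left(\frac{1}{\theta_2}\right) \cdot \bar\theta(1 - \theta_2) = \frac{1}{\theta_2 + (1 - \theta_2)\bigl(1 - e^E(1 - \theta_2)\bigr)^2},
\]

which is equal to $\left(1 - \lambda\left(1 + \frac{\lambda}{2}\right)\right)^{-2}$ at $\theta_2 = 0$. Second, $V^R_1(1, 1 - \theta_2) = 2\lambda$, $V^E_1(0, \theta_2) = \lambda$, and $1 - \bar\theta(1 - \theta_2) = 1$ at $\theta_2 = 0$. Third, from the proof of property 2 we know $\left(\frac{1}{\theta_2}\right) \cdot V^E_1(1, \theta_2) = \lambda\left(1 + \frac{\lambda}{2}\bigl(k\bigl(e^E(\theta_2), \theta_2\bigr) + 1\bigr)\right)(1 + \lambda)$, which is $= \lambda(1 + \lambda)^2$ evaluated at $\theta_2 = 0$. Assembling these observations, the desired expression is

\[
\left(1 - \lambda\left(1 + \frac{\lambda}{2}\right)\right)^{-2} \cdot (2\lambda - \lambda) - \lambda(1 + \lambda)^2 > 0 \iff \left(1 - \lambda\left(1 + \frac{\lambda}{2}\right)\right)^2 \cdot (1 + \lambda)^2 < 1.
\]

Now the l.h.s. is $< (1 - \lambda)^2 \cdot (1 + \lambda)^2 = (1 - \lambda^2)^2 \leq 1$ $\forall \lambda \in [0, 1]$, so the result is shown. □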

E Preferences vs. Beliefs

Suppose the state $\omega \in \{a, b\}$ is payoff-irrelevant for the agent, and he shares the principal’s prior $\theta_1$ over $P(\omega = a)$. Instead, he simply has his own return to effort of $\pi_x e$ when working on each policy $x \in \{a, b\}$, and $\pi_b > \pi_a$.

In the second period, his objective function from exerting second-period effort $e$ on some policy $x \in \{a, b\}$ is simply $\pi_x e - \frac{e^2}{2\lambda}$ regardless of the history. It is straightforward to show that his optimal level of effort is $\lambda\pi_x$, and his expected payoff is $\frac{\lambda}{2}\pi_x^2$.

Now consider the first period. Deriving the agent’s expected payoff from exerting first-period effort $e$ when the principal rigidly implements policy a is immediate. His utility when the principal experiments with a is:

\[
-\frac{e^2}{2\lambda} + \pi_a e + \theta_1\left(e\left(\frac{\lambda}{2}\pi_a^2\right) + (1 - e)\left(\frac{\lambda}{2}\pi_b^2\right)\right) + (1 - \theta_1)\left(\frac{\lambda}{2}\pi_b^2\right),
\]

which easily reduces to the expression in the main text.

Both objective functions are strictly concave, so the unique solution is given by the first-order condition. The agent’s effort on a given rigid implementation is simply $\lambda\pi_a$. If the principal experiments then the derivative of the objective function is

\[
-\frac{e}{\lambda} + \pi_a - \frac{\lambda}{2}\theta_1\left(\pi_b^2 - \pi_a^2\right),
\]

so optimal effort is $\max\left\{\lambda\pi_a - \frac{\lambda^2}{2}\theta_1\left(\pi_b^2 - \pi_a^2\right), 0\right\}$ and is strictly lower as stated in the text.

Finally, by symmetry the agent’s effort if the principal rigidly implements b is $\lambda\pi_b$, and if she experiments with b is $\min\left\{\lambda\pi_b + \frac{\lambda^2}{2}(1 - \theta_1)\left(\pi_b^2 - \pi_a^2\right), 1\right\}$. The agent therefore works harder on a policy experiment with the policy b that he prefers; the “threat” of switching to his less preferred policy a after failure motivates him, as intuition would suggest.
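The four effort levels in this variant are simple closed forms, so the comparison can be illustrated directly. The Python sketch below implements them as stated above; the function names and the parameter values (λ = 0.5, π_a = 0.6, π_b = 0.9, θ1 = 0.5) are hypothetical and chosen only for illustration.

```python
def effort_rigid(lam, pi_x):
    """First-period effort when the principal rigidly implements policy x."""
    return lam * pi_x


def effort_experiment_a(lam, pi_a, pi_b, theta1):
    """Effort on a when failure leads to a switch to the agent's preferred policy b."""
    return max(lam * pi_a - 0.5 * lam ** 2 * theta1 * (pi_b ** 2 - pi_a ** 2), 0.0)


def effort_experiment_b(lam, pi_a, pi_b, theta1):
    """Effort on b when failure leads to a switch to the less preferred policy a."""
    return min(lam * pi_b + 0.5 * lam ** 2 * (1.0 - theta1) * (pi_b ** 2 - pi_a ** 2), 1.0)


lam, pi_a, pi_b, theta1 = 0.5, 0.6, 0.9, 0.5  # illustrative values only

rigid_a = effort_rigid(lam, pi_a)
rigid_b = effort_rigid(lam, pi_b)
exp_a = effort_experiment_a(lam, pi_a, pi_b, theta1)
exp_b = effort_experiment_b(lam, pi_a, pi_b, theta1)

print("rigid a:", rigid_a, " experiment a:", exp_a)
print("rigid b:", rigid_b, " experiment b:", exp_b)
```

With these values experimentation lowers effort on a and raises effort on b relative to rigid implementation, which is the opposite of the motivational effect that arises when the disagreement is over beliefs.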
