Experimentation and Persuasion in Political Organizations
Alexander V. Hirsch 1
February 28, 2015
1Division of the Humanities and Social Sciences, California Institute of Technology, MC 228-77, Pasadena, CA 91125. Email: [email protected].
Abstract
Different beliefs about how to achieve shared goals are common in political organizations such as government agencies, campaigns, and NGOs. However, the consequences of such conflicts have not yet been explored. We develop a formal model in which a principal and an agent disagree about the right policy for achieving their shared goals. Disagreement creates a motivational problem, but we show how both observing policy outcomes and experimenting with policies can ameliorate it. We also show that the principal often defers to the agent in order to motivate him, thereby generating more informative policy outcomes and building future consensus. Most surprisingly, she sometimes allows the agent to implement his desired policy even when she is sure it is wrong, to persuade him through failure that he is mistaken. Using the model, we generate empirical implications about performance measurement and Presidential appointments in U.S. federal agencies.
1 Introduction
Within political organizations, strong disagreements about how to achieve shared goals are
common. For example, Presidents often appoint agency heads with very different beliefs
about how to achieve agency goals than career bureaucrats (Heclo 1974). Legislators often
disagree strongly about the likely efficacy of new programs that are proposed by federal
agency heads (Carpenter 2001, Ch. 4). And party leaders and activists often disagree about
the likely political consequences of different tactics, e.g., the recent battles between Speaker
Boehner and the Tea Party caucus over a government shutdown (Draper 2014).
Despite the frequency of “belief conflicts,” their consequences have received little at-
tention in the literatures on political agency and bureaucratic politics. The typical model
considers an agent who has different goals from, and better information than, his principal,
and analyzes how he can exploit his information to achieve his goals at the principal’s ex-
pense (Moe 2012). Such models have produced important insights about when Congress will
delegate policymaking to the bureaucracy (Epstein and O’Halloran 1999; Huber and Shipan
2002) and how the civil service should be designed (Gailmard and Patty 2007). However,
they cannot be used to study open belief conflicts because, by construction, any belief con-
flict cannot be open – if the principal knew the agent’s beliefs, she could infer the information
that led to them, use it to revise her own beliefs, and eventually eliminate their disagreement.
In this paper we develop a principal-agent model of policymaking in a political organiza-
tion to study belief conflicts. At the heart of our analysis is a simple idea: what makes belief
conflicts different from goal conflicts is that beliefs can change systematically with learning,
while goals do not. In particular, individuals might be persuaded to revise their beliefs in
light of their experiences with actually implementing policies. When strategic political actors
anticipate this, it can influence their policy decisions in subtle and surprising ways.
In the model a principal repeatedly chooses policy, but is uncertain about which one
will most effectively achieve her goals. Her choices must be implemented by an agent, whose
effort increases the chance that an effective policy will succeed, but is wasted on an ineffective
policy. As in the existing literature, we assume the agent cannot be directly monitored or
controlled by the principal. In contrast to the existing literature, we assume that he shares the
principal’s goals, and is no better informed about how to achieve them. Instead, he disagrees
about which policy is most likely to achieve those goals in the sense of heterogeneous prior
beliefs (Morris 1995). Because this assumption is an important point of departure from the
existing literature, we briefly defer an in-depth discussion of its meaning and import.
The following running example illustrates the idea. Suppose a city is trying to combat
a wave of violent crime but city officials disagree about its underlying cause, and thus the
best policy response. The mayor (the principal) thinks the main cause is a concentration of
violent criminals, and wants to magnify the department’s efforts to solve violent crimes and
get perpetrators off the streets. The police chief (the agent) thinks that petty crime and
general disorder have caused the community to disengage, thereby encouraging all crime, i.e.,
he believes in “broken windows policing” (Wilson and Kelling 1982). He wants to refocus
the department’s energies on maintaining order by punishing and preventing misdemeanors.
An immediate insight that emerges from the model is that the mayor faces an agency
problem with her police chief, even though they have exactly the same goal of reducing
crime. The reason is that the police chief disagrees that “more of the same” will do the
job. Consequently, he will likely be demoralized, and work with less intensity, if denied the
chance to implement a new policy that he believes is more likely to be successful.
How should the principal manage her agent in the face of a belief conflict? In a private
firm she could impose her desired policy, but also use tools from classical principal-agent
theory, such as closer monitoring, a bonus for a good outcome, and a threat of termination
for a bad one. But these tools are often very costly, unavailable, or ineffective in political
organizations because of civil service protections and political considerations (Brehm and
Gates 1999; Lewis 2008). Instead, principals mainly motivate through their policy choices.
It turns out that the effect of those choices on the agent depends crucially on whether or not
the organization can learn by observing policy outcomes.
When learning is impossible, the principal’s policy choices in the face of a belief conflict
exhibit a type of “coping” behavior (Wilson 1989). Because neither inputs (the agent’s
effort) nor outputs (success or failure) can be observed, the principal must make decisions
that cope with disagreement without trying to resolve it. In the model, the principal copes
by deferring to the agent and selecting his desired policy whenever his beliefs are stronger
than her own, calculating that the agent’s greater motivation will outweigh the cost of an
inferior policy. Importantly, the principal does not defer because she thinks that the agent’s
beliefs reflect superior information, as in classical political agency models.
If the principal and agent can observe policy outcomes and learn from them, however,
the principal can do more than just cope. She can experiment with policies – try one out
but discard it for an alternative if it fails. She can also try to persuade the agent to agree
with her through her policy choices. The model with learning produces three main results.
First, we show that introducing the ability to learn by observing policy outcomes is
enough to increase the agent’s effort – even if the principal completely ignores those outcomes
in her decisions. With learning, the agent values his own effort more because it also makes
policy outcomes more informative about the efficacy of the organization’s policy. This helps
the agent decide how hard to work in the future. To understand this effect, consider two
possible approaches the police chief could take if allowed to implement “broken windows
policing”; he could implement the policy haphazardly, or be sure to put in extra time, effort,
and manpower. If broken windows fails to produce results in the former case, it might be
because it was inadequately implemented. But if it fails in the latter case, this is stronger
evidence that it is probably not that effective.
Second, we show that experimenting isn’t just good for finding better policies (Callander
2011) – it’s also better for motivating the agent. The reason is that experimenting ties
the organization’s future policies to the results of the agent’s effort, which motivates him
to work harder. If the mayor in the example says, “we’ll try broken windows today, but
do something different tomorrow if it doesn’t work,” this raises the stakes over its initial
success, and motivates the police chief to work harder to ensure that success. Surprisingly,
experimenting better motivates the agent even when it is with the policy he opposes; he never
“sabotages” that policy by withdrawing his effort to ensure failure and a policy change.
Finally, we show that the potential for learning causes the principal to defer more to the
agent. This occurs because learning makes it doubly important to motivate the agent: more
effort produces both a higher chance of success if the chosen policy is effective, and speeds
learning about whether that policy is effective.
In fact, learning makes the agent’s effort triply valuable because it also reduces future
belief conflicts (Smith and Stam 2004). This insight produces our most striking result: the
principal sometimes defers to the agent even when she is already sure that the policy he
favors is ineffective. In other words, the mayor might allow the police chief to implement
broken windows even expecting it to fail. The reason is that she expects failure despite the
chief’s high effort to effectively persuade him that his pet policy doesn’t work. This strategy
of “deferring to persuade” can’t occur in classical political agency models because the only
reason the principal defers is to exploit the agent’s superior expertise.
Summarizing, our analysis makes several contributions. First, it begins to identify the
differences between conflict over goals and conflict over beliefs in political organizations.
Second, it provides new insights about a variety of managerial techniques that have been
extensively discussed in both the political science and public administration literatures;
we show motivational benefits to measuring performance, encouraging experimentation, and
deferring to subordinates. Third, it generates new empirical implications about management
patterns in political organizations; after presenting the model we discuss how to test them
using data from “performance measurement” initiatives in the U.S. federal bureaucracy. We
also discuss how the model can shed further light on how Presidential appointments impact
bureaucratic morale and performance (Lewis 2008). Finally, our analysis contributes to a
growing literature on policy experimentation in political environments. When concluding we
discuss several avenues for future work across political science enabled by the model.
Belief Conflicts and Learning The interaction between belief conflicts, learning, and
policymaking has long been of interest in the public policy literature. Sabatier (1988) (build-
ing on seminal work by Heclo (1974)) placed elite “belief systems” about causal effects at
the heart of his influential theory about how “advocacy coalitions” generate policy change,
arguing that “much of the policy debate can be understood as disputes over the validity of ...
causal theories” (p. 157). This work spawned several decades of research on policy learning
and change (Weible, Sabatier, and McQueen 2009); Hall (1993), for example, studies how
high unemployment and stagnating growth in 1970s Britain discredited Keynesian policies
among politicians on both sides of the ideological spectrum who were “seeking solutions to
Britain’s economic problems.”
Only half of these ideas, however, have thus far been incorporated into formal political
science. Callander (2011) and Callander and Hummel (2014) study learning about the spatial
location of policies through experimentation, either by a single actor or two competing actors.
Volden, Ting, and Carpenter (2008) analyze how a group of actors with different ideologies –
such as states or municipalities – learn about the quality of policies through experimentation.
None of these studies considers how conflicting beliefs affect this process.
This gap in the literature is rooted in the methodological traditions of both political
science and economics. In political science, conflicts are customarily thought to result from
different economic interests (Bartels 2008) or political ideologies (Abramowitz and Saunders
2008). In economics and formal political science, there are many models with differences in
beliefs, but they are nearly always modeled as arising from differences in information. This
is referred to as the common priors assumption, because individuals are assumed to “begin
the game” in some hypothetical condition in which they share common beliefs about all the
relevant uncertainty in the world, and then each receive some “private information,” known
only to themselves, that causes those beliefs to diverge (Morris 1995). While there are many
arguments in the literature both for and against the common priors assumption,1 the key
1Briefly, the two main arguments in favor are a philosophical belief that two rational individuals who share all the same information ought to agree, and a practical concern that
practical limitation of such models is that they cannot be easily used to study open belief
conflicts. The reason is that knowledge of a belief conflict in a fully rational model with
common priors causes individuals to infer each other’s information, revise their own beliefs
in light of that information, and eventually come to agreement.2
Rather than discard rationality and introduce a host of additional complications, in this
paper we study belief conflicts simply and transparently by discarding the common priors
assumption. That is, we assume outright that the principal and the agent disagree because
they have heterogeneous prior beliefs about which policy is most likely to be effective for
achieving their goals. They are fully aware of each other’s beliefs,3 and do not infer anything
from just the fact that those beliefs differ. While this approach is unorthodox, it nevertheless
has a long history in formal modeling, beginning with Arrow (1964) and continuing through
to the present (Van den Steen 2010b; Minozzi 2013).
As in the present study, a growing subset of this literature studies how belief conflicts
interact with learning (Yildiz 2004; Che and Kartik 2009; Smith and Stam 2004; Van den
Steen 2010a). An important common insight is that individuals with belief conflicts think
they can persuade each other by taking actions that will produce more information, each
expecting it to “prove” that they were right. An example is Smith and Stam’s (2004)
pathbreaking study of bargaining and fighting between two states that disagree about their
relative strength. Because “the act of waging war reveals information about the relative
strengths of each side,” the two parties “fight to resolve their difference of opinions” about
who is stronger (p. 783). They continue to fight until enough information has been revealed,
and “each side’s beliefs converge sufficiently, that they can agree to a settlement” (p. 787).
a modeler can “predict anything” if free to choose beliefs. See Morris (1995) and Smith and Stam (2004) for extensive and accessible discussions.
2Specifically, Aumann (1976) showed that a group of individuals cannot have common knowledge of a belief disagreement if they have common priors. Relatedly, Geanakoplos and Polemarchakis (1982) show that two individuals with common priors who take turns truthfully exchanging beliefs must agree in finite time.
3Formally, they have common knowledge of heterogeneous prior beliefs; i.e., they know each other’s different beliefs, know that they know each other’s beliefs, etc.
Public Administration and Bureaucratic Politics While our model is general to many
types of political organizations, an important application is to the study of bureaucracy.
Our assumption that the agent shares the principal’s goals is supported by a vast litera-
ture in bureaucratic politics and public administration that studies the intrinsic motivation
of bureaucrats to carry out their tasks (Brehm and Gates 1999). Many sources for this mo-
tivation have been documented. The agent may like to serve the “public good,” and think
that achieving the principal’s goals would do so; this “public service motivation” (PSM) has
been studied extensively in public administration (Perry and Hondeghem 2008). The agent
may have been actively socialized to share those goals, as is common in “mission driven
organizations” (Kaufman 1967; DiIulio 1987; Goodsell 2011), or he may view his “role” as
serving his political superiors (Wilson 1989; Golden 2000). Theory and data also support
the proposition that intrinsically motivated individuals will sort into public organizations
(Besley and Ghatak 2006; Perry, Hondeghem, and Wise 2010), and once hired, be assigned
to carry out tasks they care about (Brehm and Gates 1999; Prendergast 2008).
Two key takeaways from the literature on intrinsic motivation are that it matters in
explaining how hard bureaucrats work (Perry, Hondeghem, and Wise 2010), and that direct
mechanisms of control such as monitoring, rewards, and sanctions matter much less, either
because they are unavailable (Lewis 2008) or because they are ineffective (Brehm and Gates 1999; Durant
et al. 2006). We contribute to this literature by separating the agent’s motivation to achieve
the principal’s goals from his potentially differing beliefs about how to do so (Boardman
and Sundquist 2008). Such disagreements are well documented by case studies of federal
management and surveys of political elites (Heclo 1977; Sabatier and Hunter 1989; Golden
2000). This separation immediately yields an underappreciated insight: intrinsic motivation
alone doesn’t “solve” agency problems, because it does a principal little good to have an agent
who shares her goals but disagrees strongly about how to achieve them. We can then derive
new results about how an intrinsically motivated agent is influenced by his environment –
specifically, the presence of “performance information” about policy outcomes (Moynihan
2008) and the propensity of managers and political principals to experiment and defer.
Given the need to explore our new assumptions about belief conflicts, we do not directly
consider how external political factors (such as elections or separation of powers) create or
influence those conflicts. Nevertheless, our study is motivated by the broader perspective that
“to understand issues of control and oversight over an organization ... one must understand
issues of compliance within an organization” (Brehm and Gates 1999, p. 2). This perspective
is shared by many recent works in bureaucratic politics; Gailmard and Patty (2007) and
Lewis (2008) both study how decisions by politicians affect bureaucratic performance,
while Huber and McCarty (2004) consider the reverse. While we mostly analyze a single
principal and agent, we also begin to move toward this larger goal by considering a variant
in which the principal is appointed by another player, and discuss how it sheds light on the
impact of Presidential appointments on bureaucratic performance (Lewis 2008).
2 Model
The model is a game of repeated policy choice and implementation played between a principal
and an agent. They seek to achieve a shared organizational goal in each of two periods
t ∈ {1, 2}. For simplicity, we assume that the outcome in each period yt is either success
(yt = 1) or failure (yt = 0).
Player 1 is the principal; in each period she publicly chooses a course of action or policy
xt from the set X = {a, b}. This can be thought of as her choice about how the organization
will pursue a goal. Player 2 is the agent; in each period he implements the principal’s policy
with unobservable effort et ∈ [0, 1]. (As in Bueno De Mesquita and Stephenson (2007), the
agent chooses how much effort to put into implementing the policy, rather than the policy
itself.) Effort is costly for the agent, and this cost takes the form $-\frac{(e_t)^2}{2\lambda}$, where $\lambda$ is bounded
between $\underline{\lambda}$ and $\bar{\lambda}$.4 The value of λ determines the agent’s willingness to expend costly effort
4 $\bar{\lambda} \approx .68466$ is necessary and sufficient to ensure that the agent’s initial effort is bounded in [0, 1] and is characterized in the proof of Proposition 1. Otherwise exerting effort is
to achieve the organization’s goals and thus captures the strength of his intrinsic motivation.
For simplicity we assume that the players place equal weight on payoffs in each period.
The players’ two period payoffs are thus:
\[
\text{(Principal)}\quad y_1 + y_2
\qquad\qquad
\text{(Agent)}\quad \left(y_1 - \frac{(e_1)^2}{2\lambda}\right) + \left(y_2 - \frac{(e_2)^2}{2\lambda}\right)
\]
These assumptions best capture environments where goals must be achieved repeatedly over
time, such as minimizing the level of a pollutant or reducing crime. However, our insights
about how disagreement and learning affect policy choices clearly extend to settings where
the future is more important than the present, such as a trial or pilot project.
How Successes Occur Intuitively, the success or failure of a policy depends both on
whether it is fundamentally well-suited to achieve the intended goal, and on how well it is
implemented. To capture these intuitions, we assume that “nature” initially draws a state
ω ∈ {a, b} determining which of the two policies is the “correct” one to achieve the goal.
In both periods, the probability that the correct policy (xt = ω) succeeds is equal to the
agent’s effort et, while the incorrect policy (xt ≠ ω) always fails. Thus, more effort produces
a higher chance of success, but only when exerted on the correct policy. Our assumptions
are standard in the economics literature on learning through experimentation (Bergemann
and Hege 2005; Keller, Rady, and Cripps 2005); their main property is that choosing well
and exerting effort are complementary to achieving success, i.e., both are necessary inputs.
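The outcome technology just described can be written down in a few lines. The following is a minimal sketch in Python, not part of the formal analysis; the function names `success_probability` and `draw_outcome` are hypothetical labels introduced here for illustration.

```python
import random


def success_probability(policy: str, state: str, effort: float) -> float:
    """Probability that the chosen policy succeeds in a single period."""
    if not 0.0 <= effort <= 1.0:
        raise ValueError("effort must lie in [0, 1]")
    # The correct policy succeeds with probability equal to the agent's effort;
    # the incorrect policy always fails, so effort on it is wasted.
    return effort if policy == state else 0.0


def draw_outcome(policy: str, state: str, effort: float, rng: random.Random) -> int:
    """Draw the period outcome y_t in {0, 1} (hypothetical simulation helper)."""
    return 1 if rng.random() < success_probability(policy, state, effort) else 0
```

Because the success probability is zero whenever the policy does not match the state, choosing well and exerting effort enter as complements: neither input alone can produce a success.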
A policy problem that might take this form is identifying the root cause of severe smog in
a municipality; city bureaucrats could be uncertain about whether the smog is predominantly
caused by emissions from refineries or emissions from automobiles. To improve air quality
so cheap that the agent chooses e∗ = 1 in the first period, learns which policy is correct, and always achieves success in the second period. $\underline{\lambda} \approx .23505$ is only relevant for part of Proposition 3 – it ensures that the effect of the agent’s beliefs on effort is not trivial.
they must both correctly identify the root cause, and expend effort effectively regulating it.5
Modeling Disagreement To model disagreement we assume that the players have het-
erogeneous prior beliefs about the probability distribution over ω. Specifically, each player
i has their own prior θi that ω = a, i.e., that a is the correct policy, and by implication
their own prior 1 − θi that ω = b. The principal initially believes that a is more
likely to be correct ($\theta_1 \geq \frac{1}{2}$), while the agent believes that b is more likely to be correct
($\theta_2 \leq \frac{1}{2} \iff 1 - \theta_2 \geq \frac{1}{2}$). They know each other’s prior beliefs (in the sense of common
knowledge) and fully understand the nature of their disagreement. In the smog example, the
principal could be a city manager who believes that existing evidence points to refineries as
the root cause, while the agent is a city bureaucrat charged with implementing the policy
who believes that automobiles are the more likely cause.
Observing Outcomes We solve and compare two variants of the model. In the first vari-
ant policy outcomes are never observed, and disagreement persists unchanged throughout
the game. We call this the No Learning Benchmark because it captures policymaking envi-
ronments where outcomes cannot be reliably measured, and/or circumstances are changing
too rapidly, for learning to occur. The second variant is the Game with Learning. In it, both
players observe whether the first period policy succeeded or failed (i.e., the value of y1) prior
to making their decisions in the second period. This allows them to learn from the initial
outcome before making their subsequent policy decisions.
3 Coping with Disagreement
We first solve the No Learning Benchmark, in which policy outcomes can’t be observed.
Since choices and outcomes in the first period have no effect on beliefs in the second, the
principal must choose policies that cope with disagreement without trying to resolve it.
5The city of Los Angeles faced this uncertainty in the 1960s (Krier 1977).
Each period is identical and so can be solved in isolation. We first consider the agent’s
choices. For the organization to succeed the agent must exert high effort, but that effort
only “makes a difference” if the principal has chosen the correct policy. Formally, the agent’s
expected payoff to exerting effort et to implement policy xt = a is equal to $\theta_2 e_t - \frac{(e_t)^2}{2\lambda}$, where
θ2 is his subjective prior belief that a is the correct policy.6 (The expression is identical for
xt = b, but substituting in the agent’s belief 1− θ2 that b is the correct policy.) His optimal
responses to the principal’s policy choices are thus as follows.
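Observation 1. The agent’s effort $e^*(x_t)$ and utility $U_2^*(x_t)$ for each policy $x_t \in \{a, b\}$ are:
\[
e^*(x_t) =
\begin{cases}
\lambda\theta_2 & \text{if } x_t = a \\
\lambda(1-\theta_2) & \text{if } x_t = b
\end{cases}
\qquad\qquad
U_2^*(x_t) =
\begin{cases}
\frac{\lambda}{2}(\theta_2)^2 & \text{if } x_t = a \\
\frac{\lambda}{2}(1-\theta_2)^2 & \text{if } x_t = b
\end{cases}
\]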
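The expressions in Observation 1 follow from a single first-order condition. As a sketch, let $q$ denote the agent's belief that the chosen policy is correct; $q$ is shorthand introduced here for convenience, with $q = \theta_2$ if $x_t = a$ and $q = 1 - \theta_2$ if $x_t = b$. The objective is strictly concave in effort, so the first-order condition characterizes the optimum whenever $\lambda q \leq 1$:

```latex
\begin{align*}
\max_{e_t \in [0,1]} \; q\, e_t - \frac{(e_t)^2}{2\lambda}
  \quad &\Longrightarrow \quad q - \frac{e_t}{\lambda} = 0
  \quad \Longrightarrow \quad e^*(x_t) = \lambda q, \\
U_2^*(x_t) &= q\,(\lambda q) - \frac{(\lambda q)^2}{2\lambda}
  = \lambda q^2 - \frac{\lambda q^2}{2}
  = \frac{\lambda}{2}\, q^2 .
\end{align*}
```

Substituting the two values of $q$ recovers the two cases for effort and for utility.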
The left panel of Figure 1 depicts how much effort the agent would exert on each policy as a
function of his beliefs. On all figures, the x-axis illustrates the strength of the agent’s belief
1 − θ2 that his desired policy b is correct; moving right along the x-axis causes the agent’s
beliefs to diverge from those of the principal, and θ2 to decrease.
Figure 1 illustrates that the agent sharing the principal’s goals is alone not enough to
ensure that he works hard to achieve them (Brehm and Gates 1999). While higher intrinsic
motivation λ does induce more effort in our model (Besley and Ghatak 2006), high effort
also requires that the agent believe the principal chose correctly. The reason is that he has
no interest in wasting his effort on an ineffective policy.
The agent’s disagreement about the right policy thus creates a managerial dilemma. If the
principal imposes the policy she believes in (a), the agent will be less motivated, thinking his
effort would be wasted. If she instead defers to the agent by choosing the policy he believes
in (b), then this will better motivate him – but from the principal’s perspective, there are
better than even odds that this extra motivation is useless. Either resolution entails a cost,
either in terms of choosing well or in terms of motivating the agent (Van den Steen 2009).
6“Subjective” because the agent uses his own prior θ2 when evaluating expected utility.
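Figure 1: Effort and Policy Choice in the No Learning Benchmark
[Left panel, “Agent’s optimal effort”: effort on a and effort on b (vertical axis, marked at λ/2 and λ) against the agent’s beliefs in favor of b (horizontal axis, from θ2 = 1/2 to θ2 = 0). Right panel, “No Learning Benchmark”: the principal’s beliefs (vertical axis, from θ1 = 1/2 to θ1 = 1) against the agent’s beliefs in favor of b (horizontal axis, from θ2 = 1/2 to θ2 = 0); the line θ1 = 1 − θ2 separates the region “Principal defers and chooses b” from the region “Principal imposes a”.]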
The following observation characterizes how the principal copes with this disagreement.
Observation 2. In the unique equilibrium of the No Learning Benchmark, the principal
defers to the agent and chooses policy x∗ = b in both periods if and only if the agent’s belief
that b is correct is stronger than her own belief that a is correct, i.e. 1− θ2 ≥ θ1. Otherwise,
she imposes policy x∗ = a in each period.
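The comparison behind Observation 2 can be checked numerically. The sketch below is illustrative only: it assumes the principal evaluates each policy by her own belief that it is correct multiplied by the agent's effort from Observation 1, and the function names are hypothetical.

```python
def agent_effort(policy: str, theta2: float, lam: float) -> float:
    """Agent's optimal effort from Observation 1."""
    return lam * theta2 if policy == "a" else lam * (1.0 - theta2)


def principal_payoff(policy: str, theta1: float, theta2: float, lam: float) -> float:
    """Principal's subjective per-period probability of success."""
    belief = theta1 if policy == "a" else 1.0 - theta1
    return belief * agent_effort(policy, theta2, lam)


def principal_choice(theta1: float, theta2: float, lam: float) -> str:
    """Per-period policy choice; ties go to b, matching the weak inequality.

    Exact ties may be affected by floating-point rounding.
    """
    pay_a = principal_payoff("a", theta1, theta2, lam)
    pay_b = principal_payoff("b", theta1, theta2, lam)
    return "b" if pay_b >= pay_a else "a"
```

Under these assumptions the payoff difference between b and a is λ[(1 − θ1)(1 − θ2) − θ1θ2] = λ(1 − θ1 − θ2), which is nonnegative exactly when 1 − θ2 ≥ θ1. For instance, `principal_choice(0.6, 0.2, 0.5)` returns `"b"`, while `principal_choice(0.9, 0.4, 0.5)` returns `"a"`.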
The right panel of Figure 1 shows the principal’s policy choice x∗ as a function of the players’
beliefs. The y-axis illustrates the strength of the principal’s beliefs θ1 that her desired policy
a is correct; moving up along it causes the principal’s beliefs to diverge from the agent’s.
The principal thus defers to the agent whenever the agent’s beliefs in favor of policy b
are sufficiently strong. But the rationale is very different than in classic political agency
models, in which deference is used to exploit the agent’s superior expertise and improves
policy outcomes despite his differing preferences (Moe 2012). In our model, the agent has
the same preferences as the principal but differing beliefs, and the principal does not ascribe
the difference to superior expertise. She defers simply because she has no other good option.
She needs the agent to work, can’t force him to do so, and concludes that more effort on
an inferior policy is a better bet than less effort on a superior policy. Disagreement and
deference are thus pure cost, in that the principal would prefer if she could just choose a,
and the agent agreed that it was right (i.e., θ1 · 1 > max {θ1θ2, (1− θ1) (1− θ2)} for all θ2).
4 The Game with Learning
We now consider the Game with Learning, in which the actors can observe and learn from
policy outcomes. We first analyze how this learning will occur.
Learning by Doing When the players can observe whether the initial policy succeeded
or failed before making decisions in the second period, they will update their beliefs about
which policy is the right one using Bayes’ rule. Exactly how this updating will occur depends
crucially on how hard the agent worked; failure is a more informative signal that the initial
policy was incorrect if it was implemented with high effort. To see this intuitively, recall
the pollution example. Suppose that regulations are imposed on refineries, but the effects
on smog are minimal. This failure provides indirect evidence that automobiles are actually
the root cause, but that indirect evidence is stronger if the bureaucrat vigorously enforced
regulations on refineries than if he weakly enforced them.
To see this mathematically, suppose the principal initially chooses policy x1 = a, and the
agent implements it with effort e1. If a succeeds (y1 = 1), then updating beliefs is simple.
Since only the correct policy can succeed under our assumptions, both the principal and
the agent agree after success that a is definitely the correct policy. If a fails, however,
it could either be because it was incorrect, or because it was inadequately implemented.
Each player i then computes a posterior belief that a failed despite being correct equal
to $h(e_1, \theta_i) = \frac{\theta_i(1-e_1)}{\theta_i(1-e_1)+(1-\theta_i)} < \theta_i$. This is player i’s prior assessment θi (1− e1) of the
probability a was correct but still failed, divided by the unconditional probability of failure.
This posterior is lower than the prior θi, so failure is always a negative signal. However, how
much lower depends on how much effort e1 the agent put in: the harder he worked, the more
both players ascribe failure to the policy being incorrect rather than poorly implemented.
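A short numerical sketch makes the role of effort in this updating concrete. The function name `posterior_after_failure` is a hypothetical label for the posterior h defined above.

```python
def posterior_after_failure(e1: float, theta: float) -> float:
    """Posterior h(e1, theta) that the failed policy was nonetheless correct."""
    failed_but_correct = theta * (1.0 - e1)  # prior times chance of failing anyway
    prob_failure = failed_but_correct + (1.0 - theta)  # unconditional failure probability
    if prob_failure == 0.0:
        raise ValueError("failure has zero probability when theta = 1 and e1 = 1")
    return failed_but_correct / prob_failure
```

With a prior of 0.6, a failure after low effort (e1 = 0.2) leaves the posterior at roughly 0.55, while a failure after high effort (e1 = 0.8) pushes it down to roughly 0.23. With zero effort the posterior equals the prior, since failure is then uninformative.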
Principal’s Strategies When policy outcomes can be observed, the strategies available to
the principal also become more complicated, because she can base the second period policy
on the first period outcome. In theory this allows for even “pathological” strategies such as
switching from the initial policy only if it succeeds. However, it turns out that such strategies
will never be used in equilibrium; the reason is that success persuades both players that the
initial policy was correct. We can thus restrict attention to two types of strategies: ones
where the principal rigidly implements the same policy in both periods regardless of the
outcome, and ones where she experiments with the initial policy, i.e., sticks with the policy
if it succeeds, but abandons the policy if it fails.7
Observation 3. A pure strategy for the principal in the Game with Learning consists of a
first period policy x1, and a second period policy x2 (x1, y1) for each first period policy x1 and
outcome y1. In a pure strategy equilibrium, on each possible policy x1 the principal either
• rigidly implements x1, i.e., chooses it in both periods regardless of the first period
outcome (x2 (x1, y1 = 1) = x2 (x1, y1 = 0) = x1).
• experiments with x1, i.e., chooses it again in the second period if it succeeds (x2 (x1, y1 = 1) = x1)
but switches to the alternative if it fails (x2 (x1, y1 = 0) ≠ x1).
Note that the principal can’t directly base her decision to experiment on how hard the agent
worked because his effort is unobservable. Instead, in equilibrium the principal’s strategy
must be optimal given her equilibrium beliefs about how hard the agent worked.8
7 For some parameters the model exhibits both multiple and mixed strategy equilibria. Since our focus is on what the principal can achieve managing the agent, we consider only the equilibrium that is best for the principal, which is always in pure strategies.
8 Formally, the principal’s second period choices x2(x1, y1) after choosing policy x1 are not a function of the agent’s initial effort e1(x1) on that policy. Instead, experimenting with a is optimal in equilibrium iff h (e1 (a) , θ1) ≤ 1 − h (e1 (a) , θ2), i.e., if after failure the principal’s posterior that a was correct is below the agent’s posterior that b is correct.
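The condition in footnote 8 can be checked directly once the anticipated effort is fixed. The sketch below is illustrative only: the function names and the numerical beliefs are assumptions, and the effort level is treated as a given number rather than solved for in equilibrium.

```python
def posterior_after_failure(effort: float, prior: float) -> float:
    """h(e1, theta): belief that a is correct after a fails."""
    numerator = prior * (1.0 - effort)
    return numerator / (numerator + (1.0 - prior))


def abandoning_a_is_optimal(effort_on_a: float, theta_principal: float, theta_agent: float) -> bool:
    """Footnote 8 condition: h(e1(a), theta_1) <= 1 - h(e1(a), theta_2)."""
    principal_belief_in_a = posterior_after_failure(effort_on_a, theta_principal)
    agent_belief_in_b = 1.0 - posterior_after_failure(effort_on_a, theta_agent)
    return principal_belief_in_a <= agent_belief_in_b


# Illustrative beliefs (assumed): the agent is skeptical of a (theta_2 = 0.3).
print(abandoning_a_is_optimal(0.5, theta_principal=0.90, theta_agent=0.3))
print(abandoning_a_is_optimal(0.5, theta_principal=0.95, theta_agent=0.3))
```

With these assumed numbers the first call satisfies the condition and the second does not, which mirrors the point that a principal with very strong beliefs in a cannot credibly abandon it after failure.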
5 Learning, Experimentation, and Deference
We now present results from the Game with Learning, focusing on three questions. First,
how does the presence of “performance information” – that is, the ability to observe policy
outcomes – directly affect the agent’s incentives? Second, how does the agent respond when
the principal also uses that information in her decisions by experimenting with policies?
Finally, when will the principal defer to the agent, and why? To clarify the exposition,
mathematical details are relegated to a main Appendix (in-print) and a supplemental online
Appendix. The main Appendix contains proofs of Propositions 1 and 2, while the online
Appendix contains the general equilibrium characterization and remaining proofs.
5.1 Learning
A principal who rigidly implements policies ignores “performance information” in her deci-
sions. Thus, by comparing the agent’s effort in this case to his effort in the No Learning
Benchmark, we can isolate how that information directly affects the agent. The reason there
is any effect is two-fold. First, the agent would like to learn whether the initial policy was
incorrect in order to avoid wasting his costly effort in the future. Second, he can influence
that learning through his initial choice of implementation effort. To see how the agent will
choose that effort we analyze his optimization problem, working backwards.
Suppose the principal’s strategy is to rigidly implement policy a (i.e. x1 = a, x2 (a, 1) =
x2 (a, 0) = a).9 Then the agent anticipates that his future effort on a will be based on his
posterior belief that a was correct after observing its outcome, i.e., e2 = λh (e1, θ2) after failure (and e2 = λ after success). His
two-period expected utility as a function of his first-period effort e1 is then:
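$$
-\frac{(e_1)^2}{2\lambda}
+ \theta_2\left( e_1\left(1+\frac{\lambda}{2}\right) + (1-e_1)\left(\lambda h(e_1,\theta_2) - \frac{\lambda h^2(e_1,\theta_2)}{2}\right)\right)
+ (1-\theta_2)\left(-\frac{\lambda h^2(e_1,\theta_2)}{2}\right). \tag{1}
$$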
The first term is his first-period cost of effort. The second term is his two-period expected
utility if a is correct, which he thinks is true with probability θ2. With probability e1
policy a succeeds, he gets 1 in the first period, the principal sticks with a, and he gets λ/2
in the second period. With probability 1 − e1, a fails despite being correct, he mistakenly
revises his posterior belief downward to h (e1, θ2), the principal sticks with a, and he gets
λh (e1, θ2) − λh²(e1, θ2)/2 in the second period. The third and final term is his two period
expected utility if a is incorrect, which he thinks is true with probability 1 − θ2; a will fail
for sure in the first period, the principal will stay with it, it will again fail for sure in the
second period, and he pays the cost of his wasted effort λh (e1, θ2).
9 The analysis is identical for x1 = b but substituting in 1 − θ2 for θ2.
Equation 1 is complicated, but a little algebraic manipulation yields a simpler expression:
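$$
\underbrace{\left(\theta_2 e_1 - \frac{(e_1)^2}{2\lambda}\right)}_{\text{first period payoff}}
+ \underbrace{\frac{\lambda}{2}\theta_2\Big(e_1 + (1-e_1)\,h(e_1,\theta_2)\Big)}_{\text{second period payoff}}. \tag{2}
$$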
The first term is the agent’s first-period expected payoff as a function of his initial effort e1,
and is identical to his per-period payoff in the No Learning Benchmark. The second term
is his expected future payoff as a function of his initial effort. The key insight is that this
term is increasing in effort e1. The reason is that more initial effort makes the first period
policy outcome more informative, resulting in more learning. This benefits the agent in the
second period by allowing him to better calibrate his effort, i.e., to work hard if the policy
is the correct one and avoid wasting effort if it is not. Formally, e1 + (1− e1)h (e1, θ2) is the
agent’s expected posterior belief that a is correct when it is actually correct. This expression
captures how accurate the agent’s future beliefs will be after observing the initial outcome.
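The step from Equation 1 to Equation 2 can be sanity-checked numerically. The sketch below codes both expressions as written and compares them on a small grid; the function names are illustrative, and λ = 0.653 is borrowed from the value the paper uses for Figure 3 purely as a convenient test point.

```python
def posterior_after_failure(e1: float, theta: float) -> float:
    """h(e1, theta): belief that a is correct after a fails."""
    numerator = theta * (1.0 - e1)
    return numerator / (numerator + (1.0 - theta))


def equation_1(e1: float, theta2: float, lam: float) -> float:
    h = posterior_after_failure(e1, theta2)
    cost = -(e1 ** 2) / (2 * lam)
    if_correct = theta2 * (e1 * (1 + lam / 2) + (1 - e1) * (lam * h - lam * h ** 2 / 2))
    if_incorrect = (1 - theta2) * (-lam * h ** 2 / 2)
    return cost + if_correct + if_incorrect


def learning_term(e1: float, theta2: float) -> float:
    """Expected posterior that a is correct when it is in fact correct."""
    return e1 + (1 - e1) * posterior_after_failure(e1, theta2)


def equation_2(e1: float, theta2: float, lam: float) -> float:
    first_period = theta2 * e1 - (e1 ** 2) / (2 * lam)
    second_period = (lam / 2) * theta2 * learning_term(e1, theta2)
    return first_period + second_period


LAM = 0.653  # value reported for Figure 3; used here only as a test point
for e1 in (0.0, 0.2, 0.5, 0.9):
    gap = equation_1(e1, 0.3, LAM) - equation_2(e1, 0.3, LAM)
    print(f"e1 = {e1:.1f}: gap = {gap:.2e}, learning term = {learning_term(e1, 0.3):.3f}")
```

The gap between the two expressions should be zero up to floating-point error, and the learning term should rise with e1, consistent with the claim that the second-period payoff is increasing in initial effort.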
The presence of a “learning premium” to first-period effort in the Game with Learning
produces the following result, which is proved in the main Appendix.
Proposition 1. When the principal rigidly implements some policy x ∈ {a, b}, observing
outcomes strictly increases both the agent’s first period effort, and the probability of success
in both periods, relative to the No Learning Benchmark.
The left panel of Figure 2 illustrates how introducing the ability to observe outcomes and
learn from them better motivates the agent in the first period.
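Proposition 1 can also be illustrated with a brute-force search. The sketch below maximizes Equation 2 over a grid of effort levels and compares the result with the effort that maximizes the first-period term alone, λθ2, which is taken here to be the No Learning Benchmark effort. The grid search, the function names, and the belief θ2 = 0.3 are assumptions made for illustration; this is not the paper's proof.

```python
def posterior_after_failure(e1: float, theta: float) -> float:
    """h(e1, theta): belief that the policy is correct after it fails."""
    numerator = theta * (1.0 - e1)
    return numerator / (numerator + (1.0 - theta))


def rigid_payoff(e1: float, theta2: float, lam: float) -> float:
    """Equation 2: the agent's payoff when the principal rigidly implements a."""
    h = posterior_after_failure(e1, theta2)
    first_period = theta2 * e1 - (e1 ** 2) / (2 * lam)
    second_period = (lam / 2) * theta2 * (e1 + (1 - e1) * h)
    return first_period + second_period


def best_effort(payoff, theta2: float, lam: float, steps: int = 10_000) -> float:
    """Grid search for the payoff-maximizing effort on [0, 1]."""
    grid = (i / steps for i in range(steps + 1))
    return max(grid, key=lambda e1: payoff(e1, theta2, lam))


lam, theta2 = 0.653, 0.3            # lam from Figure 3; theta2 is illustrative
benchmark_effort = lam * theta2     # maximizer of the first-period term alone
learning_effort = best_effort(rigid_payoff, theta2, lam)
print(f"no learning: {benchmark_effort:.4f}, observing outcomes: {learning_effort:.4f}")
```

Under these assumed parameters the effort with observable outcomes exceeds the benchmark effort, which is the "learning premium" at work.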
Figure 2: Effect of Observing Outcomes and Experimenting on the Agent’s Effort
[Figure omitted. Both panels plot the agent’s optimal effort with observable outcomes (vertical axis, marked at λ/2 and λ) against the agent’s beliefs in favor of b (horizontal axis, from θ2 = 1/2 to θ2 = 0). Left panel: rigidly implements a and rigidly implements b, each with observed outcomes and with no learning. Right panel: experiments with a, experiments with b, rigidly implements a, and rigidly implements b.]
Discussion One of the most important developments in public management over the last
several decades has been the push for “performance information” by governments across the
globe. Reform proponents have drawn on the simple logic of classical principal-agent theory
in economics: if we “better assess what government does and how well it does it ... it will
be easier to hold public administrators accountable for their performance” (Kettl 2005).
Research in bureaucracy and public administration, however, provides little evidence that
performance information can be effectively used this way. Public managers lack the freedom
to promote accountability with high-powered incentives, and politicians seem to choose not
to (Pollitt 2006). Bureaucrats appear to respond mainly to intrinsic rewards, which can
be “crowded out” by extrinsic ones (Georgellis, Iossa, and Tabvuma 2011). And manipu-
lating extrinsic rewards generates many unintended consequences. Bureaucrats inefficiently
reallocate effort from unmeasured goals to measured ones (Bevan and Hood 2006) – termed
the “multi-task problem” in principal-agent theory (Holmstrom and Milgrom 1991) – and
manipulate the performance information (Milgrom and Roberts 1988; Heinrich 2007).
Our model captures the more empirically supported idea that “the most likely users [of
performance measures] appear to be the agents themselves” (Moynihan, Pandey, and Wright
2012). In our model, an intrinsically motivated agent uses what he learns from outcomes to
allocate effort more efficiently, by working harder when he learns that the policy is likely to be
effective, and working less when he learns that it is not. The principal thus benefits from the
information even when she ignores it in her own decisionmaking; the agent’s greater efficiency
increases the chance of success in both periods relative to the No Learning Benchmark.
In capturing this rationale for performance information, the model also generates a new
and empirically testable implication – that introducing such information better motivates
the agent. The reason is that the harder the agent works to implement a policy, the more
its outcome reflects its underlying quality, and the more the agent learns from observing it.
Ergo, the more valuable is his effort, and the more effort he puts in.
5.2 Experimentation
A principal who experiments with policies is one whose decisions are directly informed by
policy outcomes – she sticks with the initial policy if it succeeds, but abandons it for the
alternative if it fails. The information contained in policy outcomes therefore directly aids
her “search” for the best policy (Callander 2011).
Experimentation, however, also has a more subtle and previously overlooked property:
it indirectly empowers the agent to influence the principal’s policy choices. The reason is
that by working hard he can increase the chance of a success, and therefore the chance that
the principal will stick with the initial policy. Alternatively, by withdrawing his effort from
the initial policy he can “sabotage” it – that is, intentionally increase its chance of failure –
which will cause the principal to abandon it (Brehm and Gates 1999).10
10 Note that these authors reserve the term “sabotage” for expending active effort to induce failure and call withdrawing effort “dissent shirking.” There is no need for active sabotage in our model due to the simplifying assumption that failure can be ensured with shirking; relaxing this would yield active sabotage when it now yields “dissent shirking.”
How will the agent respond when the principal experiments with a policy, rather than
rigidly implements it? Suppose the principal experiments with policy a (i.e. x1 = a,
x2 (a, 1) = a and x2 (a, 0) = b). Analyzing the agent’s two-period expected utility as a
function of his first period effort e1 yields an expression that is analogous to Equation 2
(details are in the main Appendix):
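$$
\underbrace{\left(-\frac{(e_1)^2}{2\lambda} + \theta_2 e_1\right)}_{\text{first period payoff}}
+ \underbrace{\underbrace{\frac{\lambda}{2}(1-\theta_2)\Big(1-h(e_1,\theta_2)\Big)}_{\text{learning term}}
+ \underbrace{\frac{\lambda}{2}\theta_2 e_1}_{\text{policy influence term}}}_{\text{second period payoff}}. \tag{3}
$$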
As in Equation 2, the expression is divided into first and second-period terms, and the
first-period terms are identical. But because the principal experiments, the second period term
has two subterms: a learning term that represents how the agent’s initial effort affects his
future utility directly through his own learning, and a policy influence term that represents
how his initial effort affects his future utility indirectly through the principal’s policy choices.
The learning term has properties that are similar to the agent’s expected second-period
utility when the principal rigidly implements a.11 The policy influence term (λ/2)θ2e1, however,
only appears when the principal experiments. The key insight is that this term is always
increasing in effort. Intuitively, this means that from the agent’s perspective, working harder
initially always has a beneficial effect on the principal’s future choices. Experimenting thus
incentivizes the agent to work harder initially than does rigid implementation. This result is
perhaps not so surprising for the policy that the agent initially believes in (b), since failure
will cause a switch to the policy he opposes (a). But it also holds for the policy he opposes
(a), even though simply withdrawing his effort would induce certain failure and a switch to
the policy he believes in (b). We state the formal result in the following proposition.
11 The key expression is 1 − h (e1, θ2), which is the expected value of the agent’s posterior that a is wrong when it is actually wrong. This captures how accurate the agent’s future beliefs will be after observing the first period outcome, and it is increasing in e1.
Proposition 2. For any initial beliefs of the agent, the principal elicits more first period
effort by experimenting with either policy x1 ∈ {a, b} than by rigidly implementing it.
The right panel of Figure 2 illustrates the effect by comparing the agent’s first period effort
on each policy when the principal experiments with it vs. rigidly implements it.
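The comparison in Proposition 2 can be illustrated by maximizing Equations 2 and 3 on a grid. In the sketch below, `belief` is the agent's prior that the policy being tried is correct: 0.3 stands in for a policy he doubts and 0.7 for one he favors. Treating the favored policy by swapping in 1 − θ2 follows footnote 9 for rigid implementation and is assumed here to carry over to experimentation. All names and numbers are illustrative.

```python
def posterior_after_failure(e1: float, belief: float) -> float:
    """h(e1, belief): belief that the tried policy is correct after it fails."""
    numerator = belief * (1.0 - e1)
    return numerator / (numerator + (1.0 - belief))


def rigid_payoff(e1: float, belief: float, lam: float) -> float:
    """Equation 2: the principal keeps the policy whatever the outcome."""
    h = posterior_after_failure(e1, belief)
    return belief * e1 - e1 ** 2 / (2 * lam) + (lam / 2) * belief * (e1 + (1 - e1) * h)


def experiment_payoff(e1: float, belief: float, lam: float) -> float:
    """Equation 3: the principal abandons the policy after failure."""
    h = posterior_after_failure(e1, belief)
    learning = (lam / 2) * (1 - belief) * (1 - h)
    policy_influence = (lam / 2) * belief * e1
    return belief * e1 - e1 ** 2 / (2 * lam) + learning + policy_influence


def best_effort(payoff, belief: float, lam: float, steps: int = 10_000) -> float:
    """Grid search for the payoff-maximizing effort on [0, 1]."""
    grid = (i / steps for i in range(steps + 1))
    return max(grid, key=lambda e1: payoff(e1, belief, lam))


LAM = 0.653  # value reported for Figure 3
efforts = {
    belief: (best_effort(rigid_payoff, belief, LAM), best_effort(experiment_payoff, belief, LAM))
    for belief in (0.3, 0.7)
}
for belief, (rigid, experiment) in efforts.items():
    print(f"belief {belief}: rigid = {rigid:.4f}, experiment = {experiment:.4f}")
```

In both assumed cases the effort under experimentation is higher, including for the policy the agent doubts, in line with the proposition.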
Preferences versus Beliefs Why does experimenting better motivate the agent, even
when it is with a policy he opposes? To clarify, it is helpful to contrast with a version of the
model where the agent’s opposition to policy a is rooted in preferences rather than beliefs.12
Suppose the agent agreed about the probability θ1 that a was the best policy to achieve
the principal’s goals but simply had his own different goals; specifically, his own intrinsic
values πa < πb for exerting effort on each policy, with a preference for b. The agent’s
two-period expected utility for implementing policy x1 = a would then be:
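$$
\underbrace{\left(-\frac{(e_1)^2}{2\lambda} + \pi_a e_1\right)}_{\text{first period payoff}}
+ \begin{cases}
\dfrac{\lambda}{2}\pi_a^2 & \text{if principal rigid} \\[2ex]
\dfrac{\lambda}{2}\pi_b^2 - \underbrace{\dfrac{\lambda}{2}\theta_1 e_1\left(\pi_b^2 - \pi_a^2\right)}_{\text{policy influence term}} & \text{if principal experiments}
\end{cases} \tag{4}
$$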
As in the baseline model, experimenting introduces a “policy influence” term into the agent’s
first period objective function. But in contrast, that term would actually be decreasing in
effort. In other words, when the agent’s opposition to policy a is rooted in preferences rather
than beliefs, he would “sabotage” a policy experiment with a by withdrawing some of his
effort to increase the chance of failure and a policy change.
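The contrast with the belief-based model shows up clearly if Equation 4 is maximized numerically. The sketch below does so for both branches; the intrinsic values πa = 0.4 and πb = 0.8, the belief θ1 = 0.8, and the function names are assumptions picked for illustration.

```python
def preference_payoff(e1: float, pi_a: float, pi_b: float, theta1: float,
                      lam: float, experiments: bool) -> float:
    """Equation 4: agent's payoff when his opposition to a is rooted in preferences."""
    first_period = -(e1 ** 2) / (2 * lam) + pi_a * e1
    if experiments:
        policy_influence = (lam / 2) * theta1 * e1 * (pi_b ** 2 - pi_a ** 2)
        second_period = (lam / 2) * pi_b ** 2 - policy_influence
    else:
        second_period = (lam / 2) * pi_a ** 2
    return first_period + second_period


def best_effort(lam: float, steps: int = 10_000, **kwargs) -> float:
    """Grid search for the payoff-maximizing effort on [0, 1]."""
    grid = (i / steps for i in range(steps + 1))
    return max(grid, key=lambda e1: preference_payoff(e1, lam=lam, **kwargs))


params = dict(pi_a=0.4, pi_b=0.8, theta1=0.8)  # illustrative values
lam = 0.653                                     # value reported for Figure 3
rigid_effort = best_effort(lam, experiments=False, **params)
experiment_effort = best_effort(lam, experiments=True, **params)
print(f"rigid: {rigid_effort:.4f}, experiments: {experiment_effort:.4f}")
```

Here experimentation lowers the agent's effort on a relative to rigid implementation, the opposite of the belief-based case, because the policy influence term enters with a negative sign whenever πb > πa.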
This sharp distinction between preferences and beliefs arises because an ineffective policy
doesn’t need to be sabotaged to fail; policy change will happen without the agent’s help.
Rather, sabotage only “works” on an effective policy that could have achieved the principal’s
goals with enough effort. When the agent shares those goals (as in the baseline model) this
means that sabotage would only make a difference exactly when the policy is correct, and
the agent wouldn’t want the principal to switch from it.13 When the agent doesn’t share those goals,
whether the policy is correct for the principal is irrelevant to him; his net benefit (λ/2)(πb² − πa²)
from inducing failure and a policy change is always the same.
12 Details are in the online Appendix.
13 It doesn’t matter that he assigns lower probability to this than the principal (θ2 < θ1).
Discussion An influential alternative to the “reward-and-sanction” school of performance
information argues that such information aids public managers in their search for the most
effective policies (Simon 1947; Moynihan 2008). These arguments have been especially in-
fluential in the United States: the idea of explicitly treating “policies as experiments” with
the intent to “learn from them” dates back at least to Great Society-era social engineering
(Campbell 1969), was a key element of the nearly decade-long Clinton administration reform
initiative to “reinvent government” (Osborne 1992; Aberbach 2000), and is now codified in
the Interior Department’s official “adaptive management” policies for resource management
(Lee 1993; Office of Environmental Policy and Compliance 2008).
None of these previous discussions, however, consider how experimentation might affect
the incentives of agents who are charged with actually implementing policy. This is surprising
because nearly any policy must be carried out by lower-level subordinates. Our analysis
demonstrates both why there is an effect – because tying future policy choices to previous
outcomes lets the agent influence policy choice through his effort – and that the direction of
the effect depends crucially on whether conflict is rooted in preferences or beliefs.
An additional implication of our model is that there can be motivational benefits to
arrangements that commit principals to experiment with policies, by forcing, encouraging,
or selecting them to be responsive to negative policy outcomes. In the online Appendix,
we formally show that such institutions can benefit the principal: this is the case when her
beliefs in favor of the optimal initial policy are too strong to credibly abandon it after failure,
but ex-ante she would like to “tie her hands” to better motivate the agent.14
14 We also show the reverse is never true; the principal would never be better off being forced to rigidly implement a policy when she would choose to experiment with it in equilibrium.
What might some such arrangements be in practice? A blunt one is to directly legislate
trigger mechanisms that make a “pre-negotiated commitment ... specifying what actions
will be taken if monitoring information shows x or y” (Nie and Schultz 2012). This was
recently proposed during policy debates on health care15 and national security (Gates 2014,
p. 375). A more flexible one is to impose costs on decision makers – not for failure, but for
sticking with policies after failure.16 The Compstat program developed in New York City
for analyzing high-frequency crime data furnishes an example: as originally conceived, the
program’s central component was a weekly meeting in which “trouble arose” for precinct
commanders not when “crime numbers on their watch went up,” but instead when they
“didn’t have a plan to address the problem” (Maple and Mitchell 2010).
A final possibility is to simply select individuals to make policy decisions whose intrin-
sic beliefs are more amenable to experimentation. In the online Appendix we show that a
principal with strong beliefs is sometimes better off appointing a “moderate” to make pol-
icy decisions for her, because they will be more willing to abandon the initial policy after
failure, and thereby better motivate the agent. This insight provides a new perspective on
some of the trade-offs facing U.S. Presidents in their attempts to influence federal agencies
through appointments (Lewis 2008). In particular, it suggests a novel reason why “stressing
loyalty and ideology above all else” (Moe 1985, p. 258) appears to demoralize rank-and-file
bureaucrats (Golden 2000). Because of their extreme beliefs, such appointees not only select
policies the bureaucrats disagree with, but also rigidly implement those policies even after
apparent failures. We return to this idea in Section 6.
5.3 Deference
Finally, we consider the principal’s decision to defer to the agent in the Game with Learning.
A principal who defers gives the agent’s desired policy b a chance to succeed, despite her
own skepticism that it is the right policy. What drives her decision to do so?
15 http://www.slate.com/articles/news and politics/prescriptions/2009/10/public option lite.html
16 In the online Appendix, we prove that when the principal is best off experimenting, there is a cost of maintaining policy after failure that would induce her to do so.
In the No Learning Benchmark, the principal defers as a second-best coping strategy
given her irresolvable disagreement; when the agent’s beliefs are stronger than her own, she
calculates that more effort on an inferior policy is a better bet than less effort on a superior
policy. In the Game with Learning, however, much more can happen. She could rigidly
implement policy a, given the strength of her beliefs in its favor, or experiment with a,
hoping that success will persuade the agent that it is right. Alternatively, she could defer to
the agent and rigidly implement b, given the strength of his beliefs, or experiment with b,
hoping that failure will persuade him that it is wrong.
These possibilities result in a complicated set of equilibrium policy choices by the prin-
cipal. Figure 3 depicts these choices as a function of the players’ initial beliefs.17 The fill
of each region indicates the principal’s first period policy choice – in the dotted regions the
principal imposes x1 = a, while in the solid regions she defers and chooses x1 = b. The dark-
ness of the shading indicates whether the principal is responsive to failure – in the darkly
shaded regions she rigidly implements the first period policy, while in the lightly shaded
regions she experiments. Despite the complexity of the principal’s choices, we can extract
two insights about her decision to defer.
Proposition 3. In the first period of the Game with Learning, the principal defers for a
larger set of initial beliefs than in the No Learning Benchmark.18 Moreover, when the agent is
both sufficiently motivated (λ > λ̄) and believes strongly enough in b (1 − θ2 is sufficiently high),
the principal always defers and selects policy x1 = b regardless of her own beliefs.
Compared to the No Learning Benchmark, the principal thus defers more often (i.e., for a
larger set of initial beliefs). In addition, when the agent’s beliefs are sufficiently strong and
he is sufficiently motivated, then the principal always defers even when she is already sure
that b is the wrong policy.
Figure 3: Principal’s Equilibrium Policy Choices
[Figure omitted. Regions show the principal’s equilibrium strategy (x1 = a or x1 = b; experiments or rigid), with areas marked “always defers” and “additional deference.” Horizontal axis: agent’s beliefs in favor of b, from θ2 = 1/2 to θ2 = 0. Vertical axis: principal’s beliefs, from θ1 = 1/2 to θ1 = 1.]
17 The agent’s intrinsic motivation is fixed at λ ≈ .653. Part of the complexity also arises from the principal’s problem committing to experiment (see Section 5.2). This accounts for the jagged region in the upper left quadrant. Strengthening the principal’s beliefs (i.e., moving up the y-axis) while holding the agent’s beliefs fixed (e.g., at θ2 = 1/3), policy can temporarily flip back to experimenting with b when the best strategy ex-ante is to experiment with a, but the principal’s prior beliefs are too strong to sustain commitment.
18 This is the only result requiring λ > λ̄ ≈ .23505 and is a quirk of the principal’s commitment problem. With commitment it holds ∀λ. See the online Appendix for details.
Why does the principal defer more in the Game with Learning? As in the No Learning
Benchmark, the principal can always elicit more effort from the agent by simply deferring;
he will always work harder on policy b because he thinks that his effort is more likely to
produce success. In the Game with Learning, however, the agent’s effort is more valuable:
more effort makes the initial policy outcome more informative, which results in better future
decisions by both the principal and the agent (Sections 5.1 – 5.2). Combining these two
effects tilts the scales towards deference.
This explanation, however, presents a puzzle. If the additional deference in the Game
with Learning is due to the informational benefits of the agent’s effort, why does the principal
still defer when she already thinks she knows that a is the right policy? The reason is that
the target of that additional information is sometimes the agent himself. The principal
sometimes defers to persuade the agent through failure that his beliefs were wrong, so that
he will work harder on policy a in the future. This can be an effective way of managing
their disagreement because allowing the agent an opportunity to prove that b is the right
policy strongly motivates him, and makes failure all the more persuasive when it (from the
principal’s perspective) inevitably occurs. The alternative – forcing the agent to implement
policy a immediately – is less effective because he will exert low effort on a, failure will most
likely result, and it will (slightly) reinforce the agent’s initial belief that a is wrong.19
Discussion In an extensive formal literature in bureaucratic politics, principals defer, del-
egate, or give discretion to agents with differing preferences to use their (or encourage them
to acquire) superior expertise.20 Across multiple models, two implications have emerged as
“the cornerstones of the modern field [of public bureaucracy]” (Moe 2012, p. 21): the ally
principle, which states that deference is less likely the further are the agent’s preferences
from the principal’s, and the uncertainty principle, which states that deference is more likely
“the more uncertain and complex [is] the policy area.” Both of these principles, in general,
fail to hold in our model.
The failure of the ally principle can be seen in Figure 1 (for the No Learning Benchmark)
and Figure 3 (for the Game with Learning). As the agent’s beliefs diverge from the principal’s
because he is increasingly sure that b is right, the principal defers more (i.e., for a larger
set of principal beliefs). This occurs because the principal in our model can’t simply walk
away from the agent; even if she ignores his beliefs, she still needs him to implement her
decisions. This captures an important aspect of real-world policymaking that is assumed
away in most classical models: politicians (and their appointees) will always need lower-level
subordinates to carry out their policies. As the model clearly shows, it actually matters
what these subordinates think even when they lack superior information, because what they
think affects how well they do their jobs.
19 When ω = a and the principal rigidly implements a, the agent’s expected future belief that a is right is e1 + (1 − e1)h (e1, θ2), which is low since θ2 and consequently e1 are low. If ω = a but the principal experiments with b, the agent’s expected future belief that a is right is 1 − h (e1, 1 − θ2), which is higher since 1 − θ2 and consequently e1 are higher.
20 See Gailmard and Patty (2012) for a review. Some ambiguity about terms is worth noting. Deference (following an agent’s recommendations), delegation (relinquishing authority to an agent) and discretion (placing fewer constraints on his authority) are distinct actions. But we discuss them collectively, because the factors that induce principals to employ them in informational models are similar.
The uncertainty principle actually holds in the No Learning Benchmark; the principal
defers for a larger set of agent beliefs when she is less sure that a is right (θ1 closer to 1/2).
However, it fails in the Game with Learning. When the agent’s beliefs are sufficiently strong,
the principal defers regardless of her own (un)certainty. This crucial difference arises because
the principal only defers in the canonical framework to benefit from the agent’s information.
When she is already certain about which policy is right, she has no reason to defer. But in
our model, generating more information can also help the agent learn about which policy
is right. This yields a surprising new reason to defer, even when doing so requires choosing
(what the principal thinks is) the wrong policy: allowing the agent to implement that policy
and watch it fail can more effectively “teach” him to share the principal’s beliefs.
In addition to these differences, our model produces a new testable implication that
has not been previously considered: introducing effective performance measurement (i.e.,
moving from the No Learning Benchmark to the Game with Learning) should also increase
deference. Our model speaks to the effects of performance information because learning
comes from implementing policies and observing outcomes, rather than “drawing a costly
signal” about the state (Gailmard and Patty 2007). Suggestively, the combination of perfor-
mance measurement and deference resembles the core principles of the National Performance
Review, a Clinton-administration management initiative that combined performance mea-
surement with the directive that “decision-making power should be decentralized, giving
lower-level employees more authority to make decisions” (Aberbach 2000).
Finally, the benefits of exercising broad deference to subordinates are often extolled
by practitioners. NYPD commissioner William Bratton describes a management style in
which he “pick[ed] good people,” “turned responsibility back onto the worker,” and “let
them do their jobs” (Bratton 1998, p. 127). But the canonical formal model for studying
principal-agent relationships has forced our understanding of these benefits into the narrow
confines of asymmetric expertise versus control, and forced out many factors that commanded
the attention of early bureaucracy and public administration scholars (and contemporary
“informal” organizational theorists): instilling culture, gaining trust, building consensus,
and teaching (Kaufman 1967; Heclo 1977; Kaufman 1981). Like most formal models, our
contribution to this list is incremental but nevertheless important: a principal sometimes
defers to teach, because teaching sometimes requires letting an agent make mistakes.21
6 Application: The U.S. Federal Bureaucracy
We now apply our model to the U.S. federal bureaucracy. We first discuss how repeated
efforts to enhance performance measurement have produced data that could be used to test
some of our predictions. We then discuss how both the model and extensions can shed light
on Presidential “politicization” and bureaucratic performance.
Performance Measurement in U.S. Government Since the early 1990s, there have
been three major initiatives to enhance performance measurement in federal agencies: the
Government Performance and Results Act of 1993 (Long and Franklin 2004), former Vice
President Gore’s “National Performance Review” (NPR) (Thompson and Riccucci 1998),
and the Bush-era Program Assessment Rating Tool (PART) (Lewis 2008). The federal
government has collected voluminous survey data tracking the consequences of these reforms.
Since 2002 the Office of Personnel Management (OPM) has administered frequent surveys of
federal employees across agencies and ranks that track key variables in our model: effort,
intrinsic motivation, and managerial styles.22 The Government Accountability Office (GAO)
has also conducted three cross-agency surveys of federal managers since 2003, measuring
both whether federal agencies have introduced useful performance data, and whether it is
actually used to guide agency decisionmaking.23 This survey data can be used to paint a
picture across agencies and time about when performance measurement was introduced, and
the consequences of doing so for employee effort and managerial practices.
21 Brehm and Gates (2008) also study how bureaucratic supervisors teach subordinates in “environments of great uncertainty” (p. 42), but their formal exercise is akin to “attempting to change the preferences of the subordinate.” While this too certainly occurs in political organizations, we believe that our approach, which explicitly models beliefs and how they change with rational Bayesian learning, may also capture the spirit of their exercise.
One set of predictions that can be tested is the collection of organizational changes that
occur when moving from the “No Learning Benchmark” to the “Game with Learning”: ef-
fort, experimentation, and deference to subordinates should all increase. This transition can
be measured in the data using the perceptions of managers and subordinates as to when
effective performance measures were introduced. The second prediction that can be tested is
the effect of experimentation on bureaucratic morale: subordinates should be more motivated
in agencies where managers use performance data in decisionmaking, i.e., experiment. Some
evidence already exists from the Clinton-era NPR reforms, which explicitly combined perfor-
mance measurement with a directive to “give lower-level employees more authority to make
decisions” (Aberbach 2000). An important component of these reforms was “reinvention
labs” in which agencies could receive rule waivers for experimental projects spearheaded by
low level bureaucrats. A 1996 GAO study found that these labs had significantly improved
agency effort, morale, and performance (U.S. Government Accountability Office 1996).
Presidential Appointments and Bureaucratic Performance How Presidents choose
their appointees to the executive branch has been long debated in the literature. In a
provocative essay that set the terms of the debate, Moe (1985) argued that Presidents would
inevitably continue “politicizing” agencies by filling their ranks with loyal and ideologically
like-minded appointees, even at the cost of “neutral competence” (Heclo 1977). Recent
22Federal Employee Viewpoint Survey (FEVS) 2013 summary available athttp://www.fedview.opm.gov/2013files/2013 Governmentwide Management Report.PDF.
23Summary of 2013 survey is available at http://www.gao.gov/products/GAO-13-519SP
empirical work has cast doubt on this contention. Lewis (2008) finds both that the number
of political appointees is not secularly increasing, and that the “complexity” of an agency’s
tasks reduces politicization. This suggests that Presidents do indeed perceive, and react to,
“trade-offs between policy influence and agency performance” (Lewis 2011, p. 50).
Whether and how such trade-offs also affect the ideology of appointees is less well un-
derstood. The appointments extension of our model discussed in Section 5.2 suggests that
Presidents sometimes prefer appointees with more moderate beliefs than their own, to better
motivate rank-and-file bureaucrats. Consistent with this argument, Bertelli and Grose (2011)
find that Presidents do not actually appoint ideological “clones”; executive department heads
are significantly more moderate than their appointing presidents. But many other factors
examined in the literature can account for this finding, e.g., the budgetary and advice and
consent powers of the Senate (Bertelli and Grose 2011; McCarty 2004) and external interest
group behavior (Bertelli and Feldmann 2007; Gailmard and Patty 2013).24 Here we discuss
some suggestive evidence that the sorts of managerial factors captured by our model also
play a role, by revisiting a case study examining President Reagan’s experience managing
the Environmental Protection Agency (EPA) (Golden 2000).
President Reagan’s first choice to run the EPA was an archetypal appointee of the “admin-
istrative presidency”: Anne Gorsuch, a conservative ideologue who “shared the president’s
environmental policy agenda” and “convictions about the negative impact that environmen-
tal regulations had on economic growth” (Golden 2000, p. 120). Unsurprisingly, her tenure
was marred by a crisis of morale; careerists reacted like the agent in our model, not with
“deliberate foot-dragging or sabotage,” but rather with a severe waning of “commitment,” “enthusiasm,”
and voluntary effort. Despite Gorsuch’s apparent success shifting policy (Wood
and Waterman 1991), she resigned after 22 months amid controversy and Congressional
hearings, and was replaced by former EPA head and Nixon appointee William Ruckelshaus.
Golden (2000) describes the revolution in agency morale after Ruckelshaus’ return; he was
24See Lewis (2011) for a comprehensive review of the relevant literature.
“regarded as a savior” by EPA bureaucrats who “wanted to work for him.” Notably, rather
than returning to an aggressive politicization strategy once the crises subsided, President
Reagan continued to appoint moderate EPA heads for the remainder of his tenure.
While the episode suggests that internal managerial factors may influence even aggressively-
ideological Presidents, it remains unclear what about Ruckelshaus motivated EPA bureau-
crats. It was likely not “going native”; Ruckelshaus remained a committed conservative and
was opposed by environmental groups. He did, however, have a notably respectful, “open”
and “data driven” management style. Interpreted through the lens of the model, what may
have been motivating about Ruckelshaus was not his ideological moderation per se, but his
less rigid beliefs that allowed him to manage experimentally and responsively (Dobel 1992,
p. 251). As would be necessary for our logic to apply, Ruckelshaus was given considerable
discretion by the White House, and was “free to set the agency’s internal agenda” (Golden
2000, p. 138). Finally, the ostensibly liberal bureaucrats under him faithfully implemented
even his conservative policy initiatives like cost-benefit analysis; in their own words, this was
because “career staff recognize that we don’t always know the right thing to do,” and that
“long-standing agency policy may be wrong” (Golden 2000, p. 143).
More broadly, the idea that “management suffers ... when government is run by a tran-
sient group of strangers” is a perennial concern of public administration scholars (Heclo
1977). But why it suffers, and how that influences political decisions, has received little at-
tention (Lewis 2011, p. 60). Our model provides a new way of examining this (re)emerging
question. One reason performance suffers is that appointees and bureaucrats have different
beliefs about how to “get things done.” It can suffer more when appointees impose their
will and bureaucrats “go into hiding.” It can recover when they find ways to work together
despite their disagreements, by trying to “educate” each other and “bring [each other] along,
irrespective of their politics” (Starobin 1995).
Extending the model could therefore help answer many open questions about condi-
tions under which Presidential appointments enhance or degrade agency performance. How
does the tenure length of appointees, or their insulation from the executive and legislative
branches, enhance or impede the ability and willingness of appointees and bureaucrats to
“educate each other” and improve performance? Do the short tenures of political appointees
diminish their ability to do so, or actually enhance their incentives to effect an enduring
change in bureaucratic beliefs? Do electoral expectations influence Presidents’ decisions to
try and make appointments that “persuade,” rather than control? Answering these open
questions would shed further light on Presidential appointment decisions.
7 Conclusion
In the study of bureaucratic relationships, the idea that policymakers are uncertain about
how to achieve their goals has long been central. However, the literature has largely focused
on only one aspect of this uncertainty – that an agent may have superior information about
the consequences of different policies. As argued by Moe (2012), this consensus has been
“empowering,” but also “constraining”: by pushing the study of bureaucratic politics “away
from other avenues that, even if potentially productive, are not compatible with the accepted
way of thinking about things,” it has resulted in “a kind of path dependence ... that makes
certain kinds of progress more difficult to achieve.”
In this paper, we attempt to break this path dependence by considering a radically dif-
ferent aspect of bureaucratic relationships – that principals and agents might openly disagree
about how to achieve shared goals. To explore its implications, we develop two models of
policy choice and implementation – one in which policy outcomes cannot be observed, and
another in which they can. The progress made in doing so is three-fold. First, we find that
the ability to observe and learn from policy outcomes improves the efficiency and motivation
of agents, even when principals ignore the information in their choices. Second, we find
motivational benefits to experimenting, and show that they are specific to disagreements
rooted in beliefs rather than preferences. Finally, we show that belief conflicts can produce
a sometimes-extraordinary degree of deference driven by the desire to persuade.
Stepping back from the study of bureaucracy, our model illustrates some incentives spe-
cific to belief conflicts and learning that are relevant to many additional questions across
political science. We give three examples as suggestions for future research.
First, a burgeoning literature studies the organization and decision making of terrorist
groups using classical principal-agent models (e.g., Shapiro and Siegel (2007); Bueno de
Mesquita (2008)). However, for reasons articulated by Shapiro (2013), the assumptions of
our model are arguably an even better fit in this setting than in public bureaucracies. A
perverse sort of intrinsic motivation is necessary to recruit terrorist “agents” for poorly paid
and highly dangerous work, and it is risky for leaders to monitor these agents too closely.
Direct mechanisms of control like punishment can backfire and result in defections, or even
violent reprisals. Most importantly, terrorist organizations are riven by belief conflicts,
in particular, about the political consequences of violent versus non-violent tactics.25 Do
terrorists choose tactics with an eye toward what the observed political repercussions will
“teach” competing factions, difficult-to-control operatives, and potential sympathizers about
how to most effectively pursue their aims? Do shifting strategies help motivate rank and file
operatives even when they are otherwise counterproductive?
Second, a large literature studies how public opinion constrains the foreign policy deci-
sions of chief executives (Aldrich et al. 2006). If such constraints on the public’s “agent”
(Ashworth 2012) are due to differences in beliefs, then a model like ours may shed light
on how chief executives cope with them. For example, it may sometimes be rational for
executives to pursue “bad” policies so failure will persuade the public to agree with them.26
Documentary evidence suggests that the Johnson administration’s decision to halt bombing
and pursue a “peace offensive” during the Vietnam War in 1965 was driven by such motives.
Despite little confidence that the peace initiative would succeed, the administration gave
25Shapiro (2013) also analyzes the content of 108 terrorist memoirs and finds evidence of “induced” disagreements driven by “different beliefs about what to do” in 58% of them.
26In our model the agent doesn’t “teach” the principal but would if effort were observable.
it “the old college try.” Their primary motive was to persuade the “73% of the American
public eager for a cease-fire” that peace was futile, as well as “Fulbright ... the New York
Times, all these people thinking there could be peace if we were only willing to have peace.”
The amount of effort expended on peace was also understood to be a “crucial element” of
the persuasiveness of failure: the administration sought to show that “we have explored fully
every alternative” and “left no door to peace untried” (Dallek 1996, pp. 152-153).
Finally, pilot experiments to test new programs have become increasingly common and
influential with governments, the media, and the public (Rogers-Dillon 2004). However,
their political uses and consequences have received little attention. A “textbook” view of
pilot projects would embrace them as sincere efforts to learn about policies’ effectiveness,
and a raw political view would discount them as manipulation by the policies’ supporters
and beneficiaries (Rogers-Dillon 2004). Our model suggests that they may be understood as
a sort of biased and belief-driven political entrepreneurship. By proposing to “experiment
with new programs or ... innovations,” policy-motivated actors with genuinely strong be-
liefs may be seeking to produce persuasive evidence that “convince[s] diverse coalitions of
organized interests ... of the value of their ideas” (Carpenter 2001, p. 30). As Carpenter
(2001) demonstrates in his influential study of the 19th century U.S. postal service, this sort
of experimentation-as-entrepreneurship by early agency heads was important in the devel-
opment of bureaucratic autonomy. A model of pilot experimentation based on belief (and
preference) conflicts could generate useful insights about how and when such programs are
used, and their political role in policymaking and institutional development.
Appendix
We first derive equations 2 and 3; for simplicity denote θ for θ2, e for e1, and h (·) for h (e, θ).
Rearranging the “long form” in equation 1 when the principal rigidly implements a yields:
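\[
\left(-\frac{e^2}{2\lambda} + \theta e\right) + \frac{\lambda}{2}\theta e + \lambda\left(\theta(1-e)h(\cdot) - \frac{1}{2}\bigl(\theta(1-e) + (1-\theta)\bigr)h^2(\cdot)\right).
\]
Now from the definition of $h(\cdot)$ we have $(\theta(1-e) + (1-\theta))h^2(\cdot) = \theta(1-e)h(\cdot)$; substituting and simplifying yields
\[
\left(-\frac{e^2}{2\lambda} + \theta e\right) + \frac{\lambda}{2}\theta e + \frac{\lambda}{2}\theta(1-e)h(\cdot),
\]
and then equation 2.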
To derive equation 3, first write the analog to equation 1 when the principal experiments:
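\[
-\frac{e^2}{2\lambda} + \theta\left(e\left(1 + \frac{\lambda}{2}\right) + (1-e)\left(-\frac{\lambda}{2}(1-h(\cdot))^2\right)\right) + (1-\theta)\left(\lambda(1-h(\cdot)) - \frac{1}{2}\lambda(1-h(\cdot))^2\right).
\]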
To see this, recall that failure results in a switch to b, and that the agent’s posterior belief
that b is the best policy after a fails is 1− h (·). Rearranging then yields:
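\[
\left(-\frac{e^2}{2\lambda} + \theta e\right) + \frac{\lambda}{2}\theta e + \lambda\left((1-\theta)(1-h(\cdot)) - \frac{1}{2}\bigl(\theta(1-e) + (1-\theta)\bigr)(1-h(\cdot))^2\right).
\]
Similar to above, observe that $(\theta(1-e) + (1-\theta))(1-h(\cdot))^2 = (1-\theta)(1-h(\cdot))$. Substituting
and simplifying then yields equation 3.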
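The two simplifications above can be spot-checked numerically. The following is a minimal sketch, assuming that $h(e, \theta) = \theta(1-e)/(1-\theta e)$ (the form implied by the derivative of $h$ reported in the proof of Propositions 1 and 2); the function names and the grid are illustrative choices, not part of the original derivation.

```python
def h(e, theta):
    """Posterior that policy a is correct after a fails, given effort e and prior theta."""
    return theta * (1 - e) / (1 - theta * e)


def rigid_long(e, theta, lam):
    """Long form under rigid implementation of a, before using the identity for h."""
    hv = h(e, theta)
    p_fail = theta * (1 - e) + (1 - theta)  # probability that a fails
    return ((theta * e - e**2 / (2 * lam)) + lam / 2 * theta * e
            + lam * (theta * (1 - e) * hv - 0.5 * p_fail * hv**2))


def rigid_short(e, theta, lam):
    """Simplified form: first-period payoff plus (lam/2)*theta*(e + (1-e)*h)."""
    return (theta * e - e**2 / (2 * lam)) + lam / 2 * theta * (e + (1 - e) * h(e, theta))


def experiment_long(e, theta, lam):
    """Long form when failure of a triggers a switch to b."""
    q = 1 - h(e, theta)  # posterior that b is correct after a fails
    return (-e**2 / (2 * lam)
            + theta * (e * (1 + lam / 2) + (1 - e) * (-lam / 2 * q**2))
            + (1 - theta) * (lam * q - 0.5 * lam * q**2))


def experiment_short(e, theta, lam):
    """Simplified form: first-period payoff plus (lam/2)*(theta*e + (1-theta)*(1-h))."""
    q = 1 - h(e, theta)
    return (theta * e - e**2 / (2 * lam)) + lam / 2 * (theta * e + (1 - theta) * q)


def max_gap(lam=0.5, n=25):
    """Largest absolute gap between long and simplified forms on an interior grid."""
    grid = [(i + 0.5) / n for i in range(n)]
    gap = 0.0
    for e in grid:
        for theta in grid:
            gap = max(gap,
                      abs(rigid_long(e, theta, lam) - rigid_short(e, theta, lam)),
                      abs(experiment_long(e, theta, lam) - experiment_short(e, theta, lam)))
    return gap


print(max_gap())  # expected to be at floating-point noise level
```

A gap at the level of rounding error is consistent with the two identities for $h(\cdot)$ used in the text; it is a sanity check, not a proof.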
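Proof of Propositions 1 and 2 Consider the function
\[
U(e, \theta, \eta_1, \eta_2) = \left(\theta e - \frac{e^2}{2\lambda}\right) + \frac{\lambda}{2}\Bigl[(1-\eta_1)\theta^2 + \eta_1\bigl((1-\eta_2)\cdot\theta(e + (1-e)h(e,\theta)) + \eta_2\cdot(1-\theta)(1-h(e,\theta)) + \eta_2\theta e\bigr)\Bigr].
\]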
It is simple to verify from the definitions that U (e1, θ2, 0, 0) is the agent’s two-period ex-
pected utility when the principal rigidly implements a in the No Learning Benchmark
(NLB), U (e1, θ2, 1, 0) when she rigidly implements a in the Game with Learning (GWL),
and U (e1, θ2, 1, 1) when she experiments with a in the GWL (analogous expressions for b
involve substituting $1-\theta_2$ for $\theta_2$). Now denote the agent’s optimal effort given each objective
function $e^*(\theta, \eta_1, \eta_2) = \arg\max_{e\in[0,1]} U(e, \theta, \eta_1, \eta_2)$. In Lemma 1 in the online Appendix we
prove that $e^*(\theta, \eta_1, \eta_2)$ is unique and $\in (0,1)$ $\forall (\theta \in (0,1), \eta_1, \eta_2)$ when $\lambda < \bar{\lambda} \approx .68466$.
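Now the derivatives of both $\theta(e + (1-e)h(e,\theta))$ and $(1-\theta)(1-h(e,\theta))$ w.r.t. $e$ are
equal to $\theta(1-h(e,\theta))^2$, which is $> 0$ $\forall \theta \in (0,1)$. (To verify use that $\frac{\partial h}{\partial e} = -\frac{\theta(1-\theta)}{(1-\theta e)^2}$,
$(1-e)\frac{\partial h}{\partial e} = -h(\cdot)(1-h(\cdot))$, and $-(1-\theta)\frac{\partial h}{\partial e} = \theta(1-h(\cdot))^2$). Applying this yields that
$\frac{\partial^2 U(e,\theta,\eta_1,0)}{\partial\eta_1\partial e} = \frac{\lambda}{2}\theta(1-h(e,\theta))^2$ and $\frac{\partial^2 U(e,\theta,1,\eta_2)}{\partial\eta_2\partial e} = \frac{\lambda}{2}\theta$. Intuitively, $\frac{\partial^2 U(e,\theta,\eta_1,0)}{\partial\eta_1\partial e}$ is the increase
in the marginal benefit of effort going from the NLB to the GWL, and $\frac{\partial^2 U(e,\theta,1,\eta_2)}{\partial\eta_2\partial e}$ is the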
increase in the marginal benefit of effort when the principal goes from rigidly implementing
to experimenting. Since both cross partials are strictly positive when θ ∈ (0, 1), Theorem
1 of Edlin and Shannon (1998) (see Ashworth and Bueno de Mesquita (2006)) implies that
e∗ (θ2, 1, 0) > e∗ (θ2, 0, 0) ∀θ2 ∈ (0, 1) – the agent works harder on a in the GWL (Prop. 1) –
and e∗ (θ2, 1, 1) > e∗ (θ2, 1, 0) ∀θ2 ∈ (0, 1) – the agent works harder on a when the principal
experiments (Prop. 2). Finally, by symmetry these results also apply to x1 = b.
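The effort ranking in Propositions 1 and 2 can be illustrated at a single parameter point. This is a minimal sketch, assuming $h(e, \theta) = \theta(1-e)/(1-\theta e)$ and the objective $U(e, \theta, \eta_1, \eta_2)$ as defined at the start of the proof; the parameter values $\lambda = 0.5$, $\theta = 0.6$ and the grid search are illustrative assumptions, not the paper's method (which relies on Edlin and Shannon's theorem).

```python
def h(e, theta):
    """Posterior that policy a is correct after a fails."""
    return theta * (1 - e) / (1 - theta * e)


def U(e, theta, eta1, eta2, lam):
    """Agent's objective: eta1 switches learning on, eta2 switches experimentation on."""
    hv = h(e, theta)
    learn = ((1 - eta2) * theta * (e + (1 - e) * hv)
             + eta2 * (1 - theta) * (1 - hv)
             + eta2 * theta * e)
    return (theta * e - e**2 / (2 * lam)) + lam / 2 * ((1 - eta1) * theta**2 + eta1 * learn)


def best_effort(theta, eta1, eta2, lam, n=20000):
    """Grid-search maximizer of U over interior effort levels (illustrative only)."""
    grid = [i / n for i in range(1, n)]
    return max(grid, key=lambda e: U(e, theta, eta1, eta2, lam))


lam, theta = 0.5, 0.6                          # lam is below the uniqueness bound in the text
e_nlb = best_effort(theta, 0, 0, lam)          # No Learning Benchmark
e_rigid = best_effort(theta, 1, 0, lam)        # Game with Learning, rigid implementation
e_exp = best_effort(theta, 1, 1, lam)          # Game with Learning, experimentation
print(e_nlb, e_rigid, e_exp)
```

At this point the three maximizers are ordered as the propositions predict, and the benchmark effort is close to $\lambda\theta$, the maximizer of $\theta e - e^2/(2\lambda)$.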
To see that going from the NLB to the GWL increases the probability of success in both
periods (Prop. 1), first observe that the first period increase follows immediately from the
agent’s greater effort. The expected second period probability of success in the GWL (from
the principal’s perspective) is θ1·(λ (e1 + (1− e1)h (e1, θ2))), which is equal to the probability
of success θ1 · λθ2 in the NLB when e1 = 0, and is (already shown to be) increasing in e1. ∎
References
Aberbach, Joel D. 2000. In the Web of Politics. Washington, D.C: Brookings Institution
Press.
Abramowitz, Alan I., and Kyle L. Saunders. 2008. “Is Polarization a Myth?” The Journal
of Politics 70(02): 542–555.
Aldrich, John H., Christopher Gelpi, Peter Feaver, Jason Reifler, and Kristin Thompson
Sharp. 2006. “Foreign Policy and the Electoral Connection.” Annual Review of Political
Science 9(1): 477–502.
Arrow, K. J. 1964. “The Role of Securities in the Optimal Allocation of Risk-bearing.” The
Review of Economic Studies 31(April): 91–96.
Ashworth, Scott. 2012. “Electoral Accountability: Recent Theoretical and Empirical Work.”
Annual Review of Political Science 15(1): 183–201.
Ashworth, Scott, and Ethan Bueno de Mesquita. 2006. “Monotone Comparative Statics for
Models of Politics.” American Journal of Political Science 50(January): 214–231.
Aumann, Robert J. 1976. “Agreeing to Disagree.” The Annals of Statistics 4(November):
1236–1239.
Bartels, Larry M. 2008. Unequal Democracy. Princeton: Princeton University Press.
Bergemann, Dirk, and Ulrich Hege. 2005. “The Financing of Innovation: Learning and
Stopping.” The RAND Journal of Economics 36(December): 719–752.
Bertelli, Anthony, and Sven E. Feldmann. 2007. “Strategic Appointments.” Journal of Public
Administration Research and Theory 17(January): 19–38.
Bertelli, Anthony M., and Christian R. Grose. 2011. “The Lengthened Shadow of Another
Institution?” American Journal of Political Science 55(4): 767–781.
Besley, Timothy, and Maitreesh Ghatak. 2006. “Sorting with Motivated Agents.” Journal
of the European Economic Association 4(2-3): 404–414.
Bevan, Gwyn, and Christopher Hood. 2006. “What’s Measured Is What Matters: Targets
and Gaming in the English Public Health Care System.” Public Administration 84(3):
517–538.
Boardman, Craig, and Eric Sundquist. 2008. “Toward Understanding Work Motivation:
Worker Attitudes and the Perception of Effective Public Service.” The American Review
of Public Administration (September).
Bratton, William. 1998. Turnaround. New York: Random House.
Brehm, John, and Scott Gates. 1999. Working, Shirking, and Sabotage. Ann Arbor: Uni-
versity of Michigan Press.
Brehm, John, and Scott Gates. 2008. Teaching, Tasks, and Trust: Functions of the Public
Executive. New York: Russell Sage Foundation.
Bueno de Mesquita, Ethan. 2008. “Terrorist Factions.” Quarterly Journal of Political Science
3(December): 399–418.
Bueno de Mesquita, Ethan, and Matthew C. Stephenson. 2007. “Regulatory Quality Under
Imperfect Oversight.” American Political Science Review 101(03): 605–620.
Callander, Steven. 2011. “Searching for Good Policies.” American Political Science Review
105(4): 643–662.
Callander, Steven, and Patrick Hummel. 2014. “Preemptive Policy Experimentation.”
Econometrica 82(4): 1509–1528.
Campbell, Donald T. 1969. “Reforms as experiments.” American Psychologist 24(4): 409–
429.
Carpenter, Daniel P. 2001. The Forging of Bureaucratic Autonomy. Princeton, N.J.: Prince-
ton University Press.
Che, Yeon Koo, and Navin Kartik. 2009. “Opinions as Incentives.” Journal of Political
Economy 117(October): 815–860.
Dallek, Robert. 1996. “Lyndon Johnson and Vietnam: The Making of a Tragedy.” Diplo-
matic History 20(April): 147–162.
DiIulio, John J. 1987. Governing Prisons. London: Collier Macmillan.
Dobel, P. J. 1992. “William D. Ruckelshaus.” In Exemplary Public Administrators, ed.
Terry L. Cooper. San Francisco: Jossey-Bass Publishers pp. 241–269.
Draper, Robert. 2014. “The War Within.” POLITICO Magazine (November).
Durant, Robert F., Robert Kramer, James L. Perry, Debra Mesch, and Laurie Paarlberg.
2006. “Motivating Employees in a New Governance Era: The Performance Paradigm
Revisited.” Public Administration Review 66(4): 505–514.
Edlin, Aaron S., and Chris Shannon. 1998. “Strict Monotonicity in Comparative Statics.”
Journal of Economic Theory 81(July): 201–219.
Epstein, David, and Sharyn O’Halloran. 1999. Delegating Powers: A Transaction Cost
Politics Approach to Policy Making Under Separate Powers. Cambridge: Cambridge Uni-
versity Press.
Gailmard, Sean, and John W. Patty. 2007. “Slackers and Zealots.” American Journal of
Political Science 51(4): 873–889.
Gailmard, Sean, and John W. Patty. 2012. “Formal Models of Bureaucracy.” Annual Review
of Political Science 15(1): 353–377.
Gailmard, Sean, and John W. Patty. 2013. Learning While Governing. Chicago: The
University of Chicago Press.
Gates, Robert Michael. 2014. Duty: Memoirs of a Secretary at War. New York: Alfred A.
Knopf.
Geanakoplos, John D, and Heraklis M Polemarchakis. 1982. “We can’t disagree forever.”
Journal of Economic Theory 28(October): 192–200.
Georgellis, Yannis, Elisabetta Iossa, and Vurain Tabvuma. 2011. “Crowding Out Intrinsic
Motivation in the Public Sector.” Journal of Public Administration Research and Theory
21(July): 473–493.
Golden, Marissa. 2000. What Motivates Bureaucrats? New York: Columbia University
Press.
Goodsell, Charles T. 2011. Mission Mystique. Washington, D.C: CQ Press.
Heclo, Hugh. 1974. Modern Social Politics in Britain and Sweden. New Haven: Yale Uni-
versity Press.
Heclo, Hugh. 1977. A Government of Strangers. Washington: Brookings Institution.
Heinrich, Carolyn J. 2007. “Evidence-Based Policy and Performance Management Challenges
and Prospects in Two Parallel Movements.” The American Review of Public Administra-
tion 37(September): 255–277.
Holmstrom, Bengt, and Paul Milgrom. 1991. “Multitask Principal-Agent Analyses: Incentive
Contracts, Asset Ownership, and Job Design.” Journal of Law, Economics, & Organization
7(January): 24–52.
Huber, John D., and Charles R. Shipan. 2002. Deliberate Discretion? The Institutional
Foundations of Bureaucratic Autonomy. Cambridge: Cambridge University Press.
Huber, John D., and Nolan McCarty. 2004. “Bureaucratic Capacity, Delegation, and Political
Reform.” The American Political Science Review 98(August): 481–494.
Kaufman, Herbert. 1967. The Forest Ranger. Baltimore: Johns Hopkins Press.
Kaufman, Herbert. 1981. The Administrative Behavior of Federal Bureau Chiefs. Washing-
ton, D.C: Brookings Institution.
Keller, Godfrey, Sven Rady, and Martin Cripps. 2005. “Strategic Experimentation with
Exponential Bandits.” Econometrica 73(1): 39–68.
Kettl, Donald F. 2005. The Global Public Management Revolution. Washington, D.C: Brook-
ings Institution Press.
Krier, James E. 1977. Pollution and Policy. Berkeley: University of California Press.
Lee, Kai N. 1993. Compass and Gyroscope. Washington, D.C: Island Press.
Lewis, David E. 2008. The Politics of Presidential Appointments. Princeton: Princeton
University Press.
Lewis, David E. 2011. “Presidential Appointments and Personnel.” Annual Review of Polit-
ical Science 14(1): 47–66.
Long, Edward, and Aimee L. Franklin. 2004. “The Paradox of Implementing the Govern-
ment Performance and Results Act: Top-Down Direction for Bottom-Up Implementation.”
Public Administration Review 64(3): 309–319.
Maple, Jack, and Chris Mitchell. 2010. The Crime Fighter. Random House Digital, Inc.
McCarty, Nolan. 2004. “The Appointments Dilemma.” American Journal of Political Science
48(3): 413–428.
Milgrom, Paul, and Chris Shannon. 1994. “Monotone Comparative Statics.” Econometrica
62(January): 157–180.
Milgrom, Paul, and John Roberts. 1988. “An Economic Approach to Influence Activities in
Organizations.” American Journal of Sociology 94(January): S154–S179.
Minozzi, William. 2013. “Endogenous Beliefs in Models of Politics.” American Journal of
Political Science 57(3): 566–81.
Moe, Terry M. 1985. “The Politicized Presidency.” The New Direction in American Politics
235: 269–71.
Moe, Terry M. 2012. “Delegation, Control, and the Study of Public Bureaucracy.” 10(2).
Morris, Stephen. 1995. “The Common Prior Assumption in Economic Theory.” Economics
and Philosophy 11(02): 227–253.
Moynihan, Donald P. 2008. The Dynamics of Performance Management. Washington, D.C:
Georgetown University Press.
Moynihan, Donald P., Sanjay K. Pandey, and Bradley E. Wright. 2012. “Prosocial Values
and Performance Management Theory: Linking Perceived Social Impact and Performance
Information Use.” Governance 25(3): 463–483.
Nie, Martin A., and Courtney A. Schultz. 2012. “Decision-Making Triggers in Adaptive
Management.” Conservation Biology 26(6): 1137–1144.
Office of Environmental Policy and Compliance. 2008. “Adaptive Management Implemen-
tation Policy.” Department of the Interior Departmental Manual 522(February).
Osborne, David. 1992. Reinventing Government. Reading, MA: Addison-Wesley.
Perry, James L., and Annie Hondeghem. 2008. Motivation in Public Management: The Call
of Public Service. Oxford: Oxford University Press.
Perry, James L., Annie Hondeghem, and Lois Recascino Wise. 2010. “Revisiting the Motiva-
tional Bases of Public Service: Twenty Years of Research and an Agenda for the Future.”
Public Administration Review 70(5): 681–690.
Pollitt, Christopher. 2006. “Performance Information for Democracy.” Evaluation 12(Jan-
uary): 38–55.
Prendergast, Canice. 2008. “Intrinsic Motivation and Incentives.” The American Economic
Review 98(May): 201–205.
Rogers-Dillon, Robin. 2004. The Welfare Experiments. Stanford, CA: Stanford Law and
Politics.
Sabatier, Paul A. 1988. “An advocacy coalition framework of policy change and the role of
policy-oriented learning therein.” Policy Sciences 21(June): 129–168.
Sabatier, Paul A., and Susan Hunter. 1989. “The Incorporation of Causal Perceptions into
Models of Elite Belief Systems.” Political Research Quarterly 42(September): 229–261.
Shapiro, Jacob N. 2013. The Terrorist’s Dilemma: Managing Violent Covert Organizations.
Princeton University Press.
Shapiro, Jacob N., and David A. Siegel. 2007. “Underfunding in Terrorist Organizations.”
International Studies Quarterly 51(June): 405–429.
Simon, Herbert A. 1947. Administrative Behavior. New York: Macmillan.
Smith, Alastair, and Allan C. Stam. 2004. “Bargaining and the Nature of War.” Journal of
Conflict Resolution 48(December): 783–813.
Starobin, Paul. 1995. Surviving at the EPA: Gary Dietrich. (C16-84-592.0) Cambridge, MA:
Kennedy School of Government Case Program.
Thompson, Frank J., and Norma M. Riccucci. 1998. “Reinventing Government.” Annual
Review of Political Science 1(1): 231–257.
U.S. Government Accountability Office. 1996. Managing Reform: Status of Agency Rein-
vention Lab Efforts. (GAO/GGD-96-69) Washington, D.C.: U.S. Government Printing
Office.
Van den Steen, Eric. 2009. “Authority versus Persuasion.” The American Economic Review
99(May): 448–453.
Van den Steen, Eric. 2010a. “Disagreement and the Allocation of Control.” Journal of Law,
Economics, and Organization 26(August): 385–426.
Van den Steen, Eric. 2010b. “Interpersonal Authority in a Theory of the Firm.” American
Economic Review 100(1): 466–90.
Volden, Craig, Michael M. Ting, and Daniel P. Carpenter. 2008. “A Formal Model of
Learning and Policy Diffusion.” American Political Science Review 102(August): 319–332.
Weible, Christopher M., Paul A. Sabatier, and Kelly McQueen. 2009. “Themes and Varia-
tions: Taking Stock of the Advocacy Coalition Framework.” Policy Studies Journal 37(1):
121–140.
Wilson, James Q. 1989. Bureaucracy. New York: Basic Books.
Wilson, James Q., and George L. Kelling. 1982. “Broken Windows.” Atlantic Monthly 249(3):
29–38.
Wood, B. Dan, and Richard W. Waterman. 1991. “The Dynamics of Political Control of the
Bureaucracy.” American Political Science Review 85(September): 801–828.
Yildiz, Muhamet. 2004. “Waiting to Persuade.” The Quarterly Journal of Economics
119(February): 223–248.
Online Appendix – NOT FOR PUBLICATION
“Experimentation and Persuasion in Political Organizations”
Alexander V. Hirsch, February 28 2015.
This Online Appendix is divided into five parts. Appendix A provides a general state-
ment of strategies and equilibria, and describes how we do equilibrium selection when there
are multiple equilibria. Appendix B proves Proposition 3. Appendix C proves verbal state-
ments in Section 5.2 about the principal’s commitment problem and institutional solutions.
Appendix D contains accessory lemmas used in the other proofs. Appendix E treats the
variant of the model with differing preferences rather than beliefs discussed in Section 5.2.
A Equilibrium Characterization and Selection
We begin by introducing additional notation and providing a general equilibrium charac-
terization for the baseline model. This requires allowing mixed strategies for the principal,
which are omitted from the main text for simplicity. As in the main text, a strategy for the
agent consists of two functions e1 (x1) , e2 (x1, e1, y1, x2) to [0, 1] mapping histories to effort.
To allow for mixed strategies for the principal, we now denote the principal’s strategy as a
probability $p_1 \in [0,1]$ of initially choosing policy $a$, and a set of probabilities $p^{x_1}_{y_1} \in [0,1]$ of
sticking with the initial policy $x_1$ after outcome $y_1$ for every $(x_1, y_1)$. Throughout we will
also use $P_i(\cdot)$ to denote probabilities evaluated with respect to the prior of player $i$ – so
$P_2(\omega = b) = 1 - \theta_2$ denotes the agent’s prior belief that $b$ is the correct policy.
The Agent’s Problem In the second period the agent exerts effort $e_2(x_1, e_1, y_1, x_2) = \lambda P_2(\omega = x_2 \mid x_1, e_1, y_1)$
and his expected utility is $\frac{\lambda}{2}\left[P_2(\omega = x_2 \mid x_1, e_1, y_1)\right]^2$. For simplicity,
denote his prior that $\omega = a$ as $\theta$, his initial effort $e_1$ as $e$, and $p^a_s$ and $p^a_f$ as $p_s$ and $p_f$. Then
it is straightforward to show that his expected two-period utility when $x_1 = a$ as a function
of first period effort is
\[
U_2(e, \theta, p_s, p_f) = \left(\theta e - \frac{e^2}{2\lambda}\right) + \frac{\lambda}{2}\Bigl(p_f\cdot\theta\bigl(e + (1-e)h(e,\theta)\bigr) + (1-p_f)\cdot(1-\theta)\bigl(1-h(e,\theta)\bigr)\Bigr) + (p_s - p_f)\frac{\lambda}{2}\theta e \tag{A.1}
\]
By symmetry the agent’s expected utility from exerting effort $e_1$ on an arbitrary policy
$x \in \{a, b\}$ is $U_2(e_1, P_2(\omega = x), p^x_s, p^x_f)$. Also note that $U_2(\cdot)$ is distinct from the expression
U (·) for the agent’s objective function as defined in the main Appendix because the latter
amalgamated utilities from the NLB and the GWL, and did not account for mixed strategies.
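As a consistency check (a sketch based on the expressions as stated in A.1 and in the main Appendix, not an additional result), substituting the two pure strategies into (A.1) recovers the earlier objective functions. Note that the last two arguments of $U_2$ are $(p_s, p_f)$, whereas those of $U$ are $(\eta_1, \eta_2)$:

```latex
\begin{align*}
U_2(e,\theta,1,1) &= \left(\theta e - \frac{e^2}{2\lambda}\right)
   + \frac{\lambda}{2}\,\theta\bigl(e + (1-e)h(e,\theta)\bigr)
   = U(e,\theta,1,0) && \text{(rigid implementation)},\\
U_2(e,\theta,1,0) &= \left(\theta e - \frac{e^2}{2\lambda}\right)
   + \frac{\lambda}{2}\bigl((1-\theta)(1-h(e,\theta)) + \theta e\bigr)
   = U(e,\theta,1,1) && \text{(experimentation)}.
\end{align*}
```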
The Principal’s Problem The principal’s second period policy choices must be interim-
optimal given e2 (x1, e1, y1, x2) and her posteriors computed with the agent’s equilibrium
strategy. Thus, she must stay with the initial policy $x$ if it succeeds ($p^x_s = 1$), and can only
stay with the initial policy if it fails ($p^x_f > 0$) if and only if $h(e_1(x), P_1(\omega = x)) \geq 1 - h(e_1(x), P_2(\omega = x))$.
If the inequality is strict then $p^x_f = 1$.
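The threshold condition can be traced to a comparison of second-period success probabilities. The following is a sketch, assuming (as in the proof of Proposition 1) that the principal evaluates a second-period policy by its probability of success under her own posterior, and writing $h_1 = h(e_1(x), P_1(\omega = x))$ and $h_2 = h(e_1(x), P_2(\omega = x))$ as shorthand introduced here for the two players' posteriors that $x$ is correct after it fails:

```latex
\begin{align*}
\text{stay with } x:\quad & h_1 \cdot \lambda h_2, \\
\text{switch}:\quad & (1-h_1)\cdot \lambda (1-h_2), \\
h_1 \lambda h_2 \;\ge\; (1-h_1)\lambda(1-h_2)
  \;&\iff\; h_1 h_2 \;\ge\; 1 - h_1 - h_2 + h_1 h_2
  \;\iff\; h_1 \;\ge\; 1 - h_2 .
\end{align*}
```

In each case the agent's second-period effort is $\lambda$ times his own posterior that the chosen policy is correct, which is why $h_2$ enters both sides.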
For simplicity denote $e_1$ as $e$ and $p^a_s$ and $p^a_f$ as $p_s$ and $p_f$. In period 1 the principal’s
expected utility from selecting policy $a$ when she expects first period effort $e$ and future
equilibrium behavior is
\[
U_1(e, \theta_1, \theta_2, p_s, p_f) = \theta_1\bigl(e + \lambda(e p_s + (1-e)p_f\cdot h(e,\theta_2))\bigr) + (1-\theta_1)(1-p_f)\lambda(1-h(e,\theta_2)) \tag{A.2}
\]
By symmetry her expected utility from some $x$ is $U_1(e_1, P_1(\omega = x), P_2(\omega = x), p^x_s, p^x_f)$.
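Expression (A.2) translates directly into a short function. This is a minimal sketch, assuming $h(e, \theta) = \theta(1-e)/(1-\theta e)$; the function name `U1`, its argument order, and the parameter values are illustrative choices. The check at the end uses the observation from the proof of Proposition 1 that with zero first-period effort and rigid implementation the principal's payoff reduces to the NLB second-period success probability $\theta_1 \lambda \theta_2$.

```python
def h(e, theta):
    """Posterior that policy a is correct after a fails."""
    return theta * (1 - e) / (1 - theta * e)


def U1(e, theta1, theta2, p_s, p_f, lam):
    """Principal's two-period expected utility from x1 = a, following (A.2)."""
    hv = h(e, theta2)  # agent's posterior after failure drives his second-period effort
    if_a_correct = e + lam * (e * p_s + (1 - e) * p_f * hv)
    if_b_correct = (1 - p_f) * lam * (1 - hv)
    return theta1 * if_a_correct + (1 - theta1) * if_b_correct


lam, theta1, theta2 = 0.5, 0.7, 0.6
zero_effort_rigid = U1(0.0, theta1, theta2, 1, 1, lam)
print(zero_effort_rigid, theta1 * lam * theta2)
```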
Equilibrium Conditions Strategies $(x_1, p^a_s, p^a_f, p^b_s, p^b_f)$ and $(e_1(x_1), e_2(x_1, e_1, y_1, x_2))$ are
an equilibrium if and only if they satisfy the following conditions.
(Agent Optimality)
1. $e_2(x_1, e_1, y_1, x_2) = \lambda P_2(\omega = x_2 \mid x_1, e_1, y_1)$ (the agent optimizes in the second period)
2. $e_1(x) \in \arg\max_{e_1\in[0,1]} \{U_2(e_1, P_2(\omega = x), p^x_s, p^x_f)\}$ $\forall x \in \{a, b\}$ (the agent optimizes in the
first period given the principal’s strategy and expectations about his own future effort)
(Principal Optimality)
1. $p^x_s = 1$ $\forall x \in \{a, b\}$ (the principal always stays after success)
2. $\forall x \in \{a, b\}$, $p^x_f > 0 \iff h(e_1(x), P_1(\omega = x)) \geq 1 - h(e_1(x), P_2(\omega = x))$, and $p^x_f = 1$
if satisfied with strict inequality (the principal only stays after failure if it is interim-optimal
given on-path posteriors)
3. $x_1 \in \arg\max_{x\in\{a,b\}} \{U_1(e_1(x), P_1(\omega = x), P_2(\omega = x), p^x_s, p^x_f)\}$ (the principal’s initial policy
choice maximizes her expected continuation value).
Equilibrium Selection and Notation Lemma 2 in Appendix D proves the following
two statements: (i) whenever experimenting is an equilibrium in a subgame x1, it is the
optimal strategy for the principal even if she could precommit to her future decisions, and
(ii) if experimenting with x1 = a is not an equilibrium of the subgame following x1 = a,
then the unique equilibrium is rigid implementation. Together, these statements imply that
we can select the equilibria that are best for the principal by considering only pure strategy
equilibria, and choosing experimentation for sure in the subgame commencing with policy
$x_1 \in \{a, b\}$ (i.e., $p^{x_1}_s = 1$ and $p^{x_1}_f = 0$) whenever it is an equilibrium.
With this selection and restriction to pure strategies, we now introduce simplified notation
for the agent’s best responses and principal’s utility. First let $e^s(\theta_2)$ denote the agent’s
first-period best response when the principal’s pure strategy is $(x_1 = a, s)$, where $s \in \{R, E\}$
denotes whether the principal (R)igidly implements or (E)xperiments with the initial policy.
It is also helpful to state implicit characterizations of $e^R(\theta_2)$ and $e^E(\theta_2)$ so we can
approximate their values in several proofs. The FOC from the proof of Lemma 1 yields that
\[
e^R(\theta_2) = \lambda\theta_2\left(1 + \frac{\lambda}{2}k(e, \theta_2)\right) \quad\text{and}\quad e^E(\theta_2) = \lambda\theta_2\left(1 + \frac{\lambda}{2}\bigl(k(e, \theta_2) + 1\bigr)\right),
\]
where $k(e, \theta) = (1 - h(e, \theta))^2$.
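Because the two characterizations are implicit (effort appears on both sides through $k$), one way to approximate the best responses is fixed-point iteration. The following is a minimal sketch, assuming $h(e, \theta) = \theta(1-e)/(1-\theta e)$ and that simple iteration converges at the chosen parameters; the function names, starting value, and tolerance are illustrative, and convergence is not claimed for all $(\lambda, \theta_2)$.

```python
def h(e, theta):
    """Posterior that policy a is correct after a fails."""
    return theta * (1 - e) / (1 - theta * e)


def k(e, theta):
    """k(e, theta) = (1 - h(e, theta))^2, as defined in the text."""
    return (1 - h(e, theta)) ** 2


def effort(theta2, lam, experiment, tol=1e-13, max_iter=1000):
    """Iterate e <- lam*theta2*(1 + (lam/2)*(k(e, theta2) + bonus)) to a fixed point."""
    bonus = 1.0 if experiment else 0.0  # the extra "+1" term under experimentation
    e = lam * theta2                    # start from the no-learning effort level
    for _ in range(max_iter):
        e_next = lam * theta2 * (1 + lam / 2 * (k(e, theta2) + bonus))
        if abs(e_next - e) < tol:
            return e_next
        e = e_next
    raise RuntimeError("fixed-point iteration did not converge")


lam, theta2 = 0.5, 0.6
e_R = effort(theta2, lam, experiment=False)  # rigid implementation
e_E = effort(theta2, lam, experiment=True)   # experimentation
print(e_R, e_E)
```

At these illustrative values the iteration settles quickly, and the resulting efforts satisfy $\lambda\theta_2 < e^R < e^E$, in line with Propositions 1 and 2.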
Second, let $V^s_1(\theta_1, \theta_2)$ denote the principal’s two-period expected utility when her pure
strategy is $(x_1 = a, s)$ and the agent best-responds, so $V^R_1(\theta_1, \theta_2) = U_1(e^R(\theta_2), \theta_1, \theta_2, 1, 1)$
and $V^E_1(\theta_1, \theta_2) = U_1(e^E(\theta_2), \theta_1, \theta_2, 1, 0)$. Third, let $\bar\theta(\theta_2)$ be the unique solution to
$$h\left(e^R(\theta_2), \bar\theta(\theta_2)\right) = 1 - h\left(e^R(\theta_2), \theta_2\right). \tag{A.3}$$
By the equilibrium characterization, experimenting with $x_1 = a$ is an equilibrium of that
subgame i.f.f. $\theta_1 \leq \bar\theta(\theta_2)$, and it is easily verified that $\bar\theta(\theta_2) > 1 - \theta_2$. Finally, by symmetry
the agent’s effort on $x_1 = b$ is $e^s(1 - \theta_2)$, the principal’s utility is $V^s_1(1 - \theta_1, 1 - \theta_2)$, and the
threshold for experimentation with b is $1 - \theta_1 < \bar\theta(1 - \theta_2) \iff \theta_1 > 1 - \bar\theta(1 - \theta_2)$.
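A threshold defined by an indifference condition of this form has a simple closed form once the effort level is fixed. The sketch below takes effort $e$ as a given number rather than solving for it, uses the closed form quoted in Property 3 of Lemma 6, and assumes the failure posterior $h(e, \theta) = \theta(1-e)/(1-\theta e)$. The function names and the numerical values are illustrative assumptions.

```python
def h(e, theta):
    """ASSUMED posterior on the policy after a failure with effort e and prior theta."""
    return theta * (1.0 - e) / (1.0 - theta * e)


def theta_bar(e, theta2):
    """Closed-form solution of h(e, theta_bar) = 1 - h(e, theta2) for a fixed effort e."""
    return (1.0 - theta2) / ((1.0 - theta2) + theta2 * (1.0 - e) ** 2)


# Illustrative values only (not taken from the paper).
e, theta2 = 0.3, 0.4
tb = theta_bar(e, theta2)
gap = h(e, tb) - (1.0 - h(e, theta2))   # zero up to rounding error
print(tb, gap, tb > 1.0 - theta2)
```

Because the denominator is strictly below one whenever $e > 0$ and $\theta_2 > 0$, the threshold exceeds $1 - \theta_2$, matching the inequality stated in the text.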
B Proof of Proposition 3
In Lemma 3 in the Supplemental Proofs, we prove the following handy property: fixing the
principal’s experimentation decisions down each path of play, if she prefers $x_1 = a$ given
beliefs $\hat\theta_1$ (in the sense of ex-ante expected utility) then she also prefers it for all higher
beliefs $\theta_1 > \hat\theta_1$ (of course, by symmetry if she prefers $x_1 = b$ at beliefs $\hat\theta_1$ then she also prefers
it for all beliefs $\theta_1 < \hat\theta_1$). We call this property “preference monotonicity” and employ it in
this and the subsequent proofs.
Part 1 We prove that deference expands in GWL as compared to the NLB both in the
baseline model when $\lambda \geq \underline\lambda \approx .23505$, and when the principal can commit $\forall \lambda$. When $\lambda$ goes
below $\underline\lambda$ and the principal can’t commit, there opens up a very tiny interval of beliefs where
the principal would have deferred in the No Learning Benchmark, but does not in the Game
with Learning. Using Mathematica we find that the size of this interval is maximized when
(λ = .123, θ2 = .545), and at these values is θ1 ∈ (.5, .508). In this interval the principal is
actually best off deferring by experimenting with b, but can’t credibly commit to do so because
the agent is working so (unrealistically) little that almost nothing is learned from failure.
Rather than rigidly implement b, she experiments with a to better motivate the agent.
To prove that deference strictly expands when the principal can commit, we argue that the
following property, proved analytically in Lemma 4 in the Supplemental Proofs, is sufficient:
a principal with beliefs $\theta_1 = 1 - \theta_2$ strictly prefers experimenting with b to experimenting
with a. To see this, observe that for $\theta_1 \leq 1 - \theta_2$ experimenting with a is strictly better than
any other strategy with a, by $1 - \theta_2 < \bar\theta(\theta_2)$ and Lemma 2. If experimenting with b is also
strictly better than experimenting with a, then it is strictly better than any other strategy
with a, and $x_1 = b$ must be chosen (either experimenting or rigidly implementing). Thus
within the entire deference region from the NLB the principal defers in the GWL. Because
the preference for deference is strict there, the deference region must strictly expand.
To prove that the deference region strictly expands when the principal can’t commit,
observe that the preceding argument proves the principal defers whenever her beliefs $\theta_1$ are
in $[1 - \bar\theta(1 - \theta_2),\, 1 - \theta_2]$, because this condition implies that the principal experiments in
the subgame following $x_1 = b$ without commitment. However, to show that the deference
region in the GWL contains the entire deference region in the NLB, we must also show
that the principal will still choose $x_1 = b$ even when $\theta_1 \in [\tfrac{1}{2},\, 1 - \bar\theta(1 - \theta_2)]$ and she rigidly
implements in the subgame following $x_1 = b$. If $\theta_2$ is such that $\tfrac{1}{2} \geq 1 - \bar\theta(1 - \theta_2)$ then
this region is empty. If $1 - \bar\theta(1 - \theta_2) > \tfrac{1}{2}$, then we require that a principal with beliefs
$\theta_1 = 1 - \bar\theta(1 - \theta_2)$ weakly prefers rigidly implementing b to experimenting with a. If this
holds then by preference monotonicity a principal with beliefs $\theta_1 \in [\tfrac{1}{2},\, 1 - \bar\theta(1 - \theta_2)]$ who
would rigidly implement b if chosen also prefers that to experimenting with a, and so selects
b initially. If it fails, then for some principal beliefs a little bit below $1 - \bar\theta(1 - \theta_2) < 1 - \theta_2$,
the principal will experiment with a in the GWL when she would have deferred in the NLB.
Finally, we prove that a principal with beliefs $\theta_1 = 1 - \bar\theta(1 - \theta_2)$ weakly prefers rigidly
implementing b to experimenting with a in Lemma 5 in the Supplemental Proofs with the
aid of Mathematica, if and only if $\lambda > \underline\lambda \approx .23505$.
Part 2 We first argue that the following three conditions on $\theta_2$ are jointly sufficient for the
principal to always defer in the first period regardless of her own beliefs:
1) $V^E_1(0, 1 - \theta_2) \geq V^R_1(1, \theta_2)$,
2) $V^R_1(1, \theta_2) > V^E_1(1, \theta_2)$,
3) $V^R_1\left(\bar\theta(1 - \theta_2), 1 - \theta_2\right) > V^E_1\left(1 - \bar\theta(1 - \theta_2), \theta_2\right)$.
Conditions (1) and (2) jointly imply that experimenting with b is better than both
experimenting with and rigidly implementing a when $\theta_1 = 1$; by preference monotonicity
(proved in Lemma 3 in the Supplemental Proofs) this also implies that experimenting with
b is better $\forall \theta_1 \in [0, 1]$. Thus, whenever experimenting with b is an equilibrium strategy
($\theta_1 \geq 1 - \bar\theta(1 - \theta_2)$) it is chosen. Now whenever experimenting with b is not an equilibrium
strategy ($\theta_1 < 1 - \bar\theta(1 - \theta_2)$), the principal compares rigidly implementing b to experimenting
with a; again applying preference monotonicity, condition (3) implies that she prefers
the former $\forall \theta_1 \leq 1 - \bar\theta(1 - \theta_2)$. All possible principal beliefs are covered, which completes
the argument. We next argue that condition (1) is necessary for the principal to always
defer regardless of her own beliefs; if it fails then $V^R_1(1, \theta_2) > V^E_1(0, 1 - \theta_2)$. For $\theta_1 > \bar\theta(\theta_2)$
the principal would rigidly implement a and experiment with b, and by continuity she also
prefers rigidly implementing a to experimenting with b for $\theta_1$ sufficiently close to 1. Thus
for such $\theta_1$ she selects $x_1 = a$ in equilibrium and does not defer.
Finally, Lemma 6 in the Supplemental Proofs proves analytically that when $\lambda > \hat\lambda$ (where
$\hat\lambda$ is the unique solution to $\lambda(1 + \lambda)\left(1 + \frac{\lambda}{2}\right) = 1$ and $\approx .5214$), each of the three conditions
$k \in \{1, 2, 3\}$ holds for $\theta_2$ in a nonempty interval $(0, \varepsilon_k)$.²⁷ Since they then all hold for
$\theta_2 \in (0, \min_k \varepsilon_k)$, $\lambda > \hat\lambda$ is therefore sufficient for existence of a range of $\theta_2$ where the principal
always defers. In addition, the supplemental Mathematica code verifies that condition 1 fails
$\forall \theta_2 > 0$ when $\lambda \leq \hat\lambda$, which is equivalent to
$$\frac{V^E_1(0, 1 - \theta_2) - V^R_1(1, \theta_2)}{\lambda\theta_2} < 0 \quad \forall \theta_2 \in \left[0, \tfrac{1}{2}\right] \text{ when } \lambda < \hat\lambda.$$
$\lambda > \hat\lambda$ is thus also necessary for existence of a range of $\theta_2$ where the principal always defers.
C Underexperimentation and Commitment
In this section we formally state, and then prove, several verbal statements in Section 5.2
about the principal sometimes “underexperimenting,” and institutional arrangements that
can help the principal solve this commitment problem.
The first result pertains to the possibility of “underexperimentation” in equilibrium. Formally,
we say that the principal underexperiments if she rigidly implements a policy $x_1$ on
the equilibrium path of play, but a strategy of experimenting with some policy $x_1$ or $\neg x_1$
would yield higher ex-ante expected utility. In other words, she underexperiments if
she implements a policy in equilibrium, but would experiment with some policy if she could
commit to her entire two period strategy ex-ante.

²⁷ Note that we have already shown that property (3) holds $\forall \theta_2$ when $\lambda > \underline\lambda$ in Lemma 5; however, the proof is computational. In Lemma 6 we prove analytically that property (3) holds for sufficiently low $\theta_2$ for any $\lambda \in [0, 1]$.
Underexperimentation happens in the model because it is possible that a principal with
relatively strong beliefs in favor of a policy would be better off ex-ante committing to exper-
iment with that policy in order to better motivate the agent, but after actually observing
failure she will want to renege on the experiment and persist with the initial policy. The agent
will anticipate this rigidity and work accordingly, and experimentation will collapse in equilibrium.
Formally, we have the following result.
Proposition C.1. For some beliefs (θ1, θ2) the principal underexperiments. Conversely, the
principal never experiments with a policy in equilibrium when rigidly implementing some
policy would yield higher ex-ante expected utility.
The next result considers an institutional arrangement that can eliminate underexperi-
mentation, and in particular that will result in the principal’s “optimal policy experiment”
becoming the equilibrium outcome. The phrase “optimal policy experiment” refers to the
policy experiment that would yield the highest ex-ante expected utility if the principal could
commit to her strategy ex-ante. The result states that creating exogenous “costs to rigidity”
can induce the optimal policy experiment when she underexperiments.
Proposition C.2. For any beliefs (θ1, θ2) s.t. the principal underexperiments, there is a cost
c of maintaining policy after failure that makes her optimal policy experiment an equilibrium.
The final result pertains to a model variant in which the principal can first “appoint”
a player with different beliefs $\tilde\theta_1 \in [0, 1]$ to make decisions in her place, and that player
and the agent will then play the equilibrium that is best for the “original” principal with
beliefs $\theta_1$. In particular, we look for conditions under which the principal is strictly better off
appointing somebody with beliefs that differ from her own. This yields the following result.
Proposition C.3. Suppose that a principal with beliefs $\theta_1$ could appoint a player with beliefs
$\tilde\theta_1$ to make policy decisions in her place. If appointing herself is not optimal, then any optimal
appointee $\theta_1^*$ believes less strongly in the resulting policy $x_1^*$ than the principal does.
The intuition here is that the principal would appoint somebody with different beliefs
when she would like to commit ex-ante to experiment with some policy x∗1, but her beliefs
are such that she must rigidly implement x∗1 whenever she selects it, and so in equilibrium
she either rigidly implements x∗1 or experiments with ¬x∗1. The optimal appointee will
be anybody whose beliefs allow her to experiment with x∗1, and this must necessarily be
somebody who believes less strongly in it.
Proofs of underexperimentation and commitment
Proof of Proposition C.1 First, there is never overexperimentation in equilibrium, i.e.,
the principal never experiments with x1 when rigidly implementing either x1 or ¬x1 would
be better. The former is immediately ruled out by Lemma 2. The latter is also ruled out
– if rigidly implementing ¬x1 were optimal with commitment then it must be better than
experimenting with ¬x1, and by implication the unique equilibrium of the subgame following
¬x1 without commitment; thus, the principal failing to choose it would be a contradiction.
Next, we show there $\exists\, (\theta_1, \theta_2)$ s.t. underexperimentation occurs, i.e. the principal rigidly
implements $x_1$ when experimenting would be better. In part 3 of Lemma 6 in the Supplemental
Proofs we show there exists a nonempty interval of $\theta_2$ s.t. the principal prefers rigidly
implementing b to experimenting with a when $\theta_1 = 1 - \bar\theta(1 - \theta_2)$. By continuity, rigidly
implementing b is thus the equilibrium outcome for $\theta_1 = 1 - \bar\theta(1 - \theta_2) - \varepsilon$ when $\varepsilon > 0$ is
sufficiently close to 0, and it is also worse than experimenting with b since the agent’s effort
drops discretely. Formally,
$$\begin{aligned}
V^E_1\left(\bar\theta(1 - \theta_2), 1 - \theta_2\right) &= U_1\left(e^E(1 - \theta_2), \bar\theta(1 - \theta_2), 1 - \theta_2, 1, 0\right) \\
&= U_1\left(e^E(1 - \theta_2), \bar\theta(1 - \theta_2), 1 - \theta_2, 1, 1\right) \\
&> U_1\left(e^R(1 - \theta_2), \bar\theta(1 - \theta_2), 1 - \theta_2, 1, 1\right) = V^R_1\left(\bar\theta(1 - \theta_2), 1 - \theta_2\right).
\end{aligned}$$
The first and last equalities follow from the definitions, the second follows from the definition
of $\bar\theta(\cdot)$, and the inequality from $U_1(\cdot)$ increasing in $e_1$ and $e^E(1 - \theta_2) > e^R(1 - \theta_2)$. □
Proof of Proposition C.2 If there were an exogenous cost $c > 0$ of maintaining policy
after failure, then for experimentation to fail to be an equilibrium of the subgame following
policy $x$ requires that the principal prefer to reselect $x$ given initial effort $e^E(P_2(\omega = x))$,
the players’ resulting posterior beliefs, and the cost $c$. This condition is,
$$h\left(e^E(P_2(\omega = x)), P_1(\omega = x)\right) - \frac{c}{\lambda} > 1 - h\left(e^E(P_2(\omega = x)), P_2(\omega = x)\right).$$
Now suppose the principal underexperiments by rigidly implementing policy $x_1^* = a$
when $c = 0$. Experimenting must then be the unique equilibrium of the subgame following
b, it will remain so with any higher cost $c > 0$, and also rigidly implementing a is better
than experimenting with b. So experimenting with a must be the optimal policy experiment,
but not an equilibrium of the subgame following a without commitment. However, it will
become one when
$$c \geq \lambda\left(h\left(e^E(\theta_2), \theta_1\right) - \left(1 - h\left(e^E(\theta_2), \theta_2\right)\right)\right),$$
and so the principal will select it in equilibrium. A symmetric argument holds when the
principal underexperiments by rigidly implementing b. □
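The cost bound in this proof can be checked with a short calculation. The sketch below treats the experimentation effort as a given number instead of solving the agent’s problem, and assumes the failure posterior $h(e, \theta) = \theta(1-e)/(1-\theta e)$. The function names and parameter values are hypothetical and chosen only for illustration.

```python
def h(e, theta):
    """ASSUMED posterior on the policy after a failure with effort e and prior theta."""
    return theta * (1.0 - e) / (1.0 - theta * e)


def min_rigidity_cost(e_exp, theta1, theta2, lam):
    """Smallest cost c at which the principal no longer prefers to keep the policy."""
    return lam * (h(e_exp, theta1) - (1.0 - h(e_exp, theta2)))


def principal_reneges(e_exp, theta1, theta2, lam, c):
    """True if keeping the policy after failure is still strictly better despite cost c."""
    return h(e_exp, theta1) - c / lam > 1.0 - h(e_exp, theta2)


# Illustrative values only (not taken from the paper): a principal who
# believes strongly in policy a and an agent who is less convinced.
e_exp, theta1, theta2, lam = 0.5, 0.9, 0.6, 0.5
c_min = min_rigidity_cost(e_exp, theta1, theta2, lam)
print(c_min)
print(principal_reneges(e_exp, theta1, theta2, lam, 0.0))          # no cost: she reneges
print(principal_reneges(e_exp, theta1, theta2, lam, c_min + 1e-6)) # cost above bound: she does not
```

A cost just above the bound removes the incentive to persist after failure, so experimentation becomes credible in the subgame.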
Proof of Proposition C.3 Suppose appointing herself is not optimal; then for a principal
with beliefs $\theta_1$, the resulting equilibrium $(x^*, s^*)$ is strictly worse than the equilibrium that
would result if an optimal appointee with beliefs $\tilde\theta_1$ were making policy decisions. Denote
this equilibrium $(\tilde x, \tilde s)$. First note $\tilde s$ must not be an equilibrium experimentation decision in the
subgame following $\tilde x$ for $\theta_1$ (since otherwise the principal would choose it). We next argue
that $\tilde s = E$. If $\tilde s = R$ then by implication experimenting must be an equilibrium of the
subgame following $x_1 = \tilde x$ for $\theta_1$; but then by Lemma 2 it is also strictly better than rigidly
implementing $\tilde x$ and we have a contradiction. Finally, since $(\tilde x, \tilde s = E)$ is an equilibrium of
the subgame following $\tilde x$ for $\tilde\theta_1$ but not $\theta_1$, it follows that
$$h\left(e^E(P_2(\omega = \tilde x)), \tilde P_1(\omega = \tilde x)\right) < 1 - h\left(e^E(P_2(\omega = \tilde x)), P_2(\omega = \tilde x)\right) < h\left(e^E(P_2(\omega = \tilde x)), P_1(\omega = \tilde x)\right),$$
implying $\tilde P_1(\omega = \tilde x) < P_1(\omega = \tilde x)$. □
D Supplemental Proofs
In this section we prove a sequence of lemmas that are employed in the previous proofs.
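Lemma 1. Say that the agent’s problem is “well-behaved” when $e^*(\theta, \eta_1, \eta_2) = \arg\max_{e \in [0,1]} U(e, \theta, \eta_1, \eta_2)$
is unique and $\in (0, 1)$. The set of $\lambda$ s.t. the agent’s problem is well-behaved $\forall\, (\theta \in (0, 1), \eta_1, \eta_2)$ is
an interval $\lambda \in [0, \bar\lambda)$, where $\bar\lambda \in \left(\frac{\sqrt{5} - 1}{2}, \sqrt{3} - 1\right)$ and is $\approx .68466$.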
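Proof: First, it is simple to verify that the derivative of the agent’s objective function
$U(e, \theta, \eta_1, \eta_2)$ from the proof of Propositions 1 and 2 in the main appendix is:
$$\frac{\partial U(e, \theta, \eta_1, \eta_2)}{\partial e} = \frac{1}{\lambda}\left(-e + \lambda\theta\left(1 + \frac{\lambda}{2}\,\eta_1\left[k(e, \theta) + \eta_2\right]\right)\right),$$
where $k(e, \theta) = (1 - h(e, \theta))^2$.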
Second, observe $\frac{\partial U}{\partial e} > 0$ at $e = 0$, and that $\frac{\partial U}{\partial e}$ is convex in $e$, which follows from the
convexity of k (e, θ). The set of maximizers is thus either a singleton in the interior (the first
and possibly only point where the FOC is satisfied), a singleton on the boundary e = 1, or a
pair where one of the two elements is e = 1. This further implies that whenever the problem
is not “well behaved,” e = 1 is a maximizer.
Third, observe that $\frac{\partial^2 U}{\partial e\, \partial \lambda} = \frac{e}{\lambda^2} + \frac{\theta}{2}\,\eta_1\left(k(e, \theta) + \eta_2\right) > 0$. Thus by Milgrom and Shannon
(1994) the set of maximizers of $U(\cdot)$ is weakly increasing in $\lambda$. This implies that the set of $\lambda$
s.t. the problem is well behaved for a given $(\eta_1, \eta_2, \theta)$ is an interval; if it were well behaved
for $\lambda'$ but not $\lambda'' < \lambda'$, then $e = 1$ would be a maximizer for the latter but not the former,
contradicting weak set increasingness. Lastly, this implies that the set of $\lambda$ that are well
behaved for all feasible parameters $\forall\, (\theta \in (0, 1), \eta_1, \eta_2)$ is also an interval $[0, \bar\lambda)$; if it were
not then it would also not be an interval for some specific profile of parameters $(\eta_1, \eta_2, \theta)$, a
contradiction.
We can bound $\bar\lambda$ below and above analytically, and compute an estimated value
numerically. We must have $\bar\lambda > \frac{\sqrt{5} - 1}{2}$ since the problem is strictly concave for all feasible parameters
(and by implication well behaved) when
$$\left.\frac{\partial U}{\partial e}\right|_{e = 1} = -\frac{1}{\lambda} + \theta\left(1 + \frac{\lambda}{2}\,\eta_1(1 + \eta_2)\right) \leq -\frac{1}{\lambda} + (1 + \lambda) < 0,$$
which holds i.f.f. $\lambda < \frac{\sqrt{5} - 1}{2}$. We also must have $\bar\lambda < \sqrt{3} - 1$ since best-response effort
at $\theta = 1$ when the principal experiments is $e^*(1, 1, 1) = \lambda\left(1 + \frac{\lambda}{2}\right) < 1$ i.f.f. $\lambda < \sqrt{3} - 1$.
In the supplemental Mathematica code to this document, we verify that $\bar\lambda \approx .68466$. Since
the set of maximizers is weakly increasing in $\eta_2$ (by $\frac{\partial^2 U}{\partial e\, \partial \eta_2} = \lambda\theta > 0$), to find $\bar\lambda$ it suffices to
check that the problem is well-behaved $\forall \theta$ when $\eta_2 = 1$. We thus identify $\bar\lambda$ by computing the
highest $\lambda$ s.t. $\forall \theta \in [0, 1]$, the agent’s utility at the lowest solution to the first-order condition
is greater than his utility from $e = 1$. □
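The comparison described at the end of this proof can be sketched on a grid. This is an illustrative approximation, not the authors’ Mathematica code. It assumes $h(e, \theta) = \theta(1-e)/(1-\theta e)$, sets $\eta_1 = \eta_2 = 1$, and uses the fact that the utility difference between $e = 1$ and the lowest first-order-condition solution equals the integral of $\partial U / \partial e$ between them, so the objective function itself is not needed. Grid sizes and function names are assumptions.

```python
def k(e, theta):
    """k(e, theta) = (1 - h)^2 under the ASSUMED posterior h; used for theta < 1 only."""
    return ((1.0 - theta) / (1.0 - theta * e)) ** 2


def foc(e, theta, lam, eta1=1.0, eta2=1.0):
    """lam * dU/de, which has the same sign as dU/de."""
    return -e + lam * theta * (1.0 + 0.5 * lam * eta1 * (k(e, theta) + eta2))


def well_behaved_at(theta, lam, n=4000):
    """Grid check: an interior FOC solution exists and gives more utility than e = 1."""
    vals = [foc(i / n, theta, lam) for i in range(n + 1)]
    first = next((i for i, v in enumerate(vals) if v <= 0.0), None)
    if first is None:
        return False  # dU/de > 0 on the whole grid, so e = 1 maximizes
    # Trapezoid rule for the integral of foc from the first crossing to e = 1;
    # this is proportional to U(1) minus U at the lowest FOC solution.
    gain = sum(0.5 * (vals[i] + vals[i + 1]) / n for i in range(first, n))
    return gain < 0.0


def well_behaved(lam, n_theta=199):
    """Check every theta on an interior grid of (0, 1)."""
    return all(well_behaved_at(j / (n_theta + 1), lam) for j in range(1, n_theta + 1))


lower = (5 ** 0.5 - 1) / 2   # analytical lower bound from the proof
upper = 3 ** 0.5 - 1         # analytical upper bound from the proof
print(lower, upper)
print(well_behaved(0.5), well_behaved(0.8))
```

Values of $\lambda$ below the lower bound pass the check and values above the upper bound fail it; locating the exact cutoff would require a finer search between the two bounds.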
Lemma 2. The following two statements hold when the principal can play mixed strategies.
(i) Whenever experimenting is an equilibrium of the subgame following initial policy x1 ∈
{a, b}, then it is also the optimal (pure or mixed) strategy for the principal if she could
precommit to her responses to success and failure.
(ii) Whenever experimenting with x1 = a is not an equilibrium of the subgame following
x1 ∈ {a, b}, then the unique equilibrium is rigid implementation.
(Part 1) Because of symmetry we can restrict attention to the subgame following $x_1 = a$.
We first must characterize the agent’s best response effort $\arg\max_{e \in [0,1]} \{U_2(e, \theta_2, p_s, p_f)\}$ to a
general mixed strategy by the principal in the Game with Learning as characterized in
Appendix A. Taking the derivative of equation (A.1) w.r.t. $e$ yields:
$$\frac{\partial U_2(e, \theta, p_s, p_f)}{\partial e} = \frac{1}{\lambda}\left(-e + \lambda\theta\left(1 + \frac{\lambda}{2}\left[k(e, \theta) + (p_s - p_f)\right]\right)\right).$$
Now, it is easily verified that $\frac{\partial U_2(e, \theta, p_s, p_f)}{\partial e} = \frac{\partial U(e, \theta, 1, p_s - p_f)}{\partial e}$, where $U(\cdot)$ is the form of the
agent’s objective function that we used in the Main Appendix, which amalgamated payoffs
from the NLB and the GWL, but did not account for principal mixed strategies. This means
we can “piggyback” off of the analysis of that problem in Lemma 1. First, it remains true that
the agent’s problem is well behaved $\forall\, (\theta, p_s, p_f)$ when $\lambda < \bar\lambda$. Second, when $\lambda < \bar\lambda$ we have
that (i) $\arg\max_{e \in [0,1]} \{U_2(e, \theta_2, p_s, p_f)\} = e^*(\theta, 1, p_s - p_f)$ where $e^*(\theta, \eta_1, \eta_2)$ is the maximizer of
$U(e, \theta, \eta_1, \eta_2)$, and (ii) $e^*(\theta, 1, p_s - p_f)$ is strictly increasing in $p_s$ and strictly decreasing in $p_f$
by $\frac{\partial^2 U}{\partial e\, \partial \eta_2} = \lambda\theta > 0$ and Theorem 1 of Edlin and Shannon (1998). For notational simplicity,
for the remainder of the proof we will write $e^*(\theta, 1, p_s - p_f)$ as $e(\theta, p_s - p_f)$ so as not to carry
around unnecessary terms.
(Part 2) We now prove (i). Employing the characterization in Appendix A, the prin-
cipal’s utility from choosing x1 = a if she could precommit to her responses to success and
failure (ps, pf ) would be U1 (e (θ2, ps − pf ) , θ1, θ2, ps, pf ). It is easily verified that U1 (·) is
strictly increasing in ps holding first period effort e fixed, and increasing in e when ps = 1.
Also recall that e (θ2, ps − pf ) is increasing in ps. Hence,
U1 (e (θ2, ps − pf ) , θ1, θ2, ps, pf ) < U1 (e (θ2, ps − pf ) , θ1, θ2, 1, pf ) < U1 (e (θ2, 1− pf ) , θ1, θ2, 1, pf )
and the optimal choice after success is therefore to always stay, i.e. p∗s = 1. This feature
is shared with the baseline model without commitment. Intuitively, the reason is that a
higher probability of staying with the initial policy after success is both interim-better for
the principal, and also better motivates the agent ex-ante.
Given the above analysis, the principal’s optimal choice after policy failure satisfies
$p^*_f \in \arg\max_{p_f \in [0,1]} \{U_1(e(\theta_2, 1 - p_f), \theta_1, \theta_2, 1, p_f)\}$. Now her utility $U_1(\cdot)$ can be rewritten as,
$$\theta_1 e \cdot (1 + \lambda p_s) + (1 - \theta_1 e) \cdot \lambda\left(p_f \cdot h(e, \theta_1)\, h(e, \theta_2) + (1 - p_f) \cdot (1 - h(e, \theta_1))(1 - h(e, \theta_2))\right),$$
and it is easily verified that this is decreasing in $p_f$ whenever $h(e, \theta_1) \leq 1 - h(e, \theta_2)$, i.e., if
posteriors after failure are s.t. it is better to switch. If experimenting is an equilibrium of
the subgame $x_1 = a$ then by definition this property holds for $e = e(\theta_2, 1)$ (see equilibrium
conditions in Appendix A). In addition, recall that $U_1(\cdot)$ is increasing in $e$ when $p_s = 1$ and
that $e(\theta_2, 1 - p_f)$ is decreasing in $p_f$. Combining these observations yields,
$$U_1(e(\theta_2, 1), \theta_1, \theta_2, 1, 0) > U_1(e(\theta_2, 1), \theta_1, \theta_2, 1, p_f) > U_1(e(\theta_2, 1 - p_f), \theta_1, \theta_2, 1, p_f)$$
whenever experimenting is an equilibrium. Consequently $(p^*_s = 1, p^*_f = 0)$, i.e. experimenting,
is the optimal strategy with commitment.
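The monotonicity claim used in Part 2 can be illustrated directly from the rewritten utility expression. The sketch below holds effort fixed, so it isolates only the direct effect of $p_f$ and not the motivational effect through $e(\theta_2, 1 - p_f)$. It assumes the failure posterior $h(e, \theta) = \theta(1-e)/(1-\theta e)$, and the function names and numbers are illustrative.

```python
def h(e, theta):
    """ASSUMED posterior on the policy after a failure with effort e and prior theta."""
    return theta * (1.0 - e) / (1.0 - theta * e)


def U1(e, theta1, theta2, ps, pf, lam):
    """Principal's utility as rewritten in Part 2, with first-period effort e held fixed."""
    h1, h2 = h(e, theta1), h(e, theta2)
    after_success = theta1 * e * (1.0 + lam * ps)
    after_failure = (1.0 - theta1 * e) * lam * (
        pf * h1 * h2 + (1.0 - pf) * (1.0 - h1) * (1.0 - h2)
    )
    return after_success + after_failure


def prefers_switch(e, theta1, theta2):
    """Condition h(e, theta1) <= 1 - h(e, theta2) under which switching is interim-better."""
    return h(e, theta1) <= 1.0 - h(e, theta2)


# Illustrative values only (not taken from the paper).
lam, e = 0.5, 0.5
for theta1, theta2 in [(0.3, 0.4), (0.9, 0.8)]:
    switch = prefers_switch(e, theta1, theta2)
    stay_value = U1(e, theta1, theta2, 1.0, 1.0, lam)
    switch_value = U1(e, theta1, theta2, 1.0, 0.0, lam)
    print(theta1, theta2, switch, switch_value > stay_value)
```

Since the expression is linear in $p_f$, comparing its values at $p_f = 0$ and $p_f = 1$ is enough to sign the slope.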
(Part 3) We now prove (ii). If experimentation is not an equilibrium of the subgame
following $x_1 = a$, then by the equilibrium characterization in Appendix A,
$h\left(e(\theta_2, 1 - p^a_f), \theta_1\right) > 1 - h\left(e(\theta_2, 1 - p^a_f), \theta_2\right)$ when $p^a_f = 0$. Another equilibrium with $p^a_f > 0$ would require that
the l.h.s. $\leq$ r.h.s., but this cannot be since $e(\theta_2, 1 - p^a_f)$ is decreasing in $p^a_f$, so the l.h.s. is
increasing and the r.h.s. is decreasing. □
Lemma 3 (Preference Monotonicity). For all $(s, s') \in \{R, E\}^2$,
$$V^s_1\left(\hat\theta_1, \theta_2\right) > V^{s'}_1\left(1 - \hat\theta_1, 1 - \theta_2\right) \;\rightarrow\; V^s_1(\theta_1, \theta_2) > V^{s'}_1(1 - \theta_1, 1 - \theta_2) \text{ for all } \theta_1 > \hat\theta_1.$$
In words, fixing the principal’s experimentation decisions down each path of play, if she
prefers $x_1 = a$ given beliefs $\hat\theta_1$ then she also prefers it for any higher belief.
Proof: Because the principal’s expected utility for each first period policy is linear in her
prior beliefs (holding her future experimentation decisions fixed), a nonmonotonicity would
imply that $x_1 = b$ is better when $\omega = a$ and $x_1 = a$ is better when $\omega = b$. The former could
not be true if she is rigidly implementing b since it would always fail, and the latter could
not be true if she is rigidly implementing a. Thus, a nonmonotonicity requires that she be
experimenting down both paths of play. To rule it out, it therefore suffices to show that
experimenting with b is better than experimenting with a when $\omega = b$ (given the agent is
predisposed to b, i.e. $\theta_2 \leq \tfrac{1}{2}$). This is,
$$(1 + \lambda)\, e^E(1 - \theta_2) > \lambda\left(1 - h\left(e^E(\theta_2), \theta_2\right)\right).$$
Applying the definition from Appendix A, we know $e^E(1 - \theta) > \lambda(1 - \theta)$ → the l.h.s. is
$> (1 + \lambda)\lambda(1 - \theta_2)$. Also, $e^E(\theta_2) < \lambda\theta_2(1 + \lambda)$ → the r.h.s. $< \frac{\lambda(1 - \theta_2)}{1 - \theta_2^2 \lambda(1 + \lambda)}$. The above
inequality will thus hold when
$$(1 + \lambda)\lambda(1 - \theta_2) > \frac{\lambda(1 - \theta_2)}{1 - \theta_2^2 \lambda(1 + \lambda)} \iff (1 + \lambda)^2 < \frac{1}{\theta_2^2}.$$
The l.h.s. is < the r.h.s. $\forall \theta_2 \leq \tfrac{1}{2}$ when $\lambda < 1$, which always holds by assumption since
$\lambda < \bar\lambda < 1$. □
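Lemma 4. The principal strictly prefers experimenting with b to experimenting with a when
$\theta_1 = 1 - \theta_2$ and $\theta_2 < \tfrac{1}{2}$.
Proof: Let
$$\phi(\theta_2) = \lambda\theta_2\left(1 + \frac{\lambda}{2}\left(k\left(e^E(\theta_2), \theta_2\right) + 1\right)\right)(1 + \lambda) - \lambda\left(1 - h\left(e^E(1 - \theta_2), 1 - \theta_2\right)\right);$$
this is the principal’s utility difference between experimenting with a and experimenting with
b when $\omega = a$. Applying symmetry, her expected utility difference between experimenting
with a and experimenting with b with prior $\theta_1$ is $\theta_1 \phi(\theta_2) - (1 - \theta_1)\phi(1 - \theta_2)$. We now wish
to show that this is $< 0$ when $\theta_1 = 1 - \theta_2$ given that $\theta_2 < \tfrac{1}{2}$, i.e.
$$(1 - \theta_2)\phi(\theta_2) - \theta_2\phi(1 - \theta_2) < 0 \iff \frac{\phi(\theta_2)}{\lambda\theta_2} - \frac{\phi(1 - \theta_2)}{\lambda(1 - \theta_2)} < 0 \text{ when } \theta_2 < \tfrac{1}{2}.$$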
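It is simple to verify that
$$\frac{\phi(\theta_2)}{\lambda\theta_2} = \left(1 + \frac{\lambda}{2}\left(k\left(e^E(\theta_2), \theta_2\right) + 1\right)\right)(1 + \lambda) - \frac{1}{1 - e^E(1 - \theta_2) \cdot (1 - \theta_2)},$$
and by symmetry it suffices to show that $\frac{\phi(\theta_2)}{\lambda\theta_2} - \frac{\phi(1 - \theta_2)}{\lambda(1 - \theta_2)} > 0$ when $\theta_2 > \tfrac{1}{2}$. Using substitution
and rearranging, we then have that $\frac{\phi(\theta_2)}{\lambda\theta_2} - \frac{\phi(1 - \theta_2)}{\lambda(1 - \theta_2)} > 0 \iff$
$$\frac{\theta_2 e^E(\theta_2) - (1 - \theta_2)\, e^E(1 - \theta_2)}{\left(1 - \theta_2 e^E(\theta_2)\right) \cdot \left(1 - (1 - \theta_2)\, e^E(1 - \theta_2)\right)} > \frac{\lambda}{2}(1 + \lambda)\left(k\left(e^E(1 - \theta_2), 1 - \theta_2\right) - k\left(e^E(\theta_2), \theta_2\right)\right) \tag{D.1}$$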
Since the denominator of the l.h.s. is $< 1$, and $1 + \lambda < 2$, the above inequality holds if the
following yet stronger inequality holds,
$$\theta_2 e^E(\theta_2) - (1 - \theta_2)\, e^E(1 - \theta_2) > \lambda\left(k\left(e^E(1 - \theta_2), 1 - \theta_2\right) - k\left(e^E(\theta_2), \theta_2\right)\right). \tag{D.2}$$
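Now again substituting in the definition of $e^E(\cdot)$, the l.h.s. can be rewritten
$$\lambda\left((2\theta_2 - 1)\left(1 + \frac{\lambda}{2}\right) + \frac{\lambda}{2}\left(\theta_2^2\, k\left(e^E(\theta_2), \theta_2\right) - (1 - \theta_2)^2\, k\left(e^E(1 - \theta_2), 1 - \theta_2\right)\right)\right),$$
implying that the desired inequality holds i.f.f.
$$(2\theta_2 - 1)\left(1 + \frac{\lambda}{2}\right) > \left(1 + \frac{\lambda}{2}(1 - \theta_2)^2\right) k\left(e^E(1 - \theta_2), 1 - \theta_2\right) - \left(1 + \frac{\lambda}{2}\theta_2^2\right) k\left(e^E(\theta_2), \theta_2\right)$$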
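It is easily verified that the r.h.s. $< \left(1 + \frac{\lambda}{2}\right)\left(k\left(e^E(1 - \theta_2), 1 - \theta_2\right) - k\left(e^E(\theta_2), \theta_2\right)\right)$ when
$\theta_2 > \tfrac{1}{2}$, implying that the above inequality holds if the following yet stronger inequality
holds:
$$2\theta_2 - 1 > k\left(e^E(1 - \theta_2), 1 - \theta_2\right) - k\left(e^E(\theta_2), \theta_2\right) \tag{D.3}$$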
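We now show that eqn. (D.3) holds. Substituting in the definition of $k(\cdot)$, observing that
$2\theta_2 - 1 = \theta_2^2 - (1 - \theta_2)^2$, and rearranging, we have that the inequality is equivalent to
$$\theta_2^2\left(1 - \frac{1}{\left(1 - (1 - \theta_2)\, e^E(1 - \theta_2)\right)^2}\right) > (1 - \theta_2)^2\left(1 - \frac{1}{\left(1 - \theta_2 e^E(\theta_2)\right)^2}\right)$$
Since $e^E(\theta_2)$ is increasing in $\theta_2$, this clearly holds because $(1 - \theta_2)\, e^E(1 - \theta_2) < \theta_2 e^E(\theta_2)$
when $\theta_2 > \tfrac{1}{2}$. The desired property is hence shown. □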
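Lemma 5. For all $\theta_2$ s.t. $1 - \bar\theta(1 - \theta_2) > \tfrac{1}{2}$, the principal weakly prefers rigidly implementing
b to experimenting with a when $\theta_1 = 1 - \bar\theta(1 - \theta_2)$.
Proof: Observe that $1 - \bar\theta(1 - \theta_2)$ is $= 1$ at $\theta_2 = 0$, less than $\tfrac{1}{2}$ at $\theta_2 = \tfrac{1}{2}$, and strictly
decreasing in $\theta_2$. Thus, there exists some unique $\hat\theta_2$ satisfying $1 - \bar\theta(1 - \hat\theta_2) = \tfrac{1}{2}$ s.t. the set
of beliefs $[\tfrac{1}{2},\, 1 - \bar\theta(1 - \theta_2)]$ where the principal would rigidly implement b in that subgame
is nonempty if and only if $\theta_2 < \hat\theta_2$. Note that $\hat\theta_2$ is a function of $\lambda$ so we henceforth write
$\hat\theta_2(\lambda)$ for clarity. Now applying the definitions and rearranging, we wish to show that
$\forall \lambda \in [\underline\lambda, \bar\lambda]$,
$$\bar\theta(1 - \theta_2) \cdot \left(V^R_1(1, 1 - \theta_2) - V^E_1(0, \theta_2)\right) > \left(1 - \bar\theta(1 - \theta_2)\right) \cdot V^E_1(1, \theta_2) \quad \forall \theta_2 \in \left[0, \hat\theta_2(\lambda)\right],$$
where $\underline\lambda \approx .23505$. We verify this step in the supplemental Mathematica code. □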
Lemma 6. The following three properties hold.
1. When $\lambda > \hat\lambda$, $V^E_1(0, 1 - \theta_2) - V^R_1(1, \theta_2) > 0$ for $\theta_2$ in a nonempty interval $[0, \varepsilon_1]$.
2. When $\lambda \in [0, 1]$, $V^R_1(1, \theta_2) - V^E_1(1, \theta_2) > 0$ for $\theta_2$ in a nonempty interval $[0, \varepsilon_2]$.
3. When $\lambda \in [0, 1]$, $V^R_1\left(\bar\theta(1 - \theta_2), 1 - \theta_2\right) - V^E_1\left(1 - \bar\theta(1 - \theta_2), \theta_2\right) > 0$ for $\theta_2$ in a
nonempty interval $[0, \varepsilon_3]$.
To show that some function $f(\theta_2)$ satisfying $f(0) = 0$ is $> 0$ for $\theta_2 \in (0, \varepsilon)$ where $\varepsilon > 0$,
it suffices to show by continuity that $\left.\frac{f(\theta_2)}{\theta_2}\right|_{\theta_2 = 0} > 0$ provided that this quantity is finite. We
now show this for each of the desired expressions.
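Property 1: When $\omega = a$, the principal’s utility from experimenting with b is,
$$\lambda\left(1 - h\left(e^E(1 - \theta_2), 1 - \theta_2\right)\right) = \frac{\lambda\theta_2}{1 - (1 - \theta_2)\, e^E(1 - \theta_2)},$$
and from rigidly implementing a is,
$$e^R(\theta_2)(1 + \lambda) + \left(1 - e^R(\theta_2)\right)\lambda\, h\left(e^R(\theta_2), \theta_2\right) = \lambda\theta_2\left((1 + \lambda)\left(1 + \frac{\lambda}{2}\, k\left(e^R(\theta_2), \theta_2\right)\right) + \frac{\left(1 - e^R(\theta_2)\right)^2}{1 - \theta_2 e^R(\theta_2)}\right)$$
Since $e^R(0) = 0$, $e^E(1) = \lambda\left(1 + \frac{\lambda}{2}\right)$, and $k\left(e^R(0), 0\right) = 1$,
$$\left.\frac{1}{\lambda\theta_2}\left(V^E_1(0, 1 - \theta_2) - V^R_1(1, \theta_2)\right)\right|_{\theta_2 = 0} = \left(\frac{1}{1 - \lambda\left(1 + \frac{\lambda}{2}\right)}\right) - \left((1 + \lambda)\left(1 + \frac{\lambda}{2}\right) + 1\right).$$
Manipulating the above expression demonstrates that it is $> 0$ i.f.f.
$$\lambda(1 + \lambda)\left(1 + \frac{\lambda}{2}\right) > 1, \tag{D.4}$$
which holds i.f.f. $\lambda > \hat\lambda$ by definition.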
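The cutoff implied by (D.4) is easy to reproduce. The sketch below finds the root of $\lambda(1+\lambda)(1+\lambda/2) = 1$ by bisection and confirms that the limit expression for Property 1 changes sign there. The helper names are assumptions, and the limit expression is only meaningful for $\lambda < \sqrt{3} - 1$, where its first denominator is positive.

```python
def d4_gap(lam):
    """lam*(1+lam)*(1+lam/2) - 1; positive exactly when (D.4) holds."""
    return lam * (1.0 + lam) * (1.0 + 0.5 * lam) - 1.0


def limit_expression(lam):
    """Limit at theta2 = 0 of (V^E_1(0,1-theta2) - V^R_1(1,theta2)) / (lam*theta2)."""
    x = lam * (1.0 + 0.5 * lam)
    return 1.0 / (1.0 - x) - ((1.0 + lam) * (1.0 + 0.5 * lam) + 1.0)


def bisect_root(f, lo, hi, n=80):
    """Root of an increasing function f with f(lo) < 0 < f(hi)."""
    for _ in range(n):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam_hat = bisect_root(d4_gap, 0.0, 1.0)
print(round(lam_hat, 4))
print(limit_expression(0.4) < 0.0, limit_expression(0.6) > 0.0)
```

The computed root agrees with the value of roughly .5214 reported in the proof of Proposition 3.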
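Property 2: When $\omega = a$, the principal’s utility from experimenting with a is,
$$e^E(\theta_2)(1 + \lambda) = \lambda\theta_2\left(1 + \frac{\lambda}{2}\left(k\left(e^E(\theta_2), \theta_2\right) + 1\right)\right)(1 + \lambda),$$
Since $k\left(e^E(0), 0\right) = 1$ and applying Part 1, the desired condition is equivalent to
$$(1 + \lambda)\left(1 + \frac{\lambda}{2}\right) + 1 > (1 + \lambda)^2$$
which holds if $\lambda(1 + \lambda) < 2 \iff \lambda < 1$.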
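Property 3: Using the definition in equation (A.3), the threshold function $\bar\theta(\theta_2)$ in closed
form is $\bar\theta(\theta_2) = \frac{1 - \theta_2}{(1 - \theta_2) + \theta_2\left(1 - e^E(\theta_2)\right)^2}$. Thus, when $\theta_1 = 1 - \bar\theta(1 - \theta_2)$ the difference in the
principal’s utility between rigidly implementing b and experimenting with a is,
$$\bar\theta(1 - \theta_2) \cdot \left(V^R_1(1, 1 - \theta_2) - V^E_1(0, \theta_2)\right) - \left(1 - \bar\theta(1 - \theta_2)\right) \cdot V^E_1(1, \theta_2) \tag{D.5}$$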
The first term is the subjective probability that ω = b times the utility difference between
rigidly implementing b and experimenting with a when ω = b. The second term is the
subjective probability that ω = a times the utility of experimenting with a when ω = a (the
payoff from rigidly implementing b when the state is a is 0).
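Now to get the desired expression we divide through by $\theta_2$ and then evaluate at $\theta_2 = 0$;
we do so by dividing $\bar\theta(1 - \theta_2)$ by $\theta_2$ in the first term, then $V^E_1(1, \theta_2)$ by $\theta_2$ in the second
term, and then evaluating all expressions at $\theta_2 = 0$. First,
$$\left(\frac{1}{\theta_2}\right) \cdot \bar\theta(1 - \theta_2) = \frac{1}{\theta_2 + (1 - \theta_2)\left(1 - e^E(1 - \theta_2)\right)^2}$$
which is equal to $\left(1 - \lambda\left(1 + \frac{\lambda}{2}\right)\right)^{-2}$ at $\theta_2 = 0$. Second, $V^R_1(1, 1 - \theta_2) = 2\lambda$, $V^E_1(0, \theta_2) = \lambda$,
and $1 - \bar\theta(1 - \theta_2) = 1$ at $\theta_2 = 0$. Third, from the proof of property 2 we know
$\left(\frac{1}{\theta_2}\right) \cdot V^E_1(1, \theta_2) = \lambda\left(1 + \frac{\lambda}{2}\left(k\left(e^E(\theta_2), \theta_2\right) + 1\right)\right)(1 + \lambda)$ which is $= \lambda(1 + \lambda)^2$ evaluated at $\theta_2 = 0$.
Assembling these observations, the desired expression is
$$\left(1 - \lambda\left(1 + \frac{\lambda}{2}\right)\right)^{-2} \cdot (2\lambda - \lambda) - \lambda(1 + \lambda)^2 > 0 \iff \left(1 - \lambda\left(1 + \frac{\lambda}{2}\right)\right)^2 \cdot (1 + \lambda)^2 < 1$$
Now the l.h.s. is $< (1 - \lambda)^2 \cdot (1 + \lambda)^2 = (1 - \lambda^2)^2 \leq 1 \;\; \forall \lambda \in [0, 1]$, so the result is shown. □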
E Preferences vs. Beliefs
Suppose the state $\omega \in \{a, b\}$ is payoff-irrelevant for the agent, and he shares the principal’s
prior $\theta_1$ over $P(\omega = a)$. Instead, he simply has his own return to effort of $\pi_x e$ when working
on each policy $x \in \{a, b\}$, and $\pi_b > \pi_a$.
In the second period, his objective function from exerting second period effort $e$ on some
policy $x \in \{a, b\}$ is simply $\pi_x e - \frac{e^2}{2\lambda}$ regardless of the history. It is straightforward to show
that his optimal level of effort is $\lambda\pi_x$, and his expected payoff is $\frac{\lambda}{2}\pi_x^2$.
Now consider the first period. Deriving the agent’s expected payoff from exerting first
period effort $e$ when the principal rigidly implements policy a is immediate. His utility when
the principal experiments with a is:
$$-\frac{e^2}{2\lambda} + \pi_a e + \theta_1\left(e\left(\frac{\lambda}{2}\pi_a^2\right) + (1 - e)\left(\frac{\lambda}{2}\pi_b^2\right)\right) + (1 - \theta_1)\left(\frac{\lambda}{2}\pi_b^2\right),$$
which easily reduces to the expression in the main text.
Both objective functions are strictly concave, so the unique solution is given by the first-order
condition. The agent’s effort on a given rigid implementation is simply $\lambda\pi_a$. If the
principal experiments then the derivative of the objective function is
$$-\frac{e}{\lambda} + \pi_a - \frac{\lambda}{2}\theta_1\left(\pi_b^2 - \pi_a^2\right),$$
so optimal effort is $\max\left\{\lambda\pi_a - \frac{\lambda^2}{2}\theta_1\left(\pi_b^2 - \pi_a^2\right), 0\right\}$ and is strictly lower as stated in the text.
Finally, by symmetry the agent’s effort if the principal rigidly implements b is $\lambda\pi_b$, and
if she experiments with b is $\min\left\{\lambda\pi_b + \frac{\lambda^2}{2}(1 - \theta_1)\left(\pi_b^2 - \pi_a^2\right), 1\right\}$. The agent therefore works
harder on a policy experiment with the policy b that he prefers; the “threat” of switching to
his less preferred policy a after failure motivates him, as intuition would suggest.
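The four effort levels in this section can be compared side by side. The sketch below simply evaluates the closed-form expressions above; the function names and the parameter values are illustrative assumptions and do not come from the paper.

```python
def effort_rigid(pi_x, lam):
    """Agent's first-period effort when the principal rigidly implements policy x."""
    return lam * pi_x


def effort_experiment_a(theta1, pi_a, pi_b, lam):
    """Effort on an experiment with a, the agent's less preferred policy."""
    return max(lam * pi_a - 0.5 * lam ** 2 * theta1 * (pi_b ** 2 - pi_a ** 2), 0.0)


def effort_experiment_b(theta1, pi_a, pi_b, lam):
    """Effort on an experiment with b, the agent's preferred policy."""
    return min(lam * pi_b + 0.5 * lam ** 2 * (1.0 - theta1) * (pi_b ** 2 - pi_a ** 2), 1.0)


# Illustrative values only, with pi_b > pi_a as in the text.
theta1, pi_a, pi_b, lam = 0.5, 0.6, 0.8, 0.5
print(effort_rigid(pi_a, lam), effort_experiment_a(theta1, pi_a, pi_b, lam))
print(effort_rigid(pi_b, lam), effort_experiment_b(theta1, pi_a, pi_b, lam))
```

With these numbers, experimentation lowers effort on a and raises effort on b relative to rigid implementation, which is the asymmetry the section describes.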