arXiv:2108.07218v1 [econ.TH] 16 Aug 2021
Strategic Exploration for Innovation∗
Shangen Li†
August 17, 2021
Abstract
We analyze a game of technology development where players allocate resources
between exploration, which continuously expands the public domain of available
technologies, and exploitation, which yields a flow payoff by adopting the explored
technologies. The qualities of the technologies are correlated and initially un-
known, and this uncertainty is fully resolved once the technologies are explored.
We consider Markov perfect equilibria with the quality difference between the best
available technology and the latest technology under development as the state vari-
able. In all such equilibria, while the players do not fully internalize the benefit of
failure owing to free-riding incentives, they are more tolerant of failure than in the
single-agent optimum thanks to an encouragement effect. In the unique symmetric
equilibrium, the cost of exploration determines whether free-riding prevails as team
size grows. Pareto improvements over the symmetric equilibrium can be achieved
by asymmetric equilibria where players take turns performing exploration.
Keywords: Strategic experimentation, Innovation, Multi-armed bandit, Organiza-
tional learning.
JEL classification: C73; D83; O3.
∗I am grateful to Steven Callander, Martin Cripps, Christian Ewerhart, David Hémous, Johannes
Hörner, Vijay Krishna, Margaret Meyer, Alessandro Lizzeri, Nick Netzer, Anne-Katrin Roesler, Bruno
Strulovici, Leeat Yariv, and seminar participants at the University of Zürich and Swiss Theory Day 2021 for
helpful comments. I am especially indebted to Marek Pycia, Armin Schmutzler, and Aleksei Smirnov for
advice and encouragement. All errors are mine.
†Department of Economics, University of Zürich. E-mail address: [email protected]
1 Introduction
Without innovation, we would still be living in caves and hunting with stones. Though
people celebrate disruptive inventions such as penicillin and the Internet, innovations
are usually accomplished through hundreds or thousands of minor improvements.
These incremental improvements are achieved through iterative experimentation with
small changes and can have a huge cumulative impact in the long run.1 Incremental
experimentation of this kind is particularly pertinent in today’s data-driven world, in
which experiments can be conducted more swiftly and inexpensively than ever before
thanks to the fast-growing fields of data science and artificial intelligence. Tech giants
such as Microsoft and Amazon run thousands of experiments every day to optimize their
products, processes, customer experiences, and business models.2
However, even for these companies with digital roots that can run experiments
at virtually zero cost, valuable resources still need to be allocated to the design and
implementation of the experiments, as well as to the collection and analysis of the
data they produce. Moreover, because of its uncertain nature, experimentation does
not guarantee innovation. As a result, the trade-off between creating value through
innovation—to explore—and capturing value through operations—to exploit—is
behind all innovation processes in organizations. On top of such complexities, incentives
for experimentation are susceptible to free-riding opportunities when successes and
failures are publicly observed. Such a strategic effect inevitably creates inefficiency of
learning, having a lasting impact on technological progress and organizational growth.
Examples of strategic exploration certainly go beyond intracompany R&D activities.
Research joint ventures formed by multiple firms, open-source software development,
consumer search, and even academic researchers collaborating on a joint paper all
involve strategic exploration.
This paper analyzes a game of strategic exploration in which a finite number of
forward-looking players jointly search for innovative technologies. Given a set of pub-
licly available technologies, at each point in time, each player decides simultaneously
how to allocate a perfectly divisible unit of resource between exploration, which ex-
pands the set of feasible technologies at a rate proportional to the resource allocated,
and exploitation, which yields a flow payoff from the adoption of one of the available
technologies. This flow payoff is generated according to an (initially unknown) realized
path of Brownian motion, which maps technologies to their qualities. Not all tech-
nologies are readily available to the public, and only the qualities associated with the
1Garcia-Macia, Hsieh, and Klenow (2019) estimate that improvements to already existing products
accounted for about 77 percent of economic growth in the United States between 2003 and 2013.
2See Thomke (2020).
explored technologies are known. We focus on the scenario in which players’ actions
and outcomes are publicly observed. Hence, the explored technologies are pure public
goods, and there are perfect knowledge spillovers between players.
One of the major contributions of this paper is to extend the classic game of strate-
gic experimentation to a setting that features a continuum of correlated arms with the
possibility of discovering new arms. When modeling technologies as arms, correlation
between arms and discoveries of new arms are key features of technology development
and innovation. However, most of the literature on strategic experimentation assumes a
fixed set of independent arms, because correlation substantially complicates the analysis.
We demonstrate that under a specific form of correlation, such complexities can
be circumvented by simply focusing on incremental experimentation.
We first consider the efficient benchmark in which the players work jointly to maxi-
mize the average payoff. The solution to this cooperative problem takes a simple cutoff
form: All players allocate resources exclusively to exploration if and only if the quality
difference between the best available technology and the latest outcome of the experi-
ment is within a time-invariant cutoff. Players in a larger team can attain higher payoffs
by being more tolerant of unsuccessful outcomes. As team size grows, because of the
growing total learning capacity, exploration extends to arbitrarily pessimistic states in
the limit, and the welfare under complete information can be achieved asymptotically.
In the strategic problem, we restrict attention to Markov perfect equilibria (MPE)
with the difference between the qualities of the best available technology and the latest
technology under development as the state variable. Despite the considerable disparities
between our setting and the two-armed bandit models in the strategic experimentation
literature, all MPE in our model exhibit a similar encouragement effect: The presence
of other players encourages at least one of them to continue exploration at states where
a single agent would already have given up. The players thus become more tolerant
of failure in the hope that successful outcomes will bring the other players back to
exploration in the future, promoting innovation and sharing its burden. In addition,
exactly because of such an incentive, any player who never explores would strictly prefer
the deviation that resumes the exploration process as a volunteer at the state where all
exploration stops. Therefore, no player always free-rides in equilibrium in spite of the
fact that exploration per se never generates any payoffs.
We further show that there is no MPE in which all players use simple cutoff strategies
as in the cooperative solution. A symmetric equilibrium thus necessarily requires
the players to choose an interior allocation of their resources at some states. We
establish the existence and uniqueness of the symmetric equilibrium, and provide closed-
form representations for the equilibrium strategies and the associated payoff functions.
Depending on the parameters, the symmetric equilibria can be classified into two types
as follows. In the first type of equilibrium, all players allocate resources exclusively to
exploration when the latest outcome is sufficiently close to the highest-known quality,
exclusively to exploitation when it is sufficiently far away, and to both simultaneously
when it is in between. In such a case, the resource constraints on exploration are binding
at optimistic states, and therefore will be referred to as the binding case. In the second
type of equilibrium, however, the resource constraints are never binding. An interior
allocation is chosen by each player when the latest outcome is close to the highest-known
quality, and all resources are allocated to exploitation if it is sufficiently far away. In
this non-binding case, the incentive to free-ride is so strong that players even allocate a
positive fraction of their resources to exploitation when the latest technology achieves
the highest-known quality and thus innovation takes place instantaneously. Moreover,
this free-rider effect completely freezes the encouragement effect: Additional players
would not be able to encourage the players to explore at more pessimistic states and
would bring down both the individual and the overall intensity of exploration.
We then show the following comparative statics: The symmetric equilibrium always
belongs to the binding case for a small team and may or may not fall into the non-binding
category irreversibly as team size grows. The non-binding case can be averted if and only
if exploration is cheap enough and the prospect of development is sufficiently promising.
In such a case, the players become increasingly tolerant of failure as team size grows
and are willing to explore at arbitrary pessimistic states in the limit, sharing the same
features as in the cooperative solution. Otherwise, free-riding incentives prevail as the
team becomes large: The resource constraints eventually fail to be binding, and the
tolerance for failure among the players stays the same for further increases in team size.
This result suggests that whether an experimentation culture can be successfully
created in large organizations hinges on the tools available for experimentation. As
innovation is driven by experimentation, perturbations in the effectiveness of the explo-
ration technology can have drastic impacts on welfare and long-run outcomes. More
precisely, for a sufficiently inexpensive exploration technology and promising prospects,
the equilibrium payoffs in a large team resemble the ones in the cooperative solution,
and the first-best technology under complete information is highly likely to be developed
in the long run. Otherwise, both the equilibrium payoffs and the long-run likelihood of
developing the first-best technology would be identical to those in a small team owing
to severe free-riding.
Lastly, we investigate whether asymmetric equilibria can improve welfare and long-
run outcomes over the symmetric equilibrium. We construct a class of asymmetric MPE
in which the players take turns performing exploration at pessimistic states, so that each
player achieves a higher payoff than in the symmetric equilibrium. It turns out that these
asymmetric MPE are the best MPE in two-player games in terms of average payoffs.
Unlike in the symmetric equilibrium, the players in these asymmetric MPE become
increasingly tolerant of failure as team size grows, independent of the other parameters.
The intuition for this result is that when alternation is allowed, the burden of keeping the
exploration process active can be shared among more players in larger teams, and thus
the players are willing to explore at more pessimistic states. As a consequence, the long-run
outcomes approach the first best as team size grows, irrespective of the cost of exploration
or the prospect of the development. Nevertheless, similar to the symmetric equilibrium
with non-binding resource constraints, in these asymmetric equilibria innovations might
arrive at a much slower rate than in the cooperative solution because of the low proportion
of the overall resource allocated to exploration. As a result, the welfare loss might still
be significant in large teams because of the strong free-rider effect.
1.1 Related Literature
This paper contributes to the literature on strategic experimentation, started by the
Brownian model in Bolton and Harris (1999), and then followed by the exponential
model in Keller, Rady, and Cripps (2005) and the Poisson model in Keller and Rady
(2010) (hereafter, BH, KRC, and KR). In all of these models, players face identical two-
armed bandit machines, which consist of a risky arm with an unknown quality and a safe
arm with a known quality. At each point in time, each player decides simultaneously how
to split one unit of a perfectly divisible resource between these arms, so that learning
occurs gradually by observing other players’ actions and outcomes. These models differ
in the assumptions on the probability distribution of the flow payoffs that each type of
the arm generates. By contrast, players in our model face a continuum of correlated
arms. Local learning—learning the quality of a particular arm—occurs instantaneously.
However, as the set of arms is unbounded, global learning—learning the qualities of
all arms—occurs gradually. The encouragement effect was first identified by BH in the
symmetric equilibrium in their Brownian model. This effect is then established by KR
for all MPE in the Poisson model with inconclusive good news. We are not only able to
show the presence of the encouragement effect in all MPE of our model, but also (1) express
the stopping threshold in the unique symmetric MPE in closed form, and (2) perform
comparative statics analysis of the stopping threshold in both the symmetric MPE and
a class of asymmetric MPE to further study the encouragement effect and the free-rider
effect.
Callander (2011) proposes modeling the correlation between arms by a Brownian path
and studies experimentation conducted by a sequence of myopic agents. Garfagnini
and Strulovici (2016) extend this model to a setting with overlapping generations, in
which short-lived yet forward-looking players search on a Brownian path for arms
with higher qualities. They mainly focus on the search patterns and the long-run
dynamics, and identify the stagnation of search and the emergence of a technological
standard in finite time. By contrast, a byproduct of our analysis suggests that their
findings might depend on the search cost or the underlying correlation structure. When
the qualities of arms contribute exponentially to the payoffs, which is common in
modeling technological progress in the macroeconomic growth literature, stagnation
can be avoided even under a negative drift of the Brownian path. Our main assumption
on the parameters (Assumption 1) provides the condition for stagnation in the cooperative
problem.
Both of their models focus on non-strategic environments because, in their discrete-time
setups, the unexplored gaps left between explored technologies make strategic analysis
intractable. To overcome this challenge, we forgo the fine
details of the learning dynamics for analytical tractability by imposing continuity on the
experimentation process. This simplification can be interpreted as the qualities of the
neighboring technology being revealed during experimentation, so that no unexplored
territory remains between the explored technologies. Moreover, we impose a hard
constraint on the intensity of exploration to capture the fact that technologies
far ahead of their time are infeasible to explore today. These abstractions allow
us to derive explicit expressions for the equilibrium payoffs and strategies, perform
comparative statics analysis, and construct asymmetric equilibria.
In an independent paper, Cetemen, Urgun, and Yariv (2021) study the exit patterns
during a search process conducted jointly by heterogeneous players in a setting similar
to ours. In their model, exploitation is only possible after an irreversible exit chosen
endogenously by each player. The players in our model, however, are not faced with
stopping problems and thus are free to choose between exploration and exploitation, or
even both simultaneously, at all times.
Our model is also related to the literature on organizational learning and adaptation.
Since March (1991) identified the trade-off between exploration and exploitation in
organizational learning, multi-armed bandit models as in Rothschild (1974) and Gittins (1979)
have been widely used in the management literature to study the exploration-exploitation
trade-off in organizational adaptation in dynamic environments; see, for example, Denrell
and March (2001), Posen and Levinthal (2012), Lee and Puranam (2016), or Stieglitz,
Knudsen, and Becker (2016). In all of these models, organizational decisions are made
by a single unit of decision-making authority, and thus strategic elements are largely
absent. Our paper brings new insights into this strand of literature by showing that strate-
gic interaction within organizations can play a central role in organizational learning
and organizations’ long-term performance. More specifically, our results suggest that
because of free-riding, a horizontal organizational structure where everyone is engaged
in learning and experimentation might be less efficient in organizational learning than
a vertical hierarchy where tasks are coordinated and controlled by upper-level authori-
ties, and this inefficiency hinders organizational learning to a greater extent in a larger,
flatter organization if experimentation is too costly and the innovators are not properly
rewarded. In addition, the single-agent problem in this paper, as an attempt to model
the process of knowledge building and to capture the exploration-exploitation trade-off,
could serve as a new alternative to the canonical multi-armed bandit model for further
research in this area.
2 The Exploration Game
Time t ∈ [0, ∞) is continuous, and the discount rate is r > 0. There are N ≥ 1
players, each endowed with one unit of perfectly divisible resource per unit of time.
Each player has to simultaneously and independently allocate the resources between
exploration, which expands the feasible technology domain, and exploitation, which
allows her to adopt one of the explored technologies. The feasible technology domain,
which is common to all players and contains all the explored technologies at time t, is
modeled as an interval [0, X_t] with X_0 = 0. If a player allocates the fraction k_t ∈ [0, 1]
to exploration over an interval of time [t, t + dt), the boundary X_t is pushed to the right
by an amount (k_t/c) dt, where the constant c > 0 represents the cost of exploration in
units of the resources allocated to it. With the fraction 1 − k_t allocated to exploitation,
by adopting technology x_t ∈ [0, X_t], the player receives a deterministic flow payoff
(1 − k_t) exp(W(x_t)) dt, where W(x_t) denotes the quality of the adopted technology.
The quality mapping W : ℝ+ → ℝ is common to all players, but only the qualities
of the feasible technologies in [0, X_t] are known to each player at time t. It coincides
with a realized path of a Brownian motion W+ ∈ C(ℝ+, ℝ) starting at W+(0) = y ∈ ℝ,
with drift θ ∈ ℝ and volatility normalized to σ = √2, except possibly for W(0) = s ≥ y,
which is interpreted as the quality of a status quo technology. At the outset of the game,
all players know the parameters of the mapping but not the realized Brownian path W+
on ℝ++. Therefore, the process of exploration described above captures the dynamics
of research—building knowledge of the qualities of technologies—and development—
expanding the set of feasible technologies.
Given a player’s actions {(k_t, x_t)}_{t≥0}, with k_t ∈ [0, 1] and x_t ∈ [0, X_t] measurable
with respect to the information available at time t, her total expected discounted payoff,
expressed in per-period units, is

  E[ ∫₀^∞ r e^{−rt} (1 − k_t) e^{W(x_t)} dt ].

Clearly, for any {k_t}_{t≥0}, each player maximizes her total expected discounted payoff
by choosing x_t = arg max_{x ∈ [0, X_t]} W(x), which is the best feasible technology, for all
t ≥ 0. So we can focus on such an exploitation strategy without loss and rewrite the
above total payoff as

  E[ ∫₀^∞ r e^{−rt} (1 − k_t) e^{S_t} dt ],

where S_t = max_{x ∈ [0, X_t]} W(x).

Therefore, the environment above can be equivalently reformulated as follows. Players
have prior beliefs represented by a filtered probability space (Ω, ℱ, (ℱ_t), P), where
Ω = C(ℝ+, ℝ) is the space of Brownian paths, P is the law of standard Brownian motion
B = {B_t}_{t≥0}, and ℱ_t is the canonical filtration of B. Each player chooses her strategy
from the space of admissible control processes 𝒜, which consists of all processes {k_t}_{t≥0}
adapted to the filtration (ℱ_t)_{t≥0} with k_t ∈ [0, 1]. The public history of technology
development is represented by a development process {Y_t}_{t≥0}, where Y_t ≔ W(X_t) denotes
the quality of the technology being explored at time t. This process satisfies the stochastic
differential equation

  dY_t = θ (K_t/c) dt + √(2 K_t/c) dB_t,   Y_0 = y,

where K_t = Σ_{1≤n≤N} k_{n,t} measures how much of the overall resources is allocated to
exploration, and will be referred to as the intensity of exploration at time t.
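As a purely illustrative sketch (not part of the formal analysis), the development process can be simulated by an Euler–Maruyama discretization of this stochastic differential equation; the cutoff profile used below for the intensity K_t, and all parameter values, are placeholder assumptions.

```python
import numpy as np

def simulate_development(theta, c, N, g_bar, y0=0.0, s0=0.0,
                         dt=1e-3, T=10.0, seed=0):
    """Euler-Maruyama simulation of dY = theta*(K/c) dt + sqrt(2K/c) dB.
    All N players explore fully (K = N) while the gap S - Y is below the
    threshold g_bar, and stop otherwise (an assumed cutoff profile)."""
    rng = np.random.default_rng(seed)
    n_steps = int(T / dt)
    Y = np.empty(n_steps + 1)
    S = np.empty(n_steps + 1)
    Y[0], S[0] = y0, max(s0, y0)
    for i in range(n_steps):
        K = N if S[i] - Y[i] < g_bar else 0.0   # cutoff rule (assumption)
        dB = rng.normal(0.0, np.sqrt(dt))
        Y[i + 1] = Y[i] + theta * (K / c) * dt + np.sqrt(2 * K / c) * dB
        S[i + 1] = max(S[i], Y[i + 1])          # best explored quality so far
    return Y, S

Y, S = simulate_development(theta=-0.5, c=1.0, N=2, g_bar=1.0)
```

In this sketch the best-quality process S is nondecreasing by construction, and once the gap exceeds g_bar the state freezes, mirroring the absorbing nature of the stopping region.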
Given a strategy profile k = {(k_{1,t}, . . . , k_{N,t})}_{t≥0}, player n’s total expected
discounted payoff can be written as

  E_{gs}[ ∫₀^∞ r e^{−rt} (1 − k_{n,t}) e^{S_t} dt ],

where

  S_t = max( s, max_{0≤τ≤t} Y_τ )

denotes the quality of the best explored technology at time t.

In addition, we use the term “gap”, denoted by G_t ≔ S_t − Y_t ≥ 0, to refer to the
quality difference between the best explored technology and the latest technology under
development. Henceforth, we shall use g and s when referring to the state variables as
opposed to the stochastic processes {G_t}_{t≥0} and {S_t}_{t≥0} (i.e., if G_t = g, then “the game
is in state g at time t”).
A Markov strategy k_n : ℝ+ × ℝ → [0, 1] with (g, s) as the state variables specifies
the action player n takes at time t to be k_n(G_t, S_t). A Markov strategy is called s-
invariant if it depends on (g, s) only through g. Thus an s-invariant Markov strategy
k_n : ℝ+ → [0, 1] takes the gap g as the state variable. Finally, an s-invariant Markov
strategy k_n is a cutoff strategy if there is a cutoff ḡ ≥ 0 such that k_n(g) = 1 for all
g ∈ [0, ḡ) and k_n(g) = 0 otherwise.

Given an s-invariant Markov strategy profile k, player n’s associated payoff at state
(g, s) can be written as v_n(g, s | k) = e^s v_n(g, 0 | k).3 It is thus convenient to write
v_n(g, s | k) = e^s u_n(g | k), where u_n(g | k) = v_n(g, 0 | k) stands for player n’s payoff at state
(g, s) normalized by e^s, the opportunity cost of exploration. We refer to u_n : ℝ+ → ℝ
as player n’s normalized payoff function, or simply as her payoff function when it is clear
from the context.
2.1 Exploration under Complete Information

To study the value of information, here we consider an alternative setting under complete
information. More specifically, how would N players allocate their resources cooperatively
over time if the entire Brownian path W were publicly known at the outset of the
game?

Formally, in this subsection we replace ℱ_t with ℱ for each t ≥ 0 while maintaining
the assumption that the feasible technology domain [0, X_t] can only be expanded by
continuously pushing forward the boundary at the rate dX_t/dt = K_t/c. For a given
Brownian path W, denote the average (ex-post) value under complete information by

  v(W) ≔ sup ∫₀^∞ r e^{−rt} (1 − K_t/N) e^{S_t} dt,

where S_t = max_{x ∈ [0, X_t]} W(x), and the supremum is taken over all measurable functions
t ↦ K_t ∈ [0, N].
Lemma 1 (Complete-information Payoff). Denote the average (ex-ante) value under
complete information at state (g, s) by V(g, s) ≔ E_{gs}[v(W)]. We have V(g, s) =
e^s U(g), where

  U(g) = 1 + exp(−λg)/(λ − 1)  if λ > 1,  and U(g) = +∞ otherwise,

with λ = rc/N − θ.

3See Lemma 5 in the Appendix.
When λ > 1, there almost surely exists a first-best technology x∗ ∈ ℝ+ such that the
value v(W) < +∞ is achieved by exploring with full intensity up to the point when
x∗ is developed, and thereafter exploiting x∗.4 On the contrary, if λ ≤ 1, then with
probability 1 the payoff can be improved indefinitely by delaying exploitation,
and thus v(W) = +∞ almost surely. As a consequence, the value under incomplete
information becomes infinite as well. The assumption that λ > 1 can be equivalently
written in the following way.

Assumption 1. N(1 + θ) < rc.

To ensure well-defined payoffs and deviations, we maintain Assumption 1 throughout
the rest of the paper unless otherwise stated.
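As a quick numerical sanity check, the equivalence between λ > 1 and Assumption 1, and the shape of the complete-information payoff U from Lemma 1, can be verified directly; the parameter values below are arbitrary illustrations.

```python
import math

def lam(r, c, N, theta):
    # lambda = rc/N - theta, as defined in Lemma 1
    return r * c / N - theta

def U(g, r, c, N, theta):
    # Normalized complete-information payoff: 1 + exp(-lambda*g)/(lambda - 1)
    # when lambda > 1, and +infinity otherwise.
    l = lam(r, c, N, theta)
    return 1.0 + math.exp(-l * g) / (l - 1.0) if l > 1.0 else math.inf

r, c, theta = 1.0, 3.0, 0.2
for N in (1, 2, 3, 5):
    # Assumption 1, N(1 + theta) < rc, holds exactly when lambda > 1
    assert (N * (1 + theta) < r * c) == (lam(r, c, N, theta) > 1.0)

# U decreases in the gap and approaches the pure-exploitation value 1
assert U(0.0, r, c, 2, theta) > U(5.0, r, c, 2, theta) > 1.0
```

Note how the payoff blows up as soon as λ falls to 1: delaying exploitation then improves the payoff indefinitely, as described above.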
3 Joint Maximization of Average Payoffs
Suppose that N ≥ 1 players work cooperatively to maximize the average expected
payoff. Denote by 𝒜_N the space of all adapted processes K = {K_t}_{t≥0} with K_t ∈ [0, N].
Formally, we are looking for the value function

  v(g, s) = sup_{K ∈ 𝒜_N} v(g, s | K),

where

  v(g, s | K) = E_{gs}[ ∫₀^∞ r e^{−rt} (1 − K_t/N) e^{S_t} dt ]

is the average payoff function associated with the control process K = {K_t}_{t≥0}, and an
optimal control K∗ ∈ 𝒜_N such that v(g, s) = v(g, s | K∗). Clearly, the structure of the
problem allows us to focus on Markov strategies K : ℝ+ × ℝ → [0, N] with (g, s) as the
state variables, so that the intensity of exploration at time t is specified by K_t = K(G_t, S_t).
According to the dynamic programming principle, we have

  v(g, s) = max_{K ∈ [0, N]} { r (1 − K/N) e^s dt + E_{gs}[ e^{−r dt} v(g + dG, s + dS) ] }.

4See the proof of Lemma 7 for the precise definition of x∗.

First note that S_t can only change when G_t = 0, and thus dS = 0 for all positive gaps.
Hence, for each g > 0 at which ∂²v/∂g² is continuous, the value function v satisfies
the Hamilton-Jacobi-Bellman (HJB) equation

  v(g, s) = max_{K ∈ [0, N]} { (1 − K/N) e^s + (K/(rc)) ( ∂²v(g, s)/∂g² − θ ∂v(g, s)/∂g ) }.  (1)
Assume, as will be verified, that the optimal strategy is s-invariant. Then by the
homogeneity of the value function, v(g, s) = e^s v(g, 0) = e^s u(g), we can replace v(g, s)
with e^s u(g), and divide both sides of equation (1) by the opportunity cost e^s, to obtain
the normalized HJB equation

  u(g) = 1 + max_{K ∈ [0, N]} K { β(g, u) − 1/N },  (2)

where β(g, u) ≔ (u″(g) − θu′(g))/(rc) is the ratio of the expected benefit of exploration,
( ∂²v/∂g² − θ ∂v/∂g )/(rc), to its opportunity cost e^s. It is then straightforward to see that
the optimal action takes the following “bang-bang” form. If the shared opportunity cost
of exploration, 1/N, exceeds the full expected benefit β(g, u), the optimal choice is K(g) = 0
(all agents choose exploitation exclusively), which gives value u(g) = 1. Otherwise,
K(g) = N is optimal (all agents choose exploration exclusively), and u satisfies the
second-order ordinary differential equation (henceforth ODE),

  β(g, u) = u(g)/N.  (3)
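Since (3) is a linear ODE with constant coefficients, its general solution follows from the characteristic equation; this intermediate step, left implicit in the text, connects (3) to the roots γ₁ < γ₂ that appear in Proposition 1 below:

```latex
% ODE (3): \frac{u''(g) - \theta u'(g)}{rc} = \frac{u(g)}{N}
% \iff u''(g) - \theta u'(g) - \frac{rc}{N}\, u(g) = 0.
% Characteristic equation: \gamma^2 - \theta\gamma - \frac{rc}{N} = 0,
% i.e. \gamma(\gamma - \theta) = \frac{rc}{N}, with roots
\gamma_{1,2} = \frac{\theta \mp \sqrt{\theta^{2} + 4rc/N}}{2},
\qquad
u(g) = A_{1}\, e^{\gamma_{1} g} + A_{2}\, e^{\gamma_{2} g}.
```

The constants A₁ and A₂ are then pinned down by the boundary conditions (smooth pasting and normal reflection) discussed after Proposition 1.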
The optimal strategy could presumably depend on both g and s and hence might not
be s-invariant. Indeed, both the benefit and the opportunity cost of exploration increase
as innovation occurs. Nevertheless, under our specific form of flow payoff, in which
technologies contribute exponentially to payoffs, the increased benefit of exploration
exactly offsets the increased opportunity cost. As a result, the incentives for exploration
at a fixed gap do not depend on the highest-known quality, which leads to an s-invariant
optimal strategy.5 This conjecture is confirmed by the following proposition.
Proposition 1 (Cooperative Solution). Suppose Assumption 1 holds. In the N-agent
cooperative problem, there is a cutoff g∗ > 0 given by

  g∗ = (1/(γ₂ − γ₁)) ( ln(1 + 1/γ₂) − ln(1 + 1/γ₁) ),

with γ₁ < γ₂ being the roots of γ(γ − θ) = rc/N, such that it is optimal for all players
to choose exploitation exclusively when the gap is above the cutoff g∗ and it is optimal
for all players to choose exploration exclusively when the gap is below the cutoff g∗.

The associated payoff at state (g, s) can be written as V∗(g, s) = e^s U∗(g), where
the normalized payoff function U∗ : ℝ+ → ℝ is given by

  U∗(g) = (1/(γ₂ − γ₁)) ( γ₂ e^{−γ₁(g∗−g)} − γ₁ e^{−γ₂(g∗−g)} )

when g ∈ [0, g∗) and by U∗(g) = 1 otherwise.

If Assumption 1 is violated, then U∗(g) = +∞ for all g ≥ 0.

5This feature also appears in Urgun and Yariv (2020) and Cetemen, Urgun, and Yariv (2021) in a
setting of undiscounted linear preferences with a cost convex in the learning rate.
The cooperative solution is pinned down by the standard smooth-pasting condition
u′(g∗) = 0, and a normal reflection condition ( ∂v/∂g + ∂v/∂s )(0+, s) = 0, which takes
the form u(0) + u′(0+) = 0 for s-invariant strategies.6
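The closed form in Proposition 1 can be verified numerically: the cutoff, value matching U∗(g∗) = 1, smooth pasting U∗′(g∗) = 0, and the normal reflection condition u(0) + u′(0+) = 0 can all be checked for parameters satisfying Assumption 1. The helper name and parameter values below are illustrative, not from the paper.

```python
import math

def coop_solution(r, c, N, theta):
    """Cutoff g* and normalized payoff U* from Proposition 1."""
    disc = math.sqrt(theta ** 2 + 4 * r * c / N)
    # roots of gamma*(gamma - theta) = rc/N
    g1, g2 = (theta - disc) / 2, (theta + disc) / 2
    g_star = (math.log(1 + 1 / g2) - math.log(1 + 1 / g1)) / (g2 - g1)

    def U_star(g):
        if g >= g_star:
            return 1.0
        return (g2 * math.exp(-g1 * (g_star - g))
                - g1 * math.exp(-g2 * (g_star - g))) / (g2 - g1)

    return g1, g2, g_star, U_star

r, c, N, theta = 1.0, 3.0, 2.0, 0.2      # satisfies Assumption 1: N(1+theta) < rc
g1, g2, g_star, U = coop_solution(r, c, N, theta)

h = 1e-6
dU = lambda g: (U(g + h) - U(g - h)) / (2 * h)   # central difference

assert g1 < -1 < g2                      # Assumption 1 places -1 between the roots
assert abs(U(g_star) - 1.0) < 1e-9       # value matching at the cutoff
assert abs(dU(g_star - 2 * h)) < 1e-4    # smooth pasting: U*'(g*) = 0
assert abs(U(0.0) + dU(0.0)) < 1e-4      # normal reflection: u(0) + u'(0+) = 0
```

That −1 lies strictly between the roots follows from Assumption 1, since γ = −1 gives γ(γ − θ) = 1 + θ < rc/N.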
Because of the lack of information on the qualities of technologies, the players might
stop too early, giving up exploration before developing the first-best technology and thus
ultimately adopting a suboptimal technology, or might stop too late, wasting resources
on marginal improvements when the first-best technology has already been developed.
The cooperative solution optimally balances this trade-off between early and late
stopping and therefore determines the efficient strategies under incomplete information.
Corollary 1 (Comparative Statics of the Cooperative Solution). The cooperative cutoff
g∗ is strictly increasing in N and strictly decreasing in c. For all g ≥ 0, the cooperative
payoff U∗(g) is strictly below the complete-information payoff U(g). For each θ ∈ ℝ
and c > 0,
• if θ < −1, then |U∗_N(g) − U_N(g)| → 0 as N → +∞;
• if θ ≥ −1, then U∗_N → +∞ as N → rc/(1 + θ) ∈ (1, +∞].7
In both cases g∗_N → +∞.
The stopping cutoff g∗ represents the tolerance for failure among the players. The
benefit of exploration increases as exploration becomes cheaper, and thus the players
are willing to explore at more pessimistic states for smaller c.8 Likewise, because the
players work cooperatively, the extra resources brought by additional players as team size
increases enable a higher rate of exploration, so that for a fixed tolerance for failure,
resources are wasted for a shorter period of time before exploration fully stops, which
in turn leads to greater tolerance.

6The normal reflection condition is not an optimality condition. It ensures that the infinitesimal change
of the payoff at a zero gap has a zero dS term, which is necessary for the continuation value process to be
a martingale. See Peskir and Shiryaev (2006) for an introduction to the normal reflection condition in the
context of optimal stopping problems, and the proof of Lemma 4 for more details.
7Here we allow N to take non-integral values for convenience. Also note that Assumption 1 is violated
for N ≥ rc/(1 + θ) if θ ≥ −1, in which case U∗_N = +∞.
8The comparative statics with respect to c and the discount rate r are equivalent.
Note that under the complete-information setting, in which players can observe the
whole path, it is optimal for them to explore at arbitrarily large gaps as long as the
gap would quickly shrink to zero were they to explore a little longer. By contrast, under
incomplete information, it is optimal to stop exploration whenever the gap reaches g∗.
This informational disadvantage vanishes as either N goes up or c goes down because
it originates from the suboptimal stopping decision in the cooperative solution due to
the lack of information, rather than from an inefficient allocation of resources during
exploration. As a result, the cooperative payoff converges to the complete-information
payoff as the players become increasingly tolerant of failure.
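The monotonicity claims of Corollary 1 and the narrowing distance between U∗_N and U_N can be illustrated numerically by reusing the closed forms of Lemma 1 and Proposition 1; the helpers below recompute them, and all parameter values are illustrative.

```python
import math

def _roots(r, c, N, theta):
    d = math.sqrt(theta ** 2 + 4 * r * c / N)
    return (theta - d) / 2, (theta + d) / 2

def g_star(r, c, N, theta):
    # cooperative cutoff from Proposition 1
    g1, g2 = _roots(r, c, N, theta)
    return (math.log(1 + 1 / g2) - math.log(1 + 1 / g1)) / (g2 - g1)

def U_coop(g, r, c, N, theta):
    # normalized cooperative payoff U* from Proposition 1
    g1, g2 = _roots(r, c, N, theta)
    gs = g_star(r, c, N, theta)
    if g >= gs:
        return 1.0
    return (g2 * math.exp(-g1 * (gs - g)) - g1 * math.exp(-g2 * (gs - g))) / (g2 - g1)

def U_full(g, r, c, N, theta):
    # complete-information payoff from Lemma 1
    lam = r * c / N - theta
    return 1.0 + math.exp(-lam * g) / (lam - 1.0)

r, c, theta = 1.0, 2.0, -1.5   # theta < -1, so Assumption 1 holds for every N

cuts = [g_star(r, c, N, theta) for N in (1, 2, 5, 10, 50)]
assert all(a < b for a, b in zip(cuts, cuts[1:]))            # g* increasing in N
assert g_star(r, 4.0, 2, theta) < g_star(r, 2.0, 2, theta)   # g* decreasing in c

gaps = [U_full(0.0, r, c, N, theta) - U_coop(0.0, r, c, N, theta)
        for N in (10, 100, 1000)]
assert all(gap > 0 for gap in gaps)      # U* lies strictly below U
assert gaps[0] > gaps[1] > gaps[2]       # the payoff gap shrinks for large teams
```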
3.1 Long-run Outcomes
Consider an s-invariant strategy profile k such that the set of states at which the intensity
of exploration is bounded away from zero takes the form of a half-open interval [0, ḡ).
We denote by X(ḡ) ≔ lim_{t→∞} X_t = ∫₀^∞ (K_t/c) dt the amount of exploration under k. In
addition, we denote by S(ḡ) ≔ lim_{t→∞} S_t the long-run technological standard, which is
defined as the quality of the best technology available as time approaches infinity. For
a given Brownian path W, it is straightforward to see that both S(ḡ) and X(ḡ) depend
only on the initial state (g, s) and the stopping threshold ḡ, and are independent of
the intensity of exploration. Moreover, they are clearly nondecreasing in ḡ. We can
explicitly express the prior belief on the distribution of S(ḡ) as follows.
Lemma 2. At state (g, s), for a strategy profile with stopping threshold ḡ, the long-run
technological standard s(ḡ) has the same distribution as max{s, M − ḡ}, where the
random variable M has an exponential distribution with mean (e^{θḡ} − 1)/θ.9
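As a quick numerical illustration of Lemma 2, the distribution of the long-run standard can be sampled directly from its closed form. The helper names below (mean_M, sample_long_run_standard) are ours, not the paper's; the θ → 0 limit of the mean follows footnote 9.

```python
import math
import random

def mean_M(theta, g_bar):
    """Mean (e^{theta*g_bar} - 1)/theta of the exponential variable M,
    with the theta -> 0 limit equal to g_bar (footnote 9)."""
    if abs(theta) < 1e-12:
        return g_bar
    return (math.exp(theta * g_bar) - 1.0) / theta

def sample_long_run_standard(s, g_bar, theta, rng):
    """Draw from the distribution of s(g_bar) = max{s, M - g_bar}."""
    m = rng.expovariate(1.0 / mean_M(theta, g_bar))
    return max(s, m - g_bar)
```

Every draw is bounded below by the current best quality s, and the distribution shifts up as the stopping threshold ḡ increases, consistent with the monotonicity noted above.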
Recall from Section 2.1 that the first-best technology, denoted by x, is the tech-
nology x ≥ 0 that yields the highest payoff for N cooperative agents under complete
information, taking the opportunity cost of its development into account. Denote by
q(ḡ) ≔ P_{gs}(x ∈ [0, x(ḡ)]) the probability that the first-best technology will be explored
in the long run under a strategy profile with stopping threshold ḡ.
Lemma 3. At state (g, s), we have q(ḡ) → 1 as ḡ → +∞.
Therefore, the comparative statics in Corollary 1 imply that x will be developed
under the cooperative solution with probability q_N(g*_N) → 1 as the team size grows.10
9For θ = 0, we take lim_{θ→0} (e^{θḡ} − 1)/θ = ḡ as the mean of M.
10Note that for a given W, the first-best technology x could depend on the team size N. However, the
convergence in Lemma 3 does not require a constant sequence of parameters, provided that the parameters
satisfy Assumption 1 along the sequence.

4 The Strategic Problem

From now on, we assume that there are N > 1 players acting noncooperatively. We study
equilibria in the class of s-invariant Markov strategies, which are the Markov strategies
with the gap as the state variable; these will hereafter be referred to simply as Markov
strategies. In this section, we provide a characterization of best responses and the
associated payoff functions, which helps us establish useful properties of the equilibria.

4.1 Best Responses and Equilibria

We denote by K the set of Markov strategies that are right-continuous and piecewise
Lipschitz continuous, and denote by A the space of admissible control processes as in
Section 2.11 A strategy k*_n ∈ K for player n is a best response against her opponents’
strategies k_¬n = (k_1, . . . , k_{n−1}, k_{n+1}, . . . , k_N) ∈ K^{N−1} if

v_n(g, s | k*_n, k_¬n) = sup_{k_n ∈ A} v_n(g, s | k_n, k_¬n)

at each state (g, s) ∈ ℝ+ × ℝ. This turns out to be equivalent to

u_n(g | k*_n, k_¬n) = sup_{k_n ∈ K} u_n(g | k_n, k_¬n)

for each gap g ≥ 0, with the normalized payoff function u_n(g | k) = e^{−s} v_n(g, s | k) defined
as in Section 2. A Markov perfect equilibrium is a profile of Markov strategies that are
mutually best responses.

Denote by k_¬n(g) = ∑_{ℓ≠n} k_ℓ(g) the intensity of exploration carried out by player n’s
opponents, and by β(g, u_n) the benefit-cost ratio of exploration, as in Section 3. The
following lemma characterizes all MPE in the exploration game.

Lemma 4 (Equilibrium Characterization). A strategy profile k = (k*_1, . . . , k*_N) ∈ K^N
is a Markov perfect equilibrium with u_n : ℝ+ → ℝ being the corresponding payoff
function of player n, if and only if for each n ∈ {1, . . . , N}, the function u_n

1. is once continuously differentiable on ℝ++;

2. is piecewise twice continuously differentiable on ℝ++;12

3. satisfies the normal reflection condition

u_n(0) + u′_n(0+) = 0; (4)

4. satisfies, at each continuity point of u″_n, the HJB equation

u_n(g) = 1 + k_¬n(g)β(g, u_n) + max_{k_n ∈ [0,1]} k_n{β(g, u_n) − 1}, (5)

with k*_n(g) achieving the maximum on the right-hand side, i.e.,

k*_n(g) ∈ arg max_{k_n ∈ [0,1]} k_n{β(g, u_n) − 1}.

11Piecewise Lipschitz continuity means that ℝ+ can be partitioned into a finite number of intervals such
that the strategy is Lipschitz continuous on each of them. This rules out the infinite-switching strategies
considered in Section 6.2 of KRC.
These conditions are standard in optimal control problems. Condition 4 and the
smooth pasting condition, which is implicitly stated in Condition 1, are the optimality
conditions. The rest are properties of general payoff functions.
In any MPE, Lemma 4 provides the following characterization of best responses. If
β(g, u_n) < 1, then k*_n(g) = 0 is optimal and u_n(g) = 1 + k_¬n(g)β(g, u_n) < 1 + k_¬n(g).
If β(g, u_n) = 1, then the optimal k*_n(g) can take any value in [0, 1] and u_n(g) =
1 + k_¬n(g). Finally, if β(g, u_n) > 1, then k*_n(g) = 1 is optimal and u_n(g) =
(1 + k_¬n(g))β(g, u_n) > 1 + k_¬n(g). In short, player n’s best response to a given intensity of
exploration k_¬n by the others depends on whether u_n is greater than, equal to, or less
than 1 + k_¬n.

On intervals where each k_n is continuous, HJB equation (5) gives rise to the ODE

u_n(g) = 1 − k_n(g) + K(g)β(g, u_n), (6)

where K(g) ≔ k_n(g) + k_¬n(g) denotes the overall intensity of exploration. In particular,
on intervals where each k_n is constant, the ODE above admits the explicit solution

U(g) = 1 − k_n + C_1 e^{γ_1 g} + C_2 e^{γ_2 g}, (7)

where γ_1 and γ_2 are the roots of the equation γ(γ − θ) = rc/K, and C_1, C_2 are constants
to be determined.
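The closed form of the benefit-cost ratio β is given earlier in the paper and is not restated in this excerpt. The sketch below assumes the second-order form β(g, u) = (u″(g) − θu′(g))/(rc), which is consistent with the solution families in (7) and (8), and checks numerically that (7) with the stated roots indeed solves the constant-intensity ODE (6); the parameter values are arbitrary.

```python
import math

# Assumed closed form for the benefit-cost ratio, chosen to be consistent
# with ODEs (6)-(8): beta(g, u) = (u''(g) - theta*u'(g)) / (r*c).
r, c, theta, K, k_n = 1.0, 0.5, -1.0, 2.0, 1.0

# Roots of gamma*(gamma - theta) = r*c/K, i.e. gamma^2 - theta*gamma - r*c/K = 0.
disc = math.sqrt(theta**2 + 4.0 * r * c / K)
g1, g2 = (theta + disc) / 2.0, (theta - disc) / 2.0

C1, C2 = 0.3, -0.2  # arbitrary integration constants

def u(g):   return 1.0 - k_n + C1 * math.exp(g1 * g) + C2 * math.exp(g2 * g)
def up(g):  return C1 * g1 * math.exp(g1 * g) + C2 * g2 * math.exp(g2 * g)
def upp(g): return C1 * g1**2 * math.exp(g1 * g) + C2 * g2**2 * math.exp(g2 * g)

def beta(g):
    return (upp(g) - theta * up(g)) / (r * c)

# Equation (6) with constant k_n: u(g) = 1 - k_n + K * beta(g, u).
for g in [0.1, 0.5, 1.0, 2.0]:
    assert abs(u(g) - (1.0 - k_n + K * beta(g))) < 1e-9
print("equation (7) solves the constant-intensity ODE")
```

The check works for any choice of C_1, C_2 because each exponential term satisfies γ_i(γ_i − θ) = rc/K by construction.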
Lastly, on intervals where an interior allocation is chosen, the ODE arising from the
indifference condition β(g, u_n) = 1 has the general solution

U(g) = C_1 + C_2 g + rc g²/2, if θ = 0,
U(g) = C_1 + C_2 e^{θg} − rc g/θ, if θ ≠ 0, (8)

where C_1, C_2 are constants to be determined.

12This condition means that there is a partition of ℝ++ into a finite number of intervals such that u″_n is
continuous on the interior of each of them.
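Again taking as an assumption (consistent with the solution family (8)) that the indifference condition β(g, u) = 1 reads u″(g) − θu′(g) = rc, both branches of (8) can be verified mechanically for arbitrary integration constants:

```python
import math

r, c = 1.0, 0.5
C1, C2 = 0.7, -0.4   # arbitrary integration constants

def check(theta):
    """Verify that the corresponding branch of (8) satisfies
    u''(g) - theta*u'(g) = r*c (the assumed form of beta(g, u) = 1)."""
    if theta == 0.0:
        u   = lambda g: C1 + C2 * g + r * c * g**2 / 2.0
        up  = lambda g: C2 + r * c * g
        upp = lambda g: r * c
    else:
        u   = lambda g: C1 + C2 * math.exp(theta * g) - r * c * g / theta
        up  = lambda g: C2 * theta * math.exp(theta * g) - r * c / theta
        upp = lambda g: C2 * theta**2 * math.exp(theta * g)
    for g in [0.0, 0.5, 1.5]:
        assert abs(upp(g) - theta * up(g) - r * c) < 1e-9
    return True

for th in [0.0, -1.0, 0.3]:
    check(th)
print("both branches of (8) satisfy the indifference ODE")
```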
4.2 Properties of MPE
First note that in any MPE, the average payoff can never exceed the N-player cooperative
payoff U*_N, and no individual payoff can fall below the single-agent payoff U*_1. The
upper bound follows directly from the fact that the cooperative solution maximizes the
average payoff. The lower bound U*_1 is guaranteed by playing the single-agent optimal
strategy, as the players can only benefit from the exploration efforts exerted by others.
Second, all Markov perfect equilibria are inefficient. Along the efficient exploration
path, the benefit of exploration tends to 1/N of its opportunity cost as the development
process approaches the efficient stopping threshold. A self-interested player thus has an
incentive to deviate to exploitation whenever the benefit of exploration drops below its
full opportunity cost.
Also note that in any MPE, the set of states at which the intensity of exploration
is positive must be an interval [0, ḡ) with g*_1 ≤ ḡ ≤ g*_N. The bounds on the stopping
threshold follow directly from the bounds on the average payoffs and imply that the
long-run outcomes in any MPE cannot outperform the cooperative solution.

Corollary 2. In any Markov perfect equilibrium with stopping threshold ḡ, at any state
(g, s) we have s(ḡ) ≤ s(g*_N) almost surely, and q(ḡ) ≤ q(g*_N).
Moreover, the intensity of exploration must be bounded away from zero on any
compact subset of [0, ḡ). If this were not the case, there would exist some gap g < ḡ
such that the development process {Y_t}_{t≥0} starting from Y_0 = s − g would never reach
the best-known quality s because of the diminishing intensity, and therefore allocating a
positive fraction of the resource to exploration at gap g would clearly not be optimal for
any player.
In the two-armed bandit models of BH, KRC, and KR, the players always use the
risky arm at beliefs higher than some myopic cutoff, above which the expected short-
run payoff from the risky arm exceeds the deterministic flow payoff from the safe arm.
Because our model lacks such a myopic cutoff, it might seem reasonable to conjecture
that some player n, with her payoff function u_n bounded from above by 1 + k_¬n, never
explores and thus free-rides on the technologies developed by the other players. Such a
conjecture is refuted by the following proposition.
Proposition 2 (No Player Always Free-rides). In any Markov perfect equilibrium, no
player allocates resources exclusively to exploitation for all gaps.
The intuition behind this result is that in equilibrium, the cost and benefit of explo-
ration must be equalized at the stopping threshold for each player, whereas any player
who never explores would find that the benefit outweighs the cost, as manifested by a
kink in her payoff function at the stopping threshold. She would then have a strict
incentive to resume exploration immediately after the other players give up, hoping to
reduce the gap and bring the development process back to life. Therefore, in equilibrium,
every player must perform exploration at some states, respecting the smooth pasting
condition (Condition 1 in Lemma 4).
This result shows that each player strictly benefits from the presence of the other
players in equilibrium, which in turn encourages some of the players to explore at gaps
larger than their single-agent cutoffs. Such an encouragement effect is exhibited in all
MPE of the exploration game.
Proposition 3 (Encouragement Effect). In any Markov perfect equilibrium, at least one
player explores at gaps above the single-agent cutoff g*_1.
With the same intuition as the encouragement effect, our last general result on
Markov perfect equilibria concerns the nonexistence of equilibria where all players use
cutoff strategies.
Proposition 4 (No MPE in Cutoff Strategies). In any Markov perfect equilibrium, at
least one player uses a strategy that is not of the cutoff type.
Next we look into Markov perfect equilibria in greater depth.
5 Symmetric Equilibrium
Our characterization of best responses and the nonexistence of MPE in cutoff strategies
suggest that in any symmetric equilibrium, the players choose an interior allocation at
some states. At these states of interior allocation, the benefit of exploration must equal
the opportunity cost, and therefore the common payoff function solves the ODE
β(g, u) = 1. As a consequence of equation (6), the payoff of each player at the states
of interior allocation in the symmetric equilibrium must also satisfy u = 1 + k_¬n ≤ N.
Therefore, whenever the common payoff exceeds N, each player allocates resources
exclusively to exploration, and the payoff function satisfies the same ODE (3) as in the
cooperative solution. It is worth pointing out, however, that for some configurations of
the parameters, because of the strength of the free-riding incentives among the players, the
common payoff can be below N for all gaps; accordingly, the resource constraint
of each player is not necessarily binding even when the gap is zero, in marked contrast
to the cooperative solution.13 Lastly, the common payoff satisfies u = 1 at the gaps for
which the resource is exclusively allocated to exploitation.
The solutions to the corresponding ODEs provided in equations (7) and (8), to-
gether with the normal reflection condition (4) and the smoothness requirement on the
equilibrium payoff functions, uniquely pin down the strategies and the associated pay-
off functions in the symmetric equilibrium, which can be expressed in closed form as
follows.
Proposition 5 (Symmetric Equilibrium). The N-player exploration game has a unique
symmetric Markov perfect equilibrium with the gap as the state variable. There exist
a stopping threshold ḡ ∈ (g*_1, g*_N) and a full-intensity threshold g† ≥ 0 such that the
fraction of resource k†(g) that each player allocates to exploration at gap g is given by

k†(g) = 0 on [ḡ, +∞);
k†(g) = (rc/(N − 1)) ∫_0^{ḡ−g} φ_θ(z) dz ∈ (0, 1) on [g†, ḡ);
k†(g) = 1 on [0, g†) if g† > 0; (9)

with φ_θ(z) ≔ (1 − e^{−θz})/θ.14
The corresponding payoff function is the unique function U† : ℝ+ → [1, +∞) of class
C¹ with the following properties: U†(g) = 1 on [ḡ, +∞); U†(g) = 1 + (N − 1)k†(g) ∈
(1, N) and solves the ODE β(g, u) = 1 on (g†, ḡ); if g† > 0, then U†(g) > N and solves
the ODE β(g, u) = u/N on (0, g†).

The closed-form expressions for the common payoff function U† and the thresholds
g† and ḡ are provided in the Appendix.
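The allocation (9) can be evaluated in closed form, since ∫_0^x φ_θ(z) dz equals (x − φ_θ(x))/θ for θ ≠ 0 and x²/2 for θ = 0. The sketch below treats the thresholds ḡ and g† as given inputs (their closed forms are in the Appendix and not reproduced here); the function names are ours.

```python
import math

def phi(theta, z):
    """phi_theta(z) = (1 - exp(-theta*z))/theta, with phi_0(z) = z (footnote 14)."""
    if abs(theta) < 1e-12:
        return z
    return (1.0 - math.exp(-theta * z)) / theta

def int_phi(theta, x):
    """Closed form of the integral of phi_theta over [0, x]."""
    if abs(theta) < 1e-12:
        return x * x / 2.0
    return (x - phi(theta, x)) / theta

def k_dagger(g, g_bar, g_dag, N, r, c, theta):
    """Equilibrium allocation (9), with the stopping threshold g_bar and the
    full-intensity threshold g_dag taken as given."""
    if g >= g_bar:
        return 0.0
    if g < g_dag:
        return 1.0
    return r * c / (N - 1) * int_phi(theta, g_bar - g)
```

Since int_phi(θ, 0) = 0, the allocation is continuous at the stopping threshold, where the intensity of exploration dwindles to zero, as discussed in Section 6.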
As we have already pointed out, depending on the parameters, it is possible that
g† = 0, in which case k†(g) < 1 for all g > 0; this case will hence be referred to as the
non-binding case. The opposite case, where g† > 0, will be referred to as the binding
case. Figure 1 illustrates the symmetric equilibrium for these two cases.

13Exploration at a zero gap immediately leads to innovation and thus has an “innovation benefit” that
strictly dominates the benefit of exploitation. It might therefore seem puzzling that a player would not
explore with full intensity at a zero gap. The reason is that, heuristically speaking, the innovation benefit
can be achieved by any positive intensity of exploration at a zero gap. In the single-agent problem, the
player can only reap this innovation benefit through her own effort and thus never stops exploration
when the gap is zero. In the strategic problem, however, from each player’s own perspective, the innovation
benefit is secured by the others’ effort whenever k_¬n(0) > 0. In extreme cases, it is even possible
for some players to allocate all resources to exploitation at a zero gap. This happens in the asymmetric
equilibria for some configurations of the parameters in the next section.

14We define φ_0(z) ≔ lim_{θ→0} φ_θ(z) = z.
Figure 1. The symmetric equilibrium with binding resource constraints in a two-player
game, and with non-binding constraints in a four-player game (r = 1, c = 1/2, θ = −1).
As the outcomes are publicly observed and newly developed technologies are freely
available, the players have incentives to free-ride. This free-rider effect is seen most
clearly by comparing the benefit-cost ratio of exploration at the states of interior
allocation in the symmetric MPE,

β(g, U†) = 1,

with that in the cooperative solution,

β(g, U*) = U*(g)/N.

Exploration in equilibrium thus requires the benefit of exploration to cover the full
opportunity cost, whereas the efficient strategy entails exploration at states where the
cost exceeds the benefit, as β(g, U*) < 1 whenever U*(g) < N. Figure 2 compares the
common payoff function in the symmetric equilibrium with the cooperative solution in
a two-player exploration game.
19
0
1
2
Value
Figure 2. From top to bottom: complete-information payoff *# , average payoff *∗# in
the cooperative solution, common payoff *†# in the symmetric equilibrium, and payoff
in the single-agent optimum*∗1. Parameter values: A = 1, 2 = 1/2, \ = −1, # = 2.
5.1 Comparative Statics
In this section we examine the comparative statics of the symmetric equilibrium with
respect to the cost of exploration 2 and the number of players # .
Corollary 3 (Effect of c). The stopping threshold ḡ_c is strictly decreasing in c, and the
full-intensity threshold g†_c is weakly decreasing in c. For any gap g ≥ 0, the equilibrium
strategy k†_c(g) and the common payoff U†_c(g) are weakly decreasing in c.
As c decreases, exploration becomes less expensive and the players thus have greater
incentives to explore. Moreover, the increased exploration efforts of others encourage
each player to raise her own effort further.
The common payoff is decreasing in c for two reasons. First, as in the cooperative
solution, a reduced cost of exploration raises the players’ tolerance for failure ḡ_c, and hence
more advanced technologies will be developed and adopted in the long run. Second,
because cheaper exploration leads to a higher rate of exploration, a lower c enables
earlier adoption of the developed technologies.
Corollary 4 (Effect of N). On { N ≥ 1 | g†_N > 0 }, which is the range of N for which
the players’ resource constraints are binding in the symmetric equilibrium, the stopping
threshold ḡ_N is strictly increasing in N and the common payoff function U†_N is weakly
increasing in N. On { N ≥ 1 | g†_N = 0 }, by contrast, both ḡ_N and U†_N are constant in N,
and the equilibrium strategy k†_N is weakly decreasing in N.
In the binding case, notice that k†_N(g) is not monotone in N, because the full-intensity
threshold g†_N can be decreasing in N. This occurs when the free-rider effect outweighs
the encouragement effect. On the one hand, the extra encouragement brought by additional
players raises the stopping threshold ḡ_N. On the other hand, the increased free-riding
incentives due to the extra players tighten the requirement u > N for the resource
constraints to bind, which enlarges the region (g†_N, ḡ_N) of interior allocation. The total
effect of increasing N on the intensity of exploration is determined by these two competing
forces.
In the non-binding case, as N increases, each player adjusts her individual intensity
of exploration downward, maintaining the same equilibrium payoff. This is a situation
where the incentive to free-ride is so strong that it completely offsets any further encourage-
ment brought by additional players; even the overall intensity of exploration Nk† is
decreasing in N for gaps in [0, ḡ_N). Free-riding thus slows down the development
process considerably. In the worst scenario, the overall intensity at a zero gap can even
be lower than in the single-agent problem.
Also notice that whenever the resource constraints are not binding, the full-intensity
threshold g†_N remains constant at zero for any further increase in N, because k†_N would
only be lower. Therefore, if g†_N ever hits zero as N goes up, the resource constraints in
the symmetric equilibrium remain non-binding for any larger N.
As we have seen, depending on whether or not the resource constraints are binding
in the symmetric equilibrium, a larger team size can have qualitatively different effects
on welfare and long-run outcomes. If the resource constraints are not binding in
equilibrium, any extra resources brought by additional players translate entirely into free-
riding, which results in a highly inefficient outcome for a large team in terms of average
payoffs, the likelihood of developing the first-best technology, and the long-run technological
standard. The question then naturally arises whether the resource constraints eventually
fail to be binding in the symmetric equilibrium as N goes up, or whether, conversely, the
encouragement effect eventually overcomes the free-rider effect.
To investigate this question, we now examine the effect on the symmetric equilibrium
as N increases toward infinity, keeping the other parameters fixed. For ease of
exposition, we allow N ≥ 1 to take non-integral values and drop Assumption 1 for the
rest of this section.15

15Caution is needed in the case θ > −1, in which Assumption 1 can be violated for large N, so that
the existence and uniqueness of the symmetric equilibrium are no longer guaranteed.
Corollary 5 (Asymptotic Effect of N). There exists c̄ > 0, which depends on θ, such
that if c < c̄ and θ > −1, then we have U†_N → +∞, g†_N → +∞, and q_N(ḡ_N) → 1 as
N → rc/(1 + θ) ∈ (1, +∞); if instead c ≥ c̄ or θ ≤ −1, then g†_N = 0 for sufficiently
large N, lim U†_N(g) < lim U*_N(g) for each g ≥ 0, ḡ_N is bounded, and
q_N(ḡ_N) is bounded away from 1 as N → +∞.
Next we discuss this result for θ < −1 and θ ≥ −1 separately.
When θ < −1, recall that the cooperative payoff converges to the finite limit of
the complete-information payoff as N → +∞, with the cooperative cutoff g*_N → +∞
and thus the long-run probability of developing the first-best technology q_N(g*_N) → 1.
In contrast, the symmetric equilibrium always falls into the non-binding category for
large N. After the full-intensity threshold g†_N reaches zero, the encouragement effect is
completely frozen by the free-rider effect: the stopping threshold ḡ_N remains constant
for any further increase in N, so both the amount and the rate of exploration are highly
inefficient for large N. Consequently, as N → +∞, the average payoff is bounded away
from the cooperative payoff in the limit, and q_N(ḡ_N) is bounded away from 1.
On the other hand, when θ ≥ −1, recall that in addition to the divergence of g*_N,
the cooperative payoff tends to infinity as N → rc/(1 + θ) ∈ (1, +∞]. Whether this
is also the case for the symmetric equilibrium now depends on the cost of exploration, as
illustrated in Figure 3. If exploration is too expensive, the symmetric equilibrium again
falls into the non-binding category for large N, as in the case θ < −1. However,
if exploration is sufficiently inexpensive, the resource constraints are always binding in
the symmetric equilibrium. Not only does the stopping threshold ḡ_N tend to infinity,
but so does the full-intensity threshold g†_N. The encouragement effect thus eventually
overcomes the free-rider effect when exploration is sufficiently cheap and the prospect
of development is sufficiently promising. Both the amount and the rate of exploration
then resemble the cooperative solution for large N. As a consequence, q_N(ḡ_N) → 1 and the
average equilibrium payoff tends to infinity as N goes up, mirroring the cooperative
payoff.
Also note from Figure 3 that when θ > −1, for large fixed N the symmetric equilib-
rium can be very sensitive to c in the neighborhood of c̄. This observation suggests that
perturbations in the cost of exploration can have a drastic effect on the welfare and the
long-run outcomes of technology development.

Lastly, as yet another manifestation of the strength of the free-rider effect, in the
case θ > −1, Corollary 5 asserts that a symmetric equilibrium with finite payoffs exists
for all N provided that c ≥ c̄, in marked contrast to the cooperative solution,
which fails to exist for large N due to unbounded payoffs.
Figure 3. Full-intensity thresholds g†_N (panel a) and stopping thresholds ḡ_N (panel b) in
the symmetric equilibrium (r = 1, θ = −0.09) for different costs of exploration c. If
exploration is sufficiently cheap (solid curves), the resource constraints are binding
(g†_N > 0) for all N. Otherwise (dashed curve), the resource constraints are not binding
(g†_N = 0) for sufficiently large N.
6 Asymmetric Equilibria and Welfare Properties
Note that in the symmetric equilibrium, the intensity of exploration dwindles to
zero as the gap approaches the stopping threshold. As a result, the threshold is never
reached and exploration never fully stops. This suggests that welfare can be improved if
the players alternate between the roles of explorer and free-rider, keeping the intensity
of exploration bounded away from zero until all exploration stops. In this section
we investigate this possibility by constructing a class of asymmetric Markov perfect
equilibria.
6.1 Construction of Asymmetric Equilibria
Our construction of asymmetric MPE is based on the idea of the asymmetric MPE
proposed in KR. We let the players adopt the common actions in the same way as in
the symmetric equilibrium whenever the resulting average payoff is high enough to
induce an overall intensity of exploration greater than one, and let the players take turns
exploring at more pessimistic states in order to maintain the overall intensity at one.
Such alternation between the roles of explorer and free-rider leads to an overall intensity
of exploration higher than in the symmetric equilibrium, yielding higher equilibrium
payoffs.
Here we briefly describe the two main steps of our construction. In the first step,
we construct the average payoff function u. We let u solve the same ODE β(g, u) =
max{u(g)/N, 1} as the common payoff function in the symmetric equilibrium whenever
u > 2 − 1/N, so that the corresponding overall intensity is Nk(g) = N(u(g) − 1)/(N −
1) ≥ 1. Whenever 1 < u < 2 − 1/N, we let u solve the ODE u(g) = 1 − 1/N + β(g, u),
which is the ODE for the average payoff function among N players associated with an
overall intensity of 1. The boundary conditions for the average payoff function u,
namely the smooth pasting condition at the stopping threshold and the normal reflection
condition (4), are identical to the conditions in Lemma 4, simply because these conditions
remain unchanged after taking the average. The unique solution of class C¹(ℝ++) to
the ODE above serves as the average payoff function, which also yields thresholds
g♭ > g♯ ≥ 0 such that u = 1 on [g♭, +∞), 1 < u < 2 − 1/N on (g♯, g♭), and u > 2 − 1/N
on [0, g♯). In the second step, equilibrium-compatible actions are assigned to each
player. On [0, g♯), if it is nonempty, we let the players adopt the common action
k_n(g) = min{(u(g) − 1)/(N − 1), 1}, in the same way as in the symmetric equilibrium.
On [g♯, g♭), the players alternate between the roles of explorer and free-rider so as to keep
the overall intensity at one. We first split [g♯, g♭] into subintervals in an arbitrary way
and then meticulously choose the switch points of the players’ actions, so that all individual
payoff functions have the same values and derivatives as the average payoff function at
the endpoints of these subintervals.16 Lastly, our characterization of MPE in Lemma 4
confirms that the assigned action profile is compatible with equilibrium. We leave the
method for choosing the switch points and further details to the Appendix.
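The two regimes of the construction can be summarized by the overall intensity they induce as a function of the average payoff u. The function below is a schematic sketch of that regime assignment (u itself is taken as given, and the function name is ours); the final loop confirms that the overall intensity is continuous at the regime boundary u = 2 − 1/N.

```python
def overall_intensity(u, N):
    """Overall intensity of exploration implied by the average payoff u
    in the two regimes of the construction (u > 1 assumed)."""
    if u > 2.0 - 1.0 / N:
        # Common-action regime: N*k = N*(u - 1)/(N - 1) >= 1.
        return N * (u - 1.0) / (N - 1.0)
    # Turn-taking regime: players alternate so the overall intensity stays at 1.
    return 1.0

# Continuity at the regime boundary u = 2 - 1/N: the common-action
# formula evaluates to exactly 1 there.
for N in [2, 3, 5]:
    u_star = 2.0 - 1.0 / N
    assert abs(N * (u_star - 1.0) / (N - 1.0) - 1.0) < 1e-12
```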
For N = 2, Figure 4 illustrates the intensity of exploration in the asymmetric MPE,
compared with the symmetric equilibrium. The resource constraints are binding in the
depicted equilibria, but it is worth bearing in mind that the opposite may well hold for
other parameters. For example, if the cost of exploration is too high, the average payoff
function can be bounded above by 2 − 1/N, resulting in an intensity of exploration equal
to 1 on the entire region [0, g♭). The two switch points marked in the figure demarcate
where the players swap roles as they take turns exploring on [g♯, g♭): the volunteer
explores on the first and last of the three resulting subintervals, whereas the free-rider
explores on the middle one. These switch points are chosen in a way that ensures the
individual payoff functions are of class C¹(ℝ++) and coincide on [0, g♯).

Figure 5 illustrates the associated average payoff function (dashed curve) and the
individual payoff functions (solid curves) that can arise in these equilibria in a two-player
game, compared with the common payoff function in the symmetric equilibrium (dotted
curve). Note that the payoff function of the volunteer is strictly higher than
that of the free-rider at the states immediately to the left of the stopping threshold g♭.
In fact, the free-rider has a payoff equal to 1 on the last subinterval, adjacent to g♭.
This observation stands in
16In fact, this technique can be used to construct asymmetric equilibria with strategies that take values
in {0, 1} only, which are referred to as simple equilibria in KRC. See Proposition 10 in the Appendix.
However, it is not clear whether such equilibria achieve higher average payoffs than the symmetric
equilibrium.
Figure 4. From top to bottom: intensity of exploration in the cooperative solution, the
asymmetric equilibria, and the symmetric equilibrium (r = 1, c = 1/2, θ = −1, N = 2).
marked contrast to the models in KRC and KR, where the volunteer is worse off in
this region. The intuition behind this feature is similar to that behind Proposition 2. For
the free-rider to attain a higher payoff than the volunteer at states immediately to the
left of g♭, a kink would have to be created at g♭ in her payoff function, because her
ODE u(g) = 1 + β(g, u) must be satisfied.17 In that case, the free-rider would have a strict
incentive to take over the role of volunteer so as to kickstart the development process at a
larger gap. Therefore, in equilibrium, the volunteer must be compensated for acting as
a lone explorer at pessimistic states by bearing a relatively smaller burden at more optimistic
states.
For arbitrary # , we have the following result.
Proposition 6 (Asymmetric MPE). The N-player exploration game admits Markov
perfect equilibria with thresholds 0 ≤ g‡ ≤ g♯ < g♭ < g*_N, such that on [0, g♯], the
players have a common payoff function; on [0, g‡], all players allocate resources
exclusively to exploration; on (g‡, g♯), the players allocate a common interior fraction of the unit
resource to exploration, and this fraction decreases in the gap; on [g♯, g♭), the intensity
of exploration equals 1, with players taking turns exploring on consecutive subintervals;
17Even though the free-rider’s payoff equals 1 near g♭, she still benefits from free-riding in
this region. This benefit, however, is offset by the relatively high burden of exploration effort she must
bear in equilibrium at more optimistic states to reward the volunteer.
Figure 5. Average payoff and possible individual payoffs in the best two-player asym-
metric equilibria, compared to the common payoff in the symmetric equilibrium (r = 1,
c = 1/2, θ = −1, N = 2).
on [g♭, +∞), all players allocate resources exclusively to exploitation. The intensity of
exploration is continuous in the gap on [0, g♭). The average payoff function is strictly
decreasing on [0, g♭], once continuously differentiable on ℝ++, and twice continuously
differentiable on ℝ++ except at the cutoff g♭. On [0, g♭), the average payoff is higher than
in the symmetric MPE, and g♭ lies to the right of the threshold ḡ at which all exploration
stops in that equilibrium.
6.2 Welfare Results
For N ≥ 3, further improvements can easily be achieved by letting the players take turns
exploring so as to maintain the intensity of exploration at m whenever m < u < m + 1 − m/N,
for each m ∈ {1, . . . , N}, rather than for m = 1 only, as in the proposition above. However,
it is not clear whether such improvements achieve the highest welfare among all MPE
of the N-player exploration game. For N = 2, the asymmetric equilibria of Proposition
6 are the best among all MPE.
Proposition 7 (Best MPE for N = 2). The average payoff in any Markov perfect
equilibrium of the two-player exploration game cannot exceed the average payoff in the
equilibria of Proposition 6.
In the construction of the asymmetric MPE depicted in Figure 4, the interval [g♯, g♭]
is not split into subintervals. This can be confirmed by the observation from Figure 5
that the players’ payoff functions match in value only at the endpoints of [g♯, g♭], not in the
interior. Our construction allows an arbitrary partition of [g♯, g♭] in the splitting
procedure, so the trivial partition of [g♯, g♭], as in Figure 4, suffices. A finer partition,
however, produces equilibria in which the players exchange roles more often, which
allows them to share the burden of exploration more equally. Sufficiently frequent
alternation of roles on [g♯, g♭) guarantees each player a payoff close enough to the
average payoff and thus yields a Pareto improvement over the symmetric equilibrium.18
Proposition 8 (Pareto Improvement over the Symmetric MPE). For any ε > 0, the N-
player exploration game admits Markov perfect equilibria as in Proposition 6 in which
each player’s payoff exceeds the symmetric equilibrium payoff on [0, g♭ − ε].
Recall that the symmetric equilibrium exhibits a limited encouragement effect when
the cost of exploration is too high. The reason is that the common payoff function on the
interval of interior allocation must satisfy the ODE β(g, U†) = 1, which does not depend
on N. As a result, the common payoff function in the symmetric equilibrium is constant
in the team size when the resource constraints are not binding. By contrast, the average
payoff always increases in the number of players in the asymmetric MPE in which the
players take turns exploring at states immediately to the left of the stopping cutoff. This
is because the burden of keeping the overall intensity at one at these states can be shared
among more players in larger teams. As a result, the players are able to exploit
more often on average, which in turn encourages them to explore at more pessimistic
states. Therefore, unlike in the symmetric equilibrium, the encouragement effect in the
asymmetric equilibria is not frozen by the free-rider effect as N goes up, irrespective of
the cost and the prospect of development.
Proposition 9. The stopping cutoff g♭_N in the asymmetric MPE of Proposition 6 goes to
infinity as N → +∞.19
Therefore, the amount of exploration, and thus the long-run outcomes, can be
improved significantly over the symmetric equilibrium for large N by letting the players
take turns exploring before exploration fully stops. Unfortunately, this positive result
does not extend as far to welfare, as the rate of exploration might still be too
low because of non-binding resource constraints, similar to the situation faced in the
18 The payoffs of both players in the asymmetric MPE depicted in Figure 5 are higher than in the symmetric equilibrium on [0, x̄); however, this might not be the case in general when the trivial partition of [x♯, x♭] is used in the construction, as x♯ could lie to the left of x̄.
19 When θ > −1, if the asymmetric MPE of Proposition 6 fail to exist due to unbounded payoffs for N ≥ rc/(1 + θ), then by N → +∞ we actually mean N → rc/(1 + θ).
symmetric equilibrium. More precisely, when θ < −1, the full-intensity threshold x‡_N always hits 0 as N → +∞; when θ ≥ −1, whether this happens again depends on the cost of exploration, just as in the symmetric equilibrium. It can be shown that the results regarding the average payoff and the full-intensity threshold in Corollary 5 extend to the asymmetric MPE of Proposition 6.
7 Conclusion
This paper provides a novel and tractable framework for analyzing strategic interaction
among a team of players conducting incremental experimentation. We show how welfare
and long-run outcomes depend on the strength of two competing forces—the free-rider
effect and the encouragement effect.
Several limitations of our model are noteworthy. First, the scope of experimentation is restricted: players do not have complete freedom to choose where to explore. Admittedly, this is a simplifying assumption when our model is viewed as a model of spatial experimentation in the spirit of Callander (2011) and Garfagnini and Strulovici (2016). Yet incremental experimentation in our model can alternatively be viewed as an approximation of a sequential search process in which the search results follow a Markov process, with the search frequency controlled by the players. In this sense, the restriction imposed on the scope of experimentation is no different from the Markov assumption on the search results. Second, in our model all actions and outcomes are publicly observable. Hence there are perfect knowledge spillovers between players, and the developed technologies are pure public goods. Extensions that allow for some degree of excludability of the technologies or frictions in the diffusion of knowledge could be useful for applications in the industrial organization literature. Third, the cost of exploration in our model is assumed to be the same for all players. The effects of heterogeneity in the abilities of the players on their welfare and exploration incentives are also worth examining in future research.
References
Barbu, Viorel. “Existence and Uniqueness for the Cauchy Problem”. In: Differential
Equations. Cham: Springer International Publishing, 2016, pp. 29–77.
Bolton, Patrick and Christopher Harris. “Strategic Experimentation”. In: Econometrica
67.2 (1999), pp. 349–374.
Callander, Steven. “Searching and Learning by Trial and Error”. In: American Economic
Review 101.6 (Oct. 2011), pp. 2277–2308.
Cetemen, D., C. Urgun, and L. Yariv. Collective Progress: Dynamics of Exit Waves.
Working Paper. 2021.
Denrell, Jerker and James G. March. “Adaptation as Information Restriction: The Hot
Stove Effect”. In: Organization Science 12.5 (2001), pp. 523–538.
Fleming, W. and H. Soner. “Optimal Control of Markov Processes: Classical Solutions”.
In: Controlled Markov Processes and Viscosity Solutions. New York, NY: Springer
New York, 2006, pp. 119–149.
Garcia-Macia, Daniel, Chang-Tai Hsieh, and Peter J. Klenow. “How Destructive Is
Innovation?” In: Econometrica 87.5 (2019), pp. 1507–1541.
Garfagnini, Umberto and Bruno Strulovici. “Social Experimentation with Interdepen-
dent and Expanding Technologies”. In: The Review of Economic Studies 83.4 (Feb.
2016), pp. 1579–1613.
Gittins, J. C. “Bandit Processes and Dynamic Allocation Indices”. In: Journal of the
Royal Statistical Society. Series B (Methodological) 41.2 (1979), pp. 148–177.
Keller, Godfrey and Sven Rady. “Strategic experimentation with Poisson bandits”. In:
Theoretical Economics 5.2 (2010), pp. 275–311.
Keller, Godfrey, Sven Rady, and Martin Cripps. “Strategic Experimentation with Expo-
nential Bandits”. In: Econometrica 73.1 (2005), pp. 39–68.
Lee, Eucman and Phanish Puranam. “The implementation imperative: Why one should
implement even imperfect strategies perfectly”. In: Strategic Management Journal
37.8 (2016), pp. 1529–1546.
Lehoczky, John P. “Formulas for Stopped Diffusion Processes with Stopping Times
Based on the Maximum”. In: The Annals of Probability 5.4 (1977), pp. 601–607.
March, James G. “Exploration and Exploitation in Organizational Learning”. In: Orga-
nization Science 2.1 (1991), pp. 71–87.
Peskir, Goran and Albert Shiryaev. “Methods of solution”. In: Optimal Stopping and
Free-Boundary Problems. Basel: Birkhäuser Basel, 2006, pp. 143–241.
Posen, Hart E. and Daniel A. Levinthal. “Chasing a Moving Target: Exploitation and Ex-
ploration in Dynamic Environments”. In: Management Science 58.3 (2012), pp. 587–
601.
Rothschild, Michael. “A two-armed bandit theory of market pricing”. In: Journal of
Economic Theory 9.2 (1974), pp. 185–202.
Stieglitz, Nils, Thorbjørn Knudsen, and Markus C. Becker. “Adaptation and inertia in
dynamic environments”. In: Strategic Management Journal 37.9 (2016), pp. 1854–
1864.
Taylor, Howard M. “A Stopped Brownian Motion Formula”. In: The Annals of Proba-
bility 3.2 (1975), pp. 234–246.
Thomke, S. H. Experimentation Works: The Surprising Power of Business Experiments. Boston, MA: Harvard Business Review Press, 2020.
Urgun, C. and L. Yariv. Retrospective Search: Exploration and Ambition on Uncharted
Terrain. Working Paper. Department of Economics, Princeton University, 2020.
Appendices
In this Appendix, we let θ ≔ 2μ/σ² and ρ ≔ σ²/(2rc), where μ ∈ ℝ denotes the drift of the Brownian path W, without normalizing the volatility σ to √2. Consequently, we replace λ in Lemma 1 with λ ≔ 1/(Nρ) − θ. Moreover, we let δ ≔ rc/N for the sake of notational convenience. Assumption 1, that λ > 1, is maintained throughout this Appendix unless it is explicitly stated otherwise.
A Explicit Representation of the Symmetric MPE
Corollary 6. The explicit representation of the normalized payoff function U† in the unique symmetric equilibrium on [x†, x̄) is given by

U†(x) = 1 + (1/(2ρ))(x̄ − x)²  if θ = 0,
U†(x) = 1 + (1/(ρθ))[ x̄ − x + (1/θ)(e^{−θ(x̄−x)} − 1) ]  if θ ≠ 0.
If x† > 0, U† on [0, x†) is given by

U†(x) = (N/(γ₂ − γ₁)) [ (γ₂ + η) e^{−γ₁(x†−x)} − (γ₁ + η) e^{−γ₂(x†−x)} ],
with γ₁ < γ₂ being the roots of the equation γ(γ − θ) = 1/(Nρ) and η > 0 being

η ≔ (1/(Nρθ)) [ 1 + W₀( −exp(−1 − (N − 1)ρθ²) ) ]  if θ > 0,
η ≔ √( (2/(Nρ))(1 − 1/N) )  if θ = 0,
η ≔ (1/(Nρθ)) [ 1 + W₋₁( −exp(−1 − (N − 1)ρθ²) ) ]  if θ < 0,

where W₀ and W₋₁ denote the two real branches of the Lambert W function.
If η < 1, then the equilibrium belongs to the binding case. The full-intensity threshold is given by

x† = (1/(γ₂ − γ₁)) [ ln((1 + γ₂)/(η + γ₂)) − ln((1 + γ₁)/(η + γ₁)) ],

and the stopping threshold is given by

x̄ = x† + Nρ(η + θ(1 − 1/N)).
If η ≥ 1, then the equilibrium belongs to the non-binding case with full-intensity threshold x† = 0, whereas the stopping threshold x̄ is given by

x̄ = (1/θ) [ 1 + θ − θ²ρ + W₀( −(1 + θ)e^{−1−θ+θ²ρ} ) ]  if θ < 0,
x̄ = 1 − √(1 − 2ρ)  if θ = 0,
x̄ = (1/θ) [ 1 + θ − θ²ρ + W₋₁( −(1 + θ)e^{−1−θ+θ²ρ} ) ]  if θ > 0.
Proof. From Lemma 4 and explicit calculations. ∎
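The Lambert-W expressions above are straightforward to evaluate numerically. A minimal sketch in Python, assuming SciPy's `lambertw` and purely illustrative parameter values (N = 2, ρ = 0.2 are placeholders, not calibrated; the formula follows the reconstructed expression for η above):

```python
import numpy as np
from scipy.special import lambertw

def eta(N, rho, theta):
    """Threshold parameter eta of Corollary 6 (reconstructed form, illustrative)."""
    if theta == 0.0:
        return np.sqrt(2.0 * (1.0 - 1.0 / N) / (N * rho))
    branch = 0 if theta > 0 else -1  # W_0 branch for theta > 0, W_{-1} for theta < 0
    w = lambertw(-np.exp(-1.0 - (N - 1) * rho * theta**2), k=branch).real
    return (1.0 + w) / (N * rho * theta)

# the theta -> 0 limit of the closed form approaches the theta = 0 expression
print(eta(2, 0.2, 1e-4), eta(2, 0.2, 0.0))
```

The branch choice mirrors the case distinction in the corollary: both branches produce a real value because the argument lies in (−1/e, 0) whenever θ ≠ 0.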
B Properties of Payoff Functions
Lemma 5 (Homogeneity). Player n's payoff function for an s-invariant Markov strategy profile κ ∈ K_N can be written as v_n(x, s | κ) = e^s v_n(x, 0 | κ).

Proof. At state (x, s), player n's payoff associated with κ is given by

v_n(x, s | κ) = E[ ∫₀^∞ r e^{−rt} (1 − k_n(X_t)) e^{S_t} dt | X₀ = x, S₀ = s ]
= e^s E[ ∫₀^∞ r e^{−rt} (1 − k_n(X_t)) e^{S_t − s} dt | X₀ = x, S₀ = s ]
= e^s E[ ∫₀^∞ r e^{−rt} (1 − k_n(X_t)) e^{S_t} dt | X₀ = x, S₀ = 0 ]
= e^s v_n(x, 0 | κ),

where the second-to-last equality comes from the Markov property of the diffusion process {X_t}_{t≥0}. ∎
Lemma 6 (Normal Reflection Condition and Feynman–Kac Equation). Given an s-invariant Markov strategy profile, the associated payoff function of each player satisfies the Feynman–Kac equation (6) at each state at which it is twice continuously differentiable. Moreover, if the intensity of exploration is bounded away from 0 in some neighborhood of the state x = 0, the payoff function also satisfies the normal reflection condition (4).
Proof. We prove a more general version of this lemma in the Online Appendix for Markov strategies with (x, s) as the state variables. The normal reflection condition (∂v_n/∂x + ∂v_n/∂s)(0+, s) = 0 takes the form u_n(0) + u_n′(0+) = 0 because of the homogeneity of the payoff functions for s-invariant strategies. ∎
C Complete Information Setting
C.1 Proof of Lemma 1
Proof. Let s_W ≔ sup_{y≥0}{W(y) − δy}, and write ỹ ≔ W(0) = s − x for the quality of the latest technology. Note that W(y) − ỹ − δy has the same law as a Brownian motion starting from 0 with drift μ − δ and volatility σ. It is then well known that s_W − ỹ has the exponential distribution with mean 1/λ whenever λ > 0, and that s_W = +∞ with probability 1 when λ ≤ 0. In either case, if λ ≤ 1 we have E[e^{s_W}] = +∞, which obviously leads to V(x, s) = +∞.

Now suppose λ > 1. The (ex ante) value under complete information at state (x, s) can be calculated as

V(x, s) = E[ v(W) | W(0) = ỹ, S₀ = s ]
= P(s_W ≤ s) e^s + P(s_W > s) E[ e^{s_W} | s_W > s ]
= e^s ( P(s_W ≤ s) + P(s_W > s) E[ e^{ỹ−s} e^{s_W−ỹ} | s_W > s ] ),

where v(W) = e^{s_W} is proved in Lemma 7 below.

Note that s_W > s if and only if s_W − ỹ > x, and hence we have

P(s_W > s) E[ e^{ỹ−s} e^{s_W−ỹ} | s_W > s ] = e^{−x} ∫_x^∞ e^z λ e^{−λz} dz = −(λ/(1 − λ)) e^{−λx}.

Therefore, we have V(x, s) = e^s ( 1 − e^{−λx} − (λ/(1 − λ)) e^{−λx} ) ≕ e^s U(x) with U(x) = 1 + e^{−λx}/(λ − 1). ∎
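The exponential law of the all-time maximum used in this proof can be checked by simulation. A rough Monte Carlo sketch in Python (discretized paths with illustrative parameters; the horizon and step size are arbitrary choices, and discrete monitoring biases the estimate slightly downward):

```python
import numpy as np

rng = np.random.default_rng(0)
drift, sigma = -1.0, 1.0                 # net drift mu - delta < 0
lam = 2.0 * (-drift) / sigma**2          # rate of the limiting exponential law, here 2
T, dt, n_paths = 10.0, 0.005, 2000
steps = int(T / dt)

# discretized paths of W(y) - delta*y; the running maximum includes y = 0
inc = drift * dt + sigma * np.sqrt(dt) * rng.standard_normal((n_paths, steps))
running_max = np.maximum(0.0, np.cumsum(inc, axis=1).max(axis=1))

print(running_max.mean())  # should lie near 1/lam = 0.5
```

With negative net drift the maximum is attained early with high probability, so the finite truncation horizon contributes little error relative to the discretization bias.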
Lemma 7 (Average Ex Post Value Function under Complete Information). For a given Brownian path W, the average (ex post) value under complete information is given by v(W) = e^{s_W}, where s_W = sup_{y≥0}{W(y) − δy}.

Proof. For an arbitrary technology y ≥ 0, denote by K(y) = {K_t}_{t≥0} the cutoff strategy in which the players explore with full intensity until y is developed, and exploit y thereafter. In other words, K_t = N 1_{[0, cy/N)}(t). By the nature of the problem, it is obvious that we can focus on this class of cutoff strategies without loss.

The average payoff for strategy K(y) can be calculated as

v(W | K(y)) = ∫_{cy/N}^∞ r e^{−rt} e^{W(y)} dt = e^{W(y) − δy},

which gives

v(W) = sup_{y≥0} v(W | K(y)) = e^{s_W},

and the proof is complete.

We define the first-best technology to be any ŷ ∈ arg max_{y≥0}{W(y) − δy}, if this set is not empty given W. In such a case, the value v(W) < +∞ is achieved by strategy K(ŷ). ∎
D Long-run Outcomes
D.1 Proof of Lemma 2
Proof. This lemma can be easily derived from the results in Lehoczky (1977). ∎
D.2 Proof of Lemma 3
Proof. By the definition of ŷ in the proof of Lemma 7, we have W(ŷ) − δŷ ≥ W(y) − δy for all y ≥ 0. We assume there exists a unique such ŷ without loss.20 Let y̌ ≔ arg max_{[0, ỹ]} W(y) denote the technology with quality W(y̌) = s.21 Note that if x > 0 we have W(ỹ) = s − x; otherwise we have y̌ = ỹ and thus W(y̌) = s.

Suppose that ŷ > y̌. Then we have W(ŷ) − δŷ > s − δŷ ≥ s − δy̌, which implies the event sup_{y>y̌}{W(y) − δ(y − y̌)} > s. By the Markov property of the Brownian motion, this event occurs with probability P(M∞ > max{x, 0}), where M∞ denotes the global maximum of a Brownian motion starting from zero, with drift μ − δ and volatility σ. It is well known that M∞ has an exponential distribution with mean −σ²/(2(μ − δ)) = 1/λ. Therefore, we have q(x) = 1 − P_{x,s}(ŷ > y̌) ≥ 1 − P(M∞ > max{x, 0}) → 1 as x → +∞, since λ > 1 under Assumption 1. ∎
20 This assumption holds almost surely under Assumption 1.
21 By standard results, y̌ is almost surely finite under Assumption 1. See, e.g., equation (1.1) of Taylor (1975).
E Characterization and Properties of MPE
E.1 Proof of Lemma 4
Lemma 4 follows from the following two lemmas.
Lemma 8 (Sufficiency). Given κ_{¬n} ∈ K_{N−1}, if u_n : ℝ₊ → ℝ satisfies Conditions 1–4 in Lemma 4, then a piecewise right-continuous function k*_n : ℝ₊ → [0, 1] which maximizes the right-hand side of the HJB equation (5) at each continuity point of u_n″ is a best response against κ_{¬n}, with u_n being the associated payoff function of player n.

Proof. The proof is a standard verification argument. See, e.g., Fleming and Soner (2006), Theorem III.9.1. The role played by the linear growth condition in a standard proof is instead played by Assumption 1. The detailed proof is provided in the Online Appendix. ∎
Lemma 9 (Necessity). In any MPE κ ∈ K_N, for each player n ∈ {1, . . . , N}, her payoff function u_n(· | κ) satisfies Conditions 1–4 in Lemma 4, and her equilibrium strategy k_n(x) maximizes the right-hand side of the HJB equation (5) at each continuity point of u_n″.

Proof. Condition 1: On the stopping region where K(x) = 0, it is trivial that the payoff functions are smooth. On the region where K(x) is bounded away from zero, players' payoff functions are once continuously differentiable by standard results. Because the intensity of exploration in any MPE is bounded away from zero on any compact subset of [0, x̄), as we argue in Section 4.2, the only part left to prove is the smooth-pasting condition, which states that the payoff functions in any MPE must be once continuously differentiable at the stopping threshold x̄. This is proved in the Online Appendix.

Condition 2: By standard results, players' payoff functions are twice continuously differentiable at each point at which all players' strategies are continuous. Condition 2 thus follows from our piecewise continuity assumption on players' strategies.

Condition 3 follows from Lemma 6.

Condition 4 follows directly from the dynamic programming principle. ∎
E.2 Proof of Proposition 2
Proof. Suppose to the contrary that there is an MPE where player 1 chooses exploitation at all states. Let x̄ > 0 denote the state at which all exploration stops. Then on (0, x̄) player 1's payoff function u₁ is continuously differentiable and solves the free-rider ODE u(x) = 1 + K(x)B(x, u) with value matching u(x̄) = 1 and smooth pasting condition u′(x̄) = 0. It can then be easily verified that u₁ ≡ 1 is the unique solution, which is a contradiction because the normal reflection condition (4) is violated. ∎
E.3 Proof of Proposition 3
Proof. As the individual payoff functions are bounded from below by the single-agent payoff function U*₁, it is clear that the stopping threshold x̄ in any MPE is weakly larger than the single-agent cutoff x*₁. Suppose by contradiction that x̄ = x*₁. Assume without loss of generality that all players' strategies are continuous on (x*₁ − ε, x*₁) for some ε > 0, so that each player's payoff function is twice continuously differentiable on this region. Note that for each player n, both u_n and U*₁ satisfy value matching and smooth pasting at x*₁. Then from equation (6), we have for each player n,

ρ u_n″(x*₁−) = k_n(x*₁−)/K(x*₁−) ≤ 1

and ρ U*₁″(x*₁−) = 1. As ρ u_n″(x*₁−) < 1 for at least one player, her payoff lower bound u_n ≥ U*₁ is then violated at the states immediately to the left of x̄. ∎
E.4 Proof of Proposition 4
Proof. Suppose by contradiction that there is an MPE where all players use cutoff strategies. Let player 1 be the one who uses the strategy with the largest cutoff x̄. Then no other player uses the same cutoff x̄ as player 1, for the following reason. Suppose to the contrary that player 2 uses a strategy with the same cutoff x̄. Then both players 1 and 2 must have a payoff strictly greater than 1 and lower than 2 at the states immediately to the left of x̄, as their payoff functions solve the explorer ODE u(x) = K B(x, u) for some K > 1 with initial conditions u(x̄) = 1 and u′(x̄) = 0. As a result, exploration with full intensity cannot be optimal for both players at the states immediately to the left of the cutoff x̄, according to our characterization of best responses. Therefore, player 1 would be the lone explorer with u₁ > 1 on (x̄ − ε, x̄) for some ε > 0, whereas all other players free-ride on this region.

Moreover, it is easy to see that player 1's payoff is weakly lower than the others', because the region on which she collects flow payoffs is the smallest among all players. However, on (x̄ − ε, x̄) the payoff function u_n of player n ≠ 1 satisfies the free-rider ODE u = 1 + B(x, u) with the same initial conditions as player 1. This yields the unique solution u_n = 1 < u₁ on (x̄ − ε, x̄), a contradiction. ∎
F Symmetric MPE
F.1 Proof of Proposition 5
From Lemma 4, it is not difficult to verify that the strategy profile in Corollary 6
constitutes an equilibrium, and that our proposition properly summarizes the equilibrium
payoff function in that corollary.
Uniqueness follows directly from symmetry and Lemma 4. One can check by explicit calculations that, under Assumption 1, the equilibrium in Corollary 6 is the only symmetric s-invariant strategy profile such that the associated common payoff function satisfies Conditions 1–4 in Lemma 4.
The comparison between the stopping threshold and the cooperative cutoffs follows
from Lemma 10.
G Comparative Statics
Lemma 10 (Welfare Comparison). Consider normalized payoff functions u_i : ℝ₊ → [1, +∞) for i = 1, 2, with stopping thresholds x̄_i ≔ sup{ x ≥ 0 | u_i(x) > 1 } < +∞. Suppose for each i = 1, 2, u_i satisfies the following conditions:

1. C¹ on (0, x̄_i) with u₁′(x̄₁) = u₂′(x̄₂) ≤ 0;
2. piecewise C² on ℝ₊₊;
3. there exists some function F_i : (1, +∞) × ℝ₋ → ℝ₊₊, with F_i(z₁, z₂) nondecreasing in z₁ and nonincreasing in z₂, such that
(a) u_i″(x) = F_i(u_i(x), u_i′(x)) for all x ∈ (0, x̄_i) at which u_i″(x) is continuous;
(b) u_i′(x) + F_i(u_i(x), u_i′(x)) > 0 for all x ∈ (0, x̄_i).

If F₁ ≥ F₂, then x̄₁ ≤ x̄₂ and u₁ ≤ u₂. Additionally, if for some ũ ∈ (1, u₂(0)] we have F₁ = F₂ on (ũ, u₂(0)) × ℝ₋, and for all z₂ ≤ 0 at least one of the following two conditions holds:

1. F₁(ũ−, z₂) > F₂(ũ−, z₂);
2. ∂F₁(ũ−, z₂)/∂z₁ < ∂F₂(ũ−, z₂)/∂z₁ and ∂F₁(ũ−, z₂)/∂z₂ ≥ ∂F₂(ũ−, z₂)/∂z₂;

then x̄₁ < x̄₂, and u₁ < u₂ on [0, x̄₂).

Proof. See the Online Appendix. ∎
Remark. Almost all the payoff functions in this paper fulfill the first four conditions in the lemma above under Assumption 1, with value matching and smooth pasting at x̄. These include

• the cooperative solution: F(z₁, z₂) = z₁/(Nρ) + θz₂;
• the symmetric MPE: F(z₁, z₂) = max{z₁/N, 1}/ρ + θz₂;
• the average payoff in the asymmetric MPE:
F(z₁, z₂) = max{z₁/N, min{1, z₁ − (1 − 1/N)}}/ρ + θz₂;
• the average payoff in the simple asymmetric MPE of Proposition 10:
F(z₁, z₂) = (z₁ − (1 − ℓ/N))/(ℓρ) + θz₂, if ℓ < z₁ ≤ ℓ + 1 for ℓ = 1, . . . , N − 1,
F(z₁, z₂) = z₁/(Nρ) + θz₂, if z₁ ≥ N.
G.1 Proof of Corollary 1
Proof. Here we prove the limit results only; the other results can be easily derived from explicit calculations. Write w = (ln U)′ and w* = (ln U*)′. It is not difficult to verify that w′ = −(1/(Nρ) − θ)w − w² on ℝ₊₊, while w*′ = 1/(Nρ) + θw* − w*² on (0, x*_N) and w* = 0 on [x*_N, +∞).

Suppose that θ ≤ −1. Then on (0, x*_N), both ODEs above converge to w′ − θw + w² = 0 as N → +∞. Because the normal reflection condition (4) implies w(0) = w*(0) = −1, we must have lim w* = lim w on (0, x*_N). However, notice that lim w(x) < 0 for all x ≥ 0. Therefore, we must have x*_N → +∞ by the continuity of w* on ℝ₊. This, together with the convergence of w* and w, implies lim U*(x) = lim U(x) for each x ≥ 0. In particular, for θ = −1, from Lemma 1 we know U(x) → +∞ as N → +∞, and hence U*(x) → +∞ for each x ≥ 0. Then the results for the case of θ > −1 follow from the monotonicity of U*(x) in θ, which can be easily verified by Lemma 10. ∎
G.2 Proof of Corollary 3
Proof. Since c = σ²/(2rρ), the comparative statics with respect to c follow directly from the following proof of the comparative statics with respect to ρ. Recall that the common payoff function U†_ρ in the symmetric MPE satisfies the ODE u″ = F(u, u′) on (0, x̄), where F(z₁, z₂) = max{z₁/N, 1}/ρ + θz₂. The comparative statics of U†_ρ and the stopping threshold x̄_ρ then follow directly from Lemma 10.

Because U†_ρ(x) is decreasing in x on [0, x̄) and increasing in ρ for each x ∈ [0, x̄), the full-intensity threshold x†_ρ = (U†_ρ)⁻¹(N) is weakly increasing in ρ.

Lastly, because k†_ρ(x) = (U†_ρ(x) − 1)/(N − 1) on (x†_ρ, x̄_ρ), the comparative statics results above imply that k†_ρ(x) is weakly increasing in ρ for all x ≥ 0. ∎
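The ODE u″ = max{u/N, 1}/ρ + θu′ underlying these comparative statics can be integrated numerically backward from a candidate stopping threshold. A small sketch, assuming SciPy and purely illustrative parameter values (x̄ = 2 is an arbitrary choice, not the equilibrium threshold):

```python
import numpy as np
from scipy.integrate import solve_ivp

N, rho, theta = 2, 0.2, -0.5   # illustrative parameters, not calibrated
xbar = 2.0                     # candidate stopping threshold (arbitrary for the sketch)

def rhs(x, z):
    u, p = z  # z = (u, u')
    return [p, max(u / N, 1.0) / rho + theta * p]

# integrate backward from the threshold: value matching u = 1, smooth pasting u' = 0
sol = solve_ivp(rhs, (xbar, 0.0), [1.0, 0.0], max_step=1e-3)
u0, p0 = sol.y[0, -1], sol.y[1, -1]
print(u0, p0)  # the payoff rises and the slope is negative to the left of the threshold
```

Solving for the equilibrium x̄ itself would require shooting on x̄ until the normal reflection condition u(0) + u′(0) = 0 holds, which this sketch omits.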
G.3 Proof of Corollary 4
Proof. For the binding case, similar to the proof of Corollary 3, the comparative statics of U†_N and x̄_N with respect to N are immediate from Lemma 10.

For the non-binding case, Lemma 10 implies that both U†_N and x̄_N are constant over N, as F(z₁, z₂) = 1/ρ + θz₂ does not depend on N. Then it is obvious that k_N(x) = (U†_N(x) − 1)/(N − 1) is weakly decreasing in N for each x ≥ 0. ∎
G.4 Proof of Corollary 5
For the sake of notational convenience, we are going to prove the results in terms of ρ = σ²/(2rc) > ρ̄ for some ρ̄, which is equivalent to c < c̄ = σ²/(2rρ̄). We let ρ̄ = (θ − ln(1 + θ))/θ² if θ > −1.22

The statement for the case that ρ > ρ̄ and θ > −1 can be easily shown by an analysis of the phase diagram from the theory of ordinary differential equations. A rigorous proof by explicit derivation is provided in the Online Appendix. The opposite case follows from the following lemma.

Lemma 11. If either ρ ≤ ρ̄ or θ ≤ −1, then there exists N̄ > 1 such that x†_N = 0 and U†_N = U†_N̄ < +∞ for all N ≥ N̄.

Here we do not impose Assumption 1. Therefore, this lemma states that once N reaches N̄ as it increases from 1, the symmetric equilibrium falls into the non-binding category and continues to be an equilibrium for all N > N̄, even when Assumption 1 is violated for large N in the case of θ > −1. As a consequence, we have x̄_N = x̄_N̄ for all N ≥ N̄, and thus the stopping threshold x̄_N is bounded from above as N → +∞.23
22 If θ = 0 we set ρ̄ = 1/2, the limit of the right-hand side as θ → 0.
23 More precisely, when Assumption 1 is violated and the uniqueness of the symmetric MPE is not guaranteed, we mean here that there exists a sequence of symmetric MPE indexed by N, with x̄_N bounded from above as N → +∞.
This implies that |U*_N(x) − U†_N(x)| is bounded away from zero in the limit, because for large enough N we have that U†_N(x) is constant over N, and that U*_N(x) is increasing in N for each x ≥ 0.

Moreover, since x̄_N is constant over N for large enough N, the amount of exploration ỹ(x̄_N) is also constant for large N. The limit result for q_N(x̄_N) then follows from the fact that the first-best technology ŷ is nondecreasing in N for each given Brownian path W and each initial state (x, s).

Next we prove the above lemma.
Proof of Lemma 11. Suppose either ρ ≤ ρ̄ or θ ≤ −1. We construct U†_N̄ according to the closed-form expression of the equilibrium payoff function for the non-binding case in Corollary 6. This is possible only when either ρ ≤ ρ̄ or θ ≤ −1, as otherwise the expression for x̄ in that corollary is not well-defined. Let N̄ = U†_N̄(0), and for all N ≥ N̄, let k†_N = (U†_N̄ − 1)/(N − 1).

We now verify that the strategy profile κ†_N in which each player plays k†_N constitutes a symmetric MPE with non-binding resource constraints in the N-player exploration game for all N ≥ N̄, with U†_N̄ being the associated payoff function. Note that we cannot apply Lemma 4 directly because it relies on Assumption 1. Nevertheless, from the proof of Lemma 8, we know that U†_N̄ is an upper bound on player n's achievable payoffs against κ†_{¬n} (this fact does not rely on Assumption 1). Moreover, U†_N̄ is indeed the payoff function associated with κ†_N, since U†_N̄ is the only function that satisfies both of the properties in Lemma 6. As a result, the upper bound on the payoff functions when the player plays against κ†_{¬n} is achieved by k†_N, and therefore κ†_N is a symmetric MPE. ∎
H Asymmetric MPE
H.1 Simple MPE
Here we present a class of asymmetric MPE similar to the simple MPE in Section 6.1
of KRC. The construction is almost identical to the asymmetric MPE of Proposition 6,
and therefore the proof is omitted.
Proposition 10 (Asymmetric Simple MPE). The N-player exploration game admits simple Markov perfect equilibria with M thresholds x̄ ≕ x_{M+1} < x_M < · · · < x₁ < x₀ ≔ +∞ for some M ∈ {1, . . . , N}, such that on [x_{ℓ+1}, x_ℓ) exactly ℓ players take turns exploring on consecutive subintervals.

The average payoff function ū is strictly decreasing on [0, x₁], twice continuously differentiable on ℝ₊₊ except at the states {x_ℓ}_{ℓ=1}^M. Moreover, we have ⌊ū(x_ℓ)⌋ = ℓ, and ū solves the ODE u = (1 − ℓ/N) + ℓρ(u″ − θu′) on [x_{ℓ+1}, x_ℓ) for each ℓ ∈ {0, 1, . . . , M}. The payoff function of each player, u_n, is strictly decreasing on [0, x₁], once continuously differentiable on ℝ₊₊, and satisfies u_n(x_ℓ) = ū(x_ℓ) = ℓ for ℓ ∈ {1, . . . , M}.
H.2 Proof of Proposition 6
Step 1: Construction of the average payoff function.

Let ǔ : ℝ₋ → ℝ be once continuously differentiable and solve

max{ ǔ(x)/N, min{1, ǔ(x) − (1 − 1/N)} } = B(x, ǔ), (10)

with initial conditions ǔ(0) = 1 and ǔ′(0) = 0.24 Let x♭ > 0 be such that ǔ(−x♭) + ǔ′(−x♭) = 0.25

Now we let the average payoff function be ū(x) = ǔ(x − x♭) for x ≤ x♭ and ū(x) = 1 for x ≥ x♭. We can check that ū satisfies value matching ū(x♭) = 1, smooth pasting ū′(x♭) = 0, and the normal reflection condition ū(0) + ū′(0) = 0. Let x♯ = ū⁻¹(2 − 1/N) and x‡ = ū⁻¹(N), where ū⁻¹(u) = inf{ x ≥ 0 | ū(x) ≤ u }. Under this construction we have 0 ≤ x‡ ≤ x♯ < x♭, and ū : ℝ₊ → ℝ is continuously differentiable and satisfies: ū = 1 on (x♭, +∞); ū solves u = 1 − 1/N + B(x, u) on (x♯, x♭), solves 1 = B(x, u) on (x‡, x♯), and solves u = N B(x, u) on (0, x‡).
Step 2: Construction of the players' payoff functions and strategies.

For each player n, on [x♭, +∞), let k_n(x) = 0 and u_n = ū = 1; on [0, x♯), if not empty, let k_n(x) = min{1, (ū(x) − 1)/(N − 1)} and u_n = ū.

Next, consider any partition x♯ = x₁ < x₂ < · · · < x_m < x_{m+1} = x♭ of the interval [x♯, x♭]. For each subinterval [x_j, x_{j+1}] in the partition {x_j}_{j=1}^{m+1}, we use Algorithm 1 to construct the payoff function u_n and strategy k_n for each player n.

For each subinterval [x_j, x_{j+1}], Algorithm 1 calls procedure ActionAssignment, which first calls function Split to divide [x_j, x_{j+1}] into three subintervals according to Lemma 12 below, and lets the available player with the lowest index free-ride on the middle subinterval [x_{M−}, x_{M+}] and explore on the two subintervals at the ends. In this way, the strategy k_n and payoff function u_n of this player are defined on
24 It is straightforward to verify that ǔ is strictly decreasing.
25 It can be shown that under Assumption 1 there exists a unique 0 < x♭ < +∞.
[x_j, x_{j+1}], and she is then labeled as unavailable. Then ActionAssignment is called recursively on these three subintervals, preserving the total intensity K = 1 by allocating intensity 0 to the subintervals at the ends and intensity 1 to [x_{M−}, x_{M+}], with ū on these subintervals replaced by ū_{¬n}, the average payoff function among the remaining available players.

Lemma 12 ensures that function Split partitions any interval [x_L, x_R] into three subintervals in a unique way such that u_n, and therefore also ū_{¬n}, has the same values and derivatives as ū at the endpoints x_L and x_R.

The termination of Algorithm 1 is trivial, because each time ActionAssignment is called, the strategy of one of the players is assigned and she is thereafter removed from the set of available players. In the case where the allocation of strategies is unambiguous, that is, when the task of exploration has either already been assigned or is to be assigned to the last available player, splitting the interval is unnecessary, and the corresponding strategies are set at once for all these players.

When Algorithm 1 terminates, the strategies and constructed payoff functions of all players are defined on [x♯, x♭]. Also note that ū from Step 1 is indeed the average of the constructed payoffs of all the players, and the input requirement on ū in procedure Split is satisfied in each call to the procedure. These conditions are maintained by line 22 during each call of ActionAssignment. Lastly, u_n is once continuously differentiable on ℝ₊₊ for each n, since no "kink" is created in Algorithm 1.
Step 3: Ensuring mutual best responses.

Because each u_n is once differentiable on ℝ₊₊, is twice differentiable except at the switch points, and satisfies the normal reflection condition (4), our characterization of MPE in Lemma 4 states that (k₁, . . . , k_N) constitutes an MPE with u_n being the associated payoff function of player n if u_n solves the HJB equation

u_n(x) = 1 + K_{¬n}(x) B(x, u_n) + max_{k_n ∈ [0,1]} k_n { B(x, u_n) − 1 }

at each continuity point of u_n″, and the constructed strategy maximizes the right-hand side.

In other words, we need to verify that for all x ≥ 0, k_n(x) = 1 if B(x, u_n) > 1, and k_n(x) = 0 if B(x, u_n) < 1. The HJB equation is then satisfied by construction.

On (x♭, +∞), u_n(x) = 1 gives B(x, u_n) = 0 < 1, and therefore k_n(x) = 0 is optimal. On [0, x♯), the argument is exactly the same as in the symmetric MPE. Lastly, on (x♯, x♭), we have K = 1 by construction. The monotonicity of ū and the construction of u_n imply 1 ≤ u_n(x) < ū(x♯) = 2 − 1/N for each player n. Our construction of u_n ensures
Algorithm 1 Payoff construction

Require: x♯ = x₁ < · · · < x_{m+1} = x♭, ū from Step 1, and a finite set of players N = {1, . . . , |N|}
Ensure: the equilibrium strategy profile {k_n}_{n=1}^{|N|} is defined on [x₁, x_{m+1}], and {u_n}_{n=1}^{|N|} is the set of payoff functions corresponding to the strategy profile {k_n}_{n=1}^{|N|}

1: for all j ∈ {1, . . . , m} do
2:   ActionAssignment(N, 1, x_j, x_{j+1}, ū)
3: end for
4: function Split(|A|, x_L, x_R, ū) ⊲ input and output according to Lemma 12
Require: |A|: number of available players
Require: ū: solves u = 1 − 1/|A| + ρ(u″ − θu′) on (x_L, x_R)
Ensure: u ∈ C¹((x_L, x_R)) solves u = ρ(u″ − θu′) on (x_L, x_{M−}) ∪ (x_{M+}, x_R) and u = 1 + ρ(u″ − θu′) on (x_{M−}, x_{M+}), with the same values and derivatives as ū at x_L and x_R
5:   return (x_{M−}, x_{M+}, u)
6: end function
7: procedure ActionAssignment(A, κ, x_L, x_R, ū) ⊲ allocate aggregate intensity κ ∈ {0, 1} to a set of available players A
8:   if κ = 0 then ⊲ let all available players free-ride
9:     for all n ∈ A do
10:      k_n|_{[x_L, x_R)} ← 0
11:      u_n|_{[x_L, x_R)} ← ū
12:    end for
13:  else if κ = 1 = |A| ≕ |{n}| then ⊲ let the only available player explore
14:    k_n|_{[x_L, x_R)} ← 1
15:    u_n|_{[x_L, x_R)} ← ū
16:  else ⊲ let one of the |A| ≥ 2 available players explore
17:    (x_{M−}, x_{M+}, u) ← Split(|A|, x_L, x_R, ū)
18:    n ← min A
19:    k_n|_{[x_L, x_{M−}) ∪ [x_{M+}, x_R)} ← 1
20:    k_n|_{[x_{M−}, x_{M+})} ← 0
21:    u_n|_{[x_L, x_R)} ← u
22:    ū_{¬n}|_{[x_L, x_R)} ← (|A| ū − u_n)/(|A| − 1) ⊲ the average payoff among the rest of the players
23:    ActionAssignment(A\{n}, κ, x_{M−}, x_{M+}, ū_{¬n})
24:    ActionAssignment(A\{n}, κ − 1, x_L, x_{M−}, ū_{¬n})
25:    ActionAssignment(A\{n}, κ − 1, x_{M+}, x_R, ū_{¬n})
26:  end if
27: end procedure
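The recursive bookkeeping in Algorithm 1 can be illustrated without the ODE machinery. In the sketch below (Python, with hypothetical helper names), the Lemma 12 boundary-value problem behind Split is stubbed with fixed middle-third intervals; the point is only that the recursion assigns total exploration intensity 1 at every state and retires one player per call:

```python
def assign(avail, kappa, xL, xR, plan):
    """Recursively assign exploration intensity kappa in {0, 1} on [xL, xR).
    plan maps player -> list of (xL, xR, intensity) pieces.
    Split is stubbed: the middle free-riding interval is simply the middle third."""
    if kappa == 0:
        for n in avail:                       # everyone left free-rides
            plan[n].append((xL, xR, 0))
    elif kappa == 1 and len(avail) == 1:      # the last player explores alone
        plan[avail[0]].append((xL, xR, 1))
    else:
        xMm, xMp = xL + (xR - xL) / 3, xL + 2 * (xR - xL) / 3  # stand-in for Split
        n, rest = avail[0], avail[1:]
        plan[n] += [(xL, xMm, 1), (xMm, xMp, 0), (xMp, xR, 1)]
        assign(rest, kappa, xMm, xMp, plan)      # n free-rides here: intensity 1 remains
        assign(rest, kappa - 1, xL, xMm, plan)   # n explores here: nothing left to assign
        assign(rest, kappa - 1, xMp, xR, plan)

def intensity(plan, n, x):
    return sum(k for (a, b, k) in plan[n] if a <= x < b)

players = [1, 2, 3]
plan = {n: [] for n in players}
assign(players, 1, 0.0, 1.0, plan)
# at every state exactly one player explores
print(all(sum(intensity(plan, n, x) for n in players) == 1
          for x in [i / 100 for i in range(100)]))
```

In the paper's construction, the stubbed interval endpoints are instead pinned down by Lemma 12 so that the resulting payoff functions paste smoothly.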
k_n(x) = 1 if and only if u_n(x) = B(x, u_n), and k_n(x) = 0 if and only if u_n(x) = 1 + B(x, u_n), for each x ∈ (x♯, x♭). Suppose B(x, u_n) > 1, which implies 1 + B(x, u_n) > 2 > u_n(x). Then by construction it must be that u_n(x) = B(x, u_n), and hence the constructed strategy k_n(x) = 1 is optimal. On the contrary, if B(x, u_n) < 1, which implies B(x, u_n) < u_n(x), then by construction it must be that u_n(x) = 1 + B(x, u_n), and hence the constructed strategy k_n(x) = 0 is optimal.
Comparison with the symmetric MPE

Note that ū solves the ODE

u″ = max{ u/N, min{1, u − (1 − 1/N)} }/ρ + θu′

on (0, x♭), while the average payoff function U† of the symmetric MPE solves

u″ = max{ u/N, 1 }/ρ + θu′

on (0, x̄), with value matching and smooth pasting at x♭ and x̄, respectively. Obviously the right-hand side of the first equation is weakly smaller. We can then check the second condition for the strict inequalities in Lemma 10 with ũ = 2 − 1/N and conclude that x̄ < x♭ and ū > U† on [0, x♭). Similarly, we can compare the first equation with the ODE of the cooperative problem, u″ = u/(Nρ) + θu′, and conclude that x♭ < x*_N. ∎
Lemma 12 (Split). Given a strictly decreasing function ū : [x_L, x_R] → [u_R, u_L], with ū(x_L) = u_L and ū(x_R) = u_R ≥ 1, which satisfies the average payoff ODE u = f + ρ(u″ − θu′) on (x_L, x_R) with 0 < f < 1 and ρ > 0, there exist x_L < x_{M−} < x_{M+} < x_R and a function u : [x_L, x_R] → [u_R, u_L], continuously differentiable on (x_L, x_R), such that u(x_L) = ū(x_L), u′(x_L) = ū′(x_L), u(x_R) = ū(x_R), and u′(x_R) = ū′(x_R); u solves the explorer ODE u = ρ(u″ − θu′) on (x_L, x_{M−}) ∪ (x_{M+}, x_R) and solves the free-rider ODE u = 1 + ρ(u″ − θu′) on (x_{M−}, x_{M+}).

Proof. See the Online Appendix. ∎
H.3 Proof of Proposition 7

Proof. The average payoff u in any MPE must be once continuously differentiable and satisfy the ODE u″ = F(u, u′) for some F respecting the Feynman-Kac equation (6) whenever u > 1 and u″ is continuous. For N = 2, the characterization of best responses gives the following minimal requirements on F. For each 1 < u < 2, F(u, u′) can only be either (u − 1/2)/ρ + θu′ (one player explores and the other free-rides) or 1/ρ + θu′ (both players choose a common interior allocation); for each u > 2, F can only be either (u − 1/2)/ρ + θu′ (one player explores and the other free-rides) or u/(2ρ) + θu′ (both players explore). The asymmetric MPE we construct in Proposition 6 adopts F(u, u′) = max{u/2, min{1, u − 1/2}}/ρ + θu′, which is the minimal F satisfying these constraints, and therefore achieves the highest payoff among all MPE by Lemma 10. □
H.4 Proof of Proposition 8
Proof. Choose a partition {0 9 }<+19=1of [0♯, 0♭] in the construction of the equilibria in
Proposition 6, so that
max1≤ 9≤<
��D(0 9+1) − D(0 9 )�� ≤ X ≔ 1
2max0♯≤0≤0
{D(0) −*†(0)},
and |0< − 0<+1 | < n . Because the average payoff D and the players’ payoff functions
are monotone and coincide at the endpoints of the subintervals [0 9 , 0 9+1], we have
|D= (0) − D(0) | ≤ X for 0 ∈ [0♯, 0] for all player =. Therefore, we have D= ≥ D − X > *†
on [0♯, 0). Also, as D= (0<) = D(0<) > 1 for all player =, we have D= > 1 = *† on
[0, 0<] = [0, 0♭ − n]. Lastly, D= = D > *† on [0, 0♯), if it is not empty.
�
H.5 Proof of Proposition 9

Proof. Suppose θ < −1. Then Assumption 1 is always satisfied and the asymmetric MPE in Proposition 6 exist for all N.

Denote by u_N the average payoff function in the asymmetric MPE of Proposition 6 for an N-player exploration game. Because u_N satisfies the ODE u = 1 − 1/N + ρ(u″ − θu′) whenever 1 < u < 2 − 1/N, we verify that w_N ≔ (ln u_N)′ satisfies the second-order ODE

w″ − θw′ + 3ww′ + w(w² − θw − 1/ρ) = 0    (11)

on (a♯, a♭), with initial conditions w(a♭) = 0 and w′(a♭) = 1/(Nρ), and w_N = 0 on [a♭, +∞).

By the continuity of the solution to ODE (11) in the initial data (see, e.g., Theorem 2.14 of Barbu (2016)), as N → +∞, w_N converges to 0 on (a♯, a♭), which is the unique solution to ODE (11) with initial conditions w(a♭) = 0 and w′(a♭) = lim 1/(Nρ) = 0.

Suppose by contradiction that a♭ is bounded from above as N → +∞. This implies that for sufficiently large N we have u_N(a) = exp(∫_{a♭}^{a} w_N(z) dz) < 2 − 1/N for all a ∈ [a♯, a♭], and thus by construction we have a♯ = 0. Then the fact that w_N → 0 on (0, a♭) contradicts w_N(0) = −1, imposed by the normal reflection condition (4).

Now suppose θ ≥ −1. If the constructed asymmetric MPE exist for all N > 1, then a♭ → +∞ as N → +∞. This is because for fixed N > 1 and a ≥ 0, u_N(a) is nondecreasing in θ, and thus the claim follows from the case of θ < −1 shown above. Otherwise, it can be shown that as N → 1/(ρ(1+θ)) we have u_N → +∞ and a♭ → +∞, by an argument similar to the proof of Corollary 5 for the symmetric equilibrium. □
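The convergence of w_N toward the zero solution as the initial slope 1/(Nρ) vanishes can be illustrated numerically; a rough Euler sketch of ODE (11), integrated backward from the threshold, with illustrative parameters (ρ = 1, θ = −2 < −1, a span of length 1 below the threshold; names are ours):

```python
def sup_abs_w(N, rho=1.0, theta=-2.0, span=1.0, h=1e-3):
    # Integrate w'' = theta*w' - 3*w*w' - w*(w*w - theta*w - 1/rho),
    # i.e. ODE (11), backward from the threshold (normalized here to 0)
    # with w(0) = 0 and w'(0) = 1/(N*rho); return sup |w| over the span.
    w, wp = 0.0, 1.0 / (N * rho)
    worst = 0.0
    for _ in range(int(span / h)):
        wpp = theta * wp - 3.0 * w * wp - w * (w * w - theta * w - 1.0 / rho)
        w, wp = w - h * wp, wp - h * wpp  # step toward decreasing a
        worst = max(worst, abs(w))
    return worst
```

Consistent with the continuity-in-initial-data argument, the trajectories shrink uniformly as N grows: sup_abs_w(1000) < sup_abs_w(100) < sup_abs_w(10).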
Online Appendix
“Strategic Exploration for Innovation”
Shangen Li∗
August 17, 2021
1 Equilibrium Characterization (Details)
1.1 Proof of Lemma 8
Proof. Given k_{¬n} ∈ K^{N−1}, for any admissible control process k_n = {k_{n,t}}_{t≥0} ∈ A, let

ℒ ≔ η(a, s, k_{n,t}) ∂/∂a + (1/2) α²(a, s, k_{n,t}) ∂²/∂a²,

with η(a, s, k) = −μ(k + K_{¬n}(a))/2 and α(a, s, k) = σ√((k + K_{¬n}(a))/2). Let f(a, s, k) = r(1 − k)e^s, and consider v(a, s) = e^s u_n(a), where u_n : ℝ₊ → ℝ satisfies Conditions 1–4 in Lemma 4.

Usually, applying Itô's formula to v at a > 0 requires u_n ∈ C²(ℝ₊₊). However, Itô's formula is still valid for C¹ functions with absolutely continuous derivatives, which is satisfied by u_n under Conditions 1 and 2.¹
∗Department of Economics, University of Zürich. E-mail address: [email protected]
1See, e.g., Chung and R. J. Williams (2014), Remark 1, p. 187; Rogers and D. Williams (2000),
Lemma IV.45.9, p. 105; or Strulovici and Szydlowski (2015), footnotes 73 and 78.
By applying Itô's formula to e^{−rT} v(X_T, S_T),² for fixed 0 < T < ∞ we have

e^{−rT} v(X_T, S_T) − v(a, s) = ∫₀^T e^{−rt} (ℒv − rv)(X_t, S_t) dt
 − ∫₀^T e^{−rt} α(X_t, S_t, k_{n,t}) (∂v/∂a)(X_t, S_t) dB_t
 + ∫₀^T e^{−rt} (∂v/∂a + ∂v/∂s)(X_t, S_t) dS_t.

Since dS_t = 0 whenever X_t > 0, and by Condition 3 we have (∂v/∂a + ∂v/∂s)(X_t, S_t) = 0 when X_t = 0, the last term is identically zero. Moreover, because α(X_t, S_t, k_{n,t}) and ∂v/∂a are bounded, the second term has mean zero. Taking expectations on both sides, for fixed 0 < T < ∞ we have Dynkin's formula

v(a, s) = −E_{as}[∫₀^T e^{−rt} (ℒv − rv)(X_t, S_t) dt] + e^{−rT} E_{as}[v(X_T, S_T)],

where the expectations on the right-hand side are finite for each T < ∞.

Note that the HJB equation (5) implies

rv(a, s) ≥ f(a, s, k_{n,t}) + ℒv(a, s),

and therefore we have

v(a, s) ≥ E_{as}[∫₀^T e^{−rt} f(X_t, S_t, k_{n,t}) dt] + e^{−rT} E_{as}[v(X_T, S_T)].    (1)

Letting T → ∞, we have

v(a, s) ≥ lim inf_{T→∞} E_{as}[∫₀^T e^{−rt} f(X_t, S_t, k_{n,t}) dt] + lim inf_{T→∞} e^{−rT} E_{as}[v(X_T, S_T)].

Because ∫₀^T e^{−rt} f(X_t, S_t, k_{n,t}) dt is increasing in T, as the integrand is non-negative, we can apply either the monotone convergence theorem or Fatou's lemma to obtain

v(a, s) ≥ E_{as}[∫₀^∞ e^{−rt} f(X_t, S_t, k_{n,t}) dt] + lim inf_{T→∞} e^{−rT} E_{as}[v(X_T, S_T)]
 = v_n(a, s | k_n, k_{¬n}) + lim inf_{T→∞} e^{−rT} E_{as}[v(X_T, S_T)]
 ≥ v_n(a, s | k_n, k_{¬n}).

2Note that S_t is a finite variation process and hence both the quadratic variation 〈S, S〉_t and the covariation 〈X, S〉_t are identically zero. See Shreve (2004), Section 7.4.2, for a non-technical treatment.
Now we repeat the argument with k_{n,t} replaced by k*_{n,t} = k*_n(X_t). Inequality (1) becomes an equality, and for 0 < T < ∞ we have

v(a, s) = E_{as}[∫₀^T e^{−rt} f(X_t, S_t, k*_{n,t}) dt] + e^{−rT} E_{as}[v(X_T, S_T)].

As the first term on the right-hand side is increasing in T, the second term must be decreasing, and hence

v(a, s) = E_{as}[∫₀^∞ e^{−rt} f(X_t, S_t, k*_{n,t}) dt] + lim_{T→∞} e^{−rT} E_{as}[v(X_T, S_T)]
 = v_n(a, s | k*_n, k_{¬n}) + lim_{T→∞} e^{−rT} E_{as}[v(X_T, S_T)].

We finish the proof by showing that lim_{T→∞} e^{−rT} E_{as}[v(X_T, S_T)] = 0, as follows. Because X_t ≥ 0 and u is nonincreasing, we have

E_{as}[v(X_T, S_T)] ≤ E_{as}[v(0, S_T)] = u(0) E_{as}[e^{S_T}] = u(0) e^s E_{as}[e^{S_T − s}] ≤ u(0) e^s E_{as}[e^{M_{TN/2}}],

where M_T = max_{0≤t≤T} {μt + σB_t}. The last inequality comes from the fact that S_T − s ≤ M_{TN/2} almost surely. Indeed, given that X₀ = 0, the process {S_T − s}_{T≥0} can be viewed as {M_T}_{T≥0} with a time change, in the sense that S_T − s = M_{T′} with T′ = ∫₀^T K_t/2 dt. Because K_t ∈ [0, N], we have T′ ≤ TN/2 and hence M_{T′} ≤ M_{TN/2}.

Then from Lemma 4.1 in Section 4, we have

E_{as}[e^{M_{TN/2}}] ≤ C₁ e^{(μ+σ²/2)TN/2} + C₂

for some C₁, C₂ ≥ 0, and therefore

e^{−rT} E_{as}[v(X_T, S_T)] ≤ C₁ e^{(−r+(μ+σ²/2)N/2)T} + C₂ e^{−rT}.

The right-hand side goes to 0 as T → ∞ if (μ + σ²/2)N/2 < r, which is precisely Assumption 1. □
1.2 Normal Reflection Condition and Feynman-Kac Formula

The following lemma provides some properties of player n's payoff function for a Markov strategy profile k with state variables (a, s), in which the strategies are not necessarily best responses to each other.

Let

ℒ = η(a, s) ∂/∂a + (1/2) α²(a, s) ∂²/∂a²

with η(a, s) = −μK(a, s)/2 and α(a, s) = σ√(K(a, s)/2). Denote by f(a, s) = r(1 − k_n(a, s))e^s the flow payoff that player n receives at state (a, s), and by v(a, s) = v_n(a, s | k) her payoff at state (a, s).

Lemma 1.1 (Properties of the Payoff Function). If a ↦ v(a, s) is twice continuously differentiable at a, then it satisfies the Feynman-Kac formula

rv(a, s) = ℒv(a, s) + f(a, s).    (2)

Moreover, if v is continuously differentiable on {a = 0},³ and K(a, s) is bounded away from 0 on some open set containing {a = 0}, then v satisfies the normal reflection condition

(∂v/∂a)(0+, s) + (∂v/∂s)(0+, s) = 0.    (3)

Proof. Given the strategy profile k, consider player n's continuation value at time T, that is,

v(X_T, S_T) = E[∫_T^∞ e^{−r(t−T)} f(X_t, S_t) dt | ℱ_T],

and her total discounted payoff at time 0, evaluated conditionally on the information available at time T, that is,

Z_T ≔ E[∫₀^∞ e^{−rt} f(X_t, S_t) dt | ℱ_T]
 = E[∫_T^∞ e^{−rt} f(X_t, S_t) dt | ℱ_T] + ∫₀^T e^{−rt} f(X_t, S_t) dt
 = e^{−rT} v(X_T, S_T) + ∫₀^T e^{−rt} f(X_t, S_t) dt.

3Here we mean that v : ℝ₊ × ℝ → ℝ can be extended to a function that is continuously differentiable on some open set in ℝ² containing {a = 0}.
Note that {Z_t}_{t≥0} is a martingale.⁴ Indeed, for 0 ≤ τ ≤ T, we have

E[Z_T | ℱ_τ] = E[ E[∫₀^∞ e^{−rt} f(X_t, S_t) dt | ℱ_T] | ℱ_τ ]
 = E[∫₀^∞ e^{−rt} f(X_t, S_t) dt | ℱ_τ]
 = Z_τ.

Because K(·, s) is piecewise continuous, a ↦ v(a, s) is piecewise C² on ℝ₊₊. Moreover, v is continuously differentiable on {a = 0} by assumption. Therefore, we can apply Itô's formula to e^{−rT} v(X_T, S_T),⁵ which gives

e^{−rT} v(X_T, S_T) − v(a, s) = ∫₀^T e^{−rt} (ℒv − rv)(X_t, S_t) dt
 − ∫₀^T e^{−rt} α(X_t, S_t) (∂v/∂a)(X_t, S_t) dB_t
 + ∫₀^T e^{−rt} (∂v/∂a + ∂v/∂s)(X_t, S_t) dS_t.

Because Z_T is a martingale starting from v(a, s), we can conclude that the process

Z_T − v(a, s) + ∫₀^T e^{−rt} α(X_t, S_t) (∂v/∂a)(X_t, S_t) dB_t
 = ∫₀^T e^{−rt} (ℒv − rv + f)(X_t, S_t) dt + ∫₀^T e^{−rt} (∂v/∂a + ∂v/∂s)(X_t, S_t) dS_t

is a continuous martingale starting from 0. Observe that the right-hand side is a finite variation process; since a continuous martingale of finite variation is constant, the process must be 0 for all T ≥ 0 almost surely. Moreover, note that the Lebesgue measure dt and the measure dS_T on ([0, T], ℬ([0, T])) are mutually singular almost surely. Indeed, the set A ≔ { t ∈ [0, T] : X_t = 0 } is a null set under the Lebesgue measure dt, and [0, T]\A is a null set under the measure dS_T almost surely. Because almost surely the sum of the integrals on the right-hand side is zero and dt ⊥ dS_T, by the Lebesgue decomposition theorem we know that each integrand is a.s. a.e. 0 on [0, T] for all T.⁶ This yields the Feynman-Kac formula (2) and the normal reflection condition (3).⁷ □

4Assumption 1 guarantees that all expectations in this proof are finite.
5See footnotes 1 and 2 for the validity of applying Itô's formula here.
6By a.e. 0 on [0, T] we mean that the integrands vanish outside a null set w.r.t. the corresponding measure.
7We require K(a, s) > 0 on some neighborhood containing {a = 0}, as otherwise all measurable subsets of [0, T] would be null sets w.r.t. dS_T(ω), in which case we cannot conclude that v satisfies the normal reflection condition.
Lemma 1.2 (Smooth Pasting Condition). In any MPE with stopping threshold ā > 0, each player's normalized payoff function is continuously differentiable at ā.

Proof. Given any MPE k = (k₁, …, k_N), consider the region [0, ā) on which the intensity of exploration is positive. Clearly u′_n(ā−), the left derivative of the equilibrium payoff function of player n at ā, cannot be positive, because otherwise we would have u_n(a) < 1 for a immediately to the left of ā, contradicting the fact that k_n is a best response.

Suppose by contradiction that u_n violates the smooth pasting condition at ā, with u′_n(ā−) < 0. Consider a deviation strategy k̃_n in which k̃_n(a) = 1 on [ā − ε, ā + ε) for some small ε > 0, and k̃_n = k_n otherwise, with ũ_n denoting the associated payoff function of player n. Moreover, write w_n ≔ (ln u_n)′ = u′_n/u_n and w̃_n ≔ (ln ũ_n)′ = ũ′_n/ũ_n. Notice that under this deviation the intensity of exploration k̃_n + Σ_{l≠n} k_l is bounded away from zero on [0, ā + ε), so that ũ_n is once continuously differentiable on this region, which implies that w̃_n is continuous on [0, ā + ε). Also note that because the strategy profile remains unchanged on [0, ā − ε), the normal reflection condition w̃_n(0) = w_n(0) = −1 and the Feynman-Kac equation (6) imply that w̃_n and w_n coincide on [0, ā − ε).⁸ From u′_n(ā−) < 0 we know that w_n(ā−) < 0 as well, and therefore by the continuity of w̃_n we can choose ε small enough so that w̃_n < 0 on [ā − ε, ā + ε). This implies ũ_n > 1 = u_n on [ā, ā + ε), because ũ_n(a) = exp(∫_{ā+ε}^{a} w̃_n(z) dz) for a < ā + ε. Therefore, the deviation strategy k̃_n leads to a higher payoff on [ā, ā + ε) than the equilibrium strategy k_n, which is a contradiction. □
2 Comparative Statics

Proof of Lemma 10. From u″_i(a) = F_i(u_i(a), u′_i(a)) > −u′_i(a) ≥ 0, we know that u′_i is negative on (0, ā_i). Therefore u_i^{−1}(u), the inverse of u_i at level u, is uniquely defined for u ∈ [1, u_i(0)].⁹ To simplify notation, we denote the derivatives (one-sided if necessary) of u_i at level u ∈ [1, u_i(0)] by u′_i(u) and u″_i(u), where u′_i = u′_i ∘ u_i^{−1} and u″_i = u″_i ∘ u_i^{−1}.

We first show that for all u ∈ [1, min{u₁(0), u₂(0)}], we have u′₁(u) ≤ u′₂(u). Suppose by contradiction that u′₁(ū) > u′₂(ū) for some 1 ≤ ū ≤ min{u₁(0), u₂(0)}. Let û ≔ sup{ u ≤ ū | u′₁(u) ≤ u′₂(u) }. Obviously we have 1 ≤ û < ū since u′₁(1) = u′₂(1), and by the continuity of u′_i we have u′₁(û) = u′₂(û) < 0. We then compare their

8This claim is not affected by the possible discontinuities of the strategies on [0, ā), as the payoff functions are at least once continuously differentiable on this region.
9We define u_i^{−1}(1) ≔ ā_i.
derivatives on (û, û + ε) for some small enough ε > 0:

du′₁(u)/du = u″₁(u)/u′₁(u) = F₁(u, u′₁(u))/u′₁(u) ≤ F₁(u, u′₂(u))/u′₂(u) ≤ F₂(u, u′₂(u))/u′₂(u) = u″₂(u)/u′₂(u) = du′₂(u)/du,

where the first inequality comes from the monotonicity of F₁(z₁, z₂)/z₂, and the second inequality comes from u′₂(u) < 0 and our assumption that F₁ ≥ F₂. This, together with u′₁(û) = u′₂(û), implies u′₁(u) ≤ u′₂(u) on (û, û + ε), contradicting the definition of û.

Next we show that u₁(0) ≤ u₂(0). Suppose by contradiction that u₁(0) > u₂(0). Then the normal reflection conditions u_i(0) = −u′_i(0) and Condition 4 in the lemma imply the following strict inequality:

u₁(0) = −u′₁(0) = −u′₁(u₁(0)) = −u′₁(u₂(0)) − ∫_{u₂(0)}^{u₁(0)} u″₁(u)/u′₁(u) du
 ≥ −u′₂(u₂(0)) − ∫_{u₂(0)}^{u₁(0)} F₁(u, u′₁(u))/u′₁(u) du
 > −u′₂(u₂(0)) + ∫_{u₂(0)}^{u₁(0)} 1 du
 = −u′₂(0) + u₁(0) − u₂(0) = u₁(0),

which is a contradiction.

Since we have u′₁(u) ≤ u′₂(u) ≤ 0 for all u ∈ [1, u₁(0)], with u₁(0) ≤ u₂(0), it is obvious that ā₁ ≤ ā₂ and u₁ ≤ u₂.
To prove the strict inequalities, suppose that in addition to F₁ ≥ F₂ we also have F₁ = F₂ on (ǔ, u₂(0)) × ℝ₋ for some ǔ ∈ (1, u₂(0)]. Moreover, assume that for all z₂ ≤ 0, either we have F₁(ǔ−, z₂) > F₂(ǔ−, z₂), or we have ∂F₁(ǔ−, z₂)/∂z₁ < ∂F₂(ǔ−, z₂)/∂z₁ and ∂F₁(ǔ−, z₂)/∂z₂ ≥ ∂F₂(ǔ−, z₂)/∂z₂. Note that either of these conditions implies F₁(z₁, z₂) > F₂(z₁, z₂) for z₁ immediately below ǔ, which together with u′₁(u) ≤ u′₂(u) implies that u′₁ < u′₂ < 0 on (ǔ − η, ǔ) for some η > 0. We next show that u₁(0) < u₂(0), which together with the weak inequalities proved above yields the strict inequalities in the lemma.

Suppose by contradiction that u₁(0) = u₂(0). Since F₁ = F₂ on (ǔ, u₂(0)) × ℝ₋, we must have u′₁(ǔ) = u′₂(ǔ) ≕ z₂ < 0. We can then compare the left derivatives of u′_i at ǔ:

du′₁(ǔ−)/du = u″₁(ǔ−)/u′₁(ǔ) = F₁(ǔ−, z₂)/z₂ ≤ F₂(ǔ−, z₂)/z₂ = u″₂(ǔ−)/u′₂(ǔ) = du′₂(ǔ−)/du.

If F₁(ǔ−, z₂) > F₂(ǔ−, z₂), then this inequality is strict, which together with u′₁(ǔ) = u′₂(ǔ) contradicts the fact that u′₁ < u′₂ on (ǔ − η, ǔ). Otherwise, it must be that F₁(ǔ−, z₂) = F₂(ǔ−, z₂), and hence du′₁(ǔ−)/du = du′₂(ǔ−)/du < 0. Then from the second-order derivatives

d²u′_i(u)/du² = (1/u′_i(u)) dF_i(u, u′_i(u))/du − (F_i(u, u′_i(u))/(u′_i(u))²) du′_i(u)/du
 = (1/u′_i(u)) (∂F_i(u, u′_i(u))/∂z₁ + (∂F_i(u, u′_i(u))/∂z₂) du′_i(u)/du) − (F_i(u, u′_i(u))/(u′_i(u))²) du′_i(u)/du,

we can conclude from ∂F₁(ǔ−, z₂)/∂z₁ < ∂F₂(ǔ−, z₂)/∂z₁ and ∂F₁(ǔ−, z₂)/∂z₂ ≥ ∂F₂(ǔ−, z₂)/∂z₂ that d²u′₁(ǔ−)/du² > d²u′₂(ǔ−)/du². This, together with u′₁(ǔ) = u′₂(ǔ) and du′₁(ǔ−)/du = du′₂(ǔ−)/du, contradicts the fact that u′₁ < u′₂ on (ǔ − η, ǔ). □
2.1 Proof of Corollary 5 (First Part in Detail)

Let ρ̄ = (θ − ln(1 + θ))/θ² and N̄ = 1/(ρ(1 + θ)) for θ > −1. To show the limit results for ρ > ρ̄, we first show in Lemma 2.1 below that |a*_N − a†_N| → Δ for some Δ < +∞ as N → N̄. Because we know from Corollary 1 that a*_N → +∞, we conclude that a†_N → +∞, and hence ā_N → +∞ as well. The convergence of q_N(ā_N) then follows from Lemma 3.

After showing that lim_{N→N̄} ‖U*_N/U†_N‖_∞ < +∞ in Lemma 2.2, we can then conclude U†_N → +∞ from the fact that U*_N → +∞ by Corollary 1.

Lemma 2.1. If ρ > ρ̄, we have |a*_N − a†_N| → Δ ∈ (0, +∞) as N → N̄.

Proof. Note that when ρ > ρ̄, the expression of ā for the non-binding case in Corollary 6 cannot be applied. Therefore, the symmetric MPE must belong to the binding case for all N ∈ (1, N̄). Moreover, for ρ > ρ̄, we can verify that ι_N in Corollary 6 converges to some ι_N̄ ∈ (0, 1). Then from explicit calculations we know that γ₂ → 1 + θ > 0 and γ₁ → −1 as N → N̄, and therefore we have

Δ ≔ lim_{N→N̄} |a*_N − a†_N|
 = lim_{N→N̄} (1/(γ₂ − γ₁)) ln((1 + ι_N/γ₂)/(1 + ι_N/γ₁))
 = (1/(2 + θ)) ln((1 + ι_N̄/(1 + θ))/(1 − ι_N̄)) ∈ (0, +∞). □
Lemma 2.2. If ρ > ρ̄, we have lim_{N→N̄} ‖U*_N/U†_N‖_∞ < +∞.

Proof. We first show that ‖U*/U†‖_∞ = U*(0)/U†(0), by showing that U*(a)/U†(a) is nonincreasing in a.

Write w* = (ln U*)′ and w† = (ln U†)′. It is not difficult to verify that

w*′ = 1/(Nρ) + θw* − w*²  on (0, ā*),

and

w†′ = 1/(Nρ) + θw† − w†²  on (0, a†),
w†′ = 1/(U†ρ) + θw† − w†²  on (a†, ā).

Because U† < N on (a†, ā), we have w†′ ≥ w*′. Then from the normal reflection condition w*(0) = w†(0) = −1, we have w* ≤ w†. Therefore, w*(a) − w†(a) = (ln(U*(a)/U†(a)))′ ≤ 0, which implies that ln(U*(a)/U†(a)) is nonincreasing in a, and hence so is U*(a)/U†(a). Therefore, we have ‖U*/U†‖_∞ = U*(0)/U†(0).

Moreover, because w*(0) = w†(0) = −1 and w*′ = w†′ on (0, a†_N), we have w* = w† on (0, a†_N), and hence U*/U† is constant on (0, a†_N). Therefore, we can write

U*_N(0)/U†_N(0) = U*_N(a†_N)/U†_N(a†_N) = (1/N) (1/(γ₂ − γ₁)) (γ₂ e^{−γ₁(a* − a†)} − γ₁ e^{−γ₂(a* − a†)}).

Since we know that γ₂ → 1 + θ > 0 and γ₁ → −1, we have

‖U*_N/U†_N‖_∞ = U*_N(0)/U†_N(0) → (1/(N̄(2 + θ))) ((1 + θ)e^Δ + e^{−(1+θ)Δ}) < +∞. □
3 Lemmas in the Construction of Asymmetric MPE

Proof of Lemma 12. For any a_L ≤ a₁ ≤ a₂ ≤ a_R, consider ũ(·|a₁, a₂) of class C¹((a_L, a_R)) that solves the explorer ODE u = ρ(u″ − θu′) on [a_L, a₁] ∪ [a₂, a_R] and the free-rider ODE u = 1 + ρ(u″ − θu′) on [a₁, a₂], with initial conditions ũ(a_R) = u(a_R) and ũ′(a_R) = u′(a_R). The existence and uniqueness of ũ(·|a₁, a₂) are guaranteed by standard results. We write ũ⁰_L(a₁, a₂) ≔ ũ(a_L|a₁, a₂) and ũ¹_L(a₁, a₂) ≔ ũ′(a_L|a₁, a₂) for the value and the derivative of ũ(·|a₁, a₂) evaluated at a_L.

Note that the functions ũ⁰_L, ũ¹_L : { (a₁, a₂) | a_L ≤ a₁ ≤ a₂ ≤ a_R } → ℝ₊ are continuous (see, e.g., Theorem 2.14 in Barbu (2016)). Moreover, from Lemma 3.1 below we know that they have the following properties: ũ⁰_L(a, a) > u_L and ũ¹_L(a, a) < u′(a_L) < 0 for all a ∈ [a_L, a_R]; ũ⁰_L(a_L, a_R) < u_L and 0 ≥ ũ¹_L(a_L, a_R) > u′(a_L); ũ⁰_L(a₁, a₂) is strictly increasing in a₁ and strictly decreasing in a₂.

By construction, ũ(·|a₁, a₂) and u match in value and derivative at a_R. Next we choose a₁ and a₂ so that they match in value and derivative at a_L as well.

Because ũ⁰_L(a_R, a_R) > u_L and ũ⁰_L(a_L, a_R) < u_L, there exists a unique ā₁ ∈ (a_L, a_R) such that ũ⁰_L(ā₁, a_R) = u_L, and ũ⁰_L(a₁, a_R) < u_L for all a₁ ∈ [a_L, ā₁). Therefore, for each a₁ ∈ [a_L, ā₁], because ũ⁰_L(a₁, a₁) > u_L, there exists a unique a₂(a₁) ∈ (a₁, a_R] such that ũ⁰_L(a₁, a₂(a₁)) = u_L, with a₂(ā₁) = a_R by the definition of ā₁. It is obvious that the function a₂ is continuous on its domain [a_L, ā₁].

In other words, ũ(·|ā₁, a_R) solves the free-rider ODE on (ā₁, a_R) and the explorer ODE on (a_L, ā₁), while ũ(·|a_L, a₂(a_L)) solves the explorer ODE on (a₂(a_L), a_R) and the free-rider ODE on (a_L, a₂(a_L)), with both ũ(·|ā₁, a_R) and ũ(·|a_L, a₂(a_L)) having the same value as u at a_L and a_R, and the same derivative as u at a_R.

From Lemma 3.1 below we can conclude that ũ¹_L(ā₁, a_R) < u′(a_L) < ũ¹_L(a_L, a₂(a_L)).¹⁰ Therefore, by the continuity of ũ¹_L and a₂, there exists an â₁ ∈ (a_L, ā₁) such that ũ¹_L(â₁, a₂(â₁)) = u′(a_L). Because â₁ < ā₁, we have ũ⁰_L(â₁, a_R) < u_L, which implies a₂(â₁) ≠ a_R, and thus it must be that a₂(â₁) < a_R.

Finally, let a_{M−} ≔ â₁, a_{M+} ≔ a₂(â₁), and ũ(·) ≔ ũ(·|a_{M−}, a_{M+}). We have shown that ũ(a_L) = u_L, ũ′(a_L) = u′(a_L), and by construction a_L < a_{M−} < a_{M+} < a_R. The image of ũ follows from the monotonicity of ũ, which is implied by Lemma 3.1 with ũ′(a_R) = u′(a_R) ≤ 0. □
Lemma 3.1. Let u(·|f, u₀, u₁) be the solution of the ODE u = f + ρ(u″ − θu′) with initial conditions u(a_R) = u₀ and u′(a_R) = u₁, for some ρ > 0 and f, θ ∈ ℝ. For any a_L < a_R, u(a_L|f, u₀, u₁) is strictly increasing in u₀ and strictly decreasing in f and u₁, whereas u′(a_L|f, u₀, u₁) is strictly decreasing in u₀ and strictly increasing in f and u₁. Moreover, if u₀ ≥ f and u₁ ≤ 0, then u′(a_L|f, u₀, u₁) ≤ 0, with u′(a_L|f, u₀, u₁) = 0 if and only if u₀ = f and u₁ = 0.

10Write ũ(·) ≔ ũ(·|ā₁, a_R). First we show that ũ¹_L(ā₁, a_R) ≕ ũ′(a_L) ≤ u′(a_L). Since ũ solves the free-rider ODE on (ā₁, a_R) with the same initial conditions at a_R as u, from Lemma 3.1 we know that ũ(ā₁) < u(ā₁). Then we must have ũ(a) < u(a) for all a ∈ (a_L, ā₁), for the following reason. Suppose by contradiction that this is not the case; then there exists an â ∈ (a_L, ā₁) such that ũ(â) = u(â) and ũ′(â) ≤ u′(â) ≤ 0. Now since ũ solves the explorer ODE on (a_L, â), Lemma 3.1 implies that ũ(a_L) > u(a_L), which contradicts the fact that ũ(a_L) = u(a_L), guaranteed by our choice of ā₁. Therefore we have ũ(a) < u(a) for all a ∈ (a_L, ā₁), which together with ũ(a_L) = u(a_L) implies ũ′(a_L) ≤ u′(a_L). Next we show that this inequality is strict. Suppose by contradiction that ũ′(a_L) = u′(a_L). Then ũ and u have the same values and derivatives at a_L. From the ODEs that ũ and u satisfy on (a_L, ā₁), we can conclude that ũ″(a_L) > u″(a_L). This contradicts the fact that ũ(a) < u(a) on (a_L, ā₁), and therefore we must have ũ′(a_L) < u′(a_L). The other inequality can be derived analogously.

Proof. Let η denote either f, −u₀, or u₁. For a_L < a_R, to show that u′(a_L|η) is strictly increasing in η, let η₁ < η₂. For the purpose of contradiction, suppose u′(a_L|η₁) ≥ u′(a_L|η₂). Let â ≔ sup{ a < a_R | u′(a|η₁) = u′(a|η₂) }.

For η being either f or −u₀, because η₁ < η₂ and u″(a_R|η) = (u₀ − f)/ρ + θu₁, we have u″(a_R|η₁) > u″(a_R|η₂). Therefore, we have u′(a|η₁) < u′(a|η₂) for all a ∈ (a_R − ε, a_R) for some ε > 0, which obviously also holds when η denotes u₁, by the continuity of u′(·|η). Therefore, we have a_L ≤ â < a_R, and from the continuity of u′(·|η) we have u′(â|η₁) = u′(â|η₂).

Because u′(a|η₁) < u′(a|η₂) for all a ∈ (â, a_R), we have

u(â|η₁) = u(a_R|η₁) − ∫_â^{a_R} u′(a|η₁) da > u(a_R|η₂) − ∫_â^{a_R} u′(a|η₂) da = u(â|η₂),

which implies

u″(â|η₁) = (u(â|η₁) − f₁)/ρ + θu′(â|η₁) > (u(â|η₂) − f₂)/ρ + θu′(â|η₂) = u″(â|η₂).

But this is a contradiction, as u′(â|η₁) = u′(â|η₂) together with u′(a|η₁) < u′(a|η₂) for all a ∈ (â, a_R) implies u″(â|η₁) ≤ u″(â|η₂).

As a_L was chosen arbitrarily, we have shown that for any a < a_R, u′(a|η) is strictly increasing in η. Therefore, as u(a_L|η) = u(a_R|η) − ∫_{a_L}^{a_R} u′(a|η) da, we have that u(a_L|η) is strictly decreasing in η.

Finally, if u₀ = f and u₁ = 0, then u(a|f, u₀, u₁) ≡ f is the solution of the ODE, and hence u′(a_L|f, u₀, u₁) = 0 for any a_L ≤ a_R. Otherwise, if either u₀ > f or u₁ < 0, from the strict monotonicity of u′(a_L|f, u₀, u₁) in u₀ and u₁, we have u′(a_L|f, u₀, u₁) < 0. □
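The monotonicity claims of Lemma 3.1 can be sanity-checked by integrating the ODE backward from a_R; a rough Euler sketch with illustrative parameter values (ρ = 1, θ = 0.5, interval of length 1; names are ours):

```python
def u_at_left(f, u0, u1, rho=1.0, theta=0.5, span=1.0, h=1e-4):
    # Integrate u = f + rho*(u'' - theta*u'), i.e. u'' = (u - f)/rho + theta*u',
    # backward from a_R with u(a_R) = u0 and u'(a_R) = u1; return u(a_L).
    u, up = u0, u1
    for _ in range(int(span / h)):
        upp = (u - f) / rho + theta * up
        u, up = u - h * up, up - h * upp  # step toward decreasing a
    return u

base = u_at_left(0.5, 2.0, -0.5)
assert u_at_left(0.5, 2.2, -0.5) > base  # strictly increasing in u0
assert u_at_left(0.8, 2.0, -0.5) < base  # strictly decreasing in f
assert u_at_left(0.5, 2.0, -0.3) < base  # strictly decreasing in u1
```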
4 Math Appendix

Lemma 4.1. Let M_T = max_{0≤t≤T} {μt + σB_t} be the running maximum of a Brownian motion with drift, where {B_t}_{t≥0} is a standard Brownian motion starting from 0. Then there exist C₁, C₂ ≥ 0, which do not depend on T, such that

E[e^{M_T}] ≤ C₁ e^{(μ+σ²/2)T} + C₂.

Moreover, as T → ∞,

E[e^{M_T}] → μ/(μ + σ²/2)

if μ + σ²/2 < 0.

Proof. From Corollary 9.1 in Choe (2016), or Shreve (2004), equation (7.27), we can calculate E[e^{M_T}] explicitly to obtain

E[e^{M_T}] = (μ + σ²/2)^{−1} ((μ + σ²) e^{(μ+σ²/2)T} (1 − N(−(μ + σ²)√T/σ)) + μ N(−μ√T/σ))
 ≤ C₁ e^{(μ+σ²/2)T} + C₂
 ≤ |C₁| e^{(μ+σ²/2)T} + |C₂|,

where N is the CDF of the standard normal distribution. The convergence result follows from a straightforward calculation. □
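The limit in Lemma 4.1 can be cross-checked by simulation, using the standard fact that the all-time maximum of a Brownian motion with drift μ < 0 and volatility σ is exponentially distributed with rate 2|μ|/σ². The parameter values below are illustrative, with μ + σ²/2 < 0, and the function name is ours:

```python
import math
import random

def mc_limit(mu=-2.0, sigma=1.0, n=200_000, seed=1):
    # Sample M_infinity ~ Exp(2|mu|/sigma^2) and average e^{M_infinity};
    # the target value is mu/(mu + sigma^2/2), here (-2)/(-1.5) = 4/3.
    rng = random.Random(seed)
    rate = 2.0 * abs(mu) / sigma**2
    return sum(math.exp(rng.expovariate(rate)) for _ in range(n)) / n

est = mc_limit()
```

With these parameters the estimate settles near 4/3, matching μ/(μ + σ²/2).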
References
Barbu, Viorel. “Existence and Uniqueness for the Cauchy Problem”. In: Differential
Equations. Cham: Springer International Publishing, 2016, pp. 29–77.
Choe, Geon Ho. “The Reflection Principle of Brownian Motion”. In: Stochastic Anal-
ysis for Finance with Simulations. Cham: Springer International Publishing, 2016,
pp. 147–156.
Chung, K. L. and R. J. Williams. “Generalized Ito Formula, Change of Time and
Measure”. In: Introduction to Stochastic Integration. New York, NY: Springer New
York, 2014, pp. 183–215.
Rogers, L. C. G. and David Williams. “Introduction to Itô Calculus”. In: Diffusions,
Markov Processes and Martingales. 2nd ed. Vol. 2. Cambridge Mathematical Li-
brary. Cambridge University Press, 2000, pp. 1–109.
Shreve, Steven E. Stochastic Calculus for Finance II. Springer New York, 2004.
Strulovici, Bruno and Martin Szydlowski. “On the smoothness of value functions and the
existence of optimal strategies in diffusion models”. In: Journal of Economic The-
ory 159 (2015). Symposium Issue on Dynamic Contracts and Mechanism Design,
pp. 1016–1055.