arXiv:2108.07218v1 [econ.TH] 16 Aug 2021

Strategic Exploration for Innovation∗

Shangen Li†

August 17, 2021

Abstract

We analyze a game of technology development where players allocate resources between exploration, which continuously expands the public domain of available technologies, and exploitation, which yields a flow payoff by adopting the explored technologies. The qualities of the technologies are correlated and initially unknown, and this uncertainty is fully resolved once the technologies are explored. We consider Markov perfect equilibria with the quality difference between the best available technology and the latest technology under development as the state variable. In all such equilibria, while the players do not fully internalize the benefit of failure owing to free-riding incentives, they are more tolerant of failure than in the single-agent optimum thanks to an encouragement effect. In the unique symmetric equilibrium, the cost of exploration determines whether free-riding prevails as team size grows. Pareto improvements over the symmetric equilibrium can be achieved by asymmetric equilibria where players take turns performing exploration.

Keywords: Strategic experimentation, Innovation, Multi-armed bandit, Organizational learning.

JEL classification: C73; D83; O3.

∗I am grateful to Steven Callander, Martin Cripps, Christian Ewerhart, David Hémous, Johannes Hörner, Vijay Krishna, Margaret Meyer, Alessandro Lizzeri, Nick Netzer, Anne-Katrin Roesler, Bruno Strulovici, Leeat Yariv, and seminar participants at University of Zürich and Swiss Theory Day 2021 for helpful comments. I am especially indebted to Marek Pycia, Armin Schmutzler, and Aleksei Smirnov for advice and encouragement. All errors are mine.

†Department of Economics, University of Zürich. E-mail address: [email protected]


1 Introduction

Without innovation, we would still be living in caves and hunting with stones. Though

people celebrate disruptive inventions such as penicillin and the Internet, innovations

are usually accomplished through hundreds and thousands of minor improvements.

These incremental improvements are achieved from iterative experimentation through

small changes, and can have a huge cumulative impact in the long run.1 Incremental

experimentation of this kind is particularly pertinent in today’s data-driven world, in

which experiments can be conducted more swiftly and inexpensively than ever before

thanks to the fast-growing fields of data science and artificial intelligence. Tech giants

such as Microsoft and Amazon run thousands of experiments every day to optimize their

products, processes, customer experiences, and business models.2

However, even for these companies with digital roots that can run experiments

at virtually zero cost, valuable resources still need to be allocated to the design and

implementation of the experiments, as well as to the collection and analysis of the

data they produce. Moreover, because of its uncertain nature, experimentation does

not guarantee innovation. As a result, the trade-off between creating value through

innovation (to explore) and capturing value through operations (to exploit) is

behind all innovation processes in organizations. On top of such complexities, incentives

for experimentation are susceptible to free-riding opportunities when successes and

failures are publicly observed. Such a strategic effect inevitably creates inefficiency of

learning, having a lasting impact on technological progress and organizational growth.

Examples of strategic exploration certainly go beyond intracompany R&D activities.

Research joint ventures formed by multiple firms, open-source software development,

consumer search, and even academic researchers working on a joint paper are also

in effect engaged in strategic exploration.

This paper analyzes a game of strategic exploration in which a finite number of

forward-looking players jointly search for innovative technologies. Given a set of pub-

licly available technologies, at each point in time, each player decides simultaneously

how to allocate a perfectly divisible unit of resource between exploration, which ex-

pands the set of feasible technologies at a rate proportional to the resource allocated,

and exploitation, which yields a flow payoff from the adoption of one of the available

technologies. This flow payoff is generated according to an (initially unknown) realized

path of Brownian motion, which maps technologies to their qualities. Not all tech-

nologies are readily available to the public, and only the qualities associated with the

1Garcia-Macia, Hsieh, and Klenow (2019) estimate that improvements to already existing products

accounted for about 77 percent of economic growth in the United States between 2003 and 2013.

2See Thomke (2020).


explored technologies are known. We focus on the scenario in which players’ actions

and outcomes are publicly observed. Hence, the explored technologies are pure public

goods, and there are perfect knowledge spillovers between players.

One of the major contributions of this paper is to extend the classic game of strate-

gic experimentation to a setting that features a continuum of correlated arms with the

possibility of discovering new arms. When modeling technologies as arms, correlation

between arms and discoveries of new arms are key features of technology development

and innovation. However, a fixed set of independent arms is assumed in most literature

on strategic experimentation because the analysis is substantially complicated by corre-

lation. We demonstrate that under a specific form of correlation, such complexities can

be circumvented by simply focusing on incremental experimentation.

We first consider the efficient benchmark in which the players work jointly to maxi-

mize the average payoff. The solution to this cooperative problem takes a simple cutoff

form: All players allocate resources exclusively to exploration if and only if the quality

difference between the best available technology and the latest outcome of the experi-

ment is within a time-invariant cutoff. Players in a larger team can attain higher payoffs

by being more tolerant of unsuccessful outcomes. As team size grows, because of the

growing total learning capacity, exploration extends to arbitrarily pessimistic states in

the limit, and the welfare under complete information can be achieved asymptotically.

In the strategic problem, we restrict attention to Markov perfect equilibria (MPE)

with the difference between the qualities of the best available technology and the latest

technology under development as the state variable. Despite the considerable disparities

between our setting and the two-armed bandit models in the strategic experimentation

literature, all MPE in our model exhibit a similar encouragement effect: The presence

of other players encourages at least one of them to continue exploration at states where

a single agent would already have given up. The players thus become more tolerant

of failure in the hope that successful outcomes will bring the other players back to

exploration in the future, promoting innovation and sharing its burden. In addition,

exactly because of such an incentive, any player who never explores would strictly prefer

the deviation that resumes the exploration process as a volunteer at the state where all

exploration stops. Therefore, no player always free-rides in equilibrium in spite of the

fact that exploration per se never generates any payoffs.

We further show that there is no MPE in which all players use simple cutoff strategies

as in the cooperative solution. A symmetric equilibrium thus necessarily requires

the players to choose an interior allocation of their resources at some states. We

establish the existence and uniqueness of the symmetric equilibrium, and provide closed-

form representations for the equilibrium strategies and the associated payoff functions.


Depending on the parameters, the symmetric equilibria can be classified into two types

as follows. In the first type of equilibrium, all players allocate resources exclusively to

exploration when the latest outcome is sufficiently close to the highest-known quality,

exclusively to exploitation when it is sufficiently far away, and to both simultaneously

when it is in between. In such a case, the resource constraints on exploration are binding

at optimistic states, and therefore will be referred to as the binding case. In the second

type of equilibrium, however, the resource constraints are never binding. An interior

allocation is chosen by each player when the latest outcome is close to the highest-known

quality, and all resources are allocated to exploitation if it is sufficiently far away. In

this non-binding case, the incentive to free-ride is so strong that players even allocate a

positive fraction of their resources to exploitation when the latest technology achieves

the highest-known quality and thus innovation takes place instantaneously. Moreover,

this free-rider effect completely freezes the encouragement effect: Additional players

would not be able to encourage the players to explore at more pessimistic states and

would bring down both the individual and the overall intensity of exploration.

We then show the following comparative statics: The symmetric equilibrium always

belongs to the binding case for a small team and may or may not fall into the non-binding

category irreversibly as team size grows. The non-binding case can be averted if and only

if exploration is cheap enough and the prospect of development is sufficiently promising.

In such a case, the players become increasingly tolerant of failure as team size grows

and are willing to explore at arbitrarily pessimistic states in the limit, sharing the same

features as in the cooperative solution. Otherwise, free-riding incentives prevail as the

team becomes large: The resource constraints eventually fail to be binding, and the

tolerance for failure among the players stays the same for further increases in team size.

This result suggests that whether an experimentation culture can be successfully

created in large organizations hinges on the tools available for experimentation. As

innovation is driven by experimentation, perturbations in the effectiveness of the explo-

ration technology can have drastic impacts on welfare and long-run outcomes. More

precisely, for a sufficiently inexpensive exploration technology and promising prospects,

the equilibrium payoffs in a large team resemble the ones in the cooperative solution,

and the first-best technology under complete information is highly likely to be developed

in the long run. Otherwise, both the equilibrium payoffs and the long-run likelihood of

developing the first-best technology would be identical to those in a small team owing

to severe free-riding.

Lastly, we investigate whether asymmetric equilibria can improve welfare and long-

run outcomes over the symmetric equilibrium. We construct a class of asymmetric MPE

in which the players take turns performing exploration at pessimistic states, so that each


player achieves a higher payoff than in the symmetric equilibrium. It turns out that these

asymmetric MPE are the best MPE in two-player games in terms of average payoffs.

Unlike in the symmetric equilibrium, the players in these asymmetric MPE become

increasingly tolerant of failure as team size grows, independent of the other parameters.

The intuition for this result is that when alternation is allowed, the burden of keeping the

exploration process active can be shared among more players in larger teams, and thus the

players are willing to explore at more pessimistic states. As a consequence, the long-run

outcomes approach the first best as team size grows, irrespective of the cost of exploration

or the prospect of the development. Nevertheless, similar to the symmetric equilibrium

with non-binding resource constraints, in these asymmetric equilibria innovations might

arrive at a much slower rate than in the cooperative solution because of the low proportion

of the overall resource allocated to exploration. As a result, the welfare loss might still

be significant in large teams because of the strong free-rider effect.

1.1 Related Literature

This paper contributes to the literature on strategic experimentation, started by the

Brownian model in Bolton and Harris (1999), and then followed by the exponential

model in Keller, Rady, and Cripps (2005) and the Poisson model in Keller and Rady

(2010) (hereafter, BH, KRC, and KR). In all of these models, players face identical two-

armed bandit machines, which consist of a risky arm with an unknown quality and a safe

arm with a known quality. At each point in time, each player decides simultaneously how

to split one unit of a perfectly divisible resource between these arms, so that learning

occurs gradually by observing other players’ actions and outcomes. These models differ

in the assumptions on the probability distribution of the flow payoffs that each type of

the arm generates. By contrast, players in our model face a continuum of correlated

arms. Local learning—learning the quality of a particular arm—occurs instantaneously.

However, as the set of arms is unbounded, global learning—learning the qualities of

all arms—occurs gradually. The encouragement effect was first identified by BH in the

symmetric equilibrium in their Brownian model. This effect is then established by KR

for all MPE in the Poisson model with inconclusive good news. We are not only able to

show the presence of the encouragement effect in all MPE in our model, but also (1) express

the stopping threshold in the unique symmetric MPE in closed form, and (2) perform

comparative statics analysis of the stopping threshold in both the symmetric MPE and

a class of asymmetric MPE to further study the encouragement effect and the free-rider

effect.

Callander (2011) proposes modeling the correlation between arms by a Brownian path


and studies experimentation conducted by a sequence of myopic agents. Garfagnini

and Strulovici (2016) extend this model to a setting with overlapping generations, in

which short-lived yet forward-looking players search on a Brownian path for arms

with higher qualities. They mainly focus on the search patterns and the long-run

dynamics, and identify the stagnation of search and the emergence of a technological

standard in finite time. By contrast, a byproduct of our analysis suggests that their

findings might depend on the search cost or the underlying correlation structure. When

the qualities of arms contribute exponentially to the payoffs, which is common in

modeling technological progress in the macroeconomic growth literature, stagnation

can be avoided even under a negative drift of the Brownian path. Our main assumption

on the parameters (Assumption 1) provides the condition for stagnation in the cooperative

problem.

Both of their models focus on non-strategic environments, because in their discrete-

time set-ups unexplored gaps between explored technologies create difficulty for further

analysis in strategic environments. To overcome this challenge, we forgo the fine

details of the learning dynamics for analytical tractability by imposing continuity on the

experimentation process. This simplification can be interpreted as the qualities of the

neighboring technology being revealed during experimentation, so that no unexplored

territory remains between the explored technologies. Moreover, we impose a hard

constraint on the intensity of exploration to capture the scenario that technologies

far ahead of their time cannot feasibly be explored today. These abstractions allow

us to derive explicit expressions for the equilibrium payoffs and strategies, perform

comparative statics analysis, and construct asymmetric equilibria.

In an independent paper, Cetemen, Urgun, and Yariv (2021) study the exit patterns

during a search process conducted jointly by heterogeneous players in a setting similar

to ours. In their model, exploitation is only possible after an irreversible exit chosen

endogenously by each player. The players in our model, however, are not faced with

stopping problems and thus are free to choose between exploration and exploitation, or

even both simultaneously, at all times.

Our model is also related to the literature on organizational learning and adaptation. Since March (1991) identified the trade-off between exploration and exploitation in organizational learning, multi-armed bandit models as in Rothschild (1974) and Gittins (1979)

have been widely used in the management literature to study the exploration-exploitation

trade-off in organizational adaptation in dynamic environments; see, for example, Denrell

and March (2001), Posen and Levinthal (2012), Lee and Puranam (2016), or Stieglitz,

Knudsen, and Becker (2016). In all of these models, organizational decisions are made

by a single unit of decision-making authority, and thus strategic elements are largely


absent. Our paper brings new insights into this strand of literature by showing that strate-

gic interaction within organizations can play a central role in organizational learning

and organizations’ long-term performance. More specifically, our results suggest that

because of free-riding, a horizontal organizational structure where everyone is engaged

in learning and experimentation might be less efficient in organizational learning than

a vertical hierarchy where tasks are coordinated and controlled by upper-level authori-

ties, and this inefficiency hinders organizational learning to a greater extent in a larger,

flatter organization if experimentation is too costly and the innovators are not properly

rewarded. In addition, the single-agent problem in this paper, as an attempt to model

the process of knowledge building and to capture the exploration-exploitation trade-off,

could serve as a new alternative to the canonical multi-armed bandit model for further

research in this area.

2 The Exploration Game

Time t ∈ [0, ∞) is continuous, and the discount rate is r > 0. There are N ≥ 1 players, each endowed with one unit of perfectly divisible resource per unit of time. Each player has to simultaneously and independently allocate the resources between exploration, which expands the feasible technology domain, and exploitation, which allows her to adopt one of the explored technologies. The feasible technology domain, which is common to all players and contains all the explored technologies at time t, is modeled as an interval [0, X_t] with X_0 = 0. If a player allocates the fraction k_t ∈ [0, 1] to exploration over an interval of time [t, t + dt), the boundary X_t is pushed to the right by an amount k_t/c dt, where the constant c > 0 represents the cost of exploration in units of the resource allocated to it. With the fraction 1 − k_t allocated to exploitation, by adopting technology x_t ∈ [0, X_t], the player receives a deterministic flow payoff (1 − k_t) exp(W(x_t)) dt, where W(x_t) denotes the quality of the adopted technology.

The quality mapping W : ℝ+ → ℝ is common to all players, but only the qualities of the feasible technologies in [0, X_t] are known to each player at time t. It coincides with a realized path of a Brownian motion W+ ∈ C(ℝ+, ℝ) starting at W+(0) = y ∈ ℝ, with drift θ ∈ ℝ and volatility normalized to σ = √2, except possibly at W(0) = s ≥ y, which is interpreted as the quality of a status quo technology. At the outset of the game, all players know the parameters of the mapping but not the realized Brownian path W+ on ℝ++. Therefore, the process of exploration described above captures the dynamics of research (building knowledge on the qualities of technologies) and development (expanding the set of feasible technologies).

Given a player's actions {(k_t, x_t)}_{t≥0}, with k_t ∈ [0, 1] and x_t ∈ [0, X_t] measurable with respect to the information available at time t, her total expected discounted payoff, expressed in per-period units, is

E[ ∫_0^∞ r e^{−rt} (1 − k_t) e^{W(x_t)} dt ].

Clearly, for any {k_t}_{t≥0}, each player maximizes her total expected discounted payoff by choosing x_t = arg max_{x ∈ [0, X_t]} W(x), which is the best feasible technology, for all t ≥ 0. So we can focus on such an exploitation strategy without loss and rewrite the above total payoff as

E[ ∫_0^∞ r e^{−rt} (1 − k_t) e^{S_t} dt ],

where S_t = max_{x ∈ [0, X_t]} W(x).

Therefore, the environment above can be equivalently reformulated as follows. Players have prior beliefs represented by a filtered probability space (Ω, F, (F_t), P), where Ω = C(ℝ+, ℝ) is the space of Brownian paths, P is the law of standard Brownian motion B = {B_t}_{t≥0}, and F_t is the canonical filtration of B. Each player chooses her strategy from the space of admissible control processes A, which consists of all processes {k_t}_{t≥0} adapted to the filtration (F_t)_{t≥0} with k_t ∈ [0, 1]. The public history of technology development is represented by a development process {Y_t}_{t≥0}, where Y_t ≔ W(X_t) denotes the quality of the technology being explored at time t. This process satisfies the stochastic differential equation

dY_t = θ (K_t/c) dt + √(2 K_t/c) dB_t,   Y_0 = y,

where K_t = Σ_{1≤n≤N} k_{n,t} measures how much of the overall resource is allocated to exploration, and will be referred to as the intensity of exploration at time t.
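For intuition, the development dynamics above are easy to simulate. The following is a minimal Python sketch, using an Euler discretization of the SDE for Y_t and tracking the best explored quality S_t and the gap S_t − Y_t; the function name, step size, parameter values, and the simple cutoff intensity profile in the example are ours and purely illustrative.

```python
import numpy as np

def simulate_development(theta, c, K, y0=0.0, s0=0.0, T=10.0, dt=1e-3, seed=0):
    """Euler scheme for dY_t = theta*(K_t/c) dt + sqrt(2*K_t/c) dB_t, tracking the
    best explored quality S_t = max(s0, max_{tau<=t} Y_tau) and the gap G_t = S_t - Y_t.
    K is a function of the current gap (an s-invariant Markov intensity profile)."""
    rng = np.random.default_rng(seed)
    n_steps = int(T / dt)
    Y, S = y0, max(s0, y0)
    path = np.empty((n_steps, 3))          # columns: Y_t, S_t, G_t
    for i in range(n_steps):
        Kt = K(S - Y)                      # overall intensity in [0, N] at the current gap
        Y += theta * (Kt / c) * dt + np.sqrt(2.0 * Kt / c) * np.sqrt(dt) * rng.standard_normal()
        S = max(S, Y)
        path[i] = (Y, S, S - Y)
    return path

# Example: two players who both explore fully whenever the gap is below a cutoff of 1.
path = simulate_development(theta=-1.0, c=0.5, K=lambda g: 2.0 if g < 1.0 else 0.0)
print("final best explored quality S:", path[-1, 1])
```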

Given a strategy profile k = {(k_{1,t}, . . . , k_{N,t})}_{t≥0}, player n's total expected discounted payoff can be written as

E_{gs}[ ∫_0^∞ r e^{−rt} (1 − k_{n,t}) e^{S_t} dt ],

where

S_t = max( s, max_{0≤τ≤t} Y_τ )

denotes the quality of the best explored technology at time t.

In addition, we use the term "gap", denoted by G_t ≔ S_t − Y_t ≥ 0, to refer to the quality difference between the best explored technology and the latest technology under development. Henceforth, we shall use g and s when referring to the state variables as opposed to the stochastic processes {G_t}_{t≥0} and {S_t}_{t≥0} (i.e., if G_t = g, then "the game is in state g at time t").

A Markov strategy k_n : ℝ+ × ℝ → [0, 1] with (g, s) as the state variables specifies the action player n takes at time t to be k_n(G_t, S_t). A Markov strategy is called s-invariant if it depends on (g, s) only through g. Thus an s-invariant Markov strategy k_n : ℝ+ → [0, 1] takes the gap g as the state variable. Finally, an s-invariant Markov strategy k_n is a cutoff strategy if there is a cutoff ĝ ≥ 0 such that k_n(g) = 1 for all g ∈ [0, ĝ) and k_n(g) = 0 otherwise.

Given an s-invariant Markov strategy profile k, player n's associated payoff at state (g, s) can be written as v_n(g, s | k) = e^s v_n(g, 0 | k).3 It is thus convenient to write v_n(g, s | k) = e^s u_n(g | k), where u_n(g | k) = v_n(g, 0 | k) stands for player n's payoff at state (g, s) normalized by e^s, the opportunity cost of exploration. We refer to u_n : ℝ+ → ℝ as player n's normalized payoff function, or simply as the payoff function when it is clear from the context.

2.1 Exploration under Complete Information

To study the value of information, here we consider an alternative setting under complete information. More specifically, how would N players allocate their resources cooperatively over time if the entire Brownian path W is publicly known at the outset of the game?

Formally, in this subsection we replace F_t with F for each t ≥ 0 while maintaining the assumption that the feasible technology domain [0, X_t] can only be expanded by continuously pushing forward the boundary at the rate dX_t/dt = K_t/c. For a given Brownian path W, denote the average (ex-post) value under complete information by

v(W) ≔ sup ∫_0^∞ r e^{−rt} (1 − K_t/N) e^{S_t} dt,

where S_t = max_{x ∈ [0, X_t]} W(x), and the supremum is taken over all measurable functions t ↦ K_t ∈ [0, N].

Lemma 1 (Complete-information Payoff). Denote the average (ex-ante) value under complete information at state (g, s) by V(g, s) ≔ E_{gs}[v(W)]. We have V(g, s) = e^s U(g), where

U(g) = 1 + exp(−λg)/(λ − 1)  if λ > 1,  and  U(g) = +∞  otherwise,

with λ = rc/N − θ.

3See Lemma 5 in the Appendix.

When λ > 1, there almost surely exists a first-best technology x̂ ∈ ℝ+ such that the value v(W) < +∞ is achieved by exploring with full intensity up to the point when x̂ is developed, and thereafter exploiting x̂.4 On the contrary, if λ ≤ 1, then with probability 1 the payoff can be improved indefinitely by delaying exploitation, and thus v(W) = +∞ almost surely. As a consequence, the value under incomplete information becomes infinite as well. The assumption that λ > 1 can be equivalently written in the following way.

Assumption 1. N(1 + θ) < rc.

To ensure well-defined payoffs and deviations, we maintain Assumption 1 throughout the rest of the paper unless otherwise stated.

4See the proof of Lemma 7 for the precise definition of x̂.
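As a quick reference, here is a minimal Python sketch of the complete-information value in Lemma 1 together with the finiteness condition in Assumption 1; the function name and the parameter values in the example call are ours, chosen only for illustration.

```python
import math

def complete_info_value(g, r, c, N, theta):
    """Normalized complete-information value U(g) from Lemma 1:
    U(g) = 1 + exp(-lam*g)/(lam - 1) with lam = r*c/N - theta, which is finite
    if and only if lam > 1, i.e., Assumption 1: N*(1 + theta) < r*c."""
    lam = r * c / N - theta
    if lam <= 1:                      # Assumption 1 violated: value is infinite
        return math.inf
    return 1.0 + math.exp(-lam * g) / (lam - 1.0)

print(complete_info_value(g=0.5, r=1.0, c=0.5, N=2, theta=-1.0))
```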

3 Joint Maximization of Average Payoffs

Suppose that N ≥ 1 players work cooperatively to maximize the average expected payoff. Denote by A_N the space of all adapted processes {K_t}_{t≥0} with K_t ∈ [0, N]. Formally, we are looking for the value function

v(g, s) = sup_{K ∈ A_N} v(g, s | K),

where

v(g, s | K) = E_{gs}[ ∫_0^∞ r e^{−rt} (1 − K_t/N) e^{S_t} dt ]

is the average payoff function associated with the control process K = {K_t}_{t≥0}, and an optimal control K* ∈ A_N such that v(g, s) = v(g, s | K*). Clearly, the structure of the problem allows us to focus on Markov strategies K : ℝ+ × ℝ → [0, N] with (g, s) as the state variables, so that the intensity of exploration at time t is specified by K_t = K(G_t, S_t). According to the dynamic programming principle, we have

v(g, s) = max_{K ∈ [0, N]} { r(1 − K/N) e^s dt + E_{gs}[ e^{−r dt} v(g + dG, s + dS) ] }.

First note that S_t can only change when G_t = 0, and thus dS = 0 for all positive gaps. Hence, for each g > 0 at which ∂²v/∂g² is continuous, the value function v satisfies the Hamilton-Jacobi-Bellman (HJB) equation

v(g, s) = max_{K ∈ [0, N]} { (1 − K/N) e^s + (K/(rc)) ( ∂²v(g, s)/∂g² − θ ∂v(g, s)/∂g ) }.   (1)

Assume, as will be verified, that the optimal strategy is s-invariant. Then by the homogeneity of the value function v(g, s) = e^s v(g, 0) = e^s u(g), we can replace v(g, s) with e^s u(g), and divide both sides of equation (1) by the opportunity cost e^s, to obtain the normalized HJB equation

u(g) = 1 + max_{K ∈ [0, N]} K { β(g, u) − 1/N },   (2)

where β(g, u) ≔ (u″(g) − θ u′(g))/(rc) is the ratio of the expected benefit of exploration, ( ∂²v/∂g² − θ ∂v/∂g )/(rc), to its opportunity cost e^s. It is then straightforward to see that the optimal action takes the following "bang-bang" form. If the shared opportunity cost of exploration, 1/N, exceeds the full expected benefit, the optimal choice is K(g) = 0 (all agents choose exploitation exclusively), which gives value u(g) = 1. Otherwise, K(g) = N is optimal (all agents choose exploration exclusively), and u satisfies the second-order ordinary differential equation (henceforth ODE)

β(g, u) = u(g)/N.   (3)

The optimal strategy could presumably depend on both g and s and hence might not be s-invariant. Indeed, both the benefit and the opportunity cost of exploration increase as innovation occurs. Nevertheless, due to our specific form of flow payoff where technologies contribute exponentially to the payoffs, the increased benefit of exploration exactly offsets the increased opportunity cost. As a result, the incentives for exploration for a fixed gap do not depend on the highest-known quality, which leads to an s-invariant optimal strategy.5 This conjecture is confirmed by the following theorem.

Proposition 1 (Cooperative Solution). Suppose Assumption 1 holds. In the N-agent cooperative problem, there is a cutoff g* > 0 given by

g* = (1/(γ₂ − γ₁)) ( ln(1 + 1/γ₂) − ln(1 + 1/γ₁) )

with γ₁ < γ₂ being the roots of γ(γ − θ) = rc/N, such that it is optimal for all players to choose exploitation exclusively when the gap is above the cutoff g* and it is optimal for all players to choose exploration exclusively when the gap is below the cutoff g*.

The associated payoff at state (g, s) can be written as V*(g, s) = e^s U*(g), where the normalized payoff function U* : ℝ+ → ℝ is given by

U*(g) = (1/(γ₂ − γ₁)) ( γ₂ e^{−γ₁(g*−g)} − γ₁ e^{−γ₂(g*−g)} )

when g ∈ [0, g*) and by U*(g) = 1 otherwise.

If Assumption 1 is violated, then U*(g) = +∞ for all g ≥ 0.

5This feature also appears in Urgun and Yariv (2020) and Cetemen, Urgun, and Yariv (2021) under a setting of undiscounted linear preferences with a cost convex in the learning rate.
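The cutoff and value in Proposition 1 are straightforward to evaluate numerically. Below is a minimal Python sketch (function names and parameter values are ours) that computes γ₁, γ₂, the cooperative cutoff g*, and the normalized payoff U*(g), and checks by finite differences the smooth pasting and normal reflection conditions discussed next.

```python
import numpy as np

def coop_solution(r, c, N, theta):
    """Cooperative cutoff g* and normalized payoff U*(g) from Proposition 1,
    with gamma1 < gamma2 the roots of gamma*(gamma - theta) = r*c/N."""
    disc = np.sqrt(theta ** 2 + 4.0 * r * c / N)
    g1, g2 = (theta - disc) / 2.0, (theta + disc) / 2.0
    g_star = (np.log(1.0 + 1.0 / g2) - np.log(1.0 + 1.0 / g1)) / (g2 - g1)

    def U_star(g):
        if g >= g_star:
            return 1.0
        return (g2 * np.exp(-g1 * (g_star - g)) - g1 * np.exp(-g2 * (g_star - g))) / (g2 - g1)

    return g_star, U_star

g_star, U_star = coop_solution(r=1.0, c=0.5, N=2, theta=-1.0)
print("cooperative cutoff g*:", g_star)
# Finite-difference checks of the two boundary conditions:
h = 1e-6
print("smooth pasting  u'(g*-) ~", (U_star(g_star) - U_star(g_star - h)) / h)
print("normal reflection  u(0) + u'(0+) ~", U_star(0.0) + (U_star(h) - U_star(0.0)) / h)
```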

The cooperative solution is pinned down by the standard smooth pasting condition u′(g*) = 0 and a normal reflection condition ( ∂v/∂g + ∂v/∂s )(0+, s) = 0, which takes the form u(0) + u′(0+) = 0 for s-invariant strategies.6

Because of the lack of information on the qualities of technologies, the players might

stop too early, giving up exploration before developing the first-best technology and thus

ultimately adopting a suboptimal technology, or might stop too late, wasting too many

resources for marginal improvement while the first-best technology has already been

developed. The cooperative solution optimally balances these trade-offs between early

and late stopping and therefore determines the efficient strategies under incomplete

information.

Corollary 1 (Comparative Statics of the Cooperative Solution). The cooperative cutoff g* is strictly increasing in N and strictly decreasing in c. For all g ≥ 0, the cooperative payoff U*(g) is strictly below the complete-information payoff U(g). For each θ ∈ ℝ and c > 0,

• if θ < −1, then |U*_N(g) − U_N(g)| → 0 as N → +∞;

• if θ ≥ −1, then U*_N → +∞ as N → rc/(1 + θ) ∈ (1, +∞].7

In both cases g*_N → +∞.
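To illustrate the first bullet, the following short Python loop (parameter values are ours, with θ < −1) compares the complete-information payoff U_N from Lemma 1 with the cooperative payoff U*_N from Proposition 1 at a zero gap; the difference vanishes as N grows large.

```python
import numpy as np

r, c, theta, g = 1.0, 2.0, -1.5, 0.0   # theta < -1, so |U*_N - U_N| -> 0 (Corollary 1)
for N in (1, 10, 100, 1000):
    lam = r * c / N - theta                                    # Lemma 1
    U = 1.0 + np.exp(-lam * g) / (lam - 1.0)
    disc = np.sqrt(theta ** 2 + 4.0 * r * c / N)               # roots of gamma*(gamma-theta)=rc/N
    g1, g2 = (theta - disc) / 2.0, (theta + disc) / 2.0
    g_star = (np.log(1 + 1 / g2) - np.log(1 + 1 / g1)) / (g2 - g1)
    U_star = (g2 * np.exp(-g1 * (g_star - g)) - g1 * np.exp(-g2 * (g_star - g))) / (g2 - g1)
    print(f"N={N:5d}  g*_N={g_star:6.3f}  U_N={U:.3f}  U*_N={U_star:.3f}  diff={U - U_star:.3f}")
```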

The stopping cutoff g* represents the tolerance for failure among the players. The benefit of exploration increases as exploration becomes cheaper, and thus the players are willing to explore at more pessimistic states for smaller c.8 Likewise, because the players work cooperatively, extra resources brought by additional players as team size increases enable a higher rate of exploration, so that for a fixed tolerance for failure, resources are wasted for a shorter period of time before the exploration fully stops, which in turn leads to greater tolerance.

6The normal reflection condition is not an optimality condition. It ensures that the infinitesimal change of the payoff at a zero gap has a zero dS term, which is necessary for the continuation value process to be a martingale. See Peskir and Shiryaev (2006) for an introduction to the normal reflection condition in the context of optimal stopping problems, and the proof of Lemma 4 for more details.

7Here we allow N to take non-integral values for convenience. Also note that Assumption 1 is violated for N ≥ rc/(1 + θ) if θ ≥ −1, in which case U*_N = +∞.

8The comparative statics with respect to c and the discount rate r are equivalent.

Note that under the complete-information setting in which players can observe the whole path, it is optimal for them to explore at arbitrarily large gaps as long as the gap would quickly shrink to zero if they explored a little bit longer. By contrast, under incomplete information, it is optimal to stop exploration whenever the gap reaches g*. This informational disadvantage vanishes as either N goes up or c goes down because it originates from the suboptimal stopping decision in the cooperative solution due to the lack of information, rather than from an inefficient allocation of resources during exploration. As a result, the cooperative payoff converges to the complete-information payoff as the players become increasingly tolerant of failure.

3.1 Long-run Outcomes

Consider an s-invariant strategy profile k such that the set of states at which the intensity of exploration is bounded away from zero takes the form of a half-open interval [0, ḡ). We denote by x(ḡ) ≔ lim_{t→∞} X_t = ∫_0^∞ K_t/c dt the amount of exploration under k. In addition, we denote by s(ḡ) ≔ lim_{t→∞} S_t the long-run technological standard, which is defined as the quality of the best technology available as time approaches infinity. For a given Brownian path W, it is straightforward to see that both s(ḡ) and x(ḡ) depend only on the initial state (g, s) and the stopping threshold ḡ, and are independent of the intensity of exploration. Moreover, they are clearly nondecreasing in ḡ. We can explicitly express the prior belief on the distribution of s(ḡ) as follows.

Lemma 2. At state (g, s), for a strategy profile with stopping threshold ḡ, the long-run technological standard s(ḡ) has the same distribution as max{s, s − g + M}, where the random variable M has an exponential distribution with mean (e^{θḡ} − 1)/θ.9
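Because the law of s(ḡ) does not depend on the intensity path, it can be approximated by simulating the development process directly in units of exploration until the gap first reaches ḡ. The following is a rough Monte Carlo sketch in Python; the function name, step size, and parameter values are ours, and the Euler discretization introduces a small bias at the stopping boundary.

```python
import numpy as np

def longrun_standard(g, s, g_bar, theta, n_paths=2000, dt=1e-3, seed=0):
    """Monte Carlo draws of the long-run technological standard s(g_bar):
    simulate Y (drift theta, variance 2 per unit of exploration) from Y_0 = s - g
    until the gap S - Y first reaches the stopping threshold g_bar, and record S."""
    rng = np.random.default_rng(seed)
    out = np.empty(n_paths)
    for i in range(n_paths):
        Y, S = s - g, s
        while S - Y < g_bar:
            Y += theta * dt + np.sqrt(2.0 * dt) * rng.standard_normal()
            S = max(S, Y)
        out[i] = S
    return out

draws = longrun_standard(g=0.2, s=0.0, g_bar=1.0, theta=-1.0)
print("mean long-run standard:", draws.mean())
print("fraction of paths with no improvement over s:", np.mean(draws == 0.0))
```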

Recall from Section 2.1 that the first-best technology, denoted by x̂, is the technology x̂ ≥ 0 that yields the highest payoff for N cooperative agents under complete information, taking the opportunity cost of its development into account. Denote by q(ḡ) ≔ P_{gs}(x̂ ∈ [0, x(ḡ)]) the probability that the first-best technology will be explored in the long run under a strategy profile with stopping threshold ḡ.

Lemma 3. At state (g, s), we have q(ḡ) → 1 as ḡ → +∞.

Therefore, the comparative statics in Corollary 1 imply that x̂ will be developed under the cooperative solution with probability q_N(g*_N) → 1 as the team size grows.10

9For θ = 0, we take lim_{θ→0}(e^{θḡ} − 1)/θ = ḡ for the mean of M.

10Note that for a given W, the first-best technology x̂ could depend on the team size N. However, the convergence in Lemma 3 does not require a constant sequence of parameters, provided that the parameters satisfy Assumption 1 along the sequence.

4 The Strategic Problem

From now on, we assume that there are N > 1 players acting noncooperatively. We study equilibria in the class of s-invariant Markov strategies, which are the Markov strategies with the gap as the state variable and will hereafter be referred to as Markov strategies. In this section, we provide a characterization of best responses and the associated payoff functions, which helps us establish useful properties of the equilibria.

4.1 Best Responses and Equilibria

We denote by K the set of Markov strategies that are right-continuous and piecewise Lipschitz-continuous, and denote by A the space of admissible control processes as in Section 2.11 A strategy k*_n ∈ K for player n is a best response against her opponents' strategies k_{¬n} = (k_1, . . . , k_{n−1}, k_{n+1}, . . . , k_N) ∈ K^{N−1} if

v_n(g, s | k*_n, k_{¬n}) = sup_{k_n ∈ A} v_n(g, s | k_n, k_{¬n})

at each state (g, s) ∈ ℝ+ × ℝ. This turns out to be equivalent to

u_n(g | k*_n, k_{¬n}) = sup_{k_n ∈ K} u_n(g | k_n, k_{¬n})

for each gap g ≥ 0, with the normalized payoff function u_n(g | k) = e^{−s} v_n(g, s | k) defined as in Section 2. A Markov perfect equilibrium is a profile of Markov strategies that are mutually best responses.

Denote the intensity of exploration carried out by player n's opponents by K_{¬n}(g) = Σ_{l≠n} k_l(g), and the benefit-cost ratio of exploration by β(g, u_n) as in Section 3. The following lemma characterizes all MPE in the exploration game.

Lemma 4 (Equilibrium Characterization). A strategy profile k = (k*_1, . . . , k*_N) ∈ K^N is a Markov perfect equilibrium with u_n : ℝ+ → ℝ being the corresponding payoff function of player n, if and only if for each n ∈ {1, . . . , N}, the function u_n

1. is once continuously differentiable on ℝ++;

2. is piecewise twice continuously differentiable on ℝ++;12

3. satisfies the normal reflection condition

u_n(0) + u′_n(0+) = 0;   (4)

4. satisfies, at each continuity point of u″_n, the HJB equation

u_n(g) = 1 + K_{¬n}(g) β(g, u_n) + max_{k_n ∈ [0,1]} k_n { β(g, u_n) − 1 },   (5)

with k*_n(g) achieving the maximum on the right-hand side, i.e.,

k*_n(g) ∈ arg max_{k_n ∈ [0,1]} k_n { β(g, u_n) − 1 }.

11Piecewise Lipschitz-continuity means that ℝ+ can be partitioned into a finite number of intervals such that the strategy is Lipschitz-continuous on each of them. This rules out the infinite-switching strategies considered in Section 6.2 of KRC.

These conditions are standard in optimal control problems. Condition 4 and the smooth pasting condition, which is implicitly stated in Condition 1, are the optimality conditions. The rest are properties of general payoff functions.

In any MPE, Lemma 4 provides the following characterization of best responses. If β(g, u_n) < 1, then k*_n(g) = 0 is optimal and u_n(g) = 1 + K_{¬n}(g) β(g, u_n) < 1 + K_{¬n}(g). If β(g, u_n) = 1, then the optimal k*_n(g) takes arbitrary values in [0, 1] and u_n(g) = 1 + K_{¬n}(g). Finally, if β(g, u_n) > 1, then k*_n(g) = 1 is optimal and u_n(g) = (1 + K_{¬n}(g)) β(g, u_n) > 1 + K_{¬n}(g). In short, player n's best response to a given intensity of exploration K_{¬n} by the others depends on whether u_n is greater than, equal to, or less than 1 + K_{¬n}.

On intervals where each k_n is continuous, HJB equation (5) gives rise to the ODE

u_n(g) = 1 − k_n(g) + K(g) β(g, u_n).   (6)

In particular, on the intervals where each k_n is constant, the ODE above admits the explicit solution

u_n(g) = 1 − k_n + C₁ e^{γ₁ g} + C₂ e^{γ₂ g},   (7)

where γ₁ and γ₂ are the roots of the equation γ(γ − θ) = rc/K, and C₁, C₂ are constants to be determined.

Lastly, on intervals where an interior allocation is chosen, the ODE from the indifference condition β(g, u_n) = 1 has the general solution

u_n(g) = C₁ + C₂ g + rc g²/2,   if θ = 0,
u_n(g) = C₁ + C₂ e^{θg} − rc g/θ,   if θ ≠ 0,   (8)

where C₁, C₂ are constants to be determined.

12This condition means that there is a partition of ℝ++ into a finite number of intervals such that u″_n is continuous on the interior of each of them.
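As a sanity check on the algebra, the following short sympy sketch (symbol names are ours) verifies that (7) solves the ODE (6) for constant k_n and that the θ ≠ 0 branch of (8) satisfies the indifference condition β(g, u) = 1.

```python
import sympy as sp

g, theta, r, c, K, k_n, C1, C2 = sp.symbols('g theta r c K k_n C1 C2')
gamma = sp.symbols('gamma')
beta = lambda u: (sp.diff(u, g, 2) - theta * sp.diff(u, g)) / (r * c)   # benefit-cost ratio

# (7): u = 1 - k_n + C1*exp(gamma1*g) + C2*exp(gamma2*g), gamma*(gamma - theta) = r*c/K
gamma1, gamma2 = sp.solve(sp.Eq(gamma * (gamma - theta), r * c / K), gamma)
u7 = 1 - k_n + C1 * sp.exp(gamma1 * g) + C2 * sp.exp(gamma2 * g)
print(sp.simplify(u7 - (1 - k_n) - K * beta(u7)))   # 0: (7) solves ODE (6) with constant k_n

# (8), theta != 0 branch: u = C1 + C2*exp(theta*g) - r*c*g/theta solves beta(g, u) = 1
u8 = C1 + C2 * sp.exp(theta * g) - r * c * g / theta
print(sp.simplify(beta(u8) - 1))                    # 0: (8) satisfies the indifference condition
```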

4.2 Properties of MPE

First note that in any MPE, the average payoff can never exceed the N-player cooperative payoff U*_N, and no individual payoff can fall below the single-agent payoff U*_1. The upper bound follows directly from the fact that the cooperative solution maximizes the average payoff. The lower bound U*_1 is guaranteed by playing the single-agent optimal strategy, as the players can only benefit from the exploration efforts exerted by others.

Second, all Markov perfect equilibria are inefficient. Along the efficient exploration path, the benefit of exploration tends to 1/N of its opportunity cost as the development process approaches the efficient stopping threshold. A self-interested player thus has an incentive to deviate to exploitation whenever the benefit of exploration drops below its full opportunity cost.

Also note that in any MPE, the set of states at which the intensity of exploration is positive must be an interval [0, ḡ) with g*_1 ≤ ḡ ≤ g*_N. The bounds on the stopping threshold follow directly from the bounds on the average payoffs and imply that the long-run outcomes in any MPE cannot outperform the cooperative solution.

Corollary 2. In any Markov perfect equilibrium with stopping threshold ḡ, at any state (g, s) we have s(ḡ) ≤ s(g*_N) almost surely, and q(ḡ) ≤ q(g*_N).

Moreover, the intensity of exploration must be bounded away from zero on any compact subset of [0, ḡ). If this were not the case, there would exist some gap g < ḡ such that the development process {Y_t}_{t≥0} starting from Y_0 = s − g would never reach the best-known quality s because of the diminishing intensity, and therefore allocating a positive fraction of resource to exploration at gap g is clearly not optimal for any player.

In the two-armed bandit models of BH, KRC, and KR, the players always use the risky arm at beliefs higher than some myopic cutoff, above which the expected short-run payoff from the risky arm exceeds the deterministic flow payoff from the safe arm. Because our model lacks such a myopic cutoff, it might seem reasonable to conjecture that some player n, with her payoff function u_n bounded from above by 1 + K_{¬n}, never explores, and thus free-rides on the technologies developed by the other players. Such a conjecture is refuted by the following proposition.

Proposition 2 (No Player Always Free-rides). In any Markov perfect equilibrium, no

player allocates resources exclusively to exploitation for all gaps.

The intuition behind this result is that in equilibrium, the cost and benefit of explo-

ration must be equalized at the stopping threshold for each player, whereas any player

who never explores would find that this benefit outweighs the cost, manifested by a kink

in her payoff function at the stopping threshold. She would then have a strict incentive

to resume exploration immediately after the other players give up, hoping to reduce

the gap and bring the development process back to life. Therefore, in equilibrium, every

player must perform exploration at some states, respecting the smooth pasting condition

(Condition 1 in Lemma 4).

This result shows that each player strictly benefits from the presence of the other

players in equilibrium, which in turn encourages some of the players to explore at gaps

larger than their single-agent cutoffs. Such an encouragement effect is exhibited in all

MPE of the exploration game.

Proposition 3 (Encouragement Effect). In any Markov perfect equilibrium, at least one

player explores at gaps above the single-agent cutoff g*_1.

With the same intuition as the encouragement effect, our last general result on

Markov perfect equilibria concerns the nonexistence of equilibria where all players use

cutoff strategies.

Proposition 4 (No MPE in Cutoff Strategies). In any Markov perfect equilibrium, at

least one player uses a strategy that is not of the cutoff type.

Next we look into Markov perfect equilibria in greater depth.

5 Symmetric Equilibrium

Our characterization of best responses and the nonexistence of MPE in cutoff strategies suggest that in any symmetric equilibrium, the players choose an interior allocation at some states. At these states of interior allocation, the benefit of exploration must be equal to the opportunity cost, and therefore the common payoff function solves the ODE β(g, u) = 1. As a consequence of equation (6), the payoff of each player at the states of interior allocation in the symmetric equilibrium must also satisfy u = 1 + K_{¬n} ≤ N.

Therefore, whenever the common payoff exceeds N, each player allocates resources exclusively to exploration, and the payoff function satisfies the same ODE (3) as in the cooperative solution. However, it is worth pointing out that for some configurations of the parameters, because of the strength of free-riding incentives among the players, the common payoff could be below N for all gaps, and accordingly, the resource constraint of each player is not necessarily binding even when the gap is zero, in marked contrast to the cooperative solution.13 Lastly, the common payoff satisfies u = 1 at the gaps for which the resource is exclusively allocated to exploitation.

The solutions to the corresponding ODEs provided in equations (7) and (8), to-

gether with the normal reflection condition (4) and the smoothness requirement on the

equilibrium payoff functions, uniquely pin down the strategies and the associated pay-

off functions in the symmetric equilibrium, which can be expressed in closed form as

follows.

Proposition 5 (Symmetric Equilibrium). The N-player exploration game has a unique symmetric Markov perfect equilibrium with the gap as the state variable. There exist a stopping threshold ḡ ∈ (g*_1, g*_N) and a full-intensity threshold g† ≥ 0 such that the fraction of resource k†(g) that each player allocates to exploration at gap g is given by

k†(g) = 0   on [ḡ, +∞),
k†(g) = (rc/(N − 1)) ∫_0^{ḡ−g} φ_θ(z) dz ∈ (0, 1)   on [g†, ḡ),
k†(g) = 1   on [0, g†) if g† > 0,   (9)

with φ_θ(z) ≔ (1 − e^{−θz})/θ.14

The corresponding payoff function is the unique function U† : ℝ+ → [1, +∞) of class C¹ with the following properties: U†(g) = 1 on [ḡ, +∞); U†(g) = 1 + (N − 1)k†(g) ∈ (1, N) and solves the ODE β(g, u) = 1 on (g†, ḡ); if g† > 0, then U†(g) > N and solves the ODE β(g, u) = u/N on (0, g†).

The closed-form expressions for the common payoff function U† and the thresholds g† and ḡ are provided in the Appendix.
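For concreteness, equation (9) is easy to evaluate once the two thresholds are known. The sketch below (in Python) computes k†(g) by numerical integration of φ_θ; the thresholds passed in the example call are made up purely for illustration and are not the equilibrium values, whose closed forms are in the Appendix.

```python
import numpy as np
from scipy.integrate import quad

def k_dagger(g, g_bar, g_dag, r, c, N, theta):
    """Symmetric-equilibrium allocation k†(g) from equation (9), taking the stopping
    threshold g_bar and the full-intensity threshold g_dag as given inputs (their
    closed forms are in the paper's Appendix and are not reproduced here)."""
    phi = (lambda z: z) if theta == 0 else (lambda z: (1.0 - np.exp(-theta * z)) / theta)
    if g >= g_bar:
        return 0.0
    if g < g_dag:
        return 1.0
    integral, _ = quad(phi, 0.0, g_bar - g)
    return r * c / (N - 1) * integral

# Purely illustrative call with made-up thresholds (not the equilibrium values):
for g in (0.0, 0.3, 0.6, 0.9):
    print(g, round(k_dagger(g, g_bar=1.0, g_dag=0.0, r=1.0, c=0.5, N=4, theta=-1.0), 4))
```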

As we have already pointed out, depending on the parameters, it is possible that g† = 0, in which case k†(g) < 1 for all g > 0; this case will hence be referred to as the non-binding case. The opposite case, where g† > 0, will be referred to as the binding case. Figure 1 illustrates the symmetric equilibrium for these two cases.

13Exploration at a zero gap immediately leads to innovation and thus has an "innovation benefit" that strictly dominates the benefit of exploitation. It might therefore seem puzzling why any player would not explore with full intensity at a zero gap. The reason is that, heuristically speaking, the innovation benefit can be achieved by any positive intensity of exploration at a zero gap. In the single-agent problem, the player can only reap such an innovation benefit by her own effort and thus would never stop exploration when the gap is zero. In the strategic problem, however, from each player's own perspective, the innovation benefit is secured by the others' effort whenever K_{¬n}(0) > 0. In extreme cases, it is even possible for some players to allocate all resources to exploitation at a zero gap. This happens in the asymmetric equilibria for some configurations of the parameters in the next section.

14We define φ₀(z) ≔ lim_{θ→0} φ_θ(z) = z.

Figure 1. The symmetric equilibrium with binding resource constraints in a two-player game, and with non-binding constraints in a four-player game (r = 1, c = 1/2, θ = −1).

As the outcomes are publicly observed and newly developed technologies are freely available, the players have incentives to free-ride. Such a free-rider effect can be seen more clearly from the comparison between the benefit-cost ratio of exploration at the states of interior allocation in the symmetric MPE,

β(g, U†) = 1,

and the one in the cooperative solution,

β(g, U*) = U*(g)/N.

Exploration in equilibrium thus requires the benefit of exploration to cover the cost, whereas the efficient strategy entails exploration at states where the cost exceeds the benefit, as β(g, U*) < 1 whenever U*(g) < N. Figure 2 illustrates the comparison between the common payoff function in the symmetric equilibrium and the cooperative solution in a two-player exploration game.

Figure 2. From top to bottom: complete-information payoff U_N, average payoff U*_N in the cooperative solution, common payoff U†_N in the symmetric equilibrium, and payoff in the single-agent optimum U*_1. Parameter values: r = 1, c = 1/2, θ = −1, N = 2.

5.1 Comparative Statics

In this section we examine the comparative statics of the symmetric equilibrium with respect to the cost of exploration c and the number of players N.

Corollary 3 (Effect of c). The stopping threshold ḡ_c is strictly decreasing in c, and the full-intensity threshold g†_c is weakly decreasing in c. For any gap g ≥ 0, the equilibrium strategy k†_c(g) and the common payoff U†_c(g) are weakly decreasing in c.

As c decreases, exploration becomes less expensive and thus the players have greater incentives for exploration. Moreover, the increased exploration efforts of others encourage each player to raise her own effort further.

The common payoff is decreasing in c for two reasons. First, as in the cooperative solution, the reduced cost of exploration raises the players' tolerance for failure ḡ_c, and hence more advanced technologies will be developed and adopted in the long run. Second, because cheaper exploration leads to a higher rate of exploration, a lower c enables earlier adoption of the developed technologies.

Corollary 4 (Effect of N). On { N ≥ 1 | g†_N > 0 }, which is the range of N for which the players' resource constraints are binding in the symmetric equilibrium, the stopping threshold ḡ_N is strictly increasing in N and the common payoff function U†_N is weakly increasing in N. Whereas on { N ≥ 1 | g†_N = 0 }, both ḡ_N and U†_N are constant over N, and the equilibrium strategy k†_N is weakly decreasing in N.

In the binding case, notice that k†_N(g) is not monotone in N because the full-intensity threshold g†_N could be decreasing in N. This occurs when the free-rider effect outweighs the encouragement effect. On the one hand, extra encouragement brought by additional players raises the stopping threshold ḡ_N. On the other hand, the increased free-riding incentives due to extra players tighten the requirement u > N for the resource constraints to bind, which enlarges the region (g†_N, ḡ_N) of interior allocation. The total effect of increasing N on the intensity of exploration is determined by these two competing forces and hence is not monotone in N.

In the non-binding case, as N increases, each player adjusts her individual intensity of exploration downward, maintaining the same equilibrium payoff. This is a situation where the incentive to free-ride is so strong that it completely offsets further encouragement brought by additional players, and even the overall intensity of exploration Nk† is decreasing in N for the gaps in [0, ḡ_N). Thus, free-riding slows down the development process considerably. In the worst scenario, the overall intensity when the gap is zero could even be lower than that in the single-agent problem.

Also notice that whenever the resource constraints are not binding, the full-intensity threshold g†_N remains constant at zero for any further increase in N because k†_N would be even lower. Therefore, if g†_N ever hits zero as N goes up, the resource constraints in the symmetric equilibrium remain non-binding for any larger N.

As we have seen, depending on whether or not the resource constraints are binding in the symmetric equilibrium, a larger team size can have qualitatively different effects on welfare and long-run outcomes. If the resource constraints are not binding in equilibrium, any extra resources brought by additional players translate entirely into free-riding, which results in a highly inefficient outcome in a large team in terms of average payoffs, the likelihood of developing the first-best technology, and the technological standard in the long run. The question then naturally arises of whether the resource constraints would ever fail to be binding in the symmetric equilibrium as N goes up. Or, conversely, would the encouragement effect eventually overcome the free-rider effect?

To investigate this question, we now examine the effect on the symmetric equilibrium as N increases toward infinity, while keeping the other parameters fixed. For ease of exposition, we allow N ≥ 1 to take non-integral values and drop Assumption 1 for the rest of this section.15

15Caution needs to be taken in the case of θ > −1, in which Assumption 1 could be violated for large N, and the existence and uniqueness of the symmetric equilibrium is no longer guaranteed.

Corollary 5 (Asymptotic Effect of N). There exists c̄ > 0, which depends on θ, such that if c < c̄ and θ > −1, then we have U†_N → +∞, g†_N → +∞, and q_N(ḡ_N) → 1 as N → rc/(1 + θ) ∈ (1, +∞); if instead c ≥ c̄ or θ ≤ −1, we have g†_N = 0 for sufficiently large N, lim U†_N(g) < lim U*_N(g) for each g ≥ 0, ḡ_N is bounded, and q_N(ḡ_N) is bounded away from 1 as N → +∞.

Next we discuss this result for \ < −1 and \ ≥ −1 separately.

When \ < −1, recall that the cooperative payoff converges to the finite limit of

the complete-information payoff as # → +∞, with the cooperative cutoff 0∗# → +∞and thus the long-run probability of developing the first-best technology @# (0∗#) → 1.

In contrast, the symmetric equilibrium always falls into the non-binding category for

large # . After the full-intensity threshold 0†# reaches zero, the encouragement effect is

completely frozen by the free-rider effect: The stopping threshold 0# remains constant

for any further increase of # , so both the amount and the rate of exploration are highly

inefficient for large # . Consequently, as # → +∞, the average payoff is bounded away

from the cooperative payoff in the limit, and @# (0#) is bounded away from 1.

On the other hand, when \ ≥ −1, recall that in addition to the divergence of 0∗# ,

the cooperative payoff tends to infinity as # → A2/(1 + \) ∈ (1, +∞]. Whether this

is the case for the symmetric equilibrium now depends on the cost of exploration, as

illustrated in Figure 3. If exploration is too expensive, the symmetric equilibrium always

falls into the non-binding category for large # , same as the case of \ < −1. However,

if exploration is sufficiently inexpensive, the resource constraints are always binding in

the symmetric equilibrium. Not only does the stopping threshold 0# tend to infinity,

but the full-intensity threshold 0†# does as well. So the encouragement effect eventually

overcomes the free-rider effect when exploration is sufficiently cheap and the prospect

of development is sufficiently promising. Both the amount and the rate of exploration

resemble the cooperative solution for large # . As a consequence, @# (0#) → 1 and the

average equilibrium payoff tends to infinity as # goes up, bearing a resemblance to the

cooperative payoff.

Also note from Figure 3 that when \ > −1, for large fixed # the symmetric equilib-

rium could be very sensitive in 2 around 2. This observation suggests that perturbations

in the cost of exploration can have a drastic effect on the welfare and the long-run

outcomes of technology development.

Lastly, as yet another manifestation of the strength of the free-rider effect, in the case of θ > −1, Corollary 5 asserts that a symmetric equilibrium with finite payoff exists for all N provided that c ≥ c̄, standing in marked contrast to the cooperative solution, which fails to exist due to unbounded payoffs for large N.


(a) Full-intensity threshold a†_N        (b) Stopping threshold ā_N

Figure 3. Full-intensity thresholds and stopping thresholds in the symmetric equilibrium (r = 1, θ = −0.09) for different costs of exploration c. If exploration is sufficiently cheap (solid curves), the resource constraints are binding (a†_N > 0) for all N. Otherwise (dashed curve), the resource constraints are not binding (a†_N = 0) for sufficiently large N.

6 Asymmetric Equilibria and Welfare Properties

Note that in the symmetric equilibrium, the intensity of exploration dwindles down to

zero as the gap approaches the stopping threshold. As a result, the threshold is never

reached and exploration never fully stops. This suggests that welfare can be improved if

the players take turns between the roles of explorer and free-rider, keeping the intensity

of exploration bounded away from zero until all exploration stops. In this section

we investigate this possibility by constructing a class of asymmetric Markov perfect

equilibria.

6.1 Construction of Asymmetric Equilibria

Our construction of asymmetric MPE is based on the idea of the asymmetric MPE

proposed in KR. We let the players adopt the common actions in the same way as in

the symmetric equilibrium whenever the resulting average payoff is high enough to

induce an overall intensity of exploration greater than one, and let the players take turns

exploring at more pessimistic states in order to maintain the overall intensity at one.

Such alternation between the roles of explorer and free-rider leads to an overall intensity

of exploration higher than in the symmetric equilibrium, yielding higher equilibrium

payoffs.

Here we briefly address the two main steps in our construction. In the first step, we construct the average payoff function u. We let u solve the same ODE β(a, u) = max{u(a)/N, 1} as the common payoff function in the symmetric equilibrium whenever u > 2 − 1/N, so that the corresponding overall intensity is Nk(a) = N(u(a) − 1)/(N − 1) ≥ 1. Whenever 1 < u < 2 − 1/N, we let u solve the ODE u(a) = 1 − 1/N + β(a, u), which is the ODE for the average payoff function among N players associated with an overall intensity of 1. The boundary conditions for the average payoff function u, namely the smooth pasting condition at the stopping threshold and the normal reflection condition (4), are identical to the conditions in Lemma 4, simply because these conditions remain unchanged after taking the average. The unique solution of class C¹(ℝ₊₊) to the ODE above serves as the average payoff function, which also gives thresholds a♭ > a♯ ≥ 0 such that u = 1 on [a♭, +∞), 1 < u < 2 − 1/N on (a♯, a♭), and u > 2 − 1/N on [0, a♯). In the second step, equilibrium-compatible actions are assigned to each player. On [0, a♯), if it is nonempty, we let the players adopt the common action k(a) = min{(u(a) − 1)/(N − 1), 1} in the same way as in the symmetric equilibrium. On [a♯, a♭), players alternate between the roles of explorer and free-rider so as to keep the overall intensity at one. We first split [a♯, a♭] into subintervals in an arbitrary way and then meticulously choose the switch points of their actions, so that all individual payoff functions have the same values and derivatives as the average payoff function at the endpoints of these subintervals.16 Lastly, our characterization of MPE in Lemma 4 confirms that the assigned action profile is compatible with equilibrium. We leave the method for choosing the switch points and further details to the Appendix.
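The first step lends itself to a simple numerical illustration. The sketch below is an illustration under stated assumptions, not the author's code: it integrates the piecewise ODE ρ(u″ − θu′) = max{u/N, min{1, u − (1 − 1/N)}} leftward from the stopping boundary until the reflection condition u(0) + u′(0) = 0 holds, and reads off a♭ together with the points where u crosses 2 − 1/N and N. It uses the parameters of Figures 4 and 5 (r = 1, c = 1/2, θ = −1, N = 2), for which ρ = σ²/(2rc) = 2 under the normalization σ² = 2, and it also reports the symmetric-equilibrium stopping threshold obtained from the same routine with max{u/N, 1} on the right-hand side, so the ranking ā < a♭ stated in Proposition 6 below can be checked directly.

```python
# Illustrative sketch of Step 1 of the construction (assumed parameters).
import numpy as np
from scipy.integrate import solve_ivp

def thresholds(N, rho, theta, asymmetric=True, t_max=100.0):
    def drive(v):
        if asymmetric:
            return max(v / N, min(1.0, v - (1.0 - 1.0 / N)))
        return max(v / N, 1.0)

    def rhs(t, y):                 # v(t) = u(a_flat - t), so v' = -u'
        v, dv = y
        return [dv, drive(v) / rho - theta * dv]

    def reflection(t, y):          # normal reflection: u(0) + u'(0) = 0
        return y[0] - y[1]
    reflection.terminal = True

    levels = (2.0 - 1.0 / N, float(N))
    crossings = [lambda t, y, lvl=lvl: y[0] - lvl for lvl in levels]
    sol = solve_ivp(rhs, (0.0, t_max), [1.0, 0.0],
                    events=[reflection] + crossings, max_step=1e-3)
    a_stop = sol.t_events[0][0]
    cuts = [a_stop - ev[0] if ev.size else 0.0 for ev in sol.t_events[1:]]
    return a_stop, cuts[0], cuts[1]        # stopping threshold, a_sharp, a_double_dagger

# Parameters of Figures 4 and 5: r = 1, c = 1/2, theta = -1, N = 2 (so rho = 2).
N, rho, theta = 2, 2.0, -1.0
a_flat, a_sharp, a_ddag = thresholds(N, rho, theta, asymmetric=True)
a_bar, _, _ = thresholds(N, rho, theta, asymmetric=False)
print(f"asymmetric MPE: a_flat={a_flat:.3f}, a_sharp={a_sharp:.3f}, a_double_dagger={a_ddag:.3f}")
print(f"symmetric MPE:  stopping threshold a_bar={a_bar:.3f}  (Proposition 6: a_bar < a_flat)")
```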

For N = 2, Figure 4 illustrates the intensity of exploration in the asymmetric MPE, compared with the symmetric equilibrium. We can see that the resource constraints are binding in the depicted equilibria, but it is worth bearing in mind that it may well be the other way around for different parameters. For example, if the cost of exploration is too high, the average payoff function could be bounded from above by 2 − 1/N, resulting in an intensity of exploration equal to 1 on the entire region [0, a♭). The two interior states marked in the figure are the switch points at which the two players swap roles when they take turns exploring on [a♯, a♭): the volunteer explores on the first and last of the three resulting subintervals, whereas the free-rider explores on the middle one. These switch points are chosen in a way that ensures their individual payoff functions are of class C¹(ℝ₊₊) and coincide on [0, a♯).

Figure 4. From top to bottom: intensity of exploration in the cooperative solution, the asymmetric equilibria, and the symmetric equilibrium (r = 1, c = 1/2, θ = −1, N = 2).

Figure 5 illustrates the associated average payoff function (dashed curve) and the individual payoff functions (solid curves) that can arise in these equilibria in a two-player game, compared with the common payoff function in the symmetric equilibrium (dotted curve). Note that the payoff function of the volunteer is strictly higher than that of the free-rider at the states immediately to the left of the stopping threshold a♭. In fact, the free-rider has a payoff equal to 1 on the subinterval adjacent to a♭, where she free-rides. This observation stands in marked contrast to the models in KRC and KR, where the volunteer is worse off in this region. The intuition behind this feature is similar to that in Proposition 2. For the free-rider to attain a higher payoff than the volunteer at states immediately to the left of a♭, a kink would have to be created at a♭ in her payoff function, because the free-rider's ODE u(a) = 1 + β(a, u) must be satisfied.17 In such a case, the free-rider would have a strict incentive to take over the role of volunteer to kickstart the development process at a larger gap. Therefore, in equilibrium, the volunteer must be compensated for acting as a lone explorer at pessimistic states by bearing a relatively smaller burden at more optimistic states.

16 In fact, this technique can be used to construct asymmetric equilibria with strategies that take values in {0, 1} only, which are referred to as simple equilibria in KRC; see Proposition 10 in the Appendix. However, it is not clear whether such equilibria achieve higher average payoffs than the symmetric equilibrium.

For arbitrary N, we have the following result.

Proposition 6 (Asymmetric MPE). The N-player exploration game admits Markov perfect equilibria with thresholds 0 ≤ a‡ ≤ a♯ < a♭ < a∗, such that on [0, a♯], the players have a common payoff function; on [0, a‡], all players choose exploration exclusively; on (a‡, a♯), the players allocate a common interior fraction of the unit resource to exploration, and this fraction decreases in the gap; on [a♯, a♭), the intensity of exploration equals 1, with players taking turns exploring on consecutive subintervals;

17 Even though the free-rider has a payoff equal to 1 around a♭, she still benefits from free-riding on this region. This benefit, however, is offset by the relatively high burden of exploration effort she must bear in equilibrium at more optimistic states to reward the volunteer.


Figure 5. Average payoff and possible individual payoffs in the best two-player asymmetric equilibria, compared to the common payoff in the symmetric equilibrium (r = 1, c = 1/2, θ = −1, N = 2).

on [a♭, +∞), all players choose exploitation exclusively. The intensity of exploration is continuous in the gap on [0, a♭). The average payoff function is strictly decreasing on [0, a♭], once continuously differentiable on ℝ₊₊, and twice continuously differentiable on ℝ₊₊ except for the cutoff a♭. On [0, a♭), the average payoff is higher than in the symmetric MPE, and a♭ lies to the right of the threshold ā at which all exploration stops in that equilibrium.

6.2 Welfare Results

For N ≥ 3, further improvements can easily be achieved by letting the players take turns exploring, maintaining the intensity of exploration at K whenever K < u < K + 1 − K/N, for every K ∈ {1, . . . , N}, rather than for K = 1 only, as in the Proposition above. However, it is not clear whether such improvements achieve the highest welfare among all MPE of the N-player exploration game. For N = 2, the asymmetric equilibria of Proposition 6 are the best among all MPE.

Proposition 7 (Best MPE for N = 2). The average payoff in any Markov perfect equilibrium of the two-player exploration game cannot exceed the average payoff in the equilibria of Proposition 6.

In the construction of the asymmetric MPE depicted in Figure 4, the interval [a♯, a♭] is not split into subintervals. This can be confirmed by the observation from Figure 5 that the players' payoff functions match values only at the endpoints of [a♯, a♭], not in the interior. Our construction allows an arbitrary partition of [a♯, a♭] in the splitting procedure, so the trivial partition of [a♯, a♭], as in Figure 4, suffices. A finer partition, however, produces equilibria in which the players exchange roles more often, which allows them to share the burden of exploration more equally. Sufficiently frequent alternation of roles on [a♯, a♭) guarantees each player a payoff close enough to the average payoff and thus yields a Pareto improvement over the symmetric equilibrium.18

Proposition 8 (Pareto Improvement over the Symmetric MPE). For any ε > 0, the N-player exploration game admits Markov perfect equilibria as in Proposition 6 in which each player's payoff exceeds the symmetric equilibrium payoff on [0, a♭ − ε].

Recall that the symmetric equilibrium exhibits a limited encouragement effect when the cost of exploration is too high. The reason is that the common payoff function on the interval of interior allocation must satisfy the ODE β(a, U†) = 1, which does not depend on N. As a result, the common payoff function in the symmetric equilibrium is constant over the team size when the resource constraints are not binding. By contrast, the average payoff always increases in the number of players in the asymmetric MPE in which the players take turns exploring at states immediately to the left of the stopping cutoff. This is because the burden of keeping the overall intensity at one at these states can be shared among more players in larger teams. As a result, the players are able to exploit more often on average, which in turn encourages them to explore at more pessimistic states. Therefore, unlike in the comparative statics of the symmetric equilibrium, the encouragement effect in the asymmetric equilibria is not frozen by the free-rider effect as N goes up, irrespective of the cost and the prospect of development.

Proposition 9. The stopping cutoff a♭_N in the asymmetric MPE of Proposition 6 goes to infinity as N → +∞.19

Therefore, the amount of exploration, and thus the long-run outcomes, can be improved significantly over the symmetric equilibrium for large N by letting the players take turns exploring before exploration fully stops. This positive result, unfortunately, does not extend as far to welfare, as the rate of exploration might still be too low because of non-binding resource constraints, similar to the situation faced in the

18 The payoffs of both players in the asymmetric MPE depicted in Figure 5 are higher than in the symmetric equilibrium to the left of the first switch point; however, this might not be the case in general when the trivial partition of [a♯, a♭] is used in the construction, as that switch point could lie to the left of ā.

19 When θ > −1, if the asymmetric MPE of Proposition 6 fail to exist due to unbounded payoffs for N ≥ rc/(1 + θ), then by N → +∞ we actually mean N → rc/(1 + θ).


symmetric equilibrium. More precisely, when θ < −1, the full-intensity threshold a‡_N always hits 0 as N → +∞; when θ ≥ −1, whether this happens again depends on the cost of exploration, just as in the symmetric equilibrium. It can be shown that the results regarding the average payoff and the full-intensity threshold in Corollary 5 extend to the asymmetric MPE of Proposition 6.

7 Conclusion

This paper provides a novel and tractable framework for analyzing strategic interaction

among a team of players conducting incremental experimentation. We show how welfare

and long-run outcomes depend on the strength of two competing forces—the free-rider

effect and the encouragement effect.

Several limitations in our model are noteworthy. First, the scope of experimenta-

tion is restricted: Players do not have complete freedom to choose where to explore.

Admittedly, this is a genuine simplification when our model is viewed as a model of spatial experimentation in the vein of Callander (2011) and Garfagnini and Strulovici (2016). Yet incremental experimentation in our model can alternatively be viewed as an approximation of a sequential search process in which the search results follow a Markov process, with the search frequency controlled by the players. In this sense, the restriction imposed on the scope of experimentation is no different from the Markov assumption on the search results. Second, in our model, all actions and

outcomes are publicly observable. Hence, there are perfect knowledge spillovers be-

tween players, and the developed technologies are pure public goods. Extensions that

allow for some degree of excludability of the technologies or friction in the diffusion

of knowledge could be useful for applications in the industrial organization literature.

Third, the cost of exploration in our model is assumed to be the same for all players. The

effects of heterogeneity in the abilities of the players on their welfare and exploration

incentives are also worth exploring in future research.

References

Barbu, Viorel. “Existence and Uniqueness for the Cauchy Problem”. In: Differential

Equations. Cham: Springer International Publishing, 2016, pp. 29–77.

Bolton, Patrick and Christopher Harris. “Strategic Experimentation”. In: Econometrica

67.2 (1999), pp. 349–374.


Callander, Steven. “Searching and Learning by Trial and Error”. In: American Economic

Review 101.6 (Oct. 2011), pp. 2277–2308.

Cetemen, D., C. Urgun, and L. Yariv. Collective Progress: Dynamics of Exit Waves.

Working Paper. 2021.

Denrell, Jerker and James G. March. “Adaptation as Information Restriction: The Hot

Stove Effect”. In: Organization Science 12.5 (2001), pp. 523–538.

Fleming, W. and H. Soner. “Optimal Control of Markov Processes: Classical Solutions”.

In: Controlled Markov Processes and Viscosity Solutions. New York, NY: Springer

New York, 2006, pp. 119–149.

Garcia-Macia, Daniel, Chang-Tai Hsieh, and Peter J. Klenow. “How Destructive Is

Innovation?” In: Econometrica 87.5 (2019), pp. 1507–1541.

Garfagnini, Umberto and Bruno Strulovici. “Social Experimentation with Interdepen-

dent and Expanding Technologies”. In: The Review of Economic Studies 83.4 (Feb.

2016), pp. 1579–1613.

Gittins, J. C. “Bandit Processes and Dynamic Allocation Indices”. In: Journal of the

Royal Statistical Society. Series B (Methodological) 41.2 (1979), pp. 148–177.

Keller, Godfrey and Sven Rady. “Strategic experimentation with Poisson bandits”. In:

Theoretical Economics 5.2 (2010), pp. 275–311.

Keller, Godfrey, Sven Rady, and Martin Cripps. “Strategic Experimentation with Expo-

nential Bandits”. In: Econometrica 73.1 (2005), pp. 39–68.

Lee, Eucman and Phanish Puranam. “The implementation imperative: Why one should

implement even imperfect strategies perfectly”. In: Strategic Management Journal

37.8 (2016), pp. 1529–1546.

Lehoczky, John P. “Formulas for Stopped Diffusion Processes with Stopping Times

Based on the Maximum”. In: The Annals of Probability 5.4 (1977), pp. 601–607.

March, James G. “Exploration and Exploitation in Organizational Learning”. In: Orga-

nization Science 2.1 (1991), pp. 71–87.

Peskir, Goran and Albert Shiryaev. “Methods of solution”. In: Optimal Stopping and

Free-Boundary Problems. Basel: Birkhäuser Basel, 2006, pp. 143–241.

Posen, Hart E. and Daniel A. Levinthal. “Chasing a Moving Target: Exploitation and Ex-

ploration in Dynamic Environments”. In: Management Science 58.3 (2012), pp. 587–

601.

Rothschild, Michael. “A two-armed bandit theory of market pricing”. In: Journal of

Economic Theory 9.2 (1974), pp. 185–202.

Stieglitz, Nils, Thorbjørn Knudsen, and Markus C. Becker. “Adaptation and inertia in

dynamic environments”. In: Strategic Management Journal 37.9 (2016), pp. 1854–

1864.


Taylor, Howard M. “A Stopped Brownian Motion Formula”. In: The Annals of Proba-

bility 3.2 (1975), pp. 234–246.

Thomke, S. H. Experimentation Works: The Surprising Power of Business Experiments. Harvard Business Review Press, 2020.

Urgun, C. and L. Yariv. Retrospective Search: Exploration and Ambition on Uncharted

Terrain. Working Paper. Department of Economics, Princeton University, 2020.

Appendices

In this Appendix, we let θ ≔ 2μ/σ² and ρ ≔ σ²/(2rc), where μ ∈ ℝ denotes the drift of the Brownian path W₊, without normalizing the volatility σ to √2. Consequently, we replace λ in Lemma 1 with λ ≔ 1/(Nρ) − θ. Moreover, we let δ ≔ rc/N for the sake of notational convenience. Assumption 1, that λ > 1, is maintained throughout this Appendix unless it is explicitly stated otherwise.

A Explicit Representation of the Symmetric MPE

Corollary 6. The explicit representation for the normalized payoff function U† in the unique symmetric equilibrium on [a†, ā) is given by

U†(a) = 1 + (ā − a)²/(2ρ), if θ = 0,
U†(a) = 1 + (1/(ρθ)) ( ā − a + (1/θ)(e^{−θ(ā−a)} − 1) ), if θ ≠ 0.

If a† > 0, U† on [0, a†) is given by

U†(a) = (N/(γ₂ − γ₁)) ( (γ₂ + ψ) e^{−γ₁(a†−a)} − (γ₁ + ψ) e^{−γ₂(a†−a)} ),

with γ₁ < γ₂ being the roots of the equation γ(γ − θ) = 1/(Nρ) and ψ > 0 being

ψ ≔ (1/(Nρθ)) ( 1 + W₀( −exp(−1 − (N − 1)ρθ²) ) ), if θ > 0,
ψ ≔ √( (2/(Nρ))(1 − 1/N) ), if θ = 0,
ψ ≔ (1/(Nρθ)) ( 1 + W₋₁( −exp(−1 − (N − 1)ρθ²) ) ), if θ < 0,

where W₀ and W₋₁ denote the two real branches of the Lambert W function.

If ψ < 1, then the equilibrium belongs to the binding case. The full-intensity threshold is given by

a† = (1/(γ₂ − γ₁)) ( ln( (1 + γ₂)/(ψ + γ₂) ) − ln( (1 + γ₁)/(ψ + γ₁) ) ),

and the stopping threshold is given by

ā = a† + Nρ( ψ + θ(1 − 1/N) ).

If ψ ≥ 1, then the equilibrium belongs to the non-binding case with the full-intensity threshold a† = 0, whereas the stopping threshold ā is given by

ā = (1/θ) ( 1 + θ − θ²ρ + W₀( −(1 + θ) e^{−1−θ+θ²ρ} ) ), if θ < 0,
ā = 1 − √(1 − 2ρ), if θ = 0,
ā = (1/θ) ( 1 + θ − θ²ρ + W₋₁( −(1 + θ) e^{−1−θ+θ²ρ} ) ), if θ > 0.

Proof. From Lemma 4 and explicit calculations. ∎
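As an illustrative cross-check (not part of the paper), the closed form above can be evaluated directly with the two real branches of the Lambert W function. The snippet below does so for one assumed parameterization with θ < 0 that falls into the binding case, and verifies that the two branches of U† paste together at a† with the common value N and that the normal reflection condition holds at a = 0.

```python
# Illustrative check of Corollary 6 (assumed parameters; binding case, theta < 0).
import numpy as np
from scipy.special import lambertw

N, rho, theta = 2, 2.0, -1.0        # e.g. r = 1, c = 1/2, sigma^2 = 2  =>  rho = 2

# psi via the W_{-1} branch (the W_0 branch would be used for theta > 0)
psi = (1.0 + lambertw(-np.exp(-1.0 - (N - 1) * rho * theta**2), k=-1).real) / (N * rho * theta)

# gamma_1 < gamma_2 solve gamma*(gamma - theta) = 1/(N*rho)
disc = np.sqrt(theta**2 + 4.0 / (N * rho))
g1, g2 = (theta - disc) / 2.0, (theta + disc) / 2.0

a_dag = (np.log((1 + g2) / (psi + g2)) - np.log((1 + g1) / (psi + g1))) / (g2 - g1)
a_bar = a_dag + N * rho * (psi + theta * (1.0 - 1.0 / N))

def U_inner(a):      # branch on [a_dag, a_bar), theta != 0
    z = a_bar - a
    return 1.0 + (z + (np.exp(-theta * z) - 1.0) / theta) / (rho * theta)

def U_full(a):       # branch on [0, a_dag)
    return N / (g2 - g1) * ((g2 + psi) * np.exp(-g1 * (a_dag - a))
                            - (g1 + psi) * np.exp(-g2 * (a_dag - a)))

def dU_full(a):
    return N / (g2 - g1) * (g1 * (g2 + psi) * np.exp(-g1 * (a_dag - a))
                            - g2 * (g1 + psi) * np.exp(-g2 * (a_dag - a)))

print(f"psi = {psi:.4f}  (psi < 1, so the binding case applies)")
print(f"a_dagger = {a_dag:.4f},  a_bar = {a_bar:.4f}")
print(f"value matching at a_dagger: {U_inner(a_dag):.6f} vs {U_full(a_dag):.6f} (both = N)")
print(f"normal reflection at 0: U(0) + U'(0) = {U_full(0.0) + dU_full(0.0):.2e}")
```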

B Properties of Payoff Functions

Lemma 5 (Homogeneity). Player n's payoff function for an s-invariant Markov strategy profile κ ∈ K^N can be written as v_n(a, s|κ) = e^s v_n(a, 0|κ).

Proof. At state (a, s), player n's payoff associated with κ is given by

v_n(a, s|κ) = E[ ∫₀^∞ r e^{−rt} (1 − k_n(A_t)) e^{S_t} dt | A₀ = a, S₀ = s ]
            = e^s E[ ∫₀^∞ r e^{−rt} (1 − k_n(A_t)) e^{S_t − s} dt | A₀ = a, S₀ = s ]
            = e^s E[ ∫₀^∞ r e^{−rt} (1 − k_n(A_t)) e^{S_t} dt | A₀ = a, S₀ = 0 ]
            = e^s v_n(a, 0|κ),

where the second-to-last equality comes from the Markov property of the diffusion process {A_t}_{t≥0}. ∎

Lemma 6 (Normal Reflection Condition and Feynman–Kac Equation). Given an s-invariant Markov strategy profile, the associated payoff function of each player satisfies the Feynman–Kac equation (6) at each state at which it is twice continuously differentiable. Moreover, if the intensity of exploration is bounded away from 0 in some neighborhood of the state a = 0, the payoff function also satisfies the normal reflection condition (4).

Proof. We prove a more general version of this lemma in the Online Appendix for Markov strategies with (a, s) as the state variables. The normal reflection condition (∂v_n/∂a + ∂v_n/∂s)(0+, s) = 0 takes the form u_n(0) + u′_n(0+) = 0 because of the homogeneity of the payoff functions for s-invariant strategies. ∎

C Complete Information Setting

C.1 Proof of Lemma 1

Proof. Let s_W ≔ sup_{x≥0}{W(x) − δx}. Note that W₊(x) − y − δx has the same law as a Brownian motion starting from 0 with drift μ − δ and volatility σ. It is then well known that s_{W₊} − y has the exponential distribution with mean 1/λ if λ > 0; if λ ≤ 1, this leads to V(a, s) = +∞ (when λ ≤ 0 the supremum itself is infinite with probability 1).

Now suppose λ > 1. The (ex ante) value under complete information at state (a, s) can be calculated as

V(a, s) = E[ v(W) | W₊(0) = y, S₀ = s ]
        = P(s_W ≤ s) e^s + P(s_W > s) E[ e^{s_W} | s_W > s ]
        = e^s ( P(s_W ≤ s) + P(s_W > s) E[ e^{y−s} e^{s_W − y} | s_W > s ] ),

where v(W) = e^{s_W} is proved in Lemma 7 below.

Note that s_W > s if and only if s_{W₊} − y > a, and hence we have

P(s_W > s) E[ e^{y−s} e^{s_W − y} | s_W > s ] = e^{−a} ∫_a^∞ e^z λ e^{−λz} dz = − (λ/(1 − λ)) e^{−λa}.

Therefore, we have V(a, s) = e^s ( 1 − e^{−λa} − (λ/(1 − λ)) e^{−λa} ) ≕ e^s U(a) with U(a) = 1 + e^{−λa}/(λ − 1). ∎

Lemma 7 (Average Ex-post Value Function under Complete Information). For a given Brownian path W, the average (ex-post) value under complete information is given by v(W) = e^{s_W}, where s_W = sup_{x≥0}{W(x) − δx}.

Proof. For an arbitrary technology x ≥ 0, denote by κ(x) = {K_t}_{t≥0} the cutoff strategy in which the players explore with full intensity until x is developed, and exploit x thereafter. In other words, K_t = N·1_{[0, xc/N)}(t). By the nature of the problem, it is obvious that we can focus on this class of cutoff strategies without loss.

The average payoff for strategy κ(x) can be calculated as

v(W | κ(x)) = ∫_{xc/N}^∞ r e^{−rt} e^{W(x)} dt = e^{W(x) − δx},

which gives

v(W) = sup_{x≥0} v(W | κ(x)) = e^{s_W},

and the proof is complete.

We define the first-best technology to be any x̂ ∈ arg max_{x≥0}{W(x) − δx}, if this set is not empty given W. In such a case, the value v(W) < +∞ is achieved by the strategy κ(x̂). ∎

D Long-run Outcomes

D.1 Proof of Lemma 2

Proof. This lemma can be easily derived from the results in Lehoczky (1977). �

D.2 Proof of Lemma 3

Proof. By the definition of G in the proof of Lemma 7, we have, (G)−XG ≥ , (G)−XG for

all G ≥ 0. We assume there exists a unique G without loss.20 Let G ≔ arg max[0,G], (G)denote the technology with quality, (G) = B.21 Note that if 0 < 0 we have, (G) = B− 0,

otherwise we have G = 0 and thus, (G) = B − 0.

Suppose that G > G. Then we have , (G) − XG > B − XG ≥ B − XG, which implies

the event of supG>G{, (G) − X(G − G)} > B. By the Markov property of the Brownian

motion, this event occurs with probability P("∞ > max{0, 0}), where "∞ denotes the

global maximum of a Brownian motion starting from zero, with drift `−X and volatility

f. It is well known that "∞ has an exponential distribution with mean − f2

2(`−X) = 1/_.

Therefore, we have @(0) = 1−P0B (G > G) ≥ 1−P("∞ > max{0, 0}) → 1 as 0 → +∞,

since _ > 1 under Assumption 1. �

20This assumption holds almost surely under Assumption 1.

21By standard results, G(0) is almost surely finite under Assumption 1. See, e.g., equation (1.1) of

Taylor (1975).
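The exponential distribution of the all-time maximum of a Brownian motion with negative drift, which drives both this argument and the proof of Lemma 1, is easy to check by simulation. The sketch below is illustrative with assumed parameter values (the discretization slightly understates the supremum): it compares the empirical distribution of the maximum of a path with drift μ − δ < 0 and volatility σ with the exponential law of rate λ = 2(δ − μ)/σ², which equals λ = 1/(Nρ) − θ when δ = rc/N.

```python
# Monte Carlo check (illustrative, assumed parameters).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, delta = -0.5, np.sqrt(2.0), 1.0      # drift mu - delta = -1.5 < 0
lam = 2.0 * (delta - mu) / sigma**2             # here lam = 1.5 > 1 (Assumption 1)

T, dt, n_paths = 100.0, 5e-3, 2000
n_steps = int(T / dt)
maxima = np.empty(n_paths)
for i in range(n_paths):
    increments = (mu - delta) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_steps)
    maxima[i] = max(0.0, np.cumsum(increments).max())   # running maximum of the path

print(f"mean: theory {1/lam:.3f} vs simulation {maxima.mean():.3f}")
for a in (0.5, 1.0, 2.0):
    print(f"P(max <= {a}): theory {1 - np.exp(-lam*a):.3f} vs simulation {np.mean(maxima <= a):.3f}")
```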


E Characterization and Properties of MPE

E.1 Proof of Lemma 4

The theorem follows from the following lemmas.

Lemma 8 (Sufficiency). Given κ_{¬n} ∈ K^{N−1}, if u_n : ℝ₊ → ℝ satisfies Conditions 1–4 in Lemma 4, then a piecewise right-continuous function k*_n : ℝ₊ → [0, 1] which maximizes the right-hand side of the HJB equation (5) at each continuity point of u″_n is a best response against κ_{¬n}, with u_n being the associated payoff function of player n.

Proof. The proof is a standard verification argument. See, e.g., Fleming and Soner (2006), Theorem III.9.1. The role played by the linear growth condition in a standard proof is instead played by Assumption 1. The detailed proof is provided in the Online Appendix. ∎

Lemma 9 (Necessity). In any MPE κ ∈ K^N, for each player n ∈ {1, . . . , N}, her payoff function u_n(·|κ) satisfies Conditions 1–4 in Lemma 4, and her equilibrium strategy k_n(a) maximizes the right-hand side of the HJB equation (5) at each continuity point of u″_n.

Proof. Condition 1: On the stopping region where K(a) = 0, it is trivial that the payoff functions are smooth. On the region where K(a) is bounded away from zero, players' payoff functions are once continuously differentiable by standard results. Because the intensity of exploration in any MPE is bounded away from zero on any compact subset of [0, ā), as we argue in Section 4.2, the only part left to prove is the smooth pasting condition, which states that the payoff functions in any MPE must be once continuously differentiable at the stopping threshold ā. This is proved in the Online Appendix.

Condition 2: By standard results, players' payoff functions are twice continuously differentiable at each point at which all players' strategies are continuous. Condition 2 thus follows from our piecewise continuity assumption on players' strategies.

Condition 3 follows from Lemma 6.

Condition 4 follows directly from the dynamic programming principle. ∎

E.2 Proof of Proposition 2

Proof. Suppose to the contrary that there is an MPE where player 1 chooses exploitation at all states. Let ā > 0 denote the state at which all exploration stops. Then on (0, ā) player 1's payoff function u₁ is continuously differentiable and solves the free-rider ODE u(a) = 1 + K(a)β(a, u) with value matching u(ā) = 1 and smooth pasting condition u′(ā) = 0. It can then be easily verified that u₁(a) = 1 is the unique solution, which is a contradiction because the normal reflection condition (4) is violated. ∎

E.3 Proof of Proposition 3

Proof. As the individual payoff functions are bounded from below by the single-agent payoff function U*₁, it is clear that the stopping threshold ā in any MPE is weakly larger than the single-agent cutoff a*₁. Suppose by contradiction that ā = a*₁. Assume without loss of generality that all players' strategies are continuous on (a*₁ − ε, a*₁) for some ε > 0, so that each player's payoff function is twice continuously differentiable on this region. Note that for each player n, both u_n and U*₁ satisfy value matching and smooth pasting at a*₁. Then from equation (6), we have for each player n,

ρ u″_n(a*₁−) = k_n(a*₁−)/K(a*₁−) ≤ 1,

and ρ U*″₁(a*₁−) = 1. As ρ u″_n(a*₁−) < 1 for at least one player, her payoff lower bound u_n ≥ U*₁ is then violated at the states immediately to the left of ā. ∎

E.4 Proof of Proposition 4

Proof. Suppose by contradiction that there is an MPE where all players use cutoff

strategies. Let player 1 be the one who uses the strategy with the largest cutoff 0. Then

no other player uses the same cutoff 0 as player 1 for the following reason. Suppose to

the contrary that player 2 uses a strategy with the same cutoff 0, then both player 1 and

2 must have a payoff strictly greater than 1 and lower than 2 at the states immediately

to the left of 0, as their payoff functions solve the explorer ODE D(0) = V(0, D) for

some > 1 with initial condition D(0) = 1 and D′(0) = 0. As a result, exploration with

full intensity cannot be optimal for both players at the states immediately to the left of

cutoff 0 according to our characterization of best responses. Therefore, player 1 would

be the lone explorer with D1 > 1 on (0 − n, 0) for some n > 0, whereas all other players

free-ride on this region.

Moreover, it is easy to see that player 1’s payoff is weakly lower than the others,

because the region on which she collects flow payoffs is the smallest among all players.

However, on (0 − n, 0) the payoff function D= of player = ≠ 1 satisfies the free-rider

ODE D = 1+ V(0, D) with the same initial conditions as player 1. This yields the unique


solution D= = 1 < D1 on (0 − n, 0), a contradiction.

F Symmetric MPE

F.1 Proof of Proposition 5

From Lemma 4, it is not difficult to verify that the strategy profile in Corollary 6 constitutes an equilibrium, and that our proposition properly summarizes the equilibrium payoff function in that corollary.

Uniqueness follows directly from symmetry and Lemma 4. One can check by explicit calculations that, under Assumption 1, the equilibrium in Corollary 6 is the only symmetric s-invariant strategy profile such that the associated common payoff function satisfies Conditions 1–4 in Lemma 4.

The comparison between the stopping threshold and the cooperative cutoffs follows from Lemma 10.

G Comparative Statics

Lemma 10 (Welfare Comparison). Consider normalized payoff functions D8 : ℝ+ →[1, +∞) for 8 = 1, 2, with stopping thresholds 08 ≔ sup{ 0 ≥ 0 | D8 (0) > 1 } < +∞.

Suppose for each 8 = 1, 2, D8 satisfies the following conditions:

1. �1 on (0, 08) with D′1(01) = D′2(02) ≤ 0;

2. piecewise �2 on ℝ++;

3. there exists some function �8 : (1, +∞) × ℝ− → ℝ++ with �8 (I1, I2)/I2 nonde-

creasing in I1 and nonincreasing in I2 such that

(a) D′′8 (0) = �8 (D8 (0), D′8 (0)) for all 0 ∈ (0, 08) at which D′′8 (0) is continuous;

(b) D′8 (0) + �8 (D8 (0), D′8 (0)) > 0 for all 0 ∈ (0, 08).

If �1 ≥ �2, then 01 ≤ 02 and D1 ≤ D2. Additionally, if for some D ∈ (1, D2(0)] we

have �1 = �2 on (D, D2(0)) × ℝ− and for all I2 ≤ 0 at least one of the following two

conditions holds:

1. �1(D−, I2) > �2(D−, I2);

2.m�1 (D−,I2)

mI1<

m�2 (D−,I2)mI1

andm�1 (D−,I2)

mI2≥ m�2 (D−,I2)

mI2;

36

Page 37: Strategic Exploration for Innovation

then 01 < 02, and D1 < D2 on [0, 02).

Proof. See the Online Appendix. �

Remark. Almost all the payoff functions in this paper fulfill the first four conditions in the Lemma above under Assumption 1, with value matching and smooth pasting at ā. These include

• the cooperative solution: F(z₁, z₂) = z₁/(Nρ) + θz₂;
• the symmetric MPE: F(z₁, z₂) = max{z₁/N, 1}/ρ + θz₂;
• the average payoff in the asymmetric MPE: F(z₁, z₂) = max{z₁/N, min{1, z₁ − (1 − 1/N)}}/ρ + θz₂;
• the average payoff in the simple asymmetric MPE of Proposition 10:
  F(z₁, z₂) = (z₁ − (1 − m/N))/ρ + θz₂, if m < z₁ ≤ m + 1 for m = 1, . . . , N − 1,
  F(z₁, z₂) = z₁/(Nρ) + θz₂, if z₁ ≥ N.
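The lemma can be illustrated numerically. The sketch below is illustrative code with assumed parameters, not from the paper: it integrates the first three ODEs in the remark leftward from the stopping boundary, with value matching, smooth pasting, and normal reflection as above, and reports the resulting stopping thresholds. Since the three drift terms F are ranked pointwise (cooperative ≤ asymmetric average ≤ symmetric), the lemma predicts the reverse ranking of the thresholds, ā ≤ a♭ ≤ a*_N.

```python
# Illustrative check of the threshold ordering implied by Lemma 10 (assumed parameters).
import numpy as np
from scipy.integrate import solve_ivp

N, rho, theta = 3, 2.0, -1.0      # theta <= -1, so Assumption 1 holds

drives = {
    "cooperative":        lambda u: u / N,
    "asymmetric average": lambda u: max(u / N, min(1.0, u - (1.0 - 1.0 / N))),
    "symmetric":          lambda u: max(u / N, 1.0),
}

def stopping_threshold(drive, t_max=100.0):
    # integrate rho*(u'' - theta*u') = drive(u) leftward from (u, u') = (1, 0)
    def rhs(t, y):
        v, dv = y
        return [dv, drive(v) / rho - theta * dv]
    def reflection(t, y):          # u(0) + u'(0) = 0
        return y[0] - y[1]
    reflection.terminal = True
    sol = solve_ivp(rhs, (0.0, t_max), [1.0, 0.0], events=[reflection], max_step=1e-3)
    return sol.t_events[0][0]

for name, drive in drives.items():
    print(f"{name:>18}: stopping threshold = {stopping_threshold(drive):.3f}")
# A larger F gives a smaller threshold, so the printed values should decrease
# from the cooperative solution to the symmetric MPE.
```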

G.1 Proof of Corollary 1

Proof. Here we prove the limit results only; the other results can be easily derived from explicit calculations. Write w ≔ (ln U)′ and w* ≔ (ln U*)′. It is not difficult to verify that w′ = −(1/(Nρ) − θ)w − w² on ℝ₊₊, while w*′ = 1/(Nρ) + θw* − w*² on (0, a*_N) and w* = 0 on [a*_N, +∞).

Suppose that θ ≤ −1. Then, on (0, a*_N), both ODEs above converge to w′ − θw + w² = 0 as N → +∞. Because the normal reflection condition (4) implies w(0) = w*(0) = −1, we must have lim w* = lim w on (0, a*_N). However, notice that lim w(a) < 0 for all a ≥ 0. Therefore, we must have a*_N → +∞ by the continuity of w* on ℝ₊. This, together with the convergence of w* and w, implies lim U*(a) = lim U(a) for each a ≥ 0. In particular, for θ = −1, from Lemma 1 we know U(a) → +∞ as N → +∞, and hence U*(a) → +∞ for each a ≥ 0. The results for the case of θ > −1 then follow from the monotonicity of U*(a) in θ, which can be easily verified by Lemma 10. ∎

G.2 Proof of Corollary 3

Proof. Since c = σ²/(2rρ), the comparative statics with respect to c follow directly from the following proof of the comparative statics with respect to ρ. Recall that the common payoff function U†_ρ in the symmetric MPE satisfies the ODE u″ = F(u, u′) on (0, ā), where F(z₁, z₂) = max{z₁/N, 1}/ρ + θz₂. The comparative statics of U†_ρ and the stopping threshold ā_ρ then follow directly from Lemma 10.

Because U†_ρ(a) is decreasing on [0, ā) and increasing in ρ for each a ∈ [0, ā), the full-intensity threshold a†_ρ = U†⁻¹_ρ(N) is weakly increasing in ρ.

Lastly, because k†_ρ(a) = (U†_ρ(a) − 1)/(N − 1) on (a†_ρ, ā_ρ), the comparative statics results above imply that k†_ρ(a) is weakly increasing in ρ for all a ≥ 0. ∎

G.3 Proof of Corollary 4

Proof. For the binding case, similarly to the proof of Corollary 3, the comparative statics of U†_N and ā_N with respect to N are immediate from Lemma 10.

For the non-binding case, Lemma 10 implies that both U†_N and ā_N are constant over N, as F(z₁, z₂) = 1/ρ + θz₂ does not depend on N. It is then obvious that k_N(a) = (U†_N(a) − 1)/(N − 1) is weakly decreasing in N for each a ≥ 0. ∎

G.4 Proof of Corollary 5

For the sake of notational convenience, we are going to prove the results in terms of ρ = σ²/(2rc) > ρ̲ for some ρ̲, which is equivalent to c < c̄ = σ²/(2rρ̲). We let ρ̲ = (θ − ln(1 + θ))/θ² if θ > −1.22

The statement for the case that ρ > ρ̲ and θ > −1 can be easily shown by an analysis of the phase diagram from the theory of ordinary differential equations. A rigorous proof by explicit derivation is provided in the Online Appendix. The opposite case follows from the following lemma.

Lemma 11. If either ρ ≤ ρ̲ or θ ≤ −1, then there exists N̄ > 1 such that a†_N = 0 and U†_N = U†_N̄ < +∞ for all N ≥ N̄.

Here we do not impose Assumption 1. Therefore, this lemma states that when N reaches N̄ as it is increased from 1, the symmetric equilibrium falls into the non-binding category and continues to be an equilibrium for all N > N̄, even when Assumption 1 is violated for large N in the case of θ > −1. As a consequence, we have ā_N = ā_N̄ for all N ≥ N̄, and thus the stopping threshold ā_N is bounded from above as N → +∞.23

22 If θ = 0 we set ρ̲ = 1/2, the limit of the right-hand side as θ → 0.
23 More precisely, when Assumption 1 is violated and the uniqueness of the symmetric MPE is not guaranteed, here we mean that there exists a sequence of symmetric MPE indexed by N, with ā_N bounded from above as N → +∞.


This implies that |U*_N(a) − U†_N(a)| is bounded away from zero in the limit, because for large enough N we have that U†_N(a) is constant over N and that U*_N(a) is increasing in N for each a ≥ 0.

Moreover, since ā_N is constant over N for large enough N, the amount of exploration x(ā_N) is also constant for large N. The limit result for q_N(ā_N) then follows from the fact that the first-best technology x̂ is nondecreasing in N for each given Brownian path W and each initial state (a, s).

Next we prove the above lemma.

Proof of Lemma 11. Suppose either d ≤ d or \ ≤ −1. We construct *†# according to

the closed-form expression of the equilibrium payoff function for the non-binding case

in Corollary 6. This is possible only when either d ≤ d or \ ≤ −1, as otherwise the

expression for 0 in that corollary is not well-defined. Let # = *†# (0), and for all # ≥ # ,

let :†# = (*†# − 1)/(# − 1).We now verify that the strategy profile k

†# with each player playing :†# constitutes

a symmetric MPE with non-binding resource constraints in the #-player exploration

game for all # ≥ # , with*†# being the associated payoff function. Note that we cannot

apply Lemma 4 directly because it relies on Assumption 1. Nevertheless, from the

proof of Lemma 8, we know that*†# is an upper bound on player =’s achievable payoffs

against k†¬= (this fact does not rely on Assumption 1). Moreover,*†# is indeed the payoff

function associated with k†# , since *†# is the only function that satisfies both of the

properties in Lemma 6. As a result, the upper bound on the payoff functions when the

player plays against k†¬= is achieved by :†# , and therefore k

†# is a symmetric MPE.

H Asymmetric MPE

H.1 Simple MPE

Here we present a class of asymmetric MPE similar to the simple MPE in Section 6.1

of KRC. The construction is almost identical to the asymmetric MPE of Proposition 6,

and therefore the proof is omitted.

Proposition 10 (Asymmetric Simple MPE). The #-player exploration game admits

simple Markov perfect equilibria with " thresholds 0 ≕ 0"+1 < 0" < · · · < 01 <

00 ≔ +∞ for some " ∈ {1, . . . , #}, such that on [0 +1, 0 ) exact players take turns

exploring on consecutive subintervals.


The average payoff function D is strictly decreasing on [0, 01], twice continuously

differentiable on ℝ++ except for the states {0 }" =1. Moreover, we have ⌊D(0)⌋ =

" , and D solves the ODE D = (1 − /#) + d (D′′ − \D′) on [0 +1, 0 ) for each

∈ {0, 1, . . . , "}. The payoff function of each player D= is strictly decreasing on

[0, 01], once continuously differentiable on ℝ++, and satisfies D=(0 ) = D(0 ) = for

∈ {1, . . . , "}.

H.2 Proof of Proposition 6

Step 1: Construction of the average payoff function.

Let ũ : ℝ₋ → ℝ be once continuously differentiable and solve

max{ ũ(a)/N, min{1, ũ(a) − (1 − 1/N)} } = β(a, ũ),    (10)

with initial conditions ũ(0) = 1 and ũ′(0) = 0.24 Let a♭ > 0 be such that ũ(−a♭) + ũ′(−a♭) = 0.25

Now we let the average payoff function be u(a) = ũ(a − a♭) for a ≤ a♭ and u(a) = 1 for a ≥ a♭. We can check that u satisfies value matching u(a♭) = 1, smooth pasting u′(a♭) = 0, and the normal reflection condition u(0) + u′(0) = 0. Let a♯ = u⁻¹(2 − 1/N) and a‡ = u⁻¹(N), where u⁻¹(û) = inf{ a ≥ 0 | u(a) ≤ û }. Under this construction we have 0 ≤ a‡ ≤ a♯ < a♭, and u : ℝ₊ → ℝ is continuously differentiable and satisfies: on (a♭, +∞), u = 1; on (a♯, a♭), u solves u = 1 − 1/N + β(a, u); on (a‡, a♯), u solves 1 = β(a, u); on (0, a‡), u solves u = Nβ(a, u).

Step 2: Construction of the players’ payoff functions and strategies.

For each player =, on [0♭, +∞), let := (0) = 0 and D= = D = 1; on [0, 0♯), if not empty,

let := (0) = min{1, (D(0) − 1)/(# − 1)} and D= = D.

Next, consider any partition 0♯ = 01 < 02 < · · · < 0< < 0<+1 = 0♭ of interval

[0♯, 0♭]. For each subinterval [0 9 , 0 9+1] in the partition {0 9 }<+19=1, we use Algorithm 1

to construct payoff function D= and strategy := for each player =.

For each subinterval [0 9 , 0 9+1], Algorithm 1 calls procedure ActionAssignment,

which first calls function Split to split [0 9 , 0 9+1] into three subintervals according to

Lemma 12 below, and lets the player with the lowest index available freeride on the

subinterval [0"−, 0"+] in the middle, and explore on the rest two subintervals at both

ends. In such a way, the strategy := and payoff function D= of this player are defined on

24It is straightforward to verify that D is strictly decreasing.

25It can be shown that under Assumption 1 there exists a unique 0 < 0♭ < +∞.


[0 9 , 0 9+1], and she is then labeled as unavailable. Then ActionAssignment is called

recursively on these three subintervals, preserving the total intensity = 1 by allocating

intensity 0 on the subintervals at both ends, and intensity 1 on [0"−, 0"+], with D on

these subintervals being replaced by D¬=, which is the average payoff function among

the rest of the available players.

Lemma 12 ensures function Split partitions any interval [0!, 0'] into three subin-

tervals in a unique way such that D=, and therefore also D¬=, have the same values and

derivatives as D at the end points 0! and 0'.

The termination of Algorithm 1 is trivial because each time ActionAssignment

is called, the strategy of one of the players is assigned and she is thereafter removed

from the set of the available players. In the case where the allocation of strategies is

unambiguous—that is, either the task of exploration has already been assigned or is to

be assigned to the last available player—splitting the interval is unnecessary, and the

corresponding strategies are set at once for all these players.

When Algorithm 1 terminates, the strategies and the constructed payoff functions of

each player are defined on [0♯, 0♭]. Also note that D from Step 1 is indeed the average of

the constructed payoffs of all the players, and the input requirement for D in procedure

Split is satisfied in each call to the procedure. These conditions are maintained by line 22

during each call of ActionAssignment. Lastly, D= is once continuously differentiable

on ℝ++ for each =, since no “kink” is created in Algorithm 1.

Step 3: Ensuring mutually best responses.

Because each D= is once differentiable on ℝ++, is twice differentiable except for the

switch points, and satisfies normal reflection condition (4), our characterization of MPE

in Lemma 4 states that (:1, . . . , :# ) constitutes an MPE with D= being the associated

payoff function of player = if D= solves the HJB equation

D= (0) = 1 + ¬= (0)V(0, D=) + max:=∈[0,1]

:={V(0, D=) − 1}

for each continuity point of D′′= , and the constructed strategy maximizes the right-hand

side.

In other words, we need to verify for all 0 ≥ 0, := (0) = 1 if V(0, D=) > 1, and

:= (0) = 0 if V(0, D=) < 1. Then the HJB is satisfied by construction.

On (0♭, +∞), D=(0) = 1 gives V(0, D=) = 0 < 1, therefore := (0) = 0 is optimal. On

[0, 0♯), the argument is exactly the same as in the symmetric MPE. Lastly, on (0♯, 0♭),we have = 1 by construction. The monotonicity of D and the construction of D=

implies 1 ≤ D= (0) < D(0♯) = 2− 1/# for each player =. Our construction of D= ensures


Algorithm 1 Payoff construction
Require: a♯ = a_1 < . . . < a_{m+1} = a♭, u from Step 1, and a finite set of players N = {1, . . . , |N|}
Ensure: the equilibrium strategy profile {k_n}_{n=1}^{|N|} is defined on [a_1, a_{m+1}], and {u_n}_{n=1}^{|N|} is the set of payoff functions corresponding to the strategy profile {k_n}_{n=1}^{|N|}
1: for all j ∈ {1, . . . , m} do
2:     ActionAssignment(N, 1, a_j, a_{j+1}, u)
3: end for
4: function Split(|A|, a_L, a_R, u)    ⊲ input and output according to Lemma 12
Require: |A|: number of available players
Require: u: solves u = 1 − 1/|A| + ρ(u″ − θu′) on (a_L, a_R)
Ensure: ũ ∈ C¹((a_L, a_R)) solves ũ = ρ(ũ″ − θũ′) on (a_L, a_{M−}) ∪ (a_{M+}, a_R), and ũ = 1 + ρ(ũ″ − θũ′) on (a_{M−}, a_{M+}), with the same values and derivatives as u at a_L and a_R
5:     return (a_{M−}, a_{M+}, ũ)
6: end function
7: procedure ActionAssignment(A, κ, a_L, a_R, u)
       ⊲ allocate aggregate intensity κ ∈ {0, 1} to a set of available players A
8:     if κ = 0 then    ⊲ let all available players free-ride
9:         for all n ∈ A do
10:            k_n|[a_L, a_R) ← 0
11:            u_n|[a_L, a_R) ← u
12:        end for
13:    else if κ = 1 = |A| ≕ |{n}| then    ⊲ let the only available player explore
14:        k_n|[a_L, a_R) ← 1
15:        u_n|[a_L, a_R) ← u
16:    else    ⊲ let one of the |A| ≥ 2 available players explore
17:        (a_{M−}, a_{M+}, ũ) ← Split(|A|, a_L, a_R, u)
18:        n ← min A
19:        k_n|[a_L, a_{M−}) ∪ [a_{M+}, a_R) ← 1
20:        k_n|[a_{M−}, a_{M+}) ← 0
21:        u_n|[a_L, a_R) ← ũ
22:        u_{¬n}|[a_L, a_R) ← (|A| u − u_n)/(|A| − 1)    ⊲ the average payoff among the rest of the players
23:        ActionAssignment(A \ {n}, κ, a_{M−}, a_{M+}, u_{¬n})
24:        ActionAssignment(A \ {n}, κ − 1, a_L, a_{M−}, u_{¬n})
25:        ActionAssignment(A \ {n}, κ − 1, a_{M+}, a_R, u_{¬n})
26:    end if
27: end procedure


:= (0) = 1 if and only if D= (0) = V(0, D), and := (0) = 0 if and only if D= (0) = 1+V(0, D)for each G ∈ (0♯, 0♭). Suppose V(0, D=) > 1, which implies 1 + V(0, D=) > 2 > D= (0).Then by construction it must be D= (0) = V(0, D), and hence the constructed strategy

:= (0) = 1 is optimal. On the contrary, if V(0, D=) < 1, which implies V(0, D=) < D= (0),then by construction it must be D= (0) = 1 + V(0, D=), and hence the constructed strategy

:= (0) = 0 is optimal.

Comparison with symmetric MPE

Note that D solves ODE

D′′ = max{D/#,min{1, D − (1 − 1/#)}}/d + \D′

on (0, 0♭), while the average payoff function*† of symmetric MPE solves

D′′ = max{D/#, 1}/d + \D′

on (0, 0), with value matching and smooth pasting at 0♭ and 0, respectively. Obviously

the right-hand side in the first equation is weakly smaller. We can then check the second

condition for the strict inequalities in Lemma 10 with D = 2 − 1/# and conclude that

0 < 0♭ and D > *† on [0, 0♭). Similarly, we can also compare the first equation with

the ODE in the cooperative problem D′′ = D/(#d) + \D′ and conclude 0♭ < 0∗# .

Lemma 12 (Split). Given a strictly decreasing function u : [a_L, a_R] → [u_R, u_L], with u(a_L) = u_L and u(a_R) = u_R ≥ 1, which satisfies the average payoff ODE u = f + ρ(u″ − θu′) on (a_L, a_R) with 0 < f < 1 and ρ > 0, there exist a_L < a_{M−} < a_{M+} < a_R and a function ũ : [a_L, a_R] → [u_R, u_L], continuously differentiable on (a_L, a_R), such that ũ(a_L) = u(a_L); ũ′(a_L) = u′(a_L); ũ(a_R) = u(a_R); ũ′(a_R) = u′(a_R); ũ solves the explorer ODE ũ = ρ(ũ″ − θũ′) on (a_L, a_{M−}) ∪ (a_{M+}, a_R) and solves the free-rider ODE ũ = 1 + ρ(ũ″ − θũ′) on (a_{M−}, a_{M+}).

Proof. See the Online Appendix. ∎

H.3 Proof of Proposition 7

Proof. The average payoff D in any MPE must be once continuously differentiable and

satisfy the ODE D′′ = �(D, D′) for some � respecting Feynman-Kac equation (6) whenever

D > 1 and D′′ is continuous. For # = 2, the characterization of best responses gives the

following minimal requirements on �. For each 1 < D < 2, �(D, D′) can only be either


D−1/2d +\D′ (one player explores and the other freerides) or 1

d +\D′ (both players choose a

common interior allocation); for each D > 2, � can only be eitherD−1/2d + \D′ (one player

explores and the other freerides) or D2d + \D′ (both players explore). The asymmetric

MPE we construct in Proposition 6 adopts �(D, D′) = max{D/2,min{1, D−1/2}}/d+\D′,which is the minimal � that satisfies these constraints, and therefore achieves the highest

payoff among all MPE by Lemma 10. �

H.4 Proof of Proposition 8

Proof. Choose a partition {0 9 }<+19=1of [0♯, 0♭] in the construction of the equilibria in

Proposition 6, so that

max1≤ 9≤<

��D(0 9+1) − D(0 9 )�� ≤ X ≔ 1

2max0♯≤0≤0

{D(0) −*†(0)},

and |0< − 0<+1 | < n . Because the average payoff D and the players’ payoff functions

are monotone and coincide at the endpoints of the subintervals [0 9 , 0 9+1], we have

|D= (0) − D(0) | ≤ X for 0 ∈ [0♯, 0] for all player =. Therefore, we have D= ≥ D − X > *†

on [0♯, 0). Also, as D= (0<) = D(0<) > 1 for all player =, we have D= > 1 = *† on

[0, 0<] = [0, 0♭ − n]. Lastly, D= = D > *† on [0, 0♯), if it is not empty.

H.5 Proof of Proposition 9

Proof. Suppose \ < −1. Then Assumption 1 is always satisfied and the asymmetric

MPE in Proposition 6 exist for all # .

Denote by D# the average payoff function in the asymmetric MPE of Proposition 6 for

an #-player exploration game. Because D# satisfies the ODE D = 1−1/# + d(D′′− \D′)whenever 1 < D < 2 − 1/# , we verify that |# ≔ ln(D#)′ satisfies the second-order

ODE

|′′ − \|′ + 3||′ + |(|2 − \| − 1/d) = 0 (11)

on (0♯, 0♭), with initial conditions |(0♭) = 0 and |′(0♭) = 1#d , and |# = 0 on [0♭, +∞).

By the continuity of the solution to ODE (11) in the initial data (see, e.g., Theorem

2.14 of Barbu (2016)), as # → +∞, |# converges to 0 on (0♯, 0♭), which is the unique

solution to ODE (11) with initial conditions |(0♭) = 0 and |′(0♭) = lim 1#d = 0.

Suppose by contradiction that 0♭ is bounded from above as # → +∞. This implies for

sufficiently large # we have D# (0) = exp(∫ 00♭|#

)< 2 − 1/# for all 0 ∈ [0♯, 0♭], and

thus by construction we have 0♯ = 0. Then the fact that |# → 0 on (0, 0♭) contradicts

44

Page 45: Strategic Exploration for Innovation

|# (0) = −1 imposed by the normal reflection condition (4).

Now suppose \ ≥ −1. If the constructed asymmetric MPE exist for all # > 1,

then 0♭ → +∞ as # → +∞. This is because for fixed # > 1 and 0 ≥ 0, D# (0) is

nondecreasing in \, and thus the claim follows from the case of \ < −1 as we have shown

above. Otherwise, it can be shown that as # → 1d(1+\) we have D# → +∞ and 0♭ → +∞

by an argument similar to the proof of Corollary 5 for the symmetric equilibrium.


Online Appendix

“Strategic Exploration for Innovation”

Shangen Li∗

August 17, 2021

1 Equilibrium Characterization (Details)

1.1 Proof of Lemma 8

Proof. Given k¬= ∈ K#−1, for any admissible control process := = {:=,C}C≥0 ∈ A, let

L ≔ [(0, B, :=,C)m

m0+ 1

2U2(0, B, :=,C)

m2

m02,

with [(0, B, :) = −`(:+ ¬= (0))/2 andU(0, B, :) = f√(: + ¬= (0))/2. Let 5 (0, B, :) =

A (1 − :)4B , and consider {(0, B) = 4BD= (0), where D= : ℝ+ → ℝ satisfies Conditions

1–4 in Lemma 4.

Usually, applying Itô’s formula on { at 0 > 0 requires D= ∈ �2 (ℝ++). However, Itô’s

formula is still valid for �1 functions with absolutely continuous derivatives, which is

satisfied by D= under Conditions 1 and 2.1

∗Department of Economics, University of Zürich. E-mail address: [email protected]

1See, e.g., Chung and R. J. Williams (2014), Remark 1, p. 187; Rogers and D. Williams (2000),

Lemma IV.45.9, p. 105; or Strulovici and Szydlowski (2015), footnotes 73 and 78.


By applying Itô’s formula on 4−A) {(�) , () ),2 for fixed 0 < ) < ∞ we have

4−A) {(�) , () ) − {(0, B) =∫ )

0

4−AC (L{ − A{)(�C , (C) dC

−∫ )

0

4−ACU(�C , (C , :=,C)m{

m0(�C , (C) d�C

+∫ )

0

4−AC(m{

m0+ m{mB

)(�C , (C) d(C .

Since d(C = 0 whenever �C > 0, and by Condition 3 we have (m{/m0 + m{/mB ) (�C , (C) =0 when �C = 0, the last term is identically zero. Moreover, because U(�C , (C , :=,C) and

m{/m0 are bounded, the second term has mean zero. Taking expectation on both sides,

for fixed 0 < ) < ∞ we have the Dynkin’s formula

{(0, B) = −E0B

[∫ )

0

4−AC (L{ − A{)(�C , (C) dC

]+ 4−A)E0B [{(�) , () )],

where the expectations on the right-hand side are finite for each ) < ∞.

Note that HJB equation (5) implies

A{(0, B) ≥ 5 (0, B, :=,C) + L{(0, B),

and therefore we have

{(0, B) ≥ E0B

[∫ )

0

4−AC 5 (�C , (C , :=,C) dC

]+ 4−A)E0B [{(�) , () )] . (1)

Let ) → ∞, we have

{(0, B) ≥ lim inf)→∞

E0B

[∫ )

0

4−AC 5 (�C , (C , :=,C) dC

]+ lim inf

)→∞4−A)E0B [{(�) , () )] .

Because∫ )04−AC 5 (�C , (C , :=,C) dC is increasing in ) , as the integrand is non-negative,

we can apply either the monotone convergence theorem or Fatou’s lemma to have

{(0, B) ≥ E0B

[∫ ∞

0

4−AC 5 (�C , (C , :=,C) dC

]+ lim inf

)→∞4−A)E0B [{(�) , () )]

= {= (0, B |:=, k¬=) + lim inf)→∞

4−A)E0B [{(�) , () )]

≥ {= (0, B |:=,k¬=).

2Note that (C is a finite variation process and hence both the quadratic variation 〈(, (〉C and the

covariation 〈�, (〉C are identically zero. See Shreve (2004) Section 7.4.2 for a non-technical treatment.


Now we repeat the argument by replacing :=,C with :∗=,C = :∗= (�C). Inequality (1)

becomes equality, and for 0 < ) < ∞ we have

{(0, B) = E0B

[∫ )

0

4−AC 5 (�C , (C , :∗=,C) dC

]+ 4−A)E0B [{(�) , () )] .

As the first term on the right-hand side is increasing in ) , the second term must be

decreasing, and hence

{(0, B) = E0B

[∫ ∞

0

4−AC 5 (�C , (C, :∗=,C) dC

]+ lim)→∞

4−A)E0B [{(�) , () )]

= {(0, B |:∗=, k¬=) + lim)→∞

4−A)E0B [{(�) , () )] .

We finish the proof by showing lim)→∞ 4−A)E0B [{(�) , () )] = 0 as follows.

Because �C ≥ 0, we have

E0B [{(�) , () )] ≤ E0B [{(0, () )]= D(0)E0B [4() ]≤ D(0)E0B [4() ]= D(0)4BE0B [4() −B]≤ D(0)4BE0B [4") # /2 ],

where ") = max0≤C≤) {`C +f�C}. The last inequality comes from the fact that () − B ≤")#/2 almost surely. Indeed, given that �0 = 0, the process {() − B})≥0 can be viewed

as {") })≥0 with time change in the sense that () − B = ") ′ with ) ′=

∫ )0 C/2 dC.

Because C ∈ [0, #], we have ) ′ ≤ )#/2 and hence ") ′ ≤ ")#/2.

Then from Lemma 4.1 in Section 4, we have

E0B [4") # /2 ] ≤ �14(`+f2/2))#/2 +�2

for some �1, �2 ≥ 0, and therefore

4−A)E0B [{(�) , () )] ≤ �14(−A+(`+f2/2)#/2)) +�24

−A) .

The right-hand side goes to 0 as ) → ∞ if (` + f2/2)#/2 < A, which is precisely

Assumption 1. �


1.2 Normal Reflection Condition and Feynman-Kac Formula

The following Lemma provides some properties of player =’s payoff function for Markov

strategy profile k with state variables (0, B), in which the strategies are not necessarily

best responses against each other.

Let

L = [(0, B) mm0

+ 1

2U2(0, B) m

2

m02

with [(0, B) = −` (0, B)/2 and U(0, B) = f√ (0, B)/2. Denote by 5 (0, B) = A (1 −

:= (0, B))4B the flow payoff that player= receives at state (0, B), and by {(0, B) = {= (0, B |k)her payoff at state (0, B).

Lemma 1.1 (Properties of Payoff Function). If 0 ↦→ {(0, B) is twice continuously

differentiable at 0, then it satisfies the Feynman-Kac formula

A{(0, B) = L{(0, B) + 5 (0, B). (2)

Moreover, if { is continuously differentiable on {0 = 0},3 and (0, B) is bounded away

from 0 on some open set containing {0 = 0}, then { satisfies the normal reflection

conditionm{(0+, B)m0

+ m{(0+, B)mB

= 0. (3)

Proof. Given strategy profile k, consider player =’s continuation value at time ) , that

is,

{(�) , () ) = E

[∫ ∞

)

4−A (C−) ) 5 (�C , (C) dC

����F)],

and her total discounted payoff at time 0, evaluated conditionally on the information

available at time ) , that is,

/) ≔ E

[∫ ∞

0

4−AC 5 (�C , (C) dC

����F)]

= E

[∫ ∞

)

4−AC 5 (�C , (C) dC

����F)]+∫ )

0

4−AC 5 (�C , (C) dC

= 4−A) {(�) , () ) +∫ )

0

4−AC 5 (�C , (C) dC .

3Here we mean { : ℝ+ × ℝ → ℝ can be extended to a function that is continuously differentiable on

some open set in ℝ2 containing {0 = 0}.


Note that {/C}C≥0 is a martingale.4 Indeed, for 0 ≤ g ≤ ) , we have

E[/) |Fg] = E

[E

[∫ ∞

0

4−AC 5 (�C , (C) dC

����F)] ����Fg

]

= E

[∫ ∞

0

4−AC 5 (�C , (C) dC

����Fg]

= /g .

Because (·, B) is piecewise continuous, we have 0 ↦→ {(0, B) is piecewise �2 on

ℝ++. Moreover, { is continuously differentiable on {0 = 0} by assumption. Therefore,

we can apply Itô formula to 4−A) {(�) , () ),5 which gives

4−A) {(�) , () ) − {(0, B) =∫ )

0

4−AC (L{ − A{)(�C , (C) dC

−∫ )

0

4−ACU(�C , (C)m{

m0(�C , (C) d�C

+∫ )

0

4−AC(m{

m0+ m{mB

)(�C , (C) d(C .

Because /) is a martingale starting from {(0, B), we can conclude the process

/) − {(0, B) +∫ )

0

4−ACU(�C , (C)m{

m0(�C , (C) d�C

=

∫ )

0

4−AC ((L{ − A{ + 5 )(�C , (C)) dC +∫ )

0

4−AC(m{

m0+ m{mB

)(�C , (C) d(C

is a continuous martingale starting from 0. Observe that the right-hand side is a

finite variation process. As a consequence, /) must be 0 for all ) ≥ 0 almost surely.

Moreover, note that the Lebesgue measure dC and the measure d() on ([0, )],B([0, )]))are mutually singular almost surely. Indeed, the set � ≔ { C ∈ [0, )] : �C = 0 } is a null

set under Lebesgue measure dC, and [0, )]\� is a null set under measure d() almost

surely. Because almost surely the sum of the integrals on the right-hand side is zero

and dC ⊥ d() , by Lebesgue decomposition theorem we know that each integrand is a.s.

a.e. 0 on [0, )] for all ) .6 This yields the Feynman-Kac formula (2) and the normal

reflection condition (3).7 �

4Assumption 1 guarantees all expectations in this proof are finite.

5See footnotes 1 and 2 for the validity of applying Itô’s formula here.

6By a.e. 0 on [0, )] we mean the integrands are 0 on all non-null sets w.r.t. the corresponding measure.

7We require (0, B) > 0 on some neighborhood containing {0 = 0}, as otherwise all measurable

subsets of [0, )] would be null sets w.r.t. d() (l), in which case we cannot conclude that { satisfies the

normal reflection condition.


Lemma 1.2 (Smooth Pasting Condition). In any MPE with stopping threshold ā > 0, each player's normalized payoff function is continuously differentiable at ā.

Proof. Given any MPE k = (k_1, ..., k_N), consider the region [0, ā) on which the intensity of exploration is positive. Clearly u_n'(ā−), the left derivative of the equilibrium payoff function of player n at ā, cannot be positive, because otherwise we would have u_n(a) < 1 for a immediately to the left of ā, contradicting the fact that k_n is a best response.

Suppose by contradiction that u_n violates the smooth pasting condition at ā, with u_n'(ā−) < 0. Consider a deviation strategy k̂_n in which k̂_n(a) = 1 on [ā − ε, ā + ε) for some small ε > 0 and k̂_n = k_n otherwise, and let û_n denote the associated payoff function of player n. Moreover, write w_n := (ln u_n)' = u_n'/u_n and ŵ_n := (ln û_n)' = û_n'/û_n. Notice that under this deviation the intensity of exploration k̂_n + ∑_{l≠n} k_l is bounded away from zero on [0, ā + ε), so that û_n is once continuously differentiable on this region, which implies that ŵ_n is continuous on [0, ā + ε). Also note that because the strategy profile remains unchanged on [0, ā − ε), the normal reflection condition w_n(0) = ŵ_n(0) = −1 and the Feynman-Kac equation (6) imply that w_n and ŵ_n coincide on [0, ā − ε).[8] From u_n'(ā−) < 0 we know that w_n(ā−) < 0 as well, and therefore by the continuity of ŵ_n we can choose ε small enough that ŵ_n < 0 on [ā − ε, ā + ε). This implies û_n > 1 = u_n on [ā, ā + ε), because û_n(a) = exp(∫_{ā+ε}^{a} ŵ_n(z) dz) for a < ā + ε. Therefore, the deviation strategy k̂_n yields a higher payoff than the equilibrium strategy k_n on [ā, ā + ε), which is a contradiction. □
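For readability, the step from ŵ_n < 0 to û_n > 1 in the proof above is just the log-derivative representation of a positive function, written here in the notation assumed by this reconstruction (it uses û_n(ā + ε) = 1, the value at the post-deviation stopping threshold):

  û_n(a) = û_n(ā + ε) · exp( ∫_{ā+ε}^{a} ŵ_n(z) dz ) = exp( ∫_{a}^{ā+ε} (−ŵ_n(z)) dz ) > 1    for a ∈ [ā, ā + ε),

since the integrand −ŵ_n is strictly positive on [ā − ε, ā + ε).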

2 Comparative Statics

Proof of Lemma 10. From u_i''(x) = G_i(u_i(x), u_i'(x)) > −u_i'(x) ≥ 0, we know that u_i' is negative on (0, ā_i). Therefore u_i^{-1}(u), the inverse of u_i at level u, is uniquely defined for u ∈ [1, u_i(0)].[9] To simplify notation, we denote the derivatives (one-sided if necessary) of u_i at level u ∈ [1, u_i(0)] by u_i'(u) and u_i''(u), where u_i' := u_i' ∘ u_i^{-1} and u_i'' := u_i'' ∘ u_i^{-1}.

We first show that u_1'(u) ≤ u_2'(u) for all u ∈ [1, min{u_1(0), u_2(0)}]. Suppose by contradiction that u_1'(u) > u_2'(u) for some 1 ≤ u ≤ min{u_1(0), u_2(0)}. Let ŭ := sup{ ũ ≤ u : u_1'(ũ) ≤ u_2'(ũ) }. Obviously 1 ≤ ŭ < u, since u_1'(1) = u_2'(1), and by the continuity of the u_i' we have u_1'(ŭ) = u_2'(ŭ) < 0.

[8] This claim is not affected by the possible discontinuities of the strategies on [0, ā), as the payoff functions are at least once continuously differentiable on this region.
[9] We define u_i^{-1}(1) := ā_i.


We then compare their derivatives on (ŭ, ŭ + ε) for some small enough ε > 0 and obtain

  d u_1'(u)/du = u_1''(u)/u_1'(u) = G_1(u, u_1'(u))/u_1'(u)
              ≤ G_1(u, u_2'(u))/u_2'(u)
              ≤ G_2(u, u_2'(u))/u_2'(u) = u_2''(u)/u_2'(u) = d u_2'(u)/du,

where the first inequality comes from the monotonicity of G_1(z_1, z_2)/z_2, and the second inequality comes from u_2'(u) < 0 and our assumption that G_1 ≥ G_2. This, together with u_1'(ŭ) = u_2'(ŭ), implies u_1'(u) ≤ u_2'(u) on (ŭ, ŭ + ε), contradicting the definition of ŭ.

Next we show that u_1(0) ≤ u_2(0). Suppose by contradiction that u_1(0) > u_2(0). Then the normal reflection conditions u_i(0) = −u_i'(0) and Condition 4 in the lemma imply the following strict inequality:

  u_1(0) = −u_1'(0) = −u_1'(u_1(0)) = −u_1'(u_2(0)) − ∫_{u_2(0)}^{u_1(0)} u_1''(u)/u_1'(u) du
         ≥ −u_2'(u_2(0)) − ∫_{u_2(0)}^{u_1(0)} G_1(u, u_1'(u))/u_1'(u) du
         > −u_2'(u_2(0)) + ∫_{u_2(0)}^{u_1(0)} 1 du
         = −u_2'(0) + u_1(0) − u_2(0) = u_1(0),

which is a contradiction.

Since u_1'(u) ≤ u_2'(u) ≤ 0 for all u ∈ [1, u_1(0)], with u_1(0) ≤ u_2(0), it is obvious that ā_1 ≤ ā_2 and u_1 ≤ u_2.

To prove the strict inequalities, suppose that in addition to G_1 ≥ G_2 we also have G_1 = G_2 on (û, u_2(0)) × ℝ_− for some û ∈ (1, u_2(0)]. Moreover, assume that for all z_2 ≤ 0 we have either G_1(û−, z_2) > G_2(û−, z_2), or ∂G_1(û−, z_2)/∂z_1 < ∂G_2(û−, z_2)/∂z_1 and ∂G_1(û−, z_2)/∂z_2 ≥ ∂G_2(û−, z_2)/∂z_2. Note that either of these conditions implies G_1(z_1, z_2) > G_2(z_1, z_2) for z_1 immediately below û, which together with u_1'(û) ≤ u_2'(û) implies that u_1' < u_2' < 0 on (û − η, û) for some η > 0. We next show that u_1(0) < u_2(0), which together with the weak inequalities proved above yields the strict inequalities in the lemma.

Suppose by contradiction that u_1(0) = u_2(0). Since G_1 = G_2 on (û, u_2(0)) × ℝ_−, we must have u_1'(û) = u_2'(û) =: z_2 < 0. We can then compare the left derivatives of the u_i' at û to obtain

  d u_1'(û−)/du = u_1''(û−)/u_1'(û) = G_1(û−, z_2)/z_2 ≤ G_2(û−, z_2)/z_2 = u_2''(û−)/u_2'(û) = d u_2'(û−)/du.

If G_1(û−, z_2) > G_2(û−, z_2), then this inequality is strict, which, together with u_1'(û) = u_2'(û), contradicts the fact that u_1' < u_2' on (û − η, û).


Otherwise, it must be that G_1(û−, z_2) = G_2(û−, z_2), and hence d u_1'(û−)/du = d u_2'(û−)/du < 0. Then from their second-order derivatives

  d² u_i'(u)/du² = (1/u_i'(u)) · dG_i(u, u_i'(u))/du − (G_i(u, u_i'(u))/(u_i'(u))²) · du_i'(u)/du
                 = (1/u_i'(u)) · ( ∂G_i(u, u_i'(u))/∂z_1 + ∂G_i(u, u_i'(u))/∂z_2 · du_i'(u)/du )
                   − (G_i(u, u_i'(u))/(u_i'(u))²) · du_i'(u)/du,

we can conclude from ∂G_1(û−, z_2)/∂z_1 < ∂G_2(û−, z_2)/∂z_1 and ∂G_1(û−, z_2)/∂z_2 ≥ ∂G_2(û−, z_2)/∂z_2 that d²u_1'(û−)/du² > d²u_2'(û−)/du². This, together with u_1'(û) = u_2'(û) and du_1'(û−)/du = du_2'(û−)/du, contradicts the fact that u_1' < u_2' on (û − η, û). □
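For completeness, the derivative-in-level notation used in the displays above is just the chain rule applied to u_i' ∘ u_i^{-1}, which is well defined because u_i is strictly decreasing on (0, ā_i); in the notation of this reconstruction,

  d u_i'(u)/du = d( u_i' ∘ u_i^{-1} )(u)/du = u_i''(u_i^{-1}(u)) · 1/u_i'(u_i^{-1}(u)) = u_i''(u)/u_i'(u).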

2.1 Proof of Corollary 5 (First Part in Detail)

Let ρ̄ = (θ − ln(1 + θ))/θ² and N̄ = 1/(ρ(1 + θ)) for θ > −1. To show the limit results for ρ > ρ̄, we first show in Lemma 2.1 below that |a*_N − a†_N| → Δ for some Δ < +∞ as N → N̄. Because we know from Corollary 1 that a*_N → +∞, we conclude that a†_N → +∞, and hence ā_N → +∞ as well. The convergence of q_N(ā_N) then follows from Lemma 3.

After showing that lim_{N→N̄} ‖U*_N/U†_N‖_∞ < +∞ in Lemma 2.2, we can then conclude that U†_N → +∞ from the fact that U*_N → +∞ (Corollary 1).

Lemma 2.1. If ρ > ρ̄, then |a*_N − a†_N| → Δ ∈ (0, +∞) as N → N̄.

Proof. Note that when ρ > ρ̄, the expression of ā for the non-binding case in Corollary 6 cannot be applied. Therefore, the symmetric MPE must belong to the binding case for all N ∈ (1, N̄). Moreover, for ρ > ρ̄ we can verify that ι_N in Corollary 6 converges to some ι_N̄ ∈ (0, 1). Then from explicit calculations we know that γ_2 → 1 + θ > 0 and γ_1 → −1 as N → N̄, and therefore we have

  Δ := lim_{N→N̄} |a*_N − a†_N|
     = lim_{N→N̄} (1/(γ_2 − γ_1)) ln( (1 + ι_N/γ_2) / (1 + ι_N/γ_1) )
     = (1/(2 + θ)) ln( (1 + ι_N̄/(1 + θ)) / (1 − ι_N̄) ) ∈ (0, +∞). □


Lemma 2.2. If ρ > ρ̄, then lim_{N→N̄} ‖U*_N/U†_N‖_∞ < +∞.

Proof. We first show that ‖U*/U†‖_∞ = U*(0)/U†(0) by showing that U*(a)/U†(a) is nonincreasing in a.

Write w* := (ln U*)' and w† := (ln U†)'. It is not difficult to verify that

  w*' = 1/(Nρ) + θ w* − (w*)²    on (0, a*),

and

  w†' = 1/(Nρ)  + θ w† − (w†)²   on (0, a†),
  w†' = 1/(U†ρ) + θ w† − (w†)²   on (a†, ā).

Because U† < N on (a†, ā), we have w†' ≥ w*'. Then from the normal reflection condition w*(0) = w†(0) = −1 we obtain w* ≤ w†. Therefore w*(a) − w†(a) = (ln(U*(a)/U†(a)))' ≤ 0, which implies that ln(U*(a)/U†(a)) is nonincreasing in a, and hence so is U*(a)/U†(a). Therefore, ‖U*/U†‖_∞ = U*(0)/U†(0).

Moreover, because w*(0) = w†(0) = −1 and w*' = w†' on (0, a†_N), we have w* = w† on (0, a†_N), and hence U*/U† is constant on (0, a†_N). Therefore, we can write

  U*_N(0)/U†_N(0) = U*_N(a†_N)/U†_N(a†_N)
                  = (1/N) · (1/(γ_2 − γ_1)) · ( γ_2 e^{−γ_1 (a*−a†)} − γ_1 e^{−γ_2 (a*−a†)} ).

Since we know that γ_2 → 1 + θ > 0 and γ_1 → −1, we have

  ‖U*_N/U†_N‖_∞ = U*_N(0)/U†_N(0) → (1/(N̄(2 + θ))) ( (1 + θ) e^{Δ} + e^{−(1+θ)Δ} ) < +∞. □
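The comparison step above, namely that a larger constant term in the Riccati equation pushes the log-derivative up, is easy to check numerically. The Python sketch below is purely illustrative and is not part of the original argument: it replaces the state-dependent coefficient 1/(U†ρ) by a constant c2 > c1 = 1/(Nρ), which is the only feature of that coefficient the proof uses (U† < N), and all parameter values are hypothetical.

import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical parameter values, chosen only for illustration.
theta, rho, N = 0.2, 0.2, 2.0
c1 = 1.0 / (N * rho)      # constant term of the ODE for w* on (0, a*)
c2 = 1.0 / (1.5 * rho)    # stand-in for 1/(U^dagger * rho); assumes U^dagger = 1.5 < N

def riccati(c):
    # w' = c + theta*w - w^2 with the normal reflection condition w(0) = -1.
    return lambda a, w: c + theta * w - w ** 2

a_grid = np.linspace(0.0, 1.0, 201)
w_star = solve_ivp(riccati(c1), (0.0, 1.0), [-1.0], t_eval=a_grid, rtol=1e-8).y[0]
w_dag = solve_ivp(riccati(c2), (0.0, 1.0), [-1.0], t_eval=a_grid, rtol=1e-8).y[0]

# With the same initial value and a pointwise larger right-hand side,
# the second solution dominates the first on the grid: w_dag >= w_star.
print("min(w_dag - w_star) =", float(np.min(w_dag - w_star)))

The printed minimum should be (numerically) zero, attained at a = 0, where both solutions start from the common reflection value −1.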

3 Lemmas in the Construction of Asymmetric MPE

Proof of Lemma 12. For any a_L ≤ a_1 ≤ a_2 ≤ a_R, consider the function ũ(·|a_1, a_2) of class C¹((a_L, a_R)) that solves the explorer ODE u = ρ(u'' − θu') on [a_L, a_1] ∪ [a_2, a_R] and the free-rider ODE u = 1 + ρ(u'' − θu') on [a_1, a_2], with initial conditions ũ(a_R) = u(a_R) and ũ'(a_R) = u'(a_R). The existence and uniqueness of ũ(·|a_1, a_2) is guaranteed by standard results. We write ũ^0_L(a_1, a_2) := ũ(a_L|a_1, a_2) and ũ^1_L(a_1, a_2) := ũ'(a_L|a_1, a_2) for the value and the derivative of ũ(·|a_1, a_2) evaluated at a_L.

Note that the functions ũ^0_L, ũ^1_L : { (a_1, a_2) : a_L ≤ a_1 ≤ a_2 ≤ a_R } → ℝ_+ are continuous (see, e.g., Theorem 2.14 in Barbu (2016)). Moreover, from Lemma 3.1 below we know that they have the following properties:


ũ^0_L(a, a) > u_L and ũ^1_L(a, a) < u'(a_L) < 0 for all a ∈ [a_L, a_R]; ũ^0_L(a_L, a_R) < u_L and 0 ≥ ũ^1_L(a_L, a_R) > u'(a_L); and ũ^0_L(a_1, a_2) is strictly increasing in a_1 and strictly decreasing in a_2.

By construction, ũ(·|a_1, a_2) and u match value and derivative at a_R. Next we choose a_1 and a_2 so that they match value and derivative at a_L as well.

Because ũ^0_L(a_L, a_L) > u_L and ũ^0_L(a_L, a_R) < u_L, there exists a unique â_1 ∈ (a_L, a_R) such that ũ^0_L(â_1, a_R) = u_L, and ũ^0_L(a_1, a_R) < u_L for all a_1 ∈ [a_L, â_1). Therefore, for each a_1 ∈ [a_L, â_1], because ũ^0_L(a_1, a_1) > u_L, there exists a unique a_2(a_1) ∈ (a_1, a_R] such that ũ^0_L(a_1, a_2(a_1)) = u_L, with a_2(â_1) = a_R by the definition of â_1. It is obvious that the function a_2 is continuous on its domain [a_L, â_1].

In other words, ũ(·|â_1, a_R) solves the free-rider ODE on (â_1, a_R) and the explorer ODE on (a_L, â_1), while ũ(·|a_L, a_2(a_L)) solves the explorer ODE on (a_2(a_L), a_R) and the free-rider ODE on (a_L, a_2(a_L)), with both ũ(·|â_1, a_R) and ũ(·|a_L, a_2(a_L)) having the same value as u at a_L and a_R and the same derivative as u at a_R.

From Lemma 3.1 below we can conclude that ũ^1_L(â_1, a_R) < u'(a_L) < ũ^1_L(a_L, a_2(a_L)).[10] Therefore, by the continuity of ũ^1_L and a_2, there exists an a_1 ∈ (a_L, â_1) such that ũ^1_L(a_1, a_2(a_1)) = u'(a_L). Because a_1 < â_1, we have ũ^0_L(a_1, a_R) < u_L, which implies a_2(a_1) ≠ a_R, and thus it must be that a_2(a_1) < a_R.

Finally, let a_{M−} := a_1, a_{M+} := a_2(a_1), and ũ(·) := ũ(·|a_{M−}, a_{M+}). We have shown that ũ(a_L) = u_L and ũ'(a_L) = u'(a_L), and by construction a_L < a_{M−} < a_{M+} < a_R. The image of ũ follows from the monotonicity of ũ, which is implied by Lemma 3.1 together with ũ'(a_R) = u'(a_R) ≤ 0. □
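The construction above is effectively a two-parameter shooting problem: integrate backwards from a_R, switching between the explorer ODE and the free-rider ODE at a_2 and a_1, and then tune (a_1, a_2) so that the value and the derivative at a_L match those of u. The Python sketch below only evaluates the maps (a_1, a_2) ↦ ũ^0_L(a_1, a_2) and ũ^1_L(a_1, a_2) on a coarse grid; the terminal data and parameter values are hypothetical and are not taken from the paper.

import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical parameters and terminal conditions, for illustration only.
rho, theta = 0.5, 0.2
aL, aR = 0.0, 2.0
uR, duR = 1.4, -0.5   # assumed value and slope of the target function u at a_R

def rhs(f):
    # u = f + rho*(u'' - theta*u') rewritten as a first-order system:
    # u'' = (u - f)/rho + theta*u'.  f = 0: explorer ODE, f = 1: free-rider ODE.
    return lambda a, y: [y[1], (y[0] - f) / rho + theta * y[1]]

def u_tilde_at_aL(a1, a2):
    """Backward integration from a_R: explorer on [a2, aR], free-rider on [a1, a2],
    explorer on [aL, a1].  Returns the value and slope of u~(.|a1, a2) at a_L."""
    y = [uR, duR]
    for lo, hi, f in [(a2, aR, 0.0), (a1, a2, 1.0), (aL, a1, 0.0)]:
        if hi > lo:
            sol = solve_ivp(rhs(f), (hi, lo), y, rtol=1e-9, atol=1e-12)
            y = [sol.y[0, -1], sol.y[1, -1]]
    return y[0], y[1]

# A coarse scan of the two maps that the intermediate-value argument relies on.
for a1 in np.linspace(aL, aR, 5):
    for a2 in np.linspace(a1, aR, 5):
        v0, v1 = u_tilde_at_aL(a1, a2)
        print(f"a1 = {a1:.2f}, a2 = {a2:.2f}:  u0_L = {v0:8.4f},  u1_L = {v1:8.4f}")

In the proof, the monotonicity and boundary properties from Lemma 3.1 guarantee that suitable switch points exist; the scan above merely illustrates how the two maps vary with (a_1, a_2).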

Lemma 3.1. Let u(·|f, u_0, u_1) be the solution of the ODE u = f + ρ(u'' − θu') with initial conditions u(a_R) = u_0 and u'(a_R) = u_1, for some ρ > 0 and f, θ ∈ ℝ. For any a_L < a_R, u(a_L|f, u_0, u_1) is strictly increasing in u_0 and strictly decreasing in f and u_1, whereas u'(a_L|f, u_0, u_1) is strictly decreasing in u_0 and strictly increasing in f and u_1. Moreover, if u_0 ≥ f and u_1 ≤ 0, then u'(a_L|f, u_0, u_1) ≤ 0, with u'(a_L|f, u_0, u_1) = 0 if and only if u_0 = f and u_1 = 0.

Proof. Let η denote either f, −u_0, or u_1.


For a_L < a_R, to show that u'(a_L|η) is strictly increasing in η, let η_1 < η_2. For the purpose of contradiction, suppose that u'(a_L|η_1) ≥ u'(a_L|η_2). Let ã := sup{ a < a_R : u'(a|η_1) = u'(a|η_2) }.

For η being either f or −u_0, because η_1 < η_2 and u''(a_R|η) = (u_0 − f)/ρ + θu_1, we have u''(a_R|η_1) > u''(a_R|η_2). Therefore u'(a|η_1) < u'(a|η_2) for all a ∈ (a_R − ε, a_R) for some ε > 0, which obviously also holds when η denotes u_1, by the continuity of u'(·|η). Therefore a_L ≤ ã < a_R, and from the continuity of u'(·|η) we have u'(ã|η_1) = u'(ã|η_2).

Because u'(a|η_1) < u'(a|η_2) for all a ∈ (ã, a_R), we have

  u(ã|η_1) = u(a_R|η_1) − ∫_{ã}^{a_R} u'(a|η_1) da > u(a_R|η_2) − ∫_{ã}^{a_R} u'(a|η_2) da = u(ã|η_2),

which implies

  u''(ã|η_1) = (u(ã|η_1) − f|_{η_1})/ρ + θ u'(ã|η_1) > (u(ã|η_2) − f|_{η_2})/ρ + θ u'(ã|η_2) = u''(ã|η_2).

But this is a contradiction, as u'(ã|η_1) = u'(ã|η_2), together with u'(a|η_1) < u'(a|η_2) for all a ∈ (ã, a_R), implies u''(ã|η_1) ≤ u''(ã|η_2).

As a_L was chosen arbitrarily, we have shown that u'(a|η) is strictly increasing in η for any a < a_R. Therefore, since u(a_L|η) = u(a_R|η) − ∫_{a_L}^{a_R} u'(a|η) da, the value u(a_L|η) is strictly decreasing in η.

Finally, if u_0 = f and u_1 = 0, then u(a|f, u_0, u_1) ≡ f is the solution of the ODE and hence u'(a_L|f, u_0, u_1) = 0 for any a_L ≤ a_R. Otherwise, if either u_0 > f or u_1 < 0, the strict monotonicity of u'(a_L|f, u_0, u_1) in u_0 and u_1 gives u'(a_L|f, u_0, u_1) < 0. □

[10] Write ũ(·) := ũ(·|â_1, a_R). First we show that ũ^1_L(â_1, a_R) = ũ'(a_L) ≤ u'(a_L). Since ũ solves the free-rider ODE on (â_1, a_R) with the same initial conditions at a_R as u, from Lemma 3.1 we know that ũ(â_1) < u(â_1). Then we must have ũ(a) < u(a) for all a ∈ (a_L, â_1), for the following reason. Suppose by contradiction that this is not the case; then there exists an ã ∈ (a_L, â_1) such that ũ(ã) = u(ã) and ũ'(ã) ≤ u'(ã) ≤ 0. Now since ũ solves the explorer ODE on (a_L, ã), Lemma 3.1 implies that ũ(a_L) > u(a_L), which contradicts the fact that ũ(a_L) = u(a_L), guaranteed by our choice of â_1. Therefore ũ(a) < u(a) for all a ∈ (a_L, â_1), which together with ũ(a_L) = u(a_L) implies ũ'(a_L) ≤ u'(a_L). Next we show that this inequality is strict. Suppose by contradiction that ũ'(a_L) = u'(a_L). Then ũ and u have the same values and derivatives at a_L. From the ODEs that ũ and u satisfy on (a_L, â_1), we can conclude that ũ''(a_L) > u''(a_L). This contradicts the fact that ũ(a) < u(a) on (a_L, â_1), and therefore we must have ũ'(a_L) < u'(a_L). The other inequality can be derived analogously.

4 Math Appendix

Lemma 4.1. Let M_T := max_{0≤t≤T} {μt + σB_t} be the running maximum of a Brownian motion with drift, where {B_t}_{t≥0} is a standard Brownian motion starting from 0. Then there exist C_1, C_2 ≥ 0, which do not depend on T, such that

  E[e^{M_T}] ≤ C_1 e^{(μ+σ²/2)T} + C_2.

Moreover, as T → ∞,

  E[e^{M_T}] → μ/(μ + σ²/2)    if μ + σ²/2 < 0.

Proof. From Corollary 9.1 in Choe (2016), or Shreve (2004) equation (7.27), we can calculate E[e^{M_T}] explicitly and obtain

  E[e^{M_T}] = (μ + σ²/2)^{-1} [ (μ + σ²) e^{(μ+σ²/2)T} ( 1 − N( −((μ + σ²)/σ)√T ) ) + μ N( −(μ/σ)√T ) ]
            ≤ C_1 e^{(μ+σ²/2)T} + C_2
            ≤ |C_1| e^{(μ+σ²/2)T} + |C_2|,

where N(·) is the CDF of the standard normal distribution. The convergence result follows from a straightforward calculation. □
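As a quick illustrative check of the limit in Lemma 4.1 (not part of the proof): simulate the drifted Brownian motion on a fine grid, take the running maximum, and compare the Monte Carlo average of e^{M_T} with μ/(μ + σ²/2). The parameter values in the Python sketch below are hypothetical, and the discretisation slightly understates the true running maximum, so the estimate should fall a little below the closed-form limit.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters with mu + sigma^2/2 < 0, so the limit in Lemma 4.1 applies.
mu, sigma = -1.0, 1.0
T, dt = 30.0, 0.01
n_steps = int(T / dt)
n_paths, chunk = 20_000, 2_000

total = 0.0
for _ in range(n_paths // chunk):
    increments = mu * dt + sigma * np.sqrt(dt) * rng.standard_normal((chunk, n_steps))
    paths = np.cumsum(increments, axis=1)
    # The path starts at 0, so the running maximum over [0, T] is at least 0.
    M_T = np.maximum(paths.max(axis=1), 0.0)
    total += np.exp(M_T).sum()

estimate = total / n_paths
limit = mu / (mu + sigma ** 2 / 2)   # equals 2.0 for these parameter values
print(f"Monte Carlo E[exp(M_T)] ~ {estimate:.3f};  closed-form limit as T -> infinity: {limit:.3f}")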

References

Barbu, Viorel. "Existence and Uniqueness for the Cauchy Problem". In: Differential Equations. Cham: Springer International Publishing, 2016, pp. 29–77.

Choe, Geon Ho. "The Reflection Principle of Brownian Motion". In: Stochastic Analysis for Finance with Simulations. Cham: Springer International Publishing, 2016, pp. 147–156.

Chung, K. L. and R. J. Williams. "Generalized Ito Formula, Change of Time and Measure". In: Introduction to Stochastic Integration. New York, NY: Springer New York, 2014, pp. 183–215.

Rogers, L. C. G. and David Williams. "Introduction to Itô Calculus". In: Diffusions, Markov Processes and Martingales. 2nd ed. Vol. 2. Cambridge Mathematical Library. Cambridge University Press, 2000, pp. 1–109.

Shreve, Steven E. Stochastic Calculus for Finance II. Springer New York, 2004.

Strulovici, Bruno and Martin Szydlowski. "On the smoothness of value functions and the existence of optimal strategies in diffusion models". In: Journal of Economic Theory 159 (2015). Symposium Issue on Dynamic Contracts and Mechanism Design, pp. 1016–1055.
