
Games and Economic Behavior 29, 191–223 (1999), Article ID game.1998.0674, available online at http://www.idealibrary.com

Strategic Entropy and Complexity in Repeated Games

Abraham Neyman*

Institute of Mathematics, The Hebrew University, 91904 Jerusalem, Israel, and SUNY at Stony Brook, Stony Brook, New York 11794-4384

and

Daijiro Okada†

Center for Rationality, The Hebrew University, 91904 Jerusalem, Israel

Received April 16, 1997

* E-mail: [email protected].
† E-mail: [email protected].
0899-8256/99 $30.00. Copyright © 1999 by Academic Press. All rights of reproduction in any form reserved.

We introduce the entropy-based measure of uncertainty for mixed strategies of repeated games, strategic entropy. We investigate the asymptotic behavior of the maxmin values of repeated two-person zero-sum games with a bound on the strategic entropy of player 1's strategies while player 2 is unrestricted, as the bound grows to infinity. We apply the results thus obtained to study the asymptotic behavior of the value of the repeated games with finite automata and bounded recall. Journal of Economic Literature Classification Numbers: C73, C72. © 1999 Academic Press

1. INTRODUCTION

Strategic complexity and bounded rationality in the context of repeated games have been extensively studied. Among the numerous directions of inquiry in this area we are interested in how equilibrium outcomes are affected when we impose an exogenous restriction on the sets of strategies available to the players by means of finite automata and bounded recall. For other complexity and rationality issues and approaches to them, the reader is referred to a survey by Kalai (1990).

Finite automata and bounded recall are possible alternatives to describe strategies in a repeated game which have "finite memories." As suggested in Aumann (1981), such formulations put bounds on the complexity of strategies and enable an analysis in the framework of finite games. A memory may be interpreted as a "state of the player's mind." Restricting strategies to those with a finite memory thus amounts to postulating limitations on the amount of information a player can process in deciding what to do in the course of a play. An example of such a constraint is the inability to count the number of stages played. Alternatively, one can think of a player using an external device, such as a computer, to carry out his plans, and this machine has some hardware constraints.

In the case of an automaton, memories are represented by its states. To each state is assigned an action to be taken whenever the automaton is at that state. Starting at a prespecified initial state, an automaton undergoes transitions of its states according to a rule which determines the next state as a function of the current state and the other players' actions at the last stage.¹ The more states an automaton has, the more complicated is the strategy (one depending on a greater variety of past actions) that it can carry out. Thus the number of states of an automaton, called the size of that automaton, serves to distinguish quantitatively between "simple" and "complicated" strategies. On the other hand, bounded recall limits how far back in the past a player can remember, i.e., the length of recall. Such a strategy can be represented by a function that assigns actions only to the sequences of past actions with a limited length. In the early stages of the game, however, the sequence of actual actions may be too short for this function to be "activated" to play a repeated game. In order to fulfill the required time span, a surrogate memory, formally termed an initial memory, is introduced.² Again, the longer the length of recall, the more complicated the patterns of actions a player can take.

Aumann (1981) also includes a study of the undiscounted infinitely repeated prisoner's dilemma with a special case of finite memory, due to Aumann, Cave, and Kurz: the action at each stage is allowed to depend only on the last action taken by the other player; i.e., the length of recall is 1. Such strategies are called "reactive" and can be implemented by automata with at most two states. They examine the payoff matrix of this restricted game (an 8 × 8 bimatrix) and show that the process of successive weak domination eliminates all but "Tit-for-Tat" strategies for both players. The outcome is cooperation at every stage. See also Kalai et al. (1988) for the discounted case.

¹ This type of automaton, which we will use in this paper, is sometimes called an exact automaton. In contrast, a full automaton takes into account all players' actions. Depending on the issues and questions addressed, each of the two types has its advantage. See Kalai (1990), Section 4.1, and also Neyman (1997), Section 7. The results in Kalai (1990) are stated in terms of full automata.

² Again, there are two alternatives for the domain of such a function: only other players' actions or all players' actions. Unlike our choice for automata, we will take the second alternative for bounded recall. In Section 7.2 we will show that bounded recall can be viewed as a special case of automata.

Another study of the repeated prisoner's dilemma with complexity bounds is found in Neyman (1985). He studies a finitely repeated undiscounted version in which the players are restricted to the strategies that are implementable by automata of bounded sizes. It is well known that, without any restriction on the set of strategies, every strategic equilibrium of this game leads to defections, or double-crossing, throughout the stages.³ Neyman (1985) states that if the bounds are so small that the players are unable to count up to the last stage, then cooperation can prevail at every stage in equilibrium. Furthermore, even if players are allowed to use automata of much larger sizes, e.g., any power of the number of repetitions, one can still achieve cooperation at most of the stages provided that the game is sufficiently long. More general results on the set of equilibrium payoffs of finitely repeated two-person games with bounded automata can be found in Neyman (1998). See also Papadimitriou and Yannakakis (1994) and Zemel (1989).

In Neyman (1998), questions and results are formalized as the asymptotics of equilibrium outcomes of the restricted repeated games. He considers a sequence of finitely repeated two-person games that are parameterized by the number of repetitions and bounds on the size of automata for each player. The main theorem of that paper specifies the condition on the order of magnitude of these parameters under which the set of equilibrium payoffs converges to the set of feasible and individually rational payoffs as the parameters tend to infinity.

This approach has been useful in that it provides a framework to study a set of seemingly different questions concerning repeated games with finite automata or bounded recall. For example, Ben-Porath (1993) studies undiscounted infinitely repeated two-person zero-sum games. His investigation centers around the question of how much of an advantage a bigger automaton has over a smaller one. Suppose that players 1 and 2 are restricted to strategies implementable by automata of size n and m(n) (> n), respectively. The results are stated in terms of the asymptotics of the value of the restricted game V_{n,m(n)} as n → ∞. Interestingly, a player with a larger bound has an advantage only if his bound is exponentially larger than the other player's bound. Otherwise, the player with a smaller bound can asymptotically secure the value of the one-shot game. Even being polynomially bigger than the other bound is not big enough. Formally,

$$\lim_{n\to\infty} \frac{\log m(n)}{n} = 0 \;\Longrightarrow\; \lim_{n\to\infty} V_{n,m(n)} = v,$$

where v is the value of the stage game. Lehrer (1988) has similar results for bounded recall.

³ Let us remark that the proof of this statement is more subtle than the simple backward induction argument which is sometimes erroneously given in the literature. One shows by induction that, at any equilibrium path, the actions in the last k stages of the game are (defect, defect) with probability 1.

Study of the zero-sum two-person case is important for that of the non-zero-sum case; it provides insights into individually rational levels and effective punishments, and thus the set of equilibrium payoffs. See Ben-Porath (1993, Sect. 4), Lehrer (1988, Sect. 5.5), and also Lehrer (1994). More recent results and open problems in this line of research can be found in Neyman (1997).

In this paper we study repeated two-person zero-sum games in which a player, say player 1, with a restricted set of strategies plays against an unrestricted player, player 2. Concerning the values of such games, two questions arise: "What is the number of repetitions needed for player 2 to take advantage of player 1's restriction?" and "How long can player 1 protect himself against an unrestricted player 2?" The questions are again formulated as the asymptotic behavior of the value of the repeated game: "What is the relationship between the number of repetitions and the complexity bound so that, as they tend to infinity, either (a) player 2 can hold player 1's payoff down to his one-shot game maxmin value in pure actions or (b) player 1 can still secure the value of the one-shot game?" Neyman (1997) has conjectures on these questions for repeated games with finite automata and bounded recall. The present work has stemmed from an attempt to resolve the conjecture corresponding to question (a) above. Precise statements of the conjectures will be given in Section 7.

Our approach is to deduce the results concerning the value of a repeated game with finite automata and bounded recall from a result concerning the "maxmin" value of a repeated game with a restriction directly on the set of mixed strategies.

Note that finite automata and bounded recall limit the set of pure strategies in a repeated game. However, any mixture on the restricted set of pure strategies is allowed. Geometrically, this amounts to saying that the set of strategies is restricted to a face of the mixed strategy simplex. Since a face of a simplex is a compact convex set, the value of the repeated game with finite automata or bounded recall exists by the minimax theorem.

In contrast, we impose a restriction directly on mixed strategies. To this end we employ an information-theoretic quantity called entropy. Entropy is a measure of uncertainty of random variables and is a functional of probability distributions. Every mixed strategy is a probability distribution over pure strategies and thus its entropy is well defined. To define strategic entropy, however, we will take into account the uncertainty of a mixed strategy relative to the other player's strategy. Specifically, we look at the probability distribution on the play induced by a pair of strategies and its entropy. The strategic entropy of a mixed strategy is then defined to be the maximum entropy of the play with respect to the other player's strategy. Thus it is the maximum uncertainty of the play that the other player faces against that mixed strategy.

We restrict player 1's strategies to those that have strategic entropy less than a prespecified bound. A mixed strategy with a small strategic entropy would be "not so uncertain" and hence "close" to a pure strategy, and vice versa. We argue that a strategy with relatively little strategic entropy (compared to the number of repetitions) eventually "reveals" its pure actions in the course of a play, and an unrestricted player may take advantage of it. To be more precise, our main theorem states that if the bound on strategic entropy is of a smaller order than the number of repetitions, then the maxmin value for player 1 in the restricted game is asymptotically his one-shot game maxmin level in pure actions.

The crucial fact in applying our result to finite automata and bounded recall is that any strategy implementable by a bounded automaton or bounded recall also has a bounded entropy. This implies that the subset of player 1's strategies with a large enough entropy bound contains a face spanned by bounded automata or bounded recall strategies. Therefore, the maxmin value for player 1 on this subset is at least the corresponding value, which is equal to the value, of the repeated game with finite automata or bounded recall. This enables us to deduce the conjectures mentioned above as corollaries of our theorem. For example, in Section 7, we will consider the finitely repeated game G^n_{m(n)} in which player 1's strategies are restricted to those implementable by automata with m(n) states, a function of the number of repetitions n, and then study the asymptotics of its value, V^n_{m(n)}. We will see that the number of the equivalence classes of such pure strategies is of the order m(n)^{O(m(n))} and any mixture of such strategies has entropy of the order O(m(n) log m(n)). Now a main result of Section 6 implies that for every ε > 0 there is γ > 0 such that for any mixed strategy of player 1 whose entropy is a sufficiently small fraction of n, i.e., at most γn, player 2 has a counterstrategy which holds player 1's average payoff within ε of U#, the maxmin payoff in pure actions of the stage game. This in turn implies that if m(n) log m(n)/n → 0 as n → ∞, then the maxmin payoff to player 1 in G^n_{m(n)} tends to U# as n → ∞. But since the mixed strategy sets in G^n_{m(n)} are compact and convex, its minimax value is indeed its value V^n_{m(n)}.

Let us make a precautionary remark: we are not suggesting that our entropy concept is a measure of strategic complexity in the way that the size of an automaton or the length of recall is intended to be. Our view in this paper is that entropy captures an abstract informational feature common to automata and bounded recall restrictions and thereby serves as a useful tool to analyze them.

A basic model of repeated games is described in the next section. In Section 3 we review some information-theoretic concepts such as entropy and conditional entropy. Section 4 is devoted to the definition of strategic entropy and its variants. We discuss the relation between strategic entropy and the ordinary entropy of a strategy in Section 5. Section 6 contains the main results on repeated games with bounded strategic entropy. Section 7 is devoted to an application of the theorem to finite automata and bounded recall which gives positive resolutions to the two conjectures of Neyman (1997). Section 8 concludes the paper.

2. PRELIMINARIES

Henceforth, N denotes the set of all positive integers, and R denotes the set of all real numbers. For any x ∈ R, [x] is the integer part of x. For any set Q, we denote the set of all probability distributions on it by Δ(Q). The n-fold Cartesian product of Q is denoted by Q^n. For any probability p (resp. a random variable X), the expectation operator with respect to p (resp. the distribution of X) is denoted by E_p (resp. E_X).

Let G = (A, B, r) be a two-person zero-sum game in strategic form, where A and B are finite sets of pure actions for players 1 and 2, respectively, and r : A × B → R is the payoff matrix of player 1, the maximizer. A mixed action is a probability distribution on the set of pure actions. Thus Δ(A) and Δ(B) are the sets of mixed actions of the two players.

We denote the maxmin value in pure actions and the value of G by U#(G) and Val(G), respectively. That is, U#(G) = max_{a∈A} min_{b∈B} r(a, b), and Val(G) = min_{β∈Δ(B)} max_{a∈A} E_β[r(a, b)]. By the minimax theorem we can also write Val(G) = max_{α∈Δ(A)} min_{b∈B} E_α[r(a, b)].
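As a concrete illustration (not part of the paper), the following sketch computes U#(G) and Val(G) for a small payoff matrix; the LP formulation via scipy.optimize.linprog and the example matrix are our own choices, not the authors'.

```python
import numpy as np
from scipy.optimize import linprog

def pure_maxmin(r):
    """U#(G) = max_a min_b r(a, b): player 1's maxmin level in pure actions."""
    return max(min(row) for row in r)

def value(r):
    """Val(G) via the standard LP: maximize v subject to sum_a x_a r(a, b) >= v
    for every column b, with x a probability vector over player 1's actions."""
    r = np.asarray(r, dtype=float)
    n_a, n_b = r.shape
    # Variables are (x_1, ..., x_{n_a}, v); linprog minimizes, so the objective is -v.
    c = np.zeros(n_a + 1); c[-1] = -1.0
    # Constraint v - sum_a x_a r(a, b) <= 0 for every column b.
    A_ub = np.hstack([-r.T, np.ones((n_b, 1))])
    b_ub = np.zeros(n_b)
    A_eq = np.append(np.ones(n_a), 0.0).reshape(1, -1)   # sum_a x_a = 1
    bounds = [(0, None)] * n_a + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[-1]

if __name__ == "__main__":
    matching_pennies = [[1, -1], [-1, 1]]
    print(pure_maxmin(matching_pennies))       # U#(G) = -1
    print(round(value(matching_pennies), 6))   # Val(G) = 0
```

The gap between U#(G) and Val(G) in this example is exactly what the entropy-bounded games of Section 6 interpolate between.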

Given a game G = (A, B, r), we next describe a new game in which G is played repeatedly. Each of the repetitions is referred to as a stage and G as the stage game. At each stage, each player independently takes a stage game action. The pair of actions is announced to both players and the transfer of the corresponding payoff from one player to the other takes place. The information structure of the repeated game is that of perfect recall. That is, at any stage each player remembers all information he received (the sequence of actions taken by both players) up to that stage.


This description is common knowledge. The formal description follows. A play of a repeated game is an infinite sequence ω = (ω_k)_{k=1}^∞, where ω_k = (a_k, b_k) ∈ A × B. We denote the set of all plays by Ω_∞, i.e., Ω_∞ = (A × B)^∞. This will be our basic space throughout the paper.

Two plays ω = (ω_k)_{k=1}^∞ and ω′ = (ω′_k)_{k=1}^∞ are said to be n-equivalent if ω_k = ω′_k for k = 1, . . . , n. The n-equivalence is clearly an equivalence relation on Ω_∞. Denote by ℋ_n the finite partition of Ω_∞ into the n-equivalence classes. Each n-equivalence class of plays is called an n-history and it represents the information available to the players at the end of stage n. We sometimes represent an n-history by exhibiting the first n coordinates, e.g., ω_1, . . . , ω_n. We denote by 𝒜_n the algebra on Ω_∞ generated by ℋ_n. Set 𝒜_0 = {∅, Ω_∞}. Clearly, 𝒜_{n−1} ⊂ 𝒜_n for every n ∈ N. The σ-algebra generated by ∪_{n≥0} 𝒜_n is denoted by 𝒜_∞.

Denote by S_n and T_n the sets of measurable mappings from (Ω_∞, 𝒜_{n−1}) to A and B, respectively. Each element of S_n and T_n represents a "strategy at stage n." For each s_n ∈ S_n, since s_n(ω) depends only on the first n − 1 coordinates of ω = (ω_k)_{k=1}^∞, we sometimes write s_n(ω_1, . . . , ω_{n−1}). Similarly for t_n ∈ T_n. A pure strategy of player 1 (resp. 2) is a sequence s = (s_n)_{n=1}^∞ with s_n ∈ S_n (resp. t = (t_n)_{n=1}^∞ with t_n ∈ T_n). Thus the sets of pure strategies of the two players are S = ×_{n≥1} S_n and T = ×_{n≥1} T_n. We consider S and T to be endowed with the product topologies with the discrete topology on each factor. Then they are compact metrizable spaces. Denote by 𝒮 and 𝒯 the Borel σ-algebras of S and T, respectively. A mixed strategy of player 1 (resp. 2) is then a probability on (S, 𝒮) (resp. (T, 𝒯)). A behavioral strategy of player 1 (resp. 2) is a sequence σ = (σ_n)_{n=1}^∞ (resp. τ = (τ_n)_{n=1}^∞) where σ_n (resp. τ_n) is a measurable mapping from (Ω_∞, 𝒜_{n−1}) to Δ(A) (resp. Δ(B)). By a slight abuse of notation we denote by Δ(S) and Δ(T) the sets of all mixed and behavioral strategies of players 1 and 2, respectively. A pure strategy is considered to be a special (degenerate) case of mixed and behavioral strategy.

Every pair of pure strategies (s, t) induces a play ω = (ω_k)_{k=1}^∞ ∈ Ω_∞, where ω_k is defined inductively as follows:

$$\omega_k = (a_k, b_k) = \begin{cases} (s_1, t_1) & \text{for } k = 1,\\ \bigl(s_k(\omega_1, \ldots, \omega_{k-1}),\, t_k(\omega_1, \ldots, \omega_{k-1})\bigr) & \text{for } k > 1.\end{cases}$$

Note that s_1 and t_1, being 𝒜_0-measurable, are constant everywhere. Accordingly, every pair (σ, τ) ∈ Δ(S) × Δ(T) induces a probability P_{σ,τ} on (Ω_∞, 𝒜_∞). Equivalently, (σ, τ) induces a sequence of random actions, or a random play, (X_k)_{k=1}^∞, where X_k = (a_k, b_k) is an (A × B)-valued random variable. The expectation operator with respect to P_{σ,τ} will be denoted by E_{σ,τ}.


For each n ∈ N we define the n-average payoff function r_n : S × T → R by r_n(s, t) = (1/n) Σ_{k=1}^n r(a_k, b_k). Also, for each λ ∈ [0, 1) we define the λ-discounted payoff function r_λ : S × T → R by r_λ(s, t) = (1 − λ) Σ_{k=1}^∞ λ^{k−1} r(a_k, b_k). The bilinear extensions of r_n and r_λ to Δ(S) × Δ(T) are denoted by the same symbols, i.e., r_n(σ, τ) = E_{σ,τ}[(1/n) Σ_{k=1}^n r(a_k, b_k)] and r_λ(σ, τ) = E_{σ,τ}[(1 − λ) Σ_{k=1}^∞ λ^{k−1} r(a_k, b_k)]. Both r_n and r_λ are continuous on S × T. The continuity extends to Δ(S) × Δ(T) with respect to the product topology with the weak* topology on each factor.
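The following minimal sketch (ours, not the authors') shows how a pure-strategy pair inductively generates a play and how the n-average and λ-discounted payoffs are evaluated on it; the payoff matrix, the tit-for-tat and always-defect strategies, and the horizon are illustrative assumptions.

```python
def induce_play(s, t, r, n):
    """Inductively build omega_1, ..., omega_n from pure strategies s and t.
    A pure strategy is represented as a function of the history, matching s_k(omega_1,...,omega_{k-1})."""
    history, payoffs = [], []
    for _ in range(n):
        a = s(tuple(history))
        b = t(tuple(history))
        history.append((a, b))
        payoffs.append(r[a][b])
    return history, payoffs

def n_average(payoffs):
    return sum(payoffs) / len(payoffs)

def discounted(payoffs, lam):
    """(1 - lambda) * sum_k lambda^(k-1) r(a_k, b_k), truncated at the given horizon."""
    return (1 - lam) * sum(lam ** k * x for k, x in enumerate(payoffs))

if __name__ == "__main__":
    # Illustrative prisoner's-dilemma-like payoff matrix for player 1 (0 = cooperate, 1 = defect).
    r = [[3, 0], [4, 1]]
    tit_for_tat = lambda h: h[-1][0] if h else 0   # player 2 mimics player 1's last action
    always_defect = lambda h: 1
    play, pay = induce_play(always_defect, tit_for_tat, r, 10)
    print(n_average(pay), round(discounted(pay, 0.9), 3))
```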

In this paper we study three classes of repeated games differentiated by their payoff functions:

• Finitely Repeated Game G^n, with n-average payoff r_n;
• λ-Discounted Game G_λ, with λ-discounted payoff r_λ;
• Undiscounted Game G^∞, where the payoff from (s, t) is evaluated by the Cesàro limit of the induced sequence of stage payoffs, i.e., lim_{n→∞} r_n(s, t).

For the undiscounted game G^∞, the above Cesàro limit does not necessarily exist. We will supplement this loose end by being explicit in the description of the solution concepts in Section 6.3.

For each n ∈ N, two pure strategies are said to be n-equivalent if, against any pure strategy of the other player, they induce n-equivalent plays. If two pure strategies induce the same play against any strategy of the other player, they are said to be equivalent. Extending this notion to mixed and behavioral strategies, we say that two strategies of a player are n-equivalent if, against any strategy of the other player, they induce the same probability on (Ω_∞, 𝒜_n). The equivalence of two strategies is similarly defined by replacing 𝒜_n by 𝒜_∞ above. Perfect recall implies that every mixed strategy has an equivalent behavioral strategy and vice versa (Kuhn's theorem).

3. REVIEW OF INFORMATION THEORETIC CONCEPTS

Throughout this section we deal with finite random variables and distributions. For a more general, measure-theoretic treatment of entropy, see Smorodinsky (1971).

Let Q be a finite set. Let X be a random variable which takes values in Q and whose distribution is p ∈ Δ(Q), i.e., p(θ) = Prob(X = θ) for each θ ∈ Q. Note that the distribution p can be considered as a real random variable on Q, so that the expectation (with respect to p) of a measurable transformation of p itself is well defined.


DEFINITION 3.1. The entropy H(X) of X is defined by

$$H(X) = -\sum_{\theta \in Q} p(\theta) \log p(\theta) = -E_p[\log p],$$

where 0 log 0 is defined to be 0.

To fix the unit, we take the logarithm to the base 2, and we say that entropy is measured in bits. For example, the entropy of tossing a fair coin once is 1 bit, twice is 2 bits, and so on.

It is obvious from the definition that the entropy of a random variable depends only on its distribution and not on the particular values it takes.⁴ Thus we also write H(p) for the quantity in the above definition and regard H as a function on Δ(Q).

Entropy possesses a number of desirable properties as a measure of uncertainty of a random variable or a probability distribution. If one interprets the amount of uncertainty removed as the amount of information gained, then the entropy also measures the amount of information contained in a random variable or a probability distribution.⁵

EXAMPLE 3.1. Let Q = {0, 1}. Then for each p = (p_0, 1 − p_0) in Δ(Q) (or any random variable X with this distribution),

$$H(p) = H(X) = -p_0 \log p_0 - (1 - p_0) \log (1 - p_0).$$

Figure 1 shows the graph of H(p), which is parameterized by p_0. Note that if p_0 = 0 or 1, then there is no uncertainty involved, and the entropy is indeed 0. On the other hand, if p_0 = 1/2, there is no greater likelihood of either of the two outcomes, and this corresponds to the highest value of the entropy, log 2 = 1. Also observe that H(p) is strictly concave and continuous.
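A short numerical check of these properties of the binary entropy function (our illustration, not the paper's):

```python
import math

def binary_entropy(p0):
    """H(p) = -p0 log2 p0 - (1 - p0) log2 (1 - p0), with 0 log 0 := 0."""
    return -sum(q * math.log2(q) for q in (p0, 1.0 - p0) if q > 0)

# Degenerate distributions carry no uncertainty; the uniform one is most uncertain.
assert binary_entropy(0.0) == 0.0 == binary_entropy(1.0)
print(binary_entropy(0.5))   # 1.0 bit, the entropy of a fair coin toss
# A quick check of strict concavity at an interior point:
assert binary_entropy(0.5 * 0.1 + 0.5 * 0.9) > 0.5 * binary_entropy(0.1) + 0.5 * binary_entropy(0.9)
```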

It is easy to verify that the properties of entropy illustrated in this example hold in general. See Cover and Thomas (1991, Chap. 2).

PROPOSITION 3.1. Let X be a random variable with a finite range Q. Then

(1) 0 ≤ H(X) ≤ log |Q|;

(2) H(X) = 0 if, and only if, Prob(X = θ) = 1 for some θ ∈ Q; H(X) = log |Q| if, and only if, Prob(X = θ) = 1/|Q| for every θ ∈ Q;

(3) H is continuous and strictly concave as a function on Δ(Q), where Δ(Q) is considered to be the unit simplex in the Euclidean space whose coordinates are indexed by the elements of Q.

⁴ Furthermore, it depends only on the unordered profile of probabilities; for example, if two random variables taking values in a set of three elements have the distributions (1/2, 1/3, 1/6) and (1/3, 1/6, 1/2), then they have the same entropy.

⁵ Shannon (1948) showed that, under certain axioms for a measure of uncertainty, the entropy (up to the choice of the logarithmic base) is the only one that satisfies them.

FIG. 1. The graph of the entropy function.

The notion of entropy can be extended to an arbitrary finite-dimensional vector of random variables or probability distributions. In Definition 3.1 simply replace the random variable X by a random vector X = (X_1, . . . , X_n) and the range Q by ×_{k=1}^n Q_k, where each Q_k is the range of X_k and is finite. Thus

$$H(X_1, \ldots, X_n) = -\sum_{\theta_1 \in Q_1} \cdots \sum_{\theta_n \in Q_n} p(\theta_1, \ldots, \theta_n) \log p(\theta_1, \ldots, \theta_n),$$

where p(θ_1, . . . , θ_n) = Prob(X_1 = θ_1, . . . , X_n = θ_n).

Let (X_1, X_2) be a random vector taking values in Q_1 × Q_2 with distribution p(θ_1, θ_2). Suppose that we observe the realization of X_1. For each X_1 = θ_1, we calculate the entropy of X_2 conditional on X_1 = θ_1 as the entropy of the conditional distribution of X_2 given X_1 = θ_1, denoted by p(X_2 | θ_1). The expected value of such entropy with respect to the marginal distribution of X_1, p(X_1), would then measure the ex ante uncertainty of X_2 when we can observe X_1. Formally, let us define a function h(X_2 | X_1) : Q_1 → R by

$$h(X_2 \mid \theta_1) = -\sum_{\theta_2 \in Q_2} p(\theta_2 \mid \theta_1) \log p(\theta_2 \mid \theta_1),$$

where h(X_2 | θ_1) is the value of h(X_2 | X_1) at θ_1 ∈ Q_1. We consider h(X_2 | X_1) as a random variable on Q_1 equipped with distribution p(X_1).


DEFINITION 3.2. The conditional entropy H(X_2 | X_1) of X_2 given X_1 is defined by

$$H(X_2 \mid X_1) = E_{X_1}\bigl[h(X_2 \mid X_1)\bigr] = \sum_{\theta_1 \in Q_1} p(\theta_1)\, h(X_2 \mid \theta_1).$$

Note that the distribution of X_2, p(X_2), is equal to E_{X_1}[p(X_2 | X_1)] and H(X_2) = H(p(X_2)). Also, h(X_2 | X_1) = H(p(X_2 | X_1)). Therefore the concavity of the entropy and Jensen's inequality imply that

$$H(X_2 \mid X_1) = E_{X_1}\bigl[H(p(X_2 \mid X_1))\bigr] \le H\bigl(E_{X_1}[p(X_2 \mid X_1)]\bigr) = H(X_2).$$

That is, the conditioning reduces the entropy. The strict concavity of H implies, moreover, that the equality holds if, and only if, X_1 and X_2 are independent.

The next proposition states that the uncertainty of a pair of random variables is the uncertainty of one plus the remaining uncertainty of the other given the first. The proof is a simple marshaling of the definitions. See Cover and Thomas (1991, Chap. 2).

PROPOSITION 3.2. H(X_1, X_2) = H(X_1) + H(X_2 | X_1) (= H(X_2) + H(X_1 | X_2)).

One can easily verify this in a simple example:

EXAMPLE 3.2. Consider a joint distribution of (X_1, X_2) on {0, 1} × {0, 1} given below:

                 X_1
               0     1
    X_2   0   1/4   1/2
          1   1/4    0

Then p(X_1) = (1/2, 1/2), p(X_2) = (3/4, 1/4),

    p(X_2 | X_1 = θ_1) = (1/2, 1/2) if θ_1 = 0,  (1, 0) if θ_1 = 1,

and

    p(X_1 | X_2 = θ_2) = (1/3, 2/3) if θ_2 = 0,  (1, 0) if θ_2 = 1.

Therefore we have

    H(X_1, X_2) = 3/2,
    H(X_1) = 1,
    H(X_2) = 2 − (3/4) log 3,
    H(X_2 | X_1) = 1/2,
    H(X_1 | X_2) = −1/2 + (3/4) log 3.

Note that in general H(X_2 | X_1) ≠ H(X_1 | X_2).
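The numbers in Example 3.2, and the identity of Proposition 3.2, can be verified directly; the sketch below is our own illustration, not part of the paper.

```python
import math

def H(dist):
    """Entropy (base 2) of a finite distribution given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Joint distribution of (X1, X2) from Example 3.2; dictionary keys are (x1, x2).
joint = {(0, 0): 0.25, (1, 0): 0.5, (0, 1): 0.25, (1, 1): 0.0}
pX1 = [sum(v for (x1, _), v in joint.items() if x1 == i) for i in (0, 1)]
pX2 = [sum(v for (_, x2), v in joint.items() if x2 == j) for j in (0, 1)]

H_joint = H(list(joint.values()))
# Conditional entropy H(X2 | X1) = sum_x1 p(x1) * H(p(X2 | x1)).
H_X2_given_X1 = sum(
    pX1[i] * H([joint[(i, j)] / pX1[i] for j in (0, 1)]) for i in (0, 1) if pX1[i] > 0
)

print(H_joint)            # 1.5
print(H(pX1), H(pX2))     # 1.0 and 2 - (3/4) log2 3 ~ 0.811
print(H_X2_given_X1)      # 0.5
# Proposition 3.2: H(X1, X2) = H(X1) + H(X2 | X1).
assert abs(H_joint - (H(pX1) + H_X2_given_X1)) < 1e-12
```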

Proposition 3.2 applied to (X_1, . . . , X_{n−1}) and X_n yields

$$H(X_1, \ldots, X_n) = H(X_1, \ldots, X_{n-1}) + H(X_n \mid X_1, \ldots, X_{n-1}).$$

Hence by induction the "chain rule" for entropy is proved:

PROPOSITION 3.3. H(X_1, . . . , X_n) = H(X_1) + Σ_{k=2}^n H(X_k | X_1, . . . , X_{k−1}).

An extension of the entropy measure to stochastic processes is called the entropy rate and is defined as follows.

DEFINITION 3.3. Let (X_n)_{n=1}^∞ be a stochastic process where each X_n takes values in a finite set Q. The entropy rate H((X_n)) of (X_n)_{n=1}^∞ is defined by

$$H\bigl((X_n)\bigr) = \limsup_{k \to \infty} \frac{1}{k} H(X_1, \ldots, X_k).$$

Thus the entropy rate is the upper limit of the average uncertainty per stage of the process. For example, for an i.i.d. process the entropy rate coincides with the entropy of the common distribution. This follows immediately from Proposition 3.3 with the i.i.d. assumption. Since H(X_1, . . . , X_k) ≤ k log |Q| by Proposition 3.1(1), the entropy rate H((X_n)) is bounded by log |Q|.

If we were to define the entropy rate as the limit rather than the upper limit, it would not necessarily exist. Consider, for example, two Bernoulli random variables X and Y whose distributions are (1, 0) and (1/2, 1/2), respectively. Construct an independent sequence consisting of a string of X's followed by an exponentially longer string of Y's, followed further by a still exponentially longer string of X's, and so on, so that (1/k) H(X_1, . . . , X_k) oscillates between 0 and 1. However, it can be shown that for any stationary process the entropy rate, defined as the limit, does exist and coincides with the limit of conditional entropy, lim_{k→∞} H(X_k | X_1, . . . , X_{k−1}). See Cover and Thomas (1991, Chap. 4).
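As one concrete instance of the stationary case (our illustration, not the paper's; the transition matrix is an arbitrary assumption), a two-state stationary Markov chain has entropy rate Σ_i π_i H(P_i·), and (1/k) H(X_1, . . . , X_k), computed through the chain rule, converges to it.

```python
import numpy as np

def H(dist):
    dist = np.asarray(dist, dtype=float)
    nz = dist[dist > 0]
    return float(-(nz * np.log2(nz)).sum())

# A two-state stationary Markov chain (illustrative transition matrix).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
pi = np.array([0.8, 0.2])                 # stationary distribution: pi @ P == pi
assert np.allclose(pi @ P, pi)

# For a stationary Markov chain, H(X_k | X_1, ..., X_{k-1}) = H(X_k | X_{k-1}),
# so the entropy rate equals sum_i pi_i * H(P[i]).
rate = sum(pi[i] * H(P[i]) for i in range(2))

# Compare with (1/k) H(X_1, ..., X_k) = (H(pi) + (k - 1) * rate) / k via the chain rule.
for k in (1, 10, 100, 1000):
    print(k, (H(pi) + (k - 1) * rate) / k)
print("entropy rate:", rate)
```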

4. STRATEGIC ENTROPY

We start with an example which illuminates the heuristics behind the concepts we define in this section and motivates the use of the entropy concepts discussed in the previous section.

EXAMPLE 4.1. Consider the twice repetition of a stage game in which player 1 has two actions, T and B, and player 2 also has two actions, L and R. The game is depicted in Figure 2.

FIG. 2. The game tree of Example 4.1.

We represent a player's pure strategy by an ordered five-tuple. For example, (T, B, T, T, B) represents player 1's strategy which assigns the action T in the first stage, i.e., at the information set I_1, B in the second stage after the history (T, L), i.e., at the information set I_2, T at I_3, and so on.

Now consider a mixed strategy σ of player 1 whose support consists of four pure strategies s_1 = (T, B, T, T, B), s_2 = (T, B, B, B, T), s_3 = (T, B, T, T, T), and s_4 = (B, B, B, B, B). Suppose that player 2 is informed of σ (its support and the probabilities it assigns) but not of its realization.

Note that s_1 and s_3 are equivalent. Hence if either one of these is selected by σ, no matter what strategy player 2 uses, he will never be able to tell which is actually selected. Now suppose that σ has selected s_2. Let player 2 play (L, L, L, L, L). (Player 2's actions in the second stage will be irrelevant in the subsequent discussion.) After the first stage player 2 observes that player 1 has taken the action T, and thus concludes that the realized strategy must be either s_1, s_2, or s_3. Will player 2 ever be able to narrow down the realization of σ further? The answer is no. All that player 2 observes at the end of the game is that player 1 has chosen T in the first stage and B in the second stage after the history (T, L). Any one of s_1, s_2, and s_3 could lead to the same play of the game. Therefore uncertainty among the three strategies persists even after the game is ended, provided that player 2 uses a strategy that assigns L at the first stage. Notice that s_2 is not equivalent to s_1 (and hence not to s_3, either); nevertheless, it is not distinguished if player 2 chooses L at the first stage. Had he chosen R at the first stage, however, he would have discovered that σ had indeed selected s_2 because this is the only strategy that plays B after the history (T, R).

From this example it is clear that, by using a pure strategy t, player 2 can only distinguish those pure strategies of player 1 in the support of his mixed strategy which give rise to different plays of the game against t, and nothing more. In other words, the amount of information (the reduction of uncertainty) about player 1's strategy σ obtained by using a pure strategy t is precisely the amount of information offered by the play induced by (σ, t). This motivates us to look at the information on the induced play measured by entropy. The following discussion focuses on player 1's strategy. It will be an easy task to extend the concepts defined below to games with an arbitrary number of players. See Section 8.

For each n ∈ N, define a function H^n(· : ·) : Δ(S) × T → R as follows. Given (σ, t) ∈ Δ(S) × T, let (X_k)_{k=1}^∞ be the random play induced by (σ, t). Then H^n(σ : t) is defined to be the entropy of this random play up to stage n, that is,

$$H^n(\sigma : t) = H(X_1, \ldots, X_n) = -\sum_{C \in \mathcal{H}_n} P_{\sigma,t}(C) \log P_{\sigma,t}(C).$$

Recall that ℋ_n is the partition of Ω_∞ with respect to actions in the first n stages. Thus H^n(σ : t) is the uncertainty about the play up to stage n that player 2 faces when player 1 uses σ and player 2 uses t. The dual interpretation is that it is the amount of information on the play of the game that player 2 can obtain using t when player 1 uses σ.
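To make this concrete, the sketch below (ours, not the paper's) computes H^2(σ : t) for the mixed strategy σ of Example 4.1 against two choices of t. Since the example does not specify σ's probabilities, uniform weights on {s_1, s_2, s_3, s_4} are an assumption here, and the ordering of the information sets I_3, I_4, I_5 as (T, R), (B, L), (B, R) follows the example's description.

```python
import math

def H(dist):
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Player 1's pure strategies of Example 4.1 as five-tuples:
# (stage-1 action, action after (T,L), after (T,R), after (B,L), after (B,R)).
# The support weights are NOT given in the example; uniform weights are an assumption.
support = {("T", "B", "T", "T", "B"): 0.25,   # s1
           ("T", "B", "B", "B", "T"): 0.25,   # s2
           ("T", "B", "T", "T", "T"): 0.25,   # s3
           ("B", "B", "B", "B", "B"): 0.25}   # s4
SLOT = {("T", "L"): 1, ("T", "R"): 2, ("B", "L"): 3, ("B", "R"): 4}

def play(s, t1, t2):
    """The two-stage play when player 2 plays t1 at stage 1 and t2[history] at stage 2."""
    h1 = (s[0], t1)
    return (h1, (s[SLOT[h1]], t2[h1]))

def H2(t1, t2):
    """H^2(sigma : t): entropy of the distribution over two-stage plays induced by (sigma, t)."""
    dist = {}
    for s, p in support.items():
        w = play(s, t1, t2)
        dist[w] = dist.get(w, 0.0) + p
    return H(list(dist.values()))

stage2 = {h: "L" for h in SLOT}    # player 2's stage-2 choice is irrelevant here
print(round(H2("L", stage2), 3))   # ~0.811: with L first, only {s1,s2,s3} vs s4 is revealed
print(round(H2("R", stage2), 3))   # 1.5: playing R first additionally separates s2 from s1, s3
```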

Let us discuss some properties of the function H^n(· : ·). Fix t ∈ T. Each σ ∈ Δ(S) induces a probability distribution P_{σ,t} on the finite set of n-histories. Its support is the set of n-histories induced by (s, t) for some s in the support of σ. Note that, with t fixed, the number of n-histories in the support of P_{σ,t} is at most |A|^n. Therefore by Proposition 3.1(1) we see that H^n(· : ·) is bounded by n log |A|. Since the probability P_{σ,t} is continuous in (σ, t), H^n(· : ·) is continuous on Δ(S) × T.

From the above definition it is also easy to see that H^n(· : t) is a concave function on Δ(S) for each fixed t ∈ T. The strict concavity, however, does not hold. For example, suppose that two strategies σ and σ′ differ only on those histories that would not be realized against a particular t. Let σ″ be any mixture of σ and σ′. Then the random plays induced by (σ, t), (σ′, t), and (σ″, t) (or P_{σ,t}, P_{σ′,t}, and P_{σ″,t}) all coincide. Consequently, the corresponding convex combination of H^n(σ : t) and H^n(σ′ : t) is equal to H^n(σ″ : t).

nŽ . Ž .Clearly, H ? : t depends only on the n-equivalence classes of D S fornŽ .each t g T , and similarly, H s : ? depends only on the n-equivalence

X Ž .classes of T. That is, if s and s in D S are n-equivalent, thennŽ . nŽ X . XH s : t s H s : t for every t g T , and if t and t in T are n-equiv-

nŽ . nŽ X. Ž .alent, then H s : t s H s : t for every s g D S . We summarize thenŽ .properties of H ? : ? discussed above as a proposition.

Ž . nŽ . Ž .PROPOSITION 4.1. 1 H ? : ? is continuous on D S = T.

Ž . nŽ . Ž .2 For each t g T , H ? : t is conca¨e on D S .Ž . nŽ .3 For each t g T , H ? : t is constant on each n-equi alence class of

Ž .D S .Ž . Ž . nŽ .4 For each s g D S , H s : ? is constant on each n-equi alence

class of T.Ž . nŽ . < < Ž . Ž .5 0 F H s : t F n log A for e¨ery s , t g D S = T.

The next lemma follows directly from Proposition 3.3.

LEMMA 4.1. If (X_k)_{k=1}^∞ is the random play induced by (σ, t), then

$$H^n(\sigma : t) = H(X_1) + \sum_{k=2}^{n} H(X_k \mid X_1, \ldots, X_{k-1}).$$

The quantity H(X_k | X_1, . . . , X_{k−1}) may be interpreted as the average uncertainty of the actions at stage k given the (k − 1)-history induced by (σ, t). One can think of H(X_k | X_1, . . . , X_{k−1}) as the per-stage average reduction of uncertainty about σ (or about a pure strategy selected by σ if σ is a mixed strategy) when player 2 uses t. In this context Lemma 4.1 states that the average overall reduction of uncertainty, which is the gain of information, about σ up to stage n using t is precisely H^n(σ : t).

This lemma provides an alternative proof of Proposition 4.1(5). Given the history up to stage k − 1, the actions at stage k, X_k = (a_k, b_k), have randomness only in player 1's action a_k, whose entropy is at most log |A|. Thus, for every (k − 1)-history ω_1, . . . , ω_{k−1}, h(X_k | ω_1, . . . , ω_{k−1}) ≤ log |A|, and hence H(X_k | X_1, . . . , X_{k−1}) ≤ log |A|. Therefore, by Lemma 4.1, H^n(σ : t) ≤ n log |A|. Note that, unlike H(X_k | X_1, . . . , X_{k−1}), the unconditional entropy H(X_k) of the actions at stage k can be larger than log |A|.

EXAMPLE 4.2. Consider the game of Example 4.1 (see Fig. 2). We denote by (p, q) a mixed action of player 1 (player 2) that takes T (L) with probability p and B (R) with probability q = 1 − p. Define σ = (σ_n)_{n=1}^∞ ∈ Δ(S) and t ∈ T as follows. For each ω = (ω_n)_{n=1}^∞, σ_n(ω) = (1/2, 1/2), and

$$t_1(\omega) = (1, 0), \qquad t_2(\omega) = \begin{cases} (1, 0) & \text{if } \omega_1 = (T, L),\\ (0, 1) & \text{otherwise.}\end{cases}$$

For n > 2, t_n(ω) is an arbitrary pure action. It is easily verified that X_2 is uniformly distributed on {T, B} × {R, L} and hence H(X_2) = log 4 > log 2 = log |A|.

This example also shows that even if player 1's actions are independent from stage to stage, the pairs of random actions X_n = (a_n, b_n) induced by (σ, t) are not necessarily so. This is because player 2's stage actions do depend on the past histories.

One may consider two alternatives for the extension of H^n(· : ·) to Δ(S) × Δ(T). One is the linear extension: for τ ∈ Δ(T), let us define H^n(σ : τ) = E_τ[H^n(σ : t)]. This quantity may be interpreted as the average uncertainty of the play that player 2 faces when he uses τ against σ, or the average amount of information on the play that player 2 can obtain using τ against σ. The other alternative is to look directly at the random play up to stage n induced by (σ, τ) and take its entropy. We denote this by H^n(σ, τ). It is the uncertainty of the play up to stage n when one is informed only of the distributions σ and τ but not of their realizations. Since H^n(σ, t) = H^n(σ : t) for each t ∈ T, it is indeed an extension of the original function on Δ(S) × T. In general, H^n(σ, τ) is greater than the linear extension H^n(σ : τ) because of the concavity of the entropy.

For each σ ∈ Δ(S), we now define the n-strategic entropy of σ to be the maximum of H^n(σ : t), where the maximum is taken over all pure strategies t ∈ T of player 2.

DEFINITION 4.1. The n-strategic entropy H^n(σ) of σ ∈ Δ(S) is defined by

$$H^n(\sigma) = \max_{t \in T} H^n(\sigma : t).$$

By Proposition 4.1(1), H^n(σ : ·) is continuous on the compact set T, and so the above maximum does exist. Proposition 4.1(4) offers a more intuitive argument for the existence of the maximum: H^n(σ : ·) is constant on each n-equivalence class of T, and since there are only a finite number of n-equivalence classes, the maximum exists. With the above discussion of the extension of H^n(· : ·) in mind, we can also write H^n(σ) = max_{τ∈Δ(T)} H^n(σ : τ). Propositions 4.1(1) and (4) imply, moreover, that the n-strategic entropy H^n(σ), considered as a function of σ, is the maximum of essentially a finite number of continuous functions of σ. Therefore, H^n(σ) is continuous in σ. The n-strategic entropy H^n(σ), unlike the entropy, is not a concave function of σ. This is well expected in light of Propositions 4.1(2) and (4), since they imply that H^n(σ) is the maximum of a finite number of concave functions. (See also Example 4.3(d) below.) Although we will not utilize it in this paper, it is also of interest to consider the lower n-strategic entropy of σ ∈ Δ(S) defined by

$$\underline{H}^n(\sigma) = \min_{t \in T} H^n(\sigma : t),$$

which preserves the concavity. This is the lower bound of the uncertainty of the play up to stage n that player 2 faces no matter what pure strategy she uses against σ. We provide some examples to facilitate the understanding of the n-strategic entropy.

EXAMPLE 4.3. (a) Any pure strategy has zero n-strategic entropy for every n. Any strategy that is n-equivalent to a pure strategy has zero n-strategic entropy. This is trivial because such a strategy, by definition, induces pure actions against any pure strategy of player 2 up to stage n. Conversely, if a strategy has zero n-strategic entropy, then it is necessarily n-equivalent to a pure strategy.

(b) Suppose that σ = (σ_k)_{k=1}^∞ is an independent sequence of mixed actions, σ_k = α_k ∈ Δ(A). Take t ∈ T and let (X_k)_{k=1}^∞ be the random play induced by (σ, t). Then H(X_k | X_1, . . . , X_{k−1}) = H(α_k) for every k. Hence by Lemma 4.1, H^n(σ : t) = Σ_{k=1}^n H(α_k) for every t ∈ T, and thus H^n(σ) = Σ_{k=1}^n H(α_k). A special case of this is when α_k = α for all k. In this case, H^n(σ) = nH(α).

(c) Consider the game of Example 4.1. Define two behavioral strategies of player 1, σ = (σ_n)_{n=1}^∞ and σ′ = (σ′_n)_{n=1}^∞, as follows. For each ω = (ω_n)_{n=1}^∞ ∈ Ω_∞,

$$\sigma_1(\omega) = (1, 0), \qquad \sigma_2(\omega) = \begin{cases} (\tfrac12, \tfrac12) & \text{if } \omega_1 = (\cdot, L),\\ (0, 1) & \text{if } \omega_1 = (\cdot, R),\end{cases}$$

and

$$\sigma'_1(\omega) = (1, 0), \qquad \sigma'_2(\omega) = \begin{cases} (1, 0) & \text{if } \omega_1 = (\cdot, L),\\ (\tfrac13, \tfrac23) & \text{if } \omega_1 = (\cdot, R).\end{cases}$$

Specification of σ_n and σ′_n for n > 2 is irrelevant to the subsequent discussion. The maximum of H^2(σ : t) over t is attained by any t = (t_n)_{n=1}^∞ with t_1 = L. Thus H^2(σ) = H(1/2), where H(p) is the entropy function defined in Example 3.1. On the other hand, the maximum of H^2(σ′ : t) over t is attained by any t = (t_n)_{n=1}^∞ with t_1 = R, and H^2(σ′) = H(1/3).

(d) The two strategies defined in (c) can be used to show that the n-strategic entropy H^n(σ) is not concave in σ. Let σ″ be the mixture of σ and σ′, each with equal probabilities. Then, for each ω = (ω_n)_{n=1}^∞,

$$\sigma''_1(\omega) = (1, 0), \qquad \sigma''_2(\omega) = \begin{cases} (\tfrac34, \tfrac14) & \text{if } \omega_1 = (\cdot, L),\\ (\tfrac16, \tfrac56) & \text{if } \omega_1 = (\cdot, R),\end{cases}$$

and the maximum of H^2(σ″ : t) over t is attained by any t = (t_n)_{n=1}^∞ with t_1 = L. Hence H^2(σ″) = H(3/4). Since H(3/4) < H(2/3) < H(1/2) (see also Fig. 1), we have

$$H^2(\sigma'') < \tfrac12 H^2(\sigma) + \tfrac12 H^2(\sigma').$$

Therefore, H^2(·) is not concave on Δ(S). (A numerical check of this inequality appears right after this example.)

(e) Proposition 4.1(5) implies that H^n(σ) is bounded by n log |A|. Let α̃ be the mixed action of player 1 which is uniformly distributed on A. From the argument of (b) above it follows that if σ plays the i.i.d. sequence of α̃, then H^n(σ) = n log |A|. There are other strategies that attain this bound. Again, in the game of Example 4.1, define σ = (σ_n)_{n=1}^∞ as follows:

$$\sigma_1(\omega) = (\tfrac12, \tfrac12), \qquad \sigma_2(\omega) = \begin{cases} (\tfrac12, \tfrac12) & \text{if } \omega_1 = (\cdot, L),\\ (0, 1) & \text{if } \omega_1 = (\cdot, R).\end{cases}$$

Then any pure strategy t = (t_n)_{n=1}^∞ of player 2 with t_1 = L would maximize H^2(σ : t), and H^2(σ) = 2 log 2 = 2.
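The numerical check promised in part (d) above (our illustration, not the paper's):

```python
import math

def H(p):
    """Binary entropy (base 2) of the distribution (p, 1 - p)."""
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)

# Example 4.3(c): H^2(sigma) = H(1/2), H^2(sigma') = H(1/3).
H2_sigma, H2_sigma_prime = H(0.5), H(1 / 3)
# Example 4.3(d): the equal-weight mixture sigma'' has H^2(sigma'') = H(3/4).
H2_mix = H(0.75)

print(H2_sigma, round(H2_sigma_prime, 3), round(H2_mix, 3))   # 1.0, 0.918, 0.811
# Non-concavity: the strategic entropy of the mixture falls below the average.
assert H2_mix < 0.5 * H2_sigma + 0.5 * H2_sigma_prime
```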

The next proposition summarizes the properties of the n-strategic entropy.

PROPOSITION 4.2. (1) H^n(σ) is continuous as a function on Δ(S).

(2) If σ and σ′ are n-equivalent, then H^n(σ) = H^n(σ′).

(3) 0 ≤ H^n(σ) ≤ n log |A| for every σ ∈ Δ(S).

(4) H^n(σ) = 0 if, and only if, σ is n-equivalent to a pure strategy.


It is clear that the n-strategic entropy H^n(σ) is nondecreasing in n. We define the total strategic entropy of σ by the supremum of the n-strategic entropy.

DEFINITION 4.2. The total strategic entropy H*(σ) of σ ∈ Δ(S) is defined by

$$H^{*}(\sigma) = \sup_{n} H^n(\sigma).$$

Note that H*(σ) may be infinite. Also note that H^n(σ) cannot grow faster than the linear function f(n) = n log |A|.

If (X_n)_{n=1}^∞ is the random play induced by (σ, t), we denote its entropy rate by H^∞(σ : t). By the definition of the entropy rate and that of H^n(σ : t) we have

$$H^\infty(\sigma : t) = \limsup_{n \to \infty} \frac{1}{n} H(X_1, \ldots, X_n) = \limsup_{n \to \infty} \frac{1}{n} H^n(\sigma : t).$$

The strategic entropy rate of σ is defined to be the supremum of H^∞(σ : t) over t ∈ T.

DEFINITION 4.3. The strategic entropy rate H^∞(σ) of σ ∈ Δ(S) is defined by

$$H^\infty(\sigma) = \sup_{t \in T} H^\infty(\sigma : t).$$

For example, every behavioral strategy that plays pure actions after a finite number of stages has zero strategic entropy rate. It is easy to construct a behavioral strategy that plays a mixed action everywhere and has zero strategic entropy rate. Since H^n(σ : t) ≤ H^n(σ) for every n ∈ N and t ∈ T, the strategic entropy rate of σ is at most H̃(σ) = lim sup_{n→∞} (1/n) H^n(σ). This inequality can be strict.

EXAMPLE 4.4. Consider the following behavioral strategy σ of player 1, where the stage game is a 2 × 2 matrix game with A = {Top, Bottom} and B = {Left, Right}. Play Top in the first stage and continue with Top as long as player 2 plays Left. If player 2 plays Right at stage m for the first time, play the i.i.d. mixed action (1/2, 1/2) (Top and Bottom with equal probabilities) for the next m² stages and then play Top forever after that. Let us show that H^∞(σ) = 0 while H̃(σ) = 1.

For every t = (t_n)_{n=1}^∞ ∈ T, the random play (X_n)_{n=1}^∞ induced by (σ, t) is such that either X_n = (Top, Left) for every n or there is a unique m < ∞ for which

$$X_n = \begin{cases} (\text{Top}, \text{Left}) & \text{for } n = 1, \ldots, m - 1,\\ (\text{Top}, \text{Right}) & \text{for } n = m,\\ \bigl((\tfrac12, \tfrac12),\, t_n\bigr) & \text{for } n = m + 1, \ldots, m + m^2,\\ (\text{Top},\, t_n) & \text{for } n \ge m + m^2 + 1. \end{cases} \qquad (4.1)$$

In the former case H^n(σ : t) = 0 for every n. In the latter case, H^n(σ : t) = m² for every n ≥ m + m², and hence (1/n) H^n(σ : t) = m²/n → 0. This shows that H^∞(σ) = 0.

Next, it is clear that H̃(σ) ≤ log |A| = log 2 = 1. For each m ∈ N, define t^m = (t^m_n)_{n=1}^∞ ∈ T as follows: t^m_n = Left if n ≠ m, and t^m_n = Right if n = m. Then the random play induced by (σ, t^m) is exactly as (4.1) above with t_n replaced by Left, and H^{m+m²}(σ) ≥ H^{m+m²}(σ : t^m) = m². Hence

$$\tilde{H}(\sigma) \ge \limsup_{m \to \infty} \frac{H^{m+m^2}(\sigma)}{m + m^2} \ge \lim_{m \to \infty} \frac{m^2}{m + m^2} = 1.$$

Therefore H̃(σ) = 1.

5. ENTROPY AND STRATEGIC ENTROPY

In the first example in the previous section we saw that if σ has two different pure strategies s = (s_k)_{k=1}^∞ and s′ = (s′_k)_{k=1}^∞ in its support such that (s, t) and (s′, t) induce n-equivalent plays, then player 2, using this t, can never tell which one was actually selected within the first n stages. In other words, the uncertainty about (s_k)_{k=1}^n and (s′_k)_{k=1}^n persists even at the end of stage n. This suggests that the reduction of uncertainty about (a pure strategy selected by) σ in the first n stages using t, which is H^n(σ : t), is smaller than the uncertainty of σ as a probability on the first n stage strategies. Denote by Ĥ^n(σ) the entropy of σ viewed as a probability on ×_{k=1}^n S_k. We call Ĥ^n(σ) the n-entropy of σ. In this section we will show that the n-strategic entropy of σ is at most its n-entropy.

For this purpose it is convenient to introduce the concept of the entropy of a partition. Let (F, ℱ, μ) be a probability space and let 𝒫 be a finite partition of F into ℱ-sets. The entropy of the partition 𝒫 with respect to μ is defined by

$$H_\mu(\mathcal{P}) = -\sum_{F \in \mathcal{P}} \mu(F) \log \mu(F).$$

The next proposition states that the coarsening of a partition cannot increase entropy. See Smorodinsky (1971).


PROPOSITION 5.1. If a partition 𝒫 is a coarsening of a partition 𝒬, i.e., every element of 𝒫 is a union of elements of 𝒬, then

$$H_\mu(\mathcal{P}) \le H_\mu(\mathcal{Q}).$$

Recall that S = ×_{k≥1} S_k, where S_k is a finite set of stage-k strategies. Let 𝒫_n be the finite partition of S induced⁶ by ×_{k=1}^n S_k. The algebra on S generated by 𝒫_n is denoted by 𝒮_n. We denote by 𝒮 the σ-algebra on S generated by ∪_{n≥1} 𝒮_n. A mixed strategy is thus a probability on (S, 𝒮). Since 𝒫_n is a finite partition of S into 𝒮-sets, its entropy H_σ(𝒫_n) is defined as above. It is clear from the definition of 𝒫_n that Ĥ^n(σ) = H_σ(𝒫_n).

Given player 2's pure strategy t, we say that s and s′ in S are n-equivalent with respect to t, or (n, t)-equivalent for short, if (s, t) and (s′, t) induce n-equivalent plays. Since (n, t)-equivalence is an equivalence relation on S, it induces a partition of S denoted by 𝒫_n(t). Thus with each D ∈ 𝒫_n(t) is associated an n-history C = ω(D) ∈ ℋ_n in a one-to-one and onto manner. Therefore,

$$H^n(\sigma : t) = -\sum_{C \in \mathcal{H}_n} P_{\sigma,t}(C) \log P_{\sigma,t}(C) = -\sum_{D \in \mathcal{P}_n(t)} \sigma(D) \log \sigma(D) = H_\sigma\bigl(\mathcal{P}_n(t)\bigr).$$

Clearly, 𝒫_n(t) is a coarsening of 𝒫_n. Indeed, 𝒫_n(t) is a coarsening of the partition of S into the n-equivalence classes, which is in turn a coarsening of 𝒫_n. Therefore it follows from Proposition 5.1 that H^n(σ : t) ≤ Ĥ^n(σ) for every t ∈ T. By taking the maximum over t we see that the n-strategic entropy of σ does not exceed its n-entropy: H^n(σ) ≤ Ĥ^n(σ). Consequently, H̃(σ) is at most lim sup_{n→∞} (1/n) Ĥ^n(σ), and hence so is the strategic entropy rate H^∞(σ).
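Tying this back to Example 4.1 under the uniform-weight assumption used in the earlier sketch (our illustration, not the paper's): σ's 2-entropy is log 4 = 2 bits, while its 2-strategic entropy is 1.5 bits, consistent with H^n(σ) ≤ Ĥ^n(σ).

```python
import math

def H(dist):
    return -sum(p * math.log2(p) for p in dist if p > 0)

# sigma viewed as a probability on S_1 x S_2: its support consists of four distinct
# first-two-stage pure strategies, so with uniform weights \hat{H}^2(sigma) = log 4 = 2.
n_entropy = H([0.25, 0.25, 0.25, 0.25])
# H^2(sigma): entropy of the play distribution when player 2 opens with R (earlier sketch).
strategic_entropy = H([0.5, 0.25, 0.25])
assert strategic_entropy <= n_entropy        # H^n(sigma) <= \hat{H}^n(sigma)
print(strategic_entropy, n_entropy)          # 1.5, 2.0
```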

6. THE REPEATED GAMES WITH ENTROPY BOUND

In this section we study the three classes of repeated games, each with a restriction on player 1's strategies in terms of one of the three strategic entropy concepts formulated in Section 4. All the results in this section are based on the main lemma, which we present first.

⁶ That is, the partition induced by the natural projection of S onto the first n factors.


Observe that if σ ∈ Δ(S) has a very small n-strategic entropy, then Lemma 4.1 implies that the entropy of player 1's stage action must necessarily be small at most of the first n stages, i.e., it is "close to" a pure action. So if player 2 plays a one-shot best response at every stage, the n-average payoff for player 1 will be close to his maxmin value in pure actions of the stage game. This is the heuristics for the lemma below. Note that if (X_k)_{k=1}^∞ is the random play induced by a pair of strategies, then h(X_k | X_1, . . . , X_{k−1}) is 𝒜_{k−1}-measurable. By a slight abuse of notation we denote h(X_k | X_1, . . . , X_{k−1}) by h(X_k | 𝒜_{k−1}) and H(X_k | X_1, . . . , X_{k−1}) by H(X_k | 𝒜_{k−1}).

LEMMA 6.1. ∀ε > 0, ∃γ(ε) > 0 such that ∀σ ∈ Δ(S), ∃t ∈ T such that

$$r_n(\sigma, t) \le U\#(G) + \varepsilon$$

whenever n ∈ N and H^n(σ) ≤ γ(ε) n.

Proof. Without loss of generality we assume that $0 \le r \le 1$. We shall identify $\Delta(A)$ with the unit simplex in $\mathbb R^{|A|}$, while $A$ itself is identified with the extreme points of this simplex. For each $\alpha \in \Delta(A)$ let $d(\alpha, A) = \min_{a \in A}\|\alpha - a\|$, where $\|\cdot\|$ is the $L_1$-norm.

Let $\varepsilon > 0$ be given. For every pure action $b \in B$, the stage game payoff $E_\alpha[r(a,b)]$ is continuous in $\alpha \in \Delta(A)$. The minimum of finitely many continuous functions is continuous, and thus $\min_{b \in B} E_\alpha[r(a,b)]$ is continuous in $\alpha \in \Delta(A)$. For every $a \in A$, $\min_{b \in B} r(a,b) \le U_*(G)$. Therefore there is $\delta(\varepsilon) > 0$ so that, for every $\alpha \in \Delta(A)$ with $d(\alpha, A) < \delta(\varepsilon)$, we have $\min_{b \in B} E_\alpha[r(a,b)] < U_*(G) + \varepsilon/2$.

Take $\sigma \in \Delta(S)$. Without loss of generality suppose that $\sigma$ is a behavioral strategy, $\sigma = (\sigma_k)_{k=1}^\infty$. Define player 2's strategy $t_\sigma = (t_{\sigma,k})_{k=1}^\infty$ in such a way that, for every $k$, $t_{\sigma,k}(\omega)$ minimizes $E_{\sigma_k(\omega)}[r(a,b)]$ over $b \in B$. That is, $t_{\sigma,k}$ takes a best response action against $\sigma_k$ at every $k$-history. Let $(X_k)_{k=1}^\infty$ be the random play induced by $(\sigma, t_\sigma)$.

The second and third properties of entropy in Proposition 3.1 imply that there is $\kappa(\varepsilon) > 0$ such that $d(\alpha, A) \ge \delta(\varepsilon)$ implies $H(\alpha) \ge \kappa(\varepsilon)$. Thus $I(d(\sigma_k, A) \ge \delta(\varepsilon)) \le h(X_k \mid \mathcal A_{k-1})/\kappa(\varepsilon)$, where $I(\cdot)$ is the indicator function. Notice that both sides of this inequality are measurable with respect to $\mathcal A_{k-1}$, and recall our assumption $0 \le r \le 1$. Then at every stage $k$ we have
\[
E_{\sigma,t_\sigma}\bigl[r(X_k) \mid \mathcal A_{k-1}\bigr]
\le U_*(G) + \frac{\varepsilon}{2}\, I\bigl(d(\sigma_k, A) < \delta(\varepsilon)\bigr) + I\bigl(d(\sigma_k, A) \ge \delta(\varepsilon)\bigr)
\le U_*(G) + \frac{\varepsilon}{2} + \frac{h(X_k \mid \mathcal A_{k-1})}{\kappa(\varepsilon)}.
\]


Set $\gamma(\varepsilon) = \varepsilon\kappa(\varepsilon)/2$. Assume that $n$ is such that $H^n(\sigma) \le \gamma(\varepsilon) n$. Using the above inequality together with Lemma 4.1 and the definition of $H^n(\sigma)$, we obtain the following:
\[
\frac{1}{n}\sum_{k=1}^n E_{\sigma,t_\sigma}[r(X_k)]
= \frac{1}{n}\sum_{k=1}^n E_{\sigma,t_\sigma}\bigl[E_{\sigma,t_\sigma}[r(X_k) \mid \mathcal A_{k-1}]\bigr]
\le U_*(G) + \frac{\varepsilon}{2} + \frac{1}{\kappa(\varepsilon) n}\sum_{k=1}^n E_{\sigma,t_\sigma}\bigl[h(X_k \mid \mathcal A_{k-1})\bigr]
\]
\[
= U_*(G) + \frac{\varepsilon}{2} + \frac{1}{\kappa(\varepsilon) n}\sum_{k=1}^n H(X_k \mid \mathcal A_{k-1})
= U_*(G) + \frac{\varepsilon}{2} + \frac{H^n(\sigma : t_\sigma)}{\kappa(\varepsilon) n}
\le U_*(G) + \frac{\varepsilon}{2} + \frac{H^n(\sigma)}{\kappa(\varepsilon) n}
\le U_*(G) + \frac{\varepsilon}{2} + \frac{\gamma(\varepsilon)}{\kappa(\varepsilon)}
= U_*(G) + \varepsilon.
\]
This completes the proof. Q.E.D.
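The strategy $t_\sigma$ constructed in the proof is a stage-by-stage (myopic) best response to player 1's behavioral strategy. The sketch below is our own illustration of this construction on a hypothetical $2\times 2$ stage game; the payoff matrix, the rule sigma_k, and all names are assumptions and not taken from the paper.

```python
# Sketch of the stage-wise best response t_sigma used in the proof above:
# at each history, player 2 picks the action b minimizing the expected stage
# payoff against player 1's mixed action at that history.
import random

A, B = ['T', 'Bot'], ['L', 'R']
r = {('T', 'L'): 1.0, ('T', 'R'): 0.0,       # hypothetical stage payoffs r(a, b)
     ('Bot', 'L'): 0.0, ('Bot', 'R'): 1.0}   # here max_a min_b r(a, b) = 0

def sigma_k(history):
    """Player 1's behavioral strategy (illustrative): nearly pure everywhere."""
    p = 0.95 if not history or history[-1][1] == 'L' else 0.99
    return {'T': p, 'Bot': 1.0 - p}

def t_sigma(history):
    """Myopic best response: minimize E_{a ~ sigma_k(history)} r(a, b) over b."""
    alpha = sigma_k(history)
    return min(B, key=lambda b: sum(alpha[a] * r[(a, b)] for a in A))

# Average of the conditional expected stage payoffs along one sampled path.
random.seed(0)
n, total, history = 20, 0.0, ()
for _ in range(n):
    alpha, b = sigma_k(history), t_sigma(history)
    total += sum(alpha[a] * r[(a, b)] for a in A)
    a = random.choices(A, weights=[alpha[x] for x in A])[0]
    history += ((a, b),)
print(total / n)   # small: close to max_a min_b r(a, b) = 0
```

The closer $\sigma_k$ is to a pure action at each history, the closer the resulting stage payoffs are to $\max_a\min_b r(a,b)$; Lemma 6.1 quantifies this through the entropy of $\sigma_k$.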

6.1. The Finitely Repeated Game $G^n(\eta)$

We will modify the finitely repeated game $G^n$ by restricting the set of player 1's strategies. This is done by imposing an exogenous bound on the $n$-strategic entropy of his strategies. Player 2's strategy set remains intact.

For $\eta > 0$, let $S^n(\eta)$ be the set of player 1's strategies whose $n$-strategic entropy is at most $\eta$. That is, $S^n(\eta) = \{\sigma \in \Delta(S) \mid H^n(\sigma) \le \eta\}$. Let $G^n(\eta)$ be the game $(S^n(\eta), \Delta(T), \bar r_n)$. Since $H^n(\sigma)$ is continuous in $\sigma$ (Proposition 4.2(1)), $S^n(\eta)$ is a compact subset of $\Delta(S)$. Recall that $\bar r_n$ is continuous on $\Delta(S) \times T$. Thus, for each $\sigma \in \Delta(S)$, $\bar r_n(\sigma, \cdot)$ is continuous on the compact set $T$ and hence $\min_{t \in T}\bar r_n(\sigma, t)$ is well defined and continuous in $\sigma$. As $S^n(\eta)$ is compact, the maxmin value of $G^n(\eta)$,
\[
W^n(\eta) = \max_{\sigma \in S^n(\eta)} \min_{t \in T} \bar r_n(\sigma, t),
\]
is well defined.

As illustrated in the next example, the set $S^n(\eta)$ is in general not convex. Therefore, the minimax theorem does not apply to the game $G^n(\eta)$. Consequently, the value of $G^n(\eta)$ need not exist. Since pure strategies have zero $n$-strategic entropy, $S \subset S^n(\eta)$ for every $\eta \ge 0$. So the minimax value of $G^n$, $V^n = \min_{t \in \Delta(T)}\max_{s \in S}\bar r_n(s, t)$, also well defined by an argument similar to the above, remains the minimax value of $G^n(\eta)$. But the maxmin value $W^n(\eta)$ is in general smaller than $V^n$.

EXAMPLE 6.1. Consider the matching pennies:
\[
\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}
\]
In the one-shot game, $H^1(\sigma)$ is simply the entropy of the first-stage action. So for all $\eta \ge 1$ ($= \log 2$), $S^1(\eta)$ coincides with the whole set of mixed strategies. Fix a positive number $\eta < 1$. Let $p$ be the probability of choosing the top row. For every $0 \le \gamma < 1$ there is $h = h(\gamma) < \frac{1}{2}$ such that $\{h, 1-h\} = H^{-1}(\gamma)$ (see also Fig. 1). Thus, with $h = h(\eta)$, $S^1(\eta)$ is the union of two disjoint intervals: $S^1(\eta) = \{(p, 1-p) \mid 0 \le p \le h\} \cup \{(p, 1-p) \mid 1-h \le p \le 1\}$. It is easy to see that $W^1(\eta) = -1 + 2h < 0 = V^1 = \mathrm{Val}(G^1)$.
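A numerical sketch of this computation (our own illustration; the helper names are ours): invert the binary entropy function on $[0, \frac{1}{2}]$ by bisection to find $h$ with $H(h) = \eta$, and check that the best player 1 can do against a best-responding opponent, within the two admissible intervals, is $-1 + 2h$.

```python
# Our illustration of Example 6.1: W^1(eta) = -1 + 2h, where H(h) = eta, h <= 1/2.
from math import log2

def H2(p):                       # binary entropy in bits
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def h_of(eta, tol=1e-12):
    """Invert the binary entropy on [0, 1/2] by bisection: H2(h) = eta."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if H2(mid) < eta:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def payoff_against_best_response(p):
    """Matching pennies: player 2 best-responds to the mixture (p, 1-p)."""
    return min(p * 1 + (1 - p) * (-1),      # player 2 plays the first column
               p * (-1) + (1 - p) * 1)      # player 2 plays the second column

for eta in (0.1, 0.5, 0.9):
    h = h_of(eta)
    # the payoff is monotone toward p = 1/2 on each admissible interval,
    # so the maximum over S^1(eta) is attained at p = h or p = 1 - h
    W1 = max(payoff_against_best_response(p) for p in (h, 1 - h))
    print(eta, round(h, 4), round(W1, 4), round(-1 + 2 * h, 4))
```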

Consider the bound on the $n$-strategic entropy to be a function of the number of repetitions, i.e., $\eta\colon \mathbb N \to \mathbb R$. We are interested in the asymptotics of $W^n(\eta(n))$. The next theorem asserts that if $\eta(n)$ grows more slowly than any linear function of $n$, then the unrestricted player 2 can asymptotically hold player 1's payoff down to his maxmin value in pure actions of the stage game.

THEOREM 6.1. If $\lim_{n\to\infty}\eta(n)/n = 0$, then $\lim_{n\to\infty} W^n(\eta(n)) = U_*(G)$.

Proof. Player 1 can guarantee himself at least $U_*(G)$ at every stage using a pure action, and hence $W^n(\eta(n)) \ge U_*(G)$ for every $n$. It follows that $\liminf_{n\to\infty} W^n(\eta(n)) \ge U_*(G)$.

Next, take $\varepsilon > 0$ arbitrarily. Let $\gamma(\varepsilon) > 0$ be as specified in Lemma 6.1. By the condition of the theorem, there is $n(\varepsilon)$ such that, for every $n \ge n(\varepsilon)$, we have $\eta(n) \le \gamma(\varepsilon) n$. Then by Lemma 6.1, for every $n \ge n(\varepsilon)$ and every $\sigma \in S^n(\eta(n))$, $\bar r_n(\sigma, t_\sigma) \le U_*(G) + \varepsilon$, where $t_\sigma$ is as defined in the proof of the lemma. This shows that $\limsup_{n\to\infty} W^n(\eta(n)) \le U_*(G) + \varepsilon$. As this inequality holds for every $\varepsilon > 0$, we conclude that $\limsup_{n\to\infty} W^n(\eta(n)) \le U_*(G)$, which completes the proof. Q.E.D.

It is easy to see that if $\eta(n)/n$ is bounded away from zero, then $W^n(\eta(n))$ is bounded away from $U_*(G)$, provided, of course, that there is a feasible payoff in $G$ for player 1 which is strictly greater than $U_*(G)$. Consider, for example, the matching pennies of Example 6.1. Suppose that $\eta(n) = \kappa n$ for every $n$, for some $0 < \kappa \le 1$. Let $p \in (0, \frac{1}{2}]$ be such that $H(p) = \kappa$ and define $\sigma = (\sigma_k)_{k=1}^\infty$ by $\sigma_k(\cdot) = (p, 1-p)$. Then $H^n(\sigma) = \kappa n$ and $\min_{b \in B} E_{\sigma_k}[r(a_k, b)] = -1 + 2p$ for every $k$. Therefore $\min_{t \in T}\bar r_n(\sigma, t) = -1 + 2p > -1$ and thus $W^n(\eta(n)) > -1$.


6.2. The Discounted Game $G^*_\lambda(\eta)$

In this section we consider the $\lambda$-discounted game. In contrast to the previous section, we will impose a bound on the total strategic entropy of player 1's strategies. Again, player 2's strategy set remains intact. We will also show that bounding the strategic entropy rate is not an essential restriction in the discounted game. Let us start with a lemma which is an analogue of Lemma 6.1.

LEMMA 6.2. $\forall \varepsilon > 0$, $\exists \theta(\varepsilon) > 0$ such that $\forall \sigma \in \Delta(S)$, $\exists t \in T$ such that
\[
r_\lambda(\sigma, t) \le U_*(G) + \varepsilon
\]
whenever $\lambda \in [0,1)$ and $H^*(\sigma) \le \theta(\varepsilon)/(1-\lambda)$.

Proof. As a preparatory step, we express the $\lambda$-discounted payoff as an average of $n$-average payoffs.$^7$

$^7$ Although it is a well-known formula of Abel's partial summation, we present it here explicitly with our payoff function to facilitate the reading of the proof.

Fix $(\sigma, t) \in \Delta(S) \times \Delta(T)$. First,
\[
E_{\sigma,t}[r(a_n, b_n)]
= \sum_{k=1}^{n} E_{\sigma,t}[r(a_k, b_k)] - \sum_{k=1}^{n-1} E_{\sigma,t}[r(a_k, b_k)]
= n\,\bar r_n(\sigma, t) - (n-1)\,\bar r_{n-1}(\sigma, t),
\]
where we set $\bar r_0(\sigma, t) = 0$. From this it follows that, for each $m \in \mathbb N$,
\[
E_{\sigma,t}\Bigl[(1-\lambda)\sum_{n=1}^{m}\lambda^{n-1} r(a_n, b_n)\Bigr]
= (1-\lambda)\sum_{n=1}^{m}\lambda^{n-1} E_{\sigma,t}[r(a_n, b_n)]
= (1-\lambda)\sum_{n=1}^{m}\lambda^{n-1}\bigl\{n\,\bar r_n(\sigma,t) - (n-1)\,\bar r_{n-1}(\sigma,t)\bigr\}
\]
\[
= (1-\lambda)\sum_{n=1}^{m}\lambda^{n-1} n\,\bar r_n(\sigma,t) - (1-\lambda)\sum_{n=1}^{m}\lambda^{n} n\,\bar r_n(\sigma,t) + (1-\lambda)\lambda^{m} m\,\bar r_m(\sigma,t)
= (1-\lambda)\sum_{n=1}^{m}(\lambda^{n-1}-\lambda^{n})\, n\,\bar r_n(\sigma,t) + (1-\lambda)\lambda^{m} m\,\bar r_m(\sigma,t).
\]
Since $0 \le \lambda < 1$ and $|r(a_n, b_n)| \le K = \max_{a,b}|r(a,b)|$ for every $n \in \mathbb N$, the left-hand side of the above equality converges to $r_\lambda(\sigma,t)$ as $m$ tends to infinity. On the other hand, $\bar r_n(\sigma, t) \le K$ for every $n \in \mathbb N$ and $\lim_{m\to\infty}\lambda^{m} m = 0$. Therefore we have obtained the following equality:
\[
r_\lambda(\sigma, t) = (1-\lambda)\sum_{n=1}^{\infty}(\lambda^{n-1}-\lambda^{n})\, n\, \bar r_n(\sigma, t). \tag{6.1}
\]
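Identity (6.1) is a summation-by-parts fact about the payoff stream alone, so it can be checked numerically. The sketch below is our own illustration: it compares the two sides for a randomly generated bounded sequence of expected stage payoffs, truncating both series at a large $N$.

```python
# Numerical check of identity (6.1) (our own sketch): for any bounded sequence
# of expected stage payoffs x_1, x_2, ..., the discounted average
# (1 - lam) * sum_n lam^(n-1) * x_n equals
# (1 - lam) * sum_n (lam^(n-1) - lam^n) * n * xbar_n,
# where xbar_n is the average of the first n stage payoffs.
import random

random.seed(2)
lam, N = 0.95, 1000                      # truncate both series at N terms
x = [random.uniform(-1, 1) for _ in range(N)]

lhs = (1 - lam) * sum(lam ** (n - 1) * x[n - 1] for n in range(1, N + 1))

partial, rhs = 0.0, 0.0
for n in range(1, N + 1):
    partial += x[n - 1]
    xbar_n = partial / n
    rhs += (1 - lam) * (lam ** (n - 1) - lam ** n) * n * xbar_n

print(abs(lhs - rhs) < 1e-6)             # True up to the negligible truncation term
```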

Assume, again, that $0 \le r \le 1$. Let $\varepsilon > 0$ be given. Lemma 6.1 implies that there is $\gamma(\varepsilon) > 0$ such that for every $n$ and $\sigma \in \Delta(S)$
\[
\bar r_n(\sigma, t_\sigma) \le U_*(G) + \frac{\varepsilon}{2} + I\bigl(H^n(\sigma) \ge \gamma(\varepsilon) n\bigr). \tag{6.2}
\]
Using (6.1) and (6.2), and as $\sum_{n=1}^{\infty}(1-\lambda)(\lambda^{n-1}-\lambda^{n})\, n = 1$, we have
\[
r_\lambda(\sigma, t_\sigma)
\le (1-\lambda)\sum_{n=1}^{\infty}(\lambda^{n-1}-\lambda^{n})\, n\Bigl(U_*(G) + \frac{\varepsilon}{2} + I\bigl(H^n(\sigma) \ge \gamma(\varepsilon) n\bigr)\Bigr)
= U_*(G) + \frac{\varepsilon}{2} + (1-\lambda)\sum_{n=1}^{\infty}(\lambda^{n-1}-\lambda^{n})\, n\, I\bigl(H^n(\sigma) \ge \gamma(\varepsilon) n\bigr). \tag{6.3}
\]
If $\sigma$ is such that $H^*(\sigma) \le \gamma(\varepsilon)\varepsilon/[2(1-\lambda)]$, then for every $n > \varepsilon/[2(1-\lambda)]$ we have
\[
H^n(\sigma) \le H^*(\sigma) \le \frac{\gamma(\varepsilon)\varepsilon}{2(1-\lambda)} < \gamma(\varepsilon)\, n,
\]
that is, $I(H^n(\sigma) \ge \gamma(\varepsilon) n) = 0$. Write $n(\varepsilon,\lambda)$ for $\varepsilon/[2(1-\lambda)]$ and $\theta(\varepsilon)$ for $\gamma(\varepsilon)\varepsilon/2$. Then, for every $\sigma$ with $H^*(\sigma) \le \theta(\varepsilon)/(1-\lambda)$, the last term in (6.3) is bounded by
\[
(1-\lambda)\sum_{n=1}^{n(\varepsilon,\lambda)}(\lambda^{n-1}-\lambda^{n})\, n \le (1-\lambda)\, n(\varepsilon,\lambda) \le \frac{\varepsilon}{2},
\]
which, together with (6.3), completes the proof. Q.E.D.

For $\eta \ge 0$ let $G^*_\lambda(\eta)$ be the $\lambda$-discounted game in which player 1's strategies are restricted to those with total strategic entropy at most $\eta$. Set $S^*(\eta) = \{\sigma \in \Delta(S) \mid H^*(\sigma) \le \eta\}$ and $G^*_\lambda(\eta) = (S^*(\eta), \Delta(T), r_\lambda)$. Notice that $S^*(\eta) = \bigcap_n S^n(\eta)$. As each $S^n(\eta)$ is compact, $S^*(\eta)$ is also compact. By an argument analogous to the one in the previous section, one can show that the maxmin value of $G^*_\lambda(\eta)$,
\[
W^*_\lambda(\eta) = \max_{\sigma \in S^*(\eta)} \min_{t \in T} r_\lambda(\sigma, t),
\]


is well defined. We will consider the bound on the total strategic entropy $\eta$ to be a function of the discount factor, $\eta\colon [0,1) \to \mathbb R_+$, and study the asymptotics of $W^*_\lambda(\eta(\lambda))$ as $\lambda \to 1$.

Note that a $\lambda$-discounted game can be viewed as a finitely repeated game whose number of repetitions is a geometrically distributed random variable with mean $1/(1-\lambda)$. With this view, the next theorem is comparable to Theorem 6.1. It asserts that if $\eta(\lambda)$ grows more slowly than any linear function of $1/(1-\lambda)$, then $W^*_\lambda(\eta(\lambda))$ converges to $U_*(G)$ as $\lambda$ tends to 1.

THEOREM 6.2. If $\lim_{\lambda\to 1}(1-\lambda)\eta(\lambda) = 0$, then $\lim_{\lambda\to 1} W^*_\lambda(\eta(\lambda)) = U_*(G)$.

Proof. Since player 1 can guarantee $U_*(G)$ by a pure strategy and $S \subset S^*(\eta(\lambda))$ for every $\lambda \in [0,1)$, we have $\liminf_{\lambda\to 1} W^*_\lambda(\eta(\lambda)) \ge U_*(G)$. Next we show that $\limsup_{\lambda\to 1} W^*_\lambda(\eta(\lambda)) \le U_*(G)$.

Let $\varepsilon > 0$ be given. By the condition of the theorem there is $\lambda(\varepsilon) \in [0,1)$ such that, for every $\lambda \in [\lambda(\varepsilon), 1)$, we have $\eta(\lambda) \le \theta(\varepsilon)/(1-\lambda)$, where $\theta(\varepsilon) > 0$ is as specified in Lemma 6.2. Then, for every $\lambda \in [\lambda(\varepsilon), 1)$ and $\sigma \in S^*(\eta(\lambda))$, we have $r_\lambda(\sigma, t_\sigma) \le U_*(G) + \varepsilon$. This shows that $\limsup_{\lambda\to 1} W^*_\lambda(\eta(\lambda)) \le U_*(G) + \varepsilon$. As $\varepsilon > 0$ was taken arbitrarily, we have the desired result. Q.E.D.

To conclude this subsection we demonstrate that in the $\lambda$-discounted game there is an $\varepsilon$-optimal strategy with zero strategic entropy rate for every $\varepsilon > 0$. Therefore, putting a bound on the strategic entropy rate is not a restriction in the discounted games.

OBSERVATION. $\forall \lambda \in [0,1)$, $\forall \varepsilon > 0$, $\exists \sigma \in \Delta(S)$ such that (i) $H^\infty(\sigma) = 0$ and (ii) $\forall t \in T$, $r_\lambda(\sigma, t) \ge \mathrm{Val}(G) - \varepsilon$.

Proof. Without loss of generality, assume that $0 \le r \le 1$. Given $\lambda \in [0,1)$ and $\varepsilon > 0$, choose $m$ large enough so that $(1-\lambda)\sum_{k=m+1}^{\infty}\lambda^{k-1} < \varepsilon$. Let $a^*$ be an optimal mixed action of player 1 in the stage game $G$, i.e., $E_{a^*}[r(a,b)] \ge \mathrm{Val}(G)$ for every $b \in B$. Define $\sigma = \{\sigma_n\}_{n=1}^\infty$ as follows. For $n = 1, \ldots, m$, $\sigma_n = a^*$ regardless of the history, and for $n > m$, $\sigma_n$ is an arbitrary pure action.

Observe that, for any $t \in T$, if $(X_n)_{n=1}^\infty$ is the random play induced by $(\sigma, t)$, then $(1/n) H^n(\sigma : t) = (1/n) H(X_1, \ldots, X_n) = (m/n) H(a^*) \to 0$ as $n \to \infty$. Hence (i). By playing $\sigma$, regardless of player 2's strategy, player 1 receives an expected payoff of at least $\mathrm{Val}(G)$ at every stage up to stage $m$ and at least 0 thereafter. Therefore, by the choice of $m$, player 1's discounted payoff is at least $(1-\varepsilon)\mathrm{Val}(G)$, which in turn is at least $\mathrm{Val}(G) - \varepsilon$ under the assumption $r \le 1$. This proves (ii). Q.E.D.
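A sketch of the construction in this proof (our own illustration; the discount factor, $\varepsilon$, and the stage-game value val_G are hypothetical numbers): choose $m$ with $\lambda^m < \varepsilon$, which is exactly the condition $(1-\lambda)\sum_{k=m+1}^{\infty}\lambda^{k-1} < \varepsilon$, play a fully mixed optimal action for the first $m$ stages and a fixed pure action afterwards, and observe that the entropy rate vanishes while the discounted guarantee loses at most $\varepsilon$.

```python
# Sketch of the strategy in the Observation above (illustrative numbers only):
# play an optimal mixed stage action a* for the first m stages, then a fixed
# pure action.
lam, eps = 0.9, 0.05

# (1 - lam) * sum_{k > m} lam^(k-1) = lam^m, so choose m with lam^m < eps.
m = 0
while lam ** m >= eps:
    m += 1
print(m, lam ** m)                   # m = 29, tail weight < 0.05

# Entropy rate: only the first m stages are randomized, so the n-stage play
# entropy is m * H(a*) for n >= m and (1/n) * H^n(sigma : t) -> 0.
H_astar = 1.0                        # e.g. a fully mixed a* over two actions
for n in (m, 10 * m, 100 * m):
    print(n, m * H_astar / n)

# Payoff: with 0 <= r <= 1, the first m stages carry discounted weight
# 1 - lam^m and each yields at least Val(G), the rest at least 0, so the
# discounted payoff is at least (1 - lam^m) * Val(G) >= Val(G) - eps.
val_G = 0.5                          # hypothetical value of the stage game
print((1 - lam ** m) * val_G, val_G - eps)
```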


The above phenomenon is due to the facts that (a) the definition of the strategic entropy rate does not impose any restriction on the entropy of the induced play over any finite number of stages and (b) with discounting, payoffs in the distant future have little effect on the present value. We will see in the next section that in the undiscounted games, where payoffs in a finite number of stages have no effect on the overall payoff, a bound on the strategic entropy rate has an appreciable effect on what the restricted player can guarantee in the long run.

6.3. The Undiscounted Game $G^\infty(\gamma)$

For each $\gamma \ge 0$ the game $G^\infty(\gamma)$ is the infinitely repeated version of $G$ in which player 1's strategies are restricted to those that have strategic entropy rate at most $\gamma$. Define $S^\infty(\gamma) = \{\sigma \in \Delta(S) \mid H^\infty(\sigma) \le \gamma\}$. The set of strategies available to player 2 remains $\Delta(T)$.

For a payoff $x$ to be a maxmin value of an infinitely repeated game, it is natural to require two things. First, player 1 should be able to "guarantee" himself $x$ in the sense that he has a strategy that yields something very close to $x$ in the long run regardless of player 2's strategy. Such a strategy may depend on how close the payoff should be to $x$ ($\varepsilon$-optimality). Second, player 2 should be able to "defend" $x$ in the sense that, for every strategy of player 1, player 2 has a counterstrategy that prevents player 1 from gaining substantially more than $x$ in the long run. Again, such a counterstrategy may depend not only on player 1's strategy but also on how close the resulting payoff should be to $x$. The next theorem asserts in particular that the maxmin value of $G^\infty(0)$ equals $U_*(G)$. It asserts, moreover, that the payoff $U_*(G)$ has stronger properties than the two requirements defining the maxmin value.

THEOREM 6.3. (i) $\exists \sigma \in S^\infty(0)$ such that $\forall t \in T$, $\forall n$, $\bar r_n(\sigma, t) \ge U_*(G)$.

(ii) $\forall \sigma \in S^\infty(0)$, $\exists t \in T$ such that $\forall \varepsilon > 0$ $\exists n(\varepsilon)$ such that $\forall n > n(\varepsilon)$, $\bar r_n(\sigma, t) \le U_*(G) + \varepsilon$.

Proof. Player 1 can obtain at least $U_*(G)$ at every stage using a pure action. This proves (i). To prove (ii), take $\sigma$ with $H^\infty(\sigma) = 0$. Let $t_\sigma \in T$ be as in Lemma 6.1, and let $\varepsilon > 0$ be given. Then by the definition of $H^\infty(\sigma)$ there is $n(\varepsilon)$ such that, for every $n \ge n(\varepsilon)$, $H^n(\sigma : t_\sigma) \le \gamma(\varepsilon) n$, where $\gamma(\varepsilon)$ is as specified in Lemma 6.1. Therefore, $\bar r_n(\sigma, t_\sigma) \le U_*(G) + \varepsilon$. This completes the proof. Q.E.D.


7. BOUNDED COMPLEXITY AND STRATEGIC ENTROPY

In this section we will apply Theorems 6.1 and 6.2 to repeated games with finite automata and bounded recall. The fact that any probability distribution over $k$ points has entropy at most $\log k$ plays a crucial role in the subsequent discussion.

7.1. Finite Automata

Given a stage game $G = (A, B, r)$, an automaton of player 1 is a four-tuple $M = \langle Q, q_1, f, g\rangle$, where $Q$ is a set of states, $q_1 \in Q$ is an initial state, $f\colon Q \to A$ is an action function, and $g\colon Q \times B \to Q$ is a transition function. (Player 2's automaton is defined by substituting $A$ with $B$.) By the size of an automaton we mean the cardinality of the set of its states, $|Q|$, which may be infinite. A finite automaton is an automaton of a finite size.

An automaton $M$ plays a repeated game as follows. At each stage $n$ it takes the action prescribed by $f$ for the current state, say $q_n$, i.e., $f(q_n)$; it is set to $q_1$ at the first stage. Then it changes its state to $q_{n+1}$, specified by $g$ as a function of the current state $q_n$ and player 2's action $b_n$, that is, $q_{n+1} = g(q_n, b_n)$.

Every automaton $M$ induces a pure strategy $s = (s_n)_{n=1}^\infty$ of player 1 in the repeated game in the following manner. First, for any sequence of player 2's actions $b_1, \ldots, b_n$ define an extension of the transition function inductively by
\[
g(q_1, b_1, \ldots, b_n) = g\bigl(g(q_1, b_1, \ldots, b_{n-1}), b_n\bigr).
\]
Then $s_1 = f(q_1)$ and, for $n > 1$, for each $\omega = ((a_k, b_k))_{k=1}^\infty \in \Omega_\infty$,
\[
s_n(\omega) = f\bigl(g(q_1, b_1, \ldots, b_{n-1})\bigr).
\]
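The following is a minimal sketch, in our own notation, of an automaton $\langle Q, q_1, f, g\rangle$ playing the repeated game and of the induced strategy $s_n(\omega) = f(g(q_1, b_1, \ldots, b_{n-1}))$; the class, the two-state example machine, and the action labels are illustrative assumptions rather than anything specified in the paper.

```python
# A minimal sketch (our notation) of an automaton M = <Q, q1, f, g> and of
# the pure strategy it induces.

class Automaton:
    def __init__(self, states, initial, action_fn, transition_fn):
        self.Q = states                  # set of states
        self.q1 = initial                # initial state
        self.f = action_fn               # f: Q -> A
        self.g = transition_fn           # g: Q x B -> Q

    def extended_transition(self, bs):
        """g(q1, b_1, ..., b_n) defined inductively."""
        q = self.q1
        for b in bs:
            q = self.g(q, b)
        return q

    def induced_action(self, bs_before_n):
        """The induced strategy: the action at stage n depends only on
        player 2's past actions b_1, ..., b_{n-1}."""
        return self.f(self.extended_transition(bs_before_n))

# Example: a two-state 'tit-for-tat-like' machine over A = B = {'C', 'D'}.
M = Automaton(states={'qC', 'qD'},
              initial='qC',
              action_fn=lambda q: 'C' if q == 'qC' else 'D',
              transition_fn=lambda q, b: 'qC' if b == 'C' else 'qD')

print(M.induced_action(()))              # stage 1: f(q1) = 'C'
print(M.induced_action(('D',)))          # stage 2 after b_1 = 'D': 'D'
print(M.induced_action(('D', 'C')))      # stage 3: 'C'
```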

Conversely, every pure strategy of player 1 has a representation by an automaton, possibly of an infinite size. If a pure strategy $s$ is equivalent to a pure strategy induced by an automaton, we say that $s$ is implementable by that automaton. Here we set a bound on the size of automata that player 1 may use so that not all pure strategies are available to him.

For each $m \in \mathbb N$ let $FA_m$ be the set of all pure strategies implementable by automata of size $m$. Denote by $G^n_m$ the game $(FA_m, T, \bar r_n)$, which is the $n$-fold repetition of $G$ in which player 1 is restricted to $FA_m$ (and mixtures over it) while there is no restriction on player 2's strategies. The payoff function $\bar r_n$ is restricted accordingly.

Since $\Delta(FA_m)$ is a compact convex set, the value of $G^n_m$, denoted by $V^n_m$, exists, i.e.,
\[
V^n_m = \min_{t \in \Delta(T)} \max_{s \in FA_m} \bar r_n(s, t) = \max_{\sigma \in \Delta(FA_m)} \min_{t \in T} \bar r_n(\sigma, t).
\]


Thus it is obvious, but nonetheless important to realize, that in studying the value of a game, if it exists, we can focus our attention on either the minimax value or the maxmin value. It is the latter that we utilize in the subsequent discussion.

We consider the bound on the size of automata to be a function of the number of repetitions, $m\colon \mathbb N \to \mathbb N$, and study the asymptotics of $V^n_{m(n)}$. It is easily seen that $|FA_m| \le m^{Cm}$ ($m \ge 2$) for some positive constant $C$, e.g., $C = |B| + \log|A| + 1$. Let $\sigma \in \Delta(FA_m)$ and $H^n(\sigma) = H^n(\sigma : t^*)$. Let $C^*$ be the number of the $(n, t^*)$-equivalence classes of $FA_m$. Obviously $C^* \le |FA_m|$. Then from the discussion of Section 5 and Proposition 3.1(1) it follows that $H^n(\sigma) \le \log C^* \le Cm\log m$. The following corollary is an immediate consequence of this observation and Theorem 6.1.
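One way to obtain a bound of this form (our own sketch, not necessarily the count the authors have in mind): an automaton of size $m$ is determined by an initial state, an action function $f\colon Q \to A$, and a transition function $g\colon Q \times B \to Q$, so $\log|FA_m| \le \log m + m\log|A| + m|B|\log m \le Cm\log m$ with $C = |B| + \log|A| + 1$ for $m \ge 2$. The snippet below checks this arithmetic over a sampled range of parameters.

```python
# Our sketch of the counting bound behind |FA_m| <= m^(C*m):
#   log|FA_m| <= log m + m*log|A| + m*|B|*log m <= C*m*log m,
# with C = |B| + log|A| + 1, for every m >= 2.
from math import log2

def log_count(m, sizeA, sizeB):
    # initial state: m choices; f: |A|^m choices; g: m^(m*|B|) choices
    return log2(m) + m * log2(sizeA) + m * sizeB * log2(m)

def bound(m, sizeA, sizeB):
    C = sizeB + log2(sizeA) + 1
    return C * m * log2(m)

for sizeA, sizeB in [(2, 2), (3, 5), (10, 2)]:
    assert all(log_count(m, sizeA, sizeB) <= bound(m, sizeA, sizeB)
               for m in range(2, 2000))
print("bound log|FA_m| <= C*m*log(m) verified on the sampled range")
```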

COROLLARY 7.1. If $\lim_{n\to\infty} m(n)\log m(n)/n = 0$, then $\lim_{n\to\infty} V^n_{m(n)} = U_*(G)$.

Proof. Obviously $V^n_{m(n)} \ge U_*(G)$ for every $n$ and $m(n)$. As noted above, $\Delta(FA_{m(n)}) \subset S^n(Cm(n)\log m(n))$, and since the maxmin value is nondecreasing in the range over which the max is taken,
\[
V^n_{m(n)} = \max_{\sigma \in \Delta(FA_{m(n)})} \min_{t \in T} \bar r_n(\sigma, t) \le W^n\bigl(Cm(n)\log m(n)\bigr).
\]
Assume that $m(n)\log m(n)/n \to 0$ as $n \to \infty$. By Theorem 6.1 the right-hand side of the above inequality converges to $U_*(G)$ as $n \to \infty$. Thus $V^n_{m(n)}$ converges to $U_*(G)$ as $n \to \infty$. Q.E.D.

In Neyman (1997) it is conjectured that if $m(n) = o(n/\log n)$, then $V^n_{m(n)}$ converges to $U_*(G)$ as $n$ tends to infinity. Since $m(n) = o(n/\log n)$ is equivalent to $m(n)\log m(n) = o(n)$, Corollary 7.1 affirms the truth of the conjecture. Note that Corollary 7.1 gives only a sufficient condition for $V^n_{m(n)}$ to converge to $U_*(G)$. It is also conjectured in Neyman (1997) that if $\lim_{n\to\infty} n/(m(n)\log n) = 0$, then $V^n_{m(n)}$ converges to $\mathrm{Val}(G)$.

Next we consider the game $G_{\lambda, m(\lambda)} = (FA_{m(\lambda)}, T, r_\lambda)$, the $\lambda$-discounted game in which player 1 is restricted to finite automata of size $m(\lambda)$ (and mixtures of them), which is a function of $\lambda$. Denote by $V_{\lambda, m(\lambda)}$ the value of $G_{\lambda, m(\lambda)}$. Since $H^n(\sigma) \le Cm\log m$ for every $\sigma \in \Delta(FA_m)$, by taking the supremum over $n$, we have $H^*(\sigma) \le Cm\log m$. That is, $\Delta(FA_m) \subset S^*(Cm\log m)$. The next corollary follows from this and Theorem 6.2. The proof is analogous to that of Corollary 7.1 and is omitted.

COROLLARY 7.2. If $\lim_{\lambda\to 1}(1-\lambda)m(\lambda)\log m(\lambda) = 0$, then $\lim_{\lambda\to 1} V_{\lambda, m(\lambda)} = U_*(G)$.


7.2. Bounded Recall

A pure strategy $s = (s_n)_{n=1}^\infty$ of player 1 in a repeated game is said to be of bounded recall of size $l$, or simply of $l$-recall, if its choice of action at each stage depends only on the actions taken by both players in the last $l$ stages. Formally, such a strategy is represented by a function $z\colon (A\times B)^l \to A$ and an initial memory $e = (e_1, \ldots, e_l) \in (A\times B)^l$, where, for each $\omega = (\omega_k)_{k=1}^\infty \in \Omega_\infty$,
\[
s_n(\omega) =
\begin{cases}
z(e_n, \ldots, e_l, \omega_1, \ldots, \omega_{n-1}) & \text{if } n \le l\\
z(\omega_{n-l}, \ldots, \omega_{n-1}) & \text{if } n > l.
\end{cases}
\]
We will write $s = (e, z)$ and denote the set of all pure strategies of $l$-recall by $BR_l$.

Denote by $G^{n,l}$ the game $(BR_l, T, \bar r_n)$, which is the $n$-fold repetition of $G$ in which player 1 is restricted to $BR_l$ (and mixtures over it) while there is no restriction on player 2's strategies. The payoff function $\bar r_n$ is restricted accordingly. As in $G^n_m$, the value of $G^{n,l}$ exists:
\[
V^{n,l} = \min_{t \in \Delta(T)} \max_{s \in BR_l} \bar r_n(s, t) = \max_{\sigma \in \Delta(BR_l)} \min_{t \in T} \bar r_n(\sigma, t).
\]

Given $s = (e, z) \in BR_l$, we can construct an automaton $M = \langle Q, q_1, f, g\rangle$ of size $m(l) = |A\times B|^l$ that implements $s$. Set $Q = (A\times B)^l$, $q_1 = e$, and $f = z$. For $\omega = (\omega_1, \ldots, \omega_l) \in Q$ and $b \in B$, define $g(\omega, b) = (\omega_2, \ldots, \omega_l, (f(\omega), b))$. Therefore, by identifying strategies with their equivalence classes, we have $BR_l \subset FA_{m(l)}$. The next corollary, stated in Neyman (1997) as a conjecture, now follows from this and Corollary 7.1. We omit an easy proof.

COROLLARY 7.3. There is a positive constant $K$ such that if $n\colon \mathbb N \to \mathbb N$ satisfies $n(l) > \exp(Kl)$, then $\lim_{l\to\infty} V^{n(l), l} = U_*(G)$.

Take, for example, $K = \log|A\times B| + 1$.
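The shift-register construction of an automaton from an $l$-recall strategy described above can be sketched as follows (our own illustration; the toy recall rule z, the action labels, and all names are assumptions): the states are the $|A\times B|^l$ possible memory windows, the action function is $z$ itself, and the transition shifts the window and appends the newest action pair.

```python
# Our sketch of the shift-register automaton implementing an l-recall strategy (e, z).
from itertools import product

A, B, l = ['C', 'D'], ['C', 'D'], 2

def z(window):
    """A toy l-recall rule: cooperate iff player 2 cooperated in the last l stages."""
    return 'C' if all(b == 'C' for (_, b) in window) else 'D'

e = (('C', 'C'),) * l                      # initial memory

# The automaton: Q = (A x B)^l, q1 = e, f = z, and the transition shifts the
# window and appends the newest action pair (f(q), b).
Q = list(product(product(A, B), repeat=l))
q1, f = e, z
def g(q, b):
    return q[1:] + ((f(q), b),)

assert len(Q) == (len(A) * len(B)) ** l    # size m(l) = |A x B|^l

# Playing against a fixed sequence of player 2's actions:
q, moves = q1, []
for b in ['C', 'D', 'C', 'C']:
    moves.append(f(q))
    q = g(q, b)
print(moves)                               # ['C', 'C', 'D', 'D']
```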

8. CONCLUDING REMARKS

We have defined strategic entropy for two-person games. The definition can be readily extended to $n$-person games. Given a vector of mixed or behavioral strategies $\sigma = (\sigma^1, \ldots, \sigma^n)$, let $H^n(\sigma^{-i} : s^i)$ be the entropy of the play up to stage $n$ induced by $(\sigma^{-i} : s^i)$, in which player $i$ plays a pure strategy $s^i$ and player $j \ne i$ plays $\sigma^j$. For a mixed strategy $t^i$ of player $i$, define $H^n(\sigma^{-i} : t^i)$ to be the average of $H^n(\sigma^{-i} : s^i)$ with respect to $t^i$. This is the average amount of information on the other players' strategies $\sigma^{-i}$ that player $i$ can obtain in the first $n$ stages using $t^i$. One then defines the $n$-strategic entropy of $\sigma^{-i}$ by the maximum of $H^n(\sigma^{-i} : s^i)$ over all pure strategies of player $i$. The total strategic entropy and the strategic entropy rate of $\sigma$ for player $i$ are defined as in Section 4.

It is also of interest to extend the strategic entropy concept so as to measure the uncertainty of the play that a group of players collectively faces against (possibly correlated) strategies used by another group of players, or the amount of information that a group of players can obtain about strategies used by another group of players. One such extension is as follows.

Given a subset $J$ of the $n$ players, let $\sigma^J$ be a correlated strategy of $J$ and let $\sigma^{-J}$ be a correlated strategy of the players not in $J$. For each pure strategy vector $s^J = (s^i)_{i\in J}$ define $H^n(\sigma^{-J} : s^J)$ to be the entropy of the play up to stage $n$ induced by $(\sigma^{-J}, s^J)$, and let $H^n(\sigma^{-J} : \sigma^J)$ be the expectation of $H^n(\sigma^{-J} : s^J)$ with respect to $\sigma^J$. The latter is the average amount of information on $\sigma^{-J}$ that the coalition $J$ can collectively obtain using $\sigma^J$. Then the $n$-strategic entropy of $\sigma^{-J}$ is defined to be the maximum of $H^n(\sigma^{-J} : \sigma^J)$ over all $\sigma^J$. The study of these entropy-based quantities may be useful in the study of $n$-person games with bounded complexity, in particular, when some mode of correlation is possible through signaling of actions among the members of a coalition even if the rule of the game does not allow correlation in the stage game. For example, consider a three-player game in which there is a bound on the size of automata each player may use. Then the player with the smallest bound may be able to send a signal by means of a long sequence of actions that can be deciphered by the player with the largest bound but not by the one with the middle bound, thereby generating a correlation. The presence of such a phenomenon clearly affects the individually rational level of each player and hence equilibrium payoffs. One such example is given in Neyman (1997).

As an application of the results obtained in this paper to the non-zero-sum case, there are folk-theorem-type results for finitely repeated and $\lambda$-discounted two-person games with finite automata. Given a finite two-person game in strategic form $G = (A_1, A_2, r_1, r_2)$, let $w_1$ be the maxmin value for player 1 in pure actions, i.e., $w_1 = \max_{a\in A_1}\min_{b\in A_2} r_1(a,b)$, and let $u_2$ be the minimax value for player 2, also in pure actions, i.e., $u_2 = \min_{a\in A_1}\max_{b\in A_2} r_2(a,b)$. Let $\tilde F$ be the set of feasible payoff vectors in which player 1 receives at least $w_1$ and player 2 receives at least $u_2$. Denote by $E^n(m(n))$ and $E_\lambda(m(\lambda))$ the sets of equilibrium payoffs of the $n$-times repeated game and the $\lambda$-discounted game, with $G$ as the stage game, in which player 1 is restricted to automata of size $m(n)$ and $m(\lambda)$, respectively. In Neyman and Okada (1998) it is shown that if there is a payoff vector $(x, y)$ in $\tilde F$ such that $x > w_1$, then (1) $E^n(m(n))$ converges in the Hausdorff topology to $\tilde F$ as $n \to \infty$ under the condition that $\lim_{n\to\infty} m(n)\log m(n)/n = 0$, and (2) $E_\lambda(m(\lambda))$ converges also to $\tilde F$ as $\lambda \to 1$ under the condition that $\lim_{\lambda\to 1}(1-\lambda)m(\lambda)\log m(\lambda) = 0$. As usual, part of the proof involves constructing an equilibrium path and punishment strategies for a deviation from it. Theorems 6.1 and 6.2 ensure the existence of effective punishment strategies of player 2.

REFERENCES

Aumann, R. J. (1981). "Survey of Repeated Games," in Essays in Game Theory and Mathematical Economics in Honor of Oskar Morgenstern, pp. 11-42. Zurich: Bibliographisches Institut.

Ben-Porath, E. (1993). "Repeated Games with Finite Automata," J. Econ. Theory 59, 17-32.

Cover, T. M., and Thomas, J. A. (1991). Elements of Information Theory. New York: Wiley.

Kalai, E. (1990). "Bounded Rationality and Complexity in Repeated Games," in Game Theory and Applications (T. Ichiishi, A. Neyman, and Y. Tauman, Eds.). San Diego: Academic Press.

Kalai, E., Samet, D., and Stanford, W. (1988). "A Note on Reactive Equilibria in the Discounted Prisoner's Dilemma and Associated Games," Int. J. Game Theory 3, 117-186.

Lehrer, E. (1988). "Repeated Games with Stationary Bounded Recall Strategies," J. Econ. Theory 46, 130-144.

Lehrer, E. (1994). "Finitely Many Players with Bounded Recall in Infinitely Repeated Games," Games Econ. Behavior 7, 390-405.

Neyman, A. (1985). "Bounded Complexity Justifies Cooperation in the Finitely Repeated Prisoner's Dilemma," Econ. Lett. 19, 227-229.

Neyman, A. (1997). "Cooperation, Repetition, and Automata," in Cooperation: Game-Theoretic Approaches (S. Hart and A. Mas-Colell, Eds.), NATO ASI Series F, Vol. 155, pp. 233-255. Berlin/New York: Springer-Verlag.

Neyman, A. (1998). "Finitely Repeated Games with Finite Automata," Math. Oper. Res., to appear.

Neyman, A., and Okada, D. (1998). "Two-Person Repeated Games with Finite Automata," Discussion Paper 173, Center for Rationality and Interactive Decision Theory, Hebrew Univ. of Jerusalem, Israel.

Papadimitriou, C. H., and Yannakakis, M. (1994). "On Complexity as Bounded Rationality: Extended Abstract," in STOC 94, pp. 726-733.

Shannon, C. E. (1948). "A Mathematical Theory of Communication," Bell System Tech. J. 27, 379-423, 623-656.

Smorodinsky, M. (1971). Ergodic Theory, Entropy, Lecture Notes in Mathematics. Berlin/New York: Springer-Verlag.

Zemel, E. (1989). "Small Talk and Cooperation: A Note on Bounded Rationality," J. Econ. Theory 49, 1-9.

