Near-Optimal Play in a Social Learning Game
Ryan Carr, Eric Raboin, Austin Parker, and Dana Nau
Department of Computer Science, University of Maryland, College Park, MD 20742, USA
{carr2,eraboin,austinjp,nau}@cs.umd.edu
Outline
Introduction
The Social Learning Strategies Tournament
Evaluating Social Learning Strategies
Computing “Near-Optimal” Strategies
Conclusions
Ongoing and Future Work
Introduction
Exploration vs. exploitation in a social setting:
- You are an agent in an unknown environment
- You do not know how to do simple tasks, like finding food or traveling
- You can work out how to do them on your own, or copy others
- What do you do?
The Social Learning Strategies Tournament
- An evolutionary simulation with social interaction
- Designed to examine the biological and cultural evolution of communication
- Developed by the European Commission's Cultaptation Project
- €10,000 prize
- 102 entries from numerous fields (Biology, Physics, Psychology, Sociology, Mathematics, Philosophy, ...)
Our Contributions
- A metric for evaluating social learning strategies: Expected Reproductive Utility (ERU)
- A proof that maximizing ERU maximizes the chance of winning the game
- A formula for calculating ERU to within any ε > 0
- A proof that near-optimal strategies (within any ε > 0) can be found with a finite amount of computation
- An algorithm to do this computation and find the near-optimal strategies
The Social Learning Strategies Tournament: Rules
- Let V be a set of 100 actions, each with some payoff drawn from a probability distribution π
- The environment is populated by 100 agents, each using some strategy
- Agents begin with no knowledge of V or π
- Each round, each agent may:
  - Innovate (I): Learn a random action from V and its payoff
  - Observe (O): Learn an action performed by some other agent, and its payoff
  - Exploit (E): Perform a known action to gain its payoff
- The payoff of an action changes with probability c each round
The Social Learning Strategies Tournament: Rules
- Agents die with a 2% chance every round
- Dead agents are replaced by offspring of living agents
- An agent's chance to reproduce is proportional to its per-round payoff
- The game lasts 10,000 rounds
- Each strategy's score is its average population over the last 2,500 rounds
Example Game
- Strategy I1 – Innovates once, then exploits forever
  - Learns action 1 (payoff 3)
- Strategy I2O – Innovates twice, observes once, then exploits forever
  - Learns action 3 (payoff 8) and action 2 (payoff 5), then observes I1
Outline
Introduction
The Social Learning Strategies Tournament
Evaluating Social Learning Strategies
Computing “Near-Optimal” Strategies
Conclusions
Ongoing and Future Work
Terminology
- A history h describes what moves the agent made each round and their results: a sequence of <move, action, payoff> triples
  - Ex: I1's history: <Inv,1,3>, <X[1],1,3>
- H is the set of all possible histories
- A strategy S is a function that gives the move an agent will make for a given history h: S(h) = move
  - Ex: I1(<Inv,1,3>, <X[1],1,3>) = X[1]
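This terminology maps naturally onto types. The following sketch is illustrative (the Entry field names and the string encoding of moves are assumptions, not the paper's notation):

```python
from typing import Callable, List, NamedTuple

class Entry(NamedTuple):
    """One element of a history h; field names are assumed for illustration."""
    move: str       # 'Inv', 'Obs', or 'X[a]' (exploit action a)
    action: int     # the action learned or performed
    payoff: float   # the payoff observed or earned

History = List[Entry]                  # an element of H
Strategy = Callable[[History], str]    # S(h) = next move

def i1(h: History) -> str:
    """The I1 strategy: innovate on round 1, then exploit the learned action."""
    if not h:
        return 'Inv'
    return f'X[{h[0].action}]'

# Example: after history <Inv,1,3>, I1 exploits action 1
assert i1([Entry('Inv', 1, 3.0)]) == 'X[1]'
```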
Expected Reproductive Utility
- Each round, a single agent reproduces, with probability proportional to its per-round payoff
- ERU measures the expected total per-round payoff the agent will earn in its lifetime
- For a strategy S, ERU(S) is defined as a sum over all possible histories (a reconstruction of the formula is sketched below)
- Note: this is not directly computable, since |H| = ∞
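The equation on this slide did not survive extraction. A plausible reconstruction from the surrounding definitions (consistent with the recursive version on the next slide, but an assumption rather than the paper's exact notation):

```latex
\mathrm{ERU}(S) \;=\; \sum_{h \in H} P(h \mid S)\,\mathbb{E}\big[u(S(h))\big]
```

where P(h | S) is the probability that an agent following S survives to experience history h, and E[u(S(h))] is the expected payoff of the move S prescribes after h.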
Expected Reproductive Utility
- A recursive definition of ERU (reconstructed below)
- Still not computable, since the recursion never terminates
- We can stop the recursion at a fixed depth if we are willing to lose some accuracy
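Again the slide's equation is missing; a hedged reconstruction, where the factor (1-d) accounts for surviving the round and h' ranges over the histories that can follow h given move S(h):

```latex
\mathrm{ERU}(S, h) \;=\; (1-d)\Big[\,\mathbb{E}\big[u(S(h))\big] \;+\; \sum_{h'} P\big(h' \mid h, S(h)\big)\,\mathrm{ERU}(S, h')\Big],
\qquad \mathrm{ERU}(S) \;=\; \mathrm{ERU}(S, \langle\rangle)
```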
Expected Reproductive Utility
- A depth-limited version (reconstructed below)
- Stops the recursion after round k
- Computable, but not exact
- We need an upper bound on the ERU obtained after round k
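A matching reconstruction of the depth-limited recursion, which simply cuts the recursive form off after round k:

```latex
\mathrm{ERU}_k(S, h) \;=\;
\begin{cases}
0, & \text{if } |h| \ge k,\\[4pt]
(1-d)\Big[\,\mathbb{E}\big[u(S(h))\big] + \sum_{h'} P\big(h' \mid h, S(h)\big)\,\mathrm{ERU}_k(S, h')\Big], & \text{otherwise.}
\end{cases}
```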
Bounding ERU
- Each round, an agent dies with probability d, so the probability of living k rounds is (1-d)^k
- Let Q be the reproductive utility gained by exploiting some action, and assume we do so on every round after round k
- The ERU we get from this is Q(1-d)^(k+1) + Q(1-d)^(k+2) + Q(1-d)^(k+3) + ...
- This geometric series converges when 0 < d < 1
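Summing the series in closed form (a standard geometric-series identity, valid whenever 0 < d < 1):

```latex
\sum_{r=k+1}^{\infty} Q\,(1-d)^{r} \;=\; \frac{Q\,(1-d)^{k+1}}{1-(1-d)} \;=\; \frac{Q\,(1-d)^{k+1}}{d}
```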
Bounding ERU
- G(k,v) gives us the ERU gained by exploiting an action with value v on every round after k (the derivation of G(k,v) is in the paper)
- Let v_max be the highest possible payoff in V
- G(k, v_max) is an upper bound on the ERU gained after round k
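Assuming G(k,v) takes the geometric-series closed form derived above (the paper's exact derivation may differ), the bound is one line of code:

```python
def g_upper_bound(k: int, v: float, d: float = 0.02) -> float:
    """Upper bound G(k, v) on the ERU gained after round k by exploiting
    an action of value v forever; assumes G(k, v) = v*(1-d)**(k+1)/d."""
    return v * (1 - d) ** (k + 1) / d
```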
Maximizing ERU Leads to Optimal Performance
Proof intuition (more details in the paper):
- ERU(S) sums, for each round r:
  - A – the probability that an agent lives for r rounds, times
  - B – the expected per-round payoff for S on round r
- Each round, the chance that any agent using S reproduces is proportional to the sum, for all r, of:
  - C – the portion of the population of S agents that have lived for r rounds, times
  - D – the average per-round payoff of agents that have lived r rounds
- In the expected case, A ~ C and B ~ D
Outline
Introduction
The Social Learning Strategies Tournament
Evaluating Social Learning Strategies
Computing “Near-Optimal” Strategies
Conclusions
Ongoing and Future Work
Finding a Near-Optimal Strategy
- Corollary 1: For any error tolerance ε > 0, there is some k' such that G(k', v_max) < ε
- Given k', we can compute a strategy S' that optimizes ERU over the first k' rounds
- ERU(S') must be within ε of optimal
- Algorithm 1 in the paper does this; it is essentially a depth-first search of all <history, action> pairs (sketched below)
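A sketch of the overall procedure, not the paper's Algorithm 1 itself: g_upper_bound is the bound sketched earlier, and moves, outcomes, and payoff are assumed model functions the caller must supply (this is where the detailed environment and opponent models mentioned in the conclusions come in).

```python
def smallest_depth(eps: float, v_max: float, d: float = 0.02) -> int:
    """Corollary 1: find the smallest k' with G(k', v_max) < eps."""
    k = 0
    while g_upper_bound(k, v_max, d) >= eps:
        k += 1
    return k

def best_eru(history, k, moves, outcomes, payoff, d=0.02):
    """Depth-first search over <history, move> pairs: the maximum depth-limited
    ERU achievable from `history` within the first k rounds.
    moves(h) -> iterable of legal moves; payoff(h, m) -> expected immediate
    payoff of m at h; outcomes(h, m) -> [(probability, successor history), ...]."""
    if len(history) >= k:
        return 0.0   # anything earned later is bounded by G(k, v_max) < eps
    best = 0.0
    for m in moves(history):
        value = (1 - d) * (payoff(history, m) + sum(
            p * best_eru(h2, k, moves, outcomes, payoff, d)
            for p, h2 in outcomes(history, m)))
        best = max(best, value)
    return best
```

Because the search branches over every legal move and every possible outcome at each of the k' depths, its running time is exponential in k', which matches the limitation noted in the conclusions.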
Outline
Introduction
The Social Learning Strategies Tournament
Evaluating Social Learning Strategies
Computing “Near-Optimal” Strategies
Conclusions
Ongoing and Future Work
Conclusions
- Studying games like the SLST can help us understand the cultural and biological evolution of communication
- We have:
  - Introduced the ERU metric for social learning strategies
  - Proven that maximizing ERU leads to optimal play in the Social Learning Strategies Tournament
  - Given a formula for computing the ERU of a strategy to within any ε > 0
  - Given an algorithm that, given any ε > 0, will find a strategy whose ERU is within ε of optimal
- Limitations:
  - The strategy-generating algorithm has exponential running time
  - The algorithm needs detailed models of the environment and opponents to guarantee near-optimality
Ongoing and Future Work
Ongoing work:
- Mitigating the exponential running time of our algorithm
- Dealing with the lack of accurate opponent models
- Testing near-optimal strategies against in-house agents
Future work:
- Testing near-optimal strategies against tournament winners
- Dealing with unknown V and π
- Dealing with noise in Observe moves