Near-Optimal Play in a Social Learning Game
Ryan Carr, Eric Raboin, Austin Parker, and Dana Nau
Department of Computer Science, University of Maryland, College Park, MD 20742, USA
{carr2,eraboin,austinjp,nau}@cs.umd.edu
Outline
Introduction
The Social Learning Strategies Tournament
Evaluating Social Learning Strategies
Computing “Near-Optimal” Strategies
Conclusions
Ongoing and Future Work
Introduction
Exploration vs. exploitation in a social setting:
- You are an agent in an unknown environment
- You do not know how to do simple tasks, like finding food or traveling
- You can work out how to do them on your own, or copy others
- What do you do?
The Social Learning Strategies Tournament
- An evolutionary simulation with social interaction
- Designed to examine the biological and cultural evolution of communication
- Developed by the European Commission's Cultaptation Project
- €10,000 prize
- 102 entries from numerous fields (Biology, Physics, Psychology, Sociology, Mathematics, Philosophy, ...)
Our Contributions
- A metric for evaluating social learning strategies: Expected Reproductive Utility (ERU)
- A proof that maximizing ERU maximizes the chance of winning the game
- A formula for calculating ERU to within any ε > 0
- A proof that near-optimal strategies (within any ε > 0) can be found with a finite amount of computation
- An algorithm to do this computation and find the near-optimal strategies
The Social Learning Strategies Tournament: Rules
- Let V be a set of 100 actions, each with some payoff drawn from a probability distribution π
- The environment is populated by 100 agents, each using some strategy
- Agents begin with no knowledge of V or π
- Each round, each agent may:
  - Innovate (I): Learn a random action from V and its payoff
  - Observe (O): Learn an action performed by some other agent, and its payoff
  - Exploit (E): Perform a known action to gain its payoff
- The payoff of an action changes with probability c each round
The Social Learning Strategies Tournament: Rules
- Agents die with a 2% chance every round
- Dead agents are replaced by offspring of living agents
- An agent's chance to reproduce is proportional to its per-round payoff
- The game lasts 10,000 rounds
- Each strategy's score is its average population over the last 2,500 rounds
Example Game
- Strategy I1 – Innovates once, then exploits forever
  - Learns action 1 (payoff 3)
- Strategy I2O – Innovates twice, observes once, then exploits forever
  - Learns action 3 (payoff 8) and action 2 (payoff 5), then observes I1
Outline
Introduction
The Social Learning Strategies Tournament
Evaluating Social Learning Strategies
Computing “Near-Optimal” Strategies
Conclusions
Ongoing and Future Work
Terminology
- A history h describes what moves the agent made each round and their results: a sequence of <move, action, payoff> triples
  - Ex: I1's history: <Inv,1,3>, <X[1],1,3>
- H is the set of all possible histories
- A strategy S is a function that gives the move an agent will make for a given history h: S(h) = move
  - Ex: I1(<Inv,1,3>, <X[1],1,3>) = X[1]
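This terminology maps naturally onto types. The following sketch is illustrative (the Entry field names and the string encoding of moves are assumptions, not the paper's notation):

```python
from typing import Callable, List, NamedTuple

class Entry(NamedTuple):
    """One element of a history h; field names are assumed for illustration."""
    move: str       # 'Inv', 'Obs', or 'X[a]' (exploit action a)
    action: int     # the action learned or performed
    payoff: float   # the payoff observed or earned

History = List[Entry]                  # an element of H
Strategy = Callable[[History], str]    # S(h) = next move

def i1(h: History) -> str:
    """The I1 strategy: innovate on round 1, then exploit the learned action."""
    if not h:
        return 'Inv'
    return f'X[{h[0].action}]'

# Example: after history <Inv,1,3>, I1 exploits action 1
assert i1([Entry('Inv', 1, 3.0)]) == 'X[1]'
```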
Expected Reproductive Utility
- Each round, a single agent reproduces, with probability proportional to its per-round payoff
- ERU measures the expected total per-round payoff the agent will earn in its lifetime
- For a strategy S, ERU(S) is defined as a sum over all possible histories (a reconstruction of the formula is sketched below)
- Note: this is not directly computable, since |H| = ∞
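The equation on this slide did not survive extraction. A plausible reconstruction from the surrounding definitions (consistent with the recursive version on the next slide, but an assumption rather than the paper's exact notation):

```latex
\mathrm{ERU}(S) \;=\; \sum_{h \in H} P(h \mid S)\,\mathbb{E}\big[u(S(h))\big]
```

where P(h | S) is the probability that an agent following S survives to experience history h, and E[u(S(h))] is the expected payoff of the move S prescribes after h.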
Expected Reproductive Utility
- A recursive definition of ERU (reconstructed below)
- Still not computable, since the recursion never terminates
- We can stop the recursion at a fixed depth if we are willing to lose some accuracy
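Again the slide's equation is missing; a hedged reconstruction, where the factor (1-d) accounts for surviving the round and h' ranges over the histories that can follow h given move S(h):

```latex
\mathrm{ERU}(S, h) \;=\; (1-d)\Big[\,\mathbb{E}\big[u(S(h))\big] \;+\; \sum_{h'} P\big(h' \mid h, S(h)\big)\,\mathrm{ERU}(S, h')\Big],
\qquad \mathrm{ERU}(S) \;=\; \mathrm{ERU}(S, \langle\rangle)
```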
Expected Reproductive Utility
- A depth-limited version (reconstructed below)
- Stops the recursion after round k
- Computable, but not exact
- We need an upper bound on the ERU obtained after round k
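A matching reconstruction of the depth-limited recursion, which simply cuts the recursive form off after round k:

```latex
\mathrm{ERU}_k(S, h) \;=\;
\begin{cases}
0, & \text{if } |h| \ge k,\\[4pt]
(1-d)\Big[\,\mathbb{E}\big[u(S(h))\big] + \sum_{h'} P\big(h' \mid h, S(h)\big)\,\mathrm{ERU}_k(S, h')\Big], & \text{otherwise.}
\end{cases}
```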
Bounding ERU
- Each round, an agent dies with probability d, so the probability of living k rounds is (1-d)^k
- Let Q be the reproductive utility gained by exploiting some action, and assume we do so on every round after round k
- The ERU we get from this is Q(1-d)^(k+1) + Q(1-d)^(k+2) + Q(1-d)^(k+3) + ...
- This geometric series converges when 0 < d < 1
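Summing the series in closed form (a standard geometric-series identity, valid whenever 0 < d < 1):

```latex
\sum_{r=k+1}^{\infty} Q\,(1-d)^{r} \;=\; \frac{Q\,(1-d)^{k+1}}{1-(1-d)} \;=\; \frac{Q\,(1-d)^{k+1}}{d}
```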
Bounding ERU
- G(k,v) gives us the ERU gained by exploiting an action with value v on every round after k (the derivation of G(k,v) is in the paper)
- Let v_max be the highest possible payoff in V
- G(k, v_max) is an upper bound on the ERU gained after round k
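Assuming G(k,v) takes the geometric-series closed form derived above (the paper's exact derivation may differ), the bound is one line of code:

```python
def g_upper_bound(k: int, v: float, d: float = 0.02) -> float:
    """Upper bound G(k, v) on the ERU gained after round k by exploiting
    an action of value v forever; assumes G(k, v) = v*(1-d)**(k+1)/d."""
    return v * (1 - d) ** (k + 1) / d
```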
Maximizing ERU Leads to Optimal Performance
Proof intuition (more details in the paper):
- ERU(S) sums, for each round r:
  - A – the probability that an agent lives for r rounds, times
  - B – the expected per-round payoff for S on round r
- Each round, the chance that any agent using S reproduces is proportional to the sum, for all r, of:
  - C – the portion of the population of S agents that have lived for r rounds, times
  - D – the average per-round payoff of agents that have lived r rounds
- In the expected case, A ~ C and B ~ D
Outline
Introduction
The Social Learning Strategies Tournament
Evaluating Social Learning Strategies
Computing “Near-Optimal” Strategies
Conclusions
Ongoing and Future Work
Finding a Near-Optimal Strategy
- Corollary 1: For any error tolerance ε > 0, there is some k' such that G(k', v_max) < ε
- Given k', we can compute a strategy S' that optimizes ERU over the first k' rounds
- ERU(S') must be within ε of optimal
- Algorithm 1 in the paper does this; it is essentially a depth-first search of all <history, action> pairs (sketched below)
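A sketch of the overall procedure, not the paper's Algorithm 1 itself: g_upper_bound is the bound sketched earlier, and moves, outcomes, and payoff are assumed model functions the caller must supply (this is where the detailed environment and opponent models mentioned in the conclusions come in).

```python
def smallest_depth(eps: float, v_max: float, d: float = 0.02) -> int:
    """Corollary 1: find the smallest k' with G(k', v_max) < eps."""
    k = 0
    while g_upper_bound(k, v_max, d) >= eps:
        k += 1
    return k

def best_eru(history, k, moves, outcomes, payoff, d=0.02):
    """Depth-first search over <history, move> pairs: the maximum depth-limited
    ERU achievable from `history` within the first k rounds.
    moves(h) -> iterable of legal moves; payoff(h, m) -> expected immediate
    payoff of m at h; outcomes(h, m) -> [(probability, successor history), ...]."""
    if len(history) >= k:
        return 0.0   # anything earned later is bounded by G(k, v_max) < eps
    best = 0.0
    for m in moves(history):
        value = (1 - d) * (payoff(history, m) + sum(
            p * best_eru(h2, k, moves, outcomes, payoff, d)
            for p, h2 in outcomes(history, m)))
        best = max(best, value)
    return best
```

Because the search branches over every legal move and every possible outcome at each of the k' depths, its running time is exponential in k', which matches the limitation noted in the conclusions.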
Outline
Introduction
The Social Learning Strategies Tournament
Evaluating Social Learning Strategies
Computing “Near-Optimal” Strategies
Conclusions
Ongoing and Future Work
Conclusions
- Studying games like the SLST can help us understand the cultural and biological evolution of communication
- We have:
  - Introduced the ERU metric for social learning strategies
  - Proven that maximizing ERU leads to optimal play in the Social Learning Strategies Tournament
  - Given a formula for computing the ERU of a strategy to within any ε > 0
  - Given an algorithm that, given any ε > 0, will find a strategy whose ERU is within ε of optimal
- Limitations:
  - The strategy-generating algorithm has exponential running time
  - The algorithm needs detailed models of the environment and opponents to guarantee near-optimality
Ongoing and Future Work
Ongoing work:
- Mitigating the exponential running time of our algorithm
- Dealing with the lack of accurate opponent models
- Testing near-optimal strategies against in-house agents
Future work:
- Testing near-optimal strategies against tournament winners
- Dealing with unknown V and π
- Dealing with noise in Observe moves