Joint Strategy Fictitious Play


Sherwin Doroudi

“Adapted” from

J. R. Marden, G. Arslan, J. S. Shamma, “Joint strategy fictitious play with inertia for potential games,” in Proceedings of the 44th IEEE Conference on Decision and Control, December 2005, pp. 6692-6697.

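Review: Game

• Players: $i \in \{1, \dots, n\}$

• Actions: each player $i$ has a finite action set $\mathcal{A}_i$, and joint actions lie in $\mathcal{A} = \mathcal{A}_1 \times \cdots \times \mathcal{A}_n$

• Payoffs: each player $i$ has a utility function $U_i : \mathcal{A} \to \mathbb{R}$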

Review: Game

We then play the game repeatedly in “stages,” starting at stage 0. Players can use learning algorithms as discussed in lecture. Note that players know the structural form of their own payoff function, but do not know the form of the other players’ payoff functions.

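Notation: Actions

As in the lecture, we use the notation $a = (a_i, a_{-i})$, where $a_{-i} = (a_1, \dots, a_{i-1}, a_{i+1}, \dots, a_n)$ denotes the actions of all players other than $i$, and $\mathcal{A}_{-i} = \prod_{j \ne i} \mathcal{A}_j$.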

Review: Regret Matching

• The empirical distribution of play is guaranteed to converge to the set of Coarse Correlated Equilibria (CCE) in all games (Hart & Mas-Colell, 2000).

• But CCE can be quite bad in some cases, since the set of CCE is a superset of the set of Nash Equilibria (NE).

Review: Fictitious Play (FP)

• Observe the empirical frequencies of every player’s actions

• Consider best response(s) under the (incorrect) assumption that the other players play independently according to their empirical frequencies

• Randomly choose a best response and act accordingly

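Empirical Frequency in FP

The empirical frequency of an action $a_i \in \mathcal{A}_i$ for player $i$ is the fraction of stages, up to and including the previous stage, in which the player chose that action:

$$q_i^{a_i}(t) = \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{I}\{a_i(\tau) = a_i\}$$

Empirical Frequency in FP

Each player also has an empirical frequency vector $q_i(t) = \big(q_i^{a_i}(t)\big)_{a_i \in \mathcal{A}_i}$, which is a probability distribution over $\mathcal{A}_i$.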
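A minimal Python sketch of the FP empirical frequency vector $q_i(t)$. The representation is our own assumption, not the slides': actions are hashable labels, and the history is a list of joint-action tuples $a(0), \dots, a(t-1)$ indexed by player.

```python
from collections import Counter

def empirical_frequencies(history, player, actions):
    """q_i(t): fraction of the past stages 0..t-1 in which `player` chose each action.

    history: list of joint actions a(0), ..., a(t-1), each a tuple indexed by player.
    Assumes t = len(history) >= 1.
    """
    t = len(history)
    counts = Counter(a[player] for a in history)
    return {a_i: counts[a_i] / t for a_i in actions}

# Usage: three stages of a hypothetical two-player game with actions {0, 1}.
history = [(0, 1), (1, 1), (0, 0)]
q0 = empirical_frequencies(history, 0, [0, 1])  # player 0 chose 0 twice and 1 once
```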

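Best Response in FP

Each player assumes an expected payoff

$$U_i(a_i, q_{-i}(t)) = \sum_{a_{-i} \in \mathcal{A}_{-i}} U_i(a_i, a_{-i}) \prod_{j \ne i} q_j^{a_j}(t)$$

and each player chooses a best response from the set

$$BR_i(t) = \arg\max_{a_i \in \mathcal{A}_i} U_i(a_i, q_{-i}(t))$$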
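The FP expected payoff and best-response set can be sketched directly. The `utility(player, joint_action)` interface and the dictionary layout of `freqs` are hypothetical choices for illustration. Note that the loop visits every $a_{-i} \in \mathcal{A}_{-i}$, so its cost grows with $\prod_{j \ne i} |\mathcal{A}_j|$.

```python
import itertools
import math

def fp_expected_payoff(utility, player, a_i, freqs):
    """U_i(a_i, q_{-i}(t)) assuming the other players randomize independently.

    utility(player, joint_action) -> float (hypothetical interface)
    freqs[j]: dict mapping each action of player j to q_j^{a_j}(t).
    The loop visits every a_{-i}, i.e. prod_{j != i} |A_j| terms.
    """
    others = [j for j in range(len(freqs)) if j != player]
    total = 0.0
    for a_minus_i in itertools.product(*(list(freqs[j]) for j in others)):
        joint = list(a_minus_i)
        joint.insert(player, a_i)  # rebuild the full profile (a_i, a_{-i})
        prob = math.prod(freqs[j][a_j] for j, a_j in zip(others, a_minus_i))
        total += utility(player, tuple(joint)) * prob
    return total

def fp_best_responses(utility, player, actions, freqs):
    """BR_i(t): the set of actions maximizing U_i(a_i, q_{-i}(t))."""
    payoffs = {a_i: fp_expected_payoff(utility, player, a_i, freqs) for a_i in actions}
    best = max(payoffs.values())
    return {a_i for a_i, u in payoffs.items() if math.isclose(u, best)}

# Usage: a hypothetical two-player coordination game (payoff 1 if actions match).
def coordination(player, joint_action):
    return 1.0 if joint_action[0] == joint_action[1] else 0.0

freqs = [{0: 0.5, 1: 0.5}, {0: 0.25, 1: 0.75}]
br = fp_best_responses(coordination, 0, [0, 1], freqs)  # {1}
```

Comparing with `math.isclose` rather than `==` keeps ties from being broken by floating-point noise, so the "randomly choose a best response" step sees the full set.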

The Good News!

“The empirical frequencies generated by FP converge to a Nash equilibrium in potential games” (Monderer & Shapley, 1996).

The Bad News (if any)?

What are some weaknesses of FP?

A Routing Example

• Consider a routing game with 100 players, all with the same source and sink

• There are 4 roads from the source to the sink

• Players want to minimize their cost

A Routing Example

• The cost of traveling on each road is given by a quadratic cost function with positive coefficients (which could be randomly generated) that depends on the number of players choosing that road

• Can we use FP as a learning algorithm in this example?

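A Routing Example

Formalizing the game, we have

• Players: $i \in \{1, \dots, 100\}$

• Actions: $\mathcal{A}_i = \{1, 2, 3, 4\}$ (the four roads)

• Payoffs: $U_i(a) = -c_{a_i}\big(\sigma_{a_i}(a)\big)$, where $\sigma_r(a) = |\{j : a_j = r\}|$ is the number of players on road $r$ and $c_r(k) = \alpha_r k^2 + \beta_r k + \gamma_r$ with $\alpha_r, \beta_r, \gamma_r > 0$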
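A minimal sketch of the routing game in Python. The coefficient values, the seed, and the names `road_cost` and `utility` are hypothetical; the slides only say the coefficients are positive and could be randomly generated. Roads are indexed 0 to 3 here, and the last line counts the terms in FP’s expected-payoff sum for one player.

```python
import random

NUM_PLAYERS = 100
NUM_ROADS = 4

# Hypothetical coefficients (alpha_r, beta_r, gamma_r); the slides only say they are
# positive and could be randomly generated.
_rng = random.Random(0)
COEFFS = [tuple(_rng.uniform(0.1, 1.0) for _ in range(3)) for _ in range(NUM_ROADS)]

def road_cost(road, load):
    """Quadratic cost c_r(k) = alpha_r k^2 + beta_r k + gamma_r for k players on road r."""
    alpha, beta, gamma = COEFFS[road]
    return alpha * load**2 + beta * load + gamma

def utility(player, joint_action):
    """U_i(a) = -c_{a_i}(sigma_{a_i}(a)): players minimize cost, so payoff is negated cost."""
    road = joint_action[player]
    load = sum(1 for r in joint_action if r == road)
    return -road_cost(road, load)

# FP's expected payoff for one player sums over every a_{-i}: 4^99 = 2^198 terms.
NUM_FP_TERMS = NUM_ROADS ** (NUM_PLAYERS - 1)
```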

A Routing Example

Remember this?

$$U_i(a_i, q_{-i}(t)) = \sum_{a_{-i} \in \mathcal{A}_{-i}} U_i(a_i, a_{-i}) \prod_{j \ne i} q_j^{a_j}(t)$$

The sum above is over $|\mathcal{A}_{-i}| = 4^{99} = 2^{198}$ terms!

This is not computationally feasible!

What do we do?

The routing example (which is fairly realistic) motivates either finding a more efficient way to compute this expected utility or developing an algorithm that is computationally suitable for “large” games.

Joint Strategy Fictitious Play (JSFP)

• Observe empirical frequencies of joint actions

• Consider best response(s) under the (still incorrect) assumption that all other players act collectively as a group according to their joint empirical frequency

• Randomly choose a best response and act accordingly

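Does FP=JSFP?

• In the case of two players, FP and JSFP are the same: player $i$ has a single opponent, so the joint empirical frequency of $a_{-i}$ is just that opponent’s empirical frequency.

• But in the case of three or more players this is not necessarily the case! The joint empirical frequency of the other players’ actions can capture correlations between them that the product of their individual empirical frequencies misses.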
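A small three-player illustration of why FP and JSFP can disagree. The history and player 0’s payoff `u0` are hypothetical: players 1 and 2 always happen to agree, which the joint frequency records and the product of marginals does not, and this is enough to change player 0’s best response.

```python
from collections import Counter

# Hypothetical history of a 3-player game: players 1 and 2 always pick the same action.
history = [(0, 0, 0), (1, 1, 1), (0, 1, 1), (1, 0, 0)]  # joint actions (a_0, a_1, a_2)
t = len(history)

# JSFP view of player 0: joint empirical frequency of (a_1, a_2).
z = {pair: c / t for pair, c in Counter(a[1:] for a in history).items()}

# FP view of player 0: product of the individual empirical frequencies.
q1 = Counter(a[1] for a in history)
q2 = Counter(a[2] for a in history)
product = {(x, y): (q1[x] / t) * (q2[y] / t) for x in (0, 1) for y in (0, 1)}

def u0(a0, a1, a2):
    """Hypothetical payoff of player 0: action 0 pays 1 iff players 1 and 2 agree."""
    if a0 == 0:
        return 1.0 if a1 == a2 else 0.0
    return 0.6

jsfp_value = {a0: sum(p * u0(a0, *pair) for pair, p in z.items()) for a0 in (0, 1)}
fp_value = {a0: sum(p * u0(a0, *pair) for pair, p in product.items()) for a0 in (0, 1)}
# JSFP: action 0 is worth 1.0 and action 1 is worth 0.6.
# FP: action 0 is worth 0.5 and action 1 about 0.6, so the best responses differ.
```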

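Empirical Frequency in JSFP

The empirical frequency of an action profile $a \in \mathcal{A}$ may be calculated as follows:

$$z^{a}(t) = \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{I}\{a(\tau) = a\}$$

Similarly, $z_{-i}^{a_{-i}}(t) = \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{I}\{a_{-i}(\tau) = a_{-i}\}$ is the joint empirical frequency of the other players’ actions.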

Expected Payoff in JSFP

Each player assumes an expected payoff

$$U_i(a_i, z_{-i}(t)) = \sum_{a_{-i} \in \mathcal{A}_{-i}} U_i(a_i, a_{-i}) \, z_{-i}^{a_{-i}}(t)$$

But this looks about as bad as (maybe worse than) FP!

So what can we do?

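Expected Payoff in JSFP

Each player assumes an expected payoff

$$U_i(a_i, z_{-i}(t)) = \sum_{a_{-i} \in \mathcal{A}_{-i}} U_i(a_i, a_{-i}) \, z_{-i}^{a_{-i}}(t) = \frac{1}{t} \sum_{\tau=0}^{t-1} U_i(a_i, a_{-i}(\tau))$$

We rewrite it in a more useful form! The second expression averages over the $t$ stages actually observed instead of summing over all of $\mathcal{A}_{-i}$.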
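A quick numerical check that the two forms of the JSFP expected payoff agree. The helper names and the three-player payoff `matches` (number of other players choosing the same action) are hypothetical; only profiles $a_{-i}$ that were actually observed have nonzero $z_{-i}^{a_{-i}}(t)$, which is why the history form never needs to touch the rest of $\mathcal{A}_{-i}$.

```python
from collections import Counter

def joint_frequencies_others(history, player):
    """z_{-i}^{a_{-i}}(t): fraction of past stages in which the others played a_{-i}."""
    t = len(history)
    counts = Counter(a[:player] + a[player + 1:] for a in history)
    return {a_minus_i: c / t for a_minus_i, c in counts.items()}

def payoff_from_frequencies(utility, player, a_i, history):
    """Sum over a_{-i} of U_i(a_i, a_{-i}) z_{-i}^{a_{-i}}(t); unobserved a_{-i} have z = 0."""
    z = joint_frequencies_others(history, player)
    return sum(utility(player, a_m[:player] + (a_i,) + a_m[player:]) * p
               for a_m, p in z.items())

def payoff_from_history(utility, player, a_i, history):
    """The rewritten form: average of U_i(a_i, a_{-i}(tau)) over tau = 0..t-1."""
    return sum(utility(player, a[:player] + (a_i,) + a[player + 1:])
               for a in history) / len(history)

# Usage with a hypothetical 3-player payoff: number of other players matching a_i.
def matches(player, joint_action):
    return sum(1.0 for j, x in enumerate(joint_action)
               if j != player and x == joint_action[player])

history = [(0, 1, 1), (1, 1, 0), (0, 0, 1), (0, 1, 1)]
```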

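The JSFP Payoff Recursion

So now, writing $V_i^{a_i}(t) = U_i(a_i, z_{-i}(t))$, we can rewrite the expected payoff as a simple recursion

$$V_i^{a_i}(t+1) = \frac{t}{t+1} V_i^{a_i}(t) + \frac{1}{t+1} U_i(a_i, a_{-i}(t)),$$

and at every stage choose an action that maximizes it (our best response).

We are maximizing regret! Subtracting player $i$’s average realized payoff $\frac{1}{t} \sum_{\tau=0}^{t-1} U_i(a(\tau))$ from $V_i^{a_i}(t)$ gives the regret for not having always played $a_i$; since the subtracted term does not depend on $a_i$, both have the same maximizers.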
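A sketch of the JSFP payoff recursion for one player, followed by a check of the regret interpretation. The function name, its arguments, and the payoff `matches` are hypothetical. Each stage costs one payoff evaluation per action of player $i$, independent of how many other players there are.

```python
def jsfp_update(values, t, utility, player, actions, joint_action):
    """V_i^{a_i}(t+1) = t/(t+1) V_i^{a_i}(t) + 1/(t+1) U_i(a_i, a_{-i}(t)) for every a_i.

    values: dict a_i -> V_i^{a_i}(t); t: number of stages averaged so far.
    joint_action: the observed profile a(t). Cost per stage: |A_i| payoff calls.
    """
    new_values = {}
    for a_i in actions:
        profile = joint_action[:player] + (a_i,) + joint_action[player + 1:]
        new_values[a_i] = (t * values[a_i] + utility(player, profile)) / (t + 1)
    return new_values

# Usage with a hypothetical 3-player payoff (number of other players matching a_i).
def matches(player, joint_action):
    return sum(1.0 for j, x in enumerate(joint_action)
               if j != player and x == joint_action[player])

history = [(0, 1, 1), (1, 1, 0), (0, 0, 1), (0, 1, 1)]
V = {0: 0.0, 1: 0.0}  # the t = 0 values are multiplied by 0, so any start works
realized = 0.0        # running average of player 0's actual payoff U_0(a(tau))
for t, a in enumerate(history):
    V = jsfp_update(V, t, matches, 0, [0, 1], a)
    realized = (t * realized + matches(0, a)) / (t + 1)
regret = {a_i: V[a_i] - realized for a_i in V}
```

Here `V` and `regret` differ by the same constant for every action, so their argmax coincides, which is exactly the “maximizing regret” observation.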

Convergence Properties of JSFP

Whether JSFP converges in games with three or more players remains unknown; this is an open problem. However, once the joint action generated by JSFP reaches a strict NE, it stays there forever. To obtain convergence guarantees, we add “inertia” to our learning algorithm.

JSFP with Inertia

• Assume that all NE are strict

• JSFP-1: If the action chosen by a player in the previous stage is still a best response at the current stage, choose that action again

• JSFP-2: Otherwise, choose an action according to the distribution

$$a_i(t) \sim \alpha_i(t)\, \beta_i(t) + \big(1 - \alpha_i(t)\big)\, v^{a_i(t-1)}$$

The JSFP-2 Distribution

Here the parameter $\alpha_i(t) \in (0, 1)$, bounded away from 0 and 1, represents the player’s willingness to optimize at a given stage; $\beta_i(t)$ is a distribution whose support is contained in the set of best responses at this stage; and $v^{a_i(t-1)}$ is the distribution that puts all of its mass on the action taken in the previous stage.
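A minimal sketch of one JSFP-with-inertia action choice. The function name and arguments are hypothetical, and taking $\beta_i(t)$ uniform over the best responses is just one admissible choice; the slides only require its support to lie in the best-response set.

```python
import random

def jsfp_inertia_action(values, prev_action, alpha, rng):
    """Pick a_i(t) given V_i(t) (dict a_i -> value) and the previous action a_i(t-1).

    alpha: willingness to optimize, assumed to lie strictly between 0 and 1.
    beta is taken to be uniform over the best responses, one admissible choice.
    """
    best = max(values.values())
    best_responses = [a for a, v in values.items() if v == best]
    # JSFP-1: if a_i(t-1) is still a best response, play it again.
    if prev_action in best_responses:
        return prev_action
    # JSFP-2: with probability alpha, sample from beta (a best response) ...
    if rng.random() < alpha:
        return rng.choice(best_responses)
    # ... otherwise, with probability 1 - alpha, sample from v, i.e. repeat a_i(t-1).
    return prev_action

# Usage: action 1 is the unique best response, but the player currently plays 0.
rng = random.Random(1)
values = {0: 1.0, 1: 2.0}
draws = [jsfp_inertia_action(values, 0, 0.5, rng) for _ in range(2000)]
switch_rate = draws.count(1) / len(draws)  # close to alpha = 0.5
```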

JSFP w/ Inertia Converges!

• In particular, to some pure Nash equilibrium in generalized ordinal potential games (under the assumption that all NE are strict)

• Of course, there is no equilibrium selection mechanism

• And not much is known about the convergence rate

• But we have argued that JSFP w/ Inertia is a good substitute for FP in “large” games

JSFP w/ Inertia Converges!

If you want the proof, read the paper, as the proof is not trivial!

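The Fading Memory Variant

We used the recursion

$$V_i^{a_i}(t+1) = \frac{t}{t+1} V_i^{a_i}(t) + \frac{1}{t+1} U_i(a_i, a_{-i}(t))$$

But we could also use the recursion

$$V_i^{a_i}(t+1) = (1 - \rho)\, V_i^{a_i}(t) + \rho\, U_i(a_i, a_{-i}(t))$$

Here, $\rho$ is a constant (or a function of time) with $0 < \rho \le 1$; choosing $\rho = 1/(t+1)$ recovers the original recursion. It is also proven that this algorithm (with inertia) gives rise to a process converging to some NE.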
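A sketch of the fading-memory update for one player, in the same hypothetical interface as the plain JSFP update. With $\rho = 1$ the player forgets everything but the last stage; smaller $\rho$ weights older stages geometrically less.

```python
def fading_memory_update(values, rho, utility, player, actions, joint_action):
    """V_i^{a_i}(t+1) = (1 - rho) V_i^{a_i}(t) + rho U_i(a_i, a_{-i}(t)), with 0 < rho <= 1."""
    return {
        a_i: (1 - rho) * values[a_i]
        + rho * utility(player, joint_action[:player] + (a_i,) + joint_action[player + 1:])
        for a_i in actions
    }

# Usage with a hypothetical 3-player payoff (number of other players matching a_i).
def matches(player, joint_action):
    return sum(1.0 for j, x in enumerate(joint_action)
               if j != player and x == joint_action[player])

V = {0: 5.0, 1: 5.0}
V_half = fading_memory_update(V, 0.5, matches, 0, [0, 1], (0, 1, 1))  # blends old and new
V_one = fading_memory_update(V, 1.0, matches, 0, [0, 1], (0, 1, 1))   # keeps only stage t
```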

A Routing Example, Revisited

• We can now apply JSFP w/ Inertia and fading memory to the routing problem, and we should converge to some NE, since routing (congestion) games are potential games and hence generalized ordinal potential games

• Simulations show that JSFP without inertia should also work in this case

• Try it!
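One way to try it: a minimal simulation sketch of JSFP with inertia and fading memory on a 100-player, 4-road quadratic routing game. Everything numeric here is a hypothetical, illustrative choice (random coefficients in [0.1, 1], $\rho = 0.1$, $\alpha = 0.3$, 500 stages, the seed), as are the function names. We make no claim that a given run reaches an equilibrium within that many stages; `is_pure_nash` lets you check the final profile. With random real-valued coefficients, ties between roads (and hence non-strict NE) are unlikely, in line with the strictness assumption.

```python
import random

def road_cost(coeffs, road, load):
    """Quadratic cost c_r(k) = alpha_r k^2 + beta_r k + gamma_r (hypothetical coefficients)."""
    a, b, c = coeffs[road]
    return a * load**2 + b * load + c

def simulate_routing(num_players=100, num_roads=4, stages=500, rho=0.1, alpha=0.3, seed=0):
    """JSFP with inertia and fading memory on a hypothetical quadratic routing game.

    Returns (final joint action, coefficients). Parameter values are illustrative only.
    """
    rng = random.Random(seed)
    coeffs = [[rng.uniform(0.1, 1.0) for _ in range(3)] for _ in range(num_roads)]
    actions = [rng.randrange(num_roads) for _ in range(num_players)]
    values = [[0.0] * num_roads for _ in range(num_players)]  # V_i^r(t)

    for _ in range(stages):
        loads = [actions.count(r) for r in range(num_roads)]
        # Fading-memory update: payoff of road r against the others' observed choices.
        for i, current in enumerate(actions):
            for r in range(num_roads):
                load_if_r = loads[r] + (0 if r == current else 1)
                values[i][r] = (1 - rho) * values[i][r] - rho * road_cost(coeffs, r, load_if_r)
        # JSFP-1 / JSFP-2: keep a best response; otherwise switch with probability alpha.
        new_actions = []
        for i, prev in enumerate(actions):
            best = max(values[i])
            best_roads = [r for r in range(num_roads) if values[i][r] == best]
            if prev in best_roads or rng.random() >= alpha:
                new_actions.append(prev)
            else:
                new_actions.append(rng.choice(best_roads))
        actions = new_actions
    return actions, coeffs

def is_pure_nash(actions, coeffs):
    """True if no player can lower its cost by unilaterally switching roads."""
    loads = [actions.count(r) for r in range(len(coeffs))]
    for current in set(actions):
        here = road_cost(coeffs, current, loads[current])
        if any(road_cost(coeffs, r, loads[r] + 1) < here
               for r in range(len(coeffs)) if r != current):
            return False
    return True

actions, coeffs = simulate_routing()
print(is_pure_nash(actions, coeffs))
```

Because each player only needs the current road loads, one stage costs on the order of (players × roads) operations, rather than the $4^{99}$ terms FP would need per player.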

Example of Convergence

Conclusion

• We have demonstrated some weaknesses of FP (computational demands, observational demands, etc.)

• We have presented JSFP, which seems to accommodate these computational limitations in “large” games
