Post on 22-Feb-2016
A Lyapunov Optimization Approach to Repeated Stochastic Games
Michael J. Neely, University of Southern California
http://www-bcf.usc.edu/~mjneely
Proc. Allerton Conference on Communication, Control, and Computing, Oct. 2013
[Figure: a game manager and Players 1 through 5.]
Game structure
• Slotted time t in {0, 1, 2, …}.
• N players, 1 game manager.
• Slot t utility for each player depends on:
(i) Random events ω(t) = (ω0(t), ω1(t), …, ωN(t))
(ii) Control actions α(t) = (α1(t), …, αN(t))
• Players: maximize time average utility.
• Game manager: provides suggestions; maintains fairness of utilities subject to equilibrium constraints.
Random events ω(t)
• Player i sees ωi(t).
• Manager sees: ω(t) = (ω0(t), ω1(t), …, ωN(t)).
• Vector ω(t) is i.i.d. over slots (components are possibly correlated).
[Figure: the game manager observes (ω0(t), ω1(t), …, ωN(t)), annotated "Only known to manager!"; Players 1, 2, and 3 observe ω1(t), ω2(t), and ω3(t), respectively.]
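To make the information structure concrete, here is a minimal sketch of drawing ω(t) independently in each slot with correlated components. The event model (a hidden "good"/"bad" state ω0 with noisy copies for the players), the 0.8 copy probability, and the name `sample_omega` are illustrative assumptions, not taken from the slides.

```python
import random

def sample_omega(rng, n_players):
    """Draw one slot's event vector (omega_0, omega_1, ..., omega_N).

    Assumed model: omega_0 is a hidden state seen only by the manager, and
    each omega_i is a noisy copy of it, so the components are correlated.
    """
    omega0 = rng.choice(["good", "bad"])
    others = [omega0 if rng.random() < 0.8 else rng.choice(["good", "bad"])
              for _ in range(n_players)]
    return (omega0, *others)

rng = random.Random(0)
# Independent draws across slots give an i.i.d. sequence omega(0), omega(1), ...
omegas = [sample_omega(rng, n_players=3) for _ in range(5)]
for t, omega in enumerate(omegas):
    manager_view = omega        # the manager sees the full vector
    player_views = omega[1:]    # player i sees only omega[i]
    print(t, manager_view, player_views)
```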
Actions and utilities
• Manager sends suggested actions Mi(t).
• Players take actions αi(t) in Ai.
• Ui(t) = ui( α(t), ω(t) ).
[Figure: the game manager, observing (ω0(t), ω1(t), …, ωN(t)), sends messages M1(t), M2(t), M3(t) to Players 1, 2, and 3, who take actions α1(t), α2(t), α3(t).]
Example: Wireless MAC game
• Manager knows current channel conditions: ω0(t) = (C1(t), C2(t), … , CN(t))
• Users do not have this knowledge: ωi(t) = NULL
[Figure: Users 1, 2, and 3 connected to an Access Point over channels C1(t), C2(t), C3(t).]
Example: Economic market
• ω0(t) = vector of current prices.
• Prices are commonly known to everyone: ωi(t) = ω0(t) for all i.
[Figure: game manager with Players 1, 2, and 3; ω0(t) = [priceHAM(t), priceEGGS(t)].]
Participation
At the beginning of the game, players choose either:
(i) Participate:
• Receive messages Mi(t).
• Always choose αi(t) = Mi(t).
(ii) Do not participate:
• Do not receive messages Mi(t).
• Can choose αi(t) however they like.
Need incentives for participation…
• Nash equilibrium (NE)
• Correlated equilibrium (CE)
• Coarse correlated equilibrium (CCE)
NE for Static Game
• Consider the special case with no ω(t) process.
• Nash equilibrium (NE): Players' actions are independent: Pr[α] = Pr[α1]Pr[α2]…Pr[αN]
• Game manager not needed.
• Definition: Distribution Pr[α] is a Nash equilibrium (NE) if no player can benefit by unilaterally changing its action probabilities.
Finding a NE in a general game is a nonconvex problem!
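The NE definition can be checked numerically for a candidate pair of mixed strategies. The sketch below assumes a two-player game given by payoff matrices; the function name `is_nash` and the prisoner's-dilemma-style payoffs are hypothetical and not from the slides. It only verifies a given candidate and does not address the nonconvex problem of finding one.

```python
import numpy as np

def is_nash(u1, u2, p, q, tol=1e-9):
    """Check whether independent mixed strategies (p, q) form a Nash equilibrium.

    u1[a, b], u2[a, b]: utilities of players 1 and 2 when player 1 plays a
    and player 2 plays b. p and q are probability vectors over actions.
    """
    u1, u2, p, q = map(np.asarray, (u1, u2, p, q))
    # Expected utility of each pure action against the opponent's mixture.
    dev1 = u1 @ q          # player 1 deviates to pure action a
    dev2 = p @ u2          # player 2 deviates to pure action b
    # Expected utility under the candidate strategies.
    val1 = p @ dev1
    val2 = dev2 @ q
    # A mixed deviation is a convex combination of pure ones, so checking
    # pure deviations is enough.
    return bool(val1 >= dev1.max() - tol and val2 >= dev2.max() - tol)

# Hypothetical example: prisoner's-dilemma-style payoffs (action 1 = defect).
U1 = np.array([[3.0, 0.0], [5.0, 1.0]])
U2 = U1.T
both_defect = is_nash(U1, U2, [0, 1], [0, 1])
both_cooperate = is_nash(U1, U2, [1, 0], [1, 0])
print(both_defect, both_cooperate)   # True False
```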
CE for Static Game
• Manager suggests actions α(t) i.i.d. Pr[α].
• Suppose all players participate.
• Definition: [Aumann 1974, 1987] Distribution Pr[α] is a Correlated Equilibrium (CE) if:
E[ Ui(t) | αi(t) = α ] ≥ E[ ui(β, α_{-i}(t)) | αi(t) = α ] for all i in {1, …, N}, all pairs α, β in Ai.
LP with |A1|^2 + |A2|^2 + … + |AN|^2 constraints.
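As an illustration of the CE constraints, the sketch below checks them for a two-player game given a joint distribution over action pairs. Each conditional inequality is multiplied by Pr[αi(t) = α], which makes it linear in Pr[α] (this is what makes the problem an LP). The function name `is_correlated_eq` and the game-of-chicken payoffs are assumptions chosen for illustration.

```python
import numpy as np

def is_correlated_eq(u1, u2, P, tol=1e-9):
    """Check the CE constraints for a two-player game.

    P[a, b] is the probability that the manager suggests (a, b).
    """
    u1, u2, P = map(np.asarray, (u1, u2, P))
    n1, n2 = P.shape
    # Player 1: suggested action a, candidate deviation dev (n1**2 checks).
    for a in range(n1):
        for dev in range(n1):
            if np.sum(P[a, :] * (u1[a, :] - u1[dev, :])) < -tol:
                return False
    # Player 2: suggested action b, candidate deviation dev (n2**2 checks).
    for b in range(n2):
        for dev in range(n2):
            if np.sum(P[:, b] * (u2[:, b] - u2[:, dev])) < -tol:
                return False
    return True

# Hypothetical game of chicken: action 0 = dare, action 1 = chicken.
U1 = np.array([[0.0, 7.0], [2.0, 6.0]])
U2 = U1.T
# Equal weight on (dare, chicken), (chicken, dare), (chicken, chicken).
P_mix = np.array([[0.0, 1/3], [1/3, 1/3]])
# All weight on (dare, dare).
P_crash = np.array([[1.0, 0.0], [0.0, 0.0]])
print(is_correlated_eq(U1, U2, P_mix))    # True
print(is_correlated_eq(U1, U2, P_crash))  # False
```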
Criticism of CE
• Manager gives suggestions Mi(t) to players even if they do not participate.
• Without knowing message Mi(t) = αi: Player i only knows the a-priori likelihood of other players' actions via the joint distribution Pr[α].
• Knowing Mi(t) = αi: Player i knows the a-posteriori likelihood of other players' actions via the conditional distribution Pr[α | αi].
CCE for Static Game
• Manager suggests α(t) i.i.d. Pr[α].
• Gives suggestions only to participating players.
• Suppose all players participate.
• Definition: [Moulin and Vial, 1978] Distribution Pr[α] is a Coarse Correlated Equilibrium (CCE) if:
E[ Ui(t) ] ≥ E[ ui(β, α_{-i}(t)) ] for all i in {1, …, N}, all β in Ai.
LP with |A1| + |A2| + … + |AN| constraints.
(significantly less complex!)
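The CCE constraints compare the unconditional expected utility against each fixed deviation β, so only one check per action is needed. A minimal sketch for a two-player game follows; the name `is_coarse_correlated_eq` and the game-of-chicken payoffs are illustrative assumptions.

```python
import numpy as np

def is_coarse_correlated_eq(u1, u2, P, tol=1e-9):
    """Check the CCE constraints for a two-player game with joint law P[a, b]."""
    u1, u2, P = map(np.asarray, (u1, u2, P))
    # Expected utilities when both players follow the suggestions.
    val1 = np.sum(P * u1)
    val2 = np.sum(P * u2)
    # A non-participant sees no suggestion, so the other player's action
    # follows its marginal distribution.
    marg1 = P.sum(axis=1)   # distribution of player 1's action
    marg2 = P.sum(axis=0)   # distribution of player 2's action
    dev1 = u1 @ marg2       # dev1[beta] = E[u1(beta, alpha_2)]
    dev2 = marg1 @ u2       # dev2[beta] = E[u2(alpha_1, beta)]
    return bool(val1 >= dev1.max() - tol and val2 >= dev2.max() - tol)

# Hypothetical game of chicken: action 0 = dare, action 1 = chicken.
U1 = np.array([[0.0, 7.0], [2.0, 6.0]])
U2 = U1.T
P_mix = np.array([[0.0, 1/3], [1/3, 1/3]])
P_crash = np.array([[1.0, 0.0], [0.0, 0.0]])
print(is_coarse_correlated_eq(U1, U2, P_mix))    # True
print(is_coarse_correlated_eq(U1, U2, P_crash))  # False
```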
Superset Theorem
The NE, CE, CCE definitions extend easily to the stochastic game.
Theorem:
{all NE} ⊆ {all CE} ⊆ {all CCE}
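For the static game, the second inclusion can be seen by averaging the CE constraints over the suggested action. The following is a sketch of that step in the notation of the definitions above, not the proof given in the paper:

```latex
% CE constraint for player i with suggested action \alpha and deviation \beta,
% multiplied through by \Pr[\alpha_i(t) = \alpha]:
\sum_{\alpha_{-i}} \Pr[\alpha, \alpha_{-i}]\, u_i(\alpha, \alpha_{-i})
  \;\ge\; \sum_{\alpha_{-i}} \Pr[\alpha, \alpha_{-i}]\, u_i(\beta, \alpha_{-i})
  \quad \text{for all } \alpha, \beta \in A_i .

% Summing both sides over \alpha \in A_i yields the CCE constraint:
\mathbb{E}[U_i(t)] \;\ge\; \mathbb{E}\big[u_i(\beta, \alpha_{-i}(t))\big]
  \quad \text{for all } \beta \in A_i .
```

The first inclusion follows along similar lines: under a NE the actions are independent, so conditioning on αi(t) does not change the distribution of the other players' actions, and every action played with positive probability is already a best response.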
Example (static game)
[Figure: 2×2 payoff tables for Utility function 1 and Utility function 2 (rows: Player 1, columns: Player 2).]
[Figure: plot of Avg. Utility 2 versus Avg. Utility 1, showing the NE and CE point and the CCE region, with labeled points (3.5, 2.4), (3.5, 9.3), and (3.87, 3.79).]
All players benefit if non-participants are denied access to the suggestions of the game manager.
Pure strategies for stochastic games
• Player i observes: ωi(t) in Ωi
• Player i chooses: αi(t) in Ai
• Definition: A pure strategy for player i is a function bi : Ωi → Ai.
• There are |Ai|^|Ωi| pure strategies for player i.
• Define Si as this set of pure strategies.
[Figure: the map bi sends each ωi in Ωi to bi(ωi) in Ai.]
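The count |Ai|^|Ωi| can be confirmed by enumerating every function from Ωi to Ai. In this sketch the observation and action labels, and the helper name `pure_strategies`, are hypothetical.

```python
from itertools import product

def pure_strategies(observations, actions):
    """Enumerate all functions b: observations -> actions, each as a dict."""
    return [dict(zip(observations, choice))
            for choice in product(actions, repeat=len(observations))]

# Hypothetical example: |Omega_i| = 2 observations and |A_i| = 3 actions,
# giving 3**2 = 9 pure strategies.
omega_i = ["good", "bad"]
A_i = ["idle", "low", "high"]
S_i = pure_strategies(omega_i, A_i)
print(len(S_i))   # 9
print(S_i[0])     # {'good': 'idle', 'bad': 'idle'}
```

The exponential growth in |Ωi| is why the number of CCE constraints per player, one for each s in Si, can be large even for modest observation sets.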
Stochastic optimization problem
Maximize: φ(U1, U2, …, UN)   [concave fairness function]
Subject to:
1) Ui ≥ Ui^(s) for all i in {1, …, N}, for all s in Si   [CCE constraints]
2) α(t) in A1 × A2 × … × AN for all t in {0, 1, 2, …}
Lyapunov optimization approach
Constraints: Ui ≥ Ui^(s) for all i in {1, …, N}, for all s in Si
Virtual queue: Qi^(s)(t)
[Figure: virtual queue Qi^(s)(t), labeled with ui(α(t), ω(t)) and ui^(s)(α(t), ω(t)).]
Formally: ui^(s)(α(t), ω(t)) = ui((bi^(s)(ωi(t)), α_{-i}(t)), ω(t))
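The slide does not show the queue update itself. The sketch below assumes the standard Lyapunov-optimization form Q(t+1) = max[Q(t) + ui^(s)(t) − ui(t), 0], in which the deviation utility acts as the arrival and the actual utility as the service; keeping Q(t)/t small then pushes the time average of ui^(s) below that of ui. The function name and the per-slot numbers are hypothetical.

```python
def update_virtual_queue(q, u_deviation, u_actual):
    """One slot of the assumed update Q(t+1) = max(Q(t) + u^(s)(t) - u(t), 0)."""
    return max(q + u_deviation - u_actual, 0.0)

# Hypothetical per-slot utilities over 4 slots for one pair (i, s).
u_dev = [1.0, 0.5, 2.0, 0.0]   # u_i^(s)(alpha(t), omega(t))
u_act = [0.5, 1.5, 1.0, 1.0]   # u_i(alpha(t), omega(t))

q = 0.0
history = []
for a, b in zip(u_dev, u_act):
    q = update_virtual_queue(q, a, b)
    history.append(q)
print(history)  # [0.5, 0.0, 1.0, 0.0]
```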
Online algorithm (main part):
Every slot t:
• Game manager observes queues and ω(t).
• Chooses α(t) in A1 × A2 × … × AN to minimize: [equation]
• Do an auxiliary variable selection (omitted here).
• Update virtual queues.
Knowledge of Pr[ω(t) = (ω0, ω1, …, ωN)] not required!
Conclusions:
• CCE constraints are simpler and lead to improved utilities.
• Online algorithm for the stochastic game.
• No knowledge of Pr[ω(t) = (ω0, ω1, …, ωN)] required!
• Complexity and convergence time are independent of the size of Ω0.
• Scales gracefully with large N.
Aux variable update:
• Choose xi(t) in [0, 1] to maximize:
V φ(x1(t), …, xN(t)) − ∑ Zi(t) xi(t)
where Zi(t) is another virtual queue, one for each player i in {1, …, N}.
See paper for details: http://ee.usc.edu/stochastic-nets/docs/repeated-games-maxweight.pdf
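When φ is separable, the auxiliary variable update decouples across players and can have a closed form. The sketch below assumes φ(x) = ∑ log(1 + xi), which is one possible concave fairness function and not necessarily the one used in the paper; the name `aux_variable` and the numeric values are hypothetical. Setting the derivative V/(1 + x) − Z to zero gives x = V/Z − 1, which is then clipped to [0, 1].

```python
import math

def aux_variable(V, Z):
    """Maximize V*log(1 + x) - Z*x over x in [0, 1] (assumed phi)."""
    if Z <= 0:
        return 1.0                      # objective is increasing in x
    return min(max(V / Z - 1.0, 0.0), 1.0)

def objective(V, Z, x):
    return V * math.log(1.0 + x) - Z * x

V = 10.0
Z = [0.0, 4.0, 8.0, 50.0]               # hypothetical queue values Z_i(t)
x = [aux_variable(V, z) for z in Z]
print(x)  # [1.0, 1.0, 0.25, 0.0]
```

A larger Zi(t) pulls xi(t) toward 0, while a larger V weights the fairness term more heavily and pushes xi(t) toward 1.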