
Satisfaction Equilibrium

Stéphane Ross

Canadian AI 2006

Problem

In real-life multiagent systems:

Agents generally do not know the preferences (rewards) of their opponents

Agents may not observe the actions of their opponents

In this context, most game theoretic solution concepts are hardly applicable

We may try to define equilibrium concepts that do not require complete information and that are achievable through learning, over repeated play.


Plan

Game model
Satisfaction Equilibrium
Satisfaction Equilibrium Learning
Results
Conclusion
Questions



Game model

n : the number of agents
A = A_1 × … × A_n : the joint action space
O : the set of possible outcomes
ω : A → O, the outcome function
r_i : O → ℝ, agent i's reward function

Agent i only knows A_i, O and r_i. After each turn, every agent observes an outcome o ∈ O.
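As a rough illustration (not from the original slides), the model above can be written as a small Python sketch; the class and field names are purely illustrative:

from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class LimitedInformationGame:
    # n agents, joint action space A = A_1 x ... x A_n, outcome set O,
    # outcome function omega : A -> O, private reward functions r_i : O -> R.
    action_sets: List[List[str]]                    # A_1, ..., A_n
    outcome_fn: Callable[[Tuple[str, ...]], str]    # omega : A -> O
    reward_fns: List[Callable[[str], float]]        # r_i : O -> R, known only to agent i

    def step(self, joint_action: Tuple[str, ...]) -> Tuple[str, List[float]]:
        # Play one turn: every agent observes the outcome, and each agent i
        # privately receives r_i(outcome); nobody ever sees the full matrix.
        outcome = self.outcome_fn(joint_action)
        return outcome, [r(outcome) for r in self.reward_fns]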


Game model

Observations:
The agents do not know the game matrix.
They are unable to compute best responses and Nash equilibria.
They can only reason on their history of actions and rewards.

A   a, ?   b, ?
B   c, ?   d, ?

(a, b, c, d are the agent's own rewards; ? marks the opponent's unknown rewards.)



Satisfaction Equilibrium

Since the agents can only reason on their history of payoffs, we may adopt a satisfaction-based reasoning:
If an agent is satisfied by its current reward, it should keep playing the same strategy.
An unsatisfied agent may decide to change its strategy according to some exploration function.

An equilibrium will arise when all agents are satisfied.


Formally, S_i : O → {0, 1} is the satisfaction function of agent i:

S_i(o) = 1 if r_i(o) ≥ σ_i (agent i is satisfied)
S_i(o) = 0 if r_i(o) < σ_i (agent i is not satisfied)

where σ_i is the satisfaction threshold of agent i.

A joint strategy a ∈ A is a satisfaction equilibrium if S_i(ω(a)) = 1 for every agent i.
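This definition translates directly into code; the following sketch is illustrative (the function name and signature are mine, not the paper's):

from typing import Callable, Sequence, Tuple

def is_satisfaction_equilibrium(joint_action: Tuple[str, ...],
                                outcome_fn: Callable[[Tuple[str, ...]], str],
                                reward_fns: Sequence[Callable[[str], float]],
                                thresholds: Sequence[float]) -> bool:
    # A joint strategy is a satisfaction equilibrium when every agent i is
    # satisfied with the resulting outcome: r_i(omega(a)) >= sigma_i for all i.
    outcome = outcome_fn(joint_action)
    return all(r(outcome) >= sigma for r, sigma in zip(reward_fns, thresholds))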


Example

Prisoner’s dilemma

        C         D
C    -1, -1   -10,  0
D     0, -10   -8, -8

Dominant strategy: D
Nash equilibrium: (D, D)
Pareto-optimal: (C, C), (D, C), (C, D)

Possible satisfaction matrices:

        C       D
C     1, 1    0, 1
D     1, 0    0, 0

        C       D
C     1, 1    0, 1
D     1, 0    1, 1
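To see where these 0/1 matrices can come from, here is a short sketch; the thresholds of -5 and -8 are an assumption chosen to reproduce the two matrices above (the slide does not state which thresholds were used):

# Prisoner's dilemma rewards from the slide: (row player, column player).
PD = {('C', 'C'): (-1, -1), ('C', 'D'): (-10, 0),
      ('D', 'C'): (0, -10), ('D', 'D'): (-8, -8)}

def satisfaction_matrix(payoffs, thresholds):
    # S_i = 1 iff agent i's reward meets its threshold sigma_i.
    return {a: tuple(int(r >= s) for r, s in zip(payoffs[a], thresholds))
            for a in payoffs}

# Thresholds of -5 for both players give the first matrix above:
# only (C, C) is a satisfaction equilibrium.
print(satisfaction_matrix(PD, (-5, -5)))

# Thresholds of -8 give the second matrix, where (C, C) and (D, D)
# are both satisfaction equilibria.
print(satisfaction_matrix(PD, (-8, -8)))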


Satisfaction Equilibrium

However, even if a satisfaction equilibrium exists, it may be unreachable:

        A       B       C
A     1, 1    0, 1    0, 1
B     1, 0    1, 0    0, 1
C     1, 0    0, 1    1, 0



Satisfaction Equilibrium Learning

If the satisfaction thresholds are fixed, we only need to apply the satisfaction-based reasoning:
Choose a strategy randomly.
If satisfied, keep playing the same strategy.
Else, choose a new strategy randomly.

We can also use other exploration functions which favour actions that have not been explored often.
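A minimal sketch of the fixed-threshold case on a two-player matrix game, using plain uniform random exploration (an illustration of the reasoning above, not the paper's exact algorithm):

import random

def learn_satisfaction_equilibrium(payoffs, thresholds, actions=('C', 'D'), turns=10_000):
    # Each agent keeps its current action while satisfied and re-draws it at
    # random otherwise; neither agent sees the matrix or the other's reward.
    joint = [random.choice(actions), random.choice(actions)]  # random initial strategies
    for _ in range(turns):
        rewards = payoffs[tuple(joint)]
        for i in (0, 1):
            if rewards[i] < thresholds[i]:        # unsatisfied -> explore
                joint[i] = random.choice(actions)
    return tuple(joint)

# With the prisoner's dilemma rewards shown earlier and thresholds of -5 for
# both players, play typically settles on (C, C), the satisfaction equilibrium.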


Satisfaction Equilibrium Learning

We use a simple update rule:
When the agent is satisfied, we increment its satisfaction threshold by some step value.
When the agent is unsatisfied, we decrement its satisfaction threshold by the same step value.
The step value is multiplied by a factor each turn so that it converges to 0.
We also use a limited history of our previous satisfaction states and thresholds for each action to bound the value of the satisfaction threshold.
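The threshold update itself can be sketched as follows; the step size and decay factor are illustrative constants, and the history-based bounding mentioned above is omitted:

def update_threshold(threshold, satisfied, step, decay=0.99):
    # Raise the threshold when the agent is satisfied, lower it otherwise,
    # and shrink the step each turn so that the threshold eventually stabilises.
    threshold += step if satisfied else -step
    return threshold, step * decay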



Results

Fixed satisfaction thresholds:
In simple games, we were always able to reach a satisfaction equilibrium.
Using a biased exploration improves the speed of convergence of the algorithm.

Learning the satisfaction thresholds:
We are generally able to learn the optimal satisfaction equilibrium in simple games.
Using a biased exploration improves the convergence percentage of the algorithm.
The decay factor and the history size affect the convergence of the algorithm and need to be adjusted to get optimal results.


Results – Prisoner’s dilemma (figure omitted)



Conclusion

It is possible to learn stable outcomes without observing anything but our own rewards.

Satisfaction equilibria can be defined on any Pareto-optimal solution. However, satisfaction equilibria are not always reachable.

The proposed learning algorithms achieve good performance in simple games. However, they require game-specific adjustments for optimal performance.


Conclusion

For more information, you can consult my publications at: http://www.damas.ift.ulaval.ca/~ross

Thank You!


Questions?

