
Satisfaction Equilibrium

Stéphane Ross

Canadian AI 2006

Problem

In real-life multiagent systems:

Agents generally do not know the preferences (rewards) of their opponents

Agents may not observe the actions of their opponents

In this context, most game theoretic solution concepts are hardly applicable

We may try to define equilibrium concepts that do not require complete information and that are achievable through learning, over repeated play.


Plan

Game model
Satisfaction Equilibrium
Satisfaction Equilibrium Learning
Results
Conclusion
Questions



Game model

n : the number of agents
A = A_1 × … × A_n : the joint action space
O : the set of possible outcomes
ω : A → O, the outcome function
r_i : O → ℝ, agent i's reward function

Agent i only knows A_i, O and r_i. After each turn, every agent observes an outcome o ∈ O.
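As a rough illustration (not from the original slides), the model above can be written as a small Python sketch; the class and field names are purely illustrative:

from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class LimitedInformationGame:
    # n agents, joint action space A = A_1 x ... x A_n, outcome set O,
    # outcome function omega : A -> O, private reward functions r_i : O -> R.
    action_sets: List[List[str]]                    # A_1, ..., A_n
    outcome_fn: Callable[[Tuple[str, ...]], str]    # omega : A -> O
    reward_fns: List[Callable[[str], float]]        # r_i : O -> R, known only to agent i

    def step(self, joint_action: Tuple[str, ...]) -> Tuple[str, List[float]]:
        # Play one turn: every agent observes the outcome, and each agent i
        # privately receives r_i(outcome); nobody ever sees the full matrix.
        outcome = self.outcome_fn(joint_action)
        return outcome, [r(outcome) for r in self.reward_fns]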


Game model

Observations:
The agents do not know the game matrix.
They are unable to compute best responses and Nash equilibria.
They can only reason on their history of actions and rewards.

A   a, ?   b, ?
B   c, ?   d, ?

(a, b, c, d are the agent's own rewards; ? marks the opponent's unknown rewards.)



Satisfaction Equilibrium

Since the agents can only reason on their history of payoffs, we may adopt a satisfaction-based reasoning:
If an agent is satisfied by its current reward, it should keep playing the same strategy.
An unsatisfied agent may decide to change its strategy according to some exploration function.

An equilibrium will arise when all agents are satisfied.


Formally, S_i : O → {0, 1} is the satisfaction function of agent i:

S_i(o) = 1 if r_i(o) ≥ σ_i (agent i is satisfied)
S_i(o) = 0 if r_i(o) < σ_i (agent i is not satisfied)

where σ_i is the satisfaction threshold of agent i.

A joint strategy a ∈ A is a satisfaction equilibrium if S_i(ω(a)) = 1 for every agent i.
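This definition translates directly into code; the following sketch is illustrative (the function name and signature are mine, not the paper's):

from typing import Callable, Sequence, Tuple

def is_satisfaction_equilibrium(joint_action: Tuple[str, ...],
                                outcome_fn: Callable[[Tuple[str, ...]], str],
                                reward_fns: Sequence[Callable[[str], float]],
                                thresholds: Sequence[float]) -> bool:
    # A joint strategy is a satisfaction equilibrium when every agent i is
    # satisfied with the resulting outcome: r_i(omega(a)) >= sigma_i for all i.
    outcome = outcome_fn(joint_action)
    return all(r(outcome) >= sigma for r, sigma in zip(reward_fns, thresholds))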


Example

Prisoner’s dilemma

        C         D
C    -1, -1   -10,  0
D     0, -10   -8, -8

Dominant strategy: D
Nash equilibrium: (D, D)
Pareto-optimal: (C, C), (D, C), (C, D)

Possible satisfaction matrices:

        C       D
C     1, 1    0, 1
D     1, 0    0, 0

        C       D
C     1, 1    0, 1
D     1, 0    1, 1
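To see where these 0/1 matrices can come from, here is a short sketch; the thresholds of -5 and -8 are an assumption chosen to reproduce the two matrices above (the slide does not state which thresholds were used):

# Prisoner's dilemma rewards from the slide: (row player, column player).
PD = {('C', 'C'): (-1, -1), ('C', 'D'): (-10, 0),
      ('D', 'C'): (0, -10), ('D', 'D'): (-8, -8)}

def satisfaction_matrix(payoffs, thresholds):
    # S_i = 1 iff agent i's reward meets its threshold sigma_i.
    return {a: tuple(int(r >= s) for r, s in zip(payoffs[a], thresholds))
            for a in payoffs}

# Thresholds of -5 for both players give the first matrix above:
# only (C, C) is a satisfaction equilibrium.
print(satisfaction_matrix(PD, (-5, -5)))

# Thresholds of -8 give the second matrix, where (C, C) and (D, D)
# are both satisfaction equilibria.
print(satisfaction_matrix(PD, (-8, -8)))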


Satisfaction Equilibrium

However, even if a satisfaction equilibrium exists, it may be unreachable:

        A       B       C
A     1, 1    0, 1    0, 1
B     1, 0    1, 0    0, 1
C     1, 0    0, 1    1, 0



Satisfaction Equilibrium Learning

If the satisfaction thresholds are fixed, we only need to apply the satisfaction-based reasoning:
Choose a strategy randomly.
If satisfied, keep playing the same strategy.
Else, choose a new strategy randomly.

We can also use other exploration functions which favour actions that have not been explored often.
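A minimal sketch of the fixed-threshold case on a two-player matrix game, using plain uniform random exploration (an illustration of the reasoning above, not the paper's exact algorithm):

import random

def learn_satisfaction_equilibrium(payoffs, thresholds, actions=('C', 'D'), turns=10_000):
    # Each agent keeps its current action while satisfied and re-draws it at
    # random otherwise; neither agent sees the matrix or the other's reward.
    joint = [random.choice(actions), random.choice(actions)]  # random initial strategies
    for _ in range(turns):
        rewards = payoffs[tuple(joint)]
        for i in (0, 1):
            if rewards[i] < thresholds[i]:        # unsatisfied -> explore
                joint[i] = random.choice(actions)
    return tuple(joint)

# With the prisoner's dilemma rewards shown earlier and thresholds of -5 for
# both players, play typically settles on (C, C), the satisfaction equilibrium.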


Satisfaction Equilibrium Learning

We use a simple update rule:
When the agent is satisfied, we increment its satisfaction threshold by some step value.
When the agent is unsatisfied, we decrement its satisfaction threshold by the same step value.
The step value is multiplied by a factor each turn so that it converges to 0.
We also use a limited history of our previous satisfaction states and thresholds for each action to bound the value of the satisfaction threshold.
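The threshold update itself can be sketched as follows; the step size and decay factor are illustrative constants, and the history-based bounding mentioned above is omitted:

def update_threshold(threshold, satisfied, step, decay=0.99):
    # Raise the threshold when the agent is satisfied, lower it otherwise,
    # and shrink the step each turn so that the threshold eventually stabilises.
    threshold += step if satisfied else -step
    return threshold, step * decay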



Results

Fixed satisfaction thresholds:
In simple games, we were always able to reach a satisfaction equilibrium.
Using a biased exploration improves the speed of convergence of the algorithm.

Learning the satisfaction thresholds:
We are generally able to learn the optimal satisfaction equilibrium in simple games.
Using a biased exploration improves the convergence percentage of the algorithm.
The decay factor and the history size affect the convergence of the algorithm and need to be adjusted to get optimal results.


Results – Prisoner’s dilemma (figure omitted)



Conclusion

It is possible to learn stable outcomes without observing anything but our own rewards.

Satisfaction equilibria can be defined on any Pareto-optimal solution. However, satisfaction equilibria are not always reachable.

The proposed learning algorithms achieve good performance in simple games. However, they require game-specific adjustments for optimal performance.


Conclusion

For more information, you can consult my publications at: http://www.damas.ift.ulaval.ca/~ross

Thank You!


Questions?

