
Towards Equilibrium Transfer in Markov Games 胡裕靖 2013-9-9.

Date post: 17-Dec-2015
Upload: ezra-mitchell

Towards Equilibrium Transfer in Markov Games

胡裕靖, 2013-9-9


Outline

Background
Preliminary Ideas
Some Results


Background


Multi-agent Reinforcement Learning

Single-agent RL: Mountain Car, path finding

RL in multi-agent tasks: robot soccer, IKEA furniture robot


Markov Games

A Markov game extends the MDP from one agent to more than one.

MDP (a single agent takes actions):
$S$: the discrete state space.
$A$: the action space of the agent.
$r$ is the reward function.
$p$ is the transition function.

Markov game (the agents take joint actions):
$N$: the set of agents.
$S$: the discrete state space.
$A$: the joint action space of the agents.
$r$ is the reward function.
$p$ is the transition function.
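As a rough illustration of the definition above, a tabular Markov game can be stored as a few lookup tables. The class name `MarkovGame`, its field names, and the dictionary layout are assumptions made for this sketch and do not come from the slides.

```python
from dataclasses import dataclass
from itertools import product
import random


@dataclass
class MarkovGame:
    """Hypothetical tabular container for a Markov game (all names are illustrative)."""
    n_states: int
    action_counts: tuple  # action_counts[i] = number of actions of agent i
    rewards: dict         # (state, joint_action) -> tuple with one reward per agent
    transitions: dict     # (state, joint_action) -> list of next-state probabilities

    @property
    def n_agents(self):
        return len(self.action_counts)

    def joint_actions(self):
        """Enumerate the joint action space A_1 x ... x A_n."""
        return list(product(*(range(k) for k in self.action_counts)))

    def step(self, state, joint_action, rng=random):
        """Sample the next state and return it with every agent's reward."""
        probs = self.transitions[(state, joint_action)]
        next_state = rng.choices(range(self.n_states), weights=probs)[0]
        return next_state, self.rewards[(state, joint_action)]
```

With a single entry in `action_counts` the same container describes an ordinary MDP, which mirrors the step from one agent to more than one.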


Equilibrium-based MARL

Some equilibrium solution concepts in game theory can be adopted
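For orientation, equilibrium-based MARL algorithms such as Nash-Q typically share the update below. It is the standard form from the literature rather than something stated on this slide, and the symbols $\alpha$ (learning rate), $\gamma$ (discount factor) and $\Phi_i(s')$ (agent $i$'s expected payoff under the selected equilibrium of the normal-form game at the next state $s'$) are assumptions introduced here.

```latex
Q_i(s, \vec{a}) \leftarrow (1 - \alpha)\, Q_i(s, \vec{a})
  + \alpha \left[ r_i(s, \vec{a}) + \gamma\, \Phi_i(s') \right]
```

Each update changes a single entry of agent $i$'s table at state $s$, while evaluating $\Phi_i(s')$ requires solving a normal-form game.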


Our Previous Work

Equilibrium-based MARL:
Multi-agent reinforcement learning with meta equilibrium []
Multi-agent reinforcement learning by negotiation with unshared value functions []
Focus: combining MARL with equilibrium solution concepts

Problematic issues:
Equilibrium computation is complicated and time-consuming: finding a Nash equilibrium belongs to the complexity class TFNP (it is PPAD-complete). []
For tasks with many agents, equilibrium-based MARL algorithms may take too much time.

How can the learning process of equilibrium-based MARL be accelerated?


Transfer Learning in RL

MDP → MDP′: instance / policy / value function / model / …

Reuse learnt knowledge to accelerate learning.

Matthew E. Taylor, Peter Stone. Transfer learning for reinforcement learning domains: A survey. Journal of Machine Learning Research, 2009.

Alessandra Lazaric. Transfer in reinforcement learning: a framework and a survey. Reinforcement Learning, Springer, 2012.


Transfer Learning in Markov Games?

Markov game → Markov game′: instance / policy / value function / model / …

[Figure: over time $t$, each Markov game unfolds into a sequence of normal-form games $G(s)$, $G(s')$, $G(s'')$, ……]

Why not transfer between these normal-form games within a Markov game?

Inter-task transfer: between Markov games

Inner-task transfer: between the normal-form games within one Markov game


Inner-task Transfer

$Q_1^t(s, a, b)$ → $Q_1^{t+1}(s, a, b)$ → ……

Transfer equilibria between similar normal-form games during learning in a Markov game:
Reuse the equilibria computed in previous games
Reduce learning time

Key problems:
Which games are similar? For example, the games that occur on different visits to a state.
How to transfer an equilibrium?


Preliminary Ideas


Game Similarity

Games with the same action space?
Games with different action spaces?
Similarity as payoff distance?
Equilibrium-based similarity or equilibrium-independent similarity?

Drew Fudenberg and David M. Kreps. A theory of learning, experimentation and equilibrium in games. 1990.


Game Similarity

Why not take in the second game?

Weird cycle: equilibrium transfer → needs equilibrium-based similarity → needs finding the equilibria of the two games and computing the similarity → transfer seems senseless!


Our Idea

Transfer equilibria between games which are thought to be similar.

Evaluate how much loss the equilibrium transfer brings.

Transfer is acceptable when the loss is small.

$Q_1^t(s, a, b)$ → $Q_1^{t+1}(s, a, b)$ → ……

The two games differ in only one entry.
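The idea can be sketched as a reuse-or-recompute rule for two-player games. This is only an illustration under assumptions: the function names, the threshold `tol`, and the `solve` callback (standing in for any equilibrium solver) are hypothetical, and the loss is measured as the largest gain a player can obtain by deviating to a pure action.

```python
def deviation_gain(U1, U2, p, q):
    """Largest gain any player gets by deviating from (p, q) to a pure action."""
    rows, cols = range(len(p)), range(len(q))
    u1 = sum(p[a] * q[b] * U1[a][b] for a in rows for b in cols)
    u2 = sum(p[a] * q[b] * U2[a][b] for a in rows for b in cols)
    best1 = max(sum(q[b] * U1[a][b] for b in cols) for a in rows)
    best2 = max(sum(p[a] * U2[a][b] for a in rows) for b in cols)
    return max(best1 - u1, best2 - u2)


def equilibrium_for(U1, U2, cached, solve, tol=0.05):
    """Reuse the cached profile if its loss in the new game is at most tol."""
    if cached is not None and deviation_gain(U1, U2, *cached) <= tol:
        return cached, True      # transferred, no equilibrium computation
    return solve(U1, U2), False  # loss too large (or nothing cached): recompute
```

Here `U1` and `U2` are the two players' payoff matrices in the current game and `cached` is the profile computed on an earlier visit to the same state. Evaluating the loss needs only a few sums over the payoff tables, so the solver is called only when the reused profile has drifted too far from equilibrium.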


Problem Definition

$(G, p^*)$ → transfer method → $(G', ?)$

Can we find a transfer method which transfers the computed Nash equilibrium $p^*$ in game $G$ to a strategy profile $p'$ in game $G'$ such that, $\forall i \in N$ and $\forall a_i \in A_i$, there holds

$$U_i^{G'}(a_i, p'_{-i}) \le U_i^{G'}(p') + \epsilon$$

where $\epsilon$ is close to $0$? In other words, given a transfer method, if $\epsilon$ is small enough, then the transfer method is acceptable.

Furthermore,

Approximate Nash equilibrium


Problem Definition

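$\forall i \in N$ and $\forall a_i \in A_i$, define the transfer error

$$\epsilon_i(a_i, p') = U_i^{G'}(a_i, p'_{-i}) - U_i^{G'}(p')$$

Let $\epsilon_i = \max_{a_i \in A_i} \epsilon_i(a_i, p')$. Let $\epsilon = \max_{i \in N} \epsilon_i$.

Given a transfer method, we need to find the bound of $\epsilon$!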
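A minimal sketch of how the transfer error could be evaluated for any number of agents follows. It assumes the per-agent error is the maximum over that agent's pure actions and the overall error is the maximum over agents; the function names and the dictionary-based payoff tables are illustrative choices, not notation from the slides.

```python
from itertools import product


def expected_payoff(payoff, profile):
    """Expected payoff of one agent; payoff maps joint-action tuples to numbers."""
    total = 0.0
    for joint in product(*(range(len(p)) for p in profile)):
        prob = 1.0
        for agent, action in enumerate(joint):
            prob *= profile[agent][action]
        total += prob * payoff[joint]
    return total


def transfer_error(payoffs, profile):
    """Return (eps, [eps_i]) for a profile played in the game `payoffs`.

    payoffs[i] is agent i's payoff table; profile[i] is its mixed strategy.
    """
    per_agent = []
    for i, payoff in enumerate(payoffs):
        base = expected_payoff(payoff, profile)
        gains = []
        for a_i in range(len(profile[i])):
            # Agent i deviates to the pure action a_i; the others keep their strategies.
            pure = [1.0 if a == a_i else 0.0 for a in range(len(profile[i]))]
            deviated = list(profile[:i]) + [pure] + list(profile[i + 1:])
            gains.append(expected_payoff(payoff, deviated) - base)
        per_agent.append(max(gains))
    return max(per_agent), per_agent
```

The enumeration is exponential in the number of agents, but it involves only sums and products, which is far cheaper than computing a fresh equilibrium.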


A Naïve Transfer Method

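Define the difference of the two games $\delta_i$ such that, $\forall i \in N$ and $\forall \vec{a} \in A$, $\delta_i(\vec{a}) = U_i^{G'}(\vec{a}) - U_i^{G}(\vec{a})$.

Examine the transfer error $\epsilon_i(a_i, p')$.

Direct transfer: reuse $p^*$ unchanged in $G'$, i.e. $p' = p^*$.

$(G, p^*)$ → direct transfer → $(G', p^*)$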
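A small worked example may help to see direct transfer in action. The payoffs below are hypothetical prisoner's-dilemma values chosen for illustration (actions $C$ and $D$), and $G'$ is assumed to differ from $G$ in a single entry of agent 1's payoffs.

```latex
\begin{align*}
& U_1^{G}(C,C) = 3, \quad U_1^{G}(C,D) = 0, \quad U_1^{G}(D,C) = 5, \quad U_1^{G}(D,D) = 1,
  \qquad p^* = (D, D) \\
& \delta_1(C,D) = 1.2, \quad \delta_i(\vec{a}) = 0 \text{ otherwise}
  \;\Rightarrow\; U_1^{G'}(C,D) = 1.2 \\
& \epsilon_1(C, p^*) = U_1^{G'}(C,D) - U_1^{G'}(D,D) = 1.2 - 1 = 0.2,
  \qquad \epsilon_1(D, p^*) = 0
\end{align*}
```

Agent 2's payoffs are symmetric and unchanged, so it gains nothing by deviating. The reused profile $p^*$ is therefore a $0.2$-approximate Nash equilibrium of $G'$.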


A Naïve Transfer Method

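With direct transfer ($p' = p^*$):

$$
\begin{aligned}
\epsilon_i(a_i, p') &= U_i^{G'}(a_i, p^*_{-i}) - U_i^{G'}(p^*) \\
&= \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i})\, U_i^{G'}(a_i, \vec{a}_{-i}) - \sum_{a_i'} \sum_{\vec{a}_{-i}} p_i^*(a_i')\, p^*_{-i}(\vec{a}_{-i})\, U_i^{G'}(a_i', \vec{a}_{-i}) \\
&= \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) \Big[ U_i^{G'}(a_i, \vec{a}_{-i}) - \sum_{a_i'} p_i^*(a_i')\, U_i^{G'}(a_i', \vec{a}_{-i}) \Big] \\
&= \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) \Big[ U_i^{G}(a_i, \vec{a}_{-i}) + \delta_i(a_i, \vec{a}_{-i}) - \sum_{a_i'} p_i^*(a_i') \big[ U_i^{G}(a_i', \vec{a}_{-i}) + \delta_i(a_i', \vec{a}_{-i}) \big] \Big] \\
&= \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) \Big[ U_i^{G}(a_i, \vec{a}_{-i}) - \sum_{a_i'} p_i^*(a_i')\, U_i^{G}(a_i', \vec{a}_{-i}) \Big] + \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) \Big[ \delta_i(a_i, \vec{a}_{-i}) - \sum_{a_i'} p_i^*(a_i')\, \delta_i(a_i', \vec{a}_{-i}) \Big] \\
&\le \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) \Big[ \delta_i(a_i, \vec{a}_{-i}) - \sum_{a_i'} p_i^*(a_i')\, \delta_i(a_i', \vec{a}_{-i}) \Big] \\
&= \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i})\, \delta_i(a_i, \vec{a}_{-i}) - \sum_{\vec{a}} p^*(\vec{a})\, \delta_i(\vec{a}) \\
&\le \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i})\, \delta_i^{+}(a_i, \vec{a}_{-i}) - \sum_{\vec{a}} p^*(\vec{a})\, \delta_i(\vec{a}) \\
&\le \sum_{\vec{a}_{-i}} \delta_i^{+}(a_i, \vec{a}_{-i}) - \sum_{\vec{a}} p^*(\vec{a})\, \delta_i(\vec{a})
\end{aligned}
$$

where $\delta_i^{+}(a_i, \vec{a}_{-i}) = \max\big(0, \delta_i(a_i, \vec{a}_{-i})\big)$. The first inequality holds because $p^*$ is a Nash equilibrium of $G$, so the $U_i^{G}$ term is non-positive; the last holds because $p^*_{-i}(\vec{a}_{-i}) \le 1$ and $\delta_i^{+} \ge 0$.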


A Naïve Transfer Method

$$\epsilon_i(a_i, p') \le \sum_{\vec{a}_{-i}} \delta_i^{+}(a_i, \vec{a}_{-i}) - \sum_{\vec{a}} p^*(\vec{a})\, \delta_i(\vec{a})$$

Many items in $\delta_i$ are zero if the two games are very similar.
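The bound can be checked numerically on a toy case. The sketch below does so for the row player of a two-player game, assuming direct transfer and a payoff difference with a single non-zero entry; the function name and the choice of matching pennies as the original game are illustrative.

```python
import random


def direct_transfer_check(U1, delta1, p, q):
    """For the row player, return [(error, bound)] per action under direct transfer.

    U1: row player's payoffs in G; delta1: payoff change G' - G;
    (p, q): a Nash equilibrium of G, reused unchanged in G'.
    """
    rows, cols = range(len(p)), range(len(q))
    U1_new = [[U1[a][b] + delta1[a][b] for b in cols] for a in rows]
    value = sum(p[a] * q[b] * U1_new[a][b] for a in rows for b in cols)
    mean_delta = sum(p[a] * q[b] * delta1[a][b] for a in rows for b in cols)
    results = []
    for a in rows:
        error = sum(q[b] * U1_new[a][b] for b in cols) - value
        bound = sum(max(0.0, delta1[a][b]) for b in cols) - mean_delta
        results.append((error, bound))
    return results


# Matching pennies: the uniform profile is its Nash equilibrium.
U1 = [[1.0, -1.0], [-1.0, 1.0]]
p = q = [0.5, 0.5]
rng = random.Random(0)
for _ in range(1000):
    delta1 = [[0.0, 0.0], [0.0, 0.0]]
    delta1[rng.randrange(2)][rng.randrange(2)] = rng.uniform(-1, 1)  # one entry changes
    for error, bound in direct_transfer_check(U1, delta1, p, q):
        assert error <= bound + 1e-12
```

When only one entry of the payoff difference is non-zero, the first sum in the bound has at most one non-zero term, which is why the bound stays small for very similar games.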


Some Results


Future Work

Some problems:
Other transfer methods?
Only Nash equilibrium?
Equilibrium finding algorithms

Transfer between games with different action spaces

Transfer between games with different numbers of agents

Game abstraction


Thanks!

