Games where you can play optimally without any memory


Authors: Hugo Gimbert and Wieslaw Zielonka

Presented by Moria Abadi

Arena and Play

$S_{Max}$: states of Max; $S_{Min}$: states of Min

color(play) = blue blue yellow …

Payoff Mapping of Player

$u : C^\omega \to \mathbb{R} \cup \{-\infty, +\infty\}$

The player wins payoff $u(x)$ in play $x$.

$u(x) \le u(y)$ means that $y$ is at least as good for the player as $x$.

Example 1 – Parity Game

$u(c_0 c_1 \ldots) = (\limsup_{i \to \infty} c_i) \bmod 2$

Max wins 1 if the highest color visited infinitely often is odd, otherwise his payoff is 0
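As a small illustration, the parity payoff can be evaluated on an ultimately periodic play, written as a finite prefix followed by a cycle repeated forever. This is only a sketch: the function name and the prefix/cycle representation are assumptions made for the example, not part of the slides.

```python
def parity_payoff(prefix, cycle):
    """Parity payoff of the ultimately periodic play prefix . cycle^omega.

    Only the colors of the cycle are visited infinitely often, so the
    lim sup of the color sequence is max(cycle); the prefix is irrelevant.
    """
    if not cycle:
        raise ValueError("cycle must be non-empty")
    return max(cycle) % 2


# Highest recurring color is 3 (odd): Max wins 1.
print(parity_payoff([5, 4], [2, 3, 1]))
# Highest recurring color is 2 (even): the payoff is 0.
print(parity_payoff([5], [2, 0]))
```

The color 5 in the prefix is visited only finitely often, which is why it does not influence either result.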

Example 2 – Sup Game

$u(c_0 c_1 \ldots) = \sup_{i \in \mathbb{N}} c_i$

Max wins the highest value seen during the play

Example 4 – Mean Payoff Game

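$u(c_0 c_1 \ldots) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} c_i$

Does not always exist, so the payoff is defined with lim sup instead:

$u(c_0 c_1 \ldots) = \limsup_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} c_i$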
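To see why the plain limit may fail to exist, consider a word made of alternating blocks of 1s and 0s whose lengths double each time: 1 00 1111 00000000 ... The sketch below computes the prefix averages at the end of each block. The word and the function name are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction


def block_boundary_averages(num_blocks):
    """Prefix averages of the word 1 00 1111 00000000 ... at block ends.

    Block k has length 2**k and carries color 1 if k is even, else color 0.
    """
    total, length, averages = 0, 0, []
    for k in range(num_blocks):
        color = 1 if k % 2 == 0 else 0
        total += color * 2**k
        length += 2**k
        averages.append(Fraction(total, length))
    return averages


for k, average in enumerate(block_boundary_averages(8)):
    print(k, average, float(average))
```

After every block of 0s the average is exactly 1/3, while after every block of 1s it stays above 2/3. The averages therefore keep oscillating and have no limit, whereas the lim sup is always defined.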

Preference Relation of Player

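$\sqsubseteq$ is a complete preorder relation on $C^\omega$

$x \sqsubseteq y$ means that $y$ is at least as good for the player as $x$

$u$ induces $\sqsubseteq$: $x \sqsubseteq y$ iff $u(x) \le u(y)$

$x \sqsubset y$ denotes $x \sqsubseteq y$ but not $y \sqsubseteq x$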

Antagonistic Games

• $x \sqsubseteq^{-1} y$ iff $y \sqsubseteq x$

$\sqsubseteq$ is the preference relation of Max

$\sqsubseteq^{-1}$ is the preference relation of Min

Games, Strategies

• Game $(G, \sqsubseteq)$: $G = (S_{Max}, S_{Min}, E)$ is a finite arena and $\sqsubseteq$ is a preference relation for player Max

$\sigma$: strategy for Max

$\tau$: strategy for Min

• $p_G(t, \sigma, \tau)$ is the play in $G$ with source $t$ consistent with both $\sigma$ and $\tau$.

Optimal Strategies Intuition

• $p_G(t, \sigma^\#, \tau^\#)$ is a play

$\sigma^\#$ and $\tau^\#$ are optimal if:

Neither Max nor Min can improve his outcome by changing his own strategy unilaterally

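Optimal Strategies Definition

A game $(G, \sqsubseteq)$ is given.

$\sigma^\#$ and $\tau^\#$ are optimal if, for all states $s$ and all strategies $\sigma$ and $\tau$,

$color(p_G(s, \sigma, \tau^\#)) \sqsubseteq color(p_G(s, \sigma^\#, \tau^\#)) \sqsubseteq color(p_G(s, \sigma^\#, \tau))$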
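The two inequalities can be checked by brute force on a tiny arena. The sketch below is an assumption-laden illustration: the arena, its colors and all names are hypothetical, the payoff is the parity payoff, and only memoryless deviations are tried, which is weaker than the definition above (it quantifies over all strategies, including those with memory).

```python
from itertools import product

# Hypothetical arena: state -> list of (color, successor) edges.
EDGES = {
    "a": [(1, "b"), (2, "a")],
    "b": [(0, "a"), (3, "b")],
}
OWNER = {"a": "Max", "b": "Min"}


def play_payoff(start, choice):
    """Parity payoff of the play from start when each state uses choice[state]."""
    seen, colors, state = {}, [], start
    while state not in seen:
        seen[state] = len(colors)
        color, successor = EDGES[state][choice[state]]
        colors.append(color)
        state = successor
    cycle = colors[seen[state]:]  # colors repeated forever
    return max(cycle) % 2


def strategies(player):
    """All memoryless strategies of a player, as dicts state -> edge index."""
    states = [s for s in EDGES if OWNER[s] == player]
    for picks in product(*(range(len(EDGES[s])) for s in states)):
        yield dict(zip(states, picks))


def is_optimal_pair(sigma, tau):
    """Check both inequalities against memoryless deviations only."""
    for s in EDGES:
        value = play_payoff(s, {**sigma, **tau})
        if any(play_payoff(s, {**alt, **tau}) > value for alt in strategies("Max")):
            return False
        if any(play_payoff(s, {**sigma, **alt}) < value for alt in strategies("Min")):
            return False
    return True


optimal = [(s, t) for s in strategies("Max") for t in strategies("Min")
           if is_optimal_pair(s, t)]
print(optimal)
```

In this particular arena Max secures payoff 1 from every state by taking the edge from a to b, so every pair that survives the check uses that choice for Max.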

The Main Question

Under which conditions on $\sqsubseteq$ do Max and Min have optimal memoryless strategies for all games?

Some conditions on $\sqsubseteq$ will be defined.

Min and Max have optimal memoryless strategies iff $\sqsubseteq$ satisfies these conditions.

Parity games, mean payoff games,…

Rec(C): all languages of finite words over C recognizable by finite automata

For $L \in Rec(C)$:

Pref(L): all prefixes of the words in L

$[L] = \{x \in C^\omega \mid \text{every finite prefix of } x \text{ is in } Pref(L)\}$

[L] Example

$L = \{1^k (01)^m 0^j \mid 0 \le k \le 1,\ 0 \le j \le 1,\ 0 \le m\}$

$[L] = \{(01)^\omega, (10)^\omega\}$
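A finite approximation of this example can be computed directly, assuming the language is $L = \{1^k (01)^m 0^j\}$ with $k, j \in \{0, 1\}$ and $m \ge 0$. The sketch lists the binary words of a given length all of whose prefixes lie in Pref(L); these are the length-n prefixes of the infinite words in [L]. Function names are hypothetical.

```python
from itertools import product


def words_of_L(max_m):
    """All words 1^k (01)^m 0^j with k, j in {0, 1} and 0 <= m <= max_m."""
    return {"1" * k + "01" * m + "0" * j
            for k in (0, 1) for j in (0, 1) for m in range(max_m + 1)}


def in_pref_L(w):
    """True if w is a prefix of some word of L."""
    # A word of L with m = len(w) is already longer than w,
    # so larger values of m cannot produce new prefixes of this length.
    return any(v.startswith(w) for v in words_of_L(len(w)))


def surviving_words(n):
    """Binary words of length n all of whose prefixes lie in Pref(L)."""
    result = []
    for letters in product("01", repeat=n):
        w = "".join(letters)
        if all(in_pref_L(w[:i]) for i in range(n + 1)):
            result.append(w)
    return result


print(surviving_words(6))
```

Only the two alternating words of each length survive, which matches the two infinite words $(01)^\omega$ and $(10)^\omega$.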

Lemma 3

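$[L \cup M] = [L] \cup [M]$

Proof idea: suppose $x \in [L \cup M]$ but $x \notin [L]$ and $x \notin [M]$. Then $x$ has prefixes $x'$ and $x''$ with

$x' \in Pref(L),\ x' \notin Pref(M)$

$x'' \in Pref(M),\ x'' \notin Pref(L)$

The longer of the two prefixes has the shorter one as a prefix, which contradicts the fact that Pref(L) and Pref(M) are closed under prefixes.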

Co-accessible Automaton

• From any state there is a (possibly empty) path to a final state

C={0,1}
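Any finite automaton can be made co-accessible by removing the states from which no final state is reachable. The sketch below computes the co-accessible states by backward reachability. The automaton, its state names and the triple representation of transitions are assumptions made for this example.

```python
def coaccessible_states(transitions, finals):
    """States from which some final state is reachable (possibly by the empty path).

    transitions: iterable of (source, color, target) triples.
    """
    reached = set(finals)
    changed = True
    while changed:
        changed = False
        for source, _color, target in transitions:
            if target in reached and source not in reached:
                reached.add(source)
                changed = True
    return reached


# Hypothetical automaton over C = {0, 1}; state "dead" cannot reach a final state.
DELTA = [("i", "0", "q"), ("q", "1", "i"), ("i", "1", "dead"), ("dead", "0", "dead")]
FINAL = {"q"}
ALL_STATES = {s for s, _, _ in DELTA} | {t for _, _, t in DELTA}

keep = coaccessible_states(DELTA, FINAL)
print(sorted(keep))               # states of the co-accessible part
print(sorted(ALL_STATES - keep))  # states that would be trimmed
```

Final states are co-accessible through the empty path, which is why the search starts from them.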

Lemma 4

• Let A = (Q, i, F, Δ) be a co-accessible finite automaton recognizing a language L. Then

$[L] = \{color(p) \mid p \text{ is an infinite path in } A,\ source(p) = i\}$

($\supseteq$) Let $p = e_0 e_1 e_2 \ldots$ be an infinite path with source $i$. For all $n$ there is a path from $target(e_n)$ to a final state, so every finite prefix of $color(p)$ is in Pref(L), that is, $color(p) \in [L]$.

($\subseteq$) Let $x = c_0 c_1 c_2 \ldots \in [L]$. For all $n$ there is a path from $i$ matching $c_0 \ldots c_n$. Since A is finite, König's lemma gives an infinite path $p$ with source $i$ and $color(p) = x$.

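Extension of $\sqsubseteq$ and $\sqsubset$

For $X, Y \subseteq C^\omega$:

$X \sqsubseteq Y$ iff $\forall x \in X\ \exists y \in Y,\ x \sqsubseteq y$

$X \sqsubset Y$ iff $\exists y \in Y\ \forall x \in X,\ x \sqsubset y$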
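For finite sets of plays and a preference relation induced by a payoff mapping u, the two extended relations translate into nested quantifiers. This is a sketch under those assumptions (the slides allow arbitrary subsets of $C^\omega$); the function names are hypothetical and plays are represented directly by numbers with u as the identity.

```python
def set_leq(X, Y, u):
    """X below-or-equal Y: every x in X is matched by some y in Y with u(x) <= u(y)."""
    return all(any(u(x) <= u(y) for y in Y) for x in X)


def set_lt(X, Y, u):
    """X strictly below Y: some y in Y is strictly better than every x in X."""
    return any(all(u(x) < u(y) for x in X) for y in Y)


def identity(value):
    return value


print(set_leq([0, 1], [1, 2], identity))  # every element of X is matched
print(set_lt([0, 1], [1, 2], identity))   # 2 beats every element of X
print(set_lt([0, 1], [1], identity))      # 1 does not strictly beat 1
print(set_leq([], [1], identity))         # the empty set is below any set
```

Note that the quantifier order differs between the two relations: the strict version needs a single element of Y that dominates all of X.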

Monotony

$\sqsubseteq$ is monotone if for all $M, N \in Rec(C)$:

$\exists x \in C^*,\ [xM] \sqsubset [xN] \implies \forall y \in C^*,\ [yM] \sqsubseteq [yN]$

Intuitively: at each moment during the play the optimal choice between two possible futures does not depend on the preceding finite play

Example of non-monotone $\sqsubseteq$

$u(c_0 c_1 \ldots) = \sup_{n \ge 1} \frac{1}{n} \sum_{i=0}^{n-1} c_i$

u(xv)<u(xw) while u(yw)<u(yv)

u(xv) = 2/5, u(xw) = 1, u(yv) = 6/5, u(yw) = 1
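The four values can be reproduced with concrete words, assuming the payoff is the supremum of the prefix averages. The words below (x = 0000, y = 1111, v = 2 followed by 0 forever, w = 1 forever) are a hypothetical choice that yields exactly these numbers; the slide's own figure is not reproduced here.

```python
from fractions import Fraction


def sup_mean(prefix, cycle):
    """Supremum of the prefix averages of the word prefix . cycle^omega."""
    best, total = None, 0
    for n, color in enumerate(list(prefix) + list(cycle), start=1):
        total += color
        average = Fraction(total, n)
        best = average if best is None else max(best, average)
    # At each offset in the cycle, later averages move monotonically toward
    # mean(cycle), so the supremum is reached in this first pass or equals that limit.
    return max(best, Fraction(sum(cycle), len(cycle)))


# Hypothetical words; infinite words are given as (finite part, cycle).
x, y = [0, 0, 0, 0], [1, 1, 1, 1]
v = ([2], [0])  # v = 2 0^omega
w = ([], [1])   # w = 1^omega


def u(finite, infinite):
    head, cycle = infinite
    return sup_mean(finite + head, cycle)


print(u(x, v), u(x, w), u(y, v), u(y, w))
```

After the prefix x the continuation w is strictly better than v, but after the prefix y the order is reversed, so the best future depends on the past.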

Selectivity

$\sqsubseteq$ is selective if for all $x \in C^*$ and all $M, N, K \in Rec(C)$:

$[x(M \cup N)^* K] \sqsubseteq [xM^*] \cup [xN^*] \cup [xK]$

Intuitively: the player cannot improve his payoff by switching between different behaviors

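Example of non-selective $\sqsubseteq$

C = {0, 1}

$u(c_1 c_2 \ldots) = 1$ if the colors 0 and 1 both occur infinitely often, 0 otherwise

$M = \{1^k \mid 0 \le k\}$, $N = \{0^k \mid 0 \le k\}$

$(01)^\omega \in [(M \cup N)^*]$, $[M^*] = \{1^\omega\}$, $[N^*] = \{0^\omega\}$

$u((01)^\omega) > u(1^\omega)$ and $u((01)^\omega) > u(0^\omega)$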
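The three payoffs of this example can be evaluated on ultimately periodic words, where a color occurs infinitely often exactly when it appears in the cycle. The function name and the prefix/cycle representation are assumptions made for this sketch.

```python
def both_colors_payoff(prefix, cycle):
    """1 if colors 0 and 1 both occur infinitely often in prefix . cycle^omega, else 0."""
    if not cycle:
        raise ValueError("cycle must be non-empty")
    return 1 if {0, 1} <= set(cycle) else 0


print(both_colors_payoff([], [0, 1]))  # (01)^omega
print(both_colors_payoff([], [1]))     # 1^omega
print(both_colors_payoff([], [0]))     # 0^omega
```

Switching between the two behaviors forever is strictly better than committing to either one, which is what selectivity forbids.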

The Main Theorem

Given a preference relation $\sqsubseteq$, both players have optimal memoryless strategies for all games $(G, \sqsubseteq)$ over finite arenas G if and only if the relations $\sqsubseteq$ and $\sqsubseteq^{-1}$ are monotone and selective.

Proof of Necessary Condition

Given a preference relation $\sqsubseteq$, if both players have optimal memoryless strategies for all games $(G, \sqsubseteq)$ over finite arenas G, then the relations $\sqsubseteq$ and $\sqsubseteq^{-1}$ are monotone and selective.

Simplification 1

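[Figure: the game on arena A with relation $\sqsubseteq$, and the game on arena B with relation $\sqsubseteq^{-1}$, in which the roles of Max and Min are exchanged]

It is enough to prove the claim only for $\sqsubseteq$.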

Simplification 2

• It turns out that already for one-player games, if Max has optimal memoryless strategies, $\sqsubseteq$ has to be monotone and selective

[Figure: one-player arenas as a special case of two-player arenas]

Lemma 5

Suppose that player Max has optimal memoryless strategies for all games $(G, \sqsubseteq)$ over finite one-player arenas $G = (S_{Max}, \emptyset, E)$.

Then $\sqsubseteq$ is monotone and selective.

Proof of Monotony

Let $x, y \in C^*$ and $M, N \in Rec(C)$ with $[xM] \sqsubset [xN]$. We shall prove $[yM] \sqsubseteq [yN]$.

• $A_x$ and $A_y$ are deterministic co-accessible automata recognizing {x} and {y}

• $A_N$ and $A_M$ are co-accessible automata recognizing N and M

• W.l.o.g. $A_N$ and $A_M$ have no transition with the initial state as a target

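Proof of Monotony

$x, y \in C^*$, $M, N \in Rec(C)$ and $[xM] \sqsubset [xN]$; to prove: $[yM] \sqsubseteq [yN]$.

If $[M] = \emptyset$: trivial, since then $[yM] = \emptyset$.

Otherwise $[M] \ne \emptyset$ and $[N] \ne \emptyset$, so by Lemma 4 there is an infinite path from the initial state of $A_M$ and from the initial state of $A_N$.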

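Automaton A

[Figure: automaton A assembled from $A_x$, $A_y$, $A_M$ and $A_N$, with a common state $i$ where $A_M$ and $A_N$ start]

Seen as a one-player arena of Max, the plays of A that start at the initial state of $A_x$ are exactly

$[x(M \cup N)] = [xM] \cup [xN]$

Let $p$ be the play from the initial state of $A_x$ consistent with $\sigma^\#$. Since $[xM] \sqsubset [xN]$, optimality gives $color(p) \in [xN]$, so at state $i$ the strategy $\sigma^\#$ moves into $A_N$.

Let $q$ be the play from the initial state of $A_y$ consistent with $\sigma^\#$. Because $\sigma^\#$ is memoryless, it makes the same choice at $i$, so $color(q) \in [yN]$. By optimality,

$[yM] \cup [yN] \sqsubseteq \{color(q)\}$

and therefore $[yM] \sqsubseteq [yN]$.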

Proof of Sufficient Condition

Given monotone and selective preference relations $\sqsubseteq$ and $\sqsubseteq^{-1}$, both players have optimal memoryless strategies for all games $(G, \sqsubseteq)$ over finite arenas G.

Arena Number

• G=(S,E)

• $n_G = |E| - |S|$

• Each state has at least one outgoing transition, so $n_G \ge 0$

• The proof is by induction on $n_G$
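The arena number is straightforward to compute. The sketch below assumes an arena given as a set of states and a list of (source, target) edges; this representation and the function name are hypothetical. It also checks the requirement that every state has an outgoing transition, which is what guarantees a non-negative result.

```python
def arena_number(states, edges):
    """n_G = |E| - |S| for an arena with a state set and a list of (source, target) edges."""
    out_degree = {s: 0 for s in states}
    for source, _target in edges:
        out_degree[source] += 1
    if any(degree == 0 for degree in out_degree.values()):
        raise ValueError("every state needs at least one outgoing transition")
    return len(edges) - len(states)


# Each state has exactly one outgoing transition: n_G = 0, nothing to choose.
print(arena_number({"s", "t"}, [("s", "t"), ("t", "s")]))
# State t has a second outgoing transition: n_G = 1.
print(arena_number({"s", "t"}, [("s", "t"), ("t", "s"), ("t", "t")]))
```

When the number is 0, each state has exactly one outgoing transition, so no player ever has a choice to make.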

Induction

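Base case: for an arena G where $n_G = 0$, each state has exactly one outgoing transition, so strategies are unique.

Hypothesis

Let G be an arena and let $\sqsubseteq$ be monotone and selective. Suppose Max and Min have optimal memoryless strategies in all games $(H, \sqsubseteq)$ over arenas H such that $n_H < n_G$. Then Max has an optimal memoryless strategy in $(G, \sqsubseteq)$.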

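• We need to find $\sigma^\#$ such that $(\sigma^\#, \tau^\#)$ is optimal

• We will find $\tau^\#_m$, which requires memory, such that $(\sigma^\#, \tau^\#_m)$ is optimal

• Permuting Max and Min, we will find $(\sigma^\#_m, \tau^\#)$ optimal

• $(\sigma^\#, \tau^\#_m)$ and $(\sigma^\#_m, \tau^\#)$ are optimal $\implies$ $(\sigma^\#, \tau^\#)$ is optimal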

Induction Step

$n_{G_i} < n_G$ for $i \in \{0, 1\}$

$(\sigma^\#_i, \tau^\#_i)$ – optimal strategies in $G_i$

Induction Step

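$K_i$: colors of finite plays in $G_i$ from $t$ consistent with $\tau^\#_i$

$K_i \in Rec(C)$ and $\sqsubseteq$ is monotone, so either $\forall x \in C^*,\ [xK_0] \sqsubseteq [xK_1]$ or $\forall x \in C^*,\ [xK_1] \sqsubseteq [xK_0]$

W.l.o.g. $\forall x \in C^*,\ [xK_1] \sqsubseteq [xK_0]$. So let $\sigma^\# = \sigma^\#_0$.

$\tau^\#_m(p) = \tau^\#_0(target(p))$ if the last transition from $t$ was to $G_0$

$\tau^\#_m(p) = \tau^\#_1(target(p))$ if the last transition from $t$ was to $G_1$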
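The strategy with memory only needs to remember which subarena the last transition from t entered. The sketch below looks this up in the history of the play. All names, the representation of transitions as (source, target) pairs, and the choice to follow the $G_0$ strategy before t has been visited are assumptions made for this illustration.

```python
def tau_m(start, history, t, g0_edges_from_t, tau0, tau1):
    """Move of Min after the finite play `history`, a list of (source, target) transitions.

    tau0 and tau1 are memoryless strategies (dicts state -> successor) for G0 and G1.
    """
    current = history[-1][1] if history else start
    # Look for the most recent transition leaving t.
    for source, target in reversed(history):
        if source == t:
            chosen = tau0 if (source, target) in g0_edges_from_t else tau1
            return chosen[current]
    # Assumption: before t has been visited, play as in G0.
    return tau0[current]


# Hypothetical data: from t, the edge to "a" belongs to G0 and the edge to "b" to G1.
G0_EDGES = {("t", "a")}
TAU0 = {"m": "a"}
TAU1 = {"m": "b"}

print(tau_m("m", [("t", "a"), ("a", "m")], "t", G0_EDGES, TAU0, TAU1))
print(tau_m("m", [("t", "a"), ("a", "t"), ("t", "b"), ("b", "m")], "t", G0_EDGES, TAU0, TAU1))
```

Only the most recent visit to t matters, so one bit of memory would be enough to implement this strategy.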

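To prove: $color(p_G(s, \sigma, \tau^\#_m)) \sqsubseteq color(p_G(s, \sigma^\#, \tau^\#_m)) \sqsubseteq color(p_G(s, \sigma^\#, \tau))$

Right inequality: $color(p_G(s, \sigma^\#, \tau^\#_m)) \sqsubseteq color(p_G(s, \sigma^\#, \tau))$

All plays are in $G_0$.

Left inequality: $color(p_G(s, \sigma, \tau^\#_m)) \sqsubseteq color(p_G(s, \sigma^\#, \tau^\#_m))$

Assume $p_G(s, \sigma, \tau^\#_m)$ traverses the state $t$ (otherwise all plays are in $G_0$).

$x$: color of the shortest path to $t$ consistent with $\tau^\#_m$

$M_i$: colors of finite plays in $G_i$ from $t$ to $t$ consistent with $\tau^\#_i$

$color(p_G(s, \sigma, \tau^\#_m)) \in [x(M_0 \cup M_1)^*(K_0 \cup K_1)] \sqsubseteq [x(M_0)^*] \cup [x(M_1)^*] \cup [x(K_0 \cup K_1)]$ by selectivity

$(M_i)^* \subseteq K_i$, so $color(p_G(s, \sigma, \tau^\#_m)) \sqsubseteq [x(K_0 \cup K_1)] = [xK_0] \cup [xK_1] \sqsubseteq [xK_0]$

$color(p_G(s, \sigma, \tau^\#_m)) \sqsubseteq [xK_0] \sqsubseteq color(p_{G_0}(s, \sigma^\#_0, \tau^\#_0)) = color(p_G(s, \sigma^\#, \tau^\#_m))$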

A Very Important Corollary

Suppose that $\sqsubseteq$ is such that for each finite arena $G = (S_{Max}, S_{Min}, E)$ controlled by one player ($S_{Max} = \emptyset$ or $S_{Min} = \emptyset$), this player has an optimal memoryless strategy in $(G, \sqsubseteq)$. Then for all finite two-player arenas G both players have optimal memoryless strategies in the games $(G, \sqsubseteq)$.

Mean Payoff Game

$u(c_0 c_1 \ldots) = \limsup_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} c_i$
