Games, Times, and Probabilities: Value Iteration in Verification and Control
Krishnendu Chatterjee, Tom Henzinger
Graph Models of Systems
vertices = states
edges = transitions
paths = behaviors
graph
Extended Graph Models
CONTROL: game graph
OBJECTIVE: ω-automaton
PROBABILITIES: Markov decision process
stochastic game
ω-regular game
CLOCKS: timed automaton
stochastic hybrid system
Graphs vs. Games
Games model Open Systems
Two players: environment / controller / input vs.
system / plant / output
Multiple players: processes / components / agents
Stochastic players: nature / randomized algorithms
Example
P1:
  init x := 0
  loop
    choice | x := x+1 mod 2 | x := 0 end choice
  end loop
ψ1: □(x = y)
P2:
  init y := 0
  loop
    choice | y := x | y := x+1 mod 2 end choice
  end loop
ψ2: □(y = 0)
Graph Questions
∀□(x = y)  ✗
∃□(x = y)
(product state space xy ∈ {00, 10, 01, 11})
CTL
Zero-Sum Game Questions
⟨⟨P1⟩⟩ □(x = y)  ✗
⟨⟨P2⟩⟩ □(y = 0)  ✗
(product state space xy ∈ {00, 10, 01, 11})
ATL [Alur/H/Kupferman]
Nonzero-Sum Game Questions
⟨⟨P1⟩⟩ □(x = y)
⟨⟨P2⟩⟩ □(y = 0)
(product state space xy ∈ {00, 10, 01, 11})
Secure equilibria [Chatterjee/H/Jurdzinski]
Strategies
Strategies x, y: Q* → Q
From a state q, a pair (x, y) of a player-1 strategy x ∈ Γ1 and a player-2 strategy y ∈ Γ2 gives a unique infinite path Outcome_{x,y}(q) ∈ Qω.

⟨⟨P1⟩⟩ ψ1 = (∃x ∈ Γ1) (∀y ∈ Γ2) ψ1(x,y)
Short for:
q ⊨ ⟨⟨P1⟩⟩ ψ1  iff  (∃x ∈ Γ1) (∀y ∈ Γ2) ( Outcome_{x,y}(q) ⊨ ψ1 )

⟨⟨P1⟩⟩ψ1 ∧ ⟨⟨P2⟩⟩ψ2 = (∃x ∈ Γ1) (∃y ∈ Γ2) [ (ψ1 ∧ ψ2)(x,y)
  ∧ (∀y' ∈ Γ2) (ψ2 → ψ1)(x,y')
  ∧ (∀x' ∈ Γ1) (ψ1 → ψ2)(x',y) ]
Objectives ψ1 and ψ2
Qualitative: reachability; Buechi; parity (ω-regular)
Quantitative: max; lim sup; lim avg
Normal Forms of ω-Regular Sets
Borel-1:   Reachability ◇a;  Safety □a = ¬◇¬a
Borel-2:   Buechi □◇a;  coBuechi ◇□a = ¬□◇¬a
Borel-2.5: Streett ∧(□◇a → □◇b) = ∧(◇□¬a ∨ □◇b);  Rabin ∨(◇□a ∧ □◇b)
Parity: complement-closed subset of Streett/Rabin
Buechi Game
(game over states q0–q4, with Buechi target sets G and B)
• Secure equilibrium (x, y) at q0:
  x: if q1 → q0, then q2 else q4.  y: if q3 → q1, then q0 else q4.
• Strategies require memory.
Zero-Sum Games: Determinacy
ψ1 = ¬ψ2
W1 = ⟨⟨P1⟩⟩ ψ1
W2 = ⟨⟨P2⟩⟩ ψ2
(every state belongs to exactly one of W1, W2)
Nonzero-Sum Games
W10 = ⟨⟨P1⟩⟩ (ψ1 ∧ ¬ψ2)
W01 = ⟨⟨P2⟩⟩ (ψ2 ∧ ¬ψ1)
W11 = ⟨⟨P1⟩⟩ψ1 ∧ ⟨⟨P2⟩⟩ψ2
W00
Objectives
Qualitative: reachability; Buechi; parity (ω-regular)
Quantitative: max; lim sup; lim avg
(in increasing complexity: Borel-1; Borel-2; Borel-3)
Quantitative Games
(example game graph with edge weights among 0, 2, 3, 4)
⟨⟨P1⟩⟩ lim sup = 3
⟨⟨P1⟩⟩ lim avg = 1
Solving Games by Value Iteration
Generalization of the μ-calculus: computing fixpoints of transfer functions (pre; post).
Generalization of dynamic programming: iterative optimization.
Region R: Q → V
R(q) := pre(R(q'))
Graph
Q states;  Σ transition labels;  δ: Q × Σ → Q transition function
ℛ = [ Q → {0,1} ] regions, with V = B
∃pre:  q ∈ ∃pre(R) iff (∃σ ∈ Σ) δ(q,σ) ∈ R
∀pre:  q ∈ ∀pre(R) iff (∀σ ∈ Σ) δ(q,σ) ∈ R
Graph
(example: states a, b, c)
∃◇c = (μX) ( c ∨ ∃pre(X) )
∀◇c = (μX) ( c ∨ ∀pre(X) )
Graph Reachability
Given R ⊆ Q, find the states from which some path leads to R.
∃◇R = (μX) ( R ∨ ∃pre(X) )
Iteration: R,  R ∪ pre(R),  R ∪ pre(R) ∪ pre²(R),  …
Given R ⊆ Q, find the states from which all paths lead to R.
∀◇R = (μX) ( R ∨ ∀pre(X) )
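As a side note, this μ-iteration is easy to sketch in code. The toy graph, state names, and helper names below are invented for illustration; the scheme is exactly the one above, iterating X := R ∪ pre(X) to a fixpoint.

```python
def exists_pre(edges, region):
    """E-pre: states with SOME edge into `region`."""
    return {q for (q, r) in edges if r in region}

def forall_pre(edges, region):
    """A-pre: states all of whose edges lead into `region`."""
    sources = {q for (q, _) in edges}
    return {q for q in sources
            if all(r in region for (p, r) in edges if p == q)}

def reach(edges, target, pre=exists_pre):
    """Least fixpoint of X = target | pre(X)."""
    x = set(target)
    while True:
        new = x | pre(edges, x)
        if new == x:
            return x
        x = new

# toy graph: a -> b, b -> c, b -> a, c -> c
edges = {("a", "b"), ("b", "c"), ("b", "a"), ("c", "c")}
print(sorted(reach(edges, {"c"})))                    # ['a', 'b', 'c']
print(sorted(reach(edges, {"c"}, pre=forall_pre)))    # ['c']
```

Swapping the pre operator switches between the ∃◇ and ∀◇ fixpoints without touching the iteration itself.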
Value Iteration Algorithms
consist of
A. LOCAL PART: 9pre and 8pre computation
B. GLOBAL PART: evaluation of a fixpoint expression
We need to generalize both parts to solve games.
Turn-based Game
Q1, Q2 states ( Q = Q1 ∪ Q2 );  Σ transition labels;  δ: Q × Σ → Q transition function
ℛ = [ Q → {0,1} ] regions, with V = B
1pre:  q ∈ 1pre(R) iff [ q ∈ Q1 ∧ (∃σ ∈ Σ) δ(q,σ) ∈ R ]  or  [ q ∈ Q2 ∧ (∀σ ∈ Σ) δ(q,σ) ∈ R ]
2pre:  q ∈ 2pre(R) iff [ q ∈ Q1 ∧ (∀σ ∈ Σ) δ(q,σ) ∈ R ]  or  [ q ∈ Q2 ∧ (∃σ ∈ Σ) δ(q,σ) ∈ R ]
Turn-based Game
(example: states a, b, c)
⟨⟨P1⟩⟩ ◇c = (μX) ( c ∨ 1pre(X) )
⟨⟨P2⟩⟩ ◇c = (μX) ( c ∨ 2pre(X) )
Reachability Game
Given R ⊆ Q, find the states from which player 1 has a strategy to force the game to R.
⟨⟨P1⟩⟩ ◇R = (μX) ( R ∨ 1pre(X) )
Iteration: R,  R ∪ 1pre(R),  R ∪ 1pre(R) ∪ 1pre²(R),  …
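In code this is the classic attractor computation; a minimal sketch (the toy game, ownership map, and names are invented for illustration):

```python
def one_pre(succ, owner, region):
    """1pre: player-1 states with SOME successor in `region`,
    player-2 states with ALL successors in `region`."""
    return {q for q, nexts in succ.items()
            if (owner[q] == 1 and any(n in region for n in nexts))
            or (owner[q] == 2 and all(n in region for n in nexts))}

def attractor(succ, owner, target):
    """Least fixpoint of X = target | 1pre(X)."""
    x = set(target)
    while True:
        new = x | one_pre(succ, owner, x)
        if new == x:
            return x
        x = new

# toy game: player 1 owns a and c, player 2 owns b; target {c}
succ  = {"a": {"b", "c"}, "b": {"a", "b"}, "c": {"c"}}
owner = {"a": 1, "b": 2, "c": 1}
print(sorted(attractor(succ, owner, {"c"})))   # ['a', 'c']
```

Here b is not in the attractor: player 2 can loop at b forever and never enter the target.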
Safety Game
Given R ⊆ Q, find the states from which player 1 has a strategy to keep the game in R.
⟨⟨P1⟩⟩ □R = (νX) ( R ∧ 1pre(X) )
Iteration: R,  R ∩ 1pre(R),  R ∩ 1pre(R) ∩ 1pre²(R),  …
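The safety iteration is the dual ν-computation: start from the full safe set and shrink. A minimal sketch with an invented toy game:

```python
def one_pre(succ, owner, region):
    """1pre, exactly as for reachability games."""
    return {q for q, nexts in succ.items()
            if (owner[q] == 1 and any(n in region for n in nexts))
            or (owner[q] == 2 and all(n in region for n in nexts))}

def safety_win(succ, owner, safe_set):
    """Greatest fixpoint of X = safe_set & 1pre(X)."""
    x = set(safe_set)
    while True:
        new = x & one_pre(succ, owner, x)
        if new == x:
            return x
        x = new

# toy game: player 1 must keep the play inside {a, b};
# at b, player 2 may escape to c, so only a is winning.
succ  = {"a": {"a", "b"}, "b": {"b", "c"}, "c": {"c"}}
owner = {"a": 1, "b": 2, "c": 1}
print(sorted(safety_win(succ, owner, {"a", "b"})))   # ['a']
```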
Quantitative Game
Q1, Q2 states ( Q = Q1 ∪ Q2 );  Σ transition labels;  δ: Q × Σ → N × Q transition function
ℛ = [ Q → N ] regions, with V = N
1pre:
1pre(R)(q) = (max σ ∈ Σ) max( δ1(q,σ), R(δ2(q,σ)) )  if q ∈ Q1
             (min σ ∈ Σ) max( δ1(q,σ), R(δ2(q,σ)) )  if q ∈ Q2
2pre:
2pre(R)(q) = (min σ ∈ Σ) max( δ1(q,σ), R(δ2(q,σ)) )  if q ∈ Q1
             (max σ ∈ Σ) max( δ1(q,σ), R(δ2(q,σ)) )  if q ∈ Q2
(δ1 is the transition weight, δ2 the successor state)
Maximizing Game
(example: states a, b, c; edge weights among 0, 1, 2, 3, 5)
⟨⟨P1⟩⟩ max = (μX) max( 0, 1pre(X) )
Iteration of values at (a, b, c): (0,0,0), (1,0,0), (1,2,0), (2,2,0)
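A sketch of the quantitative iteration; the toy weights and names are invented for illustration. Player 1 maximizes, player 2 minimizes, and 1pre combines each edge weight with the successor value using max:

```python
def one_pre_q(edges, owner, r):
    """Quantitative 1pre; edges[q] = list of (weight, successor)."""
    out = {}
    for q, moves in edges.items():
        vals = [max(w, r[s]) for (w, s) in moves]
        out[q] = max(vals) if owner[q] == 1 else min(vals)
    return out

def max_value(edges, owner):
    """Least fixpoint of X = max(0, 1pre(X)) over V = N."""
    r = {q: 0 for q in edges}
    while True:
        new = {q: max(0, v) for q, v in one_pre_q(edges, owner, r).items()}
        if new == r:
            return r
        r = new

owner = {"a": 1, "b": 2, "c": 1}
edges = {"a": [(1, "b"), (0, "c")],
         "b": [(2, "a"), (5, "c")],
         "c": [(0, "c")]}
print(max_value(edges, owner))   # {'a': 2, 'b': 2, 'c': 0}
```

With these weights the iterates are (0,0,0), (1,2,0), (2,2,0), converging after a few rounds.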
Buechi Graph
Given B ⊆ Q, find the states from which some path visits B infinitely often.
Iteration: R1 = ∃◇ ∃pre(B),  R2 = ∃◇ ∃pre(B ∩ R1),  …
∃□◇B = (νY) ∃◇ (B ∧ ∃pre(Y)) = (νY) (μX) ( (B ∧ ∃pre(Y)) ∨ ∃pre(X) )
Buechi Game
Given B ⊆ Q, find the states from which player 1 has a strategy to force the game to B infinitely often.
Iteration: R1 = ⟨⟨P1⟩⟩◇ 1pre(B),  R2 = ⟨⟨P1⟩⟩◇ 1pre(B ∩ R1),  …
⟨⟨P1⟩⟩ □◇B = (νY) (μX) ( (B ∧ 1pre(Y)) ∨ 1pre(X) )
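A sketch of the nested fixpoint, with the outer νY over the full state set and the inner μX computed as an attractor; the toy game and helper names are invented for illustration:

```python
def one_pre(succ, owner, region):
    """Player-1 controllable predecessors of `region`."""
    return {q for q, nexts in succ.items()
            if (owner[q] == 1 and any(n in region for n in nexts))
            or (owner[q] == 2 and all(n in region for n in nexts))}

def attractor(succ, owner, target):
    """Inner mu-iteration: least fixpoint of X = target | 1pre(X)."""
    x = set(target)
    while True:
        new = x | one_pre(succ, owner, x)
        if new == x:
            return x
        x = new

def buechi_win(succ, owner, b):
    """Outer nu-iteration for (nu Y)(mu X)((B & 1pre(Y)) | 1pre(X))."""
    y = set(succ)                    # nu-iteration starts from the top
    while True:
        core = b & one_pre(succ, owner, y)
        new = attractor(succ, owner, core)
        if new == y:
            return y
        y = new

succ  = {"q0": {"q1"}, "q1": {"q0", "q2"}, "q2": {"q2"}}
owner = {"q0": 1, "q1": 2, "q2": 1}
print(buechi_win(succ, owner, {"q0"}))   # set(): player 2 escapes to q2
print(buechi_win(succ, owner, {"q2"}))   # {'q2'}
```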
From Graphs to Games
Can we use the same value iteration scheme?
Yes, iff the fixpoint expression computes correctly on all single-player (player-1 and player-2) structures.
Reachability:  ∃◇p = (μX) (p ∨ ∃pre(X));  ∀◇p = (μX) (p ∨ ∀pre(X))
Hence:  ⟨⟨P1⟩⟩◇p = (μX) (p ∨ 1pre(X));  ⟨⟨P2⟩⟩◇p = (μX) (p ∨ 2pre(X))
Complexity of Turn-based Games
1. Reachability, safety: linear time (P-complete)
2. Buechi: quadratic time (optimal?); on graphs linear
3. Parity: NP ∩ coNP (in P?); on graphs polynomial
Beyond Graphs as Finite Carrier Sets
Graph-based (finite-carrier) systems:
  Q = B^m;  ℛ = boolean formulas [e.g. BDDs];  ∃pre = (∃x ∈ B)
Timed and hybrid systems:
  Q = B^m × R^n;  ℛ = formulas of (ℝ, ≤, +) [e.g. polyhedral sets];  ∃pre = (∃x ∈ ℝ)
Concurrent Game
Q states;  Σ1, Σ2 moves of both players;  δ: Q × Σ1 × Σ2 → Q transition function
ℛ = [ Q → {0,1} ] regions, with V = B
1pre:  q ∈ 1pre(R) iff (∃σ1 ∈ Σ1) (∀σ2 ∈ Σ2) δ(q,σ1,σ2) ∈ R
2pre:  q ∈ 2pre(R) iff (∃σ2 ∈ Σ2) (∀σ1 ∈ Σ1) δ(q,σ1,σ2) ∈ R
Concurrent Game
(example: states a, b, c; at each state the players simultaneously pick moves, giving move pairs (1,1), (1,2), (2,1), (2,2))
⟨⟨P2⟩⟩ ◇c = (μX) ( c ∨ 2pre(X) )
Player 2 needs randomization: Pr(1) = 0.5, Pr(2) = 0.5.
graph
Extended Graph Models
CONTROL: game graph
OBJECTIVE: ω-automaton
PROBABILITIES: Markov decision process
stochastic game
ω-regular game
CLOCKS: timed automaton
stochastic hybrid system
Graph: 1 Player
Nondeterministic closed system.
(example: states q1, q2, q3; labels a, b)
MDP: 1.5 Players
Probabilistic closed system.
(example: states q1–q5; labels a, b, c; branch probabilities 0.4, 0.6)
Turn-based Game: 2 Players
Asynchronous open system.
(example: states q1–q5; labels a, b, c)
Turn-based Stochastic Game: 2.5 Players
Probabilistic asynchronous open system.
(example: states q1–q7; labels a, b, c; branch probabilities 0.4, 0.6)
Concurrent Game
Synchronous open system.
(example: states q1–q5; labels a, b; move pairs (1,1), (1,2), (2,1), (2,2))
Concurrent Stochastic Game
Probabilistic synchronous open system.
Matrix game at each vertex; e.g. at q1, the move pairs induce successor distributions such as
(1,1): q2: 0.3, q3: 0.2, q4: 0.5
(1,2): q2: 0.1, q3: 0.1, q4: 0.5, q5: 0.3
(2,1): q3: 0.2, q4: 0.1, q5: 0.7
(2,2): q2: 1.0
Graph: nondeterministic generator of behaviors (possibly stochastic)
Strategy: deterministic selector of behaviors (possibly randomized)
Graph + Strategies for both players → Behavior
Model = graph;  Pure behavior = path
Two pure strategies at q1: “left” and “right”. Two pure behaviors: abω; aaω.
(example: states q1, q2, q3; labels a, b)
Model = MDP;  Pure behavior = probability distribution on paths = p-path
Two pure strategies at q1: “left” and “right”.
Two pure behaviors: {abω: 1};  {aacω: 0.4, aaaω: 0.6}.
(example: states q1–q5; labels a, b, c; branch probabilities 0.4, 0.6)
Model = turn-based game;  Pure behavior = path;  General (randomized) behavior = p-path
Two pure player-1 strategies at q1: “left” and “right”. Two pure player-2 strategies at q3: “left” and “right”.
Three pure behaviors: abω; aacω; aaaω.
Infinitely many randomized behaviors, e.g. {aacω: 0.5, aaaω: 0.5}.
(example: states q1–q5; labels a, b, c)
The objective of each player is to find a strategy that optimizes the value of the resulting behavior.
How do we define “value”?
A. Assign a value to each path
B. Assign a value to each behavior (expected value of A.)
C. Assign a value to each state (strategy sup inf of B.)
A. Value of Paths
Qualitative value function Φ: Qω → {0,1}
e.g. ω-regular subsets of Qω
B. Value of Behaviors
For a path t:  Φ(T) = Φ(t)
For a p-path T:  Φ(T) = Exp { Φ(t) }  (expected value)
Example:
T = { aaaω: 0.2, aabω: 0.7, bbbω: 0.1 }
(◇b)(T) = 0.8
C. Value of States
⟨⟨1⟩⟩Φ(q) = sup_x inf_y Φ( Outcome_{x,y}(q) )
⟨⟨2⟩⟩Φ(q) = sup_y inf_x Φ( Outcome_{x,y}(q) )
Concurrent Stochastic Game
Q states;  Σ1, Σ2 moves of both players;  δ: Q × Σ1 × Σ2 → Dist(Q) probabilistic transition function
ℛ = [ Q → [0,1] ] regions, with V = [0,1]
1pre:  1pre(R)(q) = (sup ξ1 ∈ D1) (inf ξ2 ∈ D2) R(δ(q,ξ1,ξ2))
2pre:  2pre(R)(q) = (sup ξ2 ∈ D2) (inf ξ1 ∈ D1) R(δ(q,ξ1,ξ2))
(Di: distributions over Σi; R is extended to distributions by expectation)
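At each state, 1pre therefore evaluates a zero-sum matrix game over the players' mixed moves. For the 2×2 case the game value has a standard closed form; a small illustrative solver (invented for this note, not part of the slides):

```python
def matrix_game_value(m):
    """Value of a 2x2 zero-sum matrix game for the row maximizer."""
    (a, b), (c, d) = m
    # pure saddle point: max of row minima meets min of column maxima
    v = max(min(a, b), min(c, d))
    if v == min(max(a, c), max(b, d)):
        return v
    # otherwise both players mix; standard 2x2 closed form
    return (a * d - b * c) / (a + d - b - c)

print(matrix_game_value([[1, 0], [0, 1]]))   # 0.5 (matching pennies)
print(matrix_game_value([[3, 1], [0, 2]]))   # 1.5
```

Matching pennies illustrates why randomized moves are essential here: no pure move pair is stable, yet uniform mixing guarantees value 0.5.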
Concurrent Stochastic Game
(example: states a, b, c; at a and b each player randomizes over two moves, and each move pair induces a distribution over successors, e.g. a: 0.6, b: 0.4)
⟨⟨P1⟩⟩ ◇c = (μX) max( c, 1pre(X) )
Iteration of the value at a: 0, 0, 0.8, 0.96, …  the value 1 is attained only in the limit
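For a turn-based stochastic reachability game the same μ-iteration runs over V = [0,1]: maximize at player-1 states, minimize at player-2 states, take expectations over probabilistic branches. Since the iteration may converge only in the limit, the sketch below (toy model and names invented) stops at a tolerance rather than at exact equality:

```python
def reach_value(trans, owner, target, eps=1e-9):
    """trans[q] = list of actions; each action = list of (prob, successor)."""
    v = {q: (1.0 if q in target else 0.0) for q in trans}
    while True:
        new = {}
        for q, actions in trans.items():
            if q in target:
                new[q] = 1.0
                continue
            # expected region value of each action; optimize over actions
            vals = [sum(p * v[s] for p, s in act) for act in actions]
            new[q] = max(vals) if owner[q] == 1 else min(vals)
        if max(abs(new[q] - v[q]) for q in v) < eps:
            return new
        v = new

# 1.5-player toy (an MDP): from q0, the single action reaches the target
# q1 with probability 0.5 and stays at q0 otherwise.
trans = {"q0": [[(0.5, "q1"), (0.5, "q0")]], "q1": [[(1.0, "q1")]]}
owner = {"q0": 1, "q1": 1}
v = reach_value(trans, owner, {"q1"})
print(round(v["q0"], 6))   # 1.0
```

The iterates at q0 are 0, 0.5, 0.75, 0.875, …: each step halves the distance to the true value 1, which is never reached exactly.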
Solving Games by Value Iteration
Reachability / max: a single least-fixpoint iteration.  Buechi / lim sup: a nested fixpoint.  Parity: …
Many open questions: How do different evaluation orders compare? How fast do these algorithms converge? When are they optimal?
Summary: Classification of Games
1. Number of players: 1, 1.5, 2, 2.5
2. Alternation: turn-based or concurrent
3. Strategies: pure or randomized
4. Value of a path: qualitative (boolean) or quantitative (real)
5. Objective: Borel 1, 2, 3
6. Zero-sum vs. nonzero-sum
Summary: Zero-Sum Games
The two players have complementary path values: Φ2(t) = 1 − Φ1(t)
- reachability vs. safety / max vs. min
- Buechi vs. coBuechi / lim sup vs. lim inf
- Rabin vs. Streett
Main Theorem [Martin75, Martin98]: Concurrent stochastic games are determined for all Borel objectives, i.e., ⟨⟨1⟩⟩Φ1(q) + ⟨⟨2⟩⟩Φ2(q) = 1  (sup inf = inf sup).
Summary: Zero-Sum Games
Parity objectives, by model (1.5 players; 2 players; 2.5 players; concurrent):
CY98, dAl97: polynomial;  GH82, EJ88;  dAM01;  dAH00, CdAH06: NP ∩ coNP
Concurrent Games are Difficult
- optimal strategies may not exist
- limit values may not be rational
- ε-close strategies, for fixed ε, may require infinite memory
- no determinacy for pure strategies
Example (pure strategies): ⟨⟨P1⟩⟩(◇a)(q1) = 0 and ⟨⟨P2⟩⟩(◇b)(q1) = 0.
(concurrent game at q1 with move pairs (1,1), (1,2), (2,1), (2,2))
Turn-based Games are More Pleasant
- optimal strategies always exist [McIver/Morgan]
- in the non-stochastic case, pure finite-memory optimal strategies exist for ω-regular objectives [Gurevich/Harrington]
- for parity objectives, pure memoryless optimal strategies exist [Emerson/Jutla: non-stochastic Rabin; Condon: stochastic reachability; Chatterjee/deAlfaro/H: stochastic Rabin], hence NP ∩ coNP
Whether they are solvable in P is open for non-stochastic parity games and for stochastic reachability games.
Summary
Verification and control are very special (boolean) cases of graph-based optimization problems.
They can be generalized to solve questions that involve multiple players, quantitative resources, probabilistic transitions, and continuous state spaces.
The theory and practice of this is still wide open …