Games, Times, and Probabilities: Value Iteration in Verification and Control
Krishnendu Chatterjee, Tom Henzinger
Graph Models of Systems
vertices = states
edges = transitions
paths = behaviors
graph
Extended Graph Models
CONTROL: game graph
OBJECTIVE: ω-automaton
PROBABILITIES: Markov decision process
stochastic game
ω-regular game
CLOCKS: timed automaton
stochastic hybrid system
Graphs vs. Games
Games model Open Systems
Two players: environment / controller / input vs.
system / plant / output
Multiple players: processes / components / agents
Stochastic players: nature / randomized algorithms
Example
P1:
  init x := 0
  loop
    choice | x := x+1 mod 2 | x := 0 end choice
  end loop
ψ1: □(x = y)
P2:
  init y := 0
  loop
    choice | y := x | y := x+1 mod 2 end choice
  end loop
ψ2: □(y = 0)
Graph Questions
∀□(x = y)  ✗
∃□(x = y)
(product state space xy ∈ {00, 10, 01, 11})
CTL
Zero-Sum Game Questions
⟨⟨P1⟩⟩ □(x = y)  ✗
⟨⟨P2⟩⟩ □(y = 0)  ✗
(product state space xy ∈ {00, 10, 01, 11})
ATL [Alur/H/Kupferman]
Nonzero-Sum Game Questions
⟨⟨P1⟩⟩ □(x = y)
⟨⟨P2⟩⟩ □(y = 0)
(product state space xy ∈ {00, 10, 01, 11})
Secure equilibria [Chatterjee/H/Jurdzinski]
Strategies
Strategies x, y: Q* → Q
From a state q, a pair (x, y) of a player-1 strategy x ∈ Γ1 and a player-2 strategy y ∈ Γ2 gives a unique infinite path Outcome_{x,y}(q) ∈ Qω.

⟨⟨P1⟩⟩ ψ1 = (∃x ∈ Γ1) (∀y ∈ Γ2) ψ1(x,y)
Short for:
q ⊨ ⟨⟨P1⟩⟩ ψ1  iff  (∃x ∈ Γ1) (∀y ∈ Γ2) ( Outcome_{x,y}(q) ⊨ ψ1 )

⟨⟨P1⟩⟩ψ1 ∧ ⟨⟨P2⟩⟩ψ2 = (∃x ∈ Γ1) (∃y ∈ Γ2) [ (ψ1 ∧ ψ2)(x,y)
  ∧ (∀y' ∈ Γ2) (ψ2 → ψ1)(x,y')
  ∧ (∀x' ∈ Γ1) (ψ1 → ψ2)(x',y) ]
Objectives ψ1 and ψ2
Qualitative: reachability; Buechi; parity (ω-regular)
Quantitative: max; lim sup; lim avg
Normal Forms of ω-Regular Sets
Borel-1:   Reachability ◇a;  Safety □a = ¬◇¬a
Borel-2:   Buechi □◇a;  coBuechi ◇□a = ¬□◇¬a
Borel-2.5: Streett ∧(□◇a → □◇b) = ∧(◇□¬a ∨ □◇b);  Rabin ∨(◇□a ∧ □◇b)
Parity: complement-closed subset of Streett/Rabin
Buechi Game
(game over states q0–q4, with Buechi target sets G and B)
• Secure equilibrium (x, y) at q0:
  x: if q1 → q0, then q2 else q4.  y: if q3 → q1, then q0 else q4.
• Strategies require memory.
Zero-Sum Games: Determinacy
ψ1 = ¬ψ2
W1 = ⟨⟨P1⟩⟩ ψ1
W2 = ⟨⟨P2⟩⟩ ψ2
(every state belongs to exactly one of W1, W2)
Nonzero-Sum Games
W10 = ⟨⟨P1⟩⟩ (ψ1 ∧ ¬ψ2)
W01 = ⟨⟨P2⟩⟩ (ψ2 ∧ ¬ψ1)
W11 = ⟨⟨P1⟩⟩ψ1 ∧ ⟨⟨P2⟩⟩ψ2
W00
Objectives
Qualitative: reachability; Buechi; parity (ω-regular)
Quantitative: max; lim sup; lim avg
(in increasing complexity: Borel-1; Borel-2; Borel-3)
Quantitative Games
(example game graph with edge weights among 0, 2, 3, 4)
⟨⟨P1⟩⟩ lim sup = 3
⟨⟨P1⟩⟩ lim avg = 1
Solving Games by Value Iteration
Generalization of the μ-calculus: computing fixpoints of transfer functions (pre; post).
Generalization of dynamic programming: iterative optimization.
Region R: Q → V
R(q) := pre(R(q'))
Graph
Q states;  Σ transition labels;  δ: Q × Σ → Q transition function
ℛ = [ Q → {0,1} ] regions, with V = B
∃pre:  q ∈ ∃pre(R) iff (∃σ ∈ Σ) δ(q,σ) ∈ R
∀pre:  q ∈ ∀pre(R) iff (∀σ ∈ Σ) δ(q,σ) ∈ R
Graph
(example: states a, b, c)
∃◇c = (μX) ( c ∨ ∃pre(X) )
∀◇c = (μX) ( c ∨ ∀pre(X) )
Graph Reachability
Given R ⊆ Q, find the states from which some path leads to R.
∃◇R = (μX) ( R ∨ ∃pre(X) )
Iteration: R,  R ∪ pre(R),  R ∪ pre(R) ∪ pre²(R),  …
Given R ⊆ Q, find the states from which all paths lead to R.
∀◇R = (μX) ( R ∨ ∀pre(X) )
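As a side note, this μ-iteration is easy to sketch in code. The toy graph, state names, and helper names below are invented for illustration; the scheme is exactly the one above, iterating X := R ∪ pre(X) to a fixpoint.

```python
def exists_pre(edges, region):
    """E-pre: states with SOME edge into `region`."""
    return {q for (q, r) in edges if r in region}

def forall_pre(edges, region):
    """A-pre: states all of whose edges lead into `region`."""
    sources = {q for (q, _) in edges}
    return {q for q in sources
            if all(r in region for (p, r) in edges if p == q)}

def reach(edges, target, pre=exists_pre):
    """Least fixpoint of X = target | pre(X)."""
    x = set(target)
    while True:
        new = x | pre(edges, x)
        if new == x:
            return x
        x = new

# toy graph: a -> b, b -> c, b -> a, c -> c
edges = {("a", "b"), ("b", "c"), ("b", "a"), ("c", "c")}
print(sorted(reach(edges, {"c"})))                    # ['a', 'b', 'c']
print(sorted(reach(edges, {"c"}, pre=forall_pre)))    # ['c']
```

Swapping the pre operator switches between the ∃◇ and ∀◇ fixpoints without touching the iteration itself.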
Value Iteration Algorithms
consist of
A. LOCAL PART: 9pre and 8pre computation
B. GLOBAL PART: evaluation of a fixpoint expression
We need to generalize both parts to solve games.
Turn-based Game
Q1, Q2 states ( Q = Q1 ∪ Q2 );  Σ transition labels;  δ: Q × Σ → Q transition function
ℛ = [ Q → {0,1} ] regions, with V = B
1pre:  q ∈ 1pre(R) iff [ q ∈ Q1 ∧ (∃σ ∈ Σ) δ(q,σ) ∈ R ]  or  [ q ∈ Q2 ∧ (∀σ ∈ Σ) δ(q,σ) ∈ R ]
2pre:  q ∈ 2pre(R) iff [ q ∈ Q1 ∧ (∀σ ∈ Σ) δ(q,σ) ∈ R ]  or  [ q ∈ Q2 ∧ (∃σ ∈ Σ) δ(q,σ) ∈ R ]
Turn-based Game
(example: states a, b, c)
⟨⟨P1⟩⟩ ◇c = (μX) ( c ∨ 1pre(X) )
⟨⟨P2⟩⟩ ◇c = (μX) ( c ∨ 2pre(X) )
Reachability Game
Given R ⊆ Q, find the states from which player 1 has a strategy to force the game to R.
⟨⟨P1⟩⟩ ◇R = (μX) ( R ∨ 1pre(X) )
Iteration: R,  R ∪ 1pre(R),  R ∪ 1pre(R) ∪ 1pre²(R),  …
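In code this is the classic attractor computation; a minimal sketch (the toy game, ownership map, and names are invented for illustration):

```python
def one_pre(succ, owner, region):
    """1pre: player-1 states with SOME successor in `region`,
    player-2 states with ALL successors in `region`."""
    return {q for q, nexts in succ.items()
            if (owner[q] == 1 and any(n in region for n in nexts))
            or (owner[q] == 2 and all(n in region for n in nexts))}

def attractor(succ, owner, target):
    """Least fixpoint of X = target | 1pre(X)."""
    x = set(target)
    while True:
        new = x | one_pre(succ, owner, x)
        if new == x:
            return x
        x = new

# toy game: player 1 owns a and c, player 2 owns b; target {c}
succ  = {"a": {"b", "c"}, "b": {"a", "b"}, "c": {"c"}}
owner = {"a": 1, "b": 2, "c": 1}
print(sorted(attractor(succ, owner, {"c"})))   # ['a', 'c']
```

Here b is not in the attractor: player 2 can loop at b forever and never enter the target.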
Safety Game
Given R ⊆ Q, find the states from which player 1 has a strategy to keep the game in R.
⟨⟨P1⟩⟩ □R = (νX) ( R ∧ 1pre(X) )
Iteration: R,  R ∩ 1pre(R),  R ∩ 1pre(R) ∩ 1pre²(R),  …
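The safety iteration is the dual ν-computation: start from the full safe set and shrink. A minimal sketch with an invented toy game:

```python
def one_pre(succ, owner, region):
    """1pre, exactly as for reachability games."""
    return {q for q, nexts in succ.items()
            if (owner[q] == 1 and any(n in region for n in nexts))
            or (owner[q] == 2 and all(n in region for n in nexts))}

def safety_win(succ, owner, safe_set):
    """Greatest fixpoint of X = safe_set & 1pre(X)."""
    x = set(safe_set)
    while True:
        new = x & one_pre(succ, owner, x)
        if new == x:
            return x
        x = new

# toy game: player 1 must keep the play inside {a, b};
# at b, player 2 may escape to c, so only a is winning.
succ  = {"a": {"a", "b"}, "b": {"b", "c"}, "c": {"c"}}
owner = {"a": 1, "b": 2, "c": 1}
print(sorted(safety_win(succ, owner, {"a", "b"})))   # ['a']
```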
Quantitative Game
Q1, Q2 states ( Q = Q1 ∪ Q2 );  Σ transition labels;  δ: Q × Σ → N × Q transition function
ℛ = [ Q → N ] regions, with V = N
1pre:
1pre(R)(q) = (max σ ∈ Σ) max( δ1(q,σ), R(δ2(q,σ)) )  if q ∈ Q1
             (min σ ∈ Σ) max( δ1(q,σ), R(δ2(q,σ)) )  if q ∈ Q2
2pre:
2pre(R)(q) = (min σ ∈ Σ) max( δ1(q,σ), R(δ2(q,σ)) )  if q ∈ Q1
             (max σ ∈ Σ) max( δ1(q,σ), R(δ2(q,σ)) )  if q ∈ Q2
(δ1 is the transition weight, δ2 the successor state)
Maximizing Game
(example: states a, b, c; edge weights among 0, 1, 2, 3, 5)
⟨⟨P1⟩⟩ max = (μX) max( 0, 1pre(X) )
Iteration of values at (a, b, c): (0,0,0), (1,0,0), (1,2,0), (2,2,0)
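A sketch of the quantitative iteration; the toy weights and names are invented for illustration. Player 1 maximizes, player 2 minimizes, and 1pre combines each edge weight with the successor value using max:

```python
def one_pre_q(edges, owner, r):
    """Quantitative 1pre; edges[q] = list of (weight, successor)."""
    out = {}
    for q, moves in edges.items():
        vals = [max(w, r[s]) for (w, s) in moves]
        out[q] = max(vals) if owner[q] == 1 else min(vals)
    return out

def max_value(edges, owner):
    """Least fixpoint of X = max(0, 1pre(X)) over V = N."""
    r = {q: 0 for q in edges}
    while True:
        new = {q: max(0, v) for q, v in one_pre_q(edges, owner, r).items()}
        if new == r:
            return r
        r = new

owner = {"a": 1, "b": 2, "c": 1}
edges = {"a": [(1, "b"), (0, "c")],
         "b": [(2, "a"), (5, "c")],
         "c": [(0, "c")]}
print(max_value(edges, owner))   # {'a': 2, 'b': 2, 'c': 0}
```

With these weights the iterates are (0,0,0), (1,2,0), (2,2,0), converging after a few rounds.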
Buechi Graph
Given B ⊆ Q, find the states from which some path visits B infinitely often.
Iteration: R1 = ∃◇ ∃pre(B),  R2 = ∃◇ ∃pre(B ∩ R1),  …
∃□◇B = (νY) ∃◇ (B ∧ ∃pre(Y)) = (νY) (μX) ( (B ∧ ∃pre(Y)) ∨ ∃pre(X) )
Buechi Game
Given B ⊆ Q, find the states from which player 1 has a strategy to force the game to B infinitely often.
Iteration: R1 = ⟨⟨P1⟩⟩◇ 1pre(B),  R2 = ⟨⟨P1⟩⟩◇ 1pre(B ∩ R1),  …
⟨⟨P1⟩⟩ □◇B = (νY) (μX) ( (B ∧ 1pre(Y)) ∨ 1pre(X) )
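A sketch of the nested fixpoint, with the outer νY over the full state set and the inner μX computed as an attractor; the toy game and helper names are invented for illustration:

```python
def one_pre(succ, owner, region):
    """Player-1 controllable predecessors of `region`."""
    return {q for q, nexts in succ.items()
            if (owner[q] == 1 and any(n in region for n in nexts))
            or (owner[q] == 2 and all(n in region for n in nexts))}

def attractor(succ, owner, target):
    """Inner mu-iteration: least fixpoint of X = target | 1pre(X)."""
    x = set(target)
    while True:
        new = x | one_pre(succ, owner, x)
        if new == x:
            return x
        x = new

def buechi_win(succ, owner, b):
    """Outer nu-iteration for (nu Y)(mu X)((B & 1pre(Y)) | 1pre(X))."""
    y = set(succ)                    # nu-iteration starts from the top
    while True:
        core = b & one_pre(succ, owner, y)
        new = attractor(succ, owner, core)
        if new == y:
            return y
        y = new

succ  = {"q0": {"q1"}, "q1": {"q0", "q2"}, "q2": {"q2"}}
owner = {"q0": 1, "q1": 2, "q2": 1}
print(buechi_win(succ, owner, {"q0"}))   # set(): player 2 escapes to q2
print(buechi_win(succ, owner, {"q2"}))   # {'q2'}
```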
From Graphs to Games
Can we use the same value iteration scheme?
Yes, iff the fixpoint expression computes correctly on all single-player (player-1 and player-2) structures.
Reachability:  ∃◇p = (μX) (p ∨ ∃pre(X));  ∀◇p = (μX) (p ∨ ∀pre(X))
Hence:  ⟨⟨P1⟩⟩◇p = (μX) (p ∨ 1pre(X));  ⟨⟨P2⟩⟩◇p = (μX) (p ∨ 2pre(X))
Complexity of Turn-based Games
1. Reachability, safety: linear time (P-complete)
2. Buechi: quadratic time (optimal?); on graphs linear
3. Parity: NP ∩ coNP (in P?); on graphs polynomial
Beyond Graphs as Finite Carrier Sets
Graph-based (finite-carrier) systems:
  Q = B^m;  ℛ = boolean formulas [e.g. BDDs];  ∃pre = (∃x ∈ B)
Timed and hybrid systems:
  Q = B^m × R^n;  ℛ = formulas of (ℝ, ≤, +) [e.g. polyhedral sets];  ∃pre = (∃x ∈ ℝ)
Concurrent Game
Q states;  Σ1, Σ2 moves of both players;  δ: Q × Σ1 × Σ2 → Q transition function
ℛ = [ Q → {0,1} ] regions, with V = B
1pre:  q ∈ 1pre(R) iff (∃σ1 ∈ Σ1) (∀σ2 ∈ Σ2) δ(q,σ1,σ2) ∈ R
2pre:  q ∈ 2pre(R) iff (∃σ2 ∈ Σ2) (∀σ1 ∈ Σ1) δ(q,σ1,σ2) ∈ R
Concurrent Game
(example: states a, b, c; at each state the players simultaneously pick moves, giving move pairs (1,1), (1,2), (2,1), (2,2))
⟨⟨P2⟩⟩ ◇c = (μX) ( c ∨ 2pre(X) )
Player 2 needs randomization: Pr(1) = 0.5, Pr(2) = 0.5.
graph
Extended Graph Models
CONTROL: game graph
OBJECTIVE: ω-automaton
PROBABILITIES: Markov decision process
stochastic game
ω-regular game
CLOCKS: timed automaton
stochastic hybrid system
Graph: 1 Player
Nondeterministic closed system.
(example: states q1, q2, q3; labels a, b)
MDP: 1.5 Players
Probabilistic closed system.
(example: states q1–q5; labels a, b, c; branch probabilities 0.4, 0.6)
Turn-based Game: 2 Players
Asynchronous open system.
(example: states q1–q5; labels a, b, c)
Turn-based Stochastic Game: 2.5 Players
Probabilistic asynchronous open system.
(example: states q1–q7; labels a, b, c; branch probabilities 0.4, 0.6)
Concurrent Game
Synchronous open system.
(example: states q1–q5; labels a, b; move pairs (1,1), (1,2), (2,1), (2,2))
Concurrent Stochastic Game
Probabilistic synchronous open system.
Matrix game at each vertex; e.g. at q1, the move pairs induce successor distributions such as
(1,1): q2: 0.3, q3: 0.2, q4: 0.5
(1,2): q2: 0.1, q3: 0.1, q4: 0.5, q5: 0.3
(2,1): q3: 0.2, q4: 0.1, q5: 0.7
(2,2): q2: 1.0
Graph: nondeterministic generator of behaviors (possibly stochastic)
Strategy: deterministic selector of behaviors (possibly randomized)
Graph + Strategies for both players → Behavior
Model = graph;  Pure behavior = path
Two pure strategies at q1: “left” and “right”. Two pure behaviors: abω; aaω.
(example: states q1, q2, q3; labels a, b)
Model = MDP;  Pure behavior = probability distribution on paths = p-path
Two pure strategies at q1: “left” and “right”.
Two pure behaviors: {abω: 1};  {aacω: 0.4, aaaω: 0.6}.
(example: states q1–q5; labels a, b, c; branch probabilities 0.4, 0.6)
Model = turn-based game;  Pure behavior = path;  General (randomized) behavior = p-path
Two pure player-1 strategies at q1: “left” and “right”. Two pure player-2 strategies at q3: “left” and “right”.
Three pure behaviors: abω; aacω; aaaω.
Infinitely many randomized behaviors, e.g. {aacω: 0.5, aaaω: 0.5}.
(example: states q1–q5; labels a, b, c)
The objective of each player is to find a strategy that optimizes the value of the resulting behavior.
How do we define “value”?
A. Assign a value to each path
B. Assign a value to each behavior (expected value of A.)
C. Assign a value to each state (strategy sup inf of B.)
A. Value of Paths
Qualitative value function Φ: Qω → {0,1}
e.g. ω-regular subsets of Qω
B. Value of Behaviors
For a path t:  Φ(T) = Φ(t)
For a p-path T:  Φ(T) = Exp { Φ(t) }  (expected value)
Example:
T = { aaaω: 0.2, aabω: 0.7, bbbω: 0.1 }
(◇b)(T) = 0.8
C. Value of States
⟨⟨1⟩⟩Φ(q) = sup_x inf_y Φ( Outcome_{x,y}(q) )
⟨⟨2⟩⟩Φ(q) = sup_y inf_x Φ( Outcome_{x,y}(q) )
Concurrent Stochastic Game
Q states;  Σ1, Σ2 moves of both players;  δ: Q × Σ1 × Σ2 → Dist(Q) probabilistic transition function
ℛ = [ Q → [0,1] ] regions, with V = [0,1]
1pre:  1pre(R)(q) = (sup ξ1 ∈ D1) (inf ξ2 ∈ D2) R(δ(q,ξ1,ξ2))
2pre:  2pre(R)(q) = (sup ξ2 ∈ D2) (inf ξ1 ∈ D1) R(δ(q,ξ1,ξ2))
(Di: distributions over Σi; R is extended to distributions by expectation)
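At each state, 1pre therefore evaluates a zero-sum matrix game over the players' mixed moves. For the 2×2 case the game value has a standard closed form; a small illustrative solver (invented for this note, not part of the slides):

```python
def matrix_game_value(m):
    """Value of a 2x2 zero-sum matrix game for the row maximizer."""
    (a, b), (c, d) = m
    # pure saddle point: max of row minima meets min of column maxima
    v = max(min(a, b), min(c, d))
    if v == min(max(a, c), max(b, d)):
        return v
    # otherwise both players mix; standard 2x2 closed form
    return (a * d - b * c) / (a + d - b - c)

print(matrix_game_value([[1, 0], [0, 1]]))   # 0.5 (matching pennies)
print(matrix_game_value([[3, 1], [0, 2]]))   # 1.5
```

Matching pennies illustrates why randomized moves are essential here: no pure move pair is stable, yet uniform mixing guarantees value 0.5.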
Concurrent Stochastic Game
(example: states a, b, c; at a and b each player randomizes over two moves, and each move pair induces a distribution over successors, e.g. a: 0.6, b: 0.4)
⟨⟨P1⟩⟩ ◇c = (μX) max( c, 1pre(X) )
Iteration of the value at a: 0, 0, 0.8, 0.96, …  the value 1 is attained only in the limit
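For a turn-based stochastic reachability game the same μ-iteration runs over V = [0,1]: maximize at player-1 states, minimize at player-2 states, take expectations over probabilistic branches. Since the iteration may converge only in the limit, the sketch below (toy model and names invented) stops at a tolerance rather than at exact equality:

```python
def reach_value(trans, owner, target, eps=1e-9):
    """trans[q] = list of actions; each action = list of (prob, successor)."""
    v = {q: (1.0 if q in target else 0.0) for q in trans}
    while True:
        new = {}
        for q, actions in trans.items():
            if q in target:
                new[q] = 1.0
                continue
            # expected region value of each action; optimize over actions
            vals = [sum(p * v[s] for p, s in act) for act in actions]
            new[q] = max(vals) if owner[q] == 1 else min(vals)
        if max(abs(new[q] - v[q]) for q in v) < eps:
            return new
        v = new

# 1.5-player toy (an MDP): from q0, the single action reaches the target
# q1 with probability 0.5 and stays at q0 otherwise.
trans = {"q0": [[(0.5, "q1"), (0.5, "q0")]], "q1": [[(1.0, "q1")]]}
owner = {"q0": 1, "q1": 1}
v = reach_value(trans, owner, {"q1"})
print(round(v["q0"], 6))   # 1.0
```

The iterates at q0 are 0, 0.5, 0.75, 0.875, …: each step halves the distance to the true value 1, which is never reached exactly.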
Solving Games by Value Iteration
Reachability / max: a single least-fixpoint iteration.  Buechi / lim sup: a nested fixpoint.  Parity: …
Many open questions: How do different evaluation orders compare? How fast do these algorithms converge? When are they optimal?
Summary: Classification of Games
1. Number of players: 1, 1.5, 2, 2.5
2. Alternation: turn-based or concurrent
3. Strategies: pure or randomized
4. Value of a path: qualitative (boolean) or quantitative (real)
5. Objective: Borel 1, 2, 3
6. Zero-sum vs. nonzero-sum
Summary: Zero-Sum Games
The two players have complementary path values: Φ2(t) = 1 − Φ1(t)
- reachability vs. safety / max vs. min
- Buechi vs. coBuechi / lim sup vs. lim inf
- Rabin vs. Streett
Main Theorem [Martin75, Martin98]: Concurrent stochastic games are determined for all Borel objectives, i.e., ⟨⟨1⟩⟩Φ1(q) + ⟨⟨2⟩⟩Φ2(q) = 1  (sup inf = inf sup).
Summary: Zero-Sum Games
Parity objectives, by model (1.5 players; 2 players; 2.5 players; concurrent):
CY98, dAl97: polynomial;  GH82, EJ88;  dAM01;  dAH00, CdAH06: NP ∩ coNP
Concurrent Games are Difficult
- optimal strategies may not exist
- limit values may not be rational
- ε-close strategies, for fixed ε, may require infinite memory
- no determinacy for pure strategies
Example (pure strategies): ⟨⟨P1⟩⟩(◇a)(q1) = 0 and ⟨⟨P2⟩⟩(◇b)(q1) = 0.
(concurrent game at q1 with move pairs (1,1), (1,2), (2,1), (2,2))
Turn-based Games are More Pleasant
- optimal strategies always exist [McIver/Morgan]
- in the non-stochastic case, pure finite-memory optimal strategies exist for ω-regular objectives [Gurevich/Harrington]
- for parity objectives, pure memoryless optimal strategies exist [Emerson/Jutla: non-stochastic Rabin; Condon: stochastic reachability; Chatterjee/deAlfaro/H: stochastic Rabin], hence NP ∩ coNP
Whether they are solvable in P is open for non-stochastic parity games and for stochastic reachability games.
Summary
Verification and control are very special (boolean) cases of graph-based optimization problems.
They can be generalized to solve questions that involve multiple players, quantitative resources, probabilistic transitions, and continuous state spaces.
The theory and practice of this is still wide open …