Timed Games and

Stochastic Priced Timed Games

Synthesis & Machine Learning

TIGA

STRATEGO

Kim G. Larsen

– Aalborg University

DENMARK

Overview

Timed Automata: Decidability (regions), Symbolic Verification (zones)

Priced Timed Automata: Decidability (priced regions), Symbolic Verification (priced zones)

Stochastic Timed Automata: Stochastic Semantics, Statistical Model Checking, Stochastic Hybrid Automata

Timed Games & Stochastic Priced Timed Games: Symbolic Synthesis, Reinforcement Learning, Applications

TU Graz, May 2017 Kim Larsen [2]

[Figure: timeline of UPPAAL branches. Tools: CLASSIC, TRON, TIGA, CORA, ECDAR, SMC, STRATEGO. Areas: Verification, Testing, Optimization, Synthesis, Component, Performance Analysis, Optimal Synthesis. Years shown: 1995, 2001, 2004, 2005, 2010, 2011, 2014.]


Timed Automata & Model Checking

State: (L1, x=0.81)

Transitions: (L1, x=0.81) --2.1--> (L1, x=2.91) --> (goal, x=2.91)

E<> goal ?

A<> goal ?

A[] ¬L4 ?

Timed Game

Timed Game & Synthesis

[Figure: timed game model with labels x<=1]

Decidability of Timed Games

Untimed and Timed Games

Reachability / Safety Games

[Figure: example game with locations 1 to 4 and controllable and uncontrollable edges; edge labels: x>1, x≤1, x<1, x:=0, x<1, x≤1, x≥2]

Untimed Games

Uncontrollable

Controllable

Memoryless Strategy: $F : Q \to E_c$

Winning Run $r$: $\mathrm{States}(r) \cap G \neq \emptyset$

Winning Strategy: $\mathrm{Runs}(F) \subseteq \mathrm{WinRuns}$


Untimed Games

Uncontrollable

Controllable

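Backwards Fixed-Point Computation

$\mathrm{cPred}(X) = \{\, q \in Q \mid \exists q' \in X.\ q \xrightarrow{c} q' \,\}$

$\mathrm{uPred}(X) = \{\, q \in Q \mid \exists q' \in X.\ q \xrightarrow{u} q' \,\}$

$\pi(X) = \mathrm{cPred}(X) \setminus \mathrm{uPred}(X^C)$

Theorem: The set of winning states is obtained as the least fixpoint of the function $X \mapsto \pi(X) \cup \mathrm{Goal}$.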
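The backwards fixed-point computation for untimed reachability games can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the game graph, the state names, and the function name `winning_states` are hypothetical and not taken from the slides; the operators follow the definitions of cPred, uPred and the least-fixpoint theorem above.

```python
def winning_states(states, controllable, uncontrollable, goal):
    """Least fixpoint of X -> pi(X) | Goal, with pi(X) = cPred(X) - uPred(X^C)."""

    def c_pred(target):
        # States with a controllable edge into `target`.
        return {q for (q, q_next) in controllable if q_next in target}

    def u_pred(target):
        # States with an uncontrollable edge into `target`.
        return {q for (q, q_next) in uncontrollable if q_next in target}

    win = set()
    while True:
        # One application of X -> pi(X) | Goal; X^C is `states - win`.
        new_win = (c_pred(win) - u_pred(states - win)) | goal
        if new_win == win:
            return win
        win = new_win


# Hypothetical example game (not taken from the slides).
STATES = {"a", "b", "c", "goal", "bad"}
CONTROLLABLE = {("a", "b"), ("b", "goal"), ("c", "goal")}
UNCONTROLLABLE = {("c", "bad")}

WINNING = winning_states(STATES, CONTROLLABLE, UNCONTROLLABLE, {"goal"})
print(sorted(WINNING))  # ['a', 'b', 'goal']
```

State `c` is not winning in this example: it has a controllable edge to the goal, but the environment can also move it to `bad`, which lies outside the winning set.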


Computing Winning States

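Reachability Games: Backwards Fixed-Point Computation

Definitions

$\mathrm{cPred}(X) = \{\, q \in Q \mid \exists q' \in X.\ q \xrightarrow{c} q' \,\}$

$\mathrm{uPred}(X) = \{\, q \in Q \mid \exists q' \in X.\ q \xrightarrow{u} q' \,\}$

$\mathrm{Pred}_t(X, Y) = \{\, q \in Q \mid \exists t.\ q^t \in X \text{ and } \forall s \le t.\ q^s \in Y^C \,\}$

$\pi(X) = \mathrm{Pred}_t[\, X \cup \mathrm{cPred}(X),\ \mathrm{uPred}(X^C) \,]$

Theorem: The set of winning states is obtained as the least fixpoint of the function $X \mapsto \pi(X) \cup \mathrm{Goal}$.

[Figure: illustration of Pred_t(X,Y) with the regions X and Y]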
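To make the timed predecessor operator Pred_t concrete, here is a minimal sketch on a deliberately simplified state space. The assumptions are mine, not the slides': a single location, a single clock restricted to integer values 0..horizon, and delays in whole time units, so that q delayed by t is simply q + t. The real tool works symbolically on zones rather than by enumeration.

```python
def pred_t(X, Y, horizon):
    """Clock values q with some delay t such that q+t is in X
    and no q+s with s <= t is in Y (integer grid 0..horizon)."""
    result = set()
    for q in range(horizon + 1):
        for t in range(horizon - q + 1):
            if q + t in Y:
                break  # any longer delay would also pass through Y
            if q + t in X:
                result.add(q)
                break
    return result


# Hypothetical example: reach X = {5, 6} while avoiding Y = {3}.
REACH = pred_t({5, 6}, {3}, horizon=10)
print(sorted(REACH))  # [4, 5, 6]
```

Values 0 to 2 are excluded because delaying towards X passes through the avoided value 3, and values above 6 can no longer reach X by letting time pass.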

Symbolic On-the-fly Algorithms for Timed Games [CDF+05, BCD+07]

Symbolic version of the on-the-fly model-checking algorithm for the modal mu-calculus (Liu & Smolka 98)

UPPAAL Tiga [CDF+05, BCD+07]

Reachability properties:
  control: A[ p U q ]   (until)
  control: A<> q   is equivalent to   control: A[ true U q ]

Safety properties:
  control: A[ p W q ]   (weak until)
  control: A[] p   is equivalent to   control: A[ p W false ]

Time-optimality:
  control_t*(u,g): A[ p U q ]
  u is an upper bound used to prune the search
  g is the time to the goal from the current state

UPPAAL Tiga


DEMO

Model Checking (ex Train Gate)

Property: never two trains at the crossing at the same time

[Figure: Environment and Controller models]

Synthesis (ex Train Gate)

Property: never two trains at the crossing at the same time

[Figure: Environment model and an unknown Controller, marked "?"]

Timed Games

Property: never two trains at the crossing at the same time

Find a strategy for the controllable actions such that the behaviour satisfies the property.

[Figure: Controller and Environment models with controllable and uncontrollable actions]

Synthesis Demo


A Buggy Brick Sorting Program

(Slide from MCD 2001, Twente, Kim G. Larsen)

First UPPAAL model: Sorting of Lego Boxes (Ken Tindell)

Exercise: Design a controller so that only yellow boxes are pushed out.

[Figure: conveyor belt carrying black and yellow boxes past a piston; position marks 9, 18, 81, 90, 99; labels: Blck, Yel, remove, eject, Controller, MAIN, PUSH]

Brick Sorting

[Figure: Piston, Generic Plate, and Controller models]

The Chinese Juggling Problem

Problem: keep the plates from falling down

Thanks to Oded Maler

Kim G Larsen, China Summer 2009

Balancing Plates / Timed Automata

[Figure: automata for a Plate and the Juggler]

E ¬(Plate1.Bang or Plate2.Bang or …)

Balancing Plates / Time Uncertainty

Strategy

BDD/CDD

Production Cell Overview

Realistic case study described in several formalisms (1994 and later).

Objective: stamp metal plates in a press.

Components: feed belt, two-armed robot, press, and deposit belt.

Production Cell in UPPAAL Tiga


Experimental Results

[CDF+05]

[BCD+07]


Plastic Injection Molding Machine

Robust and optimal control

Tool Chain

Synthesis: UPPAAL TIGA

Verification: PHAVer

Performance: SIMULINK

40% improvement over existing solutions.

Quasimodo

[CJL+09]

Oil Pump Control Problem

R1: stay within safe interval [4.9,25.1]

R2: minimize average/overall oil volume


The Machine (consumption)

Infinite cyclic demand to be satisfied by our control strategy.

P: latency of 2 s between state changes of the pump

F: noise of 0.1 l/s

Hybrid Game Model


Abstract Game Model

UPPAAL Tiga offers games of perfect information.

Abstract game model such that states only contain information about:

• Volume of oil at the beginning of the cycle

• The ideal volume as predicted by the consumption cycle

• Current time within the cycle

• State of the pump (on/off)

Discrete model

[Figure labels: DV, V_rate, V_acc, time]

Machine (uncontrollable)

Checks whether V under noise gets outside [Vmin+0.1, Vmax-0.1]

Pump (controllable)

Every 1 (one) second

Global Approach

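Find some interval $I_1 = [V_1, V_2] \subseteq [4.9, 25.1]$ such that:

• $I_1$ is m-stable, i.e. from any $V_0$ in $I_1$ there is a strategy such that, whatever the fluctuation, the volume is always within $[5, 25]$ and at the end of the cycle is within $I_2 = [V_1 + m, V_2 - m]$

• $I_1$ is optimal among all m-stable intervals.

[Figure: oil volume (axis 0 to 25) over one cycle (0 s to 20 s), with the intervals I1 and I2 marked]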

Tool Chain

Strategy Synthesis: TIGA

Verification: PHAVer

Performance Evaluation: SIMULINK

Guaranteed correctness and robustness, with 40% improvement

Synthesis of Home Automation


What else ?

Timed Games with Partial Observability:
  Action-based observation: undecidable [BDMP03]
  Finite observation of states: decidable [CDL+07]

Priced Timed Games:
  Acyclic, cost non-zeno: decidable [LTMM02] [BCFL04]
  1 clock: decidable [BLMR06]
  >2 clocks: undecidable [BBR05, BBM06]
  2 clocks: open

Energy Games:
  Several open problems
  Exponential observers

Climate Controller in Pig Stables [JRLD07]

CHESS Way [Quasimodo@ESWEEK]


Going to Uppsala – in 1 hour

[Figure: travel options with branch probabilities 0.9 and 0.1 and uniformly distributed durations U[42,45], U[0,35], U[0,20], U[0,140]]

Optimal WC Strategy (2-player): take bike, WC = 45

Optimal Expected Strategy (1½ player): take car, E = 16, WC = 140

Optimal Expected Strategy guaranteeing WC <= 60: ?????

DEMO

[Figure: UPPAAL STRATEGO overview with the following elements: P (stochastic priced timed game); G (timed game), obtained from P by abstraction; σ (strategy), obtained by synthesis for the property φ; G|σ (timed automata); P|σ; σ° (optimized strategy), with objectives minE(cost) or maxE(gain); P|σ° (stochastic priced timed automata)]

Uppaal TIGA:
  strategy NS = control: A<> goal
  strategy NS = control: A[] safe

Statistical Learning:
  strategy DS = minE (cost) [<=10]: <> done under NS
  strategy DS = maxE (gain) [<=10]: <> done under NS

Uppaal:
  E<> error under NS
  A[] safe under NS

Uppaal SMC:
  simulate 5 [<=10]{e1, e2} under SS
  Pr[<=10](<> error) under SS
  E[<=10;100](max: cost) under SS

Timed Games

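Strategy: memoryless, deterministic, most permissive.

[Figure: timed game with controllable and uncontrollable edges (TIGA)]

Run

$\pi = (\mathrm{INIT}, x=0) \xrightarrow{50.1} \xrightarrow{r} (\mathrm{CHOICE}, x=0) \xrightarrow{2.4} \xrightarrow{b} (\mathrm{B}, x=0) \xrightarrow{20.3} \xrightarrow{d} (\mathrm{END}, x=20.3)$

Total time = 50.1 + 2.4 + 20.3 = 72.8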

Timed Games –Time Bounded Reachability

Objective: A<>(END ∧ time ≤ 210)

Deterministic, memoryless strategy

Most permissive, memoryless strategy

[Figure: the two strategies plotted over clock x and time, with axis marks at 70, 90, 100, and 200 and regions labeled w, λ, a, and b]

Priced Timed Games

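• Cost optimal strategy: take b immediately, WC = 280

[Figure: priced timed game with COST annotations (”CORA”)]

Priced Run

$\pi = (\mathrm{Init}, x=0) \xrightarrow[0]{50.1} \xrightarrow{r} (\mathrm{CHOICE}, x=0) \xrightarrow[4.8]{2.4} \xrightarrow{b} (\mathrm{B}, x=0) \xrightarrow[60.9]{20.3} \xrightarrow{d} (\mathrm{END}, x=20.3)$

Total cost = 0 + 4.8 + 60.9 = 65.7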
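The arithmetic of the run and the priced run can be replayed with a short sketch. The cost rates 0, 2 and 3 used here are an assumption inferred from the slide's numbers (4.8 / 2.4 = 2 and 60.9 / 20.3 = 3); the original model figure is not available, and the tuple layout and the function name `replay` are hypothetical.

```python
import math

# Each step: (delay, assumed cost rate during the delay, action, target location, reset x?)
RUN = [
    (50.1, 0.0, "r", "CHOICE", True),
    (2.4, 2.0, "b", "B", True),
    (20.3, 3.0, "d", "END", False),
]


def replay(run):
    """Return (final clock value, total time, total cost) of a priced run."""
    x = 0.0
    total_time = 0.0
    total_cost = 0.0
    for delay, rate, _action, _target, reset in run:
        x += delay                  # delay transition: the clock advances
        total_time += delay
        total_cost += rate * delay  # cost accumulates at the location's rate
        if reset:                   # discrete transition, possibly resetting x
            x = 0.0
    return x, total_time, total_cost


FINAL_X, TOTAL_TIME, TOTAL_COST = replay(RUN)
assert math.isclose(TOTAL_TIME, 72.8) and math.isclose(TOTAL_COST, 65.7)
print(round(FINAL_X, 1), round(TOTAL_TIME, 1), round(TOTAL_COST, 1))  # 20.3 72.8 65.7
```

The same run therefore has two measures: elapsed time, which the timed game bounds, and accumulated cost, which the priced game optimizes.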

Priced Timed MDP

• Cost optimal strategy: take b immediately, WC = 280

• Optimal expected cost strategy: take a immediately, expectation = 160

[Figure: priced timed MDP; labels: UNIFORM[0,100], Controllable, COST, ”SMC”]

Priced Timed MDP

• Cost optimal strategy: take b immediately, overall = 280

• Optimal expected cost strategy: take a immediately, expectation = 160

• Minimal expected cost while guaranteeing END is reached within time 210: expectation = 204, with strategy:
  t > 90: (100, w)
  t > 70: (0, a)
  t < 70: (0, b)

[Figure: priced timed MDP; labels: UNIFORM[0,100], Controllable, COST, ”SMC”]

Stochastic Strategies for Learning!

Objective: A<>(END ∧ time ≤ 210)

Most permissive, memoryless strategy

Cost optimal deterministic sub-strategy!

Stochastic Strategies

[Figure: strategies plotted over clock x and time, with axis marks at 70, 90, 100, and 200 and regions labeled w, λ, a, and b]

Reinforcement Learning

[Figure: reinforcement learning workflow with stages labeled Time Bounded Reachability (G,T), TIGA, and SMC]

Learned Strategies

More plots of runs according to the learned strategies, for the choices λ, a, and b.

Covariance Matrices


Strategies – Representation

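Nondeterministic Strategies: $\sigma_n(\ell, v) \subseteq \Sigma_c \cup \{\lambda\}$

Stochastic Strategies: $\mu_s(\ell, v) : \Sigma_c \cup \{\lambda\} \to [0,1]$

Representations of stochastic strategies: Covariance Matrices, Splitting, Logistic Regression

[Figure: regions labeled $R_\ell$]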
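The relation between the two kinds of strategy can be sketched as follows: a nondeterministic strategy gives the set of permitted choices in a state, and a stochastic strategy assigns probabilities to exactly those choices. Everything concrete here is hypothetical and chosen only for illustration: the thresholds, the weights, and the names `permitted`, `mu_s` and `sample`. The learned representations listed on the slide (covariance matrices, splitting, logistic regression) are not reproduced.

```python
import random

WAIT = "lambda"  # the delay choice, written λ on the slide


def permitted(location, x):
    """Hypothetical nondeterministic strategy: choices allowed in state (location, x)."""
    if location != "CHOICE":
        return {WAIT}
    if x < 70.0:
        return {"a", "b", WAIT}
    if x < 90.0:
        return {"a", WAIT}
    return {WAIT}


def mu_s(location, x, weights):
    """Distribution over the permitted choices, proportional to the given weights."""
    allowed = permitted(location, x)
    raw = {choice: weights.get(choice, 1.0) for choice in allowed}
    total = sum(raw.values())
    return {choice: w / total for choice, w in raw.items()}


def sample(location, x, weights, rng):
    """Draw one choice according to mu_s."""
    dist = mu_s(location, x, weights)
    choices = sorted(dist)
    return rng.choices(choices, weights=[dist[c] for c in choices], k=1)[0]


WEIGHTS = {"a": 3.0, "b": 5.0, WAIT: 1.0}  # assumed preference weights
DIST = mu_s("CHOICE", 80.0, WEIGHTS)
PICK = sample("CHOICE", 80.0, WEIGHTS, random.Random(0))
```

Because the distribution is only defined over permitted choices, any sampled behaviour stays inside the nondeterministic strategy; here `b` has a high weight but probability zero at x = 80.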

Stochastic Timed Game


DEMO

Safe & Adaptive Cruise Control

Q1: Find the most permissive strategy ensuring safety.

Q2: Find the optimal sub-strategy that will allow Ego to go as close as possible.

[Figure: the cars EGO and FRONT]

Two Player Game (simplified)

Discretization

[Figure: Discrete and Continuous models]

Front (complete)

No Strategy

Safety Strategy

Safe and Optimal Strategy

Safety Strategy (Code)

Synthesis of Climate Controllers

TACAS16

From Verification to Synthesis and Optimization

[Figure labels: 1½ and 2 players; Skov, HYDAC, SELUXIT, GOMSpace; Mathias G Sørensen]

• Zone-based climate control for pig stables

• Profit-optimal, minimal-wear and energy-aware schedules for satellites

• Personalized light control in home automation

• Energy- and comfort-optimal floor heating

• Intelligent traffic control

• Safe and optimal car maneuvers

More Practical Synthesis …

LASSO: Learning, Analysis, SynthesiS and Optimization of Cyber-Physical Systems

Ongoing Work

• Verification of learned strategy (zonification)

• Full correctness of hybrid control obtained from discretization. Combination with Invariance Analysis for Switched Controller Synthesis; Metrics; Verification using SpaceEx

• From optimal strategy of bounded horizon to optimal infinite strategy

• Strategies: complexity (space and time) and permissiveness

• On-line synthesis may be too slow for some applications: Satellites > Floor-heating > Traffic > Power Electronics. Learn Neural Network representation of optimal strategy

Fig 1. The LASSO Framework

[Figure labels: Model of Physical Comp., Model of Cyber Comp., Known, Unknown, Safety Constraints, Perf. Measures, μ1 … μn, Learning, Analysis, Synthesize, Optimize]

NEXT UPPAAL BRANCHES

PhD/visitors/PostDoc Position available

Contact: kgl@cs.aau.dk

Thank You!

Exercises (from yesterday)

Exercise 28: Jobshop Scheduling

Look at

https://www.dropbox.com/sh/96tvd7qklf1gcyh/AADFwVJk97L9qtJLkf0VUgtOa?dl=0

In UPPAAL SMC

Coffee Machine

Hammer Nail

Hybrid

… and more


Exercises

Download UPPAAL STRATEGO 4.1.20-3 or 4.1.20-4

http://people.cs.aau.dk/~marius/stratego

Experiment with Newspaper

Traffic Uppsala

Cruice

Small Car

See Dropbox
