
Timed Games and Stochastic Priced Timed Games

Synthesis & Machine Learning

STRATEGO

Kim G. Larsen

– Aalborg University

DENMARK

Overview

Timed Automata: Decidability (regions); Symbolic Verification (zones)

Priced Timed Automata: Decidability (priced regions); Symbolic Verification (priced zones)

Stochastic Timed Automata: Stochastic Semantics; Statistical Model Checking; Stochastic Hybrid Automata

Timed Games & Stochastic Priced Timed Games: Symbolic Synthesis; Reinforcement Learning; Applications

TU Graz, May 2017 Kim Larsen [2]

CLASSIC

Optimization

Synthesis

Component

Testing

Performance Analysis

Verification

STRATEGO: Optimal Synthesis

Timed Games and STRATEGO

Timed Automata & Model Checking

State: (L1, x=0.81)

Transitions: (L1, x=0.81) --2.1--> (L1, x=2.91) --> (goal, x=2.91)

E<> goal ?

A<> goal ?

A[] not L4 ?

Timed Game

Timed Game & Synthesis


Decidability of Timed Games

Untimed and Timed Games

Reachability / Safety Games

Uncontrollable

Controllable


Untimed Games

Uncontrollable

Controllable

Memoryless Strategy: $F : Q \to E_c$

Winning Run: $\mathit{States}(r) \cap G \neq \emptyset$

Winning Strategy: $\mathit{Runs}(F) \subseteq \mathit{WinRuns}$

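Backwards Fixed-Point Computation

$\mathit{cPred}(X) = \{\, q \in Q \mid \exists q' \in X.\ q \xrightarrow{c} q' \,\}$

$\mathit{uPred}(X) = \{\, q \in Q \mid \exists q' \in X.\ q \xrightarrow{u} q' \,\}$

$\pi(X) = \mathit{cPred}(X) \setminus \mathit{uPred}(X^C)$

Theorem: The set of winning states is obtained as the least fixpoint of the function:

$X \mapsto \pi(X) \cup \mathit{Goal}$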
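The backwards fixed-point computation for untimed games can be sketched in a few lines of Python. This is only an illustration on an explicit finite graph; the function names, the edge-set encoding and the five-state example are assumptions made here, not part of the slides or of any tool.

```python
from typing import Hashable, Iterable, Set, Tuple

State = Hashable
Edge = Tuple[State, State]


def c_pred(edges_c: Set[Edge], X: Set[State]) -> Set[State]:
    """States with at least one controllable edge into X."""
    return {q for (q, q2) in edges_c if q2 in X}


def u_pred(edges_u: Set[Edge], X: Set[State]) -> Set[State]:
    """States with at least one uncontrollable edge into X."""
    return {q for (q, q2) in edges_u if q2 in X}


def winning_states(states: Iterable[State], edges_c: Set[Edge],
                   edges_u: Set[Edge], goal: Iterable[State]) -> Set[State]:
    """Least fixpoint of X -> pi(X) union Goal, computed by iteration from the empty set."""
    all_states = set(states)
    goal_set = set(goal)
    X: Set[State] = set()
    while True:
        complement = all_states - X
        # pi(X) = cPred(X) \ uPred(X^C): the controller can step into X and
        # the environment cannot step out of X.
        pi = c_pred(edges_c, X) - u_pred(edges_u, complement)
        new_X = pi | goal_set
        if new_X == X:
            return X
        X = new_X


# Hypothetical example: state 4 is the goal, state 3 is a dead end.
EDGES_C = {(0, 1), (0, 2), (1, 4), (2, 4)}
EDGES_U = {(1, 3)}
WIN = winning_states(range(5), EDGES_C, EDGES_U, {4})
print(sorted(WIN))
```

In this example state 1 is not winning, because the environment may move it to the dead end 3, while state 0 still wins by choosing the controllable edge to 2.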

Computing Winning States

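Reachability Games: Backwards Fixed-Point Computation

Theorem: The set of winning states is obtained as the least fixpoint of the function $X \mapsto \pi(X) \cup \mathit{Goal}$

Definitions:

$\mathit{cPred}(X) = \{\, q \in Q \mid \exists q' \in X.\ q \xrightarrow{c} q' \,\}$

$\mathit{uPred}(X) = \{\, q \in Q \mid \exists q' \in X.\ q \xrightarrow{u} q' \,\}$

$\mathit{Pred}_t(X,Y) = \{\, q \in Q \mid \exists t.\ q^t \in X \text{ and } \forall s \le t.\ q^s \in Y^C \,\}$

$\pi(X) = \mathit{Pred}_t[\, X \cup \mathit{cPred}(X),\ \mathit{uPred}(X^C) \,]$

[Figure: the sets X, Y and Pred_t(X,Y)]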
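To make the timed predecessor operator concrete, here is a minimal sketch under strong simplifying assumptions: a single location, one clock, and integer time steps, so that a state is just a clock value q and delaying by t gives q + t. Symbolic tools work on zones rather than enumerated values; the function name and the bound `max_value` are hypothetical.

```python
from typing import Set


def pred_t(X: Set[int], Y: Set[int], max_value: int) -> Set[int]:
    """Discrete-time Pred_t: states that can delay into X while avoiding Y.

    q is included iff there is a delay t with q + t in X and
    q + s not in Y for every s <= t.
    """
    result: Set[int] = set()
    for q in range(max_value + 1):
        for t in range(max_value - q + 1):
            if (q + t) in Y:
                # Any longer delay would also pass through Y, so give up on q.
                break
            if (q + t) in X:
                result.add(q)
                break
    return result


# Hypothetical example: target value 5, forbidden value 3, clock range 0..6.
print(sorted(pred_t({5}, {3}, 6)))
```

Values 0 to 3 are excluded because every delay towards 5 passes through (or starts in) the forbidden value 3; with an empty Y every value up to 5 would qualify.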


Symbolic On-the-fly Algorithms for Timed Games [CDF+05, BCD+07]

Symbolic version of the on-the-fly model-checking algorithm for the modal mu-calculus [Liu & Smolka 98]

UPPAAL Tiga [CDF+05, BCD+07]

Reachability properties:
control: A[ p U q ] (until)
control: A<> q, equivalent to control: A[ true U q ]

Safety properties:
control: A[ p W q ] (weak until)
control: A[] p, equivalent to control: A[ p W false ]

Time-optimality:
control_t*(u,g): A[ p U q ]
u is an upper bound used to prune the search
g is the time to the goal from the current state

UPPAAL Tiga

Model Checking (ex. Train Gate)

Property: never two trains at the crossing at the same time.

Environment

Controller

Synthesis (ex. Train Gate)

Objective: never two trains at the crossing at the same time.

Environment

Controller

Timed Games

Find a strategy for the controllable actions such that the behaviour satisfies: never two trains at the crossing at the same time.

Controllable / Uncontrollable

Controller

Environment


Synthesis Demo


A Buggy Brick Sorting Program


First UPPAAL model

Sorting of Lego Boxes

Conveyer Belt

Exercise: Design a controller so that only yellow boxes are pushed out

[Figure: conveyer belt with piston and controller; labels Yellow, Blck, Yel, remove; timing marks 9, 18, 81, 90; MAIN, PUSH; Ken Tindell]


Brick Sorting

Piston

Generic Plate

Controller


The Chinese Juggling Problem

Problem: avoid having the plates falling down

thanks to Oded Maler

Balancing Plates / Timed Automata

A Plate

The Juggler

E[] not (Plate1.Bang or Plate2.Bang or …)

Balancing Plates / Time Uncertainty

Strategy

BDD/CDD


Production Cell Overview

Realistic case study described in several formalisms (1994 and later).

Objective: stamp metal plates in press.

feed belt, two-armed robot, press, and deposit belt.

Production Cell in UPPAAL Tiga

Experimental Results

[CDF+05]

[BCD+07]

Plastic Injection Molding Machine

Robust and optimal control

Tool Chain

Synthesis: UPPAAL TIGA

Verification: PHAVer

Performance: SIMULINK

40% improvement over existing solutions.

Quasimodo

[CJL+09]

Oil Pump Control Problem

R1: stay within safe interval [4.9,25.1]

R2: minimize average/overall oil volume


The Machine (consumption)

Infinite cyclic demand to be satisfied by our control strategy.

P: latency of 2 s between state changes of the pump

F: noise 0.1 l/s

Hybrid Game Model

Abstract Game Model

UPPAAL Tiga offers games of perfect information

Abstract game model such that states only contain information about:

• Volume of oil at the beginning of the cycle
• The ideal volume as predicted by the consumption cycle
• Current time within the cycle
• State of the pump (on/off)

Discrete model

DV, V_rate

V_acctime

Machine (uncontrollable)

Checks whether V under noise gets outside [Vmin+0.1,Vmax-0.1]

Pump (controllable)

Every 1 (one) second

Global Approach

Find some interval $I_1 = [V_1, V_2] \subseteq [4.9, 25.1]$ such that:

$I_1$ is m-stable, i.e. from any $V_0$ in $I_1$ there is a strategy such that, whatever the fluctuation, the volume always stays within $[5, 25]$ and at the end lies within $I_2 = [V_1 + m, V_2 - m]$

$I_1$ is optimal among all m-stable intervals.

[Figure: time axis from 0 s to 20 s]

Tool Chain

Strategy Synthesis: TIGA

Verification: PHAVer

Performance Evaluation: SIMULINK

Guaranteed correctness and robustness, with 40% improvement

Synthesis of Home Automation

What else ?

Timed Games with Partial Observability:
• Action-based observation: undecidable [BDMP03]
• Finite observation of states: decidable [CDL+07]

Priced Timed Games:
• Acyclic, cost non-zeno: decidable [LTMM02] [BCFL04]
• 1 clock: decidable [BLMR06]
• >2 clocks: undecidable [BBR05, BBM06]
• 2 clocks: open

Energy Games:
• Several open problems
• Exponential observers

Climate Controller in Pig Stables [JRLD07]

CHESS Way [Quasimodo@ESWEEK]

Timed Games and STRATEGO

Going to Uppsala – in 1 hour

U[42,45]

U[0,35]

U[0,20]

U[0,140]

Optimal WC Strategy (2-player): Take bike, WC = 45

Optimal Expected Strategy (1½ player): Take car, E = 16, WC = 140

Optimal Expected Strategy guaranteeing WC <= 60

[Diagram: UPPAAL STRATEGO overview]

G: Timed Game
σ: Strategy
P: Stochastic Priced Timed Game
σ°: optimized Strategy
G|σ: Timed Automata
P|σ°: Stochastic Priced Timed Automata
Arrow labels: abstraction, synthesis, minE(cost), maxE(gain)

Uppaal TIGA:
strategy NS = control: A<> goal
strategy NS = control: A[] safe

Statistical Learning:
strategy DS = minE (cost) [<=10]: <> done under NS
strategy DS = maxE (gain) [<=10]: <> done under NS

Uppaal:
E<> error under NS
A[] safe under NS

Uppaal SMC:
simulate 5 [<=10]{e1, e2} under SS
Pr[<=10](<> error) under SS
E[<=10;100](max: cost) under SS

Timed Games

Strategy:

Memoryless, deterministic, most permissive.

Uncontrollable

Controllable

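$\pi = (\mathrm{INIT}, x = 0) \xrightarrow{50.1} \xrightarrow{r} (\mathrm{CHOICE}, x = 0) \xrightarrow{2.4} \xrightarrow{b} (\mathrm{B}, x = 0) \xrightarrow{20.3} \xrightarrow{d} (\mathrm{END}, x = 20.3)$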

Total time = 50.1 + 2.4 + 20.3 = 72.8
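The run above alternates delays and actions, and its total time is the sum of the delays. A small sketch that replays it follows; the `Step` record and the `replay` function are hypothetical names, and which actions reset the clock x is inferred here from the clock values shown in the run (r and b reset x, d does not).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    delay: float        # time spent in the current location before the action
    action: str         # action label taken after the delay
    target: str         # location reached by the action
    resets_clock: bool  # whether the action resets clock x (inferred, see lead-in)


def replay(steps: List[Step], start: str = "INIT") -> Tuple[str, float, float]:
    """Return (final location, final value of x, total elapsed time)."""
    location, x, total = start, 0.0, 0.0
    for step in steps:
        x += step.delay
        total += step.delay
        if step.resets_clock:
            x = 0.0
        location = step.target
    return location, x, total


RUN = [
    Step(50.1, "r", "CHOICE", True),
    Step(2.4, "b", "B", True),
    Step(20.3, "d", "END", False),
]

location, x, total = replay(RUN)
print(location, round(x, 1), round(total, 1))
```

Keeping the clock value and the global elapsed time as separate accumulators mirrors the difference between x, which is reset along the run, and the time bound used in objectives such as time <= 210.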

Timed Games –Time Bounded Reachability

Objective: 𝐴⟨⟩(END ∧ time≤ 210)

Deterministic, memoryless strategy and most permissive, memoryless strategy:

[Figures: time axes with ticks at 100 and 200; regions labelled λ]

Priced Timed Games

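• Cost-optimal strategy: take b immediately

WC = 280

"CORA"

Priced Run (delay above each arrow, cost of the delay below)

$\pi = (\mathrm{Init}, x = 0) \xrightarrow[0]{50.1} \xrightarrow{r} (\mathrm{CHOICE}, x = 0) \xrightarrow[4.8]{2.4} \xrightarrow{b} (\mathrm{B}, x = 0) \xrightarrow[60.9]{20.3} \xrightarrow{d} (\mathrm{END}, x = 20.3)$

Total cost = 0 + 4.8 + 60.9 = 65.7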
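The cost of the priced run accumulates during delays in proportion to a per-location cost rate. The rates used below (0 in Init, 2 in CHOICE, 3 in B) are not stated on the slide; they are an assumption inferred from the figures 0, 4.8 and 60.9 together with the delays 50.1, 2.4 and 20.3. The function name is hypothetical.

```python
from typing import List, Tuple

# (location, delay spent there, assumed cost rate of that location)
PRICED_RUN: List[Tuple[str, float, float]] = [
    ("Init", 50.1, 0.0),
    ("CHOICE", 2.4, 2.0),
    ("B", 20.3, 3.0),
]


def run_cost(run: List[Tuple[str, float, float]]) -> float:
    """Sum of delay * rate over all delay steps (discrete actions cost nothing here)."""
    return sum(delay * rate for _location, delay, rate in run)


def run_time(run: List[Tuple[str, float, float]]) -> float:
    """Total elapsed time of the run."""
    return sum(delay for _location, delay, _rate in run)


print(round(run_cost(PRICED_RUN), 1), round(run_time(PRICED_RUN), 1))
```

Under these assumed rates the same run that takes 72.8 time units accumulates a cost of 65.7, which shows that time-optimal and cost-optimal strategies need not coincide.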

Priced Timed MDP


Cost-optimal strategy: take b immediately; overall = 280

Optimal expected-cost strategy: take a immediately; expectation = 160

Minimal expected cost while guaranteeing END is reached within time 210:

Strat.: t>90 (100,w); t>70 (0,a); t<70 (0,b)

UNIFORM[0,100]

Controllable

"SMC"

Stochastic Strategies for Learning!

Objective: 𝐴⟨⟩(END ∧ time≤ 210)

Most permissive, memoryless strategy:

[Figure: time axis with ticks at 100 and 200; actions a, b]

Cost-optimal deterministic sub-strategy!

Stochastic Strategies

Reinforcement Learning

Time Bounded Reachability(G,T)

Learned Strategies

More plots of runs according to learned strategies.

Covariance Matrices

Strategies – Representation

Nondeterministic Strategies: $\sigma_n(\ell, v) \subseteq \Sigma_c \cup \{\lambda\}$

Stochastic Strategies: $\mu_s(\ell, v) : \Sigma_c \cup \{\lambda\} \to [0,1]$

Representations: Covariance Matrices; Splitting; Logistic Regression
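The two kinds of strategy can be contrasted in code: a nondeterministic strategy returns a set of allowed choices among the controllable actions and the delay λ, while a stochastic strategy returns a probability for each choice. The sketch below also restricts a stochastic strategy to what a nondeterministic one permits, in the spirit of learning under a most permissive strategy. All names and the example numbers are hypothetical; this is not how UPPAAL STRATEGO stores strategies internally.

```python
import random
from typing import Dict, Set

LAMBDA = "lambda"  # stands for the delay choice


def restrict(dist: Dict[str, float], allowed: Set[str]) -> Dict[str, float]:
    """Restrict a distribution over choices to the allowed set and renormalise.

    If no probability mass remains, fall back to a uniform distribution
    over the allowed choices.
    """
    if not allowed:
        raise ValueError("the nondeterministic strategy allows no choice")
    kept = {c: p for c, p in dist.items() if c in allowed and p > 0.0}
    mass = sum(kept.values())
    if mass == 0.0:
        return {c: 1.0 / len(allowed) for c in sorted(allowed)}
    return {c: p / mass for c, p in kept.items()}


def sample(dist: Dict[str, float], rng: random.Random) -> str:
    """Draw one choice according to the distribution."""
    choices = sorted(dist)
    return rng.choices(choices, weights=[dist[c] for c in choices], k=1)[0]


# Hypothetical state: the permissive strategy allows action a or delaying,
# while the unrestricted stochastic strategy also puts weight on b.
ALLOWED = {"a", LAMBDA}
MU = {"a": 0.5, "b": 0.3, LAMBDA: 0.2}
SAFE_MU = restrict(MU, ALLOWED)
print(SAFE_MU, sample(SAFE_MU, random.Random(0)))
```

Renormalising after the restriction keeps the result a probability distribution, so every sampled choice is one the nondeterministic strategy permits.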

Stochastic Timed Game

Safe & Adaptive Cruise Control

Q1: Find the most permissive strategy ensuring safety.

Q2: Find the optimal sub-strategy that will allow Ego to go as close as possible.

EGO FRONT

Two Player Game (simplified)

Discretization

Discrete

Continuous

Front (complete)

No Strategy

Safety Strategy

Safe and Optimal Strategy

Optimal and Safe Strategy

Safety Strategy (Code)

Synthesis of Climate Controllers

TACAS16

From Verification to Synthesis and Optimization

SELUXIT

Zone-based climate control in pig stables

Profit-optimal, minimal-wear and energy-aware schedules for satellites

Personalized light control in home automation

Energy- and comfort-optimal floor heating

Intelligent Traffic Control

Safe and optimal car maneuvers

Mathias G Sørensen

GOMSpace

More Practical Synthesis …

Learning, Analysis, SynthesiS and Optimization of Cyber-Physical Systems

Ongoing Work

• Verification of learned strategy (zonification).
• Full correctness of hybrid control obtained from discretization: combination with invariance analysis for switched controller synthesis; metrics; verification using SpaceEx.
• From optimal strategy of bounded horizon to optimal infinite strategy.
• Strategies: complexity (space and time) and permissiveness.
• On-line synthesis may be too slow for some applications: Satellites > Floor-heating > Traffic > Power Electronics.
• Learn neural network representation of optimal strategy.

LASSO: Learning, Analysis, SynthesiS and Optimization of Cyber-Physical Systems

[Fig 1. The LASSO Framework. Labels: $\mu_1$, $\mu_n$; Safety Constraints; Perf. Measures; Model of Physical Comp.; Model of Cyber Comp.; Unknown; Learning; Analysis; Synthesize; Optimize]


NEXT UPPAAL BRANCHES

PhD / visitor / PostDoc positions available

Contact: kgl@cs.aau.dk

Thank You !

Exercises (from yesterday)

Exercise 28: Jobshop Scheduling

Look at

https://www.dropbox.com/sh/96tvd7qklf1gcyh/AADFwVJk97L9qtJLkf0VUgtOa?dl=0

In UPPAAL SMC

Coffee Machine

Hammer Nail

Hybrid

… and more


Exercises

Download UPPAAL STRATEGO 4.1.20-3 or 4.1.20-4

http://people.cs.aau.dk/~marius/stratego

Experiment with Newspaper

Traffic Uppsala

Cruice

Small Car

See Dropbox
