Decision-Theoretic Planning with Asynchronous Events
Håkan L. S. Younes, Carnegie Mellon University
2
Introduction
Asynchronous processes are abundant in the real world
Discrete-time models are inappropriate for systems with asynchronous events
Generalized semi-Markov (decision) processes are great for this!
3
Stochastic Processes with Asynchronous Events
[Two machines, m1 and m2. Timeline: t = 0: m1 up, m2 up.]
4
Stochastic Processes with Asynchronous Events
[Timeline: t = 0: m1 up, m2 up. t = 2.5: m2 crashes, giving m1 up, m2 down.]
5
Stochastic Processes with Asynchronous Events
[Timeline: t = 0: m1 up, m2 up. t = 2.5: m2 crashes, giving m1 up, m2 down. t = 3.1: m1 crashes, giving m1 down, m2 down.]
6
Stochastic Processes with Asynchronous Events
[Timeline: t = 0: m1 up, m2 up. t = 2.5: m2 crashes, giving m1 up, m2 down. t = 3.1: m1 crashes, giving m1 down, m2 down. t = 4.9: m2 is rebooted, giving m1 down, m2 up.]
7
A Model of Stochastic Discrete Event Systems
Generalized semi-Markov process (GSMP) [Matthes 1962]:
A set of events E
A set of states S
8
Events
In a state s, a set of events E_s ⊆ E is enabled
With each event e is associated:
A distribution G_e governing the time e must remain enabled before it triggers
A next-state probability distribution p_e(s′|s)
9
Semantics of GSMP Model
Associate a real-valued clock t_e with each event e
For each e ∈ E_s, sample t_e from G_e
Let e* = argmin_{e ∈ E_s} t_e and t* = min_{e ∈ E_s} t_e
Sample s′ from p_{e*}(s′|s)
For each e ∈ E_{s′}: set t_e′ = t_e − t* if e ∈ E_s \ {e*}; otherwise sample t_e′ from G_e
Repeat with s = s′ and t_e = t_e′ (a simulation sketch follows)
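The semantics above translate directly into a simulation loop. The following is a minimal sketch, assuming a model given by three illustrative callables: enabled(s) returning E_s, sample_delay(e) drawing from G_e, and sample_next_state(e, s) drawing from p_e(·|s); these names are not from the slides.

```python
def simulate_gsmp(s, enabled, sample_delay, sample_next_state, horizon):
    """Generate (time, triggered event, new state) triples of a GSMP run."""
    t = 0.0
    clocks = {e: sample_delay(e) for e in enabled(s)}   # t_e ~ G_e for e in E_s
    while t < horizon and clocks:
        e_star = min(clocks, key=clocks.get)            # triggering event e*
        t_star = clocks[e_star]                         # its remaining delay t*
        t += t_star
        s_next = sample_next_state(e_star, s)           # s' ~ p_e*(.|s)
        new_clocks = {}
        for e in enabled(s_next):
            if e in clocks and e != e_star:
                new_clocks[e] = clocks[e] - t_star      # still enabled: keep running
            else:
                new_clocks[e] = sample_delay(e)         # newly enabled: fresh draw
        s, clocks = s_next, new_clocks
        yield t, e_star, s
```

Note how events that stay enabled keep their (shifted) clocks; only newly enabled events are resampled, which is exactly the asynchronous behaviour discussed on the next slides.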
10
Semantics: Example
[In state (m1 up, m2 up), the sampled trigger times are: m2 crashes at 2.5, m1 crashes at 3.1. After m2 crashes at t = 2.5, in state (m1 up, m2 down) the clocks are: m1 crashes at 0.6 (the remaining 3.1 − 2.5) and reboot m2 at 2.4 (freshly sampled).]
11
Notes on Semantics
Events that remain enabled across state transitions without triggering are not rescheduled: asynchronous events!
Differs from a semi-Markov process in this respect
Reduces to a continuous-time Markov chain if all G_e are exponential distributions
12
General State-Space Markov Chain (GSSMC)
The model is Markovian if we include the clocks in the state space
Extended state space X
The next-state distribution f(x′|x) is well-defined
Clock values are not known to the observer
The time each event has been enabled is known
13
Observation Model
An observation o is a state s together with a real value u_e for each enabled event e, representing the time e has currently been enabled
f(x|o) is well-defined
14
Observations: Example
Actual model: in (m1 up, m2 up) the clocks are m2 crashes: 2.5, m1 crashes: 3.1; after the transition to (m1 up, m2 down): m1 crashes: 0.6, reboot m2: 2.4.
Observed model: in (m1 up, m2 up) the times enabled are m2 crashes: 0.0, m1 crashes: 0.0; after the transition: m1 crashes: 2.5, reboot m2: 0.0.
15
Actions and Policies (GSMDPs)
Identify a set A ⊆ E of controllable events (actions)
A policy is a mapping from observations to sets of actions
The action choice can change at any time while in a state s
16
Rewards and Discounts
Lump-sum reward k(s, e, s′) associated with the transition from s to s′ caused by e
Continuous reward rate c(s, a) associated with action a being enabled in s
Discount factor α: a unit reward earned at time t counts as e^{−αt}
17
Value Function for GSMDPs
v(o) = ∫_X f(x|o) [ ∫₀^{t*} e^{−αt} c(s, π(o_t)) dt + e^{−αt*} Σ_{s′∈S} p_{e*}(s′|s) ( k(s, e*, s′) + v(obs(x, e*, s′)) ) ] dx

where, given the extended state x, e* is the next event to trigger and t* its trigger time, o_t is the observation after t additional time units in s, and obs(x, e*, s′) is the observation after e* has triggered the transition to s′.
18
GSMDP Solution Method [Younes & Simmons 2004]
GSMDP → continuous-time MDP, using phase-type distributions
Continuous-time MDP → discrete-time MDP, using uniformization [Jensen 1953]
(A uniformization sketch follows.)
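Uniformization normalizes all exit rates to a common constant q, turning a continuous-time chain into a discrete-time one. The following is a minimal sketch for a single rate matrix; the two-state rates are made up for illustration. For an MDP the same transformation is applied per action, and with discount rate α the discrete-time discount factor becomes q/(q + α).

```python
import numpy as np

def uniformize(Q, q=None):
    """Turn a CTMC generator Q (rows sum to 0) into a discrete-time
    transition matrix P via uniformization: P = I + Q / q,
    where q >= max_i |Q[i, i]|."""
    Q = np.asarray(Q, dtype=float)
    if q is None:
        q = np.max(-np.diag(Q))          # smallest valid uniformization constant
    P = np.eye(Q.shape[0]) + Q / q
    return P, q

# Example: two-state chain with rate 0.5 (up -> down) and 1.0 (down -> up)
Q = [[-0.5, 0.5],
     [1.0, -1.0]]
P, q = uniformize(Q)
print(P, q)
```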
19
Continuous Phase-Type Distributions [Neuts 1981]
Time to absorption in a continuous-time Markov chain with n transient states:
Pr[X ≤ t] = 1 − π e^{Qt} e
where Q is the generator restricted to the n transient states, π is the initial phase distribution, and e = (1, …, 1)ᵀ is a column vector with n rows. (A small evaluation sketch follows.)
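The CDF above can be evaluated directly with a matrix exponential. A minimal sketch using SciPy; the function name and the default choice of starting in the first phase are illustrative, not from the slides.

```python
import numpy as np
from scipy.linalg import expm

def phase_type_cdf(t, Q, pi=None):
    """Pr[X <= t] = 1 - pi @ expm(Q*t) @ e for the time X to absorption,
    where Q is the generator over the n transient states and e is a
    column of n ones."""
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    if pi is None:                 # assume the chain starts in the first phase
        pi = np.zeros(n)
        pi[0] = 1.0
    pi = np.asarray(pi, dtype=float)
    return 1.0 - pi @ expm(Q * t) @ np.ones(n)

# One phase with rate 2 is just an exponential: Pr[X <= 1] = 1 - e^{-2}
print(phase_type_cdf(1.0, [[-2.0]]))
```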
20
Exponential Distribution
A single phase with exit rate λ: Q = [−λ]
Pr[X ≤ t] = 1 − e^{−λt}
21
Two-Phase Coxian Distribution
Phase 1 has exit rate λ₁: it moves to phase 2 with probability p and to absorption with probability 1 − p. Phase 2 moves to absorption with rate λ₂.

Q = [ −λ₁   pλ₁ ]
    [   0   −λ₂ ]

(A small construction sketch follows.)
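For concreteness, the two-phase Coxian generator can be written out and its CDF evaluated directly. The parameter values below (λ₁ = 1, λ₂ = 1/10, p = 1/10) are the ones that appear in the Weibull fitting example later in the deck, used here purely for illustration.

```python
import numpy as np
from scipy.linalg import expm

# Two-phase Coxian generator with illustrative parameters
lam1, lam2, p = 1.0, 0.1, 0.1
Q = np.array([[-lam1, p * lam1],
              [0.0,   -lam2]])

# Pr[X <= t] = 1 - pi @ expm(Q t) @ ones, starting in phase 1
pi = np.array([1.0, 0.0])
t = 5.0
print(1.0 - pi @ expm(Q * t) @ np.ones(2))
```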
22
Generalized Erlang Distribution
n phases in series, each with exit rate λ. From the first phase, the process continues through the remaining n − 1 phases with probability p, or moves directly to absorption with probability 1 − p.

Q is the n × n matrix with −λ on the diagonal, pλ in entry (1, 2), λ on the rest of the superdiagonal, and 0 elsewhere.

(A small construction sketch follows.)
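A sketch of building the generalized Erlang generator described above. The helper name is illustrative; the example parameters (n = 3, p = 1, λ = 6) are the two-moment fit of U(0,1) shown later in the deck.

```python
import numpy as np

def generalized_erlang_Q(n, p, lam):
    """Generator of the n-phase generalized Erlang distribution above:
    rate lam in every phase, continue past phase 1 with probability p."""
    Q = -lam * np.eye(n)
    if n > 1:
        Q[0, 1] = p * lam
        for i in range(1, n - 1):
            Q[i, i + 1] = lam
    return Q

# Erlang-3 with rate 6 (the two-moment fit of U(0,1))
print(generalized_erlang_Q(3, 1.0, 6.0))
```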
23
Method of Moments
Approximate general distribution G with phase-type distribution PH by matching the first n moments
24
Moments of a Distribution
The ith moment: μᵢ = E[Xⁱ]
Mean: μ₁
Variance: σ² = μ₂ − μ₁²
Coefficient of variation: cv = σ/μ₁
25
Matching One Moment
Exponential distribution with rate λ = 1/μ₁
26
Matching Two Moments
[Diagram: the cv² axis. The exponential distribution covers only the single point cv² = 1.]
27
Matching Two Moments
[Diagram: the cv² axis.]
Exponential distribution: cv² = 1
Generalized Erlang distribution, for 0 < cv² ≤ 1:
  n = ⌈1/cv²⌉
  p = 1 − (2n·cv² + n − 2 − √(n² + 4 − 4n·cv²)) / (2(cv² + 1)(n − 1))
  λ = (1 − p + np)/μ₁
28
Matching Two Moments
[Diagram: the cv² axis.]
Exponential distribution: cv² = 1
Generalized Erlang distribution, for 0 < cv² ≤ 1:
  n = ⌈1/cv²⌉
  p = 1 − (2n·cv² + n − 2 − √(n² + 4 − 4n·cv²)) / (2(cv² + 1)(n − 1))
  λ = (1 − p + np)/μ₁
Two-phase Coxian distribution, for cv² ≥ 1/2:
  λ₁ = 2/μ₁,  λ₂ = 1/(cv²·μ₁),  p = 1/(2cv²)
(A small fitting sketch follows.)
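A minimal sketch of the two-moment matching recipe above, choosing between the exponential, generalized Erlang, and two-phase Coxian cases by cv². The function name and return format are illustrative.

```python
import math

def match_two_moments(mu1, cv2):
    """Fit a phase-type distribution to a mean mu1 and squared
    coefficient of variation cv2, following the recipe above."""
    if abs(cv2 - 1.0) < 1e-9:                      # exponential is exact
        return {"type": "exponential", "rate": 1.0 / mu1}
    if cv2 < 1.0:                                  # generalized Erlang
        n = math.ceil(1.0 / cv2)
        p = 1.0 - ((2 * n * cv2 + n - 2 - math.sqrt(n * n + 4 - 4 * n * cv2))
                   / (2 * (cv2 + 1) * (n - 1)))
        rate = (1 - p + n * p) / mu1
        return {"type": "generalized-erlang", "n": n, "p": p, "rate": rate}
    # cv2 > 1: two-phase Coxian
    return {"type": "coxian-2",
            "rate1": 2.0 / mu1,
            "rate2": 1.0 / (cv2 * mu1),
            "p": 1.0 / (2 * cv2)}

# The two examples from the following slides:
print(match_two_moments(2.0, 5.0))      # W(1,1/2): two-phase Coxian
print(match_two_moments(0.5, 1.0 / 3))  # U(0,1): Erlang-3 with rate 6
```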
29
Matching Moments: Example 1
Weibull distribution W(1,1/2): μ₁ = 2, cv² = 5
One moment: exponential with rate 1/2
Two moments: two-phase Coxian with λ₁ = 1, λ₂ = 1/10, p = 1/10 (exit rates 1/10 to the second phase, 9/10 to absorption)
[Plot: CDF F(t), t from 0 to 8, of W(1,1/2) and of the one- and two-moment approximations]
30
Matching Moments: Example 2
Uniform distribution U(0,1): μ₁ = 1/2, cv² = 1/3
One moment: exponential with rate 2
Two moments: Erlang distribution with n = 3 phases and rate λ = 6
[Plot: CDF F(t), t from 0 to 2, of U(0,1) and of the one- and two-moment approximations]
31
Matching More Moments
Closed-form solution for matching three moments of positive distributions [Osogami & Harchol-Balter 2003]
Uses a combination of an Erlang distribution and a two-phase Coxian distribution
32
Approximating GSMDP with Continuous-time MDP
Each event with a non-exponential distribution is approximated by a set of events with exponential distributions
Phases become part of the state description
33
Policy Execution
Phases represent a discretization of the time events have been enabled into random-length intervals
Phases are not part of the real model
Phase transitions are simulated during execution
34
The Foreman’s Dilemma
When should the "Service" action be enabled in the "Working" state?
States and reward rates: Working (c = 1), Failed (c = 0), Serviced (c = 0.5)
Events: Fail ~ G (Working → Failed), Service ~ Exp(10) (Working → Serviced), Return ~ Exp(1) (Serviced → Working), Replace ~ Exp(1/100) (Failed → Working)
35
The Foreman’s Dilemma: Optimal Solution
Find the enabling delay t₀ that maximizes v₀, the expected discounted value of the "Working" state:

v₀ = ∫₀^∞ [ f_X(t)(1 − F_Y(t))((1 − e^{−αt})/α + e^{−αt} v₂) + f_Y(t)(1 − F_X(t))((1 − e^{−αt})/α + e^{−αt} v₁) ] dt

v₁ = (1/100)/(α + 1/100) · v₀                (value of "Failed")
v₂ = (1/2)/(α + 1) + (1/(α + 1)) · v₀        (value of "Serviced")

where X = t₀ + Exp(10) is the time at which the Service action would trigger:
f_X(t) = 10 e^{−10(t − t₀)} for t ≥ t₀, and 0 otherwise;  F_X(t) = ∫₀^t f_X(x) dx

Y is the time to failure in the "Working" state, with density f_Y and CDF F_Y given by G.

Solving for v₀ as a function of t₀ and maximizing over t₀ gives the optimal enabling delay. (A numerical sketch follows.)
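A small numerical sketch of this optimization, based on the reconstruction above: it computes v₀ as a function of t₀ by numerical integration and maximizes over t₀. The discount rate α, the integration horizon, and the uniform failure-time example are assumptions made for illustration.

```python
import math
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

ALPHA = 0.1          # assumed discount rate (not given on the slides)
SERVICE_RATE = 10.0  # Service ~ Exp(10), enabled t0 time units after entering Working

def v0(t0, f_Y, F_Y, upper=200.0):
    """Value of Working when Service is enabled after delay t0,
    for a failure-time density f_Y and CDF F_Y (the distribution G)."""
    def f_X(t):  # density of X = t0 + Exp(SERVICE_RATE)
        return SERVICE_RATE * math.exp(-SERVICE_RATE * (t - t0)) if t >= t0 else 0.0
    def F_X(t):
        return 1.0 - math.exp(-SERVICE_RATE * (t - t0)) if t >= t0 else 0.0

    # Discounted probabilities of leaving Working via Service / via Fail
    w_serv, _ = quad(lambda t: f_X(t) * (1 - F_Y(t)) * math.exp(-ALPHA * t),
                     0, upper, points=[t0])
    w_fail, _ = quad(lambda t: f_Y(t) * (1 - F_X(t)) * math.exp(-ALPHA * t),
                     0, upper, points=[t0])
    # Discounted reward (rate 1) accumulated in Working before leaving
    r_work, _ = quad(lambda t: (f_X(t) * (1 - F_Y(t)) + f_Y(t) * (1 - F_X(t)))
                               * (1 - math.exp(-ALPHA * t)) / ALPHA,
                     0, upper, points=[t0])

    c1 = (1 / 100) / (ALPHA + 1 / 100)           # v1 = c1 * v0        (Failed)
    a2, c2 = 0.5 / (ALPHA + 1), 1 / (ALPHA + 1)  # v2 = a2 + c2 * v0   (Serviced)
    return (r_work + w_serv * a2) / (1 - w_serv * c2 - w_fail * c1)

# Illustrative failure-time distribution: uniform on [5, 20]
f_Y = lambda t: 1 / 15 if 5 <= t <= 20 else 0.0
F_Y = lambda t: min(max((t - 5) / 15, 0.0), 1.0)
best = minimize_scalar(lambda t0: -v0(t0, f_Y, F_Y), bounds=(0, 50), method="bounded")
print("best t0:", best.x, "value:", -best.fun)
```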
36
The Foreman’s Dilemma: SMDP Solution
Same formulas, but with a restricted choice:
The action is enabled immediately (t₀ = 0), or
The action is never enabled (t₀ = ∞)
37
The Foreman's Dilemma: Performance
Failure-time distribution: U(5,x)
[Plot: percent of the optimal value (50 to 100%) as a function of x (5 to 50), for the SMDP solution and the 1-, 2-, and 3-moment approximations]
38
The Foreman's Dilemma: Performance
Failure-time distribution: W(1.6x, 4.5)
[Plot: percent of the optimal value (50 to 100%) as a function of x (0 to 40), for the SMDP solution and the 1-, 2-, and 3-moment approximations]
39
System Administration
Network of n machines
Reward rate c(s) = k in states where k machines are up
One crash event and one reboot action per machine
At most one action enabled at any time
40
System Administration: Performance
Reboot-time distribution: U(0,1)
[Plot: reward (15 to 50) as a function of the number of machines n (1 to 13), for the 1-, 2-, and 3-moment approximations]
41
System Administration: Performance
size | 1 moment: states, q, time (s) | 2 moments: states, q, time (s) | 3 moments: states, q, time (s)
4  | 16, 5, 0.36 | 32, 9, 3.57 | 112, 21, 10.30
5  | 32, 6, 0.82 | 80, 10, 7.72 | 272, 22, 22.33
6  | 64, 7, 1.89 | 192, 11, 16.24 | 640, 23, 40.98
7  | 128, 8, 3.65 | 448, 12, 28.04 | 1472, 24, 69.06
8  | 256, 9, 6.98 | 1024, 13, 48.11 | 3328, 25, 114.63
9  | 512, 10, 16.04 | 2304, 14, 80.27 | 7424, 26, 176.93
10 | 1024, 11, 33.58 | 5120, 15, 136.4 | 16384, 27, 291.70
11 | 2048, 12, 66.00 | 24576, 16, 264.17 | 35840, 28, 481.10
12 | 4096, 13, 111.96 | 53248, 17, 646.97 | 77824, 29, 1051.33
13 | 8192, 14, 210.03 | 114688, 18, 2588.95 | 167936, 30, 3238.16
State-space sizes: 2^n (1 moment), (n+1)·2^n (2 moments), (1.5n+1)·2^n (3 moments)
42
The Role of Phases
Foreman's dilemma: phases permit a delay in the enabling of actions
System administration: phases allow us to take into account the time an action has already been enabled
43
Summary
Generalized semi-Markov (decision) processes allow asynchronous events
Phase-type distributions can be used to approximate a GSMDP with an MDP
This lets us approximately solve GSMDPs using existing MDP techniques
Phase does matter!
44
Future Work
Discrete phase-type distributions: handle deterministic distributions and avoid the uniformization step
Value function approximation: take advantage of GSMDP structure
Other optimization criteria: finite horizon, etc.
45
Tempastic-DTP
A tool for GSMDP planning: http://www.cs.cmu.edu/~lorens/tempastic-dtp.html