A. Bobbio Bertinoro, March 10-14, 2003 1
Dependability Theory and Methods
5. Markov Models
Andrea Bobbio
Dipartimento di Informatica
Università del Piemonte Orientale, “A. Avogadro”
15100 Alessandria (Italy)
[email protected] - http://www.mfn.unipmn.it/~bobbio
Bertinoro, March 10-14, 2003
State-Space-Based Models
States and labeled state transitions
State can keep track of:
– Number of functioning resources of each type
– States of recovery for each failed resource
– Number of tasks of each type waiting at each resource
– Allocation of resources to tasks
A transition:
– Can occur from any state to any other state
– Can represent a simple or a compound event
State-Space-Based Models (Continued)
Transitions between states represent the change of the system state due to the occurrence of an event
Drawn as a directed graph
Transition label:
– Probability: homogeneous discrete-time Markov chain (DTMC)
– Rate: homogeneous continuous-time Markov chain (CTMC)
– Time-dependent rate: non-homogeneous CTMC
– Distribution function: semi-Markov process (SMP)
Modeler’s Options
Should I Use Markov Models?
State-Space-Based Methods
+ Model Dependencies
+ Model Fault-Tolerance and Recovery/Repair
+ Model Contention for Resources
+ Model Concurrency and Timeliness
+ Generalize to Markov Reward Models for Modeling Degradable Performance
Modeler’s Options
Should I Use Markov Models?
+ Generalize to Markov Regenerative Models for Allowing Generally Distributed Event Times
+ Generalize to Non-Homogeneous Markov Chains for Allowing Weibull Failure Distributions
+ Performance, Availability and Performability Modeling Possible
- Large (Exponential) State Space
In order to fulfill our goals
Modeling Performance, Availability and Performability
Modeling Complex Systems
We Need
Automatic Generation and Solution of Large Markov Reward Models
Model-based evaluation
Choice of the model type is dictated by:
– Measures of interest
– Level of detailed system behavior to be represented
– Ease of model specification and solution
– Representation power of the model type
– Access to suitable tools or toolkits
State space models
A transition $s \to s'$ represents the change of state of a single component $x_i$.
$\Pr\{s \to s', \Delta t\} = \Pr\{Z(t + \Delta t) = s' \mid Z(t) = s\}$
$Z(t)$ is the stochastic process.
$\Pr\{Z(t) = s\}$ is the probability of finding $Z(t)$ in state $s$ at time $t$.
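State space models
If $s \to s'$ represents a failure event of component $x_i$:
$\Pr\{s \to s', \Delta t\} = \Pr\{Z(t + \Delta t) = s' \mid Z(t) = s\} = \lambda_i \Delta t$
If $s \to s'$ represents a repair event of component $x_i$:
$\Pr\{s \to s', \Delta t\} = \Pr\{Z(t + \Delta t) = s' \mid Z(t) = s\} = \mu_i \Delta t$
Here $\lambda_i$ and $\mu_i$ are the failure rate and the repair rate of component $x_i$.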
Transient analysis
Given the initial state of the Markov chain, the system of differential equations is written based on:
rate of buildup = rate of flow in - rate of flow out
for each state (continuity equation).
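In matrix form the continuity equation reads dP(t)/dt = P(t) Q, where Q is the generator matrix of the CTMC. The following is a minimal sketch of how such a system could be integrated numerically. It assumes a hypothetical two-state (up/down) chain whose rates `lam` and `mu` are illustrative values, not taken from the slides, and it uses a fixed-step fourth-order Runge-Kutta scheme as one possible integrator.

```python
import math

def flow(p, Q):
    """Net rate of buildup of each state: (p Q)_j = flow in - flow out."""
    n = len(p)
    return [sum(p[i] * Q[i][j] for i in range(n)) for j in range(n)]

def transient(p0, Q, t_end, steps=10000):
    """Integrate dP/dt = P Q from 0 to t_end with fixed-step RK4."""
    h = t_end / steps
    p = list(p0)
    for _ in range(steps):
        k1 = flow(p, Q)
        k2 = flow([x + 0.5 * h * k for x, k in zip(p, k1)], Q)
        k3 = flow([x + 0.5 * h * k for x, k in zip(p, k2)], Q)
        k4 = flow([x + h * k for x, k in zip(p, k3)], Q)
        p = [x + h / 6.0 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p

# Hypothetical example: state 0 = up, state 1 = down.
lam, mu = 0.01, 1.0            # illustrative failure and repair rates (per hour)
Q = [[-lam, lam],
     [mu, -mu]]
t_end = 5.0
p_up, p_down = transient([1.0, 0.0], Q, t_end)

# Closed form for the two-state chain started in the up state, used as a check.
exact_up = mu / (lam + mu) + lam / (lam + mu) * math.exp(-(lam + mu) * t_end)
```

For stiff chains (rates differing by many orders of magnitude) a fixed-step explicit scheme may need very small steps; uniformization or an implicit solver is the usual alternative.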
Steady-state analysis (balance equation)
The steady-state equation can be written as a flow balance equation with a normalization condition on the state probabilities.
In steady state the rate of buildup is zero:
rate of buildup = rate of flow in - rate of flow out = 0
rate of flow in = rate of flow out
for each state (balance equation).
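The balance equations together with the normalization condition form a linear system, pi Q = 0 with the probabilities summing to one. The sketch below shows one way to solve it. The helper names `solve_linear` and `steady_state` are hypothetical, the two-state example and its rates are illustrative only, and the chain is assumed to be irreducible so that the solution is unique.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 (assumes an irreducible CTMC)."""
    n = len(Q)
    # Row j of A is column j of Q: flow into state j minus flow out of it.
    A = [[float(Q[i][j]) for i in range(n)] for j in range(n)]
    b = [0.0] * n
    # One balance equation is redundant; replace it by the normalization.
    A[-1] = [1.0] * n
    b[-1] = 1.0
    return solve_linear(A, b)

# Hypothetical two-state example: state 0 = up, state 1 = down.
lam, mu = 0.01, 1.0
pi = steady_state([[-lam, lam],
                   [mu, -mu]])
```

For this two-state chain the result can be compared with the known value mu / (lam + mu) for the up state.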
2-component series system
[Figure: block diagram with components A1 and A2 in series]
2-component parallel system
[Figure: block diagram with components A1 and A2 in parallel]
2-component stand-by system
[Figure: block diagram with components A and B]
Repairable system: Availability
Repairable system: 2 identical components
2-component Markov availability model
Assume we have a two-component parallel redundant system with repair rate $\mu$.
Assume that the failure rate of both the components is $\lambda$.
When both the components have failed, the system is considered to have failed.
Markov availability model
Let the number of properly functioning components be the state of the system.
The state space is {0,1,2} where 0 is the system down state.
We wish to examine effects of shared vs. non-shared repair.
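Markov availability model
[Figure: two state diagrams over the states 2, 1, 0. Non-shared (independent) repair: failure transitions 2 → 1 at rate $2\lambda$ and 1 → 0 at rate $\lambda$, repair transitions 0 → 1 at rate $2\mu$ and 1 → 2 at rate $\mu$. Shared repair: the same failure transitions, with both repair transitions at rate $\mu$.]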
Markov availability model
Note: The non-shared case can be modeled and solved using a reliability block diagram (RBD) or a fault tree (FTREE), but the shared case needs the use of Markov chains.
Steady-state balance equations
For any state: Rate of flow in = Rate of flow out
Considering the shared case, with $\pi_i$ the steady-state probability that the system is in state $i$:
$2\lambda \pi_2 = \mu \pi_1$
$(\lambda + \mu)\pi_1 = 2\lambda \pi_2 + \mu \pi_0$
$\lambda \pi_1 = \mu \pi_0$
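Steady-state balance equations
Hence
$\pi_1 = \frac{\mu}{\lambda}\pi_0, \qquad \pi_2 = \frac{\mu}{2\lambda}\pi_1 = \frac{\mu^2}{2\lambda^2}\pi_0$
Since
$\pi_0 + \pi_1 + \pi_2 = 1$
We have
$\pi_0 + \frac{\mu}{\lambda}\pi_0 + \frac{\mu^2}{2\lambda^2}\pi_0 = 1$
Or
$\pi_0 = \dfrac{1}{1 + \frac{\mu}{\lambda} + \frac{\mu^2}{2\lambda^2}}$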
Steady-state balance equations (Continued)
Steady-state Unavailability:
For the Shared Case = $\pi_0 = 1 - A_{shared}$
Similarly, for the Non-Shared Case,
$A_{non\text{-}shared} = 1 - \dfrac{1}{1 + \frac{2\mu}{\lambda} + \frac{\mu^2}{\lambda^2}}$
Steady-state Unavailability = $1 - A_{non\text{-}shared}$
Downtime in minutes per year = $(1 - A) \times 8760 \times 60$
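To make the comparison between shared and non-shared repair concrete, the closed-form unavailabilities above can be evaluated directly. This is only a sketch: the function names are hypothetical and the rates `lam` and `mu` are illustrative values, not taken from the slides.

```python
def unavailability_shared(lam, mu):
    """pi_0 for the shared-repair model: 1 / (1 + mu/lam + mu^2 / (2 lam^2))."""
    r = mu / lam
    return 1.0 / (1.0 + r + r * r / 2.0)

def unavailability_non_shared(lam, mu):
    """pi_0 for the non-shared model: 1 / (1 + 2 mu/lam + mu^2 / lam^2)."""
    r = mu / lam
    return 1.0 / (1.0 + 2.0 * r + r * r)

def downtime_minutes_per_year(unavailability):
    """Downtime = (1 - A) * 8760 hours * 60 minutes."""
    return unavailability * 8760 * 60

# Illustrative rates (per hour), chosen only for this example.
lam, mu = 0.001, 0.1
u_shared = unavailability_shared(lam, mu)
u_non_shared = unavailability_non_shared(lam, mu)
for name, u in (("shared", u_shared), ("non-shared", u_non_shared)):
    print(f"{name}: unavailability {u:.3e}, "
          f"downtime {downtime_minutes_per_year(u):.1f} min/year")
```

The non-shared expression equals $(\lambda / (\lambda + \mu))^2$, the product of the two independent component unavailabilities, which is why that case can also be handled with an RBD or a fault tree.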
Markov model with imperfect coverage
Next consider a modification of the 2-component parallel system proposed by Arnold as a model of duplex processors of an electronic switching system. We assume that not all faults are recoverable and that c is the coverage factor which denotes the conditional probability that the system recovers given that a fault has occurred. The state diagram is now given by the following picture:
Markov model with imperfect coverage
Assume that the initial state is 2 so that:
$P_2(0) = 1, \qquad P_1(0) = P_0(0) = 0$
Then the system of differential equations is:
$\dfrac{dP_2(t)}{dt} = -2\lambda c\, P_2(t) - 2\lambda (1 - c)\, P_2(t) + \mu P_1(t)$
$\dfrac{dP_1(t)}{dt} = 2\lambda c\, P_2(t) - (\lambda + \mu)\, P_1(t)$
$\dfrac{dP_0(t)}{dt} = 2\lambda (1 - c)\, P_2(t) + \lambda P_1(t)$
Markov model with imperfect coverage
After solving the differential equations we obtain:
$R(t) = P_2(t) + P_1(t)$
From R(t), we can obtain system MTTF:
$MTTF = \dfrac{\lambda (1 + 2c) + \mu}{2\lambda\,[\lambda + \mu (1 - c)]}$
It should be clear that the system MTTF and system reliability are critically dependent on the coverage factor.
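The sensitivity to the coverage factor can be illustrated numerically. The sketch below evaluates the closed-form MTTF and cross-checks it by solving the two linear equations for the mean times spent in the transient states 2 and 1 before absorption in state 0. The function names and the rates are hypothetical, chosen only for illustration.

```python
def mttf_closed_form(lam, mu, c):
    """MTTF = (lam (1 + 2c) + mu) / (2 lam (lam + mu (1 - c)))."""
    return (lam * (1 + 2 * c) + mu) / (2 * lam * (lam + mu * (1 - c)))

def mttf_from_generator(lam, mu, c):
    """Mean time to absorption, starting in state 2, from the transient part
    of the generator: Q_B = [[-2 lam, 2 lam c], [mu, -(lam + mu)]]."""
    # Unknowns z2, z1 (mean time in states 2 and 1) satisfy z Q_B = -(1, 0):
    #   -2 lam * z2 + mu * z1          = -1
    #    2 lam c * z2 - (lam + mu) * z1 =  0
    a11, a12 = -2 * lam, mu
    a21, a22 = 2 * lam * c, -(lam + mu)
    det = a11 * a22 - a12 * a21
    z2 = -a22 / det          # Cramer's rule with right-hand side (-1, 0)
    z1 = a21 / det
    return z2 + z1

# Illustrative rates (per hour); not taken from the slides.
lam, mu = 0.001, 0.1
results = {c: mttf_closed_form(lam, mu, c) for c in (1.0, 0.99, 0.9)}
for c, value in results.items():
    print(f"c = {c}: MTTF = {value:.1f} hours")
```

With these illustrative rates, lowering the coverage from 1.0 to 0.9 reduces the MTTF by roughly an order of magnitude.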
Source of fault coverage data
Measurement data from an operational system
– Large amount of data needed
– Improved instrumentation needed
Fault-injection experiments
– Expensive but badly needed
– Tools from CMU, Illinois, LAAS (Toulouse)
A fault/error handling submodel (FEHM)
– Phases: detection, location, retry, reconfig, reboot
– Estimate duration and probability of success of each phase
Redundant System with Finite Detection Switchover Time
Modify the Markov model with imperfect coverage to allow for finite time to detect as well as imperfect detection.
You will need to add an extra state, say D. The rate at which detection occurs is $\delta$. Draw the state diagram and investigate the effects of detection delay on system reliability and mean time to failure.
Redundant System with Finite Detection Switchover Time
Assumptions:
Two units have the same MTTF and MTTR;
Single shared repair person;
Average detection/switchover time $t_{sw} = 1/\delta$;
We need to use a Markov model.
Redundant System with Finite Detection Switchover Time
[Figure: state diagram with states 2, 1D, 1 and 0; transition labels include $2\lambda$, with $\lambda = 1/MTTF$ and $\mu = 1/MTTR$.]
Redundant System with Finite Detection Switchover Time
After solving the Markov model, we obtain steady-state probabilities:
$\pi_2, \pi_{1D}, \pi_1, \pi_0$
$A_{sys} = \pi_2 + \pi_1$ or $A_{sys} = 1 - (\pi_0 + \pi_{1D})$
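Closed-form
[Equations: closed-form expressions for the steady-state probabilities $\pi_2$, $\pi_{1D}$, $\pi_1$, $\pi_0$ and for the availability, written in terms of a common term E; the individual formulas are not recoverable here.]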
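The effect of the detection/switchover delay on availability can be explored numerically. The sketch below rests on an assumed transition structure, since the state diagram is not reproduced in the text: 2 → 1D at rate 2λ, 1D → 1 at rate δ, 1D → 0 at rate λ, 1 → 0 at rate λ, 1 → 2 at rate μ and 0 → 1 at rate μ (single shared repair person). The function name and the numerical values of MTTF, MTTR and t_sw are hypothetical.

```python
def availability_with_detection(mttf, mttr, t_sw):
    """Steady-state availability pi_2 + pi_1 of the ASSUMED chain {2, 1D, 1, 0}."""
    lam, mu, delta = 1.0 / mttf, 1.0 / mttr, 1.0 / t_sw
    # Unnormalized steady-state weights relative to state 2, obtained from the
    # balance equations of the assumed structure:
    #   state 2 : 2 lam pi_2 = mu pi_1
    #   state 1D: (delta + lam) pi_1D = 2 lam pi_2
    #   state 0 : mu pi_0 = lam (pi_1D + pi_1)
    w2 = 1.0
    w1d = 2.0 * lam / (lam + delta)
    w1 = 2.0 * lam / mu
    w0 = (lam / mu) * (w1d + w1)
    total = w2 + w1d + w1 + w0
    # The system is taken to be up in states 2 and 1, down in 1D and 0.
    return (w2 + w1) / total

# Illustrative values in hours; not taken from the slides.
mttf, mttr = 1000.0, 10.0
for t_sw in (1.0 / 3600, 1.0 / 60, 1.0):
    a = availability_with_detection(mttf, mttr, t_sw)
    print(f"t_sw = {t_sw:.5f} h: A_sys = {a:.8f}")
```

Under these assumptions, as t_sw tends to zero the result approaches the availability of the shared-repair model without detection delay, and it decreases as the switchover time grows.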
A Workstations-Fileserver Example
Computing system consisting of:
– A file-server
– Two workstations
– Computing network connecting them
System operational as long as:
– One of the workstations and
– The file-server are operational
Computer network is assumed to be fault-free
Markov Chain for WFS Example
Assuming exponentially distributed times to failure
$\lambda_w$: failure rate of workstation
$\lambda_f$: failure rate of file-server
Assume that components are repairable
$\mu_w$: repair rate of workstation
$\mu_f$: repair rate of file-server
File-server has priority for repair over workstations (such repair priority cannot be captured by non-state-space models)
Markov Availability Model for WFS
[Figure: CTMC state diagram with states (2,1), (1,1), (0,1), (2,0), (1,0) and (0,0); transitions labeled $2\lambda_w$, $\lambda_w$, $\mu_w$, $\lambda_f$ and $\mu_f$.]
Since every state is reachable from every other state, the CTMC is irreducible. Furthermore, all states are positive recurrent.
Markov Availability Model for WFS (Continued)
In the figure, the label (i,j) of each state is interpreted as follows:
i represents the number of workstations that are still functioning
j is 1 or 0 depending on whether the file-server is up or down respectively.
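Markov Availability Model for WFS (Continued)
For the example problem, with the states ordered as (2,1), (2,0), (1,1), (1,0), (0,1), (0,0) the Q matrix is given by:
$$
Q = \begin{bmatrix}
-(\lambda_f + 2\lambda_w) & \lambda_f & 2\lambda_w & 0 & 0 & 0 \\
\mu_f & -(\mu_f + 2\lambda_w) & 0 & 2\lambda_w & 0 & 0 \\
\mu_w & 0 & -(\mu_w + \lambda_f + \lambda_w) & \lambda_f & \lambda_w & 0 \\
0 & 0 & \mu_f & -(\mu_f + \lambda_w) & 0 & \lambda_w \\
0 & 0 & \mu_w & 0 & -(\mu_w + \lambda_f) & \lambda_f \\
0 & 0 & 0 & 0 & \mu_f & -\mu_f
\end{bmatrix}
$$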
Markov Model (steady-state)
$\pi$: Steady-state probability vector
$\pi = (\pi_{21}, \pi_{20}, \pi_{11}, \pi_{10}, \pi_{01}, \pi_{00})$
$\pi Q = 0, \qquad \sum_i \pi_i = 1$
These are called steady-state balance equations
rate of flow in = rate of flow out
After solving for $\pi$, obtain the steady-state availability:
$A_{ss} = \pi_{21} + \pi_{11}$
Markov Availability Model
We compute the availability of the system:
System is available as long as it is in states (2,1) and (1,1).
Instantaneous availability of the system:
$A(t) = P_{(2,1)}(t) + P_{(1,1)}(t)$
$A_{ss} = \lim_{t \to \infty} A(t)$
Markov Availability Model (Continued)
$\lambda_w = 0.0001\ \mathrm{hr}^{-1}, \quad \lambda_f = 0.00005\ \mathrm{hr}^{-1}, \quad \mu_w = 1.0\ \mathrm{hr}^{-1}, \quad \mu_f = 0.5\ \mathrm{hr}^{-1}$
$A_{ss} = 0.9999$
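The steady-state availability of the WFS example can be reproduced by solving the balance equations numerically. The sketch below builds the generator with the states ordered as (2,1), (2,0), (1,1), (1,0), (0,1), (0,0) and uses the rates quoted on the slide. The helper `solve_linear` is a hypothetical name for a plain Gaussian elimination routine, included only to keep the example self-contained.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

lam_w, lam_f = 0.0001, 0.00005   # failure rates (per hour)
mu_w, mu_f = 1.0, 0.5            # repair rates (per hour)

# Generator matrix, states ordered (2,1), (2,0), (1,1), (1,0), (0,1), (0,0).
Q = [
    [-(lam_f + 2 * lam_w), lam_f, 2 * lam_w, 0, 0, 0],
    [mu_f, -(mu_f + 2 * lam_w), 0, 2 * lam_w, 0, 0],
    [mu_w, 0, -(mu_w + lam_f + lam_w), lam_f, lam_w, 0],
    [0, 0, mu_f, -(mu_f + lam_w), 0, lam_w],
    [0, 0, mu_w, 0, -(mu_w + lam_f), lam_f],
    [0, 0, 0, 0, mu_f, -mu_f],
]

n = len(Q)
# pi Q = 0: row j of A is column j of Q; the last equation is replaced by
# the normalization sum(pi) = 1.
A = [[float(Q[i][j]) for i in range(n)] for j in range(n)]
A[-1] = [1.0] * n
b = [0.0] * (n - 1) + [1.0]
pi = solve_linear(A, b)

A_ss = pi[0] + pi[2]             # up states are (2,1) and (1,1)
print(f"A_ss = {A_ss:.6f}")
```

The availability is dominated by the file-server, whose own availability is $\mu_f / (\lambda_f + \mu_f)$; the probability of losing both workstations is several orders of magnitude smaller.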
Markov Reliability Model with Repair
Assume that the computer system does not recover if both workstations fail, or if the file-server fails
Markov Reliability Model with Repair
States (0,1), (1,0) and (2,0) become absorbing states while (2,1) and (1,1) are transient states.
Note: we have made a simplification that, once the CTMC reaches a system failure state, we do not allow any more transitions.
Markov Model with Absorbing States
If we solve for $P_{2,1}(t)$ and $P_{1,1}(t)$ then
$R(t) = P_{2,1}(t) + P_{1,1}(t)$
For a Markov chain with absorbing states:
A: the set of absorbing states
B = $\Omega$ - A: the set of remaining states ($\Omega$ is the full state space)
$z_{i,j}$: Mean time spent in state (i,j) until absorption
$z_{i,j} = \int_0^{\infty} P_{i,j}(\tau)\, d\tau, \qquad (i,j) \in B$
$z\, Q_B = -P_B(0)$
Markov Model with Absorbing States (Continued)
Mean time to absorption MTTA is given as:
$MTTA = \sum_{(i,j) \in B} z_{i,j}$
$Q_B$ derived from Q by restricting it to only states in B
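Markov Reliability Model with Repair (Continued)
$$
Q_B = \begin{bmatrix}
-(\lambda_f + 2\lambda_w) & 2\lambda_w \\
\mu_w & -(\mu_w + \lambda_f + \lambda_w)
\end{bmatrix}
$$
First solve
$\dfrac{dP_{2,1}(t)}{dt} = -(\lambda_f + 2\lambda_w)\, P_{2,1}(t) + \mu_w P_{1,1}(t)$
$\dfrac{dP_{1,1}(t)}{dt} = 2\lambda_w P_{2,1}(t) - (\mu_w + \lambda_f + \lambda_w)\, P_{1,1}(t)$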
Markov Reliability Model with Repair (Continued)
Then: $R(t) = P_{2,1}(t) + P_{1,1}(t)$
Next solve
$-(\lambda_f + 2\lambda_w)\, z_{2,1} + \mu_w z_{1,1} = -1$
$2\lambda_w z_{2,1} - (\mu_w + \lambda_f + \lambda_w)\, z_{1,1} = 0$
Then: $MTTF = z_{2,1} + z_{1,1}$
Mean time to failure is 19992 hours.
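The quoted MTTF can be checked by solving the two linear equations for the mean times spent in the transient states (2,1) and (1,1). The sketch below uses the failure and repair rates given for the WFS example and Cramer's rule for the 2x2 system; the variable names are hypothetical.

```python
lam_w, lam_f = 0.0001, 0.00005   # failure rates (per hour)
mu_w = 1.0                       # workstation repair rate (per hour)

# Generator restricted to the transient states, ordered (2,1), (1,1).
QB = [[-(lam_f + 2 * lam_w), 2 * lam_w],
      [mu_w, -(mu_w + lam_f + lam_w)]]

# z Q_B = -P_B(0) with P_B(0) = (1, 0) gives one equation per column of Q_B:
#   z21 * QB[0][0] + z11 * QB[1][0] = -1
#   z21 * QB[0][1] + z11 * QB[1][1] =  0
det = QB[0][0] * QB[1][1] - QB[1][0] * QB[0][1]
z21 = -QB[1][1] / det            # mean time spent in (2,1) before failure
z11 = QB[0][1] / det             # mean time spent in (1,1) before failure

mttf = z21 + z11
print(f"z21 = {z21:.1f} h, z11 = {z11:.3f} h, MTTF = {mttf:.0f} h")
```

Almost all of the time before failure is spent in state (2,1), because a failed workstation is repaired quickly compared with the time between failures.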
Markov Reliability Model without Repair
Assume that neither workstations nor file-server is repairable
Markov Reliability Model without Repair (Continued)
States (0,1), (1,0) and (2,0) become absorbing states
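For comparison with the repairable case, the same absorbing-state equations can be solved with the repair rate removed. The sketch below is not taken from the slides: it assumes the WFS failure rates given earlier, sets the repair transition from (1,1) to (2,1) to zero, and also evaluates a closed-form reliability that follows from treating the file-server in series with two parallel workstations, all failing independently.

```python
import math

lam_w, lam_f = 0.0001, 0.00005   # failure rates (per hour)

def reliability(t):
    """R(t) for a file-server in series with two parallel workstations
    (assumed independent exponential failures, no repair)."""
    return math.exp(-lam_f * t) * (2 * math.exp(-lam_w * t)
                                   - math.exp(-2 * lam_w * t))

# Without repair, Q_B = [[-(lam_f + 2 lam_w), 2 lam_w], [0, -(lam_f + lam_w)]],
# so z Q_B = -(1, 0) can be solved by forward substitution.
z21 = 1.0 / (lam_f + 2 * lam_w)
z11 = 2 * lam_w * z21 / (lam_f + lam_w)
mttf_no_repair = z21 + z11

print(f"MTTF without repair = {mttf_no_repair:.1f} hours")
print(f"R(1000 h) = {reliability(1000.0):.4f}")
```

Under these assumptions the MTTF is less than half of the value obtained when failed workstations can be repaired.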