
A. Bobbio Bertinoro, March 10-14, 2003 1

Dependability Theory and Methods

5. Markov Models

Andrea Bobbio
Dipartimento di Informatica

Università del Piemonte Orientale, “A. Avogadro”
15100 Alessandria (Italy)

bobbio@unipmn.it - http://www.mfn.unipmn.it/~bobbio

Bertinoro, March 10-14, 2003


State-Space-Based Models

States and labeled state transitions. A state can keep track of:

– Number of functioning resources of each type
– States of recovery for each failed resource
– Number of tasks of each type waiting at each resource
– Allocation of resources to tasks

A transition:

– Can occur from any state to any other state
– Can represent a simple or a compound event


State-Space-Based Models (Continued)

Transitions between states represent the change of the system state due to the occurrence of an event.

Drawn as a directed graph

Transition label:

– Probability: homogeneous discrete-time Markov chain (DTMC)
– Rate: homogeneous continuous-time Markov chain (CTMC)
– Time-dependent rate: non-homogeneous CTMC
– Distribution function: semi-Markov process (SMP)
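Since a CTMC is drawn as a directed graph whose arcs carry rates, its transition rate matrix $Q$ can be read straight off the graph: each arc label becomes an off-diagonal entry, and each diagonal entry is minus the total outgoing rate of that state. A minimal Python sketch; the dictionary-of-arcs representation, the helper name `generator_matrix` and the single-component example with its rates are illustrative assumptions, not part of the slides:

```python
def generator_matrix(states, arcs):
    """Build the transition rate matrix Q of a CTMC from a rate-labeled graph.

    arcs maps each source state to {target_state: rate}. Off-diagonal entries
    q_ij are the arc rates; each diagonal entry q_ii is minus the total rate
    out of state i, so every row of Q sums to zero.
    """
    index = {s: k for k, s in enumerate(states)}
    n = len(states)
    Q = [[0.0] * n for _ in range(n)]
    for src, targets in arcs.items():
        for dst, rate in targets.items():
            Q[index[src]][index[dst]] += rate
            Q[index[src]][index[src]] -= rate
    return Q


# Hypothetical single repairable component: "up" fails at rate lam,
# "down" is repaired at rate mu (illustrative values, per hour).
lam, mu = 0.001, 0.1
Q = generator_matrix(["up", "down"], {"up": {"down": lam}, "down": {"up": mu}})
for row in Q:
    print(row)
```

The same helper works for any labeled graph of this kind, as long as the labels are constant rates (the homogeneous CTMC case).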


Modeler’s Options

Should I Use Markov Models?

State-Space-Based Methods

+ Model Dependencies

+ Model Fault-Tolerance and Recovery/Repair

+ Model Contention for Resources

+ Model Concurrency and Timeliness

+ Generalize to Markov Reward Models for Modeling Degradable Performance


Modeler’s Options

Should I Use Markov Models?

+ Generalize to Markov Regenerative Models for Allowing Generally Distributed Event Times

+ Generalize to Non-Homogeneous Markov Chains for Allowing Weibull Failure Distributions

+ Performance, Availability and Performability Modeling Possible

- Large (Exponential) State Space


In order to fulfill our goals

Modeling Performance, Availability and Performability

Modeling Complex Systems

We Need

Automatic Generation and Solution of Large Markov Reward Models


Model-based evaluation

Choice of the model type is dictated by:

– Measures of interest

– Level of detailed system behavior to be represented

– Ease of model specification and solution

– Representation power of the model type

– Access to suitable tools or toolkits


State space models

A transition represents the change of state of a single component.

[Figure: transition from state s to state s′, labeled $x_i$]

$$\Pr\{s \to s', \Delta t\} = \Pr\{Z(t+\Delta t) = s' \mid Z(t) = s\}$$

$Z(t)$ is the stochastic process; $\Pr\{Z(t) = s\}$ is the probability of finding $Z(t)$ in state $s$ at time $t$.


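State space models

If $s \to s'$ represents a failure event:

[Figure: transition from state s to state s′, labeled $x_i$]

$$\Pr\{s \to s', \Delta t\} = \Pr\{Z(t+\Delta t) = s' \mid Z(t) = s\} = \lambda_i \Delta t$$

If $s \to s'$ represents a repair event:

$$\Pr\{s \to s', \Delta t\} = \Pr\{Z(t+\Delta t) = s' \mid Z(t) = s\} = \mu_i \Delta t$$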
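Both expressions are first-order statements: for an exponentially distributed time to failure with rate $\lambda_i$, the exact probability that the event occurs within $\Delta t$ is $1 - e^{-\lambda_i \Delta t}$, whose leading term is $\lambda_i \Delta t$. A quick numerical check in Python, assuming an illustrative rate (the helper `failure_prob` is hypothetical, not from the slides):

```python
import math


def failure_prob(lam, dt):
    """Exact probability that an exponential lifetime with rate lam ends within dt."""
    return -math.expm1(-lam * dt)  # 1 - exp(-lam*dt), accurate for small arguments


lam = 1e-4  # illustrative failure rate per hour
for dt in (1.0, 10.0, 100.0):
    exact = failure_prob(lam, dt)
    print(f"dt = {dt:6.1f} h: exact = {exact:.6e}, lam*dt = {lam * dt:.6e}")
```

The two columns agree closely while $\lambda_i \Delta t$ is small; the relative gap grows roughly like $\lambda_i \Delta t / 2$.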


Markov Process: definition

Transition Probability Matrix

Initial State Probability Vector

Chapman-Kolmogorov Equations

Time-homogeneous CTMC

The transition rate matrix

C-K Equations for CTMC

Solution equations

Transient analysis

Given the initial state of the Markov chain, the system of differential equations is written, for each state, based on the continuity equation:

rate of buildup = rate of flow in - rate of flow out
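Applied to every state, the continuity equation gives the system $d\pi(t)/dt = \pi(t) Q$, with $\pi(0)$ the initial state probability vector. One standard numerical method for it, uniformization, is sketched below in Python; the choice of technique is ours (the slides do not prescribe one), and the up/down component with $\lambda = 0.1$ and $\mu = 1.0$ is a hypothetical test case whose availability has the known closed form $A(t) = \mu/(\lambda+\mu) + \lambda/(\lambda+\mu)\, e^{-(\lambda+\mu)t}$:

```python
import math


def transient_uniformization(Q, pi0, t, tol=1e-12, max_terms=10_000):
    """Transient probabilities pi(t) of a CTMC by uniformization.

    pi(t) = sum_k Poisson(k; q*t) * pi0 P^k, with q = max_i |q_ii| and P = I + Q/q.
    Sketch only: assumes q*t is moderate, because exp(-q*t) underflows for q*t
    above roughly 700 and would then need a log-space or scaled computation.
    """
    n = len(Q)
    q = max(-Q[i][i] for i in range(n))
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)] for i in range(n)]
    weight = math.exp(-q * t)          # Poisson probability of zero jumps
    vec = list(pi0)                    # pi0 P^k, starting at k = 0
    result = [weight * v for v in vec]
    accumulated, k = weight, 0
    while 1.0 - accumulated > tol and k < max_terms:
        k += 1
        vec = [sum(vec[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k            # Poisson probability of k jumps
        accumulated += weight
        result = [r + weight * v for r, v in zip(result, vec)]
    return result


# Hypothetical single repairable component, starting in the "up" state.
lam, mu, t = 0.1, 1.0, 2.0
Q = [[-lam, lam], [mu, -mu]]
pi_t = transient_uniformization(Q, [1.0, 0.0], t)
closed_form = mu / (lam + mu) + lam / (lam + mu) * math.exp(-(lam + mu) * t)
print(pi_t[0], closed_form)
```

The truncation error is bounded by the Poisson mass left out, which the loop keeps below `tol`.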

Steady-state condition

If the process reaches a steady-state condition, then $d\pi(t)/dt = 0$, so that $\pi Q = 0$ with $\sum_i \pi_i = 1$.

Steady-state analysis (balance equation)

The steady-state equation can be written as a flow balance equation with a normalization condition on the state probabilities. In steady state the rate of buildup is zero:

(rate of buildup) = rate of flow in - rate of flow out = 0

hence

rate of flow in = rate of flow out

for each state (balance equation).
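In matrix form the balance equations read $\pi Q = 0$ with $\sum_i \pi_i = 1$. For small chains they can be solved directly. The following Python sketch (our own helper `steady_state`, using Gauss-Jordan elimination with partial pivoting and assuming an irreducible CTMC) replaces one redundant balance equation with the normalization condition:

```python
def steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 for a small irreducible CTMC.

    The balance equations are transposed (Q^T pi^T = 0) and, because they are
    linearly dependent, the last one is replaced by the normalization condition.
    The system is then solved by Gauss-Jordan elimination with partial pivoting.
    """
    n = len(Q)
    # Augmented matrix [Q^T | 0], last row replaced by [1 ... 1 | 1]
    M = [[float(Q[j][i]) for j in range(n)] + [0.0] for i in range(n)]
    M[-1] = [1.0] * n + [1.0]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


# Hypothetical single repairable component: balance gives lam*pi_up = mu*pi_down.
lam, mu = 0.1, 1.0
pi = steady_state([[-lam, lam], [mu, -mu]])
print(pi)  # compare with (mu/(lam+mu), lam/(lam+mu))
```

For large state spaces an iterative or sparse solver would be preferable; dense elimination is only meant for examples of the size shown in these slides.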


2-component system



2-component series system

[Figure: A1 and A2 in series]

2-component parallel system

[Figure: A1 and A2 in parallel]


2-component stand-by system

[Figure: components A and B]


Repairable system: Availability


Repairable system: 2 identical components



2-component Markov availability model

Assume we have a two-component parallel redundant system with repair rate $\mu$.

Assume that the failure rate of both the components is $\lambda$.

When both the components have failed, the system is considered to have failed.


Markov availability model

Let the number of properly functioning components be the state of the system. The state space is {0,1,2}, where 0 is the system down state.

We wish to examine the effects of shared vs. non-shared repair.


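Markov availability model

Non-shared (independent) repair: states 2, 1, 0; transitions 2 → 1 at rate $2\lambda$, 1 → 0 at rate $\lambda$, 1 → 2 at rate $\mu$, 0 → 1 at rate $2\mu$.

Shared repair: states 2, 1, 0; transitions 2 → 1 at rate $2\lambda$, 1 → 0 at rate $\lambda$, 1 → 2 at rate $\mu$, 0 → 1 at rate $\mu$.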


Markov availability model

Note: the non-shared case can be modeled and solved using an RBD or a fault tree (FTREE), but the shared case needs the use of Markov chains.


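Steady-state balance equations

For any state: rate of flow in = rate of flow out.

Considering the shared case, with $\pi_i$ the steady-state probability that the system is in state $i$:

$$2\lambda \pi_2 = \mu \pi_1$$

$$(\lambda + \mu)\pi_1 = 2\lambda \pi_2 + \mu \pi_0$$

$$\mu \pi_0 = \lambda \pi_1$$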


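Steady-state balance equations

Hence

$$\pi_1 = \frac{2\lambda}{\mu}\pi_2, \qquad \pi_0 = \frac{\lambda}{\mu}\pi_1 = \frac{2\lambda^2}{\mu^2}\pi_2$$

Since

$$\pi_0 + \pi_1 + \pi_2 = 1$$

we have

$$\pi_2\left(1 + \frac{2\lambda}{\mu} + \frac{2\lambda^2}{\mu^2}\right) = 1$$

or

$$\pi_2 = \left(1 + \frac{2\lambda}{\mu} + \frac{2\lambda^2}{\mu^2}\right)^{-1}$$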


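Steady-state balance equations (Continued)

Steady-state unavailability for the shared case:

$$\pi_0 = \frac{2\lambda^2}{\mu^2 + 2\lambda\mu + 2\lambda^2} = 1 - A_{shared}$$

Similarly, for the non-shared case:

$$\text{Steady-state unavailability} = 1 - A_{non\text{-}shared}, \qquad A_{non\text{-}shared} = 1 - \left(\frac{\lambda}{\lambda + \mu}\right)^2$$

Downtime in minutes per year = (1 - A) × 8760 × 60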
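Both the shared and non-shared chains are birth-death chains, so the balance equations can be solved state by state across cuts. A Python sketch under hypothetical rates $\lambda = 10^{-3}$/hr and $\mu = 0.1$/hr (illustrative values, not from the slides) that computes both availabilities this way, checks them against the closed forms derived from the balance equations, and converts them to yearly downtime:

```python
def availability_two_components(lam, mu, repair_rate_from_0):
    """Steady-state availability of the 2-component parallel model (states 2, 1, 0).

    Flow balance across the cut between states 2 and 1 gives 2*lam*pi2 = mu*pi1;
    across the cut between 1 and 0 it gives lam*pi1 = repair_rate_from_0*pi0.
    The system is down only in state 0.
    """
    w2 = 1.0                              # unnormalized pi2
    w1 = w2 * 2.0 * lam / mu              # unnormalized pi1
    w0 = w1 * lam / repair_rate_from_0    # unnormalized pi0
    return 1.0 - w0 / (w0 + w1 + w2)


def downtime_minutes_per_year(A):
    return (1.0 - A) * 8760 * 60


lam, mu = 1e-3, 0.1  # hypothetical rates per hour (MTTF 1000 h, MTTR 10 h)
A_shared = availability_two_components(lam, mu, mu)          # one repair facility
A_non_shared = availability_two_components(lam, mu, 2 * mu)  # independent repair

# Closed forms derived from the balance equations
A_shared_cf = 1.0 - 2 * lam**2 / (mu**2 + 2 * lam * mu + 2 * lam**2)
A_non_shared_cf = 1.0 - (lam / (lam + mu)) ** 2

for name, A in (("shared", A_shared), ("non-shared", A_non_shared)):
    print(f"{name:>10}: A = {A:.6f}, downtime = {downtime_minutes_per_year(A):.1f} min/year")
```

With these rates the shared case roughly doubles the yearly downtime, since state 0 is left at rate $\mu$ instead of $2\mu$.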


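Absorbing states - MTTF

$z_{i,j}$: mean time spent in the non-absorbing state $(i,j) \in B$ before absorption:

$$z_{i,j} = \int_0^\infty P_{i,j}(\tau)\, d\tau, \qquad (i,j) \in B$$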

Markov Reliability Model with Imperfect Coverage


Markov model with imperfect coverage

Next consider a modification of the 2-component parallel system proposed by Arnold as a model of duplex processors of an electronic switching system. We assume that not all faults are recoverable and that c is the coverage factor which denotes the conditional probability that the system recovers given that a fault has occurred. The state diagram is now given by the following picture:


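Now allow for imperfect coverage

[Figure: state diagram with states 2, 1, 0; 2 → 1 at rate $2\lambda c$, 2 → 0 at rate $2\lambda(1-c)$, 1 → 0 at rate $\lambda$, 1 → 2 at rate $\mu$]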


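Markov model with imperfect coverage

Assume that the initial state is 2, so that:

$$P_2(0) = 1, \qquad P_1(0) = P_0(0) = 0$$

Then the system of differential equations is:

$$\frac{dP_2(t)}{dt} = -2\lambda c P_2(t) - 2\lambda(1-c) P_2(t) + \mu P_1(t)$$

$$\frac{dP_1(t)}{dt} = 2\lambda c P_2(t) - (\lambda + \mu) P_1(t)$$

$$\frac{dP_0(t)}{dt} = 2\lambda(1-c) P_2(t) + \lambda P_1(t)$$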


Markov model with imperfect coverage

After solving the differential equations we obtain:

$$R(t) = P_2(t) + P_1(t)$$

From $R(t)$, we can obtain the system MTTF:

$$MTTF = \frac{\lambda(1 + 2c) + \mu}{2\lambda[\lambda + \mu(1-c)]}$$

It should be clear that the system MTTF and system reliability are critically dependent on the coverage factor.
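The MTTF formula can be cross-checked by treating states 2 and 1 as the transient set $B$ and solving $z Q_B = -P_B(0)$, the absorbing-state technique used later for the WFS example. A Python sketch with hypothetical rates $\lambda = 10^{-3}$/hr and $\mu = 1$/hr (illustrative values, not from the slides) that prints how quickly the MTTF falls as $c$ drops below 1:

```python
def mttf_closed_form(lam, mu, c):
    """MTTF of the duplex system with imperfect coverage (formula above)."""
    return (lam * (1 + 2 * c) + mu) / (2 * lam * (lam + mu * (1 - c)))


def mttf_absorbing(lam, mu, c):
    """Same MTTF from z Q_B = -P_B(0) on B = {2, 1}, with P_B(0) = (1, 0).

    Q_B = [[-2 lam, 2 lam c], [mu, -(lam + mu)]]; one equation per column of Q_B:
        -2 lam z2 + mu z1           = -1
         2 lam c z2 - (lam + mu) z1 =  0
    solved here by Cramer's rule.
    """
    a11, a12, b1 = -2 * lam, mu, -1.0
    a21, a22, b2 = 2 * lam * c, -(lam + mu), 0.0
    det = a11 * a22 - a12 * a21
    z2 = (b1 * a22 - a12 * b2) / det
    z1 = (a11 * b2 - b1 * a21) / det
    return z2 + z1


lam, mu = 1e-3, 1.0  # hypothetical rates per hour
for c in (1.0, 0.99, 0.9):
    print(f"c = {c:4.2f}: MTTF = {mttf_closed_form(lam, mu, c):10.0f} h")
```

With these rates, lowering the coverage from 1 to 0.99 cuts the MTTF by roughly an order of magnitude.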


Source of fault coverage data

– Measurement data from an operational system
  – Large amount of data needed
  – Improved instrumentation needed
– Fault-injection experiments
  – Expensive but badly needed
  – Tools from CMU, Illinois, LAAS (Toulouse)
– A fault/error handling submodel (FEHM)
  – Phases: detection, location, retry, reconfig, reboot
  – Estimate duration and probability of success of each phase


Redundant System with Finite Detection Switchover Time

Modify the Markov model with imperfect coverage to allow for finite time to detect as well as imperfect detection.

You will need to add an extra state, say D. The rate at which detection occurs is $\delta$. Draw the state diagram and investigate the effects of detection delay on system reliability and mean time to failure.


Redundant System with Finite Detection Switchover Time

Assumptions:

Two units have the same MTTF and MTTR;

Single shared repair person;

Average detection/switchover time $t_{sw} = 1/\delta$;

We need to use a Markov model.


Redundant System with Finite Detection Switchover Time

[Figure: state diagram with states 2, 1D, 1 and 0]

$\lambda = 1/MTTF$, $\mu = 1/MTTR$


Redundant System with Finite Detection Switchover Time

After solving the Markov model, we obtain the steady-state probabilities $\pi_2$, $\pi_{1D}$, $\pi_1$ and $\pi_0$, from which the steady-state system availability $A_{sys}$ follows.


Closed-form

[Closed-form expressions for $\pi_2$, $\pi_{1D}$, $\pi_1$, $\pi_0$ and $A_{sys}$]
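The exercise leaves the state diagram to the reader, so the following Python sketch assumes one plausible structure and labels it as such: 2 → 1D at rate $2\lambda$, 1D → 1 at rate $\delta$, 1 → 2 and 0 → 1 at rate $\mu$ (single shared repair person), 1 → 0 at rate $\lambda$, with the system counted as down in states 1D and 0. Under these assumptions the balance equations give $(\pi_2, \pi_{1D}, \pi_1, \pi_0) \propto (1, 2\lambda/\delta, 2\lambda/\mu, 2\lambda^2/\mu^2)$; the code solves the chain numerically for a few hypothetical switchover times:

```python
def steady_state(Q):
    """Solve pi Q = 0, sum(pi) = 1 (irreducible CTMC) by Gauss-Jordan elimination."""
    n = len(Q)
    # Augmented [Q^T | 0]; the last (redundant) balance equation becomes sum(pi) = 1
    M = [[float(Q[j][i]) for j in range(n)] + [0.0] for i in range(n)]
    M[-1] = [1.0] * n + [1.0]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def switchover_model(lam, mu, delta):
    """Return (pi2, pi1D, pi1, pi0) and availability under the ASSUMED transitions:
    2 -> 1D at 2*lam, 1D -> 1 at delta, 1 -> 2 at mu, 1 -> 0 at lam, 0 -> 1 at mu.
    The system is assumed up in states 2 and 1 only."""
    Q = [
        [-2 * lam, 2 * lam, 0.0, 0.0],    # 2: one unit fails, detection starts
        [0.0, -delta, delta, 0.0],        # 1D: detection/switchover completes
        [mu, 0.0, -(mu + lam), lam],      # 1: repair, or second failure
        [0.0, 0.0, mu, -mu],              # 0: single shared repair person
    ]
    pi = steady_state(Q)
    return pi, pi[0] + pi[2]


lam, mu = 1 / 1000, 1 / 10  # hypothetical: MTTF = 1000 h, MTTR = 10 h
for t_sw in (0.01, 0.1, 1.0):  # hypothetical mean switchover times in hours
    _, A = switchover_model(lam, mu, 1 / t_sw)
    print(f"t_sw = {t_sw:5.2f} h: A_sys = {A:.6f}")
```

As $\delta \to \infty$ the sketch reduces to the shared-repair model without detection delay, which is a convenient sanity check on the assumed structure.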


WFS Example


A Workstations-Fileserver Example

Computing system consisting of:
– A file-server
– Two workstations
– Computing network connecting them

System operational as long as:
– At least one of the workstations, and
– The file-server
are operational

Computer network is assumed to be fault-free


The WFS Example


Markov Chain for WFS Example

Assuming exponentially distributed times to failure:

$\lambda_w$: failure rate of workstation

$\lambda_f$: failure rate of file-server

Assume that components are repairable:

$\mu_w$: repair rate of workstation

$\mu_f$: repair rate of file-server

File-server has priority for repair over workstations (such repair priority cannot be captured by non-state-space models)


Markov Availability Model for WFS

[Figure: CTMC with states (2,1), (2,0), (1,1), (1,0), (0,1), (0,0); arcs labeled $2\lambda_w$, $\lambda_w$, $\mu_w$, $\lambda_f$, $\mu_f$]

Since every state is reachable from every other state, the CTMC is irreducible. Furthermore, since the state space is finite, all states are positive recurrent.


Markov Availability Model for WFS (Continued)

In the figure, the label (i,j) of each state is interpreted as follows:

– i represents the number of workstations that are still functioning
– j is 1 or 0 depending on whether the file-server is up or down, respectively


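Markov Availability Model for WFS (Continued)

For the example problem, with the states ordered as (2,1), (2,0), (1,1), (1,0), (0,1), (0,0), the Q matrix is given by:

$$Q = \begin{bmatrix}
-(\lambda_f + 2\lambda_w) & \lambda_f & 2\lambda_w & 0 & 0 & 0 \\
\mu_f & -(\mu_f + 2\lambda_w) & 0 & 2\lambda_w & 0 & 0 \\
\mu_w & 0 & -(\mu_w + \lambda_f + \lambda_w) & \lambda_f & \lambda_w & 0 \\
0 & 0 & \mu_f & -(\mu_f + \lambda_w) & 0 & \lambda_w \\
0 & 0 & \mu_w & 0 & -(\mu_w + \lambda_f) & \lambda_f \\
0 & 0 & 0 & 0 & \mu_f & -\mu_f
\end{bmatrix}$$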


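Markov Model (steady-state): steady-state probability vector

$$\pi = (\pi_{2,1}, \pi_{2,0}, \pi_{1,1}, \pi_{1,0}, \pi_{0,1}, \pi_{0,0})$$

$$\pi Q = 0, \qquad \sum_i \pi_i = 1$$

These are called steady-state balance equations:

rate of flow in = rate of flow out

After solving for $\pi$, we obtain the steady-state availability

$$A_{ss} = \pi_{2,1} + \pi_{1,1}$$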


Markov Availability Model

We compute the availability of the system. The system is available as long as it is in states (2,1) and (1,1).

Instantaneous availability of the system:

$$A(t) = P_{2,1}(t) + P_{1,1}(t), \qquad A_{ss} = \lim_{t \to \infty} A(t)$$


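Markov Availability Model (Continued)

With $\lambda_w = 0.0001\ \mathrm{hr}^{-1}$, $\lambda_f = 0.00005\ \mathrm{hr}^{-1}$, $\mu_w = 1.0\ \mathrm{hr}^{-1}$ and $\mu_f = 0.5\ \mathrm{hr}^{-1}$:

$$A_{ss} = 0.9999$$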
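The value $A_{ss} = 0.9999$ can be reproduced by solving $\pi Q = 0$, $\sum_i \pi_i = 1$ with the Q matrix of the WFS model and the rates above. A self-contained Python sketch (the `steady_state` helper is our own Gauss-Jordan solver, not a library routine):

```python
def steady_state(Q):
    """Solve pi Q = 0, sum(pi) = 1 (irreducible CTMC) by Gauss-Jordan elimination."""
    n = len(Q)
    # Augmented [Q^T | 0]; the last (redundant) balance equation becomes sum(pi) = 1
    M = [[float(Q[j][i]) for j in range(n)] + [0.0] for i in range(n)]
    M[-1] = [1.0] * n + [1.0]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


lam_w, lam_f, mu_w, mu_f = 0.0001, 0.00005, 1.0, 0.5  # per hour, from the slide
# States ordered (2,1), (2,0), (1,1), (1,0), (0,1), (0,0)
Q = [
    [-(lam_f + 2 * lam_w), lam_f, 2 * lam_w, 0, 0, 0],
    [mu_f, -(mu_f + 2 * lam_w), 0, 2 * lam_w, 0, 0],
    [mu_w, 0, -(mu_w + lam_f + lam_w), lam_f, lam_w, 0],
    [0, 0, mu_f, -(mu_f + lam_w), 0, lam_w],
    [0, 0, mu_w, 0, -(mu_w + lam_f), lam_f],
    [0, 0, 0, 0, mu_f, -mu_f],
]
pi = steady_state(Q)
A_ss = pi[0] + pi[2]  # system up in (2,1) and (1,1)
print(f"A_ss = {A_ss:.6f}")
```

The result is dominated by the file-server: with priority repair it behaves as an independent two-state component with availability $\mu_f/(\lambda_f + \mu_f)$.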


Markov Reliability Model with Repair

Assume that the computer system does not recover if both workstations fail, or if the file-server fails.


Markov Reliability Model with Repair

States (0,1), (1,0) and (2,0) become absorbing states, while (2,1) and (1,1) are transient states.

Note: we have made a simplification that, once the CTMC reaches a system failure state, we do not allow any more transitions.


Markov Model with Absorbing States

If we solve for $P_{2,1}(t)$ and $P_{1,1}(t)$, then

$$R(t) = P_{2,1}(t) + P_{1,1}(t)$$

For a Markov chain with absorbing states:
– A: the set of absorbing states
– B = Ω - A: the set of remaining states (Ω is the state space)

$z_{i,j}$: mean time spent in state (i,j) until absorption

$$z_{i,j} = \int_0^\infty P_{i,j}(\tau)\, d\tau, \qquad (i,j) \in B$$

$$z\, Q_B = -P_B(0)$$


Markov Model with Absorbing States (Continued)

Mean time to absorption MTTA is given as:

$$MTTA = \sum_{(i,j) \in B} z_{i,j}$$

$Q_B$ is derived from $Q$ by restricting it to only the states in $B$.


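Markov Reliability Model with Repair (Continued)

$$Q_B = \begin{bmatrix} -(\lambda_f + 2\lambda_w) & 2\lambda_w \\ \mu_w & -(\mu_w + \lambda_f + \lambda_w) \end{bmatrix}$$

First solve

$$\frac{dP_{2,1}(t)}{dt} = -(\lambda_f + 2\lambda_w) P_{2,1}(t) + \mu_w P_{1,1}(t)$$

$$\frac{dP_{1,1}(t)}{dt} = 2\lambda_w P_{2,1}(t) - (\mu_w + \lambda_f + \lambda_w) P_{1,1}(t)$$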
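The two differential equations can be integrated numerically from $P_{2,1}(0) = 1$, $P_{1,1}(0) = 0$ to obtain $R(t)$. A Python sketch using the classical fourth-order Runge-Kutta method with a fixed step (our choice of integrator and step size, not the author's), with the WFS rates of the availability example; for comparison it also prints $e^{-t/19992}$, since with fast workstation repair $R(t)$ is approximately a single exponential whose mean is the MTTF:

```python
import math

# WFS rates (per hour) from the availability example
lam_w, lam_f, mu_w = 0.0001, 0.00005, 1.0


def derivatives(p21, p11):
    """Right-hand sides of dP21/dt and dP11/dt for the transient states (2,1), (1,1)."""
    d21 = -(lam_f + 2 * lam_w) * p21 + mu_w * p11
    d11 = 2 * lam_w * p21 - (mu_w + lam_f + lam_w) * p11
    return d21, d11


def reliability(t, h=0.1):
    """R(t) = P21(t) + P11(t) by classical RK4 from P21(0) = 1, P11(0) = 0."""
    p21, p11 = 1.0, 0.0
    for _ in range(round(t / h)):
        k1 = derivatives(p21, p11)
        k2 = derivatives(p21 + h / 2 * k1[0], p11 + h / 2 * k1[1])
        k3 = derivatives(p21 + h / 2 * k2[0], p11 + h / 2 * k2[1])
        k4 = derivatives(p21 + h * k3[0], p11 + h * k3[1])
        p21 += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p11 += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return p21 + p11


for t in (100.0, 1000.0, 5000.0):
    # Second value: single-exponential approximation with mean 19992 h, for comparison
    print(f"t = {t:6.0f} h: R(t) = {reliability(t):.5f}, exp(-t/19992) = {math.exp(-t / 19992):.5f}")
```

The step $h = 0.1$ h is small relative to the fast time scale $1/\mu_w = 1$ h, which keeps the explicit integrator stable and accurate here.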


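Markov Reliability Model with Repair (Continued)

Then:

$$R(t) = P_{2,1}(t) + P_{1,1}(t)$$

Next solve:

$$-(\lambda_f + 2\lambda_w) z_{2,1} + \mu_w z_{1,1} = -1$$

$$2\lambda_w z_{2,1} - (\mu_w + \lambda_f + \lambda_w) z_{1,1} = 0$$

Then:

$$MTTF = z_{2,1} + z_{1,1}$$

With the rate values of the availability example, the mean time to failure is 19992 hours.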
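The two linear equations for $z_{2,1}$ and $z_{1,1}$ are $z Q_B = -P_B(0)$ written column by column, with $P_B(0) = (1, 0)$. A Python sketch solving them with Cramer's rule for the rates of the availability example ($\lambda_w = 0.0001$, $\lambda_f = 0.00005$, $\mu_w = 1.0$ per hour) reproduces the 19992-hour figure:

```python
# Rates of the WFS availability example (per hour)
lam_w, lam_f, mu_w = 0.0001, 0.00005, 1.0

# Q_B on the transient states B = {(2,1), (1,1)}
QB = [[-(lam_f + 2 * lam_w), 2 * lam_w],
      [mu_w, -(mu_w + lam_f + lam_w)]]

# z Q_B = -P_B(0) with P_B(0) = (1, 0); each column of Q_B gives one equation:
#   QB[0][0] * z21 + QB[1][0] * z11 = -1
#   QB[0][1] * z21 + QB[1][1] * z11 =  0
det = QB[0][0] * QB[1][1] - QB[1][0] * QB[0][1]
z21 = -QB[1][1] / det   # Cramer's rule with right-hand side (-1, 0)
z11 = QB[0][1] / det
mttf = z21 + z11
print(f"z21 = {z21:.1f} h, z11 = {z11:.3f} h, MTTF = {mttf:.0f} h")
```

Almost all of the MTTF is spent in state (2,1): a failed workstation is repaired within about an hour on average, so state (1,1) contributes only a few hours.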


Markov Reliability Model without Repair

Assume that neither workstations nor file-server is repairable


Markov Reliability Model without Repair (Continued)

States (0,1), (1,0) and (2,0) become absorbing states


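Markov Reliability Model without Repair (Continued)

$$Q_B = \begin{bmatrix} -(\lambda_f + 2\lambda_w) & 2\lambda_w \\ 0 & -(\lambda_f + \lambda_w) \end{bmatrix}$$

$$R(t) = P_{2,1}(t) + P_{1,1}(t), \qquad MTTF = z_{2,1} + z_{1,1}$$

Mean time to failure is 9333 hours.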
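Without repair $Q_B$ is upper triangular, so $z Q_B = -P_B(0)$ can be solved by substitution. The sketch below also evaluates $R(t) = 2e^{-(\lambda_w+\lambda_f)t} - e^{-(2\lambda_w+\lambda_f)t}$, a closed form we derived from the two differential equations (it is not on the slide); its integral $2/(\lambda_w+\lambda_f) - 1/(2\lambda_w+\lambda_f)$ gives the same MTTF:

```python
import math

lam_w, lam_f = 0.0001, 0.00005  # per hour, as in the WFS example

# z Q_B = -P_B(0), P_B(0) = (1, 0), with Q_B = [[-(lam_f + 2 lam_w), 2 lam_w], [0, -(lam_f + lam_w)]]:
#   -(lam_f + 2 lam_w) z21                = -1
#    2 lam_w z21 - (lam_f + lam_w) z11    =  0
z21 = 1.0 / (lam_f + 2 * lam_w)
z11 = 2 * lam_w * z21 / (lam_f + lam_w)
mttf = z21 + z11


def reliability(t):
    """R(t) = P21(t) + P11(t) for the no-repair chain (closed form derived from the ODEs)."""
    return 2 * math.exp(-(lam_w + lam_f) * t) - math.exp(-(2 * lam_w + lam_f) * t)


print(f"MTTF = {mttf:.0f} h, R(1000 h) = {reliability(1000.0):.4f}")
```

Comparing with the model with repair, workstation repair raises the MTTF from about 9333 to about 19992 hours, even though file-server failures remain fatal in both models.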

