
Background on Reliability and Availability Slides prepared by Wayne D. Grover and Matthieu Clouqueur...

Date post: 29-Dec-2015

Background on Reliability and Availability

Slides prepared by Wayne D. Grover and Matthieu Clouqueur

TRLabs & University of Alberta

© Wayne D. Grover 2002, 2003

E E 681 - Module 2

( Version for book website )

E E 681 Lecture #2 © Wayne D. Grover 2002, 2003

Overview of the lecture

• Concept of Reliability
– Reliability function, failure density function, hazard rate
• Concept of Availability
– Availability function, unavailability, availability of elements in series/parallel
• Methodology for Availability Analysis
– Quick unavailability lower bound estimation
– Cut sets method
– Tie paths method
• Automatic Protection Switching (APS) Systems
– Principle
– Availability analysis of an APS system


Reliability is a mission-oriented question

Technical meaning of Reliability

• In everyday English:
– “My car is very reliable”: it works well, it starts every time (even at -30°).
• Technical meaning:
– Reliability is the probability of a device performing its purpose adequately for the period of time intended under the operating conditions intended.
– Example: reliability of a fuel pump during a rocket launch

Reliability

• The reliability function R(t):
– R(t) = probability { no failure in interval [0,t] }
– Q(t) = probability { at least one failure in interval [0,t] }
– Q(t) = 1 - R(t)

\[ R(0) = 1, \qquad R(\infty) = 0, \qquad \frac{dR(t)}{dt} \le 0 \]

(R(t) is always a non-increasing function)

[Figure: R(t) plotted against t, starting from 1 at t = 0]

Reliability

• R(t) = prob { no failure in [0,t] }

• Related function: failure density function, f(t)

\[ \int_{t_1}^{t_2} f(t)\,dt = \text{probability of at least one failure in interval } [t_1, t_2] \]

\[ Q(t) = \int_0^t f(u)\,du \]

\[ R(t) = 1 - \int_0^t f(u)\,du \]

f(t) can be seen as the pdf of time to next failure

Reliability

• Hazard rate λ(t) (age-specific failure rate): rate of failure of an element given that this element has survived this long

\[ \lambda(t) = \frac{1}{R(t)} \frac{dQ(t)}{dt} = -\frac{1}{R(t)} \frac{dR(t)}{dt} \]

(dQ(t)/dt is the rate of failures; the factor 1/R(t) conditions on the element having survived this long)

Integration gives:

\[ \int_0^t \lambda(u)\,du = -\ln R(t) \]

\[ R(t) = e^{-\int_0^t \lambda(u)\,du} \]
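
As a rough numerical check of the relation between hazard rate and reliability, the following Python sketch integrates a hazard rate with the trapezoidal rule and exponentiates the result. The function name `reliability_from_hazard`, the step count and the two example hazard rates are illustrative assumptions, not part of the original slides.

```python
import math

def reliability_from_hazard(hazard, t, steps=10_000):
    """Approximate R(t) = exp(-integral_0^t hazard(u) du) with the trapezoidal rule."""
    if t == 0:
        return 1.0
    h = t / steps
    # Trapezoidal rule: half weight on the two end points, full weight inside.
    integral = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        integral += hazard(i * h)
    integral *= h
    return math.exp(-integral)

# Hypothetical hazard rates (failures per year), chosen only for illustration.
constant = lambda u: 0.2        # hazard rate that does not depend on age
wear_out = lambda u: 0.1 * u    # hazard rate that grows linearly with age

r_const = reliability_from_hazard(constant, 5.0)   # analytic value: exp(-1)
r_wear = reliability_from_hazard(wear_out, 5.0)    # analytic value: exp(-1.25)
print(r_const, r_wear)
```

Any non-negative hazard function can be passed in; only the quadrature accuracy changes.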

Reliability

• Expected Time To Failure or Mean Time To Failure (MTTF):
– It is the expected value of the random variable with pdf f(t):

\[ MTTF = E\{t\} = \int_0^\infty t\,f(t)\,dt = -\int_0^\infty t\,\frac{d}{dt}R(t)\,dt \]

[Figure: time axis from t = 0 to the failure instant, with the time to failure (TTF) marked]
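
The MTTF definition can be checked numerically. The sketch below assumes, purely for illustration, an exponential failure density with a rate of 0.2 failures per year and truncates the infinite integral at a finite horizon. It also evaluates the integral of R(t), which equals the MTTF by integration by parts provided t R(t) tends to 0. The helper `integrate` and all constants are assumptions of this example.

```python
import math

def integrate(func, a, b, steps=200_000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for i in range(1, steps):
        total += func(a + i * h)
    return total * h

LAMBDA = 0.2      # assumed failure rate, per year (illustrative value)
HORIZON = 200.0   # truncation point for the infinite integral, in years

f = lambda t: LAMBDA * math.exp(-LAMBDA * t)   # failure density
R = lambda t: math.exp(-LAMBDA * t)            # reliability function

mttf_from_density = integrate(lambda t: t * f(t), 0.0, HORIZON)
mttf_from_reliability = integrate(R, 0.0, HORIZON)
print(mttf_from_density, mttf_from_reliability)   # both close to 1 / LAMBDA = 5 years
```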

Reliability

• Special case: constant hazard rate (memoryless)

\[ \lambda(t) = \lambda_0 \]

\[ R(t) = e^{-\lambda_0 t} \]

\[ MTTF = \frac{1}{\lambda_0} \]

– In this case we can apply the Poisson distribution:

\[ \text{prob}\{\, k \text{ failures in } [0,t] \,\} = \frac{(\lambda_0 t)^k \, e^{-\lambda_0 t}}{k!} \]


Reliability

• Numerical example:
– Poisson distribution with λ = 1 / (5 years)
• Probability of exactly 1 failure in the first year: P = 16.4%
• Probability of at least one failure in the first year: P = 18.1%
• Probability of exactly 1 failure in the first 5 years: P = 36.8%
• Probability of at least one failure in the first 5 years: P = 63.2%
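
The four percentages can be reproduced with a few lines of Python. This is a minimal sketch assuming a constant hazard rate of 1 failure per 5 years; the function names are chosen for this example only.

```python
import math

def prob_k_failures(k, rate, t):
    """Poisson probability of exactly k failures in [0, t] for a constant hazard rate."""
    x = rate * t
    return x ** k * math.exp(-x) / math.factorial(k)

def prob_at_least_one(rate, t):
    """Probability of at least one failure in [0, t], i.e. 1 - R(t)."""
    return 1.0 - math.exp(-rate * t)

rate = 1 / 5  # failures per year, as in the numerical example
for years in (1, 5):
    p_one = prob_k_failures(1, rate, years)
    p_any = prob_at_least_one(rate, years)
    print(f"{years} year(s): exactly one = {p_one:.1%}, at least one = {p_any:.1%}")
```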

Availability (Repairable systems)

• “What is the probability that the engine of a Formula 1 car will work during the whole race?”
– This is a reliability question
• “How often do I hear the dial tone when I pick up the phone?”
– This is an availability question

Availability is the probability of finding the system in the operating state at any arbitrary time in the future.

Unlike in the context of reliability, we now consider systems that can be repaired.

Availability

• Comparison of Availability and Reliability Functions:

[Figure: A(t) and R(t) plotted against t from t = 0, with regions 1, 2 and 3 marked along the time axis and the level “A system” indicated]

Region 1: R(t) and A(t) are the same

Region 2: Repair actions begin to hold up A

Region 3: A reaches a steady state

Availability

[Figure: time axis t with alternating Failure and Repair events, marking Time to Failure, Time to Repair and Time Between Failures]

MTTF: Mean Time To Failure

MTBF: Mean Time Between Failures

MTTR: Mean Time To Repair

\[ A = \lim_{T_{obs} \to \infty} \frac{\text{Uptime}}{T_{obs}} = \frac{MTTF}{MTTF + MTTR} \]
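
To give a feel for the steady-state availability ratio, here is a small Python sketch. The MTTF of 5 years and the MTTR of 12 hours are hypothetical values picked for illustration, and the function name is an assumption of this example.

```python
HOURS_PER_YEAR = 8760  # 365-day year

def availability(mttf, mttr):
    """Steady-state availability; both arguments must use the same time unit."""
    return mttf / (mttf + mttr)

mttf_hours = 5 * HOURS_PER_YEAR   # hypothetical: one failure every 5 years on average
mttr_hours = 12.0                 # hypothetical: 12 hours to repair on average

a = availability(mttf_hours, mttr_hours)
downtime_minutes_per_year = (1 - a) * HOURS_PER_YEAR * 60
print(f"A = {a:.6f}, expected downtime = {downtime_minutes_per_year:.0f} min/year")
```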

Availability

• Unavailability:

\[ U = 1 - A = \frac{MTTR}{MTTF + MTTR} = \frac{MTTR}{MTBF} \]

In availability analysis we usually work with unavailability quantities because of some simplifications that can be made on the unavailability of elements in series and in parallel.

FIT: unit corresponding to 1 failure in 10^9 hours
– 1 FIT = 1 failure in 114,155 years
– 1 failure / year = 114,155 FITs (high!)
– Typical value for telecom equipment: 1500 FITs (MTTF = 76 years)
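
The FIT conversions can be verified with the short Python sketch below. It assumes a 365-day year (8760 hours); the function names and the 4-hour repair time in the last step are hypothetical choices for illustration.

```python
HOURS_PER_YEAR = 8760  # 365 days; leap years ignored

def fits_to_mttf_years(fits):
    """MTTF in years for a failure rate given in FITs (failures per 1e9 hours)."""
    return 1e9 / fits / HOURS_PER_YEAR

def failures_per_year_to_fits(rate_per_year):
    """Convert a failure rate in failures per year to FITs."""
    return rate_per_year * 1e9 / HOURS_PER_YEAR

def unavailability(mttf, mttr):
    """U = MTTR / (MTTF + MTTR); both arguments in the same time unit."""
    return mttr / (mttf + mttr)

print(fits_to_mttf_years(1))          # about 114,155 years
print(failures_per_year_to_fits(1))   # about 114,155 FITs
print(fits_to_mttf_years(1500))       # about 76 years

# Hypothetical repair time of 4 hours for equipment rated at 1500 FITs.
mttf_hours = 1e9 / 1500
u = unavailability(mttf_hours, 4.0)
print(u)
```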

Availability

• Series elements unavailability reduction:

[Figure: elements 1, 2, 3, ..., n connected in series]

\[ U_s \approx \sum_{i=1}^{n} U_i \]

Approximation based on the fact that U_i << 1

Numerical example: U_i = 10^-3, n = 3 gives U_s = 3 × 10^-3

• Parallel elements unavailability reduction:

[Figure: elements 1, 2, ..., n connected in parallel]

\[ U_s = \prod_{i=1}^{n} U_i \]

Numerical example: U_i = 10^-3, n = 3 gives U_s = 10^-9
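
The two reductions can be sketched in Python as follows, assuming statistically independent element failures. The exact series expression 1 - ∏(1 - U_i) is included alongside the sum so the quality of the approximation is visible; the function names are assumptions of this example, and `math.prod` requires Python 3.8 or later.

```python
import math

def series_unavailability(us):
    """Exact unavailability of independent elements in series."""
    return 1.0 - math.prod(1.0 - u for u in us)

def series_unavailability_approx(us):
    """Sum approximation, reasonable when every U_i << 1."""
    return sum(us)

def parallel_unavailability(us):
    """Independent elements in parallel: all of them must be down at once."""
    return math.prod(us)

us = [1e-3] * 3  # U_i = 10^-3, n = 3, as in the numerical example
print(series_unavailability(us))         # about 2.997e-3 (exact)
print(series_unavailability_approx(us))  # about 3e-3 (sum approximation)
print(parallel_unavailability(us))       # about 1e-9
```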


Availability Analysis

• The reliability engineer can use different techniques to evaluate the availability of a system:
– 1) Quick estimate of a lower bound for the unavailability
– 2) Series and parallel unavailability reductions
– 3) Cut set method
– 4) Tie paths method
– 5) Conditional decomposition

• The general methodology is explained next…


Availability Analysis

• General Methodology:
1) Get unavailability values of all components and sub-systems.
2) Draw parallel and series availability relationships.
3) Reduce the system availability model by repeated applications of the parallel/series availability simplifications.
4) If not completely reduced, do a quick unavailability lower bound estimation, or use the tie paths method, the cut sets method or the conditional decomposition.

Availability Analysis

• Lower bound on unavailability
– The contributions of parallel elements to the unavailability are not taken into account
– In some cases this quick evaluation of a lower bound on U can be enough to conclude that the system does not meet the availability requirements

[Figure: example system made of elements A, B, C, D, E, F, G and H]

Lower bound of U_s: U_A + U_H

Availability Analysis

• Tie paths method:
– We enumerate all the paths from I to O

[Figure: network between input I and output O, with elements 1 to 5 and intermediate nodes A and B]

– The availability of each path is calculated:

\[ A_{\text{path } i} = \prod_{v \in \text{path } i} A_v \]

– The availability of the system is:

\[ A_{syst} = \bigcup_{i \in \text{tie paths}} A_{\text{path } i} \]

(the system is up when at least one tie path is up)
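
A minimal Python sketch of the tie paths idea follows. The five-element bridge layout and its tie paths are assumptions made for this example (the exact topology of the slide's figure is not recoverable from the transcript), and the element availabilities of 0.99 are illustrative. Because tie paths share elements, the sketch obtains the probability that at least one path is up by enumerating all up/down states, which is only practical for small systems.

```python
from itertools import product

# Hypothetical bridge layout (an assumption, not read from the slide's figure):
# I -1- A, I -2- B, A -3- B, A -4- O, B -5- O.
TIE_PATHS = [{1, 4}, {2, 5}, {1, 3, 5}, {2, 3, 4}]

def path_availability(path, avail):
    """Availability of one tie path: product of its element availabilities."""
    result = 1.0
    for v in path:
        result *= avail[v]
    return result

def system_availability(tie_paths, avail):
    """Probability that at least one tie path has all of its elements up.

    Sums the probabilities of every up/down state of the elements in which
    some tie path is fully intact (independent element failures assumed).
    """
    elements = sorted(avail)
    total = 0.0
    for state in product([True, False], repeat=len(elements)):
        up = {e for e, is_up in zip(elements, state) if is_up}
        if any(path <= up for path in tie_paths):
            prob = 1.0
            for e, is_up in zip(elements, state):
                prob *= avail[e] if is_up else 1.0 - avail[e]
            total += prob
    return total

avail = {e: 0.99 for e in range(1, 6)}   # illustrative element availabilities
for path in TIE_PATHS:
    print(sorted(path), path_availability(path, avail))
print("system:", system_availability(TIE_PATHS, avail))
```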

Availability Analysis

• Cut sets method:
– Which combinations of element failures can bring the system down?

[Figure: network with elements 1 to 5 and intermediate nodes A and B]

– The probability of each cut is calculated:

\[ P(\text{cut } i) = \prod_{v \in \text{cut } i} (1 - A_v) \]

– The availability of the system is:

\[ A_{syst} \approx 1 - \sum_{i \in \text{cut sets}} P(\text{cut } i) \]
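
The cut sets calculation can be sketched in Python as below. The bridge layout and its minimal cut sets are assumptions made for this example, as are the element availabilities of 0.99. Summing the cut probabilities counts overlapping failure states more than once, so the result slightly understates the availability; the error is small when every element unavailability is small.

```python
# Hypothetical minimal cut sets for a five-element bridge
# (I -1- A, I -2- B, A -3- B, A -4- O, B -5- O); an assumed layout.
CUT_SETS = [{1, 2}, {4, 5}, {1, 3, 5}, {2, 3, 4}]

def cut_probability(cut, avail):
    """Probability that every element of the cut is down (independent failures)."""
    p = 1.0
    for v in cut:
        p *= 1.0 - avail[v]
    return p

def system_availability_cut_sets(cut_sets, avail):
    """One minus the sum of the cut probabilities."""
    return 1.0 - sum(cut_probability(c, avail) for c in cut_sets)

avail = {e: 0.99 for e in range(1, 6)}   # illustrative element availabilities
a_syst = system_availability_cut_sets(CUT_SETS, avail)
print(a_syst)
```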

Availability Analysis

• Conditional decomposition (High Unavailability Elements):
– When some elements have high U, it becomes less acceptable to sum unavailabilities.
– Solution: conditional decomposition on the element d whose availability A_d is low:

[Figure: system made of elements A1, A2, A3, A4 and the low-availability element A_d, decomposed into two simpler systems, A_syst 1 and A_syst 2, in which d no longer appears]

– The availability of the system is:

\[ A_{syst} = A_d \, A_{syst\,1} + (1 - A_d) \, A_{syst\,2} \]
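
A possible Python sketch of conditional decomposition is shown below. It assumes a bridge in which branch A1-A2 and branch A3-A4 run in parallel and element d joins their midpoints; this layout is a guess at the slide's figure and is labeled as such in the code. Conditioning on d being up or down leaves two systems that reduce with plain series/parallel rules.

```python
def series(*avails):
    """Availability of independent elements in series."""
    result = 1.0
    for a in avails:
        result *= a
    return result

def parallel(*avails):
    """Availability of independent elements in parallel."""
    down = 1.0
    for a in avails:
        down *= 1.0 - a
    return 1.0 - down

def bridge_availability(a1, a2, a3, a4, a_d):
    # Assumed layout: branch A1-A2 and branch A3-A4, with d joining their midpoints.
    # Given d up: (A1 parallel A3) in series with (A2 parallel A4).
    a_syst_1 = series(parallel(a1, a3), parallel(a2, a4))
    # Given d down: the two branches are simply in parallel.
    a_syst_2 = parallel(series(a1, a2), series(a3, a4))
    return a_d * a_syst_1 + (1.0 - a_d) * a_syst_2

# Illustrative values: four good elements and one poor element d.
result = bridge_availability(0.99, 0.99, 0.99, 0.99, 0.9)
print(result)
```

No unavailabilities are summed here, so the result stays exact (under independence) even when A_d is far from 1.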

Automatic Protection Switching (APS) Systems

• Basic idea:
– to provide a standby transmission channel that is kept in fully operating condition and used to replace any of the other traffic-bearing channels in the event of their failure
• Characteristics of an APS system:
– spare to working ratio:
• ‘1-to-1’ or ‘1-to-N’
– co-routed / diversely routed:
• ‘1-to-1’ or ‘1-to-1 /DP’
• ‘1-to-N’ or ‘1-to-N /DP’
– 1+1 or 1:1:
• ‘1+1’: Signal always sent on the spare channel
• ‘1:1’: Signal sent on spare channel upon failure of the working channel

Automatic Protection Switching (APS) Systems

• 1:N APS system:

[Figure: 1:N APS system with N working channels (each with unavailability U_w) and one spare channel (U_s) between a head end bridge (U_b1, U_b2) and a tail end transfer (U_t1, U_t2)]

For Head End Bridge (HEB) and Tail End Transfer (TET):
– Mode 1 failure: working signal is not relayed
– Mode 2 failure: no bridging or transfer to/from spare channel

Automatic Protection Switching (APS) Systems

• Cut sets approach to 1:N APS availability analysis:
– Combinations creating outage for a specific channel (cut sets):
• Cut set 1: Failure of that channel with prior failure of at least one other working channel
• Cut set 2: Failure of that working channel plus the spare channel or head end bridge or tail end transfer in mode 2
• Cut set 3: Failure of head end bridge or tail end transfer in mode 1
– The probability of each cut set is:
• Cut set 1: U_w × (N-1) U_w × 0.5
• Cut set 2: U_w (U_s + U_b2 + U_t2)
• Cut set 3: U_b1 + U_t1

Automatic Protection Switching (APS) Systems

• 1:N APS Unavailability
– The unavailability of a channel is:

\[ U_{channel} = \frac{N-1}{2} U_w^2 + U_w (U_s + U_{b2} + U_{t2}) + U_{b1} + U_{t1} \]

– The term in O(U), U_b1 + U_t1, reflects the irreducible series-availability elements: the HEB and the TET in their mode 1 failure.

• It is impossible to make a perfectly redundant system. There is always some parallelism-accessing device c that brings a series unavailability contribution.

[Figure: elements A and B in parallel, accessed at each end through a device c; the contribution of the devices c is O(U_c)]

U_A = U_B = 10^-3, U_C = 10^-5

U_S = U_A U_B + 2 U_C ≈ 2 U_C
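
The channel unavailability expression can be evaluated with a short Python sketch. All numerical values passed to `aps_channel_unavailability` are hypothetical, and the function and parameter names are assumptions of this example; only the A, B, c figures at the end come from the slide.

```python
def aps_channel_unavailability(n, u_w, u_s, u_b1, u_b2, u_t1, u_t2):
    """Sum of the three cut set probabilities for one channel of a 1:N APS system."""
    cut1 = 0.5 * (n - 1) * u_w ** 2        # another working channel failed first
    cut2 = u_w * (u_s + u_b2 + u_t2)       # spare path unusable when needed
    cut3 = u_b1 + u_t1                     # mode 1 failure of HEB or TET
    return cut1 + cut2 + cut3

# Hypothetical values for a 1:7 system.
u_channel = aps_channel_unavailability(
    n=7, u_w=1e-3, u_s=1e-3, u_b1=1e-6, u_b2=1e-6, u_t1=1e-6, u_t2=1e-6
)
print(u_channel)

# Slide example: A and B in parallel, accessed through a device c at each end.
u_a = u_b = 1e-3
u_c = 1e-5
u_system = u_a * u_b + 2 * u_c
print(u_system, 2 * u_c)   # the series term 2 * U_C dominates
```

With these numbers the first-order term contributed by the access devices is of the same order as the redundant part, which is the point the slide makes about irreducible series elements.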


Summary

• Reliability is a mission-oriented question for non-repairable systems

• In telecom engineering we are interested in the availability of the system designed

• There are several techniques that can be used for availability analysis. The one we will use in the rest of the course is the algebraic approach (equivalent to cut sets)

• APS is a protection scheme that enhances availability by providing a spare channel for restoration of failed working channels

