Probabilistic sensitivity analysis of system availability using Gaussian processes


Reliability Engineering and System Safety 112 (2013) 82–93

Contents lists available at SciVerse ScienceDirect

Reliability Engineering and System Safety

0951-8320/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.ress.2012.11.001

* Corresponding author.
E-mail address: [email protected] (T. Bedford).
journal homepage: www.elsevier.com/locate/ress

Probabilistic sensitivity analysis of system availability using Gaussian processes

Alireza Daneshkhah, Tim Bedford *

Department of Management Science, University of Strathclyde, Glasgow G1 1QE, UK

Article info

Article history:

Received 7 January 2012

Received in revised form 30 October 2012

Accepted 1 November 2012

Available online 27 November 2012

Keywords:

Sensitivity analysis

Sobol indices

Dependency

Computer code model output

Emulator-based sensitivity

Gaussian process

System availability


Abstract

The availability of a system under a given failure/repair process is a function of time which can be determined through a set of integral equations and is usually calculated numerically. We focus here on the issue of carrying out sensitivity analysis of availability to determine the influence of the input parameters. The main purpose is to study the sensitivity of the system availability with respect to changes in the main parameters. In the simplest case, where the failure/repair process is (continuous time/discrete state) Markovian, explicit formulae are well known. Unfortunately, in more general cases availability is often a complicated function of the parameters without closed form solution, and the computation of sensitivity measures can be time-consuming or even infeasible.

In this paper, we show how Sobol and other related sensitivity measures can be cheaply computed to measure how changes in the model inputs (failure/repair times) influence the outputs (availability measure). We use a Bayesian framework, called Bayesian analysis of computer code output (BACCO), which is based on using a Gaussian process as an emulator (i.e., an approximation) of complex models/functions. This approach allows effective sensitivity analysis to be achieved using far smaller numbers of model runs than other methods.

The emulator-based sensitivity measure is used to examine the influence of the failure and repair densities' parameters on the system availability. We discuss how to apply the methods practically in the reliability context, considering in particular the selection of parameters and prior distributions and how we can ensure these may be considered independent, one of the key assumptions of the Sobol approach. The method is illustrated on several examples, and we discuss the further implications of the technique for reliability and maintenance analysis.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

In this paper, we present a new approach to study the sensitivity analysis of availability. In general, sensitivity analysis is concerned with understanding how changes in the model input (distribution parameters) would influence the output. Suppose that our deterministic model can be written as y = f(x), where x is a vector of input variables (or parameters) and y is the model output. For example, the inputs could be the parameters of the failure and repair densities, θ, and the output could be the availability A(t, θ) at time t.

The traditional method of examining sensitivity of a model with respect to changes in its input variables is local sensitivity analysis, which is based on derivatives of f(·) evaluated at some 'base-line' (or central estimate) x = x_0 and indicates how the output y will change if the base-line input values are slightly perturbed


(see [11] for the different local sensitivity measures commonly used in Bayesian analysis). This is clearly of limited value in understanding the consequences of real uncertainty about the inputs, which would in practice require more than infinitesimal changes in the inputs. Furthermore, these methods are computationally very expensive for complex models and usually require a considerable number of model runs if we use a Monte Carlo based method to compute these sensitivity measures.
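To make the local, derivative-based approach concrete, the sketch below differentiates a point-availability function by central finite differences. The availability formula is the standard result for a single repairable unit with exponential failure rate λ and repair rate μ; the specific function and parameter values here are our own illustration, not taken from the paper.

```python
import math

def availability(t, lam, mu):
    """Point availability of one repairable unit with exponential failure
    rate lam and repair rate mu (standard Markov result)."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)

def local_sensitivity(f, x0, i, h=1e-6):
    """Central finite-difference derivative of f with respect to x0[i]."""
    up = list(x0); up[i] += h
    dn = list(x0); dn[i] -= h
    return (f(*up) - f(*dn)) / (2.0 * h)

# base-line point (t, lam, mu) at which the derivatives are taken
base = (10.0, 0.01, 0.5)
dA_dlam = local_sensitivity(availability, base, 1)  # effect of failure rate
dA_dmu = local_sensitivity(availability, base, 2)   # effect of repair rate
```

As expected, availability decreases in the failure rate and increases in the repair rate, but these numbers are valid only near the chosen base-line point, which is exactly the limitation discussed above.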

For instance, Marseguerra et al. [21] used Monte Carlo simulation to calculate the first-order differential sensitivity indexes of the basic events characterising the reliability behaviour of a nuclear safety system. They reported that the computation of the sensitivity indexes for the system unavailability at the mission time by Monte Carlo simulation requires 10^7 iterations. In another study, Reedijk [28] reported that first-order reliability methods and Monte Carlo simulation have certain disadvantages and that some problems could not be solved with these methods.

This issue is particularly interesting in the case where the model is computationally expensive, so that simply computing the output for any given set of input values is a non-trivial task.



This is especially the case for large process models in engineering, environmental science, reliability analysis, etc., which may be implemented in complex computer codes requiring many minutes, hours or even days for a single run. However, in order to implement many of the standard sensitivity techniques discussed by [30] we require a very large number of model runs. In that case, even for a model that takes just one second to run, many sensitivity analysis measures may take too long to compute.

The most frequently used sensitivity indices are due to Sobol [32]. However, these require an assumption of independence (as discussed by Bedford [1]). Hence in Section 3 we discuss how one might go about choosing an appropriate parameterisation in which the sensitivity analysis can be carried out using independent uncertainty variables.

It should be noted that probabilistic sensitivity analyses are often carried out with efficient sampling procedures (e.g., [14,15]), but these procedures are computationally very expensive. Therefore, we present an alternative computational tool to implement sensitivity analysis based on the work of [22]. This is a Bayesian approach to sensitivity analysis which unifies the various methods of probabilistic sensitivity analysis and will be briefly introduced in Section 2. This approach is computationally highly efficient and allows effective sensitivity analysis to be achieved using far smaller numbers of model runs than Monte Carlo methods require. The range of tools used in this approach also enables us to do uncertainty analysis, prediction, optimisation and calibration. Section 4 presents this method.

This paper extends work carried out by Daneshkhah and Bedford [10], where emulators were used to examine the influence of the failure and repair densities' parameters on the system availability of a simple one-component repairable system, with Exponential and Weibull distributions considered for the failure and repair rates. Here, we consider the sensitivity analysis of repairable systems with more than one component, chosen so that we can use numerical integration to compute availability. The systems are: a parallel system with two components where the failure and repair distributions are exponential; a well-known standby-redundancy system with three parameters to be examined, where the failure and repair distributions are also exponential; and a move-drive system with eight components and 17 parameters, where the repair rate is constant but the failure distributions are Weibull. There are closed forms for the availability functions of the first two systems, and we use them to validate the method. There is no closed form for the third system, however, and an expensive numerical method is required to evaluate its availability at the selected parameter values. We present some conclusions and possible future developments in Section 6.

2. Probabilistic sensitivity analysis

Local sensitivity analysis evaluates the influence of uncertain inputs around a point in the input space and generally relies on the estimation, at this point, of the partial derivatives of the output with respect to the inputs. This is known as a one-at-a-time measure of sensitivity because it measures the effect on the response of varying one input alone by a fixed fraction of its value (assumed known). As a consequence, if the model is nonlinear, the relative importance of the model inputs depends on the chosen point. Several investigators have tried to get around this limitation by evaluating averages of the partial derivatives at different points in the input space.

Conversely, global sensitivity analysis of model output evaluates the relative importance of inputs when they are varied generously, i.e., when their uncertainty is acknowledged over a wide range. One approach to global sensitivity analysis is the analysis of variance of the model response, originally proposed by [32]. In this setting nonlinearity in the model is not an issue. The approach can capture the fraction of the model response variance explained by a model input on its own or by a group of model inputs. In addition, it can also provide the total contribution of a given input to the output variance, that is, its marginal contribution and its cooperative contribution. There are different computational techniques to perform this sensitivity analysis, e.g., [32,30,33]. We will focus on the emulator-based method to calculate this sensitivity measure presented by [22].

Borgonovo et al. [4] introduced moment-independent sensitivity methods, which have recently attracted practitioners (particularly in the environmental sciences). Similar to the emulator-based sensitivity method, these methods provide a thorough way of investigating the sensitivity of model output under uncertainty. However, their estimation is challenging, especially in the presence of computationally intensive models. Borgonovo et al. [4] suggest replacing the original model by a metamodel to lower the computational burden. They utilise the emulator proposed in [27] as an efficient metamodel. Their results show that the emulator allows an accurate estimation of density-based sensitivity measures when the main structural features of the original model are captured (see also [3]).

Ratto et al. [26] also reported that emulators (also referred to as metamodels in the literature) provide an efficient means of doing sensitivity analysis for large and expensive models. They provide some tools and applications of sensitivity analysis in the context of environmental modelling.

Caniou and Sudret [6] propose an alternative to the variance-based sensitivity methods mentioned above (e.g., Sobol indices), which are relatively expensive because of the conditional moment estimation needed to quantify the uncertainty of the model output due to the uncertain input variables. They substitute the initial model by an alternative metamodel called a polynomial chaos expansion. The process is applied to numerical test cases; the results obtained are discussed and compared with reference results obtained with variance-based methods. In 2011 [7], these authors extended this work to the situation where the input parameters are correlated.

In this paper, we focus on the method suggested by [22]. We shall consider how a function f(x) depends on its input variables; in our case f will typically be the function that computes system availability as a function of a vector of quantities such as failure and repair distribution parameters. Note that we consider availability at different time points as distinct functions, so time is not treated as an input variable. We also assume the existence of a distribution G representing the uncertainty of the parameters. We discuss below appropriate choices for the vector of inputs and the prior distribution G. For the purpose of computing Sobol indices it is often assumed that G has to be an independent distribution, and we discuss this further below.

Some notation is introduced first. Write a d-dimensional random vector as X = (X_1, ..., X_d), where X_i is the ith element of X; the subvector (X_i, X_j) is denoted by X_{i,j}, and in general if p is a set of indices then X_p denotes the subvector of X whose elements have those indices. We also write X_{-i} for the subvector of X containing all elements except X_i. Similarly, x = (x_1, ..., x_d) denotes the observed value of the random vector X.

2.1. Main effects and interactions

Since f is a function of uncertain quantities X, we can consider its expected value (when f is availability at a given time, the expected value is the predictive availability at that time).


Write z_0 = E[f(X)]. The function

z_i(x_i) = E[f(X) | x_i] − E[f(X)]

is called the main effect of variable i. It is the function of x_i only that best approximates f in the sense of minimising the variance (calculated over the other variables)

var_{X_{-i}}( f(X_1, ..., X_{i-1}, x_i, X_{i+1}, ..., X_d) − z_i(x_i) ).

The main effect has a straightforward interpretation in our context: it is the expected change to the availability that would be obtained if we were to know that parameter i has value x_i, taking into account the residual uncertainty in the other parameters. Furthermore, the well-known decomposition of variance formula var(Y) = E(var(Y | X)) + var(E(Y | X)) implies that the variance of the main effect for parameter i is the expected amount by which the variance of f would be reduced if we were to know the value of parameter i.
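A main effect can be estimated by brute-force Monte Carlo, conditioning on a grid of values of one input while averaging over the others. The sketch below does this for a toy two-input function of our own choosing (not from the paper), for which the main effect of x_1 is analytically z_1(x_1) = x_1 − 0.5 under independent uniform inputs on (0, 1).

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x1, x2):
    # toy stand-in for an availability function of two parameters
    return x1 + x2**2

n = 200_000
x1s = rng.uniform(0.0, 1.0, n)
x2s = rng.uniform(0.0, 1.0, n)
z0 = f(x1s, x2s).mean()                      # z_0 = E[f(X)]

# main effect of x1: z_1(g) = E[f(X) | x1 = g] - E[f(X)], on a grid of g
grid = np.linspace(0.05, 0.95, 10)
x2_inner = rng.uniform(0.0, 1.0, n)          # averages out the other input
z1 = np.array([f(g, x2_inner).mean() - z0 for g in grid])
```

By construction the main effect has mean zero over the input distribution, and here it should track the analytic line g − 0.5 up to Monte Carlo noise.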

The above definition and interpretation do not require G to be an independent distribution. If the parameters are not independent, then the variance reduction from specifying one parameter might change the amount of variance reduction that could be achieved by specifying another parameter. See the discussion in [1] for ways to define variance reduction in this case.

Assuming that the parameters are independent, one can go much further. Sobol [32] proves that any square integrable mathematical function can be decomposed as follows:

y = f(x) = E[f(X)] + Σ_{i=1}^{d} z_i(x_i) + Σ_{i<j} z_{i,j}(x_{i,j}) + Σ_{i<j<k} z_{i,j,k}(x_{i,j,k}) + ... + z_{1,2,...,d}(x),   (1)

where

z_i(x_i) = E[f(X) | x_i] − E[f(X)],

z_{i,j}(x_{i,j}) = E[f(X) | x_{i,j}] − z_i(x_i) − z_j(x_j) − E[f(X)],

z_{i,j,k}(x_{i,j,k}) = E[f(X) | x_{i,j,k}] − z_{i,j}(x_{i,j}) − z_{i,k}(x_{i,k}) − z_{j,k}(x_{j,k}) − z_i(x_i) − z_j(x_j) − z_k(x_k) − E[f(X)],

and so on. This is called the Sobol decomposition. Note that Ratto et al. [26,33] also mention that the starting point of the metamodel building process is the representation of y = f(x) in terms of the ANOVA model given in (1).

We call z_{i,j}(x_{i,j}) the first-order interaction between x_i and x_j, and so on. It is easy to see that the functions in the Sobol decomposition are pairwise orthogonal, that is,

E[ z_{i_r,...,i_s}(X_{i_r}, ..., X_{i_s}) · z_{i_q,...,i_t}(X_{i_q}, ..., X_{i_t}) ] = 0 for each (i_r, ..., i_s) ≠ (i_q, ..., i_t),

because of the independence assumption. By assumption all the terms (except the constant term) have mean zero. It is straightforward to show that if we require the functions in a Sobol decomposition to be orthogonal with zero mean, then in fact they have to be the functions given above; that is, the Sobol decomposition is unique. Ways of defining a Sobol decomposition for dependent variables are discussed in [1].

It is important to remember that the Sobol decomposition does, however, depend on the distribution G of the uncertain inputs. The sensitivity measures are therefore determined with respect to the distribution G. However, they allow us to identify which elements in X are the most influential in inducing the uncertainty in f. The main effects and first-order interactions, and their plots, can be considered a powerful visual tool to investigate how the model output responds to each individual input, and how those inputs interact in their influence on the model output.

2.2. Variance-based methods

As mentioned above, the variance of the main effect can be interpreted as the amount by which the overall variance of f would be reduced if we knew x_i. The main effect variance for parameter i is therefore defined as

V_i = var{ E(Y | X_i) }.

A second measure, first proposed by [18], is

V_Ti = var(Y) − var{ E(Y | X_{-i}) },

which is the remaining uncertainty in Y that is unexplained after everything has been learnt except x_i.

These two measures can be converted into scale-invariant measures by dividing by var(Y) as follows:

S_i = V_i / var(Y),   S_Ti = V_Ti / var(Y) = 1 − S_{-i},

where S_i is the main effect index of x_i, and S_Ti is the total effect index of x_i. The details of other sensitivity analysis methods, such as variance decomposition and regression components, can be seen in [22].

The variance measures are linked to the Sobol decomposition when the parameters are independent, because the independence of the terms in the Sobol decomposition implies that the total variance of f can be represented as the sum of the variances of each term:

var(f) = Σ_i V_i + Σ_{i<j} V_{i,j} + ... + V_{1,2,...,d},

where V_{i_1,...,i_k} = var(z_{i_1,...,i_k}). Hence the main effect variance is one of the terms in the variance decomposition, and the total variance for i is the sum of the variances of all terms involving i.
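For cheap functions these indices can be estimated by a brute-force double-loop Monte Carlo, as sketched below on a toy two-input model with an interaction term (our own example, not from the paper). For d = 2 the total effect index of x_1 reduces to S_T1 = 1 − S_2, matching the formula above with X_{-1} = X_2.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x1, x2):
    # toy model with an interaction term, standing in for availability
    return x1 + 2.0 * x2 + x1 * x2

m = 2000
x1 = rng.uniform(0.0, 1.0, (m, 1))   # samples of X1 (column)
x2 = rng.uniform(0.0, 1.0, (1, m))   # samples of X2 (row)
F = f(x1, x2)                        # m x m grid of model evaluations

var_y = F.var()
V1 = F.mean(axis=1).var()            # var{E(Y | X1)} over the X1 samples
V2 = F.mean(axis=0).var()            # var{E(Y | X2)}
S1, S2 = V1 / var_y, V2 / var_y      # main effect indices
ST1 = 1.0 - S2                       # total effect of x1 when d = 2
```

With independent U(0, 1) inputs the analytic values are S_1 ≈ 0.262 and S_2 ≈ 0.728, with S_1 + S_2 < 1 because the interaction variance V_{1,2} is positive; the point of the emulator approach in Section 4 is precisely to avoid the m² model evaluations this brute-force scheme needs.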

3. Choice of parameters and prior distribution G

In order to give a useful operational meaning to the sensitivity measures described in this paper, it is important that the parameters used have a reasonable operational interpretation. In this part we discuss how to ensure that the parameters are meaningful and how the uncertainty on the parameters should be modelled.

Bedford and Cooke [2] take the view that uncertainty quantification has to be about observable quantities, or at least about quantities that could in principle be observed. This is based around the notion of subjective probability, which is used to quantify uncertainties. Hence we could in principle use probabilities to describe the uncertainty in the event that my car will start tomorrow morning, giving us probabilities of single events. Similarly we could use probabilities to consider the event that my car will fail by a given time t, giving us cumulative probability distributions. However, we can also use probabilities at a second-order level if we are happy to consider families of events that we judge to be exchangeable. The classical example of this is a sequence of coin tosses which are judged to be exchangeable. In principle we could consider the long-run average number of heads to be an observable, and therefore use probabilities to quantify the uncertainty in the long-run average. This approach, while allowing us to consider ''a probability of a probability'', does not allow anything further (a probability of a probability of a probability) because it ties the second-level probability into an exchangeable sequence. It also allows us to quantify the uncertainty in median lifetime, or the uncertainty in other quantiles (admittedly, explaining to a domain expert what a quantile of a quantile is, is not easy, but at least it makes philosophical sense). This approach gives a clear articulation of the difference between


Fig. 1. Random sample of 100 points from an independent distribution on t_50 and EF = t_95/t_50.

Fig. 2. Random sample of 100 points from an independent distribution on t_50 and EF = t_95/t_50, shown in the a and k parameters.

aleatory and epistemic uncertainty, as also discussed for example in [12,13,17,24,25].

From this perspective, and in a reliability context, it makes sense to consider sensitivity analysis for observables as described above: for example, mean time to failure, mean time to repair, quantiles of failure time distributions, etc. The parameters of the classical lifetime distributions do not always have a direct operational interpretation, but may sometimes be derived from something with such an interpretation. For example, the constant failure rate of an exponential distribution is one over the mean time to failure. A less obvious example is the Weibull, where the shape and scale parameters might not be directly considered as observables, but could be calculated if two quantiles are specified. Quantiles, as discussed above, can be considered observable. Hence uncertainty on the values of two quantiles (for definiteness we consider the 50% and 95% quantiles) can be transformed into uncertainty on the scale and shape parameters. Alternatively, uncertainty about these parameters could be given an operational meaning by transforming it into uncertainty about quantile values. This type of approach is used in [16] to specify uncertainty in hazard curves for seismic events.

In practical terms the initial specification through uncertainty on quantiles might come through expert judgement, while uncertainties on parameters might be available through generic databases. In either case, reliability managers can find themselves in a situation where they are designing new systems and have to deal with uncertainty assessments about the components of the system. In order to reduce the overall uncertainty for the system, the managers will wish to explore which parameters could provide the biggest decrease in uncertainty if they could be determined more precisely. That is the problem context for this paper.

Focussing particularly on the Weibull distribution, we use the parameterisation

F(t) = 1 − exp( −k t^{1+a} / (1+a) ),   t ≥ 0, k > 0, a > −1,

where k is the shape parameter and a is the scale parameter. The failure rate is λ(t) = k t^a. By taking logarithms twice we have

ln(−ln(1 − F(t))) = ln(k) − ln(1+a) + (1+a) ln(t).   (2)

Hence if we have the 50% and 95% quantiles, t_50 and t_95, then the following equations hold:

ln(−ln(0.5)) = ln(k) − ln(1+a) + (1+a) ln(t_50),   (3)

ln(−ln(0.05)) = ln(k) − ln(1+a) + (1+a) ln(t_95),   (4)

from which we determine k and a.

There is no reason why the uncertainty distributions defined on k and a should be product distributions (i.e., independent). If we were to consider quantiles for a given lifetime distribution, then the ordering requirement (a 50% quantile is always less than a 95% quantile) would imply that the joint uncertainty cannot be independent. However, if we consider the median and the error factor EF,

t_50,   EF = t_95/t_50,

as the parameters of the Weibull, then the uncertainties in these quantities might be considered independent. Therefore, the shape parameter k and the scale parameter a can be determined in terms of t_50 and EF = t_95/t_50 as follows:

a = [ln(−ln(0.5)) − ln(−ln(0.05))] / [ln(t_50) − ln(t_50 · EF)] − 1,   (5)

k = exp( ln(−ln(0.5)) + ln(1+a) − (1+a) ln(t_50) ).   (6)

We shall illustrate the use of these parameters in Section 5.3.
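The reparameterisation in Eqs. (5) and (6) is easy to check numerically: recovering (k, a) from (t_50, EF) must reproduce the two quantiles exactly. The sketch below does this; the particular values t_50 = 1490 and EF = 3.1 are just illustrative draws from the ranges used in Figs. 1 and 2.

```python
import math

def weibull_params_from_quantiles(t50, EF):
    """Recover (k, a) of the parameterisation
    F(t) = 1 - exp(-k t^(1+a) / (1+a)) from the median t50 and the
    error factor EF = t95/t50, following Eqs. (5)-(6)."""
    num = math.log(-math.log(0.5)) - math.log(-math.log(0.05))
    den = math.log(t50) - math.log(t50 * EF)
    a = num / den - 1.0
    k = math.exp(math.log(-math.log(0.5)) + math.log(1.0 + a)
                 - (1.0 + a) * math.log(t50))
    return k, a

def F(t, k, a):
    # Weibull CDF in the paper's parameterisation
    return 1.0 - math.exp(-k * t ** (1.0 + a) / (1.0 + a))

k, a = weibull_params_from_quantiles(1490.0, 3.1)
```

By construction F(t_50) = 0.5 and F(t_50 · EF) = 0.95, which confirms that Eqs. (5) and (6) invert Eqs. (3) and (4).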

As discussed before, the Sobol expansion used in sensitivity analysis does assume that the parameters are independent. For the two possible parameterisations considered here (that is, k and a, or t_50 and t_95/t_50) it makes a considerable difference which one is chosen. This is illustrated in Figs. 1 and 2, where we show 100 points drawn randomly from independent uniform distributions on (1480, 1500) for t_50 and (3, 3.2) for EF = t_95/t_50. The same 100 points are used to calculate 100 values for the a and k parameters in Fig. 2, and it can be seen that these are far from independently distributed.

4. Emulator-based sensitivity analysis

In principle, if f(x) is sufficiently simple it would be possible to compute the sensitivity measures described in the previous sections analytically. With more complex models this is not possible and we have to derive the desired measures computationally.


If f(x) is cheap enough (that is, we can compute it quickly) that we are able to evaluate it easily for many different inputs, standard Monte Carlo methods would suffice to estimate var(Y). The computation techniques proposed by [32,9,31] demand many thousands of function evaluations, so these methods are impractical for more expensive functions. We use the Bayesian inference tools developed by [22], briefly introduced in the following section, to tackle this computational issue. Using this method we are able to estimate all the quantities of interest required for sensitivity analysis in a reliability context.

The most important aspect of this Bayesian approach is that the model (function) f(·) is considered uncertain. Specifically, this means that the value of f(x) is unknown for any particular input configuration x until we actually run the model for those inputs. The Bayesian model specifies a prior joint distribution for the values taken by f(x) at different values of x. This prior is then updated according to the usual Bayesian paradigm, using as data the outputs y_i = f(x_i), i = 1, ..., N, from a set of runs of the model. The result is a posterior distribution for f(·), which is used to make formal Bayesian inferences about the sensitivity measures introduced in the previous section. Although we are still uncertain about the function f at parameter values where it was not evaluated, the uncertainty is very much reduced by the assumed correlation of function values from one point to another. Typically the expected value of the posterior distribution is used as a point estimate for f. The reader should be alert to the fact that there are therefore two different distributions being used in the sensitivity analysis computation: the first is the distribution G, which represents the uncertainty in the availability model parameters x, and which is propagated to the output values through the function f; the second is the (posterior) distribution on f, which plays a purely computational role, can be reduced as much as required by computing the function f at more points x, and does not have any operational interpretation. Expected values computed over the second distribution will be denoted by E_GP (GP standing for Gaussian process) to avoid potential confusion with expected values taken over G. The next section briefly describes the Bayesian set-up and the second distribution.

4.1. Inference about functions using Gaussian processes

We consider the function or complex model under study as a deterministic code that returns an output y = f(x) at input vector x. A Gaussian process provides a prior model for our uncertain knowledge of the values taken by this function before it is evaluated through the deterministic code. The evaluation of the code at different input values gives data from which a posterior distribution is derived. In fact, the prior distribution considered for f(·) is a two-stage prior: in the first stage, given some hyperparameters, a Gaussian process is chosen as a prior distribution on f(·); the mean vector and covariance matrix of the Gaussian process are then determined by estimating the hyperparameters in the second stage. The key requirement for using the Gaussian process is that f(·) should be a smooth function, so that if we know the value of f(x) we then have some idea about the value of f(x′) for x′ close to x. This basic assumption of a smooth, continuous function gives the Gaussian process major computational advantages over MC methods, since the expected proximity of function values evaluated at nearby points is usually ignored in the Monte Carlo simulation approach.

Using a Gaussian process prior for f(·) implies that the uncertainty about {f(x_1), …, f(x_n)}, given any set of points {x_1, …, x_n}, can be represented through a multivariate normal distribution. We therefore need to make tractable prior assumptions about the mean and covariance. The mean of f(x) conditional on the hyperparameters β is modelled as

    E[f(x) | β] = h(x)^T β,    (7)

where h(·) is a vector of q known functions of x, and β is a vector of coefficients. The choice of h(·) is arbitrary, but it should be chosen to incorporate any beliefs that we might have about the form of f(·). The covariance between f(x) and f(x′) is given by

    cov(f(x), f(x′) | σ²) = σ² c(x, x′),    (8)

where c(·,·) is a monotone correlation function on R⁺ with c(x, x) = 1, decreasing as |x − x′| increases. Furthermore, the function c(·,·) must ensure that the covariance matrix of any set of outputs {y_1 = f(x_1), …, y_n = f(x_n)} is positive semi-definite. Throughout this paper, we use the following correlation function, which satisfies all the conditions mentioned above and is widely used for its computational convenience:

    c(x, x′) = exp{−(x − x′)^T B (x − x′)},    (9)

where B is a diagonal matrix of positive smoothness parameters. The matrix B has the effect of rescaling the distance between x and x′; thus B determines how close two inputs x and x′ need to be for the correlation between f(x) and f(x′) to take a particular value.
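As a concrete illustration, the correlation (9) and the rescaling role of B can be evaluated directly (a minimal sketch; the input points and smoothness values below are illustrative, not from the paper):

```python
import numpy as np

def correlation(x, x_prime, B):
    """Squared-exponential correlation c(x, x') = exp{-(x - x')^T B (x - x')},
    with B a diagonal matrix of positive smoothness parameters."""
    d = np.asarray(x) - np.asarray(x_prime)
    return float(np.exp(-d @ np.diag(B) @ d))

# Larger entries of B rescale distance upwards: the correlation between f(x)
# and f(x') then decays faster, i.e. the function is modelled as rougher.
x, x_prime = [0.0, 0.0], [0.5, 0.5]
c_smooth = correlation(x, x_prime, B=[1.0, 1.0])    # slowly varying f
c_rough = correlation(x, x_prime, B=[50.0, 50.0])   # rapidly varying f
```

With B = diag(1, 1) the two points remain strongly correlated (c = e^{−0.5} ≈ 0.61), while B = diag(50, 50) makes their function values effectively independent.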

Oakley and O'Hagan [22] suggest the following conjugate prior, the normal inverse gamma distribution, for (β, σ²):

    p(β, σ²) ∝ (σ²)^{−(d+q+2)/2} exp{−[(β − z)^T V^{−1} (β − z) + a] / (2σ²)}

for fixed hyperparameters z, V, a and d.

The output of f(·) is observed at n design points x_1, …, x_n to obtain y = {f(x_1), …, f(x_n)}, considered as data. Notice that these points, in contrast with Monte Carlo methods, are not chosen randomly but are selected to give good information about f(·). The design points will usually be spread to cover χ, the input space of X. Since X is unknown, beliefs about X are represented by the probability distribution G(X); therefore, the choice of the design points will also depend on G(·) (the choice of design points is discussed in [29]). The standardised posterior distribution of f(·) given y = {f(x_1), …, f(x_n)} is

    [f(x) − m*(x)] / [σ̂ √(c*(x, x))] | y ~ t_{d+n},    (10)

where t_{d+n} is a Student-t random variable with n + d degrees of freedom, d is the dimension of x [19],

    m*(x) = h(x)^T β̂ + t(x)^T A^{−1} (y − H β̂),    (11)

    c*(x, x′) = c(x, x′) − t(x)^T A^{−1} t(x′)
                + [h(x)^T − t(x)^T A^{−1} H] (H^T A^{−1} H)^{−1} [h(x′)^T − t(x′)^T A^{−1} H]^T    (12)

and

    t(x)^T = (c(x, x_1), …, c(x, x_n)),    H^T = (h(x_1), …, h(x_n)),

    A = ( 1            c(x_1, x_2)   …   c(x_1, x_n)
          c(x_2, x_1)  1                 ⋮
          ⋮                          ⋱
          c(x_n, x_1)  …                 1 ),

    β̂ = V* (V^{−1} z + H^T A^{−1} y),

    σ̂² = [a + z^T V^{−1} z + y^T A^{−1} y − β̂^T (V*)^{−1} β̂] / (n + d − 2),

    V* = (V^{−1} + H^T A^{−1} H)^{−1}.
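A minimal sketch of the posterior-mean predictor (11): it assumes h(x) = (1, x)^T, a fixed smoothness matrix B, and the weak-prior limit V^{−1} → 0 (so that β̂ reduces to generalised least squares); the toy target function and design grid are illustrative, not from the paper.

```python
import numpy as np

def sq_exp(x1, x2, B):
    """Correlation (9) with B given as the 1-D array of diagonal entries."""
    d = x1 - x2
    return np.exp(-d @ (B * d))

def fit_emulator(X, y, B):
    """Return the posterior-mean predictor m*(x) of Eq. (11), taking
    h(x) = (1, x)^T and the weak-prior limit V^{-1} -> 0, so that
    beta_hat = (H^T A^{-1} H)^{-1} H^T A^{-1} y (generalised least squares)."""
    n = X.shape[0]
    H = np.hstack([np.ones((n, 1)), X])
    A = np.array([[sq_exp(X[i], X[j], B) for j in range(n)] for i in range(n)])
    A += 1e-8 * np.eye(n)                     # small jitter for numerical stability
    Ainv = np.linalg.inv(A)
    beta = np.linalg.solve(H.T @ Ainv @ H, H.T @ Ainv @ y)
    e = Ainv @ (y - H @ beta)                 # the weights t(x)^T multiplies in (11)
    def m_star(x):
        t = np.array([sq_exp(x, X[j], B) for j in range(n)])
        h = np.concatenate([[1.0], x])
        return h @ beta + t @ e
    return m_star

# Toy "expensive" model and a small grid design on [0, 1]^2.
f = lambda x: np.sin(3 * x[0]) + x[1] ** 2
X = np.array([[a, b] for a in np.linspace(0, 1, 5) for b in np.linspace(0, 1, 5)])
y = np.array([f(x) for x in X])
m = fit_emulator(X, y, B=np.array([25.0, 25.0]))
```

At the design points the predictor interpolates the observed runs; between them it reverts smoothly towards the linear trend h(x)^T β̂.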

The outputs corresponding to any set of inputs will now have a multivariate t-distribution, with the covariance between any two outputs given by Eq. (12). Note that the t-distribution arises as a marginal distribution for f(·) after integrating out the hyperparameters β and σ². In practice, further hyperparameters, the smoothness parameters B, are associated with the modelling of the correlation function c(·,·). It is not practical to give B a fully analytical Bayesian treatment, since it is generally impossible to integrate the posterior distribution analytically with respect to these parameters. The simplest option is to keep B fixed. Another is to integrate the posterior distribution numerically, in particular by Markov chain Monte Carlo (MCMC) sampling, although this is computationally very intensive. A pragmatic and robust approach is simply to estimate the hyperparameters of c(·,·) from the posterior distribution and then substitute these estimates into c(·,·) wherever they appear in the above formulae (see [19]). These estimates can be obtained by the posterior mode approach or by cross-validation [23]. The GEM-SA package can estimate the smoothness parameters using both methods; we give more details in the next section.

A. Daneshkhah, T. Bedford / Reliability Engineering and System Safety 112 (2013) 82–93

4.2. Inference for main effects and interactions

In this section, we explain how inferences about the sensitivity measures introduced in the preceding section can be obtained from the Gaussian process posterior distribution derived in Section 4.1. One of the key insights in [22] is that inference about f can be used to obtain inference about the main and interaction effects of f, because these are simply linear functionals of f. In particular, when the posterior distribution for f is t_{d+n} after standardising as in (10), so is the resulting posterior for the main and interaction effects. Specifically, if the posterior mean for f is given by Eq. (11), then for

    E(Y | x_p) = ∫_{χ_{−p}} f(x) dG_{−p|p}(x_{−p} | x_p)    (13)

(recalling that χ_{−p} denotes the input space associated with x_{−p}, and G_{−p|p}(x_{−p} | x_p) is the conditional distribution of x_{−p} given x_p under G), the posterior mean of this quantity can be written as

    E_post{E(Y | x_p)} = R_p(x_p) β̂ + T_p(x_p) e,

where

    R_p(x_p) = ∫_{χ_{−p}} h(x)^T dG_{−p|p}(x_{−p} | x_p),

    T_p(x_p) = ∫_{χ_{−p}} t(x)^T dG_{−p|p}(x_{−p} | x_p)

and e = A^{−1}(y − H β̂).

The posterior mean of a main effect or interaction can be similarly obtained:

    E_post{z_i(x_i)} = {R_i(x_i) − R} β̂ + {T_i(x_i) − T} e,

    E_post{z_{i,j}(x_{i,j})} = {R_{i,j}(x_{i,j}) − R_i(x_i) − R_j(x_j) + R} β̂
                               + {T_{i,j}(x_{i,j}) − T_i(x_i) − T_j(x_j) + T} e.

In a similar way, we can derive the standard deviations of themain effects and interactions, see [22] for the details of thiscomputation.

We can plot the posterior mean of the main effect EposðziðxiÞÞ

against xi, with bounds of, for example, plus and minus twoposterior standard deviations. By standardising the input vari-ables we are able to draw EposðziðxiÞÞ for i¼1,y,d on a single plotgiving a good graphical summary of the influence of each variable.We will present this plot for the examples given in Section 5.
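Once an emulator is available, the conditional expectations behind these plots are cheap to approximate by plain averaging of the emulator mean over the remaining inputs. A minimal sketch: the stand-in `m_star` below plays the role of the posterior mean (11), and the inputs are assumed independent U(0, 1) under G (both assumptions are illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the posterior mean m*(x) of a fitted emulator; in practice
# this would be Eq. (11). Inputs assumed independent U(0, 1) under G.
m_star = lambda x: np.sin(3 * x[..., 0]) + x[..., 1] ** 2

def main_effect(i, xi_grid, d=2, n_mc=5000):
    """Approximate z_i(x_i) = E(Y | x_i) - E(Y): fix input i at each grid value
    and average the emulator mean over the remaining inputs drawn from G."""
    X = rng.random((n_mc, d))
    overall = m_star(X).mean()                 # estimate of E(Y)
    z = []
    for v in xi_grid:
        Xv = X.copy()
        Xv[:, i] = v                           # condition on x_i = v
        z.append(m_star(Xv).mean() - overall)  # E(Y | x_i = v) - E(Y)
    return np.array(z)

grid = np.linspace(0.0, 1.0, 21)
z1 = main_effect(0, grid)   # main effect of the first input
z2 = main_effect(1, grid)   # main effect of the second input
```

Plotting z1 and z2 against the standardised grid gives exactly the kind of single-figure summary described above.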

Direct posterior inference for the variance-based measures introduced in Section 2.2, V_i and V_{T_i}, is more complex, as these measures are quadratic functionals of f(·); we refer to [22] for a discussion of ways of dealing with this.

5. Application in reliability analysis

We present five examples in which we examine the sensitivity of quantities of interest in reliability analysis with respect to changes in the relevant parameters. We are particularly interested in the sensitivity of the availability function with respect to uncertainty in the parameters of the failure and repair distributions at given times of interest.

In the first two examples, we consider a one-component repairable system: in the first example the failure and repair distributions are assumed to be exponential, while in the second example these distributions are Weibull. These two examples are discussed in detail in [10], and we only briefly report their findings in Section 5.1. We then move to repairable systems with more than one component. We are interested in parallel systems, such as a two-component standby-redundancy system or a three-component redundancy system; the failure and repair rates of each component are assumed to be constant, with exponentially distributed failure and repair times. These two examples are given in Section 5.2. In the last example, given in Section 5.3, we study a more complex system, the move-drive system. This system consists of eight components with constant repair rates and time-dependent failure rates, the failure times following a Weibull distribution.

To build the emulator of the system availability required to conduct the sensitivity analysis, we need to evaluate this availability at selected parameter points at time t. In the following sections, we briefly present the methods used to evaluate the system availability at the given parameters. There are no restrictions on the values of the inputs selected to build the emulator, but a good emulator (one presenting small uncertainties from relatively small numbers of runs) usually requires a well-spaced set of inputs covering the range over which emulation is required. We use maximin Latin hypercube or LP-tau designs [29] to generate an efficient set of data points. Suppose we wish to draw n random values of x = (x_1, …, x_d). For i = 1, …, d we divide the sample space of x_i into n regions of equal marginal probability. We then draw one random value of x_i from each region, giving realisations we call x_{i1}, …, x_{in}. To obtain one random value of x we sample without replacement from the values x_{i1}, …, x_{in} for i = 1, 2, …, d. This ensures that each dimension of the input space is fully represented: the Latin hypercube sampling scheme covers the whole sample space of x without requiring a very large sample size. After building the emulator, it is trivial to calculate main effects and joint effects to aid model interpretation and model checking as part of a sensitivity analysis, when direct evaluation of the model output is too costly.
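The construction just described (equal-probability strata per dimension, one draw per stratum, random pairing across dimensions) can be sketched as follows for the unit hypercube; the maximin and LP-tau refinements of [29] are omitted:

```python
import numpy as np

def latin_hypercube(n, d, rng=np.random.default_rng(0)):
    """Draw n points in [0, 1]^d: each margin is split into n equal-probability
    strata, one value is drawn per stratum, and the per-dimension draws are then
    randomly permuted (i.e. sampled without replacement) to pair them up."""
    samples = np.empty((n, d))
    for i in range(d):
        strata = (np.arange(n) + rng.random(n)) / n   # one draw per stratum
        samples[:, i] = rng.permutation(strata)        # shuffle the pairing
    return samples

X = latin_hypercube(10, 3)
```

For non-uniform marginals, each column can be pushed through the corresponding inverse CDF, preserving the one-point-per-stratum property.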

5.1. One-component repairable system

We assume that the systems under study here and later are repairable, and that at any given time a component is either functioning normally or failed (and under repair), so that the component state changes as time evolves. We further assume that repairs restore the component to a condition as good as new. Thus, the process consists of alternating repetitions of the repair-to-failure and the failure-to-repair processes.

The availability at time t, denoted by A(t), is the probability that the component is functioning at time t. The availability of a system provides a measure of the readiness of the system for use at any instant of time. The availability analysis of a system requires knowledge of the following aspects:

- the system configuration, describing how the components are functionally connected;
- the failure process of the components;
- the method of operation and the definition of failure of the system;
- the repair or maintainability policy.

A variety of failure and repair time distributions can be used in availability analysis; two well-known choices are the exponential and the Weibull. We examined the emulator-based sensitivity analysis of the availability when the failure and repair time distributions are either both exponential or both Weibull [10]. In the case where the failure and repair rates are both constant, with exponentially distributed failure and repair times, we showed that the system availability becomes more sensitive to the repair rate as time increases from t = 0.01 to t = 1000 (for t ≥ 1000, the steady-state availability is obtained), the corresponding variance contribution increasing from 0.1% to 84.71%. At early times the system is clearly more sensitive to the failure rate: the percentage variance contribution of its main effect was 98.84% at t = 0.01 and decreased to 14.51% as time increased to t = 1000 (the details can be found in [10]).
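This early/late crossover can be checked cheaply in the exponential one-component case, where availability has the standard closed form A(t) = μ/(λ+μ) + λ/(λ+μ) e^{−(λ+μ)t}. The sketch below estimates first-order Sobol indices by brute-force Monte Carlo; the uniform parameter ranges are illustrative, not those used in [10].

```python
import numpy as np

rng = np.random.default_rng(2)

def availability(lam, mu, t):
    """Closed-form availability of a one-component Markov repairable system
    with constant failure rate lam and repair rate mu, starting as good as new."""
    s = lam + mu
    return mu / s + (lam / s) * np.exp(-s * t)

# Illustrative uniform ranges for the distribution G of the parameters.
sample_lam = lambda n: rng.uniform(0.9, 1.1, n)
sample_mu = lambda n: rng.uniform(2.0, 6.0, n)

def first_order_indices(t, n=20000):
    """Brute-force first-order Sobol indices (S_lam, S_mu) at time t, via the
    correlation estimator Cov(Y, Y') / Var(Y), where Y' shares one input with Y."""
    lam, mu = sample_lam(n), sample_mu(n)
    y = availability(lam, mu, t)
    y_lam = availability(lam, sample_mu(n), t)   # mu resampled, lam shared
    y_mu = availability(sample_lam(n), mu, t)    # lam resampled, mu shared
    var_y = y.var()
    s_lam = (np.mean(y * y_lam) - y.mean() * y_lam.mean()) / var_y
    s_mu = (np.mean(y * y_mu) - y.mean() * y_mu.mean()) / var_y
    return s_lam, s_mu

S_early = first_order_indices(t=0.01)    # failure rate dominates
S_late = first_order_indices(t=100.0)    # repair rate dominates near steady state
```

Note that this brute force is only affordable because A(t) is available in closed form here; for the systems below, the emulator replaces such direct evaluation.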

In the Weibull case, the main-effect plots suggest that as time increases, for a fixed scale parameter of the time-to-failure distribution, the system availability becomes more sensitive to the scale parameter of the repair density, the variance contribution of its main effect varying from 87% to 91%.

The computation of the availability at the selected parameter values is trivial in the first case but, as expected, evaluating the system availability at the selected parameter values when the failure and repair rates are time-dependent is not an easy task and requires a numerical method. In the next sections, we show that the emulator-based sensitivity measures introduced in this paper can be easily calculated for systems with a larger number of components, greater complexity, and more parameters to be estimated. In particular, we first focus on parallel systems and then study a real system with greater complexity and more parameters.

The systems investigated here have been selected so that the availability function can, in principle, be calculated by numerical integration and without Monte Carlo simulation.

5.2. Availability of parallel systems

Consider a parallel system consisting of two components A and B. For this system, there are three possible states:

1. State 0: both components operating.
2. State 1: one component operating and the other under repair.
3. State 2: both components under repair.

Fig. 3. Estimated main effects for the availability of the system described in (14) at t = 10, based on 10 design points.

Table 1. The emulator-based sensitivity analysis of the system availability based on 10, 20 and 40 design points at t = 10.

Measurement          10 points   20 points   40 points
σ̂²                   0.219781    0.0759006   0.0351314
A_Em(10)             0.995543    0.995527    0.995523
EmStd(A(10))         0.0017      0.001634    0.00163
Variance (%) of μ    96.03       95.1        95.44
Variance (%) of λ    3.05        4.27        3.94
Total variance (%)   99.0767     99.3671     99.3733

We first calculate the availability of this system under two maintenance policies: one repair person, and two repair persons. In the first case there is only a single repair person to service the two components. The availability of this system at any time t can be obtained by solving the following differential equations:

    dP_0(t)/dt = −2λ P_0(t) + μ P_1(t),
    dP_1(t)/dt = 2λ P_0(t) − (λ + μ) P_1(t) + μ P_2(t),
    dP_2(t)/dt = λ P_1(t) − μ P_2(t),    (14)

where P_0(t) is the probability that both components are operating at time t, P_1(t) is the probability that one component is operating and the other is under repair at time t, and P_2(t) is the probability that both components are under repair at time t, so that P_0(t) + P_1(t) + P_2(t) = 1; λ and μ denote the common failure rate and repair rate, respectively, per unit time of the components.

These differential equations can easily be solved numerically (for example, in MATLAB) under the initial conditions P_0(0) = 1, P_1(0) = 0, P_2(0) = 0, corresponding to the system starting off "as good as new".

Since States 0 and 1 constitute the operation of the system, the availability of the system at time t is given by

    A(t) = P_0(t) + P_1(t).
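For concreteness, Eqs. (14) and A(t) = P_0(t) + P_1(t) can be evaluated with an off-the-shelf ODE solver. A sketch assuming SciPy is available; the rates λ = 0.0075 and μ = 0.025 are illustrative values from the ranges shown in Fig. 3, not prescribed by the text.

```python
import numpy as np
from scipy.integrate import solve_ivp

def one_repair_person(t, P, lam, mu):
    """Right-hand side of Eqs. (14): two identical components, one repair person."""
    P0, P1, P2 = P
    return [-2 * lam * P0 + mu * P1,
            2 * lam * P0 - (lam + mu) * P1 + mu * P2,
            lam * P1 - mu * P2]

lam, mu = 0.0075, 0.025          # illustrative rates
t_eval = np.linspace(0.0, 10.0, 101)
sol = solve_ivp(one_repair_person, (0.0, 10.0), [1.0, 0.0, 0.0],
                args=(lam, mu), t_eval=t_eval, rtol=1e-8, atol=1e-10)

# States 0 and 1 constitute system operation, so A(t) = P0(t) + P1(t).
A = sol.y[0] + sol.y[1]
```

Each emulator design point simply corresponds to one such solve with a different (λ, μ) pair.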

If there are two repair persons, one repair person can be assigned to each component and the availability of this system can be obtained by solving the following differential equations:

    dP_0(t)/dt = −2λ P_0(t) + μ P_1(t),
    dP_1(t)/dt = 2λ P_0(t) − (λ + μ) P_1(t) + 2μ P_2(t),
    dP_2(t)/dt = λ P_1(t) − 2μ P_2(t).    (15)

To examine the sensitivity of A(t) at time t with respect to uncertainties in λ and μ, we need to calculate the main effects, their plots, and the other sensitivity measures described in Section 4. Plots of main effects provide a cheap and effective tool for examining which parameters A(t) is significantly sensitive to, and the nature of the possible relationships between A(t) and the parameters.


Fig. 4. Transition diagram for cold, warm, or hot standby. (a) Diagram for cold standby. (b) Diagram for warm or hot standby.

Fig. 5. Estimated main effects for the system availability of the standby-redundancy system presented in Eqs. (16) at t = 10, obtained based on 10 points.


Fig. 3 illustrates the estimated main effects, E(A(t)|λ) and E(A(t)|μ), defined in the general case in (13) and approximated as described in the previous section, for the system availability based on 10 design points at t = 10. Increasing the number of design points clearly reduces the uncertainty of the fitted emulator (see Table 1). The thickness of the band in Fig. 3 shows the emulator uncertainty associated with each input. For instance, the point at μ = 0.025 shows the expected value of the output obtained by averaging over λ; fixing λ at its central value and simply comparing μ = 0.01 with μ = 0.05 would underestimate the influence of this input. In other words, the emulator allows small groups of inputs to vary while the others are held at their default values. Comparing the thickness of the main-effect plots of λ and μ, we conclude that there is more uncertainty about μ than about λ. The sensitivity of the system availability A(t) with respect to changes in μ and λ is shown in Table 1 for different numbers of design points. The system availability is more sensitive to changes in μ regardless of the number of design points, with μ covering more than 95% of the output variance. The table also shows that the emulator standard error, denoted EmStd(A(10)), decreases as the number of design points increases; this standard error reflects the uncertainty in the model output induced by input/parameter uncertainty (the details can be found in [22]). Furthermore, the estimate of the emulator parameter σ², obtained by maximising the emulator posterior distribution, is also given in the table. The value of 1/σ̂² can be regarded as the precision of the fitted Gaussian process required to implement the sensitivity and uncertainty analyses; the precision clearly increases with the number of design points.

The variance percentages in this table enable us to conduct a variance-based sensitivity analysis to identify which uncertain parameters drive the uncertainty in the system availability. From the table, 96%, 95.1% and 95.4% of the total variance (the main effect, as explained in Section 4) is explained by μ based on 10, 20 and 40 points, respectively, while the main effects of λ account for only 3%, 4.3% and 4% for 10, 20 and 40 points, respectively. We can therefore conclude that the system availability is more sensitive to μ, and that 10 points are enough to examine the sensitivity of the system availability. We take a product of uniform distributions as the probability distribution G(·) of the parameters in this example and in the other examples in the rest of this paper; we also checked a product of normal distributions for G(·) and obtained similar results.

We obtain results similar to those above for the two-repair-person system presented in (15).

We now study the emulator-based sensitivity analysis of the availability of a more complex system with respect to changes in its parameters. The availability of the following standby-redundancy system is studied in [20]. The behaviour of this system, consisting of two components A and B, is shown in Fig. 4. Each rectangle represents a redundancy state: rectangle 1 represents a state where component B is in standby and A is operating; similarly, rectangle 4 represents the event that component B is operating and A is under repair. Possible state transitions are shown in the same figure. The warm or hot standby has transitions from state 1 to 3 or from state 2 to 4, whereas the cold standby does not. For warm or hot standby, the standby component fails with constant failure rate η. For hot standby, η is equal to λ, the failure rate of the principal components; for cold standby, η = 0. The warm standby (0 ≤ η ≤ λ) thus has as special cases the hot standby (η = λ) and the cold standby (η = 0). Two or fewer components can be repaired at a time, and each component has a constant repair rate μ. In all cases, the system fails when it enters state 5.

Fig. 6. Estimated main effects for the system availability of the standby-redundancy system presented in Eqs. (16) at t = 10, obtained based on 30 points.

Table 2. The emulator-based sensitivity analysis of the system availability of the standby-redundancy system presented in Eqs. (16) at t = 10, based on 10 and 30 points.

Measurement          10 points     30 points
σ̂²                   0.935836      0.0881324
A_Em(10)             0.996918      0.996943
EmStd(A(10))         1.2255e−005   4.5408e−006
Variance (%) of λ    82.6633       84.7421
Variance (%) of η    4.66386       3.2605
Variance (%) of μ    12.0513       10.6746
Total variance (%)   99.3785       98.6772

Table 3. The emulator-based sensitivity analysis of the system availability of the standby-redundancy system presented in Eqs. (16) at t = 500, based on 10 and 30 points.

Measurement          10 points   30 points
σ̂²                   9.75993     0.502583
A_Em(500)            0.951366    0.952061
EmStd(A(500))        0.0011      0.00055
Variance (%) of λ    11.5649     9.37515
Variance (%) of η    0.120892    0.575127
Variance (%) of μ    81.8986     85.711
Total variance (%)   93.5844     95.6613

We denote the probability that the redundant system is in state i at time t by P_i(t), with derivative dP_i(t)/dt. The availability of this system can be obtained by solving the following ordinary differential equations [20, pp. 428–430]:

    dP_(0)(t)/dt = −(λ + η) P_(0)(t) + μ P_(1)(t),
    dP_(1)(t)/dt = (λ + η) P_(0)(t) − (λ + μ) P_(1)(t) + 2μ P_(2)(t),
    dP_(2)(t)/dt = λ P_(1)(t) − 2μ P_(2)(t),    (16)

where P_(0)(t) = P_1(t) + P_2(t), P_(1)(t) = P_3(t) + P_4(t), P_(2)(t) = P_5(t), μ is the constant repair rate for each component, and λ and η are the constant failure rates of the principal and standby components, respectively, in the different situations explained above. Eqs. (16) can be solved numerically using the MATLAB commands mentioned above.

Figs. 5 and 6 illustrate the estimated main effects, E(A(t)|λ), E(A(t)|η) and E(A(t)|μ), for the system availability given by Eqs. (16) at t = 10, based on 10 and 30 design points, respectively. The thickness of the band in each plot shows the uncertainty associated with each input. For instance, there is more uncertainty about μ, and about larger values of η (0.004 < η < 0.005), based on 10 design points, as illustrated in Fig. 5. As expected, these uncertainties decrease as more design points are used to approximate the corresponding main effects. From these figures, we conclude that the system availability at the given time (t = 10) is most sensitive to changes in λ, and then to μ. This conclusion is shown in more detail in Table 2, which gives the sensitivity analysis of A(t) with respect to changes in λ, η and μ. The main effects of λ account for almost 83% and 85% of the variance based on 10 and 30 points, respectively, while μ covers only 12% and 11% of the total variance for 10 and 30 points, respectively. Changes in η have almost no influence on the uncertainty of the system availability. It is reasonable to conclude that at early times the system availability is mainly influenced by the failure rate.

For larger t, for example t = 500, the system availability becomes more sensitive to changes in μ. As shown in Table 3, the contribution of μ to the total variance is 82% based on 10 design points and 86% based on 30 points, while the availability at t = 500 is only slightly sensitive to changes in λ (its contribution to the variance is 11% and 9% for 10 and 30 points, respectively), the opposite of the results obtained above for smaller t (t = 10). It is thus reasonable to conclude that at late times the system availability is mainly influenced by the repair rate. In addition, regardless of the time at which the system availability is evaluated, increasing the number of points increases the precision of the fitted emulator, as measured by 1/σ̂². From Tables 2 and 3, the precision of the emulator-based sensitivity analysis presented above increases more than tenfold when the number of points is increased from 10 to 30.

The method used here shows that the availability is sensitive to different parameters at early and at late times, the late times corresponding more or less to the steady state. Note also that the calculations of the sensitivity indices and the availability are fairly robust to the number of points chosen to build the emulator model. Moreover, by determining the indices at different time points we could determine the intervals on which each parameter is the most important, or more generally the periods in which the sensitivity to the parameters remains broadly unchanged.

5.3. Move-drive system

We consider a more complex system: three special vehicles of the same kind, taken from Zhang and Horigome [34]. The purpose is to examine the performance of the move-drive system of the vehicles. The system consists of eight LWTSS (load wheel and torsional shaft suspension) components, which can be treated as being of the same kind. These eight components form a series system in logic. Since the components are identical, we aggregate the states by the following definition: State i means that i out of the eight components have failed, i = 0, 1, …, 8. Based on these considerations, the system has N = 9 states.


Fig. 7. System state-transition diagram of move-drive system.

Table 4. The possible ranges for t50 and t95/t50.

Component i   t50              t95/t50
1             (1480, 1500)     (3, 3.2)
2             (3750, 3850)     (4.2, 4.4)
3             (11400, 11500)   (4.2, 4.4)
4,5,6,7,8     (22700, 23000)   (4.2, 4.4)

According to the information given in [34] and their suggestion, it is not unreasonable to assume that the failure rate functions have the form λ_i(t) = k_i t^{α_i}. It is also assumed that there are at most three repair facilities for each vehicle, with the same proficiency in operation, and that the repair rate is constant (the repair rate for a failed component by one repair person is independent of time). Fig. 7 shows the resulting system state-transition diagram, from which Eq. (17) is derived:

    dP(t)/dt = T′_R(t) P(t),    (17)

where P(t) = (P_0(t), P_1(t), …, P_8(t))^T,

    T′_R(t) = ( c    μ    0    0    0    0    0    0    0
                λ_1  −μ   2μ   0    0    0    0    0    0
                λ_2  0    −2μ  3μ   0    0    0    0    0
                λ_3  0    0    −3μ  3μ   0    0    0    0
                λ_4  0    0    0    −3μ  3μ   0    0    0
                λ_5  0    0    0    0    −3μ  3μ   0    0
                λ_6  0    0    0    0    0    −3μ  3μ   0
                λ_7  0    0    0    0    0    0    −3μ  3μ
                λ_8  0    0    0    0    0    0    0    −3μ ),

    c ≡ −Σ_{i=1}^{8} λ_i,    (18)

with λ_i = k_i t^{α_i}. The initial condition for (17) is

    (P_0(0), P_1(0), …, P_8(0)) = (1, 0, …, 0).

No closed form for the system availability exists here, and Zhang and Horigome [34] used numerical methods to obtain it. Their method involves integrating convolution-type integrals and then solving a matrix differential equation of a linear time-varying system with the given initial condition (the details can be found in [34]).

In order to examine the sensitivity of the system availability with respect to changes in the parameters, we use the numerical method mentioned above to evaluate the system availability at the training points obtained from a maximin Latin hypercube design. To generate training points for the corresponding parameters, we first need to know the possible ranges of t50 and t95/t50 for each component; these can be obtained using expert judgement or the available failure data. By studying the system and the data available in [34], the ranges of t50 and t95/t50 proposed for each component are given in Table 4. We are then able to generate training points (200 design points) for t50 and t95/t50 over the ranges proposed in Table 4 and, using Eqs. (5) and (6), to obtain the corresponding parameter values {(k_i, α_i), i = 1, 2, …, 8}. In a similar way we obtain 200 design points for the repair rate, μ ∈ (0.02, 0.08). We can then use these values to calculate the system availability, as proposed in [34], at the requested time (t = 100).

Fig. 8 shows the main effects of the system availability A(t) for the parameters obtained as discussed above, with t = 100 and n = 100 training points used to fit the required Gaussian process. From this figure, and from the information on the main effects of the parameters given in Table 5, we can conclude that when 100 design points are used the availability is most sensitive to the uncertainties in μ, α_4 and k_1, respectively, and that when 200 training points are used the system availability is most sensitive to changes in μ and k_1.

This method enables us to revise the sensitivity analysis when the values of some of these parameters are known or fixed. For instance, with 200 design points, if the values of μ and k_1 were somehow known, the system availability given these parameters would be most sensitive to k_2 (with 29.95% of the variance contribution), α_3 (13.82%), k_4 (12.29%) and k_5 (12.70%). As a result, the total variance contribution drops to 94.8222%, while the estimated system availability increases to 0.960544.

Note that the emulator was constructed using 10 times as many points as before, because of the larger dimensionality of the parameter space.

6. Discussion

An alternative sensitivity analysis of quantities of interest in reliability analysis, such as the availability/unavailability function, with respect to changes in uncertain parameters has been presented. The method was originally introduced in [22] to examine the sensitivity of a complex model to changes in its inputs, based on an emulator built to approximate the model. It enables us to decompose the output variance into components representing main effects and interactions, and allows an effective sensitivity analysis to be achieved using a far smaller number of model runs than standard Monte Carlo methods. For example, Marseguerra et al. [21] employ a Monte Carlo method with 10^7 runs to compute the first-order differential sensitivity indices of the basic events characterising the reliability behaviour of a transport system, while our approach might require a few hundred runs to achieve the same accuracy, with broader sensitivity measures.

In the reliability context it is very useful to compute sensitivities to different parameters in order to understand which has the largest impact and therefore which has to be assessed most accurately. In our final example we showed that it was possible to assess this for a system with a more realistic number of parameters, and that only a small number of the parameters generated most of the uncertainty about the availability.

It is worth mentioning that the availability function considered here is a function of time and should therefore, in principle, be computed through time. In this paper, we present a method to examine the sensitivity of the system availability at certain times, such as a time shortly after starting the system or a time very close to the steady state. In [10], for a one-component repairable system, we examined the sensitivity analysis of the availability function at

Page 11: Probabilistic sensitivity analysis of system availability using Gaussian processes

[Fig. 8 panels: estimated main effects of A(100) plotted against each of the parameters k1, α1, k2, α2, k3, α3, k4, α4, k5, α5, k6, α6, k7, α7, k8, α8 and μ; axis values omitted.]

Fig. 8. Estimated main effects for the availability at t = 100, based on 200 design points, for the move-drive system given in (17).

Table 5
The emulator-based sensitivity analysis of the system availability of the move-drive system at t = 100, based on 100 and 200 design points.

Measurement            100 points   200 points
Variance (%) of k1     3.59         4.53
Variance (%) of α1     0.82         0.86
Variance (%) of k2     0.42         0.08
Variance (%) of α2     0.22         0.02
Variance (%) of k3     0.87         0.61
Variance (%) of α3     0.85         0.51
Variance (%) of k4     0.56         0.17
Variance (%) of α4     9.53         0.53
Variance (%) of k5     1.57         0.12
Variance (%) of α5     0            0
Variance (%) of k6     1.15         0.14
Variance (%) of α6     0            0
Variance (%) of k7     1.56         0.12
Variance (%) of α7     0.02         0
Variance (%) of k8     1.13         0.20
Variance (%) of α8     0.02         0
Variance (%) of μ      76.94        91.57
Total variance (%)     99.2358      99.4702
A_Em(100)              0.907689     0.905833
EmStd(A(100))          0.00034      0.00031

A. Daneshkhah, T. Bedford / Reliability Engineering and System Safety 112 (2013) 82–9392

several points in time. There is certainly a case for studying the sensitivity through time. When the dynamic behaviour of the reliability quantity is of interest, variants of the approaches used above can be employed. Conti and O'Hagan [8] suggested the following procedures to emulate a dynamic simulator:

1. The first method uses a multi-output emulator, developed in [8], where the dimension of the output space is q = T and the outputs of the dynamic model are represented by y = (y_1, ..., y_T). This approach emulates all time points simultaneously.

2. The second approach uses a single-output emulator, following an idea originally mentioned by Kennedy and O'Hagan [19] to analyse spatial outputs. Time is treated as an extra input to the model, and the output y_t = f_t(x) can then be represented as f*(x, t), where t = 1, ..., T.

3. The third approach is to emulate the T outputs separately, each via a single-output emulator. Data for the t-th emulator would then be provided by the corresponding column of the data set.

Any of these methods can be used to emulate A(t), and the fitted emulator can then be used to examine the sensitivity of A(t) (and other quantities of interest in reliability analysis) over time.
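The second approach, treating time as an extra input, is the simplest to sketch. In the snippet below the exact one-component availability A(t) = μ/(λ+μ) + λ/(λ+μ)·exp(-(λ+μ)t) stands in for an expensive simulator, and the parameter and time ranges are illustrative assumptions rather than values from the paper.

```python
# Hedged sketch of the time-as-input approach: a single GP emulates
# f*(x, t) over (lam, mu, t). The closed-form one-component availability
# below is a stand-in for an expensive simulator; all ranges are assumed.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(1)

def avail(lam, mu, t):
    s = lam + mu
    return mu / s + (lam / s) * np.exp(-s * t)

# Training design scattered over (lam, mu, t): one emulator for all times.
n = 120
lam = rng.uniform(0.01, 0.1, n)
mu = rng.uniform(0.5, 2.0, n)
t = rng.uniform(0.0, 100.0, n)
X = np.column_stack([lam, mu, t])
gp = GaussianProcessRegressor(ConstantKernel() * RBF([0.03, 0.5, 30.0]),
                              normalize_y=True).fit(X, avail(lam, mu, t))

# The single fitted emulator now predicts a whole availability trajectory
# for fixed (lam, mu), enabling sensitivity analysis of A(t) through time.
times = np.linspace(5.0, 95.0, 10)
traj = gp.predict(np.column_stack([np.full(10, 0.05),
                                   np.full(10, 1.0),
                                   times]))
```

The design choice here is that a single length-scale hyperparameter governs smoothness in t; when A(t) changes character sharply (e.g. near start-up), the multi-output or separate-emulator approaches may be preferable.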

Buzzard [5] proposed an alternative method in which sparse grid interpolation is used to provide good approximations to smooth functions in high dimensions from relatively few function evaluations. He used an efficient conversion from the interpolating polynomial provided by evaluations on a sparse grid to a representation in terms of orthogonal polynomials, and showed how to use these relatively few function evaluations to estimate several types of sensitivity coefficients and to provide estimates of local minima and maxima. Applying Buzzard's method to the sensitivity analyses of interest in this paper, and comparing it with the emulator-based method proposed here, is left as future work.
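Buzzard's construction uses sparse grids, but the underlying link between orthogonal-polynomial coefficients and Sobol indices can be illustrated more simply with an ordinary least-squares polynomial chaos fit. Everything below (the toy two-input function, the degree, and the sample size) is an assumption for illustration, not Buzzard's algorithm itself.

```python
# Sketch: fit a tensorized Legendre expansion by least squares, then read
# Sobol variance shares directly off the squared coefficients. For inputs
# uniform on (-1, 1), E[P_i^2] = 1/(2i+1), so each term's variance
# contribution is available in closed form. Toy model f is assumed.
import numpy as np
from numpy.polynomial import legendre as L

rng = np.random.default_rng(2)

def f(x1, x2):                           # illustrative test function
    return x1 + 0.5 * x2**2 + 0.2 * x1 * x2

deg = 3
x = rng.uniform(-1, 1, (400, 2))
y = f(x[:, 0], x[:, 1])

# Design matrix of tensorized Legendre polynomials P_i(x1) * P_j(x2).
terms = [(i, j) for i in range(deg + 1) for j in range(deg + 1)]
Phi = np.column_stack([L.legval(x[:, 0], np.eye(deg + 1)[i]) *
                       L.legval(x[:, 1], np.eye(deg + 1)[j])
                       for i, j in terms])
c, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# Variance share of each non-constant term, then first-order indices.
var = {ij: c[k]**2 / ((2 * ij[0] + 1) * (2 * ij[1] + 1))
       for k, ij in enumerate(terms) if ij != (0, 0)}
total = sum(var.values())
S1 = sum(v for (i, j), v in var.items() if j == 0) / total   # x1 alone
S2 = sum(v for (i, j), v in var.items() if i == 0) / total   # x2 alone
```

Because the toy f lies exactly in the degree-3 span, the fit recovers the analytic indices S1 = (1/3)/0.36 ≈ 0.926 and S2 = (1/45)/0.36 ≈ 0.062, with the remainder attributable to the x1·x2 interaction.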

Having a solution to the computational problem of computing sensitivity for large-scale models, however, does not make these models easily usable in a practical context. The uses for this kind of analysis are, in our view, twofold:

- A high-level screening of reliability parameters to identify which parameters are the most significant, and hence to reduce the number of parameters being considered.
- An exploratory approach to look at gross model sensitivities.

The second point may require us to reparameterise our reliability models to insert extra parameters that represent model features. For example, when a material is being used in a novel context, there may be a lot of uncertainty about the time at which aging begins. Hence instead of (say) a Weibull model, we could use a distribution which includes a specific time to onset of aging. Similarly, when investigating the impact of possible logistical delays, extra parameters can be introduced to model the repair delay.



Finally, it is clear that by modelling the costs associated with the repair process we can compute the uncertainties in costs up to time t alongside the availability.

The approach given above assumes that we are able to determine a distribution G for the model parameters. Typically these distributions should come from generic databases for new equipment, or be informed by industrial data records where similar equipment is used in a similar context. The uncertainty represents state-of-knowledge uncertainty about the parameter values; uncertainty which, if we learn from operating experience, will be reduced through time. Hence the methods discussed here are most appropriate in the design phase for new equipment and the planning stage for maintenance, when these state-of-knowledge uncertainties are greatest.

Acknowledgements

This research was supported by the Engineering and Physical Sciences Research Council (Grant EP/E018084/1). The authors also wish to acknowledge discussions with Dr. John Quigley.

References

[1] Bedford T. Sensitivity indices for (tree)-dependent variables. In: Chan K, Tarantola S, Campolongo F, editors. SAMO'98, Proceedings of the second international symposium on sensitivity analysis of model output. EUR 17758 EN, JRC-EC, Ispra, 1998. p. 17–20.

[2] Bedford T, Cooke R. Probabilistic risk analysis: foundations and methods. Cambridge: Cambridge University Press; 2001.

[3] Borgonovo E. A new uncertainty importance measure. Reliability Engineering and System Safety 2007;92(6):771–84.

[4] Borgonovo E, Castaings W, Tarantola S. Model emulation and moment-independent sensitivity analysis: an application to environmental modelling. Environmental Modelling and Software 2012;34:105–15.

[5] Buzzard GT. Global sensitivity analysis using sparse grid interpolation and polynomial chaos. Reliability Engineering and System Safety 2012;107:82–9.

[6] Caniou Y, Sudret B. Distribution-based global sensitivity analysis using polynomial chaos expansions. Procedia - Social and Behavioral Sciences 2010;2(6):7625–6.

[7] Caniou Y, Sudret B. Distribution-based global sensitivity analysis in case of correlated input parameters using polynomial chaos expansions. In: Nishijima K, editor. Applications of statistics and probability in civil engineering. CRC Press; 2011. p. 695–702.

[8] Conti S, O'Hagan A. Bayesian emulation of complex multi-output and dynamic computer models. Journal of Statistical Planning and Inference 2010;140:640–51.

[9] Cukier RI, Fortuin CM, Shuler KE, Petschek AG, Schaibly JH. Study of the sensitivity of coupled reaction systems to uncertainties in rate coefficients. I. Theory. Journal of Chemical Physics 1973;59(8):3873–8.

[10] Daneshkhah A, Bedford T. Sensitivity analysis of a reliability system using Gaussian processes. In: Bedford T, Quigley J, Walls L, Alkali B, Daneshkhah A, Hardman G, editors. Advances in mathematical modeling for reliability. Amsterdam: IOS Press; 2008. p. 46–62.

[11] Gustafson P. The local sensitivity of posterior expectations. Unpublished PhD thesis. Department of Statistics, Carnegie Mellon University; 1994.

[12] Helton JC. Treatment of uncertainty in performance assessments for complex systems. Risk Analysis 1994;14(4):483–511.

[13] Helton JC. Uncertainty and sensitivity analysis in the presence of stochastic and subjective uncertainty. Journal of Statistical Computation and Simulation 1997;57(1–4):3–76.

[14] Helton JC, Davis FJ. Latin hypercube sampling and the propagation of uncertainty in analyses of complex systems. Reliability Engineering and System Safety 2003;81(1):23–69.

[15] Helton JC, Johnson JD, Sallaberry CJ, Storlie CB. Survey of sampling-based methods for uncertainty and sensitivity analysis. Reliability Engineering and System Safety 2006;91(10–11):1175–209.

[16] Helton JC, Sallaberry CJ. Yucca Mountain 2008 performance assessment: incorporation of seismic hazard curve uncertainty. In: Proceedings of the 13th international high-level radioactive waste management conference (IHLRWMC), Albuquerque, NM, April 10–14, 2011. La Grange Park, IL: American Nuclear Society; 2011. p. 1041–8.

[17] Hoffman FO, Hammonds JS. Propagation of uncertainty in risk assessments: the need to distinguish between uncertainty due to lack of knowledge and uncertainty due to variability. Risk Analysis 1994;14(5):707–12.

[18] Homma T, Saltelli A. Importance measures in global sensitivity analysis of model output. Reliability Engineering and System Safety 1996;52:1–17.

[19] Kennedy MC, O'Hagan A. Bayesian calibration of computer models (with discussion). Journal of the Royal Statistical Society: Series B 2001;63:425–64.

[20] Kumamoto H, Henley EJ. Probabilistic risk assessment and management for engineers and scientists. Piscataway, NJ: IEEE Press; 1996.

[21] Marseguerra M, Zio E, Podofillini L. First-order differential sensitivity analysis of a nuclear safety system by Monte Carlo simulation. Reliability Engineering and System Safety 2005;90:162–8.

[22] Oakley JE, O'Hagan A. Probabilistic sensitivity analysis of complex models: a Bayesian approach. Journal of the Royal Statistical Society: Series B 2004;66:751–69.

[23] O'Hagan A, Kennedy M, Oakley JE. Uncertainty analysis and other inference tools for complex computer codes (with discussion). In: Bernardo JM, Berger JO, Dawid AP, Smith AFM, editors. Bayesian statistics, vol. 6. Oxford, UK: Oxford University Press; 1999. p. 503–24.

[24] Paté-Cornell ME. Uncertainties in risk analysis: six levels of treatment. Reliability Engineering and System Safety 1996;54(2–3):95–111.

[25] Parry GW. The characterization of uncertainty in probabilistic risk assessments of complex systems. Reliability Engineering and System Safety 1996;54(2–3):119–26.

[26] Ratto M, Castelletti A, Pagano A. Emulation techniques for the reduction and sensitivity analysis of complex environmental models. Environmental Modelling and Software 2012;34:1–4.

[27] Ratto M, Pagano A. Using recursive algorithms for the efficient identification of smoothing spline ANOVA models. Advances in Statistical Analysis 2010;94:367–88.

[28] Reedijk CI. Sensitivity analysis of model output: performance of various local and global sensitivity measures on reliability problems. Master's thesis, Delft University of Technology; 2000.

[29] Sacks J, Welch WJ, Mitchell TJ, Wynn HP. Design and analysis of computer experiments. Statistical Science 1989;4(4):409–35. With comments and a rejoinder by the authors.

[30] Saltelli A, Chan KPS, Scott M, editors. Sensitivity analysis. New York: Wiley; 2000.

[31] Saltelli A, Tarantola S, Chan KPS. A quantitative model-independent method for global sensitivity analysis of model output. Technometrics 1999;41(1):39–56.

[32] Sobol I. Sensitivity analysis for nonlinear mathematical models. Mathematical Modelling and Computational Experiment 1993;1:407–14.

[33] Sudret B. Global sensitivity analysis using polynomial chaos expansions. Reliability Engineering and System Safety 2008;93(7):964–79.

[34] Zhang T, Horigome M. Availability and reliability of system with dependent components and time-varying failure and repair rates. IEEE Transactions on Reliability 2001;50(2):151–8.

