
On Model Validation Techniques

Alex Karagrigoriou, University of Cyprus

"Quality - Theory and Practice",

ORT Braude College of Engineering, Karmiel, May 2012

OUTLINE

• Introduction

• Graphical Methods

• Likelihood Method

• Kolmogorov Test

• Chi-Squared Tests

• Tests based on Measures

After fitting a distribution model to a data set when performing life data analysis, we are often interested in diagnosing the model's fit or comparing the fit of different distributions.

In addition to the engineering knowledge that should always govern the choice of a distribution model, there are many statistical tools that can help in deciding whether or not a distribution model is a good choice from a statistical point of view.

These tools can also be used to compare the fit of different distributions.

Reliability Terms

•Mean Time To Failure (MTTF) for non-repairable systems

•Mean Time Between Failures for repairable systems (MTBF)

•Reliability Probability (survival) R(t)

•Failure Probability (cumulative distribution function) F(t)=1-R(t)

•Failure Probability Density f(t)

•Failure Rate (hazard rate) λ(t)

•Mean residual life (MRL)
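The terms above are linked by a few identities: F(t) = 1 - R(t), f(t) = dF/dt, λ(t) = f(t)/R(t), and MTTF is the integral of R(t) over [0, ∞). The following is a minimal sketch of these relations for an exponential lifetime. The function names are illustrative, and the rate 0.0111 is simply the value reported later for the exponential fit.

```python
import math

LAMBDA = 0.0111  # exponential failure rate; illustrative choice


def reliability(t, lam=LAMBDA):
    """R(t) = P(T > t) for an exponential lifetime."""
    return math.exp(-lam * t)


def failure_probability(t, lam=LAMBDA):
    """F(t) = 1 - R(t)."""
    return 1.0 - reliability(t, lam)


def density(t, lam=LAMBDA):
    """f(t) = dF/dt = lam * exp(-lam * t)."""
    return lam * math.exp(-lam * t)


def hazard(t, lam=LAMBDA):
    """lambda(t) = f(t) / R(t); constant for the exponential model."""
    return density(t, lam) / reliability(t, lam)


def mttf(lam=LAMBDA):
    """MTTF = integral of R(t) over [0, inf) = 1 / lam."""
    return 1.0 / lam


def mean_residual_life(t, lam=LAMBDA):
    """MRL(t) = E[T - t | T > t]; equals 1 / lam by memorylessness."""
    return 1.0 / lam
```

For other models (Weibull, lognormal) the same identities hold, but the hazard rate and the mean residual life are no longer constant in t.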

Time Distributions (Models) of the Failure Density

• Exponential Distribution

$f(t) = \lambda e^{-\lambda t}$

Very commonly used, even in cases to which it does not apply (simple);

Applications: Electronics, mechanical components etc.

• Normal Distribution

Very straightforward and widely used;

Applications: Electronics, mechanical components etc.

• Lognormal Distribution

$f(t) = \frac{1}{\sigma t \sqrt{2\pi}} \, e^{-\frac{(\ln t - \mu)^2}{2\sigma^2}}$

Very powerful and can be applied to describe various failure processes;

Applications: Electronics, material, structure etc.

• Weibull Distribution

$f(t) = \frac{\beta}{\eta} \left(\frac{t}{\eta}\right)^{\beta - 1} e^{-(t/\eta)^{\beta}}$

Very powerful and can be applied to describe various failure processes;

Applications: Electronics, mechanical components, material, structure etc.

Probability Plots – Graphical Validation

Probability plotting (e.g. Q-Q plot) is a graphical method that allows a visual assessment of the model fit.

Once the model parameters have been estimated, the probability plot can be created.

The next figure shows a comparison of the probability plots of the two choices (Weibull & Exponential) using the data set.
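As a rough illustration of how a Weibull probability plot is built: under a Weibull model, ln(-ln(1 - F(t))) is linear in ln t with slope β and intercept -β ln η. The sketch below computes the plot coordinates and a least-squares line. It assumes complete (uncensored) data and uses Benard's median-rank approximation (i - 0.3)/(n + 0.4) for the plotting positions; both choices are assumptions and may differ from what was used for the figure.

```python
import math


def median_ranks(n):
    """Benard's approximation to the median rank of the i-th ordered failure."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def weibull_plot_points(times):
    """Return (x, y) pairs with x = ln t and y = ln(-ln(1 - F))."""
    ordered = sorted(times)
    ranks = median_ranks(len(ordered))
    return [
        (math.log(t), math.log(-math.log(1.0 - f)))
        for t, f in zip(ordered, ranks)
    ]


def fit_line(points):
    """Ordinary least-squares slope and intercept through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def weibull_from_plot(times):
    """Read beta (slope) and eta (from the intercept) off the fitted line."""
    slope, intercept = fit_line(weibull_plot_points(times))
    return slope, math.exp(-intercept / slope)
```

Points that fall close to the fitted line suggest an adequate Weibull fit; systematic curvature suggests trying another model.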

Problems typical with reliability & survival data

Censoring (when the observation period ends, not all units have failed; some are survivors)

Lack of Failures (if there is too much censoring, even though a large number of units may be under observation, the information in the data is limited due to the lack of enough failures)

Practical difficulty when planning reliability assessment tests and analyzing failure data.

Type I Censoring – Right Censoring

n items are observed during a fixed time period [0, T]. The number of failures r is random. n-r items (also random) will be in operation (censored) at the end of the time period.

Also called "right censoring" since the failure times to the right (i.e., larger than T) are missing.

Type II Censoring

We run the test until we observe exactly r failures. The time period T is random. n-r units are in operation (nonrandom).

In Type II censoring we know in advance how many failure times we have - this helps when planning adequate tests.

However, an open-ended random test time is generally impractical from a management point of view and this type of testing is rarely seen.

Readout or Interval Censored Data

Sometimes exact times of failure are not known; only an interval of time in which the failure occurred is recorded.
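The difference between the two right-censoring schemes can be sketched by applying each to the same set of lifetimes. The helper names and the (time, failed) pair representation are illustrative choices, not part of the original material, and ties at the stopping time are not treated specially.

```python
def type_i_censor(lifetimes, T):
    """Observe each unit on [0, T]; units still running at T are censored.

    Returns (observed_time, failed) pairs. The number of failures is random.
    """
    return [(min(t, T), t <= T) for t in lifetimes]


def type_ii_censor(lifetimes, r):
    """Stop the test at the r-th failure; the other n - r units are censored.

    Returns (observed_time, failed) pairs. The stopping time is random.
    """
    ordered = sorted(lifetimes)
    stop = ordered[r - 1]
    failures = [(t, True) for t in ordered[:r]]
    survivors = [(stop, False)] * (len(ordered) - r)
    return failures + survivors


# Hypothetical lifetimes, for illustration only.
lifetimes = [12.0, 35.0, 47.0, 80.0, 150.0]
print(type_i_censor(lifetimes, T=50.0))
print(type_ii_censor(lifetimes, r=2))
```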

Likelihood Value

Use the MLE (Maximum Likelihood Estimation) method to estimate the parameters. Then, the likelihood value can be used to assess the fit:

The distribution with the largest L value is the best fit.

Table: Comparing the fit of two distributions using the log-likelihood value.

The log-likelihood value for the Weibull distribution is greater than that for the exponential distribution (i.e. the Weibull distribution is statistically a better fit).

Distribution Model      Weibull                  Exponential
Parameters              β = 3.03, η = 100.99     λ = 0.0111
Log-Likelihood Value    -48.42                   -55.04
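A minimal sketch of how such log-likelihood values can be computed for complete (uncensored) failure times is given below. The data set behind the table is not reproduced here, so the failure times in the usage lines are hypothetical and the output will not match the table. Only the exponential MLE is computed in closed form; Weibull parameters are taken as given.

```python
import math


def exponential_loglik(times, lam):
    """Sum of ln f(t) with f(t) = lam * exp(-lam * t)."""
    return sum(math.log(lam) - lam * t for t in times)


def weibull_loglik(times, beta, eta):
    """Sum of ln f(t) with f(t) = (beta/eta) (t/eta)^(beta-1) exp(-(t/eta)^beta)."""
    return sum(
        math.log(beta / eta) + (beta - 1.0) * math.log(t / eta) - (t / eta) ** beta
        for t in times
    )


def exponential_mle(times):
    """Closed-form MLE of the exponential rate for complete data."""
    return len(times) / sum(times)


# Hypothetical failure times, for illustration only.
times = [42.0, 61.0, 75.0, 88.0, 97.0, 110.0, 126.0, 140.0]
lam_hat = exponential_mle(times)
print(exponential_loglik(times, lam_hat))
print(weibull_loglik(times, beta=3.0, eta=100.0))
```

With β = 1 and η = 1/λ the Weibull log-likelihood coincides with the exponential one, which gives a quick sanity check.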

Modified Kolmogorov-Smirnov (KS) Test

The standard KS test is used for continuous distributions with known parameters. The Modified KS test is used when the parameters are unknown and need to be estimated.

For N failure times $t_1, \ldots, t_N$, we define $S_N(t)$ to be the empirical distribution function. The Modified KS test uses the maximum of the absolute difference between $S_N(t)$ and the fitted cumulative distribution function, $Q(t)$:

$D_{max} = \max_t |S_N(t) - Q(t)|$

The distribution of the Modified KS statistic under the null hypothesis (i.e. the data set is drawn from the fitted distribution) can be calculated.

The test returns the probability that $D_{CRIT} < D_{max}$. A high probability value, close to 1, indicates that there is a significant difference between the theoretical distribution and the data set.

The value for the Weibull distribution is smaller; thus the Weibull distribution is statistically a better fit.

Distribution Model      Weibull                  Exponential
Parameters              β = 3.03, η = 100.99     λ = 0.0111
P(D_CRIT < D_max)       14.84%                   89.58%
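The statistic itself, the largest absolute gap between the empirical distribution function and the fitted CDF, can be sketched as follows. Because the empirical distribution function jumps at each observation, the gap is checked just before and just after every jump. This sketch computes only the statistic; the probability reported in the table requires the null distribution of the Modified KS statistic, which is not attempted here. The function names are illustrative.

```python
import math


def ks_statistic(times, cdf):
    """Maximum absolute difference between the empirical CDF and cdf."""
    ordered = sorted(times)
    n = len(ordered)
    d_max = 0.0
    for i, t in enumerate(ordered, start=1):
        q = cdf(t)
        # The empirical CDF equals (i - 1)/n just below t and i/n at t.
        d_max = max(d_max, i / n - q, q - (i - 1) / n)
    return d_max


def weibull_cdf(beta, eta):
    """Return the fitted Weibull CDF as a function of t."""
    return lambda t: 1.0 - math.exp(-((t / eta) ** beta))


# Usage with the fitted Weibull parameters and hypothetical failure times.
fitted = weibull_cdf(beta=3.03, eta=100.99)
print(ks_statistic([42.0, 61.0, 75.0, 88.0, 97.0, 110.0, 126.0, 140.0], fitted))
```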

Chi-Squared Test

The chi-squared test relies on the idea of grouping the data into a suitable number of intervals. Grouping involves a loss of information, and there is also often considerable arbitrariness in how the intervals are chosen. The optimal number k of intervals for a sample of size N may be estimated from Sturges' Rule, $k = 1 + \log_2 N$.

Let $N_i$ be the number of data points in the i-th interval and $n_i$ the expected number according to the fitted distribution. The chi-squared statistic is

$\chi^2 = \sum_{i=1}^{k} \frac{(N_i - n_i)^2}{n_i}$

A high probability value, close to 1, indicates that there is a significant difference between the theoretical distribution and the data set.

Table: Comparing two distributions using the chi-squared test

The value for the Weibull distribution is smaller (i.e. the Weibull distribution is statistically a better fit).


Distribution Model      Weibull                  Exponential
Parameters              β = 3.03, η = 100.99     λ = 0.0111
P(χ²_CRIT < χ²)         26.50%                   66.76%
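The two ingredients of the test, the number of intervals and the statistic, can be sketched as below. Sturges' rule is taken in its common form k = 1 + log2(N); rounding up to an integer is an assumption of this sketch. The statistic is the usual sum of squared differences between observed and expected counts, each divided by the expected count. The counts in the usage lines are hypothetical.

```python
import math


def sturges_bins(n):
    """Number of intervals suggested by Sturges' rule, rounded up."""
    return math.ceil(math.log2(n)) + 1


def chi_squared_statistic(observed, expected):
    """Sum over intervals of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical interval counts, for illustration only.
print(sturges_bins(60))
print(chi_squared_statistic([10, 20, 30], [20.0, 20.0, 20.0]))
```

The expected counts come from the fitted distribution: N times the fitted probability of each interval.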

Empirical model fitting – Distribution Free (Kaplan-Meier) approach

•No underlying model (Weibull, lognormal etc) is assumed

•K-M estimation is an empirical (non-parametric) procedure

•Exact times of failure are required
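A minimal sketch of the Kaplan-Meier (product-limit) estimate follows. At each distinct failure time, the survival estimate is multiplied by 1 - d/n, where d is the number of failures at that time and n is the number of units still at risk. The (time, failed) input format is an illustrative choice; units censored at the same time as a failure are counted as still at risk at that time.

```python
def kaplan_meier(observations):
    """Product-limit estimate of R(t).

    observations: list of (time, failed) pairs; failed is False for
    right-censored units. Returns [(time, survival)] at each failure time.
    """
    ordered = sorted(observations)
    at_risk = len(ordered)
    survival = 1.0
    curve = []
    i = 0
    while i < len(ordered):
        t = ordered[i][0]
        failures = 0
        removed = 0
        # Group all observations sharing the same time.
        while i < len(ordered) and ordered[i][0] == t:
            failures += 1 if ordered[i][1] else 0
            removed += 1
            i += 1
        if failures > 0:
            survival *= 1.0 - failures / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical data: the unit observed at t = 2 is censored.
print(kaplan_meier([(1, True), (2, False), (3, True), (4, True)]))
```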

Kullback-Leibler:

Matusita:

Kagan:

Csiszar:

Hellinger:

where $\varphi$ is a convex function in $[0, \infty)$ satisfying certain conditions

Cressie and Read:

Observe that Csiszar's measure reduces to the Kullback-Leibler divergence if $\varphi(u) = u \log u$. If $\varphi(u) = \frac{1}{2}(1 - u)^2$ or $\varphi(u) = (1 - \sqrt{u})^2$, Csiszar's measure yields the Kagan and the Matusita divergences, respectively.

Methods based on Measures

The BHHJ Power Divergence [Basu et al. (1998)]

The BHHJ family reduces to the Kullback-Leibler divergence for α↓0 and to the square of the L2 distance for α = 1.

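Discrete cases: Distance between 2 binomial/multinomial distributions

Binomial: $P: p_1 \text{ (success)}, p_2 \text{ (failure)}$, $Q: q_1 \text{ (success)}, q_2 \text{ (failure)}$

Multinomial: $P: p_1, p_2, \ldots, p_m$, $Q: q_1, q_2, \ldots, q_m$

Csiszar: $\sum_{j=1}^{m} q_j \, \varphi(p_j / q_j)$

KL: $\sum_{j=1}^{m} p_j \ln(p_j / q_j)$

BHHJ: $d_\alpha = \sum_{j=1}^{m} \left[ q_j^{1+\alpha} - \left(1 + \frac{1}{\alpha}\right) p_j q_j^{\alpha} + \frac{1}{\alpha} p_j^{1+\alpha} \right]$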
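The discrete measures can be checked numerically. The sketch below implements Csiszar's measure as the sum of q_j φ(p_j/q_j), the Kullback-Leibler divergence as the sum of p_j ln(p_j/q_j), and the BHHJ divergence in the usual discrete form of the density power divergence of Basu et al. (1998). It then confirms the reductions stated earlier. All probabilities are assumed strictly positive, and the two trinomial vectors are the ones used in the simulation study later in the slides.

```python
import math


def csiszar(p, q, phi):
    """Csiszar's phi-divergence: sum of q_j * phi(p_j / q_j)."""
    return sum(qj * phi(pj / qj) for pj, qj in zip(p, q))


def kullback_leibler(p, q):
    """KL divergence: sum of p_j * ln(p_j / q_j)."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q))


def bhhj(p, q, a):
    """BHHJ power divergence with index a > 0."""
    return sum(
        qj ** (1 + a) - (1 + 1 / a) * pj * qj ** a + (1 / a) * pj ** (1 + a)
        for pj, qj in zip(p, q)
    )


def phi_kl(u):
    return u * math.log(u)


def phi_kagan(u):
    return 0.5 * (1 - u) ** 2


def phi_matusita(u):
    return (1 - math.sqrt(u)) ** 2


P = (0.2, 0.7, 0.1)
Q = (0.2, 0.6, 0.2)

print(csiszar(P, Q, phi_kl), kullback_leibler(P, Q))  # equal
print(csiszar(P, Q, phi_kagan))                       # half of Pearson's chi-square distance
print(bhhj(P, Q, 1.0))                                # squared L2 distance
print(bhhj(P, Q, 1e-6))                               # close to the KL divergence
```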

The AIC Model Selection Criterion

For the construction of AIC, Akaike used the K-L measure

$I^{KL}(g, \hat{f}) = \int g \ln g - \int g \ln \hat{f} = E_g(\ln g) - E_g(\ln \hat{f})$

Akaike proposed evaluating the 2nd term (expected LogLik) through minus twice the mean expected LogLik

$-2 E_g\left(E_g(\ln \hat{f})\right) = -2 \int \cdots \int E_g(\ln \hat{f}) \prod_{i=1}^{n} g(x_i) \, dx_1 \cdots dx_n$

Finally, he provided an (asymptotically) unbiased estimator of this quantity:

$AIC = -\frac{2}{n} \sum_{i=1}^{n} \ln(\hat{f}(x_i)) + \frac{2}{n} p$

The AIC Model Selection Criterion

$AIC(p) = -2 \ln(\text{Likelihood}) + 2p$

where p is the number of unknown parameters involved in the model/distribution.

In our case:

Weibull model: AIC = 2 × 48.42 + 4 = 100.84

Exponential: AIC = 2 × 55.04 + 2 = 112.08

The Weibull model has the smaller AIC, so the Weibull fit is better.

Other Model Selection Methods

$BIC(p) = -2 \ln(\text{Likelihood}) + p \ln(n)$

$HQ(p) = -2 \ln(\text{Likelihood}) + c \, p \ln(\ln(n)), \quad c > 2$

where p is the number of unknown parameters involved in the model/distribution and n is the sample size.
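The three penalized-likelihood criteria differ only in the penalty term, which the sketch below makes explicit. The AIC lines reproduce the values quoted for the Weibull and exponential fits from their log-likelihoods (-48.42 with p = 2 and -55.04 with p = 1). The sample size of the data set is not stated in the slides, so the n used for BIC and HQ is a hypothetical placeholder, as is the choice c = 2.1.

```python
import math


def aic(loglik, p):
    """AIC(p) = -2 ln L + 2p."""
    return -2.0 * loglik + 2.0 * p


def bic(loglik, p, n):
    """BIC(p) = -2 ln L + p ln(n)."""
    return -2.0 * loglik + p * math.log(n)


def hannan_quinn(loglik, p, n, c):
    """HQ(p) = -2 ln L + c p ln(ln(n)), with c > 2."""
    return -2.0 * loglik + c * p * math.log(math.log(n))


print(aic(-48.42, 2))  # Weibull
print(aic(-55.04, 1))  # exponential

N_HYPOTHETICAL = 10  # placeholder: the actual sample size is not given
print(bic(-48.42, 2, N_HYPOTHETICAL), bic(-55.04, 1, N_HYPOTHETICAL))
print(hannan_quinn(-48.42, 2, N_HYPOTHETICAL, c=2.1))
```

For all three criteria, the model with the smallest value is preferred.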

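The DIC Model Selection

The DIC criterion is derived based on the BHHJ measure.

$\hat{Q}_\alpha = \int \hat{f}^{1+\alpha}(z) \, dz - \left(1 + \frac{1}{\alpha}\right) \frac{1}{n} \sum_{i=1}^{n} \hat{f}^{\alpha}(x_i), \quad \alpha > 0$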

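Modified Divergence Information Criterion (MDIC)

$MDIC(p) = n \cdot \widehat{MQ}_\alpha + (2\pi)^{-\alpha/2} (1 + \alpha)^{2 + p/2} \, p$

where $\widehat{MQ}_\alpha = -\left(1 + \frac{1}{\alpha}\right) \frac{1}{n} \sum_{i=1}^{n} \hat{f}^{\alpha}(x_i)$.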


Tests based on Measures


Goodness of Fit Tests


Compare the BHHJ test with the goodness of fit tests based on the Kullback measure (KL), the Kagan measure (Pearson chi-square test), the Matusita measure (Mat), and the Cressie and Read measure (CR).

Three different values of the index α are used: α = 0.01, 0.05 & 0.10.

Both the power and the type I error are investigated.

Simulated results: A trinomial distribution is used with n=150, and 10000 simulations have been run.
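A simulation of this kind can be sketched for one of the tests, the Pearson chi-square test (Kagan measure). The sketch draws trinomial samples of size n = 150, applies the test of H0: M(150, 0.2, 0.6, 0.2), and reports the fraction of rejections. Sampling under H0 gives the empirical size, and sampling under H1: M(150, 0.2, 0.7, 0.1) gives the empirical power. The 5% significance level, the seed, and the reduced number of replications (2000 rather than 10000) are assumptions of this sketch. The critical value uses the fact that a chi-square variable with 2 degrees of freedom has survival function exp(-x/2), so the code is valid only for three categories.

```python
import math
import random


def pearson_statistic(counts, probs, n):
    """Pearson chi-square statistic for observed counts against probs."""
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def rejection_rate(true_probs, null_probs, n=150, sims=2000, level=0.05, seed=0):
    """Fraction of simulated trinomial samples for which H0 is rejected."""
    rng = random.Random(seed)
    # Chi-square with 2 degrees of freedom: P(X > x) = exp(-x / 2).
    critical = -2.0 * math.log(level)
    categories = range(len(true_probs))
    rejections = 0
    for _ in range(sims):
        draws = rng.choices(categories, weights=true_probs, k=n)
        counts = [draws.count(c) for c in categories]
        if pearson_statistic(counts, null_probs, n) > critical:
            rejections += 1
    return rejections / sims


H0 = (0.2, 0.6, 0.2)
H1 = (0.2, 0.7, 0.1)
print("size: ", rejection_rate(H0, H0))
print("power:", rejection_rate(H1, H0))
```

The other tests in the comparison would replace `pearson_statistic` with the corresponding divergence-based statistic and its own critical value.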

Goodness of Fit Tests

[Figure: POWER vs 'SIZE' of the TEST. Horizontal axis: % of rejections when H0: M(150, 0.2, 0.6, 0.2) holds. Vertical axis: % of rejections when H1: M(150, 0.2, 0.7, 0.1) holds. Curves: BHHJ, Kullback, Kagan, Matusita.]
