Discrepancy between Data and Fit

Introduction
What is Deviance?
Deviance for Binary Responses and Proportions
Deviance as a Measure of the Goodness of Fit
Pearson Statistic
Goodness of Fit for Continuous Predictors
Summary

Introduction

A measure of the discrepancy between a fitted model and the observations is needed

The sum of squared residuals is not always appropriate, because the response distribution need not be symmetric

When the unknown parameters of the underlying distributions are estimated by maximum likelihood, the deviance is a common measure

In simple terms: the deviance compares the fit of the model with a perfect fit

Basic test statistic: the likelihood ratio statistic, where

L(model): maximized likelihood when the (larger) model is fit

L(submodel): maximized likelihood when a more restrictive model is fit

Consider now:

"submodel" represents the fitted model

"model" represents the model with a perfect fit (the "saturated model"), which has as many parameters as observations and therefore reproduces every observation exactly

► Twice the difference between the log-likelihoods of the best possible (saturated) model and the fitted model is called the deviance D:

$$\lambda = -2\log\frac{L(\text{submodel})}{L(\text{model})} = -2\{\log L(\text{fitted submodel}) - \log L(\text{fitted model})\} = -2\{l(\text{fitted submodel}) - l(\text{fitted model})\}$$
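For binary responses the likelihood ratio above can be computed directly. The following is a minimal Python sketch, assuming 0/1 responses and a Bernoulli likelihood; the function names `bernoulli_loglik` and `deviance_lr` and the example values are hypothetical. The saturated model is represented by setting each fitted probability equal to its own observation.

```python
import math


def bernoulli_loglik(y, pi):
    """Log-likelihood sum_i {y_i log(pi_i) + (1 - y_i) log(1 - pi_i)} for 0/1 data.

    Only the term that is actually present is evaluated, which implements
    the convention 0 * log(0) = 0 needed for the saturated model.
    """
    total = 0.0
    for yi, pi_i in zip(y, pi):
        total += math.log(pi_i) if yi == 1 else math.log(1.0 - pi_i)
    return total


def deviance_lr(y, pi_hat):
    """Deviance -2 {l(fitted submodel) - l(saturated model)}."""
    l_fitted = bernoulli_loglik(y, pi_hat)
    l_saturated = bernoulli_loglik(y, y)  # saturated model: pi_i = y_i
    return -2.0 * (l_fitted - l_saturated)


# Hypothetical observations and fitted probabilities
y = [1, 0, 1, 1, 0]
pi_hat = [0.8, 0.3, 0.6, 0.9, 0.2]
print(deviance_lr(y, pi_hat))
```

For 0/1 data the saturated log-likelihood is exactly zero, so in this case the deviance reduces to −2 times the log-likelihood of the fitted model.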

Deviance for Binary Responses

A set of binary data is given by $(y_i, x_i)$, $i = 1, \dots, n$, with $y_i \in \{0, 1\}$

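The log-likelihoods for the two models are
$$l(y, \hat\pi) = \sum_{i=1}^{n}\{y_i\log\hat\pi_i + (1-y_i)\log(1-\hat\pi_i)\}$$
for the fitted model, with fitted probabilities $\hat\pi_i$, and
$$l(y, y) = \sum_{i=1}^{n}\{y_i\log y_i + (1-y_i)\log(1-y_i)\} = 0$$
for the saturated model, with the convention $0 \cdot \log 0 = 0$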

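The deviance for the responses is then twice the difference of the log-likelihoods:
$$D(y, \hat\pi) = -2\{l(y, \hat\pi) - l(y, y)\} = -2\sum_{i=1}^{n}\{y_i\log\hat\pi_i + (1-y_i)\log(1-\hat\pi_i)\}$$

or: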

$$D(y, \hat\pi) = 2\sum_{i=1}^{n} d(y_i, \hat\pi_i), \quad \text{where} \quad d(y_i, \hat\pi_i) = \begin{cases} -\log(\hat\pi_i) & y_i = 1 \\ -\log(1-\hat\pi_i) & y_i = 0 \end{cases}$$

The simple form: $$d(y_i, \hat\pi_i) = -\log(1 - |y_i - \hat\pi_i|)$$
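A short Python sketch (hypothetical helper names and example values) checks that the piecewise definition of $d(y_i, \hat\pi_i)$ and the simple form give the same deviance:

```python
import math


def d_piecewise(yi, pi_i):
    # d = -log(pi_i) if y_i = 1, and -log(1 - pi_i) if y_i = 0
    return -math.log(pi_i) if yi == 1 else -math.log(1.0 - pi_i)


def d_compact(yi, pi_i):
    # simple form d = -log(1 - |y_i - pi_i|)
    return -math.log(1.0 - abs(yi - pi_i))


def binary_deviance(y, pi_hat, d=d_compact):
    # D(y, pi_hat) = 2 * sum_i d(y_i, pi_hat_i)
    return 2.0 * sum(d(yi, pi_i) for yi, pi_i in zip(y, pi_hat))


# Hypothetical observations and fitted probabilities
y = [1, 0, 1, 1, 0]
pi_hat = [0.8, 0.3, 0.6, 0.9, 0.2]
print(binary_deviance(y, pi_hat, d_piecewise), binary_deviance(y, pi_hat, d_compact))
```

The simple form works because $1 - |y_i - \hat\pi_i|$ equals $\hat\pi_i$ when $y_i = 1$ and $1 - \hat\pi_i$ when $y_i = 0$.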

Deviance for proportions

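Proportions: data with $N$ binomially distributed responses $y_i$, where $y_i$ is composed of $n_i$ binary variables $y_{i1}, \dots, y_{in_i}$

There are $n = \sum_{i=1}^{N} n_i$ observations in total and $N$ model-specific probabilities $\pi_1, \dots, \pi_N$

Application: the proportions $\bar y_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$, $i = 1, \dots, N$, are the means of repeated observations at the point $x_i$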

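Deviance:
$$D = 2\sum_{i=1}^{N} n_i\left\{\bar y_i\log\frac{\bar y_i}{\hat\pi_i} + (1-\bar y_i)\log\frac{1-\bar y_i}{1-\hat\pi_i}\right\}$$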

Difference from single binary observations:

the models fit the group means $\bar y_i$, not the individual data points!
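To make the difference concrete, the Python sketch below first collapses repeated binary observations at the same covariate value into $(x_i, n_i, \bar y_i)$ and then evaluates the grouped deviance against fitted group probabilities. The function names, data and fitted probabilities are hypothetical; the convention $0 \cdot \log 0 = 0$ is applied explicitly.

```python
import math
from collections import defaultdict


def group_binary(x, y):
    """Collapse repeated 0/1 observations at the same x into (x_i, n_i, ybar_i)."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n_i, number of ones]
    for xi, yi in zip(x, y):
        counts[xi][0] += 1
        counts[xi][1] += yi
    return [(xi, n_i, ones / n_i) for xi, (n_i, ones) in sorted(counts.items())]


def xlogy_ratio(a, b):
    # a * log(a / b), with the convention 0 * log(0 / b) = 0
    return 0.0 if a == 0 else a * math.log(a / b)


def grouped_deviance(n, ybar, pi_hat):
    """D = 2 sum_i n_i {ybar_i log(ybar_i/pi_i) + (1-ybar_i) log((1-ybar_i)/(1-pi_i))}."""
    return 2.0 * sum(
        n_i * (xlogy_ratio(yb, p) + xlogy_ratio(1.0 - yb, 1.0 - p))
        for n_i, yb, p in zip(n, ybar, pi_hat)
    )


# Hypothetical repeated observations at three covariate values
x = [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3]
y = [0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
groups = group_binary(x, y)
n = [g[1] for g in groups]
ybar = [g[2] for g in groups]
pi_hat = [0.3, 0.6, 0.85]  # hypothetical fitted probabilities, one per x_i
print(groups, grouped_deviance(n, ybar, pi_hat))
```

Only the three group means enter the deviance; a model whose fitted probabilities equal the $\bar y_i$ has deviance zero.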

Use of Deviance for Goodness of Fit

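Example: a single sample of $n$ independent Bernoulli variables; $n_1$ is the number of observations with $y_i = 1$, and $n_2 = n - n_1$ the number with $y_i = 0$

Since the probability is the same for all observations, its ML estimate is $\hat\pi = n_1/n$

Deviance: $$D = -2\{n_1\log(n_1/n) + n_2\log(n_2/n)\}$$

$D$ depends only on $n_1$ (for fixed $n$)! It does not reflect the goodness of fit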
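A small Python illustration of this point, with a hypothetical function name: for $n = 100$, the constant-probability model is fitted by $\hat\pi = n_1/n$ in every case, yet the deviance changes with $n_1$ alone, and any reordering of the same observations gives the same value.

```python
import math


def single_sample_deviance(y):
    """Deviance of the model with one common probability, pi_hat = n1 / n.

    D = -2 {n1 log(n1/n) + n2 log(n2/n)}; assumes 0 < n1 < n.
    """
    n = len(y)
    n1 = sum(y)
    n2 = n - n1
    return -2.0 * (n1 * math.log(n1 / n) + n2 * math.log(n2 / n))


# Same model, same n = 100: D varies only through n1
for n1 in (10, 30, 50):
    y = [1] * n1 + [0] * (100 - n1)
    print(n1, round(single_sample_deviance(y), 2))
```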

Deviance as Goodness-of-Fit Statistic

Example: Unemployment

Sample size: 982

Response: success (y = 1) = short-term unemployment (≤ 6 months)

Covariates: gender (1: male, 0: female) and age (16-61)

Fitting the main-effect logit model with statistical software gives:

The probability of short-term unemployment is higher for males and decreases with age

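Deviance for the ungrouped data: 1224.1 on 979 df (982 − 3 fitted parameters) → for binary data the deviance is not asymptotically χ²-distributed, so it cannot be considered a goodness-of-fit statistic

But if the data are grouped into the 91 observed combinations of age and gender, the deviance is 87.16 on 88 df (91 − 3 fitted parameters)

In the grouped case, the deviance has an asymptotic χ² distribution; since 87.16 is close to 88, the expected value of a χ² variable with 88 df, the main-effect model is acceptable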
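To attach a p-value to 87.16 on 88 df, one compares it with the upper tail of the χ² distribution. Below is a standard-library Python sketch; `chi2_sf` is a hypothetical helper (with SciPy installed, `scipy.stats.chi2.sf(x, df)` gives the same upper-tail probability). It evaluates the power series of the regularized lower incomplete gamma function, which is adequate unless the statistic lies far out in the upper tail.

```python
import math


def chi2_sf(x, df, max_terms=10_000):
    """Upper-tail probability P(X > x) for X ~ chi^2(df).

    Computes 1 - P(a, z) with a = df/2, z = x/2, where P is the regularized
    lower incomplete gamma function evaluated by its power series. Accurate
    unless x lies far in the upper tail (then 1 - P loses precision).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # log of z^a e^{-z} / Gamma(a + 1)
    log_prefactor = a * math.log(z) - z - math.lgamma(a + 1.0)
    term = total = 1.0
    for k in range(1, max_terms):
        term *= z / (a + k)
        total += term
        if term < 1e-16 * total:
            break
    return 1.0 - math.exp(log_prefactor) * total


# Grouped unemployment data: D = 87.16 on 88 df
print(chi2_sf(87.16, 88))
```

The result is close to 0.5, consistent with accepting the main-effect model. Applying the same calculation to the ungrouped value 1224.1 on 979 df would not be meaningful, because that deviance is not χ²-distributed.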

Example: Household Commodities

Two linear logit models, "car" and "PC", with "net income" as the only covariate

Strictly binary responses

→ the deviance cannot be used as a goodness-of-fit statistic

The number of observations and the linear predictor are the same for both models

→ the deviances of the two models can be compared

The linear logit model seems to fit the "PC" data better. But is the fit significantly better?
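The slides do not show how the two deviances were obtained. As an illustration only, the Python sketch below fits a linear logit model with one covariate by Newton-Raphson and returns its deviance, so two binary responses sharing the same covariate and sample size can be compared on an equal footing. The function name and the data are hypothetical; they are not the "car"/"PC" data from the example.

```python
import math


def fit_logit_deviance(x, y, iters=100, tol=1e-10):
    """Fit logit(pi) = b0 + b1 * x by Newton-Raphson; return (b0, b1, deviance).

    Assumes 0/1 responses that are not perfectly separated by x.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        # Score vector (g0, g1) and Fisher information [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system information @ step = score
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if abs(s0) + abs(s1) < tol:
            break
    # Binary deviance D = 2 * sum_i -log(1 - |y_i - pi_i|)
    dev = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        dev -= 2.0 * math.log(1.0 - abs(yi - p))
    return b0, b1, dev


# Hypothetical data: one covariate (income) and two binary responses
income = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
car = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]
pc = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
for name, resp in (("car", car), ("PC", pc)):
    b0, b1, dev = fit_logit_deviance(income, resp)
    print(name, round(b1, 3), round(dev, 3))
```

Because both fits share $n$ and the linear predictor, the smaller deviance indicates the better fit; whether the difference is significant is a separate question that this comparison alone does not answer.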

Pearson Statistic

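For proportions and large $n_i$, the deviance can be used as a goodness-of-fit statistic

The Pearson statistic is an alternative:
$$\chi^2_P = \sum_{i=1}^{N}\frac{n_i(\bar y_i - \hat\pi_i)^2}{\hat\pi_i(1-\hat\pi_i)}$$

Like the deviance, it is asymptotically χ²-distributed with $N - \dim(x)$ df when $N$ is fixed and $n_i \to \infty$; under these conditions the two statistics have the same asymptotic distribution

If the values of $D$ and $\chi^2_P$ differ strongly, that is a hint that the asymptotic approximation, and hence both test statistics, are not reliable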
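A minimal Python sketch (hypothetical names and grouped data) computes the Pearson statistic next to the grouped deviance, so the two values can be compared as suggested above:

```python
import math


def xlogy_ratio(a, b):
    # a * log(a / b), with the convention 0 * log(0 / b) = 0
    return 0.0 if a == 0 else a * math.log(a / b)


def grouped_deviance(n, ybar, pi_hat):
    # D = 2 sum_i n_i {ybar_i log(ybar_i/pi_i) + (1-ybar_i) log((1-ybar_i)/(1-pi_i))}
    return 2.0 * sum(
        n_i * (xlogy_ratio(yb, p) + xlogy_ratio(1.0 - yb, 1.0 - p))
        for n_i, yb, p in zip(n, ybar, pi_hat)
    )


def pearson_statistic(n, ybar, pi_hat):
    # chi^2_P = sum_i n_i (ybar_i - pi_i)^2 / (pi_i (1 - pi_i))
    return sum(
        n_i * (yb - p) ** 2 / (p * (1.0 - p))
        for n_i, yb, p in zip(n, ybar, pi_hat)
    )


# Hypothetical group sizes, observed proportions and fitted probabilities
n = [10, 20, 15]
ybar = [0.3, 0.55, 0.8]
pi_hat = [0.35, 0.5, 0.75]
print(grouped_deviance(n, ybar, pi_hat), pearson_statistic(n, ybar, pi_hat))
```

For these values the two statistics are close to each other; a large gap between them would be the warning sign mentioned above.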

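The Kullback-Leibler distance is a concept related to the deviance

It is a general, directed (not symmetric) measure of the distance between two distributions

For two discrete distributions with support $\{z_1, \dots, z_m\}$ and probability vectors $\pi = (\pi_1, \dots, \pi_m)$ and $\pi^* = (\pi^*_1, \dots, \pi^*_m)$, it has the form:
$$KL(\pi, \pi^*) = \sum_{j=1}^{m}\pi_j\log\frac{\pi_j}{\pi^*_j}$$

With suitable mass functions (observed proportions versus fitted probabilities), the deviance can be written as a sum of KL distances:
$$D = 2\sum_{i=1}^{N} n_i\, KL\big((\bar y_i, 1-\bar y_i), (\hat\pi_i, 1-\hat\pi_i)\big)$$

The ML estimator can therefore be seen as a minimum-distance estimator that minimizes the (weighted) sum of the KL distances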
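The identity between the grouped deviance and the KL distances can be checked numerically. The Python sketch below uses hypothetical names and data and treats each group as a two-point distribution, observed $(\bar y_i, 1-\bar y_i)$ versus fitted $(\hat\pi_i, 1-\hat\pi_i)$:

```python
import math


def kl_distance(p, q):
    """Directed KL distance sum_j p_j log(p_j / q_j), with 0 log 0 = 0."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)


def deviance_via_kl(n, ybar, pi_hat):
    # D = 2 * sum_i n_i * KL((ybar_i, 1 - ybar_i), (pi_i, 1 - pi_i))
    return 2.0 * sum(
        n_i * kl_distance((yb, 1.0 - yb), (p, 1.0 - p))
        for n_i, yb, p in zip(n, ybar, pi_hat)
    )


# Hypothetical group sizes, observed proportions and fitted probabilities
n = [10, 20, 15]
ybar = [0.3, 0.55, 0.8]
pi_hat = [0.35, 0.5, 0.75]
print(deviance_via_kl(n, ybar, pi_hat))

# The distance is directed: in general KL(p, q) != KL(q, p)
print(kl_distance((0.9, 0.1), (0.5, 0.5)), kl_distance((0.5, 0.5), (0.9, 0.1)))
```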

Generally, large values of the test statistic indicate lack-of-fit, but not the reason for it

→ comparing fits (different predictors, different link functions) is necessary to gain some insight

Goodness-of-Fit for Continuous Predictors

The deviance and the Pearson statistic cannot be used, since most covariate values occur only once ($n_i = 1$)

Categorizing the continuous variables might be an easy remedy, but:

Power is lost

It does not work well with larger numbers of variables (how to categorize?)

Several methods are available for continuous variables, e.g.

Hosmer-Lemeshow Test

Alternatives based on, e.g.:

Score tests

Pseudo-likelihood ratio tests using smoothed estimates of the response probability

Hosmer-Lemeshow Test

Pearson-type test statistic where the grouping is based on the fitted values

The responses are ordered according to the fitted probabilities and divided into N equally sized groups

N = 10 groups are proposed by Hosmer and Lemeshow

$y_{ij}$ is the $j$th observation in the $i$th group and $\hat\pi_{ij}$ its fitted probability; there are $N$ groups with $n_i$ observations each

The average of the observations in each group, $\bar y_i = \frac{1}{n_i}\sum_j y_{ij}$, is compared with the average of the fitted probabilities, $\bar\pi_i = \frac{1}{n_i}\sum_j \hat\pi_{ij}$, in a Pearson-like statistic:
$$\chi^2_{HL} = \sum_{i=1}^{N}\frac{n_i(\bar y_i - \bar\pi_i)^2}{\bar\pi_i(1-\bar\pi_i)}$$

Hosmer and Lemeshow showed that the asymptotic distribution can be approximated by a χ²-distribution with df=N-2

Grouping is based on the model itself → one cannot expect good power

A large value of the statistic indicates lack-of-fit, but not the reason for it
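The grouping and the statistic can be sketched in Python as follows. The function name and the simulated data are hypothetical; groups are formed by splitting the sorted fitted probabilities into N (nearly) equal parts, with ties kept in input order. Published implementations differ in how they form groups, so results may not match a given software package exactly.

```python
import random


def hosmer_lemeshow(y, pi_hat, groups=10):
    """Hosmer-Lemeshow statistic and its approximate chi^2 df (groups - 2).

    Observations are sorted by fitted probability and split into `groups`
    groups of (nearly) equal size; requires len(y) >= groups and group-average
    fitted probabilities strictly between 0 and 1.
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: pi_hat[i])  # stable: ties keep input order
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        y_bar = sum(y[i] for i in idx) / n_g  # observed proportion in group g
        pi_bar = sum(pi_hat[i] for i in idx) / n_g  # average fitted probability
        stat += n_g * (y_bar - pi_bar) ** 2 / (pi_bar * (1.0 - pi_bar))
    return stat, groups - 2


# Hypothetical fitted probabilities and outcomes simulated from them
rng = random.Random(0)
pi_hat = [(i + 0.5) / 50 for i in range(50)]
y = [1 if rng.random() < p else 0 for p in pi_hat]
stat, df = hosmer_lemeshow(y, pi_hat)
print(round(stat, 2), df)
```

The statistic is then referred to a χ² distribution with N − 2 = 8 df.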

One alternative to the H-L test: a pseudo-likelihood ratio test that uses a kernel-smoothed estimate of the response probabilities

Binomial data are used, and a smoothing parameter h is introduced to form a smooth nonparametric estimate of the response probability

Corresponding pseudo-likelihood ratio statistic:

Investigated hypothesis:

The statistic is not asymptotically χ²-distributed → its behaviour under the null hypothesis is examined by simulating data from the fitted parametric model

Kernel function K:
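The formulas for the smoothed estimate, the pseudo-likelihood ratio statistic and the kernel did not survive in this version of the slides, so the Python sketch below is only one plausible instance under explicit assumptions: a Gaussian kernel, a Nadaraya-Watson estimate for ungrouped 0/1 data, a deviance-type statistic comparing smoothed and parametric fitted probabilities, and a hypothetical constant-probability model standing in for the parametric model. All names are hypothetical. The null distribution is approximated by simulating from the fitted parametric model and refitting, as described above.

```python
import math
import random


def gaussian_kernel(u):
    # Assumed kernel choice: standard normal density
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def smoothed_prob(x, y, h):
    """Nadaraya-Watson estimate of P(y = 1 | x) at each observed x (bandwidth h)."""
    est = []
    for x0 in x:
        w = [gaussian_kernel((x0 - xi) / h) for xi in x]
        est.append(sum(wi * yi for wi, yi in zip(w, y)) / sum(w))
    return est


def pseudo_lr(y, pi_smooth, pi_param, eps=1e-12):
    """Assumed form: 2 sum_i {y_i log(ps_i/pp_i) + (1 - y_i) log((1 - ps_i)/(1 - pp_i))}."""
    total = 0.0
    for yi, ps, pp in zip(y, pi_smooth, pi_param):
        ps = min(max(ps, eps), 1.0 - eps)  # keep the logarithms finite
        pp = min(max(pp, eps), 1.0 - eps)
        total += yi * math.log(ps / pp) + (1 - yi) * math.log((1.0 - ps) / (1.0 - pp))
    return 2.0 * total


def fit_constant(y):
    # Hypothetical parametric model: one common probability for every x
    p = sum(y) / len(y)
    return [p] * len(y)


def simulated_p_value(x, y, h, n_sim=200, seed=0):
    """Observed statistic and a Monte Carlo p-value under the fitted parametric model."""
    rng = random.Random(seed)
    pi_param = fit_constant(y)
    observed = pseudo_lr(y, smoothed_prob(x, y, h), pi_param)
    exceed = 0
    for _ in range(n_sim):
        # Simulate from the fitted model, refit it and recompute the statistic
        y_sim = [1 if rng.random() < p else 0 for p in pi_param]
        stat = pseudo_lr(y_sim, smoothed_prob(x, y_sim, h), fit_constant(y_sim))
        if stat >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)


# Hypothetical data with a step in P(y = 1 | x) that a constant model cannot capture
x = [i / 40 for i in range(40)]
y = [1 if xi > 0.5 else 0 for xi in x]
observed, p_value = simulated_p_value(x, y, h=0.1)
print(round(observed, 2), p_value)
```

The bandwidth h controls how local the smoothed estimate is; in practice the parametric model would be the fitted logit model, refitted to every simulated data set.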

Summary

The deviance and related statistics quantify the difference in fit between two models

The distributions do not need to be symmetric

The use of grouping can improve the power of the statistic

The fit for continuous predictors can be assessed by grouping

Goodness-of-fit statistics do not indicate whether a model is "correct" or what the reason for a lack of fit is

Thank you for your attention

Date post:	18-Jan-2016
Upload:	justin-brooks