Date post: | 18-Jan-2016 |
Upload: | justin-brooks |
Discrepancy between Data and Fit
Outline:
Introduction
What is Deviance?
Deviance for Binary Responses and Proportions
Deviance as Measure of the Goodness of Fit
Pearson Statistic
Goodness of Fit for Continuous Predictors
Summary
Introduction
A measure of the discrepancy between a model and the observations is needed
The sum of squared residuals is not always appropriate because the distribution of the response need not be symmetric
When the unknown parameters of the underlying distributions are estimated by maximum likelihood, the deviance is a common measure
In simple words: The deviance is a comparison between the model fit and the perfect fit
Basic test statistic: the likelihood ratio statistic
$\lambda = -2 \log \frac{L(\text{submodel})}{L(\text{model})}$
L(model): maximum likelihood when a model is fit
L(submodel): maximum likelihood when a more restrictive model is fit
$\lambda = -2 \{ \log L(\text{fitted submodel}) - \log L(\text{fitted model}) \} = -2 \{ l(\text{fitted submodel}) - l(\text{fitted model}) \}$
Consider now:
"submodel" represents the fitted model
"model" represents the model with perfect fit (the "saturated model"), which has as many parameters as observations and therefore reproduces every observation exactly
→ With these choices, $\lambda$ is called the "deviance D": $-2$ times the difference between the log-likelihoods of the fitted model and the best possible fitting model
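Deviance for Binary Responses
A set of binary data is given by $(y_i, x_i)$, $i = 1, \dots, n$, with $y_i \in \{0, 1\}$
The log-likelihoods for the two models: $l(y, \hat\pi) = \sum_{i=1}^{n} \{ y_i \log \hat\pi_i + (1 - y_i) \log(1 - \hat\pi_i) \}$ for the fitted model and $l(y, y) = 0$ for the saturated model, with $\hat\pi_i$ the fitted probability for observation $i$
The deviance for the responses is then given by $-2$ times the difference of the log-likelihoods:
$D(y, \hat\pi) = -2 \sum_{i=1}^{n} \{ y_i \log \hat\pi_i + (1 - y_i) \log(1 - \hat\pi_i) \}$
or: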
$D(y, \hat\pi) = 2 \sum_{i=1}^{n} d(y_i, \hat\pi_i)$, where
$d(y_i, \hat\pi_i) = \begin{cases} -\log(\hat\pi_i) & y_i = 1 \\ -\log(1 - \hat\pi_i) & y_i = 0 \end{cases}$
The simple form: $d(y_i, \hat\pi_i) = -\log(1 - |y_i - \hat\pi_i|)$
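A minimal Python sketch of the binary deviance may help here. It evaluates both the piecewise form and the simple form of d and lets one check that they agree. The function names and the toy values of y and pi_hat are illustrative assumptions, not taken from the slides.

```python
import math

def binary_deviance(y, pi_hat):
    """Deviance D = 2 * sum d(y_i, pi_i), using the piecewise form of d."""
    total = 0.0
    for yi, pi in zip(y, pi_hat):
        # d = -log(pi) if y_i = 1, and -log(1 - pi) if y_i = 0
        total += -math.log(pi) if yi == 1 else -math.log(1.0 - pi)
    return 2.0 * total

def binary_deviance_simple(y, pi_hat):
    """Same deviance via the compact form d = -log(1 - |y_i - pi_i|)."""
    return 2.0 * sum(-math.log(1.0 - abs(yi - pi)) for yi, pi in zip(y, pi_hat))

# Hypothetical responses and fitted probabilities (must lie strictly in (0, 1))
y = [1, 0, 1, 1, 0]
pi_hat = [0.8, 0.3, 0.6, 0.9, 0.2]

print(binary_deviance(y, pi_hat), binary_deviance_simple(y, pi_hat))
```

Both calls print the same value, because |y_i - pi_i| equals 1 - pi_i when y_i = 1 and pi_i when y_i = 0.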
Deviance for proportions
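Proportions: data with N binomially distributed responses $y_i$; each $y_i$ is composed of $n_i$ binary variables
There are $n = n_1 + \dots + n_N$ observations in total and N model-specific probabilities $\pi_1, \dots, \pi_N$
Application:
$\bar y_i = y_i / n_i$ are the means of the $n_i$ repeated binary observations at the point $x_i$
Deviance:
$D = 2 \sum_{i=1}^{N} n_i \{ \bar y_i \log(\bar y_i / \hat\pi_i) + (1 - \bar y_i) \log((1 - \bar y_i) / (1 - \hat\pi_i)) \}$
Difference to single binary observations:
the models fit the means $\bar y_i$, not the individual data points!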
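As a sketch, the deviance for proportions can be computed from the group counts in its standard binomial form, D = 2 Σ n_i {ȳ_i log(ȳ_i/π̂_i) + (1 − ȳ_i) log((1 − ȳ_i)/(1 − π̂_i))}, with the convention 0·log 0 = 0. The helper names and the example counts below are assumptions made for illustration.

```python
import math

def xlogy(x, y):
    """x * log(y) with the convention that the term is 0 when x == 0."""
    return 0.0 if x == 0 else x * math.log(y)

def grouped_deviance(successes, sizes, pi_hat):
    """Binomial deviance for N groups with success counts, group sizes n_i
    and fitted probabilities pi_hat (strictly between 0 and 1)."""
    total = 0.0
    for s, n, p in zip(successes, sizes, pi_hat):
        ybar = s / n  # observed proportion in the group
        total += n * (xlogy(ybar, ybar / p) + xlogy(1.0 - ybar, (1.0 - ybar) / (1.0 - p)))
    return 2.0 * total

# Hypothetical grouped data: three covariate points with 10 trials each
successes = [3, 5, 8]
sizes = [10, 10, 10]
print(grouped_deviance(successes, sizes, [0.25, 0.50, 0.75]))
```

With all n_i = 1 the function reduces to the deviance for single binary observations, and it is zero when every fitted probability equals the observed proportion.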
Use of Deviance for Goodness of Fit
Example: a single sample of n independent Bernoulli variables. n1 is the number of observations with yi=1, and n2=n-n1 the number with yi=0.
Since the probability is the same for all observations: $\hat\pi = n_1 / n$
Deviance: $D = -2 \{ n_1 \log(n_1 / n) + n_2 \log(n_2 / n) \}$
D depends only on $\hat\pi = n_1 / n$! It does not reflect the goodness of fit
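The point can be checked numerically. The sketch below computes the single-sample deviance with the fitted probability n1/n and shows that two samples with the same number of successes give the same value, whatever the order of the observations. The function name and the samples are illustrative assumptions.

```python
import math

def single_sample_deviance(y):
    """Deviance of a single Bernoulli sample with common fitted probability n1/n.
    Requires at least one success and one failure."""
    n = len(y)
    n1 = sum(y)
    n2 = n - n1
    pi_hat = n1 / n
    return -2.0 * (n1 * math.log(pi_hat) + n2 * math.log(1.0 - pi_hat))

# Two hypothetical samples with n = 10 and n1 = 3, arranged differently
sample_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
sample_b = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]

print(single_sample_deviance(sample_a) == single_sample_deviance(sample_b))
```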
Deviance as Goodness-of-Fit Statistic
Example: Unemployment
Sample size: 982
Success (y = 1): short-term unemployment (≤ 6 months)
Covariates: gender (1: male, 0: female) and age (16-61)
Fitting the main-effects model with standard software shows:
The probability of short-term unemployment is higher for males and decreases with age
Deviance for ungrouped data: 1224.1 on 979 df → cannot be considered a goodness-of-fit statistic
But if the data are grouped into the 91 combinations of age and gender, the deviance is 87.16 on 88 df (91 minus 3 fitted parameters)
In the grouped case, the deviance has an asymptotic χ² distribution, and the main-effects model is acceptable
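To see why 87.16 on 88 df is unremarkable, one can approximate the upper tail probability of the χ² distribution. The sketch below uses the Wilson-Hilferty cube-root normal approximation so that it runs with the standard library only; this is an approximation chosen for illustration, and a library routine such as `scipy.stats.chi2.sf` would give the exact tail. The function name is hypothetical.

```python
import math

def chi2_upper_tail_wh(x, df):
    """Approximate P(chi2_df > x) with the Wilson-Hilferty approximation:
    (X/df)^(1/3) is roughly normal with mean 1 - 2/(9 df) and variance 2/(9 df)."""
    c = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    # Upper tail of the standard normal distribution
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Grouped deviance from the unemployment example
p_grouped = chi2_upper_tail_wh(87.16, 88)
print(p_grouped)
```

The approximate p-value is close to one half, so the grouped deviance gives no evidence against the main-effects model.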
Example: Household Commodities
Two linear logit models for the responses "car" and "PC", each with "net income" as the only covariate
Strictly binary responses
→ the deviance cannot be considered a goodness-of-fit statistic
The number of observations and the linear predictor are the same for both models
→ the deviances of the two models can be compared
The linear logit model seems to fit the response "PC" better. But is the fit significantly better?
Pearson Statistic
For proportions and large ni, the deviance can be used as a goodness-of-fit statistic
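The Pearson statistic is an alternative:
$\chi^2_P = \sum_{i=1}^{N} n_i \frac{(\bar y_i - \hat\pi_i)^2}{\hat\pi_i (1 - \hat\pi_i)}$, with $\bar y_i$ the observed proportion and $\hat\pi_i$ the fitted probability in group $i$
It is also asymptotically χ²-distributed with N-dim(x) df, when N is fixed and $n_i \to \infty$, $i = 1, \dots, N$
Under these fixed-N asymptotics it has the same asymptotic distribution as the deviance
If the values of D and $\chi^2_P$ differ strongly, that is a hint that the test statistics are not reliable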
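A short sketch of the Pearson statistic for grouped binomial data, in its standard form Σ n_i (ȳ_i − π̂_i)² / (π̂_i (1 − π̂_i)). The counts and the fitted probabilities are hypothetical values, not the result of an actual model fit.

```python
def pearson_statistic(successes, sizes, pi_hat):
    """Pearson chi-square statistic for N groups with success counts,
    group sizes n_i and fitted probabilities pi_hat."""
    total = 0.0
    for s, n, p in zip(successes, sizes, pi_hat):
        ybar = s / n  # observed proportion
        total += n * (ybar - p) ** 2 / (p * (1.0 - p))
    return total

# Hypothetical grouped data with 50 trials per group
successes = [12, 25, 41]
sizes = [50, 50, 50]
pi_hat = [0.25, 0.50, 0.80]
chi2_p = pearson_statistic(successes, sizes, pi_hat)
print(chi2_p)
```

In practice this value would be computed next to the deviance for the same fit, and a large gap between the two would be the warning sign mentioned above.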
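The Kullback-Leibler distance is a concept related to the deviance
It is a general directed measure for the distance between two distributions
With support $\{y_1, \dots, y_m\}$ and the probability vectors $\pi = (\pi_1, \dots, \pi_m)$ and $\pi^* = (\pi^*_1, \dots, \pi^*_m)$ of the two probability functions it has the form:
$KL(\pi, \pi^*) = \sum_{j=1}^{m} \pi_j \log(\pi_j / \pi^*_j)$
The deviance can be formulated with convenient mass functions, the observed proportions $(\bar y_i, 1 - \bar y_i)$ and the fitted probabilities $(\hat\pi_i, 1 - \hat\pi_i)$:
$D = 2 \sum_{i=1}^{N} n_i \, KL((\bar y_i, 1 - \bar y_i), (\hat\pi_i, 1 - \hat\pi_i))$
The ML estimator can be seen as a minimum distance estimator that minimizes the (weighted) sum of the KL distances
Generally, large values of the test statistic indicate lack of fit, but not the reason for it
→ comparison of fits (different predictors, different link functions) is necessary to get some insight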
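The link between the Kullback-Leibler distance and the deviance can be sketched in a few lines. The code assumes the standard discrete form KL(π, π*) = Σ π_j log(π_j/π*_j) and writes the binomial deviance as 2 Σ n_i KL((ȳ_i, 1 − ȳ_i), (π̂_i, 1 − π̂_i)). Function names and numbers are illustrative assumptions.

```python
import math

def kl_distance(p, q):
    """Directed Kullback-Leibler distance between two discrete distributions
    on the same support; terms with p_j = 0 contribute 0."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)

def deviance_via_kl(successes, sizes, pi_hat):
    """Binomial deviance written as 2 * sum n_i * KL(observed, fitted)."""
    total = 0.0
    for s, n, p in zip(successes, sizes, pi_hat):
        ybar = s / n
        total += n * kl_distance((ybar, 1.0 - ybar), (p, 1.0 - p))
    return 2.0 * total

# The measure is directed: swapping the arguments changes the value
print(kl_distance((0.5, 0.5), (0.9, 0.1)), kl_distance((0.9, 0.1), (0.5, 0.5)))

# Hypothetical grouped data
print(deviance_via_kl([3, 5, 8], [10, 10, 10], [0.25, 0.50, 0.75]))
```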
Goodness-of-Fit for Continuous Predictors
With continuous predictors the data are essentially ungrouped, so the deviance and the Pearson statistic cannot be used as goodness-of-fit statistics
Categorizing the continuous variables might be an easy remedy, but:
Power is lost
It does not work well with larger numbers of variables (how to categorize?)
Several methods are available for continuous variables, e.g.
Hosmer-Lemeshow test
Alternatives based on, e.g.:
Score test
Pseudo-likelihood ratio using smoothed estimates of the response probability
Hosmer-Lemeshow Test
Pearson-type test statistic where the categorizing is based on the fitted values
Responses are ordered according to the fitted probabilities and divided into N equally sized groups
N=10 groups are proposed by Hosmer and Lemeshow
$y_{ij}$ is the jth observation in the ith group, and there are N groups with $n_i$ observations each
The averages of the observations in the groups, $\bar y_i$, are compared to the averages of the fitted probabilities of the observations in the groups, $\bar\pi_i$, in a Pearson-like statistic:
$\chi^2_{HL} = \sum_{i=1}^{N} n_i \frac{(\bar y_i - \bar\pi_i)^2}{\bar\pi_i (1 - \bar\pi_i)}$
Hosmer and Lemeshow showed that the asymptotic distribution can be approximated by a χ²-distribution with df=N-2
Grouping is based on the model itself → one cannot expect good power
A large value of the statistic indicates lack-of-fit, but not the reason for it
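The grouping and the statistic described above can be sketched as follows. This is a minimal version under stated assumptions: groups are formed by cutting the sorted observations into consecutive blocks of (nearly) equal size, ties in the fitted probabilities are not treated specially, and the function name and toy data are hypothetical.

```python
def hosmer_lemeshow(y, pi_hat, n_groups=10):
    """Hosmer-Lemeshow-type statistic: sort by fitted probability, split into
    n_groups blocks, compare mean response and mean fitted probability per block.
    Returns the statistic and the degrees of freedom n_groups - 2."""
    order = sorted(range(len(y)), key=lambda i: pi_hat[i])
    n = len(order)
    stat = 0.0
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        n_g = len(idx)
        y_bar = sum(y[i] for i in idx) / n_g        # average observed response
        pi_bar = sum(pi_hat[i] for i in idx) / n_g  # average fitted probability
        stat += n_g * (y_bar - pi_bar) ** 2 / (pi_bar * (1.0 - pi_bar))
    return stat, n_groups - 2

# Hypothetical toy data: 12 observations, 3 groups for readability
y = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1]
pi_hat = [0.2] * 4 + [0.5] * 4 + [0.8] * 4
stat, df = hosmer_lemeshow(y, pi_hat, n_groups=3)
print(stat, df)
```

The statistic would then be compared with a χ² distribution on the returned degrees of freedom, using N = 10 groups for real data as Hosmer and Lemeshow propose.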
One Alternative to the H-L Test
A pseudo-likelihood ratio test that uses a kernel-smoothed estimate of the response probabilities can be used
Binomial data are used, and a smoothing parameter h is introduced to form a smooth nonparametric estimate of the response probability
Corresponding pseudo-likelihood ratio statistic:
Investigated hypothesis:
Not asymptotically χ²-distributed → the behaviour of the test statistic under the null hypothesis is examined by simulating data from the fitted parametric model
Kernel function K:
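One common way to build a smooth nonparametric estimate of the response probability is a kernel-weighted average of the responses. The sketch below assumes a Gaussian kernel and a single continuous covariate with binary responses; the slides do not specify the kernel or the exact estimator, so both choices, and all names, are assumptions for illustration only.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as an assumed kernel function K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def smoothed_probability(x0, x, y, h):
    """Kernel-weighted average of the binary responses y at the point x0.
    h > 0 is the smoothing parameter: larger h gives a smoother estimate."""
    weights = [gaussian_kernel((x0 - xi) / h) for xi in x]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)

# Hypothetical data: the response switches from 0 to 1 as x increases
x = [0.0, 1.0, 2.0, 3.0]
y = [0, 0, 1, 1]
print(smoothed_probability(1.5, x, y, h=1.0))
```

The estimate always lies between 0 and 1, and the simulated null distribution mentioned above would be obtained by recomputing the test statistic on data generated from the fitted parametric model.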
Summary
The deviance and related statistics quantify the difference of the fit of two models
The distributions do not need to be symmetric
The use of grouping can improve the power of the statistic
The fit for continuous predictors can be assessed by grouping
The goodness of fit does not indicate whether a model is "correct" or what the reason for lack of fit is
Thank you for your attention