Diagnostics for Generalized Linear Models
Sonia Benghiat
A Thesis
in
The Department
of
Mathematics and Statistics
Presented in Partial Fulfillment of the Requirements
for the Degree of Master of Science at
Concordia University
Montreal, Quebec, Canada
National Library of Canada
Acquisitions and Bibliographic Services
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Abstract
Diagnostics for Generalized Linear Models
Sonia Benghiat
The analysis of residuals can capture departures from a parametrized model. In this thesis we look at how the generalized linear model has become one of the most important developments in statistics in the last thirty years, and at the adequacy of regression model diagnostics that are meaningful and significant in a generalized linear model context. Some asymptotic properties are discussed, and numerical examples are provided to illustrate the techniques for binomial, Poisson, and gamma distributed random variables.
Résumé
Diagnostics for Generalized Linear Models
Sonia Benghiat
Residual analysis is a very powerful tool that allows us to verify the validity of a parametric model. In this thesis, I give an overview of the importance that generalized linear models have had on the development of statistics over the last thirty years. I analyze the convenience that such models provide when it comes to regression diagnostics. I also examine the asymptotic laws concerning these models. Finally, I present examples for binomial, Poisson, and gamma random variables.
Acknowledgements
This thesis would not have been possible without the patience and the boundless support from my husband. To him I owe a debt of gratitude. My parents, my brother and my sister continuously reminded me of the importance of completing my master's degree, and to them I am thankful for their persistent encouragement. I hold a great respect for my supervisor Prof. Y. Chaubey. He very patiently guided the advancement of this thesis. To him I express my sincerest gratitude. I would also like to thank Prof. J. Garrido, who willingly provided me with some useful material for the realization of this thesis. I thank Prof. A. Canty for kindly accepting to advise me on the choice of my software application. I thank the graduate secretaries and the professors from the Mathematics and Statistics department, and, not least, my classmates, for their insightful help and for offering a pleasant learning environment altogether.
Contents
1 Introduction
1.1 The Linear Model
1.1.1 V ' v of Assumptions
1.1.2 Other Diagnostics
1.1.3 Remedial Measures
1.2 Outline of Thesis
2 The Generalized Linear Model
2.1 Historical Aspects
2.2 Mean and Variance Functions in an Exponential Family
2.3 Description of the Generalized Linear Model
2.4 Maximum Likelihood Estimation for the GLM
2.4.1 The Newton-Raphson Method
2.4.2 Fisher's Scoring Method
2.4.3 Iteratively Weighted Least Squares (IWLS)
2.5 The Goodness of Model Fit
2.5.1 The Deviance Function
2.5.2 The Pearson Statistic
2.5.3 Residuals and the Projection Matrix
2.6 Alternative Models
3 Residual Diagnostic Measures
3.1 Modified Residuals
3.2 Influential Observations
3.3 Testing the Goodness-of-Fit
3.4 Testing Goodness-of-Link Functions
3.5 Software Applications
4 Numerical Examples
4.1 Introduction
4.2 Binomial Data
4.3 Poisson Data
4.4 Gamma Data
4.5 Conclusion
A Programs for Parameter Estimation for Different Families
A.1 MLE program for binomial family
A.2 MLE program for Poisson family
A.3 MLE program for Gamma family
A.4 One-step function
B
B.1 Output for the Herbicide data
B.2 Output for One-Step function using the Herbicide data
List of Figures
4.1 Deviance residuals for birth abnormalities due to herbicide spray exposure
4.2 χ residuals for birth abnormalities due to herbicide spray exposure
4.3 Projection matrix diagonal elements for birth abnormalities due to herbicide spray exposure
4.4 Standardized change in [·] for birth abnormalities due to herbicide spray exposure
4.5 Standardized change in [·] for herbicide data
4.6 Deviance residuals for defects found on furniture produced in a certain manufacturing plant
4.7 χ residuals for defects found on furniture produced in a certain manufacturing plant
4.8 Projection matrix diagonal elements for defects found on furniture produced in a certain manufacturing plant
4.9 Standardized change in [·] for defects found on furniture produced in a certain manufacturing plant
4.10 Standardized change in [·] for furniture damage data
4.11 Standardized change in [·] for furniture damage data
4.12 Standardized change in [·] for furniture damage data
4.13 Standardized change in b4 for furniture damage data
4.14 Standardized change in [·] for furniture damage data
4.15 Deviance residuals for lot 1 of blood clot time
4.16 χ residuals for lot 1 of blood clot time
4.17 Projection matrix diagonal elements for lot 1 of blood clot time
4.18 Standardized change in [·] for lot 1 of blood clot time
4.19 Standardized change for lot 1 of blood clot time
List of Tables
2.1 Dispersion Parameter, Canonical Link and Variance Function for Distributions of the Exponential Family
2.2 Distribution Functions with their Associated Links
2.3 An Extension of the Normal-Theory Linear Model to the GLM
2.4 Deviance Functions for Exponential Family Distributions
3.1 Anscombe and Variance-Stabilizing Residuals Expressed for the Binomial, Poisson and Gamma Distributions
3.2 Deviance and Adjusted Deviance Residuals for the Three Distributions
4.1 Number of birth abnormalities out of total births per month for herbicide effect
4.2 Contingency table for furniture defect
4.3 Blood clotting times in seconds for 9 percentage concentrations of plasma and for 2 lots
Chapter 1
Introduction
1.1 The Linear Model
Most of the generalized linear model concepts stem from the theory of the normal linear model. Before introducing the generalized linear model, it is useful to set the scene by providing a brief review of the normal linear model in this first chapter, and hence to understand and see the parallels between the two types of models.
The normal-theory linear model is given by
$$y = X\beta + \varepsilon, \tag{1.1}$$
where $y$ is an $n \times 1$ observation vector, $X$ is an $n \times p$ known design matrix, $\beta$ is a $p \times 1$ vector of unknown parameters, called regression parameters, and $\varepsilon$ is an $n \times 1$ vector of unobserved random variables with zero mean and constant variance $\sigma^2$, which are independently and normally distributed. The model (1.1) is alternatively described by the mean vector and variance-covariance matrix of the observations $y$ as
$$E(y) = X\beta, \qquad \operatorname{Var}(y) = \sigma^2 I.$$
CHAPTER 1. INTRODUCTlON 2
The linearity of the model is understood in terms of the regression parameters $\beta$. For estimation of the parameters, the maximum likelihood method can be used when the errors are normal. The principle of least squares provides the same estimates of the regression parameters; however, it does not require any distributional assumption. It is described below.
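Least Squares Estimation of Parameters $\beta$
The least squares method estimates the regression parameters $\beta$ by minimizing the sum of squares
$$S(\beta) = (y - X\beta)'(y - X\beta) = y'y - 2\beta'X'y + \beta'X'X\beta.$$
Setting $\partial S/\partial\beta = -2X'y + 2X'X\beta$ equal to zero gives the normal equations $X'X\beta = X'y$, so that, when $X$ has full column rank, $\hat\beta = (X'X)^{-1}X'y$.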
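As a minimal sketch, assuming NumPy and a small hypothetical data set, the least squares estimate can be computed by solving the normal equations $X'X\beta = X'y$ (obtained by setting the derivative of the sum of squares to zero) rather than by forming $(X'X)^{-1}$ explicitly. The function name `least_squares` is illustrative only.

```python
import numpy as np

def least_squares(X, y):
    """Least squares estimate of beta from the normal equations X'X b = X'y.

    Assumes X (n x p) has full column rank, so that X'X is invertible.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical noise-free data generated from y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])  # intercept column plus one covariate
y = 1.0 + 2.0 * x
beta_hat = least_squares(X, y)
print(beta_hat)  # close to [1, 2]
```

Solving the linear system is numerically preferable to inverting $X'X$; `np.linalg.lstsq(X, y, rcond=None)` gives the same estimate through an orthogonal decomposition.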
In addition to being unbiased, the least squares estimator (LSE) $\hat\beta$ has the following properties:
(1) it has minimum variance among all unbiased linear estimators (Gauss-Markov theorem),
(2) it is consistent, and
Projection Matrix and Residuals
The building blocks for detecting influential observations in a given data set are the projection matrix, $M$, and the residuals, $e$, which are defined in what follows.
Consider the model (1.1) with corresponding fitted values $\hat y$ and residual vector $e$ defined by
$$\hat y = X\hat\beta, \qquad e = y - \hat y.$$
The projection matrix $M = (m_{ij})$ is defined by
$$M = I - H, \qquad H = X(X'X)^{-1}X',$$
where $H$, which satisfies $\hat y = Hy$, is called the "hat matrix". The projection matrix is most useful in the analysis of residuals as it spans the residual space, i.e.,
$$e = My.$$
The residuals $e$ measure the difference between the observed and the fitted values, with the following properties:
• $\operatorname{Var}(e) = \sigma^2(I - H)$.
An unbiased estimator of $\sigma^2$ based on the residuals $e$ is given by
$$s^2 = \frac{e'e}{n - p}, \tag{1.8}$$
whereby (1.8) is denoted by MSE, the mean square due to error. Therefore, the estimated variance of the residuals is
$$\widehat{\operatorname{Var}}(e) = MSE\,(I - H).$$
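A minimal NumPy sketch, on simulated data (the coefficients, sample size and seed are arbitrary assumptions), that forms $H = X(X'X)^{-1}X'$, $M = I - H$, the residuals $e = My$ and the MSE $e'e/(n - p)$ of (1.8), and then checks numerically the properties stated in Theorem 1.1 below: symmetry, idempotency and $\operatorname{tr}(M) = n - p$.

```python
import numpy as np

def projection_and_residuals(X, y):
    """Return H, M = I - H, the residuals e = M y and MSE = e'e / (n - p)."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)  # hat matrix X (X'X)^{-1} X'
    M = np.eye(n) - H                      # projection onto the residual space
    e = M @ y                              # residuals y - y_hat
    mse = (e @ e) / (n - p)                # unbiased estimator of sigma^2, eq. (1.8)
    return H, M, e, mse

rng = np.random.default_rng(0)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(scale=0.3, size=n)
H, M, e, mse = projection_and_residuals(X, y)

# M is symmetric and idempotent with trace n - p (here 20 - 3 = 17)
print(np.allclose(M, M.T), np.allclose(M @ M, M), round(np.trace(M), 6))  # expected: True True 17.0
```

The residuals are also orthogonal to every column of $X$, since $X'M = O$; this is a quick sanity check on any implementation.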
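Theorem 1.1 The following are important properties associated with the projection matrix $M$:
1. $H$ and $M = (I - H)$ are symmetric and idempotent,
2. $\operatorname{rank}(M) = \operatorname{rank}(I - H) = \operatorname{tr}(M) = \operatorname{tr}(I - H) = n - p$,
and
2. Since $(I - H)$ is idempotent, $\operatorname{rank}(I - H) = \operatorname{tr}(I - H)$. Furthermore, since $\operatorname{tr}(H) = \operatorname{tr}\{X(X'X)^{-1}X'\} = \operatorname{tr}\{(X'X)^{-1}X'X\} = \operatorname{tr}(I_p) = p$, it follows that $\operatorname{tr}(I - H) = n - p$.
It can be further deduced that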
In fitting a linear regression model, the residuals $e$ can be used to judge the assumptions about the random errors $\varepsilon$. Since $e$ is linear in $y$, $e$ is a random vector following a normal distribution, and hence the assumption of normality can be used to draw inferences about the linear model. Thus, an analysis which combines the residuals and the fitted values will examine whether there are any departures from the linear model with normal errors. The model departures to be examined are categorized as:
• non-constant variance,
• non-independence,
• omission of independent covariates.
Graphical methods (see Draper and Smith [7], Chapter 4) involving the residuals provide useful tools for detecting such model departures. They are described below:
1. Plots of residuals against independent variables will detect potential outliers, non-constant variance, non-linearity of an independent variable or the need for more independent variables,
2. Plots of residuals against the fitted values will detect non-constancy of variance,
3. Plots of residuals against time (if possible) will detect non-independence among errors or whether the time effect has been omitted from the model,
4. Box-plots, normal probability plots, half-normal plots, histograms and stem-and-leaf plots will check for normality and outliers, and
5. Plots of residuals against other significant independent variables (if possible) will detect whether such variables are to be included in the model.
Formal tests build statistics involving residuals, which are used to test the validity of the following normal linear regression model assumptions:
F-test for Adequacy of the Regression Model
Consider the linear regression model (1.1), whereby the errors $\varepsilon_i$ are assumed to be i.i.d. $N(0, \sigma^2)$. The adequacy of the model is interpreted in the form of the significance of the independent variables $\{x_j\}$, $j = 1, \dots, p - 1$. The following hypotheses are tested:
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0$$
$$H_a: \text{not all } \beta_j = 0, \quad j = 1, \dots, p - 1.$$
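It can be shown that the likelihood ratio test of $H_0$ vs. $H_a$ yields the following $F$-statistic:
$$F = \frac{MSR}{MSE}, \qquad MSE = \frac{y'(I - H)y}{n - p} = \frac{e'e}{n - p}, \qquad MSR = \frac{y'(H - \frac{1}{n}\mathbf{1}\mathbf{1}')y}{p - 1},$$
and $H_0$ is rejected at level $\alpha$ when
$$F > F_{p-1,\,n-p}(\alpha), \tag{1.15}$$
with the random variable $F_{m,\nu}$ having an $F$-distribution with $m$ and $\nu$ degrees of freedom, and $F_{m,\nu}(\alpha)$ denoting its upper $\alpha$ point. The critical region given in (1.15) is justified by the following facts:
(i) $(n - p)\,MSE/\sigma^2 \sim \chi^2_{n-p}$,
(ii) $(p - 1)\,MSR/\sigma^2 \sim \chi^2_{p-1}(\lambda)$, where $\lambda = \beta'X'(H - \frac{1}{n}\mathbf{1}\mathbf{1}')X\beta/\sigma^2$ and $\chi^2_\nu(\lambda)$ denotes the non-central chi-square random variable with $\nu$ degrees of freedom and non-centrality parameter $\lambda$,
(iii) $MSE$ and $MSR$ are independent,
(iv) $E(MSR) = \sigma^2 + \beta'X'(H - \frac{1}{n}\mathbf{1}\mathbf{1}')X\beta/(p - 1) \ge \sigma^2 = E(MSE)$.
The assertions (i)-(iii) are consequences of Cochran's Theorem (see Searle [23], Chapter 3), essentially by using the following theorem: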
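Theorem 1.2 Let $z \sim N(0, I)$. Then,
(1) $z'Az$ has a $\chi^2$-distribution with $r(A)$ degrees of freedom iff $A$ is idempotent;
(2) $z'Az$ and $z'Bz$ are independent iff $AB = O$.
Now $(n - p)\,MSE/\sigma^2 = y'(I - H)y/\sigma^2 = z'Az$, where $z = \varepsilon/\sigma \sim N(0, I)$ and $A = (I - H)$, because $(I - H)X = O$. Since $A$ is idempotent with rank $n - p$ (Theorem 1.1), it follows that
$$\frac{MSE\,(n - p)}{\sigma^2} \sim \chi^2_{n-p},$$
and, similarly,
$$\frac{MSR\,(p - 1)}{\sigma^2} = \frac{y'(H - \frac{1}{n}\mathbf{1}\mathbf{1}')y}{\sigma^2}$$
has a non-central chi-square distribution with degrees of freedom $\operatorname{tr}(H - \frac{1}{n}\mathbf{1}\mathbf{1}') = p - 1$ and non-centrality parameter
$$\lambda = \frac{\beta'X'(H - \frac{1}{n}\mathbf{1}\mathbf{1}')X\beta}{\sigma^2}.$$
Since $HX = X$, the non-centrality parameter simplifies to
$$\lambda = \frac{\beta'X'(I - \frac{1}{n}\mathbf{1}\mathbf{1}')X\beta}{\sigma^2},$$
which is $\ge 0$ and equals zero iff $H_0$ holds. Independence easily follows since, with an intercept in the model ($H\mathbf{1} = \mathbf{1}$),
$$(I - H)(H - \tfrac{1}{n}\mathbf{1}\mathbf{1}') = O.$$
The assertion in (iv) is a strict inequality if at least one of the $\beta_j \neq 0$, $j = 1, \dots, p - 1$.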
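A minimal NumPy sketch of this $F$-test, computing $MSE = y'(I - H)y/(n - p)$ and $MSR = y'(H - \frac{1}{n}\mathbf{1}\mathbf{1}')y/(p - 1)$ directly from the quadratic forms. It assumes the first column of $X$ is the intercept column $\mathbf{1}$ (so that $H\mathbf{1} = \mathbf{1}$); the data are simulated and the function name is hypothetical. The $p$-value is not computed, to keep the example dependency-free; with SciPy installed, `scipy.stats.f.sf(F, p - 1, n - p)` gives the upper-tail probability.

```python
import numpy as np

def regression_f_test(X, y):
    """F statistic for H0: beta_1 = ... = beta_{p-1} = 0 in y = X beta + eps.

    Assumes X[:, 0] is a column of ones (intercept) and X has full column rank.
    Returns (F, (p - 1, n - p)).
    """
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)     # hat matrix
    J = np.full((n, n), 1.0 / n)              # (1/n) 1 1'
    msr = y @ (H - J) @ y / (p - 1)           # regression mean square
    mse = y @ (np.eye(n) - H) @ y / (n - p)   # error mean square
    return msr / mse, (p - 1, n - p)

rng = np.random.default_rng(1)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y_null = 3.0 + rng.normal(size=n)                            # H0 true
y_alt = X @ np.array([3.0, 2.0, -1.0]) + rng.normal(size=n)  # H0 false
F_null, df = regression_f_test(X, y_null)
F_alt, _ = regression_f_test(X, y_alt)
print(df, round(F_null, 2), round(F_alt, 2))
```

The same statistic is often written as $\{(SST - SSE)/(p - 1)\}/\{SSE/(n - p)\}$, which is what the quadratic forms reduce to when the model contains an intercept.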
1.1.2 Other Diagnostics
Some diagnostic tools are used to detect influential and outlying observations in a given regression model. The Studentized residual is very informative in examining residuals under a normal model, since it is standardized and it introduces the idea of case deletion, where the fit for all observations is compared to the fit with the deleted case. Here,
$$\operatorname{Var}(e) = \sigma^2 M,$$
so that the $i$th Studentized residual is $r_i = e_i/(s\sqrt{m_{ii}})$, where $s^2 = MSE$.
The diagonal elements $m_{ii}$ of the projection matrix identify those observations with high leverage (i.e., potentially highly influential observations), since they are related to the distance between $x_i$ and $\bar x$. Given that $X$ is of full rank,
$$\sum_{i=1}^n m_{ii} = \operatorname{tr}(M) = n - p.$$
Hence, the average of the diagonal elements $m_{ii}$ is $1 - p/n$, and high-leverage observations should have small values of $m_{ii}$ as compared to $1 - p/n$. As a rule of thumb, following Hoaglin and Welsch [11], if $m_{ii} \le 1 - 2p/n$, then the $i$th observation is a high-leverage point. Thus, $M$ is a useful diagnostic tool for detecting influential observations.
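A small NumPy sketch of this rule of thumb: it computes the diagonal elements $m_{ii} = 1 - h_{ii}$ and flags observations with $m_{ii} \le 1 - 2p/n$ (equivalently $h_{ii} \ge 2p/n$). The design, with one covariate value placed far from the others, and the helper name `high_leverage` are hypothetical.

```python
import numpy as np

def high_leverage(X):
    """Return the m_ii and the indices with m_ii <= 1 - 2p/n."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)
    m = 1.0 - np.diag(H)                     # diagonal elements m_ii of M = I - H
    return m, np.flatnonzero(m <= 1.0 - 2.0 * p / n)

# Hypothetical design: ten evenly spaced x values plus one far-out point
x = np.append(np.arange(10.0), 30.0)
X = np.column_stack([np.ones_like(x), x])
m, flagged = high_leverage(X)
print(flagged)  # expected: [10]
```

Here $n = 11$ and $p = 2$, so the cut-off is $1 - 4/11 \approx 0.64$; the far-out point has $m_{ii} \approx 0.11$, while the average of the $m_{ii}$ is $1 - p/n = 9/11$.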
Another type of ill-fitting point which arises in model fitting is an outlier. An outlier does not necessarily imply an influential observation in a given model; in fact, an outlier may be outweighed by neighboring points in the $X$-space. The smaller the number of observations involved in a model, the greater the effect of an outlier on the model. Still, the effect that an outlying point exerts on the fit needs to be measured. This can be done through the diagnostic tool of Cook's distance, which measures the effect of deleting an observation from the data:
$$D_i = \frac{(\Delta_i\hat\beta)'X'X(\Delta_i\hat\beta)}{p\,s^2}, \tag{1.18}$$
where $\Delta_i\hat\beta = \hat\beta - \hat\beta_{(i)}$, with $\hat\beta_{(i)}$ denoting the usual LSE of $\beta$ with the $i$th observation deleted from the data.
It gives the distance between the usual least squares estimator and the least squares estimator obtained after the $i$th observation has been deleted, and provides a measure of the change in the least squares estimates $\hat\beta$ due to the deletion of the $i$th observation. It can be shown that
$$\Delta_i\hat\beta = \frac{(X'X)^{-1}x_i\,e_i}{m_{ii}},$$
hence it can be written that
$$D_i = \frac{e_i^2\,(1 - m_{ii})}{p\,s^2\,m_{ii}^2}.$$
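A minimal NumPy sketch, on simulated data, that computes Cook's distance $D_i = (\Delta_i\hat\beta)'X'X(\Delta_i\hat\beta)/(p s^2)$ in two ways: by refitting with each case deleted, and by the closed form $e_i^2(1 - m_{ii})/(p s^2 m_{ii}^2)$. The scaling by $p s^2$ with $s^2 = MSE$ is the convention assumed here; the function name and data are hypothetical. The two computations should agree to rounding error.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance by explicit case deletion and by the closed form."""
    n, p = X.shape
    XtX = X.T @ X
    beta = np.linalg.solve(XtX, X.T @ y)
    e = y - X @ beta
    s2 = (e @ e) / (n - p)                                   # MSE
    h = np.einsum("ij,ji->i", X, np.linalg.solve(XtX, X.T))  # h_ii = 1 - m_ii
    m = 1.0 - h
    by_deletion = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta_i = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ y[keep])
        delta = beta - beta_i                                # Delta_i beta_hat
        by_deletion[i] = delta @ XtX @ delta / (p * s2)
    closed_form = e**2 * (1.0 - m) / (p * s2 * m**2)
    return by_deletion, closed_form

rng = np.random.default_rng(2)
n = 25
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])
y = 1.0 + 0.5 * X[:, 1] + rng.normal(size=n)
y[5] += 6.0                                  # plant one outlying response
d_del, d_cf = cooks_distance(X, y)
print(np.allclose(d_del, d_cf))  # expected: True
```

The closed form needs only one fit, which is why it is preferred in practice; the deletion loop is kept here only as a check.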
The residual sum of squares (RSS) will also change as a result of an observation deletion. This is measured by
$$\Delta_i RSS = RSS - RSS_{(i)},$$
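where $RSS_{(i)}$ represents the RSS with the $i$th case deleted. Another approach is to measure the perturbation of the fit by letting $\varepsilon_i \sim N(0, \sigma^2/r)$. Consider the weighted least squares criterion
$$S_r(\beta) = (y - X\beta)'V(y - X\beta),$$
where $0 \le r \le 1$ is a weight factor defining the matrix $V = \operatorname{diag}(v_j)$, with $v_i = r$ and $v_j = 1$ for $j \ne i$. The resulting weighted LSE of $\beta$ is denoted by $\hat\beta(r)$.
At $r = 1$: $\hat\beta(1) = \hat\beta$, the usual least squares estimate, and
at $r = 0$: $\hat\beta(0) = \hat\beta_{(i)}$, the least squares estimate when the $i$th point is deleted from the data.
The normal equations are changed and, consequently, so are the least squares estimates. $\hat\beta(r)$ can be expressed as
$$\hat\beta(r) = (X'VX)^{-1}X'Vy. \tag{1.22}$$
The perturbation effect is measured by differentiating (1.22) with respect to $r$:
$$\frac{d\hat\beta(r)}{dr} = (X'VX)^{-1}x_i\{y_i - x_i'\hat\beta(r)\}.$$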
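A minimal NumPy sketch of the case-weight perturbation, on simulated data. The hypothetical helper `perturbed_lse` computes $\hat\beta(r) = (X'VX)^{-1}X'Vy$ with weight $r$ on case $i$ and weight 1 elsewhere. The example checks the end points $r = 1$ (ordinary LSE) and $r = 0$ (deletion fit), compares $d\hat\beta(r)/dr = (X'VX)^{-1}x_i\{y_i - x_i'\hat\beta(r)\}$ at $r = 1$ with a finite difference, and checks the deletion identity $\Delta_i RSS = e_i^2/m_{ii}$, a standard result not stated in the text above.

```python
import numpy as np

def perturbed_lse(X, y, i, r):
    """Weighted LSE (X'VX)^{-1} X'V y with V = diag(v), v_i = r, v_j = 1 otherwise."""
    v = np.ones(len(y))
    v[i] = r
    XtV = X.T * v                       # X'V without forming the n x n matrix V
    return np.linalg.solve(XtV @ X, XtV @ y)

rng = np.random.default_rng(3)
n, i = 15, 4
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 2.0 - X[:, 1] + rng.normal(scale=0.5, size=n)

beta = perturbed_lse(X, y, i, 1.0)      # r = 1: ordinary LSE
keep = np.arange(n) != i
beta_del = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
beta_0 = perturbed_lse(X, y, i, 0.0)    # r = 0: should equal the deletion fit

# Derivative at r = 1 versus a central finite difference
step = 1e-6
fd = (perturbed_lse(X, y, i, 1 + step) - perturbed_lse(X, y, i, 1 - step)) / (2 * step)
deriv = np.linalg.solve(X.T @ X, X[i]) * (y[i] - X[i] @ beta)

# Change in RSS on deleting case i versus e_i^2 / m_ii
e = y - X @ beta
m_ii = 1.0 - X[i] @ np.linalg.solve(X.T @ X, X[i])
rss_full = e @ e
rss_del = np.sum((y[keep] - X[keep] @ beta_del) ** 2)
print(np.allclose(beta_0, beta_del), np.allclose(fd, deriv),
      np.isclose(rss_full - rss_del, e[i] ** 2 / m_ii))  # expected: True True True
```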
1.1.3 Remedial Measures
If the normality assumptions made for the least squares estimates in linear models are not met in practice, then some remedial measures need to be taken. Throughout the extensive literature available on this topic, one of the most prominent solutions is to apply a transformation to the data which may keep the normal linear regression form. However, the implications of a selected transformation may not necessarily be easy to interpret. Some of the standard remedial measures taken in case of various model departures are described below.
Non-Linearity
Non-linear Least Squares Estimation:
When a model has normally distributed errors with constant variance, but is non-linear in the independent variables, then the property of additive errors may enable a linear model through a transformation of the independent variables. The most common transformations are the following:
Such models are intrinsically linear ([7], Chapter 5). If these transformations are not possible, then alternative non-linear models may have to be considered:
$$y = g(\beta, x) + \varepsilon,$$
where $x$ represents a vector of predictor variables and $g(\beta, x)$ is not linear in $\beta$. The least squares estimator of $\beta$ is obtained through differentiation, which yields $p$ normal equations that, unlike in the case of ordinary least squares, are not linear in $\beta$. Hence, these normal equations are more complicated to solve, and numerical methods are usually required to obtain solutions.
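One such numerical method is the Gauss-Newton iteration, which repeatedly linearizes $g(\beta, x)$ around the current estimate and solves the resulting linear least squares problem. The sketch below is illustrative only: the exponential mean function $g(\beta, x) = \beta_0 e^{\beta_1 x}$, the noise-free data and the starting values are hypothetical choices, and no step-size control is included, so convergence depends on reasonable starting values.

```python
import numpy as np

def gauss_newton(g, jac, beta0, x, y, tol=1e-10, max_iter=50):
    """Minimize sum (y - g(beta, x))^2 by Gauss-Newton steps."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        r = y - g(beta, x)                           # current residuals
        J = jac(beta, x)                             # n x p matrix of dg/dbeta
        step = np.linalg.lstsq(J, r, rcond=None)[0]  # linearized least squares step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def g(beta, x):
    """Hypothetical mean function beta_0 * exp(beta_1 * x)."""
    return beta[0] * np.exp(beta[1] * x)

def jac(beta, x):
    ex = np.exp(beta[1] * x)
    return np.column_stack([ex, beta[0] * x * ex])

x = np.linspace(0.0, 2.0, 15)
y = 2.0 * np.exp(0.7 * x)                 # hypothetical noise-free responses
beta_hat = gauss_newton(g, jac, [1.5, 0.6], x, y)
print(np.round(beta_hat, 6))
```

This particular model is also intrinsically linear (taking logs of $y$ when the error is multiplicative), which is one common way of obtaining starting values.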
When the observations are independent yet have unequal variances, an ordinary least squares regression still yields unbiased estimates, but they will not have minimum variance. The observations then need to be weighted, with weights $w_i > 0$ such that $\operatorname{Var}(y_i) = \sigma^2/w_i$. Large weights $w_i$ imply small variances and have more impact in a regression model.
Examples of weight components:
1. if the $i$th response is the result of an average of $n_i$ equally variable observations, then $\operatorname{Var}(y_i) = \sigma^2/n_i$, where $w_i = n_i$;
Then, introducing the weight matrix $W = \operatorname{diag}(w_1, \dots, w_n)$, the modified estimator of $\beta$ is
$$\hat\beta_W = (X'WX)^{-1}X'Wy.$$
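A minimal NumPy sketch of the weighted estimator $\hat\beta_W = (X'WX)^{-1}X'Wy$, using the first example of weights: each response is an average of $n_i$ observations, so $w_i = n_i$. The group sizes and averages are hypothetical. As a check, the weighted fit coincides with an ordinary least squares fit in which each average is repeated $n_i$ times, because both minimize $\sum_i n_i(\bar y_i - x_i'\beta)^2$.

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """beta_W = (X'WX)^{-1} X'W y with W = diag(w), all w_i > 0."""
    XtW = X.T * w                        # X'W without forming diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)

x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
ybar = np.array([1.1, 2.9, 5.2, 6.8])    # hypothetical group averages
n_i = np.array([1, 3, 2, 4])             # group sizes, so Var(ybar_i) = sigma^2 / n_i
beta_w = weighted_least_squares(X, ybar, n_i.astype(float))

# Same estimate from OLS on the averages repeated n_i times
beta_rep = np.linalg.lstsq(np.repeat(X, n_i, axis=0), np.repeat(ybar, n_i), rcond=None)[0]
print(np.allclose(beta_w, beta_rep))  # expected: True
```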
Variance Stabilizing Transformations:
When the variances of the observations are not constant, it is possible to transform the observations (see Rao [22], Chapter 6) to make the variance constant. For this method to work, the form of the heteroscedasticity must be known, which is often not the case. Hence, in practice, one seeks transformations in a larger family and looks for an optimal member of this family which closely satisfies the assumptions of the model. One such transformation, known as the Box-Cox transformation, is discussed later.
The roughness penalty approach using cubic splines is a method for relaxing the model assumptions in the normal-theory linear case. It addresses two equally important problems in curve estimation: that of finding a good fit to the data used and that of quantifying the rapid fluctuation of a curve. Consider a model which is specified without placing any restrictions on the curve $g$. Hence, if there are no distributional assumptions made, then the normality-of-errors assumption is relaxed. Methods associated with the above model come under the general auspices of the topic of non-parametric regression, and the literature on this topic is extensive (e.g., Green and Silverman [10]).
Non-normality and Heteroscedasticity
The Box-Cox transformation is defined by
$$y^{(\lambda)} = \begin{cases} \dfrac{y^\lambda - 1}{\lambda}, & \lambda \ne 0,\\[4pt] \log y, & \lambda = 0, \end{cases}$$
for a positive response variable $y > 0$. This transformation may bring symmetry to a skewed response and reduce the heavy tails of a distribution while still retaining the simplicity of the normal linear model. When it does not provide a good fit to the data, alternative approaches have to be explored. One such approach is to use the generalized linear model (GLM), where the response is assumed to belong to the exponential family. The assumptions made here are based on the concept that the response depends on the predictors through a linear form. Thus, the linear models are generalized through
1. a link function which relates the expectation of the response to the linear predictor, and through
2. an exponential family distribution for the errors.
This model will be described in detail in Chapter 2 and is the highlight of this thesis.
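The Box-Cox transformation $y^{(\lambda)} = (y^\lambda - 1)/\lambda$ ($\log y$ when $\lambda = 0$), for $y > 0$, can be sketched in NumPy as follows. Here $\lambda$ is chosen by maximizing the standard profile log-likelihood $-\tfrac{n}{2}\log\{RSS(\lambda)/n\} + (\lambda - 1)\sum_i \log y_i$ over a grid, where $RSS(\lambda)$ is the residual sum of squares from regressing $y^{(\lambda)}$ on $X$. This selection criterion, the grid and the simulated data (for which $\log y$ is linear in $x$, so $\hat\lambda$ should be near 0) are assumptions of the sketch, not taken from the text.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform of a positive response y."""
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam

def profile_loglik(X, y, lam):
    """Profile log-likelihood of lambda (up to a constant) under normal errors."""
    z = box_cox(y, lam)
    beta = np.linalg.lstsq(X, z, rcond=None)[0]
    rss = np.sum((z - X @ beta) ** 2)
    n = len(y)
    # (lam - 1) * sum(log y) is the log Jacobian of the transformation
    return -0.5 * n * np.log(rss / n) + (lam - 1.0) * np.sum(np.log(y))

rng = np.random.default_rng(4)
x = np.linspace(1.0, 5.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = np.exp(0.3 + 0.4 * x + rng.normal(scale=0.02, size=x.size))  # log y linear in x
grid = np.round(np.arange(-1.0, 1.01, 0.1), 2)
lam_hat = grid[np.argmax([profile_loglik(X, y, lam) for lam in grid])]
print(lam_hat)
```

The Jacobian term matters: without it, $RSS(\lambda)$ values on different scales are not comparable and the maximization would simply favour whichever $\lambda$ shrinks the response.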
1.2 Outline of Thesis
The next chapter introduces the GLM, with all the relevant notation. It gives the properties of estimators and computational details for estimating the parameters for common exponential families. Tests for goodness-of-fit and inclusion/exclusion of variables are also included. The basic properties of residuals in the normal-theory linear models are used for extending the regression diagnostics to the generalized linear models in Chapter 3. This extension is made possible through transformed residuals, which are explained in detail in that chapter. The final chapter presents numerical illustrations of the techniques discussed in Chapter 3 and gives hands-on experience with real data through computer programs developed using the S-Plus software application.
Chapter 2
The Generalized Linear Model
2.1 Historical Aspects
The term "generalized linear model" was first introduced by Nelder and Wedderburn in 1972. The generalized linear model has been one of the most important developments in the field of statistics in the last thirty years. Much used in applications to the social sciences and medicine, these models also play an important role in the analysis of survival data. As their name suggests, these models generalize the normal-theory linear models such that the usual linear regression component is used to describe a wider class of probability distributions, specifically the exponential family distributions. Although generalized linear models have had an important impact on statistics, most introductory statistics textbooks still only present normal linear models.
It was seen in Chapter 1 that an adequate linear regression model should include a scale which ensures the combination of constancy of variance, approximate normality of the errors, and additivity of the systematic effects. However, this scale does not
CHAPTER 2. THE GENERALUED iimE4.R MODEL 18
always respect all three criteria. For example, if some discrete data are found to have errors with an approximate Poisson distribution, the systematic effects may be multiplicative, in which case log-linear models are usually employed. The following choices of scaling are obtained by transforming $y$:
• $y^{1/2}$ to ensure approximate constancy of variance,
Generally, none of these scaling possibilities combines all three criteria for an adequate linear regression analysis. Alternatively, a generalized linear model encompasses errors from an exponential family distribution and a variance function which depends on the mean in some known way, so that there is no need to rescale for normality of errors or for constancy of variance. In fact, the scaling problem is reduced to ensuring that the systematic effects are additive. The GLM may be considered to be an extension of the normal-theory linear model with some added modifications, where the mean $\mu$ of an exponential family with response variable $y$ is linearly related to the predictors $x_1, \dots, x_p$ by a link function, $g(\mu)$. This is described in detail in the sections that follow.
2.2 Mean and Variance Functions in an Exponential Family
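An observation $y$ follows an exponential family distribution if its probability density function is given by
$$f(y; \theta, \phi) = \exp\left\{\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right\}, \tag{2.1}$$
where $a$, $b$, and $c$ are some known functions, $\theta$ is the location parameter and $\phi$ is the dispersion parameter.
When the dispersion parameter $\phi$ is known, $\theta$ is the canonical parameter. The mean and variance of $y$ are given by $b'(\theta)$ and $a(\phi)b''(\theta)$. Thus it can be written that
$$\mu = E(y) = b'(\theta), \qquad \operatorname{Var}(y) = a(\phi)V(\mu),$$
where $V(\mu) = b''(\theta)$ is called the variance function. For example, in the case of the normal distribution, $\theta = \mu$, $V(\mu) = 1$ and $a(\phi) = \sigma^2$. These may be derived from
$$E\left(\frac{\partial l}{\partial\theta}\right) = 0$$
and
$$E\left(\frac{\partial^2 l}{\partial\theta^2}\right) + E\left[\left(\frac{\partial l}{\partial\theta}\right)^2\right] = 0, \tag{2.6}$$
respectively, where $l$ is the log-likelihood function. Note that
$$\frac{\partial l}{\partial\theta} = \frac{y - b'(\theta)}{a(\phi)}, \qquad \frac{\partial^2 l}{\partial\theta^2} = -\frac{b''(\theta)}{a(\phi)};$$
hence the first identity gives $E(y) = b'(\theta)$, and equation (2.6) yields
$$\operatorname{Var}(y) = a(\phi)\,b''(\theta).$$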
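As a numerical check of $\mu = b'(\theta)$ and $\operatorname{Var}(y) = a(\phi)b''(\theta)$, the sketch below takes the Poisson family, for which $b(\theta) = e^\theta$, $a(\phi) = 1$ and $\theta = \log\mu$. It computes the mean and variance directly from the probability function and compares them with finite-difference derivatives of $b$. The value $\mu = 3.5$, the truncation point of the sum and the step size are arbitrary assumptions.

```python
import math

def poisson_moments(mu, kmax=200):
    """Mean and variance of a Poisson(mu) variable, summed directly over its pmf."""
    probs = [math.exp(k * math.log(mu) - mu - math.lgamma(k + 1)) for k in range(kmax)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var

def b(theta):
    """Cumulant function of the Poisson family, b(theta) = exp(theta)."""
    return math.exp(theta)

def derivatives(fun, t, h=1e-4):
    """Central finite-difference approximations to fun'(t) and fun''(t)."""
    d1 = (fun(t + h) - fun(t - h)) / (2 * h)
    d2 = (fun(t + h) - 2 * fun(t) + fun(t - h)) / h**2
    return d1, d2

mu = 3.5
theta = math.log(mu)              # canonical parameter for the Poisson family
mean, var = poisson_moments(mu)
b1, b2 = derivatives(b, theta)
print(round(mean, 6), round(b1, 6), round(var, 6), round(b2, 6))
```

Since $b''(\theta) = e^\theta = \mu$, the check also confirms the Poisson variance function $V(\mu) = \mu$.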
2.3 Description of the Generalized Linear Model
The observations belonging to a statistical model can be summarized in terms of a systematic component and a random component. In the generalized linear model (GLM) discussed by McCullagh and Nelder [17], the random component is inherent in the exponential family distribution of the observations, while the systematic component assumes a linear structure in the predictor variables for a function of the mean. This function is known as the link function. When the parameter $\theta$ is modeled as a linear function of the predictors, the link function is known as the canonical link. Therefore, for a given set of observations $\{y_i\}_{i=1}^n$, where $y_i$ is considered to be associated with predictor values $x_i = (x_{i1}, \dots, x_{ip})'$, the GLM is expressed as:
$$f(y_i; \theta_i, \phi) = \exp\left\{\frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi)\right\}, \qquad i = 1, \dots, n,$$
where $\theta_i$ is assumed to depend on $x_i$ through the relation
$$g(\mu_i) = \eta_i = x_i'\beta, \qquad \mu_i = b'(\theta_i).$$
If $g$ is the canonical link, then the link function is specified by
$$g(\mu_i) = \theta_i = x_i'\beta.$$
In practice, a given data set may be distributed according to some unknown member of the exponential family and, therefore, different link functions have to be evaluated. The link function serves to determine the scale on which linearity is assumed, and the form of the exponential family structures the variation in the data. If the parameters $\beta_1, \dots, \beta_p$ are unrestricted, then $g(\mu)$ can take any value in $\mathbb{R}$; hence the link function is determined to some extent by the domain of variation of $\mu$. For example, if the response is a proportion, then the link function $g$ must map the unit interval $(0, 1)$ onto the unrestricted range $(-\infty, \infty)$. In the case where the response is limited to being positive, $g$ must map the positive interval $(0, \infty)$ onto $\mathbb{R}$.
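It is shown, as follows, that in the case of a canonical link, the sufficient statistic for the linear parameter $\beta$ is given by $X'y$, where $X = (x_1, \dots, x_n)'$ represents the design matrix of the $p$ predictor variables and $y$ represents the column vector of the $n$ observations.
To see this, first note that $\mu = b'(\theta)$ and, for the canonical link, $g(\mu) = \theta$; then it follows that
$$g'(\mu) = \frac{d\theta}{d\mu} = \frac{1}{b''(\theta)} = \frac{1}{V(\mu)}. \tag{2.9}$$
This fact is used in deriving the maximum likelihood estimator of $\beta$, which will consequently be shown to depend on the observations $y$ only through $X'y$, proving the sufficiency. Here, the log-likelihood function is given by
$$l(\beta) = \sum_{i=1}^n \left\{\frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi)\right\}, \tag{2.10}$$
where $\theta_i$ is related to $\beta$ through $g(\mu_i) = x_i'\beta$. Now, differentiating the log-likelihood function in equation (2.10) gives
$$\frac{\partial l}{\partial\beta_j} = \sum_{i=1}^n \frac{y_i - \mu_i}{a(\phi)}\,\frac{d\theta_i}{d\mu_i}\,\frac{d\mu_i}{d\eta_i}\,x_{ij}.$$
Using equation (2.9) along with the above equation produces
$$\frac{\partial l}{\partial\beta_j} = \sum_{i=1}^n \frac{(y_i - \mu_i)\,x_{ij}}{a(\phi)}, \qquad j = 1, \dots, p,$$
which implies, for canonical links, that the maximum likelihood equations are
$$X'y = X'q(\hat\beta),$$
for some nonlinear function $q$ (here $q(\beta)$ is the vector of means $\mu$ evaluated at $\eta = X\beta$). This is attributed to the fact that $g(\mu) = \theta$ holds for canonical links only. Hence, the maximum likelihood estimator depends on $y$ only through $X'y$.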
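Now, the canonical links for the binomial, Poisson and gamma families are given respectively by the logit, log and inverse transformations. Consider the probability distribution of the proportion $y$ of successes in a sequence of $m$ identical Bernoulli trials with probability of success $\pi$; then
$$f(y; \theta, \phi) = \binom{m}{my}\pi^{my}(1 - \pi)^{m - my} = \exp\left\{\frac{y\theta - \log(1 + e^\theta)}{1/m} + \log\binom{m}{my}\right\},$$
where $\theta = \log\{\pi/(1 - \pi)\}$; hence the canonical link is given by the logit transformation and the generalized linear model is given by
$$\log\frac{\pi_i}{1 - \pi_i} = x_i'\beta.$$
For Poisson data with mean $\mu$, the probability distribution function is
$$f(y; \theta, \phi) = \exp\{(y\theta - e^\theta) - \log(y!)\},$$
where $\theta = \log\mu$; clearly, the log transformation yields a canonical link. Similarly, for gamma data with density
$$f(y) = \frac{y^{\alpha - 1}e^{-y/k}}{\Gamma(\alpha)\,k^\alpha}, \qquad y > 0,$$
it may be reparametrized such that $\alpha = 1/\phi$ and $k = -\phi/\theta$, to get
$$f(y; \theta, \phi) = \exp\left\{\frac{y\theta + \log(-\theta)}{\phi} + \left(\frac{1}{\phi} - 1\right)\log y - \frac{\log\phi}{\phi} - \log\Gamma\!\left(\frac{1}{\phi}\right)\right\}.$$
Therefore, $\mu = k\alpha = -1/\theta$ and, consequently, the canonical link is given by
$$\theta = g(\mu) = -\frac{1}{\mu}.$$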
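The three canonical links (logit, log and the gamma link $\theta = -1/\mu$ implied by $\mu = -1/\theta$), together with their inverses and variance functions $V(\mu)$, can be collected in a small Python table, sketched below. For the binomial proportion, $V(\mu) = \mu(1 - \mu)$ with $a(\phi) = 1/m$; for the Poisson, $V(\mu) = \mu$; for the gamma, $V(\mu) = \mu^2$. The dictionary layout and names are illustrative assumptions. The gamma entry keeps the sign convention derived in the text; some software (e.g. R's `Gamma` family) uses the inverse link $1/\mu$ instead, which differs only in sign.

```python
import math

# family: (canonical link theta = g(mu), inverse link mu = g^{-1}(theta), variance function V(mu))
CANONICAL = {
    "binomial": (lambda mu: math.log(mu / (1.0 - mu)),   # logit
                 lambda th: 1.0 / (1.0 + math.exp(-th)),
                 lambda mu: mu * (1.0 - mu)),
    "poisson": (math.log,                                  # log
                math.exp,
                lambda mu: mu),
    "gamma": (lambda mu: -1.0 / mu,                        # negative reciprocal
              lambda th: -1.0 / th,
              lambda mu: mu * mu),
}

for name, (link, inv, var) in CANONICAL.items():
    mu = 0.3 if name == "binomial" else 2.5
    theta = link(mu)
    print(f"{name:9s} theta={theta:+.4f}  back={inv(theta):.4f}  V(mu)={var(mu):.4f}")
```

A useful property to check is that, for each canonical link, $g'(\mu)V(\mu) = 1$, i.e. $d\theta/d\mu = 1/b''(\theta)$.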
Table 2.1: Dispersion Parameter, Canonical Link and Variance Function for Distributions of the Exponential Family
DISTRIBUTION | Notation | $a(\phi)$ | $\theta = g(\mu)$ | Name | $V(\mu)$
Table 2.1 gives canonical links and other components for common distribution families with respect to the exponential family given by equation (2.1) [17]. The choice of a proper link function that will satisfy the criterion of the domain of variation of $\mu$ is based on:
1. how the link function will help interpret the parameters in the linear predictor;
2. how well the link fits the data; and
3. the existence of a simple sufficient statistic.
Possible link functions associated with some important members of the exponential family are listed in Table 2.2. In summary, generalized linear models make up a general class of probabilistic regression models with the assumptions that:
(1) the response probability distribution is a member of the exponential family of distributions;
(2) the responses $y_i$, $i = 1, \dots, n$, are a set of independent random variables;
(3) the explanatory variables are linearly combined to explain systematic variation in a function of the mean.
In a practical data situation, GLM fitting involves the following:
• choosing an error distribution that is relevant;
• identifying the independent variables to be included in the systematic component; and
• specifying the link function.
The next section presents the maximum likelihood method for estimating the regression parameters, assuming that the above have been specified.
2.4 Maximum Likelihood Estimation for the GLM
If the probability specifications of an exponential family model are known through f(y; θ), then the standard way to fit a generalized linear model is by maximum likelihood estimation of the parameters β for the observed data (Silverman and Green [10]). With the many desirable properties of maximum likelihood estimators, such as consistency, efficiency, sufficiency and asymptotic normality, it is natural to consider such a method for GLMs. In general, the maximum likelihood equations which result from GLMs cannot be solved explicitly, and hence recourse must be made to numerical methods. Three methods are described in this section: the Newton-Raphson method, the Fisher scoring method, and the iteratively weighted least squares method. But first, the maximum likelihood equations are derived. Given the responses y_1, ..., y_n, where y_i is considered to be generated from a member of the exponential family f(y_i; θ_i, φ) with functions a, b and c, the likelihood function is written as
$$L(\beta; y) = \prod_{i=1}^{n} f(y_i;\theta_i,\phi) = \prod_{i=1}^{n}\exp\left\{\frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i,\phi)\right\}.$$
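Then the log-likelihood is given by
$$\ell(\beta; y) = \sum_{i=1}^{n}\ell_i,$$
whereby ℓ_i is the ith component of the log-likelihood and is therefore given by
$$\ell_i = \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i,\phi).$$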
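The likelihood depends on the parameters β_j, j = 1, ..., p, only implicitly: firstly through the link function g(μ), and secondly through the linearity in the β values that the link encompasses. The derivatives of the log-likelihood with respect to β_j, otherwise known as the score functions, are evaluated by the chain rule:
$$\frac{\partial \ell_i}{\partial \beta_j} = \frac{\partial \ell_i}{\partial \theta_i}\,\frac{\partial \theta_i}{\partial \mu_i}\,\frac{\partial \mu_i}{\partial \eta_i}\,\frac{\partial \eta_i}{\partial \beta_j} = \frac{y_i-\mu_i}{a(\phi)}\cdot\frac{1}{V(\mu_i)}\cdot\frac{1}{g'(\mu_i)}\cdot x_{ij}.$$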
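Hence, the score functions reduce to
$$U_j = \frac{\partial \ell}{\partial \beta_j} = \sum_{i=1}^{n}\frac{(y_i-\mu_i)\,x_{ij}}{a(\phi)\,V(\mu_i)\,g'(\mu_i)}, \qquad j = 1,\dots,p.$$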
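In vector form, the score equations are given by
$$U(\beta) = \frac{1}{a(\phi)}\,X'W\Delta\,(y-\mu) = 0, \qquad (2.19)$$
where W = diag{w_ii}, with w_ii = {[g'(μ_i)]²V(μ_i)}^{-1}, and Δ = diag{g'(μ_i)}.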
The maximum likelihood estimator of β is obtained by solving (2.19) using the linearity found in g(μ) = Xβ, where g(μ) = (g(μ_1), ..., g(μ_n))'. Numerical methods to solve (2.19) are essentially iterative, and common to all of them is the need for a starting value of the estimate. With the initial aim of obtaining a "good" starting value, the following technique is employed, using the approximate linearized form g(y) ≈ g(μ) + (y − μ)g'(μ). The adjusted dependent variate z, which depends on both y and μ, is introduced:
$$z = \eta + (y-\mu)\,g'(\mu), \qquad \eta = g(\mu).$$
Given that the variance of z is a(φ)[g'(μ)]²V(μ), an initial estimate of β may be obtained by weighted least squares of z (with μ = y) on X, using weights inversely proportional to these variances, i.e. a diagonal matrix whose components are given by
$$w_{ii} = \frac{1}{[g'(\mu_i)]^2\,V(\mu_i)}.$$
Known as the working weights matrix, this matrix is denoted by W. In cases where repeated observations occur at a given design point, y_i is replaced by the average of the sample observations. Since the average also belongs to the same exponential family, with the variance replaced by a(φ)V(μ_i)/n_i, n_i being the number of observations on which the sample mean is based, the working weights matrix contains diagonal elements given by
$$w_{ii} = \frac{n_i}{[g'(\mu_i)]^2\,V(\mu_i)}.$$
The initial estimate of β then solves the weighted normal equations X'WXβ = X'Wz or, equivalently, is the weighted least squares estimator from the model E(z) = Xβ:
$$\hat\beta = (X'WX)^{-1}X'Wz. \qquad (2.23)$$
Both z and W are used for maximum likelihood estimation through a weighted least squares regression. This process is iterative, since both z and W depend on the fitted values from the current estimates. A scoring method is therefore needed to iterate the weighted least squares regression of a GLM until convergence is reached.
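A minimal sketch of this starting step, assuming a Poisson response with log link (so g'(μ) = 1/μ and V(μ) = μ): with μ = y the adjusted dependent variate reduces to z = log y and the working weight to w_ii = n_i y_i. Replacing a zero count by 0.5 as a starting mean is an assumption added here, since log 0 is undefined; the function name is hypothetical.

```python
import math


def initial_working_quantities(y, n_rep=None, zero_start=0.5):
    """Adjusted dependent variate z and working weights w at the start (mu = y)
    for a Poisson GLM with log link.

    n_rep[i] is the number of replicate observations averaged into y[i];
    zero_start replaces y = 0 as a starting mean (an assumption, to avoid log(0)).
    """
    if n_rep is None:
        n_rep = [1] * len(y)
    z, w = [], []
    for yi, ni in zip(y, n_rep):
        mu = yi if yi > 0 else zero_start
        eta = math.log(mu)                   # eta = g(mu)
        g_prime = 1.0 / mu                   # log link: g'(mu) = 1/mu
        variance = mu                        # Poisson: V(mu) = mu
        z.append(eta + (yi - mu) * g_prime)  # z = eta + (y - mu) g'(mu)
        w.append(ni / (g_prime ** 2 * variance))  # w = n / ([g'(mu)]^2 V(mu))
    return z, w


z, w = initial_working_quantities([4, 0, 9], n_rep=[1, 1, 3])
print(z, w)
```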
2.4.1 The Newton-Raphson Method
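The Newton-Raphson method presents a numerical approach to calculating the maximum likelihood estimate β̂. This iterative process begins with the weighted least squares estimator obtained from the initial solution of (2.23). A Taylor series expansion of the score U(β) = ∂ℓ/∂β about the current estimate β^(r) is used:
$$U(\beta) \approx U(\beta^{(r)}) + \frac{\partial^2 \ell(\beta^{(r)})}{\partial\beta\,\partial\beta'}\,(\beta - \beta^{(r)}),$$
and setting U(β) = 0 gives the update
$$\beta^{(r+1)} = \beta^{(r)} - \left[\frac{\partial^2 \ell(\beta^{(r)})}{\partial\beta\,\partial\beta'}\right]^{-1} U(\beta^{(r)}).$$
This is iteratively repeated until convergence is obtained [10].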
2.4.2 Fisher's Scoring Method
If the negative second-derivative matrix, or the Hessian matrix, is not positive definite at every iteration (for example, if it is not invertible), then the Newton-Raphson algorithm is no longer valid. In this case, the Hessian matrix is replaced by its expectation, giving Fisher's scoring algorithm. This method is simpler, since the expected matrix is more likely to be positive definite, as
$$-E\left[\frac{\partial^2 \ell}{\partial\beta\,\partial\beta'}\right] = E\left[U(\beta)\,U(\beta)'\right],$$
which is the expectation of a positive semi-definite matrix. Thus, the iterative process for Fisher's scoring algorithm is given by:
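$$\beta^{(r+1)} = \beta^{(r)} + \left[\mathcal{I}(\beta^{(r)})\right]^{-1} U(\beta^{(r)}), \qquad (2.28)$$
where 𝓘(β) = −E[∂²ℓ/∂β∂β'] is evaluated at the previous iteration. For evaluating the derivatives in (2.28), the linear predictor η = Xβ is used:
$$\frac{\partial \ell}{\partial \eta_i} = \frac{y_i-\mu_i}{a(\phi)\,V(\mu_i)\,g'(\mu_i)} = \frac{w_{ii}\,g'(\mu_i)\,(y_i-\mu_i)}{a(\phi)}$$
and
$$-E\left[\frac{\partial^2 \ell}{\partial \eta_i\,\partial \eta_j}\right] = \begin{cases}[a(\phi)]^{-1}w_{ii}, & i = j,\\ 0, & i \neq j.\end{cases}$$
Consider η^(0) to be the initial n-vector with components η_i^(0) = g(y_i). Since η = Xβ, then by the chain rule
$$U(\beta) = \frac{1}{a(\phi)}\,X'W\Delta\,(y-\mu), \qquad \mathcal{I}(\beta) = \frac{1}{a(\phi)}\,X'WX,$$
where Δ = diag{g'(μ_i)}. Fisher's scoring algorithm therefore yields the following sequence of updated estimates:
$$\beta^{(r+1)} = \beta^{(r)} + (X'WX)^{-1}X'W\Delta\,(y-\mu) = (X'WX)^{-1}X'Wz,$$
with W, Δ, μ and z = η + Δ(y − μ) evaluated at β^(r). The dispersion parameter φ is eliminated because a(φ) cancels in the multiplication; hence it is called a nuisance parameter (McCullagh and Nelder [17]).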
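The two algorithms differ only when the observed and expected information differ, which happens for non-canonical links. The sketch below is a hypothetical one-parameter example, assuming Poisson counts with an identity link, μ_i = βx_i: the score is U(β) = Σ(y_i/β − x_i), the observed information is Σy_i/β², and the expected information is Σx_i/β. The MLE is Σy_i/Σx_i in closed form, which makes the comparison easy to check.

```python
def newton_raphson(y, x, beta, steps):
    """Newton-Raphson for mu_i = beta * x_i (Poisson, identity link)."""
    path = [beta]
    for _ in range(steps):
        score = sum(yi / beta - xi for yi, xi in zip(y, x))
        observed_info = sum(yi / beta ** 2 for yi in y)   # -d2l/dbeta2
        beta = beta + score / observed_info
        path.append(beta)
    return path


def fisher_scoring(y, x, beta, steps):
    """Fisher scoring: the observed information is replaced by its expectation."""
    path = [beta]
    for _ in range(steps):
        score = sum(yi / beta - xi for yi, xi in zip(y, x))
        expected_info = sum(xi / beta for xi in x)         # E[-d2l/dbeta2]
        beta = beta + score / expected_info
        path.append(beta)
    return path


y = [3, 7, 4, 10, 12]
x = [1.0, 2.0, 2.0, 3.0, 4.0]
mle = sum(y) / sum(x)               # closed-form MLE for this model
print("MLE", mle)
print("Newton-Raphson", newton_raphson(y, x, 1.0, 6))
print("Fisher scoring", fisher_scoring(y, x, 1.0, 2))
```

In this particular model Fisher scoring reaches the MLE in one step from any positive start, while Newton-Raphson converges quadratically; this is a property of the example, not a general guarantee.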
2.4.3 Iteratively Weighted Least Squares (IWLS)
As indicated in Section 2.4, the introduction of the adjusted dependent variate z results in the following equation for the MLE [see (2.23)]:
$$\hat\beta = (X'WX)^{-1}X'Wz.$$
However, z and W depend on the unknown μ̂; hence this equation gives rise to the iterative process
$$\hat\beta^{(i+1)} = \left(X'W^{(i)}X\right)^{-1}X'W^{(i)}z^{(i)}.$$
This is known as the method of iteratively weighted least squares, IWLS. The starting value of the iteration is obtained by substituting μ̂^(0) = y. At each iteration i, a weighted least squares regression of the working response variate z^(i) on the design matrix X is obtained with the working weights matrix W^(i), where z^(i) and W^(i) are obtained by replacing μ with μ̂^(i) = g^{-1}(Xβ̂^(i)). The algorithm can thus be summarized as follows:
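• Start with a sufficient statistic from the data to get an initial fitted value vector μ̂^(0).
• From this statistic, the link function g is used to derive an initial linear predictor η̂^(0) = g(μ̂^(0)).
These statistics are used in creating the starting adjusted dependent variate and working weights matrix as follows:
$$z^{(0)} = \hat\eta^{(0)} + \left(y - \hat\mu^{(0)}\right)g'\!\left(\hat\mu^{(0)}\right), \qquad w_{ii}^{(0)} = \frac{1}{\left[g'(\hat\mu_i^{(0)})\right]^2 V\!\left(\hat\mu_i^{(0)}\right)}.$$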
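A weighted least squares regression of z^(0) on X is carried out for the model E(z) = Xβ with the working weights matrix W^(0) to obtain a first maximum likelihood estimate:
$$\hat\beta^{(1)} = \left(X'W^{(0)}X\right)^{-1}X'W^{(0)}z^{(0)},$$
which is then used to obtain updated values of η̂ and μ̂:
$$\hat\eta^{(1)} = X\hat\beta^{(1)}, \qquad \hat\mu^{(1)} = g^{-1}\!\left(\hat\eta^{(1)}\right).$$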
This process is repeated to update the regression estimates at each iteration via a scoring algorithm, until the variation from one iteration to the next is sufficiently small. Maximum likelihood estimation through the IWLS procedure is an extension of the non-iterative least squares method of estimation for normal-theory linear models, with W^{1/2}X as the design matrix and the adjusted dependent variate W^{1/2}z as the response variable.
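At convergence, if it occurs, z becomes ẑ = η̂ + Ŵ^{-1}(y − μ̂) (for a canonical link, where g'(μ) = 1/V(μ) and hence W = diag{V(μ̂_i)}), so that the maximum likelihood estimate of β is:
$$\hat\beta = (X'\hat W X)^{-1}X'\hat W\hat z.$$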
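If the working weights matrix is W = I (the identity matrix), as for the normal distribution with the identity link (for which also z = y), then the maximum likelihood and least squares methods coincide. No iteration is required for the maximum likelihood estimation:
$$\hat\beta = (X'X)^{-1}X'y.$$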
Hence, the IWLS method extends the least squares procedure beyond the linear model to the generalized linear model, which includes the binomial, Poisson, normal, inverse normal, gamma, exponential, and multinomial distributions.
An interesting point to note is that the working weights matrix used in IWLS, W, is updated at each iterative step, so that each element w_ii of W is updated too for each observation i. Hence, W depends entirely on the fit of the model, and not at all on the likelihood equations X'(y − μ̂) = 0 used to determine β̂. In contrast, the weights determine the fit in the weighted least squares method.
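The IWLS steps above can be sketched in Python for a binomial response with logit link and a single covariate, η_i = β_0 + β_1x_i. The data are hypothetical, the 2 × 2 weighted normal equations are solved directly to keep the example free of external libraries, and the empirical-logit start (y + 0.5)/(m + 1) is an assumption used in place of μ̂^(0) = y to avoid infinite logits.

```python
import math


def wls_line(x, z, w):
    """Weighted least squares fit of z on the columns (1, x); returns (b0, b1)."""
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    t0 = sum(wi * zi for wi, zi in zip(w, z))
    t1 = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
    det = s0 * s2 - s1 * s1
    return (s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det


def iwls_logistic(x, y, m, tol=1e-10, max_iter=50):
    """IWLS for y_i ~ B(m_i, pi_i) with logit(pi_i) = b0 + b1 * x_i."""
    # Start from the data; the +0.5 / +1 adjustment (an assumption) keeps
    # observed proportions of 0 or 1 away from infinite logits.
    pi = [(yi + 0.5) / (mi + 1.0) for yi, mi in zip(y, m)]
    eta = [math.log(p / (1.0 - p)) for p in pi]
    previous = None
    for iteration in range(1, max_iter + 1):
        # Working weights 1 / ([g'(pi)]^2 V(pi)) with V(pi) = pi (1 - pi) / m.
        w = [mi * p * (1.0 - p) for mi, p in zip(m, pi)]
        # Adjusted dependent variate z = eta + (y/m - pi) g'(pi).
        z = [e + (yi / mi - p) / (p * (1.0 - p))
             for e, yi, mi, p in zip(eta, y, m, pi)]
        b0, b1 = wls_line(x, z, w)
        eta = [b0 + b1 * xi for xi in x]
        pi = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        if previous is not None and max(abs(b0 - previous[0]),
                                        abs(b1 - previous[1])) < tol:
            break
        previous = (b0, b1)
    fitted = [mi * p for mi, p in zip(m, pi)]
    return (b0, b1), fitted, iteration


# Hypothetical dose-response counts (not data from this thesis).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2, 4, 9, 13, 18]
m = [20, 20, 20, 20, 20]
beta, fitted, iterations = iwls_logistic(x, y, m)
print("beta =", beta, "iterations =", iterations)
```

Because the logit link is canonical, the converged fit satisfies the likelihood equations X'(y − μ̂) = 0; in particular the fitted counts sum to the observed total, which gives a quick check on the implementation.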
The basic components of the generalized linear model, as an extension of the normal-theory model, are summarized in Table 2.3.
2.5 The Goodness of Model Fit
As previously stated, the link function which is used to describe the systematic component is often unknown. Canonical links may simplify the mathematics, but they do not necessarily give the best prediction. A natural question bound to arise in fitting a GLM is "how good is the link function used?", in relation to other potential link functions. In effect, the model fit is questioned. Other issues attributable to model fitting are based on assumptions such as the exponential family distribution of the observations, the constancy of the dispersion parameter and the independence of the observations, much like those seen in normal-theory linear models, and the issue of identifying influential observations.
A common goal in postulating the systematic model is to have only as many independent variables as necessary for a good fit. Consequently, measures which can determine the quality of the fit, and statistical tests for keeping the variates in the model, are sought. In particular, the two most useful goodness-of-fit statistics are the deviance measure and the Pearson statistic. The deviance measure is motivated by the discrepancy between the maxima of the observed and the expected (under the model) log-likelihood functions. The Pearson statistic, in contrast, measures the relative difference between the observed and the fitted values. The distributions of both statistics can be approximated by the χ² distribution with the corresponding degrees of freedom. In either case, a large deviance or chi-square value implies poorly fitted observations with respect to the model.
2.5.1 The Deviance Function
The maximized likelihood for a given model may be considered to be an indicator of the goodness-of-fit. For example, the ratio of the maximized likelihoods under two models may serve as a measure of the goodness of one model over the other, as may the logarithm of this ratio. The deviance measure D is thus defined as twice the logarithm of the likelihood ratio of the saturated (full) model to the fitted model. Subsequently, a related scaled deviance D* is obtained; since a close fit implies a small D*, a good fit is indicated by small values of the deviance. The table below expresses the deviance function for the different members of the exponential family with their respective canonical links. Note that μ̂_i is the value of μ_i = E(y_i) for the model considered.
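The unscaled version of the deviance is
$$D = a(\phi)\,D^* = 2\sum_{i=1}^{n}\left\{y_i\left(\tilde\theta_i - \hat\theta_i\right) - b(\tilde\theta_i) + b(\hat\theta_i)\right\} = \sum_{i=1}^{n} d_i. \qquad (2.38)$$
In (2.38), θ̂_i is the MLE of θ_i under the fitted model, and each d_i contributes to the deviance. The value of θ_i which maximizes the likelihood function for the ith observation alone is θ̃_i, whereby b'(θ̃_i) = y_i.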
2.5.2 The Pearson Statistic
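The Pearson statistic is defined using the weighted least squares approach, which provides the following chi-square goodness-of-fit statistic:
$$X^2 = \min_{\beta}\sum_{i=1}^{n}\hat w_{ii}\,(\hat z_i - x_i'\beta)^2 = \sum_{i=1}^{n}\frac{(y_i - \hat\mu_i)^2}{V(\hat\mu_i)},$$
where ŵ_ii and ẑ_i are evaluated at convergence.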
This measure is computationally simpler than the deviance measure, and it is most useful for distributions close to the normal family, as it then resembles the RSS of normal theory for other diagnostic purposes. However, when the probability density function of the observations is markedly asymmetric, outliers may not be well detected by Pearson residuals. In these situations, the deviance residuals detect outliers better.
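For the Poisson family (V(μ) = μ, a(φ) = 1), the unscaled deviance (2.38) becomes D = 2Σ{y_i log(y_i/μ̂_i) − (y_i − μ̂_i)}, and the Pearson statistic is X² = Σ(y_i − μ̂_i)²/μ̂_i. The sketch below computes both for hypothetical counts under an intercept-only model, for which the MLE is μ̂_i = ȳ; the convention 0 log 0 = 0 handles zero counts, and the function names are illustrative.

```python
import math


def xlogy_ratio(y, mu):
    """y * log(y / mu), with the convention 0 * log(0) = 0."""
    return 0.0 if y == 0 else y * math.log(y / mu)


def poisson_deviance(y, mu):
    """Unscaled Poisson deviance D = 2 * sum[y log(y/mu) - (y - mu)]."""
    return 2.0 * sum(xlogy_ratio(yi, mi) - (yi - mi) for yi, mi in zip(y, mu))


def poisson_pearson(y, mu):
    """Pearson statistic X^2 = sum (y - mu)^2 / V(mu), with V(mu) = mu."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))


# Hypothetical counts; under an intercept-only Poisson model the MLE of every
# mean is the sample average.
y = [2, 3, 6, 7, 8, 9, 10, 12, 15]
mu_hat = [sum(y) / len(y)] * len(y)
print("D =", poisson_deviance(y, mu_hat), "X2 =", poisson_pearson(y, mu_hat))
```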
2.5.3 Residuals and the Projection Matrix
The usefulness of the residuals r_i = y_i − μ̂_i, where μ̂_i is from the model fit, as used for diagnostic purposes in normal-theory linear models, does not carry over to GLMs. However, since the residual sum of squares serves as a measure of goodness-of-fit in normal-theory models, it would be best if the two measures given here could be decomposed into components, which in turn could serve as modified residuals in GLMs. Using this concept, it can be seen that
$$X^2 = \sum_{i=1}^{n} r_{P_i}^2,$$
where
$$r_{P_i} = \frac{y_i - \hat\mu_i}{\sqrt{V(\hat\mu_i)}},$$
which are the weighted residuals or the Pearson residuals.
Similarly,
$$D = \sum_{i=1}^{n} r_{D_i}^2,$$
where
$$r_{D_i} = \operatorname{sign}(y_i - \hat\mu_i)\sqrt{d_i}.$$
These are the deviance residuals (see Pregibon). Hence, as in normal-theory models, both the Pearson and deviance residuals may be useful in developing diagnostic tools in GLMs. This will be discussed in Chapter 3.
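As a sketch of these decompositions for binomial data, assuming y_i ~ B(m_i, π_i) with variance m_iπ_i(1 − π_i) on the count scale, the Pearson residual is (y_i − m_iπ̂_i)/√(m_iπ̂_i(1 − π̂_i)) and d_i = 2{y_i log(y_i/μ̂_i) + (m_i − y_i) log[(m_i − y_i)/(m_i − μ̂_i)]}. The data and the intercept-only fit π̂ = Σy_i/Σm_i are hypothetical.

```python
import math


def binomial_residuals(y, m, pi):
    """Pearson and deviance residuals for y_i ~ B(m_i, pi_i) at fitted pi_i."""
    def term(a, b):                    # a * log(a / b), with 0 * log 0 = 0
        return 0.0 if a == 0 else a * math.log(a / b)

    pearson, deviance = [], []
    for yi, mi, p in zip(y, m, pi):
        mu = mi * p
        pearson.append((yi - mu) / math.sqrt(mi * p * (1.0 - p)))
        d_i = 2.0 * (term(yi, mu) + term(mi - yi, mi - mu))
        d_i = max(d_i, 0.0)            # guard against tiny negative rounding
        deviance.append(math.copysign(math.sqrt(d_i), yi - mu))
    return pearson, deviance


# Hypothetical data fitted by an intercept-only model: pi_hat = sum(y) / sum(m).
y = [3, 0, 7, 5]
m = [10, 8, 12, 10]
pi_hat = sum(y) / sum(m)
r_p, r_d = binomial_residuals(y, m, [pi_hat] * len(y))
print([round(r, 3) for r in r_p])
print([round(r, 3) for r in r_d])
```

Summing the squared residuals recovers X² and D, respectively.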
For detecting influential observations and outliers, the use of the adjusted dependent variate z permits the use of the projection matrix
$$H = W^{1/2}X(X'WX)^{-1}X'W^{1/2},$$
using the transformation X → W^{1/2}X = X_W and the least squares theory as introduced in Chapter 1. Hence
$$H = X_W\,(X_W'X_W)^{-1}X_W'$$
shares the properties of a projection matrix. As mentioned in Chapter 1, the diagonal elements h_ii can be used for diagnostic purposes. It is also interesting to note that, for the canonical link, W = diag{V(μ̂_i)} and X'W^{1/2}χ = X'(y − μ̂) = 0, so that Hχ = 0. This implies that
$$(I - H)\,\chi = \chi,$$
where χ denotes the vector of Pearson residuals for the canonical link. Hence, to conclude, I − H spans the space of the Pearson residuals under the condition that the canonical link is used.
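A small sketch of the weighted hat matrix, assuming a design with an intercept and one covariate so that (X'WX)^{-1} can be written out as a 2 × 2 inverse. The weights are hypothetical stand-ins for converged working weights; the function computes every element h_ij, so the projection properties (trace p = 2, idempotence, 0 ≤ h_ii ≤ 1) and the bound |h_ij| ≤ √(h_ii h_jj) can be checked directly.

```python
import math


def weighted_hat_entries(x, w):
    """H = W^{1/2} X (X'WX)^{-1} X' W^{1/2} for X with rows (1, x_i)."""
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = s0 * s2 - s1 * s1
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # x_i' (X'WX)^{-1} x_j with (X'WX)^{-1} = [[s2, -s1], [-s1, s0]] / det
            quad = (s2 - s1 * (x[i] + x[j]) + s0 * x[i] * x[j]) / det
            H[i][j] = math.sqrt(w[i] * w[j]) * quad
    return H


# Hypothetical design points and working weights (e.g. m * pi * (1 - pi)).
x = [0.0, 1.0, 2.0, 3.0, 10.0]
w = [1.8, 3.2, 4.9, 4.6, 0.9]
H = weighted_hat_entries(x, w)
leverage = [H[i][i] for i in range(len(x))]
print([round(h, 3) for h in leverage])
```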
2.6 Alternative Models
For both normal linear models and GLMs, the form of the distribution, and therefore the likelihood function, is known. However, in practice this information may not be available. Then some features of the data need to be evaluated, such as how the mean response μ relates to the independent variates, how the variability of the response relates to μ, and whether the observations are all independent. Quasi-likelihood estimation is based on the idea of incomplete distribution specification: it is determined entirely by the mean and variance functions. Just as linear least squares estimates have optimality properties, quasi-likelihood estimates have asymptotic optimality properties.
Consider g to be the link function which relates the mean response μ to the systematic part of a GLM:
$$g(\mu) = X\beta.$$
Only the forms of the mean and variance functions are necessary for the quasi-likelihood function.
The quasi-likelihood function is defined by the construction
Since V(μ) is most often proportional to Cov(y), it is safe to assume, up to a constant of proportionality, that V(μ) = Cov(y). Here, the proportionality of Cov(y) to a matrix of known constants in normal linear models is extended to proportionality to a matrix of known functions of the mean vector μ for nonlinear models. Then it follows from least squares theory that
(1) the estimate μ̂ minimizes the quadratic form of Q(μ; y) over μ(β), and
(2) the weighted sum of squares estimate β̂ will satisfy the quasi-score equations
$$\left(\frac{\partial\mu}{\partial\beta'}\right)' V(\mu)^{-1}(y - \mu) = 0.$$
This approach is the GLM counterpart of the least squares approach to the usual linear model with the normality assumption. It forms a basis for using the generalized linear model without adhering to a particular exponential family assumption.
Table 2.2: Distribution Functions with their Associated Links
FAMILY MEMBER
LINK | Normal | Poisson | Binomial | Gamma | Inverse Gaussian
Table 2.3: An Extension of the Normal-Theory Linear Model to the GLM
Normal-Theory Linear Model | GLM
y - dependent variate | z - adjusted dependent variate
μ̂ - linear predictor | η̂ - linear predictor
s² - the residual variance | [illegible]
H - the hat (projection) matrix | H = W^{1/2}X(X'WX)^{-1}X'W^{1/2}
Chapter 3
Residual Diagnostic Measures
Two types of residuals were introduced in Chapter 2, namely, the Pearson type (r_P) and the deviance-based type (r_D). It is found that the deviance-based residuals provide better goodness-of-fit measures for GLMs than does the Pearson statistic, even though the latter is more nearly chi-squared distributed. The reasons for this are the near normality of the deviance residuals and their convenience for likelihood-based inference. In fact, deviance-based residuals are especially appropriate for identifying individual poorly fitted observations. Here, the dispersion parameter φ is considered to be known, in which case the exponential family is essentially given by the density function
where the scale parameter is omitted. The θ_i are assumed to follow the tentative model given by
$$\theta_i = g(x_i'\beta),$$
where g(·) is a specified function, x_i a vector of known variables, and β a vector of unknown parameters. The residuals discussed in this chapter, however, are useful in a more general setting than just the exponential family distribution. The diagnostics are based on the asymptotic distribution of residuals. In GLMs, two types of asymptotic situations arise:
(1) when n → ∞, and
(2) when the index m → ∞, which is equivalent to each y_i becoming approximately normal.
These situations are referred to as n-asymptotics and m-asymptotics, respectively. In situation (2), m corresponds to the sample size for the binomial distribution, the mean for the Poisson, or the shape parameter for the gamma. Hence m can be thought of as a common factor multiplying the exponents in these densities. The standard asymptotic results for estimation and hypothesis testing with respect to β apply if either m or n is large. However, asymptotic results pertaining to individual case diagnostics require large m, irrespective of n. The problem arises when n is large but m is not, which is a commonly occurring situation for residual distributions.
Distinguishing between first- and second-order m-asymptotics (corresponding to stochastic convergence of order m^{-1/2} and m^{-1}, respectively), the second-order asymptotic results are more useful than the first-order ones when m is small (see Pierce and Schafer [21]).
CHAPTER 3. RESIDUAL DIAGNOSTIC MEASURES 45
Consider residuals that are approximately normally distributed. In the following models, θ_i is treated as known, but in practice it is replaced by θ̂_i. Three types of residuals are considered:
where E = mean and SD = standard deviation,
where t(·) is a specified transformation depending on the particular distribution of y.
There are two ways to go about choosing a transformation t(·). One way is to make the first-order m-asymptotic skewness of t(y) zero (i.e. symmetrizing), so that approximate normality may be achieved. This is done primarily using the Anscombe residual.
(a) Anscombe Residual (see [2])
Starting with a function A(·) which makes the distribution of A(y) as normal as possible, standardized to have zero mean and unit variance to the first order in μ, for the likelihood functions in GLMs the function A(·) is given by:
$$A(\mu) = \int \frac{d\mu}{V^{1/3}(\mu)}.$$
A "symmetrizing transformation" t(·) (see Chaubey and Mudholkar [3]), with t' ≠ 0, can be obtained by solving
In the case of the binomial distribution with proportion π and m trials, the symmetrizing transformation is given by
$$t(y) = \int_0^{y/m} u^{-1/3}(1 - u)^{-1/3}\,du,$$
which has no explicit solution and must be evaluated numerically using the incomplete beta function.
For a Poisson distribution with mean μ, the transformation yields
$$t(y) = \tfrac{3}{2}\,y^{2/3}.$$
As for the gamma distribution with mean μ and shape parameter α, the transformation is known as the Wilson-Hilferty cube-root transformation:
$$t(y) = 3\,y^{1/3}.$$
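With A(μ) = ∫V^{-1/3}(μ)dμ, the Anscombe residual is commonly written as [A(y) − A(μ̂)]/[A'(μ̂)√Var(y)]. A sketch for the Poisson and gamma cases follows; it uses the first-order form only, without the O(m^{-1/2}) correction of Table 3.1, and omits the binomial case, which needs the incomplete beta function. The function names are hypothetical, and the √α factor assumes Var(y) = μ²/α for the gamma.

```python
import math


def anscombe_poisson(y, mu):
    """[A(y) - A(mu)] / [A'(mu) sqrt(V(mu))] with A(u) = 1.5 u^(2/3), V(mu) = mu."""
    return 1.5 * (y ** (2.0 / 3.0) - mu ** (2.0 / 3.0)) / mu ** (1.0 / 6.0)


def anscombe_gamma(y, mu, shape=1.0):
    """A(u) = 3 u^(1/3) and Var(y) = mu^2 / shape, hence the sqrt(shape) factor."""
    return (math.sqrt(shape) * 3.0 * (y ** (1.0 / 3.0) - mu ** (1.0 / 3.0))
            / mu ** (1.0 / 3.0))


def pearson_poisson(y, mu):
    return (y - mu) / math.sqrt(mu)


for y in (0, 1, 4, 9, 16):
    print(y, round(pearson_poisson(y, 4.0), 3), round(anscombe_poisson(y, 4.0), 3))
```

For μ̂ = 4 the Anscombe residual pulls in large positive deviations and stretches the lower tail relative to the Pearson residual, reflecting the right skewness of the Poisson distribution.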
An alternative to the approximate normality objective is to choose a t(·) that makes the m-asymptotic variance of t(y) constant in θ.
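(b) Variance-Stabilizing Residual (see [a])
If {t_n}, n = 1, 2, ..., is a sequence of statistics such that
$$\sqrt{n}\,(t_n - \theta) \xrightarrow{d} N\!\left(0, \sigma^2(\theta)\right),$$
i.e. √n(t_n − θ) has an asymptotic normal distribution, then it follows that if g is a function whose first derivative exists and is continuous, with g'(θ) ≠ 0, then
$$\sqrt{n}\,[g(t_n) - g(\theta)] \xrightarrow{d} N\!\left(0, [g'(\theta)]^2\sigma^2(\theta)\right),$$
and further, if σ(θ) is continuous, then
$$\frac{\sqrt{n}\,[g(t_n) - g(\theta)]}{g'(t_n)\,\sigma(t_n)} \xrightarrow{d} N(0, 1).$$
This follows from the Taylor series expansion
$$g(t_n) = g(\theta) + (t_n - \theta)\,g'(\theta) + o_p(n^{-1/2}).$$
Now if h is a function such that h'(θ)σ(θ) = c, where c is independent of θ, that is,
$$\frac{dh}{d\theta} = \frac{c}{\sigma(\theta)}, \qquad h(\theta) = c\int\frac{d\theta}{\sigma(\theta)},$$
then the asymptotic variance of h(t_n) is independent of θ:
$$\sqrt{n}\,[h(t_n) - h(\theta)] \xrightarrow{d} N(0, c^2).$$
If y is a random variable with distribution B(m, π), then the variance-stabilizing transformation for the binomial distribution is
$$t(y) = \sin^{-1}\sqrt{y/m},$$
and for the Poisson, P(μ),
$$t(y) = \sqrt{y}.$$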
The variance-stabilizing transformation for the gamma distribution G(α, k), t(y) = log y, where E(y) = αk = μ and Var(y) = αk² = kμ, yields the following asymptotic mean
and variance
Table 3.1 summarizes the Anscombe residuals, with an O(m^{-1/2}) correction added to t[E_θ(y)], and the variance-stabilizing residuals (see [21]):
Table 3.1: Binomial, Poisson and Gamma distributions
ANSCOMBE RESIDUAL | VARIANCE-STABILIZING RESIDUAL
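The variance-stabilizing residuals follow from h(θ) = c∫dθ/σ(θ): since Var(sin^{-1}√(y/m)) ≈ 1/(4m), Var(√y) ≈ 1/4 for the Poisson, and Var(log y) ≈ 1/α for the gamma with shape α, the corresponding first-order residuals are 2√m[sin^{-1}√(y/m) − sin^{-1}√π̂], 2(√y − √μ̂) and √α(log y − log μ̂). The sketch below omits the O(m^{-1/2}) bias corrections listed in Table 3.1; the function names are hypothetical.

```python
import math


def vst_binomial(y, m, pi):
    """2 sqrt(m) [arcsin sqrt(y/m) - arcsin sqrt(pi)]; Var(arcsin sqrt(y/m)) ~ 1/(4m)."""
    return 2.0 * math.sqrt(m) * (math.asin(math.sqrt(y / m)) - math.asin(math.sqrt(pi)))


def vst_poisson(y, mu):
    """2 [sqrt(y) - sqrt(mu)]; Var(sqrt(y)) ~ 1/4."""
    return 2.0 * (math.sqrt(y) - math.sqrt(mu))


def vst_gamma(y, mu, shape):
    """sqrt(shape) [log y - log mu]; Var(log y) ~ 1/shape."""
    return math.sqrt(shape) * (math.log(y) - math.log(mu))


print(vst_binomial(7, 20, 0.2), vst_poisson(9, 4.0), vst_gamma(3.0, 2.0, 5.0))
```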
(3) Deviance residual
$$R_D(y, \theta) = \operatorname{sgn}(\hat\theta - \theta)\left[2\left\{\ell(\hat\theta; y) - \ell(\theta; y)\right\}\right]^{1/2}, \qquad (3.11)$$
where θ̂ is the MLE of θ based on y without restriction by the model θ_i = g(x_i'β). The deviance residual measures the discrepancy between the log-likelihood for the current model and the maximum possible log-likelihood for the data. Under first-order m-asymptotics, the deviance residual has an approximately standard normal distribution. An adjustment to the deviance residual removes the bias coming from the asymptotic term of order m^{-1/2}, and the adjusted deviance residual is formed, as described next.
(4) Adjusted deviance residual
The table which follows gives the expressions for the deviance residuals and adjusted deviance residuals for the three given distributions.
Table 3.2: Deviance and Adjusted Deviance Residuals for the Three Distributions
ADJUSTMENT TERM TO
Taking the normal-approximated tail probabilities, the probabilities considered for different values of y lie between .0001 and .10 for the binomial and Poisson distributions, and are equal to .05 and .01 for the gamma. Pierce and Schafer [21] compared the true tail probabilities for each respective density,
$$\Phi[R(y + 0.5, \theta)] \quad\text{and}\quad 1 - \Phi[R(y - 0.5, \theta)],$$
by considering different residuals R, where y is an integer. For all three density functions, Pierce and Schafer found that the Anscombe residual and the adjusted deviance residual give good approximate normality, even when m is small. Furthermore, the adjusted deviance residual is consistently the closest to the true tail probability throughout, for the different distributions, owing to its almost-normal character.
3.2 Influential Observations
Deletion or perturbation of observations from a given model helps detect those individual observations which may exert influence on the various components of the fitted model. The following approach is described in Pregibon [20]. The effect of perturbing an individual observation can be seen by examining the effect of its deletion. Pregibon pursues this idea by considering the likelihood
$$L(\beta; v) = \prod_{i=1}^{n}\left[f(y_i;\theta_i,\phi)\right]^{v_i},$$
where taking v_i = 1 for all i yields the usual likelihood, whereas v_i = 1 for all i except i = ℓ, with v_ℓ = 0, amounts to deleting the ℓth observation. Thus, a diagonal matrix V is formed with diagonal elements v_i = 1 for i ≠ ℓ and v_ℓ = v, for 0 < v < 1.
Then the likelihood estimate β̂ becomes a function of v and is denoted by β̂(v). For the canonical link, the likelihood equations are
$$X'V(y - \mu) = 0.$$
Then Fisher's scoring algorithm for the modified likelihood leads to a new sequence of estimates:
$$\hat\beta^{\,t+1}(v) = \hat\beta^{\,t}(v) + \left(X'W^{1/2}VW^{1/2}X\right)^{-1}X'V(y - \hat\mu). \qquad (3.15)$$
As v → 0, the ℓth point has less leverage in the fit. The ℓth point is influential if a small value of v yields a large change in the estimated coefficients. The difference
$$\hat\beta - \hat\beta(0)$$
measures the impact that the ℓth observation exerts on the vector of coefficients in a GLM regression. Plotting the standardized changes in the coefficients, [β̂_j − β̂_j(0)]/s.e.(β̂_j), against ℓ detects any observations that are influential for the selected coefficient β̂_j. Cook's statistic c_ℓ measures the impact of an observation on all the coefficients β. One convenient way of interpreting c_ℓ in a GLM context is as the confidence region displacement for β due to deleting the ℓth observation, namely,
$$c_\ell = \left(\hat\beta - \hat\beta(0)\right)'X'WX\left(\hat\beta - \hat\beta(0)\right).$$
A large c_ℓ corresponds to an ℓth observation that is highly influential on the overall fit of the model. By applying a second-order Taylor series expansion to (3.19), the confidence region is generated by the limiting normal distribution of β̂.
The concept of observation deletion can be extended to perturbations by letting v = 0, so that β̂ − β̂(0) measures the influence that the ℓth point exerts on the coefficient estimates β̂ through v. Then the confidence interval displacement is measured by the one-step approximation β̂¹(0) to β̂(0), obtained from a single step of (3.15) starting at β̂:
$$\hat\beta - \hat\beta^{1}(0) = \frac{(X'WX)^{-1}x_\ell\,(y_\ell - \hat\mu_\ell)}{1 - h_{\ell\ell}}, \qquad c_\ell^{1} = \frac{\chi_\ell^2\,h_{\ell\ell}}{(1 - h_{\ell\ell})^2},$$
where χ_ℓ = r_{P_ℓ} is the ℓth Pearson residual, (2.43).
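A sketch of these one-step quantities for a logistic regression with one covariate, assuming the canonical logit link (so that w_ℓ = m_ℓπ̂_ℓ(1 − π̂_ℓ) and χ_ℓ = (y_ℓ − μ̂_ℓ)/√w_ℓ). The data are hypothetical and the function names are illustrative. As a check, the one-step c_ℓ is compared with the quadratic form (β̂ − β̂¹(0))'X'WX(β̂ − β̂¹(0)), which it equals algebraically; the standardized changes divide each component of β̂ − β̂¹(0) by the standard error √[(X'WX)^{-1}]_jj.

```python
import math


def fit_logistic(x, y, m, iterations=25):
    """IWLS fit of logit(pi_i) = b0 + b1 * x_i for y_i ~ B(m_i, pi_i)."""
    pi = [(yi + 0.5) / (mi + 1.0) for yi, mi in zip(y, m)]
    eta = [math.log(p / (1.0 - p)) for p in pi]
    for _ in range(iterations):
        w = [mi * p * (1.0 - p) for mi, p in zip(m, pi)]
        z = [e + (yi / mi - p) / (p * (1.0 - p)) for e, yi, mi, p in zip(eta, y, m, pi)]
        s0 = sum(w)
        s1 = sum(wi * xi for wi, xi in zip(w, x))
        s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
        t0 = sum(wi * zi for wi, zi in zip(w, z))
        t1 = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = s0 * s2 - s1 * s1
        b0, b1 = (s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det
        eta = [b0 + b1 * xi for xi in x]
        pi = [1.0 / (1.0 + math.exp(-e)) for e in eta]
    return (b0, b1), pi


def one_step_deletion(x, y, m):
    """One-step deletion diagnostics for every observation l of a logistic fit."""
    beta, pi = fit_logistic(x, y, m)
    w = [mi * p * (1.0 - p) for mi, p in zip(m, pi)]
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = s0 * s2 - s1 * s1
    inv = [[s2 / det, -s1 / det], [-s1 / det, s0 / det]]   # (X'WX)^{-1}
    info = [[s0, s1], [s1, s2]]                             # X'WX
    se = [math.sqrt(inv[0][0]), math.sqrt(inv[1][1])]       # standard errors
    rows = []
    for xl, yl, ml, pl, wl in zip(x, y, m, pi, w):
        v = [inv[0][0] + inv[0][1] * xl, inv[1][0] + inv[1][1] * xl]  # (X'WX)^{-1} x_l
        h = wl * (v[0] + v[1] * xl)                                   # leverage h_ll
        resid = yl - ml * pl                                          # y_l - mu_hat_l
        chi = resid / math.sqrt(wl)                                   # Pearson residual
        delta = [vj * resid / (1.0 - h) for vj in v]                  # beta_hat - beta^1(0)
        c = chi ** 2 * h / (1.0 - h) ** 2                             # one-step c_l
        quad = sum(delta[i] * info[i][j] * delta[j] for i in range(2) for j in range(2))
        rows.append({"h": h, "chi": chi, "c": c, "quad": quad,
                     "std_delta": [d / s for d, s in zip(delta, se)]})
    return beta, rows


# Hypothetical data (not from this thesis).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1, 3, 5, 12, 14, 19]
m = [20, 20, 20, 20, 20, 20]
beta, rows = one_step_deletion(x, y, m)
for l, r in enumerate(rows, start=1):
    print(l, round(r["h"], 3), round(r["c"], 3), [round(s, 2) for s in r["std_delta"]])
```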
3.3 Testing the Goodness-of-Fit
Measuring the goodness-of-fit of a model can be done by calculating the effect of a change in v on the diagnostic measures, the deviance function D and Pearson's statistic X². In the case of the deviance function, the maximum likelihood estimate minimizes D, much like the least squares estimate minimizes the residual sum of squares RSS in a normal-theory linear model. Consequently, deletion of an observation decreases D, just as it would decrease RSS in the normal-theory model.
Using the observation count matrix V in the likelihood function yields a deviance D_v(β̂(v); y). A one-step estimate β̂¹(v) and a second-order Taylor series of D_v(β̂¹(v); y) about β̂ approximate the above quantity:
At v = 0, D_v(β̂¹(v); y) attains its minimum, D(β̂; y) − (r²_{D_ℓ} + c̄_ℓ), where c̄_ℓ = χ_ℓ²h_ℓℓ/(1 − h_ℓℓ) is a form of the confidence interval displacement diagnostic. The deviance decreases as v → 0.
The rate of change of D due to perturbations is obtained by taking the derivative of (3.22) with respect to v. The change in deviance due to deletion of the ℓth point is approximated by
$$\Delta D_\ell \approx r_{D_\ell}^2 + \frac{\chi_\ell^2\,h_{\ell\ell}}{1 - h_{\ell\ell}},$$
values which are useful for index plotting. The presence of two components is a feature of the one-step approximation, making it a useful diagnostic tool.
Pearson's statistic is not as straightforward to interpret, since it does not extend from the normal-theory linear model in the way the deviance function does. As observations are deleted from a given model, X² does not necessarily decrease. However, like the RSS, X² is a sum of squared (standardized) differences between the observed and the fitted values. The one-step approximation to the change in X² due to the deletion of the ℓth observation is
$$\Delta X_\ell^2 = \frac{\chi_\ell^2}{1 - h_{\ell\ell}}.$$
In extreme cases, X² will increase for some observation deletions.
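The two one-step formulas can be illustrated in the simplest possible case, a hypothetical intercept-only Poisson model with log link, where μ̂ = ȳ, every working weight equals μ̂, and therefore every leverage equals 1/n. The sketch evaluates ΔX²_ℓ = χ²_ℓ/(1 − h_ℓℓ) and ΔD_ℓ ≈ r²_{D_ℓ} + χ²_ℓh_ℓℓ/(1 − h_ℓℓ) for each point; these are the standard one-step approximations associated with Pregibon's work, and the function name is illustrative.

```python
import math


def one_step_changes_intercept_poisson(y):
    """One-step Delta X^2 and Delta D for deleting each point from an
    intercept-only Poisson model with log link (mu_hat = mean(y))."""
    n = len(y)
    mu = sum(y) / n
    h = 1.0 / n          # w_l = mu and X'WX = n * mu, so h_ll = 1/n for every l
    rows = []
    for yl in y:
        chi2 = (yl - mu) ** 2 / mu                                    # squared Pearson residual
        d2 = 2.0 * ((yl * math.log(yl / mu) if yl > 0 else 0.0) - (yl - mu))  # r_D^2
        delta_x2 = chi2 / (1.0 - h)
        delta_d = d2 + chi2 * h / (1.0 - h)
        rows.append((yl, delta_x2, delta_d))
    return rows


for yl, dx2, dd in one_step_changes_intercept_poisson([2, 3, 6, 7, 8, 9, 10, 12, 15]):
    print(yl, round(dx2, 3), round(dd, 3))
```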
The deviance function and Pearson's X² goodness-of-fit statistics can be interpreted in two ways:
(1) when the ℓth point is not well fit by a given model, i.e. an outlier, a model perturbation caused by v_ℓ will be reflected in the single ℓth components of D and X²; and
(2) when the ℓth point is an extreme point in the design matrix, i.e. an influential point, all the individual components of D and X² will change.
A change in either the deviance function or Pearson's statistic does not distinguish whether the change comes from (1) or (2). An additional diagnostic measure, h_ℓj, can resolve this problem, where h_ℓj is the off-diagonal element of the hat matrix H for the ℓth observation with respect to the jth observation, with |h_ℓj| ≤ √(h_ℓℓ h_jj). The h_ℓj's, in combination with the χ_ℓ, are useful for measuring how influential an ℓth point is on the remaining (n − 1) points.
There are other ways of measuring the goodness-of-fit, such as investigating the interactions between covariates, or looking for non-linear effects by adding terms to a model in the hope of reducing the approximated deviance.
Once a model has been checked for potential outliers and influential observations, and these have been removed from the data, the validity of the link function needs to be checked. Consider a generalized linear model to be fitted with a hypothesized link function g_0(μ) from a class of functions of which the true but unknown link function g_*(μ) is also a member. All link functions belonging to such a class are indexed by one or more unknown parameters. Plotting the deviances obtained for a range of fixed parameter values against those values is useful in deciding which range of parameter values is most consistent with the data. The adequacy of the hypothesized link function is examined by expanding and linearizing the link to optimize over the range of parameters. The deviances obtained from fixed parameter values are tested against the best-fitting values. This is called the goodness-of-link test.
If a class of link functions is generated by the power family with one parameter λ, then it is defined either by
$$g(\mu;\lambda) = \frac{\mu^\lambda - 1}{\lambda},$$
with limiting value g(μ; λ) = log μ as λ → 0,
or by
$$g(\mu;\lambda) = \begin{cases}\mu^\lambda, & \lambda \neq 0,\\ \log\mu, & \lambda = 0.\end{cases}$$
The power family transforms the fitted values μ in a GLM. By contrast, the Box-Cox transformation is a power function which transforms the data in a normal linear model.
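If a model is fitted with a link function g_0(μ) = g(μ; λ_0) when the true link is g_*(μ) = g(μ; λ_*), then this can be represented by:
$$g_0(\mu) = g(\mu;\lambda_0), \qquad g_*(\mu) = g(\mu;\lambda_*) = X\beta.$$
To optimize over λ_*, one approach is to linearize the power family through a first-order Taylor series expansion about g_0(μ). Based on the approximate relationship
$$g(\mu;\lambda_*) \approx g(\mu;\lambda_0) + (\lambda_* - \lambda_0)\,\frac{\partial g(\mu;\lambda)}{\partial\lambda}\bigg|_{\lambda=\lambda_0},$$
the true link g_*(μ) = Xβ is approximated by
$$g_0(\mu) \approx X\beta + \gamma\,u,$$
where u has elements ∂g(μ_i; λ)/∂λ evaluated at λ = λ_0, and γ = λ_0 − λ_*.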
The hypothesized link function is now modified by adding to the design matrix a covariate u, with elements ∂g(μ_i; λ)/∂λ evaluated at λ_0, and its parameter estimate γ̂ yields a first-order adjustment to λ_0. Hence the additional factor in the systematic linear component accounts for local differences between the hypothesized link and the modified one. These differences are measured by a reduction in the deviance. In turn, this reduction serves to test whether λ_0 is an adequate substitute for λ_*, using
$$F = \frac{\Delta D/q}{\hat\phi}, \qquad \hat\phi = \frac{D}{n - p - q} \ \text{or}\ \frac{X^2}{n - p - q},$$
where ΔD is the reduction in deviance, q is the number of added covariates, and D and X² are computed from the extended model.
When g_*(μ) is assumed to be the identity link (i.e. the data are normally distributed), the approximations made in using the F distribution are exact:
The process is repeated to form a new adjusted value for λ_0 at each iteration until convergence, if it occurs, is reached; the maximum likelihood estimate of λ_* is then obtained. If the initial λ_0 is sufficiently close to λ_*, convergence is assured, and the linearization of the power family yields the true maximum likelihood estimate. The process follows the sequence
$$\lambda^{(r+1)} = \lambda^{(r)} - \hat\gamma^{(r)},$$
which is implemented in the iterations for fitting a generalized linear model.
The link modification method has its limits, in that it is restricted to a specified class of link functions g. The most it can do is improve an already reasonable fit towards the true link function. On the other hand, if the hypothesized link is inadequate, then the true link function may belong to another class of link functions altogether. This is attributable to a misspecification of the systematic component of the model.
Consider a model initially fitted with link g_0(μ) = Xβ to get estimates β̂ and fitted values η̂ = Xβ̂. Then û, with elements ∂g(μ̂_i; λ)/∂λ evaluated at λ = λ_0, can be obtained, and the model is refitted with the extended design matrix now including the covariate û. In turn,
$$g_0(\mu) = X\beta + \gamma\,\hat u.$$
The sum of squares corresponding to γ̂ (to test whether γ = 0) is
A parallel reduction in the degrees of freedom and in the deviance from the initial model to the extended one including û is produced. This reduction is evaluated by an F-test to decide on the validity of the hypothesized link function.
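For the power family g(μ; λ) = (μ^λ − 1)/λ, the extra covariate is the derivative ∂g/∂λ = [λμ^λ log μ − (μ^λ − 1)]/λ², whose limit at λ = 0 is (log μ)²/2. The sketch computes this covariate from hypothetical fitted means of an initial log-link fit (λ_0 = 0); the refit with the extra column and the F-test are not shown, and the function names are illustrative.

```python
import math


def power_link(mu, lam):
    """g(mu; lambda) = (mu^lambda - 1) / lambda, with log(mu) at lambda = 0."""
    return math.log(mu) if abs(lam) < 1e-12 else (mu ** lam - 1.0) / lam


def power_link_dlambda(mu, lam):
    """d g(mu; lambda) / d lambda: the extra covariate added to the design matrix."""
    L = math.log(mu)
    if abs(lam) < 1e-8:
        return 0.5 * L * L           # limit as lambda -> 0
    return (lam * mu ** lam * L - (mu ** lam - 1.0)) / lam ** 2


# Hypothetical fitted means from an initial fit with lambda0 = 0 (log link).
mu_hat = [1.5, 2.0, 3.5, 5.0, 8.0]
lambda0 = 0.0
u = [power_link_dlambda(mu, lambda0) for mu in mu_hat]
print(u)
# If refitting with u as an extra column gave a coefficient gamma_hat, the
# first-order update of the link parameter would be lambda1 = lambda0 - gamma_hat.
```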
For every parameter added to the power function, an extra covariate, ∂g/∂λ evaluated at λ = λ_0, is added to the design matrix. The power family provides link generalizations for the normal distribution with identity link, for the Poisson with log link, for the gamma with reciprocal link and for the inverse Gaussian with μ^{-2} link.
For log-linear data, the power family is defined by the one-parameter link function
For binomial data, however, the power family does not apply. Another one-parameter link family is given instead by
$$g(\mu;\lambda) = \log\left\{\frac{(1-\pi)^{-\lambda} - 1}{\lambda}\right\}.$$
As λ → 0, the complementary log-log link is generated:
$$\lim_{\lambda\to 0} g(\mu;\lambda) = \log\{-\log(1 - \pi)\},$$
where π = μ/m is the response proportion. Another option is a two-parameter link family with parameters α and δ (based on tolerance distributions). This family of functions generates the logit link as the limiting form of γ:
For this model, the series expansion is
The true link function is approximated by the extended model
The maximum likelihood estimate of γ is reached through the iterative process described earlier. A reduction in deviance results from adding the additional factor to the systematic linear component. Finally, an F-test uses the change in deviance to assess whether the estimate of γ via (α, δ), and hence the link function itself, is adequate.
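Returning to the one-parameter binomial family given earlier in this section, and assuming it has the form g(π; λ) = log{[(1 − π)^{−λ} − 1]/λ} (identified here, as an assumption of this sketch, with the asymmetric family of Aranda-Ordaz), a quick numerical check confirms its two special cases: λ = 1 reproduces the logit link, and λ → 0 the complementary log-log link. The function name is hypothetical.

```python
import math


def asymmetric_link(pi, lam):
    """g(pi; lambda) = log{[(1 - pi)^(-lambda) - 1] / lambda};
    lambda -> 0 gives the complementary log-log link."""
    if abs(lam) < 1e-12:
        return math.log(-math.log(1.0 - pi))
    return math.log(((1.0 - pi) ** (-lam) - 1.0) / lam)


p = 0.3
logit = math.log(p / (1.0 - p))
cloglog = math.log(-math.log(1.0 - p))
print(asymmetric_link(p, 1.0), logit)      # lambda = 1 reproduces the logit
print(asymmetric_link(p, 1e-7), cloglog)   # small lambda approaches cloglog
```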
3.5 Software Applications
The software application GLIM ("Generalized Linear Interactive Modelling") was created in the early 1970s for generalized linear model computations, but because one had to have some in-depth knowledge of statistics to use this tool, generalized linear models were not popularized. It took twenty years for generalized linear modelling procedures to become accessible to everyone through user-friendly software applications. In SAS, GLMs can be fitted through the GENMOD procedure, and the GEE macro analyzes longitudinal data using the Generalized Estimating Equations approach. In S-Plus, the StatMod library contains some functions for GLM statistical modelling. R, which is a non-commercial equivalent to S-Plus, can fit GLMs. It shares some libraries with S-Plus which are accessible from the website
LispStat is also useful for GLMs. Matlab uses a module called glmlab to fit GLMs. Another application is Genstat, which is much like GLIM. Some websites offer articles and abstracts on GLMs. The following are only a few websites worth consulting for a start:
• http://www.ams.org/w~et/ and
Chapter 4
Numerical Examples
4.1 Introduction
In this chapter, three sets of data are used to illustrate the techniques presented earlier for generalized linear models. The first set of data is assumed to come from the binomial family, the second from the Poisson family and the third from the gamma family. In each case, the maximum likelihood fit of the model is provided along with the residual diagnostics. The parameter estimates were obtained through computer programs created in S-Plus. These programs are provided in Appendix A: see A.1 for binomial data, A.2 for Poisson, and A.3 for gamma data.
4.2 Binomial Data
A study of the effect of a herbicide on the proportion of birth abnormalities was conducted over a time span of one year (see Aitkin, Anderson and Francis, 1989, Statistical Modelling in GLIM). The data were collected on a monthly basis. The birth abnormality proportions are determined by dividing the observed number of birth abnormalities by the total number of births for a given month.
Table 4.1: Number of birth abnormalities out of total births per month for herbicide effect
MONTH   ABNORM.   TOTAL   HERB     MONTH   ABNORM.   TOTAL   HERB
Jan.    10        222     0        July    20        208     788
Feb.    17        221     0        Aug.    17        210     0
Mar.    18        188     0        Sep.    9         198     304
Apr.    11        183     0        Oct.    15        216     [illegible]
May     16        197     1454     Nov.    16        244     0
June    24        218     3280     Dec.    15        218     0
Based on the assumption that the data are binomially distributed and that the logit link is used to fit this model, a combination of graphical and analytical techniques is used to test for any high-leverage or outlying observations. The maximum likelihood estimates for this logistic regression model are calculated using an S-Plus computer program that was written for this purpose. Other pertinent statistics (the adjusted dependent variate, fitted values and variance) are also calculated in an iterative fashion through the S-Plus linear model function (see lm, A.1). The output is presented on the following page in table format. Testing the goodness-of-fit of the current logit model, with one explanatory variable accounting for birth abnormalities, the test statistic (2.37) gives $D^* = 8.31 < 18.3 = \chi^2_{10,0.95}$, which implies that the logit model fits the binomially distributed data well at the 5% level of significance. Further, a one-step function based on Pregibon's work [20], which modifies the log-likelihood function, was also developed in S-Plus to determine the effect that each observation exerts on the regression coefficients through model perturbations, up to and including case deletion. I called this function onestep (see Appendix B). A small change in the coefficients for the ith observation means that the observation is non-influential in the model fit.
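The iterative fit described above (an adjusted dependent variate, weights equal to the binomial variance, and a weighted least-squares regression repeated until the coefficients settle) can be sketched as follows. This is a minimal Python sketch of iteratively reweighted least squares for a logit model with one covariate, not the S-Plus program of Appendix A.1. The function names `wls_line`, `fit_logit` and `binomial_deviance` are hypothetical, and the data are made up for illustration. The deviance is compared with a hard-coded chi-square percentile, in the same way that $D^* = 8.31$ is compared with $\chi^2_{10,0.95} = 18.3$ above.

```python
import math


def wls_line(x, z, w):
    """Weighted least squares for z ~ b0 + b1*x; returns (b0, b1)."""
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    t0 = sum(wi * zi for wi, zi in zip(w, z))
    t1 = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
    b1 = (s0 * t1 - s1 * t0) / (s0 * s2 - s1 * s1)
    return (t0 - b1 * s1) / s0, b1


def fit_logit(x, y, m, tol=1e-10, max_iter=50):
    """IRLS for a binomial GLM with logit link and one covariate."""
    # Start from the observed proportions, moved off 0 and 1 as in Appendix A.1.
    pi = [(yi + 0.5) / (mi + 1) if yi in (0, mi) else yi / mi
          for yi, mi in zip(y, m)]
    eta = [math.log(p / (1 - p)) for p in pi]
    beta = (0.0, 0.0)
    for it in range(1, max_iter + 1):
        w = [mi * p * (1 - p) for mi, p in zip(m, pi)]      # binomial variance
        z = [e + (yi - mi * p) / wi                          # adjusted dependent variate
             for e, yi, mi, p, wi in zip(eta, y, m, pi, w)]
        new = wls_line(x, z, w)
        eta = [new[0] + new[1] * xi for xi in x]
        pi = [1 / (1 + math.exp(-e)) for e in eta]
        converged = abs(new[0] - beta[0]) + abs(new[1] - beta[1]) < tol
        beta = new
        if converged:
            break
    return beta, pi, it


def binomial_deviance(y, m, pi):
    """D = 2 * sum[y log(y/mu) + (m - y) log((m - y)/(m - mu))], with 0 log 0 = 0."""
    d = 0.0
    for yi, mi, p in zip(y, m, pi):
        mu = mi * p
        if yi > 0:
            d += yi * math.log(yi / mu)
        if yi < mi:
            d += (mi - yi) * math.log((mi - yi) / (mi - mu))
    return 2 * d


# Made-up illustration data: 6 dose levels x, m = 50 units each, y successes.
x = [0, 1, 2, 3, 4, 5]
m = [50] * 6
y = [4, 7, 11, 18, 26, 33]
beta, pi, iters = fit_logit(x, y, m)
D = binomial_deviance(y, m, pi)
CHI2_4_095 = 9.488   # 95th percentile of chi-square with 6 - 2 = 4 df
print(f"b0 = {beta[0]:.4f}, b1 = {beta[1]:.4f} after {iters} iterations")
print(f"D = {D:.3f}; no evidence of lack of fit at 5%: {D < CHI2_4_095}")
```

Solving the 2x2 weighted normal equations directly keeps the sketch dependency-free. With more covariates, a QR-based weighted least-squares solver, which is what `lm` uses, is the safer choice.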
Data   Fitted Values
10     15-05~x1
17     14.98851
18     12.75041
11     12.41130
16     16.63094
24     24.07666
20     15.89181
17     14.85287
9      14.06200
15     15.94563
16     16.-
15     14.78505
Figure 4.1: Deviance residuals for birth abnormalities due to herbicide spray exposure
Figure 4.2: $\chi$ residuals for birth abnormalities due to herbicide spray exposure
Figure 4.3: Projection matrix diagonal elements for birth abnormalities due to herbicide spray exposure
Figure 4.4: Standardized change in $\hat\beta_0$ for birth abnormalities due to herbicide spray exposure
Figure 4.5: Standardized change in $\hat\beta_1$ for herbicide data
According to the deviance residual and the $\chi$ residual index plots, the observation for March stands out: it would indicate that the herbicide spray effect on birth abnormalities is significantly greater in March than in any other month of the year. The standardized change plots for both the intercept ($\hat\beta_0$) and the herbicide spray exposure variable ($\hat\beta_1$) also agree that a perturbation or a deletion of the observation for March (that is, $w = 0.5, 0.2$ or $w = 0$ respectively) would cause a greater standardized change in the regression coefficients than for any other month. Hence, based on these diagnostics, it is likely that the month of March exerts an undue influence on the total number of birth abnormalities.
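The effect of down-weighting a single case can be approximated without refitting. Write $A = X^T W X$ at the converged fit, $h_{ii}$ for the ith diagonal element of the projection matrix, and $r_i = y_i - m_i\hat\pi_i$. One weighted least-squares step from $\hat\beta$, with case weight $w$ on observation $i$, gives $\hat\beta(w) - \hat\beta \approx -(1-w)\,A^{-1}x_i\,r_i/\{1-(1-w)h_{ii}\}$. At $w = 0$ this is Pregibon's one-step case-deletion formula [20]. The Python sketch below evaluates it for $w = 0.5, 0.2, 0$. It is an illustrative stand-in for the `onestep` function, whose code is not shown here. The data are hypothetical, and dividing by $\sqrt{(A^{-1})_{jj}}$ is an assumed choice of standardization.

```python
import math


def info(x, w):
    """Entries (s0, s1, s2) of A = X^T W X for design rows (1, x_i)."""
    return (sum(w),
            sum(wi * xi for wi, xi in zip(w, x)),
            sum(wi * xi * xi for wi, xi in zip(w, x)))


def fit_logit(x, y, m, tol=1e-10, max_iter=50):
    """IRLS for logit(pi) = b0 + b1*x, started from b0 = b1 = 0."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        eta = [b0 + b1 * xi for xi in x]
        pi = [1 / (1 + math.exp(-e)) for e in eta]
        w = [mi * p * (1 - p) for mi, p in zip(m, pi)]
        z = [e + (yi - mi * p) / wi for e, yi, mi, p, wi in zip(eta, y, m, pi, w)]
        s0, s1, s2 = info(x, w)
        t0 = sum(wi * zi for wi, zi in zip(w, z))
        t1 = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        n1 = (s0 * t1 - s1 * t0) / (s0 * s2 - s1 * s1)
        n0 = (t0 - n1 * s1) / s0
        done = abs(n0 - b0) + abs(n1 - b1) < tol
        b0, b1 = n0, n1
        if done:
            break
    return b0, b1


def one_step(x, y, m, beta, omegas=(0.5, 0.2, 0.0)):
    """Leverage and standardized one-step changes in (b0, b1) for each case."""
    b0, b1 = beta
    pi = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
    w = [mi * p * (1 - p) for mi, p in zip(m, pi)]
    s0, s1, s2 = info(x, w)
    det = s0 * s2 - s1 * s1
    se0, se1 = math.sqrt(s2 / det), math.sqrt(s0 / det)   # sqrt of diag(A^-1)
    results = []
    for xi, yi, mi, p, wi in zip(x, y, m, pi, w):
        a0 = (s2 - s1 * xi) / det          # first entry of A^-1 x_i
        a1 = (s0 * xi - s1) / det          # second entry of A^-1 x_i
        h = wi * (a0 + a1 * xi)            # leverage h_ii
        r = yi - mi * p                    # score contribution y_i - m_i pi_i
        changes = {}
        for om in omegas:
            c = -(1 - om) * r / (1 - (1 - om) * h)
            changes[om] = (c * a0 / se0, c * a1 / se1)
        results.append((h, changes))
    return results


# Hypothetical data with one unusual case (the third).
x = [0, 1, 2, 3, 4, 5]
m = [50] * 6
y = [4, 7, 22, 18, 26, 33]
beta = fit_logit(x, y, m)
for i, (h, ch) in enumerate(one_step(x, y, m, beta), start=1):
    cells = "  ".join(f"w={om}: ({d0:+.2f}, {d1:+.2f})" for om, (d0, d1) in ch.items())
    print(f"case {i}: h = {h:.3f}  {cells}")
```

The formula follows from the Sherman-Morrison update of $A$ when one case's weight is multiplied by $w$, so every case and every $w$ costs only a few arithmetic operations once the full fit is available.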
4.3 Poisson Data
The set of data given here classifies the defects found on furniture from a given manufacturing plant, obtained from Aitkin, Anderson and Francis [1]. The defects are classified by type of defect and by production shift. A total of n = 309 defects were recorded, each classified as one of four types: A, B, C, D. Each piece of furniture is also classified by one of three production shifts: 1, 2, 3. The contingency table below tabulates these defect counts by type of defect and production shift.
Table 4.2: Contingency table for furniture defects
SHIFT   A    B    C    D
1       15   21   45   13
2       26   31   34   5
3       33   17   49   20
The Poisson distribution model is fitted to the data with the log link. The computer program in Appendix A.2 calculates the MLEs for the GLM log-linear regression. The output is given in the following table:
Data   Fitted Values / Variance
15     22.51133
21     20.99029
45     38.93851
13     11.55987
26     22.99029
31     21.43689
34     39.76699
5      11.80583
33     28.49838
17     26.57282
49     49.29450
20     14.63430
This model is explained by four levels of defect type and three levels of production shift. To assess the significance of this log-linear model, the statistics from equations (2.37) and (2.42) are compared to $\chi^2_{6,0.95} = 12.6$. Since $D^* = 20.34$ and $X^2 = 19.14$, it is concluded that the log-linear model does not provide a good fit to the Poisson distributed data at the 5% significance level. In fact, the lack of fit remains significant even at the 1% level ($\chi^2_{6,0.99} = 16.8$). The index plots of the deviance residuals, the $\chi$ residuals and the diagonal elements of the projection matrix are based on the fitted log-linear model. Both the 6th and the 8th observations, which correspond respectively to the number of Type B defects and the number of Type D defects found in the second production shift, are not well fitted by the model. In fact, the 8th observation has a very large value in these plots. The standardized change in coefficient plots for the intercept ($\hat\beta_0$), the Type B defect variable and the second production shift variable agree that the 6th observation is causing instability in these coefficients, while the 8th observation is causing instability mainly in the Type D defect variable and the second production shift variable. Hence, the standardized change in coefficient plots are in line with the residual and projection matrix index plots.
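For a two-way table, the main-effects log-linear model has closed-form maximum likelihood fitted values, (row total x column total) / n, so the fit can be checked without iterating. The Python sketch below does this for the furniture data. It takes the seventh count as 34, which is the value consistent with n = 309, and computes the deviance, Pearson's $X^2$ and both kinds of residuals. The deviance should come out close to the reported $D^* = 20.34$. The chi-square percentiles are hard-coded, and the observation order follows the output table: shift 1 types A to D, then shift 2, then shift 3.

```python
import math

# Furniture defect counts: rows = production shifts 1-3, columns = types A-D.
# The seventh count is taken as 34, the value consistent with n = 309.
counts = [
    [15, 21, 45, 13],
    [26, 31, 34, 5],
    [33, 17, 49, 20],
]

n = sum(sum(row) for row in counts)
row_tot = [sum(row) for row in counts]
col_tot = [sum(col) for col in zip(*counts)]

# ML fitted values of the main-effects log-linear model (log link):
# mu_jk = (row total j) * (column total k) / n.
fitted = [[r * c / n for c in col_tot] for r in row_tot]

# Flatten in the order of the output table: shift 1 (A-D), shift 2, shift 3.
y = [v for row in counts for v in row]
mu = [v for row in fitted for v in row]

# Unit deviance 2[y log(y/mu) - (y - mu)]; every count here is positive.
unit_dev = [2 * (yi * math.log(yi / mi) - (yi - mi)) for yi, mi in zip(y, mu)]
D = sum(unit_dev)
X2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
pearson = [(yi - mi) / math.sqrt(mi) for yi, mi in zip(y, mu)]
dev_res = [math.copysign(math.sqrt(max(d, 0.0)), yi - mi)
           for d, yi, mi in zip(unit_dev, y, mu)]

df = (len(row_tot) - 1) * (len(col_tot) - 1)   # 12 cells - 6 parameters = 6
CHI2_6 = {0.95: 12.592, 0.99: 16.812}           # chi-square percentiles, 6 df
print(f"n = {n}, df = {df}, D = {D:.2f}, X2 = {X2:.2f}")
print("reject at 5%:", D > CHI2_6[0.95], " reject at 1%:", D > CHI2_6[0.99])
for i, (yi, mi, rp, rd) in enumerate(zip(y, mu, pearson, dev_res), start=1):
    print(f"obs {i:2d}: y = {yi:2d}, mu = {mi:8.5f}, chi = {rp:+.2f}, dev = {rd:+.2f}")
```

Because the fitted values are available in closed form here, this also gives an independent check on the iterative program of Appendix A.2 for this particular model.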
Figure 4.6: Deviance residuals for defects found on furniture produced in a certain manufacturing plant
Figure 4.7: $\chi$ residuals for defects found on furniture produced in a certain manufacturing plant
Figure 4.8: Projection matrix diagonal elements for defects found on furniture produced in a certain manufacturing plant
Figure 4.9: Standardized change in $\hat\beta_0$ for defects found on furniture produced in a certain manufacturing plant
Figure 4.10: Standardized change in $\hat\beta_1$ for furniture damage data
Figure 4.11: Standardized change in $\hat\beta_2$ for furniture damage data
Figure 4.13: Standardized change in $\hat\beta_4$ for furniture damage data
Figure 4.14: Standardized change in $\hat\beta_5$ for furniture damage data
4.4 Gamma Data
The next set of data is taken from McCullagh and Nelder (1989), "Generalized Linear Models", p.300. The data describe blood clotting times, in seconds, for normal plasma diluted to nine different percentage concentrations (X) with a prothrombin-free agent. Clotting was induced by two lots of thromboplastin. Bliss (1970) fitted a hyperbolic model by applying an inverse transformation to the data for the first lot only. Here, the data are assumed to follow a gamma distribution, with the inverse link applied to each lot separately, since some initial plots indicate that the intercepts and slopes differ between the two lots. Some of the output from the program in Appendix A.3 is shown below.
Table 4.3: Blood clotting times in seconds for 9 percentage concentrations of plasma and for 2 lots
% CONCENTRATION         5      10     15     20     30     40     60     80     100
LOT 1                   118    58     42     35     27     25     21     19     18
LOT 2                   69     35     26     21     18     16     13     12     12

Data (lot 1)            118    58     42     35     27     25     21     19     18
Data (lot 2)            69     35     26     21     18     16     13     12     12
Fitted Values (lot 2)   71.06  32.86  25     21.37  17.74  15.M   13.75  12.58  11.8
If the level of significance is 0.05, then the 95th percentile of $\chi^2_7$ is 14.1. The value obtained through (2.37) is much less than that: $D^* = 0.017$ for lot 1 and $D^* = 0.013$ for lot 2. Thus, the gamma model with inverse link provides a good fit to the blood clotting times for both lots.
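The gamma fits can be sketched in the same way as the binomial one. With the inverse link $1/\mu = \beta_0 + \beta_1 x$, the iterative weight is $\mu^2$ and the adjusted dependent variate is $z = \eta - (y-\mu)/\mu^2$, since $d\eta/d\mu = -1/\mu^2$. The Python sketch below assumes the concentrations 5, 10, 15, 20, 30, 40, 60, 80 and 100% from McCullagh and Nelder's table. It also assumes that the covariate is $x = \log(\text{concentration})$, which should reproduce the lot 2 fitted values shown in the output table (for example 71.06 at 5%). The function names are hypothetical. The sketch prints the coefficients and the unscaled deviance for each lot, for comparison with $D^* = 0.017$ and $0.013$.

```python
import math

# Blood clotting times (seconds). Concentrations u follow McCullagh and
# Nelder (1989); using log(u) as the covariate is an assumption.
u = [5, 10, 15, 20, 30, 40, 60, 80, 100]
lots = {
    1: [118, 58, 42, 35, 27, 25, 21, 19, 18],
    2: [69, 35, 26, 21, 18, 16, 13, 12, 12],
}
x = [math.log(ui) for ui in u]


def fit_gamma_inverse(x, y, tol=1e-12, max_iter=100):
    """IRLS for a gamma GLM with inverse link: 1/mu = b0 + b1*x."""
    mu = [float(yi) for yi in y]          # start from mu = y
    b0 = b1 = 0.0
    for _ in range(max_iter):
        eta = [1 / m for m in mu]
        w = [m * m for m in mu]           # iterative weight 1/(V(mu) g'(mu)^2) = mu^2
        # Adjusted dependent variate z = eta + (y - mu) g'(mu), g'(mu) = -1/mu^2.
        z = [e - (yi - m) / (m * m) for e, yi, m in zip(eta, y, mu)]
        s0 = sum(w)
        s1 = sum(wi * xi for wi, xi in zip(w, x))
        s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
        t0 = sum(wi * zi for wi, zi in zip(w, z))
        t1 = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        n1 = (s0 * t1 - s1 * t0) / (s0 * s2 - s1 * s1)
        n0 = (t0 - n1 * s1) / s0
        done = abs(n0 - b0) + abs(n1 - b1) < tol
        b0, b1 = n0, n1
        mu = [1 / (b0 + b1 * xi) for xi in x]
        if done:
            break
    return (b0, b1), mu


def gamma_deviance(y, mu):
    """Unscaled gamma deviance 2 * sum[-log(y/mu) + (y - mu)/mu]."""
    return 2 * sum(-math.log(yi / m) + (yi - m) / m for yi, m in zip(y, mu))


CHI2_7_095 = 14.067   # 95th percentile of chi-square with 9 - 2 = 7 df
results = {}
for lot, y in lots.items():
    beta, mu = fit_gamma_inverse(x, y)
    D = gamma_deviance(y, mu)
    results[lot] = (beta, mu, D)
    print(f"lot {lot}: b0 = {beta[0]:.5f}, b1 = {beta[1]:.5f}, "
          f"mu(5%) = {mu[0]:.2f}, D = {D:.4f}, D < {CHI2_7_095}: {D < CHI2_7_095}")
```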
In the graphs that follow, some diagnostic tools are used to assess which observations exert some influence on the fitted model for lot 1. The first two index plots agree that observation 2, which is the 10% concentration of the prothrombin-free agent, is not well fitted by the inverse model of the blood clotting times. However, the two standardized change in coefficient plots, for the intercept ($\hat\beta_0$) and for the percentage of agent concentration ($\hat\beta_1$), agree that the 5% concentration level is greatly influential on the model fit, whether it is perturbed ($w = 0.5, 0.2$) or deleted altogether ($w = 0$).
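The index-plot quantities for lot 1 can be computed directly from the converged fit. These are the $\chi$ (Pearson) residual $(y-\hat\mu)/\hat\mu$, the deviance residual, and the leverage $h_{ii} = w_i x_i^T (X^T W X)^{-1} x_i$ with $w_i = \hat\mu_i^2$. The Python sketch below assumes the McCullagh and Nelder concentrations 5, 10, ..., 100% and a log-concentration covariate, and prints these quantities by concentration. The residuals are not scaled by the dispersion, and the helper name is hypothetical.

```python
import math

# Lot 1 of Table 4.3; concentrations assumed from McCullagh and Nelder (1989),
# and the covariate is assumed to be log(concentration).
u = [5, 10, 15, 20, 30, 40, 60, 80, 100]
y = [118, 58, 42, 35, 27, 25, 21, 19, 18]
x = [math.log(ui) for ui in u]


def fit_gamma_inverse(x, y, n_iter=25):
    """IRLS for 1/mu = b0 + b1*x (gamma family), using a fixed number of steps."""
    mu = [float(v) for v in y]
    for _ in range(n_iter):
        w = [m * m for m in mu]                                   # weights mu^2
        z = [1 / m - (yi - m) / (m * m) for yi, m in zip(y, mu)]  # adjusted variate
        s0 = sum(w)
        s1 = sum(wi * xi for wi, xi in zip(w, x))
        s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
        t0 = sum(wi * zi for wi, zi in zip(w, z))
        t1 = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        b1 = (s0 * t1 - s1 * t0) / (s0 * s2 - s1 * s1)
        b0 = (t0 - b1 * s1) / s0
        mu = [1 / (b0 + b1 * xi) for xi in x]
    return mu


mu = fit_gamma_inverse(x, y)
w = [m * m for m in mu]                       # iterative weights at the fit
s0 = sum(w)
s1 = sum(wi * xi for wi, xi in zip(w, x))
s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
det = s0 * s2 - s1 * s1
# h_ii = w_i * (1, x_i) (X^T W X)^{-1} (1, x_i)^T
lev = [wi * (s2 - 2 * s1 * xi + s0 * xi * xi) / det for wi, xi in zip(w, x)]
pearson = [(yi - m) / m for yi, m in zip(y, mu)]          # V(mu) = mu^2, unscaled
dev_res = [math.copysign(math.sqrt(max(0.0, 2 * ((yi - m) / m - math.log(yi / m)))), yi - m)
           for yi, m in zip(y, mu)]
for ui, rp, rd, h in zip(u, pearson, dev_res, lev):
    print(f"{ui:3d}%: chi = {rp:+.4f}, dev = {rd:+.4f}, h = {h:.3f}")
```

Under these assumptions, the 5% point carries by far the largest weight $\hat\mu^2$, which is why its leverage is high even though its residual is not the largest.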
Figure 4.15: Deviance residuals for lot 1 of blood clotting time
Figure 4.16: $\chi$ residuals for lot 1 of blood clotting time
Figure 4.17: Projection matrix diagonal elements for lot 1 of blood clotting time
Figure 4.18: Standardized change in $\hat\beta_0$ for lot 1 of blood clotting time
Figure 4.19: Standardized change in $\hat\beta_1$ for lot 1 of blood clotting time
4.5 Conclusion
The diagnostic measures developed through the one-step function provide an effective and not too time-consuming device for modifying the log-likelihood function. In fact, the one-step function offers an adequate way of detecting and quantifying the effect of outlying observations and extreme points for GLMs. It is worth mentioning that for logistic regression the Hauck-Donner phenomenon can occur (see [27], p.225): when $|\hat\beta_j|$ is large, the t statistic tends to zero. This implies that highly significant $\hat\beta_j$ may have non-significant t ratios. For example, when fitted values are very close to either one or zero, the Hauck-Donner phenomenon and convergence problems may arise together. This can be seen with a very large dataset of, say, 1000 observations and about five binary explanatory variables, where one of the covariates is always one to confirm the presence of a disease, for example. The resulting fitted probabilities with respect to that covariate must then necessarily be one, and hence its estimated regression coefficient is $\hat\beta_j = \infty$. This in turn implies that the maximum likelihood estimates do not exist.
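The Hauck-Donner effect can be seen in closed form with two groups. For $\mathrm{logit}(\pi) = \beta_0 + \beta_1 g$ with a binary $g$, $\hat\beta_1$ is the difference of the two empirical logits, and its standard error is $\sqrt{1/(n_1\hat p_1(1-\hat p_1)) + 1/(n_0\hat p_0(1-\hat p_0))}$. In the hypothetical Python sketch below, group 0 has 50 successes out of 100 while group 1 moves towards 100 out of 100. The likelihood-ratio statistic keeps increasing, but the Wald statistic $\hat\beta_1/\mathrm{se}(\hat\beta_1)$ first rises and then falls. At 100 out of 100, $\hat\beta_1$ is infinite, which is the non-existence problem described above. The function name is hypothetical.

```python
import math


def two_group_logit(y1, n1, y0, n0):
    """Closed-form ML fit of logit(pi) = b0 + b1*g for a binary group g.

    Returns the Wald statistic b1/se(b1) and the likelihood-ratio statistic
    for H0: b1 = 0. Requires 0 < y < n in both groups; otherwise b1 is infinite.
    """
    p1, p0 = y1 / n1, y0 / n0
    b1 = math.log(p1 / (1 - p1)) - math.log(p0 / (1 - p0))
    se = math.sqrt(1 / (n1 * p1 * (1 - p1)) + 1 / (n0 * p0 * (1 - p0)))
    pooled = (y1 + y0) / (n1 + n0)

    def loglik(y, n, p):
        return y * math.log(p) + (n - y) * math.log(1 - p)

    lr = 2 * (loglik(y1, n1, p1) + loglik(y0, n0, p0)
              - loglik(y1, n1, pooled) - loglik(y0, n0, pooled))
    return b1 / se, lr


# Hypothetical data: 50/100 successes in group 0, increasingly many in group 1.
for y1 in (90, 95, 99):
    wald, lr = two_group_logit(y1, 100, 50, 100)
    print(f"y1 = {y1}: Wald z = {wald:.2f}, LR = {lr:.2f}")
```

When the two statistics disagree in this way, the likelihood-ratio (deviance) test is the more trustworthy of the two.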
Since generalized linear models are built on members of the exponential family of distributions, the computations and diagnostic measures described here can be extended in scope, leading to applications in time series models and survival models. Diagnostic measures for survival models have been investigated by D. Pregibon.
Appendix A
Programs for Parameter Estimation for Different Families
A.1 MLE program for binomial family
# Binomial data program (logit link), fitted by iteratively reweighted least
# squares. Starting value: the observed proportion y/m for the ith
# observation, moved away from 0 and 1 so that the logit is finite.
muhati <- function(y, m) {
  p <- y/m
  ifelse(p == 0 | p == 1, (y + 0.5)/(m + 1), p)
}

# Weighted least-squares regression of the adjusted dependent variate on the
# covariates (one covariate per row of X). Equivalent to
# lm(zValue ~ X[1,] + X[2,] + ..., weights = weightValue)$coefficients.
Genlm <- function(zValue, X, weightValue) {
  lm(zValue ~ t(X), weights = weightValue)$coefficients
}

# Linear predictor: etahat <- betaValue[1] + betaValue[2]*X[1,] + betaValue[3]*X[2,] + ...
Genetahat <- function(betaValue, X) {
  as.vector(cbind(1, t(X)) %*% betaValue)
}

# Main program for binomial data: y = number of successes, m = number of
# trials, X = covariates (one per row). Returns the coefficients and the
# pertinent statistics of the last iteration.
iterbin <- function(y, X, m, itmax = 50) {
  X <- rbind(X)                              # a single covariate may be a vector
  muhat <- muhati(y, m)
  beta0 <- Genlm(log(muhat/(1 - muhat)), X, rep(1, length(y)))
  for (n in 1:itmax) {
    etahat <- Genetahat(beta0, X)
    muhat <- exp(etahat)/(1 + exp(etahat))
    weight <- m * muhat * (1 - muhat)        # binomial variance
    z <- etahat + m * (y/m - muhat)/weight   # adjusted dependent variate
    beta <- Genlm(z, X, weight)
    if (sum(abs(beta - beta0)) <= 10^(-10))
      return(list(Pass = TRUE, coefficients = beta, fittedValues = m * muhat,
                  adjustedValue = z, Variance = weight, iterations = n))
    beta0 <- beta
  }
  list(Pass = FALSE, coefficients = beta, iterations = itmax)
}
A.2 MLE program for Poisson family
# Poisson data program (log link). Starting value: the mean of y for the ith
# observation, where y is the vector of the sums of counts over m units.
# All counts must be positive for this starting value (log of zero).
muhatpoi <- function(y, m) {
  y/m
}

# Weighted least-squares regression of z on the covariates (rows of X).
Genlm <- function(zValue, X, weightValue) {
  lm(zValue ~ t(X), weights = weightValue)$coefficients
}

# Linear predictor: etahat <- betaValue[1] + betaValue[2]*X[1,] + ...
Genetahat <- function(betaValue, X) {
  as.vector(cbind(1, t(X)) %*% betaValue)
}

# Main program for Poisson data; returns the coefficients and the pertinent
# statistics of the last iteration.
iterpoi <- function(y, X, m, itmax = 100) {
  X <- rbind(X)                              # a single covariate may be a vector
  beta0 <- Genlm(log(muhatpoi(y, m)), X, rep(1, length(y)))
  for (h in 1:itmax) {
    etahat <- Genetahat(beta0, X)
    muhat <- exp(etahat)
    weight <- muhat                          # Poisson variance
    z <- etahat + (y - muhat)/weight         # adjusted dependent variate
    beta <- Genlm(z, X, weight)
    if (sum(abs(beta - beta0)) <= 10^(-10))
      return(list(Pass = TRUE, coefficients = beta, fittedValues = muhat,
                  adjustedValue = z, Variance = weight, iterations = h))
    beta0 <- beta
  }
  list(Pass = FALSE, coefficients = beta, iterations = itmax)
}
A.3 MLE program for Gamma family
# Gamma data program (inverse link). Starting value: the mean of y for the
# ith observation.
muhatgam <- function(y, m) {
  y/m
}

# Weighted least-squares regression of z on the covariates (rows of X).
Genlm <- function(zValue, X, weightValue) {
  lm(zValue ~ t(X), weights = weightValue)$coefficients
}

# Linear predictor: etahat <- betaValue[1] + betaValue[2]*X[1,] + ...
Genetahat <- function(betaValue, X) {
  as.vector(cbind(1, t(X)) %*% betaValue)
}

# Main program for gamma data; returns the coefficients and the pertinent
# statistics of the last iteration.
itergam <- function(y, X, m, itmax = 50) {
  X <- rbind(X)                              # a single covariate may be a vector
  beta0 <- Genlm(1/muhatgam(y, m), X, rep(1, length(y)))
  for (h in 1:itmax) {
    etahat <- Genetahat(beta0, X)
    muhat <- 1/etahat                        # inverse link
    weight <- muhat^2                        # 1/(V(mu) * (d eta/d mu)^2) = mu^2
    z <- etahat - (y - muhat)/weight         # d eta/d mu = -1/mu^2
    beta <- Genlm(z, X, weight)
    if (sum(abs(beta - beta0)) <= 10^(-10))
      return(list(Pass = TRUE, coefficients = beta, fittedValues = muhat,
                  adjustedValue = z, Variance = weight, iterations = h))
    beta0 <- beta
  }
  list(Pass = FALSE, coefficients = beta, iterations = itmax)
}
for (i in 1:dimen) {
  W <- diag(rep(1, dimen))
  W[i, i] <- 0
  temp <- onestep(X, V, U, z, i)
Appendix B
B.l Output for the Herbicide data
B.2 Output for One-Step Function using the Herbicide data
Bibliography
[1] Aitkin, M., Anderson, D. and Francis, B. (1989). Statistical Modelling in GLIM. Oxford University Press, New York.
[2] Anscombe, F.J. (1948). The Transformation of Poisson, Binomial and Negative-Binomial Data. Biometrika, 35, 246-254.
[3] Chaubey, Y.P. and Mudholkar, G.S. (IW). On the Symmetrizing Transformations of Random Variables. Unpublished paper.
[4] Cook, R.D. and Weisberg, S. (1982). Residuals and Influence in Regression. Wiley, New York.
[5] Cox, D.R., Hinkley, D.V., Reid, N. and Snell, E.J. (1991). Statistical Theory and Modelling: In Honour of Sir David Cox, FRS. Chapman and Hall, London; New York.
[6] Davison, A.C. and Gigli, A. (1989). Deviance Residuals and Normal Scores Plots. Biometrika, 76, 211-221.
[7] Draper, N.R. and Smith, H. (1981). Applied Regression Analysis. Second ed., Wiley, New York.
[8] Firth, D. (1988). Multiplicative Errors: Log-Normal or Gamma? J. R. Statist. Soc. B, 50, 266-268.
[9] Green, P.J. (1984). Iteratively Reweighted Least Squares for Maximum Likelihood Estimation, and some Robust and Resistant Alternatives. J. R. Statist. Soc. B, 46, 149-192.
[10] Green, P.J. and Silverman, B.W. (1994). Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman and Hall, London.
[11] Hoaglin, D.C. and Welsch, R.E. (1978). The Hat Matrix in Regression and ANOVA. American Statistician, 32, 17-22.
[12] Lindsey, J.K. (1997). Applying Generalized Linear Models. Springer-Verlag, New York.
[13] Mathai, A.M. and Provost, S. (1992). Quadratic Forms in Random Variables: Theory and Applications. Dekker, New York.
[14] MathSoft (1997). S-PLUS Programmer's Guide. Data Analysis Products Division, Seattle, WA.
[15] McCullagh, P. (1985). On the Asymptotic Distribution of Pearson's Statistic in Linear Exponential-Family Models. International Statistical Review, 53, 61-67.
[16] McCullagh, P. and Nelder, J.A. (1983). Generalized Linear Models. Chapman and Hall, London.
[17] McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models. Second ed., Chapman and Hall, London.
[18] Nelder, J.A. and Wedderburn, R.W.M. (1972). Generalized Linear Models. J. R. Statist. Soc. A, 135, 370-384.
[19] Pregibon, D. (1980). Goodness of Link Tests for Generalized Linear Models. Appl. Statist., 29, 15-24.
[20] Pregibon, D. (1981). Logistic Regression Diagnostics. The Annals of Statistics, 9, 705-724.
[21] Pierce, D.A. and Schafer, D.W. (1986). Residuals in Generalized Linear Models. JASA, 396, 977-986.
[22] Rao, C.R. (1973). Linear Statistical Inference and its Applications. Wiley, New York.
[23] Searle, S.R. (1971). Linear Models. Wiley, New York.
[24] Seber, G.A.F. (1977). Linear Regression Analysis. Wiley, New York.
[25] Seber, G.A.F. and Wild, C.J. (1989). Nonlinear Regression. Wiley, New York.
[26] Spector, P. (1994). An Introduction to S and S-Plus. Duxbury Press.
[27] Venables, W.N. and Ripley, B.D. (1999). Modern Applied Statistics with S-PLUS. Third ed., Springer-Verlag, New York.
[28] Williams, D.A. (1987). Generalized Linear Model Diagnostics Using the Deviance and Single Case Deletions. Appl. Statist., 36, 181-191.
[29] Zellner, A. (1976). Bayesian and Non-Bayesian Analysis of the Regression Model with Multivariate Student-t Error Terms. JASA, 354, 400-405.