8 - Introduction to Logistic Regression

These data are taken from the text "Applied Logistic Regression" by Hosmer and Lemeshow. Researchers are interested in the relationship between age and the presence or absence of evidence of coronary heart disease (CHD).
The smooth is an estimate of:

E(CHD|Age) = P(CHD=1|Age)    Why?
Expectation of a Bernoulli Random Variable
Fitting the Model in JMP

Select Analyze > Fit Y by X and place CHD (y/n) in the Y box and age in the X box. The resulting output is shown below. Because the response is a dichotomous categorical variable, logistic regression is performed.
Example:

P(CHD|Age=40) =

P(CHD|Age=60) =
The curve is a plot of:

P(CHD|Age) = exp(βo + β1·Age) / (1 + exp(βo + β1·Age))
Interpretation of Model Parameters
P(CHD=1|Age) = e^(βo + β1·Age) / (1 + e^(βo + β1·Age))
Odds for Success

θ(x) / (1 − θ(x)) =

thus

ln( θ(x) / (1 − θ(x)) ) = βo + β1·Age
Suppose we contrast individuals who are Age = x to those who are Age = x + c. What can we say about the increased risk associated with a c year increase in age? The logistic model gives us a means to do this through the odds ratio (OR).
ln(OR associated with a c year increase in age)
  = ln[ (θ(Age=x+c) / (1 − θ(Age=x+c))) / (θ(Age=x) / (1 − θ(Age=x))) ]
  = ln( θ(Age=x+c) / (1 − θ(Age=x+c)) ) − ln( θ(Age=x) / (1 − θ(Age=x)) )
  = βo + β1(x + c) − (βo + β1·x)
  = cβ1
Exponentiating both sides gives

OR = e^(cβ1)

Thus the multiplicative increase (or decrease if β1 < 0) in odds associated with a c year increase in age is e^(cβ1).
Example: Interpreting a c year increase in age.
Question: Is it reasonable to assume that the effect of a c unit increase in a continuous predictor is constant regardless of starting point? For example, does the risk associated with a 5 year increase in age remain constant throughout one's life?
Statistical Inference for the Logistic Regression Model

Given estimates for the model parameters and their estimated standard errors, what types of statistical inferences can be made?
Hypothesis Testing
For testing:

Ho: βi = 0
Ha: βi ≠ 0

the large sample test for significance of the "slope" parameter (βi) is

z = β̂i / SE(β̂i) ≈ N(0, 1)
Confidence Intervals for Parameters and Corresponding OR's

100(1 − α)% CI for βi:

β̂i ± z(1−α/2)·SE(β̂i)

100(1 − α)% CI for the OR associated with βi:

exp( β̂i ± z(1−α/2)·SE(β̂i) )

If βi corresponds to a continuous predictor and we wish to examine the OR associated with a c unit increase, the CI for the OR becomes

exp( c·β̂i ± z(1−α/2)·c·SE(β̂i) )
Example: What is the OR for CHD associated with a 10 year increase in age? Give a 95% confidence interval based on this estimate.
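A quick check of this example in R, using the slope estimate and standard error from the fitted CHD model (b1 = 0.11092, SE = 0.02404); the numbers come from the output in these notes, the variable names are ours:

```r
# OR for a c = 10 year increase in age, from the fitted CHD model's
# slope estimate and its standard error (b1 = 0.11092, SE = 0.02404).
b1  <- 0.11092
se  <- 0.02404
c10 <- 10
OR <- exp(c10 * b1)                               # about 3.03
CI <- exp(c10 * b1 + c(-1, 1) * 1.96 * c10 * se)  # 95% CI, about (1.89, 4.86)
OR
CI
```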
In JMP Using the Analyze > Fit Y by X Approach
Estimated Odds Ratios
ROC Curve and Table
By changing the classification rule based on estimated probability we can obtain an ROC curve.
OPTIONS FOR LOGISTIC REGRESSION

Range Odds Ratios – odds ratio associated with being at the maximum of x vs. the minimum of x.

Unit Odds Ratios – odds ratio associated with a unit increase in x, i.e. c = 1.

ROC Curve – if we use θ̂(x) = P̂(CHD|x) to construct a rule for classifying a patient as having CHD vs. no CHD, this option gives the ROC curve coming from all possible cutpoints based on this estimated probability.
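A sketch of how such an ROC curve is built (illustrative R code, not JMP's implementation): for each cutpoint t, classify a patient as CHD when the estimated probability exceeds t, and record the sensitivity and 1 − specificity. Here `p` holds fitted probabilities and `y` the 0/1 responses.

```r
# Illustrative ROC construction from fitted probabilities.
roc_points <- function(p, y) {
  cuts <- sort(unique(c(0, p, 1)), decreasing = TRUE)
  t(sapply(cuts, function(cut) {
    pred <- as.numeric(p > cut)
    c(fpr = mean(pred[y == 0]),   # 1 - specificity
      tpr = mean(pred[y == 1]))   # sensitivity
  }))
}

# Toy example; in practice p would be fitted(chd.glm) and y the chd indicator.
pts <- roc_points(p = c(.1, .4, .6, .9), y = c(0, 0, 1, 1))
# plot(pts[, "fpr"], pts[, "tpr"], type = "s")
```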
Logistic Regression for the CHD data in R

> CHD <- read.table(file.choose(), header=T)
> CHD
    agegrp age chd
1        1  20   0
2        1  23   0
3        1  24   0
4        1  25   0
5        1  25   1
.        .   .   .
96       8  63   1
97       8  64   0
98       8  64   1
99       8  65   1
100      8  69   1
> names(CHD)
[1] "agegrp" "age" "chd"
> attach(CHD)
> chd <- factor(chd)
> chd.glm <- glm(chd ~ age, family="binomial")
> summary(chd.glm)
Call:
glm(formula = chd ~ age, family = "binomial")

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-1.9718 -0.8456 -0.4576  0.8253  2.2859

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  -5.30945    1.13263  -4.688 2.76e-06 ***
age           0.11092    0.02404   4.614 3.95e-06 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Make sure that you specify family="binomial" or R will perform ordinary least squares.
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 136.66 on 99 degrees of freedomResidual deviance: 107.35 on 98 degrees of freedomAIC: 111.35
Number of Fisher Scoring iterations: 3
> probCHD <- exp(-5.30945 + .11092*age)/(1 + exp(-5.30945 + .11092*age))
> plot(age, probCHD, type="b", ylab="P(CHD|Age)", xlab="Age")
An easier way to obtain the estimated probabilities is to extract them from the model object.
> probCHD <- fitted(chd.glm)
> plot(age, probCHD, type="b", ylab="P(CHD|Age)")   # This produces the plot above
We can obtain the estimated logit (Li = β̂o + β̂1·Age) by using the predict command.

> chd.logit = predict(chd.glm)
> plot(age, chd.logit, type="b", ylab="L = bo + b1*Age")
> title(main="Plot of Estimated Logit vs. Age")
P(CHD|Age) = e^(β̂o + β̂1·Age) / (1 + e^(β̂o + β̂1·Age))

β̂o = −5.30945,  β̂1 = 0.11092
The Logistic Regression Model (single predictor case)
yi = e^(βo + β1·xi) / (1 + e^(βo + β1·xi)) + εi = θ(xi) + εi

where yi = 1 if the outcome is a success and yi = 0 if the outcome is a failure.
What can we say about the errors?
If yi = 1 then

If yi = 0 then

Thus E(ε) =        and Var(ε) =
We see that the errors are binomial NOT normal!
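A quick simulation check of the error behavior (illustrative, with an arbitrary θ = 0.3 rather than a value fitted from the CHD data):

```r
# For a Bernoulli response, eps = y - theta has mean 0 and variance
# theta*(1 - theta); verify by simulation with an arbitrary theta.
set.seed(3)
theta <- 0.3
y <- rbinom(100000, 1, theta)
eps <- y - theta
c(mean(eps), var(eps), theta * (1 - theta))  # mean near 0, variances agree
```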
Estimation of Model Parameters (Method of Maximum Likelihood)
For the ith observed pair (xi, yi) the contribution to the likelihood is

θ(xi)^yi · (1 − θ(xi))^(1 − yi)

where θ(xi) = e^(βo + β1·xi) / (1 + e^(βo + β1·xi)) and yi ∈ {0, 1}.

The Likelihood Function

L(β) = L(βo, β1) = ∏ (i=1 to n) θ(xi)^yi (1 − θ(xi))^(1 − yi)

Maximizing this as a function of both βo and β1 yields the maximum likelihood estimates of the model parameters.
For computational purposes it is usually easier to maximize the logarithm of the likelihood function rather than the likelihood function itself. This is fine because the logarithm is a monotonic increasing function, so the maximizing parameter values are the same for the likelihood and the log-likelihood. The log-likelihood function is given by
ln L(βo, β1) = Σ (i=1 to n) [ yi·ln(θ(xi)) + (1 − yi)·ln(1 − θ(xi)) ]

To find the parameter estimates we set the partial derivative with respect to each parameter equal to 0 and solve simultaneously:

∂/∂βo ln L(βo, β1) = 0
∂/∂β1 ln L(βo, β1) = 0
Several different nonlinear optimization routines are used to find solutions to such systems. Realize of course that this process gets increasingly computationally intensive as the number of terms in the model increases.
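A minimal sketch of this machinery using R's general-purpose optimizer; the data here are simulated for illustration (they are not the CHD data), and in practice glm() does this fitting via Fisher scoring.

```r
# Maximum likelihood "by hand": minimize the negative log-likelihood
# with optim() and compare to glm(). Data simulated for illustration.
set.seed(1)
x <- seq(20, 70, length = 100)
y <- rbinom(100, 1, plogis(-5.3 + 0.11 * x))

negloglik <- function(b) {
  theta <- plogis(b[1] + b[2] * x)   # e^(b0 + b1*x) / (1 + e^(b0 + b1*x))
  -sum(y * log(theta) + (1 - y) * log(1 - theta))
}

fit <- optim(c(0, 0), negloglik)     # Nelder-Mead search over (b0, b1)
fit$par                              # close to the glm coefficients
coef(glm(y ~ x, family = binomial))
```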
How do we measure discrepancy between observed and fitted values?

In OLS regression with a continuous response we used

RSS = Σ (i=1 to n) (yi − ŷi)² = Σ (i=1 to n) (yi − η^T·ui)² = Σ (i=1 to n) (yi − (ηo + η1·u1i + ⋯ + ηk·uki))²
In logistic regression modeling we can use the deviance (typically denoted D or G²), which is defined as

D = 2 ln( likelihood of saturated model / likelihood of fitted model )
  = 2 Σ (i=1 to n) [ yi·ln( yi / θ̂(xi) ) + (1 − yi)·ln( (1 − yi) / (1 − θ̂(xi)) ) ]
Because the likelihood of the saturated model is equal to 1 when the response yi is 0 or 1, the deviance reduces to:

D = −2 ln(likelihood of the fitted model)
The deviance can be used to compare two potential models where one model is nested within the other by using the “General Chi-Square Test” for comparing rival logistic regression models.
Nested model concept:
General Chi-Square Test

Consider comparing two rival models where the alternative hypothesis model contains additional terms:

Ho: log( θ(x) / (1 − θ(x)) ) = β1^T x1
H1: log( θ(x) / (1 − θ(x)) ) = β1^T x1 + β2^T x2

General Chi-Square Statistic

χ² = (residual deviance of reduced model) − (residual deviance of full model)
   = D(model without the terms in x2) − D(model with the terms in x2) ~ χ²(Δdf)

If the full model is needed, χ² is BIG and the associated p-value = P(χ²(Δdf) > χ²) is small.
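For instance, testing whether age is needed at all in the CHD model amounts to comparing the null deviance to the residual deviance reported by R (136.66 on 99 df vs. 107.35 on 98 df in the summary output elsewhere in these notes):

```r
# General chi-square test "by hand" for Ho: no age effect vs. H1: age effect,
# using the null and residual deviances from the CHD fit.
chisq <- 136.66 - 107.35                         # 29.31 on 99 - 98 = 1 df
pval  <- pchisq(chisq, df = 99 - 98, lower.tail = FALSE)
pval                                             # very small: full model needed
# anova(null.fit, full.fit, test = "Chisq") automates this for nested glms.
```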
Example: CHD and Age
Ho:

H1:
From JMP
(reduced model OK)
(full model needed)
From R

> summary(chd.glm)
Call:
glm(formula = chd ~ Age, family = "binomial")

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-1.9718 -0.8456 -0.4576  0.8253  2.2859

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  -5.30945    1.13365  -4.683 2.82e-06 ***
Age           0.11092    0.02406   4.610 4.02e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null deviance: 136.66 on 99 degrees of freedom
Residual deviance: 107.35 on 98 degrees of freedom
Logistic Regression with a Single Dichotomous Predictor
Example: CHD and Indicator of Age Over 55
Computed using standard approach
Logistic Model

There are two ways to code a dichotomous variable: (0,1) coding or (−1,+1), i.e. contrast, coding. JMP uses contrast coding, whereas in other packages we will generally use (0,1) coding. The two coding types are shown below.
Age 55+ = 1 if age > 55, 0 otherwise          (0,1 coding)
Age 55+ = +1 if age > 55, −1 otherwise        (contrast coding)
For the purposes of discussion we will consider the (0,1) coding.
Recall the 2×2 table of CHD by age group:

           Age > 55    Age < 55
CHD = 1
CHD = 0
With

θ(x) = P(CHD=1|x) = e^(βo + β1·x) / (1 + e^(βo + β1·x))

where x = the Age 55+ indicator, we have the following:

           Age > 55 (x = 1)                        Age < 55 (x = 0)
CHD = 1    θ(1) = e^(βo+β1) / (1 + e^(βo+β1))      θ(0) = e^βo / (1 + e^βo)
CHD = 0    1 − θ(1) = 1 / (1 + e^(βo+β1))          1 − θ(0) = 1 / (1 + e^βo)
Estimating the model parameters "by hand"

OR = ( θ(1) / (1 − θ(1)) ) / ( θ(0) / (1 − θ(0)) ) =
Logistic Regression in R
> Over55
  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [53] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Levels: 0 1

> chd
  [1] 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 1 0 0 1 1
 [53] 0 1 0 1 0 0 1 0 1 1 0 0 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0 1 1 1
Levels: 0 1

> table(chd, Over55)
   Over55
chd  0  1
  0 51  6
  1 22 21
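As a check on the fit that follows, with (0,1) coding the Over55 coefficient should equal the log of the sample odds ratio from this 2×2 table (a sketch using the counts above):

```r
# The (0,1)-coded slope is the log odds ratio from the 2x2 table above.
tab <- matrix(c(51, 22, 6, 21), nrow = 2,
              dimnames = list(chd = c("0", "1"), Over55 = c("0", "1")))
odds1 <- tab["1", "1"] / tab["0", "1"]   # odds of CHD given Over55 = 1
odds0 <- tab["1", "0"] / tab["0", "0"]   # odds of CHD given Over55 = 0
OR <- odds1 / odds0
c(OR = OR, logOR = log(OR))              # logOR matches the Over55 estimate 2.0935
```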
> chd55 = glm(chd ~ Over55, family="binomial")
> summary(chd55)
Call:
glm(formula = chd ~ Over55, family = "binomial")

Deviance Residuals:
   Min     1Q Median     3Q    Max
-1.734 -0.847 -0.847  0.709  1.549

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.8408     0.2551  -3.296  0.00098 ***
Over55        2.0935     0.5285   3.961 7.46e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 136.66 on 99 degrees of freedom
Residual deviance: 117.96 on 98 degrees of freedom
AIC: 121.96

Number of Fisher Scoring iterations: 4
In JMP

To fit a logistic regression model it is best to use the Analyze > Fit Model option. We place CHD y/n (1 = Yes, 2 = No) in the Y box and Over 55 (1 = Yes, 2 = No) in the model effects box. The key is to have "Yes" for risk and disease alpha-numerically before "No", thus the use of 1 for "Yes" and 2 for "No".
The summary of the fitted logistic model is shown below. Notice that the parameter estimates are not the same as those obtained from R. This is because JMP uses contrast coding for the Over 55 predictor (+1 = Age > 55 and −1 = Age < 55).
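A hedged arithmetic check on the coding relationship: if z = 2x − 1 is the (−1,+1) code, then βo + β1·x = (βo + β1/2) + (β1/2)·z, so the contrast-coded slope should be half the (0,1)-coded slope from R (the JMP values themselves are not reproduced here).

```r
# Converting the (0,1)-coding estimates from the R output above
# to the (-1,+1) contrast coding JMP uses (arithmetic check only).
b0_01 <- -0.8408
b1_01 <-  2.0935
b1_contrast <- b1_01 / 2           # slope under (-1,+1) coding
b0_contrast <- b0_01 + b1_01 / 2   # intercept under (-1,+1) coding
c(b0_contrast, b1_contrast)
```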
OR’s and Fitted Probabilities
Using JMP to Compute OR’s, CI’s, Fitted Probabilities
For dichotomous predictors the range odds ratio compares x = −1 to x = +1, which is precisely what we want.
By selecting Save Probability Formula we can save the fitted probabilities to the spreadsheet.
Example 1: Oral Contraceptive Use and Myocardial Infarctions

Set up a text file with the data in columns with variable names at the top. The case and control counts are in separate columns. The risk factor OC use and the stratification variable Age follow.

> OCMI.data = read.table(file.choose(), header=T)   # read in text file
> OCMI.data
   MI NoMI Age OCuse
1   4   62   1   Yes
2   2  224   1    No
3   9   33   2   Yes
4  12  390   2    No
5   4   26   3   Yes
6  33  330   3    No
7   6    9   4   Yes
8  65  362   4    No
9   6    5   5   Yes
10 93  301   5    No
> attach(OCMI.data)
> OC.glm <- glm(cbind(MI,NoMI)~Age+OCuse,family=binomial) # fit model
> summary(OC.glm)
Call:
glm(formula = cbind(MI, NoMI) ~ Age + OCuse, family = binomial)
Deviance Residuals:
[1]  0.456248 -0.520517  1.377693 -0.886710 -1.685521  0.714695 -0.130922  0.033643
[9] -0.045061  0.008822

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -4.3698     0.4347 -10.054  < 2e-16 ***
Age2          1.1384     0.4768   2.388   0.0170 *
Age3          1.9344     0.4582   4.221 2.43e-05 ***
Age4          2.6481     0.4496   5.889 3.88e-09 ***
Age5          3.1943     0.4474   7.140 9.36e-13 ***
OCuseYes      1.3852     0.2505   5.530 3.19e-08 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 158.0085 on 9 degrees of freedom
Residual deviance: 6.5355 on 4 degrees of freedom
AIC: 58.825
Number of Fisher Scoring iterations: 3
Find the OR associated with oral contraceptive use ADJUSTED for age. Note: the CMH procedure gave 3.97.

> exp(1.3852)
[1] 3.995625
Find a 95% CI for OR associated with OC use.
> exp(1.3852 - 1.96*.2505)
[1] 2.445428
> exp(1.3852 + 1.96*.2505)
[1] 6.528518
Interpreting the age effect in terms of OR’s ADJUSTING for OC use. Note: The reference group is Age = 1 which was women 25 – 29 years of age.
> OC.glm$coefficients
(Intercept)        Age2        Age3        Age4        Age5    OCuseYes
  -4.369850    1.138363    1.934401    2.648059    3.194292    1.385176
> Age.coefs <- OC.glm$coefficients[2:5]
> exp(Age.coefs)
     Age2      Age3      Age4      Age5
 3.121653  6.919896 14.126585 24.392906
Find a 95% CI for the age = 5 group.

> exp(3.1943 - 1.96*.4474)
[1] 10.14921
> exp(3.1943 + 1.96*.4474)
[1] 58.62751
Example 2: Coffee Drinking and Myocardial Infarctions

> CoffeeMI.data = read.table(file.choose(), header=T)
> CoffeeMI.data
      Smoking Coffee MI NoMI
1       Never    > 5  7   31
2       Never    < 5 55  269
3      Former    > 5  7   18
4      Former    < 5 20  112
5   1-14 Cigs    > 5  7   24
6   1-14 Cigs    < 5 33  114
7  15-25 Cigs    > 5 40   45
8  15-25 Cigs    < 5 88  172
9  25-34 Cigs    > 5 34   24
10 25-34 Cigs    < 5 50   55
11 35-44 Cigs    > 5 27   24
12 35-44 Cigs    < 5 55   58
13   45+ Cigs    > 5 30   17
14   45+ Cigs    < 5 34   17
> attach(CoffeeMI.data)
> Coffee.glm = glm(cbind(MI,NoMI) ~ Smoking + Coffee, family=binomial)
> summary(Coffee.glm)
Call:
glm(formula = cbind(MI, NoMI) ~ Smoking + Coffee, family = binomial)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-0.7650 -0.4510 -0.0232  0.2999  0.7917

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)         -1.2981     0.1819  -7.136 9.60e-13 ***
Smoking15-25 Cigs    0.6892     0.2119   3.253  0.00114 **
Smoking25-34 Cigs    1.2462     0.2398   5.197 2.02e-07 ***
Smoking35-44 Cigs    1.1988     0.2389   5.017 5.24e-07 ***
Smoking45+ Cigs      1.7811     0.2808   6.342 2.27e-10 ***
SmokingFormer       -0.3291     0.2778  -1.185  0.23616
SmokingNever        -0.3153     0.2279  -1.384  0.16646
Coffee> 5            0.3200     0.1377   2.324  0.02012 *
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 173.7899 on 13 degrees of freedom
Residual deviance: 3.7622 on 6 degrees of freedom
AIC: 84.311
Number of Fisher Scoring iterations: 3
OR for drinking 5 or more cups of coffee per day. Note: the CMH procedure gave OR = 1.375.

> exp(.3200)
[1] 1.377128
95% CI for OR associated with heavy coffee drinking
> exp(.3200 - 1.96*.1377)
[1] 1.051385
> exp(.3200 + 1.96*.1377)
[1] 1.803794
Reordering a Factor

To examine the effect of smoking we might want to "reorder" the levels of smoking status so that individuals who have never smoked are used as the reference group. To do this in R you must do the following:
Smoking = factor(Smoking,levels=c("Never","Former","1-14 Cigs","15-25 Cigs","25-34 Cigs","35-44 Cigs","45+ Cigs"))
The first level specified in the levels subcommand will be used as the reference group, “Never” in this case. Refitting the model with the reordered smoking status factor gives the following:
> Coffee.glm2 <- glm(cbind(MI,NoMI) ~ Smoking + Coffee, family=binomial)
> summary(Coffee.glm2)
Call:
glm(formula = cbind(MI, NoMI) ~ Smoking + Coffee, family = binomial)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-0.7650 -0.4510 -0.0232  0.2999  0.7917

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)        -1.61344    0.14068 -11.469  < 2e-16 ***
SmokingFormer      -0.01376    0.25376  -0.054   0.9568
Smoking1-14 Cigs    0.31533    0.22789   1.384   0.1665
Smoking15-25 Cigs   1.00451    0.17976   5.588 2.30e-08 ***
Smoking25-34 Cigs   1.56150    0.21254   7.347 2.03e-13 ***
Smoking35-44 Cigs   1.51417    0.21132   7.165 7.77e-13 ***
Smoking45+ Cigs     2.09646    0.25855   8.108 5.13e-16 ***
Coffee> 5           0.31995    0.13766   2.324   0.0201 *
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 173.7899 on 13 degrees of freedom
Residual deviance: 3.7622 on 6 degrees of freedom
AIC: 84.311
Number of Fisher Scoring iterations: 3
Notice that “SmokingNever” is now absent from the output so we know it is being used as the reference group. The OR’s associated with the various levels of smoking are computed below.
> Smoke.coefs = Coffee.glm2$coefficients[2:7]
> exp(Smoke.coefs)
    SmokingFormer  Smoking1-14 Cigs Smoking15-25 Cigs Smoking25-34 Cigs
         0.986338          1.370715          2.730561          4.765984
Smoking35-44 Cigs   Smoking45+ Cigs
         4.545632          8.137279
Confidence intervals for each could be computed in the standard way.
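For instance, for the 45+ cigarette group (a sketch using the estimate and standard error printed above; confint.default() on the fitted object automates Wald intervals for every coefficient):

```r
# 95% CI for the OR for 45+ cigarettes/day vs. never smokers, from the
# printed estimate and SE in the reordered fit (2.09646 and 0.25855).
est <- 2.09646
se  <- 0.25855
ci  <- exp(est + c(-1, 1) * 1.96 * se)   # CI for the OR, roughly (4.9, 13.5)
ci
```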
Some Details for Categorical Predictors with More Than Two Levels

Consider the coffee drinking/MI study above. The stratification variable smoking has seven levels, thus it requires six dummy variables to define it. The level that is not defined using a dichotomous dummy variable serves as the reference group. The table below shows the values of the dummy variables:
Level                     D2  D3  D4  D5  D6  D7
Never (Reference Group)    0   0   0   0   0   0
Former                     1   0   0   0   0   0
1-14 Cigs                  0   1   0   0   0   0
15-25 Cigs                 0   0   1   0   0   0
25-34 Cigs                 0   0   0   1   0   0
35-44 Cigs                 0   0   0   0   1   0
45+ Cigs                   0   0   0   0   0   1
The Logistic Model

ln( θ(x) / (1 − θ(x)) ) = βo + β1·Coffee + β2·D2 + β3·D3 + β4·D4 + β5·D5 + β6·D6 + β7·D7
where Coffee is a dichotomous predictor equal to 1 if the subject drinks 5 or more cups of coffee per day.
Comparing the log-odds of a heavy coffee drinker who smokes 15-25 cigarettes per day to a heavy coffee drinker who has never smoked, we have
ln( θ1(x) / (1 − θ1(x)) ) = βo + β1 + β4

ln( θ2(x) / (1 − θ2(x)) ) = βo + β1

Taking the difference gives

ln[ (θ1(x) / (1 − θ1(x))) / (θ2(x) / (1 − θ2(x))) ] = β4
thus

e^β4 = the odds ratio associated with smoking 15-25 cigarettes per day when compared to individuals who have never smoked, amongst heavy coffee drinkers. Because β1 is not involved in the odds ratio, the result is the same for non-heavy coffee drinkers as well!
You can also consider combinations of factors, e.g. if we compare heavy coffee drinkers who smoke 15-25 cigarettes per day to non-heavy coffee drinkers who have never smoked, the associated OR would be given by e^(β1 + β4).
Using our fitted model, the ORs discussed above are computed as follows.
> summary(Coffee.glm2)

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)        -1.61344    0.14068 -11.469  < 2e-16 ***
SmokingFormer      -0.01376    0.25376  -0.054   0.9568
Smoking1-14 Cigs    0.31533    0.22789   1.384   0.1665
Smoking15-25 Cigs   1.00451    0.17976   5.588 2.30e-08 ***
Smoking25-34 Cigs   1.56150    0.21254   7.347 2.03e-13 ***
Smoking35-44 Cigs   1.51417    0.21132   7.165 7.77e-13 ***
Smoking45+ Cigs     2.09646    0.25855   8.108 5.13e-16 ***
Coffee> 5           0.31995    0.13766   2.324   0.0201 *
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
OR for 15-25 cigarette smokers vs. never smokers (regardless of coffee drinking status)

> exp(1.00451)
[1] 2.730569
OR for 15-25 cigarette smokers who are also heavy coffee drinkers vs. non-smokers who are not heavy coffee drinkers

> exp(.31995 + 1.00451)
[1] 3.760154
Similar calculations could be done for other combinations of coffee and cigarette use.
Using Arc when the Number of Trials is not 1
Example 1: Oral contraceptive use, myocardial infarctions, and age

To read these data in Arc it is easiest to create a text file that looks like:

Age OCuse MI NoMI Trials
1   Yes    4   62     66
1   No     2  224    226
2   Yes    9   33     42
2   No    12  390    402
3   Yes    4   26     30
3   No    33  330    363
4   Yes    6    9     15
4   No    65  362    427
5   Yes    6    5     11
5   No    93  301    394
The Trials column contains the total number of patients in each age and oral contraceptive use category, i.e. the sum of the number of patients with MI and the number of patients without MI (NoMI).
When read into Arc we have:

; loading D:\Data\Deppa Documents\Biostatistics (Biometry II)\Book Data\OCMI.txt
Arc 1.06, rev July 2004, Mon Oct 16, 2006, 12:58:46.

Data set name: OCMI
Oral contraceptive use, age, and myocardial infarctions
Name    Type     n   Info
AGE     Variate  10
MI      Variate  10
NOMI    Variate  10
TRIALS  Variate  10
OCUSE   Text     10
In Arc we need to turn the Age variable into a factor, as we don't want it to be interpreted as an actual number, and we need to create a factor based on OCuse. By default Arc does things alphabetically, so No would be used as "present", which is not desirable. Thus it is best to create separate dichotomous dummy variables for each level individually. This will allow us to use those who used oral contraceptives as having "risk present". To do this in Arc we use the Make Factors… option in the data menu.
For oral contraceptive use we want two separate dummy variables, one for each level of use, i.e. Yes and No.
Fitting the logistic model in Arc with MI as the response and OCUSE[YES] as the risk factor indicator.
Results for Fitted Logistic Model

Iteration 1: deviance = 6.69914
Iteration 2: deviance = 6.53561

Data set = OCMI, Name of Fit = B1
Binomial Regression
Kernel mean function = Logistic
Response = MI
Terms = ({F}AGE {T}OCUSE[YES])
Trials = TRIALS

Coefficient Estimates
Label          Estimate   Std. Error   Est/SE   p-value
Constant       -4.36985   0.434642    -10.054   0.0000
{F}AGE[2]       1.13836   0.476782      2.388   0.0170
{F}AGE[3]       1.93440   0.458227      4.221   0.0000
{F}AGE[4]       2.64806   0.449627      5.889   0.0000
{F}AGE[5]       3.19429   0.447386      7.140   0.0000
{T}OCUSE[YES]   1.38518   0.250458      5.531   0.0000
Scale factor: 1.
Number of cases: 10
Degrees of freedom: 4
Pearson X2: 6.386
Deviance: 6.536
We can work with these parameter estimates as above to obtain OR’s of interest etc.
Logistic Regression Case Study 1: Risk Factors for Low Birth Weight
Response

Y = low birth weight, i.e. birth weight < 2500 grams (1 = yes, 0 = no)

Set of potential predictors

X1 = previous history of premature labor (1 = yes, 0 = no)
X2 = hypertension during pregnancy (1 = yes, 0 = no)
X3 = smoker (1 = yes, 0 = no)
X4 = uterine irritability (1 = yes, 0 = no)
X5 = minority (1 = yes, 0 = no)
X6 = mother's age in years
X7 = mother's weight at last menstrual cycle
Analysis in R

> Lowbirth = read.table(file.choose(), header=T)
> Lowbirth[1:5,]   # print first 5 rows of the data set
  Low Prev Hyper Smoke Uterine Minority Age Lwt race  bwt
1   0    0     0     0       1        1  19 182    2 2523
2   0    0     0     0       0        1  33 155    3 2551
3   0    0     0     1       0        0  20 105    1 2557
4   0    0     0     1       1        0  21 108    1 2594
5   0    0     0     1       1        0  18 107    1 2600
Make sure categorical variables are interpreted as factors by using the factor command.

> Low = factor(Low)
> Prev = factor(Prev)
> Hyper = factor(Hyper)
> Smoke = factor(Smoke)
> Uterine = factor(Uterine)
> Minority = factor(Minority)
Note: This is not really necessary for dichotomous variables that are coded (0,1).
Fit a preliminary model using the available covariates.

> low.glm = glm(Low ~ Prev + Hyper + Smoke + Uterine + Minority + Age + Lwt, family=binomial)
> summary(low.glm)
Call:
glm(formula = Low ~ Prev + Hyper + Smoke + Uterine + Minority + Age + Lwt, family = binomial)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-1.6010 -0.8149 -0.5128  1.0188  2.1977

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.378479   1.170627   0.323  0.74646
Prev1        1.196011   0.461534   2.591  0.00956 **
Hyper1       1.452236   0.652085   2.227  0.02594 *
Smoke1       0.959406   0.405302   2.367  0.01793 *
Uterine1     0.647498   0.466468   1.388  0.16511
Minority1    0.990929   0.404969   2.447  0.01441 *
Age         -0.043221   0.037493  -1.153  0.24900
Lwt         -0.012047   0.006422  -1.876  0.06066 .
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Null deviance: 232.40 on 185 degrees of freedom
Residual deviance: 196.71 on 178 degrees of freedom
AIC: 212.71
Number of Fisher Scoring iterations: 3
It appears that both uterine irritability and mother’s age are not significant. We can fit the reduced model eliminating both terms and test whether the model is significantly degraded by using the general chi-square test (see pg. 11 of the logistic notes).
> low.reduced = glm(Low ~ Prev + Hyper + Smoke + Minority + Lwt, family=binomial)
> summary(low.reduced)

Call:
glm(formula = Low ~ Prev + Hyper + Smoke + Minority + Lwt, family = binomial)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-1.7277 -0.8219 -0.5368  0.9867  2.1517

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.261274   0.885803  -0.295  0.76803
Prev1        1.181940   0.444254   2.661  0.00780 **
Hyper1       1.397219   0.656271   2.129  0.03325 *
Smoke1       0.981849   0.398300   2.465  0.01370 *
Minority1    1.044804   0.394956   2.645  0.00816 **
Lwt         -0.014127   0.006387  -2.212  0.02697 *
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 232.40 on 185 degrees of freedom
Residual deviance: 200.32 on 180 degrees of freedom
AIC: 212.32
Number of Fisher Scoring iterations: 3
Ho: ln( θ(x) / (1 − θ(x)) ) = βo + β1X1 + β2X2 + β3X3 + β5X5 + β7X7
H1: ln( θ(x) / (1 − θ(x)) ) = βo + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7

* Recall: θ(x) = P(Low=1|X)

Residual deviance, null hypothesis model:        D_Ho = 200.32,  df = 180
Residual deviance, alternative hypothesis model: D_H1 = 196.71,  df = 178

General Chi-Square Test

χ² = D_Ho − D_H1 = 200.32 − 196.71 = 3.607

p-value = P(χ²(2) > 3.607) = .1647
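The quoted p-value can be verified directly in R; anova(low.reduced, low.glm, test = "Chisq") would carry out the same test from the fitted objects.

```r
# Checking the general chi-square test p-value quoted above
# (3.607 comes from the unrounded residual deviances).
chisq <- 3.607
pval  <- pchisq(chisq, df = 2, lower.tail = FALSE)
round(pval, 4)   # for df = 2 this equals exp(-chisq/2)
```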
Fail to reject the null; the reduced model is adequate.

Interpretation of Model Parameters

OR's Associated with Categorical Predictors

> low.reduced
Call: glm(formula = Low ~ Prev + Hyper + Smoke + Minority + Lwt, family = binomial)
Coefficients:
(Intercept)      Prev1     Hyper1     Smoke1  Minority1        Lwt
   -0.26127    1.18194    1.39722    0.98185    1.04480   -0.01413

Degrees of Freedom: 185 Total (i.e. Null);  180 Residual
Null Deviance: 232.4
Residual Deviance: 200.3   AIC: 212.3
Estimated OR’s > exp(low.reduced$coefficients[2:5]) Prev1 Hyper1 Smoke1 Minority1
220
3.260693 4.043938 2.669388 2.842841
95% CI for OR Associated with History of Premature Labor

> exp(1.182 - 1.96*.444)
[1] 1.365827
> exp(1.182 + 1.96*.444)
[1] 7.78532
Holding everything else constant we estimate that the odds of having an infant with low birth weight are between 1.366 and 7.785 times larger for mothers with a history of premature labor.
95% CI for OR Associated with Hypertension

> exp(1.397 - 1.96*.6563)
[1] 1.117006
> exp(1.397 + 1.96*.6563)
[1] 14.63401
Holding everything else constant we estimate that the odds of having an infant with low birth weight are between 1.117 and 14.63 times larger for mothers with hypertension during pregnancy.
95% CI for OR Associated with Smoking

> exp(.981849 - 1.96*.3983)
[1] 1.222846
> exp(.981849 + 1.96*.3983)
[1] 5.827086
Holding everything else constant we estimate that the odds of having an infant with low birth weight are between 1.223 and 5.827 times larger for mothers who smoked during pregnancy.
95% CI for OR Associated with Minority Status

> exp(1.0448 - 1.96*.3950)
[1] 1.310751
> exp(1.0448 + 1.96*.3950)
[1] 6.16569
Holding everything else constant we estimate that the odds of having an infant with low birth weight are between 1.311 and 6.166 times larger for non-white mothers.
OR Associated with Mother’s Weight at Last Menstrual Cycle
Because this is a continuous predictor with values over 100 we should use an increment larger than one when considering the effect of mother’s weight on birth weight. Here we will use an increment of c = 10 lbs. although certainly there are other possibilities.
> exp(-10*.014127)
[1] 0.8682549
i.e. a 13.2% decrease in the odds for each additional 10 lbs. of premenstrual weight.
A 95% CI for this OR is:

> exp(10*(-.014127) - 1.96*10*.006387)
[1] 0.7660903
> exp(10*(-.014127) + 1.96*10*.006387)
[1] 0.9840439
> x = seq(min(Lwt), max(Lwt), .5)
> fit = predict(low.reduced, data.frame(Prev=factor(rep(1,length(x))),
+   Hyper=factor(rep(0,length(x))), Smoke=factor(rep(1,length(x))),
+   Minority=factor(rep(0,length(x))), Lwt=x), type="response")
> plot(x, fit, xlab="Mother's Weight", ylab="P(Low|Prev=1,Smoke=1,Lwt)")

This is a plot of the effect of premenstrual weight for smoking mothers with a history of premature labor. Using the predict command above, similar plots can be constructed for other combinations of the categorical predictors.

Diagnostics (Delta Deviance and Cook's Distance)

As in ordinary least squares (OLS) regression, we need to be wary of cases that may have unduly high influence on our results and of cases that are poorly fit. The most common influence measure is Cook's Distance, and a good measure of poorly fit cases is the delta deviance.

Essentially Cook's Distance (Δβ(−i)) measures the change in the estimated parameters when the ith observation is deleted. This change is measured for each observation and can be plotted versus θ̂(x) or observation number to aid in the identification of high-influence cases. Several cut-offs have been proposed for Cook's Distance, the most common being to classify an observation as having large influence if Δβ(−i) > 1 or, in the case of large sample size n, Δβ(−i) > 4/n. (Details of Cook's distance are on page 38 below.)
Delta deviance measures the change in the deviance (D) when the ith case is deleted. Values around 4 or larger indicate cases that are poorly fit. These correspond to individuals where yi = 1 but θ̂(xi) is small, or where yi = 0 but θ̂(xi) is large.
In cases of both high influence and poor fit it is good to look at the covariate values for these individuals and we can begin to address the role they play in the analysis. In many cases there will be several individuals with the same covariate pattern, especially if most or all of the predictors are categorical in nature.
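These diagnostics can be sketched for any fitted glm in base R (the Diagplot functions used below are course-supplied). A toy illustration, with the delta deviance roughly approximated by the squared deviance residual:

```r
# Influence and fit diagnostics for a logistic glm (toy data for illustration).
set.seed(2)
x <- rnorm(50)
y <- rbinom(50, 1, plogis(x))
fit <- glm(y ~ x, family = binomial)

cd  <- cooks.distance(fit)                   # Cook's distance per case
dr2 <- residuals(fit, type = "deviance")^2   # squared deviance residuals --
                                             # roughly the delta deviance
                                             # (before a leverage correction)
which(cd > 4 / length(y))                    # flag potentially influential cases
```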
> Diagplot.glm(low.reduced)
> Diagplot.log(low.reduced)
Cases 11 and 13 have the highest Cook's distances, although they are not that large; they are also somewhat poorly fit. Cases 129, 144, 152, and 180 appear to be poorly fit. The information on all of these cases is shown below.
> Lowbirth[c(11,13,129,144,152,180),]
    Low Prev Hyper Smoke Uterine Minority Age Lwt race  bwt
11    0    0     1     0       0        1  19  95    3 2722
13    0    0     1     0       0        1  22  95    3 2750
129   1    0     0     0       1        0  29 130    1 1021
144   1    0     0     0       1        1  21 200    2 1928
152   1    0     0     0       0        0  24 138    1 2100
180   1    0     0     1       0        0  26 190    1 2466
Case 152 had a low birth weight infant even in the absence of the identified potential risk factors. The fitted values for all four of the poorly fit cases are quite small.
> fitted(low.reduced)[c(11,13,129,144,152,180)]
        11         13        129        144        152        180
0.69818500 0.69818500 0.10930602 0.11486743 0.09877858 0.12307383
Cases 11 and 13 have high predicted probabilities despite the fact that they had babies with normal birth weight. Their relatively high leverage might come from the fact that there were very few hypertensive minority women in the study. These two facts combined lead to the relatively large Cook’s Distances for these two cases.
Plotting Estimated Conditional Probabilities P̂(Low=1|x)
A summary of the reduced model is given below:

> low.reduced
Call: glm(formula = Low ~ Prev + Hyper + Smoke + Minority + Lwt, family = binomial)
Coefficients:
(Intercept)      Prev1     Hyper1     Smoke1  Minority1        Lwt
   -0.26127    1.18194    1.39722    0.98185    1.04480   -0.01413

Degrees of Freedom: 185 Total (i.e. Null);  180 Residual
Null Deviance: 232.4
Residual Deviance: 200.3   AIC: 212.3
To easily plot probabilities in R we can write a function that takes covariate values and computes the desired conditional probability.
> x <- seq(min(Lwt),max(Lwt),.5)
> PrLwt <- function(x, Prev, Hyper, Smoke, Minority) {
+   L <- -.26127 + 1.18194*Prev + 1.39722*Hyper + .98185*Smoke +
+     1.0448*Minority - .01413*x
+   exp(L)/(1 + exp(L))
+ }
> plot(x, PrLwt(x,1,1,1,1), xlab="Mother's Weight", ylab="P(Low=1|x)",
+   ylim=c(0,1), type="l")
> title(main="Plot of P(Low=1|X) vs. Mother's Weight")
> lines(x, PrLwt(x,0,0,0,0), lty=2, col="red")
> lines(x, PrLwt(x,1,1,0,0), lty=3, col="blue")
> lines(x, PrLwt(x,0,0,1,1), lty=4, col="green")
Fitting Logistic Models in Arc and More Diagnostics (lowbirtharc.txt from website)

Again we consider the low birth weight case study.
Arc 1.03, rev Aug, 2000, Wed Oct 22, 2003, 12:10:14.
Data set name: Lowbw
Low birth weight study.
Name       Type     n    Info
AGE        Variate  189  Age of mother
BWT        Variate  189  Actual birthweight of child in grams
HT         Variate  189  Mother hypertensive during pregnancy (1 = yes, 0 = no)
ID         Variate  189
LOW        Variate  189  (1 = low birthweight, 0 = normal birthweight)
LWT        Variate  189  Mother's weight at last menstrual cycle
PTD        Variate  189  do not know
PTL        Variate  189  Previous history of premature labor (1 = yes, 0 = no)
RACE       Variate  189  Race of mother (1 = white, 2 = black, 3 = other)
SMOKE      Variate  189  Mother smoke (1 = yes, 0 = no)
UI         Variate  189  Uterine irritability (1 = yes, 0 = no)
FTV        Text     189  # of doctor visits during 1st trimester
{F}FTV     Factor   189  Factor--first level dropped
{F}HT      Factor   189  Factor--first level dropped
{F}PTD     Factor   189  Factor--first level dropped
{F}RACE    Factor   189  Factor--first level dropped
{F}SMOKE   Factor   189  Factor--first level dropped
{F}UI      Factor   189  Factor--first level dropped
Select Fit binomial response… from the Graph & Fit menu. In the resulting dialog box, specify the model as shown below.
The output below shows the results of fitting this initial model.

Data set = Lowbw, Name of Fit = B1
Binomial Regression
Kernel mean function = Logistic
Response = LOW
Terms    = (AGE LWT {F}FTV {F}HT {F}PTD {F}RACE {F}SMOKE {F}UI)
Trials   = Ones

Coefficient Estimates
Label        Estimate     Std. Error   Est/SE   p-value
Constant     0.386634     1.27736      0.303    0.7621
AGE          -0.0372340   0.0386777    -0.963   0.3357
LWT          -0.0156530   0.00707594   -2.212   0.0270
{F}FTV[0]    0.436379     0.479161     0.911    0.3624
{F}FTV[2+]   0.615386     0.553104     1.113    0.2659
{F}HT[1]     1.91316      0.720434     2.656    0.0079
{F}PTD[1]    1.34376      0.480445     2.797    0.0052
{F}RACE[2]   1.19241      0.535746     2.226    0.0260
{F}RACE[3]   0.740681     0.461461     1.605    0.1085
{F}SMOKE[1]  0.755525     0.424764     1.779    0.0753
{F}UI[1]     0.680195     0.464216     1.465    0.1429
Scale factor: 1.
Number of cases: 189
Degrees of freedom: 178
Pearson X2: 179.059
Deviance: 195.476
(Note: AIC = D + 2k·(scale factor) = 195.48 + 22 = 217.48)

The results are identical to those obtained from R:
    Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 195.48 on 178 degrees of freedom
Notes on the Fit binomial response dialog:
- Give the model a name if you want.
- Always include an intercept.
- Use the Make Factors… option from the data set menu to ensure all categorical predictors are treated as factors.
- Put the dichotomous response in the Response… box. The response may also be the number of "successes" observed (see below).
- If mi = 1 for all cases, put the variable Ones in the Trials… box. If your response represents the number of "successes" observed in mi > 1 trials, you need to import the number of trials and put that variable in this box.
Note: For FTV, those who went to the doctor once during the first trimester are used as the reference group.
Examining Submodels – Backward Elimination and Forward Selection
The results of backward elimination for the current low birth weight model are shown below.

Data set = Lowbw, Name of Fit = B1
Binomial Regression
Kernel mean function = Logistic
Response = LOW
Terms    = (AGE LWT {F}FTV {F}HT {F}PTD {F}RACE {F}SMOKE {F}UI)
Trials   = Ones
Backward Elimination: Sequentially remove terms
that give the smallest change in AIC.
All fits include an intercept.

Current terms: (AGE LWT {F}FTV {F}HT {F}PTD {F}RACE {F}SMOKE {F}UI)
                    df   Deviance   Pearson X2  |  k   AIC
Delete: {F}FTV     180   196.834    180.989     |  9   214.834 *
Delete: AGE        179   196.417    181.401     | 10   216.417
Delete: {F}UI      179   197.585    180.753     | 10   217.585
Delete: {F}SMOKE   179   198.674    186.809     | 10   218.674
Delete: {F}RACE    180   201.227    183.365     |  9   219.227
Delete: LWT        179   200.949    177.855     | 10   220.949
Delete: {F}HT      179   202.934    177.447     | 10   222.934
Delete: {F}PTD     179   203.584    180.74      | 10   223.584

Current terms: (AGE LWT {F}HT {F}PTD {F}RACE {F}SMOKE {F}UI)
                    df   Deviance   Pearson X2  |  k   AIC
Delete: AGE        181   197.852    183.999     |  8   213.852 *
Delete: {F}UI      181   199.151    184.559     |  8   215.151
Delete: {F}RACE    182   203.24     182.815     |  7   217.240
Delete: {F}SMOKE   181   201.247    186.953     |  8   217.247
Delete: LWT        181   201.833    181.355     |  8   217.833
Delete: {F}PTD     181   203.948    181.536     |  8   219.948
Delete: {F}HT      181   204.013    179.069     |  8   220.013
Forward Selection – Select this option and click OK. Terms are then sequentially added to a model containing any base terms. By default the base model contains the intercept only.

Backward Elimination – Select this option and click OK. It shows how terms are sequentially eliminated from the model, along with the resulting AIC for each deletion.
The other options do what they say.
Current terms: (LWT {F}HT {F}PTD {F}RACE {F}SMOKE {F}UI)
                    df   Deviance   Pearson X2  |  k   AIC
Delete: {F}UI      182   200.482    186.918     |  7   214.482 *
Delete: {F}SMOKE   182   202.567    189.716     |  7   216.567
Delete: {F}RACE    183   205.466    186.461     |  6   217.466
Delete: LWT        182   203.816    185.551     |  7   217.816
Delete: {F}PTD     182   204.217    182.499     |  7   218.217
Delete: {F}HT      182   205.162    182.282     |  7   219.162

Current terms: (LWT {F}HT {F}PTD {F}RACE {F}SMOKE)
                    df   Deviance   Pearson X2  |  k   AIC
Delete: {F}SMOKE   183   205.397    189.925     |  6   217.397
Delete: {F}RACE    184   207.955    192.506     |  5   217.955
Delete: {F}HT      183   207.039    184.17      |  6   219.039
Delete: LWT        183   207.165    187.234     |  6   219.165
Delete: {F}PTD     183   208.247    184.45      |  6   220.247

Current terms: (LWT {F}HT {F}PTD {F}RACE)
                    df   Deviance   Pearson X2  |  k   AIC
Delete: {F}RACE    185   210.123    194.086     |  4   218.123
Delete: {F}HT      184   212.18     188.048     |  5   222.180
Delete: LWT        184   213.226    187.544     |  5   223.226
Delete: {F}PTD     184   216.295    191.533     |  5   226.295

Current terms: (LWT {F}HT {F}PTD)
                    df   Deviance   Pearson X2  |  k   AIC
Delete: {F}HT      186   217.497    190.809     |  3   223.497
Delete: LWT        186   217.662    188.394     |  3   223.662
Delete: {F}PTD     186   221.142    193.26      |  3   227.142

Current terms: (LWT {F}PTD)
                    df   Deviance   Pearson X2  |  k   AIC
Delete: LWT        187   221.898    188.863     |  2   225.898
Delete: {F}PTD     187   228.691    189.647     |  2   232.691
* indicates a potential “final” model using the AIC criterion; Arc does not add the *'s.
Making Interactions

To make interactions in Arc…
1st - Select Make Interactions from the data set menu.
2nd - Placing all covariates in the right-hand box will create all possible two-way interactions.
Deciding which interactions to include, however, is not as easy as in R. You could include all interactions and then backward eliminate, but the fit becomes numerically unstable with that many terms in the model. It is better to choose the interactions you feel might make physiological sense and then backward eliminate.
If Arc does not use the reference group you would like to use, you can create dummy variables for each level of the factor and then leave the one for the reference group out when you specify the model.
The model with the age*recoded FTV and the smoking*uterine irritability interactions we saw in the R handout is summarized below.

Data set = Lowbw, Name of Fit = B6
Binomial Regression
Kernel mean function = Logistic
Response = LOW
Terms    = (AGE LWT {F}HT {F}PTD {F}SMOKE {F}UI {F}SMOKE*{F}UI {T}FTV[1] {T}FTV[2+] {T}FTV[1]*AGE {T}FTV[2+]*AGE)
Trials   = Ones

Coefficient Estimates
Label                  Estimate     Std. Error   Est/SE   p-value
Constant               -0.582374    1.42158      -0.410   0.6821
AGE                    0.0755389    0.0539665    1.400    0.1616
LWT                    -0.0203726   0.00749678   -2.718   0.0066
{F}HT[1]               2.06570      0.748727     2.759    0.0058
{F}PTD[1]              1.56032      0.496986     3.140    0.0017
{F}SMOKE[1]            0.780044     0.420371     1.856    0.0635
{F}UI[1]               1.81853      0.667517     2.724    0.0064
{F}SMOKE[1].{F}UI[1]   -1.91668     0.973066     -1.970   0.0489
{T}FTV[1]              2.92109      2.28571      1.278    0.2013
{T}FTV[2+]             9.24491      2.66099      3.474    0.0005
{T}FTV[1].AGE          -0.161824    0.0968164    -1.671   0.0946
{T}FTV[2+].AGE         -0.411033    0.119117     -3.451   0.0006

Number of cases: 189
Degrees of freedom: 177
Pearson X2: 179.282
Deviance: 183.073
Selecting these options will create three dummy variables, one for each level of FTV (0, 1, 2+).
Notice the recoding of FTV: FTV=0 is now the reference group.
Diagnostic Plots

There are several plotting options in Arc to help assess a model's adequacy. They are as follows:
Residuals (deviance or chi-square) vs. the estimated logit (L = βᵀx)

The deviance residual is defined as:

    Di = sgn(yi − θ(xi)) · sqrt( 2 [ yi ln( yi / θ(xi) ) + (1 − yi) ln( (1 − yi) / (1 − θ(xi)) ) ] )

The chi residual for the ith covariate pattern is defined as:

    eχi = (yi − ŷi) / sqrt( mi θ(xi)(1 − θ(xi)) )

where ŷi = mi θ(xi) and yi = 1 for cases and 0 for controls. The sum of the squared chi residuals equals Pearson's X².

Plot of Cook's distance vs. Case Number or some other quantity.
Plot of Leverage (potential for influence) vs. Case Number.
Model checking plots.
Residuals vs. Estimated Logit (or some other function of the covariates)
If the model is adequate, a lowess smooth (smoothing parameter = .6) added to the plot should be constant, i.e. flat. This plot will not work well when the numbers of replicates, mi, are small, i.e. close to 1. Model checking plots work better for checking model adequacy in those cases.
The quantities available for model checking plots are:

Eta'U           ~  estimated logit (Li = βᵀxi)
Obs-Fraction    ~  yi/mi (1's and 0's in the case mi = 1)
Fit-Fraction    ~  θ(xi) = e^Li / (1 + e^Li)
Chi-Residuals   ~  see above
Dev-Residuals   ~  see above
T-Residuals     ~  eχi / sqrt(1 − hi), the studentized chi residual
Leverages       ~  hi = ith diagonal element of the hat matrix H
Cook's Distance ~  Di = (1/k) · (eχi² / (1 − hi)) · (hi / (1 − hi)), which measures the influence of the ith case.
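To make these formulas concrete, here is a quick sketch in Python. The fitted probability is the one R reported for case 152 of the low birth weight data; the leverage h and parameter count k are hypothetical values chosen only for illustration.

```python
import math

y, theta = 1, 0.09877858  # case 152: low birth weight observed, small fitted prob.
m = 1                     # Bernoulli case

# Chi (Pearson) residual
chi = (y - m * theta) / math.sqrt(m * theta * (1 - theta))

# Deviance residual; with y = 1 only the first log term contributes
# (the (1 - y) term vanishes), so it is safe to drop it here.
dev = math.copysign(1, y - theta) * math.sqrt(2 * y * math.log(y / theta))

# Studentized chi residual and Cook's distance, using a hypothetical
# leverage h = 0.05 and k = 6 estimated parameters
h, k = 0.05, 6
t_resid = chi / math.sqrt(1 - h)
cooks = (1 / k) * (chi ** 2 / (1 - h)) * (h / (1 - h))

print(round(chi, 3), round(dev, 3))  # → 3.021 2.152
```

A large positive chi residual like this one flags a case that had the event despite a small fitted probability, exactly as noted for case 152 above.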
As an example consider the simple, but reasonable, main effects model shown below.
Data set = Lowbw, Name of Fit = B3
Binomial Regression
Kernel mean function = Logistic
Response = LOW
Terms    = (LWT {F}HT {F}PTD {F}RACE {F}SMOKE {F}UI)
Trials   = Ones

Coefficient Estimates
Label        Estimate     Std. Error   Est/SE   p-value
Constant     -0.125327    0.967238     -0.130   0.8969
LWT          -0.0159185   0.00695085   -2.290   0.0220
{F}HT[1]     1.86689      0.707212     2.640    0.0083
{F}PTD[1]    1.12886      0.450330     2.507    0.0122
{F}RACE[2]   1.30085      0.528349     2.462    0.0138
{F}RACE[3]   0.854413     0.440761     1.938    0.0526
{F}SMOKE[1]  0.866581     0.404341     2.143    0.0321
{F}UI[1]     0.750648     0.458753     1.636    0.1018

Scale factor: 1.
Number of cases: 189
Degrees of freedom: 181
Pearson X2: 183.999
Deviance: 197.852
The plots of the chi-square residuals vs. the estimated logit (L =βT X ) and LWT are shown below. The lowess smooth looks fairly flat and so no model inadequacies are suggested.
Cook’s Distance vs. Case Number and Est. Probs - (no cases have high influence)
Leverages vs. Case Numbers

For leverages the average value is k/n, so values far exceeding the average have the potential to be influential. The following is a good rule of thumb:

1/n < hi < .25    no worry
.25 < hi < .50    worry
.50 < hi < 1      worry lots
Model Checking Plots
For any linear combination bᵀxi of the terms in the model, imagine drawing two plots: one of yi/mi vs. bᵀxi, and one of θ(xi) vs. bᵀxi. If the model is adequate, the lowess smooths of the two should match for any linear combination we choose. A model checking plot is a plot with bᵀxi on the x-axis and both of the lowess smooths described above added to the plot. If they agree for a variety of choices of bᵀxi then we can feel reasonably confident that our model is adequate. Large differences between the smooths can indicate model deficiencies. Common choices for bᵀxi include the estimated logits (L), the individual predictors, and randomly chosen linear combinations of the terms in the model.
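The idea behind a model checking plot can be mimicked crudely in code. The sketch below uses binned means as a rough stand-in for lowess smooths, on simulated data where the model is adequate by construction; the 10-bin and 100-count choices are arbitrary illustration values, not anything from the handout.

```python
import math
import random

random.seed(1)

# Simulate Bernoulli data from a known logistic model; the "fitted"
# probabilities are taken to be the true ones, so the model is adequate.
n = 5000
xs = [random.gauss(0, 1) for _ in range(n)]
theta = [1 / (1 + math.exp(-(-0.5 + 1.2 * x))) for x in xs]
ys = [1 if random.random() < t else 0 for t in theta]

def binned_means(bx, values, nbins=10, min_count=100):
    """Crude stand-in for a lowess smooth: mean of `values` within
    equal-width bins of b'x, keeping only well-populated bins."""
    lo, hi = min(bx), max(bx)
    width = (hi - lo) / nbins
    sums, counts = [0.0] * nbins, [0] * nbins
    for b, v in zip(bx, values):
        i = min(int((b - lo) / width), nbins - 1)
        sums[i] += v
        counts[i] += 1
    return {i: sums[i] / counts[i] for i in range(nbins) if counts[i] >= min_count}

obs = binned_means(xs, ys)      # the Obs-Fraction "smooth"
fit = binned_means(xs, theta)   # the Fit-Fraction "smooth"
gap = max(abs(obs[i] - fit[i]) for i in obs)
print(round(gap, 3))
```

Because the model is adequate here, the two sets of binned means track each other closely; a badly wrong mean function would open a visible gap between them.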
Here we see good agreement between the two smooths for the estimated logits.
Model checking plot with the single term LWT on the x-axis.
Model checking plot for one random linear combination of the terms in the model. Again we see good agreement.
Interactions and Higher Order Terms (Note ~ uses data frame: Lowbwt)

Here we work with a slightly different version of the low birth weight data, which includes an additional predictor, ftv, a factor indicating the number of first trimester doctor visits the woman made (coded as 0, 1, or 2+). We will examine how the model below was developed in the next section, where we discuss model development.
In the model below we have added an interaction between age and the number of first trimester visits. The logistic model is:
log( θ(x) / (1 − θ(x)) ) = βo + β1 Age + β2 Lwt + β3 Smoke + β4 Prev + β5 HT + β6 UI
                           + β7 FTV1 + β8 FTV2 + β9 Age*FTV1 + β10 Age*FTV2 + β11 Smoke*UI
> summary(bigmodel)
Call:
glm(formula = low ~ age + lwt + smoke + ptd + ht + ui + ftv + age:ftv + smoke:ui, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.8945  -0.7128  -0.4817   0.7841   2.3418

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.582389   1.420834  -0.410 0.681885    
age          0.075538   0.053945   1.400 0.161428    
lwt         -0.020372   0.007488  -2.721 0.006513 ** 
smoke1       0.780047   0.420043   1.857 0.063302 .  
ptd1         1.560304   0.496626   3.142 0.001679 ** 
ht1          2.065680   0.748330   2.760 0.005773 ** 
ui1          1.818496   0.666670   2.728 0.006377 ** 
ftv1         2.921068   2.284093   1.279 0.200941    
ftv2+        9.244460   2.650495   3.488 0.000487 ***
age:ftv1    -0.161823   0.096736  -1.673 0.094360 .  
age:ftv2+   -0.411011   0.118553  -3.467 0.000527 ***
smoke1:ui1  -1.916644   0.972366  -1.971 0.048711 *  
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 183.07 on 177 degrees of freedom
AIC: 207.07

Number of Fisher Scoring iterations: 4

> bigmodel$coefficients
(Intercept)         age         lwt      smoke1       prev1         ht1 
-0.58238913  0.07553844 -0.02037234  0.78004747  1.56030401  2.06567991 
        ui1        ftv1       ftv2+    age:ftv1   age:ftv2+  smoke1:ui1 
 1.81849631  2.92106773  9.24445985 -0.16182328 -0.41101103 -1.91664380
Calculate P(Low|Age,FTV) for women of average pre-pregnancy weight with all other risk factors absent. Similar calculations could be done if we wanted to add in other factors as well.
First we calculate the logits as a function of age for the three levels of FTV (0, 1, and 2+, respectively).
> L <- -.5824 + .0755*agex - .02037*mean(lwt)
> L1 <- -.5824 + .0755*agex - .02037*mean(lwt) + 2.9211 - .16182*agex
> L2 <- -.5824 + .0755*agex - .02037*mean(lwt) + 9.2445 - .4110*agex

Next we calculate the associated conditional probabilities.
> P <- exp(L)/(1+exp(L))
> P1 <- exp(L1)/(1+exp(L1))
> P2 <- exp(L2)/(1+exp(L2))

Finally we plot the probability curves as a function of age and FTV.
> plot(agex,P,type="l",xlab="Age",ylab="P(Low|Age,FTV)",ylim=c(0,1))
> lines(agex,P1,lty=2,col="blue")
> lines(agex,P2,lty=3,col="red")
> title(main="Interaction Between Age and First Trimester Visits",cex=.6)
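The same calculation can be sketched in Python. Here agex is taken to be a grid of ages from 14 to 45 and mean(lwt) is assumed to be about 130; both are illustration values, not quantities reported in the handout. The coefficients are those of the fitted interaction model.

```python
import math

def logistic(L):
    return math.exp(L) / (1 + math.exp(L))

mean_lwt = 130          # assumed value; the R code uses mean(lwt) from the data
ages = range(14, 46)    # assumed age grid for agex

# Logits and probabilities for FTV = 0, 1, and 2+
P, P1, P2 = [], [], []
for a in ages:
    L0 = -0.5824 + 0.0755 * a - 0.02037 * mean_lwt
    L1 = L0 + 2.9211 - 0.16182 * a
    L2 = L0 + 9.2445 - 0.4110 * a
    P.append(logistic(L0)); P1.append(logistic(L1)); P2.append(logistic(L2))

# The age slopes differ by FTV level: +0.0755 for FTV=0, but
# 0.0755 - 0.4110 = -0.3355 for FTV=2+, so the curves move in
# opposite directions as age increases.
print(P[-1] > P[0], P2[-1] < P2[0])  # → True True
```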
We also have an interaction between smoking and uterine irritability added to the model. This will affect how we interpret the two in terms of odds ratios. We need to consider the OR associated with smoking for women without uterine irritability, the OR associated with uterine irritability for nonsmokers, and finally the OR associated with smoking and having uterine irritability during pregnancy.
The interaction between age and FTV produces differences in both the direction and the magnitude of the age effect. For women with no first trimester doctor visits, the probability of low birth weight increases with age. However, for women with at least one first trimester visit, the probability of low birth weight decreases with age. The magnitude of that drop is largest for women with 2 or more first trimester visits.
These estimated odds ratios are given below:
OR for Smoking with No Uterine Irritability
> exp(.7800)
[1] 2.181472

OR for Uterine Irritability with No Smoking
> exp(1.8185)
[1] 6.162608

OR for Smoking and Uterine Irritability
> exp(.7800+1.8185-1.91664)
[1] 1.977553
This result is hard to explain physiologically and so this interaction term might be removed from the model.
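The arithmetic behind those three odds ratios is just exponentiation of the relevant coefficient sums; a quick check in Python using the coefficients from the fitted model:

```python
import math

b_smoke, b_ui, b_inter = 0.7800, 1.8185, -1.91664

or_smoke = math.exp(b_smoke)                   # smoking, no uterine irritability
or_ui = math.exp(b_ui)                         # uterine irritability, nonsmoker
or_both = math.exp(b_smoke + b_ui + b_inter)   # smoking AND uterine irritability

print(round(or_smoke, 3), round(or_ui, 3), round(or_both, 3))
# → 2.181 6.163 1.978
```

The negative interaction coefficient is what pulls the combined OR below the product of the two individual ORs.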
Model Selection Methods

Stepwise methods in logistic regression are the same as those used in ordinary least squares regression; however, the criterion is AIC (Akaike Information Criterion) rather than Mallows' Ck statistic. Like Mallows' statistic, AIC balances the residual deviance against the number of parameters in the model.
AIC = D + 2kφ

where D = residual deviance, k = total number of estimated parameters, and φ is an estimate of the dispersion parameter, which is taken to be 1 in models where overdispersion is not present. Overdispersion occurs when the data consist of the number of successes out of mi > 1 trials and the trials are not independent (e.g. the male birth data from your last homework).
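As a sanity check, this formula reproduces the AIC values R reports for the low birth weight fits; the deviances and parameter counts below are taken from the output in this handout.

```python
def aic(deviance, k, phi=1.0):
    # AIC = D + 2*k*phi; phi is taken to be 1 when there is no overdispersion
    return deviance + 2 * k * phi

# Base model: residual deviance 195.48 with k = 11 (intercept + 10 coefficients)
print(round(aic(195.48, 11), 2))   # → 217.48
# Adding the 2-df age:ftv interaction drops the deviance to 183.00
print(round(aic(183.00, 13), 2))   # → 209.0
```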
Forward, backward, both forward and backward simultaneously, and all possible subsets regression methods can be employed to find models with small AIC values. By default R uses both forward and backward selection simultaneously. The command to do this in R has the basic form:
> step(current model name)
To have it select from models containing all potential two-way interactions use:
> step(current model name, scope=~.^2)
This sometimes has convergence problems due to overfitting (i.e. the estimated probabilities approach 0 and 1, as in the saturated model). If this occurs you can have R consider adding each of the potential interaction terms one at a time, scan the list, and decide which you might want to add to your existing model. You can then continue adding terms until the AIC criterion suggests that additional terms do not improve the current model.
These commands are illustrated for the low birth weight data with first trimester visits included in the output shown below.
Base Model
> low.glm <- glm(low~age+lwt+race+smoke+ht+ui+ptd+ftv,family=binomial)
> summary(low.glm)
Call:
glm(formula = low ~ age + lwt + race + smoke + ht + ui + ptd + ftv, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7038  -0.8068  -0.5009   0.8836   2.2151

Coefficients:
             Estimate Std. Error z value Pr(>|z|)   
(Intercept)  0.822706   1.240174   0.663  0.50709   
age         -0.037220   0.038530  -0.966  0.33404   
lwt         -0.015651   0.007048  -2.221  0.02637 * 
race2        1.192231   0.534428   2.231  0.02569 * 
race3        0.740513   0.459769   1.611  0.10726   
smoke1       0.755374   0.423246   1.785  0.07431 . 
ht1          1.912974   0.718586   2.662  0.00776 **
ui1          0.680162   0.463464   1.468  0.14222   
ptd1         1.343654   0.479409   2.803  0.00507 **
ftv1        -0.436331   0.477792  -0.913  0.36112   
ftv2+        0.178939   0.455227   0.393  0.69426   
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 195.48 on 178 degrees of freedom
AIC: 217.48
Number of Fisher Scoring iterations: 3
Find "best" model that includes all potential two-way interactions
> low.step <- step(low.glm,scope=~.^2)
Start: AIC= 217.48
low ~ age + lwt + race + smoke + ht + ui + ptd + ftv

             Df Deviance    AIC
+ age:ftv     2   183.00 209.00
- ftv         2   196.83 214.83
- age         1   196.42 216.42
<none>            195.48 217.48
- ui          1   197.59 217.59
+ smoke:ui    1   193.76 217.76
+ lwt:smoke   1   194.04 218.04
+ ui:ptd      1   194.24 218.24
+ lwt:ui      1   194.28 218.28
+ ptd:ftv     2   192.38 218.38
+ ht:ptd      1   194.55 218.55
+ age:ptd     1   194.58 218.58
+ age:ht      1   194.59 218.59
+ age:smoke   1   194.61 218.61
+ race:ui     2   192.63 218.63
- smoke       1   198.67 218.67
+ smoke:ht    1   195.03 219.03
+ smoke:ptd   1   195.16 219.16
- race        2   201.23 219.23
+ race:smoke  2   193.24 219.24
+ lwt:ptd     1   195.35 219.35
+ lwt:ht      1   195.44 219.44
+ age:lwt     1   195.46 219.46
+ age:ui      1   195.47 219.47
+ ht:ftv      2   194.00 220.00
+ lwt:ftv     2   194.19 220.19
+ smoke:ftv   2   194.47 220.47
+ age:race    2   194.58 220.58
+ lwt:race    2   194.63 220.63
+ race:ptd    2   194.83 220.83
- lwt         1   200.95 220.95
+ race:ht     2   195.19 221.19
+ ui:ftv      2   195.32 221.32
- ht          1   202.93 222.93
- ptd         1   203.58 223.58
+ race:ftv    4   193.81 223.81
Step: AIC= 209
low ~ age + lwt + race + smoke + ht + ui + ptd + ftv + age:ftv
             Df Deviance    AIC
+ smoke:ui    1   179.94 207.94
+ lwt:smoke   1   180.89 208.89
- race        2   186.99 208.99
<none>            183.00 209.00
+ ui:ptd      1   181.42 209.42
+ lwt:ui      1   181.90 209.90
+ ht:ptd      1   182.06 210.06
- smoke       1   186.11 210.11
+ age:smoke   1   182.16 210.16
+ race:ui     2   180.32 210.32
+ age:ptd     1   182.50 210.50
- ui          1   186.61 210.61
+ smoke:ht    1   182.71 210.71
+ lwt:ptd     1   182.75 210.75
+ smoke:ptd   1   182.82 210.82
+ age:ht      1   182.90 210.90
+ age:ui      1   182.96 210.96
+ age:lwt     1   183.00 211.00
+ lwt:ht      1   183.00 211.00
+ race:smoke  2   181.23 211.23
+ lwt:ftv     2   181.44 211.44
+ ptd:ftv     2   181.57 211.57
+ age:race    2   181.62 211.62
+ smoke:ftv   2   181.65 211.65
+ ht:ftv      2   181.82 211.82
+ lwt:race    2   182.55 212.55
+ race:ht     2   182.78 212.78
+ race:ptd    2   182.85 212.85
- lwt         1   188.88 212.88
+ ui:ftv      2   182.94 212.94
- ht          1   190.13 214.13
- ptd         1   191.05 215.05
+ race:ftv    4   181.69 215.69
- age:ftv     2   195.48 217.48
Step: AIC= 207.94
low ~ age + lwt + race + smoke + ht + ui + ptd + ftv + age:ftv + smoke:ui
             Df Deviance    AIC
- race        2   183.07 207.07
<none>            179.94 207.94
+ lwt:smoke   1   178.34 208.34
+ ht:ptd      1   178.89 208.89
- smoke:ui    1   183.00 209.00
+ ui:ptd      1   179.07 209.07
+ age:ptd     1   179.35 209.35
+ age:smoke   1   179.37 209.37
+ smoke:ptd   1   179.58 209.58
+ lwt:ptd     1   179.61 209.61
+ lwt:ui      1   179.76 209.76
+ age:ht      1   179.78 209.78
+ smoke:ht    1   179.82 209.82
+ age:lwt     1   179.84 209.84
+ age:ui      1   179.86 209.86
+ lwt:ht      1   179.94 209.94
+ lwt:ftv     2   178.25 210.25
+ ptd:ftv     2   178.53 210.53
+ smoke:ftv   2   178.64 210.64
+ race:smoke  2   178.73 210.73
+ age:race    2   178.84 210.84
+ ht:ftv      2   178.89 210.89
+ race:ui     2   179.13 211.13
+ ui:ftv      2   179.50 211.50
+ race:ht     2   179.52 211.52
+ lwt:race    2   179.68 211.68
+ race:ptd    2   179.86 211.86
- lwt         1   187.15 213.15
- ht          1   187.66 213.66
+ race:ftv    4   178.51 214.51
- ptd         1   188.83 214.83
- age:ftv     2   193.76 217.76
Step: AIC= 207.07
low ~ age + lwt + smoke + ht + ui + ptd + ftv + age:ftv + smoke:ui
             Df Deviance    AIC
<none>            183.07 207.07
+ lwt:smoke   1   181.40 207.40
+ ui:ptd      1   181.88 207.88
+ ht:ptd      1   181.93 207.93
+ race        2   179.94 207.94
+ age:smoke   1   181.97 207.97
+ age:ht      1   182.64 208.64
+ age:ptd     1   182.69 208.69
+ lwt:ptd     1   182.73 208.73
+ lwt:ui      1   182.76 208.76
+ smoke:ptd   1   182.85 208.85
+ age:lwt     1   182.92 208.92
- smoke:ui    1   186.99 208.99
+ age:ui      1   182.99 208.99
+ smoke:ht    1   183.02 209.02
+ lwt:ht      1   183.06 209.06
+ smoke:ftv   2   181.48 209.48
+ lwt:ftv     2   181.69 209.69
+ ptd:ftv     2   181.85 209.85
+ ui:ftv      2   182.28 210.28
+ ht:ftv      2   182.41 210.41
- ht          1   191.21 213.21
- lwt         1   191.56 213.56
- ptd         1   193.59 215.59
- age:ftv     2   199.00 219.00

Summarize the model returned from the stepwise search
> summary(low.step)
Call:
glm(formula = low ~ age + lwt + smoke + ht + ui + ptd + ftv + age:ftv + smoke:ui, family = binomial)
Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.582389   1.420834  -0.410 0.681885    
age          0.075538   0.053945   1.400 0.161428    
lwt         -0.020372   0.007488  -2.721 0.006513 ** 
smoke1       0.780047   0.420043   1.857 0.063302 .  
ht1          2.065680   0.748330   2.760 0.005773 ** 
ui1          1.818496   0.666670   2.728 0.006377 ** 
ptd1         1.560304   0.496626   3.142 0.001679 ** 
ftv1         2.921068   2.284093   1.279 0.200941    
ftv2+        9.244460   2.650495   3.488 0.000487 ***
age:ftv1    -0.161823   0.096736  -1.673 0.094360 .  
age:ftv2+   -0.411011   0.118553  -3.467 0.000527 ***
smoke1:ui1  -1.916644   0.972366  -1.971 0.048711 *  
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 183.07 on 177 degrees of freedom
AIC: 207.07
Number of Fisher Scoring iterations: 4
This is the model used to demonstrate model interpretation in the presence of interactions. An alternative to the full-blown search above is to consider adding a single interaction term, chosen from the set of all possible terms, to the "Base Model".
> add1(low.glm,scope=~.^2)
Single term additions

Model:
low ~ age + lwt + race + smoke + ht + ui + ptd + ftv
           Df Deviance    AIC
<none>          195.48 217.48
age:lwt     1   195.46 219.46
age:race    2   194.58 220.58
age:smoke   1   194.61 218.61
age:ht      1   194.59 218.59
age:ui      1   195.47 219.47
age:ptd     1   194.58 218.58
age:ftv     2   183.00 209.00 *
lwt:race    2   194.63 220.63
lwt:smoke   1   194.04 218.04
lwt:ht      1   195.44 219.44
lwt:ui      1   194.28 218.28
lwt:ptd     1   195.35 219.35
lwt:ftv     2   194.19 220.19
race:smoke  2   193.24 219.24
race:ht     2   195.19 221.19
race:ui     2   192.63 218.63
race:ptd    2   194.83 220.83
race:ftv    4   193.81 223.81
smoke:ht    1   195.03 219.03
smoke:ui    1   193.76 217.76
smoke:ptd   1   195.16 219.16
smoke:ftv   2   194.47 220.47
ht:ui       0   195.48 217.48
ht:ptd      1   194.55 218.55
ht:ftv      2   194.00 220.00
ui:ptd      1   194.24 218.24
ui:ftv      2   195.32 221.32
ptd:ftv     2   192.38 218.38
We can then "manually" enter this term into our base model by using the update command in R.
> low.glm2 <- update(low.glm,.~.+age:ftv)
> summary(low.glm2)
Call:
glm(formula = low ~ age + lwt + race + smoke + ht + ui + ptd + ftv + age:ftv, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.0338  -0.7690  -0.4510   0.8354   2.3383

Coefficients:
             Estimate Std. Error z value Pr(>|z|)   
(Intercept) -1.636485   1.558677  -1.050  0.29376   
age          0.085461   0.055734   1.533  0.12519   
lwt         -0.017599   0.007653  -2.300  0.02147 * 
race2        0.994134   0.550962   1.804  0.07118 . 
race3        0.700669   0.491400   1.426  0.15391   
smoke1       0.792972   0.452303   1.753  0.07957 . 
ht1          1.936204   0.747576   2.590  0.00960 **
ui1          0.938620   0.492240   1.907  0.05654 . 
ptd1         1.373390   0.495738   2.770  0.00560 **
ftv1         2.877889   2.253710   1.277  0.20162   
ftv2+        8.264965   2.594444   3.186  0.00144 **
age:ftv1    -0.149619   0.096342  -1.553  0.12043   
age:ftv2+   -0.359454   0.115429  -3.114  0.00185 **
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 183.00 on 176 degrees of freedom
AIC: 209
Number of Fisher Scoring iterations: 4
Next we could use add1 to consider the remaining interaction terms for addition to this model.
> add1(low.glm2,scope=~.^2)
Single term additions
Model:
low ~ age + lwt + race + smoke + ht + ui + ptd + ftv + age:ftv
           Df Deviance    AIC
<none>          183.00 209.00
age:lwt     1   183.00 211.00
age:race    2   181.62 211.62
age:smoke   1   182.16 210.16
age:ht      1   182.90 210.90
age:ui      1   182.96 210.96
age:ptd     1   182.50 210.50
lwt:race    2   182.55 212.55
lwt:smoke   1   180.89 208.89 *
lwt:ht      1   183.00 211.00
lwt:ui      1   181.90 209.90
lwt:ptd     1   182.75 210.75
lwt:ftv     2   181.44 211.44
race:smoke  2   181.23 211.23
race:ht     2   182.78 212.78
race:ui     2   180.32 210.32
race:ptd    2   182.85 212.85
race:ftv    4   181.69 215.69
smoke:ht    1   182.71 210.71
smoke:ui    1   179.94 207.94 **
smoke:ptd   1   182.82 210.82
smoke:ftv   2   181.65 211.65
ht:ui       0   183.00 209.00
ht:ptd      1   182.06 210.06
ht:ftv      2   181.82 211.82
ui:ptd      1   181.42 209.42
ui:ftv      2   182.94 212.94
ptd:ftv     2   181.57 211.57
Motivating Example: Recumbent Cows
"The ability of biochemical and haematological tests to predict recovery in periparturient recumbent cows." NZ Veterinary Journal, 35, 126-133. Clark, R. G., Henderson, H. V., Hoggard, G. K., Ellison, R. S. and Young, B. J. (1987).
Study Description:
For unknown reasons, many pregnant dairy cows become recumbent--they lie down--either shortly before or after calving. This condition can be serious and may lead to the death of the cow. These data are from a study of blood samples of over 500 cows studied at the Ruakura (N.Z.) Animal Health Laboratory during 1983-84. A variety of blood tests were performed, and for many of the animals the outcome (survived, died, or animal was killed) was determined. The goal is to see if survival can be predicted from the blood measurements. Case numbers 12607 and 11630 were noted as having exceptional care---and they survived.

Name      Type     n    Info
AST       Variate  429  serum aspartate amino transferase (U/l at 30C)
Calving   Variate  431  0 if measured before calving, 1 if after
CK        Variate  413  serum creatine phosphokinase (U/l at 30C)
Daysrec   Variate  432  days recumbent
Inflamat  Variate  136  inflammation 0=no, 1=yes
Myopathy  Variate  222  muscle disorder, 1 if present, 0 if absent
Outcome   Variate  435  outcome: 1 if survived, 0 if died or killed (response)
PCV       Variate  175  Packed Cell Volume (haematocrit), %
Urea      Variate  266  serum urea (mmol/l)
CaseNo    Text     435  case number
Because calving, inflammation, and myopathy are dichotomous (Bernoulli) predictors they will not be transformed, although we might consider potential interactions involving them. We will not consider inflammation and myopathy, however, as most of the cows are missing that information.
Guidelines for Transforming Predictors in Logistic Regression
Examine univariate conditional density plots f(x|y) for the continuous predictors (Cook & Weisberg).

Consider f(x|y), the conditional density of x given the outcome variable

    y = 1 if success, 0 if failure.
Idea:
Univariate considerations

f(x|y)                                      Suggested model terms
Normal, common variance,
  i.e. Var(x|y=0) = Var(x|y=1)              x
Normal, unequal variances,
  i.e. Var(x|y=0) ≠ Var(x|y=1)              x and x^2
Skewed right                                x and log2(x) (base 2 is easier to interpret)
x ∈ [0,1]                                   log(x), log(1-x)
x is dichotomous, Bernoulli                 x
x ~ Poisson, i.e. x is a count              x
Multivariate considerations

When considering multiple continuous predictors simultaneously we look at multivariate normality.

If f(x|y) ~ MVN(μ_{y=k}, Σ), then use the x's themselves.
If f(x|y) ~ MVN(μ_{y=k}, Σ_{y=k}), then include x_i^2 and x_i·x_j terms.

For example, in the two predictor case (p = 2), the x1·x2 term is needed if E(x1|x2, y=k) = βo + β_{1,y=k} x2, i.e. the slope of the regression of x1 on x2 differs across the levels of y; and if the variances of the x_i differ across the levels of y, then we add x_i^2 terms as well.
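The univariate guidelines can be mimicked crudely in code. The sketch below is a hypothetical helper on synthetic data: it looks only at the overall skewness and at the ratio of the two conditional variances, and the thresholds (1 for skewness, 2 for the variance ratio) are arbitrary choices for illustration, not rules from the handout.

```python
import random
import statistics

random.seed(0)

def suggest_terms(x0, x1, name="x"):
    """Rough sketch of the univariate guidelines: examine skewness of x
    and the ratio of Var(x|y=1) to Var(x|y=0)."""
    allx = x0 + x1
    m, s = statistics.mean(allx), statistics.stdev(allx)
    skew = sum(((v - m) / s) ** 3 for v in allx) / len(allx)
    var_ratio = statistics.variance(x1) / statistics.variance(x0)
    if skew > 1:                          # markedly right skewed
        return [f"log2({name})"]
    if var_ratio > 2 or var_ratio < 0.5:  # roughly unequal variances
        return [name, f"{name}^2"]
    return [name]                         # roughly normal, common variance

# A right-skewed predictor (lognormal in both groups) suggests a log2 transform
x0 = [random.lognormvariate(0, 1) for _ in range(500)]
x1 = [random.lognormvariate(0.5, 1) for _ in range(500)]
print(suggest_terms(x0, x1, "AST"))  # → ['log2(AST)']
```

In practice one would look at the conditional density plots themselves, as done for AST, CK, PCV, Daysrec, and Urea below; the code is only a numerical caricature of that visual check.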
AST
Clearly AST has a skewed distribution, and using log2(AST) in the model is recommended. After transformation, f(log2(AST)|Outcome) appears approximately normal for both outcome groups with constant variance, so quadratic terms in the log scale are not suggested.
CK
Clearly CK is extremely right skewed and would benefit from a log transformation. Again the conditional densities appear approximately normal with equal variance, so we will consider adding only log2(CK) to the model.
PCV
f(PCV|Outcome) is approximately normal for both outcome groups, but the variation in PCV levels appears to be higher for cows that survived. Thus we will consider both PCV and PCV^2 terms in the model.
Daysrec
Despite the fact that Daysrec is right skewed, we will not log transform it. It is a count of the number of days the cow was recumbent, so it can be modeled as Poisson, and thus the only term recommended is Daysrec itself.
Urea
Consider the log transformation of urea level. f(log2(Urea)|Outcome) is approximately normal; however, the variation for cows that survived appears larger, so we will consider both log2(Urea) and log2(Urea)^2 terms.
Data set = Downer, Name of Fit = B2
372 cases are missing at least one value. (PCV has lots of missing values also)
Binomial Regression
Kernel mean function = Logistic
Response = Outcome
Terms    = (AST log2[AST] CK log2[CK] Urea log2[Urea] log2[Urea]^2 PCV PCV^2 Daysrec Calving)
Trials   = Ones

Coefficient Estimates
Label         Estimate       Std. Error    Est/SE   p-value
Constant      -1.03935       6.35298       -0.164   0.8700
AST           -0.000720027   0.00242524    -0.297   0.7666
log2[AST]     -0.330179      0.554239      -0.596   0.5514
CK            -0.000109772   0.000135315   -0.811   0.4172
log2[CK]      -0.0121434     0.223648      -0.054   0.9567
Urea          -1.13453       1.05860       -1.072   0.2838
log2[Urea]    0.730468       2.89371       0.252    0.8007
log2[Urea]^2  0.660165       1.38757       0.476    0.6342
PCV           0.182480       0.224691      0.812    0.4167
PCV^2         -0.00165620    0.00325722    -0.508   0.6111
Daysrec       -0.391937      0.157490      -2.489   0.0128
Calving       1.28561        0.648089      1.984    0.0473

Scale factor: 1.
Number of cases: 435
Number of cases used: 165
Degrees of freedom: 153
Pearson X2: 127.410
Deviance: 141.988
Clearly we have some model reduction to do, as many of the current terms are not significant. Before backward eliminating, we will drop all of the non-transformed versions of the log-scale predictors.
Coefficient Estimates
Label         Estimate      Std. Error   Est/SE   p-value
Constant      -3.82598      5.84498      -0.655   0.5127
log2[AST]     -0.554005     0.293416     -1.888   0.0590
log2[CK]      -0.118575     0.160536     -0.739   0.4601
log2[Urea]    4.09939       3.12355      1.312    0.1894
log2[Urea]^2  -0.978895     0.545929     -1.793   0.0730
PCV           0.218085      0.213730     1.020    0.3075
PCV^2         -0.00229912   0.00305947   -0.751   0.4524
Daysrec       -0.383179     0.153758     -2.492   0.0127
Calving       1.39322       0.647605     2.151    0.0314

Scale factor: 1.
Number of cases: 435
Number of cases used: 165
Degrees of freedom: 156
Pearson X2: 134.154
Deviance: 145.123

Backward Elimination: Sequentially remove terms
that give the smallest change in AIC.
All fits include an intercept.
Current terms: (log2[AST] log2[CK] log2[Urea] log2[Urea]^2 PCV PCV^2 Daysrec Calving)
                        df   Deviance   Pearson X2  |  k   AIC
Delete: log2[CK]       157   145.671    134.797     |  8   161.671
Delete: PCV^2          157   145.786    134.995     |  8   161.786
Delete: PCV            157   146.392    135.415     |  8   162.392
Delete: log2[Urea]     157   148.141    140.787     |  8   164.141
Delete: log2[AST]      157   148.92     140.737     |  8   164.920
Delete: Calving        157   150.163    141.672     |  8   166.163
Delete: Daysrec        157   151.993    135.976     |  8   167.993
Delete: log2[Urea]^2   157   152.536    143.299     |  8   168.536

Current terms: (log2[AST] log2[Urea] log2[Urea]^2 PCV PCV^2 Daysrec Calving)
                        df   Deviance   Pearson X2  |  k   AIC
Delete: PCV^2          158   146.202    135.813     |  7   160.202 *
Delete: PCV            158   146.701    136.211     |  7   160.701
Delete: log2[Urea]     158   149.035    142.035     |  7   163.035
Delete: Calving        158   151.207    140.587     |  7   165.207
Delete: Daysrec        158   152.168    136.078     |  7   166.168
Delete: log2[Urea]^2   158   153.767    145.12      |  7   167.767
Delete: log2[AST]      158   161.383    144.17      |  7   175.383

Current terms: (log2[AST] log2[Urea] log2[Urea]^2 PCV Daysrec Calving)
                        df   Deviance   Pearson X2  |  k   AIC
Delete: PCV            159   148.955    137.789     |  6   160.955
Delete: log2[Urea]     159   150.035    144.626     |  6   162.035
Delete: Calving        159   152.176    141.179     |  6   164.176
Delete: Daysrec        159   152.699    136.298     |  6   164.699
Delete: log2[Urea]^2   159   155.31     149.108     |  6   167.310
Delete: log2[AST]      159   163.059    140.738     |  6   175.059

Current terms: (log2[AST] log2[Urea] log2[Urea]^2 Daysrec Calving)
                        df   Deviance   Pearson X2  |  k   AIC
Delete: log2[Urea]     160   152.373    144.523     |  5   162.373
Delete: Daysrec        160   155.744    138.388     |  5   165.744
Delete: Calving        160   155.99     142.871     |  5   165.990
Delete: log2[Urea]^2   160   157.017    148.417     |  5   167.017
Delete: log2[AST]      160   164.785    143.03      |  5   174.785

Current terms: (log2[AST] log2[Urea]^2 Daysrec Calving)
                        df   Deviance   Pearson X2  |  k   AIC
Delete: Calving        161   160.932    150.399     |  4   168.932
Delete: Daysrec        161   162.036    146.037     |  4   170.036
Delete: log2[AST]      161   169.755    148.817     |  4   177.755
Delete: log2[Urea]^2   161   176.794    157.24      |  4   184.794

Current terms: (log2[AST] log2[Urea]^2 Daysrec)
                        df   Deviance   Pearson X2  |  k   AIC
Delete: Daysrec        162   167.184    150.961     |  3   173.184
Delete: log2[AST]      162   178.021    150.618     |  3   184.021
Delete: log2[Urea]^2   162   181.641    162.028     |  3   187.641

Current terms: (log2[AST] log2[Urea]^2)
                        df   Deviance   Pearson X2  |  k   AIC
Delete: log2[Urea]^2   163   182.688    162.386     |  2   186.688
Delete: log2[AST]      163   192.479    151.943     |  2   196.479
Forward selection suggests the same model.
“Final” Model

Data set = Downer, Name of Fit = B5
372 cases are missing at least one value.
Binomial Regression
Kernel mean function = Logistic
Response = Outcome
Terms = (log2[AST] log2[Urea] log2[Urea]^2 PCV Daysrec Calving)
Trials = Ones

Coefficient Estimates
Label          Estimate    Std. Error   Est/SE   p-value
Constant       -1.12404    5.01853      -0.224   0.8228
log2[AST]      -0.733670   0.196044     -3.742   0.0002
log2[Urea]      4.44950    3.17044       1.403   0.1605
log2[Urea]^2   -1.05918    0.554282     -1.911   0.0560
PCV             0.0514512  0.0335256     1.535   0.1249
Daysrec        -0.386695   0.153067     -2.526   0.0115
Calving         1.44641    0.623820      2.319   0.0204

Scale factor: 1.
Number of cases: 435
Number of cases used: 170
Degrees of freedom: 163
Pearson X2: 138.509
Deviance: 148.269
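As a rough check on overall fit, the residual deviance can be compared to a chi-square distribution on its degrees of freedom (only a crude guide with Bernoulli responses). For the "final" model above:

```r
# Crude lack-of-fit check: residual deviance 148.269 on 163 df.
# A deviance below its df suggests no evidence of lack of fit.
p.value <- 1 - pchisq(148.269, df = 163)
round(p.value, 2)   # about 0.79
```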
Diagnostics and Model Checking Plots
Chi-residuals vs. estimated logits ~ Looks good.
Cook’s Distance and Leverage vs. Case Numbers
Model Checking Plots (Estimated Logits and Marginals)
[Model checking plots shown here for the estimated logits (LOGIT) and for the marginals AST, UREA, PCV, and DAYSREC.]
All of these plots look OK. The largest departure observed is in the case of urea, but the discrepancy there is primarily due to a single observation that stands out from the rest.
In R
To replicate the analysis above in R you will need the following functions to look at the conditional densities f(x|y=0) and f(x|y=1).
The first two functions below are helpers used to make pretty histograms in the conplot function. The function conplot replicates the plots of density estimates conditional on the value of the outcome variable Y, taking the predictor X and the outcome Y as arguments. If there are missing values on either the response or the predictor, those cases are automatically removed before the plot is constructed.
nclass.FD = function (x) {
  r <- quantile(x, c(0.25, 0.75))
  names(r) <- NULL
  h <- 2 * (r[2] - r[1]) * length(x)^(-1/3)
  ceiling(diff(range(x))/h)
}

bandwidth.nrd = function (x) {
  r <- quantile(x, c(0.25, 0.75))
  h <- (r[2] - r[1])/1.34
  4 * 1.06 * min(sqrt(var(x)), h) * length(x)^(-1/5)
}

conplot = function (x, y, xname = deparse(substitute(x))) {
  xname <- deparse(substitute(x))
  data <- na.omit(cbind(x, y))
  x <- data[, 1]
  y <- as.numeric(data[, 2])
  lev <- sort(unique(y))
  dens0 <- density(x[y == lev[1]], width = bandwidth.nrd(x[y == lev[1]]))
  dens1 <- density(x[y == lev[2]], width = bandwidth.nrd(x[y == lev[2]]))
  ylim <- range(c(dens0$y, dens1$y))
  xlim <- range(c(dens0$x, dens1$x))
  hist(x, nclass.FD(x), prob = TRUE, xlab = xname, xlim = xlim, ylim = ylim,
       main = paste("Conditional X|Y Plot of ", xname))
  lines(dens0, col = "blue")
  lines(dens1, col = "red")
  invisible()
}
> conplot(x=AST,y=Outcome)
> conplot(x=log(AST),y=Outcome)
Etc…
To obtain model checking plots in R you will need to install the package car from CRAN, which is essentially a collection of functions that replicate Arc in R. The two functions that create model checking plots in the car package are mmp and mmps; the latter creates model checking plots for each predictor as well as for the overall fit.
Downer Example in R

> mod1 = glm(Outcome~AST+Urea+PCV+Calving+Daysrec+CK,family="binomial")
> summary(mod1)

Call:
glm(formula = Outcome ~ AST + Urea + PCV + Calving + Daysrec + CK, family = "binomial")

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7678  -0.7541  -0.1928   0.7546   2.0696

Coefficients:
              Estimate  Std. Error  z value  Pr(>|z|)    
(Intercept)  0.3313771  1.1987644     0.276   0.78222    
AST         -0.0022405  0.0014726    -1.521   0.12815    
Urea        -0.3140380  0.0770497    -4.076  4.59e-05 ***
PCV          0.0601745  0.0339726     1.771   0.07652 .  
Calving      1.3192777  0.6238318     2.115   0.03445 *  
Daysrec     -0.4804961  0.1498000    -3.208   0.00134 ** 
CK          -0.0001435  0.0001121    -1.280   0.20068    
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 212.71 on 164 degrees of freedom
Residual deviance: 146.39 on 158 degrees of freedom
  (270 observations deleted due to missingness)
AIC: 160.39
Number of Fisher Scoring iterations: 7
> mmps(mod1)
Using the same approach as in the analysis in Arc, we might use a model with several terms based on predictor transformations.
> logAST = log2(AST)
> logCK = log2(CK)
> logUrea = log2(Urea)
> logUrea2 = logUrea^2
> PCV2 = PCV^2

> Downer2 = data.frame(Outcome,logAST,logCK,logUrea,logUrea2,PCV,PCV2,Daysrec,Calving)
> Downer2 = na.omit(Downer2)
> attach(Downer2)
> mod2 = glm(Outcome~logAST+logCK+logUrea+logUrea2+PCV+PCV2+Daysrec+Calving,family="binomial",data=Downer2)
> summary(mod2)

Call:
glm(formula = Outcome ~ logAST + logCK + logUrea + logUrea2 + PCV + PCV2 + Daysrec + Calving, family = "binomial", data = Downer2)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9522  -0.7094  -0.2869   0.7109   2.0585

Coefficients:
             Estimate  Std. Error  z value  Pr(>|z|)  
(Intercept) -3.826289   5.856671    -0.653    0.5135  
logAST      -0.554007   0.293529    -1.887    0.0591 .
logCK       -0.118574   0.160621    -0.738    0.4604  
logUrea      4.099642   3.137295     1.307    0.1913  
logUrea2    -0.978940   0.548487    -1.785    0.0743 .
PCV          0.218083   0.213771     1.020    0.3076  
PCV2        -0.002299   0.003060    -0.751    0.4525  
Daysrec     -0.383178   0.153795    -2.491    0.0127 *
Calving      1.393222   0.647759     2.151    0.0315 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 212.71 on 164 degrees of freedom
Residual deviance: 145.12 on 156 degrees of freedom
AIC: 163.12
Number of Fisher Scoring iterations: 7
Backward elimination using the step() function:
> mod3 = step(mod2)
Start:  AIC=163.12
Outcome ~ logAST + logCK + logUrea + logUrea2 + PCV + PCV2 + Daysrec + Calving

             Df  Deviance     AIC
- logCK       1    145.67  161.67
- PCV2        1    145.79  161.79
- PCV         1    146.39  162.39
<none>             145.12  163.12
- logUrea     1    148.14  164.14
- logAST      1    148.92  164.92
- Calving     1    150.16  166.16
- Daysrec     1    151.99  167.99
- logUrea2    1    152.54  168.54

Step:  AIC=161.67
Outcome ~ logAST + logUrea + logUrea2 + PCV + PCV2 + Daysrec + Calving

             Df  Deviance     AIC
- PCV2        1    146.20  160.20
- PCV         1    146.70  160.70
<none>             145.67  161.67
- logUrea     1    149.03  163.03
- Calving     1    151.21  165.21
- Daysrec     1    152.17  166.17
- logUrea2    1    153.77  167.77
- logAST      1    161.38  175.38

Step:  AIC=160.2
Outcome ~ logAST + logUrea + logUrea2 + PCV + Daysrec + Calving

             Df  Deviance     AIC
<none>             146.20  160.20
- PCV         1    148.96  160.96
- logUrea     1    150.03  162.03
- Calving     1    152.18  164.18
- Daysrec     1    152.70  164.70
- logUrea2    1    155.31  167.31
- logAST      1    163.06  175.06
> summary(mod3)
Call:
glm(formula = Outcome ~ logAST + logUrea + logUrea2 + PCV + Daysrec + Calving, family = "binomial", data = Downer2)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.0329  -0.6836  -0.2644   0.7002   2.0893

Coefficients:
             Estimate  Std. Error  z value  Pr(>|z|)    
(Intercept)  -1.49051    5.04944   -0.295   0.767853    
logAST       -0.72986    0.19514   -3.740   0.000184 ***
logUrea       4.61037    3.19802    1.442   0.149406    
logUrea2     -1.08728    0.55899   -1.945   0.051768 .  
PCV           0.05489    0.03370    1.629   0.103394    
Daysrec      -0.37191    0.15422   -2.411   0.015888 *  
Calving       1.45572    0.62845    2.316   0.020538 *  
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 212.71 on 164 degrees of freedom
Residual deviance: 146.20 on 158 degrees of freedom
AIC: 160.2
Number of Fisher Scoring iterations: 7
> mmps(mod3)
We could consider adding interaction terms to our “final” model. This is easily done using the scope option.
> mod3 = step(mod2,scope=~.^2)
> summary(mod3)

Call:
glm(formula = Outcome ~ logAST + logUrea + logUrea2 + PCV + PCV2 + Daysrec + Calving + PCV:Calving + logAST:PCV2 + logAST:PCV, family = "binomial", data = Downer2)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.8712  -0.6639  -0.1012   0.6784   2.5298

Coefficients:
              Estimate  Std. Error  z value  Pr(>|z|)  
(Intercept)  44.950653   39.720672    1.132    0.2578  
logAST       -9.472551    5.720701   -1.656    0.0978 .
logUrea       6.107698    3.425871    1.783    0.0746 .
logUrea2     -1.315449    0.602659   -2.183    0.0291 *
PCV          -3.814468    2.386123   -1.599    0.1099  
PCV2          0.069737    0.036368    1.918    0.0552 .
Daysrec      -0.388998    0.162716   -2.391    0.0168 *
Calving      12.376954    6.037254    2.050    0.0404 *
PCV:Calving  -0.322466    0.168853   -1.910    0.0562 .
logAST:PCV2  -0.010067    0.005179   -1.944    0.0519 .
logAST:PCV    0.608959    0.344917    1.766    0.0775 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 212.71 on 164 degrees of freedom
Residual deviance: 134.15 on 154 degrees of freedom
AIC: 156.15
Number of Fisher Scoring iterations: 7
> mmps(mod3)
There is a slight improvement in fit. The same model fit in JMP produces the following ROC curve; the resulting classification using this model is very good.
MORE EXAMPLES OF LOGISTIC REGRESSION
Example 8.1 - Classification of Credit Card Defaults

In this example we seek to develop classification models to predict which customers will default on their credit card debt. The data frame is called Default and is in the ISLR library. The variables in the data frame are summarized below:
> summary(Default)
 default    student        balance           income     
 No :9667   No :7056   Min.   :   0.0   Min.   :  772  
 Yes: 333   Yes:2944   1st Qu.: 481.7   1st Qu.:21340  
                       Median : 823.6   Median :34553  
                       Mean   : 835.4   Mean   :33517  
                       3rd Qu.:1166.3   3rd Qu.:43808  
                       Max.   :2654.3   Max.   :73554  
The response is

    default = 1 if default = Yes, 0 if default = No

and the predictors are the customer's student status (Yes or No), the average balance on the credit card ($) after making the monthly payment, and the customer's annual income ($).
> def.glm1 = glm(default~student,data=Default,family="binomial")
> summary(def.glm1)

Call:
glm(formula = default ~ student, family = "binomial", data = Default)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.2970  -0.2970  -0.2434  -0.2434   2.6585

Coefficients:
            Estimate  Std. Error  z value  Pr(>|z|)    
(Intercept) -3.50413     0.07071   -49.55   < 2e-16 ***
studentYes   0.40489     0.11502     3.52  0.000431 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 2920.6 on 9999 degrees of freedom
Residual deviance: 2908.7 on 9998 degrees of freedom
AIC: 2912.7
Number of Fisher Scoring iterations: 6
> logits = predict(def.glm1,type="link")
> Pdefault = 1/(1+exp(-logits))
> table(Pdefault)
Pdefault
0.0291950113382457  0.0431385869565177 
              7056                2944 
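These two fitted probabilities are exactly what the coefficients of def.glm1 imply: the estimated logit is -3.50413 + 0.40489*student, and inverting the logit gives the probability scale. A quick check:

```r
# Invert the logit from def.glm1: logit = -3.50413 + 0.40489 * student
invlogit <- function(L) 1 / (1 + exp(-L))

invlogit(-3.50413)             # non-students: about 0.0292
invlogit(-3.50413 + 0.40489)   # students:     about 0.0431

# The estimated odds ratio for students vs. non-students is exp(beta1).
exp(0.40489)                   # about 1.5
```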
> table(student,default)
       default
student   No  Yes
    No  6850  206
    Yes 2817  127
> mosaicplot(~student+default,color=3:5,main="Mosaic Plot of Defaults vs. Student Status")
> def.glm2 = glm(default~.,data=Default,family="binomial")
> summary(def.glm2)

Call:
glm(formula = default ~ ., family = "binomial", data = Default)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-2.469  -0.142  -0.056  -0.020   3.738

Coefficients:
             Estimate  Std. Error  z value  Pr(>|z|)    
(Intercept) -1.09e+01    4.92e-01   -22.08    <2e-16 ***
studentYes  -6.47e-01    2.36e-01    -2.74    0.0062 ** 
balance      5.74e-03    2.32e-04    24.74    <2e-16 ***
income       3.03e-06    8.20e-06     0.37    0.7115    
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 2920.6 on 9999 degrees of freedom
Residual deviance: 1571.5 on 9996 degrees of freedom
AIC: 1580
Number of Fisher Scoring iterations: 8
> PrDefault = function(balance,student){
    # income is held fixed at its mean value (33517)
    L = -10.9 - .647*student + .00574*balance + .00000303*33517
    1/(1+exp(-L))
  }
> plot(balance,PrDefault(balance,student=1),col=5,ylab="P(Default|X)",pch=19)
> points(balance,PrDefault(balance,student=0),col=3,pch=20)
> legend(250,.8,c("Student","Non-student"),col=c(3,5),pch=c(19,20))

> par(mfrow=c(1,2))
> boxplot(split(balance,student),col=c(3:5))
> boxplot(split(income,student),col=c(3:5))
> def.glm3 = glm(default~.^2,data=Default,family="binomial")
> summary(def.glm3)

Call:
glm(formula = default ~ .^2, family = "binomial", data = Default)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-2.485  -0.142  -0.055  -0.020   3.758

Coefficients:
                     Estimate  Std. Error  z value  Pr(>|z|)    
(Intercept)         -1.10e+01    1.87e+00    -5.91   3.3e-09 ***
studentYes          -5.20e-01    1.34e+00    -0.39      0.70    
balance              5.88e-03    1.18e-03     4.98   6.3e-07 ***
income               4.05e-06    4.46e-05     0.09      0.93    
studentYes:balance  -2.55e-04    7.90e-04    -0.32      0.75    
studentYes:income    1.45e-05    2.78e-05     0.52      0.60    
balance:income      -1.58e-09    2.82e-08    -0.06      0.96    
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 2920.6 on 9999 degrees of freedom
Residual deviance: 1571.1 on 9993 degrees of freedom
AIC: 1585
Number of Fisher Scoring iterations: 8
> def.step = step(def.glm3)
Start:  AIC=1585
default ~ (student + balance + income)^2

                   Df  Deviance   AIC
- balance:income    1      1571  1583
- student:balance   1      1571  1583
- student:income    1      1571  1583
<none>                     1571  1585

Step:  AIC=1583
default ~ student + balance + income + student:balance + student:income

                   Df  Deviance   AIC
- student:balance   1      1571  1581
- student:income    1      1571  1581
<none>                     1571  1583

Step:  AIC=1581
default ~ student + balance + income + student:income

                   Df  Deviance   AIC
- student:income    1      1572  1580
<none>                     1571  1581
- balance           1      2907  2915

Step:  AIC=1580
default ~ student + balance + income

                   Df  Deviance   AIC
- income            1      1572  1578
<none>                     1572  1580
- student           1      1579  1585
- balance           1      2907  2913

Step:  AIC=1578
default ~ student + balance

                   Df  Deviance   AIC
<none>                     1572  1578
- student           1      1596  1600
- balance           1      2909  2913
> summary(def.step)
Call:
glm(formula = default ~ student + balance, family = "binomial", data = Default)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-2.458  -0.142  -0.056  -0.020   3.743

Coefficients:
             Estimate  Std. Error  z value  Pr(>|z|)    
(Intercept) -1.07e+01    3.69e-01   -29.12   < 2e-16 ***
studentYes  -7.15e-01    1.48e-01    -4.85   1.3e-06 ***
balance      5.74e-03    2.32e-04    24.75   < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 2920.6 on 9999 degrees of freedom
Residual deviance: 1571.7 on 9997 degrees of freedom
AIC: 1578
Number of Fisher Scoring iterations: 8
The package ROCR contains functions to examine the classification performance of any model that returns predicted probabilities of class membership. Given the estimated probabilities, we first run the prediction function to compare the predicted probabilities to the actual class memberships for the training data. We can then compute various performance measures and plot them using the performance function. The ROC curve is obtained by running performance with the true positive rate (tpr) and false positive rate (fpr) as arguments; plotting the result gives the ROC curve. The area under the curve requires another call to performance with "auc" as the performance measure. This process is demonstrated below for our simple model for classifying credit card defaulters.
> library(ROCR)
> PrDefault = fitted(def.step)
> pred = prediction(PrDefault,default)
> perf = performance(pred,"tpr","fpr")
> plot(perf,main="ROC Curve for Credit Card Default")
> performance(pred,"auc")
AUC = .9495
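The AUC reported by ROCR has a direct interpretation: it is the probability that a randomly chosen defaulter receives a higher predicted probability than a randomly chosen non-defaulter (ties counting one half). A self-contained sketch of that computation on toy scores (not the Default data):

```r
# AUC as a concordance probability: P(score of a random positive >
# score of a random negative), with ties counting 1/2.
auc <- function(score, y) {
  pos <- score[y == 1]
  neg <- score[y == 0]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

# Toy example: one of the four positive-negative pairs is discordant.
auc(c(.8, .4, .2, .5), c(1, 1, 0, 0))   # 0.75
```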
Example 8.2 - Classification of Real vs. Forged Swiss Francs
> names(Swiss)
[1] "id"     "leng"   "left"   "right"  "bottom" "top"    "diagon" "genu"  
> Swiss = Swiss[,-1]
> attach(Swiss)
> pairs(Swiss[,-7],panel=function(x,y){
    points(x[genu==1],y[genu==1],pch="+",col="blue")
    points(x[genu==0],y[genu==0],pch="o",col="red")
  })
> pairs.image(Swiss[,-7],cont=T)
> pairs.persp(Swiss[,-7])
Two predictor model with only linear terms

> rb.glm = glm(genu~right+bottom,data=Swiss,family="binomial")
> right.seq = seq(min(right),max(right),length=100)
> bottom.seq = seq(min(bottom),max(bottom),length=100)
> rb.grid = expand.grid(right=right.seq,bottom=bottom.seq)
> PrGenu = predict(rb.glm,newdata=rb.grid,"response")
> plot(right,bottom,xlab="Right (mm)",ylab="Bottom (mm)",type="n")
> points(right,bottom,col=as.numeric(genu)+3,pch=17+as.numeric(genu))
> z = matrix(PrGenu,100,100)
> contour(right.seq,bottom.seq,z,add=T,levels=.5,lty=1,lwd=2)
Two predictor model with non-linear terms

> rb.glm = glm(genu~poly(right,2)+poly(bottom,2)+right:bottom,data=Swiss,family="binomial")
> summary(rb.glm)

Call:
glm(formula = genu ~ poly(right, 2) + poly(bottom, 2) + right:bottom, family = "binomial", data = Swiss)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1890  -0.0486   0.0024   0.1998   2.7819

Coefficients:
                   Estimate  Std. Error  z value  Pr(>|z|)    
(Intercept)        -1855.43     1370.41    -1.35   0.17576    
poly(right, 2)1     -106.44       59.52    -1.79   0.07373 .  
poly(right, 2)2        9.81        5.06     1.94   0.05230 .  
poly(bottom, 2)1   -4110.61     2973.80    -1.38   0.16689    
poly(bottom, 2)2     -43.92       13.15    -3.34   0.00084 ***
right:bottom           1.51        1.12     1.35   0.17637    
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 277.259 on 199 degrees of freedom
Residual deviance: 76.157 on 194 degrees of freedom
AIC: 88.16

Number of Fisher Scoring iterations: 8

> PrGenu = fitted(rb.glm,type="response")
> table(PrGenu>.5,genu)
        genu
           0   1
  FALSE   90   8
  TRUE    10  92
> PrGenu = predict(rb.glm,newdata=rb.grid,"response")
> plot(right,bottom,xlab="Right (mm)",ylab="Bottom (mm)",type="n")
> points(right,bottom,col=as.numeric(genu)+3,pch=17+as.numeric(genu))
> z = matrix(PrGenu,100,100)
> contour(right.seq,bottom.seq,z,add=T)    # adds contours of P(genu=1)
> plot(right,bottom,xlab="Right (mm)",ylab="Bottom (mm)",type="n")
> points(right,bottom,col=as.numeric(genu)+3,pch=17+as.numeric(genu))
Add the decision boundary for the rule: classify as a real Swiss franc when P(genu=1) > .50.

> contour(right.seq,bottom.seq,z,add=T,levels=0.5,lty=1,lwd=2)
Building a logistic model using all available bill dimensions
> swiss.glm = glm(genu~.,data=Swiss,family="binomial")
Warning messages:
1: glm.fit: algorithm did not converge 
2: glm.fit: fitted probabilities numerically 0 or 1 occurred 
Logistic regression becomes unstable when the estimated probabilities are near 0 and/or 1, which is precisely what happens for these data. Despite this instability, the model produces a nearly perfect classification of the Swiss francs in the training data, with an overall misclassification rate of .015, or 1.5%.
> table(PrGenu>.5,genu)
        genu
           0   1
  FALSE   99   2
  TRUE     1  98
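The warnings above are the telltale sign of (near) complete separation: some linear combination of the predictors splits the two classes almost perfectly, so the maximum likelihood estimates diverge. A minimal toy illustration of the same phenomenon (not the Swiss franc data):

```r
# Complete separation: x perfectly predicts y, so the slope estimate
# diverges and glm() warns that fitted probabilities numerically 0 or 1
# occurred (suppressed here).
x <- 1:6
y <- c(0, 0, 0, 1, 1, 1)
fit <- suppressWarnings(glm(y ~ x, family = "binomial"))
coef(fit)["x"]                # a huge slope estimate
unname(range(fitted(fit)))    # fitted probabilities essentially 0 and 1
```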
In cases where this instability occurs, both ridge and Lasso logistic regression are good options. They are also useful for "wide data" problems where n < p, when p is large, or when some predictors are highly correlated. The regularized logistic models using the ridge and Lasso penalties are given below.
Both methods use the usual logistic model

    ln( θ(x) / (1 − θ(x)) ) = η0 + η1u1 + ⋯ + ηkuk

but estimate the coefficients by adding a penalty to the negative log-likelihood:

Ridge logistic regression: minimize  −log L(η) + λ Σ(j=1 to k) ηj²

Lasso logistic regression: minimize  −log L(η) + λ Σ(j=1 to k) |ηj|
We now consider fitting both a ridge and Lasso logistic regression to the Swiss Franc data.
library(glmnet)   # needed for glmnet() and cv.glmnet()
X = model.matrix(genu~.,data=Swiss[,-1])[,-1]
y = Swiss$genu
forg.ridge = glmnet(X,y,alpha=0,family="binomial")
forg.lasso = glmnet(X,y,alpha=1,family="binomial")
ridge.cv = cv.glmnet(X,y,alpha=0,family="binomial")
lasso.cv = cv.glmnet(X,y,alpha=1,family="binomial")
plot(ridge.cv)
ridge.lam = ridge.cv$lambda.min
ridge.lam
[1] 0.04476108
plot(lasso.cv)
lasso.lam = lasso.cv$lambda.min
lasso.lam
[1] 0.001849533
ypred.ridge = predict(forg.ridge,newx=X,s=ridge.lam,type="response")
ypred.lasso = predict(forg.lasso,newx=X,s=lasso.lam,type="response")
table(ypred.ridge>.5,y)
        y
           0    1
  FALSE  100    1
  TRUE     0   99

table(ypred.lasso>.5,y)
        y
           0    1
  FALSE  100    0
  TRUE     0  100
Cross-validation of a Classification from GLM Models (non-regularized or regularized, i.e. ridge and Lasso)
log.cv = function (fit, B=50, p=.67, pcut=0.5) {
  cv <- rep(0, B)
  data = fit$data
  y = fit$y
  n = dim(data)[1]
  k = floor(n*p)
  for (i in 1:B) {
    sam <- sample(1:n, k, replace=F)
    fit2 <- glm(formula(fit), data=data[sam,], family="binomial")
    phat <- predict(fit2, newdata=data[-sam,], type="response")
    predclass <- phat > pcut
    tab <- table(predclass, y[-sam])
    mc <- (n-k) - sum(diag(tab))
    cv[i] <- mc/(n-k)
  }
  cv
}
It should not be hard to modify this code to handle the ridge and Lasso glmnet() models as well.
glmnetlog.cv = function (X, y, s=.10, alpha=0, B=50, p=.67, pcut=0.5) {
  cv <- rep(0, B)
  n = length(y)
  k = floor(n*p)
  for (i in 1:B) {
    sam <- sample(1:n, k, replace=F)
    fit2 <- glmnet(X[sam,], y[sam], alpha=alpha, family="binomial")
    phat <- predict(fit2, newx=X[-sam,], type="response", s=s)
    predclass <- phat > pcut
    tab <- table(predclass, y[-sam])
    mc <- (n-k) - sum(diag(tab))
    cv[i] <- mc/(n-k)
  }
  cv
}
Recall,
alpha = 0   RIDGE LOGISTIC REGRESSION
alpha = 1   LASSO LOGISTIC REGRESSION
and s will be the optimal value of the penalty parameter λ that you found from running the cv.glmnet() function.
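As a self-contained sketch of the same repeated-split idea implemented by log.cv() (run here on simulated toy data, not the Default or Swiss franc fits):

```r
# Repeated 2/3 - 1/3 splits, mirroring log.cv(): refit on the training
# split, classify the holdout at pcut = 0.5, and record the error rate.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(2 * x))   # true model: logit = 2x
dat <- data.frame(x = x, y = y)
cv <- replicate(20, {
  sam <- sample(n, floor(0.67 * n))
  fit <- glm(y ~ x, data = dat[sam, ], family = "binomial")
  phat <- predict(fit, newdata = dat[-sam, ], type = "response")
  mean((phat > 0.5) != dat$y[-sam])
})
mean(cv)   # average holdout misclassification rate
```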