Post on 23-Dec-2015
transcript
Logistic regression for binary response variables
Space shuttle example
• n = 24 space shuttle launches prior to Challenger disaster on January 27, 1986
• Response y is an indicator variable– y = 1 if O-ring failures during launch– y = 0 if no O-ring failures during launch
• Predictor x1 is launch temperature, in degrees Fahrenheit
Space shuttle example
80706050
1.0
0.5
0.0
Temperature
Fai
lure
Yes
No
Plot of Failure versus Temperature
A model
The mean of a binary response
If there are 20% smokers and 80% non-smokers, and Yi = 1, if smoker and 0, if non-smoker, then:
120.0)80.0)(0()20.0)(1( ii YPYE
If pi = P (Yi = 1) and 1 – pi = P (Yi = 0), then:
1)1)(0())(1( iiiii YPpppYE
A linear regression model for a binary response
iii xY 10 for Yi = 0, 1
If the simple linear regression model is:
iii xYPYE 101
Then, the mean response …
… is the probability that Yi = 1 when the level of the predictor variable is xi.
Space shuttle example
80706050
1.0
0.5
0.0
Temperature
Pro
babi
lity
of
Fai
lure
S = 0.414476 R-Sq = 23.8 % R-Sq(adj) = 20.3 %
failure = 2.43729 - 0.0306883 temp
Regression Plot
(Simple) logistic regression function
i
iii x
xYPp
exp1
exp1
15010050
1.0
0.5
0.0
X
E(Y
) =
p
i
iii x
xYPp
1.010exp1
1.010exp1
i
iii x
xYPp
1.010exp1
1.010exp1
15010050
1.0
0.5
0.0
X
E(Y
) =
p
Space shuttle example
80706050
1.0
0.5
0.0
Temperature
Pro
babi
lity
of
failu
re
i
iii x
xYPp
17.08.10exp1
17.08.10exp1ˆˆ
Alternative formulation of (simple) logistic regression function
i
iii x
xYPp
exp1
exp1
ii
i xp
p
1
ln
(algebra)
“logit”
Space shuttle example
80706050
2
1
0
-1
-2
-3
Temperature
Logi
t
ii
i xp
p17.08.10
ˆ1
ˆln
Interpretation of slope coefficients
Odds
If there are 20% smokers and 80% non-smokers:
420.0
80.0Odds
“Odds are 4 to 1” … 4 non-smokers to 1 smoker.
and 39.14lnln Odds
If pi = P (Yi = 1) and 1 – pi = P (Yi = 0), then:
i
i
p
pOdds
1and
i
i
p
pOdds
1lnln
Odds ratioMALE: 20% smokers and 80% non-smokers:
420.0
80.0MOdds
FEMALE: 40% smokers and 60% non-smokers:
5.140.0
60.0FOdds
67.25.1
4OR
The odds that a male is a nonsmoker is 2.67 times the odds that a female is a nonsmoker.
Odds ratio
1|
1|1 1 i
i
p
pOdds
Group 1 Group 2
2|
2|2 1 i
i
p
pOdds
The odds ratio
2|
2|
1|
1|
2
1
11 i
i
i
i
p
p
p
p
Odds
OddsOR
Space shuttle example
ii
i xp
p17.08.10exp
ˆ1
ˆ
Predicted odds:
Predicted odds at x1 = 55 degrees:
263.45517.08.10expˆ1
ˆ
55|
55| i
i
p
p
Predicted odds at x1 = 80 degrees:
06081.08017.08.10expˆ1
ˆ
80|
80| i
i
p
p
Space shuttle example
76
06081.0
263.4
8017.08.10exp
5517.08.10expˆ1
ˆ
ˆ1
ˆ
80|
80|
55|
55|
i
i
i
i
p
p
p
p
Predicted odds ratio for x1 = 55 relative to x1 = 80:
The odds of O-ring failure at 55 degrees Fahrenheit is 76 times the odds of O-ring failure at 80 degrees Fahrenheit!
Interpretation of slope coefficients
The ratio of the odds at X1 = A relative to the odds at X1 = B (for fixed values of other X’s) is:
BAB
A
Odds
Odds
B
A 11
1 expexp
exp
Estimation of logistic regression coefficients
Maximum likelihood estimation
• Choose as estimates of the parameters the values that assign the highest probability to (“maximize likelihood of”) the observed outcome.
Suppose i
iii x
xYPp
15.010exp1
15.010exp1
For first observation, Y1 = 1 and x1 = 53:
886.0
)53(15.010exp1
)53(15.010exp11
YP
… for second observation, Y2 = 1 and x2 = 56:
832.0
)56(15.010exp1
)56(15.010exp12
YP
… and for 24th observation, Y24 = 0 and x24 = 81:
896.0
)81(15.010exp1
)81(15.010exp1024
YP
If α = 10 and β = -0.15, what is the probability of observed outcome?
24.12896.0ln...832.0ln886.0ln
0...11ln
0,...,1,1ln
2421
2421
YPYPYP
YYYP
The log likelihood of the observed outcome is:
The likelihood of the observed outcome is:
6
2421
2421
1082.4896.0...832.0886.0
0...11
0,...,1,1
YPYPYP
YYYP
Maximum likelihood estimation
• Choose as estimates of the parameters the values that assign the highest probability to (“maximize likelihood of”) the observed outcome.
Suppose i
iii x
xYPp
17.08.10exp1
17.08.10exp1
For first observation, Y1 = 1 and x1 = 53:
857.0
)53(17.08.10exp1
)53(17.08.10exp11
YP
… for second observation, Y2 = 1 and x2 = 56:
782.0
)56(17.08.10exp1
)56(17.08.10exp12
YP
… and for 24th observation, Y24 = 0 and x24 = 81:
951.0
)81(17.08.10exp1
)81(17.08.10exp1024
YP
If α = 10.8 and β = -0.17, what is the probability of observed outcome?
52.11951.0ln...782.0ln857.0ln
0...11ln
0,...,1,1ln
2421
2421
YPYPYP
YYYP
The log likelihood of the observed outcome is:
The likelihood of the observed outcome is:
6
2421
2421
1097.9951.0...782.0857.0
0...11
0,...,1,1
YPYPYP
YYYP
Space shuttle example
Link Function: Logit
Response Information
Variable Value Countfailure 1 7 (Event) 0 17 Total 24
Logistic Regression Table Odds 95% CIPredictor Coef SE Coef Z P Ratio Lower UpperConstant 10.875 5.703 1.91 0.057temp -0.17132 0.08344 -2.05 0.040 0.84 0.72 0.99
Properties of MLEs
• If a model is correct and the sample size is large enough:– MLEs are essentially unbiased.– Formulas exist for estimating the standard
errors of the estimators.– The estimators are about as precise as any
nearly unbiased estimators.– MLEs are approximately normally distributed.
Test and confidence intervals for single coefficients
Inference for βj
Test statistic: jjj
seZ
ˆ
ˆ follows approximate standard
normal distribution.
Confidence interval: jj sez ˆˆ
2/
Space shuttle example
Link Function: Logit
Response Information
Variable Value Countfailure 1 7 (Event) 0 17 Total 24
Logistic Regression Table Odds 95% CIPredictor Coef SE Coef Z P Ratio Lower UpperConstant 10.875 5.703 1.91 0.057temp -0.17132 0.08344 -2.05 0.040 0.84 0.72 0.99
Space shuttle example
• There is sufficient evidence, at the α = 0.05 level, to conclude that temperature is related to the probability of O-ring failure.
• For every 1-degree increase in temperature, the odds ratio of O-ring failure to O-ring non-failure is estimated to be 0.84 (95% CI is 0.72 to 0.99).
Survival in the Donner Party
• In 1846, Donner and Reed families traveled from Illinois to California by covered wagon.
• Group became stranded in eastern Sierra Nevada mountains when hit by heavy snow.
• 40 of 87 members died from famine and exposure.
• Are females better able to withstand harsh conditions than are males?
Survival in the Donner Party
655545352515
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
Age
Pro
babi
lity
of
surv
ival
Female
Male
Survival in the Donner Party
Link Function: Logit
Response Information
Variable Value CountSTATUS SURVIVED 20 (Event) DIED 25 Total 45
Logistic Regression Table Odds 95% CIPredictor Coef SE Coef Z P Ratio Lower UpperConstant 1.633 1.110 1.47 0.141AGE -0.07820 0.03729 -2.10 0.036 0.92 0.86 0.99Gender 1.5973 0.7555 2.11 0.034 4.94 1.12 21.72