
061220ppt1 - Logistic Regression

Date post: 18-May-2017
An Introduction to Logistic Regression Analysis and Reporting

Chao-Ying Joanne Peng

Indiana University-Bloomington

Three purposes of this session:

1. Introduce you to the basic concepts of logistic regression
– LR constitutes a special class of regression methods for research utilizing dichotomous outcomes.

2. Provide you with a set of guidelines on what to expect in an article using logistic regression techniques
– What tables, figures, or charts should be included to comprehensively assess the results?
– And, what assumptions should be verified?

3. Offer recommendations for
– appropriate reporting formats of logistic regression results, and
– the minimum observation-to-predictor ratio.

• Many research problems in education call for the analysis and prediction of a dichotomous outcome, for example, whether a child should be classified as learning disabled (LD), or whether a teenager is prone to engage in risky behaviors.

• Traditionally, these research questions were addressed by either ordinary least squares (OLS) regression or linear discriminant function analysis.

• Both techniques were subsequently found to be less than ideal in handling dichotomous outcomes because of their strict statistical assumptions, i.e., linearity, normality, and continuity for OLS regression, and multivariate normality with equal variances and covariances for discriminant analysis.

• As an alternative, logistic regression was proposed in the late 1960s and early 1970s (Cabrera, 1994). It became routinely available in statistical packages in the early 1980s.

• With the wide availability of sophisticated statistical software installed on high-speed computers, the use of logistic regression is increasing.


Logistic Regression Models

• The central mathematical concept that underlies logistic regression is the logit, the natural logarithm of the odds: ln(odds).

• The simplest example of a logit derives from a 2×2 contingency table. Consider an instance in which the distribution of a dichotomous outcome variable (a child from an inner city school recommended for remedial reading classes) is paired with a dichotomous predictor variable (gender).

Table 1. Sample Data for Gender and Recommendation for Remedial Reading

Remedial reading instruction                Gender
recommended                           Boys      Girls     Totals
Yes (coded as 1)                       73         15         88
No (coded as 0)                        23         11         34
Totals                                 96         26        122

• A chi-square test of these data yields χ²(df = 1) = 3.43. Alternatively, one might prefer to assess a boy’s odds of being recommended for remedial reading instruction, relative to a girl’s odds; the result is an odds ratio of 2.33:

\[ \text{odds ratio} = \frac{73/23}{15/11} = \frac{3.17}{1.36} = 2.33 \]

• Its natural logarithm [i.e., ln (2.33)] equals 0.85 which would be the regression coefficient of the gender predictor, if logistic regression were used to model the two outcomes of a remedial recommendation as it is related to gender.
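The Table 1 figures can be checked with a few lines of Python. This is a minimal sketch and not part of the original slides: the function and variable names are illustrative, and the chi-square is assumed to be the Pearson statistic without a continuity correction, which is the version that matches the reported 3.43.

```python
import math

# Cell counts from Table 1 (rows: recommended yes/no; columns: boys/girls).
yes_boys, yes_girls = 73, 15
no_boys, no_girls = 23, 11


def pearson_chi_square(a, b, c, d):
    """Pearson chi-square, without continuity correction, for [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


chi_square = pearson_chi_square(yes_boys, yes_girls, no_boys, no_girls)
odds_boys = yes_boys / no_boys        # 73/23
odds_girls = yes_girls / no_girls     # 15/11
odds_ratio = odds_boys / odds_girls
log_odds_ratio = math.log(odds_ratio)

print(f"chi-square = {chi_square:.2f}")
print(f"odds ratio = {odds_ratio:.2f}")
print(f"ln(odds ratio) = {log_odds_ratio:.3f}")
```

The unrounded log odds ratio comes out slightly below the 0.85 quoted on the slide, because the slide takes the logarithm of the rounded value 2.33.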

• The simple logistic model has the form:

\[ \operatorname{logit}(Y) = \text{natural log(odds)} = \ln\left(\frac{\pi}{1-\pi}\right) = \alpha + \beta X, \qquad (1) \]

where π is the probability of the outcome of interest, α is the intercept parameter, β is a regression coefficient (a.k.a. the slope parameter), and X is a predictor.

Figure 1. The relationship of a dichotomous outcome variable, Y (1=remedial reading recommended, 0=remedial reading not recommended) with a continuous predictor, READING scores.

• For the data in Table 1, the regression coefficient (β) is the logit (=0.85) previously explained. Taking the antilog of equation (1) on both sides, one derives an equation for the prediction of the probability of the occurrence of the outcome of interest as follows:

\[ \pi = P(Y = \text{outcome of interest} \mid X = x, \text{a specific value}) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}} \]
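The step from the logit to a probability can be illustrated with the Table 1 counts. The sketch below is an illustration rather than the presenter's own computation. It assumes gender is coded 1 = boys and 0 = girls (the coding shown in Table 3), and it derives α as the log-odds for girls, a value the slides do not report. The function names are hypothetical.

```python
import math


def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """e^z / (1 + e^z), written in the equivalent form 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


# Assumed coding: X = 1 for boys, X = 0 for girls.
alpha = math.log(15 / 11)               # log-odds for girls, derived from Table 1
beta = math.log((73 / 23) / (15 / 11))  # log odds ratio, about 0.85

p_girls = inverse_logit(alpha)          # predicted probability when X = 0
p_boys = inverse_logit(alpha + beta)    # predicted probability when X = 1

print(f"P(recommended | girl) = {p_girls:.4f}")
print(f"P(recommended | boy)  = {p_boys:.4f}")
```

With a single dichotomous predictor the model reproduces the observed proportions in each column of Table 1 (15 of 26 girls and 73 of 96 boys), which is a convenient check that the two functions are inverses of each other.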

• Extending the logic of the simple logistic regression to multiple predictors (say X1 = reading score and X2 = gender), one may construct a complex logistic regression for Y (recommendation for remedial reading programs) as follows:

\[ \operatorname{logit}(Y) = \ln\left(\frac{\pi}{1-\pi}\right) = \alpha + \beta_1 X_1 + \beta_2 X_2 . \]

• Therefore,

\[ \pi = P(Y = \text{outcome of interest} \mid X_1 = x_1, X_2 = x_2) = \frac{e^{\alpha + \beta_1 x_1 + \beta_2 x_2}}{1 + e^{\alpha + \beta_1 x_1 + \beta_2 x_2}} \]

Illustration of Logistic Regression Analysis and Reporting

• The hypothetical data consisted of 189 inner city school children’s reading scores and gender.

• Of these children, 59 (31.22%) were recommended for remedial reading classes while 130 (68.78%) were not. A legitimate research hypothesis posed to the data was: “the likelihood that an inner city school child is recommended for remedial reading instruction is related to both his/her reading score and gender.”

Table 2. Description of a Hypothetical Data Set for Logistic Regression

Remedial reading            Total            Gender              Reading scores
instruction recommended?    sample (N)   Boys (n1)  Girls (n2)    Mean      SD
Yes                             59           36         23        61.07    13.28
No                             130           57         73        66.65    15.86
Summary                        189           93         96        64.91    15.29

Logistic Regression Analysis

• The logistic regression analysis was carried out by the LOGISTIC REGRESSION command in SPSS® version 13 (SPSS Inc., 2004).

• Predicted logit of (REMEDIAL)=0.534 + (−0.026)×READING + (0.648)×GENDER

Evaluations of the Logistic Regression Model

a) overall model evaluation
b) goodness-of-fit statistics
c) statistical tests of individual predictors
d) validations of predicted probabilities

Overall Model Evaluation

Tests                                     χ²       df    p        OK?
Likelihood Ratio test                   10.019     2     0.007
Score test                               9.518     2     0.009

Goodness-of-fit Test

Test                                      χ²       df    p        OK?
Hosmer-Lemeshow Goodness-of-fit test     9.286     8     0.319

R²-type Indices

Cox and Snell R squared = .052
Nagelkerke (Max rescaled) R squared = .073
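The p values reported for the likelihood ratio, score, and Hosmer-Lemeshow tests can be recovered from the chi-square statistics and their degrees of freedom. The sketch below is an illustration and is not how SPSS computes them. It uses the closed-form upper-tail probability that exists when the degrees of freedom are even, which covers the df of 2 and 8 here. The function name is hypothetical.

```python
import math


def chi_square_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), valid for even df only.

    For df = 2m the upper tail equals exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive even df only")
    half = x / 2.0
    term = 1.0    # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k          # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


p_likelihood_ratio = chi_square_upper_tail_even_df(10.019, 2)
p_score = chi_square_upper_tail_even_df(9.518, 2)
p_hosmer_lemeshow = chi_square_upper_tail_even_df(9.286, 8)

print(f"Likelihood ratio test: p = {p_likelihood_ratio:.3f}")
print(f"Score test:            p = {p_score:.3f}")
print(f"Hosmer-Lemeshow test:  p = {p_hosmer_lemeshow:.3f}")
```

The two overall tests are significant, indicating that the model with READING and GENDER fits better than the intercept-only model, while the nonsignificant Hosmer-Lemeshow test gives no evidence of lack of fit.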

Table 3. Logistic Regression Analysis of 189 Children’s Referrals for Remedial Reading Programs by SPSS LOGISTIC REGRESSION command (version 12)

Predictor                    β         SE β     Wald’s χ² (df=1)    p       e^β (odds ratio)
CONSTANT                     0.534     0.811    0.434               .510    (not applicable)
READING                     −0.026     0.012    4.565               .033    0.974
GENDER (1=boys, 0=girls)     0.648     0.325    3.976               .046    1.911
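The columns of Table 3 are linked by simple formulas: Wald’s χ² is (β / SE)², its p value is the upper tail of a chi-square distribution with 1 degree of freedom, and the odds ratio is e^β. The following sketch recomputes them from the printed β and SE values. It is an illustration with hypothetical names, and because the printed inputs are rounded to three decimals, the results agree with the table only approximately (most visibly for READING).

```python
import math

# (beta, standard error) exactly as printed in Table 3.
table3 = {
    "CONSTANT": (0.534, 0.811),
    "READING": (-0.026, 0.012),
    "GENDER": (0.648, 0.325),
}


def wald_test(beta, se):
    """Return Wald's chi-square (df = 1) and its p value."""
    wald = (beta / se) ** 2
    # For df = 1 the chi-square upper tail is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value


results = {}
for name, (beta, se) in table3.items():
    wald, p_value = wald_test(beta, se)
    results[name] = (wald, p_value, math.exp(beta))
    print(f"{name:9s} Wald = {wald:.3f}  p = {p_value:.3f}  e^beta = {math.exp(beta):.3f}")
```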

Table 4. The Observed and the Predicted Frequencies for Remedial Reading Instructions by Logistic Regression with the Cutoff of 0.50

                          Predicted
Observed              Yes       No       Percentage Correct
Yes                     3       56         5.1%
No                      1      129        99.2%
Overall % correct                         69.8%

Note.
Sensitivity = 3/(3+56) = 5.1%
Specificity = 129/(1+129) = 99.2%
False Positive = 1/(1+3) = 25.0%
False Negative = 56/(56+129) = 30.3%
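The rates in the note to Table 4 follow directly from the four cell counts. A minimal sketch with illustrative variable names is given here. Note that the slide defines the false positive and false negative rates relative to the predicted totals (the column totals), and the code follows that definition.

```python
# Cell counts from Table 4 (cutoff = 0.50).
observed_yes_predicted_yes = 3
observed_yes_predicted_no = 56
observed_no_predicted_yes = 1
observed_no_predicted_no = 129

total = (observed_yes_predicted_yes + observed_yes_predicted_no
         + observed_no_predicted_yes + observed_no_predicted_no)

# Share of observed "Yes" cases that were predicted "Yes".
sensitivity = observed_yes_predicted_yes / (
    observed_yes_predicted_yes + observed_yes_predicted_no)

# Share of observed "No" cases that were predicted "No".
specificity = observed_no_predicted_no / (
    observed_no_predicted_yes + observed_no_predicted_no)

# Share of predicted "Yes" cases that were observed "No".
false_positive = observed_no_predicted_yes / (
    observed_no_predicted_yes + observed_yes_predicted_yes)

# Share of predicted "No" cases that were observed "Yes".
false_negative = observed_yes_predicted_no / (
    observed_yes_predicted_no + observed_no_predicted_no)

overall_correct = (observed_yes_predicted_yes + observed_no_predicted_no) / total

for label, value in [("Sensitivity", sensitivity), ("Specificity", specificity),
                     ("False positive", false_positive),
                     ("False negative", false_negative),
                     ("Overall correct", overall_correct)]:
    print(f"{label:16s} {100 * value:.1f}%")
```

The very low sensitivity shows that, at a cutoff of 0.50, the model almost never predicts a referral, so the high overall percentage correct mostly reflects the larger "No" group.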

[Figure 2 plot: vertical axis, estimated probability (0.0 to 0.6); horizontal axis, reading score (40 to 140); separate curves for boys and girls.]

Figure 2. Predicted probability of being referred for remedial reading instructions versus reading scores, plotting symbols A=1 observation, B=2 observations, C=3 observations, etc.

Reporting and Interpreting Logistic Regression Results

In addition to Tables 3, 4 and Figure 2, it is helpful to demonstrate the relationship between the predicted outcome and certain characteristics found in observations.

Table 5. Predicted Probability of Being Referred for Remedial Reading Instructions for 8 Children

Case      READING           GENDER          Intercept    Predicted probability of being       Actual outcome
Number    Beta = −0.026     Beta = 0.648    = 0.534      referred for remedial reading        1=Yes, 0=No
                                                         instructions
1          52.5             Boy             0.5340       0.4530                                1
2          85               Boy             0.5340       0.2618                                0
3          75               Girl            0.5340       0.1941                                1
4          92               Girl            0.5340       0.1250                                0
5          60               Boy             0.5340       0.4051                                --
6          60               Girl            0.5340       0.2627                                --
7         100               Boy             0.5340       0.1934                                --
8         100               Girl            0.5340       0.1115                                --
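Each predicted probability in Table 5 comes from plugging a child's reading score and gender into the fitted logit and converting the result to a probability. The sketch below does this for the four profiles without an observed outcome (cases 5 to 8). It is an illustration with hypothetical names, and it uses the coefficients as printed to three decimals, so the output matches the table only to about two decimal places.

```python
import math

# Coefficients as printed in Table 3 (rounded to three decimals).
INTERCEPT = 0.534
BETA_READING = -0.026
BETA_GENDER = 0.648   # gender coded 1 = boys, 0 = girls


def predicted_probability(reading, is_boy):
    """Predicted probability of referral for one reading score and gender."""
    gender = 1 if is_boy else 0
    predicted_logit = INTERCEPT + BETA_READING * reading + BETA_GENDER * gender
    return 1.0 / (1.0 + math.exp(-predicted_logit))


# (case number, reading score, is_boy) for cases 5 to 8 of Table 5.
profiles = [(5, 60, True), (6, 60, False), (7, 100, True), (8, 100, False)]

for case, reading, is_boy in profiles:
    label = "Boy" if is_boy else "Girl"
    probability = predicted_probability(reading, is_boy)
    print(f"case {case}: READING = {reading:3d}, {label:4s} -> {probability:.4f}")
```

At the same reading score the predicted probability is higher for boys than for girls, and for both genders it falls as the reading score rises.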

Interpretation of Regression Coefficients

• For each one-point increase in the reading score, the odds of being recommended for remedial reading programs are multiplied by 0.974 (= e^(−0.026), Table 3), a decrease of about 2.6%.

• If the increase in the reading score is 10 points, the odds are multiplied by 0.771 [= e^(10 × (−0.026))], a decrease of about 23%.

• With the READING score held constant, however, boys were predicted to be referred for remedial reading instructions with greater probability than girls.
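The multiplicative reading of the coefficients follows from the logit model itself, since the odds equal e raised to the predicted logit. The short derivation here is a sketch added for illustration. The symbol k, for the size of the change in the reading score, is introduced only for this purpose, and gender is held fixed so that its term cancels.

```latex
\[
\frac{\text{odds}(x + k)}{\text{odds}(x)}
  = \frac{e^{\alpha + \beta (x + k)}}{e^{\alpha + \beta x}}
  = e^{k\beta},
\qquad
e^{1 \times (-0.026)} \approx 0.974,
\qquad
e^{10 \times (-0.026)} \approx 0.771 .
\]
```

The same argument applied to GENDER, with READING held constant, gives an odds ratio of e^(0.648), or about 1.91, for boys relative to girls.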

Guidelines and Recommendations

What Tables, Figures, or Charts Should be Included to Comprehensively Assess the Result?

• the overall evaluation of the logistic model
• goodness-of-fit statistics
• statistical tests of individual predictors
• an assessment of the predicted probabilities

What Assumption Should be Verified?

• Logistic regression does not assume that predictor variables follow a multivariate normal distribution with an equal covariance matrix.

• It assumes that the binomial distribution describes the distribution of the errors, which equal the actual Y minus the predicted Y.

• The binomial distribution is also the assumed distribution for the conditional mean of the dichotomous outcome.

• The binomial assumption may be tested by the normal z test (Siegel & Castellan, 1988), or taken to be robust as long as the sample is random, so that observations are independent of each other.


Recommended Reporting Formats of Logistic Regression

• In terms of reporting logistic regression results, we recommend presenting a complete logistic regression model including
– the Y-intercept
– odds ratio
– a table such as Table 5 to illustrate the relationship between outcomes and observations with profiles of certain characteristics

Recommended Minimum Observation to Predictor Ratio

• The literature has not offered specific rules applicable to logistic regression.

• Several authors of multivariate statistics recommended a minimum ratio of 10 to 1, with a minimum sample size of 100 or 50 plus a variable number that is a function of the number of predictors.

Preview of Next Session

