
Psy524 Lecture 18 Logistic

Date posted: 06-Apr-2018
Uploaded by: romi-aggarwal

Transcript
  • 8/3/2019 Psy524 Lecture 18 Logistic

    1/37

    Logistic Regression

    Psy 524

    Ainsworth


What is Logistic Regression?

Form of regression that allows the prediction of discrete variables by a mix of continuous and discrete predictors. Addresses the same questions that discriminant function analysis and multiple regression do, but with no distributional assumptions on the predictors (the predictors do not have to be normally distributed, linearly related, or have equal variance in each group).


What is Logistic Regression?

Logistic regression is often used because the relationship between the DV (a discrete variable) and a predictor is non-linear.

Example from the text: the probability of heart disease changes very little with a ten-point difference among people with low blood pressure, but a ten-point change can mean a drastic change in the probability of heart disease in people with high blood pressure.


Questions

Can the categories be correctly predicted given a set of predictors?

Usually once this is established, the predictors are manipulated to see if the equation can be simplified.

Can the solution generalize to predicting new cases?

Comparison of the equation with predictors plus intercept to a model with just the intercept.


Questions

What is the relative importance of each predictor?

How does each variable affect the outcome?

Does a predictor make the solution better or worse, or have no effect?


Questions

Are there interactions among predictors?

Does adding interactions among predictors (continuous or categorical) improve the model?

Continuous predictors should be centered before interactions are made, in order to avoid multicollinearity.

Can parameters be accurately predicted?

How good is the model at classifying cases for which the outcome is known?


Questions

What is the prediction equation in the presence of covariates?

Can prediction models be tested for relative fit to the data?

So-called goodness-of-fit statistics.

What is the strength of association between the outcome variable and a set of predictors?

Often in model comparison you want non-significant differences, so strength of association is reported even for non-significant effects.


    Assumptions

The only real limitation on logistic regression is that the outcome must be discrete.


Assumptions

If the distributional assumptions are met, then discriminant function analysis may be more powerful, although it has been shown to overestimate the association when using discrete predictors.

If the outcome is continuous, then multiple regression is more powerful, given that its assumptions are met.


Assumptions

Ratio of cases to variables: using discrete variables requires that there are enough responses in every given category.

If there are too many cells with no responses, parameter estimates and standard errors will likely blow up.

It can also make groups perfectly separable (e.g., multicollinear), which will make maximum likelihood estimation impossible.


    Assumptions

Linearity in the logit: the regression equation should have a linear relationship with the logit form of the DV. There is no assumption about the predictors being linearly related to each other.


Background

Odds: like probability. Odds are usually written as, e.g., 4 to 1 odds against, which is equivalent to 1 out of five, a .20 probability, a 20% chance, etc.

The problem with probabilities is that they are non-linear.

Going from .10 to .20 doubles the probability, but going from .80 to .90 barely increases the probability.


    Background

Odds ratio: the ratio of the probability over 1 minus the probability; the probability of winning over the probability of losing. 4 to 1 odds against equates to an odds ratio of .20/.80 = .25.
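As a sanity check, the odds arithmetic above can be reproduced in a few lines of Python (the helper name `odds` is mine, not from the lecture):

```python
def odds(p):
    """Odds of success: probability of winning over probability of losing."""
    return p / (1 - p)

# A .20 probability of success gives odds of .20/.80 = .25
print(odds(0.20))
```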


    Background

Logit: the natural log of an odds ratio; often called a log odds, even though it really is a log odds ratio. The logit scale is linear and functions much like a z-score scale.


    Background

LOGITS ARE CONTINUOUS, LIKE Z-SCORES:

p = 0.50, then logit = 0

p = 0.70, then logit = 0.84

p = 0.30, then logit = -0.84
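These logit values can be reproduced directly; a minimal sketch (the function name `logit` is mine), noting that ln(0.70/0.30) ≈ 0.847, which the slide rounds to 0.84:

```python
import math

def logit(p):
    """Natural log of the odds p/(1 - p)."""
    return math.log(p / (1 - p))

print(logit(0.50))  # 0.0
print(logit(0.70))  # about 0.847
print(logit(0.30))  # about -0.847
```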


Plain old regression

Y = a binary response (DV)

1 = positive response (success), with probability P

0 = negative response (failure), with probability Q = (1 - P)

MEAN(Y) = P, the observed proportion of successes

VAR(Y) = PQ, maximized when P = .50; the variance depends on the mean (P)

Xj = any type of predictor: continuous, dichotomous, polytomous
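The mean-variance relationship on this slide is easy to verify numerically (a minimal sketch; the helper name `var_y` is mine):

```python
def var_y(p):
    """Variance of a binary Y with success probability p: PQ = p * (1 - p)."""
    return p * (1 - p)

# The variance depends on the mean and is maximized at p = .50
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, var_y(p))
```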


    Plain old regression

Y|X = b0 + b1X1 + ε

and it is assumed that the errors (ε) are normally distributed, with mean = 0 and constant variance (i.e., homogeneity of variance).


    Plain old regression

An expected value is a mean, so

E(Y|X) = b0 + b1X1

The predicted value equals the proportion of observations for which Y|X = 1; P is the probability of Y = 1 (a success) given X, and Q = 1 - P (a failure) given X:

π = E(Y|X) = P(Y = 1 | X)


    Plain old regression

For any value of X, only two errors (ε) are possible: 1 - π and 0 - π. These occur at rates P|X and Q|X, and with variance (P|X)(Q|X).


    Plain old regression

Every respondent is given a probability of success and failure, which leads to every person having drastically different variances (because they depend on the mean in discrete cases), causing a violation of the homoskedasticity assumption.


    Plain old regression

Long story short: you can't use regular old regression when you have discrete outcomes, because you don't meet homoskedasticity.


An alternative: the ogive function

An ogive function is a curved, s-shaped function, and the most common is the logistic function, which looks like:


    The logistic function


    The logistic function

Ŷᵢ = e^u / (1 + e^u)

where Ŷᵢ is the estimated probability that the ith case is in a category, and u is the regular linear regression equation:

u = A + b1X1 + b2X2 + … + bkXk


    The logistic function

πᵢ = e^(b0 + b1X1) / (1 + e^(b0 + b1X1))
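The logistic function above can be sketched as a short Python helper; the coefficient values below are illustrative, not from the lecture:

```python
import math

def logistic(u):
    """Logistic function: maps the linear predictor u to a probability in (0, 1)."""
    return math.exp(u) / (1 + math.exp(u))

# u is the regular linear regression equation, u = A + b1*X1 (one predictor here)
A, b1, X1 = -4.0, 0.05, 80
print(logistic(A + b1 * X1))  # u = 0, so the estimated probability is 0.5
```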


The logistic function

Change in probability is not constant (linear) with constant changes in X. This means that the probability of a success (Y = 1) given the predictor variable (X) is a non-linear function, specifically a logistic function.


The logistic function

It is not obvious how the regression coefficients for X are related to changes in the dependent variable (Y) when the model is written this way.

The change in Y (in probability units) given X depends on the value of X: look at the S-shaped function.


The logistic function

The values in the regression equation, b0 and b1, take on slightly different meanings:

b0: the regression constant (moves the curve left and right)

b1: the regression coefficient (changes the steepness of the curve)


Logistic Function

Constant regression constant, different slopes:

v2: b0 = -4.00, b1 = 0.05 (middle)

v3: b0 = -4.00, b1 = 0.15 (top)

v4: b0 = -4.00, b1 = 0.025 (bottom)

[figure: three logistic curves (V2, V3, V4) plotted against V1, x from 30 to 100, y from 0.0 to 1.0]


Logistic Function

Constant slopes with different regression constants:

v2: b0 = -3.00, b1 = 0.05 (top)

v3: b0 = -4.00, b1 = 0.05 (middle)

v4: b0 = -5.00, b1 = 0.05 (bottom)

[figure: three logistic curves (V2, V3, V4) plotted against V1, x from 30 to 100, y from 0.0 to 1.0]
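The effect of b0 and b1 described on these slides can be checked numerically; a minimal sketch using the slide's v2-v4 values (the helper name `p_hat` is mine):

```python
import math

def p_hat(b0, b1, x):
    """Estimated probability from a one-predictor logistic model."""
    u = b0 + b1 * x
    return math.exp(u) / (1 + math.exp(u))

# Constant slope (b1 = 0.05), different constants: at x = 80 the curve
# with the largest constant sits on top, matching the figure's ordering.
for b0 in (-3.0, -4.0, -5.0):
    print(b0, round(p_hat(b0, 0.05, 80), 3))
```

A larger b1 likewise steepens the curve: with b0 fixed at -4.00, `p_hat(-4.0, 0.15, x)` climbs toward 1 much faster than `p_hat(-4.0, 0.025, x)`.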


The Logit

By algebraic manipulation, the logistic regression equation can be written in terms of an odds ratio for success:

P(Y = 1|X) / (1 - P(Y = 1|X)) = πᵢ / (1 - πᵢ) = exp(b0 + b1X1)


The Logit

Odds ratios range from 0 to positive infinity. P/Q is an odds ratio: less than 1 means less than a .50 probability, greater than 1 means greater than a .50 probability.


The Logit

Finally, taking the natural log of both sides, we can write the equation in terms of logits (log-odds). For a single predictor:

ln[ P(Y = 1|X) / (1 - P(Y = 1|X)) ] = ln[ π / (1 - π) ] = b0 + b1X1


    The Logit

For multiple predictors:

ln[ π / (1 - π) ] = b0 + b1X1 + b2X2 + … + bkXk


The Logit

Log-odds are a linear function of the predictors, and the regression coefficients go back to their old interpretation (kind of):

b0: the expected value of the logit (log-odds) when X = 0

b1: called a logit difference; the amount the logit (log-odds) changes with a one-unit change in X, i.e., in going from X to X + 1
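The logit-difference interpretation can be demonstrated directly: wherever you are on X, moving from X to X + 1 changes the log-odds by exactly b1. A minimal sketch (coefficient values are illustrative):

```python
def log_odds(b0, b1, x):
    """The logit is a linear function of the predictor."""
    return b0 + b1 * x

b0, b1 = -4.0, 0.05
# The logit difference for a one-unit change in X equals b1, at any X
for x in (30, 60, 90):
    print(log_odds(b0, b1, x + 1) - log_odds(b0, b1, x))
```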


Conversion

exp(logit) = e^logit = odds ratio

Probability = odds ratio / (1 + odds ratio)
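The two conversion steps can be sketched as follows, taking p = .70 as the example (the variable names are mine):

```python
import math

log_odds = math.log(0.70 / 0.30)       # the logit for p = .70
odds_ratio = math.exp(log_odds)        # exp(logit) recovers the odds ratio
p = odds_ratio / (1 + odds_ratio)      # probability = odds ratio / (1 + odds ratio)
print(odds_ratio, p)                   # odds ratio is about 2.33, p is 0.70 again
```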

