LOGISTIC REGRESSION: CORONARY ARTERY DISEASE

Posted on 14-Jun-2015


1. MUHAMMAD SAFWAN BIN SUKERI (SES100346)

2. MUHAMAD HARIF BIN HARUN@IBRAHIM (SEU110016)

3. MUHAMMAD RIDZUAN HAKIM BIN MOHD MUSLEH (SEU110017)

4. WAN MOHAMAD FARHAN BIN AB RAHMAN (SEU110024)

Lecturer: Dr Adriana Irawati Nur binti Ibrahim

Date: 19th May 2013

LOGISTIC REGRESSION

• A type of regression analysis used to predict the outcome of a categorical dependent variable based on one or more predictor variables.

• Two types of logistic regression: a) binary logistic regression, b) multinomial logistic regression.

• “Logistic regression” usually refers specifically to the problem in which the dependent variable is binary.

• The purpose is to describe how many times greater the odds of an event are in one group compared with another (the odds ratio).

• Used extensively in the medical and social science fields.

The main measure used in logistic regression is the odds, P / (1 - P).

Taking the natural log of the odds makes the function linear in the predictors.

The general equation for logistic regression, rearranged from the log-odds to the odds and finally to the probability (not the odds):

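• $\ln\left(\frac{P}{1-P}\right) = a + \sum_{i=1}^{n} b_i X_i$

• $\frac{P}{1-P} = e^{a + \sum_{i=1}^{n} b_i X_i}$

• $P = \frac{e^{a + \sum_{i=1}^{n} b_i X_i}}{1 + e^{a + \sum_{i=1}^{n} b_i X_i}}$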
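The three forms can be checked numerically. The Python sketch below assumes placeholder values for the intercept `a` and the coefficients `b_i` (they are not estimates from this study; the function names are illustrative) and confirms that the log-odds, odds and probability forms describe the same model.

```python
import math


def log_odds(a, b, x):
    """Linear predictor: ln(P / (1 - P)) = a + sum(b_i * X_i)."""
    return a + sum(bi * xi for bi, xi in zip(b, x))


def odds(a, b, x):
    """Odds: P / (1 - P) = exp(a + sum(b_i * X_i))."""
    return math.exp(log_odds(a, b, x))


def probability(a, b, x):
    """Probability: P = exp(z) / (1 + exp(z)), with z = a + sum(b_i * X_i)."""
    o = odds(a, b, x)
    return o / (1.0 + o)


# Placeholder coefficients for illustration only (not from the case study).
a, b, x = -1.0, [0.5, 0.25], [1, 2]
p = probability(a, b, x)
# Taking ln(P / (1 - P)) of the probability recovers the linear predictor.
print(p, math.log(p / (1 - p)), log_odds(a, b, x))
```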

Why use logistic regression and not ordinary linear regression?

LINEAR REGRESSION

• The dependent variable is continuous.

• The independent variables are typically continuous.

• Predicted values are unbounded, so if they are read as probabilities they can be greater than one or less than zero.

• The aim is to study the relationship between the independent and dependent variables.

LOGISTIC REGRESSION

• The dependent variable is categorical.

• The independent variables can be categorical, continuous or both.

• Predicted probabilities always lie between zero and one.

• The aim is to study whether or not the event occurs.
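To illustrate the third point in each list, the sketch below passes an arbitrary, made-up linear predictor 0.2 + 0.1x (not estimated from any data; the names are illustrative) through both models. Read directly as a probability, as a linear model would, it leaves the interval [0, 1] for large or small x, whereas the logistic transform 1 / (1 + e^(-z)) keeps every prediction strictly between 0 and 1.

```python
import math


def linear_prediction(x, intercept=0.2, slope=0.1):
    """Straight-line 'probability', as ordinary linear regression would give."""
    return intercept + slope * x


def logistic_prediction(x, intercept=0.2, slope=0.1):
    """Same linear predictor passed through the logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


for x in (-10, 0, 10, 20):
    print(x, round(linear_prediction(x), 3), round(logistic_prediction(x), 3))
```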

Background of case study: binary logistic regression

Case study: Diagnosing coronary artery disease

The methods used:

• cardiac catheterisation;

• an exercise thallium test, which uses the radioisotope thallium while the patient is made to exercise.

A sample of 100 people (55 men and 45 women) had both the exercise thallium test and cardiac catheterisation.

Objective:

To determine which variables were good predictors of a positive thallium test.

Outcome variable:

Thallium (1 = positive scan; 0 = negative scan)

Explanatory variables:

• Use of propranolol prior to the test (0 = No; 1 = Yes)

• X1: Maximum heart rate during exercise (0 = heart rate did not rise to 85% of the maximum predicted rate; 1 = heart rate exceeded 85% of the predicted rate)

• X2: Ischaemia during exercise (1 = occurred; 0 = did not occur)

• X3: Sex (0 = male; 1 = female)

• Chest pain during exercise (0 = no pain; 1 = moderate pain; 2 = severe pain)

Extract of the data (14 of the 100 patients):

THALLIUM  PROPRANOLOL  MAX HEART RATE (X1)  ISCHAEMIA (X2)  SEX (X3)  CHEST PAIN
1         0            0                    0               1         0
1         0            1                    0               1         0
1         1            0                    1               1         2
1         0            1                    0               1         0
1         0            0                    0               1         0
0         0            1                    0               1         1
0         1            1                    1               0         0
0         0            1                    0               1         2
1         1            0                    0               0         0
1         0            0                    0               1         0
1         0            0                    0               0         0
0         0            0                    0               1         0
1         0            0                    0               0         0
0         0            0                    0               0         1

Logistic Regression

Perform a logistic regression analysis by fitting the model:

$\ln\left\{\frac{P(\text{event})}{P(\text{no event})}\right\} = \ln\left\{\exp(\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k)\right\} = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$
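The model above is fitted by maximum likelihood. As a minimal sketch of that idea, the pure-Python code below applies Newton-Raphson, a standard way to maximise a logistic log-likelihood (it does not claim to reproduce SPSS's exact internals), to the 14-row data extract shown above, using only X1, X2 and X3. Because it sees only 14 of the 100 patients, its estimates will not match the SPSS coefficients quoted later. All names (`ROWS`, `fit_logistic`, `score`, `information`, `solve`) are hypothetical.

```python
import math

# The 14 rows of the data extract above, as (thallium, X1, X2, X3).
# Propranolol and chest pain are left out because the model uses X1-X3 only.
ROWS = [
    (1, 0, 0, 1), (1, 1, 0, 1), (1, 0, 1, 1), (1, 1, 0, 1), (1, 0, 0, 1),
    (0, 1, 0, 1), (0, 1, 1, 0), (0, 1, 0, 1), (1, 0, 0, 0), (1, 0, 0, 1),
    (1, 0, 0, 0), (0, 0, 0, 1), (1, 0, 0, 0), (0, 0, 0, 0),
]


def predict(beta, x):
    """P(thallium = 1 | x) with beta = [alpha, beta1, beta2, beta3]."""
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def score(beta, rows):
    """Gradient of the log-likelihood: sum over rows of (y - p) * (1, x1, x2, x3)."""
    g = [0.0] * len(beta)
    for y, *x in rows:
        resid = y - predict(beta, x)
        for i, v in enumerate([1.0] + x):
            g[i] += resid * v
    return g


def information(beta, rows):
    """Information matrix X'WX with weights w = p * (1 - p)."""
    k = len(beta)
    h = [[0.0] * k for _ in range(k)]
    for _, *x in rows:
        p = predict(beta, x)
        xs = [1.0] + x
        for i in range(k):
            for j in range(k):
                h[i][j] += p * (1.0 - p) * xs[i] * xs[j]
    return h


def solve(a, b):
    """Solve the small linear system a @ s = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    s = [0.0] * n
    for r in range(n - 1, -1, -1):
        s[r] = (m[r][n] - sum(m[r][c] * s[c] for c in range(r + 1, n))) / m[r][r]
    return s


def fit_logistic(rows, max_iter=50, tol=1e-10):
    """Newton-Raphson: beta <- beta + information^-1 @ score until the step is negligible."""
    beta = [0.0] * len(rows[0])  # each row is (y, x1..xk), so k + 1 parameters incl. alpha
    for _ in range(max_iter):
        step = solve(information(beta, rows), score(beta, rows))
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def neg2_log_likelihood(beta, rows):
    """-2 Log Likelihood of the fitted model (smaller is better)."""
    ll = 0.0
    for y, *x in rows:
        p = predict(beta, x)
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return -2.0 * ll


beta = fit_logistic(ROWS)
print("alpha, beta1, beta2, beta3 =", [round(b, 3) for b in beta])
print("-2 Log Likelihood =", round(neg2_log_likelihood(beta, ROWS), 3))
```

At the maximum-likelihood estimate the score (gradient) is zero, which is what the stopping rule relies on; this extract has no perfect separation between positive and negative scans, so the estimates are finite.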

Steps using SPSS 16.0

STEP 1: Choose Analyze => Regression => Binary Logistic.

STEP 2:

Dependent => choose Y = Thallium.

Covariates => choose the variables HEART_RATE, ISCHAEMIA and SEX => Add them.

Categorical => Categorical Covariates => HEART_RATE, ISCHAEMIA and SEX.

Change Contrast => select Indicator => Reference Category => choose Last => Continue.

The dialog box should look like this:

STEP 3:

Options => Statistics and Plots => check Classification plots (if possible) and CI for exp(B) = 95% => Continue.

Main dialog => click OK => interpret the results.

The dialog and results shown below appear.

We can see that the -2 Log Likelihood statistic is 125.462. This statistic measures how poorly the model predicts the outcomes: the smaller the statistic, the better the model. For a perfect model, -2 Log Likelihood would equal 0. The Cox & Snell R2 can be interpreted like R2 in a multiple regression, but it cannot reach a maximum value of 1. Here the Cox & Snell R2 is 0.027, which indicates that the model does not fit well. The Nagelkerke R2 rescales the Cox & Snell value so that it can reach a maximum of 1; here it is 0.037. Both are pseudo-R2 measures that attempt to replicate R2 using information based on the -2 log likelihood.
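To see how the two pseudo-R2 values relate, the sketch below implements the usual Cox & Snell and Nagelkerke formulas written in terms of -2LL. The null-model -2LL is not quoted in this section, so it is backed out (approximately, since 0.027 is rounded) from the reported -2LL of 125.462 and Cox & Snell R2 of 0.027, assuming all n = 100 patients were used in the fit. The function names are illustrative.

```python
import math


def cox_snell_r2(neg2ll_null, neg2ll_model, n):
    """Cox & Snell R2 = 1 - (L0 / L1)^(2/n), written in terms of -2LL values."""
    return 1.0 - math.exp((neg2ll_model - neg2ll_null) / n)


def max_cox_snell_r2(neg2ll_null, n):
    """Largest attainable Cox & Snell R2 (a perfect model, -2LL = 0); always below 1."""
    return 1.0 - math.exp(-neg2ll_null / n)


def nagelkerke_r2(neg2ll_null, neg2ll_model, n):
    """Nagelkerke R2 divides Cox & Snell by its maximum so that it can reach 1."""
    return cox_snell_r2(neg2ll_null, neg2ll_model, n) / max_cox_snell_r2(neg2ll_null, n)


# Reported output: -2LL = 125.462 and Cox & Snell R2 = 0.027 (assumed n = 100).
# The null-model -2LL is implied by these two values, not read from the output.
n, neg2ll_model, cs = 100, 125.462, 0.027
neg2ll_null = neg2ll_model - n * math.log(1.0 - cs)
print(round(neg2ll_null, 2), round(nagelkerke_r2(neg2ll_null, neg2ll_model, n), 3))
```

With these inputs the implied null -2LL is roughly 128.2 and the Nagelkerke value comes out near 0.037, consistent with the reported output.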

Refer to the Variables in the Equation table above:

From the definition of the odds, we can rearrange the odds formula to obtain the probability. Let Y denote P(Y):

$\frac{Y}{1 - Y} = \text{ODDS}$

$Y = \text{ODDS}\,(1 - Y) = \text{ODDS} - \text{ODDS} \cdot Y$

$Y\,(1 + \text{ODDS}) = \text{ODDS}$

$Y = \frac{\text{ODDS}}{1 + \text{ODDS}}$

We can get the value of the odds from the formula:

$\text{ODDS} = \exp\left(a + \sum_{i=1}^{3} b_i X_i\right)$, where i = 1, 2 and 3 only in this case.

HEART RATE (first variable):

For X1 = 1 and b1 = 0.556:

exp(-1.384 + 0.556(1)) = 0.436922257

For X1 = 0 and b1 = 0.556:

exp(-1.384 + 0.556(0)) = 0.250574248

Thus, ODDS RATIO = 0.436922257 / 0.250574248 = 1.744

From this calculation, the odds of a positive scan for a patient whose heart rate exceeded 85% of the predicted rate are 1.744 times the odds for a patient whose heart rate did not. In terms of probability, we get the value from the formula:

$Y = \frac{\text{ODDS}}{1 + \text{ODDS}}$

Substituting the odds (0.436922257, not the odds ratio) into the formula gives Y ≈ 0.304. So, for a patient whose heart rate exceeded 85% and who has X2 = 0 and X3 = 0 (a man without ischaemia during exercise), the probability of a positive scan is about 0.304.

ISCHAEMIA (second variable):

For X2 = 1 and b2 = 0.353:

exp(-1.384 + 0.353(1)) = 0.356650132

For X2 = 0 and b2 = 0.353:

exp(-1.384 + 0.353(0)) = 0.250574248

Thus, ODDS RATIO = 0.356650132 / 0.250574248 = 1.423

From this calculation, the odds of a positive thallium scan for a patient who had ischaemia during exercise are 1.423 times the odds for a patient who did not. In terms of probability, we get the value from the formula:

$Y = \frac{\text{ODDS}}{1 + \text{ODDS}}$

Substituting the odds (0.356650132) into the formula gives Y ≈ 0.263. So, for a man whose heart rate did not exceed 85% and who had ischaemia during exercise, the probability of a positive thallium scan is about 0.263.

SEX (third variable):

For X3 = 1 and b3 = 0.308:

exp(-1.384 + 0.308(1)) = 0.340956628

For X3 = 0 and b3 = 0.308:

exp(-1.384 + 0.308(0)) = 0.250574248

Thus, ODDS RATIO = 0.340956628 / 0.250574248 = 1.361

From this calculation, the odds of a positive thallium scan for a woman are 1.361 times the odds for a man. In terms of probability, we get the value from the formula:

$Y = \frac{\text{ODDS}}{1 + \text{ODDS}}$

Substituting the odds (0.340956628) into the formula gives Y ≈ 0.254. So, for a woman whose heart rate did not exceed 85% and who had no ischaemia during exercise, the probability of a positive thallium scan is about 0.254.
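The three hand calculations can be reproduced in a few lines. This sketch uses only the intercept and coefficients quoted above (-1.384, 0.556, 0.353, 0.308) and, as in the calculations, switches on one predictor at a time while holding the other two at 0. The names `A`, `B`, `odds` and `prob` are illustrative.

```python
import math

# Intercept and coefficients quoted in the calculations above.
A = -1.384
B = {"heart rate (X1)": 0.556, "ischaemia (X2)": 0.353, "sex (X3)": 0.308}


def odds(x1=0, x2=0, x3=0):
    """ODDS = exp(a + b1*x1 + b2*x2 + b3*x3)."""
    b1, b2, b3 = B.values()
    return math.exp(A + b1 * x1 + b2 * x2 + b3 * x3)


def prob(o):
    """Y = ODDS / (1 + ODDS); the input must be an odds, not an odds ratio."""
    return o / (1.0 + o)


baseline = odds()  # X1 = X2 = X3 = 0
for i, name in enumerate(B):
    x = [0, 0, 0]
    x[i] = 1  # switch on one predictor, keep the others at 0
    o = odds(*x)
    print(f"{name}: odds={o:.4f}  OR={o / baseline:.3f}  P={prob(o):.3f}")
```

It prints odds ratios of about 1.744, 1.423 and 1.361 and probabilities of about 0.304, 0.263 and 0.254, because each probability is computed from the odds for that combination of predictors.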

Summary

A few conclusions can be drawn from analysing the case-study data:

• A patient whose heart rate exceeded 85% of the predicted rate has higher odds of a positive thallium test (odds ratio 1.744) than a patient whose heart rate did not.

• Women have higher odds of a positive thallium test than men (odds ratio 1.361).

• A patient who had ischaemia during exercise has 1.423 times the odds of a positive thallium scan compared with a patient who did not.

Conclusion

In this paper, we have demonstrated that logistic regression can be a powerful analytical technique when the outcome variable is dichotomous (a dependent variable that can take the value 1 or 0). The method is widely used around the world and in various fields, such as:

• the medical and social sciences, for example the Trauma and Injury Severity Score (TRISS);

• marketing, for example predicting a customer's propensity to purchase a product;

• economics, for example predicting the likelihood that a person chooses to be in the labor force.

In conclusion, logistic regression is a valuable method that has been gaining popularity in the last decade.