Logistic Regression versus Cox Regression Ch. Mélot, MD...


Logistic Regression versus Cox Regression

Ch. Mélot, MD, PhD, MSciBiostat
Service des Soins Intensifs, Hôpital Universitaire Erasme

ESP, 26 February 2008

Why do we need multivariable analyses?

We live in a multivariable world. Most events, whether medical, political, social, or personal, have multiple causes. And these causes are related to one another.

Definition

Multivariable analysis is a tool for determining the relative contributions of different causes to a single event.

Note: the terms “multivariate analysis” and “multivariable analysis” are often used interchangeably. In the strict sense, multivariate analysis refers to simultaneously predicting multiple outcomes.

Multivariable approach

Y = f(X1, X2, X3, …)

Y: single dependent variable (outcome), e.g., dead or alive

X1, X2, X3, …: independent variables (risk factors, predictors), e.g., age, gender, …

Multivariate approach

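$$
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}
=
\begin{pmatrix}
\beta_{01} & \beta_{11} & \beta_{21} & \beta_{31} \\
\beta_{02} & \beta_{12} & \beta_{22} & \beta_{32} \\
\beta_{03} & \beta_{13} & \beta_{23} & \beta_{33}
\end{pmatrix}
\times
\begin{pmatrix} 1 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix}
$$

y1, y2, y3: multiple dependent variables (outcomes), e.g., countries

x1, x2, x3: independent variables (risk factors, predictors), e.g., drugs, …

[Figure: MULTIVARIATE ANALYSIS. Two-dimensional plot, both axes from -2 to 2, positioning countries (Belgium-Luxembourg, France, Germany, Holland, Switzerland, Italy, Finland, UK, Ireland, Norway, Austria, Sweden, Spain, Portugal, Denmark) and drugs (MIDAZOLAM, MORPHINE, PROPOFOL, SUFENTANIL, FENTANYL). Soliman H.M., Mélot C., et al. Br. J. Anaesth. 2001;87:186-192]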

Definition

Outcome: n. That which comes out of, or follows from; issue; result;

Webster’s dictionary

Outcome: status of the patient at the end of an episode of care - presence of symptoms, level of activity, and mortality.

Types of regression

If y = continuous -> linear regression

If y = categorical (1 or 0) -> logistic regression

If y = count of events in a period of time -> Poisson regression

If y = time to event (censored data) -> Cox regression

MULTIVARIABLE REGRESSION

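If y = continuous variable: multiple regression

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

If y = dichotomous variable: multiple logistic regression

$$y = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}}$$

$$\operatorname{Logit}(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$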

MULTIVARIABLE REGRESSION

If y = count of events during a given period of time (t_i): multivariable Poisson regression

$$y = t_i \, e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}$$

$$\ln(y / t_i) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

If y = time to event: multivariable Cox regression

$$y = h_0(t) \, e^{\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}$$

$$\ln(y / h_0(t)) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

| | Multiple linear regression | Multiple logistic regression | Proportional hazards analysis | Multiple Poisson regression |
|---|---|---|---|---|
| What is being modeled? | The mean value of the outcome | The logarithm of the odds of the outcome (logit) | The logarithm of the relative hazard | The logarithm of the count of the events |
| Relationship of multiple independent variables (X’s) to outcome | The mean value of the outcome changes linearly with X’s | The logit of the outcome changes linearly with X’s | The logarithm of the relative hazard changes linearly with X’s | The logarithm of the count of the events changes linearly with X’s |
| Distribution of the outcome variable | Normal | Binomial | None specified | Poisson |
| Variance of outcome variable | Equal around the mean | Depends only on the mean | None specified | Mean equals variance |
| Relative hazard over time | Not applicable | Not applicable | Constant | Not applicable |

Expression of the results

If y = continuous variable: multiple regression

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

$\beta_1$ = "slope" for the risk factor $x_1$

If y = dichotomous variable: multivariable logistic regression

$$\operatorname{Logit}(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

$e^{\beta_1}$ = odds ratio for the risk factor $x_1$

If y = count of events during a given period of time (t_i): multivariable Poisson regression

$$\ln(y / t_i) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

$e^{\beta_1}$ = relative risk of the occurrence of the event during the period of time, or relative incidence

If y = time to event: multivariable Cox regression

$$\ln(y / h_0(t)) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

$e^{\beta_1}$ = hazard ratio for the risk factor $x_1$, or incidence rate ratio

Cox versus Logistic Regression


Cox regression vs logistic regression

Distinction between rate and proportion:

– Incidence (hazard) rate: number of new cases of disease per population at risk per unit time (or mortality rate, if outcome is death)

– Cumulative incidence: proportion of new cases that develop in a given time period

Cox regression vs logistic regression

Distinction between hazard/rate ratio and odds ratio/risk ratio:

– Hazard/rate ratio: ratio of incidence rates

– Odds/risk ratio: ratio of proportions

By taking into account time, you are taking into account more information than just binary yes/no.

Gain power/precision.

Logistic regression aims to estimate the odds ratio; Cox regression aims to estimate the hazard ratio.

Risks vs Rates

Relationship between risk and rates:

$$R(t) = 1 - e^{-ht}$$

h = constant hazard rate

R(t) = probability of disease in time t

Risks vs Rates

For example, if rate is 5 cases/1000 person-years, then the chance of developing disease over 10 years is:

$$R(t) = 1 - e^{-(0.005)(10)} = 1 - e^{-0.05} = 1 - 0.951 = 0.0488$$

Compare to 0.005(10) = 5%. The two figures are close because the loss of persons at risk because they have developed disease within the period of observation is small relative to the size of the total group.
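The arithmetic above can be checked with a few lines of Python. This is a minimal sketch under the slide's assumption of a constant hazard rate; the function name `risk_from_rate` is purely illustrative.

```python
import math


def risk_from_rate(rate: float, years: float) -> float:
    """Cumulative risk R(t) = 1 - exp(-h * t) under a constant hazard rate h."""
    return 1.0 - math.exp(-rate * years)


# 5 cases per 1000 person-years followed for 10 years
low = risk_from_rate(0.005, 10)
# 50 cases per 1000 person-years followed for 10 years
high = risk_from_rate(0.05, 10)

print(f"rate 0.005: risk {low:.4f} vs rate x time {0.005 * 10:.4f}")
print(f"rate 0.050: risk {high:.4f} vs rate x time {0.05 * 10:.4f}")
```

The gap between the risk and the simple product rate x time widens as the rate grows, because more of the population leaves the at-risk pool during follow-up.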


Risks vs Rates

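If rate is 50 cases/1000 person-years, then the chance of developing disease over 10 years is:

$$R(t) = 1 - e^{-(0.05)(10)} = 1 - e^{-0.5} = 1 - 0.61 = 0.39$$

Compare to 0.05(10) = 50%

| Year | Persons at risk | New cases (incidence: 0.050) |
|---|---|---|
| 1 | 1000 | 50 |
| 2 | 950 | 48 |
| 3 | 902 | 45 |
| 4 | 857 | 43 |
| 5 | 814 | 41 |
| 6 | 773 | 39 |
| 7 | 734 | 37 |
| 8 | 697 | 35 |
| 9 | 662 | 33 |
| 10 | 629 | 32 |
| Total | | 403 |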
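The year-by-year table can be regenerated as follows. This is a sketch that assumes the incidence of 0.050 is applied once per year to whoever is still at risk; it keeps fractional persons, so its total differs slightly from the slide's whole-number total of 403, presumably because the slide rounds each year's count. The function name `yearly_table` is illustrative.

```python
import math


def yearly_table(n_start: float, annual_incidence: float, years: int):
    """Return (year, persons at risk, new cases) rows for a closed cohort."""
    rows = []
    at_risk = n_start
    for year in range(1, years + 1):
        cases = at_risk * annual_incidence  # new cases among those still at risk
        rows.append((year, at_risk, cases))
        at_risk -= cases                    # cases leave the at-risk pool
    return rows


rows = yearly_table(1000, 0.05, 10)
total_cases = sum(cases for _, _, cases in rows)

# Continuous-time counterpart from R(t) = 1 - exp(-h t)
continuous_cases = 1000 * (1 - math.exp(-0.05 * 10))

for year, at_risk, cases in rows:
    print(f"{year:2d} {at_risk:7.1f} {cases:5.1f}")
print(f"total (yearly steps): {total_cases:.1f}, continuous: {continuous_cases:.1f}")
```

Both versions give roughly 40% over 10 years, well below the 50% obtained by multiplying rate and time.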


Risk vs Rates

Relationship between risk and rates (derivation):

Exponential density function for waiting time until the event (constant hazard rate):

$$r(t) = h \, e^{-ht}$$

$$R(t) = \int_0^t h \, e^{-hu} \, du = \left[ -e^{-hu} \right]_0^t = -e^{-ht} - \left( -e^{0} \right) = 1 - e^{-ht}$$

Waiting time distribution will change if the hazard rate changes as a function of time: h(t)
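For the time-varying case mentioned above, the standard generalisation (not shown on the slide, added here as a worked step) replaces the product $ht$ by the cumulative hazard:

```latex
R(t) = 1 - \exp\left( -\int_0^t h(u) \, du \right)
% With a constant hazard h(u) = h the integral equals h t,
% which recovers R(t) = 1 - e^{-ht}.
```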

LOGISTIC REGRESSION


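Data set (CHD: Coronary Heart Disease) (Yes: 1 / No: 0)

| PATID | AGEGRP | AGE | CHD |
|---|---|---|---|
| 1 | 1 | 20 | 0 |
| 2 | 1 | 23 | 0 |
| 3 | 1 | 24 | 0 |
| 4 | 1 | 25 | 0 |
| 5 | 1 | 25 | 1 |
| 6 | 1 | 26 | 0 |
| 7 | 1 | 26 | 0 |
| 8 | 1 | 28 | 0 |
| 9 | 1 | 28 | 0 |
| 10 | 1 | 29 | 0 |
| 11 | 2 | 30 | 0 |
| 12 | 2 | 30 | 0 |
| 13 | 2 | 30 | 0 |
| 14 | 2 | 30 | 0 |
| 15 | 2 | 30 | 0 |
| 16 | 2 | 30 | 1 |
| 17 | 2 | 32 | 0 |
| 18 | 2 | 32 | 0 |
| 19 | 2 | 33 | 0 |
| 20 | 2 | 33 | 0 |
| 21 | 2 | 34 | 0 |
| 22 | 2 | 34 | 0 |
| 23 | 2 | 34 | 1 |
| 24 | 2 | 34 | 0 |
| 25 | 2 | 34 | 0 |
| 26 | 3 | 35 | 0 |
| 27 | 3 | 35 | 0 |
| 28 | 3 | 36 | 0 |
| 29 | 3 | 36 | 1 |
| 30 | 3 | 36 | 0 |
| 31 | 3 | 37 | 0 |
| 32 | 3 | 37 | 1 |
| 33 | 3 | 37 | 0 |
| 34 | 3 | 38 | 0 |
| 35 | 3 | 38 | 0 |
| 36 | 3 | 39 | 0 |
| 37 | 3 | 39 | 1 |
| 38 | 4 | 40 | 0 |
| 39 | 4 | 40 | 1 |
| 40 | 4 | 41 | 0 |
| 41 | 4 | 41 | 0 |
| 42 | 4 | 42 | 0 |
| 43 | 4 | 42 | 0 |
| 44 | 4 | 42 | 0 |
| 45 | 4 | 42 | 1 |
| 46 | 4 | 43 | 0 |
| 47 | 4 | 43 | 0 |
| 48 | 4 | 43 | 1 |
| 49 | 4 | 44 | 0 |
| 50 | 4 | 44 | 0 |
| 51 | 4 | 44 | 1 |
| 52 | 4 | 44 | 1 |
| 53 | 5 | 45 | 0 |
| 54 | 5 | 45 | 1 |
| 55 | 5 | 46 | 0 |
| 56 | 5 | 46 | 1 |
| 57 | 5 | 47 | 0 |
| 58 | 5 | 47 | 0 |
| 59 | 5 | 47 | 1 |
| 60 | 5 | 48 | 0 |
| 61 | 5 | 48 | 1 |
| 62 | 5 | 48 | 1 |
| 63 | 5 | 49 | 0 |
| 64 | 5 | 49 | 0 |
| 65 | 5 | 49 | 1 |
| 66 | 6 | 50 | 0 |
| 67 | 6 | 50 | 1 |
| 68 | 6 | 51 | 0 |
| 69 | 6 | 52 | 0 |
| 70 | 6 | 52 | 1 |
| 71 | 6 | 53 | 1 |
| 72 | 6 | 53 | 1 |
| 73 | 6 | 54 | 1 |
| 74 | 7 | 55 | 0 |
| 75 | 7 | 55 | 1 |
| 76 | 7 | 55 | 1 |
| 77 | 7 | 56 | 1 |
| 78 | 7 | 56 | 1 |
| 79 | 7 | 56 | 1 |
| 80 | 7 | 57 | 0 |
| 81 | 7 | 57 | 0 |
| 82 | 7 | 57 | 1 |
| 83 | 7 | 57 | 1 |
| 84 | 7 | 57 | 1 |
| 85 | 7 | 57 | 1 |
| 86 | 7 | 58 | 0 |
| 87 | 7 | 58 | 1 |
| 88 | 7 | 58 | 1 |
| 89 | 7 | 59 | 1 |
| 90 | 7 | 59 | 1 |
| 91 | 8 | 60 | 0 |
| 92 | 8 | 60 | 1 |
| 93 | 8 | 61 | 1 |
| 94 | 8 | 62 | 1 |
| 95 | 8 | 62 | 1 |
| 96 | 8 | 63 | 1 |
| 97 | 8 | 64 | 0 |
| 98 | 8 | 64 | 1 |
| 99 | 8 | 65 | 1 |
| 100 | 8 | 69 | 1 |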

LINEAR REGRESSION

[Figure: CHD (0 = No, 1 = Yes) versus age (yrs), with the fitted straight line y = 0.0218 x - 0.538, R² = 0.264]

LOGISTIC REGRESSION

[Figure: bar chart of the number of patients by age group (yrs: 20-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-69), CHD=0 (n = 57) versus CHD=1 (n = 43)]

LOGISTIC REGRESSION

[Figure: the same age groups shown as percentages (0% to 100%) of patients in each group, CHD=0 (n = 57) versus CHD=1 (n = 43)]

LOGISTIC REGRESSION

[Figure: proportion with CHD (0.00 to 1.00) versus age (yrs, 20 to 70)]

LOGISTIC REGRESSION

[Figure: proportion with CHD (0 to 1.0) versus age (yrs, 0 to 100), with the fitted logistic curve]

$$\pi(x) = \frac{e^{-5.31 + 0.111 \, \mathrm{Age}}}{1 + e^{-5.31 + 0.111 \, \mathrm{Age}}}$$

LOGISTIC TRANSFORMATION

$$\operatorname{Logit}[\pi(x)] = \ln\left[ \frac{\pi(x)}{1 - \pi(x)} \right]$$

$$\pi(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$

LOGISTIC REGRESSION

$$\operatorname{Logit} \pi(x) = -5.31 + 0.111 \, \mathrm{age}$$

[Figure: logit of the proportion with CHD (-3 to 3) versus age (yrs, 0 to 100)]
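As an illustration of where coefficients such as -5.31 and 0.111 come from, the sketch below fits a one-covariate logistic model to the CHD data by Newton-Raphson (maximum likelihood), using only the Python standard library. The slides do not say which software or algorithm was used, so the fitting routine is an assumption, and the `AGES` / `CHD` values are a transcription of the data-set slide in PATID order.

```python
import math

# Ages and CHD status (1 = yes), transcribed in PATID order (assumed faithful)
AGES = [
    20, 23, 24, 25, 25, 26, 26, 28, 28, 29,
    30, 30, 30, 30, 30, 30, 32, 32, 33, 33,
    34, 34, 34, 34, 34, 35, 35, 36, 36, 36,
    37, 37, 37, 38, 38, 39, 39, 40, 40, 41,
    41, 42, 42, 42, 42, 43, 43, 43, 44, 44,
    44, 44, 45, 45, 46, 46, 47, 47, 47, 48,
    48, 48, 49, 49, 49, 50, 50, 51, 52, 52,
    53, 53, 54, 55, 55, 55, 56, 56, 56, 57,
    57, 57, 57, 57, 57, 58, 58, 58, 59, 59,
    60, 60, 61, 62, 62, 63, 64, 64, 65, 69,
]
CHD = [int(c) for c in (
    "0000100000" "0000010000" "0010000010" "0100001010" "0000100100"
    "1101010010" "1100101001" "1110111110" "0111101111" "0111110111"
)]


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood fit of logit(pi) = b0 + b1 * x by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                # information matrix (negative Hessian)
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # solve the 2 x 2 system
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


b0, b1 = fit_logistic(AGES, CHD)
odds_ratio_10y = math.exp(10 * b1)  # odds ratio for a 10-year increase in age
print(f"logit = {b0:.2f} + {b1:.3f} * age, OR per 10 years = {odds_ratio_10y:.2f}")
```

In practice a statistics package would also report standard errors and confidence intervals; the hand-rolled solver is only meant to make the estimation step visible.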


Odds and Probability

Probability = 1/6 = 0.166

Odds in favour = (1/6) / (5/6) = 1/5 = 0.20

Odds against = (5/6) / (1/6) = 5 against 1

$$\text{Odds} = \frac{\pi(x)}{1 - \pi(x)}$$
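The conversions between probability, odds and logit can be sketched as below. The helper names are illustrative, and the 1/6 example mirrors the numbers on the slide.

```python
import math


def odds_from_probability(p: float) -> float:
    """Odds in favour: p / (1 - p)."""
    return p / (1.0 - p)


def probability_from_odds(odds: float) -> float:
    """Inverse conversion: odds / (1 + odds)."""
    return odds / (1.0 + odds)


def logit(p: float) -> float:
    """Natural logarithm of the odds."""
    return math.log(odds_from_probability(p))


p = 1 / 6
odds_in_favour = odds_from_probability(p)   # 1 to 5
odds_against = 1 / odds_in_favour           # 5 to 1
print(f"p = {p:.3f}, odds in favour = {odds_in_favour:.2f}, "
      f"odds against = {odds_against:.0f} to 1, logit = {logit(p):.3f}")
```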

ODDS RATIO AND LOGISTIC REGRESSION

$$OR = \frac{\pi(x=1) / [1 - \pi(x=1)]}{\pi(x=0) / [1 - \pi(x=0)]} = e^{\beta_1}$$

Example: for a 10-year increase in age, OR = $e^{0.111 \times 10}$ = 3.03

$$\ln(OR) = \beta_1$$

$$95\% \text{ CI for } OR = e^{\beta_1 \pm 1.96 \, SE(\beta_1)}$$

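Forest plot: Odds Ratio with 95 % confidence interval

2 x 2 table with cells a, b (first row) and c, d (second row):

$$OR = \frac{a \, d}{b \, c}$$

$$SE(\ln(OR)) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$

$$95\% \text{ CI} = \exp\left[ \ln(OR) \pm 1.96 \, SE(\ln(OR)) \right]$$

[Figure: forest plot on an OR axis (0, 0.5, 1, 2, 3). Left of 1: Trt A > Trt B (favours active). At 1: Trt A = Trt B. Right of 1: Trt B > Trt A (favours placebo). The position of each point shows the amplitude of the observed effect; the width of its interval shows the precision of the observed effect. An interval that crosses 1 is labelled p = ns; the two intervals that do not cross 1 are labelled p < 0.05.]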

COX’s REGRESSION

Cox model

A Cox model is a well-recognized statistical technique for exploring the relationship between the occurrence of an event (e.g., death, relapse, …) in a patient and several explanatory variables.

Survival analysis is concerned with studying the time between entry to a study and a subsequent event (such as death).

Censored survival times occur if the event of interest does not occur for a patient during the study period.

Survival Analysis: Terms

Time-to-event: The time from entry into a study until a subject has a particular outcome.

Censoring: Subjects are said to be censored if they are lost to follow-up or drop out of the study, or if the study ends before they die or have an outcome of interest. They are counted as alive or disease-free for the time they were enrolled in the study.

– If dropout is related to both outcome and treatment, dropouts may bias the results.

Right Censoring (T>t)

Common examples:

– Termination of the study
– Death due to a cause that is not the event of interest
– Loss to follow-up

We know that the subject survived at least to time t.

Left censoring (T<t)

The origin time, not the event time, is known only to be less than some value.

For example, if you are studying menarche and you begin following girls at age 12, you may find that some of them have already begun menstruating. Unless you can obtain information about the start date for those girls, the age of menarche is left-censored at age 12.

Interval censoring (a<T<b)

When we know the event has occurred between two time points, but don’t know the exact dates.

For example, if you’re screening subjects for HIV infection yearly, you may not be able to determine the exact date of infection.

Data Structure: survival analysis

Time variable: ti = time at last disease-free observation or time at event.

Censoring variable: ci = 1 if had the event; ci = 0 if no event by time ti.

Introduction to Kaplan-Meier

Non-parametric estimate of survivor function.

Commonly used to describe survivorship of study populations.

Commonly used to compare two study populations.

Intuitive graphical presentation.

Survival Data (right-censored)

[Figure: follow-up timelines for subjects A to E, from beginning of study to end of study; x-axis: time in months; X marks a death]

1. Subject E dies at 4 months (X)
2. Subject A drops out after 6 months
3. Subject C dies at 7 months (X)
4. Subjects B and D survive for the whole year-long study period

Corresponding Kaplan-Meier Curve

[Figure: Kaplan-Meier curve starting at 100%; x-axis: time in months]

Subject E dies at 4 months. Probability of surviving to just before 4 months is 100% = 5/5. Fraction surviving this death = 4/5.

Subject A drops out after 6 months, leaving 3 subjects at risk.

Subject C dies at 7 months. Fraction surviving this death = 2/3.

The product limit estimate

Product limit estimate of survival = P(surviving | at risk through failure 1) × P(surviving | at risk through failure 2) = 4/5 × 2/3 = 0.5333

The probability of surviving the entire year, taking into account censoring = (4/5) (2/3) = 53%

NOTE:

– > 40% (2/5) because the one drop-out survived at least a portion of the year.

– < 60% (3/5) because we don’t know if the one drop-out would have survived until the end of the year.
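The five-subject example can be reproduced with a short product-limit routine. This is a minimal sketch in plain Python; `kaplan_meier` is an illustrative helper rather than a library function, and follow-up for subjects B and D is coded as censored at 12 months.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, S(time)) at each death time.

    events[i] is 1 if subject i had the event at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        leaving = [e for tt, e in data if tt == t]  # everyone whose follow-up ends at t
        deaths = sum(leaving)
        if deaths:
            survival *= (n_at_risk - deaths) / n_at_risk
            curve.append((t, survival))
        n_at_risk -= len(leaving)  # deaths and censored both leave the risk set
        i += len(leaving)
    return curve


# Subjects A..E: follow-up in months, and whether it ended with a death
subjects = {"A": (6, 0), "B": (12, 0), "C": (7, 1), "D": (12, 0), "E": (4, 1)}
times = [t for t, _ in subjects.values()]
events = [e for _, e in subjects.values()]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t}: S = {s:.4f}")
```

The censored subject A contributes to the risk set at month 4 but not at month 7, which is why the second factor is 2/3 rather than 3/4.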


Cox model

$$h(t) = h_0(t) \, e^{b_1 x_1 + b_2 x_2 + \dots}$$

h(t) = hazard function, i.e. the probability of death at time t

h0(t) = baseline or underlying hazard function; it corresponds to the probability of dying when all the explanatory variables are zero. The baseline hazard function is analogous to the intercept in ordinary regression (since $e^0 = 1$).

The baseline hazard h0(t) is the non-parametric part of the model; the exponential term is the parametric part.

Cox model

The risk of dying at time t, h0(t), is equal to the number of deaths divided by the number of patients at risk of dying at time t (risk set).

Survival analysis takes into account patients who did not reach time t. They are subtracted from the number of patients at risk of dying.

What is a hazard function?

The hazard function h(t) is the probability that an individual will experience an event (e.g., death) within a small time interval, given that the individual has survived up to the beginning of the interval. It can therefore be interpreted as the risk of dying at time t.

$$h(t) = \frac{\text{N of individuals experiencing an event in interval beginning at } t}{(\text{N of individuals surviving at time } t) \times (\text{interval width})}$$

Assumptions in a Cox model

The relationship between the dependent variable (outcome) and the explanatory variables must be constant over time. This is called the proportional hazards assumption.

VALsartan In Acute myocardial iNfarcTion (VALIANT study)

14,808 patients randomized
Informed consent not ensured: 105 patients
14,703 patients

– Captopril: 4909; vital status known: 4871 (99.2%); vital status unknown: 38 (0.8%)
– Valsartan: 4909; vital status known: 4856 (98.9%)
– Combination: 4885; vital status known: 4837 (99.0%)

Median follow-up: 24.7 months

Pfeffer, McMurray, Velazquez, et al. N Engl J Med 2003;349:1893–1906

Testing PH: VALIANT example.

Under PH, curves should be parallel (should not cross).

Real change in effect of HXMI (History of MI) over time?

History of MI good for early survival, bad for later survival?

[Figure: log(-log(S(t))) (y-axis, -7 to -2) versus days from AMI (1 to 15) for No HXMI and HXMI]

Cox’s regression model

$$h(t) = h_0(t) \, e^{b_1 x_1 + b_2 x_2 + \dots}$$

$$\log_e \left[ \frac{h(t)}{h_0(t)} \right] = b_1 x_1 + b_2 x_2 + \dots$$

Hazard ratio for $x_1$ = $e^{b_1}$

HR for $x_2$ = $e^{b_2}$

Example

[Figure from PROGRESS, Lancet 2001;358:1033-1041: effect estimates, each annotated as significant or non significant]

Survival after hepatic surgery for cancer

| Pat ID | Age (years) | A: Time (weeks) | B: Number at risk at start of study | C: Number of deaths | D: Number censored | E: Proportion surviving until end of week | F: Cumulative proportion surviving |
|---|---|---|---|---|---|---|---|
| | | 0 | 18 | - | - | | 1.000 |
| 1 | 59 | 10 | 18 | 1 | 0 | 1 – 1/18 = 0.944 | 0.944 |
| 2 | 56 | 13* | 17 | 0 | 1 | 1 – 0/17 = 1.000 | 0.944 |
| 3 | 54 | 18* | 16 | 0 | 1 | 1 – 0/16 = 1.000 | 0.944 |
| 4 | 67 | 19 | 15 | 1 | 0 | 1 – 1/15 = 0.933 | 0.882 |
| 5 | 37 | 23* | 14 | 0 | 1 | 1 – 0/14 = 1.000 | 0.882 |
| 6 | 55 | 30 | 13 | 1 | 0 | 1 – 1/13 = 0.923 | 0.8137 |
| 7 | 65 | 36 | 12 | 1 | 0 | 1 – 1/12 = 0.917 | 0.7459 |
| 8 | 60 | 38* | 11 | 0 | 1 | 1 – 0/11 = 1.000 | 0.7459 |
| 9 | 58 | 54* | 10 | 0 | 1 | 1 – 0/10 = 1.000 | 0.7459 |
| 10 | 57 | 56* | 9 | 0 | 1 | 1 – 0/9 = 1.000 | 0.7459 |
| 11 | 52 | 59 | 8 | 1 | 0 | 1 – 1/8 = 0.875 | 0.6526 |
| 12 | 46 | 75 | 7 | 1 | 0 | 1 – 1/7 = 0.857 | 0.5594 |
| 13 | 43 | 93 | 6 | 1 | 0 | 1 – 1/6 = 0.833 | 0.4662 |
| 14 | 58 | 97 | 5 | 1 | 0 | 1 – 1/5 = 0.800 | 0.3729 |
| 15 | 39 | 104* | 4 | 0 | 1 | 1 – 0/4 = 1.000 | 0.3729 |
| 16 | 43 | 107 | 3 | 1 | 0 | 1 – 1/3 = 0.667 | 0.2486 |
| 17 | 45 | 107* | 2 | 0 | 1 | 1 – 0/2 = 1.000 | 0.2486 |
| 18 | 37 | 107* | 2 | 0 | 1 | 1 – 0/2 = 1.000 | 0.2486 |

* = censored observation

Kaplan-Meier estimate of the survivor function

[Figure: survival probability (%) (0 to 100) versus time in weeks to death following surgery for liver cancer (0 to 120)]

9 deaths / 18

Median survival: 93 wks


$$h(t) = h_0(t) \, e^{0.13 \, \mathrm{age}}$$

$$\log_e \left[ \frac{h(t)}{h_0(t)} \right] = 0.13 \, \mathrm{age}$$

HR = $e^{0.13}$ = 1.14
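To show how a coefficient like 0.13 is obtained, the sketch below maximises the Cox partial likelihood for a single covariate (age) on the 18 patients of the hepatic surgery table, using Newton-Raphson in plain Python. It is an illustration under assumptions: the slides do not state which software produced the estimate, `PATIENTS` is a transcription of the table, and censored patients are kept in the risk set at their censoring time. If the transcription is faithful the result should land near the slide's value, but no exact match is claimed here.

```python
import math

# (age in years, time in weeks, event) with event = 1 for death, 0 for censored
PATIENTS = [
    (59, 10, 1), (56, 13, 0), (54, 18, 0), (67, 19, 1), (37, 23, 0),
    (55, 30, 1), (65, 36, 1), (60, 38, 0), (58, 54, 0), (57, 56, 0),
    (52, 59, 1), (46, 75, 1), (43, 93, 1), (58, 97, 1), (39, 104, 0),
    (43, 107, 1), (45, 107, 0), (37, 107, 0),
]


def score_and_information(beta, data):
    """First derivative and negative second derivative of the log partial likelihood."""
    score = info = 0.0
    for x_i, t_i, d_i in data:
        if not d_i:
            continue  # only deaths contribute a term
        # Risk set: everyone still under observation at time t_i
        risk = [x for x, t, _ in data if t >= t_i]
        weights = [math.exp(beta * x) for x in risk]
        s0 = sum(weights)
        s1 = sum(w * x for w, x in zip(weights, risk))
        s2 = sum(w * x * x for w, x in zip(weights, risk))
        mean = s1 / s0
        score += x_i - mean             # observed minus expected covariate
        info += s2 / s0 - mean * mean   # weighted variance in the risk set
    return score, info


def fit_cox_one_covariate(data, max_iter=50, tol=1e-10):
    """Newton-Raphson on the partial likelihood; returns the log hazard ratio."""
    mean_x = sum(x for x, _, _ in data) / len(data)
    centred = [(x - mean_x, t, d) for x, t, d in data]  # centring leaves beta unchanged
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, centred)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


beta = fit_cox_one_covariate(PATIENTS)
hazard_ratio = math.exp(beta)  # hazard ratio per additional year of age
print(f"b = {beta:.3f}, HR per year of age = {hazard_ratio:.2f}")
```

Note that the baseline hazard h0(t) never appears in the code: it cancels out of each risk-set ratio, which is why the model needs no assumption about its shape.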

[Figure: survival probability (%) (0 to 100) versus time in weeks to death following surgery for liver cancer (0 to 120). Annotations: HR = 1.14 (1.03 – 1.26), p = 0.0098; HR = 1.39]

Logistic or Cox model to identify risk factors?

Cox model versus logistic regression

Logistic regression requires a similar period of observation for all the subjects, to avoid the influence of time on the outcome.

Cox regression can take different periods of observation into account.

Comparison with complete follow-up (fictitious example)

– Generated survival times from exponential distribution
– Assume 100% follow-up for 365 days
– Survival times > 365 days are right censored
– Analysis with logistic and proportional hazard regressions

| | Placebo | Treat |
|---|---|---|
| N | 1,000 | 1,000 |
| Median survival time (uncensored) | 1,435 days | 2,195 days |
| Deaths in 365 days | 158 (15.8%) | 112 (11.2%) |
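A data set of this kind can be generated as sketched below. The slides do not give the generating code or random seed, so the seed, the function names and the use of Python's `random.expovariate` are all assumptions; the simulated death counts will therefore differ somewhat from the 158 and 112 on the slide.

```python
import math
import random


def expected_death_fraction(median_days: float, follow_up_days: float) -> float:
    """Probability of dying within follow-up for an exponential survival time."""
    rate = math.log(2) / median_days  # the median of an exponential is ln(2) / rate
    return 1 - math.exp(-rate * follow_up_days)


def simulate_arm(n: int, median_days: float, follow_up_days: float, rng: random.Random):
    """Return (observed time, event) pairs, right-censored at follow_up_days."""
    rate = math.log(2) / median_days
    observations = []
    for _ in range(n):
        t = rng.expovariate(rate)
        if t <= follow_up_days:
            observations.append((t, 1))               # death observed
        else:
            observations.append((follow_up_days, 0))  # censored at end of follow-up
    return observations


rng = random.Random(2008)  # arbitrary seed, for reproducibility only
placebo = simulate_arm(1000, 1435, 365, rng)
treat = simulate_arm(1000, 2195, 365, rng)

placebo_deaths = sum(e for _, e in placebo)
treat_deaths = sum(e for _, e in treat)
true_hazard_ratio = 1435 / 2195  # ratio of the two exponential rates

print(f"placebo: {placebo_deaths} deaths, expected {1000 * expected_death_fraction(1435, 365):.0f}")
print(f"treat:   {treat_deaths} deaths, expected {1000 * expected_death_fraction(2195, 365):.0f}")
print(f"true hazard ratio (treat vs placebo): {true_hazard_ratio:.3f}")
```

The true hazard ratio implied by the two medians is about 0.65, which gives a reference point for the estimates reported by the different analyses.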

Kaplan Meier curves for fictitious example

[Figure: survival (0.8 to 1) versus days from randomization to death (0 to 350) for Placebo and Treatment; log-rank p-value = 0.0026]

Logistic vs Cox PH regression

| Analysis | Estimate for treat (95% CI) | p-value |
|---|---|---|
| Contingency table (Placebo: 158/1000, Treat: 112/1000) | OR: (a x d) / (b x c) = 0.672 | χ² p-value = 0.0026 |
| Logistic regression | OR = 0.672 (0.518, 0.872) | 0.0027 |
| Cox PH regression (no censoring) | HR = 0.648 (0.592, 0.708) | < 0.0001 |
| Cox PH regression (with censoring) | HR = 0.691 (0.543, 0.881) | 0.0028 |
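The contingency-table odds ratio and its confidence interval can be recomputed from the four cell counts. The sketch below uses the standard error of ln(OR) given on the forest-plot slide and builds the interval on the log scale; the function name and the cell labelling are illustrative choices.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio (a*d)/(b*c) with a 95% confidence interval built on the log scale.

    a, b = deaths and survivors in the treated arm;
    c, d = deaths and survivors in the placebo arm.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Treat: 112 deaths out of 1000; Placebo: 158 deaths out of 1000
odds_ratio, lower, upper = odds_ratio_ci(112, 1000 - 112, 158, 1000 - 158)
print(f"OR = {odds_ratio:.3f} ({lower:.3f}, {upper:.3f})")
```

With a single binary covariate, this interval coincides with the one reported for the logistic regression.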


Cox model vs Logistic regression

Multivariable Cox model (HR columns) and multivariable logistic regression (OR columns)

| Variable | HR | Low 95% CI | High 95% CI | p | OR | Low 95% CI | High 95% CI | p |
|---|---|---|---|---|---|---|---|---|
| SAPS II score > 39 | 2.05 | 1.68 | 2.50 | <0.0001 | 1.39 | 1.02 | 1.89 | 0.04 |
| No ultimately fatal disease (McCabe 1) | 0.48 | 0.39 | 0.58 | <0.0001 | 0.43 | 0.32 | 0.58 | <0.0001 |
| Chronic liver disease | 1.46 | 1.09 | 1.95 | 0.01 | 1.90 | 1.12 | 3.22 | 0.02 |
| Decisions to forego life-sustaining therapy | 1.89 | 1.42 | 2.51 | <0.0001 | 16.56 | 11.00 | 24.90 | <0.0001 |
| Worsening of the LOD score within the first week after ICU admission | 1.36 | 1.32 | 1.39 | <0.0001 | 1.60 | 1.48 | 1.72 | <0.0001 |

Azoulay E., et al. Intens. Care Med 2003;29:1895-1901

Cox model vs Logistic regression

Cox model (HR columns) and logistic regression (OR columns)

| Variable | HR | Low 95% CI | High 95% CI | p | OR | Low 95% CI | High 95% CI | p |
|---|---|---|---|---|---|---|---|---|
| Uncensored database (survival at day 28): Albumin (yes = 1) | 1.44 | 1.21 | 1.73 | <0.0001 | 2.58 | 2.05 | 3.25 | <0.0001 |
| Censored database (ICU survival): Albumin (yes = 1) | 1.11 | 0.91 | 1.36 | 0.307 | 2.77 | 2.18 | 3.52 | <0.0001 |

SOAP study

Conclusions: Cox regression vs logistic regression

Cox model: estimates hazard/rate ratio: ratio of incidence rates

Logistic model: estimates odds/risk ratio: ratio of proportions

By taking into account time, you are taking into account more information than just binary yes/no.

Logistic regression requires a similar period of observation for all the subjects, to avoid the influence of time on the outcome.
