
STAT 526 - Spring 2011, Olga Vitek

Homework 11 - Solution

Each part of each problem is worth 5 points.

1. The survival times of 11 patients with Acute Myelogenous Leukemia are given below:

9, 13, 13+, 18, 23, 28+, 31, 34, 45+, 48, 161+,

where censored data are marked by “+” in the superscript.

(a) Calculate the Kaplan-Meier estimate of S(20).

Answer:

$\hat{S}(20) = (10/11)(9/10)(7/8) = 0.7159$

(b) Obtain the standard error of log S(20).

Answer: $\sqrt{\frac{1}{11 \cdot 10} + \frac{1}{10 \cdot 9} + \frac{1}{8 \cdot 7}} = 0.1951$

(c) Assuming an exponential distribution, estimate S(20).

Answer:

With an exponential distribution, $S(t) = e^{-\lambda t}$, and $\hat{\lambda} = 7/(9 + \cdots + 161) = 7/423 = 0.01655$ with s.e. $= \hat{\lambda}/\sqrt{d} = \hat{\lambda}/\sqrt{7} = 0.006255$. So $\hat{S}(20) = e^{-0.01655(20)} = 0.7182$.

(d) Obtain the standard error of log S(20) for the exponential estimate.

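Answer:

$\mathrm{SE}[\log \hat{S}(t)] = \sqrt{\mathrm{Var}[\log \hat{S}(t)]} = \sqrt{\mathrm{Var}[-\hat{\lambda} t]} = \sqrt{t^2 \, \mathrm{Var}[\hat{\lambda}]} = t \cdot \mathrm{SE}(\hat{\lambda}) = 20(0.006255) = 0.1251.$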
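The four answers to Problem 1 can be checked numerically. The following is a minimal sketch in base R; the helper name `km_at` is hypothetical and not part of the original solution.

```r
# Problem 1 data: observed times and event indicators (1 = death, 0 = censored)
time  <- c(9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161)
event <- c(1,  1,  0,  1,  1,  0,  1,  1,  0,  1,   0)

# Hypothetical helper: Kaplan-Meier estimate of S(t0) and the
# standard error of log S(t0) from Greenwood's formula
km_at <- function(time, event, t0) {
  tj <- sort(unique(time[event == 1 & time <= t0]))          # distinct death times <= t0
  n  <- sapply(tj, function(s) sum(time >= s))               # at risk just before s
  d  <- sapply(tj, function(s) sum(time == s & event == 1))  # deaths at s
  list(surv = prod(1 - d / n), se_log = sqrt(sum(d / (n * (n - d)))))
}

km <- km_at(time, event, 20)              # parts (a) and (b)

lambda_hat <- sum(event) / sum(time)      # exponential MLE: deaths / total exposure
se_lambda  <- lambda_hat / sqrt(sum(event))
exp_surv   <- exp(-lambda_hat * 20)       # part (c)
se_log_exp <- 20 * se_lambda              # part (d)

round(c(km$surv, km$se_log, exp_surv, se_log_exp), 4)
```

The Kaplan-Meier and exponential estimates of S(20) are close here, but the exponential standard error is smaller because the model borrows information from all 11 observations.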

2. [Methods qualifying exam, August 2005: use paper and pencil.] Consider the survival model characterized by the following piecewise-constant hazard function:

$$\lambda(t) = \begin{cases} \lambda_1, & 0 \le t < \pi_1 \\ \lambda_2, & \pi_1 \le t < \pi_2 \\ \lambda_3, & \pi_2 \le t \end{cases}$$

where $\pi_1$ and $\pi_2$ are known constants. Based on right-censored data $(x_i, \delta_i)$, $i = 1, \ldots, n$, derive the MLEs of $\lambda_1$, $\lambda_2$, and $\lambda_3$ and their standard errors.


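Answer:

The cumulative hazard $\Lambda(t) = -\log S(t)$ is given by

$$\Lambda(t) = \begin{cases} \lambda_1 t, & 0 \le t < \pi_1 \\ \lambda_1 \pi_1 + \lambda_2 (t - \pi_1), & \pi_1 \le t < \pi_2 \\ \lambda_1 \pi_1 + \lambda_2 (\pi_2 - \pi_1) + \lambda_3 (t - \pi_2), & \pi_2 \le t \end{cases}$$

so the log likelihood of the data is seen to be

$$\sum_{x_i < \pi_1} \{\delta_i \log \lambda_1 - \lambda_1 x_i\} + \sum_{\pi_1 \le x_i < \pi_2} \{\delta_i \log \lambda_2 - [\lambda_1 \pi_1 + \lambda_2 (x_i - \pi_1)]\} + \sum_{\pi_2 \le x_i} \{\delta_i \log \lambda_3 - [\lambda_1 \pi_1 + \lambda_2 (\pi_2 - \pi_1) + \lambda_3 (x_i - \pi_2)]\}$$

$$= (d_1 \log \lambda_1 - \lambda_1 e_1) + (d_2 \log \lambda_2 - \lambda_2 e_2) + (d_3 \log \lambda_3 - \lambda_3 e_3),$$

where $d_1$, $d_2$, and $d_3$ are the numbers of failures in $[0, \pi_1)$, $[\pi_1, \pi_2)$, and $[\pi_2, \infty)$, respectively, and $e_1$, $e_2$, and $e_3$ are the total exposure times over the 3 time segments. Thus, $\hat{\lambda}_j = d_j/e_j$, and since $-\partial^2 l/\partial \lambda_j^2 = d_j/\lambda_j^2$, the standard error is $\hat{\lambda}_j/\sqrt{d_j}$.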
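The result above amounts to counting failures and summing exposure within each segment. A minimal sketch in base R follows; the function name `pw_exp_mle`, the cut points, and the data are all made up for illustration.

```r
# Hypothetical helper: MLEs and standard errors for a three-piece
# piecewise-constant hazard with known cut points p1 < p2.
# x = observed times, delta = 1 for a failure, 0 for a censored time.
pw_exp_mle <- function(x, delta, p1, p2) {
  # exposure time contributed by all subjects to each segment
  e <- c(sum(pmin(x, p1)),
         sum(pmax(0, pmin(x, p2) - p1)),
         sum(pmax(0, x - p2)))
  # segment in which each observation ends: [0, p1), [p1, p2), [p2, Inf)
  seg <- cut(x, breaks = c(0, p1, p2, Inf), right = FALSE, labels = FALSE)
  d <- tabulate(seg[delta == 1], nbins = 3)   # failures per segment
  lambda <- d / e
  data.frame(d = d, e = e, lambda = lambda, se = lambda / sqrt(d))
}

# Made-up data, for illustration only
x     <- c(1, 2, 3, 4, 6, 7, 9, 12)
delta <- c(1, 0, 1, 1, 1, 0, 1, 1)
fit <- pw_exp_mle(x, delta, p1 = 3, p2 = 8)
fit
```

The three exposures always add up to the total observed time, which is a quick sanity check on the segment bookkeeping.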

3. [Methods qualifying exam, January 2010: use paper and pencil.] A researcher in a cancer center exposed 8 rats to a carcinogen, and recorded the following times to mortality in days:

                                                                                      Sum
$x_i = \min(t_i, c_i)$          143    188    190    206    213    220    216    244     1620
$x_i^2$                       20449  35344  36100  42436  45369  48400  46656  59536   334290
$\delta_i = I[t_i \le c_i]$       1      1      1      1      1      1      0      0        6

In the table, $x_i = \min(t_i, c_i)$, with $t_i$ and $c_i$ being the death time and censoring time of the $i$th rat, respectively, and $\delta_i$ indicates whether the death was observed ($\delta_i = 1$) or the observation was censored ($\delta_i = 0$). “Sum” indicates the sum of all the values in the row.

(a) The researcher considers modeling these data with a family of lifetime distributions, which has a hazard function $\lambda(t) = 2t/\beta^2$. Write down the survival function $S(t)$ for this family of models.

Answer:

$\Lambda(t) = \int_0^t 2u/\beta^2 \, du = t^2/\beta^2$, so $S(t) = e^{-\Lambda(t)} = e^{-t^2/\beta^2}$.

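(b) Obtain the MLE $\hat{\beta}$ of $\beta$.

Answer:

The likelihood of the data is

$$L(\beta) = \prod_{i=1}^{8} \left[\frac{2x_i}{\beta^2} e^{-x_i^2/\beta^2}\right]^{\delta_i} \left[e^{-x_i^2/\beta^2}\right]^{1-\delta_i} = \prod_{i=1}^{8} \left[\frac{2x_i}{\beta^2}\right]^{\delta_i} \cdot e^{-x_i^2/\beta^2}$$

The log likelihood of the data is

$$l(\beta) = \sum_i \delta_i (\log(2x_i) - 2\log\beta) - \sum_i x_i^2/\beta^2.$$

Setting

$$\frac{dl}{d\beta} = -\frac{2\sum_i \delta_i}{\beta} + \frac{2\sum_i x_i^2}{\beta^3} = 0,$$

one has $\hat{\beta} = \sqrt{\sum_i x_i^2 / \sum_i \delta_i} = \sqrt{334290/6} = 236.0403$.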

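(c) Obtain the standard error of $\hat{\beta}$.

Answer: The information is

$$I(\beta) = -\frac{d^2 l}{d\beta^2} = -\frac{2\sum_i \delta_i}{\beta^2} + \frac{6\sum_i x_i^2}{\beta^4}$$

thus $I(\hat{\beta}) = 4(\sum_i \delta_i)^2 / \sum_i x_i^2$, so the standard error is

$$\sqrt{1/I(\hat{\beta})} = \sqrt{\textstyle\sum_i x_i^2} \, / \, (2\textstyle\sum_i \delta_i) = \sqrt{334290}/(2 \cdot 6) = 48.18.$$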
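The closed-form answers to parts (b) and (c) can be cross-checked by maximizing the log likelihood numerically. This is a minimal sketch in base R; the search interval passed to `optimize` is an arbitrary choice made for the illustration.

```r
# Problem 3 data: observed times and event indicators (1 = death observed)
x     <- c(143, 188, 190, 206, 213, 220, 216, 244)
delta <- c(1, 1, 1, 1, 1, 1, 0, 0)

# Log likelihood for the hazard 2t / beta^2
loglik <- function(beta) {
  sum(delta * (log(2 * x) - 2 * log(beta))) - sum(x^2) / beta^2
}

# Closed-form MLE and standard error from the derivation
beta_hat <- sqrt(sum(x^2) / sum(delta))
se_beta  <- sqrt(sum(x^2)) / (2 * sum(delta))

# Numerical maximization over an assumed plausible range
opt <- optimize(loglik, interval = c(50, 1000), maximum = TRUE)

c(closed_form = beta_hat, numerical = opt$maximum, se = se_beta)
```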

4. [Methods qualifying exam, January 2011: use paper and pencil.] A clinical trial studies the survival of lung cancer patients, where the patients were randomly assigned to an experimental treatment or to a placebo. The following table reports survival times of patients in months. '+' indicates a censored observation.

             Survival times $t_i$, months      $\sum_i t_i$
Placebo      1  1  1+  2  2  3  3  3+          16
Treatment    1  2  2+  2  3  3  4  5  5+       27

(a) The researchers would like to obtain the Kaplan-Meier estimate of the survival function. Compute the estimate of the survival function for the treatment group at 2 months, $\hat{S}^{KM}_{trt}(2)$.

Answer:

$\hat{S}^{KM}_{trt}(2) = 1 \cdot (1 - 1/9) \cdot (1 - 2/8) = 0.667$

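(b) The researchers now assume that the survival function for each group follows an exponential distribution.

i. State the assumptions of the model, and interpret the parameters.

Answer: The model assumes a constant hazard over time within each group. The survival function is

$S_{placebo}(t) = e^{-\lambda_{placebo} \cdot t}$, $S_{trt}(t) = e^{-\lambda_{trt} \cdot t}$,

where $\lambda$ is the reciprocal of the expected survival time.

ii. State the estimated survival function for each group, according to the Maximum Likelihood estimation procedure.

Answer:

The MLE of $\lambda$ is $\hat{\lambda} = \sum_i \delta_i / \sum_i t_i$, with both sums taken over the patients in the group. Here $\hat{\lambda}_{placebo} = 6/16 = 0.375$ and $\hat{\lambda}_{trt} = 7/27 = 0.2592$, so $\hat{S}_{placebo}(t) = e^{-0.375 t}$ and $\hat{S}_{trt}(t) = e^{-0.2592 t}$.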

(c) To determine the effectiveness of the treatment, the researchers fit the accelerated failure time model given in the partial output below. In the output, the variable trt=0 for placebo, and trt=1 for the treatment group.

Call:

survreg(formula = Surv(time, obs) ~ trt, data = x, dist = "exponential")

Value Std. Error

(Intercept) 0.981 0.408

trt 0.369 0.556

Scale fixed at 1

Exponential distribution

Loglik(model)= -28.3 Loglik(intercept only)= -28.6

Chisq= 0.43 on 1 degrees of freedom, p= 0.51

i. State the model, and interpret the parameters.

Answer: An accelerated failure time model assumes that the covariates have the effect of adjusting time to event

$$S(t) = S_0(t \cdot e^{x'\beta}) = S_0(t \cdot e^{\beta_0 + \beta_1 \cdot trt}),$$

where $S_0(t)$ is the survival function of the standard exponential random variable.
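As a rough cross-check, the coefficients in the survreg output can be related to the group-wise exponential rates from part (b). The sketch below assumes survreg's parameterization, in which the linear predictor models log time, so that log(rate) = -(intercept + coefficient x trt). The standard errors use the large-sample approximation Var(log rate) ≈ 1/d, which is an assumption of this sketch rather than something stated in the solution.

```r
# Deaths and total follow-up per group, read off the table in Problem 4
d_placebo <- 6; total_placebo <- 16
d_trt     <- 7; total_trt     <- 27

rate_placebo <- d_placebo / total_placebo
rate_trt     <- d_trt / total_trt

# Under the assumed parameterization the coefficients are minus log-rates
b0 <- -log(rate_placebo)        # intercept: placebo group
b1 <- -log(rate_trt) - b0       # trt: difference between groups

# Approximate standard errors, assuming Var(log rate) is about 1/d
se_b0 <- sqrt(1 / d_placebo)
se_b1 <- sqrt(1 / d_placebo + 1 / d_trt)

round(c(b0 = b0, b1 = b1, se_b0 = se_b0, se_b1 = se_b1), 3)
```

The four values agree with the Value and Std. Error columns of the output to three decimals, which suggests that a positive trt coefficient corresponds to longer survival in the treatment group.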

ii. State the null and the alternative hypotheses regarding the treatment effect in terms of the model parameters, and conclude at the confidence level of 95% whether the treatment is effective.

Answer: To test for the effect of the treatment, we test $H_0: \beta_1 = 0$ vs $H_a: \beta_1 \ne 0$. The test statistic is

$z = 0.369/0.556 = 0.663 < z(1 - 0.05/2) = 1.96$

Therefore we fail to reject $H_0$: there is no evidence that the treatment is effective.

(d) Finally, the researchers fit the model below

Call:

coxph(formula = Surv(time, obs) ~ trt, data = x, method = "exact")

coef exp(coef) se(coef)

trt -0.8188 0.4410 0.7376

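i. State the model and the assumptions.

Answer: The model is specified in terms of the hazard function $h(t) = h_0(t) \exp(\beta_1 \cdot trt)$, where the hazard function is defined as $h(t) = f(t)/S(t)$ and $h_0(t)$ is the baseline hazard, here the hazard of the placebo group. No separate intercept appears because it is absorbed into $h_0(t)$. The model assumes that the hazards of the two groups are proportional over time.

ii. Test whether the treatment is effective. State the null and the alternative hypotheses, and your conclusions at the confidence level of 95%.

Answer: We test $H_0: \beta_1 = 0$ vs $H_a: \beta_1 \ne 0$. The test statistic is the coefficient divided by its standard error:

$z = -0.8188/0.7376 = -1.11 > -z(1 - 0.05/2) = -1.96$

Therefore we fail to reject $H_0$: there is no evidence that the treatment is effective.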
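Both Wald tests in this problem divide a coefficient by its standard error and compare the result with a standard normal. A minimal sketch in base R, using the coefficients and standard errors printed in the two outputs; the helper name `wald` is hypothetical.

```r
# Hypothetical helper: Wald z statistic and two-sided p-value
wald <- function(coef, se) {
  z <- coef / se
  c(z = z, p = 2 * pnorm(-abs(z)))
}

aft <- wald(0.369, 0.556)      # survreg: trt coefficient and Std. Error
cox <- wald(-0.8188, 0.7376)   # coxph: coef and se(coef), not exp(coef)

round(rbind(aft = aft, cox = cox), 3)
qnorm(1 - 0.05 / 2)            # critical value, about 1.96
```

In both models the absolute z statistic is below the critical value, so neither test rejects the null hypothesis at the 5% level.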

iii. Discuss how you can assess the plausibility of the assumption of “proportional hazard” for this dataset.

Answer: Under the model, the survival function satisfies

$S(t) = \exp(-H_0(t) \cdot e^{X\beta}) \Rightarrow \log[-\log S(t)] = \log[H_0(t)] + X\beta$

If the model is appropriate, the plots of $\log[-\log \hat{S}^{KM}_{placebo}(t)]$ and of $\log[-\log \hat{S}^{KM}_{trt}(t)]$ versus $t$ will produce two roughly parallel curves.

5. [Methods qualifying exam, August 2010: use paper and pencil.] A randomized clinical trial studies survival times of leukemia patients, where the patients are randomly assigned to an experimental treatment or to a placebo. The following table reports survival times of the patients in weeks, where '+' indicates a censored observation.

             Survival times $t_i$, $i = 1, \ldots, 16$                              $\sum_{i=1}^{16} t_i$
Treatment    10  10+  12  12+  13  14  17+  20  26  34+  35  37  38  39  40  40+    397
Placebo       7   9   9+  10   11  11+ 13   14  14+ 15   16  17  19  21+ 22  27+    235

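(a) Compute the Kaplan-Meier estimates of the survival at time 12 for both groups.

Answer:

The Kaplan-Meier estimates are

$$\hat{S}(t_i) = \hat{S}(t_{i-1}) \frac{n_i - d_i}{n_i}$$

$$\hat{S}_{treatment}(12) = \frac{15}{16} \cdot \frac{13}{14} = 0.8705$$

$$\hat{S}_{placebo}(12) = \frac{15}{16} \cdot \frac{14}{15} \cdot \frac{12}{13} \cdot \frac{11}{12} = 0.7403$$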

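(b) Compute the confidence intervals associated with the estimates of survival above.

Answer:

The variances of the log-estimates of survival are

$$\mathrm{var}[\log \hat{S}(t)] = \sum_{i: t_i \le t} \frac{d_i}{n_i (n_i - d_i)}$$

$$\mathrm{var}[\log \hat{S}_{treatment}(12)] = \frac{1}{16 \cdot 15} + \frac{1}{14 \cdot 13} = 0.009661172$$

$$\mathrm{var}[\log \hat{S}_{placebo}(12)] = \frac{1}{16 \cdot 15} + \frac{1}{15 \cdot 14} + \frac{1}{13 \cdot 12} + \frac{1}{12 \cdot 11} = 0.02291459$$

Therefore

$\mathrm{SE}[\log \hat{S}_{treatment}(12)] = 0.0982$ and $\mathrm{SE}[\log \hat{S}_{placebo}(12)] = 0.1513$

The corresponding confidence intervals are

Treatment: $\left(e^{\ln(0.8705) - 1.96 \cdot 0.0982}, \; e^{\ln(0.8705) + 1.96 \cdot 0.0982}\right) = (0.7180, 1)$, with the upper limit truncated at 1

Placebo: $\left(e^{\ln(0.7403) - 1.96 \cdot 0.1513}, \; e^{\ln(0.7403) + 1.96 \cdot 0.1513}\right) = (0.5503, 0.9958)$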
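The estimates and intervals in parts (a) and (b) depend only on the risk-set sizes and death counts at the death times up to week 12. A minimal sketch in base R; the helper name `km_ci` is hypothetical, and the risk sets are read off the table by hand.

```r
# Hypothetical helper: Kaplan-Meier estimate with a confidence interval
# built on the log scale and truncated to at most 1.
# n = number at risk, d = number of deaths, at each death time <= t
km_ci <- function(n, d, level = 0.95) {
  s  <- prod(1 - d / n)
  se <- sqrt(sum(d / (n * (n - d))))      # Greenwood SE of log S
  z  <- qnorm(1 - (1 - level) / 2)
  c(surv = s, se_log = se,
    lower = s * exp(-z * se), upper = min(1, s * exp(z * se)))
}

# Death times up to week 12: treatment at 10 and 12, placebo at 7, 9, 10, 11
trt <- km_ci(n = c(16, 14),         d = c(1, 1))
plc <- km_ci(n = c(16, 15, 13, 12), d = c(1, 1, 1, 1))

round(rbind(treatment = trt, placebo = plc), 4)
```

Because the code carries full precision through every step, the last digit of some values can differ slightly from the hand calculation, which rounds intermediate results.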

(c) Based on these results, does the treatment have an effect on the survival?

Answer:

The point estimate of the survival at time 12 is higher for the treatment, but the confidence intervals overlap, therefore we cannot make a definite conclusion. This is due in part to the non-parametric nature of the Kaplan-Meier estimate, which lacks power.

(d) Now assume that the survival function follows an exponential distribution. Compute model-based estimates of the survival at time 12 for both groups.

Answer:

The survival function is $S(t) = e^{-\lambda t}$. The MLE of $\lambda$ is $\hat{\lambda} = \sum_{i=1}^{16} \delta_i / \sum_{i=1}^{16} t_i$.

Therefore, $\hat{\lambda}_{treatment} = 11/397 = 0.02771$ and $\hat{\lambda}_{placebo} = 11/235 = 0.04681$, so $\hat{S}_{treatment}(12) = e^{-0.02771 \cdot 12} = 0.7171$ and $\hat{S}_{placebo}(12) = e^{-0.04681 \cdot 12} = 0.5702$.

(e) Discuss the assumptions underlying these two approaches, and the reasons for the discrepancy between these two estimates of survival.

Answer:

The Kaplan-Meier approach is non-parametric, and assumes that the survival is a step function. It is flexible in the representation of survival, but requires the estimation of many parameters and therefore lacks power, and does not allow adjustments for covariates.

The assumption of an exponential distribution requires the estimation of only one parameter and can be adjusted for covariates. However, it imposes a very restrictive assumption on the functional form of the survival function. This restriction is the main reason for the discrepancy of the results.
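The size of the discrepancy at week 12 can be seen by placing the two estimates side by side. A minimal sketch in base R, using the death counts and total follow-up times from the table; the variable names are illustrative.

```r
t0 <- 12

# Exponential model: rate = deaths / total follow-up time
d     <- c(treatment = 11,  placebo = 11)
total <- c(treatment = 397, placebo = 235)
lambda_hat <- d / total
s_exp <- exp(-lambda_hat * t0)

# Kaplan-Meier products from part (a)
s_km <- c(treatment = 15/16 * 13/14,
          placebo   = 15/16 * 14/15 * 12/13 * 11/12)

round(cbind(lambda_hat, s_exp, s_km, difference = s_km - s_exp), 4)
```

In both groups the exponential estimate is lower than the Kaplan-Meier estimate at week 12: a constant hazard fitted to the whole follow-up period implies more early deaths than were actually observed.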


6. [Methods qualifying exam, August 2008: use paper and pencil.] To assess the gender risk of death from coronary heart disease (CHD), a study was performed to control for genetic factors with twins consisting of a male and female. The age at which a male/female twin died of CHD was recorded. The dataset twin contains the following samples from this study.

gender = "male"
age     49  50  56  61  67  68  69  70  74  74  75  81
cens     1   0   1   0   0   0   1   1   1   1   1   1

gender = "female"
age     52  58  63  69  70  70  70  72  73  74  75  81
cens     0   0   1   1   0   1   1   0   1   1   0   1

(a) Calculate the Kaplan-Meier estimates of the survival functions for the male group.

Answer:

Let $n_i$ be the number of subjects at risk at a time just prior to $t_i$, $1 \le i \le 12$, and let $d_i$ be the number of deaths at $t_i$. The survival function is defined as $S(t) = P\{T > t\}$. Its Kaplan-Meier estimate is

$$\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$

and is

$t_i$            49      50      56      61      67      68      69      70      74      74      75      81
$n_i$            12      11      10       9       8       7       6       5       4       3       2       1
$d_i$ = cens      1       0       1       0       0       0       1       1       1       1       1       1
$\hat{S}(t_i)$  0.9167  0.9167  0.825   0.825   0.825   0.825   0.687   0.550   0.412   0.275   0.137   0.000
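The table can be reproduced with a cumulative product over the ordered observations. This sketch in base R assumes, as the table does, that the ages are already sorted and that the two deaths tied at age 74 are processed one at a time, which gives the same product as handling them jointly.

```r
# Male group from Problem 6: ages and event indicators (1 = death from CHD)
age  <- c(49, 50, 56, 61, 67, 68, 69, 70, 74, 74, 75, 81)
cens <- c(1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1)

n    <- length(age):1             # at risk just before each ordered observation
surv <- cumprod(1 - cens / n)     # censored rows contribute a factor of 1

km <- data.frame(t = age, n = n, d = cens, surv = round(surv, 4))
km
```

The estimate reaches 0 at age 81 because the largest observation in the male group is a death.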

(b) Although the evidence is substantial that males are at higher risk than females, the role of genetic factors versus the gender factor in CHD is still largely unknown. Using the output below, state the hypothesis that gender is a risk factor of death from CHD, and test the hypothesis while controlling for genetic factors.

> survdiff(Surv(age,cens)~gender,data=twin);

...

N Observed Expected (O-E)^2/E (O-E)^2/V

gender=female 12 7 7.91 0.105 0.299

gender=male 12 8 7.09 0.117 0.299

Answer:

The above code compares the hazard functions of the male and female populations using the log-rank test. The hazard function is defined as the instantaneous failure rate of individuals at risk

$$\lambda(t) = \lim_{\delta \to 0} \frac{P\{t \le T \le t + \delta \mid T \ge t\}}{\delta}$$

The test is non-parametric in that it does not assume a particular functional form of the distributions. It tests $H_0: \lambda_f(t) = \lambda_m(t)$ versus $H_a: \lambda_f(t) \ne \lambda_m(t)$.

From the approximation based on the hypergeometric distribution, the test statistic is $L = 0.299 < \chi^2(0.05, df = 1) = 3.84$. We fail to reject the null hypothesis and conclude that there is no evidence of different hazards of death between genders.
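The critical value and the p-value of the log-rank test follow from the chi-square distribution with 1 degree of freedom. A minimal sketch in base R, taking the statistic 0.299 from the (O-E)^2/V column of the output:

```r
logrank_stat <- 0.299                      # (O-E)^2/V from the survdiff output

crit    <- qchisq(0.95, df = 1)            # 5% critical value
p_value <- pchisq(logrank_stat, df = 1, lower.tail = FALSE)

c(statistic = logrank_stat, critical = crit, p = p_value)
```

The statistic is far below the critical value and the p-value is well above 0.05, in line with the conclusion above.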


(c) An accelerated failure time model is fitted to the data as shown below. Using the output below, predict the probability for a female with CHD to survive past 75.

> aftmod <- survreg(Surv(age,cens)~gender,dist="exponential",data=twin)

> summary(aftmod)

...

Value Std. Error z p

(Intercept) 4.772 0.378 12.625 1.53e-36

gendermale -0.174 0.518 -0.337 7.36e-01

...

Answer:

An accelerated failure time model assumes that the covariates have the effect of adjusting time to event. In other words,

$$S(t) = S_0(t \cdot e^{x'\beta}),$$

where $S_0(t)$ is the baseline survival function of a random variable $T_0$, i.e. $T_0 = T \cdot e^{x'\beta}$ or, equivalently,

$\log(T_0) \sim \log(T) + x'\beta$, and

$\log(T) \sim \log(T_0) - x'\beta$

The R code above specifies the baseline distribution as standard exponential, i.e. $T_0 \sim \mathrm{Exponential}(1)$, and $S_0(t) = e^{-t}$. However, survreg implements the model

$\log(T) \sim 1 \cdot \log(T_0) - x'(-\beta)$,

i.e. the coefficients it reports estimate $-\beta$. Therefore, writing $\hat{\beta}^{*}$ for the reported coefficients, the estimated survival is

$\hat{S}(75 \mid female) = S_0(75 \cdot e^{-x'\hat{\beta}^{*}}) = \exp(-75 \cdot e^{-(4.772 + (-0.174) \cdot 0)}) = 0.5301$
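The prediction can be wrapped in a small function of age and gender. A minimal sketch in base R, plugging in the two coefficients from the survreg output; the function name `surv_aft` is hypothetical.

```r
# Coefficients reported by survreg (exponential model, scale fixed at 1)
b0     <- 4.772     # (Intercept)
b_male <- -0.174    # gendermale

# Hypothetical helper: S(t | gender) = exp(-t * exp(-(b0 + b_male * male)))
surv_aft <- function(t, male) {
  exp(-t * exp(-(b0 + b_male * male)))
}

surv_aft(75, male = 0)   # female
surv_aft(75, male = 1)   # male, shown for comparison
```

The male probability comes out lower than the female one, consistent with the negative gendermale coefficient, although that coefficient is not significant in the output.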

(d) A proportional hazards model is fitted to the data as shown below. For a twin consisting of a male and female, what is the effect of gender on the survival?

> coxmod <- coxph(Surv(age,cens)~gender,data=twin)

> summary(coxmod)

...

coef exp(coef) se(coef) z p

gendermale 0.26 1.30 0.522 0.499 0.62

exp(coef) exp(-coef) lower .95 upper .95

gendermale 1.30 0.771 0.467 3.60

...

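Answer:

The Cox proportional hazards model specifies a multiplicative relationship of the baseline hazard $\lambda_0$ and of the covariates

$$\lambda(t) = \lambda_0(t) e^{x'\beta}$$

According to the model, the ratio of hazards for males and females is

$$\frac{\lambda_m(t)}{\lambda_f(t)} = \exp(\beta_{gendermale})$$

This ratio is estimated by $e^{0.26} = 1.30$. In other words, males have an estimated 30% higher hazard of death, although the 95% confidence interval (0.467, 3.60) includes 1, so the difference is not statistically significant.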
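The hazard ratio and its confidence interval are obtained by exponentiating the coefficient and its Wald limits. A minimal sketch in base R, using the rounded coef and se(coef) from the output, so the last digits can differ slightly from the printed interval:

```r
coef_male <- 0.26     # coef for gendermale
se_male   <- 0.522    # se(coef)

z  <- qnorm(0.975)
hr <- exp(coef_male)
ci <- exp(coef_male + c(-1, 1) * z * se_male)

round(c(hazard_ratio = hr, lower = ci[1], upper = ci[2]), 3)
```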

7. Consider the dataset mgus in the library survival (use ?mgus for information on the dataset). Our goal is to model the survival time of these patients using a Cox proportional hazards model (ignore variables pcdx and pctime, which have a large proportion of missing data).

Verify the plausibility of a proportional hazards model for this dataset. Select important predictors in the model, and their functional form. Evaluate the quality of fit. Interpret model parameters and discuss the results.

Answer: Cleaning up the data. We clean the data by removing pcdx and pctime from the dataset and deleting all the observations with missing values. In total, 10 variables and 187 observations are left in the data set.

library(survival)  # provides the mgus data, coxph and survfit

mgus <- mgus[-(5:6)]  # drop columns 5 and 6 (pcdx and pctime)

mgus <- na.omit(mgus)  # keep complete cases only

Checking the functional form of the predictors. We calculate the martingale residuals of the full model which includes all of the potential covariates, and plot the residuals versus each continuous predictor.

fit.full <- coxph(Surv(futime, death) ~ age + sex + dxyr + alb + creat + hgb + mspike, data=mgus)

rr <- resid(fit.full)

pdf("functionalForm.pdf", width=9, height=6)

par(mfrow=c(2,3))

for (i in c(2, 4, 7, 8, 9, 10)) {

plot(mgus[,i], rr, xlab=dimnames(mgus)[[2]][i], ylab="Residual", cex.lab=1.5)

lines(lowess(mgus[,i], rr), lty=1, lwd=2, col="red")

}

dev.off()

[Figure: martingale residuals of the full model plotted against age, dxyr, alb, creat, hgb, and mspike, one panel per predictor, each with a lowess smooth.]

Most of the relationships are roughly linear, indicating that the linear form of the predictors is appropriate. The measurements of creat are highly skewed to the right, and may have a slight curvature in the region where the bulk of the measurements are located. Therefore we'll take the log transformation of creat.

Verifying the plausibility of the assumption of proportional hazards. We now verify the plausibility of the assumption of proportional hazards. The assumption

$$\lambda(t) = \lambda_0(t) e^{x'\beta}$$

implies that

$S(t) = \exp(-\Lambda_0(t) e^{x'\beta})$ or, equivalently, $\log[-\log S(t)] = \log \Lambda_0(t) + x'\beta$

Therefore, parallel plots of the complementary log-log transformation of an empirical survival function $\log[-\log \hat{S}(t)]$ for different values of covariates indicate the plausibility of the assumption of proportional hazards. As an example, we show these plots for 3 predictors (similar plots should be made for all predictors in the model).

pdf("proportionalHazard.pdf", width=9, height=3)
par(mfrow=c(1,3))

# versus sex
fit.sex <- survfit(Surv(futime, death) ~ sex, data = mgus)
plot(fit.sex, mark.time=FALSE, fun='cloglog', xlab="Days", ylab="Log(-Log(Survival))")
title("By sex")

# versus age, split into three groups at the 33% and 66% quantiles
group <- 1*(mgus$age > quantile(mgus$age, .33)) + 1*(mgus$age > quantile(mgus$age, .66))
fit.age <- survfit(Surv(futime, death) ~ group, data = mgus)
plot(fit.age, mark.time=FALSE, fun='cloglog', xlab="Days", ylab="Log(-Log(Survival))")
title("By quantiles of age")

# versus log(creat), split the same way
group <- 1*(log(mgus$creat) > quantile(log(mgus$creat), .33)) +
  1*(log(mgus$creat) > quantile(log(mgus$creat), .66))
fit.creat <- survfit(Surv(futime, death) ~ group, data = mgus)
plot(fit.creat, mark.time=FALSE, fun='cloglog', xlab="Days", ylab="Log(-Log(Survival))")
title("By quantiles of log(creat)")

dev.off()

[Figure: Log(-Log(Survival)) versus Days (log scale) in three panels: By sex, By quantiles of age, By quantiles of log(creat).]

The plots show roughly parallel patterns for the intermediate values of survival time, but deviate from parallel patterns for low and high times. A drawback of these diagnostic plots is that they consider only one predictor variable at a time, and a non-parallel pattern can be due to confounding with other predictor variables. Overall, the assumption of proportional hazards should be made with caution. Here we will continue to pursue the proportional hazards model.

Variable selection. Consider the full model with variable log(creat).

fit.full <- coxph(Surv(futime, death) ~ age + sex + dxyr + alb + log(creat) + hgb + mspike,

data=mgus)

summary(fit.full)

coxph(formula = Surv(futime, death) ~ age + sex + dxyr + alb +

log(creat) + hgb + mspike, data = mgus)

coef exp(coef) se(coef) z Pr(>|z|)

age 0.064913 1.067067 0.008808 7.370 1.71e-13 ***

sexfemale -0.268277 0.764696 0.180980 -1.482 0.13825

dxyr 0.053137 1.054574 0.037069 1.433 0.15173

alb -0.223054 0.800071 0.221082 -1.009 0.31301

log(creat) 0.618538 1.856212 0.310884 1.990 0.04663 *

hgb -0.165536 0.847439 0.056729 -2.918 0.00352 **

mspike 0.068987 1.071422 0.211569 0.326 0.74437

exp(coef) exp(-coef) lower .95 upper .95

age 1.0671 0.9371 1.0488 1.0856

sexfemale 0.7647 1.3077 0.5363 1.0903


dxyr 1.0546 0.9483 0.9807 1.1340

alb 0.8001 1.2499 0.5187 1.2340

log(creat) 1.8562 0.5387 1.0093 3.4139

hgb 0.8474 1.1800 0.7583 0.9471

mspike 1.0714 0.9333 0.7077 1.6220

Rsquare= 0.369 (max possible= 0.999 )

Likelihood ratio test= 86.09 on 7 df, p=7.772e-16

Wald test = 81.04 on 7 df, p=8.438e-15

Score (logrank) test = 84.6 on 7 df, p=1.554e-15

We can consider a reduced model that eliminates all the non-significant variables, and compare the two models using the Likelihood Ratio test.

fit.reduced <- coxph(Surv(futime, death) ~ age + log(creat) + hgb, data=mgus)

summary(fit.reduced)

coxph(formula = Surv(futime, death) ~ age + log(creat) + hgb,

data = mgus)

coef exp(coef) se(coef) z Pr(>|z|)

age 0.067695 1.070038 0.008758 7.730 1.08e-14 ***

log(creat) 0.763505 2.145785 0.290075 2.632 0.00849 **

hgb -0.156690 0.854969 0.050914 -3.078 0.00209 **

exp(coef) exp(-coef) lower .95 upper .95

age 1.070 0.9345 1.0518 1.0886

log(creat) 2.146 0.4660 1.2153 3.7888

hgb 0.855 1.1696 0.7738 0.9447

Rsquare= 0.35 (max possible= 0.999 )

Likelihood ratio test= 80.66 on 3 df, p=0

Wald test = 74.63 on 3 df, p=4.441e-16

Score (logrank) test = 78.6 on 3 df, p=1.110e-16

The Likelihood Ratio test statistic is $L = 86.09 - 80.66 = 5.43$, which under $H_0$ follows a $\chi^2_4$ distribution. The p-value of the test is 0.24, therefore there is no evidence against the reduced model.
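The test can be reproduced from the two likelihood ratio statistics printed in the summaries. A minimal sketch in base R; the variable names are illustrative, and the inputs are the rounded values from the output, so the p-value is approximate.

```r
lr_full    <- 86.09; df_full    <- 7   # full model vs. null model
lr_reduced <- 80.66; df_reduced <- 3   # reduced model vs. null model

# Difference of the two statistics compares full vs. reduced
lrt <- lr_full - lr_reduced
df  <- df_full - df_reduced
p_value <- pchisq(lrt, df = df, lower.tail = FALSE)

c(statistic = lrt, df = df, p = p_value)
```

With the fitted objects at hand, `anova(fit.reduced, fit.full)` from the survival package should give the same comparison without rounding.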

Influence diagnostics. We consider the reduced model above, and plot changes in parameters of each predictor that follow deletions of individual observations.

pdf("influence.pdf", width=9, height=3)
par(mfrow=c(1,3))

# one column of dfbeta residuals per coefficient: age, log(creat), hgb
rr <- resid(fit.reduced, type='dfbeta')
plot(mgus$age, rr[,1], xlab="Age", ylab="Influence")
plot(log(mgus$creat), rr[,2], xlab="log(Creat)", ylab="Influence")
plot(mgus$hgb, rr[,3], xlab="Hgb", ylab="Influence")

dev.off()

[Figure: dfbeta influence values of the reduced model plotted against Age, log(Creat), and Hgb, one panel per predictor.]

The plots show 3 potentially influential observations, two for the parameter of age, and one for log(creat). These observations should be followed up.

Interpretation of the results.

• age: this is the most important predictor, as evidenced by the largest test statistic. Each extra year is associated with a 7% increase in risk. An additional decade corresponds to an exp(10 x 0.067695) = 1.96-fold increase in risk.

• log(creat): each one-unit increase of log(creat) is associated with a 2.146-fold increase in the patient's risk.

• hgb: each one-unit increase of hgb is associated with a 15% reduction in risk.
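The interpretations above all come from exponentiating a coefficient times the size of the change in the predictor. A minimal sketch in base R, using the coefficients and standard errors of the reduced model; the helper name `hr_change` is hypothetical.

```r
# Coefficients and standard errors from summary(fit.reduced)
coefs <- c(age = 0.067695, log_creat = 0.763505, hgb = -0.156690)
ses   <- c(age = 0.008758, log_creat = 0.290075, hgb = 0.050914)

# Hypothetical helper: hazard ratio for a k-unit increase, with 95% limits
hr_change <- function(name, k = 1) {
  z <- qnorm(0.975)
  est <- coefs[[name]]
  se  <- ses[[name]]
  c(hr = exp(k * est),
    lower = exp(k * (est - z * se)),
    upper = exp(k * (est + z * se)))
}

round(rbind(age_1yr   = hr_change("age"),
            age_10yr  = hr_change("age", k = 10),
            log_creat = hr_change("log_creat"),
            hgb       = hr_change("hgb")), 3)
```

Scaling the coefficient before exponentiating matters: the ten-year hazard ratio is the one-year ratio raised to the tenth power, not ten times the one-year increase.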
