Date post: | 15-Jan-2016 |
John B. Willett & Judith D. Singer
Harvard Graduate School of Education
Introducing discrete-time survival analysis ALDA, Chapter Eleven
“To exist is to change, to change is to mature” (Henri Bergson)
Chapter 11: Fitting basic discrete-time hazard models
Review basic descriptive statistics for discrete-time survival data (Ch 10)
• Life table
• Hazard function
• Survivor function
• Median lifetime
Specifying a suitable discrete-time hazard model (§11.1 & 11.2): both heuristic and formal representations
Fitting the discrete-time hazard model to data (§11.3): it turns out that it’s very easy to fit the model
Interpreting parameter estimates (§11.4): very different from growth modeling, but more similar to logistic regression
Displaying fitted hazard and survivor functions (§11.5): as in growth modeling, we’ll display fitted functions at prototypical predictor values
Comparing (nested) discrete-time hazard models using goodness-of-fit statistics (§11.6): methods for data analysis and model comparison
Illustrative example: Grade at first heterosexual intercourse
Sample: 180 middle school boys (all considered “at risk”)
Research design:
• Large panel study in which each boy was tracked from 7th through 12th grade
• By the end of data collection (at the end of 12th grade), n=126 (70%) had had sex
• The remaining n=54 (30%) were still virgins. These censored observations pose a challenge for data analysis.
Question predictor: PT, for parenting transition, a dichotomy indicating whether the boy lived with both biological parents during his early formative years (before 7th grade, when data collection began)
• 72 boys (40%) lived with both biological parents (PT=0)
• 108 boys (60%) experienced at least one parenting transition before 7th grade (PT=1)
Ultimately, we’ll also examine a continuous predictor, PAS, which assesses the parents’ level of antisocial behavior during the child’s formative years (also time-invariant: behavior before the study started).
• Because the original scale is arbitrary, scores have been standardized to a mean of 0 and an sd of 1
Data source: Deborah Capaldi & colleagues (1996) Child Development
(ALDA, Section 11.1, pp 358-360)
The life table: Summarizing the distribution of event occurrence over time
(ALDA, Section 10.1, pp 326-329)
[Life table figure. Columns: risk set; n censored in interval j; n experiencing the target event in interval j. J intervals, T = 7, 8, …, 12]
How might we summarize the distribution of event occurrence?
Assessing the conditional risk of event occurrence: The discrete-time hazard function
(ALDA, Section 10.2.1, pp 330-339)
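[Figure: estimated discrete-time hazard function, h(t) from 0.00 to 0.30, plotted against Grade (6 to 12)]
$\hat{h}(t_j) = \dfrac{n\ \text{events}_j}{n\ \text{at risk}_j}$
$\hat{h}(t_7) = 15/180 = 0.0833$
$\hat{h}(t_9) = 24/158 = 0.1519$
$\hat{h}(t_{12}) = 26/80 = 0.3250$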
Discrete-time hazard
Conditional probability that individual i will experience the target event in time period j ($T_i = j$), given that s/he didn’t experience it in any earlier time period ($T_i \ge j$):
$h(t_{ij}) = \Pr\{T_i = j \mid T_i \ge j\}$
• As a probability (only in discrete time), hazard is bounded by 0 and 1. This is an issue for modeling that we’ll need to address.
• Estimation is easy because each value of hazard is based on that interval’s risk set.
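The risk-set calculation can be sketched in a few lines of Python. This is only an illustration: the counts for grades 7, 9, and 12 are the ones shown on the slide, while the counts for grades 8, 10, and 11 are assumptions chosen to be consistent with the slide's totals (126 events, 54 boys censored at the end of 12th grade). The names `life_table` and `estimate_hazard` are hypothetical.

```python
# Life-table counts by grade: (number at risk, number of events).
# Grades 7, 9, and 12 come from the slide; grades 8, 10, and 11 are
# ASSUMED values consistent with the slide's totals, not quoted from it.
life_table = {
    7: (180, 15),
    8: (165, 7),
    9: (158, 24),
    10: (134, 29),
    11: (105, 25),
    12: (80, 26),
}


def estimate_hazard(table):
    """Return {period: events / at_risk}, one estimate per interval's risk set."""
    return {period: events / at_risk for period, (at_risk, events) in table.items()}


hazard = estimate_hazard(life_table)
for grade, h in hazard.items():
    print(f"grade {grade}: estimated hazard = {h:.4f}")
```

Each estimate uses only the boys still at risk in that grade, which is why censored boys contribute to every risk set up to the point of censoring.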
Cumulating risk over time: The survivor function (and median lifetime)
(ALDA, Section 10.2, pp 330-339)
[Figure: estimated survivor function, S(t) from 0.00 to 1.00, plotted against Grade (6 to 12)]
$\hat{S}(t_9) = 0.8778 \times [1 - 0.1519] = 0.7444$
Discrete-time survival probability
Probability that individual i will “survive” beyond time period j ($T_i > j$), i.e., will not experience the event until after time period j:
$S(t_{ij}) = \Pr\{T_i > j\}$
• Also a probability, bounded by 0 and 1.
• At the beginning of time, $S(t_{i0}) = 1.0$
• Strategy for estimation: Since $h(t_{ij})$ tells us about the probability of event occurrence, $1 - h(t_{ij})$ tells us about the probability of non-occurrence (i.e., about survival)
$\hat{S}(t_7) = 1.0 \times [1 - 0.0833] = 0.9167$
$\hat{S}(t_j) = \hat{S}(t_{j-1})\,[1 - \hat{h}(t_j)]$
Estimated median lifetime: ML = 10.6
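A minimal sketch of the recurrence and of the median lifetime, assuming linear interpolation between the two grades that bracket S(t) = 0.5. The hazard values for grades 8, 10, and 11 are assumptions (the slide prints only grades 7, 9, and 12), and the function names are hypothetical.

```python
def survivor_function(hazard):
    """Cumulate hazard into survival: S(t_j) = S(t_{j-1}) * (1 - h(t_j)), with S(t_0) = 1."""
    survival, s = {}, 1.0
    for period in sorted(hazard):
        s *= 1.0 - hazard[period]
        survival[period] = s
    return survival


def median_lifetime(survival):
    """Linearly interpolate the time at which S(t) crosses 0.5 (None if it never does)."""
    periods = sorted(survival)
    prev_t, prev_s = periods[0] - 1, 1.0
    for t in periods:
        s = survival[t]
        if s <= 0.5:
            return prev_t + (prev_s - 0.5) / (prev_s - s) * (t - prev_t)
        prev_t, prev_s = t, s
    return None


# Grades 7, 9, 12 are from the slide; grades 8, 10, 11 are ASSUMED values.
hazard = {7: 15 / 180, 8: 7 / 165, 9: 24 / 158, 10: 29 / 134, 11: 25 / 105, 12: 26 / 80}
survival = survivor_function(hazard)
ml = median_lifetime(survival)
print({grade: round(s, 4) for grade, s in survival.items()}, round(ml, 1))
```

Returning `None` when the survivor function never reaches 0.5 mirrors the situation in which a median lifetime cannot be estimated within the observed grades.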
Person-period data set:
• One row for every person-period until event occurrence or censoring (different from growth modeling)
• EVENT indicates either event occurrence or censoring
Converting a person-level data set into a person-period data set
(ALDA, Section 10.5.1, pp 351-354)
Person-level data set: one row per person
ID T CENSOR PT
193 9 0 1
126 12 0 1
407 12 1 0
ID 407 was censored,
remaining a virgin through 12th grade
ID 126 had sex in the 12th grade
ID 193 had sex in the 9th grade
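The conversion for these three boys might look like the following sketch. It assumes that EVENT is coded 1 only in the period in which the event occurs and 0 in every other row (so a censored boy has EVENT = 0 throughout), and that observation starts in grade 7; the function name `to_person_period` is hypothetical.

```python
def to_person_period(people, first_period=7):
    """Expand one row per person into one row per person-period.

    ASSUMPTION: EVENT = 1 only in the last observed period of an
    uncensored person; every other row gets EVENT = 0.
    """
    rows = []
    for person in people:
        for period in range(first_period, person["T"] + 1):
            is_last = period == person["T"]
            event = 1 if (is_last and person["CENSOR"] == 0) else 0
            rows.append(
                {"ID": person["ID"], "PERIOD": period, "EVENT": event, "PT": person["PT"]}
            )
    return rows


person_level = [
    {"ID": 193, "T": 9, "CENSOR": 0, "PT": 1},
    {"ID": 126, "T": 12, "CENSOR": 0, "PT": 1},
    {"ID": 407, "T": 12, "CENSOR": 1, "PT": 0},
]
person_period = to_person_period(person_level)
for row in person_period:
    print(row)
```

ID 193 contributes three rows (grades 7 to 9), while IDs 126 and 407 contribute six rows each; only the last row differs between an event in 12th grade and censoring at the end of 12th grade.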
Contemplating a DTSA model: Inspecting sample plots of within-group hazard and survivor functions
(ALDA, Section 11.1.1, pp 358-361)
Q’s to ask when examining sample hazard functions:
• What is the shape of each hazard function? Here, their shape is similar: both begin low and climb steadily over time.
• Does the relative level of hazard differ across groups? Here, hazard for boys with a parenting transition is consistently higher.
• Suggests partitioning variation in risk into:
  • A baseline profile of risk
  • A shift in risk corresponding to variation in the predictor
Q’s to ask when examining sample survivor functions:
• They tend to be less useful because they assess the predictor’s cumulative effect. Here, they tell us that the ML for boys with a PT is 10.0 vs. 11.7 when PT=0.
Note: reversal of relative rankings
We’re almost ready to go, but back to the bounded nature of hazard
As in regular regression, we use transformation to deal with hazard’s bounds: Understanding the effects of taking odds and logits
(ALDA, Section 11.1.2, pp 362-365)
[Figure, three panels plotted against Grade (6 to 12), each with one curve for boys with no early parenting transitions and one for boys with one or more early parenting transitions: estimated hazard, estimated odds, and estimated logit(hazard)]
$\text{odds} = \dfrac{\text{hazard}}{1 - \text{hazard}}$
$\text{logit} = \log(\text{odds}) = \log\left(\dfrac{\text{hazard}}{1 - \text{hazard}}\right)$
Facts about odds scale
• Symmetric about 1 (50/50)
• Effect most prominent when hazard is larger
• Easy to get back to raw hazard: $\text{hazard} = \dfrac{\text{odds}}{1 + \text{odds}}$
• But it’s still bounded below by 0 and it’s asymmetric (raw differences have different meanings depending upon the value of odds)
Facts about logit scale
• Not bounded at all, although you need to get used to negative values (whenever hazard < .50)
• Usually regularizes distance between hazard functions
  • Stretches distance between small values
  • Compresses distance between large values
• It’s easy to get back to raw hazard: $\text{hazard} = \dfrac{1}{1 + e^{-\text{logit}}}$
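The three scales and the conversions between them can be sketched as small helper functions. The function names are hypothetical; the example value 15/180 is the 7th-grade hazard from the life table.

```python
import math


def hazard_to_odds(h):
    """odds = hazard / (1 - hazard); defined for 0 <= h < 1."""
    return h / (1.0 - h)


def hazard_to_logit(h):
    """logit = log(odds); defined for 0 < h < 1."""
    return math.log(hazard_to_odds(h))


def odds_to_hazard(odds):
    """Invert the odds transformation: hazard = odds / (1 + odds)."""
    return odds / (1.0 + odds)


def logit_to_hazard(logit):
    """Invert the logit transformation: hazard = 1 / (1 + exp(-logit))."""
    return 1.0 / (1.0 + math.exp(-logit))


h7 = 15 / 180
print(hazard_to_odds(h7), hazard_to_logit(h7))
print(odds_to_hazard(hazard_to_odds(h7)), logit_to_hazard(hazard_to_logit(h7)))
```

A hazard of .50 maps to odds of 1 and a logit of 0, which is why logits are negative whenever hazard is below .50.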
[Figure: sample logit(hazard) plotted against Grade (6 to 12) for PT=0 and PT=1]
What population model might have generated these sample data?
Plotting sample hazard estimates and overlaying alternative hypothesized models
(ALDA, Section 11.1.1, pp 366-369)
• Flat population logit hazard, shifted when PT switches from 0 to 1
• Linear population logit hazard, shifted when PT switches from 0 to 1
• General population logit hazard, shifted when PT switches from 0 to 1
Three reasonable features of a population discrete-time hazard model
1. For each predictor value, there is a population logit-hazard function.
  • When the predictor(s)=0, we call it the “baseline” logit-hazard function.
2. Each population logit-hazard function is constrained to have the identical shape, regardless of predictor value.
  • This is an assumption, and it can (and will) be relaxed later.
3. The distance between each of these logit-hazard functions is identical in every time period.
  • Differences in predictor value only “shift” the logit-hazard function “vertically.”
  • This assumption can (and will) be relaxed later.
  • In the meantime, the magnitude of this shift is the magnitude of the predictor’s effect.
How do we specify a discrete-time hazard model that has these 3 features?
(ALDA, Section 11.2, pp 369-372)
Recode PERIOD into a set of TIME indicators
$\text{logit}\, h(t_{ij}) = [\alpha_7 D_7 + \alpha_8 D_8 + \cdots + \alpha_{12} D_{12}] + \beta_1 PT_i$
$\beta_1$: constant vertical shift in logit hazard associated with variation in PT
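The recoding of PERIOD into TIME indicators, and the way the indicators pick out one baseline parameter per grade, can be sketched as follows. The parameter values in `alpha` and `beta_pt` are hypothetical placeholders chosen only to make the example run; they are not estimates from the study.

```python
PERIODS = [7, 8, 9, 10, 11, 12]


def time_indicators(period):
    """Recode PERIOD into D7..D12: exactly one indicator equals 1 in any row."""
    return {f"D{p}": int(p == period) for p in PERIODS}


def logit_hazard(period, pt, alpha, beta_pt):
    """logit h = [alpha_7*D7 + ... + alpha_12*D12] + beta_1*PT."""
    d = time_indicators(period)
    baseline = sum(alpha[p] * d[f"D{p}"] for p in PERIODS)
    return baseline + beta_pt * pt


# HYPOTHETICAL parameter values, for illustration only.
alpha = {7: -3.0, 8: -3.5, 9: -2.5, 10: -2.0, 11: -1.5, 12: -1.0}
beta_pt = 0.9

print(time_indicators(9))
for grade in PERIODS:
    print(grade, logit_hazard(grade, 0, alpha, beta_pt), logit_hazard(grade, 1, alpha, beta_pt))
```

Because only one indicator is "on" in each row, the bracketed sum reduces to the single alpha for that grade, and there is no separate intercept.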
How does this model relate to the previous graph?
Carefully unpacking the discrete-time hazard model
(ALDA, Section 11.2.1, pp 372-376)
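$\text{logit}\, h(t_{ij}) = [\alpha_7 D_7 + \alpha_8 D_8 + \cdots + \alpha_{12} D_{12}] + \beta_1 PT_i$
[Figure: population logit(hazard) plotted against Grade (6 to 12) for PT = 0 and PT = 1. The PT = 0 curve passes through $\alpha_7$ (where D7=1), $\alpha_8$ (where D8=1), …, $\alpha_{12}$ (where D12=1); the PT = 1 curve lies $\beta_1$ above it in every grade]
When PT=0, you get the baseline logit hazard function
When PT=1, you shift this entire baseline vertically by $\beta_1$
And we can add predictors just as in regular (logistic) regression:
$\text{logit}\, h(t_{ij}) = [\alpha_7 D_7 + \alpha_8 D_8 + \cdots + \alpha_{12} D_{12}] + \beta_1 PT_i + \beta_2 PAS_i$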
How does this model behave when hazard is expressed in the other scales?
What does the DT hazard model look like when expressed on the other scales?
(ALDA, Section 11.2.2, pp 376-379)
[Figure, three panels plotted against Grade (6 to 12) for PT = 0 and PT = 1: logit(hazard), with the two curves separated by $\beta_1$; hazard, obtained via $\text{hazard} = 1/(1 + e^{-\text{logit}})$; and odds, obtained via $\text{odds} = e^{\text{logit}}$, with the two curves differing by a factor of $\exp(\beta_1)$]
• On the logit scale, the distance between functions is identical in every time period (assumption built into our model)
• On the odds scale, one function is a constant magnification (or diminution) of the other: they are proportional
• On the hazard scale, the functions have no constant relationship (we would need to use a complementary log-log transformation to get a proportional hazards model)
The “standard” DTSA model is a proportional odds model!
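A short numerical check of the three statements, assuming hypothetical baseline logits (round numbers chosen for illustration) and the PT shift of 0.8736 that is reported later in these slides.

```python
import math

# HYPOTHETICAL baseline logit hazards for grades 7..12 (illustration only).
baseline_logit = {7: -3.0, 8: -2.5, 9: -2.0, 10: -1.5, 11: -1.0, 12: -0.5}
beta_pt = 0.8736  # PT estimate reported later in the slides


def to_hazard(logit):
    return 1.0 / (1.0 + math.exp(-logit))


def to_odds(logit):
    return math.exp(logit)


logit_gap, odds_ratio, hazard_gap = {}, {}, {}
for grade, a in baseline_logit.items():
    logit_gap[grade] = (a + beta_pt) - a                       # constant difference
    odds_ratio[grade] = to_odds(a + beta_pt) / to_odds(a)      # constant ratio
    hazard_gap[grade] = to_hazard(a + beta_pt) - to_hazard(a)  # varies by grade

for grade in baseline_logit:
    print(grade, round(logit_gap[grade], 4), round(odds_ratio[grade], 4), round(hazard_gap[grade], 4))
```

The logit gap and the odds ratio are the same in every grade, while the gap in hazard changes with the level of the baseline, which is the sense in which the model is proportional in odds rather than in hazard.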
Fitting the model to data: Use logistic regression in the person-period data set
(ALDA, Section 11.3, pp 378-386)
All parameter estimates, standard errors, t- and z-statistics, goodness-of-fit statistics, and tests will be correct for the discrete-time hazard model
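Each model: outcome = TIME indicators + substantive predictors
Model A: $\text{logit}\, h(t_j) = [\alpha_7 D_7 + \alpha_8 D_8 + \cdots + \alpha_{12} D_{12}]$
Model B: $\text{logit}\, h(t_j) = [\alpha_7 D_7 + \alpha_8 D_8 + \cdots + \alpha_{12} D_{12}] + \beta_1 PT$
Model C: $\text{logit}\, h(t_j) = [\alpha_7 D_7 + \alpha_8 D_8 + \cdots + \alpha_{12} D_{12}] + \beta_2 PAS$
Model D: $\text{logit}\, h(t_j) = [\alpha_7 D_7 + \alpha_8 D_8 + \cdots + \alpha_{12} D_{12}] + \beta_1 PT + \beta_2 PAS$
The $\alpha$’s estimate the baseline logit hazard function
The $\beta$’s assess the effects of substantive predictors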
Strategies for interpreting the $\alpha$’s: ML estimates of the baseline hazard function
(ALDA, Section 11.4.1, pp 386-388)
Because there are no predictors in Model A, this baseline is for the entire sample
• If the $\hat{\alpha}$’s are approximately equal, the baseline is flat
• If the $\hat{\alpha}$’s decline, hazard declines
• If the $\hat{\alpha}$’s increase (as they do here), hazard increases
Simplifying interpretation by transforming back to odds and hazard
Because there are no substantive predictors, Model A’s estimates are the full sample estimates
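One way to see why Model A reproduces the full-sample estimates: with a separate indicator for every grade and no other predictors, the fitted hazard in each grade equals the sample proportion in that grade's risk set, so each estimated alpha is just the logit of the life-table hazard. The sketch below uses only the three risk sets printed in the hazard slide (grades 7, 9, and 12); the function name `alpha_hat` is hypothetical.

```python
import math

# (number at risk, number of events) for the grades shown on the hazard slide.
risk_sets = {7: (180, 15), 9: (158, 24), 12: (80, 26)}


def alpha_hat(at_risk, events):
    """Logit of the sample hazard: log(events / (at_risk - events))."""
    h = events / at_risk
    return math.log(h / (1.0 - h))


alphas = {grade: alpha_hat(n, e) for grade, (n, e) in risk_sets.items()}

for grade, a in alphas.items():
    odds = math.exp(a)                    # back to the odds scale
    hazard = 1.0 / (1.0 + math.exp(-a))   # back to the hazard scale
    print(f"grade {grade}: alpha = {a:.4f}, odds = {odds:.4f}, hazard = {hazard:.4f}")
```

Transforming each alpha back through the inverse logit recovers the life-table hazard for that grade, and the increasing alphas correspond to an increasing hazard.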
Strategies for interpreting the $\beta$’s: ML estimates of the substantive predictors’ effects
(ALDA, Section 11.4.2 & 11.4.3, pp 388-390)
Dichotomous predictors
As in regular logistic regression, antilogging a parameter estimate yields the estimated odds ratio associated with a 1-unit difference in the predictor:
$e^{\hat{\beta}_{PT}} = e^{0.8736} = 2.4$
The estimated odds of first intercourse for boys who have experienced a parenting transition are 2.4 times the odds for boys who did not experience such a transition.
Continuous predictors
Antilogging still yields an estimated odds ratio associated with a 1-unit difference in the predictor:
$e^{\hat{\beta}_{PAS}} = e^{0.4428} = 1.56$
The estimated odds of first intercourse for boys whose parents exhibited “1 unit more” of antisocial behavior are 1.56 times the odds for boys whose parental antisocial behavior was one unit lower.
Because odds ratios are symmetric about 1, you can also invert the odds ratios and change the reference group
• Estimated odds of first intercourse for boys who did not experience a parenting transition are 1/(2.40) = .42, or approximately 40% of the odds for boys who did
• Estimated odds of first intercourse for boys whose parents have “1 unit less” of antisocial behavior are 1/(1.56) = .641, or approximately 2/3rds of the odds for boys whose parents were 1 unit higher
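The antilogging arithmetic can be checked directly. The helper `odds_ratio` is a hypothetical name; the `units` argument illustrates how the same estimate scales for differences other than one unit (for example, a 2-sd difference in PAS).

```python
import math

beta_pt = 0.8736   # estimate for PT from the slide
beta_pas = 0.4428  # estimate for PAS from the slide


def odds_ratio(beta, units=1.0):
    """Estimated odds ratio for a difference of `units` in the predictor."""
    return math.exp(beta * units)


or_pt = odds_ratio(beta_pt)
or_pas = odds_ratio(beta_pas)
print(f"PT:  {or_pt:.2f}  inverted: {1 / or_pt:.2f}")
print(f"PAS: {or_pas:.2f}  inverted: {1 / or_pas:.2f}")
print(f"PAS, 2-unit difference: {odds_ratio(beta_pas, units=2):.2f}")
```

Inverting an odds ratio is the same as antilogging the negated coefficient, which is what changing the reference group amounts to.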
Displaying fitted hazard and survivor functions
Illustrating the general idea using Model B for a single dichotomous predictor
(ALDA, Section 11.5.1, pp 392-394)
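$\text{logit}\, \hat{h}(t_j) = \hat{\alpha}_j + \hat{\beta}_1 PT$
$\hat{h}(t_j) = \dfrac{1}{1 + e^{-\text{logit}\, \hat{h}(t_j)}}$
$\hat{S}(t_j) = \hat{S}(t_{j-1})\,[1 - \hat{h}(t_j)]$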
With a single dichotomous predictor, there are only 2 possible prototypical functions:
• PT=0 (for boys from stable homes with no parenting transitions before 7th grade)
• PT=1 (for boys who experienced one or more early parenting transitions)
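The three steps (fitted logit hazard, fitted hazard, fitted survival) can be sketched for the two prototypical boys. The PT coefficient 0.8736 is the estimate reported in these slides; the baseline values in `alpha_hat` are not shown in the slides and are illustrative assumptions, chosen to be consistent with the median lifetimes the slides report for this model. All function names are hypothetical.

```python
import math

# ASSUMED baseline logit hazards for grades 7..12 (not shown in the slides).
alpha_hat = {7: -2.9943, 8: -3.7001, 9: -2.2811, 10: -1.8226, 11: -1.6542, 12: -1.1791}
beta_pt = 0.8736  # PT estimate from the slides


def fitted_hazard(pt):
    """Step 1 and 2: logit = alpha_j + beta*PT, then hazard = 1 / (1 + exp(-logit))."""
    return {j: 1.0 / (1.0 + math.exp(-(a + beta_pt * pt))) for j, a in alpha_hat.items()}


def fitted_survival(hazard):
    """Step 3: S(t_j) = S(t_{j-1}) * (1 - h(t_j)), starting from S = 1."""
    survival, s = {}, 1.0
    for j in sorted(hazard):
        s *= 1.0 - hazard[j]
        survival[j] = s
    return survival


def median_lifetime(survival):
    """Linear interpolation of the time at which fitted survival crosses 0.5."""
    prev_t, prev_s = min(survival) - 1, 1.0
    for t in sorted(survival):
        if survival[t] <= 0.5:
            return prev_t + (prev_s - 0.5) / (prev_s - survival[t]) * (t - prev_t)
        prev_t, prev_s = t, survival[t]
    return None


ml = {pt: median_lifetime(fitted_survival(fitted_hazard(pt))) for pt in (0, 1)}
print({pt: round(value, 1) for pt, value in ml.items()})
```

With more predictors, the same steps apply; only the linear predictor in `fitted_hazard` would change.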
Displaying fitted hazard and survivor functions
(ALDA, Section 11.5.1, pp 392-394)
• Constant vertical separation of 0.8736 (the parameter estimate for PT)
• Easy to see the effect of PT
• Non-constant vertical separation (no simple interpretation because the model is proportional in odds, not hazard)
• Effect of PT cumulates into a large difference in estimated median lifetimes (9.9 vs. 11.8, a difference of about 2 years)
Displaying fitted hazard and survivor functions when some predictors are continuous
(ALDA, Section 11.5.1, pp 392-394)
As in growth modeling, select substantively interesting prototypical values and proceed just as you did for dichotomous predictors. Here, we’ll choose +/- 1 sd of PAS (low = -1, medium = 0, and high = +1).
[Figure, two panels plotted against Grade (6 to 12): fitted hazard and fitted survival probability. Each panel shows curves for PAS = -1, 0, and +1 within the group with no early parenting transitions and within the group with one or more early parenting transitions]
Estimated Median Lifetimes
PAS          PT=0    PT=1
Low (-1)     >12.0   10.7
Medium (0)   11.5    10.1
High (+1)    10.9    9.6
Comparing goodness of fit using deviance statistics and information criteria: The strategies are generally the same as in growth modeling
(ALDA, Section 11.6, pp 397-402)
TIME dummies
Deviance: smaller value, better fit; $\chi^2$ distribution; compare nested models
AIC, BIC: smaller value, better fit; compare non-nested models
Model B vs. Model A provides an uncontrolled test of $H_0$: $\beta_{PT} = 0$. $\Delta$Deviance = 17.30 (1 df), p < .001
Model C vs. Model A provides an uncontrolled test of $H_0$: $\beta_{PAS} = 0$. $\Delta$Deviance = 14.79 (1 df), p < .001
Model D vs. Models B & C provides controlled tests
[Both rejected as well]
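The p-values for the two 1-df deviance differences can be checked with the standard library alone. The sketch relies on the fact that a chi-square variate with 1 degree of freedom is a squared standard normal, so its upper-tail probability is erfc(sqrt(x/2)); this shortcut applies only to 1 df, and the function name is hypothetical.

```python
import math


def chi2_upper_tail_1df(x):
    """P(X > x) for a chi-square variate with 1 df: erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


# Deviance differences reported on the slide, each on 1 df.
tests = {"Model B vs. A (PT)": 17.30, "Model C vs. A (PAS)": 14.79}
p_values = {label: chi2_upper_tail_1df(diff) for label, diff in tests.items()}

for label, p in p_values.items():
    print(f"{label}: p = {p:.6f}")
```

For comparisons with more than 1 df, a general chi-square routine from a statistics library would be needed instead.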