Nonlinear Discrete-time Hazard Models for Entry into Marriage
Heather Turner, Andy Batchelor, David Firth
Department of Statistics, University of Warwick, UK
8th March 2010
Heather Turner, Andy Batchelor, David Firth University of Warwick
Nonlinear Discrete-time Hazard Models for Entry into Marriage
Motivating Application: The LII Survey
The Living in Ireland Surveys were conducted 1994-2001
For five 5-year cohorts of women born between 1950 and 1975 we have the following data:
- year of (first) marriage
- year and month of birth
- social class
- highest level of education attained
- year highest level of education was attained
When do women get married?
We can use methods from survival analysis to model the timing of marriage
Taking time to start from the legal age of marriage, the survival time T is the time until a person marries
The time of marriage is recorded to the nearest year, so we will use a discrete-time analysis
Discrete-time Hazard Models
In discrete time, the hazard of marriage occurring at time t is defined as
\[ h(t) = P(T = t \mid T \ge t) \]
We are interested in the shape of the hazard over the life course and how the hazard is affected by covariates
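As a minimal sketch of the definition h(t) = P(T = t | T >= t), the hazard at each age can be estimated as the number marrying at that age divided by the number still unmarried at the start of it. The counts below are invented purely for illustration (they are not LII data), and the variable names are assumptions.

```r
# Hypothetical counts, invented for illustration only (not LII data).
age     <- 16:20
at_risk <- c(1000, 990, 960, 900, 810)  # unmarried at the start of each year
married <- c(10, 30, 60, 90, 100)       # first marriages during that year

# Discrete-time hazard: h(t) = P(T = t | T >= t)
hazard <- married / at_risk

# Probability of still being unmarried at the end of each year
survival <- cumprod(1 - hazard)

print(data.frame(age, at_risk, married,
                 hazard = round(hazard, 4),
                 survival = round(survival, 4)))
```

With no censoring, the final survival value is simply the proportion of the original 1000 women still unmarried after age 20.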
Cox Proportional Odds Model
A popular choice is the proportional odds model proposed by Cox (JRSSB, 1972):
\[ \frac{h(t \mid x_{it})}{1 - h(t \mid x_{it})} = \frac{h_0(t)}{1 - h_0(t)} \exp(x_{it}'\beta) \]
where $h_0(t)$ is the baseline hazard
Taking logs we obtain
\[ \operatorname{logit}(h(t \mid x_{it})) = \operatorname{logit}(h_0(t)) + x_{it}'\beta = l_t + x_{it}'\beta \]
- semi-parametric: makes no assumption about the shape of the hazard function
Episode-splitting
A simple way to estimate the proportional odds model is to generate an event history for each observation
Pseudo observations are created at each time point from time 0 up to marriage or censoring; this is known as episode-splitting
The parameters in the proportional odds model can then be estimated by fitting a logistic regression model to a binary indicator of marriage at each time point (married = 1, unmarried = 0)
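The episode-splitting recipe can be sketched in base R as follows. This is an illustration under stated assumptions rather than the authors' code: the data are simulated, and the names `women`, `last_age`, `married`, `split_one` and `person_years` are all hypothetical.

```r
# Hypothetical data, simulated for illustration only (not the LII survey).
set.seed(1)
ages <- 16:30                                   # assumed observation window
true_hazard <- plogis(-4 + 0.15 * (ages - 16))  # assumed hazard shape

# One row per woman: age at marriage or censoring, and an event indicator
simulate_one <- function() {
  event <- rbinom(length(ages), 1, true_hazard) == 1
  if (any(event)) {
    c(last_age = ages[which(event)[1]], married = 1)
  } else {
    c(last_age = max(ages), married = 0)
  }
}
women <- as.data.frame(t(replicate(1000, simulate_one())))
women$id <- seq_len(nrow(women))

# Episode-splitting: one pseudo-observation per woman per year at risk,
# with y = 1 only in the year of marriage
split_one <- function(id, last_age, married) {
  age <- min(ages):last_age
  data.frame(id = id, age = age,
             y = as.integer(married == 1 & age == last_age))
}
person_years <- do.call(rbind, Map(split_one, women$id,
                                   women$last_age, women$married))

# Logistic regression with one intercept per age estimates the l_t of the
# proportional odds model; covariates would be added to the formula
fit <- glm(y ~ 0 + factor(age), family = binomial, data = person_years)
baseline_hazard <- plogis(coef(fit))
```

With no covariates, the fitted baseline hazard at each age reproduces the observed proportion marrying among those still at risk, which is a useful check that the splitting was done correctly.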
Cox Proportional Odds Model
[Figure: probability of marriage against age (years), ages 15 to 43.]
Sidenote: interval-censored data
A similar model can be obtained by assuming that the data are interval-censored observations of a continuous-time proportional hazards model
The coefficients in the model
\[ \operatorname{cloglog}(h(t \mid x_{it})) = l_t + x_{it}'\beta \]
are then the coefficients of the proportional hazards model
This relationship breaks down, however, if $l_t$ is replaced by a parametric function
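The link between the complementary log-log model and the proportional hazards model can be sketched with the standard argument below. It assumes the covariates are constant within each yearly interval; the baseline survivor function $S_0$ is notation introduced here and does not appear in the slides.

```latex
% Under proportional hazards, survival over the interval (t-1, t] is the
% baseline survival over that interval raised to the power exp(x'beta):
\[
  h(t \mid x_{it})
  = 1 - \frac{S(t \mid x_{it})}{S(t-1 \mid x_{it})}
  = 1 - \left\{ \frac{S_0(t)}{S_0(t-1)} \right\}^{\exp(x_{it}'\beta)}
\]
% Applying the complementary log-log transformation to both sides:
\[
  \log\{-\log(1 - h(t \mid x_{it}))\}
  = \log\left\{ -\log \frac{S_0(t)}{S_0(t-1)} \right\} + x_{it}'\beta
  = l_t + x_{it}'\beta
\]
```

Here $l_t$ absorbs the baseline survival over each interval, so $\beta$ is the same vector as in the continuous-time model.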
Blossfeld and Huinink Model
Blossfeld and Huinink (Am. J. Sociol., 1991) propose the following parametric baseline
\[ \operatorname{logit}(h_0(t \mid \mathrm{age}_{it})) = l(\mathrm{age}_{it}) = c + \beta_l \log(\mathrm{age}_{it} - 15) + \beta_r \log(45 - \mathrm{age}_{it}) \]
- describes the nature of the time dependence
- fixes the support of the hazard to be 15 to 45 years
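Because the endpoints 15 and 45 are fixed, the BH baseline is linear in log(age - 15) and log(45 - age) and can be fitted with an ordinary logistic regression. The sketch below uses simulated grouped data; the coefficient values, the constant number at risk and the helper `bh_peak` are assumptions made for illustration.

```r
# Hypothetical grouped data, simulated for illustration (not the LII survey).
set.seed(42)
age   <- 16:44                   # strictly inside the fixed support (15, 45)
lives <- rep(2000, length(age))  # assumed number at risk at each age
true_logit <- -15.8 + 2 * log(age - 15) + 3 * log(45 - age)  # assumed values
marriages  <- rbinom(length(age), lives, plogis(true_logit))

# BH baseline: linear in log(age - 15) and log(45 - age), so glm suffices
fit <- glm(marriages / lives ~ log(age - 15) + log(45 - age),
           family = binomial, weights = lives)

# Age at which the baseline peaks (for positive slopes): solve
# beta_l / (age - lower) = beta_r / (upper - age)
bh_peak <- function(beta_l, beta_r, lower = 15, upper = 45) {
  (upper * beta_l + lower * beta_r) / (beta_l + beta_r)
}
est <- unname(coef(fit))
peak_age <- bh_peak(est[2], est[3])
print(round(c(est, peak = peak_age), 3))
```

The peak formula shows that, with fixed endpoints, the location of the maximum hazard is determined entirely by the ratio of the two slopes.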
BH Model
[Figure: BH model. Probability of marriage against age (years), 10 to 50, with plotted points.]
Effect of Endpoints
[Figure: probability of marriage against age (years), 10 to 50, for two choices of hazard support: 15-45 years and 12-75 years.]
Nonlinear Discrete-time Hazard Model
An obvious extension of the BH model is to treat the endpoints as parameters
\[ l(\mathrm{age}_{it}) = c + \beta_l \log(\mathrm{age}_{it} - \alpha_l) + \beta_r \log(\alpha_r - \mathrm{age}_{it}) \]
- nonlinear: need to extend available software
- near-aliasing between parameters: need to reparameterise
Developing the Nonlinear Model
First analyse using the BH model as a reference
Then analyse using the extended model and illustrate near-aliasing
Finally analyse using a re-parameterised nonlinear discrete-time model
- compare to BH model
- refine model for the LII data
BH Models
The BH models can be fitted using the glm function in R.
Following the model building strategy of Blossfeld & Huinink (1991), we select
- a cohort factor
- a time-varying indicator of educational status (in/out)
For the 1970-1974 cohort the conditional odds of marriage are 24% of those for the 1950-1954 cohort
For women in education the conditional odds of marriage are 11% of those for women not in education
Selected BH Model
[Figure: selected BH model. Probability of marriage against age (years), 15 to 45, one curve per birth cohort: (1949,1954], (1954,1959], (1959,1964], (1964,1969], (1969,1974].]
Deviance = 12073, residual d.f. = 31001
Nonlinear Discrete-time Hazard Models
The nonlinear discrete-time hazard model is an example of a generalised nonlinear model, which can be fitted using the gnm package in R (Turner and Firth, R News, 2007)
- parameters estimated by a modified IWLS algorithm
- certain nonlinear terms inbuilt e.g. Mult, Exp
- our terms cannot be expressed in terms of these functions, so need to write custom "nonlin" function
Custom "nonlin" Function
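    library(gnm)  # provides gnm() and the "nonlin" term mechanism

    # Nonlinear term beta * log(age - alpha_l) (side = "left") or
    # beta * log(alpha_r - age) (side = "right"). The endpoint is written as
    # constraint -/+ exp(alpha), so it always lies outside the observed ages.
    LogExcess <- function(age, side = "left") {
        call <- sys.call()
        # Bound for the endpoint: just below the youngest observed age (left)
        # or just above the oldest observed age (right)
        constraint <- ifelse(side == "left",
                             min(age) - 1e-5, max(age) + 1e-5)
        list(predictors = list(beta = ~1, alpha = ~1),
             variables = list(substitute(age)),
             # Builds the term as text, e.g. for side = "left":
             #   beta * log(age - constraint + exp(alpha))
             term = function(predLabels, varLabels) {
                 paste(predLabels[1], " * log(",
                       " -"[side == "right"], varLabels[1], " + ",
                       " -"[side == "left"], constraint,
                       " + exp(", predLabels[2], "))")
             },
             call = as.expression(call))
    }
    class(LogExcess) <- "nonlin"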
Summary of Baseline Model
Call:
gnm(formula = marriages/lives ~ LogExcess(age, side = "left") +
    LogExcess(age, side = "right"), family = binomial, data = fulldata,
    weights = lives, start = c(-20, 3, 0, 3, 0))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.8098  -0.4441  -0.3224  -0.1528   4.0483

Coefficients:
                                     Estimate Std. Error z value Pr(>|z|)
(Intercept)                         -118.5395   201.6387  -0.588  0.55661
LogExcess(age, side = "left")beta      3.6928     1.1913   3.100  0.00194
LogExcess(age, side = "left")alpha    -0.1432     0.8935  -0.160  0.87267
LogExcess(age, side = "right")beta    24.8623    38.5743   0.645  0.51923
LogExcess(age, side = "right")alpha    4.0247     1.7376   2.316  0.02054

Std. Error is NA where coefficient has been constrained or is unidentified

Residual deviance: 12553 on 31004 degrees of freedom
AIC: 12748
Number of iterations: 76
Parameter Correlations
         c         βl        αl        βr        αr
c     1.00000
βl   -0.92563   1.00000
αl   -0.80861   0.95844   1.00000
βr   -0.99999   0.92688   0.80989   1.00000
αr   -0.99833   0.90319   0.77910   0.99808   1.00000
Example 'Recoil' Plot
[Figure, built up over three slides: probability of marriage against age, 10 to 50 years; the final overlay adds plotted points.]
Is Near-aliasing a Problem?
Extended model can still be used as baseline hazard
\[ \operatorname{logit}(h(t \mid x_{it})) = l(\mathrm{age}_{it}) + x_{it}'\beta \]
Near-aliasing will make models harder to fit, particularly with several covariates
Not all parameters are interpretable
Re-parameterizing the Nonlinear Model
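The nonlinear hazard model can be re-parameterized as follows:
\[
l(\mathrm{age}_{it}) = \gamma - \delta \left\{ (\nu - \alpha_l) \log\left( \frac{\nu - \alpha_l}{\mathrm{age}_{it} - \alpha_l} \right) + (\alpha_r - \nu) \log\left( \frac{\alpha_r - \nu}{\alpha_r - \mathrm{age}_{it}} \right) \right\}
\]
Both logarithms vanish at age = ν, so l(ν) = γ, and the two terms contribute slopes of +δ and -δ there, so the curve is flat at ν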
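A quick numerical check of the re-parameterisation, writing the baseline as γ - δ{(ν - αl) log((ν - αl)/(age - αl)) + (αr - ν) log((αr - ν)/(αr - age))}. Expanding the logs gives βl = δ(ν - αl), βr = δ(αr - ν) and c = γ - βl log(ν - αl) - βr log(αr - ν). The sketch below confirms that the two forms agree and that the curve peaks at ν with height γ on the logit scale. The parameter values are the rounded "original model" values shown on the recoil-plot slide and are used only as an example; the function names are assumptions.

```r
# Rounded values taken from the recoil-plot slide, for illustration only
gamma <- -2.09; nu <- 25.39; delta <- 0.34
alpha_l <- 14.17; alpha_r <- 100.66

# Re-parameterised baseline
l_new <- function(age) {
  gamma - delta * ((nu - alpha_l) * log((nu - alpha_l) / (age - alpha_l)) +
                   (alpha_r - nu) * log((alpha_r - nu) / (alpha_r - age)))
}

# Equivalent parameters of the original form c + b_l log(.) + b_r log(.)
beta_l <- delta * (nu - alpha_l)
beta_r <- delta * (alpha_r - nu)
c0     <- gamma - beta_l * log(nu - alpha_l) - beta_r * log(alpha_r - nu)
l_old  <- function(age) {
  c0 + beta_l * log(age - alpha_l) + beta_r * log(alpha_r - age)
}

age <- 15:45
same_curve <- isTRUE(all.equal(l_new(age), l_old(age)))

# Locate the maximum numerically: it should sit at nu with value gamma
peak <- optimize(l_new, interval = c(15, 45), maximum = TRUE)
print(c(same_curve = same_curve, peak_age = peak$maximum,
        peak_logit = peak$objective, peak_hazard = plogis(peak$objective)))
```

The implied slopes and intercept come out close to the estimates in the baseline model summary, with the differences attributable to rounding of the slide values.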
Interpretation of Parameters
The parameters of the new parameterisation have a more useful interpretation than before:
[Figure: probability of marriage against age (years), annotated with the left endpoint αL, the peak location ν, the right endpoint αR and the peak height expit(γ).]
New Parameter Correlations
         γ         ν         δ         αl        αr
γ     1.00000
ν     0.12956   1.00000
δ     0.21943  -0.69849   1.00000
αl    0.27236  -0.42848   0.91425   1.00000
αr    0.03231  -0.75428   0.93696   0.77910   1.00000
Table: Correlations between the estimated parameters of the reparameterized baseline model
Recoil Plots for Reparameterised Model
[Figure: five panels of probability of marriage against age, 10 to 50 years, each showing the original model, the perturbed model and the re-fitted model. Panels and perturbations: peak height (γ) -2.09 → -1.95; peak location (ν) 25.39 → 28; fall off (δ) 0.34 → 0.15; left endpoint (αL) 14.17 → 15.04; right endpoint (αR) 100.66 → 47.68.]
Analysis with the Reparameterised Model
We can now repeat the previous analysis using the nonlinear baseline hazard instead of the BH hazard function
- The model selection is qualitatively unchanged
- The residual deviance is reduced by about 20 at the expense of 2 d.f.
- There is a lot of uncertainty about the right end-point: in the final model it is estimated as 400 years with a large standard error.
Infinite Right End-point
It seems more appropriate to use a baseline hazard in which the right end-point tends to infinity:
\[
l(\mathrm{age}_{it}) = \gamma - \delta \left\{ (\nu - \alpha_l) \log\left( \frac{\nu - \alpha_l}{\mathrm{age}_{it} - \alpha_l} \right) + \mathrm{age}_{it} - \nu \right\}
\]
The linear term arises because $(\alpha_r - \nu) \log\{(\alpha_r - \nu)/(\alpha_r - \mathrm{age}_{it})\} \to \mathrm{age}_{it} - \nu$ as $\alpha_r \to \infty$
Re-fitting the final model with this baseline increases the deviance by a negligible amount
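The limit can be checked numerically. The sketch below compares the finite-endpoint baseline, written as γ - δ{(ν - αl) log((ν - αl)/(age - αl)) + (αr - ν) log((αr - ν)/(αr - age))}, with the infinite-endpoint baseline γ - δ{(ν - αl) log((ν - αl)/(age - αl)) + age - ν} for increasingly large right endpoints. The parameter values are rounded values from the recoil-plot slide, used only as an example, and the function names are assumptions.

```r
# Illustrative values only (rounded from the recoil-plot slide)
gamma <- -2.09; nu <- 25.39; delta <- 0.34; alpha_l <- 14.17

left_term <- function(age) {
  (nu - alpha_l) * log((nu - alpha_l) / (age - alpha_l))
}

# Baseline with a finite right endpoint alpha_r
l_finite <- function(age, alpha_r) {
  gamma - delta * (left_term(age) +
                   (alpha_r - nu) * log((alpha_r - nu) / (alpha_r - age)))
}

# Baseline with the right endpoint sent to infinity
l_infinite <- function(age) {
  gamma - delta * (left_term(age) + age - nu)
}

age <- 15:45
alpha_r_values <- c(50, 100, 400, 1e4, 1e6)

# Largest discrepancy on the logit scale over ages 15 to 45
gap <- sapply(alpha_r_values, function(a) {
  max(abs(l_finite(age, a) - l_infinite(age)))
})
print(data.frame(alpha_r = alpha_r_values, max_logit_gap = signif(gap, 3)))
```

The gap shrinks steadily as the right endpoint grows, which is consistent with the data carrying little information about a right endpoint far beyond the observed ages.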
Comparing Models
[Figure: two panels of probability of marriage against age (years), 15 to 45, one curve per birth cohort: (1949,1954], (1954,1959], (1959,1964], (1964,1969], (1969,1974].]
First panel: Deviance = 12073, residual d.f. = 31001
Second panel: Deviance = 12051, residual d.f. = 31000
Refining the Model
The model building strategy so far has been similar to Blossfeld and Huinink (1991) for comparison
Careful consideration of the fit of the model suggests that improvements can be made
Final Model with New Baseline
[Figure: probability of marriage against age (years), 15 to 45, one curve per birth cohort: (1949,1954], (1954,1959], (1959,1964], (1964,1969], (1969,1974].]
Deviance = 12051, residual d.f. = 31000
Cohort Effect
We can investigate the cohort effect further by replacing the cohort factor by a year-of-birth factor and plotting the resultant effects
[Figure: estimated year-of-birth effect (vertical axis from -2.5 to 0.0) against year of birth, 1955 to 1970.]
Year-of-birth Effect
The plot suggests a more appropriate model
\[ \theta \exp(\lambda(\mathrm{yrb}_i - 1950)) \]
Replacing the year-of-birth factor with this nonlinear term reduces the deviance by 19 whilst gaining 2 d.f.
Checking the Fit
The new year-of-birth term takes account of the effect of this factor on the magnitude of the hazard
To check for other effects on the hazard, we can group the data by year of age and cohort, then plot the corresponding observed and fitted proportions
Fit over Cohorts
[Figure: proportion married (0.00 to 0.20) against age (years), 15 to 45, one panel per cohort. Panel titles, with the count shown in each: (1949, 1954] (5211); (1955, 1959) (6283); (1959, 1964] (6560); (1965, 1969] (6289); (1969, 1974] (6666).]
Fit over Education Levels
[Figure: proportion married (0.00 to 0.20) against age (years), 15 to 45, one panel per education level. Panel titles, with the count shown in each: No attainment/primary (2366); Lower secondary (7900); Upper secondary (11507); College (4829); University (4407). Legend: Observed (points); Model 13 (common peak); Model 14 (separate peaks).]
Linear Dependence of Peak Location
Quantifying the education level by a dynamic measure of years in education, ed, we incorporate a linear dependence of peak location on ed:
\[
l(x_{it}) = \gamma - \delta \left\{ (\nu_0 + \nu_1 \mathrm{ed}_i - \alpha_l) \log\left( \frac{\nu_0 + \nu_1 \mathrm{ed}_i - \alpha_l}{\mathrm{age}_{it} - \alpha_l} \right) + \mathrm{age}_{it} - \nu_0 - \nu_1 \mathrm{ed}_i \right\}
\]
This results in a non-proportional hazards model
Years Post-Education
Checking the fit against years post-education:
[Figure: proportion married (0.00 to 0.15) against years post education, -10 to 30.]
- lower rate of increase in first 3 years post-education
- sharp change at 7 years post-education
- outlying points
Early Career Effect
The lower rate of increase during the first 3 years post-education may be explained by an early career effect
This can be incorporated in the model by including an appropriate indicator variable, significantly reducing the deviance
The deviance does not significantly increase when the left endpoint is constrained to 15 years
Effect of Education
Peak location varies from 20.78 years (primary education) to 26.89 years (university graduates)
[Figure: probability of marriage (0.00 to 0.20) against age (years), 10 to 50, one curve per education level: Primary, Lower sec., Upper sec., PLC, IT, University.]
Effect of Year-of-birth
Peak hazard varies from 0.17 (b. 1950) through 0.15 (b. 1960) to 0.07 (b. 1970)
[Figure: probability of marriage (0.00 to 0.20) against age (years), 10 to 50, one curve per year of birth: 1950, 1960, 1970.]
Summary
Estimating the support of the hazard function improves fit
Near-aliasing can occur in nonlinear models, but can be overcome by re-parameterisation
Our proposed model has more interpretable parameters, particularly location and magnitude of the maximum hazard
- can investigate effect of covariates on these features
The parametric form does impose some restrictions on the shape of the hazard curve
References
A comprehensive manual is distributed with the package at http://www.cran.r-project.org/package=gnm
A working paper on the marriage application is available at www.warwick.ac.uk/go/crism/research/2007
Acknowledgements
The data are from The Economic and Social Research Institute Living in Ireland Survey Microdata File (© Economic and Social Research Institute).
We gratefully acknowledge Carmel Hannan for introducing us to this application and providing background on the data.