OCTOBER 23, 2015
RESEARCH SEMINAR
POSTDOCTORAL RESEARCHER
ADAM HULMAN
AARHUS UNIVERSITY (AU)
DEPARTMENT OF PUBLIC HEALTH, SECTION FOR EPIDEMIOLOGY
LONGITUDINAL ANALYSES IN DIABETES EPIDEMIOLOGY
Methodological overview
OUTLINE
▪ Multilevel models
• Motivation
• Definition, model formulation
• Practical example
▪ Missing data
• Types, consequences
▪ Extensions of the multilevel framework
• Joint modeling
• Latent class trajectory analysis
MULTILEVEL MODELS
▪ The part on multilevel models is based on ALDA* by Singer & Willett
▪ Notation and examples are from ALDA
http://www.ats.ucla.edu/stat/examples/alda
*Applied Longitudinal Data Analysis
MOTIVATION
▪ Is there a need for longitudinal studies?
▪ If yes, do we need special statistical models to analyze the data?
QUESTIONS
▪ How does the outcome change over time within individuals?
▪ Which factors explain between-individual differences?
MULTILEVEL MODEL FORMULATION
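Level 1:
$$Y_{ij} = \pi_{0i} + \pi_{1i} \cdot \text{TIME}_{ij} + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$$
Growth parameters for individual i: $\pi_{0i}$, $\pi_{1i}$
▪ We assume linear change over time t
Level 2:
$$\pi_{0i} = \gamma_{00} + \zeta_{0i}$$
$$\pi_{1i} = \gamma_{10} + \zeta_{1i}$$
$$\begin{bmatrix} \zeta_{0i} \\ \zeta_{1i} \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{10} & \sigma_1^2 \end{bmatrix} \right)$$
Fixed effects ($\gamma_{00}$, $\gamma_{10}$): population average intercept & slope
Random effects ($\zeta_{0i}$, $\zeta_{1i}$): deviations from the population average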
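To make the two-level formulation concrete, the following is a minimal simulation sketch rather than anything from ALDA. All parameter values, the function name `simulate_growth`, and the three measurement times are illustrative assumptions. It draws the random effects from the level-2 distribution and then builds each individual's outcomes from the level-1 equation.

```python
import numpy as np


def simulate_growth(n_individuals=500, times=(0.0, 1.0, 2.0),
                    gamma00=1.0, gamma10=0.5,
                    sigma0=0.8, sigma1=0.4, rho01=-0.2, sigma_eps=0.6,
                    seed=1):
    """Simulate Y_ij from the linear two-level growth model (assumed values)."""
    rng = np.random.default_rng(seed)
    times = np.asarray(times, dtype=float)

    # Level 2: random effects (zeta_0i, zeta_1i) ~ N(0, G)
    sigma01 = rho01 * sigma0 * sigma1
    G = np.array([[sigma0 ** 2, sigma01],
                  [sigma01, sigma1 ** 2]])
    zeta = rng.multivariate_normal([0.0, 0.0], G, size=n_individuals)

    # Individual growth parameters: fixed effect + individual deviation
    pi0 = gamma00 + zeta[:, 0]
    pi1 = gamma10 + zeta[:, 1]

    # Level 1: Y_ij = pi_0i + pi_1i * TIME_ij + eps_ij
    eps = rng.normal(0.0, sigma_eps, size=(n_individuals, times.size))
    y = pi0[:, None] + pi1[:, None] * times[None, :] + eps
    return y, pi0, pi1


y, pi0, pi1 = simulate_growth()
print(y.shape)                 # one row per individual, one column per wave
print(pi0.mean(), pi1.mean())  # close to gamma00 and gamma10
```

The averages of the individual intercepts and slopes approximate the fixed effects, while their spread reflects the level-2 variance components.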
VARIANCE COMPONENTS
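$\sigma_\varepsilon^2$: unexplained within-person residual variance
$\sigma_0^2$: between-person residual variance in initial status
$\sigma_1^2$: between-person residual variance in rate of change
$\sigma_{01}$, $\sigma_{10}$: residual covariance between initial status and rate of change
$$Y_{ij} = \pi_{0i} + \pi_{1i} \cdot \text{TIME}_{ij} + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$$
$$\pi_{0i} = \gamma_{00} + \zeta_{0i}$$
$$\pi_{1i} = \gamma_{10} + \zeta_{1i}$$
$$\begin{bmatrix} \zeta_{0i} \\ \zeta_{1i} \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{10} & \sigma_1^2 \end{bmatrix} \right)$$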
ESTIMATION
▪ Full maximum likelihood vs. restricted maximum likelihood (ML / REML)
• Focus on fixed effects (ML) or variance components (REML)
COMPOSITE MODEL
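▪ Substitute level-2 equations into level-1
$$Y_{ij} = (\gamma_{00} + \zeta_{0i}) + (\gamma_{10} + \zeta_{1i}) \cdot \text{TIME}_{ij} + \varepsilon_{ij}$$
▪ Rearrange terms of the composite model
$$Y_{ij} = \gamma_{00} + \gamma_{10} \cdot \text{TIME}_{ij} + (\zeta_{0i} + \zeta_{1i} \cdot \text{TIME}_{ij} + \varepsilon_{ij})$$
Composite residual: $r_{ij} = \zeta_{0i} + \zeta_{1i} \cdot \text{TIME}_{ij} + \varepsilon_{ij}$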
EXAMINE COMPOSITE RESIDUALS
▪ For simplicity, we assume a balanced data set with three waves
▪ Is OLS appropriate?
OLS NOT CORRECT
▪ Between individuals
• Composite residuals are independent
▪ Within individuals
• Composite residuals are heteroscedastic
• Composite residuals are correlated
COVARIANCE STRUCTURE
▪ Block diagonal matrix
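A small sketch of the block diagonal structure for a balanced data set with three waves. The numbers in `block` are purely illustrative and not taken from any data set. Each individual contributes one within-person covariance block, and all entries linking different individuals are zero.

```python
import numpy as np

# Hypothetical within-person covariance block for three waves (illustrative numbers only)
block = np.array([[1.00, 0.58, 0.52],
                  [0.58, 1.04, 0.78],
                  [0.52, 0.78, 1.40]])

n_individuals = 4

# Kronecker product with the identity places one copy of the block per individual
# on the diagonal and zeros everywhere else (independence between individuals).
full_cov = np.kron(np.eye(n_individuals), block)

print(full_cov.shape)
print(full_cov[0:3, 3:6])  # covariance between individual 1 and individual 2
```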
COVARIANCE STRUCTURE
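▪ Unstructured
$$\Sigma_r = \begin{bmatrix} \sigma_{r_1}^2 & \sigma_{r_1 r_2} & \sigma_{r_1 r_3} \\ \sigma_{r_2 r_1} & \sigma_{r_2}^2 & \sigma_{r_2 r_3} \\ \sigma_{r_3 r_1} & \sigma_{r_3 r_2} & \sigma_{r_3}^2 \end{bmatrix}$$
▪ Compound symmetry (if slopes do not differ much between individuals)
$$\Sigma_r = \begin{bmatrix} \sigma^2 + \sigma_1^2 & \sigma_1^2 & \sigma_1^2 \\ \sigma_1^2 & \sigma^2 + \sigma_1^2 & \sigma_1^2 \\ \sigma_1^2 & \sigma_1^2 & \sigma^2 + \sigma_1^2 \end{bmatrix}$$
▪ Autoregressive (band diagonal)
$$\Sigma_r = \begin{bmatrix} \sigma^2 & \sigma^2\rho & \sigma^2\rho^2 \\ \sigma^2\rho & \sigma^2 & \sigma^2\rho \\ \sigma^2\rho^2 & \sigma^2\rho & \sigma^2 \end{bmatrix}$$
▪ …
Variance of the composite residual:
$$\sigma_{r_{ij}}^2 = \mathrm{Var}(\zeta_{0i} + \zeta_{1i} \cdot t_{ij} + \varepsilon_{ij}) = \sigma_\varepsilon^2 + \sigma_0^2 + 2\sigma_{01} t_{ij} + \sigma_1^2 t_{ij}^2$$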
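The variance formula for the composite residual can be checked numerically. The sketch below assumes three waves at times 0, 1 and 2 and made-up variance components; the helper name `composite_cov` is hypothetical. It builds the within-person covariance matrix implied by the multilevel model as Z G Zᵀ + σ_ε² I, where Z holds a column of ones and the measurement times, and compares its diagonal with the closed-form expression.

```python
import numpy as np


def composite_cov(times, sigma0_sq, sigma1_sq, sigma01, sigma_eps_sq):
    """Covariance matrix of r_ij = zeta_0i + zeta_1i * t_ij + eps_ij for one individual."""
    times = np.asarray(times, dtype=float)
    Z = np.column_stack([np.ones_like(times), times])  # random-effects design
    G = np.array([[sigma0_sq, sigma01],
                  [sigma01, sigma1_sq]])                # level-2 covariance
    return Z @ G @ Z.T + sigma_eps_sq * np.eye(times.size)


# Illustrative (assumed) values
times = np.array([0.0, 1.0, 2.0])
params = dict(sigma0_sq=0.64, sigma1_sq=0.16, sigma01=-0.06, sigma_eps_sq=0.36)

Sigma_r = composite_cov(times, **params)

# Closed-form variance: sigma_eps^2 + sigma_0^2 + 2 * sigma_01 * t + sigma_1^2 * t^2
diag_formula = (params["sigma_eps_sq"] + params["sigma0_sq"]
                + 2 * params["sigma01"] * times
                + params["sigma1_sq"] * times ** 2)

print(np.round(Sigma_r, 2))
print(np.allclose(np.diag(Sigma_r), diag_formula))
```

The diagonal changes with time (heteroscedasticity) and the off-diagonal entries are non-zero (within-person correlation), which is why OLS assumptions do not hold.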
GENERALIZED LEAST SQUARES
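▪ Allows more complex assumptions about the residuals than OLS (heteroscedasticity, correlation)
▪ Minimizes a weighted sum of squared residuals, with weights given by the inverse of the residual covariance matrix
▪ Some programs use iterative GLS (IGLS)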
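As a rough illustration of the GLS estimator, (XᵀΣ⁻¹X)⁻¹XᵀΣ⁻¹y, the sketch below treats the residual covariance matrix as known. In practice it has to be estimated, which is where iterative procedures come in. The data are simulated, and the covariance block, sample size and function name `gls` are assumptions made for this example.

```python
import numpy as np


def gls(X, y, Sigma):
    """Generalized least squares with a known residual covariance matrix Sigma."""
    Sigma_inv_X = np.linalg.solve(Sigma, X)   # Sigma^{-1} X without explicit inversion
    Sigma_inv_y = np.linalg.solve(Sigma, y)   # Sigma^{-1} y
    return np.linalg.solve(X.T @ Sigma_inv_X, X.T @ Sigma_inv_y)


rng = np.random.default_rng(0)
n = 200                                   # individuals (assumed)
times = np.array([0.0, 1.0, 2.0])         # three waves, balanced design

# Hypothetical within-person covariance block (illustrative numbers only)
block = np.array([[1.00, 0.58, 0.52],
                  [0.58, 1.04, 0.78],
                  [0.52, 0.78, 1.40]])

X = np.column_stack([np.ones(n * times.size), np.tile(times, n)])
Sigma = np.kron(np.eye(n), block)         # block diagonal covariance of all residuals

beta_true = np.array([1.0, 0.5])          # assumed intercept and slope
r = rng.multivariate_normal(np.zeros(times.size), block, size=n).ravel()
y = X @ beta_true + r

beta_hat = gls(X, y, Sigma)
print(beta_hat)
```

With Sigma set to the identity matrix, the same function reduces to OLS.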
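EXAMPLE FROM SINGER & WILLETT
▪ Alcohol use during adolescence (data and code at ALDA webpage)
• Unconditional means (Model A)
$$Y_{ij} = \gamma_{00} + \zeta_{0i} + \varepsilon_{ij}$$
• Unconditional growth (Model B)
$$Y_{ij} = (\gamma_{00} + \zeta_{0i}) + (\gamma_{10} + \zeta_{1i}) \cdot \text{TIME}_{ij} + \varepsilon_{ij}$$
• Model B + a time-invariant predictor (Model C)
$$Y_{ij} = (\gamma_{00} + \gamma_{01}\text{COA}_i + \zeta_{0i}) + (\gamma_{10} + \gamma_{11}\text{COA}_i + \zeta_{1i}) \cdot \text{TIME}_{ij} + \varepsilon_{ij}$$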
PRACTICAL
MISSING DATA
▪ Different reasons: late entry, intermittent missingness, loss to follow-up
▪ Can lead to unbalanced data
▪ Effect depends on the missing data mechanism (type of missingness)
• Missing completely at random (MCAR)
• Missing at random (MAR)
• Missing not at random (MNAR)
MCAR
▪ Missingness is completely unrelated to both the history and the current value of the outcome
▪ Examples:
• Lab equipment did not work
• Samples were lost
• Participants were mistakenly not invited for examination
▪ Less data collected (loss of efficiency, but no bias)
MAR
▪ Missingness depends on the history of the outcome, but not on the current value
▪ Example:
• Participants are no longer invited once they have reached a threshold (diabetes diagnosis based on FPG)
▪ Some methods might give biased estimates
[Figure: FPG over time t, with the diagnostic threshold at 7.0 mmol/l]
MAR
▪ ML-based methods give valid estimates under MAR
Figure from Rizopoulos D, CRC Press, 2012
MNAR
▪ Missingness depends on the current value of the outcome
▪ E.g. participant doesn’t show up because she …
▪ MAR vs. MNAR?
▪ Dropout event should be modelled simultaneously (joint modeling, JM)
[Figure: FPG over time t, with the diagnostic threshold at 7.0 mmol/l]
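The three mechanisms can be contrasted in a small simulation. This is a sketch under assumed values: FPG at two visits with made-up means and standard deviations, a 30% MCAR rate, and the 7.0 mmol/l threshold applied either to the previous value (MAR) or to the current value (MNAR). It compares the naive complete-case mean at the second visit with the mean that would have been seen without any missingness.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
threshold = 7.0  # mmol/l

# Assumed data-generating model: FPG rises by 0.3 mmol/l on average between visits
fpg1 = rng.normal(6.0, 0.8, n)
fpg2 = fpg1 + 0.3 + rng.normal(0.0, 0.5, n)

# True where the second measurement is observed
observed = {
    "MCAR": rng.random(n) > 0.3,   # 30% lost for unrelated reasons
    "MAR": fpg1 < threshold,       # depends on the previous (observed) value
    "MNAR": fpg2 < threshold,      # depends on the current (unobserved) value
}

full_mean = fpg2.mean()
cc_means = {name: fpg2[mask].mean() for name, mask in observed.items()}

print(f"no missingness: {full_mean:.2f}")
for name, value in cc_means.items():
    print(f"{name}: {value:.2f}")
```

Under MCAR the complete-case mean stays close to the full-data mean, while under MAR and MNAR it is pulled downwards, because individuals with high values are selectively removed. The naive mean is only one example of a method that can be biased under MAR.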
EXTENSIONS OF THE MULTILEVEL MODEL
▪ Joint models for longitudinal and survival data
▪ Latent class trajectory analyses
JOINT MODELING
▪ Combines trajectory and survival analyses (continuous biomarker closely related to an event)
▪ Survival analysis, but taking the entire history of a biomarker into account (also for endogenous variables)
IN RELATION TO MULTILEVEL MODELS
▪ JM gives valid estimates in case of MNAR
▪ There is no statistical test to decide between MAR and MNAR…
IN RELATION TO COX REGRESSION
▪ Common approach: use the baseline or the last available value in a Cox model
▪ Time-varying Cox regression
• Problematic for endogenous (internal) variables
• Does not take measurement error into account (unrealistic)
• Underestimates the true association (theoretical + simulation results)
Figure from Rizopoulos D, CRC Press, 2012
PARAMETERIZATION OF JM
▪ Link between longitudinal and survival model
▪ Association between biomarker and event hazard
• Value (1)
• Lagged (2)
• Slope (3)
• Cumulative (4)
• …
• Any function of m(t)
[Figure: four panels (1-4) illustrating the association structures]
Figure from Rizopoulos D, CRC Press, 2012
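The four association structures can be written out as hazard models. The notation below is an assumption made for illustration and does not appear on the slide: $h_0(t)$ is a baseline hazard, $w_i$ are baseline covariates with coefficients $\gamma$, $m_i(t)$ is the true (error-free) biomarker value from the longitudinal submodel, $\alpha$ terms are association parameters, and $c$ is a chosen lag.

```latex
\begin{align*}
\text{(1) Value:} \quad & h_i(t) = h_0(t)\exp\{\gamma^\top w_i + \alpha\, m_i(t)\} \\
\text{(2) Lagged:} \quad & h_i(t) = h_0(t)\exp\{\gamma^\top w_i + \alpha\, m_i(t - c)\} \\
\text{(3) Slope:} \quad & h_i(t) = h_0(t)\exp\{\gamma^\top w_i + \alpha_1 m_i(t) + \alpha_2 m_i'(t)\} \\
\text{(4) Cumulative:} \quad & h_i(t) = h_0(t)\exp\Big\{\gamma^\top w_i + \alpha \int_0^t m_i(s)\,ds\Big\}
\end{align*}
```

With a linear multilevel submodel, $m_i(t) = \pi_{0i} + \pi_{1i} t$, so the slope term $m_i'(t)$ reduces to the individual rate of change $\pi_{1i}$.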
DYNAMIC PREDICTION
Figure from Rizopoulos D, CRC Press, 2012
LATENT CLASS ANALYSES
▪ Is it sufficient to look at only mean trajectories?
LATENT CLASS ANALYSES
▪ A single mean trajectory does not always fit the data well
▪ Potential underlying heterogeneity, but not based on predefined groups
▪ “Cluster-type” analysis
▪ Model formulation is similar to what we saw previously for multilevel models, but we have to specify a priori
• which effects might vary between classes (not necessarily all)
• the number of classes
MODELING STEPS
▪ Common strategy: fit models with 1, 2, 3, … classes
▪ Choose the model with the lowest BIC that shows meaningful patterns and sufficient class sizes (e.g. >5%)
▪ This results in coefficient estimates for each class (some coefficients might be shared)
▪ We also get class membership probabilities, so that we can assign individuals in our sample to a pattern
▪ Comparison of class characteristics
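The selection rule can be sketched in a few lines. The log-likelihoods, parameter counts and class shares below are invented for illustration and do not come from any fitted model. The function names and the choice of the number of individuals as the sample size in the BIC are also assumptions. Whether the patterns are meaningful remains a judgment that code cannot make.

```python
import math


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: lower is better."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def choose_model(candidates, n_obs, min_share=0.05):
    """Lowest BIC among models whose smallest class exceeds min_share."""
    eligible = [c for c in candidates if min(c["class_shares"]) > min_share]
    return min(eligible, key=lambda c: bic(c["log_lik"], c["n_params"], n_obs))


def assign_class(posterior_probs):
    """Assign an individual to the class with the highest membership probability."""
    return max(range(len(posterior_probs)), key=posterior_probs.__getitem__)


# Hypothetical fit summaries for models with 1 to 4 classes (made-up numbers)
candidates = [
    {"n_classes": 1, "log_lik": -2500.0, "n_params": 6, "class_shares": [1.0]},
    {"n_classes": 2, "log_lik": -2400.0, "n_params": 10, "class_shares": [0.70, 0.30]},
    {"n_classes": 3, "log_lik": -2370.0, "n_params": 14, "class_shares": [0.60, 0.30, 0.10]},
    {"n_classes": 4, "log_lik": -2355.0, "n_params": 18, "class_shares": [0.58, 0.30, 0.09, 0.03]},
]
n_individuals = 500

best = choose_model(candidates, n_individuals)
print(best["n_classes"])

# Membership probabilities of one hypothetical individual in the chosen model
print(assign_class([0.10, 0.70, 0.20]))  # index of the most likely class
```

In this made-up example the four-class model has the lowest BIC, but its smallest class covers only 3% of the sample, so the three-class model is selected.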
EXAMPLE
▪ Oral glucose tolerance test (OGTT) with multiple glucose measurements

Study                  Inter99      CPH          Hoorn
N (participants)       118          238          185
N (measurements)       9            5            6
Men (%)                61           64           48
Age, median (Q1-Q3)    56 (46-61)   56 (38-66)   54 (48-59)

▪ Plasma glucose peaks between 30 and 75 min at 7–12.5 mmol/l
Thank you for your attention!