Post on 01-Apr-2015

Multilevel Modeling using Stata

Andrew Hicks
CCPR Statistics and Methods Core

Workshop based on the book:

Multilevel and Longitudinal Modeling Using Stata (Second Edition)

by Sophia Rabe-Hesketh and Anders Skrondal

[Figure: Mini Wright measurements (vertical axis, 200 to 700) for subject IDs 1 to 17 (horizontal axis), with separate markers for Occasion 1 and Occasion 2]

Within-Subject Dependence

Within-Subject Dependence: We can predict the occasion 2 measurement if we know the subject’s occasion 1 measurement.

Between-Subject Heterogeneity: There are large differences between subjects (compare subjects 9 and 15).

Within-subject dependence is due to between-subject heterogeneity.
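The last point can be illustrated with a small simulation. This is only a sketch on made-up data, not the workshop's measurements: the variable names `id`, `zeta`, `wm1`, `wm2`, the seed, and all numeric values are assumptions chosen for illustration. Both occasions share the same subject-level term `zeta`, and that shared term is what makes occasion 2 predictable from occasion 1.

```stata
* Simulated stand-in data (assumption): 17 subjects in wide format
clear
set seed 2015
set obs 17
generate id = _n

* Subject-specific level: the source of between-subject heterogeneity
generate zeta = rnormal(0, 100)

* Two occasions per subject, each with its own small measurement error
generate wm1 = 450 + zeta + rnormal(0, 20)
generate wm2 = 450 + zeta + rnormal(0, 20)

* The two occasions are correlated only because they share zeta
correlate wm1 wm2
scatter wm2 wm1
```

Shrinking the standard deviation of `zeta` toward zero in this sketch removes the heterogeneity, and the correlation between the two occasions should shrink with it.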

Standard Regression Model

$$ y_{ij} = \beta + \xi_{ij} $$

$y_{ij}$: measurement on occasion i for subject j

$\beta$: population mean

$\xi_{ij}$: residuals (error terms), independent over subjects and occasions

This model clearly ignores information about within-subject dependence.

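Variance Component Model

Standard regression model:

$$ y_{ij} = \beta + \xi_{ij} $$

Splitting the residual into two components, $\xi_{ij} = \zeta_j + \epsilon_{ij}$:

$$ y_{ij} = \beta + \zeta_j + \epsilon_{ij} $$

$\zeta_j$: random intercept, the deviation of subject j’s mean from the overall mean

$\epsilon_{ij}$: within-subject residual, the deviation of observation i from subject j’s mean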

Variance Component Model

[Figure: for subject j, the overall mean $\beta$, the random intercept $\zeta_j$, the subject mean $\beta + \zeta_j$, and the within-subject residuals $\epsilon_{1j}$ and $\epsilon_{2j}$]

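Variance Component Model

$$ y_{ij} = \beta + \zeta_j + \epsilon_{ij}, \qquad \zeta_j \sim N(0, \psi), \qquad \epsilon_{ij} \sim N(0, \theta) $$

$$ \mathrm{Var}(y_{ij}) = \underbrace{\mathrm{Var}(\beta)}_{0} + \underbrace{\mathrm{Var}(\zeta_j)}_{\psi} + \underbrace{\mathrm{Var}(\epsilon_{ij})}_{\theta} $$

$$ \mathrm{Var}(y_{ij}) = \psi + \theta $$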
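The same decomposition makes the within-subject dependence explicit. As a short worked step, write $\psi = \mathrm{Var}(\zeta_j)$ and $\theta = \mathrm{Var}(\epsilon_{ij})$, and assume (as the model does) that $\zeta_j$ and $\epsilon_{ij}$ are independent of each other and across subjects and occasions. The covariance between two measurements $i \neq i'$ on the same subject j is then:

```latex
\begin{aligned}
\mathrm{Cov}(y_{ij}, y_{i'j})
  &= \mathrm{Cov}(\beta + \zeta_j + \epsilon_{ij},\; \beta + \zeta_j + \epsilon_{i'j}) \\
  &= \mathrm{Var}(\zeta_j) + \mathrm{Cov}(\zeta_j, \epsilon_{i'j})
     + \mathrm{Cov}(\epsilon_{ij}, \zeta_j) + \mathrm{Cov}(\epsilon_{ij}, \epsilon_{i'j}) \\
  &= \psi + 0 + 0 + 0 \\
  &= \psi
\end{aligned}
```

Measurements on different subjects share no random term, so their covariance is 0. The shared $\zeta_j$ is therefore the only source of dependence in this model.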

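Variance Component Model

$$ y_{ij} = \beta + \zeta_j + \epsilon_{ij} $$

Proportion of total variance due to subject differences:

$$ \rho = \frac{\mathrm{Var}(\zeta_j)}{\mathrm{Var}(y_{ij})} = \frac{\psi}{\psi + \theta} $$

Intraclass correlation: the within-cluster correlation, i.e. the correlation between two measurements on the same subject:

$$ \mathrm{Cor}(y_{ij}, y_{i'j}) = \rho $$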
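A minimal sketch of how the variance components and the intraclass correlation might be estimated in Stata follows. The data are simulated stand-ins, so the variable names `id`, `zeta`, `occasion`, the seed, and the numeric values are assumptions for illustration; only the response name `wm` appears in the slides. The sketch uses `mixed`, which is available from Stata 13 onward (earlier releases use `xtmixed` with the same model syntax).

```stata
* Simulated stand-in data (assumption): 17 subjects, 2 occasions, long format
clear
set seed 2015
set obs 17
generate id = _n
generate zeta = rnormal(0, 100)
expand 2
bysort id: generate occasion = _n
generate wm = 450 + zeta + rnormal(0, 20)

* Variance component model: random intercept for id, no covariates
mixed wm || id:, mle

* Intraclass correlation computed from the estimated variances
estat icc
```

In the `mixed` output, the random-effects table reports the estimated intercept variance (the estimate of $\psi$) and the residual variance (the estimate of $\theta$). `estat icc` combines them as $\psi / (\psi + \theta)$.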

Random or Fixed Effect?

Since every subject has a different effect, we can think of subjects as a categorical explanatory variable. Since the effect of each subject is random, we have been using a random effect model:

$$ y_{ij} = \beta + \zeta_j + \epsilon_{ij}, \qquad \zeta_j \sim N(0, \psi) $$

What if we want to fix our model so that each effect is for a specific subject? Then we would use a fixed effect model:

$$ y_{ij} = \beta + \alpha_j + \epsilon_{ij}, \qquad \sum_j \alpha_j = 0 $$

. xtreg wm, fe
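As a rough sketch of how the two specifications can be fitted side by side, the commands below run on simulated stand-in data rather than the workshop's measurements. The variable names `id` and `zeta`, the seed, and all numeric values are assumptions made for illustration; only `wm` and `xtreg wm, fe` come from the slides.

```stata
* Simulated stand-in data (assumption): 17 subjects, 2 occasions each
clear
set seed 2015
set obs 17
generate id = _n
generate zeta = rnormal(0, 100)
expand 2
generate wm = 450 + zeta + rnormal(0, 20)

* Declare the cluster (panel) variable before using xtreg
xtset id

* Random-effects model by maximum likelihood: subject effects are random draws
xtreg wm, mle

* Fixed-effects model: a separate fixed effect for each subject in this dataset
xtreg wm, fe
```

In the random-effects output, `sigma_u` estimates $\sqrt{\psi}$ and `sigma_e` estimates $\sqrt{\theta}$.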

Random or Fixed Effect?

random effect model:

if the interest concerns the population of clusters

“generalize the potential effect”, e.g. the nurse giving the drug

fixed effect model:

if we are interested in the “effect” of the specific clusters in a particular dataset

“replicable in life”, e.g. the actual drug

Random Intercept Model with Covariates

without covariates:

$$ y_{ij} = \beta + \xi_{ij} $$

$$ y_{ij} = \beta + \zeta_j + \epsilon_{ij} $$

Random Intercept Model with Covariates

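with covariates:

$$ y_{ij} = \beta_1 + \beta_2 x_{2ij} + \dots + \beta_p x_{pij} + \xi_{ij} $$

$$ y_{ij} = \beta_1 + \beta_2 x_{2ij} + \dots + \beta_p x_{pij} + \zeta_j + \epsilon_{ij} $$

$\zeta_j$: a random parameter that is not estimated along with the fixed parameters $\beta_1, \dots, \beta_p$, but whose variance $\psi$ is estimated together with the variance $\theta$ of $\epsilon_{ij}$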
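The distinction between the fixed parameters and the variance of the random intercept shows up directly in the estimation output. The sketch below is an illustration on simulated data: the names `id`, `zeta`, `x2`, `y`, the seed, and the true values used to generate the data are all assumptions, not part of the workshop material. It uses `mixed` (Stata 13 or later; `xtmixed` in earlier releases).

```stata
* Simulated stand-in data (assumption): 50 clusters, 4 observations each
clear
set seed 2015
set obs 50
generate id = _n
generate zeta = rnormal(0, 2)
expand 4
generate x2 = rnormal()
generate y = 1 + 0.5*x2 + zeta + rnormal(0, 1)

* Random intercept model with one covariate
mixed y x2 || id:, mle
```

The upper part of the output lists the estimated fixed parameters (the coefficient on `x2` and the constant). The random-effects table lists the two estimated variances. No individual $\zeta_j$ appears among the estimated parameters.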

Ecological Fallacy

Occurs when between-cluster relationships differ substantially from within-cluster relationships.

• Can be caused by cluster-level confounding

For example, mothers who smoke during pregnancy may also adopt other behaviors such as drinking and poor nutritional intake, or have lower socioeconomic status and be less educated. These variables adversely affect birthweight and have not been adequately controlled for. In these cases the covariate is correlated with the error term (endogeneity).

• Because of this, the between-effect may be an overestimate of the true effect.

• In contrast, for within-effects each mother serves as her own control, so within-mother estimates may be closer to the true causal effect.
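One way to make the between/within distinction concrete is to split a covariate $x_{ij}$ into its cluster mean and the deviation from that mean, and to give each part its own coefficient. The notation below ($\bar{x}_{\cdot j}$, $n_j$, $\beta^{B}$, $\beta^{W}$) is introduced here for illustration and is not taken from the slides:

```latex
\begin{aligned}
\bar{x}_{\cdot j} &= \frac{1}{n_j} \sum_{i=1}^{n_j} x_{ij} \\
y_{ij} &= \beta_1 + \beta^{W} \,(x_{ij} - \bar{x}_{\cdot j})
          + \beta^{B} \,\bar{x}_{\cdot j} + \zeta_j + \epsilon_{ij}
\end{aligned}
```

Here $\beta^{W}$ is the within-cluster effect and $\beta^{B}$ the between-cluster effect. A random intercept model with a single coefficient on $x_{ij}$ implicitly assumes $\beta^{W} = \beta^{B}$. Cluster-level confounding shows up as a difference between the two.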

How to test for endogeneity?

Use the Hausman test to compare two alternative estimators of $\beta$: the fixed-effects (within) estimator and the random-effects estimator.
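A minimal sketch of the usual Stata workflow for this test is given below. It runs on simulated data in which the covariate is deliberately correlated with the random intercept, so the names `id`, `zeta`, `xbar`, `x`, `y`, the stored-estimate names `fixed` and `random`, the seed, and all numeric values are assumptions for illustration only.

```stata
* Simulated stand-in data (assumption): 200 clusters, 5 observations each
clear
set seed 2015
set obs 200
generate id = _n
generate zeta = rnormal(0, 1)

* Cluster-level part of x is correlated with zeta (endogeneity by construction)
generate xbar = 0.8*zeta + rnormal(0, 1)
expand 5
generate x = xbar + rnormal(0, 1)
generate y = 1 + 0.5*x + zeta + rnormal(0, 1)

xtset id

* Fixed-effects (within) estimator: consistent even when x is correlated with zeta
xtreg y x, fe
estimates store fixed

* Random-effects estimator: efficient only if x is uncorrelated with zeta
xtreg y x, re
estimates store random

* Hausman test: consistent estimator first, efficient estimator second
hausman fixed random
```

Because the simulated covariate is correlated with the cluster effect, one would expect the two estimates of the coefficient on `x` to differ noticeably and the test to reject.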

Random-coefficient model

We’ve already considered random intercept models, where the intercept is allowed to vary over clusters after controlling for covariates.

What if we would also like the coefficients (or slopes) to vary across clusters?

Models that involve both random intercepts and random slopes are called random coefficient models.

Random-coefficient model

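Random Intercept Model:

$$ y_{ij} = \beta_1 + \beta_2 x_{ij} + \zeta_j + \epsilon_{ij} $$

Random Coefficient Model:

$$ y_{ij} = \beta_1 + \beta_2 x_{ij} + \zeta_{1j} + \zeta_{2j} x_{ij} + \epsilon_{ij} $$

$$ y_{ij} = (\beta_1 + \zeta_{1j}) + (\beta_2 + \zeta_{2j}) x_{ij} + \epsilon_{ij} $$

$\beta_1 + \zeta_{1j}$: cluster-specific random intercept

$\beta_2 + \zeta_{2j}$: cluster-specific random slope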
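The two models can be fitted and compared along the following lines. This is a sketch on simulated data, not the workshop's example: the names `id`, `zeta1`, `zeta2`, `x`, `y`, the stored-estimate names `ri` and `rc`, the seed, and all numeric values are assumptions for illustration. It uses `mixed` (Stata 13 or later; `xtmixed` in earlier releases).

```stata
* Simulated stand-in data (assumption): 100 clusters, 6 observations each
clear
set seed 2015
set obs 100
generate id = _n
generate zeta1 = rnormal(0, 2)
generate zeta2 = rnormal(0, 0.5)
expand 6
generate x = rnormal()
generate y = 1 + 0.5*x + zeta1 + zeta2*x + rnormal(0, 1)

* Random intercept model: only the intercept varies over clusters
mixed y x || id:, mle
estimates store ri

* Random coefficient model: intercept and slope of x both vary over clusters
mixed y x || id: x, covariance(unstructured) mle
estimates store rc

* Likelihood-ratio comparison of the two nested models
lrtest rc ri
```

The `covariance(unstructured)` option lets the random intercept and random slope be correlated. Stata notes that this likelihood-ratio test is conservative, because the null hypothesis places a variance on the boundary of the parameter space.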