7/12/10
Alan C. Acock University Distinguished Professor of Family Studies &
Knudson Chair for Family Research & Policy Oregon State University
College of Health and Human Sciences Summer Workshop Series
July 2010
Multilevel/Mixed Models and Longitudinal Analysis Using Stata
Introduction to Growth Curves Using Stata
What’s in a name
Alan C. Acock, July, 2010 2
What’s in a name: Cross Sectional
• When measured at one time: repeated measures on a case
• The case might be a family
• Repeated measures might be Dad’s happiness, Mom’s happiness, oldest kid’s happiness, next oldest kid’s happiness, etc.
• The idea is that the measurements are nested (repeated) in the case
• We have 2+ measurements in each family
What’s in a name: Longitudinal
• When measured longitudinally in a panel
• The case might be an individual
• Repeated measures might be his/her happiness at wave 1, wave 2, wave 3, wave 4, etc.
• The idea is that the measurements are nested (repeated) in the case
• We have 2+ measurements on each individual
What’s this about levels: Cross Sectional?
• Cross-sectional data have individuals nested in families
• Level 1 is the individual’s score (mom, dad, kid)
• Level 2 is the family
• Level 1 scores within a family are more homogeneous than scores for random individuals
• Level 3 might be the neighborhood
What’s this about levels: Variables?
• Can have different predictor variables at each level
• Level 1 variables might be personality, IQ, attitude
• Level 2 variables might be household income, days/week the family eats dinner together
• Level 3 variables might be neighborhood % white, median home value
• Key: all this is interdependent because the levels are nested
What’s this about levels: Longitudinal?
• Longitudinal models have scores at each wave nested in individuals
• Level 1 is the score at wave 1, wave 2, etc.
• Level 2 is the individual
• Level 1 scores of one individual across waves are more homogeneous than scores for random individuals
Graphing the Interdependence
• Sophia Rabe-Hesketh & Anders Skrondal, Multilevel and Longitudinal Modeling Using Stata. College Station, TX: Stata Press.
• I change the labels of variables from what they use
Graphing the Interdependence
• Generate a mean for the husband
twoway (scatter husband1 couple, msymbol(circle)) ///
    (scatter husband2 couple, sort msymbol(circle_hollow)), ///
    xtitle(Couple) ytitle(Husband's measure Stability) ///
    legend(order(1 "Time 1" 2 "Time 2"))
Graphing the Interdependence
[Figure: scatterplot of Husband's measure Stability (y-axis, 200 to 700) against Couple (x-axis, 0 to 20); legend: Time 1, Time 2]
Wide to Long Format
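A minimal sketch of the wide-to-long reshape, using three made-up couples rather than the workshop data. The scores and the variable name occasion are hypothetical; only the reshape long syntax mirrors the do-file.

```stata
* Hypothetical wide data: one row per couple, two husband scores
clear
input couple husband1 husband2
1 494 490
2 395 397
3 516 512
end
* Long format: one row per couple per occasion
reshape long husband, i(couple) j(occasion)
list, sepby(couple)
```

After the reshape each couple contributes two rows, one per occasion, which is the layout xtreg and xtmixed expect.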
Variance Components: Intraclass Correlation
• We run a regression with no predictors and tell Stata which variable is the id variable
The command: xtreg
• Stata has many commands for multilevel models; all start with xt
. xtreg husband, i(couple) mle
• Just enter the level 1 variable (the repeated variable) in the variable list
• In our data, each husband from 1 to 17 is identified by the variable couple
• The “i” means whatever variable is in parentheses is the identification variable. This might be called id, case, etc. Here it happens to be called couple
The command: xtreg
• The “mle” means we are asking for a maximum likelihood estimator
• Without mle, xtreg fits the random-effects model by GLS; the default for xtmixed (as of Stata 11) is restricted maximum likelihood, reml
• But reml makes it harder to compare models
• This command requires the data to be in the long format
The xtreg Result
• We have 34 level 1 observations: two measures for each of our 17 level 2 cases (called groups, since the level 1 values are grouped within the 17 level 2 husbands)
• We have no missing values: min, avg, max all = 2. Stata automatically uses all available data; e.g., with families of mom, dad, and kids, some families (level 2) might have 1 kid, some might have 2 kids, etc.
• The chi-square test with no predictors is meaningless (df = 0)
• The maximized log likelihood value is -184.58
• The _cons (constant/intercept) with no predictors, 453.91, is the overall mean (the best guess in the absence of predictors)
• The /sigma_u is the standard deviation between husbands (other programs report the variance, which is an option in Stata). We expect this to be large
• The /sigma_e is the standard deviation within husbands. We expect this to be small
• Rho (ρ) is the intraclass correlation (ICC):
\[
\mathrm{ICC} = \frac{\mathrm{Var(Between)}}{\mathrm{Var(Between)} + \mathrm{Var(Within)}}
= \frac{\mathrm{SD(Between)}^2}{\mathrm{SD(Between)}^2 + \mathrm{SD(Within)}^2}
= \frac{107.0464^2}{107.0464^2 + 19.91083^2} = .967
\]
• Rho (ρ), the intraclass correlation (ICC), is .967
• Below the table, chi-square(1) = 46.27, p < .001, is the test of the significance of the ICC
The Intraclass Correlation
\[
\mathrm{ICC} = \frac{\mathrm{Var(Between)}}{\mathrm{Var(Between)} + \mathrm{Var(Within)}}
= \frac{\mathrm{SD(Between)}^2}{\mathrm{SD(Between)}^2 + \mathrm{SD(Within)}^2}
= \frac{107.0464^2}{107.0464^2 + 19.91083^2} = .967
\]
• Using the standard deviations is more easily interpretable than using the variances
• About 95% of the husbands will be within 2*(107.05) of the mean of 453.91. That is, the mean plus or minus 214.10, or roughly between 240 and 668
• About 95% of the two measures for each husband will be within 2*19.91 of the husband’s mean, or roughly his mean plus or minus 40
• Husbands are relatively stable. Most variance is between husbands rather than within husbands
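The arithmetic on this slide can be checked with a few scalars, as a rough sketch. The numbers are copied from the xtreg output quoted above; the scalar names are my own.

```stata
* Values copied from the xtreg output discussed above
scalar sd_between = 107.0464
scalar sd_within  = 19.91083
scalar grand_mean = 453.91
* ICC = between variance / (between variance + within variance)
scalar icc = sd_between^2 / (sd_between^2 + sd_within^2)
display "ICC = " %5.3f scalar(icc)
* Approximate 95% range of husbands' means around the overall mean
display "Lower = " %6.2f (grand_mean - 2*sd_between)
display "Upper = " %6.2f (grand_mean + 2*sd_between)
```

The same three lines work for any variance components model once the two standard deviations are read off the output.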
The xtmixed command
• The xtmixed command is much more general
• xtmixed does not report the ICC
. xtmixed husband || couple:, mle
• After the two vertical bars we have the identification variable followed by a colon
• After the comma we ask for a maximum likelihood estimator
The xtmixed result
• This has all of the same numbers as the xtreg output
• The variance components are shown in the bottom table, labeled random-effects parameters
• The standard deviation between individuals is the standard deviation around the overall mean, 453.91. This appears as sd(_cons) and is 107.05
• The standard deviation within each husband, across his repeated measures, is sd(Residual) and is 19.91
• The ICC is computed using the simple formula shown before
The xtmixed result—Conf. Intervals
• Both results show standard errors for the estimated standard deviations and 95% confidence intervals
• These are somewhat problematic. The parameter space for a variance or standard deviation has a lower bound of zero
• A similar problem occurs when putting a confidence interval around a correlation coefficient, since it cannot fall below -1 or above 1
• Stata adjusts for this by reporting an asymmetric confidence interval. A symmetrical C.I. for sd(Residual) would be 13.24 to 26.58
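As a sketch of where the asymmetry comes from: my understanding is that xtmixed estimates the log of each standard deviation, builds a symmetric interval on that scale, and transforms it back. Here \hat\sigma is the estimated standard deviation and SE its standard error; the notation is mine, not Stata's.

```latex
\[
\ln\hat\sigma \;\pm\; 1.96\,\frac{\mathrm{SE}(\hat\sigma)}{\hat\sigma}
\quad\Longrightarrow\quad
\left[\;\hat\sigma\, e^{-1.96\,\mathrm{SE}(\hat\sigma)/\hat\sigma},\;\;
        \hat\sigma\, e^{+1.96\,\mathrm{SE}(\hat\sigma)/\hat\sigma}\;\right]
\]
```

Both limits are necessarily positive, and the upper limit lies farther from \hat\sigma than the lower one.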
Graphic Representation of Variance Components
[Figure: husband 1's two scores, H1 and H2, plotted around his mean (true score). ζ1 is the distance of the husband's mean from the overall mean, M = 453.91; ε11 and ε21 are the distances of his two scores from his own mean]
• Husband j’s mean is ζj above the overall mean: a happy guy
• At time 1, he is ε1j points above his average score
• At time 2, he is ε2j points above his average score
• The variance of the husbands’ means around the overall mean (ζj) is the between variance (should be big)
• The variance of a husband’s two scores around his own mean (εij) is the within variance (should be small)
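The picture corresponds to the following decomposition, written here as a sketch. The symbols ψ and θ for the two variances are my labels; they do not appear on the slides.

```latex
\[
y_{ij} = \beta_0 + \zeta_j + \varepsilon_{ij}, \qquad
\mathrm{Var}(\zeta_j) = \psi \;(\text{between}), \qquad
\mathrm{Var}(\varepsilon_{ij}) = \theta \;(\text{within})
\]
\[
\mathrm{Var}(y_{ij}) = \psi + \theta, \qquad
\mathrm{ICC} = \frac{\psi}{\psi + \theta}
\]
```

In the husband data, \beta_0 is the overall mean of 453.91, \sqrt{\psi} is sd(_cons), and \sqrt{\theta} is sd(Residual).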
Applications of Variance Components
• Often just a first step to get the ICC, to show that the data are not independent and a multilevel analysis is needed
• If the ICC is small, some say you do not need to run a multilevel analysis
• Counter argument: if the design is multilevel, then you need to run a multilevel analysis
Applications of Variance Components
• You don’t change the test you planned to do in order to get a significant result
• If you set up a nonparametric test and it was not significant, but then you noticed a few outliers, what would you do?
  • Change to a t-test that is sensitive to outliers and might be significant
  • Stay the course with your research design
• The FDA expects drug companies to indicate what tests they will run before they collect the data and does not allow them to try different tests until they find one that is significant
• If you set up a test using a two-tail assumption, can you change it to one-tail after seeing the result?
• This is equivalent to not running a multilevel analysis after you see the ICC is small
Applications of Variance Components
• Can use the ICC and a graph to see who is most similar
• Are wives more consistent than husbands?
• Are identical twins more similar than other twins?
• Are students in all-female math classes more similar than students in mixed math classes?
• Just compare the ICCs and possibly do a graph
How many 2nd level groups are needed?
• Here these are husbands
• Could be families, organizations, classrooms, etc.
• In a very real sense, these are your cases
• 30 to 50 seems reasonable
• It is possible to do a power analysis
• If you had 5 classes, it would be like having 5 observations, a pretty small sample size
How many level 1 scores are needed?
• Here we only had 2; more would be very helpful
• Could be scores on members of a group: students in a class (25-30), members of a family (3-6)
• The issue is getting a mean of these values to represent some sense of a true score
• The husband’s mean is his reference point
• The mean of 25 students in a class is the class’s reference point
Do-file
* intraclass.do
clear
cd "/Volumes/acock/1flash/1presentations/OSU 2010 Workshop/data"
use intraclass.dta
list couple hus*
egen husband_mean = rowmean(husband1 husband2)
summarize husband_mean
* Use menu system to generate this graph
twoway (scatter husband1 couple, msymbol(circle)) ///
    (scatter husband2 couple, sort msymbol(circle_hollow)), ///
    xtitle(Couple) ytitle(Husband's measure Stability) ///
    legend(order(1 "Time 1" 2 "Time 2"))
list
* Reshaping the data from wide to long
reshape long wife husband, i(couple) j(occassion)
list couple occassion husband husband_mean if couple < 5
* Variance Components models
xtreg husband, i(couple) mle
xtmixed husband || couple: , mle
xtmixed husband || couple: , mle nolog
* Comparison table
quietly xtreg wife, i(couple) mle
estimates store her
quietly xtreg husband, i(couple) mle
estimates store him
estimates table her him
list in 1/10
gen id = _n
list in 1/10
rename wife pw
rename husband ph
list in 1/10
reshape long p, i(id) j(partner) string
list in 1/10
encode partner, gen(spouse)
list in 1/10, nolabel
recode spouse 2 = 0
list in 1/10, nolabel
xtmixed p || couple:, mle
estimates store model1
xtmixed p spouse || couple:, mle
estimates store model2
* legend(order()) refers to plot numbers: plot 1 is spouse==0 (wife), plot 2 is spouse==1 (husband)
twoway (scatter p couple if spouse==0, msymbol(circle)) ///
    (scatter p couple if spouse==1, msymbol(circle_hollow)), ///
    xtitle(Couple) ytitle(Marital Satisfaction) ///
    legend(order(1 "Wife" 2 "Husband")) xlabel(1/17)
* Three Way, measures nested in spouses who are nested in couples
xtmixed p spouse || couple: || spouse:, mle
estimates store model3
lrtest model2 model3
Sometimes a Simple Example Helps
• Farmer Brown has 48 brand new pigs and his daughter, Emma, weighs each pig once a week for 9 weeks
• Farmer Brown wants to know what the weight trajectory looks like
• Stata uses this data, but I’ve added a catch. Emma is not reliable. In fact, she only records 294 of the 432 (9*48) possible weights, so that about 32% of the values are missing
• This means only 3 pigs got weighed all 9 weeks (listwise)
• The result for the first 2 pigs (in long format) appears on the next slide
Data for first two pigs
Graph for 10 pigs
twoway connected weight week if id<=10, connect(line)
[Figure: connected line plot of weight (y-axis, 20 to 80) by week (x-axis, 0 to 10) for the first 10 pigs]
How about a fixed effects model?
• Brown really doesn’t care much about individual differences and really just wants to see how fast the pigs are growing overall
• To adjust for the lack of independence (9 weights nested in each pig), Brown fits a fixed effects model using xtreg
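A sketch of what the fixed effects command could look like. It assumes the complete pig dataset that ships with Stata (loaded with webuse, which needs an internet connection), not the workshop file with missing weights, so the number of observations will differ from the slides. The fe option matches the one used in the school engagement do-file later in this handout.

```stata
* Assumption: Stata's own pig data (48 pigs x 9 weeks, no missing values),
* not the workshop file in which Emma skipped some weighings
webuse pig, clear
* Within-pig (fixed effects) estimate of weekly growth
xtreg weight week, i(id) fe
```

The coefficient on week is the average weekly gain within pigs; each pig's own starting level is absorbed rather than estimated as a variance.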
Fixed Effects Model
Making a graph of the fixed effect
predict weightfe
twoway (line weightfe week)
[Figure: line plot of the linear prediction (y-axis, 20 to 80) by week (x-axis, 0 to 10)]
Random Intercept model
• There are now two error terms, one for the variance around the intercept and one for the rest of the unexplained variance
\[ weight_{ij} = \beta_0 + \beta_1 week_{ij} + \mu_i + \varepsilon_{ij} \]
• Pig i now has its own error, μi
• This error will be positive if the pig weighs more than the average initially
• It will be negative if the pig weighs less than average initially
• The intercept for pig i will be β0 + μi
• There is also an error, εij, for each pig at each week. A pig might have been sick one week and lost weight that week
Estimating the random intercept model
. xtmixed weight week || id:, mle
. estimates store weightri
• The weight week part: the response variable weight has a fixed portion depending on the week
• || id: specifies a random effect by the grouping variable id. This gives us the random intercept
• The mle option uses a maximum likelihood estimator
• The estimates store weightri command stores the results under the name weightri
Random Intercept Model: Results
Random Intercept Model: Interpretation
• We have 294 cases where we have a weight for a pig (not just the 3 pigs that would remain with listwise deletion, and not all 432)
• The first estimation table reports the fixed effects. We estimate B0 = 19.36 and B1 = 6.21
• Weight = 19.36 + 6.21×week + error is our fixed effect part
• The second table is the variance components. The 3.89 is the standard deviation of the constant/intercept, and its standard error, 0.41, is quite small
• The sd(Residual) = 2.10 is the standard deviation of the error
• The chi-square(1) = 295.19, p < 0.001 tells us we needed to use a multilevel model
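As a worked illustration using only the rounded estimates quoted on this slide, here are the ICC for the pig data and the fixed-portion prediction at week 9. Because the inputs are rounded, these are approximations rather than reported output.

```latex
\[
\mathrm{ICC} \approx \frac{3.89^2}{3.89^2 + 2.10^2}
= \frac{15.13}{15.13 + 4.41} \approx .77
\]
\[
\widehat{weight}_{\,week=9} \approx 19.36 + 6.21 \times 9 = 75.25
\]
```

So roughly three quarters of the variance left after accounting for week lies between pigs rather than within them.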
A Random Slope
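• Now let’s try a random coefficient/slope
\[ weight_{ij} = \beta_0 + \beta_1 week_{ij} + \mu_{0i} + \mu_{1i} week_{ij} + \varepsilon_{ij} \]
• The μ0i is pig i’s deviation from the average intercept; its variance is the variance around the intercept
• The μ1i weekij is pig i’s deviation from the average weekly slope; the variance of μ1i is the variance around the slope
• Random intercept: (β0 + μ0i)
• Random slope: (β1 + μ1i) weekij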
Covariance of Intercept & Slope
• Need to decide on the covariance of the intercept and the slope
• The default, cov(independent), gives the random intercept and random slope separate variances and assumes they are uncorrelated (covariance fixed at zero)
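The two choices can be written as covariance matrices for the pair of random effects (μ0i, μ1i). This is a sketch; the ψ symbols are my notation, not Stata's.

```latex
\[
\text{independent:}\quad
\Sigma = \begin{pmatrix} \psi_{00} & 0 \\ 0 & \psi_{11} \end{pmatrix}
\qquad\qquad
\text{unstructured:}\quad
\Sigma = \begin{pmatrix} \psi_{00} & \psi_{01} \\ \psi_{01} & \psi_{11} \end{pmatrix}
\]
```

The unstructured form estimates one extra parameter, the covariance \psi_{01} between the intercept and the slope.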
A Random slope: cov(unstruct)
• Now let’s try a random slope
\[ weight_{ij} = \beta_0 + \beta_1 week_{ij} + \mu_{0i} + \mu_{1i} week_{ij} + \varepsilon_{ij} \]
• The μ0i is pig i’s deviation from the average intercept; its variance is the variance around the intercept
• The μ1i weekij is pig i’s deviation from the average weekly slope; the variance of μ1i is the variance around the slope
• Unstructured covariance allows the random intercept and the random slope to be correlated
Random Coefficients Model
. xtmixed weight week || id: week, nolog mle cov(unstruct) var
. estimates store weightrc
. lrtest weightri weightrc
• The id: is the part of the command that gives us the random intercept
• Any variable after the colon will have a random coefficient
• The variable week is allowed to have a different slope for each pig since some grow faster than others
• The cov(unstruct) allows the random intercept and random slope to be correlated
• Notice the var at the end means the output reports variances and covariances rather than standard deviations and correlations
School Engagement Example
• Data from Day and others on children and their parents from Seattle. They have 3 waves. Kids were 10, 11, 12, or 13 at the first wave, 11, 12, 13, or 14 at the second wave, and 12, 13, 14, or 15 at the third wave
• Reorganized data by birth year (MCAR)

 birthyr | wave1  wave2  wave3  wave4  wave5  wave6
---------+------------------------------------------
    1994 |     0      0      0     68     68     68
    1995 |     0      0    121    121    121      0
    1996 |     0    190    190    190      0      0
    1997 |   115    115    115      0      0      0
---------+------------------------------------------
   Total |   115    305    426    379    189     68
Correlation of Intercept and Slope
• We can see if the intercept and slope are correlated
• We need to do 494 separate regressions of school engagement on year, one for each child, and save the 494 intercepts and slopes
statsby inter=_b[_cons] slope=_b[yr], ///
    by(id) saving(ols): regress sch yr
Correlation of Intercept and Slope
• We merge the saved dataset with our active dataset
• Then we do the graph using
twoway (scatter slope inter) (lfit slope inter), ///
    xtitle(Intercept) ytitle(Slope)
Intercept and slope are correlated
r = -.73
[Figure: scatterplot with fitted line of Slope (y-axis, -2 to 2) against Intercept (x-axis, 0 to 6)]
How do the means fit?
We expect there to be a steady decline in school engagement
Using xtreg to estimate the ICC
Compare random intercept & random coefficient models
. xtmixed sch female mom_ed nev_mar ///
    div_sep other yr || id: yr, mle ///
    cov(unstructured)
. estimates store rc
. xtmixed sch female mom_ed nev_mar ///
    div_sep other yr || id:, mle
. estimates store ri
. lrtest ri rc
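A brief sketch of what lrtest computes here. The symbols \ell_{ri} and \ell_{rc} for the two maximized log likelihoods are my notation. The random coefficient model adds two parameters (the slope variance and the intercept-slope covariance), so the statistic is referred to a chi-square with 2 degrees of freedom, for which the p-value has a simple closed form.

```latex
\[
LR = 2\,(\ell_{rc} - \ell_{ri}), \qquad
p = \Pr(\chi^2_2 > LR) = e^{-LR/2}
\]
\[
LR = 2.35 \;\Rightarrow\; p = e^{-1.175} \approx .31
\]
```

The value 2.35 is the LR chi2(2) shown in the do-file output at the end of this handout; the do-file also notes that the test is inherently one-tailed because a variance cannot be negative.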
Telling a story
• We will run the model using random slopes (even though in this case they were not needed)
• We will create a graph comparing a male whose mother has low education and has never married to a female whose mother has a college degree and is married
• We think of these as “ideal types”
. xtmixed sch female mom_ed nev_mar div_sep ///
    other yr || id: yr, mle cov(unstructured)
. predict sch_score
. twoway (connected sch_score yr if female==0 ///
    & mom_ed==2 & nev_mar==1, sort) (connected ///
    sch_score yr if female==1 & mom_ed==4 & ///
    mom_ed < . & nev_mar==0 & div_sep==0 & other==0)
Telling a Story
[Figure: two connected lines of the linear prediction, fixed portion (y-axis, 3 to 4) by yr (x-axis, 0 to 5); legend: Male, Mom never married, low ed; Female, Mom married, B.A.]
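The two lines can be reproduced by hand from the fixed portion of the model, as a worked sketch. The coefficients are taken from the random coefficients output reproduced in the do-file at the end of this handout; the codes mom_ed = 2 and mom_ed = 4 are the ones used in the twoway command.

```latex
\[
\widehat{sch} = 3.5252 + .2094\,female + .0665\,mom\_ed - .1965\,nev\_mar
 - .1407\,div\_sep - .2122\,other - .0834\,yr
\]
\[
\text{Male, never married, } mom\_ed = 2:\quad
3.5252 + 2(.0665) - .1965 \approx 3.46 \text{ at } yr = 0,\quad
3.46 - 5(.0834) \approx 3.04 \text{ at } yr = 5
\]
\[
\text{Female, married, } mom\_ed = 4:\quad
3.5252 + .2094 + 4(.0665) \approx 4.00 \text{ at } yr = 0,\quad
4.00 - 5(.0834) \approx 3.58 \text{ at } yr = 5
\]
```

Because predict returns only the fixed portion by default, both lines share the same slope and differ only in their starting level.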
* mkdaygrow
clear
cd "/Volumes/acock/1daygrow/data"
use "wave1-3_final_combinedsite_8.dta"
destring family_id, gen(id)
keep if site == 1
fre p1_21b_1 p1_21c_1
gen birthyr = 2007 - p1_21b_1
tab birthyr p1_21b_1
replace birthyr = 1995 if birthyr == 1994.5
drop if birthyr == 1993 | birthyr == 1998
tab birthyr p1_21b_1
gen age1 = 13 if birthyr == 1994
gen age2 = 14 if birthyr == 1994
gen age3 = 15 if birthyr == 1994
replace age1 = 12 if birthyr == 1995
replace age2 = 13 if birthyr == 1995
replace age3 = 14 if birthyr == 1995
replace age1 = 11 if birthyr == 1996
replace age2 = 12 if birthyr == 1996
replace age3 = 13 if birthyr == 1996
replace age1 = 10 if birthyr == 1997
replace age2 = 11 if birthyr == 1997
replace age3 = 12 if birthyr == 1997
gen wave1 = 0 if age1 == 10
gen wave2 = 1 if age2 == 11
replace wave2 = 1 if age1 == 11
gen wave3 = 2 if age3 == 12
replace wave3 = 2 if age2 == 12
replace wave3 = 2 if age1 == 12
gen wave4 = 3 if age3 == 13
replace wave4 = 3 if age2 == 13
replace wave4 = 3 if age1 == 13
gen wave5 = 4 if age3 == 14
replace wave5 = 4 if age2 == 14
gen wave6 = 5 if age3 == 15
tabstat wave*, statistics( count ) by(birthyr) columns(variables)
/*
Summary statistics: N
  by categories of: birthyr

 birthyr | wave1  wave2  wave3  wave4  wave5  wave6
---------+------------------------------------------
    1994 |     0      0      0     68     68     68
    1995 |     0      0    121    121    121      0
    1996 |     0    190    190    190      0      0
    1997 |   115    115    115      0      0      0
---------+------------------------------------------
   Total |   115    305    426    379    189     68
*/
/* School Engagement
Wave 2 and 3 had 9 items, wave 1 had 15. Dropped items 4 and 6 as negatively
worded. Kept 7 items that are in common
c_scheng1_1 - c_scheng3_1 c_scheng5_1 c_scheng7_1 - c_scheng9_1 c_scheng15_1
c_scheng1_2 - c_scheng3_2 c_scheng5_2 c_scheng7_2 - c_scheng9_2
c_scheng1_3 - c_scheng3_3 c_scheng5_3 c_scheng7_3 - c_scheng9_3
alphas are .80, .83, and .83 for waves 1, 2, and 3.
*/
factor c_scheng1_1 - c_scheng3_1 c_scheng5_1 c_scheng7_1 - c_scheng8_1 ///
    c_scheng15_1, pcf
factor c_scheng1_2 - c_scheng3_2 c_scheng5_2 c_scheng7_2 - c_scheng9_2, pcf
factor c_scheng1_3 - c_scheng3_3 c_scheng5_3 c_scheng7_3 - c_scheng9_3, pcf
alpha c_scheng1_1 - c_scheng3_1 c_scheng5_1 c_scheng7_1 - c_scheng8_1 ///
    c_scheng15_1, asis item
alpha c_scheng1_2 - c_scheng3_2 c_scheng5_2 c_scheng7_2 - c_scheng9_2, ///
    asis item
alpha c_scheng1_3 - c_scheng3_3 c_scheng5_3 c_scheng7_3 - c_scheng9_3, ///
    asis item
egen schengage1 = rowmean(c_scheng1_1 - c_scheng3_1 c_scheng5_1 ///
    c_scheng7_1 - c_scheng8_1 c_scheng15_1)
egen schengage2 = rowmean(c_scheng1_2 - c_scheng3_2 c_scheng5_2 ///
    c_scheng7_2 - c_scheng9_2)
egen schengage3 = rowmean(c_scheng1_3 - c_scheng3_3 c_scheng5_3 ///
    c_scheng7_3 - c_scheng9_3)
pwcorr schengage1-schengage3, obs
/* make six waves for school engagement
*/
gen sch1 = schengage1 if birthyr == 1997
gen sch2 = schengage2 if birthyr == 1997
gen sch3 = schengage3 if birthyr == 1997
replace sch2 = schengage1 if birthyr == 1996
replace sch3 = schengage2 if birthyr == 1996
gen sch4 = schengage3 if birthyr == 1996
replace sch3 = schengage1 if birthyr == 1995
replace sch4 = schengage2 if birthyr == 1995
gen sch5 = schengage3 if birthyr == 1995
replace sch4 = schengage1 if birthyr == 1994
replace sch5 = schengage2 if birthyr == 1994
gen sch6 = schengage3 if birthyr == 1994
list id sch* birthyr in 1/50
tabstat sch1-sch6, statistics( count mean ) by(birthyr) columns(variables)
/* Generating covariates
gender
mom's education
marital status===REDO
*/
gen nev_mar = 1 if famstruct2_1 == 4
replace nev_mar = 0 if famstruct2_1 ~= 4 & famstruct2_1 < .
gen div_sep = 1 if famstruct2_1 == 1 | famstruct2_1 == 5
replace div_sep = 0 if famstruct2_1 ~= 1 & famstruct2_1 ~= 5 & famstruct2_1 < .
gen married = 1 if famstruct2_1 == 2
replace married = 0 if famstruct2_1 ~= 2 & famstruct2_1 < .
gen other = 1 if famstruct2_1 == 3 | famstruct2_1 == 6
replace other = 0 if famstruct2_1 ~= 3 & famstruct2_1 ~= 6 & famstruct2_1 < .
fre famstruct2_1 nev_mar div_sep married other
gen female = p1_21a_1 - 1
fre female
clonevar mom_ed = p1_4_1
reshape long sch, i(id) j(w)
keep id sch w female mom_ed nev_mar div_sep married other
list id sch w in 1/30
gen yr = w - 1
/*
We want to know if the means for school engagement go down/up in a linear
fashion. We can make a table of the mean for each of the six years, year
0 to year 5
*/
tabstat sch, statistics(mean count) by(yr) columns(variables)
xtreg sch, i(id) mle
xtmixed sch yr female mom_ed nev_mar div_sep other || id:
xtmixed sch yr female mom_ed || id: yr
regress sch yr if id == 7010001
/*
Correlation of Intercept and Slope
This section calculates the intercept and the slope when you regress sch on
yr for each case, then it creates a graph showing the link of school and year.
*/
statsby inter=_b[_cons] slope=_b[yr], by(id) saving(ols): regress sch yr
sort id
merge id using ols
drop _merge
twoway (scatter slope inter) (lfit slope inter), xtitle(Intercept) ytitle(Slope)
corr inter slope
corr inter slope, cov
xtdescribe if yr < ., i(id) t(yr)
xtsum sch female mom_ed nev_mar div_sep married other yr, i(id)
regress sch female mom_ed nev_mar div_sep married other yr
predict res, residuals
/* Correlation of residuals */
preserve
keep id res yr
reshape wide res, i(id) j(yr)
tabstat res*, statistics(count variance)
pwcorr res*, obs
restore
/* Fixed effects model
These effects are the within-subject estimates of the effects of the
time-varying covariates. We have none. The time-invariant covariates have
no within-subject variance and hence cannot be estimated (are dropped).
The estimates for time-varying covariates are not biased because of
omitted time-invariant covariates. Each subject serves as his/her own
control. We could add time-varying family processes, for example.
*/
xtreg sch female mom_ed nev_mar div_sep other yr, i(id) fe
/* Random Intercept Model
*/
xtmixed sch yr || id:, ml cov(unstructured)
estimates store riyronly
xtmixed sch female mom_ed nev_mar div_sep other yr || id:, mle
estimates store ri
/*
Mixed-effects ML regression                     Number of obs      =      1386

------------------------------------------------------------------------------
         sch |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   .2047184   .0465646     4.40   0.000     .1134535    .2959834
      mom_ed |   .0666624   .0157266     4.24   0.000     .0358388     .097486
     nev_mar |  -.2009229   .0785592    -2.56   0.011    -.3548962   -.0469496
     div_sep |  -.1490321   .0621459    -2.40   0.016    -.2708357   -.0272284
       other |  -.2215656   .1206561    -1.84   0.066    -.4580471     .014916
          yr |  -.0828231   .0120616    -6.87   0.000    -.1064634   -.0591829
       _cons |    3.52834   .0888233    39.72   0.000     3.354249     3.70243
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                   sd(_cons) |   .4432578   .0190082      .4075252    .4821236
-----------------------------+------------------------------------------------
                sd(Residual) |   .4194833   .0098414      .4006312    .4392225
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 353.71  Prob >= chibar2 = 0.0000

ICC = .443^2/(.443^2 + .419^2) = .528
*/
/* Random Coefficients Model
*/
xtmixed sch yr || id: yr, mle cov(unstructured)
estimates store rcyronly
xtmixed sch female mom_ed nev_mar div_sep other yr || id: yr, mle cov(unstructured)
estimates store rc
/*
Mixed-effects ML regression                     Number of obs      =      1386
Group variable: id                              Number of groups   =       483
                                                Obs per group: min =         1
                                                               avg =       2.9
                                                               max =         3
                                                Wald chi2(6)       =    107.82
Log likelihood = -1106.0162                     Prob > chi2        =    0.0000
------------------------------------------------------------------------------
         sch |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   .2093711   .0464561     4.51   0.000     .1183187    .3004234
      mom_ed |   .0664688   .0157478     4.22   0.000     .0356036     .097334
     nev_mar |  -.1964902   .0785915    -2.50   0.012    -.3505267   -.0424538
     div_sep |  -.1407179     .06219    -2.26   0.024     -.262608   -.0188277
       other |  -.2121512   .1208165    -1.76   0.079    -.4489471    .0246447
          yr |  -.0834127   .0123854    -6.73   0.000    -.1076876   -.0591377
       _cons |   3.525203   .0886104    39.78   0.000     3.351529    3.698876
------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Unstructured             |
                      sd(yr) |    .069636   .0395577      .0228716     .212017
                   sd(_cons) |   .4315361   .0407186      .3586742    .5191994
              corr(yr,_cons) |  -.1248666   .3542511     -.6809258    .5225163
-----------------------------+------------------------------------------------
                sd(Residual) |   .4136742   .0115779      .3915931    .4370003
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(3) =   356.06   Prob > chi2 = 0.0000
*/
lrtest riyronly rcyronly
lrtest ri rc
/*
. estimates store riyronly
. lrtest riyronly rcyronly
Likelihood-ratio test                                 LR chi2(2)  =      2.62
(Assumption: riyronly nested in rcyronly)             Prob > chi2 =    0.2695
Note: The reported degrees of freedom assumes the null hypothesis is not on the boundary of
      the parameter space. If this is not true, then the reported test is conservative.
. lrtest ri rc
Likelihood-ratio test                                 LR chi2(2)  =      2.35
(Assumption: ri nested in rc)                         Prob > chi2 =    0.3083

DIVIDE THE P VALUE BY TWO BECAUSE THIS IS INHERENTLY A ONE TAIL TEST --CAN'T
BE NEGATIVE
*/
xtmixed sch female mom_ed nev_mar div_sep other yr || id: yr, mle cov(unstructured)
predict sch_score
twoway (connected sch_score yr if female==0 & mom_ed==2 & nev_mar==1, sort) ///
    (connected sch_score yr if female==1 & mom_ed==4 & mom_ed < . & nev_mar==0 ///
    & div_sep==0 & other==0)
gen yrXfemale = yr * female
gen yrXmom_ed = yr * mom_ed
gen yrXnev_mar = yr * nev_mar
gen yrXdiv_sep = yr * div_sep
xtmixed sch female mom_ed nev_mar div_sep other yr yrXfemale ///
    || id: yr, mle cov(unstructured)
xtmixed sch female mom_ed nev_mar div_sep other yr yrXmom_ed ///
    || id: yr, mle cov(unstructured)
xtmixed sch female mom_ed nev_mar div_sep other yr yrXnev_mar ///
    || id: yr, mle cov(unstructured)
xtmixed sch female mom_ed nev_mar div_sep other yr yrXdiv_sep ///
    || id: yr, mle cov(unstructured)