Growth Models Raul Cruz-Cano HTLH 654 Spring 2013 University of Maryland.

What is a Growth Model?

• A way to assess individual stability and change, both growth and decay, over time.

• A two-level hierarchical model that captures (1) within-individual change over time and (2) between-individual differences in patterns of growth.

Also known as:

• Growth Models
• Trajectory Models
• Growth Curve Models
• Latent GM

Why Latent?

• Because we assume that the process underlying what we are modeling (the behavior we observe) is itself unobserved, or latent.

• The characteristics we observe are a manifestation of this latent trajectory.

Why use Growth Models?

• You have longitudinal data and are interested in change over time.
  – You may want to explain those changes.
  – You may also believe that not everyone follows the same path.

Hierarchical Models

• Traditional:
  – Level 1: Students
  – Level 2: Schools

• Growth Models (a type of HM):
  – Level 1: Repeated Observations
  – Level 2: Individuals

Unconditional Model

• Level 1: Within Individual

$$y_{it} = \alpha_i + \beta_i t + \varepsilon_{it}$$

• Level 2: Between Individual

$$\alpha_i = \alpha_0 + u_i$$

$$\beta_i = \beta_0 + v_i$$
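The Python sketch below is our own illustration and does not come from the slides. It simulates data from the unconditional model to make the two levels concrete.

- Level 2 gives each person an intercept $\alpha_i$ and a slope $\beta_i$ around the population values.
- Level 1 adds measurement noise at each time point.

The population values, the standard deviations, and the choice to draw $u_i$ and $v_i$ independently are all hypothetical. In practice their covariance is usually estimated as well. The per-person least-squares lines at the end are only a rough two-stage check. Mixed-model software estimates both levels jointly (for example, by maximum likelihood) instead.

```python
import random
import statistics

# Hypothetical population values (not from the slides).
ALPHA0, BETA0 = 3.5, -0.10          # mean intercept and mean slope
SD_U, SD_V, SD_E = 0.6, 0.15, 0.3   # SDs of u_i, v_i and eps_it
TIMES = [0, 1, 2, 3, 4]

def simulate_person(rng):
    """Draw one individual's latent trajectory and noisy observations."""
    alpha_i = ALPHA0 + rng.gauss(0, SD_U)  # level 2: alpha_i = alpha_0 + u_i
    beta_i = BETA0 + rng.gauss(0, SD_V)    # level 2: beta_i = beta_0 + v_i
    # level 1: y_it = alpha_i + beta_i * t + eps_it
    ys = [alpha_i + beta_i * t + rng.gauss(0, SD_E) for t in TIMES]
    return alpha_i, beta_i, ys

def ols_line(ts, ys):
    """Least-squares intercept and slope for one person's observations."""
    t_bar, y_bar = statistics.fmean(ts), statistics.fmean(ys)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    sxx = sum((t - t_bar) ** 2 for t in ts)
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope

rng = random.Random(654)
people = [simulate_person(rng) for _ in range(2000)]
fits = [ols_line(TIMES, ys) for _, _, ys in people]
mean_intercept = statistics.fmean(a for a, _ in fits)
mean_slope = statistics.fmean(b for _, b in fits)
print(round(mean_intercept, 2), round(mean_slope, 2))
```

The averages of the person-level lines land near $\alpha_0$ and $\beta_0$. The spread of the individual lines around those averages reflects the level-2 deviations $u_i$ and $v_i$.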

A Latent Trajectory

[Figure: Latent Depression Trajectory. Depressive Symptoms plotted against Time; α marks the intercept and β the slope of the trajectory.]

Time-Invariant Covariates

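• Level 1: Within Individual

$$y_{it} = \alpha_i + \beta_i t + \varepsilon_{it}$$

• Level 2: Between Individual

$$\alpha_i = \alpha_0 + \alpha_1 x_{i1} + \alpha_2 x_{i2} + \dots + \alpha_k x_{ik} + u_i$$

$$\beta_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + v_i$$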

Time-Varying Variables

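• Level 1: Within Individual

$$y_{it} = \alpha_i + \beta_i t + \gamma_t w_{it} + \varepsilon_{it}$$

where $\gamma_t w_{it}$ is the time-varying effect.

• Level 2: Between Individual

$$\alpha_i = \alpha_0 + \alpha_1 x_{i1} + \alpha_2 x_{i2} + \dots + \alpha_k x_{ik} + u_i$$

$$\beta_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + v_i$$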
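The Python sketch below traces how the two kinds of covariates enter the model:

- The time-invariant $x$'s shift a person's intercept and slope (level 2).
- The time-varying $w_{it}$ shifts the outcome at each wave (level 1).

All coefficients and covariate values are hypothetical. The random effects $u_i$, $v_i$ and the error are set to zero, so the result is an expected trajectory. The coefficient of $w_{it}$ is held constant across waves, which simplifies the slide's $\gamma_t$.

```python
def expected_outcome(t, x, w_t, alpha, beta, gamma):
    """Expected y_it with u_i, v_i and eps_it set to zero.

    x     : time-invariant covariates [x_i1, ..., x_ik]
    w_t   : value of the time-varying covariate w_it at time t
    alpha : [alpha_0, alpha_1, ..., alpha_k] for the intercept equation
    beta  : [beta_0, beta_1, ..., beta_k] for the slope equation
    gamma : effect of w_it, held constant over time in this sketch
    """
    alpha_i = alpha[0] + sum(a * xk for a, xk in zip(alpha[1:], x))  # level-2 intercept
    beta_i = beta[0] + sum(b * xk for b, xk in zip(beta[1:], x))     # level-2 slope
    return alpha_i + beta_i * t + gamma * w_t                        # level-1 equation

# Hypothetical values: x_i1 = 1 (an indicator), x_i2 = 12 (e.g., years of education);
# w_it switches from 0 to 1 between the first and second waves.
alpha = [3.0, 0.2, 0.05]
beta = [-0.05, 0.01, 0.0]
x_i = [1, 12]
waves = [(0, 0), (2, 1), (4, 1)]  # (t, w_it)
trajectory = [expected_outcome(t, x_i, w, alpha, beta, gamma=-0.3) for t, w in waves]
print([round(y, 2) for y in trajectory])  # [3.8, 3.42, 3.34]
```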

Example

• “Stability and Change in Family Structure and Maternal Health Trajectories.” Meadows, McLanahan, & Brooks-Gunn. American Sociological Review. Forthcoming.

• We wanted to know whether changes in family structure, including transitions into and out of coresidential relationships, affected mothers’ health.

Example: Self-Rated Health

• Mothers in the Fragile Families and Child Wellbeing Study (FFCWS)
• “In general, how is your health?”
  – Excellent (5)
  – Very Good (4)
  – Good (3)
  – Fair (2)
  – Poor (1)

• Repeated measures at one, three, and five years after birth.

Models

• Unconditional
  – Model Fit

• Conditional
  – Time-Invariant Covariates
  – Time-Varying Covariates

Example (cont.)

• Trajectories of maternal self-rated health and mental health problems from one year after birth to five years after birth.

• Two types of measures of family structure change:
  – Level 1: Time-Varying
  – Level 2: Time-Invariant

Time-Invariant Covariates

• Age at Baseline
• Education
• Race
• Biological Parents’ Mental Health Problem
• Lived with Both Biological Parents at Age 15
• Number of Previous Relationships
• Baseline SRH
• Considered an Abortion
• Positive Marriage Attitude
• Prenatal Variables (medical care, drug and alcohol use, smoking)
• Baseline Marital Status

Mothers’ Self-Rated Health Trajectories for each Baseline Marital Status.

Time-Varying Covariates

• Mothers’ Household Income
• Fathers’ Mental Health
• Fathers’ Earnings

Mothers’ Household Income Trajectories

Fathers’ Mental Health Trajectories

Fathers’ Earnings Trajectories

Example

• Results:
  – Transitions, especially exits from marriages, resulted in declines in mental health problems.
  – There was no growing gap in well-being between mothers who remained stably married, mothers who remained stably single, and mothers who made transitions.

Other topics worth visiting…

PROC TRAJ

• PROC TRAJ is a specialized procedure that estimates multiple groups within the population. A traditional regression or growth curve model, by contrast, models only one mean within the population. (This is similar to what we do “by hand” when we split the data into the groups of a categorical variable.)

• It is not part of base SAS and must be downloaded separately.

• It addresses research questions about describing the trajectory, or pattern, of change over time in the dependent variable, specifically questions about multiple distinct patterns of change over time.

• It estimates a regression model for each discrete group within the population.

• The focus of PROC TRAJ is identifying distinct subgroups within the population.

• It does not provide any individual-level information on the pattern of change over time. Subjects are grouped, and every subject in a group is assumed to follow the same trajectory.

• There is no random-effects capability.

• To use PROC TRAJ, you must organize your data in multivariate, or “wide”, format: one row of data per subject, with all of that subject’s observations on that row.
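PROC TRAJ expects one row per subject. Long-format data (one row per observation, as in the hierarchical growth models above) therefore has to be pivoted first.

The minimal Python sketch below performs that pivot on made-up records. In SAS itself this is typically done with a DATA step or PROC TRANSPOSE. The output columns mirror the ID, VAR and INDEP statements, and `None` stands in for SAS's missing value. The function name `to_traj_rows` is hypothetical.

```python
from collections import defaultdict

def to_traj_rows(long_rows, times):
    """Pivot (id, time, y) records into one row per subject for PROC TRAJ.

    Each output row is [ID, Y at times[0], ..., Y at times[-1], times[0], ..., times[-1]],
    mirroring the ID, VAR and INDEP statements. Missing waves become None.
    """
    position = {t: j for j, t in enumerate(times)}
    wide = defaultdict(lambda: [None] * len(times))
    for subject, t, y in long_rows:
        wide[subject][position[t]] = y
    return [[subject, *ys, *times] for subject, ys in sorted(wide.items())]

# Hypothetical self-rated health scores at years 1, 3 and 5 (subject 2 misses year 3).
long_rows = [(1, 1, 4), (1, 3, 4), (1, 5, 3), (2, 1, 5), (2, 5, 4)]
traj_rows = to_traj_rows(long_rows, times=[1, 3, 5])
print(traj_rows)  # [[1, 4, 4, 3, 1, 3, 5], [2, 5, None, 4, 1, 3, 5]]
```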

PROC TRAJ

• The posterior group probabilities are calculated for each individual based on the estimated parameters, and the individual is assigned to a group based on their highest posterior group probability

• Choosing the best model is an iterative process: fit candidate models (different numbers of groups and polynomial orders) and compare their fit statistics.
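The Python sketch below shows how a posterior group probability can be computed for a binary (LOGIT) outcome, using Bayes' rule:

- Each group has a polynomial trajectory on the logit scale, with ORDER + 1 coefficients.
- A subject's likelihood under each group is the product of Bernoulli probabilities across waves.
- The posterior weights each likelihood by the group's share, and the subject goes to the group with the largest posterior.

The coefficients and group shares are hypothetical stand-ins for the estimates PROC TRAJ would produce. This is a simplified illustration of the idea, not PROC TRAJ's internal code.

```python
import math

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def group_prob(coefs, t):
    """P(y = 1 at time t) for one group; coefs = [b0, b1, ...] has ORDER + 1 terms."""
    return inv_logit(sum(b * t ** k for k, b in enumerate(coefs)))

def posterior_groups(ys, ts, groups, shares):
    """Posterior group-membership probabilities for one subject (Bayes' rule)."""
    likelihoods = []
    for coefs in groups:
        lik = 1.0
        for y, t in zip(ys, ts):
            if y is None:  # a missing wave contributes nothing
                continue
            p = group_prob(coefs, t)
            lik *= p if y == 1 else 1.0 - p
        likelihoods.append(lik)
    joint = [s * lik for s, lik in zip(shares, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

def assign_group(posteriors):
    """Index of the group with the highest posterior probability."""
    return max(range(len(posteriors)), key=posteriors.__getitem__)

# Hypothetical fitted values: group 0 = rare offenders, group 1 = declining offenders.
GROUPS = [[-3.0, 0.0], [0.5, -0.2]]  # like ORDER 1 1
SHARES = [0.7, 0.3]                  # estimated group proportions
TIMES = [0, 1, 2, 3, 4]

post_active = posterior_groups([1, 1, 0, 1, 0], TIMES, GROUPS, SHARES)
post_quiet = posterior_groups([0, 0, 0, 0, 0], TIMES, GROUPS, SHARES)
print(assign_group(post_active), assign_group(post_quiet))  # 1 0
```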

Options

• DATA= data set for analysis
• Output data set names:
  – OUT= group assignments and membership probabilities, e.g. OUT=OF.
  – OUTSTAT= parameter estimates used by the TRAJPLOT macro, e.g. OUTSTAT=OS.
  – OUTPLOT= trajectory plot data, e.g. OUTPLOT=OP.

Options

• MODEL; dependent variable distribution (CNORM, ZIP, LOGIT), e.g. MODEL CNORM;
• VAR; dependent variables measured at different times or ages (for example, a hyperactivity score measured at age t), e.g. VAR V1-V8;
• INDEP; independent variables (e.g. age, time) giving when the dependent (VAR) variables were measured, e.g. INDEP T1-T8;
  – Conceptually, there is one dependent variable plus two variables that are always present: the subject ID and time.
• ORDER; polynomial order (0 = intercept, 1 = linear, 2 = quadratic, 3 = cubic) for each group, e.g. ORDER 2 2 2 0; if omitted, cubics are used by default.
• ID; variables (typically identifying observations) to place in the output (OUT=) data set, e.g. ID IDNO;
• WEIGHT; weight variable for a weighted likelihood function.

Example

• This example uses data from 195 subjects in a prospective longitudinal survey. Offense convictions were recorded annually for boys from age 8 through age 32 (1 = 1 or more convictions, 0 = no convictions).

PROC TRAJ DATA=CAMBRDGE OUT=OF OUTPLOT=OP OUTSTAT=OS ITDETAIL;
   ID ID;
   VAR C1-C23;
   INDEP T1-T23;
   MODEL LOGIT;
   NGROUPS 2;
   ORDER 1 1;
RUN;
%TRAJPLOT(OP,OS,'Offenses vs. Time','Logistic Model','Offenses','Scaled Age')

PROC TRAJ DATA=CAMBRDGE OUT=OF OUTPLOT=OP OUTSTAT=OS;
   ID ID;
   VAR C1-C23;
   INDEP T1-T23;
   MODEL LOGIT;
   NGROUPS 2;
   ORDER 3 3;
RUN;
/*Creating Graph*/
%TRAJPLOT(OP,OS,'Offenses vs. Time','Logistic Model','Offenses','Scaled Age')

Notice the change in AIC relative to the linear (ORDER 1 1) model.
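Moving from ORDER 1 1 to ORDER 3 3 improves the fit, but it also adds parameters, and AIC penalizes each additional one. The Python sketch below counts parameters and computes the standard AIC.

It assumes a LOGIT model with no risk factors, which has (ORDER + 1) coefficients per group plus NGROUPS − 1 group-membership parameters. The log-likelihoods are hypothetical, not the values from the slides. PROC TRAJ's own output may scale the criterion differently (for example, as a log-likelihood minus a penalty, where larger values are better), so check the convention in your output before comparing.

```python
def n_traj_params(orders):
    """Free parameters assumed for a LOGIT trajectory model with no risk factors:
    (ORDER + 1) polynomial coefficients per group plus NGROUPS - 1 membership parameters."""
    return sum(order + 1 for order in orders) + (len(orders) - 1)

def aic(log_lik, k):
    """Standard AIC = 2k - 2 log L; smaller values indicate a better trade-off."""
    return 2 * k - 2 * log_lik

# Hypothetical maximized log-likelihoods, NOT taken from the slides' output.
candidates = {"ORDER 1 1": ([1, 1], -820.0), "ORDER 3 3": ([3, 3], -812.5)}
scores = {}
for label, (orders, log_lik) in candidates.items():
    k = n_traj_params(orders)
    scores[label] = aic(log_lik, k)
    print(label, "k =", k, "AIC =", scores[label])
best = min(scores, key=scores.get)
```

With these made-up numbers, the cubic model's gain in log-likelihood (7.5) outweighs its four extra parameters. A smaller gain would have favored the linear model.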

Now what?

• In any case, there are clearly 2 groups of people:
  – Why are they different? Look at the other independent variables.

Example 2: Number of remissions

PROC TRAJ DATA=TRY OUTPLOT=OP OUTSTAT=OS OUT=OF OUTEST=OE;
   ID ID;
   VAR R0-R10;
   INDEP T0-T10;
   MODEL LOGIT;
   NGROUPS 3;
   ORDER 1 2 2;
RUN;
%TRAJPLOT(OP,OS,'Remission vs. Time','Logistic Model','Remission','Time')

PROC TRAJ DATA=TRY OUTPLOT=OP OUTSTAT=OS OUT=OF OUTEST=OE;
   ID ID;
   VAR R0-R10;
   INDEP T0-T10;
   MODEL LOGIT;
   NGROUPS 4;
   ORDER 0 3 3 3;
RUN;
%TRAJPLOT(OP,OS,'Remission vs. Time','Logistic Model','Remission','Time')

PROC GLIMMIX for Counts

• The GLIMMIX procedure fits statistical models to data with correlations or nonconstant variability and where the response is not necessarily normally distributed.

• These models are known as generalized linear mixed models (GLMMs).

• GLMMs, like linear mixed models, assume normal (Gaussian) random effects.

• Conditional on these random effects, the data can have any distribution in the exponential family. The exponential family comprises many of the elementary discrete and continuous distributions.

• The binary, binomial, Poisson, and negative binomial distributions, for example, are discrete members of this family. The normal, beta, gamma, and chi-square distributions are representatives of the continuous distributions in this family.

• In the absence of random effects, the GLIMMIX procedure fits generalized linear models (the same models fit by the GENMOD procedure).

Basic Features

The GLIMMIX procedure enables you to specify a generalized linear mixed model and to perform confirmatory inference in such models. The syntax is similar to that of the MIXED procedure and includes CLASS, MODEL, and RANDOM statements.

The following are some of the basic features of PROC GLIMMIX.

• SUBJECT= and GROUP= options, which enable blocking of variance matrices and parameter heterogeneity

• choice of linearization about expected values or expansion about current solutions of best linear unbiased predictors

• flexible covariance structures for random and residual random effects, including variance components, unstructured, autoregressive, and spatial structures

• CONTRAST, ESTIMATE, LSMEANS, and LSMESTIMATE statements, which produce hypothesis tests and estimable linear combinations of effects

Notation for the Generalized Linear Mixed Model

The GLIMMIX procedure determines the variance function from the DIST= option in the MODEL statement or from the user-supplied variance function.

The matrix R is a variance matrix specified by the user through the RANDOM statement.
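For reference, the GLMM is commonly written as below. This is standard notation and follows the SAS/STAT GLIMMIX documentation as we understand it. The symbols are:

- $g$: the link function.
- $\mathbf{X}$ and $\mathbf{Z}$: the fixed- and random-effects design matrices.
- $\mathbf{G}$: the covariance matrix of the random effects $\boldsymbol{\gamma}$.
- $\mathbf{A}$: a diagonal matrix of variance functions, for example $\mu(1-\mu)$ for the binomial and $\mu$ for the Poisson.

$$
\begin{aligned}
E[\mathbf{Y} \mid \boldsymbol{\gamma}] &= g^{-1}(\mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma}) = g^{-1}(\boldsymbol{\eta}) = \boldsymbol{\mu} \\
\boldsymbol{\gamma} &\sim N(\mathbf{0}, \mathbf{G}) \\
\operatorname{Var}[\mathbf{Y} \mid \boldsymbol{\gamma}] &= \mathbf{A}^{1/2}\,\mathbf{R}\,\mathbf{A}^{1/2}
\end{aligned}
$$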

PROC GLIMMIX Contrasted with Other SAS Procedures

The GLIMMIX procedure generalizes the MIXED and GENMOD procedures in two important ways.

First, the response can have a nonnormal distribution. The MIXED procedure assumes that the response is normally (Gaussian) distributed.

Second, the GLIMMIX procedure incorporates random effects in the model and so allows for subject-specific (conditional) and population-averaged (marginal) inference. The GENMOD procedure allows only marginal inference.

The GLIMMIX and MIXED procedures are closely related.

Example

Researchers investigated the performance of two medical procedures in a multicenter study.

They randomly selected centers for inclusion. The data listing below shows 5 centers; the reported results use the complete data from 15 centers.

One of the study goals was to compare the occurrence of side effects for the procedures.

In each center nA patients were randomly selected and assigned to procedure “A,” and nB patients were randomly assigned to procedure “B”.

The following DATA step creates the data set for the analysis.

Example

data multicenter;
   input center group$ n sideeffect;
   datalines;
1 A 32 14
1 B 33 18
2 A 30 4
2 B 28 8
3 A 23 14
3 B 24 9
4 A 8 1
4 B 8 1
5 A 7 1
5 B 8 0
;

The variable group identifies the two procedures, n is the number of patients who received a given procedure in a particular center, and sideeffect is the number of patients who reported side effects.
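The events/trials layout (`sideeffect` out of `n`) can be summarized with crude pooled proportions. The Python sketch below computes them for the five centers listed above.

This pooling ignores center-to-center variation, which the random center intercept in the GLMM is meant to capture. Because the sketch uses only the five centers shown, its proportions should not be expected to match the model-based estimates, which come from all 15 centers. The helper name `pooled_proportion` is hypothetical.

```python
# (center, group, n, sideeffect) as listed in the DATA step above
multicenter = [
    (1, "A", 32, 14), (1, "B", 33, 18),
    (2, "A", 30, 4),  (2, "B", 28, 8),
    (3, "A", 23, 14), (3, "B", 24, 9),
    (4, "A", 8, 1),   (4, "B", 8, 1),
    (5, "A", 7, 1),   (5, "B", 8, 0),
]

def pooled_proportion(rows, group):
    """Total events, total trials and their ratio for one group, ignoring centers."""
    events = sum(y for _, g, _, y in rows if g == group)
    trials = sum(n for _, g, n, _ in rows if g == group)
    return events, trials, events / trials

for group in ("A", "B"):
    events, trials, prop = pooled_proportion(multicenter, group)
    print(group, events, trials, round(prop, 4))
```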

Example

If $Y_{iA}$ and $Y_{iB}$ denote the numbers of patients in center $i$ who report side effects for procedures A and B, respectively, then, for a given center, these are independent binomial random variables.

To model the probabilities of side effects for the two procedures, $\pi_{iA}$ and $\pi_{iB}$, you need to account for the fixed group effect and the random selection of centers. One possibility is to assume a model that relates group and center effects linearly to the logit of the probabilities:

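$$\log\left\{\frac{\pi_{iA}}{1-\pi_{iA}}\right\} = \beta_0 + \beta_A + \gamma_i$$

$$\log\left\{\frac{\pi_{iB}}{1-\pi_{iB}}\right\} = \beta_0 + \beta_B + \gamma_i$$

where $\gamma_i$ is the random effect of center $i$.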

Example

proc glimmix data=multicenter;
   class center group;
   model sideeffect/n = group / solution;
   random intercept / subject=center;
run;

The PROC GLIMMIX statement invokes the procedure.

The CLASS statement instructs the procedure to treat the variables center and group as classification variables.

The MODEL statement specifies the response variable as a sample proportion using the events/trials syntax. In terms of the previous formulas, sideeffect/n corresponds to $Y_{iA}/n_{iA}$ for observations from group A and to $Y_{iB}/n_{iB}$ for observations from group B.

Example

The SOLUTION option in the MODEL statement requests a listing of the solutions for the fixed-effects parameter estimates.

Note that because of the events/trials syntax, the GLIMMIX procedure defaults to the binomial distribution, and that distribution’s default link is the logit link.

The RANDOM statement specifies that the linear predictor contains an intercept term that randomly varies at the level of the center effect. In other words, a random intercept is drawn separately and independently for each center in the study.

The results of this analysis are shown on the following pages.

Example

The “Parameter Estimates” table displays the solutions for the fixed effects in the model.

Solutions for Fixed Effects

Effect      group   Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept           -0.8071    0.2514           14   -3.21     0.0063
group       A       -0.4896    0.2034           14   -2.41     0.0305
group       B        0         .                .    .         .

Because of the fixed-effects parameterization used in the GLIMMIX procedure, the “Intercept” effect is an estimate of $\beta_0 + \beta_B$, and the “A” group effect is an estimate of $\beta_A - \beta_B$, the log-odds ratio. The associated estimated probabilities of side effects in the two groups are

$$\hat{\pi}_A = \frac{1}{1 + \exp\{0.8071 + 0.4896\}} = 0.2147$$

$$\hat{\pi}_B = \frac{1}{1 + \exp\{0.8071\}} = 0.3085$$

There is a significant difference between the two groups (p=0.0305).

These results are from the complete data from 15 centers.
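The conversion from the logit-scale estimates to the probabilities above is just the inverse logit. The short Python check below reproduces 0.2147 and 0.3085 from the reported estimates. It also exponentiates the group A effect to get the estimated odds ratio of side effects for A versus B.

```python
import math

def inv_logit(eta):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

intercept = -0.8071  # "Intercept" estimate: beta_0 + beta_B (group B log-odds)
effect_a = -0.4896   # "group A" estimate: beta_A - beta_B (log-odds ratio, A vs. B)

p_a = inv_logit(intercept + effect_a)  # 1 / (1 + exp(0.8071 + 0.4896))
p_b = inv_logit(intercept)             # 1 / (1 + exp(0.8071))
odds_ratio = math.exp(effect_a)        # odds of side effects, A relative to B
print(f"{p_a:.4f} {p_b:.4f} {odds_ratio:.3f}")  # 0.2147 0.3085 0.613
```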

Example

You can produce the estimates of the average logits in the two groups and their predictions on the scale of the data with the LSMEANS statement in PROC GLIMMIX.

ods select lsmeans;
proc glimmix data=multicenter;
   class center group;
   model sideeffect/n = group / solution;
   random intercept / subject=center;
   lsmeans group / cl ilink;
run;

The LSMEANS statement requests the least-squares means of the group effect on the logit scale.

The CL option requests their confidence limits, and the ILINK option also reports the estimates and confidence limits on the probability (data) scale.

Example

group Least Squares Means

group   Estimate   Standard Error   DF   t Value   Pr > |t|   Alpha   Lower     Upper     Mean
A       -1.2966    0.2601           14   -4.99     0.0002     0.05    -1.8544   -0.7388   0.2147
B       -0.8071    0.2514           14   -3.21     0.0063     0.05    -1.3462   -0.2679   0.3085

The “Estimate” column displays the least-squares mean estimate on the logit scale, and the “Mean” column represents its mapping onto the probability scale.

The “Lower” and “Upper” columns are 95% confidence limits for the logits in the two groups.

The “Lower Mean” and “Upper Mean” columns (not reproduced in the table above) are the corresponding confidence limits for the probabilities of side effects. These limits are obtained by inverse-linking the confidence bounds on the linear (logit) scale, and thus are not symmetric about the estimated probabilities.
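The Python sketch below reconstructs both sets of limits. On the logit scale they are estimate ± t × SE. The t critical value for 14 DF (about 2.1448) is hard-coded from a t table because the standard library has no t quantile function.

Inverse-linking those bounds gives roughly 0.135 to 0.323 for group A. That interval sits farther above the estimate of 0.2147 than below it, which is the asymmetry described above. The helper name `lsmean_limits` is hypothetical.

```python
import math

def inv_logit(eta):
    """Inverse of the logit link: maps the linear scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

T_CRIT = 2.1448  # 0.975 quantile of Student's t with 14 DF, taken from a t table

def lsmean_limits(estimate, std_err, t_crit=T_CRIT):
    """t-based limits on the logit scale and their inverse-linked versions."""
    lower = estimate - t_crit * std_err
    upper = estimate + t_crit * std_err
    return (lower, upper), (inv_logit(lower), inv_logit(upper))

(lo_a, hi_a), (p_lo_a, p_hi_a) = lsmean_limits(-1.2966, 0.2601)  # group A row
(lo_b, hi_b), (p_lo_b, p_hi_b) = lsmean_limits(-0.8071, 0.2514)  # group B row
p_a = inv_logit(-1.2966)
# The probability-scale interval is wider above the estimate than below it.
print(round(p_lo_a, 4), round(p_a, 4), round(p_hi_a, 4))
```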

Poisson Distribution

• The Poisson distribution is for counts: if events happen at a constant rate over time, the Poisson distribution gives the probability of X events occurring in a time interval of length T.

Poisson Mean and Variance

• Mean: $\mu = \lambda$

• Variance and standard deviation: $\sigma^2 = \lambda$, $\sigma = \sqrt{\lambda}$

where $\lambda$ = expected number of hits in a given time period.

For a Poisson random variable, the variance and mean are the same!
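The Poisson probabilities are $P(X = k) = \lambda^k e^{-\lambda}/k!$. The Python sketch below builds them recursively and checks numerically that the mean and variance are both $\lambda$. Building them recursively avoids computing large factorials directly. The rate $\lambda = 3.5$ is an arbitrary illustrative value.

```python
import math

def poisson_probs(lam, k_max):
    """P(X = k) for k = 0..k_max, built recursively: p_k = p_{k-1} * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, k_max + 1):
        probs.append(probs[-1] * lam / k)
    return probs

lam = 3.5                        # illustrative expected number of events per period
probs = poisson_probs(lam, 100)  # tail beyond k = 100 is negligible for lam = 3.5
mean = sum(k * p for k, p in enumerate(probs))
var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
p_two = probs[2]                 # probability of exactly 2 events
print(f"mean={mean:.4f} variance={var:.4f} P(X=2)={p_two:.4f}")  # mean=3.5000 variance=3.5000 P(X=2)=0.1850
```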

Example: Poisson

Subjects are HIV+ drug users from Project CLEAR. There are two different outcomes: the number of sex acts in the last 3 months and the number of HIV-negative or HIV-status-unknown partners in the last 3 months.

• Subject = subject ID number
• Act3m = sex acts, last 3 months
• Hpart3m = HIV-negative or HIV-status-unknown partners in the last 3 months
• Follow = 0, 3, 6, 9, 15, or 21 months post baseline
• Intv = intervention (1 = intervention, 0 = not)
• Gender: 0 = female, 1 = male
• Trade3m = traded sex for money in the last 3 months (0 = no, 1 = yes)

proc glimmix data=poisson;
   class subject intv follow gender ethnic trade3m;
   model Act3m = follow intv gender ethnic trade3m / dist=poisson solution;
   random intercept / subject=subject;
run;
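The random subject intercept also explains why real count data often violate "mean = variance". Counts can be exactly Poisson for each subject and still be overdispersed across subjects.

The Python sketch below simulates data under hypothetical parameters; these are not the Project CLEAR data. Conditional on $u$, a count is Poisson with rate $\exp(b_0 + u)$, where $u \sim N(0, \sigma^2)$. Marginally:

- The mean is $\mu = \exp(b_0 + \sigma^2/2)$.
- The variance is $\mu + \mu^2(e^{\sigma^2} - 1)$, which exceeds the mean.

The sampler is Knuth's multiplication method, chosen only because the standard library has no Poisson generator.

```python
import math
import random
import statistics

def draw_poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

# Hypothetical parameters: log-rate intercept and random-intercept SD.
B0, SD_SUBJECT = 1.0, 0.5
rng = random.Random(2013)

counts = []
for _ in range(20000):
    u = rng.gauss(0.0, SD_SUBJECT)                       # subject's random intercept
    counts.append(draw_poisson(math.exp(B0 + u), rng))   # Poisson given u

mean = statistics.fmean(counts)
var = statistics.pvariance(counts)
theory_mean = math.exp(B0 + SD_SUBJECT ** 2 / 2)
theory_var = theory_mean + theory_mean ** 2 * (math.exp(SD_SUBJECT ** 2) - 1)
print(round(mean, 2), round(var, 2), round(theory_mean, 2), round(theory_var, 2))
```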

Exercise 8: Binomial

• The variable age gives the age group.
• The variable hmo is a binary indicator variable for HMO-insured patients.
• Suppose that we want to determine whether patients with HMO coverage die at a different rate.

Exercise 8: PostDoc Example

• 557 biochemists received doctorates from 106 American universities.
• Variables:
  – PDC: Went for post-doc training immediately after PhD
  – AGE: Age at PhD completion
  – MAR: Married = 1, Unmarried = 0
  – DOC: Prestige of doctoral institution
  – UND: Selectivity of undergraduate institution
  – AG: Agricultural department = 1, 0 otherwise
  – ARTS: Number of articles published (outcome variable)
  – CITS: Number of citations of published articles
  – DOCID: ID of doctoral institution

Raul Cruz-Cano, HLTH653 Spring 2013


