
AN INTRODUCTION TO LATENT CLASS AND LATENT PROFILE ANALYSIS

Social Science Research Commons

Indiana University Bloomington

Workshop in Methods

BETHANY C. BRAY, PH.D.

• Associate Director for Scientific and Infrastructure Development, Institute for Health Research and Policy

• The University of Illinois at Chicago

OVERVIEW

• Conceptual introduction to latent class analysis (LCA)

• An example: Latent classes of adolescent drinking behavior

• Parameters estimated in LCA

• Technical considerations: Model identification, model selection

• Software options


• Including grouping variables

• Predicting latent class membership

• Predicting a distal outcome

• Considerations with latent profile analysis

• Resources

• Question & Answer

LATENT CLASS ANALYSIS (LCA)

ABBREVIATIONS

• LCA = latent class analysis

• Static, categorical latent variable measured with categorical items

• LPA = latent profile analysis

• Static, categorical latent variable measured with continuous items

• LTA = latent transition analysis

• Dynamic, categorical latent variable

CONCEPTUAL INTRODUCTION: LCA

THE BASIC IDEAS

• Individuals can be divided into subgroups based on an unobservable construct

• The construct of interest is the latent variable

• Subgroups are called latent classes


• True class membership is unknown

• Unknown due to measurement error

• Measurement of the construct is typically based on several categorical indicators


• Latent classes are mutually exclusive & exhaustive

ESTIMATED PARAMETERS

• Latent class prevalences

• e.g., probability of membership in HIGH DEPRESSION latent class

• Item-response probabilities

• e.g., probability of reporting Felt Lonely given membership in HIGH DEPRESSION latent class

LATENT CLASSES OF ADOLESCENT DRINKING BEHAVIOR

DRINKING IN 12TH GRADE

• Data from 2004 cohort of Monitoring the Future public release

• n = 2490 high school seniors who answered at least one question about alcohol use (48% boys, 52% girls)

• Goals of the study:

• Alcohol use behavior among U.S. 12th graders

• Gender differences in measurement and behavior

• Predict behavior from skipping school and grades

DRINKING IN 12TH GRADE

Item                          Proportion ‘Yes’
Lifetime alcohol use          82%
Past-year alcohol use         73%
Past-month alcohol use        50%
Lifetime drunkenness          57%
Past-year drunkenness         49%
Past-month drunkenness        29%
5+ drinks in past 2 weeks     26%

Seven indicators of drinking behavior

WE WILL USE LCA TO…

• Identify and describe underlying classes of drinking behavior in U.S. 12th grade students

THE 5-CLASS MODEL

Probability of ‘Yes’ response

Item                    Class 1  Class 2  Class 3  Class 4  Class 5
Lifetime alcohol use      .00     1.00     1.00     1.00     1.00
Past-year alcohol         .00      .61     1.00     1.00     1.00
Past-month alcohol        .00      .00     1.00      .39     1.00
Lifetime drunk            .00      .24      .29     1.00     1.00
Past-year drunk           .00      .00      .00     1.00     1.00
Past-month drunk          .00      .00      .00      .00      .92
5+ drinks past 2 wk       .00      .00      .16      .00      .73

What would you name these 5 classes?

THE 5-CLASS MODEL

Probability of ‘Yes’ response

Item                  Nondrinkers  Experimenters  Light Drinkers  Past Partiers  Heavy Drinkers
Lifetime alcohol use                    √               √               √              √
Past-year alcohol                       √               √               √              √
Past-month alcohol                                      √                              √
Lifetime drunk                                                          √              √
Past-year drunk                                                         √              √
Past-month drunk                                                                       √
5+ drinks past 2 wk                                                                    √

GRAPHICAL REPRESENTATION

[Figure: the latent variable Drinking Classes and its observed indicators (Lifetime …, Drinks…)]

SOME TECHNICAL DETAILS: LCA

PARAMETERS ESTIMATED

LATENT CLASS NOTATION

• Y represents the vector of all possible response patterns

• y represents a particular response pattern

• Example: y = (Y, Y, N, N, N, N, N)

• X represents the vector of all covariates of interest

• x represents a particular vector of covariate values

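• The latent class model can be expressed as

$$P(\mathbf{Y}_i = \mathbf{y} \mid \mathbf{X}_i = \mathbf{x}_i) = \sum_{c=1}^{K} \gamma_c(\mathbf{x}_i) \prod_{m=1}^{M} \prod_{r_m=1}^{R_m} \rho_{m,r_m|c}^{\,I(y_m = r_m)}$$

$$\gamma_c(\mathbf{x}_i) = P[C_i = c \mid \mathbf{X}_i = \mathbf{x}_i] = \frac{\exp[\beta_{0c} + \beta_{1c} x_{i1} + \cdots + \beta_{pc} x_{ip}]}{1 + \sum_{j=1}^{K-1} \exp[\beta_{0j} + \beta_{1j} x_{i1} + \cdots + \beta_{pj} x_{ip}]}$$

…with c = 1, 2, …, K latent classes and m = 1, 2, …, M indicators, each with r_m = 1, 2, …, R_m response options. Class K is the reference class, with all of its β parameters fixed at 0.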

• $\gamma_c(\mathbf{x}_i)$ = probability of membership in latent class c (latent class membership probabilities)

• $\rho_{m,r_m|c}$ = probability of response r_m to indicator m, conditional on membership in latent class c (item-response probabilities)
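To make the model concrete, here is a minimal Python sketch of the first equation for binary items (R_m = 2, ‘Yes’ coded 1) with no covariates, so each γ_c is a constant. For illustration only, it combines the ρ values from the 5-class drinking model with the male prevalences from the sex-differences table later in these slides; pairing those two sets of estimates is an assumption of the example, and the name `pattern_prob` is hypothetical.

```python
from itertools import product

def pattern_prob(y, gamma, rho):
    """P(Y = y) under an LCA with binary items and no covariates.

    y     : tuple of 0/1 responses (1 = 'Yes'), one per indicator m
    gamma : latent class prevalences gamma_c
    rho   : rho[c][m] = P(item m = 'Yes' | class c)
    """
    total = 0.0
    for g_c, rho_c in zip(gamma, rho):
        # Conditional independence: multiply item probabilities within class c
        lik = 1.0
        for y_m, r in zip(y, rho_c):
            lik *= r if y_m == 1 else 1.0 - r
        total += g_c * lik
    return total

# rho from the 5-class table: rows = classes 1..5, columns = the 7 items in table order
RHO = [
    [.00, .00, .00, .00, .00, .00, .00],
    [1.00, .61, .00, .24, .00, .00, .00],
    [1.00, 1.00, 1.00, .29, .00, .00, .16],
    [1.00, 1.00, .39, 1.00, 1.00, .00, .00],
    [1.00, 1.00, 1.00, 1.00, 1.00, .92, .73],
]
GAMMA_MALES = [.18, .22, .09, .13, .38]  # male prevalences (illustrative pairing)

y_example = (1, 1, 0, 0, 0, 0, 0)  # y = (Y, Y, N, N, N, N, N)
p_example = pattern_prob(y_example, GAMMA_MALES, RHO)

# The model-implied probabilities of all 2^7 = 128 patterns sum to 1
p_total = sum(pattern_prob(y, GAMMA_MALES, RHO) for y in product((0, 1), repeat=7))
```

For the example pattern, only Class 2 contributes, because every other class has a ρ of exactly 0 or 1 that rules the pattern out, so the result is .22 × .61 × .76 ≈ .102.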

ITEM-RESPONSE PROBABILITIES

• ρ parameters express the relation between…

• The discrete latent variable in an LCA and

• The observed indicator variables

• Similar conceptually to factor loadings

• Basis for interpretation of latent classes

• Are probabilities (between 0 and 1)

ITEM-RESPONSE PROBABILITIES

• ρ parameters are analogous to factor loadings; both…

• Express relation between manifest and latent variables

• Form basis for interpreting latent structure

• But…

• Factor loadings are regression weights

• ρ parameters are probabilities

ρ PARAMETERS

• 0 ≤ ρ ≤ 1

• When the latent variable and a manifest variable completely correspond, ρ = 0 OR ρ = 1

• When the latent variable does not predict a manifest variable at all, ρ equals that item's marginal probability in every class

• So, if we are trying to measure a latent variable, what kind of ρ's do we like?

CHARACTERISTICS OF PATTERNS OF ρ PARAMETERS

• Homogeneity: degree to which ρ parameters for a particular latent class are close to 0 or 1

• Latent class separation: degree to which latent classes can clearly be distinguished from each other

Probability of correctly performing practical task

          Latent Class 1   Latent Class 2
Task 1         .10              .91
Task 2         .15              .90
Task 3         .05              .89
Task 4         .10              .95
Task 5         .12              .90

High homogeneity + High latent class separation

Probability of correctly performing practical task

          Latent Class 1   Latent Class 2
Task 1         .80              .91
Task 2         .82              .90
Task 3         .81              .89
Task 4         .80              .95
Task 5         .84              .90

High homogeneity + Lower latent class separation

MODEL IDENTIFICATION

• What is “maximum likelihood estimation”?

• The likelihood function expresses the likelihood of the observed data, given the model being fit, as a function of all possible parameter values

• “Winning” parameter estimates (if identified): the set that maximizes the likelihood

DEALING WITH IDENTIFICATION IN PRACTICE

• Many estimation procedures require initial values for the parameters to “kick off” the estimation procedure

• If different starting values produce very different estimates and different G2s, model is not well-identified

• Run many different sets of starting values, say 100 or more

• Look at distribution of G2 values
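The starting-values check above can be sketched with the EM algorithm, one common way to maximize the LCA likelihood (the slides do not say which algorithm a given package uses), run from many random starts while recording G2 = 2(log L_saturated − log L_model) for each start. This minimal NumPy sketch assumes binary items coded 0/1 and no covariates; the function names and the simulated two-class data are hypothetical.

```python
import numpy as np

def e_step(Y, gamma, rho):
    """Log-likelihood and posterior class probabilities for binary data Y (n x M)."""
    # log P(y_i, C = c) = log gamma_c + sum_m [y_im log rho_mc + (1 - y_im) log(1 - rho_mc)]
    log_joint = np.log(gamma) + Y @ np.log(rho).T + (1 - Y) @ np.log1p(-rho).T
    log_marg = np.logaddexp.reduce(log_joint, axis=1)      # log P(y_i)
    return log_marg.sum(), np.exp(log_joint - log_marg[:, None])

def fit_lca(Y, K, seed, max_iter=2000, tol=1e-8):
    """One EM run for a K-class LCA, started from random values."""
    rng = np.random.default_rng(seed)
    gamma = rng.dirichlet(np.ones(K))                   # starting prevalences
    rho = rng.uniform(0.1, 0.9, size=(K, Y.shape[1]))   # starting item-response probabilities
    ll, post = e_step(Y, gamma, rho)
    for _ in range(max_iter):
        # M-step: closed-form updates weighted by the posteriors
        gamma = np.clip(post.mean(axis=0), 1e-12, None)
        rho = post.T @ Y / np.maximum(post.sum(axis=0), 1e-12)[:, None]
        rho = np.clip(rho, 1e-10, 1 - 1e-10)            # keep logs finite at boundary estimates
        new_ll, post = e_step(Y, gamma, rho)
        converged = new_ll - ll < tol
        ll = new_ll
        if converged:
            break
    return ll, gamma, rho

def saturated_loglik(Y):
    """Log-likelihood of the saturated model (observed pattern proportions)."""
    _, counts = np.unique(Y, axis=0, return_counts=True)
    return float(np.sum(counts * np.log(counts / len(Y))))

def multistart_g2(Y, K, n_starts=100):
    """Fit from n_starts random starts; return every G2 and the best fit."""
    ll_sat = saturated_loglik(Y)
    fits = [fit_lca(Y, K, seed=s) for s in range(n_starts)]
    g2 = np.array([2.0 * (ll_sat - ll) for ll, _, _ in fits])
    return g2, fits[int(np.argmin(g2))]

# Hypothetical data: 2 well-separated classes (prevalences .4 / .6), 7 binary items
rng = np.random.default_rng(1)
in_first = rng.random(1000) < 0.4
Y = (rng.random((1000, 7)) < np.where(in_first[:, None], 0.9, 0.1)).astype(int)

g2, (ll, gamma, rho) = multistart_g2(Y, K=2, n_starts=20)
share_at_best = float(np.mean(g2 - g2.min() < 1e-3))  # near 1 suggests good identification
```

If nearly all starts reach the same smallest G2, the solution looks well identified; a wide spread of G2 values across starts is the warning sign described above.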

MODEL SELECTION

ABSOLUTE VS. RELATIVE MODEL FIT

• Absolute model fit refers to whether a specified LCA model provides an ‘adequate’ representation of the data

• Adequate, according to some test statistic

• To test absolute model fit, we need the distribution of the test statistic under the null hypothesis

• H0: the specified model fits the data

COMMON TEST STATISTIC: G2

• As in many contingency table methods, LCA computes predicted response pattern proportions according to the model and estimated parameters

• These predicted response pattern proportions are compared to the observed response pattern proportions

• This comparison is expressed in the likelihood ratio statistic G2
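A minimal sketch of the G2 computation, G2 = 2 Σ observed × ln(observed / expected), assuming observed and model-predicted counts are listed for the same response patterns. Patterns with a zero observed count contribute nothing (the limit of o·ln o as o → 0), and this is also where sparseness shows up: with 7 binary items there are 2^7 = 128 possible patterns, many with tiny expected counts. The function name is hypothetical.

```python
import math

def g_squared(observed, expected):
    """Likelihood-ratio statistic G2 = 2 * sum(obs * ln(obs / exp)) over response patterns."""
    # Zero observed counts contribute 0 (limit of o * ln o as o -> 0);
    # expected counts must be positive wherever observed counts are.
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

g2_toy = g_squared([30, 10], [20, 20])  # about 10.46
```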

ISSUES WITH THIS APPROACH

• There are issues with this approach to model selection in LCA, and especially in LTA

• When data are sparse, G2 is not distributed as chi-square

• This makes it hard to test the fit of the model

ABSOLUTE VS. RELATIVE MODEL FIT

• Relative model fit refers to deciding whether Model A or Model B is better

• AIC, BIC good tools for relative model fit

• These are information criteria (penalized log-likelihood)

• Optimize balance between fit and parsimony

• Usually scaled so that smaller AIC, BIC is better

AIC AND BIC

• p = number of parameters estimated in the model

• n = sample size

$$\mathrm{AIC} = G^2 + 2p$$

$$\mathrm{BIC} = G^2 + [\log(n)]\,p$$
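These formulas can be checked against the SELECTING THE NUMBER OF DRINKING CLASSES table. The sketch assumes binary items, no covariates, and no grouping variable, so p = (K − 1) prevalences + K·M item-response probabilities and df = 2^M − 1 − p; log is the natural log. The helper name is hypothetical.

```python
import math

def lca_fit_summary(g2, n, n_classes, n_items):
    """p, df, AIC and BIC for an LCA with binary items, no covariates or groups."""
    p = (n_classes - 1) + n_classes * n_items  # (K - 1) prevalences + K*M item-response probs
    df = 2 ** n_items - 1 - p                  # 2^M response patterns, minus 1, minus p
    aic = g2 + 2 * p
    bic = g2 + math.log(n) * p                 # natural log
    return p, df, aic, bic

# One-class row of the drinking table: n = 2490, 7 items, G2 = 9510
p, df, aic, bic = lca_fit_summary(9510, 2490, 1, 7)
```

This gives p = 7, df = 120, AIC = 9524, and BIC ≈ 9564.7, matching that row; the df column for 1 to 7 classes (120, 112, …, 72) is reproduced the same way. Small discrepancies elsewhere (e.g., AIC = 81 for 5 classes versus 4 + 2 × 39 = 82) are presumably due to rounding of G2.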

DIFFERENCE IN G2 VS. BLRT

• It is tempting to calculate the G2 difference for two competing models

• For example, 3 vs. 4 classes

• But a chi-square test is not appropriate here because we do not know the correct reference distribution for the G2 difference

• One solution: bootstrap the G2 difference

• H0: 3 class model sufficient

• H1: 4 classes required
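A minimal sketch of bootstrapping the G2 difference (a parametric bootstrap). It assumes you can fit both models and simulate data from the fitted H0 model; `fit_null`, `fit_alt`, and `simulate_null` are hypothetical callables. In an LCA application they would fit the 3- and 4-class models (each from many random starts) and generate data sets from the fitted 3-class model; here a one-parameter Bernoulli toy stands in so the example runs on its own. The p-value uses the common (exceedances + 1) / (B + 1) convention.

```python
import math
import random

def bootstrap_lrt(data, fit_null, fit_alt, simulate_null, n_boot=99, seed=0):
    """Parametric bootstrap p-value for the likelihood-ratio (G2 difference) statistic.

    fit_null / fit_alt : data -> (loglik, params)   (hypothetical interface)
    simulate_null      : (params, n, rng) -> simulated data set of size n
    """
    rng = random.Random(seed)
    ll0, params0 = fit_null(data)
    ll1, _ = fit_alt(data)
    observed = 2.0 * (ll1 - ll0)  # G2 difference in the real data
    exceed = 0
    for _ in range(n_boot):
        boot = simulate_null(params0, len(data), rng)  # data generated under H0
        b0, _ = fit_null(boot)
        b1, _ = fit_alt(boot)
        if 2.0 * (b1 - b0) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_boot + 1)

# Toy stand-in: H0 fixes P(yes) = 0.5, H1 estimates it freely.
def _bern_ll(k, n, p):
    return (k * math.log(p) if k else 0.0) + ((n - k) * math.log(1 - p) if n - k else 0.0)

def fit_fixed(data):
    return _bern_ll(sum(data), len(data), 0.5), 0.5

def fit_free(data):
    p = sum(data) / len(data)
    return _bern_ll(sum(data), len(data), p), p

def simulate(p, n, rng):
    return [1 if rng.random() < p else 0 for _ in range(n)]

stat, p_value = bootstrap_lrt([1] * 180 + [0] * 20, fit_fixed, fit_free, simulate)
```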

SELECTING THE NUMBER OF DRINKING CLASSES

• BLRT not significant for 6- vs. 5-class model, indicating 6 classes are not needed

Classes   G2     df    AIC    BIC    BLRT
1         9510   120   9524   9564   N/A
2         3019   112   3049   3137   .01
3          911   104    957   1091   .01
4          209    96    271    452   .01
5            4    88     81    308   .01
6            4    80     98    372   .08
7            3    72    113    434   N/A

SOFTWARE OPTIONS

MAIN OPTIONS

• SAS (LCA, LTA)

• Stata (LCA)

• Mplus (LCA, LPA, LTA)

• Latent Gold (LCA, LPA, LTA)

• R – poLCA (LCA)

• http://www.john-uebersax.com/stat/soft.htm

INCLUDING GROUPING VARIABLES

MULTIPLE-GROUPS LCA

• Two reasons to include a grouping variable:

• To explore measurement invariance

• e.g., “Do the items map onto the latent construct in the same way for males and females?”

• To divide sample into groups for comparison purposes

• e.g., “How does the probability of membership in the HEAVY DRINKERS latent class differ in the experimental and control groups?”

MULTIPLE-GROUPS LCA

• ρ parameters may vary as a function of the grouping variable

• Allows test of measurement invariance

• γ parameters may vary as a function of the grouping variable

• Allows comparison of latent class prevalences

• Include a grouping variable (i.e., sex)

• Test for measurement invariance across males and females

• Examine sex differences in prevalence of behavior types

• Models with ρ parameters free and constrained equal across groups are statistically nested

• Free ρ parameters allow measurement to differ across groups

• Constrained ρ parameters equate corresponding measurement parameters across groups

In general, two models are nested if the simpler model can be arrived at by imposing parameter restrictions on the more complex model.

MEASUREMENT INVARIANCE ACROSS GROUPS

TESTING MEASUREMENT INVARIANCE

• H0: Simpler model is adequate

• H1: Simpler model is not adequate

• Often, we hope to fail to reject the null hypothesis

• If non-significant, this supports measurement invariance

• Our result is not significant: G2 difference = 18 with 35 df, p > .05

• Measurement invariance is plausible

• Keep parameter restrictions
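Unlike the comparison of different numbers of classes, these nested models have the same number of classes, so the G2 difference is referred to a chi-square distribution with df equal to the difference in the number of free parameters (35 here). A standard-library Python sketch of the upper-tail p-value follows, using the usual series and continued-fraction expansions of the regularized incomplete gamma function; SciPy's `scipy.stats.chi2.sf` would give the same numbers if it is available. The function names are hypothetical.

```python
import math

def _lower_series(a, x):
    """Regularized lower incomplete gamma P(a, x) by series (use when x < a + 1)."""
    term = total = 1.0 / a
    ap = a
    for _ in range(10_000):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def _upper_cf(a, x):
    """Regularized upper incomplete gamma Q(a, x) by continued fraction (x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_sf(stat, df):
    """Upper-tail p-value P(chi-square with df degrees of freedom >= stat)."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    if x < a + 1.0:
        return 1.0 - _lower_series(a, x)
    return _upper_cf(a, x)

p_invariance = chi2_sf(18, 35)         # G2 difference for measurement invariance
p_skipped_school = chi2_sf(162.1, 4)   # change in -2 log L, skipped school
p_grades = chi2_sf(56.8, 4)            # change in -2 log L, grades
```

Here chi2_sf(18, 35) is about .99, so the restricted (invariant) model is retained; the same function gives p < .0001 for the covariate tests reported later (162.1 and 56.8 on 4 df).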

SEX DIFFERENCES IN CLASS PREVALENCES

Class            Males   Females
Nondrinkers      18%     18%
Experimenters    22%     23%
Light Drinkers    9%      9%
Past Partiers    13%     21%
Heavy Drinkers   38%     28%

Sex differences in probabilities of membership in drinking classes: γ parameters


PREDICTING LATENT CLASS MEMBERSHIP


• Explore whether grades and skipping school predict drinking class membership

[Figure: Skipping School predicting the latent variable Drinking Classes, which is measured by its indicators (Lifetime …, Drinks…)]

• Regress the latent class variable on predictors

• Logistic regression with a latent outcome

• β parameters express the relation between covariates and class membership

LCA WITH COVARIATES

• β_11 is a logistic regression coefficient influencing the log-odds that an individual falls into Class 1 relative to Class 2 (the reference class)

INTERPRETING BETA PARAMETERS

$$\log\left[\frac{\gamma_1(x_i)}{\gamma_2(x_i)}\right] = \beta_{01} + \beta_{11} x_{i1}$$

TRANSFORMING BETAS TO ODDS RATIOS

• Exponentiated β parameters are odds ratios

• They give the multiplicative change in the odds of class membership, relative to the reference class, for a one-unit increase in the covariate
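Because the reference class has all β fixed at 0, the odds of class c relative to the reference are exp(β_0c + β_1c x), so a one-unit increase in x multiplies them by exp(β_1c). The Python sketch below checks this numerically with the skipped-school slopes from the BETAS AND ODDS RATIOS table. The intercepts are hypothetical (the slides do not report them) and do not affect the odds ratio; the function names are also hypothetical.

```python
import math

def class_probs(x, intercepts, slopes):
    """gamma_c(x) under a baseline-category (multinomial) logit with one covariate x.

    intercepts[j], slopes[j] are beta_0c and beta_1c for non-reference class j;
    the reference class is returned first and has its betas fixed at 0.
    """
    etas = [0.0] + [b0 + b1 * x for b0, b1 in zip(intercepts, slopes)]
    top = max(etas)                           # subtract the max for numerical stability
    expd = [math.exp(e - top) for e in etas]
    total = sum(expd)
    return [e / total for e in expd]

def odds_vs_reference(probs, c):
    return probs[c] / probs[0]

# Skipped-school slopes: Experimenters, Light Drinkers, Past Partiers, Heavy Drinkers
# (each vs. Nondrinkers). INTERCEPTS are HYPOTHETICAL placeholders.
SLOPES = [0.4, 0.7, 0.9, 1.6]
INTERCEPTS = [0.2, -0.7, -0.3, 0.6]

p_no_skip = class_probs(0, INTERCEPTS, SLOPES)
p_skip = class_probs(1, INTERCEPTS, SLOPES)
or_heavy = odds_vs_reference(p_skip, 4) / odds_vs_reference(p_no_skip, 4)
# or_heavy equals exp(1.6), about 4.95, which rounds to the 5.0 in the table
```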

LCA WITH COVARIATES: SKIPPING SCHOOL, GRADES

• Skipped school in past month (dummy coded; 33% yes)

• Grades (standardized)

• Covariates are added in separate models here, but they can be added simultaneously so that each effect is adjusted for the other

• Non-drinkers class specified as reference group for multinomial logit model

OVERALL TESTS OF SIGNIFICANCE

• Skipped school:

• Change in −2 log L (4 df) = 162.1

• p<.0001

• Grades:

• Change in −2 log L (4 df) = 56.8

• p<.0001

BETAS AND ODDS RATIOS

                   Skipped School     Grades
                   β      OR          β      OR
Nondrinkers        ---    1.0         ---    1.0
Experimenters      0.4    1.5        -0.2    0.8
Light Drinkers     0.7    2.0        -0.4    0.7
Past Partiers      0.9    2.5        -0.3    0.7
Heavy Drinkers     1.6    5.0        -0.5    0.6


The odds of membership in the Heavy Drinkers class relative to the Nondrinkers class are 5 times higher for adolescents who skipped school than for those who did not skip.

RELATION BETWEEN SKIPPING SCHOOL AND DRINKING CLASSES

[Figure: drinking classes (Nondrinkers, Experimenters, Drinkers, Bingers, Heavy Drinkers, Past Partiers) for students who did not skip vs. skipped school]

RELATION BETWEEN GRADES AND DRINKING CLASSES

[Figure: the same drinking classes plotted against Grades (x-axis from −2.0 to 2.0)]

PREDICTING A DISTAL OUTCOME

MOTIVATION

• Now, we are interested in predicting later academic achievement from drinking subtypes

• Latent class variable: drinking

• Distal outcome: academic achievement

• Recent research has been aimed at developing new approaches to estimate these types of associations

[Figure: Drinking Classes (indicators: Lifetime …, Drinks…) predicting Academic Achievement]

USING CLASSES TO PREDICT AN OUTCOME

• Broadly categorized as 1-step and 3-step approaches, following terminology from Vermunt

• 1-step approaches are sometimes called model-based approaches

• 3-step approaches are sometimes called classify-analyze approaches

• These approaches are based on posterior probabilities

TWO TRADITIONAL APPROACHES

• Maximum probability assignment, also known as modal probability assignment

• Multiple pseudo-class draws assignment
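Both traditional approaches start from the n × K matrix of posterior class probabilities. A minimal NumPy sketch with hypothetical function names and toy posteriors: modal assignment takes each person's most probable class, while pseudo-class draws sample a class from each person's posterior several times; the outcome analysis is then run on each draw and the results are combined across draws.

```python
import numpy as np

def modal_assignment(post):
    """Maximum-probability (modal) assignment: each person's most probable class."""
    return post.argmax(axis=1)

def pseudo_class_draws(post, n_draws=20, seed=0):
    """Sample class memberships from each person's posterior; shape (n_draws, n)."""
    rng = np.random.default_rng(seed)
    cum = post.cumsum(axis=1)                     # per-person cumulative probabilities
    u = rng.random((n_draws, post.shape[0], 1))   # one uniform per draw and person
    draws = (u > cum[None, :, :]).sum(axis=2)     # inverse-CDF sampling
    return np.minimum(draws, post.shape[1] - 1)   # guard against rounding in cum

# Hypothetical posterior probabilities for 3 people and 2 classes
post = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
modal = modal_assignment(post)                    # array([0, 1, 0]); ties go to the first class
draws = pseudo_class_draws(post, n_draws=20)
```

Modal assignment treats a person with a .5/.5 posterior exactly like one with .99/.01, which is the classification error the modern approaches try to correct.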

THREE MODERN APPROACHES

• 3-step approach with adjustment for classification error using specialized weights, often referred to as the “BCH approach”

• Model-based approach using Bayes’ Theorem, often referred to as the “LTB approach”

• 3-step approach based on multiple imputation, which relies on a model, often referred to as the “inclusive classify-analyze approach”

MODERN APPROACH #1

• Classification error correction using the BCH approach

• Good references to consider:

Bakk, Z., & Vermunt, J. K. (2016). Robustness of stepwise latent class modeling with continuous distal outcomes. Structural Equation Modeling, 23, 20-31.

Dziak, J. J., Bray, B. C., Zhang, J.-T., Zhang, M., & Lanza, S. T. (2016). Comparing the performance of improved classify-analyze approaches for distal outcomes in latent profile analysis. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 12, 107-116.

MODERN APPROACH #1

• Corrects the traditional 3-step approach by accounting for classification error

• Uses the idea that the joint probability distribution of the distal outcome Y and the assigned class variable W is a linear combination of the joint probability distribution of Y and the true latent class variable C, weighted by classification error probabilities

• In other words, it weights the outcome analysis model to adjust for classification error
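A minimal NumPy sketch of this weighting idea for a continuous distal outcome, assuming modal assignment for W and that W is independent of the outcome given the true class C. D[t, s] = P(W = s | C = t) is estimated from the posterior probabilities, and person i receives weight (D⁻¹)[w_i, t] for class t. This illustrates the correction described above; it is not the Mplus, Latent GOLD, or SAS implementation, and the function name and simulated data are hypothetical.

```python
import numpy as np

def bch_means(post, y):
    """BCH-corrected class-specific means of a continuous distal outcome y."""
    K = post.shape[1]
    w = post.argmax(axis=1)                           # assigned (modal) class W
    assigned = np.eye(K)[w]                           # n x K indicators of W
    # D[t, s] = P(W = s | C = t), estimated from the posterior probabilities
    D = post.T @ assigned / post.sum(axis=0)[:, None]
    weights = np.linalg.inv(D)[w]                     # weight of person i for class t
    means = (weights * y[:, None]).sum(axis=0) / weights.sum(axis=0)
    return means, weights

# Hypothetical example: 2 classes with true outcome means 0 and 2
rng = np.random.default_rng(7)
n = 40_000
post = np.where((rng.random(n) < 0.5)[:, None], [0.8, 0.2], [0.3, 0.7])
true_c = (rng.random(n) < post[:, 1]).astype(int)    # true class drawn from the posterior
y = np.array([0.0, 2.0])[true_c] + rng.normal(size=n)

bch, weights = bch_means(post, y)
naive = np.array([y[post.argmax(axis=1) == k].mean() for k in range(2)])
```

The naive modal-assignment means are pulled toward each other (roughly 0.4 and 1.4 here), while the weighted means land near 0 and 2. The weights sum to 1 across classes for each person and some are negative, one way to see why they do not behave like survey weights.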

MODERN APPROACH #1

• Some important notes:

• BCH weights are NOT like survey weights: they CANNOT be imported into other software packages and used as weights in generalized models; they MUST be used in a software package or routine designed to handle BCH weights

• Works well for both binary and continuous outcomes

• Has been shown to be fairly robust to violations of homoscedasticity of the outcome across classes

• Fully implemented in Mplus and Latent GOLD, with a more limited implementation in SAS


• Can only be used with a single latent class variable

LATENT PROFILE ANALYSIS (LPA)

TO MAKE A LONG STORY SHORT…

• LPA is conceptually the same as LCA

• Original cite for LPA is often attributed to:

• Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston, MA: Houghton Mifflin.

• Just about everything from the LCA part of the workshop is relevant to LPA…

CONCEPTUAL INTRODUCTION: LPA

AN EXAMPLE RESEARCH QUESTION

• Are there distinct profiles of self leader perceptions?

• Latent class prevalences

• e.g., probability of membership in the PROTOTYPICAL latent class

• Class-specific means

• e.g., mean of SINCERE indicator given membership in the PROTOTYPICAL latent class

• Class-specific variances

• e.g., variance of SINCERE indicator given membership in the PROTOTYPICAL latent class

ESTIMATED PARAMETERS

Factor        Item             Overall     Prototypical  Laissez-Faire  Narcissistic  Anti-Prototypical
                               item mean

Latent profile membership proportions      .31 (n=151)   .38 (n=181)    .18 (n=87)    .13 (n=64)

Within-profile item means
Sensitivity   Sincere          5.75        6.60          5.89           4.93          4.47
              Compassionate    5.60        6.53          5.76           4.44          4.56
              Sensitive        5.38        6.34          5.48           4.33          4.28
              Warm             5.59        6.53          5.65           4.66          4.48
              Sympathetic      5.34        6.39          5.53           3.90          4.30
              Within-profile mean (Sensitivity)  6.48    5.66           4.45          4.42
Intelligence  Knowledgeable    5.57        6.39          5.21           5.73          4.41
              Educated         5.72        6.43          5.30           6.14          4.71
              Wise             4.97        5.64          4.55           5.33          4.10
              Intellectual     5.43        6.22          5.07           5.67          4.23
              Intelligent      5.64        6.26          5.26           6.13          4.57
Dedication    Motivated        5.86        6.55          5.62           6.14          4.51
              Dedicated        6.00        6.71          5.80           6.13          4.67
              Hardworking      6.12        6.69          5.86           6.52          4.96
Tyranny       Pushy            3.02        2.62          2.88           3.85          3.27
              Manipulative     2.73        2.41          2.48           3.50          3.17
              Conceited        2.55        2.29          2.31           3.27          2.90
              Selfish          2.61        2.11          2.49           3.21          3.30

• Means and variances can be held constant or be made free-to-vary across classes

• You will probably want to leave the means free to vary, because this allows the classes to differ from each other

• If you have estimation issues, though, one of the first things to try is restricting the variances to be equal across classes

• Greatly reduces the ‘unknowns’ in the model and can help with model identification

• If we are trying to measure a latent variable, what kind of means and variances do we like?
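A minimal Python sketch of the two points above, assuming within-class normality and conditional independence (a diagonal within-class covariance, as in the key assumptions later in these slides). `lpa_n_params` counts free parameters with variances equal versus free across classes; `lpa_posterior` computes posterior profile membership for one respondent. The example uses the four profile proportions and the Sensitivity item means from the leader-perception table, with hypothetical within-class variances of 1.0 (the slides do not report them) and a hypothetical respondent; the function names are also hypothetical.

```python
import math

def lpa_n_params(K, M, equal_variances=True):
    """Free parameters in an LPA with a diagonal within-class covariance."""
    variances = M if equal_variances else K * M
    return (K - 1) + K * M + variances   # prevalences + class means + variances

def lpa_posterior(y, gamma, means, variances):
    """P(profile c | y) for one respondent, using a product of univariate
    normal densities within each profile (conditional independence)."""
    log_post = []
    for g, mu_c, var_c in zip(gamma, means, variances):
        lp = math.log(g)
        for y_m, mu, v in zip(y, mu_c, var_c):
            lp += -0.5 * math.log(2 * math.pi * v) - (y_m - mu) ** 2 / (2 * v)
        log_post.append(lp)
    top = max(log_post)                  # subtract the max for numerical stability
    w = [math.exp(lp - top) for lp in log_post]
    total = sum(w)
    return [x / total for x in w]

GAMMA = [.31, .38, .18, .13]  # Prototypical, Laissez-Faire, Narcissistic, Anti-Prototypical
SENS_MEANS = [                # Sincere, Compassionate, Sensitive, Warm, Sympathetic
    [6.60, 6.53, 6.34, 6.53, 6.39],
    [5.89, 5.76, 5.48, 5.65, 5.53],
    [4.93, 4.44, 4.33, 4.66, 3.90],
    [4.47, 4.56, 4.28, 4.48, 4.30],
]
VARS = [[1.0] * 5 for _ in range(4)]   # HYPOTHETICAL equal within-class variances
respondent = [6.5, 6.5, 6.3, 6.5, 6.4]  # HYPOTHETICAL responses
post_resp = lpa_posterior(respondent, GAMMA, SENS_MEANS, VARS)
```

For 4 profiles and the 17 items in the table, equal variances give 3 + 68 + 17 = 88 parameters versus 3 + 68 + 68 = 139 when variances are free, which is the reduction in ‘unknowns’ mentioned above.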

LATENT PROFILE HOMOGENEITY

• In LCA, homogeneity is the degree to which the item-response probabilities for a particular latent class are close to 0 or 1

• In FA, homogeneity is the degree to which the factor loadings for a particular factor are close to -1 or 1

• Unlike in LCA and FA, when the latent variable completely predicts the manifest variables, the class-specific means do not take any particular value(s)


• Homogeneity is tied to both the within-class mean and the amount of within-class variance for each manifest variable

• But, for estimation purposes, usually we have to constrain the variances to be equal across classes

• Thus, homogeneity is not as straightforward as it is in LCA and FA

LATENT PROFILE SEPARATION

• The good news is that latent profile separation is still a very helpful concept to consider

• It is the degree to which the latent profiles can clearly be distinguished from each other

KEY ASSUMPTIONS

• Latent profile indicators are continuous and normally distributed within classes

• Why is this important?

• If they are not normally distributed, simulation studies of growth mixture models (GMM) suggest that you will extract too many classes (over-extraction)

• Bauer, D. J., & Curran, P. J. (2003). Distributional assumptions of growth mixture models: Implications for overextraction of latent trajectory classes. Psychological Methods, 8, 338-363.


• Latent profile indicators are independent within classes (conditional independence)

RESOURCES

LCA, LPA, LTA RESOURCES

• Recommended reading list included in the download of these slides

www.latentclassanalysis.com


• YouTube videos include…

• Intro to LCA

• Intro to LTA

• 1-and-1 webinar on LCA

• 1-and-1 webinar on LTA

QUESTIONS?

THANK YOU!!

• Bethany C. Bray, Ph.D., bcbray@uic.edu

• Associate Director for Scientific and Infrastructure Development, Institute for Health Research and Policy, The University of Illinois at Chicago

• bcbray.com

• latentclassanalysis.com
