

Bayesian Structural Equations Modeling

M’hamed (Hamy) Temkit

Division of Biostatistics, Mayo Clinic, Arizona

Applied Statistics Seminar, November 17, 2016


Outline

Introduction to SEM

Covariance Analysis

SEM Estimation (GLS vs MLE)

CFA

The General Model of SEM

lavaan

Bayesian Paradigm

Bayesian SEM

Bayesian CFA

blavaan

Conclusions




Motivation







Two Paradigms

Covariance Analysis

Σ = Σ(θ)

Bayesian Inference

p(θ | y) ∝ p(y | θ) p(θ)




Brief SEM Terminology

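[Figure: path diagram of a general SEM. Measurement model: the exogenous latent variables ξ1, ξ2, ξ3 are measured by X1–X6 (loadings λx11, λx21, λx32, λx42, λx53, λx63; errors δ), and the endogenous latent variables η1, η2 are measured by y1–y4 (loadings λy11, λy21, λy32, λy42; errors ε1–ε4). Structural model: paths β21, γ11, γ12, γ22, γ23, with covariances ϕ21, ϕ31, ϕ32 among the exogenous latent variables.]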




Background

Factor Analysis (Spearman, 1904)

Path Analysis (Sewall Wright, 1918, 1921, 1934, 1960)

Confirmatory Factor Analysis (CFA) (Jöreskog, 1969)

General SEM (Jöreskog, 1973; Wiley, 1973)

LISREL model (Wiley, 1973; Jöreskog, 1977)

Generalized least squares (Browne, 1974, 1982, 1984)




Relevant Reading References

Structural Equations With Latent Variables (Bollen, 1989)

Structural Equation Modeling With AMOS (Byrne)

Latent Curve Models (Bollen, Curran, 2006)

Structural Equation Modeling: A Bayesian Approach (Sik-Yum Lee, 2007)

Structural Equation Modeling: A Multidisciplinary Journal




First Principle: Linear Regression




Linear Regression: The Machinery

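$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \dots, n \quad \text{(regression line)}$$

$$\min_{\beta_0, \beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \quad \text{(OLS)}$$

and if the $\varepsilon_i \sim N(0, \sigma^2)$ are iid,

$$\max_{\beta_0, \beta_1, \sigma^2} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(y_i - \beta_0 - \beta_1 x_i)^2\right) \quad \text{(ML)}$$

$$\hat{\beta} \sim N\left(\beta, \sigma^2 (X'X)^{-1}\right)$$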
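As a minimal sketch of this machinery, the R code below simulates data from a regression line, computes the OLS estimate in closed form, and compares it with `lm()` and with a numerical maximization of the normal log-likelihood. The sample size, coefficients, noise level, and seed are arbitrary values assumed for illustration; they do not come from the talk.

```r
# Illustrative simulation; n, the coefficients, and sigma are assumed values.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)

# OLS in closed form: beta_hat = (X'X)^{-1} X'y
X        <- cbind(1, x)
beta_ols <- drop(solve(crossprod(X), crossprod(X, y)))

# The same estimate from lm()
beta_lm <- unname(coef(lm(y ~ x)))

# ML: minimize the negative normal log-likelihood over (beta0, beta1, log sigma)
negloglik <- function(par) {
  mu <- par[1] + par[2] * x
  -sum(dnorm(y, mean = mu, sd = exp(par[3]), log = TRUE))
}
fit_ml  <- optim(c(0, 0, 0), negloglik, method = "BFGS")
beta_ml <- fit_ml$par[1:2]

# Estimated sampling covariance of beta_hat: sigma_hat^2 (X'X)^{-1}
sigma2_hat <- sum((y - X %*% beta_ols)^2) / (n - 2)
vcov_ols   <- sigma2_hat * solve(crossprod(X))

print(rbind(ols = unname(beta_ols), lm = beta_lm, ml = beta_ml))
```

Under normal errors the OLS and ML estimates of the coefficients coincide, so the three rows should agree up to the optimizer's tolerance.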




Pros and Cons of Regression (Linear Models)

Oversimplified view of the phenomena

Underestimates measurement error (covariates are treated as fixed)

Does not handle simultaneous equations in general (e.g., mediation)

Lacks the flexibility to fit SEM models




What is SEM

A melding of factor analysis and path (regression) analysis into one comprehensive statistical methodology

Simultaneous equation modeling

Does the model-implied covariance matrix match the observed covariance matrix?

Degree to which they match represents the goodness of fit




Estimation (graph)

[Figure: path diagram annotated with parameter estimates. The indicators x1–x8 load on four latent variables: Eps (epistemology), Tlr (tolerance), Eng (engagement), and Rng (range).]




Estimation (equations)

Measurement Model:

x1 = a1 + epistemology + e1

x2 = a2 + b2 epistemology + e2

x3 = a3 + tolerance + e3

x4 = a4 + b4 tolerance + e4

x5 = a5 + engagement + e5

x6 = a6 + b6 engagement + e6

x7 = a7 + range + e7

x8 = a8 + b8 range + e8

Structural Model:

tolerance = a9 + b9 epistemology + e9

range = a10 + b10 tolerance + b11 engagement + e10

cov(epistemology, engagement) ≠ 0




Estimation: objective function

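$$S = \begin{pmatrix}
\frac{1}{n}\sum_{i=1}^{n}(x_{1i}-\bar{x}_1)^2 & \frac{1}{n}\sum_{i=1}^{n}(x_{1i}-\bar{x}_1)(x_{2i}-\bar{x}_2) & \cdots & \operatorname{cov}(x_1, x_8) \\
\operatorname{cov}(x_1, x_2) & \operatorname{var}(x_2) & \cdots & \operatorname{cov}(x_2, x_8) \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{cov}(x_1, x_8) & \operatorname{cov}(x_2, x_8) & \cdots & \operatorname{var}(x_8)
\end{pmatrix}$$

$$\Sigma(\theta) = \operatorname{cov}(x_1, x_2, \dots, x_8) = \begin{pmatrix}
\operatorname{var}(x_1) & \operatorname{cov}(x_1, x_2) & \cdots & \operatorname{cov}(x_1, x_8) \\
\operatorname{cov}(x_1, x_2) & \operatorname{var}(x_2) & \cdots & \operatorname{cov}(x_2, x_8) \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{cov}(x_1, x_8) & \operatorname{cov}(x_2, x_8) & \cdots & \operatorname{var}(x_8)
\end{pmatrix}$$

$$S \approx \Sigma(\theta)$$

Basically, minimize a discrepancy function $f(\Sigma(\theta), S)$ over $\theta$.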




Generalized Least Squares (GLS)

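$x_1, \dots, x_n \sim N(0, \Sigma(\theta_0))$ iid, $x_i \in \mathbb{R}^p$

$\operatorname{vec}(S) \xrightarrow{L} N(\operatorname{vec}\Sigma(\theta_0), C)$

$G(\theta) = \tfrac{1}{2} \operatorname{tr}\left\{[(S - \Sigma(\theta))V]^2\right\}, \quad V > 0$

$\hat{\theta} \xrightarrow{L} N(\theta_0, D(\theta_0))$

$n\,G(\hat{\theta}) \xrightarrow{L} \chi^2_{p^* - q}$

$p^* = p(p+1)/2$, with $q$ free parameters

$H_0: \Sigma = \Sigma(\theta)$ vs $H_a: \Sigma \neq \Sigma(\theta)$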




Maximum Likelihood (ML)

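$x_1, \dots, x_n \sim N(\mu_0, \Sigma(\theta_0))$ iid, $x_i \in \mathbb{R}^p$

$(n - 1)S \sim W_p(R_0, \rho_0)$

$F(\theta) = \log\det\Sigma(\theta) + \operatorname{tr}\left(S\,\Sigma(\theta)^{-1}\right) - \log\det S - p$

$\hat{\theta}_{ML} \xrightarrow{L} N(\theta_0, C_2(\theta_0))$

$n\,F(\hat{\theta}_{ML}) \xrightarrow{L} \chi^2_{p^* - q}$

$H_0: \Sigma = \Sigma(\theta)$ vs $H_a: \Sigma \neq \Sigma(\theta)$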
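The ML discrepancy and the resulting chi-square test can be sketched in a few lines of R. The function names `f_ml` and `chisq_test` and the toy covariance matrix are hypothetical, introduced only for illustration; the sketch assumes the fitted matrix has already been obtained and uses the scaling n F from this slide.

```r
# ML discrepancy between a sample covariance S and a model-implied Sigma:
# F = log det(Sigma) + tr(S Sigma^{-1}) - log det(S) - p
f_ml <- function(S, Sigma) {
  p <- nrow(S)
  as.numeric(determinant(Sigma, logarithm = TRUE)$modulus) +
    sum(diag(S %*% solve(Sigma))) -
    as.numeric(determinant(S, logarithm = TRUE)$modulus) - p
}

# Chi-square test of H0: Sigma = Sigma(theta), given a fitted Sigma_hat
# and the number q of free parameters
chisq_test <- function(S, Sigma_hat, n, q) {
  p      <- nrow(S)
  p_star <- p * (p + 1) / 2
  stat   <- n * f_ml(S, Sigma_hat)
  df     <- p_star - q
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Toy usage (the matrix below is made up for illustration)
S <- matrix(c(1.0, 0.5, 0.4,
              0.5, 1.0, 0.3,
              0.4, 0.3, 1.0), 3, 3)
Sigma_indep <- diag(diag(S))   # independence model: 3 variances, q = 3

f_ml(S, S)                     # zero when the model reproduces S exactly
chisq_test(S, Sigma_indep, n = 100, q = 3)
```

For the independence model the trace term equals p, so the discrepancy reduces to the negative log-determinant of S relative to its diagonal, which gives a quick sanity check on the function.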




SEM Modeling

Model (diagram)

Identifiability (q ≤ p(p + 1)/2); check the identifiability rules in Bollen (page 238)

Constraints (e.g., loadings fixed to 1)

EDA (distributions, correlations, outliers, etc.)

Estimation

Fit indices (SMR, based on residuals)

Diagnostics (residuals, outliers, etc.)
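The counting rule in the identifiability step is easy to check by hand or in code. The helper `count_df` below is a hypothetical name used for illustration; it implements only the necessary condition q ≤ p(p + 1)/2, not the further rules in Bollen. The parameter counts correspond to the two lavaan examples in this talk (the three-factor CFA on nine indicators and the political democracy SEM on eleven indicators).

```r
# Counting rule: necessary (not sufficient) condition for identification
count_df <- function(p, q) {
  p_star <- p * (p + 1) / 2          # distinct elements of the covariance matrix
  c(p_star = p_star, q = q, df = p_star - q)
}

# Three-factor CFA, 9 indicators: 6 free loadings + 9 residual variances
# + 3 factor variances + 3 factor covariances
cfa_df <- count_df(p = 9, q = 6 + 9 + 3 + 3)

# SEM, 11 indicators: 8 free loadings + 11 residual variances
# + 6 residual covariances + 3 regressions + 1 factor variance
# + 2 disturbance variances
sem_df <- count_df(p = 11, q = 8 + 11 + 6 + 3 + 1 + 2)

rbind(cfa = cfa_df, sem = sem_df)
```

The resulting degrees of freedom, 24 and 35, agree with the values reported in the lavaan output for the two models.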




Measurement model (CFA)

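$x_i = \Lambda \xi_i + \varepsilon_i, \quad i = 1, \dots, n$

$\xi \sim N(0, \Phi)$: latent variables

$\varepsilon \sim N(0, \Psi_\varepsilon)$, with $\Psi_\varepsilon$ diagonal

$\xi$ and $\varepsilon$ are uncorrelated

$\Sigma = \Lambda \Phi \Lambda^{T} + \Psi_\varepsilon$

$\Lambda$, $\Phi$, $\Psi_\varepsilon$ are the parameters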
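To make the implied covariance concrete, the sketch below builds Σ = ΛΦΛ' + Ψ for a three-factor, nine-indicator CFA. The numerical values are the ML estimates reported in the lavaan output of the Holzinger and Swineford example in this talk; the helper name `implied_cov` is hypothetical.

```r
# Model-implied covariance of a CFA: Sigma = Lambda Phi Lambda' + Psi
implied_cov <- function(Lambda, Phi, Psi) Lambda %*% Phi %*% t(Lambda) + Psi

# Loadings (the first loading of each factor is fixed to 1)
Lambda <- matrix(0, 9, 3)
Lambda[1:3, 1] <- c(1, 0.554, 0.729)   # visual
Lambda[4:6, 2] <- c(1, 1.113, 0.926)   # textual
Lambda[7:9, 3] <- c(1, 1.180, 1.082)   # speed

# Factor variances (diagonal) and covariances (off-diagonal)
Phi <- matrix(c(0.809, 0.408, 0.262,
                0.408, 0.979, 0.173,
                0.262, 0.173, 0.384), 3, 3)

# Residual variances of x1, ..., x9
Psi <- diag(c(0.549, 1.134, 0.844, 0.371, 0.446, 0.356, 0.799, 0.488, 0.566))

Sigma <- implied_cov(Lambda, Phi, Psi)
round(Sigma[1:4, 1:4], 3)
```

Each entry decomposes as expected: the implied variance of x1 is the factor variance plus the residual variance, and the implied covariance of x1 and x4 is simply the covariance between their two factors because both loadings are fixed to 1.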




CFA Example (graph)

[Figure: path diagram of the fitted three-factor CFA. The indicators x1–x9 load on vsl (visual), txt (textual), and spd (speed); the diagram is annotated with the estimated loadings, residual variances, and factor variances and covariances.]




CFA (loadings and latents)

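$$\xi = \begin{pmatrix} \mathrm{vsl} \\ \mathrm{txt} \\ \mathrm{spd} \end{pmatrix}, \qquad
\Lambda = \begin{pmatrix}
1 & 0 & 0 \\
\lambda_{21} & 0 & 0 \\
\lambda_{31} & 0 & 0 \\
0 & 1 & 0 \\
0 & \lambda_{52} & 0 \\
0 & \lambda_{62} & 0 \\
0 & 0 & 1 \\
0 & 0 & \lambda_{83} \\
0 & 0 & \lambda_{93}
\end{pmatrix}$$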

But also remember the variances and covariances




CFA using lavaan (R)

library(stringr)
library(lavaan)
library(DiagrammeR)
library(dplyr)
library(semPlot)

# specify the model: three latent factors, three indicators each
HS.model <- "
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
"

# fit by maximum likelihood (the lavaan default)
fit.HS <- sem(HS.model, data = HolzingerSwineford1939)
summary(fit.HS)

# path diagram annotated with the parameter estimates
semPaths(fit.HS, intercepts = FALSE, whatLabels = "est",
         residuals = TRUE, exoCov = TRUE)




CFA Example (output)

> summary(fit.HS)
lavaan (0.5-22) converged normally after 35 iterations

  Number of observations                           301

  Estimator                                         ML
  Minimum Function Test Statistic               85.306
  Degrees of freedom                                24
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~
    x1                1.000
    x2                0.554    0.100    5.554    0.000
    x3                0.729    0.109    6.685    0.000
  textual =~
    x4                1.000
    x5                1.113    0.065   17.014    0.000
    x6                0.926    0.055   16.703    0.000
  speed =~
    x7                1.000
    x8                1.180    0.165    7.152    0.000
    x9                1.082    0.151    7.155    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual ~~
    textual           0.408    0.074    5.552    0.000
    speed             0.262    0.056    4.660    0.000
  textual ~~
    speed             0.173    0.049    3.518    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.549    0.114    4.833    0.000
   .x2                1.134    0.102   11.146    0.000
   .x3                0.844    0.091    9.317    0.000
   .x4                0.371    0.048    7.779    0.000
   .x5                0.446    0.058    7.642    0.000
   .x6                0.356    0.043    8.277    0.000
   .x7                0.799    0.081    9.823    0.000
   .x8                0.488    0.074    6.573    0.000
   .x9                0.566    0.071    8.003    0.000
    visual            0.809    0.145    5.564    0.000
    textual           0.979    0.112    8.737    0.000
    speed             0.384    0.086    4.451    0.000




Structural model (SEM)

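$\eta = B\eta + \Gamma\xi + \zeta$

$y = \Lambda_y \eta + \varepsilon$

$x = \Lambda_x \xi + \delta$

$B$, $\Gamma$, $\Lambda_y$, $\Lambda_x$, $\Phi$, $\Psi$, $\Theta_\varepsilon$, $\Theta_\delta$ are the parameters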




SEM Example (graph)

[Figure: path diagram of the fitted SEM. The indicators x1–x3 load on i60 (ind60), y1–y4 on d60 (dem60), and y5–y8 on d65 (dem65); the diagram is annotated with the estimated loadings, regression coefficients, and residual covariances.]




SEM Example (some equations)

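$$\begin{bmatrix} \mathrm{d60} \\ \mathrm{d65} \end{bmatrix} =
\begin{bmatrix} 0 & 0 \\ \beta_{21} & 0 \end{bmatrix}
\begin{bmatrix} \mathrm{d60} \\ \mathrm{d65} \end{bmatrix} +
\begin{bmatrix} \gamma_{11} \\ \gamma_{21} \end{bmatrix}
\begin{bmatrix} \mathrm{i60} \end{bmatrix} +
\begin{bmatrix} \zeta_1 \\ \zeta_2 \end{bmatrix}$$

$$\Sigma(\theta) = \begin{pmatrix} \Sigma_{yy}(\theta) & \Sigma_{yx}(\theta) \\ \Sigma_{xy}(\theta) & \Sigma_{xx}(\theta) \end{pmatrix}$$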
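Solving the structural equation for the endogenous variables gives the reduced form η = (I − B)⁻¹(Γξ + ζ), so (I − B)⁻¹Γ collects the total effects of the exogenous latent variable. The sketch below evaluates it with the regression estimates reported in the lavaan output for this example; treating these point estimates as fixed numbers is a simplification made for illustration.

```r
# Structural coefficients (point estimates from the lavaan output)
B <- matrix(c(0,     0,
              0.837, 0), 2, 2, byrow = TRUE)   # dem65 ~ dem60
Gamma <- matrix(c(1.483,                        # dem60 ~ ind60
                  0.572), 2, 1)                 # dem65 ~ ind60

# Total effects of ind60: (I - B)^{-1} Gamma
total <- solve(diag(2) - B) %*% Gamma
dimnames(total) <- list(c("dem60", "dem65"), "ind60")

# Decomposition for dem65: direct effect plus the effect mediated by dem60
direct   <- Gamma[2, 1]
indirect <- B[2, 1] * Gamma[1, 1]
total
c(direct = direct, indirect = indirect, total = direct + indirect)
```

This is the kind of mediation question that a single regression equation cannot answer: the total effect of ind60 on dem65 is the direct path plus the product of the two paths through dem60.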




SEM Example (R code)

# specify the model
model <- '
  # latent variables
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  dem65 =~ y5 + y6 + y7 + y8

  # regressions
  dem60 ~ ind60
  dem65 ~ ind60 + dem60

  # residual covariances
  y1 ~~ y5
  y2 ~~ y4 + y6
  y3 ~~ y7
  y4 ~~ y8
  y6 ~~ y8
'

fit <- sem(model, data = PoliticalDemocracy)
summary(fit)

# path diagram annotated with the parameter estimates
semPaths(fit, intercepts = FALSE, whatLabels = "est",
         residuals = FALSE, exoCov = FALSE)




SEM Example (output)

> summary(fit)
lavaan (0.5-22) converged normally after 68 iterations

  Estimator                                         ML
  Minimum Function Test Statistic               38.125
  Degrees of freedom                                35
  P-value (Chi-square)                           0.329

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  ind60 =~
    x1                1.000
    x2                2.180    0.139   15.742    0.000
    x3                1.819    0.152   11.967    0.000
  dem60 =~
    y1                1.000
    y2                1.257    0.182    6.889    0.000
    y3                1.058    0.151    6.987    0.000
    y4                1.265    0.145    8.722    0.000
  dem65 =~
    y5                1.000
    y6                1.186    0.169    7.024    0.000
    y7                1.280    0.160    8.002    0.000
    y8                1.266    0.158    8.007    0.000

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  dem60 ~
    ind60             1.483    0.399    3.715    0.000
  dem65 ~
    ind60             0.572    0.221    2.586    0.010
    dem60             0.837    0.098    8.514    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
 .y1 ~~
   .y5                0.624    0.358    1.741    0.082
 .y2 ~~
   .y4                1.313    0.702    1.871    0.061
   .y6                2.153    0.734    2.934    0.003
 .y3 ~~
   .y7                0.795    0.608    1.308    0.191
 .y4 ~~
   .y8                0.348    0.442    0.787    0.431
 .y6 ~~
   .y8                1.356    0.568    2.386    0.017

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.082    0.019    4.184    0.000
   .x2                0.120    0.070    1.718    0.086
   .x3                0.467    0.090    5.177    0.000
   .y1                1.891    0.444    4.256    0.000
   .y2                7.373    1.374    5.366    0.000
   .y3                5.067    0.952    5.324    0.000
   .y4                3.148    0.739    4.261    0.000
   .y5                2.351    0.480    4.895    0.000
   .y6                4.954    0.914    5.419    0.000
   .y7                3.431    0.713    4.814    0.000
   .y8                3.254    0.695    4.685    0.000
    ind60             0.448    0.087    5.173    0.000
   .dem60             3.956    0.921    4.295    0.000
   .dem65             0.172    0.215    0.803    0.422




Why Bayesian

Flexibility to use prior knowledge (priors)

Robust to small sample sizes

Bayes factors and flexibility in comparing models

Latent scores (factors) are easy to produce

blavaan (open-source R package)

WinBUGS (freely available software)




Bayesian References

A Bayesian approach to confirmatory factor analysis (Lee, 1980)

Evaluation of the Bayesian and maximum likelihood approaches in analyzing structural equation models with small sample sizes (Lee, Song, 2004)

Structural Equation Modeling: A Bayesian Approach (Lee, 2007)

Basic and Advanced Bayesian Structural Equation Modeling: With Applications in the Medical and Behavioral Sciences (Song, Lee, 2012)




Bayesian estimation

$\log p(\theta \mid Y, M) = \log p(Y \mid \theta, M) + \log p(\theta) + \text{constant}$

$M$: an arbitrary SEM model

$Y$: observed dataset of raw observations, sample size $n$

$\theta$: random vector of the parameters in $M$




Conjugate priors

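$p(y \mid \theta) = \binom{n}{y} \theta^{y} (1 - \theta)^{n - y}, \quad \theta \in (0, 1)$

$p(\theta) \propto \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}, \quad \theta \sim \mathrm{Beta}(\alpha, \beta)$

$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta) \propto \theta^{y} (1 - \theta)^{n - y}\, \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}$

$\propto \theta^{y + \alpha - 1} (1 - \theta)^{n - y + \beta - 1}$, that is, $\theta \mid y \sim \mathrm{Beta}(y + \alpha,\ n - y + \beta)$

The prior $p(\theta)$ and the posterior $p(\theta \mid y)$ have the same distributional form.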
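The conjugate update can be verified numerically. In the sketch below the prior hyperparameters and the data (14 successes in 20 trials) are arbitrary values assumed for illustration; the code computes the closed-form Beta posterior and checks it against a brute-force normalization of likelihood times prior on a grid.

```r
# Beta-binomial conjugate update (all numbers are illustrative assumptions)
alpha <- 2; beta <- 2        # prior: theta ~ Beta(2, 2)
n <- 20; y <- 14             # data: 14 successes in 20 trials

# Closed-form posterior: Beta(y + alpha, n - y + beta)
post_a <- y + alpha
post_b <- n - y + beta
post_mean <- post_a / (post_a + post_b)
cred_int  <- qbeta(c(0.025, 0.975), post_a, post_b)

# Numerical check: normalize likelihood x prior on a grid of theta values
theta     <- seq(0.001, 0.999, length.out = 999)
step      <- theta[2] - theta[1]
unnorm    <- dbinom(y, n, theta) * dbeta(theta, alpha, beta)
grid_post <- unnorm / sum(unnorm * step)
max_diff  <- max(abs(grid_post - dbeta(theta, post_a, post_b)))

c(post_mean = post_mean, lower = cred_int[1], upper = cred_int[2],
  max_diff = max_diff)
```

With a conjugate prior no sampling is needed; the SEM posteriors discussed next are not available in closed form, which is what motivates MCMC.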




Measurement model (CFA) Bayesian approach

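$y_i = \Lambda w_i + \varepsilon_i, \quad i = 1, \dots, n, \quad y_i \in \mathbb{R}^p$

$w_i \sim N(0, \Phi), \quad w_i \in \mathbb{R}^q$

$\varepsilon_i \sim N(0, \Psi_\varepsilon)$, with $\Psi_\varepsilon$ diagonal with elements $\psi_{\varepsilon k}$, $k = 1, \dots, p$

$w_i$ and $\varepsilon_i$ are independent

$\Lambda$, $\Phi$, $\Psi_\varepsilon$ are the parameters

Let $\Lambda_k^{T}$ be the $k$th row of $\Lambda$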




Measurement model (CFA) priors

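The conjugate priors on the parameters are:

$\psi_{\varepsilon k} \sim \mathrm{IGamma}(\alpha^{*}_{0\varepsilon k}, \beta^{*}_{0\varepsilon k})$

$[\Lambda_k \mid \psi_{\varepsilon k}] \sim N(\Lambda_{0k},\ \psi_{\varepsilon k} H_{0yk})$

$\Phi \sim IW_q(R^{*}_0, \rho_0)$, where $R^{*}_0$ is positive definite

The difficulty lies in choosing the hyperparameters, which determine whether the priors are informative or noninformative.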




Measurement model (CFA) Gibbs Sampling (MCMC)

Let $Y = (y_1, \dots, y_n)$ be the observed data matrix

$\Omega = (w_1, \dots, w_n)$ is the matrix of the latent variables

$(Y, \Omega)$ is the complete dataset (augmented data)

$P(\Lambda, \Phi, \Psi_\varepsilon \mid Y)$, the posterior, is intractable

$P(\Lambda, \Phi, \Psi_\varepsilon \mid \Omega, Y)$ is usually of standard form

$P(\Omega \mid \Lambda, \Phi, \Psi_\varepsilon, Y)$ can also be derived from the model $M$




Measurement model (CFA) Gibbs Sampling

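The Gibbs sampling algorithm allows sampling from $P(\Lambda, \Phi, \Psi_\varepsilon, \Omega \mid Y)$.

At the $(j + 1)$th iteration, given $\Omega^{j}, \Lambda^{j}, \Phi^{j}, \Psi_\varepsilon^{j}$:

Generate $\Omega^{j+1} \sim P(\Omega \mid \Lambda^{j}, \Phi^{j}, \Psi_\varepsilon^{j}, Y)$

Generate $\Psi_\varepsilon^{j+1} \sim P(\Psi_\varepsilon \mid \Omega^{j+1}, \Lambda^{j}, \Phi^{j}, Y)$

Generate $\Phi^{j+1} \sim P(\Phi \mid \Omega^{j+1}, \Lambda^{j}, \Psi_\varepsilon^{j+1}, Y)$

Generate $\Lambda^{j+1} \sim P(\Lambda \mid \Omega^{j+1}, \Phi^{j+1}, \Psi_\varepsilon^{j+1}, Y)$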
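The four steps can be sketched for the simplest possible case. The code below is not the sampler used by blavaan or in Lee's book; it is a minimal illustration under assumptions made here: a single factor, the first loading fixed to 1, simulated data, and independent priors (normal on each free loading, inverse gamma on each residual variance and on the factor variance) rather than the scaled conjugate prior on the loadings. The function name `gibbs_cfa` and all hyperparameter values are hypothetical.

```r
# Gibbs sampler sketch for a one-factor CFA: y_ik = lambda_k * w_i + e_ik.
# Assumed priors: lambda_k ~ N(0, h0), psi_k ~ IGamma(a0, b0), phi ~ IGamma(a0, b0).
set.seed(42)
n <- 300
lambda_true <- c(1, 0.8, 1.2, 0.6)
p <- length(lambda_true)
w_true <- rnorm(n)                                   # true factor variance 1
Y <- outer(w_true, lambda_true) +
  matrix(rnorm(n * p, sd = sqrt(0.5)), n, p)         # true residual variance 0.5

rinvgamma <- function(m, shape, rate) 1 / rgamma(m, shape = shape, rate = rate)

gibbs_cfa <- function(Y, n_iter = 2000, a0 = 1, b0 = 1, h0 = 100) {
  n <- nrow(Y); p <- ncol(Y)
  lambda <- rep(1, p); psi <- rep(1, p); phi <- 1    # starting values
  draws <- matrix(NA_real_, n_iter, 2 * p + 1,
                  dimnames = list(NULL, c(paste0("lambda", 1:p),
                                          paste0("psi", 1:p), "phi")))
  for (j in seq_len(n_iter)) {
    # 1. latent scores: w_i | lambda, phi, psi, Y (normal)
    v <- 1 / (1 / phi + sum(lambda^2 / psi))
    m <- v * drop(Y %*% (lambda / psi))
    w <- rnorm(n, mean = m, sd = sqrt(v))
    # 2. residual variances: psi_k | w, lambda, Y (inverse gamma)
    ssr <- colSums((Y - outer(w, lambda))^2)
    psi <- rinvgamma(p, a0 + n / 2, b0 + ssr / 2)
    # 3. factor variance: phi | w (inverse gamma)
    phi <- rinvgamma(1, a0 + n / 2, b0 + sum(w^2) / 2)
    # 4. free loadings: lambda_k | w, psi, Y (normal), k = 2, ..., p
    v_l <- 1 / (1 / h0 + sum(w^2) / psi[-1])
    m_l <- v_l * drop(crossprod(Y[, -1], w)) / psi[-1]
    lambda[-1] <- rnorm(p - 1, mean = m_l, sd = sqrt(v_l))
    draws[j, ] <- c(lambda, psi, phi)
  }
  draws
}

draws <- gibbs_cfa(Y)
post  <- draws[-(1:500), ]      # discard the first 500 draws as burn-in
round(colMeans(post), 2)
```

Because the latent scores are sampled in step 1, every other conditional is a standard distribution, which is the point of the data augmentation; the sampled scores are also a by-product that can be kept as factor score draws.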




Measurement model (CFA) Posterior Parameters Estimates

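$\theta^{(t)} = (\Lambda^{(t)}, \Phi^{(t)}, \Psi_\varepsilon^{(t)}), \quad t = 1, \dots, T^{*}$

$$\hat{\theta} = \frac{1}{T^{*}} \sum_{t=1}^{T^{*}} \theta^{(t)}$$

$$\widehat{\operatorname{var}}(\theta) = \frac{1}{T^{*} - 1} \sum_{t=1}^{T^{*}} (\theta^{(t)} - \hat{\theta})(\theta^{(t)} - \hat{\theta})^{\top}$$

along with 95% credible intervals given by the posterior quantiles $Q_{0.025}$ and $Q_{0.975}$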
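Given a matrix of retained draws (one row per iteration, one column per parameter), these summaries are one-liners in R. The draws below are simulated stand-ins, generated only so that the sketch runs on its own; the parameter names and distributions are assumptions, and real draws would come from the sampler.

```r
# Stand-in for MCMC output: T* = 4000 draws of two parameters
set.seed(7)
draws <- cbind(lambda21 = rnorm(4000, mean = 0.55, sd = 0.10),
               psi1     = rgamma(4000, shape = 25, rate = 50))

post_mean <- colMeans(draws)                 # (1 / T*) * sum of the draws
post_cov  <- cov(draws)                      # uses the divisor T* - 1
post_ci   <- apply(draws, 2, quantile, probs = c(0.025, 0.975))

round(rbind(mean = post_mean, sd = sqrt(diag(post_cov)), post_ci), 3)
```

The quantile-based interval is an equal-tailed credible interval; the blavaan output in this talk reports highest posterior density (HPD) limits instead, which can differ for skewed posteriors.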




Bayesian CFA Example using blavaan

library(blavaan)

# specify the model
bHS.model <- "
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9

  # intercepts
  x1 ~ 0
  x2 ~ 0
  x3 ~ 0
  x4 ~ 0
  x5 ~ 0
  x6 ~ 0
  x7 ~ 0
  x8 ~ 0
  x9 ~ 0
"

bfit.HS <- bsem(bHS.model, data = HolzingerSwineford1939)
summary(bfit.HS)
fitMeasures(bfit.HS, fit.measures = "all", baseline.model = NULL)




Bayesian CFA Example (output)

blavaan (0.2-2) results of 10000 samples after 5000 adapt+burnin iterations

  Number of missing patterns                         1

  Statistic                                 MargLogLik         PPP
  Value                                      -4481.087       0.000

Latent Variables:
                   Estimate  Post.SD  HPD.025  HPD.975     PSRF  Prior
  visual =~
    x1                1.000
    x2                1.221    0.018    1.186    1.255    1.000  dnorm(0,1e-2)
    x3                0.463    0.012    0.438    0.487    1.000  dnorm(0,1e-2)
  textual =~
    x4                1.000
    x5                1.404    0.020    1.365    1.445    1.004  dnorm(0,1e-2)
    x6                0.731    0.016      0.7    0.761    1.001  dnorm(0,1e-2)
  speed =~
    x7                1.000
    x8                1.320    0.020     1.28    1.357    1.002  dnorm(0,1e-2)
    x9                1.286    0.019     1.25    1.325    1.002  dnorm(0,1e-2)

Covariances:
                   Estimate  Post.SD  HPD.025  HPD.975     PSRF  Prior
  visual ~~
    textual          15.500    1.321   12.998    18.14    1.000  dwish(iden,4)
    speed            20.910    1.764   17.576   24.439    1.000  dwish(iden,4)
  textual ~~
    speed            13.003    1.118     10.9   15.259    1.000  dwish(iden,4)

Intercepts:
                   Estimate
   .x1                0.000
   .x2                0.000
   .x3                0.000
   .x4                0.000
   .x5                0.000
   .x6                0.000
   .x7                0.000
   .x8                0.000
   .x9                0.000
    visual            0.000
    textual           0.000
    speed             0.000

Variances:
                   Estimate  Post.SD  HPD.025  HPD.975     PSRF  Prior
   .x1                0.716    0.088    0.547    0.891    1.001  dgamma(1,.5)
   .x2                1.219    0.138     0.96      1.5    1.000  dgamma(1,.5)
   .x3                0.993    0.086    0.832    1.164    1.000  dgamma(1,.5)
   .x4                0.449    0.053    0.346    0.552    1.001  dgamma(1,.5)
   .x5                0.314    0.069    0.184    0.452    1.002  dgamma(1,.5)
   .x6                0.509    0.048    0.417    0.604    1.000  dgamma(1,.5)
   .x7                0.877    0.084    0.717    1.045    1.000  dgamma(1,.5)
   .x8                0.567    0.077    0.417     0.72    1.000  dgamma(1,.5)
   .x9                0.478    0.068    0.347     0.61    1.000  dgamma(1,.5)
    visual           24.998    2.118   20.929   29.176    1.000  dwish(iden,4)
    textual          10.256    0.882    8.518   11.953    1.001  dwish(iden,4)
    speed            17.812    1.539   14.813   20.859    1.001  dwish(iden,4)

> fitMeasures(bfit.HS,fit.measures="all", baseline.model= NULL)
      npar       logl        ppp        bic        dic      p_dic       waic
    21.000  -4398.287      0.000   8916.354   8837.747     20.586   8838.364
    p_waic      looic      p_loo margloglik
    20.848   8838.391     20.861  -4481.087




Conclusions

The frequentist SEM approach is based on MLE

The Bayesian approach, with data augmentation and MCMC methods, offers a flexible way to analyze SEMs

The Bayesian approach may be used when prior knowledge is available or when the sample size is small

Some open problems (power, optimal designs, GSEM, etc...)




THANK YOU!