Applications of SEM
Bill Cassill’s Thesis Data: Likert scales vs. Content Analysis
Bill Cassill chose to investigate the extent to which content analysis could be used to measure traits which are typically measured using Likert scaling techniques. He chose student evaluations of instructors as the vehicle for this investigation.
He picked a common Likert-type evaluation form and created a questionnaire which included four scales from this form. These scales measured students’ perceptions of
1) Enthusiasm of the instructor
2) Learning Value - the extent to which the instructor stimulated intellectual effort
3) Interaction - the extent to which the instructor encouraged student discussion
4) Organization of the instructor - the extent to which the lectures followed a logical order
In the questionnaire, Bill also included two questions to which written answers were to be given. The first was “In what aspects do you think your instructor a good teacher?” The second was, “In what aspects do you think your instructor needs improvement?”
For each student’s written responses, Bill counted the number of positive references to Enthusiasm, Learning, Interaction, and Organization. He also counted the number of negative references to each.
Then for each student, he formed eight scale scores. The first four were the student’s summated responses to the four Likert scales.
The second four were based on the counts of references from the written responses - in each case it was the number of positive references minus the number of negative references. So a positive count would mean that the student made more positive than negative comments about a teacher with respect to an attribute, such as Enthusiasm.
Ultimately, he had an 8 column by 200+ row data matrix. Four of the columns represented Likert scale scores. Four represented Content Analysis scale scores.
The two major issues here are the following . . .
1) Do the four scales (E, L, I, and O) represent four separate dimensions? If so, there should be low correlations between the scales. This is a discriminant validity issue.
2) Do the two methods measuring the same dimension correlate with each other? If they do, then that’s an indication that they’re both measuring the same dimension. This is a convergent validity issue.
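The two questions can be made concrete with a small sketch. The data below are synthetic and purely illustrative (they are not the Cassill data). The variable names follow the lik/ca naming used in the path diagrams, and the summary is a simplification: it averages only the same-method, different-trait correlations as the discriminant check.

```python
import math
import random

TRAITS = ["enth", "lrn", "inter", "org"]
METHODS = ["lik", "ca"]  # Likert scale score, content-analysis score


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def validity_summary(data):
    """Return (mean convergent r, mean discriminant r).

    Convergent: same trait, different method (e.g. likenth with caenth).
    Discriminant: different traits, same method (e.g. likenth with likorg).
    """
    convergent = [pearson(data["lik" + t], data["ca" + t]) for t in TRAITS]
    discriminant = []
    for m in METHODS:
        for i, t1 in enumerate(TRAITS):
            for t2 in TRAITS[i + 1:]:
                discriminant.append(pearson(data[m + t1], data[m + t2]))
    return (sum(convergent) / len(convergent),
            sum(discriminant) / len(discriminant))


# Synthetic illustration: four independent traits, each measured twice.
random.seed(1)
n = 200
data = {}
for t in TRAITS:
    trait = [random.gauss(0, 1) for _ in range(n)]
    for m in METHODS:
        data[m + t] = [v + random.gauss(0, 0.5) for v in trait]

conv, disc = validity_summary(data)
print(f"mean convergent r = {conv:.2f}, mean discriminant r = {disc:.2f}")
```

With data built this way the convergent correlations are high and the discriminant correlations are near zero, which is the pattern the four-dimension hypothesis predicts. The models that follow show that the real data do not behave so neatly.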
Applications of SEM - 1 Printed on 5/8/2023
The questions used in the Cassill thesis
Instructions given to participants:
With regard to the instructor, please rate how often your instructor did each of the following using the rating format below:
1-hardly ever
2-occasionally
3-sometimes
4-frequently
5-almost always
The names in parentheses are the dimensions represented by the items and were not on the questionnaire given to students.
1. (Interaction). Promoted teacher-student discussions (as opposed to mere responses to questions).
2. (Interaction). Found ways to help students answer their own questions.
3. (Interaction). Encouraged students to express themselves freely and openly.
4. (Enthusiasm). Seemed enthusiastic about the subject matter.
5. (Enthusiasm). Spoke with expressiveness and variety in tone of voice.
6. (Enthusiasm - reverse scored). Made presentations that are dull and dry.
7. (Organization). Made it clear how each topic fits into the course.
8. (Interaction). Explained the reasons for criticisms of students' academic performance.
9. (Interaction). Encouraged student comments even when they turn out to be incorrect or irrelevant.
10. (Organization). Summarized material in a way which aids retention.
11. (Learning). Stimulated students to intellectual effort beyond that required by most courses.
12. (Organization). Clearly stated the objectives of the course.
13. (Organization). Explained the course material clearly, and explanations are to the point.
14. (Learning). Related course material to real life situations.
15. (Learning). Introduced stimulating ideas about the subject.
The items were taken from Cashin, W. E. and Downey, R. G. (1992). Using global student rating items for summative evaluation. Journal of Educational Psychology, 84(4), 563-572.
The responses to the items are variables i1 through i15 in the CassillNM.SAV file.
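The scoring described above can be sketched as follows. The item-to-scale key is taken from the item list. The reverse-scoring rule (6 minus the response on the 1-to-5 format) is the usual convention and is an assumption here, as are the function names and the example student.

```python
# Item-to-scale key, taken from the questionnaire listing (item numbers 1..15).
SCALE_ITEMS = {
    "likinter": [1, 2, 3, 8, 9],
    "likenth": [4, 5, 6],
    "likorg": [7, 10, 12, 13],
    "liklrn": [11, 14, 15],
}
REVERSED = {6}   # item 6 is reverse scored
SCALE_MAX = 5    # ratings run from 1 to 5


def likert_scores(responses):
    """Summated Likert scale scores for one student.

    responses: dict mapping item number (1..15) to a rating from 1 to 5.
    Reverse-scored items are flipped with (SCALE_MAX + 1 - rating); this
    flipping rule is an assumption, not something stated in the handout.
    """
    scores = {}
    for scale, items in SCALE_ITEMS.items():
        total = 0
        for item in items:
            rating = responses[item]
            if item in REVERSED:
                rating = SCALE_MAX + 1 - rating
            total += rating
        scores[scale] = total
    return scores


def content_score(n_positive, n_negative):
    """Content-analysis score: positive references minus negative references."""
    return n_positive - n_negative


# Hypothetical student: rates every item 4, except the reversed item, rated 1.
student = {item: 4 for item in range(1, 16)}
student[6] = 1
scores = likert_scores(student)
ca_enth = content_score(n_positive=3, n_negative=1)
print(scores, ca_enth)
```

Because the scales have different numbers of items, summated scores are not on a common metric. That does not matter for the correlational models that follow.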
Summary
There are 8 observed variables
4 are Likert scale scores on the four dimensions
4 are content analysis scores on the same four dimensions
Exploration of Various Models
Model 1: Simple Orthogonal Instructor Dimension Factors Model
This model assumes that there are four dimensions, and the dimensions are orthogonal. Clearly this is not the correct model.
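[Path diagram: four orthogonal factors (EN, INT, LRN, ORG), each with one Likert indicator and one content-analysis indicator (likenth/caenth, likinter/cainter, liklrn/calrn, likorg/caorg), with residuals eae through eoo. Squared multiple correlations printed on the diagram: likenth .17, caenth .25, likinter .32, cainter .44, liklrn .36, calrn .44, likorg .69, caorg .36.]
Chi-square = 502.44, df = 24, p = .00, RMSEA = .30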
What’s good?
1. Each indicator loads on its factor.
2. Content Analysis indicators have high loadings.
What’s bad?
1. Horrible fit.
Model 2: Oblique Instructor Dimension Factors
This model assumes four dimensions of teaching as before but assumes that they are correlated.
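[Path diagram: the same four factors (EN, INT, LRN, ORG) and indicators as in Model 1, with the factors allowed to correlate. Squared multiple correlations printed on the diagram: likenth .41, caenth .25, likinter .53, cainter .37, liklrn .56, calrn .38, likorg .86, caorg .37. The six factor correlations printed on the diagram are 1.15, 1.02, .89, .83, .75, and .86; the extracted text does not show which pair of factors each belongs to.]
Chi-square = 226.22, df = 18, p = .00, RMSEA = .23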
Note that even though these are standardized loadings, some of the correlations between factors are > 1. That’s a red flag that the model is not the appropriate model.
What’s good?
1. Indicators load on factors.
2. CA indicators have large, positive loadings.
What’s bad?
1. Horrible fit.
2. Highly correlated factors.
3. Factor correlations out of range.
Model 3: A Higher-order Instructor Evaluation Factor.
In this model, the correlations between the factors have been replaced by a higher order factor. Since there was something screwy about the above oblique factor model, we wouldn’t expect this model to fix the problem, and it hasn’t. This model fits almost as well (or poorly) as the four correlated dimensions.
In fact, a model which replaces the correlations between a set of indicators (observed indicators or lower order factors) with a higher order factor can never fit BETTER than the model which allows the indicators to simply be correlated, and with four or more indicators it will generally fit worse.
That’s because the loadings on the higher order factor have to meet certain criteria. These criteria are more restrictive than the “anything goes” unrestricted correlations between the indicators. Here, six factor correlations are replaced by four higher-order loadings, which is why this model has 20 degrees of freedom rather than 18.
It’s shown here merely to illustrate how the correlations between first order factors can be accounted for by assuming a single higher-order factor.
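The nesting argument can be checked with a little arithmetic. The sketch below counts the parameters involved and computes the chi-square difference between the oblique model (chi-square = 226.22, df = 18) and the higher-order model (chi-square = 237.22, df = 20). Treat the p value as illustrative only: the oblique model produced out-of-range correlations, so a formal difference test between these two solutions is questionable.

```python
import math

# Four first-order factors: EN, INT, LRN, ORG.
n_first_order = 4
free_correlations = n_first_order * (n_first_order - 1) // 2   # 6 in the oblique model
higher_order_loadings = n_first_order                          # 4 in the higher-order model
df_gained = free_correlations - higher_order_loadings          # restrictions added

# Fit statistics reported for the two models.
chisq_oblique, df_oblique = 226.22, 18
chisq_higher, df_higher = 237.22, 20

delta_chisq = chisq_higher - chisq_oblique
delta_df = df_higher - df_oblique

# For exactly 2 degrees of freedom the chi-square upper-tail probability
# has the closed form exp(-x / 2); this shortcut is valid ONLY for df = 2.
p_value = math.exp(-delta_chisq / 2)

print(f"restrictions added: {df_gained}")
print(f"delta chi-square = {delta_chisq:.2f} on {delta_df} df, p = {p_value:.4f}")
```

The difference is never negative, which is the point made above: the higher-order model is a restricted version of the correlated-factors model, so it cannot fit better.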
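[Path diagram: the four first-order factors (EN, INT, LRN, ORG) load on a single higher-order factor, OverallEval; the first-order disturbances are een, eint, elrn, and eorg. Squared multiple correlations printed on the diagram: EN 1.31, INT .88, LRN .90, ORG .70; likenth .36, caenth .24, likinter .50, cainter .36, liklrn .54, calrn .38, likorg .86, caorg .37. Standardized loadings on OverallEval: 1.14, .94, .95, .83.]
Chi-square = 237.22, df = 20, p = .00, RMSEA = .22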
What’s good?
1. Indicators load on factors.
2. CA indicators have large, positive loadings.
What’s bad?
1. Horrible fit.
2. One standardized loading out of range.
Model 4A. Two Orthogonal Response Method Factors – Likert and Content Analysis.
In this model, two different rating processes are assumed – a Likert process and an open-ended process. Note that it does not allow corresponding scales to correlate, e.g., LIKENTH with CAQENTH. To get the estimates, the variance of CA had to be fixed at 1. Although this model clearly fits better than the four-dimension models, there are some problems. For example, the standardized loading of caenth onto CA is 1.68, a value that doesn’t make sense.
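[Path diagram: two orthogonal method factors, Likert (indicators nlikenth, nlikintr, nliklrn, nlikorg) and CA (indicators caenth, cainter, calrn, caorg), with residuals eae through eoo. Squared multiple correlations printed on the diagram: nlikenth .65, nlikintr .52, nliklrn .68, nlikorg .70, caenth 2.81, cainter .00, calrn .00, caorg .00. Standardized loadings on Likert: .72, .83, .83, .81; on CA: 1.68 (caenth), -.03, -.01, .05.]
Cassillnm data: Chi-square = 162.62, df = 21, p = .00, RMSEA = .18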
What’s good?
1. Better fit than above models.
2. Likert scales have nice positive loadings.
What’s bad?
1. Wrong dimensions – we were expecting 4 content dimensions, not 2 method dimensions.
2. Screwy loadings on CA.
3. Fit still not good.
Model 4B. Two Oblique Response Method Factors.
This model fits better, although the chi-square is still significant. But it doesn’t square with our conceptualization of the problem. Specifically, we conceptualize there being four dimensions of teaching performance – enthusiasm, interaction, learning, and organization. This model is saying there are only two dimensions and they correspond to the METHOD of response, not the characteristics of the teachers. This is a situation in which theory and data collide.
There may be a problem with the estimation of this model. Why is the standardized loading of CAENTH on CA = 1.67? (The reason, not shown in the standardized model presented below, is that the model represents what is called a Heywood case: one of the residual variance estimates is less than 0.)
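The link between an out-of-range standardized loading and a negative residual variance is simple arithmetic. For an indicator that loads on only one factor, the standardized residual variance is one minus the squared standardized loading. The sketch below is an illustration of that identity, not output from the SEM program. The value 1.67 is the CAENTH loading quoted above; the value .81 is used only as an example of an in-range loading.

```python
def standardized_residual_variance(std_loading):
    """Residual variance (in standardized units) of an indicator that loads
    on a single factor: 1 - loading squared."""
    return 1.0 - std_loading ** 2


caenth_residual = standardized_residual_variance(1.67)   # loading quoted in the text
inrange_residual = standardized_residual_variance(0.81)  # an in-range loading, for contrast

# A negative residual variance is what marks a Heywood case.
is_heywood = caenth_residual < 0

print(f"caenth: {caenth_residual:.2f} (Heywood case: {is_heywood})")
print(f"in-range example: {inrange_residual:.2f}")
```

Any standardized loading above 1 in absolute value implies that the factor "explains" more than 100% of the indicator's variance, which is why such a solution is inadmissible.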
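[Path diagram: the same two method factors as in Model 4A, Likert and CA, now allowed to correlate. Squared multiple correlations printed on the diagram: nlikenth .66, nlikintr .52, nliklrn .68, nlikorg .69, caenth 2.80, cainter .00, calrn .00, caorg .00. Standardized loadings on Likert: .72, .83, .83, .81; on CA: 1.67 (caenth), -.05, -.04, .01. Correlation between Likert and CA: .15.]
Cassillnm data: Chi-square = 151.99, df = 20, p = .00, RMSEA = .17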
What’s good?
1. Fit is a little better than the above models.
2. Likert indicators are good.
What’s bad?
1. Fit still bad.
2. Two method dimensions rather than content dimensions.
3. Heywood case for CA subset.
4. Correlation between Likert and CA factors is small.
Model 4B revisited, after experimenting with the reference indicator to find one that didn’t yield a Heywood case.
But there is an anomaly here, also – the correlation between the two factors is > 1.
Model 4C: A Single Overall Evaluation Factor.
Amazingly, this model fits much better than does a model which assumes four dimensions of teaching and it fits just as well as a model which assumes two dimensions of responding. Actually, it suggests a resolution of the conflict of theory and data. It suggests that there is a general tendency to respond positively or negatively that affects ALL the measures – Likert and written responses.
What’s good?
1. Fit is much better than the above models.
2. Likert indicators are good.
What’s bad?
1. Fit still bad.
2. Two method dimensions rather than content dimensions.
3. Correlation between factors is larger than 1, not a possible value.
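[Path diagram: a single factor, OverallEval, with all eight scale scores as indicators. Squared multiple correlations printed on the diagram: nlikenth .65, nlikintr .52, nliklrn .69, nlikorg .70, caenth .04, cainter .08, calrn .15, caorg .22. Standardized loadings: .81, .72, .83, .84 for the Likert scales and .21, .28, .39, .46 for the content-analysis scales.]
Cassillnm data: Chi-square = 58.78, df = 20, p = .00, RMSEA = .09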
What’s good?
1. Fit is not much worse than the best of the previous models.
2. Loadings of indicators on the single factor are all positive.
What’s bad?
1. Fit still not acceptable.
2. Only one method factor – no content factors.
Model 5. A bifactor model – a Method Factor model with 4 content factors.
This model fits extraordinarily well. The chi-square is not significant, something rarely found in applications of SEM. Moreover, it makes sense. It fits both theory and data. The fit to theory is found in the evidence for 4 dimensions of instructor performance. They’re correlated but the correlations are not extraordinarily high. It also fits the data, quite well. And the Method factor makes sense. It reflects individual differences in a general tendency to evaluate everything either positively or negatively. Note that this tendency is most apparent in the Likert items – they load most highly on it.
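[Path diagram: four correlated content factors (EN, INT, LRN, ORG), each with its Likert and content-analysis indicator, plus a method factor (METH; the label HALO also appears on the handout) on which the indicators load. Squared multiple correlations printed on the diagram: likenth 1.00, caenth .04, likinter 1.00, cainter .15, liklrn .69, calrn .36, likorg .99, caorg .25. The individual loadings and factor correlations cannot be matched to their paths in the extracted text.]
Chi-square = 8.69, df = 10, p = .56, RMSEA = .00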
What’s good?
1. Fit is acceptable.
2. Model makes sense.
What’s bad?
1. Nothing I can think of.
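For readers who want to see where the RMSEA values come from, here is a minimal sketch of the usual point-estimate formula, sqrt(max(chi-square - df, 0) / (df * (N - 1))). The sample size is an assumption: the handout says only that there were "200+" rows, so N = 220 is used purely for illustration. Some programs divide by N rather than N - 1.

```python
import math


def rmsea(chisq, df, n):
    """RMSEA point estimate: sqrt(max(chisq - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chisq - df, 0.0) / (df * (n - 1)))


ASSUMED_N = 220  # assumption: the handout gives only "200+" rows

# (chi-square, df) as reported for each Cassill model.
fits = {
    "Model 1": (502.44, 24),
    "Model 2": (226.22, 18),
    "Model 3": (237.22, 20),
    "Model 4A": (162.62, 21),
    "Model 4B": (151.99, 20),
    "Model 4C": (58.78, 20),
    "Model 5": (8.69, 10),
}

values = {name: rmsea(c, d, ASSUMED_N) for name, (c, d) in fits.items()}
for name, value in values.items():
    print(f"{name}: RMSEA = {value:.2f}")
```

Whatever the true N, Model 5 gives an RMSEA of exactly 0, because its chi-square is smaller than its degrees of freedom.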
Development of bifactor models of Big Five Questionnaires.
The original Faking model. Nhung Nguyen’s dissertation data – Summer 2003
Nhung had gathered data involving administration of the IPIP 50 item Big 5 twice – once with instructions to respond honestly, once with instructions to fake good.
When working with these data on a paper involving faking of situational judgment tests, I hit on the idea that the Big 5 latent variables were common across the two instructional conditions but there was an additional influence on responding in the faking condition. I later found that others had considered this notion. This led to the following set of models.
Model 1: Basic CFA of Parcels formed from Honest and Faked Questionnaire Items.
[Path diagram: CFA with six latent variables (e, a, c, s, o, and sjtml). Each is indicated by three honest-condition parcels (HSURGT1 to HSURGT3, HAGREET1 to HAGREET3, HCONST1 to HCONST3, HEST1 to HEST3, HINT1 to HINT3, HSJTML1 to HSJTML3) and three faked-condition parcels (FSURGT1 to FSURGT3, FAGREET1 to FAGREET3, FCONST1 to FCONST3, FEST1 to FEST3, FINT1 to FINT3, FSJTML1 to FSJTML3), each with its own residual. The standardized estimates cannot be matched to their paths in the extracted text.]
Notes:
1. We formed 3 testlets/parcels from each set of 10 items, discarding the item with lowest communality from each dimension.
2. This is simply a CFA of the 36 testlets – 6 for each dimension, 3 honest, 3 fake-good.
3. SJT represents situational judgment test responses.
4. Fit is terrible because, we believe, there is another influence on the Faked items, an influence not included in this model, a Faking influence.
X2(579) = 2241.38, GFI = .558, AGFI = .492, RMSEA = .119
Skipped to end of lecture in 2015.
Below is the model above with one additional latent variable, representing individual differences in tendency to agree with each item based on the instructions, rather than the item content, called F, for faking here.
To simplify the presentation, the 3 regression arrows to each set of 3 parcels are represented as a single arrow.
Note that the fit is not spectacular, but that it is much better than the fit of the previous model.
[Path diagram: the previous model plus the latent variable F. The parcel sets are labelled H-E, F-E, H-A, F-A, H-C, F-C, H-S, F-S, H-O, F-O, H-SJT, and F-SJT; the latent variables are E, A, C, S, O, SJT, and F. The standardized estimates cannot be matched to their paths in the extracted text.]
X2(561) = 1323.05, GFI = .736, AGFI = .687, RMSEA = .082
Since we were afraid that the paper might be rejected outright because the fit indices were not close enough to the traditional threshold values, we looked around for ways to improve fit. We realized that when participants are asked to respond to the same item twice, even under different instructional sets, their responses to those identical items will both be influenced by specific idiosyncratic aspects of the items. Thus, across participants, responses to identical items will be positively correlated.
These idiosyncratic aspects of the items are part of the “other” influences that are the residual terms. So we allowed the residuals of identical testlets to be correlated. This led to the following model . . .
[Path diagram, titled “Faking Model Correlated F, F-H errors”: the faking model with the residuals of corresponding honest and faked testlets allowed to correlate. The standardized estimates cannot be matched to their paths in the extracted text.]
Fit statistics printed on the diagram: RMSEA = .079, CFI = .855, Chi-square = 1224.934, df = 543, p = .000
The fit of this model is closer to being acceptable. Note that F influences only the faked items, not the honest items. We have since discovered that there is an analogous influence on the honest items, one we call M, for method bias.
X2(543) = 1056.64, GFI = .778, AGFI = .728, RMSEA = .068
The fit was better, but still not quite at “rejection-proof” levels.
We considered other possibilities and discovered that there were positive correlations among the F testlets that were not accounted for by the loadings of those testlets onto the single F factor. These seemed to be dimension-specific effects. To account for these we could have introduced a different F latent variable for each dimension. Instead, we chose to allow the residuals between testlets within each dimension to be correlated. This led to the following, final model . . . We felt that the fit of this model was acceptable, and submitted the paper for presentation to SIOP, 2004 based on it.
[Path diagram, titled “Faking Model Correlated F, F-H errors”: the final model, in which the residuals of the faked testlets within each dimension are also allowed to correlate. The standardized estimates cannot be matched to their paths in the extracted text.]
Fit statistics printed on the diagram: RMSEA = .053, CFI = .938, Chi-square = 819.279, df = 525, p = .000
Faking Model Conceptualized as a Longitudinal Growth Model – Summer of 2004 - ALL summer.
For each dimension, the single-letter latent variable is the Intercept. The latent variable whose name begins with F is the slope.
[Path diagram: latent growth version of the faking model, drawn with means and intercepts. For each dimension (E, A, C, S, O, and the SJT, labelled J) the honest and faked testlets have an intercept latent variable (e, a, c, s, o, J) and a slope latent variable (FE, FA, FC, FS, FO, FJ), and the diagram also includes F. Parameters appear as labels (for example UHE, LHE1, FEI1), with residual means fixed at 0 and selected paths fixed at 1. The full structure cannot be recovered from the extracted text.]
While my wife and son and daughter-in-law visited Europe, I stayed home to try to develop a perspective on the model we had developed. I did this in response to the remark of a reviewer of the SIOP paper who said, “learn all you can about longitudinal models . . .” I spent the summer doing just that and figuring out how to conceptualize the faking model as an LGM. It turns out to have been a bust, I think.
Measuring Method bias in Honest conditions - 2006.
F in the above is a latent variable that represents a systematic bias on the part of participants to adjust their scores on ALL items on a questionnaire. Most participants adjusted their scores positively in the faking condition, but some adjusted them negatively. This type of adjustment is what has been studied under the heading of method bias for more than 20 years. So, it could be said that the above model is consistent with the conceptualization that faking is a form of method bias that emerges under instructions to fake.
The existence of a method bias in the faking condition led to the question: Is there an analogous (or different) bias occurring when participants are instructed to respond honestly? The natural extension of the above model is one in which a latent variable like F is added with the Honest testlets as indicators. Here it is . . .
[Path diagram, titled “TwoCondition_M,F”: the final faking model with an added latent variable, M, which has the honest-condition testlets (HETL1 to HETL3, HATL1 to HATL3, HCTL1 to HCTL3, HSTL1 to HSTL3, HOTL1 to HOTL3, HJTL1 to HJTL3) as indicators, alongside F for the faked-condition testlets (FETL1 to FETL3 through FJTL1 to FJTL3). The standardized estimates cannot be matched to their paths in the extracted text.]
Fit statistics printed on the diagram: RMSEA = .045, CFI = .956, Chi-square = 714.292, df = 507, p = .000
Note that this model fits the data quite well (if you ignore the chi-square).
Estimating method bias from a single session of data -2006.
At the time we believed that the ability to estimate a method bias latent variable was due to the fact that we were employing two-condition data – with an honest condition and a faking condition.
We then decided to see whether or not the method latent variables (M or F) could be estimated from the data of only one condition. Here are the results for the H condition of Nhung’s study . . .
The date on the output is probably not correct, since we didn’t start looking at M until 2005. I often change models without changing the documentation associated with them. That is the weak point of documentation – it must be kept consistent. Who has the time?
[Path diagram, titled “SIOPM4_MeansNotEstimated 3/18/4”: honest-condition testlets only (HETL1 to HETL3, HATL1 to HATL3, HCTL1 to HCTL3, HSTL1 to HSTL3, HOTL1 to HOTL3, HJTL1 to HJTL3), with latent variables e, a, c, s, o, sjtml, and M. The standardized estimates cannot be matched to their paths in the extracted text.]
Fit statistics printed on the diagram: RMSEA = .056, CFI = .964, Chi-square = 166.306, df = 102, p = .000
This model is significant in two ways. First, it demonstrates that the “general” factor (called M here) can be estimated from the data of ONE condition. Second, it demonstrates that there is apparently a general factor effect even when participants are told to respond honestly.
The following is the “same” model applied to only the Nguyen faking condition data.
The main points of this and the previous page are that 1) Method effects exist in both faked data and in honest data, and 2) Big 5 latent variables AND a method bias latent variable could be measured from the data of a single instructional condition.
[Path diagram, titled “SIOPM4_MeansNotEstimated F TestletsCFA 11/17/5”: faked-condition testlets only (FETL1 to FETL3, FATL1 to FATL3, FCTL1 to FCTL3, FSTL1 to FSTL3, FOTL1 to FOTL3, FJTL1 to FJTL3), with latent variables e, a, c, s, o, sjtml, and a method latent variable labelled FA. The standardized estimates cannot be matched to their paths in the extracted text.]
Fit statistics printed on the diagram: RMSEA = .032, CFI = .991, Chi-square = 123.439, df = 102, p = .073
Dude, check out the nonsignificant chi-square.
2007 - Measuring faking from a single session.
The fact that the method latent variable could be measured from the data of a single session meant that it might be possible to measure “faking” from the data of a single session, something that has been done only once, by Cellar et al. in 1996. We applied the faking model to both conditions of Nhung’s data and then to only the faking data.
For each application, we computed factor scores of the F latent variable. If the F latent variable in the one-condition data was measuring faking in the same way as the F latent variable in the two-condition data, the factor scores should be highly correlated. Here’s a scatterplot of the faking latent variable factor scores from Nhung’s data and from the data of a follow-up study with Lyndsay Wrensen . . .
[Scatterplots omitted. Recoverable labels: panel b, Wrensen and Biderman (2005); axis label, One-condition Faking Ability Factor Scores.]
The above relationships strongly suggest that faking measured in the one-condition data is highly correlated with faking measured from two-condition data. This suggests that the faking model could be applied to the data of a single session and the amount of faking of participants in that session measured.
Parcels vs. Items as Indicators - 2007
The models above were all applied to testlet/parcel data. That is, each indicator was the average of responses to two or three items. We did that originally because of a belief that we would not get acceptable goodness-of-fit unless we applied the models to parcel data. In the period 2005-2007 we began considering the use of individual items as indicators, rather than parcels. One reason for this was that having more indicators gives you more degrees of freedom, and allows you more freedom to estimate latent variables. The downside is that I believe there is a general tendency for models of individual items to have poorer fit indices than those of parcels. For example, below are graphs of fit indices CFI and RMSEA for individual-item indicators and 2-item parcel indicators for the same data. Note that goodness-of-fit is generally better for the two-item parcel data. In particular, the CFI values move from traditionally unacceptable to traditionally acceptable when parcels are indicators.
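Forming parcels is just averaging. The sketch below shows the operation for one person on one 10-item dimension. The grouping of adjacent items into two-item parcels and the example responses are hypothetical; the handout does not say which items were combined.

```python
def make_parcels(item_scores, groups):
    """Average item responses within each group to form parcel scores.

    item_scores: one person's responses to the items of one dimension.
    groups: list of lists of item positions (0-based) forming each parcel.
    """
    return [sum(item_scores[i] for i in group) / len(group) for group in groups]


# Hypothetical grouping: adjacent items paired into five 2-item parcels.
TWO_ITEM_GROUPS = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]

# Hypothetical responses of one person to the 10 items of one dimension.
person = [4, 5, 3, 4, 2, 3, 5, 5, 1, 2]

parcels = make_parcels(person, TWO_ITEM_GROUPS)
print(parcels)
```

Going from 10 item indicators to 5 parcel indicators per dimension halves the number of indicators. That is the trade-off described above: fewer degrees of freedom, but usually better-looking fit indices.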
[Graphs omitted: CFI and RMSEA for individual-item indicators and for two-item parcel indicators.]
CFI increased by .1 or more in each study when 2-item parcels were used as indicators, rather than individual items.
RMSEA decreased in two studies and stayed the same in two when 2-item parcels were used as indicators, rather than individual items.
Positively-worded and negatively-worded method biases –2008.
Nhung Nguyen had mentioned in emails regarding method bias that we should look at method bias associated with item wording, specifically associated with positively worded items and with negatively-worded items.
We decided to look at bias associated with positively-worded and with negatively-worded items for the several studies we’ve conducted here. Here’s what the path diagram of an Mp, Mn model with individual items as indicators looks like . . . Because of the complexity of the path diagram, all of the applications of the MpMn model have been done using Mplus, which is programmed with commands, rather than figures.
[Path diagram: each Big Five factor (E, A, C, S, O) is indicated by its 10 items (E1 to E10, A1 to A10, C1 to C10, S1 to S10, O1 to O10); the diagram also contains the two method factors, Mp and Mn.]
Here’s a summary of Mplus output from application of the MpMn model to four datasets.
In each application, the 10 individual IPIP Big 5 items were indicators.
                                    Dataset
Model                       df      Nguyen    Wrensen   Damron    Sebren
Chi-square
  CFA with No method        1165    2252.12   2315.73   2839.79   2552.45
  CFA with M                1115    2031.74   2048.11   2449.20   2253.26
  CFA with Mp,Mn            1114    1972.32   2025.24   2282.74   2184.08
Chi-square difference
  M vs. No M                50      220.38*   267.62*   390.59*   299.19*
  MpMn vs. M                1       59.42*    22.87*    166.46*   69.18*
MpMn CFI                            .785      .712      .833      .708
MpMn RMSEA                          .062      .070      .054      .072
Correlation of Mp with Mn           .766      .844      .754      .752
* p < .001
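The difference tests in the table can be reproduced from the chi-square values themselves. The sketch below recomputes each difference and its p value. The helper `chi2_sf` is written here for illustration using the closed-form upper-tail formulas for integer degrees of freedom; it is not the routine Mplus uses.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with a
    positive integer number of degrees of freedom (closed-form series)."""
    if df < 1 or x < 0:
        raise ValueError("df must be >= 1 and x must be >= 0")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) plus a finite series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(df // 2):
        total += term
        term *= x / (2 * i + 3)
    return total


datasets = ["Nguyen", "Wrensen", "Damron", "Sebren"]
chisq = {
    "No method": [2252.12, 2315.73, 2839.79, 2552.45],
    "M": [2031.74, 2048.11, 2449.20, 2253.26],
    "MpMn": [1972.32, 2025.24, 2282.74, 2184.08],
}
df = {"No method": 1165, "M": 1115, "MpMn": 1114}

results = {}
for restricted, general in [("No method", "M"), ("M", "MpMn")]:
    delta_df = df[restricted] - df[general]
    for name, c_r, c_g in zip(datasets, chisq[restricted], chisq[general]):
        delta = c_r - c_g
        results[(general, restricted, name)] = (delta, delta_df, chi2_sf(delta, delta_df))

for key, (delta, delta_df, p) in results.items():
    print(key, f"delta = {delta:.2f}, df = {delta_df}, p = {p:.2g}")
```

Every difference comes out significant at p < .001, in agreement with the asterisks in the table.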
The bottom line is that there is considerable evidence that the responses of participants to Big Five items are influenced by
1. The amount of the particular Big 5 characteristic that each participant possesses
2. A tendency to adjust responses to all positively worded items. The adjustment is positive for some people, negligible for some, negative for others.
3. A tendency to adjust responses to all negatively worded items. The adjustment is positive for some people, negligible for some, negative for others.
The item-wording adjustments measured here are independent of the Big Five dimensions.
The item-wording adjustments are positively correlated with each other, although not so positively correlated that they can be treated as a single latent variable. This is shown by the significant MpMn vs. M chi-squares.
2009 – Three types of bias factor – General bias, negative bias, and positive bias
As we explored the idea that there are different bias factors associated with different item wordings, I questioned the idea that there were only two such factors. It seemed more reasonable that there are THREE bias factors – negative, positive, and a general bias factor. My idea was buttressed by a recent article by Marsh et al. (2010) in which three factors – a negative, positive, and general factor – were found to account for data of the Rosenberg Self Esteem scale. We explored this possibility by comparing several models for five different datasets. They’re summarized in the following figure from a paper recently submitted to Journal of Research in Personality. Model 6 is the model I believe best represents Big Five questionnaire data.
Figure 1. Models compared. Each rectangle represents the items indicating a Big Five dimension. The left half of each rectangle represents positively-worded items and the right half negatively-worded items. A single arrow drawn from a factor to a rectangle represents all the loadings from that factor to the indicators represented by the rectangle. Residual latent variables have been omitted for clarity
[Figure 1 panels, as far as they can be recovered from the extracted text. In every model the Big Five factors O, S, C, A, and E are indicated by their positively-worded items (Op, Sp, Cp, Ap, Ep) and negatively-worded items (On, Sn, Cn, An, En).
Model 1: Big Five factors only.
Model 2: Big Five factors plus M.
Model 3: Big Five factors plus Mp and Mn.
Model 4: Big Five factors plus M and Mp.
Model 5: Big Five factors plus M and Mn.
Model 6: Big Five factors plus M, Mp, and Mn.]
The results of comparisons . . .
Table 2. Chi-square goodness-of-fit measures and chi-square difference tests.
------------------------------------------------------------------------------------------
                                           Analysis
                            1        2        3        4        5        6   df
Questionnaire            IPIP     IPIP     IPIP     IPIP     IPIP      NEO   IPIP / NEO
------------------------------------------------------------------------------------------
Chi-square
  Model 1              2174.4   2552.5   3523.0   3734.4   2568.6   3219.6   1165 / 1700
  Model 2              1901.7   2253.3   3063.7   2431.2   2241.5   2893.1   1115 / 1640
  Model 3              1853.7   2186.2   2715.7   2275.0   2230.9   2838.2   1114 / 1639
  Model 4              1786.0   2136.2   2638.9   2112.9   2085.0   2744.8   1089 / 1611
  Model 5              1758.2   2101.9   2629.6   2152.7   2044.1   2732.7   1091 / 1609
  Model 6              1642.9   1980.7   2162.2   1912.3   1962.1   2589.6   1065 / 1580
------------------------------------------------------------------------------------------
Δχ2 Model 2 vs 1        272.7    299.2    459.3    672.9    327.1    326.5   50 / 60
Δχ2 Model 3 vs 2         54.0     67.1    348.0    156.2     31.1     54.9   1 / 1
rMpMn                     .77      .76      .33      .50      .86      .89
Δχ2 Model 4 vs 2        115.7    117.1    424.8    318.3    156.5    148.3   26 / 29
Δχ2 Model 5 vs 2        143.5    151.4    434.1    278.5    197.4    160.4   24 / 31
Δχ2 Model 6 vs 2        258.8    272.6    901.5    518.9    279.4    303.5   50 / 60
Δχ2 Model 6 vs 4        143.1    155.5    476.7    200.6    122.9    155.2   24 / 31
Δχ2 Model 6 vs 5        115.3    121.2    467.4    240.4     82.0    143.1   26 / 29
------------------------------------------------------------------------------------------
Note. For Analysis 3, residual variance of one item set to .001. For Analysis 4, residual variance of one item set to .001. For Analysis 5, variance of Mp set to .001.
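The df column of Table 2 can be checked by counting parameters. The sketch below is only an illustration and rests on several assumptions that are not stated in the table: each item loads on one trait, factor variances are fixed for identification, the method factors are uncorrelated with the traits, and the 50 IPIP items split into 26 positively-worded and 24 negatively-worded items (a split inferred from the df differences, not given in the notes). The function name model_df is made up for this example.

```python
def model_df(n_items, n_traits=5, method_loadings=0, method_covariances=0):
    """Degrees of freedom for a CFA with simple-structure trait factors.

    Assumes factor variances are fixed, so the free trait parameters are
    one loading and one residual variance per item plus the covariances
    among the trait factors.
    """
    moments = n_items * (n_items + 1) // 2            # unique variances/covariances
    trait_params = 2 * n_items + n_traits * (n_traits - 1) // 2
    return moments - trait_params - method_loadings - method_covariances

# Assumed split of the 50 IPIP items (inferred from Table 2, not stated there).
N_POS, N_NEG = 26, 24

ipip_df = {
    "Model 1": model_df(50),                                   # traits only
    "Model 2": model_df(50, method_loadings=50),               # + M
    "Model 3": model_df(50, method_loadings=N_POS + N_NEG,
                        method_covariances=1),                 # + Mp, Mn (correlated)
    "Model 4": model_df(50, method_loadings=50 + N_POS),       # + M, Mp
    "Model 5": model_df(50, method_loadings=50 + N_NEG),       # + M, Mn
    "Model 6": model_df(50, method_loadings=50 + N_POS + N_NEG),  # + M, Mp, Mn
}

for name, df in ipip_df.items():
    print(name, df)
```

Under these assumptions the counts agree with the IPIP df values in Table 2, which is some reassurance that the models were specified as described.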
In all data sets the most general model, Model 6, fit significantly better than any of the other models.
This suggests that the most appropriate model for Big Five data is one that includes EIGHT factors – 5 Big Five Trait factors and THREE method bias factors – one influencing only negatively worded items, a second influencing only positively worded items, and a third influencing all items.
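To make the comparison concrete, here is a minimal sketch of a chi-square difference test for two nested models, using the Analysis 1 values for Model 4 and Model 6 from Table 2. The helper chi2_sf is a hand-written upper-tail probability (the standard series and continued-fraction evaluation of the incomplete gamma function) so that the example needs only the standard library; in practice a library routine such as scipy.stats.chi2.sf would normally be used instead.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, hx = df / 2.0, x / 2.0
    log_pref = -hx + a * math.log(hx) - math.lgamma(a)
    if hx < a + 1.0:
        # Series expansion of the lower incomplete gamma function.
        n, term = a, 1.0 / a
        total = term
        for _ in range(10000):
            n += 1.0
            term *= hx / n
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_pref)
    # Continued fraction (modified Lentz method) for the upper incomplete gamma function.
    tiny = 1e-300
    b = hx + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta_h = d * c
        h *= delta_h
        if abs(delta_h - 1.0) < 1e-15:
            break
    return h * math.exp(log_pref)

# Analysis 1 (IPIP) values taken from Table 2.
chisq_model4, df_model4 = 1786.0, 1089   # restricted model
chisq_model6, df_model6 = 1642.9, 1065   # more general model

delta = chisq_model4 - chisq_model6
ddf = df_model4 - df_model6
p = chi2_sf(delta, ddf)
print(f"delta chi-square = {delta:.1f}, df = {ddf}, p = {p:.3g}")
```

A small p value here means the extra loadings in the more general model improve fit by more than would be expected by chance.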
Here is a more detailed figure representing Model 6.
[Figure not reproduced: Model 6. The trait factors O, S, C, A, and E are each indicated by ten items (O1-O10, S1-S10, C1-C10, A1-A10, E1-E10). The diagram also contains the three method factors M, Mp, and Mn.]
But wait, there’s more . . .
2010 – Method factors as measures of well-being??
A few years ago a person with whom I was acquainted was going through some very rough times. One of the primary problems was depression. That person had taken a Big Five questionnaire. When the Big Five questionnaire was scored for just the five traits, there was nothing terribly unusual about the profile of scores. Even the Emotional Stability score, while below average, was not as far below average as one would have expected based on the severity of the depression at that time.
However, when the Big Five was scored for SIX factors – the Big Five plus M – a striking profile emerged. The person’s scores on the Big Five traits, including Emotional Stability, were nearly normal, but the person’s M score was VERY low. At the time, we were still considering M to be a measure of faking and I didn’t do anything immediately with the information. It was one of those isolated pieces of information that you store away for future reference.
Last year, I gathered data on the Big Five, and remembering that person’s profile, I included a measure of depression and also a measure of self-esteem in the questionnaire packet that was given to students. In the analysis of the data, I correlated M scores with both depression and self-esteem scores.
Here are the results, from a paper presented at SIOP in 2011 . . .
Table 1. Means, standard deviations, correlations, and reliability coefficients for study variables.
----------------------------------------------------------------------------------
       Mean    SD      E       A       C       S       O       M      CCD     RSE
----------------------------------------------------------------------------------
E      4.75  1.04    .885
A      5.30  0.74    .317c   .789
C      4.57  0.86    .007    .164a   .823
S      4.24  0.99    .237b   .176a  -.021    .842
O      4.85  0.82    .244c   .335c   .270c   .156a   .812
M      0.00  0.38    .714c   .616c   .231b   .592c   .292c   .912
CCD    1.84  0.83   -.202b  -.309c  -.330c  -.284c  -.192b  -.412c   .920
RSE    5.65  0.87    .285c   .188a   .381c   .242c   .359c   .401c  -.674c   .847
----------------------------------------------------------------------------------
a p < .05   b p < .01   c p < .001
M correlates negatively with Depression (CCD; r = -.412) and positively with Self Esteem (RSE; r = .401). Both correlations are larger in absolute value than the corresponding correlations of Emotional Stability (S) with CCD (-.284) and RSE (.242). In fact, we argued in the paper that the correlations of S with CCD and RSE were spurious, caused by the influence of M on the Big Five, CCD, and RSE scores.
When the effect of M on S was removed, the correlation of “purified” S with CCD was .01 and with RSE was .00.
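A rough version of this "purification" can be tried directly on the scale-score correlations in Table 1 by computing a part (semipartial) correlation: the correlation of CCD or RSE with the part of S that is left after regressing S on M. This is only a sketch; the .01 and .00 reported above may have been obtained in a different way (for example, within the SEM itself), so exact agreement is not expected. The function name part_correlation is made up for this example.

```python
import math

def part_correlation(r_xy, r_xz, r_yz):
    """Correlation between y and the residual of x after regressing x on z."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1.0 - r_xz ** 2)

# Correlations copied from Table 1.
r_s_m = 0.592      # S with M
r_s_ccd = -0.284   # S with depression (CCD)
r_s_rse = 0.242    # S with self-esteem (RSE)
r_m_ccd = -0.412   # M with CCD
r_m_rse = 0.401    # M with RSE

purified_s_ccd = part_correlation(r_s_ccd, r_s_m, r_m_ccd)
purified_s_rse = part_correlation(r_s_rse, r_s_m, r_m_rse)

print(f"S (M removed) with CCD: {purified_s_ccd:.3f}")
print(f"S (M removed) with RSE: {purified_s_rse:.3f}")
```

Even with this crude approach, both correlations shrink to values close to zero once M is removed from S, which is the pattern described above.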
2011 - The Big Two and the General Factor of Personality (GFP)
Several theorists believe that there are higher order factors that influence the Big Five.
The Big Two theorists believe that there are two 2nd order factors – Stability and Plasticity. Stability is believed to influence Agreeableness, Conscientiousness, and Emotional Stability. Plasticity is believed to influence Extraversion and Openness.
Other theorists believe that there is a single higher order factor – called the general factor of personality or GFP. It has been conceptualized as a 3rd order factor, influencing Stability and Plasticity.
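[Figure not reproduced: higher-order model. GFP influences Stability (St) and Plasticity (Pl); St influences A, C, and S; Pl influences E and O. Each trait factor is indicated by its ten items (O1-O10, S1-S10, C1-C10, A1-A10, E1-E10).]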
2011 – M and the GFP
Contrast the GFP model with the models we’ve been considering. Clearly the two kinds of model 1) are different and 2) can coexist.
Our data have indicated that when M is estimated, the correlations between the Big Five factors are reduced to essentially zero. Because the indicators of a factor must be correlated (otherwise there is no reason to posit the factor), this result provides little support for the GFP as presented below.
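The logic of this argument can be illustrated with a small simulation. The sketch below is hypothetical and is not our data: it generates two uncorrelated traits, adds the same method factor M to both observed scores with an arbitrary loading of 0.6, and compares the correlation between the scores before and after M is taken out. In real data M is not known and has to be estimated, so subtracting it exactly is a simplification.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate(n=20000, m_loading=0.6, seed=1):
    """Return (correlation with M present, correlation with M removed)."""
    rng = random.Random(seed)
    raw_1, raw_2, pure_1, pure_2 = [], [], [], []
    for _ in range(n):
        m = rng.gauss(0, 1)         # method factor shared by both scores
        trait_1 = rng.gauss(0, 1)   # two traits that are truly uncorrelated
        trait_2 = rng.gauss(0, 1)
        raw_1.append(trait_1 + m_loading * m)
        raw_2.append(trait_2 + m_loading * m)
        pure_1.append(trait_1)      # scores with the M contribution removed
        pure_2.append(trait_2)
    return pearson(raw_1, raw_2), pearson(pure_1, pure_2)

r_raw, r_pure = simulate()
# Correlation implied by the shared factor alone: loading^2 / (1 + loading^2).
r_expected = 0.6 ** 2 / (1 + 0.6 ** 2)
print(f"with M: {r_raw:.3f} (implied {r_expected:.3f}); M removed: {r_pure:.3f}")
```

In this setup the observed scores correlate only because they share M, so a higher-order factor fitted to them would be picking up M rather than anything common to the traits.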
Some studies regarding the GFP have used the first unrotated factor in an EFA of items. They found that the GFP estimated in that way correlated positively with self presentation. But I would argue that what they’ve done is get crude estimates of M and have replicated our finding of the relationship of M to self presentation.
[Figure not reproduced: the higher-order model (GFP, Pl, St, and the trait factors O, E, S, C, A, each with its ten items) with the method factor M added.]
Van der Linden, D., Scholte, R. H. J., Cillessen, A. H. N., Nijenhuis, J., Segers, E. (2010). Classroom ratings of likeability and popularity are related to the Big Five and the general factor of personality. Journal of Research in Personality, 44, 669-672.