
Applications of SEM

Bill Cassill’s Thesis Data: Likert scales vs. Content Analysis

Bill Cassill chose to investigate the extent to which content analysis could be used to measure traits which are typically measured using Likert scaling techniques. He chose student evaluations of instructors as the vehicle for this investigation.

He picked a common Likert-type evaluation form and created a questionnaire which included four scales on this form. These scales measured students’ perceptions of

1) Enthusiasm of the instructor
2) Learning Value - the extent to which the instructor stimulated intellectual effort
3) Interaction - the extent to which the instructor encouraged student discussion
4) Organization of the instructor - the extent to which the lectures followed a logical order

In the questionnaire, Bill also included two questions to which written answers were to be given. The first was “In what aspects do you think your instructor is a good teacher?” The second was, “In what aspects do you think your instructor needs improvement?”

For each student’s written responses, Bill counted the number of positive references to Enthusiasm, Learning, Interaction, and Organization. He also counted the number of negative references to each.

Then for each student, he formed eight scale scores. The first four were the student’s summated responses to the four Likert scales.

The second four were based on the counts of references from the written responses - in each case it was the number of positive references minus the number of negative references. So a positive count would mean that the student made more positive than negative comments about a teacher with respect to an attribute, such as Enthusiasm.
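The scoring described above can be sketched in a few lines. This is only an illustration: the item-to-scale key and the reverse scoring of item 6 follow the questionnaire listing in this handout, while the function names, the sample ratings, and the sample comment counts are hypothetical.

```python
# Scale key taken from the questionnaire listing; item 6 is reverse scored.
LIKERT_KEY = {
    "enthusiasm": [4, 5, 6],
    "learning": [11, 14, 15],
    "interaction": [1, 2, 3, 8, 9],
    "organization": [7, 10, 12, 13],
}
REVERSED_ITEMS = {6}
MAX_PLUS_MIN = 6  # on a 1-5 format, a reversed rating is 6 - rating


def likert_scale_scores(responses):
    """Sum the 1-5 ratings per scale; `responses` maps item number -> rating."""
    scores = {}
    for scale, items in LIKERT_KEY.items():
        scores[scale] = sum(
            MAX_PLUS_MIN - responses[i] if i in REVERSED_ITEMS else responses[i]
            for i in items
        )
    return scores


def content_scale_scores(positive_counts, negative_counts):
    """Positive minus negative reference counts per dimension (missing = 0)."""
    return {
        scale: positive_counts.get(scale, 0) - negative_counts.get(scale, 0)
        for scale in LIKERT_KEY
    }


# Hypothetical student: rates every item 4 and writes a few comments.
ratings = {item: 4 for item in range(1, 16)}
likert = likert_scale_scores(ratings)
content = content_scale_scores({"enthusiasm": 2}, {"organization": 1})
print(likert)
print(content)
```

Each student contributes one row of eight such scores to the data matrix.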

Ultimately, he had an 8 column by 200+ row data matrix. Four of the columns represented Likert scale scores. Four represented Content Analysis scale scores.

The two major issues here are the following . . .

1) Do the four scales (E, L, I, and O) represent four separate dimensions? If so, there should be low correlations between the scales. This is a discriminant validity issue.

2) Do the two methods measuring the same dimension correlate with each other? If they do, then that’s an indication that they’re both measuring the same dimension. This is a convergent validity issue.
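Before fitting any model, the two questions can be looked at directly in the correlations among the eight scale scores. The sketch below is one hedged way to organize them. It assumes the columns are named likenth, liklrn, likinter, likorg, caenth, calrn, cainter, and caorg; the function names are hypothetical, and the data would come from the actual file.

```python
from itertools import combinations
from math import sqrt

TRAITS = ["enth", "lrn", "inter", "org"]


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def validity_correlations(data):
    """Split the correlations into convergent and discriminant sets.

    convergent: same dimension, different method (e.g. likenth with caenth)
    discriminant: different dimensions, same method (e.g. likenth with likorg)
    """
    convergent = {t: pearson(data["lik" + t], data["ca" + t]) for t in TRAITS}
    discriminant = {}
    for method in ("lik", "ca"):
        for a, b in combinations(TRAITS, 2):
            pair = (method + a, method + b)
            discriminant[pair] = pearson(data[pair[0]], data[pair[1]])
    return convergent, discriminant
```

Convergent validity calls for large values in the first dictionary; discriminant validity calls for small values in the second.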

Applications of SEM - 1 Printed on 5/8/2023

The questions used in the Cassill thesis

Instructions given to participants:

With regard to the instructor, please rate how often your instructor did each of the following using the rating format below:

1 - hardly ever
2 - occasionally
3 - sometimes
4 - frequently
5 - almost always

The names in parentheses are the dimensions represented by the items and were not on the questionnaire given to students.

1. (Interaction). Promoted teacher-student discussions (as opposed to mere responses to questions).
2. (Interaction). Found ways to help students answer their own questions.
3. (Interaction). Encouraged students to express themselves freely and openly.
4. (Enthusiasm). Seemed enthusiastic about the subject matter.
5. (Enthusiasm). Spoke with expressiveness and variety in tone of voice.
6. (Enthusiasm - reverse scored). Made presentations that are dull and dry.
7. (Organization). Made it clear how each topic fits into the course.
8. (Interaction). Explained the reasons for criticisms of students' academic performance.
9. (Interaction). Encouraged student comments even when they turn out to be incorrect or irrelevant.
10. (Organization). Summarized material in a way which aids retention.
11. (Learning). Stimulated students to intellectual effort beyond that required by most courses.
12. (Organization). Clearly stated the objectives of the course.
13. (Organization). Explained the course material clearly, and explanations are to the point.
14. (Learning). Related course material to real life situations.
15. (Learning). Introduced stimulating ideas about the subject.

The items were taken from Cashin, W. E. and Downey, R. G. (1992). Using global student rating items for summative evaluation. Journal of Educational Psychology, 84(4), 563-572.

The responses to the items are variables i1 through i15 in the CassillNM.SAV file.

Summary

There are 8 observed variables

4 are Likert scale scores on the four dimensions
4 are content analysis scores on the same four dimensions


Exploration of Various Models

Model 1: Simple Orthogonal Instructor Dimension Factors Model

This model assumes that there are four dimensions, and the dimensions are orthogonal. Clearly this is not the correct model.

[Path diagram of Model 1 (standardized estimates): four orthogonal factors EN, INT, LRN, and ORG, each with one Likert indicator (likenth, likinter, liklrn, likorg) and one content analysis indicator (caenth, cainter, calrn, caorg).]

Chi-square = 502.44, df = 24, p = .00, RMSEA = .30


What’s good?
1. Each indicator loads on its factor.
2. Content Analysis indicators have high loadings.

What’s bad?
1. Horrible fit.
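For readers who want to see where an RMSEA value comes from, here is a minimal sketch of the usual point-estimate formula, sqrt(max(chi-square - df, 0) / (df * (N - 1))). The sample size is an assumption: the handout says only that the data matrix had 200+ rows, so N = 217 below is a hypothetical value chosen for illustration, and some programs divide by N rather than N - 1.

```python
from math import sqrt


def rmsea(chi_square, df, n):
    """RMSEA point estimate from a model chi-square, its df, and sample size n."""
    return sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Hypothetical N; the chi-square and df are the values reported for Model 1.
HYPOTHETICAL_N = 217
print(round(rmsea(502.44, 24, HYPOTHETICAL_N), 2))
```

Because the chi-square is more than twenty times its degrees of freedom, any sample size in the low 200s gives an RMSEA far above the conventional cutoffs.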

Model 2: Oblique Instructor Dimension Factors

This model assumes four dimensions of teaching as before but assumes that they are correlated.

[Path diagram of Model 2 (standardized estimates): the same four factors EN, INT, LRN, and ORG with the same indicators as in Model 1, now allowed to correlate. Estimated correlations among the four factors, in the order extracted from the diagram: 1.15, 1.02, .89, .83, .75, .86.]


Note that even though these are standardized estimates, some of the correlations between factors are > 1. That’s a red flag that the model is not the appropriate model.

What’s good?
1. Indicators load on factors.
2. CA indicators have large, positive loadings.

What’s bad?
1. Horrible fit.
2. Highly correlated factors.
3. Factor correlations out of range.

Model 3: A Higher-order Instructor Evaluation Factor

In this model, the correlations between the factors have been replaced by a higher order factor. Since there was something screwy about the above oblique factor model, we wouldn’t expect this model to fix the problem, and it hasn’t. This model fits almost as well (or poorly) as the four correlated dimensions.

In fact, replacing the correlations among a set of indicators (observed indicators or lower order factors) with a higher order factor can never fit better, and will usually fit WORSE, than the model which allows the indicators to simply be correlated.

That’s because the loadings on the higher order factor have to reproduce all of those correlations at once. This requirement is more restrictive than the “anything goes” unrestricted correlations between the indicators.

It’s shown here merely to illustrate how the correlations between first order factors can be accounted for by assuming a single higher-order factor.
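The restriction can be written out in a short sketch. The symbols below (G for the higher-order factor, gamma for its standardized loadings, phi for the factor correlations) are illustrative notation, not taken from the program output, and the first-order factors are assumed to be standardized.

```latex
% Oblique model: six free correlations among the four first-order factors
\operatorname{corr}(F_i, F_j) = \phi_{ij},
\qquad i < j,\quad i, j \in \{EN, INT, LRN, ORG\}

% Higher-order model: each first-order factor loads \gamma_i on G
F_i = \gamma_i\, G + \zeta_i
\quad\Longrightarrow\quad
\operatorname{corr}(F_i, F_j) = \gamma_i\, \gamma_j
```

Four loadings have to reproduce six correlations, so the higher-order model has two more degrees of freedom and is a special case of the oblique model. Its chi-square therefore cannot be smaller. With only three first-order factors there would be three loadings for three correlations, and the two models would fit identically.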

[Path diagram of Model 3 (standardized estimates): the four first-order factors EN, INT, LRN, and ORG, each with its Likert and content analysis indicator, load on a single higher-order factor, OverallEval. Standardized loadings of the first-order factors on OverallEval, in the order extracted from the diagram: 1.14, .94, .95, .83.]


What’s good?
1. Indicators load on factors.
2. CA indicators have large, positive loadings.

What’s bad?
1. Horrible fit.
2. One standardized loading out of range.

Model 4A. Two Orthogonal Response Method Factors - Likert and Content Analysis.

In this model, two different rating processes are assumed - a Likert process and an open-ended process. Note that it does not allow corresponding scales to correlate, e.g., LIKENTH with CAENTH. To get the estimates, the variance of CA had to be fixed at 1. Although this model clearly fits better than the four-dimension model, there are some problems. For example, the standardized loading of caenth onto CA is 1.68, a value that doesn’t make sense.

[Path diagram of Model 4A (standardized estimates): a Likert factor indicated by nlikenth, nlikintr, nliklrn, and nlikorg, and an orthogonal CA factor indicated by caenth, cainter, calrn, and caorg.]

Cassillnm data: Chi-square = 162.62, df = 21, p = .00, RMSEA = .18


What’s good?
1. Better fit than above models.
2. Likert scales have nice positive loadings.

What’s bad?
1. Wrong dimensions - we were expecting 4 content dimensions, not 2 method dimensions.
2. Screwy loadings on CA.
3. Fit still not good.

Model 4B. Two Oblique Response Method Factors.

This model fits better, although the chi-square is still significant. But it doesn’t square with our conceptualization of the problem. Specifically, we conceptualize there being four dimensions of teaching performance - enthusiasm, interaction, learning, and organization. This is saying there are only two dimensions and they correspond to the METHOD of response, not the characteristics of the teachers. This is a situation in which theory and data collide.

There may be a problem with the estimation of this model. Why is the standardized loading of CAENTH on CA = 1.67? (The reason, not shown in the standardized model presented below, is that the model represents what is called a Heywood case. One of the residual variance estimates is less than 0.)
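A quick way to see why a standardized loading above 1 signals a Heywood case: for an indicator that loads on a single factor, the standardized residual variance is 1 minus the squared standardized loading. The sketch below just does that arithmetic; the function names are hypothetical, and the single-loading assumption matches the two-method-factor models here.

```python
def residual_variance(standardized_loading):
    """Implied standardized residual variance for an indicator with one loading."""
    return 1.0 - standardized_loading ** 2


def is_heywood(standardized_loading):
    """True when the implied residual variance is negative."""
    return residual_variance(standardized_loading) < 0


print(residual_variance(1.67))  # negative, so the solution is improper
print(is_heywood(0.83))
```

A loading of 1.67 implies that the factor explains about 279% of the indicator's variance, which is impossible, so the leftover "variance" has to be negative.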

[Path diagram of Model 4B (standardized estimates): the Likert and CA method factors of Model 4A, now allowed to correlate. The estimated correlation between the two factors is .15.]


What’s good?
1. Fit is a little better than the above models.
2. Likert indicators are good.

What’s bad?
1. Fit still bad.
2. Two method dimensions rather than content dimensions.
3. Heywood case for CA subset.
4. Correlation between Likert and CA factors is small.

Model 4B revisited, after experimenting with the reference indicator to find one that didn’t yield a Heywood case.

But there is an anomaly here, also - the correlation between the two factors is > 1.

What’s good?
1. Fit is much better than the above models.
2. Likert indicators are good.

What’s bad?
1. Fit still bad.
2. Two method dimensions rather than content dimensions.
3. Correlation between factors is larger than 1, not a possible value.

Model 4C: A Single Overall Evaluation Factor.

Amazingly, this model fits much better than does a model which assumes four dimensions of teaching and it fits just as well as a model which assumes two dimensions of responding. Actually, it suggests a resolution of the conflict of theory and data. It suggests that there is a general tendency to respond positively or negatively that affects ALL the measures - Likert and written responses.

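[Path diagram of Model 4C (standardized estimates): all eight indicators load on a single factor, OverallEval. Standardized loadings, matched to indicators through the squared multiple correlations printed in the diagram: nlikenth .81, nlikintr .72, nliklrn .83, nlikorg .84, caenth .21, cainter .28, calrn .39, caorg .46.]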


What’s good?
1. Fit is not much worse than the best of the previous models.
2. Loadings of indicators on the single factor are all positive.

What’s bad?
1. Fit still not acceptable.
2. Only one method factor - no content factors.

Model 5. A bifactor model - a Method Factor model with 4 content factors.

This model fits extraordinarily well. The chi-square is not significant, something rarely found in applications of SEM. Moreover, it makes sense. It fits both theory and data. The fit to theory is found in the evidence for 4 dimensions of instructor performance. They’re correlated but the correlations are not extraordinarily high. It also fits the data, quite well. And the Method factor makes sense. It reflects individual differences in a general tendency to evaluate everything either positively or negatively. Note that this tendency is most apparent in the Likert scales - they load most highly on it.

[Path diagram of Model 5 (standardized estimates): four correlated content factors EN, INT, LRN, and ORG, each with its Likert and content analysis indicator, plus a general method factor (labeled METH and HALO in the extracted diagram) on which the indicators also load.]

What’s good?
1. Fit is acceptable.
2. Model makes sense.

What’s bad?
1. Nothing I can think of.
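In equation form, the bifactor idea is that every scale score has two systematic sources: its own content dimension and the general method factor. The notation below is an illustrative sketch, not the program's output. It assumes standardized variables and a method factor M that is uncorrelated with the content factors, which is the usual bifactor specification but is an assumption here.

```latex
% Likert and content analysis scores for dimension t \in \{EN, INT, LRN, ORG\}
\mathrm{lik}_t = \lambda^{L}_{t}\,\xi_t + \mu^{L}_{t}\,M + \varepsilon^{L}_{t},
\qquad
\mathrm{ca}_t = \lambda^{C}_{t}\,\xi_t + \mu^{C}_{t}\,M + \varepsilon^{C}_{t}

% Implied correlation between two different Likert scales s \neq t,
% with \phi_{st} the correlation between content factors \xi_s and \xi_t
\operatorname{corr}(\mathrm{lik}_s, \mathrm{lik}_t)
  = \lambda^{L}_{s}\,\lambda^{L}_{t}\,\phi_{st} + \mu^{L}_{s}\,\mu^{L}_{t}
```

The second term is why the earlier four-factor models needed impossible factor correlations: without M, all of the shared method variance had to be forced into the correlations among the content factors.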

Development of bifactor models of Big Five Questionnaires.

The original Faking model. Nhung Nguyen’s dissertation data – Summer 2003

Nhung had gathered data involving administration of the IPIP 50 item Big 5 twice – once with instructions to respond honestly, once with instructions to fake good.

When working with these data on a paper involving faking of situational judgment tests, I hit on the idea that the Big 5 latent variables were common across the two instructional conditions but there was an additional influence on responding in the faking condition. I later found that others had considered this notion. This led to the following set of models.

Model 1: Basic CFA of Parcels formed from Honest and Faked Questionnaire Items.

[Path diagram of Model 1 (standardized estimates): six correlated factors (e, a, c, s, o, and sjtml), each indicated by three honest-condition testlets (HSURGT1-3, HAGREET1-3, HCONST1-3, HEST1-3, HINT1-3, HSJTML1-3) and three faked-condition testlets (FSURGT1-3, FAGREET1-3, FCONST1-3, FEST1-3, FINT1-3, FSJTML1-3).]

Notes:
1. We formed 3 testlets/parcels from each set of 10 items, discarding the item with lowest communality from each dimension.
2. This is simply a CFA of the 36 testlets - 6 for each dimension, 3 honest, 3 fake-good.
3. SJT represents situational judgment test responses.
4. Fit is terrible because, we believe, there is another influence on the Faked items, an influence not included in this model, a Faking influence.

X2(579) = 2241.38, GFI = .558, AGFI = .492, RMSEA = .119
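The degrees of freedom reported for this model can be reproduced by counting. The sketch below is a hedged reconstruction: the parameter breakdown (every loading free, factor variances fixed at 1, no mean structure) is an assumption about how the model was specified, not something stated in the output.

```python
def unique_moments(n_indicators):
    """Number of distinct variances and covariances among the indicators."""
    return n_indicators * (n_indicators + 1) // 2


def model_df(n_indicators, n_free_parameters):
    """Degrees of freedom = distinct moments minus free parameters."""
    return unique_moments(n_indicators) - n_free_parameters


# Assumed parameter count for the six-factor CFA of 36 testlets, with factor
# variances fixed at 1 so that every loading is free:
loadings = 36                     # one loading per testlet
residual_variances = 36           # one per testlet
factor_correlations = 6 * 5 // 2  # 15 correlations among six factors

baseline_df = model_df(36, loadings + residual_variances + factor_correlations)
print(baseline_df)  # 579

# Adding one more factor with a loading from each of the 18 faked testlets
# (and, by assumption, no correlations with the other factors):
with_f_df = model_df(
    36, loadings + residual_variances + factor_correlations + 18
)
print(with_f_df)  # 561
```

Under these assumptions the counts match the 579 df reported here and the 561 df reported for the model with the added F factor.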

Skipped to end of lecture in 2015.

Below is the model above with one additional latent variable, representing individual differences in tendency to agree with each item based on the instructions, rather than the item content, called F, for faking here.

To simplify the presentation, the 3 regression arrows to each set of 3 parcels are represented as a single arrow.

Note that the fit is not spectacular, but that it is much better than the fit of the previous model.


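[Simplified path diagram (standardized estimates): factors O, S, C, A, E, and SJT, each with arrows to its honest testlets (H-O, H-S, H-C, H-A, H-E, H-SJT) and its faked testlets (F-O, F-S, F-C, F-A, F-E, F-SJT), plus a faking factor F with arrows to the faked testlets only.]

X2(561) = 1323.05, GFI = .736, AGFI = .687, RMSEA = .082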

Since we were afraid that the paper might be rejected outright because of the fact that the fit indices were not close enough to the traditional threshold values, we looked around for ways to improve fit. We realized that when participants are asked to respond to the same item twice, even under different instructional sets, their responses to those identical items will both be influenced by specific idiosyncratic aspects of the items. Thus, across participants, responses to identical items will be positively correlated.

These idiosyncratic aspects are part of the “other” influences that the residual terms represent. So we allowed the residuals of identical testlets to be correlated. This led to the following model . . .

[Path diagram (standardized estimates): the faking model with factors e, a, c, s, o, sjtml, and F, with the residuals of corresponding honest and faked testlets allowed to correlate.]

Faking Model Correlated F, F-H errors: RMSEA = .079, CFI = .855, Chi-square = 1224.934, df = 543, p = .000

The fit of this model is closer to being acceptable. Note that F influences only the faked items, not the honest items. We have since discovered that there is an analogous influence on the honest items, one we call M, for method bias.


X2(543) = 1056.64, GFI = .778, AGFI = .728, RMSEA = .068

The fit was better, but still not quite at “rejection-proof” levels.

We considered other possibilities and discovered that there were positive correlations among the F testlets that were not accounted for by the loadings of those testlets onto the single F factor. These seemed to be dimension-specific effects. To account for these we could have introduced a different F latent variable for each dimension. Instead, we chose to allow the residuals between testlets within each dimension to be correlated. This led to the following, final model . . . We felt that the fit of this model was acceptable, and submitted the paper for presentation to SIOP, 2004 based on it.

[Path diagram of the final model (standardized estimates): the faking model with factors e, a, c, s, o, sjtml, and F, correlated residuals between corresponding honest and faked testlets, and correlated residuals among the faked testlets within each dimension.]

Faking Model Correlated F, F-H errors: RMSEA = .053, CFI = .938, Chi-square = 819.279, df = 525, p = .000


Faking Model Conceptualized as a Longitudinal Growth Model - Summer of 2004 - ALL summer.

For each dimension, the single-letter latent variable is the Intercept. The latent variable whose name begins with F is the slope.
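With only two "occasions" (honest, then faked), the intercept-and-slope idea reduces to something very simple. The sketch below uses one common coding for two-occasion growth models: loadings of 1 and 1 on the intercept, 0 and 1 on the slope. That coding is an assumption about how the diagram was set up, and the symbols are illustrative.

```latex
% \eta_H, \eta_F: a dimension's level in the honest and faked conditions
% I: intercept, S: slope
\eta_{H} = 1 \cdot I + 0 \cdot S,
\qquad
\eta_{F} = 1 \cdot I + 1 \cdot S
\quad\Longrightarrow\quad
S = \eta_{F} - \eta_{H}
```

Under this coding the intercept is a person's honest-condition standing on the dimension, and the slope is how far that person moved when told to fake.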

[Path diagram of the faking model specified as a latent growth model: intercept factors e, a, c, s, o, and J; slope factors FE, FA, FC, FS, FO, and FJ; and a general factor F.]

While my wife and son and daughter-in-law visited Europe, I stayed home to try to develop a perspective on the model we had developed. I did this in response to the remark of a reviewer of the SIOP paper who said, “learn all you can about longitudinal models . . .” I spent the summer doing just that and figuring out how to conceptualize the faking model as an LGM. It turns out to have been a bust, I think.


Measuring Method bias in Honest conditions - 2006.

F in the above is a latent variable that represents a systematic bias on part of participants to adjust their scores to ALL items on a questionnaire. Most participants adjusted their scores positively in the faking condition, but some adjusted them negatively. This type of adjustment is what has been studied under the heading of method bias for more than 20 years. So, it could be said that the above model is consistent with the conceptualization that faking is a form of method bias that emerges under instructions to fake.

The existence of a method bias in the faking condition led to the question: Is there an analogous (or different) bias occurring when participants are instructed to respond honestly? The natural extension of the above model is one in which a latent variable like F is added with the Honest testlets as indicators. Here it is . . .

[Path diagram (standardized estimates): the two-condition model with factors e, a, c, s, o, and sjt, a faking factor F indicated by the faked testlets (FETL1-3, FATL1-3, FCTL1-3, FSTL1-3, FOTL1-3, FJTL1-3), and a method factor M indicated by the honest testlets (HETL1-3, HATL1-3, HCTL1-3, HSTL1-3, HOTL1-3, HJTL1-3).]

TwoCondition_M,F: RMSEA = .045, CFI = .956, Chi-square = 714.292, df = 507, p = .000


Note that this model fits the data quite well (if you ignore the chi-square).


Estimating method bias from a single session of data -2006.

At the time we believed that the ability to estimate a method bias latent variable was due to the fact that we were employing two-condition data – with an honest condition and a faking condition.

We then decided to see whether or not the method latent variables (M or F) could be estimated from the data of only one condition. Here are the results for the H condition of Nhung’s study . . .

The date on the output is probably not correct, since we didn’t start looking at M until 2005. I often change models without changing the documentation associated with them. That is the weak point of documentation – it must be kept consistent. Who has the time?

[Path diagram (standardized estimates): honest-condition model with factors e, a, c, s, o, and sjtml, each indicated by its three honest testlets (HETL1-3, HATL1-3, HCTL1-3, HSTL1-3, HOTL1-3, HJTL1-3), plus a method factor M.]

SIOPM4_MeansNotEstimated 3/18/4: RMSEA = .056, CFI = .964, Chi-square = 166.306, df = 102, p = .000

This model is significant in two ways. First, it demonstrates that the “general” factor (called M here) can be estimated from the data of ONE condition. Second, it demonstrates that there is apparently a general factor effect even when participants are told to respond honestly.


The following is the “same” model applied to only the Nguyen faking condition data.

The main points of this and the previous page are that 1) method effects exist in both faked data and in honest data, and 2) Big 5 latent variables AND a method bias latent variable can be measured from the data of a single instructional condition.

[Path diagram (standardized estimates): faking-condition model with factors e, a, c, s, o, and sjtml, each indicated by its three faked testlets (FETL1-3, FATL1-3, FCTL1-3, FSTL1-3, FOTL1-3, FJTL1-3), plus a method factor labeled FA.]

SIOPM4_MeansNotEstimated F TestletsCFA 11/17/5: RMSEA = .032, CFI = .991, Chi-square = 123.439, df = 102, p = .073


Dude, check out the nonsignificant chi-square.

2007 - Measuring faking from a single session.

The fact that the method latent variable could be measured from the data of a single session meant that it might be possible to measure “faking” from the data of a single session, something that has been done only once, by Cellar et al. in 1996. We applied the faking model to both conditions of Nhung’s data and then to only the faking data.

For each application, we computed factor scores of the F latent variable. If the F latent variable in the one-condition data was measuring faking in the same way as the F latent variable in the two-condition data, the factor scores should be highly correlated. Here’s a scatterplot of the faking latent variable factor scores from Nhung’s data and from the data of a follow-up study with Lyndsay Wrensen . . .

[Scatterplots of one-condition against two-condition faking factor scores. Surviving labels: panel b, Wrensen and Biderman (2005); axis, One-condition Faking Ability Factor Scores.]

The above relationships strongly suggest that faking measured in the one-condition data is highly correlated with faking measured from two-condition data. This suggests that the faking model could be applied to the data of a single session and the amount of faking by participants in that session measured.

Parcels vs. Items as Indicators - 2007

The models above were all applied to testlet/parcel data. That is, each indicator was the average of responses to two or three items. We did that originally because of a belief that we would not get acceptable goodness-of-fit unless we applied the models to parcel data. In the period 2005-2007 we began considering the use of individual items as indicators, rather than parcels. One reason for this was that having more indicators gives you more degrees of freedom, and allows you more freedom to estimate latent variables. The downside is that I believe there is a general tendency for models of individual items to have poorer fit indices than those of parcels. For example, below are graphs of fit indices CFI and RMSEA for individual-item indicators and 2-item parcel indicators for the same data. Note that goodness-of-fit is generally better for the two-item parcel data, and in particular that the CFI values move from traditionally unacceptable to traditionally acceptable when parcels are indicators.


[Graphs of CFI and RMSEA by study, for individual-item indicators and for two-item parcel indicators.]

CFI increased by .1 or more in each study when 2-item parcels were used as indicators, rather than individual items.

RMSEA decreased in two studies and stayed the same in two when 2-item parcels were used as indicators, rather than individual items.
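For anyone who has not built parcels before, the mechanics are just averaging. The sketch below is a minimal illustration: the function name is hypothetical, and grouping consecutive items is an assumption, since the handout does not say how items were assigned to parcels.

```python
def make_parcels(item_scores, parcel_size):
    """Average consecutive groups of `parcel_size` items into parcels.

    `item_scores` is one person's responses to the items of one dimension,
    already reverse scored where needed. Leftover items are dropped.
    """
    n_parcels = len(item_scores) // parcel_size
    parcels = []
    for p in range(n_parcels):
        chunk = item_scores[p * parcel_size:(p + 1) * parcel_size]
        parcels.append(sum(chunk) / parcel_size)
    return parcels


# Hypothetical responses to the 10 items of one dimension.
responses = [1, 2, 3, 4, 5, 4, 3, 2, 1, 5]
print(make_parcels(responses, 2))      # five 2-item parcels
print(make_parcels(responses[:9], 3))  # three 3-item parcels from 9 items
```

Ten items give ten indicators per dimension as individual items, five as 2-item parcels, and three as 3-item parcels once one item is dropped, which is the trade-off between degrees of freedom and fit described above.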

Positively-worded and negatively-worded method biases –2008.

Nhung Nguyen had mentioned in emails regarding method bias that we should look at method bias associated with item wording, specifically associated with positively worded items and with negatively-worded items.

We decided to look at bias associated with positively-worded and with negatively-worded items for the several studies we’ve conducted here. Here’s what the path diagram of a Mp, Mn model with individual items as indicators looks like . . . Because of the complexity of the path diagram, all of the applications of the MpMn model have been done using Mplus, which is programmed with commands, rather than figures.


[Path diagram of the MpMn model: Big Five factors O, S, C, A, and E, each indicated by its 10 items (O1-O10, S1-S10, C1-C10, A1-A10, E1-E10), plus the method factors Mp and Mn.]

Here’s a summary of Mplus output from application of the MpMn model to four datasets.

In each application, the 10 individual IPIP items for each Big 5 dimension were the indicators.

                                           Dataset
Model                       df     Nguyen    Wrensen    Damron     Sebren

Chi-square
CFA with No method        1165    2252.12    2315.73    2839.79    2552.45
CFA with M                1115    2031.74    2048.11    2449.20    2253.26
CFA with Mp,Mn            1114    1972.32    2025.24    2282.74    2184.08

M vs. No M                  50     220.38*    267.62*    390.59*    299.19*
MpMn vs. M                   1      59.42*     22.87*    166.46*     69.18*

MpMn CFI                             .785       .712       .833       .708
MpMn RMSEA                           .062       .070       .054       .072
Correlation of Mp with Mn            .766       .844       .754       .752

* p < .001

The bottom line is that there is considerable evidence that the responses of participants to Big Five items are influenced by

1. The amount of the particular Big 5 characteristic that each participant possesses

2. A tendency to adjust responses to all positively worded items. The adjustment is positive for some people, negligible for some, negative for others.

3. A tendency to adjust responses to all negatively worded items. The adjustment is positive for some people, negligible for some, negative for others.

The item-wording adjustments measured here are independent of the Big Five dimensions.

The item-wording adjustments are positively correlated with each other, although not so positively correlated that they can be treated as a single latent variable. This is shown by the significant MpMn vs. M chi-squares.
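The starred entries in the table are chi-square difference tests between nested models. The sketch below shows the arithmetic using only the Python standard library. The function names are hypothetical, and the upper-tail probability is computed from the standard closed-form series for integer degrees of freedom rather than from a statistics package.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return exp(-x / 2) * total
    # Odd df: normal tail plus a finite series in odd powers of sqrt(x)
    term, total = sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return erfc(sqrt(x / 2)) + sqrt(2 / pi) * exp(-x / 2) * total


def chi_square_difference(chi2_restricted, df_restricted, chi2_general, df_general):
    """Difference test for nested models: returns (delta chi2, delta df, p)."""
    d_chi2 = chi2_restricted - chi2_general
    d_df = df_restricted - df_general
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)


# Nguyen data, values from the table: CFA with M vs. CFA with Mp,Mn.
d_chi2, d_df, p = chi_square_difference(2031.74, 1115, 1972.32, 1114)
print(round(d_chi2, 2), d_df, p < 0.001)
```

The single degree of freedom is the Mp-Mn correlation: the one-factor M model is what the MpMn model becomes when that correlation is fixed at 1.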


2009 – Three types of bias factor – General bias, negative bias, and positive bias

As we explored the idea that there are different bias factors associated with different item wordings, I questioned the idea that there were only two such factors. It seemed more reasonable that there are THREE bias factors - negative, positive, and a general bias factor. My idea was buttressed by a recent article by Marsh et al. (2010) in which three factors - a negative, positive, and general factor - were found to account for data of the Rosenberg Self Esteem scale. We explored this possibility by comparing several models for five different datasets. They’re summarized in the following figure from a paper recently submitted to Journal of Research in Personality. Model 6 is the model I believe best represents Big Five questionnaire data.

Figure 1. Models compared. Each rectangle represents the items indicating a Big Five dimension. The left half of each rectangle represents positively-worded items and the right half negatively-worded items. A single arrow drawn from a factor to a rectangle represents all the loadings from that factor to the indicators represented by the rectangle. Residual latent variables have been omitted for clarity.


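[Figure 1 panels, as read from the extracted labels:
Model 1: Big Five factors (O, S, C, A, E) only.
Model 2: Big Five factors plus a general factor M.
Model 3: Big Five factors plus Mp and Mn.
Model 4: Big Five factors plus M and Mp.
Model 5: Big Five factors plus M and Mn.
Model 6: Big Five factors plus M, Mp, and Mn.]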

The results of comparisons . . .

Table 2. Chi-square goodness-of-fit measures and chi-square difference tests.
----------------------------------------------------------------------------------------------
                                          Analysis
                      -------------------------------------------------------
                          1        2        3        4        5        6      df
Questionnaire           IPIP     IPIP     IPIP     IPIP     IPIP     NEO      IPIP / NEO
----------------------------------------------------------------------------------------------
Chi-square
  Model 1              2174.4   2552.5   3523.0   3734.4   2568.6   3219.6   1165 / 1700
  Model 2              1901.7   2253.3   3063.7   2431.2   2241.5   2893.1   1115 / 1640
  Model 3              1853.7   2186.2   2715.7   2275.0   2230.9   2838.2   1114 / 1639
  Model 4              1786.0   2136.2   2638.9   2112.9   2085.0   2744.8   1089 / 1611
  Model 5              1758.2   2101.9   2629.6   2152.7   2044.1   2732.7   1091 / 1609
  Model 6              1642.9   1980.7   2162.2   1912.3   1962.1   2589.6   1065 / 1580
----------------------------------------------------------------------------------------------
Δχ2 Model 2 vs 1        272.7    299.2    459.3    672.9    327.1    326.5     50 / 60
Δχ2 Model 3 vs 2         54.0     67.1    348.0    156.2     31.1     54.9      1 / 1
rMpMn                     .77      .76      .33      .50      .86      .89
Δχ2 Model 4 vs 2        115.7    117.1    424.8    318.3    156.5    148.3     26 / 29
Δχ2 Model 5 vs 2        143.5    151.4    434.1    278.5    197.4    160.4     24 / 31
Δχ2 Model 6 vs 2        258.8    272.6    901.5    518.9    279.4    303.5     50 / 60
Δχ2 Model 6 vs 4        143.1    155.5    476.7    200.6    122.9    155.2     24 / 31
Δχ2 Model 6 vs 5        115.3    121.2    467.4    240.4     82.0    143.1     26 / 29
----------------------------------------------------------------------------------------------
Note. For Analysis 3, residual variance of one item set to .001. For Analysis 4, residual variance of one item set to .001. For Analysis 5, variance of Mp set to .001.

In all data sets the most general model, Model 6, fit significantly better than any of the other models.
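The chi-square difference tests behind this statement can be sketched as follows for Analysis 1, using the Model 2, 4, 5, and 6 values from Table 2. This is a minimal illustration, not the procedure used for the paper: the function names are made up for the example, and the p-value comes from the Wilson-Hilferty normal approximation to the chi-square distribution, so it is approximate.

```python
import math

# Chi-square values and df for Analysis 1 (IPIP), copied from Table 2.
chi_square = {2: 1901.7, 4: 1786.0, 5: 1758.2, 6: 1642.9}
df = {2: 1115, 4: 1089, 5: 1091, 6: 1065}


def chi_square_sf_approx(x, k):
    """Approximate upper-tail p for chi-square x on k df (Wilson-Hilferty)."""
    z = ((x / k) ** (1 / 3) - (1 - 2 / (9 * k))) / math.sqrt(2 / (9 * k))
    return 0.5 * math.erfc(z / math.sqrt(2))


def difference_test(restricted, general):
    """Delta chi-square, delta df, and approximate p for two nested models."""
    delta_chi = chi_square[restricted] - chi_square[general]
    delta_df = df[restricted] - df[general]
    return round(delta_chi, 1), delta_df, chi_square_sf_approx(delta_chi, delta_df)


# Compare Model 6 (the general model) with each more restricted model.
results = {(6, r): difference_test(r, 6) for r in (2, 4, 5)}
for (general, restricted), (d_chi, d_df, p) in results.items():
    print(f"Model {general} vs {restricted}: delta chi2 = {d_chi}, df = {d_df}, p ~ {p:.2g}")
```

The differences and df reproduce the Analysis 1 column of the table (258.8 on 50 df, 143.1 on 24 df, 115.3 on 26 df), and each approximate p-value is far below .001.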

This suggests that the most appropriate model for Big Five data is one that includes EIGHT factors – 5 Big Five Trait factors and THREE method bias factors – one influencing only negatively worded items, a second influencing only positively worded items, and a third influencing all items.


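Here is a more detailed figure representing Model 6.

[Figure: path diagram of Model 6, showing the three method factors (M, Mn, Mp) and the five trait factors (O, S, C, A, E).]

But wait, there’s more . . .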






2010 – Method factors as measures of well-being??

A few years ago a person with whom I was acquainted was going through some very rough times. One of the primary problems was depression. That person had taken a Big Five questionnaire. When the Big Five questionnaire was scored for just the five traits, there was nothing terribly unusual about the profile of scores. Even the Emotional Stability score, while below average, was not as far below average as one would have expected based on the severity of the depression at that time.

However, when the Big Five was scored for SIX factors – the Big Five plus M – a striking profile emerged. The person’s scores on the Big Five traits, including Emotional Stability, were nearly normal, but the person’s M score was VERY low. At the time, we were still considering M to be a measure of faking and I didn’t do anything immediately with the information. It was one of those isolated pieces of information that you store away for future reference.

Last year, I gathered data on the Big Five, and remembering that person’s profile, I included a measure of depression and also a measure of self-esteem in the questionnaire packet that was given to students. In the analysis of the data, I correlated M scores with both depression and self-esteem scores.

Here are the results, from a paper presented at SIOP in 2011 . . .

Table 1. Means, standard deviations, correlations, and reliability coefficients for study variables.
-------------------------------------------------------------------------------------
      Mean    SD      E       A       C       S       O       M     CCD     RSE
-------------------------------------------------------------------------------------
E     4.75  1.04   .885
A     5.30  0.74   .317c   .789
C     4.57  0.86   .007    .164a   .823
S     4.24  0.99   .237b   .176a  -.021    .842
O     4.85  0.82   .244c   .335c   .270c   .156a   .812
M     0.00  0.38   .714c   .616c   .231b   .592c   .292c   .912
CCD   1.84  0.83  -.202b  -.309c  -.330c  -.284c  -.192b  -.412c   .920
RSE   5.65  0.87   .285c   .188a   .381c   .242c   .359c   .401c  -.674c   .847
-------------------------------------------------------------------------------------
Note. Reliability coefficients are on the diagonal.
a p < .05   b p < .01   c p < .001

M correlates negatively with Depression (CCD; r = -.412) and positively with Self Esteem (RSE; r = .401). These correlations are larger in magnitude than those of Emotional Stability (S) with CCD (-.284) and RSE (.242). In fact, we argued in the paper that the correlations of S with CCD and RSE were spurious, caused by the influence of M on the Big Five, CCD, and RSE scores alike.


When the effect of M on S was removed, the correlation of “purified” S with CCD was .01 and with RSE was .00.
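A rough check on the direction of this result can be made from the scale-score correlations in Table 1 by computing semipartial correlations, that is, the correlation of CCD or RSE with the part of S that is unrelated to M. This is only a sketch: it works from observed scale scores rather than the latent-variable estimates used in the paper, so it need not reproduce .01 and .00 exactly, and the function name is illustrative.

```python
import math

# Scale-score correlations copied from Table 1.
r_s_m = 0.592      # Emotional Stability (S) with M
r_m_ccd = -0.412   # M with Depression (CCD)
r_m_rse = 0.401    # M with Self Esteem (RSE)
r_s_ccd = -0.284   # S with CCD
r_s_rse = 0.242    # S with RSE


def semipartial(r_xy, r_xz, r_zy):
    """Correlation of y with the part of x that is linearly unrelated to z."""
    return (r_xy - r_xz * r_zy) / math.sqrt(1 - r_xz ** 2)


# Remove M from S, then correlate what is left of S with each criterion.
purified_s_ccd = semipartial(r_s_ccd, r_s_m, r_m_ccd)
purified_s_rse = semipartial(r_s_rse, r_s_m, r_m_rse)

print(f"S (M removed) with CCD: {purified_s_ccd:.2f}")
print(f"S (M removed) with RSE: {purified_s_rse:.2f}")
```

With these inputs both values come out close to zero (about -.05 for CCD and .01 for RSE), in line with the argument that the raw S correlations largely reflect M.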


2011 - The Big Two and the General Factor of Personality (GFP)

Several theorists believe that there are higher order factors that influence the Big Five.

The Big Two theorists believe that there are two 2nd order factors – Stability and Plasticity. Stability is believed to influence Agreeableness, Conscientiousness, and Emotional Stability. Plasticity is believed to influence Extraversion and Openness.

Other theorists believe that there is a single higher order factor – called the general factor of personality or GFP. It has been conceptualized as a 3rd order factor, influencing Stability and Plasticity.


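[Figure: hierarchical GFP model. GFP influences Plasticity (Pl) and Stability (St); Pl influences O and E; St influences S, C, and A.]
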

2011 – M and the GFP

Contrast the GFP model with the models we’ve been considering. Clearly they 1) are different and 2) can coexist.

Our data have indicated that when M is estimated, the correlations between the Big Five factors are reduced to essentially zero. Because the indicators of a factor must be correlated (otherwise there is no reason to posit the factor), this result provides little support for the GFP as presented below.
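The logic of this argument can be illustrated with a small simulation. It is a sketch with made-up values, not an analysis of our data: two traits are generated to be independent, the same method factor M is added to both scale scores, and the scales are then correlated before and after M is removed. Here M is known exactly, whereas in practice it has to be estimated from the items.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residualize(x, z):
    """Residuals of x after removing the linear effect of z."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    slope = (sum((a - mx) * (b - mz) for a, b in zip(x, z))
             / sum((b - mz) ** 2 for b in z))
    return [a - mx - slope * (b - mz) for a, b in zip(x, z)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
m = [rng.gauss(0, 1) for _ in range(n)]        # shared method factor (hypothetical)
trait_1 = [rng.gauss(0, 1) for _ in range(n)]  # two independent traits
trait_2 = [rng.gauss(0, 1) for _ in range(n)]
scale_1 = [t + b for t, b in zip(trait_1, m)]  # observed score = trait + M
scale_2 = [t + b for t, b in zip(trait_2, m)]

r_observed = pearson(scale_1, scale_2)
r_purified = pearson(residualize(scale_1, m), residualize(scale_2, m))

print(f"observed r = {r_observed:.2f}, r with M removed = {r_purified:.2f}")
```

The observed scales correlate at roughly .5 only because they share M; once M is removed the correlation is near zero, leaving nothing for a higher order factor to explain.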

Some studies regarding the GFP have used the first unrotated factor in an EFA of items. They found that the GFP estimated in that way correlated positively with self presentation. But I would argue that what they’ve done is get crude estimates of M and have replicated our finding of the relationship of M to self presentation.


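[Figure: the hierarchical GFP model (GFP, Pl, St, and the Big Five factors O, E, S, C, A) drawn together with the method factor M.]
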

Van der Linden, D., Scholte, R. H. J., Cillessen, A. H. N., te Nijenhuis, J., & Segers, E. (2010). Classroom ratings of likeability and popularity are related to the Big Five and the general factor of personality. Journal of Research in Personality, 44, 669-672.
