Date post: | 22-Dec-2015 |
FACTOR ANALYSIS
LECTURE 11
EPSY 625
PURPOSES
SUPPORT VALIDITY OF TEST SCALE WITH RESPECT TO UNDERLYING TRAITS (FACTORS)
• EFA- EXPLORE/UNDERSTAND UNDERLYING FACTORS FOR A TEST
• CFA- CONFIRM THEORETICAL STRUCTURE IN A TEST
HISTORICAL DEVELOPMENT
PEARSON (1901): eigenvalue/eigenvector problem (dimensional reduction), the “method of principal axes”
SPEARMAN (1904) “General Intelligence, Objectively Measured and Determined”
Others: Burt, Thomson, Garnett, Holzinger, Harman, Thurstone
FACTOR MODELS
[Table: factor models classified by whether SUBJECTS and VARIABLES are each treated as FIXED or as a SAMPLE. Models listed: principal components, common factors, image analysis, alpha factor analysis, canonical factor analysis]
EXPLORATORY FACTOR ANALYSIS
USE PRINCIPAL AXIS METHOD:
• ASSUMES THERE ARE 3 VARIANCE COMPONENTS IN EACH ITEM:
  • COMMUNALITY ($h^2$)
  • UNIQUENESS:
    • SPECIFICITY ($s^2$)
    • ERROR ($e^2$)
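SINGLE FACTOR
REQUIRES AT LEAST 3 ITEMS OR MEASUREMENTS TO UNIQUELY DETERMINE
[Path diagram: FACTOR with arrows to ITEM1, ITEM2, ITEM3; each item also has an error term e]
• PATHS FROM FACTOR TO ITEM1, ITEM2, ITEM3: .7, .8, .6. EACH IS CALLED A FACTOR LOADING = CORRELATION BETWEEN ITEM AND FACTOR
• ERROR PATHS FOR ITEM1, ITEM2, ITEM3: .714, .6, .8
• SPECIFICITY: ASSUMED = 0 FOR PARALLEL ITEMS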
ALPHA = SPEARMAN-BROWN STEPPED-UP AVERAGE INTER-ITEM CORRELATION:
$\bar{r} = (.56 + .42 + .48)/3 = .49$
ALPHA $= 3(.49)/[1 + 2(.49)] = .74$
ERROR PATH FOR ITEM1: $.714 = \sqrt{1 - .7^2}$ (SPECIFICITY ASSUMED = 0 FOR PARALLEL ITEMS)
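The alpha computation above can be sketched in a few lines. This is a minimal illustration, assuming a single-factor model in which each inter-item correlation is the product of the two loadings; the function name `stepped_up_alpha` is made up for this example.

```python
from itertools import combinations

def stepped_up_alpha(loadings):
    """Spearman-Brown stepped-up average inter-item correlation.

    Assumes a single-factor model, so the correlation between two items
    is the product of their loadings (e.g. .7 * .8 = .56).
    """
    k = len(loadings)
    correlations = [a * b for a, b in combinations(loadings, 2)]
    r_bar = sum(correlations) / len(correlations)  # average inter-item r
    return k * r_bar / (1 + (k - 1) * r_bar)       # Spearman-Brown step-up

alpha = stepped_up_alpha([0.7, 0.8, 0.6])
print(round(alpha, 2))
```

Because the average correlation is not rounded to .49 before stepping up, the unrounded result differs from the slide's intermediate values in the third decimal but rounds to the same .74.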
TWO FACTORS
NEED AT LEAST 2 ITEMS OR MEASUREMENTS PER FACTOR, ASSUMING FACTORS ARE CORRELATED
[Path diagram: FACTOR 1 with arrows to ITEM 1 (.7) and ITEM 2 (.8); FACTOR 2 with arrows to ITEM 3 (.6) and ITEM 4 (.7); each item has an error term e; CORRELATION BETWEEN FACTORS = .5]
CORRELATION BETWEEN ANY TWO ITEMS = PRODUCT OF THE COEFFICIENTS ALONG THE PATH CONNECTING THEM;
EX. $R(\text{ITEM1}, \text{ITEM4}) = .7 \times .5 \times .7 = .245$
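The path-tracing rule can be checked for every pair of items at once by multiplying loadings through the factor correlation matrix. The sketch below assumes the loadings read off the diagram (.7, .8 on Factor 1; .6, .7 on Factor 2) and a factor correlation of .5; the helper name `implied_correlations` is illustrative only.

```python
def implied_correlations(loadings, phi):
    """Model-implied matrix Lambda * Phi * Lambda'.

    Off-diagonal entries are item correlations; diagonal entries are the
    communalities (the unique variance is not added back here).
    """
    n_items, n_factors = len(loadings), len(phi)
    return [[sum(loadings[i][p] * phi[p][q] * loadings[j][q]
                 for p in range(n_factors) for q in range(n_factors))
             for j in range(n_items)]
            for i in range(n_items)]

# Rows are items 1-4, columns are Factor 1 and Factor 2.
loadings = [[0.7, 0.0], [0.8, 0.0], [0.0, 0.6], [0.0, 0.7]]
phi = [[1.0, 0.5], [0.5, 1.0]]  # correlation between the two factors

r = implied_correlations(loadings, phi)
print(round(r[0][3], 3))  # item 1 with item 4: .7 x .5 x .7
```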
SIMPLE STRUCTURE
TRY TO CREATE SCALE IN WHICH EACH ITEM CORRELATES WITH ONLY ONE FACTOR:
          FACTOR
ITEM      1   2   3
ITEM 1    1   0   0
ITEM 2    1   0   0
ITEM 3    0   1   0
ETC
CRITERIA FOR SIMPLE STRUCTURE
Structural equation modeling provides chi square test of fit
Compares observed covariance (correlation) matrix with predicted/fitted matrix
Alternatively, look at RMSEA (Root mean square error of approximation) of deviations from fitted matrix
MATHEMATICAL MODEL
$Z$ = persons by variables matrix ($p \times k$) of standardized variables (mean = 0, SD = 1)
$Z'Z = NR$ (covariance matrix, $k \times k$)
$Z_i = a_i F_i + e_i$
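MATHEMATICAL MODEL
$Z = AF + U = C + U$, where $C = AF$ is the common part and $U$ the unique part (here $Z$ is arranged variables by persons)
$ZZ'/N = R = A(FF'/N)A' + U^2$
$S = ZF'/N$ (structure matrix: correlations between $Z$ and $F$)
$\;\; = AFF'/N = A\Phi$, where $\Phi = FF'/N$ (correlations among factors) and $A$ = Pattern matrix
MATHEMATICAL MODEL
$S = A\Phi$, so $A = S\Phi^{-1}$ (If factors uncorrelated, $\Phi = I$ and $A = S$: Pattern matrix = Structure matrix)
$R = ZZ'/N = CC'/N + U^2$
MATHEMATICAL MODEL
If we take the covariance matrix of $F$ to be diagonal, and the metric of variances of $F_i$ to be 1.0 (so $\Phi = I$),
$R - U^2 = AA' = SA' = AS'$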
MATHEMATICAL MODEL
Now let $Z_i = a_i F_i + s_i + e_i$
Let $\acute{R} = R - D^2$, where $D^2$ is a diagonal matrix of specificities and error: $s_i^2 + e_i^2$
Then $\acute{R} = AFF'A'/N = A\Phi A' = SA' = AS'$; with $\Phi = I$, $\acute{R} = AA'$
MATHEMATICAL MODEL
How do we estimate $s_i^2$?
Instead, estimate $[R - U^2]_{ii} = [1 - s_i^2 - e_i^2]_{ii}$
Consider for each $z_i$ that it is predictable from the rest:
$z_i = b_1 z_1 + b_2 z_2 + \dots + b_{i-1} z_{i-1} + \dots$
Then $R_i^2$ = variance common to all other variables (squared multiple correlation or SMC), used to estimate $h_i^2$ = communality for item $i$
Due to Dwyer (1939)
MATHEMATICAL MODEL
SMC is estimable from the observed data, so that $\acute{R} = R - [1 - SMC_i] = R - I + [SMC_i]$
where $[SMC_i]$ = diagonal matrix with SMCs for each variable on the diagonal and zeros off-diagonal
Theorem states “SMCs guarantee that the number of factors # eigenvalues>1.0
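MATHEMATICAL MODEL
Diagonal matrix $[SMC_i]$ used in $\acute{R}$:
$$[SMC_i] = \begin{pmatrix} R^2_{1.234\ldots} & 0 & 0 & 0 & \cdots \\ 0 & R^2_{2.134\ldots} & 0 & 0 & \cdots \\ 0 & 0 & R^2_{3.124\ldots} & 0 & \cdots \\ 0 & 0 & 0 & R^2_{4.123\ldots} & \cdots \end{pmatrix}$$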
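A minimal sketch of the reduced correlation matrix, assuming NumPy is available. It uses the standard identity $SMC_i = 1 - 1/[R^{-1}]_{ii}$ rather than running a separate regression per variable; the function name and the example matrix (correlations implied by loadings .7, .8, .6) are illustrative assumptions.

```python
import numpy as np

def reduced_correlation_matrix(R):
    """Replace the unit diagonal of R with squared multiple correlations."""
    R = np.asarray(R, dtype=float)
    # SMC of each variable with all the others, via the inverse of R.
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    R_reduced = R.copy()
    np.fill_diagonal(R_reduced, smc)  # off-diagonal correlations are unchanged
    return R_reduced, smc

R = np.array([[1.00, 0.56, 0.42],
              [0.56, 1.00, 0.48],
              [0.42, 0.48, 1.00]])
R_reduced, smc = reduced_correlation_matrix(R)
print(np.round(smc, 3))
```

The SMCs here come out below the communalities of the generating loadings (.49, .64, .36), which is the usual behavior of SMCs as conservative starting estimates.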
MATHEMATICAL MODEL
SOLUTIONS: PRINCIPAL COMPONENTS ($R = \acute{R}$)
$Rq = \lambda q$
$RQ = Q\Lambda$, $\Lambda$ = diagonal $[\lambda_i]$
$Q^{-1}RQ = \Lambda$; $QQ' = I \Rightarrow Q^{-1} = Q'$
$Q'RQ = \Lambda$ (Spectral Theorem)
MATHEMATICAL MODEL
SOLUTIONS: PRINCIPAL AXIS
$(\acute{R} - \lambda I)q = 0$
That is, solve for first eigenvalue $|\acute{R} - \lambda I| = 0$, solved by $R^m q = \lambda^m q$
begin with m=2: $R^2 q = \lambda^2 q$, then put solution in $R(Rq_1) = \lambda^2 q_1$, iterate for m=4
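The idea of applying powers of the matrix to a trial vector can be sketched as plain power iteration. This is a simplified variant (repeated multiplication with renormalization rather than explicit matrix squaring), and the function name and the rank-one example matrix built from loadings .7, .8, .6 are assumptions made for illustration.

```python
def first_eigenpair(R, iterations=200):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix R."""
    n = len(R)
    q = [1.0] * n  # trial vector
    for _ in range(iterations):
        w = [sum(R[i][j] * q[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        q = [x / norm for x in w]  # renormalize so the vector stays length 1
    Rq = [sum(R[i][j] * q[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(q[i] * Rq[i] for i in range(n))  # Rayleigh quotient
    return eigenvalue, q

# Reduced matrix AA' for loadings (.7, .8, .6): communalities on the diagonal.
R_reduced = [[0.49, 0.56, 0.42],
             [0.56, 0.64, 0.48],
             [0.42, 0.48, 0.36]]
eigenvalue, q = first_eigenpair(R_reduced)
first_factor_loadings = [eigenvalue ** 0.5 * x for x in q]
print(round(eigenvalue, 2), [round(a, 2) for a in first_factor_loadings])
```

Scaling the unit eigenvector by the square root of its eigenvalue returns the loadings of the first principal-axis factor.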
MATHEMATICAL MODEL
Now compute residual correlation matrix: $R_1^2 = R^2 - \acute{R}$, iterate
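EIGENVALUES
$\lambda_i$ = variance of ith factor
$\lambda_i / \sum_i \lambda_i$ = proportion of total variance accounted for by the ith factor
$\lambda_i < 1$: chance factor
Scree plot (value x factor: eigenvalues ordered from greatest to lowest)
[Figure: SCREE PLOT. Vertical axis: eigenvalue, from 0 to K with a reference line at 1.0; horizontal axis: factor number 1, 2, 3, ..., K]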
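As a small sketch of how the eigenvalues are read, the snippet below computes the proportion of variance per factor and applies the eigenvalue-greater-than-1 screen. The eigenvalues are hypothetical numbers chosen for illustration, not results from any data set in these notes.

```python
def variance_proportions(eigenvalues):
    """Proportion of total variance accounted for by each factor."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]

def factors_above_cutoff(eigenvalues, cutoff=1.0):
    """Indices (1-based) of factors whose eigenvalue exceeds the cutoff."""
    return [i + 1 for i, value in enumerate(eigenvalues) if value > cutoff]

# Hypothetical eigenvalues, already ordered from greatest to lowest.
eigenvalues = [2.5, 1.2, 0.8, 0.5]
proportions = variance_proportions(eigenvalues)
kept = factors_above_cutoff(eigenvalues)
print(proportions, kept)
```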
ROTATION
MEANING CRITERION: SIMPLE STRUCTURE, POSITIVE MANIFOLD
B=AT A=INITIAL FACTOR MATRIX
T=TRIANGULAR MATRIX
B=FINAL FACTOR MATRIX
TT’=
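VARIMAX ROTATION (uncorrelated Factors)
ORTHOGONAL (RIGID) ROTATION
Maximize $V = \sum_p \left[ n \sum_j (b_{jp}/h_j)^4 - \left( \sum_j b_{jp}^2/h_j^2 \right)^2 \right]$
Geometric problem: $(X, Y) = (x, y) \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$
$u_j = x_j^2 - y_j^2$
$v_j = 2 x_j y_j$
$A = \sum_j u_j$
$B = \sum_j v_j$
$C = \sum_j (u_j^2 - v_j^2)$
$D = 2 \sum_j u_j v_j$
solve $\tan 4\theta = [D - 2AB/n] / [C - (A^2 - B^2)/n]$, with $-45^\circ \le \theta \le 45^\circ$ and $n$ = number of variables
[Figure: Orthogonal (perpendicular) Rotation of Axes. Axes: Unrotated Factor 1 loading values, Unrotated Factor 2 loading values]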
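The two-factor varimax angle can be sketched directly from the quantities A, B, C, D. This is a minimal illustration under stated assumptions: the loadings are hypothetical, they are treated as already divided by $h_j$ if normalization is wanted, the divisor is taken to be the number of variables, and `atan2` is used so that the angle picked is the maximizing one in the range from -45° to 45°.

```python
import math

def varimax_angle(x, y):
    """Rotation angle (radians) maximizing the varimax criterion for two factors."""
    n = len(x)
    u = [a * a - b * b for a, b in zip(x, y)]
    v = [2 * a * b for a, b in zip(x, y)]
    A, B = sum(u), sum(v)
    C = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
    D = 2 * sum(ui * vi for ui, vi in zip(u, v))
    numerator = D - 2 * A * B / n
    denominator = C - (A * A - B * B) / n
    return math.atan2(numerator, denominator) / 4  # 4*theta lies in (-180, 180]

def rotate(x, y, theta):
    """(X, Y) = (x, y) [[cos, -sin], [sin, cos]]."""
    c, s = math.cos(theta), math.sin(theta)
    X = [a * c + b * s for a, b in zip(x, y)]
    Y = [-a * s + b * c for a, b in zip(x, y)]
    return X, Y

def varimax_criterion(x, y):
    """V summed over the two columns of loadings."""
    n = len(x)
    return sum(n * sum(b ** 4 for b in col) - sum(b * b for b in col) ** 2
               for col in (x, y))

# Hypothetical unrotated loadings on factors 1 and 2.
x = [0.7, 0.6, 0.5, 0.4]
y = [0.3, 0.4, -0.5, -0.6]
theta = varimax_angle(x, y)
X, Y = rotate(x, y, theta)
print(round(math.degrees(theta), 1), round(varimax_criterion(X, Y), 4))
```

With more than two factors, the same step is usually applied to each pair of factors in turn until the criterion stops increasing.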
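OBLIQUE SOLUTION (correlated Factors)
MINIMIZE S (OBLIMIN):
$S = \sum_{p<g} \left[ n \sum_j (v_{jp}^2/h_j^2)(v_{jg}^2/h_j^2) - \left( \sum_j v_{jp}^2/h_j^2 \right) \left( \sum_j v_{jg}^2/h_j^2 \right) \right]$
PROMAX:
• Start with VARIMAX, $B = AT$, transform with $v_{jp} = (b_{jp}^4)/b_{jp}$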
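FACTOR CORRELATION
$\Phi = TT'$
$T_{ij} = \begin{pmatrix} \cos\theta_{ij} & -\sin\theta_{ij} \\ \sin\theta_{ij} & \cos\theta_{ij} \end{pmatrix}$
$r_{ij} = [\cos\theta_{ij}(-\sin\theta_{ij})] + [\sin\theta_{ij}\cos\theta_{ij}] = T_{11}T_{12} + T_{21}T_{22}$
FACTOR CORRELATION
$S = P\Phi$ (Structure matrix = Pattern matrix x factor correlation matrix)
$P = A(T')^{-1}$
$A = PT'$
[Figure: Oblique Rotation of Axes, with the angle $\theta_{ij}$ marked]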
ALPHA FACTOR ANALYSIS
Estimates population $h_i^2$ for each variable
Little different from common factors
Canonical Factor Analysis
Uses canonical analysis to maximize R between factors and variables, iterative Maximum Likelihood analysis
Image Analysis
$h_i^2 = R^2_{i.1,2,\ldots,K}$
$p_j = \sum_k w_{jk} z_k$ (standard regression)
$e_j = z_j - p_j$, called the anti-image
$\mathrm{Var}(e_j)$ is greater than the anti-image variance for the regression of $z_j$ on the factors $F_1, F_2, \ldots, F_K$
FACTOR CONGRUENCE
Alternative to Confirmatory Analysis for two groups hypothesized to have the same factor structure:
$S_{pq} = \sum_j a_{jp} b_{jq} \Big/ \sqrt{\sum_j a_{jp}^2 \sum_j b_{jq}^2}$
This is basically the correlation between factor loadings on the comparable factors for two groups
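A short sketch of the congruence coefficient for one pair of comparable factors. The two loading vectors are hypothetical values for the same items in two groups, and the function name is illustrative.

```python
import math

def congruence(a, b):
    """Congruence between loadings a (group 1) and b (group 2) on one factor."""
    numerator = sum(x * y for x, y in zip(a, b))
    denominator = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return numerator / denominator

# Hypothetical loadings of the same four items on the comparable factor.
group1 = [0.9, 0.8, 0.1, 0.2]
group2 = [0.8, 0.7, 0.2, 0.1]
s = congruence(group1, group2)
print(round(s, 3))
```

Unlike an ordinary correlation, the loadings are not centered at their means, so two factors with proportional loadings give a coefficient of exactly 1.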
Example of 2 factor structure
Achievement (reading, math) and IQ (verbal, nonverbal)
quasi-multitrait multimethod analysis:
• reading is verbal
• math is “nonverbal”
[Path diagram: factors Ach and Apt, correlated .7; Ach with arrows to Reading (.9) and Arithmetic (.8); Apt with arrows to Verbal (.9) and Nonverbal (.8); error paths .43, .6, .43, .6]
Factor Structure
        F1     F2
R       .9     .63
A       .8     .56
V       .63    .9
NV      .56    .8
Reduced Correlation Matrix
        R      A      V      NV
R       .81    .72    .57    .50
A              .64    .51    .45
V                     .81    .72
NV                           .64
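The factor structure table can be reproduced from the path diagram, assuming the pattern loadings (.9, .8 on Ach; .9, .8 on Apt) and the factor correlation of .7, via structure = pattern x factor correlation. The helper `matmul` and the variable names are illustrative.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Pattern matrix: rows Reading, Arithmetic, Verbal, Nonverbal; columns Ach, Apt.
pattern = [[0.9, 0.0], [0.8, 0.0], [0.0, 0.9], [0.0, 0.8]]
phi = [[1.0, 0.7], [0.7, 1.0]]  # correlation between Ach and Apt

structure = matmul(pattern, phi)  # item-factor correlations
pattern_t = [list(row) for row in zip(*pattern)]
reduced = matmul(structure, pattern_t)  # communalities on the diagonal

for row in structure:
    print([round(value, 2) for value in row])
```

The off-diagonal entries of `reduced` follow the same path-tracing rule as before, for example .9 x .7 x .9 for Reading with Verbal.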
[Path diagram: Revised Model with additional specificities. Same factors (Ach and Apt, correlated .7), loadings (.9, .8, .9, .8) and error paths (.43, .6, .43, .6) as the first model, with additional values .32, .40, .37 shown on the diagram]
Factor Structure
        F1     F2
R       .9     .63
A       .8     .56
V       .63    .9
NV      .56    .8
Reduced Correlation Matrix
        R      A      V      NV
R       .91    .72    .63    .51
A              .92    .65    .61
V                     .99    .72
NV                           .80
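CONFIRMATORY FACTOR ANALYSIS
BASIC PRINCIPLES
$\Sigma = E(x x')$
$\Sigma_{xx} = \begin{pmatrix} \sigma^2_{x_1} & & \\ \sigma_{x_1 x_2} & \sigma^2_{x_2} & \\ \sigma_{x_1 x_3} & \sigma_{x_2 x_3} & \sigma^2_{x_3} \end{pmatrix}$
BASIC PRINCIPLES
$\sigma^2_{x_1} = \lambda^2_{11}\phi_{11} + \sigma^2_{\delta_1}$
$\sigma^2_{x_k} = \lambda^2_{k1}\phi_{11} + \sigma^2_{\delta_k}$
$\sigma_{x_1 x_k} = \lambda_{11}\phi_{11}\lambda_{k1}$
[Path diagram: factor $\xi_1$ with arrows to $x_1$ and $x_k$; errors $\delta_1$ and $\delta_k$]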
IDENTIFICATION RULES
t-rule: $t \le \tfrac{1}{2} q(q+1)$, t = # free parameters, q = # manifest variables
• necessary but not sufficient
3-indicator rule: 1 factor, at least 3 indicators
• sufficient but not necessary
2-indicator rule: 2+ factors, at least 2 indicators each
local vs. global identification:
• local: sample estimates of parameters independent; necessary but not sufficient
• global: population parameters independent; necessary and sufficient
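The t-rule is a simple counting check, sketched below. The function name is illustrative, and the example counts assume a one-factor model with the factor variance fixed at 1, so the free parameters are one loading and one error variance per indicator.

```python
def t_rule_satisfied(n_free_parameters, n_manifest):
    """True when t <= q(q+1)/2; necessary but not sufficient for identification."""
    n_moments = n_manifest * (n_manifest + 1) // 2  # distinct variances/covariances
    return n_free_parameters <= n_moments

# One factor, 3 indicators: 3 loadings + 3 error variances = 6 parameters, 6 moments.
three_indicators = t_rule_satisfied(6, 3)
# One factor, 2 indicators: 2 loadings + 2 error variances = 4 parameters, 3 moments.
two_indicators = t_rule_satisfied(4, 2)
print(three_indicators, two_indicators)
```

The two results line up with the 3-indicator rule: a single factor with only two indicators already fails the counting condition.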
ESTIMATION
MODEL EVALUATION
• FIT: $F_{ML}$ used to evaluate $\hat{\Sigma}$, $S$
• Residuals: $E = S - \hat{\Sigma}$
• RMR = SD($s_{ij} - \hat{\sigma}_{ij}$)
• RMSEA = $\sqrt{(\chi^2/df - 1)/(N - 1)}$
• note: factor analyze $E$, should be 0
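The RMSEA expression can be sketched as a one-line function. The chi-square, degrees of freedom, and sample size below are hypothetical; truncating the value at 0 when chi-square falls below its degrees of freedom is a common convention assumed here, not something stated in the notes.

```python
import math

def rmsea(chi_square, df, n):
    """Root mean square error of approximation from a model chi-square."""
    # Assumption: negative values under the root are set to 0.
    excess = max(chi_square / df - 1.0, 0.0)
    return math.sqrt(excess / (n - 1))

# Hypothetical fit results: chi-square = 30 on 10 df with N = 201.
value = rmsea(30.0, 10, 201)
print(round(value, 3))
```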
Hancock’s Formula: reliability for a given factor
$H_j = 1 \Big/ \left[ 1 + 1 \Big/ \sum_i \left( l_{ij}^2 / (1 - l_{ij}^2) \right) \right]$
Ex. $l_1 = .7$, $l_2 = .8$, $l_3 = .6$
$H = 1 / [1 + 1/(.49/.51 + .64/.36 + .36/.64)]$
$= 1 / [1 + 1/(.96 + 1.78 + .56)]$
$= 1 / (1 + 1/3.30)$
$= .77$
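Hancock's H is easy to check numerically. The sketch below applies the formula to the example loadings; the function name is illustrative. Carrying full precision through the ratios (.49/.51 ≈ .961, .64/.36 ≈ 1.778) gives H ≈ .77.

```python
def hancock_h(loadings):
    """Hancock's H for one factor from standardized loadings."""
    # Each item contributes its ratio of common to unique variance.
    ratio_sum = sum(l * l / (1 - l * l) for l in loadings)
    return 1 / (1 + 1 / ratio_sum)

h = hancock_h([0.7, 0.8, 0.6])
print(round(h, 3))

# With equal loadings (strictly parallel items) H matches Spearman-Brown.
parallel = hancock_h([0.7, 0.7, 0.7])
spearman_brown = 3 * 0.49 / (1 + 2 * 0.49)
print(round(parallel, 4), round(spearman_brown, 4))
```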
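Hancock’s Formula Explained
$H_j = 1 \Big/ \left[ 1 + 1 \Big/ \sum_i \left( l_{ij}^2 / (1 - l_{ij}^2) \right) \right]$
now assume strict parallelism: then $l_{ij}^2 = \rho_{xt}^2$
thus $H_j = 1 \Big/ \left[ 1 + 1 \Big/ \sum_i \left( \rho_{xt}^2 / (1 - \rho_{xt}^2) \right) \right]$
$= 1 \Big/ \left[ 1 + (1 - \rho_{xt}^2)/(k \rho_{xt}^2) \right]$
$= k \rho_{xt}^2 / [1 + (k-1) \rho_{xt}^2]$
= Spearman-Brown formula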
$\chi^2$ TEST
$(n-1)F_{ML} \sim \chi^2_t$
used for nested model: model with one or more restrictions from original
restriction = known parameter, equality of two or more parameters
Proof: Bollen shows $-2\log(L_0/L_1) = (N-1)F_{ML}$
where $L_0$ is the restricted and $L_1$ the unrestricted model
INCREMENTAL FIT
Bentler and Bonett: $\Delta_1 = (F_b - F_m)/F_b = (\chi^2_b - \chi^2_m)/\chi^2_b$
can be used to compare improvements over original model or against a standard or baseline
Bentler & Bonett baseline conventions: $\Sigma_b = S$
Alternatives: $\Sigma_b = [.5]$ or $\Sigma_b = \Sigma$ from a previous study
example: Willson & Rupley (1997) was used by Nichols (1997) dissertation
Bollen’s fit index: $\Delta_2 = (F_b - F_m)/[F_b - df/(N-1)] = (\chi^2_b - \chi^2_m)/(\chi^2_b - df)$
Logic: the difference in the numerator has expected value equal to the denominator
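Both incremental fit indices can be computed from the baseline and model chi-squares. The sketch below uses hypothetical chi-square values, and it assumes the degrees of freedom in Bollen's index are those of the fitted model; the function names are illustrative.

```python
def bentler_bonett_index(chi_baseline, chi_model):
    """Delta-1: proportional improvement over the baseline model."""
    return (chi_baseline - chi_model) / chi_baseline

def bollen_index(chi_baseline, chi_model, df_model):
    """Delta-2: same numerator, denominator reduced by the model df (assumed)."""
    return (chi_baseline - chi_model) / (chi_baseline - df_model)

# Hypothetical values: baseline chi-square 500, model chi-square 50 on 20 df.
delta1 = bentler_bonett_index(500.0, 50.0)
delta2 = bollen_index(500.0, 50.0, 20)
print(round(delta1, 4), round(delta2, 4))
```

Because the denominator of the second index is smaller, it is never below the first index when the model improves on the baseline.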
AMOS SEM PROGRAM
Uses SPSS to input data: select SPSS file
Draw factor model
• Circles for factors, boxes for observed variables
• Arrows from circles to boxes to indicate loadings
• Errors for each box (special drawing character)
• Label all circles and boxes with names- SPSS variable names for boxes, your own name for factors and circles
• Correlate factors with curved arrows as needed
[Figure: AMOS Drawing. Factor F1 (circle) with arrows to observed variables ANX, DEP, SE (boxes), each with an error term e1, e2, e3]