Date post: | 25-Dec-2015 |
1
Factor Analysis 2006
Lecturer: Timothy [email protected]
Lecture Notes based on Austin 2005
• Bring your handout to the tutorial
• Please read prior to the tutorial
2
FACTOR ANALYSIS
• A statistical tool to account for variability in observed traits in terms of a smaller number of factors
– Factor = "unobserved random variable"
– Measured item = observed random variable
• Values for an observation are recovered (with some error) from a linear combination of a (usually much smaller) set of extracted factors.
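A minimal sketch of the "linear combination" idea, assuming standardised items and a simple linear model with no error term. The loadings are three rows from the three-factor example in these notes (omitted loadings treated as 0); the person's factor scores are made-up illustration values.

```python
# Illustration only: the part of each item value that the factors explain.
# Loadings are ordered (C, N, E); loadings omitted on the slide are set to 0.
loadings = {
    "EFFICIENT": (0.82, 0.0, 0.0),
    "NERVOUS": (0.0, 0.75, -0.15),
    "ASSERTIVE": (0.24, -0.21, 0.65),
}


def predict_item(item_loadings, factor_scores):
    """Loading-weighted sum of factor scores for one item."""
    return sum(l * f for l, f in zip(item_loadings, factor_scores))


# Hypothetical standardised factor scores (C, N, E) for one person.
person = (1.0, -0.5, 2.0)
predicted = {item: predict_item(l, person) for item, l in loadings.items()}
```

The observed value would differ from `predicted` by the item's unique (specific plus error) part.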
3
Visually…
4
FA as a Data reduction technique
• Simplify complex multivariate datasets by finding “natural groupings” within the data
– May correspond to underlying ‘dimensions’.
– Subsets of variables that correlate strongly with each other and weakly with other variables in the dataset.
• Natural groupings (factors) can assist the theoretical interpretation of complex datasets
• Theoretical linkage of factors to underlying (latent) constructs, e.g. “extraversion”, liberal attitudes, interest in ideas, ability
5
EXAMPLE DATASET
210 students produced self-ratings on a list of trait adjectives. Correlations above 0.2 were marked in bold on the original slide.

        1      2      3      4      5      6      7      8      9     10     11
 2   0.27
 3   0.37   0.53
 4   0.40   0.30   0.38
 5   0.17  -0.07  -0.09  -0.08
 6   0.17  -0.05  -0.06   0.10   0.59
 7   0.19   0.01  -0.05   0.05   0.38   0.42
 8   0.06  -0.02  -0.02   0.02   0.51   0.54   0.48
 9  -0.25  -0.05  -0.15  -0.20  -0.06  -0.11  -0.14  -0.07
10  -0.24  -0.10  -0.09  -0.10  -0.03  -0.02  -0.13   0.08   0.38
11  -0.21  -0.08  -0.22  -0.12   0.00  -0.03  -0.07   0.03   0.49   0.38
12  -0.01   0.02  -0.10  -0.04   0.07   0.09   0.06   0.04   0.34   0.40   0.40

1. ASSERTIVE, 2. TALKATIVE, 3. EXTRAVERTED, 4. BOLD, 5. ORGANIZED, 6. EFFICIENT, 7. THOROUGH, 8. SYSTEMATIC, 9. INSECURE, 10. SELF-PITYING, 11. NERVOUS, 12. IRRITABLE
• Clear structure in this sorted matrix
• How easy would this be to see in a larger matrix?
6
THE THREE FACTORS FROM THE EXAMPLE DATA

                 I (C)   II (N)   III (E)
EFFICIENT         0.82
ORGANIZED         0.80
SYSTEMATIC        0.79
THOROUGH          0.71
NERVOUS                   0.75    -0.15
IRRITABLE         0.14    0.73
INSECURE         -0.14    0.73    -0.16
SELF-PITYING              0.72
EXTRAVERTED      -0.12   -0.10     0.79
TALKATIVE                          0.75
BOLD                               0.69
ASSERTIVE         0.24   -0.21     0.65

• The numbers are factor loadings = correlation of each variable with the underlying factor.
• (Loadings less than 0.1 omitted.)
• Can construct a factor score (item scores multiplied by their factor loadings)
• N = (0.75*Nervous) + (0.73*Irritable) + (0.73*Insecure) + (0.72*Self-pity) – (0.10*Extraverted) – (0.21*Assertive)
• Main loadings are large and highly significant.
• Smaller (cross-)loadings may be informative.
• Factors are close to simple structure.
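The N formula above can be sketched in a few lines. The loadings are the factor II (N) values from the slide; the self-ratings are hypothetical and only show the arithmetic.

```python
# Loadings on factor II (N) from the slide; loadings under 0.1 were omitted there.
n_loadings = {
    "nervous": 0.75,
    "irritable": 0.73,
    "insecure": 0.73,
    "self_pitying": 0.72,
    "extraverted": -0.10,
    "assertive": -0.21,
}


def weighted_score(ratings, weights):
    """Sum of item ratings, each multiplied by its factor loading."""
    return sum(weights[item] * ratings[item] for item in weights)


# Hypothetical self-ratings for one student (not data from the study).
ratings = {
    "nervous": 4,
    "irritable": 3,
    "insecure": 5,
    "self_pitying": 2,
    "extraverted": 3,
    "assertive": 2,
}
n_score = weighted_score(ratings, n_loadings)
```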
7
OBJECTIVES AND OUTCOMES OF FACTOR ANALYSIS
• Aim of factor analysis is to objectively detect natural groupings of variables (factors)
• Can deal with large matrices, uses (reasonably) objective statistical criteria.
• Can obtain quantitative information
– e.g. factor scores.
• Factors are (should be) of theoretical interest.
– In the example the factors correspond to the personality traits of Extraversion, Neuroticism and Conscientiousness
• Exploratory method, uncovering structure in data
– Confirmatory factor analysis (model testing) is also possible.
8
SOME TECHNICAL REQUIREMENTS FOR A FACTOR ANALYSIS TO BE VALID AND USEFUL
• Simple structure
– Each item loads highly on one factor and close to zero on all others
• Factors have a meaningful theoretical interpretation
– Rotation
• Factors retain most of the variance in the raw data
– Parsimony compared to starting variables achieved without loss of explanatory power
• Factors are replicable
9
Assumptions
• Large enough sample
– So that the correlations are reliable
• Somewhat normal variables, no outliers
• No variables uncorrelated with any other
• No variables correlated 1.0 with each other
– Remove one of each problematic pair, or use sum if appropriate.
10
DATA QUALITY
• Sample Size
– Rough rule is that 300 is OK, smaller numbers may be OK.
• Subjects/variables ratio
– Much discussion (less agreement)
– Values between 2:1 and 10:1 have been proposed as a minimum.
• Simulations suggest that overall sample size is more important.
• Well-defined factors (large loadings) will replicate in smaller samples than poorly-defined ones (small loadings)
11
STAGES OF ANALYSIS
• Examine data for outliers and correlations
• Choose number of factors
– Scree plot
• Rotate factors if necessary
• Interpret factors
• Obtain scores
– Check reliability of scales defining factors
• Further experiments to validate factors
12
Partitioning item variance
• Variance of each item can be thought of in three partitions:
1. Shared variance
  • Common variance, explained by factors
+ Unique variance (not explained by the common factors)
2. Specific variance
3. Error variance
• Communality
– The proportion of common variance for a given variable
  • Sum of squares of item factor loadings
– Large communalities are required for a valid and useful factor solution
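As a small worked illustration of "sum of squares of item factor loadings", here is a sketch that computes the communality and the remaining unique variance for two items. The loadings are two rows of the unrotated three-factor solution for the example data; the function and variable names are illustrative.

```python
# Rows of the unrotated loading matrix for the example data (factors I, II, III).
item_loadings = {
    "EFFICIENT": (0.45, 0.69, 0.02),
    "ASSERTIVE": (0.64, -0.10, 0.33),
}


def communality(loadings):
    """Proportion of a standardised item's variance explained by the factors."""
    return sum(l * l for l in loadings)


communalities = {item: communality(l) for item, l in item_loadings.items()}
# Unique variance (specific + error) is whatever the factors leave unexplained.
uniqueness = {item: 1.0 - h2 for item, h2 in communalities.items()}
```

Here EFFICIENT has a communality of about 0.68, so roughly a third of its variance is unique.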
13
Computing a Factor Analysis
• Two main approaches
– Differ in estimating communalities
• Principal components
– Simplest computationally
– Assumes all variance is common variance (implausible) but gives similar results to more sophisticated methods.
– SPSS default.
• Principal factor analysis
– Estimates communalities first
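A minimal sketch of the principal components approach, assuming NumPy is available. It eigen-decomposes a small correlation matrix (items ORGANIZED, EFFICIENT, INSECURE and NERVOUS from the example dataset) and scales the eigenvectors to get unrotated loadings. This is an illustration rather than a reproduction of what SPSS does internally; principal factor analysis would first replace the 1s on the diagonal with communality estimates.

```python
import numpy as np

# Correlations among ORGANIZED, EFFICIENT, INSECURE, NERVOUS (example dataset).
R = np.array([
    [1.00, 0.59, -0.06, 0.00],
    [0.59, 1.00, -0.11, -0.03],
    [-0.06, -0.11, 1.00, 0.49],
    [0.00, -0.03, 0.49, 1.00],
])


def principal_component_loadings(R, n_factors):
    """Return all eigenvalues (descending) and loadings for the first n_factors."""
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    return eigvals, loadings


eigvals, loadings = principal_component_loadings(R, n_factors=2)
_, full = principal_component_loadings(R, n_factors=4)
```

Retaining all four components reproduces `R` exactly, which is the sense in which this method treats all variance as common variance.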
14
How many Factors?
• Initially unknown
– Needs to be specified by the investigator on the basis of preliminary analysis
– No 100% foolproof statistical test for number of factors
– Similar problems with other multivariate methods
15
How many factors?
• There are potentially as many factors as items
• We don’t want to retain factors which account for little variance.
• Most commonly-used method to decide the number of factors is the “scree” plot of the “eigenvalues”
– Variance explained by each factor.
• A point of inflection or kink in the scree plot is a good place to make the cut-off
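The numbers behind a scree plot can be sketched as follows, assuming NumPy. The code rebuilds the 12 x 12 correlation matrix of the example dataset from its lower triangle and computes the eigenvalues, the proportion of variance each accounts for, and the drop between successive eigenvalues (where a kink would show). Plotting is left out; the variable names are illustrative.

```python
import numpy as np

# Lower triangle of the example correlation matrix (rows 2..12, diagonal omitted).
lower = [
    [0.27],
    [0.37, 0.53],
    [0.40, 0.30, 0.38],
    [0.17, -0.07, -0.09, -0.08],
    [0.17, -0.05, -0.06, 0.10, 0.59],
    [0.19, 0.01, -0.05, 0.05, 0.38, 0.42],
    [0.06, -0.02, -0.02, 0.02, 0.51, 0.54, 0.48],
    [-0.25, -0.05, -0.15, -0.20, -0.06, -0.11, -0.14, -0.07],
    [-0.24, -0.10, -0.09, -0.10, -0.03, -0.02, -0.13, 0.08, 0.38],
    [-0.21, -0.08, -0.22, -0.12, 0.00, -0.03, -0.07, 0.03, 0.49, 0.38],
    [-0.01, 0.02, -0.10, -0.04, 0.07, 0.09, 0.06, 0.04, 0.34, 0.40, 0.40],
]

p = len(lower) + 1
R = np.eye(p)
for i, row in enumerate(lower, start=1):
    for j, r in enumerate(row):
        R[i, j] = R[j, i] = r          # fill both halves symmetrically

eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]   # scree plot y-values
proportion = eigenvalues / p                         # share of total variance
drops = eigenvalues[:-1] - eigenvalues[1:]           # a large drop marks a kink
kaiser_count = int((eigenvalues > 1).sum())          # eigenvalue > 1 rule
```

Plotting `eigenvalues` against factor number 1..12 gives the scree plot; the cut-off is still a judgement call.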
16
Scree plot for emotional intelligence items
[Figure: scree plot labelled "EQ Scree"; horizontal axis ticks 0 to 14, vertical axis ticks 0 to 8]
17
Scree diagram for Goldberg trait adjectives
[Figure: scree plot labelled "Goldberg Scree"; horizontal axis ticks 0 to 16, vertical axis ticks 0 to 14]
18
Scree diagram for food and health behaviour items
[Figure: scree plot labelled "Food and health Scree"; horizontal axis ticks 0 to 16, vertical axis ticks 0.5 to 4.0]
19
Scree plot for ability test scores, Swedish Twin Study
[Figure: scree plot labelled "IQ Scree"; horizontal axis ticks 0 to 14, vertical axis ticks 0 to 6]
20
OTHER METHODS FOR FACTOR NUMBERS
• Eigenvalues > 1
– Eigenvalues sum to the number of items, so a factor with an eigenvalue > 1 is more informative than a single average item
– Not a useful guide in practice
• Parallel Analysis
– Repeatedly randomise the correlation matrix and determine how large an eigenvalue appears by chance in many thousands of trials.
– Excellent method
• Theory-driven
– Extract a number of factors based on theoretical considerations
  • Hard to justify
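A rough sketch of parallel analysis, assuming NumPy. The slide describes randomising the correlation matrix; this sketch instead uses the common variant that simulates uncorrelated normal data of the same size and records the eigenvalues that arise by chance. The function names, the 95th percentile cut-off and the trial count are illustrative assumptions, not settings from the lecture.

```python
import numpy as np


def parallel_analysis_thresholds(n_subjects, n_items, n_trials=200,
                                 percentile=95, seed=0):
    """Chance-level eigenvalues from random data with no real factors."""
    rng = np.random.default_rng(seed)
    sims = np.empty((n_trials, n_items))
    for t in range(n_trials):
        data = rng.standard_normal((n_subjects, n_items))
        R = np.corrcoef(data, rowvar=False)
        sims[t] = np.sort(np.linalg.eigvalsh(R))[::-1]
    return np.percentile(sims, percentile, axis=0)


def n_factors_to_retain(observed_eigenvalues, thresholds):
    """Count leading observed eigenvalues that beat the chance thresholds."""
    count = 0
    for observed, chance in zip(observed_eigenvalues, thresholds):
        if observed <= chance:
            break
        count += 1
    return count


# Same shape as the example dataset: 210 students, 12 adjectives.
thresholds = parallel_analysis_thresholds(n_subjects=210, n_items=12)
```

A factor is kept only while its observed eigenvalue exceeds the corresponding chance value, for example `n_factors_to_retain(observed, thresholds)` with `observed` sorted in descending order.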
21
How to align the factors?
• The initial solution is “un-rotated”
• Two undesirable features make it hard to interpret:
– Designed to maximise the loadings of all items on the first factor
– Most items have large loadings on more than one factor
• Hides groupings in the data
22
UNROTATED FACTORS FOR THE EXAMPLE DATA

                   I      II     III
EFFICIENT        0.45   0.69   0.02
ORGANIZED        0.37   0.71  -0.04
SYSTEMATIC       0.37   0.70   0.04
THOROUGH         0.45   0.55  -0.02
NERVOUS         -0.56   0.33   0.40
IRRITABLE       -0.34   0.37   0.56
INSECURE        -0.62   0.21   0.38
SELF-PITYING    -0.52   0.28   0.42
EXTRAVERTED      0.46  -0.41   0.51
TALKATIVE        0.36  -0.31   0.58
BOLD             0.48  -0.24   0.45
ASSERTIVE        0.64  -0.10   0.33
23
ROTATION – DETAIL (1)
• Rotation shows up the groups of items in the data.
• Orthogonal rotation
– Factors remain independent
• Oblique rotation
– Factors allowed to correlate
• Theoretical reasons to choose a type of rotation
– (e.g. for intelligence test scores);
• May explore both types
– Choose oblique if there are large correlations between factors, orthogonal otherwise.
24
[Figure: items (X) plotted by their loadings on factors C and N, each axis running from -1 to +1]
Item loadings on the first 2 factors
25
[Figure: items (X) plotted against the C and N axes, each axis running from -1 to +1]
Lack of Simple Structure
26
[Figure: item plot (X) with rotated axes]
Rotation Defines New Axes Which Reveal the Item Groups
27
[Figure: item plot (X) with obliquely rotated axes]
Oblique Rotation
28
ROTATION – DETAIL (2)
• Rotated and un-rotated solutions are mathematically equivalent
– Rotation is performed for purposes of interpretation.
• Most common types:
– Oblique
  • Direct oblimin
– Orthogonal
  • Varimax (maximises squared column variance); most common
  • Quartimax (maximises row variance)
  • Equamax simplifies rows and columns of a factor matrix
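To make "mathematically equivalent" concrete, here is a sketch of a varimax rotation, assuming NumPy. It uses the widely circulated SVD-based iteration (not necessarily the exact routine in SPSS) and applies it to the unrotated loadings for the example data. Because the rotation matrix is orthogonal, each item's communality and the reproduced correlations are unchanged; only the axes move.

```python
import numpy as np

# Unrotated loadings for the example data (factors I, II, III).
unrotated = np.array([
    [0.45, 0.69, 0.02], [0.37, 0.71, -0.04], [0.37, 0.70, 0.04],
    [0.45, 0.55, -0.02], [-0.56, 0.33, 0.40], [-0.34, 0.37, 0.56],
    [-0.62, 0.21, 0.38], [-0.52, 0.28, 0.42], [0.46, -0.41, 0.51],
    [0.36, -0.31, 0.58], [0.48, -0.24, 0.45], [0.64, -0.10, 0.33],
])


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation; returns (rotated loadings, rotation matrix)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        col_ss = np.diag((rotated ** 2).sum(axis=0))
        gradient = loadings.T @ (rotated ** 3 - rotated @ col_ss / p)
        # SVD projects the gradient back onto the set of orthogonal matrices.
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):   # no further improvement
            break
        criterion = new_criterion
    return loadings @ rotation, rotation


rotated, rotation = varimax(unrotated)
```

The rotated columns may come out in a different order or with flipped signs compared with a published table; that does not change the interpretation.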
29
INTERPRETING FACTORS
• Done on the basis of ‘large’ loadings
– Often taken to be above 0.3.
– Size of loading which should be considered substantive is sample-size dependent.
– For large samples loadings of 0.1 or below may be significant but do not explain much variance.
• Well-defined factor should have at least three high-loading variables
– Existence of factors with only one or two large loadings indicates factors over-extracted, or multicollinearity problems.
• Assigning meaning to factors.
30
FACTOR SCORES
• Factor scores
– Estimate of each subject’s score on the underlying latent variable
– Calculated from the factor loadings of each item.
• Simple scoring methods
– A method often used for, e.g., personality questionnaires is to sum the individual item scores (reverse-keying where necessary).
– This method is reasonable when all variables are measured on the same scale.
– What if you have a mix of items measured on different scales?
  • (e.g. farmer’s extraversion score, farm annual profit, farm area).
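One common answer to the mixed-scales question is to standardise each variable before summing. The sketch below does this with the Python standard library; the farm figures are invented, and the function names and the +1/-1 keying convention are illustrative assumptions.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def unit_weighted_scores(columns, keys):
    """Sum of z-scored variables; keys maps each name to +1, or -1 if reverse-keyed."""
    standardised = {name: z_scores(vals) for name, vals in columns.items()}
    n = len(next(iter(columns.values())))
    return [
        sum(keys[name] * standardised[name][i] for name in columns)
        for i in range(n)
    ]


# Invented data for three farmers, on three very different scales.
farm_data = {
    "extraversion": [10, 20, 30],
    "annual_profit": [20000, 50000, 80000],
    "area": [100, 250, 400],
}
keys = {"extraversion": 1, "annual_profit": 1, "area": 1}
scores = unit_weighted_scores(farm_data, keys)
```

Without the standardising step the profit column would swamp the other two simply because its numbers are larger.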
31
EXAMPLE 1 – FACTOR STRUCTURE OF DIETARY BEHAVIOUR
• Research question: Is there a dimension of healthy vs. unhealthy diet preferences?
– (Mac Nicol et al 2003)
• 451 schoolchildren completed a 35-item questionnaire mainly on food items regularly consumed (also some general health behaviour items)
– Subjects:variables 12.9:1. Population not representative for SES.
• Scree suggested three factors, two diet related
– F1: Unhealthy foods (chips, fizzy drinks etc)
– F2: Healthy foods (fruit, veg etc)
• Validation
– Higher SES and better nutrition knowledge associated with healthier eating patterns.
• Factor reliabilities low
– Problem of yes/no items
– Sample inhomogeneity.
32
EXAMPLE 2 – FACTOR STRUCTURE OF THE AQ (Austin, 2005)
• Does the AQ have the factor structure that its original author thinks it has?
• The AQ is a 50-item questionnaire designed to assess autistic traits in the general population and at the high-functioning end of the clinical range.
• Designed to produce a general factor and to have subscales assessing well-known clinical characteristics of autism:
– Poor social skills
– Strong focus of attention
– Attention to detail
– Poor communication
– Poor imagination/play
• Completed by 201 undergraduates.
• Subjects:variables 4:1.
• Scree suggested a general factor + three sub-factors
– Poor social skills, attention to detail and poor communication.
• Reliabilities OK, some validation (males vs. females, arts vs. science)
33
EXAMPLE 3 –FACTOR STRUCTURE OF AN EI SCALE
• How many factors in a published emotional intelligence scale, and can it be improved by adding more items?
– (Saklofske et al., 2003; Austin et al., 2004).
• 354 undergraduates completed a 33-item EI scale for which previous findings on the factor structure had given contradictory results.
• Scree plot (and some confirmatory modelling) suggested four factors, one with poor reliability.
• The factor structure has been replicated although other factor structures have been reported.
• A longer 41-item version of the same scale was constructed with more reverse-keyed items than the original scale, and also with additional items targeted on the low-reliability factor (utilisation of emotions).
• Completed by 500 students and was found to have a three-factor structure.
• Reliability of utilisation subscale increased, but still below 0.7.
34
EXAMPLE 4 – ABNORMAL PERSONALITY
• How does personality disorder relate to normal personality?
• Deary et al. (1998).
– Scale-level analysis of DSM-III-R personality disorders & EPQ-R
– Sample = 400 students
• Joint analysis gives four factors:
– N+ Borderline, Self-defeating, Paranoid
– P+ Antisocial, Passive-aggressive, Narcissistic
– E+ avoidant(-), histrionic
– P(-) Obsessive-compulsive, Narcissistic
35
EXAMPLE 5 - THE ATTITUDES TO CHOCOLATE QUESTIONNAIRE
• 80 items on attitudes to chocolate were constructed using interviews and related literature.
• Aspects assessed included
– difficulty controlling consumption, positive attitudes, negative attitudes, craving.
• Self-report chocolate consumption was obtained; participants also performed a bar-pressing task with chocolate button reinforcements delivered on a progressive ratio schedule.
• Factor analysis gave three factors (eigenvalue 1 criterion)
– 33.2%, 14.1% & 6.1% of the variance.
– Third scale had low reliability
  • Probably over-factored.
• Follow up paper (Cramer & Hartleib, 2001) has confirmed the first two factors.
36
Factors Found
1. Craving
– I like to indulge in chocolate
– I often go into a shop for something else and end up buying chocolate
2. Guilt
– I feel guilty after eating chocolate
3. Functional approach
– I eat chocolate to keep my energy levels up when doing physical exercise.
• High-craving individuals
– Reported consuming more bars per month
– Were prepared to work harder to get chocolate buttons
37
Example 6: Criterion based FA (Kline, Easy Guide, Ch 9)
• Two groups: long-term tranquilliser users and matched controls
– Measured
  • Personality
  • Psychological distress
  • Life events
  • Health data
  • Visits to GP
  • Ratings by GP
  • etc.
• What factor(s) predict group membership?
– High loadings for the group membership variable
– In this study the best factor loaded
  • Anxiety
  • Few friends
  • High GP contact
  • High repeat prescriptions
– Some variables unrelated (life events, job satisfaction, church attendance…)
• Alternative approaches
– Regression
– Cluster analysis
38
End of Lecture I
• See you next week :-)
41
STATISTICAL TESTS FOR DATA QUALITY
• Examine KMO statistic.
– Kaiser-Meyer-Olkin test of sampling adequacy
– Should be 0.5 or more.
  • Low values indicate diffuse correlations with no substantive groupings.
– KMO statistics for each item
  • Item values below 0.5 indicate item does not belong to a group and may be removed
• Bartlett’s test of sphericity.
– Tests that the correlation matrix is significantly different from an identity matrix.
  • p-value should be significant
– Tests that there are not duplicate items in the matrix
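Both statistics can be computed directly from a correlation matrix. The sketch below assumes NumPy and uses the standard textbook formulas: KMO compares squared correlations with squared partial correlations (obtained from the inverse matrix), and Bartlett's statistic is based on the log determinant. The small matrix is a subset of the example dataset, and the function names are illustrative.

```python
import numpy as np


def kmo(R):
    """Return (overall KMO, per-item KMO) for a correlation matrix R."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)            # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)      # mask for off-diagonal cells
    r2 = np.where(off, R ** 2, 0.0)
    p2 = np.where(off, partial ** 2, 0.0)
    overall = r2.sum() / (r2.sum() + p2.sum())
    per_item = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    return overall, per_item


def bartlett_sphericity(R, n_subjects):
    """Return (chi-square statistic, degrees of freedom)."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    chi_square = -(n_subjects - 1 - (2 * p + 5) / 6) * logdet
    return chi_square, p * (p - 1) // 2


# ORGANIZED, EFFICIENT, INSECURE, NERVOUS from the example dataset (n = 210).
R = np.array([
    [1.00, 0.59, -0.06, 0.00],
    [0.59, 1.00, -0.11, -0.03],
    [-0.06, -0.11, 1.00, 0.49],
    [0.00, -0.03, 0.49, 1.00],
])
overall_kmo, item_kmo = kmo(R)
chi_square, df = bartlett_sphericity(R, n_subjects=210)
```

The p-value comes from comparing `chi_square` with a chi-square distribution on `df` degrees of freedom; that step is omitted here to keep the sketch dependency-light.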
42
SPSS ASPECTS
• Path to follow is analyse, data reduction, factor.
• EXTRACTION
– Select scree plot for initial run.
– Choose number of factors.
• ROTATION
– Select rotation method
– Increase number of iterations for rotation if necessary (default 25)
• DESCRIPTIVES
– KMO and Bartlett tests
– Reproduced correlations and residuals
– Anti-Image matrix
• SCORES
– Save as variables
– Method
43
OPTIONS
• Sort coefficients by size
• Suppress small loadings
44
SCORING ETC.
• Factor scores constructed as above or by related methods can be used in further analyses
• e.g. are there M/F differences in scores on N, E, C?
• Do the factor scores correlate with other measures (exam anxiety, subjective reports of life quality, number of friends, exam success…)
45
OTHER ASPECTS OF FACTOR ANALYSIS
• Discussion so far has been in terms of questionnaire items, but factor analysis is possible with any set of measures for which correlations can be calculated.
– Hypothetical example: personality traits, socio-economic status, salary, life satisfaction, number of serious illnesses etc in the last five years
• Datasets of this type raise issues of factor analysis vs. regression modelling.
• Scale-level analysis can be very useful in the study of personality/individual differences.
• Hierarchical factor structures.
– Best-known example is intelligence test scores.
– Scores on a diverse range of tests are usually all positively intercorrelated (positive manifold).
– Can extract either
  • A general ability (g) factor (positive loadings from all tests)
– or
  • Examine clustering of tests in more detail giving correlated (oblique) lower-level factors.
– Choice of level of description; both descriptions are equally ‘correct’.
46
[Figure: nested (hierarchical) diagram showing g, the factors gs, gr, gc and gf, and the specific tests]
Nested Analysis
47
USING FACTORS
• Naming – use content of high-loading items as a guide
• Assess internal reliability for each factor
• Scores – ‘unit weighting’ best for comparison between samples
• Validation – do factor scores correlate as expected with other variables? Issues of convergent/divergent validity with other tests if relevant.
48
Scale Reliability
• Factor Derived Scales can be assessed as with any other scale
• For instance using Cronbach’s Alpha
• Check alpha if item deleted to identify poorly-functioning items
• Adequate reliability is defined as 0.7 or above
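A minimal sketch of Cronbach's alpha and the "alpha if item deleted" check, using only the Python standard library and the usual formula (number of items, item variances, variance of the total score). The item scores are invented for illustration.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, aligned by respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # total score per respondent
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item left out in turn (needs 3+ items)."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Invented scores: 3 items answered by 4 respondents.
items = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [1, 3, 3, 5],
]
alpha = cronbach_alpha(items)
alphas_without = alpha_if_item_deleted(items)
```

An item whose removal raises alpha noticeably is a candidate for the "poorly-functioning" label above.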
49
CONFIRMATORY FACTOR ANALYSIS
• Hypothesis testing
– Test the “fit” of a pre-specified model
– Compare different models
• Available in several packages
– AMOS, Mx, Mplus
• Not covered in this course
50
How to assess FA
• Sample size
– Two things matter:
  • Ratio of subjects to items
  • Total sample size
– Item to subject ratio is important
– Can get away with smaller numbers when communalities are high (i.e. factors well-defined)
• Restriction of range (subjects too similar)
– Reduces correlations
• Items per factor.
– Need at least three per factor, four is better. Some published analyses discuss factors with only one item loading!
• Use of eigenvalue-1.
– Often seen in papers where factor number comes out implausibly high.
• Rotation.
– Orthogonal used when oblique should have been tried first.
– Generally safest to assume by default that factors will correlate.
• Scores.
– SPSS and other packages give scores which are sample-dependent.
– Use of unit weighting of items is better practice.
51
Adequacy of sample size
– 50 – very poor
– 100 – poor
– 200 – fair
– 300 – good
– 500 – very good
– >1000 – excellent
• Comrey and Lee (1992, p. 217)
52
Item-subject ratios.
• With too many items and too few subjects, the data are “over-fitted”
– Unreplicable results
  • Bobko & Schemmer, 1984
• Subjects to items
– 5:1 (Gorsuch, 1983, p. 332; Hatcher, 1994, p. 73)
– 10:1 (Nunnally, 1978, p. 421)
• Subjects to parameters measures
– MacCallum, Widaman, Preacher, & Hong (2001)
  • Subject: factor ratio
  • Item communalities
  • Item loadings
53
Summary
54
Assumptions and Purpose
• Assumptions of factor analysis
• Latent variable (i.e. factor)
• Research questions answered by factor analysis
• Factor loadings
55
Process
• Steps in factor analysis
• Initial v final solution
• Factorability of an inter-correlation matrix
• Bartlett's test of sphericity and its interpretation
• Kaiser-Meyer-Olkin measure of sampling adequacy (KMO) and its interpretation
• Identity matrix and the determinant of an identity matrix
56
Extracting Factors.
• Methods for extracting factors
• Principal components
• Maximum likelihood method
• Principal axis method
• Un-weighted least squares
• Generalized least squares
• Alpha method
• Image factoring
57
Numbers of Factors.
• Criteria for determining the number of factors
• Eigenvalue greater than 1.0
• Cattell's scree plot
• Percent and cumulative percent of variance explained by the factors extracted
• Component matrix and factor loadings
• Communality of a variable
• Determining what a factor measures and naming a factor
58
Rotation
• Factor rotation and its purpose
• Varimax
• Quartimax
• Equimax
• Orthogonal v oblique rotation
• Reproduced correlation matrix
• Computing factor scores
• Factor score coefficient matrix
59
SEM & Factor Analysis
• SEM is a family of statistical techniques
• SEM incorporates path analysis and factor analysis
• SEM models in which each variable has multiple indicators but there are no direct effects (arrows) connecting the variables are a type of factor analysis.
60
Factor Analysis & Path Analysis
• SEM models in which each variable has only one indicator are a type of path analysis
• SEM encompasses models with both multiple indicators for each variable (called latent variables or factors), and paths specified connecting the latent variables.
• Synonyms for SEM are covariance structure analysis, covariance structure modeling, and analysis of covariance structures. Although these synonyms rightly indicate that analysis of covariance is the focus of SEM, be aware that SEM can also analyze the mean structure of a model.