
Factor Analysis

Factor Analysis Elizabeth Garrett-Mayer, PhD Georgiana Onicescu, ScM Cancer Prevention and Control Statistics Tutorial July 9, 2009
Page 1: Factor Analysis

Factor Analysis

Elizabeth Garrett-Mayer, PhD
Georgiana Onicescu, ScM

Cancer Prevention and Control Statistics Tutorial
July 9, 2009

Page 2: Factor Analysis

Motivating Example: Cohesion in Dragon Boat paddler cancer survivors

Dragon boat paddling is an ancient Chinese sport that offers a unique blend of factors that could potentially enhance the quality of the lives of cancer survivor participants.

Evaluating the efficacy of dragon boating to improve the overall quality of life among cancer survivors has the potential to advance our understanding of factors that influence quality-of-life among cancer survivors.

We hypothesize that physical activity conducted within the context of the social support of a dragon boat team contributes significantly to improved overall quality of life above and beyond a standard physical activity program because the collective experience of dragon boating is likely enhanced by team sport factors such as cohesion, teamwork, and the goal of competition.

Methods: 134 cancer survivors self-selected to an 8-week dragon boat paddling intervention group or to an organized walking program. Each study arm was comprised of a series of 3 groups of approximately 20-25 participants, with pre- and post-testing to compare quality of life and physical performance outcomes between study arms.

Page 3: Factor Analysis

Motivating Example: Cohesion

We have a concept of what “cohesion” is, but we can’t measure it directly.

Merriam-Webster:
• the act or state of sticking together tightly
• the quality or state of being made one

How do we measure it? We cannot simply say “how cohesive is your team?” or “on a scale from 1-10, how do you rate your team cohesion?”

We think it combines several elements of “unity” and “team spirit” and perhaps other “factors”

Page 4: Factor Analysis

Factor Analysis

• Data reduction tool
• Removes redundancy or duplication from a set of correlated variables
• Represents correlated variables with a smaller set of “derived” variables.
• Factors are formed that are relatively independent of one another.
• Two types of “variables”:
  • latent variables: factors
  • observed variables

Page 5: Factor Analysis

Cohesion Variables:

G1 (I do not enjoy being a part of the social environment of this exercise group)

G2 (I am not going to miss the members of this exercise group when the program ends)

G3 (I am unhappy with my exercise group’s level of desire to exceed)

G4 (This exercise program does not give me enough opportunities to improve my personal performance)

G5 (For me, this exercise group has become one of the most important social groups to which I belong)

G6 (Our exercise group is united in trying to reach its goals for performance)

G7 (We all take responsibility for the performance by our exercise group)

G8 (I would like to continue interacting with some of the members of this exercise group after the program ends)

G9 (If members of our exercise group have problems in practice, everyone wants to help them)

G10 (Members of our exercise group do not freely discuss each athlete’s responsibilities during practice)

G11 (I feel like I work harder during practice than other members of this exercise group)

Page 6: Factor Analysis

Other examples

• Diet
• Air pollution
• Personality
• Customer satisfaction
• Depression
• Quality of Life

Page 7: Factor Analysis

Some Applications of Factor Analysis

1. Identification of Underlying Factors:
   • clusters variables into homogeneous sets
   • creates new variables (i.e. factors)
   • allows us to gain insight to categories

2. Screening of Variables:
   • identifies groupings to allow us to select one variable to represent many
   • useful in regression (recall collinearity)

3. Summary:
   • Allows us to describe many variables using a few factors

4. Clustering of objects:
   • Helps us to put objects (people) into categories depending on their factor scores

Page 8: Factor Analysis

“Perhaps the most widely used (and misused) multivariate [technique] is factor analysis. Few statisticians are neutral about this technique. Proponents feel that factor analysis is the greatest invention since the double bed, while its detractors feel it is a useless procedure that can be used to support nearly any desired interpretation of the data. The truth, as is usually the case, lies somewhere in between. Used properly, factor analysis can yield much useful information; when applied blindly, without regard for its limitations, it is about as useful and informative as Tarot cards. In particular, factor analysis can be used to explore the data for patterns, confirm our hypotheses, or reduce the many variables to a more manageable number.”

-- Norman and Streiner, PDQ Statistics

Page 9: Factor Analysis

Let’s work backwards

One of the primary goals of factor analysis is often to identify a measurement model for a latent variable

This includes
• identifying the items to include in the model
• identifying how many ‘factors’ there are in the latent variable
• identifying which items are “associated” with which factors

Page 10: Factor Analysis

Standard Result

------------------------------------
    Variable |  Factor1   Factor2 |
-------------+--------------------+
    notenjoy |  -0.3118    0.5870 |
     notmiss |  -0.3498    0.6155 |
desireexceed |  -0.1919    0.8381 |
personalpe~m |  -0.2269    0.7345 |
importants~l |   0.5682   -0.1748 |
 groupunited |   0.8184   -0.1212 |
responsibi~y |   0.9233   -0.1968 |
    interact |   0.6238   -0.2227 |
problemshelp |   0.8817   -0.2060 |
  notdiscuss |  -0.0308    0.4165 |
  workharder |  -0.1872    0.5647 |
------------------------------------

Page 11: Factor Analysis

How to interpret?

------------------------------------
    Variable |  Factor1   Factor2 |
-------------+--------------------+
    notenjoy |  -0.3118    0.5870 |
     notmiss |  -0.3498    0.6155 |
desireexceed |  -0.1919    0.8381 |
personalpe~m |  -0.2269    0.7345 |
importants~l |   0.5682   -0.1748 |
 groupunited |   0.8184   -0.1212 |
responsibi~y |   0.9233   -0.1968 |
    interact |   0.6238   -0.2227 |
problemshelp |   0.8817   -0.2060 |
  notdiscuss |  -0.0308    0.4165 |
  workharder |  -0.1872    0.5647 |
------------------------------------

Loadings: represent correlations between item and factor

• High loadings: define a factor
• Low loadings: item does not “load” on factor
• Easy to skim the loadings
• This example:
  • factor 1 is defined by G5, G6, G7, G8, G9
  • factor 2 is defined by G1, G2, G3, G4, G10, G11

Other things to note:
• factors are ‘independent’ (usually)
• we need to ‘name’ factors
• important to check their face validity.
• These factors can now be ‘calculated’ using this model
• Each person is assigned a factor score for each factor
• Range between -1 and 1

High loadings are highlighted in yellow.
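The "skim the loadings" step can be sketched in a few lines of Python. This is only an illustration: the 0.40 cutoff is a common rule of thumb and an assumption here (the slides do not state a threshold), and the G1 to G11 labels follow the order of the cohesion items listed earlier.

```python
# Rotated loadings (Factor1, Factor2) copied from the table above,
# keyed by item label G1..G11 in the order the items were listed.
LOADINGS = {
    "G1": (-0.3118, 0.5870),   # notenjoy
    "G2": (-0.3498, 0.6155),   # notmiss
    "G3": (-0.1919, 0.8381),   # desireexceed
    "G4": (-0.2269, 0.7345),   # personalperform
    "G5": (0.5682, -0.1748),   # importantsocial
    "G6": (0.8184, -0.1212),   # groupunited
    "G7": (0.9233, -0.1968),   # responsibility
    "G8": (0.6238, -0.2227),   # interact
    "G9": (0.8817, -0.2060),   # problemshelp
    "G10": (-0.0308, 0.4165),  # notdiscuss
    "G11": (-0.1872, 0.5647),  # workharder
}

# Assumption: 0.40 is a rule-of-thumb cutoff, not a value given in the slides.
CUTOFF = 0.40


def assign_items(loadings, cutoff=CUTOFF):
    """Assign each item to the factor on which its absolute loading is largest."""
    groups = {1: [], 2: []}
    for item, row in loadings.items():
        best = max(range(len(row)), key=lambda k: abs(row[k]))
        if abs(row[best]) >= cutoff:
            groups[best + 1].append(item)
    return groups


groups = assign_items(LOADINGS)
for factor, items in groups.items():
    print(f"Factor {factor}: {', '.join(items)}")
```

With this cutoff the grouping matches the one stated on the slide; G10 (notdiscuss) only just clears the threshold, so a slightly stricter cutoff would leave it unassigned.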

Page 12: Factor Analysis

How to interpret?

------------------------------------
    Variable |  Factor1   Factor2 |
-------------+--------------------+
    notenjoy |  -0.3118    0.5870 |
     notmiss |  -0.3498    0.6155 |
desireexceed |  -0.1919    0.8381 |
personalpe~m |  -0.2269    0.7345 |
importants~l |   0.5682   -0.1748 |
 groupunited |   0.8184   -0.1212 |
responsibi~y |   0.9233   -0.1968 |
    interact |   0.6238   -0.2227 |
problemshelp |   0.8817   -0.2060 |
  notdiscuss |  -0.0308    0.4165 |
  workharder |  -0.1872    0.5647 |
------------------------------------

Authors may conclude something like:

“We were able to derive two factors from the 11 items. The first factor is defined as ‘teamwork.’ The second factor is defined as ‘personal competitive nature.’ These two factors describe 72% of the variance among the items.”

High loadings are highlighted in yellow.

Page 13: Factor Analysis

Where did the results come from?

Based on the basic “Classical Test Theory Idea”:

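For a case with just one factor:

Ideal:

$$X_1 = F + e_1, \quad X_2 = F + e_2, \quad \ldots, \quad X_m = F + e_m, \qquad \mathrm{var}(e_j) = \mathrm{var}(e_k),\ j \neq k$$

Reality:

$$X_1 = \lambda_1 F + e_1, \quad X_2 = \lambda_2 F + e_2, \quad \ldots, \quad X_m = \lambda_m F + e_m, \qquad \mathrm{var}(e_j) \neq \mathrm{var}(e_k),\ j \neq k$$

(unequal “sensitivity” to change in factor)
(Related to Item Response Theory (IRT))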
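A short worked consequence of the one-factor model may help connect it to the loadings shown earlier. It rests on assumptions the slide does not spell out: the factor F is scaled to variance 1, and the errors are uncorrelated with F and with each other.

$$\mathrm{var}(X_j) = \lambda_j^2 + \mathrm{var}(e_j), \qquad \mathrm{cov}(X_j, F) = \lambda_j, \qquad \mathrm{cov}(X_j, X_k) = \lambda_j \lambda_k \quad (j \neq k)$$

If each item is also standardized to variance 1, then $\lambda_j$ is the correlation between item $X_j$ and the factor, and the correlation between two items is the product of their loadings. Under those assumptions, all of the correlation among the items is carried by the shared factor.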

Page 14: Factor Analysis

Multi-Factor Models

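Two factor orthogonal model

ORTHOGONAL = INDEPENDENT

Example: cohesion has two domains

$$\begin{aligned}
X_1 &= \lambda_{11} F_1 + \lambda_{12} F_2 + e_1 \\
X_2 &= \lambda_{21} F_1 + \lambda_{22} F_2 + e_2 \\
&\;\;\vdots \\
X_{11} &= \lambda_{11,1} F_1 + \lambda_{11,2} F_2 + e_{11}
\end{aligned}$$

More generally, m factors, n observed variables

$$\begin{aligned}
X_1 &= \lambda_{11} F_1 + \lambda_{12} F_2 + \cdots + \lambda_{1m} F_m + e_1 \\
X_2 &= \lambda_{21} F_1 + \lambda_{22} F_2 + \cdots + \lambda_{2m} F_m + e_2 \\
&\;\;\vdots \\
X_n &= \lambda_{n1} F_1 + \lambda_{n2} F_2 + \cdots + \lambda_{nm} F_m + e_n
\end{aligned}$$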

Page 15: Factor Analysis

Loadings (estimated) in our example

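$$\begin{pmatrix}
\lambda_{11} & \lambda_{12} \\
\lambda_{21} & \lambda_{22} \\
\lambda_{31} & \lambda_{32} \\
\lambda_{41} & \lambda_{42} \\
\lambda_{51} & \lambda_{52} \\
\lambda_{61} & \lambda_{62} \\
\lambda_{71} & \lambda_{72} \\
\lambda_{81} & \lambda_{82} \\
\lambda_{91} & \lambda_{92} \\
\lambda_{10,1} & \lambda_{10,2} \\
\lambda_{11,1} & \lambda_{11,2}
\end{pmatrix}
=
\begin{pmatrix}
-0.31 & 0.59 \\
-0.35 & 0.62 \\
-0.19 & 0.84 \\
-0.23 & 0.73 \\
0.57 & -0.17 \\
0.82 & -0.12 \\
0.92 & -0.20 \\
0.62 & -0.22 \\
0.88 & -0.21 \\
-0.03 & 0.42 \\
-0.19 & 0.56
\end{pmatrix}$$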
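One way to see what the estimated loadings imply is to compute the correlation between two items that the model would predict. The sketch below assumes an orthogonal model with standardized items and uncorrelated errors, in which the implied correlation between two different items is the sum over factors of the products of their loadings. The two rows are the four-decimal rotated loadings for groupunited and responsibility from the Standard Result table.

```python
# Rotated loadings (Factor1, Factor2) for two items, from the Standard Result table.
groupunited = (0.8184, -0.1212)
responsibility = (0.9233, -0.1968)


def implied_correlation(row_j, row_k):
    """Model-implied correlation between two different items under an
    orthogonal factor model: sum over factors of the loading products."""
    return sum(a * b for a, b in zip(row_j, row_k))


r_implied = implied_correlation(groupunited, responsibility)
print(round(r_implied, 4))
```

The implied value is about 0.78. Comparing values like this with the observed correlations between the same items is one informal check of how well a two-factor model reproduces the data.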

Page 16: Factor Analysis

The factor analysis process

• Multiple steps
• “Stepwise optimal”
  • many choices to be made!
  • a choice at one step may impact the remaining decisions
  • considerable subjectivity
• Data exploration is key
• Strong theoretical model is critical

Page 17: Factor Analysis

Steps in Exploratory Factor Analysis

(1) Collect and explore data: choose relevant variables.
(2) Determine the number of factors
(3) Estimate the model using predefined number of factors
(4) Rotate and interpret
(5) (a) Decide if changes need to be made (e.g. drop item(s), include item(s))
    (b) repeat (3)-(4)
(6) Construct scales and use in further analysis

Page 18: Factor Analysis

Data Exploration

Histograms
• normality
• discreteness
• outliers

Covariance and correlations between variables
• very high or low correlations?

Same scale: high = good, low = bad?
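One common way to put negatively and positively worded items on the same scale is reverse coding. The sketch below is an illustration only: the slides do not say that the cohesion items were reverse coded, the helper name `reverse_code` is hypothetical, and the responses are made up.

```python
def reverse_code(response, low=1, high=5):
    """Flip a Likert response so the ends of the scale swap
    (1 <-> 5, 2 <-> 4, 3 stays 3 on a 1-5 scale)."""
    if not low <= response <= high:
        raise ValueError(f"response {response} is outside {low}-{high}")
    return low + high - response


# Made-up responses to a negatively worded item, for illustration only.
raw = [1, 2, 3, 4, 5]
recoded = [reverse_code(r) for r in raw]
print(recoded)
```

Reverse coding changes only the sign of an item's correlations with other items, so it affects the signs of loadings rather than their size.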

Page 19: Factor Analysis

Data exploration

[Figure: histograms of responses (x-axis: 1 to 5; y-axis: Frequency) for the 11 items NotEnjoyPOST, NotMissPOST, DesireExceedPOST, PersonalPerformPOST, ImportantSocialPOST, GroupUnitedPOST, ResponsibilityPOST, InteractPOST, ProblemsHelpPOST, NotDiscussPOST and WorkHarderPOST]

Page 20: Factor Analysis

Correlation Matrix

. pwcorr notenjoy-workharder

             | notenjoy  notmiss desire~d person~m import~l groupu~d respon~y
-------------+---------------------------------------------------------------
    notenjoy |   1.0000
     notmiss |   0.3705   1.0000
desireexceed |   0.2609   0.3987   1.0000
personalpe~m |   0.2552   0.3472   0.5946   1.0000
importants~l |  -0.2514  -0.3357  -0.1384  -0.3123   1.0000
 groupunited |  -0.1732  -0.2460  -0.2384  -0.1359   0.4364   1.0000
responsibi~y |  -0.2554  -0.3663  -0.2908  -0.2507   0.4399   0.8016   1.0000
    interact |  -0.1847  -0.2966  -0.2162  -0.2294   0.4415   0.4251   0.5174
problemshelp |  -0.2561  -0.2865  -0.2567  -0.1940   0.4159   0.6498   0.7748
  notdiscuss |   0.1610   0.0763   0.2253   0.2193  -0.0242   0.0027  -0.0598
  workharder |   0.3482   0.1606   0.3794   0.3848  -0.0010  -0.2765  -0.3083

             | interact proble~p notdis~s workha~r
-------------+------------------------------------
    interact |   1.0000
problemshelp |   0.5446   1.0000
  notdiscuss |  -0.0346  -0.0699   1.0000
  workharder |  -0.1063  -0.2358   0.2660   1.0000

Page 21: Factor Analysis

Valid correlations?

[Figure: grid of 15 pairwise scatterplots of jittered item responses, jitter(data[, 184 + i]) on the x-axis against jitter(data[, 184 + j]) on the y-axis; both axes run from 1 to 5]

Page 22: Factor Analysis

Data Matrix

Factor analysis is totally dependent on correlations between variables.

Factor analysis summarizes correlation structure

[Diagram: Data Matrix (rows: observations O1 … On; columns: variables v1 … vk) → Correlation Matrix (rows and columns: v1 … vk) → Factor Matrix (rows: v1 … vk; columns: factors F1 … Fj)]

Implications for assumptions about X’s?

Page 23: Factor Analysis

Important implications

• Correlation matrix must be valid measure of association
• Likert scale? i.e. “on a scale of 1 to K?”
• Consider previous set of plots
• Is Pearson (linear) correlation a reasonable measure of association?

Page 24: Factor Analysis

Correlation for categorical items

• Odds ratios? Nope. On the wrong scale.
• Need measures on scale of -1 to 1, with zero meaning no association
• Solutions:
  • tetrachoric correlation: for binary items
  • polychoric correlation: for ordinal items
• -’choric correlations
  • assume that variables are truncated (categorized) versions of continuous variables
  • only appropriate if ‘continuous underlying’ assumption makes sense
• Not available in many software packages for factor analysis!
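The ‘continuous underlying’ assumption can be pictured as a set of cut points on a latent continuous scale: a respondent's latent value falls between two cut points and is reported as the corresponding category. The sketch below only illustrates that idea; the cut points are hypothetical, and it does not estimate a polychoric correlation.

```python
from bisect import bisect_right

# Hypothetical cut points on the latent continuous scale (illustration only).
THRESHOLDS = [-1.5, -0.5, 0.5, 1.5]


def to_category(latent_value, thresholds=THRESHOLDS):
    """Map a latent continuous value to an ordinal category 1..len(thresholds)+1."""
    return bisect_right(thresholds, latent_value) + 1


for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(z, "->", to_category(z))
```

A polychoric correlation is then an estimate of the correlation between two such latent variables, given only the observed categories.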

Page 25: Factor Analysis

Polychoric Correlation Matrix

                    notenjoy      notmiss  desireexceed
       notenjoy            1
        notmiss    .64411349            1
   desireexceed    .44814752    .60971951             1
personalperform    .37687346    .49572253     .74640077
importantsocial   -.33466689   -.35262233    -.18773414
    groupunited   -.26640575   -.25987331    -.32414348
 responsibility   -.38218019   -.43174724    -.34289848
       interact   -.31300025   -.41147172    -.28711931
   problemshelp   -.40864072   -.44688816    -.34338549
     notdiscuss    .28367782     .2071563     .33714715
     workharder    .49864257    .26866894     .50117974

                 personalperform  importantsocial  groupunited
personalperform                1
importantsocial       -.42902852                1
    groupunited       -.22011768        .47698468            1
 responsibility       -.32272048        .49187407    .85603168
       interact       -.37003374        .51150655    .46469124
   problemshelp       -.31435615        .51458893    .75552992
     notdiscuss        .28191066       -.07289447    -.0934676
     workharder         .4766736        .02547056   -.35603256

                 responsibility     interact  problemshelp
 responsibility               1
       interact       .59252523            1
   problemshelp       .84727982    .60910395             1
     notdiscuss      -.11548039   -.09653691    -.11580359
     workharder      -.37311526   -.13316066    -.30122735

                 notdiscuss  workharder
     notdiscuss           1
     workharder    .3471915           1

Page 26: Factor Analysis

Polychoric Correlation in Stata

. findit polychoric

. polychoric notenjoy-workharder

. matrix R = r(R)

Page 27: Factor Analysis

Choosing Number of Factors

• Intuitively: The number of uncorrelated constructs that are jointly measured by the X’s.
• Only useful if number of factors is less than number of X’s (recall “data reduction”).
• Use “principal components” to help decide
  • type of factor analysis
  • number of factors is equivalent to number of variables
  • each factor is a weighted combination of the input variables:
    F1 = a11X1 + a12X2 + ….
  • Recall: For a factor analysis, generally,
    X1 = a11F1 + a12F2 +...

Page 28: Factor Analysis

Eigenvalues

To select how many factors to use, consider eigenvalues from a principal components analysis

Two interpretations:
• eigenvalue: the equivalent number of variables which the factor represents
• eigenvalue: the amount of variance in the data described by the factor

Rules to go by:
• number of eigenvalues > 1
• scree plot
• % variance explained
• comprehensibility

Note: sum of eigenvalues is equal to the number of items

Page 29: Factor Analysis

Cohesion Example

. factormat R, pcf n(134)
(obs=134)

Factor analysis/correlation                    Number of obs    =      134
    Method: principal-component factors        Retained factors =        3
    Rotation: (unrotated)                      Number of params =       30

--------------------------------------------------------------------------
      Factor |   Eigenvalue   Difference        Proportion   Cumulative
-------------+------------------------------------------------------------
     Factor1 |      4.96356      3.14606            0.4512       0.4512
     Factor2 |      1.81751      0.76378            0.1652       0.6165
     Factor3 |      1.05373      0.27749            0.0958       0.7123
     Factor4 |      0.77624      0.02065            0.0706       0.7828
     Factor5 |      0.75559      0.22587            0.0687       0.8515
     Factor6 |      0.52972      0.05654            0.0482       0.8997
     Factor7 |      0.47318      0.24670            0.0430       0.9427
     Factor8 |      0.22647      0.02484            0.0206       0.9633
     Factor9 |      0.20163      0.07341            0.0183       0.9816
    Factor10 |      0.12822      0.05407            0.0117       0.9933
    Factor11 |      0.07415            .            0.0067       1.0000
--------------------------------------------------------------------------
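The eigenvalue rules can be checked directly against this output. The sketch below takes the eigenvalues as printed and reproduces three things: their sum (which should equal the number of items), the count of eigenvalues above 1, and the cumulative proportion of variance. It is a reading aid, not part of the original analysis.

```python
# Eigenvalues copied from the principal-component factor output above.
EIGENVALUES = [4.96356, 1.81751, 1.05373, 0.77624, 0.75559, 0.52972,
               0.47318, 0.22647, 0.20163, 0.12822, 0.07415]

n_items = len(EIGENVALUES)
total = sum(EIGENVALUES)

# "Eigenvalue > 1" rule: count the factors that represent more than one item's worth of variance.
n_kaiser = sum(1 for ev in EIGENVALUES if ev > 1)

# Cumulative proportion of variance described by the first k factors.
cumulative = []
running = 0.0
for ev in EIGENVALUES:
    running += ev
    cumulative.append(running / total)

print(f"sum of eigenvalues = {total:.4f} (number of items = {n_items})")
print(f"eigenvalues > 1: {n_kaiser}")
for k, c in enumerate(cumulative[:3], start=1):
    print(f"first {k} factor(s): {c:.4f} of the variance")
```

The eigenvalue rule alone would retain three factors here, which is why Stata reports "Retained factors = 3"; the third eigenvalue (1.05) is only just above the cutoff, so the scree plot and interpretability also matter.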

Page 30: Factor Analysis

Scree Plot for Cohesion Example

[Figure: scree plot of eigenvalues after factor; y-axis: Eigenvalues (0 to 5), x-axis: Number (0 to 10)]

. screeplot

Page 31: Factor Analysis

Choose two factors: Now fit the model

. factormat R, n(134) ipf factor(2)
(obs=134)

Factor analysis/correlation                    Number of obs    =      134
    Method: iterated principal factors         Retained factors =        2
    Rotation: (unrotated)                      Number of params =       21

.........

Factor loadings (pattern matrix) and unique variances

-------------------------------------------------
    Variable |  Factor1   Factor2 |   Uniqueness
-------------+--------------------+--------------
    notenjoy |  -0.6091    0.2661 |      0.5582
     notmiss |  -0.6566    0.2648 |      0.4988
desireexceed |  -0.6712    0.5373 |      0.2608
personalpe~m |  -0.6342    0.4344 |      0.4091
importants~l |   0.5538    0.2162 |      0.6466
 groupunited |   0.7164    0.4137 |      0.3156
responsibi~y |   0.8456    0.4197 |      0.1088
    interact |   0.6271    0.2132 |      0.5613
problemshelp |   0.8187    0.3866 |      0.1802
  notdiscuss |  -0.2830    0.3072 |      0.8256
  workharder |  -0.4977    0.3260 |      0.6461
-------------------------------------------------

Page 32: Factor Analysis

Interpretability?

• Not interpretable at this stage
• In an unrotated solution, the first factor describes most of variability.
• Ideally we want to
  • spread variability more evenly among factors.
  • make factors interpretable
• To do this we “rotate” factors:
  • redefine factors such that loadings on various factors tend to be very high (-1 or 1) or very low (0)
  • intuitively, it makes sharper distinctions in the meanings of the factors
  • We use “factor analysis” for rotation NOT principal components!
• Rotation does NOT improve fit!
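Why rotation cannot improve fit can be seen numerically: an orthogonal rotation changes each item's individual loadings but leaves the sum of their squares (the part of the item's variance explained by the factors) unchanged. The sketch below uses three rows of the unrotated loadings from the two-factor fit above. The 30-degree angle is arbitrary and chosen only for illustration; it is not the angle Stata's rotation would select.

```python
import math

# Unrotated loadings (Factor1, Factor2) for three items, from the two-factor fit above.
UNROTATED = {
    "notenjoy": (-0.6091, 0.2661),
    "responsibi~y": (0.8456, 0.4197),
    "workharder": (-0.4977, 0.3260),
}


def rotate(row, theta):
    """Apply an orthogonal rotation by angle theta (radians) to one item's loadings."""
    a, b = row
    return (a * math.cos(theta) - b * math.sin(theta),
            a * math.sin(theta) + b * math.cos(theta))


def communality(row):
    """Share of the item's variance explained by the factors: sum of squared loadings."""
    return sum(x * x for x in row)


theta = math.radians(30)  # arbitrary angle, for illustration only
for item, row in UNROTATED.items():
    new_row = rotate(row, theta)
    print(item,
          "before:", round(communality(row), 4),
          "after:", round(communality(new_row), 4))
```

Because the explained variance per item is the same before and after, rotation is a choice about interpretability, not about how well the model describes the data.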

Page 33: Factor Analysis

Rotating Factors (Intuitively)

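[Figure: items 1 to 4 plotted against the original axes F1 and F2, and against the rotated axes F1 and F2]

Original axes:

      Factor 1   Factor 2
x1       0.5        0.5
x2       0.8        0.8
x3      -0.7        0.7
x4      -0.5       -0.5

Rotated axes:

      Factor 1   Factor 2
x1       0          0.6
x2       0          0.9
x3      -0.9        0
x4       0         -0.9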

Page 34: Factor Analysis

Rotated Solution

. rotate

Factor analysis/correlation                    Number of obs    =      134
    Method: iterated principal factors         Retained factors =        2
    Rotation: orthogonal varimax (Kaiser off)  Number of params =       21

--------------------------------------------------------------------------
      Factor |     Variance   Difference        Proportion   Cumulative
-------------+------------------------------------------------------------
     Factor1 |      3.35544      0.72180            0.5603       0.5603
     Factor2 |      2.63364            .            0.4397       1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated: chi2(55) = 959.26 Prob>chi2 = 0.0000

Rotated factor loadings (pattern matrix) and unique variances

-------------------------------------------------
    Variable |  Factor1   Factor2 |   Uniqueness
-------------+--------------------+--------------
    notenjoy |  -0.3118    0.5870 |      0.5582
     notmiss |  -0.3498    0.6155 |      0.4988
desireexceed |  -0.1919    0.8381 |      0.2608
personalpe~m |  -0.2269    0.7345 |      0.4091
importants~l |   0.5682   -0.1748 |      0.6466
 groupunited |   0.8184   -0.1212 |      0.3156
responsibi~y |   0.9233   -0.1968 |      0.1088
    interact |   0.6238   -0.2227 |      0.5613
problemshelp |   0.8817   -0.2060 |      0.1802
  notdiscuss |  -0.0308    0.4165 |      0.8256
  workharder |  -0.1872    0.5647 |      0.6461
-------------------------------------------------
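The "Variance" column in the rotated output can be reproduced from the loadings: for an orthogonal rotation, the variance attributed to a factor is the sum of the squared loadings in its column. The sketch below recomputes both values from the rotated loadings as printed, so small differences from Stata's figures reflect rounding to four decimals.

```python
# Rotated loadings (Factor1, Factor2) in table order, from the output above.
ROTATED = [
    (-0.3118, 0.5870),   # notenjoy
    (-0.3498, 0.6155),   # notmiss
    (-0.1919, 0.8381),   # desireexceed
    (-0.2269, 0.7345),   # personalpe~m
    (0.5682, -0.1748),   # importants~l
    (0.8184, -0.1212),   # groupunited
    (0.9233, -0.1968),   # responsibi~y
    (0.6238, -0.2227),   # interact
    (0.8817, -0.2060),   # problemshelp
    (-0.0308, 0.4165),   # notdiscuss
    (-0.1872, 0.5647),   # workharder
]

# Variance attributed to each factor: column sum of squared loadings.
variance = [sum(row[k] ** 2 for row in ROTATED) for k in range(2)]

# Share of the variance captured by the two retained factors together.
proportion = [v / sum(variance) for v in variance]

print("variance:  ", [round(v, 3) for v in variance])
print("proportion:", [round(p, 3) for p in proportion])
```

Note that the "Proportion" column in this output sums to 1 over the two retained factors, so it describes how the explained variance is split between them, not the share of the total variance of all 11 items.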

Page 35: Factor Analysis

Rotation options

“Orthogonal”
• maintains independence of factors
• more commonly seen
• usually at least one option
• Stata: varimax, quartimax, equamax, parsimax, etc.

“Oblique”
• allows dependence of factors
• makes distinctions sharper (loadings closer to 0’s and 1’s)
• can be harder to interpret once you lose independence of factors

Page 36: Factor Analysis

Uniqueness

• Should all items be retained?
• Uniqueness for each item describes the proportion of the item’s variance that is NOT described by the factor model
• Recall an R-squared:
  • proportion of variance in Y explained by X
• 1-Uniqueness:
  • proportion of the variance in Xk explained by F1, F2, etc.
• Uniqueness:
  • represents what is left over that is not explained by factors
  • “error” that remains
• A GOOD item has a LOW uniqueness
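The relationship between loadings and uniqueness can be checked against the rotated solution. Assuming orthogonal factors (as in the varimax solution), uniqueness is 1 minus the sum of an item's squared loadings. The three rows below are copied from the rotated loadings table, with the reported uniqueness as the third value.

```python
# (Factor1 loading, Factor2 loading, reported uniqueness) from the rotated solution.
ROWS = {
    "responsibi~y": (0.9233, -0.1968, 0.1088),
    "desireexceed": (-0.1919, 0.8381, 0.2608),
    "notdiscuss": (-0.0308, 0.4165, 0.8256),
}


def uniqueness(loadings):
    """1 minus the sum of squared loadings; valid for orthogonal factors."""
    return 1.0 - sum(l * l for l in loadings)


for item, (f1, f2, reported) in ROWS.items():
    print(item, "computed:", round(uniqueness((f1, f2)), 4), "reported:", reported)
```

The item notdiscuss stands out: more than 80% of its variance is left unexplained by the two factors, which makes it a natural candidate for removal.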

Page 37: Factor Analysis

Our current model?

Rotated factor loadings (pattern matrix) and unique variances

-------------------------------------------------
    Variable |  Factor1   Factor2 |   Uniqueness
-------------+--------------------+--------------
    notenjoy |  -0.3118    0.5870 |      0.5582
     notmiss |  -0.3498    0.6155 |      0.4988
desireexceed |  -0.1919    0.8381 |      0.2608
personalpe~m |  -0.2269    0.7345 |      0.4091
importants~l |   0.5682   -0.1748 |      0.6466
 groupunited |   0.8184   -0.1212 |      0.3156
responsibi~y |   0.9233   -0.1968 |      0.1088
    interact |   0.6238   -0.2227 |      0.5613
problemshelp |   0.8817   -0.2060 |      0.1802
  notdiscuss |  -0.0308    0.4165 |      0.8256
  workharder |  -0.1872    0.5647 |      0.6461
-------------------------------------------------

Page 38: Factor Analysis

Revised without “notdiscuss”

Rotated factor loadings (pattern matrix) and unique variances

-------------------------------------------------
    Variable |  Factor1   Factor2 |   Uniqueness
-------------+--------------------+--------------
    notenjoy |  -0.3093    0.5811 |      0.5667
     notmiss |  -0.3345    0.6455 |      0.4715
desireexceed |  -0.1783    0.8483 |      0.2486
personalpe~m |  -0.2119    0.7551 |      0.3849
importants~l |   0.5618   -0.2057 |      0.6420
 groupunited |   0.8265   -0.1271 |      0.3008
responsibi~y |   0.9247   -0.2089 |      0.1012
    interact |   0.6160   -0.2469 |      0.5596
problemshelp |   0.8784   -0.2224 |      0.1789
  workharder |  -0.2023    0.5271 |      0.6813
-------------------------------------------------

Page 39: Factor Analysis

Methods for Estimating Model

• Principal Components (already discussed)
• Principal Factor Method
• Iterated Principal Factor / Least Squares
• Maximum Likelihood (ML)

Most common(?): ML and Least Squares

Unfortunately, default is often not the best approach!

Caution! ipf and ml may not converge to the right answer! Look for uniqueness of 0 or 1. Problem of “identifiability” or getting “stuck.”

Page 40: Factor Analysis

Interpretation

Naming of Factors

Wrong Interpretation: Factors represent separate groups of people.

Right Interpretation: Each factor represents a continuum along which people vary (and the dimensions are independent if the rotation is orthogonal)

Page 41: Factor Analysis

Factor Scores and Scales

• Each object (e.g. each cancer survivor) gets a factor score for each factor.
• Old data vs. New data
• The factors themselves are variables
• An individual’s score is a weighted combination of scores on input variables
• These weights are NOT the factor loadings!
• Loadings and weights determined simultaneously so that there is no correlation between resulting factors.

Page 42: Factor Analysis

Factor Scoring

. predict f1 f2
(regression scoring assumed)

Scoring coefficients (method = regression; based on varimax rotated factors)

----------------------------------
    Variable |  Factor1   Factor2
-------------+--------------------
    notenjoy | -0.03322   0.19223
     notmiss |  0.04725   0.13279
desireexceed |  0.15817   0.54996
personalpe~m | -0.04037   0.21452
importants~l |  0.02971  -0.02168
 groupunited |  0.12273   0.12938
responsibi~y |  0.60379   0.07719
    interact |  0.04594  -0.00870
problemshelp |  0.31516   0.06376
  workharder |  0.11750   0.10810
----------------------------------

Why different than loadings?

Factors are generally scaled to have variance 1.

Mean is arbitrary.*

* If based on Pearson correlation, mean will be zero.
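A factor score is a weighted sum of a person's item scores, with the scoring coefficients (not the loadings) as weights. The sketch below uses the Factor 1 coefficients from the table above. It assumes the coefficients are applied to standardized item scores, which is an assumption about the scoring step rather than something the slides state, and the person shown is hypothetical.

```python
# Regression scoring coefficients for Factor 1, from the table above.
WEIGHTS_F1 = {
    "notenjoy": -0.03322,
    "notmiss": 0.04725,
    "desireexceed": 0.15817,
    "personalpe~m": -0.04037,
    "importants~l": 0.02971,
    "groupunited": 0.12273,
    "responsibi~y": 0.60379,
    "interact": 0.04594,
    "problemshelp": 0.31516,
    "workharder": 0.11750,
}


def factor_score(z_scores, weights):
    """Weighted sum of (assumed standardized) item scores."""
    return sum(weights[item] * z_scores[item] for item in weights)


# Hypothetical person: one standard deviation above the mean on
# responsibility, exactly at the mean on every other item.
person = {item: 0.0 for item in WEIGHTS_F1}
person["responsibi~y"] = 1.0

score = factor_score(person, WEIGHTS_F1)
print(round(score, 5))
```

In this sketch the score moves by exactly the responsibility coefficient, which shows how heavily that single item drives the Factor 1 score compared with, say, importantsocial.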

Page 43: Factor Analysis

Orthogonal (i.e., independent)?

[Figure: scatterplot of scores for factor 1 (vertical axis, 2 to 7) against scores for factor 2 (horizontal axis, 1 to 6)]
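Besides inspecting the scatterplot, the correlation between the two sets of factor scores can be computed directly. The sketch below is a plain Pearson correlation written with the standard library; the scores are made-up values chosen to be exactly uncorrelated, not the study data.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up factor scores for four people (illustration only).
f1 = [-1.0, 1.0, -1.0, 1.0]
f2 = [-1.0, -1.0, 1.0, 1.0]

print(pearson(f1, f2))
```

A correlation near zero is consistent with orthogonal factors, although estimated scores can be somewhat correlated even when the factors in the model are not.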

Page 44: Factor Analysis

Teamwork (Factor 1) by Program

[Figure: box plots of scores for factor 1 (vertical axis, 2 to 7) for programs 1 and 2 (Dragon Boat, Walking); graphs by progrm]

Page 45: Factor Analysis

Personal Competitive Nature (Factor 2) by Program

[Figure: box plots of scores for factor 2 (vertical axis, 1 to 6) for programs 1 and 2 (Dragon Boat, Walking); graphs by progrm]

Page 46: Factor Analysis

Criticisms of Factor Analysis

• Labels of factors can be arbitrary or lack scientific basis
• Derived factors often very obvious
  • defense: but we get a quantification
• “Garbage in, garbage out”
  • really a criticism of input variables
  • factor analysis reorganizes input matrix
• Too many steps that could affect results
• Too complicated
• Correlation matrix is often poor measure of association of input variables.

Page 47: Factor Analysis

Our example?

• Preliminary analysis of pilot data!
• Concern: negative items “hang together”, positive items “hang together”
• Is separation into two factors:
  • based on two different factors (teamwork, pers. comp. nature)
  • based on negative versus positive items?
• Recall: the computer will always give you something!
• Validity?
  • boxplots of factor 1 suggest something
  • additional reliability and validity needs to be considered

Page 48: Factor Analysis

Stata Code

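* Pearson and polychoric correlations for all 11 items
pwcorr notenjoy-workharder
polychoric notenjoy-workharder
matrix R = r(R)

* Principal-component factors and scree plot to choose the number of factors
factormat R, pcf n(134)
screeplot

* Two-factor model by iterated principal factors, then rotate
factormat R, n(134) ipf factor(2)
rotate

* Refit without notdiscuss, then score and plot the factors
polychoric notenjoy notmiss desire personal important group respon interact problem workharder
matrix R = r(R)
factormat R, n(134) ipf factor(2)
rotate
predict f1 f2
scatter f1 f2
graph box f1, by(progrm)
graph box f2, by(progrm)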

Page 49: Factor Analysis

Stata Code for Pearson Correlation

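* Principal-component factors and scree plot on the raw items
factor notenjoy-workharder, pcf
screeplot

* Two-factor model by iterated principal factors, then rotate
factor notenjoy-workharder, ipf factor(2)
rotate

* Refit without notdiscuss, then score and plot the factors
factor notenjoy notmiss desire personal important group respon interact problem workharder, ipf factor(2)
rotate
predict f1 f2
scatter f1 f2
graph box f1, by(progrm)
graph box f2, by(progrm)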

Page 50: Factor Analysis

Stata Options

Pearson correlation
• Use factor for principal components and factor analysis
  • choose estimation approach: ipf, pcf, ml, pf
  • choose to retain n factors: factor(n)

Polychoric correlation
• Use factormat for principal components and factor analysis
  • choose estimation approach: ipf, pcf, ml, pf
  • choose to retain n factors: factor(n)
  • include n(xxx) to describe the sample size

Scree Plot: screeplot

Rotate: choose rotation type: varimax (default), promax, etc.

Create factor variables
• predict: list as many new variable names as there are retained factors.
• Example: for 3 retained factors, predict teamwork competition hardworks

