Principal Components and Factor Analysis

Principal components

Intro
• In many disciplines we study phenomena or constructs that cannot be directly measured
  ▫ (self-esteem, personality, intelligence)
• We often must take multiple observations for each case, and in the end we may have more data than can be readily interpreted
  ▫ Items are representations of underlying or latent factors. We want to know what these factors are
  ▫ We have an idea of the phenomena that a set of items represents (construct validity)
• Because of this, we'll want to "reduce" them to a smaller set of factors
Purpose of PCA/FA
• To find underlying latent constructs
  ▫ As manifested in multiple items/variables
• To assess the association between multiple factors
• To produce usable scores that reflect critical aspects of a complex phenomenon
• As an end in itself, and as a major step toward creating error-free measures
• These problems can be addressed with factor analysis (in the general sense)
• The goal is to explain a set of data with fewer dimensions than the total number of observed variables
• It identifies linear combinations of variables that will account for the variance in the data set
Basic Concept
• If two items are highly correlated
  ▫ They may represent the same phenomenon
  ▫ If they tell us about the same underlying variance, combining them to form a single measure is reasonable for two reasons
     Parsimony
     Reduction in error
• Suppose one is just a little better than the other at representing this underlying phenomenon?
• And suppose you have 3 variables, or 4, or 5, or 100?
• FACTOR ANALYSIS (in the general sense) looks for the phenomena underlying the observed variance and covariance in a set of variables
• These phenomena are called "factors" or "principal components"
[Path diagram: two latent factors — F1 Substance Use, with indicators X1 Alcohol Use, X2 Marijuana Use, X3 Hard Drug Use; and F2 Psychosocial Functioning, with indicators Y1 Distress, Y2 Self-Esteem, Y3 Powerlessness. Example of PCA/FA: each factor has 3 main (bold-faced) loadings and 3 inconsequential (dashed-line) loadings.]
PCA/FA
• While often used similarly, PCA and FA are distinct from one another
• Principal Components Analysis
  ▫ Extracts all the components underlying a set of variables
  ▫ The number of components = the number of variables
  ▫ Completely explains the variance in each variable
• Factor Analysis
  ▫ Analyzes only the shared variance
     Error is estimated apart from shared variance
FA vs. PCA conceptually
• FA produces factors; PCA produces components
• Factors cause variables; components are aggregates of the variables
• The underlying causal model is fundamentally distinct between the two
  ▫ Some do not consider PCA part of the FA family*

[Diagrams: FA — arrows run from the factor to items I1, I2, I3 (the factor causes the items); PCA — arrows run from items I1, I2, I3 to the component (the component aggregates the items).]
Contrasting the underlying models*
• PCA
  ▫ Extraction is the process of forming PCs as linear combinations of the measured variables, as we have done with our other techniques

     PC1 = b11X1 + b21X2 + … + bk1Xk
     PC2 = b12X1 + b22X2 + … + bk2Xk
     PCf = b1fX1 + b2fX2 + … + bkfXk

• Common factor model
  ▫ Each measure X has two contributing sources of variation: the common factor ξ and the specific or unique factor δ:

     X1 = λ1ξ + δ1
     X2 = λ2ξ + δ2
     Xf = λfξ + δf
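The PCA side of this contrast can be sketched numerically: the b-weights defining each component come from an eigendecomposition of the correlation matrix. A minimal NumPy illustration (the correlation matrix here is made up for the sketch, not from the slides' data; the deck's own tools are SPSS and R):

```python
import numpy as np

# Hypothetical 3-variable correlation matrix, for illustration only
R = np.array([[1.00, 0.60, 0.50],
              [0.60, 1.00, 0.40],
              [0.50, 0.40, 1.00]])

# PCA extraction: the eigenvectors of R are the b-weights that define
# each component as a linear combination, PCj = b1j*X1 + b2j*X2 + b3j*X3
eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending order
order = np.argsort(eigvals)[::-1]          # re-sort largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# The variance of each component equals its eigenvalue,
# and the eigenvalues sum to the number of variables
print(eigvals)          # component variances, largest first
print(eigvecs[:, 0])    # b-weights defining PC1
```

Note the total variance is preserved: the eigenvalues sum to 3, the number of variables.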
FA vs. PCA
• PCA
  ▫ PCA is mathematically precise in orthogonalizing dimensions
  ▫ PCA redistributes all variance into orthogonal components
  ▫ PCA uses all variable variance and treats it as true variance
• FA
  ▫ FA is conceptually realistic in identifying common factors
  ▫ FA distributes common variance into orthogonal factors
  ▫ FA recognizes measurement error and true factor variance
FA vs. PCA
• In some sense, PCA and FA are not so different conceptually from what we have been doing since multiple regression
  ▫ Creating linear combinations
  ▫ PCA especially falls along the lines of what we've already been doing
• What is different from previous methods is that there is no IV/DV distinction
  ▫ Just a single set of variables
Summary
• PCA
  ▫ The goal is to analyze variance and reduce the observed variables
  ▫ Reproduces the R matrix perfectly
  ▫ The goal is to extract as much variance as possible with the fewest components
  ▫ Gives a unique solution
• FA
  ▫ Analyzes covariance (communality)
  ▫ Is a close approximation to the R matrix
  ▫ The goal is to explain as much of the covariance as possible with a minimum number of factors that are tied specifically to assumed constructs
  ▫ Can give multiple solutions depending on the method and the estimates of communality
Questions
• Three general goals: data reduction, describing relationships, and testing theories about relationships
• How many interpretable factors exist in the data?
• How many factors are needed to summarize the pattern of correlations?
• Which factors account for the most variance?
• How well does the factor structure fit a given theory?
• What would each subject's score be if they could be measured directly on the factors?
• What does each factor mean?
• What percentage of the variance in the data is accounted for by the factors? (by all factors in FA, or by the most notable components in PCA)
Assumptions/Issues
• Assumes reliable variables/correlations
  ▫ Very much affected by missing data, outlying cases, and truncated data
  ▫ Data screening methods (e.g. transformations) may improve poor factor analytic results
• Normality
  ▫ Univariate: normally distributed variables make the solution stronger, but are not necessary if we are using the analysis in a purely descriptive manner
  ▫ Multivariate: assumed when assessing the number of factors
Assumptions/Issues
• No outliers
  ▫ Their influence on correlations would bias results
• Variables as outliers
  ▫ Some variables don't work
  ▫ Explain very little variance
  ▫ Relate poorly with factors
  ▫ Low squared multiple correlation as DV with other items as predictors
  ▫ Low loadings
Assumptions/Issues
• Factorable R matrix
  ▫ Need inter-item correlations > .30, or PCA/FA isn't going to do much for you
  ▫ Large inter-item correlations do not guarantee a solution either
     While two variables may be highly correlated, they may not be correlated with the others
  ▫ The matrix of partial correlations (adjusted for the other variables) and Kaiser's measure of sampling adequacy can help assess factorability
     Kaiser's is the ratio of the sum of squared correlations to the sum of squared correlations plus the sum of squared partial correlations
     It approaches 1 if the partials are small; we typically desire about .6+
• Multicollinearity/Singularity
  ▫ In PCA it is not a problem; no matrix inversion is necessary
     As such it is one solution for dealing with collinearity in regression
  ▫ Investigate tolerances, det(R)
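Kaiser's measure of sampling adequacy can be sketched directly from R. This NumPy illustration uses the six-item correlation matrix from the applied example later in the slides; the partial correlations come from the inverse of R (the anti-image approach), which is one standard way to compute them:

```python
import numpy as np

# Six-item correlation matrix from the slides' applied example (RQ1-3, SAD1-3)
R = np.array([
    [1.000, .594, .551, .099, .233, .204],
    [ .594, 1.000, .511, .189, .168, .230],
    [ .551, .511, 1.000, .169, .207, .180],
    [ .099, .189, .169, 1.000, .685, .666],
    [ .233, .168, .207, .685, 1.000, .676],
    [ .204, .230, .180, .666, .676, 1.000]])

# Partial correlations (each pair adjusted for all other variables),
# obtained from the scaled inverse of R
Rinv = np.linalg.inv(R)
d = np.sqrt(np.diag(Rinv))
partial = -Rinv / np.outer(d, d)

# Kaiser's measure: sum of squared off-diagonal correlations over that
# sum plus the sum of squared off-diagonal partial correlations
off = ~np.eye(R.shape[0], dtype=bool)
kmo = (R[off]**2).sum() / ((R[off]**2).sum() + (partial[off]**2).sum())
print(round(kmo, 3))
```

A value approaching 1 indicates small partials; by the slide's rule of thumb we'd want roughly .6 or better.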
Assumptions/Issues
• Sample size and missing data
  ▫ True missing data are handled in the usual ways
  ▫ Factor analysis via maximum likelihood needs large samples, and that is one of its only drawbacks
• The more reliable the correlations are, the smaller the number of subjects needed
• Need enough subjects for stable estimates
• How many? It depends on the nature of the data and the number of parameters to be estimated
  ▫ For example, a simple setting with few variables and clean data might not need as many
  ▫ Even several hundred data points for a more complex solution, with messy data and lower correlations among the variables, might not provide a meaningful result (PCA) or even converge upon a solution (FA)
Other issues
• No readily defined criteria by which to judge the outcome
  ▫ Before, we had R2, canonical correlation, classification
• Choice of rotation depends entirely on the researcher's estimation of interpretability
• Often used when other outcomes/analyses are not so hot, just to have something to talk about*
Extraction Methods for Factor Analytic Approaches
• There are many (dozens at least)
• All extract orthogonal sets of factors (components) that reproduce the R matrix
• Different techniques: some maximize variance, others minimize the residual matrix (R minus reproduced R)
• With a large, stable sample, interpretations will be similar

Extraction Methods for Factor Analytic Approaches
• Usually solutions are difficult to interpret without a rotation
• The output will differ depending on
  ▫ Extraction method
  ▫ Communality estimates
  ▫ Number of factors extracted
  ▫ Rotational method
Extraction Methods for Factor Analytic Approaches
• PCA vs. FA (family)
• PCA
  ▫ Begins with 1s on the diagonal of the correlation matrix
  ▫ As such, all variance is extracted and each variable is given equal weight
• FA
  ▫ Begins with communality estimates (e.g. squared multiple correlation, reliability estimate) on the diagonal
  ▫ Analyzes only common/shared variance
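The squared multiple correlation (SMC) mentioned as an initial communality estimate is easy to compute from R itself: the R2 from regressing item i on all the other items equals 1 - 1/(R^-1)ii. A short NumPy sketch, again borrowing the six-item matrix from the applied example later in the deck:

```python
import numpy as np

# Six-item correlation matrix from the slides' applied example
R = np.array([
    [1.000, .594, .551, .099, .233, .204],
    [ .594, 1.000, .511, .189, .168, .230],
    [ .551, .511, 1.000, .169, .207, .180],
    [ .099, .189, .169, 1.000, .685, .666],
    [ .233, .168, .207, .685, 1.000, .676],
    [ .204, .230, .180, .666, .676, 1.000]])

# SMC_i = 1 - 1/(R^-1)_ii: the multiple R^2 for predicting item i
# from all the other items; a common initial communality estimate in FA
smc = 1 - 1 / np.diag(np.linalg.inv(R))
print(np.round(smc, 3))
```

These values would replace the 1s on the diagonal before an FA extraction, so that only shared variance is analyzed.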
Extraction Methods
• PCA
  ▫ Extracts maximum variance with each component
  ▫ The first component is the linear combination of variables that maximizes component score variance for the cases
  ▫ The second (etc.) extracts the maximum variance from the residual matrix left over after extracting the first component (and is therefore orthogonal to the first)
  ▫ If all components are retained, all variance is explained

PCA
• The first factor is the one that accounts for the most variance
  ▫ It is the latent phenomenon with which the items, as a group, are most strongly associated
• The second represents the factor that accounts for the most of what is left
• And so on, until there is no variance left to account for
PCA
• Factors are linear combinations of variables
  ▫ These combinations are based on weights (eigenvectors) developed by the analysis
• FA and PCA are not much different from canonical correlation in terms of generating variates from linear combinations of variables
  ▫ Although there are now no "sides" of the equation, and you're not necessarily correlating the "factors", "components", "variates", etc.
• The factor loading for each item/variable is the r between it and the factor (i.e., the underlying shared variance)
• However, unlike many of the analyses so far, there is no statistical criterion to compare the linear combination to
  ▫ In MANOVA we create linear combinations that maximally differentiate groups
  ▫ In canonical correlation one linear combination is used to maximally correlate with another
PCA
• Once again we come to eigenvalues and eigenvectors
• Eigenvalues
  ▫ Conceptually can be considered to measure the strength (relative length) of an axis
  ▫ Derived from eigen analysis of a square symmetric matrix (covariance or correlation)
• Eigenvectors
  ▫ Each eigenvalue has an associated eigenvector. The eigenvalue is the length of an axis; the eigenvector determines its orientation in space
  ▫ The values in an eigenvector are not unique, because any coordinates that described the same orientation would be acceptable
Data
• Example data of women's height and weight

  height  weight  Zheight  Zweight
    57      93     -1.77    -1.97
    58     110     -1.47    -0.87
    60      99     -0.86    -1.58
    59     111     -1.17    -0.81
    61     115     -0.56    -0.55
    60     122     -0.86    -0.10
    62     110     -0.26    -0.87
    61     116     -0.56    -0.49
    62     122     -0.26    -0.10
    63     128      0.05     0.28
    62     134     -0.26     0.67
    64     117      0.35    -0.42
    63     123      0.05    -0.04
    65     129      0.65     0.35
    64     135      0.35     0.73
    66     128      0.96     0.28
    67     135      1.26     0.73
    66     148      0.96     1.57
    68     142      1.56     1.18
    69     155      1.87     2.02

  (z-scores rounded to two decimals)
Data transformation
• Consider two variables, height and weight
• X would be our data matrix, w our eigenvector (coefficients)
• Multiplying our original data by these weights* results in a column vector of values
  ▫ z1 = Xw
• Multiplying a matrix by a vector is a linear combination
• The variance of this linear combination is the eigenvalue
Data transformation
• Consider a gal 5'0" and 122 pounds
• She is -.86 sd from the mean height and -.10 sd from the mean weight for these data
• The first eigenvector associated with the normalized data* is [.707, .707], so the resulting value for that data point is

  z = x'w = (-.86)(.707) + (-.10)(.707) = -.68

• So with the top graph we have taken the original data point and projected it onto a new axis, -.68 units from the origin
• Now if we do this for all data points we will have projected them onto a new axis/component/dimension/factor/linear combination
• The length of the new axis is the eigenvalue
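This projection can be checked numerically. A NumPy sketch using the standardized height/weight data from the slides' table (z-scores rounded to two decimals, so results are approximate):

```python
import numpy as np

# Standardized height/weight data from the slides (rounded)
Z = np.array([
    [-1.77, -1.97], [-1.47, -0.87], [-0.86, -1.58], [-1.17, -0.81],
    [-0.56, -0.55], [-0.86, -0.10], [-0.26, -0.87], [-0.56, -0.49],
    [-0.26, -0.10], [ 0.05,  0.28], [-0.26,  0.67], [ 0.35, -0.42],
    [ 0.05, -0.04], [ 0.65,  0.35], [ 0.35,  0.73], [ 0.96,  0.28],
    [ 1.26,  0.73], [ 0.96,  1.57], [ 1.56,  1.18], [ 1.87,  2.02]])

w = np.array([0.707, 0.707])   # first eigenvector of the 2x2 R matrix
z1 = Z @ w                      # project every case onto the new axis

# The gal at z-height -.86 and z-weight -.10 lands at about -.68
proj = np.array([-0.86, -0.10]) @ w
print(round(proj, 2))   # -> -0.68
```

Every case now has a coordinate on the new axis; the variance of `z1` estimates the first eigenvalue.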
Data transformation
• Suppose we have more than one dimension/factor?
• In our discussion of the techniques thus far, we have said that each component or dimension is independent of the previous one
• What does independent mean?
  ▫ r = 0
• What does this mean geometrically in the multivariate sense?
• It means that the next axis specified is perpendicular to the previous one
• Note how r is represented even here
  ▫ The cosine of the 90° angle formed by the two axes is… 0
• Had the lines been on top of each other (i.e. perfectly correlated), the angle formed by them would be zero, whose cosine is 1
  ▫ r = 1
Data transformation
• The other eigenvector associated with the data is [-.707, .707]
• Doing as we did before, we'd create that second axis, and then could plot the data points along these new axes*
• We now have two linear combinations, each of which is interpretable as the vector comprised of projections of the original data points onto a directed line segment
• Note how the basic shape of the original data has been perfectly maintained
• The effect has been to rotate the configuration (45°) to a new orientation while preserving its essential size and shape
  ▫ It is an orthogonal transformation
  ▫ Note that we have been talking of specifying/rotating axes, but rotating the points themselves would give us the same result
Stretching and shrinking
• Note that with what we have now there are two new variables, Z1 and Z2, with very different variances
  ▫ Z1's is much larger
• If we want them to be equal we can simply standardize* those Z1 and Z2 values
  ▫ s2 = 1 for both
• In general, multiplying a matrix by a scalar will shrink or stretch the plot
• Here, let Z be the matrix of the Z variables and D a diagonal matrix with the standard deviations on the diagonal:

  D-1 = | 1/s1    0  |
        |   0   1/s2 |

  Zs = Z D-1

• The resulting plot would now be circular
Singular value decomposition
• Given a data matrix X, we can use one matrix operation to stretch or shrink the values
  ▫ Multiply by a scalar
     < 1 shrinks, > 1 stretches
• We've just seen how to rotate the values
  ▫ Matrix multiplication
• In general we can start with a matrix X and get
  ▫ Zs = X W D-1
• W here is the matrix that specifies the rotation by some amount (degrees):

  W = |  cos θ  sin θ |
      | -sin θ  cos θ |

• With a little reworking
  ▫ X = Zs D W'
• What this means is that any data matrix X can be decomposed into three parts:
  ▫ A matrix of uncorrelated variables with variance/sd = 1 (Zs)
  ▫ A stretching and shrinking transformation (D)
  ▫ And an orthogonal rotation (W')
• Finding these components is called singular value decomposition
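The decomposition can be demonstrated directly with NumPy's SVD, which factors a centered data matrix into the same three ingredients (orthonormal scores, a diagonal stretch, and an orthogonal rotation). The data here are randomly generated for illustration:

```python
import numpy as np

# Illustrative data (made up); center the columns first
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
X = X - X.mean(axis=0)

# X = U S Vt: U has orthonormal (uncorrelated) columns, S stretches,
# Vt is an orthogonal rotation -- paralleling the slides' X = Zs D W'
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# The three parts multiply back to the original matrix
X_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(X, X_rebuilt))        # -> True

# U's columns are orthonormal, confirming the "uncorrelated" part
print(np.allclose(U.T @ U, np.eye(2)))  # -> True
```

Up to scaling by sqrt(n-1), U corresponds to Zs, the singular values in S to D, and Vt to W'.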
The Determinant
• The determinant of a var/covar matrix provides a single measure to characterize the variance and covariance in the data
• Generalized variance
  ▫ 10.87*242.45 - 44.51*44.51 = 654
• Note
  ▫ 44.51/(10.87*242.45)^.5 = r = .867

VarCovar matrix for the height and weight data:

  | 10.87   44.51 |
  | 44.51  242.45 |
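A quick NumPy check of the determinant and the correlation. Note: the slide as scraped printed the weight variance as 224.42, but 242.45 (digits transposed) is the value consistent with both the reported determinant of 654 and r = .867, so that figure is used here:

```python
import numpy as np

# Var/covar matrix for the height/weight data (weight variance taken as
# 242.45, the value consistent with det = 654 and r = .867)
S = np.array([[10.87,  44.51],
              [44.51, 242.45]])

det = np.linalg.det(S)                    # generalized variance
r = S[0, 1] / np.sqrt(S[0, 0] * S[1, 1])  # correlation from cov and sds
print(round(det), round(r, 3))            # -> 654 0.867
```

The large correlation makes the determinant small relative to the product of the variances (2635.4), which is the collinearity story in miniature.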
Geometric interpretation of the determinant
• Suppose we use the values from our variance/covariance matrix and plot them geometrically as coordinates
• The vectors emanating from the origin to those points define a parallelogram
• In general, the skinnier the parallelogram, the larger the correlation
  ▫ Here ours is pretty skinny due to an r = .867
• The determinant is the area of this parallelogram, and thus will be smaller with larger correlations
  ▫ Recall our collinearity problem
• The top plot is the one associated with the raw variables' var/covar matrix
• The bottom is that of the components Z1 and Z2, which have zero correlation
• The scatterplots illustrate two variables with increasing correlations: 0, .25, .50, .75, 1
• The smaller plots are the plot of the correlation matrix in 2d space (e.g. the coordinates for the 4th diagram are (1, .75) for one point and (.75, 1) for the other)
• The eigenvalues associated with the correlation matrix are the lengths of the major and minor axes, e.g. 1.75 and .25 for diagram 4
• Also drawn is the ellipse specified by the axes
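The 1.75 and .25 axis lengths for the r = .75 diagram follow from the fact that a 2x2 correlation matrix has eigenvalues 1 + r and 1 - r, easily confirmed:

```python
import numpy as np

# For a 2x2 correlation matrix the eigenvalues are 1 + r and 1 - r:
# the lengths of the ellipse's major and minor axes
R = np.array([[1.00, 0.75],
              [0.75, 1.00]])

vals = np.sort(np.linalg.eigvalsh(R))[::-1]
print(vals)   # -> [1.75 0.25]
```

As r grows toward 1 the minor axis shrinks toward 0 and the ellipse collapses onto a line, which is the singularity problem again.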
• My head hurts!
• What is all this noise?
PCA
• It may take a while for that stuff to sink in, but at this point we have the tools necessary to jump into the application of PCA
• "Principal components"
• Extraction process and resulting characteristics
Meaning of "Principal Components"
• "Component" analyses are those that are based on the "full" correlation matrix
  ▫ 1.00s in the diagonal
• "Principal" analyses are those for which each successive factor…
  ▫ accounts for the maximum available variance
  ▫ is orthogonal to (uncorrelated with, independent of) all prior factors
  ▫ the full solution (as many factors as variables) accounts for all the variance
Application of PC analysis
• Components analysis is a kind of "data reduction"
  ▫ start with an inter-related set of "measured variables"
  ▫ identify a smaller set of "composite variables" that can be constructed from the "measured variables" and that carry as much of their information as possible
• A "full components solution"…
  ▫ has as many components as variables
  ▫ accounts for 100% of the variables' variance
  ▫ each variable has a final communality of 1.00
• A "truncated components solution"…
  ▫ has fewer components than variables
  ▫ accounts for <100% of the variables' variance
  ▫ each variable has a communality < 1.00
The steps of a PC analysis
• Compute the correlation matrix
• Extract a full components solution
• Determine the number of components to "keep"
  ▫ total variance accounted for
  ▫ variable communalities
  ▫ interpretability
  ▫ replicability
• "Rotate" the components and interpret (name) them
• Compute "component scores"
• "Apply" the components solution
  ▫ theoretically -- understand the meaning of the data reduction
  ▫ statistically -- use the component scores in other analyses
PC Factor Extraction
• Extraction is the process of forming PCs as linear combinations of the measured variables, as we have done with our other techniques

  PC1 = b11X1 + b21X2 + … + bk1Xk
  PC2 = b12X1 + b22X2 + … + bk2Xk
  PCf = b1fX1 + b2fX2 + … + bkfXk

• The goal is to reproduce as much of the information in the measured variables with as few PCs as possible
• Here's the thing to remember…
  ▫ We usually perform factor analyses to "find out how many groups of related variables there are" … however …
  ▫ The mathematical goal of extraction is to "reproduce the variables' variance, efficiently"
3 variable example
• Consider 3 variables with the correlations displayed
• In a 3d sense we might envision their relationship like this, with the shadows roughly what the scatterplots would look like for each bivariate relationship

Correlations (Pearson)

        V1     V2     V3
  V1   1      .562   .704
  V2    .562  1      .304
  V3    .704   .304  1
The first component identified
• The variance of this component, its eigenvalue, is 2.063
• In other words it accounts for about twice as much variance as any single variable*
• Note: with 3 variables, 2.063/3 = 68.8% of the variance accounted for*

Total Variance Explained

             Initial Eigenvalues           Extraction Sums of Squared Loadings   Rotation Sums of Squared Loadings
Component    Total  % of Var  Cum %        Total  % of Var  Cum %                Total  % of Var  Cum %
    1        2.063   68.778   68.778       2.063   68.778   68.778               1.064   35.451   35.451
    2         .706   23.518   92.296        .706   23.518   92.296               1.045   34.828   70.279
    3         .231    7.704  100.000        .231    7.704  100.000                .892   29.721  100.000

Extraction Method: Principal Component Analysis.
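The eigenvalues in this output can be reproduced directly from the 3-variable correlation matrix above. A NumPy sketch (the slides' own output is from SPSS):

```python
import numpy as np

# The 3-variable correlation matrix from the example
R = np.array([[1.000, 0.562, 0.704],
              [0.562, 1.000, 0.304],
              [0.704, 0.304, 1.000]])

# Eigenvalues of R, largest first; the first should match 2.063
vals = np.sort(np.linalg.eigvalsh(R))[::-1]
print(np.round(vals, 3))        # eigenvalues
print(round(vals[0] / 3, 3))    # proportion of variance for PC1
```

The three eigenvalues sum to 3 (the number of variables), so dividing by 3 gives each component's proportion of variance.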
PCA
• In principal components, we extract as many components as there are variables
• As mentioned previously, each component is uncorrelated with the previous ones
• If we save the component scores and look at their correlations, the result is an identity matrix:

Correlations (Pearson) among the A-R factor scores 1-3 for analysis 1

         FS1    FS2    FS3
  FS1   1      .000   .000
  FS2    .000  1      .000
  FS3    .000   .000  1
How do we interpret the components?
• As we have done with the other techniques, the component loadings can inform us as to their interpretation
• As before, they are the original variable's correlation with the component
• In this case, all variables load nicely on the first component, which, since the others do not account for nearly as much variance, is probably the only one to interpret

Component Matrix(a)

        Component
         1      2      3
  V1    .928  -.080  -.364
  V2    .726   .670   .159
  V3    .822  -.501   .271

Extraction Method: Principal Component Analysis.
a. 3 components extracted.
• Here is an example of magazine readership (loadings table not shown)
• Underlined loadings are > .30
• How might this be interpreted?
Applied example
• Six items
  ▫ Three sadness, three relationship quality
  ▫ N = 300
• PCA
Start with the Correlation Matrix

Correlation Matrix

            OW1RQ1  OW1RQ2  OW1RQ3  OW1SAD1  OW1SAD2  OW1SAD3
  OW1RQ1    1.000    .594    .551    .099     .233     .204
  OW1RQ2     .594   1.000    .511    .189     .168     .230
  OW1RQ3     .551    .511   1.000    .169     .207     .180
  OW1SAD1    .099    .189    .169   1.000     .685     .666
  OW1SAD2    .233    .168    .207    .685    1.000     .676
  OW1SAD3    .204    .230    .180    .666     .676    1.000
Communalities are 'Estimated'
• A measure of how much variance of the original variables is accounted for by the observed factors
• Uniqueness is 1 - communality
• With PC retaining all factors, communality always = 1
• As we'll see with FA, the approach is different
  ▫ The initial value is the multiple R2 for the association between an item and all the other items in the model
• Why 1.0?
  ▫ PCA analyzes all the variance for each variable
  ▫ FA only the shared variance

Communalities

            Initial  Extraction
  OW1RQ1     1.000     1.000
  OW1RQ2     1.000     1.000
  OW1RQ3     1.000     1.000
  OW1SAD1    1.000     1.000
  OW1SAD2    1.000     1.000
  OW1SAD3    1.000     1.000

Extraction Method: Principal Component Analysis.
What are we looking for?
• Any factor whose eigenvalue is less than 1.0 is in most cases not going to be retained for interpretation
  ▫ Unless it is very close, or has a readily understood and interesting meaning
• *Loadings that are:
  ▫ more than .5: good
  ▫ between .3 and .5: ok
  ▫ less than .3: small
• Matrix reproduction
  ▫ All the information about the correlation matrix is maintained
  ▫ Correlations can be reproduced exactly in PCA
     Sum of cross loadings
Assessing the variance accounted for

Total Variance Explained

             Initial Eigenvalues          Extraction Sums of Squared Loadings
Component    Total  % of Var  Cum %      Total  % of Var  Cum %
    1        2.802   46.700   46.700     2.802   46.700   46.700
    2        1.658   27.634   74.334     1.658   27.634   74.334
    3         .503    8.389   82.723      .503    8.389   82.723
    4         .444    7.393   90.117      .444    7.393   90.117
    5         .326    5.426   95.543      .326    5.426   95.543
    6         .267    4.457  100.000      .267    4.457  100.000

Extraction Method: Principal Component Analysis.

The eigenvalue is an index of the strength of the factor, the amount of variance it accounts for. It is the sum of the squared loadings for that factor/component.
(% of variance = eigenvalue / number of items or variables)
Factor Loadings

Component Matrix(a)

             Component
              1      2      3      4      5      6
  OW1RQ1     .609   .608  -.119  -.447   .062   .203
  OW1RQ2     .614   .568  -.408   .325   .051  -.158
  OW1RQ3     .593   .558   .548   .166  -.091  -.026
  OW1SAD1    .728  -.512   .026   .238   .250   .298
  OW1SAD2    .767  -.448   .089  -.231   .198  -.332
  OW1SAD3    .764  -.438  -.117  -.028  -.457   .034

Extraction Method: Principal Component Analysis.
a. 6 components extracted.

Eigenvalue of factor 1 = .609^2 + .614^2 + .593^2 + .728^2 + .767^2 + .764^2 = 2.80
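That arithmetic can be verified in one line. A NumPy check using the first-component loadings from the table:

```python
import numpy as np

# First-component loadings from the slides' output; the eigenvalue
# is the sum of the squared loadings for that component
loadings_1 = np.array([.609, .614, .593, .728, .767, .764])

eigenvalue_1 = np.sum(loadings_1**2)
print(round(eigenvalue_1, 2))   # -> 2.8
```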
Reproducing the correlation matrix (R)
• Sum the products of the loadings for two variables across all factors
  ▫ For RQ1 and RQ2: (.61 * .61) + (.61 * .57) + (-.12 * -.41) + (-.45 * .33) + (.06 * .05) + (.20 * -.16) = .59
     This matches the original correlation of .594
     If we kept only the first two factors, the reproduced correlation = .72
• Note that an index of the quality of a factor analysis (as opposed to PCA) is the extent to which the factor loadings can reproduce the correlation matrix*
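With the full 6x6 loading matrix, this reproduction can be checked for every pair at once: with all components retained, L L' recovers R exactly (up to the 3-decimal rounding of the printed loadings). A NumPy sketch using the slides' tables:

```python
import numpy as np

# Full loading matrix from the slides' Component Matrix output
L = np.array([
    [.609,  .608, -.119, -.447,  .062,  .203],   # OW1RQ1
    [.614,  .568, -.408,  .325,  .051, -.158],   # OW1RQ2
    [.593,  .558,  .548,  .166, -.091, -.026],   # OW1RQ3
    [.728, -.512,  .026,  .238,  .250,  .298],   # OW1SAD1
    [.767, -.448,  .089, -.231,  .198, -.332],   # OW1SAD2
    [.764, -.438, -.117, -.028, -.457,  .034]])  # OW1SAD3

# The original correlation matrix
R = np.array([
    [1.000, .594, .551, .099, .233, .204],
    [ .594, 1.000, .511, .189, .168, .230],
    [ .551, .511, 1.000, .169, .207, .180],
    [ .099, .189, .169, 1.000, .685, .666],
    [ .233, .168, .207, .685, 1.000, .676],
    [ .204, .230, .180, .666, .676, 1.000]])

print(round((L[0] * L[1]).sum(), 2))        # RQ1-RQ2 reproduced -> 0.59
print(np.allclose(L @ L.T, R, atol=0.02))   # whole matrix -> True
```

The diagonal of L L' gives the communalities, all approximately 1.0 here, as a full PCA solution requires.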
Variance Accounted For
• For items
  ▫ The sum of the squared loadings (i.e., weights) across the factors is the amount of variance accounted for in each item
  ▫ Item 1: .61^2 + .61^2 + (-.12)^2 + (-.45)^2 + .06^2 + .20^2
     = .37 + .37 + .015 + .2 + .004 + .04 = ~1.0
     For the first two factors: .74
• For components
  ▫ How much variance is accounted for by the components that will be retained?
When is it appropriate to use PCA?
• PCA is largely a descriptive procedure
• In our examples, we are looking at variables with decent correlations. However, if the variables are largely uncorrelated, PCA won't do much for you
  ▫ It may just provide components that each reflect an individual variable, i.e. nothing is gained
• One may use Bartlett's sphericity test to determine whether such an approach is appropriate
• It tests the null hypothesis that the R matrix is an identity matrix (1s on the diagonal, 0s off the diagonal)
• When the determinant of R is small (recall from before that this implies strong correlation), the chi-square statistic will be large, we reject H0, and PCA would be appropriate for data reduction
• One should note though that it is a powerful test, and will usually result in rejection with typical sample sizes
• One may instead refer to an estimate of practical effect rather than a statistical test
  ▫ Are the correlations worthwhile?

  chi-square = -[(n - 1) - (2p + 5)/6] * ln|R|,  with df = p(p - 1)/2

  where p = number of variables, n = number of observations, and ln|R| is the natural log of the determinant of R
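Bartlett's statistic is simple to compute from the formula above. A NumPy sketch applying it to the six-item example (N = 300 as stated in the applied example):

```python
import numpy as np

# Six-item correlation matrix from the applied example, N = 300
R = np.array([
    [1.000, .594, .551, .099, .233, .204],
    [ .594, 1.000, .511, .189, .168, .230],
    [ .551, .511, 1.000, .169, .207, .180],
    [ .099, .189, .169, 1.000, .685, .666],
    [ .233, .168, .207, .685, 1.000, .676],
    [ .204, .230, .180, .666, .676, 1.000]])

# chi2 = -[(n-1) - (2p+5)/6] * ln|R|, df = p(p-1)/2
n, p = 300, R.shape[0]
chi2 = -((n - 1) - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
df = p * (p - 1) // 2
print(round(chi2, 1), df)   # a very large chi2 on 15 df: reject sphericity
```

As the slide warns, rejection here is nearly automatic at this sample size, so the practical size of the correlations matters more than the test.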
How should the data be scaled?
• In most of our examples we have been using the R matrix instead of the var-covar matrix
• As PCA seeks to maximize variance, it can be sensitive to scale differences across variables
• Variables with a larger range of scores would thus have more of an impact on the linear combination created
• As such, the R matrix should generally be used, except perhaps in cases where the items are on the same scale (e.g. Likert)
• The values involved will change (e.g. eigenvalues), though the general interpretation may not
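The scale sensitivity is easy to see with the height/weight example: on the covariance matrix, the much larger weight variance dominates PC1, while on the correlation matrix the two variables contribute equally. A NumPy sketch (weight variance taken as 242.45, the value consistent with the determinant of 654 reported earlier):

```python
import numpy as np

# Covariance matrix (weight variance much larger than height's)
# versus the corresponding correlation matrix
S = np.array([[10.87,  44.51],
              [44.51, 242.45]])
R = np.array([[1.000, 0.867],
              [0.867, 1.000]])

def first_pc(M):
    """Eigenvector for the largest eigenvalue of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, np.argmax(vals)]

print(np.round(np.abs(first_pc(S)), 3))   # weight dominates PC1
print(np.round(np.abs(first_pc(R)), 3))   # -> [0.707 0.707]
```

On S, the weight coefficient is several times the height coefficient; on R, the weights are equal, which is usually the more interpretable solution for mixed-scale variables.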
How many components should be retained?
• Kaiser's Rule
  ▫ What we've already suggested, i.e. eigenvalues over 1
  ▫ The idea is that any component retained should account for at least as much variance as a single variable
• Another perspective is to retain as many components as will account for X amount of variance
  ▫ Practical approach
• Scree plot
  ▫ Look for the elbow
     Look for the point after which the remaining eigenvalues decrease in linear fashion, and retain only those 'above' the elbow
  ▫ Not really a good primary approach, though it may be consistent with the others
How many components should be retained?
• Horn's procedure (parallel analysis)
• This is a different approach, which suggests creating a set of random data of the same size: N cases and p variables
• The idea is that, in maximizing variance accounted for, PCA has a good chance of capitalizing on chance
• Even with random data, the first eigenvalue will be > 1
• As such, retain components with eigenvalues greater than those produced by the corresponding components of the random data
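Horn's idea can be sketched in a few lines: generate random data of the same dimensions, average the eigenvalues over replications, and use those as the retention threshold. A NumPy illustration (replication count and seed are arbitrary choices for the sketch):

```python
import numpy as np

# Parallel-analysis baseline: average eigenvalues of correlation
# matrices computed from pure-noise data of the same size (N=300, p=6)
rng = np.random.default_rng(42)
n, p, reps = 300, 6, 100

rand_eigs = np.zeros(p)
for _ in range(reps):
    X = rng.normal(size=(n, p))                 # random normal "data"
    R = np.corrcoef(X, rowvar=False)
    rand_eigs += np.sort(np.linalg.eigvalsh(R))[::-1]
rand_eigs /= reps

# Even for noise, the first eigenvalue exceeds 1: Kaiser's rule is
# too lenient, and these values are the fairer retention thresholds
print(np.round(rand_eigs, 3))
```

A real component is retained only if its eigenvalue beats the corresponding random one, which guards against the capitalization on chance described above.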
Rotation
• Sometimes our loadings will be a little difficult to interpret initially
• In such a case we can 'rotate' the solution so that the loadings perhaps make more sense
  ▫ This is typically done in factor analysis, but is possible here too
• An orthogonal rotation is just a shift to a new set of coordinate axes in the same space spanned by the principal components
• You can think of it as shifting the axes, or rotating the 'egg'
• The gist is that the relations among the items are maintained, while maximizing their more natural loadings and minimizing 'off-loadings'*
• Note that as PCA is a technique that creates independent components, orthogonal rotations that maintain this independence are typically used
  ▫ Loadings will be either large or small, with little in between
• Varimax is the most common rotation
  ▫ It maximizes the variance of the squared loadings within each component
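A compact version of the standard SVD-based varimax algorithm, applied to the first two components of the six-item example, shows what rotation does and does not change (this is a generic implementation, not SPSS's exact routine):

```python
import numpy as np

def varimax(L, n_iter=50, tol=1e-8):
    """Orthogonal varimax rotation of a loading matrix L (standard
    SVD-based algorithm): maximizes the variance of squared loadings."""
    p, k = L.shape
    T = np.eye(k)            # accumulated rotation matrix (orthogonal)
    d_old = 0.0
    for _ in range(n_iter):
        Lr = L @ T
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - Lr @ np.diag((Lr**2).sum(axis=0)) / p))
        T = u @ vt
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
        d_old = d
    return L @ T

# First two components for the six-item example from the slides
L2 = np.array([[.609,  .608], [.614,  .568], [.593,  .558],
               [.728, -.512], [.767, -.448], [.764, -.438]])
Lrot = varimax(L2)

# An orthogonal rotation leaves each item's communality unchanged
print(np.allclose((Lrot**2).sum(axis=1), (L2**2).sum(axis=1)))  # -> True
print(np.round(Lrot, 3))   # loadings now cluster: RQ items vs. SAD items
```

The total variance accounted for is untouched; only its distribution across the two components changes, which is why rotated and unrotated solutions are equally "correct" but not equally interpretable.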
Other issues: How do we assess validity?
• Cross-validation
  ▫ Holdout sample, as we have discussed before
  ▫ About a 2/3, 1/3 split
  ▫ Using the eigenvectors from the original components, we can create new components with the new data and see how much variance each accounts for
  ▫ Hope it's similar to the original solution
• Jackknife
  ▫ With smaller samples, conduct the PCA multiple times, each with a specific case held out
  ▫ Using the eigenvectors, calculate the component scores for the case held out
  ▫ Compare the eigenvalues for the components involved
• Bootstrap
  ▫ In the absence of a holdout sample, we can create bootstrapped samples to perform the same function
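The bootstrap idea can be sketched briefly: resample cases with replacement, redo the PCA each time, and look at the spread of the first eigenvalue. The data here are generated for illustration (two correlated variables), not the slides' own sample:

```python
import numpy as np

# Illustrative two-variable data with a built-in correlation of about .8
rng = np.random.default_rng(7)
n = 200
x = rng.normal(size=n)
X = np.column_stack([x + rng.normal(scale=0.5, size=n),
                     x + rng.normal(scale=0.5, size=n)])

# Bootstrap: resample rows with replacement, recompute the first
# eigenvalue of the correlation matrix each time
boot_eig1 = []
for _ in range(500):
    idx = rng.integers(0, n, size=n)
    R = np.corrcoef(X[idx], rowvar=False)
    boot_eig1.append(np.linalg.eigvalsh(R).max())

lo, hi = np.percentile(boot_eig1, [2.5, 97.5])
print(round(lo, 2), round(hi, 2))   # interval for the first eigenvalue
```

A narrow interval suggests the component structure is stable; a wide one is a warning that the solution may not replicate.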
Other issues: Factoring items vs. factoring scales
• Items are often factored as part of the process of scale development
  ▫ Check whether the items "go together" the way the scale's author thinks
• Scales (composites of items) are factored to…
  ▫ examine the construct validity of "new" scales
  ▫ test "theory" about which constructs are interrelated
• Remember, the reason we have scales is that individual items are typically unreliable and have limited validity
Other issues: Factoring items vs. factoring scales
• The limited reliability and validity of items means that they will be measured with less precision, and so their intercorrelations for any one sample will be "fraught with error"
• Since factoring starts with R, factoring items is likely to yield spurious solutions -- replication of item-level factoring is very important!
• Is the issue really "items vs. scales"?
  ▫ No -- it is really the reliability and validity of the "things being factored", and scales have these properties more than items do
Other issues: When is it appropriate to use PCA?
• Another reason to use PCA, which obviously isn't a great one, is that the maximum likelihood estimation involved in an Exploratory Factor Analysis may not converge
• PCA will always give a result (it does not require matrix inversion), and so can often be used in such a situation
• We'll talk more on this later, but in data reduction situations EFA is to be preferred for social scientists and others who use imprecise measures
Other issues: Selecting variables for a factor analysis
• Sometimes a researcher has access to a data set that someone else has collected -- an "opportunistic data set"
• While this can be a real money/time saver, be sure to recognize the possible limitations
• Be sure the sample represents a population you want to talk about
• Carefully consider the variables that "aren't included" and the possible effects their absence has on the resulting factors
  ▫ this is especially true if the data set was chosen to be "efficient" -- variables chosen to cover several domains
• You should plan to replicate any results obtained from opportunistic data
Other issues: Selecting the sample for a factor analysis
• How many?
• Keep in mind that R, and so the factor solution, is the same no matter how many cases are used -- so the point is the representativeness and stability of the correlations
• Advice about the subject/variable ratio varies pretty dramatically
  ▫ 5-10 cases per variable
  ▫ 300 cases minimum (maybe + # of items)
• Consider that, as for other statistics, your standard error for a correlation decreases with increasing sample size
A note about SPSS
• SPSS does provide a means for principal components analysis
• However, its presentation (much like many textbooks, for that matter) blurs the distinction between PCA and FA, such that they are easily confused
• Although they are both data dimension reduction techniques, they go about the process differently, have different implications regarding the results, and can even come to different conclusions
A note about SPSS
• In SPSS, the menu is 'Factor' analysis (even though 'principal components' is the default extraction setting)
• Unlike in other programs, PCA isn't even a separate procedure (it's all in the FACTOR syntax)
• In order to perform PCA, make sure you have principal components selected as your extraction method, analyze the correlation matrix, and specify that the number of factors to be extracted equals the number of variables
• Even then, your loadings will be different from those of other programs, which scale the loadings such that the sum of their squared values = 1
• In general, be cautious when using SPSS
No frills PCA in R*

pca = princomp(Dataset)
pca
summary(pca)
pca$loadings
pca$scores
# scree plot
plot(pca)
Other functions
• http://rss.acs.unt.edu/Rdoc/library/pcaMethods/html/00Index.html
• library(pcaMethods)
  ▫ pca
     Uses modern PCA approaches (Bayesian, nonlinear, etc.)
  ▫ nipalsPca
     Uses the 'NIPALS' technique to estimate missing values first
     Can be specified in the 'pca' function
  ▫ bpca
     Same, but uses a Bayesian method
  ▫ robustPCA
  ▫ Q2
     Can perform cross-validation