Copyright © 2010 Pearson Education, Inc. publishing as Prentice Hall
Chapter Eighteen
Direct marketing scenario
• Every year, the DM company sends one annual catalog, four seasonal catalogs, and a number of catalogs for holiday seasons.
• The company has a list of 10 million potential buyers.
• The response rate is 5% on average.
• Who should receive a catalog? (those most likely to buy)
• How are buyers different from nonbuyers?
• Who are most likely to default? ...
Similarities and Differences between ANOVA, Regression, and Discriminant Analysis
Table 18.1

                                     ANOVA        REGRESSION   DISCRIMINANT/LOGIT
Similarities
  Number of dependent variables      One          One          One
  Number of independent variables    Multiple     Multiple     Multiple
Differences
  Nature of the dependent variable   Metric       Metric       Categorical
  Nature of the independent          Categorical  Metric       Metric (or binary,
  variables                                                    dummies)
Discriminant Analysis
Discriminant analysis is a technique for analyzing data when the
criterion or dependent variable is categorical and the predictor or
independent variables are interval in nature.
The objectives of discriminant analysis are as follows:
• Development of discriminant functions, or linear combinations of the predictor or independent variables, which will best discriminate between the categories of the criterion or dependent variable (groups), e.g., buyers vs. nonbuyers.
• Examination of whether significant differences exist among the
groups, in terms of the predictor variables.
• Determination of which predictor variables contribute most to the intergroup differences.
• Classification of cases to one of the groups based on the values
of the predictor variables.
• Evaluation of the accuracy of classification.
Discriminant Analysis
• When the criterion variable has two categories, the technique is
known as two-group discriminant analysis.
• When three or more categories are involved, the technique is
referred to as multiple discriminant analysis.
• The main distinction is that, in the two-group case, it is
possible to derive only one discriminant function. In multiple
discriminant analysis, more than one function may be computed. In
general, with G groups and k predictors, it is possible to estimate up to the smaller of G − 1 and k discriminant functions.
• The first function has the highest ratio of between-groups to
within-groups sum of squares. The second function, uncorrelated
with the first, has the second highest ratio, and so on. However,
not all the functions may be statistically significant.
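The min(G − 1, k) bound follows from the rank of the between-groups matrix. A minimal numpy sketch with hypothetical data (three groups, five predictors) illustrates it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: G = 3 groups, k = 5 predictors, 20 cases per group.
G, k, n = 3, 5, 20
groups = [rng.normal(loc=g, size=(n, k)) for g in range(G)]

grand_mean = np.vstack(groups).mean(axis=0)

# Between-groups SSCP matrix: sum over groups of n_g (m_g - m)(m_g - m)'.
B = sum(n * np.outer(x.mean(axis=0) - grand_mean,
                     x.mean(axis=0) - grand_mean) for x in groups)

# The G deviation vectors are linearly dependent (they sum to zero), so
# the rank of B -- and hence the number of discriminant functions -- is
# at most min(G - 1, k) = 2.
print(np.linalg.matrix_rank(B))   # 2
```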
Geometric Interpretation
Fig. 18.1 [Scatterplot of cases from groups 1 and 2 in the space of two predictors; figure not reproduced.]
Discriminant Analysis Model
The discriminant analysis model involves linear combinations of the following form:

D = b0 + b1X1 + b2X2 + b3X3 + ... + bkXk

where:
D = discriminant score
b's = discriminant coefficients or weights
X's = predictor or independent variables
• The coefficients, or weights (b), are estimated so that the
groups differ as much as possible on the values of the discriminant
function.
• This occurs when the ratio of between-group sum of squares to
within-group sum of squares for the discriminant scores is at a
maximum.
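For the two-group case, a hedged numpy sketch of how the weights are chosen: the classic Fisher solution, b proportional to Sw⁻¹(m1 − m2), shown here on illustrative data, maximizes exactly this between-to-within ratio of the scores.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical groups with different means (illustrative data only).
X1 = rng.normal(loc=[0.0, 0.0], size=(50, 2))
X2 = rng.normal(loc=[2.0, 1.0], size=(50, 2))

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Pooled within-group covariance matrix.
Sw = (49 * np.cov(X1, rowvar=False) + 49 * np.cov(X2, rowvar=False)) / 98

# Fisher weights b, proportional to Sw^-1 (m1 - m2): they maximize the
# ratio of between-group to within-group sum of squares of the scores.
b = np.linalg.solve(Sw, m1 - m2)

s1, s2 = X1 @ b, X2 @ b
grand = np.concatenate([s1, s2]).mean()
ss_between = 50 * (s1.mean() - grand) ** 2 + 50 * (s2.mean() - grand) ** 2
ss_within = ((s1 - s1.mean()) ** 2).sum() + ((s2 - s2.mean()) ** 2).sum()
print(ss_between / ss_within)   # the maximized ratio
```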
Statistics Associated with Discriminant Analysis
• Canonical correlation. Canonical correlation measures the extent
of association between the discriminant scores and the groups. It
is a measure of association between the single discriminant
function and the set of dummy variables that define the group
membership.
• Centroid. The centroid is the mean value of the discriminant scores for a particular group. There are as many centroids as there are groups, one for each group. The means for a group on all the functions are the group centroids.
• Classification matrix. Sometimes also called confusion or
prediction matrix, the classification matrix contains the number of
correctly classified and misclassified cases.
Statistics Associated with Discriminant Analysis
• Discriminant function coefficients. The discriminant function
coefficients (unstandardized) are the multipliers of variables,
when the variables are in the original units of measurement.
• Discriminant scores. The unstandardized coefficients are
multiplied by the values of the variables. These products are
summed and added to the constant term to obtain the discriminant
scores.
• Eigenvalue. For each discriminant function, the eigenvalue is the ratio of the between-group to the within-group sum of squares. Large eigenvalues imply superior functions.
Statistics Associated with Discriminant Analysis
• F values and their significance. These are calculated from a
one-way ANOVA, with the grouping variable serving as the
categorical independent variable. Each predictor, in turn, serves
as the metric dependent variable in the ANOVA.
• Group means and group standard deviations. These are computed for
each predictor for each group.
• Pooled within-group correlation matrix. The pooled within-group
correlation matrix is computed by averaging the separate covariance
matrices for all the groups.
Statistics Associated with Discriminant Analysis
• Standardized discriminant function coefficients. The standardized discriminant function coefficients are used as the multipliers when the variables have been standardized to a mean of 0 and a variance of 1.
• Structure correlations. Also referred to as discriminant
loadings, the structure correlations represent the simple
correlations between the predictors and the discriminant
function.
• Total correlation matrix. If the cases are treated as if they
were from a single sample and the correlations computed, a total
correlation matrix is obtained.
• Wilks' λ. Sometimes also called the U statistic, Wilks' λ for each predictor is the ratio of the within-group sum of squares to the total sum of squares. Its value varies between 0 and 1. Large values of λ (near 1) indicate that the group means do not seem to be different. Small values of λ (near 0) indicate that the group means seem to be different.
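A small numpy illustration of Wilks' λ for a single predictor, using made-up scores for two groups:

```python
import numpy as np

# Illustrative values of one predictor for two groups (made-up numbers).
g1 = np.array([60.5, 58.2, 63.1, 59.9, 61.3])
g2 = np.array([41.9, 44.0, 40.2, 43.5, 42.4])

both = np.concatenate([g1, g2])
ss_within = ((g1 - g1.mean()) ** 2).sum() + ((g2 - g2.mean()) ** 2).sum()
ss_total = ((both - both.mean()) ** 2).sum()

# Wilks' lambda: within-group SS over total SS, always between 0 and 1.
wilks = ss_within / ss_total
print(round(wilks, 3))   # close to 0 here because the group means differ sharply
```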
Conducting Discriminant Analysis
1. Formulate the Problem
2. Estimate the Discriminant Function Coefficients
3. Determine the Significance of the Discriminant Function
4. Interpret the Results
5. Assess the Validity of Discriminant Analysis
Conducting Discriminant Analysis Formulate the Problem
• Identify the objectives, the criterion variable, and the
independent variables.
• The criterion variable must consist of two or more mutually
exclusive and collectively exhaustive categories.
• The predictor variables should be selected based on a theoretical
model or previous research, or the experience of the
researcher.
• One part of the sample, called the estimation or analysis sample,
is used for estimation of the discriminant function.
• The other part, called the holdout or validation sample, is
reserved for validating the discriminant function.
• Often the distribution of the number of cases in the analysis and
validation samples follows the distribution in the total
sample.
Information on Resort Visits: Analysis Sample
Table 18.2

No.  Resort  Annual Family  Attitude Toward  Importance Attached  Household  Age of Head   Amount Spent on
     Visit   Income ($000)  Travel           to Family Vacation   Size       of Household  Family Vacation
1    1       50.2           5                8                    3          43            M (2)
2    1       70.3           6                7                    4          61            H (3)
3    1       62.9           7                5                    6          52            H (3)
4    1       48.5           7                5                    5          36            L (1)
5    1       52.7           6                6                    4          55            H (3)
6    1       75.0           8                7                    5          68            H (3)
7    1       46.2           5                3                    3          62            M (2)
8    1       57.0           2                4                    6          51            M (2)
9    1       64.1           7                5                    4          57            H (3)
10   1       68.1           7                6                    5          45            H (3)
11   1       73.4           6                7                    5          44            H (3)
12   1       71.9           5                8                    4          64            H (3)
13   1       56.2           1                8                    6          54            M (2)
14   1       49.3           4                2                    3          56            H (3)
15   1       62.0           5                6                    2          58            H (3)
Information on Resort Visits: Analysis Sample
Table 18.2, cont.

No.  Resort  Annual Family  Attitude Toward  Importance Attached  Household  Age of Head   Amount Spent on
     Visit   Income ($000)  Travel           to Family Vacation   Size       of Household  Family Vacation
16   2       32.1           5                4                    3          58            L (1)
17   2       36.2           4                3                    2          55            L (1)
18   2       43.2           2                5                    2          57            M (2)
19   2       50.4           5                2                    4          37            M (2)
20   2       44.1           6                6                    3          42            M (2)
21   2       38.3           6                6                    2          45            L (1)
22   2       55.0           1                2                    2          57            M (2)
23   2       46.1           3                5                    3          51            L (1)
24   2       35.0           6                4                    5          64            L (1)
25   2       37.3           2                7                    4          54            L (1)
26   2       41.8           5                1                    3          56            M (2)
27   2       57.0           8                3                    2          36            M (2)
28   2       33.4           6                8                    2          50            L (1)
29   2       37.5           3                2                    3          48            L (1)
30   2       41.3           3                3                    2          42            L (1)
Information on Resort Visits: Holdout Sample
Table 18.3

No.  Resort  Annual Family  Attitude Toward  Importance Attached  Household  Age of Head   Amount Spent on
     Visit   Income ($000)  Travel           to Family Vacation   Size       of Household  Family Vacation
1    1       50.8           4                7                    3          45            M (2)
2    1       63.6           7                4                    7          55            H (3)
3    1       54.0           6                7                    4          58            M (2)
4    1       45.0           5                4                    3          60            M (2)
5    1       68.0           6                6                    6          46            H (3)
6    1       62.1           5                6                    3          56            H (3)
7    2       35.0           4                3                    4          54            L (1)
8    2       49.6           5                3                    5          39            L (1)
9    2       39.4           6                5                    3          44            H (3)
10   2       37.0           2                6                    5          51            L (1)
11   2       54.5           7                3                    3          37            M (2)
12   2       38.2           2                2                    3          49            L (1)
Conducting Discriminant Analysis Estimate the Discriminant Function
Coefficients
• The direct method involves estimating the discriminant function
so that all the predictors are included simultaneously.
• In stepwise discriminant analysis, the predictor variables are
entered sequentially, based on their ability to discriminate among
groups.
Results of Two-Group Discriminant Analysis
Table 18.4

Group Means
VISIT  INCOME    TRAVEL   VACATION  HSIZE    AGE
1      60.52000  5.40000  5.80000   4.33333  53.73333
2      41.91333  4.33333  4.06667   2.80000  50.13333
Total  51.21667  4.86667  4.93333   3.56667  51.93333

Group Standard Deviations
VISIT  INCOME    TRAVEL   VACATION  HSIZE    AGE
1      9.83065   1.91982  1.82052   1.23443  8.77062
2      7.55115   1.95180  2.05171   0.94112  8.27101
Total  12.79523  1.97804  2.09981   1.33089  8.57395

Pooled Within-Groups Correlation Matrix
          INCOME    TRAVEL    VACATION  HSIZE     AGE
INCOME    1.00000
TRAVEL    0.19745   1.00000
VACATION  0.09148   0.08434   1.00000
HSIZE     0.08887  -0.01681   0.07046   1.00000
AGE      -0.01431  -0.19709   0.01742  -0.04301  1.00000

Wilks' λ (U-statistic) and univariate F ratio with 1 and 28 degrees of freedom
Variable  Wilks' λ  F       Significance
INCOME    0.45310   33.800  0.0000
TRAVEL    0.92479   2.277   0.1425
VACATION  0.82377   5.990   0.0209
HSIZE     0.65672   14.640  0.0007
AGE       0.95441   1.338   0.2572
(cont.)
Results of Two-Group Discriminant Analysis
Table 18.4, cont.

Canonical Discriminant Functions
Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1*        1.7862      100.00         100.00        0.8007

After Function  Wilks' λ  Chi-square  df  Significance
0               0.3589    26.130      5   0.0001

* Marks the canonical discriminant function remaining in the analysis.

Standardized Canonical Discriminant Function Coefficients
          FUNC 1
INCOME    0.74301
TRAVEL    0.09611
VACATION  0.23329
HSIZE     0.46911
AGE       0.20922

Structure Matrix: pooled within-groups correlations between discriminating variables and canonical discriminant functions (variables ordered by size of correlation within function)
          FUNC 1
INCOME    0.82202
HSIZE     0.54096
VACATION  0.34607
TRAVEL    0.21337
AGE       0.16354
(cont.)
Table 18.4, cont.

Unstandardized Canonical Discriminant Function Coefficients
            FUNC 1
INCOME       0.08476710
TRAVEL       0.04964455
VACATION     0.1202813
HSIZE        0.4273893
AGE          0.02454380
(constant)  -7.975476

Canonical discriminant functions evaluated at group means (group centroids)
Group  FUNC 1
1       1.29118
2      -1.29118

Classification results for cases selected for use in the analysis
                            Predicted Group Membership
Actual Group  No. of Cases  1            2
Group 1       15            12 (80.0%)   3 (20.0%)
Group 2       15            0 (0.0%)     15 (100.0%)
Percent of grouped cases correctly classified: 90.00%
Results of Two-Group Discriminant Analysis
Table 18.4, cont.
Classification results for cases not selected for use in the analysis (holdout sample)
                            Predicted Group Membership
Actual Group  No. of Cases  1           2
Group 1       6             4 (66.7%)   2 (33.3%)
Group 2       6             0 (0.0%)    6 (100.0%)
Percent of grouped cases correctly classified: 83.33%
Conducting Discriminant Analysis Interpret the Results
• The interpretation of the discriminant weights, or coefficients,
is similar to that in multiple regression analysis.
• Given the multicollinearity in the predictor variables, there is
no unambiguous measure of the relative importance of the predictors
in discriminating between the groups.
• With this caveat in mind, we can obtain some idea of the relative
importance of the variables by examining the absolute magnitude of
the standardized discriminant function coefficients.
• Some idea of the relative importance of the predictors can also
be obtained by examining the structure correlations, also called
canonical loadings or discriminant loadings. These simple
correlations between each predictor and the discriminant function
represent the variance that the predictor shares with the
function.
• Another aid to interpreting discriminant analysis results is to develop a characteristic profile for each group by describing each group in terms of the group means for the predictor variables.
Conducting Discriminant Analysis Assess Validity of Discriminant
Analysis
• Many computer programs, such as SPSS, offer a leave-one-out cross-validation option.
• The discriminant weights, estimated by using the analysis sample,
are multiplied by the values of the predictor variables in the
holdout sample to generate discriminant scores for the cases in the
holdout sample. The cases are then assigned to groups based on
their discriminant scores and an appropriate decision rule. The hit
ratio, or the percentage of cases correctly classified, can then be
determined by summing the diagonal elements and dividing by the
total number of cases.
• It is helpful to compare the percentage of cases correctly
classified by discriminant analysis to the percentage that would be
obtained by chance. Classification accuracy achieved by
discriminant analysis should be at least 25% greater than that
obtained by chance.
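Using the analysis-sample classification matrix from Table 18.4, the hit ratio and a proportional chance criterion can be computed as follows (a sketch; the 25% rule is the heuristic stated above):

```python
import numpy as np

# Analysis-sample classification matrix from Table 18.4:
# rows = actual group, columns = predicted group.
matrix = np.array([[12, 3],
                   [0, 15]])

n = matrix.sum()
hit_ratio = np.trace(matrix) / n       # (12 + 15) / 30
print(hit_ratio)                       # 0.9, the 90% reported in Table 18.4

# Proportional chance criterion for two groups of 15 cases each.
p = matrix.sum(axis=1) / n
chance = (p ** 2).sum()                # 0.5 here
print(hit_ratio >= 1.25 * chance)      # True: at least 25% above chance
```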
Results of Three-Group Discriminant Analysis
Table 18.5

Group Means
Group  INCOME    TRAVEL   VACATION  HSIZE    AGE
1      38.57000  4.50000  4.70000   3.10000  50.30000
2      50.11000  4.00000  4.20000   3.40000  49.50000
3      64.97000  6.10000  5.90000   4.20000  56.00000
Total  51.21667  4.86667  4.93333   3.56667  51.93333

Group Standard Deviations
Group  INCOME    TRAVEL   VACATION  HSIZE    AGE
1      5.29718   1.71594  1.88856   1.19722  8.09732
2      6.00231   2.35702  2.48551   1.50555  9.25263
3      8.61434   1.19722  1.66333   1.13529  7.60117
Total  12.79523  1.97804  2.09981   1.33089  8.57395

Pooled Within-Groups Correlation Matrix
          INCOME    TRAVEL    VACATION  HSIZE     AGE
INCOME    1.00000
AGE      -0.20939  -0.34022  -0.01326  -0.02512  1.00000
(rows for TRAVEL, VACATION, and HSIZE not reproduced; cont.)
All-Groups Scattergram
Fig. 18.3 [Scatterplot of the cases from all three groups on the two discriminant functions; figure not reproduced.]
Territorial Map
Fig. 18.4 [Discriminant-function space partitioned into territories for groups 1, 2, and 3; figure not reproduced.]
The Logit Model
• The dependent variable is binary, and there are several independent variables that are metric.
• The binary logit model addresses how likely an observation is to belong to each group (classification and prediction, e.g., buyer vs. nonbuyer).
• It estimates the probability of an observation belonging to a particular group.
Binary Logit Model Formulation
The probability of success may be modeled using the logit model as:

log(P / (1 − P)) = a0 + a1X1 + a2X2 + ... + akXk

or, solving for P:

P = exp(a0 + a1X1 + ... + akXk) / (1 + exp(a0 + a1X1 + ... + akXk))
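A minimal sketch of the formula, with hypothetical coefficient values:

```python
import numpy as np

def logit_probability(x, a):
    """P = exp(a0 + sum(ai*xi)) / (1 + exp(a0 + sum(ai*xi)))."""
    z = a[0] + np.dot(a[1:], x)
    return np.exp(z) / (1.0 + np.exp(z))

# Hypothetical coefficients: intercept a0 = -1.0, slopes 0.8 and 0.5.
a = np.array([-1.0, 0.8, 0.5])
p = logit_probability(np.array([2.0, 1.0]), a)
print(round(p, 2))   # z = 1.1 maps to a probability strictly between 0 and 1
```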
Properties of the Logit Model
• Although Xi may vary from −∞ to +∞, P is constrained to lie between 0 and 1.
• When Xi approaches −∞, P approaches 0.
• When Xi approaches +∞, P approaches 1.
Estimation and Model Fit
• Model fit can be assessed with the Cox & Snell R Square and the Nagelkerke R Square. Both measures are similar to R² in multiple regression.
• The Cox & Snell R Square cannot equal 1.0, even if the fit is perfect.
• This limitation is overcome by the Nagelkerke R Square.
• Compare predicted and actual values of Y to determine the percentage of correct predictions.
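Both pseudo-R² measures can be reproduced from the log-likelihoods. A sketch using the −2 log likelihood reported later in Table 18.7 (23.471) and the intercept-only likelihood for 15 loyal and 15 nonloyal cases:

```python
import numpy as np

# -2 log likelihood of the fitted model, from Table 18.7.
LL1 = -23.471 / 2.0
# Intercept-only model: 15 loyal and 15 nonloyal cases, so p-hat = 0.5.
n = 30
LL0 = n * np.log(0.5)

# Cox & Snell R Square: 1 - (L0/L1)^(2/n), computed on the log scale.
cox_snell = 1.0 - np.exp((2.0 / n) * (LL0 - LL1))

# Nagelkerke rescales by the maximum attainable Cox & Snell value.
nagelkerke = cox_snell / (1.0 - np.exp((2.0 / n) * LL0))

print(round(cox_snell, 3), round(nagelkerke, 3))   # 0.453 0.604, as in Table 18.7
```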
Significance Testing
The significance of the estimated coefficients is based on Wald's statistic:

Wald = (ai / SEai)²

where
ai = logistic coefficient for that predictor variable
SEai = standard error of the logistic coefficient

The Wald statistic is chi-square distributed, with 1 degree of freedom if the variable is metric and with the number of categories minus 1 if the variable is nonmetric.
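For the Brand coefficient reported in Table 18.7 (a = 1.274, SE = 0.479), the Wald statistic and its p-value can be checked with scipy:

```python
from scipy.stats import chi2

# Brand coefficient and its standard error from Table 18.7.
a, se = 1.274, 0.479

wald = (a / se) ** 2             # ~7.07; Table 18.7 reports 7.075
p_value = chi2.sf(wald, df=1)    # 1 df, since Brand is a metric predictor
print(round(wald, 2), round(p_value, 3))
```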
Interpretation of Coefficients
• If Xi is increased by one unit, the log odds will change by ai
units, when the effect of other independent variables is held
constant.
• The sign of ai will determine whether the probability increases
(if the sign is positive) or decreases (if the sign is negative) by
this amount.
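For example, the Brand coefficient in Table 18.7 (a = 1.274) multiplies the odds of loyalty by exp(1.274) per unit increase in Brand:

```python
import numpy as np

# Brand coefficient from Table 18.7.
a = 1.274

# A one-unit increase in Brand adds 1.274 to the log odds of loyalty,
# i.e., multiplies the odds by exp(1.274).
odds_multiplier = np.exp(a)
print(round(odds_multiplier, 3))   # 3.575, the Exp(B) column in Table 18.7
```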
Explaining Brand Loyalty
Table 18.6

No.  Loyalty  Brand  Product  Shopping
1    1        4      3        5
2    1        6      4        4
3    1        5      2        4
4    1        7      5        5
5    1        6      3        4
6    1        3      4        5
7    1        5      5        5
8    1        5      4        2
9    1        7      5        4
10   1        7      6        4
11   1        6      7        2
12   1        5      6        4
13   1        7      3        3
14   1        5      1        4
15   1        7      5        5
16   0        3      1        3
17   0        4      6        2
18   0        2      5        2
19   0        5      2        4
20   0        4      1        3
21   0        3      3        4
22   0        3      4        5
23   0        3      6        3
24   0        4      4        2
25   0        6      3        6
26   0        3      6        3
27   0        4      3        2
28   0        3      5        2
29   0        5      5        3
30   0        1      3        2
Results of Logistic Regression
Table 18.7

Dependent Variable Encoding
Original Value  Internal Value
Not Loyal       0
Loyal           1

Model Summary
Step  -2 Log Likelihood  Cox & Snell R Square  Nagelkerke R Square
1     23.471(a)          .453                  .604
a. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.
Results of Logistic Regression
Table 18.7, cont.

Variables in the Equation
          B       S.E.   Wald   df  Sig.  Exp(B)
Brand     1.274   .479   7.075  1   .008  3.575
Product    .186   .322    .335  1   .563  1.205
Shopping   .590   .491   1.442  1   .230  1.804
Constant  -8.642  3.346  6.672  1   .010   .000
a. Variable(s) entered on step 1: Brand, Product, Shopping.

Classification Table(a)
                                  Predicted
                        Not Loyal  Loyal  Percentage Correct
Step 1  Loyalty to the Brand
        Not Loyal       12         3      80.0
        Loyal           3          12     80.0
        Overall Percentage                80.0
Results of Three-Group Discriminant Analysis: Multinomial Logistic Regression
(The group means, group standard deviations, and pooled within-groups correlation matrix repeat those shown for the three-group discriminant analysis in Table 18.5 above.)
Chapter Nineteen
Factor Analysis
The company asked 20 questions about casual dining and lifestyle.
How are these questions related to one another? What are the
important dimensions or factors?
Factor Analysis
• Factor analysis is a general name denoting a class of procedures
primarily used for data reduction and summarization.
• Factor analysis is an interdependence technique in that an entire
set of interdependent relationships is examined without making the
distinction between dependent and independent variables.
• Factor analysis is used in the following circumstances:
  • To identify underlying dimensions, or factors, that explain the correlations among a set of variables.
  • To identify a new, smaller set of uncorrelated variables to replace the original set of correlated variables in subsequent multivariate analysis (regression or discriminant analysis).
  • To identify a smaller set of salient variables from a larger set for use in subsequent multivariate analysis.
Factors Underlying Selected Psychographics and Lifestyles
Fig. 19.1 [Plot of lifestyle variables (e.g., plays, movies) in the space of two factors; figure not reproduced.]
Statistics Associated with Factor Analysis
• Bartlett's test of sphericity. Bartlett's test of sphericity is a
test statistic used to examine the hypothesis that the variables
are uncorrelated in the population. In other words, the population
correlation matrix is an identity matrix; each variable correlates
perfectly with itself (r = 1) but has no correlation with the other
variables (r = 0).
• Correlation matrix. A correlation matrix is a lower triangle
matrix showing the simple correlations, r, between all possible
pairs of variables included in the analysis. The diagonal elements,
which are all 1, are usually omitted.
Statistics Associated with Factor Analysis
• Communality. Communality is the amount of variance a variable
shares with all the other variables being considered. This is also
the proportion of variance explained by the common factors.
• Eigenvalue. The eigenvalue represents the total variance
explained by each factor.
• Factor loadings. Factor loadings are simple correlations between
the variables and the factors.
• Factor loading plot. A factor loading plot is a plot of the
original variables using the factor loadings as coordinates.
• Factor matrix. A factor matrix contains the factor loadings of
all the variables on all the factors extracted.
Statistics Associated with Factor Analysis
• Factor scores. Factor scores are composite scores estimated for
each respondent on the derived factors.
• Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. The
Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is an index
used to examine the appropriateness of factor analysis. High values
(between 0.5 and 1.0) indicate factor analysis is appropriate.
Values below 0.5 imply that factor analysis may not be
appropriate.
• Percentage of variance. The percentage of the total variance
attributed to each factor.
• Residuals. Residuals are the differences between the observed correlations, as given in the input correlation matrix, and the reproduced correlations, as estimated from the factor matrix.
• Scree plot. A scree plot is a plot of the eigenvalues against the number of factors in order of extraction.
Conducting Factor Analysis
Table 19.1

Respondent No.  V1    V2    V3    V4    V5    V6
1               7.00  3.00  6.00  4.00  2.00  4.00
2               1.00  3.00  2.00  4.00  5.00  4.00
3               6.00  2.00  7.00  4.00  1.00  3.00
4               4.00  5.00  4.00  6.00  2.00  5.00
5               1.00  2.00  2.00  3.00  6.00  2.00
6               6.00  3.00  6.00  4.00  2.00  4.00
7               5.00  3.00  6.00  3.00  4.00  3.00
8               6.00  4.00  7.00  4.00  1.00  4.00
9               3.00  4.00  2.00  3.00  6.00  3.00
10              2.00  6.00  2.00  6.00  7.00  6.00
11              6.00  4.00  7.00  3.00  2.00  3.00
12              2.00  3.00  1.00  4.00  5.00  4.00
13              7.00  2.00  6.00  4.00  1.00  3.00
14              4.00  6.00  4.00  5.00  3.00  6.00
15              1.00  3.00  2.00  2.00  6.00  4.00
16              6.00  4.00  6.00  3.00  3.00  4.00
17              5.00  3.00  6.00  3.00  3.00  4.00
18              7.00  3.00  7.00  4.00  1.00  4.00
19              2.00  4.00  3.00  3.00  6.00  3.00
20              3.00  5.00  3.00  6.00  4.00  6.00
21              1.00  3.00  2.00  3.00  5.00  3.00
22              5.00  4.00  5.00  4.00  2.00  4.00
23              2.00  2.00  1.00  5.00  4.00  4.00
24              4.00  6.00  4.00  6.00  4.00  7.00
25              6.00  5.00  4.00  2.00  1.00  4.00
26              3.00  5.00  4.00  6.00  4.00  7.00
27              4.00  4.00  7.00  2.00  2.00  5.00
28              3.00  7.00  2.00  6.00  4.00  3.00
29              4.00  6.00  3.00  7.00  2.00  7.00
30              2.00  3.00  2.00  4.00  7.00  2.00
Correlation Matrix
Table 19.2

     V1      V2      V3      V4      V5      V6
V1   1.000
V2  -0.530   1.000
V3   0.873  -0.155   1.000
V4  -0.086   0.572  -0.248   1.000
V5  -0.858   0.020  -0.778  -0.007   1.000
V6   0.004   0.640  -0.018   0.640  -0.136  1.000
Conducting Factor Analysis: Determine the Method of Factor
Analysis
• In principal components analysis, the total variance in the data
is considered. The diagonal of the correlation matrix consists of
unities, and full variance is brought into the factor matrix.
Principal components analysis is recommended when the primary
concern is to determine the minimum number of factors that will
account for maximum variance in the data for use in subsequent
multivariate analysis. The factors are called principal
components.
• In common factor analysis, the factors are estimated based only
on the common variance. Communalities are inserted in the diagonal
of the correlation matrix. This method is appropriate when the
primary concern is to identify the underlying dimensions and the
common variance is of interest. This method is also known as
principal axis factoring.
Results of Principal Components Analysis
Table 19.3

Communalities
Variable  Initial  Extraction
V1        1.000    0.926
V2        1.000    0.723
V3        1.000    0.894
V4        1.000    0.739
V5        1.000    0.878
V6        1.000    0.790

Initial Eigenvalues
Factor  Eigenvalue  % of Variance  Cumulative %
1       2.731       45.520         45.520
2       2.218       36.969         82.488
3       0.442       7.360          89.848
4       0.341       5.688          95.536
5       0.183       3.044          98.580
6       0.085       1.420          100.000
Results of Principal Components Analysis
Table 19.3, cont.

Extraction Sums of Squared Loadings
Factor  Eigenvalue  % of Variance  Cumulative %
1       2.731       45.520         45.520
2       2.218       36.969         82.488

Factor Matrix
Variable  Factor 1  Factor 2
V1         0.928     0.253
V2        -0.301     0.795
V3         0.936     0.131
V4        -0.342     0.789
V5        -0.869    -0.351
V6        -0.177     0.871

Rotation Sums of Squared Loadings
Factor  Eigenvalue  % of Variance  Cumulative %
1       2.688       44.802         44.802
2       2.261       37.687         82.488
Results of Principal Components Analysis
Table 19.3, cont.

Rotated Factor Matrix
Variable  Factor 1  Factor 2
V1         0.962    -0.027
V2        -0.057     0.848
V3         0.934    -0.146
V4        -0.098     0.845
V5        -0.933    -0.084
V6         0.083     0.885

Factor Score Coefficient Matrix
Variable  Factor 1  Factor 2
V1         0.358     0.011
V2        -0.001     0.375
V3         0.345    -0.043
V4        -0.017     0.377
V5        -0.350    -0.059
V6         0.052     0.395
Conducting Factor Analysis: Rotate Factors
• Although the initial or unrotated factor matrix indicates the
relationship between the factors and individual variables, it
seldom results in factors that can be interpreted, because the
factors are correlated with many variables. Therefore, through
rotation, the factor matrix is transformed into a simpler one that
is easier to interpret.
• In rotating the factors, we would like each factor to have
nonzero, or significant, loadings or coefficients for only some of
the variables. Likewise, we would like each variable to have
nonzero or significant loadings with only a few factors, if
possible with only one.
• The rotation is called orthogonal rotation if the axes are
maintained at right angles.
Conducting Factor Analysis: Rotate Factors
• The most commonly used method for rotation is the varimax
procedure. This is an orthogonal method of rotation that minimizes
the number of variables with high loadings on a factor, thereby
enhancing the interpretability of the factors. Orthogonal rotation
results in factors that are uncorrelated.
• The rotation is called oblique rotation when the axes are not
maintained at right angles, and the factors are correlated.
Sometimes, allowing for correlations among factors can simplify the
factor pattern matrix. Oblique rotation should be used when factors
in the population are likely to be strongly correlated.
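A compact numpy implementation of the varimax criterion (a common SVD-based formulation, not the textbook's own code), applied to the unrotated loadings of Table 19.3. A guaranteed property checked below: orthogonal rotation leaves each variable's communality unchanged.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation (a common SVD-based formulation)."""
    p, m = loadings.shape
    T = np.eye(m)          # accumulated orthogonal rotation matrix
    score = 0.0
    for _ in range(max_iter):
        rotated = loadings @ T
        # Gradient of the varimax criterion, solved via SVD.
        grad = loadings.T @ (rotated ** 3
                             - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(grad)
        T = U @ Vt
        if s.sum() - score < tol:
            break
        score = s.sum()
    return loadings @ T

# Unrotated two-factor loadings from Table 19.3.
L = np.array([[ 0.928,  0.253],
              [-0.301,  0.795],
              [ 0.936,  0.131],
              [-0.342,  0.789],
              [-0.869, -0.351],
              [-0.177,  0.871]])

rotated = varimax(L)
# Orthogonal rotation preserves each variable's communality (row SS).
print(np.allclose((rotated ** 2).sum(axis=1), (L ** 2).sum(axis=1)))   # True
```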
Factor Matrix Before and After Rotation
Fig. 19.5 [Plots of the variables' loadings on two factors before and after rotation; figure not reproduced.]
Conducting Factor Analysis: Interpret Factors
• A factor can then be interpreted in terms of the variables that
load high on it.
• Another useful aid in interpretation is to plot the variables,
using the factor loadings as coordinates. Variables at the end of
an axis are those that have high loadings on only that factor, and
hence describe the factor.
Chapter Twenty
Cluster Analysis
Chapter Outline
… known groups)
3) Statistics Associated with Cluster Analysis
4) Conducting Cluster Analysis
   i. Formulating the Problem
   ii. Selecting a Distance or Similarity Measure
   iii. Selecting a Clustering Procedure
   iv. Deciding on the Number of Clusters
   v. Interpreting and Profiling the Clusters
   vi. Assessing Reliability and Validity
Cluster Analysis
• Cluster analysis is a class of techniques used to classify
objects or cases into relatively homogeneous groups called
clusters. Objects in each cluster tend to be similar to each other
and dissimilar to objects in the other clusters. Cluster analysis
is also called classification analysis, or numerical
taxonomy.
• Both cluster analysis and discriminant analysis are concerned
with classification. However, discriminant analysis requires prior
knowledge of the cluster or group membership for each object or
case included, to develop the classification rule. In contrast, in
cluster analysis there is no a priori information about the group
or cluster membership for any of the objects. Groups or clusters
are suggested by the data, not defined a priori.
A Practical Clustering Situation
An Ideal Clustering Situation
Statistics Associated with Cluster Analysis
• Agglomeration schedule. An agglomeration schedule gives
information on the objects or cases being combined at each stage of
a hierarchical clustering process.
• Cluster centroid. The cluster centroid consists of the mean values of the variables for all the cases or objects in a particular cluster.
• Cluster centers. The cluster centers are the initial starting
points in nonhierarchical clustering. Clusters are built around
these centers, or seeds.
• Cluster membership. Cluster membership indicates the cluster to
which each object or case belongs.
Statistics Associated with Cluster Analysis
• Dendrogram. A dendrogram, or tree graph, is a graphical device
for displaying clustering results. Vertical lines represent
clusters that are joined together. The position of the line on the
scale indicates the distances at which clusters were joined. The
dendrogram is read from left to right. Figure 20.8 is a
dendrogram.
• Distances between cluster centers. These distances indicate how
separated the individual pairs of clusters are. Clusters that are
widely separated are distinct, and therefore desirable.
Attitudinal Data For Clustering (Table 20.1)

Case No.   V1   V2   V3   V4   V5   V6
    1       6    4    7    3    2    3
    2       2    3    1    4    5    4
    3       7    2    6    4    1    3
    4       4    6    4    5    3    6
    5       1    3    2    2    6    4
    6       6    4    6    3    3    4
    7       5    3    6    3    3    4
    8       7    3    7    4    1    4
    9       2    4    3    3    6    3
   10       3    5    3    6    4    6
   11       1    3    2    3    5    3
   12       5    4    5    4    2    4
   13       2    2    1    5    4    4
   14       4    6    4    6    4    7
   15       6    5    4    2    1    4
   16       3    5    4    6    4    7
   17       4    4    7    2    2    5
   18       3    7    2    6    4    3
   19       4    6    3    7    2    7
   20       2    3    2    4    7    2
Conducting Cluster Analysis: Select a Distance or Similarity
Measure
• The most commonly used measure of similarity is the Euclidean
distance or its square. The Euclidean distance is the square root
of the sum of the squared differences in values for each variable.
Other distance measures are also available. The city-block or
Manhattan distance between two objects is the sum of the absolute
differences in values for each variable. The Chebychev distance
between two objects is the maximum absolute difference in values
for any variable.
• If the variables are measured in vastly different units, the
clustering solution will be influenced by the units of measurement.
In these cases, before clustering respondents, we must standardize
the data by rescaling each variable to have a mean of zero and a
standard deviation of unity. It is also desirable to eliminate
outliers (cases with atypical values).
• Use of different distance measures may lead to different
clustering results. Hence, it is advisable to use different
measures and compare the results.
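The three distance measures above can be sketched directly, here applied to cases 1 and 2 of the attitudinal data (Table 20.1), along with the standardization step the second bullet calls for:

```python
import math

# Two attitude profiles: cases 1 and 2 from Table 20.1 (V1..V6).
x = [6, 4, 7, 3, 2, 3]
y = [2, 3, 1, 4, 5, 4]

# Euclidean: square root of the sum of squared differences.
euclidean = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))   # → 8.0
# City-block (Manhattan): sum of absolute differences.
manhattan = sum(abs(a - b) for a, b in zip(x, y))                # → 16
# Chebychev: maximum absolute difference on any variable.
chebychev = max(abs(a - b) for a, b in zip(x, y))                # → 6

# Rescaling a variable to mean 0 and standard deviation 1 before clustering
# (sample standard deviation, n - 1 divisor):
def standardize(col):
    mean = sum(col) / len(col)
    sd = (sum((v - mean) ** 2 for v in col) / (len(col) - 1)) ** 0.5
    return [(v - mean) / sd for v in col]
```

Running the same pair of cases through all three measures makes the last bullet concrete: each measure ranks distances differently, so the clustering solution can change with the choice.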
A Classification of Clustering Procedures
Fig. 20.4
Other Agglomerative Clustering Methods
Results of Hierarchical Clustering

Agglomeration Schedule Using Ward's Procedure (Table 20.2)

         Clusters Combined                  Stage Cluster First Appears
Stage   Cluster 1   Cluster 2   Coefficient   Cluster 1   Cluster 2   Next Stage
  1        14          16         1.000000        0           0            6
  2         6           7         2.000000        0           0            7
  3         2          13         3.500000        0           0           15
  4         5          11         5.000000        0           0           11
  5         3           8         6.500000        0           0           16
  6        10          14         8.160000        0           1            9
  7         6          12        10.166667        2           0           10
  8         9          20        13.000000        0           0           11
  9         4          10        15.583000        0           6           12
 10         1           6        18.500000        6           7           13
 11         5           9        23.000000        4           8           15
 12         4          19        27.750000        9           0           17
 13         1          17        33.100000       10           0           14
 14         1          15        41.333000       13           0           16
 15         2           5        51.833000        3          11           18
 16         1           3        64.500000       14           5           19
 17         4          18        79.667000       12           0           18
 18         2           4       172.662000       15          17           19
 19         1           2       328.600000       16          18            0
Conducting Cluster Analysis: Decide on the Number of Clusters
• Theoretical, conceptual, or practical considerations may suggest
a certain number of clusters.
• In hierarchical clustering, the distances at which clusters are
combined can be used as criteria. This information can be obtained
from the agglomeration schedule or from the dendrogram.
• In nonhierarchical clustering, the ratio of total within-group
variance to between-group variance can be plotted against the
number of clusters. The point at which an elbow or a sharp bend
occurs indicates an appropriate number of clusters.
• The relative sizes of the clusters should be meaningful.
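The hierarchical criterion in the second bullet can be sketched with the coefficients from the agglomeration schedule (Table 20.2). The stopping rule below (flag the first merge whose coefficient increase is more than three times the previous increase) is one simple heuristic, not the chapter's prescription:

```python
# Ward's coefficients for stages 1..19 of the 20-case attitudinal data (Table 20.2).
coeffs = [1.0, 2.0, 3.5, 5.0, 6.5, 8.16, 10.166667, 13.0, 15.583,
          18.5, 23.0, 27.75, 33.1, 41.333, 51.833, 64.5, 79.667,
          172.662, 328.6]

n_cases = 20
# increases[i] is the coefficient increase at stage i + 2.
increases = [coeffs[i] - coeffs[i - 1] for i in range(1, len(coeffs))]

# Heuristic: stop before the first stage whose increase dwarfs the previous one.
stop_stage = None
for i in range(1, len(increases)):
    if increases[i] > 3 * increases[i - 1]:
        stop_stage = i + 2
        break

# Before stage `stop_stage` executes, n_cases - (stop_stage - 1) clusters remain.
n_clusters = n_cases - (stop_stage - 1)
print(stop_stage, n_clusters)   # → 18 3
```

Here the coefficient jumps from 79.667 to 172.662 at stage 18, so merging from three clusters to two is costly and a three-cluster solution is suggested.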
Conducting Cluster Analysis: Interpreting and Profiling the
Clusters
• Interpreting and profiling clusters involves examining the
cluster centroids. The centroids enable us to describe each cluster
by assigning it a name or label.
• It is often helpful to profile the clusters in terms of variables
that were not used for clustering. These may include demographic,
psychographic, product usage, media usage, or other
variables.
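Mechanically, examining the cluster centroids reduces to computing per-variable means within each cluster. The four cases and the cluster labels below are hypothetical, for illustration only; the chapter's actual membership comes from the Ward's solution:

```python
# Sketch: compute cluster centroids (per-variable means) from raw scores.
# Cases and membership labels are hypothetical, for illustration only.
data = {
    1: [6, 4, 7, 3, 2, 3],   # case -> [V1..V6]
    2: [2, 3, 1, 4, 5, 4],
    3: [7, 2, 6, 4, 1, 3],
    4: [4, 6, 4, 5, 3, 6],
}
membership = {1: "A", 2: "B", 3: "A", 4: "B"}

grouped = {}
for case, scores in data.items():
    grouped.setdefault(membership[case], []).append(scores)

centroids = {
    c: [round(sum(col) / len(col), 2) for col in zip(*rows)]
    for c, rows in grouped.items()
}
print(centroids)
# → {'A': [6.5, 3.0, 6.5, 3.5, 1.5, 3.0], 'B': [3.0, 4.5, 2.5, 4.5, 4.0, 5.0]}
```

Each centroid is then read against the variable meanings to name the cluster, exactly as the first bullet describes.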
Cluster Distribution

Cluster    N    % of Combined   % of Total
   1       6        30.0%          30.0%
   2       6        30.0%          30.0%
   3       8        40.0%          40.0%
Cluster Profiles

            Fun              Bad for Budget     Eating Out
Cluster    Mean  Std. Dev.   Mean  Std. Dev.    Mean  Std. Dev.
   1       1.67    .516      3.00    .632       1.83    .753
   2       3.50    .548      5.83    .753       3.33    .816
   3       5.75   1.035      3.63    .916       6.00   1.069
Combined   3.85   1.899      4.10   1.410       3.95   2.012

            Best Buys        Don't Care         Compare Prices
Cluster    Mean  Std. Dev.   Mean  Std. Dev.    Mean  Std. Dev.
   1       3.50   1.049      5.50   1.049       3.33    .816
   2       6.00    .632      3.50    .837       6.00   1.549
   3       3.13    .835      1.88    .835       3.88    .641
Combined   4.10   1.518      3.45   1.761       4.35   1.496