
Multidimensional

Date post: 04-Jan-2016

Appendix H

bow83755_app H_001-012.qxd 23/5/08 4:52 PM Page H1


APPENDIX H: Factor Analysis, Cluster Analysis, and Multidimensional Scaling¹

In the following two exercises, we illustrate factor analysis, cluster analysis, and multidimensional scaling.

H.1 Factor Analysis: An Application of Correlation

A personnel officer interviewed and rated 48 job applicants on the following 15 variables.

1 Form of application letter 6 Lucidity 11 Ambition

2 Appearance 7 Honesty 12 Grasp

3 Academic ability 8 Salesmanship 13 Potential

4 Likeability 9 Experience 14 Keenness to join

5 Self-confidence 10 Drive 15 Suitability

In order to better understand the relationships between the 15 variables, the personnel officer will use a technique called factor analysis. The first step in factor analysis is to standardize each variable. A variable is standardized by calculating the mean and the standard deviation of the 48 values of the variable and then subtracting from each value the mean and dividing the resulting difference by the standard deviation. The variance of the values of each standardized variable can be shown to be equal to 1, and the pairwise correlations between the standardized variables can be shown to be equal to the pairwise correlations between the original variables. Although we will not give the 48 values of each of the 15 variables (see Kendall (1980)), we present in Table H.1 a matrix containing the pairwise correlations of these variables. Considering the matrix, we note that there are so many fairly large pairwise correlations that it is difficult to understand the relationships between the 15 variables. When we use factor analysis, we determine whether there are uncorrelated factors, fewer in number than 15, that (1) explain a large percentage of the total variation in the 15 variables and (2) help us to understand the relationships between the 15 variables.
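The two properties of standardization stated above (variance 1, unchanged correlations) can be checked numerically. The following is a minimal sketch, not the appendix's own computation: the ratings are made up for illustration (the 48 applicant values are not given in the text), the function names are hypothetical, and the sample standard deviation (divisor n - 1) is assumed.

```python
import math
import statistics


def standardize(values):
    """Subtract the mean from each value and divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation (n - 1)
    return [(v - mean) / sd for v in values]


def correlation(xs, ys):
    """Pearson pairwise correlation between two equally long lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ratings of six applicants on two variables (not the Kendall data).
appearance = [6, 9, 7, 5, 6, 8]
drive = [5, 8, 9, 4, 6, 7]

z_appearance = standardize(appearance)
z_drive = standardize(drive)

print("variance of standardized variable:", statistics.variance(z_appearance))
print("correlation, original variables:  ", correlation(appearance, drive))
print("correlation, standardized:        ", correlation(z_appearance, z_drive))
```

The two printed correlations agree up to floating-point rounding, which is why Table H.1 can be read as the correlation matrix of either the original or the standardized variables.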

To find the desired factors, we first find what are called principal components. The first principal component is the composite of the 15 standardized variables that explains the highest percentage of the total of the variances of these variables. The SPSS output in Figure H.1 tells us that the first principal component is

$$y_{(1)} = .44676x_1 + .58285x_2 + .10900x_3 + \cdots + .64584x_{15}$$

where x1, x2, . . . , x15 denote the 15 standardized variables. Here, the coefficient multiplied by each xi is called the factor loading of y(1) on xi and can be shown to equal the pairwise correlation between y(1) and xi. For example, the factor loading .58285 says that the pairwise correlation between y(1) and x2 is .58285. The SPSS output also tells us that the variance (or eigenvalue) of the 48 values of y(1) is 7.50395. Furthermore, since the sum of the variances of the 15 standardized variables is 15, the SPSS output tells us that the variance of y(1) explains (7.50395/15)100% = 50% of the total variation in the standardized variables. Similarly, the SPSS output shows the second principal component, which has a variance of 2.06148 and explains (2.06148/15)100% = 13.7% of the total variation in the standardized variables. In all, there are 15 principal components that are uncorrelated with each other and explain a cumulative percentage of 100 percent of the total variation in the 15 variables. Also, note that the variance of a particular principal component can be shown to equal the sum of the squared pairwise correlations between the principal component and the 15 standardized variables. For example, examining the first column of pairwise correlations in the upper portion of Figure H.1, it follows that the variance of the first principal component is

$$(.44676)^2 + (.58285)^2 + \cdots + (.64584)^2 = 7.50395$$
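The sum above can be reproduced from the first column of the unrotated factor matrix in Figure H.1. This is only an arithmetic check under the assumption that the loadings are transcribed correctly from the figure; the variable names are illustrative.

```python
# Loadings of x1..x15 on the first principal component
# (column FACTOR 1 of the unrotated factor matrix in Figure H.1).
loadings_pc1 = [
    0.44676, 0.58285, 0.10900, 0.61698, 0.79807,
    0.86688, 0.43330, 0.88244, 0.36549, 0.86261,
    0.87185, 0.90776, 0.91310, 0.71033, 0.64584,
]

# Variance (eigenvalue) of the component = sum of squared loadings.
eigenvalue_pc1 = sum(loading ** 2 for loading in loadings_pc1)

# Each standardized variable has variance 1, so the total variance is 15.
pct_explained = 100 * eigenvalue_pc1 / len(loadings_pc1)

print("sum of squared loadings:", eigenvalue_pc1)
print("percent of total variation explained:", pct_explained)
```

Up to rounding in the printed loadings, the result matches the eigenvalue 7.50395 and the 50.0 percent shown in the SPSS output.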

Although the SPSS output shows the percentage of the total variation explained by each of the 15 principal components, it only shows 7 of these principal components. The reason is that,

¹ Some of the discussion and three examples in this appendix are based on Chapters 15 and 16 in Intermediate Statistical Methods, A Computer Package Approach (Prentice Hall, 1983) by Mark L. Berenson, David M. Levine, and Mathew Goldstein.

TABLE H.1 A Matrix of Pairwise Correlations for the Applicant Data

Variable     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
1         1.00   .24   .04   .31   .09   .23  -.11   .27   .55   .35   .28   .34   .37   .47   .59
2               1.00   .12   .38   .43   .37   .35   .48   .14   .34   .55   .51   .51   .28   .38
3                     1.00   .00   .00   .08  -.03   .05   .27   .09   .04   .20   .29  -.32   .14
4                           1.00   .30   .48   .65   .35   .14   .39   .35   .50   .61   .69   .33
5                                 1.00   .81   .41   .82   .02   .70   .84   .72   .67   .48   .25
6                                       1.00   .36   .83   .15   .70   .76   .88   .78   .53   .42
7                                             1.00   .23  -.16   .28   .21   .39   .42   .45   .00
8                                                   1.00   .23   .81   .86   .77   .73   .55   .55
9                                                         1.00   .34   .20   .30   .35   .21   .69
10                                                              1.00   .78   .71   .79   .61   .62
11                                                                    1.00   .78   .77   .55   .43
12                                                                          1.00   .88   .55   .53
13                                                                                1.00   .54   .57
14                                                                                      1.00   .40
15                                                                                            1.00

Source: Reproduced by permission of the Publishers, Charles Griffin & Company Ltd., of London and High Wycombe, from Kendall, Multivariate Analysis, 2nd ed. (1980).

FIGURE H.1 SPSS Output of a Factor Analysis of the Applicant Data (7 Factors Used)

FACTOR MATRIX USING PRINCIPAL FACTOR, NO ITERATIONS

FACTOR 1a FACTOR 2b FACTOR 3 FACTOR 4 FACTOR 5 FACTOR 6 FACTOR 7 FACTOR EIGENVALUE PCT OF VAR CUM PCT

x1 0.44676 0.61880 0.37635 -0.12148 0.10168 0.42496 0.08504 1 7.50395a 50.0 50.0

x2 0.58285 -0.05019 -0.01995 0.28167 0.75188 -0.03325 0.00345 2 2.06148b 13.7 63.8

x3 0.10900 0.33907 -0.49450 0.71393 -0.18095 0.16113 0.18206 3 1.46768 9.8 73.6

x4 0.61698 -0.18150 0.57968 0.35707 -0.09904 0.07837 -0.05714 4 1.20910 8.1 81.6

x5 0.79807 -0.35611 -0.29930 -0.17939 0.00025 0.00377 0.06620 5 0.74143 4.9 86.6

x6 0.86688 -0.18544 -0.18414 -0.06923 -0.17813 0.11744 -0.30132 6 0.48402 3.2 89.8

x7 0.43330 -0.58195 0.36036 0.44570 -0.06052 -0.21591 0.06539 7 0.34408 2.3 92.1

x8 0.88244 -0.05647 -0.24821 -0.22786 0.02960 -0.06262 0.00981 8 0.31027 2.1 94.1

x9 0.36549 0.79438 0.09258 0.07431 -0.08999 -0.25962 -0.06758 9 0.25965 1.7 95.9

x10 0.86261 0.06908 -0.09993 -0.16645 -0.17554 -0.17549 0.29665 10 0.20575 1.4 97.2

x11 0.87185 -0.09840 -0.25565 -0.20948 0.13698 0.07573 0.12514 11 0.15093 1.0 98.3

x12 0.90776 -0.03023 -0.13453 0.09726 -0.06359 0.10194 -0.24685 12 0.09327 0.6 98.9

x13 0.91310 0.03250 -0.07327 0.21842 -0.10489 0.04666 -0.00366 13 0.07628 0.5 99.4

x14 0.71033 -0.11478 0.55801 -0.23496 -0.10071 0.05911 0.14353 14 0.05766 0.4 99.8

x15 0.64584 0.60374 0.10687 -0.02889 0.06431 -0.29308 -0.10537 15 0.03441 0.2 100.0

VARIMAX ROTATED FACTOR MATRIX

FACTOR 1 FACTOR 2 FACTOR 3 FACTOR 4 FACTOR 5 FACTOR 6 FACTOR 7 VARIABLE COMMUNALITY

x1 0.12359 0.04204 0.42738 -0.00497 0.85336✔ 0.09437 0.01521 x1 0.93708

x2 0.32636 0.21176 0.11729 0.05621 0.07715 0.90226✔ 0.01101 x2 0.98841

x3 0.05396 -0.02816 0.13368 0.97451✔ -0.01201 0.03936 0.00014 x3 0.97293

x4 0.22106 0.85846✔ 0.13049 -0.01215 0.26494 0.09997 0.11479 x4 0.89636

x5 0.91144✔ 0.15413 -0.08310 -0.04208 -0.05072 0.13904 -0.06943 x5 0.88989

x6 0.87938✔ 0.25709 0.10119 0.01702 0.05912 -0.00285 0.32778 x6 0.96088

x7 0.20161 0.87606✔ -0.13423 0.00066 -0.22952 0.16057 -0.06982 x7 0.90948

x8 0.90070✔ 0.07788 0.21967 -0.05953 0.05564 0.16142 -0.04510 x8 0.90031

x9 0.06497 -0.03039 0.88690✔ 0.16270 0.20105 -0.01159 -0.00158 x9 0.85878

x10 0.79694✔ 0.20942 0.35909 0.02333 0.10603 -0.02550 -0.34034 x10 0.93618

x11 0.89427✔ 0.06033 0.08680 -0.01813 0.16585 0.26018 -0.11304 x11 0.91921

x12 0.79690✔ 0.30629 0.23598 0.14095 0.12915 0.14954 0.29053 x12 0.92786

x13 0.73031✔ 0.40428 0.29019 0.26489 0.16244 0.14080 0.06079 x13 0.90108

x14 0.45932 0.56662 0.16988 -0.38607 0.42522 -0.03753 -0.16248 x14 0.91856

x15 0.33966 0.07614 0.84300✔ -0.01002 0.18417 0.17055 0.00713 x15 0.89497

a First principal component has variance 7.50395.
b Second principal component has variance 2.06148.

since we wish to obtain final factors that are fewer in number than the number of original variables, we have instructed SPSS to retain 7 principal components for “further study.” The choice of 7 principal components, while somewhat arbitrary, is based on the belief that 7 principal components will explain a high percentage of the total variation in the 15 variables. The SPSS output


tells us that this choice is reasonable: the first 7 principal components explain 92.1 percent of the total variation in the 15 variables. The reason that we need to “further study” the 7 principal components is that, in general, principal components tend to be correlated with many of the variables (see the factor loadings on the SPSS output) and thus tend to be difficult to interpret in a meaningful way. For this reason, we rotate the 7 principal components by using VARIMAX rotation. This technique attempts to find final uncorrelated factors each of which loads highly on (that is, is strongly correlated with) a limited number of the 15 original standardized variables and loads as low as possible on the rest of the standardized variables. The SPSS output shows the results of the VARIMAX rotation. Examining the check marks that we have placed on the output, we see that Factor 1 loads heavily on variables 5 (self-confidence), 6 (lucidity), 8 (salesmanship), 10 (drive), 11 (ambition), 12 (grasp), and 13 (potential). Therefore, Factor 1 might be interpreted as an “extroverted personality” dimension. Factor 2 loads heavily on variables 4 (likeability) and 7 (honesty). Therefore, Factor 2 might be interpreted as an “agreeable personality” dimension. Similarly, Factors 3 through 7 might be interpreted as the following dimensions: Factor 3: “experience”; Factor 4: “academic ability”; Factor 5: “form of application letter”; Factor 6: “appearance”; Factor 7: no discernible dimension. Note that, although variable 14 (keenness to join) does not load heavily on any factor, its correlation of .56662 with Factor 2 (“agreeable personality”) might mean that it should be interpreted to be part of the agreeable personality dimension.

We next note that the communality to the right of each variable in Figure H.1 is the percentage of the variance of the variable that is explained by the 7 factors. The communality for each variable can be shown to equal the sum of the squared pairwise correlations between the variable and the 7 factors. For example, examining the first row of pairwise correlations in the lower portion of Figure H.1, it follows that the communality for variable 1 is

$$(.12359)^2 + (.04204)^2 + \cdots + (.01521)^2 = .93708$$
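The same check can be run for any row of the rotated factor matrix. The sketch below assumes the loadings for x1 and x2 are transcribed correctly from the lower portion of Figure H.1; the dictionary and function names are illustrative only.

```python
# Rotated (VARIMAX) loadings of x1 and x2 on the 7 factors, from Figure H.1.
rotated_loadings = {
    "x1": [0.12359, 0.04204, 0.42738, -0.00497, 0.85336, 0.09437, 0.01521],
    "x2": [0.32636, 0.21176, 0.11729, 0.05621, 0.07715, 0.90226, 0.01101],
}


def communality(loadings):
    """Sum of squared correlations between one variable and the retained factors."""
    return sum(loading ** 2 for loading in loadings)


communalities = {name: communality(row) for name, row in rotated_loadings.items()}

for name, value in communalities.items():
    print(name, "communality:", value)
```

Up to rounding, the results agree with the COMMUNALITY column of Figure H.1 (.93708 for x1 and .98841 for x2).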

All of the communalities in Figure H.1 seem high. However, some statisticians might say that we have retained too many factors. To understand this, note that the upper portion of Figure H.1 tells us that the sum of the variances of the first seven factors is

$$7.50395 + 2.06148 + 1.46768 + 1.20910 + .74143 + .48402 + .34408 = 13.81174$$

This variance is (13.81174/15)100% = 92.1% of the sum of the variances of the 15 standardized variables. Some statisticians would suggest that we retain a factor only if its variance exceeds 1, the variance of each standardized variable. If we do this, we would retain 4 factors, since the variance of the fourth factor is 1.20910 and the variance of the fifth factor is .74143. Figure H.2 gives the SAS output obtained by using 4 factors. Examining the check marks that we have placed on the output, we see that Factors 1 through 4 might be interpreted as follows: Factor 1: “extroverted personality”; Factor 2: “experience”; Factor 3: “agreeable personality”; Factor 4: “academic ability.” Variable 2 (appearance) does not load heavily on any factor and thus is its “own factor,” as Factor 6 on the SPSS output in Figure H.1 indicated is true. Variable 1 (form of application letter) loads heavily on Factor 2 (“experience”). In summary, there is not much difference between the 7 factor and 4 factor solutions. We might therefore conclude that the 15 variables can be reduced to the following 5 uncorrelated factors: “extroverted personality,” “experience,” “agreeable personality,” “academic ability,” and “appearance.”
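The retention rule described above (keep a factor only if its variance exceeds 1) and the cumulative percentages can be reproduced from the eigenvalues listed in Figure H.1. This is a small sketch assuming those eigenvalues are transcribed correctly; the variable names are illustrative.

```python
from itertools import accumulate

# Eigenvalues (variances) of the 15 principal components, from Figure H.1.
eigenvalues = [
    7.50395, 2.06148, 1.46768, 1.20910, 0.74143,
    0.48402, 0.34408, 0.31027, 0.25965, 0.20575,
    0.15093, 0.09327, 0.07628, 0.05766, 0.03441,
]

total_variance = len(eigenvalues)  # 15 standardized variables, each with variance 1

# Rule discussed in the text: retain a factor only if its variance exceeds 1.
retained = [value for value in eigenvalues if value > 1]

# Cumulative percentage of total variation explained by the first k components.
cumulative_pct = [100 * s / total_variance for s in accumulate(eigenvalues)]

print("factors retained by the variance > 1 rule:", len(retained))
print("cumulative percent after 4 factors:", round(cumulative_pct[3], 1))
print("cumulative percent after 7 factors:", round(cumulative_pct[6], 1))
```

The rule retains 4 factors, and the cumulative percentages after 4 and 7 factors round to the 81.6 and 92.1 shown in the CUM PCT column.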

a In Applied Multivariate Techniques (John Wiley and Sons, 1996), Subhash Sharma considers a study in which 143 respondents rated three brands of laundry detergents on 12 product attributes using a 5-point Likert scale. The 12 product attributes are:

V1: Gentle to natural fabrics V7: Makes colors bright

V2: Won’t harm colors V8: Removes grease stains

V3: Won’t harm synthetics V9: Good for greasy oil

V4: Safe for lingerie V10: Pleasant fragrance

V5: Strong, powerful V11: Removes collar soil

V6: Gets dirt out V12: Removes stubborn stains

Table H.2 is a matrix containing the pairwise correlations between the variables, and Figure H.3 is the SPSS output of a factor analysis of the detergent data. Why did the analyst choose to retain two factors? Discuss why Factor 1 can be interpreted to be the ability of the detergent to clean clothes. Discuss why Factor 2 can be interpreted to be the mildness of the detergent.

b Table H.3 shows the output of a factor analysis of the ratings of 82 respondents who were asked to evaluate a particular discount store on 29 attributes using a 7-point Likert scale. Interpret and give names to the five factors.

FIGURE H.2 SAS Output of a Factor Analysis of the Applicant Data (4 Factors Used)

PRINCIPAL AXIS

PRIOR ESTIMATES OF COMMUNALITY

X1 X2 X3 X4 X5 X6 X7 X8

1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000

X9 XA XB XC XD XE XF

1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000

1 2 3 4 5 6 7 8

EIGENVALUES 7.503986 2.061498 1.467686 1.209097 0.741423 0.484018 0.344075 0.310272

PORTION 0.500 0.137 0.098 0.081 0.049 0.032 0.023 0.021

CUM PORTION 0.500 0.638 0.736 0.816 0.866 0.898 0.921 0.941

9 10 11 12 13 14 15

EIGENVALUES 0.259652 0.205746 0.150932 0.093269 0.076283 0.057655 0.034407

PORTION 0.017 0.014 0.010 0.006 0.005 0.004 0.002

CUM PORTION 0.959 0.972 0.983 0.989 0.994 0.998 1.000

4 FACTORS WILL BE RETAINED.

FACTOR PATTERN

FACTOR1 FACTOR2 FACTOR3 FACTOR4

X1 0.44676 0.61880 0.37635 -0.12148 FORM OF APPLICATION LETTER

X2 0.58285 -0.05019 -0.01995 0.28166 APPEARANCE

X3 0.10900 0.33907 -0.49449 0.71391 ACADEMIC ABILITY

X4 0.61699 -0.18149 0.57967 0.35706 LIKEABILITY

X5 0.79807 -0.35610 -0.29930 -0.17939 SELF CONFIDENCE

X6 0.86688 -0.18543 -0.18414 -0.06923 LUCIDITY

X7 0.43330 -0.58195 0.36035 0.44569 HONESTY

X8 0.88244 -0.05647 -0.24821 -0.22786 SALESMANSHIP

X9 0.36549 0.79437 0.09258 0.07431 EXPERIENCE

XA 0.86261 0.06908 -0.09993 -0.16645 DRIVE

XB 0.87186 -0.09840 -0.25564 -0.20948 AMBITION

XC 0.90776 -0.03023 -0.13453 0.09726 GRASP

XD 0.91310 0.03250 -0.07327 0.21842 POTENTIAL

XE 0.71033 -0.11478 0.55800 -0.23495 KEENNESS TO JOIN

XF 0.64584 0.60373 0.10687 -0.02889 SUITABILITY

VARIMAX

ROTATED FACTOR PATTERN

FACTOR1 FACTOR2 FACTOR3 FACTOR4

X1 0.11447 0.83336✔ 0.11063 -0.13808 FORM OF APPLICATION LETTER

X2 0.43964 0.14979 0.39417 0.22555 APPEARANCE

X3 0.06115 0.12744 0.00557 0.92792✔ ACADEMIC ABILITY

X4 0.21559 0.24667 0.87360✔ -0.08137 LIKEABILITY

X5 0.91896✔ -0.10368 0.16241 -0.06219 SELF CONFIDENCE

X6 0.86439✔ 0.10195 0.25878 0.00642 LUCIDITY

X7 0.21715 -0.24607 0.86440✔ 0.00341 HONESTY

X8 0.91799✔ 0.20635 0.08773 -0.04938 SALESMANSHIP

X9 0.08530 0.84871✔ -0.05537 0.21919 EXPERIENCE

XA 0.79576✔ 0.35407 0.15950 -0.05026 DRIVE

XB 0.91641✔ 0.16268 0.10496 -0.04184 AMBITION

XC 0.80415✔ 0.25872 0.34049 0.15153 GRASP

XD 0.73917✔ 0.32885 0.42493 0.22980 POTENTIAL

XE 0.43597 0.36420 0.54105 -0.51862 KEENNESS TO JOIN

XF 0.37950 0.79807✔ 0.07847 0.08221 SUITABILITY

VARIANCE EXPLAINED BY EACH FACTOR

FACTOR1 FACTOR2 FACTOR3 FACTOR4

5.745474 2.735065 2.413961 1.347767

Good service—friendly; Price level;Attractiveness;Spaciousness; Size


H.2 Cluster Analysis and Multidimensional Scaling

Professional baseball and tennis were less popular in 2000 than in the late 1970’s and early 1980’s. To see why this might be true, we consider a study by Levine (1977) concerning the perceptions of various sports in 1977. Levine had 45 undergraduate students give each of boxing (BX), basketball (BK), golf (G), swimming (SW), skiing (SK), baseball (BB), ping pong (PP), hockey (HK), handball (H), track and field (TF), bowling (BW), tennis (T), and football (F) an integer rating of 1 to 7 on six scales: fast moving (1) versus slow moving (7); complicated rules (1) versus simple rules (7); team oriented (1) versus individual (7); easy to play (1) versus hard to play (7); noncontact (1) versus contact (7); competition against opponent (1) versus competition against standard (7). The first two rows of Table H.4 present a particular undergraduate’s ratings of boxing and basketball on each of the six scales, and Table H.5 presents the average rating by all 45 undergraduates of each sport on each of the six scales.

To better understand the perceptions of the 13 sports, we will cluster them into groups. The first step in doing this is to consider the distance between each pair of sports for each undergraduate. For example, to calculate the distance between boxing and basketball for the undergraduate whose ratings are given in Table H.4, we calculate the paired difference between the ratings on each of the six scales, square each paired difference, sum the six squared paired differences, and find the square root of this sum. The resulting distance is 5.9161. A distance for each undergraduate for each pair of sports can be found, and then an average distance over the 45 undergraduates for each pair of sports can be calculated. Statistical software packages do this, but these packages sometimes standardize the individual ratings before calculating the distances. We will not discuss the various ways in which

TABLE H.2 Correlation Matrix for Detergent Data²

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12

V1 1.00000 0.41901 0.51840 0.56641 0.18122 0.17454 0.23034 0.30647 0.24051 0.21192 0.27443 0.20694

V2 0.41901 1.00000 0.57599 0.49886 0.18666 0.24648 0.22907 0.22526 0.21967 0.25879 0.32132 0.25853

V3 0.51840 0.57599 1.00000 0.64325 0.29080 0.34428 0.41083 0.34028 0.32854 0.38828 0.39433 0.36712

V4 0.56641 0.49886 0.64325 1.00000 0.38360 0.39637 0.37699 0.40391 0.42337 0.36564 0.33691 0.36734

V5 0.18122 0.18666 0.29080 0.38360 1.00000 0.57915 0.59400 0.67623 0.69269 0.43873 0.55485 0.65261

V6 0.17454 0.24648 0.34428 0.39637 0.57915 1.00000 0.57756 0.70103 0.62280 0.62174 0.59855 0.57845

V7 0.23034 0.22907 0.41083 0.37699 0.59400 0.57756 1.00000 0.67682 0.68445 0.54175 0.78361 0.63889

V8 0.30647 0.22526 0.34028 0.40391 0.67623 0.70103 0.67682 1.00000 0.69813 0.68589 0.71115 0.71891

V9 0.24051 0.21967 0.32854 0.42337 0.69269 0.62280 0.68445 0.69813 1.00000 0.58579 0.64637 0.69111

V10 0.21192 0.25879 0.38828 0.36564 0.43873 0.62174 0.54175 0.68589 0.58579 1.00000 0.62250 0.63494

V11 0.27443 0.32132 0.39433 0.33691 0.55485 0.59855 0.78361 0.71115 0.64637 0.62250 1.00000 0.63973

V12 0.20694 0.25853 0.36712 0.36734 0.65261 0.57845 0.63889 0.71891 0.69111 0.63494 0.63973 1.00000

² The source of Table H.2 and Figure H.3 is Applied Multivariate Techniques by Subhash Sharma, John Wiley and Sons, Inc., New York, 1996.

FIGURE H.3 SPSS Output of a Factor Analysis of the Detergent Data²

INITIAL STATISTICS: ROTATED FACTOR MATRIX:

VARIABLE COMMUNALITY * FACTOR EIGENVALUE PCT OF VAR CUM PCT FACTOR 1 FACTOR 2

V1 .42052 * 1 6.30111 52.5 52.5 V1 .12289 .65101

V2 .39947 * 2 1.81757 15.1 67.7 V2 .13900 .64781

V3 .56533 * 3 .66416 5.5 73.2 V3 .24971 .78587

V4 .56605 * 4 .57155 4.8 78.0 V4 .29387 .74118

V5 .60467 * 5 .55995 4.7 82.6 V5 .73261 .15469

V6 .57927 * 6 .44517 3.7 86.3 V6 .73241 .20401

V7 .69711 * 7 .41667 3.5 89.8 V7 .77455 .22464

V8 .74574 * 8 .32554 2.7 92.5 V8 .85701 .20629

V9 .66607 * 9 .27189 2.3 94.8 V9 .80879 .19538

V10 .59287 * 10 .25690 2.1 96.9 V10 .69326 .23923

V11 .71281 * 11 .19159 1.6 98.5 V11 .77604 .25024

V12 .64409 * 12 .17789 1.5 100.0 V12 .79240 .19822


such standardization can be done. Rather, we note that Table H.6 presents a matrix containing the average distance over the 45 undergraduates for each pair of sports, and we note that this matrix has been obtained by using a software package that uses a standardization procedure. There are many different approaches to using the average distances to cluster the sports. We will discuss one approach: the hierarchical, complete linkage approach. Hierarchical clustering implies that, once two sports are clustered together at a particular stage, they are considered to be permanently joined and cannot be separated into different clusters at a later stage. Complete linkage bases the merger of two clusters of sports (either cluster of which can be an individual sport) on the maximum distance between sports in the clusters. For example, since Table H.6 shows that the smallest average distance is the average distance between football and hockey, which is 2.20, football and hockey are

TABLE H.4 A Particular Undergraduate’s Ratings of Boxing and Basketball

Sport               (1) Fast Mvg.  (1) Compl.   (1) Team   (1) Easy to Play  (1) Ncon.  (1) Comp Opp.
                    (7) Slow Mvg.  (7) Simple   (7) Indv.  (7) Hard to Play  (7) Con.   (7) Comp Std.
Boxing                    3            5           7              4              6            1
Basketball                2            3           2              4              4            2
Paired Difference         1            2           5              0              2           -1

$$\text{Distance} = \sqrt{(1)^2 + (2)^2 + (5)^2 + (0)^2 + (2)^2 + (-1)^2} = \sqrt{35} = 5.9161$$

TABLE H.3 Factor Analysis of the Discount Store Data³

                                            Factor
Scale                                     I      II     III     IV      V    Communality
 1. Good service                         .79   -.15    .06    .12    .07       .67
 2. Helpful salespersons                 .75   -.03    .04    .13    .31       .68
 3. Friendly personnel                   .74   -.07    .17    .09   -.14       .61
 4. Clean                                .59   -.31    .34    .15   -.25       .65
 5. Pleasant store to shop in            .58   -.15    .48    .26    .10       .67
 6. Easy to return purchases             .56   -.23    .13   -.03   -.03       .39
 7. Too many clerks                      .53   -.00    .02    .23    .37       .47
 8. Attracts upper-class customers       .46   -.06    .25   -.00    .17       .31
 9. Convenient location                  .36   -.30   -.02   -.19    .03       .26
10. High quality products                .34   -.27    .31    .12    .25       .36
11. Good buys on products                .02   -.88    .09    .10    .03       .79
12. Low prices                          -.03   -.74    .14    .00    .13       .59
13. Good specials                        .35   -.67   -.05    .10    .14       .60
14. Good sales on products               .30   -.67    .01   -.08    .16       .57
15. Reasonable value for price           .17   -.52    .11   -.02   -.03       .36
16. Good store                           .41   -.47    .47    .12    .11       .63
17. Low pressure salespersons           -.20   -.30   -.28   -.03   -.05       .18
18. Bright store                        -.02   -.10    .75    .26   -.05       .61
19. Attractive store                     .19    .03    .67    .34    .24       .66
20. Good displays                        .33   -.15    .61    .15   -.20       .57
21. Unlimited selections of products     .09    .00    .29   -.03    .00       .09
22. Spacious shopping                    .00    .20    .00    .70    .10       .54
23. Easy to find items you want          .36   -.16    .10    .57    .01       .49
24. Well-organized layout               -.02   -.05    .25    .54   -.17       .39
25. Well-spaced merchandise              .20    .15    .27    .52    .16       .43
26. Neat                                 .38   -.12    .45    .49   -.34       .72
27. Big store                           -.20    .15    .06    .07   -.65       .49
28. Ads frequently seen by you           .03   -.20    .07    .09    .42       .23
29. Fast checkout                        .30   -.16    .00    .25   -.33       .28

Percentage of variance explained          16     12      9      8      5
Cumulative variance explained             16     28     37     45     50

³ The source of Table H.3 is Marketing Research, Sixth Edition by David A. Aaker, V. Kumar, and George S. Day, John Wiley and Sons, Inc., New York, 1998.


clustered together in the first stage of clustering (see the tree diagram in Figure H.4). Since the second smallest average distance is the average distance between tennis and handball, which is 2.33, tennis and handball are clustered together in the second stage of clustering. The third smallest average distance is the average distance between football and basketball, which is 2.51, but football has already been clustered with hockey. The average distance between hockey and basketball is 2.58, and so the average distance between basketball and the cluster containing football and hockey is 2.58, the maximum of 2.51 and 2.58. This average distance is equal to the average distance between ping pong and the cluster containing tennis and handball, which (as shown in Table H.6) is the maximum of 2.54 and 2.58, that is, 2.58. There is no other average distance as small as 2.58. Furthermore, note that the distance between basketball and football is 2.51, whereas the distance between ping pong and tennis is 2.54. Therefore, we will break the “tie” between the two average distances of 2.58 by adding basketball to the cluster containing football and hockey in the third stage of clustering. Then, we add ping pong to the cluster containing tennis and handball in the fourth stage of clustering. Figure H.4 shows the results of all 12 stages of clustering.

At the end of seven stages of clustering, six clusters have been formed. They are:

Cluster 1: Boxing
Cluster 2: Skiing
Cluster 3: Swimming, Ping Pong, Handball, Tennis, Track and Field
Cluster 4: Golf, Bowling
Cluster 5: Basketball, Hockey, Football
Cluster 6: Baseball
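The distance calculation for Table H.4 and the first seven stages of complete linkage clustering can be sketched in a few lines. This is an illustrative implementation, not the software used in the study: the distances are transcribed from Table H.6 with the stray stage markers omitted, the function names are hypothetical, and ties are broken by taking the first tied pair in list order rather than by the rule described in the text (for these data the two 2.58 merges do not interfere, so the outcome is the same).

```python
import math


def euclidean(a, b):
    """Square root of the sum of squared paired differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# One undergraduate's ratings from Table H.4.
boxing = [3, 5, 7, 4, 6, 1]
basketball = [2, 3, 2, 4, 4, 2]
print("boxing vs basketball:", round(euclidean(boxing, basketball), 4))

SPORTS = ["BX", "BK", "G", "SK", "SW", "BB", "PP", "HK", "H", "TF", "BW", "T", "F"]
# Lower triangle of Table H.6: row i holds distances from SPORTS[i] to SPORTS[0..i-1].
LOWER = [
    [],
    [3.85],
    [4.33, 4.88],
    [3.80, 4.05, 3.73],
    [3.81, 3.81, 3.56, 2.84],
    [4.12, 3.15, 3.83, 4.16, 3.60],
    [3.74, 3.56, 3.61, 3.67, 2.72, 3.41],
    [3.85, 2.58, 5.11, 4.02, 4.17, 3.49, 4.27],
    [3.41, 3.24, 3.92, 3.25, 2.80, 3.34, 2.58, 3.52],
    [3.81, 3.36, 3.88, 3.20, 2.84, 3.37, 3.06, 3.72, 2.75],
    [4.07, 4.23, 2.72, 3.75, 2.89, 3.32, 2.87, 4.58, 3.13, 3.26],
    [3.49, 3.32, 3.59, 3.19, 2.82, 3.25, 2.54, 3.58, 2.33, 2.72, 2.85],
    [3.86, 2.51, 5.15, 4.38, 4.41, 3.43, 4.35, 2.20, 3.68, 3.84, 4.67, 3.69],
]


def average_distance(a, b):
    """Look up the Table H.6 average distance between two sports."""
    i, j = SPORTS.index(a), SPORTS.index(b)
    return LOWER[max(i, j)][min(i, j)]


def complete_linkage(labels, n_merges):
    """Merge the two clusters whose maximum pairwise distance is smallest, n_merges times."""
    clusters = [frozenset([label]) for label in labels]
    history = []
    for _ in range(n_merges):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(average_distance(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:  # strict '<': first tied pair wins
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] | clusters[j]
        history.append((d, merged))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, history


clusters, history = complete_linkage(SPORTS, 7)
for stage, (d, merged) in enumerate(history, start=1):
    print("stage", stage, "distance", d, sorted(merged))
print("clusters after 7 stages:", [sorted(c) for c in clusters])
```

After seven merges the sketch leaves six clusters, which can be compared with the list above.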

TABLE H.5 Average Rating of Each Sport on Each of the Six Scales

Sport           (1) Fast Mvg.  (1) Compl.   (1) Team   (1) Easy to Play  (1) Ncon.  (1) Comp Opp.
                (7) Slow Mvg.  (7) Simple   (7) Indv.  (7) Hard to Play  (7) Con.   (7) Comp Std.
Boxing              3.07          4.62        6.62          4.78           6.02         1.73
Basketball          1.84          3.78        1.56          3.82           4.89         2.27
Golf                6.13          4.49        6.58          3.84           1.82         4.11
Swimming            2.87          5.02        5.29          3.64           2.22         4.36
Skiing              2.13          4.60        5.96          5.22           2.51         4.71
Baseball            4.78          4.18        2.16          3.33           3.60         2.67
Ping-Pong           3.18          5.13        5.38          2.91           2.04         2.20
Hockey              1.71          3.22        1.82          5.04           5.96         2.49
Handball            2.53          4.67        4.78          3.71           2.78         2.31
Track & field       2.82          4.38        4.47          3.84           2.89         3.82
Bowling             5.07          5.16        5.40          3.11           1.60         3.73
Tennis              2.89          3.78        5.47          4.09           2.16         2.42
Football            2.42          2.76        1.44          5.00           6.47         2.33

TABLE H.6 A Matrix Containing the Average Distances

Sport   BX    BK     G    SK    SW    BB    PP    HK     H    TF    BW     T
BK    3.85
G     4.33  4.88
SK    3.80  4.05  3.73
SW    3.81  3.81  3.56  2.84
BB    4.12  3.15  3.83  4.16  3.60
PP    3.74  3.56  3.61  3.67  2.72  3.41
HK    3.85  2.58  5.11  4.02  4.17  3.49  4.27
H     3.41  3.24  3.92  3.25  2.80  3.34  2.58  3.52
TF    3.81  3.36  3.88  3.20  2.84  3.37  3.06  3.72  2.75
BW    4.07  4.23  2.72  3.75  2.89  3.32  2.87  4.58  3.13  3.26
T     3.49  3.32  3.59  3.19  2.82  3.25  2.54  3.58  2.33  2.72  2.85
F     3.86  2.51  5.15  4.38  4.41  3.43  4.35  2.20  3.68  3.84  4.67  3.69

Source of Tables H.4, H.5, and H.6 and of Figures H.4 and H.5: D. M. Levine, “Nonmetric Multidimensional Scaling and Hierarchical Clustering: Procedures for the Investigation of the Perception of Sports,” Research Quarterly, Vol. 48 (1977), pp. 341–348.

In Figure H.5 we present a two dimensional graph in which we place ovals around these six clusters. This graph is the result of a procedure called multidimensional scaling. To understand this procedure, note that, since each sport is represented by six ratings, each sport exists geometrically as a point in six dimensional space. Multidimensional scaling uses the relative average distances between the sports in the six dimensional space (that is, the relationships between the average distances in Table H.6) and attempts to find points in a lesser dimensional space that approximately have the same relative average distances between them. In this example we illustrate mapping the six dimensional space into a two dimensional space, because a two dimensional space allows us to most easily interpret the results of multidimensional scaling, that is, to study the location of the sports relative to each other and thereby determine the overall factors or dimensions that appear to separate the sports. Figure H.5 gives the output of multidimensional scaling that is given by a standard statistical software system (we will not discuss the numerical procedure used to actually carry out the multidimensional scaling). By comparing the sports near the top of Axis II with the sports near the bottom, and by using the average ratings in Table H.5, we see that Axis II probably represents the factor “team versus individual.” By comparing the sports on the left of Axis I with the sports on the right, and by using the average ratings in Table H.5, we see that Axis I probably represents the factor “degree of action,” which combines “contact/noncontact” aspects with “fast moving/slow moving” aspects. Also, note that the two clusters that have been formed at the end of 11 stages of clustering in Figure H.4 support the existence of the “team versus individual” factor.
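The text does not describe the numerical procedure behind Figure H.5, and the cited study used nonmetric scaling. As a rough illustration of the general idea (finding low dimensional points whose distances approximate a given distance matrix), the sketch below substitutes classical (Torgerson) metric scaling, with a simple power iteration in place of a library eigenvalue routine. The function name, parameters, and toy points are all hypothetical and are not taken from the sports data.

```python
import math
import random


def classical_mds(dist, k=2, iters=500, seed=0):
    """Classical (Torgerson) scaling: embed n items in k dimensions from an n x n distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double-centred matrix B = -1/2 * J * D^2 * J (dist is assumed symmetric).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for dim in range(k):
        # Power iteration for the current largest eigenvalue of b.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][dim] = v[i] * scale
        # Deflate so the next pass finds the next eigenvalue.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Toy check: five points that already lie in a plane, so a 2-D map can be exact.
points = [(0.0, 0.0), (3.0, 0.0), (3.0, 1.0), (0.0, 1.0), (1.0, 2.0)]
dist = [[math.dist(p, q) for q in points] for p in points]
coords = classical_mds(dist, k=2)

max_error = max(
    abs(math.dist(coords[i], coords[j]) - dist[i][j])
    for i in range(len(points))
    for j in range(len(points))
)
print("largest distance error in the 2-D map:", max_error)
```

For distances that come from more than two dimensions, such as the six-scale sports ratings, a two dimensional map can only approximate the distances, and the recovered axes are determined only up to rotation and reflection, which is why the axes in Figure H.5 have to be interpreted after the fact.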

FIGURE H.4 A Tree Diagram Showing Clustering of the 13 Sports
[Tree diagram not reproduced. Horizontal axis: Distance (tick marks 2, 3, 4, 5). Leaves: Boxing, Skiing, Swimming, Ping Pong, Handball, Tennis, Track & Field, Golf, Bowling, Baseball, Basketball, Hockey, Football.]

FIGURE H.5 Multidimensional Scaling of the 13 Sports
[Two dimensional plot not reproduced. Axes: Axis I and Axis II. Plotted points: F, HK, BK, BB, BW, TF, H, T, PP, SW, G, SK, BX.]

Although the perceptions of sports in 1977 relate to sports “in general” (and not to just professional sports), and although these perceptions do not directly relate to the popularity of sports, note that “high action, team oriented” sports (football and basketball) tended to be popular in 2000. Considering the Axis II factor (“team versus individual”), it might be that high baseball player salaries, free agency, frequent player moves, and the inability of small market teams to compete made baseball seem less team oriented to fans in 2000. Perhaps more revenue sharing between small and large market teams would improve the situation. Considering the Axis I factor (“degree of action”), it might be that power tennis (partially due to new tennis racquet technologies) and the resulting shorter rallies made tennis seem less action oriented to fans in 2000. Perhaps limiting the power of tennis racquets and thus allowing smaller, exciting players (like Jimmy Connors and John McEnroe of the 1970’s and 1980’s) to be major competitors might help increase the degree of action in tennis.

a In Intermediate Statistical Methods and Applications, A Computer Package Approach (Prentice Hall, 1983), Mark L. Berenson, David M. Levine, and Mathew Goldstein consider a marketing research study concerning the similarities and differences between the ten types of food shown in Table H.7. Each type of food was given an integer rating of 1 to 7 on three scales: bland (1) versus spicy (7); light (1) versus heavy (7); and low calories (1) versus high calories (7). Table H.7 gives the average value for each of the food types on the three scales. Figures H.6 and H.7 present the results of a cluster analysis and multidimensional scaling of the 10 food types.

(1) Discuss why the two axes in Figure H.7 may be interpreted as “oriental versus western” and “spicy versus bland.”

(2) Using Table H.7 and Figures H.6 and H.7, discuss the similarities and differences between the food types.

(3) Suppose that you are in charge of choosing restaurants to be included in a new riverfront development that initially will include a limited number of restaurants. How might Table H.7 and Figures H.6 and H.7 help you to make your choice?

TABLE H.7 Average Ratings of the Food Types on Three Scales

Food Spicy/Bland Heavy/Light High/Low Calories

Japanese (JPN) 2.8 3.2 3.4

Cantonese (CNT) 2.6 5.3 5.4

Szechuan (SCH) 6.6 3.6 3.0

French (FR) 3.5 4.5 5.1

Mexican (MEX) 6.4 4.3 4.3

Mandarin (MAN) 3.4 4.1 4.2

American (AMR) 2.3 5.8 5.7

Spanish (SPN) 4.7 5.4 4.9

Italian (ITL) 4.6 6.0 6.2

Greek (GRK) 5.3 4.7 6.0

FIGURE H.6 A Cluster Analysis of the 10 Food Types
[Tree diagram not reproduced. Leaves as extracted: Japanese, Cantonese, Mandarin, French, American, Italian, Spanish, Greek, Mexican, Szechuan.]

b Automakers use multidimensional scaling to measure the images of their cars. Customer surveys ask owners of different car makes to rank their autos from 1 to 10 for such qualities as “youthfulness,” “luxury,” and “practicality.” The responses are used to carry out multidimensional scaling, which produces a perceptual map showing the images of the different cars. Figure H.8 is a perceptual map showing car images in 1984. After viewing the map in Figure H.8, Chrysler concluded that Plymouth, Dodge, and Chrysler needed to present a more youthful image and that Plymouth and

FIGURE H.7 Multidimensional Scaling of the 10 Food Types
[Two dimensional plot not reproduced. Axes: Axis I and Axis II. Plotted points: CNT, JPN, MAN, ITL, SPN, GRK, MEX, AMR, FR, SCH.]

FIGURE H.8 Multidimensional Scaling Showing Car Images in 1984
[Perceptual map not reproduced. Title: Perceptual Map: Brand Images. Source shown on the map: Chrysler Corp. Brands plotted: Lincoln, Cadillac, Mercedes, Chrysler, Buick, Oldsmobile, Pontiac, BMW, Porsche, Ford, Dodge, VW, Toyota, Datsun, Chevrolet, Plymouth. Region labels: “Has a Touch of Class, a Car I’d Be Proud to Own, Distinctive Looking”; “Conservative Looking, Appeals to Older People”; “Very Practical, Provides Good Gas Mileage, Affordable”; “Has Spirited Performance, Appeals to Young People, Fun to Drive, Sporty Looking.”]

Source: Marketing Research: Methodological Foundations (page 492), by Gilbert A. Churchill, Jr., The Dryden Press, Orlando, 1995.
Source: John Koten, “Car Makers Use ‘Image’ Map as Tool to Position Products,” The Wall Street Journal (March 22, 1984), p. 31. Reprinted by permission of The Wall Street Journal, © Dow Jones & Company, Inc., 1984. All Rights Reserved Worldwide.

Dodge needed to move up on the luxury scale. By the year 2000 Chrysler had introduced cars such as the Dodge Neon, Plymouth Breeze, Dodge Intrepid, Chrysler Concorde, and Chrysler 300 M. These cars are more youthful and/or luxurious and tremendously increased Chrysler sales. What does the perceptual map say about the Buick and Oldsmobile divisions of General Motors? By 2000 General Motors had made Buick the “family car” division and had introduced new Oldsmobiles that were more youthful and performance oriented. Do you think a perceptual map in 2002 would show the same relationships between the Buick and Oldsmobile divisions?
