Multidimensional Scaling. Agenda Multidimensional Scaling Goodness of fit measures Nosofsky, 1986.


Multidimensional Scaling

Agenda

• Multidimensional Scaling

• Goodness of fit measures

• Nosofsky, 1986

Proximities

             Amherst  Belchertown   Hadley  Leverett   Pelham  Shutesbury  Sunderland
Amherst            0         9.94     4.32      7.29     6.81        9.94        7.81
Belchertown                     0    14.06     14.94     8.25       13.96       17.66
Hadley                                   0     11.02    10.93       14.49         9.5
Leverett                                           0    12.57        7.45        5.18
Pelham                                                      0        5.71       16.16
Shutesbury                                                              0       11.04
Sunderland                                                                          0

p_{Amherst, Hadley} = 4.32

Configuration (in 2-D)

Formal MDS Definition

• f: p_ij → d_ij(X)

• MDS is a mapping from proximities to corresponding distances in MDS space.

• After a transformation f, the proximities are equal to the distances in X.


Distances, d_ij

$d_{Amherst,Hadley}(X) = \sqrt{(x_{Amherst,1} - x_{Hadley,1})^2 + (x_{Amherst,2} - x_{Hadley,2})^2}$

$= \sqrt{(-0.5775 - (-2.3076))^2 + (-1.0928 - (-7.1844))^2}$

$\approx 6.332$
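The distance calculation above can be checked with a few lines of Python. This is only a sketch: the coordinates are the ones in the worked example, and the helper name `euclidean` is made up for illustration.

```python
import math

# 2-D coordinates (dimension 1, dimension 2) taken from the worked example.
amherst = (-0.5775, -1.0928)
hadley = (-2.3076, -7.1844)

def euclidean(a, b):
    """Euclidean distance between two points with the same number of dimensions."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

d_amherst_hadley = euclidean(amherst, hadley)
print(f"{d_amherst_hadley:.4f}")  # 6.3325
```

The same function works unchanged for configurations with more than two dimensions.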

Distances, d_ij

d_{Amherst, Hadley}(X) = 6.332, compared with the proximity p_{Amherst, Hadley} = 4.32

Proximities and Distances

Proximities

             Amherst  Belchertown   Hadley  Leverett   Pelham  Shutesbury  Sunderland
Amherst            0         9.94     4.32      7.29     6.81        9.94        7.81
Belchertown                     0    14.06     14.94     8.25       13.96       17.66
Hadley                                   0     11.02    10.93       14.49         9.5
Leverett                                           0    12.57        7.45        5.18
Pelham                                                      0        5.71       16.16
Shutesbury                                                              0       11.04
Sunderland                                                                          0

Distances

             Amherst  Belchertown   Hadley  Leverett   Pelham  Shutesbury  Sunderland
Amherst            0      10.0577   6.3325    7.4738   7.9313      7.8319      7.8328
Belchertown                     0  12.0455   16.8332   6.7959     12.7215     17.6600
Hadley                                   0   12.0350  13.1492     14.1632      8.1892
Leverett                                           0  12.2097      7.3591      6.6429
Pelham                                                      0      6.3360     15.4250
Shutesbury                                                              0     12.7366
Sunderland                                                                          0

The Role of f

• f relates the proximities to the distances.

• f(p_ij) = d_ij(X)

The Role of f

• f can be linear, exponential, etc.

• In psychological data, f is usually assumed to be any monotonic function.

– That is, if p_ij < p_kl then d_ij(X) ≤ d_kl(X).

– Most psychological data is on an ordinal scale, e.g., rating scales.

Looking at Ordinal Relations

Proximities

             Amherst  Belchertown   Hadley  Leverett   Pelham  Shutesbury  Sunderland
Amherst            0         9.94     4.32      7.29     6.81        9.94        7.81
Belchertown                     0    14.06     14.94     8.25       13.96       17.66
Hadley                                   0     11.02    10.93       14.49         9.5
Leverett                                           0    12.57        7.45        5.18
Pelham                                                      0        5.71       16.16
Shutesbury                                                              0       11.04
Sunderland                                                                          0

Distances

             Amherst  Belchertown   Hadley  Leverett   Pelham  Shutesbury  Sunderland
Amherst            0      10.0577   6.3325    7.4738   7.9313      7.8319      7.8328
Belchertown                     0  12.0455   16.8332   6.7959     12.7215     17.6600
Hadley                                   0   12.0350  13.1492     14.1632      8.1892
Leverett                                           0  12.2097      7.3591      6.6429
Pelham                                                      0      6.3360     15.4250
Shutesbury                                                              0     12.7366
Sunderland                                                                          0
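One way to look at the ordinal relations is to count how many pairs of town pairs change order between the two tables. The sketch below does that in plain Python. The definition of a "violation" used here (a strict order reversal, with ties ignored) is an assumption for illustration, not something the slides specify.

```python
from itertools import combinations

towns = ["Amherst", "Belchertown", "Hadley", "Leverett",
         "Pelham", "Shutesbury", "Sunderland"]

# Upper triangles (row by row, diagonal omitted) copied from the two tables.
prox_rows = [
    [9.94, 4.32, 7.29, 6.81, 9.94, 7.81],
    [14.06, 14.94, 8.25, 13.96, 17.66],
    [11.02, 10.93, 14.49, 9.5],
    [12.57, 7.45, 5.18],
    [5.71, 16.16],
    [11.04],
]
dist_rows = [
    [10.0577, 6.3325, 7.4738, 7.9313, 7.8319, 7.8328],
    [12.0455, 16.8332, 6.7959, 12.7215, 17.6600],
    [12.0350, 13.1492, 14.1632, 8.1892],
    [12.2097, 7.3591, 6.6429],
    [6.3360, 15.4250],
    [12.7366],
]

def to_pairs(rows):
    """Map each unordered town pair to its table value."""
    return {
        (towns[i], towns[i + 1 + k]): value
        for i, row in enumerate(rows)
        for k, value in enumerate(row)
    }

prox = to_pairs(prox_rows)
dist = to_pairs(dist_rows)

# A violation: p_ij < p_kl in the data but d_ij(X) > d_kl(X) in the
# configuration (or the reverse). Ties give a product of 0 and are skipped.
violations = [
    (a, b)
    for a, b in combinations(prox, 2)
    if (prox[a] - prox[b]) * (dist[a] - dist[b]) < 0
]

print(len(prox), "town pairs,", len(violations), "order reversals")
```

For example, Amherst is closer to Pelham than to Leverett in the proximities (6.81 vs 7.29), but farther in the distances (7.9313 vs 7.4738), so that pair of pairs is counted.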

Stress

• It is not always possible to perfectly satisfy this mapping.

• Stress is a measure of how closely the model came.

• Stress is essentially the scaled sum of squared error between f(p_ij) and d_ij(X).
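The slides do not give the exact scaling, so the sketch below assumes Kruskal's Stress-1, which divides the sum of squared error by the sum of squared distances and takes the square root. It also assumes f is the identity, so f(p_ij) = p_ij. Both choices are illustrative.

```python
import math

# Upper-triangle values from the proximity and distance tables, same order.
proximities = [9.94, 4.32, 7.29, 6.81, 9.94, 7.81,
               14.06, 14.94, 8.25, 13.96, 17.66,
               11.02, 10.93, 14.49, 9.5,
               12.57, 7.45, 5.18,
               5.71, 16.16,
               11.04]
distances = [10.0577, 6.3325, 7.4738, 7.9313, 7.8319, 7.8328,
             12.0455, 16.8332, 6.7959, 12.7215, 17.6600,
             12.0350, 13.1492, 14.1632, 8.1892,
             12.2097, 7.3591, 6.6429,
             6.3360, 15.4250,
             12.7366]

def stress1(targets, dists):
    """Kruskal's Stress-1: sqrt(sum((f(p) - d)^2) / sum(d^2))."""
    squared_error = sum((t - d) ** 2 for t, d in zip(targets, dists))
    scale = sum(d ** 2 for d in dists)
    return math.sqrt(squared_error / scale)

# Assumption: identity transformation, so the targets are the raw proximities.
stress = stress1(proximities, distances)
print(f"Stress-1 = {stress:.3f}")
```

A perfect mapping gives a stress of 0. With an ordinal (monotonic) f, the targets would instead come from a monotone regression of the distances on the proximities.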

“Correct” Dimensionality

[Figure: Stress plotted against number of Dimensions]

Distance Invariant Transformations

• Scaling (All X doubled in size (or flipped))

• Rotation (X rotated 20 degrees left)

• Translation (X moved 2 to the right)
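A quick numerical check of these transformations, using the Amherst and Hadley coordinates from the distance example. The helper functions are hypothetical names for this sketch. Note that rotation, translation, and flipping leave the distance unchanged, while doubling multiplies every distance by 2, so only the ratios and the rank order are preserved.

```python
import math

amherst = (-0.5775, -1.0928)
hadley = (-2.3076, -7.1844)

def rotate(point, degrees):
    """Rotate a 2-D point counterclockwise about the origin."""
    t = math.radians(degrees)
    x, y = point
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))

def translate(point, dx, dy):
    """Shift a 2-D point by (dx, dy)."""
    return (point[0] + dx, point[1] + dy)

def scale(point, k):
    """Multiply both coordinates by k (k = -1 flips the configuration)."""
    return (k * point[0], k * point[1])

def pair_distance(transform):
    """Distance between the two towns after applying the same transform to both."""
    return math.dist(transform(amherst), transform(hadley))

original = math.dist(amherst, hadley)
rotated = pair_distance(lambda p: rotate(p, 20))
shifted = pair_distance(lambda p: translate(p, 2, 0))
flipped = pair_distance(lambda p: scale(p, -1))
doubled = pair_distance(lambda p: scale(p, 2))

print(f"{original:.4f} {rotated:.4f} {shifted:.4f} {flipped:.4f} {doubled:.4f}")
```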

Rotated Configuration (in 2-D)

Uses of MDS

• Visually look for structure in data.

• Discover the dimensions that underlie data.

• Psychological model that explains similarity judgments in terms of distance in MDS space.

Simple Goodness of Fit Measures

• Sum-of-squared error (SSE)

• Chi-Square

• Proportion of variance accounted for (PVAF)

• R^2

• Maximum likelihood (ML)

Sum of Squared Error

Data  Prediction  (Data - Prediction)^2
   7        5.03                   3.88
   8        6.97                   1.06
   1        2.12                   1.25
   8        8.91                   0.83
   6        6.97                   0.94
SSE                                7.97
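The SSE in the table can be reproduced directly. A minimal sketch using the five data points and predictions shown above:

```python
data = [7, 8, 1, 8, 6]
prediction = [5.03, 6.97, 2.12, 8.91, 6.97]

# Square each residual, then add them up.
squared_errors = [(y - p) ** 2 for y, p in zip(data, prediction)]
sse = sum(squared_errors)

print(f"SSE = {sse:.2f}")  # SSE = 7.97
```

Summing the already rounded squared errors in the table gives 7.96 instead, which is why both values appear in these slides.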

Chi-Square

Data  Prediction  (Data - Prediction)^2  (Data - Prediction)^2 / Prediction
   7           5                      4                                0.80
   8           7                      1                                0.14
   1           2                      1                                0.50
   8           9                      1                                0.11
   6           7                      1                                0.14
Chi-Square                                                             1.70
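The chi-square total follows the same pattern as SSE, except that each squared error is divided by the prediction before summing. A short sketch with the table's values:

```python
data = [7, 8, 1, 8, 6]
prediction = [5, 7, 2, 9, 7]

# Each squared error is weighted by 1 / prediction.
chi_square = sum((y - p) ** 2 / p for y, p in zip(data, prediction))

print(f"Chi-Square = {chi_square:.2f}")  # Chi-Square = 1.70
```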

Proportion of Variance Accounted for

        Mean Prediction         Model Prediction
Data    Mean  Error  Error^2    Prediction  Error  Error^2
   7       6      1        1          5.03   1.97     3.88
   8       6      2        4          6.97   1.03     1.06
   1       6     -5       25          2.12  -1.12     1.25
   8       6      2        4          8.91  -0.91     0.83
   6       6      0        0          6.97  -0.97     0.94
               SST        34                  SSE     7.96

(SST - SSE)/SST = (34 - 7.96)/34 = .77
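PVAF compares the model's squared error with the squared error of simply predicting the mean. A sketch using the table's values, where the function name `pvaf` is made up for this example:

```python
def pvaf(data, prediction):
    """Proportion of variance accounted for: (SST - SSE) / SST."""
    mean = sum(data) / len(data)
    sst = sum((y - mean) ** 2 for y in data)                    # error of the mean
    sse = sum((y - p) ** 2 for y, p in zip(data, prediction))   # error of the model
    return (sst - sse) / sst

data = [7, 8, 1, 8, 6]
prediction = [5.03, 6.97, 2.12, 8.91, 6.97]

result = pvaf(data, prediction)
print(f"PVAF = {result:.2f}")  # PVAF = 0.77
```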

• R^2 is PVAF, but…

        Mean Prediction         Model Prediction
Data    Mean  Error  Error^2    Prediction  Error  Error^2
   7       6      1        1           5.9    1.1     1.21
   8       6      2        4          10.1   -2.1     4.41
   1       6     -5       25           4     -3       9
   8       6      2        4           5.9    2.1     4.41
   6       6      0        0           1      5      25
               SST        34                  SSE    44.03

(SST - SSE)/SST = (34 - 44.03)/34 = -0.295
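One reading of the "but…" (an interpretation, not stated on the slide) is that R^2 computed as a squared correlation can never be negative, whereas PVAF is negative whenever the model predicts worse than the mean. The sketch below computes both for the table above.

```python
data = [7, 8, 1, 8, 6]
prediction = [5.9, 10.1, 4, 5.9, 1]

mean_y = sum(data) / len(data)
mean_p = sum(prediction) / len(prediction)

sst = sum((y - mean_y) ** 2 for y in data)
sse = sum((y - p) ** 2 for y, p in zip(data, prediction))
pvaf = (sst - sse) / sst

# Squared Pearson correlation between data and prediction.
covariation = sum((y - mean_y) * (p - mean_p) for y, p in zip(data, prediction))
variation_p = sum((p - mean_p) ** 2 for p in prediction)
r_squared = covariation ** 2 / (sst * variation_p)

print(f"PVAF = {pvaf:.3f}")
print(f"squared correlation = {r_squared:.3f}")
```

The two measures agree when the predictions come from a least-squares linear fit to the data, which is not the case here.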

Maximum Likelihood

• Assume we are sampling from a population with probability f(Y; θ).

• Y is an observation and θ are the model parameters.

N(-1.7; θ = [μ = 0]) = 0.094

Maximum Likelihood

• With independent observations, Y_1, …, Y_n, the joint probability of the sample observations is:

$g(Y_1, \ldots, Y_n) = \prod_{i=1}^{n} f(Y_i; \theta)$

f(Y_1; θ) × f(Y_2; θ) × f(Y_3; θ) = 0.094 × 0.2661 × 0.3605 = 0.0090

Maximum Likelihood

• Expressed as a function of the parameters, we have the likelihood function:

$L(\theta) = \prod_{i=1}^{n} f(Y_i; \theta)$

• The goal is to maximize L with respect to the parameters, θ.

Maximum Likelihood

θ = [μ = 0]: 0.094 × 0.2661 × 0.3605 = 0.0090

θ = [μ = -1.0167]: 0.3159 × 0.3962 × 0.3398 = 0.0425

(Assuming σ = 1)
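The two products above can be reproduced with a normal density and σ = 1. Only the first observation, -1.7, is printed on the slides. The values used below for Y_2 and Y_3 (-0.9 and -0.45) are assumptions, back-solved from the densities 0.2661 and 0.3605, and should be treated as illustrative.

```python
import math

# Y1 is from the slide; Y2 and Y3 are ASSUMED values inferred from the
# printed densities, not data given in the source.
ys = [-1.7, -0.9, -0.45]

def normal_pdf(y, mu, sigma=1.0):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def likelihood(mu, sigma=1.0):
    """Product of the densities of all observations."""
    return math.prod(normal_pdf(y, mu, sigma) for y in ys)

def log_likelihood(mu, sigma=1.0):
    """Sum of log densities; same maximizer as the likelihood."""
    return sum(math.log(normal_pdf(y, mu, sigma)) for y in ys)

mu_hat = sum(ys) / len(ys)  # the sample mean

print(f"L(mu = 0)       = {likelihood(0.0):.4f}")     # 0.0090
print(f"L(mu = {mu_hat:.4f}) = {likelihood(mu_hat):.4f}")  # 0.0425
```

Under these assumed values the sample mean is -1.0167, matching the parameter value shown on the slide, and the likelihood there is larger than at μ = 0.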

Maximum Likelihood

• Preferred to other methods

– Has very nice mathematical properties.

– Easier to interpret.

– We’ll see specifics in a few weeks.

• Often harder (or impossible?) to calculate than other methods.

• Often presented as log likelihood, ln(ML).

– Easier to compute (sums, not products).

– Better numerical resolution.

• Sometimes equivalent to other methods.

– E.g., same as SSE when calculating mean of a distribution.
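The equivalence with SSE can be made explicit under one assumption carried over from the worked example: the observations are independent draws from a normal distribution with known σ and unknown mean μ. A short derivation sketch:

```latex
\begin{align*}
\ln L(\mu) &= \sum_{i=1}^{n} \ln f(Y_i; \mu)
            = -\frac{n}{2}\ln\!\left(2\pi\sigma^2\right)
              - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(Y_i - \mu\right)^2 \\
\frac{\partial \ln L}{\partial \mu}
           &= \frac{1}{\sigma^2}\sum_{i=1}^{n}\left(Y_i - \mu\right) = 0
  \quad\Longrightarrow\quad
  \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} Y_i
\end{align*}
```

Only the last term of ln L depends on μ, and it is the SSE multiplied by a negative constant, so maximizing the log likelihood and minimizing SSE pick the same value: the sample mean.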
