Post on 19-Dec-2015
Multidimensional Scaling
Agenda
• Multidimensional Scaling
• Goodness of fit measures
• Nosofsky, 1986
Proximities
             Amherst Belchertown  Hadley Leverett  Pelham Shutesbury Sunderland
Amherst            0        9.94    4.32     7.29    6.81       9.94       7.81
Belchertown                    0   14.06    14.94    8.25      13.96      17.66
Hadley                                 0    11.02   10.93      14.49       9.50
Leverett                                        0   12.57       7.45       5.18
Pelham                                                  0       5.71      16.16
Shutesbury                                                         0      11.04
Sunderland                                                                    0
p_{Amherst, Hadley}
Configuration (in 2-D)
x_i
Configuration (in 1-D)
Formal MDS Definition
• f: p_ij → d_ij(X)
• MDS is a mapping from proximities to corresponding distances in MDS space.
• After a transformation f, the proximities are equal to distances in X.
Distances, d_ij
$$d_{\text{Amherst},\text{Hadley}}(X) = \sqrt{(x_{\text{Amherst},1} - x_{\text{Hadley},1})^2 + (x_{\text{Amherst},2} - x_{\text{Hadley},2})^2}$$
$$= \sqrt{(-0.5775 - (-2.3076))^2 + (-1.0928 - (-7.1844))^2} \approx 6.3325$$
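The distance calculation above can be checked with a few lines of Python. This is a minimal sketch: the coordinates are the ones quoted in the worked example, and the helper name `euclidean` is chosen here for illustration only.

```python
import math

# Coordinates (dimension 1, dimension 2) of two towns in the 2-D
# configuration X, as quoted in the worked example.
x_amherst = (-0.5775, -1.0928)
x_hadley = (-2.3076, -7.1844)

def euclidean(a, b):
    """Euclidean distance between two points with the same number of dimensions."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

d = euclidean(x_amherst, x_hadley)
print(round(d, 4))  # 6.3325
```

The same function works unchanged for a 1-D or higher-dimensional configuration, since it sums over however many coordinates each point has.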
Distances, d_ij
d_{Amherst, Hadley}(X) = 4.32
Proximities and Distances
             Amherst Belchertown  Hadley Leverett  Pelham Shutesbury Sunderland
Amherst            0        9.94    4.32     7.29    6.81       9.94       7.81
Belchertown                    0   14.06    14.94    8.25      13.96      17.66
Hadley                                 0    11.02   10.93      14.49       9.50
Leverett                                        0   12.57       7.45       5.18
Pelham                                                  0       5.71      16.16
Shutesbury                                                         0      11.04
Sunderland                                                                    0
Proximities
             Amherst Belchertown  Hadley Leverett  Pelham Shutesbury Sunderland
Amherst            0     10.0577  6.3325   7.4738  7.9313     7.8319     7.8328
Belchertown                    0 12.0455  16.8332  6.7959    12.7215    17.6600
Hadley                                 0  12.0350 13.1492    14.1632     8.1892
Leverett                                        0 12.2097     7.3591     6.6429
Pelham                                                  0     6.3360    15.4250
Shutesbury                                                         0    12.7366
Sunderland                                                                    0
Distances
The Role of f
• f relates the proximities to the distances.
• f(p_ij) = d_ij(X)
The Role of f
• f can be linear, exponential, etc.
• In psychological data, f is usually assumed to be any monotonic function.
– That is, if p_ij < p_kl then d_ij(X) ≤ d_kl(X).
– Most psychological data are on an ordinal scale, e.g., rating scales.
Looking at Ordinal Relations
             Amherst Belchertown  Hadley Leverett  Pelham Shutesbury Sunderland
Amherst            0        9.94    4.32     7.29    6.81       9.94       7.81
Belchertown                    0   14.06    14.94    8.25      13.96      17.66
Hadley                                 0    11.02   10.93      14.49       9.50
Leverett                                        0   12.57       7.45       5.18
Pelham                                                  0       5.71      16.16
Shutesbury                                                         0      11.04
Sunderland                                                                    0
Proximities
             Amherst Belchertown  Hadley Leverett  Pelham Shutesbury Sunderland
Amherst            0     10.0577  6.3325   7.4738  7.9313     7.8319     7.8328
Belchertown                    0 12.0455  16.8332  6.7959    12.7215    17.6600
Hadley                                 0  12.0350 13.1492    14.1632     8.1892
Leverett                                        0 12.2097     7.3591     6.6429
Pelham                                                  0     6.3360    15.4250
Shutesbury                                                         0    12.7366
Sunderland                                                                    0
Distances
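One way to look at the ordinal relations is to count how many pairs of town-pairs are ordered differently in the two tables. The sketch below does that for the upper triangles listed row by row. The function name `order_violations` and the choice to skip tied values are assumptions made for illustration, not part of the lecture.

```python
from itertools import combinations

# Upper triangles of the proximity and distance tables, read row by row
# (Amherst-Belchertown, Amherst-Hadley, ..., Shutesbury-Sunderland).
prox = [9.94, 4.32, 7.29, 6.81, 9.94, 7.81,
        14.06, 14.94, 8.25, 13.96, 17.66,
        11.02, 10.93, 14.49, 9.5,
        12.57, 7.45, 5.18,
        5.71, 16.16,
        11.04]
dist = [10.0577, 6.3325, 7.4738, 7.9313, 7.8319, 7.8328,
        12.0455, 16.8332, 6.7959, 12.7215, 17.6600,
        12.0350, 13.1492, 14.1632, 8.1892,
        12.2097, 7.3591, 6.6429,
        6.3360, 15.4250,
        12.7366]

def order_violations(p, d):
    """Count comparisons where p and d order two town-pairs in opposite ways.

    Ties in either list give a product of zero and are not counted.
    """
    count = 0
    for a, b in combinations(range(len(p)), 2):
        if (p[a] - p[b]) * (d[a] - d[b]) < 0:
            count += 1
    return count

violations = order_violations(prox, dist)
total = len(prox) * (len(prox) - 1) // 2
print(violations, "of", total, "comparisons are out of order")
```

A count of zero would mean some monotonic f maps the proximities onto the distances perfectly; any other count means the mapping can only be satisfied approximately.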
Stress
• It is not always possible to perfectly satisfy this mapping.
• Stress is a measure of how close the model came to satisfying it.
• Stress is essentially the scaled sum of squared errors between f(p_ij) and d_ij(X).
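A common way to make "scaled sum of squared error" concrete is Kruskal's Stress-1: the square root of the summed squared differences divided by the summed squared distances. The lecture does not say which scaling it uses, so both the Stress-1 formula and the choice of f as the identity function are assumptions in this sketch.

```python
import math

# Upper triangles of the proximity and distance tables, read row by row.
prox = [9.94, 4.32, 7.29, 6.81, 9.94, 7.81,
        14.06, 14.94, 8.25, 13.96, 17.66,
        11.02, 10.93, 14.49, 9.5,
        12.57, 7.45, 5.18,
        5.71, 16.16,
        11.04]
dist = [10.0577, 6.3325, 7.4738, 7.9313, 7.8319, 7.8328,
        12.0455, 16.8332, 6.7959, 12.7215, 17.6600,
        12.0350, 13.1492, 14.1632, 8.1892,
        12.2097, 7.3591, 6.6429,
        6.3360, 15.4250,
        12.7366]

def stress1(transformed_prox, distances):
    """Kruskal's Stress-1: sqrt( sum (f(p) - d)^2 / sum d^2 )."""
    num = sum((fp - d) ** 2 for fp, d in zip(transformed_prox, distances))
    den = sum(d ** 2 for d in distances)
    return math.sqrt(num / den)

# Assumption: f is the identity, so the proximities are compared directly.
s = stress1(prox, dist)
print(round(s, 3))
```

With a monotonic rather than identity f, the first argument would be the monotonically transformed proximities instead of the raw ones.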
“Correct” Dimensionality
[Figure: stress (y-axis) plotted against number of dimensions (x-axis)]
Distance Invariant Transformations
• Scaling (all X doubled in size, or flipped)
• Rotation (X rotated 20 degrees left)
• Translation (X moved 2 to the right)
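These transformations can be tried on a small configuration. The sketch below uses four made-up points (not the towns' coordinates) and hypothetical helper names. It checks that rotation, translation and flipping leave every pairwise distance unchanged, and that doubling multiplies every distance by the same factor, so ratios and rank order are preserved.

```python
import math
from itertools import combinations

# Hypothetical 2-D configuration, chosen only for illustration.
X = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (-1.0, 2.0)]

def pairwise(points):
    """All pairwise Euclidean distances, in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]

def rotate(points, degrees):
    """Rotate counter-clockwise about the origin."""
    t = math.radians(degrees)
    return [(x * math.cos(t) - y * math.sin(t),
             x * math.sin(t) + y * math.cos(t)) for x, y in points]

def translate(points, dx, dy):
    return [(x + dx, y + dy) for x, y in points]

def scale(points, k):
    return [(k * x, k * y) for x, y in points]

def same(u, v):
    """True if two distance lists agree up to floating-point error."""
    return all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(u, v))

base = pairwise(X)
print(same(base, pairwise(rotate(X, 20))))        # True
print(same(base, pairwise(translate(X, 2, 0))))   # True
print(same(base, pairwise(scale(X, -1))))         # True (flipped)
print(same([2 * d for d in base], pairwise(scale(X, 2))))  # True (doubled)
```

Because these transformations do not change the fit, the orientation, origin and overall size of an MDS solution carry no information by themselves.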
Configuration (in 2-D)
Rotated Configuration (in 2-D)
Uses of MDS
• Visually look for structure in data.
• Discover the dimensions that underlie data.
• Psychological model that explains similarity judgments in terms of distance in MDS space.
Simple Goodness of Fit Measures
• Sum-of-squared error (SSE)
• Chi-Square
• Proportion of variance accounted for (PVAF)
• R²
• Maximum likelihood (ML)
Sum of Squared Error
Data  Prediction  (Data − Prediction)²
7     5.03        3.88
8     6.97        1.06
1     2.12        1.25
8     8.91        0.83
6     6.97        0.94
SSE               7.97
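The SSE in the table can be reproduced directly. A minimal sketch, using the data and predictions from the table; the variable names are illustrative.

```python
data = [7, 8, 1, 8, 6]
prediction = [5.03, 6.97, 2.12, 8.91, 6.97]

# Sum of squared differences between each observation and its prediction.
sse = sum((y - p) ** 2 for y, p in zip(data, prediction))
print(f"{sse:.2f}")  # 7.97
```

Summing the rounded entries in the last column gives 7.96 rather than 7.97; the difference is rounding only.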
Chi-Square
Data  Prediction  (Data − Prediction)²  (Data − Prediction)² / Prediction
7     5           4                     0.80
8     7           1                     0.14
1     2           1                     0.50
8     9           1                     0.11
6     7           1                     0.14
Chi-Square                              1.70
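The chi-square statistic in the table divides each squared error by the predicted value before summing. A minimal sketch with the table's numbers:

```python
data = [7, 8, 1, 8, 6]
prediction = [5, 7, 2, 9, 7]

# Each squared error is weighted by 1 / prediction, so the same absolute
# miss counts for more when the predicted value is small.
chi_square = sum((y - p) ** 2 / p for y, p in zip(data, prediction))
print(f"{chi_square:.2f}")  # 1.70
```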
Proportion of Variance Accounted for
      Mean prediction        Model prediction
Data  Mean  Error  Error²    Prediction  Error  Error²
7     6      1      1        5.03         1.97   3.88
8     6      2      4        6.97         1.03   1.06
1     6     -5     25        2.12        -1.12   1.25
8     6      2      4        8.91        -0.91   0.83
6     6      0      0        6.97        -0.97   0.94
            SST    34                     SSE    7.97
PVAF = (SST − SSE)/SST = (34 − 7.97)/34 = .77
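PVAF compares the model's squared error with the squared error of simply predicting the mean. A minimal sketch with the table's numbers; the function name `pvaf` is illustrative.

```python
data = [7, 8, 1, 8, 6]
prediction = [5.03, 6.97, 2.12, 8.91, 6.97]

def pvaf(observed, predicted):
    """Proportion of variance accounted for: (SST - SSE) / SST."""
    mean = sum(observed) / len(observed)
    sst = sum((y - mean) ** 2 for y in observed)               # error of the mean
    sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # error of the model
    return (sst - sse) / sst

print(f"{pvaf(data, prediction):.2f}")  # 0.77
```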
R²
• R² is PVAF, but…
      Mean prediction        Model prediction
Data  Mean  Error  Error²    Prediction  Error  Error²
7     6      1      1        5.9          1.1    1.21
8     6      2      4        10.1        -2.1    4.41
1     6     -5     25        4           -3      9
8     6      2      4        5.9          2.1    4.41
6     6      0      0        1            5      25
            SST    34                     SSE    44.03
(SST − SSE)/SST = (34 − 44.03)/34 = −0.295
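One reading of the "but…" is that PVAF computed this way goes negative whenever the model predicts worse than the mean does, while a squared correlation between data and predictions never can. The sketch below computes both for the table's numbers. The comparison with the squared Pearson correlation is an interpretation added here, not something the slide states.

```python
import math

data = [7, 8, 1, 8, 6]
prediction = [5.9, 10.1, 4, 5.9, 1]

mean_y = sum(data) / len(data)
mean_p = sum(prediction) / len(prediction)

sst = sum((y - mean_y) ** 2 for y in data)
sse = sum((y - p) ** 2 for y, p in zip(data, prediction))
pvaf = (sst - sse) / sst

# Squared Pearson correlation between data and predictions.
cov = sum((y - mean_y) * (p - mean_p) for y, p in zip(data, prediction))
ss_p = sum((p - mean_p) ** 2 for p in prediction)
r_squared = cov ** 2 / (sst * ss_p)

print(f"{pvaf:.3f}")       # -0.295
print(f"{r_squared:.3f}")  # always between 0 and 1
```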
Maximum Likelihood
• Assume we are sampling from a population with probability f(Y; θ).
• Y is an observation and θ are the model parameters.
θ = [0]
N(−1.7; [μ = 0]) = 0.094
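The value 0.094 is the height of a normal density at Y = −1.7 when the mean is 0. A minimal sketch, assuming a standard deviation of 1 (the value the later slides use); `normal_pdf` is a name chosen here.

```python
import math

def normal_pdf(y, mu, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

print(round(normal_pdf(-1.7, 0.0), 3))  # 0.094
```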
Maximum Likelihood
• With independent observations, Y1…Yn, the joint probability of the sample observations is:
$$g(Y_1, \ldots, Y_n) = \prod_{i=1}^{n} f(Y_i; \theta)$$
θ = [0], observations Y1, Y2, Y3:
0.094 × 0.2661 × 0.3605 = 0.0090
Maximum Likelihood
• Expressed as a function of the parameters, we have the likelihood function:
$$L(\theta) = \prod_{i=1}^{n} f(Y_i; \theta)$$
• The goal is to maximize L with respect to the parameters, θ.
Maximum Likelihood
θ = [0], observations Y1, Y2, Y3:
0.094 × 0.2661 × 0.3605 = 0.0090
θ = [−1.0167], observations Y1, Y2, Y3:
0.3159 × 0.3962 × 0.3398 = 0.0425
(Assuming σ = 1)
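The two products above can be reproduced as likelihoods of three observations under a normal density with standard deviation 1. Only Y1 = −1.7 is stated in the slides. The values −0.9 and −0.45 used for Y2 and Y3 are inferred here from the reported densities, so treat them as assumptions.

```python
import math

def normal_pdf(y, mu, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def likelihood(mu, observations):
    """Product of the densities of independent observations."""
    return math.prod(normal_pdf(y, mu) for y in observations)

# Y1 is from the slides; Y2 and Y3 are inferred from the densities shown.
y = [-1.7, -0.9, -0.45]
sample_mean = sum(y) / len(y)

print(f"{likelihood(0.0, y):.4f}")          # 0.0090
print(round(sample_mean, 4))                # -1.0167
print(f"{likelihood(sample_mean, y):.4f}")  # 0.0425
```

Under these assumptions the second parameter value is the sample mean, which is where the likelihood of a normal mean peaks.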
Maximum Likelihood
• Preferred to other methods
– Has very nice mathematical properties.
– Easier to interpret.
– We’ll see specifics in a few weeks.
• Often harder (or impossible?) to calculate than other methods.
• Often presented as log likelihood, ln(L).
– Easier to compute (sums, not products).
– Better numerical resolution.
• Sometimes equivalent to other methods.
– E.g., same as minimizing SSE when estimating the mean of a distribution.
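The last two points can be illustrated with a short sketch. It assumes a normal density with standard deviation 1 and three illustrative observations (only −1.7 appears in the slides; the others are assumed). A grid search, used purely for demonstration, shows that the mean maximizing the log likelihood is also the mean minimizing SSE, and a long run of observations shows why sums of logs are numerically safer than products.

```python
import math

def log_pdf(y, mu, sigma=1.0):
    """Log of the normal density."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)

def log_likelihood(mu, observations):
    return sum(log_pdf(y, mu) for y in observations)

def sse(mu, observations):
    return sum((y - mu) ** 2 for y in observations)

y = [-1.7, -0.9, -0.45]  # illustrative observations (partly assumed)

# Coarse grid search over candidate means from -3 to 3 in steps of 0.001.
grid = [i / 1000 for i in range(-3000, 3001)]
best_by_ll = max(grid, key=lambda m: log_likelihood(m, y))
best_by_sse = min(grid, key=lambda m: sse(m, y))
print(best_by_ll, best_by_sse)  # the same grid point, next to the sample mean

# Numerical resolution: a product of 2000 densities underflows to zero,
# while the sum of their logs stays finite.
many = [0.5] * 2000
product = math.prod(math.exp(log_pdf(v, 0.0)) for v in many)
total_log = log_likelihood(0.0, many)
print(product, total_log)
```

For a normal density with fixed σ, ln L equals a constant minus SSE/(2σ²), which is why the two criteria pick the same mean.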