Date post: 27-Jul-2020
Page 1: Applied Multivariate Statistics Spring 2013 · Multidimensional Scaling

Multidimensional Scaling

Applied Multivariate Statistics – Spring 2013

Page 2

Outline

Fundamental Idea

Classical Multidimensional Scaling

Non-metric Multidimensional Scaling

Appl. Multivariate Statistics - Spring 2013

Page 3

Basic Idea

How to represent in two dimensions?

Page 4

Idea 1: Projection

Page 5

Idea 2: Squeeze on table

Close points stay close

Page 6

Which idea is better?

Page 7

Idea of MDS

Represent a high-dimensional point cloud in few (usually 2) dimensions, keeping distances between points similar

Classical/Metric MDS: Use a clever projection

R: cmdscale

Non-metric MDS: Squeeze data on table, only conserve ranks

R: isoMDS

Page 8

Classical MDS

Problem: Given Euclidean distances among points, recover the positions of the points!

Example: Road distances between 21 European cities (almost Euclidean, but not quite)

Page 9

Classical MDS

First try:

Page 10

Classical MDS

Flip axes:

Can identify points only up to

- shift

- rotation

- reflection
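
The ambiguity listed above can be checked numerically. The following Python sketch applies a shift, a rotation and a reflection to a small 2-D configuration and reports how much the pairwise distances change. The toy coordinates and the helper name `pairwise_distances` are illustrative assumptions, not taken from the lecture.

```python
import math

def pairwise_distances(points):
    """Return the full matrix of Euclidean distances between 2-D points."""
    return [[math.dist(p, q) for q in points] for p in points]

# Hypothetical toy configuration (not the city data from the slides).
points = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (-1.0, 2.0)]

angle = math.radians(30)
cos_a, sin_a = math.cos(angle), math.sin(angle)

shifted = [(x + 10.0, y - 5.0) for x, y in points]
rotated = [(cos_a * x - sin_a * y, sin_a * x + cos_a * y) for x, y in points]
reflected = [(-x, y) for x, y in points]  # flip the first axis

reference = pairwise_distances(points)
for name, config in [("shift", shifted), ("rotation", rotated), ("reflection", reflected)]:
    dist = pairwise_distances(config)
    max_diff = max(abs(a - b) for ra, rb in zip(reference, dist) for a, b in zip(ra, rb))
    print(f"{name}: largest change in any distance = {max_diff:.2e}")
```

Because all three transformations leave every distance unchanged (up to rounding), a distance matrix alone cannot tell these configurations apart.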

Page 11

Classical MDS

Another example: Air pollution in US cities

Range of manu and popul is much bigger than range of wind

Need to standardize to give every variable equal weight
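
A minimal sketch of why standardizing matters before computing distances. Only the variable names manu, popul and wind come from the slide; the numbers are made up for illustration, and the helpers `standardize` and `distance` are hypothetical names.

```python
import math
import statistics

# Hypothetical values for three cities; only the variable names come from the slide.
data = {
    "manu":  [40.0, 450.0, 3300.0],
    "popul": [70.0, 500.0, 3400.0],
    "wind":  [6.0, 9.5, 13.0],
}

def standardize(values):
    """Centre to mean 0 and scale to sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def distance(columns, i, j):
    """Euclidean distance between observations i and j over all columns."""
    return math.sqrt(sum((col[i] - col[j]) ** 2 for col in columns))

raw = list(data.values())
scaled = [standardize(col) for col in raw]

# Share of the squared distance between city 0 and city 1 that is due to wind.
wind_raw = (raw[2][0] - raw[2][1]) ** 2 / distance(raw, 0, 1) ** 2
wind_scaled = (scaled[2][0] - scaled[2][1]) ** 2 / distance(scaled, 0, 1) ** 2
print(f"wind share of squared distance, raw: {wind_raw:.6f}")
print(f"wind share of squared distance, standardized: {wind_scaled:.3f}")
```

On the raw scale the wind variable contributes almost nothing to the distance; after standardizing, it carries weight comparable to the other variables.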

Page 12

Classical MDS

Page 13

Classical MDS: Theory

Input: Euclidean distances between n objects in q dimensions

Output: Positions of the points up to rotation, reflection, shift

Two steps:

- Compute inner products matrix B from the distances

- Compute positions from B

Page 14

Classical MDS: Theory – Step 1

Inner products matrix $B = XX^T$, where X is the n × q data matrix:

$b_{ij} = \sum_{k=1}^{q} x_{ik} x_{jk}$

Connect to distance:

$d_{ij}^2 = \sum_{k=1}^{q} (x_{ik} - x_{jk})^2 = \dots = b_{ii} + b_{jj} - 2 b_{ij}$

Center points to avoid shift invariance:

$\left( \bar{x} = 0 \rightarrow \sum_{i=1}^{n} x_{ik} = 0 \rightarrow \sum_{i \text{ or } j} b_{ij} = 0 \right)$

Invert relationship:

$b_{ij} = -\frac{1}{2} \left( d_{ij}^2 - d_{i\cdot}^2 - d_{\cdot j}^2 + d_{\cdot\cdot}^2 \right)$ (“doubly centered”)

Here $d_{i\cdot}^2$, $d_{\cdot j}^2$ and $d_{\cdot\cdot}^2$ are the row, column and overall means of the squared distances.

(Hint for middle of page 108: Plug in (4.3) and equations on top of page 108 to show that the expression involving d’s is equal to $b_{ij}$)

Thus, we obtained B from the distance matrix
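
Step 1 can be sketched in a few lines of Python. The function name `double_center` and the rectangle example are assumptions made for illustration; the dotted terms are taken to be means of the squared distances. The check compares the doubly centered matrix with the inner products computed directly from an already centered configuration.

```python
import math

def double_center(dist):
    """Turn an n x n matrix of Euclidean distances into the inner-product matrix B.

    Implements b_ij = -1/2 * (d_ij^2 - d_i.^2 - d_.j^2 + d_..^2), where the
    dotted terms are row, column and overall means of the squared distances.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [
        [-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + total_mean) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical, already centred configuration: corners of a 4 x 3 rectangle.
X = [(2.0, 1.5), (2.0, -1.5), (-2.0, 1.5), (-2.0, -1.5)]
D = [[math.dist(p, q) for q in X] for p in X]

B = double_center(D)
B_direct = [[p[0] * q[0] + p[1] * q[1] for q in X] for p in X]  # X X^T

for row_b, row_direct in zip(B, B_direct):
    print([round(v, 6) for v in row_b], row_direct)
```

For a centered configuration the two matrices agree, and every row of B sums to zero, which matches the centering condition on the slide.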

Page 15

Classical MDS: Theory – Step 2

Since $B = XX^T$, we need the “square root” of B

B is a symmetric and positive semidefinite n × n matrix

Thus, B can be diagonalized:

$B = V \Lambda V^T$

Λ is a diagonal matrix with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$ on the diagonal (“eigenvalues”)

V contains as columns normalized eigenvectors

Some eigenvalues will be zero; drop them:

$B = V_1 \Lambda_1 V_1^T$

Take “square root”:

$X = V_1 \Lambda_1^{1/2}$

Thus we obtained the positions of the points from the distances between all points
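
Step 2 can be illustrated with a small Python sketch. To stay self-contained it uses power iteration with deflation as a stand-in for a library eigen-solver; that choice, the function name `top_eigenpairs` and the rectangle example are assumptions for illustration, not the method used in the lecture. For this example the two nonzero eigenvalues of B should come out close to 16 and 9.

```python
import math

def top_eigenpairs(B, m, iterations=500):
    """Return the m largest eigenvalues/eigenvectors of a symmetric PSD matrix.

    Uses power iteration with deflation; assumes the m leading eigenvalues
    are distinct and strictly positive.
    """
    n = len(B)
    work = [row[:] for row in B]  # work on a copy
    pairs = []
    for _ in range(m):
        v = [float(i + 1) for i in range(n)]  # fixed start vector
        for _step in range(iterations):
            w = [sum(work[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * work[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        # Deflate: remove the component just found, work <- work - lam * v v^T.
        for i in range(n):
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs

# Hypothetical centred configuration and its inner-product matrix B = X X^T.
X_true = [(2.0, 1.5), (2.0, -1.5), (-2.0, 1.5), (-2.0, -1.5)]
B = [[p[0] * q[0] + p[1] * q[1] for q in X_true] for p in X_true]

pairs = top_eigenpairs(B, m=2)
eigenvalues = [lam for lam, _ in pairs]

# X = V_1 Lambda_1^(1/2): scale each eigenvector by the square root of its eigenvalue.
X_rec = [tuple(math.sqrt(lam) * v[i] for lam, v in pairs) for i in range(len(B))]

print("eigenvalues:", [round(lam, 6) for lam in eigenvalues])
print("recovered positions:", [tuple(round(c, 3) for c in p) for p in X_rec])
```

The recovered positions need not equal the original coordinates; they reproduce the same pairwise distances, which is all that can be expected up to rotation and reflection.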

Page 16

Classical MDS: Low-dim representation

Keep only the few (e.g. 2) largest eigenvalues and corresponding eigenvectors

The resulting X will be the low-dimensional representation we were looking for

Goodness of fit (GOF) if we reduce to m dimensions:

$GOF = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{i=1}^{n} \lambda_i}$ (should be at least 0.8)

Finds “optimal” low-dim representation: Minimizes

$S = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( d_{ij}^2 - (d_{ij}^{(m)})^2 \right)$
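
The goodness-of-fit ratio is easy to evaluate once the eigenvalues are known. A minimal Python sketch, with made-up eigenvalues and a hypothetical helper name `goodness_of_fit`:

```python
def goodness_of_fit(eigenvalues, m):
    """Share of the eigenvalue sum captured by the m largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:m]) / sum(ordered)

# Hypothetical eigenvalues of B (all non-negative, as for Euclidean input).
eigenvalues = [16.0, 9.0, 4.0, 1.0, 0.0]

for m in range(1, 5):
    gof = goodness_of_fit(eigenvalues, m)
    verdict = "ok" if gof >= 0.8 else "below the 0.8 rule of thumb"
    print(f"m = {m}: GOF = {gof:.3f} ({verdict})")
```

With these numbers, two dimensions capture 25/30 of the eigenvalue sum, so m = 2 just clears the 0.8 threshold.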

Page 17

Classical MDS: Pros and Cons

+ Optimal for Euclidean input data

+ Still optimal, if B has non-negative eigenvalues (pos. semidefinite)

+ Very fast

- No guarantees if B has negative eigenvalues

However, in practice, it is still used then. New measures for Goodness of fit:

$GOF = \frac{\sum_{i=1}^{m} |\lambda_i|}{\sum_{i=1}^{n} |\lambda_i|}$

$GOF = \frac{\sum_{i=1}^{m} \lambda_i^2}{\sum_{i=1}^{n} \lambda_i^2}$

$GOF = \frac{\sum_{i=1}^{m} \max(0, \lambda_i)}{\sum_{i=1}^{n} \max(0, \lambda_i)}$

Used in R function “cmdscale”
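
The three alternative measures can be compared side by side. The Python sketch below uses made-up eigenvalues that include negative ones, as can happen for non-Euclidean input; the function names are hypothetical, and this is not the code of the R function.

```python
def gof_abs(eigenvalues, m):
    """GOF based on absolute values of the eigenvalues."""
    lam = sorted(eigenvalues, reverse=True)
    return sum(abs(x) for x in lam[:m]) / sum(abs(x) for x in lam)

def gof_squared(eigenvalues, m):
    """GOF based on squared eigenvalues."""
    lam = sorted(eigenvalues, reverse=True)
    return sum(x * x for x in lam[:m]) / sum(x * x for x in lam)

def gof_positive(eigenvalues, m):
    """GOF that counts only the positive eigenvalues."""
    lam = sorted(eigenvalues, reverse=True)
    return sum(max(0.0, x) for x in lam[:m]) / sum(max(0.0, x) for x in lam)

# Hypothetical eigenvalues of B for a non-Euclidean dissimilarity matrix.
eigenvalues = [10.0, 6.0, 1.0, 0.0, -1.0, -2.0]

for name, measure in [("abs", gof_abs), ("squared", gof_squared), ("positive", gof_positive)]:
    print(f"{name}: GOF for m = 2 is {measure(eigenvalues, 2):.3f}")
```

The measures differ only in how negative eigenvalues enter the denominator, so they agree exactly when all eigenvalues are non-negative and the squared version is left aside.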

Page 18

Non-metric MDS: Idea

Sometimes, there is no strict metric on original points

Example: How beautiful are these persons? (1: Not at all, 10: Very much)

2, 6, 9

OR 1, 5, 10 ??

Page 19

Non-metric MDS: Idea

Absolute values are not that meaningful

Ranking is important

Non-metric MDS finds a low-dimensional representation, which respects the ranking of distances

Page 20

Non-metric MDS: Theory

$\delta_{ij}$ is the true dissimilarity, $d_{ij}$ is the distance of representation

Minimize STRESS ($\theta$ is an increasing function):

$S = \frac{\sum_{i<j} (\theta(\delta_{ij}) - d_{ij})^2}{\sum_{i<j} d_{ij}^2}$

Optimize over both position of points and $\theta$

$\hat{d}_{ij} = \theta(\delta_{ij})$ is called “disparity”

Solved numerically (isotonic regression); Classical MDS as starting value; very time consuming
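
A rough Python sketch of the disparity step for a fixed configuration. It fits disparities by isotonic regression (pool-adjacent-violators, which yields a non-decreasing fit) and evaluates STRESS with the formula from the slide. The function names and the four-pair example are assumptions for illustration; a full non-metric MDS would alternate this step with moving the points, which is not shown here.

```python
def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block mean exceeds the current block mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

def stress(disparities, distances):
    """STRESS as on the slide: sum (d_hat - d)^2 / sum d^2 over pairs i < j."""
    numerator = sum((dh - d) ** 2 for dh, d in zip(disparities, distances))
    denominator = sum(d ** 2 for d in distances)
    return numerator / denominator

# Hypothetical example with four pairs, listed in increasing order of the
# true dissimilarities delta_ij.
delta = [1.0, 2.0, 3.0, 4.0]
d_config = [1.0, 3.0, 2.0, 4.0]  # distances in the current low-dim configuration

# Disparities: values monotone in delta that are closest to the configuration distances.
d_hat = isotonic_fit(d_config)
print("disparities:", d_hat)
print("STRESS:", round(stress(d_hat, d_config), 6))
```

In this example the second and third pairs violate the ranking, so their disparities are pooled to a common value; only those two pairs contribute to STRESS.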

Page 21

Non-metric MDS: Example for intuition (only)

True points in high dimensional space: A, B, C with $\delta_{AB} < \delta_{BC} < \delta_{AC}$

Compute best representation

[Figure: points A, B, C; edge labels 3, 2, 5]

STRESS = 19.7

Page 22

Non-metric MDS: Example for intuition (only)

True points in high dimensional space: A, B, C with $\delta_{AB} < \delta_{BC} < \delta_{AC}$

Compute best representation

[Figure: points A, B, C; edge labels 2.7, 2, 4.8]

STRESS = 20.1

Page 23

Non-metric MDS: Example for intuition (only)

True points in high dimensional space: A, B, C with $\delta_{AB} < \delta_{BC} < \delta_{AC}$

Compute best representation

[Figure: points A, B, C; edge labels 2.9, 2, 5.2]

STRESS = 18.9

We will finally represent the “transformed true distances” (called disparities):

$\hat{d}_{AB} = 2$, $\hat{d}_{BC} = 2.9$, $\hat{d}_{AC} = 5.2$

instead of the true distances:

$\delta_{AB} = 2$, $\delta_{BC} = 3$, $\delta_{AC} = 5$

Stop if minimal STRESS is found.

Page 24

Non-metric MDS: Pros and Cons

+ Fulfills a clear objective without many assumptions (minimize STRESS)

+ Results don’t change with rescaling or monotonic variable transformation

+ Works even if you only have rank information

- Slow in large problems

- Usually only local (not global) optimum found

- Only gets ranks of distances right

Page 25

Non-metric MDS: Example

Do people in the same party vote alike?

Dissimilarity: for each pair of the 15 congressmen, the number of votes (out of 19) on which they disagreed
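
A dissimilarity of this kind can be built by counting disagreements between voting records. The Python sketch below uses three invented records with five votes each, purely for illustration; the real data (15 congressmen, 19 votes) are not reproduced here, and the helper name `disagreements` is hypothetical.

```python
# Hypothetical voting records: one string per congressman, one character per vote
# ("y" = yes, "n" = no).
votes = {
    "A": "yynyn",
    "B": "yynnn",
    "C": "nnyyy",
}

def disagreements(a, b):
    """Number of votes on which two voting records differ."""
    return sum(x != y for x, y in zip(a, b))

names = list(votes)
dissimilarity = [[disagreements(votes[p], votes[q]) for q in names] for p in names]

for name, row in zip(names, dissimilarity):
    print(name, row)
```

The resulting matrix is symmetric with zeros on the diagonal, which is the form a non-metric MDS routine expects as input.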

Page 26

Non-metric MDS: Example

Page 27

Concepts to know

Classical MDS:

- Finds low-dim projection that respects distances

- Optimal for Euclidean distances

- No clear guarantees for other distances

- fast

Non-metric MDS:

- Squeezes data points on table

- respects only rankings of distances

- (locally) solves clear objective

- slow

Page 28

R commands to know

cmdscale included in standard R distribution

isoMDS from package “MASS”