MULTIVARIATE STATISTICAL ANALYSIS FOR FOOD SCIENCE AND AGRICULTURE: AN INTRODUCTION
4. MULTIDIMENSIONAL SCALING
Prof. Eugenio Parente
Scuola di Scienze Agrarie - Università della Basilicata
05/01/2013 Multistat 3 cfu, Dec 2012 - Jan 2013

Outline
• Multidimensional scaling (MDS)
  • objectives of MDS
  • metric and monotonic MDS
  • MDS output
  • examples


Multidimensional scaling

Multidimensional scaling (MDS) is a group of techniques used to fit a set of points in a q-dimensional space so that the distances between the points (dij) match, as closely as possible, the dissimilarities (δij) between the original objects in the p-dimensional space, in order to obtain a simple spatial model (a map). The model does not require assumptions about statistical distributions, but the data should satisfy the metric conditions:
• the distance of an object from itself is 0
• the distance from object A to object B is the same as the distance from B to A (the dissimilarity matrix should be symmetric)
• the distance from A to C is less than or equal to the sum of the distances from A to B and from B to C (triangle inequality)
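The three metric conditions can be checked on a dissimilarity matrix before running the analysis. What follows is a minimal sketch that assumes the matrix is held as a square NumPy array. The function name `check_metric_conditions` and the tolerance `tol` are illustrative only and do not belong to any MDS package. It builds an n x n x n array, so it is meant for the small matrices typical of these examples.

```python
import numpy as np


def check_metric_conditions(D, tol=1e-9):
    """Check the three metric conditions on a square dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    # d(A, A) = 0 for every object
    zero_diagonal = bool(np.allclose(np.diag(D), 0.0, atol=tol))
    # d(A, B) = d(B, A)
    symmetric = bool(np.allclose(D, D.T, atol=tol))
    # d(A, C) <= d(A, B) + d(B, C) for every triple:
    # via[a, b, c] = D[a, b] + D[b, c] is compared with D[a, c]
    via = D[:, :, None] + D[None, :, :]
    triangle = bool(np.all(D[:, None, :] <= via + tol))
    return {"zero_diagonal": zero_diagonal,
            "symmetric": symmetric,
            "triangle_inequality": triangle}


# A-C (5) is longer than A-B plus B-C (1 + 1), so the triangle inequality fails
bad = np.array([[0, 1, 5],
                [1, 0, 1],
                [5, 1, 0]])
print(check_metric_conditions(bad))
# {'zero_diagonal': True, 'symmetric': True, 'triangle_inequality': False}
```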


MDS vs. PCA
• Advantages
  • MDS will usually find a solution with fewer dimensions than PCA
  • If the purpose is just finding or visualizing natural groups of objects, a map is easier to explain than a score plot
  • Clusters of objects are easier to visualize / highlight in an MDS map
• Disadvantages
  • It is usually more difficult to find relationships between the dimensions and the original variables
  • MDS is usually effective only when the observations are fairly well spread out in the space


Multidimensional scaling

The model can be written as:

$$d_{ij} \approx f(\delta_{ij}), \qquad d_{ij} = h(\mathbf{x}_i, \mathbf{x}_j)$$

where:
• xi and xj are the vectors of the coordinates of objects i and j in the q-dimensional space (q < p)
• f is the assumed functional relationship between the dissimilarities δij and the distances dij
• h is the distance function (usually Euclidean, but Minkowski metrics can be used)
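The distance function h can be written directly for the Minkowski family, which contains the Euclidean case. This is a minimal sketch that assumes the configuration is an (objects x q) NumPy array. The function name and the exponent name `r` are illustrative.

```python
import numpy as np


def minkowski_distances(X, r=2.0):
    """Pairwise distances d_ij = h(x_i, x_j) between the rows of X.

    r = 2 gives the Euclidean distance, r = 1 the city-block distance.
    """
    X = np.asarray(X, dtype=float)
    diff = np.abs(X[:, None, :] - X[None, :, :])  # |x_ia - x_ja| for every pair
    return (diff ** r).sum(axis=-1) ** (1.0 / r)


X = np.array([[0.0, 0.0],
              [3.0, 4.0]])
print(minkowski_distances(X, r=2)[0, 1])  # 5.0
print(minkowski_distances(X, r=1)[0, 1])  # 7.0
```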


Metric multidimensional scaling

A direct numerical comparison between fitted distances and dissimilarities (usually based on a least-squares criterion) is used. The coordinates are calculated iteratively to minimize a goodness-of-fit statistic (stress). In linear metric scaling, a linear model relates the distances to the dissimilarities:

$$d_{ij} = \alpha + \beta\,\delta_{ij} + \varepsilon_{ij}$$
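For a given configuration, α and β in the linear model can be estimated by ordinary least squares over the pairs i < j. A full metric MDS algorithm alternates a step like this one with an update of the coordinates. The sketch below covers only the regression step. It assumes square NumPy matrices, and the function name is hypothetical.

```python
import numpy as np


def fit_linear_scaling(delta, d):
    """OLS estimates of alpha and beta in d_ij = alpha + beta * delta_ij + e_ij.

    delta: square matrix of dissimilarities; d: square matrix of distances in the
    current configuration. Only the pairs i < j are used.
    """
    delta = np.asarray(delta, dtype=float)
    d = np.asarray(d, dtype=float)
    iu = np.triu_indices(delta.shape[0], k=1)
    x, y = delta[iu], d[iu]
    beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    alpha = y.mean() - beta * x.mean()
    fitted = np.zeros_like(d)
    fitted[iu] = alpha + beta * x
    fitted = fitted + fitted.T  # symmetric matrix of fitted values f(delta_ij)
    return alpha, beta, fitted


delta = np.array([[0.0, 1.0, 2.0],
                  [1.0, 0.0, 4.0],
                  [2.0, 4.0, 0.0]])
d = 2.0 + 3.0 * delta
np.fill_diagonal(d, 0.0)
alpha, beta, fitted = fit_linear_scaling(delta, d)
print(round(alpha, 6), round(beta, 6))  # 2.0 3.0
```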


Metric multidimensional scaling

A fit criterion that is invariant both under rigid transformations (rotations, reflections, translations) and under non-rigid transformations (stretching and shrinking obtained by multiplying the coordinates by a factor k) is:

$$S^2 = \frac{\sum_{i<j} \left(d_{ij} - f(\delta_{ij})\right)^2}{\sum_{i<j} d_{ij}^2}$$

whose square root, S, is known as stress.
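The criterion translates directly into code. This minimal sketch assumes that the distances d and the fitted values f(δ) are both held as square symmetric NumPy arrays, and the function name is illustrative. Multiplying both arrays by the same factor k leaves the result unchanged, which is the scale invariance described above. In a real analysis the fitted values rescale automatically because f is refitted.

```python
import numpy as np


def stress(d, d_hat):
    """Square root of sum_{i<j} (d_ij - d_hat_ij)^2 / sum_{i<j} d_ij^2.

    d: distances in the configuration; d_hat: fitted values f(delta_ij).
    Only the pairs i < j of the square matrices are used.
    """
    d = np.asarray(d, dtype=float)
    d_hat = np.asarray(d_hat, dtype=float)
    iu = np.triu_indices(d.shape[0], k=1)
    return float(np.sqrt(np.sum((d[iu] - d_hat[iu]) ** 2) / np.sum(d[iu] ** 2)))


d = np.array([[0.0, 1.0], [1.0, 0.0]])
d_hat = np.array([[0.0, 0.5], [0.5, 0.0]])
print(stress(d, d_hat))               # 0.5
print(stress(10 * d, 10 * d_hat))     # 0.5, unchanged by rescaling
```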


Non-metric (monotonic) MDS

When the observed proximities carry information on rank order rather than on actual distances, assuming a linear relationship between observed dissimilarities and fitted distances may be inappropriate, and monotonic regression should be used instead. The fitted values d̂ij (disparities) are chosen to satisfy a weak monotonicity condition:

$$d_{ij} = \hat{d}_{ij} + \varepsilon_{ij}$$

$$\delta_{i_1 j_1} < \delta_{i_2 j_2} < \dots < \delta_{i_N j_N} \;\Rightarrow\; \hat{d}_{i_1 j_1} \le \hat{d}_{i_2 j_2} \le \dots \le \hat{d}_{i_N j_N}$$
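One standard way to compute the least-squares monotone fit d̂ is the pool-adjacent-violators algorithm. The algorithm sorts the pairs by δ and then merges neighbouring blocks whose means break the ordering, replacing each merged block by its mean. This is a minimal sketch that takes the pairs (i, j) as flat arrays. The function name is hypothetical. In this version, pairs with tied δ are ordered by d, which corresponds to Kruskal's primary approach: tied dissimilarities may receive unequal disparities.

```python
import numpy as np


def monotone_disparities(delta, d):
    """Least-squares fit d_hat to d, non-decreasing in delta (weak monotonicity).

    delta, d: 1-D arrays over the pairs (i, j). Ties in delta are ordered by d
    (Kruskal's primary approach), so tied dissimilarities may get unequal d_hat.
    """
    delta = np.asarray(delta, dtype=float)
    d = np.asarray(d, dtype=float)
    order = np.lexsort((d, delta))  # sort by delta, then by d within ties
    block_sum, block_n = [], []
    for value in d[order]:
        block_sum.append(value)
        block_n.append(1)
        # pool adjacent blocks while their means violate d_hat_1 <= d_hat_2
        while len(block_sum) > 1 and block_sum[-2] / block_n[-2] > block_sum[-1] / block_n[-1]:
            s, n = block_sum.pop(), block_n.pop()
            block_sum[-1] += s
            block_n[-1] += n
    fitted = np.concatenate([np.full(n, s / n) for s, n in zip(block_sum, block_n)])
    d_hat = np.empty_like(fitted)
    d_hat[order] = fitted  # back to the original pair order
    return d_hat


print(monotone_disparities([1, 2, 3, 4], [1.0, 3.0, 2.0, 4.0]))
# values: 1.0, 2.5, 2.5, 4.0 (the violating pair 3.0, 2.0 is pooled to its mean)
```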


MDS input data
• Dissimilarity matrices obtained directly (for example, by asking assessors to state how different two objects are, or by taking measurements from a map)
• Dissimilarity matrices calculated from rectangular (n x p) data matrices:
  • Euclidean distance (on standardized or unstandardized data)
  • Negative correlation (beware: high positive correlation = most similar, high negative correlation = most dissimilar)
  • Other distance or correlation measures (Spearman, Guttman)
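Here is a minimal sketch of the two most common ways to turn an n x p data matrix into a dissimilarity matrix between its n objects (rows). For the correlation case, the code uses 1 − r rather than −r. This shift is an assumption of the sketch: it keeps all values between 0 and 2 and does not change their rank order. Both function names are illustrative.

```python
import numpy as np


def euclidean_dissimilarity(X, standardize=True):
    """Euclidean distances between the rows (objects) of an n x p data matrix."""
    X = np.asarray(X, dtype=float)
    if standardize:
        # z-scores per variable, so that variables measured in large units do not dominate
        X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def correlation_dissimilarity(X):
    """1 - r, where r is the Pearson correlation between pairs of objects (rows)."""
    r = np.corrcoef(np.asarray(X, dtype=float))  # np.corrcoef treats rows as variables
    return 1.0 - r


X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # perfectly correlated with row 0
              [3.0, 2.0, 1.0]])  # perfectly anti-correlated with row 0
print(correlation_dissimilarity(X).round(6))
```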


Adjustable parameters in the analysis
• Type of scaling (monotonic; metric: linear, log, power)
• Number of dimensions
• Stress function
• Iteration and convergence parameters
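The sketch below shows where the number of dimensions and the iteration and convergence parameters enter the computation. It implements metric MDS with the SMACOF majorization algorithm. SMACOF is a widely used method swapped in here for illustration, and it is not necessarily what the software used in the course examples does. The sketch assumes f is the identity (distances are fitted directly to δij), h is Euclidean, all pairs have unit weight, and the starting configuration comes from classical (Torgerson) scaling. The names `n_dim`, `max_iter` and `tol` are illustrative.

```python
import numpy as np


def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def torgerson_start(delta, n_dim):
    """Classical scaling of delta, used as the starting configuration."""
    n = delta.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (delta ** 2) @ J
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:n_dim]
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))


def smacof(delta, n_dim=2, max_iter=300, tol=1e-8):
    """Metric MDS minimizing raw stress sum_{i<j} (d_ij - delta_ij)^2 by SMACOF.

    Returns the configuration and the square root of
    sum (d_ij - delta_ij)^2 / sum d_ij^2 for the final configuration.
    """
    delta = np.asarray(delta, dtype=float)
    n = delta.shape[0]
    iu = np.triu_indices(n, k=1)
    X = torgerson_start(delta, n_dim)
    D = pairwise_distances(X)
    raw = np.sum((D[iu] - delta[iu]) ** 2)
    for _ in range(max_iter):
        # Guttman transform X <- B(X) X / n, with b_ij = -delta_ij / d_ij (0 if d_ij = 0)
        with np.errstate(divide="ignore", invalid="ignore"):
            B = -np.where(D > 0, delta / D, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))
        X = B @ X / n
        D = pairwise_distances(X)
        new_raw = np.sum((D[iu] - delta[iu]) ** 2)
        converged = raw - new_raw <= tol * max(raw, 1e-12)  # relative improvement
        raw = new_raw
        if converged:
            break
    return X, float(np.sqrt(raw / np.sum(D[iu] ** 2)))
```

One way to choose the number of dimensions is to run the function for `n_dim` = 1, 2, 3, ... and plot the final stress against the dimensionality. Raw stress cannot increase from one Guttman transform to the next, so the loop stops as soon as the relative improvement drops below `tol`.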


MDS output

In the MDS output look for:
• the final configuration in the q dimensions (q < p): coordinates and plots
• the final stress and the proportion of variance; according to Kruskal (1964):
  • stress 0.20 -> poor fit
  • stress 0.10 -> fair fit
  • stress 0.05 -> good fit
  • stress 0.025 -> excellent fit
• the Shepard diagram (plot of the fitted distances against the observed dissimilarities)
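For reporting, the benchmarks can be turned into a small lookup. Kruskal's table gives only the benchmark values. How this sketch labels values that fall between them is its own assumption: a value takes the label of the next benchmark above it, and anything above 0.10 is reported as poor. The function name is illustrative.

```python
def kruskal_verbal_fit(stress):
    """Verbal label for a stress value based on the Kruskal (1964) benchmarks.

    Assumption of this sketch: values between two benchmarks get the label of
    the next larger (worse) benchmark; anything above 0.10 is labelled "poor".
    """
    if stress <= 0.025:
        return "excellent"
    if stress <= 0.05:
        return "good"
    if stress <= 0.10:
        return "fair"
    return "poor"


print(kruskal_verbal_fit(0.04))  # good
```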


Individual differences MDS
• Uses multiple dissimilarity matrices (for example, different judges evaluating a common set of products)
• The input is a rectangular matrix containing stacked triangular dissimilarity matrices
• Scales objects and judges in a common space in order to:
  • find a common configuration for all objects
  • calculate weights for the judges in the common space
  • assess goodness of fit for both objects and judges
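Individual differences scaling (INDSCAL) usually relies on the weighted Euclidean model, which links the common configuration to each judge's weights. In this model, judge k sees the distance between objects i and j as the square root of Σa wka (xia − xja)², where the sum runs over the dimensions a of the common space. This minimal sketch only evaluates that model for a given configuration and set of weights. Estimating the configuration and the weights is left to the MDS software. The function name is hypothetical.

```python
import numpy as np


def weighted_distances(X, W):
    """Judge-specific distances under the weighted Euclidean model.

    X: (n_objects, q) common configuration.
    W: (n_judges, q) non-negative weights of each judge on each dimension.
    Returns an (n_judges, n_objects, n_objects) array with
    d_ijk = sqrt(sum_a W[k, a] * (X[i, a] - X[j, a]) ** 2).
    """
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    sq = (X[:, None, :] - X[None, :, :]) ** 2  # (n, n, q) squared coordinate differences
    return np.sqrt(np.einsum("ka,ija->kij", W, sq))


X = np.array([[0.0, 0.0], [3.0, 4.0]])
W = np.array([[1.0, 1.0],   # judge 1 weights both dimensions equally
              [4.0, 0.0]])  # judge 2 uses only dimension 1
print(weighted_distances(X, W)[:, 0, 1])  # distances 5.0 and 6.0
```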


The data file


The configuration


The output


MDS examples

Open file MDSRAPD.syo for examples of MDS on RAPD-PCR data (the bootstrapping/MDS procedure is in file sardiniabread.syo). The original data and the final configuration of the MDS on RAPD data are in file breadlab.xls, and the final bootstrapping/MDS configuration is in file MDSboot.xls. For details on the pretreatment procedures, look at the command files (boot.syc, Bread\fornextloop.syc).


MDS examples

Open file mds.syo for MDS examples on the RP-HPLC dataset for smear cheese.


MDS of RAPD patterns of LAB

[Figure: RAPD patterns of LAB in four panels (A to D), lanes 1 to 47 flanked by molecular size marker lanes (Mk; 1,000 bp and 5,000 bp bands indicated)]


MDS of RAPD patterns of LAB

[Figure: MDS configuration of the RAPD patterns; points are labelled with two-letter codes (e.g. ZB, ZE, ZG, SA, SD, MB, CG) and the legend (Species) distinguishes W. confusa, Leuc. citreum, Lb. sanfranciscensis, Lb. plantarum, Lb. pentosus and Lb. brevis]


MDS of RAPD patterns of LAB

[Figure: MDS configuration plotted as dim(1) vs dim(2), with group labels MB, CG, ZB, ZG, ZE, SD, SC, SA, SB]


MDS of RP-HPLC data from smear cheese


RP-HPLC data from smear cheese: PCA vs. MDS of Euclidean distance matrix
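A well-known result helps explain why the two maps can look so alike. Classical (Torgerson) scaling of a Euclidean distance matrix reproduces the PCA scores of the column-centred data, with the sign of each axis possibly reversed. PCA on standardized data corresponds in the same way to distances computed on standardized data. The sketch below checks this on random data. It is not the procedure used for the smear-cheese example, whose MDS settings (for example, metric or monotonic scaling) are not stated on the slide. The function names are illustrative.

```python
import numpy as np


def classical_mds(D, n_dim=2):
    """Classical (Torgerson) scaling of a distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    B = -0.5 * J @ (D ** 2) @ J          # double-centred squared distances
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:n_dim]
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))


def pca_scores(X, n_comp=2):
    """Scores on the first principal components of the column-centred data."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_comp] * s[:n_comp]


rng = np.random.default_rng(1)
X = rng.normal(size=(10, 5))  # 10 objects, 5 variables (random data)
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
mds, pca = classical_mds(D), pca_scores(X)
# each MDS axis equals a PCA axis, possibly with the sign reversed
same = [np.allclose(mds[:, k], pca[:, k]) or np.allclose(mds[:, k], -pca[:, k])
        for k in range(2)]
print(same)  # [True, True]
```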


Some rights reserved

This presentation was created by Eugenio Parente, 2008 (revised 2012). With the exception of figures and tables taken from published articles, the material included in this presentation is covered by Creative Commons Public License “by-nc-sa” (http://creativecommons.org/licenses/by-nc-sa/2.5/deed.en).

