
# Some Multivariate Methods Used by Ecologists

Date post: 27-Jan-2015

Some Multivariate Methods Used by Ecologists

Peter Chapman Wokingham U3A

4 October 2012

Contents

• Assumptions
• Introduction
• Notation
• Refresher on matrices
• Eigen vectors and eigen values
• Principal component analysis
• Principal coordinate analysis
• Correspondence analysis
• Redundancy analysis and canonical correspondence analysis
• R Software

Assumptions

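I am assuming that you are familiar with the following:

A sum of several numbers:

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \dots + x_n, \quad i = 1 \text{ to } n$$

The mean of several numbers:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Variance:

$$\mathrm{Variance}(x) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \quad \text{or} \quad \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

Covariance:

$$\mathrm{Covariance}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$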
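As a quick check of these definitions, here is a minimal Python sketch. The function names and the small data vectors are made up for illustration; they are not from the talk.

```python
def mean(xs):
    """Arithmetic mean: sum of the values divided by n."""
    return sum(xs) / len(xs)


def variance(xs, sample=True):
    """Variance with divisor (n - 1) if sample is True, otherwise n."""
    m = mean(xs)
    ss = sum((x - m) ** 2 for x in xs)
    return ss / (len(xs) - 1 if sample else len(xs))


def covariance(xs, ys):
    """Sample covariance with divisor (n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Illustrative data only
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]
print(mean(x), variance(x), variance(x, sample=False), covariance(x, y))
```

The two divisors give different answers for small n, which is why it matters which one a package uses.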

Introduction

Before 1980, multivariate research tended to be theoretical. Often this involved working on distribution theory for hypothetical but unrealistic situations. Very rarely did anyone carry out a multivariate analysis. Since 1980 or so, with the growth of computing power, people have started using multivariate methods. This has led to further development of the old methods, plus the introduction of many new methods. Starting in the mid 1980s, many biologically trained people who were also computer literate started appearing in the workplace. These people, who rarely had any formal training in mathematics or statistics, were able to run multivariate software and get results. This led to a lot of nonsense, including some published nonsense.

Typical Example of Ecology Data

[Figure: two data tables sharing the same rows. The first is a sites × species matrix of counts (77 sites and 7 species in the illustration). The second is a sites × environmental variables matrix of real numbers (e.g. soil, climate, etc.).]

In pesticide research “site” could refer to different chemicals, or different rates of the same chemical

Example data sets: Although this is a talk about methods used in ecology, I shall be using data from other sources. This is because it is easier to understand what is going on if the data are more familiar, and because during my search of the web I found it quite difficult to find suitable ecological examples. I have also tended to use data with small numbers of dimensions.

Notation

$\mathbf{Y}_{n \times p}$ is an $n \times p$ matrix:

• I will call it an n by p matrix, which means n rows and p columns
• Always in blue, and always a capital if p > 1
• Using the matrix "style" in MathType, with dimensions in subscripts

$[y_{ij}]$ also represents the matrix, showing the generic cell member, with subscripts but not dimensions (not used very much).

$$\begin{bmatrix} y_{11} & y_{12} & y_{13} & \dots & y_{1p} \\ y_{21} & y_{22} & y_{23} & \dots & y_{2p} \\ y_{31} & y_{32} & y_{33} & \dots & y_{3p} \\ \vdots & \vdots & \vdots & & \vdots \\ y_{n1} & y_{n2} & y_{n3} & \dots & y_{np} \end{bmatrix}$$

is another way of representing a matrix.

If a matrix has only one row or only one column I will use lower case blue, e.g. $\mathbf{u}_{n \times 1}$.

Refresher on Matrices

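Generic Matrix

$$\mathbf{Y}_{n \times p} = [y_{ij}] = \begin{bmatrix} y_{11} & y_{12} & y_{13} & \dots & y_{1p} \\ y_{21} & y_{22} & y_{23} & \dots & y_{2p} \\ y_{31} & y_{32} & y_{33} & \dots & y_{3p} \\ \vdots & \vdots & \vdots & & \vdots \\ y_{n1} & y_{n2} & y_{n3} & \dots & y_{np} \end{bmatrix}$$

Square Matrix

$$\mathbf{Y}_{3 \times 3} = [y_{ij}] = \begin{bmatrix} y_{11} & y_{12} & y_{13} \\ y_{21} & y_{22} & y_{23} \\ y_{31} & y_{32} & y_{33} \end{bmatrix}$$

Row Matrix

$$\mathbf{Y}_{1 \times p} = \begin{bmatrix} y_{11} & y_{12} & y_{13} & \dots & y_{1p} \end{bmatrix}$$

Column Matrix (Vector)

$$\mathbf{Y}_{n \times 1} = \begin{bmatrix} y_{11} \\ y_{21} \\ y_{31} \\ \vdots \\ y_{n1} \end{bmatrix}$$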

Matrix Multiplication

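Matrix product:

$$\mathbf{Q}_{3 \times 3} = \mathbf{B}_{3 \times 2}\,\mathbf{C}_{2 \times 3}$$

$$\begin{bmatrix} q_{11} & q_{12} & q_{13} \\ q_{21} & q_{22} & q_{23} \\ q_{31} & q_{32} & q_{33} \end{bmatrix} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix} \begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \end{bmatrix}$$

$$q_{11} = b_{11}c_{11} + b_{12}c_{21}$$

$$q_{12} = b_{11}c_{12} + b_{12}c_{22}$$

$$q_{13} = b_{11}c_{13} + b_{12}c_{23}$$

$$\vdots$$

$$q_{33} = b_{31}c_{13} + b_{32}c_{23}$$

Element-by-element product of two matrices of the same dimensions:

$$\mathbf{Q}_{3 \times 2} = \mathbf{B}_{3 \times 2} * \mathbf{C}_{3 \times 2}$$

$$\begin{bmatrix} q_{11} & q_{12} \\ q_{21} & q_{22} \\ q_{31} & q_{32} \end{bmatrix} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix} * \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \\ c_{31} & c_{32} \end{bmatrix} = \begin{bmatrix} b_{11}c_{11} & b_{12}c_{12} \\ b_{21}c_{21} & b_{22}c_{22} \\ b_{31}c_{31} & b_{32}c_{32} \end{bmatrix}$$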
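The two kinds of product can be sketched in a few lines of plain Python. This is only an illustration: the helper names `matmul` and `elementwise` and the numbers are made up, and matrices are stored as lists of rows.

```python
def matmul(b, c):
    """Matrix product: q[i][j] is the sum over t of b[i][t] * c[t][j]."""
    inner = len(c)
    if any(len(row) != inner for row in b):
        raise ValueError("columns of B must equal rows of C")
    return [[sum(b[i][t] * c[t][j] for t in range(inner))
             for j in range(len(c[0]))]
            for i in range(len(b))]


def elementwise(b, c):
    """Element-by-element product of two matrices of the same shape."""
    return [[x * y for x, y in zip(rb, rc)] for rb, rc in zip(b, c)]


B = [[1, 2], [3, 4], [5, 6]]          # 3 x 2
C = [[1, 0, 2], [0, 1, 3]]            # 2 x 3
Q = matmul(B, C)                      # 3 x 3
H = elementwise(B, [[1, 1], [2, 2], [0, 1]])
print(Q)
print(H)
```

Note that the matrix product needs the inner dimensions to agree (3 × 2 times 2 × 3), whereas the element-by-element product needs both matrices to have identical dimensions.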

Diagonal (Square) Matrix

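$$\mathbf{B}_{5 \times 5} = [b_{ij}] = \begin{bmatrix} a_{11} & 0 & 0 & 0 & 0 \\ 0 & a_{22} & 0 & 0 & 0 \\ 0 & 0 & a_{33} & 0 & 0 \\ 0 & 0 & 0 & a_{44} & 0 \\ 0 & 0 & 0 & 0 & a_{55} \end{bmatrix}$$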

Identity Matrix

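$$\mathbf{I}_{5 \times 5} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$

$$\mathbf{I}_{n \times n}\,\mathbf{C}_{n \times n} = \mathbf{C}_{n \times n}\,\mathbf{I}_{n \times n} = \mathbf{C}_{n \times n}$$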

Norm: Normalisation

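$$\mathbf{b}_{n \times 1} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{bmatrix}, \quad \text{e.g. } \mathbf{b} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$$

Length or Norm

$$\|\mathbf{b}\| = \sqrt{b_1^2 + b_2^2 + \dots + b_n^2}, \quad \text{e.g. } \|\mathbf{b}\| = \sqrt{3^2 + 4^2} = 5$$

Normalised

$$\mathbf{b} / \|\mathbf{b}\|, \quad \text{e.g. } \mathbf{b} / \|\mathbf{b}\| = \mathbf{b} / 5$$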
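The 3, 4, 5 example can be checked with a short Python sketch (function names are illustrative, not from the talk):

```python
import math


def norm(b):
    """Length (Euclidean norm) of a vector: square root of the sum of squares."""
    return math.sqrt(sum(x * x for x in b))


def normalise(b):
    """Divide every element by the length, giving a vector of length 1."""
    length = norm(b)
    return [x / length for x in b]


b = [3.0, 4.0]
unit_b = normalise(b)
print(norm(b), unit_b, norm(unit_b))
```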

Transpose of a Matrix

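$$\mathbf{Y}_{5 \times 4} = \begin{bmatrix} y_{11} & y_{12} & y_{13} & y_{14} \\ y_{21} & y_{22} & y_{23} & y_{24} \\ y_{31} & y_{32} & y_{33} & y_{34} \\ y_{41} & y_{42} & y_{43} & y_{44} \\ y_{51} & y_{52} & y_{53} & y_{54} \end{bmatrix}$$

The transpose of $\mathbf{Y}_{5 \times 4}$ is

$$\mathbf{Y}'_{4 \times 5} = \begin{bmatrix} y_{11} & y_{21} & y_{31} & y_{41} & y_{51} \\ y_{12} & y_{22} & y_{32} & y_{42} & y_{52} \\ y_{13} & y_{23} & y_{33} & y_{43} & y_{53} \\ y_{14} & y_{24} & y_{34} & y_{44} & y_{54} \end{bmatrix}$$

$$(\mathbf{A}_{n \times p}\,\mathbf{B}_{p \times q})' = \mathbf{B}'_{q \times p}\,\mathbf{A}'_{p \times n}$$

$$(\mathbf{A}_{n \times p}\,\mathbf{B}_{p \times q}\,\mathbf{C}_{q \times m})' = \mathbf{C}'_{m \times q}\,\mathbf{B}'_{q \times p}\,\mathbf{A}'_{p \times n}$$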

Scalar Product

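$$\mathbf{b}'_{1 \times n}\,\mathbf{c}_{n \times 1} = \begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ \vdots \\ c_n \end{bmatrix} = b_1c_1 + b_2c_2 + b_3c_3 + \dots + b_nc_n$$

$$\mathbf{b}'\mathbf{c} = (\text{length of } \mathbf{b}) \times (\text{length of } \mathbf{c}) \times \cos\theta$$

If $\mathbf{b}$ and $\mathbf{c}$ are orthogonal then $\cos\theta = \cos 90^\circ = 0$, so $\mathbf{b}'_{1 \times n}\,\mathbf{c}_{n \times 1} = 0$.

$$\mathbf{b}'_{1 \times n}\,\mathbf{b}_{n \times 1} = \begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{bmatrix} = b_1b_1 + b_2b_2 + b_3b_3 + \dots + b_nb_n$$

$$\mathbf{b}'\mathbf{b} = (\text{length of } \mathbf{b}) \times (\text{length of } \mathbf{b}) \times \cos 0 = (\text{length of } \mathbf{b})^2 = 1 \text{ if } \mathbf{b} \text{ is normalised}$$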
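A small Python sketch of the scalar product and the angle it implies. The vectors are invented for illustration; `b` and `c` are chosen to be orthogonal.

```python
import math


def dot(b, c):
    """Scalar product b'c = b1*c1 + b2*c2 + ... + bn*cn."""
    return sum(x * y for x, y in zip(b, c))


def cos_angle(b, c):
    """cos(theta) = b'c / (length of b * length of c)."""
    return dot(b, c) / math.sqrt(dot(b, b) * dot(c, c))


b = [3.0, 4.0]
c = [-4.0, 3.0]   # at right angles to b
print(dot(b, c))                          # scalar product of orthogonal vectors
print(dot(b, b))                          # squared length of b
print(cos_angle([1.0, 0.0], [1.0, 1.0]))  # cosine of 45 degrees
```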

Determinant

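$$|\mathbf{B}| = \begin{vmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{vmatrix} = b_{11}b_{22} - b_{12}b_{21}$$

$$|\mathbf{B}| = \begin{vmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{vmatrix} = (-1)^{1+1} b_{11} \begin{vmatrix} b_{22} & b_{23} \\ b_{32} & b_{33} \end{vmatrix} + (-1)^{1+2} b_{12} \begin{vmatrix} b_{21} & b_{23} \\ b_{31} & b_{33} \end{vmatrix} + (-1)^{1+3} b_{13} \begin{vmatrix} b_{21} & b_{22} \\ b_{31} & b_{32} \end{vmatrix}$$

The determinant is a scalar.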

Rank of a Square Matrix

$$\begin{bmatrix} 1 & 1 & 1 \\ 3 & 0 & 2 \\ 4 & 1 & 3 \end{bmatrix}$$

(-2*col 1) = col 2 - (3*col 3), and row 1 = row 3 - row 2. Only two linearly independent rows, so rank = 2.

$$\begin{bmatrix} 2 & 1 & 4 \\ 2 & 1 & 4 \\ 2 & 1 & 4 \end{bmatrix}$$

(2*col 1) = (4*col 2) = col 3, and row 1 = row 2 = row 3. Rank = 1.

• The order of a square matrix is its number of rows/columns
• The rank of a square matrix is the number of linearly independent rows/columns
• A square matrix whose rank is less than its order has a determinant of zero
• If a square matrix has a non-zero determinant it has full rank = number of rows or columns
• A full rank square matrix is called non-singular
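The two example matrices can be checked numerically. The sketch below is plain Python with made-up helper names: `det3` uses the cofactor expansion along the first row, and `rank` counts pivots after Gaussian elimination using exact fractions.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along row 1."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def rank(m):
    """Rank = number of pivots found by Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in m]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


M1 = [[1, 1, 1], [3, 0, 2], [4, 1, 3]]
M2 = [[2, 1, 4], [2, 1, 4], [2, 1, 4]]
print(det3(M1), rank(M1))
print(det3(M2), rank(M2))
```

Both matrices have a zero determinant, which is consistent with their rank being less than their order of 3.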

Inverse of a Square Matrix

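If $\mathbf{B}$ is non-singular then $\mathbf{B}^{-1}\mathbf{B} = \mathbf{B}\mathbf{B}^{-1} = \mathbf{I}$

$\mathbf{B}^{-1}$ is called the inverse of $\mathbf{B}$

For a matrix that is not square, for example

$$\mathbf{B}_{3 \times 2} = \begin{bmatrix} 1 & 1 \\ -1 & 0 \\ 3 & -1 \end{bmatrix}$$

$$\mathbf{C}_{2 \times 3} = \begin{bmatrix} 1 & 3 & 1 \\ 2 & 5 & 1 \end{bmatrix}: \quad \mathbf{C}_{2 \times 3}\,\mathbf{B}_{3 \times 2} = \mathbf{I}_{2 \times 2}, \quad \mathbf{B}_{3 \times 2}\,\mathbf{C}_{2 \times 3} \neq \mathbf{I}_{3 \times 3}$$

$$\mathbf{C}_{2 \times 3} = \begin{bmatrix} 4 & 15 & 4 \\ 7 & 25 & 6 \end{bmatrix}: \quad \mathbf{C}_{2 \times 3}\,\mathbf{B}_{3 \times 2} = \mathbf{I}_{2 \times 2}, \quad \mathbf{B}_{3 \times 2}\,\mathbf{C}_{2 \times 3} \neq \mathbf{I}_{3 \times 3}$$

$\mathbf{C}$ is not unique

http://www.mathwords.com/i/inverse_of_a_matrix.htm

http://mathworld.wolfram.com/MatrixInverse.html

http://www.purplemath.com/modules/mtrxinvr.htm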

Association Matrices

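Association Matrices Q-Mode

$$\mathbf{Y}_{n \times p} = \begin{bmatrix} y_{11} & \dots & y_{1p} \\ \vdots & & \vdots \\ y_{n1} & \dots & y_{np} \end{bmatrix}$$

Row 1: $\begin{bmatrix} y_{11} & y_{12} & y_{13} & \dots & y_{1p} \end{bmatrix}$

Row 2: $\begin{bmatrix} y_{21} & y_{22} & y_{23} & \dots & y_{2p} \end{bmatrix}$

Euclidean Distance

$$a_{12} = a_{21} = \sqrt{(y_{11} - y_{21})^2 + (y_{12} - y_{22})^2 + \dots + (y_{1p} - y_{2p})^2}$$

Correlation

$$a_{12} = a_{21} = \frac{\sum_{j=1}^{p} (y_{1j} - \bar{y}_{1.})(y_{2j} - \bar{y}_{2.})}{\sqrt{\sum_{j=1}^{p} (y_{1j} - \bar{y}_{1.})^2 \sum_{j=1}^{p} (y_{2j} - \bar{y}_{2.})^2}}$$

$$\mathbf{A}_{n \times n} = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \dots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \dots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \dots & a_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \dots & a_{nn} \end{bmatrix}$$

Association Matrices R-Mode

$$\mathbf{Y}_{n \times p} = \begin{bmatrix} y_{11} & \dots & y_{1p} \\ \vdots & & \vdots \\ y_{n1} & \dots & y_{np} \end{bmatrix}$$

Column 1: $\begin{bmatrix} y_{11} & y_{21} & y_{31} & \dots & y_{n1} \end{bmatrix}'$

Column 2: $\begin{bmatrix} y_{12} & y_{22} & y_{32} & \dots & y_{n2} \end{bmatrix}'$

Compute $a_{12} = a_{21}$ between the two columns.

$$\mathbf{A}_{p \times p} = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \dots & a_{1p} \\ a_{21} & a_{22} & a_{23} & \dots & a_{2p} \\ a_{31} & a_{32} & a_{33} & \dots & a_{3p} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{p1} & a_{p2} & a_{p3} & \dots & a_{pp} \end{bmatrix}$$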
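The difference between the two modes is simply whether the association measure is applied to pairs of rows or to pairs of columns. A minimal Python sketch, assuming Euclidean distance and correlation as the two measures (the function names and the tiny data matrix are invented for illustration):

```python
import math


def euclidean(r1, r2):
    """Euclidean distance between two rows (or two columns)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))


def correlation(r1, r2):
    """Correlation between two rows (or two columns)."""
    m1, m2 = sum(r1) / len(r1), sum(r2) / len(r2)
    num = sum((a - m1) * (b - m2) for a, b in zip(r1, r2))
    den = math.sqrt(sum((a - m1) ** 2 for a in r1) * sum((b - m2) ** 2 for b in r2))
    return num / den


def association_matrix(vectors, measure):
    """Square, symmetric matrix of the measure for every pair of vectors."""
    return [[measure(u, v) for v in vectors] for u in vectors]


# Illustrative 3 x 3 data matrix: rows are objects, columns are descriptors
Y = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]

A_q = association_matrix(Y, euclidean)                          # Q mode: between rows, n x n
A_r = association_matrix([list(c) for c in zip(*Y)], correlation)  # R mode: between columns, p x p
print(A_q[0][1], A_r[0][2])
```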

Summary: Q and R Mode

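• Original data matrix: $\mathbf{Y}_{n \times p}$, with objects (e.g. sites) as rows and descriptors (e.g. species) as columns
• Q mode association matrix: $\mathbf{A}_{n \times n}$, between objects
• R mode association matrix: $\mathbf{A}_{p \times p}$, between descriptors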

Eigen Values and Eigen Vectors

Ecological data sets usually include a large number of variables that are associated with one another (e.g. linearly correlated). The basic idea underlying several methods of data analysis is to reduce this large number of inter-correlated variables to a smaller number of composite, but linearly independent, variables, each explaining a different fraction of the observed variation.

One of the main goals of numerical data analysis is to generate a small number of variables, each explaining a large proportion of the variation, and to ascertain that these new variables explain different aspects of the phenomena under study.

Eigen analysis is a key tool in helping us achieve this aim.

Eigen Values and Eigenvectors

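For any symmetric square matrix $\mathbf{A}_{n \times n}$ (such as an association matrix or a variance-covariance matrix) the following relationship always exists:

$$\mathbf{A}_{n \times n} = \mathbf{U}_{n \times n}\,\mathbf{\Lambda}_{n \times n}\,\mathbf{U}^{-1}_{n \times n} = \mathbf{U}_{n \times n}\,\mathbf{\Lambda}_{n \times n}\,\mathbf{U}'_{n \times n}$$

where the columns of $\mathbf{U}_{n \times n}$ are orthonormal.

$$\mathbf{A}_{n \times n} = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \dots \\ a_{21} & a_{22} & a_{23} & \dots \\ a_{31} & a_{32} & a_{33} & \dots \\ \vdots & \vdots & \vdots & \end{bmatrix}$$

$$\mathbf{\Lambda}_{n \times n} = \begin{bmatrix} \lambda_1 & 0 & 0 & \dots \\ 0 & \lambda_2 & 0 & \dots \\ 0 & 0 & \lambda_3 & \dots \\ \vdots & \vdots & \vdots & \end{bmatrix}$$

$$\mathbf{U}_{n \times n} = [u_{ij}] = \begin{bmatrix} u_{11} & u_{12} & u_{13} & \dots & u_{1n} \\ u_{21} & u_{22} & u_{23} & \dots & u_{2n} \\ u_{31} & u_{32} & u_{33} & \dots & u_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ u_{n1} & u_{n2} & u_{n3} & \dots & u_{nn} \end{bmatrix}$$

• $\mathbf{A}$: the square matrix being analysed
• $\mathbf{\Lambda}$: the eigenvalues (they arise as Lagrange multipliers); some may be zero and some may be equal
• $\mathbf{U}$: the columns are the eigenvectors, and the columns are orthonormal

Matrix $\mathbf{\Lambda}_{n \times n}$ is known as the canonical form of $\mathbf{A}_{n \times n}$.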

Eigen Values and Eigenvectors

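We compute the eigenvalues $\lambda_i$ of the square matrix $\mathbf{A}_{n \times n}$ by solving n equations:

$$\mathbf{A}\mathbf{u}_i = \lambda_i\mathbf{u}_i, \quad i = 1 \text{ to } n$$

$$\mathbf{A}\mathbf{u}_i - \lambda_i\mathbf{u}_i = \mathbf{0}$$

$$(\mathbf{A} - \lambda_i\mathbf{I})\,\mathbf{u}_i = \mathbf{0}$$

Here $(\mathbf{A} - \lambda_i\mathbf{I})$ is an n*n matrix, and $\mathbf{u}_i$ and $\mathbf{0}$ are n*1 vectors.

The $\lambda_i$ are found by solving the characteristic equation $|\mathbf{A} - \lambda_i\mathbf{I}| = 0$, a polynomial of degree n.

The $\mathbf{u}_i$ are then easily found.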
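For a symmetric 2 × 2 matrix the characteristic equation is a quadratic, so the whole calculation can be written out by hand. The sketch below is illustrative only: the function name and the test matrix are made up, and it handles only the symmetric 2 × 2 case.

```python
import math


def eigen_sym_2x2(a):
    """Eigenvalues and unit eigenvectors of a symmetric 2 x 2 matrix [[p, q], [q, r]].

    |A - lambda I| = lambda^2 - (p + r) lambda + (p r - q^2) = 0
    """
    (p, q), (_, r) = a
    if q == 0:
        # Already diagonal: the eigenvectors are the coordinate axes
        if p >= r:
            return [p, r], [[1.0, 0.0], [0.0, 1.0]]
        return [r, p], [[0.0, 1.0], [1.0, 0.0]]
    trace, det = p + r, p * r - q * q
    disc = math.sqrt(trace * trace - 4.0 * det)
    lams = [(trace + disc) / 2.0, (trace - disc) / 2.0]
    vecs = []
    for lam in lams:
        # (p - lam) u1 + q u2 = 0 is satisfied by u = (q, lam - p)
        v = [q, lam - p]
        length = math.sqrt(v[0] ** 2 + v[1] ** 2)
        vecs.append([v[0] / length, v[1] / length])
    return lams, vecs


lams, vecs = eigen_sym_2x2([[2.0, 1.0], [1.0, 2.0]])
print(lams)   # largest eigenvalue first
print(vecs)   # one unit eigenvector per eigenvalue
```

For larger matrices the polynomial is not solved directly; statistical software uses iterative numerical routines instead.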

Singular Value Decomposition

Singular Value Decomposition (SVD)

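Any matrix $\mathbf{Y}_{n \times p}$ can be factorised as follows:

$$\mathbf{Y}_{n \times p} = \mathbf{V}_{n \times p}\,\mathbf{W}_{p \times p}\,\mathbf{U}'_{p \times p}$$

or

$$\mathbf{Y}_{n \times p} = \mathbf{V}_{n \times n}\,\mathbf{W}_{n \times p}\,\mathbf{U}'_{p \times p}$$

• Columns of $\mathbf{V}_{n \times n}$ are the left singular vectors of $\mathbf{Y}_{n \times p}$
• $\mathbf{W}_{n \times p}$ is a diagonal matrix containing the (non-negative) singular values
• Columns of $\mathbf{U}_{p \times p}$ are the right singular vectors of $\mathbf{Y}_{n \times p}$

There is a lack of consistency in the literature (both forms above are in use).

SVD can be applied to any m × n matrix.

Eigenvalue decomposition can only be applied to certain classes of square matrices.

Nevertheless, the two decompositions are related.

Given an SVD of $\mathbf{Y}$, as described above, the following holds (substituting the SVD and the transpose of the SVD):

$$\mathbf{Y}'_{p \times n}\,\mathbf{Y}_{n \times p} = (\mathbf{U}_{p \times p}\,\mathbf{W}'_{p \times n}\,\mathbf{V}'_{n \times n})(\mathbf{V}_{n \times n}\,\mathbf{W}_{n \times p}\,\mathbf{U}'_{p \times p}) = \mathbf{U}_{p \times p}\,(\mathbf{W}'_{p \times n}\,\mathbf{W}_{n \times p})\,\mathbf{U}'_{p \times p} = \mathbf{U}_{p \times p}\,\mathbf{\Lambda}_{p \times p}\,\mathbf{U}'_{p \times p}$$

also

$$\mathbf{Y}_{n \times p}\,\mathbf{Y}'_{p \times n} = \mathbf{V}_{n \times n}\,(\mathbf{W}_{n \times p}\,\mathbf{W}'_{p \times n})\,\mathbf{V}'_{n \times n} = \mathbf{V}_{n \times n}\,\mathbf{\Lambda}_{n \times n}\,\mathbf{V}'_{n \times n}$$

So $\mathbf{U}$ and $\mathbf{V}$ hold eigenvectors, and the squared singular values are the eigenvalues.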

Principal Component Analysis

Karl Pearson, 1901 Harold Hotelling, 1933, 1936

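We have a matrix $\mathbf{Y}_{n \times p}$ with variance-covariance matrix $\mathbf{S}_{p \times p}$.

We transform $\mathbf{Y}_{n \times p}$ as follows:

$$\mathbf{Y}_{n \times p} \rightarrow \mathbf{Z}_{n \times p} = \mathbf{Y}_{n \times p}\,\mathbf{U}_{p \times p}$$

Now consider the first column of $\mathbf{Z}_{n \times p}$: $\mathbf{z}_{n \times 1} = \mathbf{Y}_{n \times p}\,\mathbf{u}_{p \times 1}$, or

$$\begin{bmatrix} z_{11} \\ z_{21} \\ \vdots \\ z_{n1} \end{bmatrix} = \begin{bmatrix} y_{11} & y_{12} & \dots & y_{1p} \\ y_{21} & y_{22} & \dots & y_{2p} \\ \vdots & \vdots & & \vdots \\ y_{n1} & y_{n2} & \dots & y_{np} \end{bmatrix} \begin{bmatrix} u_{11} \\ u_{21} \\ \vdots \\ u_{p1} \end{bmatrix} = \begin{bmatrix} y_{11}u_{11} + y_{12}u_{21} + \dots + y_{1p}u_{p1} \\ y_{21}u_{11} + y_{22}u_{21} + \dots + y_{2p}u_{p1} \\ \vdots \\ y_{n1}u_{11} + y_{n2}u_{21} + \dots + y_{np}u_{p1} \end{bmatrix}$$

$$\begin{aligned} \mathrm{Var}(\mathbf{z}_{n \times 1}) &= \frac{1}{n-1}\sum_{i=1}^{n} (z_{i1} - \bar{z}_{.1})^2 \\ &= \frac{1}{n-1}\sum_{i=1}^{n} \left( y_{i1}u_{11} + y_{i2}u_{21} + \dots + y_{ip}u_{p1} - (\bar{y}_{.1}u_{11} + \bar{y}_{.2}u_{21} + \dots + \bar{y}_{.p}u_{p1}) \right)^2 \\ &= \frac{1}{n-1}\sum_{i=1}^{n} \left( (y_{i1} - \bar{y}_{.1})u_{11} + (y_{i2} - \bar{y}_{.2})u_{21} + \dots + (y_{ip} - \bar{y}_{.p})u_{p1} \right)^2 \\ &= \frac{1}{n-1}\sum_{i=1}^{n} \left( (y_{i1} - \bar{y}_{.1})^2u_{11}^2 + (y_{i1} - \bar{y}_{.1})(y_{i2} - \bar{y}_{.2})u_{11}u_{21} + \dots \right) \end{aligned}$$

and so on, giving

$$\mathrm{Var}(\mathbf{z}_{n \times 1}) = \mathbf{u}'_{1 \times p}\,\mathbf{S}_{p \times p}\,\mathbf{u}_{p \times 1}$$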

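Now we need to find $\mathbf{u}_{p \times 1}$ that maximises $\mathrm{var}[\mathbf{z}]$ subject to $\mathbf{u}'_{1 \times p}\,\mathbf{u}_{p \times 1} = 1$.

Let $\lambda_1$ be a Lagrange multiplier.

Then maximise

$$\mathbf{u}'_{1 \times p}\,\mathbf{S}_{p \times p}\,\mathbf{u}_{p \times 1} - \lambda_1(\mathbf{u}'_{1 \times p}\,\mathbf{u}_{p \times 1} - 1)$$

Setting the derivative with respect to $\mathbf{u}$ to zero gives

$$\mathbf{S}_{p \times p}\,\mathbf{u}_{p \times 1} - \lambda_1\mathbf{u}_{p \times 1} = \mathbf{0} \quad (1)$$

$$(\mathbf{S}_{p \times p} - \lambda_1\mathbf{I}_{p \times p})\,\mathbf{u}_{p \times 1} = \mathbf{0} \quad (2)$$

We find $\lambda_1$ first by solving the polynomial $|\mathbf{S}_{p \times p} - \lambda_1\mathbf{I}_{p \times p}| = 0$ (degree p).

We then substitute $\lambda_1$ into (2) to find $\mathbf{u}_{p \times 1}$.

From (1) we can show that $\mathrm{var}[\mathbf{z}] = \mathbf{u}'_{1 \times p}\,\mathbf{S}_{p \times p}\,\mathbf{u}_{p \times 1} = \lambda_1$, subject to $\mathbf{u}'_{1 \times p}\,\mathbf{u}_{p \times 1} = 1$.

$\lambda_1$ is the first eigenvalue of $\mathbf{S}_{p \times p}$ and $\mathbf{u}_{p \times 1}$ is the first eigenvector.

To compute the 2nd, 3rd, and higher eigenvalues and eigenvectors is an identical process.

Notably $\mathbf{u}'_{1 \times p}\,\mathbf{v}_{p \times 1} = 0$ is equivalent to $\mathrm{Cov}(\mathbf{u}_{p \times 1}, \mathbf{v}_{p \times 1}) = 0$, where $\mathbf{u}_{p \times 1}$ and $\mathbf{v}_{p \times 1}$ are different eigenvectors.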

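[Figure: scatter plots of Burglary, Violence and Robbery against one another.]

Two variables, Burglary and Violence (columns, in that order):

$$\mathbf{Y}_{24 \times 2} - \bar{\mathbf{Y}}_{24 \times 2} = \begin{bmatrix} 3.0 & 5.7 \\ 7.2 & 11.6 \\ 4.5 & 7.6 \\ \vdots & \vdots \\ 6.0 & 9.8 \end{bmatrix} - \begin{bmatrix} 6.5 & 9.4 \\ 6.5 & 9.4 \\ 6.5 & 9.4 \\ \vdots & \vdots \\ 6.5 & 9.4 \end{bmatrix} = \begin{bmatrix} -3.49 & -3.70 \\ 0.71 & 2.20 \\ -1.99 & -1.80 \\ \vdots & \vdots \\ -0.49 & 0.40 \end{bmatrix}$$

Variance-covariance matrix (R-Mode):

$$\mathbf{S}_{2 \times 2} = \frac{1}{n-1}(\mathbf{Y}_{24 \times 2} - \bar{\mathbf{Y}}_{24 \times 2})'(\mathbf{Y}_{24 \times 2} - \bar{\mathbf{Y}}_{24 \times 2}) = \begin{bmatrix} 8.799 & 3.036 \\ 3.036 & 6.505 \end{bmatrix}$$

$$|\mathbf{S} - \lambda_k\mathbf{I}| = \left| \begin{bmatrix} 8.799 & 3.036 \\ 3.036 & 6.505 \end{bmatrix} - \begin{bmatrix} \lambda_k & 0 \\ 0 & \lambda_k \end{bmatrix} \right| = 0, \quad k = 1 \text{ or } 2$$

Eigen Values: $\lambda_1 = 10.898$, $\lambda_2 = 4.407$

Eigen Vectors:

$$\mathbf{U}_{2 \times 2} = \begin{bmatrix} -0.8226264 & 0.5685823 \\ -0.5685823 & -0.8226264 \end{bmatrix}$$

$$\mathbf{Z}_{24 \times 2} = (\mathbf{Y}_{24 \times 2} - \bar{\mathbf{Y}}_{24 \times 2})\,\mathbf{U}_{2 \times 2} = \begin{bmatrix} -3.49 & -3.70 \\ 0.71 & 2.20 \\ -1.99 & -1.80 \\ \vdots & \vdots \\ -0.49 & 0.40 \end{bmatrix} \begin{bmatrix} -0.8226264 & 0.5685823 \\ -0.5685823 & -0.8226264 \end{bmatrix}$$

$$z_1 = -0.8226264 \times \text{Burglary} - 0.5685823 \times \text{Violence}$$

Accounting for 10.898*100/(10.898+4.407) = 71.2 % of variance

$$z_2 = 0.5685823 \times \text{Burglary} - 0.8226264 \times \text{Violence}$$

Accounting for 28.8% of Variance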
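The arithmetic of this example can be reproduced with a short Python sketch. It takes the eigenvalues and eigenvectors quoted above as given, and assumes the first data row is (3.0, 5.7) with column means (6.5, 9.4) as shown; the variable names are made up for illustration.

```python
# Values quoted in the example
eigenvalues = [10.898, 4.407]
U = [[-0.8226264, 0.5685823],
     [-0.5685823, -0.8226264]]   # columns are the eigenvectors

# Percentage of total variance carried by each component
total = sum(eigenvalues)
explained = [100.0 * lam / total for lam in eigenvalues]

# Component scores for one observation: z = (y - mean) U
y = [3.0, 5.7]        # Burglary, Violence for the first row (assumed from the slide)
means = [6.5, 9.4]    # rounded column means (assumed from the slide)
centred = [yi - mi for yi, mi in zip(y, means)]
z = [sum(centred[j] * U[j][k] for j in range(2)) for k in range(2)]

print([round(e, 1) for e in explained])
print(z)
```

Because the first eigenvector has negative signs, an observation that is below average on both variables gets a positive score on the first component. The sign of an eigenvector is arbitrary, so a package may report the opposite orientation.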

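Plots of 1st and 2nd Components

[Figure: the data plotted on the original axes (Robbery, Violence) with the rotated component axes $Z_1$ and $Z_2$ overlaid.]

$$\sum_{i=1}^{n} \text{Robbery}_i^2 + \sum_{i=1}^{n} \text{Violence}_i^2 = \sum_{i=1}^{n} Z_{1i}^2 + \sum_{i=1}^{n} Z_{2i}^2$$

$$\frac{1}{n}\sum_{i=1}^{n} \text{Robbery}_i^2 + \frac{1}{n}\sum_{i=1}^{n} \text{Violence}_i^2 = \frac{1}{n}\sum_{i=1}^{n} Z_{1i}^2 + \frac{1}{n}\sum_{i=1}^{n} Z_{2i}^2$$

$$\mathrm{variance}(\text{Robbery}) + \mathrm{variance}(\text{Violence}) = \mathrm{variance}(Z_1) + \mathrm{variance}(Z_2)$$

The PCA rotation maximises the variance of $Z_1$ relative to $Z_2$.

It also maximises $\lambda_1$ relative to $\lambda_2$.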

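Now three variables

Columns are Burglary, Violence and Robbery:

$$\mathbf{Y}_{24 \times 3} - \bar{\mathbf{Y}}_{24 \times 3} = \begin{bmatrix} 3.0 & 5.7 & 0.4 \\ 7.2 & 11.6 & 4.1 \\ 4.5 & 7.6 & 1.5 \\ \vdots & \vdots & \vdots \\ 6.0 & 9.8 & 0.9 \end{bmatrix} - \begin{bmatrix} 6.5 & 9.4 & 1.7 \\ 6.5 & 9.4 & 1.7 \\ 6.5 & 9.4 & 1.7 \\ \vdots & \vdots & \vdots \\ 6.5 & 9.4 & 1.7 \end{bmatrix} = \begin{bmatrix} -3.49 & -3.70 & -1.3 \\ 0.71 & 2.20 & 2.4 \\ -1.99 & -1.80 & -0.2 \\ \vdots & \vdots & \vdots \\ -0.49 & 0.40 & -0.8 \end{bmatrix}$$

$$\mathbf{S}_{3 \times 3} = \frac{1}{n-1}(\mathbf{Y}_{24 \times 3} - \bar{\mathbf{Y}}_{24 \times 3})'(\mathbf{Y}_{24 \times 3} - \bar{\mathbf{Y}}_{24 \times 3}) = \begin{bmatrix} 8.799 & 3.036 & 3.121 \\ 3.036 & 6.505 & 2.005 \\ 3.121 & 2.005 & 1.689 \end{bmatrix}$$

$$|\mathbf{S} - \lambda_k\mathbf{I}| = \left| \begin{bmatrix} 8.799 & 3.036 & 3.121 \\ 3.036 & 6.505 & 2.005 \\ 3.121 & 2.005 & 1.689 \end{bmatrix} - \begin{bmatrix} \lambda_k & 0 & 0 \\ 0 & \lambda_k & 0 \\ 0 & 0 & \lambda_k \end{bmatrix} \right| = 0, \quad k = 1, 2, 3$$

Eigen Values: $\lambda_1 = 12.205$, $\lambda_2 = 4.411$, $\lambda_3 = 0.378$

Eigen Vectors (each column is determined only up to an overall change of sign):

$$\mathbf{U}_{3 \times 3} = \begin{bmatrix} 0.7788 & 0.5562 & 0.2900 \\ 0.5318 & -0.8307 & 0.1648 \\ 0.3325 & 0.0259 & -0.9427 \end{bmatrix}$$

$$\mathbf{Z}_{24 \times 3} = (\mathbf{Y}_{24 \times 3} - \bar{\mathbf{Y}}_{24 \times 3})\,\mathbf{U}_{3 \times 3}$$

$$z_1 = 0.7788 \times \text{Burglary} + 0.5318 \times \text{Violence} + 0.3325 \times \text{Robbery}$$

Accounting for 12.205*100/(12.205+4.411+0.378) = 71.8 % of variance

$$z_2 = 0.5562 \times \text{Burglary} - 0.8307 \times \text{Violence} + 0.0259 \times \text{Robbery}$$

Accounting for 26.0% of Variance

$$z_3 = 0.2900 \times \text{Burglary} + 0.1648 \times \text{Violence} - 0.9427 \times \text{Robbery}$$

Accounting for 2.2% of Variance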
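With three variables the characteristic polynomial is a cubic, so in practice the eigenvalues are found numerically. One simple numerical method is power iteration, sketched below in plain Python. This is only an illustration of the idea and is not necessarily the method used to produce the figures above; the function name is made up.

```python
import math


def power_iteration(a, iterations=200):
    """Largest eigenvalue and its unit eigenvector of a symmetric matrix.

    Repeatedly multiply a starting vector by the matrix and re-normalise;
    the vector converges to the eigenvector with the largest eigenvalue.
    """
    n = len(a)
    u = [1.0] * n
    for _ in range(iterations):
        v = [sum(a[i][j] * u[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(x * x for x in v))
        u = [x / length for x in v]
    # Rayleigh quotient u'Au (u has length 1)
    lam = sum(u[i] * sum(a[i][j] * u[j] for j in range(n)) for i in range(n))
    return lam, u


# Variance-covariance matrix of Burglary, Violence, Robbery from the example
S = [[8.799, 3.036, 3.121],
     [3.036, 6.505, 2.005],
     [3.121, 2.005, 1.689]]

lam1, u1 = power_iteration(S)
share = 100.0 * lam1 / (S[0][0] + S[1][1] + S[2][2])  # trace = total variance
print(lam1, u1, share)
```

The sum of the eigenvalues equals the sum of the variances on the diagonal of S, which is why the share of variance can be computed from the trace.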

Plots of 1st and 2nd Components

[Figure: scatter plot of $Z_1$ against $Z_2$. The regions of the plot are labelled High Crime, Low Crime, High Burglary / Lower Violence and High Violence / Lower Burglary. Labelled points include Bath, Canterbury, Lancaster, Leeds, Liverpool, Nottingham, Manchester, Bournemouth and Southampton.]

Bi-Plot

[Figure: the original 3 dimensions, and the projection of 3 dimensions into 2 dimensions (ordination in 2 dimensions).]

Both the direction and length of the vectors can be interpreted. So, for these data, where the vectors represent judges, and the points cars, a group of vectors pointing in the same direction correspond to a group of judges who have the same preference opinions about the automobiles

In a biplot, the length of the lines approximates the variances of the variables. The longer the line, the higher is the variance. The angle between the lines, or, to be more precise, the cosine of the angle between the lines, approximates the correlation between the variables they represent. The closer the angle is to 90, or 270 degrees, the smaller the correlation. An angle of 0 or 180 degrees reflects a correlation of 1 or −1, respectively.

Taken from: Ulrich Kohler and Magdalena Luniak (Wissenschaftszentrum Berlin), Data inspection using biplots, The Stata Journal (2005) 5, Number 2, pp. 208–223.

http://onlinelibrary.wiley.com/doi/10.1002/9780470238004.app2/pdf

Hardcover: 476 pages Publisher: Wiley-Blackwell (24 Dec 2010) Language: English ISBN-10: 0470012552 ISBN-13: 978-0470012550


US Agricultural Exports: 1990 to 2010

Country Value (million dollars)

1990 1995 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010

Canada 4,214 5,796 7,060 7,643 8,124 8,662 9,315 9,733 10,618 11,951 14,062 16,253 15,725 16,856

Mexico 2,560 3,522 5,625 6,410 7,407 7,238 7,891 8,510 9,429 10,881 12,692 15,508 12,932 14,575

Caribbean 1,015 1,281 1,493 1,408 1,400 1,518 1,590 1,850 1,913 2,114 2,575 3,592 3,082 3,192

Central America 483 871 1,100 1,121 1,234 1,252 1,338 1,429 1,589 1,832 2,363 3,106 2,553 2,923

Brazil 177 522 212 264 221 329 384 279 228 287 411 677 386 575

Colombia 120 464 441 417 453 521 513 594 680 868 1,223 1,675 907 832

Japan 8,142 11,149 8,893 9,292 8,884 8,384 8,906 8,147 7,931 8,390 10,159 13,223 11,072 11,819

Korea, South 2,650 3,742 2,449 2,546 2,588 2,673 2,886 2,489 2,233 2,851 3,528 5,561 3,917 5,308

Taiwan \3 1,663 2,591 1,945 1,996 2,010 1,966 2,025 2,065 2,301 2,477 3,097 3,419 2,988 3,190

China \3 \4 818 2,633 854 1,716 1,939 2,068 5,017 5,542 5,233 6,711 8,314 12,115 13,109 17,522

Hong Kong 702 1,503 1,209 1,264 1,228 1,091 1,114 913 872 977 1,168 1,715 2,008 2,808

India 108 194 145 210 353 274 317 257 295 365 475 489 691 755

Indonesia 275 816 531 668 907 810 996 925 958 1,102 1,542 2,195 1,796 2,246

Philippines 381 765 783 901 794 776 626 695 798 888 1,112 1,734 1,294 1,634

Thailand 275 590 409 493 570 611 684 686 675 703 885 1,063 1,046 1,152

Australia 226 339 320 317 292 339 412 410 463 520 662 826 840 928

European Union \5 7,474 8,789 6,858 6,515 6,676 6,398 6,736 6,953 7,052 7,408 8,754 10,080 7,445 8,894

USSR 2,248 1,233 839 670 1,060 695 740 1,112 1,227 1,025 1,665 2,304 1,736 1,454

Saudi Arabia 482 517 447 477 429 343 332 364 350 426 710 890 694 840

Turkey 226 516 502 658 571 675 921 944 1,062 1,030 1,496 1,696 1,499 2,112

Egypt 687 1,309 966 1,050 1,022 863 967 935 819 1,022 1,801 2,050 1,354 2,092

South Africa 81 267 173 134 99 148 149 169 146 126 291 393 162 292

Oceania 343 506 486 490 473 512 621 601 742 760 963 1,189 1,282 1,394

Eigenvalues: (1) 55418724 (2) 26490533 (3) 13560738 (4) 8083174 (5) 6362753 (6) 3097378.91 (7) 1717442 (8) 1214768 (9) 344449 (10) 137884 (11) 67213 (12) 49836 (13) 11222 (14) 6340

Principal Coordinate Analysis

Gower, 1966

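Start with a matrix $\mathbf{Y}_{n \times p}$ with n rows (objects) and p columns (descriptors).

Compute the Euclidean distance (between rows) matrix $\mathbf{D}_{n \times n} = [d_{hi}]$.

Transform $\mathbf{D}_{n \times n}$ into a new matrix $\mathbf{A}_{n \times n} = [a_{hi}]$, such that:

$$a_{hi} = -\frac{1}{2}d_{hi}^2$$

Then compute the centred matrix $\mathbf{\Delta}_{n \times n} = [\delta_{hi}]$:

$$\delta_{hi} = a_{hi} - \bar{a}_{h.} - \bar{a}_{.i} + \bar{a}_{..}$$

Compute the eigenvalues $\lambda_k$ and eigenvectors $\mathbf{u}_k$ of $\mathbf{\Delta}_{n \times n}$.

Finally, scale the eigenvectors so that

$$\mathbf{u}'_k\mathbf{u}_k = \lambda_k$$

If the eigenvectors are arranged as columns, the rows of the resulting table are the coordinates of the objects in the space of principal coordinates.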
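The transformation and centring steps can be sketched in plain Python. The function name `gower_centre` is made up, and the distance matrix is the table of road distances between six British cities used in the example that follows; the eigen analysis itself is left to statistical software.

```python
def gower_centre(d):
    """Turn a distance matrix [d_hi] into the centred matrix [delta_hi].

    a_hi     = -0.5 * d_hi ** 2
    delta_hi = a_hi - (row mean h) - (column mean i) + (grand mean)
    """
    n = len(d)
    a = [[-0.5 * d[h][i] ** 2 for i in range(n)] for h in range(n)]
    row_mean = [sum(a[h]) / n for h in range(n)]
    col_mean = [sum(a[h][i] for h in range(n)) / n for i in range(n)]
    grand_mean = sum(row_mean) / n
    return [[a[h][i] - row_mean[h] - col_mean[i] + grand_mean
             for i in range(n)]
            for h in range(n)]


# Oxford, Newcastle, Plymouth, Dover, Glasgow, Bristol
cities = [[0, 258, 187, 144, 363, 72],
          [258, 0, 408, 368, 145, 297],
          [187, 408, 0, 287, 484, 118],
          [144, 368, 287, 0, 498, 194],
          [363, 145, 484, 498, 0, 372],
          [72, 297, 118, 194, 372, 0]]

delta = gower_centre(cities)
print([round(sum(row), 6) for row in delta])   # every row of the centred matrix sums to zero
```

Because every row and column of the centred matrix sums to zero, one of its eigenvalues is always zero, with a constant eigenvector.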

Oxford Newcastle Plymouth Dover Glasgow Bristol

Oxford 0 258 187 144 363 72

Newcastle 258 0 408 368 145 297

Plymouth 187 408 0 287 484 118

Dover 144 368 287 0 498 194

Glasgow 363 145 484 498 0 372

Bristol 72 297 118 194 372 0

Distances Between British Cities

Q-Mode

$$\mathbf{D}_{6 \times 6} = \begin{bmatrix} 0 & 258 & 187 & 144 & 363 & 72 \\ 258 & 0 & 408 & 368 & 145 & 297 \\ 187 & 408 & 0 & 287 & 484 & 118 \\ 144 & 368 & 287 & 0 & 498 & 194 \\ 363 & 145 & 484 & 498 & 0 & 372 \\ 72 & 297 & 118 & 194 & 372 & 0 \end{bmatrix}$$

Distances computed from Google maps – road distances are not straight – so the ordination will have 3 or more dimensions but should be dominated by 2 dimensions.

Eigenvalues

1.931479e+05 = 193147.9

4.733022e+04 = 47330.22

3.820062e+03 = 3820.062

4.820322e-11

-4.952561e+02

-6.316731e+03

The last three (effectively zero or negative) can be ignored.

Eigen vectors

[,1] [,2] [,3] [,4] [,5] [,6]

[1,] 0.1473475 0.1128741 0.04538458 0.4082483 0.81339727 0.3677045

[2,] -0.4335791 0.2304177 0.65857972 0.4082483 -0.30873244 0.2514103

[3,] 0.3970476 -0.6344915 0.35761362 0.4082483 -0.03729046 -0.3792480

[4,] 0.3754683 0.6761005 -0.15553511 0.4082483 -0.16408006 -0.4291055

[5,] -0.6752523 -0.1782828 -0.42405981 0.4082483 0.13883744 -0.3827275

[6,] 0.1889680 -0.2066181 -0.48198299 0.4082483 -0.44213175 0.5719662

Ordination Plot: First Two Eigen Vectors

City Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

Angouleme 4.2 4.9 7.9 10.4 13.6 17.0 18.7 18.4 16.1 11.7 7.6 4.9

Angers 4.6 5.4 8.9 11.3 14.5 17.2 19.5 19.4 16.9 12.5 8.1 5.3

Besancon 1.1 2.2 6.4 9.7 13.6 16.9 18.7 18.3 15.5 10.4 5.7 2.0

Biarritz 7.6 8.0 10.8 12.0 14.7 17.8 19.7 19.9 18.5 14.8 10.9 8.2

Bordeaux 5.6 6.6 10.3 12.8 15.8 19.3 20.9 21.0 18.6 13.8 9.1 6.2

Brest 6.1 5.8 7.8 9.2 11.6 14.4 15.6 16.0 14.7 12.0 9.0 7.0

Cler-Ferr 2.6 3.7 7.5 10.3 13.8 17.3 19.4 19.1 16.2 11.2 6.6 3.6

Dijon 1.3 2.6 6.9 10.4 14.3 17.7 19.6 19.0 15.9 10.5 5.7 2.1

Grenoble 1.5 3.2 7.7 10.6 14.5 17.8 20.1 19.5 16.7 11.4 6.5 2.3

Lille 2.4 2.9 6.0 8.9 12.4 15.3 17.1 17.1 14.7 10.4 6.1 3.5

Limoges 3.1 3.9 7.4 9.9 13.3 16.8 18.4 17.8 15.3 10.7 6.7 3.8

Lyon 2.1 3.3 7.7 10.9 14.9 18.5 20.7 20.1 16.9 11.4 6.7 3.1

Marseille 5.5 6.6 10.0 13.0 16.8 20.8 23.3 22.8 19.9 15.0 10.2 6.9

Montpellier 5.6 6.7 9.9 12.8 16.2 20.1 22.7 22.3 19.3 14.6 10.0 6.5

Nancy 0.8 1.6 5.5 9.2 13.3 16.5 18.3 17.7 14.7 9.4 5.2 1.8

Nantes 5.0 5.3 8.4 10.8 13.9 17.2 18.8 18.6 16.4 12.2 8.2 5.5

Nice 7.5 8.5 10.8 13.3 16.7 20.1 22.7 22.5 20.3 16.0 11.5 8.2

Nimes 5.7 6.8 10.1 13.0 16.6 20.8 23.6 22.9 19.7 14.6 9.8 6.5

Orleans 2.7 3.6 6.9 9.8 13.4 16.6 18.4 18.2 15.6 10.9 6.6 3.6

Paris 3.4 4.1 7.6 10.7 14.3 17.5 19.1 18.7 16.0 11.4 7.1 4.3

Perpignan 7.5 8.4 11.3 13.9 17.1 21.1 23.8 23.3 20.5 15.9 11.5 8.6

Reims 1.9 2.8 6.2 9.4 13.3 16.4 18.3 17.9 15.1 10.3 6.1 3.0

Rennes 4.8 5.3 7.9 10.1 13.1 16.2 17.9 17.8 15.7 11.6 7.8 5.4

Rouen 3.4 3.9 6.8 9.5 12.9 15.7 17.6 17.2 15.0 11.0 6.8 4.3

St-Quent 2.0 2.9 6.3 9.2 12.7 15.6 17.4 17.4 15.0 10.5 6.1 3.1

Strasbourg 0.4 1.5 5.6 9.8 14.0 17.2 19.0 18.3 15.1 9.5 4.9 1.3

Toulon 8.6 9.1 11.2 13.4 16.6 20.2 22.6 22.4 20.5 16.5 12.6 9.7

Toulouse 4.7 5.6 9.2 11.6 14.9 18.7 20.9 20.9 18.3 13.3 8.6 5.5

Tours 3.5 4.4 7.7 10.6 13.9 17.4 19.1 18.7 16.2 11.7 7.2 4.3

Vichy 2.4 3.4 7.1 9.9 13.6 17.1 19.3 18.8 16.0 11.0 6.6 3.4

Monthly Average Temperatures: 30 French Cities

Eigen values (30 in total; the first 12 are): 1113.8 170.4 4.6 2.7 1.2 0.6 0.5 0.3 0.2 0.1 0.1 0

In practice only 2 dimensions in these data

[Figure: ordination plot of the 30 cities, with groups labelled Mediterranean Coast; Brittany Coast; Southwest; South and West of Paris; South East of Paris; North and East of Paris – Belgian Border.]

Correspondence Analysis

Many people stretching back to 1933

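Matrix of observed counts:

$$\begin{bmatrix} O_{11} & O_{12} & O_{13} \\ O_{21} & O_{22} & O_{23} \\ O_{31} & O_{32} & O_{33} \\ O_{41} & O_{42} & O_{43} \end{bmatrix}$$

Matrix of expectations:

$$\begin{bmatrix} E_{11} & E_{12} & E_{13} \\ E_{21} & E_{22} & E_{23} \\ E_{31} & E_{32} & E_{33} \\ E_{41} & E_{42} & E_{43} \end{bmatrix}$$

Under the hypothesis that row and column are independent:

$$E_{ij} = \frac{\sum_{j} O_{ij} \times \sum_{i} O_{ij}}{\sum_{i}\sum_{j} O_{ij}}$$

and

$$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

follows a chi-squared distribution with $(n-1)(p-1) = 3 \times 2$ degrees of freedom.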

Example 1

Observed counts $O_{ij}$:

10 50 90

20 60 100

30 70 110

40 80 120

Expected counts $E_{ij}$:

19.23 50.00 80.77

23.08 60.00 96.92

26.92 70.00 113.08

30.77 80.00 129.23

X-squared = 9.8576, df = 3*2 = 6, p-value = 0.1308

Non-significant result: rows and columns independent

Example 2

Observed counts $O_{ij}$:

10 50 120

20 60 110

30 70 100

40 80 90

Expected counts $E_{ij}$:

12.82 33.33 53.85

14.10 36.67 59.23

15.38 40.00 64.62

16.67 43.33 70.00

X-squared = 36.7282, df = 3*2 = 6, p-value = 1.989e-06

Significant result: rows and columns not independent
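The expected counts and the chi-squared statistic for the first 4 × 3 table above (observed counts 10, 50, 90 in the first row) can be reproduced with a few lines of Python. The function names are made up; the p-value is not computed here because that needs a chi-squared distribution function from a statistics package.

```python
def expected_counts(o):
    """E_ij = (row total i) * (column total j) / (grand total)."""
    row_totals = [sum(row) for row in o]
    col_totals = [sum(col) for col in zip(*o)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared(o):
    """Sum over all cells of (O - E)^2 / E."""
    e = expected_counts(o)
    return sum((o_ij - e_ij) ** 2 / e_ij
               for o_row, e_row in zip(o, e)
               for o_ij, e_ij in zip(o_row, e_row))


O = [[10, 50, 90],
     [20, 60, 100],
     [30, 70, 110],
     [40, 80, 120]]

E = expected_counts(O)
x2 = chi_squared(O)
df = (len(O) - 1) * (len(O[0]) - 1)
print([round(v, 2) for v in E[0]], round(x2, 4), df)
```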

Species / Marsh Swamp / Lotus Swamp / Open Water

Purple swamphen 798 78 25

Yellow-vented bulbul 690 101 129

Pink-necked green pigeon 614 150 90

Peaceful dove 462 101 84

Spotted dove 386 56 67

Pacific swallow 208 39 85

White-breasted waterhen 200 38 25

Baya weaver 173 7 52

Common myna 166 17 51

Purple heron 164 52 22

Yellow bittern 162 42 11

Jungle myna 154 15 117

White-throated kingfisher 128 51 42

Scaly-breasted munia 125 36 49

Relative abundance of bird species recorded at three habitats of Paya Indah Wetland Reserve, Peninsular Malaysia.

Chi-squared = 505.9142, df = 13*2=26, p-value < 2.2e-16

International Journal of Zoology Volume 2011, Article ID 758573, 17 pages doi:10.1155/2011/758573

Bird Species Abundance and Their Correlationship with Microclimate and Habitat Variables at Natural Wetland Reserve, Peninsular Malaysia

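Earlier we saw that

$$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

with $(n-1)(p-1)$ degrees of freedom. Now define

$$\mathbf{Q}_{r \times c} = [q_{ij}], \quad q_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{\sum_{i}\sum_{j} O_{ij}}\,\sqrt{E_{ij}}}$$

Apply SVD to $\mathbf{Q}_{r \times c}$:

$$\mathbf{Q}_{r \times c} = \mathbf{V}_{r \times r}\,\mathbf{W}_{r \times c}\,\mathbf{U}'_{c \times c}$$

where $\mathbf{V}$ is orthonormal, $\mathbf{W}$ is diagonal and $\mathbf{U}$ is orthonormal.

We know from an earlier discussion that

• $\mathbf{U}_{c \times c}$ are the eigenvectors of $\mathbf{Q}'_{c \times r}\,\mathbf{Q}_{r \times c}$
• $\mathbf{V}_{r \times r}$ are the eigenvectors of $\mathbf{Q}_{r \times c}\,\mathbf{Q}'_{c \times r}$
• Diagonal elements of $\mathbf{W}$ are square roots of eigenvalues, $w_{ii} = \sqrt{\lambda_i}$

We will also need:

$$\mathbf{P}_{r \times c} = [p_{ij}], \quad p_{ij} = \frac{O_{ij}}{\sum_{i}\sum_{j} O_{ij}}$$

Matrices $\mathbf{U}_{c \times c}$ and $\mathbf{V}_{r \times r}$ can be used to plot the positions of the row and column vectors in two separate scatter diagrams.

Eigen Values (1) 2.506575e-01 (2) 1.436226e-01 (3) 2.032566e-17

For joint plots a number of different scalings have been proposed. For example:

$$\mathbf{X}_{c \times c} = \mathbf{D}^{-1/2}_{c \times c}\,\mathbf{U}_{c \times c}$$

where $\mathbf{D}^{-1/2}_{c \times c}$ is a diagonal matrix in which the diagonals are the reciprocals of the square roots of the column totals of $\mathbf{P}_{r \times c}$.

And

$$\mathbf{F}_{r \times c} = \mathbf{D}^{-1}_{r \times r}\,\mathbf{P}_{r \times c}\,\mathbf{X}_{c \times c}$$

where $\mathbf{D}^{-1}_{r \times r}$ is a diagonal matrix in which the diagonals are the reciprocals of the row totals of $\mathbf{P}_{r \times c}$.

And

$$\mathbf{G}_{c \times c} = \mathbf{X}_{c \times c}\,\mathbf{W}_{c \times c}$$

Finally, plot column 1 of $\mathbf{F}_{r \times c}$ against column 2 of $\mathbf{F}_{r \times c}$ and, on the same graph, plot column 1 of $\mathbf{G}_{c \times c}$ against column 2 of $\mathbf{G}_{c \times c}$.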
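One useful check on a correspondence analysis is that the eigenvalues add up to the chi-squared statistic divided by the grand total (the "total inertia"). The sketch below assumes that each element of Q is (O − E) divided by the square root of (E × grand total), and uses the sum of squared elements of Q, which equals the sum of the eigenvalues of Q′Q. The function name and the 4 × 3 table are illustrative.

```python
import math


def ca_q_matrix(o):
    """q_ij = (O_ij - E_ij) / sqrt(E_ij * N), with N the grand total (assumed definition)."""
    row_totals = [sum(row) for row in o]
    col_totals = [sum(col) for col in zip(*o)]
    n = sum(row_totals)
    q = []
    for i, row in enumerate(o):
        q_row = []
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n
            q_row.append((o_ij - e_ij) / math.sqrt(e_ij * n))
        q.append(q_row)
    return q


O = [[10, 50, 90],
     [20, 60, 100],
     [30, 70, 110],
     [40, 80, 120]]

Q = ca_q_matrix(O)
grand_total = sum(sum(row) for row in O)

# Sum of squares of Q = trace of Q'Q = sum of the eigenvalues
total_inertia = sum(q_ij ** 2 for row in Q for q_ij in row)
print(total_inertia, total_inertia * grand_total)   # the second value is the chi-squared statistic
```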

Premier League Final Table 2011-2012

Pos Team H+ H- A+ A- "+" "-" GD

1 Manchester City 55 12 38 17 93 29 64

2 Manchester United 52 19 37 14 89 33 56

3 Arsenal 39 17 35 32 74 49 25

4 Tottenham Hotspur 39 17 27 24 66 41 25

5 Newcastle United 29 17 27 34 56 51 5

6 Chelsea 41 24 24 22 65 46 19

7 Everton 28 15 22 25 50 40 10

8 Liverpool 24 16 23 24 47 40 7

9 Fulham 36 26 12 25 48 51 -3

10 West Bromwich Albion 21 22 24 30 45 52 -7

11 Swansea City 27 18 17 33 44 51 -7

12 Norwich City 28 30 24 36 52 66 -14

13 Sunderland 26 17 19 29 45 46 -1

14 Stoke City 25 20 11 33 36 53 -17

15 Wigan Athletic 22 27 20 35 42 62 -20

16 Aston Villa 20 25 17 28 37 53 -16

17 Queens Park Rangers 24 25 19 41 43 66 -23

18 Bolton Wanderers 23 39 23 38 46 77 -31

19 Blackburn Rovers 26 33 22 45 48 78 -30

20 Wolverhampton Wndrs 19 43 21 39 40 82 -42

Data used

Principal inertias (eigenvalues):

Dimension: 1 2 3

Value: 0.053538 0.008691 0.00678

Percentage: 77.58% 12.59% 9.82%

Redundancy Analysis and Canonical Correspondence Analysis

In this context canonical means that we have two matrices or, alternatively, two sets of descriptors for one set of objects

Rao, 1964 Ter Braak, 1986

Typical Example of Ecology Data

[Figure: a sites × species matrix of counts alongside a sites × environmental variables matrix of real numbers (e.g. soil, climate, etc.).]

Indirect comparison: The matrix of explanatory variables, $\mathbf{X}_{n \times m}$, does not intervene in the calculation that produces the ordination of $\mathbf{Y}_{n \times p}$. Correlation or regression of the ordination vectors on $\mathbf{X}_{n \times m}$ is carried out afterwards.

Direct comparison: The matrix $\mathbf{X}_{n \times m}$ intervenes in the calculation. Regression on $\mathbf{X}_{n \times m}$ is carried out first and the ordination is carried out on a modified $\mathbf{Y}_{n \times p}$, forcing the ordination vectors to be maximally related to combinations of the columns of $\mathbf{X}_{n \times m}$.

In mathematics more generally, a canonical form is the simplest and most comprehensive form to which certain functions, relations, or expressions can be reduced without loss of generality. For example, the canonical form of a covariance matrix is its matrix of eigenvalues.

• Simple ordination of $\mathbf{Y}_{n \times p}$: Principal Components, Correspondence Analysis
• Ordination of $\mathbf{y}_{n \times 1}$ under constraint of $\mathbf{X}_{n \times m}$: Multiple regression
• Ordination of $\mathbf{Y}_{n \times p}$ under constraint of $\mathbf{X}_{n \times m}$: Redundancy Analysis, Canonical Correspondence Analysis

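$$\mathbf{S}_{Y+X} = \begin{bmatrix} s_{y_1,y_1} & \dots & s_{y_1,y_p} & s_{y_1,x_1} & \dots & s_{y_1,x_m} \\ \vdots & & \vdots & \vdots & & \vdots \\ s_{y_p,y_1} & \dots & s_{y_p,y_p} & s_{y_p,x_1} & \dots & s_{y_p,x_m} \\ s_{x_1,y_1} & \dots & s_{x_1,y_p} & s_{x_1,x_1} & \dots & s_{x_1,x_m} \\ \vdots & & \vdots & \vdots & & \vdots \\ s_{x_m,y_1} & \dots & s_{x_m,y_p} & s_{x_m,x_1} & \dots & s_{x_m,x_m} \end{bmatrix} = \begin{bmatrix} \mathbf{S}_{YY} & \mathbf{S}_{YX} \\ \mathbf{S}_{XY} & \mathbf{S}_{XX} \end{bmatrix} = \begin{bmatrix} \mathbf{S}_{YY} & \mathbf{S}_{YX} \\ \mathbf{S}'_{YX} & \mathbf{S}_{XX} \end{bmatrix}$$

• $\mathbf{S}_{YY}$: variance covariance matrix of $\mathbf{Y}_{n \times p}$
• $\mathbf{S}_{XX}$: variance covariance matrix of $\mathbf{X}_{n \times m}$
• $\mathbf{S}_{YX}$ and $\mathbf{S}_{XY}$: covariances amongst the descriptors of X and Y

In principal component analysis the eigen analysis equation was:

$$(\mathbf{S}_{YY} - \lambda_k\mathbf{I})\,\mathbf{u}_k = \mathbf{0}$$

In redundancy analysis it is:

$$(\mathbf{S}_{YX}\,\mathbf{S}^{-1}_{XX}\,\mathbf{S}'_{YX} - \lambda_k\mathbf{I})\,\mathbf{u}_k = \mathbf{0}$$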

An Aside on Multiple Linear Regression

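$$y = b_0 + b_1x_1 + b_2x_2 + \dots + e, \quad e \sim N(0, \sigma^2)$$

$$\begin{aligned} y_1 &= b_0 + b_1x_{11} + b_2x_{12} + \dots \\ y_2 &= b_0 + b_1x_{21} + b_2x_{22} + \dots \\ y_3 &= b_0 + b_1x_{31} + b_2x_{32} + \dots \\ &\ \ \vdots \\ y_n &= b_0 + b_1x_{n1} + b_2x_{n2} + \dots \end{aligned}$$

In matrix form, with the data in $\mathbf{y}$ and $\mathbf{X}$ and the coefficients to be estimated in $\mathbf{b}$:

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} & \dots \\ 1 & x_{21} & x_{22} & \dots \\ 1 & x_{31} & x_{32} & \dots \\ \vdots & \vdots & \vdots & \\ 1 & x_{n1} & x_{n2} & \dots \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ \vdots \end{bmatrix}$$

$$\begin{aligned} \mathbf{y} &= \mathbf{X}\hat{\mathbf{b}} \\ \mathbf{X}'\mathbf{y} &= (\mathbf{X}'\mathbf{X})\hat{\mathbf{b}} \\ (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} &= (\mathbf{X}'\mathbf{X})^{-1}(\mathbf{X}'\mathbf{X})\hat{\mathbf{b}} \\ (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} &= \mathbf{I}\hat{\mathbf{b}} \\ (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} &= \hat{\mathbf{b}} \quad \text{(least squares solution)} \end{aligned}$$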
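The least squares solution can be sketched in plain Python by forming X′X and X′y and solving the resulting equations. The helper names and the small data set are invented for illustration; the data were constructed so that y = 1 + 2·x1 − 3·x2 exactly. Solving the equations directly, rather than forming the inverse, is a choice made here for simplicity.

```python
def solve(a, rhs):
    """Solve a small square system a * b = rhs by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [r] for row, r in zip(a, rhs)]   # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(m[i][c]))   # partial pivoting
        if abs(m[p][c]) < 1e-12:
            raise ValueError("X'X is singular")
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for i in range(n):
            if i != c:
                factor = m[i][c]
                m[i] = [vi - factor * vc for vi, vc in zip(m[i], m[c])]
    return [row[n] for row in m]


def least_squares(x_rows, y):
    """b-hat from (X'X) b = X'y, with a column of ones added for b0."""
    X = [[1.0] + list(row) for row in x_rows]
    Xt = [list(col) for col in zip(*X)]
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in Xt] for ci in Xt]
    Xty = [sum(a * b for a, b in zip(ci, y)) for ci in Xt]
    return solve(XtX, Xty)


# Illustrative data: y = 1 + 2*x1 - 3*x2 with no error term
x_rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1.0, 3.0, -2.0, 0.0, 2.0]
b_hat = least_squares(x_rows, y)
print(b_hat)
```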

The Algebra of Redundancy Analysis

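Centre both the response matrix $\mathbf{Y}$ and the matrix of independent variables $\mathbf{X}$ by subtracting the column means from the column values / elements.

For each column $\mathbf{y}$ in $\mathbf{Y}$ compute $\hat{\mathbf{b}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$, giving

$$\hat{\mathbf{B}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$

For each column of $\mathbf{Y}$ compute the fitted values $\hat{\mathbf{y}} = \mathbf{X}\hat{\mathbf{b}}$, giving

$$\hat{\mathbf{Y}} = \mathbf{X}\hat{\mathbf{B}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$

$$\mathbf{S}_{XX} = [1/(n-1)]\,\mathbf{X}'\mathbf{X}$$

$$\begin{aligned} \mathbf{S}_{\hat{Y}\hat{Y}} &= [1/(n-1)]\,\hat{\mathbf{Y}}'\hat{\mathbf{Y}} \\ &= [1/(n-1)]\,\mathbf{Y}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} \\ &= [1/(n-1)]\,\mathbf{Y}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} \\ &= \mathbf{S}_{YX}\,\mathbf{S}^{-1}_{XX}\,\mathbf{S}'_{YX} \end{aligned}$$

So, perform redundancy analysis by solving

$$(\mathbf{S}_{\hat{Y}\hat{Y}} - \lambda_k\mathbf{I})\,\mathbf{u}_k = \mathbf{0}: \quad (\mathbf{S}_{YX}\,\mathbf{S}^{-1}_{XX}\,\mathbf{S}'_{YX} - \lambda_k\mathbf{I})\,\mathbf{u}_k = \mathbf{0}$$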

[Figure: Burglary, Violence and Robbery each plotted against Population density /ha.]

Eigen values (1) 6.612048e+00 (2) 2.664535e-15 (3) -1.110223e-16

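Regression model fitted, with a single explanatory variable $\mathbf{x}$:

$$\hat{\mathbf{Y}} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix} \begin{bmatrix} \hat{b}_1 & \hat{b}_2 & \hat{b}_3 \end{bmatrix} = \begin{bmatrix} x_1\hat{b}_1 & x_1\hat{b}_2 & x_1\hat{b}_3 \\ x_2\hat{b}_1 & x_2\hat{b}_2 & x_2\hat{b}_3 \\ x_3\hat{b}_1 & x_3\hat{b}_2 & x_3\hat{b}_3 \\ \vdots & \vdots & \vdots \\ x_n\hat{b}_1 & x_n\hat{b}_2 & x_n\hat{b}_3 \end{bmatrix}$$

Matrix used in PCA, Rank = 3:

$$\mathbf{Y} = \begin{bmatrix} y_{11} & y_{12} & y_{13} \\ y_{21} & y_{22} & y_{23} \\ y_{31} & y_{32} & y_{33} \\ \vdots & \vdots & \vdots \\ y_{n1} & y_{n2} & y_{n3} \end{bmatrix}$$

Matrix used in Redundancy Analysis, Rank = 1: the matrix of fitted values $\hat{\mathbf{Y}} = [x_i\hat{b}_j]$ above, in which every column is a multiple of $\mathbf{x}$.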

Eigen Values: (1) 6.706016e+00 (2) 7.439234e-02 (3) 6.071489e-18

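1st replicate of a 2 by 2 factorial

$$\mathbf{Y} = \begin{bmatrix} y_{11} & y_{12} & y_{13} \\ y_{21} & y_{22} & y_{23} \\ y_{31} & y_{32} & y_{33} \\ y_{41} & y_{42} & y_{43} \\ \vdots & \vdots & \vdots \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots \end{bmatrix}$$

Each column of $\mathbf{Y}$ is regressed on $\mathbf{X}$. Each fitted value is the sum of three estimated coefficients for that response: a constant, an effect for the level of the first factor and an effect for the level of the second factor.

The matrix of fitted values $\hat{\mathbf{Y}}$ is the one used in redundancy analysis.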

Recommended