Systematics Lecture: Phenetics

Post on 27-Jul-2015


PHENETICS (Numerical Taxonomy)

Phenetics - Character scoring

[Figure: raw table of character states, with OTUs 1-5 as rows and characters C1, C2, C3, C4, C5, C6, ... as columns]

• Multi-dimensional problem

• Numerical taxonomy/phenetics is essentially a multivariate method of statistical analysis

• Characters are reduced to distances for phenetic analysis

Steps in a phenetic analysis

1) Choose taxa

2) Discover and measure characters

3) List characters (raw table of scores: OTUs 1-5 as rows, characters C1, C2, C3, ... as columns; any set of numbers per column)

4) Calculate from characters the pairwise measures of overall resemblance between OTUs (similarity criterion; results in a distance matrix, OTU x OTU, here 5×5)

5) Dendrogram: cluster or group OTUs by overall resemblance (cluster criterion)

Caminalcules

Operational Taxonomic Units (OTUs): a name we use to avoid assigning organisms to any particular taxonomic rank (such as species).

Step 1

The first step is to make a subjective judgement about the overall similarity between all pair-wise combinations of the eight OTUs.

Measures of Overall Similarity

• Measured by means of a “similarity coefficient”

1. Qualitative characters

a. Coefficients of association

i. Simple matching

ii. Jaccard

2. Quantitative characters

a. Distance

i. Taxonomic distance

Coefficients of association

- Data are qualitative characters with 2 states, i.e. presence/absence

CHARACTERS   1  2  3  4
OTUi         +  +  -  -
OTUj         -  +  -  +

Simple matching coefficient

• Fraction of characters where OTUs have the identical state

• Formula: Ssm = m/(m+u), where m = number of matches and u = number of mismatches

CHARACTERS   1  2  3  4
OTUi         +  +  -  -
OTUj         -  +  -  +

• Ssm = m/(m+u)

• Number of matches = 2 (characters 2 and 3)

• Number of mismatches = 2 (characters 1 and 4)

• Ssm = 2/(2+2) = 1/2
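The count of matches and mismatches above can be sketched in a few lines of Python. This is only an illustration: the function name `simple_matching` and the use of "+" and "-" strings for the two states are assumptions made here, not part of the lecture.

```python
def simple_matching(otu_i, otu_j):
    """Fraction of characters on which two OTUs have the identical state."""
    if len(otu_i) != len(otu_j):
        raise ValueError("both OTUs must be scored for the same characters")
    # A match is any character with the same state, presence or absence.
    matches = sum(1 for a, b in zip(otu_i, otu_j) if a == b)
    mismatches = len(otu_i) - matches
    return matches / (matches + mismatches)


# Character states from the worked example: "+" = present, "-" = absent.
otu_i = ["+", "+", "-", "-"]
otu_j = ["-", "+", "-", "+"]

ssm = simple_matching(otu_i, otu_j)
print(ssm)  # 0.5
```

Note that shared absences (character 3 here) count as matches, which is the main difference from the Jaccard coefficient.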


Jaccard coefficient (Sneath)

• Sj = a/(a+u), where a = number of presence identities (both OTUs +) and u = b + c (mismatches)

- Ignores absence matches (d)

            OTUi
            +    -
OTUj   +    a    b
       -    c    d

For the example OTUs:

            OTUi
            +      -
OTUj   +    a = 1  b = 1
       -    c = 1  d = 1

CHARACTERS   1  2  3  4
OTUi         +  +  -  -
OTUj         -  +  -  +

• Sj = a/(a+u), where a = number of presence identities and u = b + c

• Sj = 1/(1+2) = 1/3
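As a rough sketch of the same calculation in Python: the function name `jaccard` and the "+"/"-" encoding are illustrative assumptions. Shared absences (cell d) are left out of both the numerator and the denominator.

```python
def jaccard(otu_i, otu_j):
    """Jaccard coefficient Sj = a / (a + u) for presence/absence characters."""
    if len(otu_i) != len(otu_j):
        raise ValueError("both OTUs must be scored for the same characters")
    pairs = list(zip(otu_i, otu_j))
    a = sum(1 for x, y in pairs if x == "+" and y == "+")  # presence identities
    u = sum(1 for x, y in pairs if x != y)                 # mismatches, b + c
    if a + u == 0:
        raise ValueError("undefined: no character is present in either OTU")
    return a / (a + u)


otu_i = ["+", "+", "-", "-"]
otu_j = ["-", "+", "-", "+"]

sj = jaccard(otu_i, otu_j)
print(round(sj, 4))  # 0.3333
```

For the same pair of OTUs the simple matching coefficient is 1/2, so dropping the shared absence lowers the similarity to 1/3.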

Distance coefficients

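1. Taxonomic distance = Euclidean distance in character space (quantitative characters)

Euclidean Distance Metric

The Euclidean distance between two points p = (p1, p2, ..., pn) and q = (q1, q2, ..., qn) in Euclidean n-space is defined as:

d(p, q) = sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2)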

Sample data for seven operational taxonomic units

Distance between species 1 and 2

Character   Species 1 (X)   Species 2 (Y)   X-Y     (X-Y)^2
5           2.47            2.35            0.12    0.0144
6           3.08            2.99            0.09    0.0081
7           1.93            1.88            0.05    0.0025
9           1.97            1.88            0.09    0.0081
10          1.93            1.81            0.12    0.0144
11          2.46            2.31            0.15    0.0225
12          1.08            1.36            -0.28   0.0784
15          2.3             2.23            0.07    0.0049
17          8.5             8.3             0.2     0.04
22          109.7           111.1           -1.4    1.96
23          96              94.6            1.4     1.96
25          90.9            89.9            1       1

Sum of (X-Y)^2 = 5.1133

Euclidean distance = sqrt(5.1133) ≈ 2.2613
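The table can be checked with a short Python sketch. The variable and function names are illustrative assumptions; the values are the twelve character measurements listed for species 1 and 2. Adding the (X-Y)^2 column gives 5.1133, which is the sum of squared differences; the Euclidean distance as defined earlier is the square root of that sum.

```python
import math

# Measurements for characters 5, 6, 7, 9, 10, 11, 12, 15, 17, 22, 23, 25.
species_1 = [2.47, 3.08, 1.93, 1.97, 1.93, 2.46, 1.08, 2.3, 8.5, 109.7, 96.0, 90.9]
species_2 = [2.35, 2.99, 1.88, 1.88, 1.81, 2.31, 1.36, 2.23, 8.3, 111.1, 94.6, 89.9]


def sum_of_squares(x, y):
    """Sum of squared character differences between two OTUs."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def euclidean_distance(x, y):
    """Square root of the sum of squared character differences."""
    return math.sqrt(sum_of_squares(x, y))


total = sum_of_squares(species_1, species_2)
distance = euclidean_distance(species_1, species_2)
print(round(total, 4))     # 5.1133
print(round(distance, 4))  # 2.2613
```

Characters 22, 23 and 25 contribute almost all of the total because they are measured on a much larger scale than the others.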

Step 2

The similarity rankings you have produced are then used to create a similarity matrix.

Step 3

Find the pair of OTUs that have the highest similarity ranking. (In this example, it happens to be OTUs 2 and 7, with a similarity ranking of 0.9, shown in boldface and with an asterisk*.)

Step 4

Combine OTUs 2 and 7, and treat them as a single composite unit from this point on. Construct a new similarity matrix (this time it will be 7 x 7), as shown in the table below. Recalculate the similarity values for each OTU with the new composite 2/7 OTU. To do so, simply compute the average similarity of each OTU with 2 and with 7.

How to calculate the new similarity values?

1 & 7 = 0.1, 1 & 2 = 0.1: 1 and (7,2) = (0.1 + 0.1)/2 = 0.2/2 = 0.1

3 & 7 = 0.2, 3 & 2 = 0.1: 3 and (7,2) = (0.2 + 0.1)/2 = 0.3/2 = 0.15

4 & 7 = 0.3, 4 & 2 = 0.3: 4 and (7,2) = (0.3 + 0.3)/2 = 0.6/2 = 0.3

5 & 7 = 0.2, 5 & 2 = 0.2: 5 and (7,2) = (0.2 + 0.2)/2 = 0.4/2 = 0.2

6 & 7 = 0.3, 6 & 2 = 0.2: 6 and (7,2) = (0.3 + 0.2)/2 = 0.5/2 = 0.25

8 & 7 = 0.4, 8 & 2 = 0.3: 8 and (7,2) = (0.4 + 0.3)/2 = 0.7/2 = 0.35
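The six averages above follow one rule, which can be sketched in Python. The function name `merge_similarity` and the use of `frozenset` keys for unordered OTU pairs are assumptions made for this illustration; the similarity values are the ones given for OTUs 7 and 2.

```python
def merge_similarity(sim, merged, others):
    """Average each remaining OTU's similarity with the two merged OTUs."""
    first, second = merged
    return {
        otu: (sim[frozenset((otu, first))] + sim[frozenset((otu, second))]) / 2
        for otu in others
    }


# Similarity of each remaining OTU with OTU 7 and with OTU 2.
sim = {
    frozenset((1, 7)): 0.1, frozenset((1, 2)): 0.1,
    frozenset((3, 7)): 0.2, frozenset((3, 2)): 0.1,
    frozenset((4, 7)): 0.3, frozenset((4, 2)): 0.3,
    frozenset((5, 7)): 0.2, frozenset((5, 2)): 0.2,
    frozenset((6, 7)): 0.3, frozenset((6, 2)): 0.2,
    frozenset((8, 7)): 0.4, frozenset((8, 2)): 0.3,
}

new_values = merge_similarity(sim, merged=(7, 2), others=[1, 3, 4, 5, 6, 8])
for otu, value in new_values.items():
    print(otu, round(value, 2))
```

The printed values are the entries for the composite 2/7 unit in the reduced 7 x 7 matrix.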

Step 5

In the new, reduced matrix with recomputed similarity values, find the next pair of OTUs with the highest similarity value. In this case, OTUs 1 and 6 and OTUs 3 and 5 are tied with a similarity value of 0.8. For simplicity, choose one pairing at random and recalculate the similarity indices, then do the next pairing.

Similarity matrix → cluster criterion → dendrogram (tree)

Your OTUs can now be clustered graphically in a branching diagram called a phenogram.

How to construct the dendrogram?
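One way to see how the branching order of a phenogram falls out of Steps 3 to 5 is the following Python sketch. It repeatedly merges the most similar pair and averages the similarities with the new composite unit, as described in Step 4. The four-OTU matrix is hypothetical, since the lecture's full 8 x 8 matrix is not reproduced in this transcript, and the function name `cluster` is an assumption. Ties are broken by taking the first pair found rather than at random.

```python
from itertools import combinations


def cluster(sim):
    """Return the merges (unit_a, unit_b, similarity level) in the order made.

    `sim` maps frozenset({a, b}) to the similarity between units a and b.
    """
    sim = dict(sim)  # work on a copy
    units = sorted({unit for pair in sim for unit in pair})
    merges = []
    while len(units) > 1:
        # Step 3 / Step 5: find the pair with the highest similarity.
        a, b = max(combinations(units, 2), key=lambda p: sim[frozenset(p)])
        merges.append((a, b, sim[frozenset((a, b))]))
        # Step 4: treat the pair as one composite unit and average.
        composite = (a, b)
        rest = [u for u in units if u != a and u != b]
        for u in rest:
            sim[frozenset((u, composite))] = (
                sim[frozenset((u, a))] + sim[frozenset((u, b))]
            ) / 2
        units = rest + [composite]
    return merges


# Hypothetical similarity matrix for four OTUs (not from the lecture).
example = {
    frozenset(("A", "B")): 0.9, frozenset(("A", "C")): 0.3,
    frozenset(("A", "D")): 0.2, frozenset(("B", "C")): 0.5,
    frozenset(("B", "D")): 0.2, frozenset(("C", "D")): 0.7,
}

merges = cluster(example)
for a, b, level in merges:
    print(a, b, round(level, 2))
```

Each merge corresponds to one node of the phenogram, drawn at the similarity level at which the two units were joined: here A with B at 0.9, C with D at 0.7, and the two pairs at 0.3.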

Simple matching coefficient

Formula: Ssm = m/(m+u), where m = number of matches and u = number of mismatches

Jaccard coefficient

Formula: Sj = a/(a+u), where a = number of matching presence identities and u = b + c (mismatches)