
Computational Molecular Biology - NUIG MATHEMATICS

Date post: 24-Feb-2022

Computational Molecular Biology

Lecture Thirteen: Neighbour-joining algorithm

Semester I, 2009-10

Graham Ellis, NUI Galway, Ireland


About the algorithm

Neighbour-joining is a method used for the construction of phylogenetic trees.

It is usually used for trees based on DNA or protein sequence data.


The algorithm’s input

The algorithm inputs an r × r symmetric matrix D with zeros on the diagonal.

No assumption is made about the triangle inequality or the four-point condition: D need not be a metric, let alone a tree metric.

The idea is that the matrix arises from experimental data from r taxa (e.g. DNA samples).
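A minimal Python sketch of checking the input conditions stated above, assuming D is given as a list of lists; the function name `check_distance_matrix` and the tolerance `tol` are hypothetical. It deliberately does not test the triangle inequality or the four-point condition, so a matrix that violates the triangle inequality is still accepted.

```python
def check_distance_matrix(D, tol=1e-9):
    """Return True if D meets the stated input conditions; raise ValueError otherwise.

    Only squareness, symmetry and a zero diagonal are checked. The triangle
    inequality and the four-point condition are deliberately not tested.
    """
    r = len(D)
    if any(len(row) != r for row in D):
        raise ValueError("D must be an r x r matrix")
    for i in range(r):
        if abs(D[i][i]) > tol:
            raise ValueError(f"D({i}, {i}) must be zero")
        for j in range(i + 1, r):
            if abs(D[i][j] - D[j][i]) > tol:
                raise ValueError(f"D is not symmetric at ({i}, {j})")
    return True

# Accepted although 5 > 1 + 1 violates the triangle inequality.
print(check_distance_matrix([[0, 1, 5], [1, 0, 1], [5, 1, 0]]))  # True
```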


The algorithm’s output

The algorithm outputs a phylogenetic tree with r leaves and with lengths assigned to edges.


The algorithm

Neighbour-joining is an iterative algorithm. Each iteration consists of the following steps:

1. Based on the current distance matrix, calculate the matrix Q (explained below).

2. Find the pair of distinct taxa (i, j), i ≠ j, with the lowest value Q(i, j). Create a node on the tree that joins these two taxa.

3. Calculate the distance of each of the taxa in the pair to this new node (using the formula below).

4. Calculate the distance of all taxa outside of this pair to the new node.

5. Start the algorithm again, considering the pair of joined neighbours as a single taxon and using the distances calculated in the previous step.
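The five steps can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function `neighbour_joining`, the internal node labels `U1`, `U2`, … and the stopping rule (when only two nodes remain, they are joined by a single edge whose length is their distance) are assumptions, since the lecture does not say how the iteration ends. Ties in Q are broken by keeping the first minimal pair in row order.

```python
def neighbour_joining(D, names):
    """Minimal neighbour-joining sketch (assumes at least two taxa).

    D     -- symmetric r x r list of lists with zeros on the diagonal
    names -- r labels for the rows of D
    Returns a list of edges (u, v, length). Internal nodes get the
    hypothetical labels "U1", "U2", ...
    """
    D = [[float(x) for x in row] for row in D]
    names = list(names)
    edges = []
    counter = 0
    while len(names) > 2:
        r = len(names)
        S = [sum(row) for row in D]  # row sums of the current matrix
        # Steps 1-2: minimise Q(i, j) = (r-2)D(i, j) - S_i - S_j over i < j.
        best = None
        for i in range(r):
            for j in range(i + 1, r):
                q = (r - 2) * D[i][j] - S[i] - S[j]
                if best is None or q < best[0]:
                    best = (q, i, j)  # strict <: first minimal pair wins ties
        _, a, b = best
        counter += 1
        new = f"U{counter}"
        # Step 3: distances from the joined taxa to the new node.
        d_a = 0.5 * D[a][b] + (S[a] - S[b]) / (2 * (r - 2))
        d_b = D[a][b] - d_a
        edges += [(names[a], new, d_a), (names[b], new, d_b)]
        # Step 4: distances from the new node to every other taxon.
        others = [k for k in range(r) if k not in (a, b)]
        new_row = [0.5 * (D[a][k] - d_a) + 0.5 * (D[b][k] - d_b) for k in others]
        # Step 5: treat the joined pair as one taxon and repeat.
        D = [[D[i][j] for j in others] + [new_row[x]] for x, i in enumerate(others)]
        D.append(new_row + [0.0])
        names = [names[k] for k in others] + [new]
    # Assumed stopping rule: join the last two nodes with one edge.
    edges.append((names[0], names[1], D[0][1]))
    return edges
```

On the 4 × 4 example matrix used later in the lecture, with names "ABCD", this sketch returns the edges A to U1 (6.0), B to U1 (1.0), C to U2 (2.0), D to U2 (5.0) and U1 to U2 (3.0).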


The Q matrix

Let D be our distance data relating r taxa. We calculate Q as follows:

$$Q(i, j) = (r - 2)\,D(i, j) - \sum_{k=1}^{r} D(i, k) - \sum_{k=1}^{r} D(j, k).$$
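A direct transcription of the Q formula into Python, assuming D is a list of lists; `q_matrix` is a hypothetical name. It fills in the diagonal as well, so its output can be compared entry by entry with the lecture's example table.

```python
def q_matrix(D):
    """Q(i, j) = (r - 2) D(i, j) - sum_k D(i, k) - sum_k D(j, k)."""
    r = len(D)
    S = [sum(row) for row in D]  # row sums of D
    return [[(r - 2) * D[i][j] - S[i] - S[j] for j in range(r)] for i in range(r)]

D = [[0, 7, 11, 14], [7, 0, 6, 9], [11, 6, 0, 7], [14, 9, 7, 0]]
for row in q_matrix(D):
    print(row)
```

Computing the row sums once, rather than inside the double loop, keeps each Q computation at O(r²).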


Example

Suppose we start with the following distance data.

| D | A | B | C | D |
|---|---|---|---|---|
| A | 0 | 7 | 11 | 14 |
| B | 7 | 0 | 6 | 9 |
| C | 11 | 6 | 0 | 7 |
| D | 14 | 9 | 7 | 0 |

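We get the following Q matrix. (The diagonal entries are shown for completeness; only pairs i ≠ j are candidates for joining.)

| Q | A | B | C | D |
|---|---|---|---|---|
| A | −64 | −40 | −34 | −34 |
| B | −40 | −44 | −34 | −34 |
| C | −34 | −34 | −48 | −40 |
| D | −34 | −34 | −40 | −60 |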
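As a check on a few entries: here r = 4, so r − 2 = 2, and the row sums of D are 32 for A, 22 for B, 24 for C and 30 for D. Substituting into the definition of Q gives

$$
\begin{aligned}
Q(A, B) &= 2 \cdot 7 - 32 - 22 = -40,\\
Q(A, C) &= 2 \cdot 11 - 32 - 24 = -34,\\
Q(C, D) &= 2 \cdot 7 - 24 - 30 = -40.
\end{aligned}
$$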


Example (cont.)

The neighbours (A, B) and the neighbours (C, D) both have the lowest Q value, −40. We may choose either pair and join them.

Let’s choose (A, B). Our graph starts to look like:

[Figure: leaves A, B, C, D, with a new node E joining A and B.]
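Because of such ties, a program has to decide which minimal pair to join. The sketch below (the name `closest_pairs` is hypothetical) returns every off-diagonal pair attaining the minimum, so the tie between (A, B) and (C, D) is visible; choosing the first one, as done here by hand, is an arbitrary convention.

```python
def closest_pairs(Q, names):
    """Return (minimum, pairs): every pair (names[i], names[j]) with i < j
    whose Q value equals the off-diagonal minimum. Diagonal entries are
    skipped, since a taxon is never joined to itself."""
    r = len(Q)
    off_diagonal = [(Q[i][j], names[i], names[j])
                    for i in range(r) for j in range(i + 1, r)]
    best = min(q for q, _, _ in off_diagonal)
    return best, [(u, v) for q, u, v in off_diagonal if q == best]

Q = [[-64, -40, -34, -34],
     [-40, -44, -34, -34],
     [-34, -34, -48, -40],
     [-34, -34, -40, -60]]
print(closest_pairs(Q, "ABCD"))  # (-40, [('A', 'B'), ('C', 'D')])
```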


Example (cont.)

We now calculate the distance from E to each of the paired taxa A, B using the formula

$$D(A, E) = \frac{1}{2} D(A, B) + \frac{1}{2(r - 2)} \left[ \sum_{k=1}^{r} D(A, k) - \sum_{k=1}^{r} D(B, k) \right].$$

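The row sums of D are 32 for A and 22 for B, so with r = 4 the formula gives D(A, E) = 7/2 + (32 − 22)/4 = 6. Since E lies on the path from A to B, D(A, E) + D(B, E) = D(A, B), from which we deduce D(B, E) = 7 − 6 = 1.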
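The branch-length step as a small Python function, assuming a list-of-lists D and integer indices a, b for the joined pair; `branch_lengths` is a hypothetical name. The second length uses D(a, E) + D(b, E) = D(a, b), and the formula needs r > 2.

```python
def branch_lengths(D, a, b):
    """Distances from taxa a and b to the new node E that joins them (needs r > 2)."""
    r = len(D)
    S_a, S_b = sum(D[a]), sum(D[b])  # row sums of D
    d_a = 0.5 * D[a][b] + (S_a - S_b) / (2 * (r - 2))
    d_b = D[a][b] - d_a  # E lies on the path from a to b
    return d_a, d_b

D = [[0, 7, 11, 14], [7, 0, 6, 9], [11, 6, 0, 7], [14, 9, 7, 0]]
print(branch_lengths(D, 0, 1))  # (6.0, 1.0)
```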


Example (cont.)

We now calculate the distance from E to each of the remaining taxa X (here C and D) using the formula

$$D(E, X) = \frac{1}{2}\left[D(A, X) - D(A, E)\right] + \frac{1}{2}\left[D(B, X) - D(B, E)\right].$$

Here E is the new node, X is the node whose distance from E we are computing, and A, B are the two nodes just joined.

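We get, for example, D(E, C) = (11 − 6)/2 + (6 − 1)/2 = 5 and D(E, D) = (14 − 6)/2 + (9 − 1)/2 = 8, giving the new distance matrix

| D | E | C | D |
|---|---|---|---|
| E | 0 | 5 | 8 |
| C | 5 | 0 | 7 |
| D | 8 | 7 | 0 |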
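A sketch of this update step in Python, under the same list-of-lists assumption; `join_and_reduce` and its parameters are hypothetical names. It puts the new node first so that the result lines up with the lecture's table.

```python
def join_and_reduce(D, names, a, b, d_a, d_b, new_name):
    """Replace taxa a and b by the new node, returning (names, matrix)."""
    others = [k for k in range(len(D)) if k not in (a, b)]
    # D(E, X) = 1/2 [D(A, X) - D(A, E)] + 1/2 [D(B, X) - D(B, E)]
    new_row = [0.5 * (D[a][k] - d_a) + 0.5 * (D[b][k] - d_b) for k in others]
    reduced = [[0.0] + new_row]  # new node first, as in the lecture's table
    for x, i in enumerate(others):
        reduced.append([new_row[x]] + [D[i][j] for j in others])
    return [new_name] + [names[k] for k in others], reduced

D = [[0, 7, 11, 14], [7, 0, 6, 9], [11, 6, 0, 7], [14, 9, 7, 0]]
print(join_and_reduce(D, "ABCD", 0, 1, 6.0, 1.0, "E"))
# (['E', 'C', 'D'], [[0.0, 5.0, 8.0], [5.0, 0, 7], [8.0, 7, 0]])
```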


Example (cont.)

Now we find the next Q matrix. Use it to adjoin a new node to our tree. Then calculate a new distance matrix D.
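The lecture stops here. As an illustration, the Python sketch below carries out the remaining iteration on the 3 × 3 matrix for E, C, D. With three taxa every off-diagonal Q entry equals minus the sum of the three distances (here −20), so any pair may be joined; applying the branch-length and distance formulas to any pair then attaches E, C and D to one new node, which the code labels F (a hypothetical name). The last part checks that the resulting tree reproduces the original 4 × 4 matrix exactly. That is expected here because this particular matrix happens to be additive; for non-additive data no tree reproduces D exactly, and neighbour-joining only approximates it.

```python
from itertools import combinations

# Distance matrix after the first join, copied from the lecture's example.
D3 = {("E", "C"): 5, ("E", "D"): 8, ("C", "D"): 7}
names = ["E", "C", "D"]

def d(x, y):
    """Symmetric lookup into D3."""
    return D3[(x, y)] if (x, y) in D3 else D3[(y, x)]

# Next Q matrix (r = 3): Q(i, j) = (r - 2) D(i, j) - S_i - S_j.
S = {x: sum(d(x, y) for y in names if y != x) for x in names}
Q = {(x, y): (3 - 2) * d(x, y) - S[x] - S[y] for x, y in combinations(names, 2)}

# With three taxa left, all three end up attached to one new node,
# labelled "F" here (hypothetical). Each branch length follows from
# the three pairwise distances.
star = {}
for x in names:
    y, z = [n for n in names if n != x]
    star[x] = (d(x, y) + d(x, z) - d(y, z)) / 2

# Whole tree: the first join (A, B to E) plus the final three edges.
edges = [("A", "E", 6), ("B", "E", 1)] + [(x, "F", star[x]) for x in names]

def tree_distance(edges, u, v):
    """Length of the unique path between u and v in a tree given as edges."""
    adj = {}
    for p, q, w in edges:
        adj.setdefault(p, []).append((q, w))
        adj.setdefault(q, []).append((p, w))
    stack = [(u, None, 0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == v:
            return dist
        for nxt, w in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, dist + w))
    return None

print(Q)     # every entry is -20, so any pair may be joined
print(star)  # {'E': 3.0, 'C': 2.0, 'D': 5.0}
print(tree_distance(edges, "A", "D"))  # 14.0, matching D(A, D) = 14
```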

