Phylogenetics “Inferring Phylogenies” Joseph Felsenstein Excellent reference.

Date post: 19-Dec-2015
Page 1: Phylogenetics “Inferring Phylogenies” Joseph Felsenstein Excellent reference.

Phylogenetics

“Inferring Phylogenies”

Joseph Felsenstein

Excellent reference

Page 2

What is a phylogeny?

Page 3

Different Representations

Cladogram - branching pattern only
Phylogram - branch lengths are estimated and drawn proportional to the amount of change along the branch
Rooted - implies directionality of change
Unrooted - does not

How do you root a tree?

Page 4

What is a phylogeny used for?

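\[ \pi = \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \pi_{ij} \]

\[ \theta = 4 N_e \mu \]

Here n is the number of sequences, \pi_{ij} is the difference between sequences i and j, N_e is the effective population size, and \mu is the mutation rate.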
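The average over all pairs in the formula for π can be sketched in a few lines of Python. This is a minimal illustration, not part of the slides: the function names are hypothetical, π_ij is taken to be the proportion of differing sites between two aligned sequences, and the input reuses the five sequences from the "Estimate a Phylogeny" slide purely as toy data.

```python
from itertools import combinations


def pairwise_difference(a, b):
    """Proportion of aligned sites at which sequences a and b differ (pi_ij)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(seqs):
    """pi = 2 / (n (n - 1)) * sum over i < j of pi_ij."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    total = sum(pairwise_difference(a, b) for a, b in combinations(seqs, 2))
    return 2.0 * total / (n * (n - 1))


# Toy input: the five sequences from the "Estimate a Phylogeny" slide.
SEQS = [
    "ACCGTCTTGTTA",
    "AGCGTCATCAAA",
    "AGCGTCATCAAA",
    "ACCGTCTTGATA",
    "AGCCTCTTCATA",
]

if __name__ == "__main__":
    print(round(nucleotide_diversity(SEQS), 4))
```

The ten pairs differ at 32 sites in total over 12 columns, so the value is 32/120.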

Page 5

Estimate a Phylogeny

Sp1 ACCGTCTTGTTA
Sp2 AGCGTCATCAAA
Sp3 AGCGTCATCAAA
Sp4 ACCGTCTTGATA
Sp5 AGCCTCTTCATA

Page 6

Estimate a Phylogeny

Sp1 ACCGTCTTGTTA
Sp2 AGCGTCATCAAA
Sp3 AGCGTCATCAAA
Sp4 ACCGTCTTGATA
Sp5 AGCCTCTTCATA

Page 7

Working Tree

[Figure: working tree; tips sp1, sp4, sp2, sp3, sp5; label c2]

Page 8

Estimate a Phylogeny

Sp1 ACCGTCTTGTTA
Sp2 AGCGTCATCAAA
Sp3 AGCGTCATCAAA
Sp4 ACCGTCTTGATA
Sp5 AGCCTCTTCATA

Page 9

Working Tree

[Figure: working tree; tips sp1, sp4, sp2, sp3, sp5; labels c2, c4]

Page 10

Estimate a Phylogeny

Sp1 ACCGTCTTGTTA
Sp2 AGCGTCATCAAA
Sp3 AGCGTCATCAAA
Sp4 ACCGTCTTGATA
Sp5 AGCCTCTTCATA

Page 11

Working Tree

[Figure: working tree; tips sp1, sp4, sp2, sp3, sp5; labels c2, c4, c7]

Page 12

Estimate a Phylogeny

Sp1 ACCGTCTTGTTA
Sp2 AGCGTCATCAAA
Sp3 AGCGTCATCAAA
Sp4 ACCGTCTTGATA
Sp5 AGCCTCTTCATA

Page 13

Working Tree

[Figure: working tree; tips sp1, sp4, sp2, sp3, sp5; labels c2, c4, c7, c9]

Page 14

Estimate a Phylogeny

Sp1 ACCGTCTTGTTA
Sp2 AGCGTCATCAAA
Sp3 AGCGTCATCAAA
Sp4 ACCGTCTTGATA
Sp5 AGCCTCTTCATA

Page 15

Working Tree

[Figure: working tree; tips sp1, sp4, sp2, sp3, sp5; labels c2, c4, c7, c9, c10]

Page 16

Estimate a Phylogeny

Sp1 ACCGTCTTGTTA
Sp2 AGCGTCATCAAA
Sp3 AGCGTCATCAAA
Sp4 ACCGTCTTGATA
Sp5 AGCCTCTTCATA

Page 17

Final Tree

[Figure: final tree; tips sp1, sp4, sp2, sp3, sp5; labels c2, c4, c7, c9, c10, c11]

Page 18

What optimality criteria do we use then?

Parsimony
Likelihood
Bayesian
Distance methods?

Page 19

Parsimony

Why should we choose a specific grouping?

Maximum parsimony: we should accept the hypothesis that explains the data most simply and efficiently.

“Parsimony is simply the most robust criterion for choosing between competing scientific hypotheses. It is not a statement about how evolution may or may not have taken place”1

1 Kitching, I. J.; Forey, P. L.; Humphries, C. J. & Williams, D. M. 1998. Cladistics: the theory and practice of parsimony analysis. The Systematics Association Publication No. 11.

Page 20

Parsimony

An optimality criterion that chooses the topology with the fewest character-state transformations

Optimizing one component: tree topology (pattern based)

Most parsimonious tree: the one (or several) with the minimum number of evolutionary changes (smallest tree length)

Page 21

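Reconstructing trees via sequence data

Site  1 2 3 4 5 6
O     T G T A A T
A     A A T G A G
B     A G C C - G
C     A A T G A T
D     A G C C - T

[Figure: tree with tips O, A, B, C, D and the following changes marked on its branches]

Site 1: T=>A
Site 2: G=>A
Site 3: T=>C
Site 4: A=>G and A=>C
Site 5: A=>GAP
Site 6: T=>G and T=>G

Tree length = 8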
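The tree length above can be checked with a small-parsimony count. The sketch below uses Fitch's algorithm, which the slide does not name, and it assumes a topology, since the figure did not survive: O as outgroup, with A grouped with C and B grouped with D, which is the grouping the listed changes suggest. The gap is treated as a fifth state, and all function and variable names are hypothetical.

```python
# Alignment from the slide; "-" (gap) is treated as a fifth character state.
ALIGNMENT = {
    "O": "TGTAAT",
    "A": "AATGAG",
    "B": "AGCC-G",
    "C": "AATGAT",
    "D": "AGCC-T",
}

# Assumed topology (not given in the text): O is the outgroup,
# A groups with C and B groups with D.
TREE = ("O", (("A", "C"), ("B", "D")))


def fitch(tree, states):
    """Return (candidate state set, number of changes) for one site.

    `tree` is a tip name or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change.
        return shared, left_changes + right_changes
    # The children disagree: take the union and count one change.
    return left_set | right_set, left_changes + right_changes + 1


def tree_length(tree, alignment):
    """Sum the per-site Fitch change counts over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for site in range(n_sites):
        states = {taxon: seq[site] for taxon, seq in alignment.items()}
        total += fitch(tree, states)[1]
    return total


if __name__ == "__main__":
    print(tree_length(TREE, ALIGNMENT))
```

Under the assumed topology the count is 8, matching the slide. Grouping A with B and C with D instead gives a longer tree on the same data, which is why parsimony prefers the first arrangement.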

Page 22

Neighbor-joining Method

Page 23

NJ distance matrices

Page 24

NJ distance matrices

Page 25

NJ distance matrices

Page 26

NJ distance matrices

Page 27

Finished NJ tree

Page 28

Models of Evolution

Pyrimidines: T, C
Purines: A, G

Transitions: changes within a class (A <-> G, C <-> T)
Transversions: changes between classes (purine <-> pyrimidine)
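The purine/pyrimidine distinction maps directly onto a small classifier. This is an illustrative sketch with a hypothetical function name: a substitution is a transition when both bases belong to the same class and a transversion otherwise.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(a, b):
    """Classify the change a -> b as 'none', 'transition' or 'transversion'."""
    a, b = a.upper(), b.upper()
    for base in (a, b):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a DNA base: {base!r}")
    if a == b:
        return "none"
    pair = {a, b}
    if pair <= PURINES or pair <= PYRIMIDINES:
        # Both bases are purines, or both are pyrimidines.
        return "transition"
    return "transversion"


if __name__ == "__main__":
    for a, b in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
        print(a, b, substitution_type(a, b))
```

Each base has one possible transition and two possible transversions, which is why models that separate the two kinds of change need at least one extra parameter.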

Page 29

Maximum Likelihood

Base frequencies: fA + fG + fC + fT = 1
Base exchange: fs + fv = 1
R-matrix: six rate terms summing to 1
Gamma shape parameter
Number of discrete gamma-distribution categories
Pinvar: fvar + finv = 1

Likelihood: \( L = \prod_i l_i \), where i indexes the characters

Page 30

Maximum Likelihood

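\[ L = \Pr(D \mid H) \]

\[ L^{(i)} = \sum_{\text{all } x, y, z, w} \Pr(\mathrm{AGGCG}\ \text{via}\ x, y, z, w) \]

\[ = \sum_{\text{all } x, y, z, w} \Pr(w)\, \Pr(z; w, t_8)\, \Pr(x; w, t_7)\, \Pr(y; z, t_6)\, \Pr(A; x, t_1)\, \Pr(G; z, t_2)\, \Pr(G; z, t_3)\, \Pr(C; y, t_4)\, \Pr(G; y, t_5) \]

[Figure: tree with root w, interior nodes x, y, z, tip states A, G, G, C, G, and branch lengths t1 to t8]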

Page 31

ML cont.

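\[ L = \prod_{i=1}^{n} L^{(i)} \]

For a site that starts as nucleotide i at time 0, the probability that the nucleotide at time t is i is given by

\[ P_{ii}(t) = \frac{1}{4} + \frac{3}{4} e^{-4 \lambda t / 3} \]

The probability that the nucleotide at time t is j, j ≠ i, is given by

\[ P_{ij}(t) = \frac{1}{4} - \frac{1}{4} e^{-4 \lambda t / 3} \]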
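The two transition probabilities are easy to evaluate numerically. The sketch below (hypothetical function name, with `rate` standing for λ) returns both values and makes it simple to confirm that the probability of staying at i plus the three probabilities of moving to another base sum to 1.

```python
import math


def transition_probabilities(rate, t):
    """Return (P_ii(t), P_ij(t)) for substitution rate `rate` and time `t`."""
    decay = math.exp(-4.0 * rate * t / 3.0)
    p_same = 0.25 + 0.75 * decay  # nucleotide is still i at time t
    p_diff = 0.25 - 0.25 * decay  # nucleotide is one specific j != i at time t
    return p_same, p_diff


if __name__ == "__main__":
    for t in (0.0, 0.1, 1.0, 10.0):
        p_same, p_diff = transition_probabilities(1.0, t)
        print(t, round(p_same, 4), round(p_diff, 4), round(p_same + 3 * p_diff, 4))
```

At t = 0 the nucleotide is unchanged with probability 1, and as t grows both probabilities approach 1/4, so the starting state is eventually forgotten.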

Page 32

Bayes Theorem

\[ \mathrm{Prob}(H \mid D) = \frac{\mathrm{Prob}(H)\, \mathrm{Prob}(D \mid H)}{\mathrm{Prob}(D)} \]

H = Hypothesis, D = Data

Prob(H): prior probability, or marginal probability of H
Prob(H│D): the conditional probability of H given D (posterior probability)
Prob(D│H): likelihood function
Prob(D): prior probability, or marginal probability of D, equal to ∑_H P(H) P(D│H)

Normalizing constant: ensures ∑ P(H│D) = 1
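For a finite set of hypotheses the theorem is a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical and the priors and likelihoods are made-up numbers for three imaginary trees, not results from any analysis.

```python
def posterior(priors, likelihoods):
    """Return Prob(H | D) for every hypothesis H in `priors`.

    priors:      dict mapping hypothesis -> Prob(H)
    likelihoods: dict mapping hypothesis -> Prob(D | H)
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())  # Prob(D), the normalizing constant
    if evidence == 0:
        raise ValueError("the data have zero probability under every hypothesis")
    return {h: value / evidence for h, value in joint.items()}


if __name__ == "__main__":
    # Made-up numbers: equal priors on three hypothetical trees.
    priors = {"tree1": 1 / 3, "tree2": 1 / 3, "tree3": 1 / 3}
    likelihoods = {"tree1": 0.2, "tree2": 0.1, "tree3": 0.1}
    print(posterior(priors, likelihoods))
```

Dividing by the sum over hypotheses is what guarantees that the posterior probabilities add up to 1.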

Page 33

Take Home Message

Likelihood: represents the probability of the data given the hypothesis => difficult to interpret

Bayes approach: estimates the probability of the hypothesis given the data => estimates the probability for the hypothesis of interest

Page 34

Bayesian Inference of Phylogeny

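Calculating the posterior probability of a tree involves a summation over all possible trees and, for each tree, integration over all combinations of branch lengths and substitution-model parameter values.

\[ f(\tau_i \mid X) = \frac{f(\tau_i)\, f(X \mid \tau_i)}{\sum_{j=1}^{B(s)} f(\tau_j)\, f(X \mid \tau_j)} \]

\[ f(\tau_i, \upsilon_i, \theta \mid X) = \frac{f(\tau_i, \upsilon_i, \theta)\, f(X \mid \tau_i, \upsilon_i, \theta)}{\sum_{j=1}^{B(s)} \int_{\upsilon, \theta} f(\tau_j, \upsilon_j, \theta)\, f(X \mid \tau_j, \upsilon_j, \theta)\, d\upsilon\, d\theta} \]

\[ f(\tau_i \mid X) = \frac{\int_{\upsilon, \theta} f(\tau_i, \upsilon_i, \theta)\, f(X \mid \tau_i, \upsilon_i, \theta)\, d\upsilon\, d\theta}{\sum_{j=1}^{B(s)} \int_{\upsilon, \theta} f(\tau_j, \upsilon_j, \theta)\, f(X \mid \tau_j, \upsilon_j, \theta)\, d\upsilon\, d\theta} \]

Notation (assumed, since the Greek letters did not survive in the slide text): X is the data, \tau_i the i-th tree, \upsilon_i its branch lengths, \theta the substitution-model parameters, and B(s) the number of possible trees for s species.

Inferences of any single parameter are based on the marginal distribution of the parameter

This marginal probability distribution of the topology, for example, integrates out all the other parameters

Advantage: the power of the analysis is focused on the parameter of interest (i.e., the topology of the tree)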

Page 35

Estimating phylogenies

Exhaustive Searches
Branch and bound methods
Rise in computational time versus rise in solution space

Page 36

How many topologies are there?

\[ T = \frac{(2n-3)!}{2^{n-2}\,(n-2)!} \]

(number of rooted bifurcating trees for n taxa)

Page 37

The Phylogenetic Problem

Number of Seqs    Number of Trees
10                2 x 10^6
100               2 x 10^182
1,000             2 x 10^2,860
10,000            8 x 10^38,658
100,000           1 x 10^486,663
1,000,000         1 x 10^5,866,723

\[ B(T) = \prod_{i=3}^{T} (2i - 5) \]
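The product B(T) can be evaluated exactly with Python's arbitrary-precision integers. In this sketch the function names are hypothetical. `unrooted_trees` implements the product of (2i − 5) directly, and `rooted_trees` relies on the standard fact that rooting adds one branch, so the rooted count for n tips equals the unrooted count for n + 1 tips.

```python
def unrooted_trees(n_tips):
    """B(T): product of (2i - 5) for i = 3 .. n_tips."""
    if n_tips < 3:
        raise ValueError("need at least three tips")
    count = 1
    for i in range(3, n_tips + 1):
        count *= 2 * i - 5
    return count


def rooted_trees(n_tips):
    """Rooted bifurcating trees on n tips = unrooted trees on n + 1 tips."""
    if n_tips < 2:
        raise ValueError("need at least two tips")
    return unrooted_trees(n_tips + 1)


if __name__ == "__main__":
    for n in (4, 5, 10):
        print(n, unrooted_trees(n), rooted_trees(n))
```

For 10 sequences the unrooted count is 2,027,025, which is the 2 x 10^6 entry in the table. The growth is faster than exponential, which is why exhaustive search stops being practical after a handful of taxa.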

Page 38

HIV-1 Whole Genomes

1993 - 15
2003 (JAN) - 397

Page 39

Tree Space - the final frontier

Page 40

Heuristic Searches

Nearest-neighbor interchanges (NNI) - swap two adjacent branches on the tree

Subtree pruning and regrafting (SPR) - remove a branch from the tree (either an interior or an exterior branch) with a subtree attached to it. The subtree is then reinserted into the remaining tree in all possible places

Tree bisection and reconnection (TBR) - an interior branch is broken, and the two resulting fragments of the tree are considered as separate trees. All possible connections are made between a branch of one and a branch of the other.
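As a rough illustration of the smallest of these moves, consider NNI around a single interior branch. Such a branch separates four subtrees, two on each side, and swapping one subtree across the branch gives exactly two alternative arrangements. The sketch below is an assumption-laden toy: the function name is hypothetical, subtrees are plain labels or nested tuples, and only one interior branch is handled rather than a whole tree.

```python
def nni_neighbors(a, b, c, d):
    """Given the arrangement ((a, b), (c, d)) around one interior branch,
    return the two arrangements reachable by a nearest-neighbor interchange."""
    return [
        ((a, c), (b, d)),  # swap b and c across the branch
        ((a, d), (b, c)),  # swap b and d across the branch
    ]


if __name__ == "__main__":
    for neighbor in nni_neighbors("sp1", "sp4", "sp2", "sp3"):
        print(neighbor)
```

A heuristic search would apply this at every interior branch, score each neighbor under the chosen optimality criterion, and move to the best one until no neighbor improves the score. SPR and TBR explore progressively larger neighborhoods at higher cost.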

Page 41

Other approaches

Tree-fusing - find two near-optimal trees and exchange subgroups between the two trees

Genetic Algorithms - a simulation of evolution with a genotype that describes the tree and a fitness function that reflects the optimality of the tree

Disc Covering - upcoming paper

Page 42

Phylogenetic Accuracy?

Consistency - A phylogenetic method is consistent for a given evolutionary model if the method converges on the correct tree as the data available to the method become infinite.

Efficiency - Statistical efficiency is a measure of how quickly a method converges on the correct solution as more data are applied to the problem.

Robustness - Robustness refers to the degree to which violations of assumptions will affect performance of phylogenetic methods.

Page 43
Page 44

How reliable is MY phylogeny?

Bootstrap Analysis
Jackknife Analysis
Posterior Probabilities (Bayesian Approaches)
Decay Indices

Page 45

Bootstrap

Page 46

Pseudoreplicates
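A bootstrap pseudoreplicate is built by sampling alignment columns with replacement until the replicate has as many columns as the original alignment. The sketch below shows that resampling step only. The function name is hypothetical, the alignment is a plain dict of equal-length strings, and tree estimation on each replicate is left out.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample the columns of `alignment` with replacement.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths)
    rng:       a random.Random instance, passed in so runs can be reproduced
    """
    n_sites = len(next(iter(alignment.values())))
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("sequences must be aligned to the same length")
    # The same column indices are used for every taxon, so columns stay intact.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[c] for c in columns)
        for name, seq in alignment.items()
    }


if __name__ == "__main__":
    alignment = {
        "Sp1": "ACCGTCTTGTTA",
        "Sp2": "AGCGTCATCAAA",
        "Sp4": "ACCGTCTTGATA",
    }
    rng = random.Random(42)
    for _ in range(3):
        print(bootstrap_replicate(alignment, rng))
```

In a full analysis a tree is estimated from each pseudoreplicate, and the support for a group is the fraction of replicate trees that contain it.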

