Date post: 21-Aug-2020
Page 1: Thanks to Paul Lewis and Joe Felsenstein for the use of slidesphylo.bio.ku.edu/BIOL428/MLAndTreeSearch.pdf · Long branch attraction Felsenstein, J. 1978. Cases in which parsimony

Thanks to Paul Lewis and Joe Felsenstein for the use of slides


Review

• Hennigian logic reconstructs the tree if we know the polarity of characters and there is no homoplasy.
• UPGMA infers a tree from a distance matrix:
  – groups based on similarity;
  – fails to give the correct tree if rates of character evolution vary greatly among lineages.
• Modern distance-based approaches:
  – find trees and branch lengths such that patristic distances ≈ distances from character data;
  – do not use all of the information in the data.
• Parsimony:
  – prefer the tree that requires the fewest character state changes, i.e. minimize the number of times you invoke homoplasy to explain the data;
  – can work well even if homoplasy is not rare;
  – fails if homoplasy is very common or is concentrated on certain parts of the tree.


Long branch attraction

Felsenstein, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology 27: 401-410.

The probability of a parsimony-informative site due to inheritance (one that supports the true tree) is very low (roughly 0.0003).

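[Figure: the true tree ((taxon1, taxon2), (taxon3, taxon4)). The branches to taxon1 and taxon3 have length 1.0; the branches to taxon2 and taxon4 and the internal branch have length 0.01. Character shown: taxon1 = A, taxon2 = A, taxon3 = G, taxon4 = G.]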


The probability of a misleading parsimony-informative site due to parallelism is much higher (roughly 0.008).

[Figure: the same tree with a character that groups the two long branches: taxon1 = A, taxon3 = A, taxon2 = G, taxon4 = G.]
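Both probabilities can be reproduced by summing over the unknown states at the two internal nodes. This Python sketch assumes the Jukes-Cantor model (described later in these slides) with equal base frequencies, and the branch lengths in the figure: 1.0 for the branches to taxon1 and taxon3, 0.01 for the other three. Under those assumptions it gives about 0.0003 for one specific pattern that supports the true tree (AAGG) and about 0.008 for one specific misleading pattern (AGAG).

```python
import math
from itertools import product

BASES = "ACGT"

def jc_prob(i, j, nu):
    """Jukes-Cantor probability of base i becoming base j along a branch of length nu."""
    e = math.exp(-4.0 * nu / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def pattern_prob(tips, long_len=1.0, short_len=0.01):
    """Probability of the tip states (taxon1, taxon2, taxon3, taxon4) on the true tree
    ((taxon1, taxon2), (taxon3, taxon4)), with taxon1 and taxon3 on the long branches."""
    x1, x2, x3, x4 = tips
    total = 0.0
    # Sum over the unobserved states u and v at the two internal nodes.
    for u, v in product(BASES, repeat=2):
        total += (0.25 * jc_prob(u, v, short_len)
                  * jc_prob(u, x1, long_len) * jc_prob(u, x2, short_len)
                  * jc_prob(v, x3, long_len) * jc_prob(v, x4, short_len))
    return total

p_true = pattern_prob("AAGG")        # taxon1 = taxon2, taxon3 = taxon4: supports the true tree
p_misleading = pattern_prob("AGAG")  # taxon1 = taxon3, taxon2 = taxon4: groups the long branches
print(f"supporting: {p_true:.4f}  misleading: {p_misleading:.4f}")
```

Summing `pattern_prob` over all 256 patterns gives 1, which is a quick check that the internal-node sum is complete.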


Long branch attraction data

Under such a tree, misleading characters are more common than characters that favor the true tree.

        Rare       Common
taxon1  A A C C    A A C C
taxon2  A A C C    G C T G
taxon3  G C T G    A A C C
taxon4  G C T G    G C T G


Long branch attraction

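Parsimony is almost guaranteed to get this tree wrong.

[Figure: the true tree, in which 1 and 2 are sisters and 3 and 4 are sisters, and the tree inferred by parsimony, which pairs 1 with 3 and 2 with 4.]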


Likelihood

X is the data.

T is the tree.

ν is a vector of branch lengths.

Pr(X|T, ν) is the likelihood; this is sometimes denoted L(T, ν).

Maximum likelihood: find the T and ν that give the highest likelihood.


Copyright © 2007 Paul O. Lewis 2

Combining probabilities

• Multiply probabilities if the component events must happen simultaneously (i.e. wherever you would naturally use the word AND when describing the problem).

What is the probability of rolling two dice and having the first show 1 dot AND the second show 6 dots?

(1/6) × (1/6) = 1/36


Combining probabilities

• Add probabilities if the component events are mutually exclusive (i.e. wherever you would naturally use the word OR).

What is the probability of rolling 7 using two dice? This is the same as asking "What is the probability of rolling (1 and 6) OR (2 and 5) OR (3 and 4) OR (4 and 3) OR (5 and 2) OR (6 and 1)?"

(1/36) + (1/36) + (1/36) + (1/36) + (1/36) + (1/36) = 1/6
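Both rules can be checked by brute-force enumeration of the 36 equally likely outcomes of two fair dice. A minimal Python sketch using exact fractions:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))
p_outcome = Fraction(1, len(outcomes))

# AND: both events must happen, so multiply the single-die probabilities.
p_one_then_six = Fraction(1, 6) * Fraction(1, 6)

# OR over mutually exclusive outcomes: add their probabilities.
p_sum_seven = sum(p_outcome for a, b in outcomes if a + b == 7)

# Enumeration agrees with the multiplication rule.
p_enumerated = sum(p_outcome for a, b in outcomes if a == 1 and b == 6)

print(p_one_then_six, p_sum_seven, p_enumerated)  # prints: 1/36 1/6 1/36
```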


Likelihood of a single sequence

First 32 nucleotides of the ψη-globin gene of gorilla:

GAAGTCCTTGAGAAATAAACTGCACACACTGG

$$L = \pi_G\,\pi_A\,\pi_A\,\pi_G\,\pi_T\,\pi_C \cdots \pi_G\,\pi_G = \pi_A^{12}\,\pi_C^{7}\,\pi_G^{7}\,\pi_T^{6}$$

$$\ln L = 12\ln\pi_A + 7\ln\pi_C + 7\ln\pi_G + 6\ln\pi_T$$

We can already see by eyeballing this that the F81 model (which allows unequal base frequencies) will fit better than the JC69 model (which assumes equal base frequencies), because there are about twice as many As as there are Cs, Gs and Ts.
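Because this likelihood depends only on the base counts, the comparison can be made concrete. This Python sketch assumes that, under F81, the base frequencies are set to the observed proportions, which are their maximum-likelihood estimates for a single sequence; JC69 fixes every frequency at 1/4.

```python
import math
from collections import Counter

seq = "GAAGTCCTTGAGAAATAAACTGCACACACTGG"  # gorilla sequence from the slide
counts = Counter(seq)
n = len(seq)

def log_likelihood(freqs):
    """ln L = sum over bases of (count of base) * ln(pi_base)."""
    return sum(counts[b] * math.log(freqs[b]) for b in "ACGT")

jc69 = {b: 0.25 for b in "ACGT"}          # equal base frequencies
f81 = {b: counts[b] / n for b in "ACGT"}  # observed proportions (assumed ML estimates)

print(counts["A"], counts["C"], counts["G"], counts["T"])
print(round(log_likelihood(jc69), 4), round(log_likelihood(f81), 4))
```

F81 comes out roughly 1.3 log-likelihood units higher here; whether that difference is worth the extra parameters is a separate question the slides do not address.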


Likelihoods on the simplest possible tree

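GA → GG (sequence GA at one end of a single branch of length ν, sequence GG at the other)

$$\begin{aligned}
L &= L_1 L_2 \\
  &= \Pr(G)\Pr(G\to G)\,\Pr(A)\Pr(A\to G) \\
  &= \Pr(G)\Pr(G\to G\mid\nu)\,\Pr(A)\Pr(A\to G\mid\nu)
\end{aligned}$$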


Water analogy (time 0)

• Start with container A completely full and the others empty.
• Imagine that all containers are connected by tubes that allow the same rate of flow between any two.
• Initially, A will be losing water at 3 times the rate that C (or G or T) gains water.

[Figure: containers A, C, G and T; A's level changes at rate −3α while each of the others gains at rate α.]


Water analogy (after some time)

[Figure: containers A, C, G and T.]

A’s level is not dropping as fast now because it is also receiving water from C, G and T.


Water analogy (after a very long time)

Eventually, all containers are one-fourth full and there is zero net volume change: stationarity (equilibrium) has been achieved.

[Figure: containers A, C, G and T, each one-fourth full.]

(Thanks to Kent Holsinger for this analogy)
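The analogy can be run numerically. In this Python sketch every container sends water to each of the other three at rate α times its own level, so a level x changes at rate α(total − x) − 3αx; at time 0 that is −3α for A and +α for each of the others, as on the first slide. The Euler scheme and the step size `dt` are choices made here, not part of the slides; the result is compared with the closed form 1/4 + 3/4 e^(−4αt) for container A.

```python
import math

def simulate_containers(alpha, t_end, dt=1e-4):
    """Euler integration of the four-container analogy, starting with A full.

    Each container receives alpha * (level of each other container) and loses
    alpha * (its own level) to each of the other three containers.
    """
    levels = {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}
    for _ in range(int(round(t_end / dt))):
        total = sum(levels.values())
        levels = {b: x + dt * (alpha * (total - x) - 3 * alpha * x)
                  for b, x in levels.items()}
    return levels

def level_a_exact(alpha, t):
    """Closed-form level of container A: 1/4 + 3/4 * exp(-4 * alpha * t)."""
    return 0.25 + 0.75 * math.exp(-4 * alpha * t)

early = simulate_containers(alpha=1.0, t_end=0.25)
late = simulate_containers(alpha=1.0, t_end=10.0)
print(early["A"], level_a_exact(1.0, 0.25))
print(late)  # every container close to one-fourth full
```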


Probability of “A present” as a function of time

Lower curve assumes we started with some state other than A (T is used here). Over time, the probability of seeing an A at this site grows because the rate at which the current base will change into an A is α.

Upper curve assumes we started with A at time 0. Over time, the probability of still seeing an A at this site drops because the rate of changing to one of the other three bases is 3α (so the rate associated with staying the same is −3α).

The equilibrium relative frequency of A is 0.25.


[Figure: observed number of differences (vertical axis, 0 to 15) plotted against the number of substitutions simulated onto a twenty-base sequence (horizontal axis, 1 to 20).]
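The point of the figure, that observed differences lag behind the true number of substitutions because later changes can hit sites that have already changed, can be reproduced with a small simulation. This Python sketch assumes each substitution hits a uniformly random site and changes it to one of the three other bases; the seed is arbitrary.

```python
import random

def simulate_differences(n_subs, length=20, seed=1):
    """Apply n_subs random substitutions to a random sequence of `length` bases and
    record the number of observed differences from the starting sequence after each one."""
    rng = random.Random(seed)
    start = [rng.choice("ACGT") for _ in range(length)]
    seq = list(start)
    observed = []
    for _ in range(n_subs):
        pos = rng.randrange(length)
        # A substitution always changes the base to one of the other three.
        seq[pos] = rng.choice([b for b in "ACGT" if b != seq[pos]])
        observed.append(sum(a != b for a, b in zip(start, seq)))
    return observed

observed = simulate_differences(20)
for n, d in enumerate(observed, start=1):
    print(n, d)
```

The first substitution always produces exactly one difference; after that the count can stall or even drop when a site is hit twice.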


Jukes-Cantor model

$$\Pr(G\to G\mid\nu) = \frac{1}{4} + \frac{3}{4}e^{-4\nu/3}$$

$$\Pr(A\to G\mid\nu) = \frac{1}{4} - \frac{1}{4}e^{-4\nu/3}$$
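These two formulas are all that is needed to score a branch under JC69. In the water analogy each base drains at total rate 3α, so measuring a branch in expected substitutions, ν = 3αt, turns e^(−4αt) into e^(−4ν/3). A short Python check:

```python
import math

def jc_same(nu):
    """Pr(i -> i | nu) under Jukes-Cantor; nu is the expected number of substitutions per site."""
    return 0.25 + 0.75 * math.exp(-4.0 * nu / 3.0)

def jc_diff(nu):
    """Pr(i -> j | nu) for one particular base j different from i."""
    return 0.25 - 0.25 * math.exp(-4.0 * nu / 3.0)

for nu in (0.0, 0.1, 0.5, 1.0, 5.0):
    # One way to stay the same plus three ways to change: each row sums to one.
    print(f"nu={nu:<4} same={jc_same(nu):.4f} diff={jc_diff(nu):.4f} "
          f"row sum={jc_same(nu) + 3 * jc_diff(nu):.6f}")
```

As ν grows both probabilities approach 0.25, the equilibrium frequency from the water analogy.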


Likelihoods on the simplest possible tree

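GA → GG

$$\begin{aligned}
L &= L_1 L_2 \\
  &= \Pr(G)\Pr(G\to G)\,\Pr(A)\Pr(A\to G) \\
  &= \Pr(G)\Pr(G\to G\mid\nu)\,\Pr(A)\Pr(A\to G\mid\nu) \\
  &= \left(\frac{1}{4}\right)\left(\frac{1}{4}+\frac{3}{4}e^{-4\nu/3}\right)\left(\frac{1}{4}\right)\left(\frac{1}{4}-\frac{1}{4}e^{-4\nu/3}\right)
\end{aligned}$$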


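The first 30 nucleotides of the ψη-globin gene

gorilla    GAAGTCCTTGAGAAATAAACTGCACACTGG
orangutan  GGACTCCTTGAGAAATAAACTGCACACTGG

The two sequences are identical at 28 sites and differ at 2 (sites 2 and 4), which gives the exponents:

$$L = \left[\left(\frac{1}{4}\right)\left(\frac{1}{4}+\frac{3}{4}e^{-4\nu/3}\right)\right]^{28}\left[\left(\frac{1}{4}\right)\left(\frac{1}{4}-\frac{1}{4}e^{-4\nu/3}\right)\right]^{2}$$

[Figure: ln L plotted against ν from 0.00 to 0.25.]

ν̂ = 0.06982, ln L = −51.13396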
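The maximum can also be found numerically, which is how ML programs handle models without a closed-form answer. This Python sketch uses a plain golden-section search (a choice made here; the slides do not say how the curve was maximized) on the two-sequence likelihood, and checks the result against the closed-form Jukes-Cantor estimate ν̂ = −(3/4) ln(1 − 4p/3), where p is the proportion of differing sites.

```python
import math

def jc_probs(nu):
    """(Pr(same base), Pr(one specific different base)) after branch length nu under JC69."""
    e = math.exp(-4.0 * nu / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e

def ln_likelihood(nu, n_same, n_diff):
    p_same, p_diff = jc_probs(nu)
    # Each site contributes pi (= 1/4) times the transition probability.
    return n_same * math.log(0.25 * p_same) + n_diff * math.log(0.25 * p_diff)

def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0

gorilla = "GAAGTCCTTGAGAAATAAACTGCACACTGG"
orangutan = "GGACTCCTTGAGAAATAAACTGCACACTGG"
n_diff = sum(x != y for x, y in zip(gorilla, orangutan))
n_same = len(gorilla) - n_diff

nu_hat = golden_section_max(lambda nu: ln_likelihood(nu, n_same, n_diff), 1e-6, 1.0)
print(n_same, n_diff, round(nu_hat, 5), round(ln_likelihood(nu_hat, n_same, n_diff), 5))
```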


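Likelihood of a tree (data for only one site shown)

[Figure: a four-taxon tree with tip states A, A, T and C at one site; the two interior nodes carry the ancestral states A and C, and the interior node with state A is arbitrarily chosen to serve as the root node.]

Ancestral states like this are not really known; we will address this in a minute.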


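Likelihood for site k:

$$L_k = \pi_A\,P_{AA}(\nu_1)\,P_{AA}(\nu_2)\,P_{AC}(\nu_3)\,P_{CT}(\nu_4)\,P_{CC}(\nu_5)$$

$$L_k = \frac{1}{4}\left[\frac{1}{4}+\frac{3}{4}e^{-4\nu_1/3}\right]\left[\frac{1}{4}+\frac{3}{4}e^{-4\nu_2/3}\right]\left[\frac{1}{4}-\frac{1}{4}e^{-4\nu_3/3}\right]\left[\frac{1}{4}-\frac{1}{4}e^{-4\nu_4/3}\right]\left[\frac{1}{4}+\frac{3}{4}e^{-4\nu_5/3}\right]$$

[Figure: the same tree. The root (state A) is joined to two tips with state A by branches ν1 and ν2 and to the interior node with state C by branch ν3; that node is joined to tips T and C by branches ν4 and ν5. ν5 is the expected number of substitutions for just this segment of the tree.]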


A brute-force approach would be to calculate L_k for all 16 combinations of ancestral states (4 possible states at each of the two interior nodes) and sum them.
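The 16-term sum is easy to write down for this small tree. This Python sketch uses JC69, the tip states from the slide (A, A, T, C) and hypothetical branch lengths ν1 to ν5. It computes the brute-force sum and compares it with Felsenstein's pruning algorithm, a standard way to get the same number by computing conditional likelihoods from the tips toward the root instead of enumerating ancestral states; pruning is not shown on these slides.

```python
import math
from itertools import product

BASES = "ACGT"

def p(i, j, nu):
    """JC69 probability that base i becomes base j along a branch of length nu."""
    e = math.exp(-4.0 * nu / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

# Tree from the slide: the root joins tip1 (nu1), tip2 (nu2) and interior node s (nu3);
# s joins tip3 (nu4) and tip4 (nu5).
t1, t2, t3, t4 = "A", "A", "T", "C"            # observed states at site k
n1, n2, n3, n4, n5 = 0.1, 0.2, 0.05, 0.3, 0.1   # hypothetical branch lengths

def term(r, s):
    """L_k for one choice of ancestral states: r at the root, s at the other interior node."""
    return 0.25 * p(r, t1, n1) * p(r, t2, n2) * p(r, s, n3) * p(s, t3, n4) * p(s, t4, n5)

# Brute force: add up all 4 x 4 = 16 combinations of ancestral states.
brute = sum(term(r, s) for r, s in product(BASES, repeat=2))

# Pruning: conditional likelihoods at s first, then at the root.
cond_s = {s: p(s, t3, n4) * p(s, t4, n5) for s in BASES}
cond_r = {r: p(r, t1, n1) * p(r, t2, n2) * sum(p(r, s, n3) * cond_s[s] for s in BASES)
          for r in BASES}
pruned = sum(0.25 * cond_r[r] for r in BASES)

print(term("A", "C"), brute, pruned)
```

`term("A", "C")` is the single combination drawn on the previous slide; it is only one of the 16 contributions to L_k.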


Likelihood and Bayesian procedures

1. Are very computationally intensive.

2. Use all of the information in the data.

3. Let us estimate the forces of character evolution while estimating trees.

4. Use models to detect concerted patterns of homoplasy (this is how likelihood-based procedures avoid long-branch attraction).


Tree Searching

Parsimony and ML give us ways of deciding whether one tree fits our data better than another tree, but . . .

How do we find the best tree?

(or one that is good enough)


Exhaustive Enumeration

[Figure: the unrooted tree for taxa A, B and C.]

With the first three taxa, create the trivial unrooted tree.


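Exhaustive Enumeration (continued)

Can add a fourth taxon (D) to any of the three edges:

[Figure: the three-taxon tree for A, B and C, and the three four-taxon trees obtained by attaching D to each of its edges: (A, D, (B, C)), (B, D, (A, C)) and (A, B, (C, D)).]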


Exhaustive Enumeration (getting tired yet?)

Can add a fifth taxon (E) to any of the 5 edges of each of the 3 four-taxon trees!

[Figure: the 3-taxon tree, the 3 four-taxon trees, and the 3 × 5 = 15 five-taxon trees obtained by adding E.]
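The enumeration procedure (start with the three-taxon tree and attach each new taxon to every edge of every existing tree) is short to code. In this Python sketch a tree is a set of edges, leaves are taxon names and interior nodes are integers; that representation is a choice made here for illustration.

```python
def all_unrooted_trees(taxa):
    """Exhaustively enumerate unrooted binary trees by attaching each new taxon to every edge."""
    # The unique three-taxon tree: three leaves joined to interior node 0.
    trees = [frozenset(frozenset((t, 0)) for t in taxa[:3])]
    for k, taxon in enumerate(taxa[3:], start=1):
        new_trees = []
        for tree in trees:
            for edge in tree:
                x, y = tuple(edge)
                # Split edge x-y with a new interior node k and hang the new taxon from it.
                new_trees.append((tree - {edge})
                                 | {frozenset((x, k)), frozenset((y, k)), frozenset((taxon, k))})
        trees = new_trees
    return trees

for n in range(3, 8):
    print(n, len(all_unrooted_trees("ABCDEFG"[:n])))
```

Each tree with n taxa has 2n − 3 edges, so the counts grow as 1, 3, 15, 105, 945; the table that follows shows how quickly this becomes infeasible.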


Tips  Number of unrooted (binary) trees
4     3
5     15
6     105
7     945
8     10,395
9     135,135
10    2,027,025
11    34,459,425
12    654,729,075
13    13,749,310,575
14    316,234,143,225
15    7,905,853,580,625
16    213,458,046,676,875
17    6,190,283,353,629,375
18    191,898,783,962,510,625
19    6,332,659,870,762,850,625
20    221,643,095,476,699,771,875
21    8,200,794,532,637,891,559,375
22    319,830,986,772,877,770,815,625
23    13,113,070,457,687,988,603,440,625  (> 21 moles of trees)
24    563,862,029,680,583,509,947,946,875
25    25,373,791,335,626,257,947,657,609,375


For N taxa:

$$\#\,\text{unrooted binary trees} = \prod_{i=3}^{N-1}(2i-3) = \prod_{i=4}^{N}(2i-5)$$

$$\#\,\text{rooted binary trees} = \prod_{i=3}^{N}(2i-3) = (2N-3)\,(\#\,\text{unrooted binary trees})$$
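The products are easy to evaluate exactly with Python integers, and they can be used to check the values in the table above. A minimal sketch (the Avogadro constant is only there to check the "> 21 moles of trees" remark):

```python
from math import prod

def n_unrooted(n):
    """Unrooted binary trees on n >= 3 labelled tips: product of (2i - 3) for i = 3 .. n-1."""
    return prod(2 * i - 3 for i in range(3, n))

def n_rooted(n):
    """Rooted binary trees on n >= 2 labelled tips: product of (2i - 3) for i = 3 .. n."""
    return prod(2 * i - 3 for i in range(3, n + 1))

AVOGADRO = 6.02214076e23
for n in (4, 10, 20, 23, 25):
    print(n, f"{n_unrooted(n):,}", f"{n_rooted(n):,}")
print(n_unrooted(23) / AVOGADRO)  # moles of 23-taxon unrooted trees
```

The rooted count is larger by a factor of 2N − 3 because a root can be placed on any of the 2N − 3 edges of an unrooted tree.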


Stepwise addition

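[Figure: starting from the three-taxon tree for A, B and C, taxon D is added to each of its three edges; the three resulting four-taxon trees have ln L = −1860.22536, −1822.77292 and −1860.98996, and the best of these (−1822.77292) is carried forward.]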


Stepwise addition

[Figure: taxon E is then added to each of the five edges of the best four-taxon tree; the five resulting trees have ln L = −2279.73818, −2278.55324, −2303.36753, −2303.36753 and −2226.51605. The four-taxon step is shown again alongside.]


Is stepwise addition guaranteed to find the best tree?

1 2 3 4 5

taxonA A A A A A

taxonB A C C A C

taxonC C C C T T

taxonD C A A C T

taxonE A A A T C



First step of stepwise addition

1 2 3 4 5

taxonA A A A A A

taxonB A C C A C

taxonC C C C T T

taxonD C A A C T

Parsimony score at sites 1 to 5, followed by the total:

tree (A, B, (C, D))   1 2 2 2 2   total 9
tree (A, C, (B, D))   2 2 2 2 2   total 10
tree (A, D, (B, C))   2 1 1 2 2   total 8
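The per-site scores in this table can be reproduced with Fitch's algorithm for counting the minimum number of changes on a binary tree. The slides do not say which algorithm produced the numbers; this Python sketch uses Fitch's method and roots each unrooted tree arbitrarily, which does not change the parsimony score.

```python
DATA = {
    "taxonA": "AAAAA",
    "taxonB": "ACCAC",
    "taxonC": "CCCTT",
    "taxonD": "CAACT",
}

def fitch(node, site):
    """Return (possible state set, number of changes) for one site on a rooted binary tree.

    Leaves are taxon names; internal nodes are (left, right) tuples.
    """
    if isinstance(node, str):
        return {DATA[node][site]}, 0
    left_states, left_changes = fitch(node[0], site)
    right_states, right_changes = fitch(node[1], site)
    common = left_states & right_states
    if common:
        return common, left_changes + right_changes
    # No shared state: take the union and count one change.
    return left_states | right_states, left_changes + right_changes + 1

def site_scores(tree):
    return [fitch(tree, k)[1] for k in range(5)]

quartets = {
    "(A, B, (C, D))": (("taxonA", "taxonB"), ("taxonC", "taxonD")),
    "(A, C, (B, D))": (("taxonA", "taxonC"), ("taxonB", "taxonD")),
    "(A, D, (B, C))": (("taxonA", "taxonD"), ("taxonB", "taxonC")),
}
for name, tree in quartets.items():
    scores = site_scores(tree)
    print(name, scores, sum(scores))
```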


[Figure: the trees (A, B, (C, D)) and (A, D, (B, C)) with the changes required at sites 1 to 5 marked on their branches.]


taxonE A A A T C

[Figure: the same two trees with taxonE added, giving ((A, B), E, (C, D)) and ((A, E), D, (B, C)), with the changes required at each site marked on the branches.]


Comparison of two five taxon trees

1 2 3 4 5

taxonA A A A A A

taxonB A C C A C

taxonC C C C T T

taxonD C A A C T

taxonE A A A T C

Parsimony score at sites 1 to 5, followed by the total:

tree ((A, B), E, (C, D))   1 2 2 2 2   total 9
tree ((A, E), D, (B, C))   2 1 1 3 3   total 10
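The failure can be checked with the same Fitch counting. This Python sketch scores the three four-taxon trees, keeps the best one, (A, D, (B, C)), scores the five ways of adding taxonE to it (written out by hand rather than generated), and compares the result with ((A, B), E, (C, D)). Taxon names are shortened to single letters.

```python
DATA = {"A": "AAAAA", "B": "ACCAC", "C": "CCCTT", "D": "CAACT", "E": "AAATC"}

def fitch(node, k):
    """(possible states, changes) at site k for a rooted binary tree of nested tuples."""
    if isinstance(node, str):
        return {DATA[node][k]}, 0
    (ls, lc), (rs, rc) = fitch(node[0], k), fitch(node[1], k)
    common = ls & rs
    return (common, lc + rc) if common else (ls | rs, lc + rc + 1)

def length(tree):
    """Total parsimony length over the five sites (rooting does not change it)."""
    return sum(fitch(tree, k)[1] for k in range(5))

# Step 1: the three four-taxon trees; stepwise addition keeps the shortest.
quartets = {
    "(A, B, (C, D))": (("A", "B"), ("C", "D")),
    "(A, C, (B, D))": (("A", "C"), ("B", "D")),
    "(A, D, (B, C))": (("A", "D"), ("B", "C")),
}
best_quartet = min(quartets, key=lambda name: length(quartets[name]))

# Step 2: E attached to each of the five edges of (A, D, (B, C)).
with_e = {
    "((A, E), D, (B, C))": (("A", "E"), ("D", ("B", "C"))),
    "((D, E), A, (B, C))": (("D", "E"), ("A", ("B", "C"))),
    "(A, D, ((B, E), C))": (("A", "D"), (("B", "E"), "C")),
    "(A, D, (B, (C, E)))": (("A", "D"), ("B", ("C", "E"))),
    "((A, D), E, (B, C))": (("A", "D"), ("E", ("B", "C"))),
}
stepwise_best = min(length(t) for t in with_e.values())

# A tree that stepwise addition never scores here: it does not contain (A, D, (B, C)).
missed = length((("A", "B"), ("E", ("C", "D"))))
print(best_quartet, stepwise_best, missed)
```

With these data the best tree reachable by stepwise addition has length 10, while ((A, B), E, (C, D)) has length 9.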


Stepwise addition

• Heuristic: not guaranteed to find the best tree.

• Number of trees scored for N taxa:

$$\#\,\text{trees scored} = \sum_{i=3}^{N-1}(2i-3) = (N-1)(N-3)$$

Thus, stepwise addition scores O(N²) trees. For N = 10:

63 = 3 + 5 + 7 + 9 + 11 + 13 + 15
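The sum and its closed form are easy to check numerically, alongside the exhaustive count for contrast. Note that this counts trees scored; the cost of scoring each tree also grows with the number of taxa and sites.

```python
from math import prod

def trees_scored_stepwise(n):
    """Trees scored by stepwise addition: sum of (2i - 3) for i = 3 .. n-1."""
    return sum(2 * i - 3 for i in range(3, n))

def trees_exhaustive(n):
    """All unrooted binary trees for n taxa, for comparison."""
    return prod(2 * i - 3 for i in range(3, n))

for n in (5, 10, 20, 50):
    print(n, trees_scored_stepwise(n), (n - 1) * (n - 3), trees_exhaustive(n))
```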


Trying to improve a tree

Heuristic hill-climbing searches can work quite well:

1. Start with a tree.
2. Score the tree.
3. Consider a new tree within the neighborhood of the current tree:
   (a) Score the new tree.
   (b) If the new tree has a better score, use it as the “current tree”.
   (c) Stop if there are no other trees within the neighborhood to consider.

These are not guaranteed to find even one of the optimal trees.

The most common way to explore the neighborhood of a tree is to swap the branches of the tree to construct similar trees.
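The steps above translate into a generic loop. In this Python sketch the neighborhood and the score are passed in as functions (for trees they would be a branch-swapping routine and a parsimony or likelihood score); the one-dimensional landscape is made up to show how the starting point decides whether the search reaches the global maximum.

```python
def hill_climb(start, neighbors, score):
    """First-improvement hill climbing: move to the first better neighbor until none is better."""
    current = start
    while True:
        for candidate in neighbors(current):
            if score(candidate) > score(current):
                current = candidate
                break
        else:
            return current  # no neighbor improves the score: a local optimum

# Toy one-dimensional "landscape" (hypothetical scores, higher is better).
landscape = [1, 3, 5, 4, 2, 6, 9, 7]

def line_neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]

def line_score(i):
    return landscape[i]

print(hill_climb(0, line_neighbors, line_score))  # stops at index 2 (score 5), a local peak
print(hill_climb(5, line_neighbors, line_score))  # reaches index 6 (score 9), the global peak
```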


Greedy search for a maximum

[Figure: a greedy search climbing a score surface, built up over several slides. If you start here, you end up here (a local peak), but the global maximum is here.]

Week 2: Searching for trees, ancestral states – p.2/51


Nearest-neighbor rearrangements

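A subtree is rearranged by dissolving the connections to an interior branch and reforming them in one of the two possible alternative ways:

[Figure: an interior branch separating subtrees S and T from subtrees U and V, and the two alternative trees in which S is instead grouped with U or with V.]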

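An NNI move can be written directly on an adjacency-list representation of an unrooted tree. This Python sketch (the representation and helper names are choices made for illustration) generates the two rearrangements around every interior edge, identifies trees by their splits so duplicates can be detected, and then walks outward from one five-taxon tree using NNI moves alone.

```python
from collections import deque

def make_tree(edges):
    """Adjacency sets for an unrooted tree; leaves are strings, interior nodes are ints."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def leaves_beyond(adj, start, avoid):
    """Leaves reachable from `start` without stepping back to `avoid`."""
    stack, seen, found = [start], {start, avoid}, set()
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            found.add(node)
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return found

def topology(adj):
    """Canonical form: the non-trivial splits, each stored as the side without the first taxon."""
    taxa = frozenset(n for n in adj if isinstance(n, str))
    ref = min(taxa)
    splits = set()
    for u in adj:
        for v in adj[u]:
            if isinstance(u, int) and isinstance(v, int):
                side = frozenset(leaves_beyond(adj, v, u))
                splits.add(taxa - side if ref in side else side)
    return frozenset(splits)

def nni_neighbors(adj):
    """Both NNI rearrangements around every interior edge u-v."""
    result = []
    for u in adj:
        for v in adj[u]:
            if not (isinstance(u, int) and isinstance(v, int) and u < v):
                continue  # only interior edges, each visited once
            b = next(iter(adj[u] - {v}))  # one subtree on u's side
            for x in adj[v] - {u}:        # swap it with either subtree on v's side
                new = {n: set(nbrs) for n, nbrs in adj.items()}
                new[u].discard(b)
                new[b].discard(u)
                new[v].discard(x)
                new[x].discard(v)
                new[u].add(x)
                new[x].add(u)
                new[v].add(b)
                new[b].add(v)
                result.append(new)
    return result

# ((A, B), C, (D, E)) with interior nodes 1, 2, 3.
start = make_tree([("A", 1), ("B", 1), (1, 2), ("C", 2), (2, 3), ("D", 3), ("E", 3)])
neighbors = nni_neighbors(start)

# Breadth-first walk over NNI moves: how many five-taxon trees can be reached?
seen = {topology(start)}
queue = deque([start])
while queue:
    tree = queue.popleft()
    for nb in nni_neighbors(tree):
        key = topology(nb)
        if key not in seen:
            seen.add(key)
            queue.append(nb)
print(len(neighbors), len(seen))
```

A binary tree with n taxa has n − 3 interior edges, so it has 2(n − 3) NNI neighbors: 4 for five taxa. The walk reaches all 15 five-taxon trees, which is the Schoenberg graph on the next slide.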


Schoenberg graph – edges connect NNI neighbors

[Figure: the 15 unrooted trees for taxa A to E, drawn as a graph in which each tree is joined to its NNI neighbors.]


Tree “Islands” possible

An Op-L tree island (sensu Maddison, 1991): a set of trees with score ≤ L that are connected to each other by Op operations, such that you can get from any tree in the set to any other tree by repeated Op changes and all intermediate trees along the path are also members of the set.

The following Schoenberg graph shows the scores of the 15 trees on the following dataset (contrived data by POL):

A  ACGCAGGT
B  ATGGTGAT
C  GCTCACGG
D  ACTGTCGT
E  GTTCTGAG


Schoenberg graph with parsimony scores

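[Figure: the Schoenberg graph with each of the 15 trees labelled by its parsimony score on this dataset; the scores range from 13 to 16, with four trees at the optimal score of 13.]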
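Because every site in this contrived dataset has just two states, the scores are easy to verify. This Python sketch computes Fitch lengths for all 15 trees and groups the optimal (score 13) trees into NNI islands. It relies on a property specific to five taxa: each tree is ((x, y), w, (u, v)) with two cherries, and two different trees are NNI neighbors exactly when they share a cherry.

```python
from collections import Counter

TAXA = "ABCDE"
DATA = {"A": "ACGCAGGT", "B": "ATGGTGAT", "C": "GCTCACGG",
        "D": "ACTGTCGT", "E": "GTTCTGAG"}

def five_taxon_trees():
    """All 15 unrooted trees ((x, y), w, (u, v)), stored as (w, frozenset of the two cherries)."""
    trees = []
    for w in TAXA:
        rest = [t for t in TAXA if t != w]
        for partner in rest[1:]:
            c1 = frozenset((rest[0], partner))
            c2 = frozenset(t for t in rest if t not in c1)
            trees.append((w, frozenset((c1, c2))))
    return trees

def fitch(node, k):
    """(state set, changes) at site k for a rooted binary tree of nested tuples."""
    if isinstance(node, str):
        return {DATA[node][k]}, 0
    (ls, lc), (rs, rc) = fitch(node[0], k), fitch(node[1], k)
    common = ls & rs
    return (common, lc + rc) if common else (ls | rs, lc + rc + 1)

def tree_length(tree):
    w, cherries = tree
    (x, y), (u, v) = (tuple(sorted(c)) for c in cherries)
    rooted = ((x, y), (w, (u, v)))  # any rooting gives the same parsimony length
    return sum(fitch(rooted, k)[1] for k in range(len(DATA["A"])))

def are_nni_neighbors(t1, t2):
    # Five-taxon special case: distinct trees are NNI neighbors iff they share a cherry.
    return t1 != t2 and bool(t1[1] & t2[1])

def islands(trees, scores, max_score):
    """Groups of trees with score <= max_score connected by NNI moves within the group."""
    members = [t for t in trees if scores[t] <= max_score]
    seen, groups = set(), []
    for start in members:
        if start in seen:
            continue
        seen.add(start)
        group, stack = [], [start]
        while stack:
            t = stack.pop()
            group.append(t)
            for other in members:
                if other not in seen and are_nni_neighbors(t, other):
                    seen.add(other)
                    stack.append(other)
        groups.append(group)
    return groups

trees = five_taxon_trees()
scores = {t: tree_length(t) for t in trees}
print(sorted(Counter(scores.values()).items()))
for group in islands(trees, scores, 13):
    print([(w, sorted("".join(sorted(c)) for c in ch)) for w, ch in group])
```

It finds four trees of score 13 (and four each of 14 and 15, three of 16), split into two islands of two trees each, so swapping from one optimal tree with NNI would not find all of them.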


Tree Islands implications

1. Islands can be larger than 1 tree; we must consider ties if we want to find all trees that optimize the score.

2. Swapping to completion on all optimal trees found in a search is not guaranteed to find all optimal trees (other islands may exist).

3. The delimitation of an island depends on the tree-changing operation used.


Heuristics explore “Tree Space”

Most commonly used methods are “hill-climbers.”

Multiple optima can be found by repeating searches from different starting trees.

The severity of the problem of multiple optima depends on the step size.


Subtree Pruning Regrafting (SPR) and Tree Bisection Reconnection (TBR)

[Figure: a tree with taxa A to I is cut into two parts, and the detached subtree is reattached to the rest of the tree in new positions.]

SPR maintains the subtree's rooting; TBR tries all possible rootings of the subtree.
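SPR can be sketched with the same adjacency-list idea: cut the edge between an interior node p and the subtree on the other side (rooted at c), smooth away p, and reattach c through p to every edge of what is left. This Python sketch keeps the pruned subtree's attachment point fixed, as SPR does; TBR would additionally try every re-rooting of the pruned subtree, which is not implemented. The representation and helper names are illustrative.

```python
def add_edge(adj, a, b):
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

def remove_edge(adj, a, b):
    adj[a].discard(b)
    adj[b].discard(a)

def make_tree(edges):
    """Adjacency sets for an unrooted tree; leaves are strings, interior nodes are ints."""
    adj = {}
    for a, b in edges:
        add_edge(adj, a, b)
    return adj

def reachable(adj, start, avoid=None):
    """Nodes reachable from `start` without passing through `avoid`."""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt != avoid and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def topology(adj):
    """Canonical form: set of non-trivial splits (side without the first taxon)."""
    taxa = frozenset(n for n in adj if isinstance(n, str))
    ref = min(taxa)
    splits = set()
    for u in adj:
        for v in adj[u]:
            if isinstance(u, int) and isinstance(v, int):
                side = frozenset(n for n in reachable(adj, v, avoid=u) if isinstance(n, str))
                splits.add(taxa - side if ref in side else side)
    return frozenset(splits)

def spr_neighbors(adj):
    """Topologies one SPR move away from `adj` (the starting topology excluded)."""
    found = set()
    for p in [n for n in adj if isinstance(n, int)]:
        for c in adj[p]:
            t = {n: set(nbrs) for n, nbrs in adj.items()}
            a, b = [n for n in t[p] if n != c]
            for n in (a, b, c):
                remove_edge(t, p, n)
            del t[p]
            add_edge(t, a, b)        # smooth away the node left with degree two
            rest = reachable(t, a)   # the part of the tree that was not pruned
            edges = {frozenset((x, y)) for x in rest for y in t[x]}
            for edge in edges:
                x, y = tuple(edge)
                g = {n: set(nbrs) for n, nbrs in t.items()}
                remove_edge(g, x, y)
                for n in (x, y, c):  # p becomes the new attachment point on edge x-y
                    add_edge(g, p, n)
                found.add(topology(g))
    found.discard(topology(adj))
    return found

# ((A, B), C, (D, E)) with interior nodes 1, 2, 3.
start = make_tree([("A", 1), ("B", 1), (1, 2), ("C", 2), (2, 3), ("D", 3), ("E", 3)])
nbrs = spr_neighbors(start)
print(len(nbrs))
```

From ((A, B), C, (D, E)) it reaches 12 of the other 14 five-taxon trees (NNI reaches 4); the two it misses are ((A, D), C, (B, E)) and ((A, E), C, (B, D)).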


1-Edge-contract Refine

[Figure: a tree with taxa A to I; one interior edge is contracted to form a polytomy, which is then refined into alternative binary trees.]


2-Edge-contract Refine

[Figure: the same tree (taxa A to I) with two edges contracted into a single polytomy, which is then refined; two refinements are drawn, with 12 other trees not shown.]
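The sizes of these neighborhoods follow from counting resolutions of a polytomy. Assuming the two contracted edges are adjacent, so that they collapse into one node of degree 5, the polytomy can be resolved in 15 ways (the original tree, the two refinements drawn, and the 12 other trees); with one contracted edge the node has degree 4 and only 3 resolutions, the original tree and its two NNI neighbors. A minimal Python sketch:

```python
from math import prod

def binary_resolutions(degree):
    """Number of binary resolutions of a single polytomy with `degree` incident edges.

    The subtrees around the polytomy behave like the tips of an unrooted tree,
    so the count is (2 * degree - 5)!! = 1 * 3 * 5 * ... * (2 * degree - 5).
    """
    return prod(range(1, 2 * degree - 4, 2))

# Contracting k adjacent interior edges of a binary tree leaves one node of degree k + 3.
for k in (1, 2, 3):
    print(k, binary_resolutions(k + 3))
```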


Many other heuristic strategies proposed

• Swapping need not include all neighbors (RAxML, reconlimit in PAUP*)
• “Lazy” scoring of swaps (RAxML)
• Ignoring (at some stage) interactions between different branch swaps (PHYML)
• Stochastic searches
  – Genetic algorithms (GAML, MetaPIGA, GARLI)
  – Simulated annealing
• Divide-and-conquer methods (the sectorial searching of Goloboff, 1999; Rec-I-DCM3, Roshan et al., 2004)
• Data perturbation methods (e.g. Kevin Nixon’s “ratchet”)


Population with variation

[Figure: a genetic-algorithm search, built up over several slides. For each tree in a population with variation, ln L is calculated (-127.5, -128.1, -131.0, -131.6, -132.0), then a fitness is calculated (0.623, 0.341, 0.019, 0.010, 0.007); this is followed by selection and then mutation.]
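The slides do not say how fitness is derived from ln L. One rule that comes close to the numbers shown (within about 0.001) is to make fitness proportional to the likelihood itself, exp(ln L), normalized to sum to one. This Python sketch assumes that rule and uses fitness-proportional sampling for selection; mutation (a random branch swap) is left out.

```python
import math
import random

ln_l = [-127.5, -128.1, -131.0, -131.6, -132.0]  # ln L values from the slide

def fitness_from_lnl(values):
    """Assumed rule: fitness proportional to the likelihood, normalized over the population."""
    best = max(values)
    weights = [math.exp(v - best) for v in values]  # subtract the max to avoid underflow
    total = sum(weights)
    return [w / total for w in weights]

fitness = fitness_from_lnl(ln_l)
print([round(f, 3) for f in fitness])

# Selection: draw the next generation's parents with probability proportional to fitness.
rng = random.Random(0)
parents = rng.choices(range(len(ln_l)), weights=fitness, k=len(ln_l))
print(parents)
```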


Divide-and-Conquer Methods

The basic outline of a phylogenetic Divide-and-Conquer approach is:

1. Decompose a starting tree into subsets of the taxa.

2. Improve the tree for each of the subsets of taxa.

3. Merge the resulting trees into a tree for the full set of taxa.

4. Refine the full tree (it will often have polytomies).

5. Improve the full tree using a simple (and fast) heuristic.

Examples include Rec-I-DCM3 by Roshan et al. (2004). See Goloboff and Pol (Systematic Biology, 2007) for a contrasting viewpoint about the relative efficiency of Rec-I-DCM3 compared to heuristics implemented in TNT.


Step 1: Leaf set decomposition

In Rec-I-DCM3 Roshan et al. (2004):

• A tree is divided (“decomposed”) into 4 trees around a central edge. The edge is chosen such that it comes as close as possible to dividing the taxa into 2 equally-sized groups.

• The short quartet (taxa closest to this edge in each of the 4 directions) is selected.

• 4 sub-problems are produced. Each contains 1 subtree connected to the central edge and all leaves that are a part of the short quartet.
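Only the first bullet (choosing the central edge) is sketched here. This Python sketch assumes an unrooted binary tree stored as adjacency sets and picks the interior edge whose removal splits the leaves most evenly; selecting the short quartet would additionally need branch lengths and is left out.

```python
def make_tree(edges):
    """Adjacency sets for an unrooted tree; leaves have exactly one neighbor."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def leaves_on_side(adj, start, avoid):
    """Count leaves reachable from `start` without crossing the edge back to `avoid`."""
    stack, seen, count = [start], {start, avoid}, 0
    while stack:
        node = stack.pop()
        if len(adj[node]) == 1:
            count += 1
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return count

def centroid_edge(adj):
    """Interior edge whose removal splits the leaves into the most nearly equal halves."""
    best, best_gap = None, None
    for edge in {frozenset((u, v)) for u in adj for v in adj[u]}:
        u, v = tuple(edge)
        if len(adj[u]) == 1 or len(adj[v]) == 1:
            continue  # skip edges leading to a leaf
        gap = abs(leaves_on_side(adj, u, v) - leaves_on_side(adj, v, u))
        if best_gap is None or gap < best_gap:
            best, best_gap = (u, v), gap
    return best, best_gap

# Caterpillar tree ((A, B), C, D, (E, F)) with interior nodes 1-2-3-4 in a row.
tree = make_tree([("A", 1), ("B", 1), (1, 2), ("C", 2), (2, 3),
                  ("D", 3), (3, 4), ("E", 4), ("F", 4)])
edge, gap = centroid_edge(tree)
print(edge, gap)
```

Here the edge between interior nodes 2 and 3 splits the six leaves 3 and 3, so it is chosen as the central edge.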


[Figure: an example tree with 24 leaves (numbered 1 to 24), repeated across several slides to illustrate the leaf-set decomposition.]

Page 81: Thanks to Paul Lewis and Joe Felsenstein for the use of slidesphylo.bio.ku.edu/BIOL428/MLAndTreeSearch.pdf · Long branch attraction Felsenstein, J. 1978. Cases in which parsimony

Step 2: Tree improvement

This is simply a tree search on each subproblem's smaller tree.

DCM is a “meta-method” that can be used with almost any type of large-scale tree inference.

Step 3: Tree Merge (Supertree analysis)

The step of “gluing” the subproblem trees together is a supertree analysis.

If there is no conflict between the input trees, the problem is trivial.

Roshan et al. recommend using a Strict Consensus Merger: collapse the minimal number of edges required so that the 2 trees induce the same tree on the leaves they have in common.
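Below is a sketch of the collapse step of a strict consensus merger. It assumes that each tree is stored as its leaf set together with its set of nontrivial splits, where each split is recorded by either side of the bipartition. An edge survives only if its split is trivial when restricted to the shared leaves, or if it appears in both restricted trees. Gluing the collapsed trees together afterwards is not shown, and the function names are hypothetical.

```python
from typing import FrozenSet, Optional, Set, Tuple

Split = FrozenSet[str]  # one side of a bipartition (either side may be stored)


def restrict(split: Split, leaves: Set[str], subset: Set[str]) -> Optional[Split]:
    """Restrict a split to `subset`; None if it becomes trivial there."""
    side, other = split & subset, (leaves - split) & subset
    if len(side) < 2 or len(other) < 2:
        return None
    # Canonical form: the side that does not contain the smallest shared leaf.
    return frozenset(other) if min(subset) in side else frozenset(side)


def collapse_for_merge(
    splits1: Set[Split], leaves1: Set[str], splits2: Set[Split], leaves2: Set[str]
) -> Tuple[Set[Split], Set[Split]]:
    """Collapse every edge whose split, restricted to the shared leaves,
    is not displayed by both trees. Returns the surviving splits of each tree."""
    common = leaves1 & leaves2
    r1 = {restrict(s, leaves1, common) for s in splits1} - {None}
    r2 = {restrict(s, leaves2, common) for s in splits2} - {None}
    agreed = r1 & r2  # strict consensus of the two restricted trees

    def keep(splits: Set[Split], leaves: Set[str]) -> Set[Split]:
        kept = set()
        for s in splits:
            r = restrict(s, leaves, common)
            if r is None or r in agreed:  # trivial on shared leaves, or agreed
                kept.add(s)
        return kept

    return keep(splits1, leaves1), keep(splits2, leaves2)
```

If the two trees already agree on their shared leaves, nothing is collapsed, which matches the “no conflict” case being trivial.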

[Figure, pages 83 to 85: examples of merging two subproblem trees that share some leaves (tree + tree = merged tree).]

Step 4: Tree Refine

Optional step: some tree-search methods require binary (fully resolved) trees, so any polytomies left after the merge must be resolved.
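One simple way to do this refinement is to resolve each polytomy by randomly joining pairs of children. The slides do not say how polytomies are resolved, so the random choice, the rooted nested-tuple representation, and the helper names are all assumptions.

```python
import random
from typing import List, Tuple, Union

# Rooted tree as nested tuples: a leaf is a str, an internal node a tuple of children.
Tree = Union[str, Tuple["Tree", ...]]


def refine(tree: Tree, rng: random.Random) -> Tree:
    """Resolve every polytomy by repeatedly joining two randomly chosen children."""
    if isinstance(tree, str):
        return tree
    children: List[Tree] = [refine(child, rng) for child in tree]
    while len(children) > 2:
        # Pop the larger index first so the smaller index stays valid.
        i, j = sorted(rng.sample(range(len(children)), 2), reverse=True)
        pair = (children.pop(i), children.pop(j))
        children.append(pair)
    return tuple(children)


def leaves(tree: Tree) -> List[str]:
    """All leaf names, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def is_binary(tree: Tree) -> bool:
    """True if every internal node has exactly two children."""
    return isinstance(tree, str) or (len(tree) == 2 and all(is_binary(c) for c in tree))
```

Passing an explicit `random.Random` makes a refinement reproducible for a given seed.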

Step 5: Tree Improve

Another “base method” tree search, this time on the full set of taxa, so the search often has to be less thorough.
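Putting the five steps together, one DCM round can be sketched as a higher-order function. This reflects what makes DCM a meta-method: the decomposition, the base-method searches, the merge, and the refinement are all passed in as callables. All names and signatures here are hypothetical. `small_search` and `large_search` stand for a thorough run and a less thorough run of the same base method.

```python
from typing import Callable, List, Optional, Set, TypeVar

T = TypeVar("T")  # whatever tree representation the base method uses


def dcm_round(
    taxa: Set[str],
    guide_tree: T,
    decompose: Callable[[T, Set[str]], List[Set[str]]],
    small_search: Callable[[Set[str], Optional[T]], T],
    merge: Callable[[List[T]], T],
    large_search: Callable[[Set[str], Optional[T]], T],
    refine: Optional[Callable[[T], T]] = None,
) -> T:
    """One pass of the DCM steps; each step is supplied by the caller."""
    # Step 1: decompose the taxa into overlapping subsets using the guide tree.
    subsets = decompose(guide_tree, taxa)
    # Step 2: thorough base-method search on each small subproblem.
    subtrees = [small_search(subset, None) for subset in subsets]
    # Step 3: glue the subproblem trees together (supertree merge).
    merged = merge(subtrees)
    # Step 4 (optional): resolve polytomies if the base method needs a binary tree.
    if refine is not None:
        merged = refine(merged)
    # Step 5: a usually less thorough search on all taxa, starting from the merged tree.
    return large_search(taxa, merged)
```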

[Figure, pages 87 to 92: the DCM steps applied to the example 24-leaf tree, with panels labeled Decompose, Small Tree Improve, Tree Merge, Tree Refine, and Large Tree Improve.]

Recursion

A recursive algorithm is one that calls (invokes) itself.

The classic example is a function that computes the factorial:

def factorial(n):
    if n <= 1:  # base case: 0! = 1! = 1
        return 1
    else:       # recursive case: the same problem, one step smaller
        return n * factorial(n - 1)

Recursion is often used when it is easy to perform a few tasks, after which you face the same problem you started with, but on a smaller scale.

Recursive DCM3 arises from the recognition that, when we break our full set of taxa into subsets, some of them may still be too large for thorough searching. We can use another level of DCM to break them down into smaller problems.
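Recursive DCM can be sketched the same way as `factorial`. The base case is a subset small enough to search thoroughly. The recursive case decomposes the subset, solves each smaller subset, and merges the results. The `max_size` threshold, the callables, and the no-progress guard are illustrative assumptions. `decompose` is also assumed to restrict the guide tree to whichever taxa it is given.

```python
from typing import Callable, List, Set, TypeVar

T = TypeVar("T")


def recursive_dcm(
    taxa: Set[str],
    guide_tree: T,
    decompose: Callable[[T, Set[str]], List[Set[str]]],
    search: Callable[[Set[str]], T],
    merge: Callable[[List[T]], T],
    max_size: int,
) -> T:
    """Decompose until every subset has at most `max_size` taxa, then merge back up."""
    # Base case: small enough for a thorough search.
    if len(taxa) <= max_size:
        return search(taxa)
    subsets = decompose(guide_tree, taxa)
    # Guard (an assumption): if a subset is not smaller, recursing would never end.
    if any(len(s) >= len(taxa) for s in subsets):
        return search(taxa)
    # Recursive case: the same problem on each smaller subset.
    subtrees = [
        recursive_dcm(s, guide_tree, decompose, search, merge, max_size)
        for s in subsets
    ]
    return merge(subtrees)
```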

[Figure, page 94: recursive decomposition of the example 24-leaf tree.]

Iteration

Because the decomposition is sensitive to the starting tree, we may do a better job of decomposing the tree into closely related subtrees if we have a better estimate of the tree.

So we can simply repeat the whole recursive DCM process, using the tree it produced as the new starting tree.
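A sketch of the outer iteration loop follows. It assumes that each round's output becomes the next round's guide tree and that the best-scoring tree seen so far is returned. Lower scores are treated as better, as for parsimony length; to use a log-likelihood, negate it. `dcm_once`, `score`, and `n_iter` are hypothetical names.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def iterate_dcm(
    start: T,
    dcm_once: Callable[[T], T],
    score: Callable[[T], float],
    n_iter: int,
) -> T:
    """Repeat a full (recursive) DCM round, feeding each result back in as the
    next guide tree; return the best-scoring tree seen (lower is better)."""
    guide = start
    best, best_score = start, score(start)
    for _ in range(n_iter):
        guide = dcm_once(guide)  # a better tree should give a better decomposition
        current = score(guide)
        if current < best_score:
            best, best_score = guide, current
    return best
```

Keeping track of the best tree separately from the current guide tree means that a round which happens to produce a worse tree does not lose the earlier, better result.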

[Figure, page 96: iterated recursive DCM on the example 24-leaf tree.]


