2/25/09
CSCI1950-Z Computational Methods for Biology
Lecture 9
Ben Raphael February 23, 2009
http://cs.brown.edu/courses/csci1950-z/
Outline
Searching through trees
1. Branch-swapping: NNI, SPR, TBR.
2. MCMC
Consensus trees and supertrees
Heuristic Search
1. Start with an arbitrary tree T.
2. Check the “neighbors” of T.
3. Move to the neighbor that gives the best improvement in parsimony/likelihood score; stop when no neighbor improves the score.
Caveat: the search can get stuck in a local optimum and never reach the global optimum.
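A minimal sketch of this steepest-ascent loop, assuming hashable states and that a higher score is better (for parsimony, pass the negated parsimony score). The names `hill_climb` and `line_neighbors` and the toy one-dimensional landscape are hypothetical; they stand in for trees and their neighbors and make the local-optimum caveat concrete.

```python
from typing import Callable, Hashable, Iterable


def hill_climb(start: Hashable,
               neighbors: Callable[[Hashable], Iterable[Hashable]],
               score: Callable[[Hashable], float]) -> Hashable:
    """Steepest ascent: move to the best-scoring neighbor while it strictly
    improves the score (higher is better); return the final state."""
    current = start
    while True:
        best = max(neighbors(current), key=score, default=None)
        if best is None or score(best) <= score(current):
            return current  # no improving neighbor: a local optimum
        current = best


# Hypothetical toy landscape: states 0..4 on a line, neighbors differ by 1.
landscape = [1, 3, 2, 5, 4]


def line_neighbors(i: int) -> list[int]:
    return [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]


local = hill_climb(0, line_neighbors, lambda i: landscape[i])
print(local, landscape[local])  # 1 3 (the global optimum is state 3, score 5)
```

Starting from state 0 the search stops at state 1, while starting from state 2 it reaches the global optimum at state 3; this dependence on the starting point is why heuristic searches are often repeated from several starting trees.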
Trees and Splits
Given a set X, a split is a partition of X into two non-empty subsets A and B (so A ∪ B = X and A ∩ B = ∅), written A | B.
For a phylogenetic tree T with leaves L, each edge e defines a split Le = A | B, where A and B are the leaf sets of the two subtrees obtained by removing e.
(Figure: removing edge e separates the leaves into A and B.)
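As an illustration (our own sketch, not from the lecture), Σ(T) can be computed by deleting each edge in turn and collecting the leaves on one side. The adjacency-dict tree representation, the example tree, and the function name `splits` are assumptions made for this sketch.

```python
def splits(adj: dict[int, set[int]]) -> set[frozenset]:
    """Return Σ(T): one split A | B per edge, stored as frozenset({A, B}).
    adj maps every node of an unrooted tree to its set of neighbors;
    leaves are the degree-1 nodes. Node labels must be comparable."""
    leaves = {x for x, nbrs in adj.items() if len(nbrs) == 1}
    result = set()
    for u in adj:
        for v in adj[u]:
            if u > v:
                continue  # handle each edge (u, v) only once
            # Depth-first search from v that never crosses back over (u, v).
            side, stack, seen = set(), [v], {u, v}
            while stack:
                x = stack.pop()
                if x in leaves:
                    side.add(x)
                for y in adj[x]:
                    if y not in seen:
                        seen.add(y)
                        stack.append(y)
            result.add(frozenset({frozenset(side), frozenset(leaves - side)}))
    return result


# Hypothetical 5-leaf tree: leaves 0-4, internal nodes 5, 6, 7.
tree = {0: {5}, 1: {5}, 2: {6}, 3: {7}, 4: {7},
        5: {0, 1, 6}, 6: {2, 5, 7}, 7: {3, 4, 6}}
sigma = splits(tree)
nontrivial = [s for s in sigma if min(len(side) for side in s) >= 2]
print(len(sigma), len(nontrivial))  # 7 2
```

A binary tree with n leaves has 2n − 3 edges, n of them pendant (trivial splits) and n − 3 internal, which matches the 7 and 2 printed here for n = 5.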
Computing the Splits Metric
A phylogenetic tree T defines a collection of splits Σ(T) = { Le | e is an edge in T }.
Theorem: The splits metric (the Robinson–Foulds distance) satisfies ρ(T1, T2) = |Σ(T1) \ Σ(T2)| + |Σ(T2) \ Σ(T1)| = |Σ(T1)| + |Σ(T2)| − 2|Σ(T1) ∩ Σ(T2)|.
Proof: (whiteboard)
Notation: A \ B = {x : x ∈ A, x ∉ B}
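A small worked check of the theorem, sketched in Python under the assumption that each split is stored as `frozenset({A, B})` so that A | B and B | A compare equal. The helper names `make_split` and `splits_metric` and the two example trees are hypothetical.

```python
def make_split(side, leaves):
    """Split side | (leaves - side), stored so that the order of sides is irrelevant."""
    side = frozenset(side)
    return frozenset({side, frozenset(leaves) - side})


def splits_metric(sigma1: set, sigma2: set) -> int:
    """Number of splits that occur in exactly one of the two trees."""
    return len(sigma1 - sigma2) + len(sigma2 - sigma1)


leaves = {0, 1, 2, 3, 4}
# Nontrivial splits only: each trivial split {x} | (leaves - {x}) occurs in
# every tree on these leaves, so it cancels in both forms of the formula.
t1 = {make_split({0, 1}, leaves), make_split({3, 4}, leaves)}
t2 = {make_split({0, 1}, leaves), make_split({2, 3}, leaves)}

rho = splits_metric(t1, t2)
alt = len(t1) + len(t2) - 2 * len(t1 & t2)
print(rho, alt)  # 2 2
```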
Nearest Neighbor Interchange
An NNI rearranges the four subtrees defined by one internal edge.
Claim: The number of NNI neighbors of an unrooted binary tree with n leaves is 2(n−3).
Proof: (whiteboard)
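One way to enumerate NNI neighbors, as a sketch assuming an unrooted binary tree stored as a dict from node to its set of neighbors (our representation; `nni_neighbors` and `_reattach` are hypothetical names). Around an internal edge (u, v), u carries subtrees {a, b} and v carries {c, d}; swapping b with c or with d gives the two alternative arrangements.

```python
def _reattach(t: dict[int, set[int]], x: int, old: int, new: int) -> None:
    """Detach node x from `old` and attach it to `new` (in place)."""
    t[x].remove(old)
    t[old].remove(x)
    t[x].add(new)
    t[new].add(x)


def nni_neighbors(adj: dict[int, set[int]]) -> list[dict[int, set[int]]]:
    """All NNI neighbors of an unrooted binary tree (node -> neighbor set)."""
    result = []
    for u in adj:
        for v in adj[u]:
            if u > v or len(adj[u]) != 3 or len(adj[v]) != 3:
                continue  # skip pendant edges; visit each internal edge once
            b = max(adj[u] - {v})  # either of u's other two subtrees works
            for w in sorted(adj[v] - {u}):
                t = {x: set(ys) for x, ys in adj.items()}  # copy the tree
                _reattach(t, b, u, v)  # subtree b moves to v ...
                _reattach(t, w, v, u)  # ... and subtree w moves to u
                result.append(t)
    return result


# Hypothetical 5-leaf tree: leaves 0-4, internal nodes 5, 6, 7.
tree = {0: {5}, 1: {5}, 2: {6}, 3: {7}, 4: {7},
        5: {0, 1, 6}, 6: {2, 5, 7}, 7: {3, 4, 6}}
nbrs = nni_neighbors(tree)
print(len(nbrs))  # 4, i.e. 2 * (5 - 3)
```

The count follows the claim directly: n − 3 internal edges, two alternatives each.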
Subtree Pruning and Regrafting (SPR)
1. Remove a branch, pruning off a subtree.
2. Reconnect the vertex incident to the pruned subtree by subdividing a branch of the remaining tree.
Claim: The number of SPR neighbors of an unrooted binary tree with n leaves is 2(n−3)(2n−7).
Proof: (whiteboard)
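A single SPR move, sketched on the same adjacency-dict representation (our assumption; `spr` and its parameters are hypothetical). The subtree hangs off internal node p through neighbor s; pruning removes p from its other two neighbors and joins them, and regrafting subdivides a target edge (a, b) with p.

```python
def spr(adj: dict[int, set[int]], p: int, s: int, a: int, b: int) -> dict[int, set[int]]:
    """Prune the subtree attached to internal node p through neighbor s,
    then regraft p onto edge (a, b) of the remaining tree.
    Preconditions (not checked): p has degree 3 and (a, b) lies outside
    the pruned subtree."""
    t = {x: set(ys) for x, ys in adj.items()}  # work on a copy
    x, y = t[p] - {s}
    # 1. Prune: cut p away from x and y, then join x and y directly so that
    #    no degree-2 vertex is left behind.
    for z in (x, y):
        t[p].remove(z)
        t[z].remove(p)
    t[x].add(y)
    t[y].add(x)
    # 2. Regraft: subdivide edge (a, b) with p, which is still attached to s.
    if b not in t[a]:
        raise ValueError(f"({a}, {b}) is not an edge of the pruned tree")
    t[a].remove(b)
    t[b].remove(a)
    for z in (a, b):
        t[p].add(z)
        t[z].add(p)
    return t


# Hypothetical 5-leaf tree: leaves 0-4, internal nodes 5, 6, 7.
tree = {0: {5}, 1: {5}, 2: {6}, 3: {7}, 4: {7},
        5: {0, 1, 6}, 6: {2, 5, 7}, 7: {3, 4, 6}}
# Prune leaf 0 (attached at node 5) and regraft it onto edge (3, 7).
new = spr(tree, p=5, s=0, a=3, b=7)
print(sorted(new[5]), sorted(new[6]), sorted(new[7]))  # [0, 3, 7] [1, 2, 7] [4, 5, 6]
```

Looping over every pruned subtree and every admissible target edge, and de-duplicating the resulting trees by their split sets, enumerates the SPR neighborhood.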
Tree Bisection and Reconnection (TBR)
1. Remove a branch, splitting the tree into two subtrees.
2. Reconnect the subtrees by adding a new branch that subdivides a branch in each of them.
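A TBR move restricted to an internal edge (u, v), sketched on the adjacency-dict representation used above (our assumption; `tbr`, `e1`, and `e2` are hypothetical names). After the cut, u and v have degree 2 and are suppressed; they are then reused as the two subdivision vertices and joined by the new branch.

```python
def tbr(adj: dict[int, set[int]], u: int, v: int,
        e1: tuple[int, int], e2: tuple[int, int]) -> dict[int, set[int]]:
    """Bisect at internal edge (u, v), then reconnect by subdividing edge e1
    of u's part with u and edge e2 of v's part with v, and joining u-v.
    Preconditions (not checked): u and v are internal (degree 3) and each
    e_i lies in the correct part after the cut."""
    t = {x: set(ys) for x, ys in adj.items()}  # work on a copy
    t[u].remove(v)  # 1. bisect: delete the branch (u, v)
    t[v].remove(u)
    for w in (u, v):  # suppress the two degree-2 vertices
        x, y = t[w]
        t[x].remove(w)
        t[y].remove(w)
        t[x].add(y)
        t[y].add(x)
        t[w] = set()
    for w, (a, b) in ((u, e1), (v, e2)):  # 2. subdivide one edge in each part
        if b not in t[a]:
            raise ValueError(f"({a}, {b}) is not an edge after the bisection")
        t[a].remove(b)
        t[b].remove(a)
        t[w] = {a, b}
        t[a].add(w)
        t[b].add(w)
    t[u].add(v)  # 3. reconnect with a new branch
    t[v].add(u)
    return t


# Hypothetical 5-leaf tree: leaves 0-4, internal nodes 5, 6, 7.
tree = {0: {5}, 1: {5}, 2: {6}, 3: {7}, 4: {7},
        5: {0, 1, 6}, 6: {2, 5, 7}, 7: {3, 4, 6}}
new = tbr(tree, 5, 6, e1=(0, 1), e2=(7, 3))
print(sorted(new[5]), sorted(new[6]), sorted(new[7]))  # [0, 1, 6] [3, 5, 7] [2, 4, 6]
```

Here u's part is just the cherry {0, 1}, so this particular TBR is also an SPR (the cherry is regrafted onto edge (7, 3)), consistent with the relationships listed next.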
Relationship between Operations
• Every NNI is an SPR and every SPR is a TBR.
• Every TBR is a single SPR or a composition of two SPRs.
• All three types of operations are invertible: if an operation α transforms T into T’, then an operation α⁻¹ of the same type transforms T’ back into T.
Theorem: For all T and T’ in B(n), the set of binary trees with n leaves, there is a sequence of NNI (or SPR or TBR) operations that transforms T into T’.
(Figure: the NNI, SPR, and TBR neighborhoods drawn as nested sets.)
Heuristic Search
PAUP* (a widely used phylogenetics package) includes the command:
hsearch nreps=num swap=type
where type is NNI, SPR, or TBR.
From Likelihood to Bayesian
Given data X = (x1, …, xn), we found the tree T and branch lengths t* that maximize the likelihood Pr[X | T, t*].
What about other trees?
Could we compute Pr[T, t* | X]?
Back to Coin Flipping
Flip a coin with unknown p = Pr[heads].
Earlier we computed the maximum likelihood estimate of p, maximizing L(p) = Pr[D | p].
Bayes' theorem instead gives the posterior: Pr[p | D] = Pr[p, D] / Pr[D] = Pr[D | p] Pr[p] / Pr[D].
(Figure: prior and posterior distributions of p for 11 tosses with 5 heads and for 44 tosses with 20 heads.)
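A numerical sketch of the coin example, assuming a uniform prior Pr[p] (the slide's prior is not recoverable) and approximating the posterior on a grid of p values. The names `grid_posterior` and `summarize` and the grid size are hypothetical choices.

```python
import math


def grid_posterior(heads: int, tosses: int, points: int = 1001):
    """Posterior Pr[p | D] on a grid of p values, assuming a uniform prior Pr[p]."""
    ps = [i / (points - 1) for i in range(points)]
    # Likelihood Pr[D | p] up to the binomial coefficient; the flat prior is constant.
    unnorm = [p ** heads * (1 - p) ** (tosses - heads) for p in ps]
    z = sum(unnorm)  # plays the role of Pr[D]; cheap here, unlike for trees
    return ps, [w / z for w in unnorm]


def summarize(ps, post):
    """Posterior mode, mean, and standard deviation from the grid."""
    mode = max(zip(post, ps))[1]
    mean = sum(p * w for p, w in zip(ps, post))
    sd = math.sqrt(sum((p - mean) ** 2 * w for p, w in zip(ps, post)))
    return mode, mean, sd


results = {}
for heads, tosses in [(5, 11), (20, 44)]:
    results[tosses] = summarize(*grid_posterior(heads, tosses))
    mode, mean, sd = results[tosses]
    print(f"{heads}/{tosses}: mode={mode:.3f} mean={mean:.3f} sd={sd:.3f}")
```

Both data sets have nearly the same maximum likelihood estimate (5/11 = 20/44), but the posterior from 44 tosses is noticeably narrower: the Bayesian view reports uncertainty, not just a point estimate.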
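Bayesian Methods
By Bayes' theorem,
Pr[T, t* | X] = Pr[X, T, t*] / Pr[X] = Pr[X | T, t*] Pr[T, t*] / Pr[X] = Pr[X | T, t*] Pr[T, t*] / (Σ_{T’, t’} Pr[X | T’, t’] Pr[T’, t’]),
where Pr[T, t*] is the prior and Pr[T, t* | X] is the posterior.
Problem: We cannot compute the denominator. It sums over every tree topology T’ (and over its branch lengths t’), and the number of topologies grows super-exponentially with the number of leaves.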
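To see why the denominator is out of reach, count its terms: the number of unrooted binary tree topologies on n ≥ 3 labeled leaves is (2n − 5)!! = 1 · 3 · 5 ⋯ (2n − 5). The function name below is our own.

```python
def num_unrooted_binary_trees(n: int) -> int:
    """(2n - 5)!!: number of unrooted binary topologies on n >= 3 labeled leaves."""
    count = 1
    for k in range(3, 2 * n - 4, 2):  # multiply 3 * 5 * ... * (2n - 5)
        count *= k
    return count


for n in (4, 5, 10, 20, 50):
    print(n, num_unrooted_binary_trees(n))  # n = 10 gives 2027025
```

Already at 50 leaves the sum would have more than 10^70 terms, each needing its own integral over branch lengths.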
Solution: Use the power of Markov chains to draw trees (“sample”) according to the distribution Pr[T, t* | X].
Markov Chain Monte Carlo
To sample from a distribution π:
1. Define a Markov chain with equilibrium distribution π.
2. Simulate the chain through many transitions. After many transitions (e.g. ~10,000), the chain will be (approximately) at equilibrium π (“burn-in”).
3. Output every n-th state (n ~ 50).
(Figure: Markov chain on the four nucleotides A, C, G, T.)
Example: Jukes-Cantor model of DNA
Equilibrium distribution: qA = qC = qG = qT = 1/4
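The recipe above (burn-in, then output every n-th state) can be sketched on a discrete-time Jukes-Cantor-style chain. The per-step change probability `mu`, the seed, and the run lengths are hypothetical choices; the chain is symmetric, so its equilibrium is uniform.

```python
import random
from collections import Counter

BASES = "ACGT"


def jukes_cantor_step(base: str, mu: float, rng: random.Random) -> str:
    """With probability mu switch to one of the other three bases,
    chosen uniformly; otherwise keep the current base."""
    if rng.random() < mu:
        return rng.choice([b for b in BASES if b != base])
    return base


def sample_chain(steps: int = 200_000, burn_in: int = 10_000, thin: int = 50,
                 mu: float = 0.3, seed: int = 1) -> list[str]:
    rng = random.Random(seed)
    state = "A"  # deliberately start away from equilibrium
    samples = []
    for i in range(steps):
        state = jukes_cantor_step(state, mu, rng)
        # Discard the burn-in, then keep every thin-th state.
        if i >= burn_in and (i - burn_in) % thin == 0:
            samples.append(state)
    return samples


samples = sample_chain()
freqs = {b: c / len(samples) for b, c in Counter(samples).items()}
print(freqs)  # each frequency should be close to 1/4
```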
MCMC on Trees
(Figure: NNI neighborhood for trees with 5 leaves.)
1. Define a Markov chain:
• States are trees T.
• Equilibrium distribution is the posterior Pr[T, t* | X].
2. Simulate the Markov chain for many steps (burn-in).
3. Output T from every n-th (e.g. n = 50) step.
For transitions, we can use NNI, SPR, TBR, or other operations.
We can define* the transition probabilities of this Markov chain without computing Z = Σ_{T’, t’} Pr[X | T’, t’] Pr[T’, t’] (Metropolis algorithm).
*“involves burning of incense, casting of chicken bones, use of magical incantations, and invoking the opinions of more prestigious colleagues.” -- Felsenstein
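A minimal Metropolis sketch on a toy state space rather than trees (a deliberate simplification): states 0..3 on a cycle, a symmetric ±1 proposal, and acceptance probability min(1, w(new)/w(old)), where w is only proportional to the target. The weights, seed, and run lengths are hypothetical.

```python
import random
from collections import Counter
from typing import Callable


def metropolis(weight: Callable[[int], float], n_states: int,
               steps: int = 300_000, burn_in: int = 10_000,
               thin: int = 50, seed: int = 0) -> list[int]:
    """Metropolis sampler on states 0..n_states-1 arranged in a cycle.
    weight(i) only needs to be proportional to the target probability:
    the normalizing constant Z cancels in the acceptance ratio."""
    rng = random.Random(seed)
    state, samples = 0, []
    for i in range(steps):
        proposal = (state + rng.choice((-1, 1))) % n_states  # symmetric proposal
        if rng.random() < min(1.0, weight(proposal) / weight(state)):
            state = proposal
        if i >= burn_in and (i - burn_in) % thin == 0:
            samples.append(state)
    return samples


w = [1.0, 2.0, 3.0, 4.0]  # unnormalized target; Z = 10 is never used
samples = metropolis(lambda i: w[i], len(w))
counts = Counter(samples)
freqs = [counts[i] / len(samples) for i in range(len(w))]
print(freqs)  # roughly [0.1, 0.2, 0.3, 0.4]
```

For trees, the weight would be the unnormalized posterior Pr[X | T, t] Pr[T, t] and the proposal a uniformly random NNI. Because every binary tree with n leaves has the same number 2(n−3) of NNI neighbors, that proposal is symmetric on topologies as well.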
How Many Times Did Wings Evolve?
• Previous studies had shown loss of wings: winged → wingless transitions.
• Gain of wings (wingless → winged transition) appears to be much more complicated.
Phylogeny of Insects
Built a phylogeny of winged and wingless stick insects.
Used data from:
• 18S ribosomal DNA (~1,900 base pairs (bp))
• 28S rDNA (2,250 bp)
• Portion of histone 3 (H3, 372 bp)
Used multiple tree reconstruction techniques.
(Nature 2003)
Most Parsimonious Evolutionary Tree of Winged and Wingless Insects
• All most parsimonious reconstructions gave a wingless ancestor.
• All required multiple transitions between winged and wingless forms.
Will Wingless Insects Fly Again?
• All most parsimonious reconstructions required the re-invention of wings.
• It is likely that wing developmental pathways are conserved in wingless stick insects.
Next Questions
• How do we combine/merge trees?
• How do we determine “confidence” in a particular tree/branch?
Multiple Trees?
Consensus Trees
Strict Consensus Tree
Strict Consensus
No non‐trivial splits in common! Strict consensus tree is unresolved.
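The strict consensus tree keeps exactly the splits present in every input tree. A sketch assuming each tree is given by its set of nontrivial splits, stored as `frozenset({A, B})`; the three 5-leaf trees are hypothetical, chosen so that, as on the slide, no nontrivial split is shared by all of them.

```python
def make_split(side, leaves):
    """Split side | (leaves - side), stored so that the order of sides is irrelevant."""
    side = frozenset(side)
    return frozenset({side, frozenset(leaves) - side})


def strict_consensus(split_sets):
    """Splits that occur in every input tree."""
    return set.intersection(*(set(s) for s in split_sets))


leaves = set(range(5))
t1 = {make_split({0, 1}, leaves), make_split({3, 4}, leaves)}
t2 = {make_split({0, 2}, leaves), make_split({3, 4}, leaves)}
t3 = {make_split({1, 2}, leaves), make_split({0, 4}, leaves)}

print(strict_consensus([t1, t2]))      # only the split {3, 4} | {0, 1, 2}
print(strict_consensus([t1, t2, t3]))  # set(): the consensus is a star tree
```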
Splits Equivalence Theorem
A phylogenetic tree T defines a collection of splits Σ(T) = { Le | e is an edge in T }.
Splits A1 | B1 and A2 | B2 are pairwise compatible if at least one of A1 ∩ A2, A1 ∩ B2, B1 ∩ A2, and B1 ∩ B2 is the empty set.
Splits Equivalence Theorem: Let Σ be a collection of splits. There is a phylogenetic tree T such that Σ(T) = Σ if and only if the splits in Σ are pairwise compatible.
The Pairwise Compatibility Theorem (for binary characters) follows from this theorem.
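The compatibility test translates directly into code. A sketch assuming splits are stored as `frozenset({A, B})`; pairing every side of one split with every side of the other produces exactly the four intersections in the definition. The helper names and example splits are hypothetical.

```python
from itertools import combinations


def make_split(side, leaves):
    """Split side | (leaves - side), stored so that the order of sides is irrelevant."""
    side = frozenset(side)
    return frozenset({side, frozenset(leaves) - side})


def compatible(s1, s2) -> bool:
    """True if at least one of A1∩A2, A1∩B2, B1∩A2, B1∩B2 is empty."""
    return any(not (x & y) for x in s1 for y in s2)


def pairwise_compatible(splits) -> bool:
    """By the Splits Equivalence Theorem: do these splits come from one tree?"""
    return all(compatible(s, t) for s, t in combinations(splits, 2))


leaves = set(range(5))
s01 = make_split({0, 1}, leaves)
s34 = make_split({3, 4}, leaves)
s02 = make_split({0, 2}, leaves)
print(compatible(s01, s34), compatible(s01, s02))  # True False
```

Here {0, 1} ∩ {3, 4} = ∅, so the first pair is compatible; for {0, 1} | {2, 3, 4} and {0, 2} | {1, 3, 4} all four intersections are non-empty, so no tree displays both.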
Majority Consensus Tree
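The standard majority-rule consensus keeps every split that occurs in more than half of the input trees. A sketch on the same hypothetical trees and `frozenset({A, B})` split representation:

```python
from collections import Counter


def make_split(side, leaves):
    """Split side | (leaves - side), stored so that the order of sides is irrelevant."""
    side = frozenset(side)
    return frozenset({side, frozenset(leaves) - side})


def majority_consensus(split_sets):
    """Splits that occur in more than half of the input trees."""
    counts = Counter(s for splits in split_sets for s in set(splits))
    return {s for s, c in counts.items() if 2 * c > len(split_sets)}


leaves = set(range(5))
t1 = {make_split({0, 1}, leaves), make_split({3, 4}, leaves)}
t2 = {make_split({0, 2}, leaves), make_split({3, 4}, leaves)}
t3 = {make_split({1, 2}, leaves), make_split({0, 4}, leaves)}
print(majority_consensus([t1, t2, t3]))  # only {3, 4} | {0, 1, 2} (2 of 3 trees)
```

Unlike the strict consensus, this resolves one edge. The result is always a tree: any two majority splits each occur in more than half of the trees, so some input tree contains both, which makes them compatible, and the Splits Equivalence Theorem applies.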