Rooting Phylogenetic Trees with Non-reversible Substitution Models

Date post: 14-Jan-2016

Von Bing Yap* and Terry Speed§

*Statistics and Applied Probability, National University of Singapore

§Statistics, University of California

Reference: BMC Evolutionary Biology 5:1 (2005)

Molecular Phylogenetics

• From alignments to trees.

• Many methods: parsimony, distance, stochastic models.

Reversible Models

• Almost all substitution models in common use are reversible: for example, Pr(anc=A, des=C) = Pr(anc=C, des=A).

• Rooted trees that give the same unrooted tree then have the same likelihood, so they are indistinguishable.

Stationary Models

• Character states have the same frequencies everywhere on the tree.

• Root can be identified when the model is not reversible (Yang 1994, Huelsenbeck et al. 2001).

Nonstationary Models

• Yang and Roberts (1995)

• Galtier and Gouy (1998)

[Figure: nested classes of substitution models, with REVERSIBLE inside STATIONARY inside NON-STATIONARY]

The Simplest NSTA Model

• Parameters:

  - rooted tree topology

  - θ: base frequencies at the root

  - Q: rate matrix (calibrated)

  - branch lengths

• No relationship is assumed between θ and Q.

Specialisations

• If θ is the equilibrium distribution of Q, get STA.

• If in addition, Q satisfies the detailed balance conditions, get REV.
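
The two specialisations above can be checked numerically. The following is a minimal sketch, not the authors' software: the function name `classify_process` and the tolerance are assumptions made for illustration. It tests whether θQ = 0 (θ is an equilibrium distribution of Q) and, if so, whether θ_i Q_ij = θ_j Q_ji for all pairs of states (detailed balance).

```python
def classify_process(theta, Q, tol=1e-9):
    """Return 'REV', 'STA' or 'NSTA' for root frequencies theta and rate matrix Q.

    Hypothetical helper for illustration; theta is a list of probabilities and
    Q a square list-of-lists whose rows sum to zero.
    """
    n = len(theta)
    # Stationarity: theta is an equilibrium distribution of Q iff theta Q = 0.
    flow = [sum(theta[i] * Q[i][j] for i in range(n)) for j in range(n)]
    if any(abs(f) > tol for f in flow):
        return "NSTA"
    # Detailed balance: theta_i Q_ij = theta_j Q_ji for every pair of states.
    balanced = all(
        abs(theta[i] * Q[i][j] - theta[j] * Q[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )
    return "REV" if balanced else "STA"


# Toy rate matrices (assumptions, not estimates from the data sets).
uniform = [0.25, 0.25, 0.25, 0.25]
jukes_cantor = [[-1.0 if i == j else 1.0 / 3 for j in range(4)] for i in range(4)]
# Cyclic process A -> C -> G -> T -> A: stationary under uniform theta, not reversible.
cyclic = [[-1.0 if i == j else (1.0 if j == (i + 1) % 4 else 0.0) for j in range(4)]
          for i in range(4)]

print(classify_process(uniform, jukes_cantor))               # REV
print(classify_process(uniform, cyclic))                     # STA
print(classify_process([0.4, 0.3, 0.2, 0.1], jukes_cantor))  # NSTA
```

The third call shows that the same Q can belong to an NSTA process simply because the root frequencies are not its equilibrium distribution.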

Probability of alignment

• Felsenstein’s pruning algorithm can be used to compute the probability of one site.

• Assuming sites evolve independently, multiplying across sites gives the probability of the alignment.
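
A minimal sketch of this computation for a rooted tree with root frequencies θ and rate matrix Q follows. It is an illustration under stated assumptions, not the authors' implementation: the tree encoding (an internal node is a list of `(child, branch_length)` pairs, a leaf is a taxon name), the helper names, and the truncated-series matrix exponential are all choices made here for a self-contained example.

```python
import math


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(Q, t, squarings=10, terms=12):
    """exp(Q t) by scaling and squaring with a truncated Taylor series."""
    n = len(Q)
    scale = t / (2 ** squarings)
    M = [[Q[i][j] * scale for j in range(n)] for i in range(n)]
    P = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in P]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, M)]  # M^k / k!
        P = [[P[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        P = mat_mul(P, P)
    return P


def partial(node, site, Q):
    """L[x] = Pr(observed states below node | state x at node)."""
    n = len(Q)
    if isinstance(node, str):  # leaf: site maps taxon name -> state index
        return [1.0 if x == site[node] else 0.0 for x in range(n)]
    L = [1.0] * n
    for child, t in node:
        P = mat_exp(Q, t)
        Lc = partial(child, site, Q)
        L = [L[x] * sum(P[x][y] * Lc[y] for y in range(n)) for x in range(n)]
    return L


def site_likelihood(tree, site, theta, Q):
    """Weight the root's conditional likelihoods by the root frequencies theta."""
    return sum(th * l for th, l in zip(theta, partial(tree, site, Q)))


def alignment_loglik(tree, sites, theta, Q):
    """Sum of per-site log-likelihoods (sites assumed independent)."""
    return sum(math.log(site_likelihood(tree, s, theta, Q)) for s in sites)


# Toy inputs (assumptions): Jukes-Cantor rates, arbitrary theta and branch lengths.
JC = [[-1.0 if i == j else 1.0 / 3 for j in range(4)] for i in range(4)]
theta = [0.4, 0.3, 0.2, 0.1]
tree = [("human", 0.1), ([("mouse", 0.05), ("rat", 0.05)], 0.2)]
sites = [{"human": 0, "mouse": 0, "rat": 0}, {"human": 0, "mouse": 1, "rat": 1}]
print(alignment_loglik(tree, sites, theta, JC))
```

Because θ enters only at the root, moving the root changes the likelihood unless θ and Q make the process reversible. The sketch recomputes exp(Qt) for every site; a practical implementation would cache one transition matrix per branch.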

Tree Inference

• Fix a rooted tree.

• Find the most likely parameter values.

• The maximum likelihood is the support of the tree.

• Choose tree with highest support.

Site Heterogeneity

• Codon positions, secondary structure.

• Deterministic or random relative rates can be accommodated in the model.

• Two deterministic models: codon position, and codon position + fast/slow.

Two deterministic models

• codon: 3 fixed unknown rates, corresponding to codon positions, with weighted average 1.

• codonsite: get two classes of amino acids (fast/slow) from CLUSTAL alignment output. Coupled with codon positions, get 6 unknown rates with weighted average 1.
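
The "weighted average 1" constraint can be sketched as follows, assuming the weights are the proportions of sites in each class. The function name and the example rates are hypothetical; in the deterministic models the rates themselves are unknowns estimated by maximum likelihood.

```python
def normalise_rates(raw_rates, class_sizes):
    """Rescale relative rates so that their site-weighted average equals 1.

    raw_rates[c] is the unnormalised rate of class c and class_sizes[c] the
    number of alignment sites in that class (both hypothetical inputs).
    """
    total_sites = sum(class_sizes)
    mean = sum(r * n for r, n in zip(raw_rates, class_sizes)) / total_sites
    return [r / mean for r in raw_rates]


# Three codon-position classes with made-up raw rates and equal class sizes.
sizes = [100, 100, 100]
rates = normalise_rates([1.0, 0.5, 4.0], sizes)
weighted_average = sum(r * n for r, n in zip(rates, sizes)) / sum(sizes)
print(rates, weighted_average)
```

A site in class c then evolves along a branch of length t as if the branch had length rates[c] × t, so the branch lengths keep their meaning as average numbers of substitutions per site.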

Test Data Sets

• A: human, chimp, gorilla

• B: human, mouse, rat

• C: human, chimp, gorilla, orangutan

• D: human, chimp, mouse, rat

• E: human, mouse, chicken, frog

• 13 mitochondrial protein-coding genes

Method

• Unrooted tree is assumed known.

• For each rooted tree consistent with the unrooted tree, its support is the maximised log-likelihood, obtained by finding the MLE of the process parameters and branch lengths.
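
The rooted trees consistent with a given unrooted tree are obtained by placing the root on each branch in turn, which gives 3 candidates for three taxa and 5 for four taxa. A minimal sketch of that enumeration follows; the edge-list encoding, the internal node labels "x" and "y", and the function name `rootings` are assumptions for illustration, and the four-taxon topology is a toy input.

```python
from collections import defaultdict


def rootings(edges):
    """Yield one rooted tree (nested tuples of taxon names) per branch."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def subtree(node, parent):
        children = [c for c in adj[node] if c != parent]
        if not children:
            return node  # leaf: return the taxon name
        return tuple(subtree(c, node) for c in children)

    for u, v in edges:
        # Placing the root on branch (u, v) splits the tree into two subtrees.
        yield (subtree(u, v), subtree(v, u))


# Toy unrooted tree on four taxa; "x" and "y" are hypothetical internal nodes.
unrooted = [("human", "x"), ("chimp", "x"), ("x", "y"),
            ("gorilla", "y"), ("orangutan", "y")]
rooted = list(rootings(unrooted))
for tree in rooted:
    print(tree)
```

Each candidate would then be scored by maximising the likelihood over the process parameters and branch lengths, which this sketch does not attempt.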

Method (continued)

• Three processes: REV, STA, NSTA

• Three site models: novar (no rate variation among sites), codon (3 classes), codonsite (6 classes).

Method (continued)

• Two outcomes

(a) number of genes for which the correct rooted tree is the most likely

(b) whether the model gets the right rooted tree when the log-likelihoods are summed over genes
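
Both outcomes can be computed from a table of maximised log-likelihoods, one value per gene and candidate rooted tree. The sketch below is illustrative only: the function name, the gene and tree labels, and every number in the toy table are made up and are not results from the study.

```python
def evaluate(loglik_by_gene, correct):
    """Return (number of genes preferring `correct`, combined choice is correct?).

    loglik_by_gene maps gene -> {rooted tree label: maximised log-likelihood}.
    """
    # Outcome (a): count genes whose most likely rooted tree is the correct one.
    successes = sum(
        1 for ll in loglik_by_gene.values() if max(ll, key=ll.get) == correct
    )
    # Outcome (b): sum log-likelihoods over genes, then pick the best tree.
    combined = {}
    for ll in loglik_by_gene.values():
        for tree, value in ll.items():
            combined[tree] = combined.get(tree, 0.0) + value
    return successes, max(combined, key=combined.get) == correct


# Made-up values for three hypothetical genes and three candidate rootings.
toy = {
    "gene1": {"T1": -1000.0, "T2": -1003.0, "T3": -1004.0},
    "gene2": {"T1": -2001.0, "T2": -2000.0, "T3": -2005.0},
    "gene3": {"T1": -1500.0, "T2": -1502.5, "T3": -1501.0},
}
print(evaluate(toy, "T1"))
```

Summing log-likelihoods over genes treats the genes as independent given a shared rooted topology, with separate parameters fitted to each gene.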

Number of successes

Site model   Process    A    B    C    D    E
novar        NSTA       5   11   12   12    0
novar        STA        4    7    3    4    4
codon        NSTA       9   13   11   12    9
codon        STA        3    6    2    2    5
codonsite    NSTA       9   12   10   12    8
codonsite    STA        3    5    1    5    6

Combined genes: Does it get the right tree?

Site model   Process    A    B    C    D    E
novar        NSTA       N    Y    Y    Y    N
novar        STA        Y    N    N    Y    N
codon        NSTA       Y    Y    Y    Y    Y
codon        STA        Y    N    N    Y    Y
codonsite    NSTA       Y    Y    Y    Y    Y
codonsite    STA        Y    N    N    N    N

Discussion (1)

• In general, NSTA fits much better than STA, which fits much better than REV, by the likelihood ratio test criterion.

• Not only does NSTA get the right tree more often than STA, it is also more discriminative: the best tree has much larger support compared to the other trees.
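
A likelihood ratio comparison of STA against NSTA on a fixed rooted tree can be sketched as below. It assumes the usual chi-square approximation with 3 degrees of freedom (the three extra root-frequency parameters); the slides do not state how the test was calibrated, and the log-likelihood values in the example are made up.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-h) * sum_{k < m} h^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= h / k
            total += term
        return math.exp(-h) * total
    # Odd df = 2m + 1: erfc(sqrt(h)) + exp(-h) * sum_{k=1..m} h^(k-1/2) / Gamma(k+1/2)
    total = math.erfc(math.sqrt(h))
    term = math.exp(-h) * math.sqrt(h) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= h / (k + 0.5)
    return total


def likelihood_ratio_test(loglik_null, loglik_alt, df):
    """Return the statistic 2 * (l_alt - l_null) and its chi-square p-value."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return stat, chi2_sf(stat, df)


# Made-up maximised log-likelihoods for one gene under STA (null) and NSTA.
stat, p_value = likelihood_ratio_test(-5230.4, -5218.9, df=3)
print(stat, p_value)
```

The closed forms in `chi2_sf` avoid any dependency outside the standard library; with SciPy available, `scipy.stats.chi2.sf` would serve the same purpose.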

Discussion (2)

• The codonsite model of site variation is very crude, and this may explain why its performance is worse than that of the codon model.

• Need to use better methods. Also need to compare with random-rate models, such as discrete gamma.

Discussion (3)

• NSTA has only 3 more parameters than STA, and 6 more than REV, so the extra computation is not heavy.

• Also, since it is possible to identify the root, perhaps NSTA should be used routinely.
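
The parameter differences quoted above can be reproduced by a simple count for four nucleotide states. The sketch assumes that REV is the general reversible model, that STA and NSTA use an unrestricted rate matrix, and that calibrating Q removes one degree of freedom; tree topology and branch lengths are not counted.

```python
def free_process_parameters(process, n_states=4):
    """Count free substitution-process parameters (branch lengths excluded)."""
    off_diagonal = n_states * (n_states - 1)   # 12 rates for nucleotides
    general_q = off_diagonal - 1               # calibration fixes the overall scale
    if process == "REV":
        # Symmetric exchangeabilities plus equilibrium frequencies, minus the scale.
        return n_states * (n_states - 1) // 2 + (n_states - 1) - 1
    if process == "STA":
        # Unrestricted calibrated Q; theta is determined by Q.
        return general_q
    if process == "NSTA":
        # Unrestricted calibrated Q plus free root frequencies theta.
        return general_q + (n_states - 1)
    raise ValueError(f"unknown process: {process}")


counts = {p: free_process_parameters(p) for p in ("REV", "STA", "NSTA")}
print(counts)
print(counts["NSTA"] - counts["STA"], counts["NSTA"] - counts["REV"])
```

Under these assumptions the counts are 8, 11 and 14, which gives the differences of 3 and 6 stated on the slide.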

Discussion (4)

• Constraint on NSTA: base compositions of sequences that are equally distant from the root are the same. This may not hold.

• Software freely available upon request. Email [email protected]

