
New Approaches for Inferring the Tree of Life

Date post: 06-Jan-2016
Page 1: New Approaches for Inferring the Tree of Life

New Approaches for Inferring the Tree of Life

Tandy Warnow
Associate Professor
Department of Computer Sciences
Graduate Program in Ecology, Evolution, and Behavior
Co-Director, The Center for Computational Biology and Bioinformatics
The University of Texas at Austin

Page 2: New Approaches for Inferring the Tree of Life

Packard Proposal 1996

I observed that DNA and RNA sequences are low in phylogenetic signal, as currently analyzed, and

I proposed to seek out and model new sources of significant phylogenetic signal, and then develop efficient algorithms to extract that signal, so that the inference of evolutionary history could be made with greater accuracy.

Page 3: New Approaches for Inferring the Tree of Life

What I did instead

• Developed methods for use with biomolecular sequences that recover the true tree with high probability from polynomial length sequences.

• (Last two years): Developed methods for reconstructing phylogenies from gene order and content within whole genomes.

• (Last year): Started looking at inferring non-tree models of evolution.

Page 4: New Approaches for Inferring the Tree of Life

DNA Sequence Evolution

[Figure: a tree with a time axis (-3 mil yrs, -2 mil yrs, -1 mil yrs, today). The ancestral sequence AAGACTT evolves through the intermediate sequences TGGACTT and AAGGCCT, then AGGGCAT, TAGCCCT, and AGCACTT, into the present-day sequences AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, and AGCGCTT.]
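The process pictured on this slide can be sketched in a few lines, assuming a deliberately simple substitution model: on every edge, each site independently changes to a different base with a fixed probability. This is an illustration only, not the model used in the experiments later in the talk. The function names, the nested-tuple tree encoding, and the leaf names s1..s5 are hypothetical.

```python
import random

BASES = "ACGT"

def mutate(seq, p, rng):
    """Copy seq, changing each site to a different base with probability p."""
    out = []
    for ch in seq:
        if rng.random() < p:
            out.append(rng.choice([b for b in BASES if b != ch]))
        else:
            out.append(ch)
    return "".join(out)

def evolve(tree, seq, p, rng):
    """Evolve seq down a tree given as nested tuples with string leaf names.

    Returns a dict mapping each leaf name to its present-day sequence.
    """
    if isinstance(tree, str):          # reached a leaf
        return {tree: seq}
    leaves = {}
    for child in tree:                 # one independent mutation pass per edge
        leaves.update(evolve(child, mutate(seq, p, rng), p, rng))
    return leaves

rng = random.Random(0)                 # fixed seed so runs are repeatable
tree = (("s1", "s2"), ("s3", ("s4", "s5")))
for name, s in evolve(tree, "AAGACTT", 0.2, rng).items():
    print(name, s)
```

Phylogeny reconstruction is the inverse problem: given only the leaf sequences, recover the tree.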

Page 5: New Approaches for Inferring the Tree of Life

Major Phylogenetic Reconstruction Methods

• Polynomial-time distance-based methods (neighbor joining, perhaps the most popular)
• NP-hard sequence-based methods that can take years on real datasets
– Maximum Parsimony
– Maximum Likelihood
• Heated debates over the relative performance of these methods

Page 6: New Approaches for Inferring the Tree of Life

Quantifying Error

FN: false negative (missing edge)
FP: false positive (incorrect edge)

[Figure: a true tree and an inferred tree with the FN and FP edges marked; 50% error rate]
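A minimal sketch of how FN and FP rates can be computed, assuming each tree is represented by the set of bipartitions (splits) of the taxa induced by its internal edges. The helper names and the five-taxon example are hypothetical and are not the trees from the slide's figure.

```python
def canonical_split(side, taxa):
    """Represent the split side | rest by the half that omits the smallest taxon."""
    side, taxa = frozenset(side), frozenset(taxa)
    return taxa - side if min(taxa) in side else side

def error_rates(true_splits, est_splits):
    """Return (FN rate, FP rate) for two trees given as sets of internal-edge splits."""
    fn = len(true_splits - est_splits) / len(true_splits)   # true edges that were missed
    fp = len(est_splits - true_splits) / len(est_splits)    # inferred edges that are wrong
    return fn, fp

taxa = "ABCDE"
# True tree has internal edges AB|CDE and ABC|DE.
true_tree = {canonical_split(s, taxa) for s in ("AB", "ABC")}
# Inferred tree has internal edges AB|CDE and ABD|CE.
est_tree = {canonical_split(s, taxa) for s in ("AB", "ABD")}

print(error_rates(true_tree, est_tree))  # (0.5, 0.5)
```

When both trees are fully resolved they have the same number of internal edges, so the FN and FP rates coincide, as in this example.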

Page 7: New Approaches for Inferring the Tree of Life

Absolute fast convergence vs. exponential convergence

Page 8: New Approaches for Inferring the Tree of Life

DCM-Boosting [Warnow et al. 2001]

• DCM+SQS is a two-phase procedure which reduces the sequence length requirement of methods.

[Diagram: exponentially converging method → DCM → SQS → absolute fast converging method]

• We modify the second phase to improve the empirical performance, replacing SQS with ML (maximum likelihood) or MP (maximum parsimony).

Page 9: New Approaches for Inferring the Tree of Life

DCMNJ+ML vs. other methods on a fixed model tree

• 500-taxon rbcL tree
• K2P+ model (=2, =1)
• Avg. branch length = 0.278
• Relative performance is typical in our studies

Page 10: New Approaches for Inferring the Tree of Life

Comparison of methods on random trees as a function of number of taxa

• K2P+ model (=2, =1)
• Avg. branch length = 0.05
• Seq. length = 1000

Page 11: New Approaches for Inferring the Tree of Life

Summary

• These are the first polynomial time methods that improve upon NJ (with respect to topological accuracy) and are never worse than NJ.

• The advantage obtained with DCMNJ+MP and DCMNJ+ML increases with number of taxa.

• In practice these new methods are slower than NJ (minutes vs. seconds), but still much faster than MP and ML (which can take days).

• Conjecture: DCMNJ+ML is absolute fast converging (AFC).

Page 12: New Approaches for Inferring the Tree of Life

II. Whole-Genome Phylogeny

[Figure: a phylogeny over genomes A, B, C, D, E, and F, with nodes labelled W, X, Y, and Z]

Page 13: New Approaches for Inferring the Tree of Life

Genomes As Signed Permutations

1 -5 3 4 -2 -6
or
6 2 -4 -3 5 -1
etc.

Page 14: New Approaches for Inferring the Tree of Life

Genomes Evolve by Rearrangements

1 2 3 4 5 6 7 8 9 10

Inversion: 1 2 3 -8 -7 -6 -5 -4 9 10

Transposition: 1 2 3 9 4 5 6 7 8 10

Inverted Transposition: 1 2 3 9 -8 -7 -6 -5 -4 10
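The three operations on this slide can be sketched on a genome stored as a Python list of signed integers. This is an illustrative sketch: the function names and the 0-based, half-open index convention are assumptions, and the genome is treated as linear for simplicity.

```python
def inversion(genome, i, j):
    """Reverse the segment genome[i:j] in place and flip the sign of each gene."""
    return genome[:i] + [-g for g in reversed(genome[i:j])] + genome[j:]

def transposition(genome, i, j, k):
    """Move the segment genome[i:j] so it sits just before original position k (k >= j)."""
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]

def inverted_transposition(genome, i, j, k):
    """Like transposition, but the moved segment is also reversed and sign-flipped."""
    moved = [-g for g in reversed(genome[i:j])]
    return genome[:i] + genome[j:k] + moved + genome[k:]

genome = list(range(1, 11))                    # 1 2 3 4 5 6 7 8 9 10
print(inversion(genome, 3, 8))                 # [1, 2, 3, -8, -7, -6, -5, -4, 9, 10]
print(transposition(genome, 3, 8, 9))          # [1, 2, 3, 9, 4, 5, 6, 7, 8, 10]
print(inverted_transposition(genome, 3, 8, 9)) # [1, 2, 3, 9, -8, -7, -6, -5, -4, 10]
```

Each call acts on genes 4 through 8 and reproduces the corresponding example from the slide.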

Page 15: New Approaches for Inferring the Tree of Life

Genome Rearrangement Has A Huge State Space

• DNA sequences: 4 states per site
• Signed circular genomes with n genes: $2^{n-1}(n-1)!$ states, 1 site
• Circular genomes (1 site)
– with 37 genes: $2.56 \times 10^{52}$ states
– with 120 genes: $3.70 \times 10^{232}$ states
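The counts on this slide follow from the formula 2^(n-1) (n-1)! for signed circular genomes with n genes. There are 2^n n! signed linear orderings, and each circular genome is written 2n ways (n rotations, each readable in either direction, as on the signed-permutation slide). A short check, with a hypothetical function name:

```python
from math import factorial

def signed_circular_genome_count(n):
    """Number of distinct signed circular genomes on n genes: 2^(n-1) * (n-1)!."""
    return 2 ** (n - 1) * factorial(n - 1)

for n in (37, 120):
    # Python integers are exact; the format spec only rounds for display.
    print(n, f"{signed_circular_genome_count(n):.2e}")
```

Running this prints 2.56e+52 for 37 genes and 3.70e+232 for 120 genes, matching the slide.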

Page 16: New Approaches for Inferring the Tree of Life

Our Approaches

• Statistically-based genomic distance estimators so that NJ analyses are more accurate, recovering 90% of the edges even for datasets close to saturation.

• Improved bounds for tree length.

• GRAPPA: high performance implementation for the maximum parsimony problems for rearranged genomes, achieving up to 200,000-fold speedup.

Page 17: New Approaches for Inferring the Tree of Life

Accuracy of Neighbor Joining Using Distance Estimators

• 120 genes
• Inversion-only evolution (other models of evolution show the same relative performance)
• 10, 20, 40, 80, and 160 genomes

Page 18: New Approaches for Inferring the Tree of Life

Consensus of 216 MP Trees for the Campanulaceae dataset

Strict Consensus of 216 trees; 6 out of 10 internal edges recovered.

[Figure: strict consensus tree over the taxa Trachelium, Campanula, Adenophora, Symphandra, Legousia, Asyneuma, Triodanus, Wahlenbergia, Merciera, Codonopsis, Cyananthus, Platycodon, and Tobacco]
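A strict consensus keeps only the internal edges that appear in every input tree. A fully resolved unrooted tree on 13 taxa has 13 - 3 = 10 internal edges, which is why the slide reports recovery out of 10. A minimal sketch, assuming each tree is given as a set of splits with each split stored as a frozenset of the taxa on one fixed side. The function name and the small five-taxon example are hypothetical, not the Campanulaceae data.

```python
def strict_consensus(trees):
    """Return the splits present in every tree (each tree is a set of frozensets)."""
    trees = [set(t) for t in trees]
    return set.intersection(*trees)

# Three hypothetical trees on taxa A..E, each with 5 - 3 = 2 internal edges.
t1 = {frozenset("CDE"), frozenset("DE")}
t2 = {frozenset("CDE"), frozenset("CE")}
t3 = {frozenset("CDE"), frozenset("CD")}

consensus = strict_consensus([t1, t2, t3])
print(len(consensus), "of", 5 - 3, "internal edges recovered")  # 1 of 2 internal edges recovered
```

Only the split shared by all three trees survives, so the consensus is less resolved than any single input tree.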

Page 19: New Approaches for Inferring the Tree of Life

Future Work

• New focus on Rare Genomic Changes
– New data
– New models
– New methods

• New techniques for large-scale analyses
– Divide-and-conquer methods
– Non-tree models
– Visualization of large trees and large sets of trees

Page 20: New Approaches for Inferring the Tree of Life

Acknowledgements

• Funding: The David and Lucile Packard Foundation, The National Science Foundation, and Paul Angello

• Collaborators:
– Robert Jansen (U. Texas)
– Bernard Moret, David Bader, Mi-Yan (U. New Mexico)
– Daniel Huson (Celera)
– Katherine St. John (CUNY)
– Linda Raubeson (Central Washington U.)
– Luay Nakhleh, Usman Roshan, Jerry Sun, Li-San Wang, Stacia Wyman (Phylolab, U. Texas)

Page 21: New Approaches for Inferring the Tree of Life

Phylolab, U. Texas

Please visit us at http://www.cs.utexas.edu/users/phylo/

