
1

Tricks for trees: Having reconstructed phylogenies, what can we do with them?

DIMACS, June 2006

Mike Steel

Allan Wilson Centre for Molecular Ecology and Evolution

Biomathematics Research Centre

University of Canterbury, Christchurch, New Zealand

2

Where are phylogenetic trees used?

• Evolutionary biology – species relationships, dating divergences, speciation processes, molecular evolution.

• Ecology – classifying new species; biodiversity, co-phylogeny, migration of populations.

• Epidemiology – systematics, processes, dynamics.

• Extras – linguistics, stemmatology, psychology.

3

Phylogenetic trees

[Definition] A phylogenetic X-tree is a tree T=(V,E) with a set X of labelled leaves, and all other vertices unlabelled and of degree ≥3.

If all non-leaf vertices have degree 3 then T is binary.

4

Trees and splits

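$\Sigma(T) = \{A_e \mid B_e : e \in E\}$, where each edge $e$ induces the split $e \mapsto A_e \mid B_e$

[Figure: a tree with leaves labelled 1 to 6]

Partial order on the set $P(X)$ of phylogenetic X-trees:

$T \le T' \iff \Sigma(T) \subseteq \Sigma(T')$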

Buneman’s Theorem
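The slide only names the theorem. In its usual form (the splits-equivalence theorem) it says that a collection of X-splits equals Σ(T) for some phylogenetic X-tree T exactly when the splits are pairwise compatible, meaning that for any two splits A1|B1 and A2|B2 at least one of the four intersections A1∩A2, A1∩B2, B1∩A2, B1∩B2 is empty. A minimal sketch of that pairwise test follows; the function names and the example leaf set are illustrative assumptions, not part of the talk.

```python
from itertools import combinations

def compatible(s1, s2):
    """Splits A1|B1 and A2|B2 are compatible if at least one of the four
    intersections between a part of s1 and a part of s2 is empty."""
    (a1, b1), (a2, b2) = s1, s2
    return any(not (p & q) for p in (a1, b1) for q in (a2, b2))

def pairwise_compatible(splits):
    """True if every pair of splits in the collection is compatible."""
    return all(compatible(s, t) for s, t in combinations(splits, 2))

X = frozenset("abcde")  # hypothetical leaf set

def split(part):
    """Build the X-split part | (X - part)."""
    part = frozenset(part)
    return (part, X - part)

tree_like = [split("ab"), split("abc")]    # ab|cde and abc|de
conflicting = [split("ab"), split("bc")]   # ab|cde and bc|ade

print(pairwise_compatible(tree_like))    # True
print(pairwise_compatible(conflicting))  # False
```

The test is quadratic in the number of splits, which is one reason the split encoding of trees is convenient in practice.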

5

Quartet trees

• A quartet tree is a binary phylogenetic tree on 4 leaves (say, x,y,w,z) written xy|wz.

• A phylogenetic X-tree T displays xy|wz if there is an edge in T whose deletion separates {x,y} from {w,z}.

[Figure: the quartet tree xy|wz, and a larger tree with leaves x, y, w, z, r, s, u]

6

Corresponding notions for rooted trees

Clusters (in place of splits)

Triples in place of quartets

7

How are trees useful in epidemiology?

Systematics and reconstruction

How are different types/strains of a virus related?

When, where, and how did they arise?

What is their likely future evolution?

What was the ancestral sequence?

8

How are trees useful in epidemiology?

Processes and dynamics (“Phylodynamics”)

How do viruses change with time in a population? Population size, etc.

What is their rate of mutation, recombination, selection?

Within-host dynamics

How do viruses evolve in a single patient?

How is this related to the progression of the disease?

How much compartmental variation exists?

10

What do the shapes of these trees tell us about the processes governing their evolution?

E.g. population dynamics, selection

Coalescent prediction

11

Tree shapes (non-metric)

[Figure: tree shapes on the leaves a, b, c, d, e]

George Yule

13

Why do trees on the same taxa disagree?

1. Model violation
   1. “true model” differs from “assumed model”
   2. “true model = assumed model” but estimation method not appropriate to model
   3. model true but too parameter rich (non-identifiability)

2. Sampling error (and factors that make it worse!)

3. Alignment error

4. Evolutionary processes
   1. Lineage sorting
   2. Recombination
   3. Horizontal gene transfer; hybrid taxa
   4. Gene duplication and loss

14

Sampling error that’s hard to deal with

[Figure: a tree drawn against a time axis with divergence times T1, T2, T3, T4; one part of the tree is marked “?”]

15

Example: Deep divergence in the Metazoan phylogeny

[Figure: phylogeny of Metazoa and related taxa, with the groups Fungi, Choanoflagellates, Arthropods, Nematodes, Deuterostomes and Platyhelminthes labelled, plus individual taxon labels]

From Huson and Bryant, 2006

16

Models

[Figure: two alternative trees on leaves 1, 2, 3, 4 (“vs”), with further labels 1, 2, …, k]

Finite state Markov process

17

Models

[Figure: two trees on leaves 1, 2, 3, 4 compared (“vs”)]

• “Site saturation”

• Subdividing long edges only offers a partial remedy (trade-off).
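Site saturation can be made concrete with the simplest finite-state Markov process, the two-state symmetric model. This choice of model is an assumption of the sketch; the slides do not fix one. Under it, the two ends of an edge with t expected substitutions per site differ with probability ½(1 − e^(−2t)), which approaches ½ as t grows, so the states at the two ends become nearly independent.

```python
import math

def p_differ(t):
    """Two-state symmetric model: probability that the two ends of an edge
    with t expected substitutions per site are in different states."""
    return 0.5 * (1.0 - math.exp(-2.0 * t))

def compose(p, q):
    """Probability of a net state change across two consecutive edges whose
    individual change probabilities are p and q."""
    return p * (1 - q) + q * (1 - p)

for t in (0.1, 0.5, 1.0, 3.0, 10.0):
    print(f"t = {t:5.1f}  P(ends differ) = {p_differ(t):.4f}")

# Cutting an edge of length 1.0 into two halves leaves the end-to-end
# change probability unchanged.
halves = compose(p_differ(0.5), p_differ(0.5))
print(abs(halves - p_differ(1.0)) < 1e-12)  # True
```

The last check shows that subdividing an edge does not alter the end-to-end probability; any gain has to come from actually observing a sequence at the new vertex.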

18

Why do trees on the same taxa disagree?

1. Model violation
   1. “true model” differs from “assumed model”
   2. “true model = assumed model” but estimation method not appropriate to model
   3. model true but too parameter rich (non-identifiability)

2. Sampling error (and factors that make it worse!)

3. Alignment

4. Evolutionary processes
   1. Lineage sorting
   2. Recombination
   3. Horizontal gene transfer; hybrid taxa
   4. Gene duplication and loss

19

Gene trees vs species trees

Theorem (J. H. Degnan and N. A. Rosenberg, 2006).

For n≥5, for any species tree on n taxa, there are branch lengths and population sizes for which the most likely gene tree is different from the species tree.

Degnan, J. H., and N. A. Rosenberg. 2006. Discordance of species trees with their most likely gene trees. PLoS Genetics 2(5): e68.

[Figure: two trees on taxa a, b, c]
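For contrast with the theorem, the three-taxon case can be worked out in closed form under the standard coalescent. For a species tree ((a,b),c) whose internal branch has length T in coalescent units, the matching gene tree has probability 1 − (2/3)e^(−T) and each of the two discordant topologies has probability (1/3)e^(−T). The sketch below evaluates this; the function name and the string labels for topologies are illustrative assumptions.

```python
import math

def gene_tree_probs(T):
    """Gene tree topology probabilities for the species tree ((a,b),c) with
    internal branch length T (coalescent units), standard coalescent model."""
    discordant = math.exp(-T) / 3.0
    return {
        "ab|c": 1.0 - 2.0 * discordant,  # matches the species tree
        "ac|b": discordant,
        "bc|a": discordant,
    }

for T in (0.0, 0.5, 1.0, 3.0):
    probs = gene_tree_probs(T)
    print(T, {k: round(v, 3) for k, v in probs.items()})
```

With three taxa the matching topology is never less probable than the others, so the discordance described in the theorem needs larger trees. The equal probabilities of the two discordant topologies are also a signature of lineage sorting that other sources of conflict need not share.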

20

Example

[Figure: tree relating Orangutan, Gorilla, Chimpanzee and Human, with one branching point marked “?”]

Adapted from the Tree of Life website, University of Arizona

21

Distinguishing between signals

Lineage sorting vs sampling error vs HGT

[Figure: three trees on the taxa A, B, C]

22

Why do trees on the same taxa disagree?

1. Model violation
   1. “true model” differs from “assumed model”
   2. “true model = assumed model” but estimation method not appropriate to model
   3. model true but too parameter rich (non-identifiability)

2. Sampling error (and factors that make it worse!)

3. Alignment

4. Evolutionary processes
   1. Lineage sorting
   2. Recombination
   3. Horizontal gene transfer; hybrid taxa
   4. Gene duplication and loss

23

Given a tree what questions might we want to answer?

• How reliable is a split?

• Where is the root of the tree?

• Relative ranking of vertices?

• Dating?

• How well supported is the resolution of some ‘deep divergence’?

• What model best describes the evolution of the sequences (molecular clock? dS/dN ratio constant? etc.)

Statistical approaches:

• Non-parametric bootstrap

• Parametric bootstrap

• Likelihood ratio tests

• Bayesian posterior probabilities

• Tests (KH, SH, SOWH)

Goldman, N., J. P. Anderson, and A. G. Rodrigo. 2000. Likelihood-based tests of topologies in phylogenetics. Systematic Biology 49: 652-670.

24

[Figure] From Steve Thompson, Florida State University

25

Example

26

Non-parametric bootstrap
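The non-parametric bootstrap resamples alignment columns with replacement, re-estimates a tree from each pseudo-replicate, and reports how often each split reappears. The sketch below is a toy under stated assumptions: it handles exactly four taxa, and its "estimator" just picks the quartet split backed by the most parsimony-informative sites. The alignment, function names and replicate count are all made up for illustration; a real analysis would plug in a proper tree-building method.

```python
import random
from collections import Counter

def resample_columns(alignment, rng):
    """Draw alignment columns with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}

def best_quartet(alignment):
    """Toy estimator for exactly four taxa (an assumption of this sketch):
    return the split supported by the most parsimony-informative sites."""
    a, b, c, d = sorted(alignment)
    counts = Counter()
    for w, x, y, z in zip(*(alignment[t] for t in (a, b, c, d))):
        if w == x != y == z:
            counts[f"{a}{b}|{c}{d}"] += 1
        elif w == y != x == z:
            counts[f"{a}{c}|{b}{d}"] += 1
        elif w == z != x == y:
            counts[f"{a}{d}|{b}{c}"] += 1
    return counts.most_common(1)[0][0] if counts else None

def bootstrap_support(alignment, n_reps=200, seed=0):
    """Fraction of bootstrap replicates in which each split is estimated."""
    rng = random.Random(seed)
    tally = Counter(best_quartet(resample_columns(alignment, rng))
                    for _ in range(n_reps))
    return {split: n / n_reps for split, n in tally.items()}

# Hypothetical alignment: four sites favour AB|CD, one favours AC|BD.
aln = {"A": "AAGGTTCA", "B": "AAGGTTCC", "C": "CCGATACA", "D": "CCGAAACC"}
support = bootstrap_support(aln)
print(support)
```

A replicate that happens to contain no informative site is tallied under the key `None`, so the reported fractions always sum to one.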

27

28

Dealing with incompatibility: Consensus trees

• Strict

• Majority rule

• Semistrict consensus

29

Consensus networks

Take the splits that are in at least x% of the trees and represent them by a graph

Splits graph (G(Σ)), Dress and Huson: each split is represented by a class of ‘parallel’ edges

Simplest example (n=4).
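Selecting the splits that occur in at least x% of the trees is a simple counting step. The sketch below assumes each tree is given as a set of splits, with a split stored as the set of leaves on the side that does not contain a fixed reference leaf (here `a`); that encoding and the function name are illustrative choices. The example uses n=4, where the three possible nontrivial splits are ab|cd, ac|bd and ad|bc.

```python
from collections import Counter

def consensus_splits(trees, threshold):
    """Return the splits present in at least `threshold` (a fraction in (0, 1])
    of the input trees. Each tree is a set of splits; a split is the frozenset
    of leaves on the side not containing the reference leaf."""
    counts = Counter(s for tree in trees for s in set(tree))
    return {s for s, n in counts.items() if n >= threshold * len(trees)}

# Three quartet trees on a, b, c, d: ab|cd twice and ac|bd once.
trees = [{frozenset("cd")}, {frozenset("cd")}, {frozenset("bd")}]

strict = consensus_splits(trees, 1.0)    # no split is in every tree
majority = consensus_splits(trees, 0.5)  # only ab|cd
network = consensus_splits(trees, 0.3)   # ab|cd and ac|bd

print(strict, majority, network)
```

At the lowest threshold the two retained splits are incompatible, so no single tree carries both; a splits graph would draw them as two classes of parallel edges forming a box. Note that majority-rule consensus is usually defined with a strict "more than half" cut-off, whereas this sketch uses "at least".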

30

[Figure: chloroplast JSA tree, including R. nivicola, with tips annotated by codes such as (NS), (SS), (N), (A) and (C,S)]

31

[Figure: nuclear ITS tree, including R. nivicola, with tips annotated by codes such as (NS), (SS), (N) and (A)]

32

Consensus network (ITS tree + JSA tree)

[Figure: consensus network with regions labelled I, II and III; R. nivicola is marked]

33

Maximum agreement subtrees

Concept

Computational complexity

34

Comparing trees

Splits metric (Robinson-Foulds)

Statistical aspects.

Tree rearrangement operations – the graph of trees (rSPR).

Cophylogeny
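The splits metric listed above counts the splits that belong to exactly one of the two trees. A minimal sketch, assuming each tree is given as a set of nontrivial splits and each split is normalised to the side that excludes a reference leaf (the helper names and example trees are illustrative):

```python
def normalize(part, leaves, ref):
    """Store a split as the side that does not contain the reference leaf."""
    part = frozenset(part)
    return frozenset(leaves) - part if ref in part else part

def rf_distance(splits1, splits2):
    """Robinson-Foulds distance: size of the symmetric difference of the
    two split sets."""
    return len(set(splits1) ^ set(splits2))

leaves = "abcde"
t1 = {normalize(p, leaves, "a") for p in ("ab", "de")}  # ((a,b),c,(d,e))
t2 = {normalize(p, leaves, "a") for p in ("ab", "ce")}  # ((a,b),d,(c,e))
t3 = {normalize(p, leaves, "a") for p in ("ac", "be")}  # ((a,c),d,(b,e))

print(rf_distance(t1, t1), rf_distance(t1, t2), rf_distance(t1, t3))  # 0 2 4
```

Some authors halve this count; the version here is the plain symmetric difference.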

35

Co-phylogeny (M. Charleston)

36

Supertrees

• Compatibility concept

• Compatibility of rooted trees (BUILD)

• Why do we want to do this?

• Extension – higher order taxa, dates

• Methods for handling incompatible trees (MRP; mincut variants; minflip)
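BUILD (Aho, Sagiv, Szymanski and Ullman) decides compatibility of rooted triples recursively: connect a and b for every triple ab|c that lies inside the current leaf set, split the leaf set into connected components, and recurse on each component; if the graph is connected on two or more leaves, no tree exists. The sketch below follows that outline; the nested-tuple output format and the triple encoding `(a, b, c)` for ab|c are assumptions made for illustration.

```python
def build(leaves, triples):
    """BUILD sketch. `triples` is a list of (a, b, c) meaning the rooted
    triple ab|c. Returns a nested-tuple tree displaying every triple, or
    None if the triples are incompatible."""
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))
    # Keep only triples whose three leaves all lie in the current leaf set.
    active = [t for t in triples if set(t) <= leaves]
    # Union-find over the leaves: join a and b for each active triple ab|c.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, _ in active:
        parent[find(a)] = find(b)
    components = {}
    for x in leaves:
        components.setdefault(find(x), set()).add(x)
    if len(components) == 1:
        return None  # graph is connected: no compatible tree exists
    subtrees = []
    for comp in sorted(components.values(), key=lambda c: sorted(c)):
        sub = build(comp, active)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)

print(build("abcd", [("a", "b", "c"), ("c", "d", "a")]))  # (('a', 'b'), ('c', 'd'))
print(build("abc", [("a", "b", "c"), ("a", "c", "b")]))   # None
```

Rooted input trees can be fed to the same routine by first listing the triples they display, which is how BUILD serves as a supertree compatibility test.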

37

Compatibility

Example: Q={12|34, 13|45, 14|26}

[Figure: a tree on leaves 1 to 6]

A set Q of quartets is compatible if there is a phylogenetic X-tree T that displays each quartet of Q.

Complexity?
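Compatibility of a particular Q can be certified by exhibiting one tree that displays every quartet. The sketch below encodes a tree by the splits of its internal edges and applies the definition of "displays" directly. The witness tree is my own choice for illustration (a caterpillar with cherries {2,6} and {4,5}), not necessarily the one drawn on the slide. On the complexity question: deciding quartet compatibility in general is known to be NP-complete, so this kind of check verifies a proposed tree rather than finding one.

```python
def displays(splits, quartet):
    """True if some split A|B has {x, y} inside one part and {w, z} inside
    the other, i.e. the tree displays the quartet xy|wz."""
    (x, y), (w, z) = quartet
    for a, b in splits:
        if ({x, y} <= a and {w, z} <= b) or ({x, y} <= b and {w, z} <= a):
            return True
    return False

leaves = frozenset(range(1, 7))

def split(part):
    """Build the split part | (leaves - part)."""
    part = frozenset(part)
    return (part, leaves - part)

# Internal-edge splits of the hypothetical witness tree.
T = [split({2, 6}), split({1, 2, 6}), split({4, 5})]

# Q = {12|34, 13|45, 14|26} from the slide.
Q = [((1, 2), (3, 4)), ((1, 3), (4, 5)), ((1, 4), (2, 6))]

print(all(displays(T, q) for q in Q))   # True
print(displays(T, ((1, 2), (3, 6))))    # False
```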

38

Supertrees

• Compatibility concept

• Compatibility of rooted trees (BUILD)

• Why do we want to do this?

• Extension – higher order taxa, dates

• Methods for handling incompatible trees (MRP; mincut variants; minflip)

39

Phylogenetic networks

• Consensus setting: consensus networks

• Minimizing hybrid/reticulate vertices

• Supernetworks – Z closure, filtering

40

Networks can represent:

• Reticulate evolution (e.g. hybrid species)

• Phylogenetic uncertainty (i.e. possible alternative trees)

Z-closure: Given $T_1, \dots, T_k$ on overlapping sets of species, let $\Sigma = \Sigma(T_1) \cup \dots \cup \Sigma(T_k)$, construct $\mathrm{spcl}_2(\Sigma)$, and construct the ‘splits graph’ of the resulting splits that are ‘full’.

[Figure: example trees on the taxa a, b, c, d]

41

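Split closure operation (Meacham 1986)

$$\frac{A_1}{B_1},\ \frac{A_2}{B_2} \;\longrightarrow\; \frac{A_1}{B_1 \cup B_2},\ \frac{A_1 \cup A_2}{B_2}$$

[Figure: diagram of the partial splits $A_1 \mid B_1$ and $A_2 \mid B_2$]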
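The slide shows only the replacement step. As the rule is usually stated in the Z-closure literature, it applies when A1∩A2, A2∩B1 and B1∩B2 are all non-empty and A1∩B2 is empty; treat those side conditions as my recollection rather than something the slide confirms. A sketch of a single application, with partial splits held as pairs of frozensets (the function name is hypothetical):

```python
def z_rule(s1, s2):
    """Apply the split closure rule once to the partial splits
    s1 = (A1, B1) and s2 = (A2, B2), in this orientation only.
    Returns the two replacement splits, or None if the rule does not apply."""
    (a1, b1), (a2, b2) = s1, s2
    if (a1 & a2) and (a2 & b1) and (b1 & b2) and not (a1 & b2):
        return (a1, b1 | b2), (a1 | a2, b2)
    return None

f = frozenset
s1 = (f("a"), f("bc"))    # a | bc
s2 = (f("ab"), f("cd"))   # ab | cd

print(z_rule(s1, s2))
print(z_rule(s2, s1))  # None: the conditions fail in this orientation
```

Here a|bc and ab|cd together yield a|bcd, extending the first partial split to the leaf d it did not originally mention. A full closure would try every orientation of every pair repeatedly until nothing changes.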

42

43

44

Reconstructing ancestral sequences

Methods (MP, Likelihood, Bayesian)

Quiz. MP for a balanced tree = majority state?

Information-theoretic considerations
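One way to explore the quiz is to run Fitch's algorithm on a balanced binary tree and compare the root assignment with the majority leaf state. The sketch below uses a recursive Fibonacci-style construction of leaf states; the construction is my own illustration, not taken from the talk. With 32 leaves it gives a case where maximum parsimony assigns state 0 to the root although only 13 leaves carry state 0.

```python
def fitch_root(leaves):
    """Fitch's algorithm on a balanced binary tree whose leaf states are
    listed left to right (length must be a power of two). Returns the set
    of states that are optimal at the root under maximum parsimony."""
    sets = [{s} for s in leaves]
    while len(sets) > 1:
        # Intersection if non-empty, otherwise union, for each sibling pair.
        sets = [(l & r) or (l | r) for l, r in zip(sets[0::2], sets[1::2])]
    return sets[0]

def minority_root(height):
    """Build 0/1 leaf states for a balanced tree of the given height.
    Returns (zero, both): `zero` has Fitch root set {0}, `both` has {0, 1}."""
    zero, both = [0, 0], [0, 1]  # height 1
    for h in range(1, height):
        ones = [1] * (2 ** h)    # an all-ones subtree of height h
        zero, both = zero + both, zero + ones
    return zero, both

leaves, _ = minority_root(5)
print(len(leaves), leaves.count(0), fitch_root(leaves))  # 32 13 {0}
```

So, under these assumptions, the answer to the quiz is no: the parsimony estimate at the root of a balanced tree can disagree with the majority state at the leaves.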

45

Statistics of parsimony (clustering on a tree)

