How many characters are needed to reconstruct the true tree?

Post on 01-Jan-2016



Mareike Fischer

How many characters are needed to reconstruct the true tree?

Mareike Fischer and Mike Steel

Future Directions in Phylogenetic Methods and Models, 17 – 21 Dec 07


The Problem

Given: Sequence of characters (e.g. DNA)

Wanted: Reconstruction of the ‘true’ tree

Solution: Maximum Parsimony, Maximum Likelihood, etc.

But: Is the sequence long enough for a reliable reconstruction?


Previous Approaches

1. Churchill, von Haeseler, Navidi (1992)

• 4 taxa scenario

• Observations:

The probability of reconstructing the true tree increases with the length of the interior edge.

“Bringing the outer nodes closer to the central branch can increase [this probability] dramatically.”

[Figure: reconstruction probability (Rec. Prob.) versus interior edge length (int. edge), annotated "more characters".]


Previous Approaches

2. Yang (1998)

• 4 taxa scenario, interior edge ‘fixed’ at 5% of tree length

• 5 different tree-shapes were investigated

• Observations:

‘Farris Zone’: MP better

‘Felsenstein Zone’: ML better

The optimal length for the interior edge ranges between 0.015 and 0.025.

[Figure: reconstruction probability (Rec. Prob.) versus tree length.]


Our Approach

• Limitation: Most previous approaches are based on simulations.

• Our approach: Mathematical analysis of influence of branch lengths on tree reconstruction.

• We investigate MP first and consider other methods afterwards.


Already known

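Steel and Székely (2002):

[Figure: four-taxon tree with interior edge of length x and four pendant edges of length y.]

Here, the number k of characters needed to reconstruct the true tree grows at rate [expression missing from the transcript].

But what happens if we fix the ratio (y := px), and then take the value of x that minimizes k?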


Our Approach

Setting: 4 taxa, pendant edges of length px (with p > 1), short interior edge of length x, 2-state symmetric model.

[Figure: four-taxon tree with interior edge of length x and four pendant edges of length px.]


Main Result

For ‘reliable’ MP reconstruction:

• k grows at least at rate p^2.

• For the optimal value of x, k grows at rate p^2.


Idea of Proof: 1. Applying the CLT

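Set Z_k := X_1 + … + X_k with the X_i i.i.d., and let μ_k and σ_k denote the mean and standard deviation of Z_k. Then (by CLT) (Z_k − μ_k)/σ_k is approximately standard normal for large k.

Note that the true tree T_1 will be favored over T_2 if and only if Z_k > 0.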
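The CLT step can be sketched numerically. The following is a minimal illustration, not the talk's own calculation: it assumes each X_i takes values in {−1, 0, 1} (as the probabilities P(X_1 = 1) and P(X_1 = −1) on the next slide suggest), so that one character has mean P(X_1 = 1) − P(X_1 = −1). The function names, the default reliability of 0.95, and the input probabilities in the usage note are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def success_probability(p_plus, p_minus, k):
    """Normal approximation to P(Z_k > 0), where Z_k = X_1 + ... + X_k.

    p_plus  = P(X_i = 1), p_minus = P(X_i = -1); X_i = 0 otherwise (assumption).
    """
    mu = p_plus - p_minus                # mean of a single X_i
    var = p_plus + p_minus - mu * mu     # variance of a single X_i
    # Z_k has mean k*mu and standard deviation sqrt(k*var), so
    # P(Z_k > 0) is approximately Phi(sqrt(k) * mu / sigma).
    return NormalDist().cdf(sqrt(k) * mu / sqrt(var))


def characters_needed(p_plus, p_minus, reliability=0.95):
    """Smallest (real-valued) k for which the approximation reaches `reliability`.

    Requires p_plus > p_minus, i.e. the true tree is favored on average.
    """
    mu = p_plus - p_minus
    var = p_plus + p_minus - mu * mu
    z = NormalDist().inv_cdf(reliability)
    return z * z * var / (mu * mu)
```

For example, `characters_needed(0.05, 0.04)` returns the k at which `success_probability(0.05, 0.04, k)` equals 0.95 under this approximation; the two input probabilities are made up for illustration.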


Idea of Proof: 2. The Hadamard Representation

Since the X_i are i.i.d., μ_k and σ_k depend only on k and the probabilities P(X_1 = 1) and P(X_1 = −1).

These probabilities can be calculated using the ‘Hadamard Representation’:

[Formulas for P(X_1 = 1) and P(X_1 = −1) missing from the transcript.] (Here, θ = e^{-2x}.)

Note that P(X_1 = 1) and P(X_1 = −1) only depend on x and p.

Thus, for fixed p, the ratio [expression missing from the transcript] can be used to find a value of x that minimizes k.
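The search for the best x can be sketched as follows. The sketch rests on several assumptions that are not taken from the slides: X_1 = 1 when a character shows the split pattern of the true tree T_1 (12|34) and X_1 = −1 when it shows the pattern of T_2 (13|24); the closed forms are the usual Hadamard (Fourier) expressions for the 2-state symmetric model on the four-taxon tree with θ = e^{-2x}; and k is estimated with the normal approximation from the CLT step. The function names, the reliability level and the log-spaced grid are illustrative choices.

```python
import math
from statistics import NormalDist


def mean_and_variance(x, p):
    """Mean and variance of one X_i for interior edge x and pendant edges p*x.

    Assumed closed forms (t = theta^(2p), theta = e^{-2x}):
      P(X_1 = -1) = (1 - t)^2 / 8
      P(X_1 =  1) = (1 + 2t - 4*t*theta + t^2) / 8 = P(X_1 = -1) + t*(1 - theta)/2
    """
    one_minus_theta = -math.expm1(-2.0 * x)      # 1 - theta, stable for small x
    t = math.exp(-4.0 * p * x)                   # theta^(2p)
    one_minus_t = -math.expm1(-4.0 * p * x)      # 1 - t, stable for small x
    p_minus = one_minus_t ** 2 / 8.0
    mu = 0.5 * t * one_minus_theta               # P(X_1 = 1) - P(X_1 = -1)
    p_plus = p_minus + mu
    var = p_plus + p_minus - mu * mu
    return mu, var


def characters_needed(x, p, reliability=0.95):
    """Normal-approximation estimate of k with P(Z_k > 0) = reliability."""
    mu, var = mean_and_variance(x, p)
    if mu <= 0.0:                                # signal lost to underflow
        return math.inf
    z = NormalDist().inv_cdf(reliability)
    return z * z * var / (mu * mu)


def minimize_k(p, reliability=0.95, grid=4000):
    """Grid search over log-spaced x in [1e-7, 1] for the smallest k."""
    best_x, best_k = None, math.inf
    for i in range(grid + 1):
        x = 10.0 ** (-7.0 + 7.0 * i / grid)
        k = characters_needed(x, p, reliability)
        if k < best_k:
            best_x, best_k = x, k
    return best_x, best_k


if __name__ == "__main__":
    for p in (2, 4, 8, 16, 32, 64):
        x_opt, k_min = minimize_k(p)
        print(f"p={p:3d}  x*={x_opt:.5f}  k={k_min:10.1f}  k/p^2={k_min / p**2:.2f}")
```

Printing k/p^2 for increasing p gives a quick check of whether the minimal k scales like p^2 under these assumptions. A grid search is used instead of a numerical optimizer to keep the sketch dependency-free.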


Summary and Extension

• For MP, the number k of characters needed to reliably reconstruct the true tree grows at rate p^2.

• Can other methods do better (e.g. rate p)? No! [Can be shown using the ‘Hellinger distance’.]
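The role of the Hellinger distance can be illustrated numerically. This is a sketch only, not the argument from the talk. It assumes that both candidate trees carry the same branch lengths (interior edge x, pendant edges px), so that their site-pattern distributions differ only by swapping the probabilities of the split patterns 12|34 and 13|24, and it uses the standard closed forms for the 2-state symmetric model with θ = e^{-2x}. A common rule of thumb is that telling two distributions apart from i.i.d. samples needs on the order of 1/H^2 samples, where H is their Hellinger distance. All function names are hypothetical.

```python
import math


def split_probs(x, p):
    """Probabilities of split patterns 12|34 and 13|24 when 12|34 is the true tree."""
    t = math.exp(-4.0 * p * x)                       # theta^(2p)
    p_other = math.expm1(-4.0 * p * x) ** 2 / 8.0    # (1 - t)^2 / 8
    p_true = p_other - 0.5 * t * math.expm1(-2.0 * x)  # adds t*(1 - theta)/2
    return p_true, p_other


def hellinger_sq(dist_a, dist_b):
    """Squared Hellinger distance: 0.5 * sum over i of (sqrt(a_i) - sqrt(b_i))^2."""
    return 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                     for a, b in zip(dist_a, dist_b))


def tree_distance_sq(x, p):
    """Squared Hellinger distance between the pattern distributions of T_1 and T_2."""
    p_true, p_other = split_probs(x, p)
    rest = 1.0 - p_true - p_other      # patterns equally likely under both trees
    under_t1 = (p_true, p_other, rest)
    under_t2 = (p_other, p_true, rest)
    return hellinger_sq(under_t1, under_t2)


def max_distance_sq(p, grid=4000):
    """Largest squared distance over log-spaced x in [1e-7, 1]."""
    return max(tree_distance_sq(10.0 ** (-7.0 + 7.0 * i / grid), p)
               for i in range(grid + 1))


if __name__ == "__main__":
    for p in (2, 4, 8, 16, 32, 64):
        print(f"p={p:3d}  p^2 * max H^2 = {p * p * max_distance_sq(p):.4f}")
```

If p^2 times the largest H^2 stays bounded as p grows, then even at the most favorable x the two trees become harder to distinguish at rate p^2, which is in line with the claim that no method beats that rate in this setting.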


Outlook

Questions for future work:

• What happens when you approach the ‘Felsenstein Zone’?

• What happens in general with different tree shapes or more taxa?


Thanks…

… to my supervisor Mike Steel,

… to the Newton Institute for organizing this great conference,

… to the Allan Wilson Centre for financing my research,

… to YOU for listening, or at least waking up early enough to read this message.


The only true tree…

… is a Christmas tree.

(And it does not even require reconstruction!)

Merry Christmas!