Likelihood methods


Trees: “What is the probability that a proposed model of sequence evolution and a particular tree would give rise to the observed data?” “What tree and model would maximize the probability of observing the observed data?”

In practice, the data are “given,” the tree is a hypothesis, and the model of the evolutionary process is usually unknown, but with parameters either “given” based on external knowledge or estimated from the data set.

Therefore, we search for the hypothesis (tree) that gives the best probability of getting the observed data.

P(data | tree, model)

Potential Benefits of Likelihood

• Improved compensation for superimposed changes using explicit models

• Method is consistent

• Usually minimizes variance of model parameters

• Often robust to violations of assumptions

• Estimation and testing of evolutionary models and hypotheses is a natural outcome

Likelihood of a tree

Likelihood of a tree II

[Figure labels: Fixed; Tree-dependent]

4 bases x 4 bases = 16 possible combinations. Some are much more probable than others.
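The count of 16 combinations can be illustrated with a small sketch. It assumes a four-taxon tree with two internal nodes (here called x and y), equal base frequencies, and Jukes-Cantor substitution probabilities; the tree shape, the tip pattern "AACC", and the branch parameter `alpha_t` are hypothetical choices made only for illustration, not taken from the slide's figure.

```python
import math
from itertools import product

BASES = "ACGT"

def jc_prob(i, j, alpha_t):
    """Jukes-Cantor probability that base i ends as base j over one branch.
    alpha_t = (rate of each specific substitution) x (time); assumed parametrization."""
    e = math.exp(-4.0 * alpha_t)
    return (1 + 3 * e) / 4 if i == j else (1 - e) / 4

def site_likelihood(tips, alpha_t=0.1):
    """Sum over all 16 (x, y) internal-state pairs for the tree ((t1,t2)x,(t3,t4)y)."""
    t1, t2, t3, t4 = tips
    terms = {}
    for x, y in product(BASES, repeat=2):
        terms[(x, y)] = (
            0.25                                              # prior for state x
            * jc_prob(x, t1, alpha_t) * jc_prob(x, t2, alpha_t)
            * jc_prob(x, y, alpha_t)                          # internal branch
            * jc_prob(y, t3, alpha_t) * jc_prob(y, t4, alpha_t)
        )
    return sum(terms.values()), terms

total, terms = site_likelihood("AACC")
best = max(terms, key=terms.get)
print(f"site likelihood = {total:.6f}; most probable internal states = {best}")
```

All 16 terms contribute to the site likelihood, but a few combinations dominate the sum, which is the sense in which some are much more probable.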

Likelihood of a tree III

If we can assume that nucleotide sites evolve independently, the likelihood of the full tree is the product of the likelihoods at each site. Because these values are vanishingly small, we usually log-transform them, so the log likelihood of the tree is the sum of the log likelihoods of each site.

If L(tree1) = 0.0000002, ln L = -15.4

If L(tree2) = 0.0000004, ln L = -14.7

If L(tree3) = 0.0000008, ln L = -14.0
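As a rough check of the numbers above, the following sketch recomputes the three log likelihoods and shows that multiplying per-site likelihoods is the same as summing their logs. The per-site values in `site_likelihoods` are made up for illustration.

```python
import math

# Per-tree likelihoods quoted on the slide.
tree_likelihoods = {"tree1": 2e-7, "tree2": 4e-7, "tree3": 8e-7}

for name, lik in tree_likelihoods.items():
    print(f"{name}: L = {lik:.1e}, ln L = {math.log(lik):.1f}")

# Hypothetical per-site likelihoods for a single tree (illustrative values only).
site_likelihoods = [0.02, 0.005, 0.01, 0.002]

# Product of site likelihoods versus sum of log likelihoods.
product_of_sites = math.prod(site_likelihoods)
sum_of_logs = sum(math.log(x) for x in site_likelihoods)

# The tree with the highest ln L is also the tree with the highest L.
best_tree = max(tree_likelihoods, key=lambda name: math.log(tree_likelihoods[name]))
```

Working on the log scale avoids numerical underflow when thousands of small site likelihoods would otherwise be multiplied together.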

Likelihood of a tree IV

0. Prior probability of an “A”

1. × P(retaining A)

2. × P(A to C)

3. × P(A to C)

4. × P(retaining A)

5. × P(A to G)

Probabilities are a function of: substitution model, base frequencies, branch lengths
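The chain of factors 0 to 5 can be sketched as a running product. This is only an illustration: it assumes a Jukes-Cantor model with equal base frequencies (so the prior for A is 1/4) and gives every branch the same hypothetical length parameter `alpha_t`.

```python
import math

def jc_prob(i, j, alpha_t):
    """Jukes-Cantor probability that base i ends as base j over one branch.
    alpha_t = (rate of each specific substitution) x (time); assumed parametrization."""
    e = math.exp(-4.0 * alpha_t)
    return (1 + 3 * e) / 4 if i == j else (1 - e) / 4

prior_A = 0.25  # step 0: assumed equal base frequencies

# Steps 1-5 as (from_base, to_base, alpha_t); branch lengths are hypothetical.
branches = [
    ("A", "A", 0.1),  # 1. retaining A
    ("A", "C", 0.1),  # 2. A to C
    ("A", "C", 0.1),  # 3. A to C
    ("A", "A", 0.1),  # 4. retaining A
    ("A", "G", 0.1),  # 5. A to G
]

likelihood = prior_A
for start, end, alpha_t in branches:
    likelihood *= jc_prob(start, end, alpha_t)

# The same quantity on the log scale.
log_likelihood = math.log(prior_A) + sum(
    math.log(jc_prob(start, end, alpha_t)) for start, end, alpha_t in branches
)
print(f"L = {likelihood:.3e}, ln L = {log_likelihood:.2f}")
```

Changing the substitution model, the base frequencies, or any branch length changes the individual factors and therefore the product.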

Calculation of probability of substitution or retention


* See example in Mount, p. 277

* Formal analysis uses the model (JC, HKY, etc.) to generate explicit probabilities

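E.g., probability of a substitution under Jukes-Cantor, for a site that starts as C, where α is the rate of each specific substitution and t is the time along the branch:

P(C) = (1 + 3e^(-4αt)) / 4

P(not C) = (3/4)(1 - e^(-4αt))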
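A minimal sketch of these two Jukes-Cantor probabilities follows. It assumes the common parametrization in which the exponent is -4αt, with α the rate of each specific substitution and t the time along the branch; the function names and the values of `alpha_t` are illustrative.

```python
import math

def jc_retain(alpha_t):
    """Probability that the base is unchanged after a branch (assumed exponent -4*alpha*t)."""
    return (1 + 3 * math.exp(-4 * alpha_t)) / 4

def jc_change(alpha_t):
    """Probability that the base differs from the starting base (any of the 3 others)."""
    return 0.75 * (1 - math.exp(-4 * alpha_t))

for alpha_t in (0.0, 0.05, 0.5, 5.0):
    print(f"alpha*t = {alpha_t:>4}: P(retain) = {jc_retain(alpha_t):.4f}, "
          f"P(change) = {jc_change(alpha_t):.4f}")
```

On a zero-length branch the base is retained with probability 1; on a very long branch both quantities approach the equilibrium values of 1/4 and 3/4.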


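Conditional likelihood at an internal node A with descendants B and C, at position j:

L_A(i) = [Σ_k P_ik(v_B) × L_B(k)] × [Σ_l P_il(v_C) × L_C(l)]

L_A(i): likelihood of state i at position j in A

v_B, v_C: branch lengths

P_ik(v_B): probability of state i changing to state k

L_B(k): likelihood that B has state k

Σ_k P_ik(v_B) × L_B(k): likelihood that i could give rise to the state in B

The second bracket is similar, for going to the outcome in C.

I.e., the conditional likelihood that A has state i is the product of the likelihoods that state i could have given rise to the outcomes in B and C.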
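The conditional-likelihood calculation described above can be sketched for one internal node with two tip descendants. This is an illustrative sketch, not a full tree implementation: it assumes Jukes-Cantor probabilities, equal base frequencies, tips B and C observed as "A" and "C", and hypothetical branch parameters of 0.1.

```python
import math

BASES = "ACGT"

def jc_prob(i, j, alpha_t):
    """Jukes-Cantor probability that base i ends as base j over one branch.
    alpha_t = (rate of each specific substitution) x (time); assumed parametrization."""
    e = math.exp(-4.0 * alpha_t)
    return (1 + 3 * e) / 4 if i == j else (1 - e) / 4

def tip_vector(base):
    """Conditional likelihoods at a tip: 1 for the observed base, 0 otherwise."""
    return [1.0 if b == base else 0.0 for b in BASES]

def conditional_likelihood(L_B, v_B, L_C, v_C):
    """For each state i at node A, multiply the likelihood that i gives rise to
    the outcome in B by the likelihood that i gives rise to the outcome in C."""
    L_A = []
    for i in BASES:
        to_B = sum(jc_prob(i, k, v_B) * L_B[n] for n, k in enumerate(BASES))
        to_C = sum(jc_prob(i, k, v_C) * L_C[n] for n, k in enumerate(BASES))
        L_A.append(to_B * to_C)
    return L_A

L_A = conditional_likelihood(tip_vector("A"), 0.1, tip_vector("C"), 0.1)

# If A is the root, weight each state by its prior (1/4 under equal frequencies).
site_likelihood = sum(0.25 * value for value in L_A)
print(dict(zip(BASES, (round(v, 5) for v in L_A))), round(site_likelihood, 5))
```

Because the function takes likelihood vectors rather than observed bases, its output for A can be passed in as `L_B` or `L_C` for the node above A, which is how the calculation moves from the tips toward the root.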

Likelihood ratio test

Λ = max[L(null hypothesis | data)] / max[L(alternative hypothesis | data)]

Huelsenbeck et al. (1997) Science 276:227

**** Effective likelihood analysis requires a large dataset, and full ML analysis is computationally intensive.
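A likelihood ratio test is usually carried out on the log scale. The sketch below assumes the two hypotheses are nested and differ by one free parameter, so that -2 times the log of the ratio can be compared with a chi-square distribution with 1 degree of freedom; the two log-likelihood values are hypothetical.

```python
import math

def lrt_statistic(lnL_null, lnL_alt):
    """-2 ln(ratio), where ratio = max L(null) / max L(alternative)."""
    return -2.0 * (lnL_null - lnL_alt)

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical maximized log likelihoods for the two models.
lnL_null, lnL_alt = -1523.4, -1519.1

stat = lrt_statistic(lnL_null, lnL_alt)
p_value = chi2_sf_df1(stat)
print(f"test statistic = {stat:.2f}, approximate p = {p_value:.4f}")
```

A small p-value suggests the alternative model fits the data significantly better than the null. For models that differ by more parameters, or that are not nested, this one-degree-of-freedom approximation does not apply.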

Likelihood of a tree - review
