Post on 14-Jan-2016
Likelihood methods
Trees - “What is the probability that a proposed model of sequence evolution and a particular tree would give rise to the observed data?” “What tree and model would maximize the probability of observing the observed data?”
In practice, the data are “given,” the tree is a hypothesis, and the model of the evolutionary process is usually unknown, but with parameters either “given” based on external knowledge or estimated from the data set.
Therefore, we search for the hypothesis (tree) that gives the best probability of getting the observed data.
P(data | tree, model)
Potential Benefits of Likelihood
• Improved compensation for superimposed changes using explicit models
• Method is consistent
• Usually minimizes variance of model parameters
• Often robust to violations of assumptions
• Estimation and testing of evolutionary models and hypotheses is a natural outcome
Likelihood of a tree
Likelihood of a tree II
Fixed
Tree-dependent
4 bases x 4 bases = 16 possibilities. Some are much more probable than others.
Likelihood of a tree III
If we can assume that nucleotide sites evolve independently, the likelihood of the full tree is the product of the likelihoods at each site. Because these values are vanishingly small, we usually log transform them, so the log likelihood of the tree is the sum of the log likelihoods of each site.
e.g.,
if L(tree1) = .0000002, ln L = -15.4
if L(tree2) = .0000004, ln L = -14.7
if L(tree3) = .0000008, ln L = -14.0
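A minimal sketch of why the log transform is used, assuming a made-up alignment of 60 sites that all share the same per-site likelihood (the list site_likelihoods and the function log_likelihood are illustrative names, not from the slides). It also recomputes the three ln L values quoted above.

```python
import math

# Hypothetical per-site likelihoods for one tree (illustrative values only).
site_likelihoods = [2e-7] * 60

def log_likelihood(site_likelihoods):
    """Sum of per-site log likelihoods, assuming sites evolve independently."""
    return sum(math.log(p) for p in site_likelihoods)

# The raw product underflows to 0.0 in double precision; the log sum does not.
raw_product = math.prod(site_likelihoods)
ln_L = log_likelihood(site_likelihoods)

# The three tree likelihoods from the example, moved to the log scale.
tree_ln_L = {
    name: round(math.log(L), 1)
    for name, L in [("tree1", 2e-7), ("tree2", 4e-7), ("tree3", 8e-7)]
}
```

Ranking trees by ln L gives the same order as ranking by L, so tree3 is still the preferred tree of the three.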
Likelihood of a tree IV
0. Prior probability of an “A”
1. X P ( retaining A)
2. X P ( A to C)
3. X P ( A to C)
4. X P ( retaining A)
5. X P ( A to G)
Probabilities are a function of: substitution model, base frequencies, branch lengths
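The six factors above can be multiplied as in the sketch below. The numeric probabilities are hypothetical placeholders chosen only to show the arithmetic; in a real analysis each one would come from the substitution model, base frequencies and branch lengths. Note that this is the likelihood for one assignment of states to the internal nodes; the likelihood of the site sums such products over all possible assignments.

```python
import math

# Hypothetical probabilities (placeholders, not taken from the slides).
factors = [
    ("prior probability of A", 0.25),
    ("retaining A", 0.90),
    ("A to C", 0.03),
    ("A to C", 0.03),
    ("retaining A", 0.90),
    ("A to G", 0.03),
]

def path_likelihood(factors):
    """Multiply the prior by the probability of the event on each branch."""
    return math.prod(p for _, p in factors)

likelihood = path_likelihood(factors)

# Equivalent calculation on the log scale.
log_likelihood = sum(math.log(p) for _, p in factors)
```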
Calculation of probability of substitution or retention
Probabilities are a function of: substitution model, base frequencies, branch lengths
* See example in Mount, p. 277
* Formal analysis uses the model (JC, HKY, etc.) to generate explicit probabilities
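e.g., probability of a substitution under Jukes-Cantor, for a site that starts as C:
$P_C = (1 + 3e^{-4\alpha t})/4$
$P_{\text{not}\,C} = \frac{3}{4}(1 - e^{-4\alpha t})$
where $\alpha$ is the rate of substitution to each of the other bases and $t$ is the time along the branch.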
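A small sketch of the Jukes-Cantor probabilities, assuming the usual parameterization in which alpha is the rate of substitution to each of the three other bases and t is time along the branch (the function names are illustrative). It shows that the two probabilities sum to 1 and that both approach the equilibrium values as t grows.

```python
import math

def jc_same(alpha, t):
    """Probability that a site shows the same base after time t."""
    return (1.0 + 3.0 * math.exp(-4.0 * alpha * t)) / 4.0

def jc_different(alpha, t):
    """Probability that a site shows any of the three other bases after time t."""
    return 0.75 * (1.0 - math.exp(-4.0 * alpha * t))

def jc_to_specific(alpha, t):
    """Probability of ending at one particular other base (a third of the total)."""
    return jc_different(alpha, t) / 3.0

# Short branch: retention dominates. Very long branch: all bases near 1/4.
p_keep_short = jc_same(1.0, 0.01)
p_keep_long = jc_same(1.0, 50.0)
```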
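Conditional likelihood of state i at position j in node A, whose descendants are B and C:
$L_A^{(j)}(i) = \left[\sum_k P_{ik}(t_B)\, L_B^{(j)}(k)\right] \left[\sum_l P_{il}(t_C)\, L_C^{(j)}(l)\right]$
$L_A^{(j)}(i)$: likelihood of state i at position j in A
$t_B$, $t_C$: branch lengths
$P_{ik}(t_B)$: probability of state i changing to state k
$L_B^{(j)}(k)$: likelihood that B has state k
First bracket: likelihood that i could give rise to the state in B
Second bracket: similar, for going to the outcome in C
I.e., the conditional likelihood that A has state i is the product of the likelihoods that i could have given rise to the outcomes in B and C.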
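The conditional likelihood calculation can be sketched for a single node A with two tip descendants B and C. This assumes a Jukes-Cantor model with rate alpha, equal base frequencies of 1/4, and made-up branch lengths of 0.1; the function names are illustrative and not taken from any library.

```python
import math

BASES = "ACGT"

def jc_prob(i, k, alpha, t):
    """Jukes-Cantor probability of state i becoming state k over time t."""
    e = math.exp(-4.0 * alpha * t)
    return 0.25 + 0.75 * e if i == k else 0.25 - 0.25 * e

def tip(base):
    """Likelihood vector for an observed tip: 1 for the observed base, else 0."""
    return {b: 1.0 if b == base else 0.0 for b in BASES}

def conditional_likelihood(L_B, t_B, L_C, t_C, alpha=1.0):
    """Likelihood of each state i at A, given the likelihoods at B and C."""
    L_A = {}
    for i in BASES:
        # Likelihood that i could give rise to the outcome in B ...
        to_B = sum(jc_prob(i, k, alpha, t_B) * L_B[k] for k in BASES)
        # ... and, similarly, to the outcome in C.
        to_C = sum(jc_prob(i, k, alpha, t_C) * L_C[k] for k in BASES)
        L_A[i] = to_B * to_C
    return L_A

# Hypothetical site: B shows A, C shows C, both branches of length 0.1.
L_A = conditional_likelihood(tip("A"), 0.1, tip("C"), 0.1)

# If A is the root, weight each state by its prior (1/4 under Jukes-Cantor).
site_likelihood = sum(0.25 * L_A[i] for i in BASES)
```

The same function can be applied again further up the tree, passing the returned L_A in place of a tip vector.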
Likelihood Ratio test
$\Lambda = \dfrac{\max[L(\text{null hypothesis} \mid \text{data})]}{\max[L(\text{alternative hypothesis} \mid \text{data})]}$
Huelsenbeck et al (1997) Science. 276:227
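A hedged sketch of how the ratio is commonly turned into a test: with maximized log likelihoods for the two hypotheses, the statistic is 2(ln L_alt - ln L_null). The comparison to a chi-square distribution is an assumption that applies when the null model is a special case of the alternative; here the two models are assumed to differ by one free parameter, and the log likelihood values are made up.

```python
import math

def lrt_statistic(ln_L_null, ln_L_alt):
    """Return -2 ln(ratio), i.e. 2 * (ln L_alt - ln L_null)."""
    return 2.0 * (ln_L_alt - ln_L_null)

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) if x > 0 else 1.0

# Hypothetical maximized log likelihoods (not from the slides).
ln_L_null = -1234.5
ln_L_alt = -1230.0

stat = lrt_statistic(ln_L_null, ln_L_alt)
p_value = chi2_sf_df1(stat)
```

A small p_value would suggest that the extra parameter of the alternative model improves the fit by more than chance alone would explain.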
Caveat: effective likelihood analysis requires a large dataset, and full ML analysis is computationally intensive.
Likelihood of a tree - review