R. A. Fisher (1890–1962)
“Natural selection is a mechanism for generating an exceedingly high degree of improbability.”
Testing for the Extreme Value Domain of Attraction of Beneficial Fitness Effects
Craig J. Beisel
Bioinformatics and Computational Biology
Department of Mathematics
Concepts
Natural Selection: The differential survival and reproduction of individuals within a population based on hereditary characteristics.
Adaptation: The adjustment of an organism or population to a new or altered environment through genetic changes brought about by natural selection.
Phenotype: The overall attributes of an organism, arising from the interaction of its genotype with the environment.
Genotype: The specific genetic makeup of an individual.
Fitness: The ability of a genotype to reproduce.
More formally, it is the ratio of the number of copies of a genotype after one generation to the number before.
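A minimal Python sketch of the count-ratio definition of fitness, together with the selection coefficient derived from it (the $s$ that appears in the later formulas). The helper names and the counts are illustrative assumptions, not data from the talk.

```python
def absolute_fitness(count_before: int, count_after: int) -> float:
    """Ratio of a genotype's count after one generation to its count before."""
    if count_before <= 0:
        raise ValueError("genotype must be present before the generation")
    return count_after / count_before


def selection_coefficient(w_mutant: float, w_resident: float) -> float:
    """Relative advantage of a mutant over the resident: s = w_mutant / w_resident - 1."""
    return w_mutant / w_resident - 1.0


# Hypothetical counts: mutant 100 -> 150 copies, resident 1000 -> 1200 copies.
w_mut = absolute_fitness(100, 150)    # 1.5
w_res = absolute_fitness(1000, 1200)  # 1.2
s = selection_coefficient(w_mut, w_res)
print(f"w_mut={w_mut}, w_res={w_res}, s={s:.3f}")
```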
Fitness Landscape: A function mapping each genotype to its fitness.
Fitness Distribution: The distribution of fitness over every possible genotype in a fixed environment.
[Figure: fitness distribution, with regions labelled Lethal, Moderate and High]
Mutational Landscape Model
John Maynard Smith (1920–2004)
First remarked that adaptation does not take place in phenotypic space, but in sequence space.
Gillespie (1983)
Given a sequence of nucleotides of length $L$, there are $4^L$ possible sequences.
Each sequence has $3L$ neighboring sequences that are exactly one point mutation away.
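The two counts can be checked by brute force for a short sequence. This is a minimal sketch; `one_step_neighbors` is a hypothetical helper name and the length $L = 4$ is chosen only to keep the enumeration small.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def one_step_neighbors(seq: str) -> list[str]:
    """All sequences that differ from `seq` by exactly one point mutation."""
    neighbors = []
    for pos, base in enumerate(seq):
        for alt in NUCLEOTIDES:
            if alt != base:  # 3 alternative bases at each of the L positions
                neighbors.append(seq[:pos] + alt + seq[pos + 1:])
    return neighbors


L = 4
all_sequences = ["".join(p) for p in product(NUCLEOTIDES, repeat=L)]
print(len(all_sequences))               # 4**4 = 256
print(len(one_step_neighbors("ACGT")))  # 3 * 4 = 12
```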
Additionally, if we assume strong selection and weak mutation (SSWM), then we can ignore the possibility of clonal interference.
Formally, $2Ns \gg 1$ and $N\mu < 1$, where $N$ is the population size, $s$ the selection coefficient and $\mu$ the mutation rate.
Therefore, each new mutant either fixes or is lost before the next mutant arises.
Double mutants can also be ignored (mutation is weak), as can neutral and deleterious mutations (under strong selection they rarely fix).
Consider a sequence that is currently the most fit among its one-step mutant neighbors.
A small change in the environment shifts it to be the $i$th most fit sequence among its one-step mutant neighbors, where $i$ is small.
There are then $i-1$ fitter sequences to which the population could move.
Notice that the fitnesses of these sequences lie in the tail of the fitness distribution.
We would like to find the probability that the population fixes mutant $j$ when starting from sequence $i$.
Since we are dealing only with the tail of the fitness distribution, we can apply extreme value theory (EVT).
Orr’s One Step Model
Assumptions
The fitness distribution is in the Gumbel domain of attraction; therefore, by the generalized Pareto distribution (GPD) approximation to the tail, the fitnesses of the $i-1$ fitter one-step mutants can be treated as draws from an ‘exponential’ distribution.
This yields a result that is independent of the underlying fitness distribution.
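Why a Gumbel-domain tail looks exponential can be checked numerically for the standard normal. A minimal sketch, assuming the normal as the example distribution: for a high threshold $u$, the excess over $u$ rescaled by $u$ should approach a unit exponential, i.e. $P(X > u + y/u \mid X > u) \to e^{-y}$. The $1/u$ scaling is specific to the normal tail; the helper names are hypothetical.

```python
import math


def normal_sf(x: float) -> float:
    """Upper-tail probability P(X > x) for X ~ N(0, 1), via erfc for accuracy far in the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def scaled_excess_sf(y: float, u: float) -> float:
    """P(X > u + y/u | X > u): survival of the excess over u, rescaled by 1/u."""
    return normal_sf(u + y / u) / normal_sf(u)


for u in (1.0, 2.0, 4.0, 8.0):
    approx = scaled_excess_sf(1.0, u)
    print(f"u={u}: {approx:.4f} vs exp(-1)={math.exp(-1):.4f}")
```

The gap to $e^{-1}$ shrinks as the threshold moves further into the tail, which is the sense in which only the tail matters.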
Lemma
Let $X_1, \ldots, X_n$ be iid observations with $X_i \sim \mathrm{Exp}(\lambda)$, and let $X_{(1)} \ge X_{(2)} \ge \cdots \ge X_{(n)}$ be the corresponding order statistics in decreasing order.
Then the spacings $\Delta X_i = X_{(i)} - X_{(i+1)}$, $i = 1, \ldots, n-1$, are independent and exponentially distributed, with
$$E(\Delta X_i) = \frac{E(\Delta X_1)}{i}.$$
Sukhatme (1937)
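A Monte Carlo check of the lemma, as a minimal sketch: draw $n$ exponentials, sort them in decreasing order, and average each spacing. With rate $\lambda$ the lemma gives $E(\Delta X_i) = 1/(i\lambda)$. The sample size, replicate count and helper name are illustrative assumptions.

```python
import random


def mean_spacings(n: int, reps: int, rate: float = 1.0, seed: int = 0) -> list[float]:
    """Monte Carlo estimate of E(dX_i), i = 1..n-1, where dX_i = X_(i) - X_(i+1)
    and X_(1) >= ... >= X_(n) are the order statistics of n iid Exp(rate) draws."""
    rng = random.Random(seed)
    totals = [0.0] * (n - 1)
    for _ in range(reps):
        x = sorted((rng.expovariate(rate) for _ in range(n)), reverse=True)
        for i in range(n - 1):
            totals[i] += x[i] - x[i + 1]
    return [t / reps for t in totals]


est = mean_spacings(n=10, reps=20000)
for i, m in enumerate(est, start=1):
    # Lemma: E(dX_i) = E(dX_1) / i, with E(dX_1) = 1 / rate
    print(f"i={i}: simulated {m:.3f}, lemma {est[0] / i:.3f}, exact {1 / i:.3f}")
```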
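Since $\pi_j \approx 2s_j$ (Haldane 1927), where $\pi_j$ is the fixation probability and $s_j$ the selection coefficient of mutant $j$ relative to the current sequence $i$, the probability of a transition from $i$ to $j$ is
$$P_{ij} = \frac{\pi_j}{\sum_{k=1}^{i-1} \pi_k} \approx \frac{s_j}{\sum_{k=1}^{i-1} s_k}.$$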
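Taking the expected value of the transition probability $P_{ij}$, with the fitness gaps between the $i$ fittest sequences given by the spacings of the lemma,
$$E(P_{ij}) \approx \frac{1}{i-1} \sum_{k=j}^{i-1} \frac{1}{k}.$$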
Notice that we now have an expression for the expected transition probability that is independent of the fitnesses of the individual sequences and depends only on $i$ and $j$.
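A minimal sketch that evaluates $E(P_{ij}) = \frac{1}{i-1}\sum_{k=j}^{i-1} 1/k$ and checks it by simulation. The simulation assumes exponential spacings $W_{(k)} - W_{(k+1)} \sim \mathrm{Exp}(k)$ (the lemma with $\lambda = 1$) and $P_{ij} \propto s_j \propto W_{(j)} - W_{(i)}$; dividing by $W_{(i)}$ cancels in the ratio. Function names are hypothetical.

```python
import random


def orr_expected_transition(i: int, j: int) -> float:
    """E(P_ij) = (1 / (i - 1)) * sum_{k=j}^{i-1} 1/k, for 1 <= j < i."""
    if not 1 <= j < i:
        raise ValueError("need 1 <= j < i")
    return sum(1.0 / k for k in range(j, i)) / (i - 1)


def simulated_transition(i: int, reps: int = 20000, seed: int = 1) -> list[float]:
    """Average s_j / sum_k s_k over replicates with exponential fitness spacings."""
    rng = random.Random(seed)
    totals = [0.0] * (i - 1)
    for _ in range(reps):
        # spacings[k-1] = W_(k) - W_(k+1) ~ Exp(rate k), k = 1..i-1 (Sukhatme)
        spacings = [rng.expovariate(k) for k in range(1, i)]
        # s_j is proportional to W_(j) - W_(i) = sum of spacings k = j..i-1
        s = [sum(spacings[j - 1:]) for j in range(1, i)]
        total = sum(s)
        for j in range(i - 1):
            totals[j] += s[j] / total
    return [t / reps for t in totals]


i = 5
exact = [orr_expected_transition(i, j) for j in range(1, i)]
approx = simulated_transition(i)
for j, (e, a) in enumerate(zip(exact, approx), start=1):
    print(f"P_{i}{j}: formula {e:.3f}, simulation {a:.3f}")
```

The probabilities sum to one over $j$, and the fittest mutant ($j = 1$) is always the most likely destination, but not by a wide margin.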
Can this model be validated empirically?
Experimental Evolution
Natural isolate ID11, which differs from G4 by about 3%
Microviridae; host: E. coli
5577 bp
20 one-step walks; 9 observed mutations
Rokyta et al. (2005)
Rokyta et al. concluded that Orr’s transition probabilities did not explain the data as well as the Wahl model, even after correcting the model for mutation bias.
Where did Orr go wrong?
Perhaps the tail of the fitness distribution is not in the Gumbel domain of attraction, and is therefore not exponentially distributed?
Extreme Value Theory
A field of statistical theory that attempts to describe the distribution of the extreme values (maxima and minima) of a sample from a given probability distribution.
Notice that the extreme values of a sample generally fall in the tail of the underlying probability distribution. For example, consider the maximum of a sample of size 10 from a standard normal distribution.
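The example can be made concrete. For $n$ iid standard normal draws, $P(\max > x) = 1 - \Phi(x)^n$ exactly, so the tail location of the maximum can be computed and compared with a simulation. A minimal sketch; the replicate count, seed and helper names are arbitrary choices.

```python
import random
import statistics
from statistics import NormalDist


def prob_max_exceeds(x, n=10):
    """Exact P(max of n iid N(0, 1) draws > x) = 1 - Phi(x)**n."""
    return 1.0 - NormalDist().cdf(x) ** n


def simulate_maxima(n=10, reps=20000, seed=1):
    """Monte Carlo sample of the maximum of n standard normal draws."""
    rng = random.Random(seed)
    return [max(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(reps)]


maxima = simulate_maxima()
median = NormalDist().inv_cdf(0.5 ** (1 / 10))  # exact median of the maximum
print(f"mean of simulated maxima: {statistics.fmean(maxima):.3f}")
print(f"exact median of the maximum: {median:.3f}")
print(f"P(max > 1) = {prob_max_exceeds(1.0):.3f}")  # a single draw exceeds 1 with prob ~0.159
```

The maximum typically sits around 1.5, a region that a single draw reaches less than 7% of the time.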
Since only the tail must be considered, many results of extreme value theory are independent of the underlying probability distribution.
In fact, EVT shows that almost all commonly used probability distributions can be classified into three groups by their tail behavior.
These three types are:
Weibull: finite-tailed distributions
Gumbel: most common distributions (exponential, normal, gamma, etc.)
Fréchet: heavy-tailed distributions (e.g. Cauchy)
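EVT allows all three types of tail behavior to be described by the generalized Pareto distribution (GPD):
$$F(x) = 1 - \left(1 + \frac{\kappa x}{\tau}\right)^{-1/\kappa}, \qquad x \ge 0,\; 1 + \kappa x/\tau > 0,$$
where $\tau > 0$ is the scale and $\kappa$ the shape parameter. The case $\kappa = 0$ is read as the limit $F(x) = 1 - e^{-x/\tau}$, the exponential. The sign of $\kappa$ determines the domain: $\kappa < 0$ Weibull (bounded tail with endpoint $-\tau/\kappa$), $\kappa = 0$ Gumbel, $\kappa > 0$ Fréchet.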
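A minimal sketch of the GPD in Python, assuming the parameterization $F(x) = 1 - (1 + \kappa x/\tau)^{-1/\kappa}$ with $\kappa = 0$ as the exponential limit. Sampling uses the inverse CDF, $x = \frac{\tau}{\kappa}\left((1-u)^{-\kappa} - 1\right)$. The function names are hypothetical, and the example values $\tau = 1$, $\kappa = -2$ are borrowed from the simulation settings reported later in the talk.

```python
import math
import random


def gpd_cdf(x, kappa, tau):
    """F(x) = 1 - (1 + kappa*x/tau)**(-1/kappa) for x >= 0; kappa = 0 is the exponential limit."""
    if x <= 0.0:
        return 0.0
    if kappa == 0.0:
        return 1.0 - math.exp(-x / tau)
    z = 1.0 + kappa * x / tau
    if z <= 0.0:  # at or past the upper endpoint -tau/kappa (only when kappa < 0)
        return 1.0
    return 1.0 - z ** (-1.0 / kappa)


def gpd_sample(kappa, tau, rng):
    """Inverse-CDF draw: solve F(x) = u for x, with u ~ Uniform[0, 1)."""
    u = rng.random()
    if kappa == 0.0:
        return -tau * math.log1p(-u)
    return tau / kappa * ((1.0 - u) ** (-kappa) - 1.0)


def tail_type(kappa):
    """Domain of attraction implied by the sign of the shape parameter."""
    if kappa < 0:
        return "Weibull (bounded tail, endpoint -tau/kappa)"
    if kappa == 0:
        return "Gumbel (exponential tail)"
    return "Fréchet (heavy tail)"


rng = random.Random(0)
draws = [gpd_sample(-2.0, 1.0, rng) for _ in range(10000)]
frac = sum(d <= 0.25 for d in draws) / len(draws)
print(max(draws) < 0.5, tail_type(-2.0))
print(f"empirical F(0.25) = {frac:.3f}, model F(0.25) = {gpd_cdf(0.25, -2.0, 1.0):.3f}")
```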
The GPD not only provides the natural alternative distribution for testing against the exponential in this context; the null model $\kappa = 0$ is also nested within it, which allows the application of maximum likelihood and likelihood ratio testing (LRT).
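Maximum Likelihood and LRT
The log-likelihood of the GPD for observations $x_1, \ldots, x_n$ is
$$\ell(\kappa, \tau) = -n \log \tau - \left(1 + \frac{1}{\kappa}\right) \sum_{i=1}^{n} \log\left(1 + \frac{\kappa x_i}{\tau}\right),$$
with $\ell(0, \tau) = -n \log \tau - \sum_{i=1}^{n} x_i / \tau$ in the exponential case.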
Distribution of the LRT test statistic?
Although a common approximation is to assume a chi-squared distribution with one degree of freedom, this does not appear to be the case here.
The distribution of the test statistic was therefore calculated using a parametric bootstrap.
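A minimal sketch of the LRT with a parametric bootstrap, not the author's implementation. Assumptions: the GPD log-likelihood above; a crude maximum-likelihood fit by grid search over $\theta = \kappa/\tau$, using the fact that for fixed $\theta$ the likelihood is maximized at $\kappa = \frac{1}{n}\sum \log(1 + \theta x_i)$ (the profile approach described by Grimshaw); and a restriction to $\kappa \ge -1$, because the GPD likelihood is unbounded below that. The search range, grid size and replicate count are arbitrary choices.

```python
import math
import random


def gpd_loglik(x, kappa, tau):
    """GPD log-likelihood; kappa = 0 is the exponential null. -inf outside the support."""
    if tau <= 0.0:
        return -math.inf
    n = len(x)
    if kappa == 0.0:
        return -n * math.log(tau) - sum(x) / tau
    total = 0.0
    for xi in x:
        z = 1.0 + kappa * xi / tau
        if z <= 0.0:
            return -math.inf
        total += math.log(z)
    return -n * math.log(tau) - (1.0 + 1.0 / kappa) * total


def gpd_mle(x, grid_size=200):
    """Crude MLE (kappa, tau, loglik) via a grid over theta = kappa / tau."""
    n, x_max = len(x), max(x)
    x_bar = sum(x) / n
    best = (0.0, x_bar, gpd_loglik(x, 0.0, x_bar))  # start from the exponential fit
    lo, hi = -1.0 / x_max, 20.0 / x_bar  # assumed search range for theta
    for step in range(1, grid_size):
        theta = lo + (hi - lo) * step / grid_size
        if abs(theta) < 1e-12:
            continue
        kappa = sum(math.log1p(theta * xi) for xi in x) / n  # profile maximizer
        if kappa < -1.0 or kappa == 0.0:  # likelihood unbounded for kappa < -1
            continue
        ll = gpd_loglik(x, kappa, kappa / theta)
        if ll > best[2]:
            best = (kappa, kappa / theta, ll)
    return best


def lrt_statistic(x):
    """2 * (max GPD log-likelihood - max exponential log-likelihood)."""
    x_bar = sum(x) / len(x)
    return 2.0 * (gpd_mle(x)[2] - gpd_loglik(x, 0.0, x_bar))


def bootstrap_pvalue(x, reps=199, seed=0):
    """Parametric bootstrap of the LRT statistic under the fitted exponential null."""
    rng = random.Random(seed)
    n, x_bar = len(x), sum(x) / len(x)
    observed = lrt_statistic(x)
    null_stats = [
        lrt_statistic([rng.expovariate(1.0 / x_bar) for _ in range(n)])
        for _ in range(reps)
    ]
    p_value = (1 + sum(s >= observed for s in null_stats)) / (reps + 1)
    return observed, p_value


# Hypothetical data set of 20 effects drawn under the null (exponential).
data_rng = random.Random(42)
data = [data_rng.expovariate(1.0) for _ in range(20)]
stat, p = bootstrap_pvalue(data)
print(f"LRT = {stat:.3f}, bootstrap p = {p:.3f}")
```

Because the null distribution is simulated at the fitted exponential, the test does not rely on the chi-squared approximation.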
Power
The probability of rejecting the null when the alternative is true, i.e. 1 − P(Type II error).
Can we hope to reject the null with a given data set?
Sensitivity Analysis
Determine the inflation of the Type I error rate under violations of the null.
If the null is rejected, what is the chance that the rejection was due to an inflated α caused by violations of the null hypothesis’s assumptions?
Violations of the Null Assumptions
1. Small-effect mutations have a low probability of fixation and therefore may not be observed.
2. Observations include measurement error, which may be normal or lognormal.
The GPD is stable under shifts of the threshold (exceedances of a GPD over a higher threshold are again GPD with the same shape $\kappa$), so analyze the data relative to the smallest observed value!
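The threshold stability can be verified directly: for $X \sim \mathrm{GPD}(\kappa, \tau)$ and a threshold $u$ inside the support, $P(X - u > y \mid X > u)$ equals the survival function of a GPD with the same $\kappa$ and scale $\tau + \kappa u$. A minimal sketch; the helper names and test values are illustrative, and `exceedances_over_min` is a hypothetical way of taking the smallest observation as the threshold.

```python
import math


def gpd_sf(x, kappa, tau):
    """Survival function P(X > x) of GPD(kappa, tau); kappa = 0 is exponential."""
    if x <= 0.0:
        return 1.0
    if kappa == 0.0:
        return math.exp(-x / tau)
    z = 1.0 + kappa * x / tau
    if z <= 0.0:  # past the upper endpoint -tau/kappa (kappa < 0)
        return 0.0
    return z ** (-1.0 / kappa)


def excess_sf(y, u, kappa, tau):
    """P(X - u > y | X > u): survival of the exceedance over threshold u."""
    return gpd_sf(u + y, kappa, tau) / gpd_sf(u, kappa, tau)


def exceedances_over_min(x):
    """Use the smallest observation as the threshold; return the others relative to it."""
    ordered = sorted(x)
    return [xi - ordered[0] for xi in ordered[1:]]


# The exceedance over u matches a GPD with the same kappa and scale tau + kappa * u.
cases = [(-0.5, 1.0, 0.5, 0.3), (0.3, 2.0, 1.0, 2.5), (0.0, 1.5, 2.0, 0.7)]
for kappa, tau, u, y in cases:
    lhs = excess_sf(y, u, kappa, tau)
    rhs = gpd_sf(y, kappa, tau + kappa * u)
    print(f"kappa={kappa}: {lhs:.6f} vs {rhs:.6f}")
print(exceedances_over_min([3.0, 1.0, 2.5]))  # [1.5, 2.0]
```

Since the shape is unchanged, the test of $\kappa = 0$ is unaffected by not observing effects below the smallest one.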
If measurement error is not considered and our test rejects, our conclusion is likely safe, provided the error is small.
If we fail to reject, it is likely due to the loss of power incurred when the assumptions of the null model are false.
In this case, we must reanalyze our data incorporating measurement error.
The likelihood of the GPD with normal or lognormal measurement error has no closed form.
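Without a closed form, each observation's likelihood can still be evaluated numerically as a convolution. A minimal sketch, assuming additive normal error: the observed value is $X = Y + e$ with $Y \sim \mathrm{GPD}(\kappa, \tau)$ and $e \sim N(0, \sigma^2)$, and the density of $X$ is approximated by the midpoint rule over the support of $Y$. The function names, step count and 8σ truncation are assumptions; for $\kappa < -1$ the GPD density is singular at its endpoint and this simple rule becomes crude.

```python
import math
from statistics import NormalDist


def gpd_pdf(y, kappa, tau):
    """GPD density (1/tau) * (1 + kappa*y/tau)**(-1/kappa - 1) on its support."""
    if y < 0.0:
        return 0.0
    if kappa == 0.0:
        return math.exp(-y / tau) / tau
    z = 1.0 + kappa * y / tau
    if z <= 0.0:
        return 0.0
    return z ** (-1.0 / kappa - 1.0) / tau


def observed_density(x, kappa, tau, sigma, steps=2000):
    """Density of X = Y + e: integral of gpd_pdf(y) * normal_pdf(x - y) dy, midpoint rule.
    The normal factor is negligible more than 8 sigma from x."""
    lower = max(0.0, x - 8.0 * sigma)
    upper = x + 8.0 * sigma
    if kappa < 0.0:
        upper = min(upper, -tau / kappa)  # GPD support ends at -tau/kappa
    if upper <= lower:
        return 0.0
    noise = NormalDist(0.0, sigma)
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        y = lower + (k + 0.5) * h
        total += gpd_pdf(y, kappa, tau) * noise.pdf(x - y)
    return total * h


def loglik_with_error(data, kappa, tau, sigma):
    """Log-likelihood of noisy observations; -inf if any density is 0."""
    total = 0.0
    for x in data:
        d = observed_density(x, kappa, tau, sigma)
        if d <= 0.0:
            return -math.inf
        total += math.log(d)
    return total
```

For $\kappa = 0$ the convolution is the exponentially modified normal, $\frac{1}{\tau} e^{\sigma^2/(2\tau^2) - x/\tau}\, \Phi(x/\sigma - \sigma/\tau)$, which gives a check on the numerics.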
Standard optimization procedures fail to converge.
Metropolis-Hastings and Bayesian Methods
MH Algorithm
Given X(t):
1. Generate a proposal Y(t) ~ g(y − X(t)).
2. Set X(t+1) = Y(t) with probability min(1, f(Y(t))/f(X(t))), and X(t+1) = X(t) otherwise.
Because this acceptance ratio omits g, the proposal g must be symmetric (e.g. normal); the chain then has the posterior f as its stationary distribution and, under mild regularity conditions, converges to it.
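A minimal random-walk Metropolis sketch of the two steps above, with a normal (symmetric) proposal and the ratio computed on the log scale. It is checked here on a standard normal target, where the answer is known; the function name, step size, chain length and burn-in are illustrative assumptions.

```python
import math
import random


def metropolis_hastings(log_f, x0, step, n_iter, seed=0):
    """Random-walk Metropolis sampler for an unnormalized log-density log_f.

    Proposal: Y = X(t) + step * Z with Z ~ N(0, I), i.e. g is a symmetric normal,
    so the acceptance probability is min(1, f(Y) / f(X(t))).
    """
    rng = random.Random(seed)
    x = list(x0)
    log_fx = log_f(x)
    chain = []
    for _ in range(n_iter):
        y = [xi + step * rng.gauss(0.0, 1.0) for xi in x]
        log_fy = log_f(y)
        # Compare on the log scale to avoid overflow; exp(-inf) == 0 rejects
        # proposals outside the support.
        if log_fy >= log_fx or rng.random() < math.exp(log_fy - log_fx):
            x, log_fx = y, log_fy
        chain.append(list(x))  # X(t+1)
    return chain


# Usage on a target whose answer is known: a standard normal.
chain = metropolis_hastings(lambda v: -0.5 * v[0] ** 2, [0.0], step=1.0, n_iter=20000, seed=3)
samples = [c[0] for c in chain[2000:]]  # discard burn-in
mean = sum(samples) / len(samples)
sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(f"mean={mean:.3f}, sd={sd:.3f}")
```

For the GPD model, `log_f` would be a log-likelihood (for example, one including measurement error) plus a log-prior over (κ, τ, σ); the prior and step size are modeling choices not specified here.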
τ = 1, κ = −2, σ = 0.1:
mean = −1.64, 95% CI = (−2.70, −0.826)
mean = 0.893, 95% CI = (0.509, 1.41)
mean = −1.818, CI = (−2.23, −1.47)
mean = 0.083, 95% CI = (0.034, 0.160)
Thanks to
Darin Rokyta
Paul Joyce
Holly Wichman
Jim Bull
IBEST
NIH
E. coli
References
Gillespie, J. H. 1984. Molecular evolution over the mutational landscape. Evolution 38:1116–1129.
Gillespie, J. H. 1991. The Causes of Molecular Evolution. Oxford Univ. Press, New York.
Gumbel, E. J. 1958. Statistics of Extremes. Columbia Univ. Press, New York.
Orr, H. A. 2002. The population genetics of adaptation: the adaptation of DNA sequences. Evolution 56:1317–1330.
Orr, H. A. 2003a. The distribution of fitness effects among beneficial mutations. Genetics 163:1519–1526.
Rokyta, D. R., Joyce, P., Caudle, S. B., and Wichman, H. A. 2005. An empirical test of the mutational landscape model of adaptation using a single-stranded DNA virus. Nat. Genet. 37:441–444.
Rokyta, D. R., Beisel, C. J., and Joyce, P. 2006. Properties of adaptive walks on uncorrelated landscapes under strong selection and weak mutation. Journal of Theoretical Biology 243(1):114–120.
Beisel, C. J., Rokyta, D. R., Wichman, H. A., and Joyce, P. Testing the extreme value domain of attraction for beneficial fitness effects. Genetics (submitted).