FOOTPRINT
INTRODUCTION
COALESCENT VIEW
RESULTS
SUMMARY
Molecular population genetics of adaptation from recurrent beneficial mutation
Joachim Hermisson and Pleuni Pennings,
LMU Munich
How can genetic variation be maintained in a population in the face of positive selection?
Selective sweep with recombination
Recurrent mutation
Classical view: adaptive substitutions occur from a single mutational origin
What happens if the same beneficial allele arises recurrently in a population?
Soft sweep from recurrent mutation
[Figure: frequency of the beneficial allele (0 to 1) over time]
Is recurrent mutation relevant?
• What is the probability of a soft sweep under recurrent mutation?
• What is the impact on patterns of neutral polymorphism?
Model
• Haploid population of constant size Ne
• At the selected locus: recurrent mutation at rate u to a beneficial allele (or a class of equivalent alleles) with selective advantage s
• Scaled values: θ = 2Ne u, α = 2Ne s, R = 2Ne r (r: recombination rate between the selected site and a linked neutral locus)
• Generation update: Wright-Fisher model (fitness-weighted multinomial sampling)
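A minimal sketch of one possible implementation of the model above, for readers who want to generate sweep paths x(t). Assumptions not stated on the slide: mutation is applied before selection in each generation, back mutation is ignored, and with only two allele classes the multinomial sampling reduces to binomial sampling. The function name `wright_fisher_sweep` is hypothetical.

```python
import random

def wright_fisher_sweep(N, u, s, x0=0.0, max_gens=10_000, seed=None):
    """Frequency path of a beneficial allele in a haploid Wright-Fisher population.

    Each generation (assumed order): recurrent mutation wild type -> beneficial
    at rate u, then fitness-weighted sampling of N offspring, where the
    beneficial allele has relative fitness 1 + s. Back mutation is ignored.
    Stops at fixation or after max_gens generations.
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(max_gens):
        x_mut = x + u * (1.0 - x)                  # recurrent beneficial mutation
        p = x_mut * (1.0 + s) / (1.0 + s * x_mut)  # fitness weighting (selection)
        # Two allele classes: multinomial sampling reduces to binomial sampling.
        count = sum(1 for _ in range(N) if rng.random() < p)
        x = count / N
        path.append(x)
        if count == N:                             # beneficial allele fixed
            break
    return path

# theta = 2 N u = 2 and alpha = 2 N s = 200 for these (illustrative) values.
path = wright_fisher_sweep(N=1000, u=1e-3, s=0.1, seed=1)
print(f"fixed: {path[-1] == 1.0}, after {len(path) - 1} generations")
```

Drawing N Bernoulli variables per generation keeps the sketch dependency-free; for large N a library binomial sampler would be the faster choice.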
Coalescent view: Genealogy of a sample from a linked locus
• What can happen one generation back in time?
[Figure: n sample lines traced back in time; the beneficial allele has frequency x, the wild type 1 − x]
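Coalescent view: Coalescence of two lines
• Rate per generation:
[Figure: sample lines in the beneficial background (frequency x) and the wild-type background (1 − x) over time]
$$\lambda_{c,n}(x) = \binom{n}{2}\frac{1}{N_e x}$$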
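Coalescent view: Recombination
• Rate per generation:
[Figure: sample lines in the beneficial background (frequency x) and the wild-type background (1 − x) over time]
$$\lambda_{r,n}(x) = \frac{nR(1-x)}{2N_e}$$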
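Coalescent view: New mutation at the selected site
• Rate per generation:
[Figure: sample lines in the beneficial background (frequency x) and the wild-type background (1 − x) over time]
$$\lambda_{m,n}(x) = \frac{n\theta(1-x)}{2N_e x}$$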
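Coalescent view
Problem: the rates for
• coalescence
• recombination
• beneficial mutation
depend on the frequency x of the selected allele, which follows a stochastic path:
$$\lambda_{c,n}(x) = \binom{n}{2}\frac{1}{N_e x}, \qquad \lambda_{m,n}(x) = \frac{n\theta(1-x)}{2N_e x}, \qquad \lambda_{r,n}(x) = \frac{nR(1-x)}{2N_e}$$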
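The three rates can be written as plain functions to see how each one changes along the path x. This is a sketch assuming the rate forms given above (in particular a per-line recombination rate r(1 − x) with R = 2Ne r); the function names are hypothetical. Note that the ratio of mutation to coalescence rate for two lines is θ(1 − x), so it depends on x only through the factor 1 − x.

```python
from math import comb

def coalescence_rate(n, x, n_e):
    """lambda_{c,n}(x): coalescence of any pair among n lines in the beneficial background."""
    return comb(n, 2) / (n_e * x)

def recombination_rate(n, x, n_e, R):
    """lambda_{r,n}(x): a line recombines into the wild-type background; R = 2 N_e r."""
    return n * R * (1.0 - x) / (2.0 * n_e)

def mutation_rate(n, x, n_e, theta):
    """lambda_{m,n}(x): a line traces back to a new beneficial mutation; theta = 2 N_e u."""
    return n * theta * (1.0 - x) / (2.0 * n_e * x)

# Rates per generation for n = 5 lines at different points of the sweep.
for x in (0.9, 0.5, 0.1):
    print(x,
          coalescence_rate(5, x, 10_000),
          recombination_rate(5, x, 10_000, R=10),
          mutation_rate(5, x, 10_000, theta=0.4))
```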
Coalescent view: Classic case, coalescence and recombination
• Probability for multiple haplotypes in a sample after a sweep due to recombination:
$$\Pr_{reco} \approx \frac{R}{\alpha}\log(\alpha)$$
(Higher orders: Etheridge, Pfaffelhuber, Wakolbinger)
• Small for large α (strong selection makes broad sweep patterns)
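Coalescent view: Coalescence and mutation, sample of size 2
Probability for coalescence before mutation (single haplotype), with x_i the frequency of the beneficial allele i generations before sampling:
$$\lambda_{c,2}(x) = \frac{1}{N_e x}, \qquad \lambda_{m,2}(x) = \frac{\theta(1-x)}{N_e x}$$
$$P_{hard,2} = \sum_{i=1}^{\infty}\lambda_{c,2}(x_i)\prod_{j=1}^{i-1}\bigl(1-\lambda_{c,2}(x_j)-\lambda_{m,2}(x_j)\bigr)$$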
$$P_{hard,2} = \sum_{i=1}^{\infty}\frac{1}{N_e x_i}\prod_{j=1}^{i-1}\left(1-\frac{1+\theta(1-x_j)}{N_e x_j}\right)$$
$$P_{hard,2} = \frac{1}{1+\theta}\sum_{i=1}^{\infty}\left(\frac{1+\theta(1-x_i)}{N_e x_i}+\frac{\theta x_i}{N_e x_i}\right)\prod_{j=1}^{i-1}\left(1-\frac{1+\theta(1-x_j)}{N_e x_j}\right)$$
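$$P_{hard,2} = \frac{1}{1+\theta}\left[\sum_{i=1}^{\infty}\frac{1+\theta(1-x_i)}{N_e x_i}\prod_{j=1}^{i-1}\left(1-\frac{1+\theta(1-x_j)}{N_e x_j}\right)+\frac{\theta}{N_e}\sum_{i=1}^{\infty}\prod_{j=1}^{i-1}\left(1-\frac{1+\theta(1-x_j)}{N_e x_j}\right)\right]$$
The first sum is the probability that a coalescence or mutation event eventually occurs (= 1); the second sum is the expected waiting time T_1 until that first event.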
Probability for single or multiple haplotypes:
$$P_{hard,2} = \frac{1}{1+\theta}\left(1+\frac{\theta T_1}{N_e}\right), \qquad P_{soft,2} = \frac{\theta}{1+\theta}\left(1-\frac{T_1}{N_e}\right)$$
T_1: average time to the first coalescence or mutation event
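The step from the product formula to the closed form uses two facts: λ_c,2 + λ_m,2 + θ/Ne = (1 + θ) λ_c,2 for every x, and the sum of first-event probabilities telescopes. A numerical sketch can check this on a truncated path: if S_end is the probability that no event has happened by the end of the path, the identity reads P_hard,2 = (1 − S_end + θ T_1/Ne)/(1 + θ), which reduces to the formula above when S_end = 0. The logistic path and the function name are hypothetical choices for illustration; the path is cut where Ne x ≤ 1 + θ so that every per-generation probability stays valid.

```python
import math

def hard_sweep_terms(path, n_e, theta):
    """Evaluate the discrete-generation sums for a sample of size 2.

    path: frequencies x_1, x_2, ... of the beneficial allele, i generations
    before sampling (hypothetical input with n_e * x > 1 + theta throughout).
    Returns (P_hard, T_1, S_end): probability that coalescence is the first
    event, expected generations to the first event (truncated at the end of
    the path), and probability that no event happened along the whole path.
    """
    survive = 1.0  # probability that neither coalescence nor mutation happened yet
    p_hard = 0.0
    t1 = 0.0
    for x in path:
        lam_c = 1.0 / (n_e * x)                # coalescence rate
        lam_m = theta * (1.0 - x) / (n_e * x)  # mutation rate
        t1 += survive                          # adds P(first event at generation >= i)
        p_hard += lam_c * survive              # coalescence is the first event, in generation i
        survive *= 1.0 - lam_c - lam_m
    return p_hard, t1, survive

# Hypothetical deterministic (logistic) sweep path, traced back from sampling.
n_e, theta, s = 10_000, 0.4, 0.05
path = [1.0 / (1.0 + math.exp(s * (i - 200))) for i in range(1, 1000)]
path = [x for x in path if n_e * x > 1.0 + theta]

p_hard, t1, s_end = hard_sweep_terms(path, n_e, theta)
closed_form = (1.0 - s_end + theta * t1 / n_e) / (1.0 + theta)
print(p_hard, closed_form)  # agree up to floating-point rounding
```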
Sampling at the time of fixation: 0 < T_1 < T_fix, hence
$$\frac{\theta}{1+\theta}\left(1-\frac{T_{fix}}{N_e}\right) < P_{soft,2} < \frac{\theta}{1+\theta}$$
12
FOOTPRINT
INTRODUCTION
COALESCENT VIEW
RESULTS
SUMMARY
Coalescent view Coalescence and mutation, sample of size 2
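General: sampling T_obs generations after fixation:
$$\left(1-\frac{1}{N_e}\right)^{T_{obs}}\frac{\theta}{1+\theta}\left(1-\frac{T_{fix}}{N_e}\right) < P_{soft,2} < \left(1-\frac{1}{N_e}\right)^{T_{obs}}\frac{\theta}{1+\theta}$$
The extra factor can be ignored for T_obs ≪ Ne.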
with T_fix / Ne ≈ 4 log(α) / α, where α = 2Ne s (scaled selection strength)
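Plugging the approximation T_fix / Ne ≈ 4 log(α)/α into the bounds for sampling at fixation gives numbers that can be compared with the simulation figure below (θ = 0.4). A minimal sketch; the function name is hypothetical, and the lower bound is clipped at zero for very weak selection where the approximation is no longer meaningful.

```python
import math

def p_soft2_bounds(theta, alpha):
    """Bounds on P_soft,2 when sampling at fixation.

    Uses T_fix / N_e ~ 4 log(alpha) / alpha (alpha = 2 N_e s): the lower
    bound corresponds to T_1 = T_fix, the upper bound to T_1 = 0.
    """
    upper = theta / (1.0 + theta)
    t_fix_scaled = 4.0 * math.log(alpha) / alpha
    lower = upper * max(0.0, 1.0 - t_fix_scaled)
    return lower, upper

for alpha in (100, 1_000, 10_000):
    lo, hi = p_soft2_bounds(0.4, alpha)
    print(f"alpha={alpha:>6}: {lo:.3f} < P_soft,2 < {hi:.3f}")
```

The gap between the bounds shrinks quickly with α, which is why the mutation-only formula θ/(1 + θ) is accurate for strong selection.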
[Figure: P_soft,2 (0.05 to 0.3) versus 2Ne s (100 to 10 000); simulation results (θ = 0.4) compared with $P_{soft,2} \approx \frac{\theta}{1+\theta}\left(1-\frac{4\log\alpha}{\alpha}\right)$]
For α > 500: T_fix / Ne ≪ 1, thus
$$P_{soft,2} = \frac{\theta}{1+\theta}\left(1-\frac{T_1}{N_e}\right) \approx \frac{\theta}{1+\theta}$$
This corresponds to the approximation
$$\lambda_{m,2}(x) = \frac{\theta(1-x)}{N_e x} \approx \frac{\theta}{N_e x}$$
Coalescent view: Coalescence and mutation, sample of size n
$$\lambda_{c,n}(x) = \binom{n}{2}\frac{1}{N_e x}, \qquad \lambda_{m,n}(x) = \frac{n\theta(1-x)}{2N_e x}$$
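Continuous time and time rescaling, $d\tilde{t} = dt/(N_e x)$, with the approximation (1 − x) ≈ 1:
$$\tilde{\lambda}_{c,n} = \binom{n}{2}, \qquad \tilde{\lambda}_{m,n} = \frac{n\theta}{2}$$
Neutral coalescent!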
• Problem independent of the path x and of all selection parameters
• Coalescent of the infinite-alleles model
• Forward in time: “Hoppe urn” or Yule process with immigration
The sampling distribution of ancestral haplotypes can be approximated by the distribution of family sizes in a Hoppe urn or a Yule process with immigration.
Solved problem
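A minimal sketch of the Hoppe urn in its standard form: balls are added one at a time, and when i balls are already in the urn, the next one starts a new family (a new ancestral haplotype) with probability θ/(θ + i), otherwise it copies the family of a uniformly chosen existing ball. The function name is hypothetical. Family sizes from this urn follow the Ewens sampling formula, so repeated draws give a Monte Carlo estimate of the soft-sweep probability, or of the frequency of the major haplotype (the first entry of the returned list).

```python
import random
from collections import Counter

def hoppe_urn(n, theta, rng=None):
    """Family sizes of a sample of size n drawn from a Hoppe urn with parameter theta.

    Returns the family sizes (ancestral haplotype counts), largest first.
    """
    rng = rng or random.Random()
    labels = []      # family label of each ball drawn so far
    next_label = 0
    for i in range(n):
        if rng.random() < theta / (theta + i):
            labels.append(next_label)          # new family: new ancestral haplotype
            next_label += 1
        else:
            labels.append(rng.choice(labels))  # copy the family of an existing ball
    return sorted(Counter(labels).values(), reverse=True)

# Monte Carlo estimate of P(more than one ancestral haplotype), n = 20, theta = 0.4.
rng = random.Random(42)
reps = 20_000
soft = sum(len(hoppe_urn(20, 0.4, rng)) > 1 for _ in range(reps)) / reps
print(soft)
```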
Results: Ewens sampling formula
• Probability for k haplotypes, occurring n_1, …, n_k times in a sample of size n:
$$\Pr(n_1,\ldots,n_k) = \frac{n!\,\theta^k}{k!\,n_1 n_2\cdots n_k\,\theta(\theta+1)\cdots(\theta+n-1)}$$
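The formula translates directly into code. In this sketch (function names hypothetical) the counts are read as an ordered list of family sizes with families listed in random order, which is what the k! in the denominator accounts for; summed over all orderings and all partitions of n, the probabilities add up to 1.

```python
from math import factorial, prod

def rising_factorial(theta, n):
    """theta * (theta + 1) * ... * (theta + n - 1)."""
    return prod(theta + i for i in range(n))

def ewens_probability(counts, theta):
    """Pr(n_1, ..., n_k) from the Ewens sampling formula.

    counts: haplotype counts n_1, ..., n_k (families in random order), summing to n.
    """
    n, k = sum(counts), len(counts)
    numerator = factorial(n) * theta**k
    denominator = factorial(k) * prod(counts) * rising_factorial(theta, n)
    return numerator / denominator

# Sample of size 3, theta = 0.5: all orderings of all family-size configurations.
configs = [(3,), (1, 2), (2, 1), (1, 1, 1)]
print([round(ewens_probability(c, 0.5), 4) for c in configs])
```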
• Probability for more than one ancestral haplotype in a sample (“soft sweep”):
$$P_{soft,n} = 1 - \prod_{i=1}^{n-1}\frac{i}{i+\theta}$$
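The soft-sweep probability is a one-line product. For n = 2 it reduces to θ/(1 + θ), the two-line result derived earlier, and for a sample of 20 with θ ≈ 0.05 it gives roughly 16%. A minimal sketch; the function name is hypothetical.

```python
from math import prod

def p_soft(n, theta):
    """P(more than one ancestral haplotype in a sample of size n), Ewens approximation."""
    return 1.0 - prod(i / (i + theta) for i in range(1, n))

for theta in (0.004, 0.04, 0.05, 0.4, 1.0, 4.0):
    print(theta, round(p_soft(20, theta), 3))
```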
Results: Probability of a soft sweep
Ewens approximation, sample size n = 20
[Figure: distribution of the number of ancestral haplotypes (1, 2, 3, 4, >4; 0% to 100%) for θ = 0.004, 0.04, 0.4, 1 and 4]
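The bars in this figure are the distribution of the number K of ancestral haplotypes. Under the Ewens sampling formula, P(K = k) = c(n, k) θ^k / [θ(θ + 1)⋯(θ + n − 1)], where c(n, k) are the unsigned Stirling numbers of the first kind. A sketch that computes these probabilities (function names hypothetical); P(K = 1) coincides with 1 − P_soft,n.

```python
from math import prod

def unsigned_stirling_first(n):
    """Row n of the unsigned Stirling numbers of the first kind, c(n, k) for k = 0..n."""
    row = [1]  # c(0, 0) = 1
    for m in range(n):
        # Recurrence: c(m + 1, k) = m * c(m, k) + c(m, k - 1)
        new = [0] * (len(row) + 1)
        for k, c in enumerate(row):
            new[k] += m * c
            new[k + 1] += c
        row = new
    return row

def haplotype_number_distribution(n, theta):
    """P(K = k) for k = 1..n under the Ewens sampling formula."""
    c = unsigned_stirling_first(n)
    norm = prod(theta + i for i in range(n))
    return {k: c[k] * theta**k / norm for k in range(1, n + 1)}

# Categories of the figure: 1, 2, 3, 4 and more than 4 haplotypes (n = 20).
for theta in (0.004, 0.04, 0.4, 1.0, 4.0):
    d = haplotype_number_distribution(20, theta)
    more = sum(p for k, p in d.items() if k > 4)
    print(theta, [round(d[k], 3) for k in range(1, 5)], round(more, 3))
```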
Simulation (2Ne s = 10 000, n = 20)
[Figure: distribution of the number of ancestral haplotypes (1, 2, 3, 4, >4; 0% to 100%) for θ = 0.004, 0.04, 0.4, 1 and 4]
Probability for multiple haplotypes: > 5% for θ > 0.01; > 95% for θ > 1
Results: Frequency of major haplotype
Sample size 10
[Figure: distribution (0 to 0.5) of the frequency of the major haplotype (5/10 to 9/10) for α = 100, 1000 and 10 000, compared with the prediction]
When should we expect soft sweeps? Multiple haplotypes due to recurrent beneficial mutations
• Strong dependence on the mutation rate
– More than 5% for θ > 0.01
– E.g. African D. melanogaster: θ ≈ 0.05 (Li / Stephan 2006): about 16% of all single-site adaptations are “soft”
• Particularly relevant for
– Large populations (e.g. bacteria)
– Adaptive (partial) loss-of-function mutations
Soft sweeps in data?
• Drosophila
– Schlenke and Begun (Genetics 2005): LD pattern at 3 immunity receptor genes in Californian D. simulans
• Humans
– Multiple origins of the FY-0 Duffy allele (loss of function)
• Plasmodium
– Multiple origins of pyrimethamine resistance mutations
Generality of the result: Migration instead of mutation
• Beneficial alleles enter by recurrent migration at rate M = 2Ne m from a genetically diverged source population
• Coalescent analysis with migration rate
$$\lambda_{m,n}(x) = \frac{nM}{2N_e x}$$
• Directly proportional to the coalescence rate (no factor 1 − x): the approximation holds exactly in this case
[Figure: P_soft,2 (0.05 to 0.3) versus selection strength 2Ne s (100 to 10 000) for migration (M = 0.4) and mutation (θ = 0.4)]
Generality of the result: Time- or frequency-dependent selection
• Results independent of the stochastic path x of the frequency of the beneficial allele
– Independent of any form of time or frequency dependence of the selection strength
– In particular: independent of the level of dominance
– In particular: also holds for adaptation from standing genetic variation (number of independent origins)
Generality of the result: Variance in selection coefficients
If the beneficial allele corresponds to a class of alleles, some fitness differences among variants are likely
Assume: 2 classes of alleles with selective advantage α(1 ± D) (D = coefficient of variation)
[Figure: number of haplotypes (1, 2, 3, 4, >4; 0% to 100%) as a function of D (0, 0.01, 0.05, 0.1, 0.2) for θ = 0.01, 0.1 and 1; α = 10 000]
[Figure: distribution (0 to 0.4) of the frequency of the major haplotype (5/10 to 9/10) for D = 0, 0.01, 0.05, 0.1 and 0.2; θ = 0.1]
Footprint of selection: Frequency spectrum of polymorphic sites
Ewens + neutral coalescent prior to the sweep:
– Derive the frequency distribution of ancestral variation that survives the sweep
– Skew toward intermediate allele frequencies (singleton frequency lower than neutral)
In contrast:
– Recombination haplotypes are most likely at low frequency
[Figure: probability of coalescence, recombination and mutation events]
[Figure: coalescence, recombination and mutation events for lines in the beneficial (x) and wild-type (1 − x) backgrounds over time]
Footprint of selection: Including recombination
• Analytical results
– E.g. probability for a single haplotype in a sample of two:
$$\Pr_{sing} \approx \frac{1}{1+\theta}\exp\left(-\frac{2R}{\alpha}\log\alpha\right)$$
– General: “marked Yule process with immigration”
For now …
• Simulation results
– Add recurrent mutation to the simulation program by Yuseob Kim
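The single-haplotype probability for two lines combines the mutation result 1/(1 + θ) with the chance that neither line escapes the sweep by recombination, each escaping with probability of order (R/α) log α. A minimal sketch of these two formulas as given above (function names hypothetical); it is only meaningful in the regime where (R/α) log α is small.

```python
import math

def pr_reco(R, alpha):
    """Leading-order probability that a single line escapes the sweep by recombination."""
    return R / alpha * math.log(alpha)

def pr_single(R, theta, alpha):
    """Probability of a single haplotype in a sample of two (mutation + recombination)."""
    return math.exp(-2.0 * pr_reco(R, alpha)) / (1.0 + theta)

# Neutral locus at increasing recombination distance, alpha = 2 Ne s = 10 000.
for R in (0, 10, 20, 100):
    print(R, round(pr_single(R, theta=0.4, alpha=10_000), 3))
```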
Footprint of selection: Power of Tajima’s D test at the selected gene
Neutral locus at recombination distance R to the selected site:
• Recombination width of the neutral locus R_n = 10
• Neutral mutational input θ_n = 10
• α = 2Ne s = 10 000
• Sample size 20
Power of Tajima’s D for various recombination distances and sampling times after fixation of the beneficial allele
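For reference, Tajima’s D compares the average number of pairwise differences π with the Watterson-type estimate S/a1 from the number S of segregating sites, normalized by its standard deviation (Tajima 1989). A minimal sketch, assuming biallelic sites coded as "0"/"1" strings of equal length; the function name is hypothetical. Excess rare variants give negative D, excess intermediate-frequency variants give positive D, which is why soft sweeps (skew toward intermediate frequencies) can push D either way.

```python
import math

def tajimas_d(haplotypes):
    """Tajima's D for a list of equal-length '0'/'1' strings, one per sampled sequence."""
    n = len(haplotypes)
    seg = [col for col in zip(*haplotypes) if len(set(col)) > 1]
    S = len(seg)
    if S == 0:
        return float("nan")  # undefined without segregating sites
    # pi: average number of pairwise differences, summed over segregating sites.
    pi = sum(col.count("1") * (n - col.count("1")) for col in seg) / math.comb(n, 2)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))

print(tajimas_d(["0", "0", "0", "1"]))  # singleton: negative D
print(tajimas_d(["0", "0", "1", "1"]))  # intermediate frequency: positive D
```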
Footprint of selection: Power of Tajima’s D test, single origin
[Figure: power as a function of the distance to the selected site in units of R = 2Ne r (0 to 600) and the time since fixation in units of 2Ne generations]
Footprint of selection: Power of Tajima’s D test, θ = 0.1
[Figure: power as a function of the distance to the selected site in units of R = 2Ne r (0 to 600) and the time since fixation in units of 2Ne generations]
Footprint of selection: Power of Tajima’s D test, θ = 0.4
[Figure: power as a function of the distance to the selected site in units of R = 2Ne r (0 to 600) and the time since fixation in units of 2Ne generations]
Footprint of selection: Power of Tajima’s D test, θ = 1
[Figure: power as a function of the distance to the selected site in units of R = 2Ne r (0 to 600) and the time since fixation in units of 2Ne generations]
Footprint of selection: Condition on soft sweeps, negative D (θ = 0.1)
[Figure: power as a function of the distance to the selected site in units of R = 2Ne r (0 to 600) and the time since fixation in units of 2Ne generations]
Footprint of selection: Condition on soft sweeps, positive D (θ = 0.1)
[Figure: power as a function of the distance to the selected site in units of R = 2Ne r (0 to 600) and the time since fixation in units of 2Ne generations]
Footprint of selection: Tests based on linkage disequilibrium
E.g. number-of-haplotypes test (K-test) by Depaulis and Veuille
• Conditioned on the number of segregating sites
• Zero recombination assumed for the neutral comparison
• Other values as before
Power of K for various recombination distances and sampling times after fixation of the beneficial allele
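The statistics behind these LD-based tests are easy to compute from a sample; the tests themselves compare them with a neutral distribution (for the K-test, conditioned on the number of segregating sites), which requires coalescent simulations not shown here. A minimal sketch, assuming biallelic "0"/"1" haplotype strings: K is the number of distinct haplotypes, and Kelly’s Z_nS (used later as “ancestral ZnS”) is the mean r² over all pairs of segregating sites. Function names are hypothetical.

```python
from itertools import combinations

def number_of_haplotypes(haplotypes):
    """K: number of distinct haplotypes in the sample."""
    return len(set(haplotypes))

def kelly_zns(haplotypes):
    """Kelly's Z_nS: mean r^2 over all pairs of segregating ('0'/'1') sites."""
    n = len(haplotypes)
    seg = [col for col in zip(*haplotypes) if len(set(col)) > 1]
    if len(seg) < 2:
        return float("nan")  # needs at least two segregating sites

    def r2(a, b):
        pa = a.count("1") / n
        pb = b.count("1") / n
        pab = sum(1 for u, v in zip(a, b) if u == v == "1") / n
        d = pab - pa * pb  # linkage disequilibrium coefficient D
        return d * d / (pa * (1 - pa) * pb * (1 - pb))

    pairs = list(combinations(seg, 2))
    return sum(r2(a, b) for a, b in pairs) / len(pairs)

sample = ["0011", "0011", "1100", "1100", "0011"]
print(number_of_haplotypes(sample), kelly_zns(sample))
```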
Footprint of selection: Power of haplotype test, single origin
[Figure: power as a function of the distance to the selected site in units of R = 2Ne r (0 to 600) and the time since fixation in units of 2Ne generations]
Footprint of selection: Power of haplotype test, θ = 0.1
[Figure: power as a function of the distance to the selected site in units of R = 2Ne r (0 to 600) and the time since fixation in units of 2Ne generations]
Footprint of selection: Power of haplotype test, θ = 0.4
[Figure: power as a function of the distance to the selected site in units of R = 2Ne r (0 to 600) and the time since fixation in units of 2Ne generations]
Footprint of selection: Power of haplotype test, θ = 1
[Figure: power as a function of the distance to the selected site in units of R = 2Ne r (0 to 600) and the time since fixation in units of 2Ne generations]
Footprint of selection: Power of haplotype test, θ = 4
[Figure: power as a function of the distance to the selected site in units of R = 2Ne r (0 to 600) and the time since fixation in units of 2Ne generations]
Footprint of selection: Condition on soft sweeps, number of haplotypes (θ = 0.1)
[Figure: power as a function of the distance to the selected site in units of R = 2Ne r (0 to 600) and the time since fixation in units of 2Ne generations]
Footprint of selection: Tests based on linkage disequilibrium
Can we extend high power to a longer time after fixation?
Idea:
• Use only ancestral variation
– E.g. local adaptation to an “island”: use only polymorphisms shared with the continental founder population
• Adapt the neutral standard of the test accordingly
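One way to read “use only ancestral variation” in code: keep only sites that are polymorphic both in the island sample and in the continental sample, then compute the LD statistic (here K) on the reduced island haplotypes. This is a sketch under assumptions not stated on the slide: aligned "0"/"1" haplotype strings, and “shared” meaning polymorphic in both samples. The function names and the toy data are hypothetical, and the neutral standard of the test would have to be adapted in the same way.

```python
def shared_polymorphism_sites(island, continent):
    """Indices of sites polymorphic in both samples (aligned '0'/'1' strings)."""
    island_cols = list(zip(*island))
    continent_cols = list(zip(*continent))
    return [i for i, (a, b) in enumerate(zip(island_cols, continent_cols))
            if len(set(a)) > 1 and len(set(b)) > 1]

def ancestral_haplotypes(island, continent):
    """Island haplotypes reduced to the shared ('ancestral') polymorphisms."""
    keep = shared_polymorphism_sites(island, continent)
    return ["".join(h[i] for i in keep) for h in island]

# Toy data: site 0 is polymorphic in both samples; site 3 only on the island
# (e.g. a recent mutation), site 1 only on the continent.
island = ["0010", "0011", "1010", "1011"]
continent = ["0000", "1000", "0100", "1100"]
reduced = ancestral_haplotypes(island, continent)
print(reduced, "K =", len(set(reduced)))
```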
Footprint of selection: Condition on soft sweeps, ancestral haplotypes (θ = 0.1)
[Figure: power as a function of the distance to the selected site in units of R = 2Ne r (0 to 600) and the time since fixation in units of 2Ne generations]
Footprint of selection: Condition on soft sweeps, “ancestral ZnS” (θ = 0.1)
[Figure: power as a function of the distance to the selected site in units of R = 2Ne r (0 to 600) and the time since fixation in units of 2Ne generations]
Summary
• Soft sweeps from recurrent mutation are likely for biologically realistic parameter values
• Pattern described by the Ewens sampling distribution
• Result very stable with respect to the selection scenario
• May be detected by LD tests, in particular if recent mutations can be filtered out
Open Issues
• Unified Yule process (?) theory of coalescence, recombination, and mutation
• Description of LD patterns after soft (or hard) sweeps: Which aspect lasts the longest?