Copyright: Gilean McVean, 2001 1
Population structure
• The evolutionary significance of structure
• Detecting and describing structure
– Wright’s F statistics
• Implications for genetic variability
– Inbreeding effects of structure
– The Wahlund effect
– Drift and founder effects
• Island models of population structure
– Identity by descent
– Diffusion methods
– The coalescent with structure
• Selection in subdivided populations
– Local adaptation
– Clines
– Wright’s Shifting-Balance theory
Population structure
• Non-random location
• Non-random mating
Genetic and phenotypic divergence due to
– Chance
– Selection
– Selection plus chance
[Figure: distribution of surnames; labels: Hannah, Goodacre and Sykes]
Detecting and describing genetic structure
Wright’s FST statistic
$$F_{ST} = \frac{H_T - H_S}{H_T}$$
where $H_S$ is the average heterozygosity within subpopulations and $H_T$ is the heterozygosity over all populations.
Testing by permutation
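The slide only names the permutation test, so the following is a minimal sketch of one common way to do it, not necessarily the procedure the lecture had in mind. It assumes alleles are sampled as simple labels, uses uncorrected sample heterozygosities, and weights $H_S$ by sample size; the function names and the toy samples are hypothetical.

```python
import random

def heterozygosity(alleles):
    """Expected heterozygosity, 1 - sum of squared allele frequencies."""
    n = len(alleles)
    counts = {}
    for a in alleles:
        counts[a] = counts.get(a, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def fst(subpops):
    """F_ST = (H_T - H_S) / H_T for a list of subpopulation samples."""
    pooled = [a for pop in subpops for a in pop]
    h_t = heterozygosity(pooled)
    # H_S: sample-size-weighted mean of within-subpopulation heterozygosity
    h_s = sum(len(pop) * heterozygosity(pop) for pop in subpops) / len(pooled)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0

def permutation_p_value(subpops, n_perm=1000, seed=1):
    """Shuffle alleles among subpopulations and compare F_ST to the observed value."""
    rng = random.Random(seed)
    observed = fst(subpops)
    pooled = [a for pop in subpops for a in pop]
    sizes = [len(pop) for pop in subpops]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if fst(shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical samples: allele A at frequency 0.9 in one subpopulation, 0.2 in the other
pop1 = ["A"] * 18 + ["a"] * 2
pop2 = ["A"] * 4 + ["a"] * 16
observed, p = permutation_p_value([pop1, pop2])
print(f"F_ST = {observed:.3f}, permutation p = {p:.4f}")
```

Shuffling alleles rather than individuals assumes random mating within subpopulations; with genotype data one would usually permute individuals instead.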
The hierarchical nature of F statistics
• F statistics can be used to contrast structure at different levels
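e.g.
$$F_{IS} = \frac{H_S - H_I}{H_S}$$
where $H_I$ is the average within-individual heterozygosity; $F_{IS}$ is a measure of inbreeding.
$$H_{\text{Individual}} < H_{\text{Subpopulation}} < H_{\text{Population}} < H_{\text{Region}} < H_{\text{Total}}$$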
FST in natural populations
Allozymes
Nei (1975)
Organism               H_T      H_S      F_ST
Jumping rodent         0.037    0.012    0.676
House mouse            0.097    0.086    0.113
Human (Yanomama)       0.039    0.036    0.077
Human (major races)    0.130    0.121    0.069

SNPs
Organism                       H_T      H_S      F_ST
Drosophila melanogaster (a)    0.0154   0.0151   0.023
Human (major races)            0.195    0.201    0.067
(a) Based on pairwise diversity
The inbreeding effect of population structure
• Differences in allele frequency between populations lead to an excess of homozygotes
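Expected homozygosity (Hardy-Weinberg equilibrium): $F_T = \bar{p}^2 + \bar{q}^2$
Observed homozygosity (combined samples): $F_S = \bar{p}^2 + \bar{q}^2 + \sigma_p^2 + \sigma_q^2$
$$F_{ST} = \frac{F_S - F_T}{1 - F_T} = \frac{\sigma_p^2 + \sigma_q^2}{1 - \bar{p}^2 - \bar{q}^2}$$
where $\bar{p}$ and $\bar{q}$ are the mean allele frequencies across subpopulations and $\sigma_p^2$, $\sigma_q^2$ are their variances.
Heterozygosity = 1 − Homozygosity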
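As a numerical check on the relationship between excess homozygosity and allele-frequency variance, the sketch below computes $F_{ST}$ both from observed and expected homozygosity and from the variances. It assumes two alleles, equally sized subpopulations and Hardy-Weinberg proportions within each; the function name is hypothetical.

```python
def fst_two_ways(p_subpops):
    """Return F_ST from homozygosities and from allele-frequency variances.

    p_subpops: frequency p of one allele in each (equally sized) subpopulation;
    the other allele has frequency q = 1 - p.
    """
    k = len(p_subpops)
    p_bar = sum(p_subpops) / k
    q_bar = 1.0 - p_bar
    var_p = sum((p - p_bar) ** 2 for p in p_subpops) / k
    var_q = var_p  # q = 1 - p, so both alleles have the same variance

    # Expected homozygosity if the combined sample were one random-mating population
    expected_hom = p_bar ** 2 + q_bar ** 2
    # Observed homozygosity: average of p^2 + q^2 over subpopulations
    observed_hom = sum(p ** 2 + (1.0 - p) ** 2 for p in p_subpops) / k

    from_homozygosity = (observed_hom - expected_hom) / (1.0 - expected_hom)
    from_variance = (var_p + var_q) / (1.0 - p_bar ** 2 - q_bar ** 2)
    return from_homozygosity, from_variance

a, b = fst_two_ways([0.9, 0.2])
print(a, b)  # the two routes should agree up to rounding
```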
The Wahlund effect
• Increase in heterozygosity following mixing of isolated populations
• Medical implications for disease incidence in admixed populations
– Recessive disease reduced by mixing
Disease              High risk population    Disease allele frequency
Tay-Sachs disease    Ashkenazi Jews          0.013
Albinism             Hopi                    0.07
Cystic fibrosis      Caucasians              0.022
[Diagram: Combine → Random mating]
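To see why mixing reduces recessive disease, a rough sketch: incidence is $q^2$ under random mating, so the average incidence across separate populations exceeds the incidence in the combined population by the variance in $q$. The second population's allele frequency (0.0) and the equal mixing proportions below are assumptions made purely for illustration, and the function name is hypothetical.

```python
def recessive_incidence_before_and_after(q1, q2, w1=0.5):
    """Incidence of a recessive disease before and after two populations mix.

    q1, q2: disease allele frequencies in the two populations
    w1: fraction of the combined population drawn from population 1
    """
    w2 = 1.0 - w1
    # Before mixing: each population mates at random within itself
    before = w1 * q1 ** 2 + w2 * q2 ** 2
    # After mixing: one random-mating population with the mean allele frequency
    q_mix = w1 * q1 + w2 * q2
    after = q_mix ** 2
    return before, after

# Tay-Sachs allele frequency from the table; the second population is hypothetical
before, after = recessive_incidence_before_and_after(0.013, 0.0)
print(f"before mixing: {before:.3e}, after mixing: {after:.3e}")
```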
Differences between allozymes and DNA?
• American oysters (Crassostrea virginica)
[Figure: frequency (0 to 1) at sampling sites MA, SC, GA, FL and LA; series labelled Allozymes, DNA and mtDNA]
Avise (1994)
Differences between allozymes?
Locus      F_ST
hk         0.291   (unusually high differentiation)
to         0.035
α-gpdh     0.027
bdh        0.034
ak         0.062
got        0.017
pgi        0.052
pgm        0.028
Checkerspot butterfly, Euphydryas editha
McKechnie et al. 1975
Problems with FST
• Arbitrary a priori choice of structure to test
• High sampling variance when polymorphism low
• Throws away much information
Population genetics models of structure
• Quantify relationship between genetic drift, selection and population differentiation
• Assumptions
– Infinite mainland population (island)
– Equal population size (n-island)
– Constant population size
– Proportion m of population replaced by migrants each generation
– Symmetric migration (n-island)
[Diagrams: island model; n-island model]
Identity by descent in the island model
Event                Probability              Identity
Same parent          $1/(2N_e)$               1
Different parents    $1 - 1/(2N_e) - 2m$      $f_{t-1}$
Migration event      $2m$                     0

At equilibrium
$$f = \frac{1}{1 + 4N_e m}$$
$4N_e m = 2 \times$ number of migrants per generation
Only a few migrants each generation are required to prevent a build up of identity within the island population
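The equilibrium can be checked by iterating the recursion implied by the three events (same parent, different parents, migration): $f_t = 1/(2N_e) + (1 - 1/(2N_e) - 2m) f_{t-1}$. This is a small sketch with hypothetical function names and parameter values.

```python
def identity_by_descent(n_e, m, generations, f0=0.0):
    """Iterate f_t = 1/(2Ne) + (1 - 1/(2Ne) - 2m) * f_{t-1}."""
    f = f0
    for _ in range(generations):
        f = 1.0 / (2 * n_e) + (1.0 - 1.0 / (2 * n_e) - 2 * m) * f
    return f

def equilibrium_identity(n_e, m):
    """Equilibrium identity f = 1 / (1 + 4 Ne m)."""
    return 1.0 / (1.0 + 4 * n_e * m)

# Hypothetical values: Ne = 100 and m = 0.01, i.e. 4 Ne m = 4
print(identity_by_descent(100, 0.01, 2000), equilibrium_identity(100, 0.01))
```

With these values identity settles near 0.2 even though only two migrant gene copies arrive per generation on average.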
Relationship between FST and migration rate
• Can estimate scaled migration rate from estimated FST (assuming equilibrium, etc.)
$$E[F_{ST}] \approx \frac{1}{1 + 4N_e m}$$
[Figure: $N_e m$ (log scale, 0.01 to 100) against $F_{ST}$ (0 to 1)]
E.g. in humans, $F_{ST} \approx 0.067$, giving $N_e m \approx 3.5$
NB: This is NOT a good estimator – do not trust the answer!
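For reference, the arithmetic behind the human example is simply the equilibrium relationship solved for $N_e m$. The sketch below (hypothetical function name) reproduces it and inherits every caveat above: it assumes the island model at equilibrium and is not a reliable estimator.

```python
def scaled_migration_from_fst(fst):
    """Solve F_ST = 1 / (1 + 4 Ne m) for Ne m. Illustrative only."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("F_ST must lie in (0, 1]")
    return (1.0 / fst - 1.0) / 4.0

print(round(scaled_migration_from_fst(0.067), 2))
```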
Wright’s diffusion model for allele frequencies with migration
Mainland frequency = $x_m$; island frequency = $x$
Deterministic: $M_{\delta x} = m(x_m - x)$
Drift: $V_{\delta x} = \frac{x(1 - x)}{2N_e}$
Wright (1951)
[Figure: probability density of the allele frequency on the island (0 to 1) for $4N_e m = 10$ and $4N_e m = 0.2$; allele frequency on mainland = 0.5]
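The slide does not write out the resulting density. For a diffusion with mean change $m(x_m - x)$ and variance $x(1-x)/(2N_e)$, the standard stationary solution is a Beta distribution with parameters $4N_e m\,x_m$ and $4N_e m\,(1 - x_m)$; the sketch below evaluates it on that assumption, with a hypothetical function name.

```python
import math

def island_allele_frequency_density(x, four_nem, x_mainland=0.5):
    """Stationary density of island allele frequency x, with 0 < x < 1.

    Assumes the Beta(4Nem * x_m, 4Nem * (1 - x_m)) stationary solution of the
    migration-drift diffusion; this form is not shown on the slide itself.
    """
    a = four_nem * x_mainland
    b = four_nem * (1.0 - x_mainland)
    # Log of the normalising constant 1 / B(a, b)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))

for four_nem in (10, 0.2):
    values = [island_allele_frequency_density(x, four_nem) for x in (0.05, 0.5, 0.95)]
    print(four_nem, [round(v, 3) for v in values])
```

With high migration the density peaks at the mainland frequency; with low migration it piles up near loss and fixation.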
Example: SNP frequencies in African Americans
• Goddard et al. (2000)
– 114 SNPs in 33 genes
– 190 African Americans sampled
• Likelihood estimation of Nem from sample
– assume independence between SNPs
[Figure: African American frequency against worldwide frequency (both 0 to 1)]
[Figure: $\Delta \ln L$ (−50 to 0) against $N_e m$ (0 to 15); marked value $N_e m = 5.0$]
The coalescent in structured populations
• Two-island model
Population 1 Population 2
Pr{coalescence} = $\frac{n_i(n_i - 1)}{4N_e}$
Pr{migration} = $m n_i$
The time to coalescence for two sequences sampled from the same population
Pr{1st event is a coalescence}
$$\frac{1/(2N_e)}{1/(2N_e) + 2m} = \frac{1}{1 + 4N_e m}$$
Pr{1st event is a migration}
$$\frac{2m}{1/(2N_e) + 2m} = \frac{4N_e m}{1 + 4N_e m}$$
Expected time to coalescence = $4N_e$
Two populations of size $N_e$ ≡ one population of size $2N_e$ for expected pairwise diversity (within population)
BUT
Variance affected by population structure
[Figure: distribution of average pairwise differences (0 to 24); Subdivided: $4N_e m = 0.2$ versus single population]
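Both claims, an unchanged mean of $4N_e$ generations and an inflated variance, can be checked by simulating the two competing events for a pair of lineages. This is a rough continuous-time sketch for a symmetric two-island model with $N_e$ diploids per deme; the function names, parameter values and replicate count are hypothetical choices.

```python
import random

def coalescence_time_same_deme(n_e, m, rng):
    """Generations until two lineages sampled from the same deme coalesce."""
    t = 0.0
    same_deme = True
    while True:
        if same_deme:
            coal_rate = 1.0 / (2 * n_e)  # n(n - 1) / (4 Ne) with n = 2
            total = coal_rate + 2 * m    # either lineage may migrate
            t += rng.expovariate(total)
            if rng.random() < coal_rate / total:
                return t
            same_deme = False            # first event was a migration
        else:
            # Lineages in different demes cannot coalesce; wait for a migration
            t += rng.expovariate(2 * m)
            same_deme = True

def summarize(n_e=100, four_nem=1.0, reps=20000, seed=1):
    """Mean and variance of the simulated coalescence times."""
    rng = random.Random(seed)
    m = four_nem / (4 * n_e)
    times = [coalescence_time_same_deme(n_e, m, rng) for _ in range(reps)]
    mean = sum(times) / reps
    var = sum((t - mean) ** 2 for t in times) / (reps - 1)
    return mean, var

mean, var = summarize()
print(f"mean = {mean:.1f} (4Ne = 400), variance = {var:.0f}")
```

For comparison, a single random-mating population of $2N_e$ diploids has an exponentially distributed coalescence time with the same mean $4N_e$ and variance $(4N_e)^2$, here 160000.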
Effect on allele frequency spectrum
Rapid coalescence within population
Slow coalescence between populations
Mutation at high frequency
[Figure: frequency of derived allele (1 to 19); Subdivided: $4N_e m = 0.1$ versus single population]
Effect on neutrality statistics within populations
• Tajima’s D statistic
• Fu and Li D statistic
[Figure: distribution of the statistic (−4 to 4); Subdivided: $4N_e m = 0.2$ versus single population]
Main effect is to increase the variance
Other statistics (e.g. Fay and Wu, 2000) more sensitive
[Figure: distribution of the statistic (−4.5 to 3); Subdivided: $4N_e m = 0.2$ versus single population]
Effect on polymorphism between populations
• Tajima’s D statistic
[Figure: distribution of Tajima’s D (−4 to 4); Subdivided: $4N_e m = 0.2$ versus single population]
• Frequency distribution
[Figure: frequency distribution (1 to 19); Subdivided: $4N_e m = 0.2$ versus single population]
Effect on linkage disequilibrium
• Linkage disequilibrium measures correlations between alleles at different loci
• Population structure increases linkage disequilibrium between linked loci
• Population structure creates linkage disequilibrium between unlinked loci in different populations
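[Figure: distribution of $r^2$ (0.05 to 0.95) for $4N_e r = 1$; Subdivided: $4N_e m = 0.1$ versus single population]
$$D = f_{AB} - f_A f_B$$
Population 1: $f_A = 0.2$, $f_B = 0.8$, $D = 0$
Population 2: $f_A = 0.8$, $f_B = 0.2$, $D = 0$
Admixture
Naive analysis: $D = 0.09$ in magnitude (the sign depends on which alleles are labelled A and B)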
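The admixture example can be reproduced by pooling haplotypes from two populations that each have $D = 0$. This is a sketch assuming equal mixing proportions, which the slide does not state; the function name is hypothetical.

```python
def admixture_ld(pop1, pop2, w1=0.5):
    """D in a mixture of two populations, each with no LD internally.

    pop1, pop2: (f_A, f_B) allele frequencies at two loci
    w1: fraction of the pooled sample drawn from population 1
    """
    w2 = 1.0 - w1
    f_a = w1 * pop1[0] + w2 * pop2[0]
    f_b = w1 * pop1[1] + w2 * pop2[1]
    # Within each population D = 0, so the AB haplotype frequency is f_A * f_B
    f_ab = w1 * pop1[0] * pop1[1] + w2 * pop2[0] * pop2[1]
    return f_ab - f_a * f_b

# Frequencies from the example: 0.2 and 0.8 in one population, swapped in the other
print(admixture_ld((0.2, 0.8), (0.8, 0.2)))
```

With alleles labelled this way D comes out negative; relabelling one locus so that both frequencies rise together flips the sign without changing the magnitude.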
Admixture dynamics
• Combination of two previously separated populations
• Over time random mating returns population to equilibrium
• Disequilibrium between unlinked loci can persist for several generations, while Hardy-Weinberg equilibrium is achieved instantly
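$$D_t = D_0 (1 - r)^t$$
$$D_0 = \frac{1}{4} \delta_A \delta_B$$
where $f_{A1} - f_{A2} = \delta_A$ and $f_{B1} - f_{B2} = \delta_B$
[Figure: $D_t / D_0$ against generation (0 to 10) for unlinked loci and for loci at 1 cM distance]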
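A short sketch of the decay makes the contrast between unlinked and tightly linked loci concrete. It assumes $r = 0.5$ for unlinked loci and $r = 0.01$ for 1 cM, and that $D_0 = \delta_A \delta_B / 4$ applies to an equal mixture of the two source populations; the function names are hypothetical.

```python
def initial_admixture_ld(delta_a, delta_b):
    """D_0 = delta_A * delta_B / 4, assuming an equal mixture of two populations."""
    return delta_a * delta_b / 4.0

def ld_after(d0, r, t):
    """D_t = D_0 * (1 - r)^t after t generations of random mating."""
    return d0 * (1.0 - r) ** t

d0 = initial_admixture_ld(0.6, 0.6)
for label, r in (("unlinked", 0.5), ("1 cM", 0.01)):
    ratios = [ld_after(d0, r, t) / d0 for t in (0, 2, 5, 10)]
    print(label, [round(x, 4) for x in ratios])
```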
Selection in a subdivided population
• Maruyama (1970)
– The fixation probability of an unconditionally beneficial mutation is unaffected by population structure (Pfix ≈ 2s)
• Levene (1953)
– Environmental heterogeneity can maintain genetic polymorphism
• BUT
– If migration high, selection has to be strong and finely balanced to habitat frequencies to maintain polymorphism
• Low migration rates can promote local adaptation
– Heavy metal tolerance in plants
– Melanism in the peppered moth
– Milk tolerance in humans
Selection at different scales
• Evidence for local adaptation from gradients in allele frequency : clines
• Continental clines in Adh activity and allozyme variation in Drosophila
• Clines in genetic and morphological characters in the toad Bombina
Driven by scale of environmental heterogeneity
Balance between selection against hybrids and migration, following secondary contact
[Figure: frequency (0 to 0.6) against latitude (22 to 47); series F/S and ∇1; Berry & Kreitman (1993)]
[Figure: frequency of B. variegata characters (0 to 1), morphological and genetic, against distance (√km, −10 to 10); Szymura & Barton (1991)]
Indirect evidence for local adaptation?
• Local hitch-hiking?
• But the structured coalescent also leads to variation in coalescence times
[Figure: microsatellite diversity by locus for populations from India, Zimbabwe, China and the Antilles; Schlötterer et al. (1997)]
The interaction between selection, gene flow and genetic drift
• Wright’s Shifting Balance theory
• Epistasis between alleles at different loci
• The adaptive landscape
– Epistasis creates adaptive valleys between peaks of fitness
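[Figure: adaptive landscape; population fitness against frequency of allele A and frequency of allele B, showing the adaptive valley and the starting point of the population]
[Figure: fitness of two-locus genotypes, AA, Aa, aa (locus 1) by BB, Bb, bb (locus 2), shaded from least fit to most fit]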
The Shifting Balance theory
• Drift allows population to cross adaptive valley due to stochastic processes in finite populations
• Evidence for widespread epistasis?
– F2 hybrid breakdown
– Coadapted gene complexes
• Theoretical issues
– Very difficult for a subpopulation that has crossed a valley to spread its new genotype throughout the rest of the population
– The interaction between epistatic selection and genetic drift may be important in reproductive isolation
• e.g. recessive epistatic interactions important in Haldane’s rule of unisexual hybrid sterility
Subpopulations are natural experiments, allowing species to evolve across complex adaptive landscapes
Future directions
• Theoretical and statistical issues
– Methods for discriminating between local adaptation and chance effects of coalescence in a structured population
– The relationship between population structure and linkage disequilibrium
– Selection on polygenic traits in subdivided populations
• Empirical challenges
– Describing patterns of gene diversity at many loci across genomes (from a well-chosen sample)
– Comparing differentiation for different types of mutation (e.g. silent v replacement)
– Mapping genetic variation to phenotypic variation