Evolutionary Genetics: Part 5
Coalescent simulations
S. peruvianum
S. chilense
Winter Semester 2012-2013
Prof Aurélien Tellier, FG Populationsgenetik
Color code
Red: important result or definition
Purple: exercise to do
Green: some bits of maths
Population genetics: 4 evolutionary forces
Molecular diversity is shaped by:
• random genomic processes (mutation, duplication, recombination, gene conversion)
• natural selection
• random demographic process (drift)
• random spatial process (migration)
Simulating sequence data
How to simulate?
Algorithm to generate sequence data
• Set k = n, where n is the sample size
• Draw the waiting time to the next event from an exponential distribution with rate k(k-1+θ)/2
• With probability (k-1)/(k-1+θ) the event is a coalescent event, and with probability θ/(k-1+θ) it is a mutation
• If a coalescent event occurs, choose a pair of lineages to coalesce; k then becomes k-1
• If a mutation event occurs, choose a lineage to mutate; k is unchanged
• Repeat until k = 1
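The steps of this algorithm can be sketched in Python as follows. This is a minimal illustration, not the code used by ms: the function name `simulate_sample` is hypothetical, and the sketch assumes an infinite-sites model, so every mutation creates a new segregating site carried by all samples that descend from the mutated lineage.

```python
import random

def simulate_sample(n, theta, seed=None):
    """Simulate one sample of n sequences; return (time to MRCA, haplotypes).

    Assumption: infinite-sites mutations, haplotypes coded as 0/1 strings.
    """
    rng = random.Random(seed)
    # Each lineage is the set of samples that descend from it
    lineages = [frozenset([i]) for i in range(n)]
    mutations = []  # one entry per mutation: the samples carrying it
    time = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next event, rate k(k-1+theta)/2
        time += rng.expovariate(k * (k - 1 + theta) / 2.0)
        if rng.random() < (k - 1) / (k - 1 + theta):
            # Coalescent event: merge a random pair, k becomes k-1
            i, j = rng.sample(range(k), 2)
            merged = lineages[i] | lineages[j]
            lineages = [l for idx, l in enumerate(lineages) if idx not in (i, j)]
            lineages.append(merged)
        else:
            # Mutation event: pick a random lineage, k is unchanged
            mutations.append(rng.choice(lineages))
    haplotypes = [
        "".join("1" if sample in carriers else "0" for carriers in mutations)
        for sample in range(n)
    ]
    return time, haplotypes

tmrca, haplotypes = simulate_sample(n=5, theta=5.0, seed=1)
for h in haplotypes:
    print(h)
```

Averaged over many replicates, the number of segregating sites should be close to θ(1 + 1/2 + ... + 1/(n-1)), which gives a simple sanity check of the sketch.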
Simulations 1
What is θ?
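As a reminder, and assuming the convention used by ms for its -t option, θ is the population mutation parameter:

```latex
\theta = 4 N_0 \mu
```

Here N_0 is the (diploid) population size and μ is the neutral mutation rate per generation for the whole locus. If the rate is known per site, μ is that rate multiplied by the length of the locus.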
Do you see the same numbers? WHY?
pdf(file="constant_tree.pdf")
dev.off()
4 -t 5 -T > treefile.tre
Simulations 1: neutral and constant size
Simulations 2: neutral and expansion
t1 = 0.5 = time at which the expansion starts in the past
x = 0.1 = the population in the past is 0.1*N0
Present population size = N0
Ancestral population size = x*N0
Time t1 of expansion, in units of 4N0 generations
Do you see a problem? What is N0?
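One way to approach this question: N0 is never given to ms directly, it only enters through θ. Assuming θ = 4N0μ, a value of N0 can be recovered once a mutation rate is assumed, and scaled times can then be converted to generations. The sketch below uses hypothetical values (a per-site mutation rate of 1e-8 and a locus of 1000 bp); the function names are illustrative only.

```python
def n0_from_theta(theta, mu_per_site, locus_length):
    """Population size N0 implied by theta = 4 * N0 * mu, with mu for the whole locus."""
    return theta / (4.0 * mu_per_site * locus_length)

def scaled_time_to_generations(t, n0):
    """Convert a time given in units of 4*N0 generations into generations."""
    return t * 4.0 * n0

# Hypothetical values, for illustration only
n0 = n0_from_theta(theta=5.0, mu_per_site=1e-8, locus_length=1000)
t1_generations = scaled_time_to_generations(0.5, n0)
ancestral_size = 0.1 * n0  # x * N0 with x = 0.1
print(n0, t1_generations, ancestral_size)
```

With these assumed values N0 is about 125000, so t1 = 0.5 corresponds to about 250000 generations. A different assumed mutation rate changes both numbers, which is the difficulty the question points at.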
-eN 0.5 0.1
0.5 = time at which the expansion starts in the past
0.1 = the population in the past is 0.1*N0
-eN 0.05 0.1 -T > expansion.tre
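To see what a size change of this kind does to the genealogy, the sketch below simulates coalescence times under a piecewise-constant population size. It is an illustration under stated assumptions, not the ms implementation: times are in units of 4N0 generations, each pair of lineages coalesces at rate 2/x while the size is x*N0, and the function names and the `epochs` list are hypothetical.

```python
import random

def coalescence_times(n, epochs, rng):
    """Times (in units of 4*N0 generations) of the n-1 coalescent events.

    epochs: list of (start_time, relative_size) sorted by start_time and
    starting with (0.0, 1.0); a flag "-eN t x" corresponds to the entry (t, x).
    """
    k, t, idx = n, 0.0, 0
    times = []
    while k > 1:
        size = epochs[idx][1]
        # k(k-1)/2 pairs, each coalescing at rate 2/size
        wait = rng.expovariate(k * (k - 1) / size)
        next_change = epochs[idx + 1][0] if idx + 1 < len(epochs) else float("inf")
        if t + wait >= next_change:
            # No event before the size change: move to it and redraw
            t, idx = next_change, idx + 1
        else:
            t += wait
            times.append(t)
            k -= 1
    return times

def mean_tmrca(n, epochs, replicates=2000, seed=1):
    """Average time to the most recent common ancestor over replicates."""
    rng = random.Random(seed)
    total = sum(coalescence_times(n, epochs, rng)[-1] for _ in range(replicates))
    return total / replicates

constant = mean_tmrca(10, [(0.0, 1.0)])
expansion = mean_tmrca(10, [(0.0, 1.0), (0.5, 0.1)])  # as in -eN 0.5 0.1
print(constant, expansion)
```

For a constant size the mean should be close to 1 - 1/n in these units. With a smaller ancestral population the lineages that survive to t1 coalesce quickly, so the trees are shorter.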
Simulations 2: trees of expansion
pdf(file="expansion-tree.pdf")
dev.off()
expansion.tre
Simulations 3: crash or bottleneck?
For a crash:
./ms 10 4 -t 5 -eN 0.5 5
Present population size = N0
Ancestral population size = x*N0 (here x = 5)
Time t1 of the size change, in units of 4N0 generations (here t1 = 0.5)
For a bottleneck:
./ms 10 4 -t 5 -eN 0.5 0.25 -eN 0.75 2
Present population size = N0
Bottleneck population size = x1*N0, between time t1 and time t2
Ancestral population size = x2*N0, before time t2
In the command: -eN t1 x1 -eN t2 x2, here t1 = 0.5, x1 = 0.25, t2 = 0.75, x2 = 2
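The mapping between the -eN flags and the population size history can be made explicit with a small helper. This is a hypothetical sketch (the function `size_history` is not part of ms); it only reads the -eN flags of a command line and lists the resulting epochs, going backwards in time.

```python
def size_history(ms_command):
    """Return [(start_time, relative_size), ...] implied by the -eN flags.

    Times are in units of 4*N0 generations and sizes are relative to N0.
    """
    tokens = ms_command.split()
    epochs = [(0.0, 1.0)]  # the present population has size N0
    for i, token in enumerate(tokens):
        if token == "-eN":
            epochs.append((float(tokens[i + 1]), float(tokens[i + 2])))
    return sorted(epochs)

crash = size_history("./ms 10 4 -t 5 -eN 0.5 5")
bottleneck = size_history("./ms 10 4 -t 5 -eN 0.5 0.25 -eN 0.75 2")
for start, size in bottleneck:
    print("from time", start, "backwards: size =", size, "* N0")
```

A crash has a single size change, with a larger population in the past. A bottleneck has two changes, with the small size only between t1 and t2.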
Exercise
Summarize the ms output
Then save the output in a file:
> test1.out
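For reference, the summary statistics used in this exercise can be computed directly from the 0/1 haplotypes of one simulated sample. The sketch below is an illustration with a hypothetical function name; it assumes haplotypes are given as equal-length strings of 0 and 1, and uses the standard formulas for θπ, θW and Tajima's D.

```python
import math

def summary_statistics(haplotypes):
    """Return (theta_pi, theta_w, tajima_d) for a list of 0/1 strings."""
    n = len(haplotypes)
    # Number of derived alleles at each segregating site
    counts = [site.count("1") for site in zip(*haplotypes)]
    counts = [c for c in counts if 0 < c < n]
    s = len(counts)
    # theta_pi: average number of pairwise differences
    n_pairs = n * (n - 1) / 2.0
    theta_pi = sum(c * (n - c) for c in counts) / n_pairs
    # theta_W: number of segregating sites divided by a1
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = s / a1
    if s == 0 or n < 4:
        # D is undefined without variation and degenerate for n < 4
        return theta_pi, theta_w, float("nan")
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    tajima_d = (theta_pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return theta_pi, theta_w, tajima_d

print(summary_statistics(["10", "10", "01", "00"]))
```

A sample with only singletons gives a negative D, and a sample with many intermediate-frequency variants gives a positive D.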
Now using R
Load the file:
test <- read.table("test1.out", header=FALSE)
Then draw graphs:
pdf(file="summary_neutral_constant.pdf")
hist(test[,2], main="Theta_Pi Tajima")
hist(test[,4], main="Theta_Watterson")
hist(test[,6], main="Tajima D")
dev.off()
Then do the same for an expansion, decline or bottleneck
Final simulations
• Use msmsplay on your computer
• The command line is similar
• You can see the site frequency spectrum directly
• Can you compare the site frequency spectrum with the values of Tajima's D?
• Let's simulate a neutral model, an expansion, and a decline
• What differences do we see?
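To make the comparison concrete, the site frequency spectrum can also be computed by hand from 0/1 haplotypes and set against the neutral expectation. This is a minimal sketch with hypothetical function names; it assumes the ancestral state is coded 0 (unfolded spectrum) and uses the standard neutral expectation θ/i for the number of sites where the derived allele is carried by i samples.

```python
def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: entry i-1 is the number of sites with i derived copies."""
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    for site in zip(*haplotypes):
        count = site.count("1")
        if 0 < count < n:
            sfs[count - 1] += 1
    return sfs

def expected_neutral_sfs(n, theta):
    """Expected SFS under the neutral model with constant population size."""
    return [theta / i for i in range(1, n)]

observed = site_frequency_spectrum(["10", "10", "01", "00"])
expected = expected_neutral_sfs(4, 6.0)
print(observed, expected)
```

An excess of low-frequency variants relative to θ/i goes with a negative Tajima's D, as expected after an expansion. An excess of intermediate-frequency variants goes with a positive D, as expected after a decline.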
Some data analysis
• Use datasets:
• Use DnaSP to calculate the usual statistics:
  • Diversity: θW, θπ
  • Site frequency spectrum
  • Tajima's D
• What do you conclude from these data?
• Do you have an idea of the past demography of these populations?
• Why do you need several independent loci?