Date post: | 31-Mar-2015 |
Genealogies of time structured data, an application on cave bear ancient DNA
Frantz Depaulis
Ludovic Orlando
Catherine Hannï
UMR 5534 Centre de Génétique Moléculaire et Cellulaire
Université Claude Bernard, Lyon I
UMR 7625 Laboratoire d’écologie
Paris 6/ENS
Outline of the presentation
1 Introduction: Gene genealogies
2 Results
2.1 Simulation exploratory results
2.2 Cave bear application
3 Conclusions
Wright Fisher Neutral model
Assumptions
Selective neutrality (Ne s << 1)
Demography
- Isolated panmictic population
- Constant size N
- Poisson distribution of offspring P(1)
- Same sampling time
Mutational, sequence data: infinite site model (ISM)
- No recombination
- Independent mutations
- Constant mutation rate µ
  along the sequence
  across time
- Each mutation affects a new nucleotide site
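The demographic assumptions above can be illustrated with a minimal sketch of one Wright-Fisher generation. It assumes a simplified haploid population of N gene copies; the function name `wright_fisher_generation` is hypothetical and not part of the presentation. Each offspring picks its parent uniformly at random, so the offspring number per parent is binomial with mean 1, which approaches the Poisson P(1) law of the model when N is large.

```python
import random
from collections import Counter


def wright_fisher_generation(N, rng):
    """Return the number of offspring left by each of N parents.

    Assumption for this sketch: haploid population of constant size N,
    every offspring chooses one parent uniformly at random.
    """
    parents = [rng.randrange(N) for _ in range(N)]
    counts = Counter(parents)
    return [counts.get(i, 0) for i in range(N)]


if __name__ == "__main__":
    rng = random.Random(42)
    offspring = wright_fisher_generation(1000, rng)
    # Constant size: the offspring numbers always sum to N.
    print(sum(offspring), max(offspring))
```

The fraction of parents leaving no offspring should be close to exp(-1), as expected for a Poisson law of mean 1.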
-Coalescence-
Genealogy of a gene sample
gene sample
ancestral lineage
coalescence= common ancestor
Most recent common ancestor (MRCA)
Coalescent
Figure: coalescent tree of a sample of “genes” (or of individuals) a to f, showing the common ancestors (CA), the most recent common ancestor of the sample (MRCA), and neutral mutations (nucleotide changes) placed on the branches.
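Constructing coalescents
1°) Ages of the nodes
Figure: tree of the sample a to f with successive coalescence intervals t1 to t5.
Two lineages coalesce in a given generation with probability p = 1/2N; the waiting time follows Exp(p).
With n lineages, under the additional assumption n << N:
p = (n (n -1)/2) /2N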
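Step 1°) can be sketched as follows, assuming the continuous exponential approximation given on the slide. The function name `coalescence_times` and its arguments are hypothetical. While k lineages remain, the waiting time to the next coalescence is drawn from an exponential law with rate p = (k(k-1)/2)/2N per generation.

```python
import random


def coalescence_times(n, N, rng):
    """Waiting times (in generations) while k = n, n-1, ..., 2 lineages remain.

    Sketch under the slide's assumptions: constant size N, n << N,
    rate p = (k (k - 1) / 2) / (2 N) per generation.
    """
    times = []
    for k in range(n, 1, -1):
        p = (k * (k - 1) / 2) / (2 * N)
        times.append(rng.expovariate(p))
    return times


if __name__ == "__main__":
    rng = random.Random(1)
    intervals = coalescence_times(6, 1000, rng)
    # The age of the MRCA is the sum of the n - 1 intervals.
    print(intervals, sum(intervals))
```

Under these assumptions the expected age of the MRCA is 4N(1 - 1/n) generations, which an average over many replicates should approach.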
Constructing-deconstructing coalescents
2°) Topology of the tree
3°) Uniform distribution of mutations
Figure: tree of the gene sample a to f with node ages t1 to t5, common ancestors (CA) and MRCA; neutral mutations are placed on the branches. Repeating the construction 100 000 times gives the neutral distribution of sequence polymorphism.
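The three construction steps (node ages, topology, mutations) can be combined in a short simulation sketch. It assumes time is measured in units of 2N generations, that pairs of lineages coalesce at random, and that a fixed number S of mutations is dropped on the tree with probability proportional to branch length; `simulate_haplotypes` is a hypothetical name, not the authors' program.

```python
import random


def simulate_haplotypes(n, S, rng):
    """Return n haplotypes (lists of 0/1) carrying S infinite-site mutations."""
    # Each active lineage: [set of sampled genes it is ancestral to, branch length]
    active = [[frozenset([i]), 0.0] for i in range(n)]
    finished = []  # completed branches as (descendant set, length)
    while len(active) > 1:
        k = len(active)
        # 1) age of the node: exponential waiting time with rate k(k-1)/2
        t = rng.expovariate(k * (k - 1) / 2)
        for lineage in active:
            lineage[1] += t
        # 2) topology: two lineages chosen at random coalesce
        a, b = rng.sample(range(k), 2)
        merged = active[a][0] | active[b][0]
        finished.append((active[a][0], active[a][1]))
        finished.append((active[b][0], active[b][1]))
        active = [lin for idx, lin in enumerate(active) if idx not in (a, b)]
        active.append([merged, 0.0])
    # 3) mutations: each one falls on a branch with probability
    #    proportional to its length and is inherited by all descendants
    sets = [s for s, _ in finished]
    lengths = [length for _, length in finished]
    haplotypes = [[0] * S for _ in range(n)]
    for site, carriers in enumerate(rng.choices(sets, weights=lengths, k=S)):
        for i in carriers:
            haplotypes[i][site] = 1
    return haplotypes


if __name__ == "__main__":
    rng = random.Random(7)
    for hap in simulate_haplotypes(6, 8, rng):
        print("".join(map(str, hap)))
```

Calling the function many times and recording a statistic of each simulated sample gives an approximation of its neutral distribution.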
Haplotype tests: simulations
parameters‡: S = 8, n = 6; 10 000 simulations
haplotype number K
haplotype diversity H = 1 - Σ fi²
Figure: three simulated samples with K = 6, H = 0.83; K = 5, H = 0.78; K = 4, H = 0.72.
Figure: distribution (density) of simulated H, x axis from 0.1 to 0.9; observed H: P = 0.03 *
Depaulis and Veuille MBE 1998; ‡ Hudson 1993
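The two haplotype statistics can be computed directly from a list of sequences, as in the sketch below. The names `haplotype_stats` and `empirical_p` are hypothetical, and the one-sided direction of the empirical P value (proportion of simulated values at least as large as the observed one) is an assumption made for illustration.

```python
from collections import Counter


def haplotype_stats(haplotypes):
    """Return (K, H): number of distinct haplotypes and H = 1 - sum(fi^2)."""
    counts = Counter(tuple(h) for h in haplotypes)
    n = len(haplotypes)
    K = len(counts)
    H = 1 - sum((c / n) ** 2 for c in counts.values())
    return K, H


def empirical_p(observed, simulated):
    """Proportion of simulated values >= observed (assumed upper-tail test)."""
    return sum(s >= observed for s in simulated) / len(simulated)


if __name__ == "__main__":
    # n = 6 with one haplotype present twice: K = 5
    sample = ["AC", "AC", "AG", "TC", "TG", "GG"]
    K, H = haplotype_stats(sample)
    print(K, round(H, 2))
```

With n = 6, the three configurations shown on the slide (K = 6, 5 and 4) give H = 0.83, 0.78 and 0.72 respectively.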
Alignment of polymorphic sites: frequencies of mutations
n = 7, S = 15
GCGCGCGAACCCATT  outgroup
GCCCGCGAATCCATT
GCGTGCGATCCGATT
GCGTACAATCCCGTC
GTGTACAATCTCGAC
GTGTACAATCTCGAC
GCGTGGAATCCCGTT
CCGCGCGGTCCCATT
121531416121423  frequencies
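The frequency row of this alignment can be recomputed as a check. The sketch below takes the seven sequences as the successive 15-character chunks of the alignment (n = 7, S = 15) and counts, at each site, the sequences that differ from the outgroup. The function names are hypothetical.

```python
from collections import Counter

OUTGROUP = "GCGCGCGAACCCATT"
SAMPLE = [
    "GCCCGCGAATCCATT",
    "GCGTGCGATCCGATT",
    "GCGTACAATCCCGTC",
    "GTGTACAATCTCGAC",
    "GTGTACAATCTCGAC",
    "GCGTGGAATCCCGTT",
    "CCGCGCGGTCCCATT",
]


def derived_counts(sample, outgroup):
    """Number of sequences carrying the derived (non-outgroup) state at each site."""
    return [sum(seq[j] != anc for seq in sample) for j, anc in enumerate(outgroup)]


def unfolded_spectrum(counts, n):
    """Number of sites whose derived state occurs i times, for i = 1 .. n-1."""
    c = Counter(counts)
    return [c.get(i, 0) for i in range(1, n)]


def folded_spectrum(counts, n):
    """Same, using the count of the rarer state (no outgroup needed)."""
    c = Counter(min(f, n - f) for f in counts)
    return [c.get(i, 0) for i in range(1, n // 2 + 1)]


if __name__ == "__main__":
    freqs = derived_counts(SAMPLE, OUTGROUP)
    print("".join(map(str, freqs)))           # 121531416121423
    print(unfolded_spectrum(freqs, 7))         # [6, 3, 2, 2, 1, 1]
    print(folded_spectrum(freqs, 7))           # [7, 4, 4]
```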
Frequency spectrum of mutations & neutrality tests
Figure: unfolded and folded frequency spectra of the mutations. x axis: fi, number of occurrences in a sample (1 to 6); y axis: number of polymorphic sites (0 to 7).
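With f_i the number of occurrences of mutation i in the sample of size n, and \theta = 4 N_e \mu:

Heterozygosity estimator:
\[ \hat{\theta}_{\pi} = \sum_{i=1}^{S} \frac{2 f_i (n - f_i)}{n (n - 1)} \]

Watterson's (1975) estimator:
\[ \hat{\theta}_{W} = \frac{S}{\sum_{i=1}^{n-1} \frac{1}{i}} \]

(Tajima Genetics 1989); under neutrality both estimators have expectation \theta, so D is expected to be close to 0:
\[ D = \frac{\hat{\theta}_{\pi} - \hat{\theta}_{W}}{SE(\hat{\theta}_{\pi} - \hat{\theta}_{W})} \]

(Fu and Li Genetics 1993):
\[ D^{*} = \frac{\hat{\theta} - \hat{\theta}_{e}}{SE(\hat{\theta} - \hat{\theta}_{e})} \]

Homozygosity of the derived state:
\[ \hat{\theta}_{H} = \sum_{i=1}^{S} \frac{2 f_i^{2}}{n (n - 1)} \]

(Fay and Wu Genetics 2000):
\[ H = \hat{\theta}_{\pi} - \hat{\theta}_{H} \]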
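As a worked example, the estimators can be evaluated on the mutation frequencies 121531416121423 of the n = 7, S = 15 alignment shown earlier. This is a sketch with hypothetical function names; the standard error in Tajima's D uses the usual constants from Tajima (1989), written out here from memory of the published formula rather than taken from the slides.

```python
import math


def theta_estimators(freqs, n):
    """Return (theta_pi, theta_W, theta_H) from derived-mutation counts f_i."""
    S = len(freqs)
    a1 = sum(1 / i for i in range(1, n))
    theta_w = S / a1
    theta_pi = sum(2 * f * (n - f) for f in freqs) / (n * (n - 1))
    theta_h = sum(2 * f * f for f in freqs) / (n * (n - 1))
    return theta_pi, theta_w, theta_h


def tajima_d(freqs, n):
    """Tajima's D = (theta_pi - theta_W) / SE, with Tajima's (1989) variance."""
    S = len(freqs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_pi, theta_w, _ = theta_estimators(freqs, n)
    return (theta_pi - theta_w) / math.sqrt(e1 * S + e2 * S * (S - 1))


if __name__ == "__main__":
    freqs = [int(c) for c in "121531416121423"]
    theta_pi, theta_w, theta_h = theta_estimators(freqs, 7)
    print(theta_pi, theta_w, theta_h)
    print("Fay and Wu H:", theta_pi - theta_h)
    print("Tajima D:", tajima_d(freqs, 7))
```

For this alignment the heterozygosity estimator is 130/21, the homozygosity of the derived state is 129/21, and Watterson's estimator is 15/2.45, so both D and H are slightly positive.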
Mitochondria, correlation LD/distance: recombination or mutational effects?
r² decreases with the distance d between sites: r² = ↘(d)
Pearson’s statistic tested by permutations of sites
Awadalla et al. (Science 1999)
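A minimal sketch of such a test is given below, assuming biallelic sites coded 0/1, all polymorphic, with known positions. The function names are hypothetical and the one-sided P value (proportion of permutations with a correlation at least as negative as the observed one) is an assumption; it is meant to illustrate the principle, not to reproduce the published procedure in detail.

```python
import math
import random
from itertools import combinations


def r_squared(x, y):
    """Squared allelic correlation between two 0/1 site columns."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    pxy = sum(a * b for a, b in zip(x, y)) / n
    d = pxy - px * py
    return d * d / (px * (1 - px) * py * (1 - py))


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def ld_distance_test(haplotypes, positions, n_perm, rng):
    """Correlation between pairwise r^2 and distance, tested by permuting sites."""
    sites = list(zip(*haplotypes))
    pairs = list(combinations(range(len(sites)), 2))
    r2 = [r_squared(sites[i], sites[j]) for i, j in pairs]

    def corr(pos):
        return pearson(r2, [abs(pos[i] - pos[j]) for i, j in pairs])

    observed = corr(positions)
    hits = 0
    for _ in range(n_perm):
        shuffled = positions[:]
        rng.shuffle(shuffled)  # reassign positions to sites at random
        if corr(shuffled) <= observed:
            hits += 1
    return observed, hits / n_perm


if __name__ == "__main__":
    haps = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1],
            [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
    print(ld_distance_test(haps, [10, 20, 40, 70], 1000, random.Random(5)))
```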
Time structured data & genealogies
- Parasites during disease evolution (virus…)
- Microbial experimental evolution
- Ancient DNA
Issue:
- To what extent are the analyses affected by time structure?
- How to correct for this?
Algorithm for time structured coalescent
Figure: genealogy of two subsets of lineages (a, b, c and d, e, f) sampled at times separated by t1, with n1 = 3; the number of lineages n present in each interval (between 2 and 5) is indicated along the tree.
The exponential law is memoryless!
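The role of the memoryless property can be sketched as follows, assuming n1 lineages sampled at time 0, n2 older lineages sampled at time t1, and time in units of 2N generations. The function name `serial_coalescent_times` is hypothetical. If the waiting time drawn for the current lineages would overshoot t1, it is simply discarded and a new one is drawn from t1 with the enlarged number of lineages; no correction is needed because the exponential law is memoryless.

```python
import random


def serial_coalescent_times(n1, n2, t1, rng):
    """Ages of the coalescence nodes for two subsets sampled at times 0 and t1."""
    k = n1              # lineages currently present
    now = 0.0           # current time, going backward
    older_added = False
    ages = []
    while k > 1 or not older_added:
        wait = rng.expovariate(k * (k - 1) / 2) if k > 1 else float("inf")
        if not older_added and now + wait >= t1:
            # No coalescence before t1: jump to t1, add the older subset,
            # and redraw the waiting time (memoryless property).
            now = t1
            k += n2
            older_added = True
            continue
        now += wait
        ages.append(now)
        k -= 1
    return ages


if __name__ == "__main__":
    rng = random.Random(11)
    print(serial_coalescent_times(3, 3, 0.2, rng))
```

The function returns n1 + n2 - 1 node ages in increasing order; topology and mutations can then be added as for a contemporaneous sample.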
- Simulations-
Age structure effect on gene genealogies
Figure: three genealogies (parameters shown: t1, n1 = 4)
- Contemporaneous sample
- Limited time structure: excess of rare variants, deficit of LD
- Two subsets with large time spacing: deficit of rare variants, excess of LD, differentiation
Effect of subset size on statistical tests: mean
t1 = 0.2 Ne generations; n = 40, S = 20
Figure: mean of each statistic (y axis, -0.6 to 1.2) as a function of n1/n (x axis, 0.2 to 0.8). Series: Dt, D*fl, Hfw, ZnS, K, H, pi/pi0, Fst, Pearson, S/S0.
Dt: Tajima's (1989) D; D*fl: Fu and Li's (1993) D*; Hfw: Fay and Wu's (2000) H; ZnS: Kelly's (1997) ZnS; K and H: Depaulis and Veuille's (1998) haplotype tests (K is scaled to its expected maximal value S+1); Pearson: Pearson correlation coefficient between pairwise allelic correlation and distance between mutations; Fst: Hudson et al.'s (1992) Fst.
Effect of subset size on statistical tests: significance rate
t1 = 0.2 Ne generations; n = 40, S = 20
Figure: significance rate (y axis, 0 to 0.15) as a function of n1/n (x axis, 0 to 1). Series: Dt_inf, D*fl_inf, Hfw_inf, ZnS_inf, K_sup, H_sup, Fst.
Empty symbols: deficit of the statistics; filled symbols: excess of the statistics. Dt: Tajima's (1989) D; D*fl: Fu and Li's (1993) D*; Hfw: Fay and Wu's (2000) H; ZnS: Kelly's (1997) ZnS; K and H: Depaulis and Veuille's (1998) haplotype tests; Pearson: Pearson correlation coefficient between pairwise allelic correlation and distance between mutations, tested by permutations according to Awadalla et al. (1999); Fst: Hudson et al.'s (1992) Fst, tested by permutations.
Effect of a half subset age on statistical tests: mean
n1 = n/2
Figure: mean of each statistic (y axis, -1 to 3) as a function of t1 in 2Ne generations (x axis, 0.001 to 10). Series: Dt, D*fl, Hfw, K, H, ZnS, Pearson, Fst, Pi/Theta0, S/S0.
Dt: Tajima's (1989) D; D*fl: Fu and Li's (1993) D*; Hfw: Fay and Wu's (2000) H; ZnS: Kelly's (1997) ZnS; K and H: Depaulis and Veuille's (1998) haplotype tests (K is scaled to its expected maximal value S+1); Pearson: Pearson correlation coefficient between pairwise allelic correlation and distance between mutations; Fst: Hudson et al.'s (1992) Fst.
Effect of a half subset age on statistical tests: significance rates
n1 = n/2
Figure: significance rate (y axis, 0 to 0.35) as a function of t1 in 2Ne generations (x axis, 0.001 to 10). Series: Dt_inf, Dt_sup, D*fl_inf, D*fl_sup, ZnS_inf, K_sup, H_sup, Fst.
Empty symbols: deficit of the statistics; filled symbols: excess of the statistics. Dt: Tajima's (1989) D; D*fl: Fu and Li's (1993) D*; Hfw: Fay and Wu's (2000) H; ZnS: Kelly's (1997) ZnS; K and H: Depaulis and Veuille's (1998) haplotype tests; Pearson: Pearson correlation coefficient between pairwise allelic correlation and distance between mutations, tested by permutations according to Awadalla et al. (1999); Fst: Hudson et al.'s (1992) Fst, tested by permutations.
Cave bear: Ursus spelaeus (12-300 kYA)
- Application-
Sampling sites
Alignment of polymorphic sites: D-loop of cave bear
REF                    TTGTCAACTT TCGAATTGAA GT
#NOASC3500_40-45       ..A....T.C ..A....... ..
#NOASC3800_40-45       ..A....T.C ..A....... ..
#NOASC85F16_40-45      .......... .......... ..
#NOASC95456_40-45      ..A....T.C ..A....... ..
#NOASC92386_40-45      ..A....T.C ..A....... ..
#NOASC92413_40-45      C.A....T.C ..A....... ..
#NOASC92152_40-45      C.A....T.C ..A....... A.
#NOASC5300_50-60       ..A....T.C ..A....... ..
#NOASC11600_80         .......... .......... ..
#NOASC12500_80         .......... .......... ..
#NOASC13800_80         .......... .......... ..
#NOASC100801_80        .......... .......... ..
#NOASC12400_80         ..A....T.C ..A....... ..
#NOASC11800_80         .CA....T.C ..A.G..... ..
#NOASC11700_80         C.A....T.C ..A....... A.
#NOASC84E16_90-130     C.A....T.C ..A....... ..
#NOASC84G19_90-130     C.A....T.C ..A....... ..
#NOASCbrC5-02_90-130   C.A....T.C ..A....... ..
#NOASC15400_90-130     C.A....T.C ..A......G ..
#NOASC15700_90-130     ....T.G.C. .TA..C..G. ..
#NOATAB2_40            .......... .......... ..
#NOAGrotteMerve_?      .......... .T........ ..
#NOAAZE_80-130         .......... .......... .C
#NOAGigny189F3_?       ..A....T.C ..A....... ..
#NOAJAL104_?           C.A....T.C ..A....... ..
#NOATAB15_25-35        ..A......C ..A....... ..
#NOAGailenreuth_?      ..A......C ..A....... ..
#NOA47910_30           ..A....T.C ..A....A.. ..
#NOAHohleFels_?        ..A....T.C ..A..C.... ..
#NOACLA_35             ..A....T.C C.A....... ..
#NOACLB_35             ..A....T.C C.A....... ..
#NOAChiemsee_35        ..A..G.... ..A...C... ..
#NOARamesch1_?         ..A..G.... ..A...C... ..
#NOARamesch2_?         ..A..G.... ..A...C... ..
#NOAGeissenklt1_?      ...CT..... .T.G.C.... ..
#NOAGeissenklt2_?      ...CT..... .T.G.C.... ..
#NOANixloch_?          ...CT..... .T...C.... ..
---------------------------------------------  Alp barrier
#SOAPoto_?             ...CT..... .T...C.... ..
#SOAVind1_?            ...CT..... .T...C.... ..
#SOAVind2_?            ...CT..... .T...C.... ..
#SOAConturi_?          .......T.. .......... ..
n = 41, S = 22
(Loreille et al. 2001) (Orlando et al. 2002) (Hofreiter et al. 2002) (Kühn et al. 2001)
Ne = 13 000
Neutrality tests, Belgium cave
Scladina, n = 20, S = 15
a: permutation test

Statistic     Dt            D*fl          Hfw           K           H            ZnS          Pearson
Observed      -0.82         -1.55         -1.32         7           0.79         0.24         -0.39 (2.8*)a

No time structure
(P value %)   (21.0)        (5.3)         (18.4)        (16.4)      (37.7)       (43.7)       (2.8*)
Mean          0.06          -0.05         0.30          8.3         0.79         0.26         0.00
CI            [-1.42;1.51]  [-1.89;1.18]  [-4.46;2.62]  [5;11]      [0.64;0.88]  [0.10;0.55]  [-0.25;0.20]
% rejected    (4.9;5.5)     (5.2;2.8)     (5.4;4.8)     (1.7;3.9)   (4.9;4.6)    (5.5;5.1)    (5.0;/)

Average time structure
(P value %)   (30.0)        (8.8)         (17.2)        (8.6)       (31.2)       (31.7)       (2.7*)
Mean          -0.30         -0.38         0.39          9.1         0.80         0.22         0.00
CI            [-1.56;1.26]  [-1.89;0.84]  [-4.04;2.56]  [6;12]      [0.66;0.89]  [0.08;0.47]  [-0.29;0.23]
% rejected    (7.8;3.0)     (8.2;1.0)     (4.2;3.7)     (0.8;9.5)   (3.3;7.8)    (11.5;2.9)   (4.9;/)

Uncertainty in time structure
(P value %)   (30.0)        (8.6)         (17.4)        (7.9)       (30.9)       (31.9)       (2.8*)
Mean          -0.33         -0.42         0.37          9.1         0.80         0.22         0.00
CI            [-1.59;1.18]  [-1.89;0.84]  [-4.20;2.54]  [6;12]      [0.66;0.89]  [0.08;0.48]  [-0.29;0.24]
% rejected    (9.3;2.8)     (9.3;0.8)     (4.5;3.6)     (0.7;9.8)   (3.7;7.5)    (11.6;2.8)   (4.8;/)
Neutrality tests, dated subsample
All dated, n = 27, S = 20
a: permutation test

Statistic     Dt            D*fl          Hfw           K           H            ZnS          Pearson
Observed      -1.21         -2.28         -0.69         12          0.86         0.14         -0.27 (11.4)a

No time structure
(P value %)   (10.5)        (0.6**)       (25.7)        (16.5)      (32.1)       (24.3)       (11.5)
Mean          -0.09         -0.08         0.29          10.3        0.82         0.23         0.00
CI            [-1.49;1.50]  [-1.98;1.32]  [-5.66;3.18]  [7;14]      [0.69;0.90]  [0.09;0.48]  [-0.19;0.16]
% rejected    (5.0;5.2)     (3.6;1.4)     (5.3;4.7)     (4.0;2.8)   (5.3;4.7)    (5.7;5.0)    (4.7;/)

Average time structure
(P value %)   (17.7)        (1.7*)        (24.3)        (38.2)      (42.6)       (41.8)       (11.2)
Mean          -0.42         -0.59         0.35          11.8        0.84         0.18         0.00
CI            [-1.69;1.11]  [-2.28;0.72]  [-5.34;2.98]  [8;15]      [0.71;0.91]  [0.07;0.39]  [-0.23;0.20]
% rejected    (9.3;2.1)     (6.9;0.3)     (4.7;2.6)     (1.2;11.1)  (3.4;9.5)    (13.7;2.4)   (4.9;/)

Uncertainty in time structure
(P value %)   (18.5)        (1.9*)        (23.4)        (39.9)      (43.2)       (41.1)       (11.9)
Mean          -0.44         -0.61         0.37          11.8        0.84         0.18         0.00
CI            [-1.70;1.09]  [-2.28;0.72]  [-5.23;2.99]  [8;16]      [0.71;0.91]  [0.07;0.40]  [-0.24;0.19]
% rejected    (9.3;2.4)     (7.0;0.2)     (4.6;2.7)     (1.2;11.7)  (3.5;9.7)    (14.1;2.5)   (5.4;/)
Neutrality tests, total sample
n = 41, S = 22
a: permutation test

Statistic     Dt            D*fl          Hfw           K           H            ZnS          Pearson        Fst
Observed      -0.45         -0.88         1.35          17          0.91         0.10         -0.09 (22.0)a  0.32 (0.4**)a

No time structure
(P value %)   (37.1)        (14.7)        (47.1)        (1.7*)      (3.7*)       (18.1)       (21.5)         (0.4**)
Mean          -0.09         -0.09         0.30          12.3        0.83         0.19         0.00           -0.03
CI            [-1.44;1.52]  [-1.85;1.38]  [-5.84;3.15]  [8;16]      [0.70;0.90]  [0.07;0.41]  [-0.20;0.17]   [-0.38;0.27]
% rejected    (4.5;5.3)     (4.1;1.1)     (4.8;4.7)     (3.0;4.3)   (4.8;4.9)    (5.5;4.6)    (4.8;/)        (/;4.6)

Average time structure
(P value %)   (45.5)        (35.6)        (45.6)        (7.8)       (5.5)        (36.6)       (21.8)         (1.3*)
Mean          -0.45         -0.74         0.32          13.9        0.84         0.15         0.00           -0.01
CI            [-1.71;1.10]  [-2.49;0.73]  [-5.38;2.93]  [9;18]      [0.71;0.91]  [0.05;0.34]  [-0.23;0.20]   [-0.40;0.38]
% rejected    (10.2;2.2)    (10.7;0.1)    (4.2;2.4)     (0.8;16.1)  (4.3;7.9)    (15.2;2.2)   (4.9;/)        (/;8.9)

Uncertainty in time structure
(P value %)   (42.1)        (40.7)        (44.9)        (10.3)      (6.2)        (39.2)       (21.8)         (1.7*)
Mean          -0.54         -0.90         0.26          14.3        0.84         0.14         0.00           -0.01
CI            [-1.76;0.96]  [-2.81;0.73]  [-5.70;2.90]  [10;18]     [0.71;0.91]  [0.05;0.32]  [-0.24;0.21]   [-0.40;0.41]
% rejected    (12.2;1.4)    (14.2;0.1)    (4.5;2.3)     (0.5;19.8)  (4.0;7.9)    (16.7;2.1)   (4.7;/)        (/;9.7)
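The Fst column refers to Hudson et al.'s (1992) statistic tested by permutations. A minimal sketch of that kind of test is given below, assuming the form Fst = 1 - Hw/Hb, where Hw is the mean number of pairwise differences within groups (here the unweighted mean of the two groups, an assumption of this sketch) and Hb the mean between groups. The function names and the toy sequences are hypothetical; the grouping used for the cave bear data is not specified here.

```python
import random
from itertools import combinations, product


def mean_differences(pairs):
    """Mean number of differing sites over a collection of sequence pairs."""
    pairs = list(pairs)
    return sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)


def hudson_fst(group1, group2):
    """Fst = 1 - Hw / Hb for two groups of aligned sequences (>= 2 each)."""
    hw = (mean_differences(combinations(group1, 2))
          + mean_differences(combinations(group2, 2))) / 2
    hb = mean_differences(product(group1, group2))
    return 1 - hw / hb


def fst_permutation_test(group1, group2, n_perm, rng):
    """P value: proportion of random regroupings with Fst >= observed."""
    observed = hudson_fst(group1, group2)
    pooled = list(group1) + list(group2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        g1, g2 = pooled[:len(group1)], pooled[len(group1):]
        if hudson_fst(g1, g2) >= observed:
            hits += 1
    return observed, hits / n_perm


if __name__ == "__main__":
    north = ["AAAA", "AAAT"]   # toy sequences, not the cave bear data
    south = ["TTTT", "TTTA"]
    print(fst_permutation_test(north, south, 500, random.Random(2)))
```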
LD as a function of distance
Figure: r² (y axis, log scale from 0.01 to 1) against distance in nt (x axis, 0 to 70); fitted trend with R2 = 0.4174.
Time structure, Conclusion
Can substantially bias the results
– Even if within 10% of the age of the MRCA
  bottom of the tree with more branches
  non random subset of mutations (rare ones)
– small time structure: long external branches, excess of rare variants (negative D, deficit of LD)
– great time structure: a long internal branch, apparent differentiation, excess of intermediate frequency variants (positive D, excess of LD) if equilibrated
Acknowledgements
CNRS
Nick Barton