Date post: | 30-Mar-2015 |
Juan Daza, UCF
Fall 2008
Reconstructing the evolutionary process
Evolutionary process implies TIME
We are interested in determining how, where, why, and WHEN evolution occurs or has occurred
Genetic data
Molecular evolution theory
Molecular dating
The general procedure of molecular dating
Phylogram Ultrametric tree
The evolution of molecular dating
Timeline: hemoglobin example; the term is introduced; neutral theory; statistical properties of clocks; Fitch’s test; autocorrelation of rates; local clocks; branch pruning; NPRS; Bayesian; penalized likelihood; uncorrelated rates
The evolution of molecular dating
• Amino acids
• Nucleotides
• Pruning branches
• Local clocks (PAML, Pathd8 packages)
• Relaxed clocks
Correlated rates (r8s, Multidivtime)
Uncorrelated rates (BEAST)
Applications
Species divergence
Explosive radiations
Gene evolution
Rates estimation
Virus epidemiology
Historical demography
[Figure: lineages-through-time plot; x-axis: Time; y-axis: Log (# lineages); annotation: bursts]
The molecular clock hypothesis
The hypothesis of the molecular clock proposes that molecular evolution occurs at rates that persist through time and across lineages
Constant Burst
“The discovery of the molecular clock stands out as the most significant result of research in molecular evolution.”
Wilson et al., 1977
Emile Zuckerkandl and Linus Pauling:
“…It is possible to evaluate very roughly and tentatively the time that has elapsed since any of the hemoglobin chains present in a given species and controlled by non-allelic genes diverged from a common chain ancestor. . . . From paleontological evidence it may be estimated that the common ancestor of man and horse lived in the Cretaceous or possibly the Jurassic period, say between 100 and 160 million years ago. . . . The presence of 18 differences between human and horse α-chains would indicate that each chain had 9 evolutionary effective mutations in 100 to 160 millions of years. This yields a figure of 11 to 18 million years per amino acid substitution in a chain of about 150 amino acids, with a medium [sic] figure of 14.5 million years…”
Zuckerkandl and Pauling, 1962
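The arithmetic in the quotation can be retraced with a short Python sketch. The function name is illustrative and not from the slides; the only inputs are the numbers quoted above (18 differences, 100 to 160 million years).

```python
def my_per_substitution(differences, divergence_my):
    """Million years per substitution along one lineage.

    The observed differences are split evenly between the two
    lineages, as in the quoted reasoning (18 differences -> 9 per chain).
    """
    per_lineage = differences / 2
    return divergence_my / per_lineage

low = my_per_substitution(18, 100)   # about 11.1 My per substitution
high = my_per_substitution(18, 160)  # about 17.8 My per substitution

# The quoted 14.5 My is the midpoint of the rounded bounds, 11 and 18.
midpoint = (round(low) + round(high)) / 2
print(round(low, 1), round(high, 1), midpoint)
```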
The molecular clock hypothesis
$$E(T_{ij}) = \frac{E(K_{ij})}{2\,E(v)}$$
$T_{ij}$ = divergence time between species i and j
$K_{ij}$ = number of substitutions per site
$v$ = rate = number of substitutions per site per year
Confidence interval:
$$[T_1, T_2] = \left[\frac{K_1}{2v}, \frac{K_2}{2v}\right]$$
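A minimal Python sketch of the strict-clock date estimate, T = K / (2v), and of the interval obtained by converting lower and upper bounds on K. The function names and all numeric values are hypothetical illustrations, not data from the slides.

```python
def divergence_time(k, v):
    """Divergence time in years from K (substitutions per site between
    two species) and v (substitutions per site per year).
    The factor 2 accounts for change accumulating along both lineages."""
    return k / (2 * v)

def divergence_interval(k_low, k_high, v):
    """Interval [T1, T2] from lower and upper bounds on K."""
    return divergence_time(k_low, v), divergence_time(k_high, v)

# Hypothetical values: K = 0.2 subs/site, v = 1e-9 subs/site/year
t = divergence_time(0.2, 1e-9)                     # about 1e8 years
t1, t2 = divergence_interval(0.15, 0.25, 1e-9)     # about 7.5e7 to 1.25e8 years
print(t, t1, t2)
```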
The molecular clock hypothesis
Increase in genetic data
Quantification of rates
Understanding of molecular evolution
Framework for hypothesis testing
The molecular clock hypothesis
Some biological attributes might be responsible for departures from a constant rate:
• Differences in generation times
• Differences in population size
• Natural selection and its intensity
Log likelihood ratio test
$$2\,(\log L_{\text{nonclock}} - \log L_{\text{clock}})$$
Null hypothesis: the phylogeny is rooted and the branch lengths are constrained such that all of the tips can be drawn at a single time plane.
Alternative hypothesis: each branch is allowed to vary independently.
Chi-square distribution with 3 d.f.
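The test can be sketched in Python using only the standard library. The log-likelihood values are hypothetical, and the closed-form tail probability below holds only for 3 d.f., the case shown on the slide. For n taxa the usual rule is n - 2 d.f., so 3 d.f. would fit a five-taxon example; that reading is an assumption.

```python
import math

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 d.f."""
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

def clock_lrt(loglik_clock, loglik_nonclock):
    """Return the likelihood ratio statistic and its p-value (3 d.f.)."""
    statistic = 2 * (loglik_nonclock - loglik_clock)
    return statistic, chi2_sf_3df(statistic)

# Hypothetical log-likelihoods for the constrained and unconstrained trees
statistic, p_value = clock_lrt(-2345.6, -2340.1)
reject_clock = p_value < 0.05
print(statistic, p_value, reject_clock)
```

A small p-value means the clock-constrained tree fits significantly worse, so the clock is rejected for that data set.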
Amount of evolution (branch length): BL = R*T (rate × time)
What to do if the clock is rejected?
Branch lengths
Error in topology
Error in branch lengths
Error in rate optimization
Error in calibration
Phylogram Ultrametric tree
What to do if the clock is rejected?
…Go simple
Eliminate branches (lineages) that are causing the clock to be rejected
What to do if the clock is rejected?
Objective functions need to be developed to reduce dimensionality
Global clock to Local clocks
Assign specific rates to specific parts of the tree and calculate divergence times
Packages: PAML, Pathd8
[Figure: tree with parts labeled r1 and r2]
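The local-clock idea can be sketched as follows: each branch is assigned to a rate class, and the time spanned by a path is the sum of branch length divided by rate over its branches. The rate values, branch lengths, and function name are hypothetical, and this is not the algorithm of PAML or Pathd8.

```python
def path_duration(path, rates):
    """Time spanned by a path of branches under local clocks.

    path  : list of (branch_length, rate_class) pairs
    rates : dict mapping rate class to rate (subs/site per unit time)
    """
    return sum(length / rates[rate_class] for length, rate_class in path)

# Hypothetical local clocks for two parts of the tree
rates = {"r1": 0.01, "r2": 0.02}

# Path from a node to a tip: one branch evolving at r1, one at r2
path = [(0.05, "r1"), (0.04, "r2")]
age = path_duration(path, rates)  # 0.05/0.01 + 0.04/0.02, about 7 time units
print(age)
```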
…what if it still doesn’t work?
We need to find the function that explains the data better.
“Relaxed clock methods”
Maximum Likelihood and Bayesian Inference
Uncorrelated relaxed clocks
Correlated relaxed clocks
Penalized Likelihood Method (Sanderson, 2002)
A likelihood method to generate an ultrametric chronogram from a non-ultrametric tree
Finds the best fitting model of rate evolution considering both:
1. How well the modeled changes explain the branch lengths
2. The amount of rate change across the tree (less change = better)
Rates correlation
Penalized Likelihood Method (Sanderson, 2002)
A topology with branch lengths is required.
Absolute or relative dates can be obtained.
Bootstrap method is used for confidence intervals (time consuming!!!)
Fossil cross validation
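Penalized Likelihood Method (Sanderson, 2002)
Maximizes the likelihood of the sequence data (X) given a combination of average rates (R) and times (T), with a penalty function to discourage rate change.
$$\log p(X \mid R, T) - \lambda\,\Phi(R)$$
First term: likelihood. Second term: penalty function, weighted by the smoothing parameter λ.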
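A minimal sketch of a penalized objective of this kind, assuming a Poisson likelihood for the substitutions on each branch and a penalty equal to the sum of squared rate differences between parent and child branches. The data, the three-branch tree, and the function names are hypothetical, and this is a simplification rather than the r8s implementation.

```python
import math

def poisson_loglik(subs, rates, durations):
    """Log-likelihood of observed substitutions per branch, each
    modeled as Poisson with mean rate * duration."""
    total = 0.0
    for x, r, t in zip(subs, rates, durations):
        mean = r * t
        total += x * math.log(mean) - mean - math.lgamma(x + 1)
    return total

def roughness_penalty(rates, parent_of):
    """Sum of squared rate differences between child and parent branches."""
    return sum((rates[c] - rates[p]) ** 2 for c, p in parent_of.items())

def penalized_loglik(subs, rates, durations, parent_of, lam):
    return (poisson_loglik(subs, rates, durations)
            - lam * roughness_penalty(rates, parent_of))

# Hypothetical tree: branch 0 is the parent of branches 1 and 2
subs = [10, 30, 12]
durations = [10.0, 10.0, 10.0]
parent_of = {1: 0, 2: 0}

free_rates = [1.0, 3.0, 1.2]     # fits every branch exactly
clock_rates = [52 / 30] * 3      # one shared rate (total subs / total time)

for lam in (0.0, 100.0):
    free = penalized_loglik(subs, free_rates, durations, parent_of, lam)
    clock = penalized_loglik(subs, clock_rates, durations, parent_of, lam)
    print(lam, "free rates preferred" if free > clock else "clock-like rates preferred")
```

With no penalty the freely varying rates score higher; with a large smoothing value the clock-like rates win, which is the trade-off the penalty is meant to control.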
Confidence intervals for Penalized Likelihood (Burbrink and Pyron, 2008)
Standard error of a bootstrap distribution:
$$s_B = \sqrt{\frac{\sum_{i=1}^{n} (\theta_{Bi} - \bar{\theta}_B)^2}{n - 1}}$$
$n$ = number of pseudoreplicates
$\theta_{Bi}$ = estimate of time for a single node from a single bootstrap pseudoreplicate
$\bar{\theta}_B$ = mean date for the same node from all bootstrap pseudoreplicates
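The bootstrap standard error described above can be computed directly. The node ages below are hypothetical values standing in for dates of one node estimated from five bootstrap pseudoreplicates.

```python
import math
import statistics

def bootstrap_se(node_ages):
    """Standard error of a bootstrap distribution of node ages:
    square root of the summed squared deviations from the mean,
    divided by (n - 1)."""
    n = len(node_ages)
    mean_age = sum(node_ages) / n
    return math.sqrt(sum((a - mean_age) ** 2 for a in node_ages) / (n - 1))

# Hypothetical ages (My) of one node from five pseudoreplicates
ages = [10.2, 9.8, 11.0, 10.5, 9.5]
se = bootstrap_se(ages)
print(se)

# The standard library's sample standard deviation is the same quantity
same_as_stdlib = math.isclose(se, statistics.stdev(ages))
```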
Bayesian Inference (Thorne and Kishino, 2000; Drummond et al., 2006)
Uses Bayes’ rule to estimate rates and dates
$$p(T, R, v \mid X, C) = \frac{p(X \mid B)\,p(R \mid T, v)\,p(T \mid C)\,p(v)}{p(X \mid C)}$$
Posterior = Likelihood × Prior / marginal probability of the data
Annotations: ages, tree parameters, constraints
Bayesian Inference (Thorne and Kishino, 2000; Drummond et al., 2006)
BL = 0.065 subs/site
BL = R*T
Bayesian Inference (Thorne and Kishino, 2000; Drummond et al., 2006)
r = 0.1
t = 0.65
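The point of BL = R*T is that a branch length alone does not separate rate from time. A short sketch: the first pair is the one on the slide (r = 0.1, t = 0.65), and the other pairs are hypothetical alternatives that give the same branch length.

```python
import math

def branch_length(rate, time):
    """Expected amount of evolution on a branch: BL = R * T."""
    return rate * time

# (rate, time) pairs; only the first comes from the slide
pairs = [(0.1, 0.65), (0.05, 1.3), (0.2, 0.325)]
lengths = [branch_length(r, t) for r, t in pairs]

# Every pair reproduces BL = 0.065 subs/site
all_equal = all(math.isclose(bl, 0.065) for bl in lengths)
print(lengths, all_equal)
```

This is why extra information, such as calibrations or priors on rates and times, is needed to estimate dates.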
Bayesian Inference (Thorne and Kishino, 2000; Drummond et al., 2006)
[Figure: Prior and Posterior, BL = 0.065 subs/site]
Thorne and Kishino, 1998
A topology is required.
Branch lengths are estimated using the F84 model
The variance-covariance matrix of the branch lengths is also estimated
Several priors (e.g., time constraints, rates) can be included
MCMC methods are implemented to sample from the posterior
Drummond et al., 2006
A topology is not required. Phylogeny and dates are estimated simultaneously.
More complex models can be applied.
Several priors (e.g., time constraints, rates) can be included. Distributions do not need to be normal.
MCMC methods are implemented to sample from the posterior
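To illustrate what sampling from the posterior by MCMC involves, here is a toy random-walk Metropolis sampler for the rate and time of a single branch. Everything in it is an assumption made for illustration: a Poisson likelihood for 65 substitutions in 1000 sites (matching BL = 0.065), a uniform prior on time between two hypothetical bounds, and an exponential prior on rate. It is not the algorithm of multidivtime or BEAST.

```python
import math
import random

def log_posterior(rate, time, subs, sites, t_min, t_max, rate_prior_mean):
    """Unnormalized log posterior for one branch."""
    if rate <= 0 or not (t_min <= time <= t_max):
        return -math.inf                       # outside the prior support
    expected = rate * time * sites
    log_lik = subs * math.log(expected) - expected   # Poisson, constant dropped
    log_prior = -rate / rate_prior_mean        # exponential prior on rate
    return log_lik + log_prior                 # uniform prior on time adds a constant

def metropolis(n_steps, subs, sites, t_min, t_max, rate_prior_mean, seed=1):
    rng = random.Random(seed)
    rate, time = 0.1, 0.5 * (t_min + t_max)
    current = log_posterior(rate, time, subs, sites, t_min, t_max, rate_prior_mean)
    samples = []
    for _ in range(n_steps):
        # Symmetric random-walk proposal for both parameters
        new_rate = rate + rng.gauss(0, 0.02)
        new_time = time + rng.gauss(0, 0.1)
        proposed = log_posterior(new_rate, new_time, subs, sites,
                                 t_min, t_max, rate_prior_mean)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, proposed - current)):
            rate, time, current = new_rate, new_time, proposed
        samples.append((rate, time))
    return samples

samples = metropolis(20000, subs=65, sites=1000,
                     t_min=0.3, t_max=1.0, rate_prior_mean=0.1)
kept = samples[5000:]                          # discard burn-in
mean_bl = sum(r * t for r, t in kept) / len(kept)
print(mean_bl)
```

The product of rate and time is well determined by the data, while rate and time individually are constrained mainly by their priors.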
Coalescent theory and molecular dating
Coalescent: a stochastic process that describes how population genetic processes determine the shape of the genealogy of sampled gene sequences.
+ Molecular dating
Test hypotheses about historical demography
[Figures: examples labeled EO, HCV, and Bison]
The methods seem to be more “realistic” but…
Are they more accurate in the real world?
How do we know if a method is appropriate?
Uncertainty of phylogenetic relationships.
Rates of evolution are unknown for many organisms.
Rate heterogeneity: no molecular clock.
Lack of calibration points (fossils, biogeographic events).
BL = R*T
There are many factors that can affect divergence times
Gene tree vs. species tree
Coalescent times (TMRCA) ≠ Divergence times (time of cladogenetic event)
[Figure: phylogram of the Crotalinae (scale bar: 0.1). Tips include species of Deinagkistrodon, Ovophis, Hypnale, Calloselasma, Ermia, Protobothrops, Gloydius, Trimeresurus, Lachesis, Ophryacus, Agkistrodon, Sistrurus, Crotalus, Bothriechis, Bothrocophias, Bothrops, Bothriopsis, Atropoides, Porthidium, and Cerrophidion. Labeled clades: New World, Crotalinae. Node values: 0.54, 0.66, 0.91, 0.74, 0.84]
Calibration Error includes several components:
Fossil misidentified (belongs elsewhere and calibrates a different node)
Fossil mis-dated (uncertainty in determining absolute age of fossil)
Non-preservation (fossil never gives true origin - impossible to avoid)
Fossil cross-validation (Near et al., 2005)
Test the effect of each fossil on the time estimates
We kept one fossil at a time as the calibration and re-estimated the dates of the nodes calibrated by the remaining fossils using r8s
[Figure labels: Consistent, Inconsistent]
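Parameters:
Average difference between molecular ages and fossil ages:
$$\bar{D}_x = \frac{\sum_{i \neq x} D_i}{n - 1}$$
Sum of squares of differences:
$$SS_x = \sum_{i \neq x} D_i^2$$
Standard deviation:
$$s = \sqrt{\frac{\sum_{x=1}^{n} \sum_{i \neq x} D_i^2}{n(n - 1)}}$$
Here $D_i$ is the difference between the molecular age and the fossil age of node i when fossil x is the calibration, and n is the number of fossils.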
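The three cross-validation parameters can be sketched in Python. The fossil ages, the table of re-estimated molecular ages, and the function name are hypothetical. Row x of the table holds the ages obtained when fossil x is the only calibration.

```python
import math

def fossil_cross_validation(fossil_ages, molecular_ages):
    """Cross-validation summaries for n fossil calibrations.

    molecular_ages[x][i] is the age of node i estimated when
    fossil x is used as the single calibration.
    Returns the mean differences, the sums of squares, and s.
    """
    n = len(fossil_ages)
    mean_diff, sum_sq = [], []
    for x in range(n):
        # Differences (molecular age - fossil age) at every other fossil node
        diffs = [molecular_ages[x][i] - fossil_ages[i]
                 for i in range(n) if i != x]
        mean_diff.append(sum(diffs) / (n - 1))
        sum_sq.append(sum(d * d for d in diffs))
    s = math.sqrt(sum(sum_sq) / (n * (n - 1)))
    return mean_diff, sum_sq, s

# Hypothetical data for three fossils (ages in My)
fossil_ages = [10.0, 20.0, 30.0]
molecular_ages = [
    [10.0, 22.0, 33.0],   # calibrated with fossil 0
    [9.0, 20.0, 29.0],    # calibrated with fossil 1
    [12.0, 21.0, 30.0],   # calibrated with fossil 2
]
mean_diff, sum_sq, s = fossil_cross_validation(fossil_ages, molecular_ages)
print(mean_diff, sum_sq, s)
```

In this toy data set, fossil 0 produces the largest sum of squares, so it would be the first candidate to inspect as inconsistent.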
Effect of removing inconsistent fossils
Fossil inconsistency
[Figure: plot by fossil calibration (1 to 5); y-axis from -6 to 8, labeled overestimation / underestimation; annotations: Fossil 1, Best?]
Use of all fossils
Different values of λ (the parameter that relaxes the molecular clock in Penalized Likelihood): 0.01, 0.1, 1, 10, 100, 1000, 10000
Estimation of divergence time using r8s
[Figures: cross-validation score and substitution rate ratio, each plotted against Log(λ) from -2 to 4]
Clock behavior
Five different outgroups, chosen according to their distance from the ingroup (number of internal branches)
Optimization of branch lengths using likelihood and the GTR+Γ+I model
Estimation of divergence time using the Mean Path Length method (Pathd8)
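The basic mean path length idea can be sketched as follows: the relative age of a node is taken as the mean of the path lengths from that node to the tips below it, and one calibrated node converts relative ages into time. This is a simplified illustration, not the Pathd8 algorithm itself; the tree, branch lengths, and calibration age are hypothetical.

```python
def tip_paths(node):
    """Path lengths from a node to every tip below it.

    A tip is a string; an internal node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return [0.0]
    paths = []
    for child, length in node:
        paths.extend(length + p for p in tip_paths(child))
    return paths

def mean_path_length(node):
    paths = tip_paths(node)
    return sum(paths) / len(paths)

# Hypothetical tree: ((A:0.1, B:0.3):0.2, C:0.5)
inner = [("A", 0.1), ("B", 0.3)]
root = [(inner, 0.2), ("C", 0.5)]

# Calibrate the root at a hypothetical 10 My, then date the inner node
root_age = 10.0
rate = mean_path_length(root) / root_age       # subs/site per My
age_inner = mean_path_length(inner) / rate     # about 4.6 My
print(age_inner)
```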
[Figure: tree diagram with the ingroup and outgroups 1, 2, and 3]
[Figure: estimated age (0 to 20) by node (1 to 6); series: Gloydius, Ovophis_jbs, Proto, Ermia, Calloselasma, Gloydius 41T]
[Figures: uncorrected distance vs. GTR distance for the 2nd and 3rd codon positions]
Calibration BELOW the target → OVERESTIMATION
Calibration ABOVE the target → UNDERESTIMATION
Parameters required to derive posterior densities
Phylogenetics: topology, node support
DTE (divergence time estimation): credibility intervals of dates
Implemented in the Multidistribute package (baseml, estbranches, multidivtime)
We tested:
Expected time: rttm, rttmsd
Expected rate: rtrate, rtratesd
Bigtime
brownmean
minab
Fossils
[Figures: estimated ages (MYA, 0 to 25) of nodes 48, 79, 57, and 89 under each of the following settings]
rttm: 15, 18, 25
rttmsd: 0.1, 0.3, 0.5
bigtime: 24, 30, 50
rtrate: 0.05, 0.14, 0.2
brownmean: 0.56, 0.83, 1.1
minab: 0.5, 1.0, 1.5
fossils: with, without
[Figures: scatter plots of 4X against the single genes Cyt b, ND4, 12S, and 16S, for the mean (axes 0 to 18) and the SD (axes 0 to 3.6); panel annotations: 717 bp, 669 bp, 417 bp, 503 bp]
[Figures: scatter plots of 4X against mDNA; panel labels: M, SD, Lower, Upper]
Partitioned vs unpartitioned
[Figures: two scatter plots (axes 0 to 20 and 0 to 18); annotations: 8 million, 1 million; panel labels: Date, Upper]
The final result…you hope is the best estimate!!!!
My final remarks
Hedges is always wrong!!
Graur and Martin were wrong!!! OK, to some extent!
Time estimation using molecular data is a very useful tool for advancing evolutionary theory
Divergence time estimation procedures should take into account factors other than violations of molecular clock assumptions in order to avoid spurious results.