Bayesian models for molecular evolution
Rates, dates and traits
Nicolas Lartillot
March 20, 2012
Nicolas Lartillot (Universite de Montréal) BCM2004: Molecular evolution March 20, 2012 1 / 38
1 Molecular dating
2 Phylogenetic covariance model
3 Results
    Rates, dates and traits
    Testing the nearly-neutral model
    GC landscapes and biased gene conversion
4 Conclusions
The molecular clock
ρ = µ × f0
substitution rate = mutation rate × fraction of neutral mutations
Variation of the substitution rate among lineages
Possible causes
    generation-time effect
    metabolic rate effects
    selection for longevity
(reviewed in Lanfear et al, 2010)
[Figure: phylogram of 38 placental mammals, from a concatenation of 13 nuclear genes; scale bar: 0.1 subs per site]
Estimating divergence times: the relaxed clock model
[Figure: time-calibrated tree with rates r and times t, next to the sequence alignment]
Data and constraints
    multiple alignment D (here, nuclear coding genes in mammals)
    tree topology T, and fossil calibrations Φ
Hierarchical Bayesian model
    diversification process (e.g. birth-death, parameters λ, µ, ρ)
    substitution rate: Brownian log-normal process (variance σ²)
    substitution process (4x4 substitution matrix Q)
    parameters: θ = (λ, µ, ρ, σ, t, r)
Diversification process
    diversification process (e.g. birth-death, parameters λ, µ, ρ)
    tree topology T, fossil constraints Φ
    t: vector of divergence times
    p(t | λ, µ, ρ, Φ, T) = p(t | λ, µ, ρ)
Kendall 1948, Nee et al 1994, Yang and Rannala, 2006
Molecular dating
Tree shape and diversification patterns
Nee et al, 1994
modeling speciation-extinction processes
using time-calibrated phylogenies to fit and compare models
Brownian process
[Figure: a branch j of duration Δt = t2 − t1, with rate r_jup at its upper end and rate r_j at its lower end; branch length l = r Δt; joint probability p(D | T, l, Q) p(r | t, σ) p(t) p(σ) p(Q)]
x_t = ln r_t
dx_t = σ dB_t
x_t ∼ N(x_0, σ² t)
P taxa, j = 1..2P − 2 branches
\[ p(r \mid r_0, \sigma^2, t) = \prod_j p(r_j \mid r_{j_{up}}, \Delta t_j, \sigma^2), \qquad \ln r_j \sim N(\ln r_{j_{up}}, \sigma^2 \Delta t_j) \]
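The Brownian log-normal prior on rates can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any program cited in these slides: the five-node tree, the `parent` and `dt` dictionaries and both function names are hypothetical, and the density is written on the log-rate scale (no Jacobian term).

```python
import math
import random


def simulate_log_rates(parent, dt, sigma2, root_log_rate, rng):
    """Draw ln r_j ~ N(ln r_jup, sigma2 * dt_j), visiting parents before children."""
    log_r = {}
    for node, up in parent.items():  # insertion order: the root comes first
        if up is None:
            log_r[node] = root_log_rate
        else:
            log_r[node] = rng.gauss(log_r[up], math.sqrt(sigma2 * dt[node]))
    return log_r


def log_prior_rates(log_r, parent, dt, sigma2):
    """Sum over branches of the normal log density of ln r_j given ln r_jup."""
    total = 0.0
    for node, up in parent.items():
        if up is None:
            continue  # the root has no branch above it
        var = sigma2 * dt[node]
        diff = log_r[node] - log_r[up]
        total += -0.5 * math.log(2.0 * math.pi * var) - diff * diff / (2.0 * var)
    return total


# Hypothetical tree with P = 3 taxa, hence 2P - 2 = 4 branches.
parent = {"root": None, "n1": "root", "t3": "root", "t1": "n1", "t2": "n1"}
dt = {"n1": 0.5, "t3": 1.0, "t1": 0.5, "t2": 0.5}  # branch durations, arbitrary units

log_r = simulate_log_rates(parent, dt, sigma2=0.1, root_log_rate=0.0, rng=random.Random(1))
print(log_prior_rates(log_r, parent, dt, sigma2=0.1))
```

The product over branches in the slide becomes a sum of log densities, which is the form an MCMC sampler would typically evaluate.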
Substitution process
[Figure: branch with rates r_jup and r_j, branch length l = r Δt with Δt = t2 − t1, and the sequence alignment]
mean rate over the branch:
\[ r = \frac{1}{\Delta t} \int_{t_1}^{t_2} r_t \, dt \simeq \frac{r_j + r_{j_{up}}}{2} \]
probability of transition between nucleotides a and b over time ∆t:
\[ p(a \to b \mid r, \Delta t, Q) = [\exp(r \, \Delta t \, Q)]_{ab} \]
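As a rough illustration of the formula above, the sketch below computes exp(r Δt Q) with a small scaling-and-squaring routine written in plain Python. The Jukes-Cantor matrix `JC` is only an assumed example of Q (the slides do not fix a particular 4x4 matrix), and the helper names are hypothetical.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(m, squarings=10, terms=12):
    """exp(m) by scaling and squaring: exp(m) = (exp(m / 2**s)) ** (2**s)."""
    n = len(m)
    scaled = [[x / 2 ** squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):  # truncated Taylor series of exp(scaled)
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):  # undo the scaling
        result = mat_mul(result, result)
    return result


def mean_branch_rate(r_j, r_up):
    """Approximation of the mean rate over a branch: (r_j + r_jup) / 2."""
    return 0.5 * (r_j + r_up)


def transition_matrix(q, rate, dt):
    """Matrix of p(a -> b | r, dt, Q) = [exp(r * dt * Q)]_ab."""
    return mat_exp([[rate * dt * x for x in row] for row in q])


# Assumed example: Jukes-Cantor matrix, normalised to one substitution per unit length.
JC = [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]

p = transition_matrix(JC, rate=mean_branch_rate(0.4, 0.6), dt=2.0)
print(p[0])
```

Each row of the result sums to one, and for this Jukes-Cantor example the diagonal entries follow the known closed form 1/4 + 3/4 exp(−4l/3), with l = r Δt.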
Relaxed clock model
joint probability:
p(t | λ, µ, ρ) p(r | t, σ²) p(D | r, t, Q) = p(D | θ)
parameters: θ = (λ, µ, ρ, σ, t, r)
Bayes theorem: general formulation
\[ p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}, \qquad p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta \]
    θ: model parameters
    D: data
    p(θ): prior distribution
    p(D | θ): likelihood
    p(D): evidence, or marginal likelihood
    p(θ | D): posterior distribution
Sampling from the posterior: the Metropolis algorithm
[Figure: trace of the random walk over 10000 iterations (values in [0, 1]), and histogram (density) of the sampled values]
Metropolis algorithm
    draw U ∼ Uniform[0,1]
    set θ∗_k = θ_k + δ(U − 0.5), possibly reflecting back into [0,1]
    accept with probability q = min{1, p(θ∗_k | D) / p(θ_k | D)}
    iterate
(θ_k)_{k∈N} is a Markov chain with stationary distribution p(θ | D)
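The steps above can be sketched as a short Python function. The target used here, the posterior of a proportion θ after 7 successes in 10 trials under a uniform prior, is an assumption chosen for illustration (the slides do not say which posterior produced the histogram), and the function names are hypothetical.

```python
import math
import random


def log_posterior(theta, successes=7, trials=10):
    """Unnormalised log posterior of a proportion under a uniform prior (assumed example)."""
    if theta <= 0.0 or theta >= 1.0:
        return float("-inf")
    return successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)


def metropolis(log_post, theta0, delta, n_steps, rng):
    """Random-walk Metropolis on [0, 1] with reflecting boundaries (requires delta <= 1)."""
    theta = theta0
    chain = []
    for _ in range(n_steps):
        proposal = theta + delta * (rng.random() - 0.5)
        if proposal < 0.0:  # reflect back into [0, 1]
            proposal = -proposal
        elif proposal > 1.0:
            proposal = 2.0 - proposal
        log_q = log_post(proposal) - log_post(theta)
        if log_q >= 0.0 or rng.random() < math.exp(log_q):
            theta = proposal  # accept; otherwise keep the current value
        chain.append(theta)
    return chain


chain = metropolis(log_posterior, theta0=0.5, delta=0.5, n_steps=20000, rng=random.Random(0))
kept = chain[2000:]  # discard a burn-in period
print(sum(kept) / len(kept))
```

For this assumed target the exact posterior is Beta(8, 4), with mean 2/3, so the printed estimate should land close to that value. Working with log densities avoids numerical underflow, and the reflection keeps the proposal symmetric, which is why no proposal ratio appears in the acceptance probability.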
Relaxed clock model
posterior proportional to joint probability:
p(λ, µ, ρ) p(σ²) p(t | λ, µ, ρ) p(r | t, σ²) p(D | r, t, Q)
Metropolis-Hastings on rates (move r_j → r′_j)
\[ \alpha = \frac{p(D \mid r', t, Q)\, p(r' \mid t, \sigma^2)}{p(D \mid r, t, Q)\, p(r \mid t, \sigma^2)} \]
Metropolis-Hastings on divergence times (move t_j → t′_j)
\[ \alpha = \frac{p(D \mid r, t', Q)\, p(r \mid t', \sigma^2)\, p(t' \mid \lambda, \mu, \rho)}{p(D \mid r, t, Q)\, p(r \mid t, \sigma^2)\, p(t \mid \lambda, \mu, \rho)} \]
Metropolis-Hastings on σ²
\[ \alpha = \frac{p(r \mid t, \sigma'^2)\, p(\sigma'^2)}{p(r \mid t, \sigma^2)\, p(\sigma^2)} \]
Metropolis-Hastings on λ, µ, ρ
\[ \alpha = \frac{p(t \mid \lambda', \mu', \rho')\, p(\lambda', \mu', \rho')}{p(t \mid \lambda, \mu, \rho)\, p(\lambda, \mu, \rho)} \]
For each move:
    α > 1: accept move
    α < 1: accept with prob. α
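Each of these moves only involves the factors of the joint probability that depend on the updated parameter. As a hedged sketch of the σ² move: the exponential prior of mean 1 on σ², the symmetric sliding proposal reflected at zero, and all function names below are assumptions made for illustration, not choices documented in the slides. The rates enter only through the log-rate increments ln r_j − ln r_jup and the branch durations Δt_j.

```python
import math
import random


def log_p_rates(increments, durations, sigma2):
    """ln p(r | t, sigma2): each increment ln r_j - ln r_jup is N(0, sigma2 * dt_j)."""
    total = 0.0
    for d, dt_j in zip(increments, durations):
        var = sigma2 * dt_j
        total += -0.5 * math.log(2.0 * math.pi * var) - d * d / (2.0 * var)
    return total


def log_prior_sigma2(sigma2):
    """Assumed prior: exponential of mean 1, up to an additive constant."""
    return -sigma2


def update_sigma2(sigma2, increments, durations, step, rng):
    """One Metropolis move on sigma2 with a symmetric proposal reflected at zero."""
    proposal = abs(sigma2 + step * (rng.random() - 0.5))
    if proposal == 0.0:
        return sigma2  # degenerate proposal, reject
    log_alpha = (log_p_rates(increments, durations, proposal) + log_prior_sigma2(proposal)
                 - log_p_rates(increments, durations, sigma2) - log_prior_sigma2(sigma2))
    if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
        return proposal
    return sigma2


# Toy data: 200 branches of unit duration, increments simulated with sigma2 = 0.5.
rng = random.Random(42)
durations = [1.0] * 200
increments = [rng.gauss(0.0, math.sqrt(0.5)) for _ in durations]

sigma2, chain = 1.0, []
for _ in range(4000):
    sigma2 = update_sigma2(sigma2, increments, durations, step=0.2, rng=rng)
    chain.append(sigma2)
print(sum(chain[1000:]) / len(chain[1000:]))
```

In a full sampler this move would alternate with the moves on rates, divergence times and diversification parameters, each using its own ratio α.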
Posterior mean times and rates
[Figure: two plots, labelled rate and time]
Divergence times and substitution rates
[Figure: time-calibrated tree of mammals (monotremes, marsupials and placentals); time axis from 100 Myrs to 0, KT boundary marked; clades labelled: carnivores, chiropterans, perissodactyls, cetartiodactyls, eulipotyphlans, rodents, lagomorphs, primates, afrotherians, xenarthrans, marsupials, monotremes]
(Thorne et al 1998, Lepage et al 2007, Rannala and Yang 2007)
Multidivtime, PhyloBayes, MCMCtree
Substitution-calibrated versus time-calibrated trees
[Figure: phylogram of placental mammals with branch lengths in substitutions per site (scale bar: 0.1 subs per site), to be compared with the time-calibrated tree]
Correlating rates and life-history traits
[Figure: rate / mass regression; x axis: log mass, y axis: log subs. rate]
correcting for phylogenetic inertia (independent contrasts)
sequential method: error propagation
no feedback of rate variations on life-history evolution
Coupling substitution process with life-history evolution
[Figure: tree with rates r1, r2, r3 along the branches (branch length l2 = r2 Δt), body mass (kg) at the tips, a 2 x 2 covariance matrix Σ = (2, −1; −1, 1), and the sequence alignment]
(Lartillot and Poujol, 2011, Molecular Biology and Evolution)
Data and constraints
    multiple alignment D (nuclear coding genes in mammals)
    matrix of quantitative characters (C) (life-history traits)
    tree topology (T), and fossil calibrations (Φ)
Hierarchical Bayesian model (parameter estimation by MCMC)
    diversification process t (birth-death, parameters λ, µ, ρ)
    Brownian multivariate process X (covariance matrix Σ)
    time-dependent codon model Q
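To make the multivariate Brownian process concrete, here is a minimal sketch for two variables, for instance log rate and log body mass. The covariance matrix Σ = [[2, −1], [−1, 1]] is only an illustrative value, and the function names are hypothetical. Over a branch of duration Δt, the increment of X is drawn from N(0, Δt Σ), and the correlation between the two variables is Σ12 / sqrt(Σ11 Σ22).

```python
import math
import random

SIGMA = [[2.0, -1.0], [-1.0, 1.0]]  # illustrative covariance matrix (assumption)


def cholesky_2x2(s):
    """Lower-triangular L with L L^T = s, for a 2 x 2 positive-definite matrix."""
    l11 = math.sqrt(s[0][0])
    l21 = s[1][0] / l11
    l22 = math.sqrt(s[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]


def brownian_increment(chol, dt, rng):
    """Draw the change of X over a branch of duration dt: N(0, dt * Sigma)."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    scale = math.sqrt(dt)
    return (scale * chol[0][0] * z1, scale * (chol[1][0] * z1 + chol[1][1] * z2))


def correlation(s):
    """Correlation coefficient implied by a 2 x 2 covariance matrix."""
    return s[0][1] / math.sqrt(s[0][0] * s[1][1])


chol = cholesky_2x2(SIGMA)
print(brownian_increment(chol, dt=0.5, rng=random.Random(3)))
print(correlation(SIGMA), correlation(SIGMA) ** 2)  # correlation and R^2
```

With this Σ the correlation is −1/√2 and R² is 0.5: in this toy setting, half of the variance of one variable would be explained by the other.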
Codon model
Mutation matrix U (4 x 4)
        A     C     G     T
    A   −     γ     κγ    1
    C   1     −     γ     κ
    G   κ     γ     −     1
    T   1     κγ    γ     −
κ: transition-transversion ratio
γ = GC∗/(1 − GC∗) (GC∗: equilibrium GC)
Codon substitution matrix Q (61 x 61)
    Q_c1c2 = U_n1n2 if c1, c2 synonymous,
    Q_c1c2 = U_n1n2 ω if c1, c2 non-synonymous.
Muse and Gaut 1994
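The mutation matrix can be built directly from κ and γ. The sketch below (function names are hypothetical) also checks a property implied by the definition γ = GC∗/(1 − GC∗): the equilibrium frequencies of U are proportional to (1, γ, γ, 1) for (A, C, G, T), so that the equilibrium GC content is γ/(1 + γ).

```python
NUCLEOTIDES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def mutation_matrix(kappa, gamma):
    """4 x 4 matrix U: rate 1, times kappa for transitions, times gamma towards C or G."""
    u = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(NUCLEOTIDES):
        for j, b in enumerate(NUCLEOTIDES):
            if i == j:
                continue
            rate = 1.0
            if (a, b) in TRANSITIONS:
                rate *= kappa
            if b in "CG":
                rate *= gamma
            u[i][j] = rate
        u[i][i] = -sum(u[i])  # diagonal entry: minus the sum of the off-diagonal rates
    return u


def equilibrium_frequencies(gamma):
    """Stationary distribution of U, proportional to (1, gamma, gamma, 1)."""
    norm = 2.0 + 2.0 * gamma
    return [1.0 / norm, gamma / norm, gamma / norm, 1.0 / norm]


kappa, gamma = 4.0, 0.7  # arbitrary illustrative values
u = mutation_matrix(kappa, gamma)
pi = equilibrium_frequencies(gamma)
print(pi[1] + pi[2], gamma / (1.0 + gamma))  # equilibrium GC, computed two ways
```

The codon matrix Q would then reuse these entries, multiplied by ω for non-synonymous changes; the genetic code table needed for that step is left out of this sketch.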
Phylogenetic covariance model
Generalization
time-dependent substitution parameters
    rate of synonymous substitution (r)
    non-synonymous / synonymous ratio (ω)
    equilibrium GC (γ)
time-dependent quantitative traits
    sexual maturity (proxy of generation time)
    adult body mass
    maximum recorded lifespan (proxy of longevity)
    metabolic rate
    genome size
    karyotypic number (number of chromosomes 2n)
Coupling substitution process with life-history evolution
(Lartillot and Poujol, 2011, Molecular Biology and Evolution)
posterior proportional to joint probability:
p(λ, µ, ρ) p(t | λ, µ, ρ) p(Σ) p(X | t, Σ) p(D | X, t)
Results Rates, dates and traits
1. Nuclear data: correlates of synonymous rate
[Figure: matrix of pairwise correlations between dS, dN/dS, maturity, longevity, mass and metabolic rate; red: positive, blue: negative, light shade: not significant]
strong correlations between life-history traits
dS correlates negatively with body mass, generation time and longevity
R²: life-history variations explain ∼ 35% of the variation in synonymous rate
dN/dS positively correlated with body size (nearly neutral effect?)
Inferring divergence times and body size evolution
[Figure: time-calibrated tree of placental mammals (genus-level sampling, from Manis to Tamandua) with reconstructed body size; scale from 1 to 10000000 in powers of ten; time axis from 100 Myrs to 0, KT boundary marked]
The evolution of body size
[Figure: time-calibrated tree of mammals with reconstructed body mass; scale: 1 kg, 10 kg, 100 kg, 1000 kg; time axis from 100 Myrs to 0, KT boundary marked]
Pakicetids (Thewissen et al, 2001)
[Figure: posterior density of log10 mass (g) for the hippo-whale ancestor; curves labelled coupled, < KT and uncoupled]
Systematic trends
[Figure: two panels, substitution rate and body mass, plotted against time from −100 to 0]
Cope's or Stanley's rule
    intra-lineage drive towards larger body size
    more frequent extinction of large-bodied mammals
needs to be explicitly modeled (directed Brownian motion)
possible impact on estimated divergence times (Welch 2008)
connections with mass-dependent extinction (FitzJohn, 2010)
Results Testing the nearly-neutral model
2. Mitochondrial data: correlates of dN/dS
[Figure: matrix of pairwise correlations between dS, dN/dS, maturity, longevity and mass]
[Figure: fit and unfit variants of a coding sequence, with thresholds 1/N1 and 1/N2]
positive correlation between dN/dS and body size
compatible with a nearly-neutral interpretation
via a negative correlation between body size and population size (N)
(Ohta, 1972, Kimura, 1979, Popadin, 2007)
Radical-conservative amino-acid replacement model
(adapted from Livingstone and Barton, 1993)
ω = Kr/Kc
    Q_ab = R_ab if a → b conservative,
    Q_ab = R_ab ω if a → b radical.
R_ab: a general time reversible 20x20 process.
conservative = conserving volume and/or polarity (and/or charge)
Mitochondrial data Kr/Kc (volume + polarity)
[Figure: matrix of pairwise correlations between Kc, Kr/Kc, maturity, longevity and mass; red: positive, blue: negative, light shade: not significant]
positive correlation between Kr/Kc and body size
similar to that observed for dN/dS (but higher R²)
charge: no significant effect
polarity + volume: strongest correlation (highest R²)
Reconstructed variations of Kr/Kc
[Figure: mammalian tree (species-level sampling, from Procavia capensis to Acinonyx jubatus) coloured by reconstructed Kr/Kc; colour scale: 0.66, 1.2, 1.7]
Results GC landscapes and biased gene conversion
3. Equilibrium GC (GC∗) in nuclear genomes
[Figure: matrix of pairwise correlations between dS, dN/dS, maturity, longevity, mass, GC∗ and number of chromosomes; red: positive, blue: negative, light shade: not significant]
negative correlation between GC∗ and body size
positive correlation between GC∗ and number of chromosomes
Biased gene conversion (BGC) during meiosis
GC-biased mismatch repair (adapted from Duret and Galtier 2009)
[Figure: transmission of AT and GC alleles through meiosis =⇒ GC is overtransmitted]
\[ x_{GC} = \frac{1 + b}{2}, \qquad x_{AT} = \frac{1 - b}{2} \]
meiotic distortion bias b ⇐⇒ positive selection for GC
b proportional to local recombination rate
Mutation selection model
Kimura 1982
Substitution process (low mutation approx.)
Substitution rate = mutation rate x fixation probability
r = 2Nu p = u 2Np = uP
    u: mutation rate
    p: fixation probability
    N: effective population size
    P = 2Np: scaled fixation probability (relative to neutral)
Fixation probability of a mutation
[Figure: (a) ancestral lineage, (b) ancestral lineages relating three species; Figure 1 of Cartwright, Lartillot and Thorne, Studying ancestral lineages]
neutral case
\[ p_0 = \frac{1}{2N} \]
general case
\[ p = \frac{2s}{1 - e^{-4Ns}} \]
Scaled fixation probability
\[ P = 2Np = \frac{4Ns}{1 - e^{-4Ns}} = \frac{S}{1 - e^{-S}} \]
S = 4Ns: scaled selection coefficient
Scaled fixation probability as a function of S = 4Ns
\[ P = 2Np = \frac{S}{1 - e^{-S}} \]
[Figure: P plotted against S, for S from −6 to 6; P ranges from 0 to 6]
    neutral case S = 0: P = 1
    deleterious S < 0: P < 1
    advantageous S > 0: P > 1
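The three regimes can be checked numerically. The following is a minimal sketch (function names are hypothetical) that evaluates P = S / (1 − e^(−S)) in a numerically careful way: a short Taylor expansion near S = 0, where the formula reads 0/0, and `math.expm1` elsewhere.

```python
import math


def scaled_fixation_probability(s_scaled):
    """P = S / (1 - exp(-S)), the fixation probability relative to a neutral mutation."""
    if abs(s_scaled) < 1e-8:
        return 1.0 + 0.5 * s_scaled  # first-order Taylor expansion around S = 0
    if s_scaled < -700.0:
        return 0.0  # exp(-S) would overflow; P is vanishingly small here
    return s_scaled / -math.expm1(-s_scaled)


def substitution_rate(u, s_scaled):
    """r = u P, with u the mutation rate (low mutation approximation)."""
    return u * scaled_fixation_probability(s_scaled)


for s in (-6.0, -2.0, 0.0, 2.0, 6.0):
    print(s, scaled_fixation_probability(s))
```

A useful identity for later slides is P(S) / P(−S) = e^S: a mutation with scaled coefficient S fixes e^S times more often than its reverse mutation.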
Neutral fixation probability in the presence of BGC
[Figure: (a) ancestral lineage, (b) ancestral lineages relating three species; Figure 1 of Cartwright, Lartillot and Thorne, Studying ancestral lineages]
GC overtransmission
\[ x_{GC} = \frac{1 + b}{2}, \qquad x_{AT} = \frac{1 - b}{2} \]
Scaled conversion coefficient B = 4Nb
mutation from AT to GC
\[ P = \frac{B}{1 - e^{-B}} > 1 \]
mutation from GC to AT
\[ P = \frac{-B}{1 - e^{B}} < 1 \]
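A short worked step shows how strongly the two directions differ. Taking the ratio of the two scaled fixation probabilities above (the subscripts AT→GC and GC→AT are notation introduced here for convenience), the factor B cancels and only an exponential remains:

```latex
\[
\frac{P_{AT \to GC}}{P_{GC \to AT}}
  = \frac{B / (1 - e^{-B})}{-B / (1 - e^{B})}
  = \frac{e^{B} - 1}{1 - e^{-B}}
  = \frac{e^{B}\,(1 - e^{-B})}{1 - e^{-B}}
  = e^{B}
\]
```

For B = 1, for example, a neutral AT to GC mutation is about 2.7 times more likely to reach fixation than the reverse GC to AT mutation, and the ratio tends to 1 as B tends to 0.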
Biased gene conversion explains variations of GC∗
[Figure: matrix of pairwise correlations between dS, dN/dS, maturity, longevity, mass, GC∗ and number of chromosomes; red: positive, blue: negative, light shade: not significant]
Negative correlation GC∗ / body size
    larger animals = smaller population = less efficient selection
    also less efficient BGC (lower GC∗)
[Figure: mammalian recombination frequency as a function of the haploid number of chromosome arms (FN/2); Fig. 1 of de Villena and Sapienza, 2001]
Positive correlation GC∗ / chromosome number
    conversion bias proportional to recombination rate
    ∼ 1 recombination event per chromosome arm per meiosis
    recombination rate inversely proportional to chromosome size
    stronger gene conversion bias in more fragmented karyotypes
A history of biased gene conversion in placentals
reconstruction of B = 4Ne b
B < 1: effectively neutral
B > 1: selective regime
B > 10: likely deleterious
[Figure: six panels showing reconstructed B along a tree of placental genera (Vicugna to Dasypus); colour scales: 0.14 to 5.8, 0.064 to 3.7, 0.06 to 3.9, 0.13 to 5.1, 0.071 to 4.7, 0.14 to 5.4]
BGC very weak (B ∼ 0.2) in anthropoids
BGC above the nearly neutral threshold (B > 1) in many taxa
in some lineages (microbats, insectivores) > 5% exons under B > 10
GC and genome rearrangementsGC and genome rearrangements
tion rates of all types. Our use of only high-quality chimp basecalls should reduce this effect, but some residual overestimationmay exist in our data. Nonetheless, we observe in the substitu-tions specific to the chimpanzee lineage the same pattern ofUCBS as found in the human-specific substitutions in the humangenome, indicating that our findings are not specific to the hu-man lineage. In fact, there is a striking similarity between thehuman and chimp UBCS profiles along each chromosome, de-spite the fact that (by definition) no substitutions are sharedbetween the two genomes (Fig. 2). The human and chimpsmoothed UBCS values are highly correlated genome-wide(Spearman correlation: ! = 0.70, permutation P ! 0) and for eachchromosome (Supplemental Table S3). This correlation is muchlower (! = 0.44 genome-wide), though still significant, if the dis-tal 16 Mb of each chromosome arm are dropped from the analy-sis. Analyzing only every 100th window along the genome inorder to reduce possible effects of autocorrelation betweenadjacent regions does not change the correlation between
human and chimp UBCS nor its statistical significance. Rough(unsmoothed) UBCS values are less highly correlated but arestill significantly similar between the two species (Supple-mental Table S3). This suggests that the correlation betweenchimp and human UBCS is regional rather than fine-scale. Thus,the force that is responsible for creating biased clusters of sub-stitutions is clearly location-dependent and evolutionarilyconserved.
Bias absent from chromosomes X and Y
The human sex chromosomes do not share the consistent pat-tern of bias found on all the autosomes (Fig. 3). On human chro-mosome Y, UBCS varies randomly and does not reach significantlevels. This observation is consistent with the BGC model, whichpredicts no bias in the absence of recombination. Data for chim-panzee chromosome Y was insufficient for analysis of UBCS. Sur-prisingly, despite the presence of substantial recombination (al-
Figure 2. Patterns of substitution bias are nearly identical in human and chimp. (A) Unexpected biased clustered substitutions (UBCS) (faint line) forhuman chromosome 7 is above zero, indicating GC bias, along most (91.1%) of the chromosome and rises significantly at the distal ends. SmoothedUBCS (dark line) and 95% confidence band (yellow) are shown. The 95% confidence region is above the null expectation (zero) for more than half ofthe chromosome (61.8%). UBCS only exceeds the genome-wide 95% confidence interval (gray) near the telomeres. (B) Chimpanzee chromosome 7has a remarkably similar profile (Spearman correlation ! = 0.87). (C) The pattern of bias on chromosome 7 is mirrored on all autosomes (orderedsequentially; red, human; blue, chimp). Elevated UBCS near telomeres exceeds the human genome-wide 95% confidence interval (gray) on almost allautosomes. Here the chimpanzee sequence has been aligned to the human genome.
Dreszer et al.
1422 Genome Researchwww.genome.org
Dreszer et al, 2007
beit reduced relative to the autosomes) both the human andchimpanzee X chromosomes show almost no UBCS signal(Fig. 3). Overall, clusters of substitutions on chromosome X are infact biased strong-to-weak, as would be expected in the absenceof selective pressures, due to AT mutation bias. In fact, correla-tion between the smoothed UBCS curves in human and chimpis much weaker on chromosome X (! = 0.24) compared to theautosomes (! = 0.39 to 0.96; Supplemental Table S3).
Since the X and Y chromosomes recombine with each other in the pseudo-autosomal regions (PAR), we specifically examined these regions for evidence of UBCS. In humans there are two PAR regions. One may be too small (~320 kb) to detect any significant UBCS, and we have found none. The only significant level of UBCS on chromosome X is found along the distal end of the p arm peaking in the larger (2.6 Mb) PAR, but extending ~5 Mb beyond its current boundary (Fig. 3). If UBCS is a product of recombination, we might expect a dramatic peak in the PAR region. We detect a mild elevation, but only in comparison with the levels of UBCS in other regions of X, and not from a genome-wide perspective. This milder elevation may be due to the same processes that render X and Y much less polymorphic (International SNP Map Working Group 2001) and diverse (Hellborg and Ellegren 2004; Baines and Harr 2007) than autosomes. Further investigation is needed to determine why the UBCS effect is not as dramatic in the PAR regions as it is in autosomal telomeres.
Bias correlated with male recombination rate
In order to further investigate genome-wide bias trends and to better understand this “X exception,” we computed correlations between human UBCS and a variety of genome characteristics using a window size of 1 Mb. Male recombination rate (ρ = 0.38), recombination hot spots (ρ = 0.37), and G+C% (ρ = 0.33) show the strongest associations with UBCS, all of which are statistically significant (P ≈ 0). This level of correlation is high, especially given that UBCS has accumulated for 6 million years under changing patterns of recombination, which should be only partially reflected in current rates (Kong et al. 2002). UBCS, G+C%, and recombination rates all rise substantially near the telomeres of each autosome. Computing the correlations without the last 16 Mb of sequence from each chromosome arm reduces the associations by ~50%. Compared to the male rate, the female recombination rate has a much weaker, though statistically significant, correlation with UBCS (ρ = 0.25). Both transcription density (ρ = 0.009) and conservation (ρ = −0.069) show little relation to UBCS. We also examined genome-wide correlations over a range of window sizes from 10 kb to 1 Mb. Results at finer scales were similar to those for 1 Mb windows, and we observed that the stronger correlations all increase with window size (Fig. 4). Repeating this analysis with smoothed UBCS values increases the magnitude of the observed correlations but does not qualitatively alter our findings.
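The windowed correlation analysis described above can be sketched in a few lines. This is a minimal illustration, assuming the reported coefficients are Spearman rank correlations (the Figure 2 caption names Spearman for the human and chimp comparison). The function names and the per-window values are hypothetical and are not data from the paper.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical values, one per genomic window (illustration only).
ubcs = [0.1, 0.4, 0.2, 0.9, 1.5, 0.3]
male_rate = [0.5, 1.2, 0.4, 2.0, 3.1, 0.9]
rho = spearman(ubcs, male_rate)
```

In a real analysis each list would hold one value per 1-Mb window, and the same call would be repeated for each genome characteristic and each window size.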
To further investigate these correlations, we fit linear regression models on the data for 1-Mb windows. A multiple linear regression analysis that adjusts for the effects of other variables indicates that recombination hot spots, G+C%, and male recombination rate all have significant positive associations with UBCS. These associations are reduced in magnitude, but remain significant, when distal 16-Mb regions are dropped from the analysis. Conservation and female recombination rate show weak negative associations with UBCS in the model adjusting for other genomic variables. Furthermore, the female recombination rate is not correlated with the residuals from a simple linear regression of the male recombination rate on UBCS, suggesting that the female rate offers no additional explanatory power. If clusters of biased substitutions are a consequence of some process associated primarily with recombination in the male germline,
Figure 3. Exceptions to genome-wide patterns of bias. (A) There is little evidence of UBCS on either human or chimp sex chromosomes. There is also poor agreement between the human and chimp smoothed UBCS curves (ρ = 0.25 on chrX). The pseudo-autosomal regions (PAR) are shown in green. (B) Human chromosome 2 shows an atypical central peak, which is likely due to the fusion of two ancestral chromosomes (red, human chr2; blue, chimp chr2a and chr2b). The two smoothed curves are in remarkable agreement (ρ = 0.84). The UBCS signal for hypothetical telomeres (if no fusion had occurred) was predicted (green) by using 16 Mb of distal sequence (yellow).
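The residual check described in the regression paragraph above (whether the female rate explains anything left over once the male rate is accounted for) can be sketched as follows. This is an illustration under stated assumptions: UBCS is taken as the response and the male rate as the predictor, a Pearson correlation is used on the residuals, and all names and numbers are hypothetical rather than taken from the paper.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def residuals(x, y):
    """Residuals y - (a + b * x) of the least squares line."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical per-window values (illustration only).
male = [0.5, 1.2, 0.4, 2.0, 3.1, 0.9]
female = [1.1, 1.6, 0.9, 1.4, 2.2, 1.8]
ubcs = [0.1, 0.4, 0.2, 0.9, 1.5, 0.3]

# Step 1: remove the part of UBCS explained linearly by the male rate.
res = residuals(male, ubcs)
# Step 2: ask whether the female rate correlates with what is left.
r_female_resid = pearson(female, res)
```

A value of `r_female_resid` close to zero would correspond to the situation reported in the text, where the female rate adds no explanatory power beyond the male rate.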
Biased clustered substitutions in the human genome
chromosome fusion dated at 0.74 Myr in the human lineage.
Dreszer et al, 2007
Conclusions
Molecular dating
molecular clock: sequence divergence proportional to divergence times
violation of the molecular clock (variation of substitution rate)
complex interplay between rates and times
requires probabilistic model for teasing out rates and times
life-history correlates of variation of rate
also variation of other aspects of the substitution process (dN/dS, GC, etc)
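The interplay between rates and times listed above can be made explicit with a short worked relation. This is a sketch under the usual assumption that the expected length of a branch is the product of a substitution rate and a duration. The symbols $b$, $r$, $\Delta t$ and the scaling constant $c$ are introduced here for illustration only.

```latex
% Branch length (expected substitutions per site) as rate times duration:
b = r \,\Delta t
% Rescaling the rate up and the duration down leaves b unchanged:
b = (c\,r)\left(\frac{\Delta t}{c}\right) \qquad \text{for any } c > 0
% Strict molecular clock: r is the same on every branch, hence
b \propto \Delta t
```

Sequence data inform only the product $r\,\Delta t$, so separating rates from times requires extra information such as fossil calibrations and a probabilistic model of how the rate varies among lineages.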