Post on 03-Aug-2015
Convergence for everyone? Detecting genomic adaptive convergence:
Initial results, lessons & perspectives
16th September 2014
Joe Parker, Queen Mary University of London
Adaptive molecular convergence
• Background & definition
• Site-based methods
• Tree-based methods
• Combined approaches
• Sampling / phylogenies as parameters
• Future
Lab Interests
• Ecology and evolution of traits
• Echolocation, sociality
• NGS data for population genetics and phylogenomics
Defining molecular convergence
• It isn’t:
– Divergence (adaptive or neutral)
– Conservation or purifying selection
– Retention of ancestral states with secondary changes in outgroups
– ‘Neutral’ homoplasy
• It ought to be:
– ‘Adaptive’ homoplasies
– ‘Excess’ homoplasies
Prestin
• Gene phylogeny recovers (paraphyletic) mammalian echolocators as monophyletic (1)
• Functional convergence of parallel changes N7T & I384T demonstrated in vitro (2)
(1) Liu et al. (2010) Curr. Biol. 20:R53; (2) Liu et al. (2014) MBE 31(9):2415
Methods
• Species phylogeny & inputs
• Selection detection
• Site-based convergence detection
• Tree-based convergence detection
Site-based methods
• Look at tips
• Reconstruct ancestral changes
Lysozyme
• Convergent and parallel substitutions in stomach lysozymes of advanced ruminants
• Parsimony (‘over-estimate’) and Bayesian (‘under-estimate’) methods
Zhang & Kumar (1997) MBE 14:527
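A minimal sketch of the site-based idea, assuming ancestral states have already been reconstructed (by parsimony or a Bayesian method) for the two branches being compared. The function name `classify_site` is hypothetical. It uses the usual distinction: parallel changes start from the same ancestral state, convergent changes start from different ones, and both end in the same derived state.

```python
def classify_site(anc_a, desc_a, anc_b, desc_b):
    """Classify one alignment site for two independent branches.

    anc_*  : reconstructed ancestral state at the start of each branch
    desc_* : state at the end of each branch (e.g. the observed tip)
    """
    changed_a = anc_a != desc_a
    changed_b = anc_b != desc_b
    if not (changed_a and changed_b):
        return "no paired change"      # at most one branch changed
    if desc_a != desc_b:
        return "divergent"             # both changed, different end states
    # Both branches changed and ended in the same state.
    return "parallel" if anc_a == anc_b else "convergent"


# An I -> T change on both branches, like the prestin I384T example.
print(classify_site("I", "T", "I", "T"))  # parallel
```

Counting these classes across all sites of an alignment gives the raw tallies that a significance test would then compare against an expectation.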
Site-based methods
• Pairwise (conv) ∝ (div) changes
• BEB (Bayes empirical Bayes) posterior probabilities: P(conv|data), P(div|data)
Castoe et al. (2009) PNAS 106(22):8986
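One way to read P(conv|data) and P(div|data) for a pair of branches is sketched below. This is an illustration, not the implementation of Castoe et al.: it assumes each branch comes with a posterior distribution over (ancestral, descendant) state pairs at a site, treats the two branches as independent, and counts any two substitutions ending in the same state as 'convergent'. The function name and input layout are hypothetical.

```python
from itertools import product


def pair_probabilities(branch_a, branch_b):
    """Return (P(conv), P(div)) at one site for two branches.

    Each argument maps (ancestral, descendant) state pairs to their
    posterior probability on that branch. Independence between the
    two branches is an assumption of this sketch.
    """
    p_conv = 0.0
    p_div = 0.0
    for ((a0, a1), pa), ((b0, b1), pb) in product(branch_a.items(),
                                                   branch_b.items()):
        if a0 == a1 or b0 == b1:
            continue                   # at least one branch did not change
        if a1 == b1:
            p_conv += pa * pb          # both changed to the same state
        else:
            p_div += pa * pb           # both changed to different states
    return p_conv, p_div


# Made-up posteriors for a single site.
branch_a = {("I", "T"): 0.9, ("I", "I"): 0.1}
branch_b = {("I", "T"): 0.8, ("I", "V"): 0.2}
print(pair_probabilities(branch_a, branch_b))
```

Summing both values over all sites gives expected convergent and divergent counts per branch pair, which is the quantity a conv-versus-div plot would compare.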
Tree-based methods
• de novo tree search
– Inference error
– Signal : noise
– Multiple phylogenies
Tree-based methods
∆SSLS = SSLS (null) − SSLS (alternative): a likelihood support comparison
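The comparison can be sketched as follows, using the definition from Parker et al. (2013): ∆SSLS (Ha) = SSLS (H0) − SSLS (Ha), where negative values support the convergence hypothesis. The sketch assumes SSLS is a site-wise log-likelihood already computed under each topology with the same substitution model. The function names and the toy numbers are illustrative only.

```python
def delta_ssls(ssls_h0, ssls_ha):
    """Per-site dSSLS = SSLS(H0) - SSLS(Ha); negative values favour Ha."""
    if len(ssls_h0) != len(ssls_ha):
        raise ValueError("site-wise vectors must have equal length")
    return [h0 - ha for h0, ha in zip(ssls_h0, ssls_ha)]


def locus_mean(values):
    """Pool site-wise values into one number per locus."""
    return sum(values) / len(values)


# Made-up site-wise log-likelihoods for a three-site locus.
ssls_h0 = [-2.0, -3.5, -1.0]    # species tree
ssls_h1 = [-1.5, -3.5, -1.25]   # convergent tree
per_site = delta_ssls(ssls_h0, ssls_h1)
print(per_site, locus_mean(per_site))
```

A locus whose mean is below zero fits the convergent topology better on average than the species tree.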
Convergence in echolocating mammals
• 22 mammals, 2326 loci, ~600,000 sites
• Convergence signals across genome
• Loci linked to sensory perception
Parker et al. (2013) Nature 502:228
[…] details see Supplementary Fig. 1 and Methods): H1 corresponds to all echolocating bats in a monophyletic group (‘bat–bat convergence’) and H2 to all echolocating mammals together in a monophyletic group (‘bat–dolphin convergence’). Using this approach we obtained the SSLS values of all amino acids under three different tree topologies. The difference in SSLS for a single site under the species tree and a given convergent tree with an identical substitution model denotes the relative support for the convergence hypothesis; for example, ∆SSLS (H1) = SSLS (H0) − SSLS (H1) (where negative ∆SSLS implies support for convergence; see Supplementary Fig. 2). We quantified the extent of sequence convergence at each locus by taking the mean of its ∆SSLS values, and found 824 loci with mean support for H1 and 392 for H2. Using simulations we confirmed that these convergent signals were not due to neutral processes and were robust to the substitution model used (see Supplementary Methods).
We ranked the mean ∆SSLS for all 2,326 loci under both convergence hypotheses and, to assess the performance of our method, inspected the rank positions of seven hearing genes that have previously been shown to exhibit convergence and/or adaptation in echolocating mammals: prestin (Slc26a5), Tmc1, Kcnq4 (Kqt-4), Pjvk (Dfnb59), otoferlin, Pcdh15 and Cdh23 (see Methods). Prestin was ranked 43rd (H1) and 22nd (H2), whereas several other loci were also ranked highly in the distribution of convergence support values (see Fig. 1b). In addition to these, we also found several other hearing genes in the top 5% supporting H1 (Itm2b, Slc4a11) and H2 (Coch, Itm2b, Ercc3 and Opa1). Because bats and cetaceans are also known to have undergone shifts in spectral tuning and other adaptations in response to living in low light environments (refs 26–28), we also examined the position of genes implicated in vision and found four such loci in the top 5% of genes supporting H1 (Lcat, Slc45a2, Rabggtb and Rp1) and three supporting H2 (Jmjd6, Six6 and Rho; see examples in Fig. 1b and Supplementary Tables 2 and 3).
We tested statistically whether the strength of sequence convergence among echolocating bats, and between echolocating bats and the bottlenose dolphin, is greater in hearing genes than in other genes (for locus selection, see Methods). For each phylogenetic hypothesis, we averaged the mean ∆SSLS values of all 21 genes in our data set that are listed as linked to either hearing and/or deafness in any taxon based on published functional annotations (see Supplementary Information). By comparing our observed values to null distributions of corresponding values obtained by randomization, we found that hearing genes had significantly more negative average values than expected by chance for bat–dolphin convergence (H2: z = −0.0194, P < 0.05). We repeated this method for 75 genes listed as involved in vision and/or blindness, and found support, although weaker, in both cases of phenotypic convergence (z = −0.0020, P ≤ 0.055 and z = −0.0097, P ≤ 0.09). Loci previously reported to have association with echolocation had strong support by randomization for both hypotheses (P ≤ 0.01 in both cases).
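The randomization test described in the excerpt could look roughly like the sketch below. It is a guess at the general procedure, not the published code: it assumes random gene sets of the same size are drawn without replacement from all loci (focal genes included) and reports a one-sided P value as the fraction of draws at least as negative as the observed average. All names and the toy data are hypothetical.

```python
import random


def randomization_p(locus_means, focal_genes, n_draws=10000, seed=1):
    """Return (observed average, one-sided P) for a focal gene set.

    locus_means : dict mapping gene name -> mean dSSLS for that locus
    focal_genes : names of the genes of interest (e.g. hearing genes)
    """
    rng = random.Random(seed)          # fixed seed keeps the sketch repeatable
    k = len(focal_genes)
    observed = sum(locus_means[g] for g in focal_genes) / k
    all_values = list(locus_means.values())
    hits = 0
    for _ in range(n_draws):
        sample = rng.sample(all_values, k)   # random gene set of equal size
        if sum(sample) / k <= observed:      # as negative as observed, or more
            hits += 1
    return observed, hits / n_draws


# Made-up locus means, for illustration only.
toy = {f"gene{i}": (i - 50) * 0.01 for i in range(100)}
focal = ["gene0", "gene1", "gene2", "gene3", "gene4"]
observed, p_value = randomization_p(toy, focal, n_draws=2000)
print(observed, p_value)
```

Because more negative ∆SSLS means more support for convergence, the test is one-sided towards the negative tail.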
[Figure 1. a, tree topologies for Hypothesis H0 (species tree), Hypothesis H1 (‘bat–bat convergence’) and Hypothesis H2 (‘bat–dolphin convergence’). b, histograms of ∆SSLS (H1) and ∆SSLS (H2), n = 2,326 loci each, with labelled loci including Prestin, Pcdh15, Tmc1, Dfnb59, Itm2b, Lcat, Rho, Six6, Jmjd6 and Opa1.]
Figure 1 | Convergence hypotheses and genomic distribution of support. a, For each locus, the goodness-of-fit of three separate phylogenetic hypotheses was considered: (left) H0, the accepted species phylogeny based on recent findings (for example, refs 14, 23–25); (top-right panel) H1, or ‘bat–bat convergence’, in which echolocating bat lineages (shown in brown) are forced to form a monophyletic group to the exclusion of non-echolocating Old World fruit bats (shown in orange); and (bottom-right panel) H2, or ‘bat–dolphin convergence’, in which the echolocating bat lineages and the dolphin (blue) form a monophyletic group to the exclusion of all non-echolocating mammals. See Methods for details of model fitting and topologies. b, The distribution of convergence signal across 2,326 loci in 14–22 representative mammalian taxa, as measured by locus-wise mean site-specific likelihood support for the species topology (H0) over (left) the ‘bat–bat’ hypothesis uniting echolocating bats (that is, ∆SSLS (H1)) and (right) bat–dolphin hypothesis (that is, ∆SSLS (H2)). Representative hearing and vision loci are shown in green and blue, respectively; for each locus significance levels based on simulation denote whether it had significant counts of convergent sites after correcting for expected counts in random (control) phylogenies (*), and additionally whether strength of positive selection (dN/dS) and convergence (∆SSLS) at sites under selection in echolocators were correlated (**); see Supplementary Table 4 and Methods.
Source: Nature, Letter, 10 October 2013, vol. 502, p. 229. ©2013 Macmillan Publishers Limited. All rights reserved.
Combined methods
Trees and sites methods
Correlate selection (dN/dS) and incongruence (∆SSLS) signals
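A minimal sketch of correlating the two signals across sites. The slide does not say which correlation statistic was used, so Spearman's rank correlation is an assumption here, written in plain Python; the per-site dN/dS and ∆SSLS values are made up.

```python
def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                     # extend over a run of ties
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up per-site values: stronger selection paired with more negative dSSLS.
omega = [0.2, 1.5, 3.0, 0.8]
dssls = [0.1, -0.2, -0.6, 0.0]
print(spearman(omega, dssls))
```

A negative coefficient means that sites under stronger positive selection also tend to support the convergent topology more strongly.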
Genomic approaches
• Pool information across sites
• Orthology, paralogy
[Figure: Distribution of genomic convergence, various hypotheses. x-axis: mean locus site-specific likelihood support for H0, ∆SSLS (H0 − Ha), plotted from −1.0 to 1.0, separating support for H0 (species phylogeny) from support for Ha (alternative phylogeny).]
Interpretation
• Notional convergence detected across genome, or not at all
• Relative measure
• Strength-of-evidence
Sampling
Which Trees?
• Choice of hypothesis, subtly different from usual practice
• If we accept that tree-space distance is important…
• … Hypotheses are parameters
• Enumerate over trees?
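To get a feel for what enumerating over trees would involve, the sketch below counts rooted, bifurcating, labelled topologies using the standard (2n − 3)!! formula. The function name is hypothetical, and the 22-taxon call simply reuses the number of mammals in the data set.

```python
def rooted_topologies(n_taxa):
    """Number of rooted, bifurcating, labelled trees on n_taxa tips: (2n - 3)!!"""
    if n_taxa < 2:
        return 1
    count = 1
    for k in range(3, 2 * n_taxa - 2, 2):   # odd factors 3, 5, ..., 2n - 3
        count *= k
    return count


for n in (3, 4, 5, 22):
    print(n, rooted_topologies(n))
```

The count grows so quickly with the number of taxa that exhaustive enumeration is only realistic for small subsets or restricted families of alternative trees.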
Future
On the horizon
• Models:
– Null model
– Alternative / convergent model
• Phylogeny methods:
– Enumerated / unrestricted phylogenies
– Tree space ‘distance’
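One candidate for a tree-space ‘distance’ is the Robinson-Foulds distance, which counts the clades found in one tree but not the other. The slides do not name a specific metric, so this choice is an assumption. The sketch below works on rooted trees written as nested tuples; the five-taxon trees are a toy reduction of the species tree and a bat–bat style alternative.

```python
def clades(tree):
    """Set of non-trivial clades (frozensets of tip names) in a nested-tuple tree."""
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])            # a single tip
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)                      # record this internal node
        return members

    root = tips(tree)
    found.discard(root)    # the root clade is shared by all trees on these tips
    return found


def rf_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Toy species tree: fruit bat groups with horseshoe bat.
h0 = ((("fruit_bat", "horseshoe_bat"), "little_brown_bat"), ("dolphin", "cow"))
# Toy alternative: the two echolocating bats are forced together.
h1 = ((("horseshoe_bat", "little_brown_bat"), "fruit_bat"), ("dolphin", "cow"))
print(rf_distance(h0, h1))  # 2
```

Under a measure like this, each alternative hypothesis sits at a known distance from the species tree, which is one way to treat the choice of hypothesis as a parameter.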
Conclusion
• Strong evidence that molecular convergence, or something like our best definition of it, is a pervasive force
• Very early work; cf. early attempts to estimate ω, and current dN/dS tests
Thanks
Georgia Tsagkogeorga (1), Kalina Davies (1), James Cotton (2), Elia Stupka (3) & Steve Rossiter (1)
(1) School of Biological and Chemical Sciences, Queen Mary, University of London
(2) Wellcome Trust Sanger Institute
(3) Center for Translational Genomics and Bioinformatics, San Raffaele Institute, Milan
Chris Walker & Dan Traynor (Queen Mary GridPP High-throughput Cluster)
Chaz Mein & Anna Terry (Barts and The London Genome Centre)
Mahesh Pancholi, Seb Bailey, Xiuguang Mao & Chris Faulkes (School of Biological and Chemical Sciences)
European Research Council; BBSRC (UK); Queen Mary, University of London
(R-L): Joe Parker; Georgia Tsagkogeorga; Kalina Davies; Steve Rossiter; Xiuguang Mao; Seb Bailey
Further information
References
1. Zhang & Kumar (1997) MBE 14:527
2. Li et al. (2008) PNAS 105(37):13959
3. Castoe et al. (2009) PNAS 106(22):8986
4. Liu et al. (2010) Curr. Biol. 20:R53
5. Parker et al. (2013) Nature 502:228
6. Liu et al. (2014) MBE 31(9):2415
Resources
– Lab: evolve.sbcs.qmul.ac.uk/rossiter
– SVN: bit.ly/1m96pXM
– email: j.d.parker@qmul.ac.uk