
16 September 2007 Coalescent Consequences for Consensus Cladograms J. H. Degnan 1, M. Degiorgio 2,...

Date post: 13-Jan-2016
Transcript
Page 1: 16 September 2007 Coalescent Consequences for Consensus Cladograms J. H. Degnan 1, M. Degiorgio 2, D. Bryant 3, and N. A. Rosenberg 1,2 1 Dept. of Human.

16 September 2007

Coalescent Consequences for Consensus Cladograms

J. H. Degnan 1, M. Degiorgio 2, D. Bryant 3, and N. A. Rosenberg 1,2

1 Dept. of Human Genetics, U. of Michigan
2 Bioinformatics Program, U. of Michigan
3 Dept. of Mathematics, U. of Auckland

Page 2

Outline

Species trees vs. gene trees
Consensus tree background
Asymptotic consensus trees
Finite sample consensus trees
Consistency results
Conclusions

Page 3

Gene trees vary across the genome

Page 4

Why? Incomplete lineage sorting, horizontal gene transfer, sampling, etc.

Page 5

Gene tree discordance

From one true species tree, we expect there to be different gene trees at different loci as a result of lineage sorting, independently of problems due to estimation or sampling error.

Gene tree discordance depends especially on branch lengths in the species tree, measured by the number of generations scaled by effective population size, t / (2N).
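
To make the scaling t / (2N) concrete, here is a minimal Python sketch. It uses the standard three-taxon coalescent result (not stated on the slides) that, for species tree ((A,B),C) with internal branch length T in coalescent units, the matching gene tree has probability 1 - (2/3)e^(-T) and each of the other two has probability (1/3)e^(-T). The function names and the effective size of 10,000 are illustrative assumptions.

```python
import math


def coalescent_units(generations: float, effective_size: float) -> float:
    """Branch length t / (2N): generations scaled by effective population size."""
    return generations / (2.0 * effective_size)


def three_taxon_gene_tree_probs(T: float) -> dict:
    """Gene tree probabilities for species tree ((A,B),C), internal branch T.

    Standard three-taxon coalescent result, one sampled lineage per species.
    """
    p_no_coal = math.exp(-T)  # A and B lineages fail to coalesce on the internal branch
    return {
        "((A,B),C)": 1.0 - (2.0 / 3.0) * p_no_coal,
        "((A,C),B)": p_no_coal / 3.0,
        "((B,C),A)": p_no_coal / 3.0,
    }


# Hypothetical effective size N = 10,000; shorter branches give more discordance.
for generations in (2_000, 20_000, 200_000):
    T = coalescent_units(generations, effective_size=10_000)
    print(f"T = {T:.1f}", three_taxon_gene_tree_probs(T))
```

As T shrinks toward 0 the three topologies approach equal probability, which is the sense in which short branches drive discordance.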

Page 6

[Figure: two panels over the 15 rooted four-taxon gene tree topologies (GT). Panel "x=2, y=1.2": vertical axis from 0 to 0.8. Panel "x=y=0.1": vertical axis from 0 to 0.14. Gene tree labels: (((A,B),C),D), (((A,B),D),C), (((A,C),B),D), (((A,C),D),B), (((A,D),B),C), (((A,D),C),B), (((B,C),A),D), (((B,C),D),A), (((B,D),A),C), (((B,D),C),A), (((C,D),A),B), (((C,D),B),A), ((A,B),(C,D)), ((A,C),(B,D)), ((A,D),(B,C)).]

Page 7

Consensus (majority-rule)

Page 8

Asymptotic consensus trees

Consensus trees are usually statistics, functions of data like x-bar.

We consider replacing observed (estimated) gene trees with their theoretical probabilities under coalescence and determining the resulting consensus tree.

Motivation: if there are a large number of independent loci, observed clade proportions should approximate their theoretical probabilities.
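
The step from gene tree probabilities to clade probabilities can be sketched as follows: a clade's probability is the sum of the probabilities of the gene tree topologies that contain it. The helper names are hypothetical, and the topology probabilities in the example are made-up illustrative numbers, not values computed from the coalescent.

```python
from collections import defaultdict


def clades(newick: str) -> set:
    """Non-trivial clades (frozensets of taxon names) of a rooted Newick
    string without branch lengths, e.g. '((A,B),(C,D))'."""
    stack, found, name = [], set(), ""
    for ch in newick:
        if ch == "(":
            stack.append(set())
        elif ch in ",)":
            if name:  # finish the taxon name being read
                stack[-1].add(name)
                name = ""
            if ch == ")":  # close a subtree: record it, pass taxa to the parent
                done = stack.pop()
                found.add(frozenset(done))
                if stack:
                    stack[-1] |= done
        elif not ch.isspace() and ch != ";":
            name += ch
    all_taxa = max(found, key=len)  # the root clade holds every taxon
    return {c for c in found if c != all_taxa}


def clade_probabilities(tree_probs: dict) -> dict:
    """Sum topology probabilities over the topologies containing each clade."""
    totals = defaultdict(float)
    for newick, p in tree_probs.items():
        for clade in clades(newick):
            totals[clade] += p
    return dict(totals)


# Made-up probabilities for three of the four-taxon topologies (illustration only).
example = {"(((A,B),C),D)": 0.5, "((A,B),(C,D))": 0.2, "(((A,C),B),D)": 0.3}
for clade, p in sorted(clade_probabilities(example).items(), key=lambda kv: -kv[1]):
    print("".join(sorted(clade)), round(p, 3))
```

With observed gene trees, the same function applied to relative frequencies gives the observed clade proportions that these probabilities are meant to approximate.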

Page 9

Types of consensus trees

Strict: only clades that are included in all observed trees are in the consensus tree. In the coalescent model, all clades have probability > 0, so with enough loci conflicting clades are expected to appear.

Democratic vote: use the gene tree that occurs most frequently.

Majority rule: the consensus tree has all clades that were observed in > 50% of trees.

Greedy: sort clades by their proportions. Accept the most frequently observed clades one at a time, provided each is compatible with the clades already accepted. Do this until you have a fully resolved tree.

R*: for each set of 3 taxa, find the most commonly occurring triple, e.g., (AB)C, (AC)B or (BC)A. Build the tree from the most commonly occurring triples.
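
A minimal sketch of the majority-rule and greedy rules described above, operating on clade proportions. It assumes clades are represented as frozensets of taxon names and that two clades are compatible when they are nested or disjoint; the proportions in the example are invented, and ties in the greedy ordering are broken alphabetically, which is an arbitrary choice.

```python
def compatible(a: frozenset, b: frozenset) -> bool:
    """Two clades can share a rooted tree if they are nested or disjoint."""
    return a <= b or b <= a or not (a & b)


def majority_rule(clade_props: dict) -> set:
    """Keep every clade observed in more than 50% of trees."""
    return {clade for clade, p in clade_props.items() if p > 0.5}


def greedy(clade_props: dict, n_taxa: int) -> set:
    """Accept clades in decreasing order of proportion when compatible with
    those already accepted, until the rooted tree is fully resolved."""
    accepted = []
    ordered = sorted(clade_props.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for clade, _ in ordered:
        if all(compatible(clade, other) for other in accepted):
            accepted.append(clade)
        # A fully resolved rooted tree on n taxa has n - 2 non-trivial clades.
        if len(accepted) == n_taxa - 2:
            break
    return set(accepted)


# Invented clade proportions for four taxa A, B, C, D (illustration only).
props = {
    frozenset("ABC"): 0.60,
    frozenset("AB"): 0.45,
    frozenset("CD"): 0.30,
    frozenset("AC"): 0.25,
}
print("majority rule:", majority_rule(props))
print("greedy:", greedy(props, n_taxa=4))
```

In this example majority rule keeps only the clade ABC and leaves the tree partly unresolved, while greedy also accepts AB to reach a fully resolved tree.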

Page 10

Unresolved zone for majority-rule and too-greedy zone

Page 11

What about finite samples?

If you sample 10 loci, you could have:
All 10 match the species tree
9 match the species tree, 1 disagrees
8 match the species tree, 2 disagree, etc.

You can consider gene trees as categories and use multinomial probabilities for the probability of your sample.

By enumerating all multinomial samples, you can compute the probabilities of every possible consensus tree.
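
The enumeration idea can be sketched in Python. For brevity this sketch scores only the democratic-vote rule (the probability that one gene tree is the strict plurality winner) rather than every consensus method, and the three category probabilities are made-up values; the function names are hypothetical.

```python
from math import factorial, prod


def compositions(n: int, k: int):
    """Yield every k-tuple of non-negative integers summing to n,
    i.e. every way n sampled loci can fall into k gene tree categories."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def multinomial_pmf(counts, probs) -> float:
    """Probability of observing exactly these category counts."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    return coef * prod(p ** c for p, c in zip(probs, counts))


def prob_unique_winner(probs, n_loci: int, winner: int = 0) -> float:
    """Probability that category `winner` is strictly the most frequent."""
    total = 0.0
    for counts in compositions(n_loci, len(probs)):
        if all(counts[winner] > c for i, c in enumerate(counts) if i != winner):
            total += multinomial_pmf(counts, probs)
    return total


# Made-up gene tree probabilities; index 0 is the tree matching the species tree.
gene_tree_probs = (0.6, 0.2, 0.2)
for n_loci in (1, 5, 10):
    print(n_loci, round(prob_unique_winner(gene_tree_probs, n_loci), 4))
```

Full enumeration grows quickly with the number of loci and categories, so this is practical only for small cases such as the 10-locus example on the slide.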

Page 12

Page 13

Are consensus trees inconsistent estimators of species trees?

Theorem 1. Majority-rule asymptotic consensus trees (MACTs) do not have any clades not on the species tree.

Theorem 2. Greedy asymptotic consensus trees (GACTs) can be misleading estimators of species trees for the 4-taxon asymmetric tree and for any species tree with n > 4 species.

Theorem 3. R* asymptotic consensus trees (RACTs) always match the species tree.
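
As a worked illustration of the smallest case only (three species, using the standard three-taxon coalescent probabilities rather than anything derived on the slides), let the species tree be ((A,B),C) with internal branch length T = t/(2N) > 0. The comparison below shows why the most probable triple always matches the species tree here, while the matching clade clears the majority-rule threshold only when the branch is long enough.

```latex
\begin{align*}
P\big(((A,B),C)\big) &= 1 - \tfrac{2}{3}e^{-T}, \\
P\big(((A,C),B)\big) = P\big(((B,C),A)\big) &= \tfrac{1}{3}e^{-T}. \\[4pt]
\text{Most probable triple:}\quad
  1 - \tfrac{2}{3}e^{-T} > \tfrac{1}{3}e^{-T}
  &\iff e^{-T} < 1 \iff T > 0, \\
\text{Majority rule for clade } \{A,B\}\text{:}\quad
  1 - \tfrac{2}{3}e^{-T} > \tfrac{1}{2}
  &\iff e^{-T} < \tfrac{3}{4} \iff T > \ln\tfrac{4}{3} \approx 0.288.
\end{align*}
```

So for 0 < T <= ln(4/3) the three-taxon majority-rule asymptotic consensus tree is unresolved but never wrong, in line with Theorem 1, while the triple-based choice still recovers ((A,B),C).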

Page 14

Conclusions

Coalescent gene tree probabilities are useful for understanding asymptotic behavior of consensus trees constructed from independent gene trees.

R* consensus trees are consistent and more resolved than majority-rule consensus trees.

Greedy consensus trees can be misleading, but are quicker to approach the species tree than majority-rule or R* when outside of the greedy zone.

