Date post: 19-Dec-2015

Phylogeny

- A brief introduction in 4 hours -


Outline

• Introduction
• Practical approach
• Evolutionary models
• Distance-based methods / TP5_1
• Databases and software
• Sequence-based methods / TP5_2


What is phylogeny?


Phylogeny is the evolutionary history and relationship of species.


Why is phylogeny of interest in a proteomics course?


What data types can be used to infer phylogenies?

• Morphological characters
• Physiological characters
• Gene order (e.g. in mitochondria)
• Sequence data
  – Nucleotide sequences
  – Amino acid sequences
• Mixed characters
• …


What is a phylogenetic tree?

• A phylogenetic tree is a model of the evolutionary relationships between species (OTUs, operational taxonomic units) based on homologous characters
• But not all trees are phylogenetic trees
  – Dendrogram: general term for a branching diagram
  – Cladogram: branching diagram without branch length estimates
  – Phylogenetic tree or phylogram: branching diagram with branch length estimates


What is a phylogenetic tree?

• Rooted or unrooted
• Bifurcating or multifurcating (resolved or unresolved)


Gene duplication
• Prokaryotes: at least 50%
• Eukaryotes: >90%


After gene duplication
• Coexistence (normally only for a short while)
• Mostly, only one copy is retained; the other copy
  – becomes nonfunctional (non-functionalization),
  – becomes a pseudogene (pseudogenization), or
  – is lost
• Both copies are retained
  – Distinct expression pattern
  – Distinct subcellular location (rare)
  – One copy keeps the original function, the other copy acquires a new function (neofunctionalization)
  – Deleterious mutations in both copies, so that both are needed to cover the original function (subfunctionalization)


[Figure: Relationships within homologs. Gene tree with the leaves Human gene A, Mouse gene A, Frog gene A, Human gene B, Mouse gene B, Frog gene B and Drosophila gene AB; labels: Ancestral gene, Gene duplication, Orthologs (twice), Paralogs, Homologs.]


Homologs …

Homologs = Genes of common origin
Orthologs = 1. Genes resulting from a speciation event; 2. Genes originating from an ancestral gene in the last common ancestor of the compared genomes
Co-orthologs = Orthologs that have undergone lineage-specific gene duplications subsequent to a particular speciation event
Paralogs = Genes resulting from gene duplication
Inparalogs = Paralogs resulting from lineage-specific duplication(s) subsequent to a particular speciation event
Outparalogs = Paralogs resulting from gene duplication(s) preceding a particular speciation event
One-to-one (1:1) orthologs = Orthologs with no (known) lineage-specific gene duplications subsequent to a particular speciation event
One-to-many (1:n) orthologs = Orthologs of which at least one, and at most all but one, has undergone lineage-specific gene duplication subsequent to a particular speciation event
Many-to-many (n:n) orthologs = Orthologs which have undergone lineage-specific gene duplications in both lineages subsequent to a particular speciation event
Xenologs = Homologs acquired by horizontal gene transfer from another lineage
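
The definitions above can be tried out on a small gene tree. The following is a minimal sketch, assuming a hypothetical tree in which every internal node is already labelled as a speciation or a duplication event (in practice this labelling comes from reconciling the gene tree with the species tree). The tree encoding, `TREE`, `leaves` and `relation` are illustrative names and not part of the course material. Two genes are reported as orthologs if their last common ancestor node is a speciation, and as paralogs if it is a duplication.

```python
# Hypothetical gene tree: internal nodes are (event, left, right), leaves are strings.
def clade(copy):
    """Vertebrate subtree for one gene copy (A or B)."""
    return ("speciation", f"Frog gene {copy}",
            ("speciation", f"Human gene {copy}", f"Mouse gene {copy}"))

TREE = ("speciation", "Drosophila gene AB",
        ("duplication", clade("A"), clade("B")))

def leaves(node):
    """All leaf names below a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)

def relation(node, x, y):
    """Classify two distinct genes x and y (both below node) by the event at their last common ancestor."""
    event, left, right = node
    for child in (left, right):
        if not isinstance(child, str):
            names = leaves(child)
            if x in names and y in names:
                return relation(child, x, y)  # common ancestor lies deeper in the tree
    return "orthologs" if event == "speciation" else "paralogs"

print(relation(TREE, "Human gene A", "Mouse gene A"))
print(relation(TREE, "Human gene A", "Mouse gene B"))
print(relation(TREE, "Drosophila gene AB", "Human gene B"))
```

In this sketch, Human gene A and Mouse gene B come out as paralogs (outparalogs with respect to the human-mouse speciation), while Drosophila gene AB is an ortholog of every vertebrate copy, which makes the vertebrate genes its co-orthologs.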


[Figure: Relationships between orthologs and paralogs. Gene tree with the leaves Human gene A, Mouse gene A, Frog gene A, Human gene B, Mouse gene B, Frog gene B and Drosophila gene AB; labels: Ancestral gene, Gene duplication, Orthologs (Group 1), Orthologs (Group 2), Inparalogs of Group 2, Outparalogs of Group 1, Co-orthologs of Drosophila gene AB.]


Practical approach I

Actin-related protein 2 (first 60 columns of the alignment)

ARP2_A MESAP---IVLDNGTGFVKVGYAKDNFPRFQFPSIVGRPILRAEEKTGNVQIKDVMVGDE
ARP2_B MDSQGRKVIVVDNGTGFVKCGYAGTNFPAHIFPSMVGRPIVRSTQRVGNIEIKDLMVGEE
ARP2_C MDSQGRKVVVCDNGTGFVKCGYAGSNFPEHIFPALVGRPIIRSTTKVGNIEIKDLMVGDE
ARP2_D MDSQGRKVVVCDNGTGFVKCGYAGSNFPEHIFPALVGRPIIRSTTKVGNIEIKDLMVGDE
ARP2_E MDSKGRNVIVCDNGTGFVKCGYAGSNFPTHIFPSMVGRPMIRAVNKIGDIEVKDLMVGDE
       *:*     :* ******** ***  *** . **::****::*:  . *::::**:***:*

Species are:
• Caenorhabditis briggsae
• Drosophila melanogaster
• Homo sapiens
• Mus musculus
• Schizosaccharomyces pombe

Can you build a dendrogram (tree) for the sequences of the alignment?
Can you assign the species to the corresponding sequences of the alignment?


Phylogenetic analysis

1. Select data
2. Alignment
3. Select a data model
4. Select a substitution model
5. Tree-building
  • [Distance matrix]
  • Tree-building
6. Tree evaluation


Select data

• To be considered:
  – Input data must be homologous!
  – Number of character states
  – Content of phylogenetic information
  – Size of the dataset
  – Automated cluster data from large datasets
  – etc.


Alignment

• MSA methods
  – ClustalW
  – MUSCLE
  – MAFFT
  – ProbCons
  – T-Coffee
  – …
• See previous course …


Data model

= Characters selected for the analysis

• To be considered:
  – Each character should be homologous!
  – Missing data (in some OTUs)
  – Number of characters
  – etc.


Evolutionary models

Phylogenetic tree-building presumes particular evolutionary models.
The model used influences the outcome of the analysis and should be considered in the interpretation of the analysis results.

• Which aspects are to be considered?
  1. Frequencies of aa exchange
  2. Change of aa frequencies during evolution
  3. Between-site rate variation (among-site substitution rate heterogeneity)
  4. Presence of invariable sites


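Evolutionary models

Notation, e.g.

JTT
JTT + F
JTT + F + gamma (4)
JTT + F + gamma (8) + I (under discussion)
JTT + F + I

(JTT = amino acid exchange matrix, F = aa frequencies taken from the data, gamma (k) = gamma-distributed among-site rate variation with k rate categories, I = invariable sites)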

It is not always the most complex model that produces the best result.

The more complex the model, the more complex the explanation of the results.


Tree-building methods

• Distance (matrix) methods
  1. Calculate distances for all pairs of taxa based on the sequence alignment
  2. Construct a phylogenetic tree based on a distance matrix
• Character-based (sequence) methods
  1. Construct a phylogenetic tree based on the sequence alignment


Step 1: Compute distances

1. Estimate the number of amino acid substitutions between sequence pairs

p distance: $\hat{p} = n_d / n$

$\hat{p}$ = proportion of different amino acids (p distance)
$n_d$ = number of aa differences
$n$ = number of aa used
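
As a worked example, the p distance can be computed for the actin-related protein 2 alignment from the practical exercise. This is a minimal sketch; the decision to skip columns in which either sequence has a gap (so that n counts only the positions actually compared) is an assumption, and `ALIGNMENT` and `p_distance` are illustrative names.

```python
from itertools import combinations

# First 60 alignment columns of actin-related protein 2, copied from the exercise slide.
ALIGNMENT = {
    "ARP2_A": "MESAP---IVLDNGTGFVKVGYAKDNFPRFQFPSIVGRPILRAEEKTGNVQIKDVMVGDE",
    "ARP2_B": "MDSQGRKVIVVDNGTGFVKCGYAGTNFPAHIFPSMVGRPIVRSTQRVGNIEIKDLMVGEE",
    "ARP2_C": "MDSQGRKVVVCDNGTGFVKCGYAGSNFPEHIFPALVGRPIIRSTTKVGNIEIKDLMVGDE",
    "ARP2_D": "MDSQGRKVVVCDNGTGFVKCGYAGSNFPEHIFPALVGRPIIRSTTKVGNIEIKDLMVGDE",
    "ARP2_E": "MDSKGRNVIVCDNGTGFVKCGYAGSNFPTHIFPSMVGRPMIRAVNKIGDIEVKDLMVGDE",
}

def p_distance(seq_a, seq_b):
    """Proportion of differing residues, p = n_d / n, over gap-free columns."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    n_d = sum(x != y for x, y in pairs)  # number of aa differences
    return n_d / len(pairs)              # len(pairs) = number of aa used

# Print the lower triangle of the distance matrix.
for a, b in combinations(ALIGNMENT, 2):
    print(f"{a} {b} {p_distance(ALIGNMENT[a], ALIGNMENT[b]):.3f}")
```

The resulting pairwise values form the distance matrix that the tree-building step takes as input.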


Step 1: Compute distances

• Nonlinear relationship of p with t (time)
• Estimation of aa substitutions
  – Poisson correction
    • PC distance
  – Gamma correction
    • Gamma distance
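
The two corrections can be sketched as follows, assuming the commonly used textbook forms d = -ln(1 - p) for the Poisson correction and d = a[(1 - p)^(-1/a) - 1] for the gamma distance. The slide itself does not give the formulas, so these forms, the function names and the default shape parameter a = 2.0 are assumptions for illustration; a would normally be estimated from the data.

```python
import math

def poisson_distance(p):
    """Poisson-corrected (PC) distance: estimated aa substitutions per site."""
    return -math.log(1.0 - p)

def gamma_distance(p, a=2.0):
    """Gamma distance for rate variation among sites with shape parameter a (assumed value)."""
    return a * ((1.0 - p) ** (-1.0 / a) - 1.0)

# Compare the uncorrected p distance with both corrections.
for p in (0.05, 0.2, 0.5):
    print(f"p={p:.2f}  PC={poisson_distance(p):.3f}  gamma={gamma_distance(p):.3f}")
```

For small p all three values are close; the corrections grow with p because multiple substitutions at the same site become more likely, which is the nonlinear relationship of p with time.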


Step 2: Tree-building

Common distance methods
• Neighbor Joining (NJ)
• UPGMA / WPGMA
• Least Squares (LS)
• Minimum Evolution (ME)


Neighbor Joining (NJ)
• Saitou, Nei (1987)
• Principle
  – Clustering method
  – Simplified minimum evolution principle
  – Neighbors = taxa connected by a single node in an unrooted tree
  – Computational process: star tree, followed by successive joining of neighbors and the creation of new pairs of neighbors
  – Result:
    • A single final tree with branch length estimates
    • Unrooted tree


Neighbor Joining (NJ)

• Sum of branch lengths in the star tree

• Calculate the sum of all branch lengths for all possible neighbors …


Neighbor Joining (NJ)

• Calculate the length of branch X-Y

• Recalculate the sum of all branch lengths
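
The joining procedure can be sketched in a few lines. This is a minimal illustration, not the implementation of any particular package. It uses the Q criterion, Q(i,j) = (n-2)·d(i,j) - r_i - r_j with r_i the sum of distances from i to all other nodes, which selects the same pair as minimizing the sum of branch lengths. The five-taxon distance matrix, the taxon names and the `nodeN` labels are made up for the example.

```python
def neighbor_joining(names, dist):
    """Return the edges (node, joined_to, branch_length) of the unrooted NJ tree.

    names: list of taxon labels; dist: dict of dicts with symmetric distances.
    """
    d = {a: dict(row) for a, row in dist.items()}  # work on a copy
    nodes = list(names)
    edges = []
    new_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        pairs = [(a, b) for k, a in enumerate(nodes) for b in nodes[k + 1:]]
        # Pick the pair of neighbors that minimizes Q(i, j).
        i, j = min(pairs, key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = f"node{new_id}"
        new_id += 1
        # Branch lengths from i and j to the new internal node u.
        len_i = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        len_j = d[i][j] - len_i
        edges += [(i, u, len_i), (j, u, len_j)]
        # Distances from u to all remaining nodes.
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))  # last remaining branch
    return edges

# Hypothetical distance matrix for five taxa.
PAIRS = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8, ("b", "c"): 10,
         ("b", "d"): 10, ("b", "e"): 9, ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
NAMES = ["a", "b", "c", "d", "e"]
DIST = {x: {} for x in NAMES}
for (x, y), value in PAIRS.items():
    DIST[x][y] = DIST[y][x] = float(value)

EDGES = neighbor_joining(NAMES, DIST)
for child, parent, length in EDGES:
    print(f"{child} - {parent}: {length}")
```

Each pass through the loop joins one pair and shrinks the node list by one, so only a small fraction of all possible topologies is ever examined.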


Neighbor Joining (NJ)

• Advantage
  – Very efficient
  – Also for large datasets
• Disadvantage
  – Does not examine all possible topologies


Bootstrap

• Used to test the robustness of a tree topology
• Introduced by Bradley Efron (1979); applied to phylogenies by Felsenstein (1985)
• Principle: new MSA datasets are created by randomly choosing N columns, with replacement, from the original MSA, where N is the length of the original MSA
• 100-1000 replicates
• Bootstrap support values: (75%), 95%, 98%
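
The resampling step can be sketched as follows. This only shows how one pseudo-replicate alignment is drawn; building a tree from each replicate and counting how often a branch recurs is left out. The toy alignment, the seed and the function name are assumptions for illustration.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Draw N columns with replacement from an alignment of length N."""
    n = len(next(iter(alignment.values())))
    columns = rng.choices(range(n), k=n)  # the same column may be drawn several times
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}

# Hypothetical toy alignment; a fixed seed makes the run repeatable.
TOY = {"seq1": "MDSQGRKV", "seq2": "MESAGRKI", "seq3": "MDSKGRNV"}
rng = random.Random(42)
REPLICATES = [bootstrap_alignment(TOY, rng) for _ in range(100)]
print(REPLICATES[0])
```

Whole columns are resampled, so the residues of all sequences at a given site always stay together.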


Ortholog databases & phylogenetic databases

Some databases providing orthologous groups and trees

• COG/KOG
• HOGENOM
• Ensembl
• OMA browser
• OrthoDB
• OrthoMCL

• Pfam
• PANDIT
• SYSTERS
• TreeBase
• Tree of Life


Phylogenetic software

Software packages
• Freely available
  – Phylip
  – BioNJ
  – PhyML
  – Tree Puzzle
  – MrBayes
• Commercial
  – PAUP
  – MEGA


Phylogenetic servers

• http://www.phylogeny.fr/
• http://bioweb.pasteur.fr/seqanal/phylogeny/intro-uk.html
• http://atgc.lirmm.fr/phyml/
• http://phylobench.vital-it.ch/raxml-bb/
• http://www.fbsc.ncifcrf.gov/app/htdocs/appdb/drawpage.php?appname=PAUP
• http://power.nhri.org.tw/power/home.htm


Sequence methods

Most common:
• Maximum Parsimony (MP)
• Maximum Likelihood (ML)
• Bayesian Inference


Maximum Parsimony (MP)

• Originally developed for morphological characters
• Hennig, 1966
• William of Ockham: the best hypothesis is the one that requires the smallest number of assumptions


Maximum Parsimony (MP)
• Principle:
  – Estimate the minimum number of substitutions for a given topology
  – Parsimony-informative sites (exclude invariable sites and singletons)
  – Searching MP trees
    • Exhaustive search
    • Branch-and-bound (Hendy-Penny, 1982)
      – Good but time-consuming, if m>20
    • Heuristic search
      – Result tree might not be the most parsimonious tree
  – Result
    • Multiple result trees are possible (strict consensus tree, majority-rule consensus tree)
    • Most parsimonious tree vs true tree
    • Unrooted result trees
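
The first two points of the principle can be sketched in code. The slide does not name an algorithm for counting substitutions; the sketch below assumes Fitch's algorithm, a standard way to obtain the minimum number of changes on a fixed bifurcating topology. The four-taxon alignment, the nested-tuple tree encoding and the function names are made up for illustration.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum number of changes) for one site.

    tree: nested 2-tuples with leaf names as strings; states: leaf name -> character.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    common = left_set & right_set
    if common:  # children agree on at least one state: no extra change
        return common, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1

def parsimony_score(tree, alignment):
    """Minimum number of substitutions summed over all sites."""
    n = len(next(iter(alignment.values())))
    return sum(fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
               for i in range(n))

def is_informative(column):
    """Parsimony-informative: at least two states that each occur at least twice."""
    return sum(1 for c in set(column) if column.count(c) >= 2) >= 2

# Hypothetical alignment: site 1 is informative, site 2 invariable, site 3 a singleton.
ALN = {"w": "AAA", "x": "AAC", "y": "GAA", "z": "GAA"}
TREE_1 = (("w", "x"), ("y", "z"))
TREE_2 = (("w", "y"), ("x", "z"))
print(parsimony_score(TREE_1, ALN), parsimony_score(TREE_2, ALN))
print([is_informative("".join(col)) for col in zip(*ALN.values())])
```

Only the informative site distinguishes the two topologies; the invariable site and the singleton cost the same on every tree, which is why they can be excluded from the search.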


Maximum Parsimony (MP)

• Advantages
  – Free from assumptions (model-free)
• Disadvantages
  – Does not take into account homoplasy
  – Long-branch attraction (LBA): creates wrong topologies if the substitution rate varies extensively between lineages


Maximum Likelihood (ML)
• Cavalli-Sforza, Edwards (1967), gene frequency data
• Felsenstein (1981), nucleotide sequences
• Kishino (1990), proteins
• Principle
  – Maximizes the likelihood of observing the sequence data for a specific model of character state changes
  – Likelihood of a site = sum of probabilities of every possible reconstruction of ancestral states at the internal nodes
  – Likelihood of the tree = product of the likelihoods for all sites (log likelihood of the tree = sum of the log likelihoods of the sites)
  – Result = tree with the highest likelihood
    • Maximized to estimate branch lengths, not topologies
    • Search strategies: rarely exhaustive, mostly heuristic
      • NNI (Nearest neighbor interchanges)
      • TBR (Tree bisection-reconnection)
      • SPR (Subtree pruning and regrafting)
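
The site and tree likelihoods defined above can be sketched for a fixed tree with fixed branch lengths. To keep the example short it assumes the Jukes-Cantor nucleotide model (equal base frequencies and equal exchange rates) rather than a protein model such as JTT, and it does not optimize branch lengths or search topologies. The tree encoding, the toy alignment and all names are illustrative assumptions.

```python
import math

STATES = "ACGT"

def p_matrix(t):
    """Jukes-Cantor transition probabilities for a branch of length t (substitutions per site)."""
    k = len(STATES)
    e = math.exp(-k * t / (k - 1))
    same, diff = 1 / k + (k - 1) / k * e, 1 / k - e / k
    return {a: {b: same if a == b else diff for b in STATES} for a in STATES}

def partial(node, site):
    """Conditional likelihood of the data below node for each state at node.

    node: leaf name, or a tuple of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {s: 1.0 if s == site[node] else 0.0 for s in STATES}
    result = {s: 1.0 for s in STATES}
    for child, t in node:
        p, below = p_matrix(t), partial(child, site)
        for s in STATES:
            # Sum over every possible state at the child node.
            result[s] *= sum(p[s][x] * below[x] for x in STATES)
    return result

def log_likelihood(tree, alignment):
    """Sum of the log likelihoods of all sites."""
    n = len(next(iter(alignment.values())))
    total = 0.0
    for i in range(n):
        site = {name: seq[i] for name, seq in alignment.items()}
        cond = partial(tree, site)
        total += math.log(sum(cond[s] / len(STATES) for s in STATES))
    return total

def quartet(a, b, c, d):
    """Hypothetical four-taxon tree grouping (a, b) and (c, d), with fixed branch lengths."""
    return ((((a, 0.1), (b, 0.1)), 0.05), (((c, 0.1), (d, 0.1)), 0.05))

ALN = {"w": "AAG", "x": "AAG", "y": "CTG", "z": "CTG"}
print(log_likelihood(quartet("w", "x", "y", "z"), ALN))
print(log_likelihood(quartet("w", "y", "x", "z"), ALN))
```

Summing over the states at each internal node while moving from the leaves to the root covers every possible reconstruction of ancestral states without listing them one by one.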


Number of possible trees

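• Unrooted bifurcating trees: $N_U = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$

• Rooted bifurcating trees: $N_R = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$

(n = number of leaves)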


Number of possible trees

Leaves  Unrooted  Rooted
3       1         3
4       3         15
5       15        105
6       105       945
7       945       10395
8       10395     135135
9       135135    2027025
10      2027025   34459425
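
The counts in the table can be reproduced with a short function. This is a minimal sketch that uses the product of odd numbers, 3 × 5 × … × (2n-5), for unrooted bifurcating trees and multiplies by the 2n-3 branches on which a root can be placed; the function names are illustrative.

```python
def count_unrooted(n):
    """Number of unrooted bifurcating trees with n >= 3 leaves."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd numbers 3, 5, ..., 2n-5
        count *= k
    return count

def count_rooted(n):
    """Number of rooted bifurcating trees: the root can sit on any of the 2n-3 branches."""
    return count_unrooted(n) * (2 * n - 3)

for leaves in range(3, 11):
    print(leaves, count_unrooted(leaves), count_rooted(leaves))
```

The rapid growth of these numbers is the reason exhaustive tree searches are rarely feasible and heuristic searches are used instead.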


Maximum Likelihood (ML)

• Methods:
  – ProML (Phylip)
  – PhyML
  – RAxML
  – …


Tree evaluation

1. Topology
  1. Comparison with species tree
  2. Robustness, e.g. bootstrap
2. Branch lengths

