Phylogenetic analysis

Date post: 04-Jan-2016
Upload: tariq
Page 1: Phylogenetic analysis


Selecting sequences

Outgroup sequences

Alignment

Choice of method

Example using one method

Page 2: Phylogenetic analysis

Three most important choices

Which sequences to include

Outgroup sequences

Alignment

Page 3: Phylogenetic analysis

[Figure: tree diagrams with tips labeled T 1 through T 6 and O 1 through O 4.]

Page 5: Phylogenetic analysis

“Outgroup” sequences should be included

The best outgroup sequences are sequences clearly outside the group being studied, but not too far out.

Multiple outgroup sequences should be chosen.

The outgroup sequences are included in the data matrix just like the other sequences.

They will be used to root the tree.

Page 10: Phylogenetic analysis

Methods of phylogenetic analysis

Parsimony (Cladistics)

Maximum likelihood

Bayesian

Genetic distance (Neighbor-joining, etc.)

Page 11: Phylogenetic analysis

Parsimony (Cladistics)

Willi Hennig. 1950. Grundzüge einer Theorie der phylogenetischen Systematik.

1966. Phylogenetic systematics.

Evidence comes from characters

Goal: build most parsimonious tree

Page 12: Phylogenetic analysis

Finding the most parsimonious tree

Goal: the fewest evolutionary steps (this is the optimality criterion)

• Fewest amino acid changes

• Fewest base changes

Many tree topologies are tested, and the one requiring the fewest steps is chosen.

The tree found this way is unrooted.

Rooting the tree comes later.
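
To make "fewest evolutionary steps" concrete, here is a minimal sketch of how the steps can be counted for one candidate tree, using Fitch's algorithm. It is not taken from the slides: the nested-tuple tree format, the four-tip alignment, and the function names are all assumptions made for illustration. The trees are written with a root only for convenience; the step count does not depend on where the root is placed.

```python
def fitch_score(tree, states):
    """Return the minimum number of changes for one character on a binary tree.

    tree   -- nested 2-tuples with tip names as strings, e.g. (("T1", "T2"), "T3")
    states -- dict mapping each tip name to its observed base or amino acid
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):       # tip: the state set is the observed state
            return {states[node]}
        left, right = (visit(child) for child in node)
        shared = left & right
        if shared:                      # children agree: no change needed here
            return shared
        changes += 1                    # children disagree: count one change
        return left | right

    visit(tree)
    return changes


def tree_length(tree, alignment):
    """Sum the Fitch score over every column of an alignment (dict: tip -> sequence)."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_score(tree, {tip: seq[i] for tip, seq in alignment.items()})
        for i in range(n_sites)
    )


# Hypothetical toy data: two candidate topologies scored on the same alignment.
alignment = {"T1": "AAGC", "T2": "AAGT", "T3": "ACTC", "T4": "ACTT"}
tree_a = (("T1", "T2"), ("T3", "T4"))
tree_b = (("T1", "T3"), ("T2", "T4"))
print(tree_length(tree_a, alignment), tree_length(tree_b, alignment))
```

A parsimony search repeats this scoring over many topologies and keeps the shortest ones; real programs use heuristics because the number of topologies grows very quickly with the number of sequences.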

Page 13: Phylogenetic analysis

Rooting the tree

The outgroup taxa are included in the data matrix just like the other taxa.

Once the best tree is found, it is “rooted” along the branch connecting the outgroup and ingroup taxa.
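
The rooting step described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any particular program: the unrooted tree is stored as an adjacency dictionary, the internal node names ("x", "y", "z") are hypothetical, and the outgroup tips are assumed to sit together on one side of a single branch.

```python
def tips_behind(adj, node, parent):
    """Collect the tip names reached from `node` without going back through `parent`."""
    children = [n for n in adj[node] if n != parent]
    if not children:
        return {node}
    tips = set()
    for child in children:
        tips |= tips_behind(adj, child, node)
    return tips


def subtree(adj, node, parent):
    """Return the part of the tree behind `node` as nested tuples."""
    children = [n for n in adj[node] if n != parent]
    if not children:
        return node
    return tuple(subtree(adj, child, node) for child in children)


def root_on_outgroup(adj, outgroup):
    """Root the tree on the branch separating the outgroup tips from all other tips."""
    outgroup = set(outgroup)
    for a in adj:
        for b in adj[a]:
            if tips_behind(adj, b, a) == outgroup:
                return (subtree(adj, a, b), subtree(adj, b, a))
    raise ValueError("the outgroup tips do not form a single group on this tree")


# Hypothetical unrooted tree: three ingroup tips (T1-T3) and two outgroup tips (O1, O2).
adj = {
    "T1": ["x"], "T2": ["x"], "x": ["T1", "T2", "y"],
    "T3": ["y"], "y": ["x", "T3", "z"],
    "O1": ["z"], "O2": ["z"], "z": ["y", "O1", "O2"],
}
print(root_on_outgroup(adj, ["O1", "O2"]))
```

If the outgroup tips are scattered across the tree, the function raises an error instead of guessing, which mirrors the practical situation where a poorly chosen outgroup cannot be used for rooting.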

Page 14: Phylogenetic analysis

[Figure: trees with tips T 1 through T 6 and O 1 through O 4, together with their strict consensus tree.]

Page 15: Phylogenetic analysis

What to do in case of a tie: consensus

A “strict” consensus tree is one in which the branches not present on all trees are collapsed, resulting in polytomies.

A “50% majority rule” consensus tree is one in which the branches not present on more than 50% of the trees are collapsed, resulting in polytomies.

Trees with many polytomies are said to be less resolved than trees with few or no polytomies.
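
The two consensus rules can be sketched in a few lines. This is a simplified illustration, assuming rooted trees written as nested tuples and comparing them by the groups of tips (clades) they contain; the three example trees are hypothetical, and unrooted trees would need to be compared by splits instead.

```python
from collections import Counter


def clades(tree):
    """Return the set of clades (frozensets of tip names) in a nested-tuple tree."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(visit(child) for child in node))
        found.add(tips)
        return tips

    visit(tree)
    return found


def strict_consensus(trees):
    """Keep only the clades that appear on every tree."""
    counts = Counter(c for tree in trees for c in clades(tree))
    return {c for c, k in counts.items() if k == len(trees)}


def majority_rule_consensus(trees):
    """Keep the clades that appear on more than half of the trees."""
    counts = Counter(c for tree in trees for c in clades(tree))
    return {c for c, k in counts.items() if k / len(trees) > 0.5}


# Three hypothetical, equally good trees for four tips.
trees = [
    ((("T1", "T2"), "T3"), "T4"),
    ((("T1", "T2"), "T4"), "T3"),
    ((("T1", "T3"), "T2"), "T4"),
]
print(sorted(sorted(c) for c in strict_consensus(trees)))
print(sorted(sorted(c) for c in majority_rule_consensus(trees)))
```

With these trees the strict consensus keeps only the group containing all four tips (a complete polytomy), while the majority-rule consensus also keeps the groups T1+T2 and T1+T2+T3, each found on two of the three trees.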

Page 17: Phylogenetic analysis

Why are Maximum Likelihood and Bayesian methods considered an improvement over parsimony?

+ They allow for a model of molecular evolution to be specified.

• Not all changes from one base to another (or from one amino acid to another) are equally likely.

• Not all positions have the same probability of change.

- They require that the correct model be specified.

Page 18: Phylogenetic analysis

What is Maximum Likelihood (ML)?

Just like parsimony, ML examines lots of trees and picks the best one.

However, the optimality criteria differ.

• Parsimony -- fewest changes.

• ML -- maximizes the probability of observing the data (aligned sequences), given a model of molecular evolution.
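
As a rough sketch of the ML criterion, the code below computes the log-likelihood of a small alignment on two candidate trees using Felsenstein's pruning algorithm. Several simplifications are assumptions made for illustration: the Jukes-Cantor (JC69) model is used because it is the simplest DNA model, every branch is given the same fixed length `t`, and the alignment is hypothetical. A real ML program also optimizes branch lengths and model parameters.

```python
import math

BASES = "ACGT"


def jc69_prob(i, j, t):
    """JC69 probability that base i is replaced by base j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partials(node, site, alignment, t):
    """Pruning step: likelihood of the data below `node` for each possible base at `node`."""
    if isinstance(node, str):
        return {b: 1.0 if alignment[node][site] == b else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child in node:
        below = partials(child, site, alignment, t)
        for b in BASES:
            result[b] *= sum(jc69_prob(b, c, t) * below[c] for c in BASES)
    return result


def log_likelihood(tree, alignment, t=0.1):
    """Log probability of the aligned sequences given the tree and the JC69 model."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for site in range(n_sites):
        root = partials(tree, site, alignment, t)
        total += math.log(sum(0.25 * root[b] for b in BASES))
    return total


# Hypothetical toy data: the tree with the higher log-likelihood is preferred.
alignment = {"T1": "AAGC", "T2": "AAGT", "T3": "ACTC", "T4": "ACTT"}
tree_a = (("T1", "T2"), ("T3", "T4"))
tree_b = (("T1", "T3"), ("T2", "T4"))
print(log_likelihood(tree_a, alignment), log_likelihood(tree_b, alignment))
```

Here two alignment columns group T1 with T2 and one column groups T1 with T3, so the first tree receives the higher likelihood.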

Page 19: Phylogenetic analysis

Models of molecular evolution

Substitution matrix

• For proteins, this is the (observed) probability of one amino acid changing to another.

• For DNA, it is the probability of one base changing to another.

Site-to-site variation in rate of change

• Some sites don’t vary.

• Those that do vary change at different rates.
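
The effect of site-to-site rate variation can be illustrated with a small calculation. This is a sketch under assumptions not found in the slides: the Jukes-Cantor (JC69) model stands in for the substitution matrix, and the proportion of invariant sites (0.2) and the three rate categories are hypothetical values chosen so that the average rate over all sites is 1.

```python
import math


def jc69_p_diff(t, rate=1.0):
    """Probability that a site differs between two sequences separated by distance t."""
    return 0.75 * (1.0 - math.exp(-4.0 * rate * t / 3.0))


def expected_difference(t, p_invariant, rates):
    """Average over invariant sites (which never differ) and equally weighted rate categories."""
    variable = sum(jc69_p_diff(t, r) for r in rates) / len(rates)
    return (1.0 - p_invariant) * variable


t = 1.0
same_rate = expected_difference(t, p_invariant=0.0, rates=[1.0])
mixed_rate = expected_difference(t, p_invariant=0.2, rates=[0.25, 1.0, 2.5])
print(same_rate, mixed_rate)
```

At the same true distance, the mixed-rate data show fewer observed differences than the equal-rate data. A model that ignores rate variation would therefore underestimate how much change has occurred, which is one reason the model has to be chosen carefully.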

Page 20: Phylogenetic analysis

Why is using a correct model of molecular evolution better than using parsimony?

Under some conditions, parsimony chooses the wrong tree (long branch attraction).

Methods using a model are more precise and result in fewer exact ties, generally.

• For example, a change between two chemically similar amino acids can be treated as more likely than a change between dissimilar ones. Under parsimony, all differences are simply “different”.

• Models usually choose a single best tree, whereas parsimony usually chooses a large set of most parsimonious trees.

Branch length estimates are more accurate with a model.

Page 21: Phylogenetic analysis

What is Bayesian phylogenetic analysis?

Just like ML, we search for the best trees that are consistent with both the model and the data.

Optimality criterion:

• Bayesian -- maximizes the probability of the tree, given the data (aligned sequences) and the model of molecular evolution.

Bayesian analysis is the only one of these methods that automatically provides confidence estimates for each node (posterior probabilities, which are read much like bootstrap values).
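
The Bayesian criterion can be sketched with Bayes' rule applied to a tiny, fully enumerated set of candidate trees. The log-likelihood values below are hypothetical, and equal prior probabilities are assumed. MrBayes does not enumerate trees like this; it samples trees and model parameters by Markov chain Monte Carlo, so this only illustrates what "probability of the tree, given the data" means.

```python
import math


def posterior_probabilities(log_likelihoods, priors=None):
    """Apply Bayes' rule over a small set of candidate trees.

    log_likelihoods -- dict mapping a tree label to log P(data | tree, model)
    priors          -- dict of prior probabilities; equal priors if omitted
    """
    names = list(log_likelihoods)
    if priors is None:
        priors = {name: 1.0 / len(names) for name in names}
    log_post = {n: log_likelihoods[n] + math.log(priors[n]) for n in names}
    # Subtract the largest value before exponentiating to avoid numerical underflow.
    top = max(log_post.values())
    weights = {n: math.exp(v - top) for n, v in log_post.items()}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


# Hypothetical log-likelihoods for the three unrooted trees of four tips.
log_likelihoods = {
    "((T1,T2),(T3,T4))": -20.0,
    "((T1,T3),(T2,T4))": -22.0,
    "((T1,T4),(T2,T3))": -23.0,
}
posterior = posterior_probabilities(log_likelihoods)
for tree, p in posterior.items():
    print(tree, round(p, 3))
```

In an MCMC analysis the posterior probability of a node is estimated as the fraction of sampled trees that contain that node.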

Page 22: Phylogenetic analysis

Example - Bayesian analysis of signal transduction proteins

Using ProtTest to find out how the sequences are evolving

Informing MrBayes of the model of molecular evolution

Using MrBayes to get the phylogeny

Making a figure

Page 27: Phylogenetic analysis

MrBayes doesn’t know when it has run long enough -- you decide.

Average standard deviation of split frequencies: < 0.01
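
The convergence statistic named above can be sketched for two independent runs as follows. The details are assumptions rather than a description of MrBayes internals: splits are identified by arbitrary labels, the sample standard deviation is used, splits with a frequency below 0.10 in every run are ignored, and the frequencies are hypothetical.

```python
from statistics import stdev


def average_sd_of_split_frequencies(run_a, run_b, min_frequency=0.10):
    """Average, over splits, of the standard deviation of a split's frequency across runs.

    run_a, run_b -- dicts mapping a split label to its frequency among that run's samples
    """
    deviations = []
    for split in set(run_a) | set(run_b):
        freqs = [run_a.get(split, 0.0), run_b.get(split, 0.0)]
        if max(freqs) >= min_frequency:   # ignore splits that are rare in every run
            deviations.append(stdev(freqs))
    return sum(deviations) / len(deviations) if deviations else 0.0


# Hypothetical split frequencies from two runs of the same analysis.
run_a = {"T1+T2": 0.98, "T3+T4": 0.61, "T1+T3": 0.02}
run_b = {"T1+T2": 0.97, "T3+T4": 0.64}
value = average_sd_of_split_frequencies(run_a, run_b)
print(value, "keep running" if value >= 0.01 else "runs agree")
```

When the two runs sample the same splits at nearly the same frequencies, the value approaches zero, which is the basis for using a threshold such as 0.01 as a stopping rule.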

Page 31: Phylogenetic analysis

[Figure: trees with tips A through E, drawn with the tips in the orders A B C D E and B A E D C.]

Page 32: Phylogenetic analysis

What is Neighbor-joining (NJ)?

NJ is an algorithm for building a tree.

There is no optimality criterion.

First, a matrix of distances between all pairs of sequences is computed.

• A substitution matrix is needed to do this.

Then, from among all possible pairs, the pair whose joining best minimizes the total length of the tree is combined. This step is repeated until the tree is complete.
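
The pair-selection step can be sketched with the standard neighbor-joining criterion, in which each pair is scored by Q(a, b) = (n - 2) d(a, b) - (sum of a's distances) - (sum of b's distances), and the pair with the smallest Q is joined. The distance matrix below is a hypothetical five-sequence example, and the function name is an assumption; a full implementation would then replace the pair with a new node, update the distances, and repeat.

```python
from itertools import combinations


def nj_pair_to_join(names, dist):
    """Return the pair of sequences that neighbor-joining would join first.

    dist -- dict mapping frozenset({a, b}) to the distance between a and b
    """
    n = len(names)
    row_sum = {a: sum(dist[frozenset((a, b))] for b in names if b != a) for a in names}

    def q(pair):
        a, b = pair
        return (n - 2) * dist[frozenset(pair)] - row_sum[a] - row_sum[b]

    return min(combinations(names, 2), key=q)


# Hypothetical pairwise distances between five sequences.
names = ["A", "B", "C", "D", "E"]
pairs = {
    ("A", "B"): 5, ("A", "C"): 9, ("A", "D"): 9, ("A", "E"): 8,
    ("B", "C"): 10, ("B", "D"): 10, ("B", "E"): 9,
    ("C", "D"): 8, ("C", "E"): 7,
    ("D", "E"): 3,
}
dist = {frozenset(k): v for k, v in pairs.items()}
print(nj_pair_to_join(names, dist))
```

In this example D and E have the smallest raw distance (3), yet the criterion selects A and B, because it also accounts for how far each sequence is from all the others.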

Page 34: Phylogenetic analysis

Neighbor-joining

NJ is very fast.

There is no optimality criterion.

• This means there is no way to assess its success.

• There is also no way to say whether a “best” tree is significantly better than a set of “next best” trees. (mt Eve)

The tree it chooses is not always the shortest. Distances are estimated from noisy data and early mistakes in NJ can’t be revisited.

Page 35: Phylogenetic analysis

Large data sets

If you have over 50 sequences, or if you have very long sequences (hundreds of proteins), ProtTest and MrBayes may take more than a couple of days to finish.

Parsimony is much faster.

• It allows node support (bootstrap values) to be calculated.

• It doesn’t require a model of molecular evolution.

• PAUP* can read nexus files.

NJ is faster still. Sometimes it is the only method that is fast enough.

• A default model of molecular evolution must be used.
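
The bootstrap values mentioned above for parsimony come from resampling alignment columns with replacement and repeating the analysis on each resampled alignment. The sketch below shows the resampling step. To stay short it uses a toy stand-in for the tree search (`closest_pair`, which just reports the two most similar sequences); that function, the alignment, and the replicate count are assumptions for illustration only.

```python
import random
from collections import Counter
from itertools import combinations


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {tip: "".join(seq[i] for i in columns) for tip, seq in alignment.items()}


def closest_pair(alignment):
    """Toy stand-in for a tree search: the two tips with the fewest differences."""
    def differences(pair):
        a, b = pair
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))
    return frozenset(min(combinations(sorted(alignment), 2), key=differences))


def bootstrap_support(alignment, find_group, replicates=200, seed=1):
    """Fraction of bootstrap replicates in which each group is recovered."""
    rng = random.Random(seed)
    counts = Counter(
        find_group(bootstrap_alignment(alignment, rng)) for _ in range(replicates)
    )
    return {group: k / replicates for group, k in counts.items()}


# Hypothetical alignment of four short sequences.
alignment = {
    "T1": "ACGTACGTAC",
    "T2": "ACGTACGTTC",
    "T3": "ACCTAGGTAA",
    "T4": "TCCTAGGAAA",
}
support = bootstrap_support(alignment, closest_pair)
for group, fraction in support.items():
    print(sorted(group), fraction)
```

In a real analysis `find_group` would be replaced by a full tree search on each replicate, and the support for every node of the best tree would be reported.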

Page 36: Phylogenetic analysis

DNA sequences should be used when sequences are highly similar

Use a very similar procedure.

Use MrModelTest instead of ProtTest.

Page 37: Phylogenetic analysis

Summary

Three most important choices

• Which sequences to include

• Outgroup sequences

• Alignment

Choice of method - Bayesian

Example - Look on Ned’s Computational Corner for more details.

