
CS 177 Phylogenetics I

Date post: 02-Feb-2016
Upload: juro
Page 1: CS 177          Phylogenetics I

CS 177 Phylogenetics I

Taxonomy and phylogenetics

Phylogenetic trees

Cladistic versus phenetic analyses

Model of sequence evolution

Phylogenetic trees and networks

Cladistic and phenetic methods

Computer software and demos

Taxonomy and phylogenetics

Phylogenetic trees

Cladistic versus phenetic analyses

Homology and homoplasy

Page 2

Phylogenetic Inference I

Recommended readings, from (very) basic to advanced:

A science primer: Phylogenetics. http://www.ncbi.nlm.nih.gov/About/primer/phylo.html

Brown, S.M. (2000) Bioinformatics. Eaton Publishing, pp. 145-160

Brown, S.M.: Molecular Phylogenetics. www.med.nyu.edu/rcr/rcr/course/PPT/phylogen.ppt

Hillis, D.M.; Moritz, C. & Mable, B.K. (1996) Molecular Systematics, 2nd edition. Sinauer Associates, 655 pp.

Mount, D.W. (2001) Bioinformatics. Cold Spring Harbor Lab Press, pp. 237-280

Page 3

CS 177 Phylogenetic Inference I

Evolution

The theory of evolution is the foundation upon which all of modern biology is built.

From anatomy to behavior to genomics, the scientific method requires an appreciation of changes in organisms over time.

It is impossible to evaluate relationships among gene sequences without taking into consideration the way these sequences have been modified over time.

Ernst Haeckel (1834-1919)

Page 4

Relationships

Similarity searches and multiple alignments of sequences naturally lead to the question

“How are these sequences related?”

and, more generally:

“How are the organisms from which these sequences come related?”

Page 5

Classifying Organisms

Nomenclature is the science of naming organisms

Evolution has created an enormous diversity, so how do we deal with it?

Names allow us to talk about groups of organisms.

- Scientific names were originally long descriptive phrases, which proved impractical

- Binomial nomenclature

> Developed by Linnaeus, a Swedish naturalist

> Names are in Latin, formerly the language of science

> Binomials are names consisting of two parts

> The generic name is a noun

> The specific epithet is usually a descriptive adjective

- Thus a species name consists of two words, e.g. Homo sapiens

Carolus Linnaeus (1707-1778)

Page 6

Classifying Organisms

Taxonomy is the science of the classification of organisms

Taxonomy deals with the naming and ordering of taxa.

The Linnaean hierarchy:

1. Kingdom

2. Phylum (called Division in botany)

3. Class

4. Order

5. Family

6. Genus

7. Species

Taxonomic classification of man (Homo sapiens):

Superkingdom: Eukaryota

Kingdom: Metazoa

Phylum: Chordata

Class: Mammalia

Order: Primates

Family: Hominidae

Genus: Homo

Species: sapiens

Subspecies: sapiens

(Figure axis: evolutionary distance)

Page 7

Classifying Organisms

Systematics is the science of the relationships of organisms: how organisms are related and the evidence for those relationships.

Systematics is divided primarily into phylogenetics and taxonomy.

Speciation: the origin of new species from previously existing ones

- anagenesis: one species changes into another over time

- cladogenesis: one species splits to make two

Aim: reconstruct evolutionary history, i.e. the phylogeny

Page 8

Phylogenetics


Phylogenetics is the science of the pattern of evolution.

A. Evolutionary biology is the study of the processes that generate diversity, while phylogenetics is the study of the pattern of diversity produced by those processes.

B. The central problem of phylogenetics:

1. How do we determine the relationships between species?

2. Use evidence from shared characteristics, not differences

3. Use homologies, not analogies

4. Use derived condition, not ancestral

a. synapomorphy - shared derived characteristic

b. plesiomorphy - ancestral characteristic

C. Cladistics is phylogenetics based on synapomorphies.

1. Cladistic classification creates and names taxa based only on synapomorphies.

2. This is the principle of monophyly

3. Groups can be monophyletic, paraphyletic or polyphyletic; only monophyletic groups are recognized

4. Cladistics is now the preferred approach to phylogeny

(Figure: the phylogeny and classification of life as proposed by Haeckel (1866))

Page 9

Phylogenetics

Evolutionary theory states that groups of similar organisms are descended from a common ancestor.

Phylogenetic systematics is a method of taxonomic classification of organisms based on their evolutionary history.

It was developed by Hennig, a German entomologist, in 1950.

Willi Hennig (1913-1976)

Page 10

Phylogenetics

Phylogenetics is the science of the pattern of evolution

Evolutionary biology versus phylogenetics

- Evolutionary biology is the study of the processes that generate diversity

- Phylogenetics is the study of the pattern of diversity produced by those processes

Page 11

Phylogenetics

Who uses phylogenetics? Some examples:

Evolutionary biologists (e.g. reconstructing tree of life)

Systematists (e.g. classification of groups)

Anthropologists (e.g. origin of human populations)

Forensics (e.g. transmission of HIV to a rape victim)

Parasitologists (e.g. phylogeny of parasites, co-evolution)

Epidemiologists (e.g. reconstruction of disease transmission)

Genomics/Proteomics (e.g. homology comparison of new proteins)

Page 12

Phylogenetic trees

The central problem of phylogenetics:

How do we determine the relationships between taxa?

In phylogenetic studies, the most convenient way of presenting evolutionary relationships among a group of organisms is the phylogenetic tree.

Page 13

Phylogenetic trees

(Figure: a tree of Species A to E with its root, a branch, a node and a clade labeled)

Node: a branch point in a tree (a presumed ancestral OTU)

Branch: defines the relationship between the taxa in terms of descent and ancestry

Topology: the branching pattern of the tree

Branch length (scaled trees only): represents the number of changes that have occurred along the branch

Root: the common ancestor of all taxa

Clade: a group of two or more taxa or DNA sequences that includes both their common ancestor and all their descendants

Operational Taxonomic Unit (OTU): taxonomic level of sampling selected by the user to be used in a study, such as individuals, populations, species, genera, or bacterial strains
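The definitions of node, root and clade can be made concrete with a small data structure. The sketch below assumes a rooted tree written as nested Python tuples (a string is a tip/OTU, a tuple is an internal node); the five-species topology is hypothetical, since the slide's figure is not recoverable. It lists every clade, i.e. every internal node together with all of its descendant tips.

```python
def tips(tree):
    """Return the set of tip labels (OTUs) at or below this node."""
    if isinstance(tree, str):          # a tip
        return frozenset([tree])
    return frozenset().union(*(tips(child) for child in tree))


def clades(tree):
    """Return the tip set of every clade, i.e. of every internal node."""
    if isinstance(tree, str):
        return []
    found = [tips(tree)]               # this node plus all its descendants
    for child in tree:
        found.extend(clades(child))
    return found


# Hypothetical five-species tree; the outermost tuple is the root.
example = (("A", "B"), ("C", ("D", "E")))

for clade in clades(example):
    print(sorted(clade))
```

Run as is, it prints the root clade (all five species) followed by the clades AB, CDE and DE. A single tip is not reported, matching the definition of a clade as two or more taxa.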

Page 14

Phylogenetic trees

There are many ways of drawing a tree

(Figure: the same tree of taxa A, B, C, D and E drawn in two different styles)

Page 15

Phylogenetic trees

There are many ways of drawing a tree

(Figure: three drawings of the same tree with the taxa in different orders (A E D C B, E D C B A and E C D B A), marked as equal: rotating branches around a node does not change the tree)

Page 16

Phylogenetic trees

There are many ways of drawing a tree

(Figure: three drawings of the same tree of taxa A to E, marked as equal; the differences between the drawings have no meaning)
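Because rotating branches around a node changes only the drawing, two drawings can be checked for equality mechanically by putting each tree into a canonical form: sort the children of every node, and compare the results. A minimal sketch, assuming the nested-tuple notation used here for illustration (not the course's own notation) and two hypothetical drawings of one tree:

```python
def canonical(tree):
    """Return a canonical string for a rooted tree given as nested tuples.

    Tips are strings; internal nodes are tuples of child subtrees.
    Sorting the children's canonical strings removes the arbitrary
    left-to-right order in which a tree happens to be drawn.
    """
    if isinstance(tree, str):
        return tree
    children = sorted(canonical(child) for child in tree)
    return "(" + ",".join(children) + ")"


# Two hypothetical drawings of one tree, with the tips listed in different orders
drawing_1 = ("A", ("E", ("D", ("C", "B"))))
drawing_2 = (((("B", "C"), "D"), "E"), "A")

print(canonical(drawing_1) == canonical(drawing_2))  # True
```

The same function handles multifurcations (tuples with three or more children), so a bifurcating and a trifurcating tree get different canonical strings, as they should.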

Page 17

Phylogenetic trees

There are many ways of drawing a tree

(Figure: a tree of taxa A to E with a bifurcation is not equal (≠) to a tree with a trifurcation)

Bifurcation versus multifurcation (e.g. trifurcation)

Multifurcation (also called polytomy): a node in a tree that connects more than three branches. A multifurcation may represent a lack of resolution because of too few data available for inferring the phylogeny (in which case it is said to be a soft multifurcation) or it may represent the hypothesized simultaneous splitting of several lineages (in which case it is said to be a hard multifurcation).

Page 18

Phylogenetic trees

Trees can be scaled or unscaled (with or without branch lengths)

(Figure: a tree of taxa A to E drawn unscaled and scaled; the scaled versions carry a scale bar labeled "unit")

Page 19

Phylogenetic trees

Trees can be unrooted or rooted

(Figure: an unrooted tree of taxa A, B, C and D, and a rooted tree obtained by placing a root on one of its branches)

Page 20

Phylogenetic trees

Trees can be unrooted or rooted

(Figure: an unrooted tree of taxa A, B, C and D whose five branches are numbered 1 to 5; placing the root on each branch in turn gives rooted trees 1 to 5)

These trees show five different evolutionary relationships among the taxa!
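Each rooted tree corresponds to placing the root on one branch of the unrooted tree. The sketch below assumes the unrooted tree ((A,B),(C,D)) (read from the figure layout) and uses hypothetical names u and v for the two internal nodes. It places a root on each of the five branches in turn and prints the resulting rooted tree in Newick-style parenthesis notation.

```python
# Unrooted tree ((A,B),(C,D)) as an adjacency list; u and v are
# hypothetical names for the two internal nodes.
UNROOTED = {
    "A": ["u"], "B": ["u"], "C": ["v"], "D": ["v"],
    "u": ["A", "B", "v"], "v": ["C", "D", "u"],
}


def subtree(node, parent, adj):
    """Newick string for the part of the tree below `node`
    when it is reached from `parent`."""
    children = [n for n in adj[node] if n != parent]
    if not children:                     # a tip
        return node
    return "(" + ",".join(subtree(c, node, adj) for c in children) + ")"


def branches(adj):
    """Every undirected branch exactly once, as a sorted pair."""
    return sorted({tuple(sorted((a, b))) for a in adj for b in adj[a]})


def rootings(adj):
    """Place the root on each branch in turn; one rooted tree per branch."""
    return [
        "(" + subtree(a, b, adj) + "," + subtree(b, a, adj) + ");"
        for a, b in branches(adj)
    ]


for newick in rootings(UNROOTED):
    print(newick)
```

The five outputs, such as (A,(B,(C,D))); and ((A,B),(C,D));, are five different hypotheses of ancestry derived from a single unrooted tree.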

Page 21

Phylogenetic trees

Possible evolutionary trees

(Figure: the possible trees for 2, 3 and 4 taxa)

Taxa (n)    Unrooted    Rooted
2           1           1
3           1           3
4           3           15

Page 22

Phylogenetic trees

Possible evolutionary trees

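Number of rooted trees: $\frac{(2n-3)!}{2^{n-2}(n-2)!}$

Number of unrooted trees: $\frac{(2n-5)!}{2^{n-3}(n-3)!}$ (for n ≥ 3; with n = 2 there is a single tree)

Taxa (n)    Rooted        Unrooted
2           1             1
3           3             1
4           15            3
5           105           15
6           945           105
7           10,395        945
8           135,135       10,395
9           2,027,025     135,135
10          34,459,425    2,027,025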
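The counts in the table follow directly from the two formulas. The sketch below evaluates them with exact integer arithmetic (the function names are illustrative). Equivalently, the counts are the double factorials (2n-3)!! and (2n-5)!!.

```python
from math import factorial


def rooted_trees(n: int) -> int:
    """Number of distinct rooted, bifurcating trees for n labeled taxa."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_trees(n: int) -> int:
    """Number of distinct unrooted, bifurcating trees for n labeled taxa."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    if n == 2:
        return 1  # the formula is undefined for n = 2; there is one tree
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in range(2, 11):
    print(n, rooted_trees(n), unrooted_trees(n))
```

A useful check is that an unrooted bifurcating tree with n taxa has 2n - 3 branches, and a root can be placed on any of them. The rooted count is therefore the unrooted count multiplied by 2n - 3 (for four taxa, 3 × 5 = 15).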

Page 23

Phylogenetic trees

How to root?

Use information from ancestors (in most cases not available)

(Figure: the unrooted tree of A, B, C and D with its branches numbered 1 to 5)

Page 24

Phylogenetic trees

How to root?

Use tools that root trees automatically (e.g. midpoint rooting)

(Figure: the unrooted tree of A, B, C and D with its branches numbered 1 to 5)

This must involve assumptions (midpoint rooting, for example, assumes that all lineages evolve at roughly the same rate) … BEWARE!

(Figure: an unrooted tree with branch lengths A = 10, B = 2, C = 2, D = 5 and internal branch = 3)

d(A, D) = 10 + 3 + 5 = 18

Midpoint = 18 / 2 = 9
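The midpoint calculation can be automated: find the pair of tips with the longest path between them, then walk half that distance from one end. The sketch below assumes the branch lengths read from the figure (A = 10, B = 2, C = 2, D = 5, internal branch = 3) and the grouping ((A,B),(C,D)). The internal node names u and v are hypothetical.

```python
from itertools import combinations

# Branch lengths read from the slide's figure (assumed topology ((A,B),(C,D)));
# u and v are hypothetical names for the two internal nodes.
EDGES = {("A", "u"): 10, ("B", "u"): 2, ("u", "v"): 3, ("C", "v"): 2, ("D", "v"): 5}

# Adjacency map with branch lengths stored in both directions.
ADJ = {}
for (a, b), blen in EDGES.items():
    ADJ.setdefault(a, {})[b] = blen
    ADJ.setdefault(b, {})[a] = blen


def path(start, goal, parent=None):
    """Nodes on the unique path from start to goal (trees have no cycles)."""
    if start == goal:
        return [start]
    for nxt in ADJ[start]:
        if nxt != parent:
            rest = path(nxt, goal, start)
            if rest:
                return [start] + rest
    return []


def path_length(nodes):
    """Sum of branch lengths along a path given as a list of nodes."""
    return sum(ADJ[a][b] for a, b in zip(nodes, nodes[1:]))


def midpoint_root():
    """Return (tip1, tip2, longest distance, branch holding the root,
    distance of the root from the first node of that branch)."""
    tips = [n for n in ADJ if len(ADJ[n]) == 1]
    longest = max((path(a, b) for a, b in combinations(tips, 2)), key=path_length)
    half = path_length(longest) / 2
    walked = 0
    for a, b in zip(longest, longest[1:]):
        if walked + ADJ[a][b] >= half:
            return longest[0], longest[-1], path_length(longest), (a, b), half - walked
        walked += ADJ[a][b]


print(midpoint_root())  # ('A', 'D', 18, ('A', 'u'), 9.0)
```

The root lands 9 units from A on A's own branch, i.e. 1 unit from the internal node. If A's long branch reflects a faster rate of evolution rather than an older split, this root is misleading, which is the assumption the slide warns about.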

Page 25

Phylogenetic trees

How to root?

Using “outgroups”

(Figure: the unrooted tree of A, B, C and D with its branches numbered 1 to 5, rooted using an outgroup)

- the outgroup should be a taxon known to be less closely related to the rest of the taxa (the ingroup) than these taxa are to each other

- it should ideally be as closely related as possible to the rest of the taxa while still satisfying the above condition
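Rooting with an outgroup amounts to placing the root on the branch that leads to the outgroup, so that all ingroup taxa form one clade. A minimal sketch, assuming a hypothetical unrooted tree with ingroup A to D and an outgroup O (O and the internal node names x, y, z are illustrative, not taken from the slides):

```python
# Hypothetical unrooted tree: ingroup A-D plus outgroup O.
# x, y and z are hypothetical internal node names.
ADJ = {
    "A": ["x"], "B": ["x"], "C": ["y"], "D": ["y"], "O": ["z"],
    "x": ["A", "B", "z"], "y": ["C", "D", "z"], "z": ["x", "y", "O"],
}


def newick(node, parent):
    """Newick string for the subtree below `node`, coming from `parent`."""
    children = [n for n in ADJ[node] if n != parent]
    if not children:
        return node
    return "(" + ",".join(newick(c, node) for c in children) + ")"


def root_with_outgroup(outgroup):
    """Root on the outgroup's terminal branch: (outgroup, ingroup)."""
    (attach,) = ADJ[outgroup]          # a tip has exactly one neighbor
    return "(" + outgroup + "," + newick(attach, outgroup) + ");"


print(root_with_outgroup("O"))  # (O,((A,B),(C,D)));
```

The result only makes sense if O really is outside the ingroup. A poorly chosen outgroup simply moves the root to the wrong branch.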

Page 26

Phylogenetic trees

Exercise: rooted/unrooted; scaled/unscaled

(Figure: several trees of taxa A to E, one of them with an additional taxon F, to be classified as rooted or unrooted and as scaled or unscaled)

Page 27

Phylogenetics

What are useful characters?

Use homologies, not analogies!

- Homology: common ancestry of two or more character states

- Analogy: similarity of character states not due to shared ancestry

- Homoplasy: a collection of phenomena that leads to similarities in character states for reasons other than inheritance from a common ancestor (e.g. convergence, parallelism, reversal)

Homoplasy is a huge problem in morphological data sets!

But it occurs in molecular data sets, too!

(Figure: Cactaceae and Euphorbiaceae, whose succulent forms look alike through convergence)

Page 28

Phylogenetics

Molecular data and homoplasy

Example alignment (ruler: 260 * 280 * 300 * 320):

0841r : CCTTCAATTTTTATT-----------------------AGAGTTTTAGGAGAAATAAGTATGTG : 272
0992r : CCTCCAATTTTTATTAGCTTGCCTACTCCTTTGGGCACAGAGTTTTAGGAGAAATAAGTATGTG : 213
3803r : CCTCCAATTTTTATTAGCTTGCCTACTCCTTTGGGCACAGAGTTTTAGGAGAAATAAGTATGTG : 305
4062r : CCTCCAATTTTTATTAGCTTGCCTACTCCTTTGGGAACAGAGTTTTAGGAGAAATAAGTATGTG : 319
3802r : CCTCCAATTTTTATTAGTTTGCCTACTCCTTTGGGCACAGAGTTTTAGGAGAAATAAGTATGTG : 282
ph2f  : CCTCCAATTTTTATTAGCTTGCCTACTCCTTTGGGCACAGAGTTTTAGGAGAAATAAGTATGTG : 306
        CCTcCAATTTTTATTag ttgcctactcctttggg acAGAGTTTTAGGAGAAATAAGTATGTG

gene sequences represent character data

characters are positions in the sequence (not all workers agree; some say one gene is one character)

character states are the nucleotides in the sequence (or amino acids in the case of proteins)

Problems:

the probability that two nucleotides are the same just by chance is 25% (assuming all four bases are equally frequent)

what to do with insertions or deletions (which may themselves be characters)

homoplasy in sequences may cause alignment errors
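The 25% figure holds only when all four bases are equally frequent. More generally, if two sites are independent and the base frequencies are $\pi_A, \pi_C, \pi_G, \pi_T$, the probability of a chance match is $\sum_i \pi_i^2$. A small sketch; the AT-rich frequencies are made up for illustration:

```python
def chance_identity(freqs):
    """Probability that two independent sites carry the same base."""
    total = sum(freqs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("base frequencies must sum to 1")
    return sum(p * p for p in freqs.values())


equal = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
at_rich = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}   # hypothetical

print(chance_identity(equal))    # 0.25
print(chance_identity(at_rich))
```

With skewed composition the chance of a match rises (here to about 34%), so unrelated sequences look more similar than expected. This is one motivation for models with unequal base frequencies.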

Page 29

Phylogenetics

Molecular data and homoplasy: Orthologs vs. Paralogs

When comparing gene sequences, it is important to distinguish between identical and merely similar genes in different organisms

Orthologs are homologous genes in different species that diverged through speciation; they usually have similar functions

Paralogs are homologous genes that are the result of a gene duplication

A phylogeny that includes both orthologs and paralogs is likely to be incorrect

Sometimes phylogenetic analysis is the best way to determine if a new gene is an ortholog or paralog to other known genes

Page 30

Phylogenetics

What are useful characters?

Use derived condition, not ancestral

- Synapomorphy (shared derived character): homologous traits share the same character state because it originated in their immediate common ancestor

- Plesiomorphy (shared ancestral character): homologous traits share the same character state because they are inherited from a common distant ancestor

(Figure: a tree illustrating analogy, synapomorphy (shared derived character), plesiomorphy (shared ancestral character) and autapomorphy (unique derived character))

Page 31

Phenetics versus cladistics

Within the field of taxonomy there are two different methods and philosophies of building phylogenetic trees: cladistic and phenetic.

Phenetic methods construct trees (phenograms) by considering the current states of characters without regard to the evolutionary history that brought the species to their current phenotypes; phenograms are based on overall similarity.

Cladistic methods construct trees (cladograms) that rely on assumptions about ancestral relationships as well as on current data; cladograms are based on character evolution (e.g. shared derived characters).

Cladistics is becoming the method of choice; it is considered to be more powerful and to provide more realistic estimates, but it is slower than phenetic algorithms.

Page 32

Phenetics vs. cladistics

An example

Page 33

Phenetics vs. cladistics

Phenetic (overall similarity)

(Figure: phenogram of critters A, B and C based on overall similarity, with the leaves ordered C, B, A and similarity scores 3, 4 and 5)

Page 34

Phenetics vs. cladistics

Cladistics (character evolution; e.g. shared derived characters)

Critter     Limbs   Kidney        Covering           Thermoregulation   Reproduction   Cloaca   Identity
critter A   4       metanephric   hair               endothermy         viviparous     no       placental
critter B   4       metanephric   hair               endothermy         oviparous      yes      echidna
critter C   4       metanephric   feathers           endothermy         oviparous      yes      bird
ancestor    4       metanephric   no hair/feathers   ectothermy         oviparous      yes      turtle

(Figure: cladogram of A, B and C based on shared derived characters, with the leaves ordered A, B, C and shared-derived-character counts 1, 2 and 1)
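Both trees can be reproduced by counting. The sketch below encodes the character table (the character keys are abbreviations chosen here). For the phenetic score it counts all shared states. For the cladistic score it counts only shared states that differ from the ancestor (turtle), i.e. shared derived characters.

```python
from itertools import combinations

# Character states from the table; the ancestor (turtle) defines which
# states are ancestral (plesiomorphic).
CRITTERS = {
    "A": {"limbs": 4, "kidney": "metanephric", "covering": "hair",
          "thermo": "endothermy", "reproduction": "viviparous", "cloaca": False},
    "B": {"limbs": 4, "kidney": "metanephric", "covering": "hair",
          "thermo": "endothermy", "reproduction": "oviparous", "cloaca": True},
    "C": {"limbs": 4, "kidney": "metanephric", "covering": "feathers",
          "thermo": "endothermy", "reproduction": "oviparous", "cloaca": True},
}
ANCESTOR = {"limbs": 4, "kidney": "metanephric", "covering": "none",
            "thermo": "ectothermy", "reproduction": "oviparous", "cloaca": True}


def overall_similarity(x, y):
    """Phenetic score: number of characters in the same state."""
    return sum(CRITTERS[x][c] == CRITTERS[y][c] for c in ANCESTOR)


def shared_derived(x, y):
    """Cladistic score: shared states that differ from the ancestral state."""
    return sum(CRITTERS[x][c] == CRITTERS[y][c] != ANCESTOR[c] for c in ANCESTOR)


for x, y in combinations(CRITTERS, 2):
    print(x, y, overall_similarity(x, y), shared_derived(x, y))
```

Overall similarity pairs B with C (5 shared states, but laying eggs and having a cloaca are ancestral). Shared derived characters pair A with B (hair, in addition to endothermy, which all three share). This reproduces the 3/4/5 and 1/2/1 scores in the two figures.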

Page 35

Model of sequence evolution

The problem

- A basic process in the evolution of a sequence is change in that sequence over time

- We therefore need a mathematical model to describe that change

- Such a model is essential for understanding the mechanisms of change, and it is required to estimate both the rate of evolution and the evolutionary history of sequences

(Figure: the same sequence alignment as on page 28)

Page 36

Model of sequence evolution

Pyrimidine (C4N2H4) Purine (C5N4H4)

Nucleotide = base + sugar + phosphate

(Figure: structures of the purines adenine and guanine, the pyrimidines cytosine and thymine, and the sugar-phosphate backbone with its 5' and 3' ends)

Page 37

Models of sequence evolution

Examples

Jukes-Cantor model (1969)

All substitutions have an equal probability and base frequencies are equal

(Figure: substitution diagram between A, C, G and T)
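Under the Jukes-Cantor model, the observed proportion p of differing sites underestimates the number of substitutions, because repeated changes at the same site are invisible. The standard correction is $d = -\tfrac{3}{4}\ln\left(1 - \tfrac{4}{3}p\right)$. A minimal sketch; the function names are illustrative, and simply skipping gap columns is an assumption made here:

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences,
    skipping any column that contains a gap ('-')."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """JC69 distance (expected substitutions per site) from a p-distance."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: JC distance undefined (saturation)")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
print(p, round(jukes_cantor(p), 4))  # 0.2 0.2326
```

The corrected distance is always larger than p. It diverges as p approaches 0.75, the level of difference expected between random sequences when each base has the 25% chance of matching noted earlier.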

Page 38

Models of sequence evolution

Examples

Felsenstein model (F81, 1981)

All substitutions have an equal probability, but base frequencies are unequal

(Figure: substitution diagram between A, C, G and T)

Page 39

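Models of sequence evolution

Examples

Kimura 2-parameter model (K2P) (1980)

Transitions (A ↔ G, C ↔ T) and transversions (purine ↔ pyrimidine) have different probabilities; base frequencies are equal

(Figure: substitution diagram with the purines A and G and the pyrimidines C and T)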
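The Kimura 2-parameter distance separates the proportion of transitions P (A ↔ G, C ↔ T) from the proportion of transversions Q: $d = -\tfrac{1}{2}\ln(1 - 2P - Q) - \tfrac{1}{4}\ln(1 - 2Q)$. A sketch under simplifying assumptions (gap and ambiguous columns skipped; function name illustrative):

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Columns with gaps or ambiguous bases are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    P, Q = transitions / sites, transversions / sites
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


print(round(k2p_distance("ACGTACGTAC", "GCGTACTTAC"), 4))  # 0.2341
```

In the example there is one transition (A→G) and one transversion (G→T) among ten sites, so P = Q = 0.1.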

Page 40

Models of sequence evolution

Examples

Hasegawa, Kishino & Yano (HKY) (1985)

Transitions and transversions have different probabilities, and base frequencies are unequal

(Figure: substitution diagram with the purines A and G and the pyrimidines C and T)

Page 41

Models of sequence evolution

Examples

General time reversible model (GTR)

Each type of substitution has its own probability, and base frequencies are unequal

(Figure: substitution diagram between A, C, G and T)
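Under GTR, the instantaneous rate of changing base i into base j can be written $q_{ij} = r_{ij}\pi_j$. Here the $r_{ij}$ are six symmetric exchangeabilities and the $\pi_j$ are base frequencies; the diagonal is set so that each row sums to zero. The simpler models are special cases: all $r_{ij}$ equal gives F81, which becomes JC when the frequencies are also equal. One transition rate plus one transversion rate gives HKY, which becomes K2P with equal frequencies. A pure-Python sketch; the parameter values are made up, and scaling to one expected substitution per unit time is a common convention assumed here.

```python
BASES = "ACGT"


def gtr_rate_matrix(rates, freqs):
    """Normalized GTR rate matrix Q as a nested dict, Q[i][j].

    rates: exchangeability r_ij for each of the six unordered base pairs,
           keyed by a 2-tuple in either order, e.g. {("A", "G"): 4.0, ...}
    freqs: equilibrium base frequencies pi, summing to 1.
    """
    def r(i, j):
        return rates[(i, j)] if (i, j) in rates else rates[(j, i)]

    Q = {i: {} for i in BASES}
    for i in BASES:
        for j in BASES:
            if i != j:
                Q[i][j] = r(i, j) * freqs[j]      # off-diagonal: r_ij * pi_j
        Q[i][i] = -sum(Q[i][j] for j in BASES if j != i)  # rows sum to 0
    # Rescale so that the mean rate -sum_i pi_i * Q_ii equals 1.
    mean_rate = -sum(freqs[i] * Q[i][i] for i in BASES)
    return {i: {j: Q[i][j] / mean_rate for j in BASES} for i in BASES}


# Hypothetical parameters: transitions (A<->G, C<->T) four times as fast as
# transversions, with unequal base frequencies (an HKY-like special case).
rates = {("A", "C"): 1.0, ("A", "G"): 4.0, ("A", "T"): 1.0,
         ("C", "G"): 1.0, ("C", "T"): 4.0, ("G", "T"): 1.0}
freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
Q = gtr_rate_matrix(rates, freqs)

for i in BASES:
    print(i, " ".join(f"{Q[i][j]:7.3f}" for j in BASES))
```

"Time reversible" refers to the property $\pi_i q_{ij} = \pi_j q_{ji}$, which holds by construction because $r_{ij} = r_{ji}$.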

Page 42

Models of sequence evolution

(Figure: substitution diagrams for the Jukes-Cantor, Felsenstein, K2P, HKY and GTR models, side by side)

Page 43

More models of sequence evolution …

Currently, more than 60 models have been described

- each can be extended with gamma-distributed rate variation among sites and a proportion of invariable sites

- accuracy of models rapidly decreases for highly divergent sequences

- problem: more complicated models have more parameters to estimate, so they tend to give less precise estimates (and are slower)

How to pick an appropriate model?

- use a likelihood ratio test

- implemented in Modeltest 3.06 (Posada & Crandall, 1998)

Page 44

More models of sequence evolution …

Example of Modeltest output

JC = 3158.0095
F81 = 3121.2188
K80 = 2994.6611
HKY = 2924.4182
TrNef = 2994.5491
TrN = 2923.6340
K81 = 2987.6548
K81uf = 2923.5620
TIMef = 2987.6196
TIM = 2922.9878
TVMef = 2983.3450
TVM = 2922.1970
SYM = 2983.3069
GTR = 2921.1187

Likelihood ratio test: equal base frequencies

Null model = JC, -lnL0 = 3369.2803
Alternative model = F81, -lnL1 = 3342.5513
2(lnL1 - lnL0) = 53.4580, df = 3
P-value < 0.000001

Model selected: TVM+G, -lnL = 2911.3660
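The likelihood ratio statistic in the example can be checked by hand: 2(lnL1 - lnL0) = 2(3369.2803 - 3342.5513) = 53.458. The degrees of freedom are 3 because F81 estimates three more free base-frequency parameters than JC. The sketch below recomputes the statistic and its chi-square tail probability using only the standard library. The closed-form chi-square survival function for integer degrees of freedom is written out here rather than taken from a statistics package; the function names are illustrative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with
    integer degrees of freedom df (closed form, standard library only)."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: e^(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: erfc(sqrt(x/2)) + e^(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k-1/2) / Gamma(k+1/2)
    tail = sum(half ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail


def likelihood_ratio_test(neg_lnl_null, neg_lnl_alt, df):
    """LRT statistic 2(lnL1 - lnL0) from -lnL values, and its p-value."""
    stat = 2 * (neg_lnl_null - neg_lnl_alt)
    return stat, chi2_sf(stat, df)


stat, p = likelihood_ratio_test(3369.2803, 3342.5513, df=3)
print(f"2(lnL1 - lnL0) = {stat:.4f}, P = {p:.2e}")
```

The p-value comes out many orders of magnitude below 0.000001, so the equal-frequency model (JC) is rejected in favor of F81, as in the Modeltest output. Modeltest repeats such pairwise tests along a hierarchy of models before reporting the selected one.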

Page 45

Did the Florida dentist infect his patients with HIV?

(Figure: phylogenetic tree of HIV sequences from the dentist, patients A to G and local controls 2, 3, 9 and 35, with clades annotated "Yes" or "No"; from Ou et al. (1992) and Page & Holmes (1998))

