Algorithms for phylogeny construction

A Hybrid Micro-Macroevolutionary Approach to Gene Tree Reconstruction

ICE-TCS Inaugural Symposium

Bjarni V. Halldórsson

April 30, 2005


Character based phylogeny

[Figure: character-based decision tree. The root asks "Has Intelligence?" and a second node asks "Has Body Hair?"; each question has a "no" and a "yes" branch.]

Genes, genomes

• Gene - a sequence having functional importance. Examples: AACG, CACC, TACT

• Genome - a sequence containing genes as subsequences

TATAACGTTTCTACTCTATTACTCC
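As a small illustration of genes as subsequences, the sketch below locates each example gene in the example genome. The function name `find_gene` is hypothetical, and "subsequence" is read here as a contiguous substring, which is an assumption.

```python
def find_gene(genome: str, gene: str) -> list[int]:
    """Return every 0-based start position at which gene occurs in genome."""
    positions = []
    start = genome.find(gene)
    while start != -1:
        positions.append(start)
        # Search again one position further so overlapping hits are found too.
        start = genome.find(gene, start + 1)
    return positions


genome = "TATAACGTTTCTACTCTATTACTCC"
for gene in ("AACG", "CACC", "TACT"):
    print(gene, find_gene(genome, gene))
```

Under this reading, AACG and TACT occur in the slide's genome (TACT twice), while CACC does not.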


Evolution - changes in the genome

Original:

TATAACGTTTCTACTCTATTACTCC

Mutation:

TATAACGTTTCTAATCTCTTACTCC

Duplication:

TATAACGTTTCTACTCTATTACTCCTCTACTCT

Loss:

TA-TCTACTCTATTACTCC
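The three event types can be checked directly against the sequences above. In this sketch the strings are copied from the slide, the helper name `hamming` is hypothetical, and the "-" in the last sequence is assumed to mark the position of the deleted segment.

```python
original = "TATAACGTTTCTACTCTATTACTCC"
mutated = "TATAACGTTTCTAATCTCTTACTCC"
duplicated = "TATAACGTTTCTACTCTATTACTCCTCTACTCT"
with_loss = "TA-TCTACTCTATTACTCC"  # "-" marks where a segment was deleted


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Mutation: same length, a few substituted bases.
print("substitutions:", hamming(original, mutated))

# Duplication: the original followed by a copy of one of its own segments.
copy = duplicated[len(original):]
print("duplicated segment:", copy, "first found at", original.find(copy))

# Loss: removing one segment from the original yields the shortened sequence.
prefix, suffix = with_loss.split("-")
lost = original[len(prefix):len(original) - len(suffix)]
print("lost segment:", lost)
```

In this example the lost segment contains the gene AACG, so the loss removes a gene and not just intergenic sequence.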


Phylogenies

A species phylogeny shows the evolutionary history of a set of species.

[Figure: species tree with leaves mouse, human, and monkey.]

A gene phylogeny shows the evolutionary history of a single gene.

[Figure: gene tree for a gene P, with internal nodes P, P1, and P2 and leaves Pmouse, P1human, P1monkey, P2human, and P2monkey.]

Why are gene phylogenies interesting?

• The same gene in different species is likely to play the same role.

• We want to determine the function of a gene in humans.

• Experiments in mouse, yeast, or flies are less controversial and take less time than experiments in humans.

Phylogeny construction considering mutations

A very large number of algorithms exist for this problem.

• Character based algorithms (as mentioned before).

• Distance based algorithms: distances between the sequences are computed (such as the number of mutations that occurred between the sequences).

• If the phylogeny has the ultrametric property, an efficient algorithm can be employed.
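A minimal sketch of the distance-based view follows. It assumes the distance is the number of differing positions between aligned sequences of equal length, and it tests the ultrametric property through the three-point condition (in every triple, the two largest pairwise distances are equal). The sequences and all function names are invented for illustration and do not come from the slides.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], int]:
    """Pairwise distances, stored symmetrically under both key orders."""
    dist = {}
    for a, b in combinations(seqs, 2):
        dist[a, b] = dist[b, a] = hamming(seqs[a], seqs[b])
    return dist


def is_ultrametric(names, dist) -> bool:
    """Three-point condition: in every triple the two largest distances tie."""
    for a, b, c in combinations(names, 3):
        d = sorted((dist[a, b], dist[a, c], dist[b, c]))
        if d[1] != d[2]:
            return False
    return True


# Hypothetical aligned sequences, invented for illustration only.
seqs = {"human": "AACGTT", "monkey": "AACGTA", "mouse": "TACGCC"}
dist = distance_matrix(seqs)
print(dist["human", "monkey"], dist["human", "mouse"], dist["monkey", "mouse"])
print(is_ultrametric(list(seqs), dist))
```

Here human and monkey are closest to each other and equally far from mouse, so the three-point condition holds.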


Macroevolutionary phylogeny

Input: A rooted species tree $T_S$ with $s$ leaves; a list of multiplicities $m_1, \dots, m_s$, where $m_l$ is the number of gene family members found in species $l$; weights $c_\lambda$ and $c_\delta$.

Output: A rooted gene tree $T_G$ with $\sum_{l=1}^{s} m_l$ leaves such that the D/L score of $T_G$ is minimal.

[Figure: example species tree with seven leaves labeled by multiplicity and species: 2A, 1B, 2C, 1D, 2E, 1F, 2G.]

Phylogenies considering only cost of loss

• If the cost of losing a gene is much higher than the cost of duplication, we will construct a phylogeny that minimizes the number of lost genes.

• All duplications will then take place after the speciations take place.

4 Duplications

[Figure: the example species tree (leaves 2A, 1B, 2C, 1D, 2E, 1F, 2G) and a gene tree without losses, containing four duplication nodes (Dupl) that produce the leaf pairs A A, C C, E E, and G G.]

Phylogenies considering only cost of duplication

• If the cost of a duplication is much higher than the cost of a loss, we will construct a phylogeny that minimizes the number of duplications.

• All duplications can then be assumed to occur before any speciation occurs.

1 Duplication, 3 Losses

[Figure: the example species tree (leaves 2A, 1B, 2C, 1D, 2E, 1F, 2G) and a gene tree with one duplication and three lineages marked Lost; the remaining leaves are A, A, B, C, C, D, E, E, F, G, G.]

Phylogenies considering duplication and loss

Reconstruct[T_S, {m_1 ... m_s}]
    Ascend[root(T_S)];
    Descend[root(T_S), 1];
    Construct[root(T_S)];


Ascend[v]
    if v is not a leaf: Ascend[l(v)]; Ascend[r(v)];
    if v is a leaf:
        ∀i s.t. 1 ≤ i ≤ m
            costmin_v[i] ← c_δ · max(m_v − i, 0) + c_λ · max(i − m_v, 0);
    if v is not a leaf:
        ∀i, j s.t. 1 ≤ i, j ≤ m
            cost_v[i, j] ← c_δ · max(j − i, 0) + c_λ · max(i − j, 0) + costmin_l(v)[j] + costmin_r(v)[j];
        ∀i costmin_v[i] ← min_j { cost_v[i, j] };
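The bottom-up pass above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Species` class, the parameter names `c_dup` (for c_δ) and `c_loss` (for c_λ), and the three-species example tree are all hypothetical. Index `i` is the number of gene copies entering a node and `j` the number passed to each child, as in the pseudocode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Species:
    """Node of the species tree; leaves carry a gene multiplicity m_v."""
    name: str
    mult: int = 0                      # used only at leaves
    left: Optional["Species"] = None
    right: Optional["Species"] = None


def ascend(v: Species, m: int, c_dup: float, c_loss: float) -> list:
    """Return costmin_v as a list indexed by i (index 0 is unused padding)."""
    inf = float("inf")
    costmin = [inf] * (m + 1)
    if v.left is None:                 # leaf: i copies enter, mult are observed
        for i in range(1, m + 1):
            costmin[i] = c_dup * max(v.mult - i, 0) + c_loss * max(i - v.mult, 0)
        return costmin
    below_l = ascend(v.left, m, c_dup, c_loss)
    below_r = ascend(v.right, m, c_dup, c_loss)
    for i in range(1, m + 1):          # i copies enter v ...
        for j in range(1, m + 1):      # ... and j copies are passed to each child
            cost = (c_dup * max(j - i, 0) + c_loss * max(i - j, 0)
                    + below_l[j] + below_r[j])
            costmin[i] = min(costmin[i], cost)
    return costmin


# Hypothetical three-species tree ((A:2, B:1), C:2), not the tree on the slides.
tree = Species("root",
               left=Species("AB", left=Species("A", 2), right=Species("B", 1)),
               right=Species("C", 2))
m = 2  # largest multiplicity among the leaves
print(ascend(tree, m, c_dup=1, c_loss=1)[1])  # one gene copy enters the root
```

With unit weights the minimum for this small tree is 2, reached either by duplicating separately in A and in C or by one duplication at the root followed by a loss in B.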


Descend

Descend[v, i]
    if v is a leaf:
        v.losses ← max(i − m_v, 0); v.dups ← max(m_v − i, 0);
        v.out ← 0;
    else
        repeat { v.out++ } until ( cost_v[i, v.out] == costmin_v[i] );
        Descend[l(v), v.out]; Descend[r(v), v.out];
        v.losses ← max(i − v.out, 0); v.dups ← max(v.out − i, 0);
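The top-down pass can be sketched as a traceback over the cost tables. The sketch below rebuilds those tables first so that it runs on its own. The `Species` class, the names `c_dup` and `c_loss`, the `walk` helper, and the example tree are hypothetical, and the search for `v.out` is assumed to start from the smallest value of j, which the slide does not state for internal nodes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Species:
    name: str
    mult: int = 0                       # gene multiplicity, used at leaves
    left: Optional["Species"] = None
    right: Optional["Species"] = None
    cost: list = field(default_factory=list)     # cost[i][j], internal nodes
    costmin: list = field(default_factory=list)  # costmin[i]
    out: int = 0
    dups: int = 0
    losses: int = 0


def ascend(v, m, c_dup, c_loss):
    """Fill v.cost and v.costmin bottom-up (indices 1..m; index 0 unused)."""
    if v.left is None:
        v.costmin = [None] + [
            c_dup * max(v.mult - i, 0) + c_loss * max(i - v.mult, 0)
            for i in range(1, m + 1)]
        return
    ascend(v.left, m, c_dup, c_loss)
    ascend(v.right, m, c_dup, c_loss)
    v.cost = [[None] * (m + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, m + 1):
            v.cost[i][j] = (c_dup * max(j - i, 0) + c_loss * max(i - j, 0)
                            + v.left.costmin[j] + v.right.costmin[j])
    v.costmin = [None] + [min(v.cost[i][1:]) for i in range(1, m + 1)]


def descend(v, i):
    """Top-down pass: i copies enter v; record the events chosen at v."""
    if v.left is None:
        v.losses, v.dups, v.out = max(i - v.mult, 0), max(v.mult - i, 0), 0
        return
    v.out = 1                           # assumed start: smallest candidate j
    while v.cost[i][v.out] != v.costmin[i]:
        v.out += 1
    descend(v.left, v.out)
    descend(v.right, v.out)
    v.losses, v.dups = max(i - v.out, 0), max(v.out - i, 0)


def walk(v):
    """Yield the nodes of the species tree in preorder."""
    yield v
    if v.left is not None:
        yield from walk(v.left)
        yield from walk(v.right)


# Hypothetical tree ((A:2, B:1), C:2) with duplications three times as costly.
tree = Species("root",
               left=Species("AB", left=Species("A", 2), right=Species("B", 1)),
               right=Species("C", 2))
ascend(tree, m=2, c_dup=3, c_loss=1)
descend(tree, 1)
for v in walk(tree):
    print(v.name, "dups:", v.dups, "losses:", v.losses)
```

For these weights the traceback places a single duplication at the root and a single loss in B, for a total cost of 4.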


Construct

Construct[s]
    g ← new gene node; g.species ← s
    if (s.currDup < s.dups)
        s.currDup++; l(g) ← Construct[s]; r(g) ← Construct[s];
    else if (s.currLoss < s.losses)
        s.currLoss++;
    else if (s.currSpec < s.out)
        s.currSpec++;
        if s is not a leaf: l(g) ← Construct[l(s)]; r(g) ← Construct[r(s)];
    return g;
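A minimal Python sketch of the construction step is given below. It assumes the species tree has already been annotated with `dups`, `losses`, and `out`; here the annotations are written by hand for a hypothetical tree ((A, B), C) with one duplication at the root and one loss in B. The `kind` field on gene nodes and the `show` printer are additions for readability and are not part of the pseudocode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Species:
    name: str
    dups: int = 0
    losses: int = 0
    out: int = 0
    left: Optional["Species"] = None
    right: Optional["Species"] = None
    curr_dup: int = 0
    curr_loss: int = 0
    curr_spec: int = 0


@dataclass
class Gene:
    species: str
    kind: str = "leaf"                  # "dup", "lost", "spec" or "leaf"
    left: Optional["Gene"] = None
    right: Optional["Gene"] = None


def construct(s: Species) -> Gene:
    g = Gene(s.name)
    if s.curr_dup < s.dups:             # still duplications to place at s
        s.curr_dup += 1
        g.kind = "dup"
        g.left, g.right = construct(s), construct(s)
    elif s.curr_loss < s.losses:        # this copy is lost in s
        s.curr_loss += 1
        g.kind = "lost"
    elif s.curr_spec < s.out:           # this copy survives and speciates
        s.curr_spec += 1
        if s.left is not None:
            g.kind = "spec"
            g.left, g.right = construct(s.left), construct(s.right)
    return g


def show(g: Gene) -> str:
    """Parenthesized rendering; duplication nodes carry the label Dupl."""
    if g.left is None:
        return "Lost" if g.kind == "lost" else g.species
    label = "Dupl" if g.kind == "dup" else ""
    return f"({show(g.left)},{show(g.right)}){label}"


# Hand-written annotations for a hypothetical tree, not the one on the slides.
root = Species("root", dups=1, out=2,
               left=Species("AB", out=2,
                            left=Species("A"), right=Species("B", losses=1)),
               right=Species("C"))
gene_tree = construct(root)
print(show(gene_tree))
```

The resulting gene tree has two copies of A and C and one surviving copy of B, with the other B lineage marked as lost. Note that `construct` consumes the counters on the species nodes, so a fresh annotated tree is needed for each run.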


2 Duplications, 1 loss

[Figure: the example species tree (leaves 2A, 1B, 2C, 1D, 2E, 1F, 2G) and a gene tree with two duplication nodes (Dupl) and one lineage marked Lost.]
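The three example histories for this species tree can be compared under different weights. The sketch assumes the D/L score is the weighted sum c_δ · (number of duplications) + c_λ · (number of losses), which matches the cost terms in Ascend but is not spelled out on the slides. The event counts are taken from the slide titles; the weight settings and the name `dl_score` are hypothetical.

```python
def dl_score(dups: int, losses: int, c_dup: float, c_loss: float) -> float:
    """Weighted duplication/loss score of one history (assumed form)."""
    return c_dup * dups + c_loss * losses


# (duplications, losses) as given in the slide titles for the example tree.
histories = {
    "4 duplications": (4, 0),
    "1 duplication, 3 losses": (1, 3),
    "2 duplications, 1 loss": (2, 1),
}

# Hypothetical weight settings (c_dup, c_loss), chosen only for illustration.
for c_dup, c_loss in [(1, 10), (10, 1), (1, 1)]:
    best = min(histories, key=lambda h: dl_score(*histories[h], c_dup, c_loss))
    print(f"c_dup={c_dup}, c_loss={c_loss}: best is {best!r}")
```

Among these three histories, expensive losses favor the duplication-only history, expensive duplications favor the single early duplication, and equal weights favor the mixed history.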


Time Complexity

An optimal history can be found in $O(nm^2)$ time, where $n$ is the number of nodes in the species tree and $m$ is the maximum number of genes drawn from any species.

In Ascend, the leaves of the species tree can be annotated with multiplicities in $O(nm)$ time. The cost vector at each node has length $m + 1$ and each entry can be computed in $O(m)$ time, for a total of $O(nm^2)$.

Descend requires $O(m)$ time at each node, for a total of $O(nm)$. Construct inserts duplication and loss nodes into the new tree; these number no more than $m$ per node of $T_S$, for a total of $O(nm)$.


Extensions

• Combining duplication and loss cost with cost of mutations.

– Some edges of a phylogenetic tree are well supported by micro-evolutionary phylogeny construction algorithms.

– Edges that are not as well supported can be rearranged to minimize duplication and loss.

• Consider and display all possible optimal histories.

Acknowledgements

• R. Ravi, Carnegie Mellon University

• Dannie Durand, Carnegie Mellon University

D. Durand, B. V. Halldórsson, B. Vernot. A Hybrid Micro-Macroevolutionary Approach to Gene Tree Reconstruction. Proceedings of the Ninth Annual International Conference on Computational Molecular Biology (RECOMB), 2005. To appear.
