Constructing evolutionary trees from rooted triples
Bang Ye Wu
Dept. of Computer Science and Information Engineering
Shu-Te University
An evolutionary tree
– A rooted tree.
– Each leaf represents one species.
– Internal nodes are unlabelled (inferred common ancestors).
[Figure: an example evolutionary tree with leaves a, b, c, d, e, f]
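A (rooted) triple (triplet)
– An evolutionary tree of 3 species.
– A constraint in an evolutionary tree construction problem.
(c(ab)): $\mathrm{lca}(b,c) = \mathrm{lca}(c,a) \succ \mathrm{lca}(a,b)$
– lca: lowest common ancestor; $\succ$: "is a proper ancestor of".
– a and b are closer to each other than either is to c.
[Figure: the triple (c(ab)) drawn as a tree with leaves a, b, c]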
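To make the definition concrete, here is a minimal Python sketch. It assumes an encoding of our own (not from the paper): a tree is a leaf label or a tuple of subtrees, and the tuple `(x, y, z)` stands for the triple (x(yz)). It relies on an equivalent form of the lca condition: (x(yz)) holds exactly when some subtree contains y and z but not x.

```python
from typing import FrozenSet, List, Tuple, Union

Tree = Union[str, tuple]       # assumed encoding: a leaf label or a tuple of subtrees
Triple = Tuple[str, str, str]  # (x, y, z) stands for the triple (x(yz))

def clusters(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets of all subtrees; the last entry is the leaf set of `tree` itself."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    result: List[FrozenSet[str]] = []
    own = set()
    for child in tree:
        child_clusters = clusters(child)
        result.extend(child_clusters)
        own |= child_clusters[-1]  # the child's own leaf set is its last entry
    result.append(frozenset(own))
    return result

def satisfies(tree: Tree, triple: Triple) -> bool:
    """(x(yz)) holds iff lca(y,z) is a proper descendant of lca(x,y),
    i.e. iff some subtree contains y and z but not x."""
    x, y, z = triple
    return any(y in c and z in c and x not in c for c in clusters(tree))
```

For example, `satisfies((("a", "b"), "c"), ("c", "a", "b"))` is `True`, while the same tree does not satisfy `("a", "b", "c")`, i.e. (a(bc)).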
A tree compatible with triples
Given a set of triples, construct a tree satisfying all the triples.
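Deciding whether such a tree exists, and constructing one when it does, takes polynomial time [Aho et al, 1981].
[Figure: the tree ((ad)(bc)) together with the triples (a(bc)), (c(ad)) and (b(ad)), all of which it satisfies]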
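The polynomial-time algorithm of Aho et al. is commonly known as BUILD. Below is our own minimal Python rendering of its core recursion (the names `build` and `split_components` are illustrative, and the tuple encodings are assumptions): connect y and z for every triple (x(yz)) inside the current species set; if the graph is connected there is no valid root split, otherwise recurse into each component.

```python
from typing import List, Optional, Set, Tuple, Union

Triple = Tuple[str, str, str]  # (x, y, z) encodes the triple (x(yz))
Tree = Union[str, tuple]       # a leaf label or a tuple of subtrees

def split_components(species: Set[str], triples: List[Triple]) -> List[Set[str]]:
    """Components of the graph on `species` with an edge y-z for every triple (x(yz)) inside it."""
    parent = {s: s for s in species}

    def find(s: str) -> str:
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for x, y, z in triples:
        if x in species and y in species and z in species:
            parent[find(y)] = find(z)
    groups = {}
    for s in species:
        groups.setdefault(find(s), set()).add(s)
    return list(groups.values())

def build(species: Set[str], triples: List[Triple]) -> Optional[Tree]:
    """Return a tree satisfying every triple over `species`, or None if none exists."""
    if len(species) == 1:
        return next(iter(species))
    parts = split_components(species, triples)
    if len(parts) == 1:
        return None  # each pair y, z must lie below one child of the root: no split works
    subtrees = []
    for part in parts:
        sub = build(part, triples)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)
```

A `None` result means the triples are incompatible. For a compatible set the returned tree is one valid answer; its child order depends on set iteration order.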
Incompatible (conflicting) triples
[Figure: the triples (a(bc)) and (b(ac))]
Two conflicting triples
[Figure: the triples (a(bc)), (b(cd)) and (d(ab))]
Three conflicting triples (pairwise compatible)
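A pairwise test is easy to write but, as the three-triple case shows, not sufficient. The sketch below relies on a standard fact that we state as an assumption rather than a claim of the paper: two distinct triples conflict exactly when they involve the same three species but pick different outgroups. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (x, y, z) encodes (x(yz)); x is the outgroup

def conflict(t1: Triple, t2: Triple) -> bool:
    """Same three species, different outgroup: no tree can satisfy both."""
    return set(t1) == set(t2) and t1[0] != t2[0]

def pairwise_compatible(triples: List[Triple]) -> bool:
    """True if no two triples conflict (this does NOT imply the whole set is compatible)."""
    return not any(conflict(s, t) for s, t in combinations(triples, 2))
```

For instance, (a(bc)), (b(cd)) and (d(ab)) pass this pairwise check, yet together they link a, b, c, d into one component, so no single tree satisfies all three.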
Two optimization problems
The maximum consensus tree (MCTT):
– the tree satisfying the maximum number of triples;
– NP-hard [Jansson, 2001][Wu, to appear];
– a new heuristic algorithm [this paper].
The maximum compatible set (MCST):
– the compatible species subset of maximum cardinality;
– NP-hard [this paper].
Previous heuristic: Best-One-Split-First (BOSF)
If a species x is split from a set V, all triples (x(v1v2)) with v1 and v2 in V are satisfied.
Repeatedly split one species from the set, choosing the split species greedily.
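triples: (a(bc)), (c(ad)), (b(ad)), (c(bd))
– c is split from {a,b,c,d}: c is chosen because the two triples (c(ad)) and (c(bd)) are satisfied.
– b is split from {a,b,d}, satisfying (b(ad)); {a,d} remains.
[Figure: the resulting tree, with c split first, then b, then the pair a, d]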
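The greedy loop can be sketched in a few lines of Python. We assume, as the example suggests, that the greedy criterion is the number of triples (x(v1v2)) satisfied by splitting x off the current set, and we break ties by input order; both choices, and the names `bosf` and `split_gain`, are ours.

```python
from typing import List, Tuple, Union

Triple = Tuple[str, str, str]  # (x, y, z) encodes (x(yz))
Tree = Union[str, tuple]

def split_gain(x: str, remaining: List[str], triples: List[Triple]) -> int:
    """Triples (x(v1 v2)) with v1, v2 in the rest of the set: all satisfied by splitting x off."""
    rest = set(remaining) - {x}
    return sum(1 for t in triples if t[0] == x and t[1] in rest and t[2] in rest)

def bosf(species: List[str], triples: List[Triple]) -> Tree:
    remaining = list(species)
    order = []  # species in the order they are split off
    while len(remaining) > 2:
        # max() returns the first maximal element, so ties go to the earlier species
        best = max(remaining, key=lambda x: split_gain(x, remaining, triples))
        order.append(best)
        remaining.remove(best)
    tree: Tree = tuple(remaining) if len(remaining) == 2 else remaining[0]
    for x in reversed(order):  # the first species split off hangs directly below the root
        tree = (x, tree)
    return tree
```

On the example triples this splits c, then b, and returns `("c", ("b", ("a", "d")))`.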
Previous heuristic: Min-Cut-Split-First (MCSF)
Construct an auxiliary graph:
– Vertices: species.
– Each edge is labeled by a set: for each triple (x(yz)), x is in the label set of edge (y,z).
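[Figure: the auxiliary graph on a, b, c, d: edge (b,c) labeled {a}, edge (a,d) labeled {b,c}, edge (b,d) labeled {c}]
triples: (a(bc)), (c(ad)), (b(ad)), (c(bd))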
– A bipartition corresponds to a split in the tree.
– The labels on the edges crossing the cut correspond to the triples that conflict with the split.
Repeatedly find the bipartition with the minimum cut.
[Figure: the cut {a,d} | {b,c} in the auxiliary graph and the resulting tree ((ad)(bc))]
{a,d} | {b,c} is a min-cut; the triple (c(bd)) is conflicting.
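The cut weight is the total size of the labels on crossing edges, i.e. the number of triples (x(yz)) in the current set whose y and z are separated. In the Python sketch below, an exhaustive search over bipartitions stands in for a real minimum-cut routine (only practical for small sets), and counting only triples whose outgroup lies in the current set is our assumption; the function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple, Union

Triple = Tuple[str, str, str]  # (x, y, z) encodes (x(yz))
Tree = Union[str, tuple]

def cut_weight(side1: FrozenSet[str], side2: FrozenSet[str], triples: List[Triple]) -> int:
    """Sum of label sizes on crossing edges = number of triples the split violates."""
    V = side1 | side2
    return sum(1 for x, y, z in triples
               if x in V and ((y in side1 and z in side2) or (y in side2 and z in side1)))

def min_cut_split(V: Set[str], triples: List[Triple]):
    """Exhaustive stand-in for a min-cut algorithm: returns (weight, side1, side2)."""
    items = sorted(V)
    first, others = items[0], items[1:]  # fixing `first` in side1 visits each bipartition once
    best = None
    for k in range(len(others)):
        for combo in combinations(others, k):
            side1 = frozenset((first,) + combo)
            side2 = frozenset(V) - side1
            w = cut_weight(side1, side2, triples)
            if best is None or w < best[0]:
                best = (w, side1, side2)
    return best

def mcsf(V: Set[str], triples: List[Triple]) -> Tree:
    """Split along a minimum cut, then recurse on both sides."""
    V = frozenset(V)
    if len(V) == 1:
        return next(iter(V))
    _, s1, s2 = min_cut_split(V, triples)
    return (mcsf(s1, triples), mcsf(s2, triples))
```

On the example triples the first cut is {a,d} | {b,c} with weight 1 (only (c(bd)) is violated), and the final tree is `(("a", "d"), ("b", "c"))`.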
Previous heuristic: Best-Pair-Merge-First (BPMF)
Instead of top-down splitting, BPMF uses a bottom-up merging strategy.
Starting from singleton sets, we repeatedly merge two sets, one step at a time.
Scoring functions are used to evaluate which pair should be merged at each step.
triples: (a(bc)),(c(ad)),(b(ad)),(c(bd))
{a} {b} {c} {d}
{a,d} {b} {c}
{a,d} {b,c}
{a,d,b,c}
[Figure: the partial trees after each merge, ending with ((ad)(bc))]
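The slide does not give BPMF's scoring functions, so the Python sketch below uses a HYPOTHETICAL score of our own: the number of triples that become satisfied when two sets are merged, minus the number that become violated. It only illustrates the bottom-up merging loop and is not the published scoring.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple, Union

Triple = Tuple[str, str, str]  # (x, y, z) encodes (x(yz))
Tree = Union[str, tuple]

def merge_score(A: FrozenSet[str], B: FrozenSet[str], triples: List[Triple]) -> int:
    """HYPOTHETICAL score: newly satisfied minus newly violated triples
    when A and B become the two children of a new node."""
    M = A | B
    score = 0
    for x, y, z in triples:
        if (y in A and z in B) or (y in B and z in A):
            # y and z first meet at the new node: satisfied iff x is not below it
            score += 1 if x not in M else -1
        else:
            for u, v in ((y, z), (z, y)):
                # x meets u at the new node while v is still outside: violated
                if ((x in A and u in B) or (x in B and u in A)) and v not in M:
                    score -= 1
    return score

def bpmf(species: List[str], triples: List[Triple]) -> Tree:
    parts = [(frozenset([s]), s) for s in species]  # (leaf set, subtree)
    while len(parts) > 1:
        # highest-scoring pair; max() keeps the earliest pair on ties
        i, j = max(combinations(range(len(parts)), 2),
                   key=lambda ij: merge_score(parts[ij[0]][0], parts[ij[1]][0], triples))
        (A, ta), (B, tb) = parts[i], parts[j]
        parts = [p for k, p in enumerate(parts) if k not in (i, j)] + [(A | B, (ta, tb))]
    return parts[0][1]

def count_satisfied(tree: Tree, triples: List[Triple]) -> int:
    """Triples (x(yz)) for which some subtree contains y and z but not x."""
    def leaves(t):
        return frozenset([t]) if isinstance(t, str) else frozenset().union(*map(leaves, t))

    def clusters(t):
        if isinstance(t, str):
            return [leaves(t)]
        return [c for child in t for c in clusters(child)] + [leaves(t)]

    cl = clusters(tree)
    return sum(any(y in c and z in c and x not in c for c in cl) for x, y, z in triples)
```

With this score the example happens to follow the merge sequence shown above ({a,d}, then {b,c}) and ends with a tree satisfying 3 of the 4 triples; a different scoring function could merge in another order.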
An exact algorithm for MCTT
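Dynamic programming:
$F(V) = \max_{(V_1,V_2)} \{ F(V_1) + F(V_2) + W(V_1,V_2) \}$, taken over all bipartitions $(V_1,V_2)$ of $V$.
– $F(V)$: the number of satisfied triples over $V$, i.e. triples $(x(yz))$ with $y, z \in V$; those with $x \notin V$ are satisfied by the cluster $V$ itself.
– $W(V_1,V_2)$: the number of triples $(x(v_1v_2))$ with $x \notin V$, $v_1 \in V_1$ and $v_2 \in V_2$.
– Computed in order of increasing cardinality of $V$.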
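F(V) for the example, from large to small cardinality:
|V|=4: abcd 3
|V|=3: abc 1, abd 3, bcd 2
|V|=2: ab 0, ac 0, ad 2, bc 1, bd 1, cd 0
|V|=1: a 0, b 0, c 0, d 0
[Figure: the triples (a(bc)), (c(ad)), (b(ad)), (c(bd)) and an optimal tree ((ad)(bc))]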
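The recurrence translates directly into Python. In this sketch subsets are bitmasks (an encoding of our choosing, with a hypothetical `mask` helper), subsets are processed by increasing cardinality, and every unordered bipartition is tried once. It takes exponential time, as expected for an exact algorithm.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (x, y, z) encodes (x(yz))

def mask(species: List[str], names) -> int:
    """Bitmask of a subset of species (hypothetical helper)."""
    return sum(1 << species.index(s) for s in names)

def exact_mctt(species: List[str], triples: List[Triple]) -> Dict[int, int]:
    """F[V] for every non-empty subset V (as a bitmask), via F(V1) + F(V2) + W(V1, V2)."""
    n = len(species)
    bits = [(mask(species, [x]), mask(species, [y]), mask(species, [z])) for x, y, z in triples]
    F = {1 << i: 0 for i in range(n)}  # singletons satisfy nothing
    for V in sorted(range(1, 1 << n), key=lambda m: bin(m).count("1")):
        if V in F:
            continue
        best = -1
        V1 = (V - 1) & V  # enumerate the proper non-empty submasks of V
        while V1:
            V2 = V ^ V1
            if V1 < V2:  # visit each unordered bipartition once
                w = sum(1 for x, y, z in bits
                        if not (x & V) and ((y & V1 and z & V2) or (y & V2 and z & V1)))
                best = max(best, F[V1] + F[V2] + w)
            V1 = (V1 - 1) & V
        F[V] = best
    return F
```

On the example triples this yields F = 3 for {a,b,c,d}, F = 2 for {a,d} and F = 1 for {b,c}, matching the table. Recovering the tree itself needs a traceback that records the maximizing bipartition, omitted here.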
Our new heuristic algorithm (DPWP)
– Derived from the exact algorithm.
– The number of subsets of each cardinality is limited by a parameter K.
– When K = infinity, it is exactly the exact algorithm.
– Time-quality trade-off.
– The time complexity is $O(n^2K^2(n^3+K))$.
– Sorry, there is a mistake in the paper.
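The slide gives only the idea, so the following Python sketch fills in details that are our own assumptions: candidates of each cardinality are built from disjoint pairs of kept smaller subsets, and the K candidates with the largest F values are kept (all singletons are always kept). With K at least the number of subsets per cardinality it reproduces the exact DP.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (x, y, z) encodes (x(yz))

def dpwp(species: List[str], triples: List[Triple], K: int) -> Optional[int]:
    """Beam-limited version of the exact DP: at most K kept subsets per cardinality (assumed rule)."""
    def W(V1: FrozenSet[str], V2: FrozenSet[str]) -> int:
        V = V1 | V2
        return sum(1 for x, y, z in triples
                   if x not in V and ((y in V1 and z in V2) or (y in V2 and z in V1)))

    F: Dict[FrozenSet[str], int] = {frozenset([s]): 0 for s in species}
    kept = {1: list(F)}  # all singletons are kept
    for c in range(2, len(species) + 1):
        cand: Dict[FrozenSet[str], int] = {}
        for c1 in range(1, c // 2 + 1):
            for V1 in kept[c1]:
                for V2 in kept[c - c1]:
                    if V1 & V2:
                        continue  # not a bipartition of V1 | V2
                    V = V1 | V2
                    cand[V] = max(cand.get(V, -1), F[V1] + F[V2] + W(V1, V2))
        # HYPOTHETICAL pruning rule: keep the K subsets with the largest F values
        best = sorted(cand.items(), key=lambda kv: kv[1], reverse=True)[:K]
        F.update(best)
        kept[c] = [V for V, _ in best]
    return F.get(frozenset(species))
```

Larger K explores more subsets at higher cost, which is the time-quality trade-off described above; on the small example even K = 1 happens to reach the optimum of 3.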
The experiment results (time)
[Figure: running time in seconds (log scale, 1 to 10000) versus n (12 to 30) for Exact, DPWP(300), DPWP(600) and DPWP(900)]
Average ratio in the test
[Figure: average ratio (0.8 to 1.2) versus n (12, 15, 18, 20) for BPMF, BOSF, MCSF, DPWP(300), DPWP(600) and DPWP(900)]
Worst ratio in the test
[Figure: worst ratio (0.8 to 1.6) versus n (12, 15, 18, 20) for BPMF, BOSF, MCSF, DPWP(300), DPWP(600) and DPWP(900)]
Improvement (%): $100 \times (\mathrm{DPWP} - \mathrm{BestofOther}) / \mathrm{BestofOther}$
[Figure: maximum and average improvement (%, 0 to 20) versus n (18 to 30)]
The MCST problem
– Given triples over a species set S, find a subset U of S such that all given triples over U are compatible and |U| is maximum.
– We show the problem is NP-hard, by a reduction from the Feedback Vertex Set problem.
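For very small instances the MCST definition can be checked by brute force: try species subsets from largest to smallest and return the first whose triples are compatible. The sketch below uses a compact BUILD-style compatibility test (after Aho et al., 1981); the names `compatible` and `mcst` are ours, and the search is exponential, consistent with the NP-hardness result.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (x, y, z) encodes (x(yz))

def compatible(U: Iterable[str], triples: List[Triple]) -> bool:
    """BUILD-style test: can one tree satisfy every triple whose species all lie in U?"""
    U = set(U)
    inside = [t for t in triples if set(t) <= U]
    if len(U) <= 2:
        return True
    comp = {s: frozenset([s]) for s in U}  # component of each species
    for x, y, z in inside:
        if comp[y] != comp[z]:  # join the components of y and z
            merged = comp[y] | comp[z]
            for s in merged:
                comp[s] = merged
    parts = set(comp.values())
    if len(parts) == 1:
        return False  # no valid split at the root
    return all(compatible(p, inside) for p in parts)

def mcst(species: List[str], triples: List[Triple]) -> Set[str]:
    """Largest species subset U whose triples are compatible (exponential brute force)."""
    for size in range(len(species), 0, -1):
        for U in combinations(species, size):
            if compatible(U, triples):
                return set(U)
    return set()
```

For the four example triples (a(bc)), (c(ad)), (b(ad)), (c(bd)), the full set is incompatible, so the answer has three species.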
The feedback vertex set problem
Feedback vertex set: a vertex subset containing at least one vertex of each cycle of the given directed graph.
– In other words, removing a feedback vertex set results in an acyclic digraph.
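The "in other words" characterization gives a direct test. This Python sketch (function names are ours) removes the candidate set and checks acyclicity with Kahn's topological-sort algorithm: the remaining digraph is acyclic exactly when every vertex can be removed in topological order.

```python
from collections import deque
from typing import Hashable, Iterable, List, Set, Tuple

def is_acyclic(vertices: Iterable[Hashable], edges: List[Tuple[Hashable, Hashable]]) -> bool:
    """Kahn's algorithm on the subgraph induced by `vertices`."""
    indeg = {v: 0 for v in vertices}
    adj = {v: [] for v in indeg}
    for u, v in edges:
        if u in indeg and v in indeg:  # ignore edges touching removed vertices
            adj[u].append(v)
            indeg[v] += 1
    queue = deque(v for v in indeg if indeg[v] == 0)
    seen = 0
    while queue:
        u = queue.popleft()
        seen += 1
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return seen == len(indeg)  # vertices left over lie on or behind a cycle

def is_feedback_vertex_set(vertices: List[Hashable], edges: List[Tuple[Hashable, Hashable]],
                           F: Set[Hashable]) -> bool:
    """F hits every cycle iff the digraph without F is acyclic."""
    return is_acyclic([v for v in vertices if v not in F], edges)
```

For the digraph with cycles 1→2→3→1 and 3→4→3, the set {3} is a feedback vertex set, while {1} is not because the cycle through 3 and 4 survives.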
The reduction
[Figure: the construction, showing trees T1, T2, ..., Tp, species x, r1, r2, ..., rp, sets V1, V2, V3, ..., Vp, and species x1, x2, ...]
Concluding remarks
– What is the approximation ratio?
– The Best-One-Split-First algorithm is a 3-approximation algorithm.
– A larger K gives a better solution, but we do not know a theoretical bound on the ratio.