Descendent Subtrees Comparison of Phylogenetic Trees with Applications to Co-evolutionary Classifications in Bacterial Genome
Yaw-Ling Lin 1, Tsan-Sheng Hsu 2
1 Dept. Computer Sci. & Info. Management, Providence University, Taichung, Taiwan
2 Institute of Information Science, Academia Sinica, Taipei, Taiwan
Yaw-Ling Lin, Providence, Taiwan 2
Motivation – Where do the problems come from?
Two-Component System
• Two-component systems (2CS):
– Sensor histidine kinase
– Response regulator
• The major control machinery that lets bacteria cope with diverse and often hostile environments
2CS in Pseudomonas aeruginosa PAO1
http://www.pseudomonas.com/
“Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen.” Nature. 2000 Aug 31;406(6799):947-8. by Stover CK, Pham XQ, Erwin AL, et al.
• Genome: 6.3M bp
• Predicted genes: 5570
• 123 genes were classified as 2CSs.
2CS in PAO1
• There are 123 annotated 2CS genes in PAO1.
• Use systematic analysis of the evolutionary relationships between the sensor kinase and the response regulator of a 2CS.
• Construct phylogenetic trees using Clustal-W for 54 sensor kinases and 59 response regulators.
2CS in PAO1 -- Sensor Tree
2CS: Regulator Tree
Subtrees Analysis of 2CS
Co-evolution Subtree Analysis
Sensor Tree versus Regulator Tree
Different Trees
• Different phylogenetic tree inference methods:
- Maximum parsimony
- Maximum likelihood
- Distance matrix fitting
- Quartet based methods
• The same set of species can be compared w.r.t. different biological sequences or different genes, which yields different trees.
• How to find the largest set of items on which the trees agree?
Previous Results
• Measuring the similarity / difference between trees:
- Symmetric difference [Robinson 1979]
- Robinson and Foulds (RF) metric [Robinson 1981]
- Nearest-neighbor interchange [Waterman 1978]
- Subtree transfer distance [Allen 2001]
- Quartet metric [Estabrook 1985]
• Inferring the consensus tree: the maximum agreement subtree (MAST) problem, a.k.a. the maximum homeomorphic agreement subtree problem
MAST: Maximum Agreement Subtree
• Problem: given a set of rooted trees whose leaves are drawn from the same set of items of size n, find the largest subset of these items so that the portions of the trees restricted to the subset are isomorphic.
• [Amir and Keselman 1997]: NP-hard even for 3 trees of unbounded degree.
• [Hein 1995]: MAST for 3 trees of unbounded degree is hard to approximate.
• [Amir et al 1997]: polynomial-time algorithms for three or more bounded-degree trees, but the running time is exponential in the degree bound.
MAST: Maximum Agreement Subtree
• [Farach and Thorup 1997]: O(n^1.5 log n) time algorithm for two arbitrary degree trees.
• [Cole et al 2002]: MAST of two binary trees can be found in O(n log n) time; MAST of two degree d trees can be found in O(min{n √d log^2 n, n d log n log d}) time.
Problem Definition
• A phylogenetic tree with n leaves is a (rooted) tree such that all the leaf nodes are uniquely labelled from 1 to n.
• The descendent subtree of a phylogenetic tree T at a vertex v is the subtree composed of all edges and nodes of T descending from v.
• Given a set of n-leaf phylogenetic trees, we wish to explore the relationships among the descendent subtrees of these trees.
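To make the definition concrete, here is a minimal sketch that lists the leaf set of every descendent subtree. The nested-tuple encoding (an int is a leaf label, a tuple is an internal node) and the function name `clusters` are assumptions made for illustration, not notation from the slides. The sketch copies sets at every node, so it is simple rather than time-optimal.

```python
def clusters(tree):
    """Return the leaf set of every descendent subtree, in post-order.

    Hypothetical encoding: an int is a leaf label, a tuple is an
    internal node whose items are its children.
    """
    found = []

    def visit(node):
        if isinstance(node, int):
            leaves = frozenset([node])
        else:
            # The leaf set of an internal node is the union of its children's.
            leaves = frozenset().union(*(visit(child) for child in node))
        found.append(leaves)
        return leaves

    visit(tree)
    return found


tree = ((1, 2), (3, (4, 5)))
for leaf_set in clusters(tree):
    print(sorted(leaf_set))
```

Each vertex contributes exactly one descendent subtree, so a tree with 5 leaves and 4 internal nodes yields 9 leaf sets here.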
Normalized cluster distance between two sets
• Symmetric set difference:
• Normalized cluster distance:
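The two formulas on this slide are not preserved in the text. As a hedged sketch only: the symmetric set difference is the standard count of elements in exactly one of the two sets, and the normalization below divides by |A| + |B|. That denominator is an assumption, chosen so the distance ranges from 0 (equal sets) to 1 (disjoint sets). The function names are hypothetical.

```python
def symmetric_difference_size(a, b):
    """Number of elements that belong to exactly one of the two sets."""
    return len(set(a) ^ set(b))


def normalized_cluster_distance(a, b):
    """Symmetric difference size divided by |A| + |B|.

    ASSUMPTION: the denominator is not given in the slide text;
    |A| + |B| is used here so the value lies in [0, 1].
    Both sets are expected to be non-empty.
    """
    a, b = set(a), set(b)
    return len(a ^ b) / (len(a) + len(b))


print(normalized_cluster_distance({1, 2, 3}, {2, 3, 4}))
```

For leaf sets of descendent subtrees, both sets are non-empty, so the division is always defined.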
All Pairs Subtrees Comparison – A naïve O(n^3) algorithm
All Pairs Subtrees Comparison – Property
All Pairs Subtrees Comparison – an O(n^2) algorithm
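The algorithm on this slide is not preserved in the text, so the following is one possible way to reach quadratic time, not necessarily the authors' method. It relies on the fact that sibling subtrees have disjoint leaf sets, so the number of common leaves of two subtrees is the sum over the children of one of them. The nested-tuple encoding, the function names, and the |A| + |B| normalization are assumptions.

```python
def postorder(tree):
    """Flatten a nested-tuple tree; children always precede their parent.

    Returns (children, leaf_label, size), each indexed by post-order number.
    """
    children, leaf_label, size = [], [], []

    def visit(node):
        if isinstance(node, int):
            kids, label, count = [], node, 1
        else:
            kids = [visit(child) for child in node]
            label, count = None, sum(size[k] for k in kids)
        children.append(kids)
        leaf_label.append(label)
        size.append(count)
        return len(size) - 1

    visit(tree)
    return children, leaf_label, size


def all_pairs_distances(t1, t2):
    """dist[u][v] for every pair of descendent subtrees (post-order indices)."""
    c1, l1, s1 = postorder(t1)
    c2, l2, s2 = postorder(t2)
    common = [[0] * len(s2) for _ in s1]  # common[u][v] = shared leaf count
    for u in range(len(s1)):
        for v in range(len(s2)):
            if c1[u]:    # u internal: add up over u's children
                common[u][v] = sum(common[k][v] for k in c1[u])
            elif c2[v]:  # u leaf, v internal: add up over v's children
                common[u][v] = sum(common[u][k] for k in c2[v])
            else:        # both leaves: compare labels
                common[u][v] = int(l1[u] == l2[v])
    # |A delta B| = |A| + |B| - 2|A and B|; ASSUMED normalization by |A| + |B|.
    return [[(s1[u] + s2[v] - 2 * common[u][v]) / (s1[u] + s2[v])
             for v in range(len(s2))] for u in range(len(s1))]


dist = all_pairs_distances(((1, 2), (3, 4)), ((1, 3), (2, 4)))
print(dist[2][2])  # subtree {1, 2} of the first tree vs {1, 3} of the second
```

The work per pair is proportional to a node's degree, and degrees sum to the number of edges, so the total is O(n^2) when each tree has O(n) nodes (for instance, when every internal node has at least two children).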
Lowest Common Ancestor
Confluent subtree
Confluent subtree – Illustration
Constructing confluent subtree
Nearest subtree
Nearest subtree: reasoning
Nearest subtree: Algorithm
Leaf-agree / Isomorphic Subtrees
leaf-agreement – Two Trees
All-agreement: Illustration
[Figure: tree T1 with nodes z, x, y and leaf sets X, Y; tree T2 with x' = Lca(X), y' = Lca(Y), z' = Lca(x', y').]
All-agreement Method
leaf-agreement – k Trees
Isomorphic Descendent Subtrees
Isomorphic Descendent Subtrees (2)
Conclusion
• Computing the normalized cluster distances between all pairs of descendent subtrees of two trees can be done in O(n^2) time, which is optimal.
• Finding nearest subtrees for a collection of pairwise disjoint subsets of leaves can be done in O(n) time.
• Finding all descendent subtrees consisting of the same set of leaves in a set of (unbounded-degree) trees is solvable in time linear in the size of the input trees.
• Finding all isomorphic descendent subtrees in a set of (unbounded-degree) trees is solvable in time linear in the size of the input trees.
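The last two results can be illustrated with a simple hashing sketch. It is not the linear-time method claimed above: it copies leaf sets and sorts children, so it only shows what is being computed. The nested-tuple encoding and function names are assumptions, and "isomorphic" is taken here to mean the same shape with the same leaf labels, ignoring child order.

```python
def subtree_signatures(tree):
    """Return (leaf_sets, canonical_forms) over all descendent subtrees.

    Hypothetical encoding: an int is a leaf, a non-empty tuple is an
    internal node whose items are its children.
    """
    leaf_sets, shapes = set(), set()

    def visit(node):
        if isinstance(node, int):
            leaves, shape = frozenset([node]), node
        else:
            parts = [visit(child) for child in node]
            leaves = frozenset().union(*(p[0] for p in parts))
            # Order children by smallest leaf label so child order is irrelevant.
            shape = tuple(p[1] for p in sorted(parts, key=lambda p: min(p[0])))
        leaf_sets.add(leaves)
        shapes.add(shape)
        return leaves, shape

    visit(tree)
    return leaf_sets, shapes


def common_subtrees(trees):
    """Leaf sets and canonical forms that occur in every input tree."""
    sigs = [subtree_signatures(t) for t in trees]
    leaf_agreed = set.intersection(*(s[0] for s in sigs))
    isomorphic = set.intersection(*(s[1] for s in sigs))
    return leaf_agreed, isomorphic


t1 = ((1, 2), (3, (4, 5)))
t2 = ((2, 1), ((3, 4), 5))
leaf_agreed, isomorphic = common_subtrees([t1, t2])
print(sorted(sorted(s) for s in leaf_agreed if len(s) > 1))
print([s for s in isomorphic if isinstance(s, tuple)])
```

In this example the subtrees on {3, 4, 5} agree on their leaves in both trees but are not isomorphic, while the subtrees on {1, 2} are both.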
Future Research
• Clustering analysis of 2CS for functional prediction of uncharacterized genes
• Co-evolutionary analysis of 2CS
• (Rooted / unrooted) phylogenetic trees comparison: when edges are labeled with (likelihood, log-odds) distances.
The End
What Date is Today?
• Magic Number:
– 4/4, 6/6, 8/8, 10/10, 12/12
– 7/11, 9/5 [also 11/7, 5/9]
– 3/0? [implying 2/28, 2/0 = 1/31]
• Extension:
– 365 = 52 * 7 + 1
– Leap Year?
• 2003: 5 ; 2004: 7 ; 2005: 1 ; 2006: 2
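The magic dates can be checked with the standard library. This sketch assumes the year numbers above are ISO weekdays (Monday = 1 through Sunday = 7) and reads "3/0" as the last day of February; the name `doomsday` is introduced here for illustration.

```python
from datetime import date, timedelta

# (month, day) pairs listed on the slide.
MAGIC_DATES = [(4, 4), (6, 6), (8, 8), (10, 10), (12, 12),
               (7, 11), (9, 5), (11, 7), (5, 9)]


def doomsday(year):
    """Return the ISO weekday shared by all magic dates of the given year."""
    days = {date(year, m, d).isoweekday() for m, d in MAGIC_DATES}
    # "3/0" is the day before March 1, i.e. the last day of February.
    days.add((date(year, 3, 1) - timedelta(days=1)).isoweekday())
    if len(days) != 1:
        raise ValueError("magic dates do not share a weekday")
    return days.pop()


for year in (2003, 2004, 2005, 2006):
    print(year, doomsday(year))
```

The weekday advances by one after a common year (365 = 52 * 7 + 1) and by two across a leap day, which is why 2004 jumps from 5 to 7.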