SCNSMAA F03 Phylogenetic Distance and Coxeter Groups 1
Phylogenetic Distance
and Coxeter Groups
David J. Hunter
Department of Mathematics, Westmont College
The Phylogeny Problem
Given genetic data from a set of related organisms,
reconstruct the evolutionary tree.
Distance-Based Methods
Definition: The phylogenetic distance between two
genomes is the number of evolutionary events (e.g.,
mutations) that occurred in the transition from one
genome to the other.
Given reasonably accurate estimates for the pairwise
phylogenetic distances among a set of genomes, methods
exist that will reconstruct the true topology of the
evolutionary tree. (See Saitou and Nei, 1987.)
Many estimators exist: breakpoint distance, inversion
distance, EDE, IEBP, etc.
Inversions
An inversion is a mutation of a chromosome where a
sequence of genes is broken off and glued back in reverse
order. The directionality of the inverted sequence is
toggled.
Before: 1 2 3 4 5 6 7 8 9 10 11
After:  1 2 −5 −4 −3 6 7 8 9 10 11
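The inversion in the example above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `apply_inversion` and the 0-based, half-open index convention are illustrative assumptions, not part of the talk.

```python
def apply_inversion(genome, start, end):
    """Reverse genome[start:end] and toggle the sign of every gene in it.

    `genome` is a list of nonzero signed integers. `start` and `end` are
    0-based, half-open slice bounds (a convention chosen for this sketch).
    """
    segment = [-gene for gene in reversed(genome[start:end])]
    return genome[:start] + segment + genome[end:]


genome = list(range(1, 12))              # 1 2 3 ... 11
mutated = apply_inversion(genome, 2, 5)  # invert the block 3 4 5
print(mutated)  # [1, 2, -5, -4, -3, 6, 7, 8, 9, 10, 11]
```

Applying the same inversion a second time restores the original genome, so every inversion is its own inverse.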
Algebraic Model
Let $n$ be the number of genes. The set of possible
genomes can be modeled as the group $B_n$ of signed
permutations. Secretly,
\[ B_n \cong \mathbb{Z}/2 \wr S_n \]
but it is more convenient to view $B_n$ as a subgroup of
$O(n)$.
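Generators of $B_n$:
\[
\begin{pmatrix}
1 &        &   &   &        &   \\
  & \ddots &   &   &        &   \\
  &        & 0 & 1 &        &   \\
  &        & 1 & 0 &        &   \\
  &        &   &   & \ddots &   \\
  &        &   &   &        & 1
\end{pmatrix},
\qquad
\begin{pmatrix}
1 &        &    &        &   \\
  & \ddots &    &        &   \\
  &        & -1 &        &   \\
  &        &    & \ddots &   \\
  &        &    &        & 1
\end{pmatrix}
\]
Note: These are reflections in $\mathbb{R}^n$.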
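As a sanity check on the matrix model, the sketch below builds the matrix of a signed permutation and tests that two sample generators are reflections: each is orthogonal, squares to the identity, and has trace n − 2 (so exactly one eigenvalue equals −1). The helper names and the choice n = 4 are assumptions made for illustration only.

```python
def signed_perm_matrix(perm):
    """Matrix sending basis vector e_j to sign(perm[j]) * e_{|perm[j]|}."""
    n = len(perm)
    m = [[0] * n for _ in range(n)]
    for col, image in enumerate(perm):
        m[abs(image) - 1][col] = 1 if image > 0 else -1
    return m


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def is_reflection(g):
    """Orthogonal involution with a one-dimensional (-1)-eigenspace."""
    n = len(g)
    return (matmul(g, transpose(g)) == identity(n)
            and matmul(g, g) == identity(n)
            and sum(g[i][i] for i in range(n)) == n - 2)


swap = signed_perm_matrix([1, 3, 2, 4])   # exchanges coordinates 2 and 3
flip = signed_perm_matrix([1, 2, -3, 4])  # negates coordinate 3
print(is_reflection(swap), is_reflection(flip))  # True True
```

A general signed permutation is still orthogonal but usually not a reflection; for example, the matrix of `[2, 1, -3, 4]` fails the trace test.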
Finite groups of reflections are Coxeter groups, so they
have the following presentation:
Generators: $s_1, s_2, \ldots, s_n$
Relations: $(s_i s_j)^{o(s_i s_j)} = 1$
Many results in Coxeter group theory deal with this
presentation.
Key Fact: In the case of $B_n$, the set $\{s_1, s_2, \ldots, s_n\}$
contains all the inversions of length 2.
Therefore, word length $l(\alpha)$ can be used as a phylogenetic
distance estimator (related to inversion distance).
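Word length can be computed by brute force for small n with a breadth-first search of the Cayley graph. The sketch below assumes the standard Coxeter generators of B_n, realized as swaps of adjacent positions plus a sign change in the last position; that choice, and the function names, are assumptions of this example and may differ from the generating set intended on the slide.

```python
from collections import deque


def neighbors(genome):
    """Yield the genomes one generator away from `genome` (a tuple)."""
    n = len(genome)
    for i in range(n - 1):              # swap adjacent positions i, i+1
        g = list(genome)
        g[i], g[i + 1] = g[i + 1], g[i]
        yield tuple(g)
    yield genome[:-1] + (-genome[-1],)  # toggle the sign in the last position


def word_lengths(n):
    """Map every signed permutation of 1..n to its word length."""
    start = tuple(range(1, n + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbors(current):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


dist = word_lengths(3)
print(len(dist))           # 48 = 2^3 * 3!
print(dist[(-1, -2, -3)])  # 9, the longest element
```

The search visits all 2^n · n! elements, so it is only practical for small n; it is meant as a check, not as an estimator for real genomes.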
Since $B_n$ is a finite reflection group, we have
Theorem: [Humphreys] The Poincaré polynomial
$W_n(t)$ of $B_n$ has the following factorization, where $l(\alpha)$ is
word length:
\[
W_n(t) = \sum_{\alpha \in B_n} t^{l(\alpha)} = \prod_{i=1}^{n} \frac{t^{2i} - 1}{t - 1}
\]
i.e., $W_n(t)$ is a generating function for $P(n, k)$, the
number of words of length $k$. For example, when $n = 4$,
\[
\begin{aligned}
W_4(t) = {} & 1 + 4t + 9t^2 + 16t^3 + 24t^4 + 32t^5 + 39t^6 \\
            & + 44t^7 + 46t^8 + 44t^9 + 39t^{10} + 32t^{11} \\
            & + 24t^{12} + 16t^{13} + 9t^{14} + 4t^{15} + t^{16}
\end{aligned}
\]
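The expansion for n = 4 can be checked by multiplying out the factors, using the identity (t^{2i} − 1)/(t − 1) = 1 + t + ... + t^{2i−1}. The function name `poincare_coefficients` is a hypothetical helper introduced for this sketch.

```python
def poincare_coefficients(n):
    """Coefficients of W_n(t), lowest degree first."""
    coeffs = [1]
    for i in range(1, n + 1):
        factor_len = 2 * i  # the i-th factor is 1 + t + ... + t^(2i-1)
        product = [0] * (len(coeffs) + factor_len - 1)
        for a, c in enumerate(coeffs):
            for b in range(factor_len):
                product[a + b] += c
        coeffs = product
    return coeffs


w4 = poincare_coefficients(4)
print(w4)
# [1, 4, 9, 16, 24, 32, 39, 44, 46, 44, 39, 32, 24, 16, 9, 4, 1]
print(sum(w4))  # 384 = 2^4 * 4!, the order of B_4
```

The coefficients are symmetric and sum to the order of the group, which gives two quick consistency checks for any n.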
[Figure: Distribution of word length for elements of $B_{10}$]
It is possible to solve for the coefficients of $W_n(t)$ for
$k \le 2n$:
\[
P(n, k) = \binom{n + k - 1}{k}
+ \sum_{i \ge 1} (-1)^i \left[
\binom{n + k - 1 - i(3i - 1)}{k - i(3i - 1)}
+ \binom{n + k - 1 - i(3i + 1)}{k - i(3i + 1)}
\right]
\]
where $\binom{i}{j} = 0$ if $j < 0$.
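A short sketch can evaluate the closed form and compare it with the n = 4 coefficients listed earlier. It takes the leading term to be binom(n + k − 1, k), the coefficient of t^k in (1 − t)^{−n}; the correction terms are the ones obtained by applying Euler's pentagonal number theorem to the product of the factors (1 − t^{2i}). The function names are hypothetical.

```python
from math import comb


def binom(top, bottom):
    """Binomial coefficient with the convention binom(i, j) = 0 for j < 0."""
    return comb(top, bottom) if bottom >= 0 else 0


def p_closed_form(n, k):
    """Number of elements of B_n with word length k, intended for k <= 2n."""
    total = binom(n + k - 1, k)
    i = 1
    while i * (3 * i - 1) <= k:  # later terms vanish by the convention above
        a = i * (3 * i - 1)
        b = i * (3 * i + 1)
        total += (-1) ** i * (binom(n + k - 1 - a, k - a)
                              + binom(n + k - 1 - b, k - b))
        i += 1
    return total


print([p_closed_form(4, k) for k in range(9)])
# [1, 4, 9, 16, 24, 32, 39, 44, 46]
```

These values agree with the coefficients of W_4(t) through degree 8 = 2n; the remaining coefficients follow from the symmetry of the polynomial.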
Computer simulations indicate that inversion distance
tends to underestimate the true phylogenetic distance.
(See, for example, Moret, Tang, Wang, Warnow, 2001.)
[Figure: simulation plot; axes: Actual Number of Inversions (Simulated) and Inversion Distance]
Questions
• Is it possible to quantify how bad the estimate is
(and correct it)?
• Are there other phylogenetic distance estimators
lurking in $B_n$?
• What about the unsigned case?
• ???
References
• James E. Humphreys, Reflection Groups and Coxeter
Groups, Cambridge Studies in Advanced Mathematics 29, 1990.
• B. M. E. Moret et al., Steps toward accurate
reconstructions of phylogenies from gene-order data,
LNCS 2149, 2001.
• N. Saitou and M. Nei, The neighbor-joining method:
A new method for reconstructing phylogenetic trees,
Mol. Biol. Evol., 1987.