BitPhylogeny: a probabilistic framework for reconstructing intra-tumor phylogenies
Thomas Sakoparnig
Feb 05, 2016
Intra-tumor view of carcinogenesis
Figure: Nik-Zainal et al., (2012) Cell
Intra-tumor tree of one breast cancer sample
Figure: Nik-Zainal et al., (2012) Cell
Intra-tumor phylogenies
Figure: poly-clonal tumor with clones A, B, C and normal cells; panels "Classical phylogenetic trees, hierarchical clustering" and "Differentiation hierarchy" (clone fractions 10%, 60%, 30%).
Goal: probabilistic model for differentiation hierarchy
Tree-structured Dirichlet process - probabilistic clustering prior
(a) Dirichlet process stick breaking
(b) Tree-structured stick breaking
Figure 1: a) Dirichlet process stick-breaking procedure, with a linear partitioning. b) Interleaving two stick-breaking processes yields a tree-structured partition. Rows 1, 3 and 5 are ν-breaks. Rows 2 and 4 are ψ-breaks.
“prototypes”) should be able to live at internal nodes in the tree, and 2) as the ancestor/descendant relationships are not known a priori, the data should be infinitely exchangeable.
2 A Tree-Structured Stick-Breaking Process
Stick-breaking processes based on the beta distribution have played a prominent role in the development of Bayesian nonparametric methods, most significantly with the constructive approach to the Dirichlet process (DP) due to Sethuraman [10]. A random probability measure $G$ can be drawn from a DP with base measure $\alpha H$ using a sequence of beta variates via:
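$$G = \sum_{i=1}^{\infty} \pi_i \, \delta_{\theta_i}, \qquad \pi_i = \nu_i \prod_{i'=1}^{i-1} (1 - \nu_{i'}), \qquad \theta_i \sim H, \qquad \nu_i \sim \mathrm{Be}(1, \alpha), \qquad \pi_1 = \nu_1. \tag{1}$$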
We can view this as taking a stick of unit length and breaking it at a random location. We call the left side of the stick $\pi_1$ and then break the right side at a new place, calling the left side of this new break $\pi_2$. If we continue this process of “keep the left piece and break the right piece again” as in Fig. 1a, assigning each $\pi_i$ a random value drawn from $H$, we can view this as a random probability measure centered on $H$. The distribution over the sequence $(\pi_1, \pi_2, \ldots)$ is a case of the GEM distribution [11], which also includes the Pitman-Yor process [12]. Note that in Eq. (1) the $\theta_i$ are i.i.d. from $H$; in the current paper these parameters will be drawn according to a hierarchical process.
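A minimal Python sketch of the stick-breaking construction in Eq. (1), truncated to a finite number of sticks. The truncation, the function names and the standard normal base measure in the usage line are assumptions for illustration; the construction itself is infinite.

```python
import random


def stick_breaking_weights(alpha, n_sticks, rng):
    """Truncated GEM(alpha) weights: pi_i = nu_i * prod_{i' < i} (1 - nu_i')."""
    weights = []
    remaining = 1.0  # length of the right-hand piece not yet allocated
    for _ in range(n_sticks):
        nu = rng.betavariate(1.0, alpha)  # nu_i ~ Be(1, alpha)
        weights.append(remaining * nu)    # keep the left piece as pi_i
        remaining *= 1.0 - nu             # break the right piece again
    return weights, remaining


def draw_truncated_dp(alpha, base_sampler, n_sticks, rng):
    """Approximate draw of G: weights paired with atoms theta_i ~ H (i.i.d.)."""
    weights, leftover = stick_breaking_weights(alpha, n_sticks, rng)
    atoms = [base_sampler(rng) for _ in weights]
    return list(zip(weights, atoms)), leftover


rng = random.Random(0)
# Hypothetical base measure H = N(0, 1), used only for illustration.
G, leftover = draw_truncated_dp(2.0, lambda r: r.gauss(0.0, 1.0), 50, rng)
print(f"{len(G)} atoms, unallocated mass {leftover:.2e}")
```

Each factor $1 - \nu_i$ has mean $\alpha / (1 + \alpha)$, so the unallocated mass shrinks geometrically in expectation, which is why a modest truncation already covers almost all of $G$.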
The GEM construction provides a distribution over infinite partitions of the unit interval, with natural numbers as the index set as in Fig. 1a. In this paper, we extend this idea to create a distribution over infinite partitions that also possess a hierarchical graph topology. To do this, we will use finite-length sequences of natural numbers as our index set on the partitions. Borrowing notation from the Pólya tree (PT) construction [13], let $\epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_K)$ denote a length-$K$ sequence of positive integers, i.e., $\epsilon_k \in \mathbb{N}^+$. We denote the zero-length string as $\epsilon = \varnothing$ and use $|\epsilon|$ to indicate the length of $\epsilon$'s sequence. These strings will index the nodes in the tree and $|\epsilon|$ will then be the depth of node $\epsilon$.
We interleave two stick-breaking procedures as in Fig. 1b. The first has beta variates $\nu_\epsilon \sim \mathrm{Be}(1, \alpha(|\epsilon|))$, which determine the size of a given node's partition as a function of depth. The second has beta variates $\psi_\epsilon \sim \mathrm{Be}(1, \gamma)$, which determine the branching probabilities. Interleaving these processes partitions the unit interval. The size of the partition associated with each $\epsilon$ is given by
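$$\pi_\epsilon = \nu_\epsilon \, \phi_\epsilon \prod_{\epsilon' \prec \epsilon} \phi_{\epsilon'} (1 - \nu_{\epsilon'}), \qquad \phi_{\epsilon\epsilon_i} = \psi_{\epsilon\epsilon_i} \prod_{j=1}^{\epsilon_i - 1} (1 - \psi_{\epsilon j}), \qquad \pi_\varnothing = \nu_\varnothing, \tag{2}$$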
where $\epsilon\epsilon_i$ denotes the sequence that results from appending $\epsilon_i$ onto the end of $\epsilon$, and $\epsilon' \prec \epsilon$ indicates that $\epsilon$ could be constructed by appending onto $\epsilon'$. When viewing these strings as identifying nodes on a tree, $\{\epsilon\epsilon_i : \epsilon_i \in 1, 2, \ldots\}$ are the children of $\epsilon$ and $\{\epsilon' : \epsilon' \prec \epsilon\}$ are the ancestors of $\epsilon$. The $\{\pi_\epsilon\}$ in Eq. (2) can be seen as products of several decisions on how to allocate mass to nodes and branches in the tree: the $\{\phi_\epsilon\}$ determine the probability of a particular sequence of children, and the $\nu_\epsilon$ and $(1 - \nu_\epsilon)$ terms determine the proportion of mass allotted to $\epsilon$ versus nodes that are descendants of $\epsilon$.
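The weights in Eq. (2) can be computed by pushing mass down the tree: each node keeps a fraction $\nu_\epsilon$ of the mass reaching it and splits the remainder among its children with the $\psi$-sticks. This Python sketch assumes a finite truncation (maximum depth and children per node, with the last stick at each level set to 1 so the weights sum to one) and, in the usage line, a hypothetical depth schedule $\alpha(j) = \alpha_0 \lambda^j$; neither is part of the infinite process.

```python
import random


def tssb_weights(alpha, gamma, max_depth, max_children, rng):
    """Node weights pi_eps of a truncated tree-structured stick-breaking draw.

    Nodes are tuples of positive integers; () is the root. alpha(depth) gives
    the parameter of nu_eps ~ Be(1, alpha(|eps|)); psi_eps ~ Be(1, gamma).
    """
    weights = {}

    def visit(node, mass):
        # mass = phi_eps * prod over strict ancestors eps' of phi_eps' * (1 - nu_eps')
        at_max_depth = len(node) == max_depth
        nu = 1.0 if at_max_depth else rng.betavariate(1.0, alpha(len(node)))
        weights[node] = mass * nu          # pi_eps = nu_eps * mass
        if at_max_depth:
            return
        below = mass * (1.0 - nu)          # mass handed to the subtree below eps
        remaining = 1.0                    # prod_{j < i} (1 - psi_{eps j})
        for i in range(1, max_children + 1):
            psi = 1.0 if i == max_children else rng.betavariate(1.0, gamma)
            visit(node + (i,), below * psi * remaining)  # phi_{eps i} = psi * remaining
            remaining *= 1.0 - psi

    visit((), 1.0)
    return weights


# Hypothetical settings: alpha_0 = 5, lambda = 0.5, gamma = 1, depth 3, 3 children.
weights = tssb_weights(lambda depth: 5.0 * 0.5 ** depth, 1.0, 3, 3, random.Random(0))
for node, w in sorted(weights.items(), key=lambda kv: -kv[1])[:5]:
    print(node, round(w, 3))
```

Because mass can stop at any node, internal nodes (clones with descendants) carry data, which is the property the slide relies on.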
Adams et al. (2010) NIPS
• Nodes correspond to clones
• Data points are placed in nodes
Flexible framework
Figure: clone tree whose nodes carry sequence fragments such as ...ACTACAGCAC.. and the variant ...ACTACCGCAC.. (an A to C change).
• Transition kernel: clone parameters depend on the parent clone
• Provides a link to classical phylogeny
• Back mutation allowed for methylation data
• No back mutation for single-nucleotide variants (SNVs)
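To illustrate the difference between the last two bullets, here is a hypothetical transition kernel on binary site profiles (for example methylated/unmethylated CpGs, or mutated/unmutated SNV sites). The function name, the single flip probability and the independent flips per site are assumptions for this sketch, not BitPhylogeny's actual kernel.

```python
import random


def child_profile(parent, flip_prob, allow_back_mutation, rng):
    """Hypothetical transition kernel on a binary site profile (illustration only).

    The child starts as a copy of the parent. A 0 site switches to 1 with
    probability flip_prob. A 1 site switches back to 0 with the same
    probability only if allow_back_mutation is True (methylation-like data);
    for SNV-like data an acquired mutation is never lost.
    """
    child = []
    for state in parent:
        if state == 0 and rng.random() < flip_prob:
            child.append(1)
        elif state == 1 and allow_back_mutation and rng.random() < flip_prob:
            child.append(0)
        else:
            child.append(state)
    return child


rng = random.Random(0)
parent = [1, 0, 1, 0, 1, 0, 0, 1]
print(child_profile(parent, 0.2, allow_back_mutation=True, rng=rng))   # methylation-like
print(child_profile(parent, 0.2, allow_back_mutation=False, rng=rng))  # SNV-like
```

Because each child is drawn conditionally on its parent, the sampled profiles along a path of the tree behave like a classical phylogeny of sequences.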
Performance - clustering and tree summaries
Figure: V-measure versus noise level (0, 0.01, 0.02, 0.05) for BitPhylogeny (traces), k-centroids and hierarchical clustering, including a mutator-phenotype scenario; maximum tree depth versus number of clones/nodes for the "mono clone" and "hyper clone" simulations (truth, BitPhylogeny, k-centroids, hierarchical clustering; noiseless and noise levels 0.01, 0.02, 0.05); polyclonal and monoclonal settings.
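The V-measure used above is the harmonic mean of homogeneity (each inferred cluster contains a single true clone) and completeness (each true clone ends up in a single cluster). A small Python sketch computed from two label lists follows; the function names are illustrative, and scikit-learn's `sklearn.metrics.v_measure_score` can serve as a cross-check.

```python
from collections import Counter
from math import log


def entropy(labels):
    n = len(labels)
    return -sum((k / n) * log(k / n) for k in Counter(labels).values())


def conditional_entropy(a, b):
    """H(A | B) estimated from paired label lists a and b."""
    n = len(a)
    joint = Counter(zip(a, b))
    marg_b = Counter(b)
    return -sum((nab / n) * log(nab / marg_b[bv]) for (_, bv), nab in joint.items())


def v_measure(truth, pred):
    h_c, h_k = entropy(truth), entropy(pred)
    homogeneity = 1.0 if h_c == 0 else 1.0 - conditional_entropy(truth, pred) / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - conditional_entropy(pred, truth) / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


print(v_measure([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0 (label names do not matter)
```

Merging everything into one cluster keeps completeness at 1 but drives homogeneity to 0, so the score penalizes both over- and under-clustering.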
Colon cancer
• About 10,000 cells per sample
• Bisulfite sequencing (bulk sequencing)
• IRX2 locus: 201 bp
• Spans 8 CpG sites
Sottoriva et al. (2013) Cancer Research
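Each bisulfite read covering the locus can be reduced to a binary methylation pattern over its CpG sites: bisulfite treatment converts unmethylated cytosines so they read as T, while methylated cytosines still read as C. The sketch below assumes forward-strand reads and known CpG offsets within an already aligned read; the read and offsets in the usage line are hypothetical.

```python
def methylation_pattern(read, cpg_offsets):
    """Binary methylation pattern of one aligned bisulfite read.

    Assumes forward-strand reads and known 0-based offsets of the CpG
    cytosines within the read (alignment is not shown). Bisulfite treatment
    converts unmethylated C to U (read as T); methylated C stays C.
    """
    pattern = []
    for pos in cpg_offsets:
        base = read[pos]
        if base == "C":
            pattern.append(1)     # methylated
        elif base == "T":
            pattern.append(0)     # unmethylated, converted
        else:
            pattern.append(None)  # sequencing error or ambiguous call
    return pattern


# Hypothetical read and CpG offsets, for illustration only.
print(methylation_pattern("ACGTTGACGA", [1, 4, 7]))  # [1, 0, 1]
```

With 8 CpG sites per read, every read becomes a length-8 binary vector, the kind of data point that is placed in the clone nodes of the tree.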
Tumor I
Figure: (A) per-sample values on a 0 to 1 scale for samples CT_L2, CT_L3, CT_L7, CT_L8, CT_R1, CT_R4, CT_R5, CT_R6; (B) density of maximum tree depth (2 to 5); (C) density of total branch length; left-side versus right-side samples.
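Maximum tree depth and total branch length summarize each sampled tree. A minimal Python sketch under stated assumptions: the tree is a child-to-parent map, edge lengths are supplied by the caller (how a branch length is defined for a clone tree is not specified here), and depth follows the $|\epsilon|$ convention with the root at depth 0, so plots that count the root as 1 would be shifted by one.

```python
def tree_summaries(parent, branch_length):
    """Maximum depth and total branch length of a rooted tree.

    parent: dict child -> parent (the root maps to None).
    branch_length: dict child -> length of the edge to its parent
    (hypothetical input; edge lengths are left to the caller).
    """
    depth_cache = {}

    def depth(node):
        if node not in depth_cache:
            p = parent[node]
            depth_cache[node] = 0 if p is None else depth(p) + 1
        return depth_cache[node]

    max_depth = max(depth(n) for n in parent)
    total_length = sum(branch_length[n] for n, p in parent.items() if p is not None)
    return max_depth, total_length


# Hypothetical four-clone tree: a is the root, d descends from c.
parent = {"a": None, "b": "a", "c": "a", "d": "c"}
lengths = {"b": 1.0, "c": 2.0, "d": 0.5}
print(tree_summaries(parent, lengths))  # (2, 3.5)
```

Computing both summaries for every posterior sample gives the densities compared between left-side and right-side samples.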
Tumor II
Figure: density of total branch length; maximum tree depth versus number of clones (4 to 12); left-side versus right-side samples.
Leukemia - Myeloproliferative neoplasm
• 56 cancer cells
• Whole-exome sequencing
• 712 SNVs
• 43% allelic dropout rate
• Assumption: infinite sites model
Hou et al. (2013) Cell
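Under the infinite sites model each SNV arises once and is never lost, so a complete, error-free cell-by-SNV matrix (ancestral state 0) admits a perfect phylogeny exactly when no two columns show all three patterns (1, 1), (1, 0) and (0, 1), the classic four-gamete test. The Python sketch below illustrates that constraint; the function name and the toy matrix are hypothetical.

```python
from itertools import combinations


def four_gamete_violation(matrix):
    """Return the first pair of SNV columns that violates the infinite sites model.

    matrix: complete binary cell x SNV matrix (1 = mutation observed), with the
    ancestral state assumed to be 0. If no pair of columns shows all three
    patterns (1, 1), (1, 0) and (0, 1), the matrix admits a perfect phylogeny
    and None is returned.
    """
    n_sites = len(matrix[0])
    for i, j in combinations(range(n_sites), 2):
        patterns = {(row[i], row[j]) for row in matrix}
        if {(1, 1), (1, 0), (0, 1)} <= patterns:
            return (i, j)
    return None


# Hypothetical cells x SNVs matrix.
cells = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
print(four_gamete_violation(cells))  # None
```

With a 43% allelic dropout rate many true mutations are observed as 0, which can create spurious conflicts in such a strict test; a probabilistic model can absorb that noise instead of rejecting the tree.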
Clonal hierarchy
Figure: clonal hierarchy with nodes a to i, each annotated with a count (e.g. b:7, c:34, d:6), grouped into subtrees 1, 2 and 3.
Discussion
• Probabilistic model for intra-tumor phylogeny reconstruction
• Faster inference needed
• Methods for comparing evolutionary trees are lacking
• Vision: patient stratification
Acknowledgement
• Niko Beerenwinkel
• Florian Markowetz
• Ke Yuan