Fragmentation Coagulation Processes
Konstantina Palla
Reading and Communication Club
Konstantina Palla 1 / 20
Talk based on
• Modelling Genetic Variations using Fragmentation-Coagulation Processes. Y. W. Teh, C. Blundell and L. T. Elliott. NIPS 2011.
• Scalable Imputation of Genetic Data with a Discrete Fragmentation-Coagulation Process. L. T. Elliott and Y. W. Teh. NIPS 2012.
Partition-valued Processes
Mechanisms
Discrete Fragmentation Coagulation Process
Continuous Fragmentation Coagulation Process
PARTITION-VALUED PROCESSES
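[Figure: a sequence of partitions of {1, ..., 9} at times t1, t2, t3, t4.
π1 = {1, 3, 7}, {2, 8}, {4, 5, 6}, {9}
π2 = {1, 7}, {2, 8, 4}, {5, 6}, {9}, {3}
π3 = {1}, {2}, {5}, {6, 9}, {3, 7, 4}, {8}
π4 = {1, 8}, {2, 5}, {6, 9}, {7, 4}, {3}]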
Let:
• [n] denote the set of natural numbers {1, ..., n}.
• Π[n] denote the set of unlabelled partitions of [n].
• Each π ∈ Π[n] is a set of disjoint non-empty clusters; partitions are indexed by a covariate (time/location), written πi.
• A partition-valued process is a Markov chain whose states are partitions of the natural numbers.
• When ti+1 = ti + dt and dt → 0, it becomes a Markov jump process.
PARTITIONING MECHANISMS
1. Fragmentation
2. Coagulation
3. Fragmentation/Coagulation
FRAGMENTATION MECHANISM
• π(0): all objects belong to the same cluster.
• Define: $\pi_{t+1} \mid \pi_t \sim \mathrm{FRAG}_{\alpha,\theta_f}(\pi_t)$.
• Partition each cluster $c_i$ of $\pi_t$ further according to $\mathrm{CRP}_{c_i}(\alpha, \theta_f)$, where $i = 1, \dots, K$, $K$ is the number of clusters in $\pi_t$, and $\theta_f$ is a parameter governing the fragmentation rate.
• Then $\pi_{t+1}$ is a refinement of $\pi_t$.
• Top-down generative process for trees.
[Figure: fragmentation tree over objects 1 to 9. Starting from the single cluster {1, ..., 9} at πt, clusters split into successively finer partitions πt+1, πt+2, πt+3, πt+4.]
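Let $\rho_i$ be the random partition drawn from $\mathrm{CRP}_{c_i}(0, \theta_f)$ for each cluster $c_i$ of $\pi_t$.
• $P(\pi_{t+1} \mid \pi_t) = \prod_{i=1}^{K} p(\rho_i)$
• For a single cluster in $\pi_t$: $p(\rho_i) = \mathrm{CRP}_{c_i}(0, \theta_f) = \frac{\Gamma(\theta_f)}{\Gamma(\theta_f + n_i)}\, \theta_f^{K_i} \prod_{j=1}^{K_i} \Gamma(n_{ij})$, where $K_i$ is the number of clusters in partition $\rho_i$, $n_i$ is the size of $c_i$ and $n_{ij}$ is the size of the $j$-th cluster of $\rho_i$.
• Example of discrete time case: nested CRP [Blei, Griffiths, Jordan, 2003]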
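The transition probability above can be evaluated directly from cluster sizes. The following is a minimal Python sketch, not taken from the papers; the function names `log_crp_prob` and `log_frag_transition` are hypothetical, and only the α = 0 case stated on this slide is covered.

```python
from math import lgamma, log, exp

def log_crp_prob(block_sizes, theta):
    """Log-probability of one partition under CRP(0, theta).

    block_sizes: sizes n_ij of the K_i blocks of rho_i; they sum to n_i.
    Implements Gamma(theta)/Gamma(theta+n_i) * theta^K_i * prod_j Gamma(n_ij).
    """
    n = sum(block_sizes)
    k = len(block_sizes)
    return (lgamma(theta) - lgamma(theta + n) + k * log(theta)
            + sum(lgamma(m) for m in block_sizes))

def log_frag_transition(refinements, theta_f):
    """log P(pi_{t+1} | pi_t) as a sum over the clusters of pi_t.

    refinements: one list of block sizes per cluster c_i of pi_t.
    """
    return sum(log_crp_prob(sizes, theta_f) for sizes in refinements)

# Hypothetical example: pi_t has clusters of sizes 4 and 5; the first splits
# into blocks of sizes 3 and 1, the second stays whole.
p = exp(log_frag_transition([[3, 1], [5]], theta_f=1.0))
```

Working in log space avoids overflow in the Gamma functions for large clusters.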
FRAGMENTATION MECHANISM - CONTINUOUS CASE
• Continuous-indexed analogue of nCRP.
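• Replace θf by θf dt and compute P(π(t + dt) | π(t)) as dt → 0.
• For a single cluster in π(t): $p(\rho_i) = \frac{\theta_f^{K_i-1}}{\Gamma(n_i)} \prod_{j=1}^{K_i} \Gamma(n_{ij}) \lim_{dt \to 0} dt^{K_i-1}$
• Fragmentation rate of a single cluster to $K_i$ clusters: $\theta_f^{K_i-1}\, \frac{\prod_{j=1}^{K_i} \Gamma(n_{ij})}{\Gamma(n_i)}$
• Rate of a single cluster $c$ in π(t) fragmenting to two clusters $a$ and $b$ ($c = a \cup b$), i.e. for $K_i = 2$: $\theta_f\, \frac{\Gamma(n_a)\Gamma(n_b)}{\Gamma(n_c)}$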
• Example: Dirichlet Diffusion Trees [Neal, 2001]. At a branch point:
  • P(following branch k) = $\frac{n_k}{m}$
  • P(diverging) = $\frac{\theta_f\, dt}{m}$
  • $n_k$: number of objects which previously took branch k
  • $K$: current number of branches from this branch point
  • $m = \sum_{k=1}^{K} n_k$: number of samples which previously took the current path
• $P(\pi(t + dt) \mid \pi(t)) = \frac{1}{1} \cdot \frac{\theta_f\, dt}{2} \cdot \frac{1}{3}$
[Figure: the cluster {1, 2, 3, 4} fragments into {1, 2} and {3, 4}.]
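As a small consistency check, the sequential diffusion-tree product for the four-object example should agree with the binary fragmentation rate θf Γ(na)Γ(nb)/Γ(nc). The sketch below uses exact fractions and sets θf = 1 per unit dt; the helper names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def gamma_int(n):
    """Gamma(n) = (n-1)! for a positive integer n."""
    return factorial(n - 1)

def binary_frag_rate(n_a, n_b, theta_f=Fraction(1)):
    """Rate at which a cluster of size n_a + n_b splits into one given pair of blocks."""
    return theta_f * Fraction(gamma_int(n_a) * gamma_int(n_b),
                              gamma_int(n_a + n_b))

# Sequential view of {1,2,3,4} -> {1,2}, {3,4}, with the factor dt dropped:
# object 2 follows object 1 (1/1), object 3 diverges (theta_f/2),
# object 4 follows object 3's branch (1/3).
sequential = Fraction(1, 1) * Fraction(1, 2) * Fraction(1, 3)

rate = binary_frag_rate(2, 2)
```

Both quantities come out as the same fraction, which is the sense in which the Dirichlet diffusion tree is an example of the continuous fragmentation mechanism.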
COAGULATION MECHANISM
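• π(0): each object is in its own cluster.
• Define: $\pi_{t+1} \mid \pi_t \sim \mathrm{COAG}_{\alpha,\theta_c}(\pi_t)$.
• Partition the set of clusters of $\pi_t$ according to $\mathrm{CRP}_{\pi_t}(\alpha, \theta_c)$, then replace each group of clusters with the union of its elements.
• $\pi_{t+1}$ is coarser than $\pi_t$.
• Bottom-up generative process for trees.
• For $\alpha = 0$, with $K$ clusters in $\pi_{t+1}$ and $n$ in $\pi_t$: $P(\pi_{t+1} \mid \pi_t) = \mathrm{CRP}_{\pi_t}(\alpha, \theta_c) = \frac{\Gamma(\theta_c)}{\Gamma(\theta_c + n)}\, \theta_c^{K} \prod_{i=1}^{K} \Gamma(n_i)$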
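The coagulation operator can be sketched in a few lines: seat the clusters of πt at CRP tables, then take unions. This is an illustrative Python sketch for α = 0 only; `sample_crp_assignment` and `coagulate` are hypothetical names, and partitions are assumed to be stored as sets of frozensets.

```python
import random

def sample_crp_assignment(n_items, theta, rng):
    """Seat n_items sequentially under CRP(0, theta); return a table index per item."""
    tables = []       # tables[k] = number of items already at table k
    assignment = []
    for _ in range(n_items):
        # Existing table k has weight tables[k]; a new table has weight theta.
        weights = tables + [theta]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)
        else:
            tables[k] += 1
        assignment.append(k)
    return assignment

def coagulate(partition, theta_c, rng):
    """One COAG step: group the clusters of `partition`, then merge each group."""
    clusters = sorted(partition, key=min)   # fix an arbitrary order of clusters
    assignment = sample_crp_assignment(len(clusters), theta_c, rng)
    merged = {}
    for cluster, k in zip(clusters, assignment):
        merged.setdefault(k, set()).update(cluster)
    return {frozenset(c) for c in merged.values()}

rng = random.Random(0)
pi_t = {frozenset({1, 3, 7}), frozenset({2, 8}), frozenset({4, 5, 6}), frozenset({9})}
pi_next = coagulate(pi_t, theta_c=1.0, rng=rng)
```

Every cluster of `pi_t` ends up inside exactly one cluster of `pi_next`, so the result is always coarser than (or equal to) the input.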
[Figure: coagulation tree over objects 1 to 9. Read bottom-up, the clusters of πt merge into successively coarser partitions πt+1, πt+2, πt+3, πt+4, ending with the single cluster {1, ..., 9}.]
COAGULATION MECHANISM - CONTINUOUS CASE
• Continuous-indexed analogue of the discrete coagulation tree.
• Kingman's coalescent [Kingman, 1982].
• Define the coagulation parameter $\theta_c / dt$ and $\pi(t + dt) \mid \pi(t) \sim \mathrm{COAG}_{\alpha,\, \theta_c/dt}(\pi(t))$.
• $K$ and $n$ are the numbers of clusters in π(t + dt) and π(t).
• As dt → 0: $P(\pi(t + dt) \mid \pi(t)) = \theta_c^{K-n} \left[\prod_{i=1}^{K} \Gamma(n_i)\right] \lim_{dt \to 0} \frac{dt^{-K+1}}{dt^{-n+1}}$
• Rate of two clusters in π(t) coagulating to one in π(t + dt), i.e. for $K = n - 1$: $\theta_c^{-1}$
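The limit above follows in one step from the discrete coagulation probability with α = 0 and θ = θc/dt. The derivation below is a sketch that assumes only the identity Γ(θ+n)/Γ(θ) = θ(θ+1)···(θ+n−1).

```latex
\begin{align*}
\frac{\Gamma(\theta)}{\Gamma(\theta+n)}\,\theta^{K}
  &= \frac{\theta^{K}}{\prod_{k=0}^{n-1}(\theta+k)}
   = \theta^{K-n}\bigl(1 + O(\theta^{-1})\bigr),
   \qquad \theta = \theta_c/dt, \\
P(\pi(t+dt)\mid\pi(t))
  &= \theta_c^{K-n}\,dt^{\,n-K}\Bigl[\prod_{i=1}^{K}\Gamma(n_i)\Bigr]\bigl(1+O(dt)\bigr), \\
K = n-1:\qquad
P(\pi(t+dt)\mid\pi(t))
  &= \theta_c^{-1}\,dt\;\Gamma(2)\,\Gamma(1)^{n-2} + O(dt^{2})
   = \theta_c^{-1}\,dt + O(dt^{2}).
\end{align*}
```

Here $dt^{\,n-K}$ is the ratio $dt^{-K+1}/dt^{-n+1}$ from the slide, and the coefficient of $dt$ in the last line is the pairwise coagulation rate $\theta_c^{-1}$.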
FRAGMENTATION - COAGULATION DUALITY
Duality between Pitman-Yor fragmentations and coagulations.
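Theorem [Pitman, 1999]. For all 0 < α < 1, 0 ≤ β < 1 and θ > −αβ, the following statements are equivalent:
• $\pi \sim \mathrm{CRP}_{[n]}(\alpha\beta, \theta)$ and $F \mid \pi \sim \mathrm{FRAG}_{\alpha,-\alpha\beta}(c)$
• $\eta \sim \mathrm{CRP}_{[n]}(\alpha, \theta)$ and $\rho \mid \eta \sim \mathrm{COAG}_{\beta,\, \theta/\alpha}(\eta)$
Equivalent: P(π = S, F = T) = P(ρ = S, η = T)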
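[Figure: fragmentation maps π = {1, 2, 3, 5, 8}, {4, 6, 9}, {7} (fragments F1, F2, F3) to the finer partition η = {1, 3, 8}, {2, 5}, {4, 6, 9}, {7} (clusters A, B, C, D); coagulation maps η back to the coarser partition ρ.]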
• P(π) = P(ρ) = CRP(αβ, θ)
• P(F) = P(η) = CRP(α, θ)
Fragmentation (coagulation) is the time reversal of coagulation (fragmentation), with appropriately chosen parameters.
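The duality can be checked numerically on the pair of partitions from the figure. The sketch below is an illustration rather than a proof: it assumes the standard two-parameter CRP partition probability, assumes that the fragmentation acts on each cluster of π separately, and picks arbitrary parameter values α = 1/2, β = 1/3, θ = 1. The function name `crp_prob` is hypothetical.

```python
from fractions import Fraction

def crp_prob(block_sizes, alpha, theta):
    """Probability of one partition with the given block sizes under CRP(alpha, theta)."""
    n, k = sum(block_sizes), len(block_sizes)
    p = Fraction(1)
    for j in range(1, k):          # one factor per new block after the first
        p *= theta + j * alpha
    for m in range(1, n):          # normaliser (theta+1)(theta+2)...(theta+n-1)
        p /= theta + m
    for size in block_sizes:       # (1-alpha)(2-alpha)...(size-1-alpha) per block
        for m in range(1, size):
            p *= m - alpha
    return p

alpha, beta, theta = Fraction(1, 2), Fraction(1, 3), Fraction(1)

# Fragmentation side: pi = {1,2,3,5,8}, {4,6,9}, {7};
# the first cluster splits into {1,3,8}, {2,5}, the other two stay whole.
lhs = (crp_prob([5, 3, 1], alpha * beta, theta)
       * crp_prob([3, 2], alpha, -alpha * beta)
       * crp_prob([3], alpha, -alpha * beta)
       * crp_prob([1], alpha, -alpha * beta))

# Coagulation side: eta = {1,3,8}, {2,5}, {4,6,9}, {7};
# the first two of its four clusters merge, the other two stay separate.
rhs = (crp_prob([3, 2, 3, 1], alpha, theta)
       * crp_prob([2, 1, 1], beta, theta / alpha))
```

With exact fractions the comparison `lhs == rhs` is free of rounding error, so it tests the identity P(π = S, F = T) = P(ρ = S, η = T) for this particular pair (S, T).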
DISCRETE FRAGMENTATION COAGULATION PROCESS - DFCP
[Figure: Markov chain πt → πt+1 → πt+2 with partitions labelled A to E. Each of πt, πt+1, πt+2 is marginally CRP[n](0, θ); each intermediate partition is marginally CRP[n](R, θ).]
• A Markov chain over partitions.
• Transition using fragmentation followed by coagulation.
• Assume T steps on the Markov chain. Define $(R_t)_{t=1}^{T-1}$; these control the dependence between $\pi_t$ and $\pi_{t+1}$.
• Duality theorem for β = 0, α = R at each step:
  $\pi \sim \mathrm{CRP}_{[n]}(0, \theta)$ and $F_c \mid \pi \sim \mathrm{FRAG}_{R,0}(c)\ \ \forall c \in \pi$
  $\eta \sim \mathrm{CRP}_{[n]}(R, \theta)$ and $\rho \mid \eta \sim \mathrm{COAG}_{0,\, \theta/R}(\eta)$
DFCP - CONT.
• Stationary distribution: CRP(0, θ).
• Show detailed balance: P(A) P(A → B) = P(B) P(B → A).
• Hint: $P(A)\,P(A \to B) = \sum_{C} P(A)\,P(B \mid C)\,P(C \mid A)$, then apply duality twice.
DFCP - CONT.
• Reversible Markov chain, due to F/C duality
• Exchangeable: each πt has CRP marginal
• Projectivity: CRP projective
CONTINUOUS FCP
• Continuum limit of DFCP: π(t + dt) | π(t) as dt → 0.
• Continuous-time Markov process over partitions, an exchangeable fragmentation-coalescence process [Berestycki, 2004].
• Binary events only: at most one fragmentation or one coagulation at each time:
  • One cluster fragments to two, OR
  • Two clusters coagulate to one.
• Parameters: $\theta = \frac{R}{\alpha}$, where α > 0 and R > 0 are parameters governing the rate of coagulation and fragmentation respectively.
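[Figure: one infinitesimal step from πt to πt+dt, with partitions labelled A, B, C. πt and πt+dt are marginally CRP[n](0, θ); the intermediate partition is marginally CRP[n](R dt, θ).]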
CONTINUOUS FCP - CONT.
Why binary events?
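Fragmentation: for each cluster $c_i$ in π(t)
• $\theta_f = R$; probability of fragmenting to $K_i$ clusters in π(t + dt):
  $P(\rho_i) = \frac{R^{K_i-1}}{\Gamma(n_i)} \prod_{j=1}^{K_i} \Gamma(n_{ij}) \lim_{dt \to 0} dt^{K_i-1} = \begin{cases} O(1), & K_i = 1 \\ O(dt), & K_i = 2 \\ O(dt^2), & K_i > 2 \end{cases}$
Coagulation: K clusters in π(t + dt), n in π(t)
• Set $\theta_c = \frac{1}{\alpha}$:
  $P(\pi(t + dt) \mid \pi(t)) = \alpha^{n-K} \prod_{i=1}^{K} \Gamma(n_i) \lim_{dt \to 0} \frac{dt^{-K+1}}{dt^{-n+1}} = \begin{cases} O(1), & K = n \\ O(dt), & K = n - 1 \\ O(dt^2), & K < n - 1 \end{cases}$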
CONTINUOUS FCP - CONT.
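• CFCP: continuous-time Markov process π = (π(t), t ∈ [0, T]).
• The space of partitions (states) is finite → Markov jump process.
• Transition rate matrix $Q = [q_{ij}]$: $q_{ij}$ is the transition rate from state i to j.
• Total transition rate from state i: $q_i = \sum_{j \neq i} q_{ij}$
  $q_i = \text{fragrate} + \text{coagrate} = R \sum_{c} H_{n_c - 1} + \alpha\, \frac{n(n-1)}{2}$,
  where $H_m$ is the m-th harmonic number, $n_c$ is the size of cluster c and $n$ is the number of clusters in state i.
• Homogeneous Poisson process: initial state π(0) = s, mean number of events $q_s T$.
• Interarrival time ∼ exp($q_s$).
• Equilibrium distribution: $\mathrm{CRP}(0, \frac{R}{\alpha})$.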
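The total rate can be computed from a partition and used to draw the waiting time until the next event, as in a Gillespie-style simulation. The Python sketch below is illustrative only: the function names are hypothetical, partitions are assumed to be collections of frozensets, and `split_rate_sum` checks by brute force that the binary fragmentation rates R Γ(na)Γ(nb)/Γ(nc) of one cluster add up to R times a harmonic number.

```python
import random
from fractions import Fraction
from math import comb, factorial

def harmonic(m):
    """H_m = 1 + 1/2 + ... + 1/m, with H_0 = 0."""
    return sum((Fraction(1, k) for k in range(1, m + 1)), Fraction(0))

def split_rate_sum(n, R):
    """Sum of R*Gamma(a)*Gamma(n-a)/Gamma(n) over all unordered binary splits of n objects."""
    total = Fraction(0)
    for a in range(1, n):
        # comb(n, a) counts ordered splits, so each unordered split is counted twice.
        total += Fraction(comb(n, a) * factorial(a - 1) * factorial(n - a - 1),
                          factorial(n - 1))
    return R * total / 2

def total_rate(partition, R, alpha):
    """q_i = R * sum_c H_{n_c - 1} + alpha * n(n-1)/2, with n the number of clusters."""
    frag = R * sum((harmonic(len(c) - 1) for c in partition), Fraction(0))
    n_clusters = len(partition)
    coag = alpha * Fraction(n_clusters * (n_clusters - 1), 2)
    return frag + coag

def sample_holding_time(partition, R, alpha, rng):
    """Exponential waiting time in the current state (requires a positive total rate)."""
    return rng.expovariate(float(total_rate(partition, R, alpha)))

state = [frozenset({1, 3, 6}), frozenset({2, 7}), frozenset({4, 5, 8}), frozenset({9})]
q = total_rate(state, R=1, alpha=1)
wait = sample_holding_time(state, R=1, alpha=1, rng=random.Random(0))
```

A full simulator would additionally pick which event occurs with probability proportional to its individual rate; that step is omitted here.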
SEQUENTIAL GENERATIVE PROCESS IN CFCP
• Incremental construction: describe the law of the path of object i given the paths of objects 1, . . . , i − 1.
• Let $c_i(t)$ denote the cluster object i belongs to at time t.
• $c_1(t)$ = constant
• t = 0: $\mathrm{CRP}_n(0, \frac{R}{\alpha})$
• t > 0: if the ith object is in an existing cluster c, $c_i(t^-) = c$:
  • If c fragments to a and b: $P(c_i(t) \mid c_i(t^-)) = \begin{cases} \frac{|a|}{|c|-1}, & c_i(t) = a \\ \frac{|b|}{|c|-1}, & c_i(t) = b \end{cases}$
    Same as DFT's probability of following a path.
  • If c coagulates at time t into c′: $P(c_i(t) = c') = 1$
  • If no F/C event involving c takes place, c fragments with rate $\frac{R}{|c|-1}$.
    Same as DFT's branching probability.
• t > 0: if the ith object is alone in its cluster, $c_i(t^-) = \emptyset$, it joins a given existing cluster with rate α, and hence any existing cluster with total rate $\alpha\, |\pi|_{[i-1]}(t)|$.
  Same as Kingman's coalescent.
CONDITIONAL DISTRIBUTION IN CFCP - CONT.
[Figure: an example path through the partitions {1, 3, 6}, {2, 7}, {4, 5, 8}, {9}; {1, 3, 6}, {2, 4, 5, 7, 8, 9}; and {1, 3, 6}, {2, 5, 8}, {4, 7}, {9}. Annotations: θ/(θ+n); Coag, rate: α; 3/5; 2/5; Frag, rate: R/2.]
(figure taken from Yee Whye Teh's slides, MLSS 2011)
CFCP - PROPERTIES
• Rate of fragmentation is the same as for Dirichlet diffusion trees (with constant rate).
• Rate of coagulation is the same as for Kingman's coalescent.
• The Markov process is
  • reversible: fragmentation (DFT) is precisely the converse of coagulation (KC);
  • exchangeable: CRP marginals.
Thank you!