Graph Modeled Data Clustering: Fixed Parameter Algorithms for
Clique Generation
J. Gramm, J. Guo, F. Hüffner and R. Niedermeier, Theory of Computing Systems (2005)
Student: Vishal Kapoor
Presentation Outline
• Problem Introduction
• Past Research
• Results of the paper
• CLUSTER EDITING
– Kernelization
– Search Tree
• CLUSTER DELETION
• Questions
Problem Statement
• Make at most k changes to the edge set of an input graph to obtain a graph consisting of vertex-disjoint cliques.
• Each connected component is a clique in the resulting cluster graph
• CLUSTER EDITING: both edge additions and deletions are allowed
• CLUSTER DELETION: only edge deletions are allowed
• Used in clustering of data – vertices are adjacent iff their similarity exceeds a threshold
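The modeling step above can be sketched in a few lines. This is only an illustration, assuming the data arrives as a symmetric similarity matrix; the names `similarity_graph` and `is_cluster_graph` and the sample values are hypothetical and do not come from the paper.

```python
def similarity_graph(sim, threshold):
    """Build an adjacency dict: i and j are adjacent iff sim[i][j] > threshold."""
    n = len(sim)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] > threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def is_cluster_graph(adj):
    """True iff every connected component is a clique.

    Uses the fact that in a cluster graph the closed neighborhoods
    of the two endpoints of every edge are identical.
    """
    return all(adj[u] | {u} == adj[v] | {v} for u in adj for v in adj[u])


# Hypothetical similarity values, chosen only for illustration.
clean = [[1.0, 0.9, 0.2],
         [0.9, 1.0, 0.1],
         [0.2, 0.1, 1.0]]
noisy = [[1.0, 0.9, 0.2],
         [0.9, 1.0, 0.8],
         [0.2, 0.8, 1.0]]
```

With threshold 0.5, `clean` yields the single edge {0,1}, which is already a cluster graph, while `noisy` yields the path 0-1-2, which needs at least one edge change.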
Past Research
• [2000] Study of both these problems was started by Shamir et al., who proved that they are NP-complete and APX-hard
• [1996] Cai studied graph modification by edge additions, edge deletions and vertex deletions for certain graph classes and showed that the problem is fixed-parameter tractable (FPT)
• [2001] Natanzon et al. gave a general constant-factor approximation for edge deletion and editing problems on bounded-degree graphs, for target graphs with certain properties
• [2002] Khot and Raman investigated the complexity of vertex deletion problems to find subgraphs with hereditary properties
Results of this paper
• CLUSTER EDITING: O(2.27^k + |V|^3)
• CLUSTER DELETION: O(1.77^k + |V|^3)
• By using certain reduction rules, the resulting kernel has size O(k^3)
– It has at most 2k^2 + k vertices and 2k^3 + k^2 edges.
[Figure: two vertices u and v with a common neighbor and a non-common neighbor]
CLUSTER EDITING
Reduction Rules
• Rule 1:
a. If u and v have more than k common neighbors, then {u,v} is set to ADDED and added to E if not already there
b. If u and v have more than k non-common neighbors, then {u,v} is set to DELETED and deleted from E if already there
c. If u and v have both more than k common neighbors and more than k non-common neighbors, then the instance has no solution
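Rule 1 can be sketched as a counting pass over all vertex pairs. The sketch below only computes the labels; updating E and the budget k is left out. It assumes the graph is an adjacency dict of sets with sortable vertex names, and that a non-common neighbor of u and v is a third vertex adjacent to exactly one of them. The function name `rule1_labels` is hypothetical.

```python
from itertools import combinations


def rule1_labels(adj, k):
    """Label vertex pairs according to Rule 1.

    Returns a dict {frozenset({u, v}): "ADDED" or "DELETED"},
    or None if case (c) applies to some pair (no solution).
    """
    labels = {}
    for u, v in combinations(sorted(adj), 2):
        common = (adj[u] & adj[v]) - {u, v}
        # Vertices adjacent to exactly one of u, v (u and v themselves excluded).
        non_common = (adj[u] ^ adj[v]) - {u, v}
        if len(common) > k and len(non_common) > k:
            return None  # Rule 1c
        if len(common) > k:
            labels[frozenset((u, v))] = "ADDED"  # Rule 1a
        elif len(non_common) > k:
            labels[frozenset((u, v))] = "DELETED"  # Rule 1b
    return labels
```

Each pair costs O(|V|) set work, so a full pass matches the O(|V|^3) bound stated for the reduction rules.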
Reduction Rules
• Rule 2:
• For every 3 vertices u, v and w:
a. If {u,v} = ADDED and {u,w} = ADDED, then {v,w} should be set to ADDED and added if not already in E
b. If {u,v} = ADDED and {u,w} = DELETED, then {v,w} should be set to DELETED and deleted from E if already present
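Rule 2 propagates labels until nothing changes, which a short fixed-point loop can illustrate. The sketch assumes labels are stored in a dict keyed by `frozenset` pairs; the name `apply_rule2` is hypothetical, and returning None when two implications contradict each other is an assumption of this sketch, not something the slides state.

```python
from itertools import permutations


def apply_rule2(vertices, labels):
    """Close a label dict under Rule 2.

    labels maps frozenset({x, y}) to "ADDED" or "DELETED".
    Returns the extended dict, or None on a contradiction
    (an assumption of this sketch).
    """
    labels = dict(labels)
    changed = True
    while changed:
        changed = False
        for u, v, w in permutations(vertices, 3):
            uv = labels.get(frozenset((u, v)))
            uw = labels.get(frozenset((u, w)))
            if uv == "ADDED" and uw == "ADDED":
                implied = "ADDED"      # Rule 2a
            elif uv == "ADDED" and uw == "DELETED":
                implied = "DELETED"    # Rule 2b
            else:
                continue
            vw = frozenset((v, w))
            if vw not in labels:
                labels[vw] = implied
                changed = True
            elif labels[vw] != implied:
                return None
    return labels
```

Applying the resulting labels to E (adding ADDED pairs, removing DELETED pairs) is a separate step not shown here.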
Running Time
• What is checked?
– Every pair of vertices
– For each pair, every vertex which is a neighbor of both of them
• Takes time O(|V|^3)
Kernel Size
• The kernel contains at most (2k+1)·k = 2k^2 + k vertices and at most (2k+1 choose 2)·k = 2k^3 + k^2 edges.
• Proof Skipped
Branch and Search Algorithm
• Identify a bad triple (3 vertices with exactly two edges among them) in the kernel and repair it by adding or deleting an edge, to transform the graph into disjoint cliques
• Overall at most k edge additions/deletions are allowed
• 2 branching strategies:
– Basic = O(3^k)
– Advanced = O(2.27^k)
• Lemma: A graph consists of disjoint cliques iff there are no three vertices u,v,w such that {u,v}, {u,w} are edges, but {v,w} is not an edge
• i.e. any three vertices must have either at most one edge among them or form a triangle
• Thus if a graph is not a union of disjoint cliques, then a bad triple can be found and repaired
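The lemma translates directly into a search for a bad triple. The sketch below assumes an adjacency dict of sets with sortable vertex names; `find_bad_triple` is a hypothetical helper name.

```python
def find_bad_triple(adj):
    """Return (u, v, w) with edges {u,v}, {u,w} but no edge {v,w}, or None.

    By the lemma, None means the graph is a union of disjoint cliques.
    """
    for u in adj:
        nbrs = sorted(adj[u])
        for i, v in enumerate(nbrs):
            for w in nbrs[i + 1:]:
                if w not in adj[v]:
                    return (u, v, w)
    return None


path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
```

On `path` the middle vertex is reported as u; on `triangle` no bad triple exists.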
Basic Branching
[Figure: a bad triple u, v, w with edges {u,v} and {u,w} but no edge {v,w}]
Basic Branch Algorithm
1. If G is a union of disjoint cliques, return SUCCESS
2. If k <= 0, return FAIL
3. Otherwise, find 3 vertices u,v,w such that edges {u,v}, {u,w} exist and {v,w} does not, and branch into 3 instances (G', k') as follows:
a. E' = E – {u,v}, k' = k-1 and set {u,v} = DELETED
b. E' = E – {u,w}, k' = k-1 and set {u,w} and {v,w} = DELETED, {u,v} = ADDED
c. E' = E + {v,w}, k' = k-1 and set all three edges {u,v}, {u,w}, {v,w} = ADDED
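The three-way branching can be sketched as a small recursive search. This is a simplified illustration rather than the paper's implementation: it omits the ADDED/DELETED annotations (which only prune the search) and the kernelization, and the names `find_bad_triple`, `toggle_edge` and `cluster_editing` are hypothetical.

```python
def find_bad_triple(adj):
    """Return (u, v, w) with edges {u,v}, {u,w} but no edge {v,w}, or None."""
    for u in adj:
        nbrs = sorted(adj[u])
        for i, v in enumerate(nbrs):
            for w in nbrs[i + 1:]:
                if w not in adj[v]:
                    return (u, v, w)
    return None


def toggle_edge(adj, a, b):
    """Delete edge {a,b} if present, otherwise add it (in place)."""
    if b in adj[a]:
        adj[a].remove(b)
        adj[b].remove(a)
    else:
        adj[a].add(b)
        adj[b].add(a)


def cluster_editing(adj, k):
    """Return a list of at most k edits turning adj into disjoint cliques, or None."""
    triple = find_bad_triple(adj)
    if triple is None:
        return []                      # step 1: already a cluster graph
    if k <= 0:
        return None                    # step 2: budget exhausted
    u, v, w = triple
    branches = (("delete", u, v), ("delete", u, w), ("add", v, w))
    for op, a, b in branches:          # step 3: branches a, b, c
        toggle_edge(adj, a, b)
        rest = cluster_editing(adj, k - 1)
        toggle_edge(adj, a, b)         # undo before trying the next branch
        if rest is not None:
            return [(op, a, b)] + rest
    return None
```

Every solution must change at least one of the three pairs of a bad triple, so trying all three with budget k-1 is exhaustive and the search tree has at most 3^k leaves.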
Branching Rules
[Figure: a bad triple u, v, w and the three branches BR1 (delete {u,v}), BR2 (delete {u,w}) and BR3 (add {v,w})]
Running time
• The algorithm solves CLUSTER EDITING in time O(3^k · k^2 + |V|^3)
1. O(|V|^3) is the time required to find all bad triples
2. O(3^k) is the size of the search tree
3. The kernel (modified input G') has |V| = O(k^2) vertices. So a newly added/deleted edge can create/delete at most O(k^2) bad triples. [And the edge list can then be updated only for vertices affected by that edge in O(k^2) time.]
NOTE: The time can be improved to O(3^k + |V|^3) by repeating the kernelization at every search tree node whenever possible; this works because the problem kernel has polynomial size
• Similarly, CLUSTER DELETION can be solved in time O(2^k + |V|^3)
Advanced Branch Algorithm
1. Bad triples are considered, but their classification is refined further as follows:
[Figure: the three cases C1, C2 and C3 for a bad triple u, v, w]
Branching for each case
• For C1: BR3 cannot give a solution better than both BR1 and BR2 and can be omitted
• If N(v) >= N(w), where N(x) is used here for the number of neighbors of x, then the number of edges changed to make 1 clique >= the number of edges changed to make 2 cliques
[Figure: case C1 for the bad triple u, v, w, with further vertices u1, u2, v1, v2, w1, w2]
• Edges added to make 1 clique:
– {v,w} added = +1
– {v,N(w)} added – {u,v} existing = N(w) – 1
– {w,N(v)} added – {u,w} existing = N(v) – 1
– joining all N(w) and N(v) = ([N(w)+N(v)] choose 2)
– joining each N(v) and N(w) with u = N(v)+N(w)
– Total = 2·[N(v) + N(w)] + ([N(w)+N(v)] choose 2) – 1 => (A)
• Edges changed to make 2 cliques:
– N(w) deleted = N(w)
– {v,N(w)} added – {u,v} existing = N(w) – 1
– joining all N(w) and N(v) = ([N(w)+N(v)] choose 2)
– joining each N(v) and N(w) with u = N(v)+N(w)
– Total = N(v) + 3·N(w) + ([N(w)+N(v)] choose 2) – 1 => (B)
• Conclusion: (A) – (B) = N(v) – N(w), and N(v) >= N(w), so (A) >= (B).
• Thus only BR1 and BR2 can be used:
• So the resulting graphs are G – {u,v} or G – {u,w}, and the branching vector is (1,1)
• The final recurrence relation is T(k) = 2·T(k-1), whose branching number (root) is 2.
• So the final search tree size for C1 is 2^k.
[Figure: the two remaining branches BR1 and BR2 for case C1]
• For C2:
• Branching Vector = (1,2,3,2,3)
• For C3:
• Branching Vector = (1,2,3,2,3)
Overall Running Time
• Solve T(k) = T(k-1) + 2 [T(k-2) + T(k-3)]
• So the final worst-case search tree size is O(2.27^k)
• Thus CLUSTER EDITING can be solved in O(2.27^k + |V|^3)
• Cases for CLUSTER DELETION:
• Branching Vector = (2,3,2,3) and running time = O(1.77^k + |V|^3)
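The bases 2.27 and 1.77 can be checked numerically from the branching vectors. For a vector (d1, ..., dr), the search tree size is O(x^k), where x is the root greater than 1 of x^(-d1) + ... + x^(-dr) = 1. The sketch below finds that root by bisection; the function name `branching_number` and the tolerance are choices made for this illustration.

```python
def branching_number(vector, tol=1e-9):
    """Root x > 1 of sum(x**-d for d in vector) == 1, found by bisection.

    Assumes at least two branches with positive integer entries, so the
    sum is strictly decreasing in x and crosses 1 exactly once.
    """
    lo, hi = 1.0, float(len(vector)) + 1.0  # sum > 1 at lo, sum < 1 at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(mid ** -d for d in vector) > 1:
            lo = mid
        else:
            hi = mid
    return hi


editing = branching_number((1, 2, 3, 2, 3))   # cases C2 and C3
deletion = branching_number((2, 3, 2, 3))     # CLUSTER DELETION
basic = branching_number((1, 1, 1))           # basic three-way branching
```

Rounded to two decimals, `editing` and `deletion` agree with the 2.27 and 1.77 quoted above, and `basic` recovers the base 3 of the simple search tree.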
Questions?
Thanks.