A Framework For Community Identification in Dynamic Social Networks
Chayant Tantipathananandh, Tanya Berger-Wolf, David Kempe
Presented by Victor Lee
Outline of Presentation
• The Challenge: Dynamic Social Networks
• Framework and Problem Formulation
• Individual and Group Colorings
• Group Coloring Heuristics
• Experimental Results
• Future Directions
The Problem
• Many well-known approaches to identify communities in social networks:
– Graph partitioning
– Clustering
– Various measures of closeness or density
• But these approaches generally assume static networks
• Most social networks are dynamic
Dynamic Social Networks
• Social networks change over time
– Membership changes
– Interaction changes
• Most community identification techniques:
– Use a single snapshot
– Or use time-averaged measurements
– Lose important information
Importance of Dynamic Information
• Networks 1 and 2 have the same average characteristics, but…
– Network 1 shows an oscillation
– Network 2 suggests that C joins the community
Individuals observed at each time step:
Time        T1      T2       T3      T4       T5       T6
Network 1   {A,B}   {A,B,C}  {A,B}   {A,B,C}  {A,B}    {A,B,C}
Network 2   {A,B}   {A,B}    {A,B}   {A,B,C}  {A,B,C}  {A,B,C}
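To make the point concrete, here is a small Python sketch (the function names are illustrative, not from the slides). It computes a time-averaged statistic, C's average presence, and a temporal one, the number of times C enters or leaves. Both networks give the same average, but the change counts differ.

```python
# Individuals observed at T1..T6 in the two example networks.
network1 = [{"A", "B"}, {"A", "B", "C"}] * 3       # C oscillates in and out
network2 = [{"A", "B"}] * 3 + [{"A", "B", "C"}] * 3  # C joins at T4 and stays

def average_presence(snapshots, who):
    """Fraction of time steps at which `who` is observed (a time-averaged statistic)."""
    return sum(who in s for s in snapshots) / len(snapshots)

def membership_changes(snapshots, who):
    """Number of consecutive time steps between which `who` enters or leaves."""
    present = [who in s for s in snapshots]
    return sum(a != b for a, b in zip(present, present[1:]))

for name, net in [("Network 1", network1), ("Network 2", network2)]:
    print(name, average_presence(net, "C"), membership_changes(net, "C"))
# Network 1 0.5 5
# Network 2 0.5 1
```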
Proposal
• New framework for modeling social networks over time
• Algorithms and Heuristics to identify dynamic communities
• Experiments to verify the concept and the computational performance
Problem Formulation
• Given:
– A set of individuals
– A sequence of snapshot observations
• Find:
– A best-fit set of time-varying communities C(t)
– A best-fit time-varying community membership for each individual
• Approach:
– Combinatorial optimization
– Graph coloring
Model: Individuals and Groups
• Set of individuals $X = \{i_1, i_2, \dots, i_n\}$
• Sequence of observations $\langle P_1, P_2, \dots, P_T \rangle$
– Discrete time
– Each observation records interactions between individuals
• The set of individuals interacting at time t defines a group.
– If A interacts with B, and B interacts with C, then {A, B, C} ⊆ a group
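Because interaction chains place individuals in the same group, the groups of one snapshot are the connected components of its interaction graph. A minimal union-find sketch, assuming each snapshot is given as a list of interacting pairs and that an individual observed without any interaction forms a singleton group (`groups_from_interactions` is a hypothetical name):

```python
def groups_from_interactions(individuals, interactions):
    """Return the groups (connected components) of one snapshot graph.

    individuals:  iterable of individual ids observed at time t
    interactions: iterable of (u, v) pairs observed to interact at time t
    """
    parent = {i: i for i in individuals}

    def find(x):
        # Follow parent pointers to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in interactions:
        parent.setdefault(u, u)
        parent.setdefault(v, v)
        parent[find(u)] = find(v)  # merge the two components

    components = {}
    for i in parent:
        components.setdefault(find(i), set()).add(i)
    return [frozenset(c) for c in components.values()]

# A-B and B-C interact, D interacts with no one: {A, B, C} is one group.
print(sorted(sorted(g) for g in groups_from_interactions("ABCD", [("A", "B"), ("B", "C")])))
# [['A', 'B', 'C'], ['D']]
```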
Group vs Community
• Snapshot graph
– Individual is a vertex
– Interaction is an edge
– Group is a connected component
– Assumption: interaction is sufficiently limited that the graph is not connected (we have disjoint groups)
• Group ≠ community
– Groups capture observed interaction at a point in time
– Communities extend over time
Graphing the Observations
• Each time slice is one observation
• Edges within a time slice show observed interaction at time t
• Add edges joining all observations of the same individual
• No edges between groups from one time to another
(Legend: ○ = individual, □ = group)
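The layered graph can be sketched as follows. This assumes that each individual vertex is joined to its group vertex within a time slice, and that each observation of an individual is joined to that individual's previous observation. Both are our reading of the figure, not confirmed details, and `observation_graph` is a hypothetical name.

```python
def observation_graph(snapshots):
    """Build vertex and edge sets of the layered observation graph.

    snapshots[t] is a list of groups (sets of individual ids) observed at time t.
    Vertices are ("ind", i, t) for individuals and ("grp", k, t) for group k at time t.
    """
    vertices, edges = set(), set()
    last_seen = {}  # individual -> time of its previous observation
    for t, groups in enumerate(snapshots):
        for k, group in enumerate(groups):
            g = ("grp", k, t)
            vertices.add(g)
            for i in group:
                v = ("ind", i, t)
                vertices.add(v)
                edges.add((v, g))  # individual belongs to this group at time t
                if i in last_seen:  # join consecutive observations of the same individual
                    edges.add((("ind", i, last_seen[i]), v))
                last_seen[i] = t
    # No edge ever joins two group vertices, even across time steps.
    return vertices, edges

V, E = observation_graph([[{"A", "B"}], [{"A"}, {"B", "C"}]])
print(len(V), len(E))
# 8 7
```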
Refine the Problem
• A community appears as a sequence of groups, with at most one group per time slice.
• Tasks:
– Assign each group to a community (color the group vertices)
– Assign each individual to a community, for each time step (color the individual vertices)
• More assumptions:
– Individuals belong to one community at a time
– Individuals don’t change community frequently
– Individuals frequently appear in their community
Cost Model
• Quantify a “good” community identification
• Assign costs to undesirable behavior:
– I-cost: $\alpha$ when an individual changes color
– G-costs:
• $\beta_1$ when an individual is absent from its community
• $\beta_2$ when an individual is present in a different community
– C-cost: $\gamma$ for each color that an individual uses
• Find a coloring with minimum cost
Coloring Choices and Costs
• Coloring 1: C changes community and then changes back.
– Cost = $2\alpha$ (+ $\gamma$ if this color hasn’t been used before)
• Coloring 2: C stays in its original community and just visits.
– Cost = $\beta_1 + \beta_2$
• The optimal coloring depends on whether $\beta_1 + \beta_2$ is smaller than $2\alpha + \gamma$ (new color) or $2\alpha$ (previously used color)
[Figure: Coloring 1 and Coloring 2 of individuals A, B, C, and D over time steps T1 to T4]
At time T3, C temporarily changes its interaction.
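The choice between the two colorings is a direct cost comparison. A minimal sketch, writing the cost weights as `alpha` (I-cost), `beta1` and `beta2` (G-costs), and `gamma` (C-cost). The function names are hypothetical, and ties are arbitrarily resolved in favor of Coloring 1.

```python
def coloring1_cost(alpha, gamma, color_is_new):
    """Coloring 1: C switches community at T3 and back at T4 (two color changes)."""
    return 2 * alpha + (gamma if color_is_new else 0)

def coloring2_cost(beta1, beta2):
    """Coloring 2: C keeps its color, so at T3 it is absent from its own
    community (beta1) and present in a different one (beta2)."""
    return beta1 + beta2

def preferred_coloring(alpha, beta1, beta2, gamma, color_is_new=True):
    """Return the cheaper coloring; ties go to Coloring 1."""
    if coloring2_cost(beta1, beta2) < coloring1_cost(alpha, gamma, color_is_new):
        return "Coloring 2"
    return "Coloring 1"

print(preferred_coloring(alpha=1, beta1=1, beta2=1, gamma=1))  # 2 < 3: Coloring 2
print(preferred_coloring(alpha=1, beta1=1, beta2=3, gamma=1))  # 4 > 3: Coloring 1
```

Raising the visiting cost $\beta_2$ makes switching communities cheaper than visiting.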
Finding Optimal Colorings
• Finding the optimal solution is NP-hard
• Partition the problem:
1. Find an optimal set of communities
2. Find an optimal assignment of individuals to communities
• If Phase 1 (group coloring) is completed first:
– Phase 2 is reduced from $O(2^N)$ to $O(2^G)$, where N = # of individuals and G = # of groups
– The cost incurred by one individual’s coloring is independent of the colors chosen by others.
Independence of Individual Color Choice
Proof:
• Cost of an individual’s coloring = I-cost + G-cost + C-cost
• Costs are assessed individually:
– I-cost = $\alpha \cdot$ (# of color changes)
– G-cost = $\beta_1 \cdot$ (# of absences from its group) + $\beta_2 \cdot$ (# of visits to other groups)
– C-cost = $\gamma \cdot$ (# of colors that the individual uses)
• So we can solve for each individual one at a time.
• Moreover, we can assess cost incrementally, from time t to time t+1…
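The weighted sum can be evaluated for one individual without looking at anyone else's colors. Below is a minimal sketch that names the weights `alpha`, `beta1`, `beta2`, and `gamma`. It assumes an individual counts as absent when its community has a group at time t that does not contain it. The function name and input encoding are hypothetical.

```python
def individual_cost(colors, group_colors, present_colors, alpha, beta1, beta2, gamma):
    """Cost of one individual's coloring, independent of everyone else's.

    colors[t]         : color (community) assigned to the individual at step t
    group_colors[t]   : color of the group it is observed in at step t, or None
    present_colors[t] : set of colors that have a group at step t
    """
    # I-cost: alpha per color change between consecutive steps.
    i_cost = alpha * sum(a != b for a, b in zip(colors, colors[1:]))
    # G-cost (absence): its community has a group at step t, but it is not in it.
    absences = sum(c in present and g != c
                   for c, g, present in zip(colors, group_colors, present_colors))
    # G-cost (visit): it is observed in a group of a different color.
    visits = sum(g is not None and g != c for c, g in zip(colors, group_colors))
    # C-cost: gamma per distinct color used.
    c_cost = gamma * len(set(colors))
    return i_cost + beta1 * absences + beta2 * visits + c_cost

# Seen with "red" at T1, missing at T2, seen with "blue" at T3, back with "red" at T4.
print(individual_cost(["red"] * 4, ["red", None, "blue", "red"],
                      [{"red", "blue"}] * 4, alpha=1, beta1=1, beta2=1, gamma=1))
# 4  (two absences + one visit + one color)
```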
Individual Coloring Algorithm
• C = set of all colors observed to be used by an individual
• $\mathcal{S}(t) = \{S \subseteq C : 1 \le |S| \le t\}$: all possible subsets of colors up to time t
• $G(t, x)$ = G-cost to use color x at time t
• $I(t, x, y)$ = I-cost to use color y at time t-1 and color x at time t
• $C(x, R)$ = C-cost to use color x when color set R has been used
Minimum cost $D(t, S, x)$ at time t, using color x, with color set S used:
• At time t = 1: $D(1, \{x\}, x) = G(1, x)$
• At time t > 1: $D(t, S, x) = G(t, x) + \min_{R \in \mathcal{S}(t-1),\; y \in R,\; R \cup \{x\} = S} \big[ D(t-1, R, y) + I(t, x, y) + C(x, R) \big]$
(i-cost: changing color; g-cost: wrong group; c-cost: new color)
Optimal Individual Coloring
• Given a group coloring, the minimum cost of coloring individual i is $\min_{S \in \mathcal{S}(T),\; x \in S} D(T, S, x)$
• Time complexity is $O(nT|C|^2 2^{|C|})$
• Space requirement is $O(|C| 2^{|C|})$
• If the number of colors |C| is small, the computation is tractable.
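A minimal Python sketch of the dynamic program, writing the minimum-cost function as D(t, S, x). It assumes the following, none of which is confirmed by the slides:
- I(t, x, y) = α when x ≠ y, and 0 otherwise.
- C(x, R) = γ when x ∉ R, and 0 otherwise.
- The G-cost follows the β₁/β₂ rules of the cost model.

As in the recurrence, the first color incurs no C-cost. `optimal_individual_coloring` and `make_g_cost` are hypothetical names. Each DP state also keeps one coloring that achieves its cost.

```python
def optimal_individual_coloring(colors, T, G, alpha, gamma):
    """Return (min cost, coloring) for one individual over time steps 1..T."""
    # State (S, x): S = frozenset of colors used so far, x = color at the current step.
    # Value: (D(t, S, x), one coloring that achieves it).
    D = {(frozenset([x]), x): (G(1, x), (x,)) for x in colors}  # D(1, {x}, x) = G(1, x)
    for t in range(2, T + 1):
        nxt = {}
        for (R, y), (prev, path) in D.items():
            for x in colors:
                cost = (G(t, x) + prev
                        + (alpha if x != y else 0)       # I(t, x, y): color change
                        + (gamma if x not in R else 0))  # C(x, R): first use of x
                key = (R | {x}, x)                       # S = R ∪ {x}
                if key not in nxt or cost < nxt[key][0]:
                    nxt[key] = (cost, path + (x,))
        D = nxt
    # Minimize D(T, S, x) over all reachable states (S, x).
    return min(D.values(), key=lambda v: v[0])


def make_g_cost(observed, present, beta1, beta2):
    """Hypothetical G(t, x): observed[t-1] is the color of the individual's group at
    time t (None if unobserved); present[t-1] is the set of colors with a group at t."""
    def G(t, x):
        obs, here = observed[t - 1], present[t - 1]
        absent = x in here and obs != x          # its community met without it
        visiting = obs is not None and obs != x  # seen in a group of another color
        return beta1 * absent + beta2 * visiting
    return G


# The individual's group is "red" except at time 3, when it is seen in "blue".
G = make_g_cost(["red", "red", "blue", "red"], [{"red", "blue"}] * 4, beta1=1, beta2=1)
print(optimal_individual_coloring(["red", "blue"], 4, G, alpha=1, gamma=1))
# (2, ('red', 'red', 'red', 'red'))
```

With `beta2=3` the same call prefers switching to "blue" at time 3 and back. This mirrors the comparison between Coloring 1 and Coloring 2.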
Optimal Group Coloring
• Determine the best mapping of groups at time t to groups at time t+1
• Groups that are mapped across time are part of the same community and have the same color
• A coloring is good if most individuals can retain their color from step to step.
[Figure: a possible coloring]
Bipartite Matching Heuristic
• Matching graph
– For each pair of groups g, g′ at times t and t′ = t+1, add a weighted edge from $v_{g,t}$ to $v_{g',t'}$
– Weight = |g ∩ g′| (similarity of g to g′)
• Find the maximum-weight bipartite matching
• Evaluation
– Weights i-cost more heavily than g-cost
– Performs well if membership is fairly stable
– No long-range perspective
– More efficient heuristics?
(i-cost: changing color; g-cost: wrong group; c-cost: new color)
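The matching step between two consecutive time steps can be sketched as below. To stay dependency-free, this sketch swaps in brute force over permutations, so it only suits a handful of groups. For real inputs a polynomial-time assignment solver would be the natural choice, for example SciPy's `scipy.optimize.linear_sum_assignment` with `maximize=True`. The function names are hypothetical. Unmatched groups are assumed to receive fresh colors.

```python
from itertools import permutations

def max_weight_matching(groups_t, groups_next):
    """Brute-force maximum-weight bipartite matching between the groups at time t
    and at time t+1, with edge weight |g ∩ g'|. Exponential: small inputs only."""
    n = max(len(groups_t), len(groups_next))
    # Pad both sides with None so that every matching is a permutation.
    left = list(range(len(groups_t))) + [None] * (n - len(groups_t))
    right = list(range(len(groups_next))) + [None] * (n - len(groups_next))
    best_weight, best_pairs = -1, []
    for perm in permutations(right):
        # Keep only real pairs that share at least one individual.
        pairs = [(i, j) for i, j in zip(left, perm)
                 if i is not None and j is not None and groups_t[i] & groups_next[j]]
        weight = sum(len(groups_t[i] & groups_next[j]) for i, j in pairs)
        if weight > best_weight:
            best_weight, best_pairs = weight, pairs
    return best_weight, best_pairs

def propagate_colors(colors_t, groups_t, groups_next, next_free):
    """Matched groups at t+1 inherit their partner's color at t;
    unmatched groups get fresh colors starting at next_free."""
    _, pairs = max_weight_matching(groups_t, groups_next)
    colors_next = [None] * len(groups_next)
    for i, j in pairs:
        colors_next[j] = colors_t[i]
    for j, c in enumerate(colors_next):
        if c is None:
            colors_next[j] = next_free
            next_free += 1
    return colors_next, next_free

groups_t = [{"A", "B", "C"}, {"D", "E"}]
groups_next = [{"D", "E", "C"}, {"A", "B"}, {"F"}]
print(propagate_colors([0, 1], groups_t, groups_next, next_free=2))
# ([1, 0, 2], 3)
```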
Greedy Heuristics for Group Coloring
• Approach: Maximize pairwise similarity between groups, for all pairs of groups over all timesteps
• Jaccard’s index: $\mathrm{Jac}(g, g') = \frac{|g \cap g'|}{|g \cup g'|}$ (overlap between g and g′, scaled to the sizes of g and g′)
• Weighted for temporal proximity, for g at time t and g′ at time t′: $\mathrm{JacD}(g, g') = \frac{\mathrm{Jac}(g, g')}{|t - t'|}$
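Both similarity measures take only a few lines. Here is a sketch with groups represented as Python sets and explicit time indices (the function names are illustrative). JacD is only defined for groups at different times (t ≠ t′).

```python
def jac(g, h):
    """Jaccard index |g ∩ h| / |g ∪ h| of two groups (sets of individuals)."""
    union = g | h
    return len(g & h) / len(union) if union else 0.0

def jac_d(g, t, h, t_prime):
    """Jaccard index discounted by the time gap |t - t'| (requires t != t')."""
    return jac(g, h) / abs(t - t_prime)

g, h = {"A", "B", "C"}, {"B", "C", "D"}
print(jac(g, h))          # 2 shared / 4 total = 0.5
print(jac_d(g, 5, h, 3))  # 0.5 / 2 = 0.25
```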
Greedy Heuristics for Group Coloring
• Greedy Heuristic 1 (time is not a factor)
– Construct a square similarity matrix with one row and one column per group
– Use agglomerative clustering
• Greedy Heuristic 2 (look backward in time)
For t = 1 to T:
– Match the most similar pairs g, g′, where g is at time t and g′ is at any time t′ < t
– If the similarity is 0 or all colors have been used, add a new color
• Greedy Heuristic 3 (look back the shortest interval)
– Like Heuristic 2, but use only t′, the time closest to t such that similarity(g, g′) > 0 for some g′
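A minimal sketch of Greedy Heuristic 2, under two assumptions. First, JacD is the similarity measure. Second, "all colors have been used" means every color carried by a similar earlier group is already taken by another group at time t. Time steps are 0-based in the code, and `greedy_backward_coloring` is a hypothetical name.

```python
def greedy_backward_coloring(snapshots):
    """snapshots[t] is a list of non-empty groups (sets) at time t.
    Returns {(t, k): color} for group k at time t."""
    def sim(g, t, h, tp):
        # JacD: Jaccard index discounted by the time gap (assumed similarity).
        return len(g & h) / len(g | h) / abs(t - tp)

    color, next_color = {}, 0
    for t, groups in enumerate(snapshots):
        # All positive-similarity pairs (g at time t, g' at some earlier time t').
        cands = [(sim(g, t, h, tp), k, (tp, j))
                 for k, g in enumerate(groups)
                 for tp in range(t)
                 for j, h in enumerate(snapshots[tp])]
        cands = [c for c in cands if c[0] > 0]
        cands.sort(key=lambda c: c[0], reverse=True)  # most similar pairs first
        taken = set()  # colors already given to a group at time t
        for _, k, earlier in cands:
            c = color[earlier]
            if (t, k) not in color and c not in taken:
                color[(t, k)] = c
                taken.add(c)
        # Groups with no usable match (similarity 0 or colors taken) get a new color.
        for k in range(len(groups)):
            if (t, k) not in color:
                color[(t, k)] = next_color
                next_color += 1
    return color

snapshots = [[{"A", "B", "C"}, {"D", "E"}],
             [{"A", "B"}, {"D", "E", "F"}],
             [{"A", "B", "C"}, {"G"}]]
print(greedy_backward_coloring(snapshots))
```

For Heuristic 3, the inner loop over `tp` would be restricted to the latest earlier time that has a positive similarity with g.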
Experiment 1: Verify the Framework
• Does the framework capture the intuitive concept of dynamic community?
• Procedure
– Construct small, synthetic datasets
– Use exhaustive search to get a truly optimal coloring
Experiment 1A: “Assembly Line”
Cost settings: (A) (1, 0, 1, 1); (B) (1, 0, 3, 1)
• At each time step, 1 member leaves and 1 enters a group, resulting in a complete membership change in 3 steps.
• Results change as costs change. (A) favors stable membership. (B) allows for more fluid membership.
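A hypothetical generator for an "assembly line" dataset is sketched below. It assumes a single group of three members, so that membership turns over completely in three steps, and uses integer ids for individuals.

```python
def assembly_line(T, size=3):
    """One group per time step; each step the oldest member leaves and a new one enters."""
    return [[set(range(t, t + size))] for t in range(T)]

snapshots = assembly_line(5)
print([sorted(groups[0]) for groups in snapshots])
# [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
print(snapshots[0][0] & snapshots[3][0])  # set(): no member of the T1 group remains at T4
```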
Experiment 1B: “Dutiful Children”
• Individuals 2, 3, and 4 are children; 0 and 1 are parents who visit a different child at each time step.
• Results: Framework succeeds at detecting the individual children as well as the visitation pattern.
Cost settings: (A) (1, 0, 1, 1); (B) (1, 0, 3, 1)
Experiment 2: Quality of Heuristic Results
• Do the heuristics obtain colorings similar to those of an exhaustive search?
• Procedure
– Re-test the synthetic datasets using the various heuristics
Results: at least one heuristic obtains the same coloring and total cost as exhaustive search.
Experiment 3: Real World Datasets
• Do the framework and heuristics together obtain expected results using real-world datasets?
Experiment 3A: “Southern Women”
• Eighteen women in 1933 in Natchez, Mississippi
• Tracks their attendance at 14 social events
Experiment 3A: Prior Results
• Twenty-one analyses (1941 to 2001) all show similar results
– Two clear communities
– The membership of individuals 8, 9, and 16 is less certain
Experiment 3A: Results
• Detects 4 communities, which are subsets of the traditional 2 communities
• Individuals 6 and 10 change membership over time
• By adjusting cost factors, the results of most of the 21 prior analyses can be duplicated
Cost settings: (1, 1, 1, 1)
Experiment 3B: “Grevy’s Zebra”
• 28-member zebra herd observed 44 times over 3 months in 2002
• The graph to the left shows the aggregate interaction.
• Temporal information is lost.
Experiment 3B: Results
• Inferred communities agree with manual results obtained by biologists
– 4 stable communities
– Some short-lived communities and some visiting behavior
Conclusions
• We present a framework for identifying communities in dynamic social networks
• The framework produces meaningful results compared to traditional methods
• Heuristic methods produce near-optimal solutions
• Future directions
– Develop an approximation algorithm that guarantees the quality of the result
– Investigate scalability over network size and time
– Relax assumptions about interaction and dynamics