A Framework For Community Identification in Dynamic Social Networks

Chayant Tantipathananandh, Tanya Berger-Wolf, David Kempe

Presented by Victor Lee

Outline of Presentation

• The Challenge: Dynamic Social Networks

• Framework and Problem Formulation

• Individual and Group Colorings

• Group Coloring Heuristics

• Experimental Results

• Future Directions

The Problem

• Many well-known approaches to identify communities in social networks

– Graph Partitioning

– Clustering

– Various measures of closeness or density

• But, these approaches generally assume static networks

• Most social networks are dynamic

Dynamic Social Networks

• Social networks change over time

– Membership changes

– Interaction changes

• Most community identification techniques:

– Use a single snapshot

– Or use time-averaged measurements

– Lose important information

Importance of Dynamic Information

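• Networks 1 and 2 have the same average characteristics, but…

– Network 1 shows an oscillation

– Network 2 suggests that C joins the community

[Figure: Networks 1 and 2 observed over time steps T1 to T6. In Network 1 the group alternates between {A, B} and {A, B, C}. In Network 2 the group is {A, B} for the first three time steps and {A, B, C} for the last three.]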
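The point of the two networks can be checked with a few lines of Python. This is only an illustrative sketch: the snapshot lists are transcribed from the figure, and the helper names `presence_fraction` and `presence_changes` are made up for this example. It shows that a time-averaged measure cannot tell the networks apart, while a simple count of changes can.

```python
# Snapshots transcribed from the figure: one observed group per time step T1..T6.
network_1 = [{"A", "B"}, {"A", "B", "C"}, {"A", "B"},
             {"A", "B", "C"}, {"A", "B"}, {"A", "B", "C"}]
network_2 = [{"A", "B"}, {"A", "B"}, {"A", "B"},
             {"A", "B", "C"}, {"A", "B", "C"}, {"A", "B", "C"}]


def presence_fraction(snapshots, who):
    """Time-averaged measure: fraction of time steps in which `who` is in the group."""
    return sum(who in group for group in snapshots) / len(snapshots)


def presence_changes(snapshots, who):
    """Dynamic measure: number of times `who` switches between present and absent."""
    flags = [who in group for group in snapshots]
    return sum(1 for prev, cur in zip(flags, flags[1:]) if prev != cur)


for name, net in (("Network 1", network_1), ("Network 2", network_2)):
    print(name, presence_fraction(net, "C"), presence_changes(net, "C"))
```

Both networks give C the same presence fraction, but Network 1 has C switching at every step while Network 2 has a single switch.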

Proposal

• New framework for modeling social networks over time

• Algorithms and Heuristics to identify dynamic communities

• Experiments to verify the concept and the computational performance

Problem Formulation

• Given:

– A set of individuals

– A sequence of snapshot observations

• Find:

– A best-fit set of time-varying communities C(t)

– Best-fit time-varying community membership for each individual

• Approach:

– Combinatorial optimization

– Graph coloring

Model: Individuals and Groups

• Set of individuals $X = \{i_1, i_2, \ldots, i_n\}$

• Sequence of observations $\langle P_1, P_2, \ldots, P_T \rangle$

– Discrete time

– Each observation records the interactions between individuals

• The set of individuals interacting at time t defines a group.

– If A interacts with B, and B interacts with C, then {A, B, C} ⊆ a group

Group vs Community

• Snapshot Graph

– Individual is a vertex

– Interaction is an edge

– Group is a connected subgraph

– Assumption: interaction is sufficiently limited so that the graph is not connected (we have disjoint groups)

• Group ≠ Community

– Groups capture observed interaction at a point in time

– Communities extend over time
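As a rough sketch of how groups could be read off one snapshot graph, the Python below treats each group as a connected component of the interaction graph. The function name `groups_from_interactions` and the example individuals are hypothetical. Returning an individual with no interactions as a singleton group is an assumption of this sketch, not something the slides specify.

```python
def groups_from_interactions(individuals, interactions):
    """Return the connected components of one snapshot graph as a list of sets.

    individuals:  iterable of vertex labels observed in this time slice
    interactions: iterable of (a, b) pairs, one per observed interaction (edge)
    """
    individuals = list(individuals)
    adjacency = {person: set() for person in individuals}
    for a, b in interactions:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    groups = []
    for start in individuals:
        if start in seen:
            continue
        # Depth-first search collects everyone reachable from `start`.
        component = set()
        stack = [start]
        while stack:
            person = stack.pop()
            if person in component:
                continue
            component.add(person)
            stack.extend(adjacency[person] - component)
        seen |= component
        groups.append(component)
    return groups


# A interacts with B and B with C, so {A, B, C} form one group; D and E form another.
snapshot = groups_from_interactions(
    ["A", "B", "C", "D", "E"],
    [("A", "B"), ("B", "C"), ("D", "E")],
)
print(snapshot)
```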

Graphing the Observations

• Each time slice is one observation

• Edges within a time slice show observed interaction at time t

• Add edges joining all observations of the same individual

• No edges between groups from one time to another

[Figure legend: ○ = individual, □ = group]

Refine the Problem

• A community appears as a sequence of groups, of at most one group per time slice.

• Tasks:

– Assign each group to a community (color the group vertices)

– Assign each individual to a community, for each time step (color individual vertices)

• More Assumptions:

– Individuals belong to one community at a time

– Individuals don’t change community frequently

– Individuals frequently appear in their community

Cost Model

• Quantify a “good” community identification

• Assign costs to undesirable behavior:

– I-cost: $\alpha$ when an individual changes color.

– G-costs:

• $\beta_1$ when an individual is absent from its community.

• $\beta_2$ when an individual is present in a different community.

– C-cost: $\gamma$ for each color that an individual uses.

• Find a coloring with minimum cost

Coloring Choices and Costs

• Coloring 1: C changes community and then changes back.

– Cost = $2\alpha$ (plus $\gamma$ if this color hasn’t been used before)

• Coloring 2: C stays in its original community and just visits.

– Cost = $\beta_1 + \beta_2$

• The optimal coloring depends on whether $\beta_1 + \beta_2 < 2\alpha + \gamma$ (or $2\alpha$, if the color was already in use)

[Figure: Coloring 1 and Coloring 2 of the same observations of individuals A, B, C, and D over time steps T1 to T4.]

At time T3, C temporarily changes its interaction.
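The comparison between the two colorings can be made concrete with a small Python sketch. It rests on several assumptions that are not in the slides:

- the individual is observed in exactly one group at every time step;
- a mismatch between its own color and its group's color incurs both G-costs;
- every color used incurs the C-cost once.

The color names and the parameter values are hypothetical.

```python
def coloring_cost(own_colors, observed_colors, alpha, beta1, beta2, gamma):
    """Cost of one individual's coloring under the simplifying assumptions above.

    own_colors[t]      : community (color) assigned to the individual at step t
    observed_colors[t] : color of the group the individual was observed in at step t
    """
    # I-cost: one alpha per change of the individual's own color.
    changes = sum(1 for prev, cur in zip(own_colors, own_colors[1:]) if prev != cur)
    # G-cost: absent from own community (beta1) and present in another (beta2).
    visits = sum(1 for own, seen in zip(own_colors, observed_colors) if own != seen)
    # C-cost: one gamma per distinct color the individual uses.
    colors_used = len(set(own_colors))
    return alpha * changes + (beta1 + beta2) * visits + gamma * colors_used


# C is seen with its usual (red) group, except at T3 when it is seen with the blue group.
observed = ["red", "red", "blue", "red"]
coloring_1 = ["red", "red", "blue", "red"]   # C changes community and changes back
coloring_2 = ["red", "red", "red", "red"]    # C keeps its color and just visits

cost_1 = coloring_cost(coloring_1, observed, alpha=1, beta1=1, beta2=1, gamma=1)
cost_2 = coloring_cost(coloring_2, observed, alpha=1, beta1=1, beta2=1, gamma=1)
print(cost_1, cost_2)
```

With all four parameters set to 1, Coloring 2 is cheaper by exactly $(2\alpha + \gamma) - (\beta_1 + \beta_2) = 1$; raising the G-costs tips the balance toward Coloring 1.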

Finding Optimal Colorings

• Finding the optimal solution is NP-hard

• Partition the problem:

1. Find an optimal set of communities

2. Find an optimal assignment of individuals to communities

• If Phase 1 (Group Coloring) is completed first:

– Phase 2 is reduced from $O(2^N)$ to $O(2^G)$, where N = # of individuals and G = # of groups

– The cost incurred by one individual’s coloring is independent of the colors chosen by others.

Independence of Individual Color Choice

Proof:

• Cost of an individual’s behavior = A (I-cost) + B (G-cost) + C * (C-cost)

• Costs are assessed individually:

– I-cost = $\alpha$ ∗ (# of color changes)

– G-cost = $\beta_1$ ∗ (# absences from its group) + $\beta_2$ ∗ (# visits to other groups)

– C-cost = $\gamma$ ∗ (# of colors that an individual uses)

• So, we can solve for each individual one at a time.

• Moreover, we can assess cost incrementally, from time t to time t+1…

Individual Coloring Algorithm

• $C$ = set of all colors observed to be used by an individual $i$

• $\mathcal{S}(t) = \{S \subseteq C : 1 \le |S| \le t\}$: all possible subsets of colors used up to time $t$

• $G(t, x)$ = G-cost to use color $x$ at time $t$

• $I(t, x, y)$ = I-cost to use color $y$ at time $t-1$ and color $x$ at time $t$

• $C(x, R)$ = C-cost to use color $x$ when color set $R$ has been used

Min. cost $\mathrm{MinCost}(t, S, x)$ at time $t$, using color $x$, with color set $S$ used:

• At time 1: $\mathrm{MinCost}(1, \{x\}, x) = G(1, x)$

• At time $t$: $\mathrm{MinCost}(t, S, x) = G(t, x) + \min_{R,\, y} \left[ \mathrm{MinCost}(t-1, R, y) + I(t, x, y) + C(x, R) \right]$, over all $R$ and $y$ where $R \in \mathcal{S}(t-1)$, $y \in R$, and $R \cup \{x\} = S$

i-cost: changing color; g-cost: wrong group; c-cost: new color

Optimal Individual Coloring

• Given a group coloring, the minimum cost of coloring the individual $i$ is $\min_{S \in \mathcal{S}(T),\, x \in S} \mathrm{MinCost}(T, S, x)$

• Time complexity is $O(n\,T\,|C|^2\,2^{|C|})$

• Space requirement is $O(|C|\,2^{|C|})$

• If the number of colors $|C|$ is not large, the computation is tractable.
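The recurrence for one individual can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation:

- the names `min_individual_cost` and `make_g_cost` are invented here;
- time steps are 0-indexed;
- the I-cost is taken as a flat `alpha` per color change and the C-cost as a flat `gamma` per newly used color;
- the G-cost is supplied by the caller as a function;
- following the base case as written on the slide, the first color carries no C-cost.

```python
def min_individual_cost(num_steps, colors, g_cost, alpha, gamma):
    """Minimum cost of coloring one individual, given a fixed group coloring.

    g_cost(t, x) is the G-cost of using color x at (0-indexed) time step t.
    The table is keyed by (color set used so far, current color).
    """
    # Base case: the first step costs only G(first step, x).
    table = {(frozenset([x]), x): g_cost(0, x) for x in colors}

    for t in range(1, num_steps):
        next_table = {}
        for (used, prev_color), prev_cost in table.items():
            for x in colors:
                i_cost = alpha if x != prev_color else 0   # changing color
                c_cost = gamma if x not in used else 0     # introducing a new color
                key = (used | {x}, x)
                total = prev_cost + g_cost(t, x) + i_cost + c_cost
                if total < next_table.get(key, float("inf")):
                    next_table[key] = total
        table = next_table

    # Final answer: minimum over all color sets S and final colors x.
    return min(table.values())


# Hypothetical example: the individual is seen with the red group except at step 2.
observed = ["red", "red", "blue", "red"]


def make_g_cost(beta1, beta2):
    # Assumption: any mismatch with the observed group's color costs beta1 + beta2.
    return lambda t, x: 0 if observed[t] == x else beta1 + beta2


cheap_visit = min_individual_cost(4, ["red", "blue"], make_g_cost(1, 1), alpha=1, gamma=1)
costly_visit = min_individual_cost(4, ["red", "blue"], make_g_cost(3, 3), alpha=1, gamma=1)
print(cheap_visit, costly_visit)
```

Keeping only the previous time step's table is what makes the space requirement independent of T; the table has at most one entry per pair of color set and current color.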

Optimal Group Coloring

• Determine the best mapping of groups at time t to groups at time t+1

• Groups that are mapped across time are part of the same community and have the same color

• A coloring is good if most individuals can retain their color from step to step.

A possible coloring

Bipartite Matching Heuristic

• Matching Graph

– For each pair of groups $g$, $g'$ at times $t$, $t' = t+1$, add a weighted edge from $v_{g,t}$ to $v_{g',t'}$

– Weight = $|g \cap g'|$ (similarity of $g$ to $g'$)

• Find the maximum weight bipartite matching

• Evaluation

– Weights i-cost more than g-cost

– Performs well if membership is fairly stable

– No long range perspective

– More efficient heuristics?

i-cost: changing color; g-cost: wrong group; c-cost: new color
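The matching step between two consecutive time slices can be illustrated with the Python sketch below. It finds a maximum-weight matching by brute force over permutations, which is only practical for a handful of groups and stands in for a proper matching algorithm. The function name `match_groups` and the example groups are hypothetical.

```python
from itertools import permutations


def match_groups(groups_t, groups_next):
    """Brute-force maximum-weight bipartite matching with weight |g ∩ g'|.

    groups_t, groups_next: lists of sets (the groups at times t and t+1).
    Returns (total weight, sorted list of (index at t, index at t+1) pairs),
    keeping only matched pairs that share at least one individual.
    """
    swapped = len(groups_t) > len(groups_next)
    small, large = (groups_next, groups_t) if swapped else (groups_t, groups_next)

    best_weight, best_perm = -1, ()
    # Each permutation assigns every group on the smaller side a distinct partner.
    for perm in permutations(range(len(large)), len(small)):
        weight = sum(len(small[i] & large[j]) for i, j in enumerate(perm))
        if weight > best_weight:
            best_weight, best_perm = weight, perm

    pairs = []
    for i, j in enumerate(best_perm):
        if small[i] & large[j]:
            pairs.append((j, i) if swapped else (i, j))
    return best_weight, sorted(pairs)


groups_at_t = [{"A", "B"}, {"C", "D"}]
groups_at_next = [{"A", "B", "C"}, {"D"}]
weight, pairs = match_groups(groups_at_t, groups_at_next)
print(weight, pairs)
```

Matched groups would then share a color, so the first group keeps its community even though C has moved into it.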

Greedy Heuristics for Group Coloring

• Approach: Maximize pairwise similarity between groups, for all pairs of groups over all timesteps

• Jaccard’s index: $\mathrm{Jac}(g, g') = \dfrac{|g \cap g'|}{|g \cup g'|}$ (overlap between $g$ and $g'$, scaled to the size of $g$ and $g'$)

• Weighted for temporal proximity: $\mathrm{JacD}(g, g') = \dfrac{\mathrm{Jac}(g, g')}{|t - t'|}$
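Both similarity measures are short enough to write out directly. The Python below is a sketch with hypothetical function names; it assumes groups are non-empty sets and that the two groups come from different time steps, since the temporal weighting divides by $|t - t'|$.

```python
def jaccard(g, g_prime):
    """Jaccard's index: size of the overlap divided by size of the union."""
    return len(g & g_prime) / len(g | g_prime)


def jaccard_temporal(g, t, g_prime, t_prime):
    """Jaccard's index divided by the distance in time between the two groups."""
    if t == t_prime:
        raise ValueError("temporal weighting is undefined for groups at the same time step")
    return jaccard(g, g_prime) / abs(t - t_prime)


# {A, B} and {B, C} share one of three individuals.
print(jaccard({"A", "B"}, {"B", "C"}))
# The same pair observed two time steps apart counts half as much.
print(jaccard_temporal({"A", "B"}, 1, {"B", "C"}, 3))
```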

Greedy Heuristics for Group Coloring

• Greedy Heuristic 1 (time is not a factor)

– Construct a square similarity matrix of size (# groups) × (# groups)

– Cluster the groups using agglomerative clustering

• Greedy Heuristic 2 (look backwards in time): for t = 1 to T do

– Match the most similar pairs g, g′ for any time t′ < t

– If similarity = 0 or all colors have been used, add a new color

• Greedy Heuristic 3 (look back the shortest interval)

– Like Heuristic 2, but use only t′, the closest time to t such that there exist g, g′ with similarity(g, g′) > 0
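One possible reading of Greedy Heuristic 2 is sketched below in Python. The slides give only an outline, so several details are assumptions of this sketch:

- similarity is Jaccard's index divided by the time gap;
- a color may be used by at most one group per time step;
- candidate matches are taken in order of decreasing similarity;
- groups are non-empty sets and colors are integers.

The function names are hypothetical.

```python
from itertools import count


def jaccard(g, g_prime):
    return len(g & g_prime) / len(g | g_prime)


def greedy_backward_coloring(snapshots):
    """snapshots[t] is the list of disjoint groups (sets) observed at step t.

    Returns colors, where colors[t][k] is the color of group k at step t.
    """
    fresh_colors = count()
    colors = []
    for t, groups in enumerate(snapshots):
        # Collect every earlier group with nonzero similarity to a current group.
        candidates = []
        for k, g in enumerate(groups):
            for t_prev in range(t):
                for k_prev, g_prev in enumerate(snapshots[t_prev]):
                    similarity = jaccard(g, g_prev) / (t - t_prev)
                    if similarity > 0:
                        candidates.append((similarity, k, colors[t_prev][k_prev]))
        # Greedily accept the most similar pairs first.
        candidates.sort(key=lambda item: -item[0])
        assigned, used = {}, set()
        for similarity, k, color in candidates:
            if k not in assigned and color not in used:
                assigned[k] = color
                used.add(color)
        # Groups with no usable match start a new community.
        colors.append([assigned[k] if k in assigned else next(fresh_colors)
                       for k in range(len(groups))])
    return colors


# Hypothetical data: C is alone for two steps, then joins A and B.
history = [[{"A", "B"}, {"C"}], [{"A", "B"}, {"C"}], [{"A", "B", "C"}]]
print(greedy_backward_coloring(history))
```

In this example the merged group at the last step inherits the color of {A, B}, because that earlier group is both more similar and closer in time than {C}.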

Experiment 1: Verify the Framework

• Does the framework capture the intuitive concept of dynamic community?

• Procedure

– Construct small, synthetic datasets

– Use exhaustive search to get a truly optimal coloring

Experiment 1A: “Assembly Line”

(A) cost parameters $(\alpha, \beta_1, \beta_2, \gamma)$ = (1, 0, 1, 1); (B) cost parameters = (1, 0, 3, 1)

• At each time step, 1 member leaves and 1 enters a group, resulting in a complete membership change in 3 steps.

• Results change as costs change. (A) favors stable membership. (B) allows for more fluid membership.

Experiment 1B: “Dutiful Children”

• 2, 3, and 4 are Children. 0 and 1 are Parents that visit a different child each timestep.

• Results: Framework succeeds at detecting the individual children as well as the visitation pattern.

(A) cost parameters $(\alpha, \beta_1, \beta_2, \gamma)$ = (1, 0, 1, 1); (B) cost parameters = (1, 0, 3, 1)

Experiment 2: Quality of Heuristic Results

• Do the heuristics obtain colorings similar to those of an exhaustive search?

• Procedure

– Re-test the synthetic datasets using the various heuristics

• Results: At least one heuristic method obtains the same coloring and total cost as exhaustive search

Experiment 3: Real World Datasets

• Do the framework and heuristics together obtain expected results using real-world datasets?

Experiment 3A: “Southern Women”

• Eighteen women in 1933 in Natchez, Mississippi

• Tracks their attendance at 14 social events

Experiment 3A: Prior Results

• Twenty-one analyses (1941 to 2001) all show similar results

– Two clear communities

– The membership of individuals 8, 9, and 16 is less certain.

Experiment 3A: Results

• Detects 4 communities, which are subsets of the traditional 2 communities

• Individuals 6 and 10 change membership over time

• By adjusting cost factors, the results of most of the 21 prior analyses can be duplicated

Cost parameters $(\alpha, \beta_1, \beta_2, \gamma)$ = (1, 1, 1, 1)

Experiment 3B: “Grevy’s Zebra”

• 28-member zebra herd observed 44 times over 3 months in 2002

• The graph to the left shows the aggregate interaction.

• Temporal information is lost.

Experiment 3B: Results

• Inferred communities agree with manual results obtained by biologists.

– 4 stable communities

– Some short-lived communities and some visiting

Conclusions

• We present a framework for identifying communities in dynamic social networks

• The framework produces meaningful results compared to traditional methods

• Heuristic methods produce near-optimal solutions

• Future Directions

– Develop an approximation algorithm which guarantees the quality of the result

– Investigate scalability over network size and time

– Relax assumptions about interaction and dynamics
