[ICDE 2014] Incremental Cluster Evolution Tracking from Highly Dynamic Network Data

Post on 05-Jul-2015

description

This talk describes techniques for tracking cluster evolution patterns in large evolving networks. In simple terms: given a dynamic graph that is updated at each time moment, how can we incrementally monitor the evolution patterns of graph clusters? Typical evolution patterns include appear/disappear, grow/decay, and merge/split. We discuss the incremental computation framework and contrast it with the traditional graph snapshot sequence approach. The ICDE 2014 paper can be found at http://www.cs.ubc.ca/~peil/research.html

transcript

Pei Lee, ICDE 2014, Chicago, IL, USA

Incremental Cluster Evolution Tracking

from Highly Dynamic Network Data

Pei Lee, Laks V.S. Lakshmanan

Computer Science Department

University of British Columbia

Vancouver, BC, Canada

Evangelos E. Milios

Computer Science Department

Dalhousie University

Halifax, NS, Canada

2014-4-16

Outline

Motivation

Evolving network meets social event

Incremental Computation Framework

Divide-and-conquer vs. incremental computation

Post Network Construction

Combat noise

Network and Cluster Evolution

Evolution operations

Empirical Study

Examples

Evolving Network

Network changes with time

Examples:

Social Network

add/remove friends or followers

Co-authorship/citation network

new collaborations/citations added every year

Email/Calling Graph

every edge has a time stamp

An illustration of evolving co-authorship network

Taken from http://wiki.cns.iu.edu/pages/viewpage.action?pageId=2199676

Social Streams:

Twitter, Facebook, etc.

Social Event Evolution Tracking

Event Evolution Patterns

[Figure: post networks at time t and time t+1, with the corresponding event snapshots at time t and time t+1]

Evolution patterns: emerge, disappear, grow, decay, merge, split, evolve

Evolving Network

Social Events

Model social stream as an evolving network

Traditional Evolving Network

Mining Approaches

Divide and Conquer:

decompose a dynamic network into a series of

snapshots for each moment,

apply graph mining algorithms on each snapshot

to find useful patterns,

match patterns between consecutive moments to

generate a dynamic pattern sequence.

Consider the task of finding evolving clusters.

Illustrating Divide-and-Conquer

Taken from http://sydney.edu.au/engineering/it/~shhong/gallery.htm

[Figure: network snapshots at Moment 1 through Moment 5]

Divide-and-Conquer:

Clustering in evolving networks

C_t: a cluster found in the snapshot at time t;

C_{t+1}: a cluster found in the snapshot at time t+1.

How do we define “C_t evolves to C_{t+1}”?

Heuristic:

If the overlap between C_t and C_{t+1} is above a given threshold K, we say they are matched.

Formally, based on Jaccard similarity:

|C_t ∩ C_{t+1}| / |C_t ∪ C_{t+1}| ≥ K
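The matching heuristic above can be sketched in a few lines of Python. This is only an illustration: the function names, the cluster ids and the node ids are made up, and the "at least K" comparison is an assumption about how the threshold is applied.

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_clusters(prev, curr, k):
    """Return (prev_id, curr_id) pairs whose Jaccard similarity is at least k."""
    return [
        (p, c)
        for p, members_p in prev.items()
        for c, members_c in curr.items()
        if jaccard(members_p, members_c) >= k
    ]


# Hypothetical clusters found in the snapshots at time t and t+1.
clusters_t = {"A": {1, 2, 3, 4}, "B": {7, 8}}
clusters_t1 = {"X": {2, 3, 4, 5}, "Y": {9, 10}}
matches = match_clusters(clusters_t, clusters_t1, k=0.5)
```

Here clusters A and X share 3 of 5 distinct nodes, so they match at a threshold of 0.5 but not at 0.7, which hints at how sensitive the result is to the chosen threshold.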

Drawbacks of Divide-and-Conquer

Quality:

It is difficult to decide the threshold K

The matching between two consecutive snapshots

will lose accuracy

Performance:

Need to cluster each snapshot from scratch

Lots of redundant computation

New Proposal: Incremental Computation

for dense subgraph mining

Basic Idea:

For the very first snapshot, mine the graph pattern set S_0 from scratch

After this, this step is never applied again.

In the steady state, let t start at 1

Obtain the graph update ΔG by comparing the network at moment t with the network at moment t-1

Derive S_t from S_{t-1} based on ΔG

Increase t to t+1 and repeat
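The control flow of this framework can be sketched as follows. To keep the example short, the "pattern set" is replaced by a simple node degree table, which is a stand-in chosen for illustration and not the dense subgraph patterns of the paper; all function names and the snapshots are hypothetical.

```python
from collections import Counter


def mine_from_scratch(edges):
    """Compute node degrees by scanning every edge (done once, for the first snapshot)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def graph_delta(prev_edges, curr_edges):
    """Return (added, removed) edge sets between two consecutive moments."""
    return curr_edges - prev_edges, prev_edges - curr_edges


def apply_delta(state, added, removed):
    """Derive the new state from the previous one, touching only the delta."""
    new_state = Counter(state)
    for u, v in added:
        new_state[u] += 1
        new_state[v] += 1
    for u, v in removed:
        for node in (u, v):
            new_state[node] -= 1
            if new_state[node] == 0:
                del new_state[node]  # drop nodes that no longer have edges
    return new_state


# Made-up snapshots of an undirected graph, one edge set per moment.
snapshots = [
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("b", "c"), ("c", "d")},
    {("b", "c"), ("c", "d")},
]

state = mine_from_scratch(snapshots[0])  # initial step, applied only once
for t in range(1, len(snapshots)):
    added, removed = graph_delta(snapshots[t - 1], snapshots[t])
    state = apply_delta(state, added, removed)  # steady state
```

The point of the sketch is that the work per moment is proportional to the size of the update, not to the size of the whole network.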

Divide-and-Conquer vs. Incremental

Computation

Divide-and-Conquer:

1, 2, 3, 4

Incremental Computation:

Initial step: 1

Steady state: 5

Advantages:

Avoid redundant computation

More accurately capture the evolution patterns

Incremental Computation

Framework

Adjust the clusters at each moment as the network is updated

Post Network Construction

A social stream is a FIFO queue of posts

Post similarity: combines content similarity and time distance

Post Network:

Each post is a node

An edge is constructed between two posts if their similarity is higher than a given threshold
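A minimal sketch of this construction is given below. The similarity function is an assumption made for illustration (Jaccard overlap of terms, discounted by the time distance in hours); the exact formula on the slide is not reproduced here, and the `Post` class, the posts and the threshold are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Post:
    pid: str
    terms: frozenset
    time: float  # posting time in hours


def post_similarity(p, q):
    """Assumed form: content overlap (Jaccard) discounted by time distance."""
    union = p.terms | q.terms
    content = len(p.terms & q.terms) / len(union) if union else 0.0
    return content / (1.0 + abs(p.time - q.time))


def build_post_network(posts, threshold):
    """Return {(pid, pid): similarity} for every pair above the threshold."""
    edges = {}
    for p, q in combinations(posts, 2):
        s = post_similarity(p, q)
        if s > threshold:
            edges[(p.pid, q.pid)] = s
    return edges


posts = [
    Post("p1", frozenset({"apple", "iphone", "launch"}), 0.0),
    Post("p2", frozenset({"apple", "iphone", "event"}), 1.0),
    Post("p3", frozenset({"good", "morning"}), 0.5),
]
edges = build_post_network(posts, threshold=0.2)
```

The pairwise loop is quadratic and only suitable for a small window of posts; it is meant to show the edge rule, not an efficient implementation.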

Evolving Post Network

We can build a post network for your daily

timeline in Facebook/Twitter/LinkedIn

As posts stream in, the post network evolves very quickly

Challenges of evolving post network mining:

The quick surge of post streams (speed)

A large number of posts are noise (quality)

The huge number of posts (scalability)

Observing Time Window

Notations:

Len: time window length

Δt: time window shifting size at each moment
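One way to picture the observing time window is as a queue that gains new posts at one end and loses expired posts at the other each time the window shifts. The sketch below assumes a window covering (end - Len, end]; the function name, timestamps and post ids are invented for the example.

```python
from collections import deque


def slide_window(window, incoming, new_end, length):
    """Advance the window so that it covers (new_end - length, new_end].

    window   : deque of (timestamp, post_id), oldest first
    incoming : posts that arrived during the last shift, sorted by timestamp
    Returns (added, expired) post ids, i.e. the update to the post network.
    """
    added = []
    for timestamp, post_id in incoming:
        window.append((timestamp, post_id))
        added.append(post_id)
    expired = []
    while window and window[0][0] <= new_end - length:
        expired.append(window.popleft()[1])
    return added, expired


LEN, STEP = 10, 5  # hypothetical window length and shifting size
window = deque([(2, "a"), (6, "b"), (9, "c")])  # window ending at time 10
added, expired = slide_window(window, [(12, "d"), (14, "e")], 10 + STEP, LEN)
```

The returned pair of added and expired posts is exactly the kind of network update that an incremental algorithm would consume at each moment.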

How to filter out noise?

Noise is ubiquitous in social streams

“Good morning”, “thank you ^.^”, etc.

About 40% of tweets make very little sense

How to filter out noise?

Distinguish posts into three types (core, border, noise)

w_t(p): the priority of post p at moment t

For example, in a social network:

Core: person with lots of friends

Border: not core, but a friend of core

Noise: not core, and not a friend of core
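The social network analogy can be turned into a small sketch. It uses the number of neighbors as a stand-in for the priority of a node, which is an assumption made for illustration; `min_neighbors` and the example graph are hypothetical.

```python
def classify_nodes(adj, min_neighbors):
    """Label each node as 'core', 'border' or 'noise'.

    adj maps a node to the set of its neighbors (undirected graph).
    """
    core = {n for n, nbrs in adj.items() if len(nbrs) >= min_neighbors}
    labels = {}
    for node, nbrs in adj.items():
        if node in core:
            labels[node] = "core"
        elif nbrs & core:
            labels[node] = "border"  # not core, but adjacent to a core node
        else:
            labels[node] = "noise"  # not core, and no core neighbor
    return labels


adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": {"f"},
    "f": {"e"},
}
labels = classify_nodes(adj, min_neighbors=2)
```

With a minimum of 2 neighbors, a, b and c are core, d is a border node attached to a, and the isolated pair e and f is treated as noise.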

Skeletal graph of a post network

Skeletal Graph:

A graph consisting of all core posts

A brief summary of the original post network

Clusters can be derived from skeletal graphs

Our algorithm monitors changes in the skeletal graph
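As a rough illustration, the skeletal graph can be taken as the subgraph induced by the core posts, and clusters can be read off as its connected components. Using connected components here is an assumption for the sketch; the graph, the core set and the function names are hypothetical.

```python
def skeletal_graph(adj, core):
    """Keep only core nodes and the edges between them."""
    return {n: adj[n] & core for n in core}


def connected_components(graph):
    """Return the connected components of an undirected graph as a list of sets."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


adj = {
    "a": {"b", "c", "x"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "x": {"a"},
    "d": {"e", "y"},
    "e": {"d"},
    "y": {"d"},
}
core = {"a", "b", "c", "d", "e"}  # assumed to be given by the noise filter
skeleton = skeletal_graph(adj, core)
clusters = connected_components(skeleton)
```

The non-core posts x and y do not appear in the skeleton, which is why the skeletal graph acts as a compact summary of the post network.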

Network Evolution Operations

Add a post

Remove a post

Cluster Evolution Operations

We define 6 cluster evolution patterns:

appear, disappear, grow, decay, merge and split

Summary: Cluster Evolution

Add a post:

A new cluster may appear

An existing cluster may grow

Multiple clusters may merge into a single one

Delete a post:

An existing cluster may disappear

An existing cluster may decay

An existing cluster may split into multiple clusters
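The case analysis above can be sketched as two small functions. This is a simplified illustration that treats every post as a cluster member and uses graph connectivity to decide the outcome; it is not the ICM or eTrack algorithm from the paper, and all names and data are hypothetical.

```python
def classify_add(new_neighbors, cluster_of):
    """Pattern caused by adding a post that links to the given existing posts."""
    touched = {cluster_of[n] for n in new_neighbors if n in cluster_of}
    if not touched:
        return "appear"
    return "grow" if len(touched) == 1 else "merge"


def classify_delete(post, adj, members):
    """Pattern caused by deleting a post from the cluster with these members."""
    rest = members - {post}
    if not rest:
        return "disappear"
    seen = set()
    parts = 0
    for start in rest:  # count connected pieces left after the deletion
        if start in seen:
            continue
        parts += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend((adj[node] & rest) - seen)
    return "decay" if parts == 1 else "split"


adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}  # a path a - b - c
cluster_of = {"a": 0, "b": 0, "c": 0, "q": 1}
add_patterns = [
    classify_add([], cluster_of),
    classify_add(["a"], cluster_of),
    classify_add(["a", "q"], cluster_of),
]
delete_patterns = [
    classify_delete("a", adj, {"a", "b", "c"}),
    classify_delete("b", adj, {"a", "b", "c"}),
    classify_delete("z", {"z": set()}, {"z"}),
]
```

Deleting the middle post b disconnects a from c, so the cluster splits, while deleting the end post a only shrinks it.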

Network Evolution to Cluster Evolution

Cluster evolution of adding a post

Network Evolution to Cluster Evolution

Cluster evolution of deleting a post

Bulk Updating

Existing incremental computation on dynamic

graphs usually treats the addition/deletion of

nodes or edges one by one

Since social posts arrive at a high speed, the

post-by-post incremental updating will lead to

very poor performance

Bulk updating: update subgraph-by-subgraph

a bulk = a post cluster

More details in Section VII of the paper

Proposed Algorithms

ICM: Incremental Cluster Maintenance

eTrack: Cluster Evolution Tracking

Twitter Technology domain data sets

Time span: 1 month

Tech-Lite: collecting all the timelines of users

listed in the Technology category of “Who to

follow” and their retweeted users

streaming rate is about 11700 tweets/day

Tech-Full: collecting all the timelines followed

by users who are in the Technology category

streaming rate is about 7216 tweets/hour

Ground Truth

Major events from News articles:

Crawl news from major technology websites

By treating the news article titles as posts, we

apply our approach to extract events

Peaks in Google Trends

Precision and recall

HashtagPeaks: use common hashtags to compute post similarity

UnigramPeaks: use common unigrams to compute post similarity

Louvain: use common entities to compute post similarity and apply Louvain community detection algorithm

eTrack: use common entities to compute post similarity and apply our approach
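For reference, precision and recall against a ground truth set of events can be computed as below. The event labels are placeholders invented for the example and are not results from the talk.

```python
def precision_recall(detected, ground_truth):
    """Return (precision, recall) of a detected event set against the ground truth."""
    true_positives = len(detected & ground_truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Placeholder event ids; not data from the experiments.
detected = {"e1", "e2", "e3", "e4"}
ground_truth = {"e1", "e2", "e5"}
precision, recall = precision_recall(detected, ground_truth)
```

In this toy case 2 of the 4 detected events are real (precision 0.5) and 2 of the 3 real events are found (recall of about 0.67).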

Top 10 social events detected by

different methods

Running time

(a) Adjusting time window length

(b) Adjusting step length

Cluster Evolution Examples

Conclusion

Theoretical side:

We propose an incremental computation

framework for cluster evolution tracking in highly

dynamic networks

Application side:

We propose an efficient tracking system for event

evolution patterns in social streams

Q & A

Post Network Mining

A snapshot of the post network is constructed from the posts in the same time window

As social posts stream in, events (dense clusters) are identified

Relationships between post network, skeletal graph and clusters

The skeletal graph is a sketch of the post network

Clusters can be generated from the skeletal graph