Paper Presentation

Steve Jan

Virginia Tech

March 5, 2015

Steve Jan (Virginia Tech) Paper Presentation March 5, 2015 1 / 28

Two papers to present

Nonparametric Multi-group Membership Model for Dynamic Networks, NIPS 2013, Myunghwan Kim and Jure Leskovec, Stanford

Community Detection in Graphs through Correlation, KDD 2014, Lian Duan, W. Nick Street, Yanchi Liu, Haibing Lu, New Jersey Institute of Technology, Santa Clara University

Nonparametric Multi-group Membership Model

Social networks are often dynamic, in the sense that relations between entities rise and decay over time.

Problem: extract a summary of the common structure and dynamics of the underlying relations.

Applications: predict missing relationships, forecast future links, identify clusters and groups of nodes.

Note

The paper relies on many statistical techniques to solve this problem.

Dynamic Multi-group Membership Graph Model

They pay close attention to the three processes governing network dynamics:

Birth and death dynamics of individual groups

Evolution of memberships of nodes to groups

The structure of network interactions between group members as well as non-members.

Birth and death dynamics of individual groups

Why do we need to know when groups are born and when they die?
It makes clear how many groups exist at each specific time.
A group can be in one of two states: active (alive) or inactive (not yet born, or dead).

Figure: Black: active (alive), White: inactive

Formal model of the birth and death dynamics of individual groups

The model uses the distance-dependent Indian Buffet Process (dd-IBP), which is a time-dependent stochastic process.
Customers enter an Indian Buffet restaurant and sample some subset of an infinitely long sequence of dishes.
In this application, each time step t plays the role of a customer and samples a set of active groups $K_t$.
Formally speaking, at the first time step t = 1, a Poisson(λ) number of groups are initially active, i.e., $|K_1| \sim \text{Poisson}(\lambda)$.
Poisson(γλ) new groups are also born at time t.
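The birth counts above can be simulated in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: `sample_poisson` and `simulate_active_groups` are hypothetical helper names, and the rule that each active group dies with probability γ per step is an assumption added so the sketch has a death mechanism. The slide itself states only the two Poisson birth counts.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) sample with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_active_groups(lam, gamma, steps, seed=0):
    """Return the sorted list of active group ids at each time step."""
    rng = random.Random(seed)
    active, next_id, history = set(), 0, []
    for t in range(1, steps + 1):
        if t == 1:
            # Poisson(lam) groups are active at the first time step.
            births = sample_poisson(lam, rng)
        else:
            # ASSUMPTION: each active group dies with probability gamma.
            active = {g for g in active if rng.random() >= gamma}
            # Poisson(gamma * lam) new groups are born at time t.
            births = sample_poisson(gamma * lam, rng)
        active |= set(range(next_id, next_id + births))
        next_id += births
        history.append(sorted(active))
    return history


history = simulate_active_groups(lam=3.0, gamma=0.2, steps=10, seed=1)
```

Each entry of `history` is the set of active groups at one time step; group ids are never reused, so a group that dies stays dead.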

Dynamics of node group memberships

Intuition: nodes join and leave groups based on their current status.
They further use a Markov chain to model the dynamics of nodes joining and leaving groups.
Whether node i belongs to group k at time t is denoted by a binary variable $z_{ik}^{(t)} \in \{0, 1\}$.

Here $a_k$ and $b_k$ are the two probability parameters of the chain for group k.
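A two-state Markov chain of this kind is easy to simulate. The sketch below assumes that $a_k$ is the probability of joining group k (moving from 0 to 1) and $b_k$ the probability of leaving it (moving from 1 to 0). That reading of the two parameters is an assumption, since the slide's transition equation is not reproduced here, and `simulate_membership` is a hypothetical name.

```python
import random


def simulate_membership(a_k, b_k, steps, z0=0, seed=0):
    """Simulate one node's membership in one group over time.

    ASSUMPTION: a_k = P(0 -> 1) (join), b_k = P(1 -> 0) (leave).
    """
    rng = random.Random(seed)
    z, path = z0, []
    for _ in range(steps):
        if z == 0:
            z = 1 if rng.random() < a_k else 0
        else:
            z = 0 if rng.random() < b_k else 1
        path.append(z)
    return path


path = simulate_membership(a_k=0.3, b_k=0.1, steps=20, seed=42)
```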

Relationship between node group memberships and links of the network

Intuition: a link between two nodes depends on their current groups.
They assume there is a connection between the nodes' memberships in groups and the links of the network.
They build on the Multiplicative Attribute Graph model: each group k is associated with a link affinity matrix $M \in \mathbb{R}^{2 \times 2}$.
Its four entries cover links between two group members, between a member and a non-member (in either order), and between two non-members.
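To make the role of the affinity matrices concrete, here is a small sketch that combines them in the product form of the original Multiplicative Attribute Graph model. Whether the dynamic model uses exactly this link function is not stated on the slide, so treat the product form as an assumption. The index convention (1 = member, 0 = non-member), the numeric affinities, and the name `link_probability` are hypothetical too.

```python
def link_probability(z_i, z_j, affinities):
    """Probability of a link between nodes i and j.

    z_i, z_j: lists of 0/1 memberships, one entry per group.
    affinities: one 2x2 matrix per group, indexed [z_i[k]][z_j[k]].
    ASSUMPTION: affinities multiply across groups (original MAG form).
    """
    probability = 1.0
    for k, matrix in enumerate(affinities):
        probability *= matrix[z_i[k]][z_j[k]]
    return probability


# Hypothetical affinities for two groups.
affinities = [
    [[0.1, 0.2], [0.2, 0.9]],  # group 0: members link strongly
    [[0.5, 0.5], [0.5, 0.8]],  # group 1
]
p = link_probability([1, 0], [1, 1], affinities)  # 0.9 * 0.5
```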

Model Inference via MCMC

After introducing these three models, they sample the model parameters.

Sampling node group memberships Z: use the forward-backward recursion algorithm.

Sampling the group membership transition matrix Q: place a conjugate (Beta) prior on the Bernoulli transition probabilities and sample from the resulting posterior.

Sampling link affinities M: use Metropolis-Hastings and Hybrid Monte Carlo (HMC) sampling.
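The conjugacy used for Q can be illustrated with a short sketch. With a Beta prior on a Bernoulli transition probability, the posterior is again a Beta distribution whose parameters are the prior parameters plus the observed transition counts. Several things here are assumptions for illustration: the Beta(1, 1) default, the interpretation of the two probabilities as join and leave rates, pooling the counts over all nodes' membership paths, and the helper names.

```python
import random


def count_transitions(paths):
    """Count 0->1, 0->0, 1->0 and 1->1 transitions over all membership paths."""
    joins = stays_out = leaves = stays_in = 0
    for path in paths:
        for prev, cur in zip(path, path[1:]):
            if prev == 0 and cur == 1:
                joins += 1
            elif prev == 0:
                stays_out += 1
            elif cur == 0:
                leaves += 1
            else:
                stays_in += 1
    return joins, stays_out, leaves, stays_in


def sample_transition_probs(paths, alpha=1.0, beta=1.0, seed=0):
    """Draw join/leave probabilities from their Beta posteriors.

    ASSUMPTION: independent Beta(alpha, beta) priors on both probabilities.
    """
    rng = random.Random(seed)
    joins, stays_out, leaves, stays_in = count_transitions(paths)
    join_prob = rng.betavariate(alpha + joins, beta + stays_out)
    leave_prob = rng.betavariate(alpha + leaves, beta + stays_in)
    return join_prob, leave_prob


counts = count_transitions([[0, 0, 1, 1, 0, 1]])
join_prob, leave_prob = sample_transition_probs([[0, 0, 1, 1, 0, 1]])
```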

Experiments

Datasets they use:

NIPS co-authorships network for T = 17 years (1987 to 2003).

DBLP co-authorship network obtained from 21 Computer Science conferences from 2000 to 2009 (T = 10).

INFOCOM dataset representing the physical proximity interactions between 78 students at the 2006 INFOCOM conference, T = 50.

Tasks they have:

Missing link prediction

Future network forecasting

Missing link prediction

Randomly hold out 20% of node pairs throughout the entire time period.
Naive: the relationship between each pair of nodes is drawn from a Bernoulli distribution with a Beta(1, 1) prior.
LFRM: a model for static networks.
DRIFT: an infinite factorial HMM model.
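The hold-out step can be sketched as follows. This assumes an undirected network, so each unordered node pair is either held out or kept for every time step. The function name `hold_out_pairs` and the rounding rule are hypothetical choices, not details from the paper.

```python
import itertools
import random


def hold_out_pairs(num_nodes, fraction=0.2, seed=0):
    """Split all unordered node pairs into held-out and observed sets."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(num_nodes), 2))
    rng.shuffle(pairs)
    cut = round(fraction * len(pairs))
    return set(pairs[:cut]), set(pairs[cut:])


held_out, observed = hold_out_pairs(num_nodes=10)
```

The held-out pairs are hidden during training at every time step and used only to score the predicted link probabilities.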

Future network forecasting

Given networks from t = 1, . . . , T, they want to predict the links at t = T + 1. They train the models on the first $T_{obs}$ networks, fix the parameters, and then for each model run MCMC sampling one time step into the future.

Conclusion

We learned three models for time series.

We learned how to sample the parameters of these models.

Personally, I think this paper is good in terms of the statistical methods it uses.

Community Detection in Graphs through Correlation

We now move to the next paper, which is about modularity-based community detection.

Major problem of modularity

Resolution problem.
$K_m$ is an m-clique.
The detected communities are marked by dashed circles.

Multi-resolution
Further divide each detected community.
Bias: tendencies to merge small communities and to split large communities are introduced.

Connection with itemset search

Graph communities: the number of internal edges is greater than expected under the assumption of a random partition

Correlated itemsets: occur more often than expected under the assumption of item independence

Connection: modularity = leverage

Correlated Itemsets

Given an itemset $S = \{I_1, I_2, \ldots, I_m\}$ with m items in a dataset with n transactions

True probability: $tp_S = P(S)$

Expected probability: $ep_S = \prod_{i=1}^{m} P(I_i)$

Correlation measure: $M_S = f(tp_S, ep_S)$

Chi-square: $\frac{(tp_S - ep_S)^2}{ep_S}$

Probability ratio: $tp_S / ep_S$

Leverage: $tp_S - ep_S$

Correlated itemset example

t1: Beef, Chicken, Milk
t2: Beef, Cheese
t3: Cheese, Boots
t4: Beef, Chicken, Cheese
t5: Beef, Chicken, Clothes, Cheese, Milk

For the itemset {Beef, Chicken}: $tp = \frac{3}{5}$, $ep = \frac{3}{5} \cdot \frac{4}{5}$, Leverage $= tp - ep = \frac{3}{25}$
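The numbers in this example can be checked with a short script. It is a minimal sketch that uses exact fractions and the three correlation measures defined on the previous slide; the function names `true_and_expected` and `correlation_measures` are hypothetical.

```python
from fractions import Fraction

transactions = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
]


def true_and_expected(itemset, transactions):
    """Return (tp, ep): observed support and support under independence."""
    n = len(transactions)
    tp = Fraction(sum(itemset <= t for t in transactions), n)
    ep = Fraction(1)
    for item in itemset:
        ep *= Fraction(sum(item in t for t in transactions), n)
    return tp, ep


def correlation_measures(tp, ep):
    """Evaluate the three measures listed on the slide."""
    return {
        "leverage": tp - ep,
        "probability_ratio": tp / ep,
        "chi_square": (tp - ep) ** 2 / ep,
    }


tp, ep = true_and_expected({"Beef", "Chicken"}, transactions)
measures = correlation_measures(tp, ep)
print(tp, ep, measures["leverage"])  # prints: 3/5 12/25 3/25
```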

Modularity Function
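For reference, a sketch of the standard Newman-Girvan modularity of a partition, which is presumably the function meant here. The notation is an assumption: $A_{ij}$ is the adjacency matrix, $m$ the number of edges, $k_i$ the degree of node i, $c_i$ the community of node i, and $\delta$ equals 1 when its two arguments match and 0 otherwise.

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$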

Transforming modularity function

For a partition $\{G_1, G_2, \ldots, G_l\}$ of graph $G$

$k_i$: degree of node i

$k_i^{internal}$: number of nodes in the same group as node i that connect to node i.

They found that by translating the modularity of an undirected graph into that of a directed graph, they can use itemset criteria to represent modularity.
If we randomly select an edge from the doubly-directed graph:

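The true probability that the edge lies in $G_p$: $tp = \frac{\sum_{i \in G_p} k_i^{internal}}{2m}$

Probability that the edge starts in $G_p$: $\frac{\sum_{i \in G_p} k_i}{2m}$

Probability that the edge ends in $G_p$: $\frac{\sum_{j \in G_p} k_j}{2m}$

The expected probability that the edge lies in $G_p$ under the assumption of independence: $ep = \frac{\sum_{i \in G_p} k_i}{2m} \cdot \frac{\sum_{j \in G_p} k_j}{2m}$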

Connecting correlation with modularity

For a given partition $G_p$, partial modularity $Q_p = tp_p - ep_p$
For a given itemset S, leverage $= tp_S - ep_S$

Since the other correlation measures are also functions of tp and ep, they can change the partial modularity function $Q_p$ by using the formula of another correlation measure.
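The correspondence can be sketched in code. The functions below compute tp and ep for one group on the doubly-directed version of an undirected graph, then score a partition with a pluggable correlation measure. With leverage, the sum of partial scores is the usual modularity; summing the per-group scores for the other measures is an assumption of this sketch, as are all function names and the toy graph.

```python
from fractions import Fraction


def partial_probabilities(edges, group):
    """Return (tp, ep) for one group; edges is an undirected edge list."""
    m = len(edges)
    group = set(group)
    internal = 0  # sum of internal degrees over the group
    degree = 0    # sum of degrees over the group
    for u, v in edges:
        for start, end in ((u, v), (v, u)):  # doubly-directed graph
            if start in group:
                degree += 1
                if end in group:
                    internal += 1
    tp = Fraction(internal, 2 * m)
    p_endpoint = Fraction(degree, 2 * m)
    return tp, p_endpoint * p_endpoint


MEASURES = {
    "leverage": lambda tp, ep: tp - ep,
    "probability_ratio": lambda tp, ep: tp / ep,
    "chi_square": lambda tp, ep: (tp - ep) ** 2 / ep,
}


def score_partition(edges, partition, measure="leverage"):
    """ASSUMPTION: the partition score is the sum of per-group scores."""
    f = MEASURES[measure]
    return sum(f(*partial_probabilities(edges, g)) for g in partition)


# Toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition = [{0, 1, 2}, {3, 4, 5}]
q = score_partition(edges, partition)  # leverage, i.e. modularity: 5/14
```

Passing `measure="probability_ratio"` or `measure="chi_square"` swaps the objective without touching the rest of the code, which mirrors the substitution described above.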

Experiments

Modify the objective function

Greedy search (hierarchical clustering)

Baseline: Modularity-based methods (Leverage)

Datasets: Real life: 1. Karate club (two equal-size communities) 2. College football (12 equal-size communities)

Evaluation measures:

Rand Index (Rand 1971), Jaccard, F-measure, Normalized mutual information (Danon 2005)
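Two of these measures, the Rand index and the Jaccard index, are simple pair-counting scores, sketched below. The sketch assumes each node has exactly one true label and one predicted label; the function names and the convention of returning 1.0 for Jaccard when no pair is co-clustered in either labeling are hypothetical choices.

```python
from itertools import combinations


def pair_counts(truth, predicted):
    """Count node pairs by whether each labeling puts them together."""
    both = only_truth = only_predicted = neither = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_predicted = predicted[i] == predicted[j]
        if same_truth and same_predicted:
            both += 1
        elif same_truth:
            only_truth += 1
        elif same_predicted:
            only_predicted += 1
        else:
            neither += 1
    return both, only_truth, only_predicted, neither


def rand_index(truth, predicted):
    """Fraction of pairs on which the two labelings agree."""
    a, b, c, d = pair_counts(truth, predicted)
    return (a + d) / (a + b + c + d)


def jaccard_index(truth, predicted):
    """Agreement restricted to pairs co-clustered in at least one labeling."""
    a, b, c, _ = pair_counts(truth, predicted)
    return a / (a + b + c) if (a + b + c) else 1.0


truth = [0, 0, 1, 1]
predicted = [0, 0, 1, 2]
ri = rand_index(truth, predicted)     # 5 of 6 pairs agree
ji = jaccard_index(truth, predicted)  # 1 of 2 co-clustered pairs agree
```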

Real life datasets

Summary

Connection between community detection and correlation search

Modularity is good only when there are large and clear communities

Likelihood ratio is robust to any type of communities

Probability ratio partitions the whole graph into small communities with 2 or 3 objects
