Paper Presentation - Peoplepeople.cs.vt.edu/liangzhe/slides/03-05-2015-steve.pdf · 2015-03-05 ·...


Paper Presentation

Steve Jan

Virginia Tech

March 5, 2015


Two papers to present

Nonparametric Multi-group Membership Model for Dynamic Networks, NIPS 2013, Myunghwan Kim and Jure Leskovec, Stanford

Community Detection in Graphs through Correlation, KDD 2014, Lian Duan, W. Nick Street, Yanchi Liu, Haibing Lu, New Jersey Institute of Technology, Santa Clara University

Nonparametric Multi-group Membership Model

Social networks are often dynamic in the sense that relations between entities rise and decay over time.

Problem: extract a summary of the common structure and dynamics of the underlying relations.

Applications: predict missing relationships, forecast future links, identify clusters and groups of nodes.

The paper relies heavily on statistical techniques to solve this problem.

Dynamic Multi-group Membership Graph Model

They pay close attention to the three processes governing network dynamics:

Birth and death dynamics of individual groups

Evolution of memberships of nodes to groups

The structure of network interactions between group members as well as non-members.

Birth and death dynamics of individual groups

Why do we need to know when groups are born and when they die?

It makes clear how many groups exist at each specific time.

A group can be in one of two states: active (alive) or inactive (not yet born, or dead).

Figure: Black: active (alive), White: inactive

Formal model of the birth and death dynamics of individual groups

It uses the distance-dependent Indian Buffet Process (dd-IBP), a time-related stochastic process, to model this.

Customers enter an Indian Buffet restaurant and sample some subset of an infinitely long sequence of dishes.

In this application, the time steps t play the role of customers: each one samples a set of active groups Kt.

Formally speaking, at the first time step t = 1, a Poisson(λ) number of groups is initially active, i.e., |K1| ∼ Poisson(λ).

Poisson(γλ) new groups are also born at time t.
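The birth process described above can be sketched in a few lines. This is only an illustration, not the paper's implementation: the slide states the Poisson(λ) initial count and the Poisson(γλ) births, while the death rule used here (each active group becomes inactive independently with probability γ) is an assumption, and the names `sample_poisson` and `simulate_active_groups` are hypothetical.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_active_groups(num_steps, lam, gamma, seed=0):
    """Return a list with the set of active group ids at t = 1..num_steps."""
    rng = random.Random(seed)
    next_id = 0
    active = set()
    history = []
    for t in range(1, num_steps + 1):
        if t == 1:
            # |K_1| ~ Poisson(lambda), as stated on the slide.
            births = sample_poisson(lam, rng)
        else:
            # ASSUMPTION: each active group dies independently with prob. gamma.
            active = {g for g in active if rng.random() >= gamma}
            # Poisson(gamma * lambda) new groups are born at time t.
            births = sample_poisson(gamma * lam, rng)
        # New groups get fresh ids, so a dead group never comes back.
        active |= set(range(next_id, next_id + births))
        next_id += births
        history.append(set(active))
    return history


if __name__ == "__main__":
    for t, groups in enumerate(simulate_active_groups(5, lam=3.0, gamma=0.3), 1):
        print(t, sorted(groups))
```

With γ = 0 no group ever dies and no new group is born after t = 1, so the active set stays constant, which is a quick sanity check on the sketch.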

Dynamics of node group memberships

Intuition: nodes join and leave groups based on their current status.

They use a Markov chain to model the dynamics of nodes joining and leaving groups.

Whether node i belongs to group k at time t is denoted by a binary variable $z_{ik}^{t} \in \{0, 1\}$,

where $a_k$ and $b_k$ are two parameters of group k, each of them a probability.
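A two-state Markov chain of this kind can be simulated directly. The transition formula itself is not in this transcript, so the roles given to the parameters below are an assumption: a_k is taken as the probability that a non-member joins group k, and b_k as the probability that a member leaves it. The function name `simulate_membership` is hypothetical.

```python
import random


def simulate_membership(num_steps, a_k, b_k, z_init=0, seed=0):
    """Simulate the membership indicator of one node in one group over time.

    ASSUMPTION: a_k = P(z^t = 1 | z^(t-1) = 0)  (join probability)
                b_k = P(z^t = 0 | z^(t-1) = 1)  (leave probability)
    """
    rng = random.Random(seed)
    z = z_init
    path = [z]
    for _ in range(num_steps - 1):
        if z == 0:
            z = 1 if rng.random() < a_k else 0
        else:
            z = 0 if rng.random() < b_k else 1
        path.append(z)
    return path


if __name__ == "__main__":
    # A "sticky" group: nodes rarely join, and rarely leave once inside.
    print(simulate_membership(20, a_k=0.1, b_k=0.1))
```

Small values of both parameters give long runs of membership and non-membership, which matches the intuition that a node's next state depends on its current status.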

Relationship between node group memberships and links of the network

Intuition: a link between two nodes depends on their current groups.

They assume there is a connection between the nodes' group memberships and the links of the network.

They build on the Multiplicative Attribute Graph (MAG) model: each group k is associated with a link affinity matrix $M \in \mathbb{R}^{2 \times 2}$.

Its four entries represent the affinities between two group members, between a member and a non-member (in either order), and between two non-members.
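To make the role of the affinity matrices concrete, here is a minimal sketch of the link probability in the original MAG model, where the probability of a link is the product over groups of the affinity entry selected by the two nodes' memberships. The slides do not give the exact link function used in this paper, so the product form with entries in [0, 1] is an assumption here, as are the index convention (1 = member, 0 = non-member) and the name `mag_link_probability`.

```python
def mag_link_probability(z_i, z_j, affinities):
    """Product-form MAG link probability between nodes i and j.

    z_i, z_j   : lists of 0/1 membership indicators, one entry per group.
    affinities : one 2x2 nested list per group; affinities[k][a][b] is the
                 affinity when node i has membership a and node j has b.
    ASSUMPTION: entries lie in [0, 1] and are multiplied across groups.
    """
    prob = 1.0
    for zi_k, zj_k, theta_k in zip(z_i, z_j, affinities):
        prob *= theta_k[zi_k][zj_k]
    return prob


if __name__ == "__main__":
    # Two groups sharing the same hypothetical affinity matrix:
    # members link to each other often (0.9), everyone else rarely.
    theta = [[0.1, 0.2],
             [0.2, 0.9]]
    # Node i is in group 0 only; node j is in both groups.
    print(mag_link_probability([1, 0], [1, 1], [theta, theta]))
```

In the example the first group contributes the member-member entry and the second group contributes the non-member/member entry, so shared memberships raise the link probability while mismatches lower it.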

Model Inference via MCMC

After introducing these three component models, they sample the model parameters with MCMC.

Sampling node group memberships Z: use the forward-backward recursion algorithm.

Sampling the group membership transition matrix Q: use the conjugate prior of the Bernoulli distribution (a Beta prior), so the posterior has a closed form.

Sampling link affinities M: use Metropolis-Hastings and Hybrid Monte Carlo (HMC) sampling.
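The conjugate update for the transition probabilities can be illustrated as follows. This is a sketch under assumptions rather than the paper's sampler: it treats the join and leave probabilities of one group as Bernoulli rates with a Beta(alpha, beta) prior and reads the sufficient statistics off already-sampled membership paths. The helper names are hypothetical.

```python
def count_transitions(paths):
    """Count 0->0, 0->1, 1->0 and 1->1 transitions over all membership paths."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for path in paths:
        for prev, curr in zip(path, path[1:]):
            counts[(prev, curr)] += 1
    return counts


def beta_posterior(successes, failures, alpha=1.0, beta=1.0):
    """Beta(alpha, beta) prior plus Bernoulli counts gives a Beta posterior."""
    return alpha + successes, beta + failures


def posterior_transition_params(paths, alpha=1.0, beta=1.0):
    """Posterior Beta parameters for the join and leave probabilities.

    ASSUMPTION: join = P(0 -> 1) and leave = P(1 -> 0) for a single group.
    """
    c = count_transitions(paths)
    join_post = beta_posterior(c[(0, 1)], c[(0, 0)], alpha, beta)
    leave_post = beta_posterior(c[(1, 0)], c[(1, 1)], alpha, beta)
    return join_post, leave_post


if __name__ == "__main__":
    paths = [[0, 0, 1, 1], [1, 0, 0, 1]]
    print(posterior_transition_params(paths))  # ((3.0, 3.0), (2.0, 2.0))
```

Inside a Gibbs sweep, one would then draw the new probabilities from these posteriors, for example with `random.betavariate(*join_post)` from the standard library.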

Experiments

Datasets they use:

NIPS co-authorship network for T = 17 years (1987 to 2003).

DBLP co-authorship network obtained from 21 Computer Science conferences from 2000 to 2009 (T = 10).

INFOCOM dataset representing the physical proximity interactions between 78 students at the 2006 INFOCOM conference (T = 50).

Tasks they have:

Missing link prediction

Future network forecasting

Missing link prediction

Randomly hold out 20% of node pairs throughout the entire time period.

Baselines:

Naive: the relationship between each pair of nodes is modeled by a Bernoulli distribution with a Beta(1, 1) prior.

LFRM: a model for static networks.

DRIFT: an infinite factorial HMM model.
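The hold-out split and the Naive baseline are simple enough to sketch. The slides do not spell out how the baseline uses the observed data, so this assumes that each pair has some observed 0/1 link values across time steps and that the prediction is the posterior mean of the Bernoulli rate under the Beta(1, 1) prior. The names `hold_out_pairs` and `naive_link_probability` are hypothetical.

```python
import itertools
import random


def hold_out_pairs(num_nodes, fraction=0.2, seed=0):
    """Randomly split all unordered node pairs into (held_out, kept) sets."""
    pairs = list(itertools.combinations(range(num_nodes), 2))
    rng = random.Random(seed)
    rng.shuffle(pairs)
    num_held = int(round(fraction * len(pairs)))
    return set(pairs[:num_held]), set(pairs[num_held:])


def naive_link_probability(observed_links, alpha=1.0, beta=1.0):
    """Posterior mean of a Bernoulli rate under a Beta(alpha, beta) prior.

    observed_links: 0/1 values seen for one node pair (ASSUMPTION: one value
    per observed time step).
    """
    ones = sum(observed_links)
    return (alpha + ones) / (alpha + beta + len(observed_links))


if __name__ == "__main__":
    held, kept = hold_out_pairs(10)
    print(len(held), len(kept))               # 9 36
    print(naive_link_probability([1, 1, 0]))  # 0.6
```

With no observations at all the baseline falls back to the prior mean of 0.5, which is why it is a weak but well-defined reference point.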

Future network forecasting

Given networks from t = 1, . . . , T, they want to predict the links at t = T + 1. They train the models on the first $T_{obs}$ networks, fix the parameters, and then for each model run MCMC sampling one time step into the future.

Conclusion

We saw three component models for network time series.

We saw how to sample their parameters.

Personally, I think this paper is good in terms of the statistical methods it uses.

Community Detection in Graphs through Correlation

Next, we move to the second paper. It is about community detection and builds on modularity-based methods.

Major problem of modularity

Resolution problem.

$K_m$ is an m-clique.

The detected communities are marked by dashed circles.

Multi-resolution: further divide each detected community.

Bias: two tendencies are introduced, namely to merge small communities and to split large communities.

Connection with itemset search

Graph communities: the number of internal edges is greater than expected under the assumption of a random partition

Correlated itemsets: occur more often than expected under the assumption of item independence

Connection: modularity = leverage

Correlated Itemsets

Given an itemset $S = \{I_1, I_2, \ldots, I_m\}$ with m items in a dataset with n transactions

True probability: $tp_S = P(S)$

Expected probability: $ep_S = \prod_{i=1}^{m} P(I_i)$

Correlation measure: $M_S = f(tp_S, ep_S)$

Chi-square: based on the squared difference $(tp_S - ep_S)^2$

Probability ratio: $tp_S / ep_S$

Leverage: $tp_S - ep_S$

Correlated itemset example

t1: Beef, Chicken, Milk

t2: Beef, Cheese

t3: Cheese, Boots

t4: Beef, Chicken, Cheese

t5: Beef, Chicken, Clothes, Cheese, Milk

For the itemset {Beef, Chicken}: $tp = \frac{3}{5}$, $ep = \frac{3}{5} \cdot \frac{4}{5} = \frac{12}{25}$, Leverage $= tp - ep = \frac{3}{25}$
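The arithmetic of the {Beef, Chicken} example can be checked with a short script. It uses exact fractions so the numbers match the slide; the helper names `true_probability` and `expected_probability` are chosen here for illustration.

```python
from fractions import Fraction

transactions = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
]


def true_probability(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def expected_probability(itemset, transactions):
    """Product of the single-item probabilities (item independence)."""
    ep = Fraction(1)
    for item in itemset:
        ep *= true_probability({item}, transactions)
    return ep


itemset = {"Beef", "Chicken"}
tp = true_probability(itemset, transactions)
ep = expected_probability(itemset, transactions)
leverage = tp - ep
probability_ratio = tp / ep
print(tp, ep, leverage, probability_ratio)  # 3/5 12/25 3/25 5/4
```

Both measures agree that the pair is positively correlated here: the leverage is positive and the probability ratio is above 1.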

Modularity Function

Transforming modularity function

For a partition $\{G_1, G_2, \ldots, G_l\}$ of graph G

$k_i$: degree of node i

$k_i^{internal}$: number of nodes in the same group as node i that are connected to node i.

They found that, after translating the undirected graph into a doubly-directed one (each undirected edge becomes two directed edges), modularity can be expressed with the itemset criteria.

If we randomly select an edge from the doubly-directed graph:

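The true probability that the edge lies in $G_p$: $tp_p = \frac{\sum_{i \in G_p} k_i^{internal}}{2m}$

Probability that the edge starts from $G_p$: $\frac{\sum_{i \in G_p} k_i}{2m}$

Probability that the edge ends in $G_p$: $\frac{\sum_{j \in G_p} k_j}{2m}$

The expected probability that the edge lies in $G_p$ under the assumption of independence: $ep_p = \frac{\sum_{i \in G_p} k_i}{2m} \cdot \frac{\sum_{j \in G_p} k_j}{2m}$

Here m is the number of undirected edges, so the doubly-directed graph has 2m edges.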

Connecting correlation with modularity

For a given partition $G_p$, partial modularity $Q_p = tp_p - ep_p$

For a given itemset S, leverage $= tp_S - ep_S$

Since the other correlation measures are also functions of tp and ep, they can change the partial modularity function $Q_p$ by using the formula of another correlation measure.
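The correspondence between partial modularity and leverage can be tried out on a toy graph. The sketch below computes the true and expected probabilities for each group from an undirected edge list and sums a correlation measure over the groups. Summing measures other than leverage across groups is an assumption made for illustration (the slides only say the formula of Q_p is swapped), the graph is assumed to have no self-loops, and all function names are hypothetical.

```python
from fractions import Fraction


def partial_terms(edges, group):
    """Return (tp_p, ep_p) for one group of an undirected simple graph."""
    m = len(edges)
    degree_sum = 0    # sum of k_i over the nodes in the group
    internal_sum = 0  # sum of k_i^internal over the nodes in the group
    for u, v in edges:
        u_in, v_in = u in group, v in group
        degree_sum += int(u_in) + int(v_in)
        if u_in and v_in:
            internal_sum += 2  # an internal edge is seen from both endpoints
    tp = Fraction(internal_sum, 2 * m)
    ep = Fraction(degree_sum, 2 * m) ** 2
    return tp, ep


def leverage(tp, ep):
    return tp - ep


def probability_ratio(tp, ep):
    return tp / ep


def partition_score(edges, partition, measure=leverage):
    """Sum a correlation measure over groups; leverage gives modularity."""
    return sum(measure(*partial_terms(edges, g)) for g in partition)


if __name__ == "__main__":
    # Two triangles joined by one bridge edge.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    partition = [{0, 1, 2}, {3, 4, 5}]
    print(partition_score(edges, partition))  # 5/14
```

With `measure=leverage` the score equals the usual modularity of the partition; passing `probability_ratio` instead changes the objective while keeping the same two ingredients.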

Experiments

Modify the objective function

Greedy search (hierarchical clustering)

Baseline: Modularity-based methods (Leverage)

Datasets (real life): 1. Karate club (two equal-size communities); 2. College football (12 equal-size communities)

Evaluation measures:

Rand Index (Rand 1971), Jaccard, F-measure, Normalized mutual information (Danon 2005)
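Two of these measures, the Rand index and the pair-counting Jaccard index, compare a detected partition with the ground truth by looking at every pair of nodes. The sketch below follows the standard pair-counting definitions; whether the paper computes them exactly this way is not stated on the slide, and the function names are hypothetical.

```python
from itertools import combinations


def pair_counts(labels_true, labels_pred):
    """Count node pairs by whether each labeling puts them together.

    Returns (a, b, c, d): together in both, together only in the truth,
    together only in the prediction, apart in both.
    """
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            a += 1
        elif same_true:
            b += 1
        elif same_pred:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rand_index(labels_true, labels_pred):
    a, b, c, d = pair_counts(labels_true, labels_pred)
    return (a + d) / (a + b + c + d)


def jaccard_index(labels_true, labels_pred):
    a, b, c, _ = pair_counts(labels_true, labels_pred)
    # Convention chosen here: two all-singleton partitions agree perfectly.
    return a / (a + b + c) if (a + b + c) else 1.0


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    found = [0, 0, 1, 2]
    print(rand_index(truth, found), jaccard_index(truth, found))
```

The Rand index rewards pairs that are correctly kept apart, while the Jaccard index ignores them, so the two can rank partitions differently when most pairs belong to different communities.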

Real life datasets

Summary

Connection between community detection and correlation search

Modularity is good only when there are large and clear communities

Likelihood ratio is robust to any type of communities

Probability ratio partitions the whole graph into small communities with 2 or 3 objects
