Paper Presentation
Steve Jan
Virginia Tech
March 5, 2015
Steve Jan (Virginia Tech) Paper Presentation March 5, 2015 1 / 28
Two papers to present
Nonparametric Multi-group Membership Model for Dynamic Networks, NIPS 2013, Myunghwan Kim and Jure Leskovec, Stanford
Community Detection in Graphs through Correlation, KDD 2014, Lian Duan, W. Nick Street, Yanchi Liu, Haibing Lu, New Jersey Institute of Technology, Santa Clara University
Nonparametric Multi-group Membership Model
Social networks are often dynamic, in the sense that relations between entities rise and decay over time.
Problem: extract a summary of the common structure and dynamics of the underlying relations.
Applications: predict missing relationships, forecast future links, identify clusters and groups of nodes.
Note
The paper relies heavily on statistical techniques to solve this problem.
Dynamic Multi-group Membership Graph Model
The authors pay close attention to three processes governing network dynamics:
Birth and death dynamics of individual groups
Evolution of memberships of nodes to groups
The structure of network interactions between group members as well as non-members
Birth and death dynamics of individual groups
Why do we need to know when groups are born and when they die? Tracking births and deaths makes it clear how many groups exist at each time step.
A group can be in one of two states: active (alive) or inactive (not yet born or dead).
Figure: Black: active (alive), White: inactive
Formal model of birth and death dynamics of individual groups
The model uses the distance-dependent Indian Buffet Process (dd-IBP), a time-dependent stochastic process.
In the Indian buffet metaphor, customers enter a restaurant and each samples some subset of an infinitely long sequence of dishes.
In this application, each time step $t$ plays the role of a customer and samples a set of active groups $K_t$.
Formally, at the first time step $t = 1$, a Poisson($\lambda$) number of groups is initially active, i.e., $K_1 \sim \mathrm{Poisson}(\lambda)$.
In addition, Poisson($\gamma\lambda$) new groups are born at time $t$.
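The birth process above can be simulated in a few lines. This is only a rough sketch, not the paper's dd-IBP sampler: the slide states the Poisson($\lambda$) initial count and the Poisson($\gamma\lambda$) births, while the death rule used here (each active group dies independently with probability `gamma` per step) is an assumption added for illustration. The function names are hypothetical.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) sample with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_active_groups(lam, gamma, num_steps, seed=0):
    """Return a list with the set of active group ids at each time step."""
    rng = random.Random(seed)
    next_id = sample_poisson(lam, rng)  # K_1 ~ Poisson(lambda)
    active = set(range(next_id))
    history = [set(active)]
    for _ in range(1, num_steps):
        # ASSUMPTION (not on the slide): each active group dies
        # independently with probability gamma.
        survivors = {g for g in active if rng.random() >= gamma}
        # From the slide: Poisson(gamma * lambda) new groups are born.
        births = sample_poisson(gamma * lam, rng)
        newborn = set(range(next_id, next_id + births))
        next_id += births
        active = survivors | newborn
        history.append(set(active))
    return history


if __name__ == "__main__":
    for t, groups in enumerate(simulate_active_groups(3.0, 0.2, 5), start=1):
        print(t, sorted(groups))
```

Group ids are never reused, so a dead group stays dead, which matches the black/white picture of active and inactive groups.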
Dynamics of node group memberships
Intuition: nodes join and leave groups based on their current membership status.
The authors use a Markov chain to model the dynamics of nodes joining and leaving groups.
Whether node $i$ of the network belongs to group $k$ at time $t$ is denoted by a binary variable $z_{ik}^{t} \in \{0, 1\}$,
where $a_k$ and $b_k$ are the two parameters of the chain, each a probability.
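A two-state Markov chain of this kind is easy to simulate. The sketch below assumes that `a_k` is the probability of joining the group (0 to 1) and `b_k` the probability of leaving it (1 to 0); the slide only says that both are probabilities, so this reading, like the function names, is an assumption.

```python
import random


def simulate_membership(a_k, b_k, num_steps, z_start=0, seed=0):
    """Simulate the membership indicator z of one node in one group over time."""
    rng = random.Random(seed)
    z = z_start
    path = [z]
    for _ in range(num_steps - 1):
        if z == 0:
            # ASSUMPTION: a_k is the probability that a non-member joins.
            z = 1 if rng.random() < a_k else 0
        else:
            # ASSUMPTION: b_k is the probability that a member leaves.
            z = 0 if rng.random() < b_k else 1
        path.append(z)
    return path


def stationary_membership_probability(a_k, b_k):
    """Long-run fraction of time spent as a member, for a_k + b_k > 0."""
    return a_k / (a_k + b_k)


if __name__ == "__main__":
    print(simulate_membership(0.1, 0.3, 20))
    print(stationary_membership_probability(0.1, 0.3))
```

Under this reading, a small `b_k` means memberships are sticky, and the ratio `a_k / (a_k + b_k)` gives the long-run share of time a node spends in the group.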
Relationship between node group memberships and links of the network
Intuition: a link between two nodes depends on their current group memberships.
The authors assume there is a connection between nodes' memberships in groups and the links of the network.
They build on the Multiplicative Attribute Graph model: each group $k$ is associated with a link affinity matrix $M \in \mathbb{R}^{2 \times 2}$.
Its four entries correspond to links between two members, between a member and a non-member, and between two non-members.
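To make the role of the affinity matrices concrete, here is a minimal sketch of the basic multiplicative rule of the Multiplicative Attribute Graph model, where the link probability is the product over groups of one matrix entry selected by the two nodes' memberships. The exact link function used in the presented paper may differ from this plain product; the index convention (1 = member, 0 = non-member) and the numbers are made up for illustration.

```python
def mag_link_probability(z_i, z_j, affinities):
    """Multiply, over all groups k, the affinity entry picked by (z_i[k], z_j[k]).

    z_i, z_j   -- lists of 0/1 membership indicators, one per group
    affinities -- list of 2x2 matrices, one per group
    ASSUMPTION: index 1 means "member" and index 0 means "non-member".
    """
    probability = 1.0
    for k, matrix in enumerate(affinities):
        probability *= matrix[z_i[k]][z_j[k]]
    return probability


if __name__ == "__main__":
    # Illustrative values only: members of group 0 link to each other with
    # affinity 0.9, mixed pairs with 0.2, non-member pairs with 0.5.
    affinities = [
        [[0.5, 0.2], [0.2, 0.9]],
        [[0.8, 0.4], [0.4, 0.6]],
    ]
    print(mag_link_probability([1, 0], [1, 1], affinities))
```

With one matrix per group, a shared membership can either raise or lower the chance of a link, depending on whether the member-member entry is larger or smaller than the others.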
Model Inference via MCMC
After introducing these three model components, the authors sample the model parameters via MCMC.
Sampling node group memberships Z: use the forward-backward recursion algorithm.
Sampling the group membership transition matrix Q: use the conjugate (Beta) prior of the Bernoulli distribution and sample from the resulting posterior.
Sampling link affinities M: use Metropolis-Hastings and Hybrid Monte Carlo (HMC) sampling.
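The conjugate step for the transition matrix can be sketched as follows. Given sampled membership paths for one group, each transition out of state 0 or state 1 is a Bernoulli trial, so a Beta prior yields a Beta posterior. The Beta(1, 1) prior, the reading of the two probabilities as "join" and "leave", and the function names are assumptions for illustration, not the paper's exact update.

```python
import random


def count_transitions(path):
    """Count 0->0, 0->1, 1->0 and 1->1 transitions in one membership path."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for previous, current in zip(path, path[1:]):
        counts[(previous, current)] += 1
    return counts


def sample_transition_probs(paths, prior=(1.0, 1.0), seed=0):
    """Draw (join, leave) probabilities from their Beta posteriors."""
    rng = random.Random(seed)
    total = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for path in paths:
        for key, value in count_transitions(path).items():
            total[key] += value
    alpha, beta = prior  # ASSUMPTION: Beta(1, 1) prior unless overridden
    # Posterior for joining: successes are 0->1, failures are 0->0.
    join = rng.betavariate(alpha + total[(0, 1)], beta + total[(0, 0)])
    # Posterior for leaving: successes are 1->0, failures are 1->1.
    leave = rng.betavariate(alpha + total[(1, 0)], beta + total[(1, 1)])
    return join, leave


if __name__ == "__main__":
    paths = [[0, 0, 1, 1, 0], [0, 1, 1, 1, 1]]
    print(sample_transition_probs(paths))
```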
Experiments
Datasets they use:
NIPS co-authorship network for T = 17 years (1987 to 2003)
DBLP co-authorship network, obtained from 21 Computer Science conferences from 2000 to 2009 (T = 10)
INFOCOM dataset, representing the physical proximity interactions between 78 students at the 2006 INFOCOM conference (T = 50)
Tasks they have:
Missing link prediction
Future network forecasting
Missing link prediction
Randomly hold out 20% of node pairs throughout the entire time period.
Baselines:
Naive: the relationship between each pair of nodes is modeled by a Bernoulli distribution with a Beta(1, 1) prior.
LFRM: a model for static networks.
DRIFT: an infinite factorial HMM model.
Future network forecasting
Given networks for $t = 1, \ldots, T$, the goal is to predict the links at $t = T + 1$. The models are trained on the first $T_{obs}$ networks, the parameters are fixed, and then for each model MCMC sampling is run one time step into the future.
Conclusion
We saw three models that together describe how a network evolves over time.
We saw how to sample their parameters.
Personally, I think this paper is strong in terms of the statistical methods it uses.
Community Detection in Graphs through Correlation
We now move to the next paper. It is about community detection and builds on modularity-based methods.
Major problem of modularity
Resolution problem
$K_m$ is an $m$-clique.
The detected communities are marked by dashed circles.
Multi-resolution
Further divide each detected community.
Biases are introduced: the tendency to merge small communities and to split large communities.
Connection with itemset search
Graph communities: the number of internal edges is greater than expected under the assumption of a random partition
Correlated itemsets: occur more often than expected under the assumption of item independence
Connection: modularity = leverage
Correlated Itemsets
Given an itemset $S = \{I_1, I_2, \ldots, I_m\}$ with $m$ items in a dataset with $n$ transactions
True probability: $tp_S = P(S)$
Expected probability: $ep_S = \prod_{i=1}^{m} P(I_i)$
Correlation measure: $M_S = f(tp_S, ep_S)$
Chi-square: $\frac{(tp_S - ep_S)^2}{ep_S}$
Probability ratio: $tp_S / ep_S$
Leverage: $tp_S - ep_S$
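The three measures listed above are all simple functions of the true and expected probabilities. A minimal sketch, using the simplified forms exactly as they appear on this slide (the function names are hypothetical):

```python
def leverage(tp, ep):
    """Difference between true and expected probability."""
    return tp - ep


def probability_ratio(tp, ep):
    """Ratio of true to expected probability (requires ep > 0)."""
    return tp / ep


def chi_square(tp, ep):
    """Slide form of chi-square: squared difference scaled by ep (requires ep > 0)."""
    return (tp - ep) ** 2 / ep


if __name__ == "__main__":
    tp, ep = 0.6, 0.48
    print(leverage(tp, ep), probability_ratio(tp, ep), chi_square(tp, ep))
```

All three share the same signature, which is what later makes it possible to swap one measure for another inside an objective function.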
Correlated itemset example
t1: Beef, Chicken, Milk
t2: Beef, Cheese
t3: Cheese, Boots
t4: Beef, Chicken, Cheese
t5: Beef, Chicken, Clothes, Cheese, Milk
For the itemset {Beef, Chicken}: $tp = \frac{3}{5}$, $ep = \frac{3}{5} \cdot \frac{4}{5}$, leverage $= tp - ep = \frac{3}{25}$
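The numbers in this example can be checked directly from the five transactions. The sketch below uses exact fractions; the helper names are hypothetical.

```python
from fractions import Fraction

TRANSACTIONS = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
]


def true_probability(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def expected_probability(itemset, transactions):
    """Product of the single-item probabilities (independence assumption)."""
    result = Fraction(1)
    for item in itemset:
        result *= true_probability({item}, transactions)
    return result


if __name__ == "__main__":
    itemset = {"Beef", "Chicken"}
    tp = true_probability(itemset, TRANSACTIONS)
    ep = expected_probability(itemset, TRANSACTIONS)
    print(tp, ep, tp - ep)  # 3/5 12/25 3/25
```

Beef appears in 4 of 5 transactions and Chicken in 3 of 5, so the expected probability is 12/25, and the leverage is 15/25 - 12/25 = 3/25.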
Modularity Function
Transforming modularity function
For a partition $\{G_1, G_2, \ldots, G_l\}$ on graph $G$
$k_i$: degree of node $i$
$k_i^{internal}$: number of nodes in the same group as node $i$ that are connected to node $i$
Transforming modularity function
The authors found that by translating the undirected-graph modularity into a directed-graph one, they can use itemset criteria to represent modularity.
If we randomly select an edge from the doubly-directed graph (which has $2m$ directed edges):
The true probability of the edge in $G_p$: $tp = \frac{\sum_{i \in G_p} k_i^{internal}}{2m}$
Probability that the edge starts from $G_p$: $\frac{\sum_{i \in G_p} k_i}{2m}$
Probability that the edge ends in $G_p$: $\frac{\sum_{j \in G_p} k_j}{2m}$
The expected probability of the edge in $G_p$ under the assumption of independence: $ep = \frac{\sum_{i \in G_p} k_i}{2m} \cdot \frac{\sum_{j \in G_p} k_j}{2m}$
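These two probabilities can be computed for any partition of a small graph. The sketch below is an illustration under stated assumptions: the graph is simple and undirected (no self-loops or repeated edges), it is given as an edge list, and the function name and example graph are hypothetical.

```python
from fractions import Fraction


def true_and_expected(edges, partition):
    """Return one (tp, ep) pair per group of the partition.

    edges     -- list of undirected edges (u, v) of a simple graph
    partition -- list of disjoint node sets covering all nodes
    """
    m = len(edges)
    group_of = {node: p for p, group in enumerate(partition) for node in group}
    degree_sum = [0] * len(partition)    # sum of k_i over the group
    internal_sum = [0] * len(partition)  # sum of k_i^internal over the group
    for u, v in edges:
        degree_sum[group_of[u]] += 1
        degree_sum[group_of[v]] += 1
        if group_of[u] == group_of[v]:
            # An internal edge adds 1 to the internal degree of both endpoints.
            internal_sum[group_of[u]] += 2
    pairs = []
    for p in range(len(partition)):
        tp = Fraction(internal_sum[p], 2 * m)
        ep = Fraction(degree_sum[p], 2 * m) ** 2
        pairs.append((tp, ep))
    return pairs


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    pairs = true_and_expected(edges, [{0, 1, 2}, {3, 4, 5}])
    print(pairs, sum(tp - ep for tp, ep in pairs))
```

For this example each triangle has tp = 3/7 and ep = 1/4, and summing tp - ep over both groups gives 5/14, which agrees with the usual modularity of this partition.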
Transforming modularity function
Connecting correlation with modularity
For a given group $G_p$ of the partition, partial modularity $Q_p = tp_p - ep_p$
For a given itemset $S$, leverage $= tp_S - ep_S$
Since the other correlation measures are also functions of $tp$ and $ep$, the partial modularity function $Q_p$ can be changed by substituting the formula of another correlation measure.
Experiments
Modify the objective function
Greedy search (hierarchical clustering)
Baseline: Modularity-based methods (Leverage)
Datasets (real life): 1. Karate club (two equal-size communities) 2. College football (12 equal-size communities)
Evaluation measures:
Rand Index (Rand 1971), Jaccard, F-measure, Normalized mutual information (Danon 2005)
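Two of these measures, the Rand index and the Jaccard index, can be computed by counting node pairs. The sketch below is one common pair-counting formulation, not necessarily the exact implementation used in the paper; the function names are hypothetical, and returning 1.0 for the Jaccard index when no pair is grouped together in either labeling is a convention chosen here.

```python
from itertools import combinations


def pair_counts(truth, predicted):
    """Count node pairs by whether they share a group in each labeling."""
    both = only_truth = only_predicted = neither = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_predicted = predicted[i] == predicted[j]
        if same_truth and same_predicted:
            both += 1
        elif same_truth:
            only_truth += 1
        elif same_predicted:
            only_predicted += 1
        else:
            neither += 1
    return both, only_truth, only_predicted, neither


def rand_index(truth, predicted):
    """Fraction of pairs on which the two labelings agree (needs >= 2 nodes)."""
    both, only_truth, only_predicted, neither = pair_counts(truth, predicted)
    return (both + neither) / (both + only_truth + only_predicted + neither)


def jaccard_index(truth, predicted):
    """Agreement restricted to pairs grouped together in at least one labeling."""
    both, only_truth, only_predicted, _ = pair_counts(truth, predicted)
    denominator = both + only_truth + only_predicted
    # Convention chosen for this sketch: no co-grouped pairs counts as agreement.
    return both / denominator if denominator else 1.0


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    predicted = [0, 0, 1, 2]
    print(rand_index(truth, predicted), jaccard_index(truth, predicted))
```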
Real life datasets
Summary
Connection between community detection and correlation search
Modularity is good only when there are large and clear communities
Likelihood ratio is robust to any type of communities
Probability ratio partitions the whole graph into small communities with 2 or 3 objects