UBLF: An Upper Bound Based Approach to Discover Influential Nodes in Social Networks

Authors: C. Zhou, P. Zhang, J. Guo, X. Zhu, L. Guo
Presenter: Peng Zhang, Chinese Academy of Sciences

December 7-10, 2013, Dallas, Texas

IEEE ICDM 2013

Content

• Background
• Problem Formulation
• Related work
• Our solution
• Experiments
• Conclusion

Background

Social networks are widely used for:
– Viral marketing
– Information dissemination
– Technology/idea transfer

Influence propagation
– Influence maximization
– Community detection
– Influence inference
– Early warning of public opinion
– Link prediction/friend recommendation
– Partner recommendation/social cooperation/team formation

Problem Formulation

Given a directed social graph G = (V, E), a budget k, and a stochastic propagation model M, find k nodes such that the expected spread of influence is maximized [Kempe et al., KDD'03].
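Stated as an optimization problem, the task above can be written as follows. The symbol A(S) is our own notation and is not used in the slides. It denotes the random set of nodes that are active when a diffusion started from seed set S stops.

```latex
S^{*} \;=\; \arg\max_{S \subseteq V,\ |S| = k} \; M(S),
\qquad
M(S) \;=\; \mathbb{E}\bigl[\, |A(S)| \,\bigr]
```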

Challenges:

How to measure the objective function M(S)?

How to find the optimal solution, i.e., the subset of the k most influential nodes?

Problem Formulation

How to measure the influence M(S)?

Stochastic propagation models:
– IC (Independent Cascade) model
– LT (Linear Threshold) model
– Other propagation models, e.g., continuous-time IC or LT models

Monte Carlo (MC) simulation

Exact calculation of M(S) under the IC and LT models is #P-hard (Chen et al., KDD'10).

IC propagation model

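[Figure: example directed graph with nodes a, b, c, d, e, f, g, h and I. Its edges carry activation probabilities from .1 to .4. Computing the exact spread on such a graph is #P-hard.]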
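Because exact computation is #P-hard, M(S) is estimated by Monte Carlo simulation. The sketch below shows one way to do this under the IC model. In each simulated cascade, every newly activated node gets a single chance to activate each inactive out-neighbor, succeeding with the edge's probability. The estimate is the average final number of active nodes. The graph format (a dict of `(neighbor, probability)` lists) and the function names `simulate_ic` and `estimate_spread` are illustrative assumptions, not code from the paper.

```python
import random
from collections import deque


def simulate_ic(graph, seeds, rng):
    """One Independent Cascade run; returns the number of nodes active at the end.

    graph: dict mapping node -> list of (out_neighbor, propagation_probability)
    """
    active = set(seeds)
    frontier = deque(active)
    while frontier:
        u = frontier.popleft()
        # Each activated node is processed once, so every edge (u, v) gets one coin flip.
        for v, p in graph.get(u, []):
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return len(active)


def estimate_spread(graph, seeds, runs=10_000, seed=0):
    """Monte Carlo estimate of M(S): the mean spread over `runs` independent cascades."""
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs
```

For example, `estimate_spread({"a": [("b", 0.5)]}, {"a"})` should return a value close to 1.5, because b is activated in about half of the runs. More runs reduce the variance of the estimate but raise its cost.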

Greedy Algorithm

How to find a subset of k nodes that are the most influential?

Influence maximization under both the IC and LT models is NP-hard (Kempe et al., KDD'03).

Property 1: M(S) is monotone: M(S) ≤ M(T) whenever S ⊆ T.

Property 2: M(S) is submodular: M(S ∪ {v}) − M(S) ≥ M(T ∪ {v}) − M(T) whenever S ⊆ T and v ∉ T (cf. the set cover problem).
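The link to set cover can be made concrete with a coverage function. Here f(S) is the number of distinct elements covered by the sets chosen in S, a classic monotone submodular function. The brute-force check below is our own illustration; the sets in `COVER` and the helper names are hypothetical. It verifies monotonicity and the diminishing-returns inequality for every pair S ⊆ T.

```python
from itertools import combinations

# Hypothetical example: each "node" covers a set of elements, as in set cover.
COVER = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6},
    "d": {1, 6},
}


def coverage(nodes):
    """f(S): number of distinct elements covered by the nodes in S."""
    covered = set()
    for n in nodes:
        covered |= COVER[n]
    return len(covered)


def subsets(items):
    """All subsets of `items`, as frozensets."""
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_monotone_submodular(f, ground):
    """Brute-force check of monotonicity and diminishing returns for every S ⊆ T."""
    all_sets = list(subsets(sorted(ground)))
    for T in all_sets:
        for S in all_sets:
            if not S <= T:
                continue
            if f(S) > f(T):            # monotonicity: S ⊆ T implies f(S) <= f(T)
                return False
            for x in ground - T:       # diminishing returns for every x outside T
                if f(S | {x}) - f(S) < f(T | {x}) - f(T):
                    return False
    return True


print(is_monotone_submodular(coverage, set(COVER)))  # prints True
```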

Greedy Algorithm

Advantage: approximation guarantee of 1 − 1/e ≈ 63%

Disadvantage: heavy computation cost
– Inner loop: each evaluation of M(S) needs many Monte Carlo simulations
– Outer loop: time complexity of O(Nk), where N is the network size
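A minimal sketch of the greedy algorithm makes both loops visible. `spread` is any function that returns M(S) for a seed set. In practice it would be a Monte Carlo estimate, which is what makes each inner-loop call expensive. The function name and the evaluation counter are our own additions; the counter only exists to expose the O(Nk) number of spread evaluations.

```python
def greedy(nodes, spread, k):
    """Plain greedy seed selection.

    nodes:  list of candidate nodes
    spread: function mapping a set of seeds to its (estimated) influence M(S)
    k:      budget
    Returns (seeds, number_of_spread_evaluations).
    """
    seeds, evaluations = [], 0
    for _ in range(k):                       # outer loop: k rounds
        current = spread(set(seeds))
        evaluations += 1
        best, best_gain = None, float("-inf")
        for v in nodes:                      # inner loop: every node not chosen yet
            if v in seeds:
                continue
            gain = spread(set(seeds) | {v}) - current
            evaluations += 1
            if gain > best_gain:             # keep the node with the largest marginal gain
                best, best_gain = v, gain
        if best is None:                     # fewer than k candidates available
            break
        seeds.append(best)
    return seeds, evaluations
```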

Improvement direction (I): Heuristic algorithms

• Heuristic algorithms
– ShortestPath: Kimura and Saito (PKDD'06) “Tractable models for information diffusion in social networks”
– DegreeDiscount: Chen et al. (KDD'09) “Efficient influence maximization in social networks”
– MIA: Chen et al. (KDD'10) “Scalable influence maximization for prevalent viral marketing in large-scale social networks”
– DAG: Chen et al. (ICDM'10) “Scalable influence maximization in social networks under the linear threshold model”
– SIMPATH: Goyal et al. (ICDM'11) “SIMPATH: An Efficient Algorithm for Influence Maximization under the Linear Threshold Model”

Shortest Path from a to c

[Figure: example graph with nodes a, c, d, e, f, g]

DegreeDiscount

[Figure: example graph with nodes 2 and 5]

Node 2's degree will shrink

Advantage: faster than the Greedy algorithm

Disadvantage: no performance guarantee
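As a concrete heuristic, the following sketch applies the DegreeDiscountIC rule as we recall it from Chen et al. (KDD'09). It assumes an undirected graph with a uniform propagation probability p. A node v with degree d_v and t_v already-selected neighbors gets the discounted degree dd_v = d_v − 2t_v − (d_v − t_v)·t_v·p. The function name and the tie-breaking by sorted node label are assumptions of this sketch. The heuristic runs no Monte Carlo simulation, which is why it is fast, but nothing guarantees the quality of the chosen seeds.

```python
def degree_discount_ic(adj, k, p):
    """DegreeDiscountIC-style seed selection (sketch).

    adj: dict node -> set of neighbours (undirected; every node must be a key)
    k:   number of seeds
    p:   uniform propagation probability
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    dd = dict(degree)                 # discounted degree, initially the plain degree
    t = {v: 0 for v in adj}           # number of neighbours already selected as seeds
    seeds, chosen = [], set()
    for _ in range(min(k, len(adj))):
        candidates = [v for v in sorted(adj) if v not in chosen]
        u = max(candidates, key=lambda v: dd[v])   # first maximum in sorted order wins ties
        seeds.append(u)
        chosen.add(u)
        # Selecting u makes its unselected neighbours less attractive ("degree will shrink").
        for v in adj[u]:
            if v in chosen:
                continue
            t[v] += 1
            dd[v] = degree[v] - 2 * t[v] - (degree[v] - t[v]) * t[v] * p
    return seeds
```

On a graph where two hubs are adjacent, the discount makes the second hub less attractive once the first is chosen, so the heuristic tends to pick a hub elsewhere in the graph.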

Improvement direction (II): Advanced greedy

• Advanced greedy algorithms
– CELF: Leskovec et al. (KDD'07) “Cost-effective outbreak detection in networks”
– CELF++: Goyal et al. (WWW'11) “CELF++: optimizing the greedy algorithm for influence maximization in social networks”

[Figure (animated over several slides): Greedy algorithm vs. CELF algorithm, comparing the reward of nodes a, b, c, d and e in each round.]
Advantage: by maintaining upper bounds on the marginal gains, CELF reduces the number of Monte Carlo calls and speeds up the greedy algorithm by up to 700 times.

Disadvantage: needs N Monte Carlo simulations to initialize the upper bound, where N is the network size.
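The lazy-forward idea can be sketched with a priority queue of marginal-gain upper bounds. By submodularity, a gain computed in an earlier round can only overestimate the current gain. So if a node's freshly recomputed gain is still at the top of the queue, it can be selected without re-evaluating the other nodes. The first loop in the sketch is the N initial evaluations mentioned above. The function name, the return value, and the assumption M(∅) = 0 are ours; this is not the original CELF code.

```python
import heapq


def celf(nodes, spread, k):
    """Lazy-forward greedy in the style of CELF (sketch).

    spread(S) returns the (estimated) influence M(S); M(empty set) is assumed to be 0.
    Returns (seeds, number_of_spread_evaluations).
    """
    evaluations = 0
    heap = []
    # First round: one spread evaluation per node to initialise the upper bounds.
    for v in nodes:
        heap.append((-spread({v}), v, 0))
        evaluations += 1
    heapq.heapify(heap)
    seeds, current = [], 0.0
    while heap and len(seeds) < k:
        neg_gain, v, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # Gain is fresh for the current seed set and beats every remaining upper bound.
            seeds.append(v)
            current += -neg_gain
        else:
            # Stale bound: recompute the gain w.r.t. the current seeds and reinsert it.
            gain = spread(set(seeds) | {v}) - current
            evaluations += 1
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds, evaluations
```

With the coverage sets a = {1, 2, 3}, b = {3, 4}, c = {4, 5, 6}, d = {1, 6} and k = 2, this sketch selects a and c using 5 spread evaluations. Four of them belong to the initialization round.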

Our work

Motivation: Can we initialize the upper bounds without actually running the MC simulations?

CELF algorithm

Node   Upper bound   MC simulations
a      2.1           1
b      1.5           1
c      1.1           1
d      1.8           1
e      1.2           1

UBLF algorithm

Node   Upper bound   MC simulations
a      2.3           0
b      1.7           0
c      1.2           0
d      1.8           0
e      1.2           0

The upper bound of M(S)

Global view

Local view

Proposition 2 establishes a relationship between the activation probabilities at time t and at time t+1.

How many heads?

M(S) is bounded by the sum of a series.

Under what condition does the series converge, and what is its limit?

Its area?

But we know its upper bound!

Too hard!

Convergence condition: the total influence to or from any node is less than 1.

Under condition (14), we get a tractable upper bound.
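This text does not spell out Proposition 2 or condition (14), so the following is only a sketch under an explicit assumption. We assume Proposition 2 is the union-bound inequality P_{t+1}(v) ≤ Σ_u pp(u, v)·P_t(u), where P_t(v) is the probability that v is activated exactly at step t and pp(u, v) is the propagation probability of edge (u, v). Summing the per-step bounds over t then bounds M(S) by a series. If all out-sums, or all in-sums, of pp are below 1, the series converges. This holds because the spectral radius of a nonnegative matrix is at most its largest row sum and at most its largest column sum. The names `series_upper_bound` and `satisfies_condition` are hypothetical.

```python
def series_upper_bound(pp, seeds, tol=1e-12, max_steps=10_000):
    """Upper-bound M(S) by summing P_0 + P_1 + P_2 + ... (assumed union-bound recursion).

    pp[u][v]: propagation probability of edge u -> v (dict of dicts)
    Assumed step: P_{t+1}(v) <= sum_u P_t(u) * pp[u][v].
    """
    nodes = set(pp) | {v for nbrs in pp.values() for v in nbrs} | set(seeds)
    p_t = {v: (1.0 if v in seeds else 0.0) for v in nodes}
    total = sum(p_t.values())
    for _ in range(max_steps):
        nxt = {v: 0.0 for v in nodes}
        for u, nbrs in pp.items():
            if p_t[u] == 0.0:
                continue
            for v, prob in nbrs.items():
                nxt[v] += p_t[u] * prob
        mass = sum(nxt.values())
        total += mass
        p_t = nxt
        if mass < tol:                 # remaining terms are negligible
            break
    return total


def satisfies_condition(pp):
    """Sufficient for convergence: every out-sum < 1, or every in-sum < 1."""
    out_ok = all(sum(nbrs.values()) < 1 for nbrs in pp.values())
    into = {}
    for nbrs in pp.values():
        for v, prob in nbrs.items():
            into[v] = into.get(v, 0.0) + prob
    in_ok = all(s < 1 for s in into.values())
    return out_ok or in_ok
```

For a single edge a → b with probability 0.5, the series gives 1 + 0.5 = 1.5, which equals the exact spread. On a two-node cycle with probability 0.5 in both directions it gives 2, while the exact spread is 1.5. This shows the bound can be loose when the graph has cycles.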

Our UBLF algorithm

• CELF: the first round is time-consuming because it needs full MC simulations for every node.

• UBLF: the first round is calculated analytically.
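The sketch below shows how precomputed upper bounds could replace CELF's simulation-based first round. Every node enters the priority queue with its analytic bound and is marked as never simulated. A node is simulated only when it reaches the top of the queue. The function name `ublf_select`, its arguments, and the returned evaluation count are assumptions for illustration. The bounds must be valid upper bounds for the lazy selection to remain correct.

```python
import heapq


def ublf_select(bounds, spread, k):
    """CELF-style lazy greedy whose first round uses precomputed upper bounds (sketch).

    bounds: dict node -> analytic upper bound on M({node}) (assumed >= the true value)
    spread: function giving the (estimated) influence M(S) of a seed set
    Returns (seeds, number_of_spread_evaluations).
    """
    # computed_at = -1 marks a bound that has never been checked by simulation.
    heap = [(-b, v, -1) for v, b in bounds.items()]
    heapq.heapify(heap)
    seeds, current, evaluations = [], 0.0, 0
    while heap and len(seeds) < k:
        neg_gain, v, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # Gain is up to date and no other node's bound exceeds it: select v.
            seeds.append(v)
            current += -neg_gain
        else:
            # Only now is a simulation spent on v.
            gain = spread(set(seeds) | {v}) - current
            evaluations += 1
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds, evaluations
```

Consider coverage sets a = {1, 2, 3}, b = {3, 4}, c = {5} with bounds 3.5, 2.5 and 1.2. The first seed (a) is selected after a single evaluation, much like the example in which node 1 is selected after only one MC simulation.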

Our work: An example of UBLF

Monte Carlo simulation

Node 1 is selected! (only one MC simulation)

Experiments

• Datasets
– Ca-GrQc
– Digger
– Ca-HepPh
– Email-Enron

• Baselines
– CELF
– Degree
– DegreeDiscount
– PageRank
– Random

Statistics of the datasets

Experiments

• Comparison results (number of MC simulations)

Observation:

UBLF requires significantly fewer MC calls in total than CELF.

Experiments

• Comparison results (Influence spread)

Observations:

The influence spreads of UBLF and CELF are identical, which again confirms that UBLF and CELF select nodes by the same logic.

Experiments

• Comparison results (Time cost)

Observation:

UBLF is 2-5 times faster than CELF.

Conclusions

• Background
• Problem Formulation
• Greedy Algorithm
– Heuristic algorithms: DegreeDiscount, PageRank, etc.
– Advanced greedy algorithms: CELF, CELF++
• UBLF
• Comparisons

Email: [email protected]

Questions?
