Scalable Diffusion-Aware Optimization of Network Topology

Elias Boutros Khalil, Bistra Dilkina, Le Song

Georgia Institute of Technology

Problem

• Given

• G(V,E),

• a set of source nodes X (infected nodes)

• Linear Threshold Model

• Find a set of k edges to

• remove, s.t., the spread of a certain

substance is minimized

• add, s.t., the spread of a certain substance

is maximized


Review: Diffusion Models

• Linear Threshold Model

• Each edge has a weight Wuv

• each node v chooses a threshold θv uniformly

at random in [0,1]

• Node v will be infected if the total weight Wuv of its already infected in-neighbors u is at least θv

• Independent Cascade Model

• Each edge has a propagation probability

Puv

• Each infected node u has only one chance

to infect its neighbor v with prob. Puv
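The two models reviewed above can be sketched as single simulation runs. This is a minimal illustration, not the authors' code: the function names, the dictionary layouts `out_edges[u] = [(v, Puv), ...]` and `in_edges[v] = [(u, Wuv), ...]`, and the synchronous sweep used for the LT model are all assumptions made for the example.

```python
import random
from collections import deque


def simulate_ic(out_edges, seeds, rng):
    """One Independent Cascade run; out_edges[u] = [(v, p_uv), ...]."""
    infected = set(seeds)
    frontier = deque(seeds)
    while frontier:
        u = frontier.popleft()
        # u gets exactly one chance to infect each still-uninfected neighbor v.
        for v, p_uv in out_edges.get(u, []):
            if v not in infected and rng.random() < p_uv:
                infected.add(v)
                frontier.append(v)
    return infected


def simulate_lt(in_edges, nodes, seeds, rng):
    """One Linear Threshold run; in_edges[v] = [(u, w_uv), ...]."""
    # Every node draws its threshold uniformly at random in [0, 1).
    threshold = {v: rng.random() for v in nodes}
    infected = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in nodes:
            if v in infected:
                continue
            # Total weight coming from in-neighbors that are already infected.
            weight_in = sum(w for u, w in in_edges.get(v, []) if u in infected)
            if weight_in >= threshold[v]:
                infected.add(v)
                changed = True
    return infected
```

Averaging the size of the returned set over many runs gives a Monte Carlo estimate of the expected spread from the seed set.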


Review: Influence Maximization

• Given

• G(V,E)

• LT model or IC model

• To find k nodes to activate to maximize

the spread of a certain substance

• Greedy algorithm

• Objective function is submodular

• (1-1/e)-approximation
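A minimal sketch of the greedy algorithm for the IC model follows. It assumes the expected spread is estimated by Monte Carlo simulation, so the (1-1/e) guarantee holds only up to the estimation error. The names `ic_spread`, `estimate_spread`, `greedy_seeds` and the parameter `runs` are hypothetical and chosen for this example.

```python
import random


def ic_spread(out_edges, seeds, rng):
    """Number of nodes infected in one Independent Cascade run."""
    infected, frontier = set(seeds), list(seeds)
    while frontier:
        u = frontier.pop()
        for v, p in out_edges.get(u, []):
            if v not in infected and rng.random() < p:
                infected.add(v)
                frontier.append(v)
    return len(infected)


def estimate_spread(out_edges, seeds, runs, rng):
    """Monte Carlo estimate of the expected spread of a seed set."""
    return sum(ic_spread(out_edges, seeds, rng) for _ in range(runs)) / runs


def greedy_seeds(out_edges, nodes, k, runs=200, seed=0):
    """Pick k seeds, each time adding the node with the best estimated spread."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best, best_val = None, -1.0
        for v in nodes:
            if v in chosen:
                continue
            val = estimate_spread(out_edges, chosen + [v], runs, rng)
            if val > best_val:
                best, best_val = v, val
        chosen.append(best)
    return chosen
```

Each greedy step re-estimates the spread for every remaining candidate, which is what makes the plain greedy algorithm expensive on large graphs.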


Edge Deletion Problem

• Given G, source set A,

• Find k edges

• Supermodular

• Greedy algorithm provides (1-1/e)-

approximation

• Scaling up tricks


Edge Addition Problem

• Given G, source set A,

• Find k edges

• Still supermodular (Equivalent to

constrained submodular minimization)

• Algorithm: max. the lowerbound


Edge Addition Problem

• Marginal Gain is bounded

• Apply an approach for constrained submodular

minimization with approximation guarantees

[R. Iyer, S. Jegelka, and J. Bilmes. Fast semidifferential based submodular function optimization. In ICML, 2013.]


Experiments

• Datasets

• Synthetic dataset: generated by Kronecker

graph model

• (1) CorePeriphery, (2) ErdosRenyi and (3)

Hierarchical

• Real datasets:


Experiments

• Competing heuristics

• Random

• Weights: the k edges with the highest weights

• Betweenness

• Eigen: the k edges that maximize the drop in the leading

eigenvalue (eigendrop)

• Degree: k edges whose destination nodes

have the highest out-degrees [8]


Experiments

[Figure: experimental results, panels “Edge deletion” and “Edge addition”]

Core Decomposition of Uncertain Graphs

Francesco Bonchi, Francesco Gullo, Andreas

Kaltenbrunner, Yana Volkovich

Yahoo Labs, Spain

Core decomposition

• k-core of a graph

• a maximal subgraph in which every vertex

is connected to at least k other vertices

within that subgraph

• Core decomposition

• The set of all k-cores of a graph G forms

the core decomposition of G
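As an illustration of the definition, the core index of every vertex (the largest k such that the vertex belongs to the k-core) can be obtained by repeatedly peeling a minimum-degree vertex. The sketch below is a simple quadratic-time version written for readability; the linear-time algorithms replace the `min` scan with degree buckets. The function name and the `adj` layout (node mapped to its set of neighbours) are assumptions for this example.

```python
def core_numbers(adj):
    """Core index of every node of an undirected graph.

    adj maps each node to the set of its neighbours.
    """
    degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a vertex of minimum remaining degree.
        v = min(remaining, key=degree.__getitem__)
        # The core index never decreases along the peeling order.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core
```

The k-core is then the subgraph induced by the vertices whose core index is at least k.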


K-core under uncertain graphs

• A maximal subgraph whose vertices have at

least k neighbours in that subgraph with

probability no less than η

Example


Motivation

• core decomposition can be computed

efficiently in deterministic graphs

• computed in linear time

• However, this efficiency is not guaranteed

in uncertain graphs

• even the simplest graph operations may

become computationally intensive.

• uncertain graph

• edges are assigned a probability of existence

• E.g., protein interactions, the influence of one

person on another

Applications

• Influence maximization

• Idea: just reduce the input graph G by keeping only

the inner-most η-shells

• the higher the core index is, the more likely the

vertex is an influential spreader [17]

• Task-driven team formation

• Node: individuals; edge: a probabilistic topic model

• Given a pair <T,Q> where T is the set of terms, Q is

a set of nodes

• Goal: Find a set of nodes A with Q⊆A that is a

good team to perform the task in T

• Solution: find a connected component of (k,η)-core

which contains A

Algorithm framework


Follow the deterministic

case

η-degree of v: the maximum degree k such that

the probability for v to have degree at least k

is no less than η

Non-trivial to compute
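One way to see why this quantity needs care: the degree of v in an uncertain graph is a sum of independent Bernoulli variables, one per incident edge. The sketch below reads the η-degree as the largest k with Pr[deg(v) ≥ k] ≥ η, in line with the (k,η)-core definition, and computes it with a straightforward quadratic dynamic program over the incident edge probabilities. The function name and this particular DP are illustrative assumptions, not necessarily the procedure used in the paper.

```python
def eta_degree(edge_probs, eta):
    """Largest k such that Pr[deg(v) >= k] >= eta.

    edge_probs lists the existence probabilities of the edges incident to v,
    assumed independent.
    """
    # dist[j] = probability that exactly j of the edges seen so far exist.
    dist = [1.0]
    for p in edge_probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            nxt[j] += q * (1.0 - p)   # this edge is absent
            nxt[j + 1] += q * p       # this edge is present
        dist = nxt
    # Accumulate the tail Pr[deg >= k] from the largest k downwards.
    tail = 0.0
    for k in range(len(dist) - 1, -1, -1):
        tail += dist[k]
        if tail >= eta:
            return k
    return 0
```

For two incident edges with probability 0.5 each, the degree distribution is (0.25, 0.5, 0.25), so `eta_degree([0.5, 0.5], 0.7)` is 1 because Pr[deg ≥ 1] = 0.75 while Pr[deg ≥ 2] = 0.25.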

Experiments


Influence Maximization

Task-driven Team-formation

Fast Influence-based Coarsening for Large Networks

KDD, New York City

August 26, 2014

Manish Purohit^, B. Aditya Prakash*,

Chanhyun Kang^, Yao Zhang*, V S Subrahmanian^

*Virginia Tech ^University of Maryland

Networks are getting huge!


Flickr (friendship network): 87 million users and 8 billion photos until 2013

Amazon (friendship network): 237 million accounts until 2013

Twitter (follower network): 271 million monthly active users

Facebook (friendship network): 829 million daily active users on average in June 2014

Purohit, Prakash, Kang, Zhang, Subrahmanian 2014

Need for fast analysis

• Ever growing list of applications of

network effects

• Viral Marketing

• Immunization

• Information Diffusion

• …

However, scaling traditional algorithms

up to millions of nodes is hard

How to handle large-scale networks

• Approaches

• Use faster / simpler algorithms

• Perform analysis locally

• i.e., divide the large network into

smaller subgraphs

• Zoom-out the network to

obtain a smaller

representation of the network (this paper)


Bird’s eye view of a network


• “Zoom-out” of the graph to get a quick

picture

Called “coarsen” in this paper

[Figure: a big graph with nodes A to F is zoomed out into a small representation of the network]


Outline

• Motivation

• Challenges

• Problem Definition

• Our Proposed Method

• Experiments

• Applications

• Conclusion



Challenges

• C1: How do we maintain diffusive

characteristics when coarsening

networks?

• C2: How do we merge nodes to get the

coarse network?

• C3: How do we find the best nodes to

merge fast?


C1: Information Diffusion

• Cascading behavior in networks

A diffusion is a graph induced by a time-ordered propagation of information (edges)

[Figure: a blog network (blogs B1 to B4, posts, links) and the resulting information cascade. Source: McGlohon et al., SDM2007]


C1: Model information diffusion

• Information spreads over networks

• e.g., a rumor or meme spreading over the Twitter follower

network

• Independent cascade model (IC) [Kempe+, KDD03]

• Weights pij: propagation prob. from i to j

• Each node has only one chance to infect its

neighbors



Meme spreading

C1: Diffusive characteristics

• First eigenvalue λ1 (of adjacency matrix)

is enough for most diffusion models.

(Prakash et al. [ICDM’12])


λ1 is the epidemic threshold

“Safe” “Vulnerable” “Deadly”

Increasing λ1, increasing vulnerability
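Since λ1 drives everything that follows, here is a small sketch of how the first eigenvalue of a weighted adjacency matrix could be computed by power iteration. It assumes non-negative edge weights and iterates on A + I rather than A so that bipartite graphs do not oscillate; the function name and the `weights[(i, j)]` layout are hypothetical. On very large graphs a sparse eigensolver would be the usual choice.

```python
def leading_eigenvalue(weights, nodes, iters=1000, tol=1e-12):
    """Approximate the first eigenvalue of the adjacency matrix.

    weights[(i, j)] is the weight of edge i -> j.
    ASSUMPTION: all weights are non-negative and nodes is non-empty.
    """
    x = {v: 1.0 for v in nodes}
    prev = 0.0
    norm = 1.0
    for _ in range(iters):
        # y = (A + I) x; the identity shift avoids oscillation on
        # periodic (e.g. bipartite) graphs and adds exactly 1 to lambda_1.
        y = dict(x)
        for (i, j), w in weights.items():
            y[i] += w * x[j]
        norm = max(y.values())
        x = {v: val / norm for v, val in y.items()}
        if abs(norm - prev) < tol:
            break
        prev = norm
    # Undo the shift. Convergence is slow when the graph has no cycles.
    return norm - 1.0
```

For an unweighted triangle this returns 2, and for a two-node cycle with weights 0.5 it returns 0.5.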

C1: maintain diffusive characteristics

• Goal: maintain the diffusive characteristics of

the original network in the coarsened network

[Figure: the original network (nodes A to F) is coarsened into the coarsened network]

Make the coarsened network have the least

change in the first eigenvalue


C2: How to merge nodes

• Goal: Merge nodes of graph G to get the

coarsened graph that “approximates” G with

respect to diffusion

Merging a and b gives

the least change

in λ1

Is this correct?

0.375!

Original network

Influence from d to b: 0.5

Influence from d to a: 0.25

Average: 0.375


C2: How to merge nodes (details)

• In general: merging a,b

C3: which nodes to merge

• Goal:

• Find the best nodes to merge

• Fast, scalable to large network

Talk about it

later


Outline

• Motivation

• Challenges



• Experiments

• Applications

• Conclusion



Problem Definition

Graph Coarsening Problem (GCP)

Given: large graph G(V, E), and reduction

factor α

Find: the best set of edges to merge

Such that: |λG - λH| is minimized

• (i.e. H is the coarsened graph with the

least change in the first eigenvalue)


Naive Greedy Heuristic

Steps:

• Score every edge by the change in eigenvalue

• Greedily choose the edge (a,b) with the least score,

and merge (a,b)

• Re-evaluate the scores of every edge and repeat

• Too slow! O(m^2) time to score all edges

• Lose time benefits of analyzing the smaller graph
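The scoring step of the naive heuristic can be sketched as follows, which also shows where the cost comes from: every candidate edge triggers a full merge and a fresh eigenvalue computation. The merge rule used here is a simplifying assumption based on the averaging example (0.5 and 0.25 giving 0.375): edges to and from the merged node get the average of the two original weights, with a missing edge counted as 0. The paper's exact rule may differ, and all function names are hypothetical.

```python
def leading_eigenvalue(weights, nodes, iters=500, tol=1e-12):
    """Power iteration on A + I (non-negative weights assumed)."""
    x = {v: 1.0 for v in nodes}
    prev, norm = 0.0, 1.0
    for _ in range(iters):
        y = dict(x)
        for (i, j), w in weights.items():
            y[i] += w * x[j]
        norm = max(y.values())
        x = {v: val / norm for v, val in y.items()}
        if abs(norm - prev) < tol:
            break
        prev = norm
    return norm - 1.0


def merge_pair(weights, nodes, a, b):
    """Merge b into a and return (new_weights, new_nodes).

    ASSUMPTION: weights touching the merged node are averaged over a and b.
    """
    new_nodes = [v for v in nodes if v != b]
    merged = {}
    for (i, j), w in weights.items():
        if {i, j} <= {a, b}:
            continue  # drop edges between a and b
        i2 = a if i == b else i
        j2 = a if j == b else j
        # Half of each original weight, so two parallel edges average out.
        share = 0.5 if (i in (a, b) or j in (a, b)) else 1.0
        merged[(i2, j2)] = merged.get((i2, j2), 0.0) + share * w
    return merged, new_nodes


def naive_scores(weights, nodes):
    """Score each connected node pair by |lambda_G - lambda_H| after merging."""
    lam = leading_eigenvalue(weights, nodes)
    scores = {}
    for (a, b) in weights:
        if a == b or (b, a) in scores:
            continue
        h_weights, h_nodes = merge_pair(weights, nodes, a, b)
        scores[(a, b)] = abs(lam - leading_eigenvalue(h_weights, h_nodes))
    return scores
```

With the weights from the averaging example, `merge_pair({("d", "a"): 0.25, ("d", "b"): 0.5, ("a", "b"): 1.0}, ["a", "b", "d"], "a", "b")` leaves a single edge from d to the merged node with weight 0.375.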


Outline

• Motivation


• Challenges


• CoarseNet

• Experiments

• Applications

• Conclusion



CoarseNet: idea

• Can we approximate the edge scores faster?

• Yes!

• Use matrix perturbation arguments to

estimate (up to first order terms) the score of

an edge in constant time!

• Score all edges in O(m) time

• Naive Heuristic: O(m^2) time


CoarseNet: details

• Corollary 5.1: Given the first eigenvalue λ,

and corresponding eigenvectors u, v, the

score of a node pair score(a, b) can be

approximated in constant time.

(a,b) is a node pair


CoarseNet

We want to characterize the change of λ after coarsening

[Figure: nodes a and b (with neighbors e, f, g) are merged into node c; annotation: the out-adjacency vector of merged node c]

[Formula annotations: A u = λ u; u(i); left eigenvector; right eigenvector; weight of (b,a); weight of (a,b)]

See paper for details

CoarseNet: Complete algorithm

• Steps

1: Compute scores for all edges (node pairs)

2: Merge nodes with smallest score

3: Go to step 1 until αn nodes are left

[Figure: Original Network (weight=0.5), Assigning scores, Merging edges, Coarsened Network]


CoarseNet: running time

• Running time: O(m ln(m) + α·n·n_θ)

• m: number of edges

• n: number of nodes

• n_θ: the maximum degree of any vertex during the

merging process


Outline

• Motivation

• Challenges



• Experiments

• Applications

• Conclusion



How do we perform?

The first eigenvalue gets preserved well up to large

coarsening factors!

[Figure panels: Amazon, DBLP]

(See more results in the paper)


Scalability w.r.t Reduction Factor (α)

Scales linearly with the desired reduction factor

[Figure panels: Amazon (334,863 vertices), DBLP (511,163 vertices)]



Scalability w.r.t Graph Size (𝑛)

Flickr

Scales linearly with the number of nodes

We extracted 6 connected components (with 500K to 1M vertices in steps of 100K) of the Flickr network


Outline

• Motivation

• Challenges



• Experiments

• Applications

• Conclusion



Application 1: Influence Maximization

• How to market well?

• Convince a subset of individuals to adopt a new

product

• Then, trigger a large cascade of further adoptions

• Influence maximization problem

• [Kempe et al., KDD03]

• Find the best set of seeds in a network to achieve

highest diffusion

[Figure: “Who is the most influential person?”]

Application 1: Influence Maximization

• Our fast algorithm CSPIN:

Step 1: Coarsen the large social network using CoarseNet

Step 2: Solve influence maximization on the coarsened network

Step 3: Randomly select one node from each selected “supernode”
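The three steps can be wired together as in the sketch below. The coarsening routine and the influence maximization solver are passed in as callables because their interfaces are not given here; the signatures `coarsen(graph, alpha) -> (coarse_graph, members)` and `solve_im(coarse_graph, k) -> list of supernodes` are assumptions made for this example, as is the function name `cspin`.

```python
import random


def cspin(graph, k, alpha, coarsen, solve_im, seed=0):
    """Sketch of the CSPIN pipeline with pluggable components.

    coarsen(graph, alpha) is assumed to return (coarse_graph, members), where
    members maps each supernode to the list of original nodes merged into it.
    solve_im(coarse_graph, k) is assumed to return k seed supernodes.
    """
    rng = random.Random(seed)
    # Step 1: coarsen the large network.
    coarse_graph, members = coarsen(graph, alpha)
    # Step 2: solve influence maximization on the coarsened network.
    super_seeds = solve_im(coarse_graph, k)
    # Step 3: pick one original node uniformly at random from each supernode.
    return [rng.choice(members[s]) for s in super_seeds]
```

Any existing influence maximization solver can be plugged in as `solve_im`; it only ever sees the smaller coarsened graph.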

[Figure: Step 1: coarsen the network; Step 2: solve influence maximization on the coarsened network; Step 3: randomly select one node from supernode C]


Quality of CSPIN

• We use and compare against the fast and

popular PMIA algorithm (Chen et al.

[KDD’10])

We obtain influence spread as good as by PMIA


Quality of CSPIN w.r.t 𝛼

Up to 95% of the vertices can be merged

without significantly affecting the influence spread!


Scalability w.r.t number of seeds

[Figure: running time (log scale) vs. number of seeds, Portland (1.5 million vertices)]

Finds good solutions in minutes instead of hours!



Application 2: Diffusion Characterization

• Goal: use Graph Coarsening to understand

information cascades

• Dataset: Flixster

• a friendship network with movie ratings

• Cascade: the same movie rating from friends

• Methodology

• coarsen the network using CoarseNet with the

reduction factor α=0.5

• study the formed groups (supernodes)


Diffusion observation

Observation 1: a very large fraction of movies

propagate in a small number of groups

Observation 2: a multi-modal distribution

Stats:

• 1891 groups

• mean group size: 16.6

• the largest group: 22061

nodes (roughly 40% of

nodes)



Can get non-network

surrogates for

super-nodes

Outline

• Motivation

• Challenges



• Experiments

• Applications

• Conclusion



Conclusion

Graph Coarsening Problem

• Given: a large graph and

the reduction factor

• Find: "best" nodes to

coarsen

CoarseNet

• estimate edge score in

constant time

• Sub-quadratic

Applications

• Influence Maximization

• Diffusion Characterization



Any Questions?

• Code at:

http://www.cs.vt.edu/~badityap/

Funding:

