Stanford Universitysnap.stanford.edu/class/cs322-2009/11-viral-annot.pdf · 2020. 1. 9. ·...


CS 322: (Social and Information) Network Analysis
Jure Leskovec
Stanford University

Probabilistic models of network diffusion
How cascades spread in real life:
• Viral marketing
• Blogs
• Group membership

Last 20 minutes: mid‐term course evaluation

10/27/2009 Jure Leskovec, Stanford CS322: Network Analysis 2

How do viruses/rumors propagate?
Will a flu‐like virus linger, or will it become extinct?

(Virus) birth rate β: probability that an infected neighbor attacks

(Virus) death rate δ: probability that an infected node heals

[Figure: nodes N, N1, N2, N3 in Healthy or Infected states; infection occurs with prob. β, healing with prob. δ.]

General scheme for epidemic models:
• S…susceptible
• E…exposed
• I…infected
• R…recovered
• Z…immune

Assuming perfect mixing, i.e., the network is a complete graph

The model dynamics:

[Figure: number of nodes vs. time for the Susceptible, Infected, and Recovered populations.]

Susceptible‐Infective‐Susceptible (SIS) model
• Cured nodes immediately become susceptible
• Virus “strength”: s = β / δ

[Figure: two states, Susceptible and Infective. Susceptible to Infective: infected by neighbor with prob. β. Infective to Susceptible: cured internally with prob. δ.]

Assuming perfect mixing (complete graph):

dS/dt = -βSI + δI
dI/dt = βSI - δI

[Figure: number of nodes vs. time for the Susceptible and Infected populations.]
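A minimal Python sketch of the perfect‐mixing SIS dynamics, assuming the standard mean‐field form dS/dt = -βSI + δI and dI/dt = βSI - δI. The function name, the forward‐Euler scheme, the step size, and the parameter values are illustrative assumptions, not taken from the lecture.

```python
def simulate_sis(beta, delta, s0, i0, dt=0.01, steps=10000):
    """Forward-Euler integration of the perfect-mixing SIS equations.

    dS/dt = -beta*S*I + delta*I
    dI/dt =  beta*S*I - delta*I
    """
    s, i = float(s0), float(i0)
    history = [(s, i)]
    for _ in range(steps):
        new_infections = beta * s * i  # susceptible nodes that become infected
        recoveries = delta * i         # infected nodes that heal
        s += dt * (recoveries - new_infections)
        i += dt * (new_infections - recoveries)
        history.append((s, i))
    return history


# Hypothetical values: 1000 nodes, 10 of them initially infected.
history = simulate_sis(beta=0.002, delta=0.5, s0=990, i0=10)
s_end, i_end = history[-1]
```

With these values the infected count settles near N - δ/β = 750, the point where βS = δ and the two flows balance. S + I stays constant because every infection is matched by a loss from S and every recovery by a gain.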

Epidemic threshold of a graph is a value t such that:
• If strength s = β / δ < t, an epidemic cannot happen (it eventually dies out)

Given a graph, compute its epidemic threshold

What should t depend on?
• avg. degree? and/or highest degree?
• and/or variance of degree?
• and/or third moment of degree?
• and/or diameter?

[Wang et al. 2003]

We have no epidemic if:

β/δ < τ = 1/λ1,A

where β is the (virus) birth rate, δ is the (virus) death rate, τ is the epidemic threshold, and λ1,A is the largest eigenvalue of the adjacency matrix A.

► λ1,A alone captures the property of the graph!
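The condition β/δ < τ = 1/λ1,A can be checked numerically. The sketch below is one way to do it in plain Python, assuming an undirected, connected graph with at least one edge, given as a symmetric 0/1 adjacency matrix. The function names and the small star graph are hypothetical, and power iteration is just one convenient way to get the largest eigenvalue.

```python
def largest_eigenvalue(adjacency, iterations=1000):
    """Estimate the largest eigenvalue of a symmetric 0/1 adjacency matrix.

    Power iteration is run on A + I rather than A, so that bipartite graphs
    (whose spectrum is symmetric around zero) still converge.
    """
    n = len(adjacency)
    v = [1.0] * n
    for _ in range(iterations):
        # w = (A + I) v
        w = [v[i] + sum(adjacency[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v (v has unit norm here)
    av = [sum(adjacency[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * av[i] for i in range(n))


def epidemic_threshold(adjacency):
    """tau = 1 / lambda_1 of the adjacency matrix."""
    return 1.0 / largest_eigenvalue(adjacency)


def no_epidemic(beta, delta, adjacency):
    """True when beta/delta is below the threshold, i.e. the virus dies out."""
    return beta / delta < epidemic_threshold(adjacency)


# Hypothetical example: a star with one hub and four leaves (lambda_1 = 2).
star = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
tau = epidemic_threshold(star)
```

For the star, the largest eigenvalue is the square root of the number of leaves, so τ = 0.5. For large graphs a sparse eigenvalue solver would be the more practical choice.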

[Wang et al. 2003]

[Figure: number of infected nodes (0 to 500) vs. time (0 to 1000) on the Oregon graph (10,900 nodes and 31,180 edges), β = 0.001, for δ: 0.05, 0.06, 0.07. Curves are labeled β/δ > τ (above threshold), β/δ = τ (at the threshold), and β/δ < τ (below threshold).]

Does it matter how many people are initially infected?

[Leskovec et al., SDM ’07]

Bloggers write posts and refer (link) to other posts, and the information propagates

[Figure: blogs and their posts over time; time‐ordered hyperlinks between posts form an information cascade.]

Data – Blogs:
• We crawled 45,000 blogs for 1 year
• 10 million posts and 350,000 cascades

[Leskovec et al., TWEB ’07]

Senders and followers of recommendations receive discounts on products

[Figure labels: 10% credit, 10% off.]

Data – Incentivized Viral Marketing program
• 16 million recommendations
• 4 million people
• 500,000 products

[Backstrom et al., KDD ’06]

Use social networks where people belong to explicitly defined groups

Each group defines a behavior that diffuses

Data – LiveJournal:
• On‐line blogging community with friendship links and user‐defined groups
• Over a million users update content each month
• Over 250,000 groups to join

Prob. of adoption depends on the number of friends who have adopted [Bass ‘69, Granovetter ’78]
• What is the shape?
• Distinction has consequences for models and algorithms

[Figure: two candidate curves of prob. of adoption vs. k = number of friends adopting, captioned “Diminishing returns?” and “Critical mass?”]

[Leskovec et al., TWEB ’07]

DVD recommendations (8.2 million observations)

[Figure: probability of purchasing (0 to 0.1) vs. # recommendations received (0 to 40).]

[Backstrom et al., KDD ’06]

LiveJournal community membership

[Figure: prob. of joining vs. k (number of friends in the community).]

[Kossinets‐Watts ‘06]

Sending email:
• Email network of large university
• Prob. of a link as a function of # of common friends

[Figure: prob. of email vs. k (number of common friends).]

For viral marketing:
• We see that node v received the i‐th recommendation and then purchased the product

For communities:
• At time t we see the behavior of node v’s friends

Questions:
• When did v become aware of recommendations or friends’ behavior?
• When did it translate into a decision by v to act?
• How long after this decision did v act?

Dependence on number of friends
Consider: connectedness of friends
• x and y have three friends in the group
• x’s friends are independent
• y’s friends are all connected
• Who is more likely to join?

[Figure: nodes x and y with their friends in the group.]

Competing sociological theories
• Information argument [Granovetter ‘73]
• Social capital argument [Coleman ’88]

Information argument:
• Unconnected friends give independent support

Social capital argument:
• Safety/trust advantage in having friends who know each other

[Backstrom et al. KDD ‘06]

LiveJournal: 1 million users, 250,000 groups

Social capital argument wins! Prob. of joining increases with adjacent members.

Large anonymous online retailer (June 2001 to May 2003)
• 15,646,121 recommendations
• 3,943,084 distinct customers
• 548,523 products recommended
• Products belonging to 4 product groups: books, DVDs, music, VHS

[Figure: recommendation network with edges time‐stamped t1 < t2 < … < tn. Legend: bought but didn’t receive a discount; bought and received a discount; received a recommendation but didn’t buy.]

• There are relatively few DVD titles, but DVDs account for ~50% of recommendations.
• Recommendations per person: DVD: 10; books and music: 2; VHS: 1
• Recommendations per purchase: books: 69; DVDs: 108; music: 136; VHS: 203
• Overall there are 3.69 recommendations per node on 3.85 different products.
• Music recommendations reached about the same number of people as DVDs but used only 1/5 as many recommendations
• Book recommendations reached by far the most people – 2.8 million.
• All networks have a very small number of unique edges. For books, videos and music the number of unique edges is smaller than the number of nodes – the networks are highly disconnected

What role does the product category play?

         products   customers   recommendations   edges       buy + get discount   buy + no discount
Book     103,161    2,863,977   5,741,611         2,097,809   65,344               17,769
DVD      19,829     805,285     8,180,393         962,341     17,232               58,189
Music    393,598    794,148     1,443,847         585,738     7,837                2,739
Video    26,131     239,583     280,270           160,683     909                  467
Full     542,719    3,943,084   15,646,121        3,153,676   91,322               79,164

Some products are easier to recommend than others

product category   number of buy bits   forward recommendations   percent
Book               65,391               15,769                    24.2
DVD                16,459               7,336                     44.6
Music              7,843                1,824                     23.3
Video              909                  250                       27.6
Total              90,602               25,179                    27.8

Does sending more recommendations influence more purchases?

[Figure: number of purchases (0 to 7) vs. outgoing recommendations (20 to 140).]

What is the effectiveness of subsequent recommendations?

[Figure: probability of buying (0.02 to 0.07) vs. exchanged recommendations (5 to 40).]

Consider successful recommendations in terms of:
• av. # senders of recommendations per book category
• av. # of recommendations accepted

Books overall have a 3% success rate (2% with discount, 1% without)

Lower than average success rate (significant at p=0.01 level):
• fiction: romance (1.78), horror (1.81), teen (1.94), children’s books (2.06), comics (2.30), sci‐fi (2.34), mystery and thrillers (2.40)
• nonfiction: sports (2.26), home & garden (2.26), travel (2.39)

Higher than average success rate (statistically significant):
• professional & technical: medicine (5.68), professional & technical (4.54), engineering (4.10), science (3.90), computers & internet (3.61), law (3.66), business & investing (3.62)

47,000 customers responsible for 2.5 out of 16 million recommendations in the system

29% success rate per recommender of an anime DVD

Giant component covers 19% of the nodes

Overall, recommendations for DVDs are more likely to result in a purchase (7%), but the anime community stands out

Variable            Transformation   Coefficient
const                                -0.940 ***
# recommendations   ln(r)             0.426 ***
# senders           ln(ns)           -0.782 ***
# recipients        ln(nr)           -1.307 ***
product price       ln(p)             0.128 ***
# reviews           ln(v)            -0.011 ***
avg. rating         ln(t)            -0.027 *
R2                                    0.74

Significance at the 0.01 (***), 0.05 (**) and 0.1 (*) levels

94% of users make first recommendation without having received one previously

Size of giant connected component increases from 1% to 2.5% of the network (100,420 users) – small!

Some sub‐communities are better connected:
• 24% out of 18,000 users for westerns on DVD
• 26% of 25,000 for classics on DVD
• 19% of 47,000 for anime (Japanese animated film) on DVD

Others are just as disconnected:
• 3% of 180,000 home and gardening
• 2‐7% for children’s and fitness DVDs

Products suited for viral marketing:
• small and tightly knit community
• few reviews, senders, and recipients, but sending more recommendations helps
• pricey products
• rating doesn’t play as much of a role

Observations for future diffusion models:
• purchase decision more complex than threshold or simple infection
• influence saturates as the number of contacts expands
• links lose effectiveness if they are overused

Conditions for successful recommendations:
• professional and organizational contexts
• discounts on expensive items
• small, tightly knit communities

How big are cascades?
What are the building blocks of cascades?

[Figure: two example cascades, labeled “Medical guide book” and “DVD”; the numbers 973 and 938 appear in the figure.]

Given a (social) network
A process, by spreading over the network, creates a graph (a tree)

[Figure: a social network and the cascade (propagation graph) it gives rise to.]

Let’s count cascades

(Each “[subgraph]” marks a small subgraph picture from the slide.)
• [subgraph] is the most common cascade subgraph. It accounts for ~75% of cascades in books, CD and VHS, but only 12% of DVD cascades
• [subgraph] is 6 (1.2 for DVD) times more frequent than [subgraph]
• For DVDs, [subgraph] is more frequent than [subgraph]
• Chains ([subgraph]) are more frequent than [subgraph]
• [subgraph] is more frequent than a collision ([subgraph]) (but collision has fewer edges)
• Late split ([subgraph]) is more frequent than [subgraph]

Stars (“no propagation”)

Bipartite cores (“common friends”)

Nodes having same friends

[Figure (books): count vs. x = cascade size (number of nodes) on log‐log axes, with fit count = 1.8e6 x^-4.98. Steep drop‐off; very few large cascades.]
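A fit of the form count = c · x^a, such as the 1.8e6 x^-4.98 reported for books, is a straight line on log‐log axes. The sketch below recovers c and a by ordinary least squares on the logs. This is one simple estimator chosen for illustration (the lecture does not say how its fit was obtained), and the input counts are synthetic values generated from the reported curve, not the real data.

```python
import math


def fit_power_law(sizes, counts):
    """Least-squares fit of log(count) = log(c) + a * log(size)."""
    xs = [math.log(x) for x in sizes]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope  # (prefactor c, exponent a)


# Synthetic counts lying exactly on the reported curve (illustration only).
sizes = list(range(1, 11))
counts = [1.8e6 * x ** -4.98 for x in sizes]
prefactor, exponent = fit_power_law(sizes, counts)
```

On noisy empirical counts, least squares on log‐log data can give biased exponents, so treat the result as a rough estimate.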

DVD cascades can grow large
• Possibly as a result of websites where people sign up to exchange recommendations

[Figure (DVDs): count vs. x = cascade size (number of nodes) on log‐log axes, with fit ~ x^-1.56. Shallow drop‐off, fat tail; a number of large cascades.]

The probability of observing a cascade on x nodes follows: p(x) ~ x^-2

[Figure: count vs. x = cascade size (number of nodes).]

Cascade sizes follow a heavy‐tailed distribution
• Viral marketing:
  • Books: steep drop‐off: power‐law exponent -5
  • DVDs: larger cascades: exponent -1.5
• Blogs:
  • Power‐law exponent -2

However, it is not a simple branching process
• A simple branching process (on a k‐ary tree):
  • Every node infects each of its k neighbors with prob. p
  • gives exponential cascade size distribution

Questions:
• What role does the underlying social network play?
• Can we make a step towards a more realistic cascade generation (propagation) model?
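A simple branching process on a k‐ary tree can be simulated in a few lines, which makes it easy to compare its cascade sizes with the heavy‐tailed ones observed in the data. The sketch below is an illustration under stated assumptions: the function name, the depth cap, the random seed, and the values k = 2, p = 0.3 are all hypothetical.

```python
import random
from collections import Counter


def branching_cascade_size(k, p, max_depth, rng):
    """Size of one cascade on a k-ary tree, truncated at max_depth levels.

    Every infected node infects each of its k children independently
    with probability p.
    """
    size = 1      # the root is infected
    frontier = 1  # infected nodes at the current level
    for _ in range(max_depth):
        infected = sum(1 for _ in range(frontier * k) if rng.random() < p)
        if infected == 0:
            break
        size += infected
        frontier = infected
    return size


rng = random.Random(0)
# Subcritical case: k * p = 0.6 < 1, so cascades die out quickly.
sizes = [branching_cascade_size(2, 0.3, 50, rng) for _ in range(10000)]
size_histogram = Counter(sizes)
mean_size = sum(sizes) / len(sizes)
```

Plotting `size_histogram` on log‐log axes next to an empirical cascade size distribution is one way to see whether the tails differ. For k·p < 1 the expected size of an untruncated cascade is 1 / (1 - k·p).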

1) Randomly pick a blog to infect; add it to the cascade.
2) Infect each in‐linked neighbor with probability β.
3) Add infected neighbors to cascade.
4) Set node infected in (1) to uninfected.

[Figure: the four steps illustrated on a small blog network with blogs B1, B2, B3, B4.]
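The four steps of the cascade generation model (pick a random blog, infect each in‐linked neighbor with some probability, add the infected neighbors to the cascade, set the infecting node back to uninfected) can be sketched as follows. This is a hedged illustration, not the authors' implementation. It assumes the infection probability is a single parameter `beta`, and that steps 2 to 4 repeat for every newly infected blog until none remain. The `in_links` dictionary, the optional `start` argument, and the `max_edges` safety cap are all hypothetical.

```python
import random
from collections import deque


def generate_cascade(in_links, beta, rng, start=None, max_edges=10000):
    """Generate one cascade; returns (set of blogs, list of cascade edges).

    in_links[u] lists the blogs that link to u (u's in-linked neighbors).
    """
    # Step 1: randomly pick a blog to infect and add it to the cascade.
    if start is None:
        start = rng.choice(sorted(in_links))
    cascade_nodes = {start}
    cascade_edges = []
    infected = deque([start])
    while infected and len(cascade_edges) < max_edges:
        # Step 4: the blog leaves the infected state once it has had its turn.
        u = infected.popleft()
        for v in in_links.get(u, []):
            # Step 2: infect each in-linked neighbor with probability beta.
            if rng.random() < beta:
                # Step 3: add the infected neighbor to the cascade.
                cascade_nodes.add(v)
                cascade_edges.append((v, u))  # v's post links to u's post
                infected.append(v)
    return cascade_nodes, cascade_edges


# Hypothetical blog graph: B2 and B3 link to B1, B4 links to B2 and B3.
in_links = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": []}
nodes, edges = generate_cascade(in_links, beta=1.0, rng=random.Random(0), start="B1")
```

With `beta=1.0` every in‐link fires, so the cascade started at B1 reaches all four blogs through four edges. Because a blog becomes uninfected after its turn it can be infected again later, which is why the sketch caps the number of edges on graphs with cycles.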

Generative model produces realistic cascades (β=0.025)

[Figure panels, each plotting count: cascade size; cascade node in‐degree; size of star cascade; size of chain cascade. Also shown: most frequent cascades.]

Blogs – information epidemics
• Which are the influential/infectious blogs?

Viral marketing
• Who are the trendsetters?
• Influential people?

Disease spreading
• Where to place monitoring stations to detect epidemics?