Page 1: Spotting Culprits in Epidemics: How many and Which ones?

Spotting Culprits in Epidemics: How many and Which ones?

B. Aditya Prakash, Virginia Tech
Jilles Vreeken, University of Antwerp
Christos Faloutsos, Carnegie Mellon University

IEEE ICDM, Brussels, December 11, 2012

Page 2: Spotting Culprits in Epidemics: How many and Which ones?

Contagions
• Social collaboration
• Information Diffusion
• Viral Marketing
• Epidemiology and Public Health
• Cyber Security
• Human mobility
• Games and Virtual Worlds
• Ecology
• Localized effects: riots…

Page 3: Spotting Culprits in Epidemics: How many and Which ones?

Prakash, Vreeken, Faloutsos 2012

Virus Propagation
• Susceptible-Infected (SI) Model
• Diseases over contact networks
• β: infection probability along a contact edge

CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts [AJPH 2007]
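To illustrate the SI model named above, here is a minimal Python sketch. It assumes discrete time steps and that every infected node infects each susceptible neighbor independently with probability β per step. The function names `simulate_si` and `grid_graph`, the adjacency-dict representation, and the 2-d grid helper are illustrative assumptions, not taken from the slides.

```python
import random

def simulate_si(adj, seeds, beta, steps, rng=None):
    """Simulate a discrete-time SI epidemic (assumed semantics, see lead-in).

    adj   : dict mapping node -> list of neighbor nodes (undirected graph)
    seeds : nodes infected at time 0
    beta  : per-edge, per-step infection probability
    steps : number of time steps to simulate
    Returns the set of infected nodes after `steps` steps.
    """
    rng = rng or random.Random()
    infected = set(seeds)
    for _ in range(steps):
        newly = set()
        for u in infected:
            for v in adj[u]:
                if v not in infected and rng.random() < beta:
                    newly.add(v)
        infected |= newly  # SI: infected nodes never recover
    return infected

def grid_graph(n):
    """Build an n x n 2-d grid as an adjacency dict (illustrative helper)."""
    adj = {(r, c): [] for r in range(n) for c in range(n)}
    for r in range(n):
        for c in range(n):
            for dr, dc in ((1, 0), (0, 1)):
                rr, cc = r + dr, c + dc
                if rr < n and cc < n:
                    adj[(r, c)].append((rr, cc))
                    adj[(rr, cc)].append((r, c))
    return adj
```

With `beta = 1.0` the infection spreads one hop per step, so the snapshot after `t` steps is exactly the set of nodes within graph distance `t` of the seeds; smaller values of β give irregular snapshots, which is what makes the culprit question hard.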

Page 4: Spotting Culprits in Epidemics: How many and Which ones?

Outline
• Motivation and Introduction
• Problem Definition
• Intuition
• MDL
• Experiments
• Conclusion

Page 5: Spotting Culprits in Epidemics: How many and Which ones?

Culprits: Problem definition

2-d grid

Q: Who started it?

Page 6: Spotting Culprits in Epidemics: How many and Which ones?

Culprits: Problem definition

Prior work: [Lappas et al. 2010, Shah et al. 2011]

2-d grid

Q: Who started it?

Page 7: Spotting Culprits in Epidemics: How many and Which ones?

Outline
• Motivation and Introduction
• Problem Definition
• Intuition
• MDL
• Experiments
• Conclusion

Page 8: Spotting Culprits in Epidemics: How many and Which ones?

Culprits: Exoneration

Page 9: Spotting Culprits in Epidemics: How many and Which ones?

Culprits: Exoneration

Page 10: Spotting Culprits in Epidemics: How many and Which ones?

Who are the culprits
• Two-part solution
  – use MDL for number of seeds
  – for a given number: exoneration = centrality + penalty
• Running time = linear! (in edges and nodes)

NetSleuth

Page 11: Spotting Culprits in Epidemics: How many and Which ones?

Outline
• Motivation and Introduction
• Problem Definition
• Intuition
• MDL
  – Construction
  – Optimization
• Experiments
• Conclusion

Page 12: Spotting Culprits in Epidemics: How many and Which ones?

Modeling using MDL
• Minimum Description Length Principle == Induction by compression
• Related to Bayesian approaches
• MDL = Model + Data
• Model
  – Scoring the seed-set: encoding the integer |S|, plus choosing one of the possible |S|-sized sets
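The seed-set score combines two pieces: the cost of encoding the integer |S| and the cost of picking one of the possible |S|-sized sets. The sketch below turns that into a calculation under two assumptions, since the slide's own formula did not survive in this transcript: the integer is encoded with Rissanen's universal code (log* plus a constant, a common MDL choice), and the set costs log2 of the binomial coefficient C(N, |S|), where N is the number of nodes. The names `universal_int_bits` and `seed_set_bits` are illustrative.

```python
import math

# Normalizing constant of Rissanen's universal code for integers.
LOG2_C0 = math.log2(2.865064)

def universal_int_bits(n):
    """Bits to encode an integer n >= 1: log*(n) + log2(c0)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    bits = LOG2_C0
    term = math.log2(n)
    while term > 0:  # log*(n) sums only the positive iterated logarithms
        bits += term
        term = math.log2(term)
    return bits

def seed_set_bits(num_nodes, num_seeds):
    """Assumed model cost: encode |S|, then one of the C(N, |S|) possible sets."""
    choices = math.comb(num_nodes, num_seeds)
    return universal_int_bits(num_seeds) + math.log2(choices)
```

Under these assumptions every extra seed makes the model more expensive, so a larger seed set is only preferred when it makes the data (the ripple) correspondingly cheaper to describe.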

Page 13: Spotting Culprits in Epidemics: How many and Which ones?

Modeling using MDL
• Data: Propagation Ripples

[Figure panels: Original Graph; Infected Snapshot; Ripple R1; Ripple R2]

Page 14: Spotting Culprits in Epidemics: How many and Which ones?

Modeling using MDL
• Ripple cost
  – How long is the ripple
  – How the ‘frontier’ advances
• Total MDL cost

[Figure: Ripple R]

Page 15: Spotting Culprits in Epidemics: How many and Which ones?

Outline
• Motivation and Introduction
• Problem Definition
• Intuition
• MDL
  – Construction
  – Optimization
• Experiments
• Conclusion

Page 16: Spotting Culprits in Epidemics: How many and Which ones?

How to optimize the score?
• Two-step process
  – Given k, quickly identify high-quality set
  – Given these nodes, optimize the ripple R

Page 17: Spotting Culprits in Epidemics: How many and Which ones?

Optimizing the score
• High-quality k-seed-set
  – Exoneration
• Best single seed:
  – Eigenvector for the smallest eigenvalue of the Laplacian sub-matrix
  – Analyze a Constrained SI epidemic
• Exonerate neighbors
• Repeat
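The loop on this slide can be sketched with NumPy. This is one interpretation, not the authors' implementation: it assumes the "Laplacian sub-matrix" is the Laplacian of the full graph restricted to the rows and columns of the infected nodes still under consideration, that the best single seed is the node with the largest absolute entry in the eigenvector for the smallest eigenvalue, and that exoneration is modeled by dropping each chosen seed from the considered set before recomputing. `pick_seeds` is a hypothetical name, and the dense eigendecomposition is used for clarity only; it does not achieve the linear running time stated on the slides.

```python
import numpy as np

def pick_seeds(adjacency, infected, k):
    """Greedily pick up to k seed candidates among the infected nodes.

    adjacency : symmetric 0/1 matrix of the full (undirected) graph
    infected  : list of infected node indices
    k         : number of seeds to return
    """
    A = np.asarray(adjacency, dtype=float)
    # Degrees come from the FULL graph, so infected nodes with many
    # uninfected neighbors get a larger diagonal entry.
    laplacian = np.diag(A.sum(axis=1)) - A
    candidates = list(infected)
    seeds = []
    for _ in range(min(k, len(candidates))):
        sub = laplacian[np.ix_(candidates, candidates)]
        _, eigvecs = np.linalg.eigh(sub)    # eigenvalues in ascending order
        scores = np.abs(eigvecs[:, 0])      # eigenvector of smallest eigenvalue
        best = candidates[int(np.argmax(scores))]
        seeds.append(best)
        candidates.remove(best)             # assumed exoneration step
    return seeds
```

Removing a chosen seed while keeping full-graph degrees means its neighbors now border a node outside the sub-matrix, which lowers their scores in the next round; that is the sense in which this sketch "exonerates" them. If every node of the graph is infected, the smallest eigenvector is constant and the first pick is uninformative.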

Page 18: Spotting Culprits in Epidemics: How many and Which ones?

Optimizing the score
• Optimizing R
  – Get the MLE ripple!
• Finally use MDL score to tell us the best set
• NetSleuth: Linear running time in nodes and edges

[Figure: Ripple R]
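Using the MDL score to pick the best set amounts to model selection over the number of seeds k. The sketch below shows that outer loop only. The callback `total_cost`, which stands in for "find a k-seed set, optimize its ripple, return the total MDL cost in bits", is hypothetical, and stopping at the first k that fails to improve is an assumed stopping rule rather than something stated in this transcript.

```python
def choose_num_seeds(total_cost, max_k):
    """Return (k, cost) with the lowest total_cost(k) found for k = 1..max_k.

    total_cost : callable k -> description length in bits (hypothetical hook)
    max_k      : largest number of seeds to try
    Stops at the first k whose cost does not improve on the best so far.
    """
    best_k, best_cost = 1, total_cost(1)
    for k in range(2, max_k + 1):
        cost = total_cost(k)
        if cost >= best_cost:
            break
        best_k, best_cost = k, cost
    return best_k, best_cost
```

For example, `choose_num_seeds(lambda k: (k - 3) ** 2 + 10, 10)` returns `(3, 10)`: the cost falls until k = 3 and rises afterwards, so the search stops there.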

Page 19: Spotting Culprits in Epidemics: How many and Which ones?

Outline
• Motivation and Introduction
• Problem Definition
• Intuition
• MDL
• Experiments
• Conclusion

Page 20: Spotting Culprits in Epidemics: How many and Which ones?

Experiments
• Evaluation functions:
  – MDL based (closer to 1 the better)
  – Overlap based: how far are they? (JD == Jaccard distance)
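For the overlap-based measure, here is a small sketch of the Jaccard distance between two node sets. The transcript does not show which sets the slides compare or how the distances are combined into the final score, so only the plain set distance is given; `jaccard_distance` is an illustrative name.

```python
def jaccard_distance(a, b):
    """Jaccard distance 1 - |A ∩ B| / |A ∪ B| between two node sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention: two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)
```

The distance is 0 for identical sets and 1 for disjoint ones, so lower values mean the two sets agree more closely.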

Page 21: Spotting Culprits in Epidemics: How many and Which ones?

Experiments: # of Seeds

[Figure panels: One Seed; Two Seeds; Three Seeds]

Page 22: Spotting Culprits in Epidemics: How many and Which ones?

Experiments: Quality (MDL and JD)

[Figure panels: One Seed; Two Seeds; Three Seeds. Ideal = 1]

Page 23: Spotting Culprits in Epidemics: How many and Which ones?

Experiments: Quality (Jaccard Scores)

[Figure panels: One Seed; Two Seeds; Three Seeds. Axes: True, NetSleuth. Closer to diagonal, the better]

Page 24: Spotting Culprits in Epidemics: How many and Which ones?

Experiments: Scalability

Page 25: Spotting Culprits in Epidemics: How many and Which ones?

Outline
• Motivation and Introduction
• Problem Definition
• Intuition
• MDL
• Experiments
• Conclusion

Page 26: Spotting Culprits in Epidemics: How many and Which ones?

Conclusion
• Given: Graph and Infections
• Find: Best ‘Culprits’
• Two-part solution
  – use MDL for number of seeds
  – for a given number: exoneration = centrality + penalty
• NetSleuth:
  – Linear running time in nodes and edges

Page 27: Spotting Culprits in Epidemics: How many and Which ones?

B. Aditya Prakash http://www.cs.vt.edu/~badityap

Any Questions?

