Predictive Blacklisting as an Implicit Recommendation System
Authors: Fabio Soldo, Anh Le, Athina Markopoulou
IEEE INFOCOM 2010
Reporter: Jing Chiu
Advisor: Yuh-Jye Lee
112/04/22 Data Mining & Machine Learning Lab
Outline
• Introduction
  ▫ Blacklists
  ▫ Recommendation System
• Related Works
  ▫ LWOL
  ▫ GWOL
  ▫ HPB
  ▫ Room for improvement
• DSHIELD Dataset Observation
• Model Overview
  ▫ Time Series: EWMA
  ▫ Neighborhood Model: kNN, CA
• Evaluation
• Conclusions
Introduction
• Blacklists
• Recommendation System
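To make the recommendation-system analogy concrete, here is a minimal sketch, assuming attack logs arrive as hypothetical `(day, attacker, victim)` records. Victims play the role of users, attackers play the role of items, and the number of logged attacks in a training window serves as an implicit rating. The record layout and the name `build_rating_matrix` are illustrative assumptions, not the authors' code.

```python
from collections import defaultdict

# Hypothetical log records: (day, attacker_ip, victim_network).
# Recommendation view (assumed layout): victim = "user",
# attacker = "item", attack count = implicit "rating".
def build_rating_matrix(logs, first_day, last_day):
    """Return {victim: {attacker: attack_count}} for days in [first_day, last_day]."""
    ratings = defaultdict(lambda: defaultdict(int))
    for day, attacker, victim in logs:
        if first_day <= day <= last_day:
            ratings[victim][attacker] += 1
    # Convert nested defaultdicts into plain dicts for readable output
    return {v: dict(row) for v, row in ratings.items()}

logs = [
    (1, "10.0.0.1", "victimA"),
    (1, "10.0.0.1", "victimA"),
    (2, "10.0.0.2", "victimA"),
    (2, "10.0.0.1", "victimB"),
    (5, "10.0.0.3", "victimB"),  # outside the window, ignored
]
R = build_rating_matrix(logs, 1, 3)
print(R)
# {'victimA': {'10.0.0.1': 2, '10.0.0.2': 1}, 'victimB': {'10.0.0.1': 1}}
```

Predicting a blacklist for a victim then amounts to filling in the missing entries of that victim's row, much as a recommender predicts unseen item ratings for a user.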
Related Works
• Local Worst Offender List (LWOL)
• Global Worst Offender List (GWOL)
• Highly Predictive Blacklisting (HPB)
  ▫ J. Zhang, P. Porras, and J. Ullrich, "Highly predictive blacklisting," in Proc. of USENIX Security '08 (Best Paper award), San Jose, CA, USA, Jul. 2008, pp. 107–122.
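A rough sketch of the two baseline lists, assuming logs are hypothetical `(attacker, victim)` pairs and that both lists simply rank attackers by attack count: LWOL counts only the victim's own logs, GWOL counts across all contributors. HPB, which additionally weights attackers by their relevance to the victim, is not shown here.

```python
from collections import Counter

def lwol(logs, victim, n):
    """Local Worst Offender List: top-n attackers in this victim's own logs."""
    counts = Counter(a for a, v in logs if v == victim)
    return [a for a, _ in counts.most_common(n)]

def gwol(logs, n):
    """Global Worst Offender List: top-n attackers over all contributors' logs."""
    counts = Counter(a for a, _ in logs)
    return [a for a, _ in counts.most_common(n)]

# Hypothetical (attacker, victim) pairs
logs = [("a1", "v1"), ("a1", "v1"), ("a2", "v1"),
        ("a3", "v2"), ("a3", "v2"), ("a3", "v2")]

print(lwol(logs, "v1", 2))  # ['a1', 'a2']
print(gwol(logs, 2))        # ['a3', 'a1']
```

The local list for `v1` never contains `a3`, since `v1` has not seen it yet; the global list does, but it is the same for every victim and ignores which attackers are actually relevant to `v1`.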
Room for improvement
DSHIELD Dataset Observation
DSHIELD Dataset Observation (cont.)
DSHIELD Dataset Observation (cont.)
Model Overview
• Time Series for Attack Prediction
  ▫ Exponentially Weighted Moving Average (EWMA)
• Neighborhood Model
  ▫ Victim Neighborhood (kNN)
    k-nearest neighbors, using Pearson correlation as the similarity metric
  ▫ Joint Attacker-Victim Neighborhood (CA)
    Cross-associations: a fully automatic clustering algorithm that finds row and column groups of sparse binary matrices
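The EWMA time-series idea can be sketched as follows, assuming a binary daily indicator `x_t` (1 if the attacker hit the victim on day `t`, oldest day first) and a smoothing factor `alpha` in (0, 1]. The recursion s_t = alpha * x_t + (1 - alpha) * s_{t-1} with s_0 = 0 expands to the sum over t of alpha * (1 - alpha)^(T - t) * x_t, so recent attacks weigh more than old ones. The binary input and the function name are assumptions for illustration.

```python
def ewma_score(daily_activity, alpha):
    """EWMA of a 0/1 attack indicator, oldest day first.

    Returns alpha * sum((1 - alpha) ** (T - t) * x_t), computed recursively.
    """
    score = 0.0
    for x in daily_activity:
        # Blend today's observation with the decayed history
        score = alpha * x + (1 - alpha) * score
    return score

print(ewma_score([1, 0, 0], 0.5))  # 0.125 (attacked three days ago)
print(ewma_score([0, 0, 1], 0.5))  # 0.5   (attacked yesterday)
```

A larger `alpha` forgets the past faster; ranking attackers by this score gives a per-victim, recency-aware blacklist.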
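For the victim neighborhood, a minimal kNN sketch, assuming each victim is a row of attack counts over a shared attacker order. Similarity is the Pearson correlation between victim rows. The prediction rule used here, a similarity-weighted sum of the k nearest neighbors' counts, is an assumption for illustration, and so is treating an undefined correlation (a constant row) as zero.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: undefined correlation means no similarity
    return cov / (sx * sy)

def knn_scores(matrix, victim, k):
    """Score every attacker for `victim` from its k most similar victims."""
    sims = {u: pearson(matrix[victim], row)
            for u, row in matrix.items() if u != victim}
    neighbors = sorted(sims, key=sims.get, reverse=True)[:k]
    n_attackers = len(matrix[victim])
    return [sum(sims[u] * matrix[u][j] for u in neighbors)
            for j in range(n_attackers)]

# Hypothetical victims x attackers matrix (columns = attackers 0..3)
matrix = {"v1": [1, 1, 0, 0], "v2": [1, 1, 0, 1], "v3": [0, 0, 1, 1]}
print([round(s, 3) for s in knn_scores(matrix, "v1", 1)])
# [0.577, 0.577, 0.0, 0.577]
```

Attacker 3 never hit `v1`, but it gets a positive score because `v1`'s closest neighbor `v2` saw it.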
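Running the cross-associations search itself (a parameter-free co-clustering of the binary attacker-victim matrix) is beyond a short example. The sketch below assumes the row (victim) and column (attacker) group labels are already given, and uses the density of 1s in each co-cluster as the predicted score for every pair inside it. Treat both the given labels and this scoring rule as assumptions.

```python
from collections import defaultdict

def block_densities(binary, row_group, col_group):
    """Fraction of 1s in each (row group, column group) block.

    binary[i][j] is 1 if attacker j hit victim i; row_group[i] and
    col_group[j] are precomputed cluster labels (assumed given).
    """
    ones = defaultdict(int)
    size = defaultdict(int)
    for i, row in enumerate(binary):
        for j, x in enumerate(row):
            key = (row_group[i], col_group[j])
            ones[key] += x
            size[key] += 1
    return {key: ones[key] / size[key] for key in size}

def ca_score(densities, row_group, col_group, i, j):
    """Predicted score for (victim i, attacker j): density of its block."""
    return densities[(row_group[i], col_group[j])]

binary = [[1, 1, 0],
          [1, 0, 0],
          [0, 0, 1]]
row_group = [0, 0, 1]
col_group = [0, 0, 1]
d = block_densities(binary, row_group, col_group)
print(d)  # {(0, 0): 0.75, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}
print(ca_score(d, row_group, col_group, 1, 1))  # 0.75
```

The unseen pair (victim 1, attacker 1) inherits the 0.75 density of its dense block, so attacker 1 would rank high on victim 1's blacklist.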
Evaluations
• Local approaches
• Global (neighborhood) approaches
• Proposed combined method
• Robustness
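One way to compare local, global, and combined predictors is to turn each score table into a top-N blacklist and count how many blacklisted attackers actually show up in the test window. The sketch below assumes that hit-count definition and combines local and neighborhood scores by a hypothetical weighted sum (`combine`, `weight`); the evaluation protocol and combination rule in the paper may differ.

```python
def top_n_blacklist(scores, n):
    """Return the n attackers with the highest predicted score."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

def hit_count(blacklist, test_attackers):
    """Number of blacklisted attackers that actually appear in the test window."""
    return len(set(blacklist) & set(test_attackers))

def combine(local, neighborhood, weight):
    """Hypothetical blend: local score plus weighted neighborhood score."""
    attackers = set(local) | set(neighborhood)
    return {a: local.get(a, 0.0) + weight * neighborhood.get(a, 0.0)
            for a in attackers}

# Hypothetical scores for one victim
local = {"a1": 0.9, "a2": 0.1, "a4": 0.2}   # e.g. EWMA of own logs
neighborhood = {"a3": 0.8}                  # e.g. kNN or CA score
test_attackers = ["a2", "a3"]               # attackers seen in test window

print(hit_count(top_n_blacklist(local, 2), test_attackers))  # 0
print(hit_count(top_n_blacklist(combine(local, neighborhood, 1.0), 2),
                test_attackers))                              # 1
```

Here the neighborhood score surfaces `a3`, which the victim's own history could not predict.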
Evaluations (cont.)
Evaluations (cont.)
Evaluations (cont.)
Evaluations (cont.)
Conclusions
• Frame the problem as an implicit recommendation system
• Analyze a real dataset of 1-month logs from DShield.org
• Show that even larger improvements can be obtained
• Give a methodological development with improvement over the state of the art
Thanks for your attention
• Questions?