Google-based Traffic Classification
Aleksandar Kuzmanovic
Northwestern University
IEEE Computer Communications Workshop (CCW ’08)
October 23, 2008
http://networks.cs.northwestern.edu
I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)
Traffic Classification
Problem – traffic classification
Current approaches (port-based, payload signatures, numerical and statistical, etc.)
Our approach – Use information about destination IP addresses available on the Internet
A. Kuzmanovic Google-based Traffic Classification
Getting External Information
Use Google!
Can we systematically exploit search engines to harvest endpoint information available on the Internet?
Huge amount of endpoint information available on the web
Where Does the Information Come From?
Websites run logging software and display statistics
Some popular proxy services also display logs
IP addresses of popular servers (e.g., gaming) are listed
Blacklists, banlists, spamlists also have web interfaces
Even P2P information is available on the Internet since the first point of contact with a P2P swarm is a publicly available IP address
Endpoint categories: Servers, Clients, P2P, Malicious
Methodology – Web Classifier and IP Tagging
[Figure: IP address (xxx.xxx.xxx.xxx) → search hits (URL, hit text) → rapid match against website cache (domain name, keywords) → IP tagging]
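The rapid-match and IP-tagging steps named on this slide might be sketched as follows. This is only an illustration under stated assumptions: the website cache entries, the fallback keyword table, the function name `tag_ip`, and the example hits are all hypothetical and do not come from the talk.

```python
from urllib.parse import urlparse

# Hypothetical website cache: domain name -> keywords (illustrative entries only).
WEBSITE_CACHE = {
    "example-gameservers.test": ["game server"],
    "example-blacklist.test": ["blacklist", "malicious"],
}

# Hypothetical fallback: keyword found in the hit text -> tag (an assumption,
# not something the slide specifies).
TEXT_KEYWORDS = {
    "mail server": "mail server",
    "proxy": "proxy",
    "torrent": "p2p",
}


def tag_ip(search_hits):
    """Return a sorted list of tags for one IP, given (url, hit_text) pairs."""
    tags = set()
    for url, hit_text in search_hits:
        domain = urlparse(url).hostname or ""
        if domain in WEBSITE_CACHE:
            # Rapid match: the domain is already known, so reuse its keywords.
            tags.update(WEBSITE_CACHE[domain])
            continue
        # Otherwise scan the hit text for known keywords.
        text = hit_text.lower()
        for keyword, tag in TEXT_KEYWORDS.items():
            if keyword in text:
                tags.add(tag)
    return sorted(tags)


# Made-up search hits for a documentation-range address.
hits = [
    ("http://example-gameservers.test/list", "Server list: 192.0.2.10:27015"),
    ("http://unknown.test/stats", "Top senders by mail server, 192.0.2.10"),
]
print(tag_ip(hits))
```

The cache lookup comes first so that hits from already-classified domains need no text processing, which is one plausible reading of "rapid match".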
Traffic Classification
Tagged IP Cache
165.124.182.169 Mail server
193.226.5.150 Website
68.87.195.25 Router
186.25.13.24 Halo server
Hold a small % of the IP addresses seen
Look at source and destination IP addresses and classify traffic
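A minimal sketch of classifying traffic against the tagged IP cache, using the example entries from this slide. It assumes that the first address, 165.124.182.169, pairs with the "Mail server" tag. The function name `classify_flow` and the choice to check the destination before the source are hypothetical.

```python
# Tagged IP cache, using the example entries shown on the slide.
TAGGED_IP_CACHE = {
    "165.124.182.169": "Mail server",  # pairing assumed from the slide layout
    "193.226.5.150": "Website",
    "68.87.195.25": "Router",
    "186.25.13.24": "Halo server",
}


def classify_flow(src_ip, dst_ip, cache=TAGGED_IP_CACHE):
    """Return the tag of whichever endpoint is in the cache, or None."""
    # Check the destination first, then the source (ordering is an assumption).
    for ip in (dst_ip, src_ip):
        tag = cache.get(ip)
        if tag is not None:
            return tag
    return None


print(classify_flow("10.0.0.5", "186.25.13.24"))
print(classify_flow("10.0.0.5", "10.0.0.6"))
```

A flow whose endpoints are both absent from the cache stays unclassified, which is why the cache only needs to hold the small share of addresses that account for most traffic.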
Working with Sampled Traffic
When no sampling is done, UEP outperforms BLINC
UEP maintains a large classification ratio even at higher sampling rates
BLINC stays in the dark: 2% at sampling rate 100
UEP retains high classification capabilities with sampled traffic
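One plausible reading of why endpoint lookup tolerates sampling is that each sampled packet can be classified from its own addresses alone. The sketch below illustrates how a classification ratio could be computed on a trace before and after 1-in-N sampling. The trace, the tagged set, and both function names are synthetic and hypothetical; the numbers it produces are not results from the talk.

```python
def sample_packets(packets, rate):
    """Keep every `rate`-th packet (deterministic 1-in-N sampling)."""
    return packets[::rate]


def classification_ratio(packets, tagged_ips):
    """Fraction of packets whose source or destination IP is tagged."""
    if not packets:
        return 0.0
    classified = sum(
        1 for src, dst in packets if src in tagged_ips or dst in tagged_ips
    )
    return classified / len(packets)


# Synthetic trace of (src, dst) pairs using documentation-range addresses.
packets = [("10.0.0.1", "192.0.2.1")] * 60 + [("10.0.0.2", "198.51.100.7")] * 40
tagged = {"192.0.2.1"}  # hypothetical tagged endpoint

full_ratio = classification_ratio(packets, tagged)
sampled_ratio = classification_ratio(sample_packets(packets, 10), tagged)
print(full_ratio, sampled_ratio)
```

Because the lookup is per packet, the ratio depends only on which endpoints appear in the sample, not on observing complete flows.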
Summary
Shift research focus from mining operational network traces to harnessing information that is already available on the web
Deep packet inspection and legal issues:
– Federal Wiretap Act: “thou shalt not intercept the contents of communications. Violations can result in civil and criminal penalties. The worst offenses may be investigated by the FBI, Secret Service, DEA, and IRS as felony prosecutions.”
– Only 2 exceptions:
• The provider protection exception
• Consent