Waterfall: Rapid identification of IP flows using cascade classification

Paweł Foremski, MSc. Eng.

The Institute of Theoretical and Applied Informatics of the Polish Academy of Sciences, Gliwice

pjf@iitis.pl

CN 2014 Conference, Brunów, 24th June 2014

Identification of IP flows?

“traffic classification” or “traffic identification”

TC: input - output

[Diagram: network traffic (input) → Traffic Classifier → application names (output)]

TC input

• TC input is the object of classification:

o Single IP packet

o IP flow

o Endpoint

o Host

TC output

• TC output is the result of classification:

o Application name – e.g. Skype, Teamviewer

o Network protocol – e.g. HTTP, SMTP

o Category – e.g. chat, streaming

o Traffic profile – e.g. bulk, interactive

o Content type – e.g. text, image

o Web application – e.g. Google Docs, Facebook

TC: the problem

• How to identify network traffic?

• How to cope with practical constraints?

o With limited resources (on high-speed routers)

o With limited details (only packet headers)

o ...

• How to measure the performance?

o Result accuracy

o Reaction time

o Temporal stability

o Spatial stability

o ...

TC: applications

[Diagram: classified traffic (HTTP, Skype, BitTorrent, FTP) used for queuing, Quality of Service, firewall, access policy, monitoring, routing, ...]

TC: applications

Alessandro Finamore, Marco Mellia, Michela Meo, Maurizio M. Munafò, Dario Rossi, "Experiences of Internet Traffic Monitoring with Tstat", IEEE Network, Vol. 25, No. 3, pp. 8-14, March/April 2011, ISSN: 0890-8044

TC: applications

[Diagram: two access links labeled "FTTH4 Mbps" and "ADSL24 Mbps"; traffic groups "VoIP, DNS, Games, ..." and "BitTorrent, eMule, YouTube, ..."; delay labels "5-10 ms" and "50-100 ms"]

TC: existing solutions

• Port numbers

• Deep Packet Inspection (DPI) - e.g. [2,3]

• Machine Learning - e.g. [5,9]

• Behavioral analysis - e.g. [4,7,8]

• Classifier fusion - e.g. [6]
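The first entry in this list, port numbers, amounts to a table lookup. The snippet below is a minimal sketch, assuming a small subset of IANA well-known ports; the names `WELL_KNOWN_PORTS` and `classify_by_port` are hypothetical and are not taken from the Waterfall source code.

```python
from typing import Optional

# Illustrative subset of IANA well-known ports (not a complete registry).
WELL_KNOWN_PORTS = {
    21: "FTP",
    25: "SMTP",
    53: "DNS",
    80: "HTTP",
    443: "HTTPS",
}


def classify_by_port(dst_port: int) -> Optional[str]:
    """Return a protocol name for a known port, or None if the port is unknown."""
    return WELL_KNOWN_PORTS.get(dst_port)


print(classify_by_port(80))     # HTTP
print(classify_by_port(51413))  # None
```

Returning `None` for an unknown port, instead of guessing, is what makes such a lookup usable as a module that can decline to answer.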

Waterfall: motivation

Each TC algorithm has advantages and disadvantages.

The problem: Could we integrate these approaches into one system so that we move forward in TC?

How would solving this problem affect classification performance?

Waterfall: the idea

1. Use existing classifiers as modules

2. Implement the rejection option

3. Minimize false positives

4. Connect in a cascade structure

[Diagram: modules 1, 2, 3 connected in a cascade]
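The four points of the idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation published at mutrics.iitis.pl: the flow record layout, the two toy modules (`port_module`, `size_module`), and the 1000-byte threshold are all hypothetical. Each module either returns a label or returns `None` to reject the flow, and a rejected flow falls through to the next module.

```python
from typing import Callable, Optional, Sequence, Tuple

Flow = dict  # hypothetical flow record, e.g. {"dst_port": 80, "pkt_sizes": [...]}
Module = Callable[[Flow], Optional[str]]  # returns a label, or None to reject


def port_module(flow: Flow) -> Optional[str]:
    """Toy module: answers only for two well-known ports, rejects otherwise."""
    return {53: "DNS", 80: "HTTP"}.get(flow.get("dst_port"))


def size_module(flow: Flow) -> Optional[str]:
    """Toy module: labels flows with a large mean packet size (made-up rule)."""
    sizes = flow.get("pkt_sizes", [])
    if sizes and sum(sizes) / len(sizes) > 1000:
        return "bulk"
    return None


def cascade(flow: Flow,
            modules: Sequence[Tuple[str, Module]]) -> Tuple[Optional[str], Optional[str]]:
    """Try modules in order; the first one that does not reject decides."""
    for name, module in modules:
        label = module(flow)
        if label is not None:
            return label, name
    return None, None  # every module rejected: the flow stays unknown


modules = [("port", port_module), ("size", size_module)]
print(cascade({"dst_port": 80}, modules))                               # ('HTTP', 'port')
print(cascade({"dst_port": 6881, "pkt_sizes": [1400, 1500]}, modules))  # ('bulk', 'size')
print(cascade({"dst_port": 6881, "pkt_sizes": [60]}, modules))          # (None, None)
```

Because a module that answers ends the cascade, a wrong answer cannot be corrected by later modules, which is why minimizing false positives appears in the list.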

An old (yet new) idea

• Classifier selection

o Mixture of experts

o Cascade classification

• Classifier fusion

o Majority vote

o Weighted vote

o Naive Bayes Combination

o Behavior Knowledge Space

o ...

Kuncheva L., "Combining pattern classifiers: methods and algorithms", John Wiley & Sons, 2004
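For contrast with cascade classification, the simplest fusion rule, majority vote, can be sketched as follows. In fusion every classifier is run on every flow and the outputs are combined, whereas a cascade stops at the first module that answers. The function name `majority_vote` and the choice to return `None` on a tie are assumptions made for this illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(labels: Sequence[Optional[str]]) -> Optional[str]:
    """Combine the outputs of several classifiers by plurality.

    None entries (abstentions) are ignored; a tie for first place
    yields None, i.e. no decision.
    """
    votes = Counter(label for label in labels if label is not None)
    if not votes:
        return None
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the two most frequent labels
    return ranked[0][0]


print(majority_vote(["HTTP", "HTTP", "Skype"]))  # HTTP
print(majority_vote(["HTTP", "Skype"]))          # None
```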

Waterfall: the idea

Waterfall: practical system

[Diagram: cascade of five modules: dstip, dnsclass, portsize, npkts, port]

(Python source code available at mutrics.iitis.pl)

Flow features limited to first 10 seconds
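Limiting flow features to the first 10 seconds can be illustrated with a short sketch. The packet layout (timestamp, size) and the two features computed here are assumptions chosen for the example; they are not claimed to match what the dstip, dnsclass, portsize, npkts, or port modules actually compute.

```python
from typing import List, Tuple

Packet = Tuple[float, int]  # hypothetical layout: (timestamp in seconds, size in bytes)


def early_features(packets: List[Packet], window: float = 10.0) -> dict:
    """Compute packet and byte counts from the first `window` seconds of a flow."""
    if not packets:
        return {"npkts": 0, "bytes": 0}
    start = min(ts for ts, _ in packets)
    # Keep only packets observed within `window` seconds of the first packet.
    early = [size for ts, size in packets if ts - start <= window]
    return {"npkts": len(early), "bytes": sum(early)}


packets = [(100.0, 60), (101.5, 1500), (109.9, 40), (112.0, 1500)]
print(early_features(packets))  # {'npkts': 3, 'bytes': 1600}
```

The last packet arrives 12 seconds after the first and is therefore ignored, so a decision can be made without waiting for the flow to end.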

Waterfall: validation

• Over 3.5 TB of data in total

• Validation of spatial and temporal stability

Foremski P., Callegari C., Pagano M., "Waterfall: Rapid identification of IP flows using cascade classification", Proceedings of the 21st International Conference on Computer Networks, CN2014, CCIS 431, pp. 14-23. Springer, 2014

Validation: dataset 1

Validation: dataset 2

Temporal stability (8 months)

Validation: datasets 3 and 4

Spatial stability

No payloads

Experiment 1: >50% is easy

Experiment 2: more is faster

adding specialized modules

Discussion

• Waterfall is a new architecture for TC

• We propose an idea and an open source implementation

• A 5-element system yielded very good results

• Findings

o More than 50% of Internet traffic is easy to identify

o Adding more modules to the cascade can increase the speed

• Open questions

o Quantitative comparison: Waterfall vs. BKS

o How to train the system in an optimal way?

o How to put the modules in a proper order?
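The finding that adding more modules to the cascade can increase the speed follows from simple arithmetic: a cheap module placed first removes part of the traffic before the expensive module sees it. The sketch below is a worked example with made-up costs and coverage values (they are not measurements from the paper), assuming each module's coverage is the fraction of flows reaching it that it classifies.

```python
from typing import Sequence, Tuple


def expected_cost(modules: Sequence[Tuple[float, float]]) -> float:
    """Expected per-flow cost of a cascade.

    Each module is (cost per flow, fraction of arriving flows it classifies).
    A flow pays the cost of every module it reaches.
    """
    reach = 1.0   # probability that a flow reaches the current module
    total = 0.0
    for cost, coverage in modules:
        total += reach * cost
        reach *= 1.0 - coverage
    return total


slow_only = [(10.0, 1.0)]                     # one expensive module handles everything
with_fast_front = [(1.0, 0.5), (10.0, 1.0)]   # cheap module classifies half the flows first

print(expected_cost(slow_only))        # 10.0
print(expected_cost(with_fast_front))  # 6.0
```

With these illustrative numbers the two-module cascade costs 1 + 0.5 × 10 = 6 units per flow instead of 10. The same formula also shows why module order matters: the saving depends on how much traffic the earlier modules remove.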

References

1. Foremski P., On different ways to classify Internet traffic: a short review of selected publications, Theoretical and Applied Informatics, 2013; 25(2).

2. B.-C. Park, Y. J. Won, M.-S. Kim, and J. W. Hong, Towards automated application signature generation for traffic identification, in Network Operations and Management Symposium, 2008. NOMS 2008. IEEE, pp. 160–167, IEEE, 2008.

3. S. H. Yeganeh, M. Eftekhar, Y. Ganjali, R. Keralapura, and A. Nucci, CUTE: Traffic Classification Using TErms, in Computer Communications and Networks (ICCCN), 2012 21st International Conference on, pp. 1–9, IEEE, 2012.

4. T. Karagiannis, K. Papagiannaki, and M. Faloutsos, BLINC: Multilevel traffic classification in the dark, in ACM SIGCOMM Computer Communication Review, vol. 35, pp. 229 – 240, ACM, 2005.

5. A. Finamore, M. Mellia, M. Meo, and D. Rossi, KISS: Stochastic packet inspection classifier for udp traffic, Networking, IEEE/ACM Transactions on, vol. 18, no. 5, pp. 1505 – 1515, 2010.

6. A. Dainotti, A. Pescapé, and C. Sansone, Early classification of network traffic through multi-classification, Traffic Monitoring and Analysis, pp. 122 – 135, 2011.

7. Foremski P., Callegari C., Pagano M., DNS-Class: Immediate classification of IP flows using DNS, International Journal of Network Management, John Wiley & Sons, 2014, DOI: 10.1002/nem.1864

8. P. Bermolen, M. Mellia, M. Meo, D. Rossi, and S. Valenti, Abacus: Accurate behavioral classification of P2P-TV traffic, Computer Networks, vol. 55, no. 6, pp. 1394 – 1411, 2011.

9. G. Münz, H. Dai, L. Braun, and G. Carle, TCP traffic classification using Markov models, Traffic Monitoring and Analysis, pp. 127 – 140, 2010.

Thank you!

Paweł Foremski, pjf@iitis.pl

Project website: http://mutrics.iitis.pl/

TC: definition

Internet traffic classification (or identification) is the act of matching IP packets to the applications that generated them. [1]

TC: the problem

• How to identify network traffic?

• How to do it well?

o With limited resources (on high-speed routers)

o With limited details (only packet headers)

o With good accuracy (no errors)

o In limited time (in real-time)

o For current and future protocols (flexibility and stability)

o For the whole Internet (backbone routers and gateways)

• How to measure the performance?

o Result accuracy

o Reaction time

o Temporal stability

o Spatial stability

o Processing time

o Unknown detection
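When a classifier is allowed to reject flows, result accuracy is usually reported together with the share of flows that were actually classified. The sketch below shows one possible way to compute both; the function name `evaluate` and the exact metric definitions are assumptions for this illustration, not the definitions used in the paper.

```python
from typing import Optional, Sequence


def evaluate(predicted: Sequence[Optional[str]], truth: Sequence[str]) -> dict:
    """Score predictions where None means the flow was left unclassified.

    classified: fraction of all flows that received a label
    errors:     fraction of labeled flows whose label was wrong
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    answered = [(p, t) for p, t in zip(predicted, truth) if p is not None]
    correct = sum(1 for p, t in answered if p == t)
    total = len(truth)
    return {
        "classified": len(answered) / total if total else 0.0,
        "errors": (len(answered) - correct) / len(answered) if answered else 0.0,
    }


result = evaluate(["HTTP", None, "Skype", "HTTP"], ["HTTP", "FTP", "Skype", "DNS"])
print(result["classified"])  # 0.75
```

In this example three of four flows are labeled and one of those three labels is wrong, so the error rate among classified flows is one third.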

Example: dnsclass

Foremski P., Callegari C., Pagano M., "DNS-Class: Immediate classification of IP flows using DNS", International Journal of Network Management, John Wiley & Sons, 2014

dnsclass: details

dnsclass: motivation
