Using Visual Motifs to Classify Encrypted Traffic
VizSEC'06 - November 3, 2006
Charles V Wright
Fabian Monrose
Gerald M Masson
Johns Hopkins University
Information Security Institute
Traffic Classification: Why?
● To detect intrusions or malware
– Is your mail server hosting a phishing website? (Are you sure?)
● To detect misuse by legitimate users
– File sharing
– Chat, Instant Messaging
Traffic Classification: Why?
● Port numbers are not reliable
– They can be changed at will by the end hosts
● Increased use of cryptography precludes inspection of packet payloads
– Good: Hackers can't get our passwords
– Bad: Network admins have less info to work with
Traffic Classification: How?
● Manually? No.
– tcpdump output? Ethereal/Wireshark?
● Machine Learning
– Text classification [ZP00] [MP05] [Dre06] [Ma06]
– Decision Trees [EBR03]
– Naïve Bayes [MZ05]
– Hidden Markov Models [WMM04] [WMM]
● Visually
– Look for distinctive visual motifs in the patterns produced by packets on the wire
Core observation of this work:
Application protocols behave differently, and thus look different from each other on the wire.
Even when encrypted using SSL or TLS.
Application to Traffic Classification
● We can use these differences to distinguish between common application protocols in the traffic that we see on our networks
– Quickly and easily
– Without port numbers
– Without packet payloads
What does a TCP connection look like?
Example: HTTP
[Figure: one HTTP connection, with axis regions labeled "from client" and "from server". Annotated features: TCP 3-way handshake, HTTP request, data transfer from server to client.]
What does a TCP connection look like?
Example: SMTP
[Figure: one SMTP connection, with axis regions labeled "from client" and "from server". Annotated features: TCP 3-way handshake, SMTP handshaking (EHLO, RCPT TO, etc.), data transfer from client to server, SMTP goodbye, TCP FIN.]
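The single-connection plots above need only each packet's arrival time, size, and direction. A minimal sketch of that representation follows. The `Packet` record, the `connection_points` helper, the sample values, and the sign convention (client packets positive, server packets negative) are all assumptions made for illustration, not details taken from the slides.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Packet:
    """Hypothetical packet record: no ports and no payload are needed."""
    time_ms: int       # arrival time in milliseconds
    size: int          # packet size in bytes
    from_client: bool  # True if the client sent it


def connection_points(packets: List[Packet]) -> List[Tuple[int, int]]:
    """Return (ms since first packet, signed size) for each packet.

    Assumed convention: sizes are positive for client packets and
    negative for server packets, so the two directions separate on a plot.
    """
    if not packets:
        return []
    ordered = sorted(packets, key=lambda p: p.time_ms)
    start = ordered[0].time_ms
    return [
        (p.time_ms - start, p.size if p.from_client else -p.size)
        for p in ordered
    ]


# Made-up HTTP-like exchange: handshake, request, then server data.
example = [
    Packet(10_000, 60, True),     # SYN
    Packet(10_020, 60, False),    # SYN/ACK
    Packet(10_040, 52, True),     # ACK
    Packet(10_045, 400, True),    # request
    Packet(10_090, 1500, False),  # response data
    Packet(10_100, 1500, False),  # response data
]
points = connection_points(example)
```

Each resulting point can be drawn as one mark on a time-versus-size plot, which works the same whether or not the payload is encrypted.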
Viewing many similar TCP connections at once
Example: HTTP
[Figures: HTTP connections plotted together for n = 1, 2, 3, and 50, with axis regions labeled "from client" and "from server".]
● n = 50: Yuck!
Viewing many similar TCP connections at once: heat maps
Example: HTTP
● Dark spots: very few packets
● Bright spots: lots of packets
[Figure: heat map of HTTP connections, with axis regions labeled "from client" and "from server". Annotated features: TCP handshake, HTTP requests, HTTP response, data from server, ACKs from client.]
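One way to read the heat maps above is as a two-dimensional histogram: each cell counts how many packets, across all connections, fall into a given time bin and size bin. The sketch below assumes that reading. The `heat_map` helper, the bin widths, and the sample data are hypothetical, and packets are assumed to be given as (milliseconds since connection start, signed size) with server packets negative.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (ms since connection start, signed size in bytes)
Cell = Tuple[int, int]   # (time bin index, size bin index)


def heat_map(
    connections: List[List[Point]],
    time_bin_ms: int = 50,  # assumed bin width, not from the slides
    size_bin: int = 100,    # assumed bin height, not from the slides
) -> Dict[Cell, int]:
    """Count packets per (time bin, size bin) cell over many connections."""
    counts: Dict[Cell, int] = {}
    for points in connections:
        for t_ms, signed_size in points:
            # Floor division keeps directions apart: -60 // 100 == -1,
            # so server packets always land in negative rows.
            cell = (t_ms // time_bin_ms, signed_size // size_bin)
            counts[cell] = counts.get(cell, 0) + 1
    return counts


# Two made-up HTTP-like connections with similar shapes.
conn_a = [(0, 60), (20, -60), (40, 52), (45, 400), (90, -1500), (100, -1500)]
conn_b = [(0, 60), (25, -60), (48, 52), (49, 380), (95, -1500), (110, -1500)]
grid = heat_map([conn_a, conn_b])

# The cell with the highest count would be the brightest spot.
brightest = max(grid, key=grid.get)
```

Cells with high counts correspond to bright spots and empty or rarely hit cells to dark ones. Rendering the grid as an image is left out to keep the sketch free of plotting dependencies.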
Classifying traffic with heat maps and visual motifs
[Figure: heat-map panels labeled HTTP, AIM, SMTP, and HTTP.]
[Figure: heat-map panels labeled HTTP, AIM, SMTP, and SSH.]
● Does this look like HTTP? Or more like SMTP?
Limitations
● The previous graphs illustrate time-dependent properties of the application protocols
● They also cover a very short time span
● Long-lived, free-form protocols like SSH may be better characterized by taking a different view of the data
Steady-State Properties
● We assume these don't change over the life of the connection
● Look at individual packets (unigrams)
– How big is the packet?
– How long since the previous packet?
● Look at pairs of consecutive packets (bigrams)
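The unigram and bigram views above can be sketched as simple frequency counts. This is an illustration under stated assumptions, not the authors' code: the helper names, the bin widths, the choice to describe a bigram by the size bins of two consecutive packets, and the sample packets are all hypothetical. Packets are assumed to be (arrival time in milliseconds, signed size), with server packets negative.

```python
from collections import Counter
from typing import Dict, List, Tuple

Packet = Tuple[int, int]  # (arrival time in ms, signed size in bytes)


def unigram_counts(packets: List[Packet], size_bin: int = 100,
                   gap_bin_ms: int = 10) -> Counter:
    """Count (size bin, inter-arrival bin) pairs for individual packets.

    The first packet has no predecessor, so it is skipped.
    """
    counts: Counter = Counter()
    for (prev_t, _), (t, size) in zip(packets, packets[1:]):
        counts[(size // size_bin, (t - prev_t) // gap_bin_ms)] += 1
    return counts


def bigram_counts(packets: List[Packet], size_bin: int = 100) -> Counter:
    """Count (size bin of packet i, size bin of packet i+1) pairs."""
    bins = [size // size_bin for _, size in packets]
    return Counter(zip(bins, bins[1:]))


def frequencies(counts: Counter) -> Dict[tuple, float]:
    """Normalize counts so that the values sum to 1."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


# Made-up connection: handshake, request, three data packets, one ACK.
packets = [(0, 60), (20, -60), (40, 52), (45, 400),
           (90, -1500), (95, -1500), (100, -1500), (140, 52)]
unigrams = unigram_counts(packets)
bigrams = bigram_counts(packets)
bigram_freq = frequencies(bigrams)
```

Because these counts ignore where in the connection a packet occurs, they fit the steady-state assumption: the same summary applies to a short HTTP fetch and to a long-lived SSH session.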
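Unigram Frequencies: HTTP
[Figure: unigram frequencies for HTTP, with axis regions labeled "from client" and "from server".]
Unigram Frequencies
[Figure: unigram frequency panels for HTTP, AIM, SMTP, and SSH.]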
Bigram Frequencies
[Figure: bigram frequency panels for HTTP, AIM, SMTP, and SSH.]
Bigram Frequencies: HTTP, SMTP, AIM, SSH
[Figures: one bigram frequency plot per protocol, in the order HTTP, SMTP, AIM, SSH. Both axes carry the labels "from server" and "from client".]
Bigrams in 3D
Future Work
● Work is in progress to build an interactive GUI application for analyzing packet traces
– Open Source release planned for later this academic year
● We're also exploring ways to integrate Machine Learning with Visualization more effectively
Acknowledgments
● Many thanks to the developers of Numerical Python and the Python matplotlib package
● Thanks also to the Statistics Group at GMU and to Pang et al. at LBNL for providing access to their packet traces
Thanks!
● Questions?
References
● [Dre06] H. Dreger, A. Feldmann, M. Mai, V. Paxson, and R. Sommer. Dynamic Application-Layer Protocol Analysis for Network Intrusion Detection. USENIX Security 2006.
● [EBR03] J. Early, C. Brodley, and C. Rosenberg. Behavioral Authentication of Server Flows. ACSAC 2003.
● [Ma06] J. Ma, K. Levchenko, C. Kreibich, S. Savage, and G.M. Voelker. Unexpected Means of Protocol Inference. IMC 2006.
● [MP05] A. Moore and D. Papagiannaki. Toward the Accurate Identification of Network Applications. PAM 2005.
● [MZ05] A. Moore and D. Zuev. Internet Traffic Classification Using Bayesian Analysis Techniques. ACM SIGMETRICS 2005.
● [WMM04] C. Wright, F. Monrose, and G.M. Masson. HMM Profiles for Network Traffic Classification (Extended Abstract). VizSEC/DMSEC 2004.
● [WMM] C.V. Wright, F. Monrose, and G.M. Masson. On Inferring Application Protocol Behaviors in Encrypted Network Traffic. JMLR Special Topic on Computer Security. (to appear)
● [ZP00] Y. Zhang and V. Paxson. Detecting Back Doors. USENIX Security 2000.