Detecting P2P Traffic from the P2P Flow Graph
Jonghyun Kim, Khushboo Shah, Stephen Bohacek
Electrical and Computer Engineering
Outline
Introduction and Objectives
Flow Data
Identification Methods
◦ Class A-1 : Degree-Based P2P Detection
◦ Class A-2 : Known Port
◦ Class B-1 : Repeated Communication
◦ Class B-2 : P2P Port-Based Identification
◦ Class B-3 : Triggered P2P Detection
Results
Conclusion
Future Work
Introduction
Why detect P2P traffic?
◦ Helpful for network capacity planning, provisioning, traffic shaping/policing, etc.
How to detect P2P traffic?
◦ Port based
◦ Signature based
◦ Behavior based
◦ Machine learning based
◦ Host graph based
Objectives
◦ No deep packet inspection
◦ Simpler, but still effective
◦ P2P flow graph based
Flow Data
SIP : source IP
DIP : destination IP
SP : source port
DP : destination port
PR : protocol (tcp or udp)
ST : flow start time
EID : event ID (info for signature matching)
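Flow Data
Each flow has components.
Mathematical expression
$$f = (\mathrm{SIP}_f, \mathrm{DIP}_f, \mathrm{SP}_f, \mathrm{DP}_f, \mathrm{PR}_f, \mathrm{ST}_f)$$
Example of a set of flows: $\{\, f \mid \mathrm{PR}_f = \text{tcp},\ \mathrm{DP}_f = 80 \,\}$
Pictorial view
[Figure: host A sends a SYN to host B over TCP, source port 60355, destination port 6881; the flow start time ST is marked on the time axis.]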
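As a minimal sketch of the flow record described above, the fields can be held in a small Python tuple type. The class name `Flow`, the lowercase field names, and the use of seconds for ST are assumptions made for illustration; EID is left out because the methods here do not use signature matching.

```python
from typing import NamedTuple

class Flow(NamedTuple):
    sip: str    # SIP: source IP
    dip: str    # DIP: destination IP
    sp: int     # SP: source port
    dp: int     # DP: destination port
    pr: str     # PR: protocol, "tcp" or "udp"
    st: float   # ST: flow start time (seconds assumed)

flows = [
    # The pictured flow: A opens a TCP connection from port 60355 to port 6881 on B.
    Flow("A", "B", 60355, 6881, "tcp", 0.0),
    # A hypothetical web flow, added only to make the selection below non-trivial.
    Flow("A", "C", 60400, 80, "tcp", 1.5),
]

# Set-builder selection: all flows with PR = tcp and DP = 80.
web = [f for f in flows if f.pr == "tcp" and f.dp == 80]
```

Writing the selection as a comprehension keeps it close to the set-builder notation used for the detectors.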
Identification Methods
Class A methods detect flow 1 (an initial P2P flow)
Class B methods connect flow 1 to flow 2
[Figure: P2P flow graph by methods, with flow 1 linked to flow 2.]
Class A-1 : Degree-based P2P Detection
[Figure: TCP and UDP flows between host A and hosts X1 to X13 within a window of T on either side of time t, for example A to X7 over TCP from port 63234 to port 52334.]
Out-degree hosts : X1, X2, X3, X7, X10, X11, X12, X13 (out-degree = 8)
In-degree hosts : X4, X5, X6, X8, X9 (in-degree = 5)
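Class A-1 : Degree-based P2P detection
Out-degree
$$\mathrm{OD}(IP, t, WL, T) := \#\{\, \mathrm{DIP}_f \mid \mathrm{SIP}_f = IP,\ \mathrm{DP}_f \notin WL,\ |\mathrm{ST}_f - t| \le T \,\}$$
In-degree
$$\mathrm{ID}(IP, t, WL, T) := \#\{\, \mathrm{SIP}_f \mid \mathrm{DIP}_f = IP,\ \mathrm{DP}_f \notin WL,\ |\mathrm{ST}_f - t| \le T \,\}$$
P2P active time (ID is not considered)
$$\mathrm{PAC}_{T,R}(IP) := \{\, t \mid \mathrm{OD}(IP, t, WL, T) > R \,\}$$
Detector
$$\mathrm{AC}_{T,R} := \{\, f \mid \mathrm{ST}_f \in \mathrm{PAC}_{T,R}(\mathrm{SIP}_f) \cup \mathrm{PAC}_{T,R}(\mathrm{DIP}_f),\ \mathrm{DP}_f \notin WL \,\}$$
WL : port white list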
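The degree-based detector can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: the strict comparison with R, the closed window |ST - t| <= T, and the "either endpoint is active" rule are readings of the slide, and the names `Flow`, `out_degree`, and `ac_detector` are hypothetical. The quadratic scan is only meant for small examples.

```python
from typing import NamedTuple

class Flow(NamedTuple):
    sip: str
    dip: str
    sp: int
    dp: int
    pr: str
    st: float

def out_degree(flows, ip, t, wl, T):
    """OD(IP, t, WL, T): distinct destinations of `ip` within T of time t."""
    return len({f.dip for f in flows
                if f.sip == ip and f.dp not in wl and abs(f.st - t) <= T})

def ac_detector(flows, wl, T, R):
    """Flows whose start time is a P2P active time of either endpoint."""
    def active(ip, t):
        # Assumed rule: a host is P2P active at t when its out-degree exceeds R.
        return out_degree(flows, ip, t, wl, T) > R
    return [f for f in flows
            if f.dp not in wl and (active(f.sip, f.st) or active(f.dip, f.st))]

wl = set(range(1024)) | {8080}            # small stand-in for the port white list
flows = [
    Flow("A", "X1", 63135, 2710, "tcp", 0.0),
    Flow("A", "X2", 63320, 51413, "tcp", 1.0),
    Flow("A", "X3", 63138, 6969, "tcp", 2.0),
    Flow("A", "W", 50000, 80, "tcp", 1.0),    # white-listed port, ignored
    Flow("B", "Y", 40000, 9999, "tcp", 1.0),  # host with out-degree 1
]
detected = ac_detector(flows, wl, T=15, R=2)
```

With R = 2, host A contacts three distinct hosts inside the window, so its three non-white-listed flows are detected, while the flow from B is not.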
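Class A-2 : Known Port
P2P active time
$$\mathrm{PAC}_{T,R}(IP) := \{\, t \mid \mathrm{ID}(IP, t, WL, T) + \mathrm{OD}(IP, t, WL, T) > R \,\}$$
Detector
$$\mathrm{KPF}_{T,R} := \{\, f \mid \mathrm{DP}_f \in KP,\ \mathrm{ST}_f \in \mathrm{PAC}_{T,R}(\mathrm{SIP}_f) \cup \mathrm{PAC}_{T,R}(\mathrm{DIP}_f) \,\} \cup \{\, f \mid \mathrm{SP}_f \in KP,\ \mathrm{PR}_f = \text{udp},\ \mathrm{ST}_f \in \mathrm{PAC}_{T,R}(\mathrm{SIP}_f) \cup \mathrm{PAC}_{T,R}(\mathrm{DIP}_f) \,\}$$
KP : known P2P ports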
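A rough Python sketch of the known-port detector follows. It assumes that the active-time test compares the sum of in-degree and out-degree with R and that a flow qualifies when either endpoint is active; both are interpretations of the slide. The helper names `degree` and `kpf_detector` and the sample port sets are hypothetical.

```python
from typing import NamedTuple

class Flow(NamedTuple):
    sip: str
    dip: str
    sp: int
    dp: int
    pr: str
    st: float

def degree(flows, ip, t, wl, T):
    """ID + OD of `ip`, counting distinct peers within T of time t."""
    near = [f for f in flows if f.dp not in wl and abs(f.st - t) <= T]
    od = len({f.dip for f in near if f.sip == ip})
    in_deg = len({f.sip for f in near if f.dip == ip})
    return in_deg + od

def kpf_detector(flows, kp, wl, T, R):
    """Known-port flows that start while an endpoint is P2P active."""
    def active(f):
        return (degree(flows, f.sip, f.st, wl, T) > R
                or degree(flows, f.dip, f.st, wl, T) > R)
    return [f for f in flows
            if (f.dp in kp or (f.sp in kp and f.pr == "udp")) and active(f)]

kp = {6881, 6969, 2710}              # a few known P2P ports for the example
wl = set(range(1024))                # stand-in white list
flows = [
    Flow("A", "X1", 63234, 6881, "tcp", 0.0),   # known destination port
    Flow("A", "X2", 63320, 51413, "tcp", 1.0),
    Flow("X3", "A", 40000, 55038, "udp", 2.0),  # contributes to A's in-degree
]
detected = kpf_detector(flows, kp, wl, T=480, R=2)
```

Host A has out-degree 2 and in-degree 1, so its combined degree of 3 exceeds R = 2 and the single known-port flow is reported.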
Identification Methods
Done with Class A methods
Take a look at Class B methods
[Figure: P2P flow graph by methods, with flow 1 linked to flow 2.]
Class B-1 : Repeated Communication between Known P2P Peers
[Figure: time axis (ST) showing a P2P flow from A to X (TCP, source port 63234, destination port 52334) and further flows between A and X.]
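Class B-1 : Repeated Communication between Known P2P Peers
Detector given an initial P2P flow $f_0$
P2P peers = $\mathrm{SIP}_{f_0}$, $\mathrm{DIP}_{f_0}$
$$G(f_0) := \{\, f \mid \mathrm{SIP}_f = \mathrm{SIP}_{f_0},\ \mathrm{DIP}_f = \mathrm{DIP}_{f_0} \,\} \cup \{\, f \mid \mathrm{SIP}_f = \mathrm{DIP}_{f_0},\ \mathrm{DIP}_f = \mathrm{SIP}_{f_0} \,\}$$
Detector given a set of P2P flows $F$
$$G(F) := \bigcup_{f_0 \in F} G(f_0)$$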
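The repeated-communication detector amounts to matching unordered IP pairs. A minimal sketch, with the hypothetical names `Flow` and `g_detector`:

```python
from typing import NamedTuple

class Flow(NamedTuple):
    sip: str
    dip: str
    sp: int
    dp: int
    pr: str
    st: float

def g_detector(flows, seeds):
    """Flows between the same two hosts as some seed flow, in either direction."""
    pairs = {frozenset((s.sip, s.dip)) for s in seeds}
    return [f for f in flows if frozenset((f.sip, f.dip)) in pairs]

seed = Flow("A", "X", 63234, 52334, "tcp", 0.0)
flows = [
    Flow("A", "X", 63300, 52334, "tcp", 50.0),  # same direction
    Flow("X", "A", 40000, 55038, "udp", 80.0),  # reverse direction
    Flow("A", "Y", 63301, 52334, "tcp", 60.0),  # different peer
]
repeated = g_detector(flows, [seed])
```

Using a `frozenset` for the pair covers both terms of the union with one lookup, and passing a list of seeds gives the set version of the detector.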
Class B-2 : P2P Port Identification and Port-Based P2P Detection
[Figure: the flows of host A from the Class A-1 example (for instance A to X7 over TCP, from port 63234 to port 52334) along the time axis (ST). Within a window of T on either side of a P2P flow, incoming and outgoing TCP or UDP flows at the same IP and P2P port are marked.]
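Class B-2 : P2P Port Identification and Port-Based P2P Detection
Detector given a P2P flow $f_0$
$$
\begin{aligned}
H_T(f_0) := {} & \{\, f \mid \mathrm{DIP}_f = \mathrm{DIP}_{f_0},\ \mathrm{DP}_f = \mathrm{DP}_{f_0},\ \mathrm{PR}_f = \text{tcp},\ |\mathrm{ST}_f - \mathrm{ST}_{f_0}| \le T \,\} \\
\cup\ & \{\, f \mid \mathrm{DIP}_f = \mathrm{DIP}_{f_0},\ \mathrm{DP}_f = \mathrm{DP}_{f_0},\ \mathrm{PR}_f = \text{udp},\ |\mathrm{ST}_f - \mathrm{ST}_{f_0}| \le T \,\} \\
\cup\ & \{\, f \mid \mathrm{SIP}_f = \mathrm{DIP}_{f_0},\ \mathrm{SP}_f = \mathrm{DP}_{f_0},\ \mathrm{PR}_f = \text{udp},\ |\mathrm{ST}_f - \mathrm{ST}_{f_0}| \le T \,\} \\
\cup\ & \{\, f \mid \mathrm{DIP}_f = \mathrm{SIP}_{f_0},\ \mathrm{DP}_f = \mathrm{SP}_{f_0},\ \mathrm{PR}_f = \text{tcp},\ \mathrm{PR}_{f_0} = \text{udp},\ |\mathrm{ST}_f - \mathrm{ST}_{f_0}| \le T \,\} \\
\cup\ & \{\, f \mid \mathrm{DIP}_f = \mathrm{SIP}_{f_0},\ \mathrm{DP}_f = \mathrm{SP}_{f_0},\ \mathrm{PR}_f = \text{udp},\ \mathrm{PR}_{f_0} = \text{udp},\ |\mathrm{ST}_f - \mathrm{ST}_{f_0}| \le T \,\} \\
\cup\ & \{\, f \mid \mathrm{SIP}_f = \mathrm{SIP}_{f_0},\ \mathrm{SP}_f = \mathrm{SP}_{f_0},\ \mathrm{PR}_f = \text{udp},\ \mathrm{PR}_{f_0} = \text{udp},\ |\mathrm{ST}_f - \mathrm{ST}_{f_0}| \le T \,\}
\end{aligned}
$$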
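One way to read the port-based detector is as follows: the destination endpoint of a P2P flow is always treated as a P2P port, and the source endpoint is treated as one only when the seed flow is UDP. The Python sketch below follows that reading, which is an interpretation of the slide rather than a confirmed definition. The names `Flow` and `h_detector` and the window T = 600 are hypothetical.

```python
from typing import NamedTuple

class Flow(NamedTuple):
    sip: str
    dip: str
    sp: int
    dp: int
    pr: str
    st: float

def h_detector(flows, f0, T):
    """Flows that share a P2P endpoint (IP, port) with f0 within T of its start."""
    linked = []
    for f in flows:
        if abs(f.st - f0.st) > T:
            continue
        # Destination endpoint of f0: incoming TCP or UDP, outgoing UDP.
        to_dst = (f.dip, f.dp) == (f0.dip, f0.dp)
        from_dst = (f.sip, f.sp) == (f0.dip, f0.dp) and f.pr == "udp"
        # Source endpoint of f0 counts only when f0 itself is UDP (assumed reading).
        src_is_p2p = f0.pr == "udp"
        to_src = src_is_p2p and (f.dip, f.dp) == (f0.sip, f0.sp)
        from_src = src_is_p2p and (f.sip, f.sp) == (f0.sip, f0.sp) and f.pr == "udp"
        if to_dst or from_dst or to_src or from_src:
            linked.append(f)
    return linked

f0 = Flow("A", "X7", 63234, 52334, "tcp", 0.0)
flows = [
    Flow("B", "X7", 40000, 52334, "tcp", 10.0),    # incoming to X7:52334
    Flow("X7", "C", 52334, 41000, "udp", 20.0),    # outgoing UDP from X7:52334
    Flow("B", "X7", 40001, 9999, "tcp", 10.0),     # different port
    Flow("D", "X7", 40002, 52334, "tcp", 1000.0),  # outside the window
    Flow("E", "A", 40003, 63234, "tcp", 5.0),      # source port of a TCP seed
]
linked = h_detector(flows, f0, T=600)

u0 = Flow("A", "X12", 55038, 18636, "udp", 0.0)
linked_udp = h_detector([Flow("X4", "A", 21566, 55038, "tcp", 3.0)], u0, T=600)
```

The TCP seed links only the two flows at X7:52334, while the UDP seed also links a flow arriving at its own source port 55038.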
Class B-3 : Triggered P2P Detection
[Figure: time axis (ST) with a P2P flow from A to X and a window of 1 sec on either side of it.]
Nearby flows tend to be P2P flows
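Class B-3 : Triggered P2P Detection
Detector given a P2P flow $f_0$
P2P peers = $\mathrm{SIP}_{f_0}$, $\mathrm{DIP}_{f_0}$
$$TA(f_0) := \{\, f \mid \mathrm{SIP}_f = \mathrm{SIP}_{f_0},\ \mathrm{DP}_f \notin WL,\ |\mathrm{ST}_f - \mathrm{ST}_{f_0}| \le 1 \,\} \cup \{\, f \mid \mathrm{SIP}_f = \mathrm{DIP}_{f_0},\ \mathrm{DP}_f \notin WL,\ |\mathrm{ST}_f - \mathrm{ST}_{f_0}| \le 1 \,\}$$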
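The triggered detector can be sketched as a filter on flows that a known peer starts within one second of a P2P flow. The names `Flow` and `ta_detector`, the closed window, and the sample white list are assumptions for illustration.

```python
from typing import NamedTuple

class Flow(NamedTuple):
    sip: str
    dip: str
    sp: int
    dp: int
    pr: str
    st: float

def ta_detector(flows, f0, wl, window=1.0):
    """Flows started by either peer of f0 within `window` seconds of f0."""
    peers = {f0.sip, f0.dip}
    return [f for f in flows
            if f.sip in peers and f.dp not in wl and abs(f.st - f0.st) <= window]

wl = set(range(1024))
f0 = Flow("A", "X", 63234, 52334, "tcp", 100.0)
flows = [
    Flow("A", "Y", 63240, 40000, "tcp", 100.5),  # started by A, 0.5 s later
    Flow("X", "Z", 50000, 41000, "udp", 99.2),   # started by X, 0.8 s earlier
    Flow("A", "W", 63241, 80, "tcp", 100.2),     # white-listed port
    Flow("A", "V", 63242, 42000, "tcp", 103.0),  # too late
    Flow("Q", "A", 1234, 45000, "tcp", 100.1),   # not started by a peer
]
triggered = ta_detector(flows, f0, wl)
```

Only the first two flows satisfy all three conditions: a peer as source, a destination port outside the white list, and a start time within the window.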
Summary
Class A : $\mathrm{KPF}_{T,R}$, $\mathrm{AC}_{T,R}$
T : time window offset
R : threshold for # of peers connected
Conservativeness ↑ as T ↓, R ↑
[Figure: a window of T on either side of a flow, with R peers connected.]
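Summary
Class A : $\mathrm{KPF}_{T,R}$, $\mathrm{AC}_{T,R}$
Class B : $G$, $H$, $TA$, and the combinations $GH$ ($G$ with $H$) and $TGH$ ($TA$ with $GH$)
$GH^k$ : k-th iteration
$GH^\infty$ : until convergence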
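Iterating a Class B method until convergence is a fixed-point computation. The sketch below is generic: `step` stands for any function that maps the flows found so far to the flows linked to them, and the toy `links` table replaces real detectors. All names are hypothetical.

```python
def iterate(seeds, step, max_iter=None):
    """Apply `step` repeatedly; return the flows found and the iterations used."""
    found = set(seeds)
    k = 0
    while max_iter is None or k < max_iter:
        new = step(found) - found
        if not new:          # converged: nothing new was linked
            break
        found |= new
        k += 1
    return found, k

# Toy link relation between flow identifiers, standing in for G and H.
links = {"f1": {"f2"}, "f2": {"f3"}, "f3": set(), "f4": {"f5"}, "f5": set()}

def step(found):
    return set().union(*(links[f] for f in found))

one_round, _ = iterate({"f1"}, step, max_iter=1)   # analogue of the 1st iteration
closure, rounds = iterate({"f1"}, step)            # analogue of iterating to convergence
```

Starting from `f1`, one round adds `f2`, and the converged set is `{f1, f2, f3}` after two productive rounds; `f4` and `f5` are never reached.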
Results : Number of P2P flows Detected
[Figure: two bar charts over combinations C1, C2, C3, one showing the fraction of flows (0 to 1) and one showing the # of flows (x 10^7, 0 to 8). Legend: $\mathrm{KPF}_{480,250}$, $\mathrm{AC}_{15,100}$, $GH^\infty$, $TGH^\infty$.]
Results : Vertex Degree
[Figure: a single P2P flow linked by $GH^1$ to flows F1 to F8, so Degree = 8.]
type1 = any
type2 = UDP
type3 = TCP, DIP = internal IP
type4 = TCP, DIP = external IP
Results : Vertex Degree
[Figure: CCDF of vertex degree on log-log axes (degree from 10^0 to 10^6, CCDF from 10^-3 to 10^0), one curve per type.]
type1 = any
type2 = UDP
type3 = TCP, DIP = internal IP
type4 = TCP, DIP = external IP
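For reference, an empirical CCDF such as the one plotted for vertex degree can be computed as below. The convention P[X >= x] and the function name `ccdf` are assumptions; the degree values are made up for the example.

```python
from collections import Counter

def ccdf(values):
    """Return sorted (x, P[X >= x]) pairs for the observed values."""
    n = len(values)
    counts = Counter(values)
    remaining = n
    points = []
    for x in sorted(counts):
        points.append((x, remaining / n))
        remaining -= counts[x]
    return points

degrees = [1, 1, 2, 8]        # hypothetical vertex degrees
curve = ccdf(degrees)
```

Plotting `curve` on log-log axes gives the same kind of view as the figure.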
Results : Vertex Degree
[Figure: flow graph around a single P2P flow at 72.20.34.145:6881. The neighboring vertices are labeled with IP:port endpoints in 131.118.0.0/16, for example 131.118.54.8:10381 and 131.118.59.241:3684.]
Results : Large Connected Component
[Figure: starting from a single P2P flow, the flows reached by $GH^1$ and then by $GH^2$.]
Results : Large Connected Component
# of flows reachable
Type   Mean         Median
1      49,476,748   69,689,804
2      68,179,534   69,689,804
3      63,217,662   69,689,804
4      16,932,282   115,692
[Figure: CCDF of # of flows reachable, one curve per type; the x axis is broken between 2 x 10^5 and 7 x 10^7.]
type1 = any
type2 = UDP
type3 = TCP, DIP = internal IP
type4 = TCP, DIP = external IP
Visualization of P2P Flow Graph
[Figure: P2P flow graph. TA links form small connected components; GH links form the large connected component.]
Conclusion
Even if Class A methods detect only a small number of P2P flows because their parameters are set conservatively, the recursive Class B methods identify almost all of the remaining P2P flows.
The P2P flow graph contains a large connected component (LCC), so identifying a single P2P flow in the LCC leads to the detection of all flows in the LCC.
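To illustrate the large connected component, the sketch below computes component sizes of a small undirected graph with union-find. Vertices stand for flows and edges for Class B links; the graph itself and the function name `component_sizes` are made up for the example.

```python
from collections import Counter

def component_sizes(n, edges):
    """Sizes of the connected components of an undirected graph, largest first."""
    parent = list(range(n))

    def find(x):
        # Follow parents to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    sizes = Counter(find(v) for v in range(n))
    return sorted(sizes.values(), reverse=True)

# Seven flows: four linked in a chain, two linked to each other, one isolated.
sizes = component_sizes(7, [(0, 1), (1, 2), (2, 3), (4, 5)])
```

Any one of the four flows in the largest component is enough to reach the other three by following links, which is the property the conclusion relies on.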
Future Work
Real-time Identification
Complexity Analysis
Thanks
Port white list
< 1024 : well-known port
1025 : NFS
1755 : MMS
2967 : Symantec AntiVirus
3268 : msft-gc
3724 : World of Warcraft
5050 : Yahoo! Messenger
5190 : AOL Instant Messenger
5351 : NAT Port Mapping Protocol
8080 : HTTP alternate
Known P2P port
BitTorrent : 6881~6889, 6969, 2710
Gnutella : 6346~6349
Edonkey : 2323, 3306, 4242, 4500, 4501, 4661~4674, 4677, 4678, 7778
FastTrack : 1214, 1215, 1331
Freenet : 19114, 8081
Soulseek : 2234, 5534
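The two port lists translate directly into Python sets, which is convenient for the membership tests used by the detectors. The constant names `WHITE_LIST`, `KNOWN_P2P`, and `KP` are hypothetical; the port numbers are the ones listed above, with "< 1024" taken to mean ports 0 to 1023.

```python
# Port white list (WL): well-known ports plus selected application ports.
WHITE_LIST = set(range(1024)) | {1025, 1755, 2967, 3268, 3724,
                                 5050, 5190, 5351, 8080}

# Known P2P ports (KP), grouped by application.
KNOWN_P2P = {
    "BitTorrent": set(range(6881, 6890)) | {6969, 2710},
    "Gnutella": set(range(6346, 6350)),
    "Edonkey": {2323, 3306, 4242, 4500, 4501, 4677, 4678, 7778}
               | set(range(4661, 4675)),
    "FastTrack": {1214, 1215, 1331},
    "Freenet": {19114, 8081},
    "Soulseek": {2234, 5534},
}
KP = set().union(*KNOWN_P2P.values())

def p2p_application(port):
    """Name of the application that lists `port`, or None if it is not a known P2P port."""
    for name, ports in KNOWN_P2P.items():
        if port in ports:
            return name
    return None
```

For example, `p2p_application(6885)` falls in the BitTorrent range, and port 80 is in `WHITE_LIST` but not in `KP`.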