Post on 21-May-2020
Tracking Internet Hosts Using Unreliable IDs
Yinglian Xie, Fang Yu, and Martín Abadi
Microsoft Research, Silicon Valley
Figure: a host first uses 100.0.0.1, then 100.0.0.2; the blacklist lists only 100.0.0.1.
Open and anonymous: weak accountability
IP addresses are not unique, fixed identifiers (because of dynamic IP addresses, proxies, and NATs).
It is hard to identify who is responsible for traffic.
Accountability is weak on the Internet
We should block attack traffic by host rather than by IP address
Track hosts more reliably in spite of dynamic IP addresses
Explore applications that require identifying hosts over time
E.g., security and data-mining
Our approach: Use application IDs and events to track hosts
Goal: Inferring host-IP bindings
Figure: user IDs Alice and Bob observed at addresses IP1 to IP6 over times t1 to t6.
Alice’s host: IP_1: [t1, t2], IP_2: [t3, t4], IP_5: [t5, t6]
Bob’s host: IP_4: [t3, t4], IP_3: [t5, t6]
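The host-IP bindings in the example above can be stored as time windows and queried in both directions. The following is a minimal sketch, not the authors' implementation: the `Binding` type, the host labels, and the mapping of t1..t6 to the integers 1..6 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Binding:
    host: str   # hypothetical host label
    ip: str
    start: int  # inclusive start of the binding window
    end: int    # inclusive end of the binding window


# Bindings from the example; t1..t6 are modeled as 1..6 (an assumption).
BINDINGS = [
    Binding("alice-host", "IP1", 1, 2),
    Binding("alice-host", "IP2", 3, 4),
    Binding("alice-host", "IP5", 5, 6),
    Binding("bob-host", "IP4", 3, 4),
    Binding("bob-host", "IP3", 5, 6),
]


def host_at(bindings: List[Binding], ip: str, t: int) -> Optional[str]:
    """Return the host bound to `ip` at time `t`, or None if unknown."""
    for b in bindings:
        if b.ip == ip and b.start <= t <= b.end:
            return b.host
    return None


def ips_of(bindings: List[Binding], host: str) -> List[Tuple[str, int, int]]:
    """Return the (ip, start, end) windows of one host in time order."""
    ordered = sorted(bindings, key=lambda b: b.start)
    return [(b.ip, b.start, b.end) for b in ordered if b.host == host]
```

For instance, `host_at(BINDINGS, "IP2", 3)` yields Alice's host, while the same address at time 5 is not bound to any tracked host.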
Challenges:
No 1-1 mappings between hosts and IDs
Dynamic IP addresses, proxies, and NATs
Malicious IDs
Example application: host-based blacklisting
Problem formulation
Figure: hosts A, B, and C bound to addresses IP1 to IP6 over times t1 to t6.
Input events
e1: <u1, IP1, t1>
e2: <u2, IP1, t2>
e3: <u3, IP4, t3>
… …
en: <u4, IP6, t5>
Identity mapping
A: u1, u2
B: u3
C: u4
… …
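To make the problem formulation concrete, the input events and the identity mapping can be written down as plain data, and the mapping applied to turn ID-level events into host-level observations. This is a sketch under assumptions: the `Event` type and `events_by_host` helper are hypothetical names, times t1..t5 are modeled as integers, and only the events listed explicitly above are included (the elided ones are left out).

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Event(NamedTuple):
    user_id: str
    ip: str
    time: int


# Only the events shown explicitly; the elided events are not guessed.
EVENTS = [
    Event("u1", "IP1", 1),
    Event("u2", "IP1", 2),
    Event("u3", "IP4", 3),
    Event("u4", "IP6", 5),
]

# Identity mapping: host -> set of user IDs used on that host.
IDENTITY: Dict[str, Set[str]] = {"A": {"u1", "u2"}, "B": {"u3"}, "C": {"u4"}}


def events_by_host(events: List[Event],
                   identity: Dict[str, Set[str]]) -> Dict[str, List[Tuple[str, int]]]:
    """Group (ip, time) observations by the host that owns each user ID."""
    id_to_host = {uid: host for host, ids in identity.items() for uid in ids}
    grouped = defaultdict(list)
    for e in events:
        host = id_to_host.get(e.user_id)
        if host is not None:  # events from unmapped IDs stay untracked
            grouped[host].append((e.ip, e.time))
    return dict(grouped)
```

In the real problem the identity mapping is the unknown to be inferred; here it is given so that the shape of the output is visible.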
Host-tracking graph
Figure: events from user IDs u1 and u2 on address IPi at times t1 to t3, with a table of groups and their user IDs.
Methodology overview
Figure: iterative pipeline.
Steps: construct tracking graph; resolve inconsistency; update ID groups.
Data passed between steps: input events; initial ID-groups; tracking graph with inconsistent bindings; pruned inconsistent bindings; updated ID-groups; host-tracking graph.
Initial estimation: u1 : h1, u2 : h2
Our goal: maximize tracked events and tracked IDs
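The "construct tracking graph" step can be pictured as collapsing events into one binding window per (ID group, IP address). The sketch below is an illustration only: taking the window from the first to the last event of a group on an address is a simplifying assumption, and `build_tracking_graph` is a hypothetical name. With an empty grouping, every ID acts as its own host, in the spirit of the initial estimation.

```python
from typing import Dict, Iterable, Tuple

EventT = Tuple[str, str, int]  # (user_id, ip, time)


def build_tracking_graph(events: Iterable[EventT],
                         id_to_group: Dict[str, str]) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Return {(group, ip): (first_time, last_time)} for the given events.

    IDs missing from `id_to_group` are treated as singleton groups
    (assumption: each unknown ID is its own host).
    """
    windows: Dict[Tuple[str, str], Tuple[int, int]] = {}
    for user_id, ip, t in events:
        group = id_to_group.get(user_id, user_id)
        key = (group, ip)
        if key in windows:
            first, last = windows[key]
            windows[key] = (min(first, t), max(last, t))
        else:
            windows[key] = (t, t)
    return windows
```

Grouping u1 and u2 under one host merges their two windows on the same address into a single, longer window, which is how better ID groups can increase the number of tracked events.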
Grouping IPs
Consider the probability of two random IDs appearing together
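One way to look for IDs that appear together is to count how often two different IDs show up next to each other in the time-ordered event sequence of an address. The sketch below does only that counting and applies a fixed cutoff; the cutoff `min_count` and both function names are hypothetical, and the probabilistic test mentioned above is not reproduced here.

```python
from collections import Counter, defaultdict
from typing import FrozenSet, Iterable, Set, Tuple

EventT = Tuple[str, str, int]  # (user_id, ip, time)


def adjacent_pair_counts(events: Iterable[EventT]) -> Counter:
    """Count, per unordered ID pair, adjacent appearances on the same IP."""
    by_ip = defaultdict(list)
    for user_id, ip, t in events:
        by_ip[ip].append((t, user_id))
    counts: Counter = Counter()
    for seq in by_ip.values():
        seq.sort()  # order events on this address by time
        for (_, a), (_, b) in zip(seq, seq[1:]):
            if a != b:
                counts[frozenset((a, b))] += 1
    return counts


def candidate_groups(events: Iterable[EventT], min_count: int = 2) -> Set[FrozenSet[str]]:
    """Return ID pairs seen together at least `min_count` times (assumed cutoff)."""
    return {pair for pair, n in adjacent_pair_counts(events).items() if n >= min_count}
```

A principled cutoff would compare each count with how often two random IDs would be expected to appear together by chance; the constant here only stands in for that comparison.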
Resolve inconsistencies
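Conflict bindings (figure): on address IPi, the binding of U1 (t1 to t2) overlaps the binding of U2 (t3 to t4).
Concurrent bindings (figure): U1 is bound to IPi during t1 to t2 and to IPj during t3 to t4.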
Guest removal
Proxy identification
Group splitting
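The two kinds of inconsistency can be detected with a pairwise overlap check. The sketch below assumes the following reading of the figures: a conflict is two different groups overlapping in time on the same address, and a concurrent binding is one group overlapping in time on two different addresses. The tuple layout and function names are hypothetical, and the resolution steps (guest removal, proxy identification, group splitting) are not implemented.

```python
from itertools import combinations
from typing import List, Tuple

BindingT = Tuple[str, str, int, int]  # (group, ip, start, end)


def overlaps(w1: Tuple[int, int], w2: Tuple[int, int]) -> bool:
    """True if two inclusive time windows share at least one instant."""
    return w1[0] <= w2[1] and w2[0] <= w1[1]


def find_inconsistencies(bindings: List[BindingT]):
    """Return (conflict_pairs, concurrent_pairs) among the bindings."""
    conflict, concurrent = [], []
    for a, b in combinations(bindings, 2):
        if not overlaps(a[2:], b[2:]):
            continue
        if a[1] == b[1] and a[0] != b[0]:
            conflict.append((a, b))     # two groups, same IP, same time
        elif a[0] == b[0] and a[1] != b[1]:
            concurrent.append((a, b))   # one group, two IPs, same time
    return conflict, concurrent
```

The pairwise loop is quadratic in the number of bindings; at the scale of real logs one would index bindings by address and by group first.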
Host-tracking graph
Update estimation iteratively
Applications
Calibrate cookie churns
Estimate host population more accurately
Build normal-user profiles
Host-aware blacklists: block by host rather than IP
Post-mortem forensics
Real-time blacklists (Tracklist)
Coverage and Accuracy
Accuracy, evaluated using Windows update data: 92% - 96%
Coverage (tracked events / total events): 75% - 80%
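The coverage ratio (tracked events / total events) can be computed directly once the kept binding windows are known. In this sketch an event counts as tracked when its ID belongs to a group that has a kept window on that address covering the event time; that criterion, the data layout, and the function name are assumptions.

```python
from typing import Dict, List, Tuple

EventT = Tuple[str, str, int]  # (user_id, ip, time)


def coverage(events: List[EventT],
             id_to_group: Dict[str, str],
             bindings: Dict[Tuple[str, str], Tuple[int, int]]) -> float:
    """Fraction of events that fall inside a kept binding window."""
    if not events:
        return 0.0
    tracked = 0
    for user_id, ip, t in events:
        window = bindings.get((id_to_group.get(user_id), ip))
        if window is not None and window[0] <= t <= window[1]:
            tracked += 1
    return tracked / len(events)
```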
Implementation with Hotmail data
Figure: a LINQ query over logs (where, select) is turned into a query plan and run on Dryad.
Automatic query plan generation
Distributed query execution by Dryad
var logentries =
    from line in logs
    where !line.StartsWith("#")
    select new LogEntry(line);
On Dryad and DryadLINQ
Blocking method                     # of blocked users    False positives
IP blacklist / infinitely           44.70 million         52.8%
IP blacklist / one hour             27.94 million         34.1%
Tracklist / one hour                16.01 million         4.9%
Tracklist with profile / one hour   14.27 million         0.1%
Seed data: 5.6 million bot-accounts detected by BotGraph in one month
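The difference between blocking by IP address and blocking by host can be shown on a toy case. The sketch below is illustrative only and is not the Tracklist implementation: the function names, the tuple layout, the host labels, and the abstract time units are assumptions. A bot is detected on 100.0.0.1 at time 5, later moves to 100.0.0.2, and a legitimate host then receives 100.0.0.1.

```python
from typing import List, Optional, Tuple

BindingT = Tuple[str, str, int, int]  # (host, ip, start, end)


def ip_blacklist_blocks(blocked_ip: str, blocked_at: int, ip: str, t: int,
                        ttl: Optional[int] = None) -> bool:
    """IP blacklist: block `ip` if listed and the entry has not expired.

    ttl=None models an entry that never expires.
    """
    if ip != blocked_ip or t < blocked_at:
        return False
    return ttl is None or t - blocked_at <= ttl


def tracklist_blocks(bad_host: str, bindings: List[BindingT], ip: str, t: int) -> bool:
    """Host-aware list: block only if `ip` at time `t` is bound to the bad host."""
    return any(h == bad_host and b_ip == ip and s <= t <= e
               for h, b_ip, s, e in bindings)


# Toy bindings (assumed): the bot moves, and its old address is reassigned.
TOY_BINDINGS = [
    ("bot", "100.0.0.1", 0, 10),
    ("bot", "100.0.0.2", 11, 20),
    ("alice", "100.0.0.1", 11, 20),
]
```

At time 15, a never-expiring IP blacklist blocks the legitimate host on 100.0.0.1 and misses the bot on 100.0.0.2, while the host-aware check does the opposite in both cases. An expiring entry removes the false positive but still misses the bot.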
Network security with IP intelligence
Our work
IP property inference engine
Static or dynamic
Proxy, NAT
Residential vs. enterprise
DSL, wireless, dialup
User population
Spam history
Known attack history
… …
Applications
Spammer detection
DDoS prevention
Click fraud detection
Targeted ads
Phishing site detection
Windows update
IP properties IP history
UDmap: identify dynamic IPs
•190M dynamic IPs
•40-50% spam
Hotmail login data
AutoRE: derive spam signatures
•340K botnet IPs
•16-18% spam not detected today
Sampled emails
BotGraph: detect bot-accounts
•26M bot-accounts
•4.5M botnet IPs
Hotmail login data
HostTracker: track host-IP bindings
Collaboration with WLSP (Jason Atlas, Geoff Hulten, Ivan Osipkov), Hotmail (Hersh Dangayach, Eliot Gillum, Krish Vitaldevara), Bing (Fritz Behr, David Soukal, Roger Yu, Zijian Zheng), and Messenger (Steve Miale)
Input
Routing tables
Windows update log
Web search log
Spam emails and history
User login records
Messenger log
SBotMiner: detect search bots
•3% of overall search traffic
Search data
Spammer detection
Search user tracking
Your application? …