An Empirical Approach to Modeling Uncertainty in Intrusion Analysis
Xinming (Simon) Ou(1), S. Raj Rajagopalan(2), Sakthi Sakthivelmurugan(1)
1 – Kansas State University, Manhattan, KS; 2 – HP Labs, Princeton, NJ
A day in the life of a real SA

The system administrator's network monitoring tools report abnormally high traffic: a TrendMicro server is communicating with known BotNet controllers. A memory dump of the server shows seemingly malicious code modules. A netflow dump reveals open IRC sockets with other TrendMicro servers. Conclusion: these TrendMicro servers are certainly compromised!

Key challenge: how to deal with uncertainty in intrusion analysis?
An empirical approach
• In spite of the lack of theory or good tools, sysadmins are coping with attacks.
• Can we build a system that mimics what they do (for a start)? An empirical approach to intrusion analysis using existing reality.
• Our goal: help the sysadmin do a better job, rather than replace him.
System overview

Observations (IDS alerts, netflow dump, syslog, server log …)
  → mapping observations to their semantics
  → Reasoning Engine (internal model)
  → high-confidence conclusions with evidence
  → targeting subsequent observations (feedback loop)
Capture Uncertainty Qualitatively

Confidence level   Uncertainty mode
Low                possible (p)
Moderate           likely (l)
High               certain (c)

• Arbitrarily precise quantitative measures are not meaningful in practice
• Roughly matches the confidence levels practically used by practitioners
Observation Correspondence

Observations (what you can see)                 mode   Internal conditions (what you want to know)
obs(anomalyHighTraffic)                         p      int(attackerNetActivity)
obs(netflowBlackListFilter(H, BlackListedIP))   l      int(compromised(H))
obs(memoryDumpMaliciousCode(H))                 l      int(compromised(H))
obs(memoryDumpIRCSocket(H1,H2))                 l      int(exchangeCtlMessage(H1,H2))
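The correspondence table above is essentially data. A minimal sketch of how it could be encoded (the names `OBS_MAP` and `map_observation` are illustrative, not the actual SnIPS implementation):

```python
# Sketch of the observation-correspondence table as a lookup structure.
# Names here are illustrative, not SnIPS internals.
OBS_MAP = {
    "anomalyHighTraffic":      ("attackerNetActivity", "p"),
    "netflowBlackListFilter":  ("compromised", "l"),
    "memoryDumpMaliciousCode": ("compromised", "l"),
    "memoryDumpIRCSocket":     ("exchangeCtlMessage", "l"),
}

def map_observation(obs_name, args):
    """Map a raw observation to (internal condition, args, confidence mode)."""
    condition, mode = OBS_MAP[obs_name]
    return (condition, args, mode)

# Example: the IRC-socket observation maps to a likely control-message exchange.
# → ('exchangeCtlMessage', ('172.16.9.20', '172.16.9.1'), 'l')
print(map_observation("memoryDumpIRCSocket", ("172.16.9.20", "172.16.9.1")))
```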
Internal Model

Logical relations among internal conditions: Condition 1 infers Condition 2, with a direction of inference (f = forward, b = backward) and a mode.

Condition 1                     Condition 2                     direction   mode
int(compromised(H1))            int(probeOtherMachine(H1,H2))   f           p
int(sendExploit(H1,H2))         int(compromised(H2))            f           l
int(sendExploit(H1,H2))         int(compromised(H2))            b           p
int(compromised(H1))            int(probeOtherMachine(H1,H2))   b           c
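One plausible encoding of the table above in code. The `conclude` helper assumes a weakest-link semantics (a derived assertion is no more certain than either the premise or the rule), which is consistent with the traces shown later; all names are illustrative, not SnIPS internals:

```python
# Illustrative encoding of the internal model.
# Each rule: (condition 1, condition 2, direction, mode).
RULES = [
    ("compromised",  "probeOtherMachine", "f", "p"),
    ("sendExploit",  "compromised",       "f", "l"),
    ("sendExploit",  "compromised",       "b", "p"),
    ("compromised",  "probeOtherMachine", "b", "c"),
]

ORDER = {"p": 0, "l": 1, "c": 2}  # possible < likely < certain

def conclude(premise_mode, rule_mode):
    # Weakest-link assumption: the conclusion's confidence is the minimum
    # of the premise's mode and the rule's mode.
    return min(premise_mode, rule_mode, key=ORDER.get)
```

Under this assumption a likely (l) sendExploit combined with the forward rule of mode l yields a likely compromised conclusion, matching the intRule_3 step in the Treasure Hunt trace later in the deck.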
Reasoning Methodology
• Simple reasoning
  – Observation correspondence and the internal model are inference rules
  – Use the inference rules on input observations to derive assertions with various levels of uncertainty
• Proof strengthening
  – Derive high-confidence proofs from assertions derived from low-confidence observations
Example 1: Observation Correspondence

Rule: obs(memoryDumpIRCSocket(H1,H2)) ⇒ int(exchangeCtlMessage(H1,H2)), mode l

obs(memoryDumpIRCSocket(172.16.9.20, 172.16.9.1))
  ⇒ (obsMap)  int(exchangeCtlMsg(172.16.9.20, 172.16.9.1), l)
Example 2: Internal Model

obs(memoryDumpIRCSocket(172.16.9.20, 172.16.9.1))
  ⇒ (obsMap)    int(exchangeCtlMsg(172.16.9.20, 172.16.9.1), l)
  ⇒ (int rule)  int(compromised(172.16.9.20), l)
Proof Strengthening

Independent observations O1, O2, O3 can support the same fact f:
  O1 ⇒ f is likely true
  O2 ⇒ f is likely true
  ⇒ (proof strengthening) f is certainly true
Proof Strengthening

Two independent proofs of the same assertion A, each with mode l (likely), strengthen to mode c (certain); a proof with only mode p (possible) is not strengthened to certainty.
Proof Strengthening

obs(memoryDumpIRCSocket(172.16.9.20, 172.16.9.1))
  ⇒ (obsMap)  int(exchangeCtlMsg(172.16.9.20, 172.16.9.1), l)
  ⇒ (intR)    int(compromised(172.16.9.20), l)

obs(memoryDumpMaliciousCode('172.16.9.20'))
  ⇒ (obsMap)  int(compromised(172.16.9.20), l)

strengthen(l, l) = c
  ⇒ (strengthenedPf)  int(compromised(172.16.9.20), c)
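The derivation above can be sketched end to end. This sketch implements only the case shown on the slide (strengthen(l, l) = c); the helper names and the fallback behavior of `strengthen` are our assumptions, not the authors' implementation:

```python
ORDER = ["p", "l", "c"]  # possible < likely < certain

def obs_map(obs_name):
    # Observation correspondence for the two observations in this example.
    table = {"memoryDumpIRCSocket":     ("exchangeCtlMsg", "l"),
             "memoryDumpMaliciousCode": ("compromised", "l")}
    return table[obs_name]

def int_rule(condition, mode):
    # Control-message exchange between two hosts implies the host is
    # compromised, at most with likely (l) confidence.
    if condition == "exchangeCtlMsg":
        return ("compromised", min(mode, "l", key=ORDER.index))
    return (condition, mode)

def strengthen(m1, m2):
    # Two independent likely proofs of the same fact upgrade to certain,
    # per strengthen(l, l) = c; otherwise keep the stronger mode (assumed).
    if (m1, m2) == ("l", "l"):
        return "c"
    return max(m1, m2, key=ORDER.index)

proof1 = int_rule(*obs_map("memoryDumpIRCSocket"))      # ('compromised', 'l')
proof2 = obs_map("memoryDumpMaliciousCode")             # ('compromised', 'l')
verdict = strengthen(proof1[1], proof2[1])              # 'c'
```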
Evaluation Methodology
• Test whether the empirically developed model can derive similar high-confidence traces when applied to different scenarios
• Keep the model unchanged and apply the tool to different data sets
SnIPS (Snort Intrusion Analysis using Proof Strengthening) Architecture

• Pre-processing: Snort alerts are converted to tuples and fed to the reasoning engine.
• Done only once: the observation correspondence relation and the internal model are derived from the Snort rule repository.
• The user queries the reasoning engine, e.g. which machines are "certainly" compromised?
• Output: high-confidence answers with evidence.
Snort rule class type

alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS (msg:"WEB-MISC guestbook.pl access"; uricontent:"/guestbook.pl"; classtype:attempted-recon; sid:1140;)

obsMap(obsRuleId_3615, obs(snort('1:1140', FromHost, ToHost)), int(probeOtherMachine(FromHost, ToHost)), ?).

The internal predicate is mapped from the rule's "classtype" field; the mode is still undetermined (?).
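The baseline derivation from the classtype field can be sketched as follows. The mapping table is a hypothetical subset (the actual SnIPS mapping is not shown in the deck), and the mode is left as '?' exactly as on the slide:

```python
# Hypothetical classtype -> internal predicate table (illustrative subset).
CLASSTYPE_MAP = {
    "attempted-recon": "probeOtherMachine",
    "trojan-activity": "compromised",
    "shellcode-detect": "sendExploit",
}

def obs_map_from_rule(rule_id, sid, classtype):
    """Derive a baseline obsMap fact from a Snort rule's classtype.

    Returns None when the classtype has no automatic mapping (roughly 41%
    of rules are not mapped automatically); the mode stays undetermined."""
    predicate = CLASSTYPE_MAP.get(classtype)
    if predicate is None:
        return None
    return (rule_id, f"snort('{sid}', FromHost, ToHost)", predicate, "?")
```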
Snort rule documents

Impact: Information gathering and system integrity compromise. Possible unauthorized administrative access to the server. Possible execution of arbitrary code of the attacker's choosing in some cases.
Ease of Attack: Exploits exist.

Hints from the natural-language description of Snort rules yield the modes:
obsMap(obsRuleId_3614, obs(snort('1:1140', FromHost, ToHost)), int(compromised(ToHost)), p).
obsMap(obsRuleId_3615, obs(snort('1:1140', FromHost, ToHost)), int(probeOtherMachine(FromHost, ToHost)), l).
Automatically deriving Observation Correspondence
• Snort has about 9,000 rules.
• This is just a baseline and needs to be fine-tuned.
• It would make more sense for the rule writer to define the observation correspondence relation when writing a rule.

Internal predicate            % of rules
Mapped automatically          59%
Not mapped automatically      41%
Data set description
• Treasure Hunt (UCSB, 2002) – 4 hrs
  – Collected during a graduate-class experiment
  – Large variety of system monitoring data: tcpdump, syslog, Apache server log, etc.
• Honeypot (Purdue, 2008) – 2 hrs/day over 2 months
  – Collected for an e-mail spam analysis project
  – Single host running a misconfigured Squid proxy
• KSU CIS department network (2009) – 3 days
  – 200 machines including servers and workstations
Some results from the Treasure Hunt data set

| ?- show_trace(int(compromised(H), c)).
int(compromised('192.168.10.90'), c)  strengthenedPf
  int(compromised('192.168.10.90'), p)  intRule_1
    int(probeOtherMachine('192.168.10.90','192.168.70.49'), p)  obsRulePre_1
      obs(snort('122:1','192.168.10.90','192.168.70.49',_h272))
  int(compromised('192.168.10.90'), l)  intRule_3
    int(sendExploit('128.111.49.46','192.168.10.90'), l)  obsRuleId_3749
      obs(snort('1:1807','128.111.49.46','192.168.10.90',_h336))

Reading the trace: an exploit was sent to 192.168.10.90, and a probe was later sent from 192.168.10.90, so 192.168.10.90 was certainly compromised!
Data Reduction

Data set        Duration of network traffic   Snort alerts   Pre-processed alerts   High-confidence proofs
Treasure Hunt   4 hours                       4,849,937      278                    18
Honeypot        2 hrs/day for 2 months        637,564        30                     8
CIS Network     3 days                        1,138,572      6,634                  17
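The reduction in the table can be quantified with simple arithmetic (the alert and proof counts are copied from the table; the ratio computation is ours):

```python
# Alert-to-proof reduction factors, using the numbers from the table above.
datasets = {
    "Treasure Hunt": (4_849_937, 278, 18),
    "Honeypot":      (637_564, 30, 8),
    "CIS Network":   (1_138_572, 6_634, 17),
}

for name, (raw_alerts, preprocessed, proofs) in datasets.items():
    ratio = raw_alerts // proofs  # raw Snort alerts per high-confidence proof
    print(f"{name}: {raw_alerts:,} alerts -> {proofs} proofs "
          f"(~{ratio:,}x reduction)")
```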
Related work
• Y. Zhai et al., "Reasoning about complementary intrusion evidence," ACSAC 2004.
• F. Valeur et al., "A Comprehensive Approach to Intrusion Detection Alert Correlation," 2004.
• Goldman and Harp, "Model-based Intrusion Assessment in Common Lisp," 2009.
• C. Thomas and N. Balakrishnan, "Modified Evidence Theory for Performance Enhancement of Intrusion Detection Systems," 2008.
Summary
• Based on a real-life incident, we empirically developed a logical model for handling uncertainty in intrusion analysis.
• Experimental results show:
  – The model simulates human reasoning and was able to extract high-confidence intrusion conclusions.
  – The model, empirically developed from one incident, was applicable to completely different data/scenarios.
  – The search space for analysis was substantially reduced.
Future Work
• Continue the empirical study and improve the current implementation.
• Establish a theoretical foundation for the empirically developed method:
  – Modal logic
  – Dempster-Shafer theory
  – Bayesian theory
Summarization
• Compact the information entering the reasoning engine.
• Group similar "internal conditions" into a single "summarized internal condition".
Comparison of the three data sets
[Bar chart, log-scale y-axis (1 to 10,000,000): number of Snort alerts per classtype (attempted-admin, attempted-dos, attempted-recon, attempted-user, bad-unknown, default-login-attempt, misc-activity, misc-attack, not-suspicious, policy-violation, non-standard-protocol, protocol-command-decode, rpc-portmap-decode, shellcode-detect, successful-admin, successful-recon-limited, suspicious-filename-detect, system-call-detect, trojan-activity, unknown, web-application-activity, web-application-attack) for the Department, Treasure Hunt, and Honeypot data sets.]
Output from CIS

int(compromised('129.130.11.69'), c)  strengthenedPf
  int(compromised('129.130.11.69'), l)  intRule_1b
    int(probeOtherMachine('129.130.11.69','129.130.11.12'), l)  sumFact summarized(86)
  int(compromised('129.130.11.69'), l)  intRule_3f
    int(sendExploit('129.130.11.22','129.130.11.69'), c)  strengthenedPf
      int(sendExploit('129.130.11.22','129.130.11.69'), l)  sumFact summarized(109)
      int(skol(sendExploit('129.130.11.22','129.130.11.69')), p)  IR_3b
        int(compromised('129.130.11.69'), p)  sumFact summarized(324)