The Mathematics and Algorithmics of Process Detection
George Cybenko
Dartmouth College
Hanover NH 03755 USA
IPAM 27-7-2005 Cybenko
Acknowledgements
Research Support: DARPA, DHS, ARDA, ISTS, I3P, AFOSR, Microsoft
Active Members
George Bakos, Alex Barsamian, Marion Bates, Vincent Berk, Wayne Chung, Valentino Crespi (Cal State LA), George Cybenko, Ian deSouza, Annarita Giani, Doug Madory, Glenn Nofsinger, Yong Sheng, William Stearns
Alumni
Naomi Fox (UMass, Ph.D. student), Hrithik Govardhan (Rocket), Robert Gray (BAE Systems), Diego Hernando (UIUC, Ph.D. student), Guofei Jiang (NEC Research), Alex Jordan (BAE Systems), Han Li (China), Josh Peteet (Greylock Partners), Chris Roblee (LLNL)
Outline
• Background and basics
• Software and Applications
• Theory
• Summary
An Example of a Process
A “Process” Model
[Figure: state 1 emits a and state 2 emits b; arrows allow the transitions 1→2, 2→1 and 2→2.]
Two states: {1, 2}. Two observables: {a, b}.
Legal transitions between states are depicted by arrows.
When occupying a state, the process emits an observable.
All states are initial/start states and there are no terminal states.
Some legal sequences of observables: abbab, bababbb, abbb
Some illegal sequences of observables: aa, baab
Further reading: automata theory, regular languages, etc.
A More Complex Process
Another “Process” Model
[Figure: three states; states 1 and 3 each emit a or c, and state 2 emits b. Arrows show the legal transitions.]
Three states: {1, 2, 3}. Three observables: {a, b, c}.
Some legal sequences of observables: abab, babaccab, ab
Some illegal sequences of observables: bb, baabb
Problem: Given a sequence of observations, is it legal? If so, which states can the process be in?
Solution:
1. Read the first observable; mark the states that emit that observable.
2. Read the next observable, z.
3. New marked states = (states reachable from the old marked states) ∩ (states that could have emitted z).
4. If there are no new marked states, the sequence is illegal; otherwise go to step 2.
If the sequence ends with some states still marked, it is legal, and the marked states are the possible current states.
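The marking procedure can be sketched in Python. This is a minimal illustration, not code from the PQS software: the dictionaries `emits` (state to observables) and `succ` (state to successor states) are our own encoding. The two-state model's transitions (1→2, 2→1, 2→2) follow from its legal and illegal examples; for the three-state model the figure's arrows were lost, so the transitions 1→2, 2→1, 2→3, 3→2, 3→3 are an assumption, chosen because they accept every listed legal sequence and reject every listed illegal one.

```python
from typing import Dict, Iterable, Optional, Set

def mark_states(observations: Iterable[str],
                emits: Dict[int, Set[str]],
                succ: Dict[int, Set[int]]) -> Set[int]:
    """State marking: return the states the process may occupy after the
    last observation. An empty result means the sequence is illegal."""
    marked: Optional[Set[int]] = None
    for z in observations:
        # States that could have emitted the observable z.
        emitters = {s for s, obs in emits.items() if z in obs}
        if marked is None:
            # Step 1: every state is a start state.
            marked = emitters
        else:
            # Step 3: reachable from the old marks, intersected with emitters.
            reachable = set().union(*(succ[s] for s in marked))
            marked = reachable & emitters
        if not marked:
            return set()  # Step 4: nothing marked, so the sequence is illegal
    return marked if marked is not None else set()


# Two-state model: state 1 emits a, state 2 emits b.
TWO_EMITS = {1: {"a"}, 2: {"b"}}
TWO_SUCC = {1: {2}, 2: {1, 2}}

# Three-state model; these transitions are an assumption (see above).
THREE_EMITS = {1: {"a", "c"}, 2: {"b"}, 3: {"a", "c"}}
THREE_SUCC = {1: {2}, 2: {1, 3}, 3: {2, 3}}

for seq in ["abab", "babaccab", "ab", "bb", "baabb"]:
    print(seq, mark_states(seq, THREE_EMITS, THREE_SUCC))
```

Under these assumptions, the legal examples all end with the marked set {2} and the illegal ones with the empty set.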
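Two Simple Processes
[Figure: Model Instance A (states A1, A2) and Model Instance B (states B1, B2), two copies of the same model in which state 1 emits a and state 2 emits b. One path through the states is labeled “a track”; a consistent set of tracks is labeled “a hypothesis”.]
aabb is a legal observation sequence when both instances are running.
A1 B1 A2 A2, A1 B1 A2 B2, B1 A1 B2 B2, ... are all legal state sequences.
We can reduce this to a single process...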
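Multiple Process Representation
[Figure: Model Instance A (states A1, A2) and Model Instance B (states B1, B2); in each, state 1 emits a and state 2 emits b.]
Transition matrix of one instance:
$$M = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$$
Transition matrix for two instances, in block form:
$$M \otimes M = \begin{pmatrix} 0 & M \\ M & M \end{pmatrix}$$
If the observation sequence is aaaaaa and multiple copies of the model are allowed, then we get a product model of size $2^n$ for n copies. The sequence aaaaaa alone needs at least six copies, one per a, since state 1 has no self-loop.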
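Reading the slide's “M x M” as the Kronecker product M ⊗ M (our interpretation, consistent with the block pattern), the joint transition matrix for several instances can be built in plain Python. This is a sketch; `kron` and `product_model` are hypothetical helper names.

```python
from functools import reduce
from typing import List

Matrix = List[List[int]]

def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product: block (i, j) of the result equals a[i][j] * b."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def product_model(m: Matrix, copies: int) -> Matrix:
    """Joint transition matrix for `copies` instances of the model m."""
    return reduce(kron, [m] * copies)

# Single instance: 1 -> 2, 2 -> 1, 2 -> 2 (no self-loop on state 1).
M = [[0, 1],
     [1, 1]]

MM = product_model(M, 2)
for row in MM:
    print(row)
print(len(product_model(M, 6)))  # joint states needed for "aaaaaa"
```

The two-copy matrix reproduces the slide's blocks [[0, M], [M, M]], and six copies already give 64 joint states, which is the exponential cost of reducing multiple processes to one.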
Multistage Process Model
[Figure: states Start/Normal, Scanned, Infected, Data Access and Exfiltration, with transitions marked as potential malicious activity or potential normal activity.]
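[Figure: a trellis of k copies of the states, one copy for each time t = 0, 1, 2, 3, …, k-2, k-1.]
Take (negative) logs of the probabilities, so finding the most likely state sequence becomes a shortest-path problem through the trellis, solvable by dynamic programming (the Viterbi algorithm).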
Extensions: Hidden Markov Model (HMM)
Add probabilities to the three-state model:
p(a|1) = 0.8, p(c|1) = 0.2; p(b|2) = 1; p(a|3) = 0.8, p(c|3) = 0.2
[Figure: the same three states, with transition probabilities 1, 0.8, 0.5, 0.2 and 0.5 on the arrows.]
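The shortest-path view of the trellis can be sketched as a Viterbi decoder over negative log probabilities. The emission probabilities below are the slide's. The assignment of transition probabilities to arrows (1→2: 1, 2→1: 0.5, 2→3: 0.5, 3→2: 0.8, 3→3: 0.2) and the uniform start distribution are assumptions, because the figure did not survive extraction.

```python
import math
from typing import Dict, List, Tuple

def viterbi(obs: str, states: List[int], start: Dict[int, float],
            trans: Dict[int, Dict[int, float]],
            emit: Dict[int, Dict[str, float]]) -> Tuple[List[int], float]:
    """Most likely state path, found as a shortest path on -log probabilities."""
    def cost(p: float) -> float:
        return -math.log(p) if p > 0 else math.inf

    # dist[s]: cheapest cost of any trellis path ending in state s at time t.
    dist = {s: cost(start.get(s, 0.0)) + cost(emit[s].get(obs[0], 0.0))
            for s in states}
    back: List[Dict[int, int]] = []
    for z in obs[1:]:
        new_dist: Dict[int, float] = {}
        ptr: Dict[int, int] = {}
        for s in states:
            # Relax every trellis edge r -> s and keep the cheapest.
            best_prev, best = states[0], math.inf
            for r in states:
                c = dist[r] + cost(trans[r].get(s, 0.0))
                if c < best:
                    best_prev, best = r, c
            new_dist[s] = best + cost(emit[s].get(z, 0.0))
            ptr[s] = best_prev
        dist = new_dist
        back.append(ptr)
    last = min(states, key=lambda s: dist[s])
    if math.isinf(dist[last]):
        return [], 0.0  # no state path can produce obs
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, math.exp(-dist[last])


STATES = [1, 2, 3]
START = {s: 1 / 3 for s in STATES}  # assumed uniform
TRANS = {1: {2: 1.0}, 2: {1: 0.5, 3: 0.5}, 3: {2: 0.8, 3: 0.2}}  # assumed
EMIT = {1: {"a": 0.8, "c": 0.2}, 2: {"b": 1.0}, 3: {"a": 0.8, "c": 0.2}}

print(viterbi("abab", STATES, START, TRANS, EMIT))
```

Under these assumptions, "abab" decodes to the path [1, 2, 1, 2] with probability 0.32/3 ≈ 0.107, and "bb" returns an empty path because state 2 has no self-loop.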
Hidden Process Models
[Figure: underlying (hidden) state spaces of Model 1 through Model n, with each state labeled by the observables it can emit.]
Observations related to state sequences:
a b c d a b b a d a c f h c c g d g
Observations are interleaved:
a b c c f h d c c a b g d b a g d a
Observations are missed, noise is added, and they are unlabelled (this is what we see):
a b a c f k h d c b g d b k h a g d a
Terminology and Summary
Processes have states.
The states are hidden.
States emit observables that are possibly not unique to a state.
Observables are not labeled, can be noisy and might be dropped.
Multiple processes might be instantiated.
The problem is to determine which processes are possible and which states those processes can be in.
Multiple process detection can be reduced to single process detection at the expense of exponential growth.
Tracks are associations of observations to processes.
Hypotheses are consistent tracks that explain all the observables.
Discrete Source Separation Problem (viz. Blind Source Separation, “Cocktail Party” Problem)
Catalog of Processes/Models. Process/Model Example: 3 states + transition probabilities; n observable events: a, b, c, d, e, …; Pr(state | observable event) given/known.
Observed event sequence: ….abcbbbaaaababbabcccbdddbebdbabcbabe….
Which combination of which process models “best” accounts for the observations? This is what we want to compute. Events not associated with a known process are “anomalies”.
[Figure labels: a track; a hypothesis.]
A Simple Example of Process Detection
[Figure: two process models. NETWORK WORM MODEL (NW): states A, B, C, D emitting {a}, {b}, {b, c}, {c, d}, where a, b, c, d are ICMP traffic levels. ROUTER FAILURE MODEL (RF): states E, F emitting {a}, {b}.]
a, b, c, d are events that can be observed.
Router failure rule:
E, F = 0
repeat
    read event e
    if e == a then E
    if E and e == b then F
until F
Two models; states have different semantics; sets of observables intersect – what is the “diagnosis”?
• a, b, c, d are events that can be observed
• states A, B, C, D, E, F are hidden
• observe a sequence of events
Sequence: Hypotheses
• ab: NW | RF
• abab: (NW & NW) | (RF & NW) ...
• ababc: (NW & RF) | (NW & NW)
• ababcc: NW & NW
• Which process or combination of processes explains the observed events?
Detecting a Process Using Rules
[Figure: WORM MODEL, states A, B, C, D emitting {a}, {b}, {b, c}, {c, d} (a, b, c, d are ICMP traffic levels); ROUTER FAILURE MODEL, states E, F emitting {a}, {b}.]
Worm rule:
A, B, C, D = 0
repeat
    read event e
    if e == a then A
    if A and e == b then B
    if B and (e == b or e == c) then C
    if C and (e == c or e == d) then D
until D
Router failure rule:
E, F = 0
repeat
    read event e
    if e == a then E
    if E and e == b then F
until F
What does “ab” mean? (Process ambiguity)
What does “ac” mean? (Missed detections)
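The two rule sets can be transcribed into Python to see the ambiguity concretely. One assumption: each rule tests the flags as they stood before the current event, so one event advances a model by at most one stage (read literally top to bottom, the pseudocode would let a single b set both B and C). The function names are hypothetical.

```python
from typing import Dict, Iterable

def worm_rules(events: Iterable[str]) -> Dict[str, bool]:
    """WORM MODEL rules: A on a, then B on b, C on b/c, D on c/d."""
    f = {"A": False, "B": False, "C": False, "D": False}
    for e in events:
        prev = dict(f)  # conditions use pre-event flags (assumption)
        if e == "a":
            f["A"] = True
        if prev["A"] and e == "b":
            f["B"] = True
        if prev["B"] and e in ("b", "c"):
            f["C"] = True
        if prev["C"] and e in ("c", "d"):
            f["D"] = True
        if f["D"]:  # "until D"
            break
    return f

def router_rules(events: Iterable[str]) -> Dict[str, bool]:
    """ROUTER FAILURE MODEL rules: E on a, then F on b."""
    f = {"E": False, "F": False}
    for e in events:
        prev = dict(f)
        if e == "a":
            f["E"] = True
        if prev["E"] and e == "b":
            f["F"] = True
        if f["F"]:  # "until F"
            break
    return f

# "ab" fires both models (process ambiguity); "ac" stalls the worm at A
# because the b that would have entered state B was missed.
print(worm_rules("ab"), router_rules("ab"))
print(worm_rules("ac"))
```

Every extra case (resetting the other model, skipping a missed state) becomes another hand-written rule, which is the scaling problem the following slides describe.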
Rules for Process Disambiguation
[Figure: the same WORM MODEL (a, b, c, d ICMP traffic levels) and ROUTER FAILURE MODEL.]
Worm rule:
A, B, C, D = 0
repeat
    read event e
    if e == a then A
    if A and e == b then B
    if B and (e == b or e == c) then C
    if C then (E = 0, F = 0)
    if C and (e == c or e == d) then D
until D
Router failure rule:
E, F = 0
repeat
    read event e
    if e == a then E
    if E and e == b then F
until F
Cannot decide which process is instantiated until more data arrives.
Rules for Missed Detections
[Figure: the same WORM MODEL (a, b, c, d ICMP traffic levels).]
Worm rule with missed detections:
A, B, C, D = 0
repeat
    read event e
    if e == a then A
    if A and e == b then B
    if A and e == c then C, D
    if A and e == d then D
    if B and (e == b or e == c) then C
    if C then (E = 0, F = 0)
    if C and (e == c or e == d) then D
    if D then (E = 0, F = 0)
until D
This clearly does not scale and does not lead to manageable sets/systems of rules.
Complexity of Rule-Based Systems for Multiple Process Detection
• m process models, each with n states
• Potentially as few as mn state transitions in the original models
• Potentially need to add:
  – $O(m^2 n^2)$ rules for disambiguation
  – $O(mn^2)$ rules for missed detections
• These are “overhead” processing steps that can be done generically, not by the decision tree or rule set
• Process Query System software handles this overhead processing
Approaches to Detecting Processes
Aristotelian: Traditional information retrieval is based on specification of a query in terms of Boolean expressions over record fields, e.g. SQL (name = “smith” & age > 20 & age < 40), plus rule-based logics, decision trees, etc.
Newtonian: Next-generation process detection requires retrieval based on specification of a set of discrete, dynamic processes, e.g. descriptions of a Hidden Markov Model, Hidden Petri Net, weak models, FSMs, attack trees, etc.
Main Concept: Move from an Aristotelian to a Newtonian paradigm.
Examples of Process Detection Problems
• Is there an unusual pattern of computer network events, host activities, system calls, etc.? (Network and computer security)
• Is a complex infrastructure (telecom, electricity, financial networks) operating normally or in a failure mode? (Critical Infrastructures)
• Is my software operating normally? (Autonomic computing)
• What biological pathways/processes are engaged? (Molecular Biology)
• Is there an unusual pattern of document accesses within an enterprise document control system? (Insider Threat Detection)
• Does a group of unusual transactions constitute a threat? (Homeland Security)
• Has the physical border/perimeter been breached? (National and industrial physical intrusion detection)
• Is there a large ground vehicle convoy moving towards our position? (Tactical military)
• What’s going on around me? (Human Cognitive Processing)
IMPORTANT: All are “adversarial” situations, not cooperative, so the observations are not necessarily labeled for easy identification and association with a process!
Related Disciplines

| | “Weak” Models | Hidden Markov Models | Linear State Space Systems | Multiple Target Tracking |
|---|---|---|---|---|
| Underlying model | Finite State Machines | Markov Chains, Shannon Channel | x’ = Ax + Bu, y = Cx + Dv, with u, v Gaussian noise | Any |
| Algorithms for single processes | State marking | Viterbi algorithms | Kalman Filtering | Not applicable |
| Multiple processes | Process Query Systems | Process Query Systems | Process Query Systems | Multiple Hypothesis Tracking (MHT) |
Software and Applications
• Sensor networks
• Airborne plume detection
• Cyber security
• Server pool management
• Dynamics of social networks*
• Genomics and biological pathways*
• Human situation awareness*
*In process or planned.
Process Query Systems (PQS)
• Process Query Systems solve the Discrete Source Separation Problem in a generic way:
  – inputs: a sequence of unlabelled observations (stream, logfiles, etc.) and a collection of process models
  – outputs: estimates of which processes produced those observations and estimates of which states those processes are in
• Basic theory and technology have been developed by the PQS team at Dartmouth.
• Now being applied to a variety of applications.
Algorithms/Operations of PQS
[Figure: the PQS processing loop, recursive in time, around a hypothesis pool (Hypothesis 1, …, Hypothesis n), each hypothesis holding a set of tracks.]
1. Build or learn models
2. Subscribed data arrives
3. Update tracks within hypotheses (Viterbi / Kalman / NDFA, etc.) and create new hypotheses
4. Manage hypotheses (MHT)
5. Evaluate solutions and process outputs
Software: Process Query System. One platform, many applications
[Figure: a Generic Process Query System (PQSnet.net) with its applications: DISCUS vehicle tracking, computer security, cyberlog analysis, attacks on utilities, plume detection, sensor networks and robust server pooling. Sponsors shown: ARDA, DARPA and DHS.]
The COBOL and pre-PQS Analogy
Pre-SQL programs (interwoven logic):
…application logic statement 1; application logic statement 2; file management statement 1; record management statement 1; file management statement 2; record management statement 2; application logic statement 3; record management statement 3; file management statement 3; application logic statement 4;…
Post-SQL programs, application logic (user responsibility):
…application logic statement 1; application logic statement 2; SQL statement 1; application logic statement 3; SQL statement 2; application logic statement 4;…
+ database management system (system responsibility):
…file management operation 1; record management operation 1; file management operation 2; record management operation 2; record management operation 3; file management operation 3;…
Current process detection programs (interwoven logic):
…model logic statement 1; model logic statement 2; sensor access statement 1; state estimate statement 1; sensor access statement 2; state estimate statement 2; model logic statement 3; sensor access statement 3; state estimate statement 3; model logic statement 4;…
PQS-based programs, model description (user responsibility):
…model description statement 1; model description statement 2; model description statement 3; model description statement 4;…
+ process query system (system responsibility):
…sensor access statement 1; state estimate statement 1; sensor access statement 2; state estimate statement 2; sensor access statement 3; state estimate statement 3;…
Computer Security Example (V. Berk and N. Fox)
Funded by ARDA and DHS
Network Security
Objective:
• Detect, disambiguate, and predict the course of concerted network attacks in an enterprise-class network.
Why:
• Problem domain demands the power of PQS
• Hundreds of “processes” occurring at once
• Lots of missed observations and noise
• All commercial technology focuses on collection and presentation of data
• Existing correlation efforts very weak at best
Goal of PQS in network monitoring
• Create a system that quickly and accurately correlates related activity.
• Assist a security analyst in deciding:
  – What activity is irrelevant.
  – What activity needs attention and further investigation.
SENSORS INTEGRATED

| Sensor | Description |
|---|---|
| DIB:s | Dartmouth ICMP-T3 Bcc: System |
| CovChan | Timing Covert Channel Detection |
| Snort | Signature Matching IDS |
| IPtables | Linux Netfilter firewall, log based |
| Samba | SMB server, file access reporting |
| Weblog | IIS, Apache, SSL error logs, … |
| US-agent | Userspace host monitoring agent |
| Tripwire | Host filesystem integrity checker |

Scope: global, network and host.
PQS-Net Testbed at Dartmouth (www.pqsnet.net)
[Figure: testbed network. Components: Internet; DMZ with WWW and Mail servers; WS; Dartmouth WinXP/LINUX targets (192.168.24.192/26); ISTS (172.18.12.32-38); sensors US-Agent, CovChan, IPTables, Snort, DIB:s and SaMBa; PQS.]
Attack Hosts:
• Skaion
• Custom Exploits
• Core Impact™
• Normal Traffic
• Covert Channels
• Worms
PQS-Net supply chain
[Figure: sensors, sensor data, Tier 1 Tracker, attack steps, Tier 2 Tracker, attack sequences and scores, analyst’s front-end.]
Tier 1 Models
• Focus on individual host status
• Report on status changes
Tier 2 Models
• Focus on correlating host activity
• Report chains of events
Tier 1 Output
Mon Feb 21 20:06:17 2005 000000 131.58.63.160 (hostile) recon on 100.10.20.4 SNORT 469 proto: 1
Mon Feb 21 20:30:24 2005 000000 138.158.170.45 (hostile) attacked 100.10.20.4 ERRORLOG 400 proto: 6 dport: 443
Tier 2 Output
Hypothesis 1 (Score: 0.8) and Hypothesis 2 (Score: 0.2), each a chain of events built from: A scans B; B scans E; B attacks E.
Example Scenario
[Figure: the Internet and hosts A, B, C, D and E.]
Tier 1 alert: A scans B
Indicator (Snort):
02/21-20:06:17.904500 [**] [1:469:1] ICMP PING NMAP [**] [Classification: Attempted Information Leak] [Priority: 2] {ICMP} 131.58.63.160 -> 100.10.20.4
Tier 1 alert: C attacks B (success)
Indicator (SSL error log, host 100.10.20.4):
[Mon Feb 21 20:30:24 2005] [error] mod_ssl: SSL handshake failed (server www.osis.gov:443, client 138.185.170.45) (OpenSSL library error follows)
[Mon Feb 21 20:30:24 2005] [error] OpenSSL: error:1406908F:lib(20):func(105):reason(143)
Example Cont’d
[Figure: hosts B, D and E.]
Tier 1 alert: B scans D
Indicator: 02/21-20:31:17.528602 [**] [1:1807:2] WEB-MISC Chunked-Encoding transfer attempt [**] [Classification: Web Application Attack] [Priority: 1] {TCP} 100.10.20.4:34074 -> 100.10.20.169:80
Tier 1 alert: B attacks D (fails)
Indicator: 100.20.1.169 - - [21/Feb/2005:08:31:22 -0500] "GET /default.idq?AAAAAAAAAAA………..AAAAAAA HTTP/1.1" 404 1287 "-" "-"
Tier 1 alert: B scans E
Indicator: 02/21-20:32:01.622465 [**] [1:1807:2] WEB-MISC Chunked-Encoding transfer attempt [**] [Classification: Web Application Attack] [Priority: 1] {TCP} 100.10.20.4:34076 -> 100.10.20.170:80
Tier 1 alert: B attacks E (succeeds)
Indicator: 100.20.1.170 - - [21/Feb/2005:08:32:06 -0500] "GET /default.idq?AAAAAAAAAAA………..AAAAAAA HTTP/1.1" 200 1287 "-" "-"
Fish Tracking (Kinematic Tracking)
A. Jordan, W. Chung, V. Crespi
Funded by DARPA and DHS
Real time Fish Tracking
Objective:
• Track the fish in the fish tank
Why:
• Very strong example of the power of PQS
• Fish swim very quickly and erratically
• Lots of missed observations
• Lots of noise
• Classical Kalman filters don’t work (non-linear movement and acceleration)
• “Easier” than getting permission to track people (we mistakenly thought)
Fish Tracking Details
• 5-gallon tank with 2 red Platys named Bubble and Squeak
• Camera generates a stream of “centroids”: for each frame, a series of (X,Y) pairs is generated.
• Model describes the kinematics of a fish: the model evaluates whether new (X,Y) pairs could belong to the same fish, based on measured position, momentum, and predicted next position. This way, multiple “tracks” are formed, one for each object.
• Model was built in under 3 days!!!
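The slide does not give the fish model itself. As a hypothetical illustration of the gating idea (does a new (X, Y) centroid fit a track's predicted next position?), here is a greedy nearest-neighbour sketch under a constant-velocity prediction. The `gate` radius, `predict`, and `assign` are all assumptions, not the authors' model, and a PQS tracker would keep competing assignments as separate hypotheses instead of committing greedily.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]

def predict(track: List[Point]) -> Point:
    """Constant-velocity guess of a fish's next centroid (hypothetical model)."""
    if len(track) < 2:
        return track[-1]
    (x0, y0), (x1, y1) = track[-2], track[-1]
    return (2 * x1 - x0, 2 * y1 - y0)

def assign(tracks: List[List[Point]], frame: List[Point], gate: float) -> None:
    """Extend each track with the closest centroid within `gate` of its
    prediction; centroids nobody claims start new tracks."""
    unclaimed = list(frame)
    for track in tracks:
        px, py = predict(track)
        best: Optional[Point] = None
        best_d = gate
        for p in unclaimed:
            d = math.hypot(p[0] - px, p[1] - py)
            if d <= best_d:
                best, best_d = p, d
        if best is not None:
            track.append(best)
            unclaimed.remove(best)
    tracks.extend([p] for p in unclaimed)


tracks: List[List[Point]] = []
for frame in [[(0, 0), (10, 10)], [(1, 0), (10, 11)], [(10, 12), (2, 0)]]:
    assign(tracks, frame, gate=3.0)
print(tracks)
```

In this sketch a missed detection simply leaves a track unextended for that frame; a fuller version would coast the prediction forward.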
Autonomic Server Monitoring
(C. Roblee, V. Berk)
Funded by DHS, ARDA
Autonomic Server Monitoring
Objective:
• Detect and predict deteriorating service situations
Why:
• Another strong example of the power of PQS
• Software and hardware are buggy and vulnerable
• Hot market, large profits for “The ONE” application
• Very ambiguous observations
• Sys-admins also want vacation
The Environment
• Hundreds of servers and services
• Various non-intrusive sensors check for:
  – CPU load
  – Memory footprint
  – Process table (forking behavior)
  – Disk I/O
  – Network I/O
  – Service query response times
  – Suspicious network activities (e.g. Snort)
• Models describe the kinematics of failures and attacks: the model evaluates load balancing problems, memory leaks, suspicious forking behavior (like /bin/sh), service hiccups correlated with network attacks…
Server Compromise Model: Generic Attack Scenario
[Figure: timeline t0, t1, t2, t3, t4; observations o1, o2, o3 lead to a response.]
1. Observations o1, o2, o3
Snort NIDS sensor output:
...Nov 21 20:57:16 [10.0.0.6] snort: [1:613:7] SCAN myscan [Classification: attempted-recon] [Priority: 2]: {TCP} 212.175.64.248 -> 10.0.0.24...
Monitored host sensor output (system level):
Current system record for host 10.0.0.24 (10 records):
Average memory over previous 10 samples: 251.000
Average CPU over previous 10 samples: 0.970
| time       | mem used | CPU load | num procs | flag |
----------------------------------------------------------------------------------
| 1101094903 | 251      | 0.970    | 64        |      |
| 1101094911 | 252      | 0.820    | 64        |      |
| 1101094920 | 251      | 0.920    | 64        |      |
| 1101094928 | 251      | 0.930    | 64        |      |
| 1101094937 | 251      | 0.870    | 65        |      |
| 1101094946 | 251      | 0.970    | 65        |      |
| 1101094955 | 251      | 0.820    | 65        |      |
| 1101094964 | 253      | 1.220    | 65        | !    |
| 1101094973 | 255      | 1.810    | 65        | !    |
| 1101094982 | 258      | 2.470    | 65        | !    |
2. PQS Tracker Output
Last Modified: Mon Nov 21 21:01:03
Model Name: server_compromise1
Likelihood: 0.9182
Target: 10.0.0.24
Optimal Response: SIGKILL proc 6992
3. Response: SIGKILL
Experimental Results:
[Figure: System Memory Consumed, % system memory used (0 to 100) versus time (0 to 500 s), with no tracking and with tracking. Successful requests: 210,000 requests serviced with no tracking; 380,000 requests serviced with tracking.]
Theory
Process Query System frameworks offer a principled approach that enables:
• understanding how distinguishable models (attack and failure) are
• developing a notion of processes that are “trackable,” given models and sensing infrastructure (i.e. a “sampling theory”)
Hypothesis Growth
[Figure: hypotheses growing over time. An individual path is a “track”, i.e. one process instance; consistent tracks form a “hypothesis”.]
• A “hypothesis” is a consistent assignment of events to processes and/or states (i.e., each event assigned to only one process instance).
• Given a set of “hypotheses” for an event stream of length k-1, update the hypotheses to length k to explain the new event.
• NP-complete in general. Need to prune the pool of hypotheses, keeping the most suitable.
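A minimal sketch of the update-and-prune step, assuming the two-state model from the first slides (state 1 emits a, state 2 emits b, transitions 1→2, 2→1, 2→2). A track is a tuple of (time, state) pairs, a hypothesis is a set of tracks covering every event, and pruning keeps the `beam` hypotheses with the fewest tracks. That parsimony score is our assumption, not the PQS scoring rule, and `update` is a hypothetical name.

```python
from typing import Dict, FrozenSet, Set, Tuple

Track = Tuple[Tuple[int, int], ...]   # ((time, state), ...) for one process instance
Hypothesis = FrozenSet[Track]          # tracks that together explain every event

def update(pool: Set[Hypothesis], t: int, z: str,
           emits: Dict[int, Set[str]], succ: Dict[int, Set[int]],
           beam: int) -> Set[Hypothesis]:
    """Extend each hypothesis to explain event z at time t, then prune."""
    new: Set[Hypothesis] = set()
    for h in pool:
        for tr in h:
            # z continues an existing process instance.
            for s in succ[tr[-1][1]]:
                if z in emits[s]:
                    new.add((h - {tr}) | {tr + ((t, s),)})
        for s in emits:
            # z starts a new process instance (every state is a start state).
            if z in emits[s]:
                new.add(h | {((t, s),)})
    # Prune: keep the `beam` hypotheses with the fewest process instances.
    ranked = sorted(new, key=lambda h: (len(h), sorted(h)))
    return set(ranked[:beam])


EMITS = {1: {"a"}, 2: {"b"}}
SUCC = {1: {2}, 2: {1, 2}}

pool: Set[Hypothesis] = {frozenset()}
for t, z in enumerate("aabb"):
    pool = update(pool, t, z, EMITS, SUCC, beam=100)
print(len(pool), min(len(h) for h in pool))
```

For aabb this yields 10 hypotheses before pruning bites. The most parsimonious use two instances, e.g. the slide's A1 B1 A2 A2, which here is the set {((0, 1), (2, 2), (3, 2)), ((1, 1),)}.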
Models and Hypothesis Growth
“Weak” model: an FSM with “emission” vectors.
Emission for state i = 0/1 vector of sensor reports, e.g. obs(i) = (0, 1, 1, 0, 0, 1, 1)
Observation vector at time t collected by sensors, e.g. sensors(t) = (0, 1, 1, 1, 1, 1, 0)
Possible states at time t are determined by:
$$P = \{\, i \mid \text{Hamming\_distance}(\text{obs}(i), \text{sensors}(t)) \le HD \,\}$$
$$R = \{\, i \mid \exists j \text{ possible at time } t-1 \text{ such that } i \text{ is reachable from } j \,\}$$
$P \cap R$ is the set of possible states at time t.
The number of hypotheses at time t is computed recursively as above.
Theorem: For a fixed value of HD, the worst-case number of hypotheses at time t is either polynomial or exponential in t. (Crespi, Cybenko, Jiang 2004)
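The recursion can be sketched for a single weak model, treating each state path as one single-process hypothesis. P comes from a Hamming-distance test, R from the successors of the states possible at t-1, and the number of tracks ending in each state is propagated forward. The two toy models at the end are our own, chosen to show the two regimes of the theorem; they are not from the talk.

```python
from typing import Dict, List, Sequence, Set

def hamming(u: Sequence[int], v: Sequence[int]) -> int:
    """Number of positions where two 0/1 vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def count_tracks(obs: Dict[int, Sequence[int]], succ: Dict[int, Set[int]],
                 reports: List[Sequence[int]], hd: int) -> List[int]:
    """Number of state paths (single-process tracks) consistent with the
    sensor reports, after each time step."""
    counts: Dict[int, int] = {}   # state -> number of tracks ending there
    history: List[int] = []
    for t, sensors in enumerate(reports):
        # P: states whose emission vector is within HD of the report.
        P = {i for i in obs if hamming(obs[i], sensors) <= hd}
        new: Dict[int, int] = {}
        if t == 0:
            for i in P:
                new[i] = 1                      # any state may start a track
        else:
            for j, c in counts.items():         # j was possible at time t - 1
                for i in succ[j] & P:           # i is in R (reachable) and in P
                    new[i] = new.get(i, 0) + c
        counts = new
        history.append(sum(counts.values()))
    return history


OBS = {0: (1,), 1: (1,)}            # both states emit the same report
EXP_SUCC = {0: {0, 1}, 1: {0, 1}}   # 0->0->0 and 0->1->0: two return paths
POLY_SUCC = {0: {0, 1}, 1: {1}}     # only self-loops return to a state
REPORTS = [(1,)] * 6

print(hamming((0, 1, 1, 0, 0, 1, 1), (0, 1, 1, 1, 1, 1, 0)))
print(count_tracks(OBS, EXP_SUCC, REPORTS, hd=0))   # doubles each step
print(count_tracks(OBS, POLY_SUCC, REPORTS, hd=0))  # grows linearly
```

The slide's example vectors are at Hamming distance 3. In the exponential toy model, state 0 has two distinct two-step paths back to itself, which is the construct used in the proof sketch; in the polynomial one, every return path is a self-loop.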
[Figure: longer tracking time versus more noise (worse model); one region is labeled “Nice Demo!!” and another “Ouch!!!”.]
[Figure: longer tracking time versus more noise (worse model), with regions for excellent, acceptable and poor models and sensor coverage.]
Basic Idea Behind the Proof
[Figure: trellis of N states over times t, t+1, t+2, …, k.]
If there are never two distinct paths from any node to itself over any period of observation, there is a simple injective mapping (i.e. a unique labeling) of the paths into $\{0, 1, \ldots, k\}^{2N}$, the product of 2N copies of $\{0, 1, \ldots, k\}$. So the number of paths is at most $(k+1)^{2N}$. The label for each path records, for each of the N states, the time the path first occupies that state and the time it last occupies that state.
Basic Idea Behind the Proof
[Figure: trellis of N states over times t, t+1, t+2, …, k.]
Process dynamics (i.e. what is reachable from each state in a time step) + observations + noise threshold determine a “trellis”. If there are two distinct paths from one node to itself over some period of time, the number of distinct paths grows exponentially by repeating the construct: m repetitions already give at least $2^m$ distinct paths.
Relationship to Spectral Radius
• Classical spectral radius: $\rho(A) = |\lambda_{\max}|$
• Joint spectral radius of a set $\Sigma = \{A_1, \ldots, A_n\}$ of matrices:
$$\rho(\Sigma) = \lim_{t \to \infty} \max_{B_k \in \Sigma,\; 0 < k < t+1} \|B_1 B_2 \cdots B_t\|^{1/t}$$
• Hypothesis growth is polynomial iff $\rho(\Sigma) \le 1$.
• Deciding whether $\rho(\Sigma) \le 1$ for real or rational matrices is undecidable (Tsitsiklis and Blondel, 2000).
• If $\Sigma$ consists of 0-1 matrices, it is decidable but NP-hard.
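A brute-force upper bound on ρ(Σ) follows directly from the definition. For any submultiplicative norm, the maximum of ‖B_t⋯B_1‖^{1/t} over all length-t products bounds ρ(Σ) from above for every t and tends to it as t grows. The sketch below uses the max-row-sum norm and enumerates all |Σ|^t products, so it is only practical for tiny sets and short t. It is an illustration of the definition, not the method of the cited work, and the example matrices are our own.

```python
from itertools import product
from typing import List, Sequence

Matrix = List[List[int]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inf_norm(a: Matrix) -> int:
    """Max row sum: a submultiplicative matrix norm."""
    return max(sum(abs(x) for x in row) for row in a)

def jsr_upper_bound(mats: Sequence[Matrix], t: int) -> float:
    """max over all length-t products of ||B_t ... B_1||^(1/t); an upper
    bound on rho(Sigma) for every t that tends to rho(Sigma) as t grows."""
    best = 0.0
    for word in product(mats, repeat=t):
        p = word[0]
        for m in word[1:]:
            p = matmul(m, p)
        best = max(best, inf_norm(p) ** (1.0 / t))
    return best


FULL = [[1, 1], [1, 1]]    # spectral radius 2: paths double each step
SHEAR = [[1, 1], [0, 1]]   # spectral radius 1: path counts grow only linearly

print(jsr_upper_bound([FULL], 5))
print(jsr_upper_bound([SHEAR], 8))
```

For SHEAR the bound at t = 8 is 9^(1/8) ≈ 1.32, still above the true value 1, which shows how slowly this naive bound can converge.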
Distinguishability of models
Given two “models”, how distinguishable are they?
Example: How different are these two models?
Model 1: p(a|1) = 0.8, p(c|1) = 0.2; p(b|2) = 1; p(a|3) = 0.8, p(c|3) = 0.2
[Figure: states 1, 2, 3 with transition probabilities 1, 0.8, 0.5, 0.2 and 0.5.]
Model 2: p(a|1) = 0.9, p(d|1) = 0.1; p(b|2) = 1; p(a|3) = 0.8, p(c|3) = 0.2
[Figure: states 1, 2, 3 with transition probabilities 0.9, 0.8, 0.5, 0.2, 0.5 and 0.1.]
Distinguishability of models
The goal is to answer questions such as: “Do we need to build more refined models or do we need to add additional sensors/data sources or improve tracking/hypothesis management?”
[Figure: distance between means.]
Different degrees of distinguishability between models given their sensing capabilities: 1
[Figure legend: red, probability of deciding model 2 given model 1; blue, probability of deciding model 1 given model 2.]
The entropies of the two ergodic models are different.
The decision rule is maximum likelihood (ML), as determined by the Viterbi algorithm.
The Shannon-McMillan-Breiman ergodic theorem states that “most” observation sequences are “typical”, and a typical sequence of length n has probability close to $2^{-nH}$, where H is the entropy rate of the model.
Different degrees of distinguishability between models given their sensing capabilities: 2
However, nonmonotonic behavior is possible in general, and when the entropies are the same the error probabilities need not converge to zero.
Different degrees of distinguishability between models given their sensing capabilities: 3
However, nonmonotonic behavior is possible in general, and when the entropies are the same the error probabilities need not converge to zero.
Unifilar models
Definition: for any pair of state $s_i$ and input $y_j$, there can be at most one successor state.
• One state sequence, one observation sequence.
• One observation sequence, at most one state sequence:
  – if acceptable, there is 1 state sequence;
  – if unacceptable, there are 0 state sequences.
• A WM can be reduced to a DFA.
• Every DFA has a unique minimum-state unifilar WM: WM -> DFA -> Minimization -> WM
• For a unifilar WM, counting acceptable strings of length n, for n sufficiently large: $M_n \approx L\,\lambda_1^{n}$, where $\lambda_1$ is the maximum eigenvalue of A and L is a constant.
• Y. Sheng thesis: efficient estimates of $\lambda_1$.
Example: [Figure: states T1 {0}, T2 {1}, T3 {1}.]
$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}$$
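A sketch of these facts on the slide's example. Two assumptions: A[i][j] = 1 is read as “T_j can move to T_i” (under the row convention, T1 would have two successors that both emit 1, so the model would not be unifilar), and the emissions are taken from the state labels T1 {0}, T2 {1}, T3 {1}. The subset construction below is the WM -> DFA step; counting strings through it is our illustration, not Y. Sheng's estimation method.

```python
from typing import Dict, FrozenSet, List, Set

A = [[0, 1, 1],
     [1, 0, 0],
     [0, 1, 1]]
EMIT = {0: 0, 1: 1, 2: 1}  # T1 emits 0; T2 and T3 emit 1

# Assumed convention: A[i][j] == 1 means state j can move to state i.
SUCC: Dict[int, Set[int]] = {j: {i for i in range(3) if A[i][j]} for j in range(3)}

def is_unifilar(succ: Dict[int, Set[int]], emit: Dict[int, int]) -> bool:
    """At most one successor per (state, emitted symbol) pair."""
    return all(len({emit[t] for t in nxt}) == len(nxt) for nxt in succ.values())

def count_acceptable(succ: Dict[int, Set[int]], emit: Dict[int, int], n: int) -> int:
    """Count distinct strings of length n via the subset (WM -> DFA) construction."""
    symbols = set(emit.values())
    # DFA state = set of WM states consistent with the string read so far.
    counts: Dict[FrozenSet[int], int] = {}
    for sym in symbols:
        counts[frozenset(s for s in emit if emit[s] == sym)] = 1
    for _ in range(n - 1):
        new: Dict[FrozenSet[int], int] = {}
        for subset, c in counts.items():
            for sym in symbols:
                nxt = frozenset(t for s in subset for t in succ[s] if emit[t] == sym)
                if nxt:
                    new[nxt] = new.get(nxt, 0) + c
        counts = new
    return sum(counts.values())

def max_eigenvalue(a: List[List[int]], iters: int = 200) -> float:
    """Power iteration; converges here because A is irreducible and has a self-loop."""
    v = [1.0] * len(a)
    lam = 0.0
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
        lam = max(w)
        v = [x / lam for x in w]
    return lam


print(is_unifilar(SUCC, EMIT))
print([count_acceptable(SUCC, EMIT, n) for n in range(1, 9)])
print(max_eigenvalue(A))
```

Under this reading, the acceptable strings are exactly the binary strings with no two consecutive 0s, so the counts are Fibonacci numbers (2, 3, 5, 8, 13, ...), and their growth ratio approaches λ1 = (1 + √5)/2 ≈ 1.618.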
Summary
• Multiple process detection is a ubiquitous problem with many applications, but it has not been systematically studied.
• Existing approaches are either very ad hoc, very specialized or very unscalable.
• There is a promising generic software system for solving multiple process detection.
• The theory is rich and largely unexplored.