Effective Anomaly Detection with Scarce Training Data
Presenter: 葉倚任
Authors: W. Robertson, F. Maggi, C. Kruegel and G. Vigna
NDSS 2010
Outline

• Introduction
• Training Data Scarcity
• Exploiting Global Knowledge
• Evaluation
Properties of Anomaly Detection

• Pros
– Unknown attacks can be identified automatically, without any a priori knowledge about the application
– No need to manually analyze applications composed of hundreds of components
• Cons
– Tendency to produce a non-negligible number of false positives
– Critically relies on the quality and quantity of the training data used to construct the models
Motivation

• Web application component invocations are non-uniformly distributed
• For rarely invoked components, it is often impossible to gather enough training data to accurately model their normal behavior
• No existing proposal satisfactorily addresses this problem
Contributions

• Provide evidence that web application traffic is distributed in a non-uniform fashion
• Propose an approach that addresses the problem of undertraining by using global knowledge
• Evaluate the proposed approach on a large data set of real-world traffic from many web applications
Outline

• Introduction
• Training Data Scarcity
• Exploiting Global Knowledge
• Evaluation
Summary of Notation

• A: a set of web applications
• R: a set of resource paths, or components
• P: a set of parameters
• Q: a set of requests
• Each request is represented as a tuple of an application, a resource path, and the parameters it carries
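As an illustration (not taken from the paper), the request tuple above can be modeled with a minimal data structure; the field names are assumptions chosen for clarity:

```python
from typing import Dict, NamedTuple

class Request(NamedTuple):
    """A client request q in Q: the application it targets, the
    resource path invoked, and the observed parameter name/value pairs."""
    application: str            # an element of A
    resource_path: str          # an element of R
    parameters: Dict[str, str]  # parameter names drawn from P

q = Request("blog", "/comments/add", {"id": "42", "text": "hello"})
```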
Summary of Notation (cont'd)

• The set of models associated with each unique parameter instance is represented as a tuple, called a profile
• The knowledge base of an anomaly detection system trained on a web application is the set of such profiles
Multi-model Approach

• A profile for a given parameter is a tuple of models
– One model describes normal intervals for integers and string lengths
– One models character strings as a ranked frequency histogram, or Idealized Character Distribution (ICD)
– One models sets of character strings by inducing a Hidden Markov Model (HMM)
– One models parameter values as a set of legal tokens
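As a sketch of one of these models, an Idealized Character Distribution can be estimated by averaging ranked character-frequency histograms over the training samples. This is a simplified illustration, not the paper's implementation; the function names and the fixed bin count are assumptions:

```python
from collections import Counter

def char_distribution(s):
    """Relative character frequencies of s, sorted in decreasing
    order (a ranked frequency histogram)."""
    counts = sorted(Counter(s).values(), reverse=True)
    total = sum(counts)
    return [c / total for c in counts]

def train_icd(samples, bins=10):
    """Average the ranked histograms of all training samples into
    an Idealized Character Distribution of fixed length."""
    icd = [0.0] * bins
    for s in samples:
        dist = char_distribution(s)
        for i in range(min(bins, len(dist))):
            icd[i] += dist[i]
    return [v / len(samples) for v in icd]

icd = train_icd(["alice", "bob", "carol"])
```

At detection time, the ranked histogram of an observed value would be compared against the ICD (e.g., with a chi-square test) to flag anomalous character distributions.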
The Problem: Non-uniform training data

• In the case of low-traffic applications
– the rate of client requests is inadequate for models to train in a timely manner
• In the case of high-traffic applications
– a large subset of resource paths might still fail to receive enough requests
Non-uniform training data
Outline

• Introduction
• Training Data Scarcity
• Exploiting Global Knowledge
• Evaluation
Exploiting Global Knowledge

• Parameters of the same type tend to induce model compositions that are similar to each other
• The goal is to substitute well-trained profiles for undertrained profiles of similar parameters of the same type
• The proposed method is composed of three phases
– Enhanced training
– Building profile knowledge bases
– Mapping undertrained profiles to well-trained profiles
Phase I: Enhanced training

• Generate undertrained profiles
– Consider the sequence of client requests containing parameter p for application ai
– Randomly sample κ-length subsequences of requests, for a range of values of κ
• Each of the resulting profiles is added to a knowledge base of undertrained profiles
• Each model monitors its own stability during the training phase
• A well-trained, or stable, profile is stored in a separate knowledge base of well-trained profiles
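The κ-sequence sampling step can be sketched as follows. This is a simplified illustration under assumptions: contiguous subsequences are drawn for simplicity, and the profile-training step that would consume each sample is omitted:

```python
import random

def sample_kappa_sequences(requests, kappas, samples_per_kappa=3, seed=0):
    """For each undertraining level kappa, draw random contiguous
    kappa-length subsequences of the request stream; each sample
    would then be used to train one (deliberately undertrained) profile."""
    rng = random.Random(seed)
    sequences = []
    for kappa in kappas:
        for _ in range(samples_per_kappa):
            start = rng.randrange(len(requests) - kappa + 1)
            sequences.append(requests[start:start + kappa])
    return sequences

reqs = list(range(100))  # stand-in for a stream of parsed requests
seqs = sample_kappa_sequences(reqs, kappas=[1, 2, 4, 8])
```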
Phase II: Building profile knowledge bases

• Merge a set of knowledge bases into a global undertrained-profile knowledge base
• Profile clustering is performed in order to optimize query execution time
• The result is a set of profile clusters
• An agglomerative hierarchical clustering algorithm with group average linkage is applied
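Agglomerative clustering with group-average linkage can be sketched as below. This is a naive O(n³) illustration on plain numbers rather than profiles; a real implementation would use the profile distance measure and a library routine:

```python
def average_linkage_clustering(points, dist, k):
    """Naive agglomerative hierarchical clustering with group-average
    linkage: repeatedly merge the two clusters whose members have the
    smallest mean pairwise distance, until k clusters remain."""
    clusters = [[p] for p in points]

    def linkage(a, b):
        # Group-average linkage: mean distance over all cross pairs.
        return sum(dist(x, y) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

pts = [0.0, 0.1, 0.2, 5.0, 5.1]
cl = average_linkage_clustering(pts, lambda a, b: abs(a - b), k=2)
```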
Distance Measure

• The distance between two profiles ci and cj is defined in terms of the distances between their corresponding models, where each model type supplies its own distance function
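A minimal sketch of such a composite distance, assuming the per-model distances are simply averaged (the paper's exact combination may differ); the model names and the two toy distance functions are assumptions:

```python
def profile_distance(ci, cj, model_distances):
    """Distance between two profiles: the average of the per-model
    distances between corresponding models."""
    assert ci.keys() == cj.keys()
    return sum(model_distances[m](ci[m], cj[m]) for m in ci) / len(ci)

# Hypothetical per-model distance functions for two model types.
dists = {
    "length": lambda a, b: abs(a - b) / max(a, b, 1),
    "tokens": lambda a, b: 1 - len(a & b) / len(a | b),  # Jaccard distance
}
p1 = {"length": 8, "tokens": {"GET", "POST"}}
p2 = {"length": 4, "tokens": {"GET"}}
d = profile_distance(p1, p2, dists)
```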
Distance Functions
Phase III: Mapping undertrained profiles to well-trained profiles

• The mapping is implemented as follows
– A nearest-neighbor match is performed between the undertrained profile and the clusters of the undertrained knowledge base
– A nearest-neighbor match is then performed within the closest cluster to discover the undertrained profile at minimum distance
– The corresponding well-trained profile is substituted for the undertrained one
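The two-step lookup can be sketched as follows. Using each cluster's first member as its representative is an assumption made here for brevity (a centroid or medoid would be more faithful), and the data structures are illustrative:

```python
def map_to_well_trained(profile, clusters, well_trained, dist):
    """Two-step nearest-neighbor lookup: find the nearest cluster via a
    representative, then the nearest undertrained profile within it, and
    return the well-trained profile associated with that match."""
    # Step 1: nearest cluster, using the first member as representative.
    cluster = min(clusters, key=lambda cl: dist(profile, cl[0]))
    # Step 2: nearest undertrained profile inside the chosen cluster.
    nearest = min(cluster, key=lambda p: dist(profile, p))
    # Substitute the well-trained profile associated with that match.
    return well_trained[nearest]

clusters = [[1.0, 1.2], [9.0, 9.5]]
well_trained = {1.0: "wt-A", 1.2: "wt-B", 9.0: "wt-C", 9.5: "wt-D"}
result = map_to_well_trained(1.05, clusters, well_trained,
                             lambda a, b: abs(a - b))
```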
Mapping Quality

• For each undertrained cluster, consider the maximum number of its elements that map to the same cluster in the well-trained knowledge base C
• The robustness metric ρ is then defined in terms of the fraction of elements covered by this majority mapping
• A mapping is accepted only when ρ exceeds a minimum robustness threshold
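Under the reading above, the robustness computation can be sketched as follows; averaging the per-cluster majority fractions is an assumption made here, and the paper's exact aggregation may differ:

```python
def robustness(undertrained_clusters, mapping):
    """For each undertrained cluster, take the largest fraction of its
    members that map to the same well-trained cluster; average these
    fractions over all clusters to get the robustness metric rho."""
    fractions = []
    for cluster in undertrained_clusters:
        targets = [mapping[p] for p in cluster]
        best = max(targets.count(t) for t in set(targets))
        fractions.append(best / len(cluster))
    return sum(fractions) / len(fractions)

clusters = [["p1", "p2", "p3"], ["p4", "p5"]]
mapping = {"p1": "C1", "p2": "C1", "p3": "C2", "p4": "C3", "p5": "C3"}
rho = robustness(clusters, mapping)
```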
Outline

• Introduction
• Training Data Scarcity
• Exploiting Global Knowledge
• Evaluation
Experimental Setting

• HTTP connections were observed over a period of approximately three months
• A portion of the resulting flows was filtered using Snort to remove known attacks
• The data set contains 823 distinct web applications, 36,392 unique components, 16,671 unique parameters, and 58,734,624 HTTP requests
Profile clustering quality
Profile mapping robustness
Detection accuracy

• 100,000 attacks
Conclusion

• Identified that non-uniform web client access distributions cause model undertraining
• Proposed the use of global knowledge bases of well-trained profiles to remedy a local scarcity of training data