1
Exact pattern matching on resource-limited network devices
Chien-Chung Su
2002/12/10
2
Outline
• Problem definition
• Resource-limited network devices
• Introduction of SEBMH
• Disadvantages of SEBMH
• Adaptive bucket management
• Conclusion
3
Problem definition
• Given
  – P : pattern(s)
  – T : text
• General action
  – Find all occurrences of P in T
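As a baseline for the problem statement, here is a minimal Python sketch of exact matching for a set of patterns. It is the naive approach, not SEBMH, and the function name `find_all` is hypothetical.

```python
def find_all(patterns, text):
    """Return sorted (position, pattern) pairs for every occurrence of each pattern in text."""
    hits = []
    for p in patterns:
        if not p:
            continue  # an empty pattern would match everywhere; skip it
        start = text.find(p)
        while start != -1:
            hits.append((start, p))
            start = text.find(p, start + 1)  # step by 1 so overlapping occurrences are kept
    return sorted(hits)


if __name__ == "__main__":
    print(find_all(["ab", "aba"], "ababa"))
```

Every pattern is scanned against the whole text, so the cost grows with the number of patterns; that growth is what the later slides try to contain.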
4
Research for exact pattern matching
• The exact matching problem is solved for typical word-processing applications.
• The story changes radically for other specific applications.
  – DNA and protein search
  – Relation between search performance and database size
  – Network intrusion detection
5
Resource-limited network devices
• Special issues
  – Security issues
    • Check whether P occurs in T
  – Resource-limited
    • Try to break the tradeoff between speed and space
• Characteristics
  – Network-related pattern matching
    • Patterns change occasionally
    • Texts change frequently
  – Solutions
    • Dynamic hash function
    • Adaptive bucket management
6
SEBMH
Global Shift Table
Hash-Link-List Structure of ASCII Patterns
Hash-Link-List Structure of non-ASCII Patterns
Input Mask
7
Set-Exclusive table
[Figure: Text (e, el), Hash Table (e), Global Shift Table (Sg), Set-Exclusive Table (el, Sl), Link-List L, patterns sp, sp1 and ep; annotation "Hash matching failed". Legend: "The shortest pattern of Link-List L"; "The shortest pattern of all".]
8
Disadvantages of SEBMH
• Because the hash function is static, performance still depends on the pattern set.
  – Dynamic hash function
• In the general pattern matching problem, the global shift values approach 1 as the number of patterns grows.
  – Classify the patterns to ease the influence
9
How to improve
• Pattern classifier
• Approximate perfect hash function
• Adaptive bucket management
10
Approximate hash function (1)
• Step 1. Sort the class target patterns by KEY
• Step 2. Distribute the class target patterns equally into the buckets
    n = 0;                                  /* bucket currently being filled */
    while (pattern is not the last one) {
        for (i = 0; i < AVG_P; i++) {
            1. dispatch pattern into bucket(n);
            2. get the next pattern;
        }
        n++;
    }
• Step 3. Handle the exception condition
    for (i = 1; i < BUCKET_NUM; i++) {
        if (patterns with key in bucket(i-1) equal to patterns with key in bucket(i))
            1. group these patterns into bucket(i-1) or bucket(i)
    }
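The three steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the name `distribute`, the two-character key, and the choice to keep an equal-key group in the earlier bucket are all hypothetical.

```python
from itertools import groupby


def distribute(patterns, bucket_num, key=lambda p: p[:2]):
    """Sort patterns by key and spread them over bucket_num buckets without splitting a key."""
    ordered = sorted(patterns, key=key)                 # Step 1: sort by KEY
    avg = -(-len(ordered) // bucket_num)                # ceiling division; plays the role of AVG_P
    buckets = [[] for _ in range(bucket_num)]
    n = 0
    for _, group in groupby(ordered, key=key):
        # Step 2: move to the next bucket once the current one holds its share.
        if len(buckets[n]) >= avg and n < bucket_num - 1:
            n += 1
        # Step 3: a whole key group always lands in one bucket, so a bucket may exceed AVG_P.
        buckets[n].extend(group)
    return buckets


if __name__ == "__main__":
    print(distribute(["GET", "GEN", "POST", "PUT", "HEAD", "HELO"], 3))
```

Because the patterns are sorted, each bucket covers a contiguous key range, so a lookup only needs the bucket boundaries rather than a table entry per key.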
11
Approximate hash function (2)
12
Adaptive bucket management
• Assumption
  – Resource is limited
  – Total bucket number is fixed
• Step 1 : classify the patterns
  – For example (feature is a factor)
    • Class A
    • Class B
    • Class C
13
Adaptive bucket management
• Step 2 : allocate buckets
  – For example
    • Traffic distribution
      – Class A : 50%
      – Class B : 30%
      – Class C : 20%
    • Policy
      – SEBMH(Class A) could get more buckets at this time
      – Set-Exclusive table will be more effective
        » bucket ↑, patterns per bucket ↓, efficacy of set-exclusive table ↑
        » bucket ↓, set-exclusive utilization ↑
14
How to allocate buckets
• Communism
• Fair
• Greedy
15
Basic assumption
• Assumption
  – Φ : matching time for one pattern
  – B : total number of buckets
  – P : total number of patterns
  – C : number of classes
  – Bi : number of buckets for class i
  – Pi : number of patterns for class i
  – Di : traffic distribution of class i
• Known
  – P1 + P2 + … + Pc = P
  – D1 + D2 + … + Dc = 1
• Problem
  – Find a sequence (B1, B2, …, Bc) such that
    • B1 + B2 + … + Bc = B
    • $\frac{P_1}{B_1} D_1 + \frac{P_2}{B_2} D_2 + \dots + \frac{P_c}{B_c} D_c$ is small enough
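A candidate sequence can be checked against the constraints on this slide and scored with the traffic-weighted sum of patterns per bucket. The sketch below is illustrative only: the function name `check_and_score` and the sample numbers are hypothetical, and the score is expressed in units of Φ.

```python
def check_and_score(P, D, B_seq, total_buckets):
    """Validate (B1..Bc) against the slide's constraints, then return sum(Di * Pi / Bi)."""
    if not (len(P) == len(D) == len(B_seq)):
        raise ValueError("P, D and B_seq need one entry per class")
    if abs(sum(D) - 1.0) > 1e-9:
        raise ValueError("traffic distribution must sum to 1")
    if sum(B_seq) != total_buckets:
        raise ValueError("bucket counts must sum to B")
    if any(b < 1 for b in B_seq):
        raise ValueError("every class needs at least one bucket")  # assumption of this sketch
    return sum(d * p / b for p, d, b in zip(P, D, B_seq))


if __name__ == "__main__":
    # Two hypothetical classes with 10 patterns each and equal traffic, 4 buckets in total.
    print(check_and_score([10, 10], [0.5, 0.5], [2, 2], 4))
```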
16
Communism Method
ABM is not applied
• Without ABM
  – No classifier is needed
  – Average matching time : $AMT_{CM} = \frac{P}{B}$
  – Other overheads
    • Overhead of approximate perfect hashing
    • Efficacy of the Global-Shift table is not obvious
    • Efficacy of the Set-Exclusive table is not obvious
17
Fair Method
At least one solution
• For example
  – Traffic distribution
    • Class A : 50%
    • Class B : 30%
    • Class C : 20%
• With ABM in Fair Method
  – Average matching time : $AMT_{FM} = \sum_{i=1}^{C} D_i \frac{P_i}{B D_i} = \frac{P}{B}$
  – Example: $50\% \cdot \frac{P_A}{50\% \cdot B} + 30\% \cdot \frac{P_B}{30\% \cdot B} + 20\% \cdot \frac{P_C}{20\% \cdot B} = \frac{P}{B}$
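The fair allocation gives each class a share of the buckets proportional to its traffic. Below is a minimal Python sketch under stated assumptions: the names `fair_allocation` and `amt` are hypothetical, the total of 10 buckets is an assumption, the pattern counts are borrowed from the Greedy Method example, and exact fractions are used so that the equality with P/B can be checked without rounding error.

```python
from fractions import Fraction


def fair_allocation(total_buckets, traffic):
    """Return Bi = B * Di for each class; assumes every product is a whole number."""
    alloc = [total_buckets * d for d in traffic]
    if any(b.denominator != 1 for b in alloc):
        raise ValueError("B * Di must be an integer for every class")
    return [int(b) for b in alloc]


def amt(patterns, traffic, buckets):
    """Traffic-weighted patterns per bucket, in units of the per-pattern matching time."""
    return sum(d * Fraction(p, b) for p, d, b in zip(patterns, traffic, buckets))


D = [Fraction(50, 100), Fraction(30, 100), Fraction(20, 100)]  # traffic distribution from the slide
P = [5, 5, 20]   # pattern counts, taken from the Greedy Method example
B = 10           # assumed total number of buckets
Bi = fair_allocation(B, D)

if __name__ == "__main__":
    print(Bi, amt(P, D, Bi), Fraction(sum(P), B))
```

With these numbers the fair allocation is (5, 3, 2) and the weighted sum equals P/B, so the fair method matches the no-ABM baseline rather than improving on it.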
18
Greedy Method
We can find better solutions
• For example
    Class      Traffic distribution   Pattern distribution
    Class A    50%                    5
    Class B    30%                    5
    Class C    20%                    20
• With ABM in Greedy Method
  – Average matching time : $AMT_{GM} = \min\big(\big[\, x \mid x = \sum_{i=1}^{C} D_i \frac{P_i}{B_i},\ \text{any sequence } B \,\big]\big)$
  – Example : $50\% \cdot \frac{5}{4} + 30\% \cdot \frac{5}{3} + 20\% \cdot \frac{20}{3} \approx 2.46 < 3 = \frac{5 + 5 + 20}{4 + 3 + 3}$
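For a small number of classes the minimum over all sequences can be found by exhaustive search. The sketch below is illustrative only: the names `amt` and `best_sequence` are hypothetical, each class is assumed to need at least one bucket, and the total of 10 buckets is an assumption.

```python
from itertools import product


def amt(P, D, B_seq):
    """Sum of Di * Pi / Bi over all classes."""
    return sum(d * p / b for p, d, b in zip(P, D, B_seq))


def best_sequence(P, D, B):
    """Try every (B1..Bc) with Bi >= 1 and sum B; return (lowest cost, sequence)."""
    best = None
    for head in product(range(1, B), repeat=len(P) - 1):
        last = B - sum(head)          # the last class takes whatever is left
        if last < 1:
            continue
        seq = head + (last,)
        cost = amt(P, D, seq)
        if best is None or cost < best[0]:
            best = (cost, seq)
    return best


if __name__ == "__main__":
    print(best_sequence([5, 5, 20], [0.5, 0.3, 0.2], 10))
```

For the example values on this slide and 10 buckets, the search returns the sequence (3, 3, 4), which scores slightly lower than (4, 3, 3). The search space grows exponentially with the number of classes, which is presumably why a cheaper heuristic is wanted.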
19
20021112_ Experiment report
20
Objective
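• Observe how the optimal solutions are distributed
• Hope to derive an algorithm for finding the solution from these observations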
21
Traffic dist. is directly proportional to pattern dist.
[Plots: Bucket = 10 and Bucket = 30]
22
Traffic dist. is inversely proportional to pattern dist.
[Plots: Bucket = 10 and Bucket = 30]
23
Conclusion
• Effective only when the pattern and traffic distributions are inversely proportional; this can serve as a reference for training the classifier
24
Greedy Algorithm (temp)
• Step 1 : get the Bi from the fair method
• Step 2 : borrow 1 bucket from each class
  – bonus_bucket = # of classes
• Step 3 : dispatch the bonus buckets
  – Bonusi = floor (bonus_bucket * (Pi / P))
• Step 4 : dispatch the remaining buckets
  – Add a bucket to each class in turn and keep the best solution, one bucket at a time
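The four steps can be sketched in Python as follows. This is one possible reading of the slide, not a definitive implementation: the names `amt` and `greedy_temp` are hypothetical, Step 1 assumes the rounded fair shares sum to B and give every class at least one bucket, and Step 4 is interpreted as handing out one leftover bucket at a time to whichever class lowers the weighted sum most.

```python
def amt(P, D, B_seq):
    """Sum of Di * Pi / Bi; a class left without a bucket counts as infinitely slow."""
    if any(b == 0 for b in B_seq):
        return float("inf")
    return sum(d * p / b for p, d, b in zip(P, D, B_seq))


def greedy_temp(P, D, B):
    total_p = sum(P)
    # Step 1: fair method, Bi = B * Di (rounded to whole buckets).
    buckets = [round(B * d) for d in D]
    if sum(buckets) != B or min(buckets) < 1:
        raise ValueError("fair shares must be whole, positive and sum to B")
    # Step 2: borrow one bucket from each class.
    buckets = [b - 1 for b in buckets]
    bonus_bucket = len(P)
    # Step 3: Bonus_i = floor(bonus_bucket * Pi / P), done in integer arithmetic.
    shares = [bonus_bucket * p // total_p for p in P]
    buckets = [b + s for b, s in zip(buckets, shares)]
    # Step 4: place each remaining bucket where it helps most;
    # classes with zero buckets are served first.
    for _ in range(bonus_bucket - sum(shares)):
        trials = [buckets[:i] + [buckets[i] + 1] + buckets[i + 1:]
                  for i in range(len(buckets))]
        buckets = min(trials, key=lambda t: (t.count(0), amt(P, D, t)))
    return buckets


if __name__ == "__main__":
    print(greedy_temp([5, 5, 20], [0.5, 0.3, 0.2], 10))
```

With pattern counts (5, 5, 20), traffic (50%, 30%, 20%) and an assumed 10 buckets, the steps go (5, 3, 2), then (4, 2, 1), then (4, 2, 3), and the last bucket goes to the third class, giving (4, 2, 4).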
25
How to classify patterns (1)
• The goals the classifier should achieve
  – High priority
    • Reduce how often ABM is performed
  – Low priority
    • Enhance the efficacy of ABM
26
How to classify patterns (2)
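• Reduce how often ABM is performed
  – When ABM should not be performed for specific classes
    • …….(1)
    • …….(2)
[Equations (1) and (2): garbled in the source. (1) involves $|D_i - D_i'|$ and N, for each c in C; (2) involves a sum over i = 1 … N of squared terms in $D_i$ and $D_i'$, for each c in C.]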
27
How to classify patterns (3)
• Expected effect of and– ↑
•
– ↓•
– ↑•
– ↓•
28
How to classify patterns (4)
• Enhance the efficacy of ABM
  – Try to arrange the classes so that
    • Pi is increasing
    • Di is decreasing
29
How to classify patterns (5)
• Operators
  – Combination
    • Directly combine two classes in the same domain
  – Sibling aggregation
    • Combine two classes in different domains
[Figure: classification tree. Root: patterns; second level: TCP, UDP, Other; leaves: HTTP, FTP, …., TFTP, ICMP]
• Objective
  – Make the tree a stable-traffic tree
• Constraint
  – A large number of patterns sharing the same prefix within one class should form an independent class
30
How to classify patterns (6)
• Mathematical model for training the classifier
  – Merge two classes when
    • Conditions of means hold
    • Conditions of variances hold
  – The symbols have the same meanings as before
  – k (>= 1) is a coefficient that balances
    • Resource [k ↑]
    • Performance [k ↓]
31
How to classify patterns (7)
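• Conditions of means
  • $M = mean(D_x)$ and $N = mean(D_y)$
  • $\frac{\sum_{x=1}^{S} |D_x - M|}{S} \le k$ and $\frac{\sum_{y=1}^{S} |D_y - N|}{S} \le k$
  • [Third condition: garbled in the source. It is a sum over i = 1 … S involving $D_{x_i}$, $D_{y_i}$, M and N, divided by S and compared with k.]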
32
How to classify patterns (8)
• Conditions of variances
  • $P = Var(D_x)$ and $Q = Var(D_y)$
  • $\frac{M P}{M + N} + \frac{N Q}{M + N} \le k$
  • $P \le k$ and $Q \le k$
33
Classifier
• Advantages
  – Reduce the impact of the complex approximate perfect hash function
  – Eliminate pattern matching that is not required
34
Classifier behavior
Input packet
→ Does it belong to any class?
  NO → bypass
  YES → dispatch the input packet to the corresponding handler
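The flow on this slide can be sketched as a small dispatcher. Everything named here is hypothetical and for illustration only: the packet layout (a dict with `proto` and `payload`), the `classify` function, the per-class handlers and the sample patterns.

```python
def make_handler(patterns):
    """Build a handler that reports which of the class's patterns occur in the payload."""
    return lambda packet: [p for p in patterns if p in packet["payload"]]


def dispatch(packet, classify, handlers):
    """Return the handler's result, or None when the packet is bypassed."""
    cls = classify(packet)
    if cls is None or cls not in handlers:
        return None                      # NO: bypass, no pattern matching is performed
    return handlers[cls](packet)         # YES: hand over to the handler of that class


# Hypothetical setup: only HTTP traffic has patterns to match.
handlers = {"HTTP": make_handler(["/etc/passwd", "cmd.exe"])}
classify = lambda packet: packet.get("proto")

if __name__ == "__main__":
    print(dispatch({"proto": "HTTP", "payload": "GET /etc/passwd"}, classify, handlers))
    print(dispatch({"proto": "ICMP", "payload": "ping"}, classify, handlers))
```

Packets that belong to no class skip matching entirely, which is the second advantage listed for the classifier.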
35
Next Experiments
36
Conclusion