Exact pattern matching on resource-limited network devices

Post on 01-Jan-2016


1

Exact pattern matching on resource-limited network devices

Chien-Chung Su

2002/12/10

2

Outline

• Problem definition

• Resource-limited network devices

• Introduction of SEBMH

• Disadvantages of SEBMH

• Adaptive bucket management

• Conclusion

3

Problem definition

• Given
  – P: pattern(s)
  – T: text

• General action
  – Find all occurrences of P in T
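As a baseline for this problem definition, here is a minimal brute-force sketch that reports every occurrence of every pattern in P within T. The function name `find_all` is hypothetical. Its worst-case cost is O(|T| · Σ|p|), which is the work that shift-table methods such as SEBMH try to avoid.

```python
def find_all(patterns, text):
    """Return (position, pattern) for every occurrence of every pattern in text."""
    hits = []
    for i in range(len(text)):           # try every alignment in T
        for p in patterns:               # ...against every pattern in P
            if text.startswith(p, i):    # compare p with text[i:i + len(p)]
                hits.append((i, p))
    return hits


print(find_all(["he", "she", "his", "hers"], "ushers"))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```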

4

Research for exact pattern matching

• The exact matching problem is considered solved for typical word-processing applications.

• The story changes radically for other, more specific applications:
  – DNA and protein search
  – Relation between search performance and database size
  – Network intrusion detection

5

Resource-limited network devices

• Special issues
  – Security issues
    • Check whether P occurs in T
  – Resource-limited
    • Try to break the trade-off between speed and space

• Characteristics
  – Network-related pattern matching
    • Patterns change occasionally
    • Texts change constantly
  – Solutions
    • Dynamic hash function
    • Adaptive bucket management

6

SEBMH

[Figure: SEBMH data structures: Global Shift Table; hash-link-list structure of ASCII patterns; hash-link-list structure of non-ASCII patterns; Input Mask]

7

Set-Exclusive table

[Figure: SEBMH search with the Set-Exclusive table. Labels: Text (e, e_l, …); Global Shift Table (e, S_g); Hash Table; Link-List L; Set-Exclusive Table (e_l, S_l); "Hash matching failed"; sp, the shortest pattern of Link-List L; sp1; the shortest pattern of all; e_p]
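The slides do not spell out how SEBMH builds its tables. As a hedged illustration of the general idea behind a global shift table plus hashed link lists, here is a set-wise Horspool search, a standard technique that is not necessarily SEBMH's exact rule set. The window length m is the shortest pattern length, the global shift of a character is the smallest safe Horspool shift over all patterns, and the character at the window end selects a bucket of candidate patterns to verify. Hashing on a single character, and the names `build_tables` and `search`, are assumptions of this sketch; the Set-Exclusive table and Input Mask are omitted.

```python
from collections import defaultdict


def build_tables(patterns):
    """Global shift table and per-character link lists (hypothetical helper)."""
    m = min(len(p) for p in patterns)          # window length = shortest pattern
    shift = defaultdict(lambda: m)             # characters in no pattern: shift by m
    buckets = defaultdict(list)                # window-end character -> candidate patterns
    for p in patterns:
        for j in range(m - 1):                 # Horspool: skip the last window position
            shift[p[j]] = min(shift[p[j]], m - 1 - j)
        buckets[p[m - 1]].append(p)
    return m, shift, buckets


def search(patterns, text):
    """Report (position, pattern) for every occurrence, verifying only bucket candidates."""
    m, shift, buckets = build_tables(patterns)
    hits = []
    i = 0                                      # start of the current window
    while i + m <= len(text):
        c = text[i + m - 1]                    # character at the window end
        for p in buckets.get(c, ()):
            if text.startswith(p, i):          # full check: patterns may be longer than m
                hits.append((i, p))
        i += shift[c]                          # shift is always >= 1
    return hits


print(search(["he", "she", "his", "hers"], "ushers"))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Only the patterns hashed to the window-end character are compared, and the shift never skips a possible occurrence because it is the minimum over all patterns.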

8

Disadvantages of SEBMH

• Because the hash function is static, performance still depends on the pattern set.
  – Dynamic hash function

• In the general pattern matching problem, global shift values approach 1 as more and more patterns are added, since each character's global shift is the minimum over all patterns and every added pattern can only lower it.
  – Classify the patterns to reduce this effect
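To see the second disadvantage concretely, the sketch below builds a Horspool-style global shift table (assumed to be computed from the shortest pattern length, as in set-wise Horspool variants) for growing sets of random 8-byte patterns and prints the average shift. The patterns are synthetic, not the authors' data, and `global_shift_table` and `mean_shift` are hypothetical names.

```python
import random


def global_shift_table(patterns, alphabet_size=256):
    """Horspool-style global shift table over byte patterns (assumed construction)."""
    m = min(len(p) for p in patterns)             # shortest pattern length
    shift = [m] * alphabet_size                    # unseen bytes skip a whole window
    for p in patterns:
        for j in range(m - 1):                     # last window position excluded
            shift[p[j]] = min(shift[p[j]], m - 1 - j)
    return shift


def mean_shift(patterns):
    table = global_shift_table(patterns)
    return sum(table) / len(table)


rng = random.Random(0)                             # synthetic random 8-byte patterns
pats = [bytes(rng.randrange(256) for _ in range(8)) for _ in range(2000)]
for k in (1, 10, 100, 1000, 2000):
    print(k, round(mean_shift(pats[:k]), 2))
```

With a single pattern the average shift is close to 8; with a couple of thousand patterns nearly every byte value ends up with a shift of 1, so the table no longer skips much text.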

9

How to improve

• Pattern classifier

• Approximate perfect hash function

• Adaptive bucket management

10

Approximate hash function (1)

• Step 1. Sort the target patterns of the class by KEY.

• Step 2. Distribute the sorted patterns evenly across the buckets (AVG_P patterns per bucket):

    n = 0;                                    /* index of the current bucket */
    while (patterns remain) {
        for (i = 0; i < AVG_P && patterns remain; i++) {
            dispatch the current pattern into bucket(n);
            advance to the next pattern;
        }
        n++;
    }

• Step 3. Handle the exception: patterns with the same KEY must not be split across two buckets.

    for (i = 1; i < BUCKET_NUM; i++) {
        if (bucket(i-1) and bucket(i) both contain patterns with the same KEY)
            group these patterns into bucket(i-1) or bucket(i);
    }
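Steps 1 to 3 amount to a range partition of the sorted key space. Below is a runnable sketch under stated assumptions: AVG_P = ⌈P / BUCKET_NUM⌉, and Step 3 resolves a shared KEY by moving those patterns back into bucket(i−1). `build_buckets` and `bucket_of` are hypothetical names, and using the first letter as KEY is purely illustrative. Because each bucket then covers a contiguous key range, one plausible lookup is a binary search over the buckets' lowest keys.

```python
import bisect
import math


def build_buckets(patterns, key, bucket_num):
    """Steps 1-3: sort by KEY, fill buckets with AVG_P patterns, keep equal keys together."""
    pats = sorted(patterns, key=key)                           # Step 1
    avg_p = max(1, math.ceil(len(pats) / bucket_num))          # AVG_P (assumed ceil(P / BUCKET_NUM))
    buckets = []
    for start in range(0, len(pats), avg_p):                   # Step 2: AVG_P per bucket
        chunk = pats[start:start + avg_p]
        # Step 3: a KEY shared with the previous bucket is moved into that bucket
        while buckets and chunk and key(chunk[0]) == key(buckets[-1][-1]):
            buckets[-1].append(chunk.pop(0))
        if chunk:
            buckets.append(chunk)
    return buckets


def bucket_of(buckets, key, k):
    """Approximate hash: binary-search the lowest KEY of each bucket for key value k."""
    lows = [key(b[0]) for b in buckets]
    return max(0, bisect.bisect_right(lows, k) - 1)


def first_letter(s):                                           # illustrative KEY
    return s[0]


fruits = ["date", "apple", "cherry", "banana", "avocado", "durian", "blueberry", "cranberry"]
buckets = build_buckets(fruits, first_letter, 3)
print(buckets)
# [['apple', 'avocado', 'banana', 'blueberry'], ['cherry', 'cranberry'], ['date', 'durian']]
print(bucket_of(buckets, first_letter, "c"))
# 1
```

Step 3 makes bucket sizes uneven (4, 2, 2 here), which is the price of guaranteeing that every KEY maps to exactly one bucket.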

11

Approximate hash function (2)

12

Adaptive bucket management

• Assumptions
  – Resources are limited
  – The total number of buckets is fixed

• Step 1: classify the patterns
  – For example (using some feature as the criterion):
    • Class A
    • Class B
    • Class C

13

Adaptive bucket management

• Step 2: allocate buckets
  – For example:
    • Traffic distribution
      – Class A: 50%
      – Class B: 30%
      – Class C: 20%
    • Policy
      – SEBMH for Class A could be given more buckets at this time
      – The Set-Exclusive table becomes more effective
        » buckets ↑ → patterns per bucket ↓ → efficacy of the Set-Exclusive table ↑
        » buckets ↓ → Set-Exclusive table utilization ↑

14

How to allocate buckets

• Communism

• Fair

• Greedy

15

Basic assumption

• Assumption
  – Φ: matching time for one pattern
  – B: total number of buckets
  – P: total number of patterns
  – C: number of classes
  – $B_i$: number of buckets for class i
  – $P_i$: number of patterns for class i
  – $D_i$: traffic distribution (share) of class i

• Known
  – $P_1 + P_2 + \cdots + P_C = P$
  – $D_1 + D_2 + \cdots + D_C = 1$

• Problem
  – Find a sequence $(B_1, B_2, \ldots, B_C)$ such that
    • $B_1 + B_2 + \cdots + B_C = B$
    • $\frac{P_1}{B_1} D_1 + \frac{P_2}{B_2} D_2 + \cdots + \frac{P_C}{B_C} D_C$ is small enough

16

Communism Method: ABM is not applied

• Without ABM
  – No classifier is needed
  – Average matching time: $AMT_{CM} = \frac{P}{B}$
  – Other overheads
    • Overhead of approximate perfect hashing
    • Efficacy of the Global-Shift table is not obvious
    • Efficacy of the Set-Exclusive table is not obvious

17

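Fair Method: at least one solution

• For example
  – Traffic distribution
    • Class A: 50%
    • Class B: 30%
    • Class C: 20%

• With ABM in Fair Method ($B_i = D_i B$)
  – Average matching time: $AMT_{FM} = \sum_{i=1}^{C} D_i \frac{P_i}{D_i B} = \frac{P}{B}$
  – Example: $AMT_{FM} = 50\% \cdot \frac{P_A}{50\% \cdot B} + 30\% \cdot \frac{P_B}{30\% \cdot B} + 20\% \cdot \frac{P_C}{20\% \cdot B} = \frac{P}{B}$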

18

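Greedy Method: we can find better solutions

• For example
  – Traffic distribution and pattern distribution
    • Class A: 50% of traffic, 5 patterns
    • Class B: 30% of traffic, 5 patterns
    • Class C: 20% of traffic, 20 patterns

• With ABM in Greedy Method
  – Average matching time: $AMT_{GM} = \min\left\{ \sum_{i=1}^{C} D_i \frac{P_i}{x_i} \;\middle|\; x = (x_1, \ldots, x_C),\ x_1 + \cdots + x_C = B \right\}$
  – Example ($B = 10$, $x = (4, 3, 3)$): $50\% \cdot \frac{5}{4} + 30\% \cdot \frac{5}{3} + 20\% \cdot \frac{20}{3} = \frac{59}{24} \approx 2.46$, compared with $\frac{P}{B} = \frac{30}{10} = 3$ for the Fair Method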
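The $AMT_{GM}$ formula can be evaluated directly by trying every sequence. Below is a brute-force sketch in Python with exact fractions, using the example's traffic and pattern distributions and B = 10. `compositions`, `amt` and `amt_gm` are hypothetical names. The number of sequences is C(B−1, C−1), so this only suits small B and C.

```python
from fractions import Fraction
from itertools import combinations


def compositions(B, C):
    """Every (x_1, ..., x_C) of positive integers with x_1 + ... + x_C = B."""
    for cuts in combinations(range(1, B), C - 1):   # stars and bars
        bounds = (0,) + cuts + (B,)
        yield tuple(bounds[k + 1] - bounds[k] for k in range(C))


def amt(D, P, x):
    """Average matching time: sum of D_i * P_i / x_i (in units of one pattern match)."""
    return sum(d * p / b for d, p, b in zip(D, P, x))


def amt_gm(D, P, B):
    """AMT_GM by brute force: smallest AMT over all bucket sequences, with its sequence."""
    return min((amt(D, P, x), x) for x in compositions(B, len(D)))


D = (Fraction(1, 2), Fraction(3, 10), Fraction(1, 5))   # traffic: A 50%, B 30%, C 20%
P = (5, 5, 20)                                           # patterns per class
print(amt(D, P, (5, 3, 2)))    # fair method, B_i = D_i * B
# 3
print(amt(D, P, (4, 3, 3)))
# 59/24
print(amt_gm(D, P, 10))
# (Fraction(7, 3), (3, 3, 4))
```

For this example the exhaustive search reports (3, 3, 4) with AMT = 7/3 ≈ 2.33, below both the Fair Method's 3 and the 59/24 ≈ 2.46 of (4, 3, 3).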

19

2002/11/12 Experiment report

20

Objective

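• Observe how the optimal solutions are distributed

• Hope to derive an algorithm for finding the solution from these observations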

21

Traffic dist. proportional to pattern dist.

[Figure panels: Bucket = 10, Bucket = 30]

22

Traffic dist. inversely proportional to pattern dist.

[Figure panels: Bucket = 10, Bucket = 30]

23

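Conclusion

• ABM is effective only when the pattern and traffic distributions are inversely proportional; this result can serve as a reference for training the classifier.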
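The observation can be explained with a short derivation, assuming bucket counts may be real numbers (a continuous relaxation the slides do not use). In the Basic assumption notation, with $\sum_i D_i = 1$, $\sum_i P_i = P$ and $\sum_i B_i = B$, apply the Cauchy-Schwarz inequality twice:

```latex
\Bigl(\sum_{i=1}^{C}\sqrt{D_i P_i}\Bigr)^2
  = \Bigl(\sum_{i=1}^{C}\sqrt{\tfrac{D_i P_i}{B_i}}\,\sqrt{B_i}\Bigr)^2
  \le \Bigl(\sum_{i=1}^{C}\frac{D_i P_i}{B_i}\Bigr) B
\;\Rightarrow\;
AMT^{*} = \frac{1}{B}\Bigl(\sum_{i=1}^{C}\sqrt{D_i P_i}\Bigr)^2
  \quad (\text{attained at } B_i \propto \sqrt{D_i P_i})

AMT^{*} \le \frac{1}{B}\Bigl(\sum_{i=1}^{C} D_i\Bigr)\Bigl(\sum_{i=1}^{C} P_i\Bigr)
  = \frac{P}{B} = AMT_{FM}
  \quad (\text{equality iff } P_i / D_i \text{ is the same for every class})
```

So when the pattern distribution is proportional to the traffic distribution, even the best (relaxed) allocation is just the Fair Method. The potential gain grows as the two distributions diverge, which is consistent with the experiment. For the Greedy Method example, $AMT^{*} = (\sqrt{2.5} + \sqrt{1.5} + 2)^2 / 10 \approx 2.31$, versus $P/B = 3$.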

24

Greedy Algorithm (temp)

• Step 1: get the Bi from the fair method

• Step 2: borrow 1 bucket from each class
  – bonus_bucket = number of classes

• Step 3: dispatch the bonus buckets
  – Bonus_i = floor(bonus_bucket × (Pi / P))

• Step 4: dispatch the remaining buckets
  – Try adding a bucket to each class and keep the best solution, one bucket at a time
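The temporary greedy algorithm can be sketched as follows. Several details are assumptions of this sketch, not of the slides. Step 1 floors $D_i B$ and adds the rounding leftovers to the pool, and Step 2 skips classes that already have zero buckets. "Find the best solution one by one" in Step 4 is read as giving each remaining bucket to the class whose term $D_i P_i / B_i$ drops the most. `gain` and `greedy_allocate` are hypothetical names.

```python
import math
from fractions import Fraction


def gain(d, p, b):
    """Drop in AMT when a class with b buckets receives one more (b = 0 is infeasible)."""
    return math.inf if b == 0 else d * p / (b * (b + 1))


def greedy_allocate(D, P, B):
    C, P_total = len(D), sum(P)
    # Step 1: fair method, B_i = D_i * B, floored; rounding leftovers join the pool (assumption)
    Bs = [math.floor(d * B) for d in D]
    pool = B - sum(Bs)
    # Step 2: borrow one bucket from each class that has one
    for i in range(C):
        if Bs[i] > 0:
            Bs[i] -= 1
            pool += 1
    # Step 3: Bonus_i = floor(bonus_bucket * P_i / P)
    bonus_bucket = pool
    for i in range(C):
        bonus = bonus_bucket * P[i] // P_total
        Bs[i] += bonus
        pool -= bonus
    # Step 4: give each remaining bucket to the class whose AMT term drops the most
    for _ in range(pool):
        best = max(range(C), key=lambda i: gain(D[i], P[i], Bs[i]))
        Bs[best] += 1
    return Bs


D = (Fraction(1, 2), Fraction(3, 10), Fraction(1, 5))   # traffic: A 50%, B 30%, C 20%
print(greedy_allocate(D, (5, 5, 20), 10))
# [4, 2, 4]
```

On the Greedy Method example this yields (4, 2, 4) with AMT = 19/8 = 2.375. That beats the Fair Method's 3, but it is not guaranteed to be optimal; an exhaustive search over all sequences finds 7/3 for the same inputs.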

25

How to classify patterns (1)

• Goals the classifier should achieve
  – High priority
    • Reduce how often ABM is performed
  – Low priority
    • Enhance the efficacy of ABM

26

How to classify patterns (2)

• Reduce how often ABM is performed
  – ABM should not be performed for specific classes when the following hold for each class in C:
    • ……. (1)
    • ……. (2)

[Formulas (1) and (2): per-class conditions in terms of N, D_i and D'_i; not recoverable from the garbled source]

27

How to classify patterns (3)

• Expected effect of … and …
  – ↑
  – ↓
  – ↑
  – ↓

28

How to classify patterns (4)

• Enhance the efficacy of ABM
  – Try to arrange the classes so that
    • Pi is increasing
    • Di is decreasing

29

How to classify patterns (5)

• Operators
  – Combination
    • Directly combine two classes in the same domain
  – Sibling aggregation
    • Combine two classes in different domains

[Figure: class tree. patterns → TCP, UDP, Other; leaf classes HTTP, FTP, …, TFTP, ICMP]

• Objective
  – Make the class tree match a tree of stable traffic

• Constraint
  – A large group of patterns with the same prefix within one class should become an independent class

30

How to classify patterns (6)

• Mathematical model for training the classifier
  – Merge two classes when
    • the conditions of means hold
    • the conditions of variances hold
  – The remaining symbols have the same meanings as before
  – k (≥ 1) is a coefficient that balances
    • Resource [k ↑]
    • Performance [k ↓]

31

How to classify patterns (7)

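• Conditions of means
  – $M = \mathrm{mean}(D_x)$ and $N = \mathrm{mean}(D_y)$
  – [Three inequalities in terms of D_x, D_y, M, N, k and S (summing over i = 1, …, S); not recoverable from the garbled source]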

32

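How to classify patterns (8)

• Conditions of variances
  – $P = \mathrm{Var}(D_x)$ and $Q = \mathrm{Var}(D_y)$
  – $\frac{M}{M+N} P + \frac{N}{M+N} Q \le kP$ and $\frac{M}{M+N} P + \frac{N}{M+N} Q \le kQ$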

33

Classifier

• Advantages
  – Reduces the impact of the complex approximate perfect hash function
  – Eliminates pattern matching that is not required

34

Classifier behavior

Input packet → does it belong to any class?
  – NO → bypass
  – YES → dispatch the input packet to the corresponding handler
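The classifier behavior can be sketched as a first-match rule table in front of per-class handlers. The packet representation (a dict with `proto` and `dport`), the first-match policy, and the names `classify` and `handle` are all hypothetical. A real handler would run the SEBMH instance for that class.

```python
from typing import Callable, Dict, List, Optional, Tuple

Packet = dict                                # hypothetical packet record
Rule = Tuple[Callable[[Packet], bool], str]  # (predicate, class name)


def classify(pkt: Packet, rules: List[Rule]) -> Optional[str]:
    """Return the first class whose predicate matches, or None (assumed first-match policy)."""
    for matches, name in rules:
        if matches(pkt):
            return name
    return None


def handle(pkt: Packet, rules: List[Rule], handlers: Dict[str, Callable[[Packet], str]]) -> str:
    cls = classify(pkt, rules)
    if cls is None:
        return "bypass"                      # no class: skip pattern matching entirely
    return handlers[cls](pkt)                # e.g. the matcher built for that class


rules = [
    (lambda p: p["proto"] == "TCP" and p["dport"] == 80, "HTTP"),
    (lambda p: p["proto"] == "UDP" and p["dport"] == 69, "TFTP"),
]
handlers = {name: (lambda p, n=name: "matched by " + n + " handler") for _, name in rules}
print(handle({"proto": "TCP", "dport": 80}, rules, handlers))   # matched by HTTP handler
print(handle({"proto": "TCP", "dport": 22}, rules, handlers))   # bypass
```

Packets that fall into no class never reach a matcher, which is the "eliminate the pattern matching not required" advantage listed for the classifier.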

35

Next Experiments

36

Conclusion