Date post: 18-Jan-2016
Advanced Regular Expression Matching for Line-Rate Deep Packet Inspection
Sailesh Kumar, Jon Turner, Michela Becchi, Patrick Crowley, George Varghese
Transcript
Page 1: Advanced Regular Expression Matching for Line-Rate Deep Packet Inspection

Sailesh Kumar, Jon Turner, Michela Becchi, Patrick Crowley, George Varghese

Page 2: Motivation

2 - Sailesh Kumar - 04/21/23 2 - Jon Turner - 04/21/23

Network security applications scan packet content to detect viruses, worms, etc.
  » typically use signatures common to suspicious packets
  » regular expressions provide a powerful, general way to describe signatures
So what’s new?
  » reg-ex matching has been well understood for >30 years
  » reg-exes in network applications are different
    – union of thousands of component patterns
    – state explosion from interacting “repeat patterns”
  » tight performance constraints
    – wire-speed processing at 10 Gb/s rates (and up)
    – limited memory space

Page 3: Regular Expression Refresher

Sample regular expressions
  » a.*b matches ab, aab, abb, accdb, ...
  » a(ab|c)+[^d] matches aabc, aca, acabb, ...

[Figure: NFA (nondeterministic finite automaton) for (a.*b)|(a(ab|c)+[^d]) and the equivalent DFA with its transition table over the inputs a, b, c, d; each DFA state corresponds to a subset of NFA states.]
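The sample matches above can be checked with a short script. This is only a sketch using Python's standard `re` module, which is a backtracking matcher rather than the NFA or DFA shown on the slide; the helper name `matches` and the negative example `acd` are illustrative additions, not part of the talk.

```python
import re

# Sample strings listed on the slide for each expression.
SAMPLES = {
    r"a.*b": ["ab", "aab", "abb", "accdb"],
    r"a(ab|c)+[^d]": ["aabc", "aca", "acabb"],
}

# Union of the two expressions, as drawn in the NFA/DFA figure.
COMBINED = r"(a.*b)|(a(ab|c)+[^d])"


def matches(pattern: str, text: str) -> bool:
    """True if the whole of `text` is matched by `pattern`."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    for pattern, texts in SAMPLES.items():
        for text in texts:
            print(pattern, text, matches(pattern, text), matches(COMBINED, text))
    # "acd" ends in d, so neither expression accepts it.
    print("acd", matches(COMBINED, "acd"))
```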

Page 4: Challenges for Intrusion Detection

Hundreds to thousands of patterns
  » many are fairly simple, but not all
  » a significant number include “repeats” with infinite or bounded iteration
Large space requirements
  » DFA formed by combining patterns may require many more states than NFA
  » for ASCII inputs, tabular representation of DFAs can be very large
Demanding real-time requirements
  » 1 or 2 off-chip memory accesses per input character
Must maintain state across many (>100K) flows
  » constrains affordable per-flow context
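A back-of-the-envelope calculation makes the space and time pressure concrete. The sketch below assumes one input character per byte, a 256-entry row per state, and 4-byte next-state pointers; the pointer width and the 10,000-state example are assumptions chosen for illustration, while the 10 Gb/s rate comes from the Motivation slide.

```python
def dfa_table_bytes(num_states: int, alphabet_size: int = 256,
                    pointer_bytes: int = 4) -> int:
    """Size of a fully tabular DFA: one next-state pointer per (state, symbol)."""
    return num_states * alphabet_size * pointer_bytes


def ns_per_character(line_rate_gbps: float) -> float:
    """Time budget per input byte, in nanoseconds, at the given line rate."""
    chars_per_second = line_rate_gbps * 1e9 / 8  # bits/s -> bytes/s
    return 1e9 / chars_per_second


if __name__ == "__main__":
    # Hypothetical 10,000-state DFA: roughly 10 MB of table.
    print(dfa_table_bytes(10_000))
    # At 10 Gb/s there is under a nanosecond per character.
    print(ns_per_character(10))
```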

Page 5: Three-Way Tradeoff

Memory space
  » on-chip vs. off-chip
  » pattern matching automata and flow state
Parallelism
  » hardware solutions allow substantial parallelism
  » in NPs (network processors), parallelism is more limited
  » more parallelism reduces automata space, increases flow state

[Figure: three-way tradeoff among throughput, space, and parallelism.]

Page 6: Problems Addressed

Reducing space used by DFAs
  » typical tabular DFA is highly redundant
    – states share many common successors
  » reduce redundancy using default transitions
    – saves space at the cost of throughput
Making it compact and fast
  » choose default transitions for amortized performance
  » use content-addressing to skip over default transitions
Coping with state space explosion
  » process flows that stay in shallow states separately from flows that “go deep”
    – fast-path/slow-path processing

Page 7: Delayed Input Finite Automata (D2FA)

In tabular DFA representation
  » for ASCII characters, 256 transitions per state
  » 50+ distinct transitions per state in real-world datasets
  » need storage for 50+ edges
But many states share similar sets of edges

[Figure: five-state DFA (states 1 to 5) for the three patterns a+, b+c, c*d+ over the symbols a, b, c, d, with 4 transitions per state.]

Note that states 1 and 3 have common transitions for symbols a, b, d. Can we exploit this redundancy to reduce space?

Page 8: Default Transitions

If δ(s1,a) = δ(s2,a) and δ(s1,b) = δ(s2,b), where δ is the transition function,
  » can replace explicit transitions (s1,a), (s1,b) with a default transition from s1 to s2 (or could go the other way)
  » when parsing input, follow the default transition when no outgoing transition is defined on the input character
  » no input is consumed when following a default transition

[Figure: the five-state example DFA, shown before and after explicit transitions are replaced by default transitions.]
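The lookup rule for default transitions can be sketched in a few lines. The three-state DFA below is a hypothetical example (not the five-state automaton in the figure), and the names `to_d2fa`, `run_d2fa` and `run_dfa` are illustrative. The sketch assumes default transitions form a forest whose roots keep all of their labeled transitions.

```python
# Hypothetical DFA over {a, b, c}: state -> {symbol: next_state}.
DFA = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"a": 1, "b": 2, "c": 0},
    2: {"a": 1, "b": 0, "c": 2},
}
# Assumed default transitions: states 1 and 2 both default to state 0.
DEFAULT = {1: 0, 2: 0}


def to_d2fa(dfa, default):
    """Drop every transition that agrees with the default state's transition."""
    labeled = {}
    for s, row in dfa.items():
        t = default.get(s)
        labeled[s] = {c: n for c, n in row.items()
                      if t is None or dfa[t][c] != n}
    return labeled


def run_dfa(dfa, start, text):
    state = start
    for ch in text:
        state = dfa[state][ch]
    return state


def run_d2fa(labeled, default, start, text):
    """Return (final_state, states_visited); default hops consume no input."""
    state, accesses = start, 0
    for ch in text:
        accesses += 1
        while ch not in labeled[state]:
            state = default[state]  # follow default, same input character
            accesses += 1
        state = labeled[state][ch]
    return state, accesses


LABELED = to_d2fa(DFA, DEFAULT)

if __name__ == "__main__":
    print(LABELED)                                # 5 labeled transitions instead of 9
    print(run_d2fa(LABELED, DEFAULT, 0, "aa"))    # second 'a' needs one default hop
```

The D2FA stores 5 labeled transitions plus 2 default transitions where the table stored 9, at the price of an extra state visit whenever a default transition is followed.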

Page 9: Selecting Default Transitions

[Figure: the example DFA with an alternate (and better) choice of default transitions. A space reduction graph joins pairs of states with edges weighted by the potential savings (weights of 2 or 3 in the example). A maximum-weight spanning tree of this graph selects the default transitions, with tree edges directed towards a chosen root.]
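The construction in the figure can be sketched as follows: weight each pair of states by the number of transitions they share, take a maximum-weight spanning forest with Kruskal's algorithm, and direct the tree edges towards a root. The three-state DFA and all function names are hypothetical; the talk does not specify an implementation, and union-find is simply one convenient way to detect cycles.

```python
from itertools import combinations

# Hypothetical DFA over {a, b, c}: state -> {symbol: next_state}.
DFA = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"a": 1, "b": 2, "c": 0},
    2: {"a": 1, "b": 0, "c": 2},
}


def shared(dfa, s, t):
    """Potential savings: symbols on which s and t have the same next state."""
    return sum(1 for c in dfa[s] if dfa[s][c] == dfa[t][c])


def space_reduction_graph(dfa):
    """Weighted edges (weight, s, t) for every pair with at least one saving."""
    edges = []
    for s, t in combinations(sorted(dfa), 2):
        w = shared(dfa, s, t)
        if w > 0:
            edges.append((w, s, t))
    return edges


def max_weight_spanning_forest(states, edges):
    """Kruskal's algorithm, taking the heaviest edges first."""
    parent = {s: s for s in states}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    chosen = []
    for w, s, t in sorted(edges, reverse=True):
        rs, rt = find(s), find(t)
        if rs != rt:  # joining two trees cannot create a cycle
            parent[rs] = rt
            chosen.append((w, s, t))
    return chosen


def direct_towards_root(chosen, root):
    """Orient tree edges so each non-root state defaults towards `root`."""
    adj = {}
    for _, s, t in chosen:
        adj.setdefault(s, []).append(t)
        adj.setdefault(t, []).append(s)
    default, seen, stack = {}, {root}, [root]
    while stack:
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                default[v] = u
                stack.append(v)
    return default


TREE = max_weight_spanning_forest(DFA, space_reduction_graph(DFA))
DEFAULTS = direct_towards_root(TREE, root=0)
SAVINGS = sum(w for w, _, _ in TREE)
```

In this small example the tree saves 4 of the 9 table entries, and both non-root states default to state 0.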

Page 10: Trading off Time and Space

Sort edges in space-reduction graph by weight (potential savings), largest first
For each edge, add it to the “forest” so long as it does not create a cycle or a tree with excessive diameter
Choose the root of each tree at the “most central node”
Direct default transitions towards roots

sorted edge list: {1,2} {4,5} {1,5} {2,4} {1,4} {2,5} {1,3} {3,5} {3,4} {2,3}

[Figure: space reduction graph for the example (edge weights 2 or 3) and the resulting D2FA for a diameter bound of 2.]
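The greedy procedure above can be sketched as below, taking the slide's sorted edge list as input. This is a minimal illustration, not the authors' code: diameter is measured in edges, recomputed from scratch with two breadth-first searches after each tentative insertion, and root selection is left out. The function names are hypothetical.

```python
from collections import deque


def bfs(adj, src):
    """Distances (in edges) from src to every node in its tree."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def tree_diameter(adj, node):
    """Longest path in the tree containing `node` (double BFS, valid for trees)."""
    first = bfs(adj, node)
    far = max(first, key=first.get)
    return max(bfs(adj, far).values())


def bounded_forest(nodes, sorted_edges, diameter_bound):
    """Add edges in the given order unless they close a cycle or exceed the bound."""
    adj = {n: set() for n in nodes}
    accepted = []
    for u, v in sorted_edges:
        if v in bfs(adj, u):
            continue  # u and v already connected: edge would create a cycle
        adj[u].add(v)
        adj[v].add(u)
        if tree_diameter(adj, u) > diameter_bound:
            adj[u].discard(v)  # undo: merged tree would be too deep
            adj[v].discard(u)
            continue
        accepted.append((u, v))
    return accepted


# Sorted edge list as given on the slide.
EDGES = [(1, 2), (4, 5), (1, 5), (2, 4), (1, 4),
         (2, 5), (1, 3), (3, 5), (3, 4), (2, 3)]

if __name__ == "__main__":
    print(bounded_forest(range(1, 6), EDGES, diameter_bound=2))
```

With a bound of 2 this keeps the edges {1,2}, {4,5} and {1,3}, giving two trees; with a loose bound it produces a single spanning tree of four edges.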

Page 11: Sample Results

Sample data set of 612 regular expressions
Original DFA has 11.3K states, 2.3M transitions
Transitions in D2FA
  » with no depth bound, 0.75% of original
  » with depth bound of 5, 1.07%
  » with depth bound of 2, 2.54%
  » with depth bound of 1, 20.70%
Depth bound of d implies at most d+1 memory accesses per input character
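To put the percentages in absolute terms, the short calculation below converts them to approximate transition counts. It treats the slide's rounded "2.3M" as exactly 2,300,000, so the outputs are estimates only; the function names are illustrative.

```python
ORIGINAL_TRANSITIONS = 2_300_000  # "2.3M" on the slide, assumed exact here

# (depth bound, percent of original transitions); None means no depth bound.
RESULTS = [(None, 0.75), (5, 1.07), (2, 2.54), (1, 20.70)]


def remaining_transitions(percent: float) -> int:
    """Approximate number of D2FA transitions for a given percentage."""
    return round(ORIGINAL_TRANSITIONS * percent / 100)


def worst_case_accesses(depth_bound: int) -> int:
    """Up to depth_bound default hops plus one labeled transition."""
    return depth_bound + 1


if __name__ == "__main__":
    for bound, percent in RESULTS:
        accesses = "unbounded" if bound is None else worst_case_accesses(bound)
        print(bound, remaining_transitions(percent), accesses)
```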

Page 12: Representing D2FA

95% of states have ≤2 outgoing transitions
Represent states with few transitions using a list
Represent others with a vector (for direct access)

[Figure: list and vector representations of a state.]
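One way to picture the two representations is sketched below. The threshold of 2 follows the slide's observation; the tuple encoding, the use of byte values as symbols, and the names `encode_state` and `lookup` are assumptions for illustration, not the layout used by the authors.

```python
ALPHABET_SIZE = 256   # one entry per byte value
LIST_LIMIT = 2        # states with at most this many transitions use a list


def encode_state(transitions: dict):
    """Pick a compact list for sparse states, a direct-access vector otherwise."""
    if len(transitions) <= LIST_LIMIT:
        return ("list", sorted(transitions.items()))
    vector = [None] * ALPHABET_SIZE
    for symbol, nxt in transitions.items():
        vector[symbol] = nxt
    return ("vector", vector)


def lookup(encoded, symbol: int):
    """Return the labeled next state, or None if the default must be followed."""
    kind, data = encoded
    if kind == "list":
        for sym, nxt in data:  # at most LIST_LIMIT comparisons
            if sym == symbol:
                return nxt
        return None
    return data[symbol]        # single indexed access


if __name__ == "__main__":
    sparse = encode_state({ord("a"): 1})
    dense = encode_state({ord("a"): 1, ord("b"): 2, ord("c"): 3})
    print(sparse[0], lookup(sparse, ord("a")), lookup(sparse, ord("b")))
    print(dense[0], lookup(dense, ord("c")), lookup(dense, ord("d")))
```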

Page 13: Changing Performance Criteria

Real objective is bounded time per packet
  » amortized complexity, not worst-case
  » earn a “credit” for every normal transition
  » “spend” a credit for each default transition
  » choose default transitions to guarantee never in debt
Simple way to ensure ≥0 credits
  » label states according to distance from start state
  » restrict default transitions to go from larger labels to smaller
  » bonus – simpler computation
    – perform breadth-first search
    – at each node, select best edge allowed for default transition
≤2 memory accesses per character

[Figure: example D2FA with states labeled by distance from the start state (labels 0, 1 and 2) and default transitions pointing to states with smaller labels.]
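The depth-based selection can be sketched as a breadth-first search followed by a local choice at each state. The DFA is a hypothetical three-state example and the function names are illustrative. The amortized bound follows because each consumed character raises the depth by at most 1 while each default transition lowers it by at least 1, so default hops never outnumber input characters.

```python
from collections import deque

# Hypothetical DFA over {a, b, c}: state -> {symbol: next_state}.
DFA = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"a": 1, "b": 2, "c": 0},
    2: {"a": 1, "b": 0, "c": 2},
}


def bfs_depths(dfa, start):
    """Label each reachable state with its distance from the start state."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for nxt in dfa[s].values():
            if nxt not in depth:
                depth[nxt] = depth[s] + 1
                queue.append(nxt)
    return depth


def shared(dfa, s, t):
    """Number of symbols on which s and t agree (potential savings)."""
    return sum(1 for c in dfa[s] if dfa[s][c] == dfa[t][c])


def choose_defaults(dfa, start):
    """For each state, default to the strictly shallower state with most savings."""
    depth = bfs_depths(dfa, start)
    default = {}
    for s in depth:
        candidates = [t for t in depth if depth[t] < depth[s]]
        if not candidates:
            continue  # the start state has nothing shallower
        best = max(candidates, key=lambda t: shared(dfa, s, t))
        if shared(dfa, s, best) > 0:
            default[s] = best
    return depth, default


DEPTH, DEFAULT = choose_defaults(DFA, start=0)
```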

Page 14: How Well Does It Work?

On a typical set of patterns
  » number of transitions reduced to 1% of original
  » depth-bounded D2FA with bound of 1 requires 20%
Can extend to reduce number of accesses
  » default transitions from depth d states to depth ≤d–k
  » at most (k+1)/k memory accesses per input character
    – so for k=3, 1.33 accesses per char
  » number of transitions, relative to original
    – for k=2, 1.8%
    – for k=3, 5.5%
    – for k=4, 11.6%
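The (k+1)/k bound follows from the same credit argument used for k=1. The derivation below is a sketch under three assumptions: matching starts at depth 0, a labeled transition raises the depth by at most 1, and each transition followed (labeled or default) costs one memory access. The symbols n and m are introduced here for illustration.

```latex
% n = input characters consumed, m = default transitions followed.
% Each default transition lowers the depth by at least k, and depth is never negative.
\begin{align*}
0 \;\le\; \text{depth after } n \text{ characters} \;&\le\; n - k\,m
  \quad\Longrightarrow\quad m \le \frac{n}{k},\\
\text{memory accesses} \;&=\; n + m \;\le\; n + \frac{n}{k} \;=\; \frac{k+1}{k}\,n .
\end{align*}
% For k = 3 this gives 4/3, about 1.33 accesses per character;
% for k = 1 it gives the bound of 2 accesses per character.
```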

Page 15: Content Addressing

For nodes with default transitions,
  » store selected “content” with predecessors
  » predecessors use content to skip over default transitions
Potential for collisions

[Figure: example with states R, U, V (transitions on a, b, c, d) and predecessors X, Y, Z holding the content f/R, g/R,ab and h/R,ab,cd. Annotations:
  » for content R,ab: if next input ∈ {a,b} goto R, else goto hash(R,ab)=U
  » for content R,ab,cd: if next input ∈ {a,b,c,d} goto R, else if next input ∈ {c,d} goto hash(R,ab)=U, else goto hash(R,abcd)=V]

Page 16: Collisions in Content Addressing

Addressing conflicts must be resolved
  » in example, X and Y must go to different next states U and V, but would normally both use hash(R,ab)
Solution 1: use hash(R,ba) to reach V
Solution 2: add discriminator bits to both hashes

[Figure: states R, U, V with transitions on a and b; X holds content g/R,ab and Y holds h/R,ab. Revised contents: h/R,ba (solution 1), and h/R,ab101 with g/R,ab011 (solution 2).]

Page 17: Selecting Content Addresses

For each state
  » list possible content addresses
  » compute hash for each
Construct bipartite graph
  » states at left
  » storage locations at right
  » edges from states to possible storage locations
Construct perfect matching
  » easy to do when enough choices (and usually, there are)
  » add discriminator bits to get more choices
  » or, add extra storage locations

[Figure: bipartite graph of states and storage locations for the example, with candidate content addresses ab0, ab1, ba0, ba1.]
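The matching step can be sketched with the standard augmenting-path algorithm for bipartite matching. The input maps each state to the storage locations its candidate content addresses hash to; the hash function itself is not specified in the talk, so the integer locations below are hypothetical, as is the function name `assign_locations`.

```python
def assign_locations(candidates):
    """Match every state to a distinct storage location, or return None.

    candidates: state -> list of storage locations it could occupy
    (one per candidate content address, after hashing).
    """
    owner = {}  # storage location -> state currently assigned to it

    def try_place(state, visited):
        for loc in candidates[state]:
            if loc in visited:
                continue
            visited.add(loc)
            # Take a free location, or evict the owner if it can move elsewhere.
            if loc not in owner or try_place(owner[loc], visited):
                owner[loc] = state
                return True
        return False

    for state in candidates:
        if not try_place(state, set()):
            return None  # no perfect matching: add discriminator bits or storage
    return {state: loc for loc, state in owner.items()}


if __name__ == "__main__":
    # U could live at location 3 or 5; V only hashes to 3, so U must move to 5.
    print(assign_locations({"U": [3, 5], "V": [3]}))
    # Both states hash only to location 3: a collision with no resolution.
    print(assign_locations({"U": [3], "V": [3]}))
```

When the function returns None, the slide's remedies apply: add discriminator bits to give each state more candidate addresses, or add storage locations.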

Page 18: Coping with State Explosion

Large pattern sets can produce DFAs with too many states
  » even after conversion to D2FA, space can be impractically large
  » one solution: partition patterns and form several DFAs or D2FAs
    – greatly reduces number of states
    – but requires processing each packet multiple times
Observation:
  » well-behaved flows rarely visit states far from start state
Fast-path/slow-path
  » fast path for “shallow states”
  » slow path handles suspect flows

[Figure: NFA for (ab.*c)|(ac.*b)|(ba.*a), and 1 of the 3 per-pattern DFAs (total 12 states). The resulting combined DFA has 20 states, but state count nearly doubles with each additional pattern.]

Page 19: Sample Fast Path Construction

Start with k DFAs for slow path
Construct vector-DFA that tracks states of smaller DFAs
  » cut off when past target depth
  » or, cut off based on probability of good flow reaching given state

[Figure: the three slow-path DFAs for ab.*c, ac.*b and ba.*a (states 0 to 3 each) and the fast path DFA built from them. Each fast-path state is listed with its state vector (the current state of each small DFA), its depth (0 to 3), and its transitions on a, b, c; a “-” entry marks a transition that is cut off.]
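The depth cut-off variant can be sketched as a breadth-first search over state vectors. The per-pattern DFAs below are hand-built assumptions (unanchored search, absorbing accepting state) and may differ in detail from the ones in the figure; `search_dfa` and `build_fast_path` are illustrative names. A `None` entry plays the role of the “-” entries in the figure.

```python
from collections import deque

ALPHABET = "abc"


def search_dfa(x, y, z):
    """Hypothetical 4-state DFA that finds x y .* z anywhere in the input (x != y)."""
    return {
        0: {c: 1 if c == x else 0 for c in ALPHABET},
        1: {c: 2 if c == y else 1 if c == x else 0 for c in ALPHABET},
        2: {c: 3 if c == z else 2 for c in ALPHABET},
        3: {c: 3 for c in ALPHABET},  # accepting state, kept absorbing
    }


def build_fast_path(dfas, max_depth):
    """BFS over vectors of component states, cutting off beyond max_depth."""
    start = tuple(0 for _ in dfas)
    depth = {start: 0}
    table = {}
    queue = deque([start])
    while queue:
        vec = queue.popleft()
        row = {}
        for c in ALPHABET:
            nxt = tuple(d[s][c] for d, s in zip(dfas, vec))
            if nxt not in depth:
                if depth[vec] + 1 > max_depth:
                    row[c] = None  # cut off: flow leaves the fast path here
                    continue
                depth[nxt] = depth[vec] + 1
                queue.append(nxt)
            row[c] = nxt
        table[vec] = row
    return table, depth


SLOW_PATH = [search_dfa("a", "b", "c"),   # ab.*c
             search_dfa("a", "c", "b"),   # ac.*b
             search_dfa("b", "a", "a")]   # ba.*a

TABLE, DEPTH = build_fast_path(SLOW_PATH, max_depth=1)
```

Because states are explored in breadth-first order, every state within the target depth is discovered before any cut-off decision that could affect it, so the recorded depths are shortest distances from the start vector.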

Page 20: Fast Path/Slow Path Operation

Flows processed by fast path as long as they stay in shallow states
Slow path flows processed by multiple DFAs
  » takes more time per packet
  » keeps more state between packets
Return to fast path after enough time in shallow states
Mitigating DoS attack
  » attacker can interfere with good flows in slow path by sending lots of slow path traffic
  » per-flow queues in slow path can help, but are not a complete solution
  » adjust priority of flows based on time spent in slow path

[Figure: fast path and slow path, each with its own state memory.]

Page 21: Simulation of DoS Mitigation

Constant attack traffic – adjust time spent in deep states

[Figure: three plots against time (seconds): slow path load with the slow path's threshold marked, throughput of good flows with no DoS mitigation, and throughput of good flows with DoS mitigation. The time axis spans three regimes: no overload, moderate overload, and extreme overload.]

Page 22: Summary

Reducing space needed for reg-ex matching
  » D2FAs use default transitions joining similar states
  » constraining default transitions to go to shallower states ensures good amortized performance
  » content addressing for skipping over default transitions
Coping with state explosion
  » slow path processes packets through k small DFAs
  » fast path processes packets using DFA on shallow states
  » requires DoS mitigation to deal with attacks on slow path
Other issues
  » bounded iteration causes excessive growth in state table
  » requires systematic use of counters
    – state vector containing control state plus counter values
    – state machine transitions depend on & manipulate counters

