Department of Electrical and Computer Engineering September 14, 2016 Picking Pesky Parameters: Optimizing Regular Expression Matching in Practice
Outline

§ Introduction to regular expressions
§ Design space exploration
§ Results
§ Optimal Regular Expression Matching Configuration
§ Conclusion


What is regular expression matching

§ A regular expression (abbreviated regex) is a pattern that describes a set of strings to match.
• E.g., this regex matches a valid IPv4 address:
• (([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])

§ Applications of regular expression matching:
• Bibliographic search
• Intrusion detection systems
• Protocol identification
• Content filtering

§ Many network security tools, such as Snort and Bro, use rule sets of regular expressions that match attacks.

§ These tools need to operate at link rates from several to tens of gigabits per second to meet the performance requirements of the network.
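
The IPv4 pattern from this slide can be tried directly. The following is a minimal sketch using Python's standard `re` module; the helper name `is_ipv4` is hypothetical, and whole-string matching with `fullmatch` is an assumption, since the slide does not say whether the pattern is anchored.

```python
import re

# The pattern from the slide, split across two raw strings for readability.
IPV4 = re.compile(
    r"(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}"
    r"([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])"
)

def is_ipv4(text: str) -> bool:
    """Return True if the whole string is a dotted-decimal IPv4 address."""
    return IPV4.fullmatch(text) is not None

for candidate in ("192.168.0.1", "256.1.1.1", "01.2.3.4"):
    print(candidate, is_ipv4(candidate))
```

Each octet alternative covers one range (0-9, 10-99, 100-199, 200-249, 250-255), which is why values above 255 and octets with leading zeros are rejected.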


How to implement a regex lookup engine?

1. Transform the rule set into a state machine (finite automaton).
2. Packet payloads are scanned by traversing the state machine.

§ The automaton can be non-deterministic (NFA) or deterministic (DFA).
§ Example: NFA and DFA of .*ab+[cd]e

[Figure: NFA and DFA for .*ab+[cd]e, each drawn with states 0 to 4 and transitions labelled a, b, [cd], and e; the accepting state is marked.]
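
To make the NFA/DFA distinction concrete, here is a small sketch that scans input with the NFA for .*ab+[cd]e and then derives a DFA by subset construction. The state numbering 0 to 4 follows the figure; the transition details, the function names, and the reduced alphabet (where `x` stands in for any other character) are assumptions made for illustration, and `.` is treated as matching any character.

```python
ACCEPT = 4  # accepting state of the NFA for .*ab+[cd]e

def nfa_step(state, ch):
    """Return the set of NFA states reachable from `state` on character `ch`."""
    nxt = set()
    if state == 0:
        nxt.add(0)                      # '.*' self-loop keeps state 0 active
        if ch == "a":
            nxt.add(1)
    elif state == 1 and ch == "b":
        nxt.add(2)
    elif state == 2:
        if ch == "b":
            nxt.add(2)                  # 'b+' repeats
        if ch in ("c", "d"):
            nxt.add(3)
    elif state == 3 and ch == "e":
        nxt.add(4)
    return nxt

def nfa_scan(payload):
    """Traverse the NFA; report whether a match occurred and the peak active-set size."""
    active, matched, max_active = {0}, False, 1
    for ch in payload:
        active = set().union(*(nfa_step(s, ch) for s in active))
        matched = matched or ACCEPT in active
        max_active = max(max_active, len(active))
    return matched, max_active

def build_dfa(alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start = frozenset({0})
    table, todo = {}, [start]
    while todo:
        cur = todo.pop()
        if cur in table:
            continue
        table[cur] = {}
        for ch in alphabet:
            nxt = frozenset(set().union(*(nfa_step(s, ch) for s in cur)))
            table[cur][ch] = nxt
            todo.append(nxt)
    return start, table

start, dfa = build_dfa("abcdex")
print(nfa_scan("xxabbce"), len(dfa))
```

The NFA may have several states active per input character, while the DFA built from it always has exactly one, at the cost of a potentially larger table.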


What is the problem?

§ Too many algorithms have been proposed for tuning regex matching.
§ There are too many different system implementations for regex matching:
• Different hardware;
• Different types of processors;
• Different memory configurations.

§ The performance metrics used in previous publications differ:
• reduce memory requirements;
• improve the average and worst-case throughput;
• reduce power and energy consumption.

§ As a result, it is very difficult to determine which technique or system implementation to use.


What does our work do?

§ Our work addresses the problem of choosing which regular expression technique to use for a given system, rule set, and traffic configuration.

§ We present a systematic evaluation of many widely used regular expression techniques using real-world rule sets.

§ We evaluate the throughput, memory size, energy consumption, and estimated chip area of each configuration.

§ We provide a method for choosing the right configuration based on the results from our experiments.


Two types of solutions

§ Memory based solution

§ Logic based solution

[Figure: block diagrams of the two solution types. Labels: Automaton, Caches, Processing units, Memory, Bus, Input / Match.]


Design space

[Figure: design space diagram; its labels are listed below.]

§ Inputs
• Regular expression ruleset (partitioned ruleset)
• Traffic traces
§ Automaton
• NFA, 2-NFA, DFA, 2-DFA, A-DFA, 2-A-DFA
§ Implementation
• Memory-based (9 configurations): non-compressed layout, linear encoding, bitmapped encoding
• Logic-based (4 configurations): stride-1, SW-based multi-stride, HW-based multi-stride
§ System
• Cache size, memory bandwidth, number of cores
• FPGA clock rate
§ Evaluation
• Synthesis tool, processor simulator, real processor
§ Result
• Throughput speed, memory & area cost, power consumption


Automaton domain

§ NFA (Nondeterministic Finite Automaton)
• Generated from the regex ruleset.
• The number of states is small, but multiple states can be active at the same time.
§ DFA (Deterministic Finite Automaton)
• Generated from the NFA.
• Only one state is active at a time, which gives stable performance.
• Size can grow exponentially if some complex patterns exist (called state explosion).
• Large rulesets need to be partitioned into several parts, generating multiple DFAs.
• A-DFA: a compression technique that lets a DFA state store fewer than 256 transitions. It should be used with a compressed memory layout.
§ Multi-stride NFA/DFA (or k-NFA/k-DFA)
• Processes k input characters at a time.
• If the initial alphabet is Σ, a k-NFA/k-DFA is equivalent to an FA defined on the alphabet Σ^k.
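
The equivalence to an automaton over Σ^k can be sketched by composing a stride-1 transition table k times. The code below is an illustrative assumption, not the construction used in this work: the function name `make_k_stride` and the two-state toy DFA are hypothetical, and a real k-DFA must also handle matches that complete in the middle of a stride and inputs whose length is not a multiple of k.

```python
from itertools import product

def make_k_stride(delta, alphabet, k):
    """Compose a stride-1 DFA table into a table keyed by k-character strings."""
    table = {}
    for state in delta:
        table[state] = {}
        for chars in product(alphabet, repeat=k):
            cur = state
            for ch in chars:            # follow k stride-1 transitions
                cur = delta[cur][ch]
            table[state]["".join(chars)] = cur
    return table

# Toy stride-1 DFA over {a, b}: state 1 means "the last character was 'a'".
delta = {0: {"a": 1, "b": 0}, 1: {"a": 1, "b": 0}}
delta2 = make_k_stride(delta, "ab", 2)
print(delta2[0])
```

Each state now needs |Σ|^k entries instead of |Σ|, which is the memory cost that multi-stride automata trade for consuming k characters per lookup.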


Implementation domain -- Memory based solution

§ Three memory layouts:
1. Non-compressed layout
• Uses all |Σ| transitions in a state.
2. Linear encoding
• Only encodes the existing transitions in an NFA, or the default transition and other transitions in an A-DFA.
• Linear search is performed until a transition matching the input character is found or its absence is verified.
3. Bitmapped encoding
• Similar to linear encoding, but uses a bitmap to avoid linear search.
• Applies only to stride-1 DFAs.

§ 9 configurations in total
• Non-compressed: NFA, DFA, 2-NFA, 2-DFA
• Linear encoding: NFA, A-DFA, 2-NFA, 2-A-DFA
• Bitmapped encoding: A-DFA

[Figure: three per-state memory layouts built from 32-bit words. DFA non-compressed layout: 256 words, one transition (Tx) for each input byte 0x00 to 0xFF. DFA linear encoding: a default Tx plus the stored transitions, with a Tx address map holding the address of state 0 to state n. DFA bitmapped encoding: a level-1 bitmap (1 word), a level-2 bitmap (8 words), a default Tx, and the stored transitions.]
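
The three lookups can be compared with a short sketch. This is a simplified illustration under stated assumptions: function names are hypothetical, states are plain Python lists rather than packed 32-bit words, and a single 256-bit integer stands in for the two-level bitmap shown in the figure.

```python
def lookup_noncompressed(row, byte):
    """Non-compressed layout: one entry per input byte, direct indexing."""
    return row[byte]

def lookup_linear(default, pairs, byte):
    """Linear encoding: scan the stored (byte, target) pairs, else take the default."""
    for b, target in pairs:
        if b == byte:
            return target
    return default

def lookup_bitmap(default, bitmap, targets, byte):
    """Bitmapped encoding: a set bit marks a stored transition; a popcount gives its index."""
    if not (bitmap >> byte) & 1:
        return default
    index = bin(bitmap & ((1 << byte) - 1)).count("1")
    return targets[index]

def encode(row, default):
    """Derive both compressed forms from a full 256-entry row."""
    pairs = [(b, t) for b, t in enumerate(row) if t != default]
    bitmap = sum(1 << b for b, _ in pairs)
    targets = [t for _, t in pairs]
    return pairs, bitmap, targets

# Toy state: everything goes to state 0 except 'a' -> 1 and 'b' -> 2.
row = [0] * 256
row[ord("a")] = 1
row[ord("b")] = 2
pairs, bitmap, targets = encode(row, default=0)
print(len(row), len(pairs), lookup_bitmap(0, bitmap, targets, ord("b")))
```

The non-compressed row costs 256 entries per state regardless of content, while the two compressed forms store only the non-default transitions; the bitmap replaces the linear scan with a bit test and a population count.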


Implementation domain -- Logic based solution

§ Logic based solutions only use NFA
• Stride-1 implementation
• Software-based multi-stride approach
  • First generate a k-NFA, then encode it in logic.
  • Resource costly; can only support stride-2.
• Hardware-based multi-stride approach
  • Uses a stride-1 NFA and the corresponding alphabet translation table.
  • Resource efficient; can support up to stride-4.

§ 4 configurations in total
• Stride-1 implementation
• Software-based: 2-NFA
• Hardware-based: 2-NFA, 4-NFA


System domain

§ Memory based solution
• Different cache sizes for level-1 and level-2 cache
• Memory bandwidth
• Different number of cores

§ Logic based solution
• Different FPGA clock rates


Evaluation Methodology

§ Real hardware
• TI OMAP 4460 ARM processor
• Xilinx Virtex 5 FPGA (XC5VLX50)
• Speed, memory usage/slice usage, and power are measured.

§ Simulator
• SimpleScalar simulator, calibrated with real hardware.
• Used to study the parameters that cannot be changed on real hardware:
  • Cache size
  • Memory bandwidth

§ Inputs
• We use both real rulesets (from Snort, L7-filter, and Bro) and some synthetic rulesets with different characteristics.
• Traffic traces are generated by the traffic generator written by Becchi et al.


Results from real hardware – Memory based solutions

§ TI OMAP 4460 ARM processor
§ Rulesets with very high mNFA and very low mDFA should use DFA, and rulesets with very high mDFA and very low mNFA should use NFA.
• mNFA: the average number of active states in the NFA
• mDFA: the number of DFAs

[Figure: speed (Mbps), memory (MB), and power (mW) for each ruleset (snort, l7-filter, bro, exact-match, dotstar 0.1, dotstar 0.2, dotstar 0.3, dotstar 0.6). Series: NFA NC, NFA LE, 2-NFA NC, 2-NFA LE, DFA NC, A-DFA LE, A-DFA BM, 2-DFA NC, 2-A-DFA LE, where NC = non-compressed, LE = linear encoding, BM = bitmapped encoding.]

Ruleset       #reg-ex   Length (min)   Length (max)   Length (avg)   mDFA   mNFA
snort         462       10             202            44.1           12     2.76
l7-filter     111       6              438            63.2           7      6.02
bro           782       5              211            34.8           8      20.34
exact-match   500       10             256            49.2           2      1.76
dotstar 0.1   500       10             243            49.6           11     8.42
dotstar 0.2   500       11             212            49.0           24     15.64
dotstar 0.3   500       11             251            47.1           33     12.76
dotstar 0.6   500       11             274            50.3           49     26.76


Results from real hardware – Logic based solutions

§ Xilinx Virtex 5 (XC5VLX50)
§ $speed = clk_{freq} \times stride \times 8\ \text{bits}$
§ A smaller circuit can operate at a higher frequency.
§ The hardware-based stride-4 implementation leads to the best results.

[Figure: speed (Mbps), slice usage, and power (mW) for each ruleset (snort, l7-filter, bro, exactmatch, dotstar0.1, dotstar0.2, dotstar0.3, dotstar0.6) with stride-1, SW stride-2, HW stride-2, and HW stride-4; three entries are marked "missing".]
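
The throughput formula on this slide is simple enough to evaluate directly. The sketch below assumes the clock is given in MHz so that the result is in Mbps; the function name and the 100 MHz example clock are hypothetical and are not measurements from this work.

```python
def speed_mbps(clk_mhz: float, stride: int) -> float:
    """Throughput in Mbps: clock (MHz) x characters per cycle x 8 bits per character."""
    return clk_mhz * stride * 8

# Hypothetical 100 MHz clock, compared across the strides evaluated in this work.
for stride in (1, 2, 4):
    print(f"stride-{stride}: {speed_mbps(100.0, stride):.0f} Mbps")
```

In practice the achievable clock is not the same for every stride, since larger circuits tend to close timing at lower frequencies, so the gain from a higher stride is less than the ideal factor.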


Results from real hardware – Logic based solutions

§ Different frequency: power vs. speed trade-off.
§ $P = P_{static} + P_{dynamic} = P_{static} + \alpha C V^{2} clk_{freq}$
§ Choose the highest achievable $clk_{freq}$ to get the highest speed/power ratio.

[Figure: power (mW, 400 to 1000) versus speed (Mbps, 0 to 4000) for stride-1, SW stride-2, HW stride-2, and HW stride-4.]
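
The recommendation to run at the highest achievable clock follows from combining the power model with the throughput formula, speed = clk_freq × stride × 8 bits. The short derivation below is a sketch that assumes the supply voltage V, the activity factor α, and the capacitance C stay fixed as the clock changes; f is shorthand for clk_freq and s for the stride, both introduced here for brevity.

```latex
\begin{align*}
\frac{speed}{P}
  &= \frac{8 s f}{P_{static} + \alpha C V^{2} f}, \\
\frac{d}{df}\left(\frac{speed}{P}\right)
  &= \frac{8 s \, P_{static}}{\left(P_{static} + \alpha C V^{2} f\right)^{2}} > 0
  \quad \text{when } P_{static} > 0 .
\end{align*}
```

Because the derivative is positive whenever static power is nonzero, the speed/power ratio keeps increasing with f under these assumptions: a faster clock amortizes the fixed static power over more bits.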


Results from Processor Simulation – Cache

§ SimpleScalar simulator
§ We select the best cache size based on speed/area.

[Figure: speed/area (Mbps/mm²) as a function of L1 data cache size (1 to 512 KB) and L2 data cache size (32 to 16384 KB).]

Best cache size for different configurations, selected by maximum speed/area:

Configuration    L1 size (KB)   L2 size (KB)
NFA              16             64
NFA linear       16             32
2-NFA            64             1024
2-NFA linear     64             512
DFA              64             128
D2FA linear      64             64
D2FA bitmap      32             64
2-DFA            128            4096
2-D2FA linear    128            4096


Results from Processor Simulation – Memory bandwidth

§ Most cache miss rates are below 1%.
§ Low memory bandwidth utilization
§ High parallelism is possible.

Configuration    Utilization of bw_mem (%)   Max threads supported
NFA              0.25                        81
NFA linear       0.17                        120
2-NFA            0.38                        52
2-NFA linear     0.23                        88
DFA              0.17                        118
D2FA linear      0.04                        454
D2FA bitmap      0.04                        480
2-DFA            0.26                        76
2-D2FA linear    0.20                        101

Demonstration of scalability on Intel x86 CPU.


Optimal Memory-Based Configurations

§ Select the optimal configuration by speed/area.
§ Parallel processing is allowed.
§ When mNFA/mDFA < 0.35, an NFA-based implementation is preferable.
§ Otherwise, DFA-based implementations are preferable.
§ For some simple rulesets, 2-DFA is faster than DFA.
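
The mNFA/mDFA threshold rule can be applied mechanically to the ruleset statistics reported in the memory-based results. The sketch below is only an illustration of that one rule: the function name `preferred_automaton` is hypothetical, the mDFA and mNFA values are copied from the rulesets table earlier in the deck, and the choice among layouts or between DFA and 2-DFA is not modelled.

```python
THRESHOLD = 0.35  # mNFA/mDFA ratio below which an NFA is preferred (from the slide)

# name: (mDFA, mNFA), copied from the rulesets table in the results section
RULESETS = {
    "snort": (12, 2.76),
    "l7-filter": (7, 6.02),
    "bro": (8, 20.34),
    "exact-match": (2, 1.76),
    "dotstar 0.1": (11, 8.42),
    "dotstar 0.2": (24, 15.64),
    "dotstar 0.3": (33, 12.76),
    "dotstar 0.6": (49, 26.76),
}

def preferred_automaton(m_nfa: float, m_dfa: float, threshold: float = THRESHOLD) -> str:
    """Apply the ratio rule: few active NFA states relative to the DFA count favors NFA."""
    return "NFA" if m_nfa / m_dfa < threshold else "DFA"

for name, (m_dfa, m_nfa) in RULESETS.items():
    ratio = m_nfa / m_dfa
    print(f"{name:12s} mNFA/mDFA = {ratio:.2f} -> {preferred_automaton(m_nfa, m_dfa)}")
```

The ratio captures the trade-off stated earlier: an NFA pays per active state, while a partitioned DFA pays per DFA that must be traversed.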


Optimal Logic-Based Configurations

§ Hardware-based multi-stride is the best.
§ There seems to be a peak speed/slice value at a higher stride, but validating this is beyond the chip's resources.

[Figure: speed/slice (Mbps/K slices, 8 to 20) for hardware-based strides 1 to 8.]


Conclusion

§ The key problem in regular expression matching is not the lack of innovative techniques, but the difficulty of deciding which technique actually works best in a given system setting.

§ In this work, we:
• define the regular expression matching design space;
• propose a benchmark of configurations that evaluates the design space both on a simulator and on real hardware;
• present an analysis of rulesets to obtain the optimal configuration.


Thank you!


[Figure: Calibration. Real speed (Mbps) versus simulator speed (Mbps), both axes 0 to 120.]

