Intrusion Detection Processor with Packet Content Matching


JC Ho

ECE 594


Topics

– Background
– Algorithm and Data Structure
– Memory Architecture
– Processor Design


Background


String Matching Algorithms

Boyer-Moore – good for single-pattern matching

Wu-Manber – best average-case performance

Aho-Corasick – O(n) worst-case performance
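
Since the rest of the deck builds on Aho-Corasick, a minimal software reference may help. The sketch below, with a made-up pattern set, uses the plain 256-way goto table (roughly the unoptimized layout quoted on the next slide) rather than the compressed node format discussed later.

```c
/* Minimal, uncompressed Aho-Corasick: a 256-way goto table per node,
 * failure links built by BFS, then a linear scan over the packet bytes.
 * Pattern set and table sizes are made up for illustration. */
#include <stdio.h>
#include <string.h>

#define MAX_NODES 1024
#define ALPHA 256

static int nxt[MAX_NODES][ALPHA]; /* goto/DFA table; 0 doubles as "root"      */
static int fail[MAX_NODES];       /* failure links                            */
static int out[MAX_NODES];        /* nonzero if some pattern ends at the node */
static int nnodes = 1;            /* node 0 is the root                       */

static void add_pattern(const char *p) {
    int s = 0;
    for (; *p; ++p) {
        unsigned char c = (unsigned char)*p;
        if (!nxt[s][c]) nxt[s][c] = nnodes++;   /* create a child on demand */
        s = nxt[s][c];
    }
    out[s] = 1;                                 /* leaf: a pattern ends here */
}

static void build_failure_links(void) {
    int queue[MAX_NODES], head = 0, tail = 0;
    for (int c = 0; c < ALPHA; ++c)
        if (nxt[0][c]) { fail[nxt[0][c]] = 0; queue[tail++] = nxt[0][c]; }
    while (head < tail) {                       /* BFS from the root         */
        int s = queue[head++];
        for (int c = 0; c < ALPHA; ++c) {
            int t = nxt[s][c];
            if (!t) { nxt[s][c] = nxt[fail[s]][c]; continue; } /* DFA shortcut */
            fail[t] = nxt[fail[s]][c];
            out[t] |= out[fail[t]];             /* inherit matches via failure */
            queue[tail++] = t;
        }
    }
}

static void scan(const unsigned char *data, int len) {
    int s = 0;
    for (int i = 0; i < len; ++i) {
        s = nxt[s][data[i]];
        if (out[s]) printf("match ending at byte %d\n", i);
    }
}

int main(void) {
    add_pattern("attack");
    add_pattern("tac");
    build_failure_links();
    const char *pkt = "xxattackxx";
    scan((const unsigned char *)pkt, (int)strlen(pkt));  /* bytes 6 and 7 */
    return 0;
}
```

Running it on the sample packet reports matches ending at bytes 6 ("tac") and 7 ("attack"), which is the O(n) single-pass behavior the deck relies on.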


Data Structure for Aho-Corasick

Unoptimized – 1028 bytes per node, 53 MB

Bitmap compression – 41 bytes per node, 2.8 MB

Path compression – 20 bytes per node (average), 1.1 MB

These data structure sizes exclude the rules database, which is kept in separate storage


Aho-Corasick with Bitmap Compression
– Separation of the signature and rules databases into different storage units
– Smaller next node, failure, and rules pointers (24 bits each)

Result
– 41 bytes per node
– Same performance

Adapted node layout: 32-byte bitmap, next node pointer, failure pointer, rules pointer
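
A minimal C sketch of that 41-byte node, assuming the layout shown above; the field names are illustrative, not taken from the slides.

```c
#include <stdint.h>
#include <stdio.h>

/* One plausible C view of the 41-byte bitmap-compressed node. */
typedef struct {
    uint8_t bitmap[32];    /* 256-bit bitmap: one bit per possible next byte */
    uint8_t next_node[3];  /* 24-bit next node pointer                       */
    uint8_t failure[3];    /* 24-bit failure pointer                         */
    uint8_t rules[3];      /* 24-bit pointer into the separate rules storage */
} ac_bitmap_node;          /* all byte-wide fields, so no padding: 41 bytes  */

int main(void) {
    printf("node size = %zu bytes\n", sizeof(ac_bitmap_node));  /* 41 */
    return 0;
}
```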


Considerations

Complete or partial match – four cases:
– No match
– Complete signature
– Partial signature (packet ends before the signature completes)
– Partial signature (packet begins in the middle of a signature)


Considerations—Cont.

Case 1: no match
– Failure pointers eventually go to the root
– Tag as safe


Considerations—Cont.

Case 2: complete signature
– Easy to handle
– Start from the beginning of the packet
– Failure pointers go back to the root
– Mark the root node visited
– The beginning of the signature eventually goes down the right path
– Traverse the entire path and tag as a full match


Considerations—Cont.

Case 3: partial signature
– Similar to Case 2
– The beginning of the signature eventually goes down the right path
– Mark the root node visited
– When the end of the packet is reached, tag as a partial match


Considerations—Cont.

Case 4: partial signature
– Very different from Cases 2 and 3
– Needs to start from the middle of the data structure
– Needs to find the first instance of the first byte in the data structure
– Traverse the path of the signature to reach the leaf; tag as a partial match since the root is not visited


Considerations—Cont.

Result
– Case 4 can be treated as the general case
– Cases 1, 2, and 3 are special situations of Case 4
– Start from the middle of the data structure every time, for each packet
– Cases 1, 2, and 3 will eventually be redirected back to the root and will operate as if they had started from the root
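
To make "Case 4 as the general case" concrete, here is a hedged sketch of the per-byte scan. The toy one-signature automaton and the helper names (first_instance, next_state, is_leaf) are stand-ins; the real design walks the bitmap-compressed nodes and tracks the RF/MF flags introduced in the processor core slides.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define ROOT 0

/* Toy stand-in for the signature structure: one signature "ab"
 * (node 0 = root, node 1 = "a", node 2 = "ab"). */
static int first_instance(unsigned char b) {   /* first node reached by byte b */
    if (b == 'a') return 1;
    if (b == 'b') return 2;
    return ROOT;
}
static bool is_leaf(int node) { return node == 2; }
static int next_state(int node, unsigned char b) {
    if (node == 1 && b == 'b') return 2;   /* extend "a" to "ab"                   */
    if (b == 'a') return 1;                /* fall back, then take the 'a' edge    */
    return ROOT;                           /* anything else falls back to the root */
}

typedef enum { NO_MATCH, PARTIAL_MATCH, FULL_MATCH } verdict_t;

/* Case 4 as the general case: every packet starts "in the middle" of the
 * structure; packets of cases 1-3 are redirected to the root and behave as
 * if they had started there. A leaf reached after visiting the root is a
 * full match; a leaf reached without visiting the root is a partial match. */
static verdict_t scan_packet(const unsigned char *pkt, size_t len) {
    verdict_t v = NO_MATCH;
    bool root_visited = false;
    int node = ROOT;
    for (size_t i = 0; i < len; ++i) {
        node = (i == 0) ? first_instance(pkt[i]) : next_state(node, pkt[i]);
        if (node == ROOT) root_visited = true;
        if (is_leaf(node)) v = root_visited ? FULL_MATCH : PARTIAL_MATCH;
    }
    return v;
}

int main(void) {
    printf("%d\n", scan_packet((const unsigned char *)"xxab", 4)); /* 2 = full match    */
    printf("%d\n", scan_packet((const unsigned char *)"b",    1)); /* 1 = partial match */
    printf("%d\n", scan_packet((const unsigned char *)"xyz",  3)); /* 0 = no match      */
    return 0;
}
```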


Memory Architecture

– Guarantee worst-case performance
– On-chip storage for the data structure
– Similar to a cache design
– Wide word reference
– For an ASIC design, memory references can use a node-addressable scheme to reduce pointer size further


Memory Architecture—Cont.

Node size = line width
– 64 bytes in theory
– 41 bytes in reality
– The remaining bytes are not constructed

Line layout: bytes 0-40 hold the node, bytes 41-63 are unused; a line is selected by address bits 23:6
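
One plausible reading of the "Address 23:6" note, sketched in C under the assumption that a 24-bit byte pointer selects a 64-byte line with its upper 18 bits:

```c
#include <stdint.h>
#include <stdio.h>

/* With one node per 64-byte line, a 24-bit byte pointer selects its line with
 * address bits [23:6]; the low 6 bits are a byte offset that is never needed.
 * A node-addressable ASIC scheme could store only these upper 18 bits. */
static uint32_t line_of(uint32_t ptr24) {
    return (ptr24 >> 6) & 0x3FFFFu;   /* address bits 23:6 */
}

int main(void) {
    uint32_t ptr = 0x00A4C0;          /* example 24-bit node pointer */
    printf("pointer 0x%06X selects line %u\n", (unsigned)ptr, (unsigned)line_of(ptr));
    return 0;
}
```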


Processor Design

Pipeline stages:
– Preprocessing
– Load Data
– Effective Memory Address Resolution
– Address Check
– Signature Storage Unit Access
– Bitmap Processing
– Next Node Address Calculation
– Data Check
– Match Check
– Next Round Preparation
– Post-processing


Processor Design—Preprocessing

– Multiple packets are buffered
– Contents are loaded into on-chip queues
– Each byte of the content is accessed sequentially
– Head and tail pointers are required for enqueue and dequeue
– Start and end pointers are required to indicate the start and end of a packet
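
An illustrative C view of one such content queue; the field names and the queue capacity are assumptions, since the slides only name the four pointers.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define QUEUE_BYTES 2048          /* placeholder capacity */

typedef struct {
    uint8_t  data[QUEUE_BYTES];   /* packet content bytes                      */
    uint16_t head;                /* dequeue side: next byte fed to the core   */
    uint16_t tail;                /* enqueue side: next free slot              */
    uint16_t start;               /* offset of the current packet's first byte */
    uint16_t end;                 /* offset of the current packet's last byte  */
} content_queue;

/* The core consults these when resolving the effective address (start pointer)
 * and during the data check (end pointer). */
static bool at_packet_start(const content_queue *q) { return q->head == q->start; }
static bool at_packet_end(const content_queue *q)   { return q->head == q->end;   }

int main(void) {
    content_queue q = { .head = 0, .tail = 4, .start = 0, .end = 3 };
    printf("start=%d end=%d\n", at_packet_start(&q), at_packet_end(&q));
    return 0;
}
```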


Processor Design—Preprocessing Cont.

– Packets are assumed to be independent
– Data from the same packet always occupies the same queue
– The number of queues is proportional to the number of stages in the data path
– The size of each queue can be inversely proportional to the number of queues


Processor Design—Core

Load data
– A counter determines from which queue data is loaded
– 1 byte is loaded from a different queue each cycle
– No data dependency in the data path
– The counter value is passed to the pipeline register along with the data byte to keep track of the queue
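
A small sketch of the round-robin load step, assuming eight queues (the slides only say the queue count tracks the number of pipeline stages); the names pipe_reg and load_data are illustrative.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_QUEUES 8                 /* illustrative; tied to pipeline depth */

/* One byte and its queue number travel down the pipeline together, so later
 * stages know which queue's flags and registers to update. */
typedef struct { uint8_t byte; uint8_t queue_id; } pipe_reg;

static pipe_reg load_data(const uint8_t head_byte[NUM_QUEUES], uint32_t cycle) {
    pipe_reg r;
    r.queue_id = (uint8_t)(cycle % NUM_QUEUES);  /* round-robin counter         */
    r.byte     = head_byte[r.queue_id];          /* 1 byte per cycle, one queue */
    return r;
}

int main(void) {
    uint8_t heads[NUM_QUEUES] = { 'G', 'E', 'T', ' ', '/', 'a', 'b', 'c' };
    for (uint32_t c = 0; c < 4; ++c) {
        pipe_reg r = load_data(heads, c);
        printf("cycle %u: queue %u byte '%c'\n", (unsigned)c, (unsigned)r.queue_id, r.byte);
    }
    return 0;
}
```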


Processor Design—Core Cont.

Effective memory address resolution
– Check the start pointer to determine whether this is the starting byte of a packet
– Starting byte of a packet:
  - Use the byte to index into a table to find the address of the first instance of this byte in the data structure
  - Reset all flags associated with this queue
– Not the starting byte:
  - Use the next node address computed from the previous byte
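
A hedged sketch of this stage; the table name first_instance_addr and the queue_state struct are illustrative stand-ins for the per-queue state the slides describe.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

static uint32_t first_instance_addr[256];  /* byte value -> address of the first
                                              instance of that byte in the structure */

typedef struct {
    bool RF, MF, EF, CMF, PMF;   /* root / match / end / complete / partial flags */
    uint32_t NNR;                /* next node address computed for the last byte  */
} queue_state;

static uint32_t resolve_address(uint8_t byte, bool packet_start, queue_state *q) {
    if (packet_start) {
        /* Starting byte: clear this queue's flags and index the lookup table. */
        q->RF = q->MF = q->EF = q->CMF = q->PMF = false;
        return first_instance_addr[byte];
    }
    return q->NNR;   /* otherwise continue from the previously computed address */
}

int main(void) {
    queue_state q = { .NNR = 0x000040 };
    first_instance_addr['a'] = 0x000080;
    printf("0x%06X 0x%06X\n",
           (unsigned)resolve_address('a', true,  &q),   /* table lookup: 0x000080 */
           (unsigned)resolve_address('b', false, &q));  /* previous NNR: 0x000040 */
    return 0;
}
```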


Processor Design—Core Cont.

Address check
– Determine if the effective address is the root node
  - If so, set the root flag (RF)


Processor Design—Core Cont.

Signature storage unit access
– Bitmap loaded into 8 bitmap registers (BMR0-7), each 32 bits
– Next node pointer loaded into the next node register (NNR), 24 bits
– Failure pointer loaded into the failure register (FR)
– Rules pointer loaded into the rules register (RR)


Processor Design—Core Cont.

Bitmap processing
– 8 independent popcount units count the 1's in BMR0-7
– Bits 0-4 of the current data byte are used to load a bit from each BMR
– Bits 5-7 of the current data byte are used to select the proper bit and to load the value of that BMR into the PCR
– Check if the bit is 1 and set the BMF (flag) value
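
A sketch of the bit-selection step, assuming LSB-first bit ordering within each 32-bit BMR (the slides do not specify the ordering):

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* The byte's high bits pick one of BMR0-7 and its low bits pick a bit inside
 * that register; the hardware does this for all eight registers in parallel. */
static bool bitmap_flag(const uint32_t bmr[8], uint8_t byte) {
    unsigned reg = byte >> 5;            /* bits 7:5 select BMR0-7           */
    unsigned bit = byte & 0x1F;          /* bits 4:0 select a bit within it  */
    return (bmr[reg] >> bit) & 1u;       /* BMF: does this node have a child
                                            edge labelled with this byte?    */
}

int main(void) {
    uint32_t bmr[8] = {0};
    bmr['G' >> 5] |= 1u << ('G' & 0x1F); /* pretend the node has a 'G' child */
    printf("BMF('G')=%d BMF('P')=%d\n", bitmap_flag(bmr, 'G'), bitmap_flag(bmr, 'P'));
    return 0;
}
```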


Processor Design—Core Cont.

Next node address calculation
– If (BMF = 0): next node address = FR
– If (BMF = 1):
  - Perform a popcount on the PCR up to the proper bit (based on bits 0-4 of the current byte)
  - Sum all popcount values up to the proper bit
  - Next node address = (this sum * node size) + NNR
– Use a saturated add
– The value is stored back into the NNR
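
A sketch of the address calculation, combining the bit test with the popcount-based rank. NODE_SIZE = 64 follows the memory-architecture slides, __builtin_popcount is a GCC/Clang builtin standing in for the eight hardware popcount units, and a plain add is used where the design calls for a saturated add.

```c
#include <stdint.h>
#include <stdio.h>

#define NODE_SIZE 64u   /* node = one 64-byte line, per the memory architecture */

/* Rank of the byte's bit among all set bits = which child to jump to. */
static uint32_t next_node_addr(const uint32_t bmr[8], uint8_t byte,
                               uint32_t nnr, uint32_t fr) {
    unsigned reg = byte >> 5, bit = byte & 0x1F;
    if (!((bmr[reg] >> bit) & 1u))
        return fr;                                    /* BMF = 0: follow failure pointer */
    uint32_t rank = 0;
    for (unsigned r = 0; r < reg; ++r)                /* whole earlier BMRs              */
        rank += (uint32_t)__builtin_popcount(bmr[r]);
    rank += (uint32_t)__builtin_popcount(bmr[reg] & ((1u << bit) - 1u)); /* partial BMR  */
    return nnr + rank * NODE_SIZE;                    /* (sum * node size) + NNR         */
}

int main(void) {
    uint32_t bmr[8] = {0};
    bmr['a' >> 5] |= 1u << ('a' & 0x1F);              /* node has children 'a' and 'c'   */
    bmr['c' >> 5] |= 1u << ('c' & 0x1F);
    printf("0x%06X\n", (unsigned)next_node_addr(bmr, 'c', 0x001000, 0x000000)); /* 0x001040 */
    return 0;
}
```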


Processor Design—Core Cont.

Data check
– Check the end pointer to determine if the current byte is the end of the packet
  - If so, set the end flag (EF)
– Check the NNR value to determine if a leaf node has been reached
  - If so, set the match flag (MF)


Processor Design—Core Cont.

Match check
– Case 2: if (RF = 1) and (MF = 1), set the complete match flag (CMF)
– Case 4: if (RF = 0) and (MF = 1), set the partial match flag (PMF)
– Case 3: if (EF = 1) and (current node != root node) and (NNR != FR), set PMF
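
The three conditions transcribe directly into C; the flags_t struct is an illustrative container for the RF/MF/EF/CMF/PMF flags named in the slides.

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct { bool RF, MF, EF, CMF, PMF; } flags_t;

static void match_check(flags_t *f, bool at_root, bool nnr_equals_fr) {
    if (f->RF && f->MF)  f->CMF = true;                      /* Case 2: complete match */
    if (!f->RF && f->MF) f->PMF = true;                      /* Case 4: partial match  */
    if (f->EF && !at_root && !nnr_equals_fr) f->PMF = true;  /* Case 3: partial match  */
}

int main(void) {
    flags_t f = { .RF = true, .MF = true };
    match_check(&f, false, false);
    printf("CMF=%d PMF=%d\n", f.CMF, f.PMF);   /* CMF=1 PMF=0 */
    return 0;
}
```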


Processor Design—Core Cont.

Next round preparation
– Route the NNR value back to the load data stage
– If (CMF = 1): set the flush flag (FF) to signal the preprocessing unit to load a new packet into this queue
– If the ignore flag (IF) is set: ignore the processing result
– Reset CMF, PMF, EF


Processor Design—Post-processing

If (CMF = 1) or (PMF = 1):
– Use the RR value to access the rules database
– Perform actions according to the rule

If (EF = 1) and (CMF = 0) and (PMF = 0):
– Release the packet to the router

If (FF = 1):
– Set IF to invalidate subsequent data from this queue
– Reset FF


Preliminary Results

2 MB signature storage unit
– 3.6 ns access time, estimated with CACTI
– Assume storage unit access is the critical path
– Translates to 250 MHz, conservatively

Supports up to 2 Gbps
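
As a quick sanity check, assuming the core consumes one payload byte per cycle as described in the load-data stage: 250 MHz × 1 byte/cycle = 250 MB/s = 2 Gbps, which matches the claimed throughput.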


Conclusion

Algorithm is optimized for hardware implementation

Memory requirements can be met by current technology

Implementation is feasible