
CAMP: Fast and Efficient IP Lookup Architecture

Sailesh Kumar, Michela Becchi, Patrick Crowley, Jonathan Turner

Washington University in St. Louis


Context
» Trie based IP lookup
» Circular pipeline architectures

Prefix dataset:
0*     P1
000*   P2
0010*  P3
0011*  P4
011*   P5
10*    P6
11*    P7
110*   P8

Example lookup: IP address 111010…

[Figure: binary trie built from the prefix dataset, with nodes P1–P8 placed at the ends of their prefixes]
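
To make the trie lookup concrete, here is a minimal Python sketch (not from the slides; the class and function names are illustrative) that builds a binary trie from the prefix dataset above and returns the longest matching prefix for the example address.

```python
# Minimal sketch, not the paper's implementation: binary trie built from the
# prefix dataset above, with longest-prefix match for the example address.

class TrieNode:
    def __init__(self):
        self.children = {}    # '0' / '1' -> TrieNode
        self.next_hop = None  # set if a prefix ends at this node

def insert(root, prefix, next_hop):
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, TrieNode())
    node.next_hop = next_hop

def longest_prefix_match(root, address_bits):
    node, best = root, None
    for bit in address_bits:
        if node.next_hop is not None:   # remember the longest prefix seen so far
            best = node.next_hop
        node = node.children.get(bit)
        if node is None:
            return best
    return node.next_hop if node.next_hop is not None else best

prefixes = [("0", "P1"), ("000", "P2"), ("0010", "P3"), ("0011", "P4"),
            ("011", "P5"), ("10", "P6"), ("11", "P7"), ("110", "P8")]
root = TrieNode()
for p, nh in prefixes:
    insert(root, p, nh)

print(longest_prefix_match(root, "111010"))  # -> P7 (longest matching prefix is 11*)
```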

[Figure: the same trie with its levels mapped to a linear pipeline, Stage 1 through Stage 4, traversed while looking up IP address 111010…]

[Figure: the four stages rearranged as a circular pipeline (1 → 2 → 3 → 4 → 1); a lookup of IP address 111010… may enter the ring at any stage]

CAMP: Circular Adaptive and Monotonic Pipeline

Problems:
» Optimize the global memory requirement
» Avoid bottleneck stages
» Make the per-stage utilization uniform

Idea:
» Exploit a circular pipeline:
– Each stage can be a potential entry/exit point
– Possible wrap-around
» Split the trie into sub-trees and map each of them independently onto the pipeline


CAMP (cont’d)

Implications:
» PROS:
– Flexibility: the maximum prefix length is decoupled from the pipeline depth
– Upgradeability: memory bank updates involve only partial remapping
» CONS:
– A stage can simultaneously be an entry point and a transit stage for two distinct requests
 Conflicts arise
 A scheduling mechanism is required
 Possible efficiency degradation


Trie splitting

Steps:
» Define an initial stride x
» Use a direct index table with 2^x entries for the first x levels
» Expand short prefixes to length x
» Map the sub-trees

E.g., initial stride x = 2. Direct index table:
00*   Enter at pipeline stage 1
01*   Enter at pipeline stage 2
10*   No match
11*   Enter at pipeline stage 3

[Figure: the example trie split at level 2 into sub-trees 1–3, each mapped to the 4-stage circular pipeline starting from its own entry stage]
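
The direct index table for the initial stride can be sketched as follows (again illustrative Python, not the authors' code); the entry-stage assignments for the x = 2 example are taken from the slide's table, while the helper and its parameters are assumptions.

```python
# Sketch, not the paper's code: build the 2**x-entry direct index table for an
# initial stride x. Prefixes no longer than x bits are expanded to length x;
# longer prefixes are handled by sub-trees entered at a given pipeline stage.

def build_index_table(prefixes, x, entry_stage_of_index):
    """entry_stage_of_index maps an x-bit string to the pipeline stage where the
    corresponding sub-tree is entered (missing means no longer prefix below)."""
    table = [{"next_hop": None, "entry_stage": None} for _ in range(2 ** x)]
    # Expand short prefixes; longer expanded prefixes overwrite shorter ones.
    for bits, nh in sorted(prefixes, key=lambda p: len(p[0])):
        if len(bits) > x:
            continue
        pad = x - len(bits)
        for suffix in range(2 ** pad):
            table[(int(bits, 2) << pad) | suffix]["next_hop"] = nh
    for key, stage in entry_stage_of_index.items():
        table[int(key, 2)]["entry_stage"] = stage
    return table

prefixes = [("0", "P1"), ("000", "P2"), ("0010", "P3"), ("0011", "P4"),
            ("011", "P5"), ("10", "P6"), ("11", "P7"), ("110", "P8")]
# Entry stages for x = 2 taken from the slide: 00* -> stage 1, 01* -> stage 2,
# 10* -> no sub-tree below (no match in the table), 11* -> stage 3.
for i, entry in enumerate(build_index_table(prefixes, 2, {"00": 1, "01": 2, "11": 3})):
    print(format(i, "02b"), entry)
```

A lookup would index this table with the first x address bits, take the stored next hop as its current best match, and, if an entry stage is recorded, continue the search in the circular pipeline from that stage.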


Dealing with conflicts

Idea: use a request queue in front of each stage
Intuition: without request queues,
» a request may wait up to n cycles before entering the pipeline
» a waiting request causes all subsequent requests to wait as well, even if they do not compete for the same stages
Issue: ordering
» Out-of-order completion is limited to requests with different entry stages (addressed to different destinations)
» An optional output reorder buffer can be used

[Figure: lookup architecture with a direct lookup table for the initial prefix bits, request queues in front of each pipeline stage (Stage 1 … Stage n), and an optional reordering buffer before the next hop is returned for each destination address]


Pipeline Efficiency

Metrics:
» Pipeline utilization: fraction of time the pipeline is busy, provided that there is a continuous backlog of requests
» Lookups per Cycle (LPC): average request dispatching rate

Linear pipeline:
» LPC = 1
» Pipeline utilization generally low
– Non-uniform stage utilization

CAMP pipeline:
» High pipeline utilization
– Uniform stage utilization
» LPC close to 1
– Complete pipeline traversal for each request
– # pipeline stages = # trie levels
» LPC > 1
– Most requests don't make complete circles around the pipeline
– # pipeline stages > # trie levels (see the simulation sketch below)
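
The LPC behaviour can be illustrated with a toy discrete-time simulation (my own sketch, not the paper's evaluation setup): requests carry an entry stage, wait in per-stage queues, traverse a fixed number of consecutive stages around the ring, and the in-order input stream stalls when the next request's queue is full, as discussed on the "Dealing with conflicts" slide. The stage count, traversal depth and burst lengths below are illustrative.

```python
# Toy simulation, not the paper's: circular pipeline with one request queue per
# stage. Each request enters at its own stage and visits `depth` consecutive
# stages (wrapping around); LPC = completed lookups / elapsed cycles.
import random
from collections import deque

def simulate(n_stages=8, depth=8, n_requests=20000, burst=1, queue_size=32, seed=0):
    rng = random.Random(seed)
    stream = deque()                      # in-order arrivals, bursty entry stages
    while len(stream) < n_requests:
        stream.extend([rng.randrange(n_stages)] * burst)
    queues = [deque() for _ in range(n_stages)]
    in_stage = [None] * n_stages          # remaining stages for the request here
    done, cycles, total = 0, 0, len(stream)
    while done < total:
        cycles += 1
        # 1. In-flight requests advance one stage in lockstep (no collisions).
        nxt = [None] * n_stages
        for s, rem in enumerate(in_stage):
            if rem is None:
                continue
            if rem == 1:
                done += 1                 # finished its last stage
            else:
                nxt[(s + 1) % n_stages] = rem - 1
        in_stage = nxt
        # 2. Dispatch from the input stream; a full queue blocks every later
        #    request (head-of-line blocking at the dispatcher).
        while stream and len(queues[stream[0]]) < queue_size:
            queues[stream.popleft()].append(depth)
        # 3. A free stage admits the head of its own queue.
        for s in range(n_stages):
            if in_stage[s] is None and queues[s]:
                in_stage[s] = queues[s].popleft()
    return done / cycles

# depth == n_stages: each lookup circles the whole ring, so LPC approaches 1;
# depth < n_stages: lookups overlap around the ring and LPC can exceed 1.
print(round(simulate(depth=8), 2), round(simulate(depth=3), 2))
```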


Pipeline efficiency – all stages traversed

Setup:
» 24 stages, all traversed by each packet
» Packet bursts: sequences of packets to the same entry point

Results:
» Long bursts result in high utilization and LPC
» For all burst sizes, enough queuing (queue size 32) guarantees 0.8 LPC

[Chart: LPC (requests per cycle) vs. request queue size (1–32), for uniformly random traffic and burst lengths from 2 to 96; LPC ranges between roughly 0.5 and 1]


Pipeline efficiency – LPC > 1

Setup:
» 32 stages, rightmost 24 bits, tree bitmap of stride 3
» Average prefix length 24

Results:
» LPC between 3 and 5
» Long bursts result in lower utilization and LPC

[Chart: LPC (requests per cycle) vs. request queue size (1–32), for burst lengths from 4 to 32; LPC ranges from 0 to 5]


Nodes-to-stages mapping

Objectives:
» Uniform distribution of nodes to stages
– Minimize the size of the biggest stage
» Correct operation of the circular pipeline
– Avoid multiple loops around the pipeline
» Simplified update operation
– Avoid skipping levels

[Figure: several candidate assignments of a small trie (nodes a–d, prefixes P1–P4) to pipeline stages 1–4, illustrating the objectives above]


Nodes-to-stages mapping (cont’d)

Problem formulation (constrained graph coloring):
» Given:
– A list of sub-trees
– A list of colors, represented by numbers
» Color the nodes so that:
– Every color is used nearly equally
– A monotonic ordering relationship without gaps among colors is respected when traversing each sub-tree from root to leaves

Algorithm (min-max coloring heuristic):
» Color sub-trees in decreasing order of size
» At each step:
– Try all possible colors on the root (the rest of the sub-tree is colored consequently)
– Pick the local optimum, i.e., the choice minimizing the maximum per-color usage (see the sketch below)
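
A sketch of the min-max coloring heuristic just described (my own Python, not the authors' implementation). Sub-trees are nested (label, children) tuples, colors stand for pipeline stages, and I read "monotonic without gaps" as: each child takes the color following its parent's, modulo the number of stages, matching the wrap-around of the circular pipeline.

```python
# Sketch, not the authors' code: greedy min-max coloring of sub-trees.
# A sub-tree is a (label, children) tuple; colors 0..n-1 stand for stages.

def subtree_size(tree):
    label, children = tree
    return 1 + sum(subtree_size(c) for c in children)

def color_subtree(tree, color, n_colors, usage):
    """Color the root with `color`; children take the next color, wrapping."""
    label, children = tree
    usage[color] += 1
    for child in children:
        color_subtree(child, (color + 1) % n_colors, n_colors, usage)

def min_max_coloring(subtrees, n_colors):
    usage = [0] * n_colors
    root_color = {}
    # Color sub-trees in decreasing order of size.
    for tree in sorted(subtrees, key=subtree_size, reverse=True):
        best = None
        for c in range(n_colors):                 # try every color on the root
            trial = usage[:]
            color_subtree(tree, c, n_colors, trial)
            if best is None or max(trial) < max(best[1]):
                best = (c, trial)                 # keep the smallest maximum
        root_color[tree[0]] = best[0]
        usage = best[1]
    return root_color, usage

# Hypothetical sub-trees, only to exercise the heuristic.
t1 = ("r1", [("a", []), ("b", [("c", [])])])
t2 = ("r2", [("d", [])])
t3 = ("r3", [])
print(min_max_coloring([t1, t2, t3], n_colors=4))
```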


Min-max coloring heuristic - example

Sub-trees T1–T4 are colored in decreasing order of size with colors (stages) 1–4.

After the largest sub-trees have been colored, the usage is: color 1: 1 node, color 2: 2, color 3: 4, color 4: 5. For the next sub-tree, each color is tried on its root and the resulting usage is compared:

           Present   If 1 on new root   If 2   If 3   If 4
Color 1    1         2                  5      3      2
Color 2    2         3                  3      6      4
Color 3    4         6                  5      5      8
Color 4    5         9                  7      6      6

Putting color 3 on the new root yields the smallest maximum usage (6), so it is chosen. The same step is repeated for the following sub-tree:

           Present   If 1 on new root   If 2   If 3   If 4
Color 1    3         4                  5      4      5
Color 2    6         8                  7      8      7
Color 3    5         6                  7      6      7
Color 4    6         8                  7      8      7

Here colors 2 and 4 tie with a maximum of 7. After the last sub-tree is colored, the final usage is: color 1: 5, color 2: 7, color 3: 7, color 4: 7.

[Figure: sub-trees T1–T4 with each node labeled by its assigned color at the successive steps]

Evaluation settings

Trends in BGP tables:
» Increasing number of prefixes
» Most prefixes are shorter than 26 bits (~24 bits long)
» Route updates can concentrate in short periods of time; however, they rarely change the shape of the trie

Evaluation performed on 50 BGP tables containing from 50K to 135K prefixes


Memory requirements

[Charts: relative size of each pipeline stage vs. pipeline stage number, for level-based mapping (sum of normalized upper bounds = 1.31), height-based mapping (sum = 1.23) and CAMP (sum = 1.024)]

Balanced distribution across stages

Reduced total memory requirements
» Memory overhead: 2.4% with initial stride 8, 0.02% with initial stride 12, 0.01% with initial stride 16


Updates

Techniques for handling updates:
» Single updates inserted as "bubbles" in the pipeline
» Rebalancing computed offline, involving only a subset of the tries

Scenario:
» Migration between different BGP tables
» Imbalance leads to a 4% increase in the occupancy of the largest stage

[Chart: minimum and maximum relative stage size vs. number of migrations, remaining between roughly 0.043 and 0.05]


Summary

Analysis of a circular pipeline architecture for trie-based IP lookup

Goals:
» Minimize the memory requirement
» Maximize pipeline utilization
» Handle updates efficiently

Design:
» Decoupling the number of stages from the maximum prefix length
» LPC analysis
» Nodes-to-stages mapping heuristic

Evaluation:
» On real BGP tables
» Good memory utilization and the ability to sustain a 40 Gbps line rate with small memory banks


Thank you!


Addressing the worst case

Observations:
» We addressed practical datasets
» Worst-case tries may have long and skinny sections that are difficult to split

Idea: adaptive CAMP
» Split the trie into "parent" and "child" sub-tries
» Map the parent sub-trie onto the pipeline
» Use more pipeline stages to mitigate the effect of multiple loops around the pipeline

[Figure: trie split into a parent sub-trie of height k rooted at the trie root and child sub-tries below; rank of node i = size of the sub-trie rooted at i]
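
The rank used here can be computed with a single post-order pass; a tiny sketch follows (the node representation and names are mine, not the paper's).

```python
# Sketch: rank(i) = size of the sub-trie rooted at node i, via post-order traversal.
# Nodes are hypothetical {"name": ..., "children": [...]} dicts for illustration.

def rank(node, ranks):
    size = 1 + sum(rank(child, ranks) for child in node["children"])
    ranks[node["name"]] = size
    return size

trie = {"name": "root", "children": [
    {"name": "a", "children": [{"name": "b", "children": []}]},
    {"name": "c", "children": []},
]}
ranks = {}
rank(trie, ranks)
print(ranks)  # {'b': 1, 'a': 2, 'c': 1, 'root': 4}
```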

