Modular SRAM-based Binary Content-Addressable Memories
Ameer M.S. Abdelhadi and Guy G.F. Lemieux
Department of Electrical and Computer Engineering
University of British Columbia
Vancouver, Canada
Binary Content-Addressable Memory (BCAM)
Hardware-based single-cycle parallel search engines
• Write: stores new data at a specific address
  • e.g. store ‘C’ in ‘1’
• Match: searches all addresses for a given data (pattern)
  • e.g. search for ‘D’: found in ‘2’
[Figure: a four-entry BCAM holding B, C, D, A at addresses 0, 1, 2, 3.]
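The two operations can be sketched as a small behavioural model. This is only an illustration of the write/match interface, not the hardware design; the class name `BCAM` and its methods are hypothetical, and the sequential loop stands in for what the hardware does in parallel in a single cycle.

```python
class BCAM:
    """Behavioural sketch of a binary CAM (hypothetical names, not the authors' code)."""

    def __init__(self, depth):
        self.cells = [None] * depth  # one stored pattern per address

    def write(self, address, pattern):
        # Write: store new data at a specific address.
        self.cells[address] = pattern

    def match(self, pattern):
        # Match: compare every address against the pattern.
        # Hardware does this concurrently; the loop is only a model.
        return [addr for addr, stored in enumerate(self.cells) if stored == pattern]


cam = BCAM(depth=4)
for address, pattern in enumerate("BXDA"):  # 'X' is a placeholder value
    cam.write(address, pattern)
cam.write(1, "C")                 # store 'C' in '1'
found = cam.match("D")            # search for 'D'
print(found)                      # [2]
```

Unlike a RAM, the lookup key is the data itself and the result is the address (or addresses) holding it.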
BCAM Applications
Memory management
• Associative caches
• Translation lookaside buffers (TLBs)
Databases
• Eliminates memory bottleneck
Networking
• IP lookup in routing/forwarding tables
• Intrusion detection: detect predefined suspicious packets
• Packet classification
Pattern matching
• e.g. DNA sequence lookup
Data compression
• Find and shorten redundant patterns
Data encryption
• Find and encrypt specific patterns
Motivation - FPGAs
• 1000’s of memory blocks
• 100,000’s of logic elements
• No dedicated BCAM resources in FPGAs
Objectives
BCAMs
• Massively parallel memory search
• Require high memory bandwidth
FPGAs
• Block RAMs are main storage
• Limited memory bandwidth
Use BRAMs to construct BCAMs
• Modular and flexible
• Storage efficient
• Single-cycle
• Performance oriented
Associative Arrays
Algorithmic Heuristics
• Hashes
• Search trees: tries, BSTs, …
Multi-unpredictable-cycle
Data-dependent performance
• Variable search depth
• Misses due to conflicts
Register-Based BCAM
Concurrent register read and compare
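• Single-cycle
• Limited resources
• Complex routing
• Fits small BCAMs
[Figure: register-based BCAM. An address decoder driven by WAddr (⌊log2CD⌋ bits) enables one of the registers Reg0 … RegCD-1 to capture WPatt (PW bits). Each register output is compared (=) with MPatt (PW bits), and a priority encoder turns the CD comparator outputs into MAddr (⌊log2CD⌋ bits) and Match.]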
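The register-based datapath can be modelled in a few lines. This is a minimal sketch under stated assumptions: the signal names (WAddr, WPatt, MPatt, MAddr, Match, CD) follow the figure labels, while the function and class names are hypothetical, and the priority encoder is assumed to favour the lowest matching address, which the slide does not specify.

```python
def address_decoder(waddr, cd):
    # One-hot write enables: only register `waddr` captures WPatt.
    return [i == waddr for i in range(cd)]


def priority_encoder(match_lines):
    # Returns (Match, MAddr). Assumption: the lowest matching address wins.
    for addr, hit in enumerate(match_lines):
        if hit:
            return True, addr
    return False, 0


class RegisterBCAM:
    """Hypothetical behavioural model of the register-based BCAM."""

    def __init__(self, cd):
        self.regs = [None] * cd  # Reg0 .. RegCD-1

    def write(self, waddr, wpatt):
        enables = address_decoder(waddr, len(self.regs))
        self.regs = [wpatt if en else q for en, q in zip(enables, self.regs)]

    def match(self, mpatt):
        # CD comparators operate concurrently in hardware.
        match_lines = [q == mpatt for q in self.regs]
        return priority_encoder(match_lines)


reg_cam = RegisterBCAM(cd=4)
for waddr, wpatt in enumerate("BCDA"):
    reg_cam.write(waddr, wpatt)
hit_d = reg_cam.match("D")
miss = reg_cam.match("Z")
print(hit_d, miss)
```

Every stored pattern needs its own register and comparator, which is why this structure only fits small BCAMs.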
Brute-Force Transposed-Indicators-RAM (1)
A Traditional BRAM-based BCAM
Key idea: Transposed RAM - data becomes addresses
• Write: write ‘0’ to location ‘B’
• Match: read location ‘D’ for match; the read returns ‘2’
[Figure: Transposed RAM with locations A, B, C, D holding addresses 3, 0, 1, 2.]
* Xilinx App Notes
Brute-Force Transposed-Indicators-RAM (2)
Storing Data to Multiple Addresses
• How can we store data to multiple addresses?
• Specify addresses using one-hot coding
  • Each bit indicates a match or “store at location”
PROBLEM: Depth of CAM is limited by data width of RAM
• e.g. to build a 1M-deep CAM, we need a 1M-bit-wide RAM
• In FPGAs: 1000 BRAMs x 32 bits wide = 32K-deep CAM
Summary
• BRAM-based
• Single-cycle
• Depth of CAM is limited by RAM width
[Figure: Transposed RAM with locations A-D, each holding one indicator bit per address 3-0; a ‘1’ marks an address that stores the pattern.]
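A minimal sketch of the transposed indicators RAM follows, assuming patterns are small integers used directly as RAM addresses. The class name and the `old_pattern` argument are hypothetical: the slides do not say how a stale indicator bit is cleared on overwrite, so this model simply asks the caller for the previous pattern.

```python
class TransposedIndicatorsRAM:
    """Hypothetical model: row index = pattern, bit position = CAM address."""

    def __init__(self, pattern_width, cam_depth):
        self.rows = [0] * (2 ** pattern_width)  # RAM depth = 2**pattern_width
        self.cam_depth = cam_depth              # RAM width = CAM depth (one-hot)

    def write(self, address, pattern, old_pattern=None):
        if old_pattern is not None:
            self.rows[old_pattern] &= ~(1 << address)  # clear stale indicator
        self.rows[pattern] |= 1 << address             # "store at location" bit

    def match(self, pattern):
        indicators = self.rows[pattern]  # a single RAM read
        return [a for a in range(self.cam_depth) if (indicators >> a) & 1]


A, B, C, D = range(4)                      # assumed 2-bit pattern encoding
tram = TransposedIndicatorsRAM(pattern_width=2, cam_depth=4)
for address, pattern in enumerate([B, C, D, A]):
    tram.write(address, pattern)
single = tram.match(D)                     # [2]
tram.write(0, D, old_pattern=B)            # the same data at a second address
multiple = tram.match(D)                   # [0, 2]
max_depth = 1000 * 32                      # 1000 BRAMs x 32 bits, about 32K
print(single, multiple, max_depth)
```

Because each CAM address costs one bit of RAM width, the achievable depth is bounded by the total data width available, as the 1000-BRAM example shows.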
BCAM Cascading
• PROBLEM:
  • Patterns are encoded as RAM addresses
  • RAM depth is exponential in pattern width: $\text{RAM Depth} = 2^{\text{Pattern Width}}$
• Solution: Cascading
  1. Divide pattern into smaller slices
  2. Search for each slice separately
  3. If all slices are found: pattern match!
  • RAM depth is linear in pattern width: $\text{RAM Depth} = 2^{\text{Slice Width}} \times (\text{Pattern Width} / \text{Slice Width})$
[Figure: the matched pattern is divided into slices; each slice feeds its own CAM, and the per-slice match outputs are combined into the pattern match.]
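The cascading steps can be sketched as follows, assuming each slice CAM is a transposed indicators RAM that returns one match bit per address, so that the slice results can be ANDed address by address. All names are hypothetical, and the model assumes each address is written only once (no stale-bit clearing).

```python
def ram_depth_flat(pattern_width):
    return 2 ** pattern_width


def ram_depth_cascaded(pattern_width, slice_width):
    return 2 ** slice_width * (pattern_width // slice_width)


class CascadedBCAM:
    """Hypothetical model of slice-wise cascading."""

    def __init__(self, pattern_width, slice_width, cam_depth):
        assert pattern_width % slice_width == 0
        self.slice_width = slice_width
        self.num_slices = pattern_width // slice_width
        self.cam_depth = cam_depth
        # One transposed indicators RAM per slice.
        self.rams = [[0] * (2 ** slice_width) for _ in range(self.num_slices)]

    def _slices(self, pattern):
        mask = (1 << self.slice_width) - 1
        return [(pattern >> (i * self.slice_width)) & mask
                for i in range(self.num_slices)]

    def write(self, address, pattern):
        for ram, piece in zip(self.rams, self._slices(pattern)):
            ram[piece] |= 1 << address

    def match(self, pattern):
        hits = (1 << self.cam_depth) - 1
        for ram, piece in zip(self.rams, self._slices(pattern)):
            hits &= ram[piece]  # every slice must match at the same address
        return [a for a in range(self.cam_depth) if (hits >> a) & 1]


casc = CascadedBCAM(pattern_width=8, slice_width=4, cam_depth=4)
casc.write(0, 0xAB)
casc.write(1, 0xAC)
casc.write(2, 0xCB)
print(casc.match(0xAB), casc.match(0xCC))
print(ram_depth_flat(32), ram_depth_cascaded(32, 8))
```

The query `0xCC` finds both of its slices, but at different addresses, so the AND correctly reports no match. This is why cascading needs a match indicator for every address rather than a single matching address.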
Hierarchical Search 2D BCAM (1)
Narrow and Deep BCAM
Key idea: Hierarchical search
1. Find a set (row) with match using a 1D BCAM
2. Search this set (row) in parallel for a specific match
[Figure: a 1D BCAM with 4M addresses is too deep; the 2D BCAM arranges the same addresses as 2k x 2k.]
Hierarchical Search 2D BCAM (2)
Example
• Divide address space into sets
• RAM: each set in a line
• Transposed-RAM: indicates “pattern in set?”
• Hierarchical Search:
  1. Find a set (row) with match using a 1D BCAM
  2. Search this set (row) in parallel for a specific match
[Figure: RAM with one set per line: set 0 holds patterns 2, 3 and set 1 holds patterns 1, 1. Transposed-RAM with one row per pattern (0-3) and one bit per set (0-1), marking the sets that contain each pattern.]
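The two-step search can be sketched with the example data from the figure (set 0 holds 2, 3; set 1 holds 1, 1). Function names are hypothetical, the Transposed-RAM is built statically from the RAM contents rather than by incremental writes, and the choice of the first matching set is an assumption standing in for whatever priority scheme the hardware uses.

```python
ram_sets = [[2, 3], [1, 1]]  # RAM: each set in a line


def build_transposed(ram, num_patterns):
    # transposed[pattern][set] == 1 when the pattern occurs somewhere in that set.
    return [[int(pattern in row) for row in ram] for pattern in range(num_patterns)]


def hs_match(pattern, ram, transposed):
    set_flags = transposed[pattern]          # step 1: 1D BCAM over sets
    if 1 not in set_flags:
        return None
    set_index = set_flags.index(1)           # assumption: first matching set wins
    row = ram[set_index]                     # step 2: read the whole set
    column = [stored == pattern for stored in row].index(True)
    return set_index * len(row) + column     # single matching address


hs_transposed = build_transposed(ram_sets, num_patterns=4)
print(hs_transposed)                         # [[0, 0], [0, 1], [1, 0], [1, 0]]
print(hs_match(3, ram_sets, hs_transposed))  # 1
print(hs_match(1, ram_sets, hs_transposed))  # 2 (address 3 also holds 1)
```

Only one address comes back even when several addresses hold the pattern, so the per-address match bits needed for cascading are not available.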
Hierarchical Search 2D BCAM (3)
Pros and Cons
Pros:
• BRAM-based
• Single-cycle
• Efficient for deep CAMs
Cons:
• Single match only
  • Cannot be cascaded
• RAM depth is exponential in pattern width
  • Inefficient for wide patterns
Indirectly-Indexed 2D (II2D) BCAM (1)
Cascadable Wide and Deep BCAM
PROBLEM: is it possible to regenerate matches for all addresses?
Key observation
• Transposed RAM is a sparse matrix
• n columns (a set of addresses) accommodate n matches (1’s) at most!
Key idea: use indirect indices to point to intra-set matches
• Cascadable
• Scalable (linear growth)
• Supports wider patterns
[Figure: a patterns x sets Transposed-RAM whose entries point to intra-set match indicators such as 01 and 10; the two memories are labelled BRAM and LUTRAM.]
Indirectly-Indexed 2D (II2D) BCAM (2)
Example
• Divide address space into sets
• Store sets with a match in Indicators-RAM
• Transposed-RAM stores indices to all matches in set
• Hierarchical Search:
  • Find indices of all matching sets in Transposed-RAM
  • Read Indicators-RAM using indices from Transposed-RAM
[Figure: RAM (reference) holds patterns 2, 3, 3, 1 at addresses 0, 1, 2, 3. The Transposed-RAM (patterns 0-3 x sets 0-1) holds an index for each set containing the pattern and ‘-’ otherwise; the Indicators-RAMs hold intra-set match indicators such as 0 1 and 1 0.]
Match pattern ‘3’: found in ‘1’ and ‘2’ (match indicators 0 1 1 0 for addresses 0 1 2 3)
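The indirect indexing can be sketched with the reference RAM from the figure (patterns 2, 3, 3, 1 at addresses 0-3, two addresses per set). This is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, the structures are built statically rather than by incremental writes, and the order in which indices are assigned within a set is an arbitrary choice.

```python
def build_ii2d(ram, set_width, num_patterns):
    num_sets = len(ram) // set_width
    # transposed[pattern][set] = index into that set's Indicators-RAM, or None.
    transposed = [[None] * num_sets for _ in range(num_patterns)]
    indicators = [[] for _ in range(num_sets)]
    for s in range(num_sets):
        members = ram[s * set_width:(s + 1) * set_width]
        for pattern in sorted(set(members)):  # assumption: indices in sorted order
            transposed[pattern][s] = len(indicators[s])
            indicators[s].append([int(m == pattern) for m in members])
    return transposed, indicators


def ii2d_match(pattern, transposed, indicators, set_width):
    match_vector = []
    for s, index in enumerate(transposed[pattern]):
        if index is None:
            match_vector += [0] * set_width      # no match anywhere in this set
        else:
            match_vector += indicators[s][index]  # intra-set match indicators
    return match_vector


reference_ram = [2, 3, 3, 1]
ii_transposed, ii_indicators = build_ii2d(reference_ram, set_width=2, num_patterns=4)
vector = ii2d_match(3, ii_transposed, ii_indicators, set_width=2)
print(vector)                                   # [0, 1, 1, 0]
print([a for a, bit in enumerate(vector) if bit])  # [1, 2]
```

Each set needs at most `set_width` indicator entries, which reflects the key observation that a set of n addresses holds at most n matches, and the output is a full per-address match vector that can be ANDed with other slices when cascading.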
Indirectly-Indexed 2D (II2D) BCAM (3)
Area and Performance
• Except for a very narrow HS, II2D exhibits higher Fmax
• Register-based BCAM: register consumption
• II2D: linear ALM consumption; similar to other methods
• HS: exponential BRAM consumption
• II2D: linear BRAM consumption
• II2D supports wider patterns
[Figure: three plots showing M20Ks (1000's), ALMs (1000's), and Fmax (MHz) against PW, grouped by CD of 16K, 32K, 64K, and 128K. Series: Reg-based, BF-BCAM, HS-BCAM, II2D-BCAM, and Device Limit.]
Open Source
http://ece.ubc.ca/~lemieux/downloads/
• Modular and parametric Verilog files
• Run-in-batch simulation and synthesis manager
Conclusions
Three BCAM architectures, each rated on: BRAM-based, single-cycle, deep, wide, cascadable, scalable
• Brute-Force Transposed-RAM: patterns x addresses
• Hierarchical Search 2D BCAM: patterns x sets
• Indirectly-Indexed 2D (II2D) BCAM: patterns x sets, plus intra-set match indicators
Thank You!
Backup Slides
Conclusions
[Figure: side-by-side comparison of Brute-Force (patterns x addresses), Hierarchical (patterns x sets), and II2D (patterns x sets, with match indicators) on the criteria BRAM-based, single-cycle, deep, and wide.]