Post on 14-Jan-2016
Segmented Hash: An Efficient Hash Table Implementation for High Performance Networking Subsystems
Sailesh Kumar, Patrick Crowley
2 - Sailesh Kumar - 04/21/23
Problem Statement
How to implement deterministic hash tables
Near worst case O(1) deterministic performance
We are given a small amount of on-chip memory
On-chip memory limited to 1-2 bytes per table entry
In this paper we tackle the above problem
Hash Tables
A hash table uses a hash function to index the table entries
» hash("apple") = 5
» hash("watermelon") = 3
» hash("grapes") = 9
» hash("cantaloupe") = 7
» hash("kiwi") = 0
» hash("mango") = 6
» hash("banana") = 2
» hash("honeydew") = 2
This is called a collision
» Now what?
[Figure: a table with buckets 0 to 9 holding kiwi (0), banana (2), watermelon (3), apple (5), mango (6), cantaloupe (7), and grapes (9). Three ways to place honeydew are shown: linear probing, double hashing with Hash2(honeydew) = 3, and linear chaining.]
The number of keys mapped to a bucket is called the collision chain length
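The fruit example can be replayed in a few lines. This is a minimal sketch of linear chaining only; the hash values are copied from the slide rather than computed, and the names `SLIDE_HASH` and `build_chained_table` are illustrative, not from the talk.

```python
from collections import defaultdict

# Hash values copied from the slide; a real table would compute them.
SLIDE_HASH = {
    "apple": 5, "watermelon": 3, "grapes": 9, "cantaloupe": 7,
    "kiwi": 0, "mango": 6, "banana": 2, "honeydew": 2,
}

def build_chained_table(keys, num_buckets=10):
    """Insert keys with linear chaining: colliding keys share one bucket list."""
    table = defaultdict(list)
    for key in keys:
        table[SLIDE_HASH[key] % num_buckets].append(key)
    return table

table = build_chained_table(SLIDE_HASH)
# Collision chain length = number of keys stored in each occupied bucket.
chain_lengths = {bucket: len(chain) for bucket, chain in table.items()}
```

Here bucket 2 ends up holding both banana and honeydew, so its collision chain length is 2 while every other occupied bucket has length 1.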
Performance Analysis
Average performance is O(1)
However, worst-case performance is O(n)
In fact, the probability of a collision chain > 1 is pretty high
[Figure: probability (0 to 0.4) versus load m/n (10% to 100%), with one curve for collision chain > 1 and one for collision chain > 2. Annotations on the curves: "These keys will take twice as long to be probed" and "These will take three times as long to be probed".]
There is a pretty high probability that performance drops to one half or one third
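The trend in the plot can be checked with a small Monte Carlo run. This is a sketch under stated assumptions: ideal uniform hashing is modelled with a seeded random generator, and the probability is measured per bucket (the slide does not say whether it is per bucket or per key). The function name `chain_probabilities` is illustrative.

```python
import random
from collections import Counter

def chain_probabilities(num_buckets, load, seed=0):
    """Estimate the fraction of buckets with a collision chain > 1 and > 2.

    `load` is keys per bucket (1.0 means 100% load). Uniform hashing is
    an assumption, modelled by drawing a random bucket for every key.
    """
    rng = random.Random(seed)
    num_keys = int(num_buckets * load)
    occupancy = Counter(rng.randrange(num_buckets) for _ in range(num_keys))
    gt1 = sum(1 for count in occupancy.values() if count > 1) / num_buckets
    gt2 = sum(1 for count in occupancy.values() if count > 2) / num_buckets
    return gt1, gt2

for load_percent in (10, 50, 100):
    print(load_percent, chain_probabilities(100_000, load_percent / 100))
```

Both fractions grow with load, which is the behaviour the slide describes.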
Segmented Hashing
Uses the power of multiple choices
» Has been proposed and used earlier by several authors
An N-way segmented hash
» Logically divides the hash table array into N equal segments
» Maps each incoming key onto one bucket from each segment
» Picks the bucket that is either empty or holds the fewest keys
[Figure: a 4-way segmented hash table. h(k_i) and h(k_i+1) each select one candidate bucket per segment, and each key is mapped to the least loaded candidate. The numbers in the buckets show how many keys each one holds.]
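The insertion rule above can be sketched as follows. This is an illustration, not the authors' implementation: it assumes a single hash function whose value is reused as the bucket index in every segment, uses SHA-256 only as a convenient stand-in hash, and the class name `SegmentedHash` is hypothetical.

```python
import hashlib

class SegmentedHash:
    """N-way segmented hash table with chaining inside each bucket."""

    def __init__(self, num_segments, buckets_per_segment):
        self.buckets_per_segment = buckets_per_segment
        self.segments = [
            [[] for _ in range(buckets_per_segment)]
            for _ in range(num_segments)
        ]

    def _bucket_index(self, key):
        # Stand-in hash (an assumption): one index, reused in every segment.
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.buckets_per_segment

    def insert(self, key):
        index = self._bucket_index(key)
        # One candidate bucket per segment; pick the least loaded one.
        loads = [len(segment[index]) for segment in self.segments]
        chosen = loads.index(min(loads))  # ties go to the lowest segment
        self.segments[chosen][index].append(key)
        return chosen

    def lookup(self, key):
        index = self._bucket_index(key)
        # Without filters, every segment's candidate bucket may be probed.
        for number, segment in enumerate(self.segments):
            if key in segment[index]:
                return number
        return None
```

With `SegmentedHash(4, 1)` every key shares the same index, so four inserts land in segments 0, 1, 2, and 3 in turn and no chain grows past length 1.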
Segmented Hash Performance
More segments improve the probabilistic performance
» With 64 segments, the probability of a collision chain > 2 is nearly zero even at 100% load
» More deterministic hash table performance
[Figure: Prob. {collision chain > 1} versus load m/n (10% to 100%), log scale from 1E-15 to 1E+00, with curves for 1, 4, 8, 16, 32, and 64 segments.]
[Figure: Prob. {collision chain > 2} versus load m/n (10% to 100%), log scale from 1E-15 to 1E+00, with curves for 1, 4, 8, 16, and 32 segments.]
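The effect of the segment count can be explored with a short simulation. This is a sketch under assumptions: uniform hashing is modelled with a seeded random generator, the probability is measured per bucket, and the function name `fraction_chains_over` is illustrative. It will not resolve probabilities as small as those on the log-scale plots, only the overall trend.

```python
import random

def fraction_chains_over(limit, num_segments, buckets_per_segment, load, seed=0):
    """Fraction of buckets whose chain exceeds `limit` in an N-way segmented hash."""
    rng = random.Random(seed)
    total_buckets = num_segments * buckets_per_segment
    counts = [[0] * buckets_per_segment for _ in range(num_segments)]
    for _ in range(int(total_buckets * load)):
        index = rng.randrange(buckets_per_segment)  # models a uniform hash value
        # Place the key in the least loaded of the N candidate buckets.
        target = min(counts, key=lambda segment: segment[index])
        target[index] += 1
    over = sum(1 for segment in counts for count in segment if count > limit)
    return over / total_buckets

# Same total number of buckets (16384), split into 1 or 64 segments.
single = fraction_chains_over(1, 1, 16384, 1.0)
segmented = fraction_chains_over(1, 64, 256, 1.0)
```

Keeping the total bucket count fixed while raising the number of segments isolates the benefit of having more choices per key.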
An Obvious Deficiency
O(N) memory probes per query
» Requires N times higher memory bandwidth
How can we ensure O(1) memory probes per query?
Use Bloom filters implemented in small on-chip memory (they filter out unnecessary memory accesses)
Before going further, a brief introduction to Bloom filters
[Figure: a 4-way segmented hash table. h(k_i) selects one bucket in each segment, so every query requires 4 probes.]
Bloom Filter
[Figure: inserting key X into a Bloom filter. The hash functions H1, H2, H3, H4, ..., Hk each set one bit of the m-bit array to 1.]
Bloom Filter
[Figure: inserting key Y. H1, H2, H3, H4, ..., Hk set further bits of the m-bit array to 1, alongside the bits already set for X.]
Bloom Filter
[Figure: querying key X. All bits of the m-bit array selected by H1, H2, H3, H4, ..., Hk are 1, so the filter reports a match.]
Bloom Filter
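[Figure: querying key W, which was never inserted. All bits of the m-bit array selected by H1, H2, H3, H4, ..., Hk happen to be 1, so the filter reports a match (false positive).]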
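The Bloom filter walkthrough can be condensed into a short sketch. The hash functions H1 to Hk are modelled here by salting SHA-256 with the function index, which is an assumption made for illustration; the class and method names are hypothetical.

```python
import hashlib

class BloomFilter:
    """m-bit array with k hash functions; supports insert and membership test."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _positions(self, key):
        # H1..Hk modelled by salting SHA-256 with the hash index (an assumption).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for position in self._positions(key):
            self.bits[position] = 1

    def might_contain(self, key):
        # True for every inserted key; may also be True for others (false positive).
        return all(self.bits[position] for position in self._positions(key))

bloom = BloomFilter(num_bits=64, num_hashes=4)
bloom.add("X")
bloom.add("Y")
```

A query for X or Y always reports a match. A query for an absent key such as W reports a match only if all of its k bits were set by other keys, which is the false positive case on the last slide.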
Adding per Segment Filters
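[Figure: a segmented hash table with one Bloom filter of mb bits per segment. The hash functions h1(ki), h2(ki), ..., hk(ki) select the filter bits, and h(k_i) selects one bucket per segment. k_i can go to any of the 3 buckets.]
We can select any of the above three segments and insert the key into the corresponding filter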
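A sketch of how per-segment filters cut down the probes: on lookup, only segments whose filter reports a match are probed. Everything here is illustrative. The salted SHA-256 hashes, the class name `FilteredSegmentedHash`, and the `probes` counter are assumptions, and the segment is chosen by bucket load only, without the selective insertion described later.

```python
import hashlib

def _hash(key, salt, modulus):
    # Stand-in hash family (an assumption): SHA-256 with a salt string.
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus

class FilteredSegmentedHash:
    def __init__(self, num_segments, buckets_per_segment, filter_bits, num_hashes):
        self.buckets_per_segment = buckets_per_segment
        self.filter_bits = filter_bits
        self.num_hashes = num_hashes
        self.segments = [[[] for _ in range(buckets_per_segment)]
                         for _ in range(num_segments)]
        # One Bloom filter per segment, modelled as a plain bit list.
        self.filters = [[0] * filter_bits for _ in range(num_segments)]
        self.probes = 0  # bucket probes made by lookups

    def _filter_positions(self, key):
        return [_hash(key, f"f{i}", self.filter_bits)
                for i in range(self.num_hashes)]

    def insert(self, key):
        index = _hash(key, "bucket", self.buckets_per_segment)
        loads = [len(segment[index]) for segment in self.segments]
        chosen = loads.index(min(loads))
        self.segments[chosen][index].append(key)
        for position in self._filter_positions(key):
            self.filters[chosen][position] = 1
        return chosen

    def lookup(self, key):
        index = _hash(key, "bucket", self.buckets_per_segment)
        positions = self._filter_positions(key)
        for number, bits in enumerate(self.filters):
            if all(bits[position] for position in positions):
                self.probes += 1  # only matching filters cost a probe
                if key in self.segments[number][index]:
                    return number
        return None
```

A lookup in an empty table makes no probes at all, and a lookup for a stored key makes one probe plus one for each false positive in an earlier segment.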
False Positive Rates
With Bloom filters, there is a likelihood of false positives
» A false positive means an unnecessary memory access
With N segments, the false positive rate will clearly be at least N times higher
» In fact, it will be even higher, because we also have to consider several permutations of false positives
We use the Selective Filter Insertion algorithm, which reduces the false positive rate by several orders of magnitude
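The factor of N can be motivated with the standard Bloom filter approximation. This is a sketch, not the analysis from the paper: it assumes every per-segment filter has the same false positive probability f, and the symbol n_s (keys stored in one segment's filter) is introduced here for illustration, alongside mb bits per filter and k hash functions.

```latex
% False positive probability of one per-segment filter
% (standard approximation; n_s is an assumed symbol).
\[
  f \approx \left(1 - e^{-k n_s / m_b}\right)^{k}
\]
% A query for an absent key consults all N filters. By linearity of
% expectation, the expected number of unnecessary probes is
\[
  \mathbb{E}[\text{unnecessary probes}] = \sum_{i=1}^{N} f_i = N f .
\]
```

Compared with a single filter of false positive probability f, the expected number of wasted memory accesses per absent key therefore grows by a factor of N.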
Selective Filter Insertion Algorithm
[Figure: the segmented hash table with one Bloom filter of mb bits per segment, indexed by h1(ki), h2(ki), ..., hk(ki). h(k_i) selects one bucket per segment, and k_i can go to any of the 3 buckets.]
Insert the key into segment 4, since fewer bits are set. Fewer bits set => lower false positive rate
With more segments (or more choices), our algorithm sets far fewer bits in the Bloom filter
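The choice on this slide can be sketched as picking, among the candidate segments, the one whose filter needs the fewest new bits set for the key. That reading of "fewer bits are set" is an assumption, and the function names below are hypothetical.

```python
def pick_segment(filters, positions, candidates):
    """Return the candidate segment whose filter needs the fewest new bits."""
    def new_bits(segment):
        # Bits the key would turn from 0 to 1 in this segment's filter.
        return sum(1 for position in set(positions)
                   if not filters[segment][position])
    return min(candidates, key=new_bits)  # ties go to the first candidate

def insert_into_filter(filters, positions, candidates):
    """Set the key's bits in the chosen segment's filter and return the segment."""
    segment = pick_segment(filters, positions, candidates)
    for position in positions:
        filters[segment][position] = 1
    return segment

# Four segments with 12-bit filters, indexed 0 to 3 (so index 3 is the
# slide's segment 4); the key hashes to filter bits 1, 5, and 9.
filters = [[0] * 12 for _ in range(4)]
filters[0][1] = 1                    # index 0 already has one of the bits
filters[3][1] = filters[3][5] = 1    # index 3 already has two of them
chosen = insert_into_filter(filters, [1, 5, 9], candidates=[0, 1, 3])
```

Here the segment at index 3 is chosen because only one new bit has to be set there, against two at index 0 and three at index 1.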
Selective Filter Insertion Details
Greedy policy
» For every arriving key, we choose the segment where the fewest bits are set in the Bloom filter
We show that this leads to unbalanced segments
» Reduced performance
Selective Filter Insertion Algorithm
[Figure sequence, one slide per key: keys k1, k2, k3, k4, and k5 are inserted in turn, with h( ) selecting the candidate buckets and h1( ), h2( ) selecting the filter bits. More filter bits are set after each insertion. The k5 slide is annotated "Reduced No. of choices".]
Selective Filter Insertion Enhancement
The objective is to keep the segments balanced
We might need to make sub-optimal choices at times
One way is to avoid the most loaded segment
» Reduces the number of choices by 1
However, this leads to situations where two segments alternately take the lead
Things get complicated
» A more detailed version of the algorithm can be found in the paper
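The simple variant mentioned above, avoiding the most loaded segment, can be sketched as follows. This is only the basic idea, not the detailed algorithm from the paper. It assumes "most loaded" means the segment holding the most keys overall, and the function name and `segment_loads` parameter are hypothetical.

```python
def pick_balanced_segment(filters, positions, candidates, segment_loads):
    """Greedy filter choice that skips the most loaded segment when possible."""
    most_loaded = max(range(len(segment_loads)),
                      key=lambda segment: segment_loads[segment])
    # Dropping the most loaded segment reduces the number of choices by 1.
    # Fall back to all candidates if nothing else is left.
    allowed = [s for s in candidates if s != most_loaded] or list(candidates)

    def new_bits(segment):
        return sum(1 for position in set(positions)
                   if not filters[segment][position])

    return min(allowed, key=new_bits)

# Segment 0 would be the greedy pick (no new bits) but is the most loaded.
filters = [[0] * 8 for _ in range(3)]
filters[0][1] = filters[0][4] = 1
filters[2][1] = 1
choice = pick_balanced_segment(filters, [1, 4], [0, 1, 2], segment_loads=[10, 2, 3])
```

In this example the choice falls on segment 2, a sub-optimal pick for the filter that keeps the segments more evenly loaded.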
Selective Filter Insertion Results
[Figure: false positive probability (log scale from 1E-11 to 1E-01) versus Bloom filter bits per hash table entry (8 to 64), with optimum k and 64 segments. One curve shows a normal Bloom filter and one shows Selective Filter Insertion.]
Simulation Results
64K buckets, 32 bits/entry Bloom filter. The simulation runs for 500 phases.
» During every phase, 100,000 random searches are performed. Between two phases, 10,000 random keys are deleted and inserted.
Hash policy = linear chaining
[Figure: average search time (1 to 1.5) versus load (0% to 100%), with curves for 1, 4, 16, and 64 segments.]
Conclusion
We presented a way to implement
» Hash tables with deterministic performance
» We utilize small on-chip memory to achieve it
» We also show that on-chip memory requirements are modest
» Well within Moore's law
» A 1M-entry hash table, for example, needs 1-2 MB of on-chip memory
Questions?