The Bloom Paradox
Ori Rottenstreich
Joint work with
Yossi Kanizo and Isaac Keslassy
Technion, Israel
Problem Definition
• Requirement: A data structure at the user with a fast answer to membership queries (is $x \in S$?)
• Solutions:
o O(n) – Searching in a list
o O(log(n)) – Searching in a sorted list
o O(1) – But with false positives / negatives
[Figure: the user, the local cache S (access cost = 1), and the central memory M with all elements (access cost = 10).]
Two Possible Errors
• False Positive: $x \notin S$, but the data structure answers $x \in S$.
o Results in a redundant access to the local cache. Additional cost of 1.
• False Negative: $y \in S$, but the data structure answers $y \notin S$.
o Results in an expensive access to the central memory instead of the local cache. Additional cost of 10 - 1 = 9.
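The two additional costs can be checked with a few lines of Python. This is only a sketch of the cost model described on the slides (local cache cost 1, central memory cost 10); the function names `access_cost` and `extra_cost` are hypothetical and introduced here for illustration.

```python
LOCAL_COST = 1     # cost of one access to the local cache (from the slides)
CENTRAL_COST = 10  # cost of one access to the central memory (from the slides)


def access_cost(in_cache: bool, answer_in_cache: bool) -> int:
    """Total cost of serving one query, given the truth and the structure's answer."""
    if not answer_in_cache:
        return CENTRAL_COST              # go straight to the central memory
    if in_cache:
        return LOCAL_COST                # correct positive: the cache serves the query
    return LOCAL_COST + CENTRAL_COST     # false positive: a wasted cache access first


def extra_cost(in_cache: bool, answer_in_cache: bool) -> int:
    """Cost above what an error-free structure would pay for the same element."""
    ideal = access_cost(in_cache, in_cache)
    return access_cost(in_cache, answer_in_cache) - ideal
```

Under this model, `extra_cost(False, True)` is the false-positive penalty and `extra_cost(True, False)` is the false-negative penalty.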
Bloom Filters (Bloom, 1970)
• Initialization: Array of $m$ zero bits.
• Insertion: Each of the $n$ elements is hashed $k$ times, and the corresponding bits are set.
• Query: Hashing the element $k$ times, checking that all the corresponding bits are set.
• False positive rate (probability) of approximately $(1 - e^{-kn/m})^k$
• No false negatives
[Figure: example bit array, initially all zeros, showing insertions and queries of the elements x, y, z, and w.]
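The three operations above can be sketched as a small Python class. This is a minimal illustration, not the authors' implementation: the class name `BloomFilter`, the parameters `m` and `k`, and the use of salted SHA-256 to derive the hash positions are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: m bits, k hash positions per element."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m  # initialization: array of zero bits

    def _positions(self, item: str):
        # Illustrative choice: derive k positions by salting SHA-256 with the index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Insertion: set every bit the element hashes to.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # Query: positive only if all hashed bits are set (no false negatives).
        return all(self.bits[pos] for pos in self._positions(item))
```

For instance, after `bf = BloomFilter(m=64, k=3)` and `bf.add("x")`, the test `"x" in bf` is always true, while a test for an element that was never inserted may occasionally be true as well (a false positive).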
Bloom Filters are Widely Used
• Cache/Memory Framework
• Packet Classification
• Intrusion Detection
• Routing
• Accounting
• Beyond networking: Spell Checking, DNA Classification
• Can be found in
o Google's web browser Chrome
o Google's database system BigTable
o Facebook's distributed storage system Cassandra
o Mellanox's IB Switch System
Outline
Introduction to Bloom Filters
The Bloom Paradox
The Variable-Increment Counting Bloom Filter
The Bloom Paradox
Sometimes, it is better to disregard the Bloom filter results, and in fact not to even query it,
thus making the Bloom filter useless.
Example
• Parameters:
• Extreme case without locality: all elements have an equal probability of belonging to the cache.
o Toy example
The Bloom Paradox
• Let $B$ be the set of elements that the Bloom filter indicates are in $S$.
o In particular, no false negatives → $S \subseteq B$
• Intuition:
• Surprise: The Bloom filter indicates the membership of $|B|$ elements. Only $|S|$ of them are indeed in $S$.
[Figure: the user, the local cache S (cost = 1), the central memory M with all elements (cost = 10), and the set B of elements accepted by the Bloom filter, with S contained in B.]
• When the Bloom filter states that $x \in S$, it is wrong with probability $1 - |S|/|B|$.
• Average cost if we listen to the Bloom filter: $1 + 10 \cdot (1 - |S|/|B|)$
• Average cost if we don't: $10$
• In this example the first cost is the larger one: the Bloom filter is useless! Don't listen to the Bloom filter.
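The comparison between the two average costs can be reproduced numerically. The sketch below assumes a uniformly chosen query element (the "no locality" case) and uses hypothetical values: a cache of $10^4$ elements, a memory of $10^{10}$ elements, and a false positive rate of $10^{-3}$. These numbers are chosen for illustration and are not taken from the slides; the function names are hypothetical as well.

```python
def wrong_probability(size_s: int, size_m: int, fpr: float) -> float:
    """P(x not in S | filter says x in S) for an element x chosen uniformly in M."""
    false_positives = fpr * (size_m - size_s)  # expected number of elements in B but not in S
    return false_positives / (size_s + false_positives)


def cost_if_listening(p_wrong: float, local: float = 1, central: float = 10) -> float:
    """Expected cost of a positive answer when the local cache is tried first."""
    return local + p_wrong * central


# Hypothetical parameters, for illustration only.
p = wrong_probability(size_s=10**4, size_m=10**10, fpr=1e-3)
listen = cost_if_listening(p)  # try the cache, then the memory on a miss
ignore = 10                    # always go straight to the central memory
print(f"P(wrong) = {p:.4f}, listen = {listen:.2f}, ignore = {ignore}")
```

With these assumed values the positive answer is wrong with probability of about 0.999, so listening costs about 10.99 against 10 for ignoring the filter. In this cost model, listening only pays off when the probability of a wrong positive answer is below 0.9.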
Outline
Introduction to Bloom Filters
The Bloom Paradox
The Variable-Increment Counting Bloom Filter
Counting Bloom Filters (CBFs)
• Bloom filters do not support deletions of elements. Simply resetting bits might cause false negatives.
• The solution: Counting Bloom filters, which store an array of counters instead of bits.
o Insertion: Incrementing counters by one.
o Deletion: Decrementing counters by one.
o Query: Checking that counters are positive.
• The same false positive probability.
• Require too much memory, e.g. 57 bits per element for a false positive rate of $10^{-3}$.
[Figure: example counter array in which the insertions of x and y increment the hashed counters by one.]
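A counting Bloom filter can be sketched by replacing the bit array with an array of counters. As before, this is an illustrative sketch only: the class name, the parameters `m` and `k`, and the salted SHA-256 hashing are assumptions, and `remove` is only safe for elements that were previously added.

```python
import hashlib


class CountingBloomFilter:
    """Minimal counting Bloom filter: m counters, k hash positions per element."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, item: str):
        # Illustrative choice: derive k positions by salting SHA-256 with the index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1  # insertion: increment by one

    def remove(self, item: str) -> None:
        # Deletion: decrement by one. Only valid for items that were added before.
        for pos in self._positions(item):
            self.counters[pos] -= 1

    def __contains__(self, item: str) -> bool:
        # Query: all hashed counters must be positive.
        return all(self.counters[pos] > 0 for pos in self._positions(item))
```

Unlike resetting bits in a plain Bloom filter, removing one element here leaves the counts contributed by the other elements untouched, so no false negatives are introduced.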
Intuition for Variable Increments
• Upon query, we should consider the exact values of the counters and not just their positiveness.
• Can we design a deterministic scheme that exploits the exact values of the counters?
• Idea: Use variable increments to encode the element identity.
[Figure: example counter array queried for the elements y and z.]
Architecture
• Each hash entry contains a pair of counters:
o $c_1$, fixed increments → number of elements in entry (as in CBF)
o $c_2$, variable increments → weighted sum of elements
o weights from a pre-determined set
[Figure: array of entries, each holding a pair of counters c1 and c2.]
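Insertion
• We use two sets of hash functions:
o The first set uses hash functions whose range is the set of entry indices, i.e. it points to the set of entries.
o The second set uses hash functions whose range is the set of indices of the variable increments, i.e. it points to the set of variable increments.
• Insertion: At each entry selected by the first set, the two counters are updated as follows.
o $c_1$ is incremented by one.
o $c_2$ is incremented by a variable increment from the set, selected by the second hash function.
• Example 1:
[Figure: insertion of x, which adds the variable increments +4 and +8, and of z, which adds +4 and +13, to the array of counter pairs (c1, c2).]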
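The insertion rule can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the set `D = (1, 4, 8, 13)` is a hypothetical choice of variable increments (it contains the increments 4, 8, and 13 that appear in the example), and the class name, the helper `_hash`, and the SHA-256 based hashing are likewise introduced only for this example.

```python
import hashlib

D = (1, 4, 8, 13)  # hypothetical set of variable increments, chosen for illustration


def _hash(tag: str, i: int, item: str, modulus: int) -> int:
    """Illustrative hash: salted SHA-256 reduced modulo the range size."""
    digest = hashlib.sha256(f"{tag}:{i}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus


class VariableIncrementCBF:
    """Each entry holds a pair (c1, c2): element count and weighted sum."""

    def __init__(self, m: int, k: int, increments=D) -> None:
        self.m = m
        self.k = k
        self.increments = tuple(increments)
        self.c1 = [0] * m  # fixed increments: number of elements per entry
        self.c2 = [0] * m  # variable increments: weighted sum per entry

    def probes(self, item: str):
        """Yield (entry index, variable increment) for each of the k hash pairs."""
        for i in range(self.k):
            entry = _hash("entry", i, item, self.m)  # first set of hash functions
            inc = self.increments[_hash("inc", i, item, len(self.increments))]  # second set
            yield entry, inc

    def add(self, item: str) -> None:
        for entry, inc in self.probes(item):
            self.c1[entry] += 1    # fixed increment of one
            self.c2[entry] += inc  # variable increment taken from the set
```

After a single `add`, the total of `c1` equals `k`, and the total of `c2` equals the sum of the increments selected for that element.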
Query
• Query of $y$, whose variable increments are 4 and 8
• We ask whether
o 17 can be a sum of 2 elements from the set including 4
o 30 can be a sum of 3 elements from the set including 8
• If the answer to either question is no, then $y \notin S$.
• How should we pick the set of variable increments? We should use $B_h$ sequences!
[Figure: array of counter pairs (c1, c2) with the two entries probed by the query of y.]
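$B_h$ Sequences
• Definition 1: Let $D = \{v_1, v_2, \dots, v_\ell\}$ be a sequence of positive integers. Then, $D$ is a $B_h$ sequence iff all the sums $v_{i_1} + v_{i_2} + \dots + v_{i_h}$ with $1 \le i_1 \le i_2 \le \dots \le i_h \le \ell$ are distinct.
• Example 2: All the sums of $h$ elements of $D$ are distinct. Therefore, $D$ is a $B_h$ sequence.
• $B_h$ sequences are widely used in error-correcting codes.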
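Definition 1 translates directly into a brute-force check. The sketch below enumerates all multisets of $h$ elements and verifies that their sums are distinct; the function name `is_bh_sequence` is hypothetical, and the set `[1, 4, 8, 13]` is used here only as an illustrative input with distinct values.

```python
from itertools import combinations_with_replacement


def is_bh_sequence(values, h: int) -> bool:
    """True iff all sums of h elements (repetition allowed) of `values` are distinct."""
    sums = [sum(combo) for combo in combinations_with_replacement(values, h)]
    return len(sums) == len(set(sums))


print(is_bh_sequence([1, 4, 8, 13], 2))  # True: the 10 pairwise sums are distinct
print(is_bh_sequence([1, 4, 8, 13], 3))  # True: the 20 triple sums are distinct
print(is_bh_sequence([1, 2, 3], 2))      # False: 1 + 3 == 2 + 2
```

The enumeration grows quickly with the size of the set and with $h$, so this check is only meant for small examples.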
The $B_h$-CBF Scheme Query
• Example 3: the set of variable increments is a $B_h$ sequence.
o Query of $x$ (variable increments 1 and 4): the $B_h$-CBF can determine that $x \notin S$.
o Query of $y$ (variable increments 8 and 4): here, the pair of counters of an entry necessarily determines the variable increments that were summed into it. Since the increment of $y$ is not among them, the $B_h$-CBF can determine that $y \notin S$.
o Query of $z$ (variable increments 4 and 13): the $B_h$-CBF cannot exclude that $z \in S$.
[Figure: array of counter pairs (c1, c2) with the entries probed by the queries of x, y, and z.]
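The per-entry test used by the query can be sketched as a brute-force search: an entry with counters $(c_1, c_2)$ is compatible with an element whose variable increment is $v$ only if $c_2$ can be written as a sum of exactly $c_1$ increments, one of which is $v$. The function name `entry_may_contain` is hypothetical, and the set `D = (1, 4, 8, 13)` is an assumption made for illustration; the counter values 17 and 30 come from the query example in the slides.

```python
from itertools import combinations_with_replacement

D = (1, 4, 8, 13)  # hypothetical set of variable increments, chosen for illustration


def entry_may_contain(c1: int, c2: int, inc: int, increments=D) -> bool:
    """Can c2 be a sum of exactly c1 increments from the set, one of them `inc`?"""
    if c1 == 0:
        return False  # an empty entry cannot hold the element
    # Fix one summand to `inc` and search for the remaining c1 - 1 summands.
    rest = combinations_with_replacement(increments, c1 - 1)
    return any(inc + sum(combo) == c2 for combo in rest)


# Entry with c1 = 2, c2 = 17, queried with increment 4: 17 = 4 + 13.
print(entry_may_contain(2, 17, 4))  # True
# Entry with c1 = 3, c2 = 30, queried with increment 8: 22 is not a sum of two increments.
print(entry_may_contain(3, 30, 8))  # False
```

A query would report possible membership only if this test succeeds for every probed entry; a single failure is enough to conclude that the element is not in the set. The search is exponential in $c_1$, so a practical implementation would likely rely on a precomputed lookup instead.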
Experimental Results
• Internet trace (equinix-chicago) with real hash functions.
For the $B_h$-CBF, (with ).
Concluding Remarks
• The Bloom Paradox
o Discovery of the Bloom paradox
o Importance of the a priori membership probability
• The Variable-Increment Counting Bloom Filter
o Can extend many variants of the counting Bloom filter
o First time $B_h$ sequences are presented in networking applications
Thank You