
Cuckoo Filter: Practically Better Than Bloom Author: Bin Fan, David G. Andersen, Michael Kaminsky,...

Date post: 18-Jan-2018
Description:
Introduction: We propose the cuckoo filter, a practical data structure that provides four major advantages:
1. It supports adding and removing items dynamically.
2. It provides higher lookup performance than traditional Bloom filters, even when close to full (e.g., 95% space utilized).
3. It is easier to implement than alternatives such as the quotient filter.
4. It uses less space than Bloom filters in many practical applications, if the target false positive rate ε is less than 3%.


Transcript

Cuckoo Filter: Practically Better Than Bloom
Authors: Bin Fan, David G. Andersen, Michael Kaminsky, Michael D. Mitzenmacher
Publisher: ACM CoNEXT 2014
Presenter: Yi-Hao Lai
Date: 2015/10/14
Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan R.O.C.
National Cheng Kung University CSIE Computer & Internet Architecture Lab

Introduction
Many databases, caches, routers, and storage systems use approximate set membership tests to decide whether a given item is in a (usually large) set, with some small false positive probability. The most widely used data structure for this test is the Bloom filter, which has been studied extensively due to its memory efficiency. A limitation of standard Bloom filters is that one cannot remove existing items without rebuilding the entire filter (or possibly introducing generally less desirable false negatives).

Introduction (continued)
We propose the cuckoo filter, a practical data structure that provides four major advantages:
1. It supports adding and removing items dynamically.
2. It provides higher lookup performance than traditional Bloom filters, even when close to full (e.g., 95% space utilized).
3. It is easier to implement than alternatives such as the quotient filter.
4. It uses less space than Bloom filters in many practical applications, if the target false positive rate ε is less than 3%.

Bloom filter
A Bloom filter provides a compact representation of a set of items that supports two operations: Insert and Lookup. It allows a tunable false positive rate ε, so that a query returns either "definitely not" or "probably yes". The lower ε is, the more space the filter requires.
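
The two Bloom filter operations described above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the class name BloomFilter, the parameters m (number of bits) and k (number of hash functions), and the use of salted SHA-256 to derive the k positions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: m bits probed by k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, chosen for readability

    def _positions(self, item: str):
        # Assumption: derive k positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item: str) -> None:
        # Set every bit the item hashes to.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def lookup(self, item: str) -> bool:
        # False means "definitely not"; True means "probably yes".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=2)
for x in ("13", "22"):
    bf.insert(x)
print(bf.lookup("13"))  # True: inserted items are never reported missing
print(bf.lookup("16"))  # False unless "16" happens to be a false positive
```

Note that there is no delete method: clearing a bit could also remove evidence of other items that share it, which is the limitation the introduction points out.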

Bloom filter (insert)
[Figure: inserting inputs with two hash functions (hash I and hash II); the set grows from { 13 } to { 13, 22 }.]

Bloom filter (lookup)
[Figure: looking up input 16 with hash I and hash II against the set { 13, 22, 6, 2 }; the possible answers are "definitely not" and "probably yes".]

Bloom filter

Bloom filter and Variants

Cuckoo Hash Tables
A basic cuckoo hash table consists of an array of buckets, where each item has two candidate buckets determined by hash functions h1(x) and h2(x). The lookup procedure checks both buckets to see if either contains the item. Cuckoo hash tables support insert and delete.

Cuckoo Hash Tables (insert)

Cuckoo Hash Tables
Cuckoo hashing ensures high space occupancy because it refines earlier item-placement decisions when inserting new items. Most practical implementations of cuckoo hashing extend the basic description above by using buckets that hold multiple items. With proper configuration of cuckoo hash table parameters, the table space can be 95% filled with high probability.

Cuckoo Filter
The cuckoo filter improves hash table performance with an optimization called partial-key cuckoo hashing. To reduce the hash table size, each item is first hashed into a constant-sized fingerprint before being inserted into the hash table. The basic unit of the cuckoo hash tables used for our cuckoo filters is called an entry. Each entry stores one fingerprint. The hash table consists of an array of buckets, where a bucket can have multiple entries.
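
The structure just described (buckets of entries, one fingerprint per entry) can be sketched as follows. This is a hedged illustration rather than the authors' implementation. It assumes the partial-key scheme in which the alternate bucket is the current bucket index XORed with a hash of the fingerprint, which requires a power-of-two number of buckets. The class name CuckooFilter, the helper names, the SHA-256 based hashing, and the default parameters are all hypothetical choices for the example.

```python
import hashlib
import random


def _hash(data: bytes) -> int:
    # Assumption: SHA-256 stands in for whatever hash a real filter would use.
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class CuckooFilter:
    def __init__(self, num_buckets=1024, bucket_size=4, fp_bits=8, max_kicks=500):
        # A power-of-two bucket count makes the XOR mapping its own inverse.
        assert num_buckets & (num_buckets - 1) == 0
        assert 1 <= fp_bits <= 32
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.fp_bits = fp_bits
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    def _fingerprint(self, item: str) -> int:
        return _hash(b"fp:" + item.encode()) % (1 << self.fp_bits)

    def _index(self, item: str) -> int:
        return _hash(b"idx:" + item.encode()) % self.num_buckets

    def _alt_index(self, index: int, fp: int) -> int:
        # Partial-key cuckoo hashing: only the fingerprint is needed to move.
        return (index ^ _hash(fp.to_bytes(4, "big"))) % self.num_buckets

    def _candidates(self, item: str):
        fp = self._fingerprint(item)
        i1 = self._index(item)
        return fp, i1, self._alt_index(i1, fp)

    def insert(self, item: str) -> bool:
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a resident fingerprint and relocate it.
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = random.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # table treated as full; the last evicted fingerprint is dropped

    def lookup(self, item: str) -> bool:
        fp, i1, i2 = self._candidates(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item: str) -> bool:
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)  # removes one copy only
                return True
        return False
```

Because the alternate index depends only on the current index and the stored fingerprint, an evicted entry can be relocated without access to the original item. Deleting is only safe for items that were actually inserted, since removing a fingerprint that belongs to a different item would introduce a false negative.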

Cuckoo Filter (insert)

Cuckoo Filter (lookup)

Cuckoo Filter (delete)

Asymptotic Behavior

Minimum Fingerprint Size

Empirical Evaluation
For the experiments, we varied the fingerprint size f from 1 to 20 bits. Random 64-bit keys are inserted into an empty filter until a single insertion relocates existing fingerprints more than 500 times.

Space Optimization
Although each entry of the hash table stores one fingerprint, not all entries are occupied. As a result, each item effectively costs more to store than a fingerprint. The amortized space cost C for each item is C = f / α bits, where f is the fingerprint size and α is the load factor (the fraction of entries that are occupied).

Optimal Bucket Size
Larger buckets improve table occupancy. The load factor is 50% when the bucket size b = 1, but increases to 84%, 95%, or 98% with bucket size b = 2, 4, or 8, respectively.

Semi-sorting Buckets
Semi-sorting is a technique for cuckoo filters with b = 4 entries per bucket that saves one bit per item. Assume each bucket contains b = 4 fingerprints and each fingerprint is f = 4 bits.
An uncompressed bucket occupies 4 × 4 = 16 bits. If we sort all four 4-bit fingerprints stored in this bucket, there are only 3876 possible outcomes in total. If these values are precomputed, each original bucket can be represented by a 12-bit index (3876 < 2^12 = 4096) instead of 16 bits.

Space and lookup cost

Comparison with Bloom filter
Space Efficiency
Number of Memory Accesses (Bloom filter: k = 2 when ε = 25%, but k = 7 when ε = 1%)
Value Association
Maximum Capacity
Limited Duplicates

Experiment
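
The figures quoted on the semi-sorting and Bloom filter comparison slides can be checked with a short script. The sketch below counts sorted buckets by brute force and compares the result with the closed-form multiset count. For the Bloom filter it uses the standard space-optimal choice k = log2(1/ε) rounded up, which is a textbook result assumed here rather than something stated on the slides. The function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def sorted_bucket_count(b: int, f: int) -> int:
    """Count distinct sorted buckets of b fingerprints, each f bits wide."""
    # A sorted bucket is a multiset of size b drawn from 2**f possible values.
    return sum(1 for _ in combinations_with_replacement(range(2 ** f), b))


def index_bits(count: int) -> int:
    """Bits needed to index `count` precomputed bucket values."""
    return (count - 1).bit_length()


def bloom_optimal_k(eps: float) -> int:
    """Assumed rule: space-optimal hash count k = log2(1/eps), rounded up."""
    return math.ceil(math.log2(1 / eps))


b, f = 4, 4
count = sorted_bucket_count(b, f)
print(count)                              # 3876
print(math.comb(2 ** f + b - 1, b))       # 3876, the closed-form multiset count
print(index_bits(count))                  # 12
print((b * f - index_bits(count)) / b)    # 1.0 bit saved per item
print(bloom_optimal_k(0.25), bloom_optimal_k(0.01))  # 2 7
```

The saving comes entirely from discarding the order of fingerprints within a bucket: 16 bits shrink to a 12-bit index, which is 4 bits per bucket or 1 bit per fingerprint when b = 4.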

