Post on 27-Jun-2020

Leveraging bloom filters on Redis

Cristian Castiblanco

me@cristian.io | cristian@scopely.com

https://cristian.io

Stream processing at Scopely

Idempotence

An operation is said to be idempotent when applying it multiple times has the same effect as applying it once.

Simplest approach to idempotence

Idempotence with Redis sets
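
A minimal sketch of set-based deduplication, assuming every event carries a unique ID. Redis `SADD` returns the number of members actually added, so a reply of 0 means the ID was already seen. The key name `events:seen`, the helper `process_once`, and the in-memory `FakeSetStore` stand-in are hypothetical and exist only so the example runs without a server; a real client would need to expose the same `sadd(key, member)` call.

```python
class FakeSetStore:
    """Hypothetical in-memory stand-in that mimics the reply of Redis SADD."""

    def __init__(self):
        self._sets = {}

    def sadd(self, key, member):
        members = self._sets.setdefault(key, set())
        if member in members:
            return 0  # already present: nothing was added
        members.add(member)
        return 1  # newly added


def process_once(store, key, event_id, handler):
    """Run handler only the first time event_id is seen; return True if it ran."""
    if store.sadd(key, event_id) == 1:
        handler(event_id)
        return True
    return False


store = FakeSetStore()
handled = []
for event_id in ["a", "b", "a"]:
    process_once(store, "events:seen", event_id, handled.append)
print(handled)  # ['a', 'b']
```

Because the check and the insert happen in a single command, two consumers racing on the same ID cannot both see it as new. The cost is that every ID is stored in full.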

Memory usage per idempotence store

320 million records/day ≈ 70GB of memory

Is there a better way?

• Space-efficient

• Cost-effective

• More performant

• Awesome

Enter bloom filters

Probabilistic data structure to check for item membership

Bloom filters query

• Definitely not in the set

• Probably in the set

• Configurable error rate
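
The query semantics above can be illustrated with a small pure-Python filter. This is a sketch for intuition only, not the implementation used in the talk: the class name, the SHA-256 based double hashing, and the chosen sizes are all assumptions.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter backed by a bit array."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # Any unset bit proves the item was never added ("definitely not").
        # All bits set only means "probably": other items may have set them.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


bf = BloomFilter(num_bits=10_000, num_hashes=7)
bf.add("foo")
print("foo" in bf)  # True: an added item is never reported as absent
print("bar" in bf)  # almost certainly False; True would be a false positive
```

The filter never stores the items themselves, which is where the space savings come from, and also why items cannot be listed or removed.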

Bloom filters space efficiency

Given 10,000,000 UUIDs...

• Redis set: 1GB

• Plain text: ~300 MB

• gzip: ~150 MB

• Bloom filter with 1e-05 error rate (i.e., 1 in 100,000): ~30MB

• Bloom filter with 1e-11 error rate (i.e., 1 in 100 billion): ~60MB
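
The Bloom filter figures can be sanity-checked with the standard sizing formulas for an optimally configured filter: m = -n·ln(p) / (ln 2)² bits and k = (m/n)·ln 2 hash functions. The helper name `bloom_size` is hypothetical, and a real implementation such as ReBloom may round or lay out memory differently, so treat the results as estimates.

```python
import math


def bloom_size(capacity, error_rate):
    """Return (bits, hash_count) for an optimally sized standard Bloom filter."""
    bits = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / capacity * math.log(2)))
    return bits, hashes


for p in (1e-5, 1e-11):
    bits, hashes = bloom_size(10_000_000, p)
    print(f"p={p:g}: {bits / 8 / 1e6:.0f} MB, {hashes} hash functions")
```

For 10,000,000 items this gives about 30 MB at 1e-05 and about 66 MB at 1e-11, in line with the numbers above. Tightening the error rate by six orders of magnitude only roughly doubles the size, because the size grows with the logarithm of the error rate.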

Memory usage comparison

Sets: 70GB vs. bloom filters: 7GB

Latency comparison

[Chart: Redis sets vs. bloom filters]

Bloom filters example

False positive == dropped data

Bloom filters characteristics

• Capacity

• Error rate probability

Scaling bloom filters
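
The diagrams for this part did not survive transcription. One widely used way to grow a filter past its initial capacity is the scalable Bloom filter: when the current layer is full, a larger layer with a tighter error rate is stacked on top, and a query checks every layer. The sketch below assumes that scheme; the class names, the growth factor of 2, and the tightening ratio of 0.5 are illustrative choices, not details taken from the talk.

```python
import hashlib
import math


class _Layer:
    """One fixed-capacity Bloom filter sized for (capacity, error_rate)."""

    def __init__(self, capacity, error_rate):
        self.capacity = capacity
        self.count = 0
        self.num_bits = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


class ScalableBloomFilter:
    def __init__(self, initial_capacity, error_rate, growth=2, ratio=0.5):
        self.initial_capacity = initial_capacity
        self.error_rate = error_rate
        self.growth = growth
        self.ratio = ratio
        self.layers = []
        self._add_layer()

    def _add_layer(self):
        i = len(self.layers)
        capacity = self.initial_capacity * self.growth ** i
        # Geometric series: the per-layer rates sum to at most error_rate.
        error = self.error_rate * (1 - self.ratio) * self.ratio ** i
        self.layers.append(_Layer(capacity, error))

    def __contains__(self, item):
        return any(item in layer for layer in self.layers)

    def add(self, item):
        """Add item; return False if it was (probably) already present."""
        if item in self:
            return False
        if self.layers[-1].count >= self.layers[-1].capacity:
            self._add_layer()
        self.layers[-1].add(item)
        return True


sbf = ScalableBloomFilter(initial_capacity=100, error_rate=0.001)
for i in range(1000):
    sbf.add(f"event-{i}")
print(len(sbf.layers), "layers")
```

Each extra layer adds memory and one more lookup per query, which is why a sensible initial capacity still matters even when the filter can grow.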

Tuning bloom filters

Size depends on capacity and error probability

Tuning bloom filters

• False positive probability:

  • Depends on your use case

• Initial capacity:

  • Can't be too generous

  • Can't be too conservative
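
To see why the initial capacity matters, the expected false positive rate of a single fixed-size filter can be estimated as (1 - e^(-k·n/m))^k, where m is the number of bits, k the number of hash functions, and n the number of items inserted. The sketch below sizes a filter for a hypothetical capacity of 1,000,000 items at a 1e-05 target and then varies the fill level; the function name and the numbers are illustrative assumptions.

```python
import math


def expected_fp_rate(num_bits, num_hashes, inserted):
    """Approximate false positive rate after `inserted` distinct items."""
    return (1 - math.exp(-num_hashes * inserted / num_bits)) ** num_hashes


capacity, target = 1_000_000, 1e-5
num_bits = math.ceil(-capacity * math.log(target) / math.log(2) ** 2)
num_hashes = round(num_bits / capacity * math.log(2))

for fill in (0.5, 1.0, 2.0, 4.0):
    rate = expected_fp_rate(num_bits, num_hashes, int(capacity * fill))
    print(f"{fill:>4.1f}x capacity -> false positive rate ~ {rate:.2e}")
```

A filter sized too conservatively and then overfilled degrades by orders of magnitude, while one sized too generously pays for bits that are never needed. A filter that scales by adding layers avoids the degradation but pays in memory and lookups instead.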

First attempt: Lua scripts

Second attempt: bloomd

github.com/armon/bloomd

bloomd drawbacks

• Lack of high availability

• No clustering support

• Maintenance

• Rigid API

• Feels like abandonware

ReBloom

Bloom filters as a Redis module

ReBloom example

> BF.RESERVE your_filter 0.00001 50000000
OK

> BF.ADD your_filter foo
1

> BF.EXISTS your_filter foo
1

> BF.EXISTS your_filter bar
0
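
From application code, the same commands can be sent through any Redis client that supports raw commands. The sketch below assumes a client object with an `execute_command(*args)` method, as redis-py provides. The helper `seen_before` and the `StubClient` stand-in are hypothetical; the stub keeps an exact set, so unlike a real filter it never produces false positives.

```python
def seen_before(client, filter_key, event_id):
    """Add event_id to the filter; return True if it was (probably) already there."""
    # BF.ADD replies 1 when the item is newly added, 0 when it may already exist.
    added = client.execute_command("BF.ADD", filter_key, event_id)
    return int(added) == 0


class StubClient:
    """Hypothetical stand-in so the sketch runs without a Redis server."""

    def __init__(self):
        self.items = set()

    def execute_command(self, *args):
        command, key, item = args
        if command != "BF.ADD":
            raise ValueError(f"unsupported command: {command}")
        if (key, item) in self.items:
            return 0
        self.items.add((key, item))
        return 1


client = StubClient()
for event_id in ["foo", "bar", "foo"]:
    if seen_before(client, "your_filter", event_id):
        print("skip duplicate:", event_id)
    else:
        print("process:", event_id)
```

As with `SADD`, the check and the insert are a single command. With a real filter, a false positive makes `seen_before` return True for an event that was never processed, which is the "false positive == dropped data" trade-off.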

ReBloom

• Clustering

• Redundancy/replication

• Lower cognitive overhead

• Powerful API

• No maintenance

Summary

• Bloom filters significantly reduce memory usage and latency

• Redis modules allow your custom data structures to scale

github.com/casidiablo

cristian.io