Alternative Data Structures in Ruby
Tyler McMullen
Friday, February 19, 2010
Why?
• Speed
• Memory
• Clarity
What’s wrong with my favorite data structure, X?
Nothing. (Maybe.)
• Bloom Filter
• BK-tree
• Splay Tree
• Trie
Bloom Filters
• Tests for existence in a set
• Probabilistic
• Minimal memory use
100 million strings in a Set
Traditional Set: minimum 10 GB
Bloom Filter (false-positive rate 0.00001): 280 MB
Bloom Filter (false-positive rate 0.001): 170 MB
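Figures like these follow from the standard Bloom filter sizing formula m = −n·ln(p) / (ln 2)², where n is the number of items and p the target false-positive rate. A minimal sketch, assuming the numbers in parentheses are false-positive rates; the helper names `bloom_bits` and `bloom_hashes` are hypothetical. For 100 million items it gives roughly 286 MiB and 171 MiB, in line with the slide's figures; real implementations round sizes and hash counts, so exact numbers differ.

```ruby
# Optimal Bloom filter sizing (standard formulas):
#   bits   m = -n * ln(p) / (ln 2)^2
#   hashes k = (m / n) * ln 2
# bloom_bits and bloom_hashes are hypothetical helper names.
def bloom_bits(n, p)
  (-n * Math.log(p) / Math.log(2)**2).ceil
end

def bloom_hashes(n, p)
  ((bloom_bits(n, p).to_f / n) * Math.log(2)).round
end

n = 100_000_000
[0.00001, 0.001].each do |p|
  mib = bloom_bits(n, p) / 8.0 / 1024 / 1024
  puts format("p=%-8g %6.1f MiB, %d hashes", p, mib, bloom_hashes(n, p))
end
```

Halving the false-positive rate costs only about 1.44 extra bits per item, which is why the two filters differ by roughly 110 MiB rather than by orders of magnitude.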
[Figure: Bloom filter bit array with positions 0 to 7]
add: “to be or not to be”
add: “that is the question”
query: “whether ’tis nobler” → NO MATCH
query: “to be or not to be” → MATCH
query: “in the mind to suffer” → FALSE MATCH
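The add/query walkthrough above can be sketched in Ruby as a bit array plus k probe positions per item. A minimal sketch, assuming SHA-256 with double hashing (h1 + i·h2) in place of k independent hash functions; the class name and its parameters are hypothetical, not from the talk. A query says "maybe" only when every probed bit is set, which is how “in the mind to suffer” can produce a false match, while an added item is never reported missing.

```ruby
require "digest"

# Minimal Bloom filter sketch: +bits+ slots and +hashes+ probe positions per item.
# Class name and parameters are illustrative, not from the talk.
class BloomFilter
  def initialize(bits, hashes)
    @size = bits
    @hashes = hashes
    @bits = Array.new(bits, false) # booleans for clarity; real filters pack bits
  end

  # add: set every probe position for the item
  def add(item)
    positions(item).each { |i| @bits[i] = true }
    self
  end

  # query: true means "possibly present" (may be a false match);
  # false means "definitely absent"
  def include?(item)
    positions(item).all? { |i| @bits[i] }
  end

  private

  # Double hashing: derive k positions as h1 + i*h2 from one SHA-256 digest,
  # standing in for k independent hash functions.
  def positions(item)
    h1, h2 = Digest::SHA256.digest(item.to_s).unpack("Q>Q>")
    (0...@hashes).map { |i| (h1 + i * h2) % @size }
  end
end

filter = BloomFilter.new(1024, 3)
filter.add("to be or not to be")
filter.add("that is the question")
puts filter.include?("to be or not to be")  # true: added items are never missed
puts filter.include?("whether 'tis nobler") # usually false; true would be a false match
```

The filter never stores the strings themselves, only bits, which is where the memory savings over a traditional Set come from.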
File Server
[Figure: request flow. Request → exists? → Y: 200, N: 404]
[Figure: the same flow with a Bloom Filter answering exists?]
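In the file-server flow the filter sits in front of the disk check: a "no" from the filter is definitive and can return 404 at once, while a "maybe" still has to be confirmed on disk because of false positives. A minimal sketch; `status_for` and its arguments are hypothetical, and any object with an `include?` method (a Bloom filter, or a `Set` standing in for one here) can serve as the filter.

```ruby
require "set"
require "tmpdir"

# Hypothetical handler: consult the filter before touching the disk.
def status_for(path, filter, root)
  # Definite miss: a Bloom filter never forgets an added path, so skip the disk.
  return 404 unless filter.include?(path)
  # "Maybe": confirm on disk, which also catches the filter's false positives.
  File.exist?(File.join(root, path)) ? 200 : 404
end

Dir.mktmpdir do |root|
  File.write(File.join(root, "a.txt"), "hello")
  filter = Set["a.txt", "ghost.txt"] # "ghost.txt" plays the role of a false positive
  puts status_for("a.txt", filter, root)     # 200
  puts status_for("ghost.txt", filter, root) # 404, confirmed on disk
  puts status_for("b.txt", filter, root)     # 404, filter says no
end
```

Most 404s never reach the file system, which is the point when misses are common.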
Bloom Filter
• Tests for existence in a set
• Tiny Memory Footprint
• Excellent Speed
BK-tree
• Finds items within a given distance of a target
• Reduces the search space
• Works inside a metric space
Triangle Inequality
$$|d(x, y) - d(x, z)| \le d(y, z)$$
[Figure: points x, y and z, with d(x, y) = 4 and d(x, z) = 1; d(y, z) unknown]
$$|4 - 1| \le d(y, z)$$
$$3 \le d(y, z)$$
[Figure: the edge between y and z is now labelled ≥ 3]
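In a BK-tree this inequality is what lets whole branches be skipped. Take x to be a tree node, y the query word and z any word stored under the child edge labelled k, so that d(x, z) = k for every such z. With a search tolerance n (a symbol introduced here for illustration):

$$
d(y, z) \ge |d(x, y) - k|, \qquad |d(x, y) - k| > n \;\Longrightarrow\; d(y, z) > n
$$

So only edges with $d(x, y) - n \le k \le d(x, y) + n$ need to be followed. With the numbers above, $d(x, y) = 4$ and $k = 1$ give $d(y, z) \ge 3$, so for a tolerance of 2 or less the branch under that edge is skipped without computing any distance inside it.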
BK-tree
Words: paste, pasta, taser, pastor, shave, light
[Figure: BK-tree with root “paste”; every other word hangs off the root on an edge labelled with its edit distance from “paste”: pasta (1), pastor (2), taser (3), shave (4), light (5)]
query: “pastu”, 1 edit from the root “paste”
[Figure: the search follows only the edges labelled 1 and 2, reaching pasta and pastor]
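The walkthrough above can be sketched in Ruby using Levenshtein edit distance, the metric behind the paste/pasta labels. A minimal sketch: `levenshtein`, `BKTree`, `add` and `query` are hypothetical names, and `query` applies the triangle-inequality pruning rule by following only edges within the tolerance of the current distance.

```ruby
# Levenshtein edit distance (dynamic programming, two rows).
# It obeys the triangle inequality, so it is a valid BK-tree metric.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Minimal BK-tree sketch; class and method names are hypothetical.
class BKTree
  Node = Struct.new(:word, :children) # children: { edge distance => Node }

  def initialize
    @root = nil
  end

  def add(word)
    if @root.nil?
      @root = Node.new(word, {})
      return self
    end
    node = @root
    loop do
      d = levenshtein(word, node.word)
      return self if d.zero? # already stored
      child = node.children[d]
      if child.nil?
        node.children[d] = Node.new(word, {}) # new edge labelled d
        return self
      end
      node = child # an edge labelled d exists: descend and retry
    end
  end

  # Every stored word within +tolerance+ edits of +target+.
  def query(target, tolerance)
    return [] if @root.nil?
    results = []
    stack = [@root]
    until stack.empty?
      node = stack.pop
      d = levenshtein(target, node.word)
      results << node.word if d <= tolerance
      # Triangle inequality: only edges labelled d - tolerance .. d + tolerance
      # can lead to matches; all other subtrees are skipped.
      node.children.each do |edge, child|
        stack << child if (edge - d).abs <= tolerance
      end
    end
    results
  end
end

tree = BKTree.new
%w[paste pasta taser pastor shave light].each { |w| tree.add(w) }
p tree.query("pastu", 1) # ["paste", "pasta"]
```

Only three of the six words are ever compared against “pastu”; taser, shave and light are pruned by their edge labels alone.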
BK-tree
• Most often used for spelling correctors
• Works in any metric space
• Reduces the search space
Splay Tree
Tangent: Access Patterns
Access Patterns
Usually assumed to be random or evenly distributed.
Rarely the case.
Splay Tree
• Self-balancing binary search tree
• Moves each accessed item to the root, so the most accessed items stay near it
• The more uneven the access pattern, the better
Splay Tree
[Figure: splay tree holding the keys 1 to 14, rooted at 7 with subtrees rooted at 4 and 11, shown over several frames as the tree is restructured]
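The splaying behaviour can be sketched in Ruby. A minimal sketch of a recursive splay tree using the classic zig, zig-zig and zig-zag rotations; `SplayTree`, `root_key` and `keys` are hypothetical names. Every lookup or insert moves the touched key to the root, which is what keeps frequently accessed items near the top under uneven access patterns.

```ruby
# Minimal recursive splay tree; names are hypothetical, for illustration only.
class SplayTree
  Node = Struct.new(:key, :value, :left, :right)

  def initialize
    @root = nil
  end

  def root_key
    @root&.key
  end

  # Lookup splays the key (or the last node on its search path) to the root.
  def [](key)
    @root = splay(@root, key)
    @root && @root.key == key ? @root.value : nil
  end

  # Insert or update; the new or updated key always ends up at the root.
  def []=(key, value)
    if @root.nil?
      @root = Node.new(key, value)
      return value
    end
    @root = splay(@root, key)
    if @root.key == key
      @root.value = value
      return value
    end
    node = Node.new(key, value)
    if key < @root.key
      node.left, node.right = @root.left, @root
      @root.left = nil
    else
      node.left, node.right = @root, @root.right
      @root.right = nil
    end
    @root = node
    value
  end

  # In-order key list (sorted), handy for checking the tree stays a valid BST.
  def keys(node = @root, out = [])
    return out if node.nil?
    keys(node.left, out)
    out << node.key
    keys(node.right, out)
  end

  private

  def rotate_right(x)
    y = x.left
    x.left = y.right
    y.right = x
    y
  end

  def rotate_left(x)
    y = x.right
    x.right = y.left
    y.left = x
    y
  end

  # Returns the new subtree root: the node holding +key+, or the last node
  # visited while searching for it.
  def splay(node, key)
    return node if node.nil? || node.key == key
    if key < node.key
      return node if node.left.nil?
      if key < node.left.key # zig-zig
        node.left.left = splay(node.left.left, key)
        node = rotate_right(node)
      elsif key > node.left.key # zig-zag
        node.left.right = splay(node.left.right, key)
        node.left = rotate_left(node.left) if node.left.right
      end
      node.left.nil? ? node : rotate_right(node)
    else
      return node if node.right.nil?
      if key > node.right.key # zag-zag
        node.right.right = splay(node.right.right, key)
        node = rotate_left(node)
      elsif key < node.right.key # zag-zig
        node.right.left = splay(node.right.left, key)
        node.right = rotate_right(node.right) if node.right.left
      end
      node.right.nil? ? node : rotate_left(node)
    end
  end
end

tree = SplayTree.new
(1..14).each { |k| tree[k] = "v#{k}" }
puts tree[3]       # v3
puts tree.root_key # 3
```

No balance information is stored at all; the restructuring done on each access is what pays for itself when a few keys dominate the lookups.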
Splay Tree
• Made for very uneven access patterns
• Used in caches, garbage collectors, etc.
Trie
• O(k) lookup, add and removal, where k is the key length, regardless of how many keys are stored
• Ordered traversals
• Prefix matching
• Excellent memory usage (depending on implementation)
Trie
[Figure: a trie built by the insertions below, one letter per node]
add: “thin” (T → H → I → N)
add: “trap” (shares T, then R → A → P)
add: “bar” (new branch B → A → R)
add: “burp” (shares B, then U → R → P)
query: “trap”: follows T → R → A → P. Success!
query: “bumpkin”
query: “bupkis”: follows B → U, but U has no P child. Fail!
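The add and query steps above can be sketched as a trie with one hash of children per node. A minimal pure-Ruby sketch; it is a hypothetical stand-in that mimics the `add` and `children(prefix)` calls used in the Autocompleter, not the Trie class the talk used. Lookup visits one node per character, so its cost depends on the word's length rather than on how many words are stored, and `children` returns matches in sorted order.

```ruby
# Hypothetical pure-Ruby trie: one hash of children per node.
class Trie
  Node = Struct.new(:children, :terminal)

  def initialize
    @root = Node.new({}, false)
  end

  # Walk/create one node per character, then mark the last node as a word end.
  def add(word)
    node = word.each_char.reduce(@root) do |current, ch|
      current.children[ch] ||= Node.new({}, false)
    end
    node.terminal = true
    self
  end

  # Exact lookup: the path must exist and end on a word-end node.
  def include?(word)
    node = find(word)
    !node.nil? && node.terminal
  end

  # Every stored word that starts with +prefix+, in sorted order.
  def children(prefix)
    node = find(prefix)
    return [] if node.nil?
    collect(node, prefix, [])
  end

  private

  def find(prefix)
    node = @root
    prefix.each_char do |ch|
      node = node.children[ch]
      return nil if node.nil? # e.g. "bupkis" stops after b -> u
    end
    node
  end

  # Depth-first walk in key order, so results come out sorted.
  def collect(node, prefix, results)
    results << prefix if node.terminal
    node.children.keys.sort.each { |ch| collect(node.children[ch], prefix + ch, results) }
    results
  end
end

trie = Trie.new
%w[thin trap bar burp].each { |w| trie.add(w) }
p trie.include?("trap")   # true
p trie.include?("bupkis") # false
p trie.children("b")      # ["bar", "burp"]
```

Sorting the child keys during the walk is what gives the ordered traversals and prefix matching listed above.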
Trie
Example: Autocompleter
Trie
class Autocompleter
  def initialize(words)
    @trie = Trie.new
    words.each { |word| @trie.add(word) }
  end

  def query(word)
    @trie.children(word)
  end
end
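Trie
require 'json'
require 'rack'

class Autocompleter
  def initialize(words)
    @trie = Trie.new
    words.each { |word| @trie.add(word) }
  end

  # Rack entry point; the 'q' query parameter name is an assumption.
  def call(env)
    request = Rack::Request.new(env)
    word = request.params['q'].to_s
    # Rack bodies must respond to #each, so wrap the JSON string in an array.
    [200, { 'content-type' => 'application/json' }, [@trie.children(word).to_json]]
  end
end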
Conclusion: Data structures are cool.
Questions?