Scalable Low-Latency Indexes for a Key-Value Store
Ankita Kejriwal
With Arjun Gopalan, Ashish Gupta, Greg Hill, Zhihao Jia, Stephen Yang
and John Ousterhout
SLIK Slide 2
Thesis
A key-value store can support strongly consistent secondary indexes while operating at low latency and large scale.
Slide 3
Summary of Results
● Scalable Low-latency Indexes for a Key-value Store: SLIK
  § Enables multiple secondary keys for each object
  § Allows lookups and range queries on these keys
● Key design features:
  § Scalability using independent partitioning
  § Strong consistency using an ordered write approach
● Implemented in RAMCloud
● Performance:
  § Linear throughput increase with increasing number of partitions
  § 11-13 µs indexed reads
  § 29-37 µs durable writes/overwrites of objects with one indexed attribute
  § Latency approx. 2x non-indexed reads and writes
SLIK Slide 4
Talk Outline
● Motivation
● Design
● Performance
● Related Work
● Summary
SLIK Slide 7
Motivation
Traditional RDBMS: MySQL
NoSQL Systems
+ scalability
- data models - consistency
+ consistency
+ data models + low latency
Systems shown: H-Base, Espresso, PNUTS, Tao, Spanner, Megastore, HyperDex, MongoDB, H-Store, RAMCloud, Memcached, Yesquel
SLIK Slide 9
Talk Outline
● Motivation
● Design
● Performance
● Related Work
● Summary
SLIK Slide 10
Design
● Data model
● Scalability
● Strong consistency
● Storage
● Durability
● Availability
SLIK Slide 13
Design
● Scalability
  § Nearly constant low latency irrespective of the server span
  § Linear increase in throughput with the server span
● Strong consistency
Index Partitioning: Colocation
Slide 15
● Colocate index entries and objects
● One of the keys used to partition the objects and indexes
● No association between index partitions and index key ranges
Tablet rows: (primary key, index key, value). Indexlet entries: index key → primary key.
Server 1
  Tablet: (1, q, rose), (2, a, tulip), (3, n, violet)
  Indexlet: a → 2, n → 3, q → 1
Server 2
  Tablet: (4, e, clover), (5, v, daily), (6, g, iris)
  Indexlet: e → 4, g → 6, v → 5
Server 3
  Tablet: (7, m, lily), (8, b, dahlia)
  Indexlet: b → 8, m → 7
Metadata:
  tablet & indexlet w/ pk 1 to 3: S1
  tablet & indexlet w/ pk 4 to 6: S2
  tablet & indexlet w/ pk >= 7: S3
Index Partitioning: Colocation
Slide 19
Client query: objects with index key between m and q
Server 1
  Tablet: (1, q, rose), (2, a, tulip), (3, n, violet)
  Indexlet: a → 2, n → 3, q → 1
Server 2
  Tablet: (4, e, clover), (5, v, daily), (6, g, iris)
  Indexlet: e → 4, g → 6, v → 5
Server 3
  Tablet: (7, m, lily), (8, b, dahlia)
  Indexlet: b → 8, m → 7
Not Scalable!
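The colocation query above can be sketched in a few lines of Python. This is a toy in-memory model built from the slide's example data, not RAMCloud code; the names `SERVERS` and `colocated_range_lookup` are hypothetical. It illustrates why the approach does not scale: index key ranges are unrelated to partitions, so a range query has to visit every server.

```python
# Hypothetical in-memory model of the colocation example; not RAMCloud code.
# Each server holds a tablet (pk -> (index key, value)) and, implicitly,
# an indexlet that covers only the objects stored on that same server.
SERVERS = {
    "S1": {1: ("q", "rose"), 2: ("a", "tulip"), 3: ("n", "violet")},
    "S2": {4: ("e", "clover"), 5: ("v", "daily"), 6: ("g", "iris")},
    "S3": {7: ("m", "lily"), 8: ("b", "dahlia")},
}


def colocated_range_lookup(lo, hi):
    """Return (matching primary keys, servers contacted) for lo <= key <= hi."""
    contacted = []
    matches = []
    for name, tablet in SERVERS.items():
        # No server can be skipped: any of them may hold keys in the range.
        contacted.append(name)
        local_indexlet = sorted((sk, pk) for pk, (sk, _value) in tablet.items())
        matches += [pk for sk, pk in local_indexlet if lo <= sk <= hi]
    return sorted(matches), contacted


pks, contacted = colocated_range_lookup("m", "q")
print(pks, contacted)  # [1, 3, 7] ['S1', 'S2', 'S3']
```

Only Server 1 and Server 3 hold matching objects, yet Server 2 is contacted as well; with more servers the fan-out grows with cluster size.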
Index Partitioning: Independent
Slide 21
● Partition each index and table independently
● Partition each index according to sort order for that index
Tablet rows: (primary key, index key, value). Indexlet entries: index key → primary key.
Server 1
  Tablet: (1, q, rose), (2, a, tulip), (3, n, violet)
Server 2
  Tablet: (4, e, clover), (5, v, daily), (6, g, iris)
Server 3
  Tablet: (7, m, lily), (8, b, dahlia)
Server 4
  Indexlet: a → 2, b → 8, e → 4, g → 6
Server 5
  Indexlet: m → 7, n → 3, q → 1, v → 5
Metadata:
  tablet w/ pk 1 to 3: S1
  tablet w/ pk 4 to 6: S2
  tablet w/ pk >= 7: S3
  indexlet w/ sk a to g: S4
  indexlet w/ sk >= h: S5
Index Partitioning: Independent
Slide 25
Client query: objects with index key between m and q
Server 1
  Tablet: (1, q, rose), (2, a, tulip), (3, n, violet)
Server 2
  Tablet: (4, e, clover), (5, v, daily), (6, g, iris)
Server 3
  Tablet: (7, m, lily), (8, b, dahlia)
Server 4
  Indexlet: a → 2, b → 8, e → 4, g → 6
Server 5
  Indexlet: m → 7, n → 3, q → 1, v → 5
Scalable!
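The same query under independent partitioning can be sketched as follows. Again this is a toy model of the slide's example, not RAMCloud code; `INDEXLET_STARTS`, `TABLET_STARTS`, `owner`, and `independent_range_lookup` are hypothetical names standing in for the metadata shown earlier. The lookup first asks only the indexlets whose key range overlaps the query, then fetches objects only from the tablets that own the returned primary keys.

```python
import bisect

# Hypothetical model of the independent-partitioning example; not RAMCloud code.
TABLETS = {
    "S1": {1: ("q", "rose"), 2: ("a", "tulip"), 3: ("n", "violet")},
    "S2": {4: ("e", "clover"), 5: ("v", "daily"), 6: ("g", "iris")},
    "S3": {7: ("m", "lily"), 8: ("b", "dahlia")},
}
# Each indexlet is sorted by index key and covers a contiguous key range.
INDEXLETS = {
    "S4": [("a", 2), ("b", 8), ("e", 4), ("g", 6)],
    "S5": [("m", 7), ("n", 3), ("q", 1), ("v", 5)],
}
INDEXLET_STARTS = [("a", "S4"), ("h", "S5")]       # first index key per indexlet
TABLET_STARTS = [(1, "S1"), (4, "S2"), (7, "S3")]  # first primary key per tablet


def owner(starts, key):
    """Find the server whose range contains key, given sorted range starts."""
    pos = bisect.bisect_right([start for start, _ in starts], key) - 1
    return starts[max(pos, 0)][1]


def independent_range_lookup(lo, hi):
    """Return ({pk: value}, servers contacted) for lo <= index key <= hi."""
    contacted = []
    pks = []
    for i, (start, server) in enumerate(INDEXLET_STARTS):
        end = INDEXLET_STARTS[i + 1][0] if i + 1 < len(INDEXLET_STARTS) else None
        # Contact an indexlet only if its range [start, end) overlaps [lo, hi].
        if start <= hi and (end is None or lo < end):
            contacted.append(server)
            pks += [pk for sk, pk in INDEXLETS[server] if lo <= sk <= hi]
    objects = {}
    for pk in pks:
        server = owner(TABLET_STARTS, pk)
        if server not in contacted:
            contacted.append(server)
        objects[pk] = TABLETS[server][pk][1]
    return objects, contacted


objects, contacted = independent_range_lookup("m", "q")
print(objects, contacted)  # {7: 'lily', 3: 'violet', 1: 'rose'} ['S5', 'S3', 'S1']
```

Here the set of servers contacted depends on where the matching entries and objects live, not on the total number of servers: Server 2 and Server 4 are never involved in this query.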
Index Partitioning: Lookup Latency
[Figure: lookup latency (µs, 0 to 100) vs. number of servers (0 to 80). Series: Colocation size 1, Colocation size 10, Independent size 1, Independent size 10. Data labels range from 8.3 µs to 89.7 µs.]
Index Partitioning: Lookup Throughput
[Figure: throughput (10^3 lookups/sec, 0 to 4000) vs. number of indexlets (0 to 10). Series: Independent Partitioning, Colocation. Independent partitioning rises steadily to a data label of 3092; the colocation data labels all stay below 500.]
SLIK Slide 31
Design
● Scalability
  § Nearly constant low latency irrespective of the server span
  § Linear increase in throughput with the server span
  § Solution: Use independent partitioning
  § But: indexed object writes become distributed operations
  § Potential consistency issues between indexes and objects
● Strong consistency
  § With minimal performance overheads
SLIK Slide 32
Consistency Properties
● If an object contains a given secondary key, then an index lookup with that key will return the object
● If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range
SLIK Slide 34
Consistency Properties
● If an object contains a given secondary key, then an index lookup with that key will return the object
● If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range
Students: Alice, Bob, Carol, Frank, Peggy, Trent
Query: students with name between a – d?
Result: Alice, Carol ? (Bob is missing, which violates the first property)
SLIK Slide 37
Consistency Properties
● If an object contains a given secondary key, then an index lookup with that key will return the object
● If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range
Students: Alice, Bob, Carol, Frank, Peggy, Trent
Query: students with name between a – d?
Result: Alice, Bob, Carol, Peggy ? (Peggy is outside the range, which violates the second property)
SLIK Slide 38
Consistency Properties
● If an object contains a given secondary key, then an index lookup with that key will return the object
● If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range
Students: Alice, Bob, Carol, Frank, Peggy, Trent
Query: students with name between a – d?
Result: Alice, Bob, Carol
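The two consistency properties can be phrased as a small checker. The sketch below is only an illustration using the student names from these slides; `check_lookup` is a hypothetical helper, and it assumes "between a – d" means the first letter of the name falls in that inclusive range.

```python
STUDENTS = ["Frank", "Bob", "Alice", "Trent", "Carol", "Peggy"]


def check_lookup(stored_names, lo, hi, result):
    """Return (missing, spurious) names for one range lookup.

    missing:  stored names inside the range that the lookup failed to return
              (violations of the first property).
    spurious: returned names that are not stored names inside the range
              (violations of the second property).
    """
    def in_range(name):
        return lo <= name[0].lower() <= hi

    expected = {name for name in stored_names if in_range(name)}
    missing = sorted(expected - set(result))
    spurious = sorted(name for name in result if name not in expected)
    return missing, spurious


print(check_lookup(STUDENTS, "a", "d", ["Alice", "Carol"]))
print(check_lookup(STUDENTS, "a", "d", ["Alice", "Bob", "Carol", "Peggy"]))
print(check_lookup(STUDENTS, "a", "d", ["Alice", "Bob", "Carol"]))
```

The first call reports Bob as missing, the second reports Peggy as spurious, and the third reports no violations, matching the three query results shown on the slides.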
Consistency
Slide 39
● Consistency properties:
  § If an object contains a given secondary key, then an index lookup with that key will return the object
  § If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range
● Solution:
  § Longer index lifespan (via ordered writes)
  § Object data is ground truth and index entries serve as hints
Consistency
Slides 40-43
Example: object Foo is overwritten so that its secondary key changes from Bob to Sam.
Initial state: object "Foo Bob ..."; index entry Bob → Foo
1. Add new index entry: index entries are now Bob → Foo and Sam → Foo
2. Modify object: object becomes "Foo Sam ..."
3. Remove old index entry: the only index entry left is Sam → Foo
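The three ordered steps above can be sketched as follows. This is a single-process toy model, not the RAMCloud implementation; `objects`, `index`, `covered`, and `overwrite` are hypothetical names. After every step it records whether the object's current secondary key still has an index entry, which is what the first consistency property needs.

```python
objects = {}   # primary key -> current secondary key (the object is ground truth)
index = set()  # (secondary key, primary key) index entries


def covered(pk):
    """True if the object's current secondary key has an index entry."""
    return pk not in objects or (objects[pk], pk) in index


def overwrite(pk, new_sk, trace):
    """Write or overwrite an object using the ordered write approach."""
    old_sk = objects.get(pk)
    index.add((new_sk, pk))          # 1. add new index entry
    trace.append(covered(pk))
    objects[pk] = new_sk             # 2. modify object
    trace.append(covered(pk))
    if old_sk is not None and old_sk != new_sk:
        index.discard((old_sk, pk))  # 3. remove old index entry
    trace.append(covered(pk))


trace = []
overwrite("Foo", "Bob", trace)  # initial write: Bob -> Foo
overwrite("Foo", "Sam", trace)  # overwrite: secondary key changes to Sam
print(all(trace), sorted(index), objects)
```

Because the new entry is added before the object changes and the old entry is removed only afterwards, there is no point at which the object exists without a matching index entry. The price is that the index may briefly hold an extra entry, which is why entries are treated as hints.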
Consistency
Slide 47
[Figure: timeline of the ordered write approach. Object row: "Foo Bob ..." followed by "Foo Sam ...", with the events write object, modify object, and remove object, each marked as a commit point. Index entry row: the lifespans of the entries Bob → Foo and Sam → Foo along the same time axis.]
SLIK Slide 48
Design
● Scalability
  § Nearly constant low latency irrespective of the server span
  § Linear increase in throughput with the server span
  § Solution: Use independent partitioning
  § But: indexed object writes become distributed operations
  § Potential consistency issues between indexes and objects
● Strong consistency
  § With minimal performance overheads
  § Solution: Ordered write approach + treat indexes as hints
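Treating index entries as hints can be illustrated on the read path. The sketch below is a toy model, not RAMCloud code; `lookup` and the second object `Bar` are hypothetical additions for illustration. The index is captured midway through overwriting Foo from Bob to Sam, so it still holds the stale entry Bob → Foo; the lookup discards it by checking each candidate against the object itself.

```python
# Object data is ground truth: primary key -> current secondary key.
objects = {"Foo": "Sam", "Bar": "Bob"}

# Index state after steps 1 and 2 of the overwrite of Foo, before step 3:
# the stale entry ("Bob", "Foo") has not been removed yet.
index = [("Bob", "Bar"), ("Bob", "Foo"), ("Sam", "Foo")]


def lookup(lo, hi):
    """Return primary keys of objects whose secondary key is in [lo, hi]."""
    candidates = [pk for sk, pk in index if lo <= sk <= hi]  # hints only
    # Keep a candidate only if the object itself confirms the key.
    return sorted({pk for pk in candidates
                   if pk in objects and lo <= objects[pk] <= hi})


print(lookup("Bob", "Bob"))  # ['Bar']
print(lookup("Sam", "Sam"))  # ['Foo']
```

The stale hint costs one extra object check but never produces a wrong answer, which is how the second consistency property holds without locking the index and the object together.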
SLIK Slide 49
Talk Outline
● Motivation
● Design
● Performance
● Related Work
● Summary
SLIK Slide 50
Performance: Questions
● Does SLIK provide low latency?
● Does SLIK provide scalability?
● How does the performance of indexing with SLIK compare to other state-of-the-art systems?
SLIK Slide 51
Performance: Systems for Comparison
● H-Store:
  § Main memory database
  § Data (and indexes) partitioned based on specified attribute
  § Many parameters for tuning
    ● Got assistance from developers to tune for each test
    ● Examples: txn_incoming_delay, partitioning column
● HyperDex:
  § Spaces containing objects
  § Data (and indexes) partitioned using hyperspace hashing
  § Each index contains all object data
  § Designed to use disk for storage
SLIK Slide 52
Hardware
Slide 53
Latency
Experiments:
1. Lookups: table with single secondary index
2. Overwrites: table with single secondary index
3. Overwrites: varying number of secondary indexes
Configuration:
● Single client
● Single partition for table and (each) index
● Object: 30 B pk, 30 B sk, 100 B value
● SLIK: Three-way replication to durable backups
● H-Store: No replication, durability disabled, single server
Slide 54
Lookup Latency
[Figure, panel (a): lookup latency (µs) vs. size of index (# objects, 10^0 to 10^6). Data labels: H-Store 132.00 to 152.37; SLIK TCP 42.7 to 54.5; SLIK 10.2 to 13.1.]
[Figure, panel (c): overwrite latency (µs, log scale) vs. size of index (# objects, 10^0 to 10^6). Data labels: H-Store SK Partitioned 160.16 to 209.37; H-Store PK Partitioned 939.84 to 1048.86; HyperDex 648 to 671; SLIK TCP 123.8 to 135.8; SLIK 31.4 to 37.0.]
Overwrite Latency
Slide 56
[Figure, panel (b): overwrite latency (µs) vs. size of index (# objects, 10^0 to 10^6). Data labels: H-Store 133.54 to 153.28; SLIK TCP 123.8 to 135.8; SLIK 31.4 to 37.0.]
Multiple Secondary Indexes
Slide 57
[Figure: overwrite latency (µs, log scale) vs. number of indexes (0 to 10). Data labels: H-Store via PK 152.97 to 287.72; H-Store via SK 181.98 to 1756.99; SLIK TCP 138.5 to 184.6; SLIK 33.0 to 51.2.]
Slide 60
Scalability
Experiments:
1. Lookup throughput with increasing number of partitions
2. Lookup latency with increasing number of partitions
Configuration:
● Single table with one secondary index
● Table and index partitioned across servers
● Object: 30 B pk, 30 B sk, 100 B value
● Throughput experiment: Loaded system
● Latency experiment: Unloaded system
Slide 61
Scalability: Lookup Throughput
[Figure: throughput (10^3 lookups/sec, 0 to 5000) vs. number of servers (0 to 20). Series: H-Store, SLIK TCP, SLIK. Largest data labels: H-Store 396.27; SLIK TCP 1807; SLIK 5069.]
Slide 62
Scalability: Lookup Latency
[Figure: average latency per lookup (µs, 0 to 300) vs. number of indexlets (0 to 10). Data labels: H-Store 119.4 to 269.8; SLIK TCP 49.9 to 114.4; SLIK 13.1 to 14.7.]
SLIK Slide 63
Talk Outline
● Motivation
● Design
● Performance
● Related Work
● Summary
SLIK Slide 64
Related Work
Data storage system
  § Data model (spectrum from key-value to relational)
  § Consistency (spectrum from eventual to strong)
  § Performance: latency and/or throughput
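Current Web Scale Datastores
Slide 69 SLIK
[Figure: datastores plotted by consistency level against approximate read/write latency. Y-axis (consistency level, higher is better): Eventual; Causal, SI, "Define your own"; Strong. X-axis (read/write latency, approx., lower is better): 100s, 10s, 1s, 100ms, 10ms, 1ms, 100µs, 10µs. Systems shown: Cassandra, CouchDB, Tao, Megastore, H-Store, HyperDex, MongoDB, Spanner, Espresso, H-Base, PNUTS, and SLIK.]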
SLIK Slide 70
Talk Outline
● Motivation
● Design
● Performance
● Related Work
● Summary
SLIK Slide 71
Summary
A key-value store can support strongly consistent secondary indexes while operating at low latency and large scale.
● Secondary indexes: lookups and range queries on secondary keys
● Strongly consistent: by using ordered writes and treating indexes as hints
● Low latency: by using approaches that have minimal overheads we get 11-13 µs lookups and 30-37 µs (over)writes
● Large scale: by using independent partitioning we get linear throughput increase and minimal impact on latency as the scale increases
Thank you!
Code available free and open source: github.com/PlatformLab/RAMCloud
My papers and other information at: http://stanford.edu/~ankitak
I can be reached at: [email protected]