
PebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees

Pandian Raju [1], Rohan Kadekodi [1], Vijay Chidambaram [1,2], Ittai Abraham [2]

[1] The University of Texas at Austin  [2] VMware Research

What is a key-value store?

• Store any arbitrary value for a given key
• Insertions: put(key, value)
• Point lookups: get(key)
• Range queries: get_range(key1, key2)

Keys: 123, 124
Values: {"name": "John Doe", "age": 25}, {"name": "Ross Gel", "age": 28}
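The slides use an abstract API; as a rough illustration (my own example, not from the talk), the three operations map directly onto the LevelDB-style API that PebblesDB also exposes. The database path and keys below are made up.

#include <cassert>
#include <iostream>
#include <string>
#include "leveldb/db.h"

int main() {
  // Open (or create) a database; "/tmp/kvdemo" is an arbitrary example path.
  leveldb::DB* db;
  leveldb::Options options;
  options.create_if_missing = true;
  leveldb::Status s = leveldb::DB::Open(options, "/tmp/kvdemo", &db);
  assert(s.ok());

  // Insertion: put(key, value)
  s = db->Put(leveldb::WriteOptions(), "123", "{\"name\":\"John Doe\",\"age\":25}");

  // Point lookup: get(key)
  std::string value;
  s = db->Get(leveldb::ReadOptions(), "123", &value);
  if (s.ok()) std::cout << "123 -> " << value << std::endl;

  // Range query: get_range(key1, key2) is expressed with an iterator.
  leveldb::Iterator* it = db->NewIterator(leveldb::ReadOptions());
  for (it->Seek("100"); it->Valid() && it->key().ToString() <= "200"; it->Next()) {
    std::cout << it->key().ToString() << " -> " << it->value().ToString() << std::endl;
  }
  delete it;
  delete db;
  return 0;
}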

Key-Value Stores - widely used

• Google's BigTable powers Search, Analytics, Maps and Gmail
• Facebook's RocksDB is used as the storage engine in the production systems of many companies

7

Write-optimized data structures

• Log Structured Merge Tree (LSM) is a write-optimized data structure used in key-value stores
• Provides high write throughput with good read throughput, but suffers from high write amplification
• Write amplification: the ratio of the amount of write I/O to the amount of user data

Example: a client writes 10 GB of user data to the KV store; if the total write I/O is 200 GB, the write amplification is 20.

Write amplification in LSM based KV stores

• Inserted 500M key-value pairs
• Key: 16 bytes, Value: 128 bytes
• Total user data: ~45 GB

[Chart: Write IO (GB) per store. RocksDB: 1868 (42x), LevelDB: 1222 (27x), PebblesDB: 756 (17x), User Data: ~45]

Why is write amplification bad?

• Reduces the write throughput
• Flash devices wear out after a limited number of write cycles
  (Intel SSD DC P4600 – can last ~5 years assuming ~5 TB of writes per day)

With RocksDB's write amplification, writing ~500 GB of user data per day would wear this SSD out in about 1.25 years.

Data source: https://www.intel.com/content/www/us/en/products/memory-storage/solid-state-drives/data-center-ssds/dc-p4600-series/dc-p4600-1-6tb-2-5inch-3d1.html

PebblesDB

Built using a new data structure, the Fragmented Log-Structured Merge Tree (FLSM)

A high-performance, write-optimized key-value store

Achieves 3-6.7x higher write throughput and 2.4-3x lower write amplification compared to RocksDB

Achieves the highest write throughput and the least write amplification when used as a backend store for MongoDB

13

Outline

• Log-Structured Merge Tree (LSM)
• Fragmented Log-Structured Merge Tree (FLSM)
• Building PebblesDB using FLSM
• Evaluation
• Conclusion

Log Structured Merge Tree (LSM)

• Data is stored both in memory and on storage
• Writes, Write(key, value), are put directly into memory
• In-memory data is periodically written as files to storage (sequential I/O)
• Files on storage are logically arranged in different levels (Level 0, Level 1, ..., Level n)
• Compaction pushes data to higher numbered levels
• Within a level, files are sorted and have non-overlapping key ranges (e.g., 1…12 | 15…19 | 25…75 | 79…99), so a key can be found using binary search
• Level 0 alone can have files with overlapping (but sorted) key ranges (e.g., 2…57 and 23…78); there is a limit on the number of level 0 files

Write amplification: Illustration

• Max files in level 0 is configured to be 2; files are immutable, and each level keeps sorted, non-overlapping files
• Level 0 holds files 2…37 and 23…48; level 1 holds 1…12, 15…25, 39…62, 77…95 (level 1 re-write counter starts at 1)
• A new file 58…68 is flushed from memory, so level 0 has 3 files (> 2), which triggers a compaction
• The set of overlapping files between levels 0 and 1 (everything except 77…95) is merged and re-written: compacting level 0 with level 1 produces 1…23, 24…46, 47…68, 77…95 (level 1 re-write counter: 2)
• After some more write operations, data is flushed as level 0 files 10…33, 17…53, 1…121
• Compacting level 0 with level 1 again re-writes level 1, producing 1…30, 31…60, 62…90, 92…121 (level 1 re-write counter: 3)
• Existing data has now been re-written to the same level (1) 3 times

Root cause of write amplification

Rewriting data to the same level multiple times, in order to maintain sorted, non-overlapping files in each level
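To make the re-write cost concrete, here is a small self-contained sketch (my own illustration, not code from any of these stores) of what a level compaction does: all overlapping sorted runs are merged into fresh sorted, non-overlapping files, so every key in the touched level-1 files is written to storage again. Files are modeled as sorted vectors of integer keys.

#include <functional>
#include <iostream>
#include <queue>
#include <utility>
#include <vector>

// A "file" is modeled as a sorted run of keys (values and version handling omitted).
using Run = std::vector<int>;

// Merge the overlapping level-0 and level-1 runs into new sorted,
// non-overlapping runs of at most max_keys_per_file keys each.
// Every key that was already in level 1 gets written again; that re-write
// is exactly what drives LSM write amplification.
std::vector<Run> Compact(const std::vector<Run>& inputs, size_t max_keys_per_file,
                         size_t* keys_written) {
  using Item = std::pair<int, size_t>;  // (key, index of source run)
  std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
  std::vector<size_t> pos(inputs.size(), 0);
  for (size_t i = 0; i < inputs.size(); ++i)
    if (!inputs[i].empty()) heap.push({inputs[i][0], i});

  std::vector<Run> outputs(1);
  *keys_written = 0;
  while (!heap.empty()) {
    auto [key, src] = heap.top();
    heap.pop();
    if (outputs.back().size() == max_keys_per_file) outputs.emplace_back();
    outputs.back().push_back(key);  // key is (re)written to storage
    ++*keys_written;
    if (++pos[src] < inputs[src].size()) heap.push({inputs[src][pos[src]], src});
  }
  return outputs;
}

int main() {
  // Level-0 files and the overlapping level-1 files are merged; the level-1
  // keys are rewritten even though they did not change.
  std::vector<Run> overlapping = {{2, 10, 37}, {23, 30, 48}, {1, 5, 12}, {15, 20, 25}, {39, 50, 62}};
  size_t keys_written = 0;
  auto level1 = Compact(overlapping, 4, &keys_written);
  std::cout << "new level-1 files: " << level1.size()
            << ", keys written during compaction: " << keys_written << "\n";
}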

Outline

• Log-Structured Merge Tree (LSM)
• Fragmented Log-Structured Merge Tree (FLSM)
• Building PebblesDB using FLSM
• Evaluation
• Conclusion

36

Naïve approach to reduce write amplification

• Just append the file to the end of the next level
• Results in many (possibly all) overlapping files within a level
• Affects read performance

Level i: 1…89, 6…91, 5…65, 9…99, 1…102, 1…27, 18…95 (all files have overlapping key ranges)

Partially sorted levels

• Hybrid between all non-overlapping files and all overlapping files
• Inspired by the skip list data structure
• Concrete boundaries (guards) group together overlapping files

Level i with guards 13, 35, 70: [1…12] | guard 13: [18…31, 13…34] | guard 35: [42…65, 45…56, 40…47] | guard 70: [72…87]
(files within the same guard can have overlapping key ranges)

Fragmented Log-Structured Merge Tree

Novel modification of LSM data structure

Uses guards to maintain partially sorted levels

Writes data only once per level in most cases

39

FLSM structure

• Files are logically grouped within guards
• Guards get more fine-grained deeper into the tree

Memory: in-memory memtable
Level 0: 2…37, 23…48
Level 1 (guards 15, 70): [1…12] | 15: [15…59] | 70: [77…87, 82…95]
Level 2 (guards 15, 40, 70, 95): [2…8] | 15: [15…23, 16…32] | 40: [45…65] | 70: [70…90] | 95: [96…99]

How does FLSM reduce write amplification?

• Max files in level 0 is configured to be 2; a new memtable flush (30…68) makes level 0 exceed the limit
• Compacting level 0: the level 0 files (2…37, 23…48, 30…68) are merged, sorted, and fragmented along the level 1 guards into 2…14 and 15…68
• The fragmented files are just appended to the next level; no existing level 1 file is re-written
• Later, guard 15 in level 1 (files 15…59 and 15…68) is picked for compaction
• Its files are combined, sorted, and fragmented along the level 2 guards into 15…39 and 40…68
• Again, the fragmented files are just appended to the next level (level 2)

How does FLSM reduce write amplification?
FLSM doesn't re-write data to the same level in most cases.

How does FLSM maintain read performance?
FLSM maintains partially sorted levels to efficiently reduce the search space.

49

Selecting Guards

• Guards are chosen randomly and dynamically
• Dependent on the distribution of data

(Keyspace: 1 … 1e+9; guards are picked from the keys inserted across this range)
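The slides do not spell out the selection scheme, so the following is only a hedged sketch of one skip-list-style way to pick guards randomly from the inserted keys, with deeper levels getting more guards; the probabilities and bookkeeping in PebblesDB may differ.

#include <iostream>
#include <map>
#include <random>
#include <set>

// Skip-list-style guard selection sketch (illustrative only). Each inserted
// key becomes a guard of some level with a small probability; a guard of
// level i is also a guard of every deeper level, so deeper levels end up
// with more, finer-grained guards.
class GuardPicker {
 public:
  explicit GuardPicker(int num_levels, double base_prob = 1.0 / 32)
      : num_levels_(num_levels), base_prob_(base_prob), coin_(0.0, 1.0) {
    for (int l = 1; l <= num_levels_; ++l) guards_[l] = {};
  }

  void OnInsert(int key) {
    // Probability of being a guard shrinks geometrically toward level 1,
    // mirroring how a skip list picks node heights.
    double p = base_prob_;
    for (int level = num_levels_; level >= 1; --level, p *= base_prob_) {
      if (coin_(rng_) < p) {
        for (int l = level; l <= num_levels_; ++l) guards_[l].insert(key);
        break;
      }
    }
  }

  const std::set<int>& GuardsAt(int level) const { return guards_.at(level); }

 private:
  int num_levels_;
  double base_prob_;
  std::map<int, std::set<int>> guards_;
  std::mt19937 rng_{42};
  std::uniform_real_distribution<double> coin_;
};

int main() {
  GuardPicker picker(/*num_levels=*/3);
  std::mt19937 key_rng(7);
  std::uniform_int_distribution<int> keyspace(1, 1000000000);  // 1 .. 1e+9
  for (int i = 0; i < 100000; ++i) picker.OnInsert(keyspace(key_rng));
  for (int level = 1; level <= 3; ++level)
    std::cout << "level " << level << ": " << picker.GuardsAt(level).size() << " guards\n";
}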

Operations: Write

Write(key, value), e.g. Put(1, "abc"), goes directly into the in-memory memtable; the FLSM structure on storage (levels 0-2 with guards, as shown earlier) is only touched when the memtable is flushed.

Operations: Get

Get(23) on the FLSM structure shown earlier:
• Search level by level, starting from memory (the memtable)
• Level 0: all files need to be searched (2…37, 23…48)
• Level 1: only the file under guard 15 is searched (15…59)
• Level 2: both files under guard 15 are searched (15…23, 16…32)
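A minimal sketch of this lookup path (my own in-memory model with invented types, not PebblesDB code): memtable first, then every level 0 file, then only the files under the owning guard in each deeper level.

#include <iterator>
#include <map>
#include <optional>
#include <string>
#include <vector>

struct SSTable {
  int min_key, max_key;
  std::map<int, std::string> data;  // stand-in for the on-disk sorted file
};

struct Level {
  // guard key -> files under that guard; key 0 acts as the sentinel guard.
  std::map<int, std::vector<SSTable>> guards;
};

std::optional<std::string> SearchFiles(const std::vector<SSTable>& files, int key) {
  // Probe each file whose key range covers the key (newest first in a real store).
  for (const SSTable& f : files) {
    if (key < f.min_key || key > f.max_key) continue;
    auto it = f.data.find(key);
    if (it != f.data.end()) return it->second;
  }
  return std::nullopt;
}

std::optional<std::string> Get(const std::map<int, std::string>& memtable,
                               const std::vector<SSTable>& level0,
                               const std::vector<Level>& levels, int key) {
  // 1. Memtable first: it holds the most recent writes.
  if (auto it = memtable.find(key); it != memtable.end()) return it->second;
  // 2. Level 0: files may overlap, so every file has to be considered.
  if (auto v = SearchFiles(level0, key)) return v;
  // 3. Levels 1..n: locate the owning guard, then probe only its files.
  for (const Level& level : levels) {
    auto g = level.guards.upper_bound(key);
    if (g == level.guards.begin()) continue;
    if (auto v = SearchFiles(std::prev(g)->second, key)) return v;
  }
  return std::nullopt;
}

int main() {
  std::map<int, std::string> memtable;
  std::vector<SSTable> level0 = {{2, 37, {{23, "v0"}}}, {23, 48, {{40, "v1"}}}};
  std::vector<Level> levels(1);
  levels[0].guards[0] = {{1, 12, {{5, "v2"}}}};
  levels[0].guards[15] = {{15, 59, {{23, "old"}}}};
  auto v = Get(memtable, level0, levels, 23);
  return v && *v == "v0" ? 0 : 1;
}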

High write throughput in FLSM

• When compaction from memory to level 0 is stalled, writes to memory are also stalled
• If the rate of insertion is higher than the rate of compaction, write throughput depends on the rate of compaction
• FLSM has faster compaction because of lesser I/O, and hence higher write throughput

Challenges in FLSM

• Every read/range query operation needs to examine multiple files per level
• For example, if every guard has 5 files, read latency increases by 5x (assuming no cache hits)

Trade-off between write I/O and read performance

62

Outline

• Log-Structured Merge Tree (LSM)
• Fragmented Log-Structured Merge Tree (FLSM)
• Building PebblesDB using FLSM
• Evaluation
• Conclusion

63

PebblesDB

• Built by modifying HyperLevelDB (~9100 LOC) to use FLSM
• HyperLevelDB is built over LevelDB to provide improved parallelism and compaction
• API compatible with LevelDB, but not with RocksDB

64

Optimizations in PebblesDB

• Challenge (get/range query): multiple files in a guard
• Get() performance is improved using file-level bloom filters
  - A bloom filter answers "is key 25 present?" with either "definitely not" or "possibly yes"
  - One bloom filter per sstable, maintained in memory
  - With bloom filters, PebblesDB reads the same number of files as any LSM based store
• Range query performance is improved using parallel threads and better compaction
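To illustrate the file-level filter idea, here is a toy bloom filter (not the one PebblesDB ships) attached to each sstable in a guard and consulted before touching the file on disk.

#include <bitset>
#include <functional>
#include <iostream>
#include <string>

// Toy bloom filter: k hash probes into a fixed-size bit array.
// "false" means the key is definitely absent; "true" means possibly present.
class BloomFilter {
 public:
  void Add(const std::string& key) {
    for (size_t i = 0; i < kProbes; ++i) bits_.set(Hash(key, i));
  }
  bool MayContain(const std::string& key) const {
    for (size_t i = 0; i < kProbes; ++i)
      if (!bits_.test(Hash(key, i))) return false;
    return true;
  }

 private:
  static constexpr size_t kBits = 8192, kProbes = 4;
  size_t Hash(const std::string& key, size_t seed) const {
    return std::hash<std::string>{}(key + '#' + std::to_string(seed)) % kBits;
  }
  std::bitset<kBits> bits_;
};

struct SSTable {
  std::string name;
  BloomFilter filter;  // built from the file's keys, kept in memory
};

int main() {
  // Two sstables under the same guard; only one contains key "25".
  SSTable a{"15-39.sst"}, b{"40-68.sst"};
  a.filter.Add("25");
  b.filter.Add("55");
  const SSTable* guard_files[] = {&a, &b};
  for (const SSTable* t : guard_files) {
    if (t->filter.MayContain("25"))
      std::cout << "read " << t->name << " from disk\n";  // possibly present
    else
      std::cout << "skip " << t->name << "\n";             // definitely absent
  }
}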

Outline

• Log-Structured Merge Tree (LSM)
• Fragmented Log-Structured Merge Tree (FLSM)
• Building PebblesDB using FLSM
• Evaluation
• Conclusion

70

Evaluation

• Micro-benchmarks
• Low memory
• Small dataset
• Crash recovery
• CPU and memory usage
• Aged file system
• Real world workloads - YCSB
• NoSQL applications

Real world workloads - YCSB

• Yahoo! Cloud Serving Benchmark - industry standard macro-benchmark
• Insertions: 50M, Operations: 10M, key size: 16 bytes and value size: 1 KB

Workloads:
Load A - 100% writes
Run A - 50% reads, 50% writes
Run B - 95% reads, 5% writes
Run C - 100% reads
Run D - 95% reads (latest), 5% writes
Load E - 100% writes
Run E - 95% range queries, 5% writes
Run F - 50% reads, 50% read-modify-writes

[Chart: throughput ratio wrt HyperLevelDB per workload, plus total write IO. Labeled values: Load A 35.08 Kops/s, Run A 25.8 Kops/s, Run B 33.98 Kops/s, Run C 22.41 Kops/s, Run D 57.87 Kops/s, Load E 34.06 Kops/s, Run E 5.8 Kops/s, Run F 32.09 Kops/s, Total IO 952.93 GB]

NoSQL stores - MongoDB

• YCSB on MongoDB, a widely used NoSQL store
• Inserted 20M key-value pairs with 1 KB value size and 10M operations
• Same YCSB workloads as above (Load A through Run F)

[Chart: throughput ratio wrt WiredTiger per workload, plus total write IO. Labeled values: Load A 20.73 Kops/s, Run A 9.95 Kops/s, Run B 15.52 Kops/s, Run C 19.69 Kops/s, Run D 23.53 Kops/s, Load E 20.68 Kops/s, Run E 0.65 Kops/s, Run F 9.78 Kops/s, Total IO 426.33 GB]

PebblesDB combines the low write IO of WiredTiger with the high performance of RocksDB.

Outline

• Log-Structured Merge Tree (LSM)
• Fragmented Log-Structured Merge Tree (FLSM)
• Building PebblesDB using FLSM
• Evaluation
• Conclusion

85

Conclusion

• PebblesDB: a key-value store built on Fragmented Log-Structured Merge Trees
• Increases write throughput and reduces write IO at the same time
• Obtains 6x the write throughput of RocksDB

• As key-value stores become more widely used, there have been several attempts to optimize them
• PebblesDB combines algorithmic innovation (the FLSM data structure) with careful systems building

86

https://github.com/utsaslab/pebblesdb


Backup slides

89

Operations: Seek

• Seek(target): returns the smallest key in the database which is >= target
• Used for range queries (for example, return all entries between 5 and 18)

Example:
Level 0 – 1, 2, 100, 1000
Level 1 – 1, 5, 10, 2000
Level 2 – 5, 300, 500
Seek(200) consults every level; the per-level candidates are 1000, 2000 and 300, so the answer is 300.
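A seek is essentially a per-level lower_bound followed by taking the minimum of the per-level candidates. A minimal sketch for the example above (my own illustration), with each level modeled as one sorted vector:

#include <algorithm>
#include <iostream>
#include <optional>
#include <vector>

// Seek(target): smallest key in the store that is >= target.
// Each level contributes its own candidate via lower_bound; the overall
// answer is the minimum of the per-level candidates.
std::optional<int> Seek(const std::vector<std::vector<int>>& levels, int target) {
  std::optional<int> best;
  for (const auto& level : levels) {  // each level is kept sorted here
    auto it = std::lower_bound(level.begin(), level.end(), target);
    if (it != level.end() && (!best || *it < *best)) best = *it;
  }
  return best;
}

int main() {
  std::vector<std::vector<int>> levels = {
      {1, 2, 100, 1000},   // level 0
      {1, 5, 10, 2000},    // level 1
      {5, 300, 500}};      // level 2
  if (auto k = Seek(levels, 200)) std::cout << "Seek(200) -> " << *k << "\n";  // 300
}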

Operations: Seek

Seek(23) on the FLSM structure shown earlier: all levels and the memtable need to be searched.

Optimizations in PebblesDB

• Challenge with reads: multiple sstable reads per level
• Optimized using sstable-level bloom filters (maintained in memory)
• Bloom filter: determines if an element is in a set ("Is key 25 present?" -> "Definitely not" or "Possibly yes")
• Example: Get(97) on level 1 (guards 15, 70; files 1…12 | 15…39 | 77…97, 82…95) consults the bloom filters of the two files under guard 70; one returns False and the other True, so only one file is read

PebblesDB reads at most one file per guard with high probability.

Optimizations in PebblesDB

• Challenge with seeks: multiple sstable reads per level
• Parallel seeks: parallel threads to seek() on the files in a guard
  (e.g., Seek(85) on level 1, guard 70: thread 1 seeks in file 77…97 while thread 2 seeks in file 82…95)

99
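A hedged sketch of the parallel-seek idea (one std::thread per file in the guard, then a final minimum; this is not PebblesDB's actual threading code):

#include <algorithm>
#include <iostream>
#include <optional>
#include <thread>
#include <vector>

// Each sstable in a guard is modeled as a sorted vector; one thread performs
// the per-file seek, and the per-file results are combined with a final min.
std::optional<int> ParallelSeekInGuard(const std::vector<std::vector<int>>& files,
                                       int target) {
  std::vector<std::optional<int>> results(files.size());
  std::vector<std::thread> workers;
  for (size_t i = 0; i < files.size(); ++i) {
    workers.emplace_back([&, i] {
      auto it = std::lower_bound(files[i].begin(), files[i].end(), target);
      if (it != files[i].end()) results[i] = *it;  // each thread writes its own slot
    });
  }
  for (auto& t : workers) t.join();

  std::optional<int> best;
  for (const auto& r : results)
    if (r && (!best || *r < *best)) best = *r;
  return best;
}

int main() {
  // Guard 70 on level 1 from the slide: files 77..97 and 82..95.
  std::vector<std::vector<int>> guard_files = {{77, 80, 90, 97}, {82, 85, 95}};
  if (auto k = ParallelSeekInGuard(guard_files, 85))
    std::cout << "Seek(85) -> " << *k << "\n";  // 85
}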

Optimizations in PebblesDB

• Challenge with seeks: multiple sstable reads per level
• Parallel seeks: parallel threads to seek() on the files in a guard
• Seek-based compaction: triggers compaction for a level during a seek-heavy workload
  - Reduces the average number of sstables per guard
  - Reduces the number of active levels

Seek-based compaction increases write I/O, but as a trade-off to improve seek performance.

100

Tuning PebblesDB

• PebblesDB characteristics (the increase in write throughput, the decrease in write amplification, and the overhead of read/seek operations) all depend on one parameter, maxFilesPerGuard (default 2 in PebblesDB)
• Setting this to a very high value favors write throughput
• Setting this to a very low value favors read throughput

101

Horizontal compaction

• Files are compacted within the same level for the last two levels in PebblesDB
• Some optimizations prevent a huge increase in write IO

102

Experimental setup

• Intel Xeon 2.8 GHz processor
• 16 GB RAM
• Running Ubuntu 16.04 LTS with the Linux 4.4 kernel
• Software RAID0 over 2 Intel 750 SSDs (1.2 TB each)
• Datasets in experiments are 3x bigger than DRAM size

103

Write amplification

• Inserted different numbers of keys with key size 16 bytes and value size 128 bytes

[Chart: write IO ratio wrt PebblesDB for 10M, 100M and 500M inserted keys. Labeled values: 7.2 GB (10M), 100.7 GB (100M), 756 GB (500M)]

104

Micro-benchmarks

• Used the db_bench tool that ships with LevelDB
• Inserted 50M key-value pairs with key size 16 bytes and value size 1 KB
• Number of read/seek operations: 10M

[Chart: throughput ratio wrt HyperLevelDB per benchmark. Labeled values: Seq-Writes 239.05 Kops/s, Random-Writes 11.72 Kops/s, Reads 6.89 Kops/s, Range-Queries 7.5 Kops/s, Deletes 126.2 Kops/s]

106

Multi threaded micro-benchmarks

• Writes – 4 threads each writing 10M
• Reads – 4 threads each reading 10M
• Mixed – 2 threads writing and 2 threads reading (each 10M)

[Chart: throughput ratio wrt HyperLevelDB. Labeled values: Writes 44.4 Kops/s, Reads 40.2 Kops/s, Mixed 38.8 Kops/s]

107

Small cached dataset

• Insert 1M key-value pairs with 16 bytes key and 1 KB value
• Total data set (~1 GB) fits within memory
• PebblesDB-1: with a maximum of one file per guard

[Chart: throughput ratio wrt HyperLevelDB. Labeled values: Writes 45.25 Kops/s, Reads 205.76 Kops/s, Range-Queries 205.34 Kops/s]

Small key-value pairs

• Inserted 300M key-value pairs
• Key 16 bytes, value 128 bytes

[Chart: throughput ratio wrt HyperLevelDB. Labeled values: Writes 44.48 Kops/s, Reads 6.34 Kops/s, Range-Queries 6.31 Kops/s]

Aged FS and KV store

• File system aging: fill up 89% of the file system
• KV store aging: insert 50M, delete 20M and update 20M key-value pairs in random order

[Chart: throughput ratio wrt HyperLevelDB. Labeled values: Writes 17.37 Kops/s, Reads 5.65 Kops/s, Range-Queries 6.29 Kops/s]

110

Low memory micro-benchmark

• 100M key-value pairs with 1 KB values (~65 GB data set)
• DRAM was limited to 4 GB

[Chart: throughput ratio wrt HyperLevelDB. Labeled values: Writes 27.78 Kops/s, Reads 2.86 Kops/s, Range-Queries 4.37 Kops/s]

111

Impact of empty guards

• Inserted 20M key-value pairs (0 to 20M) in random order with value size 512 bytes
• Incrementally inserted 20M new keys after deleting the older keys
• Around 9000 empty guards at the start of the last iteration
• Read latency was unaffected by the increase in empty guards

112

NoSQL stores - HyperDex

• HyperDex – a distributed key-value store from Cornell
• Inserted 20M key-value pairs with 1 KB value size and 10M operations
• Same YCSB workloads as above (Load A through Run F)

[Chart: throughput ratio wrt HyperLevelDB per workload, plus total write IO. Labeled values: Load A 22.08 Kops/s, Run A 21.85 Kops/s, Run B 31.17 Kops/s, Run C 32.75 Kops/s, Run D 38.02 Kops/s, Load E 7.62 Kops/s, Run E 0.37 Kops/s, Run F 19.11 Kops/s, Total IO 1349.5 GB]

CPU usage

• Median CPU usage while inserting 30M keys and reading 10M keys
• PebblesDB: ~171%
• Other key-value stores: 98-110%
• Due to aggressive compaction and the extra CPU work for merging multiple files in a guard

114

Memory usage

• 100M records (16 bytes key, 1 KB value) – 106 GB data set
  - 300 MB memory space (0.3% of data set size)
• Worst case: 100M records (16 bytes key, 16 bytes value) – ~3.2 GB data set
  - memory space is 9% of data set size

115

Bloom filter calculation cost

• 1.2 sec per GB of sstable
• 3200 files – 52 GB – 62 seconds

116

Impact of different optimizations

• Sstable-level bloom filters improve read performance by 63%
• PebblesDB without optimizations for seek – 66%

117

Thank you! Questions?

118

