Date post: 27-Jul-2020

SLM-DB: Single Level Merge Key-Value Store with Persistent Memory

Olzhas Kaiyrakhmet, Songyi Lee, Beomseok Nam, Sam H. Noh, Young-ri Choi


Outline

• Background

• Contributions

• Architecture

• Evaluation

• Conclusion

FAST 2019 2

Key-Value Databases

Key             Value
"100"           {[Green, Word, Gates]}
"html_doc"      <html><head>…</body></html>
"linux_logo"    [image]

Log-Structured Merge (LSM) Tree

• Optimized for write-heavy application workloads

• Designed for slow hard drives

[Figure: component C0, an in-memory buffer, is merged into C1 on disk, C1 into the next component, and so on up to CK; every component is kept sorted]
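A toy model of the merge structure in the figure, as a minimal sketch: C0 is a Python dict standing in for the in-memory buffer, and C1 … CK are sorted lists standing in for the disk components. The class name `ToyLSM` and the `buffer_limit` and `growth` parameters (which decide when a component counts as full) are illustrative assumptions, not details from the slides.

```python
class ToyLSM:
    """Toy LSM-tree: C0 is an in-memory dict; disk[i] is C(i+1), a sorted list of (key, value)."""

    def __init__(self, buffer_limit: int = 4, growth: int = 10):
        self.buffer_limit = buffer_limit  # entries C0 may hold before it is merged into C1
        self.growth = growth              # assumed size ratio between consecutive components
        self.c0 = {}                      # in-memory buffer
        self.disk = []                    # [C1, C2, ..., CK], each kept sorted by key

    def put(self, key, value) -> None:
        self.c0[key] = value
        if len(self.c0) >= self.buffer_limit:
            self._merge_into(0, sorted(self.c0.items()))
            self.c0 = {}

    def _merge_into(self, level: int, newer_run: list) -> None:
        """Merge a sorted run of newer entries into disk[level]; cascade when it overflows."""
        if level == len(self.disk):
            self.disk.append([])
        merged = dict(self.disk[level])
        merged.update(newer_run)          # newer values replace older ones
        self.disk[level] = sorted(merged.items())
        if len(self.disk[level]) > self.buffer_limit * self.growth ** (level + 1):
            full, self.disk[level] = self.disk[level], []
            self._merge_into(level + 1, full)
```

Every merge rewrites the entries already sitting in the target component, which is the source of the disk write amplification listed among the LSM-tree's disadvantages.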

LSM-tree: disadvantages

[Figure: Get(key) issues Search(key) against C0 in memory, then against C1, …, CK on disk, one component after another]

• Large overhead to locate needed data: a lookup may have to search every component

• High disk write amplification: a KV pair is rewritten each time its component is merged into the next one
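The lookup overhead can be illustrated with a small function that probes components newest-first, as Get(key) does in the figure. This is a sketch under the assumption that each component is modeled as a dict; `lsm_get` is a hypothetical name. A hit in an old component, or a miss, costs one probe per component, and on disk each probe may mean an I/O.

```python
from typing import Optional


def lsm_get(components: list[dict], key) -> tuple[Optional[str], int]:
    """Search C0, C1, ..., CK in order (newest first) and stop at the first hit.

    Returns (value or None, number of components probed).
    """
    probed = 0
    for component in components:
        probed += 1
        if key in component:
            return component[key], probed
    return None, probed


components = [{"a": "new"}, {"a": "old", "b": "1"}, {"c": "2"}]  # C0, C1, C2
print(lsm_get(components, "a"))  # ('new', 1): the newest version shadows C1's copy
print(lsm_get(components, "c"))  # ('2', 3)
print(lsm_get(components, "z"))  # (None, 3): a miss probes every component
```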

State-of-the-art LSM-tree: LevelDB

[Figure: Application → MemTable → Immutable MemTable (memory) → Flush → Level 0 → Compaction → Level 1 → Level 2 (disk), with the WAL and MANIFEST alongside]

• MemTable: in-memory skiplist to buffer updates; marked immutable when it becomes full
• WAL: write-ahead log (no fsync)
• Flush: sequential write of the Immutable MemTable to the disk
• Sorted String Tables (SST) are organized in levels; each level is 10x larger than the previous one
• Compaction: merge from Level N to Level N+1
• MANIFEST: stores file organization and metadata
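Because each level is 10x larger than the previous one, the number of levels grows only logarithmically with the database size. A quick calculation, assuming a hypothetical 10 MiB target size for Level 1 (the slide gives only the 10x ratio; `levels_needed` is an illustrative helper):

```python
def levels_needed(db_bytes: int, level1_bytes: int = 10 * 2**20, ratio: int = 10) -> int:
    """Smallest number of levels, starting at Level 1, whose total capacity holds db_bytes."""
    levels, capacity, total = 0, level1_bytes, 0
    while total < db_bytes:
        levels += 1
        total += capacity      # capacity of the level just added
        capacity *= ratio      # the next level is `ratio` times larger
    return levels


for gib in (8, 20):
    print(f"{gib} GiB -> {levels_needed(gib * 2**30)} levels")
# 8 GiB -> 4 levels
# 20 GiB -> 5 levels
```

Each extra level is one more compaction step a KV pair may pass through on its way down.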

LSM-tree optimizations

• Improve parallelism:
  o RocksDB (Facebook)
  o HyperLevelDB
• Reduce write amplification:
  o PebblesDB (SOSP '17)
• Optimize for hardware (SSD):
  o VT-tree (FAST '13)
  o WiscKey (FAST '16)

New era

[Figure: a speed axis from fast to slow, with byte-addressable memory at the fast end, persistent storage at the slow end, and Persistent Memory, which is both byte-addressable and persistent]

Simple approach

[Figure: the multi-component LSM-tree (C0 → C1 → … → CK, merged step by step) kept as is, with Persistent Memory added next to memory and disk]

Our approach

[Figure: C0 is placed in Persistent Memory and merged into a single disk component C1, which merges with itself; an index points into C1]

• Single disk component C1 that does self-merging
• B+-tree to manage data stored on the disk

Single-Level Merge DB (SLM-DB)

[Figure: Application → MemTable → Immutable MemTable in Persistent Memory, together with a Global B+-Tree; Flush writes Level 0 data files to disk; Compaction works within that single level; MANIFEST]

• MemTable, Immutable MemTable and MANIFEST: similar to LevelDB, but with no WAL
• Global B+-Tree: per-key index of the data stored on the disk
• Compaction: select candidate files and merge them together
• Single level of SST files

Contributions

• Persistent MemTable
  o No write-ahead logging (WAL)
  o Stronger consistency compared to LevelDB
• Persistent B+-tree index
  o Per-key index for fast search
  o No multi-leveled merge structure
• Selective compaction
  o Live-key ratio of a Sorted String Table
  o Leaf node scan in the B+-tree
  o Degree of sequentiality per range query

Persistent MemTable

[Figure: a sorted list 0 → 1 → 2 → 3 → 5 → 6 → 7 → 8 → 9 in persistent memory, with parts marked "Consistency guaranteed" and "No consistency guaranteed"]

• Recoverable after failure

Insert into Persistent MemTable

Inserting key 4 into the sorted list 0 1 2 3 5 6 7 8 9:
(1) Create node 4.
(2) Assign its next pointer (to node 5) and clflush() it.
(3) Atomically change the next pointer of node 3 to point to node 4.

[Figure: the list during the insert, with parts marked "Consistency guaranteed" and "No consistency guaranteed"]
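The three steps can be checked with a toy crash model: stores stay in a volatile cache until clflush(), and a crash may persist any subset of the unflushed 8-byte stores. Everything here (`ToyPM`, `make_list`, `insert`, `insert_unsafe`, `traverse`) is hypothetical and only simulates PM in Python; like the figure, the sketch uses a single sorted linked list.

```python
from itertools import combinations


class ToyPM:
    """Toy persistent memory: each store is 8-byte atomic but stays in the CPU cache
    until clflush(addr). On a crash, any subset of unflushed stores may have reached PM."""

    def __init__(self):
        self.durable = {}   # (addr, field) -> value known to be persistent
        self.pending = {}   # (addr, field) -> value still only in the cache

    def store(self, addr, field, value):
        self.pending[(addr, field)] = value

    def load(self, addr, field):
        return self.pending.get((addr, field), self.durable.get((addr, field)))

    def clflush(self, addr):
        for slot in [s for s in self.pending if s[0] == addr]:
            self.durable[slot] = self.pending.pop(slot)

    def crash_images(self):
        """Yield every PM image a crash at this instant could leave behind."""
        slots = list(self.pending)
        for r in range(len(slots) + 1):
            for subset in combinations(slots, r):
                yield {**self.durable, **{s: self.pending[s] for s in subset}}


def make_list(keys):
    """Build and fully persist a sorted singly linked list head -> keys[0] -> ..."""
    pm, prev = ToyPM(), "head"
    pm.store("head", "next", None)
    for key in keys:
        pm.store(key, "key", key)
        pm.store(key, "next", None)
        pm.store(prev, "next", key)
        prev = key
    for addr in ["head", *keys]:
        pm.clflush(addr)
    return pm


def insert(pm, prev, key, crash_at=None):
    """Insert key after node prev in the slide's order; crash_at stops after step 1, 2 or 3."""
    succ = pm.load(prev, "next")
    pm.store(key, "key", key)        # (1) create node (not durable yet)
    if crash_at == 1:
        return
    pm.store(key, "next", succ)      # (2) assign next pointer ...
    pm.clflush(key)                  #     ... and clflush() the new node
    if crash_at == 2:
        return
    pm.store(prev, "next", key)      # (3) atomically swing prev's 8-byte next pointer
    if crash_at == 3:
        return                       #     (crash before that pointer is flushed)
    pm.clflush(prev)


def insert_unsafe(pm, prev, key):
    """Wrong order for comparison: link the node before it is persisted."""
    succ = pm.load(prev, "next")
    pm.store(prev, "next", key)
    pm.store(key, "key", key)
    pm.store(key, "next", succ)


def traverse(image):
    """Walk the list in a crash image; a link to an unpersisted node is corruption."""
    keys, addr = [], image.get(("head", "next"))
    while addr is not None:
        if (addr, "key") not in image or (addr, "next") not in image:
            raise ValueError(f"dangling pointer to node {addr}")
        keys.append(image[(addr, "key")])
        addr = image[(addr, "next")]
    return keys
```

Enumerating every crash image after each step shows that a crash leaves either the old list or the new one, whereas the unsafe ordering can leave node 3 pointing at a node that never reached PM.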

Single-Level Merge DB

[Figure: the SLM-DB architecture (Application; MemTable, Immutable MemTable and Global B+-Tree in Persistent Memory; Level 0 data, Compaction and MANIFEST), introducing the Flush step]

Flush

• Key-index insertion into the B+-tree happens during the Immutable MemTable flush to disk
• FAST-FAIR B+-tree (Hwang et al., FAST '18)

[Figure: flush timeline: File Creation on the file creation thread, Index Insertion on the B+-tree insertion thread, then Save to MANIFEST]
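The two-thread flush on the timeline can be sketched with Python threads and a queue. This is a minimal sketch under assumptions: the `flush` function, the queue handoff, a dict standing in for the FAST-FAIR B+-tree, a list standing in for the SST file, and each entry's position in the file used as its offset are all illustrative, not SLM-DB's code.

```python
import queue
import threading


def flush(immutable_memtable: dict, file_id: int, index: dict, manifest: list) -> list:
    """Write an Immutable MemTable to a new (simulated) SST file while indexing it.

    The file creation thread hands (key, file_id, offset) entries to the B+-tree
    insertion thread as soon as each KV pair is written, so the two overlap.
    """
    sst_file = []                    # stands in for the on-disk SST file
    handoff = queue.Queue()
    done = object()                  # sentinel: no more entries

    def create_file():
        for offset, (key, value) in enumerate(sorted(immutable_memtable.items())):
            sst_file.append((key, value))
            handoff.put((key, file_id, offset))
        handoff.put(done)

    def insert_index():
        while (item := handoff.get()) is not done:
            key, fid, offset = item
            index[key] = (fid, offset)   # newer entries replace older ones

    threads = [threading.Thread(target=create_file), threading.Thread(target=insert_index)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    manifest.append(("add_file", file_id))  # save to MANIFEST once both have finished
    return sst_file


index, manifest = {}, []
print(flush({"b": 2, "a": 1}, 7, index, manifest))  # [('a', 1), ('b', 2)]
print(index)                                        # {'a': (7, 0), 'b': (7, 1)}
```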

Single-Level Merge DB

[Figure: the SLM-DB architecture (Application; MemTable, Immutable MemTable and Global B+-Tree in Persistent Memory; Level 0 data, Flush and MANIFEST), introducing the Compaction step]

Why do we need compaction?

[Figure: File#0 {1, 10, 17}, File#1 {11, 13, 19} and File#2 {6, 14, 35} on disk; then the new files File#3 {1, 11, 14} and File#4 {12, 17, 35} are flushed. Legend: valid KV pair / obsolete KV pair]

• KV pairs became obsolete: the older copies of keys 1, 11, 14, 17 and 35 in File#0, File#1 and File#2 are superseded by the new files
• Need garbage collection (GC)

Why else?

[Figure: File#0 {1, 10, 17}, File#1 {11, 13, 19}, File#2 {6, 14, 35}, File#3 {14, 21, 32}, File#4 {2, 8, 17}; RangeQuery(5, 12)]

• RangeQuery(5, 12) has to read keys 6, 8, 10 and 11 from four different files (File#2, File#4, File#0, File#1)
• Need to improve sequentiality
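The file count in the example can be reproduced with a short sketch that builds a key -> file index and collects the files a range query must visit. It assumes, as the figure suggests, that a higher file number means a newer file; `files_touched` is a hypothetical helper. The second call uses a hypothetical compacted layout of the same live keys, sorted into files of three, to show what better sequentiality buys.

```python
def files_touched(files: list[list[int]], lo: int, hi: int) -> list[int]:
    """Numbers of the files a range query [lo, hi] must read, via a per-key index.

    files[i] holds the keys of File#i; a higher number is assumed to be newer,
    so the index maps each key to the newest file that contains it.
    """
    index = {}
    for file_no, keys in enumerate(files):
        for key in keys:
            index[key] = file_no            # a newer file supersedes older copies
    return sorted({file_no for key, file_no in index.items() if lo <= key <= hi})


files = [[1, 10, 17], [11, 13, 19], [6, 14, 35], [14, 21, 32], [2, 8, 17]]
print(files_touched(files, 5, 12))      # [0, 1, 2, 4]: keys 6, 8, 10, 11 sit in 4 files

# Same live keys after a hypothetical compaction into sorted files of three keys
compacted = [[1, 2, 6], [8, 10, 11], [13, 14, 17], [19, 21, 32], [35]]
print(files_touched(compacted, 5, 12))  # [0, 1]
```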

Selective compaction

• Selectively pick SSTable files
• Make those files compaction candidates
• Merge together the most overlapping compaction candidates
• Selection schemes for compaction candidates:
  o Live-key ratio selection of an SSTable (for GC)
  o Leaf node scans in the B+-tree (for sequentiality) [see paper]
  o Degree of sequentiality per range query (for sequentiality) [see paper]
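The slide does not spell out how "most overlapping" is measured. One plausible reading, sketched here purely as an assumption, compares the key ranges of the candidate files: pick the candidate whose range overlaps the most other candidates and merge it with those. `pick_overlapping` and the (smallest key, largest key) representation are hypothetical.

```python
def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if the closed key ranges a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def pick_overlapping(candidates: dict[str, tuple[int, int]]) -> list[str]:
    """Choose the candidate overlapping the most others, and return it with those others."""
    def partners(name: str) -> list[str]:
        return [other for other in candidates
                if other != name and overlaps(candidates[name], candidates[other])]

    best = max(candidates, key=lambda name: len(partners(name)))
    return sorted([best, *partners(best)])


candidates = {"File#1": (1, 20), "File#2": (5, 9), "File#3": (15, 30), "File#4": (40, 50)}
print(pick_overlapping(candidates))  # ['File#1', 'File#2', 'File#3']
```

File#4 stays a candidate for a later round. Merging files whose ranges overlap is what lets the output files cover disjoint, sorted key ranges.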

Live-key ratio selection

• To collect garbage
• If the ratio of live (valid) keys to total keys is below a threshold, add the file to the compaction candidates

[Figure: a PM B+-tree indexes File 1 {1, 3, 5}, File 2 {1, 2, 4} and File 3 {2, 6, 7} on disk, each at a 66.6% live-key ratio; ratio threshold 50%. As File 4 {1, 2, 4} is flushed and its keys are indexed, older copies become obsolete and the ratios drop to File 1 66.6%, File 2 0.0%, File 3 33.3%, File 4 100.0%. File 2, with no live keys left, disappears, and File 3, now below the threshold, moves to the compaction candidates.]
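The bookkeeping behind the ratios can be sketched by keeping a live-key count per file next to a key -> file index: when a newer file is indexed, each older copy it supersedes decrements its file's count. This is a sketch under assumptions: `LiveKeyTracker` is hypothetical, files are assumed to be indexed oldest first, and File 5 is invented so that File 3 ends below the threshold; the step-by-step ratios therefore differ from the figure's.

```python
class LiveKeyTracker:
    """Per-file live-key counts kept alongside a key -> file index (hypothetical)."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self.index = {}     # key -> file holding its newest version
        self.total = {}     # file -> number of keys written to it
        self.live = {}      # file -> number of its keys the index still points to

    def add_file(self, name: str, keys: list[int]) -> None:
        self.total[name] = len(keys)
        self.live[name] = 0
        for key in keys:
            old = self.index.get(key)
            if old is not None:
                self.live[old] -= 1          # the older copy becomes obsolete
            self.index[key] = name
            self.live[name] += 1

    def ratio(self, name: str) -> float:
        return self.live[name] / self.total[name]

    def candidates(self) -> list[str]:
        """Files below the threshold that still hold some live keys."""
        return [f for f in self.total if 0 < self.ratio(f) < self.threshold]

    def dead_files(self) -> list[str]:
        """Files with no live keys left; in this sketch they need no merging at all."""
        return [f for f in self.total if self.live[f] == 0]


tracker = LiveKeyTracker(threshold=0.5)
for name, keys in [("File 1", [1, 3, 5]), ("File 2", [1, 2, 4]), ("File 3", [2, 6, 7]),
                   ("File 4", [1, 2, 4]), ("File 5", [7])]:
    tracker.add_file(name, keys)
print({f: f"{tracker.ratio(f):.1%}" for f in tracker.total})
print(tracker.candidates(), tracker.dead_files())  # ['File 3'] ['File 2']
```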

Compaction

• Compaction is triggered when there are too many compaction candidate files

[Figure: compaction candidate files File#1, File#2, File#3 and File#4; other files File#0, File#5 and File#6. The candidates are picked and merged; the file creation thread creates File#7 and then File#8, while the B+-tree insertion thread indexes File#7 (during File#8's creation) and then File#8; finally the changes are saved to MANIFEST.]
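The merge step can be sketched as follows: gather the candidates' KV pairs, keep only those the index still points to, sort them, and cut them into new files whose keys are re-pointed in the index. This sketch assumes dicts for the index and the files and a hypothetical `max_keys_per_file` limit, and it runs file creation and indexing sequentially, whereas the slide overlaps them on two threads; `compact` is an illustrative name.

```python
def compact(candidates: dict[str, list[tuple[int, str]]], index: dict[int, str],
            max_keys_per_file: int, next_file_no: int) -> dict[str, list[tuple[int, str]]]:
    """Merge candidate files into new sorted files that keep only live KV pairs.

    candidates: file name -> (key, value) pairs stored in that file
    index: key -> name of the file holding the newest version (updated in place)
    """
    live = sorted(
        (key, value)
        for name, pairs in candidates.items()
        for key, value in pairs
        if index.get(key) == name           # drop obsolete versions
    )
    new_files = {}
    for start in range(0, len(live), max_keys_per_file):
        name = f"File#{next_file_no}"
        next_file_no += 1
        new_files[name] = live[start:start + max_keys_per_file]
        for key, _ in new_files[name]:
            index[key] = name                # re-point the index to the new file
    return new_files


index = {1: "File#1", 3: "File#2", 5: "File#1", 9: "File#0"}
candidates = {"File#1": [(1, "a"), (5, "new")], "File#2": [(3, "c"), (5, "old")]}
print(compact(candidates, index, max_keys_per_file=2, next_file_no=7))
# {'File#7': [(1, 'a'), (3, 'c')], 'File#8': [(5, 'new')]}
```

Once the index points at the new files, the candidate files hold no live keys and can be removed.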

General operations

• Put
• Put if exists / Put if not exists
• Get
• Scan

Put(key, value)

[Figure: the client's Put(key, value) inserts the KV pair into the MemTable in PM; the Immutable MemTable and the B+-tree index are also in PM, and the files are on disk]

Put(key, value) if exists / if not exists

• Make sure the if-statement is fulfilled before Put()

[Figure: for the client's conditional Put, SLM-DB first queries for the key across the MemTable, the Immutable MemTable and the B+-tree index in PM; once the statement is true, the KV pair is inserted into the MemTable]
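A conditional Put can be sketched as "query first, then Put": the key is looked up newest-first, and the KV pair is written to the MemTable only if the condition holds. Assumptions: dicts stand in for the MemTables and for the B+-tree index plus SST files, deletions are ignored, and a lock keeps the check and the write atomic with respect to other writers; `ConditionalStore` is a hypothetical name.

```python
import threading


class ConditionalStore:
    """Hypothetical model of a conditional Put (dicts stand in for PM and disk structures)."""

    def __init__(self):
        self.memtable = {}
        self.immutable = {}
        self.indexed = {}              # stands in for the B+-tree index + SST files on disk
        self._lock = threading.Lock()  # makes "check, then Put" atomic among writers

    def _exists(self, key) -> bool:
        # Query newest to oldest: MemTable, Immutable MemTable, then the index
        return key in self.memtable or key in self.immutable or key in self.indexed

    def put(self, key, value) -> None:
        with self._lock:
            self.memtable[key] = value

    def _put_if(self, key, value, must_exist: bool) -> bool:
        with self._lock:
            if self._exists(key) != must_exist:
                return False           # condition not fulfilled: no Put
            self.memtable[key] = value
            return True

    def put_if_exists(self, key, value) -> bool:
        return self._put_if(key, value, must_exist=True)

    def put_if_not_exists(self, key, value) -> bool:
        return self._put_if(key, value, must_exist=False)
```

Unlike a plain Put, which only touches the MemTable, the conditional variants pay for a lookup first.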

Get(key)

[Figure: the client's Get(key) is queried against the MemTable, the Immutable MemTable and then the B+-tree index in PM; the key exists, so its value V is read from the file on disk and returned to the client]
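The lookup order in the figure can be sketched as a function that checks the two MemTables and then follows the per-key index straight to one file. Assumptions: dicts stand in for the MemTables and the B+-tree, the index stores a hypothetical (file name, position) pair, and a list of KV pairs stands in for each SST file; `get` is an illustrative name.

```python
from typing import Optional


def get(key, memtable: dict, immutable: dict, index: dict, files: dict) -> Optional[str]:
    """Return the newest value of key, or None if it does not exist.

    index: key -> (file name, position); files: file name -> list of (key, value).
    """
    if key in memtable:                 # newest data first
        return memtable[key]
    if key in immutable:
        return immutable[key]
    location = index.get(key)           # one index lookup instead of a level-by-level search
    if location is None:
        return None
    file_name, position = location
    _stored_key, value = files[file_name][position]  # read from exactly one file
    return value


files = {"File#3": [("a", "1"), ("b", "2")]}
index = {"a": ("File#3", 0), "b": ("File#3", 1)}
memtable, immutable = {"b": "new"}, {}
print(get("b", memtable, immutable, index, files))  # new
print(get("a", memtable, immutable, index, files))  # 1
print(get("z", memtable, immutable, index, files))  # None
```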

Scan(keyi, keyj)

[Figure: Scan(Ki, Kj) creates an iterator over the MemTable, the Immutable MemTable and the B+-tree index, which returns Ki, Ki+1, Ki+2, Ki+3, … in key order up to Kj]
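The iterator in the figure can be sketched as a k-way merge of three sorted sources: the MemTable, the Immutable MemTable, and the index's leaf entries in key order. Assumptions: dicts stand in for the MemTables, a sorted list of (key, value) stands in for the B+-tree leaf scan (a real store would fetch values from the SST files), and `scan` is a hypothetical name; when a key appears in several sources, the newest one wins.

```python
import heapq
from collections.abc import Iterator


def scan(lo, hi, memtable: dict, immutable: dict, index_leaves: list) -> Iterator[tuple]:
    """Yield (key, value) for lo <= key <= hi in key order, newest version only.

    index_leaves must be sorted by key. Source rank breaks ties between equal keys:
    MemTable (0) beats Immutable MemTable (1), which beats the index (2).
    """
    def ranged(pairs, rank):
        for key, value in pairs:
            if lo <= key <= hi:
                yield key, rank, value

    sources = [
        ranged(sorted(memtable.items()), 0),
        ranged(sorted(immutable.items()), 1),
        ranged(index_leaves, 2),
    ]
    last_key = object()                    # sentinel equal to no real key
    for key, _rank, value in heapq.merge(*sources):
        if key != last_key:                # first occurrence is the newest version
            yield key, value
            last_key = key


leaves = [(1, "d1"), (2, "d2"), (4, "d4"), (6, "d6")]
print(list(scan(2, 5, {3: "m3"}, {2: "i2", 3: "i3"}, leaves)))
# [(2, 'i2'), (3, 'm3'), (4, 'd4')]
```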

Evaluation

• CPU: Intel Xeon E5-2640 v3
• DRAM: 4 GB; emulated PM: 7 GB
• SSD: Intel SSD DC S3520
• OS: Ubuntu 18.04, kernel 4.15
• DB size: 8 GB / 20 GB; MemTable: 64 MB
• PM write latency: 500 ns (5x the DRAM write latency)
• PM read latency and bandwidth: same as DRAM's
• Intel's PMDK used to control the PM pool

db_bench microbenchmark

[Figure: three panels: Random write, Random read, Range query (range size = 100). Plot annotations: "Overhead amortized from large value size", "Low sequentiality", "Steady performance increase", "Low file locating overhead".]

• ~2.56x less disk write amplification
• At most 700MB of PM used

FAST 2019 29
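Write amplification is conventionally defined as the ratio below (a standard definition, not a formula taken from these slides). Read this way, "~2.56x less disk write amplification" means the baseline's ratio (LevelDB, the comparison system named in the conclusion) is roughly 2.56 times SLM-DB's.

```latex
\mathrm{WA} = \frac{\text{bytes written to disk}}{\text{bytes written by the application}},
\qquad
\frac{\mathrm{WA}_{\text{LevelDB}}}{\mathrm{WA}_{\text{SLM-DB}}} \approx 2.56
```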

PM sensitivity

[Figure: db_bench random write benchmark, two panels. PM write latency sensitivity: emulated by inserting a CPU pause after clflush(). PM bandwidth sensitivity: emulated using thermal throttling.]

FAST 2019 30
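The latency-emulation technique on this slide (a CPU pause after each clflush()) can be sketched as follows, under loud assumptions: Python cannot issue clflush or pause, so `flush` is a hypothetical hook (a no-op by default) and a busy-wait on `time.perf_counter_ns()` stands in for the pause loop; the 64-byte cache line and all helper names are assumptions. If DRAM write latency is taken as 100ns (the Evaluation slide gives 500ns as 5x DRAM), the extra delay per flush needed to reach 500ns would be roughly 400ns.

```python
import time

CACHE_LINE = 64  # bytes; assumed x86 cache-line size

def spin_ns(ns):
    """Busy-wait for at least `ns` nanoseconds (stand-in for a loop of CPU pause instructions)."""
    end = time.perf_counter_ns() + ns
    while time.perf_counter_ns() < end:
        pass

def emulated_persist(offset, length, extra_ns, flush=lambda line_addr: None):
    """Flush every cache line covering [offset, offset + length), adding `extra_ns`
    of delay after each flush to emulate slower PM writes.

    `flush` is a hypothetical hook; a native implementation would issue clflush here.
    Returns the number of cache lines flushed.
    """
    if length <= 0:
        return 0
    first = offset // CACHE_LINE
    last = (offset + length - 1) // CACHE_LINE
    for line in range(first, last + 1):
        flush(line * CACHE_LINE)  # write back this cache line (hypothetical hook)
        spin_ns(extra_ns)         # injected delay after the flush
    return last - first + 1
```

For example, `emulated_persist(0, 4096, extra_ns=400)` flushes 64 cache lines and waits at least 400ns after each one.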

YCSB

[Figure: YCSB results for eight workloads: 100% I | 50% R, 50% U | 95% R, 5% U | 95% R, 5% U | 100% I | 95% LR, 5% U | 95% S, 5% U | 50% R, 50% RMW (I = insert, R = read, U = update, LR = read latest, S = scan, RMW = read-modify-write)]

• Better write performance
• Very fast on update operations
• Only the 1KB case is slower
• On average, SLM-DB outperforms LevelDB on every workload
• Up to 7.7x less disk write amplification

FAST 2019 31
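To make the workload notation concrete, here is a hypothetical operation-mix driver over a dict-backed store. It is not YCSB's implementation (YCSB is a Java tool with its own key distributions); it picks operations by weight, uses uniformly random keys, and assumes a fixed scan length, all of which are simplifications.

```python
import random
from collections import Counter

def run_mix(store, mix, n_ops, seed=0, scan_len=100):
    """Apply n_ops operations drawn from `mix` to `store` (a dict of int key -> bytes).

    `mix` maps the slide's abbreviations to fractions: I = insert, R = read,
    U = update, LR = read latest, S = scan, RMW = read-modify-write.
    `scan_len` is an assumed scan length. Returns a Counter of operations run.
    """
    rng = random.Random(seed)
    ops, weights = zip(*mix.items())
    counts = Counter()
    next_key = max(store, default=-1) + 1  # inserts append new, increasing keys
    for _ in range(n_ops):
        op = rng.choices(ops, weights=weights)[0]
        key = rng.randrange(next_key) if next_key > 0 else 0  # uniform existing key
        if op == "I":
            store[next_key] = b"v"
            next_key += 1
        elif op == "R":
            store.get(key)
        elif op == "U":
            store[key] = b"u"
        elif op == "LR":
            store.get(next_key - 1)  # the most recently inserted key
        elif op == "S":
            _ = [store[k] for k in sorted(store) if key <= k < key + scan_len]
        elif op == "RMW":
            store[key] = store.get(key, b"") + b"+"  # read, modify, write back
        else:
            raise ValueError(f"unknown operation {op!r}")
        counts[op] += 1
    return counts
```

For example, `run_mix(store, {"R": 0.5, "RMW": 0.5}, 10_000)` approximates the last workload on the slide.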


Conclusion

• A novel key-value store design that leverages persistent memory

• High write/read performance compared to LevelDB

• Comparable scan performance

• Low write amplification

• Near-optimal read amplification

FAST 2019 32


Thanks! Questions?

FAST 2019 33


SLM-DB: Single Level Merge Key-Value Store with Persistent Memory

Olzhas Kaiyrakhmet, Songyi Lee, Beomseok Nam, Sam H. Noh, Young-ri Choi

db_bench microbenchmark (20GB)

[Figure: three panels: Random write, Random read, Range query]

FAST 2019 35

Effect of persistent MemTable

[Figure: two panels: Random write performance, Total disk write]

FAST 2019 36

B+-tree leaf node scan

[Figure: B+-tree whose leaf nodes hold keys 1 to 16, each pointing into a file on disk, next to a Compaction Candidates list]

• Goal: increase the sequentiality of key-values by scanning the B+-tree leaf nodes in a round-robin fashion
• If the number of unique files accessed exceeds the threshold, add those files to the compaction candidates

Threshold = 2

FAST 2019 37
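One round-robin step of this leaf-node scan might look like the sketch below. It assumes leaf entries are (key, file_id) pairs in key order and that each step examines a fixed-size window of entries; the window size, function name, and data layout are hypothetical, not taken from SLM-DB.

```python
def leaf_scan_step(leaf_entries, cursor, window, threshold, candidates):
    """One round-robin step of a B+-tree leaf-node scan (illustrative sketch).

    leaf_entries: (key, file_id) pairs in key order, as found in the B+-tree leaves.
    Examines `window` consecutive entries starting at `cursor`, wrapping around the
    end. If they reference more than `threshold` unique files, adds those files to
    the `candidates` set. Returns the cursor at which the next step resumes.
    """
    n = len(leaf_entries)
    if n == 0:
        return 0
    files = {leaf_entries[(cursor + i) % n][1] for i in range(min(window, n))}
    if len(files) > threshold:
        candidates.update(files)
    return (cursor + window) % n
```

For example, with keys 1 to 16 stored in files `[0, 1, 2, 0, 3, 3, 3, 3, 4, 5, 4, 5, 6, 6, 7, 7]`, a window of 4, and threshold 2, one full pass adds only files 0, 1, and 2, because the first window is the only one spanning more than two files.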

Degree of sequentiality per range query

[Figure: B+-tree with leaf keys 1 to 16 pointing into files on disk; RangeQuery(7, 14) covers keys 7 to 14; Compaction Candidates list]

• Goal: increase the sequentiality of key-values during range query operations
• If the maximum number of unique files accessed in any subrange of the query exceeds the threshold, add the files to the compaction candidates

Threshold = 2

FAST 2019 38
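The per-range-query check can be sketched in the same style. Assumptions not stated on the slide: subranges are fixed-size runs of consecutive matching entries, and when the maximum exceeds the threshold every file touched by the query becomes a candidate; SLM-DB's actual subrange size and selection rule may differ.

```python
def range_query_check(leaf_entries, lo, hi, subrange_len, threshold, candidates):
    """Sequentiality check for RangeQuery(lo, hi) (illustrative sketch).

    leaf_entries: (key, file_id) pairs in key order. The entries with lo <= key <= hi
    are split into subranges of `subrange_len` entries; the maximum number of unique
    files touched by any subrange is computed, and if it exceeds `threshold`, every
    file touched by the query is added to the `candidates` set. Returns that maximum.
    """
    hits = [file_id for key, file_id in leaf_entries if lo <= key <= hi]
    max_unique = 0
    for start in range(0, len(hits), subrange_len):
        max_unique = max(max_unique, len(set(hits[start:start + subrange_len])))
    if max_unique > threshold:
        candidates.update(hits)
    return max_unique
```

With the same 16-key layout as before and a subrange length of 4, RangeQuery(7, 14) splits into subranges touching files {3, 4, 5} and {4, 5, 6}; the maximum, 3, exceeds the threshold of 2, so files 3, 4, 5, and 6 are added.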
