Page 1:

Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching

Kefei Wang and Feng Chen, Louisiana State University

SoCC '18, Carlsbad, CA

Page 2: Key-value Systems in Internet Services

• Key-value systems are widely used today
  – Online shopping
  – Social media
  – Cloud storage
  – Big data

Example key-value pairs:
  Key        → Value
  Product_ID → Product_Name
  URL        → Image

Pages 3-6: Key-value Caching

"First line of defense" in today's Internet services
• High throughput
• Low latency

Operations: SET, GET, DELETE

Figure: client requests arrive at the web server, which queries the cache server first; on a hit the value is served from the cache, and on a miss the request falls through to the database server.
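
The hit/miss flow above is the standard look-aside caching pattern. The sketch below is a minimal C illustration of that pattern only; cache_get, cache_set, and db_get are hypothetical stand-ins, not Fatcache or McDipper APIs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers standing in for the cache server and database server. */
static const char *cache_get(const char *key);            /* NULL on miss */
static void        cache_set(const char *key, const char *value);
static const char *db_get(const char *key);

/* Look-aside GET path: try the cache first, fall back to the database,
 * then populate the cache so the next GET for this key is a hit. */
static const char *lookup(const char *key)
{
    const char *value = cache_get(key);
    if (value != NULL)
        return value;                  /* cache hit */
    value = db_get(key);               /* cache miss: ask the database */
    if (value != NULL)
        cache_set(key, value);         /* warm the cache for future GETs */
    return value;
}

/* Tiny in-memory stubs so the sketch compiles and runs on its own. */
static char stored_key[64], stored_val[64];
static const char *cache_get(const char *key)
{
    return strcmp(key, stored_key) == 0 ? stored_val : NULL;
}
static void cache_set(const char *key, const char *value)
{
    snprintf(stored_key, sizeof stored_key, "%s", key);
    snprintf(stored_val, sizeof stored_val, "%s", value);
}
static const char *db_get(const char *key)
{
    (void)key;
    return "Product_Name";             /* pretend the database always has it */
}

int main(void)
{
    printf("first GET:  %s\n", lookup("Product_ID"));   /* miss, then filled */
    printf("second GET: %s\n", lookup("Product_ID"));   /* hit */
    return 0;
}
```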

Page 7: Flash-based Key-value Caching

• In-flash key-value caches
  – Key-values are stored in commercial flash SSDs
  – Examples: Facebook's McDipper, Twitter's Fatcache

• Key features
  – Memcached compatible (SET, GET, DELETE)
  – Advantages: low cost and high performance

• McDipper: reduced the number of deployed servers by 90%, and 90% of GETs complete in < 1 ms*

          Speed   Power   Cost   Capacity   Persistency
  DRAM    High    High    High   Low        No
  Flash   Low-    Low+    Low+   High+      Yes+

*https://www.facebook.com/notes/facebook-engineering/mcdipper-a-key-value-cache-for-flash-storage/10151347090423920/

Pages 8-11: Flash-based Key-value Caching

Data is stored in flash; all the mappings are kept in DRAM.

Figure: a hash-based mapping table in DRAM points into the key-value slabs on the flash SSD; each mapping entry records MD[20] (a 20-byte key digest), Slab_ID, Slot_ID, and Expiry, locating the slot within a slab that holds the key-value item.
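
A mapping entry of this shape can be written as a small C struct; the field widths below are illustrative assumptions rather than Fatcache's exact layout, chosen only to show how the per-item index cost (44 bytes per entry in Fatcache, per the next slide) adds up.

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch of an in-DRAM mapping entry with the fields named on the slide.
 * Field widths are assumptions for illustration, not Fatcache's real layout. */
struct mapping_entry {
    uint8_t  md[20];             /* 20-byte digest of the key (e.g., SHA-1) */
    uint32_t slab_id;            /* which flash slab holds the item         */
    uint32_t slot_id;            /* which slot inside that slab             */
    uint32_t expiry;             /* expiration time of the cached item      */
    struct mapping_entry *next;  /* hash-bucket chaining                    */
};

int main(void)
{
    /* On a typical 64-bit build this prints 40; extra bookkeeping pushes
     * the real per-entry cost toward the 44 bytes cited for Fatcache. */
    printf("sizeof(struct mapping_entry) = %zu\n", sizeof(struct mapping_entry));
    return 0;
}
```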

Pages 12-17: Scalability Challenge

• High index-to-data ratio
  – The key-value cache is dominated by small items (90% are < 500 bytes)
  – Key-value mapping entry size: 44 bytes in Fatcache

• Flash memory vs. DRAM memory
  – Capacity: a flash cache is 10-100x larger than a memory-based cache
  – Price: 1 TB of flash costs $200-500, while 1 TB of DRAM costs more than $10,000
  – Growth: flash capacity grows 50-60% per year, DRAM only 25-40% per year

Figure: assuming an average key-value size of 300 bytes, a 1-TB flash cache needs about 150 GB of DRAM for its mappings, and a 2-TB flash cache needs about 300 GB.

A technical dilemma: we have a lot of flash space to cache the data, but we don't have enough DRAM to index the data.

Atikoglu et al., "Workload Analysis of a Large-Scale Key-Value Store", SIGMETRICS'12.
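
The DRAM figures follow directly from the index-to-data ratio: 2 TB / 300 B is roughly 7 billion cached items, and 7 billion entries x 44 B is roughly 300 GB of mapping table; halving the cache to 1 TB gives the roughly 150 GB figure.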

Pages 18-23: Evolution of Key-value Caching

Figure: three designs compared. (1) Key-value slabs and the mapping table both in DRAM: zero flash I/Os per lookup. (2) Key-value slabs in flash, mapping table in DRAM: one flash I/O per lookup. (3) Key-value slabs in flash with the mappings split between DRAM and flash: N flash I/Os per lookup.

• Leverage the strong locality to differentiate hot and cold mappings
  – Hold the most popular mappings in a small in-DRAM mapping structure
  – Leave the majority of mappings in a large in-flash mapping structure

Page 24: Outline

• Cascade mapping design
• Optimizations
• Evaluation results
• Conclusions

Pages 25-31: Cascade Mapping

Hierarchical mapping structure
– Tier 1 – Hot mappings: hash-index-based search in memory
– Tier 2 – Warm mappings: high-bandwidth quick scan in flash
– Tier 3 – Cold mappings: efficient linked-list structure in flash

Figure: a key is first looked up in Tier 1 (memory space); on a miss the search cascades to Tier 2 and then Tier 3 (flash space), and the resolved mapping points into the key-value slabs in flash.
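
The cascading lookup can be summarized in a few lines of C; the per-tier search functions below are hypothetical placeholders for the structures described on the following slides, with trivial stubs so the sketch runs.

```c
#include <stdbool.h>
#include <stdio.h>

/* Resolved location of a key-value item inside the flash slabs. */
struct location { unsigned slab_id, slot_id; };

/* Hypothetical per-tier search routines; stubs stand in for the real
 * structures (Tier 1 hash index in DRAM, Tiers 2 and 3 in flash). */
static bool tier1_lookup(const char *k, struct location *l) { (void)k; (void)l; return false; }
static bool tier2_lookup(const char *k, struct location *l) { (void)k; l->slab_id = 7; l->slot_id = 42; return true; }
static bool tier3_lookup(const char *k, struct location *l) { (void)k; (void)l; return false; }

/* A GET first probes the small in-memory tier; only on a miss does it pay
 * flash I/Os for the warm tier, and only then for the cold tier. */
static bool cascade_lookup(const char *key, struct location *loc)
{
    if (tier1_lookup(key, loc)) return true;   /* hot: no flash I/O          */
    if (tier2_lookup(key, loc)) return true;   /* warm: parallel block reads */
    return tier3_lookup(key, loc);             /* cold: walk a hash list     */
}

int main(void)
{
    struct location loc;
    if (cascade_lookup("some_key", &loc))
        printf("found in slab %u, slot %u\n", loc.slab_id, loc.slot_id);
    return 0;
}
```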

Pages 32-39: Tier 1: A Mapping Table in Memory

Figure: a key is hashed to a bucket (Bucket 0 .. Bucket n); the buckets are grouped into partitions (Partition 1 .. Partition n), and each partition maintains a virtual buffer from which mapping entries are demoted to Tier 2.

Figure: hit ratio (%) versus the ratio of Tier 1 (%), ranging from 4% to 20%, comparing the CLOCK, LRU, and FIFO demotion policies.
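
A CLOCK-based demotion of the kind used for Tier 1 can be sketched as follows; the partition size and the demote_to_tier2 hook are assumptions for illustration, not the paper's parameters.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ENTRIES_PER_PARTITION 8        /* tiny numbers, just for illustration */

struct entry {
    char    key[16];
    uint8_t valid;
    uint8_t ref;                       /* CLOCK reference bit, set on each hit */
};

struct partition {
    struct entry slots[ENTRIES_PER_PARTITION];
    int hand;                          /* current CLOCK hand position */
};

/* Hypothetical hook: write a demoted mapping into the Tier 2 buffer. */
static void demote_to_tier2(const struct entry *e)
{
    printf("demoting %s to Tier 2\n", e->key);
}

/* Sweep the clock hand: recently referenced entries get a second chance
 * (ref bit cleared); the first unreferenced entry is demoted and reused. */
static struct entry *clock_evict(struct partition *p)
{
    for (;;) {
        struct entry *e = &p->slots[p->hand];
        p->hand = (p->hand + 1) % ENTRIES_PER_PARTITION;
        if (!e->valid)
            return e;                  /* free slot, nothing to demote */
        if (e->ref) {
            e->ref = 0;                /* second chance */
        } else {
            demote_to_tier2(e);        /* cold entry leaves Tier 1 */
            e->valid = 0;
            return e;
        }
    }
}

int main(void)
{
    struct partition p;
    memset(&p, 0, sizeof p);
    for (int i = 0; i < 10; i++) {     /* insert more keys than there are slots */
        struct entry *e = clock_evict(&p);
        snprintf(e->key, sizeof e->key, "key%d", i);
        e->valid = 1;
        e->ref = 1;
    }
    return 0;
}
```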

Pages 40-48: Tier 2: Direct Indexing in Flash

• Direct mapping block
  – A set of mapping entries demoted from Tier 1

• A FIFO array of blocks
  – The most recent version is always in the latest position

• Parallelized batch search
  – Parallel I/Os load multiple mapping blocks into memory
  – Scan and find the most recent version of the data in one I/O time

Figure: searching the FIFO array of mapping blocks (Block 1 .. Block 4) one block at a time costs several I/O times (serial search: 3x T before FOUND); loading the candidate blocks with parallel I/Os finds the entry in a single I/O time (parallel search: 1x T).

Chen et al., "Internal Parallelism of Flash-based Solid State Drives", ACM Transactions on Storage, 12(3), May 2016.
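
A parallelized batch search along these lines can be sketched with one pread per mapping block, each issued from its own thread; the block size, the flat entry layout, and the newest-first scan order in this sketch are assumptions for illustration, not the paper's on-flash format.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 4096                /* size of one direct mapping block (assumed) */
#define KEY_LEN    16                  /* fixed-size keys, for illustration only     */

/* One parallel read: load mapping block `idx` from the flash file into `buf`. */
struct read_task { int fd; int idx; char *buf; };

static void *load_block(void *arg)
{
    struct read_task *t = arg;
    if (pread(t->fd, t->buf, BLOCK_SIZE, (off_t)t->idx * BLOCK_SIZE) < 0)
        perror("pread");
    return NULL;
}

/* Load `nblocks` mapping blocks with parallel I/Os, then scan them from the
 * newest block to the oldest so the first match is the most recent version. */
static int batch_search(int fd, int nblocks, const char *key, char *out_entry)
{
    pthread_t th[nblocks];
    struct read_task tasks[nblocks];
    char *bufs = malloc((size_t)nblocks * BLOCK_SIZE);

    for (int i = 0; i < nblocks; i++) {
        tasks[i] = (struct read_task){ fd, i, bufs + (size_t)i * BLOCK_SIZE };
        pthread_create(&th[i], NULL, load_block, &tasks[i]);
    }
    for (int i = 0; i < nblocks; i++)
        pthread_join(th[i], NULL);

    int found = 0;
    for (int b = nblocks - 1; b >= 0 && !found; b--) {          /* newest first */
        for (size_t off = 0; off + KEY_LEN <= BLOCK_SIZE; off += KEY_LEN) {
            if (memcmp(bufs + (size_t)b * BLOCK_SIZE + off, key, strlen(key) + 1) == 0) {
                memcpy(out_entry, bufs + (size_t)b * BLOCK_SIZE + off, KEY_LEN);
                found = 1;
                break;
            }
        }
    }
    free(bufs);
    return found;
}

int main(void)
{
    /* Build a tiny stand-in "flash" file with two mapping blocks. */
    FILE *f = tmpfile();
    if (!f) return 1;
    char block[BLOCK_SIZE] = {0};
    strcpy(block, "old_key");  fwrite(block, 1, BLOCK_SIZE, f);   /* block 0 (older) */
    strcpy(block, "hot_key");  fwrite(block, 1, BLOCK_SIZE, f);   /* block 1 (newer) */
    fflush(f);

    char entry[KEY_LEN];
    if (batch_search(fileno(f), 2, "hot_key", entry))
        printf("found entry for %s\n", entry);
    return 0;
}
```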

Pages 49-54: Tier 3: Hash Table List Designs

• "Narrow" hash table (e.g., 1,024 buckets)
  – Long list to walk through
  – Needs little memory for buffers (e.g., 128 MB)

• "Wide" hash table (e.g., 1,048,576 buckets)
  – Short list to walk through
  – Needs much more memory for buffers (e.g., 128 GB)

Figure: each bucket (Bucket 0 .. Bucket 1023 in the narrow table, Bucket 0 .. Bucket 1048575 in the wide table) heads a linked list of mapping blocks in flash and has an in-memory write buffer.

Memory efficiency vs. I/O efficiency
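
The buffer figures follow from keeping one in-memory write buffer per bucket: assuming roughly 128 KB per buffer, 1,024 buckets need about 128 MB of DRAM while 1,048,576 buckets need about 128 GB; the wide table buys shorter lists (fewer flash reads per lookup) at the cost of that memory.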

Pages 55-60: Tier 3: Dual-mode Hash Table

Memory and I/O efficiency both achieved
– Only one set of dynamic buffers
– Write to the active list first
– Reorganize into the inactive list
– Combines the advantages of the narrow and wide designs

Figure: writes go through dedicated buffers into a narrow active table (Bucket 0 .. Bucket 1023); when a bucket's list reaches its length limit, compaction moves its entries, through dynamic buffers, into a wide inactive table (Bucket 0 .. Bucket 1048575).
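
A rough sketch of that write path is below; the bucket counts, the length limit, and the compaction routine are assumptions used only to show the active-to-inactive flow, and both tables are kept in memory here purely for illustration (the real mapping blocks live in flash).

```c
#include <stdio.h>
#include <stdlib.h>

#define ACTIVE_BUCKETS   1024          /* narrow active table                              */
#define INACTIVE_BUCKETS 1048576       /* wide inactive table                              */
#define LENGTH_LIMIT     4             /* blocks per active bucket before compaction (assumed) */

struct block  { struct block *next; /* ... demoted mapping entries ... */ };
struct bucket { struct block *head; int nblocks; };

static struct bucket active[ACTIVE_BUCKETS];
static struct bucket inactive[INACTIVE_BUCKETS];

/* Hypothetical compaction: re-distribute the entries of one over-long active
 * bucket across the many buckets of the wide inactive table. */
static void compact_into_inactive(struct bucket *b)
{
    while (b->head) {
        struct block *blk = b->head;
        b->head = blk->next;
        unsigned slot = (unsigned)rand() % INACTIVE_BUCKETS;   /* stand-in for re-hashing */
        blk->next = inactive[slot].head;
        inactive[slot].head = blk;
        inactive[slot].nblocks++;
    }
    b->nblocks = 0;
    printf("compacted one active bucket into the inactive table\n");
}

/* Writes always append to the narrow active table first; the wide table is
 * only written by compaction, so it never needs per-bucket write buffers. */
static void tier3_append(unsigned hash, struct block *blk)
{
    struct bucket *b = &active[hash % ACTIVE_BUCKETS];
    blk->next = b->head;
    b->head = blk;
    if (++b->nblocks >= LENGTH_LIMIT)
        compact_into_inactive(b);
}

int main(void)
{
    for (int i = 0; i < 8; i++)
        tier3_append(0, calloc(1, sizeof(struct block)));
    return 0;
}
```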

Page 61: Outline

• Cascade mapping design
• Optimizations
• Evaluation results
• Conclusions

Pages 62-63: Optimization Techniques

• Partition the hash space to create multiple demotion I/O streams
• Adopt a memory-efficient CLOCK-based demotion policy
• Organize an array of direct mapping blocks in FIFO order
• Parallel batch search to quickly complete a one-to-one scan
• Use a dual-mode hash table for both memory and I/O efficiency
• A jump list built with Bloom filters to skip impossible blocks
• Make the FIFO-based eviction policy locality aware
• Use a slab sequence counter to realize zero-I/O demapping
• Leverage the FIFO nature of slabs for efficient crash recovery

Pages 64-67: Optimization: Jump List

Bloom filter: a structure to test whether an element is in a set
– A query returns either "possibly in the set" or "definitely not in the set"
– False positives are possible, but false negatives are impossible
– Elements can be added to the set, but not removed

Figure: a small bit-array Bloom filter holding elements A, B, and C.

Bloom filters are used to avoid unnecessary Tier-3 I/Os
– Bloom filters are stored in flash together with the regular mapping blocks
– Each filter indicates whether a mapping can be found within the next several blocks
– If the filter returns negative, jump ahead to the next Bloom filter block

Figure: without jump lists, each hash bucket heads one single long list of mapping blocks; with Bloom filter blocks interleaved, it becomes several short lists connected by hops.
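
A minimal Bloom filter of the kind described above (a bit array plus k hash functions) might look like the sketch below; the array size, the number of hashes, and the FNV-1a-based hashing are illustrative choices, not the paper's parameters.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define FILTER_BITS 1024               /* bits in the filter (illustrative)          */
#define NUM_HASHES  3                  /* number of hash functions (illustrative)    */

static uint8_t bits[FILTER_BITS / 8];

/* FNV-1a with a per-hash seed, reduced modulo the filter size. */
static uint32_t bloom_hash(const char *key, uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed;
    for (const char *p = key; *p; p++) {
        h ^= (uint8_t)*p;
        h *= 16777619u;
    }
    return h % FILTER_BITS;
}

static void bloom_add(const char *key)
{
    for (uint32_t i = 0; i < NUM_HASHES; i++) {
        uint32_t b = bloom_hash(key, i);
        bits[b / 8] |= (uint8_t)(1u << (b % 8));
    }
}

/* Returns false only if the key is definitely absent from the covered blocks;
 * true means "possibly present", so those blocks must still be read and checked. */
static bool bloom_maybe_contains(const char *key)
{
    for (uint32_t i = 0; i < NUM_HASHES; i++) {
        uint32_t b = bloom_hash(key, i);
        if (!(bits[b / 8] & (1u << (b % 8))))
            return false;              /* definitely not here: jump to the next filter block */
    }
    return true;
}

int main(void)
{
    bloom_add("A"); bloom_add("B"); bloom_add("C");
    printf("A: %d  Z: %d\n", bloom_maybe_contains("A"), bloom_maybe_contains("Z"));
    return 0;
}
```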

Pages 68-77: Optimization: Garbage Collection

• GC is a must-have for key-value systems
  – To reclaim flash space
  – To organize large sequential writes

• Traditional: free up space immediately
  – Erase the entire victim slab, chosen in FIFO order
  – Reclaims space quickly, but may delete hot data

• Our solution: keep hot data in cache
  – If a key-value item's mapping is in Tier 1, the item is hot data
  – Rewrite hot data to a new slab, then erase the victim slab

• Adaptive two-phase GC
  – If free flash space is too low, perform fast space reclaim
  – Keep hot data when the system is under moderate pressure

Figure: selection and cleaning of a victim slab.
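
The adaptive two-phase policy can be sketched as below; the free-space threshold, the tier1_contains check, and the slab helpers are hypothetical stand-ins for the mechanisms described above.

```c
#include <stdbool.h>
#include <stdio.h>

#define ITEMS_PER_SLAB 4
#define LOW_SPACE_PCT  10              /* below this, reclaim space as fast as possible (assumed) */

struct slab { int id; int item_keys[ITEMS_PER_SLAB]; };

/* Hypothetical helpers standing in for the real cache internals. */
static bool tier1_contains(int key)      { return key % 2 == 0; }  /* pretend even keys are hot */
static void rewrite_to_new_slab(int key) { printf("  rewriting hot item %d\n", key); }
static void erase_slab(struct slab *s)   { printf("  erasing victim slab %d\n", s->id); }

/* Phase 1 (space critically low): just erase the FIFO victim.
 * Phase 2 (moderate pressure): copy hot items to a new slab first, then erase. */
static void garbage_collect(struct slab *victim, int free_space_pct)
{
    if (free_space_pct >= LOW_SPACE_PCT) {
        for (int i = 0; i < ITEMS_PER_SLAB; i++)
            if (tier1_contains(victim->item_keys[i]))    /* mapping in Tier 1 means hot data */
                rewrite_to_new_slab(victim->item_keys[i]);
    }
    erase_slab(victim);                                  /* reclaim the flash space */
}

int main(void)
{
    struct slab victim = { 3, { 10, 11, 12, 13 } };
    printf("moderate pressure:\n");
    garbage_collect(&victim, 40);
    printf("space critically low:\n");
    garbage_collect(&victim, 5);
    return 0;
}
```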

Page 78: Outline

• Cascade mapping design
• Optimizations
• Evaluation results
• Conclusions

Page 79: Experimental Setup

• Implementation
  – SlickCache: 3,800 lines of C code added to Twitter's Fatcache

• Hardware environment
  – Lenovo ThinkServers: 4-core Intel Xeon 3.4 GHz with 16 GB DRAM
  – 240-GB Intel 730 SSD as the cache device
  – 280-GB Intel Optane 900P SSD as the swap device
  – 7,200-RPM Seagate 2-TB HDD as the database device

• Software environment
  – Ubuntu 16.04 with Linux kernel 4.12 and the Ext4 file system
  – MongoDB 3.4 as the backend database

• Workloads
  – Yahoo! Cloud Serving Benchmark (YCSB)
  – Popular distributions: Hotspot, Zipfian, and Normal

Page 80: Evaluation Results

Comparison with Fatcache and system swapping: Fatcache-Swap-Flash and Fatcache-Swap-Optane are both configured with 10% of the physical memory and allowed to swap to the flash SSD and the Optane SSD, respectively.

Figure: throughput comparison; the highlighted gains are about 2x and 7x.

Page 81: Evaluation Results

Cache effectiveness (fixed cache size): SlickCache uses only 10% of the memory used by Fatcache while achieving comparable performance. SlickCache-GC increases throughput by up to 85% thanks to the optimized GC policy.

Page 82: Evaluation Results

Cache effectiveness (fixed memory size): SlickCache can index a 10-times-larger flash cache with the same amount of memory, which in turn increases the hit ratio by up to 8.2 times and the throughput by up to 125 times.

Page 83: Conclusions

Cascade Mapping for flash-based key-value caching

• A hierarchical mapping structure for flash-based key-value caches

• A set of optimizations to improve performance

• Uses less memory while performing better than the current design

Page 84:

Thanks! And Questions?

