
UKSM: Swift Memory Deduplication via Hierarchical and Adaptive Memory Region Distilling

Nai Xia*, Chen Tian*, Yan Luo+, Hang Liu+, Xiaoliang Wang*

*: Nanjing University  +: University of Massachusetts Lowell

Feb/15/2018

Background

• What is Kernel Samepage Merging (KSM)?

[Figure: page 1 and page 2 are compared; if they are identical they are merged ("Merge"), and if one is later updated and becomes different they are unmerged ("Unmerge").]

• Goal: Reduce memory consumption when duplication exists.
• Effectiveness: There exist tremendous (~86%) memory duplications in real-world applications, Chang et al. [ISPA 2011].
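The merge/unmerge cycle can be illustrated with a toy model. This is a minimal sketch and not kernel code: the `ToyMemory` class, its method names and the frame bookkeeping are all hypothetical, and the real KSM works on page tables with copy-on-write protection rather than Python dictionaries.

```python
class ToyMemory:
    """Toy model: virtual pages map to frames; identical pages can share one frame."""

    def __init__(self):
        self.frames = {}     # frame id -> page content (bytes)
        self.mapping = {}    # virtual page name -> frame id
        self.next_frame = 0

    def _alloc(self, content):
        frame = self.next_frame
        self.next_frame += 1
        self.frames[frame] = content
        return frame

    def map_page(self, name, content):
        self.mapping[name] = self._alloc(content)

    def merge_identical(self):
        """Merge: point every page with identical content at one shared frame."""
        keeper_for = {}  # content -> frame id kept for that content
        for name, frame in list(self.mapping.items()):
            content = self.frames[frame]
            self.mapping[name] = keeper_for.setdefault(content, frame)
        # Free frames that no page references any more.
        live = set(self.mapping.values())
        self.frames = {f: c for f, c in self.frames.items() if f in live}

    def write(self, name, content):
        """Unmerge on write: a shared frame is copied before it is modified."""
        frame = self.mapping[name]
        sharers = sum(1 for f in self.mapping.values() if f == frame)
        if sharers > 1:
            self.mapping[name] = self._alloc(content)  # private copy (COW break)
        else:
            self.frames[frame] = content
```

Mapping two identical pages and calling `merge_identical()` leaves one frame; a later `write()` to either page allocates a private frame again, which is the "Update, different?" step in the figure.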

Unique Challenges

• Storage deduplication deals with relatively static content and is concerned only with the duplication ratio.
  • Sparse Indexing [FAST 2009], CAFTL [FAST 2011], El-Shimi et al. [ATC 2012], Cao et al. [Just now]
• Responsiveness:
  • Remove duplications before they exhaust the memory.
• Dynamic nature:
  • Duplication status may change over time.

Accelerate the deduplication of memory which is dynamic in nature!


Outline

• Observation (Opportunity)
• Overview
• Hierarchical Region Distilling
• Adaptive Partial Hashing
• Evaluation
• Conclusion

Observation I: Pages within the Same Region Present Similar Patterns

[Figure: two plots of Duplicated Pages, one over the KVM Memory Space (y-axis 0 to 8 × 10^4) and one over the Docker Memory Space (y-axis 0 to 8000); both x-axes run from 0 to 1000.]

• Test: Apache web server and MySQL database serving a WordPress website on Ubuntu 16.04 (kernel version 4.4).

Duplicated pages concentrate by memory region.

*Please refer to our paper for other pattern analysis
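One way to reproduce this kind of per-region measurement offline is to count, for each region, how many of its pages have content that occurs more than once. The sketch below is illustrative only: the function name, the input layout (region name mapped to a list of page contents) and the definition of a "duplicated page" as content seen at least twice anywhere in the scanned memory are assumptions, not taken from the talk.

```python
from collections import Counter


def duplicated_pages_per_region(regions):
    """Count duplicated pages in each region.

    regions: dict mapping a region name to a list of page contents (bytes).
    A page counts as duplicated when its content appears more than once
    across all regions (assumed definition).
    """
    counts = Counter(page for pages in regions.values() for page in pages)
    return {
        name: sum(1 for page in pages if counts[page] > 1)
        for name, pages in regions.items()
    }
```

A skewed result, where a few regions hold most of the duplicated pages, is the pattern the observation refers to.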

Observation II: Hashing Needs to Be Adaptive

• Various applications need different hashing strengths to differentiate pages:
  • Image applications contain pages with highly similar contents.
  • Crypto applications contain diverse contents.

We should adjust hashing strength accordingly.

[Figure: two comparisons of Page i against Page j.]

Overview

• Assuming we have 9 memory regions, i.e., R0 – R8.

[Figure: regions R0 to R8, with each region Ri marked on a similarity scale from Low to High.]

Overview

• Hierarchical memory region clustering.

[Figure: regions R0 to R8 grouped into Level 1, Level 2, …, Level N; each region Ri is marked on a similarity scale from Low to High.]

Overview

• Hierarchical region distilling.

[Figure: regions R0 to R8 placed across Level 1, Level 2, …, Level N on a similarity scale from Low to High, with R3 and R8 called out separately.]

Overview

• Hierarchical region distilling.

[Figure: placement of regions across Level 1, Level 2, …, Level N in Round n and in Round n + 1, on a similarity scale from Low to High; R3 and R8 are called out separately between the two rounds.]

Overview

• Hierarchical region distilling + Adaptive partial hashing.

[Figure: placement of regions across Level 1, Level 2, …, Level N in Round n and in Round n + 1, on a similarity scale from Low to High, with R3 and R8 called out separately.]

Overview

• Hierarchical region distilling + Adaptive partial hashing.

[Figure: placement of regions across Level 1, Level 2, …, Level N in Round n and in Round n + 1, annotated with Takeaway 1, Takeaway 2 and Takeaway 3.]

• Takeaway 1: Promote/demote regions.
• Takeaway 2: Sampling offset shift.
• Takeaway 3: Hash strength adjustment.

Hierarchical Region Distilling

• Memory region characterization – Signatures:
  • Vcow: promote regions whose COW-broken ratios are lower than this threshold.
  • Vdup: promote regions whose duplication ratios are higher than this threshold.
  • Vlife: regions living longer than this threshold can be effectively scanned.
• Default empirical values:
  • Vcow = 10%, Vdup = 20% and Vlife = 100 ms.

Various commercial products adopt UKSM and observe different sweet spots.

* COW: copy on write
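The three thresholds can be read as a per-round decision on each region. The following is a minimal sketch under stated assumptions: the `RegionStats` fields and the `next_level` function are hypothetical names, the rule that both the Vcow and Vdup conditions must hold for a promotion is an assumption, and so is the choice to demote by one level otherwise. Only the default values come from the slide.

```python
from dataclasses import dataclass

V_COW = 0.10      # default from the slide: 10%
V_DUP = 0.20      # default from the slide: 20%
V_LIFE_MS = 100   # default from the slide: 100 ms


@dataclass
class RegionStats:
    cow_broken_ratio: float  # fraction of merged pages later broken by writes
    dup_ratio: float         # fraction of sampled pages found duplicated
    lifetime_ms: float       # how long the region has existed
    level: int = 1           # current level in the hierarchy


def next_level(region, max_level):
    """Return the level the region should occupy in the next round (assumed rule)."""
    if region.lifetime_ms <= V_LIFE_MS:
        # Too short-lived to be scanned effectively: leave it where it is.
        return region.level
    if region.cow_broken_ratio < V_COW and region.dup_ratio > V_DUP:
        return min(region.level + 1, max_level)  # promote
    return max(region.level - 1, 1)              # demote (assumption)
```

For example, a long-lived region with a 5% COW-broken ratio and a 50% duplication ratio moves up one level, while a region with a 50% COW-broken ratio moves down.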

Hierarchical Region Distilling

[Figure: workflow for a region Ri. Labels: "Sample & Hash", "Tree_merge", "Tree_unmerge", "Adjust Vdup", "Write on merged tree, adjust Vcow", "Move page from unmerged to merged tree".]

*: We adopt the Linux KSM red-black tree design to track 'merged' and 'unmerged' pages.
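The two-tree bookkeeping can be sketched as follows. This is a simplified illustration, not the UKSM implementation: plain dictionaries keyed by a hash stand in for the red-black trees, `zlib.crc32` stands in for the real hash function, every page is hashed in full rather than sampled, and the function name and return value are hypothetical.

```python
import zlib


def scan_region(pages, merged, unmerged):
    """Scan one region and return how many pages matched an earlier copy.

    pages:    list of page contents (bytes) belonging to the region
    merged:   dict hash -> content, stand-in for the 'merged' tree
    unmerged: dict hash -> content, stand-in for the 'unmerged' tree
    """
    matched = 0
    for page in pages:
        h = zlib.crc32(page)
        if merged.get(h) == page:
            # Already shared: this page joins the existing merged copy.
            matched += 1
        elif unmerged.get(h) == page:
            # Second sighting: move the page from the unmerged to the merged tree.
            merged[h] = unmerged.pop(h)
            matched += 1
        else:
            # First sighting (or a hash collision): remember it as a candidate.
            unmerged[h] = page
    return matched
```

Dividing the returned count by `len(pages)` gives a per-region duplication ratio that could be compared against Vdup; counting writes that hit entries in `merged` would give the matching COW-broken ratio.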

Adaptive Partial Hashing

[Figure: hash strength adjustment. Labels: "Half hashing strength", "Strength = Strength ± Delta", "Probe state", "Adjust hash strength".]

We optimize SuperFastHash with the following key contributions:
• Minimizing collisions – Optimizing avalanche for SuperFastHash [Hsieh 2004].
• Progressive hashing – Supporting additivity while adjusting hash strengths.

[Figure: progressive hashing of a sampled page split into a 1st half and a 2nd half; hash value H1 (round n) and hash value H2 (round n+1) are combined to H1,2.]
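The additivity property means a hash computed at a lower strength can be extended to a higher strength without rehashing the bytes already covered. The sketch below shows the idea with `zlib.crc32`, whose running-value argument provides this property; it is a stand-in for the optimized SuperFastHash, and the function names, the byte-count notion of "strength" and the clamping range are assumptions made for illustration.

```python
import zlib

PAGE_SIZE = 4096


def partial_hash(page, strength, offset=0):
    """Hash only `strength` bytes of `page`, starting at `offset`."""
    return zlib.crc32(page[offset:offset + strength])


def extend_hash(prev_hash, page, old_strength, new_strength, offset=0):
    """Grow a partial hash from old_strength to new_strength bytes.

    Only the newly covered bytes are hashed; prev_hash carries the prefix.
    """
    extra = page[offset + old_strength:offset + new_strength]
    return zlib.crc32(extra, prev_hash)


def adjust_strength(strength, delta, increase):
    """Strength = Strength ± Delta, clamped to [1, PAGE_SIZE] (assumed bounds)."""
    new_strength = strength + delta if increase else strength - delta
    return max(1, min(PAGE_SIZE, new_strength))
```

With this in place, `extend_hash(partial_hash(page, 128), page, 128, 256)` equals `partial_hash(page, 256)`, so raising the strength between rounds costs only the extra bytes.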

Evaluation

• 6,000 lines of code in the Linux kernel.
• OS: Vanilla kernel 4.4.
• Hardware:
  • Intel® Core™ i7 CPU 920 with four 2.67 GHz cores.
  • 12 GB memory.
• For fair comparison:
  • KSM is upgraded to SuperFastHash.

Evaluation Goals

• How efficient is UKSM on different workloads?
• How flexible is UKSM regarding customization?
• What’s the responsiveness of UKSM vs KSM?
• How does adaptive partial hashing perform compared to a non-adaptive algorithm?
• What’s the performance penalty of UKSM?


Parameter Analysis

[Figure: two plots over time (0 to 300 seconds). First plot: CPU Utilization (%), 0 to 100, for the Full and Quiet levels. Second plot: Memory Saving (MB), 0 to 6000, for the Full, Medium, Low and Quiet levels, with the "Catching up time" annotated.]

• UKSM allows four levels of scanning strength:
  • Level Full allows up to 95% CPU consumption and can scan the entire memory in 2 seconds.
  • Each lower level halves the CPU consumption and potentially increases the scan time by 2x.

Setting: Booting 25 VMs, each with 1 VCPU and 1 GB memory.
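The halving rule gives a quick back-of-the-envelope budget per level. In the sketch below the level order (Full, Medium, Low, Quiet) is taken from the figure legend and the Full-level numbers from the slide; the function name is hypothetical, and the scan times for the lower levels are the "potential" 2x estimates rather than measurements.

```python
LEVELS = ["Full", "Medium", "Low", "Quiet"]


def scan_budget(full_cpu_pct=95.0, full_scan_s=2.0):
    """Return {level: (max CPU %, estimated full-memory scan time in seconds)}."""
    budgets = {}
    for i, level in enumerate(LEVELS):
        factor = 2 ** i  # each lower level halves CPU and doubles scan time
        budgets[level] = (full_cpu_pct / factor, full_scan_s * factor)
    return budgets
```

Under these assumptions the Quiet level is capped at 11.875% CPU with an estimated 16-second scan of the entire memory.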

Responsiveness Analysis

[Figure: two plots over time (0 to 600 seconds) comparing UKSM with KSM at 100 Pages, 1000 Pages and 2000 Pages. First plot: Memory Utilization (MB), 4000 to 11000, with annotated values 611, 95 and 615. Second plot: CPU (% of one core), 0 to 100.]

UKSM is 8.3×, 12.6× and 11.5× more efficient than KSM at scan speeds of 100, 1000 and 2000 pages, respectively.

Efficiency = Memory saving / CPU consumption

Setting: Two processes, each with 4 GB memory. One contains identical pages while the other contains random ones.

Related Work

• Content-based approach:
  • VMware ESX server, IBM active memory deduplication, Red Hat ksmtuned.
  • The majority of them treat every page equally.
• I/O hint based approach:
  • KSM++ [Resolve 2012], XLH [USENIX ATC 2013], CMD [VEE 2014].
  • Cannot track anonymous memory space (no I/O) or require hardware changes.
• SmartMD [USENIX ATC '17]:
  • Considers various page sizes; we are orthogonal.

Conclusion

• Memory deduplication faces unique challenges. Our techniques:
  • Hierarchical region distilling.
  • Adaptive partial hashing.
• UKSM saves 12.6x and 5x more memory than KSM on static and dynamic workloads, respectively, in the same time envelope.
• UKSM is an in-production system: https://github.com/dolohow/uksm.
  • It has ~110 (watch, star and fork) after less than one year on GitHub.

Thank You & Questions?


We would like to thank our shepherd Dr. Hong Jiang and anonymous reviewers!
