  • UKSM: Swift Memory Deduplication via Hierarchical and Adaptive Memory Region Distilling

    Nai Xia*, Chen Tian*, Yan Luo+, Hang Liu+, Xiaoliang Wang*

    *: Nanjing University
    +: University of Massachusetts Lowell

    Feb/15/2018

  • Background

    • What is Kernel Samepage Merging (KSM)?

    [Figure: page 1 and page 2 are compared; if they are identical they are merged, and if a later update makes them different they are unmerged.]

    • Goal: Reduce memory consumption when duplication exists.
    • Effectiveness: Real-world applications contain a large share (~86%) of duplicated memory, Chang et al. [ISPA 2011].
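As a rough illustration of the merge/unmerge cycle on this slide, the Python sketch below models virtual pages that share one frame after a merge and are split again by copy-on-write. The `PageTable` class and its method names are hypothetical and exist only for this toy model; they are not KSM's kernel interfaces.

```python
class PageTable:
    """Toy model: virtual pages map to (possibly shared) physical frames."""

    def __init__(self):
        self.frames = {}    # frame id -> page content (bytes)
        self.refcount = {}  # frame id -> number of virtual pages mapped to it
        self.mapping = {}   # virtual page name -> frame id
        self._next_id = 0

    def _alloc(self, content):
        frame_id = self._next_id
        self._next_id += 1
        self.frames[frame_id] = content
        self.refcount[frame_id] = 1
        return frame_id

    def map_page(self, vpage, content):
        self.mapping[vpage] = self._alloc(content)

    def merge(self, a, b):
        """Point b at a's frame if both contents are identical."""
        fa, fb = self.mapping[a], self.mapping[b]
        if fa == fb or self.frames[fa] != self.frames[fb]:
            return False
        self.mapping[b] = fa
        self.refcount[fa] += 1
        self.refcount[fb] -= 1
        if self.refcount[fb] == 0:
            # The duplicate frame is no longer used and can be freed.
            del self.frames[fb], self.refcount[fb]
        return True

    def write(self, vpage, content):
        """Copy-on-write: a write to a shared frame unmerges the page."""
        frame_id = self.mapping[vpage]
        if self.refcount[frame_id] > 1:
            self.refcount[frame_id] -= 1
            self.mapping[vpage] = self._alloc(content)
        else:
            self.frames[frame_id] = content
```

After two identical pages are merged, `len(frames)` drops to one; a later `write` to either page brings it back to two, which is the "Unmerge" arrow in the figure.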

  • Unique Challenges

    • Storage deduplication deals with relatively static content and is concerned only with the duplication ratio.
      • Sparse Indexing [FAST 2009], CAFTL [FAST 2011], El-Shimi et al. [ATC 2012], Cao et al. [Just now]

    • Responsiveness:
      • Remove duplications before they exhaust the memory.

    • Dynamic nature:
      • Duplication status may change over time.

  • Accelerate the deduplication of memory, which is dynamic in nature!

  • Outline

    • Observation (Opportunity)
    • Overview
    • Hierarchical Region Distilling
    • Adaptive Partial Hashing
    • Evaluation
    • Conclusion

  • Observation I: Pages within the Same Region Present Similar Patterns

    [Figure: two panels plotting Duplicated Pages (y-axis) against position in the memory space (x-axis, 0 to 1000). Panel "KVM Memory Space": y-axis up to 8 × 10^4. Panel "Docker Memory Space": y-axis up to 8000.]

    • Test: Apache web server and MySQL database serving a WordPress website on Ubuntu 16.04 (kernel version 4.4).

    Duplicated pages concentrate by memory region.

    *Please refer to our paper for other pattern analyses.
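One way to check this kind of observation offline is to count, per region, how many pages have content that also appears elsewhere. The sketch below is only an illustration under assumptions: the `regions` dictionary layout and the function name are hypothetical, and SHA-256 is used purely as a convenient content fingerprint.

```python
import hashlib
from collections import Counter


def duplicates_per_region(regions):
    """Count duplicated pages in each region.

    regions: dict mapping a region name to a list of page contents (bytes).
    A page counts as duplicated if its content occurs more than once
    anywhere in the input.
    """
    def fingerprint(page):
        return hashlib.sha256(page).digest()

    # Global occurrence count of every page content.
    seen = Counter(
        fingerprint(page) for pages in regions.values() for page in pages
    )
    return {
        name: sum(1 for page in pages if seen[fingerprint(page)] > 1)
        for name, pages in regions.items()
    }
```

Plotting the returned counts against region position would give a picture comparable in spirit to the two panels on this slide.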

  • Observation II: Hashing Needs to Be Adaptive

    • Various applications need different hashing strengths to differentiate pages:
      • Image applications contain pages with highly similar contents.
      • Crypto applications contain diverse contents.

    [Figure: two examples, each comparing Page i with Page j.]

    We should adjust hashing strength accordingly.

  • Overview

    • Assuming we have 9 memory regions, i.e., R0 – R8.

    [Figure: regions R0 to R8; the shading of each region Ri encodes its similarity, from low to high.]

  • Overview

    • Hierarchical memory region clustering.

    [Figure: regions R0 to R8 clustered into Level 1, Level 2, ..., Level N; the shading of each region Ri encodes its similarity, from low to high.]

  • Overview

    • Hierarchical region distilling.

    [Figure: regions R0 to R8 placed in Level 1, Level 2, ..., Level N; the shading of each region Ri encodes its similarity, from low to high.]

  • Overview

    • Hierarchical region distilling.

    [Figure: the level hierarchy (Level 1, Level 2, ..., Level N) shown for Round n and Round n + 1, with regions R3 and R8 highlighted; the shading of each region Ri encodes its similarity, from low to high.]

  • Overview

    • Hierarchical region distilling + Adaptive partial hashing.

    [Figure: the level hierarchy (Level 1, Level 2, ..., Level N) shown for Round n and Round n + 1, with regions R3 and R8 highlighted; the shading of each region Ri encodes its similarity, from low to high.]

  • Overview

    • Hierarchical region distilling + Adaptive partial hashing.

    [Figure: the level hierarchy (Level 1, Level 2, ..., Level N) shown for Round n and Round n + 1, with regions R3 and R8 highlighted and the three takeaways marked.]

    • Takeaway 1: Promote/demote regions.
    • Takeaway 2: Sampling offset shift.
    • Takeaway 3: Hash strength adjustment.

  • Hierarchical Region Distilling

    • Memory region characterization – Signatures:
      • Vcow: promote regions whose COW-broken ratios are lower than this threshold.
      • Vdup: promote regions whose duplication ratios are higher than this threshold.
      • Vlife: regions living longer than this threshold can be effectively scanned.

    • Default empirical values:
      • Vcow = 10%, Vdup = 20% and Vlife = 100 ms.

    Various commercial products adopt UKSM and observe different sweet spots.

    * COW: copy on write
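To make the three signatures concrete, here is a minimal Python sketch of a promote/demote decision. The threshold values are the defaults quoted on this slide; everything else is an assumption made for illustration: the `RegionStats` fields, the `next_level` name, the rule that both ratio conditions must hold for promotion, and the one-level demotion otherwise. The slides do not spell out how UKSM combines the signatures.

```python
from dataclasses import dataclass

V_COW = 0.10      # slide default: promote if the COW-broken ratio is below this
V_DUP = 0.20      # slide default: promote if the duplication ratio is above this
V_LIFE_MS = 100   # slide default: regions must live longer than this to be scanned


@dataclass
class RegionStats:
    """Hypothetical per-round statistics for one memory region."""
    age_ms: float     # how long the region has existed
    sampled: int      # pages sampled in this round
    duplicated: int   # sampled pages found to be duplicates
    merged: int       # pages currently merged
    cow_broken: int   # merged pages broken by a write in this round


def next_level(stats, level, top_level):
    """Return the region's level for the next round (assumed rule)."""
    if stats.age_ms < V_LIFE_MS:
        return level  # assumption: too short-lived to judge, leave unchanged
    dup_ratio = stats.duplicated / stats.sampled if stats.sampled else 0.0
    cow_ratio = stats.cow_broken / stats.merged if stats.merged else 0.0
    if dup_ratio > V_DUP and cow_ratio < V_COW:
        return min(level + 1, top_level)  # promote: scanned more intensively
    return max(level - 1, 1)              # demote: scanned less intensively
```

Since the slide notes that deployments observe different sweet spots, the three constants would be tunables rather than fixed values in practice.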

  • Hierarchical Region Distilling

    [Figure: flow for a region Ri: Sample & Hash, then lookup in the merged tree and the unmerged tree, with the annotations "Adjust Vdup", "Write on merged tree, adjust Vcow" and "move page from unmerged to merged tree".]

    *: We adopt the Linux KSM red-black tree design to track 'merged' and 'unmerged' pages.
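The two-tree bookkeeping on this slide can be sketched as follows. This is a simplification under stated assumptions: plain Python dictionaries keyed by a page digest stand in for the red-black trees, the `scan_page` name and its arguments are hypothetical, and the byte-by-byte comparison a real implementation needs before merging (hashes can collide) is omitted.

```python
def scan_page(page_id, digest, merged, unmerged):
    """Process one sampled page; return True if it ends up merged.

    merged:   dict digest -> set of page ids sharing one merged page
    unmerged: dict digest -> single page id seen once so far
    """
    if digest in merged:
        # Content already has a merged page: join it.
        merged[digest].add(page_id)
        return True
    other = unmerged.get(digest)
    if other is not None and other != page_id:
        # Second page with the same content: move both from the
        # unmerged structure to the merged one.
        del unmerged[digest]
        merged[digest] = {other, page_id}
        return True
    # First sighting of this content: remember it as a merge candidate.
    unmerged[digest] = page_id
    return False
```

Counting how often `scan_page` returns True for a region gives the duplication ratio that is compared against Vdup; writes that hit entries in `merged` would feed the COW-broken ratio compared against Vcow.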

  • Adaptive Partial Hashing

    [Figure: state diagram with the labels "Probe state", "Half hashing strength", "Strength = Strength ± Delta" and "Adjust hash strength".]

    We optimize SuperFastHash with the following key contributions:
    • Minimizing collisions – Optimizing avalanche for SuperFastHash [Hsieh 2004].
    • Progressive hashing – Supporting additivity while adjusting hash strengths.

    [Figure: a sampled page is split into a 1st half and a 2nd half. Hashing the 1st half gives hash value H1 (round n), hashing the 2nd half gives hash value H2 (round n+1), and the two are combined into H1,2.]
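The idea of progressive hashing can be illustrated with any hash whose state can be carried forward. The sketch below uses 32-bit FNV-1a as a stand-in, which is not the modified SuperFastHash used by UKSM; the function names and the `strength` and `offset` parameters are hypothetical. It shows that a stronger hash can be obtained by extending the weaker one over the newly sampled bytes instead of rehashing from scratch.

```python
FNV_OFFSET_BASIS = 0x811C9DC5  # standard 32-bit FNV-1a offset basis
FNV_PRIME = 0x01000193         # standard 32-bit FNV prime


def fnv1a_extend(state, data):
    """Continue a 32-bit FNV-1a hash from `state` over the bytes in `data`."""
    for byte in data:
        state = ((state ^ byte) * FNV_PRIME) & 0xFFFFFFFF
    return state


def partial_hash(page, strength, offset=0):
    """Hash only `strength` bytes of `page`, starting at `offset`."""
    return fnv1a_extend(FNV_OFFSET_BASIS, page[offset:offset + strength])


# Round n: hash the first 128 sampled bytes.
page = bytes(range(256)) * 16          # a 4096-byte example page
h1 = partial_hash(page, 128)

# Round n+1: the strength grows to 256 bytes. Only the additional
# 128 bytes are processed, continuing from the earlier value.
h12 = fnv1a_extend(h1, page[128:256])
```

Here `h12` equals `partial_hash(page, 256)`, so raising the strength costs only the work for the extra bytes. The `offset` parameter hints at the sampling offset shift mentioned in the takeaways, although the slides do not describe how the offset is chosen.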

  • Evaluation

    • 6,000 lines of code in the Linux kernel.
    • OS: Vanilla kernel 4.4.
    • Hardware:
      • Intel® Core™ i7 CPU 920 with four 2.67 GHz cores.
      • 12 GB memory.

    • For fair comparison:
      • KSM is upgraded to SuperFastHash.

  • Evaluation Goals

    • How efficient is UKSM on different workloads?
    • How flexible is UKSM regarding customization?
    • What's the responsiveness of UKSM vs KSM?
    • How does adaptive partial hashing perform compared to a non-adaptive algorithm?
    • What's the performance penalty of UKSM?


  • Parameter Analysis

    [Figure: two panels over 0 to 300 seconds. First panel: CPU Utilization (%), 0 to 100, with curves for the Full and Quiet levels. Second panel: Memory Saving (MB), 0 to 6000, with curves for the Full, Medium, Low and Quiet levels. The figure is annotated with "Catching up time".]

    • UKSM allows four levels of scanning strength:
      • Level Full allows up to 95% CPU consumption and can scan the entire memory in 2 seconds.
      • Each lower level halves the CPU consumption and potentially doubles the scan time.

    Setting: Booting 25 VMs, each with 1 VCPU and 1 GB memory.
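The halving rule can be worked through numerically. The sketch below assumes the level order Full, Medium, Low, Quiet (taken from the plot legend) and treats the doubled scan time as a nominal figure, since the slide only says the scan time "potentially" increases by 2x. The function name `scan_budget` is hypothetical.

```python
LEVELS = ["Full", "Medium", "Low", "Quiet"]  # assumed order, strongest first


def scan_budget(level, full_cpu_percent=95.0, full_scan_seconds=2.0):
    """Return (CPU cap in %, nominal full-memory scan time in seconds)."""
    steps_down = LEVELS.index(level)
    return (
        full_cpu_percent / 2 ** steps_down,   # each step halves the CPU cap
        full_scan_seconds * 2 ** steps_down,  # and nominally doubles scan time
    )


for name in LEVELS:
    cpu, seconds = scan_budget(name)
    print(f"{name:<6} CPU <= {cpu:.3f}%  scan ~ {seconds:.0f} s")
```

Under these assumptions the four levels correspond to CPU caps of 95%, 47.5%, 23.75% and 11.875%, with nominal scan times of 2, 4, 8 and 16 seconds.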

  • Responsiveness Analysis

    [Figure: two panels over 0 to 600 seconds comparing UKSM, KSM 100 Pages, KSM 1000 Pages and KSM 2000 Pages. First panel: Memory Utilization (MB), 4000 to 11000. Second panel: CPU (% one core), 0 to 100. The plot carries the annotations 611, 95 and 615.]

    UKSM is 8.3×, 12.6× and 11.5× more efficient than KSM at scan speeds of 100, 1000 and 2000 pages, respectively.

    $\text{Efficiency} = \dfrac{\text{memory saving}}{\text{CPU consumption}}$

    Setting: Two processes, each with 4 GB memory. One contains identical pages while the other contains random ones.

  • Related Work

    • Content-based approach:
      • VMware ESX server, IBM active memory deduplication, Red Hat ksmtuned.
      • The majority of them treat every page equally.

    • I/O hint based approach:
      • KSM++ [Resolve 2012], XLH [Usenix ATC 2013], CMD [VEE 2014].
      • Cannot track anonymous memory space (no I/O) or require hardware changes.

    • SmartMD [Usenix ATC '17]:
      • Considers various page sizes; we are orthogonal.

  • Conclusion

    • Memory deduplication faces unique challenges. Our techniques:
      • Hierarchical region distilling.
      • Adaptive partial hashing.

    • UKSM saves 12.6x and 5x more memory than KSM on static and dynamic workloads, respectively, in the same time envelope.

    • UKSM is an in-production system: https://github.com/dolohow/uksm.
      • It has ~110 (watch, star and fork) after less than one year on GitHub.

  • Thank You & Questions?


    We would like to thank our shepherd Dr. Hong Jiang and anonymous reviewers!

