Date post: 12-Jul-2020

SHARED ADDRESS TRANSLATION REVISITED

Xiaowan Dong University of Rochester

Sandhya Dwarkadas University of Rochester

Alan L. Cox Rice University


Limitations of Current Shared Memory Management

• Physical memory sharing is common

• However, address translation is private per process
  • Page tables and Translation Lookaside Buffer (TLB) entries

• Potential for duplicate translation information (as much as 58% on Android)
  • Scalability problem: O(# of processes)
  • Inefficient utilization of shared caches

[Figure: Process 1 and Process 2 map the same physical memory, but each holds its own page table entries and TLB entries for it.]
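The O(# of processes) scaling can be made concrete with a back-of-the-envelope sketch. It assumes every process maps the same set of shared 4KB pages, that those pages pack perfectly into 1MB regions, and the ARMv7 short-descriptor layout in which one L2 table holds 256 entries. The workload numbers are hypothetical, not measurements from the slides.

```python
import math

# ARMv7 short-descriptor format: one L2 table has 256 entries and maps 1MB of VA
PTES_PER_L2_TABLE = 256

def translation_cost(shared_pages: int, processes: int, shared: bool) -> tuple:
    """Return (PTEs, L2 tables) needed for `processes` processes to map the
    same `shared_pages` 4KB pages, with private or with shared page tables."""
    ptes = shared_pages
    tables = math.ceil(shared_pages / PTES_PER_L2_TABLE)
    copies = 1 if shared else processes   # private tables duplicate everything
    return ptes * copies, tables * copies

# Hypothetical workload: 10,000 shared library pages, 50 processes
print(translation_cost(10_000, 50, shared=False))  # (500000, 2000)
print(translation_cost(10_000, 50, shared=True))   # (10000, 40)
```

With private tables, both counts grow linearly with the number of processes; with shared tables they stay constant.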


Previous Work

• Previous work shares page tables for applications handling large amounts of contiguous data
  • E.g., PostgreSQL database systems

• Limitations:
  • Overlooks code at smaller granularity (such as shared libraries)
  • Ignores duplication in the TLB

• New opportunities on Android, where shared libraries are used intensively


Android Process Creation Model

All applications are forked from the zygote process, so they share the same physical and virtual addresses for the libraries the zygote preloads
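Fork semantics explain why the translations come out identical. The sketch below is a toy model, not Android or kernel code: an address space is a dict from virtual page number to physical frame number, `preload` is a hypothetical helper, and the addresses are made up. Because `fork` copies the parent's mappings, every child ends up with exactly the zygote's translations for the preloaded pages.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Process:
    name: str
    # Virtual page number -> physical frame number (stand-in for PTEs)
    mappings: dict[int, int] = field(default_factory=dict)

    def fork(self, child_name: str) -> Process:
        # fork() gives the child a copy of the parent's address space,
        # so every inherited mapping has the same VA and the same PA
        return Process(child_name, dict(self.mappings))

def preload(proc: Process, base_vpn: int, base_pfn: int, npages: int) -> None:
    """Hypothetical helper: map a library's pages into `proc`."""
    for i in range(npages):
        proc.mappings[base_vpn + i] = base_pfn + i

zygote = Process("zygote")
preload(zygote, base_vpn=0x40000, base_pfn=0x8000, npages=16)
apps = [zygote.fork(f"app{i}") for i in range(3)]

# Every app holds exactly the same translations for the preloaded pages
print(all(app.mappings == zygote.mappings for app in apps))  # True
```

Identical translations held separately by each process are exactly the duplication that shared address translation targets.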


Goal: Shared Address Translation: Page Tables and TLB Entries


• Sharing address translation for the zygote-preloaded shared libraries

• Implemented at the OS level with existing hardware support
  • Mostly machine-independent

• Benefits
  • Reduce soft page faults
  • Improve cache and TLB performance

[Figure: Process 1 and Process 2 share a single page table entry and a single TLB entry for one physical page.]


Impact of Shared Libraries on Instruction Footprint

• Number of shared libraries per application:

• Loaded: 88 to 107 (zygote-preloaded: 88)

• Invoked: 24 to 68 (zygote-preloaded: 21 to 46)


[Figure: two charts, "% of inst pages accessed" and "% of inst fetched", each split into zygote-preloaded shared libraries and other shared libraries; labeled values: 93%, 98%, 68%, 72%.]


Shared Library Instruction Footprint Intersection

• Considerable overlap in the shared library code accessed across different applications

• 46% of total inst pages accessed are in common for each pair of applications

• Zygote-preloaded: 38%


[Figure: percentage of instruction footprint overlapped among Laya Music Player, Adobe Reader, and MX Player; labeled values: 91%, 72%, 85%.]
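The slides do not state how pairwise overlap is normalized. The sketch below assumes one plausible definition: pages accessed by both apps divided by pages accessed by either (intersection over union). The app names and page sets are hypothetical.

```python
from __future__ import annotations
from itertools import combinations

def pairwise_overlap(footprints: dict[str, set[int]]) -> dict[tuple[str, str], float]:
    """Fraction of each app pair's combined instruction pages (the union)
    that both apps accessed (the intersection). ASSUMPTION: union denominator."""
    result = {}
    for a, b in combinations(sorted(footprints), 2):
        union = footprints[a] | footprints[b]
        common = footprints[a] & footprints[b]
        result[(a, b)] = len(common) / len(union) if union else 0.0
    return result

# Hypothetical footprints: sets of instruction page numbers per app
footprints = {"A": {1, 2, 3, 4}, "B": {3, 4, 5, 6}, "C": {1, 2, 3, 4}}
overlap = pairwise_overlap(footprints)
print(overlap[("A", "C")])  # 1.0
```

Dividing by the smaller footprint instead would give higher percentages, so reported numbers are only comparable under the same definition.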


SHARING ADDRESS TRANSLATION


Sharing Page Tables

• The ARM architecture defines a two-level hierarchical page table

• L2 page table pages are shared at fork time between the zygote and its child processes
  • Supports private writable memory regions

• Shared page table pages and physical pages should both be managed in a copy-on-write (COW) manner


[Figure: L1 and L2 PTEs of the zygote and of an Android application, illustrating L2 page table pages shared between them.]
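The sketch below models a two-level walk in the ARMv7 short-descriptor format: VA[31:20] indexes a 4096-entry L1 table, VA[19:12] indexes a 256-entry L2 table, and VA[11:0] is the page offset. Sharing means the child's L1 entry points at the parent's L2 table object instead of a copy. It is a simplified model, not kernel code. Linux's ARM port actually pairs hardware L1 entries so that one page table page covers 2MB. The addresses are hypothetical.

```python
from __future__ import annotations
from typing import Optional

class L2Table:
    def __init__(self) -> None:
        self.entries: list[Optional[int]] = [None] * 256   # PFNs; 256 x 4KB = 1MB
        self.refcount = 1                                  # L1 tables pointing here

class L1Table:
    def __init__(self) -> None:
        self.entries: list[Optional[L2Table]] = [None] * 4096  # 4096 x 1MB = 4GB

def translate(l1: L1Table, va: int) -> Optional[int]:
    l2 = l1.entries[(va >> 20) & 0xFFF]      # VA[31:20] selects the L1 entry
    if l2 is None:
        return None                           # translation fault (first level)
    pfn = l2.entries[(va >> 12) & 0xFF]      # VA[19:12] selects the L2 entry
    if pfn is None:
        return None                           # translation fault (second level)
    return (pfn << 12) | (va & 0xFFF)        # VA[11:0] is the page offset

def share_l2(parent: L1Table, child: L1Table, index: int) -> None:
    """Point the child's L1 entry at the parent's L2 table instead of copying it."""
    table = parent.entries[index]
    if table is not None:
        child.entries[index] = table
        table.refcount += 1

zygote, app = L1Table(), L1Table()
zygote.entries[0x400] = L2Table()
zygote.entries[0x400].entries[0x34] = 0x8000
share_l2(zygote, app, 0x400)
print(hex(translate(app, 0x40034010)))  # 0x8000010
```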


Maintaining Shared Page Tables

• A shared page table page needs to be unshared (COWed) in the following cases:

• Page fault with write access

• A process creates, destroys, or modifies a memory region within the range of a shared page table page

• A process tries to free a shared page table page

• Modifying any memory region forces the entire shared page table page to be unshared
  • Hence the page table entries of a shared library's code segment and data segment are mapped into different page table pages
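Unsharing is copy-on-write applied to the page table page itself. The sketch below models the copy step only. Each process's L1 table is a dict from L1 index to page table page, and the PTP's refcount counts sharers. It deliberately omits the matching reference-count updates on the mapped physical pages, which the slides note must also be managed in a COW manner. Names and values are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class PageTablePage:
    ptes: dict[int, int] = field(default_factory=dict)   # L2 index -> PFN
    refcount: int = 1                                    # processes sharing it

def unshare(l1: dict[int, PageTablePage], index: int) -> PageTablePage:
    """Give the calling process a private copy of a shared PTP (COW).
    Triggered by a write fault, a region change in the PTP's range, or a free."""
    ptp = l1[index]
    if ptp.refcount == 1:
        return ptp                        # sole user: modify in place
    private = PageTablePage(dict(ptp.ptes))
    ptp.refcount -= 1                     # drop this process's reference
    l1[index] = private                   # repoint this process's L1 entry
    return private

shared = PageTablePage({0: 0xA, 1: 0xB}, refcount=2)
zygote_l1, app_l1 = {5: shared}, {5: shared}
unshare(app_l1, 5).ptes[1] = 0xC          # e.g. a write fault in the app
print(zygote_l1[5].ptes[1], app_l1[5].ptes[1])  # 11 12
```

Because the copy covers the whole PTP, a change to one small region costs the sharing of every other region in the same page. Keeping code and data PTEs in different PTPs limits that collateral loss.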


Sharing TLB Entries

• Global bit
  • We set the global bit in the page table entries of the zygote-preloaded shared libraries' code segments
  • Overrides the Address Space Identifier (ASID) in the TLB
• Domain protection model of 32-bit ARM
  • Prevents processes not forked from the zygote from accessing the shared global TLB entries
    • E.g., system services and daemons


Leveraging the Domain Protection Model

• Zygote-preloaded shared libraries are placed in their own domain (Domain 3) in user space
• Their TLB entries carry the VPN, the ASID, the global bit (set to 1), the domain field (0011 = Domain 3), and permission bits
• Each process's Domain Access Control Register (DACR) sets the access type for Domain 3:
  • 00 (no access) for non-zygote processes
  • 01 (access checked against the permission bits in the TLB entry) for zygote-like processes
• A domain fault traps into the kernel's memory abort handler, which checks the fault status register and, for a domain fault, flushes all TLB entries with the faulting address
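The lookup path above can be modeled in a few lines. A global entry matches regardless of the current ASID. The DACR field for the entry's domain then decides whether the access proceeds (01) or raises a domain fault (00). In ARMv7 the DACR holds sixteen 2-bit fields, with domain n at bits [2n+1:2n]. This is a software model under those assumptions, not the hardware algorithm or the authors' handler. `DomainFault` and the VPN values are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional

NO_ACCESS, CLIENT = 0b00, 0b01     # DACR encodings used on the slide

@dataclass
class TLBEntry:
    vpn: int
    asid: int
    global_bit: bool
    domain: int                    # 0..15

class DomainFault(Exception):
    pass

def dacr_field(dacr: int, domain: int) -> int:
    # Domain n occupies DACR bits [2n+1:2n]
    return (dacr >> (2 * domain)) & 0b11

def lookup(tlb: list[TLBEntry], vpn: int, asid: int, dacr: int) -> Optional[TLBEntry]:
    for e in tlb:
        # A global entry matches regardless of the current ASID
        if e.vpn == vpn and (e.global_bit or e.asid == asid):
            if dacr_field(dacr, e.domain) == NO_ACCESS:
                raise DomainFault(vpn)
            return e               # CLIENT: permission bits are checked next
    return None                    # TLB miss: walk this process's page table

def handle_domain_fault(tlb: list[TLBEntry], vpn: int) -> list[TLBEntry]:
    # Flush every TLB entry for the faulting address, as on the slide
    return [e for e in tlb if e.vpn != vpn]

tlb = [TLBEntry(vpn=0x40034, asid=1, global_bit=True, domain=3)]
zygote_like_dacr = CLIENT << (2 * 3)      # Domain 3 = 01
print(lookup(tlb, 0x40034, asid=7, dacr=zygote_like_dacr) is tlb[0])  # True
```

A non-zygote process (Domain 3 = 00) that hits the global entry faults instead of using another process's translation. After the flush, its access can only be satisfied from its own page table.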


EVALUATION


Evaluation Platforms

• Nexus 7 (2012)
  • 1.2GHz Nvidia Tegra 3 processor with four ARM Cortex-A9 cores
  • A private 2-level TLB
    • I/D micro TLBs (flushed on context switch)
    • 128-entry main TLB
  • 32KB/32KB L1 caches (I/D)
  • 1MB shared L2 cache
• Android KitKat 4.4.4 OS
  • New Android Runtime (ART)
• Benchmarks:
  • Most popular application in each category on the Google Play Store


Zygote Fork

• Sharing page table improves execution time of a zygote fork by 2.1x

• Trade-off between the cost of fork and the number of page faults experienced by child processes
  • Sharing page tables gives the best of both worlds


Configuration    Kernel execution cycles (×10^6)   # of PTPs allocated   # of PTEs copied
Stock Android    2.9                               38                    3,900
Copied PTEs      4.6                               51                    9,800
Shared PTPs      1.4                               1                     7
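The 2.1x speedup quoted above follows from the kernel-cycle column of the table (stock Android vs. shared PTPs):

```latex
\text{fork speedup} = \frac{2.9 \times 10^{6}\ \text{cycles (stock Android)}}{1.4 \times 10^{6}\ \text{cycles (shared PTPs)}} \approx 2.07 \approx 2.1\times
```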


Application Launch Performance

• Every application follows the same launch procedure before it loads its application-specific Java classes

• Launch time improved by 7% (10% with 2MB alignment)
  • 94% fewer page faults for creating PTEs that map shared code and data
  • 15% reduction in L1 I-cache stall cycles
  • 68% fewer page table pages allocated


Over The Course of Execution


38% fewer page faults for creating PTEs that map shared code and data on average (maximum 78%)

35% fewer page table pages allocated (maximum 58%)

[Chart: PTP allocation normalized to stock Android]


Android IPC Performance

• Inter-process communication (IPC) is common on Android

• Developed a microbenchmark using the Android binder IPC mechanism

• Instruction main TLB stall cycles are reduced by:
  • Client: 36%
  • Server: 19%


Conclusion

• Android presents opportunities for shared library address translation sharing

• We eliminated the duplication of address translation on Android

• Android’s application launch, steady-state, and context switch efficiency are improved

• Speed up a zygote fork by 2.1x

• Improve application launch by 10%

• Our shared address translation infrastructure should be portable to other platforms


Large Pages Are Inefficient for Zygote-preloaded Shared Libraries

• Using large pages (e.g., 64KB pages) wastes physical memory compared to 4KB base pages:
  • 2.6x memory consumption on average
  • 94% more memory consumption for the union set
• Linux does not support the use of large pages for code
• Our design can complement large pages
  • 64KB pages on ARM also require a 2-level page table, as 4KB pages do


[Figure: CDF of the number of untouched 4KB pages within each 64KB large page of zygote-preloaded shared libraries]
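The memory cost of large pages comes from untouched 4KB pages being pulled in along with touched ones. The sketch below assumes a 64KB page becomes fully resident as soon as any of its 16 base pages is touched, and it ignores segment alignment. The set of touched page numbers is hypothetical.

```python
from __future__ import annotations

BASE = 4 * 1024                 # 4KB base page
LARGE = 64 * 1024               # 64KB large page
PER_LARGE = LARGE // BASE       # 16 base pages per large page

def memory_footprint(touched: set[int]) -> tuple[int, int]:
    """Resident bytes with 4KB pages vs. 64KB pages, given the numbers of
    the 4KB pages a workload touched."""
    with_base = len(touched) * BASE
    large_pages = {p // PER_LARGE for p in touched}   # any touch pins all 64KB
    with_large = len(large_pages) * LARGE
    return with_base, with_large

print(memory_footprint({0, 1, 17, 40}))  # (16384, 196608)
```

The more untouched 4KB pages each 64KB page contains, the larger the gap between the two totals.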


Sharing TLB


• zygote (exec): Task_struct.zygote = 1
  • mmap the code segment of a shared library: Vma.global = 1
• fork: the child gets Task_struct.zygote_like = 1 and inherits Vma.global = 1
• Page fault on a zygote-preloaded shared library:
  • Task_struct.zygote = 1 or zygote_like = 1? If yes:
  • Vma.global = 1? If yes: set the global bit in the PTE
• The global bit is used for kernel pages in stock Linux
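The flowchart's decision can be sketched as a small policy function. The `Task`, `VMA`, and `PTE` classes below are hypothetical stand-ins for `task_struct`, the VMA, and a page table entry. `global_` is used because `global` is a Python keyword. Whether grandchildren of the zygote are also zygote-like is not stated, so `fork` marks only direct children. On ARMv7 hardware, a "global" mapping corresponds to leaving the nG (not-global) bit clear.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Task:
    zygote: bool = False        # models Task_struct.zygote
    zygote_like: bool = False   # models Task_struct.zygote_like

@dataclass
class VMA:
    global_: bool = False       # models Vma.global

@dataclass
class PTE:
    pfn: int
    global_bit: bool = False

def mmap_code_segment(task: Task) -> VMA:
    # Only the zygote marks shared library code VMAs as global
    return VMA(global_=task.zygote)

def fork(parent: Task) -> Task:
    # Direct children of the zygote become zygote-like
    return Task(zygote_like=parent.zygote)

def on_page_fault(task: Task, vma: VMA, pfn: int) -> PTE:
    pte = PTE(pfn)
    if (task.zygote or task.zygote_like) and vma.global_:
        pte.global_bit = True   # on ARMv7: leave the nG bit clear
    return pte

zygote = Task(zygote=True)
lib_code = mmap_code_segment(zygote)   # reused as-is by the child in this model
app = fork(zygote)
print(on_page_fault(app, lib_code, 0x8000).global_bit)     # True
print(on_page_fault(Task(), lib_code, 0x8000).global_bit)  # False
```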


Sharing Page Table at Fork

[Figure: the parent's and the child's address spaces, each with VMAs vma1 to vma3 and an L1 PTP holding L1 PTE1 to PTE3; the L1 PTEs point to an L2 PTP (L2 PTE1 to PTE3) that becomes a shared PTP.]

• At fork, if the L2 PTP is not already shared, every writable L2 PTE is write-protected before the PTP is shared
• Virtual memory area (VMA): a memory region
• If ARM supported write protection in L1 PTEs, as x86 does, write-protecting every L2 PTE could be avoided
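The fork-time step can be sketched as follows. When a PTP is shared for the first time, its L2 PTEs are write-protected so that a later write by either process faults and can be handled by copy-on-write. The child's L1 entry then simply reuses the parent's PTP. This is a simplified model that shares every PTP in the parent's L1 dict. Real code would restrict sharing to eligible regions and consult VMA flags to tell a COW fault from a genuine protection error. Names are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class L2PTE:
    pfn: int
    writable: bool

@dataclass
class L2PTP:
    ptes: list[L2PTE] = field(default_factory=list)
    sharers: int = 1                       # > 1 means the PTP is shared

def share_at_fork(parent_l1: dict[int, L2PTP], child_l1: dict[int, L2PTP]) -> None:
    for index, ptp in parent_l1.items():
        if ptp.sharers == 1:
            # Not yet shared: write-protect every L2 PTE so that a later write
            # faults. The VMA still records that the region is writable.
            for pte in ptp.ptes:
                pte.writable = False
        ptp.sharers += 1
        child_l1[index] = ptp              # child's L1 entry reuses the PTP

ptp = L2PTP([L2PTE(1, True), L2PTE(2, False)])
parent, child = {0: ptp}, {}
share_at_fork(parent, child)
print(child[0] is ptp, ptp.sharers)        # True 2
```

An already-shared PTP skips the per-PTE loop, which is why subsequent forks are cheap. An L1-level write-protect bit would remove the loop entirely.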

