
SFO15-407: Performance Overhead of ARM Virtualization

SFO15-407: Performance Overhead of ARM Virtualization Linaro Connect SFO15 Christoffer Dall, Linaro
Page 1: SFO15-407: Performance Overhead of ARM Virtualization

SFO15-407: Performance Overhead of ARM Virtualization

Linaro Connect SFO15

Christoffer Dall, Linaro

Page 2: SFO15-407: Performance Overhead of ARM Virtualization

Virtualization Use Cases

• Resource Sharing

• Isolation

• High Availability

• Provisioning

• Load balancing

Page 3: SFO15-407: Performance Overhead of ARM Virtualization

No Free Lunches

There’s a cost: Performance

Page 4: SFO15-407: Performance Overhead of ARM Virtualization

Virtualization on ARM

Virtualization Extensions

Page 5: SFO15-407: Performance Overhead of ARM Virtualization

ARM Virtualization Extensions

Kernel

User

Page 6: SFO15-407: Performance Overhead of ARM Virtualization

ARM Virtualization Extensions

Kernel

User

Hyp

EL0

EL1

EL2

Page 7: SFO15-407: Performance Overhead of ARM Virtualization

Kernel

User

Hypervisor

Kernel

User

VM 0 VM 1

ARM Virtualization Extensions

EL0

EL1

EL2

Page 8: SFO15-407: Performance Overhead of ARM Virtualization

Xen ARM

Kernel

User

Kernel

User

Dom0 DomU

EL0

EL1

EL2 Xen

Page 9: SFO15-407: Performance Overhead of ARM Virtualization

KVM

ARM Hardware

Kernel

VM Guest Kernel

Hypervisor

VM Guest Kernel

HYP mode?

Page 10: SFO15-407: Performance Overhead of ARM Virtualization

KVM

Kernel / KVM

User

Kernel

User

Host VM

EL0

EL1

EL2 KVM

Page 11: SFO15-407: Performance Overhead of ARM Virtualization

Hypercall on Xen

Kernel

User

Xen

Kernel

User

Dom0 DomU

EL0

EL1

EL2

HVC Ret

Page 12: SFO15-407: Performance Overhead of ARM Virtualization

Hypercall on KVM

Kernel / KVM

User

Kernel

User

Host VM

EL0

EL1

EL2 KVM

HVC

switch state

Ret

switch state

Ret

HVC

Page 13: SFO15-407: Performance Overhead of ARM Virtualization

Hypercall Comparison

Xen

1. HVC Instruction
2. Xen handler
3. Return to VM

KVM

1. HVC Instruction
2. Switch EL1 state in KVM EL2
3. Return to host kernel
4. KVM handler
5. HVC Instruction
6. Switch EL1 state in KVM EL2
7. Return to VM
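
The two hypercall paths listed on this slide can be written down as data and compared. The following is only an illustrative sketch: the step names are taken from the slide, while the helper names `count_steps` and `summarize` are hypothetical and the counts say nothing about actual cycle costs.

```python
# Hypercall paths as listed on the slide, one string per step.
XEN_STEPS = [
    "HVC instruction",
    "Xen handler",
    "Return to VM",
]

KVM_STEPS = [
    "HVC instruction",
    "Switch EL1 state in KVM EL2",
    "Return to host kernel",
    "KVM handler",
    "HVC instruction",
    "Switch EL1 state in KVM EL2",
    "Return to VM",
]


def count_steps(steps, prefix):
    """Count the steps whose description starts with the given prefix."""
    return sum(1 for step in steps if step.startswith(prefix))


def summarize(name, steps):
    """Summarize a hypercall path: total steps, HVC traps, EL1 state switches."""
    return {
        "hypervisor": name,
        "total_steps": len(steps),
        "hvc_traps": count_steps(steps, "HVC"),
        "el1_state_switches": count_steps(steps, "Switch EL1 state"),
    }


if __name__ == "__main__":
    for name, steps in (("Xen", XEN_STEPS), ("KVM", KVM_STEPS)):
        print(summarize(name, steps))
```

The summary makes the structural difference visible: the KVM path on the slide contains two HVC instructions and two EL1 state switches, while the Xen path has one HVC and no state switch.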

Page 14: SFO15-407: Performance Overhead of ARM Virtualization
Page 15: SFO15-407: Performance Overhead of ARM Virtualization

Measurement Methodology

• Compare virtual to native

• CloudLab cluster with both ARM64 servers and x86 servers

Page 16: SFO15-407: Performance Overhead of ARM Virtualization

Hardware Used

           ARM Server                  x86 Server
Type       HP Moonshot m400            Dell r320
CPU        2.4 GHz APM Atlas           2.1 GHz Xeon E5-2450
SMP        8-way                       8-way
Memory     64 GB                       16 GB
Disk       SATA SSD                    7200 RPM SATA HDD
Network    Mellanox ConnectX-3 10GbE   Mellanox MX354A 10GbE

Page 17: SFO15-407: Performance Overhead of ARM Virtualization

Configuration

• Same configuration across hardware

• Max 12GB of RAM

• Max 4 CPUs

• Hyperthreading disabled on x86

Page 18: SFO15-407: Performance Overhead of ARM Virtualization

VM Configuration

• 4 vCPUs per VM/host, 8 physical CPUs

• Pin all VCPUs to dedicated PCPUs

• VHOST enabled for KVM
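
As a rough illustration of what pinning vCPUs to dedicated pCPUs means, the sketch below builds a one-to-one vCPU-to-pCPU map and applies it with Python's `os.sched_setaffinity` (Linux only). The function names, the choice of pCPUs 0-3 for the host and 4-7 for the VM, and the idea of passing vCPU thread IDs in by hand are assumptions for illustration, not the setup used in the talk.

```python
import os


def dedicated_pinning(num_vcpus, first_pcpu):
    """Map vCPU i to physical CPU first_pcpu + i, one pCPU per vCPU."""
    return {vcpu: first_pcpu + vcpu for vcpu in range(num_vcpus)}


def pin_thread(tid, pcpu):
    """Pin one thread (given by its Linux thread ID) to a single physical CPU."""
    os.sched_setaffinity(tid, {pcpu})


def apply_pinning(vcpu_tids, pinning):
    """Pin each vCPU thread according to the vCPU -> pCPU map.

    vcpu_tids maps vCPU index -> thread ID; how those IDs are discovered
    depends on the hypervisor tooling and is outside this sketch.
    """
    for vcpu, pcpu in pinning.items():
        pin_thread(vcpu_tids[vcpu], pcpu)


if __name__ == "__main__":
    # Assumed split of the 8 physical CPUs: 4 for the host, 4 for the VM.
    host_map = dedicated_pinning(num_vcpus=4, first_pcpu=0)
    vm_map = dedicated_pinning(num_vcpus=4, first_pcpu=4)
    print("host:", host_map)
    print("vm:  ", vm_map)
```

Keeping the two maps disjoint is the point of "dedicated" pinning: no pCPU is shared between a host CPU and a VM vCPU.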

Page 19: SFO15-407: Performance Overhead of ARM Virtualization

Software Configurations

• Same software version

• Linux v4.1-rc2+

• Ubuntu Trusty

• Same kernel config, manually tweaked x86 and arm64 options

Page 20: SFO15-407: Performance Overhead of ARM Virtualization

Micro Numbers

                   ARM 64-bit    x86 64-bit
Microbenchmark     KVM    Xen    KVM    Xen
Hypercall          NA     NA     NA     NA
Interrupt Trap     NA     NA     NA     NA
IPI                NA     NA     NA     NA
EOI+ACK            NA     NA     NA     NA
VM Switch          NA     NA     -      -
I/O Latency Out    NA     NA     -      -
I/O Latency In     NA     NA     -      -

All numbers shown in cycles
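
Because the microbenchmark results are reported in cycles, comparing the ARM and x86 machines in wall-clock terms requires dividing by each CPU's clock rate. The sketch below assumes cycles are counted at the nominal frequencies from the hardware slide (2.4 GHz ARM, 2.1 GHz x86); the function name and the cycle count in the example are hypothetical, not results from the talk.

```python
# Nominal clock rates from the hardware slide, in MHz (= cycles per microsecond).
ARM_MHZ = 2400  # 2.4 GHz APM Atlas
X86_MHZ = 2100  # 2.1 GHz Xeon


def cycles_to_us(cycles, cpu_mhz):
    """Convert a cycle count to microseconds at the given clock rate."""
    if cpu_mhz <= 0:
        raise ValueError("cpu_mhz must be positive")
    return cycles / cpu_mhz


if __name__ == "__main__":
    example_cycles = 4800  # made-up value, for illustration only
    print("ARM:", cycles_to_us(example_cycles, ARM_MHZ), "us")
    print("x86:", cycles_to_us(example_cycles, X86_MHZ), "us")
```

The same cycle count corresponds to slightly less time on the faster-clocked ARM machine, so cycle counts and latencies are not interchangeable across the two platforms.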

Page 21: SFO15-407: Performance Overhead of ARM Virtualization

Hypercall Breakdown: KVM ARM

State               Save   Restore
GP Regs             NA     NA
System Regs         NA     NA
FP Regs             NA     NA
VGIC Regs           NA     NA
Timer Regs          NA     NA
EL2 Config Regs     NA     NA
Stage-2 MMU Regs    NA     NA

Save = Save state to memory
Restore = Restore state from memory

Page 22: SFO15-407: Performance Overhead of ARM Virtualization

Micro Numbers

                   ARM 64-bit    x86 64-bit
Microbenchmark     KVM    Xen    KVM    Xen
Hypercall          NA     NA     NA     NA
Interrupt Trap     NA     NA     NA     NA
IPI                NA     NA     NA     NA
EOI+ACK            NA     NA     NA     NA
VM Switch          NA     NA     -      -
I/O Latency Out    NA     NA     -      -
I/O Latency In     NA     NA     -      -

All numbers shown in cycles

Page 23: SFO15-407: Performance Overhead of ARM Virtualization

I/O Latency Out Xen

Kernel

User

Xen

Kernel

User

Dom0 - PCPU 0 DomU - PCPU 4

EL0

EL1

EL2

HVC

IRQ

IPI

vIRQ

Page 24: SFO15-407: Performance Overhead of ARM Virtualization

I/O Latency Out KVM

Kernel / KVM

User

Kernel

User

Host - PCPU 4 VM - PCPU 4

EL0

EL1

EL2 KVM

MMIO Trap

switch state

Ret

Same physical CPU!

Page 25: SFO15-407: Performance Overhead of ARM Virtualization

Micro Numbers

                   ARM 64-bit    x86 64-bit
Microbenchmark     KVM    Xen    KVM    Xen
Hypercall          NA     NA     NA     NA
Interrupt Trap     NA     NA     NA     NA
IPI                NA     NA     NA     NA
EOI+ACK            NA     NA     NA     NA
VM Switch          NA     NA     -      -
I/O Latency Out    NA     NA     -      -
I/O Latency In     NA     NA     -      -

All numbers shown in cycles

Page 26: SFO15-407: Performance Overhead of ARM Virtualization

I/O Latency In Xen

Kernel

User

Xen

Kernel

User

Dom0 - PCPU 0 DomU - PCPU 4

EL0

EL1

EL2

IRQ

HVC

IPI

vIRQ

Page 27: SFO15-407: Performance Overhead of ARM Virtualization

I/O Latency In KVM

Kernel / KVM

User

Kernel

User

Host - PCPU 0 VM - PCPU 4

EL0

EL1

EL2 KVM

IRQ Exit

switch state

Ret

IPI from VHOST

switch state

Ret

vIRQ

HVC

Page 28: SFO15-407: Performance Overhead of ARM Virtualization

Micro Numbers

                   ARM 64-bit    x86 64-bit
Microbenchmark     KVM    Xen    KVM    Xen
Hypercall          NA     NA     NA     NA
Interrupt Trap     NA     NA     NA     NA
IPI                NA     NA     NA     NA
EOI+ACK            NA     NA     NA     NA
VM Switch          NA     NA     -      -
I/O Latency Out    NA     NA     -      -
I/O Latency In     NA     NA     -      -

All numbers shown in cycles

Page 29: SFO15-407: Performance Overhead of ARM Virtualization

CPU Intensive Benchmarks

Normalized overhead (lower is better)

• Kernbench

• Hackbench

• SPECjvm2008

Results are NA
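
One common way to compute a "normalized overhead" for run-time benchmarks is to divide the run time inside the VM by the native run time, so that 1.0 means no overhead and larger values are worse. The sketch below assumes that definition; the slides do not spell out the exact normalization used, and the numbers in the example are made up.

```python
def normalized_time(virtualized_seconds, native_seconds):
    """Run time in the VM divided by native run time; 1.0 means no overhead."""
    if native_seconds <= 0:
        raise ValueError("native_seconds must be positive")
    return virtualized_seconds / native_seconds


def overhead_percent(virtualized_seconds, native_seconds):
    """Extra run time in the VM, as a percentage of the native run time."""
    return (normalized_time(virtualized_seconds, native_seconds) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    print(normalized_time(150.0, 100.0))
    print(overhead_percent(150.0, 100.0))
```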

Page 30: SFO15-407: Performance Overhead of ARM Virtualization

Netperf

Normalized overhead (lower is better)

• TCP_STREAM

• TCP_MAERTS

• TCP_RR

Results are NA

Page 31: SFO15-407: Performance Overhead of ARM Virtualization

Netperf Study

• TCP_STREAM sends bulk data from client to VM

• Xen does not support zero-copy

Page 32: SFO15-407: Performance Overhead of ARM Virtualization

Netperf Study

• TCP_MAERTS sends bulk data from VM to client

• Xen performance is a regression in Linux v4.0, caused by a patch to fight bufferbloat

• Can be reduced to XX% by tuning sysfs

Page 33: SFO15-407: Performance Overhead of ARM Virtualization

Netperf Study

• TCP_RR sends byte-by-byte on an open connection

                      Native   KVM   Xen
Trans/sec             NA       NA    NA
Time/trans            NA       NA    NA
Overhead              -        NA    NA
recv to send          NA       NA    NA
VM recv to VM send             NA    NA
recv to VM recv       -        NA    NA
VM send to send                NA    NA

Numbers in microseconds
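
The Trans/sec and Time/trans rows are two views of the same measurement when only one transaction is in flight at a time on the connection. The sketch below shows that relationship and one plausible reading of the Overhead row as added time per transaction relative to native; both the single-outstanding-transaction assumption and that reading of "Overhead" are assumptions, and the rates in the example are made up.

```python
def time_per_transaction_us(trans_per_sec):
    """Average time per request/response round trip, in microseconds."""
    if trans_per_sec <= 0:
        raise ValueError("trans_per_sec must be positive")
    return 1_000_000 / trans_per_sec


def added_latency_us(native_tps, virtualized_tps):
    """Extra time per transaction in the VM compared to native, in microseconds."""
    return time_per_transaction_us(virtualized_tps) - time_per_transaction_us(native_tps)


if __name__ == "__main__":
    # Hypothetical transaction rates, for illustration only.
    native_tps, vm_tps = 20_000, 10_000
    print(time_per_transaction_us(native_tps))
    print(time_per_transaction_us(vm_tps))
    print(added_latency_us(native_tps, vm_tps))
```

Because TCP_RR is latency-bound, halving the transaction rate in this example corresponds to doubling the per-transaction time rather than to any loss of bandwidth.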

Page 34: SFO15-407: Performance Overhead of ARM Virtualization

Application Benchmarks

Normalized overhead (lower is better)

• Apache

• memcached

• MySQL 20 Threads

Results are NA

Page 35: SFO15-407: Performance Overhead of ARM Virtualization

Conclusions

• Despite better hypercall performance, Xen does not necessarily outperform KVM on ARM.

• ARM servers do not exhibit worse virtualization overhead than x86 and are a viable choice.

• Latency is significant with paravirtualized I/O.

Page 36: SFO15-407: Performance Overhead of ARM Virtualization

Future Work

• Further application benchmark analysis

• Device Assignment

• Upstream support for micro-benchmarks

• Automation and regression monitoring

