Bare-Metal Performance for x86 I/O Virtualization
Muli Ben-Yehuda
Technion & IBM Research
HiPEAC Autumn Computing Systems Week in Barcelona
Muli Ben-Yehuda (Technion & IBM Research) Bare-Metal Perf. for I/O Virtualization HiPEAC CSW Nov, 2011 1 / 23
Background: x86 machine virtualization
Running multiple different unmodified operating systems
Each in an isolated virtual machine
Simultaneously
On the x86 architecture
Many uses: live migration, record & replay, testing, security, ...
Foundation of IaaS cloud computing
Used nearly everywhere
The problem is performance
Machine virtualization can reduce performance by orders of magnitude [Adams06, Santos08, Ram09, Ben-Yehuda10, Amit11, ...]
Overhead limits use of virtualization in many scenarios
We would like to make it possible to use virtualization everywhere
Where does the overhead come from?
The origin of overhead
Popek and Goldberg’s virtualization model [Popek74]: trap and emulate
Privileged instructions trap to the hypervisor
Hypervisor emulates their behavior
Traps cause an exit
I/O intensive workloads cause many exits
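A toy sketch of trap and emulate, to make the link between privileged instructions and exits concrete. The instruction names, the `ToyHypervisor` class, and the two sample programs are hypothetical illustrations, not part of any real hypervisor.

```python
from dataclasses import dataclass, field

# Hypothetical set of privileged instructions, for illustration only.
PRIVILEGED = {"in", "out", "wrmsr"}


@dataclass
class ToyHypervisor:
    exits: int = 0
    device_log: list = field(default_factory=list)

    def emulate(self, op, arg):
        """Handle one trapped instruction on behalf of the guest."""
        self.exits += 1
        self.device_log.append((op, arg))


def run_guest(program, hypervisor):
    """Run unprivileged instructions directly; trap on privileged ones."""
    executed_directly = 0
    for op, arg in program:
        if op in PRIVILEGED:
            hypervisor.emulate(op, arg)  # trap -> exit -> emulate
        else:
            executed_directly += 1
    return executed_directly


# Two programs with similar length but a different share of I/O.
compute_bound = [("add", 1)] * 8 + [("out", 0x3F8)]
io_bound = [("add", 1), ("out", 0x3F8)] * 4

hv_compute, hv_io = ToyHypervisor(), ToyHypervisor()
run_guest(compute_bound, hv_compute)
run_guest(io_bound, hv_io)
print(hv_compute.exits, hv_io.exits)
```

The I/O-heavy program triggers one exit per `out`, which is the pattern the rest of the talk tries to eliminate.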
I/O virtualization via device emulation
[Figure: I/O path through device emulation, steps 1-4. Labels: GUEST, HOST, device emulation, device driver (twice).]
Emulation is usually the default [Sugerman01]
Works for unmodified guests out of the box
Very low performance, due to many exits on the I/O path
I/O virtualization via paravirtualized devices
[Figure: paravirtualized I/O path, steps 1-3. Labels: GUEST, HOST, front-end virtual driver, back-end virtual driver, device driver.]
Hypervisor-aware drivers and “devices” [Barham03, Russell08]
Requires new guest drivers
Requires hypervisor involvement on the I/O path
I/O virtualization via device assignment
[Figure: device assignment. Labels: GUEST, HOST, device driver.]
Bypass the hypervisor on the I/O path [Levasseur04, Ben-Yehuda06]
SR-IOV devices provide sharing in hardware
Better performance than paravirtual, but far from native
Comparing I/O virtualization methods
IOV method          throughput (Mb/s)   CPU utilization
bare-metal          950                 20%
device assignment   950                 25%
paravirtual         950                 50%
emulation           250                 100%
netperf TCP_STREAM sender on 1Gb/s Ethernet (16K msgs)
Device assignment is the best performing option
Device assignment still uses 25% more CPU than bare metal (25% vs. 20% utilization). Why?
“The Turtles Project: Design and Implementation of Nested Virtualization”, Ben-Yehuda, Day, Dubitzky, Factor, Har’El, Gordon, Liguori, Wasserman and Yassour, OSDI ’10
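The table can be read as CPU cost per unit of throughput. The sketch below normalizes each method against bare metal; the numbers are copied from the table, while the metric itself (CPU utilization divided by throughput) is one reasonable way to combine the two columns, not necessarily the one used in the paper.

```python
# (throughput in Mb/s, CPU utilization as a fraction), copied from the table.
results = {
    "bare-metal": (950, 0.20),
    "device assignment": (950, 0.25),
    "paravirtual": (950, 0.50),
    "emulation": (250, 1.00),
}


def cpu_per_mbit(method):
    """CPU utilization spent per Mb/s of throughput."""
    throughput, cpu = results[method]
    return cpu / throughput


baseline = cpu_per_mbit("bare-metal")
relative_cost = {m: cpu_per_mbit(m) / baseline for m in results}

for method, cost in relative_cost.items():
    print(f"{method:18s} {cost:5.2f}x bare-metal CPU per Mb/s")
```

By this measure device assignment costs 1.25 times the CPU of bare metal for the same throughput, which is the gap the question on the slide refers to.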
What does it mean, to do I/O?
Programmed I/O (in/out instructions)
Memory-mapped I/O (loads and stores)
Direct memory access (DMA)
Interrupts
Direct memory access (DMA)
All modern devices access memory directly
On bare-metal:
  A trusted driver gives its device an address
  Device reads or writes that address
Protection problem: guest drivers are not trusted
Translation problem: guest memory ≠ host memory
Direct access: the guest bypasses the host
What to do?
IOMMU
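A minimal sketch of how an IOMMU addresses the protection and translation problems from the DMA slide: every device access goes through a per-device table that translates an I/O virtual address (IOVA) to a host physical address and faults on anything unmapped. The class, the flat dictionary table, and the device name `nic0` are simplifying assumptions; real IOMMUs use multi-level page tables walked in hardware.

```python
PAGE_SIZE = 4096


class IOMMUFault(Exception):
    """Raised when a device touches an address it has no valid mapping for."""


class ToyIOMMU:
    def __init__(self):
        # (device_id, IOVA page number) -> (host page number, writable)
        self.table = {}

    def map(self, device_id, iova, host_addr, writable=True):
        key = (device_id, iova // PAGE_SIZE)
        self.table[key] = (host_addr // PAGE_SIZE, writable)

    def unmap(self, device_id, iova):
        self.table.pop((device_id, iova // PAGE_SIZE), None)

    def translate(self, device_id, iova, write=False):
        entry = self.table.get((device_id, iova // PAGE_SIZE))
        if entry is None:
            raise IOMMUFault(f"{device_id}: no mapping for IOVA {iova:#x}")
        host_page, writable = entry
        if write and not writable:
            raise IOMMUFault(f"{device_id}: write to read-only IOVA {iova:#x}")
        return host_page * PAGE_SIZE + iova % PAGE_SIZE


iommu = ToyIOMMU()
iommu.map("nic0", iova=0x1000, host_addr=0x7F000)
host_addr = iommu.translate("nic0", 0x1010)  # translated DMA address

try:
    iommu.translate("nic0", 0x5000)  # never mapped: blocked
    blocked = False
except IOMMUFault:
    blocked = True
```

Translation gives the device the host address behind a guest-chosen address; the fault path is what keeps an untrusted guest driver from directing DMA at memory it does not own.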
The IOMMU mapping memory/performance tradeoff
When does the host map and unmap translation entries?
Direct mapping up-front on virtual machine creation: all memory is pinned, no intra-guest protection
During run-time: high cost in performance
We want: direct mapping performance, intra-guest protection, minimal pinning
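The tradeoff can be put in rough numbers with a simple counting model. Everything here is an assumption made for illustration: the 4 GiB guest, 4 KiB pages, one map and one unmap per I/O request in the run-time strategy, and the request and queue sizes.

```python
PAGE_SIZE = 4096


def direct_mapping(guest_pages):
    """Map all guest memory once, when the virtual machine is created."""
    return {"pinned_pages": guest_pages, "runtime_map_unmap_ops": 0}


def runtime_mapping(io_requests, pages_per_request, max_in_flight):
    """Map each buffer before its DMA and unmap it afterwards."""
    return {
        # Only buffers with DMA in flight need to stay pinned.
        "pinned_pages": pages_per_request * max_in_flight,
        # One map plus one unmap per request (assumed).
        "runtime_map_unmap_ops": 2 * io_requests,
    }


guest_pages = (4 * 1024**3) // PAGE_SIZE  # hypothetical 4 GiB guest
upfront = direct_mapping(guest_pages)
on_demand = runtime_mapping(io_requests=100_000, pages_per_request=4,
                            max_in_flight=256)
print(upfront)
print(on_demand)
```

Up-front mapping pays in pinned memory and gives up intra-guest protection; run-time mapping pays on every I/O request. The goal stated on the slide is to get the low pinning of the second with the run-time cost of the first.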
vIOMMU: efficient IOMMU emulation
Emulate an IOMMU so that we know when to map and unmap
Use a sidecore [Kumar07] for efficient emulation: avoid costly exits by running emulation on another core in parallel
Optimistic teardown: relax protection to increase performance by caching translation entries
vIOMMU provides high performance with intra-guest protection and minimal pinning
[Figure: vIOMMU architecture. Domains: guest domain, emulation domain (sidecore), system domain. Components: I/O device driver, IOMMU mapping layer, emulated IOMMU registers, emulated PTE, IOMMU emulation, physical PTE, IOMMU, I/O device, memory, I/O buffer. Steps: (1) map/unmap, (2) update mappings, (3) IOTLB invalidation, (4) poll, (5) read, (6) update mappings, (7) IOTLB invalidations, (8) transaction to IOVA, (9) IOVA access, (10) translate, (11) physical access.]
“vIOMMU: Efficient IOMMU Emulation”, Amit, Ben-Yehuda, Schuster, Tsafrir, USENIX ATC ’11
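One way to picture optimistic teardown is as a cache of mappings that the guest has already unmapped but that are left installed for a while, in the hope that the same buffer is mapped again soon. The sketch below is an assumption-laden illustration, not the paper's implementation: the class name, the `max_stale` quota, and the flush policy are all hypothetical.

```python
class OptimisticTeardownCache:
    def __init__(self, max_stale=4):
        self.live = {}    # IOVA -> host address, mapped by the guest
        self.stale = {}   # IOVA -> host address, unmapped but still installed
        self.max_stale = max_stale
        self.physical_updates = 0  # changes pushed to the physical IOMMU

    def map(self, iova, host_addr):
        if self.stale.get(iova) == host_addr:
            # Same buffer mapped again: reuse the entry, no physical update.
            self.live[iova] = self.stale.pop(iova)
            return
        self.stale.pop(iova, None)  # a different buffer replaces a stale entry
        self.live[iova] = host_addr
        self.physical_updates += 1

    def unmap(self, iova):
        # Defer the teardown: the device can still reach this buffer for now.
        self.stale[iova] = self.live.pop(iova)
        if len(self.stale) > self.max_stale:
            self.flush()

    def flush(self):
        """Really remove every deferred mapping from the physical IOMMU."""
        self.physical_updates += len(self.stale)
        self.stale.clear()


cache = OptimisticTeardownCache()
for _ in range(100):  # a driver that keeps reusing one buffer
    cache.map(0x1000, 0x7F000)
    cache.unmap(0x1000)
updates_before_flush = cache.physical_updates
window_open = 0x1000 in cache.stale  # protection is relaxed until the flush
cache.flush()
```

The relaxed protection named on the slide is visible in `window_open`: between the guest's unmap and the flush, a stale entry still lets the device reach the buffer.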
Problem solved?
netperf TCP_STREAM sender on 10Gb/s Ethernet with 256 byte messages
Using device assignment with direct mapping in the IOMMU
Only achieves 60% of bare-metal performance
Same results for memcached and apache
Where does the rest go?
Recap: doing I/O
Programmed I/O (in/out instructions)
Memory-mapped I/O (loads and stores)
Direct memory access (DMA)
Interrupts: approximately 49,000 interrupts per second with Linux
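A back-of-the-envelope sketch of why the interrupt rate matters. Only the 49,000 interrupts per second comes from the slide. The two exits per interrupt (one for delivery, one for completion) follow the baseline path described in the ELI slides; the cycles per exit and the clock rate are hypothetical placeholders, not measurements.

```python
def exit_overhead(interrupts_per_sec, exits_per_interrupt,
                  cycles_per_exit, cpu_hz):
    """Return (exits per second, fraction of one core spent on exits)."""
    exits_per_sec = interrupts_per_sec * exits_per_interrupt
    return exits_per_sec, exits_per_sec * cycles_per_exit / cpu_hz


exits, cpu_fraction = exit_overhead(
    interrupts_per_sec=49_000,  # from the slide
    exits_per_interrupt=2,      # delivery + completion in the baseline
    cycles_per_exit=5_000,      # hypothetical placeholder
    cpu_hz=2.5e9,               # hypothetical placeholder
)
print(f"{exits} exits/s, {cpu_fraction:.1%} of one core (placeholder inputs)")
```

The function only counts the direct cost of exits; indirect costs such as cache and TLB pollution are outside this model.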
ELI: ExitLess Interrupts
[Figure: guest and hypervisor timelines, panels (a)-(d): baseline, ELI delivery, ELI delivery & completion, bare-metal. Events shown: physical interrupt, interrupt injection, interrupt completion.]
ELI: direct interrupts for unmodified, untrusted guests
“ELI: Bare-Metal Performance for I/O Virtualization”, Gordon, Amit, Har’El, Ben-Yehuda, Landau, Schuster, Tsafrir, ASPLOS ’12
ELI: delivery
[Figure: ELI delivery. While the VM runs, a physical interrupt is looked up in the shadow IDT. An assigned interrupt (IDT entry with P=1) goes to the guest's interrupt handler; a non-assigned interrupt (P=0, or past the IDTR limit) raises #NP/#GP and exits to the hypervisor. Labels: guest IDT, shadow IDT, IDTR limit.]
All interrupts are delivered directly to the guest
Host and other guests’ interrupts are bounced back to the host
... without the guest being aware of it
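The delivery rule can be sketched as a lookup in a shadow IDT: only vectors assigned to the guest are marked present, so any other interrupt faults and bounces back to the host. This is a simplified model under stated assumptions: the function names are hypothetical, the IDTR limit is expressed as a vector number rather than a byte offset, and exceptions raised by the guest itself are ignored.

```python
def build_shadow_idt(guest_idt, assigned_vectors):
    """Copy the guest IDT, marking only assigned vectors as present (P=1)."""
    return {
        vector: {"handler": handler, "present": vector in assigned_vectors}
        for vector, handler in guest_idt.items()
    }


def deliver(shadow_idt, idtr_limit, vector):
    """Return where a physical interrupt ends up while the guest is running."""
    if vector > idtr_limit:
        return ("exit to host", "#GP")   # entry lies beyond the IDTR limit
    entry = shadow_idt[vector]
    if not entry["present"]:
        return ("exit to host", "#NP")   # entry marked not present (P=0)
    return ("guest handler", entry["handler"])


# Hypothetical guest IDT and a single assigned vector.
guest_idt = {v: f"guest_handler_{v}" for v in range(256)}
shadow = build_shadow_idt(guest_idt, assigned_vectors={0x41})

assigned = deliver(shadow, idtr_limit=0xEF, vector=0x41)
non_assigned = deliver(shadow, idtr_limit=0xEF, vector=0x30)
past_limit = deliver(shadow, idtr_limit=0xEF, vector=0xF0)
```

The guest keeps using its own IDT for everything it sees; the shadow copy is what the processor consults, which is how the bouncing stays invisible to the guest.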
ELI: signaling completion
Guests signal interrupt completions by writing to the Local Advanced Programmable Interrupt Controller (LAPIC) End-of-Interrupt (EOI) register
Old LAPIC: hypervisor traps loads/stores to the LAPIC page
x2APIC: hypervisor can trap specific registers
Signaling completion without trapping requires x2APIC
ELI gives the guest direct access only to the EOI register
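With x2APIC the LAPIC registers are accessed as MSRs, so per-register trapping can be expressed through an MSR bitmap. The sketch below models a bitmap that traps every MSR write except the EOI register. The constants reflect my understanding of the Intel layout (x2APIC EOI at MSR 0x80B; write bitmap for low MSRs at byte offset 2048 of a 4 KiB page) and should be checked against the Intel SDM; the helper names are hypothetical.

```python
X2APIC_EOI = 0x80B         # x2APIC End-of-Interrupt register (MSR index)
X2APIC_ICR = 0x830         # x2APIC Interrupt Command Register (MSR index)
WRITE_LOW_OFFSET = 2048    # assumed offset of the write bitmap, MSRs 0x0-0x1FFF
BITMAP_SIZE = 4096


def set_write_intercept(bitmap, msr, intercept):
    """Set or clear the 'exit on write' bit for one low MSR."""
    byte, bit = WRITE_LOW_OFFSET + msr // 8, msr % 8
    if intercept:
        bitmap[byte] |= 1 << bit
    else:
        bitmap[byte] &= ~(1 << bit) & 0xFF


def write_exits(bitmap, msr):
    """True if a guest write to this MSR would cause an exit."""
    return bool(bitmap[WRITE_LOW_OFFSET + msr // 8] & (1 << (msr % 8)))


def build_eli_style_bitmap():
    bitmap = bytearray(b"\xff" * BITMAP_SIZE)  # start by trapping everything
    set_write_intercept(bitmap, X2APIC_EOI, intercept=False)
    return bitmap


bitmap = build_eli_style_bitmap()
eoi_traps = write_exits(bitmap, X2APIC_EOI)
icr_traps = write_exits(bitmap, X2APIC_ICR)
```

Only the EOI bit is cleared, matching the slide: the guest completes interrupts without an exit while every other LAPIC register stays under hypervisor control.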
ELI: threat model
Threats: malicious guests might try to:
  keep interrupts disabled
  signal invalid completions
  consume other guests’ or the host’s interrupts
ELI: protection
VMX preemption timer to force exits instead of timer interrupts
Ignore spurious EOIs
Protect critical interrupts by:
  Delivering them to a non-ELI core if available
  Redirecting them as NMIs → unconditional exit
  Using the IDTR limit to force #GP exits on critical interrupts
Bare-metal Performance for I/O Virtualization
Throughput is scaled so 100% means bare-metal throughput
All workloads reach 97–100% of bare metal with ELI!
CPU is saturated; host uses huge pages to back guest memory
Full experimental details and analysis in the ASPLOS paper
Conclusion
IOMMUs take the host out of the DMA path
ELI takes the host out of the interrupt path
Achievement unlocked: bare-metal performance for x86 VMs
Thank you! Questions?