Machine Virtualization for Fun, Profit, and Security
Muli Ben-Yehuda
Technion & IBM Research
Muli Ben-Yehuda (Technion & IBM Research) Virtualization for Security Bar-Ilan University, 2012 1 / 27
Background: x86 machine virtualization
- Running multiple different unmodified operating systems
- Each in an isolated virtual machine
- Simultaneously
- On the x86 architecture
- Many uses: live migration, record & replay, testing, . . . , security
- Foundation of IaaS cloud computing
- Used nearly everywhere
x86 virtualization primer
- What is the problem?
- Popek and Goldberg’s virtualization model [Popek74]: trap and emulate
  - Privileged instructions trap to the hypervisor
  - Hypervisor emulates their behavior
- Without hardware support
- With hardware support
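A toy model can make trap and emulate concrete. The sketch below is not x86 and not how any real hypervisor is written; the opcodes, `struct vcpu`, `emulate`, and `run_guest` are hypothetical. Unprivileged instructions run "directly", while privileged ones trap to `emulate`, which changes only the virtual CPU state, never the real one. Classic x86 breaks this model because some sensitive instructions (e.g., POPF executed outside ring 0) silently behave differently instead of trapping; hardware support (Intel VT-x, AMD-V) makes them trap.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy ISA, purely illustrative: two unprivileged and three privileged ops. */
enum opcode { OP_LOAD_IMM, OP_ADD, OP_CLI, OP_STI, OP_HLT };

struct insn { enum opcode op; int imm; };

/* Virtual CPU state owned by the hypervisor. */
struct vcpu {
    int acc;              /* guest-visible accumulator */
    bool virt_if;         /* virtual interrupt flag, never the real one */
    bool halted;
    unsigned long traps;  /* number of traps into the hypervisor */
};

static bool is_privileged(enum opcode op) {
    return op == OP_CLI || op == OP_STI || op == OP_HLT;
}

/* Hypervisor: emulate a privileged instruction against the virtual state. */
static void emulate(struct vcpu *v, const struct insn *i) {
    v->traps++;
    switch (i->op) {
    case OP_CLI: v->virt_if = false; break;
    case OP_STI: v->virt_if = true;  break;
    case OP_HLT: v->halted = true;   break;
    default: break;
    }
}

/* "Hardware": execute unprivileged instructions directly, trap on the rest. */
static void run_guest(struct vcpu *v, const struct insn *prog, size_t n) {
    for (size_t pc = 0; pc < n && !v->halted; pc++) {
        const struct insn *i = &prog[pc];
        if (is_privileged(i->op)) {
            emulate(v, i);   /* trap -> hypervisor -> resume at next insn */
            continue;
        }
        switch (i->op) {     /* direct execution, no hypervisor involved */
        case OP_LOAD_IMM: v->acc = i->imm;  break;
        case OP_ADD:      v->acc += i->imm; break;
        default: break;
        }
    }
}
```

Running the program LOAD 40, CLI, ADD 2, STI, HLT leaves `acc == 42` after three traps; the two arithmetic instructions never involve the hypervisor.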
What is a rootkit?
- First you take control. How?
- Then you hide to avoid detection and maintain control. How?
- Usual methods are ugly and intrusive: easy to detect!
- Can we do better?
Hypervisor-level rootkits
- Hypervisors have full control over the hardware
- Hypervisors can trap any operating system event
- Kernel-mode code can switch the CPU into hypervisor mode at any time
- Solution: run the rootkit as a hypervisor
Bluepill: a hypervisor level rootkit [Rutkowska06]
Bluepill cont’
- Bluepill installs itself on the fly
- Can you bluepill bluepill?
What is the Turtles project?
- Efficient nested virtualization for Intel x86 based on KVM
- Runs multiple guest hypervisors and VMs: KVM, VMware, Linux, Windows, . . .
- Code publicly available
What is the Turtles project? (cont’)
- Nested VMX virtualization for nested CPU virtualization
- Multi-dimensional paging for nested MMU virtualization
- Multi-level device assignment for nested I/O virtualization
- Micro-optimizations to make it go fast
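Multi-dimensional paging, roughly: L1 maintains a table translating L2 guest-physical pages to L1 guest-physical pages, L0 maintains one translating L1 guest-physical pages to host-physical pages, and L0 composes the two so the hardware walks a single table while L2 runs. A minimal sketch with flat, single-level toy tables (real EPT tables are multi-level radix trees); the names `ept12`, `ept01`, `ept02`, and `compose` are illustrative. A real hypervisor would typically fill the composed table lazily on faults and invalidate entries when either input table changes; this toy composes eagerly.

```c
#include <stddef.h>
#include <stdint.h>

#define NPAGES  16
#define INVALID UINT32_MAX   /* "not mapped" marker */

/* Toy flat page table: table[page] = target page, or INVALID. */
typedef uint32_t pt_t[NPAGES];

static void pt_init(pt_t t) {
    for (size_t p = 0; p < NPAGES; p++)
        t[p] = INVALID;
}

/* out = outer(inner(x)). With inner = L2->L1 (maintained by L1) and
 * outer = L1->L0 (maintained by L0), out maps L2 guest-physical pages
 * directly to host-physical pages. */
static void compose(const pt_t inner, const pt_t outer, pt_t out) {
    for (size_t p = 0; p < NPAGES; p++) {
        uint32_t mid = inner[p];
        out[p] = (mid == INVALID || mid >= NPAGES) ? INVALID : outer[mid];
    }
}
```

For example, if L1 maps L2 page 1 to its page 5 and L0 maps L1 page 5 to host page 9, the composed table maps L2 page 1 straight to host page 9; an L2 page whose L1 page is not backed by L0 stays unmapped and would fault into L0.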
Theory of nested CPU virtualization
- Trap and emulate [PopekGoldberg74] ⇒ it’s all about the traps
- Single-level (x86) vs. multi-level (e.g., z/VM)
- Single level ⇒ one hypervisor, many guests
- Turtles approach: L0 multiplexes the hardware between L1 and L2, running both as guests of L0, without either being aware of it
- (Scheme generalized for n levels; our focus is n=2)
[Figure: two views of nesting. "Multiple logical levels": hardware, L0 host hypervisor, L1 guest hypervisor, L2 guests. "Multiplexed on a single level": L0 runs both the L1 guest hypervisor and the L2 guests directly on the hardware.]
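Because only L0 runs with real hardware virtualization support, every exit from L2 lands in L0 first. The core decision L0 then makes can be sketched as follows: if L1 asked to intercept this kind of exit for its guest, L0 reflects it to L1 by emulating a VM exit into L1; otherwise L0 handles it itself and resumes L2, and L1 never sees it. The exit reasons, the intercept bitmask, and `handle_l2_exit` are simplified, hypothetical stand-ins, not KVM's nested-VMX code.

```c
#include <stdint.h>

/* Hypothetical exit reasons; each one is a bit in an intercept mask. */
enum exit_reason { EXIT_CPUID = 0, EXIT_HLT = 1, EXIT_IO = 2, EXIT_EXTINT = 3 };

enum exit_action { RESUME_L2, REFLECT_TO_L1 };

struct nested_state {
    uint32_t l1_intercepts;   /* exits L1 requested in its VMCS for L2 */
    unsigned long reflected;  /* exits forwarded to L1 */
    unsigned long handled;    /* exits L0 absorbed itself */
};

/* Runs in L0 on every exit from L2 (the hardware always exits to L0). */
static enum exit_action handle_l2_exit(struct nested_state *s, enum exit_reason r) {
    if (s->l1_intercepts & (1u << r)) {
        /* L1 would have seen this exit on real hardware: update L1's view
         * of the VMCS and resume L1 at its exit handler. */
        s->reflected++;
        return REFLECT_TO_L1;
    }
    /* L1 did not ask for this exit: L0 handles it and resumes L2. */
    s->handled++;
    return RESUME_L2;
}
```

The cost of nesting comes from the reflected path: one L2 exit becomes an exit to L0, an emulated entry into L1, and L1's own privileged instructions, each of which traps to L0 again.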
Detecting hypervisor-based rootkits
- Bluepill authors claim “undetectable”
- “Compatibility is Not Transparency: VMM Detection Myths and Realities” [Garfinkel07]
- Hardware discrepancies
- Resource-sharing attacks
- Timing attacks: PCI register access, page faults on MMIO access, cpuid timing vs. nops
- Can you trust time?
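A sketch of the cpuid-vs.-nop timing probe, assuming x86 and GCC or Clang (`__get_cpuid` from `<cpuid.h>`, `__rdtsc` from `<x86intrin.h>`). Under Intel VT-x, CPUID unconditionally causes a VM exit, so it costs far more cycles inside a VM than on bare metal, while a nop costs the same in both. `looks_virtualized` and its ratio threshold are hypothetical heuristics, not a calibrated detector.

```c
#include <stdint.h>
#include <cpuid.h>      /* __get_cpuid (GCC/Clang, x86 only) */
#include <x86intrin.h>  /* __rdtsc */

/* Average TSC ticks per CPUID (leaf 0) over `iters` runs. */
static uint64_t time_cpuid(unsigned iters) {
    unsigned a, b, c, d;
    uint64_t start = __rdtsc();
    for (unsigned i = 0; i < iters; i++)
        __get_cpuid(0, &a, &b, &c, &d);
    return (__rdtsc() - start) / iters;
}

/* Average TSC ticks per NOP (plus loop overhead) over `iters` runs. */
static uint64_t time_nop(unsigned iters) {
    uint64_t start = __rdtsc();
    for (unsigned i = 0; i < iters; i++)
        __asm__ volatile("nop");
    return (__rdtsc() - start) / iters;
}

/* Hypothetical heuristic: flag a VM if CPUID is more than
 * `ratio_threshold` times slower than a NOP. */
static int looks_virtualized(unsigned iters, uint64_t ratio_threshold) {
    uint64_t nop = time_nop(iters);
    if (nop == 0)
        nop = 1;   /* avoid dividing by zero on very fast loops */
    return time_cpuid(iters) / nop > ratio_threshold;
}
```

The last bullet is the catch: the probe trusts the TSC, but a hypervisor can offset or scale the TSC the guest sees and can trap RDTSC, so a careful rootkit can try to hide its exits, and a careful detector needs a time source the hypervisor does not control.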
What does it mean to do I/O?
- Programmed I/O (in/out instructions)
- Memory-mapped I/O (loads and stores)
- Direct memory access (DMA)
- Interrupts
I/O virtualization via device emulation
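[Figure: the guest's device driver accesses an emulated device; each access traps to the host's device emulation, which drives the physical device through the host's device driver (steps 1-4)]
- Emulation is usually the default [Sugerman01]
- Works for unmodified guests out of the box
- Very low performance, due to many exits on the I/O path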
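To see where the exits come from, consider a guest driver writing to a legacy PC serial port (COM1 at I/O port 0x3F8; the transmit holding register is at offset 0 and the line status register at offset 5). Every in/out instruction traps, and the hypervisor's device model must decode and emulate it. The `struct pio_exit` layout and `uart_emulate` are hypothetical simplifications; real device models (e.g., in QEMU) emulate many more registers and behaviors.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define COM1_BASE 0x3F8  /* legacy PC serial port */
#define UART_THR  0      /* transmit holding register (write) */
#define UART_LSR  5      /* line status register (read) */
#define LSR_THRE  0x20   /* transmit holding register empty */
#define LSR_TEMT  0x40   /* transmitter empty */

/* What the hypervisor learns from a port-I/O exit (simplified). */
struct pio_exit { uint16_t port; bool is_write; uint8_t value; };

/* Emulated UART: bytes "transmitted" by the guest, plus an exit counter. */
struct uart { char out[256]; size_t len; unsigned long exits; };

/* Called for every guest in/out to the UART's ports: one VM exit each. */
static uint8_t uart_emulate(struct uart *u, const struct pio_exit *e) {
    u->exits++;
    uint16_t reg = (uint16_t)(e->port - COM1_BASE);
    if (e->is_write && reg == UART_THR) {
        if (u->len < sizeof u->out - 1)
            u->out[u->len++] = (char)e->value;
        u->out[u->len] = '\0';
        return 0;
    }
    if (!e->is_write && reg == UART_LSR)
        return LSR_THRE | LSR_TEMT;  /* always ready: no real UART behind it */
    return 0xFF;                     /* register not emulated in this sketch */
}
```

A driver that polls the status register before writing each byte costs two exits per byte here, and each exit is a full guest/host round trip.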
I/O virtualization via paravirtualized devices
[Figure: the guest's front-end virtual driver communicates with the host's back-end virtual driver, which drives the device through the host's device driver (steps 1-3)]
- Hypervisor-aware drivers and “devices” [Barham03,Russell08]
- Requires new guest drivers
- Requires hypervisor involvement on the I/O path
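Paravirtual drivers avoid per-register exits by sharing memory: the front-end posts requests into a ring visible to both sides and notifies the back-end once per batch. The sketch below is loosely in the spirit of virtio and Xen rings but is not their actual layout; `struct ring`, `fe_post`, `fe_kick`, and `be_process` are hypothetical, and a real ring also needs memory barriers and notification suppression.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8  /* power of two, so indices wrap with a mask */

struct request { uint64_t buf_gpa; uint32_t len; };

/* Memory shared by the guest front-end and the host back-end (toy layout). */
struct ring {
    struct request slots[RING_SIZE];
    uint32_t prod;        /* advanced only by the front-end */
    uint32_t cons;        /* advanced only by the back-end */
    unsigned long kicks;  /* guest->host notifications (each one an exit) */
};

/* Front-end: enqueue without exiting; false if the ring is full. */
static bool fe_post(struct ring *r, struct request req) {
    if (r->prod - r->cons == RING_SIZE)
        return false;
    r->slots[r->prod & (RING_SIZE - 1)] = req;
    r->prod++;  /* a real ring needs a write barrier before this store */
    return true;
}

/* Front-end: a single notification covers the whole batch. */
static void fe_kick(struct ring *r) { r->kicks++; }

/* Back-end: drain everything posted so far; returns total bytes. */
static uint64_t be_process(struct ring *r) {
    uint64_t bytes = 0;
    while (r->cons != r->prod) {
        bytes += r->slots[r->cons & (RING_SIZE - 1)].len;
        r->cons++;
    }
    return bytes;
}
```

Five 100-byte requests cost one kick here, versus at least one exit per register access with an emulated device. The back-end still runs in the host, which is the "hypervisor involvement" the last bullet refers to.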
Hypervisor-based I/O introspection
- Useful: anti-virus, intrusion detection, compression, live migration, . . .
- Q1: how do you do it without impacting performance?
- Q2: how do you bridge the semantic gap?
I/O virtualization via device assignment
[Figure: the guest's device driver accesses the assigned device directly; the host is not on the I/O path]
- Bypass the hypervisor on the I/O path [Levasseur04,Ben-Yehuda06]
- SR-IOV devices provide sharing in hardware
- Best performance: 100% of bare-metal! [Gordon12]
Comparing I/O virtualization methods
IOV method          Throughput (Mb/s)   CPU utilization
bare-metal          950                 20%
device assignment   950                 25%
paravirtual         950                 50%
emulation           250                 100%

- netperf TCP_STREAM sender on 1 Gb/s Ethernet (16K messages)
- Device assignment is the best-performing option
- Challenges: DMA and interrupts

Table from “The Turtles Project: Design and Implementation of Nested Virtualization” [Ben-Yehuda10]
Direct memory access (DMA)
- All modern devices access memory directly
- On bare-metal:
  - A trusted driver gives its device an address
  - The device reads or writes that address
- Protection problem: guest drivers are not trusted
- Translation problem: guest memory ≠ host memory
- Direct access: the guest bypasses the host
- What is the obvious attack?
- How do you protect against it?
IOMMU
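The obvious attack is for a guest to program its assigned device to DMA into memory it does not own, such as host memory or another guest's memory. An IOMMU closes that hole by translating and checking every DMA against per-device tables that only the hypervisor can write, which also solves the translation problem (guest-physical to host-physical). A minimal sketch with one flat table per domain; real IOMMUs (Intel VT-d, AMD-Vi) use multi-level tables, and all structure and function names below are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define DOM_PAGES   64  /* toy: each domain maps 64 I/O virtual pages */
#define MAX_DEVICES 4

enum { PERM_R = 1, PERM_W = 2 };

struct iommu_pte { uint64_t hpfn; uint8_t perm; };  /* perm == 0: unmapped */

/* One domain per guest; the hypervisor fills it with that guest's
 * guest-physical -> host-physical mappings. */
struct domain { struct iommu_pte pt[DOM_PAGES]; };

struct iommu {
    struct domain *dev_domain[MAX_DEVICES];  /* indexed by requester ID */
    unsigned long faults;
};

/* Translate a DMA from device `dev` to address `iova`. On failure the DMA
 * is blocked and a fault is recorded instead of touching memory. */
static bool iommu_translate(struct iommu *mmu, unsigned dev, uint64_t iova,
                            bool write, uint64_t *hpa) {
    struct domain *d = dev < MAX_DEVICES ? mmu->dev_domain[dev] : NULL;
    uint64_t pfn = iova >> PAGE_SHIFT;
    uint8_t need = write ? PERM_W : PERM_R;
    if (d == NULL || pfn >= DOM_PAGES || !(d->pt[pfn].perm & need)) {
        mmu->faults++;
        return false;
    }
    *hpa = (d->pt[pfn].hpfn << PAGE_SHIFT) | (iova & ((1u << PAGE_SHIFT) - 1));
    return true;
}
```

A guest driver can hand its device any address it likes; only pages the hypervisor mapped for that guest's domain are reachable.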
Background: interrupts
[Figure: the IDT register (IDTR) holds the address and limit of the Interrupt Descriptor Table (IDT); the IDT entries for vectors 1 to n point to the interrupt handlers]
- I/O devices raise interrupts
- The CPU temporarily stops the currently executing code
- The CPU jumps to a pre-specified interrupt handler
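In 64-bit mode each IDT entry is a 16-byte gate descriptor, and the IDTR holds the table's base and limit. A simplified model of the CPU's lookup, including the two faults ELI relies on later: a vector whose descriptor lies beyond the IDTR limit raises #GP, and a descriptor marked not present raises #NP. The structures are toy stand-ins (only a handler address and a present bit), not the architectural descriptor format; the limit is still measured in architectural 16-byte units.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified gate descriptor: only the fields used here. */
struct idt_entry { uint64_t handler; bool present; };

#define IDT_ENTRY_SIZE 16u  /* bytes per gate descriptor in 64-bit mode */

struct idtr { uint16_t limit; const struct idt_entry *base; };

enum delivery { DELIVER_HANDLER, FAULT_GP, FAULT_NP };

/* Model of what the CPU does when interrupt `vector` arrives. */
static enum delivery deliver(const struct idtr *idtr, uint8_t vector,
                             uint64_t *handler) {
    uint32_t last_byte = (uint32_t)vector * IDT_ENTRY_SIZE + IDT_ENTRY_SIZE - 1;
    if (last_byte > idtr->limit)
        return FAULT_GP;        /* descriptor lies beyond the IDTR limit */
    const struct idt_entry *e = &idtr->base[vector];
    if (!e->present)
        return FAULT_NP;        /* descriptor marked not present */
    *handler = e->handler;      /* save state, then jump to the handler */
    return DELIVER_HANDLER;
}
```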
Interrupt-based attacks
- Follow the White Rabbit [Rutkowska11]
  - Tell the device to generate “interesting” interrupts
  - Attack: fool the CPU into SIPI
  - Attack: syscall/hypercall injection
- Interrupt-based attacks: the guest generates interrupts that are handled in host mode
- Why not handle interrupts in guest mode?
ELI: Exitless Interrupts
[Figure: interrupt-handling timelines showing time spent in the guest and in the hypervisor for (a) bare-metal, (b) baseline, (c) ELI delivery, and (d) ELI delivery & completion; marked events are physical interrupt, interrupt injection, and interrupt completion]
ELI: direct interrupts for unmodified, untrusted guests
“ELI: Bare-Metal Performance for I/O Virtualization”, Gordon, Amit,Hare’El, Ben-Yehuda, Landau, Schuster, Tsafrir, ASPLOS ’12
ELI: delivery
[Figure: ELI delivery. The hypervisor points the IDTR at a shadow IDT instead of the guest IDT. Entries for assigned interrupts are present (P=1) and lead to the guest's interrupt handler; entries for non-assigned interrupts are not present (P=0), so a non-assigned physical interrupt causes a #NP exit, or a #GP exit when its vector lies beyond the IDTR limit.]
- All interrupts are delivered directly to the guest
- Host and other guests’ interrupts are bounced back to the host
- . . . without the guest being aware of it
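A sketch of the shadow IDT idea, assuming the hypervisor knows which vectors belong to the guest's assigned device: assigned vectors keep the guest's handler and are delivered with no exit, while every other entry is marked not present, so host and other guests' interrupts raise #NP, which the hypervisor intercepts and handles. The names and data structures are hypothetical; the actual ELI implementation differs in detail.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NVECTORS 256

/* Simplified gate descriptor: handler address and present bit only. */
struct idt_entry { uint64_t handler; bool present; };

/* Build the IDT the CPU actually uses while the guest runs:
 *  - assigned vectors keep the guest's own handler (no exit);
 *  - all other vectors are marked not present, so they raise #NP
 *    and exit to the hypervisor. */
static void build_shadow_idt(const struct idt_entry guest_idt[NVECTORS],
                             const bool assigned[NVECTORS],
                             struct idt_entry shadow[NVECTORS]) {
    for (size_t v = 0; v < NVECTORS; v++) {
        shadow[v] = guest_idt[v];
        if (!assigned[v])
            shadow[v].present = false;
    }
}

/* True if physical interrupt `v` reaches the guest without an exit. */
static bool delivered_without_exit(const struct idt_entry shadow[NVECTORS],
                                   uint8_t v) {
    return shadow[v].present;
}
```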
ELI: signaling completion
- Guests signal interrupt completion by writing to the End-of-Interrupt (EOI) register of the Local Advanced Programmable Interrupt Controller (LAPIC)
- Old LAPIC: the hypervisor traps loads/stores to the LAPIC page
- x2APIC: the hypervisor can trap specific registers
- Signaling completion without trapping requires x2APIC
- ELI gives the guest direct access only to the EOI register
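With x2APIC, the APIC registers are MSRs (0x800 to 0x8FF; EOI is MSR 0x80B, TPR is 0x808), and VMX decides per MSR whether a guest WRMSR exits using a 4 KiB MSR bitmap: a set bit means "exit", and the write bitmap for MSRs 0x0 to 0x1FFF starts at byte offset 2048. A sketch of an ELI-style policy that traps everything except EOI writes; `struct msr_bitmap` and the helper names are hypothetical, the sketch only handles low MSRs, and the page would still have to be installed in the VMCS.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MSR_BITMAP_SIZE  4096
#define WRITE_LOW_OFFSET 2048   /* write bitmap for MSRs 0x0-0x1FFF */
#define X2APIC_MSR_FIRST 0x800
#define X2APIC_EOI_MSR   0x80B

/* One VMX MSR bitmap page: a set bit means the access causes a VM exit. */
struct msr_bitmap { uint8_t bytes[MSR_BITMAP_SIZE]; };

/* Valid only for MSRs 0x0-0x1FFF. */
static void set_write_exit(struct msr_bitmap *b, uint32_t msr, bool exit) {
    uint8_t *byte = &b->bytes[WRITE_LOW_OFFSET + msr / 8];
    uint8_t mask = (uint8_t)(1u << (msr % 8));
    *byte = exit ? (uint8_t)(*byte | mask) : (uint8_t)(*byte & ~mask);
}

static bool write_exits(const struct msr_bitmap *b, uint32_t msr) {
    return b->bytes[WRITE_LOW_OFFSET + msr / 8] & (1u << (msr % 8));
}

/* Trap every MSR access except guest writes to the x2APIC EOI register. */
static void eli_msr_policy(struct msr_bitmap *b) {
    memset(b->bytes, 0xFF, sizeof b->bytes);   /* exit on everything */
    set_write_exit(b, X2APIC_EOI_MSR, false);  /* ...except EOI writes */
}
```

With the old memory-mapped LAPIC this is impossible: the hypervisor can only trap or not trap the whole 4 KiB LAPIC page, so exposing EOI would expose every other LAPIC register too.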
ELI: threat model
- Threats: malicious guests might try to:
  - keep interrupts disabled
  - signal invalid completions
  - consume other guests’ or the host’s interrupts
ELI: protection
- VMX preemption timer to force exits instead of timer interrupts
- Ignore spurious EOIs
- Protect critical interrupts by:
  - Delivering them to a non-ELI core if available
  - Redirecting them as NMIs → unconditional exit
  - Using the IDTR limit to force #GP exits on critical interrupts
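The IDTR-limit trick reduces to a small calculation, assuming (hypothetically) that all critical host interrupts use the highest vectors, starting at `first_critical`. Setting the limit to cover only the vectors below that boundary makes every critical vector raise #GP, which the hypervisor intercepts, so the guest cannot consume those interrupts. The helper names are illustrative, and `first_critical` must be greater than 0.

```c
#include <stdbool.h>
#include <stdint.h>

#define IDT_ENTRY_SIZE 16u  /* bytes per 64-bit-mode gate descriptor */

/* IDTR limit covering vectors [0, first_critical) only (first_critical > 0). */
static uint16_t idtr_limit_for(uint8_t first_critical) {
    return (uint16_t)(first_critical * IDT_ENTRY_SIZE - 1);
}

/* True if delivering `vector` under this limit raises #GP (forcing an exit). */
static bool forces_gp_exit(uint16_t limit, uint8_t vector) {
    return (uint32_t)vector * IDT_ENTRY_SIZE + IDT_ENTRY_SIZE - 1 > limit;
}
```

For example, with critical vectors starting at 0xF0 the limit is 0xEFF: vector 0xEF still fits inside the table, while 0xF0 through 0xFF all fault.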
Conclusions
- Machine virtualization is very useful
- Can be used for good, or evil
- Complexity leads to unintended consequences
- Happy hacking!