Xen and the
Art of Virtualization
University of Cambridge
Presenter: Ashish Gupta
Features
- An open infrastructure for global distributed computing
- Run multiple services on a single Xenoserver: envisage running up to 100 per server
- Secure and accountable execution: strong isolation, logging and auditing
- Flexible: low-level execution environment
- Economical: execute on commodity hardware (x86)
Virtualization techniques
- Single OS image (Ensim, VServers): group user processes into resource containers and implement new schedulers in the OS to ensure isolation. Hard to retrofit isolation to conventional OSes.
- Full virtualization (VMware, Connectix, Bochs): run full OSes as unmodified guests; the VMM enforces resource isolation. But it is hard to efficiently virtualize uncooperative architectures.
Paravirtualization goals
- Low virtualization overhead
- Performance isolation
- Also: flexibility, i.e. support full-featured multi-user, multi-application OSes

System performance
Para-virtualization principles: para-virtualization vs. full virtualization
- Expose the guest OS to "real" resources (time, MMU, etc.): better support for time-sensitive tasks; allows guest OS optimizations
- The downside: correctness issues
Para-virtualization mechanisms
Three broad aspects: memory management, CPU, device I/O
Memory management
- The VMware approach: shadow page tables
- Paravirtualization obviates the need for shadow page tables
- Guest OSes allocate and manage their own page tables
How?
Mechanism
- Updates to page tables must be passed to Xen for validation
- Updates may be queued and processed in batches
- Validation rules (applied to each PTE):
  1. only map a page if it is owned by the requesting guest OS
  2. only map a page containing PTEs for read-only access
- Xen tracks page ownership and current use
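The queue-and-batch idea can be sketched as follows. This is a minimal hypothetical model in Python, not Xen's actual interface: the queue length, class names, and the validation stub are all assumptions for illustration.

```python
# Hypothetical sketch of Xen-style batched page-table updates: the guest
# queues PTE writes and hands them to the hypervisor in one batch,
# amortizing the cost of crossing into Xen. All names are illustrative.

QUEUE_LEN = 8   # assumed batch size

class Guest:
    def __init__(self):
        self.queue = []          # pending (pte_addr, new_value) updates
        self.hypercalls = 0      # counts simulated crossings into Xen

    def flush(self):
        """Stand-in for the real batched-update hypercall."""
        if not self.queue:
            return
        self.hypercalls += 1
        for pte_addr, value in self.queue:
            # Real Xen validates each update here: the page must be owned
            # by this guest, and pages holding PTEs may only be mapped
            # read-only.
            pass
        self.queue.clear()

    def queue_update(self, pte_addr, value):
        if len(self.queue) == QUEUE_LEN:
            self.flush()         # queue full: enter the hypervisor once
        self.queue.append((pte_addr, value))

g = Guest()
for i in range(20):
    g.queue_update(0x1000 + 8 * i, i)
g.flush()                        # explicit flush at a safe point
print(g.hypercalls)              # 20 updates cost only 3 batched crossings
```

Twenty individual updates would have cost twenty crossings; batching reduces that to three in this sketch.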
Memory management: the Xen approach
Memory benchmarks
CPU
- Efficient because x86 has four privilege levels: the guest OS runs in ring 1, applications in ring 3 (Xen in ring 0)
- Privileged instructions must be validated and executed by Xen
Exceptions
- The guest OS registers handlers with Xen; under para-virtualization most handlers are unchanged
- "Fast" handlers for most exceptions: Xen isn't involved
- Page faults: the faulting address in the CR2 register can only be read by Xen, so the guest must enter Xen
Xen uses the 4-ring model.
VM ↔ VMM communication
- Guest OS → Xen: hypercalls, like system calls
- Xen → Guest OS: events, like UNIX signals
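The signal-like event mechanism can be illustrated with a small sketch. This is a hypothetical Python model only; the class, method, and event names are invented, and real Xen event channels use shared-memory pending/mask bitmaps rather than Python sets.

```python
# Hypothetical sketch of signal-style event delivery: the hypervisor
# marks events pending, and the guest later dispatches its registered
# handlers, much as UNIX delivers signals. All names are illustrative.

class EventChannel:
    def __init__(self):
        self.pending = set()     # events raised but not yet delivered
        self.handlers = {}       # event name -> guest callback
        self.mask = set()        # events the guest has masked off

    def register(self, event, handler):
        """Guest side: register a handler (cf. registering with Xen)."""
        self.handlers[event] = handler

    def raise_event(self, event):
        """Hypervisor side: mark the event pending."""
        self.pending.add(event)

    def dispatch(self):
        """Guest side: run handlers for all pending, unmasked events."""
        deliverable = self.pending - self.mask
        for event in sorted(deliverable):
            self.handlers[event]()
        self.pending -= deliverable

log = []
ch = EventChannel()
ch.register("net_rx", lambda: log.append("net_rx"))
ch.register("timer", lambda: log.append("timer"))
ch.raise_event("timer")
ch.raise_event("net_rx")
ch.dispatch()
print(log)    # ['net_rx', 'timer']
```

As with signals, delivery is asynchronous from the guest's point of view: events accumulate while the guest runs and are handled at the next dispatch point.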
I/O virtualization
- Need to minimize the cost of transferring bulk data via Xen: copying costs time, pollutes caches, and requires intermediate memory
- Device classes: net, disk, graphics
I/O virtualization: use rings of buffer descriptors
- Descriptors are small: cheap to copy and validate
- Descriptors refer to bulk data, so there is no need to map or copy the data into Xen's address space (exception: checking network packet headers prior to TX)
- Use zero-copy DMA to transfer bulk data between hardware and the guest OS
  - Net TX: DMA the packet payload separately from the validated packet header
  - Net RX: page-flip receive buffers into the guest address space
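A descriptor ring can be sketched as a small circular buffer of (address, length) entries. The following Python model is illustrative only: the ring size, field layout, and index scheme are assumptions, not Xen's actual shared-ring format.

```python
# Hypothetical sketch of an I/O descriptor ring: producer and consumer
# exchange small descriptors that *refer* to bulk buffers, so only the
# descriptor, never the payload, is copied between guest and hypervisor.
# All names and sizes are illustrative.

RING_SIZE = 4   # assumed; a power of two so indices wrap cleanly

class DescriptorRing:
    def __init__(self):
        self.slots = [None] * RING_SIZE
        self.prod = 0    # next slot the producer will fill
        self.cons = 0    # next slot the consumer will read

    def push(self, addr, length):
        """Producer side: publish a descriptor for a bulk buffer."""
        if self.prod - self.cons == RING_SIZE:
            return False                      # ring full
        self.slots[self.prod % RING_SIZE] = (addr, length)
        self.prod += 1
        return True

    def pop(self):
        """Consumer side: take the next descriptor, or None if empty."""
        if self.cons == self.prod:
            return None
        desc = self.slots[self.cons % RING_SIZE]
        self.cons += 1
        return desc

ring = DescriptorRing()
ring.push(0x10000, 1500)     # e.g. a full-size Ethernet frame
ring.push(0x20000, 60)       # e.g. a minimum-size frame
print(ring.pop())            # (65536, 1500)
```

Because only these small tuples cross the boundary, validating a descriptor is cheap while the bulk data stays in place for zero-copy DMA.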
TCP Benchmarks
Effect of I/O and OS interaction
[Bar chart: relative SPEC INT2000 scores (0–1) for Linux, Xen, VMware and UML. CPU intensive; little I/O and OS interaction.]
[Bar chart: relative SPEC WEB99 scores (0–1) for Linux, Xen, VMware and UML. 180 Mb/s TCP traffic; disk read/write on a 2 GB dataset.]
Scalability
Performance isolation: 4 domains
- 2 running PostgreSQL and SPECweb99 workloads
- 2 running anti-social workloads: a disk bandwidth hog (a huge number of small file creations) and a fork bomb
- The bad guys could not kill the good guys; on native Linux, the same workloads rendered the machine completely unusable!
Denali Isolation Kernel
University of Washington
Motivation
- Functionality is being pushed into the network: Google, IMDB, Hotmail, Amazon, eBay, online banking, ... lots!
- Major players use dedicated hardware; lesser services find that cumbersome, expensive and limiting (hardware, rack space, bandwidth)
- A big deployment barrier for little services
Virtual hosting
- Third-party hardware, with small services multiplexed onto machines
- Need the ability to run untrusted code; likewise for CDNs serving dynamic content
- Goals: strong security and resource control
- Don't need: resource sharing
- Conventional OSes do not isolate enough. A spectrum of ideas:
  #1: OSes with performance isolation
  #2: OSes and sandboxing
  #3: exo-/micro-kernels
  #4: conventional VMs
  #5: isolation kernels
Isolation kernel
The focus here is on performance with scaling, and on isolation/security.
Reconsider the exposed Virtual Architecture
Downside (Linux port ?)
Scaling Arguments
Denali Mechanism
Overall Architecture
ISA
- Biggest challenge for x86 virtualization: ambiguous instruction semantics. Denali provides no support for ambiguous instructions.
- Two new virtual instructions: idle-with-timeout and terminate-execution
Memory architecture
- Simple DOS-like architecture: no virtual MMU. Why?
- The TLB is a problem on x86: it is hardware-loaded, hence inflexible
- Avoids TLB flushes
- Optional virtual MMU?
I/O and interrupt model
- Simpler interfaces to NIC, disk, keyboard, console and timer; avoid the "chatty" interfaces
- Interrupt model: physical interrupts vs. virtual interrupts
- Interrupt dispatch model: delays and batches interrupts for non-running VMs
- What about timing-related interrupts? Real-time apps, games, etc.?
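The delay-and-batch dispatch policy can be sketched as follows. This is a hypothetical Python model; the class names and interrupt names are invented for illustration and do not reflect Denali's actual data structures.

```python
# Hypothetical sketch of Denali-style interrupt batching: a physical
# interrupt for the currently running VM is delivered at once, while
# interrupts for descheduled VMs are queued and delivered as one batch
# when that VM is next scheduled. All names are illustrative.

class VM:
    def __init__(self, name):
        self.name = name
        self.pending = []      # virtual interrupts queued while descheduled
        self.delivered = []    # interrupts the guest has actually seen

    def deliver_pending(self):
        """Called on schedule-in: deliver the whole batch at once."""
        self.delivered.extend(self.pending)
        self.pending.clear()

class Dispatcher:
    def __init__(self, vms):
        self.vms = vms
        self.running = vms[0]

    def physical_interrupt(self, vm, irq):
        if vm is self.running:
            vm.delivered.append(irq)   # running VM sees it immediately
        else:
            vm.pending.append(irq)     # descheduled VM: batch it

    def schedule(self, vm):
        self.running = vm
        vm.deliver_pending()

a, b = VM("a"), VM("b")
d = Dispatcher([a, b])
d.physical_interrupt(a, "timer")   # a is running: immediate delivery
d.physical_interrupt(b, "net_rx")  # b is descheduled: queued
d.physical_interrupt(b, "disk")
d.schedule(b)                      # context switch: batch delivered
print(b.delivered)                 # ['net_rx', 'disk']
```

Batching cuts the number of context switches forced by interrupts, which is exactly why timing-sensitive guests (real-time apps, games) may suffer: their interrupts arrive late.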
Implementation
- Round-robin scheduling; idle-with-timeout is compensated with a higher priority for the next quantum
- Can use existing compilers (gcc) to generate code
- VMs are paged in on demand; the VMM is always in core
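The idle-with-timeout compensation can be sketched as follows. This is a hypothetical Python model (Denali's actual scheduler structures are not described at this level); the boost-queue design and all names are assumptions for illustration.

```python
# Hypothetical sketch of round-robin scheduling with idle-with-timeout
# compensation: a VM that yields early is given higher priority for the
# next quantum, so cooperative guests are not penalized for idling.
# All names are illustrative.

from collections import deque

class Scheduler:
    def __init__(self, vms):
        self.ready = deque(vms)     # plain round-robin queue
        self.boosted = deque()      # VMs that yielded via idle-with-timeout

    def idle_with_timeout(self, vm):
        """VM voluntarily yields its quantum: boost it for the next one."""
        self.ready.remove(vm)
        self.boosted.append(vm)

    def next_vm(self):
        """Boosted VMs run first; otherwise rotate the round-robin queue."""
        if self.boosted:
            vm = self.boosted.popleft()
        else:
            vm = self.ready[0]
            self.ready.rotate(-1)
        if vm not in self.ready:
            self.ready.append(vm)   # rejoin the round-robin rotation
        return vm

s = Scheduler(["a", "b", "c"])
s.idle_with_timeout("c")      # c yields early, e.g. waiting for a packet
print(s.next_vm())            # c runs first despite its queue position
```

Without the boost, a guest that politely idles would simply lose CPU time to guests that busy-wait; the compensation removes that incentive to spin.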
Memory
- Virtualized: 16 MB of physical address space per VM (since there is no virtual MMU)
- Recently they added a MIPS-style software virtual MMU, so a guest OS can virtualize its applications' address spaces. Overhead?
- Pre-allocated, strided swap space: no sharing, so each VM's swap space is contiguous
Networked I/O
- The Ethernet driver moved from the guest OS into Denali; the rest of the TCP/IP stack stays in the guest
- This suffices for early-demuxing received packets into the appropriate VM
- Virtual packet send/recv is one PIO each
Guest OS
- Currently only a library, with no simulated protection boundary
- Supports a POSIX subset
- Different from a traditional VM: the OS is more like a process (single user, single task). OS? Flexibility?
Evaluation
- Network latency
- TCP and HTTP throughput. TCP: BSD-Linux 607 Mb/s, Denali-Linux 569 Mb/s
Fair comparison?
- Denali with a library kernel compared against BSD: both have one protection boundary
- Denali-Linux would have one real and one simulated protection boundary: different?
Batching
- Reduction in context-switch frequency
- Idle-with-timeout
Scalability
- In-core regime: constant performance; disk-bound regime: problems
Scalability and block size
- Internal fragmentation!
Evaluation summary
- Good performance and scalability, due to architectural modifications and various techniques
- Is the library OS representative of a real OS?