Post on 11-Jan-2016
System VMs
This material is based on the book, Virtual Machines: Versatile Platforms for Systems and Processes, Copyright 2005 by Elsevier Inc. All rights reserved. It has been used and/or modified with permission from Elsevier Inc.
System VMs 2
System VMs
Support multiple guest OSes on single hardware platform; all running the same ISA
[Figure: Linux, Windows, and OS/2 guests (application plus OS), each on its own Virtual Intel x86, all sharing one Intel x86 hardware platform]

System VM Outline
Applications
Virtualizing Processors
Virtualizing Memory
Virtualizing I/O
Formal Virtualizability – ISA features
Case Studies – IBM VM, x86/VMware, Intel VT-x

Applications
Simultaneous support for multiple OSes/Apps
• Easy way to implement timesharing
Simultaneous support for different OSes/Apps
• E.g. Windows and Unix
Error containment
• If a VM crashes, the other VMs can continue to work
• Assumes VMM is correct (smaller/simpler)
Operating System debugging
• Can proceed while system is being used for normal work

Applications, contd.
Operating System Migration
• Can proceed while “old” OS continues to be used
[Figure: migration over TIME. Production users run the old release while system programmers test the new release; once the new release is installed, converted production users run it alongside unconverted production users; later, system programmers test a newer release next to converted and permanently unconverted production users.]

Applications, contd.
Retrofitting new features
• Have VMM transform new device into a virtual device
Support for multiple networked machines on one physical machine
• Allows debug of network software
Enables complex debugging and performance monitoring tools
• By putting them in the VMM (not the guest OS)
Education

History
Early-60s IBM M44/44X
• VM for modified IBM 7044
• “close enough to a virtual machine to show that ‘close enough’ did not count”
Mid-60s IBM CP-40
• Time-sharing system that protects users via virtual machines, aka “pseudo machines”
• Used modified IBM 360/40, implemented via assoc. memory and microcode
• VMs used real memory; VMM managed virtual memory
CMS
• Cambridge (conversational) monitor system
• Single user OS developed for VMs (like DOS)

History
Mid/late-60s IBM 360/67 -- CP-67
• First 360 with VM.
• CMS an essential part
Late 60s/early 70s
• VMs blossomed as a research topic
Early 70s several VM implementations
• Honeywell
• DEC
• RCA
• Several university projects

System VMs
Virtual Machine Monitor (VMM) manages real hardware resources
All Guest systems must be given logical hardware resources
All resources are virtualized
• By partitioning real resources
• By sharing real resources
Guest state must be managed
• By using indirection
• By copying
[Figure: Linux, Windows, and OS/2, each with its applications, running on a Virtual Machine Monitor (VMM) on an x86 PC]

System VMs: Processor Mgmt/Protection
VMM runs in system mode
• VMM manages/protects processor through conventional mechanisms
Guest OSes run in user mode
Guest OSes do not have direct control over hardware resources
All attempts to interact w/ hardware resources are intercepted by VMM
VMM manages shadow copies of Guest System state (incl. control registers)
VMM schedules and runs Guest Systems

VM Timesharing
VMM Timeshares resources among guests
• Similar to OS timesharing applications
[Figure: timeline of First VM Active, VMM Active, Next VM Active. Steps while the VMM is active: timer interrupt occurs; VMM saves architected state of running VM; VMM determines next VM to be activated; VMM restores architected state for next VM; VMM sets timer interval and enables interrupts; VMM sets PC to timer interrupt handler of OS in next VM.]
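The timesharing steps on this slide can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the book's implementation: the `CPU`, `GuestVM`, and `VMM` classes, the round-robin policy, and the `quantum` value are all hypothetical names and choices introduced for illustration.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class GuestVM:
    name: str
    timer_handler_pc: int                      # hypothetical address of the guest OS timer handler
    saved_state: dict = field(default_factory=dict)


class CPU:
    """Toy processor: the architected state is a register dict plus a PC."""
    def __init__(self):
        self.regs = {"r0": 0, "r1": 0}
        self.pc = 0
        self.timer_interval = None
        self.interrupts_enabled = False


class VMM:
    def __init__(self, cpu, guests, quantum=10):
        self.cpu = cpu
        self.quantum = quantum                 # assumed fixed time slice
        self.current = guests[0]
        # Round-robin order (an assumed policy), starting after the first guest.
        self._order = cycle(guests[1:] + guests[:1])

    def on_timer_interrupt(self):
        # Save the architected state of the running VM.
        self.current.saved_state = {"regs": dict(self.cpu.regs), "pc": self.cpu.pc}
        # Determine the next VM to be activated.
        nxt = next(self._order)
        # Restore the architected state of the next VM (zeroed on first run).
        saved = nxt.saved_state.get("regs", {"r0": 0, "r1": 0})
        self.cpu.regs = dict(saved)
        # Set the timer interval and enable interrupts.
        self.cpu.timer_interval = self.quantum
        self.cpu.interrupts_enabled = True
        # Set the PC to the timer interrupt handler of the OS in the next VM.
        self.cpu.pc = nxt.timer_handler_pc
        self.current = nxt
```

In this sketch the interrupted PC is only kept in `saved_state`; a fuller model would also make it visible to the guest OS handler.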

Native and Hosted VMs
[Figure: four stacks divided into non-privileged modes and privileged mode. Traditional uniprocessor system: Applications, OS, Hardware. Native VM system: Virtual Machine, VMM, Hardware. User-mode hosted VM system: Virtual Machine, VMM, Host OS, Hardware. Dual-mode hosted VM system: Virtual Machine, VMM, Host OS, Hardware.]

Virtualizing State
Indirection
• Hold guest state in VMM memory
• Change pointer on guest switch
• Example: registers
[Figure: the processor's Register Block Pointer selects among the register values for VM 1, VM 2, and VM 3 held in VMM memory]

Virtualizing State
Copying
• Hold guest state in VMM memory
• Copy state on guest switch
[Figure: register values for VM 1, VM 2, and VM 3 held in VMM memory are copied to and from the processor registers]
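The difference between the two schemes can be shown with a small sketch. It assumes a toy processor with four registers; `Processor`, `VMMState`, and both switch functions are hypothetical names, not part of any real VMM.

```python
class Processor:
    def __init__(self, nregs=4):
        self.registers = [0] * nregs     # physical registers, used by the copying scheme
        self.register_block = None       # register block pointer, used by the indirection scheme


class VMMState:
    """Per-VM register values held in VMM memory."""
    def __init__(self, n_vms, nregs=4):
        self.blocks = [[0] * nregs for _ in range(n_vms)]


def switch_by_indirection(cpu, vmm, next_vm):
    # Only the pointer changes; the guest's registers live in VMM memory.
    cpu.register_block = vmm.blocks[next_vm]


def switch_by_copying(cpu, vmm, current_vm, next_vm):
    # Save the outgoing guest's registers, then load the incoming guest's.
    vmm.blocks[current_vm][:] = cpu.registers
    cpu.registers[:] = vmm.blocks[next_vm]
```

Indirection makes the switch itself cheap, while copying pays for the save and restore on every guest switch.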

Processor Management/Protection
Traps and interrupts (& sys calls)
• Transfer to VMM
• VMM determines appropriate Guest OS
• VMM transfers to Guest OS
Guest OS “return” to user app.
• Transfer to VMM
• VMM bounces return back to Guest app.
Read/Write of protected control registers
• Trap to VMM
• VMM reads/modifies guest copy
• May modify shadow copy
• Returns to Guest
[Figure: control flow among Application, Guest OS, and VMM. A system call/trap enters the VMM at the vector location and is passed to the Guest OS at its virtual vector location; a privileged operation in the Guest OS traps to the VMM, which checks privileges, performs the operation, and returns to the next instruction.]

OS VMs: Key Issue – ISA Virtualizability
What if privileged instruction no-ops in user mode? (rather than trapping)
• Then… VMM can’t intercept when Guest OS attempts the privileged instruction
What if user can access memory with real address?
• Then… a guest OS may see that the real memory it really has is different from the memory it thinks it has
What if user can read system control registers?
• Then… guest OS may not read the same state value that it thinks it wrote

Virtualizability (Popek, Goldberg, 74)
Classic work in formalizing OS VM concepts
Defines basic VM properties
Defines properties of instruction sets
Proves that VMM can be constructed if instruction set properties hold
Extends to recursive VMs
Reduces to hybrid VMs

VM Properties
Virtual Machine: efficient, isolated duplicate of the real machine
Virtual Machine Monitor: software that implements VMs
Essential VMM characteristics
1) Provides an environment essentially identical to the real machine
• Except timing and availability of resources
2) Programs show only minor decreases in speed
• Mostly native instruction execution
3) Has complete control of system resources

Privileged Instructions, Definition:
Trap if executed in user mode; do not trap if executed in supervisor mode
Privileged instructions are required to trap
• No-op in user mode is not enough

Control Sensitive instructions:
1. All instructions that change the amount of (memory) resources (or the mapping)
• base/limit register in simplified paper version
• page table in general
2. All instructions that change the processor mode
Instructions that provide control of resources
Examples:
• Load TLB (if TLB is architected)
• Load control register
• Return to user mode

Behavior Sensitive instructions:
1. All instructions whose results depend on the mapping of physical memory
2. All instructions whose behavior depends on the mode
Instructions whose behavior depends on configuration of specific resources (and who owns them)
Examples:
• Load physical address
• POPF (Intel x86): Interrupt-enable flag remains unaffected in user mode

Instruction Types -- Summary
[Figure: instructions divided into Privileged and Non-Privileged, and into Sensitive (behavior-sensitive, control-sensitive) and Innocuous]
Innocuous Instructions: Those that are not control or behavior sensitive

VMM components
[Figure: an instruction trap caused by a privileged instruction enters the Dispatcher, which calls the Allocator or one of Interpreter Routines 1 to n. Instructions that desire to change machine resources, e.g. Load Relocation Bounds Register, go to the Allocator. Instructions that do not change machine resources but access privileged resources, e.g. IN, OUT, Write TLB, go to the interpreter routines.]

VMM components
Dispatcher
• Target of vectored traps – entry point for VMM
• Decides which of other components to call
Allocator
• Decides which system resources should be provided and manages shared resources among VMs
Interpreters
• Emulate the effects of privileged instructions
VMM runs in supervisor mode; all other software in user mode

Privileged Instruction Handling
LPSW: Load Program Status Word
• Includes Mode Bit and PC (among other things)
[Figure: Guest OS code in VM (user mode) executes a privileged instruction (LPSW), which traps to the Dispatcher in the VMM code (privileged mode). The LPSW routine: change mode to privileged; check privilege level in VM; emulate instruction; compute target; restore mode to user; jump to target. The guest resumes at the next instruction (target of LPSW).]
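The dispatcher and an LPSW interpreter routine can be sketched as follows. This is a simplified model under assumptions: the PSW is reduced to a `(mode, pc)` pair, and `VirtualCPU`, `GuestFault`, `emulate_lpsw`, and `dispatcher` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

USER, SUPERVISOR = "user", "supervisor"


@dataclass
class VirtualCPU:
    """Guest-visible state kept by the VMM; the real CPU stays in user mode while the guest runs."""
    virtual_mode: str = SUPERVISOR
    pc: int = 0


class GuestFault(Exception):
    """The trap must be reflected to the guest OS instead of being emulated."""


def emulate_lpsw(vcpu, new_psw):
    # Check privilege level in the VM: only virtual supervisor mode may load a PSW.
    if vcpu.virtual_mode != SUPERVISOR:
        raise GuestFault("privileged-operation exception")
    # Emulate the instruction: the new PSW supplies the mode bit and the PC.
    new_mode, target = new_psw
    vcpu.virtual_mode = new_mode
    vcpu.pc = target                     # the guest resumes at the LPSW target


INTERPRETER_ROUTINES = {"LPSW": emulate_lpsw}


def dispatcher(vcpu, opcode, operand):
    """Entry point for vectored traps: pick the interpreter routine for the trapped opcode."""
    routine = INTERPRETER_ROUTINES.get(opcode)
    if routine is None:
        raise NotImplementedError(opcode)
    routine(vcpu, operand)
    return vcpu.pc
```

The real mode switches (to privileged on the trap, back to user before the jump) happen in hardware and are not modeled here.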

Virtual Machine “requirements”
1. All innocuous instructions are executed by the hardware directly
2. The allocator must be invoked when any program attempts to affect system resources
3. Any program executes exactly as on real hardware except
• For timing
• Availability of system resources
A VMM satisfies all three requirements
Precise versions of informal definitions given earlier

Virtual Machines: Main Theorem
A virtual machine monitor can be constructed if the set of sensitive instructions is a subset of the set of privileged instructions
Proof shows
• Equivalence by interpreting privileged instructions and executing remaining instructions natively
• Resource control by having all instructions that change resources trap to the VMM
• Efficiency by executing all non-privileged instructions directly on hardware
A key aspect of the theorem is that it is easy to check
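Because the condition is a plain subset test, checking it is mechanical. The sketch below uses made-up instruction sets for a toy ISA; only POPF and SGDT are taken from later slides, which describe them as sensitive but not privileged on x86. The function names are hypothetical.

```python
def is_virtualizable(sensitive, privileged):
    """Main theorem condition: every sensitive instruction is also privileged."""
    return set(sensitive) <= set(privileged)


def offending_instructions(sensitive, privileged):
    """Sensitive instructions that would not trap in user mode."""
    return sorted(set(sensitive) - set(privileged))


# Toy instruction sets, assumed for illustration only.
privileged = {"LPSW", "LOAD_CR", "WRITE_TLB"}
sensitive_clean = {"LPSW", "LOAD_CR"}
sensitive_x86_like = {"LOAD_CR", "POPF", "SGDT"}

print(is_virtualizable(sensitive_clean, privileged))
print(offending_instructions(sensitive_x86_like, privileged))
```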

Recursive Virtualization
[Figure: the Hardware runs a VMM in privileged mode; in non-privileged modes the VMM hosts virtual machines, one of which runs a 2nd level VMM that hosts further virtual machines]

Recursive Virtualization
Running a VMM as a VM on a VM on a VM….
Theorem: A conventional third generation computer is recursively virtualizable if it is (a) virtualizable, and (b) a VMM without any timing dependences can be constructed for it
Proof – A VMM is a program and from the VM theorem will be “identically performing” except for timing dependences and resource constraints. Timing is excluded in the theorem; resource constraints only limit the depth of recursion.

Hybrid Virtualization
Some ISAs are more virtualizable than others
• User sensitive instructions: executed in user mode and can change memory resources or processor mode, or whose behavior depends on real memory locations
• Supervisor sensitive instructions: executed in supervisor mode and can change memory resources or processor mode, or whose behavior depends on real memory locations

Hybrid Virtualization: Theorem
A hybrid virtual machine monitor can be constructed if the set of user sensitive instructions is a subset of the set of privileged instructions
Nonprivileged supervisor sensitive instructions are OK
• Example: PDP-10 JRST 1 – return to user mode (does not trap if already in user mode)
When the VMM executes the VM supervisor, it must use some form of emulation to locate supervisor sensitive instructions
• Low efficiency, but only in VM supervisor, not user code
If a user sensitive instruction is not privileged, then the VMM must emulate all the user code
• Fails efficiency condition
• But “binary translation” can be done more efficiently than interpreting

Case Study: Virtualizing the x86 ISA
x86 Evolved through many extensions
Instruction set is not (strictly) virtualizable
• Nor is it hybrid virtualizable

X86 Processor Control
Uses “baroque” late 1970s style protection rings
• Unix was a reaction to this style
Four rings, 0-3:
• 0 OS Kernel
• 1 High priority drivers and OS services
• 2 Low priority device drivers
• 3 User
Transfer to lower ring (higher privilege) must go through “gate”
[Figure: concentric protection rings numbered 3 (outermost) to 0 (innermost)]

Memory Mapping
Segments map to 2GB memory space
2 GB space maps to fixed-size pages
Segment descriptor info
• Valid, Base, Limit,
• Type (code or data)
• R/W rights,
• Descriptor Privilege Level (DPL)
• Etc.

Memory Mapping
[Figure: the code, stack, and data segment registers are loaded from the Segment Descriptor Tables (located via the Descriptor Table Registers) with base, limit, rights (R/W), Desc. Priv. Level (DPL), and Req. Priv. Level (RPL); the Code and Data segments map into the 2GB Memory, which a 2 Level Page Table maps to Real Pages]

Addressing
Addressing is via Segment Registers
Segment Registers
• CS code segment
• SS stack segment
• DS, ES, FS, GS data segments
• All memory accesses are via a segment register
Segment descriptors are entered into segment registers
• And given an RPL, Requestor Privilege Level
• In some cases privilege is lowest of RPL, DPL, e.g. when pointers are passed

X86 Processor/Memory Protection
CPL – current protection level, normally determined by DPL of current code segment
• CPL == processor mode
To access data, CPL ≤ DPL
To call procedure, must enter through gate if CPL(callee) < CPL(caller)
<< this is a very abbreviated description >>

X86 Instruction Set Virtualizability
Ordinarily:
• Levels 0,1,2 == supervisor
• Level 3 == user
To virtualize, everything runs at level 3
IN, INS, OUT, OUTS – I/O instructions
• Perform check CPL ≤ IOPL (I/O Privilege Level)
• Not privileged (by Goldberg’s defn)
• Control sensitive (I/O is resource), action sensitive to CPL
• Could be user sensitive
POPF, PUSHF
• Push/pop stack to/from EFLAGS register
• EFLAGS contains IOPL (among other things)
• And this flag indicates IO privilege level of current task

X86 ISA Virtualizability, contd.
SGDT, SIDT, SLDT, SMSW, STR
• Copy descriptor pointer register, or system state information
• Typical manual entry: “The SGDT and SIDT instructions are only useful in operating-system software; however, they can be used in application programs without causing an exception to be generated”
• I.e. behavior sensitive, non privileged
VERR/VERW
• Verify if addressed segment is readable or writeable by CPL
• Seem like perfectly reasonable instructions,
• BUT behavior sensitive and not privileged

X86 ISA Virtualizability, contd.
LAR/LSL
• LAR -- load access rights and DPL
• LSL – load segment limit
• May no-op, in effect, if CPL isn't good enough.
• I.e. performs CPL/RPL check before it does inst.
• Behavior sensitive and not privileged
MOVs, PUSH/POP to/from segment registers
• Copy RPL from segment register
• Behavior sensitive and not privileged
Pre-Scanning is probably a necessity

Hybrid Virtualization: Patching
Scan Guest OS, find problem instructions, replace with jump to VMM
[Figure: the Scanner and Patcher turns the Original Program into the Patched Program, inserting a code patch for each discovered critical instruction; each patch is a control transfer, e.g. trap, to the VMM]
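The scan-and-patch step can be sketched as below. The program is modeled as a list of one-instruction strings, which sidesteps the variable-length decoding a real x86 scanner needs. The `CRITICAL` set borrows three mnemonics the earlier slides call sensitive but not privileged; `TRAP n` is a made-up placeholder for the control transfer to the VMM, and `scan_and_patch` is a hypothetical name.

```python
# Mnemonics treated as critical (sensitive but not privileged); an assumed subset.
CRITICAL = {"POPF", "SGDT", "LAR"}


def scan_and_patch(program):
    """Return (patched program, patch table mapping address -> original instruction)."""
    patched = []
    patch_table = {}
    for addr, insn in enumerate(program):
        opcode = insn.split()[0]
        if opcode in CRITICAL:
            # Remember the original so the VMM can emulate it when the trap fires.
            patch_table[addr] = insn
            patched.append(f"TRAP {addr}")
        else:
            patched.append(insn)
    return patched, patch_table


original = ["MOV r1, r2", "POPF", "ADD r1, 4", "SGDT [r3]"]
patched, table = scan_and_patch(original)
```

On a trap, the VMM would look up the trapping address in the patch table, emulate the original instruction against the guest's virtual state, and resume the guest after it.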

Hybrid Virtualization: Code Caching
Scan Guest OS, “translate” into code cache, find problem instructions, replace with jump to VMM
[Figure: code sections of the Patched Program that contain critical instructions transfer control, e.g. by trap, to the VMM, which uses a Translation Table to find Blocks 1 to 3 in the Code Cache; each code section is emulated in the code cache using Specialized Emulation Routines, and two critical instructions can be combined into a single block]

Virtualizing Memory: Review
[Figure: on a context switch the PT Pointer selects one of the page tables (process 1 PT through process n PT) in the OS memory region; the page tables map user and super pages to OS managed Real Pages]

Virtualizing Memory
Real memory partitioning?
• Could be fixed partition per guest => inefficient
• Typically flexible partitioning via VMM management
Guest manages its virtual page tables
Guest page table addresses are write protected
VMM manages shadow page tables that reflect actual mapping to physical pages
• Note Real / Physical page distinction
VMM can change shadow page table by writing page table pointer
• i.e. virtual machine state change via indirection

Virtualizing Memory – Example
[Figure: the Guest 1 OS memory region holds process 1 PT through process n PT, selected by the guest PT Pointer on a context switch, mapping user and super pages to Guest 1 OS managed "Real" Pages. The VMM memory region holds Guest 1 Shadow PTs (process 1 user mode PT, process 1 super mode PT, through process n user mode PT and process n super mode PT) up to Guest n Shadow PTs, each with a Guest PT Pointer, mapping to VMM-managed Physical Pages. The active shadow PT changes on a guest OS switch, a context switch, or a system call.]

Virtualizing Memory – Operations
Guest application performs system call
• Trap to VMM
• VMM changes shadow mapping to reflect guest privilege change
Guest OS performs context switch
• Writes PT pointer
• Trap to VMM
• VMM writes guest PT pointer
• VMM modifies shadow PT pointer
[Figure: the Guest OS executes write PT pointer, which traps to the VMM; the VMM checks privileges, writes the Guest PT ptr, writes the Shadow PT ptr, and returns to the next instruction in the Guest OS]
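The guest context-switch path can be sketched as a trap handler. This is an illustration under assumptions: page tables are Python dicts, and `GuestMemoryState`, `RealCPU`, and `on_write_pt_pointer` are hypothetical names.

```python
class GuestMemoryState:
    """Per-guest bookkeeping held in the VMM memory region."""
    def __init__(self):
        self.guest_pt_ptr = None      # the value the guest OS believes is in the PT pointer
        self.shadow_tables = {}       # guest PT address -> shadow PT (virtual page -> physical page)


class RealCPU:
    def __init__(self):
        self.pt_ptr = None            # real PT pointer; always refers to a shadow table


def on_write_pt_pointer(cpu, guest, in_virtual_supervisor, new_guest_pt):
    """Trap handler for a guest write to the page table pointer."""
    # Check privileges: only the guest OS (virtual supervisor mode) may do this.
    if not in_virtual_supervisor:
        raise PermissionError("reflect privilege fault to guest OS")
    # Write the guest PT pointer, so later guest reads see the value it wrote.
    guest.guest_pt_ptr = new_guest_pt
    # Write the shadow PT pointer: switch the hardware to the matching shadow table.
    shadow = guest.shadow_tables.setdefault(new_guest_pt, {})
    cpu.pt_ptr = shadow
```

An empty shadow table is created on first use in this sketch, with entries to be filled in later on page faults.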

Virtualizing Memory -- TLBs
TLB plays role of page table
Page table is just a software structure of which the VMM has no special knowledge
Assume TLB entry:
• PId, Protection bits, usage bits, real page frame
Virtualize TLBs
• VMM keeps track of Guest’s copies
• VMM manages real copy
• Real TLB holds subset of pages mapped in Guest copy
Virtualize PIds
• VMM manages real PIds
• Keep track of mapping from guest PIds to real PIds
• At any given time all TLB entries with same PId are associated with same guest

Virtualizing TLBs
TLB Read/Write are privileged instructions
• Behavior and control sensitive
Guest OS write TLB
• Intercepted by VMM
• VMM updates guest’s virtual copy
• VMM may modify real TLB
Guest OS read TLB
• Intercepted by VMM
• VMM reads guest’s virtual copy and returns contents to guest
• May have to merge in usage data from real version in TLB

Virtualizing TLBs
TLB miss
• Traps to VMM
• VMM checks to see if virtual TLB maps page: if so, VMM handles it and silently returns; else, VMM reflects fault to guest OS
TLB management
• Can’t switch TLBs via indirection as with page tables
• PId management can give similar control, however
Guest system call/returns (privilege changes)
• Flush old mode PId entries: new mode TLB entries re-loaded on demand, or write new TLB entries with privileges
• Use two real PIds per virtual PId: one with virtual system mode privileges, other with virtual user mode privileges
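The virtual TLB, the real TLB, and the guest-to-real PId mapping can be sketched together. Assumptions: TLBs are dicts, protection and usage bits are omitted, the real-to-physical frame translation is left out, and the `VirtualizedTLB` class with its method names is hypothetical.

```python
class VirtualizedTLB:
    def __init__(self):
        self.virtual_tlb = {}        # (guest, guest PId, virtual page) -> frame; the guest's copy
        self.real_tlb = {}           # (real PId, virtual page) -> frame; subset of the guest copies
        self.pid_map = {}            # (guest, guest PId) -> real PId
        self._next_real_pid = 0

    def real_pid(self, guest, guest_pid):
        """Give each (guest, guest PId) pair its own real PId, so entries never mix guests."""
        key = (guest, guest_pid)
        if key not in self.pid_map:
            self.pid_map[key] = self._next_real_pid
            self._next_real_pid += 1
        return self.pid_map[key]

    def guest_write_tlb(self, guest, guest_pid, vpage, frame):
        # The intercepted guest TLB write updates the guest's virtual copy.
        self.virtual_tlb[(guest, guest_pid, vpage)] = frame

    def on_tlb_miss(self, guest, guest_pid, vpage):
        key = (guest, guest_pid, vpage)
        if key in self.virtual_tlb:
            # The virtual TLB maps the page: load the real TLB and return silently.
            self.real_tlb[(self.real_pid(guest, guest_pid), vpage)] = self.virtual_tlb[key]
            return "handled"
        # Otherwise the guest OS must service the miss itself.
        return "reflect_to_guest"
```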

Virtualizing I/O
Hardest part of virtualization
• Many device types
• Many devices of each type, each with its own driver
• New devices may be added during lifetime of system
In older, “classic” systems, less of a problem
• Entire system developed by one company
• Far fewer devices to worry about
• Channels (IO Processors) isolated key IO software

I/O Architecture
I/O instructions
• Special privileged opcodes
• Similar to loads/stores
• Address and data read/written on I/O bus
Memory mapped I/O
• Load/stores to special (protected) memory addresses
• Addresses/data decoded by hardware and translated to I/O addresses/data
Addresses indicate I/O devices/registers
Data can be status, commands, or real data

I/O Architecture (contd)
DMA (block) transfers may require several I/O operations
• Starting address(es)
• Block length
• Command (read, write, interrupt on completion)
• Requires exclusive device access
Interrupts
• From I/O devices to force processor transfers to I/O software routines
[Figure: device controller attached to the address bus and data bus, with Decode logic and Status, Start Address, Block Size, and Data Buff registers]

I/O Management: review
OS manages I/O resource
• Allocates space on storage devices, etc.
• Serializes requests for shared devices
User software performs system calls with general I/O requests
OS converts I/O calls to driver calls
• Driver contains device-specific software: exact commands, controller registers, etc.
Driver generates device (and bus)-specific I/O operations
[Figure: the Application issues system calls to the Operating System (VM mgr and I/O Drivers, linked by driver calls), which issues phy. mem. and I/O operations to the Hardware]

Device Types
Dedicated
• Monitor, mouse, keyboard
• Device can’t be virtualized; must be shared (under user control)
• VMM still controls due to privileged mode
Partitioned
• Disk
• Make multiple, smaller virtualized versions
Shared
• Network adapter
• VMM manages virtual state information
• Translate virtual requests to physical requests
Spooled
• Printer
• Shared but at coarse granularity

Spooled Devices
Two level spool table
First write to VM spool area
When ready, VMM copies to VMM spool area
Then invokes device
When device finished
• Both VM and VMM spool tables receive “complete”
Optimizations are possible
• E.g. VMM uses VM spool buffer

Virtual Machine 1 Spool Table (figure label: 10000)
Program  Status     Location  Size  Real loc.
A        Printed    1000      400   11000
B        Completed  2000      200   12000
C        Running    3000      200   13000
D        Completed  4000      500   14000

Virtual Machine 2 Spool Table (figure label: 20000)
Program  Status     Location  Size  Real loc.
P        Running    1000      400   21000
Q        Completed  2000      800   22000

VMM Spool Table (figure label: 30000)
Program  Status    Size  Real loc.  VM
A        Printed   400   30000      1
Q        Printing  800   31000      2
B        Waiting   200   31800      1
D        Waiting   500   30400      1
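The two-level spooling flow can be sketched as follows. This is a simplified model under assumptions: spool areas and addresses are not modeled, the "Printing" state is collapsed into a single step, and the `Job` and `TwoLevelSpool` names are hypothetical. The status strings are the ones used in the spool tables.

```python
from dataclasses import dataclass


@dataclass
class Job:
    program: str
    status: str
    size: int
    vm: int


class TwoLevelSpool:
    def __init__(self):
        self.vm_tables = {}      # VM id -> {program: Job}, one spool table per VM
        self.vmm_table = []      # VMM spool table, in arrival order

    def vm_write(self, vm, program, size):
        # First level: the guest writes its output to the VM spool area.
        self.vm_tables.setdefault(vm, {})[program] = Job(program, "Running", size, vm)

    def vm_complete(self, vm, program):
        # When the output is ready, the VMM copies it to the VMM spool area.
        job = self.vm_tables[vm][program]
        job.status = "Completed"
        self.vmm_table.append(Job(program, "Waiting", job.size, vm))

    def device_step(self):
        """Print the oldest waiting job and mark it done in both tables."""
        for job in self.vmm_table:
            if job.status == "Waiting":
                job.status = "Printed"
                self.vm_tables[job.vm][job.program].status = "Printed"
                return job.program
        return None
```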

Non-existent Devices
Implement virtual version only
Example: network adapter
• Allows VMs on same platform to communicate

I/O Interception Points
At system call interface
At driver call interface
At I/O device interface
[Figure: the Application issues system calls to the Operating System (VM mgr and I/O Drivers, linked by driver calls), which issues phy. mem. and I/O operations to the Hardware]
Attempts to interact with virtual devices are intercepted by VMM which translates to real devices

At system call interface
System call traps to VMM
VMM interprets system call to produce driver calls
VMM contains shadow drivers
• (Implement VMM with driver interface compatible with some existing OS?)
Guest OS contains virtual I/O code and drivers
• Must still be executed, for correct guest state updates
Problems
• VMM must interpret all I/O system calls for all guest OSes
• VMM must have access to drivers for all real devices
• I/O initiated by guest OS may not always pass through call interface

At driver call interface
Guest OS contains driver stubs
Guest OS driver calls can operate on generic virtual devices
• To simplify conversion
VMM contains shadow drivers
• These drivers correspond to real devices
Generic I/O operations passed to VMM and converted to shadow driver calls
Problem
• VMM must have access to real drivers
• Need generic drivers for each guest OS
• Guest OSes must have well defined, modular driver call interface
[Figure: the Guest Application issues system calls to the Guest OS, whose driver calls reach Generic I/O Drivers; these pass generic I/O operations to the VMM, which interprets them and uses its I/O drivers to issue I/O operations to the Hardware]

At I/O device interface
Guest OSes contain real drivers
Low level I/O operations trap to VMM
VMM must check/translate I/O operation
If legal, VMM performs I/O operation on behalf of guest
VMM passes control back to guest
Problems
• VMM must know some device specifics (even if it doesn’t contain full drivers)
• VMM must manage serialization for shared devices
• VMM must check correctness of I/O operations
[Figure: the Guest Application issues system calls to the Guest OS, whose driver calls reach its I/O Drivers; their I/O operations trap to the VMM, which checks/translates them and issues I/O operations to the Hardware]

Virtualization with IOPs (IBM Style)
IO instruction points to Channel program
• Similar to driver
• Micro-code like
• Very simple control flow
• “Packages” sequences of related operations
VMM can translate channel program as a whole
• Mostly consists of address re-mapping, and dealing with non-contiguous pages
• Reduces/eliminates problems with I/O sequences that require exclusive access to a device

Case Study: IBM 360/370/390
CP-67 on 360/67 in 1960s
• First production VM implementation
• Provided means for supporting timesharing via multiple guest versions of CMS – single user OS
• Used basic virtualization concepts described by Goldberg
VM/370 (1972) led to widespread use of VMs
Virtual Machine Assist (1974)
• Enhancements to support VMs
Extended Control Program Support (1978)
• Further enhances VM support
Handshaking
• Lets Guest OS in on the secret
Interpretive Execution Facility (IEF)

Reasons for VM Slowdown
VM initialization
• Setting up virtual state
Privileged Instruction overhead
• Trap to VMM
• Interpretation by VMM
• Return from VMM to guest
System Calls (SVC) by guest in user mode
• Requires trap/reflection back to Guest OS
Interrupts
• Reflect through VMM before getting to Guest OS
Virtual Memory Management
• Shadow page faults when page is already mapped
Duplicated effort between VMM and Guest OS
• Memory management done by both

Virtual Machine Assists
Ways of making application on VM run faster
• Have no performance effect if run in native mode
Instruction Emulation
Shadow Table Management
Virtual interval timer

IBM 370 Virtual Machine Assist
Add Control Register 6 (CR6)
• Bit 0 VM Assist On/Off
• Bit 1 Virtual user/supervisor state
• Bit 4 SVC handling On/Off
• Bit 5 Shadow table fixup On/Off
• Bit 7 Virtual interval timer assist
• Bits 8-28 address of VM pointer list
CR6 Set by VMM when Guest is dispatched
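A decoder for the CR6 fields listed above might look like the sketch below. It assumes a 32-bit register with IBM-style bit numbering, where bit 0 is the most significant bit; the slide does not state the numbering or which value of bit 1 means supervisor, so both are left as assumptions. The function and key names are hypothetical.

```python
CR6_WIDTH = 32  # assumed register width


def cr6_bit(cr6, n):
    """Return bit n of CR6 as a bool, counting from the most significant bit (bit 0)."""
    return bool((cr6 >> (CR6_WIDTH - 1 - n)) & 1)


# Mask selecting bits 8-28 in place (0x00FFFFF8 under the assumed numbering).
POINTER_LIST_MASK = sum(1 << (CR6_WIDTH - 1 - n) for n in range(8, 29))


def decode_cr6(cr6):
    return {
        "vm_assist_on": cr6_bit(cr6, 0),
        "virtual_state_bit": cr6_bit(cr6, 1),          # user/supervisor; polarity not given in the slide
        "svc_handling_on": cr6_bit(cr6, 4),
        "shadow_table_fixup_on": cr6_bit(cr6, 5),
        "virtual_interval_timer_assist": cr6_bit(cr6, 7),
        "vm_pointer_list_address": cr6 & POINTER_LIST_MASK,
    }


fields = decode_cr6(0x88012340)
```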

Instruction Emulation
Certain privileged instructions emulated directly in microcode
• Avoids trap/interpretation by VMM
• Guest must be in Virtual Supervisor mode (held in CR6)
• Examples: Load PSW, Load Real Address, Reset Reference Bit, Store Control
Supervisor Calls also emulated
• If SVC handling is enabled via CR6
• Avoids trap/reflection through VMM

Shadow Table Management
When page fault occurs:
• If Guest OS has page mapped and page is already present in real memory but not mapped by guest’s shadow table, then VM assist updates shadow table automatically
• Else, reflects fault to VMM
Uses VM pointer list to find guest tables
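The fixup decision can be written as one small function. This is a sketch under assumptions: page tables are dicts, the guest table maps virtual pages to guest "real" pages, a second dict says which real pages are resident and where, and all names are hypothetical. The actual assist does this in microcode.

```python
def vm_assist_page_fault(vpage, guest_pt, resident_real_pages, shadow_pt):
    """Decide how a page fault on vpage is handled.

    guest_pt:            virtual page -> guest "real" page (the guest's own table)
    resident_real_pages: guest "real" page -> physical page, for pages present in real memory
    shadow_pt:           virtual page -> physical page (updated in place on a fixup)
    """
    real_page = guest_pt.get(vpage)
    if real_page is not None and real_page in resident_real_pages:
        # The guest maps the page and it is present: only the shadow entry was missing.
        shadow_pt[vpage] = resident_real_pages[real_page]
        return "fixed_up"
    # Anything else needs software: reflect the fault to the VMM.
    return "reflect_to_vmm"
```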

Performance Improvement
Reduction in Supervisor State Time
• 70-90%
Reduction in Elapsed Time
• 40-65%
Reduction in Priv. Insts. Interpreted by VMM
• 75-95%

Extended Control Program Support
Emulates additional Privileged Instructions
• e.g. Purge TLB, Test Channel
Partially handles other Privileged Instructions (with help from VMM)
Non-architected instructions for use by VMM
• Examples: decode channel words, dispatch a virtual machine, locate virtual I/O control blocks (many others)
Virtual Timer Assist
• Maintains a virtual interval timer for guest VM
• Real interval timer is a hardware resource

Interpretive Execution Facility
Provides a way to execute most of the VMM functions in hardware
Function of VMM separated between hardware and software
• Cleaner separation compared to earlier VM assists
Advantages of interpretive execution
• Better performance
• Better predictability of performance
• Applicable for all types of guest operating systems
Key instruction: SIE (Start Interpretive Execution)
• Used by VMM to give control to hardware
• Architectural state of VM in table accessible to hardware
• Privileged instructions interpreted in hardware
• Occasionally need to get back to the software part of the VMM

Entry and Exit from IE mode
[Figure: the VMM software executes SIE to enter Interpretive Execution Mode; an exit for interception returns to emulation in the VMM software, and an exit for host interrupt goes to the Host Interrupt Handler]

Inter VM Communication
Other VMM extensions focus on inter-machine communication by emulating many distributed system features
• e.g. virtual LANs
• VMs by their nature are isolated – but inter-user communication is also desirable

IBM Handshaking (Para Virtualization)
Allow Guest OS to discover that it is running on VMM
• Guest “probes” for VMM when it is booted
• Then informs VMM that it expects VMM support
Reduces duplicated effort
• OS can mark all page frames fixed, disable demand paging, bypass channel address translation
Pseudo page fault handling
• Under operator control, VMM notifies Guest OS when VMM is handling a page fault by the Guest VM
• Guest OS marks faulting task as “page wait”
• Guest OS dispatches another task (i.e. whole Guest VM does not have to wait)

VMware: an x86 System Virtual Machine
Applying Conventional VMs to PCs – Problems:
• Installing the VMM on bare hardware, then booting Guests onto VMM.
• Need to support many device types, many more drivers
VMware solves both problems
Uses Host OS/Guest OS model
• “Hosted VM”
• Uses Host OS for some VMM functions, including I/O

VMware: Three Main components
Begin with already-loaded Host OS
VMDriver (Pseudo-Driver)
• Host OS-specific
• Installed as a driver, but can take over the machine
• Acts as conduit between System and User VMMs
VMMonitor (System-level VMM)
• Slipped under installed OS via Pseudo-Driver
VMApp (User-level VMM)
• Appears as ordinary application to installed OS
• Can make normal I/O calls (and use installed drivers)
[Figure: components split into user mode and privileged mode: Host Apps, VMApp, and the Virtual Machine (Applications on an OS, e.g. Linux, Windows); Host OS with VMDriver, and VMMonitor; all on the Hardware (x86 motherboard, display, adapters, etc.)]

VMM Communication
VMM control passes back and forth between user and system-level VMM portions
User VMM performs system call to pseudo-driver; then waits for response
System VMM maintains control, then sends response message back to User VMM

Resource Management
Host OS schedules processor resource
• User-level VMM is just another application
Host OS manages memory
• VM memory is allocated as address space of User-level VMM
• User level VMM “mallocs”; whole VM uses it

VMware I/O
Guest OS contains generic drivers
Generic drivers operate on virtual devices managed by user mode portion of VMM
User mode portion of VMM makes normal system calls
System calls cause Host OS to use real drivers and devices
[Figure: the Guest Application issues system calls to the Guest OS (VM mgr, Generic I/O drivers), whose phy. mem. and I/O operations go to Virtual Devices in the VMM SW (user mode); the VMM SW makes System Calls to the Host OS (VM mgr, I/O Drivers), which issues phy. mem. and I/O operations to the Hardware]
System VMs 80
I/O Sequence
Guest application makes system call
Intercepted by System-level VMM, reflected to Guest OS
Guest OS performs I/O operations specified in generic drivers
System-level VMM captures I/O operations, and interprets them
Passes operation back up to User-level VMM
User-level VMM performs I/O call to Host OS
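The sequence can be traced end to end with a toy Python model. Everything here is an assumption made for illustration: each layer is a plain function with a hypothetical name, and a temporary file plays the role of the real device reached through the Host OS.

```python
import os
import tempfile

trace = []  # records each hop so the order of the sequence is visible


def host_os_write(fd, data):
    # The user-level VMM performs an ordinary I/O call to the Host OS.
    trace.append("user VMM -> host OS: write()")
    return os.write(fd, data)


def user_vmm_handle(fd, op):
    trace.append("system VMM -> user VMM: operation passed up")
    return host_os_write(fd, op["data"])


def system_vmm_capture(fd, op):
    trace.append("system VMM: captures and interprets I/O operation")
    return user_vmm_handle(fd, op)


def guest_os_generic_driver(fd, data):
    trace.append("guest OS: generic driver issues I/O operation")
    return system_vmm_capture(fd, {"kind": "write", "data": data})


def guest_app_syscall(fd, data):
    trace.append("guest app: system call, reflected to guest OS")
    return guest_os_generic_driver(fd, data)


with tempfile.TemporaryFile() as backing:
    written = guest_app_syscall(backing.fileno(), b"hello")
    backing.seek(0)
    stored = backing.read()

print(written, stored)
```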
Example: Network Virtualization
Virtual and Physical Network Interface Card (NIC) the same
Message Send
• x86 OUT or OUTS plus port # (in range of IDs for NIC)
• Each port has a state bit that makes an I/O request to it trap
VMM saves permission “map” for all ports per guest VM
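A per-guest permission map of this kind might look like the following Python sketch. The class and method names are hypothetical, and the port range in the usage lines is an arbitrary example; only the 16-bit size of the x86 I/O port space is taken as given.

```python
class PortPermissionMap:
    """Per-guest I/O port permissions kept by a hypothetical VMM."""

    NUM_PORTS = 65536  # the x86 I/O port space is 16 bits wide

    def __init__(self):
        self._allowed = {}  # guest id -> set of permitted port numbers

    def grant(self, guest_id, first_port, last_port):
        """Allow a guest to use an inclusive range of ports."""
        if not (0 <= first_port <= last_port < self.NUM_PORTS):
            raise ValueError("port range outside the I/O port space")
        ports = self._allowed.setdefault(guest_id, set())
        ports.update(range(first_port, last_port + 1))

    def check_out(self, guest_id, port):
        """Consulted when a guest OUT/OUTS traps to the VMM.

        Returns True if the VMM may forward the request to the NIC.
        """
        if not 0 <= port < self.NUM_PORTS:
            raise ValueError("port outside the I/O port space")
        return port in self._allowed.get(guest_id, set())


perm = PortPermissionMap()
perm.grant("vm1", 0xF0, 0xF7)       # example range only
print(perm.check_out("vm1", 0xF0))  # vm1 owns this port
print(perm.check_out("vm2", 0xF0))  # vm2 was never granted it
```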
Example: Network Virtualization
Sequence below
Guest OUT traps to VMM
VMM checks guest permissions before making request to physical NIC
[Figure: message send to an external machine, with columns User on VM 1, OS on VM 1, VMM, and Device Driver, marked with user mode and privileged mode regions]
• User on VM 1: sends message to external machine, e.g. using send()
• OS on VM 1: converts into I/O instructions for virtual NIC, e.g. OUTS 0xf0,...
• VMM: sends packet on virtual bridge to device driver of physical NIC, e.g. OUTS 0x280,...
• Device driver: NIC device driver launches packet on network using wire signals
• To network
Virtual Network
Virtual and Physical NIC different
Special case: virtual network
[Figure: message send to a local virtual machine, marked with user mode and privileged mode regions]
• Sending user: sends message to local virtual machine, e.g. using send()
• Sending OS: converts into I/O instructions, e.g. OUTS 0xf0,...
• VMM: sends packet on virtual bridge to device driver of physical NIC, e.g. OUTS 0x280,...
• NIC device driver: converts send message to a receive message for receiving VM. No wire signals are generated.
• VMM: raises interrupt in receiver’s OS
• OS on VM 2: interrupt handler in OS generates I/O instructions to receive packet
• User on VM 2: receiver gets packet
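The difference between the external and the local case can be sketched as a toy virtual bridge in Python. All names are hypothetical; the sketch only models the decision to turn a send into a local receive (with an interrupt for the receiver) instead of putting the packet on the wire.

```python
from collections import deque


class VirtualBridge:
    """Toy virtual bridge inside a hypothetical VMM."""

    def __init__(self):
        self.local_queues = {}  # VM name -> deque of (sender, payload)
        self.wire = []          # packets handed to the physical NIC
        self.interrupts = []    # VMs in which a receive interrupt was raised

    def attach(self, vm):
        self.local_queues[vm] = deque()

    def send(self, src, dst, payload):
        if dst in self.local_queues:
            # Local destination: the send becomes a receive for the
            # other VM and no wire signals are generated.
            self.local_queues[dst].append((src, payload))
            self.interrupts.append(dst)
            return "local"
        # External destination: packet goes out through the physical NIC.
        self.wire.append((src, dst, payload))
        return "wire"

    def receive(self, vm):
        # Called from the receiver's interrupt handler in this model.
        queue = self.local_queues[vm]
        return queue.popleft() if queue else None


bridge = VirtualBridge()
bridge.attach("VM1")
bridge.attach("VM2")
print(bridge.send("VM1", "VM2", b"hi"))        # stays inside the host
print(bridge.send("VM1", "external", b"out"))  # leaves via the NIC
print(bridge.receive("VM2"))
```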
Case Study: Intel VT-x (Vanderpool)
x86 Virtualization Extensions recently announced by Intel
New VMX mode
• Two privilege levels: root and non-root
Root level
• Similar to conventional x86
• Plus new VMX instructions
• VMM runs in root level
Non-root level
• Limited control of resources
• Including when in ring 0
• Guest OS plus apps run in non-root level
VT-x Operation
Transition from normal mode to VMX root mode via vmxon instruction
VMM in root level sets up the environment for each VM and initiates the virtual machine via vmlaunch instruction
Attempts to modify resources cause return to root level
Explicit vmcall causes return to root mode
vmresume instruction causes return to guest in non-root mode
vmxoff instruction causes exit from VMX mode
[Figure: VT-x mode transitions. vmxon moves the processor from regular mode to root mode (VMM); vmlaunch VM1 / vmlaunch VM2 and vmresume VM1 / vmresume VM2 enter non-root mode for the named VM; VM1 and VM2 exits return to root mode; vmxoff returns to regular mode.]
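The mode transitions can be summarized as a small state machine. The Python sketch below is a toy model with hypothetical names; it assumes, in line with the figure, that a VM is entered with vmlaunch the first time and with vmresume afterwards.

```python
class VmxCpu:
    """Toy model of the VMX mode transitions (all names hypothetical)."""

    def __init__(self):
        self.mode = "regular"   # "regular", "root", or "non-root"
        self.current_vm = None
        self.launched = set()

    def _require(self, mode):
        if self.mode != mode:
            raise RuntimeError(f"not allowed in {self.mode} mode")

    def _enter(self, vm):
        self.mode = "non-root"
        self.current_vm = vm

    def vmxon(self):
        self._require("regular")
        self.mode = "root"

    def vmlaunch(self, vm):
        self._require("root")
        if vm in self.launched:
            raise RuntimeError("VM already launched; use vmresume")
        self.launched.add(vm)
        self._enter(vm)

    def vmresume(self, vm):
        self._require("root")
        if vm not in self.launched:
            raise RuntimeError("VM not launched yet; use vmlaunch")
        self._enter(vm)

    def vm_exit(self):
        # Covers both implicit exits and an explicit vmcall.
        self._require("non-root")
        exited, self.current_vm = self.current_vm, None
        self.mode = "root"
        return exited

    def vmxoff(self):
        self._require("root")
        self.mode = "regular"


cpu = VmxCpu()
cpu.vmxon()
cpu.vmlaunch("VM1")
cpu.vm_exit()
cpu.vmlaunch("VM2")
cpu.vm_exit()
cpu.vmresume("VM1")
print(cpu.mode, cpu.current_vm)
```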
VT-x Capabilities
Root mode eliminates need to run all guest code in user mode
• VMM runs in root mode
• For code regions with no critical instructions, HW is as efficient as normal machine
VT-x HW maps state-holding data elements directly to native structures during VM execution
• VMCS (virtual machine control structure) encapsulates VM state
• HW implementation can take over loading and unloading state
• No need for VMM to perform load/stores of state info
Eliminates the need for paravirtualization
• Allows standard versions of OSes to be used as guests
• The vmcall instruction can be used to pass hints and data to the VMM if desired
VMCS
Can be implemented by HW or SW in root mode
• VMM is implementation-dependent
Aligned on 4KB boundary
Pointed to by VMPTR
• Load VMPTR with vmptrld instruction
• Read VMCS with vmread; write VMCS with vmwrite
[Figure: VMCS layout]
• State Area: Guest State (Register State, Interruptibility State); Host State (Register State)
• Control Area: VM Execution Controls (Pin-based Execution Controls, Processor-based Execution Controls, Bitmap Fields, etc.); VM Exit Controls (Control Bitmap, MSR Controls); VM Entry Controls (Control Bitmap, MSR Controls, Controls for Event Injection)
• VM Exit Information: Basic Information (VM-Exit Information, Vectoring Event Information); Other Exit Information (Due to Event Delivery, Due to Instruction Execution)
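The access pattern (load the pointer, then read and write fields) can be imitated in a few lines of Python. This is a software stand-in only: the class names and the field name `guest_rip` are hypothetical labels chosen for the example, not real VMCS encodings.

```python
class Vmcs:
    """Software stand-in for one VMCS region (hypothetical model)."""

    ALIGNMENT = 4096  # the region must sit on a 4KB boundary

    def __init__(self, phys_addr):
        if phys_addr % self.ALIGNMENT != 0:
            raise ValueError("VMCS must be aligned on a 4KB boundary")
        self.phys_addr = phys_addr
        self.fields = {}


class RootModeCpu:
    """Models the VMPTR and the three access instructions."""

    def __init__(self):
        self.vmptr = None  # no current VMCS until vmptrld is executed

    def vmptrld(self, vmcs):
        self.vmptr = vmcs

    def vmwrite(self, field, value):
        if self.vmptr is None:
            raise RuntimeError("no current VMCS; run vmptrld first")
        self.vmptr.fields[field] = value

    def vmread(self, field):
        if self.vmptr is None:
            raise RuntimeError("no current VMCS; run vmptrld first")
        return self.vmptr.fields[field]


cpu = RootModeCpu()
vm1 = Vmcs(phys_addr=0x10000)
cpu.vmptrld(vm1)
cpu.vmwrite("guest_rip", 0x7C00)  # hypothetical field label
print(hex(cpu.vmread("guest_rip")))
```

Keeping one such object per guest mirrors the idea that the VMM switches VMs by changing which VMCS the pointer refers to.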
Critical Instructions
Programmable VM exit conditions given in VMCS
E.g., which instructions should cause exit to VMM
Example: Read Time Stamp Counter (RDTSC)
• Counter is contained in 64-bit MSR -- IA32_TIME_STAMP_COUNTER
• Works in any mode if TSD bit in control register 4 is off
• Otherwise works only in Ring 0; in other rings it traps (general-protection exception)
RDTSC
[Flowchart: handling of an rdtsc instruction. Decision boxes: Machine in VMX mode? RDTSC exiting bit is set in VMCS? TSD bit of CR4 is set in VM? Ring 0 operation? Use TSC Offsetting bit is set in VMCS? Outcomes: perform normal operation; save exit information, exit VM, and return control to VMM; protection exception, save exit information, exit VM, and return control to VMM; add TSC offset to time-stamp counter value and return sum; return time-stamp counter value.]
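One plausible reading of the flowchart is written out below as a Python function. The order in which the checks are applied is an assumption, since the arrows of the flowchart do not survive in this transcript; the type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MASK64 = (1 << 64) - 1  # the time-stamp counter is 64 bits wide


@dataclass
class RdtscContext:
    in_vmx_non_root: bool     # machine in VMX mode, running a guest?
    rdtsc_exiting: bool       # "RDTSC exiting" bit in the VMCS
    cr4_tsd: bool             # TSD bit of CR4 as seen in the VM
    ring: int                 # current privilege ring, 0..3
    use_tsc_offsetting: bool  # "Use TSC offsetting" bit in the VMCS
    tsc_offset: int = 0


def handle_rdtsc(ctx: RdtscContext, tsc: int) -> Tuple[str, Optional[int]]:
    """Return (outcome, value); value is None unless a count is returned."""
    if not ctx.in_vmx_non_root:
        # Conventional x86 behavior, not modeled further here.
        return ("normal_operation", None)
    if ctx.rdtsc_exiting:
        # Save exit information, exit VM, return control to VMM.
        return ("vm_exit", None)
    if ctx.cr4_tsd and ctx.ring != 0:
        # Protection exception; per the slide this also exits to the VMM.
        return ("protection_exception_vm_exit", None)
    if ctx.use_tsc_offsetting:
        return ("value", (tsc + ctx.tsc_offset) & MASK64)
    return ("value", tsc & MASK64)


guest = RdtscContext(True, False, False, 3, True, tsc_offset=50)
print(handle_rdtsc(guest, 100))
```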