Virtual Machines: An Architecture Perspective
J. E. Smith, November 2004
VMs (c) 2004, J. E. Smith 2
Introduction
Why are virtual machines interesting?
They involve computer architecture in a pure sense
They allow transcending of interfaces (which often seem to be an obstacle to innovation)
They enable innovation in flexible, adaptive hardware, security, fault-tolerance, support for network computing (and others)
Performance Isn’t Everything
The BIG ideas are all at least 20 years old, and they have been very thoroughly explored
Focus research on other important areas
• Power efficiency
• Performance efficiency
• Security
• Ease of design
• Software compatibility / interoperability
Virtual Machines can be important enablers for all the above
Outline
• Virtualization
• The Family of Virtual Machines
• Process VMs and Code Caching
• High Level Language VMs
• Co-Designed VMs
• Research in Co-Designed VMs
Abstraction
Computer systems are built on levels of abstraction
Higher levels of abstraction hide details at lower levels
Example: files are an abstraction of a disk
Instruction Set Architecture
• Major division between hardware and software
Application Binary Interface
• Observed by user processes
• User ISA + OS calls
[Figure: layered computer system with numbered interfaces. Software: application programs, libraries, operating system (drivers, memory manager, scheduler). Hardware: execution hardware, memory translation, system interconnect (bus), controllers, I/O devices and networking, main memory.]
Virtualization
An isomorphism from guest to host
• Map guest state to host state
• Implement “equivalent” functions
[Figure: guest states Si and Sj, with Sj = e(Si), are mapped by V to host states Si' = V(Si) and Sj' = V(Sj); the host operation e' takes Si' to Sj'.]
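The isomorphism can be illustrated with a toy check. This is only a sketch: the guest state (a counter), the mapping `V` (scale by 4), and the operations `e` and `e_prime` are all hypothetical choices made for illustration, not anything from the slides.

```python
# Toy guest/host pair (all names and the scaling factor are assumptions).

def V(guest_state: int) -> int:
    """Map guest state to host state (here: counter -> byte offset)."""
    return guest_state * 4

def e(guest_state: int) -> int:
    """Guest operation: increment the counter."""
    return guest_state + 1

def e_prime(host_state: int) -> int:
    """Host operation that is meant to be equivalent to e."""
    return host_state + 4

def commutes(states) -> bool:
    """Check e'(V(S)) == V(e(S)) for every sampled guest state S."""
    return all(e_prime(V(s)) == V(e(s)) for s in states)

print(commutes(range(100)))
```

If `e_prime` added 1 instead of 4, the check would fail, which is the sense in which the host function must be "equivalent" under the state mapping.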
Virtualization
Similar to abstraction, except
• Details not necessarily hidden
Construct Virtual Disks
• As files on a larger disk
• Map state
• Implement functions
Now do the same thing with the whole “machine”
[Figure: virtualization, with two virtual disks implemented as files on a larger disk.]
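A minimal sketch of the virtual disk idea, assuming a fixed 512-byte block size and using an in-memory `io.BytesIO` as a stand-in for the larger disk. The class and method names are hypothetical.

```python
import io

BLOCK_SIZE = 512  # assumed block size for this sketch

class VirtualDisk:
    """A virtual disk stored as a region of a larger backing store."""

    def __init__(self, backing, base_offset: int, num_blocks: int):
        self.backing = backing        # file-like object for the larger disk
        self.base = base_offset       # where this virtual disk starts
        self.num_blocks = num_blocks

    def _offset(self, block: int) -> int:
        # Map state: virtual block number -> offset in the backing store.
        if not 0 <= block < self.num_blocks:
            raise IndexError("block out of range")
        return self.base + block * BLOCK_SIZE

    # Implement functions: block read and write in terms of the host's.
    def write_block(self, block: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self.backing.seek(self._offset(block))
        self.backing.write(data)

    def read_block(self, block: int) -> bytes:
        self.backing.seek(self._offset(block))
        return self.backing.read(BLOCK_SIZE)

# Two virtual disks share one backing store without seeing each other.
host = io.BytesIO(bytes(8 * BLOCK_SIZE))
disk_a = VirtualDisk(host, 0, 4)
disk_b = VirtualDisk(host, 4 * BLOCK_SIZE, 4)
disk_a.write_block(0, b"A" * BLOCK_SIZE)
disk_b.write_block(0, b"B" * BLOCK_SIZE)
```

Each virtual disk sees block 0 as its own first block, even though the two blocks live at different offsets of the same backing store.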
The Family of Virtual Machines
Lots of things are called “virtual machines”
• IBM VM/370
• Java
• VMware
Some things not called “virtual machines” are virtual machines
• IA-32 EL
• Dynamo
• Transmeta Crusoe
System Virtual Machines
Provide a system environment
Constructed at ISA level
Persistent
Examples: IBM VM/370, VMware, Transmeta Crusoe
[Figure: a host platform running two VMMs, each supporting a guest OS and its guest processes; the guest systems communicate over a virtual network.]
System Virtual Machines
Native VM System
• VMM privileged mode
• Guest OS user mode
• Example: classic IBM VMs
User-mode Hosted VM
• VMM runs as user application
Dual-mode Hosted VM
• Parts of VMM privileged, parts non-privileged
• Example: VMware
[Figure: the three organizations side by side, showing where the virtual machine, VMM, host OS and hardware sit relative to the non-privileged modes and the privileged mode.]
Process Virtual Machines
Constructed at ABI level
Runtime manages guest process
Guest processes may intermingle with host processes
Not persistent
As a practical matter, guest and host OSes are often the same
Dynamic optimizers are a special case
Examples: IA-32 EL, FX!32, Dynamo
[Figure: guest processes, each with its own runtime, alongside host processes on a host OS; processes create one another, share files on disk and communicate over the network.]
The Virtual Machine Space
Process VMs
• same ISA: Multiprogrammed Systems, Dynamic Binary Optimizers
• different ISA: Dynamic Translators, HLL VMs
System VMs
• same ISA: Classic OS VMs, Hosted VMs
• different ISA: Whole System VMs, Co-Designed VMs
Architecture Issues: System VMs
Why System VMs are of interest today
• Security & Fault Tolerance (isolation)
• Platform Consolidation
• Application/Environment portability
“Efficiently Virtualizable” Instruction Sets
• Goldberg and Popek (1974) should still be required reading (An architecture paper with theorems and proofs!)
Virtual Machine Assists
• Compensate for inefficiencies due to privilege level “compression”
• Fast emulation of system functions
• Many developed for IBM mainframe VMs
System Virtualization
Traps and interrupts (& sys calls)
• Transfer to VMM
• VMM determines appropriate Guest OS
• VMM transfers to Guest OS
Guest performs privileged operation
• Trap to VMM
• VMM reads/modifies guest state
• May modify shadow state
• Returns to Guest
Guest OS “return” to user app.
• Transfer to VMM
• VMM bounces return back to Guest app.
[Figure: a system call or trap in the application goes to the vector location in the VMM, which forwards it to the virtual vector location in the guest OS; a privileged operation in the guest OS traps to the VMM, which checks privileges, performs the operation and returns to the next instruction.]
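The trap-and-emulate flow above can be sketched in a few lines. This is a toy model, not a real VMM: the operation names (`set_mode`, `map_page`), the `GuestState` fields, and the use of a Python exception to stand in for a hardware trap are all assumptions made for illustration.

```python
class GuestState:
    """Per-guest state maintained by the VMM (hypothetical fields)."""
    def __init__(self):
        self.virtual_mode = "user"    # the guest's view of the processor mode
        self.shadow_page_table = {}   # shadow state kept by the VMM

class PrivilegedTrap(Exception):
    """Stands in for the hardware trap raised by a privileged operation."""
    def __init__(self, op, args):
        super().__init__(op)
        self.op, self.args = op, args

def guest_execute(op, *args):
    # The guest OS really runs in user mode, so privileged operations trap.
    raise PrivilegedTrap(op, args)

class VMM:
    def __init__(self):
        self.guests = {}

    def run(self, guest_id, op, *args):
        state = self.guests.setdefault(guest_id, GuestState())
        try:
            guest_execute(op, *args)
        except PrivilegedTrap as trap:    # trap to VMM
            self.emulate(state, trap)     # VMM reads/modifies guest state
        return state                      # return to guest

    def emulate(self, state, trap):
        if trap.op == "set_mode":
            state.virtual_mode = trap.args[0]
        elif trap.op == "map_page":
            vpage, ppage = trap.args
            state.shadow_page_table[vpage] = ppage   # modify shadow state
        else:
            raise NotImplementedError(trap.op)

vmm = VMM()
g1 = vmm.run("guest1", "set_mode", "supervisor")
vmm.run("guest1", "map_page", 1, 7)
g2 = vmm.run("guest2", "map_page", 1, 9)
```

The two guests end up with separate shadow page tables, which is the isolation property the VMM is there to provide.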
Popek and Goldberg (in brief)
Control Sensitive instructions
• All instructions that change hardware resource allocation (or mapping)
• Example: write TLB
Behavior Sensitive instructions
• All instructions whose outcome depends on hardware resource allocation
• Example: read processor mode
Theorem (paraphrase)
• Efficiently virtualizable if all sensitive instructions trap in user mode
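The paraphrased theorem amounts to a subset test: every sensitive instruction must be among those that trap in user mode. The sketch below applies that test to a made-up ISA; the instruction names are hypothetical and chosen to echo the slide's examples.

```python
def efficiently_virtualizable(sensitive: set, traps_in_user_mode: set) -> bool:
    """Paraphrased condition: all sensitive instructions trap in user mode."""
    return sensitive <= traps_in_user_mode

def problem_instructions(sensitive: set, traps_in_user_mode: set) -> set:
    """Sensitive instructions that silently execute in user mode."""
    return sensitive - traps_in_user_mode

# Hypothetical toy ISA.
control_sensitive = {"write_tlb", "set_mode"}
behavior_sensitive = {"read_mode"}
sensitive = control_sensitive | behavior_sensitive

good_isa_traps = {"write_tlb", "set_mode", "read_mode", "halt"}
bad_isa_traps = {"write_tlb", "set_mode", "halt"}   # read_mode does not trap

print(efficiently_virtualizable(sensitive, good_isa_traps))
print(problem_instructions(sensitive, bad_isa_traps))
```

In the second ISA, `read_mode` would let a guest OS running in user mode observe that it is not really privileged, so the VMM never gets a chance to intervene.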
VMs (c) 2004, J. E. Smith 16
System VM ResearchSystem VM Research Architecture Challenge:
• Make IA-32 efficiently virtualizable Virtual Machine Assists
• Compensate for inefficiencies due to privilege level “compression”
• Fast emulation of system functions• Many developed for IBM mainframe VMs
Applications to Chip Multiprocessors• Technology changes often require innovation and
“re-invention”
Architecture Issues: Process VMs
Generally to allow application migration
• Or to run popular software on a less popular platform
• Goal is generally to minimize performance loss
Same-ISA dynamic optimizers are special case
• HP Dynamo
Architecture problems
• Efficient code-caching
• Indirect jump problem
• Protecting runtime from guest process
Staged Emulation with Code Caching
An important part of many VM implementations
Start interpreting
Profile to find “hot” code regions
Translate, optimize & cache frequent code sequences
[Figure: the runtime, containing an interpreter and a translator/optimizer, works on the binary memory image, the profile data and the code cache.]
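The staging can be sketched as follows, under heavy simplification. The "source binary" here is a dictionary of blocks operating on a single accumulator, the hot threshold of 3 is arbitrary, and "translation" merely pre-decodes a block once; none of this is taken from a real system.

```python
import operator

HOT_THRESHOLD = 3   # assumed; real systems tune this
OPS = {"add": operator.add, "mul": operator.mul}

class StagedEmulator:
    def __init__(self, blocks):
        self.blocks = blocks     # SPC -> (list of (op name, operand), next SPC)
        self.profile = {}        # SPC -> interpreted execution count
        self.code_cache = {}     # SPC -> translated block

    def interpret(self, spc, acc):
        ops, next_spc = self.blocks[spc]
        for name, val in ops:
            acc = OPS[name](acc, val)    # decode and dispatch every time
        return acc, next_spc

    def translate(self, spc):
        ops, next_spc = self.blocks[spc]
        steps = [(OPS[name], val) for name, val in ops]   # decode once
        def translated(acc):
            for fn, val in steps:
                acc = fn(acc, val)
            return acc, next_spc
        return translated

    def run_block(self, spc, acc):
        if spc in self.code_cache:               # fast path: cached translation
            return self.code_cache[spc](acc)
        count = self.profile.get(spc, 0) + 1     # profile while interpreting
        self.profile[spc] = count
        if count >= HOT_THRESHOLD:               # block became hot
            self.code_cache[spc] = self.translate(spc)
        return self.interpret(spc, acc)

emu = StagedEmulator({0: ([("add", 1)], None)})
acc = 0
for _ in range(5):
    acc, _next = emu.run_block(0, acc)
```

After the third interpreted execution the block is translated and cached, so the last two iterations bypass both the interpreter and the profiling counter.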
Superblocks
Based on “hot” paths
One entry, multiple exits
May contain redundant blocks (tail duplication)
[Figure: a control flow graph of blocks A through G, shown before and after superblock formation; after formation, block G appears in several copies.]
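One simple way to pick a hot path is to start at a hot entry block and repeatedly follow the most frequently taken successor. The sketch below does that; it is not the formation method of any particular system, and the control flow graph and edge counts are invented for illustration.

```python
def form_superblock(entry, edge_counts, max_blocks=8):
    """Follow the most frequently taken successor from a hot entry block.

    edge_counts maps block -> {successor: times taken} (hypothetical profile).
    """
    trace = [entry]
    current = entry
    while len(trace) < max_blocks:
        successors = edge_counts.get(current)
        if not successors:
            break                     # no profiled successor: end of trace
        nxt = max(successors, key=successors.get)
        if nxt in trace:
            break                     # would re-enter the trace: stop here
        trace.append(nxt)
        current = nxt
    return trace

# Made-up profile for a graph with blocks A..G.
edge_counts = {
    "A": {"B": 90, "D": 10},
    "B": {"C": 90},
    "D": {"C": 10},
    "C": {"E": 70, "F": 30},
    "E": {"G": 70},
    "F": {"G": 30},
    "G": {"A": 99},
}
hot_path = form_superblock("A", edge_counts)
print(hot_path)
```

Blocks off the hot path, such as D and F here, become side exits. If a later superblock starts at F, it gets its own copy of G, which is where tail duplication comes from.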
Binary Translation Example

x86 Binary
4FD0: addl %edx,(%eax)    ;load and accumulate sum
      movl (%eax),%edx    ;store to memory
      sub %ebx,1          ;decrement loop count
      jz 51C8             ;branch if at loop end
4FDC: add %eax,4          ;increment %eax
      jmp 4FD0            ;jump to loop top
51C8: movl (%ecx),%edx    ;store last value of %edx
      xorl %edx,%edx      ;clear %edx
      jmp 6200            ;jump elsewhere

PowerPC Translation
9AC0: lwz r16,0(r4)       ;load value from memory
      add r7,r7,r16       ;accumulate sum
      stw 0(r5),r7        ;store to memory
      subi. r5,r5,1       ;decrement loop count, set cr0
      bez cr0,pc+12       ;branch if loop exit
      bl F000             ;branch & link to EM
      4FDC                ;save source PC in link register
9AE4: bl F000             ;branch & link to EM
      51C8                ;save source PC in link register
9C08: stw 0(r6),r7        ;store last value of %edx
      subi r7,r7,r7       ;clear %edx
      bl F000             ;branch & link to EM
      6200                ;save source PC in link register
Code Caches
Contain
• Basic blocks
• Superblocks (one entrance, multiple exits)
• Optimized Superblocks
A base technology for many VMs
• Dynamic binary translators: Intel IA-32 EL, Compaq FX!32
• Dynamic binary optimizers: Dynamo family
• Co-designed virtual machines: Transmeta, IBM DAISY
• High performance Java virtual machines
• System VMs with “inefficiently virtualizable” ISAs
• “Sandboxing” secure VMs (x86 DynamoRIO)
Indirect Jumps
Translated code cache PC (TPC) differs from Source binary PC (SPC)
• Need branch/jump target address translation
• (Direct) branches are easier; target address is fixed
Chaining can be used
[Figure: superblocks and the dispatch table lookup code, shown without chaining and with chaining.]
The Indirect Jump Problem
Target addresses (SPCs) can change
• SPC needs to be translated at run-time, not translation time
Conventional solution: superblock construction-time software prediction (aka inline caching)
    If Rx == #addr_1 goto #target_1
    Else if Rx == #addr_2 goto #target_2
    Else dispatch_table_lookup(Rx)    ; do it the slow way
• The biggest overhead in code caches
– Compare-and-branch: 6 instructions
– Hash table lookup: 15 instructions in Dynamo x86
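The compare-and-branch chain with a slow-path fallback can be modelled as below. This is a sketch only: the addresses are made up, and a Python dictionary stands in for the dispatch hash table.

```python
def make_indirect_jump(predictions, dispatch_table):
    """Build the target lookup for one indirect jump site.

    predictions: (SPC, TPC) pairs chosen when the superblock was built.
    dispatch_table: full SPC -> TPC map, used as the slow path.
    """
    stats = {"inline_hits": 0, "table_lookups": 0}

    def jump(rx):
        for spc, tpc in predictions:     # compare-and-branch chain
            if rx == spc:
                stats["inline_hits"] += 1
                return tpc
        stats["table_lookups"] += 1      # do it the slow way
        return dispatch_table[rx]

    return jump, stats

# Hypothetical addresses.
dispatch_table = {0x1000: 0xA000, 0x2000: 0xB000, 0x3000: 0xC000}
jump, stats = make_indirect_jump([(0x1000, 0xA000), (0x2000, 0xB000)],
                                 dispatch_table)
targets = [jump(0x1000), jump(0x2000), jump(0x3000), jump(0x1000)]
```

The predictions are fixed when the superblock is built, so a jump whose target set drifts after that point keeps paying for the failed comparisons plus the table lookup.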
Protecting the Runtime
The runtime shares process memory space with application
• Must protect runtime from application
• Expensive memory protection changes on switches between runtime and code cache
• If guest registers are mapped to host memory, how are the memory-mapped registers protected?
[Figure: memory protection settings (N, R, R/W, Ex) for guest code, guest data, runtime data, runtime code and the code cache, in runtime mode and in emulation mode.]
Process VM Research
Same-ISA dynamic binary optimizers are probably not a winning proposition
• Indirect jumps lead to performance losses on modern processors (optimizers with patching are better)
• Complete (intrinsic) compatibility is extremely difficult
– May have to rely on extrinsic assurances
– Topic of architecture research similar to Goldberg and Popek
For general process VMs some primitive support in ISA will be useful / necessary
• Indirect jumps (more later)
• Code caching
• Protection
Computer Architecture Innovation
HLL VMs – software people invent ISA to solve SW problems
Co-Designed VMs – hardware people invent ISA to solve HW problems
These two are the most interesting VMs from an architecture perspective and provide the biggest opportunities.
High Level Language Virtual Machines
Raise the “ABI” level of abstraction
• User higher level virtual ISA
• OS abstracted as standard libraries
A form of process VM
[Figure: Traditional: HLL program → compiler front-end → intermediate code → compiler back-end → object code (ISA) → loader → memory image. HLL VM: HLL program → compiler → portable code (virtual ISA) → VM loader → virtual memory image → VM interpreter/translator → host instructions.]
Architecture Issues: High Level VMs
Examples:
• Sun Java
• Microsoft .NET Framework and MSIL
Why are HLL VMs important?
• Microsoft says so.
• It’s a good idea.
Combines object oriented programming and network computing
HLL VMs: Architecture Perspective
Here, architects were deprived (or let themselves be deprived) of some interesting architecture work
Don’t look at it bottom-up, i.e.
• Take existing software for supporting HLL VMs
• Generate traces for standard ISAs
• Analyze traces
• Conclude it’s “just like C”… problem solved!
Look top-down – start with features of MSIL and look for computer architecture opportunities
• Will require a mix of hardware and software innovation (else just continue to ignore real architecture in favor of implementation)
HLL VM Research
Metadata – an interesting concept
• Data Set Architecture
• Don’t have to discover data structures – compare with C programs.
[Figure: a machine-independent program file (metadata and code) is read by the loader of the virtual machine implementation into internal data structures; an interpreter executes the code, or a translator produces native code.]
HLL VM Research
Precise trap model
• Problems in conventional processors:
– All state precise
– Many instructions can trap
– Enable/disable “remote” and at any time
• HLL VMs
– Not all state must be precise: PC not needed; operand stack never; local variables only if trap is handled locally
– Trap enable explicit and locally specified
HLL VM Research
Stack tracking
• At any given point, operand stack must have same number of elements and types regardless of control flow path
• This property could simplify exploitation of control independence
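The stack tracking property can be checked with a small dataflow pass. The sketch below tracks only the number of elements (not their types) for an invented six-instruction bytecode; the opcode names and their stack effects are assumptions, and the code is assumed to end every path with `ret`.

```python
# (pops, pushes) for a made-up stack bytecode.
EFFECT = {"push": (0, 1), "pop": (1, 0), "add": (2, 1),
          "br": (1, 0), "jmp": (0, 0), "ret": (0, 0)}

def stack_depths(code):
    """Return the operand stack depth before each reachable instruction.

    Raises ValueError if two control flow paths reach an instruction with
    different depths, or if an instruction would underflow the stack.
    """
    depth_at = {0: 0}
    worklist = [0]
    while worklist:
        pc = worklist.pop()
        op, *args = code[pc]
        pops, pushes = EFFECT[op]
        depth = depth_at[pc]
        if depth < pops:
            raise ValueError(f"stack underflow at {pc}")
        after = depth - pops + pushes
        if op == "ret":
            successors = []
        elif op == "jmp":
            successors = [args[0]]
        elif op == "br":                 # conditional: target or fall through
            successors = [args[0], pc + 1]
        else:
            successors = [pc + 1]
        for nxt in successors:
            if nxt not in depth_at:
                depth_at[nxt] = after
                worklist.append(nxt)
            elif depth_at[nxt] != after:
                raise ValueError(f"inconsistent stack depth at {nxt}")
    return depth_at

good = [("push",), ("br", 4), ("push",), ("jmp", 5), ("push",), ("ret",)]
bad = [("push",), ("br", 3), ("push",), ("ret",)]
good_depths = stack_depths(good)
```

In `good`, both arms of the branch push one value before joining at instruction 5. In `bad`, only the fall-through arm pushes, so the join at instruction 3 is rejected.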
HLL VMs Summary
Claim: Slow-downs due to OO programming, probably not dynamic compilation – and not stack-based ISA
Research opportunities abound
• For VM implementation
• For speeding up OO programs (look beyond C/C++)
• Use co-designed HW/SW
Base design on MSIL/Java and implement conventional ISA as the uncommon case
Co-Designed Virtual Machines
Separate the hardware/software interface from the ISA level of abstraction
Restore the ISA to its “natural” place
• as an Implementation ISA (I-ISA) that reflects actual hardware
Support existing ISAs
• as a Virtual ISA (V-ISA)
Let processor designers use both hardware and software
A form of system VM
[Figure: the conventional stack (user applications, OS and libraries, ISA, hardware) beside the co-designed stack (user applications, OS and libraries, V-ISA, software, I-ISA, hardware).]
Co-Designed VMs
Should be of interest to both architects and micro-architects
• Offers opportunities for performance, power saving, fault tolerance and other implementation-dependent features
• Allows transcending conventional ISAs
• Don’t confuse them with VLIW!
Architecture Issues: Concealed Memory
VM software resides in memory concealed from all conventional software
[Figure: the processor core with its ICache and DCache hierarchies; conventional memory holds the source ISA code and data, and concealed memory holds the VM code, VM data and code cache.]
Another Way of Doing Things
[Figure: a conventional processor (main memory, cache hierarchy, translation unit that forms uops, processor pipeline, functional units) compared with dynamic translation (main memory, software translator, code cache, processor pipeline, functional units).]
Jump Target-address Lookup Table
A hardware cache of dispatch table entries
Similar to software-managed TLB in virtual memory
[Figure: the jump instruction's TPC indexes the BTB, which supplies the predicted next fetch TPC. The jump's register identifier selects the jump target SPC from the register file, and the JTLT maps that SPC to the jump target TPC. On a JTLT hit, the target TPC is compared with the BTB prediction: a match means the BTB prediction was correct; a mismatch is a BTB misprediction, and fetch is redirected to the jump target TPC from the JTLT. On a JTLT miss, fetch is redirected to the dispatch code.]
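The three outcomes (BTB correct, BTB misprediction, JTLT miss) can be modelled in software. This is a behavioural sketch only: the capacity, the first-in-first-out eviction, the refill after a miss, and the BTB update policy are assumptions, and the addresses are made up.

```python
class JTLT:
    """Small cache of SPC -> TPC dispatch table entries."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = {}    # insertion-ordered; oldest entry evicted first

    def lookup(self, spc):
        return self.entries.get(spc)

    def fill(self, spc, tpc):
        if spc not in self.entries and len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))   # assumed FIFO policy
        self.entries[spc] = tpc

def resolve_jump(jump_tpc, target_spc, btb, jtlt, dispatch_table):
    """Return (target TPC, outcome) for one dynamic indirect jump."""
    predicted = btb.get(jump_tpc)        # predicted next fetch TPC
    actual = jtlt.lookup(target_spc)
    if actual is None:
        # JTLT miss: fall back to the dispatch code, then refill the JTLT,
        # much as a software-managed TLB is refilled by its miss handler.
        actual = dispatch_table[target_spc]
        jtlt.fill(target_spc, actual)
        outcome = "jtlt_miss"
    elif predicted == actual:
        outcome = "btb_correct"
    else:
        outcome = "btb_mispredict"       # redirect fetch to TPC from the JTLT
    btb[jump_tpc] = actual               # assumed BTB update policy
    return actual, outcome

dispatch_table = {0x100: 0x9000, 0x200: 0x9100}   # hypothetical addresses
btb, jtlt = {}, JTLT()
outcomes = [resolve_jump(0x50, spc, btb, jtlt, dispatch_table)[1]
            for spc in (0x100, 0x100, 0x200, 0x100)]
```

The last jump in the sequence hits in the JTLT even though the BTB predicted the previous target, so the misprediction is repaired without running the dispatch code.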
Dual-address RAS
Problem: function call instruction saves return SPC not TPC
• Conventional software-based chaining cannot utilize a RAS
Solution: save both SPC and TPC
[Figure: a push-dual-address-RAS instruction pushes an (SPC, TPC) pair onto the dual-address RAS, shown alongside the JTLT.]
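A behavioural sketch of the dual-address return address stack follows. The depth, the overwrite-oldest policy when full, and the check of the popped SPC against the actual return SPC are assumptions made for illustration; the addresses are invented.

```python
class DualAddressRAS:
    """Return address stack that keeps (SPC, TPC) pairs."""

    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def push(self, return_spc, return_tpc):
        # Performed at a call: save both the source and translated addresses.
        if len(self.stack) == self.depth:
            self.stack.pop(0)            # assumed: drop the oldest entry
        self.stack.append((return_spc, return_tpc))

    def predict_return(self, actual_spc):
        """Pop the top pair; return its TPC if the SPC matches, else None."""
        if not self.stack:
            return None
        spc, tpc = self.stack.pop()
        return tpc if spc == actual_spc else None

ras = DualAddressRAS()
ras.push(0x10, 0xA0)     # outer call
ras.push(0x20, 0xB0)     # nested call
inner = ras.predict_return(0x20)
outer = ras.predict_return(0x10)
empty = ras.predict_return(0x10)
```

A `None` result stands for the cases where the return has to be resolved some other way, for example through the JTLT or the dispatch code.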
IPC performance
“Translate” Alpha to Alpha; start with highly optimized code
Conventional method (à la Dynamo) results in 14% IPC loss
Dual-address RAS provides the most benefit
Using both JTLT & RAS, 7.7% IPC improvement
• Due to superblock re-layout
[Figure: bar chart of IPC (axis 0.8 to 2.4) for 164.gzip, 175.vpr, 176.gcc, 181.mcf, 186.crafty, 197.parser, 252.eon, 253.perlbmk, 254.gap, 255.vortex, 256.bzip2, 300.twolf and H.mean; series: original, sw_pred.sw_pred, sw_pred.sw_pred (private dispatch), sw_pred.ras, jtlt.ras.]
Research: Efficient Microarchitectures
Wide pipelines are at odds with fast pipelines
• Fast pipeline => low complexity per stage
• More instructions per stage => high complexity per stage
Process larger atomic units in pipeline stages
Narrower “effective” width
Reduce decoding stages
• Do more in software
Pipeline the issue stage
Fused Instruction Set
Co-designed VM x86 implementation
• Shorten and simplify pipeline front-end
Combine pairs of dependent instructions
• Form single “unit” for pipeline processing
Use VM software to
• “Crack” x86 instructions into RISC-ops
• Re-order RISC-ops
• Reassemble into (new) fused pairs
Related: Pentium-M fuses in front-end
• Using original x86 instructions
Conventional Issue Logic
Select and issue instructions free of data dependences
Based on the selection, clear dependences
• And “wake-up” newly independent instructions
Single cycle select-wakeup important for good performance
[Figure: issue buffer entries with the select and fanout/wakeup loop.]
Pipelined Issue Logic
Fuse dependent instructions into single slot
Fused instructions traverse entire pipeline
Make single issue decision for the pair
[Figure: issue buffer in which a fused pair (OP1, OP2) occupies a single entry, with the select and fanout/wakeup loop.]
Instruction Set
[Figure: instruction formats, each with an F bit, and example instructions.]
• 10b opcode, 21-bit immediate/displacement: call 0x080af30e (21bit disp); jcc 0x080115a0; jmp 0x080C0988
• 10b opcode, 5b Rds, 16-bit immediate/disp: LIMM.lo Redx, LO(0x0810a7de); LIMM.hi Redx, HI(0x0810a7de); CMP.cc Reax, 0x4000
• 10b opcode, 5b Rsr, 5b Rds, 11b immd/disp: LD Reax, mem[Resp + F8]; ST Reax, mem[Rebp + 4C]; ADD Reax, Rebx, 4c
• 16-bit opcode, 5b Rsr, 5b Rsr, 5b Rds: ADD Reax, Redx, Rebx; Fmac Facc, Fmp1, Fmp2; LD Reax, mem[Rebx + Rebp]
• 7b op, 4b Rs, 4b Rd: mov esp, ebp → MOV Resp, Rebp; mov eax,[esp] → LD Reax, mem[Resp]; add eax, edx → ADD Reax, Redx
• 7b op, 4b I, 4b Rd: sub ecx, 4 → SUB Recx, 4; shr esi, 2 → SHR Resi, 2; inc ecx → INC Recx, 1
• 7b op, 8b immd/disp: jcc 3e, e.g. jnz 3e
Translation Algorithm
Two Pass Algorithm:
1. Form superblocks using Dynamo MRET method
2. Crack x86 instructions into RISC-like micro-ops
3. Attempt to fuse ALU ops only
4. Fuse LD/ST instructions as tails and ALU ops as heads
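Steps 3 and 4 can be sketched as two greedy pairing passes over the micro-ops of a superblock. This is a simplified illustration, not the algorithm actually used: the `MicroOp` record, the window of 4, and the nearest-consumer rule are assumptions, and the checks needed to reorder a tail next to its head legally are omitted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class MicroOp:
    kind: str                  # "alu", "ld", "st" or "br"
    dst: Optional[str]         # destination register, if any
    srcs: Tuple[str, ...]      # source registers

def fuse(ops, window=4):
    """Pair each ALU head with the nearest later op that reads its result."""
    partner = {}               # index -> index of the op it is fused with
    pairs = []
    for tail_kinds in (("alu",), ("ld", "st")):   # pass 1, then pass 2
        for i, head in enumerate(ops):
            if head.kind != "alu" or i in partner:
                continue
            for j in range(i + 1, min(i + 1 + window, len(ops))):
                cand = ops[j]
                if (j not in partner and cand.kind in tail_kinds
                        and head.dst in cand.srcs):
                    partner[i], partner[j] = j, i
                    pairs.append((i, j))
                    break
                if cand.dst == head.dst:
                    break      # head's result overwritten before any tail
    return pairs

# Hypothetical micro-op sequence.
ops = [
    MicroOp("alu", "t0", ("ebx",)),         # 0: t0  = ebx + 4
    MicroOp("ld", "eax", ("t0",)),          # 1: eax = mem[t0]
    MicroOp("alu", "ecx", ("ecx",)),        # 2: ecx = ecx - 1
    MicroOp("alu", "edx", ("ecx", "eax")),  # 3: edx = ecx + eax
    MicroOp("br", None, ("edx",)),          # 4: branch on edx
]
pairs = fuse(ops)
```

The first pass pairs the two ALU ops (2, 3); the second pass then pairs the address computation with its load (0, 1). The branch stays unfused.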
Fusing Profile
About 50% of operations are fused
Only 5-10% of non-fused are single-cycle ALU ops
[Figure: stacked bar chart of the percentage of dynamic instructions (0% to 100%) for 164.gzip, 175.vpr, 176.gcc, 181.mcf, 186.crafty, 197.parser, 252.eon, 253.perlbmk, 254.gap, 255.vortex, 256.bzip2, 300.twolf and Average, broken down into ALU, FP or NOPs, BR, ST, LD and Fused.]
Distance Between Fused Operations
Most fused operations close together
• 70% of fused ops from different x86 instructions
• 60% contain two ALU operations
[Figure: for each benchmark (164.gzip through 300.twolf), the percentage of fused macro-ops (0% to 100%) at each distance from 1 to 7.]
Performance (Normalized IPC)
Baseline: generic superscalar
Macro-op: Fused macro-ops with pipelined issue logic
Baseline Pipelined: superscalar with pipelined issue logic
[Figure: relative IPC performance (0.6 to 1.3) versus issue window size (16 to 64) for 4-wide Macro-op, 4-wide Baseline, 4-wide Baseline Pipelined and 2-wide Macro-op.]
VM Research
Architecture Support for VMs
• Enable spectrum of VMs (process, system, HLL, co-designed)
• Support for dynamic translation and optimization
• Primitives: code caches & indirect jumps; concealed memory
• Pays for itself – helps get rid of obsolete ISA baggage
VM applications
• Security
• Fault Tolerance
Co-Designed VMs
• Efficient microarchitecture
• Adaptive microarchitecture
– For power efficiency
– For performance
New ISAs
• Application-area specific ISAs
• Support for Java/MSIL
• “Convergence” architectures
Computer Architects can do Computer Architecture!