Interface manual
Xen v3.0 for x86

Xen is Copyright (c) 2002-2005, The Xen Team
University of Cambridge, UK

DISCLAIMER: This documentation is always under active development and as such there may be mistakes and omissions — watch out for these and please report any you find to the developer’s mailing list. The latest version is always available on-line. Contributions of material, suggestions and corrections are welcome.

Contents

1 Introduction

2 Virtual Architecture
   2.1 CPU state
   2.2 Exceptions
   2.3 Interrupts and events
   2.4 Time
   2.5 Xen CPU Scheduling
   2.6 Privileged operations

3 Memory
   3.1 Memory Allocation
   3.2 Pseudo-Physical Memory
   3.3 Page Table Updates
   3.4 Writable Page Tables
   3.5 Shadow Page Tables
   3.6 Segment Descriptor Tables
   3.7 Start of Day
   3.8 VM assists

4 Xen Info Pages
   4.1 Shared info page
      4.1.1 vcpu_info_t
      4.1.2 vcpu_time_info
      4.1.3 arch_shared_info_t
   4.2 Start info page

5 Event Channels
   5.1 Hypercall interface

6 Grant tables
   6.1 Interface
      6.1.1 Grant table manipulation
      6.1.2 Hypercalls

7 Xenstore
   7.1 Guidelines
   7.2 Store layout

8 Devices
   8.1 Network I/O
      8.1.1 Backend Packet Handling
      8.1.2 Data Transfer
      8.1.3 Network ring interface
   8.2 Block I/O
      8.2.1 Data Transfer
      8.2.2 Block ring interface
   8.3 Virtual TPM
      8.3.1 Data Transfer
      8.3.2 Virtual TPM ring interface

9 Further Information
   9.1 Other documentation
   9.2 Online references
   9.3 Mailing lists

A Xen Hypercalls
   A.1 Invoking Hypercalls
   A.2 Virtual CPU Setup
   A.3 Scheduling and Timer
   A.4 Page Table Management
   A.5 Segmentation Support
   A.6 Context Switching
   A.7 Physical Memory Management
   A.8 Inter-Domain Communication
   A.9 IO Configuration
   A.10 Administrative Operations
   A.11 Access Control Module Hypercalls
   A.12 Debugging Hypercalls

Chapter 1

Introduction

Xen allows the hardware resources of a machine to be virtualized and dynamically partitioned, allowing multiple different guest operating system images to be run simultaneously. Virtualizing the machine in this manner provides considerable flexibility, for example allowing different users to choose their preferred operating system (e.g., Linux, NetBSD, or a custom operating system). Furthermore, Xen provides secure partitioning between virtual machines (known as domains in Xen terminology), and enables better resource accounting and QoS isolation than can be achieved with a conventional operating system.

Xen essentially takes a ‘whole machine’ virtualization approach as pioneered by IBM VM/370. However, unlike VM/370 or more recent efforts such as VMware and Virtual PC, Xen does not attempt to completely virtualize the underlying hardware. Instead parts of the hosted guest operating systems are modified to work with the VMM; the operating system is effectively ported to a new target architecture, typically requiring changes in just the machine-dependent code. The user-level API is unchanged, and so existing binaries and operating system distributions work without modification.

In addition to exporting virtualized instances of CPU, memory, network and block devices, Xen exposes a control interface to manage how these resources are shared between the running domains. Access to the control interface is restricted: it may only be used by one specially-privileged VM, known as domain 0. This domain is a required part of any Xen-based server and runs the application software that manages the control-plane aspects of the platform. Running the control software in domain 0, distinct from the hypervisor itself, allows the Xen framework to separate the notions of mechanism and policy within the system.

Chapter 2

Virtual Architecture

In a Xen/x86 system, only the hypervisor runs with full processor privileges (ring 0 in the x86 four-ring model). It has full access to the physical memory available in the system and is responsible for allocating portions of it to running domains.

On a 32-bit x86 system, guest operating systems may use rings 1, 2 and 3 as they see fit. Segmentation is used to prevent the guest OS from accessing the portion of the address space that is reserved for Xen. We expect most guest operating systems will use ring 1 for their own operation and place applications in ring 3.

On 64-bit systems it is not possible to protect the hypervisor from untrusted guest code running in rings 1 and 2. Guests are therefore restricted to run in ring 3 only. The guest kernel is protected from its applications by context switching between the kernel and currently running application.

In this chapter we consider the basic virtual architecture provided by Xen: CPU state, exception and interrupt handling, and time. Other aspects such as memory and device access are discussed in later chapters.

2.1 CPU state

All privileged state must be handled by Xen. The guest OS has no direct access to CR3 and is not permitted to update privileged bits in EFLAGS. Guest OSes use hypercalls to invoke operations in Xen; these are analogous to system calls but occur from ring 1 to ring 0.

A list of all hypercalls is given in Appendix A.
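As a rough illustration of the mechanism, a 32-bit paravirtualized guest kernel traps into Xen with a software interrupt, passing the hypercall number and arguments in registers. The sketch below assumes the classic convention (number in EAX, arguments in EBX, ECX, and so on, trap via int $0x82); later guests typically issue hypercalls indirectly through a hypercall transfer page instead (see Appendix A.1).

/* Minimal sketch of a two-argument hypercall from a 32-bit guest
 * kernel, assuming the classic int $0x82 convention. */
static inline long hypercall2(int op, unsigned long a1, unsigned long a2)
{
    long ret;
    asm volatile ( "int $0x82"          /* ring 1 -> ring 0 transition */
                   : "=a" (ret)
                   : "0" (op), "b" (a1), "c" (a2)
                   : "memory" );
    return ret;
}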

2.2 Exceptions

A virtual IDT is provided — a domain can submit a table of trap handlers to Xen via the set_trap_table hypercall. The exception stack frame presented to a virtual trap handler is identical to its native equivalent.
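As a sketch, a guest might register its handlers at boot as follows. The trap_info_t layout (vector, flags, code segment, handler address) follows xen/include/public/xen.h, with an all-zero entry terminating the table; the handler names here are hypothetical.

/* Sketch: submit a virtual IDT to Xen at guest initialisation. */
static trap_info_t trap_table[] = {
    { 13, 0, FLAT_KERNEL_CS, (unsigned long)do_general_protection }, /* #GP */
    { 14, 0, FLAT_KERNEL_CS, (unsigned long)do_page_fault         }, /* #PF */
    {  0, 0, 0, 0 }                          /* zero entry terminates table */
};

void trap_init(void)
{
    HYPERVISOR_set_trap_table(trap_table);
}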

2.3 Interrupts and events

Interrupts are virtualized by mapping them to event channels, which are delivered asynchronously to the target domain using a callback supplied via the set_callbacks hypercall. A guest OS can map these events onto its standard interrupt dispatch mechanisms. Xen is responsible for determining the target domain that will handle each physical interrupt source. For more details on the binding of event sources to event channels, see Chapter 8.

2.4 Time

Guest operating systems need to be aware of the passage of both real (or wallclock) time and their own ‘virtual time’ (the time for which they have been executing). Furthermore, Xen has a notion of time which is used for scheduling. The following notions of time are provided:

Cycle counter time. This provides a fine-grained time reference. The cycle counter time is used to accurately extrapolate the other time references. On SMP machines it is currently assumed that the cycle counter time is synchronized between CPUs. The current x86-based implementation achieves this within inter-CPU communication latencies.

System time. This is a 64-bit counter which holds the number of nanoseconds that have elapsed since system boot.

Wall clock time. This is the time of day in a Unix-style struct timeval (seconds and microseconds since 1 January 1970, adjusted by leap seconds). An NTP client hosted by domain 0 can keep this value accurate.

Domain virtual time. This progresses at the same pace as system time, but only while a domain is executing — it stops while a domain is de-scheduled. Therefore the share of the CPU that a domain receives is indicated by the rate at which its virtual time increases.

Xen exports timestamps for system time and wall-clock time to guest operating systems through a shared page of memory. Xen also provides the cycle counter time at the instant the timestamps were calculated, and the CPU frequency in Hertz. This allows the guest to extrapolate system and wall-clock times accurately based on the current cycle counter time.

Since all time stamps need to be updated and read atomically, a version number is also stored in the shared info page, which is incremented before and after updating the timestamps. Thus a guest can be sure that it read a consistent state by checking that the two version numbers are equal and even.

Xen includes a periodic ticker which sends a timer event to the currently executing domain every 10ms. The Xen scheduler also sends a timer event whenever a domain is scheduled; this allows the guest OS to adjust for the time that has passed while it has been inactive. In addition, Xen allows each domain to request that they receive a timer event sent at a specified system time by using the set_timer_op hypercall. Guest OSes may use this timer to implement timeout values when they block.

2.5 Xen CPU Scheduling

Xen offers a uniform API for CPU schedulers. It is possible to choose from a number of schedulers at boot and it should be easy to add more. The SEDF and Credit schedulers are part of the normal Xen distribution. SEDF will be going away and its use should be avoided once the credit scheduler has stabilized and become the default. The Credit scheduler provides proportional fair shares of the host’s CPUs to the running domains. It does this while transparently load balancing runnable VCPUs across the whole system.

Note: SMP host support. Xen has always supported SMP host systems. When using the credit scheduler, a domain’s VCPUs will be dynamically moved across physical CPUs to maximise domain and system throughput. VCPUs can also be manually restricted to be mapped only on a subset of the host’s physical CPUs, using the pinning mechanism.

2.6 Privileged operations

Xen exports an extended interface to privileged domains (viz. Domain 0). This allows such domains to build and boot other domains on the server, and provides control interfaces for managing scheduling, memory, networking, and block devices.

Chapter 3

Memory

Xen is responsible for managing the allocation of physical memory to domains, and for ensuring safe use of the paging and segmentation hardware.

3.1 Memory Allocation

As well as allocating a portion of physical memory for its own private use, Xen also reserves a small fixed portion of every virtual address space. This is located in the top 64MB on 32-bit systems, the top 168MB on PAE systems, and a larger portion in the middle of the address space on 64-bit systems. Unreserved physical memory is available for allocation to domains at a page granularity. Xen tracks the ownership and use of each page, which allows it to enforce secure partitioning between domains.

Each domain has a maximum and current physical memory allocation. A guest OS may run a ‘balloon driver’ to dynamically adjust its current memory allocation up to its limit.

3.2 Pseudo-Physical Memory

Since physical memory is allocated and freed on a page granularity, there is no guarantee that a domain will receive a contiguous stretch of physical memory. However most operating systems do not have good support for operating in a fragmented physical address space. To aid porting such operating systems to run on top of Xen, we make a distinction between machine memory and pseudo-physical memory.

Put simply, machine memory refers to the entire amount of memory installed in the machine, including that reserved by Xen, in use by various domains, or currently unallocated. We consider machine memory to comprise a set of 4kB machine page frames numbered consecutively starting from 0. Machine frame numbers mean the same within Xen or any domain.

Pseudo-physical memory, on the other hand, is a per-domain abstraction. It allows a guest operating system to consider its memory allocation to consist of a contiguous range of physical page frames starting at physical frame 0, despite the fact that the underlying machine page frames may be sparsely allocated and in any order.

To achieve this, Xen maintains a globally readable machine-to-physical table which records the mapping from machine page frames to pseudo-physical ones. In addition, each domain is supplied with a physical-to-machine table which performs the inverse mapping. Clearly the machine-to-physical table has size proportional to the amount of RAM installed in the machine, while each physical-to-machine table has size proportional to the memory allocation of the given domain.

Architecture dependent code in guest operating systems can then use the two tables to provide the abstraction of pseudo-physical memory. In general, only certain specialized parts of the operating system (such as page table management) need to understand the difference between machine and pseudo-physical addresses.

3.3 Page Table Updates

In the default mode of operation, Xen enforces read-only access to page tables and requires guest operating systems to explicitly request any modifications. Xen validates all such requests and only applies updates that it deems safe. This is necessary to prevent domains from adding arbitrary mappings to their page tables.

To aid validation, Xen associates a type and reference count with each memory page. A page has one of the following mutually-exclusive types at any point in time: page directory (PD), page table (PT), local descriptor table (LDT), global descriptor table (GDT), or writable (RW). Note that a guest OS may always create readable mappings of its own memory regardless of its current type.

This mechanism is used to maintain the invariants required for safety; for example, a domain cannot have a writable mapping to any part of a page table as this would require the page concerned to simultaneously be of types PT and RW.

mmu_update(mmu_update_t *req, int count, int *success_count, domid_t domid)

This hypercall is used to make updates to either the domain’s pagetables or to the machine-to-physical mapping table. It supports submitting a queue of updates, allowing batching for maximal performance. Explicitly queuing updates using this interface will cause any outstanding writable pagetable state to be flushed from the system.
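For illustration, the following sketch batches two page-table-entry writes into a single mmu_update call. The request layout follows xen/include/public/xen.h; HYPERVISOR_mmu_update stands for whatever hypercall wrapper the guest kernel provides.

/* Sketch: queue two PTE updates and apply them with one trap into Xen. */
void update_two_ptes(uint64_t pte_maddr0, uint64_t new_pte0,
                     uint64_t pte_maddr1, uint64_t new_pte1)
{
    mmu_update_t req[2];
    int success_count;

    /* Low two bits of 'ptr' select the command; the remainder is the
     * machine address of the PTE to be updated. */
    req[0].ptr = pte_maddr0 | MMU_NORMAL_PT_UPDATE;
    req[0].val = new_pte0;            /* validated by Xen before being applied */
    req[1].ptr = pte_maddr1 | MMU_NORMAL_PT_UPDATE;
    req[1].val = new_pte1;

    /* One trap into Xen validates and applies both updates. */
    HYPERVISOR_mmu_update(req, 2, &success_count, DOMID_SELF);
}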

3.4 Writable Page Tables

Xen also provides an alternative mode of operation in which guests have the illusion that their page tables are directly writable. Of course this is not really the case, since Xen must still validate modifications to ensure secure partitioning. To this end, Xen traps any write attempt to a memory page of type PT (i.e., that is currently part of a page table). If such an access occurs, Xen temporarily allows write access to that page while at the same time disconnecting it from the page table that is currently in use. This allows the guest to safely make updates to the page because the newly-updated entries cannot be used by the MMU until Xen revalidates and reconnects the page. Reconnection occurs automatically in a number of situations: for example, when the guest modifies a different page-table page, when the domain is preempted, or whenever the guest uses Xen’s explicit page-table update interfaces.

Writable pagetable functionality is enabled when the guest requests it, using a vm_assist hypercall. Writable pagetables do not provide full virtualisation of the MMU, so the memory management code of the guest still needs to be aware that it is running on Xen. Since the guest’s page tables are used directly, it must translate pseudo-physical addresses to real machine addresses when building page table entries. The guest may not attempt to map its own pagetables writably, since this would violate the memory type invariants; page tables will automatically be made writable by the hypervisor, as necessary.

3.5 Shadow Page Tables

Finally, Xen also supports a form of shadow page tables in which the guest OS uses an independent copy of page tables which are unknown to the hardware (i.e. which are never pointed to by cr3). Instead Xen propagates changes made to the guest’s tables to the real ones, and vice versa. This is useful for logging page writes (e.g. for live migration or checkpoint). A full version of the shadow page tables also allows guest OS porting with less effort.

3.6 Segment Descriptor Tables

At start of day a guest is supplied with a default GDT, which does not reside within its own memory allocation. If the guest wishes to use other than the default ‘flat’ ring-1 and ring-3 segments that this GDT provides, it must register a custom GDT and/or LDT with Xen, allocated from its own memory.

The following hypercall is used to specify a new GDT:

int set_gdt(unsigned long *frame_list, int entries)

frame_list: An array of up to 14 machine page frames within which the GDT resides. Any frame registered as a GDT frame may only be mapped read-only within the guest’s address space (e.g., no writable mappings, no use as a page-table page, and so on). Only 14 pages may be specified because pages 15 and 16 are reserved for the hypervisor’s GDT entries.

entries: The number of descriptor-entry slots in the GDT.

The LDT is updated via the generic MMU update mechanism (i.e., via the mmu_update hypercall).

3.7 Start of Day

The start-of-day environment for guest operating systems is rather different to that provided by the underlying hardware. In particular, the processor is already executing in protected mode with paging enabled.

Domain 0 is created and booted by Xen itself. For all subsequent domains, the analogue of the boot-loader is the domain builder, user-space software running in domain 0. The domain builder is responsible for building the initial page tables for a domain and loading its kernel image at the appropriate virtual address.

3.8 VM assists

Xen provides a number of “assists” for guest memory management. These are available on an “opt-in” basis to provide commonly-used extra functionality to a guest.

vm_assist(unsigned int cmd, unsigned int type)

The cmd parameter describes the action to be taken, whilst the type parameter describes the kind of assist that is being referred to. Available commands are as follows:

VMASST_CMD_enable Enable a particular assist type

VMASST_CMD_disable Disable a particular assist type

And the available types are:

VMASST_TYPE_4gb_segments Provide emulated support for instructions that rely on 4GB segments (such as the techniques used by some TLS solutions).

VMASST_TYPE_4gb_segments_notify Provide a callback to the guest if the above segment fixups are used: allows the guest to display a warning message during boot.

VMASST_TYPE_writable_pagetables Enable writable pagetable mode - described above.
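For example, a guest wanting the writable pagetable illusion of section 3.4 would opt in during boot, roughly as follows (assuming the usual hypercall wrapper):

/* Sketch: opt in to writable pagetables at start of day. */
HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_writable_pagetables);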

Chapter 4

Xen Info Pages

The Shared info page is used to share various CPU-related state between the guest OS and the hypervisor. This information includes VCPU status, time information and event channel (virtual interrupt) state. The Start info page is used to pass build-time information to the guest when it boots and when it is resumed from a suspended state. This chapter documents the fields included in the shared_info_t and start_info_t structures for use by the guest OS.

4.1 Shared info page

The shared_info_t is accessed at run time by both Xen and the guest OS. It is used to pass information relating to the virtual CPU and virtual machine state between the OS and the hypervisor.

The structure is declared in xen/include/public/xen.h:

typedef struct shared_info {
    vcpu_info_t vcpu_info[MAX_VIRT_CPUS];

    /*
     * A domain can create "event channels" on which it can send and receive
     * asynchronous event notifications. There are three classes of event that
     * are delivered by this mechanism:
     *  1. Bi-directional inter- and intra-domain connections. Domains must
     *     arrange out-of-band to set up a connection (usually by allocating
     *     an unbound 'listener' port and advertising that via a storage service
     *     such as xenstore).
     *  2. Physical interrupts. A domain with suitable hardware-access
     *     privileges can bind an event-channel port to a physical interrupt
     *     source.
     *  3. Virtual interrupts ('events'). A domain can bind an event-channel
     *     port to a virtual interrupt source, such as the virtual-timer
     *     device or the emergency console.
     *
     * Event channels are addressed by a "port index". Each channel is
     * associated with two bits of information:
     *  1. PENDING -- notifies the domain that there is a pending notification
     *     to be processed. This bit is cleared by the guest.
     *  2. MASK -- if this bit is clear then a 0->1 transition of PENDING
     *     will cause an asynchronous upcall to be scheduled. This bit is only
     *     updated by the guest. It is read-only within Xen. If a channel
     *     becomes pending while the channel is masked then the 'edge' is lost
     *     (i.e., when the channel is unmasked, the guest must manually handle
     *     pending notifications as no upcall will be scheduled by Xen).
     *
     * To expedite scanning of pending notifications, any 0->1 pending
     * transition on an unmasked channel causes a corresponding bit in a
     * per-vcpu selector word to be set. Each bit in the selector covers a
     * 'C long' in the PENDING bitfield array.
     */
    unsigned long evtchn_pending[sizeof(unsigned long) * 8];
    unsigned long evtchn_mask[sizeof(unsigned long) * 8];

    /*
     * Wallclock time: updated only by control software. Guests should base
     * their gettimeofday() syscall on this wallclock-base value.
     */
    uint32_t wc_version; /* Version counter: see vcpu_time_info_t. */
    uint32_t wc_sec;     /* Secs  00:00:00 UTC, Jan 1, 1970.       */
    uint32_t wc_nsec;    /* Nsecs 00:00:00 UTC, Jan 1, 1970.       */

    arch_shared_info_t arch;

} shared_info_t;

vcpu_info An array of vcpu_info_t structures, each of which holds either runtime information about a virtual CPU, or is “empty” if the corresponding VCPU does not exist.

evtchn_pending Guest-global array, with one bit per event channel. Bits are set if an event is currently pending on that channel.

evtchn_mask Guest-global array for masking notifications on event channels.

wc_version Version counter for current wallclock time.

wc_sec Whole seconds component of current wallclock time.

wc_nsec Nanoseconds component of current wallclock time.

arch Host architecture-dependent portion of the shared info structure.

4.1.1 vcpu_info_t

typedef struct vcpu_info {
    /*
     * 'evtchn_upcall_pending' is written non-zero by Xen to indicate
     * a pending notification for a particular VCPU. It is then cleared
     * by the guest OS /before/ checking for pending work, thus avoiding
     * a set-and-check race. Note that the mask is only accessed by Xen
     * on the CPU that is currently hosting the VCPU. This means that the
     * pending and mask flags can be updated by the guest without special
     * synchronisation (i.e., no need for the x86 LOCK prefix).
     * This may seem suboptimal because if the pending flag is set by
     * a different CPU then an IPI may be scheduled even when the mask
     * is set. However, note:
     *  1. The task of 'interrupt holdoff' is covered by the per-event-
     *     channel mask bits. A 'noisy' event that is continually being
     *     triggered can be masked at source at this very precise
     *     granularity.
     *  2. The main purpose of the per-VCPU mask is therefore to restrict
     *     reentrant execution: whether for concurrency control, or to
     *     prevent unbounded stack usage. Whatever the purpose, we expect
     *     that the mask will be asserted only for short periods at a time,
     *     and so the likelihood of a 'spurious' IPI is suitably small.
     * The mask is read before making an event upcall to the guest: a
     * non-zero mask therefore guarantees that the VCPU will not receive
     * an upcall activation. The mask is cleared when the VCPU requests
     * to block: this avoids wakeup-waiting races.
     */
    uint8_t evtchn_upcall_pending;
    uint8_t evtchn_upcall_mask;
    unsigned long evtchn_pending_sel;
    arch_vcpu_info_t arch;
    vcpu_time_info_t time;
} vcpu_info_t; /* 64 bytes (x86) */

evtchn_upcall_pending This is set non-zero by Xen to indicate that there are pending events to be received.

evtchn_upcall_mask This is set non-zero to disable all interrupts for this CPU for short periods of time. If individual event channels need to be masked, the evtchn_mask in the shared_info_t is used instead.

evtchn_pending_sel When an event is delivered to this VCPU, a bit is set in this selector to indicate which word of the evtchn_pending array in the shared_info_t contains the event in question.

arch Architecture-specific VCPU info. On x86 this contains the virtualized CR2 register (page fault linear address) for this VCPU.

time Time values for this VCPU.
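To make the two-level scheme concrete, here is a sketch of an event upcall dispatcher: it clears the upcall-pending flag, atomically snapshots and clears the selector, then scans each indicated word of evtchn_pending, clearing bits as it handles them. do_event() is a hypothetical per-port handler; the bit-twiddling uses GCC builtins for brevity.

#define BITS_PER_LONG (sizeof(unsigned long) * 8)

/* Sketch of a guest's event dispatcher (not Xen's actual code). */
void evtchn_do_upcall(shared_info_t *s, int vcpu)
{
    vcpu_info_t *v = &s->vcpu_info[vcpu];
    unsigned long sel, pend;
    int word, bit;

    v->evtchn_upcall_pending = 0;   /* clear before scanning (see above) */
    sel = __atomic_exchange_n(&v->evtchn_pending_sel, 0, __ATOMIC_ACQ_REL);

    while (sel != 0) {
        word = __builtin_ctzl(sel);
        sel &= sel - 1;             /* clear lowest set selector bit */

        pend = s->evtchn_pending[word] & ~s->evtchn_mask[word];
        while (pend != 0) {
            bit = __builtin_ctzl(pend);
            pend &= pend - 1;
            /* The guest, not Xen, clears PENDING bits. */
            __atomic_fetch_and(&s->evtchn_pending[word],
                               ~(1UL << bit), __ATOMIC_ACQ_REL);
            do_event(word * (int)BITS_PER_LONG + bit);  /* hypothetical */
        }
    }
}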

4.1.2 vcpu_time_info

typedef struct vcpu_time_info {
    /*
     * Updates to the following values are preceded and followed by an
     * increment of 'version'. The guest can therefore detect updates by
     * looking for changes to 'version'. If the least-significant bit of
     * the version number is set then an update is in progress and the guest
     * must wait to read a consistent set of values.
     * The correct way to interact with the version number is similar to
     * Linux's seqlock: see the implementations of read_seqbegin/read_seqretry.
     */
    uint32_t version;
    uint32_t pad0;
    uint64_t tsc_timestamp;   /* TSC at last update of time vals. */
    uint64_t system_time;     /* Time, in nanosecs, since boot.   */
    /*
     * Current system time:
     *   system_time + ((tsc - tsc_timestamp) << tsc_shift) * tsc_to_system_mul
     * CPU frequency (Hz):
     *   ((10^9 << 32) / tsc_to_system_mul) >> tsc_shift
     */
    uint32_t tsc_to_system_mul;
    int8_t   tsc_shift;
    int8_t   pad1[3];
} vcpu_time_info_t; /* 32 bytes */

version Used to ensure the guest gets consistent time updates.

tsc_timestamp Cycle counter timestamp of the last time value; could be used to extrapolate in between updates, for instance.

system_time Time since boot (nanoseconds).

tsc_to_system_mul Cycle counter to nanoseconds multiplier (used in extrapolating current time).

tsc_shift Cycle counter to nanoseconds shift (used in extrapolating current time).
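Putting the fields together, a guest reads a consistent snapshot seqlock-style and then extrapolates current system time from the cycle counter, as sketched below. rdtsc() stands for a hypothetical helper that reads the TSC; the arithmetic follows the formula in the structure comment (tsc_to_system_mul is a 32.32 fixed-point multiplier).

/* Sketch: extrapolate current system time (ns), retrying if an update
 * was in progress (odd version) or happened mid-read (changed version). */
uint64_t get_system_time_ns(volatile vcpu_time_info_t *t)
{
    uint32_t ver, mul;
    uint64_t stamp, base, delta;
    int8_t shift;

    do {
        ver   = t->version;
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
        stamp = t->tsc_timestamp;
        base  = t->system_time;
        mul   = t->tsc_to_system_mul;
        shift = t->tsc_shift;
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
    } while ((ver & 1) || ver != t->version);

    delta = rdtsc() - stamp;            /* cycles since the last update */
    if (shift >= 0)
        delta <<= shift;
    else
        delta >>= -shift;
    return base + ((delta * mul) >> 32);  /* nanoseconds */
}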

4.1.3 arch_shared_info_t

On x86, the arch_shared_info_t is defined as follows (from xen/public/arch-x86_32.h):

typedef struct arch_shared_info {
    unsigned long max_pfn;   /* max pfn that appears in table */
    /* Frame containing list of mfns containing list of mfns containing p2m. */
    unsigned long pfn_to_mfn_frame_list_list;
} arch_shared_info_t;

max_pfn The maximum PFN listed in the physical-to-machine mapping table (P2M table).

pfn_to_mfn_frame_list_list Machine address of the frame that contains the machine addresses of the P2M table frames.

4.2 Start info page

The start_info structure is declared as follows (in xen/include/public/xen.h):

#define MAX_GUEST_CMDLINE 1024
typedef struct start_info {
    /* THE FOLLOWING ARE FILLED IN BOTH ON INITIAL BOOT AND ON RESUME.    */
    char magic[32];             /* "Xen-<version>.<subversion>".          */
    unsigned long nr_pages;     /* Total pages allocated to this domain.  */
    unsigned long shared_info;  /* MACHINE address of shared info struct. */
    uint32_t flags;             /* SIF_xxx flags.                         */
    unsigned long store_mfn;    /* MACHINE page number of shared page.    */
    uint32_t store_evtchn;      /* Event channel for store communication. */
    unsigned long console_mfn;  /* MACHINE address of console page.       */
    uint32_t console_evtchn;    /* Event channel for console messages.    */
    /* THE FOLLOWING ARE ONLY FILLED IN ON INITIAL BOOT (NOT RESUME).     */
    unsigned long pt_base;      /* VIRTUAL address of page directory.     */
    unsigned long nr_pt_frames; /* Number of bootstrap p.t. frames.       */
    unsigned long mfn_list;     /* VIRTUAL address of page-frame list.    */
    unsigned long mod_start;    /* VIRTUAL address of pre-loaded module.  */
    unsigned long mod_len;      /* Size (bytes) of pre-loaded module.     */
    int8_t cmd_line[MAX_GUEST_CMDLINE];
} start_info_t;

The fields are in two groups: the first group are always filled in when a domain is booted or resumed, the second set are only used at boot time.

The always-available group is as follows:

magic A text string identifying the Xen version to the guest.

nr_pages The number of real machine pages available to the guest.

shared_info Machine address of the shared info structure, allowing the guest to map it during initialisation.

flags Flags for describing optional extra settings to the guest.

store_mfn Machine address of the Xenstore communications page.

store_evtchn Event channel to communicate with the store.

console_mfn Machine address of the console data page.

console_evtchn Event channel to notify the console backend.

The boot-only group may only be safely referred to during system boot:

pt_base Virtual address of the page directory created for us by the domain builder.

nr_pt_frames Number of frames used by the domain builder's bootstrap pagetables.

mfn_list Virtual address of the list of machine frames this domain owns.

mod_start Virtual address of any pre-loaded modules (e.g. ramdisk).

mod_len Size of pre-loaded module (if any).

cmd_line Kernel command line passed by the domain builder.

Chapter 5

Event Channels

Event channels are the basic primitive provided by Xen for event notifications. An event is the Xen equivalent of a hardware interrupt. Events essentially store one bit of information; the event of interest is signalled by transitioning this bit from 0 to 1.

Notifications are received by a guest via an upcall from Xen, indicating when an event arrives (setting the bit). Further notifications are masked until the bit is cleared again (therefore, guests must check the value of the bit after re-enabling event delivery to ensure no missed notifications).

Event notifications can be masked by setting a flag; this is equivalent to disabling interrupts and can be used to ensure atomicity of certain operations in the guest kernel.

5.1 Hypercall interface

event_channel_op(evtchn_op_t *op)

The event channel operation hypercall is used for all operations on event channels / ports. Operations are distinguished by the value of the cmd field of the op structure. The possible commands are described below:

EVTCHNOP_alloc_unbound Allocate a new event channel port, ready to be connected to by a remote domain.

• Specified domain must exist.

• A free port must exist in that domain.

Unprivileged domains may only allocate their own ports; privileged domains may also allocate ports in other domains.

EVTCHNOP_bind_interdomain Bind an event channel for interdomain communications.

• Caller domain must have a free port to bind.

• Remote domain must exist.

• Remote port must be allocated and currently unbound.

• Remote port must be expecting the caller domain as the “remote”.

EVTCHNOP_bind_virq Allocate a port and bind a VIRQ to it.

• Caller domain must have a free port to bind.

• VIRQ must be valid.

• VCPU must exist.

• VIRQ must not currently be bound to an event channel.

EVTCHNOP_bind_ipi Allocate and bind a port for notifying other virtual CPUs.

• Caller domain must have a free port to bind.

• VCPU must exist.

EVTCHNOP_bind_pirq Allocate and bind a port to a real IRQ.

• Caller domain must have a free port to bind.

• PIRQ must be within the valid range.

• Another binding for this PIRQ must not exist for this domain.

• Caller must have an available port.

EVTCHNOP_close Close an event channel (no more events will be received).

• Port must be valid (currently allocated).

EVTCHNOP_send Send a notification on an event channel attached to a port.

• Port must be valid.

• Only valid for Interdomain, IPI or Allocated Unbound ports.

EVTCHNOP_status Query the status of a port; what kind of port, whether it is bound, what remote domain is expected, what PIRQ or VIRQ it is bound to, what VCPU will be notified, etc. Unprivileged domains may only query the state of their own ports. Privileged domains may query any port.

EVTCHNOP_bind_vcpu Bind an event channel to a particular VCPU: receive notification upcalls only on that VCPU.

• VCPU must exist.

• Port must be valid.

• Event channel must be either: allocated but unbound, bound to an interdomain event channel, or bound to a PIRQ.
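As a sketch of the common interdomain setup, domain A allocates an unbound port, advertises it out-of-band (e.g. via xenstore), and domain B binds to it. Structure layouts follow xen/include/public/event_channel.h; HYPERVISOR_event_channel_op stands for the guest's hypercall wrapper, and error handling is elided.

/* In the granting domain (A): allocate an unbound port for 'remote'. */
evtchn_port_t alloc_port_for(domid_t remote)
{
    evtchn_op_t op;
    op.cmd = EVTCHNOP_alloc_unbound;
    op.u.alloc_unbound.dom        = DOMID_SELF;
    op.u.alloc_unbound.remote_dom = remote;
    HYPERVISOR_event_channel_op(&op);
    return op.u.alloc_unbound.port;  /* advertise this, e.g. via xenstore */
}

/* In the connecting domain (B): bind to A's advertised port. */
evtchn_port_t bind_to(domid_t remote, evtchn_port_t remote_port)
{
    evtchn_op_t op;
    op.cmd = EVTCHNOP_bind_interdomain;
    op.u.bind_interdomain.remote_dom  = remote;
    op.u.bind_interdomain.remote_port = remote_port;
    HYPERVISOR_event_channel_op(&op);
    return op.u.bind_interdomain.local_port;  /* B's end of the channel */
}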

Chapter 6

Grant tables

Xen’s grant tables provide a generic mechanism for memory sharing between domains. This shared memory interface underpins the split device drivers for block and network IO.

Each domain has its own grant table. This is a data structure that is shared with Xen; it allows the domain to tell Xen what kind of permissions other domains have on its pages. Entries in the grant table are identified by grant references. A grant reference is an integer, which indexes into the grant table. It acts as a capability which the grantee can use to perform operations on the granter’s memory.

This capability-based system allows shared-memory communications between unprivileged domains. A grant reference also encapsulates the details of a shared page, removing the need for a domain to know the real machine address of a page it is sharing. This makes it possible to share memory correctly with domains running in fully virtualised memory.

6.1 Interface

6.1.1 Grant table manipulation

Creating and destroying grant references is done by direct access to the grant table. This removes the need to involve Xen when creating grant references, modifying access permissions, etc. The grantee domain will invoke hypercalls to use the grant references. Four main operations can be accomplished by directly manipulating the table:

Grant foreign access allocate a new entry in the grant table and fill out the access permissions accordingly. The access permissions will be looked up by Xen when the grantee attempts to use the reference to map the granted frame.

End foreign access check that the grant reference is not currently in use, then remove the mapping permissions for the frame. This prevents further mappings from taking place but does not allow forced revocations of existing mappings.

Grant foreign transfer allocate a new entry in the table specifying transfer permissions for the grantee. Xen will look up this entry when the grantee attempts to transfer a frame to the granter.

End foreign transfer remove permissions to prevent a transfer occurring in future. If the transfer is already committed, modifying the grant table cannot prevent it from completing.
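As a sketch of the first operation, granting foreign access amounts to filling in a free grant-table entry and only then setting its flags, with a release barrier so Xen never observes a half-built entry. The v1 grant_entry_t layout (flags, domid, frame) and the GTF_* flags follow xen/include/public/grant_table.h.

/* Sketch: let 'grantee' map our machine frame 'mfn'. gnttab is the
 * domain's mapped grant table; ref indexes a currently unused entry. */
void grant_foreign_access(grant_entry_t *gnttab, grant_ref_t ref,
                          domid_t grantee, unsigned long mfn, int readonly)
{
    gnttab[ref].domid = grantee;
    gnttab[ref].frame = mfn;
    __atomic_thread_fence(__ATOMIC_RELEASE);  /* entry body before flags */
    gnttab[ref].flags = GTF_permit_access | (readonly ? GTF_readonly : 0);
}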

6.1.2 Hypercalls

Use of grant references is accomplished via a hypercall. The grant_table_op hypercall takes three arguments:

grant_table_op(unsigned int cmd, void *uop, unsigned int count)

cmd indicates the grant table operation of interest. uop is a pointer to a structure (or an array of structures) describing the operation to be performed. The count field describes how many grant table operations are being batched together.

The core logic is situated in xen/common/grant_table.c. The grant table operation hypercall can be used to perform the following actions:

GNTTABOP_map_grant_ref Given a grant reference from another domain, map the referred page into the caller’s address space.

GNTTABOP_unmap_grant_ref Remove a mapping to a granted frame from the caller’s address space. This is used to voluntarily relinquish a mapping to a granted page.

GNTTABOP_setup_table Setup grant table for caller domain.

GNTTABOP_dump_table Debugging operation.

GNTTABOP_transfer Given a transfer reference from another domain, transfer ownership of a page frame to that domain.
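For example, a grantee might map a granted frame roughly as follows; the structure and flag names follow gnttab_map_grant_ref in grant_table.h, and HYPERVISOR_grant_table_op stands for the guest's hypercall wrapper.

/* Sketch: map a frame granted to us by 'granter' at virtual address
 * 'vaddr'; error handling elided. */
grant_handle_t map_granted_page(domid_t granter, grant_ref_t ref,
                                unsigned long vaddr)
{
    struct gnttab_map_grant_ref map;

    map.host_addr = vaddr;            /* where to map it in our space    */
    map.flags     = GNTMAP_host_map;  /* map for CPU (not device) access */
    map.ref       = ref;              /* grant reference from the granter */
    map.dom       = granter;

    HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &map, 1);
    /* map.status == GNTST_okay on success; keep map.handle to unmap later. */
    return map.handle;
}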

Chapter 7

Xenstore

Xenstore is the mechanism by which control-plane activities occur. These activities include:

• Setting up shared memory regions and event channels for use with the split device drivers.

• Notifying the guest of control events (e.g. balloon driver requests).

• Reporting back status information from the guest (e.g. performance-related statistics, etc.).

The store is arranged as a hierarchical collection of key-value pairs. Each domain has a directory hierarchy containing data related to its configuration. Domains are permitted to register for notifications about changes in subtrees of the store, and to apply changes to the store transactionally.

7.1 Guidelines

A few principles govern the operation of the store:

• Domains should only modify the contents of their own directories.

• The setup protocol for a device channel should simply consist of entering the configuration data into the store.

• The store should allow device discovery without requiring the relevant device drivers to be loaded: a Xen “bus” should be visible to probing code in the guest.

• The store should be usable for inter-tool communications, allowing the tools themselves to be decomposed into a number of smaller utilities, rather than a single monolithic entity. This also facilitates the development of alternate user interfaces to the same functionality.

7.2 Store layout

There are three main paths in XenStore:

/vm stores configuration information about a domain

/local/domain stores information about the domain on the local node (domid, etc.)

/tool stores information for the various tools

The /vm path stores configuration information for a domain. This information doesn’t change and is indexed by the domain’s UUID. A /vm entry contains the following information:

uuid uuid of the domain (somewhat redundant)

on_reboot the action to take on a domain reboot request (destroy or restart)

on_poweroff the action to take on a domain halt request (destroy or restart)

on_crash the action to take on a domain crash (destroy or restart)

vcpus the number of allocated vcpus for the domain

memory the amount of memory (in megabytes) for the domain. Note: appears to sometimes be empty for domain-0

vcpu_avail the number of active vcpus for the domain (vcpus minus the number of disabled vcpus)

name the name of the domain

/vm/<uuid>/image/

The image path is only available for Domain-Us and contains:

ostype identifies the builder type (linux or vmx)

kernel path to kernel on domain-0

cmdline command line to pass to domain-U kernel

ramdisk path to ramdisk on domain-0

/local

The /local path currently only contains one directory, /local/domain, which is indexed by domain id. It contains the running domain information. The reason to have two storage areas is that during migration, the uuid doesn’t change but the domain id does. The /local/domain directory can be created and populated before finalizing the migration, enabling localhost-to-localhost migration.

/local/domain/<domid>

This path contains:

cpu_time xend start time (this is only around for domain-0)

handle private handle for xend

name see /vm

on_reboot see /vm

on_poweroff see /vm

on_crash see /vm

vm the path to the VM directory for the domain

domid the domain id (somewhat redundant)

running indicates that the domain is currently running

memory the current memory in megabytes for the domain (empty for domain-0?)

maxmem_KiB the maximum memory for the domain (in kilobytes)

memory_KiB the memory allocated to the domain (in kilobytes)

cpu the current CPU the domain is pinned to (empty for domain-0?)

cpu_weight the weight assigned to the domain

vcpu_avail a bitmap telling the domain whether it may use a given VCPU

online_vcpus how many vcpus are currently online

vcpus the total number of vcpus allocated to the domain

console/ a directory for console information

  ring-ref the grant table reference of the console ring queue

  port the event channel being used for the console ring queue (local port)

  tty the tty device through which the console data is exposed

  limit the limit (in bytes) of console data to buffer

backend/ a directory containing all backends the domain hosts

  vbd/ a directory containing vbd backends

    <domid>/ a directory containing vbd’s for domid

      <virtual-device>/ a directory for a particular virtual-device on domid

        frontend-id domain id of frontend

        frontend the path to the frontend domain

        physical-device backend device number

        sector-size backend sector size

        info 0 read/write, 1 read-only (is this right?)

        domain name of frontend domain

        params parameters for device

        type the type of the device

        dev the virtual device (as given by the user)

        node output from block creation script

  vif/ a directory containing vif backends

    <domid>/ a directory containing vif’s for domid

      <vif number>/ a directory for each vif

        frontend-id the domain id of the frontend

        frontend the path to the frontend

        mac the mac address of the vif

        bridge the bridge the vif is connected to

        handle the handle of the vif

        script the script used to create/stop the vif

        domain the name of the frontend

  vtpm/ a directory containing vtpm backends

    <domid>/ a directory containing vtpm’s for domid

      <vtpm number>/ a directory for each vtpm

        frontend-id the domain id of the frontend

        frontend the path to the frontend

        instance the instance of the virtual TPM that is used

        pref_instance the instance number as given in the VM configuration file; may be different from instance

        domain the name of the domain of the frontend

device/ a directory containing the frontend devices for the domain

  vbd/ a directory containing vbd frontend devices for the domain

    <virtual-device>/ a directory containing the vbd frontend for virtual-device

      virtual-device the device number of the frontend device

      backend-id the domain id of the backend

      backend the path of the backend in the store (/local/domain path)

      ring-ref the grant table reference for the block request ring queue

      event-channel the event channel used for the block request ring queue

  vif/ a directory containing vif frontend devices for the domain

    <id>/ a directory for the vif id frontend device for the domain

      backend-id the backend domain id

      mac the mac address of the vif

      handle the internal vif handle

      backend a path to the backend’s store entry

      tx-ring-ref the grant table reference for the transmission ring queue

      rx-ring-ref the grant table reference for the receiving ring queue

      event-channel the event channel used for the two ring queues

  vtpm/ a directory containing the vtpm frontend device for the domain

    <id> a directory for the vtpm id frontend device for the domain

      backend-id the backend domain id

      backend a path to the backend’s store entry

      ring-ref the grant table reference for the tx/rx ring

      event-channel the event channel used for the ring

device-misc/ miscellaneous information for devices

  vif/ miscellaneous information for vif devices

    nextDeviceID the next device id to use

security/ access control information for the domain

  ssidref security reference identifier used inside the hypervisor

  access_control/ security label used by management tools

    label security label name

    policy security policy name

store/ per-domain information for the store

  port the event channel used for the store ring queue

  ring-ref the grant table reference used for the store’s communication channel

image private xend information

Chapter 8

Devices

Virtual devices under Xen are provided by a split device driver architecture. The illusion of the virtual device is provided by two co-operating drivers: the frontend, which runs in the unprivileged domain, and the backend, which runs in a domain with access to the real device hardware (often called a driver domain; in practice domain 0 usually fulfills this function).

The frontend driver appears to the unprivileged guest as if it were a real device, for instance a block or network device. It receives IO requests from its kernel as usual; however, since it does not have access to the physical hardware of the system, it must then issue requests to the backend. The backend driver is responsible for receiving these IO requests, verifying that they are safe, and then issuing them to the real device hardware. The backend driver appears to its kernel as a normal user of in-kernel IO functionality. When the IO completes the backend notifies the frontend that the data is ready for use; the frontend is then able to report IO completion to its own kernel.

Frontend drivers are designed to be simple; most of the complexity is in the backend, which has responsibility for translating device addresses, verifying that requests are well-formed and do not violate isolation guarantees, etc.

Split drivers exchange requests and responses in shared memory, with an event channel for asynchronous notifications of activity. When the frontend driver comes up, it uses Xenstore to set up a shared memory frame and an interdomain event channel for communications with the backend. Once this connection is established, the two can communicate directly by placing requests / responses into shared memory and then sending notifications on the event channel. This separation of notification from data transfer allows message batching, and results in very efficient device access.

This chapter focuses on some individual split device interfaces available to Xen guests.

8.1 Network I/O

Virtual network device services are provided by shared memory communication with a backend domain. From the point of view of other domains, the backend may be viewed as a virtual ethernet switch element with each domain having one or more virtual network interfaces connected to it.

From the point of view of the backend domain itself, the network backend driver consists of a number of ethernet devices. Each of these has a logical direct connection to a virtual network device in another domain. This allows the backend domain to route, bridge, firewall, etc. the traffic to / from the other domains using normal operating system mechanisms.

8.1.1 Backend Packet Handling

The backend driver is responsible for a variety of actions relating to the transmission and reception of packets from the physical device. With regard to transmission, the backend performs these key actions:

• Validation: To ensure that domains do not attempt to generate invalid (e.g. spoofed) traffic, the backend driver may validate headers, ensuring that source MAC and IP addresses match the interface that they have been sent from.

Validation functions can be configured using standard firewall rules (iptables in the case of Linux).

• Scheduling: Since a number of domains can share a single physical network interface, the backend must mediate access when several domains each have packets queued for transmission. This general scheduling function subsumes basic shaping or rate-limiting schemes.

• Logging and Accounting: The backend domain can be configured with classifier rules that control how packets are accounted or logged. For example, log messages might be generated whenever a domain attempts to send a TCP packet containing a SYN.

On receipt of incoming packets, the backend acts as a simple demultiplexer: Packets are passed to the appropriate virtual interface after any necessary logging and accounting have been carried out.

8.1.2 Data Transfer

Each virtual interface uses two “descriptor rings”, one for transmit, the other for re-ceive. Each descriptor identifies a block of contiguous machine memory allocatedto the domain.

The transmit ring carries packets to transmit from the guest to the backend domain.The return path of the transmit ring carries messages indicating that the contentshave been physically transmitted and the backend no longer requires the associatedpages of memory.

To receive packets, the guest places descriptors of unused pages on the receivering. The backend will return received packets by exchanging these pages in thedomain’s memory with new pages containing the received data, and passing backdescriptors regarding the new packets on the ring. This zero-copy approach allowsthe backend to maintain a pool of free pages to receive packets into, and thendeliver them to appropriate domains after examining their headers.

If a domain does not keep its receive ring stocked with empty buffers then packetsdestined to it may be dropped. This provides some defence against receive livelockproblems because an overloaded domain will cease to receive further data. Simi-larly, on the transmit path, it provides the application with feedback on the rate atwhich packets are able to leave the system.

Flow control on rings is achieved by including a pair of producer indexes on the shared ring page. Each side will maintain a private consumer index indicating the next outstanding message. In this manner, the domains cooperate to divide the ring into two message lists, one in each direction. Notification is decoupled from the immediate placement of new messages on the ring; the event channel will be used to generate notification when either a certain number of outstanding messages are queued, or a specified number of nanoseconds have elapsed since the oldest message was placed on the ring.
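
To make the index scheme concrete, the following sketch shows the consumer side of such a ring. It is illustrative only: the structure and field names are simplified stand-ins, not the definitions from the Xen headers.

struct shared_ring {
    unsigned int rsp_prod;  /* advanced by the response producer; shared */
    /* ... the request-direction index and the ring entries follow ... */
};

/* Private consumer index: kept in the consumer's own memory, never on
 * the shared page, so the peer cannot corrupt it. */
static unsigned int rsp_cons;

static void consume_responses(struct shared_ring *ring)
{
    unsigned int prod = ring->rsp_prod;   /* snapshot the shared index */

    /* Entries in [rsp_cons, prod) are responses not yet processed. */
    while (rsp_cons != prod) {
        /* process_response(ring, rsp_cons);  (hypothetical handler) */
        rsp_cons++;
    }
}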

8.1.3 Network ring interface

The network device uses two shared memory rings for communication: one for transmit, one for receive.

Transmit requests are described by the following structure:

typedef struct netif_tx_request {
    grant_ref_t gref;    /* Reference to buffer page    */
    uint16_t offset;     /* Offset within buffer page   */
    uint16_t flags;      /* NETTXF_*                    */
    uint16_t id;         /* Echoed in response message. */
    uint16_t size;       /* Packet size in bytes.       */
} netif_tx_request_t;

gref Grant reference for the network buffer

offset Offset to data

flags Transmit flags (currently only NETTXF_csum_blank is supported, to indicate that the protocol checksum field is incomplete).

id Echoed to guest by the backend in the ring-level response so that the guest can match it to this request

size Buffer size
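
As a hedged illustration of how a frontend might fill in one of these descriptors, consider the fragment below. The variables gref, off, pkt_len and next_tx_id are hypothetical stand-ins for state the driver would already hold (the grant reference, packet offset and length, and a request-ID counter).

netif_tx_request_t tx;

tx.gref   = gref;          /* grant reference covering the buffer page   */
tx.offset = off;           /* where the packet starts within that page   */
tx.flags  = 0;             /* or NETTXF_csum_blank for partial checksums */
tx.id     = next_tx_id++;  /* echoed back in the matching response       */
tx.size   = pkt_len;       /* total packet size in bytes                 */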

Each transmit request is followed by a transmit response at some later date. This is part of the shared-memory communication protocol and allows the guest to (potentially) retire internal structures related to the request. It does not imply a network-level response. This structure is as follows:

typedef struct netif_tx_response {
    uint16_t id;
    int16_t  status;
} netif_tx_response_t;

id Echo of the ID field in the corresponding transmit request.

status Success / failure status of the transmit request.

Receive requests must be queued by the frontend, accompanied by a donation of page frames to the backend. The backend transfers page frames full of data back to the guest:

typedef struct {
    uint16_t    id;    /* Echoed in response message.         */
    grant_ref_t gref;  /* Reference to incoming granted frame */
} netif_rx_request_t;

id Echoed by the frontend to identify this request when responding.

gref Transfer reference - the backend will use this reference to transfer a frame of network data to us.

Receive response descriptors are queued for each received frame. Note that these may only be queued in reply to an existing receive request, providing an in-built form of traffic throttling.

typedef struct {
    uint16_t id;
    uint16_t offset;  /* Offset in page of start of received packet */
    uint16_t flags;   /* NETRXF_* */
    int16_t  status;  /* -ve: BLKIF_RSP_* ; +ve: Rx'ed pkt size. */
} netif_rx_response_t;


id ID echoed from the original request, used by the guest to match this response to the original request.

offset Offset to data within the transferred frame.

flags Receive flags (currently only NETRXF_csum_valid is supported, to indicate that the protocol checksum field has already been validated).

status Success / error status for this operation.

Note that the receive protocol includes a mechanism for guests to receive incoming memory frames but there is no explicit transfer of frames in the other direction. Guests are expected to return memory to the hypervisor in order to use the network interface. They must do this or they will exceed their maximum memory reservation and will not be able to receive incoming frame transfers. When necessary, the backend is able to replenish its pool of free network buffers by claiming some of this free memory from the hypervisor.

8.2 Block I/O

All guest OS disk access goes through the virtual block device (VBD) interface. This interface allows domains access to portions of block storage devices visible to the block backend device. The VBD interface is a split driver, similar to the network interface described above. A single shared memory ring is used between the frontend and backend drivers for each virtual device, across which IO requests and responses are sent.

Any block device accessible to the backend domain, including network-based block (iSCSI, *NBD, etc), loopback and LVM/MD devices, can be exported as a VBD. Each VBD is mapped to a device node in the guest, specified in the guest’s startup configuration.

8.2.1 Data Transfer

The per-(virtual)-device ring between the guest and the block backend supports two messages:

READ: Read data from the specified block device. The front end identifies the device and location to read from and attaches pages for the data to be copied to (typically via DMA from the device). The backend acknowledges completed read requests as they finish.


WRITE: Write data to the specified block device. This functions essentially as READ, except that the data moves to the device instead of from it.

8.2.2 Block ring interface

The block interface is defined by the structures passed over the shared memory interface. These structures are either requests (from the frontend to the backend) or responses (from the backend to the frontend).

The request structure is defined as follows:

typedef struct blkif_request {
    uint8_t        operation;     /* BLKIF_OP_???                        */
    uint8_t        nr_segments;   /* number of segments                  */
    blkif_vdev_t   handle;        /* only for read/write requests        */
    uint64_t       id;            /* private guest value, echoed in resp */
    blkif_sector_t sector_number; /* start sector idx on disk (r/w only) */
    struct blkif_request_segment {
        grant_ref_t gref;         /* reference to I/O buffer frame       */
        /* @first_sect: first sector in frame to transfer (inclusive).   */
        /* @last_sect: last sector in frame to transfer (inclusive).     */
        uint8_t first_sect, last_sect;
    } seg[BLKIF_MAX_SEGMENTS_PER_REQUEST];
} blkif_request_t;

The fields are as follows:

operation operation ID: one of the operations described above

nr_segments number of segments for scatter / gather IO described by this request

handle identifier for a particular virtual device on this interface

id this value is echoed in the response message for this IO; the guest may use it to identify the original request

sector_number start sector on the virtual device for this request

seg This array contains structures encoding the scatter-gather IO to be performed:

gref The grant reference for the foreign I/O buffer page.

first sect First sector to access within the buffer page (0 to 7).

last sect Last sector to access within the buffer page (0 to 7).

Data will be transferred into frames at an offset determined by the value of first_sect.
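
To illustrate, the following hedged sketch builds a single-segment read of the first eight sectors (one 4kB page) of a virtual disk. BLKIF_OP_READ is the conventional operation code for a read; vdev_handle, my_cookie and buf_gref are hypothetical values standing for the device handle, the guest's private request tag and a previously established grant reference.

blkif_request_t req;

req.operation     = BLKIF_OP_READ;
req.nr_segments   = 1;                    /* one scatter-gather segment */
req.handle        = vdev_handle;          /* which virtual device       */
req.id            = (uint64_t)my_cookie;  /* echoed in the response     */
req.sector_number = 0;                    /* start at sector 0          */

req.seg[0].gref       = buf_gref;  /* grant reference for the buffer page */
req.seg[0].first_sect = 0;         /* transfer the whole page:            */
req.seg[0].last_sect  = 7;         /* sectors 0..7 inclusive              */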


8.3 Virtual TPM

Virtual TPM (VTPM) support provides TPM functionality to each virtual machine that requests this functionality in its configuration file. The interface enables domains to access their own private TPM as if it were a hardware TPM built into the machine.

The virtual TPM interface is implemented as a split driver, similar to the network and block interfaces described above. The user domain hosting the frontend exports a character device /dev/tpm0 to user-level applications for communicating with the virtual TPM. This is the same device interface that is also offered if a hardware TPM is available in the system. The backend provides a single interface /dev/vtpm where the virtual TPM is waiting for commands from all domains that have located their backend in a given domain.

8.3.1 Data Transfer

A single shared memory ring is used between the frontend and backend drivers. TPM requests and responses are sent in pages, where a pointer to those pages and other information is placed into the ring such that the backend can map the pages into its memory space using the grant table mechanism.

The backend driver has been implemented to only accept well-formed TPM requests. To meet this requirement, the length indicator in the TPM request must correctly indicate the length of the request. Otherwise an error message is automatically sent back by the device driver.

The virtual TPM implementation listens for TPM requests on /dev/vtpm. Since it must be able to apply the TPM request packet to the virtual TPM instance associated with the virtual machine, a 4-byte virtual TPM instance identifier is prepended to each packet by the backend driver (in network byte order) for internal routing of the request.

8.3.2 Virtual TPM ring interface

The TPM protocol is a strict request/response protocol and therefore only one ring is used to send requests from the frontend to the backend and responses on the reverse path.

The request/response structure is defined as follows:

typedef struct {
    unsigned long addr;   /* Machine address of packet.    */
    grant_ref_t   ref;    /* grant table access reference. */
    uint16_t      unused; /* unused                        */
    uint16_t      size;   /* Packet size in bytes.         */
} tpmif_tx_request_t;

The fields are as follows:

addr The machine address of the page associated with the TPM request/response; a request/response may span multiple pages

ref The grant table reference associated with the address.

size The size of the remaining packet; up to PAGE_SIZE bytes can be found in the page referenced by ’addr’

The frontend initially allocates several pages whose addresses are stored in the ring. Only these pages are used for the exchange of requests and responses.
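
As a rough illustration, a frontend might fill a ring slot for a command that fits in a single page as follows; page_maddr, page_gref and cmd_len are hypothetical values describing one page from the pre-allocated pool and the length of the TPM command it holds.

tpmif_tx_request_t tx;

tx.addr   = page_maddr;  /* machine address of the request page    */
tx.ref    = page_gref;   /* grant reference the backend will map   */
tx.unused = 0;
tx.size   = cmd_len;     /* bytes in this page (at most PAGE_SIZE) */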


Chapter 9

Further Information

If you have questions that are not answered by this manual, the sources of information listed below may be of interest to you. Note that bug reports, suggestions and contributions related to the software (or the documentation) should be sent to the Xen developers’ mailing list (address below).

9.1 Other documentation

If you are mainly interested in using (rather than developing for) Xen, the Xen Users’ Manual is distributed in the docs/ directory of the Xen source distribution.

9.2 Online references

The official Xen web site can be found at:

http://www.xensource.com

This contains links to the latest versions of all online documentation, including the latest version of the FAQ.

Information regarding Xen is also available at the Xen Wiki at

http://wiki.xensource.com/xenwiki/

The Xen project uses Bugzilla as its bug tracking system. You’ll find the Xen Bugzilla at http://bugzilla.xensource.com/bugzilla/.


9.3 Mailing lists

There are several mailing lists that are used to discuss Xen related topics. The most widely relevant are listed below. An official page of mailing lists and subscription information can be found at

http://lists.xensource.com/

xen-devel@lists.xensource.com Used for development discussions and bug reports. Subscribe at: http://lists.xensource.com/xen-devel

xen-users@lists.xensource.com Used for installation and usage discussions and requests for help. Subscribe at: http://lists.xensource.com/xen-users

xen-announce@lists.xensource.com Used for announcements only. Subscribe at: http://lists.xensource.com/xen-announce

xen-changelog@lists.xensource.com Changelog feed from the unstable and 2.0 trees - developer oriented. Subscribe at: http://lists.xensource.com/xen-changelog


Appendix A

Xen Hypercalls

Hypercalls represent the procedural interface to Xen; this appendix categorizes and describes the current set of hypercalls.

A.1 Invoking Hypercalls

Hypercalls are invoked in a manner analogous to system calls in a conventional operating system; a software interrupt is issued which vectors to an entry point within Xen. On x86/32 machines the instruction required is int $0x82; the (real) IDT is set up so that this may only be issued from within ring 1. The particular hypercall to be invoked is contained in EAX; a list mapping these values to symbolic hypercall names can be found in xen/include/public/xen.h.

On some occasions a set of hypercalls will be required to carry out a higher-level function; a good example is when a guest operating system wishes to context switch to a new process, which requires updating various privileged CPU state. As an optimization for these cases, there is a generic mechanism to issue a set of hypercalls as a batch:

multicall(void *call_list, int nr_calls)
Execute a series of hypervisor calls; nr_calls is the length of the array of multicall_entry_t structures pointed to by call_list. Each entry contains the hypercall operation code followed by up to 7 word-sized arguments.

Note that multicalls are provided purely as an optimization; there is no requirement to use them when first porting a guest operating system.
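
A minimal sketch of batching is shown below, assuming an entry layout with an op field and an args array as described above (the exact multicall_entry_t definition lives in xen/include/public/xen.h). Here a stack switch and a lazy-FPU toggle, two calls that often occur together on a context switch, are issued in one batch; new_ss and new_esp are hypothetical values held by the guest.

multicall_entry_t calls[2];

calls[0].op      = __HYPERVISOR_stack_switch;
calls[0].args[0] = new_ss;    /* new kernel stack segment */
calls[0].args[1] = new_esp;   /* new kernel stack pointer */

calls[1].op      = __HYPERVISOR_fpu_taskswitch;
calls[1].args[0] = 1;         /* set TS: lazy FPU save/restore */

multicall(calls, 2);          /* one trap into Xen for both calls */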


A.2 Virtual CPU Setup

At start of day, a guest operating system needs to set up the virtual CPU it is executing on. This includes installing vectors for the virtual IDT so that the guest OS can handle interrupts, page faults, etc. However the very first thing a guest OS must set up is a pair of hypervisor callbacks: these are the entry points which Xen will use when it wishes to notify the guest OS of an occurrence.

set_callbacks(unsigned long event_selector, unsigned long event_address, unsigned long failsafe_selector, unsigned long failsafe_address)
Register the normal (“event”) and failsafe callbacks for event processing. In each case the code segment selector and address within that segment are provided. The selectors must have RPL 1; in XenLinux we simply use the kernel’s CS for both event_selector and failsafe_selector.

The value event_address specifies the address of the guest OS’s event handling and dispatch routine; the failsafe_address specifies a separate entry point which is used only if a fault occurs when Xen attempts to use the normal callback.

On x86/64 systems the hypercall takes slightly different arguments. This is because a callback CS does not need to be specified (since the callbacks are entered via SYSRET), and also because an entry address needs to be specified for SYSCALLs from guest user space:

set_callbacks(unsigned long event_address, unsigned long failsafe_address, unsigned long syscall_address)

After installing the hypervisor callbacks, the guest OS can install a ‘virtual IDT’ by using the following hypercall:

set_trap_table(trap_info_t *table)
Install one or more entries into the per-domain trap handler table (essentially a software version of the IDT). Each entry in the array pointed to by table includes the exception vector number with the corresponding segment selector and entry point. Most guest OSes can use the same handlers on Xen as when running on the real hardware.

A further hypercall is provided for the management of virtual CPUs:

vcpu_op(int cmd, int vcpuid, void *extra_args)


This hypercall can be used to bootstrap VCPUs, to bring them up and down and to test their current status.

A.3 Scheduling and Timer

Domains are preemptively scheduled by Xen according to the parameters installed by domain 0 (see Section A.10). In addition, however, a domain may choose to explicitly control certain behavior with the following hypercall:

sched_op_new(int cmd, void *extra_args)
Request scheduling operation from hypervisor. The following sub-commands are available:

SCHEDOP_yield voluntarily yields the CPU, but leaves the caller marked as runnable. No extra arguments are passed to this command.

SCHEDOP_block removes the calling domain from the run queue and causes it to sleep until an event is delivered to it. No extra arguments are passed to this command.

SCHEDOP_shutdown is used to end the calling domain’s execution. The extra argument is a sched_shutdown structure which indicates the reason why the domain suspended (e.g., for reboot, halt, power-off).

SCHEDOP_poll allows a VCPU to wait on a set of event channels with an optional timeout (all of which are specified in the sched_poll extra argument). The semantics are similar to the UNIX poll system call. The caller must have event-channel upcalls masked when executing this command.

sched_op_new was not available prior to Xen 3.0.2. Older versions provide only the following hypercall:

sched_op(int cmd, unsigned long extra_arg)
This hypercall supports the following subset of sched_op_new commands:

SCHEDOP_yield (extra argument is 0).

SCHEDOP_block (extra argument is 0).

SCHEDOP_shutdown (extra argument is numeric reason code).


To aid the implementation of a process scheduler within a guest OS, Xen provides a virtual programmable timer:

set_timer_op(uint64_t timeout)
Request a timer event to be sent at the specified system time (time in nanoseconds since system boot).

Note that calling set_timer_op prior to sched_op allows block-with-timeout semantics.
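
For example, a guest idle loop might implement a bounded sleep as below; NOW() is a hypothetical helper returning the current system time in nanoseconds.

/* Sleep until an event arrives, but for no more than 100 ms. */
set_timer_op(NOW() + 100000000ULL);  /* one-shot timer, 100 ms from now */
sched_op(SCHEDOP_block, 0);          /* block; timer or event wakes us  */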

A.4 Page Table Management

Since guest operating systems have read-only access to their page tables, Xen must be involved when making any changes. The following multi-purpose hypercall can be used to modify page-table entries, update the machine-to-physical mapping table, flush the TLB, install a new page-table base pointer, and more.

mmu_update(mmu_update_t *req, int count, int *success_count)
Update the page table for the domain; a set of count updates are submitted for processing in a batch, with success_count being updated to report the number of successful updates.

Each element of req[] contains a pointer (address) and value; the least significant 2 bits of the pointer are used to distinguish the type of update requested as follows:

MMU_NORMAL_PT_UPDATE: update a page directory entry or page table entry to the associated value; Xen will check that the update is safe, as described in Chapter 3.

MMU_MACHPHYS_UPDATE: update an entry in the machine-to-physical table. The calling domain must own the machine page in question (or be privileged).

Explicitly updating batches of page table entries is extremely efficient, but can require a number of alterations to the guest OS. Using the writable page table mode (Chapter 3) is recommended for new OS ports.
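
A hedged sketch of a two-entry batch is shown below; pte0_maddr/pte1_maddr and pte0_val/pte1_val are hypothetical machine addresses of the PTEs being changed and the values to write. The low two bits of each pointer encode the update type, as described above.

mmu_update_t req[2];
int done;

req[0].ptr = pte0_maddr | MMU_NORMAL_PT_UPDATE;  /* which PTE     */
req[0].val = pte0_val;                           /* new PTE value */
req[1].ptr = pte1_maddr | MMU_NORMAL_PT_UPDATE;
req[1].val = pte1_val;

mmu_update(req, 2, &done);  /* done reports how many updates succeeded */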

Regardless of which page table update mode is being used, however, there are some occasions (notably handling a demand page fault) where a guest OS will wish to modify exactly one PTE rather than a batch, and where that PTE is mapped into the current address space. This is catered for by the following:


update_va_mapping(unsigned long va, uint64_t val, unsigned long flags)
Update the currently installed PTE that maps virtual address va to new value val. As with mmu_update, Xen checks the modification is safe before applying it. The flags determine which kind of TLB flush, if any, should follow the update.

Finally, sufficiently privileged domains may occasionally wish to manipulate the pages of others:

update_va_mapping_otherdomain(unsigned long va, uint64_t val, unsigned long flags, domid_t domid)
Identical to update_va_mapping save that the pages being mapped must belong to the domain domid.

An additional MMU hypercall provides an “extended command” interface. This provides additional functionality beyond the basic table updating commands:

mmuext_op(struct mmuext_op *op, int count, int *success_count, domid_t domid)
This hypercall is used to perform additional MMU operations. These include updating cr3 (or just re-installing it for a TLB flush), requesting various kinds of TLB flush, flushing the cache, installing a new LDT, or pinning & unpinning page-table pages (to ensure their reference count doesn’t drop to zero, which would require a revalidation of all entries). Some of the operations available are restricted to domains with sufficient system privileges.

It is also possible for privileged domains to reassign page ownership via an extended MMU operation, although grant tables are used instead of this where possible; see Section A.8.

Finally, a hypercall interface is exposed to activate and deactivate various optional facilities provided by Xen for memory management.

vm_assist(unsigned int cmd, unsigned int type)
Toggle various memory management modes (in particular writable page tables).


A.5 Segmentation Support

Xen allows guest OSes to install a custom GDT if they require it; this is context switched transparently whenever a domain is [de]scheduled. The following hypercall is effectively a ‘safe’ version of lgdt:

set_gdt(unsigned long *frame_list, int entries)
Install a global descriptor table for a domain; frame_list is an array of up to 16 machine page frames within which the GDT resides, with entries being the actual number of descriptor-entry slots. All page frames must be mapped read-only within the guest’s address space, and the table must be large enough to contain Xen’s reserved entries (see xen/include/public/arch-x86_32.h).

Many guest OSes will also wish to install LDTs; this is achieved by using mmu_update with an extended command, passing the linear address of the LDT base along with the number of entries. No special safety checks are required; Xen needs to perform this task simply since lldt requires CPL 0.

Xen also allows guest operating systems to update just an individual segment descriptor in the GDT or LDT:

update_descriptor(uint64_t ma, uint64_t desc)
Update the GDT/LDT entry at machine address ma; the new 8-byte descriptor is stored in desc. Xen performs a number of checks to ensure the descriptor is valid.

Guest OSes can use the above in place of context switching entire LDTs (or the GDT) when the number of changing descriptors is small.

A.6 Context Switching

When a guest OS wishes to context switch between two processes, it can use the page table and segmentation hypercalls described above to perform the bulk of the privileged work. In addition, however, it will need to invoke Xen to switch the kernel (ring 1) stack pointer:

stack_switch(unsigned long ss, unsigned long esp)
Request kernel stack switch from hypervisor; ss is the new stack segment and esp is the new stack pointer.


A useful hypercall for context switching allows “lazy” save and restore of floating point state:

fpu_taskswitch(int set)
This call instructs Xen to set the TS bit in the cr0 control register; this means that the next attempt to use floating point will cause a trap which the guest OS can catch. Typically it will then save/restore the FP state, and clear the TS bit, using the same call.

This is provided as an optimization only; guest OSes can also choose to save and restore FP state on all context switches for simplicity.
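
A sketch of the lazy scheme under these calls follows: on a context switch the guest sets TS instead of saving FPU state; the first later FPU use traps to the guest's device-not-available handler, which clears TS and restores state. restore_fpu_state and current are hypothetical helpers.

void fpu_lazy_switch(void)
{
    fpu_taskswitch(1);   /* set TS: defer the FPU save/restore */
}

void do_device_not_available(void)   /* guest #NM trap handler */
{
    fpu_taskswitch(0);               /* clear TS before touching the FPU */
    restore_fpu_state(current);      /* reload the new task's FPU state  */
}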

Finally, a hypercall is provided for entering vm86 mode:

switch_vm86
This allows the guest to run code in vm86 mode, which is needed for some legacy software.

A.7 Physical Memory Management

As mentioned previously, each domain has a maximum and current memory allocation. The maximum allocation, set at domain creation time, cannot be modified. However a domain can choose to reduce and subsequently grow its current allocation by using the following call:

memory_op(unsigned int op, void *arg)
Increase or decrease current memory allocation (as determined by the value of op). The available operations are:

XENMEM_increase_reservation Request an increase in machine memory allocation; arg must point to a xen_memory_reservation structure.

XENMEM_decrease_reservation Request a decrease in machine memory allocation; arg must point to a xen_memory_reservation structure.

XENMEM_maximum_ram_page Request the frame number of the highest-addressed frame of machine memory in the system. arg must point to an unsigned long where this value will be stored.

XENMEM_current_reservation Returns current memory reservation of the specified domain.


XENMEM_maximum_reservation Returns maximum memory reservation of the specified domain.

In addition to simply reducing or increasing the current memory allocation via a ‘balloon driver’, this call is also useful for obtaining contiguous regions of machine memory when required (e.g. for certain PCI devices, or if using superpages).
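
A hedged sketch of a balloon-style release of 16 pages follows. The xen_memory_reservation layout is taken from the Xen public headers (xen/include/public/memory.h); the field names shown follow the 3.0-era interface and should be checked against the headers, and frame_list is a hypothetical array of machine frame numbers the guest is giving back.

struct xen_memory_reservation reservation;
unsigned long frame_list[16];   /* machine frames being returned */

/* ... fill frame_list with the frames to release ... */

set_xen_guest_handle(reservation.extent_start, frame_list);
reservation.nr_extents   = 16;
reservation.extent_order = 0;           /* each extent is a single page */
reservation.address_bits = 0;           /* no addressing restriction    */
reservation.domid        = DOMID_SELF;

memory_op(XENMEM_decrease_reservation, &reservation);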

A.8 Inter-Domain Communication

Xen provides a simple asynchronous notification mechanism via event channels. Each domain has a set of end-points (or ports) which may be bound to an event source (e.g. a physical IRQ, a virtual IRQ, or a port in another domain). When a pair of end-points in two different domains are bound together, then a ‘send’ operation on one will cause an event to be received by the destination domain.

The control and use of event channels involves the following hypercall:

event_channel_op(evtchn_op_t *op)
Inter-domain event-channel management; op is a discriminated union which allows the following 7 operations:

alloc_unbound: allocate a free (unbound) local port and prepare for connection from a specified domain.

bind_virq: bind a local port to a virtual IRQ; any particular VIRQ can be bound to at most one port per domain.

bind_pirq: bind a local port to a physical IRQ; once more, a given pIRQ can be bound to at most one port per domain. Furthermore the calling domain must be sufficiently privileged.

bind_interdomain: construct an interdomain event channel; in general, the target domain must have previously allocated an unbound port for this channel, although this can be bypassed by privileged domains during domain setup.

close: close an interdomain event channel.

send: send an event to the remote end of an interdomain event channel.

status: determine the current status of a local port.

For more details see xen/include/public/event_channel.h.
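
A hedged sketch of establishing an interdomain channel follows; the union member names are from xen/include/public/event_channel.h as of this era but should be checked against the header, and domA, domB and a_port are placeholders. Domain A first allocates an unbound port, then domain B binds to it (having learned A's domain ID and port out of band, e.g. via xenstore).

evtchn_op_t op;

/* In domain A: allocate a port that domain B may connect to. */
op.cmd = EVTCHNOP_alloc_unbound;
op.u.alloc_unbound.dom        = DOMID_SELF;
op.u.alloc_unbound.remote_dom = domB;        /* B's domain ID */
event_channel_op(&op);
/* op.u.alloc_unbound.port now holds A's local port. */

/* In domain B: bind the other end to complete the channel. */
op.cmd = EVTCHNOP_bind_interdomain;
op.u.bind_interdomain.remote_dom  = domA;    /* A's domain ID        */
op.u.bind_interdomain.remote_port = a_port;  /* port allocated above */
event_channel_op(&op);
/* A 'send' on either end now raises an event in the peer domain. */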

Event channels are the fundamental communication primitive between Xen domains and seamlessly support SMP. However they provide little bandwidth for communication per se, and hence are typically married with a piece of shared memory to produce effective and high-performance inter-domain communication.

Safe sharing of memory pages between guest OSes is carried out by granting access on a per page basis to individual domains. This is achieved by using the grant_table_op hypercall.

grant_table_op(unsigned int cmd, void *uop, unsigned int count)
Used to invoke operations on a grant reference, to set up the grant table and to dump the tables’ contents for debugging.

A.9 IO Configuration

Domains with physical device access (i.e. driver domains) receive limited access to certain PCI devices (bus address space and interrupts). However many guest operating systems attempt to determine the PCI configuration by directly accessing the PCI BIOS, which cannot be allowed for safety.

Instead, Xen provides the following hypercall:

physdev_op(void *physdev_op)
Set and query IRQ configuration details, set the system IOPL, set the TSS IO bitmap.

For examples of using physdev_op, see the Xen-specific PCI code in the Linux sparse tree.

A.10 Administrative Operations

A large number of control operations are available to a sufficiently privileged domain (typically domain 0). These allow the creation and management of new domains, for example. A complete list is given below; for more details on any or all of these, please see xen/include/public/dom0_ops.h.

dom0_op(dom0_op_t *op)
Administrative domain operations for domain management. The options are:

DOM0_GETMEMLIST: get list of pages used by the domain

DOM0_SCHEDCTL:


DOM0_ADJUSTDOM: adjust scheduling priorities for domain

DOM0_CREATEDOMAIN: create a new domain

DOM0_DESTROYDOMAIN: deallocate all resources associated with a domain

DOM0_PAUSEDOMAIN: remove a domain from the scheduler run queue.

DOM0_UNPAUSEDOMAIN: mark a paused domain as schedulable once again.

DOM0_GETDOMAININFO: get statistics about the domain

DOM0_SETDOMAININFO: set VCPU-related attributes

DOM0_MSR: read or write model specific registers

DOM0_DEBUG: interactively invoke the debugger

DOM0_SETTIME: set system time

DOM0_GETPAGEFRAMEINFO:

DOM0_READCONSOLE: read console content from hypervisor buffer ring

DOM0_PINCPUDOMAIN: pin domain to a particular CPU

DOM0_TBUFCONTROL: get and set trace buffer attributes

DOM0_PHYSINFO: get information about the host machine

DOM0_SCHED_ID: get the ID of the current Xen scheduler

DOM0_SHADOW_CONTROL: switch between shadow page-table modes

DOM0_SETDOMAINMAXMEM: set maximum memory allocation of a domain

DOM0_GETPAGEFRAMEINFO2: batched interface for getting page frame info

DOM0_ADD_MEMTYPE: set MTRRs

DOM0_DEL_MEMTYPE: remove a memory type range

DOM0_READ_MEMTYPE: read MTRR

DOM0_PERFCCONTROL: control Xen’s software performance counters

DOM0_MICROCODE: update CPU microcode


DOM0_IOPORT_PERMISSION: modify domain permissions for an IO port range (enable / disable a range for a particular domain)

DOM0_GETVCPUCONTEXT: get context from a VCPU

DOM0_GETVCPUINFO: get current state for a VCPU

DOM0_GETDOMAININFOLIST: batched interface to get domain info

DOM0_PLATFORM_QUIRK: inform Xen of a platform quirk it needs to handle (e.g. noirqbalance)

DOM0_PHYSICAL_MEMORY_MAP: get info about dom0’s memory map

DOM0_MAX_VCPUS: change max number of VCPUs for a domain

DOM0_SETDOMAINHANDLE: set the handle for a domain

Most of the above are best understood by looking at the code implementing them (in xen/common/dom0_ops.c) and in the user-space tools that use them (mostly in tools/libxc).

A.11 Access Control Module Hypercalls

Hypercalls relating to the management of the Access Control Module are also restricted to domain 0 access for now. For more details on any or all of these, please see xen/include/public/acm_ops.h. A complete list is given below:

acm_op(int cmd, void *args)
This hypercall can be used to configure the state of the ACM, query that state, request access control decisions and dump additional information.

ACMOP_SETPOLICY: set the access control policy

ACMOP_GETPOLICY: get the current access control policy and status

ACMOP_DUMPSTATS: get current access control hook invocation statistics

ACMOP_GETSSID: get security access control information for a domain


ACMOP_GETDECISION: get access decision based on the currently enforced access control policy

Most of the above are best understood by looking at the code implementing them (in xen/common/acm_ops.c) and in the user-space tools that use them (mostly in tools/security and tools/python/xen/lowlevel/acm).

A.12 Debugging Hypercalls

A few additional hypercalls are mainly useful for debugging:

console_io(int cmd, int count, char *str)
Use Xen to interact with the console; operations are:

CONSOLEIO_write: Output count characters from buffer str.

CONSOLEIO_read: Input at most count characters into buffer str.

A pair of hypercalls allows access to the underlying debug registers:

set_debugreg(int reg, unsigned long value)
Set debug register reg to value.

get_debugreg(int reg)
Return the contents of the debug register reg.

And finally:

xen_version(int cmd)
Request Xen version number.

This is useful to ensure that user-space tools are in sync with the underlying hypervisor.
