Ray Kinsella
Senior Software Engineer
Embedded and Communications Group
Intel Corporation
Xen in Embedded Systems
Legal Notices

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

Intel Virtualization Technology requires a computer system with a processor, chipset, BIOS, virtual machine monitor (VMM) and applications enabled for virtualization technology. Functionality, performance or other virtualization technology benefits will vary depending on hardware and software configurations. Virtualization technology-enabled BIOS and VMM applications are currently in development.

Performance results are based on certain tests measured on specific computer systems. Any difference in system hardware, software or configuration will affect actual performance. For more information go to http://www.intel.com/performance.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Inside, Core Inside, i960, Intel, the Intel logo, Intel AppUp, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, the Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel Sponsors of Tomorrow., the Intel Sponsors of Tomorrow. logo, Intel StrataFlash, Intel vPro, Intel XScale, InTru, the InTru logo, the InTru Inside logo, InTru soundmark, Itanium, Itanium Inside, MCS, MMX, Moblin, Pentium, Pentium Inside, skoool, the skoool logo, Sound Mark, The Creators Project, The Journey Inside, vPro Inside, VTune, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

Copyright © 2010, Intel Corporation. All rights reserved.
Summary & Agenda
This presentation examines the integration of Xen* Virtualisation into embedded systems. It covers effective partitioning of system resources for deterministic embedded applications.
Agenda
• Overview of Xen Virtualisation
– Types of hypervisor
– Types of guest
– Embedded use cases
• Xen in Embedded Systems
– Partitioning CPU Time
– Partitioning System Memory
– Partitioning System I/O
– Power usage
What is Xen?
• Xen is a bare metal hypervisor
– Originally designed for data-centre and server consolidation.
– A privileged guest, Domain 0 (Dom 0), always exists.
– Dom 0 owns all devices (PCI, Storage, USB, etc.) and arbitrates their usage between guests.
– Xen includes mechanisms for device multiplexing such as “split drivers”.
• Xen’s design goal is the “separation of policy and mechanism”.
– Policy is implemented in Dom 0 Daemons (Xen Daemon).
– Mechanism is implemented in the Xen hypervisor layer.
[Diagram: Xen virtualisation positioned as a bare-metal hypervisor, alongside desktop virtualisation and kernel virtualisation]
Overview of Xen Virtualisation
Types of guests
Para-Virtualised (PV) Guests
• Are “aware” they are virtualised.
• A PV guest delegates many aspects of operating-system function to the hypervisor via kernel hooks known as hypercalls.
Hardware-Virtual-Machine (HVM) Guests
• Are “not aware” that they are virtualised.
• Qemu* emulates common hardware for HVM guests, and the operating system loads standard device drivers.
• The graphics interface is exported through VNC.
Overview of Xen Virtualisation
Xen can support many diverse guests concurrently
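The difference between the two guest types shows up directly in how a domain is configured. The sketch below shows two Xen 4.0-era xm domain configuration files (xm config files use Python syntax); all names, paths, and sizes here are hypothetical:

```
# --- pv-guest.cfg: PV guest boots a Xen-aware kernel supplied from Dom0 ---
name    = "pv-guest"
kernel  = "/boot/vmlinuz-2.6.32-xen"
ramdisk = "/boot/initrd-2.6.32-xen.img"
memory  = 512
vcpus   = 2
disk    = ['phy:/dev/vg0/pv-guest,xvda,w']

# --- hvm-guest.cfg: Qemu emulates hardware; an unmodified OS boots on it ---
name    = "hvm-guest"
builder = 'hvm'
memory  = 512
vcpus   = 2
disk    = ['file:/var/lib/xen/images/hvm-guest.img,hda,w']
vnc     = 1     # graphics interface exported through VNC
```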
Embedded use cases
Application integration: integrating new and legacy applications onto the same hardware.
Un-trusted application: integrating an un-trusted (3rd-party) application onto the same hardware.
Resource isolation: restricting the resources assigned to each application.
High availability: ensuring that an application is always available, with standby instances ready to run in case of failure.
[Diagrams: two example deployments on Intel® Architecture: Xen with a PV Dom0 control plane and two PV DomU high-availability application guests, and Xen with a PV Dom0 control plane and two PV DomU high-performance application guests]
Virtual CPU Architecture
Virtual CPU (VCPU)
• VCPUs are an abstraction layer created by Xen's scheduler.
• They isolate guests from the actual number of physical CPUs.
• Xen has a scheduler similar to an OS scheduler that arbitrates between the guests contending for CPU time based on priority.
Partitioning CPU Time
[Diagram: the credit scheduler in Xen maps each guest's VCPU-0 … VCPU-N, for PV and HVM guests alike, onto physical cores CORE-0 … CORE-3 of the Intel® Architecture platform]
[Chart: SSL Encrypt throughput in Mb/sec (y-axis 1000–2600): native CentOS* 5.4 with 16 threads vs. Xen 4.0.0 with 1 guest (16 threads) and with 2 guests (8 threads each)]
Virtual CPU Performance
SSL Encrypt is a simple CPU-intensive application.
• It uses the pthreads library to parallelise an encrypted workload over multiple cores.
• It uses the OpenSSL* libraries to encrypt pools of 64-byte buffers using AES 128-bit CBC encryption.
Note the scale
Equivalent performance native to virtualised
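The structure of such a benchmark can be sketched in a few lines. This is a hypothetical stand-in, not the tool used for the measurements above: the original parallelised OpenSSL AES-128-CBC over 64-byte buffers with pthreads, whereas this sketch uses the standard library's SHA-256 as the CPU-bound work and Python threads (CPython's GIL limits the real parallel speedup, unlike pthreads in C):

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

BUF_SIZE = 64            # 64-byte buffers, as in the original workload
BUFS_PER_WORKER = 20_000

def work_pool(n_bufs):
    """Process a pool of buffers; returns the number of bytes handled."""
    buf = b"\x00" * BUF_SIZE
    for _ in range(n_bufs):
        hashlib.sha256(buf).digest()   # stand-in for AES-128-CBC encryption
    return n_bufs * BUF_SIZE

def throughput_mb_per_s(n_workers):
    """Run the pools in parallel and return aggregate throughput in MB/s."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        total_bytes = sum(pool.map(work_pool, [BUFS_PER_WORKER] * n_workers))
    return total_bytes / (time.perf_counter() - start) / 1e6

print(f"{throughput_mb_per_s(4):.1f} MB/s")
```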
[Chart: SSL Encrypt throughput in Mb/sec (y-axis 0–2500) for Guest A and Guest B: default† Xen 4.0.0 vs. Xen 4.0.0 with an 80/20 weighting vs. the 80/20 ideal]
The Credit Scheduler
The Xen Credit scheduler
• Default scheduler in Xen 4.0.0
– The Credit scheduler is a proportional fair-share scheduler.
– The Credit scheduler is a work-conserving scheduler.
• Scheduler parameters
– Cap: limits a guest to a fixed share of CPU time, expressed as a percentage of one physical CPU (0 – N×100, where N is the number of cores; 0 means no cap).
– Weight: assigns a relative weight to each guest (1 – 65535); the default weight is 256.
Scheduler partitioning of core time is effective
† 50/50 running two guests on the same cores with equal weighting.
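The effect of weights is easiest to see as a ratio. The sketch below is a simplified model of proportional fair sharing, not the scheduler itself: it computes the CPU share each guest should receive when all guests fully contend for the same cores. On a live system the weights would be applied from Dom0 with xm sched-credit; the domain names here are hypothetical.

```python
def expected_shares(weights):
    """CPU share per domain when all domains fully contend for the CPUs.

    Simplified proportional fair-share model: each domain receives
    weight / sum(weights) of the contended CPU time.
    """
    total = sum(weights.values())
    return {dom: w / total for dom, w in weights.items()}

# Reproducing the 80/20 split from the chart: a 4:1 weight ratio.
shares = expected_shares({"guest-a": 1024, "guest-b": 256})
print(shares)   # guest-a gets 0.8, guest-b gets 0.2
```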
[Chart: standard deviation of SSL Encrypt throughput in MB (y-axis 0–7) for 2 PV guests under four configurations: no pinning, pinning, weighting, and capping]
The chart above shows the standard deviation of throughput under different scheduler settings: no pinning, pinning, weighting, and capping.
Modest cost in determinism with scheduler partitioning
Virtual CPU Configuration
Caveat Emptor
Partitioning CPU Time:
• If possible, partition CPU resources by assigning cores to guests rather than using scheduler weighting or caps.
CPU Pinning:
• Make the number of Virtual CPUs equal to the number of cores.
• Use vcpu-pin to assign a guest VCPU to a specific core.
Remember:
• Dom 0 must service the hardware assigned to it.
• In all guests:
– Switch off unnecessary OS kernel features; e.g., use a tickless kernel.
– Turn off unnecessary OS services; e.g., the CentOS* Bluetooth* manager.
– Remove unnecessary drivers; e.g., the USB driver.
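Pinning can also be expressed directly in the domain configuration, so it survives a restart; alternatively, xm vcpu-pin pins a running guest's VCPU from Dom0. A minimal sketch (core ranges hypothetical):

```
# Give the guest as many VCPUs as the cores it is allowed to run on,
# and restrict those VCPUs to a fixed set of physical cores.
vcpus = 4
cpus  = "4-7"    # this guest runs only on cores 4-7
```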
Real-Time on Xen
The Chinese Dragon Festival. Photo © Walter Baxter, licensed for reuse under a Creative Commons License.
Recommended reading:
Supporting Soft Real-Time Tasks in the Xen Hypervisor; Min Lee, A. S. Krishnakumar, P. Krishnan, Navjot Singh, and Shalini Yajnik; Georgia Institute of Technology and Avaya Labs.
Extending Virtualization to Communications and Embedded Applications; Edwin Verplanke and Don Banks; Intel Corporation and Cisco Systems, Inc.; Intel Developer Forum (IDF) 2010.
Real-Time
[Chart: increase in cost of a page allocation, in µsec (y-axis 0–250)]
Balloon Memory + NUMA
• The balloon memory driver can arbitrate (share) memory between guests. This is not the default behaviour on Xen.
– Page allocation incurs a ~16% performance penalty during memory ballooning.
• Mitigate by statically assigning memory to guests. This is the default behaviour on Xen.
– maxmem: sets the maximum amount of memory a domain can be allocated.
– memory: sets the initial amount of memory a domain is allocated.
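In a domain configuration file the static partitioning amounts to two lines (values hypothetical):

```
# Setting maxmem equal to memory leaves no room for ballooning, so the
# guest's memory is statically partitioned at creation time.
memory = 1024    # initial allocation, in MB
maxmem = 1024    # ceiling; equal to 'memory', so the domain cannot grow
```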
NUMA
• Xen is NUMA-aware
– NUMA: Non-Uniform Memory Access
– Switched on by default in Xen 4.0.0
• Guests are not NUMA-aware (yet!)
– A patchset has been submitted to Xen to enable NUMA awareness for PV and HVM guests.
[Diagram: a two-socket system: Socket-0 (CORE-0 … CORE-3) and Socket-1 (CORE-4 … CORE-7), each with three local DDR3 memory channels]
Partitioning System Memory
[Chart: Lmbench* mem_bw bandwidth in MB/sec (y-axis 0–10000): Guest A with 16 threads, native with 16 threads, and Guests A+B with 8+8 threads]
NUMA Optimal Configuration
Optimal configuration: the guest does not cross sockets; bind each guest to cores on one socket only.
Sub-optimal configuration: the guest crosses sockets; it is bound to cores on more than one socket.
[Diagrams: a sub-optimal configuration in which Guest A crosses sockets, spanning NUMA node 1 and NUMA node 2, and the optimal configuration in which each guest is bound to the cores of a single NUMA node (CORE-0 … CORE-3 and CORE-4 … CORE-7)]
[Diagram: the software-switching path: packets arrive at the e1000 driver in PV Dom0, cross the network bridge to a Netback driver, and travel over the Xen Bus to the Netfront driver in the PV guest]
Xen Networking – Software Switch
• Device sharing via the split driver model.
– The backend driver located in the Dom0 is responsible for multiplexing guest access to physical hardware.
– The frontend driver delivers data to the guest, implementing OS device driver interfaces.
• Xen networking
1. Packets are received by the Dom0 Ethernet* driver.
2. Packets are then passed into the OS bridge driver; the destination Ethernet device is looked up and the packets are forwarded to its device driver.
3. Packets are received by the Netback driver in Dom0 and pushed onto a Xenbus ring.
4. Packets are popped off the Xenbus ring by the Netfront driver and passed to the guest operating system.
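The Xenbus ring in steps 3 and 4 is a producer/consumer ring. The toy model below is a conceptual sketch only; real Xen I/O rings live in shared memory pages, with request/response producer indices and event-channel notification rather than Python objects:

```python
# Toy model of the Netback/Netfront ring: a fixed-size circular buffer with
# free-running producer and consumer indices (conceptual sketch only).
class Ring:
    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.prod = 0   # producer index: Netback pushes here
        self.cons = 0   # consumer index: Netfront pops here

    def push(self, pkt):
        if self.prod - self.cons == self.size:   # ring full: back-pressure
            return False
        self.slots[self.prod % self.size] = pkt
        self.prod += 1
        return True

    def pop(self):
        if self.cons == self.prod:               # ring empty
            return None
        pkt = self.slots[self.cons % self.size]
        self.cons += 1
        return pkt

ring = Ring(4)
for i in range(5):
    ring.push(f"pkt-{i}")            # the fifth push fails: only 4 slots
received = [ring.pop() for _ in range(4)]
```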
Xen software switching mechanism
Hotspot
Xen shared device model
[Diagram: the Xen shared device model: for an HVM guest, Qemu in PV Dom0 backs the emulated device, with the backend and frontend connected over the Xen Bus]
Partitioning System I/O
Xen Networking – Hardware Switch
• Single Root I/O Virtualisation (SR-IOV)
Built on the following technologies, available in Intel® VT-c enabled Network Interface Cards (NICs):
– I/O acceleration: Intel® VT-d (IOMMU)
– Filtering technology: Virtual Machine Device Queues (VMDq), i.e. hardware MAC filtering
– Queuing technology: multiple RX and TX hardware queues
Each guest has an exclusive NIC that has been virtualised in hardware (SR-IOV).
• Xen Passthrough
– The PCI configuration space is still owned by Dom0; guest PCI configuration reads and writes are trapped and fixed up by Xen PCI passthrough.
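In a domain configuration, handing a device (for SR-IOV, one of the NIC's virtual functions) to a guest is a one-line entry; the device must first be bound to the pciback driver in Dom0. The BDF address below is hypothetical:

```
# Pass the PCI device at bus:device.function 0000:03:10.0 through to this
# guest (for SR-IOV this would be one of the NIC's virtual functions).
pci = ['0000:03:10.0']
```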
[Diagrams: the Xen device passthrough model, in which pciback in PV Dom0 exposes the PCI config space over the Xen Bus while the igbvf driver in the PV DomU moves packets directly through VT-d (IOMMU); and a comparison of software switching, VMDq, and SR-IOV, in which hardware MAC filters steer packets into per-guest RX/TX queues. RX: receive queue; TX: transmission queue; HW: hardware queue; SW: software queue]
Hardware vs. Software Packet Switching
• Hardware
– 2 x Intel® Xeon® Processor E5645 (12M cache, 2.40 GHz, 5.86 GT/s Intel® QuickPath Interconnect), 80W Thermal Design Power (TDP)
– Intel® 82599 10Gb Ethernet Controller with SR-IOV capabilities
• Software
– Measured with Xen 4.0.0 with CentOS 5.4 guests
– 1 socket / 6 cores used to route traffic
35x performance increase on small packet sizes
[Chart: % of 10 Gb line rate vs. packet size (64–1518 bytes) for the Xen bridge, Xen + SR-IOV, and native CentOS 5.4 + SR-IOV]
Power Usage
[Chart: power draw in Watts (y-axis 0–250), native CentOS 5.4 vs. Xen 4.0.0 + CentOS 5.4: idle vs. idle (Dom0 only) and 2 idle guests (+ Dom0); 1 socket active vs. 1 active guest (+ Dom0); 2 sockets active vs. 2 active guests (+ Dom0)]
• Hardware
– 2 x Intel® Xeon® Processor L5530 (8M cache, 2.40 GHz, 5.86 GT/s Intel® QPI), 60W TDP
– Intel® Server Board S5520HC
– Enermax* EVR1050EWT* power supply unit, 88.61% efficiency
• Software
– Measured with Xen 4.0.0 with CentOS 5.4 guests, and native CentOS 5.4
Each guest is bound to the cores of a separate socket.
Equivalent performance native to virtualised
Conclusions
Equivalent performance native to virtualised.
Potential benefits of virtualisation for embedded system designers:
– Greater system security.
– Greater system determinism.
– Improved resource isolation.
– Controlled 3rd party platform access.