The context
• Virtualisation is one of the cornerstones of the Cloud.
• Virtualisation of:
  • CPU
  • Main memory
  • Storage
  • Network
• This lecture will:
  • mainly deal with CPU virtualisation
  • discuss a few aspects of network virtualisation
Virtual memory
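[Figure: address translation with paging. The pages of a process (Page 0 to Page 3) are mapped into central memory; the processor issues a relative address (p, d), which is translated into an absolute address. Each page-table entry carries a protection bit and a page-fault bit.]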
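The translation of a relative address (p, d) into an absolute address can be sketched as follows. This is a minimal illustration, not the mechanism of any particular processor: the page size, the table contents and the names `PageTableEntry`, `translate`, `PageFault` and `ProtectionFault` are assumptions introduced for the example.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size in bytes; the slide does not give one


@dataclass
class PageTableEntry:
    frame: int      # index of the frame in central memory that holds the page
    present: bool   # page-fault bit: False means the page is not in central memory
    writable: bool  # protection bit: False means the page is read-only


class PageFault(Exception):
    pass


class ProtectionFault(Exception):
    pass


def translate(page_table, p, d, write=False):
    """Map the relative address (p, d) to an absolute address."""
    if not 0 <= d < PAGE_SIZE:
        raise ValueError("offset outside the page")
    entry = page_table[p]
    if not entry.present:
        raise PageFault(f"page {p} is not in central memory")
    if write and not entry.writable:
        raise ProtectionFault(f"page {p} is read-only")
    return entry.frame * PAGE_SIZE + d


# Hypothetical page table: page 2 of the process sits in frame 5.
table = {
    0: PageTableEntry(frame=1, present=True, writable=False),
    2: PageTableEntry(frame=5, present=True, writable=True),
    3: PageTableEntry(frame=0, present=False, writable=True),
}
print(translate(table, 2, 100))
```

Accessing page 3 raises `PageFault`, and writing to page 0 raises `ProtectionFault`; in a real system both conditions are detected by the hardware and handled by the OS.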
CPU: Level of privileges
• When executing instructions, processor architectures support several levels of privilege.
• The x86 architecture offers four levels of privilege, known as Ring 0, 1, 2 and 3, to operating systems and applications to manage access to the computer hardware.
• User-level applications run in Ring 3.
• The operating system needs direct access to memory and hardware and must execute its privileged instructions in Ring 0.
VMs, Guest OSs, Host OSs
• A guest OS (running in a VM) is treated as a user application.
• However, an OS expects to execute “sensitive operations” and call privileged instructions.
CPU: Sensitive/Privileged instructions
• Sensitive instructions (or blocks of instructions): those that attempt to change the configuration of resources in the system.
  • Often executed by the OS
  • Also called unsafe instructions
• Privileged instructions: must be executed in supervisor mode, otherwise they trap.
Outline
• Virtualisation
• Virtualisation vs. Containerisation
• Details: chroot, control groups, namespaces
Virtualisation
• Full virtualisation
  • Hardware virtualisation support
  • Software based
    • Full emulation
    • Binary translation
    • Binary translation with direct execution
    • Binary patching
• Para-virtualisation
• OS-level virtualisation (containerisation)
Para-Virtualisation
• Para-virtualisation involves modifying the OS kernel to replace non-virtualisable instructions with “hypercalls” that communicate directly with the virtualisation layer (hypervisor).
• The guest OSs are modified to implement an API capable of exchanging hypercalls with the para-virtualisation hypervisor.
• The guest OS is recompiled prior to installation inside a virtual machine.
• Para-virtualisation eliminates the need for the virtual machine to trap privileged instructions.
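The hypercall idea can be sketched with a toy model. The class names, the hypercall name `set_page_table` and the alignment check are hypothetical and chosen only for illustration; they do not reproduce the interface of any real hypervisor.

```python
class Hypervisor:
    """Toy hypervisor exposing a small hypercall table (names are hypothetical)."""

    def __init__(self):
        self.page_tables = {}  # guest id -> page table base requested by the guest
        self.log = []          # record of the hypercalls received

    def hypercall(self, guest_id, name, *args):
        handler = getattr(self, "hc_" + name, None)
        if handler is None:
            raise ValueError(f"unknown hypercall {name!r}")
        self.log.append((guest_id, name))
        return handler(guest_id, *args)

    def hc_set_page_table(self, guest_id, base):
        # The hypervisor validates the request before changing real state.
        if base % 4096 != 0:
            raise ValueError("page table base must be page-aligned")
        self.page_tables[guest_id] = base
        return 0


class ParavirtGuestKernel:
    """Guest kernel modified to call the hypervisor explicitly."""

    def __init__(self, guest_id, hypervisor):
        self.guest_id = guest_id
        self.hv = hypervisor

    def switch_address_space(self, base):
        # A native kernel would execute a privileged instruction here
        # (on x86, a write to CR3). The modified kernel issues a hypercall instead.
        return self.hv.hypercall(self.guest_id, "set_page_table", base)


hv = Hypervisor()
guest = ParavirtGuestKernel(guest_id=1, hypervisor=hv)
guest.switch_address_space(0x2000)
print(hv.page_tables, hv.log)
```

The point of the design is that the guest never attempts the sensitive operation itself, so nothing has to be trapped or translated at run time.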
Para-Virtualisation
• Unmodified, proprietary OSes, such as Microsoft Windows, won't run in a para-virtualised environment.
• Although the OS must be modified to communicate with the hypervisor, the applications themselves don't need any modification.
• Performs better than full virtualisation based on binary translation: trapping is time-consuming (see the following slides).
Full Virtualisation
• A full virtualisation solution is one that creates virtual and isolated versions of an entire computer, including CPU, memory, and I/O devices.
• The key characteristic of a full virtualisation solution is that it can run arbitrary guest operating systems.
• The virtual machine looks and feels exactly like a real computer.
Hardware virtualisation support (1)
• A CPU is virtualisable using trap-and-emulate if the set of sensitive instructions is a subset of the set of privileged instructions (Popek and Goldberg, 1974).
• When the Intel architecture first came out, it was not virtualisable according to Popek and Goldberg, because 17 instructions were sensitive but not privileged.
• Intel and AMD hardware virtualisation features, "VT-x" (Intel) and "AMD-V" (AMD), offered a way to make these sensitive instructions privileged.
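The Popek and Goldberg condition is a plain subset test, which the following sketch makes explicit. The two instruction sets are a small, partial sample chosen for illustration (SGDT and POPF are commonly cited examples of sensitive but unprivileged x86 instructions); they are not the full list of 17.

```python
def is_trap_and_emulate_virtualisable(sensitive, privileged):
    """True when every sensitive instruction is also privileged."""
    return set(sensitive) <= set(privileged)


def problem_instructions(sensitive, privileged):
    """Sensitive instructions that would NOT trap in user mode."""
    return sorted(set(sensitive) - set(privileged))


# Illustrative sample only, not a complete description of x86.
privileged = {"LGDT", "LIDT", "HLT", "MOV_CR3"}
sensitive = {"LGDT", "LIDT", "MOV_CR3", "SGDT", "POPF"}

print(is_trap_and_emulate_virtualisable(sensitive, privileged))
print(problem_instructions(sensitive, privileged))
```

With this sample the test fails because SGDT and POPF are sensitive but do not trap; once every sensitive instruction causes a trap, the difference is empty and the condition holds.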
Hardware virtualisation support (2)
• "Trap and emulate" approach: used by the very first hypervisor, developed by IBM in the late 60s, and used again today on 64-bit Intel and AMD systems.
• Trap-and-emulate works as follows:
  • The hypervisor configures the CPU in such a way that all potentially sensitive (unsafe) instructions cause a "trap".
  • A trap is an exceptional condition that transfers control back to the hypervisor.
  • Once the hypervisor receives a trap, it inspects the instruction and emulates it in a safe way.
  • The approach usually has good performance: the majority of instructions do not cause a trap.
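The trap-and-emulate loop described above can be sketched as a toy interpreter. The instruction names (`add`, `cli`, `sti`), the `VirtualCPU` fields and the `Trap` exception are assumptions made for this example; real hardware delivers the trap to the hypervisor rather than raising a Python exception.

```python
class Trap(Exception):
    """Raised when the guest attempts a sensitive instruction."""

    def __init__(self, instruction):
        super().__init__(instruction)
        self.instruction = instruction


SENSITIVE = {"cli", "sti"}  # hypothetical: disable / enable interrupts


class VirtualCPU:
    def __init__(self):
        self.acc = 0
        self.interrupts_enabled = True  # virtual flag kept by the hypervisor


def execute_in_guest(vcpu, instr):
    """Model of the hardware: sensitive instructions trap, others run directly."""
    op, *args = instr
    if op in SENSITIVE:
        raise Trap(instr)
    if op == "add":
        vcpu.acc += args[0]
    else:
        raise ValueError(f"unknown instruction {op!r}")


def emulate(vcpu, instr):
    """Hypervisor side: apply the effect to the virtual state only."""
    op = instr[0]
    if op == "cli":
        vcpu.interrupts_enabled = False
    elif op == "sti":
        vcpu.interrupts_enabled = True


def run(vcpu, program):
    """Run a guest program and return the number of traps taken."""
    traps = 0
    for instr in program:
        try:
            execute_in_guest(vcpu, instr)
        except Trap as trap:
            emulate(vcpu, trap.instruction)
            traps += 1
    return traps


cpu = VirtualCPU()
print(run(cpu, [("add", 2), ("cli",), ("add", 3)]), cpu.acc, cpu.interrupts_enabled)
```

Only one of the three instructions traps, which mirrors the performance argument: most guest instructions run without involving the hypervisor.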
Software-based virtualisation (Binary translation)
• Binary translation is one way to implement a full virtualisation solution when there is no hardware support.
• The hypervisor:
  • examines a piece of “guest” code before it runs,
  • finds the sensitive but unprivileged instructions,
  • translates them into something privileged,
  • runs the translated code.
• Performance on the x86 architecture is typically 80% to 97% of that of the host machine.
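The four hypervisor steps can be sketched on a symbolic instruction stream. This is a toy model under stated assumptions: instructions are tuples, the sample set `SENSITIVE_UNPRIVILEGED` is illustrative, and "something privileged" is modelled as an explicit `call_hypervisor` pseudo-instruction, which is one possible reading of the slide rather than the behaviour of a specific product.

```python
SENSITIVE_UNPRIVILEGED = {"popf", "sgdt"}  # illustrative sample, not a full list


def translate_block(block):
    """Rewrite a block of guest code before it runs.

    Sensitive but unprivileged instructions are replaced by a call into the
    hypervisor; every other instruction is copied unchanged.
    """
    translated = []
    for instr in block:
        op = instr[0]
        if op in SENSITIVE_UNPRIVILEGED:
            translated.append(("call_hypervisor", op) + tuple(instr[1:]))
        else:
            translated.append(instr)
    return translated


guest_block = [("mov", "eax", 1), ("popf",), ("add", "eax", 2)]
for instr in translate_block(guest_block):
    print(instr)
```

Instructions that are already safe pass through untouched, which is why the translated code can run at close to native speed.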
Software-based virtualisation (Binary translation with direct execution)
• Intel processors define 4 different "rings" of security. Ring 0 runs the most privileged code and is reserved for the operating system kernel.
• All applications run in ring 3, where it is not possible to execute sensitive instructions.
• To implement a hypervisor based on binary translation, we only need to translate kernel code that executes in ring 0.
• With binary translation combined with direct execution, most code will actually run straight on the CPU without ever being translated.
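The split between translated and directly executed code can be sketched as a simple dispatch on the ring a block of guest code expects to run in. The block names and the workload are hypothetical, and the model counts blocks rather than measuring real execution time.

```python
def plan_execution(blocks):
    """Return (name, mode) pairs: translate ring-0 guest code, run the rest directly."""
    return [(name, "translate" if ring == 0 else "direct") for name, ring in blocks]


def direct_fraction(blocks):
    """Fraction of blocks that run on the CPU without translation."""
    plan = plan_execution(blocks)
    direct = sum(1 for _, mode in plan if mode == "direct")
    return direct / len(plan)


# Hypothetical workload: one guest-kernel block and three application blocks.
workload = [
    ("syscall_handler", 0),
    ("app_loop", 3),
    ("app_parse", 3),
    ("app_render", 3),
]
print(plan_execution(workload))
print(direct_fraction(workload))
```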
Outline
• Virtualisation
• Virtualisation vs. Containerisation
• Annex/Details: chroot, control groups, namespaces
Cloud Computing | Virtualisation | Academic year 2018/2019
Containers (OS level virtualisation) vs. Virtualisation (Full & Para)
Virtualisation vs. “containerisation”
• A container may be only tens of megabytes in size.
• A virtual machine with its own entire operating system may be several gigabytes in size.
• A single server can host far more containers than virtual machines.
• Virtual machines may take several minutes to boot their operating systems and begin running the applications they host.
• Containerised applications can be started almost instantly.
What about security?
• A container is a method of isolating processes and resources, but:
  • Containers are not as secure as virtual machines.
  • If there is a vulnerability in the kernel, it could provide a way into the containers that share it.
  • That is also true of a hypervisor, but since a hypervisor provides far less functionality than a whole OS, it presents a much smaller attack surface.
Will containers eventually replace virtualisation?
• That's unlikely in the short term, for security reasons.
• Virtualisation and containers may come to be seen as complementary technologies rather than competing ones.
Bibliography
• https://blogs.oracle.com/ravello/nested-virtualization-with-binary-translation : introduction to full virtualisation
• http://www.dtic.mil/dtic/tr/fulltext/u2/a423654.pdf : read section 2.2 (what privileged instructions are)
• http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.141.4815&rep=rep1&type=pdf : what is a virtualisable CPU?
Outline
• Virtualisation
• Virtualisation (full & para) vs. Containerisation
• Annex/Details: chroot, control groups, namespaces
Chroot (“jail” command)
• Exists since 1979
  • #include <unistd.h>
  • int chroot(const char *path);
• UNIX system call for changing the root directory of a process and its children to a new location in the filesystem.
• A program that runs in such a modified environment cannot access files outside the designated directory tree.
• A chroot is a way to isolate a process and its children from the rest of the system.
• The idea of chroot is to provide an isolated disk space for each process.
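A minimal sketch of the chroot call in use, written with Python's `os.chroot` wrapper around the system call. The helper name `run_in_chroot` and the jail path in the usage note are hypothetical. Calling chroot needs root privileges, and the program to run must exist inside the new root together with the libraries it depends on.

```python
import os
import sys


def run_in_chroot(new_root, argv):
    """Fork a child, confine it to new_root, and exec argv inside the jail.

    Returns the child's exit code; 127 means chroot or exec failed.
    """
    pid = os.fork()
    if pid == 0:
        try:
            os.chroot(new_root)  # "/" now refers to new_root for this process
            os.chdir("/")        # do not keep a working directory outside the jail
            os.execv(argv[0], argv)
        except OSError as exc:
            print(f"chroot/exec failed: {exc}", file=sys.stderr)
        finally:
            os._exit(127)        # only reached when exec did not replace the process
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

For example, `run_in_chroot("/srv/jail", ["/bin/sh"])` would start a shell that sees `/srv/jail` as its root directory, assuming that directory has been populated beforehand. The parent process is unaffected because the call is made in the forked child.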
Control groups (cgroups)
• Provide a mechanism for easily managing and monitoring system resources, by partitioning them into groups, then assigning applications to those groups.
• Resources are:
  • CPU time
  • system memory
  • disk
  • network bandwidth
• The application knows nothing about these limits: this happens outside the application.
• For further detail: https://sysadmincasts.com/episodes/14-introduction-to-linux-control-groups-cgroups
Control groups (cgroups)
• Once the group is created, applications are added to it. This can happen on the fly, without a system reboot, and limits can also be adjusted on the fly.
• Applications may consume more than their resource limit. However, when there is resource contention, they are restricted to what the cgroup policy allows.
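Creating a group, setting limits and adding a process can be sketched as a few file writes. The sketch assumes the cgroup v2 unified hierarchy mounted at `/sys/fs/cgroup`, with the `memory` and `cpu` controllers enabled for the parent group; the function names are hypothetical. The `root` parameter exists so that the code can be tried against an ordinary directory without privileges.

```python
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")  # usual cgroup v2 mount point (assumption)


def create_cgroup(name, memory_max_bytes, cpu_quota_us,
                  cpu_period_us=100_000, root=CGROUP_ROOT):
    """Create a group and set a memory limit and a CPU bandwidth limit."""
    group = Path(root) / name
    group.mkdir(parents=True, exist_ok=True)
    (group / "memory.max").write_text(f"{memory_max_bytes}\n")
    # cpu.max holds "<quota> <period>" in microseconds.
    (group / "cpu.max").write_text(f"{cpu_quota_us} {cpu_period_us}\n")
    return group


def add_process(group, pid):
    """Move a process into the group; this can be done while it is running."""
    (Path(group) / "cgroup.procs").write_text(f"{pid}\n")
```

On a real system, `create_cgroup("demo", 256 * 1024 * 1024, 50_000)` followed by `add_process(group, pid)` would cap the process at 256 MiB of memory and half of one CPU, without the application being aware of it.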
Namespace
• Exists since Linux 2.6.24 (2008)
• A Linux installation:
  • maintains a single process tree.
  • shares a single set of network interfaces and routing table entries.
• With namespaces, it became possible to have multiple “nested” process trees. A process running within a namespace can only see processes in the same namespace: process namespace.
• With namespaces, you can have different and separate network interfaces and routing tables that operate independently of each other: network namespace.
Process namespace
• Source: https://www.toptal.com/linux/separation-anxiety-isolating-your-system-with-linux-namespaces
Process namespace
• Three system calls are used for process namespaces:
  • clone(): creates a new process and a new namespace; the process is attached to the new namespace.
  • unshare(): does not create a new process; creates a new namespace and attaches the current process to it.
  • setns(): joins an existing namespace.
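A minimal sketch of a new process namespace using `unshare()`. Python has no wrapper for `clone()`, so this example swaps in `os.unshare` (available on Linux from Python 3.12) followed by `fork`; for PID namespaces, it is the first child created after the call that becomes PID 1 of the new namespace. The function name is hypothetical, and creating the namespace needs root privileges.

```python
import os


def pid_seen_in_new_namespace():
    """Return the PID a child sees for itself inside a fresh PID namespace.

    Returns None when the namespace cannot be created (no privileges,
    non-Linux system, or Python older than 3.12).
    """
    if not hasattr(os, "unshare"):
        return None
    read_fd, write_fd = os.pipe()
    helper = os.fork()
    if helper == 0:
        # Helper process, so that the caller's own namespaces stay untouched.
        status = 1
        try:
            os.close(read_fd)
            os.unshare(os.CLONE_NEWPID)  # children created from now on use a new namespace
            child = os.fork()
            if child == 0:
                os.write(write_fd, str(os.getpid()).encode())
                os._exit(0)
            os.waitpid(child, 0)
            status = 0
        except OSError:
            pass
        finally:
            os._exit(status)
    os.close(write_fd)
    data = os.read(read_fd, 32)  # empty when the helper could not create the namespace
    os.close(read_fd)
    os.waitpid(helper, 0)
    return int(data) if data else None


print(pid_seen_in_new_namespace())
```

Run as root on a recent Linux system, the child reports PID 1 even though the parent namespace knows it under an ordinary, larger PID.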
Process namespace (clone)
• int clone(int (*fn)(void *), void *child_stack, int flags, void *arg, ...);
• flags: CLONE_NEWPID
• clone() is also used to implement threads: multiple threads in a program that run concurrently in a shared memory space.
• For further details: https://linux.die.net/man/2/clone
• The low byte of flags contains the number of the termination signal sent to the parent when the child dies. flags may also be bitwise-or'ed with zero or more constants that specify what is shared between the calling process and the child process.
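The layout of the `flags` argument can be checked with a little bit arithmetic. The constants below are copied from `<linux/sched.h>`; the helper names `make_flags`, `termination_signal` and `has_flag` are hypothetical and only serve the illustration.

```python
import signal

# Values from <linux/sched.h>
CSIGNAL = 0x000000FF       # mask selecting the low byte (termination signal)
CLONE_VM = 0x00000100      # share the memory space (used for threads)
CLONE_NEWPID = 0x20000000  # new PID namespace
CLONE_NEWNET = 0x40000000  # new network namespace


def make_flags(*clone_constants, exit_signal=signal.SIGCHLD):
    """Combine a termination signal (low byte) with CLONE_* constants."""
    flags = int(exit_signal)
    for constant in clone_constants:
        flags |= constant
    return flags


def termination_signal(flags):
    """Signal number sent to the parent when the child dies."""
    return flags & CSIGNAL


def has_flag(flags, constant):
    return bool(flags & constant)


flags = make_flags(CLONE_NEWPID)
print(hex(flags), termination_signal(flags), has_flag(flags, CLONE_NEWPID))
```

Because the CLONE_* constants all live above the low byte, OR-ing them in never disturbs the signal number.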
Process namespace (clone)
Output:
clone() = 5304
PID: 1
Process namespace (clone)
Output:
clone() = 5304
PID: 0
Process namespace (summary)
• The process namespace makes it possible to spin off a new tree, with its own PID 1 process.
• The process that does this remains in the parent namespace (original tree), but makes the child the root of its own process tree.
• With process namespace isolation, processes in the child namespace have no way of knowing of the parent process's existence.
• Processes in the parent namespace have a complete view of processes in the child namespace.
Network namespace
Process namespace (summary)
• Processes belonging to different process namespaces still have unrestricted access to other shared resources.
• Example: the networking interface. If the child process listens on port 80, it prevents other processes from listening on it.
  • → Solution: network namespaces
Network namespace
• A network namespace allows each process to see a different set of networking interfaces.
• Even the loopback interface is different for each network namespace.
• Isolating a process into its own network namespace involves passing another flag to the clone() function call: CLONE_NEWNET.
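The effect of a new network namespace can be observed by listing the interfaces visible from inside it. The slide uses `clone()` with CLONE_NEWNET; since Python has no `clone()` wrapper, this sketch swaps in `os.unshare(os.CLONE_NEWNET)` (Linux, Python 3.12 or later) in a forked helper. The function name is hypothetical, and the call needs root privileges.

```python
import os
import socket


def interfaces_in_new_network_namespace():
    """Return the interface names visible inside a fresh network namespace.

    Returns None when the namespace cannot be created (no privileges,
    non-Linux system, or Python older than 3.12).
    """
    if not hasattr(os, "unshare"):
        return None
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            os.close(read_fd)
            os.unshare(os.CLONE_NEWNET)  # this process now has its own network stack
            names = [name for _, name in socket.if_nameindex()]
            os.write(write_fd, ",".join(names).encode())
            status = 0
        except OSError:
            pass
        finally:
            os._exit(status)
    os.close(write_fd)
    data = os.read(read_fd, 4096)  # empty when the child could not create the namespace
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data.decode().split(",") if data else None


print(interfaces_in_new_network_namespace())
```

Inside the new namespace the physical interfaces of the host are no longer listed; the namespace starts with its own loopback interface `lo`.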
Network namespace (clone)
Network namespace
• Creating your namespace: ip netns add mynamespace
• Checking: ip netns list
• Creating a pair of virtual Ethernet interfaces: ip link add veth0 type veth peer name veth1
• Checking: ip link list
• Connecting the global namespace to mynamespace: ip link set veth1 netns mynamespace
• Checking:
  • ip netns exec mynamespace ip link list
  • ip link list
Network namespace
• Configuring interfaces in network namespaces: ip netns exec mynamespace ifconfig veth1 10.1.1.1/24 up
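The namespace and veth commands from the last two slides can be collected into one script. This is a sketch, not a hardened tool: the function names are hypothetical, and the address is configured with `ip addr add` and `ip link set ... up` instead of the `ifconfig` form shown on the slide. By default the commands are only printed; executing them needs root privileges and the iproute2 tools.

```python
import subprocess


def veth_setup_commands(namespace, host_if, ns_if, ns_cidr):
    """Build the commands that create a namespace and wire a veth pair into it."""
    in_ns = ["ip", "netns", "exec", namespace]
    return [
        ["ip", "netns", "add", namespace],
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", ns_if],
        ["ip", "link", "set", ns_if, "netns", namespace],
        in_ns + ["ip", "addr", "add", ns_cidr, "dev", ns_if],
        in_ns + ["ip", "link", "set", ns_if, "up"],
    ]


def apply(commands, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


apply(veth_setup_commands("mynamespace", "veth0", "veth1", "10.1.1.1/24"))
```

Keeping the commands as argument lists avoids shell quoting issues, and `check=True` stops at the first command that fails.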