Cloud Computing: Shared Resources and their Management
December 12, 2011
Himanshu Gupta: 3036 | Rajat Rao: 7365
CSE 661
Contents

Abstract
Goal of Paper
Background
Introduction
Hypervisors
    Types of hypervisors
    Native v/s Hosted
Linux Hypervisors
    KVM
        High-level view of the KVM hypervisor
    Lguest
    Linux hypervisor benefits
Xen Hypervisor
    Architecture of Xen
        Modified Linux Kernel/Domain 0
        Domain U
        Working
Domain Control and Management
    Xend
    XM
    Libxenctrl
Domain0 to DomainU/Guest Domain Communication
Processors for Clouds
    Single-Chip Cloud Computer/Intel
        Top Level Architecture
        L2 Cache
        LMB (Local Memory Buffer)
        DDR3 Memory Controllers
        LUT
        MIU (Mesh Interface Unit)
        SCC Power Controller (VRC)
    AMD Opteron
Memory Cloud
Inside the Difference Machine
    Page Sharing
    Patching
    Delta Encoding
    Memory Compression
Future Work and Conclusion
Appendix and Presentation Slides
References
Abstract
This paper describes why we think that shared memory architecture and management is of prime importance in a Cloud Computing environment. Cloud Computing refers both to the applications delivered as services over the web and to the hardware and software systems in the datacenters that provide those services. The services themselves have long been referred to as Software as a Service (SaaS); the datacenter hardware and software is what we will call a Cloud. Thus, Cloud Computing is the whole package of Utility Computing together with Software as a Service.

Most applications need a computational model, a storage model, and a communication model. The statistical multiplexing necessary to gain elasticity and the illusion of unlimited capacity requires taking the virtualization [13] path: each of these resources must be virtualized so that the way it is multiplexed and shared is hidden from the user.

Cloud computing provides a way of achieving seemingly boundless computation even when the required computational power grows or shrinks rapidly. The services used for cloud computing therefore rely on the distribution of jobs and on resource pooling. The frameworks that divide the work and the resources are designed to support large-scale distribution in an ever-changing environment; each node needs enough on-board memory to serve a request directly, rather than asking other nodes for resources, in order to achieve parallelism. Achieving this requires vast amounts of memory and other resources, which must be managed efficiently to reduce waste.

Effectively unlimited storage is often implemented with complex, multi-tiered distributed systems built on top of clusters of servers and arrays of disk drives. Robust and intelligent management, load balancing, and failure-recovery methods are needed to achieve high performance and availability even though there is an abundance of failure sources, including software, hardware, power, and network connectivity issues.
Goal of Paper
We present a detailed analysis and study of cloud computing, focusing on the hardware support needed to run cloud systems. We found that setting up any cloud system requires two things. The first is hardware: servers powerful enough to support rising demand. The requirements of these server machines differ from those of a desktop or general-purpose CPU, and some key players in the processor market have introduced breakthrough products for this purpose. Beyond the machines themselves, the cloud depends heavily on the software layer that runs on this hardware to serve distributed-computing requests rapidly. The OS-like layer needed for this purpose is called a hypervisor. Hypervisors come in different forms, and a detailed study of them is provided in this paper. The paper also discusses memory sharing among virtualized machines.

Note: This paper does not discuss the software/service side of cloud computing, such as PaaS and SaaS.
Background
In the past, there have been two ways of building a computer that can serve high computational needs:

1. The Blue Gene [2] approach: build a gigantic machine with thousands of CPUs.
2. The approach used by Google: take hundreds of thousands of small, cheap computers and join them in a "cluster" in such a way that they all work together as one big computer.

Supercomputers have many processors plugged into a single machine, sharing common memory and I/O, while clusters are made up of many smaller machines, each containing a smaller number of processors and each with its own local memory and I/O. Maintaining either kind of system involves considerable expense and complexity. Cloud computing offers an easier way to meet high computational requirements while keeping costs low.

Cloud computing is a natural evolution of the widespread adoption of virtualization and utility computing. Lower-level details are hidden from end users, who no longer need proficiency in, or control over, the technology infrastructure "in the cloud" that supports them [9].
Introduction
Technological advances over the years have developed a trend toward workstation-oriented computing systems: your applications live in cloud storage, and you use your workstation to access them. The availability of powerful systems and server resources through these environments has opened a wide scope for Cloud Computing. Current research aims at using such systems to solve large-scale problems. For the Cloud to process and solve large problems, it needs resources that are available immediately so that requests can be handled at all times; resources therefore need to be shared and managed efficiently.

Cloud Computing uses virtualization as its backbone, and the current industrial implementations of cloud computing are all based on virtualization technology. This section discusses the virtualization technology used for this purpose.
In hardware, by shared memory we mean a large chunk of memory shared by different processors. Even with different processing systems holding the same view of the data, it is difficult to scale such a system to a larger size. The problem with shared-memory systems is that processors need quick access to memory and will likely cache it, which has its own complications:

There is a bottleneck on the processor-to-main-memory connection, because each processor has its own on-chip cache sitting in front of the shared memory.

Cache coherency problems can arise: whenever one on-chip cache is updated with information that may be used by other processors, the change must be propagated to those processors; otherwise each processor works with its own copy of the data, which differs from the data that was just updated. Coherence protocols exist to handle this, and when they function properly they provide very good performance for access to information shared between multiple processors. On the other hand, under heavy continuous and concurrent updating they can become overloaded and turn into a performance bottleneck.
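The write-invalidate idea behind such coherence protocols can be sketched in a few lines. This is a toy model only (real protocols such as MESI track per-line states in hardware); all class and variable names are our own illustration:

```python
# Toy write-invalidate coherence: each CPU keeps a private cache over one
# shared memory. A write invalidates every other CPU's cached copy, so a
# later read there misses and refetches the updated value.

class SharedMemory:
    def __init__(self):
        self.data = {}          # address -> value

class CPU:
    def __init__(self, name, memory, peers):
        self.name = name
        self.memory = memory
        self.cache = {}         # private on-chip cache: address -> value
        self.peers = peers      # all CPUs sharing `memory`, including self

    def read(self, addr):
        if addr not in self.cache:                # cache miss: fetch from memory
            self.cache[addr] = self.memory.data.get(addr)
        return self.cache[addr]

    def write(self, addr, value):
        self.memory.data[addr] = value
        self.cache[addr] = value
        for cpu in self.peers:                    # invalidate stale copies
            if cpu is not self:
                cpu.cache.pop(addr, None)

mem = SharedMemory()
cpus = []
a = CPU("cpu0", mem, cpus)
b = CPU("cpu1", mem, cpus)
cpus.extend([a, b])

a.write(0x10, "old")
assert b.read(0x10) == "old"   # cpu1 now caches the line
a.write(0x10, "new")           # invalidates cpu1's stale copy
assert b.read(0x10) == "new"   # miss -> refetch, coherent view restored
```

The invalidation loop is also where the bottleneck mentioned above shows up: every write touches every peer, so heavy concurrent updating turns coherence traffic into the limiting factor.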
In a Cloud system, every aspect of the datacenter environment plays a role in performance: the speed and number of CPUs, the amount of shared memory available, the size and performance of the storage systems, and the speed and efficiency of the network connecting them. In the datacenter, too, memory availability and the speed of the connections among systems remain the true bottlenecks, because the processing systems must interact with one another to serve each incoming request. Datacenters everywhere face ongoing demands for higher performance and greater efficiency. Memory failures can occur for various reasons, so many new techniques are used to reduce the waste caused by failures of the memory-management system under heavy transaction load in large-scale environments. Some of the techniques we touch on in this paper include memory virtualization and CPU virtualization.
Cloud computing owes much of its functioning to virtualization technology [12], which is essentially a software layer over the hardware that facilitates the creation of a virtual environment (rather than an actual one) in which hardware and software systems are key players. With memory virtualization, distributed and networked servers share a memory pool to overcome physical memory limitations, a common bottleneck in software performance. With this feature integrated into the network environment, applications can take advantage of a huge, elastic amount of memory to improve overall performance and system utilization, increase the efficiency of memory usage, and enable new use cases such as integrating new user applications into existing ones.

Shared-memory systems are implemented very differently from memory virtualization. Shared-memory systems do not abstract away the memory resources, so the design and implementation must be done within a single operating-system image rather than in the common clustered environment of commodity servers.
Distributed Shared Memory (DSM) is a form of memory architecture in which memories that are physically separate from each other can be addressed through one logically shared address space. The address space is shared, so the same address in that space refers to the same memory location from any processor, yet there is no single centralized memory. A shared architecture may involve dividing memory into common parts distributed among the processing systems and main memory, or distributing all memory between server nodes. To maintain memory coherence, it is very important to use an appropriate coherence protocol in accordance with a consistency model.
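The core DSM idea, one logical address space backed by physically separate per-node memories, can be sketched as follows. The placement policy and all names here are hypothetical (real DSMs also migrate and replicate pages, which is where the coherence protocol comes in):

```python
# Toy DSM: pages of one logical address space are placed round-robin on
# physically separate nodes; the DSM layer routes each access to the
# owning node, so every node sees the same value at the same address.

PAGE_SIZE = 4096

class Node:
    def __init__(self, name):
        self.name = name
        self.local = {}          # local physical memory: page number -> bytearray

class DSM:
    def __init__(self, nodes):
        self.nodes = nodes

    def owner(self, addr):
        # Static page-to-node placement for illustration only.
        return self.nodes[(addr // PAGE_SIZE) % len(self.nodes)]

    def write(self, addr, value):
        node = self.owner(addr)
        page = node.local.setdefault(addr // PAGE_SIZE, bytearray(PAGE_SIZE))
        page[addr % PAGE_SIZE] = value

    def read(self, addr):
        page = self.owner(addr).local.get(addr // PAGE_SIZE)
        return page[addr % PAGE_SIZE] if page else 0

dsm = DSM([Node("n0"), Node("n1")])
dsm.write(5000, 42)                  # lands in some node's local memory
assert dsm.read(5000) == 42          # same logical address, same data
assert dsm.owner(5000).name == "n1"  # page 1 was placed on the second node
```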
Memory virtualization takes advantage of the shared memory pool. Platforms built on memory-virtualization solutions eliminate memory segmentation across entire server clusters by creating a pool of shared memory. Such a platform combines memory available from dedicated appliances with memory from existing servers to create a cache of many terabytes that can be seamlessly used and shared among the servers in a group or across a datacenter. It provides a networked shared-memory resource big enough to accommodate billions of data sets, which dramatically reduces information-access time and thus greatly improves processing performance, so results are returned to the user very quickly for each request made to a server.
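The pooling idea above can be sketched as a toy model: each server lends spare memory to one logical pool whose capacity exceeds any single host. Names and sizes are made up for illustration:

```python
# Toy memory pool: aggregate spare memory contributed by several servers
# into one cache that any of them can use, tracking capacity vs. usage.

class MemoryPool:
    def __init__(self):
        self.capacity = 0        # total MB lent by all servers
        self.used = 0
        self.cache = {}          # key -> cached object

    def contribute(self, server, megabytes):
        self.capacity += megabytes

    def put(self, key, value, size_mb=1):
        if self.used + size_mb > self.capacity:
            raise MemoryError("pool exhausted")
        self.cache[key] = value
        self.used += size_mb

    def get(self, key):
        return self.cache.get(key)

pool = MemoryPool()
for srv in ["web1", "web2", "db1"]:
    pool.contribute(srv, 512)        # each server lends 512 MB of spare RAM
assert pool.capacity == 1536         # one pool, larger than any single host
pool.put("dataset:users", b"...")
assert pool.get("dataset:users") == b"..."
```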
CPU virtualization involves one physical CPU acting as if it were two separate CPUs, giving the feel of running two separate machines on a single physical computer. The most important reason for implementing this is to run two heterogeneous operating systems on one physical machine. The main objective of CPU virtualization is to make one CPU behave as two different CPU machines would. In summary, the virtualization software is set up so that it, and only it, talks directly to the underlying physical CPU; everything else that happens on the frontend system goes through this software. The software takes each request and establishes communication with the rest of the computer as if it were connected to two different CPUs.
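The "only the virtualization software talks to the real CPU" arrangement is the classic trap-and-emulate scheme, which we can sketch as a toy. The instruction names and the virtual state tracked here are illustrative only:

```python
# Toy trap-and-emulate: privileged instructions from either guest trap
# into the VMM, which emulates them against that guest's private virtual
# state; everything else runs directly. Each guest thus believes it owns
# the CPU.

class VMM:
    def __init__(self):
        self.guests = {}             # guest name -> virtual CPU state

    def create_guest(self, name):
        self.guests[name] = {"interrupts_enabled": True}

    def execute(self, guest, instruction):
        if instruction in ("cli", "sti"):        # privileged: trap to the VMM
            return self._emulate(guest, instruction)
        return "ran-on-cpu"                      # unprivileged: run directly

    def _emulate(self, guest, instruction):
        state = self.guests[guest]
        state["interrupts_enabled"] = (instruction == "sti")
        return "emulated"

vmm = VMM()
vmm.create_guest("linux")
vmm.create_guest("windows")
assert vmm.execute("linux", "add") == "ran-on-cpu"
assert vmm.execute("linux", "cli") == "emulated"
# Each guest keeps its own virtual flag: disabling interrupts in one
# guest does not affect the other.
assert vmm.guests["linux"]["interrupts_enabled"] is False
assert vmm.guests["windows"]["interrupts_enabled"] is True
```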
Virtualization needs a layer over the hardware, known as the hypervisor. The sections below discuss hypervisors in detail.
Hypervisors
Hypervisors play a crucial role in virtualization and cloud computing. A hypervisor is a software or firmware component that can virtualize system resources [5]. It is responsible for scheduling the CPU and partitioning memory among the various virtual machines running on the same hardware. It not only abstracts the hardware for the virtual machines but also controls their execution as they share the common processing environment. It has no knowledge of networking, external storage devices, video, or other common I/O functions found on a computing system. It is sometimes called a virtual machine monitor, or VMM [5].
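The CPU-scheduling duty mentioned above can be sketched as a minimal round-robin scheduler over the virtual machines sharing one physical CPU. Real hypervisor schedulers (e.g. Xen's credit scheduler) are far more sophisticated; this is purely illustrative:

```python
# Toy hypervisor scheduler: round-robin time slices across the VMs that
# share one physical CPU; each tick runs one VM, then preempts it.

from collections import deque

class Scheduler:
    def __init__(self, vms):
        self.runqueue = deque(vms)
        self.timeline = []           # which VM ran in each time slice

    def tick(self):
        vm = self.runqueue.popleft() # pick the next runnable VM
        self.timeline.append(vm)     # "run" it for one time slice
        self.runqueue.append(vm)     # preempt and requeue it

sched = Scheduler(["vm-a", "vm-b", "vm-c"])
for _ in range(6):
    sched.tick()
assert sched.timeline == ["vm-a", "vm-b", "vm-c", "vm-a", "vm-b", "vm-c"]
```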
Types of hypervisors
Native hypervisors: sit directly on the hardware platform and are most likely used to gain better performance for individual users.

Embedded hypervisors: integrated into a processor on a separate chip. Using this type of hypervisor is how a service provider gains performance improvements.

Hosted hypervisors: run as a distinct software layer above both the hardware and the OS. This type of hypervisor is useful in both private and public clouds to gain performance improvements.

The image below depicts the key differences between these types.
Native v/s Hosted
A cloud-computing hypervisor must completely separate physical resource management from virtual resource management, and it should be able to mediate between applications and resources in real time. Additionally, it should be capable of managing both the resources located locally within the same machine and resources in other, physically remote servers connected by a network. Once the management of physical resources is separated from virtual resource management, the need becomes apparent for a mediation layer that arbitrates the allocation of resources between multiple applications and the shared, distributed physical resources.
So a hypervisor, irrespective of its type, is an application built on a layered architecture that abstracts the machine hardware and other low-level details from its guests; each guest sees a virtual machine instead of the real hardware. At a high level, the hypervisor needs a number of things to boot a guest OS:

A disk.
A kernel image to boot.
A network device.
A configuration (such as IP addresses and the quantity of memory to use).

The hard-disk and network devices generally map onto the hosting machine's physical disk and network device.
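The four items listed above map naturally onto a guest configuration file. As an illustration, here is a minimal domain config in the style used by Xen's xm tool (such configs are parsed as Python assignments); all paths, names, and addresses here are hypothetical:

```python
# A guest definition covering the four boot requirements above, in the
# style of a Xen xm/xend domain config file. Values are illustrative.

kernel = "/boot/vmlinuz-2.6-xen"        # kernel image to boot
memory = 512                            # quantity of memory (MB) for the guest
name = "guest1"
vif = ["bridge=xenbr0"]                 # virtual network device
disk = ["phy:/dev/vg0/guest1,xvda,w"]   # disk, backed by a host block device
ip = "192.168.1.10"                     # configuration: guest IP address
```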
A simplified hypervisor architecture acts as the glue that permits a guest OS to execute concurrently with the host OS. This functionality requires a few specific elements:

Interrupt handling: interrupts must be handled uniquely by the hypervisor, either dealing with real interrupts or routing interrupts for virtual devices to the guest operating system.

System calls that bridge user-space applications with kernel functions.

Input/output (I/O), which can be virtualized in the kernel or assisted by code in the guest operating system.

A hypercall layer, commonly provided so that guests can make requests of the host operating system.

A page mapper, which points the hardware at the pages belonging to the specific OS (guest or hypervisor).

Handling of traps and exceptions that occur within guest operating systems.

A high-level scheduler to transfer control between the guest operating systems and the hypervisor.
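Of these elements, the hypercall layer is the easiest to sketch: the guest cannot touch hardware, so it makes numbered requests through a single entry point, which the hypervisor validates and dispatches to handlers. The hypercall numbers and handlers below are made up for illustration:

```python
# Toy hypercall layer: numbered guest requests dispatched through one
# validated entry point, as a real VMM (compare Xen hypercalls) would do.

HYPERCALLS = {}

def hypercall(number):
    """Register a handler under a hypercall number."""
    def register(handler):
        HYPERCALLS[number] = handler
        return handler
    return register

@hypercall(0)
def console_write(vm, text):
    return f"[{vm}] {text}"

@hypercall(1)
def yield_cpu(vm):
    return f"{vm} yielded"

def hypercall_entry(vm, number, *args):
    # Single entry point: validate, then dispatch to the handler.
    if number not in HYPERCALLS:
        raise ValueError("bad hypercall")
    return HYPERCALLS[number](vm, *args)

assert hypercall_entry("guest1", 0, "hello") == "[guest1] hello"
assert hypercall_entry("guest1", 1) == "guest1 yielded"
```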
Simplified view of a Linux-based hypervisor
The market offers a wide range of hypervisors of the types discussed above. We picked some open-source hypervisors to study how they manage resources and guest operating systems. The hypervisors we studied are presented below with our analysis:
Linux Hypervisors
We studied two Linux-based hypervisors:

1. KVM: supports full virtualization.
2. Lguest: experimental; supports paravirtualization.
KVM
Some of its important features:

It was the first hypervisor to become part of the mainline Linux kernel, initially for x86 hardware.

It is implemented as a kernel module, which allows Linux to become a hypervisor simply by loading the module.

It provides full virtualization on hardware platforms that provide hypervisor instruction support, e.g. Intel Virtualization Technology (VT) and AMD Virtualization (AMD-V).

It also supports paravirtualized guests, including Linux and Windows.

It supports symmetric multiprocessing in both hosts and guests, and offers enhanced features such as live migration, which allows a guest OS to migrate between physical servers.

The technology is implemented as two components:

1. A loadable KVM module, which manages the virtualization hardware and exposes its capabilities through the ‘/proc’ file system [5].
2. PC platform emulation, provided by a modified version of QEMU [5]. QEMU executes as a user-space process, coordinating with the kernel for guest operating system requests.
High-level view of the KVM hypervisor
When KVM boots a new operating system, that OS becomes a process of the host OS and can be managed and scheduled like any other process, but it runs in a "guest" mode that is independent of the usual kernel and user modes. KVM uses the underlying hardware's virtualization support to provide native virtualization.

In KVM, every guest OS is mapped through the /dev/kvm device and maintains its own virtual address space, which is mapped onto the host kernel's physical address space. I/O requests are routed through the host kernel to the QEMU process running on the host. KVM operates in the context of Linux as the host, but supports a large number of guest operating systems, provided the underlying hardware offers virtualization support.
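The /dev/kvm device mentioned above can be probed from user space. A minimal sketch, assuming a Linux host: the first ioctl a KVM client issues is KVM_GET_API_VERSION (the constant is _IO(0xAE, 0x00) from <linux/kvm.h>, and the stable API reports version 12). Where /dev/kvm is absent or inaccessible, the probe returns None:

```python
# Probe for KVM: open /dev/kvm and ask the kernel for its API version.

import fcntl
import os

KVM_GET_API_VERSION = 0xAE00   # _IO(KVMIO, 0x00) from <linux/kvm.h>

def kvm_api_version(path="/dev/kvm"):
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError:
        return None             # no KVM support, or no permission
    try:
        return fcntl.ioctl(fd, KVM_GET_API_VERSION)
    finally:
        os.close(fd)

version = kvm_api_version()
# The stable KVM userspace API has reported version 12 for many years.
assert version is None or version == 12
```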
Lguest
Previously known as lhype, the Lguest hypervisor offers lightweight paravirtualization for Lguest-enabled x86 Linux guests. This means guest operating systems are aware that they are being virtualized, and that knowledge yields performance improvements. It also simplifies the overall code, requiring only a thin layer in the guest and in the host operating system.

The guest operating system includes a thin layer of Lguest code that provides services such as detecting that the kernel being booted is virtualized and routing privileged operations to the host OS via hypercalls.
Breakdown of the Lguest approach to x86 paravirtualization
The kernel side is implemented as a loadable module known as lg.ko [5], which contains the guest's interface to the host kernel. Its first element is the switcher, which implements the context switching needed to execute guest operating systems. The module also implements the /proc file-system code providing the user-space interfaces to the kernel and drivers, including hypercalls, along with code for memory mapping through shadow page tables and management of x86 segments [5].

Lguest has been in the mainline kernel since 2.6.23 and consists of roughly 5,000 source lines of code.
Linux hypervisor benefits
Linux hypervisors have some noticeable benefits:

1. They benefit from the continual advancement of Linux and the quantity of quality work that goes into it.
2. The platform can serve as an ordinary operating system in addition to a hypervisor: besides running multiple guest operating systems on a Linux hypervisor, you can run your other traditional applications at that level.
3. Standard protocols (TCP/IP) and other useful applications (e.g. web servers) are available alongside the guests.
Xen Hypervisor
Xen offers a powerful, efficient, and secure feature set for virtualization of x86, x86_64, IA64, ARM, and other CPU architectures, and supports many guest operating systems. The Xen hypervisor is a layer of software running directly on the computer hardware, just like an OS, allowing the hardware to run multiple guest operating systems at the same time, including Linux, NetBSD, FreeBSD, Solaris, and Windows.

The Xen.org community develops and maintains the Xen hypervisor as a free solution licensed under the GNU General Public License [4].
A computer running the Xen hypervisor contains three components:

The Xen hypervisor.

Domain 0, the privileged domain (Dom0): a privileged guest running on the hypervisor, with direct hardware access, that manages the guest operating systems.

Multiple Domain U, unprivileged domain guests (DomU): unprivileged guests running on the hypervisor with no direct access to hardware (e.g., memory, disk).

The Xen hypervisor acts as an interface for all hardware requests, such as CPU, I/O, and disk, on behalf of the guest operating systems. By separating the guests from the hardware, it is able to run multiple operating systems independently and securely.

Dom0 is loaded by Xen at initial system start-up and can run any operating system except Windows. Dom0 has privileges for accessing the Xen hypervisor that are not granted to any DomU; thus a user, administrator, or application with sufficient privilege can use Dom0 to manage the hypervisor and all guest operating systems.

DomUs are hosted and maintained by Dom0 but run independently on the system. These guests run either a specially modified OS, referred to as paravirtualized, or an unmodified OS leveraging special virtualization hardware such as AMD-V and Intel VT, in which case the guest is known as a hardware virtual machine (HVM).
Some of the terms used by Xen are explained below:

• Paravirtualization

A virtualization technique in which the running OS is informed that it is executing on a hypervisor rather than on bare hardware. The OS must be modified to accept and cope with running on a hypervisor instead of real hardware.

• Hardware Virtual Machine (HVM)

An OS running unchanged in a virtualized environment, unaware that it is not running directly on the hardware. Special hardware support is required to allow this, hence the term HVM.

Note: Microsoft Windows requires an HVM guest environment.
Architecture of Xen
A Xen virtual environment consists of several items that work together to deliver the virtualization environment a customer is looking to deploy:
• Xen Hypervisor
• Domain 0 Guest
o Domain Management and Control (Xen DM&C)
• Domain U Guest (Dom U)
o PV Guest
o HVM Guest
The diagram below shows the basic organization of these components.
Modified Linux Kernel/Domain 0
There is only one Domain 0 in any running instance of Xen, and all Xen virtualization environments require Domain 0 to be running before any other virtual machine can be started. Domain 0 has access to physical I/O resources and interacts with the other virtual machines (Domain U PV and HVM guests) running on the system.

Two drivers are included in Domain 0 to support network and local disk requests from Domain U PV and HVM guests:

1. Network Backend Driver
2. Block Backend Driver

The Network Backend Driver communicates directly with the local networking hardware to process all network requests coming from the Domain U guests. The Block Backend Driver communicates with the local storage disk, reading and writing data on the drive based on Domain U requests.
Domain U
Domain U has no direct access to the physical hardware of the machine. All paravirtualized VMs executing on a Xen hypervisor are referred to as "Domain U PV Guests" and run modified Linux, FreeBSD, Solaris, or other UNIX operating systems. All fully virtualized machines running on a Xen hypervisor run standard Windows or any other unchanged operating system [3]. A Domain U HVM guest VM is not aware that other VMs are present. A Domain U PV guest contains two drivers for network and disk access: the PV Network Driver and the PV Block Driver.
Working
A Domain U HVM guest does not have the PV drivers inside the virtual machine; instead, a special daemon, qemu-dm, is started in Domain 0 for every HVM guest. It handles the Domain U HVM guest's disk-access and networking requests.

The Domain U HVM guest must also initialize as it would on an ordinary machine, so software called Xen virtual firmware is added to the guest to simulate the BIOS an operating system expects at startup.
Domain Control and Management
A set of Linux daemons classified as domain control and management runs in Domain 0, so that even if some other domain dies, the user or system is aware of it and the whole system does not crash.
Xend

The Xend daemon is an application written in Python and is considered the system manager of the Xen environment. It uses the provided API to make requests of the Xen hypervisor.

XM

A command-line tool that takes user input and forwards it to Xend via XML-RPC.

Libxenctrl

A library written in C that is able to talk to the hypervisor via Domain 0.
Domain0 to DomainU/Guest Domain Communication
The Xen hypervisor itself is not written to support network and disk requests [6]. To accomplish such operations, a guest domain must communicate with Domain 0 through the hypervisor. We studied an example case to understand this in detail; the description follows:
1. The guest domain driver receives a request to write to the local disk and writes the data, via the Xen hypervisor, to the appropriate local memory, which is shared with Domain 0.
2. An event channel between Domain 0 and the guest domain allows them to communicate via asynchronous inter-domain interrupts in the Xen hypervisor.
3. Domain 0 receives an interrupt from the Xen hypervisor, causing its driver to access local system memory and read the appropriate blocks from the guest domain's shared memory.
4. The data from shared memory is then written to a specific location on the local hard disk.
The figure given below provides a better explanation of what is discussed:
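The write path above can also be modeled as a toy program. None of this is Xen code: the dictionary, the queue, and the two functions are our own stand-ins for the grant-shared pages, the event channel, and the front-end/back-end drivers.

```python
import queue
import threading

# Toy model of the Xen split-driver write path (illustrative names only).
shared_memory = {}             # stands in for grant-shared pages
event_channel = queue.Queue()  # asynchronous inter-domain notification
disk = {}                      # stands in for Domain 0's local hard disk

def guest_write(block_no, data):
    """Guest front-end driver: place data in shared memory, signal Domain 0."""
    shared_memory[block_no] = data
    event_channel.put(block_no)        # the "interrupt" via the hypervisor

def dom0_backend():
    """Domain 0 back-end driver: woken by the event, writes the block out."""
    block_no = event_channel.get()
    disk[block_no] = shared_memory.pop(block_no)

guest_write(7, b"hello disk")
worker = threading.Thread(target=dom0_backend)
worker.start()
worker.join()
assert disk == {7: b"hello disk"}
```

The essential property the model captures is that the guest never touches the disk: it only fills shared memory and raises an event, and Domain 0 performs the real I/O.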
Note: a newer Xen feature is being designed to improve overall performance and lessen the load on Domain 0. In this design the guest domain has direct access to the hardware without communicating with Domain 0. The figure given below illustrates this concept:
Guest OS accessing hardware directly without communication with Domain0
Processors for Clouds
The selection of a processor is a critical task in cloud computing. The following points have to be considered when choosing a processor for a cloud platform:
- Good virtualization efficiency.
- Good power efficiency.
- Effective use of virtual memory.
- The ability to virtualize I/O.
The cloud platform differs considerably from a general computing platform. Here, good virtualization efficiency means that the processor should be efficient at encryption and communication rather than at visualization. Power efficiency also plays a key role. We need to ask whether multicore processors are more efficient than uniprocessors. The answer depends on various factors, such as whether there is a single virtual machine on a physical server or several virtual machines mapped to a single physical server. The complexity in the first case is low, as resources are not shared and are available whenever the VM needs them, depending on the running application's requests; in the shared case, VMs may have to time-share some resources, such as network connections, I/O channels, graphics cards, or memory.
In this case it is important to:
a. Provide quick access to non-shared resources; e.g., having memory local to a processor helps.
b. Minimize contention for shared resources; e.g., having dedicated data paths to a shared resource, and a means to quickly change the context of that resource, helps.
In other words, an architecture that comes as close as possible to partitioning a server into "logical" servers is well suited for virtualization. When resources are shared, security becomes an important issue, so the adopted design should also provide secure isolation.
Single-Chip Cloud Computer/Intel
Designed and developed by Intel's Tera-scale Computing Research Program, this microprocessor contains 48 cores, mimicking a cloud of computers integrated onto one silicon chip. It can dynamically configure frequency and voltage to vary power consumption from 125 W down to as low as 25 W. Some of its key features are:
a. It incorporates technologies intended to scale multi-core processors to 100 cores.
b. Advanced power management technologies.
c. Support for message-passing.
d. Improved energy efficiency.
e. Improved core-to-core communication.
The name “Single-chip Cloud Computer” reflects the fact that the architecture resembles a scalable cluster of computers, such as in a cloud, integrated onto silicon.
The research chip features:
24 “tiles” with two IA cores per tile.
A 24-router mesh network with 256 GB/s bisection bandwidth.
4 integrated DDR3 memory controllers.
Hardware support for message-passing.
In SCC each core can run a separate Operating System and software stack while acting like a
separate compute node that can communicate with other nodes using a packet-based network.
The most important feature of the SCC's network fabric architecture is that it supports message-
passing programming models that can scale up to thousands of processors in cloud datacenters.
Each core has two levels of cache, and there is no hardware cache-coherence support among the cores; this lessens power utilization and motivates the investigation of on-chip, datacenter-style shared-memory software models. The researchers at Intel have successfully demonstrated both message-passing and software-based coherent shared memory on the SCC.
Lowering power consumption is a focus of the chip as well. Software applications are given control to turn cores on and off or to change their performance levels, continuously adapting to use the minimum energy needed at a given moment. The chip can run all 48 cores at one time over a range of 25 W to 125 W and selectively vary the frequency and voltage of the mesh network as well as of sets of cores. Each tile (2 cores) can have its own frequency, and groupings of four tiles (8 cores) can each run at their own voltage.
Top Level Architecture
The Management Console is used to load programs into SCC memory. Memory can also be dynamically mapped into the address space of the forty-eight cores, or into the memory space of the Management Console for program loading or debugging.
Input/output instructions on the SCC processor cores are likewise mapped to the system interface, and by default to the Management Console interface.
The SCC chip uses 4 memory controllers at the mesh border. The default boot configuration of the memory-configuration registers gives each SCC core access to a private memory region on one memory controller, plus shared access to the local memory buffers situated in every tile. This shared memory is used to pass messages between cores so that coherency can be maintained.
Main memory can be remapped to share regions among one or more cores. Thus, shared memory may be on-die or off-chip, the latter accessed through the memory controllers.
Top Level Architecture
L2 Cache
Each core has its own private 256 KB L2 cache and an associated controller. When a miss occurs, the cache controller sends the address to the Mesh Interface Unit (MIU) for decoding and retrieval. Each core can have only one outstanding memory request and will stall on missed reads until the data are returned. If a write miss happens, the processor continues operation until another miss of either type occurs; once the data arrive, the processor resumes normal operation. The memory and network system supports tiles with multiple outstanding requests.
LMB (Local Memory buffer)
In addition to the traditional cache structures, a local memory buffer capable of fast read/write operations has been added to each tile. The 16 KB buffer provides the equivalent of 512 full cache lines of memory. Any of the available cores, or the system interface, can read or write data in these twenty-four on-die message buffers. One of the envisioned uses for the buffer is message passing.
DDR3 Memory Controllers
The 4 memory controllers provide a total capacity of 64 GB of DDR3 memory, which physically resides on the SCC board. Every memory controller supports 2 unbuffered DIMMs per channel with 2 ranks per DIMM. The supported DRAM type is DDR3-800 x8 in 1 GB, 2 GB, or 4 GB sizes, which can lead to 16 GB of capacity per channel. The DDR3 protocol includes calibration, automatic training, compensation, and periodic refresh of the dynamic RAM. Memory accesses are processed in order, while accesses to different banks and ranks are interleaved to improve throughput.
LUT
Each core has a lookup table (LUT), a set of configuration registers in the Configuration Block that maps the core's physical addresses into the extended memory map of the system. Each lookup table contains 256 entries, one for each 16 MB segment of the core's 4 GB physical address space. Each entry can point to any memory location: the local memory buffer, private memory, the system interface, configuration registers, or system memory.
Lookup tables can also be programmed by writes through the system interface from the Management Console. They are set during the bootstrap process to an initial configuration that is used afterwards. After booting, the memory map can be dynamically modified by any core that has a mapping to the tables' location in the system address space.
When an L2 cache miss happens, the mesh interface unit consults the lookup table to determine where the memory request should be sent. Although the lookup table can be programmed however the user sees fit, a default memory map for all system memory sizes has been developed and is used.
MIU (Mesh Interface Unit)
The Mesh Interface Unit/MIU contains the following:
Packetizer and De-Packetizer
Command interpretation and address decode/lookup
Local configuration registers
Link level flow control and Credit Management
Arbiter
The packetizer/de-packetizer interprets the data to and from the agents and to and from the mesh. The command, data, and address buffers provide queuing. Specifically, the mesh interface unit takes a cache miss, decodes the address using the lookup table to map from the core to the system address, and then places the request into the appropriate queue. The queues are the following:
- Router-to-DDR3 requests
- Message Passing Buffer accesses
- Local configuration-register accesses
For traffic from the router, the mesh interface unit routes the data to the right local destination. Link-level flow control regulates the flow of data on the mesh using a credit-based protocol. The arbiter controls tile-element access to the mesh interface unit at any point in time using round-robin order.
The core reads and writes a 32-bit local address. The top eight bits of this address index the lookup table on any cache miss. The LUT returns 22 bits: a ten-bit system-address extension, a three-bit subdestID, an eight-bit tile ID, and a bypass bit. The system address is composed of 46 bits: the bypass bit + the 10-bit address extension + the 3-bit subdestID + the 8-bit tile ID + the 24 lower bits of the 32-bit core address.
The sub-destination ID (subdestID) defines the port through which the packet leaves the router. The tile ID gets the packet to the tile; from there, the subdestID selects a memory controller, the power controller, or the system interface (SIF).
The bypass bit specifies local tile memory-buffer access.
The lower 34 bits of the 46-bit system address are sent to the destination specified by the tile ID. The tile ID represents the tile coordinates in a (Y, X) format, four bits each.
When the request arrives at the appropriate tile, the mesh interface unit examines the lower thirteen bits of the address to determine which operation to perform, as is the case for lookup-table read/write operations. The remaining operations have the operation specifics (write, read, etc.) sent automatically as part of the command code by the originator of the request. Note that each operation will only read or write a certain number of data bits.
SCC Power Controller (VRC)
The VRC has its own destination target in each core's memory map and thus its own entry in the lookup table. A core sends a write request to an address whose top eight bits match the VRC entry in the lookup table, which ensures the data packet is sent to the VRC. A core or the system interface can write to this memory location, and the write is decoded as a command for the power controller. The command is routed to the VRC across the mesh and executed. The power controller accepts the command, adjusts the voltage, and then sends an acknowledgment back to the tile so that it knows the command completed successfully.
When the Single-chip Cloud Computer board is started up, the power controller must be written first, to power on and reset the tiles. Once started, it receives additional requests to increase voltage for faster operation, or to power down a quadrant or lower its voltage for power efficiency.
An abstract programming interface has been developed to control the voltage. The voltage is altered by writing a seventeen-bit value to the power-controller register, in which bit 16 must be one. Bits 15:11 are ignored, while bits 10:8 select a voltage domain.
There are eight voltage domains, V0 through V7, also called voltage islands. Voltage domains V2 and V6 are the same and are used for the mesh and the system interface. The remaining six voltage domains (V0, V1, V3, V4, V5, and V7) each cover a two-by-two tile array.
V0 is the voltage domain in the upper left. The domain number increments moving to the right, skipping V2. The voltage domains in the upper row are V0, V1, and V3; those in the lower row are V4, V5, and V7 (V6 is skipped). Bits 7:0 hold the VID value, an eight-bit field that can specify 256 values.
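The island layout and the command format above can be combined into a short sketch. The 6x4 tile-grid coordinates, the helper names, and the example VID are our own illustrative choices; the bit packing follows the description (bit 16 set, bits 10:8 domain, bits 7:0 VID).

```python
def voltage_domain(tile_x, tile_y):
    """Map a tile's (x, y) position in the 6x4 grid to its 2x2 voltage
    island. Upper row of islands: V0, V1, V3; lower row: V4, V5, V7
    (V2 and V6 serve the mesh and system interface). (0, 0) is upper left.
    """
    row = tile_y // 2              # which row of 2x2 islands
    col = tile_x // 2              # which column of 2x2 islands
    return 4 * row + [0, 1, 3][col]   # skip V2 (and, via the +4, V6)

def vrc_command(domain, vid):
    """Build the 17-bit power-controller write value: bit 16 must be 1,
    bits 10:8 select the voltage domain, bits 7:0 carry the VID
    (bits 15:11 are don't-cares and are left zero here)."""
    assert 0 <= domain <= 7 and 0 <= vid <= 0xFF
    return (1 << 16) | (domain << 8) | vid

assert voltage_domain(0, 0) == 0       # upper-left island is V0
assert voltage_domain(5, 3) == 7       # lower-right island is V7
print(hex(vrc_command(voltage_domain(5, 3), 0x80)))   # 0x10780
```

The exact register semantics on real hardware are richer than this; the sketch only shows how the fields of the seventeen-bit value fit together.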
AMD Opteron
The Opteron 6000 and 4000 series launched by AMD cater to enterprise and cloud computing. Case studies have shown that they can give businesses up to 84 per cent higher performance [10]. The Opteron 4200 series supports up to 8 cores to handle more virtual machines, while the 6200 series goes up to sixteen cores.
The AMD Opteron processors and chipsets provide a powerful foundation for constructing installations that are energy efficient and easily manageable, with the overall performance required for the workload. AMD has redesigned the core architecture to enhance execution paths, which helps decrease power utilization. With the new architecture, featuring four-core through sixteen-core processors, cloud and web deployments that benefit from higher core density can handle increasing numbers of transactions and can have up to 160% more real cores per server.
- Direct Connect Architecture 2.0 offers superior memory bandwidth, scalability, and I/O performance.
- AMD Turbo CORE technology provides faster clock speeds for improved performance of CPU-intensive workloads.
- Greater core density allows you to scale out cloud deployments with fewer nodes, saving floor space and power without compromising on scalability.
Virtualization features are built directly into the silicon of AMD Opteron processors. The important benefits provided by incorporating this technology are the following:
- Simplify your server and client systems' infrastructure.
- Minimize power and cooling costs.
- Streamline deployments and upgrades.
- Maximize your software investment.
- Improve system performance, manageability, and data security.
- Minimize datacenter space and overhead expenses.
Features and their benefits:
- Virtualization extensions to the x86 instruction set: enable software to more efficiently create virtual machines so that multiple operating systems and their applications can run simultaneously on the same computer.
- Tagged TLB: hardware features that facilitate efficient switching between virtual machines for better application responsiveness.
- Rapid Virtualization Indexing (RVI): helps accelerate the performance of many virtualized applications by enabling hardware-based virtual machine memory management.
- AMD-V Extended Migration: helps virtualization software perform live migrations of virtual machines between all available AMD Opteron processor generations.
- I/O Virtualization: enables direct device access by a virtual machine, bypassing the hypervisor, for improved application performance and improved isolation of virtual machines, for increased integrity and security.
This table is taken from the AMD specification sheet. [11]
Memory Cloud
In Internet hosting and cloud-based data centers, virtual machine monitors (VMMs) are an important platform for offering services. By distributing hardware resources among different VMs, a VMM reduces the management burden of hosting centers, and effective management can make use of the available multiple processors. However, main memory is the main bottleneck to achieving such large-scale consolidation, because it does not adapt to multiplexing the way processors do. To remove this bottleneck, researchers and VM suppliers have built various products. The most interesting one we found is the "Difference Engine", an extension to the Xen virtual machine monitor (discussed above) that supports both sub-page-level sharing using page patching and in-core memory compression. The VMware ESX server, one variant of this kind of product, implements content-based page sharing, which is most useful for homogeneous VMs; because the savings depend on the nature of the OS and guest VMs, this category of product does not extend to heterogeneous environments.
In any environment with multiple OSes and VMs, many pages are nearly identical. Finding these identical and similar pages lets us combine them: identical pages are stored once in much smaller memory, and similar pages are stored as patches. Comparing the Difference Engine with VMware ESX, we find that the Difference Engine is much better, as it performs well even in a distributed heterogeneous environment. Such systems can also use the memory freed by compression to support additional virtual machines alongside the required VMs.
Inside the Difference Engine
The Difference Engine builds on the following principles:
- Page sharing
- Patching
- Delta encoding
- Memory compression
The figures above show how these principles are implemented; we describe their working and importance in detail in the following part.
Page Sharing
Both VMware ESX and the Difference Engine use this principle to find repeated pages in the system. It works as follows: the system keeps track of pages and maintains a hash of each page's contents; a redundant page produces a hash collision, identifying a potential duplicate. A byte-by-byte comparison then ensures the pages are indeed identical before they are shared; this confirms the target for sharing, and the virtual memory is updated to point to the shared copy. A write to a shared page triggers a page fault that is caught by the VMM, which gives a private copy of the shared page to the VM that caused the fault and updates the virtual memory mappings appropriately. If no VMs still refer to the shared page, the VMM reclaims the memory and returns it to the memory pool. Further mechanisms are built on top of this to increase performance and to integrate with the global-lock scenario.
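The hash-then-verify-then-copy-on-write cycle of page sharing can be sketched in a few lines of Python. The class and all of its bookkeeping are our own illustration; this is not code from ESX or the Difference Engine.

```python
import hashlib

class PageSharer:
    """Toy content-based page sharing (class and names are illustrative)."""
    def __init__(self):
        self.by_hash = {}    # content hash -> canonical shared page
        self.mapping = {}    # (vm, virtual page number) -> page frame

    def insert(self, vm, vpn, page):
        h = hashlib.sha1(page).digest()
        canonical = self.by_hash.get(h)
        if canonical is not None and canonical == page:  # byte-by-byte check
            self.mapping[(vm, vpn)] = canonical          # point at shared copy
        else:
            self.by_hash[h] = page
            self.mapping[(vm, vpn)] = page

    def write(self, vm, vpn, new_page):
        # A write to a shared page faults into the VMM, which hands the
        # faulting VM a private copy (copy-on-write); modeled as rebinding.
        self.mapping[(vm, vpn)] = new_page

s = PageSharer()
s.insert("vm1", 0, b"A" * 4096)
s.insert("vm2", 9, b"A" * 4096)          # identical content -> shared frame
assert s.mapping[("vm1", 0)] is s.mapping[("vm2", 9)]
s.write("vm2", 9, b"B" * 4096)           # CoW breaks the sharing for vm2
assert s.mapping[("vm1", 0)] == b"A" * 4096
```

The byte-by-byte comparison after the hash hit matters: hashing alone only flags a candidate, and sharing must never be based on a possible collision.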
Patching
The ultimate aim of patching is to reduce the memory used to store redundant page information: it eliminates redundant images of pages in memory. The Difference Engine does this by creating patches that represent a page in terms of a similar page. Sub-page sharing raised several issues, such as identifying candidate reference pages. The Difference Engine uses a parameterized scheme to identify similar pages based on the hashes of several 64-byte portions of a page. Hash Similarity Detector (k, s) hashes the contents of (k * s) 64-byte blocks at locations chosen randomly on the page, and then collects these hashes into k groups of s hashes. Hash Similarity Detector (1, 2) merges the hashes from two locations in the page into a single index of length two. Hash Similarity Detector (2, 1), by contrast, indexes each page twice: first based on the contents of the first block, and then based on the contents of the second block. Pages that match on at least one of the two blocks are chosen as candidates. Figure 2 shows the effect of the scheme for different settings on the workloads described above: the x-axis shows values in the format (k, s), c, and the y-axis plots the total savings from patching after all identical pages have been shared. The savings account for the memory used to store the shared and patched/compressed pages.
For these workloads, Hash Similarity Detector (2, 1) with one candidate does surprisingly well. There is a large gain from hashing two different blocks of the page separately, but little additional gain from hashing more blocks. Combining blocks does not help much, at least for these workloads, and storing many candidates in one hash bucket also produces little extra gain. Hence, the Difference Engine indexes a page by hashing 64-byte blocks at two randomly selected locations and using each hash value as a distinct index to store the page in the hash table. To find a similar page, the system computes the hashes at the same two locations, searches those hash-table entries, and chooses the better of the two pages found.
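Hash Similarity Detector (2, 1) can be sketched in a few lines. The two block locations are fixed here for reproducibility (the Difference Engine chooses them randomly once), and the candidate store is a plain dictionary rather than the real hash table.

```python
BLOCK = 64
LOCS = (512, 2048)   # fixed for this sketch; chosen randomly in the real system

def hsd_2_1_keys(page):
    """HashSimilarityDetector(2, 1): hash the 64-byte block at each of two
    page locations; each hash serves as a separate index for the page."""
    return [hash(page[p:p + BLOCK]) for p in LOCS]

index = {}

def store(page):
    for k in hsd_2_1_keys(page):
        index.setdefault(k, page)

def find_candidate(page):
    # Pages that match on at least one of the two blocks are candidates.
    for k in hsd_2_1_keys(page):
        if k in index:
            return index[k]
    return None

base = bytes(4096)                                  # a stored reference page
store(base)
similar = b"patched-header!!" + bytes(4096 - 16)    # differs only at the front
assert find_candidate(similar) is base              # still found as a candidate
```

Because `similar` differs from `base` only in its first 16 bytes, both sampled blocks still match, so the page is flagged as a patching candidate even though it is not identical.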
Delta Encoding
This principle draws on research into finding similarity between files in large systems: fingerprints are computed over fixed-size blocks at multiple offsets of the files, and a large number of matching fingerprints gives a strong indication of similarity between files. However, the approach had limitations: it did not scale to a dynamically evolving virtual memory system, and it was insufficient for finding the matching set among a large number of candidate pages. To cover these issues, delta encoding was applied in a refined fashion to compress similar pages.
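As a rough illustration of delta encoding, the sketch below stores only the 64-byte blocks of a page that differ from a chosen reference page. The real system uses a finer-grained byte-level encoding; the block size and helper names here are our own choices.

```python
def make_patch(ref, page, block=64):
    """Delta-encode `page` against a similar reference page: keep only
    the blocks that differ, as (offset, bytes) pairs."""
    return [(i, page[i:i + block])
            for i in range(0, len(page), block)
            if page[i:i + block] != ref[i:i + block]]

def apply_patch(ref, patch):
    """Rebuild the original page from the reference plus the patch."""
    out = bytearray(ref)
    for offset, data in patch:
        out[offset:offset + len(data)] = data
    return bytes(out)

ref = bytes(4096)
page = b"\x01" * 64 + bytes(4096 - 64)    # differs from ref in one block
patch = make_patch(ref, page)
assert len(patch) == 1                    # only one block needs storing
assert apply_patch(ref, patch) == page    # the page is fully recoverable
```

The saving is the point: instead of 4096 bytes, only one 64-byte block (plus its offset) is stored, while the page remains exactly reconstructible from the reference.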
Memory Compression
Memory compression saves memory in the system; it is sometimes beneficial, but the performance overhead can outweigh the memory savings, so compression or encoding is useful only if the compression ratio is substantially high. For pages that are not similar to other pages in memory, the Difference Engine encodes them to decrease the memory footprint. It supports multiple compression algorithms. It invalidates compressed pages in the VM and saves them in a heap area in machine memory. When a virtual machine accesses a compressed page, the Difference Engine decompresses the page and returns it to the virtual machine; the page remains decompressed until it is again selected for compression.
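The ratio trade-off can be sketched with zlib: compress a page only when the achieved ratio clears a threshold, otherwise keep it raw. The threshold value and the function names are illustrative, not from the Difference Engine.

```python
import os
import zlib

def maybe_compress(page, min_ratio=2.0):
    """Compress a page only if the achieved ratio is high enough;
    otherwise the CPU overhead is not worth the memory savings.
    (The 2.0 threshold is an illustrative choice.)"""
    packed = zlib.compress(page)
    if len(page) / len(packed) >= min_ratio:
        return ("compressed", packed)
    return ("raw", page)

def restore(entry):
    """Undo maybe_compress: decompress on access, as on a page fault."""
    kind, data = entry
    return zlib.decompress(data) if kind == "compressed" else data

zero_page = bytes(4096)                  # highly compressible
entry = maybe_compress(zero_page)
assert entry[0] == "compressed"
assert restore(entry) == zero_page

random_page = os.urandom(4096)           # effectively incompressible
assert maybe_compress(random_page)[0] == "raw"
```

The two test pages show the two regimes: a zero page compresses to a few bytes and is worth storing compressed, while random data gains nothing and is left alone.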
Future Work and Conclusion
This report presented a detailed study of how cloud computing works and of how shared resources are shared and managed. The processors used for cloud computing are specially designed to support virtualization, and hypervisors play a critical role in managing shared resources such as I/O and memory. The hypervisor is a new battleground nowadays; the trend is shifting from the OS to the hypervisor.
Hypervisors are designed to support the guest OS while still mediating hardware access. Enhancements are under way in this area to give the guest OS direct access to hardware so that performance can be improved. The processors discussed, the SCC and the Opteron, provide native support for virtualization and hence are well suited to cloud computing.
More work is being done to enhance both critical components: hypervisors and processors. The Intel SCC is a remarkable experimental product in this range, designed to scale up to 100 cores on a single chip. Hypervisors are also being improved with efficiency in mind. Current trends support development in this area at great velocity. Memory, the bottleneck in large-scale computation, is being tweaked to handle billions of transactions; the concept of the memory cloud has thus been added to the arsenal of cloud computing to support server-cluster systems. The principles used by the memory cloud are robust enough to handle such scenarios and flexible enough to accommodate new design principles. Many VM vendors and researchers are working at full throttle to develop an efficient and failure-free cloud environment.
We also wanted to look at actual implementations of Xen and Rackspace to test and observe their resource sharing. The source code of Xen can be obtained for educational purposes, and Amazon EC2 provides 300+ free hours, but since testing would take a few weeks we were not able to perform and analyze actual experiments.
References:
[1] Intel Tera-scale Computing Research Program. Web. 12 Dec. 2011. <http://techresearch.intel.com/ProjectDetails.aspx?Id=1>.
[2] Wikipedia. "Blue Gene." Web. 12 Dec. 2011. <http://en.wikipedia.org/wiki/Blue_Gene>.
[3] Xen.org. "How Does Xen Work?" 2009. 1-10. Web. 12 Dec. 2011.
[4] Xen wiki. Web. 12 Dec. 2011. <http://wiki.xen.org/>.
[5] Jones, M. Tim. "Anatomy of a Linux Hypervisor." IBM developerWorks, 2009. Web. 12 Dec. 2011. <http://www.ibm.com/developerworks/linux/library/l-hypervisor/index.html?ca=dgr-jw64Lnx-Hypervisor&S_TACT=105AGY46&S_CMP=grjw64>.
[6] Sarathy, Vijay, Purnendu Narayan, and Rao Mikkilineni. "Next Generation Cloud Computing Architecture." Los Altos. 1-6. Web. 12 Dec. 2011.
[7] Wikipedia. "Xen." Web. 12 Dec. 2011. <http://en.wikipedia.org/wiki/Xen>.
[8] Intel Labs. "SCC External Architecture Specification (EAS)." Rev. 0.94, 2010. 1-44. Web. 12 Dec. 2011. <http://techresearch.intel.com/spaw2/uploads/files//SCC_EAS.pdf>.
[9] Wikipedia. "Cloud Computing." Web. 12 Dec. 2011. <http://en.wikipedia.org/wiki/Cloud_computing>.
[10] Lui, Spandas. "AMD Launches Opteron Processors for Virtualisation and Cloud Computing." ITworld. Web. 12 Dec. 2011. <http://www.itworld.com/hardware/230919/amd-launches-opteron-processors-virtualisation-and-cloud-computing>.
[11] AMD. "AMD Virtualization (AMD-V) Technology." Web. 12 Dec. 2011. <http://sites.amd.com/us/business/it-solutions/virtualization/Pages/virtualization.aspx#2>.
[12] vzxen. "What is Xen Hypervisor?" Web. 12 Dec. 2011. <http://vzxen.com/features>.
[13] Wikipedia. "Virtualization." Web. 12 Dec. 2011. <http://en.wikipedia.org/wiki/Virtualization>.