The Need
Bill St. Arnaud (CANARIE, Inc)
Impact of ICT Industry
Bill St. Arnaud (CANARIE, Inc)
Virtualization Techniques
Bill St. Arnaud (CANARIE, Inc)
Virtual Machine Monitors (IBM 1960s)
A thin software layer that sits between the hardware and the operating system, virtualizing and managing all hardware resources.
[Diagram: apps running on CMS and MVS guest systems, hosted by IBM VM/370 on an IBM mainframe]
Ed Bugnion, VMWare
Old idea from the 1960s
• IBM VM/370
  – A VMM for the IBM mainframe
  – Multiple OS environments on expensive hardware
  – Desirable when few machines were around
• Popular research idea in the 1960s and 1970s
  – Entire conferences on virtual machine monitors
  – Hardware/VMM/OS designed together
• Interest died out in the 1980s and 1990s
  – Hardware got cheap
  – Operating systems got more powerful (e.g., multi-user)
Ed Bugnion, VMWare
A return to Virtual Machines
• Disco: Stanford research project (1996–):
  – Run commodity OSes on scalable multiprocessors
  – Focus on high-end: NUMA, MIPS, IRIX
• Hardware has changed:
  – Cheap, diverse, graphical user interfaces
  – Designed without virtualization in mind
• System software has changed:
  – Extremely complex
  – Advanced networking protocols
  – But even today:
    • Not always multi-user
    • With limitations, incompatibilities, …
Ed Bugnion, VMWare
The Problem Today
[Diagram: a single operating system running directly on the Intel Architecture]
Ed Bugnion, VMWare
The VMware Solution
[Diagram: multiple operating systems, each on its own (virtual) Intel Architecture, sharing one physical machine]
Ed Bugnion, VMWare
VMware™ MultipleWorlds™ Technology
A thin software layer that sits between Intel hardware and the operating system, virtualizing and managing all hardware resources.
[Diagram: apps running on Win2000, WinNT, Linux and Win2000 guests, hosted by VMware MultipleWorlds on the Intel Architecture]
Ed Bugnion, VMWare
MultipleWorlds Technology
A world is an application execution environment with its own operating system
[Diagram: each “world” (Win2000, WinNT, Linux, Win2000) runs its apps on VMware MultipleWorlds on the Intel Architecture]
Ed Bugnion, VMWare
Virtual Hardware
[Diagram: devices presented to each world by the virtual machine monitor (VMM): floppy disks, parallel ports, serial/COM ports, Ethernet, keyboard, mouse, monitor, IDE controller, SCSI controller, sound card]
Ed Bugnion, VMWare
Attributes of MultipleWorlds Technology
• Software compatibility
  – Runs pretty much all software
• Low overheads / high performance
  – Near “raw” machine performance
• Complete isolation
  – Total data isolation between virtual machines
• Encapsulation
  – Virtual machines are not tied to physical machines
• Resource management
Ed Bugnion, VMWare
Hosted VMware Architecture
VMware achieves both near-native execution speed and broad device support by transparently switching* between Host Mode and VMM Mode.
[Diagram: in Host Mode, the VMware App runs on the host OS alongside host OS apps; in VMM Mode, the Virtual Machine Monitor runs the guest operating system and its applications; the VMware driver bridges the two modes on top of the PC hardware (CPU, memory, disks, NIC)]
VMware, acting as an application, uses the host to access other devices such as the hard disk, floppy, or network card
The VMware Virtual machine monitor allows each guest OS to directly access the processor (direct execution)
*VMware typically switches modes 1000 times per second
Ed Bugnion, VMWare
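The Host Mode / VMM Mode split above can be illustrated with a minimal sketch; everything below (the ToyGuest class, run_hosted_vmm, the event strings) is an assumption made for the example, not VMware's actual implementation.

```python
# Illustrative sketch only: a hosted VMM alternates between VMM Mode, where
# guest instructions run directly on the CPU, and Host Mode, where device
# accesses are forwarded to the host OS (via the VMApp/VMDriver on the slide).

class ToyGuest:
    """Hypothetical guest that alternates compute phases and I/O requests."""
    def __init__(self, io_requests):
        self.io_requests = list(io_requests)

    def execute_until_exit(self):
        # Stand-in for direct execution: run until the next device access
        return self.io_requests.pop(0) if self.io_requests else None

def run_hosted_vmm(guest):
    while True:
        event = guest.execute_until_exit()      # VMM Mode: direct execution
        if event is None:
            break                               # guest halted
        # Host Mode: the VMApp asks the host OS to service the device access
        print(f"world switch -> host OS services: {event}")

run_hosted_vmm(ToyGuest(["disk read", "network send", "disk write"]))
```

In a real hosted VMM this switch is frequent (the slide cites roughly 1000 mode switches per second), which is why keeping each switch cheap matters.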
Hosted VMM Architecture
• Advantages:
  – Installs and runs like an application
  – Portable – the host OS does the I/O access
  – Coexists with applications running on the host
• Limits:
  – Subject to the host OS:
    • Resource management decisions
    • OS failures
• USENIX 2001 paper: J. Sugerman, G. Venkitachalam and B.-H. Lim, “Virtualizing I/O on VMware Workstation’s Hosted Architecture”.
Ed Bugnion, VMWare
Virtualizing a Network Interface
[Diagram: the guest OS NIC driver drives a virtual NIC exposed by the VMM; the VMDriver and VMApp relay frames to the host OS NIC driver, which reaches the physical NIC and the physical Ethernet; a virtual bridge / virtual network hub connects the virtual and physical sides]
Ed Bugnion, VMWare
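As a rough illustration of the data path in the diagram above, the toy bridge below forwards frames between a guest-side virtual NIC and the physical NIC; the class and method names are invented for this sketch and do not reflect VMware's VMApp/VMDriver internals.

```python
# Toy model (assumed names): frames from the guest's virtual NIC are handed
# to a bridge, which pushes them out of the physical NIC; inbound frames take
# the reverse path, mirroring the virtual bridge / hub in the diagram.

class Nic:
    def __init__(self, name):
        self.name = name

    def receive(self, frame):
        print(f"{self.name} received: {frame}")

class VirtualBridge:
    """Connects one virtual NIC (guest side) to one physical NIC (host side)."""
    def __init__(self, virtual_nic, physical_nic):
        self.virtual_nic = virtual_nic
        self.physical_nic = physical_nic

    def from_guest(self, frame):
        # Guest NIC driver -> VMM -> VMDriver/VMApp -> host NIC driver -> wire
        self.physical_nic.receive(frame)

    def from_wire(self, frame):
        # Wire -> host NIC driver -> VMApp/VMDriver -> VMM -> guest NIC driver
        self.virtual_nic.receive(frame)

guest_nic, phys_nic = Nic("guest virtual NIC"), Nic("physical NIC")
bridge = VirtualBridge(guest_nic, phys_nic)
bridge.from_guest("ARP request from guest")
bridge.from_wire("ARP reply from LAN")
```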
The rise of data centers
• Single place for hosting servers and data
• ISPs now take machines hosted at data centers
• Run by large companies – like BT
• Manage:
  – Power
  – Computation + data
  – Cooling systems
  – Systems admin + network admin
Data Centre in Tokyo (From: Satoshi Matsuoka)
http://www.attokyo.co.jp/eng/facility.html
Martin J. Levy (Tier1 Research) and Josh Snowhorn (Terremark)
Requirements
• Power is an important design constraint:
  – Electricity costs
  – Heat dissipation
• Two key options in clusters – enable scaling of:
  – Operating frequency (square relation)
  – Supply voltage (cubic relation)
• Balance QoS requirements – e.g. the fraction of workload to process locally – with power management
From: Salim Hariri, Mazin Yousif
From: Justin Moore, Ratnesh Sharma, Rocky Shih, Jeff Chase, Chandrakant Patel, Partha Ranganathan (HP Labs)
Martin J. Levy (Tier1 Research) and Josh Snowhorn (Terremark)
The case for power management in HPC
• Power/energy consumption is a critical issue
  – Energy = heat; heat dissipation is costly
  – Limited power supply
  – Non-trivial amount of money
• Consequence
  – Performance is limited by available power
  – Fewer nodes can operate concurrently
• Opportunity: bottlenecks
  – A bottleneck component limits the performance of the other components
  – Reduce the power of some components without reducing overall performance
• Today, the CPU is:
  – A major power consumer (~100 W),
  – Rarely the bottleneck, and
  – Scalable in power/performance (frequency & voltage) – power/performance “gears”
Is CPU scaling a win?
• Two reasons:
  1. Frequency and voltage scaling – the performance reduction is less than the power reduction
  2. Application throughput – the throughput reduction is less than the performance reduction
• Assumptions
  – CPU is a large power consumer
  – CPU drives performance
  – Diminishing throughput gains
[Plots: (1) power vs. performance (frequency); (2) application throughput vs. performance (frequency)]
CPU power: P = ½·C·V²·f
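As a quick sanity check of the relation above, the snippet below evaluates the dynamic-power formula at the gear settings listed on the LMBench slide that follows. The capacitance C is unknown, so only ratios relative to the top gear are computed.

```python
# P = 1/2 * C * V^2 * f (dynamic CPU power). C cancels when comparing gears,
# so power is reported relative to the top gear (2000 MHz, 1.5 V).

def relative_power(freq_mhz, volts, base=(2000, 1.5)):
    base_f, base_v = base
    return (volts ** 2 * freq_mhz) / (base_v ** 2 * base_f)

gears = {0: (2000, 1.5), 1: (1800, 1.4), 2: (1600, 1.3),
         3: (1400, 1.2), 4: (1200, 1.1), 6: (800, 0.9)}

for gear, (f, v) in gears.items():
    print(f"gear {gear}: {f} MHz, {v} V -> "
          f"{relative_power(f, v):.2f}x power, {f / 2000:.2f}x frequency")

# e.g. gear 6 runs at 0.40x the frequency but only ~0.14x the dynamic power,
# which is why the power reduction can exceed the performance reduction.
```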
AMD Athlon-64
• x86 ISA
• 64-bit technology
• HyperTransport technology – fast memory bus
• Performance
  – Slower clock frequency
  – Shorter pipeline (12 vs. 20 stages)
  – SPEC2K results
    • A 2 GHz AMD-64 is comparable to a 2.8 GHz P4
    • The P4 is better on average by 10% & 30% (INT & FP)
• Frequency and voltage scaling
  – 2000 – 800 MHz
  – 1.5 – 1.1 Volts
From: Vincent W. Freeh (NCSU)
LMBench results
• LMBench
  – Benchmarking suite
  – Low-level, micro data
• Test each “gear”

Gear  Frequency (MHz)  Voltage (V)
0     2000             1.5
1     1800             1.4
2     1600             1.3
3     1400             1.2
4     1200             1.1
6     800              0.9
From: Vincent W. Freeh (NCSU)
Operating system functions
From: Vincent W. Freeh (NCSU)
Communication
From: Vincent W. Freeh (NCSU)
The problem
• Peak power limit, P
  – Rack power
  – Room/utility
  – Heat dissipation
• Static solution: the number of servers is
  – N = P/Pmax
  – where Pmax is the maximum power of an individual node
• Problem
  – Peak power > average power (Pmax > Paverage)
  – Does not use all the power – N·(Pmax − Paverage) is unused
  – Under-performs – performance is proportional to N
  – Power consumption is not predictable
From: Vincent W. Freeh (NCSU)
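A small worked example of the static sizing above; the budget and per-node figures (10 kW, 250 W peak, 150 W average) are assumptions chosen for illustration, not numbers from the slides.

```python
# Static provisioning: size the cluster for worst-case node power.
P_budget = 10_000      # total power limit P, in watts (assumed)
P_max = 250            # peak power per node, in watts (assumed)
P_avg = 150            # average power per node, in watts (assumed)

N = P_budget // P_max                 # N = P / Pmax
unused = N * (P_max - P_avg)          # power typically left unused
print(f"N = {N} nodes, ~{unused} W of the budget unused on average")
# -> N = 40 nodes, ~4000 W unused: the budget is reserved for peaks that
#    rarely occur, which is the problem the next slides address.
```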
Safe over provisioning in a cluster
• Allocate and manage power among M > N nodes
  – Pick M > N, e.g., M = P/Paverage
  – M·Pmax > P
  – Plimit = P/M
• Goal
  – Use more power, safely under the limit
  – Reduce the power (& peak CPU performance) of individual nodes
  – Increase overall application performance
[Plots: power P(t) over time – left, static provisioning with Pmax and Paverage marked; right, over-provisioning with Plimit, Paverage and Pmax marked]
From: Vincent W. Freeh (NCSU)
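Continuing with the same assumed figures, the over-provisioned sizing from this slide works out as follows.

```python
# Safe over-provisioning: size for average power, then cap each node.
P_budget, P_max, P_avg = 10_000, 250, 150   # same assumed figures as above

N = P_budget // P_max                       # static sizing
M = P_budget // P_avg                       # over-provisioned: M = P / Paverage
P_limit = P_budget / M                      # per-node cap, Plimit = P / M

print(f"M = {M} nodes (vs N = {N}), per-node cap Plimit = {P_limit:.0f} W")
assert M * P_max > P_budget                 # why the cap is needed: M*Pmax > P
# -> M = 66 nodes capped at ~152 W each: more nodes run concurrently, but each
#    must be throttled (e.g. by gear selection) to stay under Plimit.
```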
Safe over provisioning in a cluster
• Benefits
  – Less “unused” power/energy
  – More efficient power use
• More performance under the same power limitation
  – Let P be per-node performance at full power and P* at the reduced power limit
  – Then more performance means: M·P* > N·P
  – Or P*/P > N/M, i.e. P*/P > Plimit/Pmax
[Plots as above: static provisioning leaves the “unused energy” between Pmax and P(t); over-provisioning runs M nodes capped at Plimit]
From: Vincent W. Freeh (NCSU)
When is this a win?
• When P*/P > N/M, or P*/P > Plimit/Pmax
  – In words: the power reduction is greater than the performance reduction
• Two reasons:
  1. Frequency and voltage scaling
  2. Application throughput
[Plots: (1) power vs. performance (frequency) and (2) application throughput vs. performance (frequency), with the regions P*/P < Paverage/Pmax and P*/P > Paverage/Pmax marked]
From: Vincent W. Freeh (NCSU)
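To make the condition concrete, the check below uses the assumed sizing from the earlier sketches (N = 40, M = 66) together with an assumed per-node performance ratio P*/P of 0.75 at the capped gear.

```python
# Win condition: over-provisioning pays off when P*/P > N/M (= Plimit/Pmax),
# i.e. the per-node performance lost at the slower gear is smaller than the
# fraction of nodes that would otherwise have to be left out.

N, M = 40, 66          # from the assumed sizing above
perf_ratio = 0.75      # assumed P*/P: per-node throughput at the capped gear

threshold = N / M      # equals Plimit / Pmax
if perf_ratio > threshold:
    print(f"win: {M} capped nodes outperform {N} uncapped ones "
          f"({perf_ratio:.2f} > {threshold:.2f})")
else:
    print(f"no win: the performance loss is too large "
          f"({perf_ratio:.2f} <= {threshold:.2f})")
```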
Feedback-directed, adaptive power control
• Uses feedback to control power/energy consumption
  – Given a power goal
  – Monitor energy consumption
  – Adjust the power/performance of the CPU
• Several policies
  – Average power
  – Maximum power
  – Energy efficiency: select the slowest gear (g) such that
From: Vincent W. Freeh (NCSU)
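A minimal sketch of such a feedback loop, shown for the average-power policy only; read_power_watts() and set_gear() are hypothetical stand-ins for real power counters and DVFS controls (e.g. RAPL readings and cpufreq settings).

```python
import random
import time

# Hypothetical hooks (assumptions): a real controller would read hardware
# power/energy counters and change CPU frequency/voltage through the OS.
def read_power_watts():
    return random.uniform(80, 120)        # placeholder measurement

def set_gear(gear):
    print(f"switching to gear {gear}")    # placeholder actuator

def average_power_control(power_goal_watts, steps=20, interval_s=0.1,
                          gears=(0, 1, 2, 3, 4, 6)):
    """Keep the running average power near the goal: shift to a slower gear
    when over budget, shift back up when there is headroom."""
    current = 0                           # index into gears, start at fastest
    total_energy = 0.0
    for step in range(1, steps + 1):
        time.sleep(interval_s)
        total_energy += read_power_watts() * interval_s   # monitor consumption
        avg_power = total_energy / (step * interval_s)
        if avg_power > power_goal_watts and current < len(gears) - 1:
            current += 1                  # over the goal: slow down
        elif avg_power < power_goal_watts and current > 0:
            current -= 1                  # under the goal: speed back up
        set_gear(gears[current])

average_power_control(power_goal_watts=100)
```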
A more holistic approach: Managing a Data Center
From: Justin Moore, Ratnesh Sharma, Rocky Shih, Jeff Chase, Chandrakant Patel, Partha Ranganathan (HP Labs)
CRAC: Computer Room Air Conditioning units
From: Justin Moore, Ratnesh Sharma, Rocky Shih, Jeff Chase, Chandrakant Patel, Partha Ranganathan (HP Labs)
Location of Cooling Units
Six CRAC units are serving 1000 servers, consuming 270 kW of power out of a total capacity of 600 kW
http://blogs.zdnet.com/BTL/?p=4022
From: Justin Moore, Ratnesh Sharma, Rocky Shih, Jeff Chase, Chandrakant Patel, Partha Ranganathan (HP Labs)
From: Satoshi Matsuoka