v2.2 Page 2
Shift Happens...
Requires...
● More performance
● More IP-based services
● More pervasive security
● More network bandwidth
● More network threads
All of these requirements must be delivered within increasingly constrained space and power envelopes
More...The Swelling Network Tide
v2.2 Page 3
Agenda
● Welcome & Introduction
● Understanding the Technology
  – Server Virtualisation
  – Chip Multi-Threading
  – Logical Domains
  – Native & Branded Solaris Containers
  – Sun xVM
● Total Cost of Ownership Benefits
● Coffee & Tea Break
● Demonstration of LDoms in detail
  – Creation / Deletion
  – Resource Allocation / Re-allocation
  – Rapid deployments (cloning)
v2.2 Page 4
Agenda
● Demonstration of LDoms in action
  – Building and cloning a Glassfish application
  – Load test and resource re-allocation using a content management application built on a SAMP stack
  – Oracle log shipping between LDoms across systems
  – Solaris 8 containers with a legacy Oracle database
  – Solaris 8 to Solaris 10 binary compatibility
● Zeus Application Delivery Controller
v2.2 Page 6
Addressing Virtualisation Challenges

(Diagram: the virtualisation spectrum, from a trend to isolation with multiple OS instances on one side to a trend to flexibility within a single OS instance on the other. Example workloads shown include app, database, web, mail, file, calendar, identity and Sun Ray servers.)
● Hard Partitions: Dynamic System Domains
● Virtual Machines: Sun xVM (LDoms & xVM Server), VirtualBox, VMware, Microsoft Hyper-V
● OS Virtualisation: Solaris Containers, Solaris Containers for Linux Apps, Solaris 8 & 9 Containers
● Resource Management: Solaris Resource Manager
v2.2 Page 7
Systems Virtualisation Landscape

(Diagram: virtualisation options by platform)
● SPARC / Solaris: System Domains on Enterprise Systems (SunFire, M Series); Logical Domains on CMT-based systems (T Series)
● Solaris Containers: all SPARC and x86 based systems
● x86 hypervisors on certified x86 based systems: Solaris xVM Server (Xen), VMware VI3 (ESX), Microsoft Hyper-V, Linux Xen
v2.2 Page 9
Memory Bottleneck

(Chart: relative performance on a log scale, 1980-2005. CPU frequency has doubled roughly every 2 years, while DRAM speeds have improved by less than 2x every 6 years, leaving a widening CPU-memory gap.)
v2.2 Page 10
Single Threaded Performance
Single Threading

(Diagram: a single thread alternates between compute (C) and memory-stall (M) phases; memory latency dominates the timeline.)
Up to 85% of cycles are spent waiting for memory:
● UltraSPARC IV+: 25% compute, 75% memory stall
● Intel: 15% compute, 85% memory stall
Hurry up and wait!
v2.2 Page 11
Comparing Modern CPU Design Techniques
(Diagram: a 1 GHz CPU and a 2 GHz CPU both alternate compute (C) and memory-stall (M) phases; doubling the clock speed shrinks only the compute phases, so memory latency still dominates. Interleaving several threads (TLP) fills the stall time and saves wall-clock time.)
Instruction Level Parallelism offers limited headroom; Thread Level Parallelism provides greater performance efficiency.
v2.2 Page 12
Chip Multi-threaded (CMT) Performance
UltraSPARC T1 single core

(Diagram: four hardware threads per core interleave compute and memory phases, so the core's pipeline stays busy while individual threads wait on memory.)
Used in the T1000, T2000 and T6300 servers.
v2.2 Page 13
Chip Multi-threaded (CMT) Performance
UltraSPARC T2 / T2 Plus single core

(Diagram: eight hardware threads per core interleave compute and memory phases, hiding memory latency even more effectively.)
Used in the T5120, T5220, T5140, T5240, T5440, T6320 and T6340 servers.
v2.2 Page 14
Chip Multi-Threading (CMT)

CMP: chip multiprocessing (n cores per processor)
FG-MT: fine-grained multithreading (m strands per core)
CMT: chip multithreading (n x m threads per processor)

● T1000, T2000 and T6300 have 1x UltraSPARC T1 socket: 8 cores x 4 threads x 1 socket = 32 threads
● T5120, T5220 and T6320 have 1x UltraSPARC T2 socket: 8 cores x 8 threads x 1 socket = 64 threads
● T5140, T5240 and T6340 have 2x UltraSPARC T2 Plus sockets: 8 cores x 8 threads x 2 sockets = 128 threads
● T5440 has 4x UltraSPARC T2 Plus sockets: 8 cores x 8 threads x 4 sockets = 256 threads
v2.2 Page 15
Industry's Most Highly Threaded Servers
Maximum threading = higher throughput, greater energy and space efficiency

● T1000 1U / T2000 2U: UltraSPARC T1 based, 1 socket, 8 cores, up to 32 threads
● T5120 1U / T5220 2U: UltraSPARC T2 based, 1 socket, 8 cores, up to 64 threads
● T5140 1U / T5240 2U: UltraSPARC T2 Plus based, 2 sockets (each 8 cores), up to 128 threads
● T5440 4U: UltraSPARC T2 Plus based, 4 sockets (each 8 cores), 256 threads
● T6300 / T6320 / T6340 blades: UltraSPARC T1, T2 and T2 Plus based, with 1 socket 8 cores, 1 socket 8 cores and 2 sockets 8 cores respectively
v2.2 Page 16
Introducing UltraSPARC T1

• SPARC V9 implementation, so binary compatible
• Up to 8 cores with 4 threads per core, providing up to 32 simultaneous threads
• All cores connected through a 134GB/sec crossbar switch
• High-bandwidth, 12-way associative 3MB Level-2 cache on chip
• 4x DDR2 channels (23GB/sec total): 5,750MB/sec per controller, 1,437MB/sec per DIMM, 1.8V (DDR was 2.5V)
• J-Bus system interface at 3.2GB/sec (V880: 1.2GB/sec)
• ~300M transistors on a 90nm chip
• UltraSPARC T1 power: < 79W! (T2000 = 325W in 2U, versus V880 = 3000W in 17U)

(Diagram: 1 of 8 cores highlighted; cores C1-C8 connect via a crossbar to four L2$ banks, four DDR-2 SDRAM channels, a shared FPU and the system interface / buffer switch on the J-Bus.)
v2.2 Page 17
Introducing UltraSPARC T2

• SPARC V9 implementation, so binary compatible
• Up to 8 cores with 8 threads per core, providing up to 64 simultaneous threads
• 2x execution units per core: 16 integer execution units (T1 has 8), 8 integer pipelines (T1 has 6), 16 instructions per clock cycle (T1 does 8)
• Dedicated floating-point unit per core allows for non-blocking FP threads
• Integrated/enhanced crypto co-processor per core
• High-bandwidth, 8-bank, 16-way associative 4MB Level-2 cache on chip
• 4x dual FB-DIMM channels (50GB/sec read, 42GB/sec write), 32 to 64 DIMMs supported
• Integrated PCI-E x8 @ 2.5GHz and NIU (E-net+) providing 2x 10GbE Ethernet, 2.5Gb/sec bi-directional per lane
• Power: < 80W! 65nm chip (T1 is 90nm)
• E10K = 9600W (half the performance of UltraSPARC T2)

(Diagram: cores C1-C8, each with its own FPU, connect via a crossbar to eight L2$ banks, the four dual FB-DIMM channels, the PCI-E/NIU block and the system interface / buffer switch.)
v2.2 Page 18
Checking Application Fit with cooltst
• De-risks investment decisions
  > Measures floating point content
  > Measures number of active LWPs
• Download at http://cooltools.sunsource.net

Sample output:
  Peak thread utilization at 2007-10-01 20:41:15
  Corresponding file name  1191296475
  CPU utilization          24.5%
  Command                  Xorg
  PID/LWPID                627/1
  Thread utilization       12%

  Advice
  Floating Point  GREEN  Observed floating point content was not excessive for an UltraSPARC T1 processor. Floating point content is not a limitation for UltraSPARC T2.
  Parallelism     GREEN  Observed parallelism was adequate for an UltraSPARC T1/T2 processor to be effectively utilized.
v2.2 Page 19
Benchmarks and Cool Tools
• Application benchmarks on T-Series servers (e.g. SAP, Lotus Notes, Java etc.)
  > http://www.sun.com/servers/coolthreads/benchmarks/index.jsp
• ISV endorsements for T-Series servers
  > http://www.sun.com/servers/coolthreads/testimonials/isv.jsp
• Tuning resources for T-Series servers (CoolThreads Tuning and Resources)
  > http://www.sun.com/servers/coolthreads/tnb/applications.jsp
• CoolTools suite for T-Series servers
  > http://www.opensparc.net/cooltools/index.html
• cooltst, to determine whether a workload running on a UNIX server is suitable for the T-Series servers
  > http://cooltools.sunsource.net/cooltst/

For T-Series application development there is:
• Sun Studio 12 (optimising C, C++ and Fortran compilers with the NetBeans IDE and other performance tools)
• Sun Application Porting Assistant (a static source-code analysis and code-scanning tool that identifies incompatible APIs between Linux and Solaris)
• GCC4SS (C/C++ compiler for apps that are normally compiled with gcc)
• BIT (Binary Improvement Tool: works directly with SPARC binaries to instrument, optimise and analyse them for performance or code coverage)
• SPOT (Simple Performance Optimisation Tool: produces a report with detailed information about common conditions that impact an application's performance)
• Faban (benchmark framework: consolidates benchmark development and management knowledge and experience to aid the development and running of benchmarks)
• Solaris Grid Compiler
v2.2 Page 20
Cool Tools
For T-Series tuning and debugging there is:
• ATS (Automatic Tuning and Troubleshooting System: a binary reoptimisation and recompilation tool for tuning and troubleshooting applications)
• Corestat (online monitoring of core utilisation)
• Discover (Sun Memory Error Discovery Tool: detects programming errors related to the allocation and use of program memory at runtime)
• Thread Analyzer (analyses the execution of a multi-threaded program and checks for multi-threaded programming errors such as data races and deadlocks)

For T-Series deployment there is:
• CoolTuner (automatically tunes T-Series servers, applying patches and setting system parameters to best-practice recommendations, with a self auto-update feature)
• Cool Stack (optimised open source software stack for apps such as Apache, MySQL, Perl, PHP, Squid and Tomcat)
• Consolidation Tool (simplifies the task of consolidating multiple applications on T-Series servers using Solaris Containers)

For T-Series architecture exploration there is:
• SHADE (a fast SPARC instruction-set simulator used to perform a variety of analysis functions on SPARC executables)
• RST Trace Tool (a trace format for SPARC instruction-level traces)
• Sun Studio 12 compilers and tools (Sun Studio 12 software is the premier development environment for the Solaris operating system)
v2.2 Page 22
Consolidation
Logical Domain Benefits
• Run multiple virtual machines simultaneously on a single platform
  > Secure consolidation of different operating environments
  > Increase utilisation of the CoolThreads architecture
• Domains can communicate with and serve each other
  > Virtual data centre in a box
• Minimise/eliminate the need for OS upgrades with new platforms
  > Reduce customer qualification costs and protect software investments
v2.2 Page 23
Concepts of Logical Domains (LDoms)
• SPARC / CMT based virtualisation technology
• Each Logical Domain:
  > Appears as a fully independent server
  > Has a unique OS install and configuration in all ways
  > Configurable CPU, disk, memory and I/O resources
• Up to:
  > 32 LDoms on T2000 (UltraSPARC T1)
  > 64 LDoms on T5220 (UltraSPARC T2)
  > 128 LDoms on T5240 (UltraSPARC T2 Plus)
  > 128 LDoms on T5440 (UltraSPARC T2 Plus)
• Dynamic resource allocation
• Isolation via hardware/firmware

(Diagram: UltraSPARC T2 block diagram as on Page 17: 8 cores with per-core FPUs, full crossbar, L2$ banks, FB-DIMM channels, PCI-E/NIU with 2x 10GbE, power < 90W.)
v2.2 Page 24
Key LDoms Components
• The Hypervisor
• The Control Domain
• The I/O Domain
• Multiple Guest Domains
• Virtualised devices

(Diagram: the Hypervisor sits on the shared hardware (CPUs, 72GB of memory, crypto units, PCI-E A/B I/O devices, network). The Control Domain runs Solaris 10 11/06 with the ldmd, vntsd and drd daemons and exports virtual devices: a virtual disk server vds0 backed by /dev/lofi/1 and a virtual switch vsw0 connected to the network. Guest domains LDom1 and LDom2 each run Solaris 10 11/06 with a virtual disk (vdisk0, seen as /dev/dsk/c0d0s0) and a virtual network interface (vnet0). Unallocated CPU, memory and crypto resources remain available.)
v2.2 Page 25
Key LDoms Components (same diagram and component list as Page 24)
v2.2 Page 26
Hypervisor Support
• The hypervisor firmware is responsible for:
  > Maintaining separation (e.g. which hardware parts are visible) between domains
  > Providing Logical Domain Channels (LDCs) so domains can communicate with each other: the mechanism by which domains are virtually networked with each other or provide services to each other
  > MMU mapping of RAM into domains' address spaces, plus a protocol that lets the hypervisor and domains queue and dequeue messages
• Uses extensions built into a sun4v CPU
  > It is an integral component of the shipping systems
  > It is not installed as part of a software distribution
v2.2 Page 27
Key LDoms Components (same diagram and component list as Page 24)
v2.2 Page 28
Control Domain
• Configuration platform for managing the server and domains
  > Allows monitoring and re-configuration of domains
  > Interfaces with the hypervisor to set up access rule sets
  > Administers the constraints engine and resource mapping
• Runs the LDom Manager software
  > One Manager per host hypervisor
  > Controls the hypervisor and all its Logical Domains
  > Exposes control interfaces: the ldm command and the ldmd daemon
v2.2 Page 29
LDom Manager
• One Manager per host hypervisor
  > Controls the hypervisor and all its Logical Domains
• Exposes control interfaces
  > CLI
  > WS-MAN
  > XML
• Maps Logical Domains to physical resources
  > Constraint engine
  > Heuristic binding of Logical Domains to resources
    – Assists with performance optimisation
    – Assists in the event of failures / blacklisting
(See the example ldm commands below.)
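For illustration, a few ldm commands commonly used from the control domain to inspect the Manager's view of the system (a minimal sketch; exact output varies by LDoms release, and 'myguest' is a hypothetical domain name):

> ldm list (lists all domains with their state, vCPUs and memory)
> ldm list-services primary (shows the vds, vsw and vcc services exported by the primary domain)
> ldm list-bindings myguest (shows how the guest domain is bound to physical resources)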
v2.2 Page 30
LDom Manager Evolution
• 1.1 (December 2008)
  > Virtual I/O dynamic reconfiguration
    – No need to reboot when adding/removing storage
    – No need to reboot when adding/removing vnets
  > VLAN support
    – Use VLANs with guest domains
    – VLAN tagging supported
  > Virtual disk failover: a traffic manager in guest LDoms
  > Single disk slice enhancements
  > NIU hybrid I/O implemented
  > Fault Management Architecture (FMA) I/O improves I/O reporting
  > Enhanced XML v3 interface for monitoring and controlling LDoms
  > Power management feature in T2 and T2 Plus chips saves power when all of the threads on a core are idle

Firmware required for 1.1:
  UltraSPARC T2 Plus  7.2.x
  UltraSPARC T2       7.2.x
  UltraSPARC T1       6.7.x

Required software patches:
  Solaris 10 5/08   137111-09
  Solaris 10 8/07   137111-09
  Solaris 10 11/06  137111-09
v2.2 Page 31
Key LDoms Components (same diagram and component list as Page 24)
v2.2 Page 32
I/O Domain
• Provides virtual device services to other domains
  > Networking: virtual switches
  > Storage: virtual disk servers
  > Serial: virtual console concentrator
• Multiple I/O domains can exist, with shared or sole access to system facilities
• Owns the physical I/O and provides I/O facilities to itself and other guest domains
  > Allows I/O load separation and redundancy within domains deployed on a platform
v2.2 Page 33
I/O Virtualisation
• Paravirtualised model
  > Frontend device / backend service architecture
  > Bi-directional, point-to-point channel
  > Separate transmit and receive queues
v2.2 Page 36
Redundant Network

(Diagram: an I/O domain within the Control Domain owns the physical NICs e1000g0-e1000g11 and serves four guest LDoms, each illustrating one redundancy option:
• Guest LDom #1: IPMP in the Control Domain; IPMP0 groups physical NICs beneath VSW0, which serves VNET0 in the guest
• Guest LDom #2: IPMP in the Guest LDom; VNET1 and VNET2 (on VSW1 and VSW2) are grouped as IPMP1 inside the guest
• Guest LDom #3: Link aggregation; a data-link aggregation (DLA) of physical NICs sits beneath VSW3, which serves VNET3
• Guest LDom #4: Redundant IPMP; VNET4 and VNET5 (on VSW4 and VSW5), combined with the IPMP2 group)
v2.2 Page 37
Redundant Network
• IPMP across 2 virtual switches
• Create a second virtual switch in the control domain
  > ldm add-vswitch mac-addr=<mac addr of e1000g1> net-dev=e1000g1 primary-vsw1 primary
• Add a second network interface in the guest domain
  > ldm add-vnet vnet2 primary-vsw1 <ldom_name>
• In the guest domain, make an IPMP group of the two interfaces
  > ifconfig vnet0 group ipmp1
  > ifconfig vnet1 group ipmp1
• Alternatively:
  > Create an IPMP group in the control domain and connect that to the virtual switch
    – ifconfig e1000g1 group ipmp2
    – ifconfig e1000g2 group ipmp2
    – ldm add-vswitch net-dev=ipmp2 primary-vsw2 primary
v2.2 Page 38
Redundant Network
• Data link administration (in Solaris 10) also provides link aggregation

  # dladm create-aggr -d e1000g1 -d e1000g3 -d e1000g4 agg_link1
  # dladm show-aggr
  key: agg_link1 (0x0001) policy: L4 address: 0:14:4f:97:87:69 (auto)
  device    address            speed      duplex  link  state
  e1000g1   0:14:4f:97:87:69   1000 Mbps  full    up    attached
  e1000g3   0:14:4f:97:87:6b   1000 Mbps  full    up    attached
  e1000g4   0:15:17:3a:92:18   1000 Mbps  full    up    attached

  # ldm add-vsw net-dev=agg_link1 primary-vsw0 primary
  # ifconfig primary-vsw0 plumb 10.1.1.110
  # ifconfig e1000g0 unplumb
  # dladm add-aggr -d e1000g0 agg_link1
  # mv /etc/hostname.e1000g0 /etc/hostname.vsw0
v2.2 Page 39
Key LDoms Components (same diagram and component list as Page 24)
v2.2 Page 40
Virtual Subsystems
• Virtual devices abstract physical devices
• Inter-domain I/O via Logical Domain Channels (LDCs), configured in the Control Domain through the hypervisor
• Virtual devices are:
  > CPUs
  > Memory
  > Crypto cores
  > Network switches
  > NICs
  > Disk servers
  > Disks
  > Consoles
  > A virtual terminal server (vntsd)
v2.2 Page 41
Resource Reconfiguration
• Ability to grow or shrink the compute capacity of an LDom on demand
• Simply add / remove (see the example below):
  > Cores / threads: dynamic
  > Memory: delayed reconfiguration
  > I/O: dynamic
• Improve utilisation by balancing resources between LDoms
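As an illustration only (the domain name 'myguest' is hypothetical; on these LDoms releases memory changes are applied via a delayed reconfiguration rather than immediately):

> ldm add-vcpu 4 myguest (grow the domain by four virtual CPUs; takes effect immediately)
> ldm remove-vcpu 2 myguest (shrink it again)
> ldm add-mem 2g myguest (request 2GB more memory; queued as a delayed reconfiguration)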
v2.2 Page 42
Warm Migration 1 - Initialisation
(Diagram: a guest domain on System A with vdsk and vnet devices served by the Control Domain's vds and vsw; its virtual disk backend is an NFS file or shared disk reachable from both systems. System B runs only a Control Domain at this point. The ldmd daemons on the two systems talk over the network.)
ldmd on System A checks with ldmd on System B whether migration is possible.
v2.2 Page 43
Warm Migration 2 – New Guest Creation
(Diagram: as before, with System B now also showing a bound guest domain with 1 CPU.)
ldmd on System B creates and binds a similar domain with 1 CPU.
v2.2 Page 44
Warm Migration 3 – Shrink Source Guest
(Diagram: the source guest on System A is now down to 1 CPU.)
ldmd on System A removes all but one CPU from the source guest.
v2.2 Page 45
Warm Migration 4 – State Transfer
(Diagram: memory and CPU state are transferred from the source guest on System A to the target guest on System B.)
ldmd on System A suspends the last CPU and transfers its state.
v2.2 Page 46
Warm Migration 5 – Target Guest Resume
(Diagram: the target guest on System B is now running with 1 CPU.)
ldmd on System B resumes the target guest with one CPU.
v2.2 Page 47
Warm Migration 6 – Completion & Cleanup

(Diagram: only the Control Domain remains on System A; the guest domain now runs on System B against the same virtual disk backend (NFS file or shared disk).)
ldmd on System B adds the remaining CPUs; ldmd on System A destroys the source guest.
v2.2 Page 48
Cold and Live Migration
• Cold and Live migration can migrate between different system and CPU types
• Warm migration requires the same system and CPU type
• A Cold migration operation is fast
• Live migration requires OS support (aka cooperative guest support)
• The time to migrate a domain is largely determined by:
  > The type of migration being performed
  > Network speed
  > The size of the guest image (Warm migration)
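For illustration, a warm migration is initiated from the source control domain (a sketch assuming the LDoms 1.1 migrate-domain subcommand; 'myguest' and 'target-host' are hypothetical names):

> ldm migrate-domain myguest root@target-host (ldmd prompts for the target control domain's password, then performs the steps shown on the previous pages)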
v2.2 Page 49
Virtual I/O Dynamic Reconfiguration
• Add/remove virtual I/O services and devices without rebooting
  > vds, vsw, vdisk, vnet, vcc
• No CLI changes, but the effect is now immediate
• Examples:
  # ldm add-vdisk vdiskN diskN@primary-vds0 ldg1
  # ldm add-vnet vnetN primary-vsw0 ldg1
  vdiskN and vnetN are immediately available in domain ldg1
• A device cannot be removed if it is in use
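Removal works the same way, again without a reboot, provided the device is not in use in the guest (a sketch reusing the same hypothetical names as above):

> ldm remove-vdisk vdiskN ldg1
> ldm remove-vnet vnetN ldg1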
v2.2 Page 50
Network Hybrid I/O
• Network virtualised I/O path: guest domain ↔ service domain ↔ physical NIC
• Network hybrid I/O path: guest domain ↔ physical NIC
  > Except broadcast and multicast
• Better performance and scalability: no overhead of the service domain virtual switch
• Hardware requirements:
  > UltraSPARC T2 based system
  > 10Gb Ethernet XAUI adapter (nxge interface)
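Hybrid I/O is requested per virtual network device; a minimal sketch, assuming the LDoms 1.1 mode=hybrid property ('myguest' is a hypothetical domain):

> ldm add-vnet mode=hybrid vnet1 primary-vsw0 myguest (create a vnet with hybrid I/O requested)
> ldm set-vnet mode=hybrid vnet0 myguest (or request hybrid mode on an existing vnet)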
v2.2 Page 51
Network Hybrid I/O 1
A non-hybrid vnet sends/receives all packets through the service domain.
(Diagram: the service domain's vsw drives the physical nxge0 interface on a XAUI adapter; the non-hybrid guest's vnet reaches the physical network only via an LDC to the vsw, while the hybrid guest's vnet also has a direct path to the adapter.)
v2.2 Page 52
Network Hybrid I/O 2
A hybrid vnet sends/receives only broadcast and multicast packets through the service domain.
(Diagram: same topology; only broadcast and multicast traffic from the hybrid vnet traverses the LDC to the vsw.)
v2.2 Page 53
Network Hybrid I/O 3
A hybrid vnet sends/receives unicast packets directly to/from the NIU card using dedicated DMA channels.
(Diagram: same topology; the hybrid vnet's unicast traffic bypasses the service domain entirely.)
v2.2 Page 54
VLAN (802.1q) Support
• Adds VLAN support to virtual network I/O
• Ethernet packet switching based on VLAN IDs
• Support added to vsw and vnet
  > vnet and vsw can now service multiple subnets
• Features similar to a physical switch with VLANs
• Supports untagged and tagged modes; VLAN IDs are assigned with the ldm CLI
• Untagged mode: associate a port-vlan-id (PVID) with a vnet/vsw interface
• Tagged mode: associate VLAN ID(s) with a vnet/vsw interface
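For illustration, a sketch of assigning VLAN IDs through the ldm CLI (assuming the LDoms 1.1 pvid/vid properties; 'myguest' is a hypothetical domain):

> ldm add-vnet pvid=20 vid=30,40 vnet1 primary-vsw0 myguest (untagged member of VLAN 20, tagged member of VLANs 30 and 40)
> ldm set-vsw pvid=20 vid=30,40 primary-vsw0 (give the virtual switch matching VLAN membership)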
v2.2 Page 55
Virtual Disk Failover
• Disk multipathing between different service domains
• CLI:
  # ldm add-vdsdev mpgroup=foo /path/to/disk/backend/from/primary/domain disk@primary-vds0
  # ldm add-vdsdev mpgroup=foo /path/to/disk/backend/from/alternate/domain disk@alternate-vds0
  # ldm add-vdisk disk disk@primary-vds0 guest
v2.2 Page 56
Virtual Disk Failover 1
(Diagram: the guest domain's vdisk/vdc has an active LDC channel (LDC 1) to the virtual disk server primary-vds0 in Service Domain 1 (primary) and a backup channel (LDC 2) to alternate-vds0 in Service Domain 2 (alternate); both vds exports in mpgroup 'foo' point at the same virtual disk backend.)
v2.2 Page 57
Virtual Disk Failover 2
(Diagram: the same configuration after Service Domain 1 or LDC 1 goes down; the guest switches to the other LDC channel, making LDC 2 the active channel.)
v2.2 Page 58
Other Features
• Single-slice disk enhancements
  > Ability to install Solaris on a single-slice disk
  > Single-slice disks are now visible with format(1M)
• Power management: ability to power off unused CPU cores
• LDC VIO shared memory for DRing: improved virtual network I/O performance
  > Requires a Solaris patch
• iostat(1M) support in guest domains
v2.2 Page 60
Solaris Containers – Summary
• Solaris 10 technology providing OS virtualisation
• Supports multiple, isolated application environments in one OS instance
• A software-based solution, therefore:
  > No application changes or recompilation
  > No additional hardware requirements
  > No licensing or support fees
• A combination of:
  > Zones
  > Resource management
• Branded extension to zones technology
  > Enables Solaris Containers to assume different OS personalities
  > Solaris 8, 9 & Linux Containers
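For illustration, the basic zone lifecycle as driven from the global zone (a minimal sketch; the zone name 'web1' and its path are hypothetical):

> zonecfg -z web1 'create; set zonepath=/zone/web1; set autoboot=true'
> zoneadm -z web1 install
> zoneadm -z web1 boot
> zlogin -C web1 (attach to the zone console to answer the initial system identification questions)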
v2.2 Page 61
Containers Block Diagram
(Diagram: a global zone (v880-room2-rack5-1, 129.76.1.12) hosting four non-global zones, each started by its own zoneadmd and reached via a zone console (zcons):
• dns1 zone (dnsserver1), zone root /zone/dns1: core services (inetd), login services (SSH sshd), network services (named)
• web1 zone (foo.org), zone root /zone/web1: core services (inetd), login services (SSH sshd), network services (Apache, Tomcat)
• web2 zone (bar.net), zone root /zone/web2: core services (inetd), login services (SSH sshd, telnetd), network services (iWS)
• Sol 8 branded zone (Solaris 8), zone root /zone/sol8: Solaris 8 core services (NIS, xinetd, autofs) and Solaris 8 user apps (OpenSSH, acroread, MATLAB, yum, pandora)
CPU resources are partitioned into pool1 (4 CPUs, Fair-Share Scheduler with shares 10, 30 and 60) and pool2 (4 CPUs). Each zone is given virtual network interfaces (hme0:1-3, ce0:1-3, ce1:1) layered on the physical devices hme0, ce0 and ce1, and each shows its own /usr mount; the zones make up the application environment on top of the virtual platform. The global zone provides platform administration (syseventd, devfsadm, ifconfig, metadb, ...), zone management (zonecfg(1M), zoneadm(1M), zlogin(1), ...), core services (inetd, rpcbind, sshd, ...) and remote admin/monitoring (SNMP, SunMC, WBEM) over the storage complex.)
v2.2 Page 62
Containers Today
• Fair-Share Scheduler
  > Guarantees a minimum portion of CPU to a zone
  > Conflict-based enforcement
• Dedicated CPUs
  > Specifies a quantity of available CPU for a zone
  > Requires the use of temporary resource pools
  > Configured using the zonecfg command
• CPU caps
  > Hard limit on the CPU allocated to a zone
• RAM cap (zone-aware rcapd)
  > Enforces a maximum amount of physical memory
  > Configured and enforced in the global zone
• Swap cap
  > Specifies a maximum amount of swap space available to a zone (not a region of swap disk)
  > Configured and enforced by the global zone
• Locked-memory cap
  > Limits the amount of memory that can be specifically marked 'not eligible for paging'
(See the zonecfg sketch below.)
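For illustration, a zonecfg sketch that adds a dedicated-CPU allocation and memory caps to an existing zone (the zone name 'web1' is hypothetical; these resource types are available from Solaris 10 8/07 onwards):

> zonecfg -z web1
   add dedicated-cpu
   set ncpus=2
   end
   add capped-memory
   set physical=2g
   set swap=4g
   set locked=512m
   end
   exit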
v2.2 Page 63
Multiple Stacks / IP Instances: Providing Even More Isolation
In an exclusive-IP zone:
• All IP packets enter or leave through the zone's own NIC(s)
• DHCPv4 and IPv6 stateless address auto-configuration work
• Routing can be configured for that zone
• IP Multipathing can be configured if the zone has more than one NIC
• ndd can be used to set TCP/UDP/SCTP/IP/ARP parameters
Limitations: reliance on GLDv3 means no initial support for 'legacy' NICs (ce, hme, qfe, eri).
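For illustration, an exclusive-IP zone is configured with a single zonecfg property plus a dedicated GLDv3 NIC (a sketch; 'web1' and the NIC name are hypothetical):

> zonecfg -z web1
   set ip-type=exclusive
   add net
   set physical=e1000g1
   end
   exit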
v2.2 Page 64
Solaris 8 or 9 Containers
Physical to Virtual (P2V): using Containers to migrate to Solaris 10.

(Diagram: an existing Solaris 8 or 9 server running a database and application is captured and redeployed as a Solaris 8 or 9 branded Container alongside a native Solaris 10 Container, both on the Solaris 10 global zone and kernel (ZFS, DTrace, FMA) of an M-Series or T2000/T5120/T5220 system.)
v2.2 Page 65
Solaris Containers vs Logical Domains: High-Level Functional Comparison

Solaris Containers
• Available everywhere Solaris runs: UltraSPARC II and up, x86/x64
• Far smaller CPU, disk and memory footprint than full OS images
• Extremely efficient, suitable for "large N" deployments
• SRM or pool-managed sub-CPU resource management
• Some restrictions (like running an NFS server in a zone)
• Single kernel, with implications for patching

Logical Domains
• Available on CMT SPARC only
  > As many domains as there are CPU threads
  > For workloads applicable to T-Series servers
• Flexible resource allocation
• Each domain runs an independent OS: domains can be at different levels and independently patched
• Efficient implementation, with less overhead than other virtual machines, but still an entire OS instance per domain
v2.2 Page 67
Open Virtualisation for Desktop to Datacentre
• Open developer virtualisation platform
• Manage heterogeneous datacentres
• Enterprise-class hypervisor
• The only VDI with choice: Windows, OpenSolaris and Linux delivered securely
v2.2 Page 68
Sun xVM Infrastructure Overview
Sun xVM Infrastructure: a complete solution for virtualising and managing your datacentre.

(Diagram: xVM Ops Center manages physical and virtual platforms across SPARC and x86 systems, providing inventory*, discovery*, hardware monitoring*, patch lifecycle management, firmware management, O/S provisioning, application provisioning and xVM Server management. Beneath it, xVM Server (for x86) virtualises Sun and 3rd-party x86 platforms for Solaris, Windows and Linux guests, while CMT-based SPARC platforms host Solaris, Linux and VDI guests. Linux on CMT is not directly supported by Sun.)
* Not automated on all platforms today; manual intervention may be needed.
Will be released as a software appliance.
v2.2 Page 70
Sun xVM Roadmap

(Timeline: 4QCY07 through 3QCY08, covering xVM Server and xVM Ops Center milestones)
• xVM Early Access: hypervisor-optimised distro, management interfaces
• xVM Server: first release of xVM Server available in OpenSolaris
• xVM Server FCS
• xVM Ops Center: first alpha customer installs
• xVM Ops Center 1.0 FCS
• xVM Ops Center: additional platforms, performance, install/upgrade automation
• xVM Ops Center: active xVM Server management for x86
v2.2 Page 72
Production, Dev, Test
Conventional approach with multiple discrete environments (the diagram shows each on its own server, e.g. V490):
• Production systems
  > Production server
  > Failover server (cluster)
  > Business continuity server
• User Acceptance Test (UAT)
  > Copy of the production environment
• Multiple development and test systems
v2.2 Page 73
Production, Dev, Test
Virtualised approach with multiple LDoms:
• System 1
  > Production LDom
  > Failover LDom
• System 2
  > Business continuity LDom
  > User Acceptance Test LDom
  > Multiple development LDoms
  > Multiple test LDoms
• If System 1 fails, the test & dev LDoms contract to allow the UAT LDom to become a business-continuity copy of the production LDom
v2.2 Page 74
TCO Savings

(Diagram: a Sun Rack 900-38 of V490 and Netra 240 servers (OLD) replaced by a Sun Rack 900-38 holding two T5220s (NEW))
OLD configuration: 3 x V490 (4 x 1.8GHz UltraSPARC IV+, 32GB memory) and 5 x V245 (2 x 1.5GHz UltraSPARC IIIi, 2GB memory)
NEW configuration: 2 x T5220 (8-core 1.4GHz, 64GB memory)

                   OLD            NEW            Factor
  List price       £170,450       £43,800        x4
  Space            25 RU          4 RU           x6
  Power            5,355 Watts    1,110 Watts    x5
  3-year support   £58,356        £19,984        x3
  Cores            34             16             x2
  M Values         540,500        576,000        =
v2.2 Page 75
We Make it Easy
• Download software: free!
• Go to sun.com: promotions, customer stories, Sun Store, Jonathan's Blog
• Join a community: Sun Developer Network, Sun Advisory Panel, Inner Circle, Executive Boardroom
• Try and Buy: try out today's latest technology before you cut a P.O., 60 days risk-free
• The cost of infrastructure software is as low as it goes
v2.2 Page 78
Logical Domain Setup
• Control Domain setup
  > Services required for the control domain
• Guest Domain setup
• Demo
  > Scripts
    – Guest Domain
      ● Creation
      ● Deletion
      ● Resource allocation
      ● Rapid deployment
      ● Migration
v2.2 Page 79
Control Domain setup
• On the Control Domain do the following...
• Ensure that the system is properly installed first
• In this case we will combine the Control and Service domains
• Set up the basic services needed:
  > pkgadd SUNWldm
  > svcadm enable ldmd
  > ldm add-vdiskserver primary-vds0 primary
  > ldm add-vswitch mac-addr=<mac addr of e1000g0> net-dev=e1000g0 primary-vsw0 primary
  > ldm add-vconscon port-range=5000-5100 primary-vcc0 primary
  > ldm set-vcpu 8 primary
  > ldm set-mem 4g primary
  > ldm add-config initial
  > init 6 (reboot for the config to take effect)
  > svcadm enable vntsd

(Diagram: the combined Control & Service domain 'primary' now runs Solaris 10 8/07 with the ldmd, vntsd and drd daemons, holds 8 vCPUs, a crypto unit and its share of the 72GB of memory, owns a PCI-E bus and the network, and exports the virtual disk, virtual switch and virtual console services; the remaining CPUs, memory and crypto units stay unallocated.)
v2.2 Page 80
ZFS overview
• 128-bit file system
• Simple to use: two commands only
  > zpool to create and manage pools of storage (zpool create mypool c1t0d0)
  > zfs to create and manage file systems (zfs create mypool/terry)
• Self-healing capabilities
• Snapshot and cloning capability (see the sketch below)
  > Quick and cheap
  > A snapshot is a read-only copy of the original file system and only holds the deltas
  > A clone is a writeable copy (but still references the original)
  > Allows an easy way to create and clone both LDoms and zones
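For illustration, cloning an LDom boot image held in ZFS (a sketch; the pool, dataset and snapshot names are hypothetical):

> zfs snapshot mypool/ldom1@golden (read-only, space-efficient point-in-time copy)
> zfs clone mypool/ldom1@golden mypool/ldom2 (writeable clone that can back a second guest's virtual disk)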
v2.2 Page 81
Guest Domain setup
• On the Control Domain do the following...
• Ensure that the system is properly installed first
• We will use a disk image on the control domain:
  > mkfile 5g /ldom_disk/<ldom_name>/os
• Create our new domain description:
  > ldm add-domain <ldom_name>
  > ldm add-vcpu 2 <ldom_name>
  > ldm add-mem 3g <ldom_name>
  > ldm add-vnet vnet1 primary-vsw0 <ldom_name>
  > ldm add-vdsdev /ldom_disk/<ldom_name>/os zvol_<ldom_name>@primary-vds0
  > ldm add-vdisk zvdisk_<ldom_name> zvol_<ldom_name>@primary-vds0 <ldom_name>
  > ldm set-variable auto-boot\?=true <ldom_name>
  > ldm set-variable boot-device=/virtual-devices@100/channel-devices@200/disk@0 <ldom_name>
  > ldm bind <ldom_name>
• Now just start the domain:
  > ldm start <ldom_name>
• Watch the console of <ldom_name> using:
  > telnet localhost <port no of ldom_name>

(Diagram: the combined Control & Service domain 'primary' (Solaris 10, ldmd/vntsd/drd, primary-vds0, primary-vsw0) now sits alongside the guest 'ldom1' (Solaris 10 plus apps and patches) with its own vCPUs, memory, crypto unit, vdisk0 and vnet interfaces; the remaining resources stay unallocated.)
v2.2 Page 83
Workloads on LDom Platform: Real World Applications
1. Enterprise middleware solutions
  > Glassfish J2EE reference application server
  > Built as if using a cloned image
2. Suitability for Web 2.0 and the SAMP stack
  > Drupal content management system built on Solaris, Apache, MySQL and PHP
  > Allocate additional (vCPU) resources on the fly
3. RAS with an enterprise database
  > Oracle 10.2.0.3 RDBMS and DataGuard to a standby DB
4. Migration of legacy apps to Solaris 10
  > With Solaris 8 Containers
5. Solaris binary compatibility
  > Solaris 10 Container running the same app as the Solaris 8 Container
v2.2 Page 86
1. Using the Latest Enterprise Middleware
• Highlights
  > Glassfish is the open source J2EE reference application server
  > A cloned LDom (ZFS image) is migrated from one physical server to another
  > Recreate the guest domain services, as per the original instance
  > Restart services and away you go...
  > Great for rapid deployment of production, development and testing environments
v2.2 Page 88
2. Web 2.0 / SAMP Highlights
• Highlights
  > Demonstrates a leading CMS, Drupal, built on SAMP: Solaris, Apache, MySQL and PHP/Perl/Python (Cool Stack, optimised for CMT)
    – Similar to the LAMP stack (Linux, Apache, MySQL and PHP/Perl/Python); free and open!
  > LDom running under stress, as in the real world
  > Additional resources added dynamically
• A complex stack can be put together in minutes using Cool Stack
  > Deployed to great effect on CMT servers at very little expense
v2.2 Page 90
3. Oracle Highlights
• Highlights
  > Oracle 10.2.0.3 RDBMS runs just as it does on a native SPARC/Solaris system
  > DataGuard
    – Logs shipped in the normal manner
    – The tools work in the normal manner
  > Oracle Enterprise Manager
  > A test harness (iGenOLTP) that simulates real workload
  > Demonstrates that the following work with LDoms:
    – Sun Java System Directory Server
    – Tomcat servlet engine
  ● NB: iGenOLTP is built on SLAMD, which is constructed from an open LDAP server such as Sun Java System Directory Server and a J2EE servlet engine such as Tomcat; see http://www.slamd.org
v2.2 Page 92
4. & 5. Container Demo Highlights
• Highlights
  > Legacy app: Oracle 8.0.5, de-supported by Oracle for a number of years
  > Two options:
    – Use binary compatibility and deploy on Solaris 10
    – Use Solaris 8 Containers (see the sketch below)
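For illustration, the rough shape of creating a Solaris 8 branded zone, assuming the Solaris 8 Containers (solaris8 brand) packages are installed and a flash archive of the legacy system is available; the zone name and archive path are hypothetical:

> zonecfg -z s8-ora 'create -t SUNWsolaris8; set zonepath=/zone/s8-ora'
> zoneadm -z s8-ora install -u -a /export/archives/legacy-s8.flar
> zoneadm -z s8-ora boot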
v2.2 Page 94
Who are Zeus Technology?
Zeus Technology develops Internet application traffic management software, also known as 'Application Delivery Controllers' (ADCs).
• Corporate HQ in Cambridge, UK; US headquarters in Mountain View, CA
• Over 10 years' experience in network and web application delivery
• Over 1,300 deployments of ZXTM (Zeus Extensible Traffic Manager)
• Many 'global brand' customers

Steve Webb, VP Strategic Accounts
Phone: +44 1223 525000
Cell: +44 7973 [email protected]
v2.2 Page 95
Why do you need an "Application Delivery Controller"?
• To make your networked and web-enabled applications faster, more reliable, more secure and easier to manage.

MORE RELIABLE: load balancing, fault tolerance, monitoring, bandwidth shaping, request rate shaping
FASTER: SSL and XML offload, content compression, HTTP caching, HTTP multiplexing, TCP offload
MORE SECURE: server isolation, traffic filtering, traffic scrubbing, DoS protection, application protection
EASIER TO MANAGE: deployment of apps, visualisation via a powerful GUI, reporting and alerting, control API
v2.2 Page 96
The ‘Logical’ View
• Web servers: Apache, IIS, Sun, Zeus, lighttpd etc.
• App servers: Tomcat, JBoss, JES, Glassfish, WebLogic, WebSphere, OAS, .NET, PHP, Ruby on Rails
• Database servers: MySQL, SQL Server, Oracle etc.
v2.2 Page 97
Solaris LDom Advantage 1

(Diagram: two CMT servers, each running four LDoms: ZXTM, web server, app server and database server)
• A load-balanced, fault-tolerant 'SAMP' cluster in just 2U of rack space (saving space, power and heat)
v2.2 Page 98
Solaris LDom Advantage 2

(Diagram: two CMT servers, each running four ZXTM LDoms)
• Traffic partitioned through 4 ADC clusters (each cluster running active-active) on just 2 CMT servers
v2.2 Page 99
Summary
Software ADCs can be deployed like any other application:
• A load-balanced 'SAMP' stack, all on a single Sun CMT server, that scales 'up and out' as your business grows
• Multiple ADC clusters on just 2 Sun CMT servers, enabling application traffic partitioning for heightened security and to protect against any possible application-induced crashes
• Making network and web applications run faster, more reliably and more securely, and making them easier to manage, all with maximum flexibility of deployment
v2.2 Page 100
Try ZXTM yourself
• The ZXTM Desktop Evaluator runs as a virtual appliance on any Windows or Linux laptop/desktop
• Request a 'managed evaluation' when you are ready
v2.2 Page 101
Useful references
• Presentation and LDom demo scripts
  > http://uk.sun.com/discoveryday
• Sun Logical Domains wiki
  > http://wikis.sun.com/display/SolarisLogicalDomains/Home
• Sun.com Logical Domains page (links to software and docs)
  > http://www.sun.com/servers/coolthreads/ldoms/index.xml
• Sun virtualisation training courses
  > http://uk.sun.com/training/catalog/operating_systems.virtualization.xml
• OpenSolaris LDoms community
  > http://www.opensolaris.org/os/community/ldoms
• Blueprint: Beginners Guide to LDoms
  > http://wikis.sun.com/display/BluePrints/Beginners+Guide+to+LDoms+1.0
• Blueprint: Using Logical Domains and CoolThreads Technology
  > http://wikis.sun.com/display/BluePrints/Using+Logical+Domains+and+CoolThreads+Technology