VMware vSphere HA Recommendations to Maximize Virtual Machine Uptime
Josh Gray, VMware, Inc.
Jeff Hunter, VMware, Inc.
INF-BCO2382
#vmworldinf
Disclaimer
This session may contain product features that are currently under development.
This session/overview of the new technology represents no commitment from VMware to deliver these features in any generally available product.
Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new technologies or features discussed or presented have not been determined.
High Availability is Part of IT Business Continuity
Just a Few Clicks to Higher Availability
Turn ON vSphere HA, then click OK
Global Support Services (GSS)
• 6 support centers: Palo Alto, CA; Broomfield, CO; Burlington, Canada; Cork, Ireland; Bangalore, India; Tokyo, Japan
• Local language support: Spanish, Portuguese, French, German, Japanese, Chinese
• Global coverage: 24x7, 365 days/year
• 1000+ support engineers
• Follow-the-sun support for Severity 1 issues
• Support relationships with 100% of the Fortune 100 and 99% of the Fortune 500
Recent Enhancements
vSphere 5.0 Major Redesign
• Fault Domain Manager (FDM)
vSphere 5.1 Minor Updates
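Recommendations: Networking
• Redundant management network
• Fewest network hops possible between hosts
• NIC teaming: route based on originating port ID
• Failback policy = No
• Enable PortFast (or the equivalent edge-port setting) on host-facing physical switch ports
• Same MTU size end to end
• Keep things simple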
Recommendations: Networking
• Consistent port group names and network labels across hosts
• Disable Host Monitoring during network maintenance
• Use Maintenance Mode
• Separate subnet for vSphere HA
• Specify an additional network isolation address
• Ensure each host can communicate with all other hosts
• Keep things simple
Recommendations: Networking
Advanced Configuration Options
• das.allowNetwork[0-9]=
• das.isolationAddress[0-9]=
• das.useDefaultIsolationAddress= (true/false)
• das.failuredetectiontime: not supported in vCenter Server 5.x
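To make the isolation-address options above concrete, here is a minimal Python sketch, assuming you simply want to generate the das.isolationAddress entries from a list of IP addresses and then enter them in the cluster's vSphere HA advanced options. The helper name `build_isolation_options` and the example addresses are hypothetical; the code only builds a dictionary and does not call any VMware API.

```python
import ipaddress

MAX_ISOLATION_ADDRESSES = 10  # das.isolationAddress0 .. das.isolationAddress9


def build_isolation_options(addresses, use_default_gateway=True):
    """Build vSphere HA advanced options (name -> value) for isolation detection.

    Hypothetical helper: it only returns a dict and does not call any VMware API.
    """
    if len(addresses) > MAX_ISOLATION_ADDRESSES:
        raise ValueError("at most 10 isolation addresses (indexes 0-9)")
    if not addresses and not use_default_gateway:
        # Without the default gateway and without custom addresses,
        # the host would have nothing to ping when checking for isolation.
        raise ValueError("specify at least one isolation address")
    options = {}
    for index, address in enumerate(addresses):
        ipaddress.ip_address(address)  # raises ValueError for a malformed address
        options[f"das.isolationAddress{index}"] = address
    options["das.useDefaultIsolationAddress"] = "true" if use_default_gateway else "false"
    return options


if __name__ == "__main__":
    # Example addresses are hypothetical; use pingable addresses on the HA network.
    opts = build_isolation_options(["192.168.10.1", "192.168.20.1"], use_default_gateway=False)
    for name, value in opts.items():
        print(f"{name} = {value}")
```

Setting das.useDefaultIsolationAddress to false makes sense only when the custom addresses are reliably reachable; otherwise keep the default gateway as one of the checks.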
Recommendations: Storage
Implement multiple paths
• HBAs, storage processors (SPs), NICs, switches
• Appropriate multipathing policy
Recommendations: Storage
Storage Heartbeats
• HA selects two datastores by default
• Override auto-selected datastores if necessary
HA Events (How to Avoid Problems)
Possible HA Events
• Host failure
• Network partition
• Host isolation
HA Events: Host Failures
HA Events: Network Partition
Recommendations: Network Partition
Symptoms: Network Partition
[Diagram: the Master host before the partition; New Master hosts elected after the network partition]
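To our understanding of vSphere 5.x FDM, a master election picks the host that can access the most datastores and breaks ties with the lexically highest host managed object ID (MOID), so "host-99" beats "host-100". After a network partition, hosts that can no longer reach the master hold their own election, which is why new masters appear. A minimal sketch of that rule; the MOIDs and datastore counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    moid: str        # managed object ID, e.g. "host-99" (example values)
    datastores: int  # number of datastores this host can access


def elect_master(hosts):
    """Most datastores wins; ties go to the lexically highest MOID (string compare)."""
    if not hosts:
        raise ValueError("a partition needs at least one host")
    return max(hosts, key=lambda h: (h.datastores, h.moid))


def masters_per_partition(partitions):
    """Each network partition runs its own election and ends up with its own master."""
    return [elect_master(p).moid for p in partitions]


if __name__ == "__main__":
    cluster = [Host("host-100", 4), Host("host-99", 4), Host("host-12", 3), Host("host-7", 4)]
    print(elect_master(cluster).moid)                        # host-99
    print(masters_per_partition([cluster[:2], cluster[2:]]))  # ['host-99', 'host-7']
```

Comparing MOIDs as strings (not numbers) is deliberate here, to mirror the lexical tie-break.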
HA Events: Host Isolation
Host Isolation Policies
• Leave Powered On
• Power Off
• Shutdown
Which Policy? (How to Avoid Problems)
Depends (on HOW You Want to Avoid Problems)
Likelihood…
Recommendations: Isolation Response
Host will retain access to datastores? | VMs will retain access to VM network? | Recommended isolation policy | Rationale
Likely | Likely | Leave Powered On | VM is running fine; why power it off?
Likely | Unlikely | Leave Powered On or Shutdown | Allow HA to restart VMs on hosts that are not isolated and are likely to have access to storage
Unlikely | Likely | Power Off | Avoid having two instances of the same VM on the network
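The isolation-response table reads naturally as a lookup on two predictions. A minimal Python sketch, assuming each answer is reduced to a boolean (True = Likely); the enum and function names are hypothetical. The table has no row for the case where both storage and VM network access are unlikely, so the sketch returns None there rather than guess.

```python
from enum import Enum


class IsolationResponse(Enum):
    LEAVE_POWERED_ON = "Leave Powered On"
    SHUTDOWN = "Shutdown"
    POWER_OFF = "Power Off"


def recommend_isolation_response(storage_likely, vm_network_likely):
    """Return the acceptable policies from the table, or None if it has no row."""
    if storage_likely and vm_network_likely:
        return (IsolationResponse.LEAVE_POWERED_ON,)
    if storage_likely and not vm_network_likely:
        return (IsolationResponse.LEAVE_POWERED_ON, IsolationResponse.SHUTDOWN)
    if not storage_likely and vm_network_likely:
        return (IsolationResponse.POWER_OFF,)
    return None  # both unlikely: not covered by the table


if __name__ == "__main__":
    for storage in (True, False):
        for network in (True, False):
            print(storage, network, recommend_isolation_response(storage, network))
```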
Admission Control (How to Avoid Problems)
Admission Control Policies
• Static number of hosts
• Percentage of cluster resources
• Dedicated failover hosts
Static Number of Hosts Admission Control Policy
Recommendations: Admission Control
Number of Hosts (Host Failures Cluster Tolerates)
Recommendations: Admission Control
Number of Hosts (Host Failures Cluster Tolerates)
• Each host: 4 CPU x 2.40 GHz, 16 GB memory
• Cluster (4 hosts): 38.4 GHz CPU, 64 GB memory
• VM reservation: 2 GHz, 1024 MB
• VM reservation: 1 GHz, 2048 MB
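The slot arithmetic behind this example can be worked through in code. A minimal Python sketch, assuming a simplified model: the slot takes the largest CPU reservation and the largest memory reservation among powered-on VMs, a 32 MHz CPU floor applies when nothing is reserved (our understanding of the vSphere 5.x default), and VM memory overhead plus the capacity ESXi keeps for itself are ignored, so real vCenter figures will be somewhat lower. All class and function names are hypothetical.

```python
from dataclasses import dataclass

DEFAULT_MIN_CPU_MHZ = 32  # assumed CPU slot floor when no VM has a CPU reservation


@dataclass(frozen=True)
class Host:
    cpu_mhz: int
    mem_mb: int


@dataclass(frozen=True)
class VM:
    cpu_reservation_mhz: int
    mem_reservation_mb: int


def slot_size(powered_on_vms):
    """Slot = largest CPU reservation and largest memory reservation (overhead ignored)."""
    cpu = max([vm.cpu_reservation_mhz for vm in powered_on_vms] + [DEFAULT_MIN_CPU_MHZ])
    mem = max([vm.mem_reservation_mb for vm in powered_on_vms] + [0])
    return cpu, mem


def slots_per_host(host, slot):
    cpu_slot, mem_slot = slot
    by_cpu = host.cpu_mhz // cpu_slot
    # With no memory reservations (and overhead ignored), memory is not the limit.
    by_mem = host.mem_mb // mem_slot if mem_slot else by_cpu
    return min(by_cpu, by_mem)


def host_failures_tolerated(hosts, powered_on_vms):
    """How many hosts can fail while the survivors still hold one slot per VM."""
    slot = slot_size(powered_on_vms)
    # Worst case: the hosts with the most slots fail first.
    slots = sorted((slots_per_host(h, slot) for h in hosts), reverse=True)
    failures = 0
    while failures < len(slots) and sum(slots[failures + 1:]) >= len(powered_on_vms):
        failures += 1
    return failures


if __name__ == "__main__":
    hosts = [Host(cpu_mhz=4 * 2400, mem_mb=16 * 1024) for _ in range(4)]
    vms = [VM(2000, 1024), VM(1000, 2048)]
    slot = slot_size(vms)
    print(slot)                                 # (2000, 2048)
    print(slots_per_host(hosts[0], slot))       # 4 (CPU-bound: 9600 // 2000)
    print(host_failures_tolerated(hosts, vms))  # 3
```

Note how one VM's 2 GHz CPU reservation and the other VM's 2048 MB memory reservation combine into a single slot that is larger than either VM, which is why a few large reservations can make this policy conservative.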
Recommendations: Admission Control
Number of Hosts (Host Failures Cluster Tolerates)
[Screenshots: vSphere Windows Client and vSphere Web Client]
Recommendations: Admission Control
Number of Hosts (Host Failures Cluster Tolerates)
Override default behavior (slot size)
• vSphere Windows Client: sets a “cap” on the slot size
• vSphere Web Client: sets the exact size. Important difference.
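The cap-versus-exact distinction matters because an exact size can also make the slot larger than any VM's reservation, while a cap can only shrink it (a VM whose reservation exceeds a capped slot then consumes more than one slot). A minimal sketch, assuming the slide's description of the two clients' behavior, the 2 GHz / 2048 MB slot from the example reservations, and hypothetical override values of 4000 MHz / 1024 MB on a 9.6 GHz / 16 GB host; memory overhead is ignored and the names are illustrative only.

```python
def effective_slot_size(computed, override, mode):
    """Apply a slot-size override to a computed (cpu_mhz, mem_mb) slot.

    mode="cap":   override is an upper bound (Windows Client behavior per the slide)
    mode="fixed": override replaces the computed size (Web Client behavior per the slide)
    """
    if mode == "cap":
        return (min(computed[0], override[0]), min(computed[1], override[1]))
    if mode == "fixed":
        return tuple(override)
    raise ValueError(f"unknown mode: {mode!r}")


def slots_per_host(host_cpu_mhz, host_mem_mb, slot):
    """Slots a host can hold: limited by whichever resource runs out first."""
    return min(host_cpu_mhz // slot[0], host_mem_mb // slot[1])


if __name__ == "__main__":
    computed = (2000, 2048)  # slot from the example reservations
    override = (4000, 1024)  # hypothetical override values
    for mode in ("cap", "fixed"):
        slot = effective_slot_size(computed, override, mode)
        print(mode, slot, slots_per_host(9600, 16384, slot))
    # cap (2000, 1024) 4
    # fixed (4000, 1024) 2
```

With the same numbers entered, the cap leaves the CPU part of the slot at 2000 MHz, while the exact setting doubles it and halves the slots per host.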
Recap: Static Number of Hosts Admission Control Policy
% of Cluster Resources Admission Control Policy
Recommendations: Admission Control
Percentage of cluster resources
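With the percentage policy, admission control compares the unreserved share of cluster CPU and memory against the percentages you configure (for example 25%, so that a 4-host cluster keeps one host's worth of capacity in reserve). A minimal Python sketch, assuming a simplified model that ignores VM memory overhead and uses an assumed 32 MHz CPU value for VMs without a CPU reservation, applied to the 4-host example (38.4 GHz, 64 GB). Function names are hypothetical.

```python
DEFAULT_MIN_CPU_MHZ = 32  # assumed CPU value for a VM with no CPU reservation


def failover_capacity_pct(total, reservations, default_per_vm=0):
    """Percent of `total` still unreserved by powered-on VMs."""
    used = sum(r if r > 0 else default_per_vm for r in reservations)
    return 100.0 * (total - used) / total


def can_power_on(cluster_cpu_mhz, cluster_mem_mb, running, candidate,
                 cpu_pct_reserved, mem_pct_reserved):
    """running: list of (cpu_mhz, mem_mb) reservations; candidate: one such tuple."""
    vms = list(running) + [candidate]
    cpu_left = failover_capacity_pct(cluster_cpu_mhz, [c for c, _ in vms], DEFAULT_MIN_CPU_MHZ)
    mem_left = failover_capacity_pct(cluster_mem_mb, [m for _, m in vms])
    # Power-on is allowed only if both resources stay at or above the reserved percentage.
    return cpu_left >= cpu_pct_reserved and mem_left >= mem_pct_reserved


if __name__ == "__main__":
    cluster_cpu, cluster_mem = 4 * 9600, 4 * 16384  # 38.4 GHz, 64 GB
    running = [(2000, 1024)]
    print(can_power_on(cluster_cpu, cluster_mem, running, (1000, 2048), 25, 25))   # True
    print(can_power_on(cluster_cpu, cluster_mem, running, (28000, 1024), 25, 25))  # False
```

Unlike the slot model, each VM counts only its own reservation here, which is why this policy is the more flexible of the two.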
Dedicated Failover Hosts Admission Control Policy
Which Do I Use?!?!
Recommendations: Admission Control
“Basic design principle: Do the math, and take customer requirements into account. If you need flexibility a ‘Percentage’ is the way to go.”
– Frank Denneman & Duncan Epping, VMware vSphere 5 Clustering Technical Deepdive
vSphere HA VM Monitoring
VM Monitoring restarts a VM if both of the following are true:
• No VMware Tools heartbeat is received
• No network or disk activity occurs within the I/O stats interval
  • Default 120 seconds; customize in the vSphere Web Client
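The restart condition is a logical AND: a missing Tools heartbeat alone does not trigger a restart while the VM is still doing disk or network I/O. A minimal Python sketch of that decision, assuming a hypothetical `failure_interval_s` parameter for how long heartbeats may be missing (the monitoring sensitivity) and the slide's 120-second default I/O stats interval; the names are illustrative, not an HA API.

```python
from dataclasses import dataclass

DEFAULT_IO_STATS_INTERVAL_S = 120  # default I/O stats interval from the slide


@dataclass(frozen=True)
class VMObservation:
    seconds_since_heartbeat: float  # since the last VMware Tools heartbeat
    seconds_since_io: float         # since the last disk or network activity


def should_restart(obs, failure_interval_s, io_stats_interval_s=DEFAULT_IO_STATS_INTERVAL_S):
    """Restart only if heartbeats stopped AND the VM shows no recent I/O."""
    heartbeat_lost = obs.seconds_since_heartbeat >= failure_interval_s
    io_idle = obs.seconds_since_io >= io_stats_interval_s
    return heartbeat_lost and io_idle


if __name__ == "__main__":
    print(should_restart(VMObservation(60, 10), failure_interval_s=30))   # False: I/O still active
    print(should_restart(VMObservation(60, 150), failure_interval_s=30))  # True
```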
vSphere HA Application Monitoring
3rd-Party Solutions
• Symantec ApplicationHA
• Neverfail vAppHA
Application Awareness API open with vSphere 5.0
• Download VMware GuestAppMonitor SDK with 5.0
• Download VMware Guest SDK for vSphere 5.1
vSphere HA Futures
VMware vSphere HA Today
• Storage interconnect is the most commonly queried KB issue
• Assumes storage is connected on other hosts
• Improvements with vSphere 5.0 U1 and 5.1
Virtual Machine Component Protection (VMCP)
• Fine-grained controls for VM restart policy
• Queries destination host(s) for storage health
• Demo in VMware booth on show floor
vSphere HA Futures
VMware vSphere Fault Tolerance (FT) Today
• Protects only VMs with 1 vCPU
• Many mission-critical apps require multiple vCPUs
SMP Fault Tolerance (FT)
• Protect VMs that have more than one vCPU
Customer Support Day Events
Coming to a location near you: sharing of VMware best practices!
Support Days are a collaboration between VMware Support, Sales and customers – you learn directly from the experts
Topics are driven by customer input, and typically include:
• Best practices
• Tips/tricks
• Top issues
• Product roadmaps/demos
• Certification offerings
http://www.vmware.com/go/supportdays
VMware GSS: Important Links
Blogs
• Support Insider: blogs.vmware.com/kb
• KBTV: blogs.vmware.com/kbtv
• KB Digest: blogs.vmware.com/kbdigest
Twitter
• @vmwarecares: twitter.com/vmwarecares
• @vmwarekb: twitter.com/vmwarekb
Facebook: https://www.facebook.com/vmwkb
Communities: communities.vmware.com
YouTube KBTV: youtube.com/user/vmwarekb
Support and Downloads: vmware.com/support
Technical Support Welcome Guide: vmware.com/go/supportguide
Get Support via My VMware: my.vmware.com/group/vmware/get-help
Licensing Help Center: vmware.com/support/licensing
Knowledge Base: kb.vmware.com
Customer Support Days: vmware.com/go/supportdays
Renewals: vmware.com/go/renew
Customer Advocacy: [email protected]
Product Support Centers: vmware.com/support/product-support