Joint Business Launch
DISASTER RECOVERY AND MULTI-SITE CLUSTERING WITH WINDOWS SERVER 2008 R2
VIJAY TEWARI, PRINCIPAL PROGRAM MANAGER, WINDOWS SERVER
NOV 17, 2009
Session Objectives And Takeaways
Session Objective(s):
  Understanding the need and benefit of multi-site clusters
  What to consider as you plan, design, and deploy your first multi-site cluster
Windows Server Failover Clustering is a great solution for not only high availability, but also disaster recovery
Multi-Site Clustering
Introduction Networking Storage Quorum Workloads
Is my Cluster Resilient to Site Failures?
[Diagram: Site A, with all cluster nodes and the SAN in the same physical location]
But what if there is a catastrophic event? Fire, flood, earthquake …
Multi-Site Clusters for DR
Extends a cluster from being a High Availability solution to also being a Disaster Recovery solution
[Diagram: Site A and Site B, each with its own SAN]
A node is moved to a physically separate site
Applications are failed over to a separate physical location
Benefits of a Multi-Site Cluster
Protects against loss of an entire datacenter
Automates failover
  Reduced downtime
  Lower complexity disaster recovery plan
Reduces administrative overhead
  Automatically synchronizes application and cluster changes
  Easier to keep consistent than standalone servers
The primary reason DR solutions fail is dependence on people
Multi-Site Clustering
Introduction Networking Storage Quorum Workloads
Network Considerations
Network Options:
  1. Stretch VLANs across sites
  2. Cluster nodes can reside in different subnets
[Diagram: Site A (10.10.10.1 and 30.30.30.1) and Site B (20.20.20.1 and 40.40.40.1), connected by a Public Network and a Separate Network]
Stretching the Network
Longer distance traditionally means greater network latency
Too many missed health checks can cause false failover
Heartbeating is fully configurable
  SameSubnetDelay (default = 1 second): how often heartbeats are sent
  SameSubnetThreshold (default = 5 heartbeats): missed heartbeats before an interface is considered down
  CrossSubnetDelay (default = 1 second): how often heartbeats are sent to nodes on dissimilar subnets
  CrossSubnetThreshold (default = 5 heartbeats): missed heartbeats before an interface to a node on a dissimilar subnet is considered down
Command Line: Cluster.exe /prop
PowerShell (R2): Get-Cluster | fl *
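As a rough worked example of how the delay and threshold settings combine, the sketch below multiplies the heartbeat interval by the number of tolerated misses to estimate how long an unreachable interface can go unnoticed. It is a simplified model, not the cluster service's actual algorithm; the function name and the relaxed values are hypothetical.

```python
def detection_time_seconds(delay_seconds: float, threshold: int) -> float:
    """Estimate the time before an interface is considered down.

    Simplified model: one heartbeat every `delay_seconds`, and the interface
    is declared down after `threshold` consecutive misses.
    """
    if delay_seconds <= 0 or threshold <= 0:
        raise ValueError("delay_seconds and threshold must be positive")
    return delay_seconds * threshold


# Defaults from the slide: 1 second delay, 5 missed heartbeats.
default_window = detection_time_seconds(1.0, 5)

# Hypothetical relaxed values for a high-latency WAN link.
relaxed_window = detection_time_seconds(2.0, 10)

print(f"default: {default_window} s, relaxed: {relaxed_window} s")
```

A larger window tolerates more latency before a false failover, at the cost of slower detection of a real failure.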
Security over the WAN
Encrypt inter-node traffic (SecurityLevel cluster property)
  0 = clear text
  1 = signed (default)
  2 = encrypted
[Diagram: Site A (10.10.10.1 and 30.30.30.1) and Site B (20.20.20.1 and 40.40.40.1)]
Enhanced Dependencies – OR
Network Name resource stays up if either IP Address Resource A OR IP Address Resource B is up
[Diagram: Network Name resource depends on IP Address Resource A OR IP Address Resource B]
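The difference between an OR dependency and a conventional AND dependency can be sketched in a few lines. This only illustrates the logic; the function names and the sample state are hypothetical and do not reflect how the cluster service stores dependencies.

```python
from typing import Mapping


def name_online_with_or(ip_online: Mapping[str, bool]) -> bool:
    """OR dependency: one online IP address is enough."""
    return any(ip_online.values())


def name_online_with_and(ip_online: Mapping[str, bool]) -> bool:
    """AND dependency: every IP address must be online."""
    return all(ip_online.values())


# Hypothetical state after a cross-site failover: only the address
# that belongs to the surviving site's subnet can come online.
state = {"IP Address Resource A": False, "IP Address Resource B": True}

or_result = name_online_with_or(state)
and_result = name_online_with_and(state)
print(f"OR keeps the name online: {or_result}; AND would: {and_result}")
```

With nodes in different subnets, only one of the two addresses can be online at a time, which is why the OR form is needed for the Network Name to stay up.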
Client Reconnect Considerations
Nodes in dissimilar subnets
Failover changes the resource's IP Address
Clients need that new IP Address from DNS to reconnect
[Diagram: file server FS moves from Site A (10.10.10.111) to Site B (20.20.20.222). The DNS record is created as FS = 10.10.10.111, updated to FS = 20.20.20.222 after failover, replicated between DNS Server 1 and DNS Server 2, and then obtained by the client]
Solution #1: Configure NN Setting
RegisterAllProvidersIP (default = 0 for FALSE)
  Determines if all IP Addresses for a Network Name will be registered in DNS
  TRUE (1): IP Addresses can be online or offline and will still be registered
  Ensure application is set to try all IP Addresses, so clients can connect quicker
HostRecordTTL (default = 1200 seconds)
  Controls how long the DNS record for a cluster network name lives on the client
  Shorter TTL: DNS records for clients updated sooner
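When all IP addresses are registered, the client side has to walk the list until one answers. A minimal sketch of that retry loop follows, with the connection attempt injected as a function so the example stays self-contained; `connect_to_first_reachable` and the simulated `online` set are hypothetical and stand in for whatever the real application does.

```python
from typing import Callable, Iterable, Optional


def connect_to_first_reachable(
    addresses: Iterable[str],
    try_connect: Callable[[str], bool],
) -> Optional[str]:
    """Return the first address that accepts a connection, or None."""
    for address in addresses:
        if try_connect(address):
            return address
    return None


# Both addresses are registered in DNS (RegisterAllProvidersIP = 1).
registered = ["10.10.10.111", "20.20.20.222"]

# Simulated state after failover to Site B: only its address is online.
online = {"20.20.20.222"}

chosen = connect_to_first_reachable(registered, lambda addr: addr in online)
print(f"client connected to {chosen}")
```

In a real client each failed attempt costs a connection timeout, so the order of the list and the timeout length both affect reconnect time.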
Solution #2: Prefer Local Failover
Local failover for higher availability
  No change in IP Address
Cross-site failover for disaster recovery
[Diagram: FS = 10.10.10.111 registered on DNS Server 1 and DNS Server 2; Site A uses 10.10.10.111, Site B uses 20.20.20.222]
Solution #3: Stretch VLANs
Deploying a VLAN minimizes client reconnection times
[Diagram: a VLAN spans Site A and Site B, so FS keeps 10.10.10.111 at either site; DNS Server 1 and DNS Server 2 both hold FS = 10.10.10.111]
Solution #4: Abstraction in Device
Network device uses a 3rd IP
3rd IP is the one registered in DNS and used by the client
Example: http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/App_Networking/extmsftw2k8vistacisco.pdf
[Diagram: Site A uses 10.10.10.111 and Site B uses 20.20.20.222; the network device presents 30.30.30.30, and DNS Server 1 and DNS Server 2 hold FS = 30.30.30.30]
This is generic guidance…
If you have other creative ideas, that’s ok!
Multi-Site Clustering
Introduction Networking Storage Quorum Workloads
Storage in Multi-Site Clusters
Different than local clusters:
  Multiple storage arrays, independent per site
  Nodes commonly access their own site's storage
  No “true” shared disk visible to all nodes
[Diagram: changes are made on Site A and replicated to the replica at Site B]
Storage Considerations
Need a data replication mechanism between sites
Replication Alternatives
Replication levels:
  Hardware storage-based replication
  Software host-based replication
  Application-based replication
Synchronous vs. Asynchronous
Synchronous | Asynchronous
No data loss | Potential data loss on hard failures
Requires high bandwidth/low latency connection | Enough bandwidth to keep up with data replication
Stretches over shorter distances | Stretches over longer distances
Write latencies impact application performance | No significant impact on application performance
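The trade-off in the comparison above can be put into rough numbers. The sketch below is a deliberately simple model: a synchronous write waits for the remote acknowledgement, while an asynchronous write returns after the local write and leaves unreplicated data at risk. All function names and figures are hypothetical, and real replication products behave in more complex ways.

```python
def sync_write_latency_ms(local_write_ms: float, round_trip_ms: float) -> float:
    """Synchronous: the application waits for the remote site to acknowledge."""
    return local_write_ms + round_trip_ms


def async_write_latency_ms(local_write_ms: float) -> float:
    """Asynchronous: the application waits only for the local write."""
    return local_write_ms


def async_data_at_risk_mb(write_rate_mb_per_s: float, replication_lag_s: float) -> float:
    """Data not yet replicated if the primary site fails hard."""
    return write_rate_mb_per_s * replication_lag_s


# Hypothetical figures: 1 ms local write, 10 ms inter-site round trip,
# 20 MB/s of writes, replica running 30 seconds behind.
sync_ms = sync_write_latency_ms(1.0, 10.0)
async_ms = async_write_latency_ms(1.0)
at_risk_mb = async_data_at_risk_mb(20.0, 30.0)

print(f"sync: {sync_ms} ms per write, async: {async_ms} ms per write")
print(f"async exposure on hard failure: {at_risk_mb} MB")
```

Because round-trip time grows with distance, the synchronous penalty is what limits it to shorter distances.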
Cluster Validation and Replication
Multi-Site clusters are not required to pass the Storage tests to be supported
Validation Guide and Policy
http://go.microsoft.com/fwlink/?LinkID=119949
Multi-Site Clustering
Introduction Networking Storage Quorum Workloads
Quorum Overview
Majority is greater than 50%
Possible Voters:
  Nodes (1 each) + 1 Witness (Disk or File Share)
4 Quorum Types:
  Disk only (not recommended)
  Node and Disk majority
  Node majority
  Node and File Share majority
[Diagram: five voters, one vote each]
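"Greater than 50%" translates into a simple integer calculation. The sketch below counts one vote per node plus an optional witness vote, as described above; the function name is hypothetical.

```python
def votes_needed(node_count: int, has_witness: bool = False) -> int:
    """Smallest number of votes that is strictly more than half of all votes."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    total_votes = node_count + (1 if has_witness else 0)
    return total_votes // 2 + 1


for nodes, witness in [(5, False), (4, False), (4, True), (2, True)]:
    label = "with witness" if witness else "no witness"
    print(f"{nodes} nodes, {label}: {votes_needed(nodes, witness)} votes needed")
```

Note that 4 nodes need 3 votes with or without a witness; the witness raises the number of votes available, not the number required.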
Replicated Disk Witness
A witness is a decision maker when nodes lose network connectivity
When a witness is not a single decision maker, problems occur
Do not use in multi-site clusters unless directed by vendor
[Diagram: Site A and Site B nodes with one vote each, and replicated storage from the vendor acting as the witness]
Node Majority
5 Node Cluster: Majority = 3
Majority in Primary Site
[Diagram: Site A and Site B, each with its own SAN]
Cross-site network connectivity broken!
  Primary site: Can I communicate with a majority of the nodes in the cluster? Yes, then stay up
  Other site: Can I communicate with a majority of the nodes in the cluster? No, drop out of cluster membership
Node Majority
5 Node Cluster: Majority = 3
Majority in Primary Site
[Diagram: Site A and Site B, each with its own SAN]
Disaster at the primary site
  Primary site: We are down!
  Other site: Can I communicate with a majority of the nodes in the cluster? No, drop out of cluster membership
Need to force quorum manually
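The two Node Majority scenarios above can be replayed with a small model: each group of sites that can still talk to each other counts its votes and stays up only if it holds a majority of all votes. The 3/2 node split and all names are assumptions for illustration; the slides state only that the majority sits in the primary site.

```python
from typing import Dict, List, Set


def sites_with_quorum(
    site_votes: Dict[str, int],
    connected_groups: List[Set[str]],
) -> Set[str]:
    """Return the sites that stay up.

    `connected_groups` lists the groups of surviving sites that can still
    reach each other; a destroyed site appears in no group.
    """
    needed = sum(site_votes.values()) // 2 + 1
    survivors: Set[str] = set()
    for group in connected_groups:
        if sum(site_votes[site] for site in group) >= needed:
            survivors |= group
    return survivors


# Assumed layout: 5 nodes, 3 in the primary site and 2 in the secondary site.
votes = {"primary": 3, "secondary": 2}

# Scenario 1: the cross-site link breaks; both sites are alive but isolated.
after_link_break = sites_with_quorum(votes, [{"primary"}, {"secondary"}])

# Scenario 2: the primary site is lost; only the secondary site remains.
after_primary_loss = sites_with_quorum(votes, [{"secondary"}])

print(f"link break: {after_link_break}, primary lost: {after_primary_loss}")
```

The empty result in the second scenario is the case where quorum has to be forced manually on the surviving site.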
Forcing Quorum
Always understand why quorum was lost
Used to bring cluster online without quorum
Cluster starts in a special “forced” state
Once majority achieved, no more “forced” state
Command Line: net start clussvc /fixquorum (or /fq)
PowerShell (R2): Start-ClusterNode -FixQuorum (or -fq)
Multi-Site With File Share Witness
[Diagram: Site A and Site B, each with a SAN and replicated storage, connected over the WAN; File Share Witness \\Foo\Cluster1 in Site C]
Complete resiliency and automatic recovery from the loss of any 1 site
Multi-Site With File Share Witness
[Diagram: Site A and Site B, each with a SAN and replicated storage, connected over the WAN; File Share Witness in Site C]
Complete resiliency and automatic recovery from the loss of connection between sites
Multi-Site With File Share Witness
[Diagram: File Share Witness \\Foo\Cluster1 reachable from both sites]
  Site that locks the witness: Can I communicate with a majority of the nodes (+FSW) in the cluster? Yes, then stay up
  Other site: Can I communicate with a majority of the nodes in the cluster? No (lock failed), drop out of cluster membership
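To see why the witness in a third site matters for an even number of nodes, the sketch below checks whether the remaining votes still form a majority after any single site is lost. The 2 + 2 node layout and all names are hypothetical, and the model assumes the surviving sites can still reach each other.

```python
from typing import Dict


def survives_site_loss(site_votes: Dict[str, int], lost_site: str) -> bool:
    """True if the remaining sites together still hold a majority of all votes."""
    total = sum(site_votes.values())
    remaining = total - site_votes[lost_site]
    return remaining >= total // 2 + 1


# Hypothetical even cluster: 2 nodes per site, no witness.
without_witness = {"Site A": 2, "Site B": 2}

# Same cluster with a file share witness (1 vote) in a third site.
with_witness = {"Site A": 2, "Site B": 2, "Site C": 1}

for name, layout in [("no witness", without_witness), ("FSW in Site C", with_witness)]:
    results = {site: survives_site_loss(layout, site) for site in layout}
    print(f"{name}: {results}")
```

Without the witness, losing either site leaves 2 of 4 votes, which is not a majority; with it, any single loss leaves at least 3 of 5.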
Quorum Model Summary
No Majority: Disk Only
  Not Recommended
  Use as directed by vendor
Node and Disk Majority
  Use as directed by vendor
Node Majority
  Odd number of nodes
  More nodes in primary site
Node and File Share Majority
  Even number of nodes
  Best availability solution: FSW in 3rd site
Multi-Site Clustering
Introduction Networking Storage Quorum Workloads
Hyper-V in a Multi-Site Cluster
Considerations by area:
Network:
  - On cross-subnet failover, if guest is …
    - DHCP, then IP updated automatically
    - Statically configured IP, then admin needs to configure new IP
  - Using a VLAN is preferred with live migration between sites
Storage:
  - 3rd party replication solution required
  - Configuration with CSV (explained next)
Quorum:
  - No special considerations
Links: http://technet.microsoft.com/en-us/library/dd197488.aspx
CSV in a Multi-Site Cluster
Architectural assumptions collide…
  Replication solutions assume only 1 array accessed at a time
  CSV assumes all nodes can concurrently access the LUN
CSV is not required for Live Migration
Talk to your storage vendor for their support story
[Diagram: VHD on Read/Write storage for nodes in the primary site, replicated to a Read-Only replica for nodes in the disaster recovery site; a VM attempts to access the replica]
SQL in a Multi-Site Cluster
Considerations by area:
Network:
  - SQL does not support OR dependency
  - Need to stretch VLAN between sites
Storage:
  - No special considerations
  - 3rd party replication solution required
Quorum:
  - No special considerations
Links:
  http://technet.microsoft.com/en-us/library/ms189134.aspx
  http://technet.microsoft.com/en-us/library/ms178128.aspx
Exchange in a Multi-Site Cluster
Considerations by area:
Network:
  - No VLAN needed
  - Change HostRecordTTL from 20 minutes to 5 minutes
  - CCR supports 2 nodes, one per site
Storage:
  - Exchange CCR provides application-based replication
Quorum:
  - File share witness on the Hub Transport server on primary site
Links:
  http://technet.microsoft.com/en-us/library/bb124721.aspx
  http://technet.microsoft.com/en-us/library/aa998848.aspx
Demo
Setting up a cluster and Live Migration
Demo Environment Overview
HVNODE1 (Microsoft Hyper-V Server 2008 R2)
HVNODE2 (Windows Server 2008 R2 deployed as Server Core)
Gigabit Switch
CONTOSO: Domain Controller and iSCSI storage
Session Summary
Multi-Site Failover Clustering has many benefits
Redundancy is needed everywhere
Understand your replication needs
Compare VLANs with multiple subnets
Plan quorum model & nodes before deployment
Follow the checklist and best practices
Resources
Cluster Team Blog: http://blogs.msdn.com/clustering/
Cluster Information Portal: http://www.microsoft.com/windowsserver2008/en/us/clustering-home.aspx
Clustering Technical Resources: http://www.microsoft.com/windowsserver2008/en/us/clustering-resources.aspx
Clustering Forum (2008): http://forums.technet.microsoft.com/en-US/winserverClustering/threads/
Clustering Forum (2008 R2): http://social.technet.microsoft.com/Forums/en-US/windowsserver2008r2highavailability/threads/
Clustering Newsgroup: http://www.microsoft.com/communities/newsgroups/list/en-us/default.aspx?dg=microsoft.public.windows.server.clustering
Failover Clustering Deployment Guide: http://technet.microsoft.com/en-us/library/dd197477.aspx
TechNet: Configure a Service or Application for High Availability: http://technet.microsoft.com/en-us/library/cc732478.aspx
TechNet: Installing a Failover Cluster: http://technet.microsoft.com/en-us/library/cc772178.aspx
TechNet: Creating a Failover Cluster: http://technet.microsoft.com/en-us/library/cc755009.aspx
Webcast (2008 R2): Introduction to Failover Clustering: http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032407190&Culture=en-US
Webcast (2008 R2): HA Basics with Hyper-V: http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032407222&Culture=en-US
Webcast (2008 R2): Cluster Shared Volumes (CSV): http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032407238&Culture=en-US
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.