Failover Clustering: Quorum Model Design for Your Private Cloud
Amitabh Tamhane
Senior Program Manager
Windows Server Clustering
MDC-B403
Session Objectives And Takeaways
Session Objective(s):
• Walk-through Cluster Quorum Fundamentals
• New Quorum Features in Windows Server 2012 & R2
• Configuration of cluster quorum
• Insight into disaster recovery multi-site quorum
Key Takeaway(s):
• “Simplified” Cluster quorum configuration
• Dynamic Quorum: increases availability of cluster
• Step by step configuration of DR multi-site quorum
Quorum Basics
Cluster challenges
• Site Power Outage
• Network Disconnect
• Node Shutdown for Patching
• Node Crash
• Quorum Witness Failure
• Add/Evict Node
How do I make sure my Cluster stays up?
Why Quorum
Faster Start & Recovery of Cluster
• Effective quorum policy helps faster start of cluster
• Determines the set of nodes that have latest cluster database
Identifying point when to start workload
• Determines the point when cluster can host applications
• Effective quorum policy prevents unnecessary downtime
Addressing split-brain
• Prevent two disjointed instances of the same cluster
Windows Server 2012+R2: Quorum Goals
Simplify Quorum Configuration
• Quorum shouldn’t affect number of nodes in cluster
• Simplified quorum witness selection
• Updated wizard for quorum configuration
Increase Cluster High Availability
• Cluster more resilient to node/witness failures
• With Dynamic Quorum, cluster can now survive with fewer than 50% of its nodes active
• Cluster can now survive an even 50/50 split of nodes
Enable more disaster recovery quorum scenarios
Voting Elements in Quorum
Nodes
• Every cluster node has 1 vote
• User configurable per node
Witness
• Witness has 1 vote
• Disk Witness or File Share Witness
• User configurable
• Single witness per cluster
Cluster needs majority of participating votes to survive
More about this in later slides…
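The majority rule above is plain integer arithmetic: a cluster with N participating votes needs more than half of them. A minimal sketch follows; the function name `Get-QuorumMajority` is hypothetical and is not part of the FailoverClusters module.

```powershell
# Hypothetical helper, not a cluster cmdlet.
# Returns the smallest number of votes that is a strict majority of $TotalVotes.
function Get-QuorumMajority {
    param([Parameter(Mandatory)][int]$TotalVotes)
    if ($TotalVotes -lt 1) { throw "TotalVotes must be at least 1" }
    # Half the votes, rounded down, plus one.
    return [int][math]::Floor([double]$TotalVotes / 2) + 1
}

Get-QuorumMajority -TotalVotes 5   # 4 nodes + witness -> 3
Get-QuorumMajority -TotalVotes 4   # 4 nodes, no witness -> 3
```

Both calls return 3, which shows why an even vote count buys no extra fault tolerance over the next lower odd count.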
Disk Witness Considerations
Dedicated LUN for internal cluster use
Quorum Disk
• Used as arbitration point
• Stores a copy of cluster database
Recommendations:
• Small disk, at least 512 MB in size
• Dedicated LUN
• NTFS or ReFS formatted
• No need for drive letter
File Share Witness Considerations
File Server Location
• Recommended at 3rd separate site
• Not on a node in the same cluster
• Not inside VM running in the same cluster
• HA File Server configured in a separate cluster
Simple Windows File Server
• Easy to deploy
• Single File Server can be used for multiple clusters
• Unique File Share per cluster
• CNO requires write permissions on the File Share
File Share Witness
• No copy of cluster database
• Minimal network traffic: cluster membership changes only
Partition In Time: Disk Witness
Latest cluster database copy on Disk Witness
[Diagram, nodes 1 and 2: one node updates the cluster database and the copy on the disk witness is updated; the cluster is later started with the latest database.]
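Partition In Time: File Share Witness
Prevents node with stale database from forming cluster
[Diagram, nodes 1 and 2: one node updates the cluster database and only the time-stamp on the file share witness is updated; the cluster is not started on the node that lacks the latest database.]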
Deciding Which Witness to Use
Witness: Disk vs. File Share
                             Disk                          File Share
Prevents Split-Brain         Yes                           Yes
Prevents Partition-in-Time   Yes                           Yes
Solves Partition-in-Time     Yes                           No
Arbitration Type             SCSI Persistent Reservation   Witness.log file on SMB Share
Recommended: Use Disk Witness if you have shared storage
Key Points to Remember
Quorum enables cluster to survive
• Determines the point at which cluster is successfully formed
Voting Elements
• Each node has 1 vote and (if configured) witness has 1 vote
• Look for updated guidance with Dynamic Witness
Witness selection: Disk or File Share
• Disk Witness (recommended): stores Cluster DB
• File Share Witness: multisite cluster with replicated storage
Node Vote Weights
Node Vote Weights
Nodes with No-Vote continue to be part of the cluster
• Receive cluster database updates
• Ability to host applications
Granular control of which nodes have votes
• Directly affects quorum calculations
Limit impact on cluster quorum
• Cluster quorum does not change if nodes with no vote go down
Why modify Node Vote?
Not all nodes in your cluster are equally important
• Typically nodes from Disaster Recovery Backup site
Primarily used for multi-site clusters
• Recommended only for manual failover across sites
• More about this in later slides…
[Diagram: five nodes, 1 to 5, across Site A and Site B; three nodes have a Vote and two nodes have No Vote.]
Adjusting majority votes using Node Votes
Original: Total Votes = 4, Majority Votes = 3
Updated: Total Votes = 3, Majority Votes = 2
[Diagram: four nodes, 1 to 4; three nodes keep their Vote and one node has No Vote. Quorum maintained, cluster survives.]
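The Original and Updated figures above can be reproduced from a list of per-node vote weights. This is a sketch only: `Get-VoteSummary` is a hypothetical helper that does arithmetic on the weights and does not query a cluster.

```powershell
# Hypothetical helper, not a FailoverClusters cmdlet: pure arithmetic on vote weights.
function Get-VoteSummary {
    param([Parameter(Mandatory)][int[]]$NodeWeights)
    $total = 0
    foreach ($w in $NodeWeights) { $total += $w }
    [pscustomobject]@{
        TotalVotes    = $total
        # Strict majority: half the votes, rounded down, plus one.
        MajorityVotes = [int][math]::Floor([double]$total / 2) + 1
    }
}

Get-VoteSummary -NodeWeights 1, 1, 1, 1   # original: all four nodes vote
Get-VoteSummary -NodeWeights 1, 1, 1, 0   # updated: one node's vote removed
```

With one vote removed, the required majority drops from 3 to 2, so the cluster can lose more voting nodes before quorum is at risk.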
Adjusting Node Vote Weights
• Granular control of which nodes have votes
• Configurable per cluster node
• Can be modified with no downtime
NodeWeight
• Default = 1
• Remove Vote = 0
• Cluster Assigned = 1
(Get-ClusterNode <name>).NodeWeight = 0
Use PowerShell or Configure Quorum Wizard
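As a sketch of how the one-liner above might be applied to several nodes at once, the following removes the vote from a set of backup-site nodes and then lists the assigned votes. The node names are hypothetical placeholders.

```powershell
Import-Module FailoverClusters

# Hypothetical node names for the backup (DR) site; replace with your own.
$drNodes = "DR-Node1", "DR-Node2"

foreach ($name in $drNodes) {
    # NodeWeight = 0 removes the node's vote; the node stays in the cluster.
    (Get-ClusterNode -Name $name).NodeWeight = 0
}

# Review the assigned vote for every node.
Get-ClusterNode | Format-Table Name, State, NodeWeight -AutoSize
```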
UI: Viewing Node Vote Weights
Updated Nodes Page For Easy Viewing
• User configured node vote weights in “Assigned Vote” column
• Cluster assigned dynamic vote weights in “Current Vote” column
Dynamic Quorum
Dynamic Quorum
Automatic Node Vote Adjustment
• Automatic adjustment of Node Vote based on node’s state
• Active Node: Dynamic Vote = 1
• Down Node: Dynamic Vote = 0
• No change for node with no assigned vote
Dynamic Quorum Majority
• Quorum majority is dynamically determined by active cluster nodes
Increase High Availability of Cluster Itself
• Sustain sequential node failures or shutdowns
• Enables cluster to survive with fewer than 50% of nodes active
Dynamic Quorum Functionality
Last Man Standing
• Cluster can now survive with only 1 node
• 64-node cluster all the way down to 1 node
Enabled By Default
• Configurable via PowerShell
Seamless Integration
• With existing cluster quorum features & configurations
• With multisite disaster recovery deployments
Dynamic Quorum for Witness
Automatic Witness Vote Adjustment
• Automatic adjustment of Witness Vote based on active cluster membership
• Even number of active nodes with Dynamic Vote of 1: Witness Dynamic Vote = 1
• Odd number of active nodes with Dynamic Vote of 1: Witness Dynamic Vote = 0
• Cluster now has the smarts to determine when to use Witness Vote!
State of Witness
• Witness Offline or Failed will automatically make Witness Dynamic Vote = 0
New Recommendation
• Always configure a witness with Windows Server 2012 R2
• Clustering will determine when it is best to use the Witness
• Configure Disk Witness if shared storage, otherwise FSW
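The even/odd rule on this slide can be restated as a small function. This is only a sketch of the rule as the slide describes it: `Get-WitnessDynamicVote` is a hypothetical name, and the function does not read anything from a real cluster.

```powershell
# Hypothetical helper that restates the slide's rule; it does not query a cluster.
function Get-WitnessDynamicVote {
    param(
        [Parameter(Mandatory)][int]$ActiveVotingNodes,  # active nodes with Dynamic Vote = 1
        [bool]$WitnessOnline = $true
    )
    if ($ActiveVotingNodes -lt 1) { throw "ActiveVotingNodes must be at least 1" }
    # An offline or failed witness never votes.
    if (-not $WitnessOnline) { return 0 }
    # Even node count: the witness votes, making the total odd. Odd count: it abstains.
    if ($ActiveVotingNodes % 2 -eq 0) { return 1 } else { return 0 }
}

Get-WitnessDynamicVote -ActiveVotingNodes 4                         # 1
Get-WitnessDynamicVote -ActiveVotingNodes 3                         # 0
Get-WitnessDynamicVote -ActiveVotingNodes 4 -WitnessOnline $false   # 0
```

In each case where the witness is online, the node votes plus the witness vote add up to an odd total.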
User Configurable Quorum Properties
DynamicQuorum (Cluster Common Prop)
• Default: Enabled
• 1: Enabled
• 0: Disabled
• PowerShell: (Get-Cluster).DynamicQuorum = 1
NodeWeight (Node Common Prop)
• Default: Vote assigned
• 1: Cluster Managed
• 0: Disable Vote
• PowerShell: (Get-ClusterNode "name").NodeWeight = 1
Cluster Managed Quorum Properties
DynamicWeight (Node Common Prop, value adjusted by cluster)
• 1: Node Has Vote
• 0: Node Has No Vote
• PowerShell: (Get-ClusterNode "name").DynamicWeight (read only)
WitnessDynamicWeight (Cluster Common Prop, value adjusted by cluster)
• 1: Witness Has Vote
• 0: Witness Has No Vote
• PowerShell: (Get-Cluster).WitnessDynamicWeight (read only)
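To see the user-configured and cluster-managed values side by side, the properties named on these two slides can be listed in one pass. A minimal sketch, assuming it runs on a cluster node with the FailoverClusters module available:

```powershell
Import-Module FailoverClusters

# Assigned vote (NodeWeight) next to the cluster-managed vote (DynamicWeight).
Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight -AutoSize

# Cluster-wide values: whether Dynamic Quorum is enabled and
# whether the witness currently holds a vote.
Get-Cluster | Format-List Name, DynamicQuorum, WitnessDynamicWeight
```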
Dynamic Quorum : Node Scenarios
Node Shutdown
• Node removes its own vote
Node Join
• On successful join the node gets its vote back
Node Crash
• Remaining active nodes remove vote of the downed node
Dynamic Quorum : Witness Scenarios
Witness Offline
• Witness vote gets removed by the cluster
Witness Online
• If necessary, Witness vote is added back by the cluster
Witness Failure
• Witness vote gets removed by the cluster
Tie Breaker
Cluster will survive simultaneous loss of 50% votes
• Especially useful in multi-site DR scenarios with even split
• Cluster always ensures the total number of votes is odd
One site automatically elected to win
• By default, cluster randomly selects a node to take its vote out
• LowerQuorumPriorityNodeID cluster common property identifies a node to take its vote out
[Diagram: one cluster split across Site1 and Site2]
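A sketch of setting the LowerQuorumPriorityNodeID property mentioned above, so that a chosen node rather than a random one gives up its vote. The node name is a hypothetical placeholder.

```powershell
Import-Module FailoverClusters

# Hypothetical node name in the site that should lose an even split.
$node = Get-ClusterNode -Name "Site2-Node1"

# The property takes the node's ID, not its name.
(Get-Cluster).LowerQuorumPriorityNodeID = $node.Id

# Confirm the setting.
(Get-Cluster).LowerQuorumPriorityNodeID
```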
Last Man Standing: Witness Configured
4 Nodes + Witness Configured (N = Number of Votes)
[Diagram, steps 1 to 4: nodes go down one at a time. Vote counts shown: N = 5, Majority = 3; then N = 3, Majority = 2 at each of the later steps. Last Man Standing: cluster survives.]
Last Man Standing: No Witness
5 Nodes + No Witness Configured (N = Number of Votes)
[Diagram, steps 1 to 5: nodes go down one at a time. Vote counts shown: N = 5, Majority = 3; N = 3, Majority = 2; N = 3, Majority = 2; N = 2, Majority = 2; N = 1, Majority = 1. Last Man Standing: cluster survives.]
No Witness: Last Two Active Nodes
• Cluster dynamically removes one node’s vote
• Cluster can sustain communication loss between the last two nodes
• Cluster can sustain crash of node with no vote
• Random selection of the node whose vote gets removed
• Cluster survives graceful shutdown of either node
                Node 1   Node 2
State           UP       UP
NodeWeight      1        1
DynamicWeight   1        0
Dynamic Quorum
DEMO
Dynamic Quorum Considerations
Simultaneous Loss of Majority Nodes
• Need existing majority votes to update new majority votes
• Cluster cannot sustain simultaneous loss of majority nodes
Always Configure Witness
• Witness helps cluster to sustain one extra node failure
• Witness helps in giving equal opportunity to survive in DR scenarios (more details later)
Cluster running with fewer than 50% of nodes
• The remaining nodes become more important
• “Last Man Standing” node becomes necessary for cluster start
• Helps prevent partition in time
Dynamic Quorum vs. Disk Only Quorum
Disk Only Quorum
• No flexibility around vote adjustment (1 vote of disk witness)
• Disk Witness is single point of failure
Dynamic Quorum
• Helps achieve true “Last Man Standing”
• Increases cluster availability by making cluster resilient
With Dynamic Quorum, no need for Disk Only Quorum
• Why lose the cluster when storage is lost?
Key Points to Remember
Dynamic Quorum increases Availability of Cluster
• Automatic adjustment of dynamic vote of nodes & witness
Dynamic Quorum enables “Last Man Standing”
• Cluster can survive with only 1 node remaining
Node Vote Adjustment
• Only with Manual Failover to DR site; remove vote of nodes from DR site
Simplified witness selection with Dynamic Witness
• Best practice guidelines to always configure quorum witness
Configuring Cluster Quorum
Intuitive Quorum Configuration
Updated Cluster UI Experience
• Simplified quorum configuration with updated quorum wizard
Updated Nodes Page
• Ability to view node’s user configured vote & cluster managed vote
Simplified Terminology
• Removed legacy concepts of ‘quorum modes’
• It is all about witness selection: “File Share Witness” or “Disk Witness” or “No Witness”
Updated Quorum Validation
• Simplified guidance & warning text
• Nodes & witness vote information is captured in detail
Configured via Cluster Manager GUI and PowerShell
Cluster Quorum Wizard
PowerShell
Set-ClusterQuorum -NoWitness
Set-ClusterQuorum -DiskWitness "DiskResourceName"
Set-ClusterQuorum -FileShareWitness "FileShareName"
Set-ClusterQuorum -DiskOnly "DiskResourceName"
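As a sketch of how one of these commands might be used and then checked, the following configures a file share witness and displays the resulting quorum settings. The UNC path is a hypothetical placeholder for a share on a file server outside this cluster.

```powershell
Import-Module FailoverClusters

# Hypothetical UNC path; the share should live on a server outside this cluster.
Set-ClusterQuorum -FileShareWitness "\\FS01\ClusterWitness"

# Show the resulting quorum configuration.
Get-ClusterQuorum | Format-List *
```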
Updated PowerShell
New Quorum Wizard
DEMO
Recovery Actions
Force Quorum
Manual Override
• Allows the cluster to start without majority votes
Cluster starts in a special “forced quorum” mode
• Remains in this mode until majority votes are achieved
• Cluster then automatically switches to normal functioning
Caution
• Always understand why quorum was lost
• Split-brain between nodes possible
You are now in control!
Force Quorum Flag
Command Line: net start clussvc /ForceQuorum
PowerShell: Start-ClusterNode -ForceQuorum
Prevent Quorum
Helps prevent nodes with a vote from forming a cluster
• Nodes started with ‘Prevent Quorum’ always join an existing cluster
Applicable to cluster in “Force Quorum”
• Always start remaining nodes with ‘Prevent Quorum’
Helps prevent overwriting of latest cluster database
• Forward progress made by nodes in ‘Force Quorum’ is not lost
Most applicable in multisite DR setup
Prevent Quorum Flag
Command Line: net start clussvc /PQ
PowerShell: Start-ClusterNode -PreventQuorum
Force Quorum Resiliency
Cluster detects partitions after a manual Force Quorum
• Cluster has built-in logic to track the partition started with Force Quorum
Partition started with Force Quorum is deemed authoritative
• Other partitions automatically restart upon detecting a FQ cluster
Restarted nodes in other partition join the FQ cluster
• Cluster automatically restarts the nodes with Prevent Quorum
[Diagram: cluster split across Site1 and Site2. Manual override with ForceQuorum; nodes are restarted when the Site2 partition is detected.]
Multi-Site DR Quorum
Considerations of Quorum with DR solutions
Types of Multi-Site DR Configurations
Automatic Failover
• Services automatically fail over to recovery site in the event of a disaster
• All sites equal
Manual Failover
• Services are manually failed over to recovery site in the event of a disaster
• Primary & Backup (DR) sites
What are your Service Level Agreements (SLAs)?
In the event of a disaster, how do you want to switch to your DR site?
Automatic Failover Considerations
Node Vote Weight Adjustments
• All nodes equally important
• No need to modify node vote weights
All Sites Equal
• Allow cluster to sustain failure of any one site
• Allow automatic failover of workload to the surviving site
Number of Nodes per Site
• Keep equal number of nodes in both sites
• Helps cluster sustain failure of any site
• Otherwise the site with more nodes would become Primary site
Automatic Failover: Witness Considerations
Always Configure File Share Witness (recommended)
• File Server running at a separate site
• The separate site must be accessible from the workload sites
• Allows cluster to sustain communication loss between sites
Witness Selection
• Highly Available File Server, for witness, in a separate cluster
• Disk Witness can be used as directed by storage vendor
Automatic Failover: 2-Site Cluster
Failover Example
[Diagram: nodes 1 to 4, each with a Vote, split across Site 1 and Site 2, plus the witness Vote at Site 3. Site 2 goes down; Site 1 can reach the FSW, so the cluster survives.]
Automatic Failover: WAN Link Issues
Witness Dynamic Vote & Tie Breaker
[Diagram: nodes 1 to 4 split across Site 1 and Site 2, witness at Site 3. The cluster removes the witness Vote and removes Node 3’s Vote as the tie breaker. Site 2 goes down; Site 1 wins and the cluster survives.]
Manual Failover Considerations
All Sites Not Equal
• Cluster cannot sustain failure of Primary site
• Allow cluster to sustain failure of the Backup site
Node Vote Weight Adjustments
• Prevent nodes in Backup site from affecting cluster quorum
• Remove node vote weight of nodes in Backup site
Number of Nodes per Site
• No requirement to keep equal number of nodes in both sites
Manual Failover: Workload Considerations
Workload Management
• Use Preferred Owners to prioritize keeping workload on Primary site
Recovery Actions
• Primary site failure would require “Force Quorum” on Backup site
• Recover Primary site nodes using “Prevent Quorum”
Manual Failover: Witness Considerations
Always Configure Witness
• File Server running at a separate site (recommended)
• File Server running local in Primary Site may be OK (consider recovery scenarios)
Witness Selection
• Highly Available File Server, for witness, in a separate cluster
• Asymmetric Disk Witness can be used as well (consider recovery scenarios)
Asymmetric Disk Witness
Disk Witness accessibility
• Subset of nodes can access the disk
• Witness can come online only on subset of nodes
Most applicable in multi-site clusters
• Disk only seen by primary site
• Witness can come online only on primary site
Cluster recognizes asymmetric storage topology
• Uses this to place cluster quorum group
[Diagram: nodes 1 to 4 with a SAN that only a subset of the nodes can reach]
Manual Failover: 2-Site Cluster
Backup Site Down
[Diagram: nodes 1 to 4 across Primary Site and Backup Site, witness at Witness Site. Two nodes and the witness each have a Vote; two nodes have No Vote. Backup Site goes down: no effect on quorum, cluster survives.]
Manual Failover: Temporary Outage
Recommended Recovery
[Diagram: nodes 1 to 4 across Primary Site and Backup Site, witness at Witness Site. Two nodes and the witness each have a Vote; two nodes have No Vote.]
Primary Site down: not enough votes, cluster down.
1. Force Quorum cluster start
2. Start nodes with Prevent Quorum
3. Successful join to Force Quorum Backup nodes
4. Cluster starts, not in Force Quorum
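The recovery order above can be sketched in PowerShell. Node names are hypothetical, and the parameter spellings follow the slides; it is worth checking `Get-Help Start-ClusterNode` on your build before relying on them.

```powershell
Import-Module FailoverClusters

# Hypothetical node names; parameter spellings follow the slides.

# 1. On a surviving Backup site node: start the cluster without majority votes.
Start-ClusterNode -Name "Backup-Node1" -ForceQuorum

# 2. When the Primary site nodes return, start each one with Prevent Quorum
#    so that it joins the forced cluster instead of forming its own.
foreach ($name in "Primary-Node1", "Primary-Node2") {
    Start-ClusterNode -Name $name -PreventQuorum
}

# 3. Check that all nodes have joined.
Get-ClusterNode | Format-Table Name, State -AutoSize
```

Starting the returning nodes with Prevent Quorum is what protects the changes made while the cluster ran in Force Quorum.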
Manual Failover: Long Term Outage
Recommended Recovery
[Diagram: nodes 1 to 4 across Primary Site and Backup Site, witness at Witness Site. Two nodes and the witness each have a Vote; two nodes have No Vote.]
Primary Site down: not enough votes, cluster down.
• Force Quorum cluster start
• Assign votes to nodes in Backup Site: it becomes the new Primary Site
• Remove votes from old Primary Site: it becomes the new Backup Site
• Cluster is no longer in Force Quorum
• Start the old Primary Site nodes with “Prevent Quorum”
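Swapping the site roles comes down to setting NodeWeight on each node. A minimal sketch, assuming the cluster is already running on the old Backup site; all node names are hypothetical placeholders.

```powershell
Import-Module FailoverClusters

# Hypothetical node names for the two sites.
$newPrimary = "Backup-Node1", "Backup-Node2"     # old Backup site, now Primary
$newBackup  = "Primary-Node1", "Primary-Node2"   # old Primary site, now Backup

# Give votes to the nodes that now run the workload.
foreach ($name in $newPrimary) { (Get-ClusterNode -Name $name).NodeWeight = 1 }

# Remove votes from the old Primary site's nodes.
foreach ($name in $newBackup) { (Get-ClusterNode -Name $name).NodeWeight = 0 }

# Review assigned and cluster-managed votes.
Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight -AutoSize
```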
Key Points to Remember
Identify your SLAs for multisite clusters
• Automatic vs. Manual Failover
Automatic Failover
• Keep nodes equal in both sites
• Configure File Share Witness at separate site
Manual Failover
• Remove votes of nodes in DR site
• Remember the order of recovery actions
• Configure asymmetric disk witness or FSW as per votes
In Review: Session Objectives And Takeaways
Session Objective(s):
• Walk-through Cluster Quorum Fundamentals
• New Quorum Features in Windows Server 2012
• Configuration of cluster quorum
• Insight into disaster recovery multi-site quorum
Key Takeaway(s):
• “Simplified” Cluster quorum configuration
• Dynamic Quorum: increases availability of cluster
• Step by step configuration of DR multi-site quorum
Related content
Breakout Sessions
• MDC-B305 Continuous Availability: Deploying and Managing Clusters Using Windows Server 2012 R2
• MDC-B311 Application Availability Strategies for the Private Cloud
• MDC-B331 Upgrading Your Private Cloud with Windows Server 2012 R2
• MDC-B333 Storage and Availability Improvements in Windows Server 2012 R2
• MDC-B336 Cluster in a Box 2013: How Real Customers Are Making Their Business Highly Available…
• MDC-B337 Failover Cluster Networking Essentials
• MDC-B375 Microsoft Private Cloud Fast Track v3: Private Cloud Reference Architecture…
• MDC-B403 Failover Clustering: Quorum Model Design for Your Private Cloud
Hands-on Labs
• MDC-H303 Configuring Hyper-V over Highly Available SMB Storage
Find Me Later at the Storage Booth
Resources
• Learning: Microsoft Certification & Training Resources, www.microsoft.com/learning
• msdn: Resources for Developers, http://microsoft.com/msdn
• TechNet: Resources for IT Professionals, http://microsoft.com/technet
• Sessions on Demand: http://channel9.msdn.com/Events/TechEd
© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.