
Failover Cluster Step-by-Step Guide: Configuring the Quorum in a Failover Cluster

Updated: May 5, 2008

Applies To: Windows Server 2008

A failover cluster is a group of independent computers that work together to increase the availability of applications and services. The clustered servers (called nodes) are connected by physical cables and by software. If one of the cluster nodes fails, another node begins to provide service (a process known as failover). Users experience a minimum of disruptions in service.

This guide describes the new quorum options in failover clusters in Windows Server® 2008 and provides steps for configuring the quorum in a failover cluster. By following the configuration steps in this guide, you can learn about failover clusters and familiarize yourself with quorum modes in failover clustering.

In Windows Server 2008, the improvements to failover clusters (formerly known as server clusters) are aimed at simplifying clusters, making them more secure, and enhancing cluster stability. Cluster setup and management are easier. Security and networking in clusters have been improved, as has the way a failover cluster communicates with storage. For more information about improvements to failover clusters, see http://go.microsoft.com/fwlink/?LinkId=62368.

In this guide

Overview of quorum in a failover cluster

Requirements and recommendations for quorum configurations

Steps for viewing the quorum configuration of a failover cluster

Steps for changing the quorum configuration in a failover cluster

Troubleshooting: how to force a cluster to start without quorum

Additional references

For additional background information, also see Appendix A: Details of How Quorum Works in a Failover Cluster and Appendix B: Additional Information About Quorum Modes.

Overview of quorum in a failover cluster

In simple terms, the quorum for a cluster is the number of elements that must be online for that cluster to continue running. In effect, each element can cast one “vote” to determine whether the cluster continues running. The voting elements are nodes or, in some cases, a disk witness or file share witness. Each voting element (with the exception of a file share witness) contains a copy of the cluster configuration, and the Cluster service works to keep all copies synchronized at all times.

It is essential that the cluster stops running if too many failures occur or if there is a problem with communication between the cluster nodes. For a more detailed explanation, see the next section, Why quorum is necessary.

Note that the full function of a cluster depends not just on quorum, but on the capacity of each node to support the services and applications that fail over to that node. For example, a cluster that has five nodes could still have quorum after two nodes fail, but each remaining cluster node would continue serving clients only if it had enough capacity to support the services and applications that failed over to it.

Why quorum is necessary

When network problems occur, they can interfere with communication between cluster nodes. A small set of nodes might be able to communicate together across a functioning part of a network, but might not be able to communicate with a different set of nodes in another part of the network. This can cause serious issues. In this “split” situation, at least one of the sets of nodes must stop running as a cluster.

To prevent the issues that are caused by a split in the cluster, the cluster software requires that any set of nodes running as a cluster must use a voting algorithm to determine whether, at a given time, that set has quorum. Because a given cluster has a specific set of nodes and a specific quorum configuration, the cluster will know how many “votes” constitutes a majority (that is, a quorum). If the number drops below the majority, the cluster stops running. Nodes will still listen for the presence of other nodes, in case another node appears again on the network, but the nodes will not begin to function as a cluster until the quorum exists again.

For example, in a five-node cluster that is using Node Majority, consider what happens if nodes 1, 2, and 3 can communicate with each other but not with nodes 4 and 5. Nodes 1, 2, and 3 constitute a majority, and they continue running as a cluster. Nodes 4 and 5 are a minority and stop running as a cluster, which prevents the problems of a “split” situation. If node 3 loses communication with the other nodes, all nodes stop running as a cluster. However, all functioning nodes will continue to listen for communication, so that when the network begins working again, the cluster can form and begin to run.
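The majority arithmetic behind this example can be sketched in a few lines. This is a hypothetical illustration, not part of the guide; the function name is invented:

```python
def has_quorum(total_nodes: int, reachable_nodes: int) -> bool:
    """A partition keeps running only if it holds more than half of all votes."""
    return reachable_nodes > total_nodes // 2

# The five-node split described above:
print(has_quorum(5, 3))  # nodes 1, 2, and 3 continue running as a cluster
print(has_quorum(5, 2))  # nodes 4 and 5 stop running as a cluster
```

Note that in a cluster with an even number of nodes, exactly half is not a majority, which is why those clusters benefit from a witness vote.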

For more information about how quorum works, see Appendix A: Details of How Quorum Works in a Failover Cluster.

Overview of the quorum modes

There have been significant improvements to the quorum model in Windows Server 2008. In Windows Server 2003, almost all server clusters used a disk in cluster storage (the “quorum resource”) as the quorum. If a node could communicate with the specified disk, the node could function as a part of a cluster, and otherwise it could not. This made the quorum resource a potential single point of failure. In Windows Server 2008, a majority of ‘votes’ is what determines whether a cluster achieves quorum. Nodes can vote, and where appropriate, either a disk in cluster storage (called a “disk witness”) or a file share (called a “file share witness”) can vote. There is also a quorum mode called No Majority: Disk Only, which functions like the disk-based quorum in Windows Server 2003. Aside from that mode, there is no single point of failure with the quorum modes, since what matters is the number of votes, not whether a particular element is available to vote.

This new quorum model is flexible and you can choose the mode best suited to your cluster.

Important

In most situations, it is best to use the quorum mode selected by the cluster software. If you run the quorum configuration wizard, the quorum mode that the wizard lists as “recommended” is the quorum mode chosen by the cluster software. We only recommend changing the quorum configuration if you have determined that the change is appropriate for your cluster.

There are four quorum modes:

Node Majority: Each node that is available and in communication can vote. The cluster functions only with a majority of the votes, that is, more than half.

Node and Disk Majority: Each node plus a designated disk in the cluster storage (the “disk witness”) can vote, whenever they are available and in communication. The cluster functions only with a majority of the votes, that is, more than half.

Node and File Share Majority: Each node plus a designated file share created by the administrator (the “file share witness”) can vote, whenever they are available and in communication. The cluster functions only with a majority of the votes, that is, more than half.

No Majority: Disk Only: The cluster has quorum if one node is available and in communication with a specific disk in the cluster storage. Only the nodes that are also in communication with that disk can join the cluster.
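As a rough sketch of how the four modes differ in counting votes, consider the following. This is hypothetical code with invented names, not from the guide, and it simplifies the Disk Only mode to its essential rule:

```python
def achieves_quorum(mode: str, total_nodes: int, nodes_up: int,
                    witness_up: bool = False) -> bool:
    """Return True if the surviving set has quorum under the given mode."""
    if mode == "Node Majority":
        votes, total = nodes_up, total_nodes
    elif mode in ("Node and Disk Majority", "Node and File Share Majority"):
        # The witness adds one vote to the pool.
        votes = nodes_up + (1 if witness_up else 0)
        total = total_nodes + 1
    elif mode == "No Majority: Disk Only":
        # One node in communication with the quorum disk is enough.
        return nodes_up >= 1 and witness_up
    else:
        raise ValueError(f"unknown mode: {mode}")
    return votes > total // 2  # a majority is more than half

# A two-node cluster with a disk witness survives the loss of one node:
print(achieves_quorum("Node and Disk Majority", 2, 1, witness_up=True))  # True
```

The two-node case shows why a witness matters: one node out of two is not a majority by itself, but one node plus the witness is two votes out of three.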

Choosing the quorum mode for a particular cluster

The following table describes clusters based on the number of nodes and other cluster characteristics, and lists the quorum mode that is recommended in most cases.

A “multi-site” cluster is a cluster in which an investment has been made to place sets of nodes and storage in physically separate locations, providing a disaster recovery solution. An “Exchange Cluster Continuous Replication (CCR)” cluster is a failover cluster that includes Exchange Server 2007 with Cluster Continuous Replication, a high-availability feature that combines asynchronous log shipping and replay technology.

 

Description of cluster                               Quorum recommendation
Odd number of nodes                                  Node Majority
Even number of nodes (but not a multi-site cluster)  Node and Disk Majority
Even number of nodes, multi-site cluster             Node and File Share Majority
Even number of nodes, no shared storage              Node and File Share Majority
Exchange CCR cluster (two nodes)                     Node and File Share Majority
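The table can be read as a simple decision rule. The following sketch mirrors it (a hypothetical helper, treating the Exchange CCR row as a case of the even-node, file-share rows):

```python
def recommended_mode(node_count: int, multi_site: bool = False,
                     shared_storage: bool = True) -> str:
    """Pick the quorum mode recommended by the table above (simplified)."""
    if node_count % 2 == 1:
        return "Node Majority"
    if multi_site or not shared_storage:
        return "Node and File Share Majority"
    return "Node and Disk Majority"

print(recommended_mode(4))                   # Node and Disk Majority
print(recommended_mode(4, multi_site=True))  # Node and File Share Majority
```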

Diagrams of quorum modes

The following diagrams show how each of the quorum modes affects whether a cluster can or cannot achieve quorum.

Node Majority

The following diagram shows Node Majority used (as recommended) for a cluster with an odd number of nodes.


Node and Disk Majority

The following diagram shows Node and Disk Majority used (as recommended) for a cluster with an even number of nodes. Each node can vote, as can the disk witness.

The following diagram shows how the disk witness also contains a replica of the cluster configuration database in a cluster that uses Node and Disk Majority.

Node and File Share Majority

The following diagram shows Node and File Share Majority used (as recommended) for a cluster with an even number of nodes and a situation where having a file share witness works better than having a disk witness. Each node can vote, as can the file share witness.


The following diagram shows how the file share witness can vote, but does not contain a replica of the cluster configuration database. Note that the file share witness does contain information about which version of the cluster configuration database is the most recent.

No Majority: Disk Only

The following illustration shows how a cluster that uses the disk as the only determiner of quorum can run even if only one node is available and in communication with the quorum disk. It also shows how the cluster cannot run if the quorum disk is not available (single point of failure). For this cluster, which has an odd number of nodes, Node Majority is the recommended quorum mode.

Additional information about quorum modes

For more information about quorum modes, see Appendix B: Additional Information About Quorum Modes.

Requirements and recommendations for quorum configurations

Before configuring the quorum for a failover cluster, you must first meet the requirements for the cluster itself. For information about cluster requirements, see http://go.microsoft.com/fwlink/?LinkId=114536. For information about cluster validation, see http://go.microsoft.com/fwlink/?LinkId=114537 and http://go.microsoft.com/fwlink/?LinkId=114538.


For a cluster using the Node Majority quorum mode (which includes almost all clusters with an odd number of nodes), there are no additional requirements for the quorum. The following sections provide guidelines for clusters using the Node and Disk Majority quorum mode and the Node and File Share Majority quorum mode. (The requirements and recommendations for the Node and Disk Majority mode also apply to the No Majority: Disk Only mode.)

Requirements and recommendations for clusters using Node and Disk Majority

When using the Node and Disk Majority mode, review the following requirements and recommendations for the disk witness.

Note

These requirements and recommendations also apply to the quorum disk for the No Majority: Disk Only mode.

Use a small Logical Unit Number (LUN) that is at least 512 MB in size.

Choose a basic disk with a single volume.

Make sure that the LUN is dedicated to the disk witness. It must not contain any other user or application data.

Choose whether to assign a drive letter to the LUN based on the needs of your cluster. The LUN does not have to have a drive letter (to conserve drive letters for applications).

As with other LUNs that are to be used by the cluster, you must add the LUN to the set of disks that the cluster can use. For more information, see http://go.microsoft.com/fwlink/?LinkId=114539.

Make sure that the LUN has been verified with the Validate a Configuration Wizard.

We recommend that you configure the LUN with hardware RAID for fault tolerance.

In most situations, do not back up the disk witness or the data on it. Backing up the disk witness can add to the input/output (I/O) activity on the disk and decrease its performance, which could potentially cause it to fail.

We recommend that you avoid all antivirus scanning on the disk witness.

Format the LUN with the NTFS file system.

If there is a disk witness configured, but bringing that disk online will not achieve quorum, then it remains offline. If bringing that disk online will achieve quorum, then it is brought online by the cluster software.

Requirements and recommendations for clusters using Node and File Share Majority

When using the Node and File Share Majority mode, review the following recommendations for the file share witness.

Use a Server Message Block (SMB) share on a Windows Server 2003 or Windows Server 2008 file server.

Make sure that the file share has a minimum of 5 MB of free space.

Make sure that the file share is dedicated to the cluster and is not used in other ways (including storage of user or application data).

Do not place the share on a node that is a member of this cluster or will become a member of this cluster in the future.

You can place the share on a file server that has multiple file shares servicing different purposes. This may include multiple file share witnesses, each one a dedicated share. You can even place the share on a clustered file server (in a different cluster), which would typically be a clustered file server containing multiple file shares servicing different purposes.

For a multi-site cluster, you can co-locate the external file share at one of the sites where a node or nodes are located. However, we recommend that you configure the external share in a separate third site.

Place the file share on a server that is a member of a domain, in the same forest as the cluster nodes.

For the folder that the file share uses, make sure that the administrator has Full Control share and NTFS permissions.

Do not use a file share that is part of a Distributed File System (DFS) Namespace.

Note

After the Quorum Configuration Wizard has been run, the computer object for the Cluster Name will automatically be granted read and write permissions to the file share.


If there is a file share witness configured, but bringing that file share online will not achieve quorum, then it remains offline. If bringing that file share online will achieve quorum, then it is brought online by the cluster software.

For more information about file share witness recommendations, see:

http://go.microsoft.com/fwlink/?LinkId=114540

http://go.microsoft.com/fwlink/?LinkId=114541

Steps for viewing the quorum configuration of a failover cluster

When you install a failover cluster, the cluster software automatically chooses an appropriate quorum configuration for that cluster, based mainly on the number of nodes (even or odd). You can easily view the quorum configuration of an existing cluster using either the Failover Cluster Management snap-in or the command line.

To view the quorum configuration of an existing cluster using the Failover Cluster Management snap-in

1. To open the failover cluster snap-in, click Start, click Administrative Tools, and then click Failover Cluster Management. (If the User Account Control dialog box appears, confirm that the action it displays is what you want, and then click Continue.)

2. In the console tree, if the cluster that you want to view is not displayed, right-click Failover Cluster Management, click Manage a Cluster, and then select the cluster you want to view.

3. In the center pane, find Quorum Configuration, and view the description.

In the following example, the quorum mode is Node and Disk Majority and the disk witness is Cluster Disk 2.

To view the quorum configuration of an existing cluster using the Command Prompt window

1. To open a Command Prompt window, on a cluster node, click Start, right-click Command Prompt, and then either click Run as administrator or click Open.

2. If the User Account Control dialog box appears, confirm that the action it displays is what you want, and then click Continue.

3. Review the configuration of the quorum by typing:

cluster /quorum

Steps for changing the quorum configuration in a failover cluster

You must complete the following steps to change the quorum configuration in a failover cluster.

Important

Unless you have changed the number of nodes in your cluster, it is usually best to use the quorum configuration recommended by the quorum configuration wizard. We only recommend changing the quorum configuration if you have determined that the change is appropriate for your cluster.

Membership in the local Administrators group on each clustered server, or equivalent, is the minimum permissions required to complete this procedure. Also, the account you use must be a domain user account. Review details about using the appropriate accounts and group memberships at http://go.microsoft.com/fwlink/?LinkId=83477.

To change the quorum configuration in a failover cluster

1. To open the failover cluster snap-in, click Start, click Administrative Tools, and then click Failover Cluster Management. (If the User Account Control dialog box appears, confirm that the action it displays is what you want, and then click Continue.)

2. In the Failover Cluster Management snap-in, if the cluster you want to configure is not displayed, in the console tree, right-click Failover Cluster Management, click Manage a Cluster, and select or specify the cluster you want.

3. With the cluster selected, under Actions, click More Actions, and then click Configure Cluster Quorum Settings.


4. Click Next. The text on the wizard page that appears varies, depending on whether the cluster has an even number or odd number of nodes. To view more information about the selections on this page, at the bottom of the page, click More about quorum configurations.

5. Select a quorum mode from the list. For more information, see Choosing the quorum mode for a particular cluster, earlier in this guide.

6. Click Next and then go to the appropriate step in this procedure:

If you chose Node Majority, go to the last step in this procedure.

If you chose Node and Disk Majority or No Majority, go to the next step in this procedure.

If you chose Node and File Share Majority, skip to step 8 in this procedure.

7. If you chose Node and Disk Majority or No Majority, a storage selection page appears. (For No Majority, the title of the page is Select Storage Resource.) Select the storage volume that you want to use for the disk witness (or, if you chose No Majority, for the quorum resource), and then skip to step 9. For information about the requirements for the disk witness, see Requirements and recommendations for clusters using Node and Disk Majority.

If you change disk assignments on this page, the former storage volume is no longer assigned to the core Cluster Group and instead goes back to Available Storage.


8. If you chose Node and File Share Majority, a page for specifying the file share appears. Specify the file share you want to use, or click the Browse button and use the standard browsing interface to select the file share. For information about the requirements for the file share, see Requirements and recommendations for clusters using Node and File Share Majority.


9. Click Next. Use the confirmation page to confirm your selections, and then click Next.

10. After the wizard runs and the Summary page appears, if you want to view a report of the tasks that the wizard performed, click View Report.

Note

The most recent report will remain in the systemroot\Cluster\Reports folder with the name QuorumConfiguration.mht.

Troubleshooting: how to force a cluster to start without quorum

When troubleshooting, you might be in a situation where the cluster is offline because it does not have quorum, but you want to bring it online. The first thing to understand is your quorum mode and why you no longer have quorum. This may provide some insight into how the cluster can achieve quorum and come online automatically.

If you need to force the Cluster service to start, you can make all nodes that can communicate with each other begin working together as a cluster by running the net start clussvc command with an option for forcing quorum. The cluster will use the copy of the cluster configuration that is on the node on which you run the command, and replicate it to all other nodes. To force the cluster to start, on a node that contains a copy of the cluster configuration that you want to use, type the following command:

net start clussvc /fq

The command can also be typed as net start clussvc /forcequorum. In Windows Server 2008, the net start clussvc command no longer includes the /resetquorumlog or /fixquorum options.

Forcing a cluster to start that does not have quorum may be especially useful in an unbalanced multi-site cluster. If you have a five-node multi-site cluster and three nodes at Site A fail, then the two nodes at Site B will go offline since they no longer have quorum. If there is a genuine disaster at Site A, then it may take a significant amount of time for the site to come online, and so you would likely want to force Site B to come online, even though it does not have quorum.


When a cluster is forced to start without quorum it continually looks to add nodes to the cluster and is in a special “forced” state. Once it has majority, the cluster moves out of the forced state and behaves normally, which means it is not necessary to rerun the cluster command without a startup switch. If the cluster then loses a node and drops below quorum, it will go offline again because it is no longer in the forced state. At that point, to bring it online again while it does not have quorum would require running net start clussvc /fq again.
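The forced-state behavior described above can be summarized as a small state transition. This is a hypothetical model for illustration, not cluster software:

```python
def next_state(state: str, has_majority: bool) -> str:
    """Transition between 'forced', 'normal', and 'offline' cluster states."""
    if has_majority:
        # Once a majority is present, the forced state ends and the
        # cluster behaves normally (no need to rerun the command).
        return "normal"
    if state == "forced":
        return "forced"  # keeps running, continually looking to add nodes
    # A normal cluster that drops below quorum goes offline; bringing it
    # back without quorum requires net start clussvc /fq again.
    return "offline"
```

For example, a cluster forced online that later gains a majority moves to the normal state, and from then on losing quorum takes it offline rather than leaving it running.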

Appendix A: Details of How Quorum Works in a Failover Cluster

Updated: May 5, 2008

Applies To: Windows Server 2008

This appendix supplements the information in Failover Cluster Step-by-Step Guide: Configuring the Quorum in a Failover Cluster, which we recommend that you read first.

In this appendix

Notes on the concept of quorum in Windows Server 2008

Notes on quorum configuration in Windows Server 2008

The process of achieving quorum

Notes on the concept of quorum in Windows Server 2008

Quorum is not a new concept for clustering in Windows Server products, but the implementation, and thus the behavior, is new for Windows Server 2008. The new quorum model can be adapted more readily to the high-availability characteristics that the system administrator requires to support applications, and is less tightly coupled to the way cluster hardware is connected. Furthermore, the new quorum model has eliminated the single point of failure that existed in previous clustering releases, which makes failover clustering in Windows Server 2008 the most resilient solution to date.

If you choose Node and Disk Majority for the quorum mode, when you select the disk witness, a \Cluster folder is created at the root of the selected disk, and cluster configuration information is stored there. The same information is also stored on each node.

The \Cluster folder contains the cluster registry hive; there are no longer any checkpoint files or quorum log files.

Quorum is important for three main reasons: it ensures consistency, it acts as a tie-breaker to avoid partitioning, and it ensures cluster responsiveness.

Because the basic idea of a cluster is multiple physical servers acting as a single logical server, a primary requirement for a cluster is that each of the physical servers always has a view of the cluster that is consistent with the other servers. The cluster hive acts as the definitive repository for all configuration information relating to the cluster. In the event that the cluster hive cannot be loaded locally on a node, the Cluster service does not start, because it is not able to guarantee that the physical server meets the requirement of having a view of the cluster that is consistent with the other servers.

A witness resource is used as the tie-breaker to avoid “split” scenarios and to ensure that one, and only one, collection of the members in a distributed system is considered “official.” A split scenario happens when all of the network communication links between two or more cluster nodes fail. In these cases, the cluster may be split into two or more partitions that cannot communicate with each other. Having only one official membership prevents unsynchronized access to data by other partitions (unsynchronized access can cause data corruption). Likewise, having only one official membership prevents clustered services or applications from being brought online by two different nodes: only a node in the collection of nodes that has achieved quorum can bring the clustered service or application online.

To ensure responsiveness, the quorum model ensures that whenever the cluster is running, enough members of the distributed system are operational and communicative, and at least one replica of the current state can be guaranteed. This means that no additional time is required to bring members into communication or to determine whether a given replica is guaranteed.

Notes on quorum configuration in Windows Server 2008

The following notes apply to quorum configuration in Windows Server 2008:


It is a good idea to review the quorum mode after the cluster is created, before placing the cluster into production. The cluster software selects the quorum mode for a new cluster, based on the number of nodes, and this is usually the most appropriate quorum mode for that cluster.

After the cluster is in production, do not change the quorum configuration unless you have determined that the change is appropriate for your cluster. However, if you decide to change the quorum configuration and have confirmed that the new configuration will be appropriate, you can make the change without stopping the cluster.

When nodes are waiting for other members to appear, the Cluster service still shows as started in the Services Control Manager. This may cause some confusion because it is different behavior than in Windows Server 2003.

If the Cluster service shuts down because quorum has been lost, Event ID 1177 will appear in the system log.

A cluster can be forced into service when it does not have majority by starting the Cluster service using the net start clussvc command with the /forcequorum option, as described in “Troubleshooting: how to force a cluster to start without quorum” in Failover Cluster Step-by-Step Guide: Configuring the Quorum in a Failover Cluster.

The process of achieving quorum

Because a given cluster has a specific set of nodes and a specific quorum configuration, the cluster software on each node stores information about how many "votes" constitutes a quorum for that cluster. If the number drops below the majority, the cluster stops providing services. Nodes will continue listening for incoming connections from other nodes on port 3343, in case they appear again on the network, but the nodes will not begin to function as a cluster until quorum is achieved.

There are several phases a cluster must go through in order to achieve quorum. At a high level, they are:

1. As a given node comes up, it determines whether there are other cluster members that it can communicate with (this process may be in progress on multiple nodes simultaneously).

2. Once communication is established with other members, the members compare their membership “views” of the cluster until they agree on one view (based on timestamps and other information).

3. A determination is made as to whether this collection of members “has quorum,” or in other words, has enough members that a “split” scenario cannot exist. A “split” scenario would mean that another set of this cluster's nodes was running on a part of the network not accessible to these nodes.

4. If there are not enough votes to achieve quorum, the voters wait for more members to appear. If there are enough votes present, the Cluster service begins to bring cluster resources and applications into service.

5. With quorum attained, the cluster becomes fully functional.
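Condensed into code, the decision at steps 4 and 5 is a strict-majority test. The following Python sketch is illustrative only; the function names and structure are assumptions, not the actual Cluster service implementation:

```python
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """A set of members has quorum when it holds a strict majority of
    all configured votes, so no disjoint set of nodes could also hold one."""
    return votes_present > total_votes // 2

def try_form_cluster(visible_members: set, total_votes: int) -> str:
    """Phases 1-5 condensed: once the members agree on a view, count the
    votes and either start services or keep waiting for more members."""
    if has_quorum(len(visible_members), total_votes):
        return "online"          # phases 4-5: bring resources into service
    return "waiting for quorum"  # keep listening for other nodes on port 3343

# A five-node cluster that can currently see only two of its members
# keeps waiting; once a third appears, quorum is achieved.
print(try_form_cluster({"node1", "node2"}, total_votes=5))           # waiting for quorum
print(try_form_cluster({"node1", "node2", "node3"}, total_votes=5))  # online
```

The strict inequality is what rules out a “split” scenario: two disjoint partitions cannot both hold more than half the votes.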

Appendix B: Additional Information About Quorum Modes

Updated: May 5, 2008

Applies To: Windows Server 2008

This appendix supplements the information in Failover Cluster Step-by-Step Guide: Configuring the Quorum in a Failover Cluster, which we recommend that you read first. The appendix provides additional information about each quorum mode available in failover clusters in Windows Server 2008, plus quorum recommendations for certain specific types of clusters.

In this appendix

Node Majority quorum mode

Node and Disk Majority quorum mode

Node and File Share Majority quorum mode

No Majority: Disk Only quorum mode

Selecting the appropriate quorum mode for a particular cluster

Local two-node cluster

Single-node cluster

Cluster with no shared storage


Even-node cluster

Multi-site or geographically dispersed cluster

Exchange Continuous Cluster Replication (CCR) cluster

Node Majority quorum mode

On a cluster that uses Node Majority as the quorum mode, each node gets a vote, and each node’s local system disk is used to store the cluster configuration (the replica). If the configuration of the cluster changes, that change is reflected across the different disks. The change is only considered to have been committed, that is, made persistent, if that change is made to the disks on half the nodes (rounding down) plus one. For example, in a five-node cluster, the change must be made on two plus one nodes, or three nodes total.
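The commit rule just described ("half the nodes, rounding down, plus one") is simply a strict majority of the replicas. A minimal sketch:

```python
def commit_threshold(replica_count: int) -> int:
    """Number of replicas a configuration change must reach before it
    is considered committed: half (rounding down) plus one."""
    return replica_count // 2 + 1

# In a five-node Node Majority cluster, a change is persistent once it
# has been written to three of the five local replicas.
print(commit_threshold(5))  # 3
print(commit_threshold(4))  # 3
```

Note that an even replica count does not raise the threshold relative to the next-lower odd count, which is the economics argument made for even-node clusters later in this appendix.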

Key usage scenarios for Node Majority

The Node Majority mode is usually the best quorum mode for a cluster with an odd number of nodes.

Recommended for:

Single-node cluster

Cluster with no shared storage

Multi-site or geographically dispersed cluster

Not recommended for:

Local two-node cluster

Even-node cluster

Failure tolerance for Node Majority

 

Number of nodes (N)   Number of replicas   Default node failures tolerated
1                     1                    0
2                     2                    0
3                     3                    1
4                     4                    1
5                     5                    2
6                     6                    2
7                     7                    3
8                     8                    3
9                     9                    4
10                    10                   4
11                    11                   5
12                    12                   5
13                    13                   6
14                    14                   6
15                    15                   7
16                    16                   7

We do not recommend that the Node Majority mode be used with a two-node cluster because such a cluster cannot tolerate any failures.

Node and Disk Majority quorum mode

A Node and Disk Majority quorum is a configuration where each node and a physical disk, the ‘disk witness’, get a vote. The cluster configuration is stored by default on the system disk of each node in the cluster and on the disk witness. It is kept consistent across the cluster and the change is only considered to have been committed, that is, made persistent, if that change is made to half of the disks (rounding down) plus one. For example, in a four-node cluster with a disk witness, the four disks on the nodes plus the disk witness make five, so the change must be made on two plus one disks, or three disks total.

We recommend using a disk witness instead of a file share witness, as it is less likely for a “partition in time” scenario to occur, because there is a replica on the disk witness but not on the file share witness. The disk witness can ensure that there is no partition in time, since it has a copy of the replica on it and ensures that the cluster has the most up-to-date configuration.

If you create a cluster with an even number of nodes, and the cluster software automatically chooses Node and Disk Majority, the cluster software will also choose a witness disk. The cluster software will choose the smallest disk that is more than 512 MB in size. If there are multiple disks that meet the criteria, the cluster software will pick the disk listed first by the operating system. You can choose a different disk for the witness disk by running the quorum configuration wizard.
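The selection rule described above (smallest disk larger than 512 MB, ties going to the disk listed first by the operating system) can be sketched as follows; the disk-list format is an assumption for illustration, not a real cluster API:

```python
def pick_witness_disk(disks):
    """disks: list of (name, size_mb) tuples in the order the OS lists them.
    Returns the smallest disk larger than 512 MB; Python's min() returns
    the first minimal element, so among equal sizes the first-listed wins."""
    eligible = [d for d in disks if d[1] > 512]
    if not eligible:
        return None  # no disk meets the criteria
    return min(eligible, key=lambda d: d[1])

# Disk2 is too small; Disk3 and Disk4 tie on size, so the first-listed wins.
disks = [("Disk1", 2048), ("Disk2", 300), ("Disk3", 1024), ("Disk4", 1024)]
print(pick_witness_disk(disks))  # ('Disk3', 1024)
```

As the text notes, this automatic choice can be overridden afterward by running the quorum configuration wizard.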

Key usage scenarios for Node and Disk Majority

The Node and Disk Majority quorum is economical, allowing the addition of another voter without the need to purchase another node.

Recommended for:

Local two-node cluster

Even-node cluster

Not recommended for:

Single-node cluster

Cluster with no shared storage

Failure tolerance for Node and Disk Majority

 

Number of nodes (N)   Number of replicas   Default node + disk failures tolerated: N/2 (rounded down)
1                     2                    0
2                     3                    1
3                     4                    1
4                     5                    2
5                     6                    2
6                     7                    3
7                     8                    3
8                     9                    4
9                     10                   4
10                    11                   5
11                    12                   5
12                    13                   6
13                    14                   6
14                    15                   7
15                    16                   7
16                    17                   8

We do not recommend using single-node clusters because they cannot tolerate any failures.

Node and File Share Majority quorum mode

The Node and File Share Majority quorum mode is a configuration where each node and the file share witness get a vote. The replica is stored by default on the system disk of each node in the cluster, and is kept consistent across those disks. However, a copy is not stored on the file share witness, which is the main difference between this mode and the Node and Disk Majority mode. The file share witness keeps track of which node has the most updated replica, but does not have a replica itself. This can lead to scenarios in which only a node and the file share witness survive, but the cluster will not come online if the surviving node does not have the updated replica, because this would cause a “partition in time.” This mode does not solve the partition in time problem, but it does prevent a “split” scenario from occurring.


The cluster configuration is stored by default on the system disk of each node in the cluster, and information about which nodes contain the latest configuration is noted on the file share witness. This information is kept synchronized across the nodes and file share, and a change is only considered to have been committed, that is, made persistent, if that change is made or noted on half of the total locations (rounding down) plus one. For example, in a four-node cluster with a file share witness, the four disks on the nodes plus the file share witness make five, so the change must be made or noted in two plus one locations, or three locations total.

“Partition in time” example for Node and File Share Majority

The following describes an example of how a partition in time can occur:

1. You have a local two-node cluster with NodeA and NodeB. Both nodes are running and their replicas of the cluster configuration are synchronized.

2. NodeB is turned off.

3. Changes are made on NodeA. These changes are only reflected on NodeA’s replica of the cluster configuration.

4. NodeA is turned off.

5. NodeB is turned on. NodeB does not have the most recent replica of the cluster configuration, and the file share witness does not have a replica at all. However, the file share witness contains the information that NodeA has the most recent replica, so the file share witness prevents NodeB from forming the cluster.
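The five steps above can be modeled with a small simulation. This is an illustrative sketch of the bookkeeping, not the actual witness protocol, and all names are hypothetical:

```python
class FileShareWitness:
    """Stores no replica of the configuration -- only a note of which
    node committed the most recent change."""
    def __init__(self):
        self.latest_holder = None

def commit_change(node_versions, witness, node):
    node_versions[node] += 1      # the change lands on this node's replica
    witness.latest_holder = node  # the witness notes who is most current

def can_form_cluster(node, witness, nodes_online):
    # A lone surviving node may only form the cluster if the witness
    # says it holds the latest replica; otherwise starting it would
    # create a "partition in time" with a stale configuration.
    return node in nodes_online and witness.latest_holder in (None, node)

versions = {"NodeA": 0, "NodeB": 0}
witness = FileShareWitness()
commit_change(versions, witness, "NodeA")  # steps 2-3: NodeB off, change on NodeA
# Steps 4-5: NodeA is off and NodeB comes back alone -- the witness blocks it.
print(can_form_cluster("NodeB", witness, {"NodeB"}))  # False
# Had NodeA come back instead, it holds the latest replica and may proceed.
print(can_form_cluster("NodeA", witness, {"NodeA"}))  # True
```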

Key usage scenarios for Node and File Share Majority

The Node and File Share Majority quorum is economical, allowing the addition of another voter without the need to purchase another node or additional storage.

Recommended for:

Local two-node cluster

Cluster with no shared storage

Even-node cluster

Multi-site or geographically dispersed cluster

Exchange Continuous Cluster Replication (CCR) cluster

Not recommended for:

Single-node cluster


Failure tolerance for Node and File Share Majority

 

Number of nodes (N)   Number of replicas   Default node + file share failures tolerated: N/2 (rounded down)
1                     1                    0
2                     2                    1
3                     3                    1
4                     4                    2
5                     5                    2
6                     6                    3
7                     7                    3
8                     8                    4
9                     9                    4
10                    10                   5
11                    11                   5
12                    12                   6
13                    13                   6
14                    14                   7
15                    15                   7
16                    16                   8

We do not recommend using single-node clusters because they cannot tolerate any failures.

No Majority: Disk Only quorum mode

The No Majority: Disk Only quorum mode behaves similarly to the legacy quorum configuration from Windows Server 2003. We do not recommend using this quorum mode because it presents a single point of failure: a loss of the single shared disk causes the entire cluster to fail. The cluster replica that is stored on the shared disk is considered the primary database and is always kept the most up to date. The shared storage interconnect to the shared disk must be accessible by all members of the cluster. If a node has been out of communication and may have an out-of-date replica of the configuration, the data on the shared disk is considered to be the authoritative copy of the cluster configuration. Having a copy on each node allows the shared disk replica to be automatically repaired or replaced if it is lost or becomes corrupted. This authoritative copy contributes the only vote towards having quorum; the nodes do not contribute votes.

The Cluster service itself will only start up, and therefore bring resources online, if the shared disk is available and online, since it is the one and only vote in the cluster. If the disk is not online, the cluster does not have quorum, and the Cluster service waits (trying to restart) until the disk comes back online. Because the configuration on the shared disk is authoritative, the cluster always starts up with the most up-to-date configuration.

In the case of a “split” scenario, any group of nodes that is not connected to the shared disk is prevented from forming a cluster since the nodes will have no votes. This ensures that only a group of nodes connected to the disk forms a cluster, and the nodes can run without the possibility of another subsection of the cluster also running.

Key usage scenarios for No Majority: Disk Only

In most cases, the No Majority: Disk Only quorum mode is not recommended for use because it presents a single point of failure.

Recommended for:

None


Not recommended for:

Local two-node cluster

Cluster with no shared storage

Even-node cluster

Multi-site or geographically dispersed cluster

Exchange Continuous Cluster Replication (CCR) cluster

Failure tolerance for No Majority: Disk Only

As shown in the following table, with the No Majority: Disk Only quorum mode, the cluster can tolerate the failure of n-1 nodes, but cannot tolerate any failures of the quorum disk.

 

Number of nodes   Node failures tolerated   Quorum disk failures tolerated
1                 0                         0
2                 1                         0
n                 n-1                       0
16                15                        0
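Taken together, the failure-tolerance tables in this appendix follow one pattern: count the votes, require a strict majority, and whatever remains is the tolerated failure count. A sketch (the mode names are shorthand for this example, not API identifiers):

```python
def failures_tolerated(mode: str, nodes: int) -> int:
    """Default number of node failures a cluster can tolerate per mode."""
    if mode == "disk_only":
        # Any node may fail as long as the quorum disk survives;
        # the disk itself tolerates zero failures.
        return nodes - 1
    # Node Majority: one vote per node. Witness modes: one extra vote
    # from the disk witness or file share witness.
    votes = nodes if mode == "node_majority" else nodes + 1
    majority = votes // 2 + 1
    return votes - majority

# Matches the tables: a 4-node cluster tolerates one failure under Node
# Majority but two once a disk or file share witness adds a fifth vote.
print(failures_tolerated("node_majority", 4))        # 1
print(failures_tolerated("node_and_disk", 4))        # 2
print(failures_tolerated("node_and_file_share", 4))  # 2
```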

Selecting the appropriate quorum mode for a particular cluster

It is a best practice to decide which quorum mode you will use before placing the cluster into production. You might want to reevaluate your existing quorum mode when you add additional nodes.

This section provides recommendations for the quorum mode for the following types of clusters:

Local two-node cluster

Single-node cluster

Cluster with no shared storage

Even-node cluster

Multi-site or geographically dispersed cluster

Exchange Continuous Cluster Replication (CCR) cluster

Local two-node cluster

The following table provides recommendations for the quorum mode for a local two-node cluster.

 

Quorum mode                    Recommended   Not recommended
Node Majority                                X
Node and Disk Majority         X (best)
Node and File Share Majority   X
No Majority: Disk Only                       X

This is the most common cluster configuration. For a standard local two-node cluster, it is recommended that you select the Node and Disk Majority mode. This can increase the availability of your cluster without requiring you to increase it to three nodes. While either type of witness (disk or file share) works, we recommend using a disk witness instead of a file share witness because it is less likely for a partition in time to occur. This is because a disk witness provides a replica of the cluster configuration, increasing the likelihood that the most up to date configuration is available to the cluster at any given time.

However, the Node and File Share Majority mode does not require an additional physical disk in cluster storage. This can be very beneficial if you have budgetary constraints.

Important

If you have a two-node cluster, the Node Majority mode is not recommended, as failure of one node will lead to failure of the entire cluster.

Single-node cluster

The following table provides recommendations for the quorum mode for a single-node cluster.

 

Quorum mode                    Recommended   Not recommended
Node Majority                  X
Node and Disk Majority                       X
Node and File Share Majority                 X

This is the simplest configuration, and since there is only one node that can vote, the addition of a disk or file share witness costs additional storage space without the benefit of sustaining additional failures. For this reason, the easiest and cheapest configuration is the Node Majority mode, where the cluster will be running if and only if the single node is running.

This configuration is widely used for development and testing, providing the ability to use the cluster infrastructure without the expense and complexities of a second computer. This solution also enables using the health monitoring and resource management features of the cluster infrastructure on the local node. It can also be used as an initial step when planning to add more nodes at a later date (however the quorum mode should then be adjusted appropriately).

Cluster with no shared storage

The following table provides recommendations for the quorum mode for a cluster with no shared storage.

 

Quorum mode                    Recommended   Not recommended
Node Majority                  X
Node and Disk Majority                       X
Node and File Share Majority   X
No Majority: Disk Only                       X

This is a specialized configuration that does not include shared disks, but has other features that make it consistent with failover cluster requirements. Because there are no shared disks, the Node and Disk Majority mode cannot be used.

This would be used in the following situations:


Clusters that host applications that can fail over, but where there is some other, application-specific way to keep data consistent between nodes (for example, database log shipping for keeping database state up-to-date, or file replication for relatively static data).

Clusters that host applications that have no persistent data, but where the nodes need to cooperate in a tightly coupled way to provide consistent volatile state.

Clusters using solutions from independent software vendors (ISVs). If storage is abstracted from the Cluster service, independent software vendors have much greater flexibility in how they design sophisticated cluster scenarios.

Even-node cluster

The following table provides recommendations for the quorum mode for a cluster with an even number of nodes.

 

Quorum mode                    Recommended   Not recommended
Node Majority                                X
Node and Disk Majority         X
Node and File Share Majority   X
No Majority: Disk Only                       X

Even-node clusters (with 2, 4, 6, or 8 nodes) that use the Node Majority mode are not entirely economical, as they provide no additional quorum benefit over their N-1 counterparts (of 1, 3, 5, or 7 nodes). For example, if you have a 3-node or a 4-node cluster using Node Majority, you can still only tolerate the failure of one node while maintaining quorum. With the addition of a vote from a disk witness or file share witness, a 4-node cluster (using the Node and Disk Majority mode or the Node and File Share Majority mode) can sustain two failures, making it more resilient at little or no additional cost.

Multi-site or geographically dispersed cluster

The following table provides recommendations for the quorum mode for a multi-site or geographically dispersed cluster.

 

Quorum mode                    Recommended   Not recommended
Node Majority                  X
Node and File Share Majority   X (best)
No Majority: Disk Only                       X

The benefits of multi-site clusters largely derive from the fact that they work slightly differently from conventional, local clusters. Setting up a cluster whose nodes are separated by hundreds, or even thousands, of miles will affect the choices you make on everything from the quorum mode you choose to how you configure your network and data storage for the cluster. For some business applications, even an event as unlikely as a fire, flood, or earthquake can pose an intolerable amount of risk to business operations. For truly essential workloads, distance can provide the only hedge against catastrophe. By failing server workloads over to servers separated by even a few miles, truly disastrous data loss and application downtime can be prevented. Windows Server 2008 supports multi-site clustering at unlimited distance, making the solution more resilient to local, regional, or even national disasters. This section outlines some of the considerations unique to multi-site clustering and examines what they mean for a disaster recovery strategy.

With Windows Server 2008, you can deploy a multi-site cluster to automate the failover of applications in situations where the following occurs:

Communication between sites has failed.

One site is down and is no longer available to run applications.

A multi-site cluster is a Windows Server 2008 failover cluster that has the following attributes:


Applications are set up to fail over just as in a single-site cluster. The Cluster service provides health monitoring and failure detection for the applications, the nodes, and the communications links.

The cluster has multiple storage arrays, with at least one storage array deployed at each site. This ensures that in the event of a failure of any one site, the other site or sites will have local copies of the data that can be used to continue to provide highly available services and applications.

The cluster nodes are connected to storage in such a way that in the event of a failure of a site or the communication links between sites, the nodes on a given site can access the storage on that site. In other words, in a two-site configuration, the nodes in Site A are connected to the storage in Site A directly, and the nodes in Site B are connected to the storage in Site B directly. The nodes in Site A can continue without accessing the storage on Site B and vice versa.

The cluster’s storage fabric or host-based software provides a way to mirror or replicate data between the sites so that each site has a copy of the data. There is no shared mass storage that all of the nodes access, which means that data must be replicated between the separate storage arrays to which each node is attached.

Using Node and File Share Majority for a multi-site cluster

In this quorum mode, all nodes and a file share witness get a vote to determine majority for cluster membership. This helps to eliminate failure points in the old model, where it was assumed that the disk would always be available; if the disk failed, the cluster would fail. This makes the Node and File Share Majority quorum mode particularly well suited to multi-site clusters. A single file server can serve as a witness to multiple clusters (with each cluster using a separate file share witness on the file server).

Note

We recommend that you place the file share witness at a third site which does not host a cluster node.

For example, suppose you have two physical sites, Site A and Site B, each with two nodes. You also have a file share witness with a single vote stored at a third physical site, such as a smaller server at a branch office. You now have a total of five votes (two from Site A, two from Site B, and one from the file share witness).

Disaster at Site A: You lose two votes from Site A, yet with the two votes from Site B and the one vote from the file share witness, you still have three of five votes and maintain quorum.

Disaster at Site B: You lose two votes from Site B, yet with the two votes from Site A and the one vote from the file share witness, you still have three of five votes and maintain quorum.

Disaster at the file share witness site: You lose one vote from the file share witness, but with two votes from Site A and two votes from Site B, you still have four of five votes and maintain quorum.

With the use of the file share witness at a third site, you can now sustain complete failure of one site and keep your cluster running.
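The three scenarios above reduce to the same vote arithmetic, sketched here with the example's site names:

```python
# Two nodes per site plus a file share witness at a third site: five votes.
votes = {"SiteA": 2, "SiteB": 2, "witness": 1}
total = sum(votes.values())  # 5

def survives(lost_site: str) -> bool:
    """Does the cluster keep quorum after losing one entire site?"""
    remaining = total - votes[lost_site]
    return remaining > total // 2  # strict majority of 5 votes is 3

for site in votes:
    print(f"lose {site}: quorum maintained -> {survives(site)}")
# Every single-site loss leaves at least 3 of 5 votes, so quorum holds.
```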

Using Node Majority for a multi-site cluster

If a file share witness at a site independent of your cluster sites is not an option for you, you can still use a multi-site cluster with the Node Majority quorum mode. A majority of the node votes is then necessary to operate the cluster.

Note

A multi-site cluster with three nodes at three separate sites is possible. It would continue to function if one of the sites were unavailable, but would cease to function if two sites became unavailable.

For example, suppose you have a multi-site cluster consisting of five nodes, three of which reside at Site A and the remaining two at Site B. With a break in communication between the two sites, Site A can still communicate with three nodes (which is greater than half of the total), so all of the nodes at Site A stay up. The nodes in Site B are able to communicate with each other, but not with any other nodes. Since the two nodes at Site B cannot communicate with the majority, they drop out of cluster membership. If Site A went down, to bring up the cluster at Site B, you would need to intervene manually to override the non-majority (for more information about forcing a cluster to start without quorum, see “Troubleshooting: how to force a cluster to start without quorum” in Failover Cluster Step-by-Step Guide: Configuring the Quorum in a Failover Cluster).
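The five-node, two-site example can be checked with the same strict-majority test (a sketch, with vote counts taken from the example):

```python
def partition_keeps_quorum(votes_in_partition: int, total_votes: int) -> bool:
    """After a communication break, only a partition holding a strict
    majority of all votes may continue operating as the cluster."""
    return votes_in_partition > total_votes // 2

# Break between a three-node site and a two-node site in a 5-node cluster:
print(partition_keeps_quorum(3, 5))  # True  -- Site A stays up
print(partition_keeps_quorum(2, 5))  # False -- Site B drops out of membership
```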


Note

This configuration is less fault-tolerant than the Node and File Share Majority mode, because the loss of the primary site causes the entire multi-site cluster to fail.

Exchange Continuous Cluster Replication (CCR) cluster

The following table provides recommendations for the quorum mode for an Exchange Continuous Cluster Replication (CCR) cluster.

 

Quorum mode                    Recommended   Not recommended
Node and File Share Majority   X
No Majority: Disk Only                       X

The core of Exchange Server 2007 running on Windows Server 2008 in a multi-site cluster is Cluster Continuous Replication (CCR). CCR is a high-availability feature of Exchange Server 2007 that combines asynchronous log shipping and replay technology built into Exchange Server 2007 with the failover and management features provided by the Cluster service in Windows Server 2008.

We recommend using CCR with the Node and File Share Majority quorum mode, with the file share witness on a third computer. Using a file share witness can prevent a “split” scenario by requiring that a majority of the two nodes and the file share witness be available and in communication for the clustered Mailbox server to be operational.

