

WHITE PAPER

VERITAS Volume Manager for Windows 2000

CAMPUS CLUSTERING: USING VERITAS VOLUME MANAGER FOR WINDOWS WITH MICROSOFT CLUSTER SERVER (MSCS)


TABLE OF CONTENTS

Overview
Dynamic Volumes Concepts
    Dynamic Volume Overview
    Dynamic Volumes Virtualize Storage
    Dynamic Volumes in Microsoft Windows 2000
    Dynamic Volumes in VERITAS Volume Manager for Windows
Dynamic Disk Groups
Microsoft Cluster Server (MSCS)
    MSCS Overview
    MSCS Quorum Resource
        Cluster Ownership of the Quorum Resource
    The Heartbeat of a Cluster
    MSCS Challenge/Defense Protocol
VERITAS Volume Manager in a MSCS Cluster Environment
    Volume Manager Advantages with MSCS
    Dynamic Volume Support with MSCS
    Fault Tolerant MSCS Quorum Resource
    How VERITAS Volume Manager can help solve Challenges in a Cluster Environment
        ISSUE #1: Can’t Grow Data Volumes Online
        ISSUE #2: Can’t Use Fault Tolerant Data Volumes
        ISSUE #3: Can’t Use Fault Tolerant Quorum Disk
        ISSUE #4: Quorum Disk is a Single Point of Failure for the Cluster
    Disaster Recovery: Campus Clusters
        Campus Cluster Option: Mirrored Quorum Disks at Two Locations
Summary

Copyright 2002 VERITAS Software Corporation. All rights reserved. VERITAS, VERITAS Software, the VERITAS logo, and all other VERITAS product names and slogans are trademarks or registered trademarks of VERITAS Software Corporation in the US and/or other countries. Other product names and/or slogans mentioned herein may be trademarks or registered trademarks of their respective companies. Specifications and product offerings subject to change without notice. June 2002.


OVERVIEW

The Microsoft Windows 2000 operating system offers significant advances in performance, scalability, and manageability. One of the key features of this new operating system is the Logical Disk Manager (LDM) that provides logical volume management and online disk administration capabilities. VERITAS Volume Manager™ for Windows 2000 extends these in-the-box basic capabilities to create a highly scalable, manageable platform for the most data-intensive or critical application environments.

Windows 2000 also supports Microsoft Cluster Server (MSCS), the Microsoft solution for creating a loosely coupled configuration of servers with application failover capabilities. The MSCS technology has been in place for a few years and is used to improve the availability and manageability of Windows systems.

Using VERITAS Volume Manager for Windows 2000, system administrators can create flexible storage configurations integrated with the MSCS cluster server, so that the Cluster Server can automatically migrate all the storage required for a specific application between nodes when a failover occurs. This solution combines the high-availability failover capabilities of MSCS with the highly configurable and manageable storage capabilities of VERITAS’ logical volume management support.

This paper provides a brief overview of the various components involved in this solution and then discusses specifically how to create application-specific storage migration for MSCS using VERITAS Volume Manager. This paper also discusses the two key advantages of using VERITAS Volume Manager in a MSCS environment: the ability to use dynamic disks with clustering and the ability to create Campus Clusters with fault tolerant mirrored quorum resources. Both of these VERITAS Volume Manager advantages reduce planned and unplanned downtime in a clustering environment.

More information on Windows 2000 can be found on the Microsoft Web site at http://www.microsoft.com. The VERITAS Web site (http://www.veritas.com/) contains other sources of information on VERITAS Volume Manager for Windows 2000.


DYNAMIC VOLUMES CONCEPTS

Dynamic Volume Overview

VERITAS worked with Microsoft to develop the logical volume management in the Windows 2000 software. Logical volume management through the use of dynamic volumes removes physical limitations of storage, enabling administrators to build higher-performance, more available storage configurations from existing disk devices. This simplifies disk administration tasks for reduced cost of ownership.

Windows 2000 introduces a new Logical Disk Manager (LDM) facility that supports both basic disks and dynamic disks. Basic disks use standard disk partition tables to support basic volumes and have been supported on previous versions of Windows. Dynamic disks that contain dynamic volumes store disk and volume information on the disk itself.

A dynamic volume is an abstract online storage management unit instantiated by a system software component called a volume manager. To file systems, database management systems, and applications that do raw I/O, a dynamic volume appears to be located on a single disk, in the sense that:

• It has a fixed amount of non-volatile storage
• Its storage capacity is organized as consecutively numbered 512-byte blocks
• Sequences of consecutively numbered blocks can be read or written with a single request
• Reading and writing can start at any block
• The smallest unit of data that can be read or written is one 512-byte block

Dynamic Volumes Virtualize Storage

Unlike basic volumes, a dynamic volume can aggregate the capacity of several disks into a single storage unit, so that there are fewer storage units to manage or so that files larger than the largest available disk can be accommodated.

A dynamic volume can aggregate the I/O performance of several disks. This allows large files to be transferred faster than would be possible with the fastest available disk. In some circumstances, it also enables more I/O transactions per second to be executed than would be possible with the fastest available disk (i.e., by issuing concurrent I/Os).

A dynamic volume can improve data availability through mirroring or RAID techniques that tolerate disk failures. Failure-tolerant volumes can remain fully functional when one or more of the disks that make them up fail.

A dynamic volume created with the VERITAS Volume Manager can grow dynamically. More complex volumes can be created to provide a combination of these benefits.
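To make the idea of aggregating several disks behind one consecutively numbered block space concrete, the following is a minimal Python sketch of how a striped (RAID-0) volume could map a volume block number to a member disk and a block on that disk. It is an illustration only, not the Volume Manager implementation: the function name, the `Location` type, and the stripe unit of 128 blocks used in the example are assumptions chosen for the sketch. Only the 512-byte block size comes from the text.

```python
from typing import NamedTuple

BLOCK_SIZE = 512  # bytes per block, as stated in the text


class Location(NamedTuple):
    disk: int        # index of the member disk (column); hypothetical naming
    disk_block: int  # block number on that member disk


def striped_location(volume_block: int, columns: int, stripe_unit_blocks: int) -> Location:
    """Map a volume block number to (disk, block) on a simple RAID-0 stripe.

    Assumption: stripe units are laid out round-robin across the columns.
    """
    if volume_block < 0 or columns < 1 or stripe_unit_blocks < 1:
        raise ValueError("invalid argument")
    stripe_unit = volume_block // stripe_unit_blocks  # index of the stripe unit
    offset = volume_block % stripe_unit_blocks        # block offset inside that unit
    disk = stripe_unit % columns                      # units rotate across the disks
    row = stripe_unit // columns                      # complete stripes before this unit
    return Location(disk, row * stripe_unit_blocks + offset)


# Example: three disks, 128-block (64 KB) stripe unit (assumed values).
first = striped_location(0, columns=3, stripe_unit_blocks=128)
wrapped = striped_location(385, columns=3, stripe_unit_blocks=128)
```

Because consecutive stripe units land on different disks, a large sequential transfer is spread across all columns, which is the performance aggregation described above.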


Dynamic Volumes in Microsoft Windows 2000

Dynamic volumes in Windows 2000 can host software-managed RAID volumes. Because the disk and volume information is on the disk itself instead of in system tables, moving or reallocating dynamic disk storage between systems is easier. Another major benefit is that administrators can perform disk and volume management tasks without restarting the system. The volume manager supports online growth and management of storage.

Dynamic volumes in Windows 2000 may be simple, spanned, striped (RAID-0), mirrored (RAID-1), or RAID-5 (striping with distributed parity). The Windows 2000 Logical Disk Manager provides online management and configuration of local and remote disk storage and a domain-wide view of storage resources. Together, these features support highly configurable and manageable storage solutions.

Dynamic Volumes in VERITAS Volume Manager for Windows

VERITAS Volume Manager extends the capabilities of Windows 2000 dynamic volumes. Volume Manager dynamic volumes have all the capabilities of the native Windows 2000 dynamic volumes, plus:

• Striped and RAID-5 volumes using more than 32 physical disks (columns)
• Mirrored stripe volumes for a high-performance, highly available storage solution
• Ability to grow software RAID volumes dynamically without taking users or applications off-line (no rebooting)
• N-way mirroring: administrators can create and detach third mirrors on mirrored volumes
• Preferred plex: designating a local mirror as the preferred “read” device for data with heavy request loads
• Hot spares, Hot Relocation and Unrelocation
• RAID-5 logging and Dirty Region Logging to speed recovery after a RAID-5 or mirrored volume failure

Volume Manager also provides advanced online management capabilities. For example, administrators can expand mirrored, striped, and RAID-5 volumes while the data is online and available. Administrators can use the graphical interface to identify storage bottlenecks and move data to correct or prevent performance problems.

Finally, VERITAS Volume Manager supports shared and partitioned shared storage configurations using the concept of multiple disk groups. This makes it easier for multiple Windows servers to share a disk farm or Storage Area Network by segmenting the available storage, with each server “owning” specific storage segments. The administrator can easily reconfigure or change the segmentation. This last feature is relevant for supporting storage migration with MSCS.


DYNAMIC DISK GROUPS

The VERITAS Volume Manager supports a concept called Dynamic Disk Groups. A dynamic disk group is a collection of disks with arbitrary volume layout. The Windows 2000 Logical Disk Manager does not support multiple dynamic disk groups. VERITAS Volume Manager support for multiple dynamic disk groups is a key feature when used in a MSCS environment.

A dynamic disk group is the object that is imported or deported. When a dynamic disk group is imported, all the volumes contained in the disk group are brought online and made available by the volume manager. When a dynamic disk group is deported, all the volumes contained within the dynamic disk group are taken offline and made unavailable by the volume manager.

There are three types of VERITAS Volume Manager dynamic disk groups:

1. Primary disk group - contains the boot/system disk and zero to many additional disks with arbitrary volume layout
2. Secondary disk group - contains one-to-many disks with arbitrary volume layout
3. Cluster disk group - contains one-to-many disks with arbitrary volume layout. A cluster disk group has the following additional properties:
   • Cluster disk groups are intended to be used by clustering applications such as MSCS and VERITAS Cluster Server (VCS).
   • A cluster disk group is NOT automatically imported at boot time. The user must perform a manual import through the GUI, CLI, or API, if the group is not managed by a cluster.
   • A cluster disk group uses hardware locking mechanisms (e.g., SCSI-2 reserve/release) to guarantee that the disks within a cluster disk group are exclusively owned by one node at a time.
   • Both MSCS and VCS can import and deport a cluster disk group through online and offline operations.
   • When VERITAS Volume Manager is used with MSCS, the Volume Manager disk group is the resource managed by MSCS.

MICROSOFT CLUSTER SERVER (MSCS)

MSCS Overview

Microsoft Cluster Server is the Microsoft clustering solution for Microsoft Windows-based servers. A detailed description of MSCS is beyond the scope of this paper and can be found on the Microsoft Web site. This section highlights some relevant points.


Microsoft Cluster Server employs a shared nothing architecture, and was initially introduced as a two-node clustering solution for the Enterprise Edition of Windows NT. As of this writing, MSCS is supported with up to two nodes in the Windows 2000 Advanced Server operating system, and up to four nodes in the Windows 2000 Datacenter Server operating system.


Shared nothing refers to the fact that resources such as storage “belong” to only one system in the cluster at any time. However, the storage is physically connected to both nodes via SCSI bus or Fibre Channel. In the event of a failure on one node, the other node can import the storage resource and host the application.

A cluster is a group of independent computers working together as a single system to ensure that mission-critical applications and resources are as highly available as possible. The group is managed as a single system, shares a common namespace, and is specifically designed to tolerate component failures, and to support the addition or removal of components in a way that's transparent to users. Clustered systems have several advantages including fault-tolerance, high-availability, scalability, simplified management and support for rolling upgrades, to name a few.

The primary operating attributes of MSCS are as follows:

• Each system in the cluster is a node. Today the nodes must be NT Server or Windows 2000 systems. Both nodes should be at the same OS level.
• Any item managed by MSCS is a resource. Resources may include storage devices, file shares, TCP/IP addresses, applications, and databases.
• A resource group is the collection of resources that fail over as a group. All resource dependencies (such as a net name that depends on an IP address) must exist in the same group.
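The rule that all dependencies must live in the same resource group can be sketched in a few lines of Python. The example below is a hypothetical model, not an MSCS API: the function name `online_order` and the resource names are assumptions made for illustration. It rejects a group whose resources depend on something outside the group, and otherwise returns one order in which the resources could be brought online so that every dependency comes up first.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def online_order(group: dict[str, set[str]]) -> list[str]:
    """Return an order that brings each resource online after its dependencies.

    `group` maps a resource name to the names it depends on (hypothetical model).
    """
    referenced = {dep for deps in group.values() for dep in deps}
    missing = referenced - set(group)
    if missing:
        # Mirrors the rule in the text: dependencies must exist in the same group.
        raise ValueError(f"dependencies outside the group: {sorted(missing)}")
    return list(TopologicalSorter(group).static_order())


# Example group (names are illustrative only).
file_share_group = {
    "Disk Group": set(),
    "IP Address": set(),
    "Network Name": {"IP Address"},
    "File Share": {"Disk Group", "Network Name"},
}
order = online_order(file_share_group)
```

Taking the group offline would walk the same order in reverse, so that a resource is stopped before anything it depends on.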

This cluster design brings both availability and manageability benefits. The MSCS software tracks the state of the nodes in the cluster. In the event of an application or server failure, it either restarts the application or performs a failover to an available node. When the failed node is again available, MSCS switches the applications back to their “preferred” node.

Failover is the process by which services that were running on one node are moved to another node or nodes in a cluster.

Most stateless applications switch to the failover node transparently. Some applications that track the state of the nodes need to re-establish a connection to the cluster. Otherwise, the failover is transparent. System administrators can also manually move resources from one node to another to perform load balancing and system maintenance tasks without resorting to downtime on production applications.

MSCS Quorum Resource

Another concept central to the practice of MSCS clustering is the quorum resource. A quorum resource is usually, though not necessarily, a SCSI disk that arbitrates for a resource by supporting the challenge/defense protocol (explained later in this document). This resource should be capable of storing the cluster registry and cluster logs. It is also used to persist configuration change logs, tracking changes to the configuration database when any defined cluster member is missing or not active. This prevents configuration partitions in time, also known as “temporal partitions”. Temporal partitions are undesirable because changed configuration data is not persisted, which leaves the cluster out of sync.

In MSCS, the quorum is a resource that determines “ownership” of the cluster. Every cluster has exactly one quorum resource. In a way, the quorum resource is a global control lock for the cluster. The quorum is also used to determine which node “is the cluster” when the network heartbeat is lost. This prevents “split-brain” situations, in which the connection between the nodes is broken and both nodes try to start the cluster.

Split-Brain

Split-brain refers to a state where nodes in a cluster lose contact with each other across the network, but shared disks continue to operate (also known as “network partitioning”). The secondary server in the cluster, believing the primary server has failed because it no longer hears its heartbeat, takes over the disk. The primary server, which no longer receives a heartbeat from the backup server, but knows that it (the primary) is still operating properly, continues to write to disk. Windows NT and Windows 2000 (as well as other operating systems and file systems today) cannot support multiple systems writing to the same disk at the same time, so some data may be lost.

Because split-brain is a situation that must be avoided, MSCS uses the following mechanisms to keep split-brain from happening to the cluster:

• SCSI reservation — The process used to control a hard disk, or in an array, multiple hard disks, so that only the server with the reservation can access the drive(s). Through SCSI reserve and release commands, MSCS is able to maintain control of the shared disks so that only the server that has control of the drives has access.

The quorum resource is critical to the cluster. However, the quorum is a single point of failure in MSCS without third-party augmentation via hardware or software. Volume Manager provides mechanisms for building redundancy into the quorum resource. By placing the quorum in a Volume Manager Disk Group resource (Cluster Disk Group), it can be configured as a fault tolerant (RAID) volume. A three or four-way mirror strategy provides a high level of redundancy and prevents a deadlock situation during arbitration. SCSI reservations are placed on all disks in a disk group to physically fence off the disks from other nodes in the cluster.

Hardware augmentation alone (hardware RAID) can provide electronic fault tolerance for the quorum, but a hardware solution for physical protection can be cost prohibitive. Volume Manager can offer both at a reasonable cost. With Volume Manager it is possible to physically separate the disks of the quorum and locate them at separate sites. This allows the quorum, and thus the cluster, to remain online should fire, flood, etc., damage one, or possibly more, of the quorum disks.

Cluster Ownership of the Quorum Resource

The quorum device drives the practice of cluster ownership. Ideally, in a cluster only one server should know the cluster configuration and be able to make decisions on that part of the cluster service. So, when you build the cluster service, you use a couple of algorithms to determine “who’s in charge.” One algorithm is a simple majority, but in a two-node cluster the two votes would simply cancel each other out. To break the tie, you use a quorum resource: within the cluster administrator, you designate a quorum, usually but not always part of a SCSI disk, that determines who has ownership of the cluster.

Recall that only one owner can own a resource at any time. That’s the same mechanism used to ensure that only one node is in charge of the cluster at any time. This is important, because to implement a cluster server, you must designate a disk to act as the quorum device, which provides arbitration and knowledge of who’s in charge at any time. A device can arbitrate for a resource if it supports the challenge/defense protocol and is capable of storing the cluster registry and logs. The quorum resource not only arbitrates, but also provides a place for doing checkpoints.
This means it persists configuration-change logs, tracking changes to the configuration database when any defined member is missing (not active). The quorum device also prevents configuration partitions in time, also known as temporal partitions, which are undesirable in a cluster. If you change the configuration on one node while a second node is down, you expect the second node to have the right configuration information when it comes back up. You don’t want one machine to move from some state to a newer state (“state prime”) and then have another machine come up still in the old state; that would mean some state information had been lost. You use the quorum device to log those changes so that, at any time, you can survive catastrophic failures and bring data back in an orderly manner. With configuration data on the quorum device, you always know where the information is located.
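The role of the quorum device as a change log can be illustrated with a small, hypothetical Python model. This is not how the MSCS cluster service is implemented: the `QuorumLog` and `Node` classes and their methods are assumptions made for the sketch. The point is only that a node that was down during a change catches up by replaying the entries it has not yet applied, so no state is lost.

```python
from dataclasses import dataclass, field


@dataclass
class QuorumLog:
    """Stand-in for the configuration-change log kept on the quorum device."""
    entries: list[tuple[str, str]] = field(default_factory=list)

    def record(self, key: str, value: str) -> None:
        self.entries.append((key, value))


@dataclass
class Node:
    name: str
    config: dict[str, str] = field(default_factory=dict)
    applied: int = 0  # number of log entries this node has already applied

    def catch_up(self, log: QuorumLog) -> None:
        """Replay every logged change this node has not seen yet."""
        for key, value in log.entries[self.applied:]:
            self.config[key] = value
        self.applied = len(log.entries)

    def change(self, log: QuorumLog, key: str, value: str) -> None:
        """Log a configuration change first, then apply it locally."""
        self.catch_up(log)
        log.record(key, value)
        self.catch_up(log)


log = QuorumLog()
node_a, node_b = Node("A"), Node("B")

# Node B is down while node A changes the configuration twice.
node_a.change(log, "share_name", "Reports")
node_a.change(log, "ip_address", "10.0.0.5")

# Node B comes back and replays the log from the quorum device.
node_b.catch_up(log)
```

After the replay, both nodes hold the same configuration, which is the situation a temporal partition would have prevented.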


The Heartbeat of a Cluster

A third concept central to clustering depends on both the namespace and the quorum device. It’s known as the “heartbeat” of a cluster.


Heartbeat: In a failover configuration, the heartbeat allows two or more systems to communicate privately with each other. Heartbeats are signals that are sent periodically from one system to another to verify the systems are active.

Consider the heartbeat using the MSCS cluster example in Figure 1 below:

Figure 1

In this example, Server A ‘owns’ the disk in cabinet A and is running Microsoft Exchange. Server B ‘owns’ the disk in cabinet B and is running Microsoft SQL. Server A and Server B are both active and are servicing client requests.

If the network connections between these two servers should become unplugged, then the heartbeat between the servers will fail. The solution is that the servers can use the MSCS challenge/defense protocol (described below) and the quorum resource to learn whether the other server is still functioning. The challenge/defense protocol uses a low-level bus reset of the SCSI buses between the machines to attempt to gain control of the quorum resource.

When a SCSI bus reset is used, the reservation that each server had been holding on the quorum disk would be lost. Each server then would have roughly 10 seconds to re-establish that reservation, which would in turn let the other server know that it is still functioning, even though the other server wouldn’t necessarily be able to communicate with it. If the active cluster server does not re-establish the SCSI reservation on the quorum resource within the time limit, all applications that were on that server would then fail over to the other server. The server now servicing the applications may be a bit slower, but clients will still get their applications serviced. The IP (Internet Protocol) address and network names will move, applications will be reconstituted according to the defined dependencies, and clients will still be serviced, without any question as to the state of the cluster.

MSCS Challenge/Defense Protocol

The MSCS challenge/defense protocol works as follows: SCSI-2 has reserve/release verbs with a semaphore on the disk controller. The owner of the disk controller gets a “lease” on the semaphore, which it can renew every three seconds. To preempt ownership, a challenger clears the semaphore with a SCSI bus reset and then waits ten seconds (three seconds for renewal plus two seconds for bus-settle time, counted twice to give the current owner two chances to renew). If the semaphore is still clear, the challenger takes the lease from the former owner by issuing a reserve to acquire the semaphore.
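The timing of the challenge can be worked through in a short Python sketch. This is a simplified model under stated assumptions, not the MSCS implementation: the constant and function names are hypothetical, and the owner's behaviour is reduced to a list of times (in seconds after the bus reset) at which it manages to re-issue its reserve. Only the three-second renewal, the two-second bus-settle time, and the two renewal chances come from the text.

```python
from collections.abc import Iterable

RENEW_INTERVAL_S = 3  # owner can renew its lease every three seconds (from the text)
BUS_SETTLE_S = 2      # bus-settle time after a reset (from the text)
RENEW_CHANCES = 2     # the owner gets two chances to renew (from the text)

# 2 * (3 + 2) = 10 seconds, matching the ten-second wait described above.
CHALLENGE_WAIT_S = RENEW_CHANCES * (RENEW_INTERVAL_S + BUS_SETTLE_S)


def challenge(owner_renewals: Iterable[float]) -> str:
    """Return which side holds the reservation after one challenge.

    `owner_renewals` lists the times, in seconds after the bus reset, at which
    the current owner re-issues its reserve (a simplification for this sketch).
    """
    defended = any(0 <= t <= CHALLENGE_WAIT_S for t in owner_renewals)
    # If the semaphore is still clear after the wait, the challenger reserves it.
    return "owner" if defended else "challenger"


healthy_owner = challenge([3, 6, 9])  # owner keeps renewing on schedule
failed_owner = challenge([])          # owner never renews
late_owner = challenge([12])          # owner renews only after the wait expired
```

A healthy owner defends its lease well inside the window, while an owner that is down, or too slow to renew within ten seconds, loses the reservation to the challenger.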

Figure 2


VERITAS VOLUME MANAGER IN A MSCS CLUSTER ENVIRONMENT

Volume Manager Advantages with MSCS

VERITAS Volume Manager for Windows has two key advantages when used with Microsoft Cluster Server (MSCS):

• Dynamic Volumes Support with MSCS
• Fault Tolerant MSCS Quorum Resource

Dynamic Volume Support with MSCS

Windows 2000 Advanced Server and Datacenter Server as shipped from Microsoft do not provide support for dynamic disks in a server cluster (MSCS) environment. VERITAS Volume Manager for Windows 2000 fully supports dynamic volumes in an MSCS environment and so adds the dynamic disk features to a server cluster.

Fault Tolerant MSCS Quorum Resource

VERITAS Volume Manager provides additional fault tolerant support for the MSCS quorum by using the dynamic disk group as a quorum resource. This is accomplished by configuring the quorum on a mirrored volume contained within a cluster disk group. This provides additional redundancy and protection from a single disk failure. Volume Manager supports up to 32-way mirrored volumes, which provides quorum protection for up to 31 hard disk failures. Typically a three or four-way mirror is used for the quorum resource to provide fault tolerance.

To “own” a quorum disk group, the server must successfully import the disk group and must be able to obtain a SCSI reservation on a majority of the disks in the disk group. For this reason, VERITAS recommends that you use three or more disks in the quorum disk group. The quorum disk group is critical to the system; a three or four-way mirror strategy provides high levels of redundancy and helps prevent a deadlock situation. SCSI reservations are placed on all disks in a disk group to physically fence off the disks from other nodes in the cluster. Volume Manager also has built-in technology to support four-way mirroring at two sites in a campus cluster.

Unlike the physical disk resource, the Volume Manager Disk Group resource is a cluster disk group. A cluster disk group can have one to many physical disks containing an arbitrary number of volume layouts.
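The arithmetic behind the majority requirement and the mirror count can be checked with a small Python sketch. The function names are hypothetical and the model is deliberately minimal: it only counts how many disks a node has reserved and how many disk failures a mirrored volume can absorb, and assumes two contending nodes that between them reserve every disk.

```python
def has_majority(reserved: int, total: int) -> bool:
    """True if a node holding `reserved` of `total` disks owns the disk group."""
    return 2 * reserved > total


def tolerated_failures(mirrors: int) -> int:
    """A mirrored volume survives as long as one mirror remains."""
    return mirrors - 1


def split_outcomes(total: int) -> list[tuple[int, int, bool]]:
    """For two contending nodes, list (disks_a, disks_b, someone_wins) per split.

    Assumption for this sketch: every disk ends up reserved by one of the nodes.
    """
    return [
        (a, total - a, has_majority(a, total) or has_majority(total - a, total))
        for a in range(total + 1)
    ]


two_disk_splits = split_outcomes(2)    # includes the 1/1 tie with no owner
three_disk_splits = split_outcomes(3)  # every split has exactly one owner
```

With two disks, the 1/1 split leaves neither node with a majority, which is the deadlock the text warns about. With three disks, every possible split gives one node at least two disks, so one node always wins. More generally, an even split cannot occur when the number of disks is odd.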
Volume Manager uses a similar challenge/defense protocol as the MSCS physical resource on the cluster disk group containing the volume that has the quorum. To support the challenge/defense protocol the VERITAS volume manager uses a majority algorithm to determine exclusive ownership of the disk group. When a challenge occurs the Volume Manager arbitrates for ownership of the disk group containing the quorum. To gain

Copyright 2002 VERITAS Software Corporation. All rights reserved. VERITAS, VERITAS Software, the VERITAS logo, and all other VERITAS product names and slogans are trademarks or registered trademarks of VERITAS Software Corporation in the US and/or other countries. Other product names and/or slogans mentioned herein may be trademarks or registered trademarks of their respective companies. Specifications and product offerings subject to change without notice. June 2002.


exclusive ownership, one node must obtain a lease (i.e., semaphore) on a majority of disks in the disk group. Volume Manager obtains leases on the “physical disks” in the group, not on the volume that contains the quorum database. The majority algorithm protects against a deadlock situation in which one node obtains a lease on half of the disks in the disk group and another node obtains a lease on the other half.

VERITAS recommends that the cluster disk group containing the quorum volume contain three or more disks, and that the quorum volume include a mirror plex on each disk in the group. Using three or more disks in a cluster disk group protects against potential deadlock situations: one node can always obtain the majority and successfully arbitrate for ownership of the disk group containing the quorum volume. Note that the challenge/defense protocol is invoked only when both the public and private MSCS cluster heartbeats fail.

How VERITAS Volume Manager Can Help Solve Challenges in a Cluster Environment

As presented thus far in this paper, a system administrator deploying MSCS in a production environment faces several challenges.

ISSUE #1: Can’t Grow Data Volumes Online

Clustering provides higher availability than non-clustered systems. Yet if a server’s data grows and storage space must be added to existing volumes, there is no way to avoid downtime with native MSCS. Using Volume Manager in conjunction with MSCS makes dynamic disks available, and these allow you to grow your volumes without interrupting data availability.

ISSUE #2: Can’t Use Fault Tolerant Data Volumes

Another consideration in building a high availability solution is protection against possible hardware failure. Because native MSCS does not allow the use of dynamic disks, data in a cluster can’t be made fault tolerant.
If a disk that holds your data in a cluster fails, you must take the cluster offline, replace the faulty hardware, and restore the data from a backup, unless you use Volume Manager. Volume Manager’s dynamic disks support mirrored, RAID-5, and mirrored stripe volumes to keep your data online through hardware failures.

ISSUE #3: Can’t Use Fault Tolerant Quorum Disk

Since native MSCS cannot use dynamic disks, the quorum goes offline if the disk it resides on fails. MSCS has a utility that allows the quorum to be rebuilt, but the rebuild must take place while the cluster is offline. Putting the quorum on a Volume Manager Disk Group resource allows it to be mirrored, which avoids downtime from a single hardware failure.


ISSUE #4: Quorum Disk is a Single Point of Failure for the Cluster

Without the use of dynamic volumes and software mirroring, a quorum resource on a basic disk is a single point of failure for the cluster. By mirroring the quorum and spreading the plexes of the mirror across separate hardware storage arrays, even losing an entire array will not force the cluster offline. The methods used, and the caveats to be aware of, in developing a solution that keeps the quorum from being a single point of failure are explained in the following paragraphs.

Disaster Recovery: Campus Clusters

It is becoming commonplace for customers to protect their Microsoft clusters from natural disasters such as floods and hurricanes by deploying Campus Clusters. The practice is also becoming more common as power blackouts become a risk that customers must account for in their system planning.

Campus Clusters are multiple nodes in separate buildings with mirrored SAN-attached storage located in each building.

Microsoft Windows based servers support campus clusters out of the box without the use of VERITAS Volume Manager. The cluster nodes can be located up to 20 miles apart using fibre channel storage area networks (SANs) and long wave optical technologies. This solution can disperse the clustered servers into different buildings or areas to protect the servers from most disasters that could strike one of the locations.

A key resource in the cluster is the quorum resource. If this quorum resource is lost to the cluster, the cluster will fail, as none of the cluster servers will be able to gain control of the quorum resource and ultimately the cluster. Most customers place this resource on hardware RAID, which provides hardware redundancy and protection from the loss of one or more physical disks. While a quorum resource located on hardware RAID is protected against disk failures within the hardware RAID, it is not protected from natural disasters and power failures that could affect the physical location that contains this single quorum resource.

Volume Manager allows the quorum resource to be located on multiple disks at multiple locations by using the software mirroring capabilities of dynamic volumes in an MSCS environment. By using software mirroring, the quorum resource can be distributed across multiple hardware RAID enclosures that are located at multiple physical sites. This provides a fully fault tolerant solution for customers who want to maximize the uptime of their MSCS cluster solution.


Campus Cluster Option: Mirrored Quorum Disks at Two Locations

This option protects against a single site failure by locating the four software mirrors at two separate locations. Site A has one cluster server and two plexes of the four-way mirrored quorum resource, and Site B has one cluster server and the other two plexes.

[Figure 3: Volume Manager 4-Way Mirror Quorum Resource Clustering Support. Site A (Windows Server A, Disk Cabinet A) and Site B (Windows Server B, Disk Cabinet B) are linked by Ethernet and the cluster heartbeat; the 4-way mirrored quorum resource consists of mirror plexes 1 through 4 held in the two disk cabinets.]


WARNING: It is imperative that controlled coordination exists between the two sites when performing this failover scenario. Failure to do so may result in a Split Brain condition. How the cluster behaves, and what is required to get the surviving cluster server in control of the cluster, depends on which site fails:

• If the site that does not own the quorum volume goes offline, the quorum volume will stay online at the other site, and other cluster resources will stay online or move to that site.

• If the site owning the quorum volume goes offline, the remaining site will not be able to gain control of the quorum volume because it cannot reserve a majority of disks in the quorum group. As mentioned earlier, this is a safeguard to prevent multiple nodes from onlining members of a cluster disk group to which they have access.
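The two cases above can be modeled with a small Python sketch. This is a simplified reading of the rules in this section, not Volume Manager behavior: the class, enum, and function names are hypothetical, the owner is assumed to keep the disk group while at least half of its disks remain reachable, and a non-owner is assumed to need a strict majority to import. The OFFLINE and AUTOMATIC_IMPORT outcomes cover uneven disk layouts that this paper does not describe and are included only to make the function total.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    STAYS_ONLINE = "quorum volume stays online at the surviving site"
    AUTOMATIC_IMPORT = "surviving site can import the disk group on its own"
    MANUAL_FAILOVER = "surviving site cannot import; manual failover needed"
    OFFLINE = "owner has lost more than half of the disks"


@dataclass(frozen=True)
class QuorumDiskGroup:
    disks_a: int  # disks of the quorum disk group located at Site A
    disks_b: int  # disks of the quorum disk group located at Site B
    owner: str    # "A" or "B": site whose node currently owns the group


def after_site_failure(group: QuorumDiskGroup, failed_site: str) -> Outcome:
    """Return what happens to the quorum disk group when one site fails."""
    if failed_site not in ("A", "B"):
        raise ValueError("failed_site must be 'A' or 'B'")
    survivor = "B" if failed_site == "A" else "A"
    reachable = group.disks_a if survivor == "A" else group.disks_b
    total = group.disks_a + group.disks_b
    if survivor == group.owner:
        # Assumed rule: the owner keeps its reservation with 50% or more.
        return Outcome.STAYS_ONLINE if 2 * reachable >= total else Outcome.OFFLINE
    # Assumed rule: a non-owner needs a strict majority to import.
    if 2 * reachable > total:
        return Outcome.AUTOMATIC_IMPORT
    return Outcome.MANUAL_FAILOVER


if __name__ == "__main__":
    group = QuorumDiskGroup(disks_a=2, disks_b=2, owner="A")
    for failed in ("B", "A"):
        print("Site", failed, "fails:", after_site_failure(group, failed).value)
```

With two disks at each site and Site A as owner, losing Site B leaves the quorum online at Site A, while losing Site A leaves Site B with only half of the disks, so it cannot import the group without operator intervention.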

The only recommended and supported Campus Cluster configurations are those made up of two sites utilizing cluster disk groups with an even number of disks, with the disks evenly distributed between the sites. Volume Manager 3.0 for Windows 2000 Service Pack 1 is a requirement for this Campus Cluster implementation. Volume Manager 3.0 for Windows 2000 with Service Pack 1 maintains the reservation on a disk group in which 50% of the disks are still available, but will not import a disk group in which a majority of the disks is not available (that is, 50% or fewer are available). This makes it possible to implement a two-site Campus Cluster configuration that uses cluster disk groups with an even number of disks, evenly distributed across the two sites, while significantly reducing the risks of split brain and data corruption that are usually associated with such a configuration.

VERITAS has a special command line utility, vxclus, that aids in this manual failover. The operator at Site B runs the vxclus command to gain control over the cluster in spite of not having a majority of disks in the quorum cluster disk group. When Site A comes back online, special procedures are needed to ensure that single control of the cluster is maintained. Please reference the VERITAS Technote titled “VERITAS Volume Manager 3.0 for Windows 2000 Force Import Utility” on this subject, which is available from your local VERITAS office.

CAUTION: Unpredictable results can occur in this scenario if the storage does not come online before the node server (Site A in this example).

SUMMARY

It is a challenge to ensure the high availability of applications and data in today’s rapidly growing Windows environments. Many factors can cause downtime: planned downtime to perform system maintenance and necessary upgrades, as well as unexpected faults with software and hardware.


VERITAS Volume Manager for Windows 2000 builds on the strong foundation of logical volume management and dynamic disks in Windows 2000. It provides advanced storage-management capabilities for applications with critical performance or availability requirements and offers the highest level of online disk- and volume-management capabilities available. In addition, Volume Manager enables the use of dynamic volumes in an MSCS cluster environment and the creation of Campus Clusters with fault tolerant MSCS quorum resources using N-Way Mirroring. Using Volume Manager and MSCS together provides an inexpensive clustering solution that uses commodity hardware and offers a great deal of flexibility and manageability.

To learn more about VERITAS Volume Manager for Windows, visit http://www.veritas.com/us/products/volumemanagerwin.
