8/12/2019 EMC ClarIIon High Availability
EMC CLARiiON High Availability (HA)
Best Practices Planning
Abstract
This white paper discusses end-to-end high availability (HA). It takes into consideration the HA aspects of
mission-critical storage environments, starting at the host side and going all the way to the storage system
to include connectivity infrastructure involving switches. The paper also considers the importance of
keeping HA aspects in mind in order to maintain HA in production environments. This white paper
discusses the choices available to customers so that they can set appropriate expectations for data
availability in their environments.
April 2007
Copyright 2005, 2007 EMC Corporation. All rights reserved.
EMC believes the information in this publication is accurate as of its publication date. The information is
subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED AS IS. EMC CORPORATION
MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE
INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable
software license.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com
All other trademarks used herein are the property of their respective owners.
Part Number H1737.1
Table of Contents
Executive summary
Introduction
  Audience
Host configuration for high availability
  Prerequisite for a highly available host environment
  CLARiiON Procedure Generator
  Number of paths from the host to the storage system
  Host bus adapter vendor information
  Automatic path failover and failback
    Path management failover settings
    Install free failover software (EMC PowerPath SE)
Connectivity configuration for high availability
  Low protection
  Medium protection
  High protection
  Ultra protection
  Switch vendor information
Storage-side high availability
  Storage-system components
  RAID configuration
  Low protection
  Medium protection
  High protection
  Ultra protection
  Hot sparing policy
    Number of hot spares
    Sizing the hot spare disk
    Disk replacement
  Rebuild-time considerations
    System load
    Rebuild priority
    Size of the RAID group
    Size and type of disk drive
Clustering and replication
  Clustering: Protecting against loss of the primary application server
  Data mirroring: Protecting against loss of the primary storage system
Validation and maintenance
  HA validation
    Initial host failover testing
    Ongoing high availability verification
  Change control process
Conclusion
Appendix A: Failover mode settings
Executive summary
EMC is focused on helping customers maintain the most highly available infrastructure possible. Having a
highly available environment involves many factors. These factors include not only deploying world-class
products and services but also deploying and configuring those products and services in a manner that
provides maximum availability. It is also important to note that high availability (HA) comes at a cost.
However, not all applications need the same level of availability. Some applications are absolutely mission-
critical while others may be business-critical but can withstand a few minutes of outage. This white paper
discusses what is needed to ensure end-to-end HA and to help customers make appropriate choices.
Introduction
A highly available system is one that does not have any single point of failure (SPOF). In the event of a
component or element failure, the system maintains its basic functionality. In many cases, an HA system is
able to withstand multiple failures as long as these failures do not occur within the same redundant component
set. For example, in a RAID 5 group, a single disk failure does not affect data availability; the system can
withstand multiple single-disk failures as long as they occur in different RAID groups.
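
The RAID 5 example above can be expressed as a simple rule: data stays available as long as no redundant set has lost more members than it can tolerate. The following Python sketch illustrates that rule only. The function name, the (group, disk) tuples, and the default tolerance of one failure per group are assumptions made for illustration and are not part of any EMC tool.

```python
from collections import Counter


def data_available(failed_disks, max_failures_per_group=1):
    """Return True if no RAID group has more failed disks than it tolerates.

    failed_disks: iterable of (raid_group, disk) tuples (hypothetical naming).
    max_failures_per_group: 1 models a RAID 5 group (assumption).
    """
    # Count distinct failed disks in each RAID group.
    per_group = Counter(group for group, _disk in set(failed_disks))
    return all(count <= max_failures_per_group for count in per_group.values())


# Two single-disk failures in different RAID groups: data stays available.
print(data_available([("rg0", 3), ("rg1", 1)]))
# Two failures inside the same RAID 5 group: data is unavailable.
print(data_available([("rg0", 3), ("rg0", 4)]))
```

The same check applies to any redundant component set (HBAs, switches, SPs) once the tolerance per set is known.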
Audience
This white paper is primarily intended for EMC CLARiiON customers. However, EMC field personnel
can also benefit from the information included.
Host configuration for high availability
There are multiple aspects to HA, but designing such an environment starts at the host side. To ensure that
the application data is highly available, the host must be configured properly to withstand certain single
failures, such as failure of the host bus adapter (HBA), fibre cable, or failover software.
Prerequisite for a highly available host environment
To ensure that the production environment is supported per the configurations tested and verified for
interoperability by EMC, refer to the E-Lab Interoperability Navigator (a searchable database of the
EMC Support Matrix). The Navigator is available on Powerlink, EMC's password-protected extranet for
customers and partners.
E-Lab Interoperability Navigator outlines, among other things, supported revisions of:
Host operating systems
HBA models
HBA software including firmware
Latest operating system patches
Switch firmware
CLARiiON FLARE operating environment
Symmetrix microcode
Another important and related utility is the High Availability Verification Tool (HAVT). HAVT helps you validate
that a server's components are supported by EMC. HAVT is available in the GUI and the text-based
versions of Navisphere Server Utility (starting with release 24), and is accessed by selecting the Server
Utility's High Availability Verification option.
The Server Utility is included on the Navisphere Server Support CD that ships with the storage system.
The Server Utility is supported on Windows, Linux, HP-UX, AIX, and Solaris operating systems and now
offers additional features beyond the traditional one of registering the server initiators with the storage
system. Refer to the EMC Navisphere Host Agent/CLI and Utilities Release Notes on Powerlink
(http://powerlink.emc.com/) for the latest supported revisions and features. Figure 1 shows the welcome
dialog box for the Server Utility. This dialog box outlines all the features that are currently available in
the Server Utility.
Figure 1. Navisphere Server Utility functions
For the purposes of this white paper, we will only discuss the Verify Server High Availability option. To
validate a particular server, select this option and follow the prompts. The final result is the Navisphere
Server High Availability Report with various tabs containing important information. One of these tabs is
labeled Checklist.
The Configuration Checklist tab includes information on three main components: the storage system,
server hardware, and server software. This information includes the storage system name, model, and
FLARE OE version; manufacturer, model, and firmware of the host operating system; and manufacturer,
model, and firmware of the host itself. It also lists the names and versions of the server software that is
installed. Figure 2 shows an example of this checklist.
Figure 2. Server Utility High Availability verification checklist report
Once this checklist has been generated, the report can be printed and compared against the E-Lab
Interoperability Navigator (a searchable database of the EMC Support Matrix, available on Powerlink).
New E-Lab Wizards make it easier to check the relevant components depending on the task being
performed. For example, if attaching a new host and configuring it for HA, the Storage Array Wizard helps
to determine if the server hardware and software are supported with a particular storage system model. The
checklist mirrors the order in which the wizard prompts for information and includes all required
information. This makes it quick and easy to input the required information and provides each server's
component revisions in one place.
This white paper discusses another important feature of the High Availability Verification report in the
Ongoing high availability verification section.
CLARiiON Procedure Generator
The CLARiiON Procedure Generator (CPG) is another useful tool built by EMC for customers and field
personnel. This tool is designed to create procedures for various operations, such as installing a new
storage system or adding a host in a new or existing SAN environment, performing a software upgrade, and
performing certain recovery procedures. The CPG is available on the Powerlink website.
Number of paths from the host to the storage system
To ensure there are redundant paths between the host and storage system, there must be a failover path in
case the primary path fails. The CLARiiON storage systems have a primary/secondary LUN ownership
model for the storage processors, although the host-addressable logical units (LUNs) are serviced via both
storage processors (SPs). In this architecture, a LUN will be serviced by one SP at a time. In the event that
the LUN is trespassed over to the peer SP, the LUN will be serviced by the other SP. This may occur if the
primary path to the default SP fails, the host HBA fails, or, in some rare cases, the SP fails. In any such
event, it is important that there is a standby/secondary path to the peer SP and path-failover software (such
as EMC PowerPath) to initiate failover to the standby/secondary I/O path in order to ensure that the data
on the LUN is accessible via the secondary SP.
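
The ownership model described above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not PowerPath's implementation: the Lun class, the route_io function, and the use of "A" and "B" as SP names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Lun:
    """A LUN owned by one SP at a time (hypothetical model)."""
    name: str
    default_sp: str       # "A" or "B"
    current_sp: str = ""  # SP currently servicing the LUN

    def __post_init__(self):
        if not self.current_sp:
            self.current_sp = self.default_sp


def peer(sp):
    """Return the peer of the given SP."""
    return "B" if sp == "A" else "A"


def route_io(lun, reachable_sps):
    """Return the SP that services the I/O, trespassing the LUN if needed.

    reachable_sps: set of SP names the host can still reach over a healthy path.
    Raises RuntimeError when neither SP is reachable (data unavailable).
    """
    if lun.current_sp in reachable_sps:
        return lun.current_sp
    other = peer(lun.current_sp)
    if other in reachable_sps:
        lun.current_sp = other  # trespass the LUN to the peer SP
        return other
    raise RuntimeError("data unavailable: no healthy path to either SP")


lun = Lun(name="lun0", default_sp="A")
print(route_io(lun, {"A", "B"}))  # serviced by the default SP
print(route_io(lun, {"B"}))       # path to SP A lost: LUN trespasses to SP B
```

Without a second path, the set of reachable SPs becomes empty after the first failure, which is the data-unavailable case the paragraph warns about.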
Table 1. Host failover options

Host failover option: Performance and Protection
  Host configuration: 2 HBAs per path with PowerPath; Clustered Environment; Mirrored Data
  Storage system level protection available: Single Drive Failure; Power Failure; SP Failure; HBA Failure; Server Failure; Storage System Failure
  Risk of data unavailable: Very Low

Host failover option: High Protection
  Host configuration: Multiple HBAs, SAN environment, PowerPath
  Storage system level protection available: Single Drive Failure; Power Failure; SP Failure; HBA Failure
  Risk of data unavailable: Low

Host failover option: Medium Protection
  Host configuration: Single HBA SAN environment with PowerPath SE; Multiple HBA direct attach with PowerPath Base
  Storage system level protection available: Single Drive Failure; Power Failure; SP Failure
  Risk of data unavailable: Medium

Host failover option: Low Protection
  Host configuration: Single HBA, Direct Attach; Single HBA, SAN, no PowerPath SE
  Storage system level protection available: Single Drive Failure; Power Failure
  Risk of data unavailable: High
Host bus adapter vendor information
Information for two major HBA vendors that EMC supports can be found at the following HBA vendor
websites:
Emulex drivers and installation docs with HBA settings at:
http://www.emulex.com/ts/docoem/framemc.htm
QLogic drivers and installation docs with HBA settings at:
http://www.qlogic.com/support/oem_emc.asp
Please see the E-Lab Interoperability Navigator (available on Powerlink) for information about other
supported HBAs.
Automatic path failover and failback
To automate path failover and failback, some type of path failover software must be running on the host
system. EMC supports several failover software packages, including its own PowerPath software. When
the failover software is correctly set up and is running on the host, it automatically fails over the application
I/Os from the failed path to the secondary/standby paths. To learn more about EMC PowerPath, go to
EMC.com:
http://software.emc.com/products/software_az/powerpath.htm
Path management failover settings
Several path failover software packages may be present in the environment. To ensure that the storage
system is configured appropriately for the host operating system environment, the following parameters
must be set on the storage system:
arraycommpath
failovermode
systemtype
unitserialnumber
Install free failover software (EMC PowerPath SE)
Failover software is required to maintain data availability during coordinated SP reboots (for example,
when updating software on the CLARiiON storage system). Customers with single-HBA hosts (switch
attached) can use PowerPath free of charge. This basic PowerPath functionality (PowerPath SE) is
available on the CLARiiON Utility Kit CD, as well as on Powerlink. The CLARiiON Utility Kit CD ships
with the storage system. A version for each operating system type in the environment is supplied.
Connectivity configuration for high availability
After ensuring that the host configuration complies with the HA best practices, the next thing to consider is
the connectivity infrastructure. How the host is physically connected to the storage system determines
the protection level. Table 2 shows how different configurations offer different levels of protection.
Table 2. Connectivity options

[Diagrams: for each protection level, a host with one or more HBAs connected to SP A and SP B on the array, either directly or through one or more switches]

Low Protection
  Connectivity: Direct Connect - No Switch Available
  PowerPath: None - Insufficient Hardware for PowerPath
  Single points of failure: HBA; SP; Cable

Medium Protection
  Connectivity: Single HBA with Single Switch
  PowerPath: PowerPath SE
  Single points of failure: HBA; Switch; Cable

High Protection
  Connectivity: True High Availability
  PowerPath: PowerPath - Full
  Single points of failure: None

Ultra Protection
  Connectivity: Ultra HA - All components are redundant
  PowerPath: PowerPath - Full
  Single points of failure: None

The following sections describe the protection options shown in Table 2.
Low protection
This is a very basic option. In this case, the host has a single HBA and is connected to a single SP. This
configuration includes multiple single points of failure. The failure of the HBA, cable, or SP results in data
being unavailable for the host applications. Customers running mission-critical applications must avoid
this configuration.
Medium protection
This option provides some protection. It consists of the use of PowerPath SE software, which EMC
provides free of cost, and a single switch for host-to-storage connectivity. When configured properly, the
PowerPath SE software running on the host provides basic LUN trespass functionality during operations
such as nondisruptive upgrades (NDUs), as well as in the rare event of single SP failure. This configuration
includes multiple single points of failure. Failure of the HBA, cable, or switch will result in data being
unavailable for the host applications. Customers running mission-critical applications must avoid this
configuration.
High protection
This configuration is recommended for all business-critical applications provided that adequate measures
have been taken at the host and application level. It entails the use of full-feature PowerPath software. In
this configuration, there are dual HBAs connected to the host; therefore, there is a redundant path to each
SP. There is no single point of failure. Data availability is ensured in the event that an HBA, cable, or SP fails.
Since there is a single path per SP, this configuration does not provide any additional performance
enhancement.
Ultra protection
This configuration provides the highest level of protection and may also improve
performance. It entails the use of full-feature PowerPath software. In this configuration, there are dual
HBAs connected to the host; therefore, there is a redundant path to each SP. There is no single point of
failure. Data availability is ensured in the event that an HBA, cable, or SP fails. Since there are multiple paths per
SP, this configuration benefits from PowerPath's load-balancing feature and thus provides additional
performance.
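
The single points of failure listed in Table 2 can be derived mechanically: a component is a single point of failure if every I/O path from the host to the array passes through it. The Python sketch below illustrates this with set intersection. The path lists are hypothetical simplifications of the low, medium, and high configurations (for example, switch-to-SP cables are omitted), and the component names are assumptions made for illustration.

```python
def single_points_of_failure(paths):
    """Return the components shared by every host-to-array I/O path.

    paths: list of sets of component names, one set per independent path.
    """
    if not paths:
        raise ValueError("at least one I/O path is required")
    return set.intersection(*(set(path) for path in paths))


# Hypothetical path lists loosely modeled on Table 2 (assumptions).
LOW = [
    {"HBA", "cable", "SP A"},            # direct connect to one SP
]
MEDIUM = [
    {"HBA", "cable", "switch", "SP A"},  # one HBA, one switch, either SP
    {"HBA", "cable", "switch", "SP B"},
]
HIGH = [
    {"HBA 1", "cable 1", "switch 1", "SP A"},  # fully separate paths
    {"HBA 2", "cable 2", "switch 2", "SP B"},
]

for name, paths in (("Low", LOW), ("Medium", MEDIUM), ("High", HIGH)):
    print(name, sorted(single_points_of_failure(paths)))
```

Under these assumptions the model reports the HBA, cable, and SP for low protection, the HBA, cable, and switch for medium protection, and nothing for high protection, which agrees with the table.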
Switch vendor information
Storage area network (SAN) switch information on a few switch types can be found at the following switch
vendor websites; the EMC Support Matrix contains information about other switch types that EMC
supports:
Cisco storage networking website:
http://www.cisco.com/en/US/products/hw/ps4159/index.html
Brocade storage networking website:
http://www.brocade.com/products/index.jsp
Storage-side high availability
Once the host- and connectivity-side HA is ensured, the next area of focus is the storage system itself. For
any mission-critical application environment, it is important that the storage system on which the data resides
has a highly available architecture. The CLARiiON storage system offers an N+1 redundant architecture,
which provides data protection against any single component failure. These components are discussed in
the next section.
Storage-system components
CLARiiON storage systems are designed for high availability. With redundant components such as dual
SPs, dual back-end loops, and dual-ported disk drives, CLARiiON storage systems can ride through
multiple component failure scenarios. Other features that make CLARiiON storage systems resilient in the face
of various failure types include triple protection of the storage system database (also referred to as PSM,
the Persistent Storage Manager) and the FLARE database, which, among others, is designed to keep customer
data available. These features are available out-of-the-box to our customers and no additional configuration
is required at the time of installation. However, customers must choose the way in which disks are bound.
The next section discusses the best practices for RAID configuration.
RAID configuration
To access the data on the disk drives, the disks must be bound into a RAID group. There are different
RAID configurations that are supported with a CLARiiON storage system. Note that customers have the
option of using individual disks. However, this is rarely done, since by doing so they do not benefit from
the redundancy offered by the RAID-protected configuration.
CLARiiON offers RAID 1, 1/0, 3, and 5 options for data protection. These configurations offer protection
against single-disk failure. In the case of RAID 1/0 (mirrored stripes), multiple disk failures can be tolerated
as long as no two failures occur within the same mirrored pair. RAID 0 offers a
high-performance RAID configuration, but does not offer any protection.
Table 3 shows examples of various protection levels as they relate to resilience in case of disk failure and
the effective utilization of the raw disk space.
Table 3. Storage system RAID configuration options
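
The effective utilization of raw disk space follows directly from how each RAID type stores redundancy: mirrored types keep two copies of the data, and parity types give up one disk's worth of capacity per group. The Python sketch below shows this arithmetic. It is a generic illustration, the function name is hypothetical, and the disk-count checks are generic RAID minimums rather than CLARiiON's supported group sizes.

```python
def usable_fraction(raid_level, disks):
    """Return the fraction of raw capacity available for data.

    raid_level: "0", "1", "1/0", "3", or "5".
    disks: number of disks in the RAID group.
    """
    if raid_level == "0":
        if disks < 1:
            raise ValueError("RAID 0 needs at least one disk")
        return 1.0  # striping only, no redundancy
    if raid_level == "1":
        if disks != 2:
            raise ValueError("RAID 1 is a single mirrored pair")
        return 0.5  # every block is stored twice
    if raid_level == "1/0":
        if disks < 2 or disks % 2:
            raise ValueError("RAID 1/0 needs an even number of disks")
        return 0.5  # every block is stored twice
    if raid_level in ("3", "5"):
        if disks < 3:
            raise ValueError("parity RAID needs at least three disks")
        return (disks - 1) / disks  # one disk's worth of parity
    raise ValueError(f"unsupported RAID level: {raid_level}")


for level, count in (("0", 5), ("1/0", 8), ("5", 5), ("5", 9)):
    print(f"RAID {level}, {count} disks: {usable_fraction(level, count):.0%} usable")
```

The trade-off discussed in the following sections is visible here: a smaller parity group is less exposed to double-disk failure but spends a larger share of its raw capacity on parity.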
Low protection
Disks configured as RAID 0 offer no protection against a single disk failure. The only reason this RAID
type is selected is to get the benefit of striped writes that RAID 0 offers for higher performance. This
configuration may be used for things like temp files or for holding any data that needs fast access. In the event
of a disk failure, the information within the RAID 0 group will be unrecoverable. Therefore this RAID
configuration must be used with great caution.
Medium protection
Disks configured as RAID 3 or RAID 5 with eight or more disks provide protection against a single-disk
failure within the same RAID group. When a single disk fails within the RAID group, the hot spare
(discussed in the Hot sparing policy section) is invoked and data that was contained on the failed disk is rebuilt. When the
failed disk is replaced, the data from the hot spare (invoked earlier) is copied to the replacement drive. Once
the rebuild process completes, the RAID group is ready to withstand another disk failure.
High protection
In order to make the RAID group less vulnerable to double-disk failure, fewer disks can be used to
configure the RAID group, which reduces the possibility of double-disk failure within the same RAID
group. Medium protection may be upgraded to high protection by reducing the number of disks within the
RAID 3 or RAID 5 group to three disks.
Ultra protection
In certain environments, data protection and application performance are equally important requirements
(especially in certain transaction-processing environments such as database applications). To meet these
objectives, customers can choose a RAID 1/0 configuration that offers high performance and additional
protection against disk failure. In the RAID 1/0 configuration, a double-disk fault may be tolerated as long
as there is no more than one disk failure within any mirrored pair.
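
To give a feel for how often a double-disk fault in a RAID 1/0 group hits a single mirrored pair, the sketch below enumerates every possible pair of failed disks. It assumes that the two failures are equally likely to strike any two disks in the group, which is a simplifying assumption and not a claim from this paper; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def same_pair_probability(mirrored_pairs):
    """Probability that two failed disks belong to the same mirrored pair.

    Assumes both failures are uniformly distributed over the group's disks.
    """
    if mirrored_pairs < 1:
        raise ValueError("at least one mirrored pair is required")
    # Label each disk with the index of the mirrored pair it belongs to.
    disks = [(pair, side) for pair in range(mirrored_pairs) for side in (0, 1)]
    outcomes = list(combinations(disks, 2))
    fatal = sum(1 for first, second in outcomes if first[0] == second[0])
    return Fraction(fatal, len(outcomes))


for pairs in (2, 4, 8):
    print(f"{2 * pairs} disks: {same_pair_probability(pairs)}")
```

Under this assumption the result reduces to 1/(2n - 1) for n mirrored pairs, so the more pairs in the group, the less likely it is that a second failure lands on the partner of the first failed disk.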
Hot sparing policy
Hot spares are spare disks that are pre-allocated at the time of configuration. A hot spare is invoked in the
event of a disk failure. Once invoked, the data that resides on the failed disk is rebuilt from either the
mirrored pair (RAID 1 and RAID 1/0) or the data and parity on the other drives (RAID 3 and RAID 5).
When the failed disk is replaced, the data on the replacement disk is equalized with the content of the hot
spare. After the completion of the equalization process, the hot spare returns to its default position, ready
for any future disk failure event.
Number of hot spares
There should be at least one hot spare disk for every 30 drives on the CLARiiON storage system. Customers
may configure more hot spares if they choose. For ease of management, it is also recommended that the hot
spare be configured on the last drive slot on a disk-array enclosure/shelf. However, the hot spare may be
configured anywhere in the system with the exception of the vault drives that are used for cache vault and
certain other internal purposes. The vault drives are the first five drives on the CX series. Their location
varies for prior generations of products.
Sizing the hot spare disk
When planning global hot spares on the system, each hot spare disk should be as large as or larger than the
drive(s) it may be required to replace in the event of disk failure.
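
The two hot spare guidelines above (at least one spare per 30 drives, and a spare at least as large as any drive it may replace) reduce to simple arithmetic. The Python sketch below is an illustration only. The function names are hypothetical, and rounding a partial group of 30 drives up to a full spare is an interpretation of the guideline.

```python
def minimum_hot_spares(total_drives, drives_per_spare=30):
    """Return the minimum number of hot spares for the given drive count.

    A partial group of drives still gets its own spare (rounds up).
    """
    if total_drives <= 0:
        return 0
    return (total_drives + drives_per_spare - 1) // drives_per_spare


def spare_covers(spare_size_gb, drive_sizes_gb):
    """Return True if the spare is at least as large as every listed drive."""
    return all(spare_size_gb >= size for size in drive_sizes_gb)


print(minimum_hot_spares(45))            # 45 drives need at least 2 spares
print(spare_covers(146, [73, 146]))      # spare is large enough
print(spare_covers(146, [73, 146, 300])) # spare is too small for the 300 GB drive
```

In a system with mixed drive sizes, sizing every spare for the largest drive keeps any spare usable for any failure.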
Disk replacement
If a disk drive fails and needs to be replaced, you should follow EMC's recommended procedure for disk
replacement and take all the necessary precautions, including proper drive handling. In case of doubt, wait
for EMC support personnel. In the rare event of multiple drive failures, do not replace any drive.
Instead, wait for the trained EMC support personnel to arrive.
Rebuild-time considerations
When a disk drive fails and a hot spare is invoked, the rebuild process starts. During this process, data that
was resident on the failed disk is rebuilt from available redundant components. The time it takes to rebuild
a failed disk depends upon various factors, which are discussed next.
System load
During the rebuild process the system has to do a significant amount of work to read the data from the
redundant components of the RAID group to rebuild data. In the case of mirrored RAID groups (RAID 1
and RAID 1/0), the process involves reading data from the mirrored pair. In the case of parity RAID groups
(RAID 3 and RAID 5), data is reconstructed by reading the data and parity from the available disk drives in
the RAID group. Therefore, if the system is idle, the rebuild process will be faster since all the system
resources are available to the rebuild process. If the system is under heavy application workload, the
rebuild process can take longer to complete.
Rebuild priority
Besides system load, the rebuild process is controlled by the rebuild priority set for the LUNs
within the RAID group. The priority settings for the rebuild process are:
Low
Medium
Fast
ASAP
By default the priority is set to ASAP. In most cases, the default is the recommended rebuild priority.
Size of the RAID group
The bigger the size of the RAID group (that is, number of disks within the RAID group), the longer it takes
to complete the rebuild process in the event of a single disk failure within the RAID group. The rebuild
time can be reduced if the size of the RAID group is kept smaller.
Size and type of disk drive
The size and type of the disk itself can affect the rebuild time. For example, a five-disk RAID 5 group
composed of 36 GB 15,000 rpm disk drives will rebuild faster than a nine-disk RAID 5 group
composed of 250 GB 5,400 rpm disk drives.
Table 4 shows examples of various RAID group configurations under different system loads, rebuild
priorities, RAID groups, and disk sizes in order to show the risk associated with a double-disk failure
scenario. These are only examples and do not presume to show the best configuration.
Table 4. RAID protection options and rebuilds
Clustering and replication
There is another dimension to high availability beyond local protection. Besides host, connectivity, and
storage high availability, protection against server hardware and storage-system failure
should be considered. These events are very rare, but if they happen they may cause disruption.
Clustering: Protecting against loss of the primary applicationserverThere are various clustering software products to protect the production environment against server failure.The most popular products include the following:
Microsoft Cluster Server
VERITAS Cluster Server
Sun Cluster Server
Please check the EMC Support Matrix for supported cluster software.
Data mirroring: Protecting against loss of the primary storage system
To protect the production site against loss of access to the primary storage system (for example, due to
power failure), customers can mirror data from the primary site to a disaster recovery site. Depending
on the recovery point objective (RPO) and recovery time objective (RTO), different options are
available. Customers who cannot afford to lose even a single transaction can use a
synchronous mirroring product such as MirrorView/Synchronous. Where the disaster
recovery site must be hundreds or even thousands of miles from the production site, some type of
asynchronous replication application (such as RecoverPoint or MirrorView/Asynchronous) may be used.
In almost all of these cases, the customer will benefit from the use of both clustering and remote mirroring.
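The choice between synchronous and asynchronous mirroring described above can be sketched as a small decision function. This is a minimal illustration only: the names, the 100-mile cutoff for a long-distance link, and the handling of the case where both constraints apply are assumptions, not EMC sizing guidance.

```python
from dataclasses import dataclass


@dataclass
class DrRequirements:
    zero_data_loss: bool   # RPO of zero: not even a single transaction may be lost
    distance_miles: float  # distance between the production and disaster recovery sites


# Assumed cutoff; the paper only says "hundreds or even thousands of miles".
LONG_DISTANCE_MILES = 100.0


def suggest_mirroring(req: DrRequirements) -> str:
    """Suggest a mirroring mode from the RPO and the site distance."""
    long_haul = req.distance_miles >= LONG_DISTANCE_MILES
    if req.zero_data_loss and long_haul:
        # Both constraints apply at once; this sketch does not pick for you.
        return "needs review"
    if req.zero_data_loss:
        return "synchronous"
    if long_haul:
        return "asynchronous"
    return "either"


print(suggest_mirroring(DrRequirements(zero_data_loss=True, distance_miles=20)))
print(suggest_mirroring(DrRequirements(zero_data_loss=False, distance_miles=1500)))
```

A real selection would also weigh the RTO, link bandwidth, and application latency tolerance.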
Validation and maintenance
A significant amount of time is often invested in preparing and configuring a new environment for high
availability. Because of this investment, it is important to validate the environment after configuration to
ensure it was implemented properly for high availability. If you have installed a new storage system,
attached a new host, or are about to perform other ongoing maintenance procedures (such as updating the
software on the CLARiiON storage system), it is imperative that you test the HA configuration to ensure
that data availability is maintained in the event of, for example, a path failure. It is also imperative that
failover testing occur regularly within the environment to protect against any inadvertent changes that may
have broken the high availability of the configuration.
HA validation
There are two important pieces to testing the high availability of your environment. After installing a
storage system, or attaching a new host, you should perform a physical test to ensure that failover occurs
as expected. Then, after you are in production, you should perform periodic health checks to validate the
availability of the environment, and to make sure that nothing unexpected changed within the environment
(for example, a zone inadvertently changed, or failovermode adjusted on the wrong initiator). The next two
sections discuss how to check for failover after installation and how to perform ongoing periodic
verification.
Initial host failover testing
After the host environment has been successfully configured for failover (including the installation of
failover software such as EMC PowerPath, HBAs, and so forth), the next and most important step is
testing. While the environment is in the deployment stage, induce a failure. For example, pull a fibre cable
that connects the host HBA to the storage system or switch, and ensure that the application LUNs are failed
over to the alternate path by the host path management software. For failover to work there must be active
I/O at the host level.
Following a successful failover, test the failback capability when the fault condition is cleared. In this
example, reconnect the fibre cable and verify that the host I/O fails back to the default path. For more
information about EMC PowerPath software failover and failback features, refer to the PowerPath-related
documentation on Powerlink.
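The failover and failback checks described above can be recorded and evaluated with a short script. The sketch below assumes you note the active path for each LUN at three points: before the fault, while the cable is pulled, and after it is reconnected. The function name, the path labels, and the dictionary layout are hypothetical; how you collect the active path depends on your path management software.

```python
def check_failover_test(baseline, during_fault, after_restore, failed_path):
    """Compare active paths per LUN across the three stages of a cable-pull test.

    Each of the first three arguments maps a LUN name to its active path name.
    Returns a list of problem descriptions; an empty list means the test passed.
    """
    problems = []
    for lun, default_path in baseline.items():
        fault_path = during_fault.get(lun)
        if fault_path is None:
            problems.append(f"{lun}: no active path during fault")
        elif fault_path == failed_path:
            problems.append(f"{lun}: still on failed path {failed_path}")
        if after_restore.get(lun) != default_path:
            problems.append(f"{lun}: did not fail back to {default_path}")
    return problems


baseline = {"LUN 0": "path A", "LUN 1": "path A"}
during = {"LUN 0": "path B", "LUN 1": "path B"}
print(check_failover_test(baseline, during, baseline, failed_path="path A"))
```

Remember that the observations are only meaningful if there was active I/O on the host while the fault was induced.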
After physically validating failover, the HAVT utility should also be run to ensure that there are no other
HA issues in the environment. This is discussed in the following section.
Ongoing high availability verification
After manually testing failover, each server's high availability should be verified regularly to ensure that
nothing has changed in the HA configuration. HA verification should also be performed before a software
update is performed on the CLARiiON storage system. Because an update of the FLARE OE software or
the installation of a new software enabler reboots each SP in turn, it is important to ensure that each host
that is to remain online during the update can ride through this reboot while maintaining access to the data
on the system. Maintaining access during an update means that, at a minimum, each server is zoned to each
SP and PowerPath SE is installed with the proper failovermode settings applied.
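The minimum conditions listed above can be expressed as a simple pre-update check. This is a sketch under stated assumptions: the host record is a hypothetical dictionary you populate yourself, the storage system is assumed to have two SPs labeled "SP A" and "SP B", and the expected failovermode comes from the tables in Appendix A. It does not replace HAVT.

```python
from typing import Dict, List


def ride_through_issues(host: Dict) -> List[str]:
    """List reasons a host might lose access while each SP reboots in turn."""
    issues = []
    zoned = set(host.get("zoned_sps", []))
    for sp in ("SP A", "SP B"):  # assumed SP labels
        if sp not in zoned:
            issues.append(f"not zoned to {sp}")
    if not host.get("path_software_installed", False):
        issues.append("no path management software installed")
    if host.get("failovermode") != host.get("expected_failovermode"):
        issues.append("failovermode does not match the expected setting")
    return issues


host = {
    "zoned_sps": ["SP A", "SP B"],
    "path_software_installed": True,
    "failovermode": 1,
    "expected_failovermode": 1,
}
print(ride_through_issues(host))
```

An empty list means none of the three minimum conditions is violated for that host record.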
As described in the Prerequisite for a highly available host environment section, HAVT is a tool that is
used when upgrading arrays, and is available in the GUI and the text-based versions of Navisphere Server
Utility (starting with release 24), and with the Navisphere Service Taskbar Software Assistant. HAVT
allows you to validate CLARiiON attached hosts for high availability. Select the Verify Server High
Availability option as shown in Figure 1, and indicate whether this check is part of a Software Update, in
which case the result of the report is sent to the storage system so that the software update process can
validate the servers will ride through the update; or whether it is a host attach validation (regular health
check). HAVT displays results that show whether the server meets HA requirements and allows you to
view the Navisphere Server High Availability Report. In this scenario the important tab to note is the
Issues tab as shown in Figure 3.
Figure 3. Navisphere Server Utility High Availability verification issues report
Issues are generated based on a series of checks performed by the HAVT utility. These checks include
looking for redundant HBAs, ensuring path management software is installed, and validating that the
proper initiator settings (such as failovermode) are set on the storage system. As of release 24, HAVT
supports the following operating systems (refer to the EMC Navisphere Host Agent/CLI and Utilities
Release Notes on Powerlink for the latest support information):
Solaris 8, 9
HP-UX 11.0, 11.11, 11.23 IA 64, 11.23 PA RISC
Windows 2000, 2003 (Fibre Channel and iSCSI attaches)
AIX 5.2, 5.3
Red Hat Enterprise 3 and 4 (Fibre Channel and iSCSI attaches)
SuSE Enterprise Server 8 and 9 (Fibre Channel and iSCSI attaches)
AsianUX (2.0) (Fibre Channel and iSCSI attaches)
HAVT supports the following failover software:
PowerPath
VERITAS DMP
HP-UX PVLinks
The Issues summary lists all the critical errors and warnings discovered in the host configuration with
regard to high availability, and provides corrective actions for each error and warning. The report also
includes a Details tab. This tab provides further information on:
Server Status: Includes the version of the failover software, and information about the FC HBAs,
iSCSI HBAs, and NICs, as well as details on the devices (similar to the information displayed by a
powermt display dev=all command issued from the PowerPath command line).
Initiators: Includes information about HBA and NIC configuration details, including driver and
registry settings, iSCSI host iqn, persistent targets, established sessions, and (CHAP/mutual
CHAP) security information for iSCSI initiators.
Data Connection Report: Gives information about the failover mode, arraycommpath, initiator
type, and registration for each HBA connected to a device.
Software, Services and System Updates: Lists server-specific installed software features, including
EMC-specific software and OS patches.
HAVT may be run in the following scenarios:
As part of the Prepare For Installation step of the Software Assistant, a process that aids customers
and service with upgrading the storage system. In this case, run the HAVT utility and analyze each
host report to preempt potential data unavailability issues that could occur during the SP reboots as
part of the software install.
Any time you need to check for host-related issues.
Periodically. Using a script, you can run HAVT periodically to generate a report for analysis.
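A periodic run can be as simple as a wrapper that executes the report command and archives its output under a timestamped name, with the operating system's scheduler invoking the wrapper at the desired interval. The sketch below is generic: the paper does not give the HAVT command line, so the command is passed in by the caller, and the function name, directory, and file naming are hypothetical.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def run_and_archive(command, report_dir):
    """Run a report command once and save its output to a timestamped text file.

    `command` is the argument list for your HA verification tool; supply the
    invocation documented for your installation (none is assumed here).
    Returns the path of the saved report and the command's exit code.
    """
    report_dir = Path(report_dir)
    report_dir.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(command, capture_output=True, text=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    report_path = report_dir / f"ha-report-{stamp}.txt"
    report_path.write_text(result.stdout + result.stderr)
    return report_path, result.returncode
```

Keeping every report, rather than overwriting the last one, makes it easier to see when a configuration change first introduced an issue.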
HAVT and its resulting report are important tools to help avoid potential data unavailability issues caused
by improperly attached hosts. Improper server configurations (meaning HA was not properly
implemented) are the No. 1 issue identified in weekly analyses of data unavailable reports. HAVT has
been designed to help customers avoid these issues by providing a utility to validate the environment
during critical maintenance procedures, such as a new server attach, new storage system installation, or
CLARiiON software update.
Change control process
It is very likely that a live production environment is going to change due to business requirements and
other factors. To ensure that the production environment maintains its resilience and remains highly
available, it is important that you refer to the E-Lab Interoperability Navigator before making any changes.
This includes changing any of the following:
Storage system software: This is usually the process of upgrading the FLARE operating
environment on the CLARiiON storage system.
HBA firmware: HBA vendors' websites contain the latest information on HBA drivers, firmware,
fcode, and other software. Refer to the following URLs for more details on Emulex and QLogic HBAs:
Emulex drivers and installation docs with HBA settings at:
http://www.emulex.com/ts/docoem/emc/index.html
QLogic drivers and installation docs with HBA settings at:
http://www.qlogic.com/support/oem_emc.asp
Switch firmware
Operating system patches and hot fixes
Path management software: EMC PowerPath, HP PVLinks, VERITAS DMP, and so on
Conclusion
This white paper offers best practices for end-to-end high availability in a mission-critical and
business-critical production environment. The high availability design starts at the host, continues with
connectivity, and ends with the storage system. By adding clustering technology and combining it with
remote mirroring, customers can make their production environments highly available. However, a highly
available design comes with a higher price tag. It requires redundant hardware components to guard against
any possible failures. Building a highly available environment requires carefully identifying the application
environments that need a high level of protection, and then providing it through redundant host components
such as HBAs, failover software such as PowerPath, redundant switch fabrics, and RAID protection at the
storage-system level. HAVT is an invaluable tool for ensuring the ongoing health of your highly available
environment.
Appendix A: Failover mode settings
These settings are not applicable for CDL series.
The following tables include Initiator Type, Arraycommpath, and Failovermode settings for failover software on
CLARiiON-supported operating systems. They also indicate which failover software is supported on each
operating system. The following notes aid in appropriate use of these tables:
1. Initiator Type is referred to as systemtype if using NaviCLI rather than the failover wizard within
Navisphere Express.
2. The settings below are those set within the Navisphere Manager failover wizard, or, in the case of the
UnitSerialNumber variable, the Group Edit option in the Connectivity Status dialog box.
3. Parentheses identify the NaviCLI equivalent value.
Table 5. AIX failover mode settings
Parameter          PowerPath                         DMP (AIX 5.1 and 5.2 only)
Initiator Type     CLARiiON Open (3)                 CLARiiON Open (3)
Arraycommpath      Enabled (1) or Disabled (0) [1]   Enabled (1)
Failovermode       3 or 1 [2]                        2
UnitSerialNumber   Array                             Array
[1] AIX settings depend on CLARiiON software being used:
    If using ODM definitions, arraycommpath should be set to Enabled (1).
    If using CLArrayS3 software, arraycommpath should be set to Disabled (0).
[2] Set failovermode to 3 for PowerPath 4.5.1 or later. NDU primus case emc67186 has the complete NDU
    requirements.
Table 6. HP-UX failover mode settings
Parameter                           PVLinks                           PowerPath                     DMP (HP-UX 11i only)          No path management software
Initiator Type (Access Logix)       HP Auto Trespass (2) [1]          HP No Auto Trespass (hex A)   HP No Auto Trespass (hex A)   HP No Auto Trespass (hex A)
Initiator Type (non-Access Logix)   HP Auto Trespass (2) [1]          decimal 10 (hex A)            N/A                           HP No Auto Trespass (hex A)
Arraycommpath                       Enabled (1) or Disabled (0) [2]   Enabled (1)                   Enabled (1)                   0 (Disabled) or 1 (Enabled) [2]
Failovermode                        0                                 1                             2                             0
UnitSerialNumber                    LUN or Array [3]                  LUN or Array [3]              LUN or Array [3]              LUN or Array [3]
[1] HP PVLinks requires that AutoTrespass be set for all LUNs. To set AutoTrespass, edit the Navisphere
    Host Agent configuration file (agent.conf) by commenting out OptionsSupported Autotrespass and
    restarting the Host Agent.
[2] For HP-UX running PVLinks, arraycommpath can be Enabled (1) or Disabled (0). Either will work.
[3] For HP-UX 11i v1.0, UnitSerialNumber may have been changed to LUN if problems were experienced
    with the device display in the HP-UX SAM utility.
Table 7. Linux failover mode settings
Parameter          PowerPath           DMP                 MPIO
Initiator Type     CLARiiON Open (3)   CLARiiON Open (3)   CLARiiON Open (3)
Arraycommpath      Enabled (1)         Enabled (1)         Enabled (1)
Failovermode       1                   2                   1
UnitSerialNumber   Array               Array               Array
Table 8. NetWare failover mode settings [1]
Parameter          PowerPath
Initiator Type     CLARiiON Open (3)
Arraycommpath      Enabled (1)
Failovermode       1
UnitSerialNumber   Array
[1] Validate support for NetWare on the newer CLARiiON models via the E-Lab Interoperability Navigator
    (available on Powerlink, EMC's password-protected extranet for customers and partners). An RPQ may
    be required.
Table 9. Solaris failover mode settings
Parameter           PowerPath           DMP                 StorEdge Traffic Manager
Initiator Type      CLARiiON Open (3)   CLARiiON Open (3)   CLARiiON Open (3)
Arraycommpath       Enabled (1)         Enabled (1)         Enabled (1)
Failovermode        1                   2                   1
UnitSerialNumber*   Array or LUN        Array or LUN        Array or LUN
*Sun Solaris installations with PowerPath and DMP:
Solaris 2.6, 7, and 8: UnitSerialNumber should be set to LUN.
Solaris 9: Will work with UnitSerialNumber set to either Storage System or LUN.
Table 10. Tru64 failover mode settings
Parameter Native failover
Initiator Type Compaq/Tru64 (hex 1C)
Arraycommpath Enabled (1)
Failovermode 0
UnitSerialNumber Array
Table 11. VMware failover mode settings
Parameter Native
Initiator Type CLARiiON Open (3)
Arraycommpath Enabled (1)
Failovermode 1
UnitSerialNumber Array
Table 12. Windows failover mode settings
Parameter          PowerPath           DMP (Windows 2000 and Windows 2003 only)
Initiator Type     CLARiiON Open (3)   CLARiiON Open (3)
Arraycommpath      Enabled (1)         Enabled (1)
Failovermode       1                   1
UnitSerialNumber   Array               Array
When changing any one of the Arraycommpath, Failovermode, and Initiator Type settings, all settings
will be set to the storage-system default settings, so it is necessary to set not only the setting that is being
changed but also all initiator settings. This applies to both methods of changing these parameters (Group
Edit in Connectivity Status and the failover wizard).
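Because changing one setting resets the others to defaults, it can help to keep the complete set for each host type in one place and always apply it as a unit. The sketch below transcribes the NaviCLI equivalent values from Tables 7 and 12 into a lookup. The dictionary layout, key names, and function name are illustrative and are not part of any EMC interface.

```python
# Values transcribed from Tables 7 (Linux) and 12 (Windows) of this appendix.
# Initiator type 3 is the NaviCLI equivalent of "CLARiiON Open".
FAILOVER_SETTINGS = {
    ("Linux", "PowerPath"): {
        "initiator_type": 3, "arraycommpath": 1,
        "failovermode": 1, "unit_serial_number": "Array",
    },
    ("Linux", "DMP"): {
        "initiator_type": 3, "arraycommpath": 1,
        "failovermode": 2, "unit_serial_number": "Array",
    },
    ("Windows", "PowerPath"): {
        "initiator_type": 3, "arraycommpath": 1,
        "failovermode": 1, "unit_serial_number": "Array",
    },
    ("Windows", "DMP"): {
        "initiator_type": 3, "arraycommpath": 1,
        "failovermode": 1, "unit_serial_number": "Array",
    },
}


def full_initiator_settings(os_name, failover_software):
    """Return a copy of every initiator setting for the pair, never a partial set."""
    key = (os_name, failover_software)
    if key not in FAILOVER_SETTINGS:
        raise KeyError(f"no entry for {key}; consult the appendix tables")
    return dict(FAILOVER_SETTINGS[key])


print(full_initiator_settings("Linux", "DMP"))
```

Returning the whole record each time guards against reapplying only the one value that changed.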