
Veritas Cluster Server 6.0 for UNIX: Cluster Management

100-002685-E

COURSE DEVELOPERS

Bilge Gerrits, Steve Hoffer, Siobhan Seeger, Pete Toemmes

LEAD SUBJECT MATTER EXPERTS

Graeme Gofton, Sean Nockles, Brad Willer

TECHNICAL CONTRIBUTORS AND REVIEWERS

Geoff Bergren, Kelli Cameron, Tomer Gurantz, Anthony Herr, James Kenney, Gene Henriksen, Bob Lucas, Paul Johnston, Rod Pixley, Clifford Barcliff, Danny Yonkers, Antonio Antonucci, Satoko Saito, Feng Liu

Copyright 2012 Symantec Corporation. All rights reserved. Symantec, the Symantec Logo, and VERITAS are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners.

THIS PUBLICATION IS PROVIDED AS IS AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. SYMANTEC CORPORATION SHALL NOT BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES IN CONNECTION WITH THE FURNISHING, PERFORMANCE, OR USE OF THIS PUBLICATION. THE INFORMATION CONTAINED HEREIN IS SUBJECT TO CHANGE WITHOUT NOTICE.

No part of the contents of this book may be reproduced or transmitted in any form or by any means without the written permission of the publisher.

Veritas Cluster Server 6.0 for UNIX: Cluster Management

Symantec Corporation World Headquarters
350 Ellis Street
Mountain View, CA 94043
United States
http://www.symantec.com


Table of Contents

Course Introduction

Lesson 1: Service Group Dependencies
  Common application relationships
  Service group dependencies
  Service group dependency examples
  Configuring service group dependencies
  Alternative methods of controlling interactions

Lesson 2: Reconfiguring Cluster Membership
  Adding a system to a cluster
  Merging clusters
  Additional reconfiguration tasks

Lesson 3: Startup and Failover Policies
  Startup rules and policies
  Failover rules and policies
  Limits and Prerequisites
  Modeling startup and failover policies

Lesson 4: Alternate Network Configurations
  Multiple service groups with NIC resources
  Multiple public interfaces

Lesson 5: High Availability in the Enterprise
  Veritas Operations Manager
  Disaster recovery enhancements
  Virtualization support

Appendix A: Labs
  Lab 1: Service group dependencies
  Lab 2: Merging clusters
  Lab 3: Failover policies
  Lab 4: Creating a parallel network service group

Appendix B: Lab Solutions
  Lab 1: Service group dependencies
  Lab 2: Merging clusters
  Lab 3: Failover policies
  Lab 4: Creating a parallel network service group

Appendix C: Supplemental Content
  Service group dependencies: Failover process



Course Introduction


Veritas Cluster Server curriculum path

The Veritas Cluster Server for UNIX curriculum is a series of courses that are designed to provide a full range of expertise with Veritas Cluster Server (VCS) high availability solutions, from design through implementation.

Veritas Cluster Server for UNIX: Install and Configure

This course covers installation and configuration of common VCS environments, focusing on two-node clusters running application and database services.

Veritas Cluster Server for UNIX: Manage and Administer

This course focuses on multinode VCS clusters and advanced topics related to managing more complex cluster configurations. The course includes two participant guides:

Veritas Cluster Server for UNIX: Example Application Configurations

This guide provides examples of common service group configurations. The instructor may opt to present some or all of this material in class, depending on time constraints and student interest.

Veritas Cluster Server for UNIX: Cluster Management

This guide provides detailed information about configuring and managing more complex clusters.

Veritas Cluster Server eLearning Library

The eLearning Library is available with bundled training options and includes content on advanced high availability and disaster recovery features.


Lab design for the course

The diagram shows a conceptual view of the cluster design used as an example throughout this course and implemented in hands-on lab exercises.

Each aspect of the cluster configuration is described in greater detail where applicable in course lessons.

The cluster consists of:
• Four nodes
• Several high availability services
• Fibre connections to SAN shared storage from each node through a switch
• Two Ethernet interfaces for the cluster interconnect
• Ethernet connections to the public network

Additional complexity is added to the design to illustrate certain aspects of cluster configuration in later lessons.



Course overview

This training provides comprehensive instruction on the deployment of advanced features of Veritas Cluster Server (VCS). The course focuses on multinode VCS clusters and advanced topics related to more complex cluster configurations, such as service group dependencies and workload management.


Lesson 1

Service Group Dependencies


Common application relationships

Multitier services

In most high availability environments, a collection of applications works together to provide a service. The relationships between these application service groups must be managed by VCS to ensure that startup, shutdown, and failover procedures are coordinated properly.

The example in the slide shows a typical three-tier model where:
• A Web server provides an interface to network clients.
• The Web server relies on an application that processes client requests.
• The application processes requests from the Web server and relies on the database to manage information.

The inherent relationships require that the database is started and running first, then the application, and finally the Web server.

In a cluster environment, you must also consider which systems should run which applications, as shown in the following examples.



Example 1: Online on the same system

In this example of a two-tier service, both the Web server and the application must be online on the same system. For example, the services may need to use interprocess communication (IPC) mechanisms to exchange data.

Furthermore, the application must come online first, followed by the Web server. If the application faults and fails over to another cluster node, the Web server must then be brought online on that same failover target system.


Example 2: Online on different systems

In this example of a two-tier service, both the database and the application must be online, but they cannot run on the same system. For example, the combined resource requirements of each application may exceed the capacity of the systems, and you want to ensure that they run on separate systems.

In this scenario, the database must come online first, and then the application is started only after the database is running. If the application faults and fails over to another cluster node, the database can stay online on the original system, as long as the application is not restarted on that same system.



Example 3: Offline on the same system

One example relationship is where you have a test version of an application and want to ensure that it does not interfere with the production version. You want to give the production application precedence over the test version for all operations, including manual offline, online, switch, and failover.

The difference between this example and the previous example is that, in this case, neither application requires the other to be online. The only requirement is that the other application cannot be online on a system when one application is brought online there.


Service group dependencies

VCS provides service group dependencies to manage application relationship requirements in multi-application environments.

Service group dependency definitions

There are four basic criteria for defining how services interact when using service group dependencies.
• A service group can require another group to be online or offline in order to start and run.
• You can determine the startup order for service groups by designating one group the child and another a parent. Parent groups depend on child groups. If service group B requires service group A to be online in order to start, then B is the parent and A is the child.
• You can specify where the groups must be online or offline.
  - For all online dependencies, the child group must be online in order for the parent to start. A location of local, global, or remote determines where the parent can come online relative to where the child is online.
  - For offline local, there is no requirement for either service group to start first. The child group must be offline for the parent to come online.
• Failover behavior of linked service groups is specified by designating the relationship soft, firm, or hard.

Detailed information about how these dependencies impact service group operations is provided in the Veritas Cluster Server Administrator's Guide.



Failover types

The dependency type determines failover behavior for linked service groups.

Soft dependency

A soft dependency means that VCS does not immediately take the parent offline if the child group faults.

When both groups are online, the child group can be taken offline while the parent is online. Likewise, the parent group can be taken offline while the child is online. The parent remains online if the child group faults and cannot fail over. The child remains online if the parent group faults.

Firm dependency

A firm dependency means that VCS imposes additional constraints on the parent and child service groups. Specifically:
• The parent group must be taken offline when the child group faults. When the child is brought online on another system, the parent group is brought online on a system determined by the location type (local, global, remote).
• If the parent group faults, the child continues to run.

Hard dependency

A hard dependency means that VCS takes the other service group offline when either the child or parent group faults.
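The three failover types can be compared side by side with a small model. The following Python sketch is only an illustration of the rules stated above, not VCS code; the function name and the returned strings are hypothetical, and it ignores timing (for example, a soft dependency only says the parent is not taken offline immediately).

```python
def reaction_to_fault(dep_type: str, faulted: str) -> str:
    """Describe what happens to the other group in a parent/child link.

    dep_type: "soft", "firm", or "hard"
    faulted:  "child" or "parent" (the group that faulted)
    All names here are illustrative; this is not a VCS interface.
    """
    if dep_type not in ("soft", "firm", "hard"):
        raise ValueError(f"unknown dependency type: {dep_type}")
    if faulted not in ("child", "parent"):
        raise ValueError(f"unknown group role: {faulted}")

    if faulted == "child":
        # Soft: the parent is not immediately taken offline.
        # Firm and hard: the parent is taken offline.
        if dep_type == "soft":
            return "parent stays online"
        return "parent taken offline"

    # The parent faulted: only a hard dependency also takes the child offline.
    if dep_type == "hard":
        return "child taken offline"
    return "child stays online"


if __name__ == "__main__":
    for dep in ("soft", "firm", "hard"):
        for who in ("child", "parent"):
            print(f"{dep:4} / {who:6} faults -> {reaction_to_fault(dep, who)}")
```

Running the script prints a six-row table that can be checked against the slide for each failover type.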


Service group dependency examples

Online local example

In an online local dependency, a child service group must be online on the same system before the parent service group can come online on that system.

Online local soft

An online local soft dependency designates that the parent service group remains online when the child service group faults. If the child service group fails over and is brought online on another system, the parent service group is then migrated to that same system.

Online local firm

An online local firm dependency is similar to soft, except that the parent service group is taken offline when the child faults. If the child fails over and is restarted on another system, the parent is then also started on that system.

Online local hard

In an online local hard dependency, the child service group is taken offline if the parent service group faults. Online local is the only dependency supporting the hard type.



Online remote example

In an online remote dependency, a child service group must be online on a remote system before the parent service group can come online on the local system.

Online remote soft

An online remote soft dependency designates that the parent service group remains online when the child service group faults, as long as the child service group chooses another system on which to fail over. If the child service group chooses to fail over to the system where the parent was online, the parent service group is migrated to any other available system.

Online remote firm

An online remote firm dependency is similar to soft, except that the parent service group is taken offline when the child faults.


Offline local example

In an offline local dependency, the parent service group can be started only if the child service group is offline on the local system. Similarly, the child can be started only if the parent is offline on the local system. This prevents conflicting applications from running on the same system. This is commonly used when you have a test version of an application running on the failover target system for the production version of the application.
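The location rules in the online local, online remote, and offline local examples can be summarized as a single placement check. The Python sketch below is a simplified model under stated assumptions: the child is a failover group that is online on at most one system, the function name is hypothetical, and the online global case (not illustrated in this lesson) is assumed to mean that the child may be online on any system in the cluster.

```python
from typing import Optional

# Category/location pairs named in this lesson.
VALID_DEPENDENCIES = {
    ("online", "local"),
    ("online", "global"),
    ("online", "remote"),
    ("offline", "local"),
}


def parent_can_start(category: str, location: str,
                     child_system: Optional[str], target: str) -> bool:
    """Return True if the parent may come online on `target`.

    child_system is the system where the child is online, or None if the
    child is offline everywhere. Illustrative model only, not a VCS API.
    """
    if (category, location) not in VALID_DEPENDENCIES:
        raise ValueError(f"unsupported dependency: {category} {location}")

    if category == "offline":
        # Offline local: the child must not be online on the target system.
        return child_system != target

    # Every online dependency requires the child to be online first.
    if child_system is None:
        return False
    if location == "local":
        return child_system == target   # same system as the child
    if location == "remote":
        return child_system != target   # any system except the child's
    return True                         # global (assumed): child online anywhere


if __name__ == "__main__":
    print(parent_can_start("online", "local", "s1", "s1"))    # same system
    print(parent_can_start("online", "remote", "s1", "s1"))   # must differ
    print(parent_can_start("offline", "local", "s2", "s1"))   # child elsewhere
```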



Configuring service group dependencies

Service group dependency rules

You can use service group dependencies to implement parent/child relationships between applications. Before using service group dependencies to implement the relationships between multiple application services, review the rules governing these dependencies:
• The child service group always has priority over the parent group.
• Service groups can have multiple parent service groups. This means that an application service can have multiple other application services depending on it.
• A service group can have multiple child service groups. This means that an application service can be dependent on one or more application services.
• A group dependency tree can be no more than five levels deep.
• Service groups cannot have cyclical dependencies.
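The last two rules lend themselves to a quick check on paper designs before any groups are linked. The Python sketch below is a planning aid under stated assumptions, not something VCS provides: the function name and group names are hypothetical, and "levels" is interpreted as the number of groups along the longest parent-to-child chain.

```python
from typing import Dict, List, Set

MAX_LEVELS = 5  # limit taken from the rules above


def check_dependency_tree(children: Dict[str, List[str]]) -> int:
    """Validate a planned set of parent -> child links.

    children maps each parent group to the child groups it depends on.
    Returns the number of levels in the deepest chain. Raises ValueError
    for a cyclical dependency or a chain deeper than MAX_LEVELS.
    """
    depth_cache: Dict[str, int] = {}
    in_progress: Set[str] = set()

    def depth(group: str) -> int:
        if group in depth_cache:
            return depth_cache[group]
        if group in in_progress:
            raise ValueError(f"cyclical dependency involving {group}")
        in_progress.add(group)
        deepest_child = max(
            (depth(child) for child in children.get(group, [])), default=0
        )
        in_progress.discard(group)
        depth_cache[group] = 1 + deepest_child
        return depth_cache[group]

    groups = set(children) | {c for cs in children.values() for c in cs}
    levels = max((depth(g) for g in groups), default=0)
    if levels > MAX_LEVELS:
        raise ValueError(f"dependency tree is {levels} levels deep (max {MAX_LEVELS})")
    return levels


if __name__ == "__main__":
    # Three-tier service: the Web group depends on the application group,
    # which depends on the database group (hypothetical group names).
    three_tier = {"websg": ["appsg"], "appsg": ["dbsg"]}
    print(check_dependency_tree(three_tier))
```

A group with several parents or several children passes the check, which matches the rules; only cycles and overly deep chains are rejected.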


Creating service group dependencies

You can create service group dependencies from the command-line interface using the hagrp command, using the Veritas Operations Manager Web GUI, or using the Cluster Manager Java GUI. To create a dependency, link the groups and specify the relationship (dependency) type, indicating whether it is soft, firm, or hard.

If not specified, service group dependencies are firm by default.

To configure service group dependencies using the Cluster Manager Java GUI, you can either right-click the parent service group and select Link to display the Link Service Groups view that is shown on the slide, or you can use the Service Group View.

Linking constraints

To link a parent and child group with a soft dependency, it is not required that the child group be online if the parent is online. However, if the child group is also online, the parent and child may not be linked in such a way that their online states conflict with the type of link between the parent and child.

To link a parent and child group with a firm dependency, the parent group must be offline, or the parent and child group must be online in such a way that their online states do not conflict with the type of link between the parent and child.

Removing service group dependencies

When removing a dependency, you do not need to specify the type of dependency, because only one dependency is allowed between two service groups.
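As a rough illustration of the command-line method, the Python sketch below assembles the argument lists for linking and unlinking two groups without executing anything. It assumes the commonly documented forms `hagrp -link parent child category location [type]` and `hagrp -unlink parent child`; verify the exact syntax against the hagrp manual page for your release. The helper names and the group names are hypothetical, and the only restriction enforced is the one stated in this lesson (hard is supported only with online local).

```python
from typing import List, Optional

VALID_DEPENDENCIES = {
    ("online", "local"),
    ("online", "global"),
    ("online", "remote"),
    ("offline", "local"),
}


def link_command(parent: str, child: str, category: str, location: str,
                 dep_type: Optional[str] = None) -> List[str]:
    """Build (but do not run) an hagrp -link argument list.

    When dep_type is omitted, no type is passed, so the dependency
    would be firm by default as described in the lesson.
    """
    if (category, location) not in VALID_DEPENDENCIES:
        raise ValueError(f"unsupported dependency: {category} {location}")
    if dep_type is not None:
        if dep_type not in ("soft", "firm", "hard"):
            raise ValueError(f"unknown dependency type: {dep_type}")
        if dep_type == "hard" and (category, location) != ("online", "local"):
            raise ValueError("hard is supported only with online local")
    argv = ["hagrp", "-link", parent, child, category, location]
    if dep_type is not None:
        argv.append(dep_type)
    return argv


def unlink_command(parent: str, child: str) -> List[str]:
    """Build an hagrp -unlink argument list; no dependency type is needed."""
    return ["hagrp", "-unlink", parent, child]


if __name__ == "__main__":
    print(" ".join(link_command("websg", "appsg", "online", "local", "firm")))
    print(" ".join(unlink_command("websg", "appsg")))
```

The cluster configuration must be open for writing before either command is run on a real cluster.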



Modeling service group dependencies

You can use the Simulator on Windows to model how different types of links affect service group behavior. This enables you to fully understand the implications and the effects of different dependency configurations before you configure links in your production environment.


Online and offline of linked service groups

You can use the -propagate option to the hagrp command to simplify bringing linked service groups online and offline. You can also combine -propagate with the -any and -sys options for specifying where the service groups are brought online.

Note that propagation is supported only with local service group dependencies.



Alternative methods of controlling interactions

Using triggers to control service group interactions

VCS provides several event triggers that can be used to enforce service group relationships where dependencies cannot be configured, as described in the previous example. These triggers are useful for managing relationships between service groups:
• VCS runs the preonline script before bringing a service group online. The PreOnline trigger must be enabled for each applicable service group by setting the PreOnline service group attribute. For example, to enable the preonline trigger for GroupB, type:
    hagrp -modify GroupB PreOnline 1
  The preonline script must start the service group with the -nopre option to hagrp.
• The postonline script is run after a service group is brought online.
• The postoffline script is run after a service group is taken offline.

The postonline and postoffline triggers are enabled automatically if the script is present in the $VCS_HOME/bin/triggers directory. Be sure to copy triggers to all systems in the cluster. When present, these triggers apply to all service groups.

Consider implementing triggers only after investigating whether VCS native facilities can be used to configure the desired behavior. Triggers add complexity, requiring programming skills as opposed to simply configuring VCS objects and attributes.


Managing multitier applications

One of the key tenets of a multitier architecture is the operational independence of each layer: each tier can make its own decisions without relying on an external brain. This applies to high availability as well. Each tier needs to have HA of its own; it should be able to independently handle the resilience of the application component running in that tier.



Virtual Business Services

Virtual Business Services provide continuous high availability and reduce the frequency and duration of service disruptions for multi-tier business applications running on heterogeneous operating systems and virtualization technologies.

A Virtual Business Service represents the multi-tier application as a single consolidated entity and builds on the high availability and disaster recovery provided for the individual tiers by Symantec products such as Veritas Cluster Server and Symantec ApplicationHA.

VBS is configured and managed using Veritas Operations Manager (VOM) version 4.1 and later.

In the example shown in the slide, the database cluster is the lowest level tier and is the most critical component in this service. Typically, you would find the database to be shared across multiple services as well. If something happens in the database tier, for example, an active server fails or the database runs into software issues on one of the servers, VCS ensures the database is failed over to another node in the cluster. Additionally, disaster recovery may be configured so that if the production site goes down, VCS fails the database over to the DR site.

The middle tier is an application cluster running on several servers in parallel in order to meet user demands. VCS provides local high availability in this tier and manages the dependencies on the database.

Finally, the Web server top tier is running on VMware virtual machines. Symantec ApplicationHA is managing high availability for the Apache Web servers running on Windows.


Fault propagation between clusters

When a service group faults on a cluster that is a member of a Virtual Business Service, the fault is propagated to other clusters in a manner similar to service group dependencies.

A restart type dependency is specific to a VBS configuration. In the example configuration shown in the slide, if oracle_sg faults, oracle_apps_sg ignores the fault and continues to run. When oracle_sg restarts and is in a running state, oracle_apps_sg restarts.

In other words, the parent service group with a restart dependency on a child service group takes no action when a child service group faults until the child is restarted.

This type of dependency is used when only start or stop ordering is required, and no fault policy action is needed.



Labs and solutions for this lesson are located on the following pages.
• Lab 1: Service group dependencies, page A-3.
• Lab 1: Service group dependencies, page B-3.


Lesson 2

Reconfiguring Cluster Membership


Adding a system to a cluster

Adding a system to a running VCS cluster

The objective of this task is to add a new system to a running VCS cluster with no or minimal impact on application services. Ensure that the cluster configuration is modified so that the application services can make use of the new system in the cluster.

Assumptions

For illustration purposes, these assumptions are used in describing how to perform this task:
• The VCS cluster consists of two or more systems, all of which are up and running.
• There are multiple service groups configured in the cluster. All of the service groups are online somewhere in the cluster.
• The new system to be added to the cluster does not have any VCS software.
• The new system has the same version of operating system and Veritas Storage Foundation as the systems in the cluster.
• The new system may not have all the required application software.
• The storage devices can be connected to all systems.



Procedure for adding a system to a running cluster



Merging clusters

Merging two running VCS clusters

The objective of this task is to merge two running VCS clusters with no or minimal impact on application services. Also, ensure that the cluster configuration is modified so that the application services can make use of the systems from both clusters.

Assumptions

The following is a list of assumptions that you need to take into account while planning a procedure for this task:
• All the systems in both clusters are up and running.
• There are multiple service groups configured in both clusters. All service groups are online somewhere in the cluster.
• All the systems have the same version of operating system and Veritas Storage Foundation.
• The clusters do not necessarily have the same application services software.
• New application software can be installed on the systems to support application services of the other cluster.
• The storage devices can be connected to all systems.
• The cluster interconnects of both clusters are isolated before the merger.

For this example, assume that two two-node clusters are merged into a single four-node cluster.


Procedure for merging two running clusters



Additional reconfiguration tasks

Reconfiguring triggers and service groups

The objective of this task is to update the cluster configuration so that the application services can make use of the systems from both clusters.

Assumptions

The following is a list of assumptions that you need to take into account while planning a procedure for this task:
• Application resources have been prepared.
• Service group definitions have been merged.
• Agents have been installed.


Procedure for updating triggers and service groups


Labs and solutions for this lesson are located on the following pages.
• Lab 2: Merging clusters, page A-13.
• Lab 2: Merging clusters, page B-37.


Lesson 3

Startup and Failover Policies


Startup rules and policies

Rules for automatic service group startup

The following conditions must be satisfied for a service group to be automatically started:
• The service group AutoStart attribute must be set to the default value of 1. If this attribute is changed to 0, VCS leaves the service group offline and waits for an administrative command to be issued to bring the service group online.
  In addition, the service group definition must have at least one system in its AutoStartList attribute. Also, the service group cannot be frozen.
• All of the systems in the service group's SystemList must be in the running state so that the service group can be probed on all systems in SystemList.
  If there are systems on which the service group can run that have not joined the cluster yet, VCS autodisables the service group until it is probed on all the systems.
• Any dependencies on child service groups must be met before a service group can be automatically started.



Startup system selection

The startup system for a service group is chosen as follows:
1. All systems included in the AutoStartList attribute are initial candidates for service group startup.
2. Systems are culled from this initial list according to these criteria:
   • Frozen systems are eliminated.
   • Systems where the service group has a faulted status are eliminated.
   • Systems that do not meet the service group requirements are eliminated, as described in detail later in the lesson.
3. The target system is then chosen from this list based on the startup policy defined for the service group.
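Steps 1 and 2 of this selection amount to filtering a list. The Python sketch below is a minimal illustration of that culling, not VCS logic: the function name and parameters are hypothetical, and the "service group requirements" check is reduced to a list of systems supplied by the caller because those requirements are covered later in the lesson.

```python
from typing import Iterable, List, Set


def startup_candidates(auto_start_list: List[str],
                       frozen: Iterable[str] = (),
                       faulted_on: Iterable[str] = (),
                       requirements_unmet: Iterable[str] = ()) -> List[str]:
    """Return the systems that remain eligible for service group startup.

    auto_start_list     initial candidates (step 1), in their defined order
    frozen              systems that are frozen
    faulted_on          systems where the group has a faulted status
    requirements_unmet  systems that do not meet the group's requirements
    """
    excluded: Set[str] = set(frozen) | set(faulted_on) | set(requirements_unmet)
    # Keep the AutoStartList order so that an order-based policy can still
    # pick the first remaining system (step 3).
    return [system for system in auto_start_list if system not in excluded]


if __name__ == "__main__":
    remaining = startup_candidates(["s1", "s2", "s3"],
                                   frozen=["s1"], faulted_on=["s3"])
    print(remaining)
```

Preserving the original order matters because the startup policy applied in step 3 may depend on it.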


Automatic startup policies

You can set the AutoStartPolicy attribute of a service group to one of these three values:
• Order: Systems are chosen in the order in which they are defined in the AutoStartList attribute. This is the default policy for every service group.
• Priority: Of the systems listed in the AutoStartList attribute, the system with the lowest priority number in SystemList is selected.
• Load: The system with the highest available capacity is selected.

The autostart policies are described in more detail in the following pages.

To configure the AutoStartPolicy attribute of a service group, execute:
    hagrp -modify group AutoStartPolicy policy

where possible values for policy are Order, Priority, and Load. You can also set this attribute using Veritas Operations Manager or the Cluster Manager Java GUI.

Note: The configuration must be open to change service group attributes.



AutoStartPolicy=Order

When the AutoStartPolicy attribute of a service group is set to the default value of Order, the first system available in AutoStartList is selected to bring the service group online. The priority numbers in SystemList are ignored.

In the example shown on the slide, the A service group is brought online on s1, although it is the system with the highest priority number in SystemList. Similarly, the B service group is brought online on s2, and the C service group is brought online on s3 because these are the first systems listed in the AutoStartList attributes of the corresponding service groups.

Note: Because Order is the default value for the AutoStartPolicy attribute, it is not listed in the service group definitions in the main.cf file.


AutoStartPolicy=Priority

When the AutoStartPolicy attribute of a service group is set to Priority, the system with the lowest priority number in the SystemList that also appears in the AutoStartList is selected as the target system during startup. In this case, the order of systems in the AutoStartList is ignored.

The same example service groups are now modified to use the Priority AutoStartPolicy, as shown on the slide. In this example, the A service group is brought online on s3, which has the lowest priority number in SystemList even though it is listed as the last system in AutoStartList. Similarly, the B service group is brought online on s1 (with priority number 0), and the C service group is brought online on s1 (with priority number 1).

Notice how the startup systems have changed for the service groups by changing the AutoStartPolicy attribute, although the SystemList and AutoStartList attributes are the same for these two examples.
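
The difference between the Order and Priority policies can be sketched in a few lines of Python. This is only an illustrative model, not VCS code: the function name pick_autostart_system and the priority numbers are assumptions chosen to match the description of the A service group (s1 first in AutoStartList, s3 with the lowest priority number).

```python
def pick_autostart_system(system_list, auto_start_list, policy="Order"):
    """Return the startup target for one service group, or None.

    system_list: dict of system name -> priority number (SystemList).
    auto_start_list: list of system names (AutoStartList).
    """
    # Only systems present in both attributes are candidates in this model.
    candidates = [name for name in auto_start_list if name in system_list]
    if not candidates:
        return None
    if policy == "Order":
        # First system in AutoStartList; priority numbers are ignored.
        return candidates[0]
    if policy == "Priority":
        # Lowest priority number; AutoStartList order is ignored.
        return min(candidates, key=lambda name: system_list[name])
    raise ValueError(f"unsupported policy: {policy}")


# Hypothetical values for the A service group.
system_list = {"s1": 2, "s2": 1, "s3": 0}
auto_start_list = ["s1", "s2", "s3"]

print(pick_autostart_system(system_list, auto_start_list, "Order"))     # s1
print(pick_autostart_system(system_list, auto_start_list, "Priority"))  # s3
```

The same two attributes produce different startup targets, which is the point made in the preceding paragraph.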

AutoStartPolicy=Load

When AutoStartPolicy is set to Load, VCS determines the target system based on the existing workload of each system listed in the AutoStartList attribute and the load that is added by the service group.

These attributes control load-based startup:
- Capacity is a user-defined system attribute that contains a value representing the total amount of load that the system can handle.
- Load is a user-defined service group attribute that defines the amount of capacity required to run the service group.
- AvailableCapacity is a system attribute maintained by VCS that quantifies the remaining available system load.

In the example displayed on the slide, three servers have Capacity set to 300, 200, and 100, respectively. The s2 system is selected as the target system for starting C because it has the highest AvailableCapacity value of 200 after A and B are started on s1.

Determining Load and Capacity

You must determine a value for Load for each service group. This value is based on how much of the system capacity is required to run the application service that is managed by the service group.


When a service group is brought online, the value of its Load attribute is subtracted from the system Capacity value, and AvailableCapacity is updated to reflect the difference. Both the Capacity attribute of a system and the Load attribute of a service group are static user-defined attributes based on your design criteria.
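
The arithmetic behind load-based startup can be modeled as follows. This is a minimal sketch, not the VCS implementation. The Capacity values come from the slide example, but the Load values for A, B, and C are hypothetical, and the tie-breaking rule (first candidate wins) is an assumption.

```python
def available_capacity(capacity, loads_online):
    """AvailableCapacity = Capacity minus the Load of every online group."""
    return capacity - sum(loads_online)


def pick_by_load(capacity, load, online, candidates):
    """Return (target, available) where target has the highest AvailableCapacity."""
    available = {
        name: available_capacity(capacity[name],
                                 [load[g] for g in online.get(name, [])])
        for name in candidates
    }
    # max() keeps the first candidate on a tie (an assumption of this sketch).
    target = max(candidates, key=lambda name: available[name])
    return target, available


capacity = {"s1": 300, "s2": 200, "s3": 100}   # from the slide example
load = {"A": 75, "B": 50, "C": 100}            # hypothetical Load values
online = {"s1": ["A", "B"]}                    # A and B already started on s1

target, available = pick_by_load(capacity, load, online, ["s1", "s2", "s3"])
print(target, available)  # s2 {'s1': 175, 's2': 200, 's3': 100}
```

With these assumed loads, s2 has the highest AvailableCapacity (200), so C starts there, matching the slide example.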

Failover rules and policies

Rules for automatic service group failover

The following conditions must be satisfied for a service group to be automatically failed over after a fault:
- The service group must contain a critical resource, and that resource must fault or be a parent of a faulted resource.
- The service group AutoFailOver attribute must be set to the default value of 1. If this attribute is changed to 0, VCS leaves the service group offline and waits for an administrative command to be issued to bring the service group online.
- The ManageFaults attribute must be set to All, the default setting.
- The service group cannot be frozen.
- At least one of the systems in the service group's SystemList attribute must be in a running state.
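
The rules above combine as a simple conjunction. The following sketch only restates them as a Python predicate; the GroupState class, its field names, and the "RUNNING" state string are assumptions for illustration, not VCS data structures.

```python
from dataclasses import dataclass


@dataclass
class GroupState:
    # True when a critical resource faulted or is a parent of a faulted resource.
    critical_fault: bool
    auto_failover: int = 1        # AutoFailOver, default 1
    manage_faults: str = "All"    # ManageFaults, default All
    frozen: bool = False


def will_fail_over(group, system_states):
    """system_states: dict of system name -> state for systems in SystemList."""
    return (
        group.critical_fault
        and group.auto_failover == 1
        and group.manage_faults == "All"
        and not group.frozen
        and any(state == "RUNNING" for state in system_states.values())
    )


states = {"s1": "FAULTED", "s2": "RUNNING"}
print(will_fail_over(GroupState(critical_fault=True), states))                # True
print(will_fail_over(GroupState(critical_fault=True, frozen=True), states))   # False
```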


Failover system selection

The failover system for the service group is chosen as follows:
1 A subset of systems listed in the SystemList attribute is created first.
2 Systems that cannot run the service group are eliminated from the list, including:
  - Frozen systems
  - Systems where the service group has a faulted status
  - Systems that do not meet the service group requirements, as described in detail later in the lesson
3 The target system is chosen from this list based on the failover policy defined for the service group.

Failover policies

VCS supports a variety of policies that determine how a system is selected when service groups must migrate due to faults. The policy is configured by setting the FailOverPolicy attribute to one of these values:
- Priority: The system with the lowest priority number is preferred for failover (default).
- RoundRobin: The system with the least number of active service groups is selected for failover.
- Load: The system with the highest value of the AvailableCapacity system attribute is selected for failover.

The policies are described in more detail in the following pages.


FailOverPolicy=Priority

When FailOverPolicy is set to Priority, VCS selects the system with the lowest assigned value from the SystemList attribute.

For example, the C service group has three systems configured in the SystemList attribute and the same order for AutoStartList values:
    SystemList = {s3=0, s1=1, s2=2}
    AutoStartList = {s3, s1, s2}

The C service group is initially started on s3 because it is the first system in AutoStartList. If C faults on s3, VCS selects s1 as the failover target because it has the lowest priority value for the remaining available systems.

Priority policy is the default behavior and is ideal for simple two-node clusters or small clusters with few service groups.
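
The Priority failover example for the C service group can be reproduced with a short model. This is a sketch under stated assumptions: pick_failover_priority is a hypothetical helper, and the excluded set stands in for systems that are frozen, faulted for this group, or otherwise ineligible.

```python
def pick_failover_priority(system_list, excluded=()):
    """Return the eligible system with the lowest priority number, or None.

    system_list: dict of system name -> priority number (SystemList).
    excluded: systems removed from consideration before the policy applies.
    """
    candidates = {name: prio for name, prio in system_list.items()
                  if name not in excluded}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


system_list = {"s3": 0, "s1": 1, "s2": 2}   # SystemList of the C service group

# C faulted on s3, so s3 is no longer a candidate.
print(pick_failover_priority(system_list, excluded={"s3"}))  # s1
```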

FailOverPolicy=RoundRobin

The RoundRobin policy selects the system running the fewest service groups as the failover target.

The RoundRobin policy is ideal for large clusters running many service groups with essentially the same server load characteristics (for example, similar databases or applications).

Take into account these properties of the RoundRobin policy:
- Only systems listed in the SystemList attribute for the service group are considered when VCS selects a failover target for all failover policies, including RoundRobin.
- A service group that is in the process of being brought online is not considered an active service group until it is completely online.
- Ties are determined by the order of systems in the SystemList attribute. For example, if two failover target systems have the same number of service groups running, the system listed first in the SystemList attribute is selected for failover.
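
The RoundRobin selection and its tie-breaking rule can be illustrated as follows. The function name and the group counts are hypothetical; the sketch relies on Python's min() returning the first minimal element, which mirrors the rule that the system listed first in SystemList wins a tie.

```python
def pick_round_robin(system_list_order, online_group_counts, excluded=()):
    """Return the eligible system running the fewest fully online groups.

    system_list_order: system names in SystemList order.
    online_group_counts: dict of system name -> number of online groups
        (groups still coming online are not counted).
    """
    candidates = [name for name in system_list_order if name not in excluded]
    if not candidates:
        return None
    # min() returns the first minimum, so SystemList order breaks ties.
    return min(candidates, key=lambda name: online_group_counts.get(name, 0))


order = ["s1", "s2", "s3"]
counts = {"s1": 2, "s2": 1, "s3": 1}   # hypothetical counts

print(pick_round_robin(order, counts))                   # s2
print(pick_round_robin(order, counts, excluded={"s2"}))  # s3
```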


FailOverPolicy=Load

When FailOverPolicy is set to Load, VCS determines the target system based on the existing workload of each system listed in the SystemList attribute and the load that is added by the service group.

These attributes control load-based failover:
- Capacity is a system attribute that contains a value representing the total amount of load that the system can handle.
- Load is a service group attribute that defines the amount of capacity required to run the service group.
- AvailableCapacity is a system attribute maintained by VCS that quantifies the remaining available system load.

In the example displayed in the slide, the three servers that remain running after the s3 system fails have Capacity set to 300, 200, and 100. Each service group has a fixed load defined by the user, which is subtracted from the system capacity to find the AvailableCapacity value of a system.

When failover occurs, VCS checks the value of AvailableCapacity on each potential target (each system in the SystemList attribute for the service group) and starts the service group on the system with the highest value.

Note: In the event that no system has a high enough AvailableCapacity value for a service group load, the service group still fails over to the system with the highest value for AvailableCapacity, even if the resulting AvailableCapacity value is zero or a negative number.
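
The soft-limit behavior described in the note can be seen in a small model. This is an illustrative sketch only: fail_over_by_load is a hypothetical helper, and the AvailableCapacity and Load numbers are invented for the example.

```python
def fail_over_by_load(available, group_load, candidates):
    """Pick the candidate with the highest AvailableCapacity.

    Returns (target, remaining), where remaining is the target's
    AvailableCapacity after the group's Load is subtracted. No check
    prevents the result from being zero or negative.
    """
    target = max(candidates, key=lambda name: available[name])
    remaining = available[target] - group_load
    return target, remaining


available = {"s1": 50, "s2": 20}   # hypothetical AvailableCapacity values
target, remaining = fail_over_by_load(available, group_load=75,
                                      candidates=["s1", "s2"])
print(target, remaining)  # s1 -25
```

Neither system can absorb a Load of 75, yet the group still moves to s1, leaving a negative AvailableCapacity.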

Configuring Load and Capacity

You can use VOM, the Cluster Manager Java GUI, or the command-line interface to set the Capacity system attribute and the Load service group attribute.

To set Capacity from the command-line interface, use the hasys -modify command as shown in this example:
    hasys -modify s1 Capacity 300

To set Load from the CLI, use the hagrp -modify command as shown in this example:
    hagrp -modify dbsg Load 75


The loadwarning trigger

You can configure the loadwarning trigger to provide notification that a system has sustained a predetermined load level for a specified period of time.

To configure the loadwarning trigger:
1 Create a loadwarning script in the /opt/VRTSvcs/bin/triggers directory. You can copy the sample trigger script from /opt/VRTSvcs/bin/sample_triggers as a starting point, and then modify it according to your requirements.
2 Set the loadwarning attributes for the system:
  - Capacity: Load capacity for the system
  - LoadWarningLevel: The level at which load has reached a critical limit, expressed as a percentage of the Capacity attribute. Default is 80 percent.
  - LoadTimeThreshold: Length of time, in seconds, that a system must remain at, or above, LoadWarningLevel before the trigger is run. Default is 600 seconds.
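
How the three attributes interact can be sketched as a threshold check over load samples. This is not how VCS is implemented; the function name, the sampling model, and the choice to fire exactly when the elapsed time reaches LoadTimeThreshold are assumptions made for illustration.

```python
def loadwarning_fires_at(samples, capacity, warning_level=80, time_threshold=600):
    """Return the sample time at which the trigger would run, or None.

    samples: (time_in_seconds, system_load) pairs in time order.
    warning_level: LoadWarningLevel, a percentage of Capacity (default 80).
    time_threshold: LoadTimeThreshold in seconds (default 600).
    """
    limit = capacity * warning_level / 100
    above_since = None
    for t, load in samples:
        if load >= limit:
            if above_since is None:
                above_since = t          # start of the sustained period
            if t - above_since >= time_threshold:
                return t
        else:
            above_since = None           # load dropped; restart the clock
    return None


# Capacity 300 gives a warning limit of 240 with the default 80 percent.
sustained = [(0, 250), (300, 260), (600, 245)]
interrupted = [(0, 250), (300, 200), (600, 250), (900, 250)]

print(loadwarning_fires_at(sustained, capacity=300))    # 600
print(loadwarning_fires_at(interrupted, capacity=300))  # None
```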

Limits and Prerequisites

Startup example

VCS enables you to define the available resources on each system and the corresponding requirements for these resources for each service group. Shared memory, semaphores, and the number of processors or application instances are all examples of resources that can be defined on a system.

Note: The resources that you define are arbitrary; they do not need to correspond to physical or software resources. You then define the corresponding prerequisites for a service group to come online on a system.

In a multinode, multiapplication services environment, VCS keeps track of the available resources on a system by subtracting the resources already in use by service groups online on each system from the maximum capacity for that resource. When a new service group is brought online, VCS checks these available resources against service group prerequisites; the service group cannot be brought online on a system that does not have enough available resources to support the application services.

Limits

The Limits system attribute is used to define the resources and the corresponding capacity of each system for that resource. You can use any keyword for a resource as long as you use the same keyword on all systems and service groups.


The example values displayed in the slide are set as follows:
- On the first system, the Limits attribute setting in main.cf is:
    Limits = { DBs=2 }
- On the second and third systems, the Limits attribute setting in main.cf is:
    Limits = { DBs=1 }
- On the fourth system, the Limits attribute setting in main.cf is:
    Limits = { DBs=0 }

The value of DBs in this example is used to control how many Oracle service groups can run on a system.

Service group Prerequisites

Prerequisites is a service group attribute that defines the set of resources needed to run the service group. These values correspond to the Limits system attribute and are set by the Prerequisites service group attribute. This main.cf configuration corresponds to the E service group in the diagram:
    Prerequisites = { DBs=1 }

CurrentLimits

CurrentLimits is an attribute maintained by VCS that contains the value of the remaining available resources for a system. For example, if the limit for DBs is 2 and the A service group is online with a DBs prerequisite of 1, the CurrentLimits setting for DBs is 1:
    CurrentLimits = { DBs=1 }

Selecting a target system

Prerequisites are used to determine a subset of eligible systems on which a service group can be started during failover or startup. When a list of eligible systems is created, HAD then follows the configured policy for autostart or failover.

Note: A value of 0 is assumed for systems that do not have some or all of the resources defined in their Limits attribute. Similarly, a value of 0 is assumed for service groups that do not have some or all of the resources defined in their Prerequisites attribute.
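
The relationship between Limits, Prerequisites, and CurrentLimits can be modeled with plain dictionaries. This is a minimal sketch, not VCS code; the helper names are hypothetical, and missing keys are treated as 0 as stated in the note above.

```python
def current_limits(limits, online_prereqs):
    """CurrentLimits = Limits minus the Prerequisites of every online group."""
    remaining = dict(limits)
    for prereqs in online_prereqs:
        for key, amount in prereqs.items():
            remaining[key] = remaining.get(key, 0) - amount
    return remaining


def meets_prerequisites(current, prereqs):
    """True when every prerequisite fits within the remaining limit.

    A resource missing from the system's limits counts as 0.
    """
    return all(current.get(key, 0) >= amount for key, amount in prereqs.items())


# A system with Limits = { DBs=2 } and one group online needing DBs=1.
current = current_limits({"DBs": 2}, [{"DBs": 1}])
print(current)                                       # {'DBs': 1}
print(meets_prerequisites(current, {"DBs": 1}))      # True
print(meets_prerequisites({"DBs": 0}, {"DBs": 1}))   # False
print(meets_prerequisites({}, {"DBs": 1}))           # False
```

Systems that pass this check form the eligible subset; the configured autostart or failover policy then chooses among them.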

Failover example

The configuration in the slide shows how a failover target system is selected for the C service group when the s3 system faults. Because C has a prerequisite of 1 for DBs and there are no systems with a high enough CurrentLimits value for DBs to support running the group, C stays offline.

System Limits are hard values, meaning that if a system does not meet the requirements specified in the Prerequisites attribute for a service group, the service group cannot be started on that system.

Contrast this with Load and Capacity, which are soft limits. If the AvailableCapacity value is not high enough to satisfy the Load requirement for a service group, VCS still fails over the service group, even if the AvailableCapacity becomes a negative value.

Therefore, using Limits and Prerequisites may be a better method for controlling service group startup and failover in cases where hard limits must be enforced. An example case is an environment where licensing restrictions prevent two instances of an application or database from running on a system.


Configuring Limits and Prerequisites

You can use the VCS GUI or command-line interface to set the Limits system attribute and the Prerequisites service group attribute.

To set Limits from the command-line interface, use the hasys -modify command as shown in the following example:
    hasys -modify s2 Limits DBs 2

To set Prerequisites from the CLI, use the hagrp -modify command as shown in this example:
    hagrp -modify DBSG Prerequisites DBs 1

Notes:
- To be able to set these attributes, open the VCS configuration to enable read/write mode and ensure that the service groups that are already online on a system do not violate the restrictions.
- The order in which the resources are defined within the Limits or Prerequisites attributes is not important.

Modeling startup and failover policies

Using the Simulator

The VCS Simulator for Windows is a good tool for modeling the behavior that you require before making changes to the running configuration. This enables you to fully understand the implications and the effects of different workload management configurations.


Labs and solutions for this lesson are located on the following pages. Lab 3: Failover policies, page A-25. Lab 3: Failover policies, page B-65.


Lesson 4

Alternate Network Configurations


Multiple service groups with NIC resources

NIC resources in multiple service groups

Many clusters have multiple service groups, each containing NIC resources that monitor the same physical network interface. In this case, VCS monitors the same network interface (say, e1000g0 on Solaris) many times, creating unnecessary overhead and network traffic.

In addition to the overhead of many monitor cycles for the same resource, a disadvantage of this configuration is the effect of changes in NIC hardware. If you must change the network interface (for example, in the event the interface fails), you must change the Device attribute for each NIC resource monitoring that interface.

Using Proxy resources

You can use a Proxy resource to allow multiple service groups to monitor the same network interfaces. This reduces the network traffic that results from having multiple NIC resources in different service groups monitor the same interface.

The Proxy resource mirrors the status of another resource in a different service group. The required attribute, TargetResName, is the name of the resource whose status is reflected by the Proxy resource.

TargetSysName is an optional attribute that specifies the name of the system on which the target resource status is monitored. If no system is specified, the local system is used as the target system.

Parallel network service groups

Parallel service groups run on multiple systems simultaneously. A network service group can be configured as a parallel service group because the NIC resource is persistent and can be online on multiple systems. With this configuration, all the other service groups are configured with a Proxy to the NIC resource on their local system.

A parallel service group is managed like any other service group in VCS, except that it cannot be switched. The group is only started on a system listed in the AutoStartList and the SystemList attributes. A parallel service group can also fail over, in effect, if the service group faults on a system and there is an available system (listed in the SystemList attribute) that is not already running the service group. This is accomplished by VCS starting up the parallel group on the target system.

To create a new parallel service group in a running cluster, set the Parallel attribute to 1 (true) and then add resources.

You cannot change an existing failover service group that contains resources to a parallel service group except by using the offline configuration procedure to edit the main.cf file and add Parallel = 1 to the service group definition.

Using Phantom resources

The example network service group also requires a Phantom resource in order to ensure that the service group status is displayed properly. A service group shows an online status only when all of its nonpersistent resources are online. Therefore, if a service group has only persistent resources, VCS considers the group offline, even if the persistent resources are running properly. When a Phantom resource is added, the status of the service group is shown as online.
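
The status rule that makes the Phantom resource necessary can be expressed as a short model. This is a sketch only: the resource names are hypothetical, and treating the Phantom resource as a nonpersistent resource that always reports online is an assumption used to reproduce the behavior described above.

```python
def group_is_online(resources):
    """resources: list of (name, persistent, online) tuples.

    The group is online only if it has at least one nonpersistent
    resource and all of its nonpersistent resources are online.
    """
    nonpersistent = [online for _name, persistent, online in resources
                     if not persistent]
    return bool(nonpersistent) and all(nonpersistent)


# A network service group with only a persistent NIC resource.
network_group = [("netnic", True, True)]
print(group_is_online(network_group))   # False

# Adding a Phantom resource (modeled as nonpersistent and online).
network_group.append(("netphantom", False, True))
print(group_is_online(network_group))   # True
```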

Localizing a NIC resource

An attribute whose value applies to all systems is global in scope. An attribute whose value applies on a per-system basis is local in scope. By default, all attributes are global. Some attributes can be localized to enable you to specify different values for different systems.

In the example displayed in the slide, the Device attribute for the NIC resource is localized to enable you to specify a different interface for each system.

After creating the resource, you can localize attribute values using the hares command, a GUI, or an offline configuration method. For example, when using the CLI, type:
    hares -local netnic Device
    hares -modify netnic Device eth0 -sys s1
    hares -modify netnic Device eth4 -sys s2

Any attribute can be localized. Network-related resources are common examples for local attributes.
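
The difference between global and local scope can be pictured with a small data-structure sketch. It does not describe how hares or HAD store attributes; the Attribute class, its methods, and the choice to seed each local value from the former global value are all assumptions for illustration.

```python
class Attribute:
    """Toy model of an attribute that is either global or per-system."""

    def __init__(self, value=None):
        self.global_value = value
        self.local = None            # None means the attribute is global

    def localize(self, systems):
        # Assumption: each system starts from the former global value.
        self.local = {name: self.global_value for name in systems}

    def modify(self, value, sys=None):
        if self.local is None:
            self.global_value = value
        elif sys is None:
            raise ValueError("a localized attribute needs a system name")
        else:
            self.local[sys] = value

    def value_on(self, sys):
        return self.global_value if self.local is None else self.local[sys]


device = Attribute("eth0")           # global: same value on every system
device.localize(["s1", "s2"])        # compare: hares -local netnic Device
device.modify("eth0", sys="s1")      # compare: hares -modify ... -sys s1
device.modify("eth4", sys="s2")      # compare: hares -modify ... -sys s2

print(device.value_on("s1"), device.value_on("s2"))  # eth0 eth4
```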

Multiple public interfaces

Trunked or bonded interfaces

Most UNIX operating systems have the capability to create a logical network interface that represents a collection of physical network interfaces. This is often referred to as port trunking or interface bonding.

For example, on Linux, you can use the bond driver to treat multiple network interfaces as one logical interface. In this case, you can use the VCS NIC agent to monitor a bond-type interface and an IP resource to bring up a virtual IP address on the virtual bond interface.


The following table shows an example configuration where eth0 and eth1 are controlled by the bond0 interface.

ifcfg-eth0        ifcfg-eth1        ifcfg-bond0
DEVICE=eth0       DEVICE=eth1       DEVICE=bond0
ONBOOT=yes        ONBOOT=yes        ONBOOT=yes
BOOTPROTO=none    BOOTPROTO=none    USERCTL=no
USERCTL=no        USERCTL=no        TYPE=Ethernet
PEERDNS=no        PEERDNS=no        MTU=""
TYPE=Ethernet     TYPE=Ethernet     NETMASK=255.255.0.0
MASTER=bond0      MASTER=bond0      BOOTPROTO=none
SLAVE=yes         SLAVE=yes         IPADDR=10.10.2.20
                                    BROADCAST=10.10.255.255
                                    GATEWAY=10.10.2.1
                                    NETWORK=10.10.2.0

The corresponding sample /etc/modules.conf file contains:
    alias eth0 e1000
    alias eth1 e1000
    alias bond0 bonding
    options bond0 miimon=100 mode=2

The following VCS resource definitions are contained in a service group to monitor the bond0 interface and manage the virtual IP address.
    IP webip (
        Device = bond0
        Address = "10.10.10.21"
        )

    NIC bond0nic (
        Device = bond0
        )

An advantage of using the Linux bond interface is that you can aggregate interfaces to increase bandwidth.

Local network interface failover

With the availability of inexpensive network adapters, it is common to have many network interfaces on each system. By allocating more than one network interface to a service group, you can potentially avoid failover of the entire service group if the interface fails. By moving the IP address on the failed interface to another interface on the local system, you can eliminate or minimize downtime.

VCS provides this type of local failover with the MultiNICB and IPMultiNICB resources for the Solaris, AIX, and HP-UX platforms. All platforms also support legacy resource types, MultiNICA and IPMultiNIC, for local interface failover.

Advantages of local interface failover

Local interface failover can drastically reduce service interruptions to the clients. Some applications have time-consuming shutdown and startup processes that result in substantial downtime when the application fails over from one system to another.

Failover between local interfaces can be completely transparent to users for some applications.

Using multiple networks also makes it possible to prevent switch or hub failures from causing service group failover, as long as the multiple interfaces on the system are connected to separate hubs or switches.

Network resources overview

The MultiNICA agent is capable of monitoring multiple network interfaces, and if one of these interfaces faults, VCS fails over the IP address defined by the IPMultiNIC resource to the next available public network adapter.

The IPMultiNIC and MultiNICA resources provide essentially the same service as the IP and NIC resources, but these resources monitor multiple interfaces instead of a single interface. The dependency between these resources is the same as the dependency between IP and NIC resources.

The MultiNICB and IPMultiNICB agents provide similar functionality to the MultiNICA and IPMultiNIC agents with many additional features, such as:
- Support for faster failover
- Support for active/active interfaces
- Support for failback

On Solaris only, these additional features are provided:
- Support for the Solaris IPMP daemon
- Support for trunked network interfaces on Solaris

See the Veritas Cluster Server Bundled Agents Reference Guide for your platform for complete details about configuring these resource types.

Comparing MultiNICA and MultiNICB

Advantages of using MultiNICA and IPMultiNIC
- Physical interfaces can be plumbed as needed by the agent, supporting an active/passive configuration.
- MultiNICA requires only one base IP address for the set of interfaces under its control. This address can also be used as the administrative IP address for the system.
- MultiNICA does not require all interfaces to be part of a single IP subnet.

Advantages of using MultiNICB and IPMultiNICB
- All interfaces under a particular MultiNICB resource are always configured and have test IP addresses to speed failover.
- MultiNICB failover is many times faster than that of MultiNICA.
- Support for single and multiple interfaces eliminates the need for separate pairs of NIC and IP, or MultiNICA and IPMultiNIC, for these interfaces.
- MultiNICB and IPMultiNICB support failback of IP addresses.
- MultiNICB and IPMultiNICB support manual movement of IP addresses between working interfaces under the same MultiNICB resource without changing the VCS configuration or disabling resources.
- On Solaris, MultiNICB and IPMultiNICB support IPMP, interface groups, and trunked ge and qfe interfaces.


Labs and solutions for this lesson are located on the following pages. Lab 4: Creating a parallel network service group, page A-37. Lab 4: Creating a parallel network service group, page B-89.


Lesson 5

High Availability in the Enterprise


Veritas Operations Manager

Veritas Operations Manager provides a single, centralized management console for the Veritas Storage Foundation and High Availability products. You can use it to monitor, visualize, and manage storage and cluster resources, and generate reports. Veritas Operations Manager enables administrators to centrally manage diverse data center environments.

You can also use Veritas Operations Manager to manage hosts that do not have Storage Foundation and High Availability products installed.

Simplified cluster management

Veritas Operations Manager (VOM) enables you to perform administrative tasks and analysis for all clusters in an environment, across physical locations, virtual environments, and operating system platforms.

You can also use Veritas Operations Manager to visualize the state of multi-tier applications and all of their subcomponents, and to start or stop the entire logical application in an ordered fashion. Virtual Business Services provides this capability of associating service groups across clusters and managing the relationships in a high availability and disaster recovery environment.

VOM functional overview

VOM collects, stores, consolidates, and presents state information for all Storage Foundation HA resources in a variety of views.

VOM uses a Web browser-based user interface to present information in a layered form, starting with the concise dashboard view shown in the slide. The dashboard view, presented upon login, contains highly summarized information about all managed storage, server, and application resources at a glance.

Graphics and color coding are both used to draw attention to resources that require attention because they are either faulted (not functioning) or are at risk (configured to be fault-tolerant, but having one or more failed components that could result in faulting if a further component failure occurs).


Distributed commands in a VOM domain

VOM provides centralized cluster visualization, monitoring, and control of the entire distributed environment. The components of a VOM domain (the scope of a Storage Foundation High Availability management environment) include:
- Management servers: stand-alone peer servers or clusters
- One or more Web consoles
- Managed hosts: VCS 4.x or 5.x nodes running a connector agent

VOM increases administrator efficiencies and reduces configuration errors by enabling centralized deployment of configuration changes, such as by applying the same change to many clusters with one command.

In the example shown in the slide, a new user account, OraOper, is given service group operator privileges for all clusters containing Oracle service groups.

License deployment policies and reporting

When using keyless licensing, all cluster nodes must become managed hosts in a VOM environment. The licenses can then be tracked using the license deployment reporting feature in VOM.

A 60-day grace period is allowed to set up the VOM environment and configure SFHA systems as managed hosts. After 60 days, warning messages are issued every four hours until the system is configured as a managed host.

Note: Although you can install VCS without Storage Foundation and select keyless licensing, the warning mechanism is only disabled when you install Storage Foundation and add the host to the VOM domain.

You can use VOM license deployment policies and reports to ensure all systems in the data center meet licensing requirements.


VOM solutions

Veritas Operations Manager solutions are independent and optional feature packs that you can download and use to enhance the functionality of Veritas Operations Manager. These solutions are grouped into the following categories:
- Add-on
- Package
- Patch
- Hotfix

A core group of add-ons is bundled in the VOM solutions repository. Other add-ons must be downloaded from sort.symantec.com and then uploaded to your VOM repository. Any solutions available in the repository can be installed on the Management Server, on managed hosts, or on both. To deploy solutions, you must have domain administrative privileges.

VOM Deployment Management enables you to install a solution on selected hosts, or a selected platform (for example AIX). When you run the installation process, a deployment request is sent. You can view that deployment request in the Deployment Requests page.

After a solution is installed, you must enable the solution to use the management tools within the add-on.

Disaster recovery enhancements

Testing disaster recoveryDisaster recovery fire drill is a feature provided with the Global Cluster Option for VCS that enables you to anticipate and address configuration problems at the DR site prior to encountering an actual disaster.

A fire drill is implemented by configuring a clone service group with special-purpose resources. These fire drill service groups are modified to use copies of the primary sites data so that the applications can be started and tested without affecting the production service groups.

When you bring a fire drill service group online, the resources:1 Create a snapshot of the replicated volume.2 Mount the snapshot file system.3 Start the application or database.

VCS logs any errors to enable you to identify configuration problems in your DR site, as shown in the following examples. Starting with VCS 5.0, EMC SRDF and Hitachi Truecopy agents are enhanced to take advantage of device tagging in Volume Manager. This enables hardware snapshots to be imported and used for a fire drill capability without any scripting or tasks required to be used outside of VCS.

Note: The percentage in the slide is based on the Symantec Disaster Recovery Study published 22 November 2010.


Successful fire drill example

The slide shows example VCS Engine log file entries for a successful fire drill test of an Oracle service group named oragrp_fd. In this case, all resources came online, and the database is running using data files on the snapshot volumes.

You can configure the VCS Management Console to run regularly scheduled fire drills to ensure you are continuously validating your DR environment.


DR configuration problem example

One common DR configuration problem is storage configuration drift: an object is changed at the primary site and that change is not propagated to the secondary site.

In this example, a database administrator adds a volume to a disk group on the primary site, and then creates a new tablespace. This DBA is not aware that the volume needs to be added to the replication configuration, so the tablespace is not replicated to the secondary site.
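
This kind of drift amounts to a set difference: volumes that exist in the disk group but are missing from the replication configuration. The following Python sketch illustrates the idea under stated assumptions. The volume names and the `unreplicated_volumes` helper are hypothetical, and the sketch does not query Volume Manager or any replication product.

```python
def unreplicated_volumes(disk_group_volumes, replicated_volumes):
    """Return volumes present in the disk group but absent from replication."""
    return sorted(set(disk_group_volumes) - set(replicated_volumes))

# Illustrative data only: oradata2vol was added to the disk group,
# but nobody added it to the replication configuration.
primary_volumes = ["oradata1vol", "oradata2vol", "oralogvol"]
replicated = ["oradata1vol", "oralogvol"]

if __name__ == "__main__":
    for vol in unreplicated_volumes(primary_volumes, replicated):
        print(f"Not replicated: {vol}")
```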


Failed fire drill example

This slide shows VCS engine log entries for a fire drill that failed because data created on the primary site was not replicated to the secondary site.

The log includes the Oracle startup output resulting from bringing the Oracle (Ora-fd3_fd) resource online on the secondary site. The Oracle error output identifies a problem with the /oradata2/tb2.dbf file, which is not present at the secondary site.

Other examples of configuration drift that cause problems for site migration include expired licenses, operating system and application patch version mismatches, and configuration files that are not synchronized.
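
Patch and configuration file drift can be detected by comparing an inventory of each site item by item. The following Python sketch shows one way to do that; it is an illustration, not a Symantec tool. The inventory keys, the patch labels, and the `find_drift` helper are all made up for this example, and configuration files are compared by SHA-256 checksum.

```python
import hashlib

def checksum(text: str) -> str:
    """SHA-256 digest of a configuration file's contents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def find_drift(primary: dict, secondary: dict) -> list:
    """Return the sorted keys whose values differ or are missing at one site."""
    keys = set(primary) | set(secondary)
    return sorted(k for k in keys if primary.get(k) != secondary.get(k))

# Illustrative inventories; every key and value here is hypothetical.
primary_site = {
    "os_patch": "patch-A",
    "app_patch": "patch-7",
    "listener.ora": checksum("PORT=1521\n"),
}
secondary_site = {
    "os_patch": "patch-A",
    "app_patch": "patch-6",
    "listener.ora": checksum("PORT=1522\n"),
}

if __name__ == "__main__":
    for item in find_drift(primary_site, secondary_site):
        print(f"Drift detected: {item}")
```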


Managing replication

You can use VOM to simplify configuration of storage and replication objects used in replicated data clusters (RDCs) and global clusters with both 4.x- and 5.x-based systems.

To generate reports about the details of data transfer between source and target replication hosts, download the Add-on for Veritas Volume Replicator Bandwidth Reporting from sort.symantec.com.

After you upload the add-on to the VOM repository, you can deploy the solution to the management server and all hosts with replication solutions.


Site-wide notification and migration

VOM simplifies management of disaster recovery, providing policy-based monitoring of critical events, with automatic notification of administrators when faults occur.

The Management Console also provides valuable diagnostic tools, such as uptime analysis, configuration analysis, agent inventory, and additional predefined reports that can be scheduled and saved.

Perhaps most important in a DR environment, VOM enables you to perform an entire site migration and recovery with a single action.


Disaster Recovery Advisor

Veritas Disaster Recovery Advisor (DRA) monitors high availability and disaster recovery configurations to ensure data center recoverability. By scanning systems across the data center to ensure that existing HA/DR plans are applied seamlessly, Disaster Recovery Advisor helps to limit the risk of infrastructure and application downtime.

DRA scans storage, servers, databases, clusters, and replication infrastructures using a knowledge base containing over 4,600 risk signatures. When gaps are discovered, the software alerts the system administrator so the issues can be resolved before business operations are impacted. Because it is an agentless solution that operates in read-only mode, Disaster Recovery Advisor is unobtrusive and non-disruptive to implement. Monitoring and management are simple, with status dashboards that provide detailed insight into the data center's environment.
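
Signature-based scanning can be pictured as a list of read-only checks applied to a collected configuration. The following Python sketch illustrates that general idea only. It does not reflect how DRA is implemented; the `Signature` type, the two example signatures, and the configuration keys are all hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple

class Signature(NamedTuple):
    name: str
    is_risk: Callable[[Dict], bool]  # returns True when the gap is present

# Two made-up signatures; a real knowledge base would hold thousands.
SIGNATURES = [
    Signature(
        "volume not replicated",
        lambda cfg: bool(set(cfg["volumes"]) - set(cfg["replicated"])),
    ),
    Signature(
        "cluster nodes run different OS patch levels",
        lambda cfg: len(set(cfg["node_patch_levels"].values())) > 1,
    ),
]

def scan(config: Dict, signatures: List[Signature]) -> List[str]:
    """Read-only scan: return the names of signatures that match the config."""
    return [s.name for s in signatures if s.is_risk(config)]

# Illustrative configuration snapshot.
sample = {
    "volumes": ["vol1", "vol2"],
    "replicated": ["vol1"],
    "node_patch_levels": {"node1": "p5", "node2": "p5"},
}

if __name__ == "__main__":
    for finding in scan(sample, SIGNATURES):
        print(f"Risk found: {finding}")
```

The scan never modifies the configuration it inspects, which corresponds to the read-only behavior described above.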


Virtualization support

Server virtualization approaches

Most customers are using more than one virtualization methodology within their IT infrastructure. Each major UNIX operating system vendor has a virtualization technology, each with unique terms such as control domain, host, guest, and virtual I/O. The slide illustrates four virtualization approaches and examples of platforms that use each type of virtualization.

Hardware partitioning subdivides a server, allocating resources to partitions, each running an operating system. Sun Domains and HP nPartitions are example solutions using this technology.

A bare metal hypervisor runs directly on the server hardware and enables you to create and maintain virtual machines. VMware ESX Server is an example of this technology.

A hosted hypervisor runs within a host operating system that is installed on the server. I/O flows from the virtual machines, through the hypervisor, to the host operating system drivers, and then to the server hardware. Linux KVM and Solaris LDOM are examples using this technology.

Finally, operating system container technology allows for the partitioning of the OS into containers to control access to the applications running within the containers.


Symantec solutions for virtualization

The slide shows how Storage Foundation and VCS work in each of the virtualization environments.

Hardware partitioning: Storage Foundation and VCS run within partitions, and partitions are generally not migrated between physical machines.

Bare metal hypervisor: Storage Foundation and VCS run within the virtual machines. The VMs may be migrated to different physical servers by virtualization HA technology.

Hosted hypervisor: Storage Foundation and VCS can run on both the host operating system and virtual machines. VCS c