Date posted: 26-Mar-2015
Author: samir-ahmed
Thank you.
We request that you please turn off pagers and cell phones during class.
VERITAS Cluster Server for Solaris
Lesson 1: VCS Terms and Concepts
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-3
Overview
Course topics: Introduction; Terms and Concepts; Installing VCS; Managing Cluster Services; Using Cluster Manager; Service Group Basics; Preparing Resources; Resources and Agents; NFS Resources; Cluster Communication; Faults and Failovers; Using Volume Manager; Installing Applications; Event Notification; Troubleshooting. This lesson covers Terms and Concepts.
Objectives
After completing this lesson, you will be able to:
• Define VCS terminology.
• Describe cluster communication basics.
• Describe VERITAS Cluster Server architecture.
Clusters
(Diagram: systems on a local area network, connected through fibre switches to SCSI JBODs.)
• Several networked systems
• Shared storage
• Single administrative entity
• Peer monitoring
Systems
• Members of a cluster, referred to as nodes
• Contain copies of:
  – Communication protocol configuration files
  – VCS configuration files
  – VCS libraries and directories
  – VCS scripts and daemons
• Share a single dynamic cluster configuration
• Provide application services
Service Groups
• A service group is a related collection of resources.
• Resources in a service group must be available to the system.
• Resources and service groups have interdependencies.
(Diagram: an NFS service group containing NFS, IP, Share, Mount, Disk, and NIC resources.)
Service Group Types
Failover:
• Can be partially or fully online on only one server at a time
• VCS controls stopping and restarting the service group when components fail
Parallel:
• Can be partially or fully online on multiple servers simultaneously
• Examples: Oracle Parallel Server; Web and FTP servers
Resources
• VCS objects that correspond to hardware or software components
• Monitored and controlled by VCS
• Classified by type
• Identified by unique names and attributes
• Can depend on other resources within the same service group
Resource Types
• General description of the attributes of a resource
• Example Mount resource type attributes: MountPoint, BlockDevice
• Other example resource types: Disk, Share, IP, NIC
Agents
• Processes that control resources
• One agent per resource type; each agent controls all resources of that type.
• Agents can be added to the VCS agent framework.
(Diagram: Disk, IP, NIC, and Mount agents managing resources such as c1t0d0s0 and c1t0d1s0, 10.1.2.4, hme0 and qfe1, and /data.)
Dependencies
• Resources can depend on other resources; parent resources depend on child resources.
• Service groups can depend on other service groups.
• Resource types can depend on other resource types.
• Rules govern service group and resource dependencies.
• No cyclic dependencies are allowed.
(Diagram: a Mount resource (parent) depending on a Disk resource (child).)
Private Network
• Minimum of two communication channels with separate infrastructure:
  – Multiple NICs (not just ports)
  – Separate hubs, if used
• Heartbeat communication determines which systems are members of the cluster.
• Cluster configuration broadcasts update cluster systems with the status of each resource and service group.
Low Latency Transport (LLT)
• Provides fast, kernel-to-kernel communications
• Is connection oriented
• Is not routable
• Uses the Data Link Provider Interface (DLPI) over Ethernet
(Diagram: LLT running in the kernel on SystemA and SystemB, communicating over the private network.)
Group Membership Services/Atomic Broadcast (GAB)
• Manages cluster membership
• Maintains cluster state
• Uses broadcasts
• Runs in the kernel over Low Latency Transport (LLT)
(Diagram: GAB layered above LLT in the kernel on SystemA and SystemB.)
VCS Engine (had)
• Maintains configuration and state information for all cluster resources
• Uses GAB to communicate among cluster systems
• Is monitored by the hashadow process
(Diagram: had and hashadow running on each system, above GAB and LLT in the kernel.)
VCS Architecture
(Diagram: on each system, agents — Disk, NIC, IP, Mount — manage resources such as c1t0d0s0, hme0, 10.1.2.4, and /v; had and hashadow run above GAB and LLT in the kernel; the shared cluster configuration is held in memory across SystemA and SystemB.)
Summary
You should now be able to:
• Define VCS terminology.
• Describe cluster communication basics.
• Describe VERITAS Cluster Server architecture.
VERITAS Cluster Server for Solaris
Lesson 2: Installing VERITAS Cluster Server
Overview
Course topics: Introduction; Terms and Concepts; Installing VCS; Managing Cluster Services; Using Cluster Manager; Service Group Basics; Preparing Resources; Resources and Agents; NFS Resources; Cluster Communication; Faults and Failovers; Using Volume Manager; Installing Applications; Event Notification; Troubleshooting. This lesson covers Installing VCS.
Objectives
After completing this lesson, you will be able to:
• Describe VCS software, hardware, and licensing prerequisites.
• Describe the general VCS hardware requirements.
• Configure SCSI controllers for a shared disk storage environment.
• Add VCS executable and manual page paths to the environment variables.
• Install VCS using the installation script.
Software and Hardware Requirements
Software:
• Solaris 2.6, 7, and 8 (32-bit and 64-bit)
• Recommended: Solaris patches; VERITAS Volume Manager (VxVM) 3.1.P1 or later; VERITAS File System (VxFS) 3.3.1 or later
Hardware:
• Check the latest VCS release notes.
• Contact VERITAS Support.
Licenses:
• Keys are required on a per-system or per-site basis.
• Contact VERITAS Sales for new licenses, or VERITAS Support for upgrades.
General Hardware Layout
(Diagram: SYSTEM A and SYSTEM B, each with an OS disk on SCSI controller 1 and shared data disks on SCSI controller 2, connected by private Ethernet heartbeat links and to the public network through multiple NICs.)
SCSI Controller Configuration
(Diagram: SYSTEM A and SYSTEM B share data disks at unique SCSI target IDs on a common bus; the systems use different scsi-initiator-id settings — for example, 7 on one system and 5 on the other — to avoid a conflict, and each system's OS disk sits at target 0 on its own SCSI controller.)
SCSI Controller Setup
• Use unique SCSI initiator IDs for each system.
• Check the scsi-initiator-id setting using the eeprom command.
• Change the scsi-initiator-id if needed.
• The controller ID can also be changed on a controller-by-controller basis.
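The check-and-change procedure can be sketched as follows. The value 5 is only an example; the commands assume a Sun/Solaris system where the OpenBoot PROM variables are accessible, and a reboot is required for the change to take effect:

```shell
# Display the current SCSI initiator ID (the Solaris default is 7).
eeprom scsi-initiator-id

# On the second system sharing the bus, choose a different,
# unused ID so both hosts do not claim the same target ID.
eeprom scsi-initiator-id=5

# The same change can be made from the OpenBoot PROM prompt:
#   ok setenv scsi-initiator-id 5
```

Per-controller overrides, when needed, are configured in the OpenBoot nvramrc script rather than with the global variable.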
Setting Environment Variables
For Bourne or Korn shell (sh or ksh), add to /.profile:
  PATH=$PATH:/sbin:/opt/VRTSvcs/bin:/opt/VRTSllt
  export PATH
  MANPATH=$MANPATH:/opt/VRTS/man
  export MANPATH
For C shell (csh or tcsh):
  setenv PATH ${PATH}:/sbin:/opt/VRTSvcs/bin:/opt/VRTSllt
  setenv MANPATH ${MANPATH}:/opt/VRTS/man
The installvcs Utility
• Uses pkgadd to install the VCS packages on all the systems in the cluster: VRTSllt, VRTSgab, VRTSperl, VRTSvcs, VRTSweb, VRTSvcsw, VRTSvcsdc
• Requires remote root access to the other systems in the cluster while the script is running (/.rhosts file)
  Note: You can remove the .rhosts files after VCS installation.
• Configures two private network links for VCS communications
• Brings the cluster up without any services
Installation Settings
Information required by installvcs:
• Cluster name
• Cluster number
• System names
• License key
• Network ports for the private network
• Web Console configuration: virtual IP address, subnet mask, network interface
• SMTP/SNMP notification configuration (discussed later)
Starting VCS Installation
  # ./installvcs
  ...
  Please enter the unique Cluster Name : mycluster
  Please enter the unique Cluster ID (a number from 0-255) : 200
  Enter the systems on which you want to install
  (system names separated by spaces) : train7 train8
  Analyzing the system for install ...
  Enter the license key for train7 : XXXX XXXX ...
  Applying the license key to all systems in the cluster ...
Installing the Private Network
  Following is the list of discovered NICs:
  Sr. No.  NIC Device
  1.       /dev/hme:0
  2.       /dev/qfe:0
  3.       /dev/qfe:1
  4.       /dev/qfe:2
  5.       /dev/qfe:3
  6.       Other
  From the list above, please enter the serial number
  (the number appearing in the Sr. No. column) of the NIC for
  First PRIVATE network link: 1
  From the list above, please enter the serial number
  (the number appearing in the Sr. No. column) of the NIC for
  Second PRIVATE network link: 2
  Do you have the same network cards set up on all
  systems (Y/N)? y
Configuring the Web Console
  Do you want to configure the Cluster Manager (Web
  Console) (Y/N)[Y] ? y
  Enter the Virtual IP address for the Web Server : 192.168.27.9
  Enter Subnet [255.255.255.0]: <enter>
  Enter the NIC Device for this Virtual IP address
  (public network) on train7 [hme0]: <enter>
  Do you have the same NIC Device on all other systems
  (Y/N)[Y] ? y
  Do you want to configure SNMP and/or SMTP (e-mail)
  notification (Y/N)[Y] ? n
  Summary information for ClusterService Group setup :
  --------------------------------------------------
  Cluster Manager (Web Console) :
    Virtual IP Address : 192.168.27.9
    Subnet : 255.255.255.0
    Public Network link :
      train7 train8 : hme0
    URL to access : http://192.168.27.9:8181/vcs
Completing VCS Installation
  Installing on train7.
  Copying VRTSperl binaries ...
  Installing on train8.
  Copying VRTSperl binaries ...
  Copying Cluster configuration files ... Done.
  Installation successful on all systems.
  Installation can start the Cluster components on the
  following system/s: train7 train8
  Do you want to start these Cluster components now
  (Y/N)[Y] ? y
  Loading GAB and LLT modules and starting VCS on train7:
  Starting LLT ... Start GAB ... Start VCS
  Loading GAB and LLT modules and starting VCS on train8:
  Starting LLT ... Start GAB ... Start VCS
Summary
You should now be able to:
• Describe VCS software, hardware, and licensing prerequisites.
• Describe the general VCS hardware requirements.
• Configure SCSI controllers for a shared disk storage environment.
• Add VCS executable and manual page paths to the environment variables.
• Install VCS using the installation script.
Lab 2: Installing VCS
(Diagram: train1 and train2, each with an OS disk at target 0 on SCSI controller 1, sharing data disks with unique SCSI target IDs on SCSI controller 2; the systems use different scsi-initiator-id settings, for example 7 and 5.)
  # ./installvcs
VERITAS Cluster Server for Solaris
Lesson 3: Managing Cluster Services
Overview
Course topics: Introduction; Terms and Concepts; Installing VCS; Managing Cluster Services; Using Cluster Manager; Service Group Basics; Preparing Resources; Resources and Agents; NFS Resources; Cluster Communication; Faults and Failovers; Using Volume Manager; Installing Applications; Event Notification; Troubleshooting. This lesson covers Managing Cluster Services.
Objectives
After completing this lesson, you will be able to:
• Describe the cluster configuration mechanisms.
• Start the VCS engine on cluster systems.
• Stop the VCS engine.
• Modify the cluster configuration.
• Describe cluster transition states.
Cluster Configuration
(Diagram: each system keeps the configuration on disk in main.cf; had and hashadow, running over GAB and LLT, maintain the shared cluster configuration in memory across SystemA and SystemB.)
Starting VCS
(Diagram: hastart on System1 starts had and hashadow, which read main.cf and build the cluster configuration in memory; hastart on System2, which has no valid configuration, starts had and hashadow there, which wait to receive the configuration over the private network.)
Starting VCS: Second System
(Diagram: System2 receives the cluster configuration from System1 over the private network and writes its own copy of main.cf.)
Starting VCS: Third System
(Diagram: all three systems run had and hashadow, each with a local main.cf, sharing the cluster configuration in memory over the private network.)
Stopping VCS
(Diagram, three scenarios on a two-system cluster running service groups SGA and SGB:
1. hastop -local on System1 stops had there and takes SGA offline; SGB stays online on System2.
2. hastop -local -evacuate stops had on System1 and migrates SGA to System2.
3. hastop -local -force stops had on System1 while leaving SGA's applications running.)
The hastop Command
The hastop command stops the VCS engine.
Syntax:
  hastop -option [arg] [-option]
Options:
• -local [-force | -evacuate]
• -sys sys_name [-force | -evacuate]
• -all [-force]
Example:
  hastop -sys train4 -evacuate
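The main stop modes can be summarized as follows; the commands assume a running VCS cluster:

```shell
# Stop VCS on this system and take its service groups offline.
hastop -local

# Stop VCS on this system, migrating its service groups
# to other systems in the cluster first.
hastop -local -evacuate

# Stop VCS on this system but leave applications running
# (useful for VCS maintenance without a service outage).
hastop -local -force

# Stop VCS on every system in the cluster.
hastop -all
```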
Displaying Cluster Status
The hastatus command displays the status of items in the cluster.
Syntax:
  hastatus -option [arg] [-option arg]
Options:
• -group service_group
• -sum[mary]
Example:
  hastatus -group OracleSG
Protecting the Cluster Configuration
  haconf -makerw
  hares -add ...
  haconf -dump -makero
1. Cluster configuration opened; .stale file created.
2. Resources added to the cluster configuration in memory; main.cf is out of sync with the in-memory configuration.
3. Changes saved to disk; .stale removed.
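The full edit cycle can be sketched as a short script. The resource shown (a Disk resource named sharedDisk in a group named mySG) is a hypothetical example:

```shell
# Open the cluster configuration for writing; VCS creates a
# .stale marker file to record that main.cf may be out of date.
haconf -makerw

# Change the in-memory configuration, for example by adding a
# (hypothetical) Disk resource to the example group mySG.
hares -add sharedDisk Disk mySG

# Save the in-memory configuration back to main.cf and close it;
# the .stale file is removed once the dump succeeds.
haconf -dump -makero
```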
Opening and Saving the Cluster Configuration
The haconf command opens, closes, and saves the cluster configuration.
Syntax:
  haconf -option [-option]
Options:
• -makerw         Opens the configuration
• -dump           Saves the configuration
• -dump -makero   Saves and closes the configuration
Example:
  haconf -dump -makero
Starting VCS with a Stale Configuration
(Diagram: hastart on a system whose main.cf is marked with a .stale file; had and hashadow start, wait for a running peer, build the cluster configuration from that peer over the private network, and then write main.cf locally.)
Forcing VCS to Start on the Local System
(Diagram: hastart -force on a system with a .stale marker; had and hashadow build the cluster configuration from the local main.cf despite the stale flag.)
Forcing a System to Start
(Diagram: all systems are waiting with stale configurations; hasys -force System2 directs the cluster to build the configuration from System2's main.cf and propagate it to the other systems.)
The hasys Command
The hasys command alters or queries the state of had.
Syntax:
  hasys -option [arg]
Options:
• -force system_name
• -list
• -display system_name
• -delete system_name
• -add system_name
Example:
  hasys -force train11
Propagating a Specific Configuration
1. Stop VCS on all systems in the cluster, leaving applications running:
     hastop -all -force
2. Start VCS stale on all other systems:
     hastart -stale
   The -stale option causes these systems to wait until a running configuration is available from which they can build.
3. Start VCS on the system with the main.cf that you are propagating:
     hastart
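Run end to end, the procedure looks like this; train1 and train2 are example hostnames, and the rsh invocation is just one way to reach the other systems (any remote shell works):

```shell
# 1. Stop VCS cluster-wide without disturbing running applications.
hastop -all -force

# 2. On every system EXCEPT the one holding the desired main.cf,
#    start VCS in stale mode so it waits for a configuration
#    (shown here via a remote shell to example host train2).
rsh train2 hastart -stale

# 3. On the system with the main.cf to propagate (say, train1),
#    start VCS normally; it performs a local build, and the
#    waiting systems then perform remote builds from it.
hastart
```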
Summary of Start Options
The hastart command starts the had and hashadow daemons.
Syntax:
  hastart [-option]
Options:
• -stale
• -force
Example:
  hastart -force
Validating the Cluster Configuration
The hacf utility checks the syntax of the main.cf file.
Syntax:
  hacf -verify config_directory
Example:
  hacf -verify /etc/VRTSvcs/conf/config
Modifying Cluster Attributes
The haclus command is used to view and change cluster attributes.
Syntax:
  haclus -option [arg]
Options:
• -display
• -help [-modify]
• -modify modify_options
• -value attribute
• -notes
Example:
  haclus -value ClusterLocation
Startup States and Transitions
(State diagram: after hastart, had enters INITING, then CURRENT_DISCOVER_WAIT or STALE_DISCOVER_WAIT depending on whether the configuration on disk is valid or stale. With no peer, a valid configuration leads to LOCAL_BUILD and then RUNNING, while a stale configuration leads to STALE_ADMIN_WAIT. With a peer in RUNNING state, the system performs a REMOTE_BUILD to reach RUNNING; with a peer in LOCAL_BUILD, it waits in CURRENT_PEER_WAIT or STALE_PEER_WAIT. A disk error or a peer in ADMIN_WAIT leads to ADMIN_WAIT, as does a crash of the only peer in the RUNNING state during a remote build.)
Shutdown States and Transitions
(State diagram: from RUNNING, hastop leads to LEAVING, then to EXITING once resources are offlined and agents are stopped, and finally to EXITED; hastop -force leads directly to EXITING_FORCIBLY and then EXITED. An unexpected exit leaves the system FAULTED; losing the running configuration leads to ADMIN_WAIT.)
Summary
You should now be able to:
• Describe the cluster configuration mechanisms.
• Start VCS.
• Stop VCS.
• Modify the cluster configuration.
• Explain the transition states of the cluster.
Lab 3: Managing Cluster Services
To complete this lab exercise:
• Use commands to start and stop cluster services, as described in the detailed lab instructions.
• Observe the cluster status by running hastatus in a terminal window.
VERITAS Cluster Server for Solaris
Lesson 4: Using the Cluster Manager Graphical User Interface
Overview
Course topics: Introduction; Terms and Concepts; Installing VCS; Managing Cluster Services; Using Cluster Manager; Service Group Basics; Preparing Resources; Resources and Agents; NFS Resources; Cluster Communication; Faults and Failovers; Using Volume Manager; Installing Applications; Event Notification; Troubleshooting. This lesson covers Using Cluster Manager.
Objectives
After completing this lesson, you will be able to:
• Install Cluster Manager.
• Control access to VCS administration.
• Demonstrate Cluster Manager features.
• Create a service group.
• Create resources.
• Manage resources and service groups.
• Use the Web Console to administer VCS.
Installing Cluster Manager
Cluster Manager requirements on Solaris:
• 128 MB RAM
• 1280 x 1024 display resolution
• Minimum 8-bit monitor color depth; 24-bit is recommended
To install Cluster Manager:
  pkgadd -d pkg_location VRTScscm
Cluster Manager Properties
• Can be run from a remote system: Windows NT, or a Solaris system (cluster member or nonmember)
• Can manage multiple clusters from a single workstation
• Uses TCP port 14141 by default; to change it, add an entry such as the following to /etc/services:
    vcs 12345/tcp
Controlling Access to VCS: User Accounts
• Cluster Administrator: full privileges
• Cluster Operator: all cluster, service group, and resource-level operations
• Cluster Guest: read-only access; new users are created as Cluster Guest accounts by default
• Group Administrator: all service group operations for a specified service group, except deleting service groups
• Group Operator: bring service groups and resources online and offline; temporarily freeze or unfreeze service groups
VCS User Account Hierarchy
(Diagram: Cluster Administrator includes the privileges of Cluster Operator and Group Administrator; Group Administrator includes the privileges of Group Operator; Cluster Operator and Group Operator include the privileges of Cluster Guest.)
Adding Users and Setting Privileges
• The cluster configuration must be open.
• Users are added using the hauser command:
    hauser -add username
• Additional privileges can then be added:
    haclus -modify Administrators -add user
    haclus -modify Operators -add user
    hagrp -modify group Administrators -add user
    hagrp -modify group Operators -add user
• The VCS user account admin is created with Cluster Administrator privilege by the installvcs utility.
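Taken together, creating an operator account might look like the following; the user name ops1 and group name mySG are hypothetical, and the configuration must be saved afterwards:

```shell
# Open the configuration for writing.
haconf -makerw

# Create the account (hauser prompts for a password).
hauser -add ops1

# Grant cluster-wide operator privileges, plus group
# administrator rights on the example group mySG.
haclus -modify Operators -add ops1
hagrp -modify mySG Administrators -add ops1

# Save and close the configuration.
haconf -dump -makero
```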
Modifying User Accounts
To display account information:
  hauser -display user_name
To change a password:
  hauser -update user_name
To delete a VCS user account:
  hauser -delete user_name
Controlling Access to the VCS Command Line Interface
• There is no mapping between UNIX and VCS user accounts by default, except root, which has Cluster Administrator privilege.
• Nonroot users are prompted for a VCS account name and password when executing VCS commands from the command line interface.
• The cluster attribute AllowNativeCliUsers can be set to map UNIX account names to VCS accounts.
• A VCS account with appropriate privileges must exist with the same name as the UNIX user.
Cluster Manager Demonstration
Cluster Manager demonstration:
• Configuration and logging on
• Creating a service group and a resource
• Manual and automatic failover
• Log Desk, Command Log, Command Center, and Cluster Shell
Refer to your participant guide; the steps are listed in the notes. If a live demonstration is not possible in class, the following slides walk through it.
The following screenshot-based slides step through the demonstration:
• Configuring Cluster Manager (started with hagui &)
• Logging In to Cluster Manager (cluster panel, member systems, service groups, heartbeats)
• VCS Cluster Explorer
• Creating a Service Group
• Creating a Resource
• Bringing a Resource Online
• Resource and Service Group Status
• Switching the Service Group to Another System
• Service Group Switched
• Changing MonitorInterval
• Setting the Critical Attribute
• Faulted Resources
• Clearing a Faulted Resource
• Log Desk
• Command Log
• Command Center
• Shell Tool
• Administering User Profiles (add, remove, or modify user accounts)
Using the Web Console
Web Console:
• Manages existing resources and service groups: online and offline operations; clearing faults and probing resources; switching, flushing, and freezing service groups
• Cannot be used to create resources or service groups
• Runs on any system with a Java-enabled Web browser
Java Console:
• Configures service groups and resources: add, delete, modify
• Can be used for all VCS administrative tasks
• Requires Cluster Manager and Java to be installed on the administration system
Connecting to the Web Console
(Screenshot: browse to http://IP_alias:8181/vcs and log in with a VCS account and password.)
Cluster Summary
(Screenshot: the summary page shows navigation buttons, log entries, and a display refresh control.)
System View
(Screenshot: the system view shows a navigation trail and the selected view.)
Summary
You should now be able to:
• Install Cluster Manager.
• Control access to VCS administration.
• Demonstrate Cluster Manager features.
• Create a service group.
• Create resources.
• Manage resources and service groups.
• Use the Web Console to administer VCS.
Lab 4: Using Cluster Manager
(Diagram: Student Red creates service group RedGuiSG using /tmp/RedFile; Student Blue creates service group BlueGuiSG using /tmp/BlueFile.)
VERITAS Cluster Server for Solaris
Lesson 5: Service Group Basics
Overview
Course topics: Introduction; Terms and Concepts; Installing VCS; Managing Cluster Services; Using Cluster Manager; Service Group Basics; Preparing Resources; Resources and Agents; NFS Resources; Cluster Communication; Faults and Failovers; Using Volume Manager; Installing Applications; Event Notification; Troubleshooting. This lesson covers Service Group Basics.
Objectives
After completing this lesson, you will be able to:
• Describe how application services relate to service groups.
• Translate application requirements to service group resources.
• Define common service group attributes.
• Create a service group using the command line interface.
• Perform basic service group operations.
Application Service
(Diagram: database requests from the network arrive at an IP address on a NIC; the database software serves them using its data and log storage.)
High Availability Applications
VCS must be able to perform these operations:
• Start using a defined startup procedure.
• Stop using a defined shutdown procedure.
• Monitor using a defined procedure.
• Share storage with other systems and store data to disk, rather than maintaining it in memory.
• Restart to a known state.
• Migrate to other systems.
Example Service Groups
(Diagram: a failover Web/Database service group runs on one system at a time — SystemA; a parallel Web/Database service group runs on SystemA and SystemB simultaneously.)
Analyzing Applications
1. Specify the application services corresponding to service groups.
2. Determine the high availability level and service group type: failover or parallel.
3. Specify which systems run which services and the desired failover policy.
4. Identify the hardware and software objects required for each service group and their dependencies.
5. Map the service group resources to actual hardware and software objects.
Example Application Services
(Diagram: two application services map to two service groups. The Web service uses httpd, /data on c1t3d0s3, IP address 192.168.3.56, and NIC qfe1. The Database service uses the database processes, /oracle/data on c1t1d0s5, /oracle/log on c1t2d0s4, IP address 192.168.3.55, and NIC qfe1.)
Identify Physical Resources
Database service group:
• Database application
• File system /oracle/data on physical disk c1t1d0s5 (contains data files)
• File system /oracle/log on physical disk c1t2d0s4 (contains log files)
• IP address 192.168.3.55
• Network port qfe1
Map Physical Objects to VCS Resources
The database service group in the example requires:
• Two Disk resources to monitor the availability of the shared log disk and the shared data disk
• Two Mount resources that mount, unmount, and monitor the required log and data file systems
• A NIC resource to check the network connectivity on port qfe1
• An IP resource to configure the IP address used by database clients to access the database
• An Oracle resource to start, stop, and monitor the Oracle database application
Service Groups
Create a service group using the command line interface:
• Syntax:
    hagrp -add group_name
• Example:
    hagrp -add mySG
Modify service group attributes to define behavior:
    hagrp -modify group_name attribute value [values]
SystemList Attribute
• Defines the systems that can run the service group
• The lowest numbered system has the highest priority when determining the target system for failover.
To define the SystemList attribute:
• Syntax:
    hagrp -modify group_name SystemList system1 priority1 system2 priority2 ...
• Example:
    hagrp -modify mySG SystemList train1 0 train2 1
AutoStart and AutoStartList Attributes
A service group is automatically started on a system when VCS is started (if it is not already online elsewhere in the cluster) under the following conditions:
• The AutoStart attribute is set to 1.
• The system is listed in its AutoStartList attribute.
• The system is listed in its SystemList attribute.
To define the AutoStart attribute (default is 1):
  hagrp -modify group_name AutoStart value
To define the AutoStartList attribute:
  hagrp -modify group_name AutoStartList system1 system2 ...
Examples:
  hagrp -modify myManualSG AutoStart 0
  hagrp -modify mySG AutoStartList train0
AutoStartIfPartial Attribute
• Allows VCS to bring a service group with disabled resources online
• All enabled resources must be probed.
• Default is 1 (enabled).
• If set to 0, the service group cannot come online with disabled resources.
To define the AutoStartIfPartial attribute:
• Syntax:
    hagrp -modify group_name AutoStartIfPartial value
• Example:
    hagrp -modify group_name AutoStartIfPartial 0
Parallel Attribute
Parallel service groups:
• Run on more than one system at the same time
• Respond to system faults by staying online on the remaining systems, or by failing over to the specified target system
To set the Parallel attribute:
• Syntax:
    hagrp -modify group_name Parallel value
• Example:
    hagrp -modify myparallelSG Parallel 1
Notes:
• The Parallel attribute must be set before adding resources.
• Default value: 0 (failover)
Configuring a Service Group
(Flowchart: add the service group → set SystemList → set optional attributes → add and test each resource, repeating while more remain → link resources → set critical resources → test failover → test switching; on success, done; otherwise check the logs, fix the problem, and retest.)
Service Group Operations
Basic service group operations are described in the following sections.
Bringing a service group online:
hagrp -online group_name -sys system_name
Taking a service group offline:
hagrp -offline group_name -sys system_name
Displaying service group properties:
hagrp -display group_name
Example command lines:
hagrp -online oraclegroup -sys train8
hagrp -offline oraclegroup -sys train8
hagrp -display oraclegroup
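Because hagrp -online only queues the request with the VCS engine, a script may need to wait for the group to actually come online. A minimal sketch of polling the group state (hagrp is stubbed here so the example is self-contained; a real cluster would call /opt/VRTSvcs/bin/hagrp, and the -state output shown is an assumption):

```shell
#!/bin/sh
# Stub standing in for the real hagrp command (assumption: a live
# cluster would report the group's actual state).
hagrp() { echo "ONLINE"; }

# Poll the group state until it reports ONLINE or we give up.
wait_online() {
    group=$1; sys=$2; tries=0
    while [ $tries -lt 10 ]; do
        state=`hagrp -state "$group" -sys "$sys"`
        if [ "$state" = "ONLINE" ]; then
            return 0
        fi
        tries=`expr $tries + 1`
        sleep 1
    done
    return 1
}

wait_online oraclegroup train8 && echo "oraclegroup is ONLINE on train8"
```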
Bringing a Service Group Online
[Diagram: an Oracle service group (NIC, Disk, Mount, IP, Process, and Oracle resources) shown Before, In-Progress, and After being brought online]
Taking a Service Group Offline
[Diagram: an Oracle service group (NIC, Disk, Mount, IP, Process, and Oracle resources) shown Before, In-Progress, and After being taken offline]
Partially Online Service Groups
[Diagram: a service group with some resources online and others offline]
A service group is partially online if:
• One or more nonpersistent resources are online.
• At least one resource that is autostart-enabled and critical is offline.
Switching a Service Group
A manual failover can be accomplished by taking the service group offline on one system and bringing it online on another.
To switch a service group from one system to another with a single command:
• Syntax:
hagrp -switch group_name -to system_name
• Example:
hagrp -switch mySG -to train8
To switch using Cluster Manager:
Right-click the group—>Switch To—>system.
Flushing a Service Group
Misconfigured resources can cause agent processes to hang.
Flush the service group to stop all pending online and offline operations.
To flush a service group from the command line:
• Syntax:
hagrp -flush group_name -sys system_name
• Example:
hagrp -flush mySG -sys train8
To flush a service group using Cluster Manager:
Right-click the group—>Flush—>system.
Deleting a Service Group
Before deleting a service group:
1. Take all resources offline.
2. Disable the resources.
3. Delete the resources.
To delete a service group from the command line:
• Syntax:
hagrp -delete group_name
• Example:
hagrp -delete mySG
To delete a service group using Cluster Manager:
Right-click the group—>Delete.
Summary
You should now be able to:
• Describe how application services relate to service groups.
• Translate application requirements to service group resources.
• Define common service group attributes.
• Create a service group using the command line interface.
• Perform basic service group operations.
Lab 5: Creating Service Groups
[Diagram: Student Red creates RedNFSSG and RedGuiSG; Student Blue creates BlueNFSSG and BlueGuiSG]
VERITAS Cluster Server for Solaris
Lesson 6: Preparing Resources
Overview
[Course map: Introduction; Terms and Concepts; Installing VCS; Using Cluster Manager; Managing Cluster Services; Service Group Basics; Preparing Resources; Resources and Agents; NFS Resources; Cluster Communication; Installing Applications; Faults and Failovers; Using Volume Manager; Event Notification; Troubleshooting — the current lesson is highlighted]
Objectives
After completing this lesson, you will be able to:
• Describe the components required to create and share a file system using NFS.
• Prepare NFS resources.
• Describe the VCS network environment.
• Manually migrate NFS services between two systems.
• Describe the process of automating high availability.
Operating System Components Related to NFS
File system-related resources:
• Hard disk partition
• File system to be mounted
• Directory to be shared
• NFS daemons
Network-related resources:
• IP address
• Network interface
Disk Resources
[Diagram: System 1 and System 2 are both cabled to shared storage; partition 3 of disk1 is visible on each system as /dev/(r)dsk/c1t1d0s3]
File System and Share Resources
[Diagram: the vxfs file system on /dev/(r)dsk/c1t1d0s3 (partition 3 of disk1) can be mounted at /data on either system; each system runs the nfsd and mountd daemons]
Creating File System Resources
Format a disk and create a slice:
• Needs to be done on one system only.
• Use the format command.
• The device must have the same major and minor numbers on both systems (for NFS).
Create a file system on the slice:
• From one system only:
mkfs -F fstype /dev/rdsk/device_name
• You can use newfs for UFS file systems.
Create a directory for a mount point on each system:
mkdir /mount_point
Sharing the File System
1. Mount the file system:
• The file system should not be mounted automatically at boot time.
• Check the file system, if necessary:
fsck -F fstype /dev/rdsk/device_name
mount -F fstype /dev/dsk/device_name /mount_point
2. Start the NFS daemons, if they are not already running:
/usr/lib/nfs/nfsd -a nservers
/usr/lib/nfs/mountd
3. Share the file system:
share mount_point
Note: The file system should not be shared automatically at boot time.
NFS Resource Dependencies
[Dependency diagram: Share requires NFS and File System; File System requires Disk Partition]
IP Addresses in a VCS Environment
Administrative IP addresses:
• Associated with the physical network interface, such as qfe1
• Assigned a unique host name and IP address by the operating system at boot time
• Available only when the system is up and running
• Used for checking network connectivity
• Also called base or maintenance IP addresses
Application IP addresses:
• Added as a virtual IP address on the network interface, such as qfe1:1
• Associated with an application service
• Controlled by the high availability software
• Migrated to other systems if the current system fails
• Also called service group or floating IP addresses
Configuring an Administrative IP Address
1. Create /etc/hostname.interface containing the desired interface name:
vi /etc/hostname.qfe1
train14_qfe1
2. Edit /etc/hosts and assign an IP address to the interface name:
vi /etc/hosts
...
166.98.112.14 train14_qfe1
3. Reboot the system.
Configuring Application IP Addresses
Requires the administrative IP address to be configured on the interface.
Do not create a hostname file for the application IP address.
To set up manually:
1. Plumb the logical interface:
ifconfig qfe1:1 plumb
2. Configure the IP address:
ifconfig qfe1:1 inet 166.98.112.114 netmask +
3. Bring up the IP address:
ifconfig qfe1:1 up
4. Assign a virtual host name (the application service name) to the IP address:
vi /etc/hosts
...
166.98.112.114 nfs_services
Clients use the application IP address to connect to the application services.
NFS Services Resource Dependencies
[Dependency diagram: Application IP requires Share and Network Interface; Share requires NFS and File System; File System requires Disk Partition]
Monitoring NFS Resources
To verify the file system:
mount | grep mount_point
To verify the disk:
prtvtoc /dev/dsk/device_name
Alternatively:
touch /mount_point/sub_dir/.testfile
rm /mount_point/sub_dir/.testfile
To verify the share:
share | grep mount_point
To verify the NFS daemons:
ps -ef | grep nfs
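Each of these checks can be wrapped in a small function that reports online or offline status, which is essentially what a monitor script does. A sketch (mount and share are stubbed here with representative Solaris output so the example runs anywhere; on a real system you would drop the stubs and call the real commands):

```shell
#!/bin/sh
# Stubs with representative output (assumptions, not real system state).
mount() { echo "/data on /dev/dsk/c1t1d0s3 read/write on Mon Dec  3 11:50:00 2001"; }
share() { echo '-               /data   rw   ""'; }

# Return 0 (online) if the mount point appears in mount output.
check_mount() {
    mount | grep "^$1 " > /dev/null
}

# Return 0 (online) if the path appears in share output.
check_share() {
    share | grep "$1" > /dev/null
}

check_mount /data && echo "/data is mounted"
check_share /data && echo "/data is shared"
```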
Monitoring the Network
To verify network connectivity, use ping to reach another host on the same subnet as the administrative IP address:
ping 166.98.112.253
166.98.112.253 is alive
To verify the application IP address, use ifconfig to determine whether the IP address is up:
ifconfig -a
Migrating NFS Services
1. Make sure that the target system is available.
2. Make sure that the disk is accessible from the target system.
3. Make sure that the target system is connected to the network.
4. Bring the NFS services down on the first system, following the dependencies:
a. Configure the application IP address down.
b. Stop sharing the file system.
c. Unmount the file system.
5. Bring the NFS services up on the target system, following the resource dependencies:
a. Check and mount the file system.
b. Start the NFS daemons if they are not already running.
c. Share the file system.
d. Configure and bring up the application IP address.
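The down and up sequences above can be captured in a script so the ordering is never done by hand. A sketch (every command is wrapped in a stub that only prints, so the order is visible without touching a real system; device and mount point names follow the course examples):

```shell
#!/bin/sh
run() { echo "+ $*"; }   # stub: print the command instead of executing it

nfs_down() {                          # reverse dependency order
    run ifconfig qfe1:1 down
    run unshare /data
    run umount /data
}

nfs_up() {                            # dependency order
    run fsck -F vxfs /dev/rdsk/c1t1d0s3
    run mount -F vxfs /dev/dsk/c1t1d0s3 /data
    run /usr/lib/nfs/nfsd -a 16
    run /usr/lib/nfs/mountd
    run share /data
    run ifconfig qfe1:1 up
}

nfs_down   # run on the first system
nfs_up     # run on the target system
```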
Automating High Availability
Resources are created once; this is not part of HA operation.
Script the monitoring process:
• How often should each resource be monitored?
• What is the impact of monitoring on processing power?
• Are there any resources to be monitored on the target system even before failing over?
Script the start and stop processes.
Use high availability software to automate:
• Maintaining communication between systems to verify that the target system is available for failover
• Observing dependencies during starting and stopping
• Defining actions to take when a fault is detected
Summary
You should now be able to:
• Describe the components required to create and share a file system using NFS.
• Prepare NFS resources.
• Describe the VCS network environment.
• Manually migrate NFS services between two systems.
• Describe the process of automating high availability.
Lab 6: Preparing NFS Resources
[Diagram: Student Red prepares resources for RedNFSSG (c1t8d0s0 mounted at /Redfs); Student Blue prepares resources for BlueNFSSG (c1t15d0s0 mounted at /Bluefs); RedGuiSG and BlueGuiSG are unchanged]
VERITAS Cluster Server for Solaris
Lesson 7: Resources and Agents
Overview
[Course overview map; the current lesson, Resources and Agents, is highlighted]
Objectives
After completing this lesson, you will be able to:
• Describe how resources and resource types are defined in VCS.
• Describe how agents work.
• Describe cluster configuration files.
• Modify the cluster configuration.
• Use the Disk resource and agent.
• Use the Mount resource and agent.
• Create a service group.
• Configure resources.
• Perform resource operations.
Resources
[Diagram: an NFS service group containing Disk, NIC, IP, Mount, Share, and NFS resources]
Resource Definitions (main.cf)
Mount MyNFSMount (
    MountPoint = "/test"
    BlockDevice = "/dev/dsk/c1t2d0s4"
    FSType = vxfs
)
Mount is the resource type, MyNFSMount is the unique resource name, and the lines in parentheses set attribute values.
Nonpersistent and Persistent Resources
Nonpersistent resources:
• Operations = OnOff
Persistent resources:
• Operations = OnOnly
• Operations = None
Example types.cf entry:
type Disk (
    static str ArgList[] = { Partition }
    NameRule = resource.Partition
    static str Operations = None
    str Partition
)
Resource Types
[Diagram: the IP resource type has resources NFS_IP, WEB_IP, and ORACLE_IP; the NIC resource type has resources NFS_NIC_qfe1 and ORACLE_NIC_qfe2]
Resource Type Definitions (types.cf)
type Mount (
    static str ArgList[] = { MountPoint, BlockDevice, FSType, MountOpt, FsckOpt, SnapUmount }
    NameRule = resource.MountPoint
    str MountPoint
    str BlockDevice
    str FSType
    str MountOpt
    str FsckOpt
    int SnapUmount = 0
)
The type keyword introduces the definition; ArgList names the arguments passed to the agent; NameRule defines the default resource name; the remaining lines declare the attributes and their types.
Bundled Resource Types
Application, Disk, DiskGroup, DiskReservation, ElifNone, FileNone, FileOnOff, FileOnOnly, IP, IPMultiNIC, Mount, MultiNICA, NFS, NIC, Phantom, Process, Proxy, ServiceGroupHB, Share, Volume
Agents
• Periodically monitor resources and send status information to the VCS engine.
• Bring resources online when requested by the VCS engine.
• Take resources offline upon request.
• Restart resources when they fault (depending on the resource configuration).
• Send a message to the VCS engine and the agent log file when errors are detected.
How Agents Work
[Diagram: the VCS engine sends the request "Online myNFSIP" to the IP agent; the agent's online entry point runs: ifconfig qfe1:1 192.20.47.11 up]
From main.cf:
IP myNFSIP (
    Device = qfe1
    Address = "192.20.47.11"
)
From types.cf:
type IP (
    static str ArgList[] = { Device, Address, Netmask, Options, ArpDelay, IfconfigTwice }
…
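The exchange above amounts to: the agent runs its monitor entry point, and if the engine wants the resource online while monitor reports OFFLINE, the agent runs its online entry point. A stubbed sketch of that cycle (the real IP agent's online entry point runs ifconfig; the STATE variable here merely stands in for the interface state):

```shell
#!/bin/sh
STATE=OFFLINE                 # stands in for the real interface state

monitor() {                   # entry point: report resource status
    echo "$STATE"
}

online() {                    # entry point: bring the resource up
    # a real IP agent would run: ifconfig qfe1:1 192.20.47.11 up
    STATE=ONLINE
}

# The engine asks for the resource to be brought online:
if [ "$(monitor)" = "OFFLINE" ]; then
    online
fi
monitor                       # the agent reports the new status
```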
Enterprise Agents
Database Edition / HA 2.2 for Oracle, Informix, VERITAS NetBackup, Oracle, PC NetLink, Sun Internet Mail Server (SIMS), Sybase, VERITAS NetApp, Apache, Firewall (Check Point and Raptor), Netscape SuiteSpot
The main.cf File
• Cluster-wide configuration
• Service groups
• Resources
• Resource dependencies
• Service group dependencies
• Resource type dependencies
• Resource types—by way of include statements
Cluster Definition (main.cf)
Include the type definition files:
include "types.cf"
Cluster name and Cluster Manager users:
cluster mycluster (
    UserNames = { admin = "cDRpdxPmHpzS." }
    CounterInterval = 5
)
Systems that are members of the cluster:
system train7
system train8
Service Group Definition (main.cf)
group MyNFSSG (
    SystemList = { train8 = 1, train7 = 2 }
    AutoStartList = { train8 }
)

Mount MyNFSMount (
    MountPoint = "/data"
    BlockDevice = "/dev/dsk/c1t1d0s3"
    FSType = vxfs
)

Disk MyNFSDisk (
    Partition = c1t1d0s3
)

MyNFSMount requires MyNFSDisk

The group statement sets the service group attributes, each resource block sets resource attributes, and the requires statement defines a resource dependency.
Modifying the Cluster Configuration
Online configuration:
• Use Cluster Manager or the command line interface.
• Changes are made to the in-memory configuration on each system while the cluster is running.
• Save the cluster configuration from memory to disk:
– File—>Save Configuration
– haconf -dump
Offline configuration:
• Edit main.cf.
• Restart VCS.
Modifying Resource Types
Online configuration:
• Use Cluster Manager.
• Use the hatype command.
• Save changes to synchronize the in-memory configuration with the configuration files on disk.
Offline configuration:
• Edit types.cf to change existing resource type definitions.
• Edit main.cf to add include statements for new agents that have their own types file.
• Restart VCS.
Changing Agent Behavior
Use Cluster Manager.
Use the CLI:
hatype -modify Disk MonitorInterval 30
Edit types.cf:
type Disk (
    static str ArgList[] = { Partition }
    NameRule = group.Name + "_" + resource.Partition
    static str Operations = None
    str Partition
    int MonitorInterval = 30
)
The Disk Resource and Agent
Functions:
Online: None (the Disk type is persistent.)
Offline: None
Monitor: Determines whether the disk is online by reading from the raw device
Required attributes:
Partition: UNIX partition device name (if no path is specified, it is assumed to be in /dev/rdsk)
No optional attributes.
Configuration prerequisite: the UNIX device file must exist.
Sample configuration:
Disk MyNFSDisk (
    Partition = c1t0d0s0
)
The Mount Resource and Agent
Functions:
Online: Mounts a file system
Offline: Unmounts a file system
Monitor: Checks mount status using stat and statvfs
Required attributes:
BlockDevice: UNIX file system device name
FSType: File system type
MountPoint: Directory used to mount the file system
Optional attributes:
FsckOpt, MountOpt, SnapUmount
Mount Resource Configuration
Configuration prerequisites:
• Create the file system on the disk partition (or volume).
• Create the mount point directory on each system.
• Configure the VCS Disk resource on which Mount depends.
• Verify that there is no entry for the file system in /etc/vfstab.
Sample configuration:
Mount myNFSMount (
    MountPoint = "/export1"
    BlockDevice = "/dev/dsk/c1t1d0s3"
    FSType = vxfs
    MountOpt = "-o ro"
)
When setting MountOpt with hares, use % to escape arguments starting with a dash (-):
hares -modify myNFSMount MountOpt %"-o ro"
Configuring a Service Group
[Flow chart: Add Service Group → Set SystemList → Set Optional Attributes → Add/Test Resource (repeat while more resources remain; see the Resource Flow Chart) → Link Resources → Set Critical Resources → Test Failover → Test Switching → Success? Yes: Done; No: Check Logs/Fix and retest]
Configuring a Resource
[Flow chart: Add Resource → Set Non-Critical → Modify Attributes → Enable Resource → Bring Online → Online? Yes: Done; No: Check Log → if Waiting to Online: Flush Group; if Faulted: Clear Resource → Disable Resource → Modify Attributes and retry]
Adding a Resource
Suggest using the service group name as a prefix for resource names.
Modifying a Resource
• Enter values for each required attribute.
• Modify optional attributes, if necessary.
• See the Bundled Agents Reference Guide for a complete description of all attributes.
Setting the Critical Attribute
If a critical resource is faulted or is taken offline due to a fault, the entire service group fails over.
By default, all resources are critical. Set the Critical attribute to 0 to make a resource noncritical.
Enabling a Resource
Resources must be enabled in order to be managed by their agent. If necessary, the agent initializes the resource when it is enabled.
All required attributes of a resource must be set before the resource is enabled.
By default, resources are not enabled.
Bringing a Resource Online
Resources in a failover service group cannot be brought online if any resource in the service group is:
• Online on another system
• Waiting to go online on another system
Creating Resource Dependencies
Parent resources depend on child resources:
• A child resource must be online before its parent resource can come online.
• A parent resource must go offline before its child resource can go offline.
Parent resources cannot be persistent resource types.
You cannot link resources in different service groups.
A resource can have an unlimited number of parent and child resources.
Cyclical dependencies are not allowed.
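As a sketch of how these rules look in practice, the dependency statements for the NFS group built in this course might read as follows in main.cf (the resource names follow the lab conventions; the exact set of links is an assumption):

```
MyNFSMount requires MyNFSDisk
MyNFSShare requires MyNFSMount
MyNFSShare requires MyNFSNFS
MyNFSIP requires MyNFSShare
MyNFSIP requires MyNFSNIC
```

Here every parent (for example, MyNFSIP) names only children in the same service group, and no chain loops back on itself.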
Linking Resources
Taking a Resource Offline
Take individual resources offline in order, from the top of the dependency tree to the bottom.
Use Offline Propagate to take all resources offline. The selected resource:
• Must be the top online resource in the dependency tree
• Must have no online parent resources
Clearing Faults
Faulted resources must be cleared before they can be brought online.
Persistent resources are cleared when the problem is fixed and they are probed by the agent:
• Offline resources are probed periodically.
• Resources can be probed manually.
Disabling a Resource
• VCS calls the agent on each system in the SystemList.
• The agent calls the Close entry point, if present, to reset the resource.
• Nonpersistent resources are taken offline.
• The agent stops monitoring disabled resources.
Deleting a Resource
Before deleting a resource:
• Take all parent resources offline.
• Take the resource offline.
• Disable the resource.
• Unlink any dependent resources.
Delete all resources before deleting a service group.
Summary
You should now be able to:
• Describe how resources and resource types are defined in VCS.
• Describe how agents work.
• Describe cluster configuration files.
• Modify the cluster configuration.
• Use the Disk resource and agent.
• Use the Mount resource and agent.
• Create a service group.
• Configure resources.
• Perform resource operations.
Lab 7: Configuring Resources
[Diagram: Student Red adds RedNFSDisk and RedNFSMount to RedNFSSG (disk1, c1t8d0s0, mounted at /Redfs); Student Blue adds BlueNFSDisk and BlueNFSMount to BlueNFSSG (disk2, c1t15d0s0, mounted at /Bluefs); RedGuiSG and BlueGuiSG are unchanged]
VERITAS Cluster Server for Solaris
Lesson 8: Network File System (NFS) Resources
Overview
[Course overview map; the current lesson, NFS Resources, is highlighted]
Objectives
After completing this lesson, you will be able to:
• Prepare NFS services for the VCS environment.
• Describe the Share resource and agent.
• Describe the NFS resource and agent.
• Describe the NIC resource and agent.
• Describe the IP resource and agent.
• Configure and test an NFS service group.
NFS Service Group
[Diagram: an NFS service group containing Disk, NIC, IP, Mount, Share, and NFS resources]
NFS Setup for VCS
Major and minor numbers for block devices used for NFS services must be the same on each system.
[Diagram: before failover, a client's NFS request succeeds; after failover to a system where the device numbers differ, the client's request fails with a stale file handle error]
Major/Minor Numbers for Partitions
Each system must have the same major and minor numbers for the shared partition. Major and minor numbers must also be unique within a system.
On System A:
ls -lL /dev/dsk/c1t1d0s3
brw-r-----  root  sys  32,134 Dec 3 11:50 /dev/dsk/c1t1d0s3
On System B:
ls -lL /dev/dsk/c1t1d0s3
brw-r-----  root  sys  36,134 Dec 3 11:55 /dev/dsk/c1t1d0s3
To make the major numbers the same on all systems:
haremajor -sd major_number
Example:
haremajor -sd 36
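Comparing the major numbers by eye is error-prone; the field can be extracted and compared mechanically. A sketch using sample listings hard-coded as strings (an assumption for illustration; in practice you would capture the ls -lL output from each node, for example over rsh):

```shell
#!/bin/sh
# Hard-coded sample listings standing in for output captured from each node.
lsA="brw-r-----   1 root  sys   32,134 Dec  3 11:50 /dev/dsk/c1t1d0s3"
lsB="brw-r-----   1 root  sys   36,134 Dec  3 11:55 /dev/dsk/c1t1d0s3"

# Field 5 of a device listing is "major,minor"; strip the minor part.
major() { echo "$1" | awk '{ sub(/,.*/, "", $5); print $5 }'; }

if [ "$(major "$lsA")" = "$(major "$lsB")" ]; then
    echo "major numbers match"
else
    echo "mismatch: run haremajor -sd $(major "$lsB")"
fi
```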
Major Numbers for Volumes
Verify that the major numbers match on all systems.
On System A:
grep ^vx /etc/name_to_major
vxdmp 87
vxio 88
vxspec 89
On System B:
grep ^vx /etc/name_to_major
vxdmp 89
vxio 90
vxspec 91
Changing Major Numbers for Volumes
To make the major numbers the same on all systems:
• Before running vxinstall:
– Edit /etc/name_to_major manually and change the VM major numbers to be the same on both systems.
– Reboot the systems where the change was made.
• After running vxinstall:
haremajor -vx major_num1 major_num2
• Example:
haremajor -vx 91 92
Each system must have the same major number for the shared volume. Major numbers must also be unique within a system.
The Share Resource and Agent
Functions:
Online: Shares an NFS file system
Offline: Unshares an NFS file system
Monitor: Reads the /etc/dfs/sharetab file to check for an entry for the file system
Required attributes:
PathName: Pathname of the file system
Optional attributes: Options
Configuration prerequisites:
• The file system to be shared should not have an entry in /etc/dfs/dfstab.
• Mount and NFS resources must be configured.
The NFS Resource and Agent
Functions:
Online: Starts the nfsd and mountd processes if they are not already running
Offline: None (NFS is an OnOnly resource.)
Monitor: Checks for the nfsd, mountd, lockd, and statd processes
Required attributes: None
Optional attributes: Nservers (default = 16)
Configuration prerequisites: None
Sample configuration:
NFS mySGNFS (
    Nservers = 24
)
The NIC Resource and Agent
Functions:
Online: None (NIC is persistent.)
Offline: None
Monitor: Uses ping to check connectivity and determine whether the interface is up
Required attributes:
Device: NIC device name
Optional attributes:
NetworkType, PingOptimize, NetworkHosts
NIC Resource Configuration
Configuration prerequisites:
• Configure Solaris to plumb the interface during system boot. Edit these files:
– /etc/hosts
– /etc/hostname.interface
• Reboot the system.
Sample configuration:
NIC mySGNIC (
    Device = qfe1
    NetworkHosts = { "192.20.47.254", "192.20.47.253" }
)
The IP Resource and Agent
Functions:
Online: Configures a virtual IP address on an interface
Offline: Removes the IP address from the interface
Monitor: Determines whether the virtual IP address is present on the interface
This is the IP address that users connect to, and it fails over between systems in the cluster.
Required attributes:
Device: Name of the NIC
Address: Unique application (virtual) IP address
Optional attributes:
NetMask, Options, ArpDelay (default = 1s), IfconfigTwice (default = 0)
IP Resource Configuration
Configuration prerequisite: configure a NIC resource.
Sample configuration:
IP mySGIP (
    Device = qfe1
    Address = "192.20.47.61"
)
Configuring an NFS Service Group
[Flow chart: Add Service Group → Set SystemList → Set Optional Attributes → Add/Test Resources (repeat while more resources remain; see the Resource Flow Chart) → Test]
Commands:
hagrp -add mySG
hagrp -modify mySG SystemList sys1 0 sys2 1
hagrp -modify mySG Attribute Value
Configuring NFS Resources
[Flow chart: Add Resource → Set Non-Critical → Modify Attributes → Enable Resource → Bring Online → Online? Yes: Done; No: Troubleshoot Resources]
Commands:
hares -add mySGIP IP mySG
hares -modify mySGIP Critical 0
hares -modify mySGIP Attribute Value
hares -modify mySGIP Enabled 1
hares -online mySGIP -sys sys1
Troubleshooting Resources
[Flow chart: Check Log → if Waiting to Online: Flush Group; if Faulted: Clear Resource → Disable Resource → Modify Attributes → Enable Resource → Bring Online → Online? Yes: Done; No: repeat]
Commands:
hagrp -flush mySG -sys sys1
hares -clear mySGIP
hares -modify mySGIP Enabled 0
Testing the Service Group
[Flow chart: Link Resources → Set Critical Resources → Test Failover → Test Switching → Success? Yes: Done; No: Check Logs/Fix]
Commands:
hares -link mySGIP mySGNIC
hares -modify mySGNIC Critical 1
hares -modify mySGIP Critical 1
hagrp -switch mySG -to sys2
Summary
You should now be able to:
• Prepare NFS services for the VCS environment.
• Describe the Share resource and agent.
• Describe the NFS resource and agent.
• Describe the NIC resource and agent.
• Describe the IP resource and agent.
• Configure and test an NFS service group.
Lab 8: Creating an NFS Service Group
[Diagram: Student Red's RedNFSSG contains RedNFSDisk, RedNFSMount, RedNFSNIC, RedNFSIP, RedNFSNFS, and RedNFSShare; Student Blue's BlueNFSSG contains BlueNFSDisk, BlueNFSMount, BlueNFSNIC, BlueNFSIP, BlueNFSNFS, and BlueNFSShare]
VERITAS Cluster Server for Solaris
Lesson 9: Event Notification
Overview
[Course overview map; the current lesson, Event Notification, is highlighted]
Objectives
After completing this lesson, you will be able to:
• Describe the VCS notifier component.
• Configure the notifier to signal changes in cluster status.
• Describe SNMP configuration.
• Describe event triggers.
• Configure triggers to provide notification.
Notification
How VCS performs notification:
1. The had daemon sends a message to the notifier daemon when an event occurs.
2. The notifier daemon formats the event message and sends an SNMP trap or e-mail message (or both) to designated recipients.
[Diagram: the had daemons on each system feed the notifier daemon, which delivers SMTP and SNMP messages]
Message Severity Levels
• Information — for example, a service group is online.
• Warning — for example, an agent has faulted.
• Error — for example, a resource has faulted.
• SevereError — for example, a concurrency violation.
[Diagram: the had daemons feed the notifier, which routes messages to SMTP recipients and SNMP consoles according to severity]
Message Queues
1. The had daemon stores a message in a queue when an event is detected.
2. The message is sent over the private cluster network to all other had daemons to replicate the message queue.
3. The notifier daemon can be started on another system in case of failure, without loss of messages.
[Diagram: each system's had daemon holds a copy of the replicated queue; the notifier on either system can deliver SMTP and SNMP messages]
Configuring Notifier
The notifier daemon can be started and monitored by the NotifierMngr resource.
Attributes define recipients and severity levels. For example:
SmtpServer = "smtp.acme.com"
SmtpRecipients = { "[email protected]" = Warning }
[Diagram: a NotifierMngr resource, dependent on a NIC resource, manages the notifier daemon on each system]
The NotifierMngr Agent
Functions: Starts, stops, and monitors the notifier daemon
Required attribute:
PathName: Full path of the notifier daemon
Required attributes for SMTP e-mail notification:
SmtpServer: Host name of the SMTP e-mail server
SmtpRecipients: E-mail address and message severity level for each recipient
Required attribute for SNMP notification:
SnmpConsoles: Name of each SNMP manager and its message severity level
The NotifierMngr Resource
Optional attributes:
MessagesQueue: Size of the message queue; default = 30
NotifierListeningPort: TCP/IP port number; default = 14144
SnmpdTrapPort: TCP/IP port to which SNMP traps are sent; default = 162
SnmpCommunity: Community ID for the SNMP manager; default = "public"
Example resource configuration:
NotifierMngr Notify_Ntfr (
    PathName = "/opt/VRTSvcs/bin/notifier"
    SnmpConsoles = { snmpserv = Information }
    SmtpServer = "smtp.your_company.com"
    SmtpRecipients = { "[email protected]_company.com" = SevereError }
)
SNMP Configuration
Load the MIB for VCS traps into the SNMP console.
For HP OpenView Network Node Manager, merge the events:
xnmevents -merge vcs_trapd
VCS SNMP configuration files:
• /etc/VRTSvcs/snmp/vcs.mib
• /etc/VRTSvcs/snmp/vcs_trapd
Event Triggers
How VCS runs event triggers:
1. VCS determines whether notification is enabled.
• If disabled, no action is taken.
• If enabled, VCS runs hatrigger with event-specific parameters.
2. The hatrigger script invokes the event-specific trigger script with the parameters passed by VCS.
3. The event trigger script performs the notification tasks.
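A trigger script can be as small as a few lines of shell. The sketch below is a resfault-style trigger that appends the event to a log file; the log path and message wording are illustrative, and the sample scripts shipped with VCS are written in Perl:

```shell
#!/bin/sh
# Invoked by hatrigger as: resfault <system> <resource>
LOG=/tmp/resfault.log         # illustrative location
: > "$LOG"                    # start with an empty log (illustrative)

resfault() {
    sys=$1; res=$2
    echo "`date`: resource $res faulted on system $sys" >> "$LOG"
}

resfault train8 MyNFSMount
tail -1 "$LOG"
```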
Types of Triggers
Trigger		Description				Script Name
ResFault	Resource faulted			resfault
ResNotOff	Resource not offline			resnotoff
ResStateChange	Resource changed state			resstatechange
SysOffline	System went offline			sysoffline
InJeopardy	Cluster in jeopardy			injeopardy
NoFailover	Service group cannot fail over		nofailover
PostOffline	Service group went offline		postoffline
PostOnline	Service group went online		postonline
PreOnline	Service group about to come online	preonline
Violation	Resource online on more than one system	violation
LoadWarning	System is overloaded			loadwarning
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-206
Configuring Triggers
Triggers enabled by the presence of the script file:
• ResFault
• ResNotOff
• SysOffline
• InJeopardy
• Violation
• NoFailover
• PostOffline
• PostOnline
• LoadWarning
Triggers configured by service group attributes:
• PreOnline
• ResStateChange
Triggers configured by default:
• Violation
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-207
Sample Triggers
Sample trigger scripts include example code to send an e-mail message.
Mail must be configured on the system invoking the trigger to use the sample e-mail code.
# Here is sample code to notify a list of users.
# @recipients=("[email protected]");
# $msgfile="/tmp/resnotoff$2";
# `echo system = $ARGV[0], resource = $ARGV[1] > $msgfile`;
#
# foreach $recipient (@recipients) {
# # Must have elm setup to run this.
# `elm -s resnotoff $recipient < $msgfile`;
# }
#`rm $msgfile`;
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-208
ResFault Trigger
Provides notification that a resource has faulted.
Arguments to resfault:
• system: Name of the system where the resource faulted
• resource: Name of the faulted resource
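A minimal resfault trigger can be sketched in shell. This is a hypothetical sketch, not the shipped sample: the log message and the commented mail command (including the operator address) are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of a resfault trigger script.
# VCS invokes the trigger with: $1 = system name, $2 = faulted resource.
resfault_notify() {
    system=$1
    resource=$2
    msg="Resource $resource faulted on system $system"
    # A site would typically mail the message here, for example:
    #   echo "$msg" | mailx -s "VCS fault: $resource" [email protected]
    echo "$msg"
}

resfault_notify "$1" "$2"
```

The script must be executable and placed where VCS looks for triggers for it to be invoked.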
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-209
ResNotOff Trigger
Provides notification that a resource has not been taken offline.
If a resource is not offline on one system, the service group cannot be brought online on another. VCS cannot fail over the service group in the event of a fault, because the resource will not come offline.
Arguments to resnotoff:
• system: Name of the system where the resource is not offline
• resource: Name of the resource that is not offline
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-210
ResStateChange Trigger
Provides notification that a resource has changed state.
Enabled at the service group level by the TriggerResStateChange attribute:
hagrp -modify serv_grp TriggerResStateChange 1
Arguments to resstatechange:
• system: Name of the system where the resource changed state
• resource: Name of the resource that changed state
• previous_state: State of the resource before the change
• new_state: State of the resource after the change
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-211
SysOffline Trigger
Provides notification that a system has gone offline.
Executed on another system when no heartbeat is detected.
Arguments to sysoffline:
• system: Name of the system that went offline
• systemstate: Value of the SysState attribute for the offline system
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-212
NoFailover Trigger
Run when VCS determines that a service group cannot fail over.
Executed on the lowest-numbered system in a running state when the condition is detected.
Arguments to nofailover:
• systemlastonline: Name of the last system where the service group was online or partially online
• service_group: Name of the service group that cannot fail over
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-213
Summary
You should now be able to:
Describe the VCS notifier component.
Configure the notifier to signal changes in cluster status.
Describe SNMP configuration.
Describe event triggers.
Configure triggers to provide notification.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-214
Lab 9: Event Notification
[Diagram: Student Red (RedNFSSG) and Student Blue (BlueNFSSG) clusters with the ClusterService group (webip, webnic, and notifier resources); the resfault, nofailover, and sysoffline triggers are configured on both.]
VERITAS Cluster Server for Solaris
Lesson 10
Faults and Failovers
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-216
Overview
[Course map: Introduction; Terms and Concepts; Service Group Basics; Preparing Resources; Resources and Agents; NFS Resources; Installing VCS; Using Cluster Manager; Managing Cluster Services; Event Notification; Faults and Failovers; Using Volume Manager; Installing Applications; Cluster Communication; Troubleshooting]
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-217
Objectives
After completing this lesson, you will be able to:
Describe how VCS responds to faults.
Implement failover policies.
Set limits and prerequisites.
Use system zones to control failover.
Control failover behavior using attributes.
Clear faults.
Probe resources.
Flush service groups.
Test failover.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-218
How VCS Responds to Resource Faults
1. Calls the ResFault trigger, if present.
2. Offlines all resources in the path of the fault, starting from the faulted resource up to the top of the dependency tree.
3. If an online critical resource is part of the path, offlines the entire service group in preparation for failover.
4. Starts the service group on another system in the service group's SystemList (if possible).
5. If no other system is available, the service group remains offline and the NoFailover trigger is invoked, if present.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-219
Practice Exercise: Resource 4 Faults
[Diagram: a service group dependency tree of resources 1-9; resource 4 faults.]
For each case, determine which resources are taken offline due to the fault and whether the group starts on another system:
Case	Non-Critical	Already Offline
A	-		-
B	4		-
C	4		6, 7
D	4, 6		-
E	4, 6, 7		-
F	4		7
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-220
Practice Answers: Resource 4 Faults
[Diagram: the same dependency tree of resources 1-9; resource 4 faults.]
Case	Non-Critical	Already Offline	Taken Offline Due to Fault	Starts on Another System
A	-		-		6, 7				All
B	4		-		6, 7				All
C	4		6, 7		-				-
D	4, 6		-		6, 7				All
E	4, 6, 7		-		6, 7				-
F	4		7		6				All but 7
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-221
Failover Attributes
AutoFailOver indicates whether automatic failover is enabled for the service group. The default value is 1 (enabled).
FailOverPolicy specifies how a target system is selected:
• Priority: The system with the lowest priority number in the list is selected (default).
• RoundRobin: The system with the fewest active service groups is selected.
• Load: The system with the greatest available capacity is selected.
Example configuration:
hagrp -modify group AutoFailOver 0
hagrp -modify group FailOverPolicy Load
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-222
FailOverPolicy: Priority
The lowest-numbered system in the service group's SystemList is selected.
[Diagram:
AP1 on Svr1: SystemList = {Svr1 = 0, Svr2 = 1}
DB on Svr2: SystemList = {Svr2 = 0, Svr1 = 1}
AP2 on Svr3: SystemList = {Svr3 = 0, Svr1 = 1, Svr2 = 2}]
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-223
FailOverPolicy: RoundRobin
The system with the fewest running service groups is selected.
[Diagram: four systems, Svr1 through Svr4.]
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-224
FailOverPolicy: Load
[Diagram:
SmSvr1: Capacity = 100, AvailableCapacity = 70; runs AP1 (Load = 30)
SmSvr2: Capacity = 100, AvailableCapacity = 80; runs AP2 (Load = 20)
LgSvr1: runs DB1 (Load = 100)
LgSvr2: Capacity = 200, AvailableCapacity = 100; runs DB2 (Load = 100)]
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-225
Setting Load and Capacity
The Load and Capacity attributes are user-defined values.
Set the attributes using the hagrp and hasys commands. Examples:
hasys -modify SmSvr1 Capacity 100
hagrp -modify AP1 Load 30
AvailableCapacity is calculated by VCS:
Capacity minus Load equals AvailableCapacity
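The arithmetic VCS performs can be sketched as a small shell function. This is an illustrative sketch, not VCS code; the values mirror the examples above (Capacity 100, AP1 Load 30).

```shell
# AvailableCapacity = Capacity - (sum of the Loads of the online groups).
available_capacity() {
    capacity=$1; shift
    total=0
    for load in "$@"; do
        total=$((total + load))    # accumulate the load of each online group
    done
    echo $((capacity - total))
}

available_capacity 100 30     # SmSvr1 running AP1 -> prints 70
```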
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-226
Load-Based Failover Example
Svr1: Capacity = 100; runs G1 (Load = 20) and G6 (Load = 30); AvailableCapacity = 50
Svr2: Capacity = 100; runs G2 (Load = 40) and G8 (Load = 40); AvailableCapacity = 20
Svr3: Capacity = 100; runs G3 (Load = 30) and G7 (Load = 20); AvailableCapacity = 50
Svr4: Capacity = 100; runs G4 (Load = 10) and G5 (Load = 50); AvailableCapacity = 40
When Svr4 fails:
G4 migrates to Svr1 [SystemList = {Svr1, Svr2, Svr3, Svr4}]
G5 migrates to Svr3 [SystemList = {Svr1, Svr2, Svr3, Svr4}]
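Under FailOverPolicy = Load, the target is the system with the greatest AvailableCapacity, with SystemList order breaking ties. The selection can be sketched as follows; this is an illustrative simulation, not VCS code, and the "name:capacity" argument format is an assumption of the sketch.

```shell
# Pick the failover target from "system:AvailableCapacity" pairs given
# in SystemList order; the first system with the highest value wins.
pick_load_target() {
    best=""
    bestcap=""
    for pair in "$@"; do
        name=${pair%%:*}           # system name before the colon
        cap=${pair##*:}            # AvailableCapacity after the colon
        if [ -z "$bestcap" ] || [ "$cap" -gt "$bestcap" ]; then
            bestcap=$cap
            best=$name
        fi
    done
    echo "$best"
}

# Svr4 fails; G4 chooses among the surviving systems (tie between
# Svr1 and Svr3 at 50 is broken by SystemList order):
pick_load_target Svr1:50 Svr2:20 Svr3:50    # prints Svr1
```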
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-227
The LoadWarning Trigger
Svr3 runs the LoadWarning trigger when AvailableCapacity is 20 or less (80 percent of Capacity is consumed) for 10 minutes (600 seconds).
[Diagram: after failover, Svr1 runs G1, G6, and G4 (AvailableCapacity = 40); Svr2 runs G2 and G8 (AvailableCapacity = 20); Svr3 runs G3, G7, and G5 (AvailableCapacity = 0).]
system Svr3 (
    Capacity = 100
    LoadWarningLevel = 80
    LoadTimeThreshold = 600
    )
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-228
Dynamic Load
The DynamicLoad attribute is used in conjunction with load-estimation software. It is set using the hasys command.
[Diagram:
SmSvr1 (Capacity = 100) runs GA, GC, and GD; after "hasys -load 90", SmSvr1 is 90 percent loaded and AvailableCapacity = 10.
LgSvr2 (Capacity = 200) runs GB and GH; after "hasys -load 160", LgSvr2 is 80 percent loaded and AvailableCapacity = 40.]
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-229
Limits and Prerequisites
[Diagram:
SmSvr1, SmSvr2: Limits = { Mem=75, Processors=6 }; CurrentLimits = { Mem=50, Processors=4 }; run AP1 and AP2
LgSvr1, LgSvr2: Limits = { Mem=100, Processors=12 }; CurrentLimits = { Mem=50, Processors=8 }; run DB1 and DB2
DB1, DB2: Prerequisites = { Mem=50, Processors=4 }
AP1, AP2: Prerequisites = { Mem=25, Processors=2 }]
DB1 or DB2 can fail over to either SmSvr1 or SmSvr2.
Both AP1 and AP2 can fail over to either LgSvr1 or LgSvr2.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-230
Combining Capacity and Limits
When used together, VCS determines the failover target as follows:
• Limits and Prerequisites are used to determine a subset of potential failover targets.
• Of this subset, the system with the highest value for AvailableCapacity is selected.
• If multiple systems have the same AvailableCapacity, the first system in SystemList is selected.
• Limits are hard values: if a system does not meet the Prerequisites, the service group cannot be started on that system.
• Capacity is a soft limit: the system with the highest AvailableCapacity is selected even if the resulting AvailableCapacity is negative.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-231
Failover Zones
[Diagram: the cluster's six systems (sysa through sysf) are divided into a preferred failover zone for the Web service group and a preferred failover zone for the Database service group.]
The SystemList for both service groups includes all systems in the cluster.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-232
SystemZones Attribute
Used to define the preferred failover zones for each service group.
If the service group is online in a system zone, it fails over to other systems in the same zone, based on the FailOverPolicy, until no further systems are available in that zone.
When there are no other systems for failover in the same zone, VCS chooses a system in a new zone from the SystemList, based on the FailOverPolicy.
To define SystemZones:
• Syntax:
hagrp -modify group_name SystemZones \
sys1 zone# sys2 zone# ...
• Example:
hagrp -modify OracleSG SystemZones sysa 0 \
sysb 0 sysc 1 sysd 1 syse 1 sysf 1
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-233
Controlling Failover Behavior with Resource Type Attributes
RestartLimit
• Affects how the agent responds to a resource fault
• Default: 0
ConfInterval
• Determines the amount of time within which a tolerance or restart counter can be incremented
• Default: 600 seconds
ToleranceLimit
• Enables the monitor entry point to return OFFLINE several times before the resource is declared FAULTED
• Default: 0
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-234
Restart Example
RestartLimit = 1: The resource is restarted one time within the ConfInterval time frame.
ConfInterval = 180: The resource can be restarted once within a three-minute interval.
MonitorInterval = 60 seconds (default value): The resource is monitored every 60 seconds.
[Diagram: timeline showing Online, Offline (restarted), Online, Offline (faulted), all within one ConfInterval.]
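The agent's decision at each failed monitor cycle can be sketched as follows. This is an illustrative simplification (the ConfInterval-based counter reset is abstracted away), not the agent's actual implementation.

```shell
# Decide what the agent does when monitor finds the resource offline.
# $1 = restarts already attempted within the current ConfInterval
# $2 = RestartLimit for the resource type
on_unexpected_offline() {
    restarts=$1
    limit=$2
    if [ "$restarts" -lt "$limit" ]; then
        echo "restart"    # restart attempts remain within ConfInterval
    else
        echo "faulted"    # RestartLimit exhausted; declare the resource faulted
    fi
}

on_unexpected_offline 0 1    # first offline with RestartLimit=1 -> restart
on_unexpected_offline 1 1    # second offline within ConfInterval -> faulted
```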
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-235
Adjusting Monitoring
MonitorInterval
• Default value is 60 seconds for most resource types.
• Consider reducing it to 10 or 20 seconds for testing.
• Use caution when changing this value:
  • Load is increased on cluster systems.
  • Resources can fault if they cannot respond in the interval specified.
OfflineMonitorInterval
• Default is 300 seconds for most resource types.
• Consider reducing it to 60 seconds for testing.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-236
Modifying Resource Type Attributes
Can be used to optimize agents.
Applied to all resources of the specified type.
Command-line example:
hatype -modify FileOnOff MonitorInterval 5
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-237
Preventing Failover
A frozen service group does not fail over when a critical resource faults.
The service group must be unfrozen to enable failover.
To freeze a service group:
hagrp -freeze service_group [-persistent]
To unfreeze a service group:
hagrp -unfreeze service_group [-persistent]
A persistent freeze:
• Requires the cluster configuration to be open
• Remains in effect even if VCS is stopped and restarted throughout the cluster
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-238
Clearing Faults
Verify that the faulted resource is offline.
Fix the problem that caused the fault and clean up any residual effects.
To clear a fault, type:
hares -clear resource_name [-sys system_name]
To clear all faults in a service group, type:
hagrp -clear group_name [-sys system_name]
Persistent resources are cleared by probing:
hares -probe resource_name [-sys system_name]
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-239
Probing Resources
Causes VCS to immediately monitor the resource.
To probe a resource, type:
hares -probe resource_name -sys system_name
You can clear a persistent resource fault by probing the resource after the underlying problem has been fixed.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-240
Flushing Service Groups
All online/offline agent processes are stopped.
All resources in transitional states waiting to go online are taken offline.
Propagation of the offline operation is stopped, but resources waiting to go offline remain in the transitional state.
After flushing, verify that the physical or software resources are stopped at the operating system level to avoid creating a concurrency violation.
To flush a service group, type:
hagrp -flush group_name -sys system_name
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-241
Testing Failover
Use test resources, such as FileOnOff, when applicable.
Set lower values for MonitorInterval, OfflineMonitorInterval, and ConfInterval to detect faults more quickly.
Manually online, offline, and switch the service group among all systems.
Simulate failure of each resource in the service group.
Simulate failover of the entire system.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-242
Testing Examples
Force a resource to fault.
Reboot a system.
Halt and reboot a system.
Remove power from a system.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-243
Summary
You should now be able to:
Describe how VCS responds to faults.
Implement failover policies.
Set limits and prerequisites.
Use system zones to control failover.
Control failover behavior using attributes.
Clear faults.
Probe resources.
Flush service groups.
Test failover.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-244
Lab 10: Faults and Failovers
[Diagram: Student Red (RedNFSSG) and Student Blue (BlueNFSSG) clusters with the resfault, nofailover, and sysoffline triggers configured.]
VERITAS Cluster Server for Solaris
Lesson 11Installing and Upgrading
Applications in the Cluster
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-246
Overview
[Course map: Introduction; Terms and Concepts; Service Group Basics; Preparing Resources; Resources and Agents; NFS Resources; Installing VCS; Using Cluster Manager; Managing Cluster Services; Event Notification; Faults and Failovers; Using Volume Manager; Installing Applications; Cluster Communication; Troubleshooting]
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-247
Objectives
After completing this lesson, you will be able to:
Describe the benefits of keeping applications available during planned maintenance.
Freeze service groups and systems.
Upgrade a system in a running cluster.
Describe the differences in application upgrades.
Apply guidelines for installing new applications in the cluster.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-248
Maintenance and Downtime
[Chart: causes of downtime — Software 40%, Planned Downtime 30%, People 15%, Hardware 10%, Environment 5%, LAN/WAN Equipment <1%, Client <1%.]
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-249
Operating System Update
[Diagram: web requests continue to be served while a frozen web server receives an operating system update.]
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-250
Application Upgrade
[Diagram: the WebSG service group is frozen while the web application is updated; the DatabaseSG service group continues to run.]
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-251
Freezing a System
Freezing a system prevents service groups from failing over to it.
Failover can still occur from a frozen system.
Freeze a system while maintenance is being performed.
A persistent freeze remains in effect through VCS restarts.
The -evacuate option moves service groups off the frozen system.
Syntax:
hasys -freeze [-persistent] [-evacuate] systemA
hasys -unfreeze [-persistent] systemA
Use hasys to determine whether a system is frozen:
hasys -display systemA -attribute Frozen
hasys -display systemA -attribute TFrozen
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-252
Freezing a Service Group
Freezing a service group prevents it from being taken offline, brought online, or failed over, even if a concurrency violation occurs.
Example update scenario:
1. Freeze the service group.
2. Update the application on the system(s) not currently running the application.
3. Unfreeze the service group.
4. Move the service group to an updated system and apply the application update on the original system.
A persistent freeze remains in effect even if VCS is stopped and restarted throughout the cluster.
Syntax:
hagrp -freeze service_group [-persistent]
Use hagrp to determine whether a group is frozen:
hagrp -display service_group -attribute Frozen
hagrp -display service_group -attribute TFrozen
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-253
Upgrading a System—Reboot Required
[Flowchart summary:]
1. Move service groups to appropriate systems: hagrp -switch mySG -to systemA
2. Freeze and evacuate the system: hasys -freeze -persistent -evacuate systemA
3. Close the configuration: haconf -dump -makero
4. Stop VCS on the system: hastop -sys systemA
5. Perform the upgrade and reboot the system.
6. Open the configuration: haconf -makerw
7. Unfreeze the system: hasys -unfreeze -persistent systemA
8. More systems to upgrade? If yes, repeat from step 1; if no, done.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-254
Differences in Application Upgrades
Rolling upgrades
No simple reversion from an upgrade
Multiple installation directories
Upgrading without rebooting
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-255
Installing Applications: Program Files on Shared Storage
Advantages:
• Simplifies application setup and maintenance
• The application service group is self-contained: all program and data files are located on file systems within the service group.
Disadvantages:
• Rolling upgrades cannot be performed.
• Downtime is increased during maintenance.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-256
Binaries on Local Storage
Advantages:
• Minimizes downtime during application maintenance
• May allow rolling upgrades (depending on the application)
Disadvantages:
• Multiple copies of the application must be maintained.
• Not scalable, due to maintenance overhead in clusters with large numbers of service groups and systems
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-257
Application Installation Guidelines
Determine where to install program files (locally or on shared disk) based on your cluster environment.
Install application data files on a shared storage partition that is accessible to each system that can run the application.
Specify identical installation options.
Use the same mount point when installing the application on each system.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-258
Summary
You should now be able to:
Describe the benefits of keeping applications available during planned maintenance.
Freeze service groups and systems.
Upgrade a system in a running cluster.
Describe the differences in application upgrades.
Apply guidelines for installing new applications in the cluster.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-259
Lab 11: Installing Applications in the Cluster
[Diagram: Student Red and Student Blue each install Volume Manager while RedNFSSG and BlueNFSSG remain online.]
VERITAS Cluster Server for Solaris
Lesson 12Volume Manager and Process Resources
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-261
Overview
[Course map: Introduction; Terms and Concepts; Service Group Basics; Preparing Resources; Resources and Agents; NFS Resources; Installing VCS; Using Cluster Manager; Managing Cluster Services; Event Notification; Faults and Failovers; Using Volume Manager; Installing Applications; Cluster Communication; Troubleshooting]
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-262
Objectives
After completing this lesson, you will be able to:
Describe how Volume Manager enhances high availability.
Describe Volume Manager storage objects.
Configure shared storage using Volume Manager.
Create a service group with Volume Manager resources.
Configure Process resources.
Configure Application resources.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-263
Volume Management
[Diagram: System1 and System2 access virtual volumes built from physical disks.]
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-264
Volume Manager Objects
[Diagram: physical disks become VxVM disks within a disk group; subdisks are combined into plexes, and plexes form volumes.]
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-265
Disk Groups
[Diagram: physical disks Disk1, Disk2, and Disk3 as VxVM disks in disk group testDG.]
VxVM objects cannot span disk groups.
Disk groups represent management and configuration boundaries.
Disk groups enable high availability.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-266
VxVM Volume
[Diagram: Volume1 built from VxVM disks Disk1, Disk2, and Disk3 in disk group testDG.]
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-267
Volume Manager Configuration
Initialize disk(s):
vxdisksetup -i device
Create a disk group:
vxdg init disk_group disk_name=device
Create a volume:
vxassist -g disk_group make vol_name size
Make a file system:
mkfs -F vxfs volume_device
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-268
Testing Volume Manager Configuration
On the first system:
1. Create a mount point directory.
2. Mount the file system.
3. Verify that the file system is accessible.
4. Unmount the file system.
5. Deport the disk group.
On the next system(s):
1. Create a mount point directory with the same name.
2. Import the disk group.
3. Start the volume.
4. Mount and verify the file system.
5. Unmount the file system.
6. Deport the disk group.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-269
Volume Manager Resources
[Diagram: service group VMSG containing a Process resource and a Mount resource; Mount depends on a Volume resource (VMVol), which depends on a DiskGroup resource (VMDG).]
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-270
DiskGroup Resource and Agent
Functions:
Online	Imports a Volume Manager disk group
Offline	Deports a disk group
Monitor	Determines the state of the disk group using vxdg
Required attributes:
DiskGroup	Name of the disk group
Optional attributes:
StartVolumes, StopVolumes
Configuration prerequisites:
The disk group and volume must be configured.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-271
Volume Resource and Agent
Functions:
Online	Starts a volume
Offline	Stops a volume
Monitor	Reads a byte of data from the raw device interface for the volume
Required attributes:
DiskGroup	Name of the disk group
Volume	Name of the volume
Optional attributes: None
Configuration prerequisites:
The disk group and volume must be configured.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-272
Configuring a Service Group
[Flowchart summary:]
1. Add the service group.
2. Set SystemList.
3. Set optional attributes.
4. Add and test each resource (see the resource flowchart); repeat until all resources are added.
5. Link resources.
6. Set critical resources.
7. Test switching and test failover; if a test fails, check the logs, fix the problem, and retest.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-273
Configuring a Resource
[Flowchart summary:]
1. Add the resource.
2. Set it non-critical.
3. Modify attributes.
4. Enable the resource and bring it online.
5. If the resource does not come online, check the log:
   • If it is faulted, clear the resource.
   • If it is waiting to go online, flush the group.
   Then disable the resource, fix the problem, and retry.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-274
Process Resource and Agent
Functions:
Online	Starts a daemon process
Offline	Stops a process
Monitor	Determines whether the process is running, using procfs
Required attributes:
PathName	Full path of the executable file
Optional attributes:
• Arguments
• Use % to escape dashed arguments:
hares -modify myProc Arguments "%-bd -q1h"
Sample configuration:
Process sendmail (
    PathName = "/usr/lib/sendmail"
    Arguments = "-bd -q1h"
    )
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-275
The Application Resource and Agent
Functions:
Online	Brings an application online using StartProgram
Offline	Takes an application offline using StopProgram
Monitor	Monitors the status of the application in a number of ways
Clean	Takes the application offline using CleanProgram, or kills all the processes specified for the application
Required attributes:
StartProgram	Name of the executable that starts the application
StopProgram	Name of the executable that stops the application
One or more of the following:
MonitorProgram	Name of the executable that monitors the application
MonitorProcesses	List of processes to be monitored
PidFiles	List of pid files that contain the process IDs of the processes to be monitored
Optional attributes:
CleanProgram, User
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-276
Application Resource Configuration
Configuration prerequisites:
• The application should have its own start and stop programs.
• It should be possible to monitor the application, either by running a program that returns 0 for failure and 1 for success or by checking a list of processes.
Sample configuration:
Application samba_app (
    StartProgram = "/usr/sbin/samba start"
    StopProgram = "/usr/sbin/samba stop"
    PidFiles = { "/var/lock/samba/smbd.pid" }
    MonitorProcesses = { "smbd" }
    )
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-277
Summary
You should now be able to:
Describe how Volume Manager enhances high availability.
Describe Volume Manager storage objects.
Configure shared storage using Volume Manager.
Create a service group with Volume Manager resources.
Configure Process resources.
Configure Application resources.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-278
Lab 12: Volume Manager and Process Resources
[Diagram: Student Red builds ProdSG (ProdDG, ProdVol, ProdMount mounting ProdVol at /prod, and the ProdLoopy process); Student Blue builds TestSG (TestDG, TestVol, TestMount mounting TestVol at /test, and the TestLoopy process); RedNFSSG and BlueNFSSG remain online.]
VERITAS Cluster Server for Solaris
Lesson 13Cluster Communication
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-280
Overview
[Course map: Introduction; Terms and Concepts; Service Group Basics; Preparing Resources; Resources and Agents; NFS Resources; Installing VCS; Using Cluster Manager; Managing Cluster Services; Event Notification; Faults and Failovers; Using Volume Manager; Installing Applications; Cluster Communication; Troubleshooting]
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-281
Objectives
After completing this lesson, you will be able to:
Describe how systems communicate in a cluster.
Describe the LLT and GAB configuration files and commands.
Reconfigure LLT and GAB.
Describe the effects of cluster communication failures.
Recover from communication failures.
Configure the InJeopardy trigger.
Troubleshoot LLT and GAB.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-282
Cluster Communication
[Diagram: on each of System A and System B, agents communicate with had through the agent framework; had on the two systems communicates over GAB and LLT.]
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-283
GAB Membership Status
Determines cluster membership using heartbeat signals.
Heartbeats are transmitted by LLT.
Membership is determined by cluster ID number.
[Diagram: Systems A, B, C, and D, each running GAB over LLT, form Cluster 1.]
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-284
Cluster State
GAB tracks all changes in configuration and resource status.
It sends an atomic broadcast to immediately transmit the new configuration and status.
[Diagram: an "Add Resource" change is broadcast atomically so every system holds the same cluster state.]
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-285
Low Latency Transport (LLT)
Provides traffic distribution across all private links
Sends and receives heartbeats
Transmits cluster configuration data
Determines whether connections are reliable (more than one exists) or unreliable
Runs in the kernel for best performance
Connection-oriented
Uses DLPI over Ethernet
Nonroutable
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-286
Configuring LLT
Required configuration files:
• /etc/llttab
• /etc/llthosts
Optional configuration file:
• /etc/VRTSvcs/conf/sysname
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-287
The llttab File
set-node train1
set-cluster 10
# Solaris example
link qfe0 /dev/qfe:0 - ether - -
link hme0 /dev/hme:0 - ether - -
start
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-288
Setting Node Number and Name
# /etc/llttab
set-cluster 10
set-node /etc/VRTSvcs/conf/sysname
link qfe0 /dev/qfe:0 - ether - -
link hme0 /dev/hme:0 - ether - -
link-lowpri qfe1 /dev/qfe:1 - ether - -start
# /etc/llthosts
3 sysa
7 sysb

# /etc/VRTSvcs/conf/sysname
sysb

(Valid cluster IDs range from 0 to 255; valid node numbers range from 0 to 31.)
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-289
The link Directive
# /etc/llttab
set-node 1
set-cluster 10
# Solaris example
link qfe0 /dev/qfe:0 - ether - -
link hme0 /dev/hme:0 - ether - -
link-lowpri qfe1 /dev/qfe:1 - ether - -
start

Fields of the link directive: tag name, device:unit, node range ("-" = all), link type, SAP, and MTU.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-290
Low Priority Link
A public network link can serve as a redundant private network link.
LLT sends only heartbeats on the low-priority link if the other private network links are functional.
The heartbeat rate is slower, to reduce traffic.
The low-priority link is used for all cluster communication if all private links fail.
The public network can then be saturated with cluster traffic.
Risk of system panics if the same system ID/cluster ID is present on the network.
Configured with the link-lowpri directive.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-291
Other LLT Directives
# For verbose messages from lltconfig,
# add this line first in llttab:
set-verbose 1
# The following causes only nodes 0-7
# to be valid for cluster participation:
exclude 8-31
# peerinact specifies how long a link is
# down before it is marked inactive:
set-timer peerinact:1600
# These regulate the heartbeat interval:
set-timer heartbeat:50
set-timer heartbeatlo:100
start
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-292
The llthosts File
Format:
node_number name
Example entries:
1 systema
2 systemb
3 systemc
No spaces before the number.
Use the same entries on all systems.
Unique node numbers are required.
System names must match llttab and main.cf.
System names must match sysname, if used.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-293
The sysname File
Enables llttab and llthosts to be identical on all systems.
Must be different on each system.
Contains the unique system name.
Removes the dependency on the UNIX node name.
The system name must be in llthosts.
The system name must match main.cf.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-294
GAB Configuration
GAB configuration file:
/etc/gabtab
GAB configuration command entry:
/sbin/gabconfig -c -n seed_number
The seed number is set to the number of systems in the cluster.
Starts GAB under normal conditions
Other options are discussed later.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-295
Changing Communication Configuration
1. Stop VCS
2. Stop GAB
3. Stop LLT
4. Edit files
5. Start LLT
6. Start GAB
7. Start VCS
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-296
Stopping GAB and LLT
Stop the VCS engine first.
Stop GAB on each system:
/sbin/gabconfig -U
Stop LLT on each system:
/sbin/lltconfig -U
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-297
Starting LLT
Edit the configuration files on each system before starting LLT on any system.
Start LLT on each system in the cluster:
/sbin/lltconfig -c
LLT starts if configuration files are correct.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-298
Starting GAB
Start LLT before starting GAB.
Start GAB on each system, specifying a value for -n equal to the number of systems in the cluster:
/sbin/gabconfig -c -n #
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-299
Starting LLT and GAB Automatically
Startup files added when VCS is installed:
/etc/rc2.d/S70llt
/etc/rc2.d/S92gab
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-300
The LinkHbStatus Attribute
Internal VCS system attribute that provides link status information
Use the hasys command to view status:
hasys -display system -attribute LinkHbStatus
hme:0 UP qfe:0 UP
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-301
The lltstat Command
train12# lltstat -nvv |pg
LLT node information:
Node State Link Status Address
* 0 train12 OPEN
link1 UP 08:00:20:AD:BC:78
link2 UP 08:00:20:AD:BC:79
link3 UP 08:00:20:B7:08:5C
1 train11 OPEN
link1 UP 08:00:20:B4:0C:3B
link2 UP 08:00:20:B4:0C:3B
link3 UP 08:00:20:B4:0C:3B
The * marks the system that runs the command.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-302
Other lltstat Options
train12# lltstat -c
LLT configuration information:
node: 20
name: train3
cluster: 10
version: 1.1
nodes: 20 - 21
max nodes: 32
max ports: 3
(…)
train12# lltstat -l
LLT link information:
Link Tag State Type Pri SAP MTU Addrlen Xmit Recv …..
0 hme0 on ether hipri 0xCAFE 1500 6 3732 3678 0
1 qfe0 on ether hipri 0xCAFE 1500 6 3731 3674 0
2 qfe1 on ether lowpri 0xCAFE 1500 6 1584 6719 0
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-303
The lltconfig Command
train12# lltconfig -a list
Link 0 (qfe0):
Node 0 : 08:00:20:AD:BC:78 permanent
Node 1 : 08:00:20:AC:BE:76 permanent
Node 2 : 08:00:20:AD:BB:89 permanent
Link 1 (hme0):
Node 0 : 08:00:20:AD:BC:79 permanent
Node 1 : 08:00:20:AC:BE:77 permanent
Node 2 : 08:00:20:AD:BB:80 permanent
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-304
GAB Membership Notation
# /sbin/gabconfig -a
GAB Port Memberships
===============================================
Port a gen a36e003 membership 01 ; ;12
Port h gen fd57002 membership 01 ; ;12
Reading the membership string "01 ; ;12":
• "01" means nodes 0 and 1 are members.
• The first ";" is the tens placeholder (a 0 after it would mean node 10 is a member).
• The second ";" is the twenties placeholder.
• "12" after it means nodes 21 and 22 are members.
• Port a membership shows GAB is communicating.
• Port h membership shows had is communicating.
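This positional notation can be decoded with a short script. A hypothetical helper, not a VCS tool: each ";" starts the next decade, and every digit d in decade n denotes node 10n + d.

```shell
# Decode a gabconfig -a membership string such as "01 ; ;12" into a
# list of node numbers. Digits before the first ";" are nodes 0-9,
# digits after it are 10-19, and so on.
decode_membership() {
  echo "$1" | awk -F';' '{
    sep = ""
    for (d = 1; d <= NF; d++)
      for (i = 1; i <= length($d); i++) {
        c = substr($d, i, 1)
        if (c ~ /[0-9]/) { printf "%s%d", sep, (d - 1) * 10 + c; sep = " " }
      }
    print ""
  }'
}

decode_membership "01 ; ;12"   # nodes 0 1 21 22
```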
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-305
Communication Failures
Network partition: failure of all Ethernet heartbeat links between one or more systems:
• Occurs when one or more systems fail
• Also occurs when all Ethernet heartbeat links fail
Split brain:
• Failure of the Ethernet heartbeat links is misinterpreted as failure of one or more systems.
• Multiple systems start running the same failover application.
• Leads to data corruption if applications use shared storage.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-306
Split-Brain Condition
[Diagram: two systems each changing block 20460 on shared storage at the same time, leaving the block invalid]
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-307
Preventing Split-Brain Condition
Redundant heartbeat channels:
• Multiple private network heartbeats
• Public network heartbeat
• Disk heartbeats
• Service group heartbeat
SCSI disk reservation
Jeopardy
Autodisabling
Seeding
PreOnline trigger
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-308
Jeopardy Condition
A special type of cluster membership called jeopardy is formed when one or more systems have only a single Ethernet heartbeat link.
Service groups continue to run, and the cluster functions normally. Failover and switching at operator request are unaffected.
The service groups running on a system in jeopardy are not taken over by another system if a system failure is detected by VCS.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-309
Jeopardy Example
SG_1 SG_2 SG_3
A B C
Regular Membership: A, B
Jeopardy Membership: C
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-310
Network Partition Example
SG_1 SG_2 SG_3
Autodisabled for C Autodisabled for A,B
A B C
1. Regular Membership: A, B
   No Jeopardy Membership
2. New Regular Membership
   No Jeopardy Membership
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-311
Split Brain Example
CBA
SG_1 SG_2 SG_3
SG_3 SG_1 SG_2
Service Groups Not Autodisabled
Regular Membership: A, B
No Jeopardy Membership
New Regular Membership
No Jeopardy Membership
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-312
Recovery Behavior
When a private network is reconnected after a network partition, VCS and GAB are stopped and restarted as follows:
Two-system cluster:
• The system with the lowest LLT node number continues to run VCS.
• VCS is stopped on the higher-numbered system.
Multi-system cluster:
• The mini-cluster with the most systems running continues to run VCS. VCS is stopped on the systems in the smaller mini-cluster(s).
• If split into two equal-size mini-clusters, the mini-cluster containing the lowest node number continues to run VCS.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-313
Configuring Recovery Behavior
Modify /etc/gabtab. For example:
/sbin/gabconfig -c -n 2 -j
Causes the higher-numbered node to panic if GAB tries to start after all Ethernet connections simultaneously stop and then restart
A split-brain avoidance mechanism
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-314
Preexisting Network Partitions
This condition is caused by a failure in the private network communication channels while systems are down.
A preexisting network partition can lead to split brain when the systems are started.
VCS uses seeding to prevent a split-brain condition in the case of a preexisting network partition.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-315
Seeding
• Prevents split brain
• Only seeded systems can run VCS.
• Systems are seeded only if GAB can communicate with other systems.
• Seeding determines the number of systems that must be communicating to allow VCS to start.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-316
Manually Seeding the Cluster
To start GAB and seed the system on which the command runs:
gabconfig -c -x
Warning: Do not use these options in gabtab.
• Overrides -n; allows GAB to immediately seed the cluster so VCS can build a running configuration
• Use when the number of systems available is less than the number specified by -n in /etc/gabtab.
• Use on only one system in the cluster; the others then seed from the first system.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-317
The InJeopardy Trigger
To configure, add an injeopardy script to /opt/VRTSvcs/bin/triggers.
The trigger is called when a system transitions from regular cluster membership to jeopardy.
Arguments are the name of the system in jeopardy and the system state.
The trigger is invoked on all systems that are part of the jeopardy membership.
The InJeopardy trigger is not run when:
• A system loses its last network link.
• A system loses both private network links at once.
• A system transitions from any other state (such as the down state) to the jeopardy state.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-318
The lltdump Command
train12# lltdump -f /dev/qfe:0 -V -A -R
DAT C 100 S 01 D 00 P 007 rdy 80000081 seq 000000b9 len 0132 ack 0000007c 01 01 64 05 00 00 00 01 00 07 89 00
DAT C 100 S 01 D 00 P 007 rdy 80000081 seq 000000bb len 0166 01 01 64 05 00 00 00 01 00 07 88 00
DAT C 100 S 01 D 00 P 007 rdy 80000081 seq 000000bc len 0166 ack 00000080 01 01 64 05 00 00 00 01 00 07 89 00
DAT C 100 S 01 D 00 P 007 rdy 80000081 seq 000000bf len 0176 ack 00000083 01 01 64 05 00 00 00 01 00 07 89 00
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-319
The lltshow Command
train12# lltshow -n 0 |pg
=== LLT node 0:
nid= 0 state= 4 OPEN my_gen= 3a89ec14 peer_gen= 0 flags= 0 links= 3
opens= ffffffff readyports= 0 rexmitcnt= 0 nxtlink= 0
lastacked= 0 nextseq= 0 recv_seq= 0
xmit_head= 0 xmit_tail= 0 xmit_next= 0
xmit_count= 0 recv_reseq= 0 oos= 0
retrans= 0 retrans2= 0
link [0]: hb= 0 hb2= 0 peerinact= 0 lasthb= 0
valid= 1 perm= 1 flags= 0 stat= 1
arpmode= 0
addr= 08 00 20 AD BC 78 00 00 00 00
dlpi_hdr= 00 00 00 07 00 00 00 08 00 00 00 14 00 00 00 64 00 00 00 00 08 00 20 AD BC 78 CA FE 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Identifies LLT Packets on Public Network
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-320
Common LLT Problems
Node or cluster number out of range:
• Node number must be between 0 and 31.
• Cluster number must be between 0 and 255.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-321
Incorrect LLT Specification
Incorrectly specified Ethernet link device:
qf3 should be qfe
LLT not started:
Check /etc/llttab for the start directive.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-322
Common GAB Problems
No GAB membership:
• gabconfig -a
• gabconfig -c -n N
GAB starts, then shuts down:
• Check cabling
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-323
Problems with main.cf
VCS does not start:
Check main.cf for incorrect entries.
hacf -verify aborts:
Check the system names in main.cf to verify that they match llthosts and llttab.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-324
Summary
You should now be able to:
• Describe how systems communicate in a cluster.
• Configure the Low Latency Transport (LLT).
• Configure the Group Membership and Atomic Broadcast (GAB) mechanism.
• Start and stop LLT and GAB.
• Configure the InJeopardy trigger.
• Troubleshoot LLT and GAB.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-325
Lab 13: Cluster Communication
Student Red / Student Blue
injeopardy
BlueNFSSG
TestSG
TestDG
TestVol
TestMount
TestLoopy
RedNFSSG
ProdDG
ProdVol
ProdMount
ProdLoopy
VERITAS Cluster Serverfor Solaris
Lesson 14Troubleshooting
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-327
Overview
Course map: Introduction; Terms and Concepts; Installing VCS; Using Cluster Manager; Managing Cluster Services; Service Group Basics; Preparing Resources; Resources and Agents; NFS Resources; Faults and Failovers; Event Notification; Cluster Communication; Using Volume Manager; Installing Applications; Troubleshooting
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-328
Objectives
After completing this lesson, you will be able to:
• Monitor system and cluster status.
• Apply troubleshooting techniques in a VCS environment.
• Detect and solve VCS communication problems.
• Identify and solve VCS engine problems.
• Correct service group problems.
• Solve problems with agents.
• Resolve problems with resources.
• Plan for disaster recovery.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-329
Monitoring VCS
• VCS log files
• System log files
• The hastatus utility
• SNMP traps
• Event notification triggers
• Cluster Manager
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-330
VCS Log Entries
Engine log: /var/VRTSvcs/log/engine_A.log
TAG_D 2001/04/03 12:17:44 VCS:11022:VCS engine (had) started
TAG_D 2001/04/03 12:17:44 VCS:10114:opening GAB library
TAG_C 2001/04/03 12:17:45 VCS:10526:IpmHandle::recv peer exited errno 10054
TAG_E 2001/04/03 12:17:52 VCS:10077:received new cluster membership
TAG_E 2001/04/03 12:17:52 VCS:10080:Membership: 0x3, Jeopardy: 0x0
TAG_D 2001/04/03 12:17:52 VCS:10322:Node '1' changed state from 'UNKNOWN' to 'INITING'
TAG_B 2001/04/03 12:17:52 VCS:10455:Operation 'haclus -modify(0xc13)' rejected. Sysstate=CURRENT_DISCOVER_WAIT,Channel=BCAST,Flags=0x40000
Most Recent
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-331
Agent Log Entries
Agent logs are kept in /var/VRTSvcs/log.
Log files are named AgentName_A.log.
LogLevel attribute settings:
• none
• error (default setting)
• info
• debug
• all
To change the log level:
hatype -modify res_type LogLevel debug
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-332
Troubleshooting Guide
Primary types of problems:
• Cluster communication
• VCS engine startup
• Service groups and resources
Determine the path based on hastatus output:
• A cluster communication problem is indicated by the message:
Cannot connect to server -- Retry Later
• A VCS engine startup problem is indicated by systems with a WAIT status.
• Service group and resource problems are indicated when the VCS engine is in the RUNNING state.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-333
Cluster Communication Problems
Run gabconfig -a:
• No port a membership indicates a communication problem.
• No port h membership indicates a VCS engine (had) startup problem.
Communication problem (GAB not seeded):
# gabconfig -a
GAB Port Memberships
===================================
VCS engine not running (GAB and LLT functioning):
# gabconfig -a
GAB Port Memberships
===================================
Port a gen 24110002 membership 01
Port h gen 65510002 membership
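This decision can also be scripted against saved gabconfig -a output. A sketch, not a VCS tool; the triage rules follow the two cases above, and the function reads the saved output from a file:

```shell
# Classify saved `gabconfig -a` output: no port a membership means a
# communication problem; port a present but no membership digits on
# port h means the VCS engine (had) is not running.
gab_triage() {
  if ! grep -q '^Port a .*membership.*[0-9]' "$1"; then
    echo "communication problem: GAB not seeded"
  elif ! grep -q '^Port h .*membership.*[0-9]' "$1"; then
    echo "VCS engine (had) not running"
  else
    echo "GAB and had both communicating"
  fi
}

# Illustrative sample matching the second output above.
cat > /tmp/gab.out <<'EOF'
GAB Port Memberships
===================================
Port a gen 24110002 membership 01
Port h gen 65510002 membership
EOF
gab_triage /tmp/gab.out   # VCS engine (had) not running
```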
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-334
Problems with GAB and LLT
If GAB is not seeded (no port memberships):
• Run lltconfig to determine if LLT is running.
• Run lltstat -n to determine if the systems can see each other on the LLT links.
• Check the physical network connection(s) if LLT cannot see each node.
• Check gabtab for the correct seed value (-n) if the LLT links are functional.
• Manually seed the cluster, if necessary.

# lltconfig
LLT is running

# lltstat -n
LLT node information:
Node State Links
* 0 train11 OPEN 2
1 train12 OPEN 2
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-335
VCS Engine Startup Problems
Start the VCS engine using hastart.
Check hastatus to determine the system state.
If not running:
• If ADMIN_WAIT or STALE_ADMIN_WAIT, see the next sections.
• Check the logs.
• Verify that the llthosts file exists and the system entries match the cluster configuration (main.cf).
• Check gabconfig.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-336
STALE_ADMIN_WAIT
To recover from the STALE_ADMIN_WAIT state:
1. Visually inspect the main.cf file to determine whether it is valid.
2. Edit the main.cf file, if necessary.
3. Verify the syntax of main.cf, if modified:
hacf -verify config_dir
4. Start VCS on the system with the valid main.cf file:
hasys -force system_name
5. All other systems perform a remote build from the system now running.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-337
ADMIN_WAIT
A system can be in the ADMIN_WAIT state under these circumstances:
• A .stale flag exists and the main.cf file has a syntax problem.
• A disk error affecting main.cf occurs during a local build.
• The system is performing a remote build and the last running system fails.
Restore main.cf and use the procedure for STALE_ADMIN_WAIT.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-338
Service Group Not Configured to AutoStart or Run
Service group not brought online automatically when VCS starts:
Check the AutoStart and AutoStartList attributes:
hagrp -display service_group
Service group not configured to run on the system:
• Check the SystemList attribute.
• Verify that the system name is included.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-339
Service Group AutoDisabled
Autodisable occurs when:
• GAB sees a system, but had is not running on the system.
• The resources of the service group are not fully probed on all systems in the SystemList.
• A particular system is visible through disk heartbeat only.
Make sure that the service group is offline on all systems in the SystemList attribute.
Clear the AutoDisabled attribute:
hagrp -autoenable service_group -sys system
Bring the service group online.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-340
Service Group Waiting for Dependencies
Check service group dependencies:
hagrp -dep service_group
Check resource dependencies:
hares -dep resource
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-341
Service Group Not Fully Probed
Usually a result of misconfigured resource attributes
Check the ProbesPending attribute:
hagrp -display service_group
Check which resources are not probed:
hastatus -sum
Check the Probes attribute for resources:
hares -display
To probe resources:
hares -probe resource -sys system
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-342
Service Group Frozen
Verify the value of the Frozen and TFrozen attributes:
hagrp -display service_group
Unfreeze the service group:
hagrp -unfreeze group [-persistent]
If you freeze persistently, you must unfreeze persistently.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-343
Service Group Is Not Offline Elsewhere
Determine which resources are online/offline:
hastatus -sum
Verify the State attribute:
hagrp -display service_group
Take the group offline on the other system:
hagrp -offline service_group -sys system
Flush the service group:
hagrp -flush service_group -sys system
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-344
Service Group Waiting for Resource
Review the IState attribute of all resources to determine which resource is waiting to go online.
Use hastatus to identify the resource.
Make sure the resource is offline (at the operating system level).
Clear the internal state of the service group:
hagrp -flush service_group -sys system
Take the other resources in the service group offline and try to bring the resources online on another system.
Verify that the resource works properly outside VCS.
Check for errors in attribute values.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-345
Incorrect Local Name
1. Create /etc/VRTSvcs/conf/sysname with the correct system name shown in main.cf.
2. Stop VCS on the local system.
3. Start VCS.
4. List all system names.
5. Open the configuration.
6. Delete any systems with incorrect names.
7. Save the configuration.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-346
Concurrency Violations
Occur when a failover service group is online or partially online on more than one system
Notification is provided by the Violation trigger:
• Invoked on the system that caused the concurrency violation
• Notifies the administrator and takes the service group offline on the system causing the violation
• Configured by default with the violation script in /opt/VRTSvcs/bin/triggers
• Can be customized to:
– Send a message to the system log.
– Display a warning on all cluster systems.
– Send e-mail messages.
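A customized trigger is just a script in /opt/VRTSvcs/bin/triggers. The sketch below only builds the alert message; the two-argument interface (system name, then service group name) is an assumption to verify against the VCS trigger documentation, and a real trigger would pass the message to logger or a mail command rather than echo it.

```shell
#!/bin/sh
# Hypothetical violation-trigger sketch: VCS is assumed to pass the
# system name and service group name as the two arguments.
violation_msg() {
  echo "VCS concurrency violation: service group $2 is online on $1"
}

# A real trigger might do:
#   logger -p daemon.alert "$(violation_msg "$1" "$2")"
violation_msg sysa websg
```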
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-347
Service Group Waiting for Resource to Go Offline
Identify which resource is not offline:
hastatus -summary
Check the logs.
Manually take the resource offline, if necessary.
Configure the ResNotOff trigger for notification or action.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-348
Agent Not Running
Determine whether the agent for that resource is FAULTED:
hastatus -summary
Use the ps command to verify that the agent process is not running.
Verify the values of the ArgList and ArgListValues attributes:
hatype -display res_type
Restart the agent:
haagent -start res_type -sys system
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-349
Problems Bringing Resources Online
Possible causes of failure while bringing resources online:
• Waiting for child resources
• Stuck in a WAIT state
• Agent not running
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-350
Problems Bringing Resource Offline
• Waiting for parent resources to go offline
• Waiting for a resource to respond
• Agent not running
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-351
Critical Resource Faults
Determine which critical resource has faulted:
hastatus -summary
Make sure that the resource is offline.
Examine the engine log.
Fix the problem.
Verify that the resources work properly outside of VCS.
Clear the fault in VCS.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-352
Clearing Faults
After external problems are fixed:
1. Clear any faults on nonpersistent resources:
hares -clear resource -sys system
2. Check attribute fields for incorrect or missing data.
If the service group is partially online:
1. Flush wait states:
hagrp -flush service_group -sys system
2. Take resources offline before bringing them online.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-353
Planning for Disaster Recovery
Back up key VCS files:
• types.cf and customized types files
• main.cf
• main.cmd
• sysname
• LLT and GAB configuration files• Customized trigger scripts• Customized agents
Use hagetcf to create an archive.
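When hagetcf is unavailable, the files listed above can be archived by hand with tar. A sketch demonstrated against a scratch directory; the paths and file contents below are illustrative:

```shell
# Archive sample VCS configuration files with tar; on a real system the
# file list would include /etc/VRTSvcs/conf/config/main.cf, types.cf,
# /etc/VRTSvcs/conf/sysname, /etc/llttab, /etc/llthosts, and /etc/gabtab.
dest=/tmp/vcsdemo
mkdir -p "$dest/etc"
printf 'set-node 1\n' > "$dest/etc/llttab"
printf '1 systema\n'  > "$dest/etc/llthosts"
( cd "$dest" && tar cf /tmp/vcs_backup.tar etc )
tar tf /tmp/vcs_backup.tar
```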
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-354
The hagetcf Utility
# hagetcf
Saving 0.13 MB
Enter path where configuration can be saved (default is /tmp):
Collecting package info
Checking VCS package integrity
Collecting VCS information
Collecting system configuration
…..
Compressing /tmp/vcsconf.train12.tar to /tmp/vcsconf.train12.tar.gz
Done. Please e-mail /tmp/vcsconf.train12.tar.gz to your support provider.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-355
Summary
You should now be able to:
• Monitor system and cluster status.
• Apply troubleshooting techniques in a VCS environment.
• Identify and solve VCS engine problems.
• Correct service group problems.
• Solve problems with agents.
• Resolve problems with resources.
• Plan for disaster recovery.
Lab Exercise
Lesson 14Troubleshooting
VERITAS Cluster Server for Solaris
Appendix DSpecial Situations
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-358
Overview
This lesson provides a guide for managing certain situations in a cluster environment:
• VCS upgrades
• VCS patches
• System changes: adding, removing, and replacing cluster systems
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-359
Objectives
After completing this lesson, you will be able to:
• Upgrade VCS software to version 2.0 from any earlier version.
• Install a VCS patch.
• Add systems to a running VCS cluster.
• Remove systems from a running VCS cluster.
• Replace systems in a running VCS cluster.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-360
Preparations for VCS Upgrade
• Acquire the new VCS software.
• Contact VERITAS Technical Support.
• Read the release notes.
• Write scripts to automate as much of the process as possible.
• If available, deploy on a test cluster first.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-361
VCS Upgrade Process
Start
I. Complete initial preparation.
II. Stop the existing VCS software.
III. Remove the existing VCS software and add the new VCS version.
IV. Verify the configuration and make changes as needed.
V. Start VCS on one system and propagate the configuration to the others.
Done
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-362
Step I - Initial Preparation
1. Open the cluster configuration and freeze all service groups persistently:
haconf -makerw
hagrp -list
hagrp -freeze group_name -persistent
2. Save and close the VCS configuration:
haconf -dump -makero
3. Make a backup of the full configuration, including:
• All configuration files
• Any custom-developed agents
• Any modified VCS scripts
4. Rename the existing types.cf file:
mv /etc/VRTSvcs/conf/config/types.cf \
/etc/VRTSvcs/conf/config/types.save
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-363
Step II - Stopping VCS Software
1. Stop the VCS engine on all systems, leaving the application services running:
hastop -all -force
2. Remove heartbeat disk configurations:
gabdiskhb -l
gabdiskx -l
gabdiskhb -d disk_name
gabdiskx -d device_name
3. Stop and unload GAB:
gabconfig -U
modinfo | grep gab
modunload -i modid
4. Stop and unload LLT:
lltconfig -U
modinfo | grep llt
modunload -i modid
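The modinfo/modunload pairs in steps 3 and 4 can be combined: modinfo's first column is the module id that modunload -i expects. A sketch; the sample line below stands in for live modinfo output:

```shell
# Pull a module id out of modinfo-style output (first column) so it can
# be fed to modunload -i. On a live system, replace the printf with
# `modinfo` and then run `modunload -i "$modid"` with the result.
sample=' 228 780e6000 53e8 109   1 gab (GAB device)'
modid=$(printf '%s\n' "$sample" | awk '$6 == "gab" { print $1 }')
echo "modunload -i $modid"   # prints: modunload -i 228
```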
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-364
Step III - Removing Old and Adding New VCS Software
1. Remove the existing VCS (pre-2.0) software packages:
pkgrm VRTScscm VRTSvcs VRTSgab VRTSllt \
VRTSperl
2. Add the new VCS software packages:
pkgadd -d /package_directory
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-365
Step IV - Verifying and Changing the Configuration
1. Determine the differences between the existing and new types.cf files:
diff /etc/VRTSvcs/conf/config/types.save \
/etc/VRTSvcs/conf/config/types.cf
2. Merge the new and old versions of the types.cf files:
a. Check changes in attribute names.
b. Check modified resource type attributes.
3. Compare and merge any necessary changes to VCS scripts.
4. Verify the configuration files:
hacf -verify /etc/VRTSvcs/conf/config
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-366
Step V - Starting The VCS Cluster
1. On all systems in the cluster, start LLT and GAB:
lltconfig -c
gabconfig -c -n #
2. Start the VCS engine on the system where the changes were made:
hastart
3. Start the VCS engine on all other systems in the cluster in a stale state:
hastart -stale
4. Open the configuration, unfreeze the service groups, and save and close the configuration:
haconf -makerw
hagrp -unfreeze group_name -persistent
haconf -dump -makero
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-367
Installing a VCS Patch
Start
I. Carry out the initial preparation (same as in the VCS upgrade).
II. Stop the old VCS software (same as in the VCS upgrade).
III. Install and verify the new patch.
IV. Start the VCS software.
Done
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-368
Step III - Installing and Verifying the New Patch
1. Verify that the VRTS* packages are all version 2.0:
pkginfo -l VRTSgab VRTSllt VRTSvcs \
VRTSperl | grep VERSION
2. Add the new VCS patch on each system using the provided utility:
./vcs_install_patch
3. Verify that the new patch has been installed:
showrev -p | grep VRTS
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-369
Step IV - Starting the VCS Cluster
1. Start LLT, GAB, and VCS on all systems in the cluster:
lltconfig -c
gabconfig -c -n #
hastart
2. Open the configuration, unfreeze the service groups, and save and close the configuration:
haconf -makerw
hagrp -unfreeze group_name -persistent
haconf -dump -makero
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-370
Adding Systems to a Running VCS Cluster
1. Configure LLT with the same cluster number and a unique node ID on the new system.
2. Configure GAB.
3. Connect the new system to the private network.
4. Edit the /etc/llthosts files on all systems in the cluster to add the system name and node ID of the new system.
5. Start LLT, GAB, and VCS on the new system.
6. Change the SystemList attribute for each service group that can run on the new system.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-371
Removing Systems from a Running VCS Cluster
1. Switch all running service groups to other systems and freeze the system.
2. Stop VCS on the system using hastop -local.
3. Stop and unload GAB on the system:
gabconfig -U
modinfo | grep gab
modunload -i modid
4. Stop and unload LLT on the system:
lltconfig -U
modinfo | grep llt
modunload -i modid
5. Remove the system from the cluster configuration:
hasys -delete system_name
6. Edit /etc/llthosts on all systems to delete the entry for the removed system.
7. Remove the llttab and gabtab files on that system.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-372
Replacing Systems in a Running VCS Cluster
1. Evacuate any service groups running on the system to be replaced.
2. Make the VCS configuration read/write, freeze the system persistently, then save and close the configuration:
haconf -makerw
hasys -freeze system_name -persistent
haconf -dump -makero
3. Physically replace the system with a new one using the same VCS configuration (same cluster number, node ID, and system name).
4. Connect the new system to the private network.
5. Start LLT, GAB, and VCS on the new system.
6. Make the VCS configuration read/write, unfreeze the system, then save and close the configuration:
haconf -makerw
hasys -unfreeze system_name -persistent
haconf -dump -makero
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-373
Summary
You should now be able to:
• Upgrade VCS software to version 2.0.
• Install a VCS patch.
• Add systems to a running VCS cluster.
• Remove systems from a running VCS cluster.
• Replace systems in a running VCS cluster.
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-374
Lab: Installing VCS Patches
Student Red / Student Blue
Install Patch
RedSG
BlueSG
VERITAS Cluster Serverfor Solaris
Introduction
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-376
VERITAS Cluster Server
[Diagram: clients on a public network reach applications/services (NFS, WWW, FTP, DB) running on clustered systems joined by a VCS private network and shared storage]
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-377
VCS Features
[Diagram: clustered databases and clustered Web servers on a network]
Availability
• Monitor and restart applications
• Set failover policies
Scalability
• Distribute services
• Add systems and storage to running clusters
Manageability
• Use Java or Web graphical interfaces
• Manage multiple clusters
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-378
High Availability Design
HA-aware applications
• Restart capability
• Crash-tolerance
HA management software
• Site replication
• Fault detection, notification, and failover
• Storage management
• Backup and recovery
Redundant hardware
• Power supplies
• Network interface cards, hubs, switches
• Storage
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-379
VERITAS Clustering and Replication Products
Foundation Products: VERITAS Volume Manager and File System
Parallel Extensions: VERITAS Cluster Volume Manager and File System
Data Replication: VERITAS VVR and support for array-based replication
Cluster Management: VERITAS Global Cluster Manager
Application Availability Agents: Informix, Oracle, Sybase, Apache
High Availability Clustering: VERITAS Cluster Server
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-380
VERITAS High Availability Solutions
[Diagram: two VCS clusters, Tokyo and London, each running VxVM/VxFS (one serving DB, the other WWW), linked across a WAN by VERITAS Volume Replicator and managed by Global Cluster Manager]
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-381
References for High Availability
Blueprints for High Availability: Designing Resilient Distributed Systems, by Evan Marcus and Hal Stern
High Availability Design, Techniques, and Processes, by Floyd Piedad and Michael Hawkins
Designing Storage Area Networks, by Tom Clark
Storage Area Network Essentials: A Complete Guide to Understanding and Implementing SANs, by Richard Barker and Paul Massiglia
VERITAS High Availability Fundamentals (Web-based training)
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-382
Course Overview
Course map: Introduction; Terms and Concepts; Installing VCS; Using Cluster Manager; Managing Cluster Services; Service Group Basics; Preparing Resources; Resources and Agents; NFS Resources; Faults and Failovers; Event Notification; Cluster Communication; Using Volume Manager; Installing Applications; Troubleshooting
VCS_2.0_Solaris_R1.0_20011130© Copyright 2001 VERITAS Software I-383
Lab Overview
Public Network
train2 (Blue Student): even/high-numbered system
train1 (Red Student): odd/low-numbered system
Private Network
SCSI JBOD